How To Design A Database for a "Check In" Social Service - mysql

I want to build a "check in" service like FourSquare or Untappd.
How do I design a suitable database schema for storing check-ins?
For example, suppose I'm developing "CheeseSquare" to help people keep track of the delicious cheeses they've tried.
The table for the items into which one can check in is fairly simple and would look like
+----+---------+---------+-------------+--------+
| ID | Name | Country | Style | Colour |
+----+---------+---------+-------------+--------+
| 1 | Brie | France | Soft | White |
| 2 | Cheddar | UK | Traditional | Yellow |
+----+---------+---------+-------------+--------+
I would also have a table for the users, say
+-----+------+---------------+----------------+
| ID | Name | Twitter Token | Facebook Token |
+-----+------+---------------+----------------+
| 345 | Anne | qwerty | poiuyt |
| 678 | Bob | asdfg | mnbvc |
+-----+------+---------------+----------------+
What's the best way of recording that a user has checked in to a particular cheese?
For example, I want to record how many French cheeses Anne has checked in to, which cheeses Bob has checked in to, whether Cersei has eaten Camembert more than 5 times, etc.
Am I best putting this information in the user's table? E.g.
+-----+------+------+--------+------+------+---------+---------+
| ID | Name | Blue | Yellow | Soft | Brie | Cheddar | Stilton |
+-----+------+------+--------+------+------+---------+---------+
| 345 | Anne | 1 | 0 | 2 | 1 | 0 | 5 |
| 678 | Bob | 3 | 1 | 1 | 1 | 1 | 2 |
+-----+------+------+--------+------+------+---------+---------+
That looks rather ungainly and hard to maintain. So should I have a separate table for recording check-ins?

No, don't put it into the users table. That information is better stored in a join table which represents a many-to-many relationship between users and cheeses.
The join table (we'll call it cheeses_users) must have at least two columns (user_ID, cheese_ID), but a third (a timestamp) would be useful too. If you default the timestamp column to CURRENT_TIMESTAMP, you need only insert the user_ID and cheese_ID into the table to log a checkin.
cheeses (ID) ⇒ (cheese_ID) cheeses_users (user_ID) ⇐ users (ID)
Created as:
CREATE TABLE cheeses_users (
  cheese_ID INT NOT NULL,
  user_ID INT NOT NULL,
  -- timestamp defaults to current time
  checkin_time DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP,
  -- (add any other columns *specific to* this checkin (user+cheese+time))
  -- The primary key is the combination of all 3
  -- It becomes impossible for the same user to log the same cheese
  -- at the same second in time...
  PRIMARY KEY (cheese_ID, user_ID, checkin_time),
  -- FOREIGN KEYs to your other tables
  FOREIGN KEY (cheese_ID) REFERENCES cheeses (ID),
  FOREIGN KEY (user_ID) REFERENCES users (ID)
) ENGINE=InnoDB; -- InnoDB is necessary for the FKs to be honored and useful
To log a checkin for Bob & Cheddar, insert with:
INSERT INTO cheeses_users (cheese_ID, user_ID) VALUES (2, 678);
To query them, you join through this table. For example, to see the number of each cheese type for each user, you might use:
SELECT
u.Name AS username,
c.Name AS cheesename,
COUNT(*) AS num_checkins
FROM
users u
JOIN cheeses_users cu ON u.ID = cu.user_ID
JOIN cheeses c ON cu.cheese_ID = c.ID
GROUP BY
u.Name,
c.Name
To get the 5 most recent checkins for a given user, something like:
SELECT
c.Name AS cheesename,
cu.checkin_time
FROM
cheeses_users cu
JOIN cheeses c ON cu.cheese_ID = c.ID
WHERE
-- Limit to Anne's checkins...
cu.user_ID = 345
ORDER BY checkin_time DESC
LIMIT 5
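The specific questions from the post (how many French cheeses Anne has tried, and who has eaten Camembert more than 5 times) can be answered with the same join. Below is a minimal sketch, assuming Python's built-in sqlite3 module with an in-memory SQLite database as a stand-in for MySQL. The Camembert row and all check-in rows are made-up sample data, and only the columns needed for the two queries are created.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; the queries use only standard SQL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE cheeses (ID INTEGER PRIMARY KEY, Name TEXT, Country TEXT);
CREATE TABLE users   (ID INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE cheeses_users (
    cheese_ID    INTEGER NOT NULL REFERENCES cheeses (ID),
    user_ID      INTEGER NOT NULL REFERENCES users (ID),
    checkin_time TEXT    NOT NULL,
    PRIMARY KEY (cheese_ID, user_ID, checkin_time)
);
INSERT INTO cheeses VALUES (1, 'Brie', 'France'), (2, 'Cheddar', 'UK'),
                           (3, 'Camembert', 'France');
INSERT INTO users VALUES (345, 'Anne'), (678, 'Bob');
""")

# Made-up sample check-ins: Anne tries Brie and Cheddar once, Bob tries Cheddar once...
checkins = [
    (1, 345, "2014-05-01 12:00:00"),
    (2, 345, "2014-05-02 12:00:00"),
    (2, 678, "2014-05-03 12:00:00"),
]
# ...and Anne checks in to Camembert on six different days.
checkins += [(3, 345, f"2014-05-{day:02d} 18:00:00") for day in range(4, 10)]
conn.executemany("INSERT INTO cheeses_users VALUES (?, ?, ?)", checkins)

# How many distinct French cheeses has Anne checked in to?
french_count = conn.execute("""
    SELECT COUNT(DISTINCT c.ID)
    FROM users u
    JOIN cheeses_users cu ON u.ID = cu.user_ID
    JOIN cheeses c ON cu.cheese_ID = c.ID
    WHERE u.Name = ? AND c.Country = ?
""", ("Anne", "France")).fetchone()[0]

# Which users have checked in to Camembert more than 5 times?
heavy_eaters = conn.execute("""
    SELECT u.Name, COUNT(*) AS num_checkins
    FROM users u
    JOIN cheeses_users cu ON u.ID = cu.user_ID
    JOIN cheeses c ON cu.cheese_ID = c.ID
    WHERE c.Name = ?
    GROUP BY u.ID, u.Name
    HAVING COUNT(*) > 5
""", ("Camembert",)).fetchall()

print(french_count)   # distinct French cheeses for Anne
print(heavy_eaters)   # (name, count) pairs above the threshold
```

The HAVING clause filters on the aggregated count, which is why the "more than 5 times" condition cannot go in WHERE.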

Let's define more clearly, so you can tell me if I'm wrong:
Cheese instances exist and aren't divisible ("Cheddar/UK/Traditional/Yellow" is a valid checkinable cheese, but "Cheddar" isn't, nor is "Yellow" or "Cheddar/France/...")
Users check into a single cheese instance at a given time
Users can re-check into the same cheese instance at a later date.
If this is the case, then to store fully normalized data, and to be able to retrieve that data's history, you need a third relational table linking the two existing tables.
+-----+------------+---------------------+
| uid | cheese_id | timestamp |
+-----+------------+---------------------+
| 345 | 1 | 2014-05-04 19:04:38 |
| 345 | 2 | 2014-05-08 19:04:38 |
| 678 | 1 | 2014-05-09 19:04:38 |
+-----+------------+---------------------+
etc. You can add extra columns to correspond to the cheese data, but strictly speaking you don't need to.
By putting all this in a third table, you potentially improve both performance and flexibility. You can always reconstruct the additions to the users table you mooted, using aggregate queries.
If you really decide you don't need the timestamps, then you'd replace them with basically the equivalent of a COUNT(*) field:
+-----+------------+--------------+
| uid | cheese_id | num_checkins |
+-----+------------+--------------+
| 345 | 1 | 15 |
| 345 | 2 | 3 |
| 678 | 1 | 8 |
+-----+------------+--------------+
That would dramatically reduce the size of your joining table, although obviously there's less of a "paper trail", should you need to reconstruct your data (and possibly say to a user "oh, yeah, we forgot to record your checkin on such-a-date.")
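If you go with the counter variant, each check-in becomes "increment the row, or create it on the first visit". Here is a rough sketch of that, again assuming Python's sqlite3 as a stand-in for MySQL. The table name checkin_counts and the helper record_checkin are hypothetical names made up for illustration. In MySQL itself the same effect is usually achieved in one statement with INSERT ... ON DUPLICATE KEY UPDATE; the two-step version below is used only because it is portable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE checkin_counts (
    uid          INTEGER NOT NULL,
    cheese_id    INTEGER NOT NULL,
    num_checkins INTEGER NOT NULL DEFAULT 0,
    PRIMARY KEY (uid, cheese_id)          -- one counter row per (user, cheese)
)""")


def record_checkin(conn, uid, cheese_id):
    """Increment the counter for (uid, cheese_id), creating the row if needed."""
    cur = conn.execute(
        "UPDATE checkin_counts SET num_checkins = num_checkins + 1 "
        "WHERE uid = ? AND cheese_id = ?",
        (uid, cheese_id),
    )
    if cur.rowcount == 0:
        # No existing row was updated, so this is the first check-in.
        conn.execute(
            "INSERT INTO checkin_counts (uid, cheese_id, num_checkins) "
            "VALUES (?, ?, 1)",
            (uid, cheese_id),
        )
    conn.commit()


# Three check-ins to cheese 1 and one to cheese 2 for user 345.
for cheese_id in (1, 1, 2, 1):
    record_checkin(conn, 345, cheese_id)

counts = conn.execute(
    "SELECT uid, cheese_id, num_checkins FROM checkin_counts "
    "ORDER BY uid, cheese_id"
).fetchall()
print(counts)
```

Note that once the individual rows are gone, per-day or "most recent" queries are no longer possible, which is the trade-off described above.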

The entities 'User' and 'Cheese' have a many-to-many relationship. A user can have multiple cheeses he checked into, and a cheese can have multiple people that checked into it.
The only right way to design this in a relational database is to store it in a separate table. There are many reasons why storing it in the user table, for instance, is a very bad idea. Read up on database normalization for more info on this.
Your table should look something like this:
CheckIns(CheeseId, UserId, (etc...))
Other useful columns might include date or rating, or whatever you want to store about a particular relationship between a user and a cheese.

Related

WHERE statement with dynamic input

I have two tables. The first one (item) is listing apartments. The second (feature) is a list of features that an apartment could have. Currently we list about 25 different features.
As every apartment can have a different set of features, I think it makes sense to have a 1:1 relationship between the item and feature tables.
If the value of a feature column in the feature table is '1', the linked apartment has that feature.
+-------------+------------+--------------+-------------+------------+
| table: item | | | | |
+-------------+------------+--------------+-------------+------------+
| id | created_by | titel | description | address |
+-------------+------------+--------------+-------------+------------+
| 10 | user.id | Nice Flat | text | address.id |
+-------------+------------+--------------+-------------+------------+
| 20 | user.id | Another Flat | text | address.id |
+-------------+------------+--------------+-------------+------------+
| 30 | user.id | Bungalow | text | address.id |
+-------------+------------+--------------+-------------+------------+
| 40 | user.id | Apartment | text | address.id |
+-------------+------------+--------------+-------------+------------+
+----------------+---------+--------------+----------------+--------------+------+
| table: feature | | | | | |
+----------------+---------+--------------+----------------+--------------+------+
| id | item_id | key_provided | security_alarm | water_supply | lift |
+----------------+---------+--------------+----------------+--------------+------+
| 1 | 10 | 1 | 0 | 0 | 1 |
+----------------+---------+--------------+----------------+--------------+------+
| 2 | 20 | 0 | 1 | 1 | 0 |
+----------------+---------+--------------+----------------+--------------+------+
| 3 | 30 | 1 | 1 | 0 | 1 |
+----------------+---------+--------------+----------------+--------------+------+
| 4 | 40 | 1 | 1 | 1 | 1 |
+----------------+---------+--------------+----------------+--------------+------+
I want to build a filter functionality so user can select to show only apartments with certain features.
e.g.:
$key_provided = 1;
$security_alarm = 1;
$water_supply = 0;
Does this database approach sound reasonable to you?
What's the best way to build a MySQL query to retrieve only apartments where the filter criteria match, keeping in mind that the number of features can grow in the future?
A better approach is to have a features table. In your case, they all seem to be binary -- yes or no -- so you can get away with:
create table item_features (
    item_feature_id int auto_increment primary key,
    item_id int not null,
    feature varchar(255),
    foreign key (item_id) references items(item_id)
);
The data would then have the positive features, so the first item would be:
insert into item_features (item_id, feature)
values (10, 'key_provided'), (10, 'lift');
This makes it easy to manage the features, particularly adding new ones. You might want to use a trigger, check constraint, or reference table to validate the feature names themselves, but I don't want to stray too far from your question.
Then checking for features is a little more complicated, but not that much more so. One method is explicitly using exists and not exists for each desired/undesired one:
select i.*
from items i
where exists (select 1
from item_features itf
where itf.item_id = i.item_id and
itf.feature = 'key_provided'
) and
exists (select 1
from item_features itf
where itf.item_id = i.item_id and
itf.feature = 'security_alarm'
) and
not exists (select 1
from item_features itf
where itf.item_id = i.item_id and
itf.feature = 'water_supply'
);
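Since the filter comes from user input, the query has to be assembled from a variable list of wanted and unwanted features. The sketch below shows one possible way to do that against the item_features layout, assuming Python's sqlite3 as a stand-in for MySQL. The function name find_items is hypothetical, and it uses a counting subquery rather than one exists clause per feature, which is a different (but equivalent) formulation from the query above. The rows mirror the sample data in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE items (item_id INTEGER PRIMARY KEY, titel TEXT);
CREATE TABLE item_features (
    item_feature_id INTEGER PRIMARY KEY,
    item_id INTEGER NOT NULL REFERENCES items (item_id),
    feature TEXT
);
INSERT INTO items VALUES (10, 'Nice Flat'), (20, 'Another Flat'),
                         (30, 'Bungalow'), (40, 'Apartment');
INSERT INTO item_features (item_id, feature) VALUES
    (10, 'key_provided'), (10, 'lift'),
    (20, 'security_alarm'), (20, 'water_supply'),
    (30, 'key_provided'), (30, 'security_alarm'), (30, 'lift'),
    (40, 'key_provided'), (40, 'security_alarm'), (40, 'water_supply'), (40, 'lift');
""")


def find_items(conn, required, excluded):
    """Return ids of items having every required feature and no excluded one."""
    clauses, params = [], []
    if required:
        marks = ", ".join("?" * len(required))
        # The item must match as many distinct features as were asked for.
        clauses.append(
            "(SELECT COUNT(DISTINCT f.feature) FROM item_features f "
            f"WHERE f.item_id = i.item_id AND f.feature IN ({marks})) = ?"
        )
        params += list(required) + [len(set(required))]
    if excluded:
        marks = ", ".join("?" * len(excluded))
        clauses.append(
            "NOT EXISTS (SELECT 1 FROM item_features f "
            f"WHERE f.item_id = i.item_id AND f.feature IN ({marks}))"
        )
        params += list(excluded)
    sql = "SELECT i.item_id FROM items i"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    sql += " ORDER BY i.item_id"
    return [row[0] for row in conn.execute(sql, params)]


matches = find_items(conn, ["key_provided", "security_alarm"], ["water_supply"])
print(matches)
```

Only the number of placeholders is built dynamically; the feature names themselves are always passed as bound parameters, so adding a new feature needs no change to the code.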
For your existing data structure, you can filter as follows:
select i.*
from item i
inner join feature f
on f.item_id = i.id
and f.key_provided = 1
and f.security_alarm = 1
and f.water_supply = 0
This will give you all the apartments that satisfy the given criteria. For more criteria, you can just add more conditions to the on part of the join.
As a general comment about your design:
since you are creating a 1-1 relationship between apartments and features, you might as well store them in a single table (spreading the information over two tables has no obvious advantage)
your design is OK as long as features do not change too often, since every time a new feature is created you need to add another column to your table. If features are added (or removed) frequently, this can become heavy to manage; in that case, you could consider a separate table where each (item, feature) tuple is stored in its own row, which makes this kind of change easier (with the downside that queries get more complicated to write)
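With the existing one-column-per-feature layout, the dynamic part of the query is the list of column conditions. Column names cannot be bound as parameters, so a sketch like the one below checks them against a whitelist before putting them into the SQL text. It assumes Python's sqlite3 as a stand-in for MySQL; filter_items and ALLOWED_FEATURES are hypothetical names, and the rows are the sample data from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE item (id INTEGER PRIMARY KEY, titel TEXT);
CREATE TABLE feature (
    id INTEGER PRIMARY KEY,
    item_id INTEGER NOT NULL REFERENCES item (id),
    key_provided INTEGER, security_alarm INTEGER,
    water_supply INTEGER, lift INTEGER
);
INSERT INTO item VALUES (10, 'Nice Flat'), (20, 'Another Flat'),
                        (30, 'Bungalow'), (40, 'Apartment');
INSERT INTO feature VALUES (1, 10, 1, 0, 0, 1), (2, 20, 0, 1, 1, 0),
                           (3, 30, 1, 1, 0, 1), (4, 40, 1, 1, 1, 1);
""")

# Only these column names may ever be interpolated into the SQL text.
ALLOWED_FEATURES = {"key_provided", "security_alarm", "water_supply", "lift"}


def filter_items(conn, filters):
    """filters maps a feature column name to the wanted value (0 or 1)."""
    conditions, params = [], []
    for column, wanted in filters.items():
        if column not in ALLOWED_FEATURES:
            raise ValueError(f"unknown feature: {column!r}")
        conditions.append(f"f.{column} = ?")   # name is whitelisted, value is bound
        params.append(int(wanted))
    sql = "SELECT i.id FROM item i INNER JOIN feature f ON f.item_id = i.id"
    if conditions:
        sql += " WHERE " + " AND ".join(conditions)
    sql += " ORDER BY i.id"
    return [row[0] for row in conn.execute(sql, params)]


selected = filter_items(
    conn, {"key_provided": 1, "security_alarm": 1, "water_supply": 0}
)
print(selected)
```

Features the user did not select are simply left out of the dictionary, so they do not restrict the result.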

MySQL link two tables together implicitly

Suppose we have two tables
A table called people, with people linked to bank accounts
| id | name | account_id |
--------------------------
| 1 | bob | 11 |
--------------------------
| 2 | sam | 22 |
A table called accounts with bank account balances
| id | value |
--------------
| 11 | 200 |
--------------
| 22 | 500 |
In order to link the two tables you can do
SELECT p.id, p.name, a.value AS account_balance
FROM people p
LEFT JOIN accounts a ON p.account_id = a.id
WHERE p.name = 'bob';
This would return
id => 1
name => bob
account_balance => 200
That's cool - but I am wondering if there is a more implicit way to do this via SQL linkage (foreign keys or otherwise). Can we in MySQL add links in some other way so that when we do a SELECT, it already knows to return value instead of account_id?
I'm asking this because I am creating a system where my users can create lookup tables and link them to other tables - but it must be do-able without any programming. The only other way I can think of is to set the name of account_id for example to accounts.value and treat that as a foreign key when doing a SELECT.
I would have to get the column structure and analyze and then determine that there is a foreign key and then return the appropriate foreign column by looking at the column name.
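The metadata-driven approach described in the last paragraph could look roughly like the sketch below. It assumes Python's sqlite3 as a stand-in for MySQL, so it reads foreign keys with SQLite's PRAGMA foreign_key_list; in MySQL the equivalent information is exposed through information_schema.KEY_COLUMN_USAGE. The DISPLAY_COLUMNS mapping and the select_with_lookups helper are hypothetical: something still has to say which column of a lookup table should be shown, and here that is a plain dictionary rather than a naming convention.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE accounts (id INTEGER PRIMARY KEY, value INTEGER);
CREATE TABLE people (
    id INTEGER PRIMARY KEY,
    name TEXT,
    account_id INTEGER REFERENCES accounts (id)
);
INSERT INTO accounts VALUES (11, 200), (22, 500);
INSERT INTO people VALUES (1, 'bob', 11), (2, 'sam', 22);
""")

# Hypothetical configuration: which column to display for each lookup table.
DISPLAY_COLUMNS = {"accounts": "value"}


def select_with_lookups(conn, table):
    """Select all rows of `table`, replacing FK columns by their display value.

    `table` must be a trusted name (e.g. taken from the schema), because it is
    interpolated into the SQL text.
    """
    # PRAGMA foreign_key_list rows: (id, seq, ref_table, from_col, to_col, ...)
    fks = {row[3]: (row[2], row[4])
           for row in conn.execute(f"PRAGMA foreign_key_list({table})")}
    # PRAGMA table_info rows: (cid, name, type, notnull, dflt_value, pk)
    columns = [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]

    select_parts, joins = [], []
    for col in columns:
        ref = fks.get(col)
        if ref and ref[0] in DISPLAY_COLUMNS:
            ref_table, ref_col = ref
            alias = f"fk_{col}"
            select_parts.append(f"{alias}.{DISPLAY_COLUMNS[ref_table]} AS {col}")
            joins.append(
                f"LEFT JOIN {ref_table} {alias} ON t.{col} = {alias}.{ref_col}"
            )
        else:
            select_parts.append(f"t.{col}")

    select_sql = ", ".join(select_parts)
    join_sql = " ".join(joins)
    sql = f"SELECT {select_sql} FROM {table} t {join_sql}"
    return sorted(conn.execute(sql).fetchall())


resolved = select_with_lookups(conn, "people")
print(resolved)
```

A simpler alternative, when the set of linked tables is known in advance, is to define a view containing the join and let users select from the view.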

Improve relationship between 3 tables in MySQL

I have 3 tables on my database: users, payment_methods and user_blocked_pm. The users table speaks for itself, the payment_methods stores all the payment methods the company uses, and the user_blocked_pm has the payment methods blocked for a specific user.
+------------------+
| users |
+-----+------------+
| id | user_name |
+-----+------------+
| 1 | John |
| 2 | Davis |
+-----+------------+
+-----------------------+
| payment_methods |
+-----+-----------------+
| id | payment_method |
+-----+-----------------+
| 1 | credit_card |
| 2 | cash |
+-----+-----------------+
+-----------------------------------+
| user_blocked_pm |
+-----+---------+-------------------+
| id | user_id | payment_method_id |
+-----+---------+-------------------+
| 1 | 1 | 1 |
| 2 | 1 | 2 |
| 3 | 2 | 2 |
+-----+---------+-------------------+
So, following the structure above, both payment_methods are blocked for the user John and cash is blocked for Davis.
Following this structure when there are multiple users and payment methods I'll have multiple records on user_blocked_pm because each user will be allowed to use only a few of the payment methods.
Is there a better way to work this relationship between the users and the user_blocked_pm so that the table doesn't get gigantic?
You do not need the id column in the user_blocked_pm table, because you are going to select on user_id or payment_method_id.
If the number of allowed payment methods is smaller than the number of blocked ones, why not make a user_allowed_pm table instead of user_blocked_pm?
If you have a fixed number of payment methods for each user, then you do not need a table at all: just create a column for every payment method and store its key there (like a foreign key).
If you have a few user "types", then perhaps you can replace the user_blocked_pm with a user_type_blocked_pm. A "type" is a set of blocked/permitted payment methods. So the user_type_blocked_pm table is small -- has entries for the different types (users who can pay with cash only, users who can pay with credit and cash, etc. ) Then, you can add a column to the users table to indicate the user type.
Your method is fine, and the other ideas so far suggested are also fine. If the number of payment types is small (not more than 7, say - and certainly less than 64!), and finite, then you might also consider a bitwise method, where 1 = credit_card, 2 = cash, and 3 = both. I do this for days of the week, which are unlikely to ever be more than 7.
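For what it's worth, the bitwise idea could be sketched as follows, assuming Python's sqlite3 as a stand-in for MySQL (both support the & and | operators on integers). The blocked_pm column and the two helper functions are hypothetical names; the values follow the 1 = credit_card, 2 = cash scheme from the answer above, with John blocked for both and Davis blocked for cash.

```python
import sqlite3

# One bit per payment method.
CREDIT_CARD = 1  # binary 01
CASH = 2         # binary 10

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (
    id INTEGER PRIMARY KEY,
    user_name TEXT,
    blocked_pm INTEGER NOT NULL DEFAULT 0   -- bitmask of blocked methods
);
INSERT INTO users VALUES (1, 'John', 3), (2, 'Davis', 2);
""")


def users_blocked_for(conn, method_bit):
    """Names of users for whom the given payment method is blocked."""
    rows = conn.execute(
        "SELECT user_name FROM users WHERE (blocked_pm & ?) <> 0 ORDER BY id",
        (method_bit,),
    )
    return [row[0] for row in rows]


def block_method(conn, user_id, method_bit):
    """Set the bit for a payment method without touching the other bits."""
    conn.execute(
        "UPDATE users SET blocked_pm = blocked_pm | ? WHERE id = ?",
        (method_bit, user_id),
    )
    conn.commit()


cash_blocked = users_blocked_for(conn, CASH)
card_blocked_before = users_blocked_for(conn, CREDIT_CARD)
block_method(conn, 2, CREDIT_CARD)          # now block credit cards for Davis too
card_blocked_after = users_blocked_for(conn, CREDIT_CARD)
print(cash_blocked, card_blocked_before, card_blocked_after)
```

The downside is that a condition like (blocked_pm & ?) <> 0 generally cannot use an ordinary index, so this fits best when you look users up by id and then inspect the mask.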

One to many table with 3M records

I have a table in MySQL that contains almost 3 million records.
The table saves friend information in a user system. So it has many users and even more friends (There is a (soft)max of 2000 per user). I had added some extra fields name, url, dob, image, registered which are varchar(255) and dates.
My basic data is 2 int's and 1 varchar(6).
When using PHPMyAdmin it all gets really slow. I have an index on the user ID and the varchar(6) and that's how I query all the friends of a user (which goes well). However, any other operation (or the ones to come) aren't going to be fast.
My options:
Remove the double data (Normalizing)
Change the datatype for the friend IDs and save it like a JSON blob
So, my questions:
When my table is only 2 ints and a tiny varchar, will it still be slow with 3M records?
Should I change my datatype?
Should I be using a different pattern for this friend list problem?
Edit: To clarify a bit more.
The Users are not my actual users, but they are user objects nonetheless. All the Friends are a User object, but I may or may not already have the User object. So I'm using the extra data in Friends to show data about it in the list on the Users page.
In an ideal world things wouldn't take so long; in the next-best world I would only have 2 fields in Friends, user_id and friend_id. But I cannot rely on linking friend_id to a User object, because I may not have it.
Users (has more fields, but for brevity)
+-------+---------+-------+------------+
| shard | user_id | name | dob |
+-------+---------+-------+------------+
| nl | 1 | Bob | 2014-03-26 |
| nl | 2 | Erik | 2014-03-26 |
| de | 1 | Johan | 2014-02-01 |
+-------+---------+-------+------------+
Friends (has more fields, see description above)
+-------+---------+-----------+--------+
| shard | user_id | friend_id | name |
+-------+---------+-----------+--------+
| nl | 1 | 2 | Erik |
| nl | 1 | 3 | Alice |
| de | 1 | 2 | Rasmus |
+-------+---------+-----------+--------+
nl-Bob is friends with nl-Erik (Is a user)
nl-Bob is friends with nl-Alice (Is not a user)
de-Johan is friends with de-Rasmus (Is not a user)
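To make the "is a user / is not a user" distinction concrete, here is a small sketch of how the two tables above can be combined with a LEFT JOIN, assuming Python's sqlite3 as a stand-in for MySQL. It keeps the fallback name in Friends, as described, and only reads from Users when a matching (shard, user_id) row exists. The composite primary keys are an assumption about the real schema; the rows are the sample data from the tables above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (
    shard TEXT NOT NULL, user_id INTEGER NOT NULL, name TEXT, dob TEXT,
    PRIMARY KEY (shard, user_id)
);
CREATE TABLE friends (
    shard TEXT NOT NULL, user_id INTEGER NOT NULL,
    friend_id INTEGER NOT NULL, name TEXT,
    PRIMARY KEY (shard, user_id, friend_id)   -- also serves "all friends of a user"
);
INSERT INTO users VALUES ('nl', 1, 'Bob', '2014-03-26'),
                         ('nl', 2, 'Erik', '2014-03-26'),
                         ('de', 1, 'Johan', '2014-02-01');
INSERT INTO friends VALUES ('nl', 1, 2, 'Erik'),
                           ('nl', 1, 3, 'Alice'),
                           ('de', 1, 2, 'Rasmus');
""")


def friends_of(conn, shard, user_id):
    """List a user's friends, flagging which ones exist as User rows."""
    return conn.execute("""
        SELECT f.shard,
               f.friend_id,
               COALESCE(u.name, f.name) AS display_name,  -- prefer the User row
               u.user_id IS NOT NULL    AS is_user
        FROM friends f
        LEFT JOIN users u
               ON u.shard = f.shard AND u.user_id = f.friend_id
        WHERE f.shard = ? AND f.user_id = ?
        ORDER BY f.friend_id
    """, (shard, user_id)).fetchall()


bob_friends = friends_of(conn, "nl", 1)
johan_friends = friends_of(conn, "de", 1)
print(bob_friends)
print(johan_friends)
```

This does not by itself answer the speed question, but it shows that the extra columns in Friends are only needed for friends that have no User row.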

Structuring a MySQL database for user information

I am quite new to MySQL, I know most of the basic functions and how to send queries etc. However, I am trying to learn about structuring it for optimal searches for user information and wanted to get some ideas.
Right now I just have one table (for functionality purposes and testing) called user_info, which holds the user's information, and another table that stores photos linked to the user. Ideally I'd like most of this information to be accessible as quickly as possible.
When creating a database which is primarily used to store and retrieve user information (name, age, phone, messages, etc.), would it be a good idea to create a NEW TABLE for each new user that stores all their information, so that the single user_info table does not become bogged down by multiple queries, locking, etc.? For example, user John Smith would have his very own table in the database holding all his information, including photos, messages, etc.
OR
is it better to have just a few tables, such as user_info, user_photos, user_messages, etc., and access the data in that manner?
I am not concerned about redundancy in the tables such as the users email address being repeated multiple times.
The latter is the best way. You declare one table for users, and several columns with the data you want.
Now if you want users to have photos, you'd require a new table with photos and a Foreign Key attribute that links to the user table's Primary Key.
You should definitely NOT create a new table for each user. Create one table for user_info, one for photos if each user can have many photos. A messages table would probably contain two user_id columns (user_to, user_from) and a message column. Try to normalize the data as much as possible.
Users
====
id
email
etc
Photos
====
id
user_id
meta_data
etc
Messages
====
id
user_id_to
user_id_from
message
timestamp
etc
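As a concrete version of the outline above, here is a minimal sketch of the three tables with foreign keys, assuming Python's sqlite3 as a stand-in for MySQL. The column types, the e-mail addresses and the sample rows are made up for illustration; only the table and column names come from the outline.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")   # SQLite enforces FKs only when enabled
conn.executescript("""
CREATE TABLE users (
    id INTEGER PRIMARY KEY,
    email TEXT NOT NULL UNIQUE
);
CREATE TABLE photos (
    id INTEGER PRIMARY KEY,
    user_id INTEGER NOT NULL REFERENCES users (id),
    meta_data TEXT
);
CREATE TABLE messages (
    id INTEGER PRIMARY KEY,
    user_id_to   INTEGER NOT NULL REFERENCES users (id),
    user_id_from INTEGER NOT NULL REFERENCES users (id),
    message TEXT NOT NULL,
    timestamp TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);
INSERT INTO users (id, email) VALUES (1, 'john@example.com'), (2, 'jane@example.com');
INSERT INTO photos (user_id, meta_data) VALUES (1, 'beach.jpg');
INSERT INTO messages (user_id_to, user_id_from, message) VALUES (2, 1, 'hello');
""")

# A photo pointing at a user that does not exist is rejected by the foreign key.
try:
    conn.execute("INSERT INTO photos (user_id, meta_data) VALUES (?, ?)", (99, "orphan"))
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

# Everything in user 2's inbox, with the sender's e-mail address.
inbox = conn.execute("""
    SELECT u.email, m.message
    FROM messages m
    JOIN users u ON u.id = m.user_id_from
    WHERE m.user_id_to = ?
""", (2,)).fetchall()

print(orphan_rejected, inbox)
```

Every user's photos and messages live in the same few tables and are picked out by user_id, which is what makes a table per user unnecessary.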
I agree with both the answers supplied here, but one thing they haven't mentioned yet is lookup tables.
Going with the general examples here, consider this: you have a users table and a photos table. Now you want to introduce a feature on your site that allows users to "Favorite" photos from other users.
Rather than making a new table called "Favorites" and adding in all your data about the image (file location, metadata, score/whatever) all over again, have a table that effectively sits BETWEEN the other two.
+-----------------------+ +-------------------------------------+
| ++ users | | ++ photos |
| userID | email | name | | photoID | ownerID | fileLo | etc... |
+--------+-------+------+ +---------+---------+--------+--------+
| 1 | .... | Tom | | 35 | 1 | ..... | .......|
| 2 | .... | Rob | | 36 | 2 | ..... | .......|
| 3 | .... | Dan | | 37 | 1 | ..... | .......|
+--------+-------+------+ | 43 | 3 | ..... | .......|
| 48 | 2 | ..... | .......|
| 49 | 3 | ..... | .......|
| 53 | 2 | ..... | .......|
+---------+---------+--------+--------+
+------------------+
| ++ Favs |
| userID | photoID |
+--------+---------+
| 1 | 37 |
| 1 | 48 |
| 2 | 37 |
+--------+---------+
With this approach, you link the data you have cleanly, efficiently and without too much data replication.
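To show how the Favs table is used in practice, here is a short sketch with the rows from the diagram above, assuming Python's sqlite3 as a stand-in for MySQL. The e-mail and file location columns are left out because the diagram elides their contents, and the composite primary key on favs is an assumption (it prevents the same user from favouriting a photo twice).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users  (userID INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE photos (photoID INTEGER PRIMARY KEY,
                     ownerID INTEGER NOT NULL REFERENCES users (userID));
CREATE TABLE favs (
    userID  INTEGER NOT NULL REFERENCES users (userID),
    photoID INTEGER NOT NULL REFERENCES photos (photoID),
    PRIMARY KEY (userID, photoID)
);
INSERT INTO users VALUES (1, 'Tom'), (2, 'Rob'), (3, 'Dan');
INSERT INTO photos VALUES (35, 1), (36, 2), (37, 1), (43, 3), (48, 2), (49, 3), (53, 2);
INSERT INTO favs VALUES (1, 37), (1, 48), (2, 37);
""")

# Photos favourited by Tom (userID 1), together with each photo's owner.
toms_favs = conn.execute("""
    SELECT p.photoID, owner.name
    FROM favs f
    JOIN photos p    ON p.photoID = f.photoID
    JOIN users owner ON owner.userID = p.ownerID
    WHERE f.userID = ?
    ORDER BY p.photoID
""", (1,)).fetchall()

# How many times each photo has been favourited.
fav_counts = conn.execute("""
    SELECT photoID, COUNT(*) AS num_favs
    FROM favs
    GROUP BY photoID
    ORDER BY photoID
""").fetchall()

print(toms_favs)
print(fav_counts)
```

The photo details are stored once in photos; favs only holds the pair of ids, so favouriting a photo never copies any of its data.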