One to many table with 3M records - mysql

I have a table in MySQL that contains almost 3 million records.
The table saves friend information in a user system, so it has many users and even more friends (there is a soft maximum of 2000 per user). I added some extra fields (name, url, dob, image, registered), which are varchar(255) and dates.
My basic data is 2 ints and 1 varchar(6).
When using phpMyAdmin it all gets really slow. I have an index on the user ID and the varchar(6), and that's how I query all the friends of a user (which goes well). However, any other operation (including the ones to come) isn't going to be fast.
My options:
Remove the double data (Normalizing)
Change the datatype for the friend IDs and save it like a JSON blob
So, my questions:
When my table is only 2 ints and a tiny varchar, will it still be slow with 3M records?
Should I change my datatype?
Should I be using a different pattern for this friend-list problem?
Edit: To clarify a bit more.
The Users are not my actual users, but they are user objects nonetheless. All the Friends are a User object, but I may or may not already have the User object. So I'm using the extra data in Friends to show data about it in the list on the Users page.
In the ideal world things wouldn't take so long; in the next-best world I would only have 2 fields in Friends, user_id and friend_id. But I cannot rely on linking friend_id to a User object, because I may not have it.
Users (has more fields, but for brevity)
+-------+---------+-------+------------+
| shard | user_id | name | dob |
+-------+---------+-------+------------+
| nl | 1 | Bob | 2014-03-26 |
| nl | 2 | Erik | 2014-03-26 |
| de | 1 | Johan | 2014-02-01 |
+-------+---------+-------+------------+
Friends (has more fields, see description above)
+-------+---------+-----------+--------+
| shard | user_id | friend_id | name |
+-------+---------+-----------+--------+
| nl | 1 | 2 | Erik |
| nl | 1 | 3 | Alice |
| de | 1 | 2 | Rasmus |
+-------+---------+-----------+--------+
nl-Bob is friends with nl-Erik (Is a user)
nl-Bob is friends with nl-Alice (Is not a user)
de-Johan is friends with de-Rasmus (Is not a user)
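For reference, the narrow "2 ints and a varchar(6)" layout mentioned in the question can be sketched as below. This is only an illustration: it uses Python's built-in sqlite3 module instead of MySQL so that it runs anywhere, and the table name friends, the composite primary key and the helper friends_of are assumptions based on the sample data, not the asker's actual schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE friends (
        shard     VARCHAR(6) NOT NULL,
        user_id   INTEGER    NOT NULL,
        friend_id INTEGER    NOT NULL,
        -- Assumed composite key; it doubles as the index used to
        -- look up all friends of one user
        PRIMARY KEY (shard, user_id, friend_id)
    )
""")
conn.executemany(
    "INSERT INTO friends (shard, user_id, friend_id) VALUES (?, ?, ?)",
    [("nl", 1, 2), ("nl", 1, 3), ("de", 1, 2)],  # rows from the sample table
)

def friends_of(shard, user_id):
    """Return the friend IDs of one user (hypothetical helper)."""
    rows = conn.execute(
        "SELECT friend_id FROM friends "
        "WHERE shard = ? AND user_id = ? ORDER BY friend_id",
        (shard, user_id),
    )
    return [row[0] for row in rows]

nl_bob_friends = friends_of("nl", 1)
```

The extra display fields for friends that are not yet users would have to live somewhere else in this layout; the sketch only covers the lookup that is already indexed.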

Related

MS Access help needed forming a specific report

I have a table with a column for agent names and a column for each of the skills those agents could possibly have. Each skill the agent is assigned shows a 1 in the field under that skill.
Columns look like this:
+---------+----------+----------+----------+
| Name | 'Skill1' | 'Skill2' | 'Skill3' |
+---------+----------+----------+----------+
| John | 1 | | 1 |
| Sam | 1 | 1 | |
| Roberta | 1 | | 1 |
+---------+----------+----------+----------+
I would like to make a query that returns a list of all agent names that have a 1 for each particular skill. The query would return something like this:
+-----------+
| Skill 1 |
+-----------+
| John |
| Sam |
| Roberta |
+-----------+
Additionally I would like to be able to query a single name and retrieve all skills that agent has (all rows the Name column has a 1 in) like this:
+-----------+
| John |
+-----------+
| Skill 1 |
| Skill 3 |
+-----------+
I've done this in Excel using an index but I'm new to Access and not sure how to complete this task.
Thanks in advance.
One of the reasons you are finding this task difficult is that your database is not normalised: because of the way it is structured, you are working against MS Access, not with it.
Consequently, whilst a solution is still possible with the current data, the resulting queries will be painful to construct and will either be full of messy iif statements, or consist of several union queries performing the same operation over and over again, one for each 'skill'.
Then, if you ever wish to add another skill to the database, all of your queries have to be rewritten!
Whereas, if your database was normalised (as Gustav has suggested in the comments), the task would be a simple one-liner; and what's more, if you add a new skill later on, your queries will automatically output the results as if the skill had always been there.
Your data has a many-to-many relationship: an agent may have many skills, and a skill may be known by many agents.
As such, the most appropriate way to represent this relationship is using a junction table.
Hence, you would have a table of Agents such as:
tblAgents
+-----+-----------+----------+------------+
| ID | FirstName | LastName | DOB |
+-----+-----------+----------+------------+
| 1 | John | Smith | 1970-01-01 |
| ... | ... | ... | ... |
+-----+-----------+----------+------------+
This would only contain information unique to each agent, i.e. minimising the repeated information between records in the table.
You would then have a table of possible Skills, such as:
tblSkills
+-----+---------+---------------------+
| ID | Name | Description |
+-----+---------+---------------------+
| 1 | Skill 1 | Skill 1 Description |
| 2 | Skill 2 | Skill 2 Description |
| ... | ... | ... |
+-----+---------+---------------------+
Finally, you would have a junction table linking Agents to Skills, e.g.:
tblAgentSkills
+----+----------+----------+
| ID | Agent_ID | Skill_ID |
+----+----------+----------+
| 1 | 1 | 1 |
| 2 | 1 | 2 |
| 3 | 2 | 1 |
| 4 | 3 | 2 |
+----+----------+----------+
Now, say you want to find out which agents have Skill 1, the query is simple:
select Agent_ID from tblAgentSkills where Skill_ID = 1
What if you want to find out the skills known by an agent? Equally as simple:
select Skill_ID from tblAgentSkills where Agent_ID = 1
Of course, these queries will merely return the ID fields as present in the junction table - but since the ID uniquely identifies a record in the tblAgents or tblSkills tables, such ID is all you need to retrieve any other required information:
select
tblAgents.FirstName,
tblAgents.LastName
from
tblAgentSkills inner join tblAgents on
tblAgentSkills.Agent_ID = tblAgents.ID
where
tblAgentSkills.Skill_ID = 1
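A minimal, runnable sketch of the three-table design above, using Python's built-in sqlite3 module rather than Access so it can be tried anywhere. The junction rows are the ones from the tblAgentSkills example; the agents other than John Smith (and their surnames) are hypothetical filler.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tblAgents (ID INTEGER PRIMARY KEY, FirstName TEXT, LastName TEXT);
    CREATE TABLE tblSkills (ID INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE tblAgentSkills (
        ID       INTEGER PRIMARY KEY,
        Agent_ID INTEGER NOT NULL REFERENCES tblAgents (ID),
        Skill_ID INTEGER NOT NULL REFERENCES tblSkills (ID)
    );
    -- Only John Smith appears in the answer; the other agents are made up
    INSERT INTO tblAgents VALUES (1, 'John', 'Smith'), (2, 'Sam', 'Jones'), (3, 'Roberta', 'Lee');
    INSERT INTO tblSkills VALUES (1, 'Skill 1'), (2, 'Skill 2');
    INSERT INTO tblAgentSkills VALUES (1, 1, 1), (2, 1, 2), (3, 2, 1), (4, 3, 2);
""")

# Which agents have Skill 1? Join the junction table to tblAgents.
agents_with_skill_1 = [row[0] for row in conn.execute(
    """
    SELECT a.FirstName
    FROM tblAgentSkills AS s
    INNER JOIN tblAgents AS a ON s.Agent_ID = a.ID
    WHERE s.Skill_ID = ?
    ORDER BY a.ID
    """,
    (1,),
)]

# Which skills does agent 1 have? Join the junction table to tblSkills.
skills_of_agent_1 = [row[0] for row in conn.execute(
    """
    SELECT k.Name
    FROM tblAgentSkills AS s
    INNER JOIN tblSkills AS k ON s.Skill_ID = k.ID
    WHERE s.Agent_ID = ?
    ORDER BY k.ID
    """,
    (1,),
)]
```

Adding a new skill is then just a new row in tblSkills; neither query has to change.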
To get all agents with Skill1, open the query designer and create a query on the Skills table that outputs AgentName, with 1 as the criterion on the Skill1 column.
This will generate the following SQL:
SELECT Skills.AgentName
FROM Skills
WHERE (((Skills.Skill1)=1));
If you adjust the names you can also paste this query into the SQL pane of the designer to get the query you want.
To get all the skills an agent has I chose a parameterized query. Open the query designer and create a new query that outputs AgentName and the skill columns, with [Agent] as the criterion on AgentName.
When you run this query it will ask you for the name of the agent. Make sure to type the agent name exactly. Here is the resulting SQL:
SELECT Skills.AgentName, Skills.Skill1, Skills.Skill2, Skills.Skill3
FROM Skills
WHERE (((Skills.AgentName)=[Agent]));
If you continue working with this query I would improve the table design by breaking your table into a skills table, an agents table, and a skills&agents table. Then link the skills and agents tables to the skills&agents table in a many-to-many relationship. The query to get all of an agent's skills would then be a join across those three tables in the designer.

How To Design A Database for a "Check In" Social Service

I want to build a "check in" service like FourSquare or Untappd.
How do I design a suitable database schema for storing check-ins?
For example, suppose I'm developing "CheeseSquare" to help people keep track of the delicious cheeses they've tried.
The table for the items into which one can check in is fairly simple and would look like
+----+---------+---------+-------------+--------+
| ID | Name | Country | Style | Colour |
+----+---------+---------+-------------+--------+
| 1 | Brie | France | Soft | White |
| 2 | Cheddar | UK | Traditional | Yellow |
+----+---------+---------+-------------+--------+
I would also have a table for the users, say
+-----+------+---------------+----------------+
| ID | Name | Twitter Token | Facebook Token |
+-----+------+---------------+----------------+
| 345 | Anne | qwerty | poiuyt |
| 678 | Bob | asdfg | mnbvc |
+-----+------+---------------+----------------+
What's the best way of recording that a user has checked in to a particular cheese?
For example, I want to record how many French cheeses Anne has checked in to, which cheeses Bob has checked in to, whether Cersei has eaten Camembert more than 5 times, etc.
Am I best putting this information in the user's table? E.g.
+-----+------+------+--------+------+------+---------+---------+
| ID | Name | Blue | Yellow | Soft | Brie | Cheddar | Stilton |
+-----+------+------+--------+------+------+---------+---------+
| 345 | Anne | 1 | 0 | 2 | 1 | 0 | 5 |
| 678 | Bob | 3 | 1 | 1 | 1 | 1 | 2 |
+-----+------+------+--------+------+------+---------+---------+
That looks rather ungainly and hard to maintain. So should I have a separate table for recording check-ins?
No, don't put it into the users table. That information is better stored in a join table which represents a many-to-many relationship between users and cheeses.
The join table (we'll call it cheeses_users) must have at least two columns (user_ID, cheese_ID), but a third (a timestamp) would be useful too. If you default the timestamp column to CURRENT_TIMESTAMP, you need only insert the user_ID and cheese_ID into the table to log a checkin.
cheeses (ID) ⇒ (cheese_ID) cheeses_users (user_ID) ⇐ users (ID)
Created as:
CREATE TABLE cheeses_users (
  cheese_ID INT NOT NULL,
  user_ID INT NOT NULL,
  -- timestamp defaults to current time
  checkin_time DATETIME DEFAULT CURRENT_TIMESTAMP,
  -- (add any other column *specific to* this checkin (user+cheese+time))
  -- The primary key is the combination of all 3
  -- It becomes impossible for the same user to log the same cheese
  -- at the same second in time...
  PRIMARY KEY (cheese_ID, user_ID, checkin_time),
  -- FOREIGN KEYs to your other tables
  FOREIGN KEY (cheese_ID) REFERENCES cheeses (ID),
  FOREIGN KEY (user_ID) REFERENCES users (ID)
) ENGINE=InnoDB; -- InnoDB is necessary for the FK's to be honored and useful
To log a checkin for Bob & Cheddar, insert with:
INSERT INTO cheeses_users (cheese_ID, user_ID) VALUES (2, 678);
To query them, you join through this table. For example, to see the number of each cheese type for each user, you might use:
SELECT
u.Name AS username,
c.Name AS cheesename,
COUNT(*) AS num_checkins
FROM
users u
JOIN cheeses_users cu ON u.ID = cu.user_ID
JOIN cheeses c ON cu.cheese_ID = c.ID
GROUP BY
u.Name,
c.Name
To get the 5 most recent checkins for a given user, something like:
SELECT
c.Name AS cheesename,
cu.checkin_time
FROM
cheeses_users cu
JOIN cheeses c ON cu.cheese_ID = c.ID
WHERE
-- Limit to Anne's checkins...
cu.user_ID = 345
ORDER BY checkin_time DESC
LIMIT 5
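To make the join-table approach concrete, here is a small sketch that answers one of the questions from the original post (how many French cheeses Anne has checked in to). It uses Python's built-in sqlite3 module instead of MySQL so it runs as-is; the check-in rows and their timestamps are made-up sample data, and the ENGINE clause is dropped because SQLite has no storage engines.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE cheeses (ID INTEGER PRIMARY KEY, Name TEXT, Country TEXT);
    CREATE TABLE users   (ID INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE cheeses_users (
        cheese_ID    INTEGER NOT NULL REFERENCES cheeses (ID),
        user_ID      INTEGER NOT NULL REFERENCES users (ID),
        checkin_time TEXT    NOT NULL DEFAULT CURRENT_TIMESTAMP,
        PRIMARY KEY (cheese_ID, user_ID, checkin_time)
    );
    INSERT INTO cheeses VALUES (1, 'Brie', 'France'), (2, 'Cheddar', 'UK');
    INSERT INTO users   VALUES (345, 'Anne'), (678, 'Bob');
""")

# Made-up check-ins with explicit timestamps so the result is deterministic
conn.executemany(
    "INSERT INTO cheeses_users (cheese_ID, user_ID, checkin_time) VALUES (?, ?, ?)",
    [
        (1, 345, "2014-05-04 19:04:38"),  # Anne, Brie
        (1, 345, "2014-05-06 12:00:00"),  # Anne, Brie again
        (2, 345, "2014-05-08 19:04:38"),  # Anne, Cheddar
    ],
)
# Bob & Cheddar, relying on the timestamp default as in the answer
conn.execute("INSERT INTO cheeses_users (cheese_ID, user_ID) VALUES (2, 678)")

# French check-ins for Anne: total check-ins and distinct cheeses
french_checkins, french_cheeses = conn.execute(
    """
    SELECT COUNT(*), COUNT(DISTINCT c.ID)
    FROM cheeses_users cu
    JOIN cheeses c ON c.ID = cu.cheese_ID
    JOIN users u   ON u.ID = cu.user_ID
    WHERE u.Name = ? AND c.Country = ?
    """,
    ("Anne", "France"),
).fetchone()

bob_default_time = conn.execute(
    "SELECT checkin_time FROM cheeses_users WHERE user_ID = 678"
).fetchone()[0]
```

Here Anne has two French check-ins covering one distinct French cheese; which of the two numbers "how many French cheeses" should mean is a product decision, and the table supports both.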
Let's define more clearly, so you can tell me if I'm wrong:
Cheese instances exist and aren't divisible ("Cheddar/UK/Traditional/Yellow" is a valid checkinable cheese, but "Cheddar" isn't, nor is "Yellow" or "Cheddar/France/...")
Users check into a single cheese instance at a given time
Users can re-check into the same cheese instance at a later date.
If this is the case, then to store fully normalized data, and to be able to retrieve that data's history, you need a third relational table linking the two existing tables.
+-----+-----------+---------------------+
| uid | cheese_id | timestamp           |
+-----+-----------+---------------------+
| 345 | 1         | 2014-05-04 19:04:38 |
| 345 | 2         | 2014-05-08 19:04:38 |
| 678 | 1         | 2014-05-09 19:04:38 |
+-----+-----------+---------------------+
etc. You can add extra columns to correspond to the cheese data, but strictly speaking you don't need to.
By putting all this in a third table, you potentially improve both performance and flexibility. You can always reconstruct the additions to the users table you mooted, using aggregate queries.
If you really decide you don't need the timestamps, then you'd replace them with basically the equivalent of a COUNT(*) field:
+-----+-----------+--------------+
| uid | cheese_id | num_checkins |
+-----+-----------+--------------+
| 345 | 1         | 15           |
| 345 | 2         | 3            |
| 678 | 1         | 8            |
+-----+-----------+--------------+
That would dramatically reduce the size of your joining table, although obviously there's less of a "paper trail", should you need to reconstruct your data (and possibly say to a user "oh, yeah, we forgot to record your checkin on such-a-date.")
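If you go with the counter variant, each check-in becomes "increment the row, or create it if it does not exist yet". A rough sketch of that, again using Python's sqlite3 for portability; the table name checkin_counts and the helper record_checkin are hypothetical. In MySQL the same effect is usually achieved in one statement with INSERT ... ON DUPLICATE KEY UPDATE.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE checkin_counts (
        uid          INTEGER NOT NULL,
        cheese_id    INTEGER NOT NULL,
        num_checkins INTEGER NOT NULL DEFAULT 0,
        PRIMARY KEY (uid, cheese_id)
    )
""")

def record_checkin(uid, cheese_id):
    """Increment the counter for (uid, cheese_id), creating the row if needed."""
    with conn:  # one transaction: commits on success, rolls back on error
        cur = conn.execute(
            "UPDATE checkin_counts SET num_checkins = num_checkins + 1 "
            "WHERE uid = ? AND cheese_id = ?",
            (uid, cheese_id),
        )
        if cur.rowcount == 0:  # no existing row was updated, so insert one
            conn.execute(
                "INSERT INTO checkin_counts (uid, cheese_id, num_checkins) "
                "VALUES (?, ?, 1)",
                (uid, cheese_id),
            )

for _ in range(3):
    record_checkin(345, 2)
record_checkin(678, 1)

counts = conn.execute(
    "SELECT uid, cheese_id, num_checkins FROM checkin_counts "
    "ORDER BY uid, cheese_id"
).fetchall()
```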
The entities 'User' and 'Cheese' have a many-to-many relationship. A user can check into multiple cheeses, and a cheese can have multiple people who checked into it.
The only right way to design this in a relational database is to store it in a separate table. There are many reasons why storing it in the user table, for instance, is a very bad idea. Read up on normalizing databases for more info on this.
Your table should look something like this:
CheckIns(CheeseId, UserId, (etc...))
Other useful columns might include date or rating, or whatever you want to store about a particular relationship between a user and a cheese.

Database table that has many-to-many and one-to-many relationship

In the interest of learning more about database design I was drawing up a database model. I chose to draw a simple model for a social network website, to keep it a little more interesting than your average student/teacher/class models.
The question I have is about the relationships between the different tables.
I'm not that good at drawing the text diagrams other people post on Stack Exchange, so I'll just list the tables and explain the relationships; if it's unclear I can try to draw one.
Database tables:
User
Friend
Group
Newsfeed
User has a one-to-many relationship to Friend and Group, on the basis that one user can have many friends, a user can be a member of several groups, and a group can have many users.
Friend has a many-to-many relationship with Group, on the basis that one friend can be part of many groups and a group can contain many friends. There is a one-to-many relationship to Newsfeed, on the basis that one friend can have many newsfeeds.
Group has a many-to-many relationship with Friend, on the basis that one group can contain many friends and one friend can be part of many groups. Group has a one-to-many relationship with Newsfeed, on the basis that one group can have many newsfeeds.
So now there is one many-to-many relationship and one one-to-many relationship in one table, pointing to two other tables. Is this correct? Some part of this feels wrong, especially the Friend part, but maybe I'm just misunderstanding something here. This might be a stupid database model, but I need to ask stupid questions sometimes in order to get smarter at something. I've read about database relationships and watched some videos about them, and to me they seem easy, but when drawing this database model I'm getting confused, since I suddenly end up with a many-to-many and a one-to-many relationship in one table, which seems weird.
This is how I'd start:
Let's say we have two groups, Group A and Group B.
groups
id unsigned int(P)
name varchar(30)
...
+----+---------+-----+
| id | name | ... |
+----+---------+-----+
| 1 | Group A | ... |
| 2 | Group B | ... |
| .. | ....... | ... |
+----+---------+-----+
Let's say Group A has two newsfeeds and Group B doesn't have any:
newsfeeds
id unsigned int(P)
group_id unsigned int(F groups.id)
name varchar(30)
...
+----+----------+--------------------+-----+
| id | group_id | name | ... |
+----+----------+--------------------+-----+
| 1 | 1 | Interesting Things | ... |
| 2 | 1 | Other Information | ... |
| .. | ........ | .................. | ... |
+----+----------+--------------------+-----+
Let's say we have three users: Bob, Mary and John:
users
id unsigned int(P)
name varchar(30)
...
+----+------+-----+
| id | name | ... |
+----+------+-----+
| 1 | Bob | ... |
| 2 | Mary | ... |
| 3 | John | ... |
| .. | .... | ... |
+----+------+-----+
A "Friend" is really just another user so let's create a table that allows many-to-many relationships between two users. My sample data shows that Bob is friends with Mary and John while Mary is only friends with John. (user_id and friend_id form the Primary Key)
users_friends
user_id unsigned int \_ (P) (F users.id)
friend_id unsigned int / (F users.id)
+---------+-----------+
| user_id | friend_id |
+---------+-----------+
| 1 | 2 |
| 1 | 3 |
| 2 | 3 |
| ....... | ......... |
+---------+-----------+
Users can belong to many groups and each group can have many users so we need to have a table that gives us that many-to-many relationship. In my example data we see that Bob is a member of Group A and Group B while Mary and John are only members of Group B. (user_id and group_id form the Primary Key)
users_groups
user_id unsigned int \_ (P)(F users.id)
group_id unsigned int / (F groups.id)
+---------+----------+
| user_id | group_id |
+---------+----------+
| 1 | 1 |
| 1 | 2 |
| 2 | 2 |
| 3 | 2 |
| ....... | ........ |
+---------+----------+
Finally we need a table that shows the relationship between newsfeeds and users. I haven't entered any example data here but this table works exactly like the users_groups table. Tables like this are called many different things, you can read more about them at Wikipedia. (user_id and newsfeed_id form the Primary key)
users_newsfeeds
user_id unsigned int \_ (P) (F users.id)
newsfeed_id unsigned int / (F newsfeeds.id)
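As a rough illustration of how such a linking table is queried, the sketch below builds users, groups and users_groups with the sample rows above and reads the relationship in both directions. It uses Python's built-in sqlite3 module so it is runnable as-is; the identifier groups is quoted because GROUPS is a keyword in recent SQL dialects.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users    (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE "groups" (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE users_groups (
        user_id  INTEGER NOT NULL REFERENCES users (id),
        group_id INTEGER NOT NULL REFERENCES "groups" (id),
        PRIMARY KEY (user_id, group_id)
    );
    INSERT INTO users    VALUES (1, 'Bob'), (2, 'Mary'), (3, 'John');
    INSERT INTO "groups" VALUES (1, 'Group A'), (2, 'Group B');
    INSERT INTO users_groups VALUES (1, 1), (1, 2), (2, 2), (3, 2);
""")

# Direction 1: all groups a given user belongs to
bob_groups = [row[0] for row in conn.execute(
    """
    SELECT g.name
    FROM users_groups ug
    JOIN "groups" g ON g.id = ug.group_id
    WHERE ug.user_id = ?
    ORDER BY g.id
    """,
    (1,),
)]

# Direction 2: all users that belong to a given group
group_b_members = [row[0] for row in conn.execute(
    """
    SELECT u.name
    FROM users_groups ug
    JOIN users u ON u.id = ug.user_id
    WHERE ug.group_id = ?
    ORDER BY u.id
    """,
    (2,),
)]
```

The users_friends and users_newsfeeds tables can be queried the same way; only the table and column names change.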
IMO, when thinking about relational modeling, one should keep the 'direction' of each relation in mind, otherwise it gets very confusing. One should also remember that every 'many to many' relation has to be modeled using 'one to many' relations. Anyway, take a look here: http://screencast.com/t/sJbPrvO53MS
Even though it took a minute to read the question, this was an interesting problem.

How to store multiple values in a single column using less memory?

I have a table of users where 1 column stores user's "roles".
We can assign multiple roles to particular user.
Then I want to store role IDs in the "roles" column.
But how can I store multiple values into a single column to save memory in a way that is easy to use? For example, storing using a comma-delimited field is not easy and uses memory.
Any ideas?
If a user can have multiple roles, it is probably better to have a user_role table that stores this information. It is normalised, and will be much easier to query.
A table like:
user_id | role
--------+-----------------
1 | Admin
2 | User
2 | Admin
3 | User
3 | Author
This will allow you to query for all users with a particular role, such as SELECT user_role.user_id, user.name FROM user_role JOIN user ON user.id = user_role.user_id WHERE user_role.role = 'Admin' (assuming the user table's key column is id), rather than having to use string parsing to get details out of a column.
Amongst other things this will be faster, as you can index the columns properly, and it will take only marginally more space than any solution that puts multiple values into a single column - which is antithetical to what relational databases are designed for.
The reason the roles shouldn't be stored in a single column is that it is inefficient, for the reason DCoder states in the comment on this answer. To check if a user has a role, every row of the user table will need to be scanned, and then the "roles" column will have to be scanned using string matching - regardless of how this action is exposed, the RDBMS will need to perform string operations to parse the content. These are very expensive operations, and not at all good database design.
If you need to have a single column, I would strongly suggest that you no longer have a technical problem, but a people management one. Adding additional tables to an existing database that is under development should not be difficult. If this isn't something you are authorised to do, explain to the right person why the extra table is needed - because munging multiple values into a single column is a bad, bad idea.
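A minimal sketch of the user_role table with the sample rows above, using Python's built-in sqlite3 module so it can be run directly. The composite primary key and the index name idx_user_role_role are assumptions; the point is only that "who has role X" becomes an indexed equality lookup with no string parsing.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE user_role (
        user_id INTEGER NOT NULL,
        role    TEXT    NOT NULL,
        PRIMARY KEY (user_id, role)   -- a user holds each role at most once
    );
    -- Assumed index so lookups by role do not scan the whole table
    CREATE INDEX idx_user_role_role ON user_role (role);
    INSERT INTO user_role VALUES
        (1, 'Admin'), (2, 'User'), (2, 'Admin'), (3, 'User'), (3, 'Author');
""")

def users_with_role(role):
    """Return the IDs of all users holding the given role."""
    rows = conn.execute(
        "SELECT user_id FROM user_role WHERE role = ? ORDER BY user_id",
        (role,),
    )
    return [row[0] for row in rows]

def roles_of(user_id):
    """Return all roles of one user, alphabetically."""
    rows = conn.execute(
        "SELECT role FROM user_role WHERE user_id = ? ORDER BY role",
        (user_id,),
    )
    return [row[0] for row in rows]

admins = users_with_role("Admin")
roles_of_user_3 = roles_of(3)
```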
You can also use bitwise logic with MySQL. Each role_id must be a power of 2 (1, 2, 4, 8, 16, 32...), so that every role occupies its own bit.
role_id | label
--------+-----------------
1 | Admin
2 | User
4 | Author
user_id | name  | role
--------+-------+------
1       | John  | 1
2       | Steve | 3
3       | Jack  | 6
Bitwise logic allows you to select users by role:
SELECT * FROM users WHERE role & 1
-- returns all Admin users
SELECT * FROM users WHERE role & 5
-- returns all users who are admin or Author because 5 = 1 + 4
SELECT * FROM users WHERE role & 6
-- returns all users who are User or Author because 6 = 2 + 4
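The queries above match users who have any of the bits in the mask. A small sketch of both "any of these roles" and "all of these roles", run through Python's built-in sqlite3 module (the & operator behaves the same way there as in MySQL for these values). The constant names and the helpers with_any and with_all are made up for the illustration.

```python
import sqlite3

# One bit per role, matching the role_id table above
ADMIN, USER, AUTHOR = 1, 2, 4

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (user_id INTEGER PRIMARY KEY, name TEXT, role INTEGER NOT NULL)"
)
conn.executemany(
    "INSERT INTO users VALUES (?, ?, ?)",
    [(1, "John", ADMIN), (2, "Steve", ADMIN | USER), (3, "Jack", USER | AUTHOR)],
)

def with_any(mask):
    """Users holding at least one of the roles in mask."""
    rows = conn.execute(
        "SELECT name FROM users WHERE (role & ?) != 0 ORDER BY user_id", (mask,)
    )
    return [row[0] for row in rows]

def with_all(mask):
    """Users holding every role in mask."""
    rows = conn.execute(
        "SELECT name FROM users WHERE (role & ?) = ? ORDER BY user_id", (mask, mask)
    )
    return [row[0] for row in rows]

admins = with_any(ADMIN)                    # role & 1
admin_or_author = with_any(ADMIN | AUTHOR)  # role & 5
user_or_author = with_any(USER | AUTHOR)    # role & 6
user_and_author = with_all(USER | AUTHOR)   # (role & 6) = 6
```

Note that a condition on role & mask generally cannot use an ordinary index on the role column, so this trades query speed on large tables for compact storage.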
Here is what I understood from your question.
Suppose you have two tables: a "meal" table and a "combo_meal" table. I think you want to store multiple meal_id values against one combo_meal_id without separating them with commas [,], and you said that this will make your DB more standard.
If I have not misunderstood your question, please read my suggestion below carefully. It may help you.
First of all, your concept is right. It will definitely give you a more standard DB.
For this you have to create one more table [example table: combo_meal_relation] that references the data in those two tables. A concrete example may make this clearer.
meal table
+------+--------+-----------+---------+
| id | name | serving | price |
+------+--------+-----------+---------+
| 1 | soup1 | 2 person | 12.50 |
+------+--------+-----------+---------+
| 2 | soup2 | 2 person | 15.50 |
+------+--------+-----------+---------+
| 3 | soup3 | 2 person | 23.00 |
+------+--------+-----------+---------+
| 4 | drink1 | 2 person | 4.50 |
+------+--------+-----------+---------+
| 5 | drink2 | 2 person | 3.50 |
+------+--------+-----------+---------+
| 6 | drink3 | 2 person | 5.50 |
+------+--------+-----------+---------+
| 7 | fruit1 | 2 person | 3.00 |
+------+--------+-----------+---------+
| 8 | fruit2 | 2 person | 3.50 |
+------+--------+-----------+---------+
| 9 | fruit3 | 2 person | 4.50 |
+------+--------+-----------+---------+
combo_meal table
+------+--------------+-----------+
| id | combo_name | serving |
+------+--------------+-----------+
| 1 | combo1 | 2 person |
+------+--------------+-----------+
| 2 | combo2 | 2 person |
+------+--------------+-----------+
| 4 | combo3 | 2 person |
+------+--------------+-----------+
combo_meal_relation
+------+--------------+-----------+
| id | combo_meal_id| meal_id |
+------+--------------+-----------+
| 1 | 1 | 1 |
+------+--------------+-----------+
| 2 | 1 | 2 |
+------+--------------+-----------+
| 3 | 1 | 3 |
+------+--------------+-----------+
| 4 | 2 | 4 |
+------+--------------+-----------+
| 5 | 2 | 2 |
+------+--------------+-----------+
| 6 | 2 | 7 |
+------+--------------+-----------+
When you search the tables this way, you will get results faster.
Search query:
SELECT m.*
FROM combo_meal_relation cmr
JOIN meal m
ON m.id = cmr.meal_id
WHERE cmr.combo_meal_id = 1
Hopefully you understand :)
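For completeness, a runnable sketch of the lookup through combo_meal_relation, using Python's built-in sqlite3 module. Only the meals referenced by the relation rows are loaded, and the helper meals_in_combo and the price total are additions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE meal (id INTEGER PRIMARY KEY, name TEXT, price REAL);
    CREATE TABLE combo_meal (id INTEGER PRIMARY KEY, combo_name TEXT);
    CREATE TABLE combo_meal_relation (
        id            INTEGER PRIMARY KEY,
        combo_meal_id INTEGER NOT NULL REFERENCES combo_meal (id),
        meal_id       INTEGER NOT NULL REFERENCES meal (id)
    );
    INSERT INTO meal VALUES
        (1, 'soup1', 12.50), (2, 'soup2', 15.50), (3, 'soup3', 23.00),
        (4, 'drink1', 4.50), (7, 'fruit1', 3.00);
    INSERT INTO combo_meal VALUES (1, 'combo1'), (2, 'combo2');
    INSERT INTO combo_meal_relation VALUES
        (1, 1, 1), (2, 1, 2), (3, 1, 3), (4, 2, 4), (5, 2, 2), (6, 2, 7);
""")

def meals_in_combo(combo_meal_id):
    """Return the names of all meals that belong to one combo."""
    rows = conn.execute(
        """
        SELECT m.name
        FROM combo_meal_relation cmr
        JOIN meal m ON m.id = cmr.meal_id
        WHERE cmr.combo_meal_id = ?
        ORDER BY m.id
        """,
        (combo_meal_id,),
    )
    return [row[0] for row in rows]

combo1_meals = meals_in_combo(1)

# Sum of the individual meal prices in combo 1
combo1_total = conn.execute(
    """
    SELECT SUM(m.price)
    FROM combo_meal_relation cmr
    JOIN meal m ON m.id = cmr.meal_id
    WHERE cmr.combo_meal_id = ?
    """,
    (1,),
).fetchone()[0]
```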
You could do something like this
INSERT INTO table (id, roles) VALUES ('', '2,3,4');
Then to find it use FIND_IN_SET
As you might already know, storing multiple values in a cell goes against 1NF. If you're fine with that, a JSON column type is a workable option and comes with good functions for querying it.
SELECT * FROM table_name
WHERE JSON_CONTAINS(column_name, '"value 2"', '$')
Will return any entry with json data like
[
"value",
"value 2",
"value 3"
]
Remember, though, that because you're using JSON, your query performance will suffer.

Structuring a MySQL database for user information

I am quite new to MySQL, I know most of the basic functions and how to send queries etc. However, I am trying to learn about structuring it for optimal searches for user information and wanted to get some ideas.
Right now I just have one table (for functionality purposes and testing) called user_info, which holds the users' information, and another table that stores photos linked to the user. Ideally I'd like most of this information to be accessible as quickly as possible.
In creating a database which is primarily used to store and retrieve user information (name, age, phone, messages, etc.), would it be a good idea to create a NEW TABLE for each new user that stores all the information, so the one table user_info does not become bogged down by multiple queries, locking, etc.? So, for example, user john smith would have his very own table in the database holding all his information, including photos, messages, etc.
OR
is it better to have just a few tables such as user_info, user_photos, user_messages,etc. and accessing data in this manner.
I am not concerned about redundancy in the tables such as the users email address being repeated multiple times.
The latter is the best way. You declare one table for users, and several columns with the data you want.
Now if you want users to have photos, you'd require a new table with photos and a Foreign Key attribute that links to the user table's Primary Key.
You should definitely NOT create a new table for each user. Create one table for user_info, one for photos if each user can have many photos. A messages table would probably contain two user_id columns (user_to, user_from) and a message column. Try to normalize the data as much as possible.
Users
====
id
email
etc
Photos
====
id
user_id
meta_data
etc
Messages
====
id
user_id_to
user_id_from
message
timestamp
etc
I agree with both the answers supplied here, but one thing they haven't mentioned yet is lookup tables.
Going with the general examples here, consider this: you have a users table and a photos table. Now you want to introduce a feature on your site that allows users to "Favorite" photos from other users.
Rather than making a new table called "Favorites" and adding in all your data about the image (file location, metadata, score/whatever) all over again, have a table that effectively sits BETWEEN the other two.
+-----------------------+ +-------------------------------------+
| ++ users | | ++ photos |
| userID | email | name | | photoID | ownerID | fileLo | etc... |
+--------+-------+------+ +---------+---------+--------+--------+
| 1 | .... | Tom | | 35 | 1 | ..... | .......|
| 2 | .... | Rob | | 36 | 2 | ..... | .......|
| 3 | .... | Dan | | 37 | 1 | ..... | .......|
+--------+-------+------+ | 43 | 3 | ..... | .......|
| 48 | 2 | ..... | .......|
| 49 | 3 | ..... | .......|
| 53 | 2 | ..... | .......|
+---------+---------+--------+--------+
+------------------+
| ++ Favs |
| userID | photoID |
+--------+---------+
| 1 | 37 |
| 1 | 48 |
| 2 | 37 |
+--------+---------+
With this approach, you link the data you have cleanly, efficiently and without too much data replication.
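A short sketch of how the Favs table is read, using Python's built-in sqlite3 module and the sample rows from the diagram. The table and column names follow the diagram, the email and file-location columns are left out for brevity, and the two queries are just examples of what the in-between table makes possible.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users  (userID INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE photos (
        photoID INTEGER PRIMARY KEY,
        ownerID INTEGER NOT NULL REFERENCES users (userID)
    );
    CREATE TABLE Favs (
        userID  INTEGER NOT NULL REFERENCES users (userID),
        photoID INTEGER NOT NULL REFERENCES photos (photoID),
        PRIMARY KEY (userID, photoID)   -- a photo can be favorited once per user
    );
    INSERT INTO users  VALUES (1, 'Tom'), (2, 'Rob'), (3, 'Dan');
    INSERT INTO photos VALUES (35, 1), (36, 2), (37, 1), (43, 3), (48, 2), (49, 3), (53, 2);
    INSERT INTO Favs   VALUES (1, 37), (1, 48), (2, 37);
""")

# Photos favorited by Tom (userID 1), together with each photo's owner
tom_favorites = conn.execute(
    """
    SELECT p.photoID, owner.name
    FROM Favs f
    JOIN photos p    ON p.photoID = f.photoID
    JOIN users owner ON owner.userID = p.ownerID
    WHERE f.userID = ?
    ORDER BY p.photoID
    """,
    (1,),
).fetchall()

# How many users favorited each photo
fav_counts = conn.execute(
    "SELECT photoID, COUNT(*) FROM Favs GROUP BY photoID ORDER BY photoID"
).fetchall()
```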