Mysql: is it better to split tables if possible? - mysql

To make you understand my question I'll give you an example:
I have a chat web app with many rooms, let's say 5 rooms.
People can choose to stay only in one room and they choose it at login.
When they choose the room I have to retrieve the people already in the room, so I can structure my db in two ways:
each room one table with the people being records;
all the rooms in one table, people are the records and a column indicating the room they are in;
In the first case the query would be:
SELECT * FROM 'room_2' WHERE 1
In the second case the query would be:
SELECT * FROM 'rooms' WHERE room = 'room_2'
Which is the best?
I think the only parameter to consider is performance, right?

In this example, no, because people are all 'like' objects and should therefore be in the same table.
All people and rooms in one table with a primary key on people, in this simple example.
Table Rooms(pk_person, personName, table_id)
But I want to talk about a structure that you will want to consider as your website grows. You’ll want three tables, one for each object (chat rooms, people) and one for the relationships.
Chat_Rooms(pk_ChatId, ChatName, MaxOccupants, other unique attributes of a chat room)
People(pk_PersonID, FirstName, LastName, other unique attributes of a person)
Room_People_Join(pk_JoinId, fk_ChatId, fk_PersonID, EnterDateTime, ExitDateTime)
This is a “highly normalized” structure. Each table is a collection of like objects, the join allows for many to many relationships, and object rows are not duplicated. So, a Person with all their attributes (name, gender, age) is never duplicated in the person table. Also, the person table never defines which chat rooms a person is in, because a person could be in one, many, none, or may have entered and exit multiple times. The same concept applies to a chat room. A chat rooms features, such as background color, max occupants, etc. have nothing to do with people.
The Room_People_Join is the important one. This has a unique primary key for which chat rooms a person is in and when they were there. This table grows indefinitely, but it tracks usage. Including the relationship table is what logically normalizes your database.
So how do you know which users are currently in chat room 1? You join your people and rooms to the join table with their respective Primary and Foreign keys in your FROM clause, ask for the columns you want in your SELECT clause, and filter for chat room 1 and people who haven’t yet left.
SELECT p.FirstName, p.LastName, r.ChatName
FROM Room_People_Join j
JOIN People p ON j.fk_PersonID = p.pk_PersonID
JOIN Chat_Rooms r ON j.fk_ChatId = r.pk_ChatId
WHERE r.ExitDateTime IS NOT NULL
AND pk_ChatId = 1
Sorry that’s long winded, but I extrapolated your question for database growth.

The answer is very simple and strongly recommended - one database table for all rooms for sure! What if you will later like to create rooms dynamically!? For sure you would not create new tables dynamically.

Related

Whats the best way to implement a database with multivalued attributes?

i am trying to implement a database which has multi valued attributes and create a filter based search. For example i want my people_table to contain id, name, address, hobbies, interests (hobbies and interests are multi-valued). The user will be able to check many attributes and sql will return only those who have all of them.
I made my study and i found some ways to implement this but i can't decide which one is the best.
The first one is to have one table with the basic info of people (id, name, address), two more for the multi-valued attributes and one more which contains only the keys of the other tables (i understand how to create this tables, i don't know yet how to implement the search).
The second one is to have one table with the basic info and then one for each attribute. So i will have 20 or more tables (football, paint, golf, music, hiking etc.) which they only contain the ids of the people. Then when the user checks the hobbies and the activities i am going to get the desired results with the use of the JOIN feature (i am not sure about the complexity, so i don't know how fast is going to be if the user do many checks).
The last one is an implementation that i didn't find on internet (and i know there is a reason :) ) but in my mind is the easiest to implement and the fastest in terms of complexity. Use only one table which will have the basic infos as normal and also all the attributes as boolean variables. So if i have 1000 people in my table there are going to be only 1000 loops and which i imagine with the use of AND condition are going to be fast enough.
So my question is: can i use the the third implementation or there is a big disadvantage that i don't get? And also which one of the first two ways do you suggest me to use?
That is a typical n to m relation. It works like this
persons table
------------
id
name
address
interests table
---------------
id
name
person_interests table
----------------------
person_id
interest_id
person_interests contains a record for each interest of a person. To get the interests of a person do:
select i.name
from interests i
join person_interests pi on pi.interest_id = i.id
join persons p on pi.person_id = p.id
where p.name = 'peter'
You could create also tables for hobbies. To get the hobbies do the same in a separate query. To get both in one query you can do something like this
select p.id, p.name,
i.name as interest,
h.name as hobby
from persons p
left join person_interests pi on pi.person_id = p.id
left join interests i on pi.interest_id = i.id
left join person_hobbies ph on ph.person_id = p.id
left join hobbies h on ph.hobby_id = h.id
where p.name = 'peter'
The basic way to deal with this is with a many-to-many join table. Each user can have many hobbies. Each hobby can have many users. That's basic stuff you can find information about anywhere, and #juergend already covered that.
The harder part is tracking different information about various hobbies and interests. Like if their hobby is "baseball" you might want to track what position they play, but if their hobby is "travel" you might want to track their favorite countries. Doing this with typical SQL relationships will lead to a rapid proliferation of tables and columns.
A hybrid approach is to use the new JSON data type to store some unstructured data. To expand on #juergend's example, you might add a field to Person_Interests which can store some of those details about that person's interest.
create table Person_Interests (
InterestID integer references Interests(ID),
PersonID integer references Persons(ID),
Details JSON
);
And now you could add that Person 45 has Interest 12 (travel), their favorite country is Djibouti, and they've been to 45 countries.
insert into person_interests
(InterestID, PersonID, Details)
(12, 45, '{"favorite_country": "Djibouti", "countries_visited": 45}');
And you can use JSON search functions to find, for example, everyone whose favorite country is Djibouti.
select p.id, p.name
from person_interests pi
join persons p on p.id = pi.personid
where pi.details->"$.favorite_country" = "Djibouti"
The advantage here is flexibility: interests and their attributes aren't limited by your database schema.
The disadvantages is performance. The JSON data type isn't the most efficient, and indexing a JSON column in MySQL is complicated. Good indexing is critical to good SQL performance. So as you figure out common patterns you might want to turn commonly used attributes into real columns in real tables.
The other option would be to use table inheritance. This is a feature of Postgres, not MySQL, and I'd recommend considering switching. Postgres also has better and more mature JSON support and JSON columns are easier to index.
With table inheritance, rather than having to write a completely new table for every different interest, you can make specific tables which inherit from a more generic one.
create table person_interests_travel (
FavoriteCountry text,
CountriesVisited text[]
) inherits(person_interests);
This still has InterestID, PersonID, and Details, but it's added some specific columns for tracking their favorite country and countries they've visited.
Note that text[]. Postgresql also supports arrays so you can store real lists without having to create another join table. You can also do this in MySQL with a JSON field, but arrays offer type constraints that JSON does not.

Get stats table from a many to many relationship

I have a pivot table for a Many to Many relationship between users and collected_guitars. As you can see a "collected_guitar" is an item that references some data in foreign tables (guitar_models, finish).
My users also have some foreign data in foreign tables (hand_types and genders)
I want to get a derived table that lists data if I look for a particular model_id in "collected_guitar_user"
Let's say "Fender Stratocaster" is model id = 200, where the make is Fender (id = 1 of makes table).
The same guitar could come in a variety of finish hence the use of another table collected_guitars.
One user could have this item in his collection
Now what I want to find by looking at model_id (in this case 200) in the pivot table "collected_guitar_user" is the number of Fender Stratocasters that are collected by users that share the same genders.sex and hand_types.type as the logged in user and to see what finish they divide in (some percent of finish A and B etc...).
So a user could see that is interested in what others are buying could see some statistics for the model.
What query can derive this kind of table??
You can do aggregate counts by using the GROUP BY syntax, and CROSS JOIN to compute a percentage of the total:
SELECT make.make, models.model_name as model, finish.finish,
COUNT(1) AS number_of_users,
(COUNT(1) / u.total * 100) AS percent_owned
FROM owned_guitar, owned_guitar_users, users, models, make, finish
CROSS JOIN (SELECT COUNT(1) AS total FROM users) u
WHERE users.id = owned_guitar_users.user_id
AND owned_guitar_user.owned_guitar_id = owned_guitar.id
AND owned_guitar.model_id = models.id
AND owned_guitar.make_id = make.id
AND owned_guitar.finish_id = finish.id
GROUP BY owned_guitar.id
Please note though, that in cases where a user owns more than one guitar, the percentages will no longer necessarily sum to unity (for example, Jack and John could both own all five guitars, so each of them owns "100%" of the guitars).
I'm also a little confused by your database design. Why do you have a finish_id and make_id associated directly in the owned_guitar table as well as in the models table?

Mysql setup for multiple users with large number of individual options

i'm building a study tool and i'm not sure of the best way to go about structuring my database.
Basically, i have a simple but big table with around 50000 bits of information in it.
info (50'000 rows)
id
info_text
user
id
name
email
password
etc
What i want is for the students to be able to marked each item as studied or to be studied(basically on and off), so that they can tick off each item when they have revised it.
I want to build tool to cope with thousands of users and was wondering what the most efficient/easiest option way of setting up the database and associated queries.
At the moment i would lean towards just having one huge table with two primary keys one with user id and then id of the info they had studied and then doing some sort of JOIN statement so i could only pull back the items that they had left to study.
user_info
user_id
info_id
Thanks in advance
Here is one way to model this situation:
The table in the middle has a composite primary key on USER_ID and ITEM_ID, so a combination of the two must be unique, even though individually they don't have to be.
A user (with given USER_ID) has studied a particular item (with given ITEM_ID) only if there is a corresponding row in the STUDIED table (with these same USER_ID and ITEM_ID values).
Conversely, the user has not studied the item, if and only if the corresponding row in STUDIED is missing. To pull all items a given user hasn't studied, you can do something like this:
SELECT * FROM ITEM
WHERE NOT EXISTS (
SELECT * FROM STUDIED
WHERE
USER_ID = <given_user_id>
AND ITEM.ITEM_ID = STUDIED.ITEM_ID
)
Or, alternatively:
SELECT ITEM.*
FROM ITEM LEFT JOIN STUDIED ON ITEM.ITEM_ID = STUDIED.ITEM_ID
WHERE USER_ID = <given_user_id> AND STUDIED.ITEM_ID IS NULL
The good thing about this design is that you don't need to care about STUDIED table in advance. When adding a new user or item, just leave the STUDIED alone - you'll gradually fill it later as users progress with their studies.
I would do something like this:
1) A users table with a uid primary key
2) A enrolled table (this table shows all courses that have enrolled students) with a primary key of (uid, cid)
3) A items (info) table holding all items to study, with a primary key of itemid
Then in the enrolled table just have one attribute (a binary flag) 1 means it has been studyed and 0 means they still need to study it.

Storing Friends in Database for Social Network

For storing friends relationships in social networks, is it better to have another table with columns relationship_id, user1_id, user2_id, time_created, pending or should the confirmed friend's user_id be seralized/imploded into a single long string and stored along side with the other user details like user_id, name, dateofbirth, address and limit to like only 5000 friends similar to facebook?
Are there any better methods? The first method will create a huge table! The second one has one column with really long string...
On the profile page of each user, all his friends need to be retrieved from database to show like 30 friends similar to facebook, so i think the first method of using a seperate table will cause a huge amount of database queries?
The most proper way to do this would be to have the table of Members (obviously), and a second table of Friend relationships.
You should never ever store foreign keys in a string like that. What's the point? You can't join on them, sort on them, group on them, or any other things that justify having a relational database in the first place.
If we assume that the Member table looks like this:
MemberID int Primary Key
Name varchar(100) Not null
--etc
Then your Friendship table should look like this:
Member1ID int Foreign Key -> Member.MemberID
Member2ID int Foreign Key -> Member.MemberID
Created datetime Not Null
--etc
Then, you can join the tables together to pull a list of friends
SELECT m.*
FROM Member m
RIGHT JOIN Friendship f ON f.Member2ID = m.MemberID
WHERE f.MemberID = #MemberID
(This is specifically SQL Server syntax, but I think it's pretty close to MySQL. The #MemberID is a parameter)
This is always going to be faster than splitting a string and making 30 extra SQL queries to pull the relevant data.
Separate table as in method 1.
method 2 is bad because you would have to unserialize it each time and wont be able to do JOINS on it; plus UPDATE's will be a nightmare if a user changes his name, email or other properties.
sure the table will be huge, but you can index it on Member11_id, set the foreign key back to your user table and could have static row sizes and maybe even limit the amount of friends a single user can have. I think it wont be an issue with mysql if you do it right; even if you hit a few million rows in your relationship table.

MySQL Multiple interests matching problem

I have a database where users enter their interests. I want to find people with matching interests.
The structure of the interest table is
interestid | username | hobby | location | level | matchinginterestids
Let's take two users to keep it simple.
User Joe may have 10 different interest records
User greg may have 10 different interest records.
I want to do the following algorithm
Take Joe's interest record 1 and look for matching hobbies and locations from the interest database. Put any matching interest id's in the matches field. Then go to joe's interest record 2 etc..
I guess what I need is some sort of for loop that will loop through all of joe's intersts and then do an update each time it finds a match in the interest database. Is that even possible in MySQL?
Further example:
I am Dan. I have 3 interests. Each interest is composed of 3 subjects:
Dan cats,nutrition,hair
Dan superlens,dna,microscopes
Dan film,slowmotion,fightscenes
Other people may have other interests
Joe:
Joe cats,nutrition,strength
Joe superlens,dna,microscopes
Moe
Moe mysql,queries,php
Moe film,specialfx,cameras
Moe superlens,dna,microscopes
Now I want the query to return the following when I log in as Dan:
Here are your interest matches:
--- is interested in cats nutrition hair
Joe is interested in cats and nutrition
Joe and Moe are interested in superlens, dna, microscopes
Moe is interested in film
The query needs to iterate through all Dan's interests, and compare 3,2,1 subject matches.
I could do this in php from a loop but it would be calling the database all the time to get the results. I was wondering if there's a crafty way to do it using a single query Or maybe 3 separate queries one looking for 3 matches, one for 2 and one for 1.
This is definitely possible with MySQL, but I think you may be going about it in an awkward way. I would begin by structuring the tables as follows:
TABLE Users ( userId, username, location )
TABLE Interests( interestId, hobby )
TABLE UserInterests( userId, interestId, level )
When a user adds an interest, if it hasn't been added before, you add it to the Interests table, and then add it to the UserInterests table. When you want to check for other nearby folks with similar interests, you can simply query the UserInterests table for other people who have similar interests, which has all that information for you already:
SELECT DISTINCT userId
FROM UserInterests
WHERE interestId IN (
SELECT interestId
FROM UserInterests
WHERE userId = $JoesID
)
This can probably be done in a more elegant fashion without subqueries, but it's what I thought of now.
As per special request from daniel, although it's kind of duplicate but never mind.
The schema explained
TABLE User (id, username, location )
TABLE Interests(id, hobby )
TABLE UserInterest(userId, interestId, level )
Table users has just user data and a primary key field at the start: id.
The primary key field is a pure link field, the other fields are info fields.
Table Interest again has a primary key that is use to link against and some info field
(ehm well just one, but that's because this is an example)
Note that users and interests are not linked in any way whatsoever.
That's odd, why is that?
Well there is a problem... One user can have multiple intrests and intrests can belong to multiple people.
We can solve this by changing to users table like so:
TABLE users (id, username, location, intrest1, intrest2, intrest3)
But this is a bad, really really bad idea, because:
This way only 3 interests per user are allowed
It's a waste of space if many users have 2, 1 or no interests
And most important, it makes queries difficult to write.
Example query for linking with the bad users table
SELECT * FROM user
INNER JOIN interests ON (user.intrest1 = interests.id) or
(user.intrest2 = interests.id) or
(user.intrest3 = interests.id);
And that's just for a simple query listing all users and their interests.
It quickly gets horribly complex as things progress.
many-to-many relationships
The solution to the problem of a many to many relationship is to use a link table.
This reduces the many-to-many relationship into two 1-to-many relationships.
A: 1 userinterest to many user's
B: 1 userinterest to many interests
Example query using a link-table
SELECT * FROM user
INNER JOIN userInterest ON (user.id = userInterest.userID) //many-to-1
INNER JOIN interest ON (interest.id = userInterest.InterestID); //many-to-1
Why is this better?
Unlimited number of interests per user and visa versa
No wasted space if a user has a boring life and few if any interests
Queries are simpler to maintain
Making it interesting
Just listing all users is not very fun, because then we still have to process the data in php or whatever. But there's no need to do that SQL is a query language after all so let's ask a question:
Give all users that share an interest with user Moe.
OK, lets make a cookbook and gather our ingredients. What do we need.
Well we have a user "Moe" and we have other user's, everybody but not "Moe".
And we have the interests shared between them.
And we'll need the link table userInterest as well because that's the way we link user and interests.
Let's first list all of Moe's Hobbies
SELECT i_Moe.hobby FROM interests AS i_Moe
INNER JOIN userInterests as ui2 ON (ui2.InterestID = i_Moe.id)
INNER JOIN user AS u_Moe ON (u_Moe.id = ui2.UserID)
WHERE u_Moe.username = 'Moe';
Now we combine the select for all users against only Moe's hobbies.
SELECT u_Others.username FROM interests AS i_Others
INNER JOIN userinterests AS ui1 ON (ui1.interestID = i_Others.id)
INNER JOIN user AS u_Others ON (ui1.user_id = u_Others.id)
/*up to this point this query is a list of all interests of all users*/
INNER JOIN Interests AS i_Moe ON (i_Moe.Hobby = i_Others.hobby)
/*Here we link Moe's hobbies to other people's hobbies*/
INNER JOIN userInterests as ui2 ON (ui2.InterestID = i_Moe.id)
INNER JOIN user AS u_Moe ON (u_Moe.id = ui2.UserID)
/*And using the link table we link Moe's hobbies to Moe*/
WHERE u_Moe.username = 'Moe'
/*We limited user-u_moe to only 'Moe'*/
AND u_Others.username <> 'Moe';
/*and the rest to everybody except 'Moe'*/
Because we are using INNER JOIN's on link fields only matches will be considered and non-matches will be thrown out.
If you read the query in english it goes like this.
Consider all users who are not Moe, call them U_others.
Consider user Moe, call him U_Moe.
Consider user Moe's Hobbies, call those i_Moe
Consider other users's Hobbies, call those i_Others
Now link i_Others hobbies to u_Moe's Hobbies
Return only users from U_Others that have a hobby that matches Moe's
Hope this helps.