What's the best way to implement a database with multivalued attributes? (MySQL)

I am trying to implement a database that has multivalued attributes and to build a filter-based search. For example, I want my people_table to contain id, name, address, hobbies and interests (hobbies and interests are multivalued). The user will be able to check many attributes, and SQL will return only the people who have all of them.
I did some research and found a few ways to implement this, but I can't decide which one is best.
The first is to have one table with the basic info of people (id, name, address), two more for the multivalued attributes, and one more that contains only the keys of the other tables (I understand how to create these tables; I don't yet know how to implement the search).
The second is to have one table with the basic info and then one for each attribute. So I will have 20 or more tables (football, paint, golf, music, hiking, etc.) that only contain the ids of the people. Then, when the user checks the hobbies and activities, I will get the desired results using JOINs (I am not sure about the complexity, so I don't know how fast it will be if the user checks many boxes).
The last is an implementation I didn't find on the internet (and I know there is a reason :) ), but in my mind it is the easiest to implement and the fastest in terms of complexity: use only one table that has the basic info as normal and also all the attributes as boolean columns. So if I have 1000 people in my table, there will be only 1000 rows to scan, which I imagine will be fast enough with AND conditions.
So my question is: can I use the third implementation, or is there a big disadvantage I'm missing? And which of the first two ways would you suggest?

That is a typical n-to-m (many-to-many) relation. It works like this:
persons table
------------
id
name
address
interests table
---------------
id
name
person_interests table
----------------------
person_id
interest_id
person_interests contains a record for each interest of a person. To get the interests of a person do:
select i.name
from interests i
join person_interests pi on pi.interest_id = i.id
join persons p on pi.person_id = p.id
where p.name = 'peter'
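You could also create tables for hobbies (hobbies and person_hobbies, built the same way). To get the hobbies, do the same in a separate query. To get both in one query, you can do something like this (note that it returns one row for every interest/hobby combination of the person):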
select p.id, p.name,
i.name as interest,
h.name as hobby
from persons p
left join person_interests pi on pi.person_id = p.id
left join interests i on pi.interest_id = i.id
left join person_hobbies ph on ph.person_id = p.id
left join hobbies h on ph.hobby_id = h.id
where p.name = 'peter'
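The queries above list one person's interests, but the question asks for the people who have *all* of the checked attributes. A common way to express that against this schema is relational division: filter to the checked interests, group by person, and keep only groups whose count of distinct matches equals the number of checked items. The sketch below uses Python's built-in sqlite3 module so it runs without a MySQL server; the SQL itself is standard and should carry over to MySQL. The sample rows and the helper name `people_with_all` are hypothetical.

```python
import sqlite3

# Hypothetical sample data; table names follow the answer above.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE persons (id INTEGER PRIMARY KEY, name TEXT, address TEXT);
CREATE TABLE interests (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
CREATE TABLE person_interests (
    person_id   INTEGER REFERENCES persons(id),
    interest_id INTEGER REFERENCES interests(id),
    PRIMARY KEY (person_id, interest_id)  -- no duplicate person/interest pairs
);
INSERT INTO persons VALUES (1, 'peter', 'A St'), (2, 'anna', 'B St'), (3, 'tom', 'C St');
INSERT INTO interests VALUES (1, 'football'), (2, 'music'), (3, 'hiking');
INSERT INTO person_interests VALUES (1, 1), (1, 2), (2, 1), (3, 1), (3, 2), (3, 3);
""")

def people_with_all(conn, wanted):
    """Return names of people who have every interest listed in `wanted`."""
    placeholders = ",".join("?" for _ in wanted)
    sql = f"""
        SELECT p.name
        FROM persons p
        JOIN person_interests pi ON pi.person_id = p.id
        JOIN interests i ON i.id = pi.interest_id
        WHERE i.name IN ({placeholders})
        GROUP BY p.id, p.name
        -- keep only people who matched all of the checked interests
        HAVING COUNT(DISTINCT i.name) = ?
        ORDER BY p.name
    """
    params = [*wanted, len(set(wanted))]
    return [row[0] for row in conn.execute(sql, params)]

print(people_with_all(conn, ["football", "music"]))
```

However many boxes the user checks, this stays a single query with a fixed number of joins, which is one reason it is often preferred over one table per hobby.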

The basic way to deal with this is with a many-to-many join table. Each user can have many hobbies, and each hobby can have many users. That's basic stuff you can find information about anywhere, and @juergend already covered it.
The harder part is tracking different information about various hobbies and interests. Like if their hobby is "baseball" you might want to track what position they play, but if their hobby is "travel" you might want to track their favorite countries. Doing this with typical SQL relationships will lead to a rapid proliferation of tables and columns.
A hybrid approach is to use the JSON data type (added in MySQL 5.7) to store some unstructured data. To expand on @juergend's example, you might add a field to Person_Interests that stores some details about that person's interest.
create table Person_Interests (
InterestID integer references Interests(ID),
PersonID integer references Persons(ID),
Details JSON
);
And now you could add that Person 45 has Interest 12 (travel), their favorite country is Djibouti, and they've been to 45 countries.
insert into person_interests
(InterestID, PersonID, Details)
values (12, 45, '{"favorite_country": "Djibouti", "countries_visited": 45}');
And you can use JSON search functions to find, for example, everyone whose favorite country is Djibouti.
select p.id, p.name
from person_interests pi
join persons p on p.id = pi.PersonID
where pi.Details->>'$.favorite_country' = 'Djibouti';
The advantage here is flexibility: interests and their attributes aren't limited by your database schema.
The disadvantage is performance. The JSON data type isn't the most efficient, and indexing a JSON column in MySQL is complicated: you cannot index the column directly, so you have to index a generated column (or, from MySQL 8.0.13, a functional index) on an extracted value. Good indexing is critical to good SQL performance. So as you figure out common patterns, you might want to turn commonly used attributes into real columns in real tables.
The other option would be to use table inheritance. This is a feature of Postgres, not MySQL, and I'd recommend considering switching. Postgres also has better and more mature JSON support, and JSON columns are easier to index.
With table inheritance, rather than having to write a completely new table for every different interest, you can make specific tables which inherit from a more generic one.
create table person_interests_travel (
FavoriteCountry text,
CountriesVisited text[]
) inherits(person_interests);
This still has InterestID, PersonID, and Details, but it's added some specific columns for tracking their favorite country and countries they've visited.
Note the text[] type. PostgreSQL also supports arrays, so you can store real lists without having to create another join table. You can also do this in MySQL with a JSON field, but arrays offer type constraints that JSON does not.


Need support to properly query my DB when a lot of different data come into play

Suppose you have this schema:
Of course it's oversimplified in this example; just pretend you have a collection of users described by many different tables like the ones drawn here.
You can assume that:
Every user has a name
Every user can speak one or more languages
Every user can hold one or more titles or certifications
Every user can have one or more work experiences
Suppose you have to show this large amount of data to a third user, who needs to access, search and administer it nimbly and easily.
What I have done:
My first approach was to show just the most relevant info for each user, providing a clean first interface the admin could start from, and to let him filter the records shown using all the data available in the DB.
To cut a long story short, the admin can (or should be able to) "display all male users who speak English and worked for IBM" on screen, while seeing just a clean, simplified list of records that he can examine further in a different way if he needs to.
What my query looks like:
SELECT
users.id as id,
name,
surname,
/* ...other user columns... */
certificazioni.title as certifications,
lingue.language as language,
esperienze.company
FROM users
LEFT JOIN lingue ON users.id = lingue.iduser
LEFT JOIN certificazioni ON users.id = certificazioni.iduser
LEFT JOIN esperienze ON users.id = esperienze.iduser
GROUP BY users.id
ORDER BY users.id
I built an interface that, given some user input, appends conditions to this query like this:
WHERE language = 'English' AND Sex = 'm'
Now the problem:
With this query I can find out whether there is a user who speaks English, is male and so on, but it fails to find users who speak both English and Dutch, for example.
Why?
(From my point of view) it's because (I'm failing to find a good approach, AND because) of the relations between users and the other tables, which are one-to-many in most cases and cause the output of this query to look something like this:
without GROUP BY
ID NAME SEX LANGUAGE COMPANY
-----------------------------
12 Alamo M English IBM
12 Alamo M Italian NBA
12 Alamo M Dutch NULL
12 Alamo M French NULL
(Every combination of the values of each language, experience and so on)
with GROUP BY
ID NAME SEX LANGUAGE COMPANY
-----------------------------
12 Alamo M Italian NBA
(The rows are, of course, collapsed by GROUP BY, with obvious loss of information.)
Now the requirement:
I need a different approach to this problem that respects the constraints I have and still lets me query my DB efficiently in most cases.
I'm also attaching a screenshot of the platform to better describe the kind of user input I expect:
One approach would be to use a subquery for each condition:
SELECT *
FROM users
WHERE
users.id IN (SELECT iduser FROM lingue WHERE language = 'English')
AND
users.id IN (SELECT iduser FROM lingue WHERE language = 'Dutch')
AND
...
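The subquery-per-condition idea lends itself to building the WHERE clause from the admin's checkboxes: one `IN (subquery)` per checked language, joined with AND. A minimal sketch, assuming a `lingue(iduser, language)` table as in the question's query and using Python's sqlite3 in place of MySQL; the helper name `users_speaking_all`, the `sex` column and the rows are hypothetical. Values are passed as bound parameters rather than concatenated into the SQL string.

```python
import sqlite3

# Hypothetical schema modelled on the question: users plus a one-to-many
# "lingue" (languages) table keyed by iduser. The rows are invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, sex TEXT);
CREATE TABLE lingue (iduser INTEGER REFERENCES users(id), language TEXT);
INSERT INTO users VALUES (12, 'Alamo', 'M'), (13, 'Bea', 'F'), (14, 'Carl', 'M');
INSERT INTO lingue VALUES
    (12, 'English'), (12, 'Italian'), (12, 'Dutch'), (12, 'French'),
    (13, 'English'), (13, 'Dutch'),
    (14, 'English');
""")

def users_speaking_all(conn, languages, sex=None):
    """Build one IN (subquery) condition per required language."""
    conditions, params = [], []
    for lang in languages:
        conditions.append("users.id IN (SELECT iduser FROM lingue WHERE language = ?)")
        params.append(lang)
    if sex is not None:
        conditions.append("users.sex = ?")
        params.append(sex)
    where = " AND ".join(conditions) or "1 = 1"  # no filters: return everyone
    sql = f"SELECT name FROM users WHERE {where} ORDER BY id"
    return [row[0] for row in conn.execute(sql, params)]

print(users_speaking_all(conn, ["English", "Dutch"]))
print(users_speaking_all(conn, ["English", "Dutch"], sex="M"))
```

Because each condition is an independent subquery, the main query never multiplies rows, so no GROUP BY is needed to undo duplication.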
This is not trivial. For each language/skill/certification condition you add to a query, you need to (inner) join another table. When the same table appears more than once, each instance must have its own alias, and you can use group_concat to put values from multiple input rows on the same output row, e.g.:
Select u.name, group_concat(c.title)
From users u
Inner join lingue l1
On u.id=l1.iduser
And l1.language='Italian'
Inner join lingue l2
On l1.iduser=l2.iduser
And l2.language='French'
Inner join certificazioni c
On u.id=c.iduser
Group by u.id, u.name
This lists the names and qualifications of people who speak both French and Italian and have at least one qualification. But you may find it simpler to denormalise the database into one or two tables and use full-text search.
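To see the aliased self-join at work, here is a runnable sketch with Python's sqlite3 standing in for MySQL. It assumes the question's `iduser` column names; the rows and certification titles are hypothetical. SQLite also provides `group_concat`, but the order of the concatenated values is not guaranteed here (MySQL's GROUP_CONCAT accepts an ORDER BY inside the call if you need one).

```python
import sqlite3

# Hypothetical data: Alamo speaks Italian and French and has two certificates,
# Bea speaks both but has none, Carl speaks only Italian.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE lingue (iduser INTEGER, language TEXT);
CREATE TABLE certificazioni (iduser INTEGER, title TEXT);
INSERT INTO users VALUES (1, 'Alamo'), (2, 'Bea'), (3, 'Carl');
INSERT INTO lingue VALUES
    (1, 'Italian'), (1, 'French'),
    (2, 'Italian'), (2, 'French'),
    (3, 'Italian');
INSERT INTO certificazioni VALUES (1, 'CCNA'), (1, 'PMP'), (3, 'PMP');
""")

sql = """
SELECT u.name, group_concat(c.title) AS titles
FROM users u
JOIN lingue l1 ON l1.iduser = u.id AND l1.language = 'Italian'  -- first alias
JOIN lingue l2 ON l2.iduser = u.id AND l2.language = 'French'   -- second alias
JOIN certificazioni c ON c.iduser = u.id  -- inner join: at least one certificate
GROUP BY u.id, u.name
ORDER BY u.id
"""
rows = conn.execute(sql).fetchall()
print(rows)
```

Bea drops out because the inner join on certificazioni requires at least one row, and Carl drops out at the `l2` join.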

MySQL: is it better to split tables if possible?

To help you understand my question, I'll give you an example:
I have a chat web app with many rooms, let's say 5 rooms.
People can stay in only one room, and they choose it at login.
When they choose the room, I have to retrieve the people already in it, so I can structure my DB in two ways:
one table per room, with people as the records;
all the rooms in one table, with people as the records and a column indicating the room they are in;
In the first case the query would be:
SELECT * FROM `room_2` WHERE 1
In the second case the query would be:
SELECT * FROM `rooms` WHERE room = 'room_2'
Which is the best?
I think the only parameter to consider is performance, right?
In this example, no, because people are all 'like' objects and should therefore be in the same table.
All people and rooms in one table with a primary key on people, in this simple example.
Table Rooms(pk_person, personName, table_id)
But I want to talk about a structure that you will want to consider as your website grows. You’ll want three tables, one for each object (chat rooms, people) and one for the relationships.
Chat_Rooms(pk_ChatId, ChatName, MaxOccupants, other unique attributes of a chat room)
People(pk_PersonID, FirstName, LastName, other unique attributes of a person)
Room_People_Join(pk_JoinId, fk_ChatId, fk_PersonID, EnterDateTime, ExitDateTime)
This is a “highly normalized” structure. Each table is a collection of like objects, the join table allows for many-to-many relationships, and object rows are not duplicated. So a Person with all their attributes (name, gender, age) is never duplicated in the People table. Also, the People table never defines which chat rooms a person is in, because a person could be in one, many or none, or may have entered and exited multiple times. The same concept applies to a chat room: a chat room's features, such as background color, max occupants, etc., have nothing to do with people.
The Room_People_Join table is the important one. Each row, identified by its own primary key, records which chat room a person was in and when. This table grows indefinitely, but it tracks usage. Including the relationship table is what logically normalizes your database.
So how do you know which users are currently in chat room 1? You join your people and rooms to the join table with their respective Primary and Foreign keys in your FROM clause, ask for the columns you want in your SELECT clause, and filter for chat room 1 and people who haven’t yet left.
SELECT p.FirstName, p.LastName, r.ChatName
FROM Room_People_Join j
JOIN People p ON j.fk_PersonID = p.pk_PersonID
JOIN Chat_Rooms r ON j.fk_ChatId = r.pk_ChatId
WHERE j.ExitDateTime IS NULL
AND r.pk_ChatId = 1
Sorry that’s long-winded, but I extrapolated your question to account for database growth.
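A small runnable check of the three-table design, using Python's sqlite3 in place of MySQL. Table and column names come from the answer; the timestamps, people and rooms are hypothetical. The sketch treats a NULL ExitDateTime as "still in the room", which is the assumption the occupancy query relies on.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Chat_Rooms (pk_ChatId INTEGER PRIMARY KEY, ChatName TEXT, MaxOccupants INTEGER);
CREATE TABLE People (pk_PersonID INTEGER PRIMARY KEY, FirstName TEXT, LastName TEXT);
CREATE TABLE Room_People_Join (
    pk_JoinId     INTEGER PRIMARY KEY,
    fk_ChatId     INTEGER REFERENCES Chat_Rooms(pk_ChatId),
    fk_PersonID   INTEGER REFERENCES People(pk_PersonID),
    EnterDateTime TEXT NOT NULL,
    ExitDateTime  TEXT            -- NULL while the person is still in the room
);
-- Hypothetical sample rows
INSERT INTO Chat_Rooms VALUES (1, 'Lobby', 50), (2, 'Games', 20);
INSERT INTO People VALUES (1, 'Ann', 'Lee'), (2, 'Bob', 'Ray'), (3, 'Cy', 'Ho');
INSERT INTO Room_People_Join (fk_ChatId, fk_PersonID, EnterDateTime, ExitDateTime) VALUES
    (1, 1, '2024-01-01 10:00', NULL),                -- Ann is still in room 1
    (1, 2, '2024-01-01 10:05', '2024-01-01 10:30'),  -- Bob has left room 1
    (2, 2, '2024-01-01 10:31', NULL),                -- Bob moved to room 2
    (1, 3, '2024-01-01 11:00', NULL);                -- Cy is in room 1
""")

def current_occupants(conn, chat_id):
    """People who entered the room and have not yet left it."""
    sql = """
        SELECT p.FirstName, p.LastName, r.ChatName
        FROM Room_People_Join j
        JOIN People p ON j.fk_PersonID = p.pk_PersonID
        JOIN Chat_Rooms r ON j.fk_ChatId = r.pk_ChatId
        WHERE j.ExitDateTime IS NULL
          AND r.pk_ChatId = ?
        ORDER BY p.pk_PersonID
    """
    return conn.execute(sql, (chat_id,)).fetchall()

print(current_occupants(conn, 1))
```

Bob's earlier visit to room 1 stays in the history table, which is what lets the same design answer usage questions later.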
The answer is very simple and strongly recommended: one database table for all rooms, for sure! What if you later want to create rooms dynamically? You certainly would not want to create new tables dynamically.

Best option for getting feed data from multiple tables?

I am having a database design issue, and I'm still pretty new to MySQL, so I thought I would ask here. What would be the best way to get data for a chronological feed from multiple tables? For example, a user does many things: they vote, comment, rate and ask questions. I save all this information in the respective tables ("tblVote", "tblRate", etc.). Now the tricky part: a user can follow one or many other users, say 3-4 people. Following lets you see their interactions (voting, rating, commenting, asking questions, etc.) in your feed, like Facebook or something similar.
What would be the best way to get all the information from all 5 tables for every person they follow and then sort all of it chronologically? I am assuming my current method (for each followed user, grab all votes, comments, ratings, etc., then sort everything) would be terrible.
My working idea is to create an Interaction table that has a column for the user's id, the id of the entry in the other table, and a type reference. For example:
User ID | InteractionID | Type
9 1232 Comment
10 80 Rating
9 572 Vote
Then you could just grab all Interactions for each of the people they follow, sort them, take, say, the top 10, and query the individual tables to get the full info (time of comment, text of comment, etc.).
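The Interaction-table idea can be sketched with Python's sqlite3 as a stand-in for MySQL. Two things here are assumptions not in the question: a `Follows(follower_id, followee_id)` table, and a `CreatedAt` column on Interaction, since sorting chronologically needs a timestamp the example table does not show. All rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical Follows table, plus an assumed CreatedAt column on Interaction
CREATE TABLE Follows (follower_id INTEGER, followee_id INTEGER,
                      PRIMARY KEY (follower_id, followee_id));
CREATE TABLE Interaction (UserID INTEGER, InteractionID INTEGER,
                          Type TEXT, CreatedAt TEXT);
INSERT INTO Follows VALUES (100, 9), (100, 10);
INSERT INTO Interaction VALUES
    (9, 1232, 'Comment', '2024-03-01 09:00'),
    (10, 80, 'Rating', '2024-03-01 12:00'),
    (9, 572, 'Vote', '2024-03-02 08:00'),
    (11, 7, 'Vote', '2024-03-03 08:00');   -- user 11 is not followed by 100
""")

def feed(conn, viewer_id, limit=10):
    """Newest interactions by everyone `viewer_id` follows."""
    sql = """
        SELECT i.UserID, i.InteractionID, i.Type
        FROM Interaction i
        JOIN Follows f ON f.followee_id = i.UserID
        WHERE f.follower_id = ?
        ORDER BY i.CreatedAt DESC
        LIMIT ?
    """
    return conn.execute(sql, (viewer_id, limit)).fetchall()

print(feed(conn, 100))
```

Each returned (Type, InteractionID) pair then tells you which detail table to query for the full record, as the question proposes.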
A many-to-many relationship exists between User and Follower. Since a Follower is also a User, this becomes a recursive many-to-many relationship. When you decompose this relationship, you get an association table (also called a gerund).
User_Follower {id, userid_fk, followerid_fk}
Both userid_fk and followerid_fk reference the User table.
Now, I am assuming you have a One-to-many relationship between User-tblRate, User-tblVote, User-tblPost etc.
So, you can write a join something like this:
select p.postTitle, p.postTag, ...,
c.commentId, c.commentData, ...
from (tblUser u INNER JOIN tblPost p
ON (p.userid = u.userid)) INNER JOIN tblComment c
ON (c.userid =
u.userid)
where u.userid in
(select userid_fk from user_follower where followerid_fk = 100)
ORDER BY p.datetime_col ASC, c.datetime_col ASC
LIMIT 10;
100 is the user you want to get the information for.
The idea is that you have just one association table linking User and Follower, then use simple joins to get the data for all the followees.
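Joining tblPost and tblComment side by side on the same user, as in the query above, pairs every post with every comment by that user, so each activity can show up many times. A different technique, UNION ALL, stacks the activity tables into one list that can be sorted chronologically as a whole. A minimal sketch with Python's sqlite3 standing in for MySQL, assuming each activity table has `userid` and `datetime_col` columns as in the answer; the other column names and all rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE user_follower (id INTEGER PRIMARY KEY, userid_fk INTEGER, followerid_fk INTEGER);
CREATE TABLE tblPost (postId INTEGER PRIMARY KEY, userid INTEGER,
                      postTitle TEXT, datetime_col TEXT);
CREATE TABLE tblComment (commentId INTEGER PRIMARY KEY, userid INTEGER,
                         commentData TEXT, datetime_col TEXT);
-- Hypothetical rows: user 100 follows users 1 and 2, but not 3
INSERT INTO user_follower (userid_fk, followerid_fk) VALUES (1, 100), (2, 100);
INSERT INTO tblPost VALUES (10, 1, 'Hello', '2024-03-01 09:00'),
                           (11, 3, 'Not followed', '2024-03-01 09:30');
INSERT INTO tblComment VALUES (20, 2, 'Nice post', '2024-03-01 10:00'),
                              (21, 1, 'Thanks', '2024-03-01 11:00');
""")

sql = """
SELECT 'post' AS kind, postId AS item_id, userid, postTitle AS body,
       datetime_col AS created_at
FROM tblPost
WHERE userid IN (SELECT userid_fk FROM user_follower WHERE followerid_fk = ?)
UNION ALL
SELECT 'comment', commentId, userid, commentData, datetime_col
FROM tblComment
WHERE userid IN (SELECT userid_fk FROM user_follower WHERE followerid_fk = ?)
ORDER BY created_at DESC   -- sorts the combined result, newest first
LIMIT 10
"""
rows = conn.execute(sql, (100, 100)).fetchall()
for row in rows:
    print(row)
```

Adding votes or ratings means adding another UNION ALL branch with the same five output columns.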

'Likes' system database

I am developing a web application where I have to implement a 'Likes' system like Facebook's. The application will have a few categories of products that customers can 'like'. I have started to create the database, but I'm stuck on one obstacle. As I understand it, there are two ways of doing this:
First: create one table with the fields id, user_id, item_category, item_id. When a user clicks the 'like' button, the information is saved in this table, with the product category stored in item_category.
Second: create a separate table for each item category, for instance tbl_item_category_1, tbl_item_category_2, tbl_item_category_3, each with the fields user_id, item_id.
It would be great to get more insight about best practices for this kind of database structure. Which works faster, and which is more logical/practical? I will use only a few categories of items.
I would go with the first version with a table structure similar to this:
User Table: PK id
id
username
Category Table: PK id
id
categoryname
Like Table: composite PK (user_id, category_id)
user_id
category_id
Here is a SQL Fiddle with demo of table structure with two sample queries to give the Total Likes by user and Total Likes by category
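The fiddle itself isn't reproduced here, so the following is a hypothetical reconstruction of the same idea rather than its actual content: the user/category/like tables above plus the two totals, run with Python's sqlite3 in place of MySQL. The like table is named `likes` because LIKE is an SQL keyword; all rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT);
CREATE TABLE categories (id INTEGER PRIMARY KEY, categoryname TEXT);
-- "likes" rather than "Like", because LIKE is an SQL keyword
CREATE TABLE likes (
    user_id     INTEGER REFERENCES users(id),
    category_id INTEGER REFERENCES categories(id),
    PRIMARY KEY (user_id, category_id)   -- a user can like a category only once
);
INSERT INTO users VALUES (1, 'ann'), (2, 'bob');
INSERT INTO categories VALUES (1, 'books'), (2, 'music'), (3, 'games');
INSERT INTO likes VALUES (1, 1), (1, 2), (2, 2);
""")

# Total likes per user (LEFT JOIN keeps users with no likes, counted as 0).
by_user = conn.execute("""
    SELECT u.username, COUNT(l.category_id)
    FROM users u LEFT JOIN likes l ON l.user_id = u.id
    GROUP BY u.id, u.username ORDER BY u.id
""").fetchall()

# Total likes per category (categories nobody liked are counted as 0).
by_category = conn.execute("""
    SELECT c.categoryname, COUNT(l.user_id)
    FROM categories c LEFT JOIN likes l ON l.category_id = c.id
    GROUP BY c.id, c.categoryname ORDER BY c.id
""").fetchall()

print(by_user)
print(by_category)
```

Adding a category is a single INSERT into `categories`; neither query has to change.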
The second one - creating multiple tables is a terrible idea. If you have 50-100 categories trying to query those tables would be horrible. It would become completely unmanageable.
With multiple tables, getting the total number of likes would look something like this:
Select count(*)
from (
select user_id from category_1
union all
select user_id from category_2
union all
select user_id from category_3
union all .....
) as all_likes
Use one table, no question.
The first method is the correct one. Never make multiple tables for item categories, it makes maintaining your code a nightmare, and makes queries ugly.
In fact, the general rule is that anything that is dynamic (i.e. it changes) should not be stored as a set of static objects (e.g. tables). If you think you might add a new type of 'something' later on, then you need a 'something' types table.
For example, imagine trying to get a count of how many items a user has liked. With the first method, you can just do SELECT COUNT(*) FROM likes WHERE user_id = 123, but in the second method you'd need to do a JOIN or UNION, which is bad for performance and bad for maintainability.
The first method is the correct one, because you don't know how many categories you will have, and with one table per category it is very difficult to get the data out.

MySQL Multiple interests matching problem

I have a database where users enter their interests. I want to find people with matching interests.
The structure of the interest table is
interestid | username | hobby | location | level | matchinginterestids
Let's take two users to keep it simple.
User Joe may have 10 different interest records
User greg may have 10 different interest records.
I want to do the following algorithm
Take Joe's interest record 1 and look for matching hobbies and locations in the interest table. Put any matching interest ids in the matchinginterestids field. Then go to Joe's interest record 2, etc.
I guess what I need is some sort of for loop that goes through all of Joe's interests and does an update each time it finds a match in the interest table. Is that even possible in MySQL?
Further example:
I am Dan. I have 3 interests. Each interest is composed of 3 subjects:
Dan cats,nutrition,hair
Dan superlens,dna,microscopes
Dan film,slowmotion,fightscenes
Other people may have other interests
Joe:
Joe cats,nutrition,strength
Joe superlens,dna,microscopes
Moe
Moe mysql,queries,php
Moe film,specialfx,cameras
Moe superlens,dna,microscopes
Now I want the query to return the following when I log in as Dan:
Here are your interest matches:
--- is interested in cats nutrition hair
Joe is interested in cats and nutrition
Joe and Moe are interested in superlens, dna, microscopes
Moe is interested in film
The query needs to iterate through all Dan's interests, and compare 3,2,1 subject matches.
I could do this in PHP with a loop, but it would be calling the database all the time to get the results. I was wondering if there's a crafty way to do it with a single query, or maybe 3 separate queries: one looking for 3 matches, one for 2 and one for 1.
This is definitely possible with MySQL, but I think you may be going about it in an awkward way. I would begin by structuring the tables as follows:
TABLE Users ( userId, username, location )
TABLE Interests( interestId, hobby )
TABLE UserInterests( userId, interestId, level )
When a user adds an interest, if it hasn't been added before, you add it to the Interests table, and then add it to the UserInterests table. When you want to check for other nearby folks with similar interests, you can simply query the UserInterests table for other people who have similar interests, which has all that information for you already:
SELECT DISTINCT userId
FROM UserInterests
WHERE interestId IN (
SELECT interestId
FROM UserInterests
WHERE userId = $JoesID
)
AND userId <> $JoesID
This can probably be done in a more elegant fashion without subqueries, but it's what I thought of now.
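One way to do it without the subquery is to self-join UserInterests and count how many interests each other user shares, which also gets at the question's wish to rank 3/2/1-subject matches. A sketch with Python's sqlite3 as a stand-in for MySQL; the helper name `matches_for` and all rows are hypothetical, and each subject is stored as its own interest row rather than in the question's three-subjects-per-record layout.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Users (userId INTEGER PRIMARY KEY, username TEXT, location TEXT);
CREATE TABLE Interests (interestId INTEGER PRIMARY KEY, hobby TEXT UNIQUE);
CREATE TABLE UserInterests (userId INTEGER, interestId INTEGER, level INTEGER,
                            PRIMARY KEY (userId, interestId));
-- Hypothetical sample data
INSERT INTO Users VALUES (1, 'Dan', 'X'), (2, 'Joe', 'X'), (3, 'Moe', 'X');
INSERT INTO Interests VALUES (1, 'cats'), (2, 'nutrition'), (3, 'superlens'),
                             (4, 'dna'), (5, 'film'), (6, 'mysql');
INSERT INTO UserInterests VALUES
    (1, 1, 1), (1, 2, 1), (1, 3, 1), (1, 4, 1), (1, 5, 1),   -- Dan
    (2, 1, 1), (2, 2, 1), (2, 3, 1), (2, 4, 1),              -- Joe
    (3, 3, 1), (3, 5, 1), (3, 6, 1);                         -- Moe
""")

def matches_for(conn, user_id):
    """Other users ranked by how many interests they share with user_id."""
    sql = """
        SELECT u.username, COUNT(*) AS shared
        FROM UserInterests mine
        JOIN UserInterests theirs
          ON theirs.interestId = mine.interestId
         AND theirs.userId <> mine.userId        -- exclude the user themselves
        JOIN Users u ON u.userId = theirs.userId
        WHERE mine.userId = ?
        GROUP BY u.userId, u.username
        ORDER BY shared DESC, u.username
    """
    return conn.execute(sql, (user_id,)).fetchall()

print(matches_for(conn, 1))
```

A single query returns every match count at once, so the 3/2/1 split can be done by reading the `shared` column rather than running three queries.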
At daniel's special request (although it's something of a duplicate):
The schema explained
TABLE User (id, username, location )
TABLE Interests(id, hobby )
TABLE UserInterest(userId, interestId, level )
Table users has just user data and a primary key field at the start: id.
The primary key field is a pure link field, the other fields are info fields.
Table Interests again has a primary key that is used to link against, and some info fields
(well, just one, but that's because this is an example).
Note that users and interests are not linked in any way whatsoever.
That's odd, why is that?
Well, there is a problem: one user can have multiple interests, and interests can belong to multiple people.
We could try to solve this by changing the users table like so:
TABLE users (id, username, location, intrest1, intrest2, intrest3)
But this is a bad, really really bad idea, because:
This way only 3 interests per user are allowed
It's a waste of space if many users have 2, 1 or no interests
And most important, it makes queries difficult to write.
Example query for linking with the bad users table
SELECT * FROM user
INNER JOIN interests ON (user.intrest1 = interests.id) or
(user.intrest2 = interests.id) or
(user.intrest3 = interests.id);
And that's just for a simple query listing all users and their interests.
It quickly gets horribly complex as things progress.
many-to-many relationships
The solution to the problem of a many to many relationship is to use a link table.
This reduces the many-to-many relationship into two 1-to-many relationships.
A: 1 user to many userInterest rows
B: 1 interest to many userInterest rows
Example query using a link-table
SELECT * FROM user
INNER JOIN userInterest ON (user.id = userInterest.userID) -- many-to-1
INNER JOIN interest ON (interest.id = userInterest.InterestID); -- many-to-1
Why is this better?
Unlimited number of interests per user and vice versa
No wasted space if a user has a boring life and few if any interests
Queries are simpler to maintain
Making it interesting
Just listing all users is not much fun, because then we still have to process the data in PHP or whatever. But there's no need to do that: SQL is a query language, after all, so let's ask a question:
Give all users that share an interest with user Moe.
OK, let's make a cookbook and gather our ingredients. What do we need?
Well, we have a user "Moe" and we have other users: everybody but "Moe".
And we have the interests shared between them.
And we'll need the link table userInterest as well because that's the way we link user and interests.
Let's first list all of Moe's hobbies:
SELECT i_Moe.hobby FROM interests AS i_Moe
INNER JOIN userInterests as ui2 ON (ui2.InterestID = i_Moe.id)
INNER JOIN user AS u_Moe ON (u_Moe.id = ui2.UserID)
WHERE u_Moe.username = 'Moe';
Now we combine the select for all users against only Moe's hobbies.
SELECT u_Others.username FROM interests AS i_Others
INNER JOIN userinterests AS ui1 ON (ui1.interestID = i_Others.id)
INNER JOIN user AS u_Others ON (ui1.UserID = u_Others.id)
/*up to this point this query is a list of all interests of all users*/
INNER JOIN Interests AS i_Moe ON (i_Moe.Hobby = i_Others.hobby)
/*Here we link Moe's hobbies to other people's hobbies*/
INNER JOIN userInterests as ui2 ON (ui2.InterestID = i_Moe.id)
INNER JOIN user AS u_Moe ON (u_Moe.id = ui2.UserID)
/*And using the link table we link Moe's hobbies to Moe*/
WHERE u_Moe.username = 'Moe'
/*We limited user-u_moe to only 'Moe'*/
AND u_Others.username <> 'Moe';
/*and the rest to everybody except 'Moe'*/
Because we are using INNER JOINs on link fields, only matches are considered and non-matches are thrown out.
If you read the query in English, it goes like this:
Consider all users who are not Moe; call them u_Others.
Consider user Moe; call him u_Moe.
Consider Moe's hobbies; call those i_Moe.
Consider the other users' hobbies; call those i_Others.
Now link the i_Others hobbies to Moe's hobbies.
Return only users from u_Others who have a hobby that matches one of Moe's.
Hope this helps.
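Running the Moe query against a tiny dataset shows one detail worth knowing: a user who shares several hobbies with Moe is returned once per shared hobby, so adding DISTINCT gives one row per user. A sketch with Python's sqlite3 as a stand-in for MySQL; the table and alias names follow the answer, and the rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE user (id INTEGER PRIMARY KEY, username TEXT, location TEXT);
CREATE TABLE interests (id INTEGER PRIMARY KEY, hobby TEXT UNIQUE);
CREATE TABLE userInterests (UserID INTEGER, InterestID INTEGER, level INTEGER);
-- Hypothetical sample data
INSERT INTO user VALUES (1, 'Moe', 'X'), (2, 'Joe', 'X'), (3, 'Dan', 'X');
INSERT INTO interests VALUES (1, 'film'), (2, 'dna'), (3, 'cats');
INSERT INTO userInterests VALUES
    (1, 1, 1), (1, 2, 1),     -- Moe: film, dna
    (2, 1, 1), (2, 2, 1),     -- Joe: film, dna
    (3, 3, 1);                -- Dan: cats
""")

sql = """
SELECT DISTINCT u_Others.username   -- one row per user, not per shared hobby
FROM interests AS i_Others
INNER JOIN userInterests AS ui1 ON (ui1.InterestID = i_Others.id)
INNER JOIN user AS u_Others ON (ui1.UserID = u_Others.id)
INNER JOIN interests AS i_Moe ON (i_Moe.hobby = i_Others.hobby)
INNER JOIN userInterests AS ui2 ON (ui2.InterestID = i_Moe.id)
INNER JOIN user AS u_Moe ON (u_Moe.id = ui2.UserID)
WHERE u_Moe.username = 'Moe'
  AND u_Others.username <> 'Moe'
"""
print(conn.execute(sql).fetchall())
```

Without DISTINCT the same query would list Joe twice, once for film and once for dna.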