I'm working on a JSON API. I'm aiming for speed, and for making as few queries as possible by joining related data.
I can do joins, but I'm confused about something: how do I join multiple tables that can each return n records? For example, let's say I have the following tables:
- Users
- Addresses
- Orders
I want to get user 5 from the database, along with their addresses and orders, in one query.
Joining Users and Addresses would return all the addresses the user has, each address as a row, together with the Users columns. But when you add another table that can also return n results, how does the database return that?
I hope this isn't too confusing. I struggled to put it into better words.
If you JOIN table A to table B ON A.UserID = B.UserID, and B.UserID is not unique, the result has one row per matching row in B, and the fields selected from A are repeated on each of them.
As an example, if you run:
SELECT A.Name, B.Address FROM users A INNER JOIN addresses B ON A.UserID = B.UserID
and addresses contains 3 rows for UserID 1 (let's call him Max), then the output would be:
Max | 123 Fake St.
Max | 456 Real St.
Max | 789 Imaginary St.
The same applies if you add a third table: each row produced by joining A and B is repeated for every match in table C, so a user with 3 addresses and 4 orders comes back as 3 × 4 = 12 rows.
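The multiplication is easy to see with a throwaway database. The sketch below uses Python's built-in sqlite3 module rather than the asker's actual database (which is not named), and the orders table with its OrderNo column is made up for illustration. With 3 addresses and 2 orders for the same user, the double join yields 3 × 2 = 6 rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users     (UserID INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE addresses (UserID INTEGER, Address TEXT);
    CREATE TABLE orders    (UserID INTEGER, OrderNo TEXT);  -- hypothetical table
    INSERT INTO users VALUES (1, 'Max');
    INSERT INTO addresses VALUES
        (1, '123 Fake St.'), (1, '456 Real St.'), (1, '789 Imaginary St.');
    INSERT INTO orders VALUES (1, 'A-1'), (1, 'A-2');
""")

rows = conn.execute("""
    SELECT A.Name, B.Address, C.OrderNo
    FROM users A
    INNER JOIN addresses B ON A.UserID = B.UserID
    INNER JOIN orders    C ON A.UserID = C.UserID
    ORDER BY B.Address, C.OrderNo
""").fetchall()

# Every address is paired with every order of the same user.
for row in rows:
    print(row)
```

If the API needs addresses and orders as two independent lists, this cross-multiplied result has to be de-duplicated in application code, which is one reason separate queries per child table are often simpler.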
Got 2 tables - contacts and messages:
contact_id | contact_email
1 | some#mail.com
2 | other#mail.com
3 | no#nono.com
message_id | message_recipients
1 | 1,2,3
2 | 3
The message_recipients field contains the ID(s) of the contact(s) the message was assigned to. Each message can have one or more IDs assigned, separated by commas.
I need to show all contacts, and the count of messages assigned to each contact. Since the message_recipients field may contain multiple IDs, I can't run a query like SELECT * FROM contacts, messages WHERE contacts.contact_id=messages.message_recipients because it won't work properly.
If I run SELECT * FROM contacts FULL JOIN messages, it returns many duplicated rows from the contacts table. Sure, I can run SELECT * FROM contacts FULL JOIN messages GROUP BY contact_id, but that returns only the first message from the messages table.
I know that in order to count how many messages are assigned to each contact, I will probably need to explode the message_recipients field from each row into an array and use code like if (in_array($contact_id, $message_recipients_array)) {$total++;} or similar. Now my main concern is how to get all I need with as simple a query as possible.
Fix your table structure. Do not store multiple values in one cell. See Normalization
For now, you can use FIND_IN_SET:
select c.contact_id,
c.contact_email,
count(*) no_of_messages
from messages m
join contacts c on find_in_set(c.contact_id, m.message_recipients) > 0
group by c.contact_id,
c.contact_email
But this will be slow, as it can't use any index on contact_id or message_recipients.
To actually fix the issue, don't store the recipient IDs in the messages table.
Instead, store one recipient per row in a separate mapping table that models the many-to-many relation, with (maybe) the following structure:
create table messages_recipients (
id int primary key,
message_id int,
message_recipient_id int,
foreign key (message_id) references messages(message_id),
foreign key (message_recipient_id) references contacts(contact_id)
)
Then all you have to do is:
select c.contact_id,
c.contact_email,
count(*) no_of_messages
from messages_recipients m
join contacts c on c.contact_id = m.message_recipient_id
group by c.contact_id,
c.contact_email
This query is sargable (it can use an index on the join column) and will be faster.
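A minimal sketch of that migration, using Python's built-in sqlite3 module as a stand-in for MySQL (an assumption; the DDL differs slightly between the two). It splits each comma-separated list into rows of the mapping table, then counts with a LEFT JOIN starting from contacts. That is a variation on the query above, chosen so that contacts with no messages still appear with a count of 0. Contact 4 is made up to show that case.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE contacts (contact_id INTEGER PRIMARY KEY, contact_email TEXT);
    CREATE TABLE messages (message_id INTEGER PRIMARY KEY, message_recipients TEXT);
    CREATE TABLE messages_recipients (
        id INTEGER PRIMARY KEY,
        message_id INTEGER REFERENCES messages(message_id),
        message_recipient_id INTEGER REFERENCES contacts(contact_id)
    );
    INSERT INTO contacts VALUES (1, 'some#mail.com'), (2, 'other#mail.com'),
                                (3, 'no#nono.com'), (4, 'extra#example.com');
    INSERT INTO messages VALUES (1, '1,2,3'), (2, '3');
""")

# One-off migration: one mapping row per (message, recipient) pair.
pairs = [
    (message_id, int(contact_id))
    for message_id, recipients in conn.execute(
        "SELECT message_id, message_recipients FROM messages").fetchall()
    for contact_id in recipients.split(",")
]
conn.executemany(
    "INSERT INTO messages_recipients (message_id, message_recipient_id) VALUES (?, ?)",
    pairs,
)

# COUNT(m.message_id) ignores NULLs, so contacts without messages get 0.
counts = conn.execute("""
    SELECT c.contact_id, c.contact_email, COUNT(m.message_id) AS no_of_messages
    FROM contacts c
    LEFT JOIN messages_recipients m ON m.message_recipient_id = c.contact_id
    GROUP BY c.contact_id, c.contact_email
    ORDER BY c.contact_id
""").fetchall()

for row in counts:
    print(row)
```

An index on messages_recipients(message_recipient_id) would be the natural next step once the table grows.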
Fix your data structure! Storing IDs in strings is a really bad idea. Why?
- Numbers should be stored as numbers, not strings.
- SQL does not offer very good string functions.
- Foreign key constraints should be properly expressed.
- The query optimizer cannot use indexes or partitions.
- SQL has a great method for storing lists: it is called a "table".
Sometimes, we are stuck with other people's really, really bad design decisions. MySQL does offer a method for doing what you want, find_in_set(). This is a hack to get around the shortcomings of a bad data layout:
select . . .
from contacts c join
messages m
on find_in_set(c.contact_id, m.message_recipients) > 0
I'm trying to do things a bit differently with a database. I have a table called "services", which consists of pID, uID, serviceID.
Then I have a table called "user_profile", which of course has the same uID as used in the services table.
So a user can have multiple services, let's say
pID uID serviceID
1 1 101
2 1 102
3 1 104
4 2 105
So how do I join this to my user_profile data? I'm a bit confused about that.
Let's say somebody visits the profile with uID 1.
Then I need all the services too, in the same SQL call, if that's possible somehow?
Hope I make a bit of sense.
To relate tables in SQL, both tables must have a column holding the same values; in your example that is uID.
Then you write something like:
select a.uID, b.pID, b.serviceID from user_profile a left join services b on a.uID = b.uID
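To see what such a query returns for a single visited profile, here is a sketch using Python's built-in sqlite3 module (an assumption; the asker's database is not named). The name column on user_profile, the profile with uID 3, and the services_for helper are made up for illustration. Adding WHERE a.uID = ? restricts the result to one profile, and the LEFT JOIN keeps the profile row even when the user has no services.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE user_profile (uID INTEGER PRIMARY KEY, name TEXT);  -- name is hypothetical
    CREATE TABLE services (pID INTEGER PRIMARY KEY, uID INTEGER, serviceID INTEGER);
    INSERT INTO user_profile VALUES (1, 'first'), (2, 'second'), (3, 'third');
    INSERT INTO services VALUES (1, 1, 101), (2, 1, 102), (3, 1, 104), (4, 2, 105);
""")

def services_for(uid):
    """Return (uID, name, pID, serviceID) rows for one profile."""
    return conn.execute("""
        SELECT a.uID, a.name, b.pID, b.serviceID
        FROM user_profile a
        LEFT JOIN services b ON a.uID = b.uID
        WHERE a.uID = ?
        ORDER BY b.pID
    """, (uid,)).fetchall()

print(services_for(1))  # one row per service of user 1
print(services_for(3))  # a single row; the service columns are None
```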
I have two tables:
1. Person
2. Record
Multiple rows from table person can be connected to one row from record.
But if I have one person that is connected to multiple records, what is the best way to model this?
Record no.1 = Person no.1 , Person no.2, Person no.3
Record no.2 = Person no.4, Person no.5, Person no.2
Is the best way to create a new table called relation with two columns, recordid and personid? It would look like this:
recordid | personid
1 | 1
1 | 2
1 | 3
2 | 4
2 | 5
2 | 2
I am doing this because if a user changes the data for one person, the change should also show up in the other records that have the same person attached.
Basically, I would find which rows from person are connected to rows from record by doing this:
SELECT `record`.`data` FROM `record` LEFT JOIN `relation` ON `record`.`id`=`relation`.`recordid` LEFT JOIN `person` ON `relation`.`personid`=`person`.`id`
I would use this kind of join for getting the data and for all the searches through records (by name, surname, and a lot of other parameters), and I could have more than 100k rows in the person table and more than 100k rows in the record table.
Is there some other simpler and faster way to do something like this?
Yes, this is the right usage.
However, depending on what you want, an INNER JOIN may be faster than a LEFT OUTER JOIN, particularly since you speak of more than 100k records. Of course, it would return only those records for which a matching entry exists.
Also, depending on what you do after running the query, you can write separate queries to find records that match and do not match and display accordingly.
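The difference in what the two join types return can be seen in a small sketch. It uses Python's built-in sqlite3 module (an assumption; the question appears to target MySQL), and both the name column and record 3, which has no persons attached, are made up for illustration. The composite primary key on relation is also an addition: it prevents duplicate pairs and gives the join an index on recordid.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person   (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE record   (id INTEGER PRIMARY KEY, data TEXT);
    CREATE TABLE relation (recordid INTEGER, personid INTEGER,
                           PRIMARY KEY (recordid, personid));
    INSERT INTO person VALUES (1,'P1'),(2,'P2'),(3,'P3'),(4,'P4'),(5,'P5');
    INSERT INTO record VALUES (1,'Record no.1'),(2,'Record no.2'),(3,'Record no.3');
    INSERT INTO relation VALUES (1,1),(1,2),(1,3),(2,4),(2,5),(2,2);
""")

# Same query twice; only the join type changes.
QUERY = """
    SELECT record.id, person.name
    FROM record
    {join} JOIN relation ON record.id = relation.recordid
    {join} JOIN person   ON relation.personid = person.id
    ORDER BY record.id, person.id
"""

inner_rows = conn.execute(QUERY.format(join="INNER")).fetchall()
left_rows = conn.execute(QUERY.format(join="LEFT")).fetchall()

print(len(inner_rows))  # only matched record/person pairs
print(left_rows[-1])    # the unmatched record, padded with None
```

Whether the INNER JOIN is actually faster depends on the optimizer and the indexes, so it is worth checking the query plan on real data.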
I have tables called: activities and deficiencies.
activities table contains all activities registered by students. deficiencies table contains all deficiencies that the students might get due to the registered activity.
Here are the table structures with sample data:
activities table
activityid title
-------------------------------------------------------------
1 Student Retreat
2 Student Orientation
deficiencies table
deficiencyid activity_id deficiency status
-------------------------------------------------------------
1 1 NARRATION CLEARED
2 1 PHOTO CLEARED
3 1 REPORT CLEARED
4 2 NARRATION WAITING
5 2 PHOTO CLEARED
6 2 REPORT WAITING
For each activity entry, there will be three rows in the deficiencies table. I want to be able to list each activity once if all the statuses of the items listed there are already CLEARED. So if one or more is still WAITING - they don't get listed in the query.
I was attempting to do this using the following query, but I had no luck:
SELECT * FROM deficiencies,activities WHERE status='CLEARED' AND activityid=activity_id AND COUNT(deficiencyid)=3 GROUP BY activity_id ORDER BY deficiencyid ASC
I was getting the following from MySQL:
Invalid use of group function
The output I was expecting is the first record in the activities table.
What could be the best solution using only one query without multiple SELECT in SELECT in another SELECT sub-queries? There will be thousands of records in the tables so I'm hoping that the most efficient query can be used.
If you want to do this with a JOIN:
SELECT A.activityid, A.title FROM activities A INNER JOIN deficiencies D
ON A.activityid = D.activity_id WHERE D.status = 'CLEARED'
GROUP BY A.activityid, A.title HAVING COUNT(*) = 3
This JOINs the activities and deficiencies tables, filters out the records other than CLEARED, groups by activity, and then filters out the groups that do not have exactly three records in them.
It requires that the data is guaranteed to be as you described it (always three deficiency records). I wrote the GROUP BY to avoid using the MySQL extension allowing non-grouped, non-aggregated columns to be selected. Also, I assume that there's also a studentid field involved which you left out for the sake of clarity, otherwise this whole system will support only a single student.
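If the three-rows-per-activity guarantee cannot be relied on, one alternative (not part of the answer above) is to count the rows that are not CLEARED per activity and require that count to be zero. The sketch below runs this against the sample data using Python's built-in sqlite3 module, which is an assumption since the question is about MySQL; the CASE expression itself is standard SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE activities (activityid INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE deficiencies (deficiencyid INTEGER PRIMARY KEY, activity_id INTEGER,
                               deficiency TEXT, status TEXT);
    INSERT INTO activities VALUES (1, 'Student Retreat'), (2, 'Student Orientation');
    INSERT INTO deficiencies VALUES
        (1, 1, 'NARRATION', 'CLEARED'), (2, 1, 'PHOTO', 'CLEARED'),
        (3, 1, 'REPORT', 'CLEARED'),
        (4, 2, 'NARRATION', 'WAITING'), (5, 2, 'PHOTO', 'CLEARED'),
        (6, 2, 'REPORT', 'WAITING');
""")

# No WHERE filter: every deficiency row is kept, and HAVING rejects any
# activity that still has at least one row that is not CLEARED.
cleared = conn.execute("""
    SELECT A.activityid, A.title
    FROM activities A
    INNER JOIN deficiencies D ON A.activityid = D.activity_id
    GROUP BY A.activityid, A.title
    HAVING SUM(CASE WHEN D.status <> 'CLEARED' THEN 1 ELSE 0 END) = 0
""").fetchall()

print(cleared)
```

Note that an activity with no deficiency rows at all is excluded by the INNER JOIN in both variants.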
I've got 3 tables that are something like this (simplified here ofc):
users
  user_id
  user_name
info
  info_id
  user_id
  rate
contacts
  contact_id
  user_id
  contact_data
users has a one-to-one relationship with info, although info doesn't always have a related entry.
users has a one-to-many relationship with contacts, although contacts doesn't always have related entries.
I know I can grab the proper 'users' + 'info' with a left join, is there a way to get all the data I want at once?
For example, one returned record might be:
user_id: 5
user_name: tom
info_id: 1
rate: 25.00
contact_id: 7
contact_data: 555-1212
contact_id: 8
contact_data: 555-1315
contact_id: 9
contact_data: 555-5511
Is this possible with a single query? Or must I use multiple?
It is possible to do what you're asking in one query, but you'd either need a variable number of columns which is evil because SQL isn't designed for that, or you'd have to have a fixed number of columns, which is even more evil because there is no sensible fixed number of columns you could choose.
I'd suggest using one of two alternatives:
1. Return one row for each contact data, repeating the data in other columns:
5 tom 1 25.00 7 555-1212
5 tom 1 25.00 8 555-1315
5 tom 1 25.00 9 555-5511
The problem with this of course is that redundant data is normally a bad idea, but if you don't have too much redundant data it will be OK. Use your judgement here.
2. Use two queries. This means a slightly longer turnaround time, but less data to transfer.
In most cases I'd prefer the second solution.
You should try to avoid making a large number of queries inside a loop. This can almost always be rewritten to a single query. But if using two queries is the most natural way to solve your problem, just use two queries. Don't try to cram all the data you need into a single query just for the sake of reducing the number of queries.
Every row of a result set must have the same columns, so you can't return several contact rows for a user without repeating the other columns on each of them.
Hopefully, this query would achieve what you need:
SELECT
u.user_id as user_id,
u.user_name as user_name,
i.info_id as info_id,
i.rate as rate,
c.contact_id as contact_id,
c.contact_data as contact_data
FROM users as u
LEFT JOIN info as i ON i.user_id = u.user_id
LEFT JOIN contacts as c ON c.user_id = u.user_id
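Because a query like this repeats the user and info columns on every contact row, the application usually has to fold the rows back into one object per user. Here is a sketch of that step in Python with the built-in sqlite3 module (an assumption; the question does not name a language or database). It uses the sample values from the question plus a made-up user 'ann' without info or contacts, and the shape of the resulting dictionary is just one possible choice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # allows access to columns by name
conn.executescript("""
    CREATE TABLE users    (user_id INTEGER PRIMARY KEY, user_name TEXT);
    CREATE TABLE info     (info_id INTEGER PRIMARY KEY, user_id INTEGER, rate REAL);
    CREATE TABLE contacts (contact_id INTEGER PRIMARY KEY, user_id INTEGER,
                           contact_data TEXT);
    INSERT INTO users VALUES (5, 'tom'), (6, 'ann');  -- 'ann' is made up
    INSERT INTO info VALUES (1, 5, 25.00);
    INSERT INTO contacts VALUES (7, 5, '555-1212'), (8, 5, '555-1315'),
                                (9, 5, '555-5511');
""")

rows = conn.execute("""
    SELECT u.user_id, u.user_name, i.info_id, i.rate, c.contact_id, c.contact_data
    FROM users AS u
    LEFT JOIN info AS i ON i.user_id = u.user_id
    LEFT JOIN contacts AS c ON c.user_id = u.user_id
    ORDER BY u.user_id, c.contact_id
""")

users = {}
for row in rows:
    # The first row seen for a user creates the entry; later rows reuse it.
    user = users.setdefault(row["user_id"], {
        "user_id": row["user_id"],
        "user_name": row["user_name"],
        "info_id": row["info_id"],
        "rate": row["rate"],
        "contacts": [],
    })
    # A user without contacts still yields one row, with NULL contact columns.
    if row["contact_id"] is not None:
        user["contacts"].append(
            {"contact_id": row["contact_id"], "contact_data": row["contact_data"]})

print(users[5])
```

This works cleanly because only one of the joined tables is one-to-many; with two such tables the rows multiply and each child list would need de-duplication.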