So, I'm working on my outer join skills with pure SQL (MySQL is the DBMS I'm using). I've got four tables: People, Address, Phone, Email. All are pretty self-explanatory. People is the main table; every record in the other three contains a foreign key that references the People table.
I want to write a query that will do outer joins on all four tables. Ideally, here is what I want the results to look like:
Name  Address     Phone         Email
Bob   5 Steel Dr  512-222-1358  bob#gmail.com
Bob               212-333-4444  bob#aol.com
Bob                             bob#hotmail.com
The idea is to have the address/phone/email records not repeat at all. The info from the People table can repeat, that's fine, but I would be trying to avoid repeats from the others.
Here is a very, very crude version of what I'm going for.
select p.name, e.email, a.address
from people p
left join email e on p.pid = e.pid
left join address a on p.pid = a.pid
where p.pid = 1;
It doesn't work, though. It repeats all the records that match each other.
Is there any way to get the effect I'm going for? I'm a bit rusty on outer joins.
Not without jumping through a lot of unreliable "hoops", usually involving session variables.
You are overlooking that the result set you want implies relationships that are just not there. 5 Steel Dr, 512-222-1358, and bob#gmail.com have no direct connection. Your results slightly imply the phone number may be present at the address. What you have are 3 lists that happen to be aligned side by side.
A query that outputs this kind of result is the equivalent of giving each row a "line number".
There is no simple solution to get the output like you describe in a single SQL query.
Which shouldn't be surprising, because that output violates Fourth Normal Form.
Honestly, I would run three queries, one for People joined to Address, one for People joined to Phone, and one for People joined to Email.
select p.name, a.address
from people p
inner join address a on p.pid = a.pid
where p.pid = 1;
select p.name, ph.phone
from people p
inner join phone ph on p.pid = ph.pid
where p.pid = 1;
select p.name, e.email
from people p
inner join email e on p.pid = e.pid
where p.pid = 1;
Then combine the results however you want in application code.
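A minimal sketch of the "combine in application code" step, in Python with the standard-library sqlite3 module standing in for MySQL. The table and column names follow the question; the sample rows and the helpers `fetch_list` and `side_by_side` are made up for illustration, and the child tables are queried directly rather than through a join to people.

```python
import sqlite3
from itertools import zip_longest

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE people  (pid INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE address (pid INTEGER, address TEXT);
    CREATE TABLE phone   (pid INTEGER, phone TEXT);
    CREATE TABLE email   (pid INTEGER, email TEXT);
    INSERT INTO people  VALUES (1, 'Bob');
    INSERT INTO address VALUES (1, '5 Steel Dr');
    INSERT INTO phone   VALUES (1, '512-222-1358'), (1, '212-333-4444');
    INSERT INTO email   VALUES (1, 'bob#gmail.com'), (1, 'bob#aol.com'),
                               (1, 'bob#hotmail.com');
""")

def fetch_list(sql, pid):
    """Run a one-column query and return its values as a plain list."""
    return [row[0] for row in conn.execute(sql, (pid,))]

def side_by_side(pid):
    """Run the three independent queries and align the lists row by row."""
    name = conn.execute(
        "SELECT name FROM people WHERE pid = ?", (pid,)).fetchone()[0]
    addresses = fetch_list(
        "SELECT address FROM address WHERE pid = ? ORDER BY rowid", pid)
    phones = fetch_list(
        "SELECT phone FROM phone WHERE pid = ? ORDER BY rowid", pid)
    emails = fetch_list(
        "SELECT email FROM email WHERE pid = ? ORDER BY rowid", pid)
    # zip_longest pads the shorter lists, which produces the blank cells.
    return [(name, a, p, e)
            for a, p, e in zip_longest(addresses, phones, emails, fillvalue="")]

rows = side_by_side(1)
for row in rows:
    print(row)
```

The alignment is purely positional, which is the point made above: the address, phone, and email on one output line have no relationship beyond sharing a line number.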
I have an elementary question about an SQL query that joins the same table twice. It sounds very simple, but I'm having trouble with it. I hope someone can help me with this issue :)
I have two small tables: "people" (columns: id, name, ...) and "likes" (id, who, whom). People can "like" each other, so the relationship is many-to-many.
I want to get a table of likes per person: the count of received likes, the count of delivered likes, and the count of mutual likes.
Everything works when I use only one join. But with two or more joins, MySQL combines all the rows (as expected) and I get wrong values in the counts. I don't know how to use count/sum/group by in this case :( I would like to do this in one query, without subqueries.
I used a query like this:
SELECT *, count(l1.whom), count(l2.whom)
FROM people p
LEFT JOIN likes l1 ON l1.who = p.id
LEFT JOIN likes l2 ON l2.whom = p.id
GROUP BY p.id;
SELECT p.id, name,
count(lwho.who) delivered_likes,
count(lwhom.whom) received_likes,
count(lmut.who) mutual_likes
FROM people AS p
LEFT JOIN likes AS lwho ON p.id = lwho.who
LEFT JOIN likes AS lwhom ON lwhom.id = lwho.id
LEFT JOIN likes AS lmut ON lwhom.who = lmut.whom AND lwhom.whom = lmut.who
GROUP BY p.id;
But it calculates the like counts incorrectly.
This is just a training exercise and performance is not important, but I suspect that three joins in my last query is too many. Can I do it using two joins?
Thanks in advance for help.
I surmise that there is a 1:N relationship between people and likes.
One problem with your second query, as far as I can tell, is that the lwhom correlation of likes is joined to lwho via id=id. Basically lwhom is lwho. I'd recommend changing the ON clause for this correlation from lwhom.id = lwho.id to p.id = lwhom.whom.
The counts will still be affected by the JOINs, however. Supposing that you have an ID column in the likes table, you could have each COUNT tally the distinct like IDs per person. If not, consider COUNT(DISTINCT correlation.who, correlation.whom) instead, which MySQL accepts with several expressions.
Digressions aside, the following should hopefully work:
SELECT p.id, name,
count(distinct lwho.id) delivered_likes,
count(distinct lwhom.id) received_likes,
count(distinct lmut.id) mutual_likes
FROM people AS p
LEFT JOIN likes AS lwho ON p.id = lwho.who
LEFT JOIN likes AS lwhom ON p.id = lwhom.whom
LEFT JOIN likes AS lmut ON lwhom.who = lmut.whom AND lwhom.whom = lmut.who
GROUP BY p.id,p.name;
I have an SQL Fiddle here.
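To see why the plain counts inflate and what COUNT(DISTINCT ...) changes, here is a small sketch in Python using the standard-library sqlite3 module as a stand-in for MySQL. The three people and five likes are invented sample data; the two queries are the asker's first query (with an explicit column list) and the corrected one from this answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE likes  (id INTEGER PRIMARY KEY, who INTEGER, whom INTEGER);
    INSERT INTO people VALUES (1, 'Ann'), (2, 'Ben'), (3, 'Cat');
    -- Ann likes Ben and Cat, both like her back, and Ben also likes Cat.
    INSERT INTO likes (who, whom) VALUES (1, 2), (1, 3), (2, 1), (3, 1), (2, 3);
""")

# Two joins on likes multiply each other's rows before COUNT runs.
naive = conn.execute("""
    SELECT p.id, COUNT(l1.whom), COUNT(l2.whom)
    FROM people p
    LEFT JOIN likes l1 ON l1.who = p.id
    LEFT JOIN likes l2 ON l2.whom = p.id
    GROUP BY p.id
    ORDER BY p.id
""").fetchall()

# Counting distinct like ids is immune to the row multiplication.
fixed = conn.execute("""
    SELECT p.id, p.name,
           COUNT(DISTINCT lwho.id)  AS delivered_likes,
           COUNT(DISTINCT lwhom.id) AS received_likes,
           COUNT(DISTINCT lmut.id)  AS mutual_likes
    FROM people AS p
    LEFT JOIN likes AS lwho  ON p.id = lwho.who
    LEFT JOIN likes AS lwhom ON p.id = lwhom.whom
    LEFT JOIN likes AS lmut  ON lwhom.who = lmut.whom AND lwhom.whom = lmut.who
    GROUP BY p.id, p.name
    ORDER BY p.id
""").fetchall()

print(naive)  # Ann shows 4 and 4 although she sent and received 2 each
print(fixed)
```

Ann delivers two likes and receives two, so the double join produces 2 x 2 = 4 rows for her and both plain counts report 4; the DISTINCT version reports 2, 2, and 2 mutual.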
I have a printed table here, and I'm trying to write a query that joins the tables so that Tech_id, clients_id, job_id, and part_id are replaced by the corresponding values from their own tables.
Here is my query:
SELECT * FROM work_orders, technicians as tech, parts_list as parts, job_types as job, clients as client
LEFT JOIN technicians ON tech_id = technicians.tech_name
LEFT JOIN parts_list ON part_id = parts_list.Part_Name
LEFT JOIN job_types ON job_id = job_types.Job_Name
LEFT JOIN clients ON clients_id = clients.client_name
I've tried several variations. This one seems to be syntactically correct, but now I'm getting: Column 'clients_id' in on clause is ambiguous
I'm sure it will happen not only for clients but for the others as well. I want to be able to print the table as in the picture above, but with the clients listed. Can this be done in one query? Thanks.
You have two problems.
First (this might not be your problem, but it's "good practice"), you shouldn't use SELECT *, as you could indeed have a field with the same name in different tables.
This is one of the many good reasons to avoid * in a SELECT clause.
Then, your main problem is that you list the tables in your FROM clause and then list them again in the joins, so every table appears twice.
Problematic line :
FROM work_orders, technicians as tech, parts_list as parts, job_types as job, clients as client
So (I don't know your table structure, so there may be errors, but you get the idea):
SELECT
w.clients_id,
t.tech_name
--etc
FROM work_orders w
LEFT JOIN technicians t ON w.tech_id = t.tech_name
LEFT JOIN parts_list p ON w.part_id = p.Part_Name
LEFT JOIN job_types j ON w.job_id = j.Job_Name
LEFT JOIN clients c ON w.clients_id = c.client_name
This means that clients_id exists in multiple tables. You need to specify which one you want. So if you for example want the clients_id of the clients table, do SELECT clients.clients_id
If all the fields listed in your question are in the clients table, you could do the following (assuming the *_id columns live in work_orders):
SELECT clients.*
FROM work_orders
LEFT JOIN technicians ON work_orders.tech_id = technicians.tech_name
LEFT JOIN parts_list ON work_orders.part_id = parts_list.Part_Name
LEFT JOIN job_types ON work_orders.job_id = job_types.Job_Name
LEFT JOIN clients ON work_orders.clients_id = clients.client_name
I'm sure this has been asked before, and I apologize for asking it again, but I've done some searching probably for the wrong terms and just haven't been able to find the right approach.
I have two tables. Websites is a list of web sites (id | website); cdd is a list of users and the site they were referred from (userid | website | etc..). Not every site that referred users is a sponsor, so not every referring site is in the websites table. Also, not every site that is a sponsor has sent us users. I need a list of the number of users from each sponsor, including the 0s.
This is the query I have so far:
SELECT w.website, COUNT(*) AS num FROM websites w LEFT JOIN cdd c ON w.website = c.website WHERE c.submitted LIKE '05/26/11 %' GROUP BY w.website ORDER BY num DESC;
There are five sites in the website table, but one has not sent any users. That one does not show up in the queries.
Any thoughts?
The LEFT JOIN is correct, but because you are specifying a WHERE clause against the table (cdd) on the right side of the join, you are filtering out websites that have no associated cdd record. You need to specify that criterion like so:
[...]
FROM websites w
LEFT JOIN cdd c ON w.website = c.website
WHERE c.submitted IS NULL OR c.submitted LIKE '05/26/11 %'
[...]
which includes the websites that don't join to any cdd record (but still drops a website whose only cdd records fall on other dates), or
[...]
FROM websites w
LEFT JOIN cdd c ON w.website = c.website AND c.submitted LIKE '05/26/11 %' -- replaces the WHERE clause
[...]
which includes all websites, but only joins to cdds with the matching submitted date.
Note: To ensure that sites with no associated users return a count of 0, you may also need to COUNT() a column from cdd, rather than *...
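The difference between the three variants is easiest to see on a few rows. The sketch below uses Python's standard-library sqlite3 module in place of MySQL; the site names and dates are invented, and it assumes cdd.userid is the column to count.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE websites (id INTEGER PRIMARY KEY, website TEXT);
    CREATE TABLE cdd (userid INTEGER PRIMARY KEY, website TEXT, submitted TEXT);
    INSERT INTO websites (website) VALUES ('a.example'), ('b.example'), ('c.example');
    INSERT INTO cdd (website, submitted) VALUES
        ('a.example',     '05/26/11 09:00'),
        ('a.example',     '05/26/11 10:30'),
        ('b.example',     '05/25/11 17:45'),   -- sponsor, but wrong day
        ('other.example', '05/26/11 12:00');   -- not a sponsor
""")

def counts(tail, counted="c.userid"):
    """Run the sponsor count with the given join/filter tail (hypothetical helper)."""
    sql = f"""
        SELECT w.website, COUNT({counted}) AS num
        FROM websites w
        LEFT JOIN cdd c ON w.website = c.website {tail}
        GROUP BY w.website
        ORDER BY w.website
    """
    return conn.execute(sql).fetchall()

# Original query: the WHERE clause discards the NULL-extended rows.
where_only = counts("WHERE c.submitted LIKE '05/26/11 %'", counted="*")

# First fix: keeps sites with no cdd rows, loses sites with rows on other days.
where_or_null = counts(
    "WHERE c.submitted IS NULL OR c.submitted LIKE '05/26/11 %'")

# Second fix: the date filter is part of the join condition.
on_filter = counts("AND c.submitted LIKE '05/26/11 %'")

print(where_only)
print(where_or_null)
print(on_filter)
```

Only the version with the date condition in the ON clause lists all three sponsors, and counting c.userid rather than * is what makes the unmatched ones come out as 0.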
I need a list of the number of users from each sponsor, including the 0s.
In that case you probably should be using a left join:
SELECT a.site, COUNT(b.ref_site) AS num FROM table1 a LEFT JOIN table2 b ON a.site = b.ref_site GROUP BY a.site;
Make the INNER JOIN a LEFT JOIN instead and I think you're good to go.
SELECT a.site, COUNT(b.ref_site) AS num
FROM table1 a
LEFT JOIN table2 b
ON a.site = b.ref_site
GROUP BY a.site;
In this sql:
SELECT s.*,
u.id,
u.name
FROM shops s
LEFT JOIN users u ON u.id = s.user_id
OR u.id = s.owner_user_id
WHERE s.status = 1
For some reason this query takes an amazingly long time, although id is the primary key. It became slow right after I added the OR u.id = s.owner_user_id part. owner_user_id is usually 0 and is set only a handful of times. So why does it take so long, apparently scanning the whole table? The users table is very big. I didn't design it; it belongs to a client, and successive programmers added too many fields. The table has 22k rows and dozens of fields.
*The field names are for demonstration only; the actual names are different, so don't ask me why I'm looking for owner_user_id (; I did solve the slowness by removing the "OR ..." part and instead looking up the id in a loop when it is not 0, but I would like to know why this is happening and how to speed up the query as is.
You may be able to speed it up by using IN instead of the OR but that is minor.
SELECT u.id,
u.name
FROM shops s
LEFT JOIN users u ON u.id IN ( s.user_id, s.owner_user_id )
WHERE s.status = 1
Firstly, are there any indexes on these tables? Mainly one on users.id, or on s.user_id or s.owner_user_id?
However, I must ask why you need a LEFT JOIN instead of a regular join. A LEFT JOIN keeps every row from shops, even those with no matching user. I'm assuming the id should always be in either the user_id or the owner_user_id field and that there will always be a match; if that is the case, a plain JOIN returns the same rows and may speed the query up a bit.
And as Mitch said, 22k rows is tiny.
How are you going to know which user record is which? Here's how I'd do it
SELECT s.*,
u.name AS user_name,
o.name AS owner_name
FROM shops s
LEFT JOIN users u ON s.user_id = u.id
LEFT JOIN users o ON s.owner_user_id = o.id
WHERE s.status = 1
I've omitted the IDs from the user table in the SELECT as these will be part of s.* anyway.
I'm curious about the left joins too. If shops.user_id and shops.owner_user_id are required foreign keys, use inner joins instead.
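A small sketch of how the two shapes differ in their results, using Python's standard-library sqlite3 module instead of MySQL. The users, shops, and the title column are invented sample data; it says nothing about MySQL's query plans and only shows the rows each query returns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE shops (id INTEGER PRIMARY KEY, title TEXT,
                        user_id INTEGER, owner_user_id INTEGER, status INTEGER);
    INSERT INTO users VALUES (1, 'alice'), (2, 'bob');
    -- owner_user_id = 0 means "no separate owner", as in the question.
    INSERT INTO shops VALUES (10, 'Books', 1, 2, 1), (11, 'Bikes', 2, 0, 1);
""")

# One join with OR: a shop with both ids set comes back twice,
# and nothing says which row is the user and which is the owner.
or_rows = conn.execute("""
    SELECT s.id, u.name
    FROM shops s
    LEFT JOIN users u ON u.id = s.user_id OR u.id = s.owner_user_id
    WHERE s.status = 1
    ORDER BY s.id, u.id
""").fetchall()

# Two joins on simple equalities: one row per shop, roles are explicit.
two_join_rows = conn.execute("""
    SELECT s.id, u.name AS user_name, o.name AS owner_name
    FROM shops s
    LEFT JOIN users u ON s.user_id = u.id
    LEFT JOIN users o ON s.owner_user_id = o.id
    WHERE s.status = 1
    ORDER BY s.id
""").fetchall()

print(or_rows)
print(two_join_rows)
```

Each join in the second query is a plain equality on users.id, which is the kind of condition a primary-key lookup can serve directly.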
I have data related as follows:
A table of Houses
A table of Boxes (with an FK back into Houses)
A table of Things_in_boxes (with an FK back to Boxes)
A table of Owners (with an FK back into Houses)
In a nutshell, a House has many Boxes, and each Box has many Things in it. In addition, each House has many Owners.
If I know two Owners (say Peter and Paul), how can I list all the Things that are in the Boxes that are in the Houses owned by these guys?
Also, I'd like to master this SQL stuff. Can anyone recommend a good book/resource? (I'm using MySQL).
Thanks!
Can Peter and Paul own the same house together?
If so, you should use a many-to-many relationship instead of having an ownerID column in the Houses table,
i.e. a Houses2Owners table with two columns, ownerID and houseID.
The query would then be:
select distinct t.item from houses as h
left join Boxes as b on h.houseID=b.houseID
left join Things as t on b.boxID=t.boxID
left join Houses2Owners as h2o on h.houseID=h2o.houseID
left join Owners as o on h2o.ownerID=o.ownerID
where o.name in ('Peter', 'Paul') -- assumes Owners has a name column
The main question to ask yourself while designing this is whether each object appears only once, e.g. whether there can be two similar boxes with similar things in them, or two boxes that both contain ski masks.
If so, create the tables without a reference to the parent object, plus a table that connects the two. That way the ski mask is stored once even though two boxes contain it.
SELECT
Things_in_boxes.*
FROM
Houses
JOIN Boxes ON Houses.HouseID = Boxes.House
JOIN Things_in_boxes ON Boxes.BoxID = Things_in_boxes.Box
WHERE
Houses.Owner = 'Peter' OR Houses.Owner = 'Paul'
As for resources to learn from... I can't really suggest anything specific. I learnt how to use (My)SQL gradually and from a number of sources, and can't single any of them out as having been of primary importance. w3schools has OK coverage of the very basic stuff, and MySQL's own documentation (available on the web, google for it) does an OK job and is a reasonable reference for when you want to know the nitty gritty of some topic or other.
EDIT: The above answer is wrong. I had missed the stipulation that a House can have multiple Owners.
New approach: I'll assume that there is a cross-referencing table, HouseOwners, with House and Owner as foreign keys.
My first thought was this:
SELECT
Things_in_boxes.*
FROM
Houses
JOIN Boxes ON Houses.HouseID = Boxes.House
JOIN Things_in_boxes ON Boxes.BoxID = Things_in_boxes.Box
JOIN HouseOwners ON Houses.HouseID = HouseOwners.House
WHERE
HouseOwners.Owner = 'Peter' OR HouseOwners.Owner = 'Paul'
However, this is not quite right. If both Peter and Paul are Owners of a given house, then the things in the boxes in that house would show up twice. I think a subquery is needed.
SELECT
Things_in_boxes.*
FROM
Houses
JOIN Boxes ON Houses.HouseID = Boxes.House
JOIN Things_in_boxes ON Boxes.BoxID = Things_in_boxes.Box
WHERE
Houses.HouseID IN (
SELECT DISTINCT House
FROM HouseOwners
WHERE Owner = 'Peter' OR Owner = 'Paul'
)
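The duplicate problem described above, and the effect of the IN subquery, can be checked with a tiny data set. This sketch uses Python's standard-library sqlite3 module rather than MySQL; the HouseOwners table is the assumed cross-reference table from this answer, and the Name column and sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Houses (HouseID INTEGER PRIMARY KEY);
    CREATE TABLE Boxes (BoxID INTEGER PRIMARY KEY, House INTEGER);
    CREATE TABLE Things_in_boxes (ThingID INTEGER PRIMARY KEY, Box INTEGER, Name TEXT);
    CREATE TABLE HouseOwners (House INTEGER, Owner TEXT);
    INSERT INTO Houses VALUES (1), (2);
    INSERT INTO Boxes VALUES (1, 1), (2, 2);
    INSERT INTO Things_in_boxes VALUES (1, 1, 'ski mask'), (2, 2, 'lamp');
    -- House 1 is co-owned by Peter and Paul; house 2 belongs to Mary.
    INSERT INTO HouseOwners VALUES (1, 'Peter'), (1, 'Paul'), (2, 'Mary');
""")

# Joining HouseOwners directly yields one row per matching owner.
join_rows = [r[0] for r in conn.execute("""
    SELECT Things_in_boxes.Name
    FROM Houses
    JOIN Boxes ON Houses.HouseID = Boxes.House
    JOIN Things_in_boxes ON Boxes.BoxID = Things_in_boxes.Box
    JOIN HouseOwners ON Houses.HouseID = HouseOwners.House
    WHERE HouseOwners.Owner = 'Peter' OR HouseOwners.Owner = 'Paul'
""")]

# The IN subquery only tests membership, so each thing appears once.
subquery_rows = [r[0] for r in conn.execute("""
    SELECT Things_in_boxes.Name
    FROM Houses
    JOIN Boxes ON Houses.HouseID = Boxes.House
    JOIN Things_in_boxes ON Boxes.BoxID = Things_in_boxes.Box
    WHERE Houses.HouseID IN (
        SELECT House FROM HouseOwners
        WHERE Owner = 'Peter' OR Owner = 'Paul'
    )
""")]

print(join_rows)
print(subquery_rows)
```

DISTINCT inside the subquery is harmless but not required, since IN only asks whether the house id is in the set.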
SELECT t.name
FROM Houses h
INNER JOIN Boxes b ON b.houseId = h.id
INNER JOIN Things t ON t.boxId = b.id
INNER JOIN Owners o ON o.houseId = h.id
WHERE o.name = 'Peter' OR o.name = 'Paul'
By using inner joins you can combine these 4 tables with all the linked information. There is also another way, using inner select queries:
SELECT t.name
FROM Houses h
INNER JOIN Boxes b ON b.houseId = h.id
INNER JOIN Things t ON t.boxId = b.id
INNER JOIN Owners o ON o.houseId = h.id
WHERE h.id IN (SELECT o.houseId
FROM Owners o
WHERE o.name = 'Peter' OR o.name = 'Paul')
This query works differently (by first finding the two house ID's of Peter and Paul and then performing the join), but it has the same effect.
Hopefully these examples will help you understand SQL :)
This isn't tested and written on the spot:
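SELECT a.*
FROM
`things_in_boxes` AS a
-- things link to houses through their box (column names are assumed)
LEFT JOIN `boxes` AS bx
on ( a.`box_id` = bx.`box_id` )
LEFT JOIN `houses` AS b
on ( bx.`house_id` = b.`house_id` )
LEFT JOIN `owners` AS c
on ( b.`house_id` = c.`house_id` )
WHERE c.`owner_id` IN( 0, 1 )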
That is the general structure I would use, where the "0, 1" in the last IN statement are the owner ids for Peter and Paul. If you wanted to do it by name, you could simply make it something like
c.`name` IN( 'Peter', 'Paul' )
As far as books, I can't really tell you, I've learned through tutorials and references.
Here's one approach:
SELECT * FROM Things_in_boxes t
WHERE box_id IN (
SELECT b.id
FROM Boxes b
INNER JOIN Owners o
ON (o.house_id = b.house_id)
WHERE o.name LIKE 'Peter'
OR o.name LIKE 'Paul'
)
Note that you don't need to join on the House table, since both the Boxes and Owners have a house id.
Without knowing the full structure, I will assume a structure to build a query, step by step
Get the IDs of the houses belonging to the owners
select id from House where owner in ('peter', 'paul')
Get the boxes in those houses
select boxid from boxes where homeid in (select id from House where owner in ('peter', 'paul'))
Get the things in those boxes
select * from things where boxid in (select boxid from boxes where homeid in (select id from House where owner in ('peter', 'paul')))
This should get you what you want, but it can be very inefficient.
In the above method, the final query in step 3 has to evaluate each nested subquery and hold the intermediate IDs while it consumes them. Depending on the DBMS and its optimizer, this can be slow.
The better alternative is a join. Combine all the tables and select the desired data.
select * from things join boxes on things.boxid = boxes.boxid join houses on boxes.houseid = houses.id join owners on houses.owner = owners.ownerid where owners.name in ('peter', 'paul')