SQL join with missing rows - mysql

I have the following two tables.
SurveyTable:
QID | Text
----------------------------------------
1 | Favorite movie
2 | Favorite book
3 | Favorite city
SurveyResponses:
UserID | QID | Answer
----------------------------------------
1001 | 1 | StarWars
1001 | 2 | Harry Potter
1001 | 3 | Los Angeles
1003 | 3 | New York
I would like to get a response which also has all the questions that the User did not answer in the survey.
Expected Output of SQL join:
UserID | QID | Answer
----------------------------------------
1001 | 1 | StarWars
1001 | 2 | Harry Potter
1001 | 3 | Los Angeles
1003 | 1 | -
1003 | 2 | -
1003 | 3 | New York
I tried various SQL query combinations but no luck. Please help.

Since you don't provide a Users table, you may end up with something like this:
SELECT
U.UserId, Q.QID, A.Answer
FROM SurveyTable Q
CROSS JOIN (SELECT DISTINCT UserId FROM SurveyResponses) U
LEFT JOIN SurveyResponses A ON A.QID = Q.QID AND U.UserId = A.UserId
You'll want to avoid this in real life, though: the DISTINCT subquery has to scan SurveyResponses to build the list of users, and the cross join then produces one row per user per question, which gets expensive once the tables grow.
It would be better to have a Users table, so you can start from it and left join the responses like this:
SELECT
U.UserId, Q.QID, A.Answer
FROM Users U
CROSS JOIN SurveyTable Q
LEFT JOIN SurveyResponses A ON A.UserId = U.UserId AND A.QID = Q.QID
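To check the first query against the sample data from the question, here is a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in for MySQL (an assumption made only so the example runs without a server); the only change to the SQL is an ORDER BY added for stable output.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption for the demo).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE SurveyTable (QID INTEGER PRIMARY KEY, Text TEXT);
CREATE TABLE SurveyResponses (UserID INTEGER, QID INTEGER, Answer TEXT);
INSERT INTO SurveyTable VALUES
  (1, 'Favorite movie'), (2, 'Favorite book'), (3, 'Favorite city');
INSERT INTO SurveyResponses VALUES
  (1001, 1, 'StarWars'), (1001, 2, 'Harry Potter'),
  (1001, 3, 'Los Angeles'), (1003, 3, 'New York');
""")

# Every (user, question) pair is generated first, then answers are attached
# where they exist; unanswered questions come back with a NULL Answer.
rows = conn.execute("""
SELECT U.UserID, Q.QID, A.Answer
FROM SurveyTable Q
CROSS JOIN (SELECT DISTINCT UserID FROM SurveyResponses) U
LEFT JOIN SurveyResponses A ON A.QID = Q.QID AND A.UserID = U.UserID
ORDER BY U.UserID, Q.QID
""").fetchall()

for row in rows:
    print(row)
```

The unanswered questions for user 1003 show up with None (NULL) as the answer; wrap the column in COALESCE(A.Answer, '-') if you want the dash from the expected output.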

MYSQL Left Join on multiple table but select only most recent rows

I have three tables (form_completions, customer_sessions and conversion_sessions), which are designed as shown below. They are heavily stripped down for the purpose of this example:
Form_completions:
id | name | email
------------------------
101 | Tom | tom@website.com
102 | Ben | ben@website.com
Customer_sessions:
id | customer_id | session_id | session_source
---------------------------------
1 | 900 | 9kc73bsf | twitter
2 | 901 | 15jvuw83 | google
3 | 901 | 45h73bgf | twitter
Conversion_sessions:
id | customer_id | session_id | form_completion_id
------------------------------------
1 | 900 | 9kc73bsf | 101
2 | 901 | 45h73bgf | 102
The query that I currently have is:
SELECT custsess.session_source,
fc.id as form_conversion_id,
fc.name
FROM conversion_sessions convsess
LEFT JOIN form_completions fc
ON convsess.`form_completion_id` = fc.`id`
LEFT JOIN customer_sessions custsess
ON custsess.`customer_id` = convsess.`customer_id`
This will give me the following:
session_source | form_conversion_id | name
------------------------------------------
twitter | 101 | Tom
google | 102 | Ben
twitter | 102 | Ben
But what I need is to avoid the duplication of form_conversion_id and include only the most recent session, so it would be:
session_source | form_conversion_id | name
------------------------------------------
twitter | 101 | Tom
twitter | 102 | Ben
Hopefully this makes sense.
Solution to your problem:
SELECT custsess.session_source,
fc.id as form_conversion_id,
fc.name
FROM conversion_sessions AS convsess
INNER JOIN form_completions AS fc
ON convsess.form_completion_id = fc.id
INNER JOIN customer_sessions AS custsess
ON custsess.customer_id = convsess.customer_id
AND custsess.session_id = convsess.session_id
When joining customer_sessions to conversion_sessions, you forgot to also join on session_id.
OUTPUT:
session_source | form_conversion_id | name
------------------------------------------
twitter | 101 | Tom
twitter | 102 | Ben
For demo follow the below link:
https://www.db-fiddle.com/f/pQG4VCWAnoGtRJa6MkvST6/0
You need a subquery to get the most recent session. Change your JOIN to something like this:
LEFT JOIN customer_sessions custsess
ON custsess.id =
(
SELECT MAX(cs.id)
FROM customer_sessions cs
WHERE cs.customer_id = convsess.customer_id
)
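A minimal sketch of the correlated-subquery idea on the question's sample data. It runs on Python's built-in sqlite3 as a stand-in for MySQL, and it assumes that the highest customer_sessions.id for a customer is that customer's most recent session, which the question does not state explicitly.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption for the demo).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE form_completions (id INTEGER PRIMARY KEY, name TEXT, email TEXT);
CREATE TABLE customer_sessions (
  id INTEGER PRIMARY KEY, customer_id INTEGER, session_id TEXT, session_source TEXT);
CREATE TABLE conversion_sessions (
  id INTEGER PRIMARY KEY, customer_id INTEGER, session_id TEXT, form_completion_id INTEGER);
INSERT INTO form_completions VALUES
  (101, 'Tom', 'tom@website.com'), (102, 'Ben', 'ben@website.com');
INSERT INTO customer_sessions VALUES
  (1, 900, '9kc73bsf', 'twitter'), (2, 901, '15jvuw83', 'google'),
  (3, 901, '45h73bgf', 'twitter');
INSERT INTO conversion_sessions VALUES
  (1, 900, '9kc73bsf', 101), (2, 901, '45h73bgf', 102);
""")

# For each conversion, join only the customer's session with the highest id.
rows = conn.execute("""
SELECT custsess.session_source, fc.id AS form_conversion_id, fc.name
FROM conversion_sessions convsess
LEFT JOIN form_completions fc ON convsess.form_completion_id = fc.id
LEFT JOIN customer_sessions custsess
  ON custsess.id = (SELECT MAX(cs.id)
                    FROM customer_sessions cs
                    WHERE cs.customer_id = convsess.customer_id)
ORDER BY fc.id
""").fetchall()

for row in rows:
    print(row)
```

The subquery has to be correlated on the outer conversion_sessions row; otherwise it returns the single highest session id in the whole table.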
Looking at your data, you should use an inner join on a subquery that finds the max session_id per customer. (This assumes the greatest session_id belongs to the most recent session; if it does not, use MAX(id) instead.)
SELECT custsess.session_source,
fc.id as form_conversion_id,
fc.name
FROM conversion_sessions convsess
INNER JOIN customer_sessions custsess ON custsess.`customer_id` = convsess.`customer_id`
INNER JOIN (
select customer_id, max(session_id) as max_sess
from Customer_sessions
group by customer_id
) t on t.customer_id = custsess.customer_id and t.max_sess = custsess.session_id
INNER JOIN form_completions fc ON convsess.`form_completion_id` = fc.`id`

How to write this basic sql statement

So I have 4 tables that are connected via foreign keys namely result, position, student, candidates
What i need to achieve is this:
output:
------------------------
s_fname | count(c_id)
-----------------------
Mark | 2 -> President
France| 2 -> President
That is, I want to count how many times each c_id appears in the table "result", filtered by pos_id from the "candidates" table.
Below is my code which lacks the counting part:
select s_fname
from results, candidates, student, positioning
where results.c_id = candidates.c_id
AND student.sid = results.sid
AND candidates.pos_id = positioning.pos_id
AND positioning.pos_id = 1
Group BY results.sid;
I know it lacks a lot of things.
It seems very complex to me, but I know there are gurus here who can achieve this. Thanks.
result table
---------------------
| r_id | sid | c_id |
---------------------
1 | 1 | 1
2 | 1 | 2
3 | 1 | 4
4 | 2 | 1
5 | 2 | 2
6 | 2 | 4
7 | 3 | 3
8 | 3 | 2
9 | 5 | 3
10 | 5 | 2
----------------------
student table
----------------
| s_id| s_fname|
----------------
1 | Mark
2 | Jorge
3 | France
4 | James
--------------------
Candidates Table
------------------------
| c_id | sid | pos_id
------------------------
1 | 1 | 1
2 | 2 | 2
3 | 4 | 3
4 | 3 | 1
5 | 5 | 2
----------------------
positioning Table
-----------------------
| pos_id | po_name |
-----------------------
1 | President
2 | Vice President
3 | Secretary
4 | Treasurer
This is untested, but should return your intended result.
What it does is joins all of your tables on the related foreign keys, effectively giving a wide table of all of your columns. Then we limit on the candidates that are running for the President position. Since we need to group because of the count aggregate we group on the name. The count should reflect the number of votes they got, because there is a one to many relationship to the result table.
SELECT s_fname, Count(*)
FROM student st
INNER JOIN Candidates c On c.sid = st.s_ID
INNER JOIN positioning p on c.pos_ID = p.pos_ID
INNER JOIN results r on st.s_ID = r.sid
WHERE po_Name = 'President'
GROUP BY s_Fname
Edit: I misunderstood the intended joins at first. The results should be joined to the candidate (on c_id) rather than to the student, so the following query should show the appropriate results.
SELECT s_fname, Count(*)
FROM student st
INNER JOIN Candidates c On c.sid = st.s_ID
INNER JOIN positioning p on c.pos_ID = p.pos_ID
INNER JOIN results r on c.c_ID = r.c_ID
WHERE po_Name = 'President'
GROUP BY s_Fname
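To see the corrected join produce the counts from the question, here is a small sketch using the posted sample data. It runs on Python's built-in sqlite3 rather than MySQL (an assumption for the sake of a self-contained demo), and it uses the table name result as it appears in the question's data listing.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption for the demo).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE student (s_id INTEGER PRIMARY KEY, s_fname TEXT);
CREATE TABLE positioning (pos_id INTEGER PRIMARY KEY, po_name TEXT);
CREATE TABLE candidates (c_id INTEGER PRIMARY KEY, sid INTEGER, pos_id INTEGER);
CREATE TABLE result (r_id INTEGER PRIMARY KEY, sid INTEGER, c_id INTEGER);
INSERT INTO student VALUES (1,'Mark'),(2,'Jorge'),(3,'France'),(4,'James');
INSERT INTO positioning VALUES
  (1,'President'),(2,'Vice President'),(3,'Secretary'),(4,'Treasurer');
INSERT INTO candidates VALUES (1,1,1),(2,2,2),(3,4,3),(4,3,1),(5,5,2);
INSERT INTO result VALUES
  (1,1,1),(2,1,2),(3,1,4),(4,2,1),(5,2,2),
  (6,2,4),(7,3,3),(8,3,2),(9,5,3),(10,5,2);
""")

# Join votes to the candidate they were cast for (r.c_id = c.c_id), then to
# the candidate's student row for the name; one output row per candidate.
votes = conn.execute("""
SELECT st.s_fname, COUNT(*) AS votes
FROM student st
JOIN candidates c ON c.sid = st.s_id
JOIN positioning p ON p.pos_id = c.pos_id
JOIN result r ON r.c_id = c.c_id
WHERE p.po_name = 'President'
GROUP BY st.s_id, st.s_fname
ORDER BY st.s_id
""").fetchall()

for row in votes:
    print(row)
```

Grouping by s_id as well as s_fname keeps two candidates who happen to share a first name from being merged into one row.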
Code:
SELECT C.s_fname AS `Student Name`, COUNT(A.c_id) AS `Count`, D.po_name AS `Position`
FROM results AS A INNER JOIN candidates AS B ON A.c_id=B.c_id
INNER JOIN student AS C ON B.sid=C.s_id
INNER JOIN positioning AS D ON B.pos_id=D.pos_id
WHERE B.pos_id = 1
GROUP BY C.s_fname, D.po_name
SELECT s.s_fname, COUNT(*), p.po_name
FROM students s
JOIN candidates c ON c.s_id = s.s_id
JOIN positioning p ON c.pos_id = p.pos_id
JOIN results r ON s.s_id = r.s_id
WHERE p.pos_id = 1
GROUP BY s.s_id
http://sqlfiddle.com/#!2/9472a/17

What query to obtain the following on MySQL?

Users
+--------+--------------+----------+--------------+
| userID | userUsername | userName | userLastName |
+--------+--------------+----------+--------------+
| 6 | richard | Ricardo | Vega |
| 10 | jason | Jason | Bourne |
+--------+--------------+----------+--------------+
Restocks
+-----------+-------------+--------+--------+-----------------+
| restockID | restockDate | itemID | userID | restockQuantity |
+-----------+-------------+--------+--------+-----------------+
| 1 | 2012-02-29 | 1 | 6 | 48 |
| 2 | 2012-02-29 | 1 | 10 | 100 |
| 3 | 2012-02-29 | 2 | 10 | 50 |
| 4 | 2012-02-29 | 2 | 6 | 100 |
| 5 | 2012-02-29 | 2 | 6 | 200 |
| 6 | 2012-02-29 | 2 | 10 | 2000 |
| 7 | 2012-02-29 | 1 | 10 | 2000 |
+-----------+-------------+--------+--------+-----------------+
Items
+--------+--------------------+
| itemID | itemName |
+--------+--------------------+
| 1 | Coca Cola (lata) |
| 2 | Cerveza Sol (lata) |
+--------+--------------------+
OK, I have supplied some sample data as requested. I need to get this table:
+--------+--------------------+---------------+-------------+----------+--------------+--------------+
| itemID | itemName | itemExistence | restockDate | userName | userLastName | userUsername |
+--------+--------------------+---------------+-------------+----------+--------------+--------------+
| 2 | Cerveza Sol (lata) | 2350 | 2012-02-29 | Jason | Bourne | jason |
| 1 | Coca Cola (lata) | 2148 | 2012-02-29 | Ricardo | Vega | richard |
+--------+--------------------+---------------+-------------+----------+--------------+--------------+
But I need restockDate to be THE LATEST ONE for each itemName. In the example, it shows the first restock and not the latest one. I just need to show the existence for each item and when it was last restocked, not when it was first restocked.
If my tables are not well designed, please suggest a new schema.
I know this may be a lot to ask, so I will tip 5 USD (PayPal) to whoever can help me with this. Promise.
As discussed in comments, many restocks can be performed on the same day so it is not possible to compare dates in this case. Two options are presented here: Use the incremental PK from restocks table or restructure the table. For the first case, this is the solution:
select i.itemID, i.itemName, i.itemExistence, r.restockDate, u.userName,
u.userLastName, u.userUsername
from items i
left join (
select r1.restockDate, r1.itemID, r1.userID from restocks r1
left join restocks r2
on r1.itemId = r2.itemId and r1.restockId < r2.restockId
where r2.restockDate is null
) as r on i.itemID = r.itemID
inner join users u on r.userID = u.userID
For the second case, the restructure would mean changing the date field to a datetime that uniquely identifies a record. That is the best solution; however, it also requires updating the data already in the table, that is, giving a distinct datetime to every record that shares a date with another restock of the same product.
The lazy ones (like me) would go for the first option :) Let me know if you have any doubts about this.
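Here is a runnable sketch of the first option on the sample data. It uses Python's built-in sqlite3 as a stand-in for MySQL, and it assumes itemExistence is a column of items holding the totals shown in the question's expected output (the question does not show where that column lives). restockID is added to the select list only to make it visible which row was picked.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption for the demo).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (
  userID INTEGER PRIMARY KEY, userUsername TEXT, userName TEXT, userLastName TEXT);
-- itemExistence as a column of items is an assumption, not shown in the question.
CREATE TABLE items (itemID INTEGER PRIMARY KEY, itemName TEXT, itemExistence INTEGER);
CREATE TABLE restocks (
  restockID INTEGER PRIMARY KEY, restockDate TEXT, itemID INTEGER,
  userID INTEGER, restockQuantity INTEGER);
INSERT INTO users VALUES (6,'richard','Ricardo','Vega'),(10,'jason','Jason','Bourne');
INSERT INTO items VALUES (1,'Coca Cola (lata)',2148),(2,'Cerveza Sol (lata)',2350);
INSERT INTO restocks VALUES
  (1,'2012-02-29',1,6,48),(2,'2012-02-29',1,10,100),(3,'2012-02-29',2,10,50),
  (4,'2012-02-29',2,6,100),(5,'2012-02-29',2,6,200),(6,'2012-02-29',2,10,2000),
  (7,'2012-02-29',1,10,2000);
""")

# The inner anti-join keeps a restock only if no restock of the same item
# has a higher restockID, i.e. the latest restock per item.
latest = conn.execute("""
SELECT i.itemID, i.itemName, r.restockID, r.restockDate, u.userUsername
FROM items i
LEFT JOIN (
    SELECT r1.restockID, r1.restockDate, r1.itemID, r1.userID
    FROM restocks r1
    LEFT JOIN restocks r2
      ON r1.itemID = r2.itemID AND r1.restockID < r2.restockID
    WHERE r2.restockID IS NULL
) AS r ON i.itemID = r.itemID
INNER JOIN users u ON r.userID = u.userID
ORDER BY i.itemID
""").fetchall()

for row in latest:
    print(row)
```

With this data both items were last restocked by jason (restockID 7 and 6), even though every restock shares the same date.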
First get the distinct rows from the items table, then join the other tables to them:
SELECT items.*, restocks.restockDate, users.userName, users.userLastName, users.userUsername
FROM (SELECT DISTINCT items.itemID, items.itemName, items.itemExistence FROM items) AS items
LEFT JOIN restocks on items.itemID = restocks.itemID
LEFT JOIN users on restocks.userID = users.userID
GROUP BY items.itemName
Not Tested
UPDATED
select items.itemID, items.itemName, items.itemExistence, restocks.restockDate, users.userName, users.userLastName, users.userUsername
from items
inner join restocks on items.itemID = restocks.itemID
inner join users on restocks.userID = users.userID
GROUP BY items.itemName
select
items.itemID, items.itemName, items.itemExistence,
r.restockDate,
users.userName, users.userLastName, users.userUsername
from items
left join restocks r on r.restockID =
(select B.restockID from restocks B where B.itemID = items.itemID order by B.restockID desc limit 1)
left join users on r.userID = users.userID
Please try this.
You don't mention what itemExistence is, so I'm hoping it's a column in the Items table.
Here's an easy way to do it with a self-join:
SELECT i.itemID, i.itemName, i.itemExistence, r1.restockDate,
u.userName, u.userLastName, u.userUsername
FROM Items i
JOIN Restocks r1
ON r1.itemID = i.itemID
JOIN Users u
ON u.userID = r1.userID
LEFT JOIN Restocks r2
ON r2.itemID = i.itemID
AND r2.restockDate > r1.restockDate
WHERE r2.itemID IS NULL
The LEFT JOIN combined with the WHERE clause ensures that we only keep rows for which no later restockDate exists for the same item, that is, the latest restock.
The advantage of this approach is that it avoids subqueries, which in MySQL can prevent the optimizer from using indexes.
Caveat: you will get duplicate rows for an item if it was restocked more than once on its latest date.
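The duplicate-row caveat is easy to reproduce with the question's data, where every restock has the same date. The sketch below uses Python's built-in sqlite3 as a stand-in for MySQL and only the restocks table; the tie-breaker on restockID in the second query is one possible fix and assumes that a higher restockID means a later restock.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption for the demo).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE restocks (
  restockID INTEGER PRIMARY KEY, restockDate TEXT, itemID INTEGER,
  userID INTEGER, restockQuantity INTEGER);
INSERT INTO restocks VALUES
  (1,'2012-02-29',1,6,48),(2,'2012-02-29',1,10,100),(3,'2012-02-29',2,10,50),
  (4,'2012-02-29',2,6,100),(5,'2012-02-29',2,6,200),(6,'2012-02-29',2,10,2000),
  (7,'2012-02-29',1,10,2000);
""")

# Date-only comparison: no row has a strictly later date, so every row survives.
by_date = conn.execute("""
SELECT r1.itemID, r1.restockID
FROM restocks r1
LEFT JOIN restocks r2
  ON r2.itemID = r1.itemID AND r2.restockDate > r1.restockDate
WHERE r2.restockID IS NULL
ORDER BY r1.restockID
""").fetchall()

# Same anti-join with restockID as a tie-breaker for equal dates.
with_tiebreak = conn.execute("""
SELECT r1.itemID, r1.restockID
FROM restocks r1
LEFT JOIN restocks r2
  ON r2.itemID = r1.itemID
 AND (r2.restockDate > r1.restockDate
      OR (r2.restockDate = r1.restockDate AND r2.restockID > r1.restockID))
WHERE r2.restockID IS NULL
ORDER BY r1.itemID
""").fetchall()

print(len(by_date), with_tiebreak)
```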

JOIN 4 Tables with meta_table

I have this database structure:
sites
id | name
1 | Site 1
2 | Site 2
locations
id | city
23 | Baltimore
24 | Annapolis
people
id | name
45 | John
46 | Sue
sites_meta
id | site_id | meta_name | meta_value
1 | 1 | local | 23
2 | 1 | person | 45
3 | 2 | local | 24
4 | 2 | person | 46
So, as you can see, Site 1 (id 1) is in Baltimore and is associated with John, Site 2 (id 2) is in Annapolis and associated with Sue.
I need to figure out a clever sql statement that can return
id | name | id | city | id | name
1 | Site 1 | 23 | Baltimore | 45 | John
2 | Site 2 | 24 | Annapolis | 46 | Sue
I would be super appreciative if anyone can help me out. I've tried a few combinations of a select statement, but I keep getting stuck with using two values from the sites_meta table.
select
s.id as siteId,
s.name as siteName,
max(l.id) as locationId,
max(l.city) as city,
max(p.id) as personId,
max(p.name) as personName
from
sites_meta sm
join sites s on s.id = sm.site_id
left join locations l on l.id = sm.meta_value and sm.meta_name = 'local'
left join people p on p.id = sm.meta_value and sm.meta_name = 'person'
group by
s.id,
s.name
You can probably imagine how this kind of "meta" table might become a pain... especially as more items are added to it.
Instead, you might consider replacing it with two new tables, sites_locations and sites_people.
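As a rough sketch of what that alternative could look like: the table names sites_locations and sites_people come from the suggestion above, but their column names (site_id, location_id, person_id) are hypothetical. The example runs on Python's built-in sqlite3 as a stand-in for MySQL.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption for the demo).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sites (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE locations (id INTEGER PRIMARY KEY, city TEXT);
CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT);
-- Hypothetical link tables replacing sites_meta; column names are illustrative.
CREATE TABLE sites_locations (
  site_id INTEGER REFERENCES sites(id), location_id INTEGER REFERENCES locations(id));
CREATE TABLE sites_people (
  site_id INTEGER REFERENCES sites(id), person_id INTEGER REFERENCES people(id));
INSERT INTO sites VALUES (1,'Site 1'),(2,'Site 2');
INSERT INTO locations VALUES (23,'Baltimore'),(24,'Annapolis');
INSERT INTO people VALUES (45,'John'),(46,'Sue');
INSERT INTO sites_locations VALUES (1,23),(2,24);
INSERT INTO sites_people VALUES (1,45),(2,46);
""")

# Plain joins through the link tables; no meta_name filtering or MAX() pivot.
rows = conn.execute("""
SELECT s.id, s.name, l.id, l.city, p.id, p.name
FROM sites s
JOIN sites_locations sl ON sl.site_id = s.id
JOIN locations l ON l.id = sl.location_id
JOIN sites_people sp ON sp.site_id = s.id
JOIN people p ON p.id = sp.person_id
ORDER BY s.id
""").fetchall()

for row in rows:
    print(row)
```

With typed link tables the database can enforce foreign keys on both sides, which a generic meta_value column cannot do.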
SELECT
s.id,s.name,l.id,l.city,p.id,p.name
FROM
sites s
INNER JOIN sites_meta sm1 ON s.id = sm1.site_id
INNER JOIN sites_meta sm2 ON s.id = sm2.site_id
INNER JOIN locations l ON sm1.meta_value = l.id AND sm1.meta_name = 'local'
INNER JOIN people p ON sm2.meta_value = p.id AND sm2.meta_name = 'person'
;

Most efficient way to SELECT one row in a one:many pair of tables in MySQL

Let's say I've got the following data in one-to-many tables city and person, respectively:
SELECT city.*, person.* FROM city, person WHERE city.city_id = person.person_city_id;
+---------+-------------+-----------+-------------+----------------+
| city_id | city_name | person_id | person_name | person_city_id |
+---------+-------------+-----------+-------------+----------------+
| 1 | chicago | 1 | charles | 1 |
| 1 | chicago | 2 | celia | 1 |
| 1 | chicago | 3 | curtis | 1 |
| 1 | chicago | 4 | chauncey | 1 |
| 2 | new york | 5 | nathan | 2 |
| 3 | los angeles | 6 | luke | 3 |
| 3 | los angeles | 7 | louise | 3 |
| 3 | los angeles | 8 | lucy | 3 |
| 3 | los angeles | 9 | larry | 3 |
+---------+-------------+-----------+-------------+----------------+
9 rows in set (0.00 sec)
And I want to select a single record from person for each unique city using some particular logic. For example:
SELECT city.*, person.* FROM city, person WHERE city.city_id = person.person_city_id
GROUP BY city_id ORDER BY person_name DESC
;
The implication here is that within each city, I want to get the lexicographically greatest value, e.g.:
+---------+-------------+-----------+-------------+----------------+
| city_id | city_name | person_id | person_name | person_city_id |
+---------+-------------+-----------+-------------+----------------+
| 2 | new york | 5 | nathan | 2 |
| 3 | los angeles | 6 | luke | 3 |
| 1 | chicago | 3 | curtis | 1 |
+---------+-------------+-----------+-------------+----------------+
The actual output I get, however, is:
+---------+-------------+-----------+-------------+----------------+
| city_id | city_name | person_id | person_name | person_city_id |
+---------+-------------+-----------+-------------+----------------+
| 2 | new york | 5 | nathan | 2 |
| 3 | los angeles | 6 | luke | 3 |
| 1 | chicago | 1 | charles | 1 |
+---------+-------------+-----------+-------------+----------------+
I understand that the reason for this discrepancy is that MySQL first performs the GROUP BY, then it does the ORDER BY. This is unfortunate for me, as I want the GROUP BY to have selection logic in which record it picks.
I can work around this by using some nested SELECT statements:
SELECT c.*, p.* FROM city c,
( SELECT p_inner.* FROM
( SELECT * FROM person ORDER BY person_city_id, person_name DESC ) p_inner
GROUP BY person_city_id ) p
WHERE c.city_id = p.person_city_id;
+---------+-------------+-----------+-------------+----------------+
| city_id | city_name | person_id | person_name | person_city_id |
+---------+-------------+-----------+-------------+----------------+
| 1 | chicago | 3 | curtis | 1 |
| 2 | new york | 5 | nathan | 2 |
| 3 | los angeles | 6 | luke | 3 |
+---------+-------------+-----------+-------------+----------------+
This seems like it would be terribly inefficient when the person table grows arbitrarily large. I assume the inner SELECT statements don't know about outermost WHERE filters. Is this true?
What is the accepted best approach for doing what effectively is an ORDER BY before the GROUP BY?
The usual way to do this (in MySQL) is with a join of your table to itself.
First to get the greatest person_name per city (ie per person_city_id in the person table):
SELECT p.*
FROM person p
LEFT JOIN person p2
ON p.person_city_id = p2.person_city_id
AND p.person_name < p2.person_name
WHERE p2.person_name IS NULL
This joins person to itself within each person_city_id (your GROUP BY variable), and also pairs the tables up such that p2's person_name is greater than p's person_name.
Since it's a left join if there's a p.person_name for which there is no greater p2.person_name (within that same city), then the p2.person_name will be NULL. These are precisely the "greatest" person_names per city.
So to join your other information (from city) to it, just do another join:
SELECT c.*,p.*
FROM person p
LEFT JOIN person p2
ON p.person_city_id = p2.person_city_id
AND p.person_name < p2.person_name
LEFT JOIN city c -- add in city table
ON p.person_city_id = c.city_id -- add in city table
WHERE p2.person_name IS NULL -- ORDER BY c.city_id if you like
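A quick way to convince yourself that the self-join picks the greatest name per city is to run it on the data from the question. The sketch below uses Python's built-in sqlite3 as a stand-in for MySQL (an assumption for the demo) and adds an ORDER BY for stable output.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption for the demo).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE city (city_id INTEGER PRIMARY KEY, city_name TEXT);
CREATE TABLE person (
  person_id INTEGER PRIMARY KEY, person_name TEXT, person_city_id INTEGER);
INSERT INTO city VALUES (1,'chicago'),(2,'new york'),(3,'los angeles');
INSERT INTO person VALUES
  (1,'charles',1),(2,'celia',1),(3,'curtis',1),(4,'chauncey',1),
  (5,'nathan',2),
  (6,'luke',3),(7,'louise',3),(8,'lucy',3),(9,'larry',3);
""")

# p survives only if no p2 in the same city has a greater name (p2 is NULL).
rows = conn.execute("""
SELECT c.city_id, c.city_name, p.person_id, p.person_name
FROM person p
LEFT JOIN person p2
  ON p.person_city_id = p2.person_city_id
 AND p.person_name < p2.person_name
LEFT JOIN city c ON p.person_city_id = c.city_id
WHERE p2.person_name IS NULL
ORDER BY c.city_id
""").fetchall()

for row in rows:
    print(row)
```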
Your "solution" is not valid standard SQL, although MySQL accepts it when the ONLY_FULL_GROUP_BY SQL mode is disabled. Even then, which row each group returns is not guaranteed, so it could break with a future change in the query optimizer code. It could be slightly simplified to have just 1 level of nesting (still not valid SQL):
--- Option 1 ---
SELECT
c.*
, p.*
FROM
city AS c
JOIN
( SELECT *
FROM person
ORDER BY person_city_id
, person_name DESC
) AS p
ON c.city_id = p.person_city_id
GROUP BY p.person_city_id
Another way (valid SQL syntax, works in other DBMS, too) is to make a subquery to select the last name for every city and then join:
--- Option 2 ---
SELECT
c.*
, p.*
FROM
city AS c
JOIN
( SELECT person_city_id
, MAX(person_name) AS person_name
FROM person
GROUP BY person_city_id
) AS pmax
ON c.city_id = pmax.person_city_id
JOIN
person AS p
ON p.person_city_id = pmax.person_city_id
AND p.person_name = pmax.person_name
Another way is the self join (of the table person), with the < trick that @mathematical_coffee describes.
--- Option 3 ---
see @mathematical-coffee's answer
Yet another way is to use a LIMIT 1 subquery for the join of city with person:
--- Option 4 ---
SELECT
c.*
, p.*
FROM
city AS c
JOIN
person AS p
ON
p.person_id =
( SELECT person_id
FROM person AS pm
WHERE pm.person_city_id = c.city_id
ORDER BY person_name DESC
LIMIT 1
)
This will run a subquery (on table person) for every city. It will be efficient if you have an index on (person_city_id, person_name) for the InnoDB engine, or on (person_city_id, person_name, person_id) for the MyISAM engine.
There is one major difference between these options:
Options 2 and 3 will return all tied results (if two or more persons in a city share the name that is alphabetically last, all of them will be shown).
Options 1 and 4 will return one result per city, even if there are ties. You can choose which one by altering the ORDER BY clause.
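The difference in tie handling can be seen with a tiny data set. In the sketch below, person_id 10 is a made-up second 'curtis' added to force a tie, and the extra person_id in the ORDER BY of Option 4 is one way to choose which tied row wins. It runs on Python's built-in sqlite3 as a stand-in for MySQL.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption for the demo).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE city (city_id INTEGER PRIMARY KEY, city_name TEXT);
CREATE TABLE person (
  person_id INTEGER PRIMARY KEY, person_name TEXT, person_city_id INTEGER);
INSERT INTO city VALUES (1,'chicago');
-- person_id 10 is hypothetical: a second 'curtis' to create a tie.
INSERT INTO person VALUES (1,'charles',1),(3,'curtis',1),(10,'curtis',1);
""")

# Option 2: join back on MAX(person_name); every tied row matches.
option2 = conn.execute("""
SELECT p.person_id, p.person_name
FROM city c
JOIN (SELECT person_city_id, MAX(person_name) AS person_name
      FROM person GROUP BY person_city_id) pmax
  ON c.city_id = pmax.person_city_id
JOIN person p
  ON p.person_city_id = pmax.person_city_id
 AND p.person_name = pmax.person_name
ORDER BY p.person_id
""").fetchall()

# Option 4: LIMIT 1 subquery; exactly one row per city, tie broken by person_id.
option4 = conn.execute("""
SELECT p.person_id, p.person_name
FROM city c
JOIN person p
  ON p.person_id = (SELECT pm.person_id
                    FROM person pm
                    WHERE pm.person_city_id = c.city_id
                    ORDER BY pm.person_name DESC, pm.person_id
                    LIMIT 1)
""").fetchall()

print(option2)
print(option4)
```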
Which option is more efficient depends also on the distribution of your data, so the best way is to try them all, check their execution plans and find the best indexes that work for each one. An index on (person_city_id, person_name) will most likely be good for any of those queries.
By distribution I mean:
Do you have few cities with many persons per city? (I would think that options 2 and 4 would behave better in this case)
Or many cities with few persons per city? (option 3 may be better with such data).