SQL Left Join Multiple Tables and count values conditionally in each table - mysql

I am having some trouble putting together a SQL statement properly because I don't have much experience SQL, especially aggregate functions. Safe to say I don't really know what I'm doing outside of the basic SQL structure. I can do regular joins, but not complex ones.
I have some tables: 'Survey', 'Questions', 'Session', 'ParentSurvey', and 'ParentSurveyQuestion'. Structurally, a survey can have questions, it can have users that started the survey (a session), and it can have a parent survey whose questions get imported into the current survey.
What I want to do is get information for a each survey in the Survey table; total questions it has, how many sessions have been started (conditionally, ones that have not finished), and the number of questions in the parents survey. The three joined tables can but do not have to contain any values, and if they don't then 0 should be returned by COUNT. The common field in three of the tables is a variation of 'survey_id'
Here is my SQL so far, I put the table structure below it.
SELECT
`kp_survey_id`,
COALESCE( q.cnt, 0 ) AS questionsAmount,
COALESCE( s.cnt, 0 ) AS sessionsAmount
COALESCE( p.cnt, 0 ) AS parentQAmount,
FROM `Survey`
LEFT JOIN <-- I'd like the count of questions for this survey
( SELECT COUNT(*) AS cnt
FROM Questions
GROUP BY kf_survey_id ) q
ON Survey.kp_survey_id = Questions.kf_survey_id
LEFT JOIN
( SELECT COUNT(*) AS cnt <-- I'd like the count of started sessions for this survey
FROM Session
WHERE session_status = 'started' <-- should this be Session.session_status?
GROUP BY kf_survey_id ) s
ON Survey.kp_survey_id = Session.kf_survey_id
LEFT JOIN
( SELECT COUNT(*) AS cnt <-- I'd like the count of questions in the parent survey with this survey id
FROM ParentSurvey
GROUP BY kp_parent_survey_id ) p
ON Survey.kf_parent_survey_id = ParentSurveyQuestion.kf_parent_survey_id
'kp' prefix means primary key, while 'kf' prefix means foreign key
Structure:
Survey: 'kp_survey_id' | 'kf_parent_survey_id'
Question: 'kp_question_id' | 'kf_survey_id'
Session: 'kp_session_id' | 'kf_survey_id' | 'session_status'
ParentSurvey: 'kp_parent_survey_id' | 'survey_name'
ParentSurveyQuestion: 'kp_parent_question_id' | 'kf_parent_survey_id'
There are also other columns in each table like 'name' or 'account_id', but i don't think they matter in this case
I'd like to know if I'm doing this correctly or if I'm missing something. I'm repurposing some code I found here on stackoverflow and modifying it to meet my needs, as I haven't seen conditional aggregation for more than three tables on this site.
My expected output is something like:
kp_survey_id | questionsAmount | sessionsAmount | parentQAmount
1 | 3 | 0 | 3
2 | 0 | 5 | 3

I think you were pretty close -- just need to fix your joins and include the survey id in the subqueries to use in those joins:
SELECT
`kp_survey_id`,
COALESCE( q.cnt, 0 ) AS questionsAmount,
COALESCE( s.cnt, 0 ) AS sessionsAmount
COALESCE( p.cnt, 0 ) AS parentQAmount,
FROM `Survey`
LEFT JOIN
( SELECT COUNT(*) cnt, kf_survey_id AS cnt
FROM Questions
GROUP BY kf_survey_id ) q
ON Survey.kp_survey_id = q.kf_survey_id
LEFT JOIN
( SELECT COUNT(*) cnt, kf_survey_id
FROM Session
WHERE session_status = 'started'
GROUP BY kf_survey_id ) s
ON Survey.kp_survey_id = s.kf_survey_id
LEFT JOIN
( SELECT COUNT(*) cnt, kp_parent_survey_id
FROM ParentSurvey
GROUP BY kp_parent_survey_id ) p
ON Survey.kf_parent_survey_id = p.kp_parent_survey_id

One thing you need to do is correct your joins. When you are joining to a subquery, you need to use the alias of the subquery. In your case you are using the alias of the table being used in the subquery.
Another thing you need to change is to include the field you wish to use in your JOIN in the subquery.
Make these changes and try running. Do you get an error or the desired results?
SELECT
`kp_survey_id`,
COALESCE( q.cnt, 0 ) AS questionsAmount,
COALESCE( s.cnt, 0 ) AS sessionsAmount
COALESCE( p.cnt, 0 ) AS parentQAmount,
FROM `Survey`
LEFT JOIN <-- I'd like the count of questions for this survey
( SELECT kf_survey_id, COUNT(*) AS cnt
FROM Questions
GROUP BY kf_survey_id ) q
ON Survey.kp_survey_id = q.kf_survey_id
LEFT JOIN
( SELECT kf_survey_id, COUNT(*) AS cnt <-- I'd like the count of started sessions for this survey
FROM Session
WHERE session_status = 'started' <-- should this be Session.session_status?
GROUP BY kf_survey_id ) s
ON Survey.kp_survey_id = s.kf_survey_id
LEFT JOIN
( SELECT kp_parent_survey_id, COUNT(*) AS cnt <-- I'd like the count of questions in the parent survey with this survey id
FROM ParentSurvey
GROUP BY kp_parent_survey_id ) p
ON Survey.kf_parent_survey_id = p.kf_parent_survey_id

Related

MySQL condition statement across multiple rows

I have a Profile table like this
|--------|-----------|
| People | Favorite |
|--------|-----------|
| A | Movie |
| B | Movie |
| B | Jogging |
|--------|-----------|
Q: How to retrieve the people whose favorite is movie but not jogging?
In this table, the result is only People A.
Although I came out with this
select People from Profile
where
People
in
(select People from Profile
where favorite='Movie')
and
People
not in
(select People from Profile
where favorite='Jogging')
But it seem like can be better, any suggestion or answer (without using join or union clause)?
https://www.db-fiddle.com/f/rboiDpxxbABCpjtduEz7uY/1
SELECT People
FROM `profile`
GROUP BY people
HAVING SUM('Movie' = favorite) > 0
AND SUM('Jogging' = favorite) = 0
There's lots of ways. While you can use a UNION, its rather messy and innefficient. MySQL doesn't have a MINUS clause which would give a fairly easy to understand query.
You could aggregate the data:
SELECT people
, MAX(IF(favorite='jogging', 1, 0)) as jogging
, MAX(IF(favorite='movie', 1, 0)) as movie
FROM profile
GROUP BY people
HAVING movie=1 AND jogging=0
Or use an outer join:
SELECT m.people
FROM profile m
LEFT JOIN
( SELECT j.people
FROM joggers j
WHERE j.favorite='jogging' ) joggers
ON m.people=joggers.people
WHERE joggers.people IS NULL
AND m.favorite='movies'
Using a NOT IN/NOT EXISTS gives clearer syntax but again would be very innefficient.
There are several query patterns that will return a result that satisfies the specification.
We can use NOT EXISTS with a correlated subquery:
SELECT p.people
FROM profile p
WHERE p.favorite = 'Movie'
AND NOT EXISTS ( SELECT 1
FROM profile q
WHERE q.favorite = 'Jogging'
AND q.people = p.people /* related to row in out query */
)
ORDER
BY p.people
An equivalent result can also be done with an anti-join pattern:
SELECT p.people
FROM profile p
LEFT
JOIN profile q
ON q.people = p.people
AND q.favorite = 'Jogging'
WHERE q.people IS NULL
AND p.favorite = 'Movie'
ORDER BY p.people
Another option is conditional aggregation. Without a guarantee about uniqueness, and some MySQL shorthand:
SELECT p.people
FROM profile p
GROUP
BY p.people
HAVING 1 = MAX(p.favorite='Movie')
AND 0 = MAX(p.favorite='Jogging')
A more portable more ANSI standard compliant syntax for the conditional aggregation:
SELECT p.people
FROM profile p
GROUP
BY p.people
HAVING 1 = MAX(CASE p.favorite WHEN 'Movie' THEN 1 ELSE 0 END)
AND 0 = MAX(CASE p.favorite WHEN Jogging' THEN 1 ELSE 0 END)
This is a common problem when you want to have multiple conditions with the same column. I have answered this here and there are other methods like intersect and subqueries.
SELECT people, GROUP_CONCAT(favorite) as fav
FROM profile
GROUP BY people
HAVING fav REGEXP 'Movie'
AND NOT fav REGEXP 'Jogging';
With group by people and checking the minimum and maximum values of favorite to be 'Movie':
select people from tablename
where favorite in ('Movie', 'Jogging')
group by people
having min(favorite) = 'Movie' and max(favorite) = 'Movie'

An explanation with SQL query

I trying to get some data for my JavaFX Application from a couple of tables in database with MySQl.
Here's the query:
select veturattable.id, veturattable.vetura,veturattable.modeli,veturattable.ngjyra,
veturattable.targa, renttable.pagesa, hargjimettable.shuma
from veturattable
left join hargjimettable
on hargjimettable.veturaid= veturattable.id
left join renttable
on renttable.veturaid = veturattable.id ;
Here are datas from rentable
And here are datas from hargjimettable
So what I need is to show me this one:
veturaid | pagesa | shuma
1 | 150 | 91
10 | 110 | 40
You actually need to do two subqueries pre-aggregating the sum amounts per respective ID. Then join each individually back to the main. If you don't, you are getting a Cartesian product. For every record in the hargjimettable table for a given ID, it is joined to the renttable for each amount there. So, if you have 2 records in first table and 3 records in the second, you are getting a multiple of 6.
By pre-querying each grouping by the one ID key respectively, you will only have at most, one record for each possible summation. So grab that record if it exists. The left-join prevents some IDs from not showing up. Using coalesce() prevents nulls from showing.
select
v.id,
v.vetura,
v.modeli,
v.ngjyra,
v.targa,
COALESCE( RSum.SumPagesa, 0 ) as AllPagesa,
COALESCE( HSum.SumShuma, 0 ) as AllShuma
from
veturattable v
left join
( select
h.veturaid,
SUM( h.shuma ) as SumShuma
from
hargjimettable h
group by
h.veturaid ) HSum
ON v.id = HSum.veturaid
left join
( select
r.veturaid,
SUM( r.pagesa ) as SumPagesa
from
renttable r
group by
r.veturaid ) RSum
ON v.id = RSum.veturaid
You actually want the MAX() and SUM() along the GROUP BY like
select max(veturattable.id) as id, max(veturattable.vetura) as vetura,
max(veturattable.modeli) as modeli,
max(veturattable.ngjyra) as ngjyra,
max(veturattable.targa) as targa,
max(renttable.pagesa) as pagesa,
sum(hargjimettable.shuma) as shuma
from veturattable
left join hargjimettable
on hargjimettable.veturaid= veturattable.id
left join renttable
on renttable.veturaid = veturattable.id
group by veturattable.id;

mysql big query optimization

I'd need to optimize the following query which takes up to 10 minutes to run.
Performing the explain it seems to be running on all 350815 rows of the "table_3" table and 1 for all the others.
General rules to place indexes the propper way? Should I think about using multidimensional indexes? Where should I use them at first on the JOINS, the WHERE or the GROUP BY, if I remember right there should be a hierarchy to follow. Also If I have 1 row for all tables but one (in the row column of the explain table) how can I optimize usually my optimization consists in ending up with only one row for all columns but one.
All tables average from 100k to 1000k+ rows.
CREATE TABLE datab1.sku_performance
SELECT
table1.sku,
CONCAT(table1.sku,' ',table1.fk_container ) as sku_container,
table1.price as price,
SUM( CASE WHEN ( table1.fk_table1_status = 82
OR table1.fk_table1_status = 119
OR table1.fk_table1_status = 124
OR table1.fk_table1_status = 141
OR table1.fk_table1_status = 131) THEN 1 ELSE 0 END)
/ COUNT( DISTINCT id_catalog_school_class) as qty_returned,
SUM( CASE WHEN ( table1.fk_table1_status In (23,13,44,65,6,75,8,171,12,166))
THEN 1 ELSE 0 END)
/ COUNT( DISTINCT id_catalog_school_class) as qt,
container.id_container as container_id,
container.idden as container_idden,
container.delivery_badge,
catalog_school.id_catalog_school,
LEFT(catalog_school.flight_fair,2) as departing_country,
catalog_school.weight,
catalog_school.flight_type,
catalog_school.price,
table_3.id_table_3,
table_3.fk_catalog_brand,
MAX( LEFT( table_3.note,3 )) AS supplier,
GROUP_CONCAT( product_number, ' by ',FORMAT(catalog_school_class.quantity,0)
ORDER BY product_number ASC SEPARATOR ' + ') as supplier_prod,
Sum( distinct( catalog_school_class.purch_pri * catalog_school_class.quantity)) AS final_purch_pri,
catalog_groupp.idden as supplier_idden,
catalog_category_details.id_catalog_category,
catalog_category_details.cat1 as product_cat1,
catalog_category_details.cat2 as product_cat2,
COUNT( distinct catalog_school_class.id_catalog_school_class) as setinfo,
datab1.pageviewgrouped.pv as page_views,
Sum(distinct(catalog_school_class.purch_pri * catalog_school_class.quantity)) AS purch_pri,
container_has_table_3.position,
max( table1.created_at ) as last_order_date
FROM
table1
LEFT JOIN container
ON table1.fk_container = container.id_container
LEFT JOIN catalog_school
ON table1.sku = catalog_school.sku
LEFT JOIN table_3
ON catalog_school.fk_table_3 = table_3.id_table_3
LEFT JOIN container_has_table_3
ON table_3.id_table_3 = container_has_table_3.fk_table_3
LEFT JOIN datab1.pageviewgrouped
on table_3.id_table_3 = datab1.pageviewgrouped.url
LEFT JOIN datab1.catalog_category_details
ON datab1.catalog_category_details.id_catalog_category = table_3_has_catalog_minority.fk_catalog_category
LEFT JOIN catalog_groupp
ON table_3.fk_catalog_groupp = catalog_groupp.id_catalog_groupp
LEFT JOIN table_3_has_catalog_minority
ON table_3.id_table_3 = table_3_has_catalog_minority.fk_table_3
LEFT JOIN catalog_school_class
ON catalog_school.id_catalog_school = catalog_school_class.fk_catalog_school
WHERE
table_3.status_ok = 1
AND catalog_school.status = 'active'
AND table_3_has_catalog_minority.is_primary = '1'
GROUP BY
table1.sku,
table1.fk_container;
rows per table :
.table1 960096 to 1.3mn rows
.container 9275 to 13000 rows
.catalog_school 709970 to 1 mn rows
.table_3 709970 to 1 mn rows
.container_has_table_3 709970 to 1 mn rows
.pageviewgrouped 500000 rows
.catalog_school_class 709970 to 1 mn rows
.catalog_groupp 3000 rows
.table_3_has_catalog_minority 709970 to 1 mn rows
.catalog_category_details 659 rows
Too much to put into a single comment, so I'll add here and adjust later as possibly needed... You have LEFT JOINs everywhere, but your WHERE clause is specifically qualifying fields from the Table_3, Catalog_School and Table_3_has_catalog_minority. This by default changes them to INNER JOINs.
With respect to your where clause
WHERE
table_3.status_ok = 1
AND catalog_school.status = 'active'
AND table_3_has_catalog_minority.is_primary = '1'
Which table / column would have the smallest results based on these criteria. ex: Table_3.Status_ok = 1 might have 500k records but table_3_has_catalog_minority.is_primary may only have 65k and catalog_school.status = 'active' may have 430k.
Also, some of your columns are not qualified with the table they are coming from. Can you please confirm... such as "id_catalog_school_class" and "product_number"
SOMETIMES, changing the order of the tables, with good knowledge of the makeup of the data and in MySQL adding a "STRAIGHT_JOIN" keyword can improve performance. This was something I've had in the past working with gov't database of contracts and grants with 20+ million records and joining to about 15+ lookup tables. It went from hanging the server to getting the query finished in less than 2 hrs. Considering the amount of data I was dealing with, that was actually a good time.
AFTER dissecting this thing some, I restructured a bit more for readability, added aliases for table references and changed the order of the query and have some suggested indexes. To help the query, I tried moving the Catalog_School table to the first position and added the STRAIGHT_JOIN. The index is based on the STATUS first to match the WHERE clause, THEN I included the SKU as it is first element of the GROUP BY, then the other columns used to join to the subsequent tables. By having these columns in the index, it can qualify the joins without having to go to the raw data.
By changing the group by to the Catalog_School.SKU instead of table_1.SKU the index from catalog_school can be used to help optimize that. It is the same value since the join from the catalog_school.sku = table_1.sku. I also added index references for table_1 and table_3 that are suggestions -- again, to preemptively qualify the joins without going to the raw data pages of the tables.
I would be interested in knowing the final performance (better or worse) from your data.
TABLE INDEX ON...
catalog_school ( status, sku, fk_table_3, id_catalog_school )
table_1 ( sku, fk_container )
table_3 ( id_table_3, status_ok, fk_catalog_groupp )
SELECT STRAIGHT_JOIN
CS.sku,
CONCAT(CS.sku,' ',T1.fk_container ) as sku_container,
T1.price as price,
SUM( CASE WHEN ( T1.fk_table1_status IN ( 82, 119, 124, 141, 131)
THEN 1 ELSE 0 END)
/ COUNT( DISTINCT CSC.id_catalog_school_class) as qty_returned,
SUM( CASE WHEN ( T1.fk_table1_status In (23,13,44,65,6,75,8,171,12,166))
THEN 1 ELSE 0 END)
/ COUNT( DISTINCT CSC.id_catalog_school_class) as qt,
CS.id_catalog_school,
LEFT(CS.flight_fair,2) as departing_country,
CS.weight,
CS.flight_type,
CS.price,
T3.id_table_3,
T3.fk_catalog_brand,
MAX( LEFT( T3.note,3 )) AS supplier,
C.id_container as container_id,
C.idden as container_idden,
C.delivery_badge,
GROUP_CONCAT( product_number, ' by ',FORMAT(CSC.quantity,0)
ORDER BY product_number ASC SEPARATOR ' + ') as supplier_prod,
Sum( distinct( CSC.purch_pri * CSC.quantity)) AS final_purch_pri,
CGP.idden as supplier_idden,
CCD.id_catalog_category,
CCD.cat1 as product_cat1,
CCD.cat2 as product_cat2,
COUNT( distinct CSC.id_catalog_school_class) as setinfo,
PVG.pv as page_views,
Sum(distinct(CSC.purch_pri * CSC.quantity)) AS purch_pri,
CHT3.position,
max( T1.created_at ) as last_order_date
FROM
catalog_school CS
JOIN table1 T1
ON CS.sku = T1.sku
LEFT JOIN container C
ON T1.fk_container = C.id_container
LEFT JOIN catalog_school_class CSC
ON CS.id_catalog_school = CSC.fk_catalog_school
JOIN table_3 T3
ON CS.fk_table_3 = T3.id_table_3
JOIN table_3_has_catalog_minority T3HCM
ON T3.id_table_3 = T3HCM.fk_table_3
LEFT JOIN datab1.catalog_category_details CCD
ON T3HCM.fk_catalog_category = CCD.id_catalog_category
LEFT JOIN container_has_table_3 CHT3
ON T3.id_table_3 = CHT3.fk_table_3
LEFT JOIN datab1.pageviewgrouped PVG
on T3.id_table_3 = PVG.url
LEFT JOIN catalog_groupp CGP
ON T3.fk_catalog_groupp = CGP.id_catalog_groupp
WHERE
CS.status = 'active'
AND T3.status_ok = 1
AND T3HCM.is_primary = '1'
GROUP BY
CS.sku,
T1.fk_container;

SQL Select outter Join and Group By and IFNULL ( COUNT ) HELP =x

I have this database and i was wondering to create a great select but is too hard for me I guess I tried so many ways and I get really close, but i cant go longer.
Database
Table -> Candidato | IdCandidato(int) | idNome(varchar)
Table -> Voto | idVoto(int) | Candidato_idCandidato(int) | DiaVotacao(date)
i am creating a web voting system and need i greate select to complet my graphics to show the total voting for each day for each candidate.
Candidato = candidate | voto = vote | diaVotacao = voting day (english translation)
I need i response like this:
|VotingDay---------|-----Candidate1----------|-----Candidate2------|--Candidate3
|2014-05-14-------|---13(total votes)---------|------------4------------|-----------10|
|2014-05-15-------|---18(total votes)---------|------------0------------|------------8|
and so far i got this:
|VotingDay---------|-----TOTAL Votes----------|-----Name------|
|2014-05-14-------|---13(total votes)---------|-Candidate1
|2014-05-14-------|---18(total votes)---------|-Candidate2
|2014-05-15-------|---10(total votes)---------|-Candidate1
|2014-05-15-------|----8(total votes)---------|-Candidate2
I used the following code:
SELECT voto.DiaVotacao, IFNULL(COUNT(voto.Candidato_idCandidato),0) as Votos, candidato.Nome
FROM candidato LEFT OUTER JOIN voto ON voto.Candidato_idCandidato=candidato.idCandidato
GROUP BY voto.DiaVotacao, voto.Candidato_idCandidato
Note that i want the count of the votes for each candidate for everey day and if there is no votes apear the number 0 to indicate no votes
did u guys understand?
You need to generate the full list of candidates and days and then do the left outer join. You can get the list of days from the votos table.
Note that when you use count(<column>), it will return 0 if all the values are NULL. There is no need for ifull() or coalesce():
SELECT d.DiaVotacao, COUNT(v.Candidato_idCandidato) as Votos, c.Nome
FROM candidato c cross join
(SELECT DISTINCT v.DiaVotacao FROM voto v
) d LEFT OUTER JOIN
voto v
ON v.Candidato_idCandidato = c.idCandidato and
v.DiaVotacao = d.DiaVotacao
GROUP BY d.DiaVotacao, v.Candidato_idCandidato;

MySQL query joining 3 tables and counting

I've been stuck on this problem for far too long.
I have to merge 3 tables and do some counting of distinct values.
I have 3 tables
1.User_me
profileId( String )
responded( int 1 or 0)
2.Profiles
profileId ( String )
idLocation ( int )
3.lookup_location
id ( int )
location (String )
I can join User_me and Profiles ON User_me.profileId = Profiles.profileId
I can join Profiles and lookup_location ON Profiles.idLocation = lookup_location.id
Under Profiles I need to count the number of distinct values for idLocation where User_me.profileId = Profiles.profileId
I also need to count the number of Profiles.idLocation that have User_me.responded = 1
I have this:
SELECT lookup.location, count(*) as total
FROM User_me user
JOIN Profiles
ON user.profileId= profiles.profileId
JOIN lookup_location lookup
ON profiles.idLocation = lookup.id
GROUP BY profiles.idLocation
but I still need to have the column giving me the count where User_me.responded = 1
Something like:
SELECT lookup.location, count(*) as total, count(*) responded
If I'm understanding your question correctly, you can you a case statement in the count aggregate:
SELECT lookup.location, count(*) as total,
count(case when user.responded = 1 then 1 end) as responded
FROM User_me user
JOIN Profiles
ON user.profileId= profiles.profileId
JOIN lookup_location lookup
ON profiles.idLocation = lookup.id
GROUP BY profiles.idLocation
Since you're using MySQL, you can also use something like sum(user.responded = 1).