Group Concat having giving weird results - mysql
I have this SQL query:
SELECT v.*, group_concat(distinct(vi.interest_id)) as interests, group_concat(distinct(vs.skill_id)) as skills, vc.date_from
FROM `vacancies` as v
LEFT JOIN `vacancy_interests` as vi on v.vacancy_id = vi.vacancy_id
LEFT JOIN `vacancy_skills` as vs on v.vacancy_id = vs.vacancy_id
LEFT JOIN `vacancy_calendar` as vc on v.vacancy_id = vc.vacancy_id
WHERE v.vacancy_visibility_end_date >= CURDATE()
GROUP BY v.vacancy_id
This query returns the following results (the last three columns are the ones this question is about):
vacancy_id,org_id,name,description,number_required,occupancy_kind,website,offer,logo,banner,address_country,address_city,address_postal_code,address_line_1,address_line_2,vacancy_visibility_start_date,vacancy_visibility_end_date,engagement,interests,skills,date_from
"2","1","test123","aze<sdgqswdfg","1","1","","blabla",NULL,"12049394_10208129537615226_4853636504350654671_n.jpg","Belgie","Brussel","1000","Brusselsestraat 15",NULL,"2016-09-02 00:00:00","2016-09-19 00:00:00","3","13,6,1","4,3","2016-09-13 00:00:00"
"3","1","blablabla","lkpjoip","1","2","","blabla",NULL,NULL,"Belgie","Antwerpen","2000","Antwerpsestraat 16",NULL,"2016-09-02 00:00:00","2016-09-29 00:00:00","3","28","7,8,5","2016-09-01 00:00:00"
"4","1","hahaha","14556dsf","1","3","","blabla",NULL,NULL,"Belgie","Mechelen","2800","Mechelsesteenweg 17",NULL,"2016-09-02 00:00:00","2016-09-28 00:00:00","3",NULL,NULL,"2016-09-26 00:00:00"
"5","1","omggg","45sdfdj5","1","1","","blabla",NULL,NULL,"Belgie","Gent","3000","Gentsesteenweg 18",NULL,"2016-09-02 00:00:00","2016-09-30 00:00:00","3","17,11","4,1","2016-09-19 00:00:00"
"6","1","this is a test","wauhiufdsq","1","2","","blabla",NULL,NULL,"Belgie","Luik","4000","Luikseweg 19",NULL,"2016-09-02 00:00:00","2016-09-30 00:00:00","3","19,17,22","6","2016-08-10 00:00:00"
Note that the vacancy_interests and vacancy_skills tables can contain multiple records for a single vacancy. For example, vacancy 3 could have 3 rows, each with a different interest_id. GROUP_CONCAT solves that problem for me.
So this query works as it should.
However, I ran into two problems:
1) When I add a HAVING filter on the interests by an ID, it returns only one row instead of the expected two:
SELECT v.*, group_concat(distinct(vi.interest_id)) as interests, group_concat(distinct(vs.skill_id)) as skills, vc.date_from
FROM `vacancies` as v
LEFT JOIN `vacancy_interests` as vi on v.vacancy_id = vi.vacancy_id
LEFT JOIN `vacancy_skills` as vs on v.vacancy_id = vs.vacancy_id
LEFT JOIN `vacancy_calendar` as vc on v.vacancy_id = vc.vacancy_id
WHERE v.vacancy_visibility_end_date >= CURDATE()
GROUP BY v.vacancy_id
HAVING interests IN (17)
This returns only one row, namely the record with vacancy_id 5, while it should obviously also return vacancy_id 6.
The weirdest thing to me is that if I do exactly the same thing for skills (HAVING skills IN (4)), it does return multiple rows with the correct result.
2) When I want to filter on date_from (together with the interests and skills in the HAVING), I do the following:
SELECT v.*, group_concat(distinct(vi.interest_id)) as interests, group_concat(distinct(vs.skill_id)) as skills, vc.date_from
FROM `vacancies` as v
LEFT JOIN `vacancy_interests` as vi on v.vacancy_id = vi.vacancy_id
LEFT JOIN `vacancy_skills` as vs on v.vacancy_id = vs.vacancy_id
LEFT JOIN `vacancy_calendar` as vc on v.vacancy_id = vc.vacancy_id
WHERE v.vacancy_visibility_end_date >= CURDATE() AND date(vc.date_from) > '2016-09-10'
GROUP BY v.vacancy_id
HAVING skills IN (4)
This returns only vacancy 5, while vacancy 2 obviously also has a date after 2016-09-10 (2016-09-13 00:00:00).
What am I doing wrong here?
Your HAVING clauses are the wrong way to check whether a value is present. Instead of testing the concatenated string, test the individual rows:
HAVING MAX(vi.interest_id IN (17)) > 0
When you do:
HAVING interests IN (17)
Then you are comparing a string to a number, so MySQL silently converts the string to a number. Only the leading numeric part is used: "17,11" becomes 17, but "19,17,22" becomes 19. So a row matches only if the first ID in interests is 17. That is also why HAVING skills IN (4) seemed to work: in both rows that contain skill 4 ("4,3" and "4,1"), the 4 happens to come first.
Also note that using DISTINCT inside group_concat() is fine as long as each vacancy has only a few interests and skills. The LEFT JOINs multiply rows: if a vacancy had 100 interests and 100 skills, the intermediate result would have 10,000 rows for that vacancy and take longer to process. With just a handful of each, the method is fine.
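The conversion rule can be modelled in a few lines of Python to see which vacancies each HAVING clause keeps. This is a simplified, hypothetical model: `mysql_string_to_number` is not a MySQL API, it only reads a leading integer (real MySQL also handles decimals and exponents and raises a truncation warning), and vacancy 4 is left out because its NULL values never compare equal to anything. The values are taken from the sample result in the question.

```python
import re

def mysql_string_to_number(s):
    """Simplified model of MySQL reading a string in numeric context:
    keep the leading (optionally signed) integer, or 0 if there is none."""
    match = re.match(r"\s*[+-]?\d+", s)
    return int(match.group()) if match else 0

# interests/skills values from the sample result (vacancy 4 is NULL, omitted)
interests = {2: "13,6,1", 3: "28", 5: "17,11", 6: "19,17,22"}
skills = {2: "4,3", 3: "7,8,5", 5: "4,1", 6: "6"}

def having_in(values, target):
    """Mimic `HAVING col IN (target)` under the simplified conversion."""
    return sorted(vid for vid, s in values.items()
                  if mysql_string_to_number(s) == target)

def contains_id(s, target):
    """What the filter was meant to do: is target one of the listed IDs?"""
    return str(target) in s.split(",")

print(having_in(interests, 17))  # [5]    only vacancy 5 lists 17 first
print(having_in(skills, 4))      # [2, 5] both happen to list 4 first
print(sorted(vid for vid, s in interests.items() if contains_id(s, 17)))  # [5, 6]
```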
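A minimal check of the `HAVING MAX(vi.interest_id IN (17)) > 0` approach follows. It runs on SQLite through Python's sqlite3 module as a stand-in for MySQL, uses only the vacancies and vacancy_interests tables, and fills them with the interest IDs from the sample result. In both databases the comparison yields 1 or 0 per row (NULL for a vacancy with no interests), so MAX(...) > 0 keeps a group if any of its rows matched; the same pattern works for skills with `MAX(vs.skill_id IN (4)) > 0`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE vacancies (vacancy_id INTEGER PRIMARY KEY);
    CREATE TABLE vacancy_interests (vacancy_id INTEGER, interest_id INTEGER);
    INSERT INTO vacancies VALUES (2), (3), (4), (5), (6);
    INSERT INTO vacancy_interests VALUES
        (2, 13), (2, 6), (2, 1),
        (3, 28),
        (5, 17), (5, 11),
        (6, 19), (6, 17), (6, 22);
""")

# Test each joined row, not the concatenated string: a group survives
# if at least one of its rows has interest_id 17.
rows = conn.execute("""
    SELECT v.vacancy_id, group_concat(DISTINCT vi.interest_id) AS interests
    FROM vacancies AS v
    LEFT JOIN vacancy_interests AS vi ON v.vacancy_id = vi.vacancy_id
    GROUP BY v.vacancy_id
    HAVING MAX(vi.interest_id IN (17)) > 0
    ORDER BY v.vacancy_id
""").fetchall()

print([vacancy_id for vacancy_id, _ in rows])  # [5, 6]
```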
Related
How do I divide data from a column in one table by data from a column in another table?
I'm pretty new to MySQL and am working on a query that should divide the data from a column in one table by the data from a column in another table. I've had a good hunt around the site but can't find quite what I'm looking for (or it may be a lack of understanding on my part).
I wish to divide TotalSpend by TotalHires and think I'm pretty close, but it's not quite right. I have 4 rows of data in each column and am looking for 4 calculations to be executed. At the moment 16 results are being returned. I only wish to divide data that relates to the same member_id, but I think each cell in one column is being divided by each cell in the other column. How can I alter my code to return only the desired 4 calculations?
Total spend:
SELECT garment_hire_header.member_id, garment_hire_line.hire_id, SUM(garment_hire_line.days*catalogue.daily_rate) AS 'Total Spend'
FROM garment_hire_line
JOIN garment_hire_header ON garment_hire_line.hire_id = garment_hire_header.hire_id
JOIN garment ON garment_hire_line.garment_id = garment.garment_id
JOIN catalogue ON garment.catalogue_id = catalogue.catalogue_id
Group by garment_hire_header.member_id
TotalHires:
SELECT garment_hire_header.member_id, COUNT(garment_hire_header.hire_id) as TotalHires
FROM garment_hire_header
Group by garment_hire_header.member_id
Nearly working final code (returns 16 calculations):
SELECT s.TotalSpend / h.TotalHires
from (SELECT SUM(garment_hire_line.days*catalogue.daily_rate) AS TotalSpend
FROM garment_hire_line
JOIN garment_hire_header ON garment_hire_line.hire_id = garment_hire_header.hire_id
JOIN garment ON garment_hire_line.garment_id = garment.garment_id
JOIN catalogue ON garment.catalogue_id = catalogue.catalogue_id
Group by garment_hire_header.member_id) s
CROSS JOIN (SELECT COUNT(garment_hire_header.hire_id) as TotalHires
FROM garment_hire_header
Group by garment_hire_header.member_id) h
Now that that's working, I'm trying to push it a little further.
I only want to display data if the hire_id is found in another query. I've tried adding the following, which has worked in the past on different queries, but for some reason it is not working in this case:
WHERE garment_hire_line.hire_id IN (SELECT hire_id, DATE_ADD(date_out, INTERVAL (days) DAY) AS 'Expected Return', DATEDIFF(return_date, DATE_ADD(date_out, INTERVAL (days) DAY)) as 'Days Late'
FROM garment_hire_line
WHERE DATEDIFF(return_date, DATE_ADD(date_out, INTERVAL (days) DAY)) > 0
ORDER BY hire_id)
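One likely reason that WHERE clause fails: the subquery selects three columns (hire_id, Expected Return and Days Late), but `hire_id IN (subquery)` compares a single value, so the subquery must return exactly one column, and MySQL rejects the three-column version. The sketch below shows both forms on SQLite through Python's sqlite3 module, with a hypothetical three-row garment_hire_line table. SQLite has no DATE_ADD or DATEDIFF, so the lateness test is written as the equivalent julianday() comparison (returned more than `days` days after date_out).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE garment_hire_line (
        hire_id INTEGER, date_out TEXT, days INTEGER, return_date TEXT);
    INSERT INTO garment_hire_line VALUES
        (1, '2016-01-01', 3, '2016-01-04'),
        (2, '2016-01-01', 3, '2016-01-06'),
        (3, '2016-02-01', 7, '2016-02-10');
""")

# Three columns in the IN subquery: the statement is rejected.
multi_column = """
    SELECT hire_id FROM garment_hire_line
    WHERE hire_id IN (SELECT hire_id, date_out, days FROM garment_hire_line)
"""
try:
    conn.execute(multi_column)
    multi_column_ok = True
except sqlite3.OperationalError as exc:
    multi_column_ok = False
    print("rejected:", exc)

# One column in the IN subquery. The condition is equivalent to
# DATEDIFF(return_date, DATE_ADD(date_out, INTERVAL days DAY)) > 0.
late = conn.execute("""
    SELECT hire_id FROM garment_hire_line
    WHERE hire_id IN (
        SELECT hire_id FROM garment_hire_line
        WHERE julianday(return_date) - julianday(date_out) > days)
    ORDER BY hire_id
""").fetchall()
print(late)  # [(2,), (3,)]
```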
It is possible to fix this by just changing your CROSS JOIN to a LEFT JOIN and then adding the member_id to match up the tables. However, there is a much nicer approach to this situation:
SELECT garment_hire_header.member_id, garment_hire_line.hire_id, SUM(garment_hire_line.days*catalogue.daily_rate) AS 'Total Spend', SQ1.TotalHires, SUM(garment_hire_line.days*catalogue.daily_rate)/SQ1.TotalHires AS `Total`
FROM garment_hire_line
LEFT JOIN garment_hire_header ON garment_hire_line.hire_id = garment_hire_header.hire_id
LEFT JOIN garment ON garment_hire_line.garment_id = garment.garment_id
LEFT JOIN catalogue ON garment.catalogue_id = catalogue.catalogue_id
LEFT JOIN (SELECT garment_hire_header.member_id, COUNT(garment_hire_header.hire_id) as TotalHires FROM garment_hire_header Group by garment_hire_header.member_id) AS SQ1 ON SQ1.member_id = garment_hire_header.member_id
Group by garment_hire_header.member_id;
I've left your initial code as it was as much as possible. The key point is that, since you are using the same tables for the calculation, you don't need to put everything into subqueries. I also changed your joins to LEFT JOINs. As long as every hire line has a matching header, garment and catalogue row, this gives the same result, but in general a LEFT JOIN keeps unmatched rows that a plain JOIN would drop, so pick the join type deliberately.
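The effect of matching the per-member hire counts on member_id, instead of cross joining them, can be checked on a toy dataset: two members should give two rows, not four. This sketch uses SQLite through Python's sqlite3 module with hypothetical data and a simplified schema in which daily_rate is stored directly on garment_hire_line rather than looked up through garment and catalogue. daily_rate is REAL so the division is not integer division (MySQL's `/` never truncates, SQLite's does for two integers).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE garment_hire_header (hire_id INTEGER, member_id INTEGER);
    -- Simplified: daily_rate lives on the line (no garment/catalogue tables)
    CREATE TABLE garment_hire_line (hire_id INTEGER, days INTEGER, daily_rate REAL);
    INSERT INTO garment_hire_header VALUES (1, 100), (2, 100), (3, 200);
    INSERT INTO garment_hire_line VALUES
        (1, 2, 10.0), (1, 1, 5.0),
        (2, 3, 10.0),
        (3, 4, 2.5);
""")

rows = conn.execute("""
    SELECT h.member_id,
           SUM(l.days * l.daily_rate) AS total_spend,
           sq1.TotalHires,
           SUM(l.days * l.daily_rate) / sq1.TotalHires AS spend_per_hire
    FROM garment_hire_line AS l
    LEFT JOIN garment_hire_header AS h ON l.hire_id = h.hire_id
    -- per-member hire counts, matched on member_id instead of CROSS JOINed
    LEFT JOIN (SELECT member_id, COUNT(hire_id) AS TotalHires
               FROM garment_hire_header
               GROUP BY member_id) AS sq1 ON sq1.member_id = h.member_id
    GROUP BY h.member_id, sq1.TotalHires
    ORDER BY h.member_id
""").fetchall()

for row in rows:
    print(row)
# (100, 55.0, 2, 27.5)
# (200, 10.0, 1, 10.0)
```

sq1.TotalHires is listed in GROUP BY only so the query stays valid under strict GROUP BY rules; it has one value per member anyway.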
Getting Incorrect SUM for left joins and GROUP BY
I am getting wrong results for the sum of total deposits. I want to output a report of total deposits per campaign_name, and eventually within a date range.
SELECT IFNULL(campaign_name,'DIRECT'), IFNULL(TotalDeposit,0)
FROM trackings
LEFT JOIN (SELECT deposit_amount, sum(deposit_amount) AS TotalDeposit, uuid
FROM conversions
LEFT JOIN transactions ON conversions.trader_id = transactions.trader_id
WHERE aff_id =3 AND TYPE='deposit'
GROUP BY transactions.trader_id) AS conversions ON trackings.uuid = conversions.uuid
WHERE aff_id=3
GROUP BY campaign_name
Results (200 is missing from the trynow campaign):
campaign_name,TotalDeposit
DIRECT,0.00
new_campaign_name,0.00
test march,500.00
testing,0.00
trynow,800.00
Expected results:
campaign_name,TotalDeposit
DIRECT,0.00
new_campaign_name,0.00
test march,500.00
testing,0.00
trynow,1000.00
I think your data isn't quite right: using the data you've supplied, the deposit of 500 for test march is never going to be returned, as it is linked to trader_id 7506, who has no records in the conversions table. However, the following query is simpler and easier to understand, and correctly returns 1000 for trynow:
SELECT IFNULL(SUM(t.deposit_amount),0) AS total_deposits
, IFNULL(tr.campaign_name,'DIRECT') AS campaign
FROM trackings tr
LEFT JOIN conversions c ON tr.uuid = c.uuid
LEFT JOIN transactions t ON c.trader_id = t.trader_id AND tr.`aff_id` = t.aff_id AND t.type = 'Deposit'
WHERE tr.aff_id = 3
AND tr.updated_at >= '2015-03-01'
AND tr.updated_at < '2015-04-01'
GROUP BY IFNULL(tr.campaign_name,'DIRECT')
If you can check the test data supplied, or otherwise point me in the right direction, I might be able to improve the query to return exactly what you want.
For date filtering, see the addition to the WHERE clause above. Note that if you need to filter on a date in the transactions table, that condition must go in the ON clause instead: transactions is left-joined, so a condition on it in the main WHERE clause would discard the rows where no transaction matched, effectively turning the LEFT JOIN into an inner join.
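The difference between filtering a left-joined table in ON and in WHERE is easy to see on a tiny example. This sketch runs on SQLite through Python's sqlite3 module; the schema is simplified and hypothetical (transactions is linked to trackings directly by uuid, skipping conversions, and has an assumed created_at column), with one campaign whose only deposit falls outside March.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE trackings (uuid TEXT, campaign_name TEXT);
    CREATE TABLE transactions (uuid TEXT, deposit_amount REAL, created_at TEXT);
    INSERT INTO trackings VALUES ('a', 'trynow'), ('b', 'testing');
    INSERT INTO transactions VALUES
        ('a', 800, '2015-03-05'),
        ('a', 200, '2015-03-20'),
        ('b', 300, '2015-02-10');
""")

# Date filter in ON: 'testing' has no March deposit, so it gets a
# NULL-extended row and still appears, with a total of 0.
in_on = conn.execute("""
    SELECT tr.campaign_name, IFNULL(SUM(t.deposit_amount), 0)
    FROM trackings tr
    LEFT JOIN transactions t
           ON t.uuid = tr.uuid
          AND t.created_at >= '2015-03-01' AND t.created_at < '2015-04-01'
    GROUP BY tr.campaign_name ORDER BY tr.campaign_name
""").fetchall()

# Same filter in WHERE: it runs after the join, so the only row for
# 'testing' (its February deposit) is discarded and the campaign vanishes.
in_where = conn.execute("""
    SELECT tr.campaign_name, IFNULL(SUM(t.deposit_amount), 0)
    FROM trackings tr
    LEFT JOIN transactions t ON t.uuid = tr.uuid
    WHERE t.created_at >= '2015-03-01' AND t.created_at < '2015-04-01'
    GROUP BY tr.campaign_name ORDER BY tr.campaign_name
""").fetchall()

print(in_on)     # [('testing', 0), ('trynow', 1000.0)]
print(in_where)  # [('trynow', 1000.0)]
```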
Return the minimum of single field grouping in wider query
Here is a query:
select e.eid, e.event_time as contact_created, cs.name as initial_class, cn.create_time as lead_date
from bm_sets.event378 e
inner join bm_config.classes cs on cs.id = e.class_id and cs.cid=378 # and cs.name = 'Candidate'
left join bm_sets.conversion378 cn on cn.eid = e.eid and cn.create_time > e.event_time
where e.eid = 283818
group by eid, contact_created, initial_class, lead_date
The results of this query look like this:
eid, contact_created, initial_class, lead_date
283818 2015-03-07 09:43:42 Hot
283818 2015-03-10 22:19:47 Candidate
283818 2015-03-10 22:22:11 Candidate
I need to adjust this query so that only the first record is returned, the one with the minimum contact_created date. But since I'm using an aggregate function with other fields, I'm grouping by initial_class too, so the minimum is computed per combined grouping rather than per user.
Our server seems to struggle whenever I use a subquery. So I tried using another join as a filter, something like:
inner join bm_sets.event378 e1 on e1.eid = e.eid and e1.event_time < e.event_time
But I know before running it that this won't work, since the eid (user ID) 283818 will still be returned, and thus all associated data.
How can I restrict the results to only those records that correspond to the minimum event_time? I am using the WHERE condition on 283818 (my own user ID, for debugging) only as a sanity check while I construct this query. The finished query will not have this condition, and the results will be for many users.
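If you need only the first row, you can add ORDER BY contact_created LIMIT 1 (MySQL has no SELECT TOP 1; that is SQL Server syntax), although that only works while you are looking at a single user. Otherwise, you need a function that returns the minimum contact_created, and then compare it with the current contact_created. I hope that helps.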
OK, I got it. I used a null left join like so (on the back of other SO posts on the topic of groupwise min/max):
Added this to the select list: e1.event_time
Added this to the joins: left join bm_sets.event378 e1 on e1.eid = e.eid and e1.event_time < e.event_time
Added this to the WHERE clause: and e1.event_time is null
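The null left join (an anti-join) keeps an event only when no earlier event exists for the same user, which gives the first event per user without a subquery. Below is a minimal sketch on SQLite through Python's sqlite3 module; the single events table is a hypothetical stand-in for bm_sets.event378 joined to bm_config.classes (the class name is stored inline), and user 42 is invented to show that it works for many users at once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE events (eid INTEGER, event_time TEXT, class_name TEXT);
    INSERT INTO events VALUES
        (283818, '2015-03-07 09:43:42', 'Hot'),
        (283818, '2015-03-10 22:19:47', 'Candidate'),
        (283818, '2015-03-10 22:22:11', 'Candidate'),
        (42,     '2015-04-01 08:00:00', 'Candidate'),
        (42,     '2015-04-02 08:00:00', 'Hot');
""")

# e1 looks for an earlier event of the same user; where none exists,
# e1.event_time is NULL, so e is that user's earliest event.
first_events = conn.execute("""
    SELECT e.eid, e.event_time, e.class_name
    FROM events e
    LEFT JOIN events e1
           ON e1.eid = e.eid AND e1.event_time < e.event_time
    WHERE e1.event_time IS NULL
    ORDER BY e.eid
""").fetchall()

for row in first_events:
    print(row)
# (42, '2015-04-01 08:00:00', 'Candidate')
# (283818, '2015-03-07 09:43:42', 'Hot')
```

If two events of one user share the earliest timestamp, both rows survive, so add a tie-breaker to the ON condition if exactly one row per user is required.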
MySQL - Trying to show results for rows that have 0 records...across 3 columns
There's a lot of Q&A out there on how to make MySQL show results for rows that have 0 records, but they all involve 1-2 tables/fields at most. I'm trying to achieve the same thing across 3 fields, and I just can't seem to get it. Here's what I've hacked together:
SELECT circuit.circuit_name, county.county_name, result.adr_result, count( result.adr_result ) AS num_results
FROM ( SELECT cases.case_id, cases.county_id, cases.result_id FROM cases WHERE cases.status_id <> "2" ) q1
RIGHT JOIN county ON q1.county_id = county.county_id
RIGHT JOIN circuit ON county.circuit_id = circuit.circuit_id
RIGHT JOIN result ON q1.result_id = result.result_id
GROUP BY adr_result, circuit_name, county_name
ORDER BY circuit_name, county_name, adr_result
What I need to see is a list of ALL circuits in the first column, ALL counties per circuit in the second column, ALL possible adr_result entries for each county (they're the same for every county) in the third column, and then the count for each circuit/county/result combination, even if it is 0.
I've tried every combination of LEFT, RIGHT and INNER join (I know INNER is definitely not the solution, but I'm frustrated) and just can't see where I'm going wrong. Any help would be appreciated!
Here is a start. I can't follow your problem statement completely; for instance, what is the purpose of the cases table? Nonetheless, when you say "ALL" records for each of those tables, I interpret it as a Cartesian product, which is built in the derived table in the FROM clause: each county is joined to its circuit, and every such pair is combined with every result (note the CROSS JOIN, which has no ON condition). The cases are then LEFT JOINed onto that full list, so combinations without any cases are kept and counted as 0:
SELECT everthingjoin.circuit_name
, everthingjoin.county_name
, everthingjoin.adr_result
, COUNT(cases.case_id) AS num_results
FROM (SELECT circuit.circuit_name, county.county_id, county.county_name, result.result_id, result.adr_result
FROM circuit
JOIN county ON county.circuit_id = circuit.circuit_id
CROSS JOIN result) AS everthingjoin
LEFT JOIN cases ON cases.status_id <> "2"
AND cases.county_id = everthingjoin.county_id
AND cases.result_id = everthingjoin.result_id
GROUP BY everthingjoin.circuit_name, everthingjoin.county_name, everthingjoin.adr_result
ORDER BY circuit_name, county_name, adr_result
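The "build every combination first, then LEFT JOIN the facts" approach can be tried end to end on a toy dataset. This sketch runs on SQLite through Python's sqlite3 module with made-up circuits, counties, results and cases (all hypothetical). It joins each county to its own circuit before crossing with every result and matches cases on both county and result, which is this sketch's reading of "ALL counties per circuit". SQLite wants '2' in single quotes, where MySQL also accepts "2".

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE circuit (circuit_id INTEGER, circuit_name TEXT);
    CREATE TABLE county (county_id INTEGER, circuit_id INTEGER, county_name TEXT);
    CREATE TABLE result (result_id INTEGER, adr_result TEXT);
    CREATE TABLE cases (case_id INTEGER, county_id INTEGER,
                        result_id INTEGER, status_id TEXT);
    INSERT INTO circuit VALUES (1, 'First');
    INSERT INTO county VALUES (10, 1, 'Adams'), (11, 1, 'Brown');
    INSERT INTO result VALUES (1, 'Settled'), (2, 'Impasse');
    INSERT INTO cases VALUES
        (100, 10, 1, '1'),
        (101, 10, 1, '1'),
        (102, 10, 2, '2');
""")

rows = conn.execute("""
    SELECT everthingjoin.circuit_name,
           everthingjoin.county_name,
           everthingjoin.adr_result,
           COUNT(cases.case_id) AS num_results   -- NULL-extended rows count as 0
    FROM (SELECT circuit.circuit_name, county.county_id, county.county_name,
                 result.result_id, result.adr_result
          FROM circuit
          JOIN county ON county.circuit_id = circuit.circuit_id
          CROSS JOIN result) AS everthingjoin
    LEFT JOIN cases
           ON cases.status_id <> '2'
          AND cases.county_id = everthingjoin.county_id
          AND cases.result_id = everthingjoin.result_id
    GROUP BY everthingjoin.circuit_name, everthingjoin.county_name,
             everthingjoin.adr_result
    ORDER BY 1, 2, 3
""").fetchall()

for row in rows:
    print(row)
# ('First', 'Adams', 'Impasse', 0)   case 102 has status 2, so it is excluded
# ('First', 'Adams', 'Settled', 2)
# ('First', 'Brown', 'Impasse', 0)
# ('First', 'Brown', 'Settled', 0)
```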
Try this and see if it gives you some ideas:
SELECT circuit.circuit_name
, county.county_name
, result.adr_result
, COUNT(result.result_id) AS num_results
, COUNT(DISTINCT result.adr_result) AS num_distinct_results
FROM cases
LEFT JOIN county ON cases.county_id = county.county_id
LEFT JOIN circuit ON county.circuit_id = circuit.circuit_id
LEFT JOIN result ON cases.result_id = result.result_id
WHERE cases.status_id <> "2"
GROUP BY circuit.circuit_name, county.county_name, result.adr_result
ORDER BY circuit_name, county_name, adr_result
COUNT never returns NULL, so it needs no NULL handling (MySQL's ISNULL() takes one argument and returns 1 or 0, and COUNT(table.*) is not valid MySQL). Because this query starts from cases, it only lists combinations that have at least one case.
How can I summarise several columns in MySQL
I have a table with a couple of thousand rows. From it I need to extract a total of column WTE for each value in column band, including those where the total is 0. I also need each total to be in a column of its own, so that I can easily update a summary table.
The code I have at present returns the values from the relevant rows:
SELECT IF(band="E",WTE,0) AS `Band6_WTE`
FROM `orthoptists` AS o
LEFT JOIN `instances` AS i ON o.instance_FK = i.id
WHERE i.region = 14
But when I add SUM(), the result is incorrect (zero, when it should be several thousand):
SELECT IF(band="E",SUM(WTE),0) AS `Band6_WTE`
FROM `orthoptists` AS o
LEFT JOIN `instances` AS i ON o.instance_FK = i.id
WHERE i.region = 14
I have looked at http://en.wikibooks.org/wiki/MySQL/Pivot_table, but I do not understand how that approach should be applied to my problem. What should I do?
You need to put the IF inside the SUM:
SELECT SUM(IF(band="E",WTE,0)) AS `Band6_WTE`
FROM `orthoptists` AS o
LEFT JOIN `instances` AS i ON o.instance_FK = i.id
WHERE i.region = 14
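Repeating that SUM(IF(...)) once per band gives every total as its own column in a single pass, which is the conditional-aggregation idea behind the linked pivot-table page. The sketch runs on SQLite through Python's sqlite3 module with hypothetical rows; it spells IF(cond, a, b) as the portable CASE WHEN cond THEN a ELSE b END, and the E/F/G to Band6/Band7/Band8 column names follow the ones used in the answers here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE instances (id INTEGER PRIMARY KEY, region INTEGER);
    CREATE TABLE orthoptists (instance_FK INTEGER, band TEXT, WTE REAL);
    INSERT INTO instances VALUES (1, 14), (2, 15);
    INSERT INTO orthoptists VALUES
        (1, 'E', 1.0), (1, 'E', 0.5), (1, 'F', 0.8),
        (2, 'E', 1.0);
""")

# Each column adds up only the WTE of its own band; rows of other bands
# contribute 0, so a band with no rows still totals 0 rather than vanishing.
row = conn.execute("""
    SELECT SUM(CASE WHEN band = 'E' THEN WTE ELSE 0 END) AS Band6_WTE,
           SUM(CASE WHEN band = 'F' THEN WTE ELSE 0 END) AS Band7_WTE,
           SUM(CASE WHEN band = 'G' THEN WTE ELSE 0 END) AS Band8_WTE
    FROM orthoptists AS o
    LEFT JOIN instances AS i ON o.instance_FK = i.id
    WHERE i.region = 14
""").fetchone()

print(row)  # (1.5, 0.8, 0)   the region 15 row is filtered out
```

If the region might have no rows at all, every SUM returns NULL, so wrap each one in IFNULL(..., 0).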
In this particular case, wouldn't it be easier to use WHERE?
SELECT SUM(WTE) AS `Band6_WTE`
FROM `orthoptists` AS o
LEFT JOIN `instances` AS i ON o.instance_FK = i.id
WHERE i.region = 14 AND band = "E"
For the general case, you could use GROUP BY, since you say you need the result for each band (each value in column band):
SELECT band, SUM(WTE) AS `WTE`
FROM `orthoptists` AS o
LEFT JOIN `instances` AS i ON o.instance_FK = i.id
WHERE i.region = 14
GROUP BY band
SUM returns NULL rather than 0 when there is nothing to add up (for example, in the first query when no rows match), so you can use IFNULL to convert NULL to 0 if you like:
IFNULL(SUM(WTE), 0)
Note that GROUP BY only returns bands that actually occur in the matching rows; a band with no rows gets no row at all rather than a 0.
Edit: as you pointed out in the comments, you'd like multiple columns for the different bands rather than multiple rows. Generally speaking, you should not do that in SQL (use the second query and pivot the result in your code), but there are exceptions where it's significantly more complicated to do it outside of SQL, so here's how you could do it:
SELECT
(SELECT SUM(WTE) FROM `orthoptists` AS o LEFT JOIN `instances` AS i ON o.instance_FK = i.id WHERE i.region = 14 AND band = "E") AS `Band6_WTE`,
(SELECT SUM(WTE) FROM `orthoptists` AS o LEFT JOIN `instances` AS i ON o.instance_FK = i.id WHERE i.region = 14 AND band = "F") AS `Band7_WTE`,
(SELECT SUM(WTE) FROM `orthoptists` AS o LEFT JOIN `instances` AS i ON o.instance_FK = i.id WHERE i.region = 14 AND band = "G") AS `Band8_WTE`
The precise syntax might need a little tweaking in other databases (some require each SELECT to include a FROM clause, some may require a name for each column in a subselect). MySQL accepts an outer SELECT without a FROM clause, and the principle should apply regardless.
Nobody has explained why your original query gives you the incorrect result. The original query has an aggregation function in the SELECT clause, which tells MySQL that this is an aggregation query. There is no GROUP BY clause, so it returns one row, treating all rows as a single group. Now, what happens to band in this case? In standard SQL and most other dialects, the original query would return an error, saying something like "band is not aggregated". MySQL has a (mis)feature called hidden columns, which allows this syntax: it takes an indeterminate value of band from one of the rows for the comparison. If that value happens to be "E", the sum of WTE over all rows (every band, not just E) is returned; otherwise, 0 is returned. Since MySQL 5.7.5 the ONLY_FULL_GROUP_BY SQL mode is enabled by default and rejects this query. In general, you should avoid relying on hidden columns: in an aggregation query, any "bare" column in your SELECT list should also be in the GROUP BY clause.