GROUP_CONCAT with HAVING giving weird results - mysql

I have this SQL query :
SELECT v.*, group_concat(distinct(vi.interest_id)) as interests, group_concat(distinct(vs.skill_id)) as skills, vc.date_from
FROM `vacancies` as v
LEFT JOIN `vacancy_interests` as vi on v.vacancy_id = vi.vacancy_id
LEFT JOIN `vacancy_skills` as vs on v.vacancy_id = vs.vacancy_id
LEFT JOIN `vacancy_calendar` as vc on v.vacancy_id = vc.vacancy_id
WHERE v.vacancy_visibility_end_date >= CURDATE()
GROUP BY v.vacancy_id
Consider this query returning the following results (the 3 last columns are the ones discussed in this question):
vacancy_id,org_id,name,description,number_required,occupancy_kind,website,offer,logo,banner,address_country,address_city,address_postal_code,address_line_1,address_line_2,vacancy_visibility_start_date,vacancy_visibility_end_date,engagement,interests,skills,date_from
"2","1","test123","aze<sdgqswdfg","1","1","","blabla",NULL,"12049394_10208129537615226_4853636504350654671_n.jpg","Belgie","Brussel","1000","Brusselsestraat 15",NULL,"2016-09-02 00:00:00","2016-09-19 00:00:00","3","13,6,1","4,3","2016-09-13 00:00:00"
"3","1","blablabla","lkpjoip","1","2","","blabla",NULL,NULL,"Belgie","Antwerpen","2000","Antwerpsestraat 16",NULL,"2016-09-02 00:00:00","2016-09-29 00:00:00","3","28","7,8,5","2016-09-01 00:00:00"
"4","1","hahaha","14556dsf","1","3","","blabla",NULL,NULL,"Belgie","Mechelen","2800","Mechelsesteenweg 17",NULL,"2016-09-02 00:00:00","2016-09-28 00:00:00","3",NULL,NULL,"2016-09-26 00:00:00"
"5","1","omggg","45sdfdj5","1","1","","blabla",NULL,NULL,"Belgie","Gent","3000","Gentsesteenweg 18",NULL,"2016-09-02 00:00:00","2016-09-30 00:00:00","3","17,11","4,1","2016-09-19 00:00:00"
"6","1","this is a test","wauhiufdsq","1","2","","blabla",NULL,NULL,"Belgie","Luik","4000","Luikseweg 19",NULL,"2016-09-02 00:00:00","2016-09-30 00:00:00","3","19,17,22","6","2016-08-10 00:00:00"
Note that the vacancy_interests and vacancy_skills tables can contain multiple records for a single vacancy, e.g. vacancy 3 could have 3 rows, each with a different interest_id. The group_concat solves my problem here.
So this query works fine as it should.
However, 2 problems I encountered are the following:
1) When I add a filter in HAVING on the interests by an ID this only returns me one row instead of the expected two rows.
SELECT v.*, group_concat(distinct(vi.interest_id)) as interests, group_concat(distinct(vs.skill_id)) as skills, vc.date_from
FROM `vacancies` as v
LEFT JOIN `vacancy_interests` as vi on v.vacancy_id = vi.vacancy_id
LEFT JOIN `vacancy_skills` as vs on v.vacancy_id = vs.vacancy_id
LEFT JOIN `vacancy_calendar` as vc on v.vacancy_id = vc.vacancy_id
WHERE v.vacancy_visibility_end_date >= CURDATE()
GROUP BY v.vacancy_id
HAVING interests IN (17)
This returns only one row, namely the record with vacancy_id 5, while it should obviously also return vacancy_id = 6.
The thing that is the weirdest to me is that if I do the exact same thing but for skills (HAVING skills IN (4)), this does return me multiple rows with the correct result.
2) When I want to filter on the date_from (together with the interests and skills in the HAVING), I do the following:
SELECT v.*, group_concat(distinct(vi.interest_id)) as interests, group_concat(distinct(vs.skill_id)) as skills, vc.date_from
FROM `vacancies` as v
LEFT JOIN `vacancy_interests` as vi on v.vacancy_id = vi.vacancy_id
LEFT JOIN `vacancy_skills` as vs on v.vacancy_id = vs.vacancy_id
LEFT JOIN `vacancy_calendar` as vc on v.vacancy_id = vc.vacancy_id
WHERE v.vacancy_visibility_end_date >= CURDATE() AND date(vc.date_from) > '2016-09-10'
GROUP BY v.vacancy_id
HAVING skills IN (4)
This will only return me vacancy number 5, while obviously also vacancy number 2 has a date greater than 2016-09-10 (2016-09-13 00:00:00)....
What am I doing wrong here?

Your HAVING clauses are the wrong way to check for presence of a condition. Instead of using the concatenated value, just use:
HAVING MAX(vi.interest_id IN (17)) > 0
When you do:
HAVING interests IN (17)
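Then you are comparing a string to a number. The string gets silently converted to a number, and the conversion stops at the first non-numeric character, so only the first element is used. So, if interests starts with "17," then it matches, otherwise it does not. That is why vacancy 6, whose interests are "19,17,22", is not returned.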
Also, note that your method of using distinct in group_concat() is fine, as long as there are not too many interests and skills. If there were 100 of each for a vacancy, then the intermediate result would have 10,000 rows -- and take longer to process. However, with just a handful of each, the method is fine.
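To see both effects in isolation, here is a minimal sketch run against SQLite from Python as a stand-in for MySQL. The table and column names follow the question, the rows are made up to mirror vacancies 4, 5 and 6, and the explicit CAST is only an assumption-level imitation of MySQL's implicit string-to-number conversion (both keep the leading digits).

```python
import sqlite3

# SQLite stands in for MySQL; rows are invented to mirror vacancies 4, 5 and 6.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE vacancies (vacancy_id INTEGER PRIMARY KEY);
    CREATE TABLE vacancy_interests (vacancy_id INTEGER, interest_id INTEGER);
    INSERT INTO vacancies VALUES (4), (5), (6);
    INSERT INTO vacancy_interests VALUES (5, 17), (5, 11), (6, 19), (6, 17), (6, 22);
""")

# Converting a comma-separated string to a number keeps only the leading digits.
prefixes = conn.execute(
    "SELECT CAST('17,11' AS INTEGER), CAST('19,17,22' AS INTEGER)"
).fetchone()
print(prefixes)  # (17, 19)

# Test the individual ids inside the aggregate instead of the concatenated string.
matching = [row[0] for row in conn.execute("""
    SELECT v.vacancy_id
    FROM vacancies AS v
    LEFT JOIN vacancy_interests AS vi ON v.vacancy_id = vi.vacancy_id
    GROUP BY v.vacancy_id
    HAVING MAX(vi.interest_id IN (17)) > 0
    ORDER BY v.vacancy_id
""")]
print(matching)  # [5, 6]
```

Vacancy 4 has no interests, so the MAX is NULL and the row is filtered out, which is usually what you want for this kind of filter.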

Related

How do I divide data from a column in one table by data from a column in another table?

I'm pretty new to MySQL and am working on a query which should divide the data from a column in one table by the data from a column in another table. I've had a good hunt around the site but can't find quite what I'm looking for (or it may be a lack of understanding on my part). I wish to divide the TotalSpend by TotalHires and think I'm pretty close but it's not quite right.
I have 4 rows of data in each column and am looking for 4 calculations to be executed. At the moment 16 results are being returned. I only wish to divide data which relates to the same member id but think that each cell in one column is being divided by each cell in the other column.
How can I alter my code to return only the desired 4 calculations?
Total spend:
SELECT garment_hire_header.member_id, garment_hire_line.hire_id, SUM(garment_hire_line.days*catalogue.daily_rate) AS 'Total Spend'
FROM garment_hire_line
JOIN garment_hire_header ON garment_hire_line.hire_id = garment_hire_header.hire_id
JOIN garment ON garment_hire_line.garment_id = garment.garment_id
JOIN catalogue ON garment.catalogue_id = catalogue.catalogue_id
Group by garment_hire_header.member_id
TotalHires:
SELECT garment_hire_header.member_id, COUNT(garment_hire_header.hire_id) as TotalHires
FROM garment_hire_header
Group by garment_hire_header.member_id
Nearly working final code (returns 16 calculations):
SELECT s.TotalSpend / h.TotalHires
from (SELECT SUM(garment_hire_line.days*catalogue.daily_rate) AS TotalSpend
FROM garment_hire_line
JOIN garment_hire_header ON garment_hire_line.hire_id = garment_hire_header.hire_id
JOIN garment ON garment_hire_line.garment_id = garment.garment_id
JOIN catalogue ON garment.catalogue_id = catalogue.catalogue_id
Group by garment_hire_header.member_id) s CROSS JOIN
(SELECT COUNT(garment_hire_header.hire_id) as TotalHires
FROM garment_hire_header
Group by garment_hire_header.member_id) h
Now that that's working I'm trying to push it a little bit further. I only want to display data if the hire_id is found in another query
I've tried adding the following which has worked in the past on different queries but for some reason is not working in this case:
WHERE garment_hire_line.hire_id IN
(SELECT hire_id, DATE_ADD(date_out, INTERVAL (days) DAY) AS 'Expected Return', DATEDIFF(return_date, DATE_ADD(date_out, INTERVAL (days) DAY)) as 'Days Late'
FROM garment_hire_line
WHERE DATEDIFF(return_date, DATE_ADD(date_out, INTERVAL (days) DAY)) > 0
ORDER BY hire_id)
It is possible to fix this by just changing your cross join to a left join and then adding in the member_id to match up the tables. However, there is a much nicer approach to this situation:
SELECT garment_hire_header.member_id, garment_hire_line.hire_id,
SUM(garment_hire_line.days*catalogue.daily_rate) AS 'Total Spend',
SQ1.TotalHires,
SUM(garment_hire_line.days*catalogue.daily_rate)/SQ1.TotalHires AS `Total`
FROM garment_hire_line
LEFT JOIN garment_hire_header ON garment_hire_line.hire_id =
garment_hire_header.hire_id
LEFT JOIN garment ON garment_hire_line.garment_id = garment.garment_id
LEFT JOIN catalogue ON garment.catalogue_id = catalogue.catalogue_id
LEFT JOIN (SELECT garment_hire_header.member_id,
COUNT(garment_hire_header.hire_id) as TotalHires
FROM garment_hire_header
Group by garment_hire_header.member_id) AS SQ1 ON SQ1.member_id =
garment_hire_header.member_id
Group by garment_hire_header.member_id;
I've left your initial code as it was as much as possible. The key point is that, because you are using the same tables for the calculation, you don't need to place everything into sub-queries. I also changed your joins to left joins. Note that this is not purely cosmetic: a plain JOIN is an inner join and drops rows without a match, whereas a LEFT JOIN keeps them, so the two only return the same result when every row has a match in the joined table.
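For comparison, the first option (keeping the two sub-queries and matching them on member_id instead of cross joining) can be sketched as below. This is only an illustration run against SQLite from Python: the rows are made up, and as a simplification daily_rate is stored directly on garment_hire_line rather than looked up through garment and catalogue.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE garment_hire_header (hire_id INTEGER PRIMARY KEY, member_id INTEGER);
    -- Simplification: daily_rate lives on the line instead of in catalogue.
    CREATE TABLE garment_hire_line (hire_id INTEGER, days INTEGER, daily_rate REAL);
    INSERT INTO garment_hire_header VALUES (1, 10), (2, 10), (3, 20);
    INSERT INTO garment_hire_line VALUES (1, 2, 5.0), (2, 4, 5.0), (3, 3, 10.0);
""")

# Both sub-queries expose member_id so the outer query can match on it.
spend = """
    SELECT hdr.member_id, SUM(ln.days * ln.daily_rate) AS TotalSpend
    FROM garment_hire_line AS ln
    JOIN garment_hire_header AS hdr ON ln.hire_id = hdr.hire_id
    GROUP BY hdr.member_id
"""
hires = """
    SELECT member_id, COUNT(hire_id) AS TotalHires
    FROM garment_hire_header
    GROUP BY member_id
"""

# CROSS JOIN pairs every spend row with every hires row: 2 members give 4 rows.
cross_rows = conn.execute(
    f"SELECT s.TotalSpend / h.TotalHires FROM ({spend}) AS s CROSS JOIN ({hires}) AS h"
).fetchall()

# Joining on member_id gives one calculation per member.
matched_rows = conn.execute(f"""
    SELECT s.member_id, s.TotalSpend / h.TotalHires
    FROM ({spend}) AS s
    JOIN ({hires}) AS h ON h.member_id = s.member_id
    ORDER BY s.member_id
""").fetchall()

print(len(cross_rows))  # 4
print(matched_rows)     # [(10, 15.0), (20, 30.0)]
```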

Getting Incorrect SUM for left joins and GROUP BY

I am getting wrong results in the sum of total deposits.
I want to output a report of total deposits per campaign_name
and eventually inside a date range.
SELECT IFNULL(campaign_name,'DIRECT'),
IFNULL(TotalDeposit,0)
FROM trackings
LEFT JOIN
(SELECT deposit_amount,
sum(deposit_amount) AS TotalDeposit,
uuid
FROM conversions
LEFT JOIN transactions ON conversions.trader_id = transactions.trader_id
WHERE aff_id =3
AND TYPE='deposit'
GROUP BY transactions.trader_id) AS conversions ON trackings.uuid = conversions.uuid
WHERE aff_id=3
GROUP BY campaign_name
results: missing 200 from trynow campaign??
campaign_name,TotalDeposit
DIRECT,0.00
new_campaign_name,0.00
test march,500.00
testing,0.00
trynow,800.00
expected results:
campaign_name,TotalDeposit
DIRECT,0.00
new_campaign_name,0.00
test march,500.00
testing,0.00
trynow,1000.00
I think your data isn't quite right - using the data that you've supplied, the deposit of 500 for test march is never going to be returned, as it is linked to trader_id 7506, who has no records in the conversions table.
However, the following query is simpler and easier to understand, and correctly returns 1000 for trynow:
SELECT
IFNULL(SUM(t.deposit_amount),0) AS total_deposits
, IFNULL(tr.campaign_name,'DIRECT') AS campaign
FROM
trackings tr LEFT JOIN
conversions c ON
tr.uuid = c.uuid LEFT JOIN
transactions t ON
c.trader_id = t.trader_id AND
tr.`aff_id` = t.aff_id AND
t.type = 'Deposit'
WHERE
tr.aff_id = 3 AND
tr.updated_at >= '2015-03-01' AND tr.updated_at < '2015-04-01'
GROUP BY
IFNULL(tr.campaign_name,'DIRECT')
If you can check the test data supplied or otherwise point me in the right direction, I might be able to improve the query to return exactly what you want.
For date filtering, see the addition to the WHERE clause above. Note that if you need to filter on a date in the transactions table, the date condition must be part of the ON clause instead: this table is left-joined, so filtering it in the main WHERE clause would throw away the campaigns that have no matching transactions.
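The difference between filtering in ON and in WHERE is easy to check on a small example. The sketch below runs against SQLite from Python with made-up rows; it drops the conversions table and hangs deposits directly off uuid, and created_at is a hypothetical column name, since the question does not show the date column of transactions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE trackings (uuid TEXT, campaign_name TEXT);
    -- Simplified schema; created_at is a hypothetical column name.
    CREATE TABLE transactions (uuid TEXT, deposit_amount REAL, created_at TEXT);
    INSERT INTO trackings VALUES ('a', 'trynow'), ('b', 'testing');
    INSERT INTO transactions VALUES
        ('a', 800.0, '2015-03-05'), ('a', 200.0, '2015-03-20'), ('b', 50.0, '2015-02-01');
""")

# Date filter in ON: campaigns without deposits in March are kept with a total of 0.
in_on = conn.execute("""
    SELECT tr.campaign_name, IFNULL(SUM(t.deposit_amount), 0)
    FROM trackings AS tr
    LEFT JOIN transactions AS t
        ON t.uuid = tr.uuid
       AND t.created_at >= '2015-03-01' AND t.created_at < '2015-04-01'
    GROUP BY tr.campaign_name
    ORDER BY tr.campaign_name
""").fetchall()

# Date filter in WHERE: rows with no matching transaction are removed entirely.
in_where = conn.execute("""
    SELECT tr.campaign_name, IFNULL(SUM(t.deposit_amount), 0)
    FROM trackings AS tr
    LEFT JOIN transactions AS t ON t.uuid = tr.uuid
    WHERE t.created_at >= '2015-03-01' AND t.created_at < '2015-04-01'
    GROUP BY tr.campaign_name
    ORDER BY tr.campaign_name
""").fetchall()

print(in_on)     # [('testing', 0), ('trynow', 1000.0)]
print(in_where)  # [('trynow', 1000.0)]
```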

Return the minimum of single field grouping in wider query

Here is a query:
select
e.eid,
e.event_time as contact_created,
cs.name as initial_class,
cn.create_time as lead_date
from bm_sets.event378 e
inner join bm_config.classes cs on cs.id = e.class_id and cs.cid=378 # and cs.name = 'Candidate'
left join bm_sets.conversion378 cn on cn.eid = e.eid and cn.create_time > e.event_time
where e.eid = 283818
group by eid, contact_created, initial_class, lead_date
The results of this query look like this:
eid, contact_created, initial_class, lead_date
283818 2015-03-07 09:43:42 Hot
283818 2015-03-10 22:19:47 Candidate
283818 2015-03-10 22:22:11 Candidate
I need to adjust this query so that only the first record is returned, the one with the min contact_created date. But since I'm using an aggregate function with other fields, I'm grouping by initial_class too so min is the min based on the combined groupings.
Our server seems to struggle whenever I use a subquery. So I tried using another join as a filter, something like:
inner join bm_sets.event378 e1 on e1.eid = e.eid and e1.event_time < e.event_time
But I know before running it that this won't work since the eid (user id) 283818 will still be returned and thus all associated data.
How can I restrict the results to only those records that correspond to the minimum of event_time?
I am using the where condition 283818 (my own user id for debugging) only as a sanity check as I construct this query. The query, when ready, will not have this condition and the results will thus be for many users.
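If you only need the first row, you can simply limit the result to one row. TOP 1 is SQL Server syntax; MySQL has no TOP keyword, so there you would add this to the end of the query instead:
order by contact_created limit 1
Or you can write a function that returns the minimum contact_created and then compare it with the current contact_created.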
I hope that will help you
Thanks
OK I got it. I used a null left join like so (on the back of other SO posts on the topic of groupwise min/max):
added this to selector:
e1.event_time
added this to joins:
left join bm_sets.event378 e1 on e1.eid = e.eid and e1.event_time < e.event_time
added this to where clause:
and e1.event_time is null
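Put together, the null left join looks roughly like the sketch below. It runs against SQLite from Python; the event table is a cut-down stand-in for bm_sets.event378 with the class name inlined, and the rows for eid 1 are made up to show that it works for more than one user.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Stand-in for bm_sets.event378 with only the columns the technique needs.
    CREATE TABLE event (eid INTEGER, event_time TEXT, class_name TEXT);
    INSERT INTO event VALUES
        (283818, '2015-03-07 09:43:42', 'Hot'),
        (283818, '2015-03-10 22:19:47', 'Candidate'),
        (283818, '2015-03-10 22:22:11', 'Candidate'),
        (1, '2015-01-02 00:00:00', 'Candidate'),
        (1, '2015-01-01 00:00:00', 'Cold');
""")

# A row survives only if no earlier event exists for the same eid,
# i.e. the self-join found nothing and e1.event_time is NULL.
earliest = conn.execute("""
    SELECT e.eid, e.event_time, e.class_name
    FROM event AS e
    LEFT JOIN event AS e1
        ON e1.eid = e.eid AND e1.event_time < e.event_time
    WHERE e1.event_time IS NULL
    ORDER BY e.eid
""").fetchall()
print(earliest)
# [(1, '2015-01-01 00:00:00', 'Cold'), (283818, '2015-03-07 09:43:42', 'Hot')]
```

If two events of the same user share the exact minimum event_time, both rows are returned, so a tie-breaker on a unique column would be needed in that case.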

MySQL - Trying to show results for rows that have 0 records...across 3 columns

There's a lot of Q&A out there for how to make MySQL show results for rows that have 0 records, but they all involve 1-2 tables/fields at most.
I'm trying to achieve the same ends, but across 3 fields, and I just can't seem to get it.
Here's what I've hacked together:
SELECT circuit.circuit_name, county.county_name, result.adr_result, count( result.adr_result ) AS num_results
FROM
(
SELECT cases.case_id, cases.county_id, cases.result_id
FROM cases
WHERE cases.status_id <> "2"
) q1
RIGHT JOIN county ON q1.county_id = county.county_id
RIGHT JOIN circuit ON county.circuit_id = circuit.circuit_id
RIGHT JOIN result ON q1.result_id = result.result_id
GROUP BY adr_result, circuit_name, county_name
ORDER BY circuit_name, county_name, adr_result
What I need to see is a list of ALL circuits in the first column, a list of ALL counties per circuit in the second column, a list of ALL possible adr_result entries for each county (they're the same for every county) in the third column, and then the respective count for the circuit/county/result combination-- even if it is 0. I've tried every combination of left, right and inner join (I know inner is definitely not the solution, but I'm frustrated) and just can't see where I'm going wrong.
Any help would be appreciated!
Here is a start. I can't follow your problem statement completely. For instance, what is the purpose of the cases table? Nonetheless, when you say "ALL" records for each of those tables, I interpret it as a Cartesian product of every county (with its circuit) and every result, which is implemented through the derived table in the FROM clause (notice the lack of an ON condition on the join to result). The cases are then left-joined onto that, so combinations without any cases still show up with a count of 0.
SELECT everthingjoin.circuit_name
, everthingjoin.county_name
, everthingjoin.adr_result
, COUNT(cases.case_id) AS num_results
FROM
(SELECT circuit.circuit_name, county.county_id, county.county_name, result.result_id, result.adr_result
FROM circuit
JOIN county ON county.circuit_id = circuit.circuit_id
CROSS JOIN result) AS everthingjoin
LEFT JOIN cases
ON cases.status_id <> "2"
AND cases.county_id = everthingjoin.county_id
AND cases.result_id = everthingjoin.result_id
GROUP BY everthingjoin.adr_result, everthingjoin.circuit_name, everthingjoin.county_name
ORDER BY circuit_name, county_name, adr_result
try this, see if it provides some ideas:
SELECT
circuit.circuit_name
, county.county_name
, result.adr_result
, COUNT(result.result_id) AS num_results
, COUNT(DISTINCT result.adr_result) AS num_distinct_results
FROM cases
LEFT JOIN county
ON cases.county_id = county.county_id
LEFT JOIN circuit
ON county.circuit_id = circuit.circuit_id
LEFT JOIN result
ON cases.result_id = result.result_id
WHERE cases.status_id <> "2"
GROUP BY
circuit.circuit_name
, county.county_name
, result.adr_result
ORDER BY
circuit_name, county_name, adr_result
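One thing to keep in mind with any query that starts FROM cases: it can only produce combinations that have at least one case, so a count of 0 never appears. A minimal sketch of the difference, run against SQLite from Python with made-up circuits, counties, results and cases (status_id is stored as a number here, which is an assumption):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE circuit (circuit_id INTEGER, circuit_name TEXT);
    CREATE TABLE county (county_id INTEGER, circuit_id INTEGER, county_name TEXT);
    CREATE TABLE result (result_id INTEGER, adr_result TEXT);
    CREATE TABLE cases (case_id INTEGER, county_id INTEGER, result_id INTEGER, status_id INTEGER);
    INSERT INTO circuit VALUES (1, 'First');
    INSERT INTO county VALUES (1, 1, 'Adams'), (2, 1, 'Baker');
    INSERT INTO result VALUES (1, 'Settled'), (2, 'Impasse');
    INSERT INTO cases VALUES (1, 1, 1, 1), (2, 1, 1, 1), (3, 1, 2, 2);
""")

# Starting from cases: only combinations that actually occur are listed.
from_cases = conn.execute("""
    SELECT ci.circuit_name, co.county_name, r.adr_result, COUNT(*) AS num_results
    FROM cases AS ca
    LEFT JOIN county AS co ON ca.county_id = co.county_id
    LEFT JOIN circuit AS ci ON co.circuit_id = ci.circuit_id
    LEFT JOIN result AS r ON ca.result_id = r.result_id
    WHERE ca.status_id <> 2
    GROUP BY ci.circuit_name, co.county_name, r.adr_result
""").fetchall()

# Starting from every county/result combination and left-joining the cases:
# COUNT(ca.case_id) skips the NULLs of unmatched rows, which yields the zeros.
scaffold = conn.execute("""
    SELECT ci.circuit_name, co.county_name, r.adr_result, COUNT(ca.case_id) AS num_results
    FROM circuit AS ci
    JOIN county AS co ON co.circuit_id = ci.circuit_id
    CROSS JOIN result AS r
    LEFT JOIN cases AS ca
        ON ca.county_id = co.county_id
       AND ca.result_id = r.result_id
       AND ca.status_id <> 2
    GROUP BY ci.circuit_name, co.county_name, r.adr_result
    ORDER BY ci.circuit_name, co.county_name, r.adr_result
""").fetchall()

print(from_cases)  # [('First', 'Adams', 'Settled', 2)]
for row in scaffold:
    print(row)
```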

How can I summarise several columns in MySQL

I have a table with a couple of thousand rows: from this I need to extract a total for column WTE for each value in column band, including those where the total is 0. I also need each total to be in a column of its own, so that I can easily update a summary table.
The code I have at present returns the values from the relevant rows:
SELECT
IF(band="E",WTE,0) AS `Band6_WTE`
FROM `orthoptists` AS o
LEFT JOIN `instances` AS i
ON o.instance_FK = i.id
WHERE i.region = 14
But when I add SUM(), the return is incorrect (zero, when it should be several thousands):
SELECT
IF(band="E",SUM(WTE),0) AS `Band6_WTE`
FROM `orthoptists` AS o
LEFT JOIN `instances` AS i
ON o.instance_FK = i.id
WHERE i.region = 14
I have looked at http://en.wikibooks.org/wiki/MySQL/Pivot_table, but I do not understand how that approach should be applied to my problem.
What should I do?
You must put the IF inside the SUM, so that the condition is evaluated per row before the values are added up:
SELECT
SUM(IF(band="E",WTE,0)) AS `Band6_WTE`
FROM `orthoptists` AS o
LEFT JOIN `instances` AS i
ON o.instance_FK = i.id
WHERE i.region = 14
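The same pattern extends to one column per band in a single pass over the table. A minimal sketch, run against SQLite from Python as a stand-in for MySQL: CASE WHEN is used in place of MySQL's IF(), the join to instances is left out, the rows are made up, and the Band7/Band8 mapping to "F" and "G" is only assumed from the column names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orthoptists (band TEXT, WTE REAL);
    INSERT INTO orthoptists VALUES ('E', 1.0), ('E', 0.5), ('F', 0.75), ('E', 1.0);
""")

# Each SUM only adds the WTE of rows in its own band; other rows contribute 0,
# so a band without any rows ends up with a total of 0 rather than NULL.
row = conn.execute("""
    SELECT
        SUM(CASE WHEN band = 'E' THEN WTE ELSE 0 END) AS Band6_WTE,
        SUM(CASE WHEN band = 'F' THEN WTE ELSE 0 END) AS Band7_WTE,
        SUM(CASE WHEN band = 'G' THEN WTE ELSE 0 END) AS Band8_WTE
    FROM orthoptists
""").fetchone()
print(row)  # (2.5, 0.75, 0)
```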
In this particular case, wouldn't it be easier to use WHERE?
SELECT SUM(WTE) AS `Band6_WTE`
FROM `orthoptists` AS o
LEFT JOIN `instances` AS i
ON o.instance_FK = i.id
WHERE i.region = 14
AND band = "E"
For the general case, you could use GROUP BY since you say you need the result for each band (each value in column band):
SELECT band, SUM(WTE) AS `WTE`
FROM `orthoptists` AS o
LEFT JOIN `instances` AS i
ON o.instance_FK = i.id
WHERE i.region = 14
GROUP BY band
SUM returns NULL when a band has no non-NULL WTE values; you can use IFNULL to convert that NULL to 0 if you like:
IFNULL(SUM(WTE), 0)
Edit: as you pointed out in the comments, you'd like multiple columns for the different bands rather than multiple rows. Generally speaking, you should not do that from SQL (use the second query and perform a pivot operation from your code), but there are exceptions, cases where it's significantly more complicated to do that outside of SQL, so here's how you could do it:
SELECT
(SELECT SUM(WTE)
FROM `orthoptists` AS o
LEFT JOIN `instances` AS i
ON o.instance_FK = i.id
WHERE i.region = 14
AND band = "E") AS `Band6_WTE`,
(SELECT SUM(WTE)
FROM `orthoptists` AS o
LEFT JOIN `instances` AS i
ON o.instance_FK = i.id
WHERE i.region = 14
AND band = "F") AS `Band7_WTE`,
(SELECT SUM(WTE)
FROM `orthoptists` AS o
LEFT JOIN `instances` AS i
ON o.instance_FK = i.id
WHERE i.region = 14
AND band = "G") AS `Band8_WTE`
The precise syntax might need a little bit of tweaking (some databases require each SELECT to include a FROM clause, some may require a name for each column in a subselect, I don't think MySQL does but I can't check right now), but the principle should be applicable regardless.
Nobody has explained why your original query gives you the incorrect result.
The original query has an aggregation function in the SELECT clause. This tells MySQL that this is an aggregation. There is no GROUP BY clause, so it returns one row, treating all rows as a single group.
Now, what happens to band in this case? In the SQL standard and in most other dialects of SQL, the original query would return an error, saying something like "band is not aggregated".
MySQL has a (mis)feature called Hidden Columns, which allows this syntax. It takes the value of band from an arbitrary row for the comparison. That value might be "E", in which case the sum of all WTE is returned. Or it might be another value, in which case 0 is returned. (Since MySQL 5.7.5 the default sql_mode includes ONLY_FULL_GROUP_BY, which rejects such queries.)
In general, you should avoid using Hidden Columns. Any "bare" column in your SELECT statement should also be in the GROUP BY clause, when you have an aggregation query.
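SQLite happens to tolerate bare columns in aggregate queries too, so the effect can be reproduced from Python without a MySQL server. This is only a sketch with made-up rows, and which row supplies band is deliberately left unasserted, because it is not defined:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orthoptists (band TEXT, WTE REAL);
    INSERT INTO orthoptists VALUES ('F', 0.5), ('E', 1.0), ('E', 2.0);
""")

# One aggregate and no GROUP BY: all rows collapse into a single group,
# and the bare column band is taken from some arbitrary input row.
rows = conn.execute("SELECT band, SUM(WTE) FROM orthoptists").fetchall()
print(rows)  # a single row; the total is 3.5, the band is whatever row was picked

# Listing band in GROUP BY removes the ambiguity.
grouped = conn.execute(
    "SELECT band, SUM(WTE) FROM orthoptists GROUP BY band ORDER BY band"
).fetchall()
print(grouped)  # [('E', 3.0), ('F', 0.5)]
```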