MySQL COUNT with GROUP BY and LEFT JOIN not working as expected

Below is a simple query for a click tracker. I've had a look at a lot of other posts and I'm scratching my head. I cannot get this query to work so that all rows from the calendar table (one calendar day per row) are displayed:
SELECT DATE_FORMAT(calendar_date, '%a %D') AS calendar_date,
count( tracker_id ) as clicks
FROM calendar
LEFT JOIN offer_tracker USING(calendar_id)
WHERE
calendar_month = Month(CURDATE()) AND
calendar_year = Year(CURDATE()) AND ( offer_id = 4 OR offer_id IS NULL )
GROUP BY calendar_date;
It's nearly there, but not all rows in the calendar table are returned; for example, there is no Fri 2nd, Tue 6th, Wed 7th, etc.
Does anyone have any ideas on where I'm going wrong? Should I be using a subquery?

I guess offer_id comes from the offer_tracker table. When you use a LEFT (outer) JOIN and then test a column from the right-hand table in the WHERE clause (like your offer_id = 4), the outer join is effectively cancelled and you get the same results as with an inner join.
The attempt to lift the cancellation with (offer_id = 4 OR offer_id IS NULL) does not work as you expect. A row from offer_tracker with offer_id <> 4 still matches in the LEFT JOIN, but it is then removed by the WHERE condition. So a day such as Friday 2nd disappears from the results whenever its only matching rows have an offer_id other than 4.
Move the offer_id = 4 check to the LEFT JOIN, instead:
SELECT DATE_FORMAT(calendar_date, '%a %D') AS calendar_date
, count( tracker_id ) as clicks
FROM calendar
LEFT JOIN offer_tracker
ON offer_tracker.calendar_id = calendar.calendar_id
AND offer_id = 4
WHERE calendar_month = Month(CURDATE())
AND calendar_year = Year(CURDATE())
GROUP BY calendar_date
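To see the difference on a small scale, here is a minimal sketch that runs both variants against an in-memory SQLite database from Python. The three calendar days and the click rows are made up for illustration, and SQLite stands in for MySQL, so the DATE_FORMAT and CURDATE() parts of the original query are left out.

```python
import sqlite3

# Hypothetical miniature of the question's tables; the rows are invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE calendar (calendar_id INTEGER PRIMARY KEY, calendar_date TEXT);
    CREATE TABLE offer_tracker (tracker_id INTEGER PRIMARY KEY,
                                calendar_id INTEGER, offer_id INTEGER);
    INSERT INTO calendar VALUES (1, '2012-03-01'), (2, '2012-03-02'), (3, '2012-03-03');
    -- Day 1: two clicks on offer 4. Day 2: one click on offer 7 only. Day 3: no clicks.
    INSERT INTO offer_tracker VALUES (1, 1, 4), (2, 1, 4), (3, 2, 7);
""")

# Filter in WHERE: day 2 joins to the offer 7 row, which WHERE then discards.
where_rows = conn.execute("""
    SELECT calendar_date, COUNT(tracker_id)
    FROM calendar
    LEFT JOIN offer_tracker USING (calendar_id)
    WHERE offer_id = 4 OR offer_id IS NULL
    GROUP BY calendar_date
    ORDER BY calendar_date
""").fetchall()

# Filter in ON: the offer 7 row never matches, so day 2 survives with a count of 0.
on_rows = conn.execute("""
    SELECT calendar_date, COUNT(tracker_id)
    FROM calendar
    LEFT JOIN offer_tracker
           ON offer_tracker.calendar_id = calendar.calendar_id
          AND offer_tracker.offer_id = 4
    GROUP BY calendar_date
    ORDER BY calendar_date
""").fetchall()

print(where_rows)
print(on_rows)
```

The WHERE version drops the day whose only clicks belong to another offer, while the ON version keeps every calendar day.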

Related

Limiting SQL LIKE to same wildcard in AND parameter

I'm trying to create a SQL query that retrieves records based on the start and end dates. However, the system doesn't have a very nice Database schema, so I'm looking for a way to create a query that works with it.
Basically, there's a meta table that stores the events' dates using the "date_NUMBER_(start|end)" format:
event_id  meta_key      meta_value
--------  ------------  ----------
1         date_0_start  2022-04-02
1         date_0_end    2022-04-03
-         -             -
2         date_0_start  2022-03-21
2         date_0_end    2022-03-22
2         date_1_start  2022-06-24
2         date_1_end    2022-06-30
So, Event 1 has one date span: 2022-04-02 to 2022-04-03. While Event 2 has two: 2022-03-21 to 2022-03-22 and 2022-06-24 to 2022-06-30.
After 2022-03-22, Event 2 is not active, starting again only on 2022-06-24.
To create a filter that works with this schema, I came up with the following SQL query. For example, to search for events happening in the 2022-04-01 to 2022-04-03 range (that is, any event with at least one event day inside the range):
SELECT events.id
FROM events
INNER JOIN events_meta ON ( events.id = events_meta.event_id )
INNER JOIN events_meta AS evt1 ON ( events.id = evt1.event_id )
WHERE
(
(
events_meta.meta_key LIKE 'date_%_start' AND
CAST(events_meta.meta_value AS DATE) >= '2022-04-01'
)
AND
(
evt1.meta_key LIKE 'date_%_end' AND
CAST(evt1.meta_value AS DATE) <= '2022-04-03'
)
)
The problem is that this way I should get only Event 1, but Event 2 is also returned, since its date_1_start is >= '2022-04-01' and its date_0_end is <= '2022-04-03'.
I can't find a way to match the NUMBER in the "date_NUMBER_(start|end)" meta_key format, so that the query doesn't compare different NUMBERs.
Any help is appreciated :)
I made a fiddle with the INNER JOIN query:
http://sqlfiddle.com/#!9/fb497e/2
Use a self-join to get different keys for the same event ID. See MYSQL Select from tables based on multiple rows
SELECT m1.event_id, m1.meta_value AS start, m2.meta_value AS end
FROM events_meta AS m1
JOIN events_meta AS m2
ON m1.event_id = m2.event_id
AND SUBSTRING_INDEX(m1.meta_key, '_', 2) = SUBSTRING_INDEX(m2.meta_key, '_', 2)
WHERE m1.meta_key LIKE 'date_%_start' AND m2.meta_key LIKE 'date_%_end'
AND CAST(m1.meta_value AS DATE) >= '2022-04-01'
AND CAST(m2.meta_value AS DATE) <= '2022-04-03'
The SUBSTRING_INDEX() calls will return the prefixes date_0, date_1, etc. Including this in the ON condition will pair up the corresponding start and end keys.
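As a rough check of the pairing, the sketch below runs the same join in an in-memory SQLite database using the sample rows from the question. SQLite has no SUBSTRING_INDEX, so a small Python stand-in (hypothetical, positive counts only) is registered under that name. The CAST(... AS DATE) calls are also dropped, because SQLite has no DATE type and ISO dates compare correctly as text.

```python
import sqlite3

def substring_index(text, delim, count):
    # Stand-in for MySQL's SUBSTRING_INDEX, handling positive counts only:
    # keep everything before the count-th occurrence of the delimiter.
    return delim.join(text.split(delim)[:count])

conn = sqlite3.connect(":memory:")
conn.create_function("SUBSTRING_INDEX", 3, substring_index)
conn.executescript("""
    CREATE TABLE events_meta (event_id INTEGER, meta_key TEXT, meta_value TEXT);
    INSERT INTO events_meta VALUES
        (1, 'date_0_start', '2022-04-02'), (1, 'date_0_end', '2022-04-03'),
        (2, 'date_0_start', '2022-03-21'), (2, 'date_0_end', '2022-03-22'),
        (2, 'date_1_start', '2022-06-24'), (2, 'date_1_end', '2022-06-30');
""")

# Pair each start key with the end key that shares its date_N prefix.
paired = conn.execute("""
    SELECT m1.event_id, m1.meta_value AS start_date, m2.meta_value AS end_date
    FROM events_meta AS m1
    JOIN events_meta AS m2
      ON m1.event_id = m2.event_id
     AND SUBSTRING_INDEX(m1.meta_key, '_', 2) = SUBSTRING_INDEX(m2.meta_key, '_', 2)
    WHERE m1.meta_key LIKE 'date_%_start' AND m2.meta_key LIKE 'date_%_end'
      AND m1.meta_value >= ? AND m2.meta_value <= ?
    ORDER BY m1.event_id
""", ("2022-04-01", "2022-04-03")).fetchall()

print(paired)
```

Event 2 no longer slips through, because its date_1_start row can only pair with date_1_end, which falls outside the range.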

Comparing Dates in a MYSQL Subquery

I have two tables
class
-------------
id name
-------------
1 Knives
2 Pastries
class_date
-------------
get_id start_date
-------------
1 2017-10-09
1 2017-11-15
1 2017-12-03
2 2017-10-30
The class 'Knives' is a series with multiple dates. The class 'Pastries' is only offered on one date.
I want my result to be based on Oct 10, 2017 (or current date). In my search I only want results based on the first date - in this case the date of Oct 9, 2017 for 'Knives' should disqualify it from showing up in the results. 'Pastries' should show up.
I am not sure if I should do a LEFT OUTER JOIN or a Subquery. I've tried both but neither works - but I'm probably not doing it correctly.
This is what I tried:
SELECT *
FROM class, class_date WHERE
class_date.get_id = class.id &&
(SELECT DATE(start_date)
FROM class, class_date WHERE
class_date.get_id = classes.id
ORDER BY class_date.start_date ASC
LIMIT 1
) > CURDATE()
ORDER BY class_date.start_date ASC
and
SELECT *
FROM class
LEFT OUTER JOIN
class_date ON
class_date.get_id = classes.id
WHERE
class_date.start_date > CURDATE()
GROUP BY classes.class_id
ORDER BY class_dates.start_date ASC
I have a feeling that the subquery is the way to go but I get no results. If I use < instead of > I get too many results. Any help would be appreciated.
Here is one method to get the most recent record as of a particular date. This allows you to get all the rows (and you can join in class to get rows there):
select cd.*
from class_date cd
where cd.start_date = (select max(cd2.start_date)
from class_date cd2
where cd2.get_id = cd.get_id and
cd2.start_date <= '2017-10-09'
);
If you just want the maximum date for a given class:
select cd.get_id, max(cd.start_date)
from class_date cd
where cd.start_date <= '2017-10-09'
group by cd.get_id;
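The question itself describes a slightly different condition: keep a class only when its first date is still in the future. This is not what the queries above do. One way to express that condition is with MIN() and HAVING, sketched here against the question's sample data in an in-memory SQLite database. The fixed reference date 2017-10-10 is an assumption that stands in for CURDATE().

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE class (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE class_date (get_id INTEGER, start_date TEXT);
    INSERT INTO class VALUES (1, 'Knives'), (2, 'Pastries');
    INSERT INTO class_date VALUES
        (1, '2017-10-09'), (1, '2017-11-15'), (1, '2017-12-03'), (2, '2017-10-30');
""")

today = "2017-10-10"  # assumed reference date, used instead of CURDATE()

# Keep a class only if its earliest start_date is after the reference date.
upcoming = conn.execute("""
    SELECT c.name, MIN(cd.start_date) AS first_date
    FROM class c
    JOIN class_date cd ON cd.get_id = c.id
    GROUP BY c.id, c.name
    HAVING MIN(cd.start_date) > ?
    ORDER BY first_date
""", (today,)).fetchall()

print(upcoming)
```

Knives is excluded because its series already started on 2017-10-09, even though it has later dates.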

Join to table according to date

I have two tables, one is a list of firms, the other is a list of jobs the firms have advertised with deadlines for application and start dates.
Some of the firms will have advertised no jobs, some will only have jobs that are past their deadline dates, some will only have live jobs and others will have past and live applications.
What I want to be able to show as a result of a query is a list of all the firms, with the nearest deadline they have, sorted by that deadline. So the result might look something like this (if today was 2015-01-01).
Sorry, I misstated that. What I want to be able to do is find the next future deadline, and if there is no future deadline then show the last past deadline. So in the first table below the BillCo deadline has passed, but the next BuffyCo deadline is shown. In the BillCo case there are only earlier deadlines, but in the BuffyCo case there are both earlier and later deadlines.
id name title date
== ==== ===== ====
1 BobCo null null
2 BillCo Designer 2014-12-01
3 BuffyCo Admin 2015-01-31
So, BobCo has no jobs listed at all, BillCo has a deadline that has passed and BuffyCo has a deadline in the future.
The problematic part is that BillCo may have a set of jobs like this:
id title date desired hit
== ===== ==== ===========
1 Coder 2013-12-01
2 Manager 2014-06-30
3 Designer 2014-12-01 <--
And BuffyCo might have:
id title date desired hit
== ===== ==== ===========
1 Magician 2013-10-01
2 Teaboy 2014-05-19
3 Admin 2015-01-31 <--
4 Writer 2015-02-28
So, I can do something like:
select * from (
select * from firms
left join jobs on firms.id = jobs.firmid
order by date desc)
as t1 group by firmid;
Or, limit the jobs joined or returned by a date criterion, but I don't seem to be able to get the records I want returned. ie the above query would return:
id name title date
== ==== ===== ====
1 BobCo null null
2 BillCo Designer 2014-12-01
3 BuffyCo Writer 2015-02-28
For BuffyCo it's returning the Writer job rather than the Admin job.
Is it impossible with an SQL query? Any advice appreciated, thanks in advance.
I think this may be what you need. You need to:
1) Calculate the delta between each job's date and the current date, and find the minimum delta for each firm.
2) Join firms to jobs only where the firm IDs match and where the firm's minimum delta equals the delta for the row in jobs.
SELECT f.id, f.name, j.title, j.date
FROM firms f LEFT JOIN
(SELECT firmid, MIN(ABS(DATEDIFF(date, CURDATE()))) AS delta
FROM jobs
GROUP BY firmid) d
ON f.id = d.firmid
LEFT JOIN jobs j ON f.id = j.firmid AND d.delta = ABS(DATEDIFF(j.date, CURDATE()));
You want to make an outer join with something akin to the group-wise maximum of (next upcoming, last expired):
SELECT * FROM firms LEFT JOIN (
-- fetch the "groupwise" record
jobs NATURAL JOIN (
-- using the relevant date for each firm
SELECT firmid, MAX(closest_date) date
FROM (
-- next upcoming deadline
SELECT firmid, MIN(date) closest_date
FROM jobs
WHERE date >= CURRENT_DATE
GROUP BY firmid
UNION ALL
-- most recent expired deadline
SELECT firmid, MAX(date)
FROM jobs
WHERE date < CURRENT_DATE
GROUP BY firmid
) closest_dates
GROUP BY firmid
) selected_dates
) ON jobs.firmid = firms.id
This will actually give you all jobs that have the best deadline date for each firm. If you want to restrict the results to an indeterminate record from each such group, you can add GROUP BY firms.id to the very end.
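For a quick check of the "next upcoming, else last expired" logic, here is a sketch against the question's sample firms and jobs in an in-memory SQLite database. It uses the same UNION ALL idea but replaces the NATURAL JOIN with explicit joins, and it passes "today" in as a parameter (2015-01-01, as in the question) instead of using CURRENT_DATE. The job ids and the results ordering by firm id are choices made for the sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE firms (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE jobs (id INTEGER PRIMARY KEY, firmid INTEGER, title TEXT, date TEXT);
    INSERT INTO firms VALUES (1, 'BobCo'), (2, 'BillCo'), (3, 'BuffyCo');
    INSERT INTO jobs (firmid, title, date) VALUES
        (2, 'Coder', '2013-12-01'), (2, 'Manager', '2014-06-30'),
        (2, 'Designer', '2014-12-01'),
        (3, 'Magician', '2013-10-01'), (3, 'Teaboy', '2014-05-19'),
        (3, 'Admin', '2015-01-31'), (3, 'Writer', '2015-02-28');
""")

# Any upcoming date is later than any expired one, so MAX() over the two
# candidates picks the next upcoming deadline when there is one.
closest = conn.execute("""
    SELECT f.name, j.title, j.date
    FROM firms f
    LEFT JOIN (
        SELECT firmid, MAX(closest_date) AS date
        FROM (
            SELECT firmid, MIN(date) AS closest_date
            FROM jobs WHERE date >= :today GROUP BY firmid
            UNION ALL
            SELECT firmid, MAX(date)
            FROM jobs WHERE date < :today GROUP BY firmid
        ) c
        GROUP BY firmid
    ) pick ON pick.firmid = f.id
    LEFT JOIN jobs j ON j.firmid = f.id AND j.date = pick.date
    ORDER BY f.id
""", {"today": "2015-01-01"}).fetchall()

print(closest)
```

BuffyCo resolves to the Admin job rather than Writer, and BobCo still appears with nulls because both joins are outer joins.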
The revision to your question makes it rather trickier, but it can still be done. Try this:
select
closest_job.*, firms.name
from
firms
left join (
select future_job.*
from
(
select firmid, min(date) as mindate
from jobs
where date >= curdate()
group by firmid
) future
inner join jobs future_job
on future_job.firmid = future.firmid and future_job.date = future.mindate
union all
select past_job.*
from
(
select firmid, max(date) as maxdate
from jobs
group by firmid
having max(date) < curdate()
) past
inner join jobs past_job
on past_job.firmid = past.firmid and past_job.date = past.maxdate
) closest_job
on firms.id = closest_job.firmid
I think this does what I need:
select * from (
select firms.name, t2.closest_date from firms
left join
(
select * from (
-- get first date in the future
SELECT firmid, MIN(date) closest_date
FROM jobs
WHERE date >= CURRENT_DATE
GROUP BY firmid
UNION ALL
-- most recent expired deadline
SELECT firmid, MAX(date)
FROM jobs
WHERE date < CURRENT_DATE
GROUP BY firmid) as t1
-- order so latest date is first
order by closest_date desc) as t2
on firms.id = t2.firmid
-- group by eliminates all but latest date
group by firms.id) as t3
order by closest_date asc;
Thanks for all the help on this

Count by month with only two date fields - IN and OUT

I haven't been able to find an answer to this specific issue. I need a total count of inventory grouped by month for different products. The source data has two date fields, one for IN and one for OUT. The total count for a specific month should include all rows with an IN date in or before that month, as long as the OUT date is null or falls in or after that month.
Obviously I can get a count for any given month by writing a query for count(distinct productID) with a WHERE clause stating that the IN date is no later than the end of the month I'm interested in (e.g. September 2012) AND the OUT date is null or on or after 9/1/2012:
Where ((in_date <= '2012-09-30') AND (out_date >= '2012-09-01' or out_date is null))
If the product was part of inventory for even one day in September, I want it to count, which is why the out date is compared against 9/1/12. Sample data below. Instead of querying for a specific month, how can I turn this:
Raw Data - Each Row Is Individual Item
InDate OutDate ProductAttr ProductID
2008-04-05 NULL Blue 101
2008-06-04 NULL Red 125
2008-01-01 2012-06-01 Blue 134
2008-12-10 2012-10-09 Red 129
2009-10-15 2012-11-01 Blue 153
2012-10-01 2013-06-01 Red 149
Into this?:
Date ProductAttr Count
2008-04 Blue 503
2008-04 Red 1002
2008-05 Blue 94
2008-05 Red 3004
2008-06 Blue 2000
2008-06 Red 322
Through grouping I can get the raw data into this format grouped by months:
InDate OutDate Value Count
2008-05 2012-05 Blue 119
2008-05 2008-06 Red 333
2008-05 2012-10 Blue 4
2008-05 NULL Red 17488
2008-06 2012-11 Blue 711
2008-06 2013-02 Red 34
If you wanted to know how many products were 'IN' as of Oct. 2012- you would sum the counts of all rows except for 2. Group on Value to keep blue and red separate. Row 2 is ruled out because OutDate is before Oct. 2012.
Thanks in advance.
EDIT:
Gordon Linoff's solution works just like I need it to. The only issue I am having now is the size and efficiency of the query. The part I left out above is that the product attribute is actually located in a different table than the IN/OUT dates, and I also need to join a third table to limit the results to a certain type of product (ForSale, for example). I have tried two different approaches; they both work and return the same data, but both take far too long to automate this report:
select months.mon, count(distinct d.productID), d.ProductAttr
from (select '2008-10' as mon union all
select '2008-11' union all
select '2008-12' union all
select '2009-01'
) months left outer join
t
on months.mon >= date_format(t.Indate, '%Y-%m') and
(months.mon <= date_format(t.OutDate, '%Y-%m') or t.OutDate is NULL)
join x on x.product_id = t.product_id and x.type = 'ForSale'
join d on d.product_id = x.product_id and d.type = 'Attribute'
group by months.mon, d.ProductAttr;
Also tried the above without the last two joins by adding subqueries for the product attribute and where/exclusion - this seems to run about the same or a bit slower:
select months.mon, count(distinct t.productID), (select ProductAttr from d where productid = t.productID and type = 'attribute' limit 1)
from (select '2008-10' as mon union all
select '2008-11' union all
select '2008-12' union all
select '2009-01'
) months left outer join
t
on months.mon >= date_format(t.Indate, '%Y-%m') and
(months.mon <= date_format(t.OutDate, '%Y-%m') or t.OutDate is NULL)
WHERE exists (select 1 from x where x.productid = t.productID and x.type = 'ForSale')
group by months.mon, d.ProductAttr;
Any ideas to make this more efficient with the additional data that I need to rely on 3 source tables in total (1 just for exclusion). Thanks in advance.
You can do this by generating a list of the months that you need. The easiest way is to do this manually in MySQL (although generating the code in Excel can make this easier).
Then use a left join and aggregation to get the information you want:
select months.mon, t.ProductAttr, count(distinct t.productID)
from (select '2008-10' as mon union all
select '2008-11' union all
select '2008-12' union all
select '2009-01'
) months left outer join
t
on months.mon >= date_format(t.Indate, '%Y-%m') and
(months.mon <= date_format(t.OutDate, '%Y-%m') or t.OutDate is NULL)
group by months.mon, t.ProductAttr;
This version does all the comparisons as strings. You are working at the granularity of "month" and the format YYYY-MM does a good job for comparisons.
EDIT:
You do need every month that you want in the output. If you have products coming in every month, then you could do:
select months.mon, t.ProductAttr, count(distinct t.productID)
from (select distinct date_format(t.InDate, '%Y-%m') as mon
from t
) months left outer join
t
on months.mon >= date_format(t.InDate, '%Y-%m') and
(months.mon <= date_format(t.OutDate, '%Y-%m') or t.OutDate is NULL)
group by months.mon, t.ProductAttr;
This pulls the months from the data.
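The month-list join can be tried out on the six sample rows from the question. The sketch below uses an in-memory SQLite database, so strftime('%Y-%m', ...) stands in for MySQL's date_format(..., '%Y-%m'). The three months in the list are picked arbitrarily for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t (InDate TEXT, OutDate TEXT, ProductAttr TEXT, ProductID INTEGER);
    INSERT INTO t VALUES
        ('2008-04-05', NULL,         'Blue', 101),
        ('2008-06-04', NULL,         'Red',  125),
        ('2008-01-01', '2012-06-01', 'Blue', 134),
        ('2008-12-10', '2012-10-09', 'Red',  129),
        ('2009-10-15', '2012-11-01', 'Blue', 153),
        ('2012-10-01', '2013-06-01', 'Red',  149);
""")

# A product counts for a month if it came in during or before that month
# and is either still in stock or went out during or after that month.
counts = conn.execute("""
    SELECT months.mon, t.ProductAttr, COUNT(DISTINCT t.ProductID)
    FROM (SELECT '2012-09' AS mon UNION ALL
          SELECT '2012-10' UNION ALL
          SELECT '2012-11') months
    LEFT JOIN t
           ON months.mon >= strftime('%Y-%m', t.InDate)
          AND (months.mon <= strftime('%Y-%m', t.OutDate) OR t.OutDate IS NULL)
    GROUP BY months.mon, t.ProductAttr
    ORDER BY months.mon, t.ProductAttr
""").fetchall()

for row in counts:
    print(row)
```

Product 129 (out on 2012-10-09) is counted for both September and October but not November, which matches the "in inventory for even one day" rule from the question.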

mysql trying a select with multiple conditions

I have 2 tables with values like below:
tbl_users
user_ID name
1 somename1
2 somename2
3 somename3
tbl_interviews
int_ID user_ID answer date
1 1 sometextaba 2012-11-04
2 2 sometextxcec 2012-10-05
3 1 sometextabs 2011-06-04
4 3 sometextxcfc 2012-11-04
5 3 sometextxcdn 2012-11-04
How can I ask MySQL to tell me which user in the table above was interviewed this year but also had another interview in a previous year? The only one is the user with id = 1, since he had an interview this year (int_ID 1) but his first interview was in 2011 (int_ID 3).
Unfortunately, I'm not even able to select them.
By joining the table against itself, where one side of the join only includes interviews from this year and the other side only includes previous years, the result of the INNER JOIN will be users having both.
Because it doesn't need to rely on any aggregates or subqueries, this method should be efficient, especially if the date column is indexed (note that an index on date helps most when the filters are written as date ranges rather than with YEAR(date)).
SELECT
DISTINCT
thisyear.user_ID,
name
FROM
/* Left side of join retrieves only this year (year=2012) */
tbl_interviews thisyear
/* Right side retrieves year < 2012 */
/* The combined result will eliminate any users who don't exist on both sides of the join */
INNER JOIN tbl_interviews previous_years ON thisyear.user_ID = previous_years.user_ID
/* and JOIN in the user table to get a name */
INNER JOIN tbl_users ON tbl_users.user_ID = thisyear.user_ID
WHERE
YEAR(thisyear.date) = 2012
AND YEAR(previous_years.date) < 2012
Here is a demonstration on SQLFiddle
A simple approach, perhaps less efficient than JOINs:
SELECT DISTINCT user_ID
FROM tbl_interviews
WHERE user_ID IN (
SELECT user_ID
FROM tbl_interviews
WHERE date < '2012-01-01'
)
AND user_ID IN (
SELECT user_ID
FROM tbl_interviews
WHERE date >= '2012-01-01'
)
The following gives you the users who were interviewed in the current year and who were also interviewed in some previous year:
SELECT Distinct tc.user_ID FROM tbl_interviews tc
INNER JOIN tbl_interviews tp ON tc.user_ID = tp.user_ID
WHERE YEAR(tc.date) = Year(curDate()) AND YEAR(tp.date) < Year(curDate());
SqlFiddle Demo
Here is a version with no joins, and only one subselect.
SELECT user_id
FROM (
SELECT user_id,
MAX(date) AS last_interview,
COUNT(int_id) AS interviews
FROM tbl_interviews
GROUP BY user_id) AS t
WHERE YEAR(last_interview) = 2012 AND interviews > 1
You can group tbl_interviews by user_id to count the number of interviews per user, and then filter for users who have more than one interview (in addition to having an interview this year). Note that with the sample data this also returns user 3, whose two interviews both fall in 2012. There are a number of variations on this theme, according to your specific needs, so let me know if it needs a tweak.
For example, this should work as well.
SELECT user_id
FROM (
SELECT user_id,
BIT_OR(YEAR(date) = 2012) AS this_year,
BIT_OR(YEAR(date) < 2012) AS other_year
FROM tbl_interviews
GROUP BY user_id) AS t
WHERE this_year AND other_year
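Both grouped variants can be compared on the question's sample rows. The sketch below uses an in-memory SQLite database, which has no BIT_OR or YEAR(), so MAX() over a 0/1 comparison and strftime('%Y', ...) are used as stand-ins. The year 2012 is hard-coded, as in the answers above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tbl_interviews (int_id INTEGER PRIMARY KEY, user_id INTEGER,
                                 answer TEXT, date TEXT);
    INSERT INTO tbl_interviews VALUES
        (1, 1, 'sometextaba',  '2012-11-04'),
        (2, 2, 'sometextxcec', '2012-10-05'),
        (3, 1, 'sometextabs',  '2011-06-04'),
        (4, 3, 'sometextxcfc', '2012-11-04'),
        (5, 3, 'sometextxcdn', '2012-11-04');
""")

# Count-based variant: last interview in 2012 and more than one interview overall.
by_count = conn.execute("""
    SELECT user_id
    FROM (SELECT user_id, MAX(date) AS last_interview, COUNT(int_id) AS interviews
          FROM tbl_interviews
          GROUP BY user_id) AS t
    WHERE strftime('%Y', last_interview) = '2012' AND interviews > 1
    ORDER BY user_id
""").fetchall()

# Flag-based variant: MAX() of a 0/1 comparison plays the role of BIT_OR here.
by_flags = conn.execute("""
    SELECT user_id
    FROM (SELECT user_id,
                 MAX(strftime('%Y', date) = '2012') AS this_year,
                 MAX(strftime('%Y', date) < '2012') AS other_year
          FROM tbl_interviews
          GROUP BY user_id) AS t
    WHERE this_year AND other_year
    ORDER BY user_id
""").fetchall()

print(by_count)
print(by_flags)
```

The count-based query also picks up user 3, who has two interviews in 2012, whereas the flag-based one returns only user 1.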