GROUP BY MONTH() hide result - mysql

I'm trying to count how many result in each month.
This is my query :
SELECT
COUNT(*) as nb,
CONCAT(MONTH(t.date),0x3a,YEAR(t.date)) as period
FROM table1 t
WHERE t.criteria = 'value'
GROUP BY MONTH(t.date)
ORDER BY YEAR(t.date)
My Result:
nb period
---------------
7 6:2009
46 8:2009
2 10:2009
1 11:2009
14 1:2009
9 9:2010
161 7:2010
5 2:2010
88 3:2010
28 4:2010
4 5:2011
2 12:2011
The problem is, I'm sure that I've result between 5:2011 & 12:2011 , and each other period
since 2009 ... :/
This is a problem of my request or mysql configuration ?
Thx a lot

You have to group by both the year and the month. Otherwise your April 2012 rows are grouped with April 2011 (and April 2010 ...) rows as well.
SELECT
COUNT(*) AS nb,
CONCAT(MONTH(t.date), ':', YEAR(t.date)) AS period
FROM table1 AS t
WHERE t.criteria = 'value'
GROUP BY YEAR(t.date)
, MONTH(t.date) ;
(and is there a reason you used 0x3a and not ':'?)
You could also use some other DATE and TIME functions of MySQL so there are fewer functions calls per row and probably a more efficient query:
SELECT
COUNT(*) AS nb,
DATE_FORMAT(t.date, '%m:%Y') AS period
FROM table1 AS t
WHERE t.criteria = 'value'
GROUP BY EXTRACT( YEAR_MONTH FROM t.date) ;
For several queries, it's useful to have a permanent Calendar table in your database (with all dates or all year-months) or even several Calendar tables. Example:
CREATE TABLE CalendarYear
( Year SMALLINT UNSIGNED NOT NULL
, PRIMARY KEY (Year)
) ENGINE = InnoDB ;
INSERT INTO CalendarYear
(Year)
VALUES
(1900), (1901), ..., (2099) ;
CREATE TABLE CalendarMonth
( Month TINYINT UNSIGNED NOT NULL
, PRIMARY KEY (Month)
) ENGINE = InnoDB ;
INSERT INTO CalendarMonth
(Month)
VALUES
(1), (2), ..., (12) ;
Those can also help us make the one we'll need here:
CREATE TABLE CalendarYearMonth
( Year SMALLINT UNSIGNED NOT NULL
, Month TINYINT UNSIGNED NOT NULL
, FirstDay DATE NOT NULL
, NextMonth_FirstDay DATE NOT NULL
, PRIMARY KEY (Year, Month)
) ENGINE = InnoDB ;
INSERT INTO CalendarYearMonth
(Year, Month, FirstDay, NextMonth_FirstDay)
SELECT
y.Year
, m.Month
, MAKEDATE(y.Year, 1) + INTERVAL (m.Month-1) MONTH
, MAKEDATE(y.Year, 1) + INTERVAL (m.Month) MONTH
FROM
CalendarYear AS y
CROSS JOIN
CalendarMonth AS m ;
Then you can use the Calendar tables to write more complex queries, like the variation you want (with missing months) and probably more efficiently. Tested in SQL-Fiddle:
SELECT
COUNT(t.date) AS nb,
CONCAT(cal.Month, ':', cal.Year) AS period
FROM
CalendarYearMonth AS cal
JOIN
( SELECT
YEAR(MIN(date)) AS min_year
, MONTH(MIN(date)) AS min_month
, YEAR(MAX(date)) AS max_year
, MONTH(MAX(date)) AS max_month
FROM table1
WHERE criteria = 'value'
) AS mm
ON (cal.Year, cal.Month) >= (mm.min_year, mm.min_month)
AND (cal.Year, cal.Month) <= (mm.max_year, mm.max_month)
LEFT JOIN
table1 AS t
ON t.criteria = 'value'
AND t.date >= cal.FirstDay
AND t.date < cal.NextMonth_FirstDay
GROUP BY
cal.Year, cal.Month ;

You must also GROUP BY the year:
GROUP BY MONTH(t.date), YEAR(t.date)
Your original query uses YEAR(t.date) in the SELECT clause outside of any aggregate function without grouping by it -- as a result, you get exactly 12 groups (one for each possible month) and for each group (that possibly contains dates across many years) a "random" year is chosen by MySql for selection. Strictly speaking, this is meaningless and the query should never have been allowed to execute. But MySql... sigh.

Related

Nested MariaDB Query Slow

I am having performance issues with a query, I have 21 million records across the table, and 2 of the tables I'm looking in here have 8 million each; individually, they are very quick. But I've done a query that, in my opinion, isn't very good, but it's the only way I know how to do it.
This query takes 65 seconds, I need to get it under 1 second and I think it's possible if I don't have all the SELECT queries, but once again, I am not sure how else to do it with my SQL knowledge.
Database server version is MariaDB 10.6.
SELECT
pa.`slug`,
(
SELECT
SUM(`impressions`)
FROM `rh_pages_gsc_country`
WHERE `page_id` = pa.`page_id`
AND `country` = 'aus'
AND `date_id` IN
(
SELECT `date_id`
FROM `rh_pages_gsc_dates`
WHERE `date` BETWEEN NOW() - INTERVAL 12 MONTH AND NOW()
)
) as au_impressions,
(
SELECT
SUM(`clicks`)
FROM `rh_pages_gsc_country`
WHERE `page_id` = pa.`page_id`
AND `country` = 'aus'
AND `date_id` IN
(
SELECT `date_id`
FROM `rh_pages_gsc_dates`
WHERE `date` BETWEEN NOW() - INTERVAL 12 MONTH AND NOW()
)
) as au_clicks,
(
SELECT
COUNT(`keywords_id`)
FROM `rh_pages_gsc_keywords`
WHERE `page_id` = pa.`page_id`
AND `date_id` IN
(
SELECT `date_id`
FROM `rh_pages_gsc_dates`
WHERE `date` BETWEEN NOW() - INTERVAL 12 MONTH AND NOW()
)
) as keywords,
(
SELECT
AVG(`position`)
FROM `rh_pages_gsc_keywords`
WHERE `page_id` = pa.`page_id`
AND `date_id` IN
(
SELECT `date_id`
FROM `rh_pages_gsc_dates`
WHERE `date` BETWEEN NOW() - INTERVAL 12 MONTH AND NOW()
)
) as avg_pos,
(
SELECT
AVG(`ctr`)
FROM `rh_pages_gsc_keywords`
WHERE `page_id` = pa.`page_id`
AND `date_id` IN
(
SELECT `date_id`
FROM `rh_pages_gsc_dates`
WHERE `date` BETWEEN NOW() - INTERVAL 12 MONTH AND NOW()
)
) as avg_ctr
FROM `rh_pages` pa
WHERE pa.`site_id` = 13
ORDER BY au_impressions DESC, keywords DESC, slug DESC
If anyone can help, I don't think the table structure is needed here as it's basically shown in the query, but here is a photo of the constraints and table types.
Anyone that can help is greatly appreciated.
Do NOT normalize any column that will be regularly used in a "range scan", such as date. The following is terribly slow:
AND `date_id` IN (
SELECT `date_id`
FROM `rh_pages_gsc_dates`
WHERE `date` BETWEEN NOW() - INTERVAL 12 MONTH
AND NOW() )
It also consumes extra space to have BIGINT (8 bytes) pointing to a DATE (5 bytes).
Once you move the date to the various tables, the subqueries simplify, such as
SELECT AVG(`position`)
FROM `rh_pages_gsc_keywords`
WHERE `page_id` = pa.`page_id`
AND `date_id` IN (
SELECT `date_id`
FROM `rh_pages_gsc_dates`
WHERE `date` BETWEEN NOW() - INTERVAL 12 MONTH
AND NOW() )
becomes
SELECT AVG(`position`)
FROM `rh_pages_gsc_keywords`
WHERE `page_id` = pa.`page_id`
AND `date` >= NOW() - INTERVAL 12 MONTH
I'm assuming that nothing after "NOW" has yet been stored.
If there are dates in the future, then add
AND `date` < NOW()
Each table will probably need a new index, such as
INDEX(page_id, date) -- in that order
(Yes, the "JOIN" suggestion by others is a good one. It's essentially orthogonal to my suggestions above and below.)
After you have made those changes, if the performance is not good enough, we can discuss Summary Tables
Your query is aggregating (summarizing) rows from two different detail tables, rh_pages_gsc_country and rh_pages_gsc_keywords, and doing so for a particular date range. And it has a lot of correlated subqueries.
The first steps in your path to better performance are
Converting your correlated subqueries to independent subqueries, then JOINing them.
Writing one subquery for each detail table, rather than one for each column you need summarized.
You mentioned you've been struggling with this. The concept I hope you learn from this answer is this: you can often refactor away your correlated subqueries if you can come up with independent subqueries that give the same results, and then join them together. If you mention subqueries in your SELECT clause -- SELECT ... (SELECT whatever) whatever ... -- you probably have an opportunity to do this refactoring.
Here goes. First you need a subquery for your date range. You have this one right, just repeated.
SELECT `date_id`
FROM `rh_pages_gsc_dates`
WHERE `date` BETWEEN NOW() - INTERVAL 12 MONTH AND NOW()
Next you need a subquery for rh_pages_gsc_country. It is a modification of what you have. We'll fetch both SUMs in one subquery.
SELECT SUM(`impressions`) impressions,
SUM(`clicks`) clicks,
page_id, date_id
FROM `rh_pages_gsc_country`
WHERE `country` = 'aus'
GROUP BY page_id, date_id
See how this goes? This subquery yields a virtual table with exactly one row for every combination of page_id and date_id, containing the number of impressions and the number of clicks.
Next, let's join the subqueries together in a main query. This yields some columns of your result set.
SELECT pa.slug, country.impressions, country.clicks
FROM rh_pages pa
JOIN (
SELECT SUM(`impressions`) impressions,
SUM(`clicks`) clicks,
page_id, date_id
FROM `rh_pages_gsc_country`
WHERE `country` = 'aus' -- constant for country code
GROUP BY page_id, date_id
) country ON country.page_id = pa.page_id
JOIN (
SELECT `date_id`
FROM `rh_pages_gsc_dates`
WHERE `date` BETWEEN NOW() - INTERVAL 12 MONTH AND NOW()
) dates ON dates.date_id = country.date_id
WHERE pa.site_id = 13 -- constant for page id
ORDER BY country.impressions DESC
This runs through the rows of rh_pages_gsc_dates and rh_pages_gsc_country just once to satisfy your query. So, faster.
Finally let's do the same thing for your rh_pages_gsc_keywords table's summary.
SELECT pa.slug, country.impressions, country.clicks,
keywords.keywords, keywords.avg_pos, keywords.avg_ctr
FROM rh_pages pa
JOIN (
SELECT SUM(`impressions`) impressions,
SUM(`clicks`) clicks,
page_id, date_id
FROM `rh_pages_gsc_country`
WHERE `country` = 'aus' -- constant for country code
GROUP BY page_id, date_id
) country ON country.page_id = pa.page_id
JOIN (
SELECT SUM(`keywords_id`) keywords,
AVG(`position`) position,
AVG(`ctr`) avg_ctr,
page_id, date_id
FROM `rh_pages_gsc_keywords`
GROUP BY page_id, date_id
) keywords ON keywords.page_id = pa.page_id
JOIN (
SELECT `date_id`
FROM `rh_pages_gsc_keywords`
WHERE `date` BETWEEN NOW() - INTERVAL 12 MONTH AND NOW()
) dates ON dates.date_id = country.date_id
AND dates.date_id = keywords.date_id
WHERE pa.site_id = 13 -- constant for page id
ORDER BY impressions DESC, keywords DESC, slug DESC
This will almost certainly be faster than what you have now. If it's fast enough, great. If not, please don't hesitate to ask another question for help, tagging it query-optimization. We will need to see your table definitions, your index definitions, and the output of EXPLAIN. Please read this before asking a followup question.
I did not, repeat not, debug any of this. That's up to you.

collect_set() distinct users by day from last 90 days only when user is older than last 90 days

for now I was able to collect_set() everyone that is active with no problem:
with aux as(
select date
,collect_set(user_id) over(
partition by feature
order by cast(timestamp(date) as float)
range between (-90*60*60*24) following and 0 preceding
) as user_id
,feature
--
from (
select data
,feature
,collect_set(user_id)
--
from table
--
group by date, feature
)
)
--
select date
,distinct_array(flatten(user_id))
,feature
--
from aux
The problem is, now I have to keep only users that are older than last 90 days
I tried this and didn't work:
select date
,collect_set(case when user_created_at < date - interval 90 day
then user_id end) over(
partition by feature
order by cast(timestamp(date) as float)
range between (-90*60*60*24) following and 0 preceding
) as teste
,feature
from table
The reason it didn't work is because the filter inside collect_select() filters only users from one day instead filtering all the users from the last 90 days,
Making the result with more results than expected.
How can I get it correctly?
As reference, I'm using this query to verify if is correct:
select
count(distinct user_id) as total
,count(distinct case when user_created_at < date('2020-04-30') - interval 90 day then user_id end)
,count(distinct case when user_created_at >= date('2020-04-30') - interval 90 day then user_id end)
--
from table
--
where 1=1
and date >= date('2020-04-30') - interval 90 day
and date <= '2020-04-30'
and feature = 'a_feature'
pretty ugly workaround but:
select data
,feature
,collect_set(cus.client_id) as client
from (
select data
,explode(array_distinct(flatten(client))) as client
,feature
from(
select data
,collect_set(client_id) over(
partition by feature
order by cast(timestamp(data) as float)
range between (-90*60*60*24) following and 0 preceding
) as cliente
,feature
from (
select data
,feature
,collect_set(client_id) as cliente
from da_pandora.ds_transaction dtr
--
group by data, feature
)
)
)as dtr
left join costumer as cus
on cus.client_id = dtr.client and date(client_created_at) < data - interval 90 day
group by data, feature

Calculating a Moving Average MySQL?

Good Day,
I am using the following code to calculate the 9 Day Moving average.
SELECT SUM(close)
FROM tbl
WHERE date <= '2002-07-05'
AND name_id = 2
ORDER BY date DESC
LIMIT 9
But it does not work because it first calculates all of the returned fields before the limit is called. In other words it will calculate all the closes before or equal to that date, and not just the last 9.
So I need to calculate the SUM from the returned select, rather than calculate it straight.
IE. Select the SUM from the SELECT...
Now how would I go about doing this and is it very costly or is there a better way?
If you want the moving average for each date, then try this:
SELECT date, SUM(close),
(select avg(close) from tbl t2 where t2.name_id = t.name_id and datediff(t2.date, t.date) <= 9
) as mvgAvg
FROM tbl t
WHERE date <= '2002-07-05' and
name_id = 2
GROUP BY date
ORDER BY date DESC
It uses a correlated subquery to calculate the average of 9 values.
Starting from MySQL 8, you should use window functions for this. Using the window RANGE clause, you can create a logical window over an interval, which is very powerful. Something like this:
SELECT
date,
close,
AVG (close) OVER (ORDER BY date DESC RANGE INTERVAL 9 DAY PRECEDING)
FROM tbl
WHERE date <= DATE '2002-07-05'
AND name_id = 2
ORDER BY date DESC
For example:
WITH t (date, `close`) AS (
SELECT DATE '2020-01-01', 50 UNION ALL
SELECT DATE '2020-01-03', 54 UNION ALL
SELECT DATE '2020-01-05', 51 UNION ALL
SELECT DATE '2020-01-12', 49 UNION ALL
SELECT DATE '2020-01-13', 59 UNION ALL
SELECT DATE '2020-01-15', 30 UNION ALL
SELECT DATE '2020-01-17', 35 UNION ALL
SELECT DATE '2020-01-18', 39 UNION ALL
SELECT DATE '2020-01-19', 47 UNION ALL
SELECT DATE '2020-01-26', 50
)
SELECT
date,
`close`,
COUNT(*) OVER w AS c,
SUM(`close`) OVER w AS s,
AVG(`close`) OVER w AS a
FROM t
WINDOW w AS (ORDER BY date DESC RANGE INTERVAL 9 DAY PRECEDING)
ORDER BY date DESC
Leading to:
date |close|c|s |a |
----------|-----|-|---|-------|
2020-01-26| 50|1| 50|50.0000|
2020-01-19| 47|2| 97|48.5000|
2020-01-18| 39|3|136|45.3333|
2020-01-17| 35|4|171|42.7500|
2020-01-15| 30|4|151|37.7500|
2020-01-13| 59|5|210|42.0000|
2020-01-12| 49|6|259|43.1667|
2020-01-05| 51|3|159|53.0000|
2020-01-03| 54|3|154|51.3333|
2020-01-01| 50|3|155|51.6667|
Use something like
SELECT
sum(close) as sum,
avg(close) as average
FROM (
SELECT
(close)
FROM
tbl
WHERE
date <= '2002-07-05'
AND name_id = 2
ORDER BY
date DESC
LIMIT 9 ) temp
The inner query returns all filtered rows in desc order, and then you avg, sum up those rows returned.
The reason why the query given by you doesn't work is due to the fact that the sum is calculated first and the LIMIT clause is applied after the sum has already been calculated, giving you the sum of all the rows present
an other technique is to do a table:
CREATE TABLE `tinyint_asc` (
`value` tinyint(3) unsigned NOT NULL default '0',
PRIMARY KEY (value)
) ;
​
INSERT INTO `tinyint_asc` VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9),(10),(11),(12),(13),(14),(15),(16),(17),(18),(19),(20),(21),(22),(23),(24),(25),(26),(27),(28),(29),(30),(31),(32),(33),(34),(35),(36),(37),(38),(39),(40),(41),(42),(43),(44),(45),(46),(47),(48),(49),(50),(51),(52),(53),(54),(55),(56),(57),(58),(59),(60),(61),(62),(63),(64),(65),(66),(67),(68),(69),(70),(71),(72),(73),(74),(75),(76),(77),(78),(79),(80),(81),(82),(83),(84),(85),(86),(87),(88),(89),(90),(91),(92),(93),(94),(95),(96),(97),(98),(99),(100),(101),(102),(103),(104),(105),(106),(107),(108),(109),(110),(111),(112),(113),(114),(115),(116),(117),(118),(119),(120),(121),(122),(123),(124),(125),(126),(127),(128),(129),(130),(131),(132),(133),(134),(135),(136),(137),(138),(139),(140),(141),(142),(143),(144),(145),(146),(147),(148),(149),(150),(151),(152),(153),(154),(155),(156),(157),(158),(159),(160),(161),(162),(163),(164),(165),(166),(167),(168),(169),(170),(171),(172),(173),(174),(175),(176),(177),(178),(179),(180),(181),(182),(183),(184),(185),(186),(187),(188),(189),(190),(191),(192),(193),(194),(195),(196),(197),(198),(199),(200),(201),(202),(203),(204),(205),(206),(207),(208),(209),(210),(211),(212),(213),(214),(215),(216),(217),(218),(219),(220),(221),(222),(223),(224),(225),(226),(227),(228),(229),(230),(231),(232),(233),(234),(235),(236),(237),(238),(239),(240),(241),(242),(243),(244),(245),(246),(247),(248),(249),(250),(251),(252),(253),(254),(255);
After you can used it like that:
select
date_add(tbl.date, interval tinyint_asc.value day) as mydate,
count(*),
sum(myvalue)
from tbl inner
join tinyint_asc.value <= 30 -- for a 30 day moving average
where date( date_add(o.created_at, interval tinyint_asc.value day ) ) between '2016-01-01' and current_date()
group by mydate
This query is fast:
select date, name_id,
case #i when name_id then #i:=name_id else (#i:=name_id)
and (#n:=0)
and (#a0:=0) and (#a1:=0) and (#a2:=0) and (#a3:=0) and (#a4:=0) and (#a5:=0) and (#a6:=0) and (#a7:=0) and (#a8:=0)
end as a,
case #n when 9 then #n:=9 else #n:=#n+1 end as n,
#a0:=#a1,#a1:=#a2,#a2:=#a3,#a3:=#a4,#a4:=#a5,#a5:=#a6,#a6:=#a7,#a7:=#a8,#a8:=close,
(#a0+#a1+#a2+#a3+#a4+#a5+#a6+#a7+#a8)/#n as av
from tbl,
(select #i:=0, #n:=0,
#a0:=0, #a1:=0, #a2:=0, #a3:=0, #a4:=0, #a5:=0, #a6:=0, #a7:=0, #a8:=0) a
where name_id=2
order by name_id, date
If you need an average over 50 or 100 values, it's tedious to write, but
worth the effort. The speed is close to the ordered select.

Optimization of a mysql query

I'm using MySQL and have a table user_data like this:
user_id int(10) unsigned
reg_date int(10) unsigned
carrier char(1)
The reg_data is the unix timestamp of the registration time (it could be any second of a day), and the carrier is the type of carriers, the possible values of which could ONLY be 'D', 'A' or 'V'.
I need to write a sql statement to select the registered user number of different carriers on each day from 2013/01/01 to 2013/01/31. So the desirable result could be:
2013/01/01 D 10
2013/01/01 A 31
2013/01/01 V 24
2013/01/02 D 9
2013/01/02 A 23
2013/01/02 V 14
....
2013/01/31 D 11
2013/01/31 A 34
2013/01/31 V 22
Can anyone help me with this question? I'm required to give the BEST answer, which means I can add index if necessary, but I need to keep the query efficient.
Currently, I created an index on (reg_date, carrier) and use the following query:
select FROM_UNIXTIME(reg_date, "%M %D %Y") as reg_day, carrier, count(carrier) as user_count
from user_data
where reg_date >= UNIX_TIMESTAMP('2013-01-01 00:00:00') and reg_date < UNIX_TIMESTAMP('2013-02-01 00:00:00')
group by reg_day, carrier
order by reg_date;
Thanks!
If you can not change the table (storing individual dates would help a little), only indexes, then:
Create a compound index: carrier, reg_date, then group carrier, reg_date and order by reg_date, carrier.
You can create an other index just for the timestamp (it may work better for the WHERE caluse, depending your number of records outside the scope).
Further more you can use completely unix timestamps, then embed this as a subquery an an outer one can covert the timestamps to human-readable dates (this way the conversion is done after the group, not for each individual record).
Creating indexes:
CREATE INDEX bytime ON user_data (reg_date);
CREATE INDEX daily_group ON user_data (carrier, reg_date);
Query:
SELECT FROM_UNIXTIME(reg_day, "%M %D %Y") AS reg_day
, carrier
, user_count
FROM (
SELECT FLOOR(reg_date / (60 * 60 * 24)) AS reg_day
, carrier
, count(carrier) AS user_count
FROM user_data
WHERE reg_date >= UNIX_TIMESTAMP('2013-01-01 00:00:00')
AND reg_date < UNIX_TIMESTAMP('2013-02-01 00:00:00')
GROUP BY carrier, reg_day
ORDER BY reg_day, carrier
) AS a;

MySQL query to count items by week for the current 52-weeks?

I have a query that I'd like to change so that it gives me the counts for the current 52 weeks. This query makes use of a calendar table I've made which contains a list of dates in a fixed range. The query as it stands is selecting max and min dates and not necessarily the last 52 weeks.
I'm wondering how to keep my calendar table current such that I can get the last 52-weeks (i.e, from right now to one year ago). Or is there another way to make the query independent of using a calendar table?
Here's the query:
SELECT calendar.datefield AS date, IFNULL(SUM(purchaseyesno),0) AS item_sales
FROM items_purchased join items on items_purchased.item_id=items.item_id
RIGHT JOIN calendar ON (DATE(items_purchased.purchase_date) = calendar.datefield)
WHERE (calendar.datefield BETWEEN (SELECT MIN(DATE(purchase_date))
FROM items_purchased) AND (SELECT MAX(DATE(purchase_date)) FROM items_purchased))
GROUP BY week(date)
thoughts?
Some people dislike this approach but I tend to use a dummy table that contains values from 0 - 1000 and then use a derived table to produce the ranges that are needed -
CREATE TABLE dummy (`num` INT NOT NULL);
INSERT INTO dummy VALUES (0), (1), (2), (3), (4), (5), .... (999), (1000);
If you have a table with an auto-incrementing id and plenty of rows you could generate it from that -
CREATE TABLE `dummy`
SELECT id AS `num` FROM `some_table` WHERE `id` <= 1000;
Just remember to insert the 0 value.
SELECT CURRENT_DATE - INTERVAL num DAY
FROM dummy
WHERE num < 365
So, applying this approach to your query you could do something like this -
SELECT WEEK(calendar.datefield) AS `week`, IFNULL(SUM(purchaseyesno),0) AS item_sales
FROM items_purchased join items on items_purchased.item_id=items.item_id
RIGHT JOIN (
SELECT (CURRENT_DATE - INTERVAL num DAY) AS datefield
FROM dummy
WHERE num < 365
) AS calendar ON (DATE(items_purchased.purchase_date) = calendar.datefield)
WHERE calendar.datefield >= (CURRENT_DATE - INTERVAL 1 YEAR)
GROUP BY week(datefield) -- shouldn't this be datefield instead of date?
I too typically "simulate" a table on the fly by using #sql variables and just join to ANY table in your system that has AT least as many weeks as you want. NOTE... when dealing with dates, I like to typically use the date-part only which implies a 12:00:00 am. Also, by advancing the start date by 7 days for the "EndOfWeek", you can now apply a BETWEEN clause for records within a given time period... such as your weekly needs.
I've applied such a sample to coordinate the join based on date association to the per week basis... Since your
select
DynamicCalendar.StartOfWeek,
COALESCE( SUM( IP.PurchaseYesNo ), 0 ) as Item_Sales
from
( select
#weekNum := #weekNum +1 as WeekNum,
#startDate as StartOfWeek,
#startDate := date_add( #startDate, interval 1 week ) EndOfWeek
from
( select #weekNum := 0,
#startDate := date(date_sub(now(), interval 1 year ))) sqlv,
AnyTableThatHasAtLeast52Records,
limit
52 ) DynamicCalendar
LEFT JOIN items_purchased IP
on IP.Purchase_Date bewteen DynamicCalendar.StartOfWeek
AND DynamicCalendar.EndOfWeek
group by
DynamicCalendar.StartOfWeek
This is under the premise that your "PurchaseYesNo" value is in your purchased table directly. If so, no need to join to the ITEMS table. If the field IS in the items table, then I would just tack on a LEFT JOIN for your items table and get value from that.
However you could use the dynamicCalendar context in MANY conditions.