MySQL report -- fill in empty dates - mysql

I am building a query to return daily sales data. My current query returns a table similar to this:
----------------------------------
| DATE | SKU | TOTAL |
----------------------------------
| 2014-11-01 | AV155_A | 209.00 |
| 2014-11-02 | AV155_B | 627.00 |
| 2014-11-04 | AV155_C | 279.00 |
| 2014-11-05 | AV155 | 279.00 |
| 2014-11-08 | AV1556_A | 209.00 |
| 2014-11-09 | AV1556_B | 627.00 |
| 2014-11-10 | AV1556_C | 279.00 |
| 2014-11-12 | AV1556 | 279.00 |
What I would like is a results table that displays every day, even if there are no data points for that particular day. Something like this:
----------------------------------
| DATE | SKU | TOTAL |
----------------------------------
| 2014-11-01 | AV155_A | 209.00 |
| 2014-11-02 | AV155_B | 627.00 |
| 2014-11-03 | | 0 |
| 2014-11-04 | AV155_C | 279.00 |
| 2014-11-05 | AV155 | 279.00 |
| 2014-11-06 | | 0 |
| 2014-11-07 | | 0 |
| 2014-11-08 | AV1556_A | 209.00 |
| 2014-11-09 | AV1556_B | 627.00 |
| 2014-11-10 | AV1556_C | 279.00 |
| 2014-11-11 | | 0 |
| 2014-11-12 | AV1556 | 279.00 |
The query I currently have looks like this:
select
DATE_FORMAT(created_on, '%m-%d-%Y') as date,
sku,
SUM(price) as total
FROM order_items
WHERE created_on between FROM_UNIXTIME(1415577600) AND NOW()
GROUP BY MONTH(created_on), DAY(created_on), sku;

You need to use an outer join. The easiest way is if you have a calendar table, but you can make one on the fly:
select c.thedate, oi.sku, coalesce(sum(oi.price), 0) as total
from (select date('2014-11-01') as thedate union all
select date('2014-11-02') as thedate union all
select date('2014-11-03') as thedate union all
select date('2014-11-04') as thedate union all
select date('2014-11-05') as thedate union all
select date('2014-11-06') as thedate union all
select date('2014-11-07') as thedate union all
select date('2014-11-08') as thedate union all
select date('2014-11-09') as thedate union all
select date('2014-11-10') as thedate union all
select date('2014-11-11') as thedate union all
select date('2014-11-12') as thedate
) c left join
order_items oi
on c.thedate = date(oi.created_on) and
-- the range filter belongs in the ON clause; in WHERE it would discard the empty days
oi.created_on between FROM_UNIXTIME(1415577600) AND NOW()
group by c.thedate, oi.sku
order by c.thedate
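If you are on MySQL 8.0 or later, the on-the-fly calendar can also be generated with a recursive common table expression instead of a long UNION ALL list. This is only a sketch: the version requirement and the two date bounds are my assumptions, and the table and column names are taken from the question.

```sql
-- Assumes MySQL 8.0+ (recursive CTEs). The date bounds are illustrative.
WITH RECURSIVE calendar (thedate) AS (
    SELECT DATE('2014-11-01')
    UNION ALL
    SELECT thedate + INTERVAL 1 DAY
    FROM calendar
    WHERE thedate < DATE('2014-11-12')
)
SELECT c.thedate, oi.sku, COALESCE(SUM(oi.price), 0) AS total
FROM calendar AS c
LEFT JOIN order_items AS oi
       -- half-open range: everything created during that calendar day
       ON oi.created_on >= c.thedate
      AND oi.created_on <  c.thedate + INTERVAL 1 DAY
GROUP BY c.thedate, oi.sku
ORDER BY c.thedate;
```

Days without orders come back with a NULL sku and a total of 0.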

Here's an answer that addresses the need for a flexible list of dates. You need to figure out a way to get a virtual table containing all the dates in the appropriate range, and then join them to the summary. Here’s a query that will get the dates in the range.
SELECT mintime + INTERVAL seq.seq DAY AS reportdate
FROM (
SELECT MIN(DATE(created_on)) AS mintime,
MAX(DATE(created_on)) AS maxtime
FROM order_items
WHERE created_on >= starting_time
AND created_on <= NOW()
) AS order_items
JOIN seq_0_to_999 AS seq
ON seq.seq <= TIMESTAMPDIFF(DAY,mintime,maxtime)
What’s going on here? Three things.
We have a subquery which determines the first and last day (min and max created_on) we care about reporting.
We apply a time range to that query. I like to avoid using BETWEEN for timestamp ranges because it often gets the ending time wrong in an off-by-one-second error.
We have a table called seq_0_to_999. It contains a sequence of a thousand cardinal numbers: the integers starting at zero. More about this in a moment.
Then, you can join that as a subquery to your aggregate query to get all the dates in the range listed, like so.
select DATE_FORMAT(d.reportdate, '%m-%d-%Y') as date,
sku,
SUM(price) as total
FROM (
SELECT mintime + INTERVAL seq.seq DAY AS reportdate
FROM (
SELECT MIN(DATE(created_on)) AS mintime,
MAX(DATE(created_on)) AS maxtime
FROM order_items
WHERE created_on >= starting_time
AND created_on <= NOW()
) AS order_items
JOIN seq_0_to_999 AS seq
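ON seq.seq <= TIMESTAMPDIFF(DAY,mintime,maxtime)
) AS d
LEFT JOIN order_items ON d.reportdate = DATE(order_items.created_on)
-- keep the range filter in the ON clause so that empty days survive the LEFT JOIN
AND order_items.created_on >= starting_time
AND order_items.created_on <= NOW()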
GROUP BY d.reportdate, sku
ORDER BY d.reportdate, sku
It looks like a big nasty hairball of a query. But if you think of it as a sandwich made of various layers of queries, it really isn't that complicated.
It uses LEFT JOIN so it makes sure all the dates in the range are preserved even if there's no corresponding data in your order_items table.
Finally, what about this seq_0_to_999 table? Where do we get those integers starting with zero? The answer is this: we have to arrange to do that; those numbers aren’t built in to MySQL. (They are built into the MySQL fork called MariaDB.) Create a short table with the integers from 0-9 in it, like so:
DROP TABLE IF EXISTS seq_0_to_9;
CREATE TABLE seq_0_to_9 AS
SELECT 0 AS seq UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4
UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9;
Then create a view that joins that table with itself to generate 1000 combinations like this:
DROP VIEW IF EXISTS seq_0_to_999;
CREATE VIEW seq_0_to_999 AS (
SELECT (a.seq + 10 * (b.seq + 10 * c.seq)) AS seq
FROM seq_0_to_9 a
JOIN seq_0_to_9 b
JOIN seq_0_to_9 c
);
I wrote this up in some detail at http://www.plumislandmedia.net/mysql/filling-missing-data-sequences-cardinal-integers/
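On MariaDB the helper table and view should not be needed, because the Sequence storage engine exposes virtual tables such as seq_0_to_999 directly. A minimal sketch of the date-range query on that basis, assuming MariaDB 10.0 or later with the Sequence engine available; the literal start date stands in for starting_time and is my assumption:

```sql
-- Assumes MariaDB with the Sequence storage engine; seq_0_to_999 is then
-- a built-in virtual table with a single column named seq.
SELECT mintime + INTERVAL seq.seq DAY AS reportdate
FROM (
    SELECT MIN(DATE(created_on)) AS mintime,
           MAX(DATE(created_on)) AS maxtime
    FROM order_items
    WHERE created_on >= '2014-11-01'   -- illustrative start of the report
      AND created_on <= NOW()
) AS bounds
JOIN seq_0_to_999 AS seq
  -- <= so that the last day of the range is included
  ON seq.seq <= TIMESTAMPDIFF(DAY, mintime, maxtime);
```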

Related

Select all dates which are not in one of the date ranges

I have a table of time periods (date ranges). These ranges can overlap, and one range can also be a subrange of another record.
+----+------------+------------+
| id | start_date | end_date |
+----+------------+------------+
| 1 | 2019-01-01 | 2019-01-31 |
| 2 | 2019-02-01 | 2019-02-28 |
| 3 | 2019-04-01 | 2019-04-30 |
+----+------------+------------+
Then I have a table with invoices with invoice date and invoice number:
+----+--------------+------------+
| id | invoice_date | invoice_no |
+----+--------------+------------+
| 1 | 2019-01-14 | 4534534BG |
| 2 | 2019-03-01 | 678678AAA |
| 3 | 2019-04-13 | 123123DDD |
+----+--------------+------------+
I'm looking for all invoices that do not fall within any of the date periods.
The goal in this small example would be to find the invoice from March: invoice_no: 678678AAA
My Approach
SELECT *
FROM `invoice`
WHERE (invoice_date BETWEEN '2019-01-01' AND '2019-01-31')
With this solution I would have to mark the invoices that were found as "found" and then repeat the query for every other range (until no open invoices or periods are left).
That would be a lot of queries, because there are many invoices and many time periods. I would like to avoid that.
Is there a trick for feeding the start and end dates into the BETWEEN via a SELECT?
To exhibit invoices that do not belong to any of the date ranges defined in the other table, you could use a not exists condition:
select i.*
from invoices i
where not exists (
select 1
from periods p
where i.invoice_date >= p.start_date and i.invoice_date <= p.end_date
)
Another typical solution is the left join anti-join pattern, i.e.:
select i.*
from invoices i
left join periods p
on i.invoice_date >= p.start_date and i.invoice_date <= p.end_date
where p.id is null
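To try this out, here is a small self-contained sketch using the sample rows from the question. The column types are my assumption, and I have assumed that the end dates of periods 2 and 3 are meant to be in 2019.

```sql
-- Sample schema and data; types and the 2019 end dates are assumptions.
CREATE TABLE periods (
    id         INT PRIMARY KEY,
    start_date DATE NOT NULL,
    end_date   DATE NOT NULL
);
CREATE TABLE invoices (
    id           INT PRIMARY KEY,
    invoice_date DATE NOT NULL,
    invoice_no   VARCHAR(20) NOT NULL
);

INSERT INTO periods VALUES
    (1, '2019-01-01', '2019-01-31'),
    (2, '2019-02-01', '2019-02-28'),
    (3, '2019-04-01', '2019-04-30');
INSERT INTO invoices VALUES
    (1, '2019-01-14', '4534534BG'),
    (2, '2019-03-01', '678678AAA'),
    (3, '2019-04-13', '123123DDD');

-- Invoices that are not covered by any period
SELECT i.invoice_no
FROM invoices AS i
WHERE NOT EXISTS (
    SELECT 1
    FROM periods AS p
    WHERE i.invoice_date BETWEEN p.start_date AND p.end_date
);
-- returns the single row 678678AAA
```

Overlapping or nested periods do not matter here, because NOT EXISTS only asks whether at least one matching period exists.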

MySQL: Get the minimum record for a user on a given day

I have a table of events, each with someone in charge. There may be several of these events per day, but I need a query that returns only the first event for each user on a given day.
For example, if I have the following table of events:
+----------+-------------+---------------------+
| event_id | director_id | event_start |
+----------+-------------+---------------------+
| 1 | 111 | 2015-04-27 10:00:00 |
+----------+-------------+---------------------+
| 2 | 222 | 2015-04-27 11:00:00 |
+----------+-------------+---------------------+
| 3 | 333 | 2015-04-27 12:00:00 |
+----------+-------------+---------------------+
| 4 | 111 | 2015-04-27 13:00:00 |
+----------+-------------+---------------------+
| 5 | 222 | 2015-04-27 09:00:00 |
+----------+-------------+---------------------+
I would like the following returned:
+----------+-------------+---------------------+
| event_id | director_id | event_start |
+----------+-------------+---------------------+
| 1 | 111 | 2015-04-27 10:00:00 |
+----------+-------------+---------------------+
| 5 | 222 | 2015-04-27 09:00:00 |
+----------+-------------+---------------------+
| 3 | 333 | 2015-04-27 12:00:00 |
+----------+-------------+---------------------+
I thought a query like the following would have worked, but it turns out that MySQL does not support MIN in the WHERE clause (simple SQL query giving Invalid use of group function):
SELECT
event_id, director_id, MIN(event_start) AS event_start
FROM events
WHERE MIN(event_start) >= '2015-04-27 00:00:00'
AND MIN(event_start) < '2015-04-28 00:00:00'
GROUP BY director_id;
How can I do this in the most efficient way possible? My events table may easily have 10,000-100,000 records.
You can get the minimum event time on each day with a query similar to yours:
SELECT director_id, date(event_start) as dte, MIN(event_start) AS event_start
FROM events e
GROUP BY director_id, date(event_start);
You can then use this as a subquery to get all other information from the row:
select e.*
from events e join
(SELECT e.director_id, date(e.event_start) as dte, MIN(e.event_start) AS event_start
FROM events e
GROUP BY e.director_id, date(e.event_start)
) ee
on e.director_id = ee.director_id and
e.event_start = ee.event_start -- note, this has both the date and time
If you want to restrict the results to a single day, you can put the where clause in the subquery.
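As a sketch of that last point, this is what the query could look like with the day restriction inside the subquery. The date literals come from the question; the aliases f and first_start are names I made up for illustration.

```sql
SELECT e.*
FROM events AS e
JOIN (
    -- earliest start per director, considering only the requested day
    SELECT director_id, MIN(event_start) AS first_start
    FROM events
    WHERE event_start >= '2015-04-27 00:00:00'
      AND event_start <  '2015-04-28 00:00:00'
    GROUP BY director_id
) AS f
  ON f.director_id = e.director_id
 AND f.first_start = e.event_start
ORDER BY e.director_id;
```

With the sample rows from the question this should return events 1, 5 and 3.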
You can't use aggregate functions in the WHERE clause of a query. One way to do what you want is to use a left join like so:
select e1.*
from events e1
left join events e2
on e1.director_id = e2.director_id
and e1.event_start > e2.event_start
and date(e1.event_start) = date(e2.event_start)
where e2.director_id is null
Performance is likely to improve if you have an index on (director_id, event_start).
You can also limit the result size further by replacing the condition date(e1.event_start) = date(e2.event_start) with a check for specific dates.
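A possible shape for both suggestions, as a sketch only: the index name is hypothetical, and the date literals are the day used in the question.

```sql
-- Hypothetical index name; the columns are the ones suggested above.
CREATE INDEX idx_events_director_start ON events (director_id, event_start);

SELECT e1.*
FROM events AS e1
LEFT JOIN events AS e2
       ON e2.director_id = e1.director_id
      AND e2.event_start <  e1.event_start            -- an earlier event ...
      AND e2.event_start >= '2015-04-27 00:00:00'     -- ... on the same day
WHERE e1.event_start >= '2015-04-27 00:00:00'
  AND e1.event_start <  '2015-04-28 00:00:00'
  AND e2.director_id IS NULL;                         -- no earlier event found
```

Comparing event_start against plain range bounds, rather than wrapping it in date(), is what gives the index a chance to be used.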
You can give this a try:
SELECT
e1.*
FROM events AS e1
INNER JOIN ( SELECT director_id, MIN(event_start) AS `eventStart`
FROM `events`
WHERE event_start >= '2015-04-27 00:00:00'
AND event_start < '2015-04-28 00:00:00'
GROUP BY director_id ) AS e2
ON e1.director_id = e2.director_id
AND e1.event_start = e2.eventStart;

MySQL DATETIME sort with virtual datetable

I have a table of events, each with a timestamp stored as DATETIME.
Now I want some statistics on my data, e.g. how many events there are per day. I don't have events on every day, and of course the days with no entries don't show up in my statistics.
| id | DATE | COUNT |
| 1 | 2014-09-06 | 1 |
| 2 | 2014-09-07 | 8 |
| 3 | 2014-09-10 | 2 |
| 4 | 2014-09-14 | 78 |
So I wrote a little script that generates a query to solve that problem. It builds a virtual table with the days I am interested in and does a LEFT OUTER JOIN with my events table.
That way I get all dates without gaps. The query looks like this, for example:
SELECT DATE_FORMAT(d.date, '%d.%m.%Y') as datum, COUNT(l.id) as anzahl
FROM
(
SELECT STR_TO_DATE('25.11.2014', '%d.%m.%Y') as date UNION ALL
SELECT STR_TO_DATE('26.11.2014', '%d.%m.%Y') as date UNION ALL
SELECT STR_TO_DATE('27.11.2014', '%d.%m.%Y') as date UNION ALL
SELECT STR_TO_DATE('28.11.2014', '%d.%m.%Y') as date UNION ALL
SELECT STR_TO_DATE('29.11.2014', '%d.%m.%Y') as date UNION ALL
SELECT STR_TO_DATE('30.11.2014', '%d.%m.%Y') as date UNION ALL
SELECT STR_TO_DATE('01.12.2014', '%d.%m.%Y') as date
) as d
LEFT OUTER JOIN events l ON d.date = DATE(l.date)
GROUP BY datum
ORDER BY datum DESC
This query works perfectly, and dates with no data now appear in my statistics.
But now comes the real problem: the sorting doesn't work. I get some weird output and have no idea what the problem is. The output looks like this:
| DATE | COUNT |
| 31.10.2014 | 0 |
| 30.11.2014 | 5 |
| 30.10.2014 | 0 |
| 29.11.2014 | 0 |
| 29.10.2014 | 0 |
| 28.11.2014 | 0 |
| 28.10.2014 | 0 |
| 27.11.2014 | 0 |
| 27.10.2014 | 0 |
| 26.11.2014 | 0 |
| 26.10.2014 | 0 |
| 25.11.2014 | 1 |
| 25.10.2014 | 0 |
| 24.11.2014 | 1 |
| 24.10.2014 | 0 |
| 23.11.2014 | 0 |
| 23.10.2014 | 0 |
| 22.11.2014 | 0 |
| 22.10.2014 | 0 |
| 21.11.2014 | 1 |
| 21.10.2014 | 0 |
| 20.11.2014 | 0 |
| 20.10.2014 | 0 |
| 19.11.2014 | 2 |
| 19.10.2014 | 0 |
| 18.11.2014 | 0 |
| 18.10.2014 | 0 |
| 17.11.2014 | 0 |
| 17.10.2014 | 0 |
| 16.11.2014 | 0 |
So what's wrong with my query? I deliberately used the function STR_TO_DATE to get a "real" date format. Normally the sorting should work with that, shouldn't it?
The problem is that your eyes see them as dates, but in your query they are strings, so they are correctly sorted as strings, with the day first and then the month. Try adding another field formatted for sorting and use that for the sort.
SELECT DATE_FORMAT(d.date, '%d.%m.%Y') as datum, DATE_FORMAT(d.date, '%Y.%m.%d') as sortdatum, COUNT(l.id) as anzahl
FROM
(
SELECT STR_TO_DATE('25.11.2014', '%d.%m.%Y') as date UNION ALL
SELECT STR_TO_DATE('26.11.2014', '%d.%m.%Y') as date UNION ALL
SELECT STR_TO_DATE('27.11.2014', '%d.%m.%Y') as date UNION ALL
SELECT STR_TO_DATE('28.11.2014', '%d.%m.%Y') as date UNION ALL
SELECT STR_TO_DATE('29.11.2014', '%d.%m.%Y') as date UNION ALL
SELECT STR_TO_DATE('30.11.2014', '%d.%m.%Y') as date UNION ALL
SELECT STR_TO_DATE('01.12.2014', '%d.%m.%Y') as date
) as d
LEFT OUTER JOIN events l ON d.date = DATE(l.date)
GROUP BY datum, sortdatum
ORDER BY sortdatum DESC
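Since d.date is still a real DATE inside the derived table, another option is to group and sort on that column and format it only in the select list. A shortened sketch with three of the generated days:

```sql
SELECT DATE_FORMAT(d.date, '%d.%m.%Y') AS datum, COUNT(l.id) AS anzahl
FROM (
    SELECT STR_TO_DATE('29.11.2014', '%d.%m.%Y') AS date UNION ALL
    SELECT STR_TO_DATE('30.11.2014', '%d.%m.%Y') UNION ALL
    SELECT STR_TO_DATE('01.12.2014', '%d.%m.%Y')
) AS d
LEFT OUTER JOIN events AS l ON d.date = DATE(l.date)
-- group and sort on the DATE value, not on the formatted string
GROUP BY d.date
ORDER BY d.date DESC;
```

This sorts 01.12.2014 ahead of 30.11.2014, because the comparison is done on dates rather than on strings.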
Please try it with unix_timestamp:
SELECT DATE_FORMAT(d.date, '%d.%m.%Y') as datum, COUNT(l.id) as anzahl
FROM
(
SELECT STR_TO_DATE('25.11.2014', '%d.%m.%Y') as date UNION ALL
SELECT STR_TO_DATE('26.11.2014', '%d.%m.%Y') as date UNION ALL
SELECT STR_TO_DATE('27.11.2014', '%d.%m.%Y') as date UNION ALL
SELECT STR_TO_DATE('28.11.2014', '%d.%m.%Y') as date UNION ALL
SELECT STR_TO_DATE('29.11.2014', '%d.%m.%Y') as date UNION ALL
SELECT STR_TO_DATE('30.11.2014', '%d.%m.%Y') as date UNION ALL
SELECT STR_TO_DATE('01.12.2014', '%d.%m.%Y') as date
) as d
LEFT OUTER JOIN events l ON d.date = DATE(l.date)
GROUP BY d.date
ORDER BY unix_timestamp(d.date) DESC

How can I fetch last 30 days left joined with values from my own table?

In my Symfony2/Doctrine2 application, I have an entity (and a corresponding database table) that records whether each user has performed a specific action on a given day.
My table looks like this; let's call it track_user_action:
+---------+------------+
| user_id | date |
+---------+------------+
| 1 | 2013-09-19 |
| 2 | 2013-09-19 |
| 1 | 2013-09-18 |
| 5 | 2013-09-18 |
| 8 | 2013-09-17 |
| 5 | 2013-09-17 |
+---------+------------+
I would like to retrieve a set of rows, where it shows the last 30 days, the corresponding weekday and if the specified user has an entry in this table, e.g. for user with user_id = 1:
+------------+--------------+-----------------+
| date | weekday | has_done_action |
+------------+--------------+-----------------+
| 2013-09-20 | Friday | false |
| 2013-09-19 | Thursday | true |
| 2013-09-18 | Wednesday | true |
| ... | | |
| 2013-08-20 | Tuesday | false |
+------------+--------------+-----------------+
I could imagine a LEFT JOIN of a date table and my track_user_action table. But it seems pointless to create a special table just for the dates. MySQL should be able to generate the days itself, shouldn't it?
Approach:
SELECT
# somehow retrieve last 30 days
date AS date,
DAYNAME(date) AS weekday,
IF ... THEN has_done_action = true ELSE has_done_action = false
# and according weekdays
LEFT JOIN track_user_action AS t
ON t.date = # date field from above
WHERE t.user_id = 1
ORDER BY # date field from above
DESC
LIMIT 0,30
My questions:
What would be a good (My)SQL query that fetches this kind of result?
To what extent can this query be implemented in Doctrine2 (I know for a fact that Doctrine2 doesn't support all MySQL functions, e.g. YEAR() or MONTH())?
This is a working query for seven days (adapt it for 30 days accordingly):
SELECT
d.date AS date,
DAYNAME(d.date) AS weekday,
IF(t.user_id IS NOT NULL, 'true', 'false') AS has_done_action
FROM (
SELECT SUBDATE(CURDATE(), 1) AS date UNION
SELECT SUBDATE(CURDATE(), 2) AS date UNION
SELECT SUBDATE(CURDATE(), 3) AS date UNION
SELECT SUBDATE(CURDATE(), 4) AS date UNION
SELECT SUBDATE(CURDATE(), 5) AS date UNION
SELECT SUBDATE(CURDATE(), 6) AS date UNION
SELECT SUBDATE(CURDATE(), 7) AS date
) AS d
LEFT JOIN track_user_action t
ON t.date = d.date
AND t.user_id = 1
ORDER BY d.date DESC
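For the full 30 days, writing out thirty UNION branches gets tedious. On MySQL 8.0 or later a recursive CTE can generate the days instead. This is a sketch under assumptions: the version requirement, the choice to start at today, and the hard-coded user_id = 1 from the question are mine, and it assumes at most one row per user and day in track_user_action.

```sql
-- Assumes MySQL 8.0+ (recursive CTEs). Produces today and the 29 days before it.
WITH RECURSIVE days (the_date) AS (
    SELECT CURDATE()
    UNION ALL
    SELECT the_date - INTERVAL 1 DAY
    FROM days
    WHERE the_date > CURDATE() - INTERVAL 29 DAY
)
SELECT d.the_date AS date,
       DAYNAME(d.the_date) AS weekday,
       IF(t.user_id IS NOT NULL, 'true', 'false') AS has_done_action
FROM days AS d
LEFT JOIN track_user_action AS t
       ON t.date = d.the_date
      AND t.user_id = 1          -- user filter in ON keeps the days without an entry
ORDER BY d.the_date DESC;
```

Putting the user filter in a WHERE clause instead would drop every day on which that user has no entry, which defeats the LEFT JOIN.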

Mysql join query

I'm using two tables in the database. These tables look like this:
Table A:
id | date
----------------------
12001 | 2011-01-01
13567 | 2011-01-04
13567 | 2011-01-04
11546 | 2011-01-07
13567 | 2011-01-07
18000 | 2011-01-08
Table B:
user | date | amount
----------------------------------
15467 | 2011-01-04 | 140
14568 | 2011-01-04 | 120
14563 | 2011-01-05 | 140
12341 | 2011-01-07 | 140
18000 | 2011-01-08 | 120
I need a query that will join these two tables.
The first query should return the total number of users from table A grouped by date, and the number of unique users from table A grouped by date. That query looks like:
SELECT COUNT(DISTINCT id) AS uniq, COUNT(*) AS total, DATE_FORMAT(date, '%Y-%m-%d') as date FROM A GROUP BY date
From the second table I need the sum of the amounts grouped by dates.
That query looks like:
SELECT SUM(amount) AS total_amount FROM B GROUP BY DATE_FORMAT( date, '%Y-%m-%d' )
What I want to do is merge these two queries into one on the "date" column, so that I get the following list:
date | unique | total | amount
-----------------------------------------------
2011-01-01 | 1 | 1 | 0
2011-01-04 | 1 | 2 | 260
2011-01-05 | 0 | 0 | 140
2011-01-07 | 2 | 2 | 140
2011-01-08 | 1 | 1 | 120
How can I do that using one query?
Thanks for all suggestions.
select date_format(a.date, '%Y-%m-%d') as date, a.uniq, a.total, ifnull(b.amount, 0) as amount
from (
select count(distinct id) as uniq, count(*) as total, date
from tablea
group by date
) a
left join (
select sum(amount) as amount, date
from tableb
group by date
) b on a.date = b.date
order by a.date
I assume that the date field is a datetime type. It's better to format output fields in the final result set (the date field in this case).
Your queries are fine; all they need is a join.
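One thing to watch: the desired list contains 2011-01-05, which exists only in table B, so a left join that starts from table A will not return that row. MySQL has no FULL OUTER JOIN, but you can start from the union of all dates and left join both summaries onto it. A sketch, assuming the date columns hold plain dates and using the table names tablea and tableb from the query above:

```sql
SELECT d.date,
       IFNULL(a.uniq, 0)   AS uniq,
       IFNULL(a.total, 0)  AS total,
       IFNULL(b.amount, 0) AS amount
FROM (
    -- every date that appears in either table (UNION removes duplicates)
    SELECT date FROM tablea
    UNION
    SELECT date FROM tableb
) AS d
LEFT JOIN (
    SELECT date, COUNT(DISTINCT id) AS uniq, COUNT(*) AS total
    FROM tablea
    GROUP BY date
) AS a ON a.date = d.date
LEFT JOIN (
    SELECT date, SUM(amount) AS amount
    FROM tableb
    GROUP BY date
) AS b ON b.date = d.date
ORDER BY d.date;
```

With the sample rows from the question this yields the five dates of the desired list, including 2011-01-05 with 0, 0 and 140.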