Select all dates which are not in one of the date ranges - mysql

I have a table of time periods (date ranges). These ranges can overlap, and a range can also be a subrange of another record.
+----+------------+------------+
| id | start_date | end_date   |
+----+------------+------------+
| 1  | 2019-01-01 | 2019-01-31 |
| 2  | 2019-02-01 | 2019-02-28 |
| 3  | 2019-04-01 | 2019-04-30 |
+----+------------+------------+
Then I have a table with invoices with invoice date and invoice number:
+----+--------------+------------+
| id | invoice_date | invoice_no |
+----+--------------+------------+
| 1  | 2019-01-14   | 4534534BG  |
| 2  | 2019-03-01   | 678678AAA  |
| 3  | 2019-04-13   | 123123DDD  |
+----+--------------+------------+
I'm looking for all invoices that do not fall within any of the date periods.
The goal in this small example would be to find the invoice from March: invoice_no: 678678AAA
My Approach
SELECT *
FROM `invoice`
WHERE (invoice_date BETWEEN '2019-01-01' AND '2019-01-31')
With this approach I would have to mark every invoice the query returns as "found" and then repeat the query for each of the other ranges (until all invoices or periods have been processed).
That would be a lot of queries, because there are many invoices and many time periods. I would like to avoid that.
Is there a way to feed the start and end dates into the BETWEEN from a SELECT?

To list the invoices that do not belong to any of the date ranges defined in the other table, you could use a not exists condition:
select i.*
from invoices i
where not exists (
select 1
from periods p
where i.invoice_date >= p.start_date and i.invoice_date <= p.end_date
)
Another typical solution is the left join anti-join pattern, i.e.:
select i.*
from invoices i
left join periods p
on i.invoice_date >= p.start_date and i.invoice_date <= p.end_date
where p.id is null
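As a quick check of the not exists approach, here is a minimal sketch that runs it against the sample data. It uses Python's built-in sqlite3 module as a stand-in for MySQL (an assumption made only so the example is self-contained), and it assumes the end dates of periods 2 and 3 are meant to be in 2019. The function name is hypothetical.

```python
import sqlite3


def invoices_outside_periods(conn):
    """Return invoices whose date falls inside none of the periods."""
    return conn.execute(
        """
        SELECT i.id, i.invoice_date, i.invoice_no
        FROM invoices AS i
        WHERE NOT EXISTS (
            SELECT 1
            FROM periods AS p
            WHERE i.invoice_date >= p.start_date
              AND i.invoice_date <= p.end_date
        )
        ORDER BY i.id
        """
    ).fetchall()


# In-memory database with the sample rows from the question.
# Assumption: the end dates of periods 2 and 3 are 2019, not 2010.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE periods (id INTEGER PRIMARY KEY, start_date TEXT, end_date TEXT);
    CREATE TABLE invoices (id INTEGER PRIMARY KEY, invoice_date TEXT, invoice_no TEXT);
    INSERT INTO periods VALUES
        (1, '2019-01-01', '2019-01-31'),
        (2, '2019-02-01', '2019-02-28'),
        (3, '2019-04-01', '2019-04-30');
    INSERT INTO invoices VALUES
        (1, '2019-01-14', '4534534BG'),
        (2, '2019-03-01', '678678AAA'),
        (3, '2019-04-13', '123123DDD');
    """
)

if __name__ == "__main__":
    print(invoices_outside_periods(conn))
```

Only the March invoice (678678AAA) comes back, because it is the only one with no matching period. Overlapping or nested periods do not change the result, since not exists only asks whether at least one period matches.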

Related

MySQL - Request to count periodically with accumulation

Sorry if this is a duplicate but I never found an answer to this.
I have a User table which is as follows :
| id | pseudo | inscription date |
|----|-------------|------------------|
| 1 | johndoe | 01/01/1970 |
| 2 | janeyes | 02/01/1970 |
| 3 | thirdpseudo | 05/01/1970 |
I am looking for a query that produces cumulative statistics: day by day, the total number of users registered so far.
I wrote a query that returns counts only for the days on which someone registered, but I can't work out how to accumulate the counts across every day...
SELECT DATE_FORMAT(date, "%d/%m/%Y") AS 'Day', COUNT(*) AS 'Number of registered users'
FROM User
GROUP BY DATE(date)
ORDER BY date DESC;
This query outputs :
| date | number of registered users |
| ---------- | -------------------------- |
| 01/01/1970 | 1 |
| 02/01/1970 | 1 |
| 05/01/1970 | 1 |
The output I would like for this example is :
| date | number of registered users |
| ---------- | -------------------------- |
| 01/01/1970 | 1 |
| 02/01/1970 | 2 |
| 03/01/1970 | 2 |
| 04/01/1970 | 2 |
| 05/01/1970 | 3 |
| 06/01/1970 | 3 |
I would suggest generating a range of dates, left joining the users to those dates, and counting how many users registered on each day. A running sum over those daily counts then gives the cumulative total.
Here is the code:
-- creating simple table
create table Users
(
id int not null,
pseudo varchar(15),
date date
);
-- adding some data
insert into Users
values
(1,'jonh','1970-01-01'),
(2,'doe','1970-01-02'),
(3,'janeyes','1970-01-02'),
(4,'third','1970-01-03'),
(5,'pseudo','1970-01-03'),
(6,'title','1970-01-04'),
(7,'somename','1970-01-04'),
(8,'anothername','1970-01-04');
-- defines the start date and the end date (MySQL user variables start with @)
set @startDate = '1970-01-01';
set @endDate = '1970-02-01';
-- recursively generates all dates within the range
with RECURSIVE dateRange (Date) as
(
select cast(@startDate as date) as Date
union ALL
select DATE_ADD(Date, INTERVAL 1 DAY)
from dateRange
where Date < @endDate
)
-- SUM() over (order by Date) produces a running total:
-- each day's count plus the counts of all previous days
select Date, Sum(RegisteredUsersCount) over(order by Date asc
rows between unbounded preceding and current row) as RegisteredUsersCount
from
(
-- left join will join all users, if there is no users that correspond to the date of join, then it would be 0 for that date.
select dr.Date, Count(u.id) as RegisteredUsersCount
from dateRange as dr
left join Users as u
on dr.Date = u.date
group by dr.Date
) as t
order by Date asc;
And working example to test: SQLize Online
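To see the running total on the three users from the question, here is a small sketch of the same idea (recursive date range, left join, windowed sum ordered by date). It runs on SQLite through Python's sqlite3 module rather than MySQL, so the date arithmetic uses SQLite's date() function instead of DATE_ADD. The column name reg_date and the function name are assumptions, and window functions require SQLite 3.25 or later.

```python
import sqlite3


def cumulative_registrations(conn, start, end):
    """Return (day, users registered so far) for every day in [start, end]."""
    return conn.execute(
        """
        WITH RECURSIVE date_range(d) AS (
            SELECT ?
            UNION ALL
            SELECT date(d, '+1 day') FROM date_range WHERE d < ?
        )
        SELECT d,
               SUM(cnt) OVER (ORDER BY d
                              ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS total
        FROM (
            -- days without registrations get a count of 0 thanks to the LEFT JOIN
            SELECT dr.d AS d, COUNT(u.id) AS cnt
            FROM date_range AS dr
            LEFT JOIN users AS u ON u.reg_date = dr.d
            GROUP BY dr.d
        ) AS t
        ORDER BY d
        """,
        (start, end),
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE users (id INTEGER PRIMARY KEY, pseudo TEXT, reg_date TEXT);
    INSERT INTO users VALUES
        (1, 'johndoe', '1970-01-01'),
        (2, 'janeyes', '1970-01-02'),
        (3, 'thirdpseudo', '1970-01-05');
    """
)

if __name__ == "__main__":
    for row in cumulative_registrations(conn, "1970-01-01", "1970-01-06"):
        print(row)
```

The window must be ordered by the date. Ordering it by the daily count would add the days up in the wrong sequence and the totals would no longer be cumulative over time.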

get the most ordered package using two database tables

I want to get the count of all packages ordered for the whole week and get the package_id of the one with the highest frequency and also has the status='active' in my package table
these are my database tables
sales
+------------+------------------+
| package_id | datesales |
+------------+------------------+
| 1 | timestamp |
| 2 | timestamp |
| 1 | timestamp |
| 1 | timestamp |
| 2 | timestamp |
| 2 | timestamp |
| 3 | timestamp |
+------------+------------------+
packages
+------------+------------------+
| package_id | status |
+------------+------------------+
| 1 | inactive |
| 2 | active |
| 3 | active |
+------------+------------------+
I tried using this sql but I'm not really good with aggregation
SELECT count(product_id) as product_id from i.sales
where [i dunno how to put the sql for package table here]
i.date(datesales) <= curdate() and
i.date(datesales) >= curdate() - interval 6 day
group by product_id
with the above example in sales table, since I have 3 counts of package_id=1 and also 3 counts of package_id=2,
I want to get the id for package_id=2 since it is the highest frequency of orders and it has the status='active' in my package table
I think you basically want order by and limit and join:
select package_id, count(*) as cnt
from sales i join
packages p
using (package_id)
where -- i.date(i.datesales) <= curdate() and -- I doubt you have future start dates
i.datesales >= curdate() - interval 6 day and
p.status = 'active'
group by package_id
order by count(*) desc
limit 1;
Here is a db<>fiddle.
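A minimal sketch of the join, filter, group, order and limit steps on the sample tables follows. It uses SQLite through Python's sqlite3 module instead of MySQL. Because the question only shows "timestamp" placeholders, the sale dates and the reference date passed as today are made up for illustration, and the function name is hypothetical.

```python
import sqlite3


def most_ordered_active_package(conn, today):
    """Return (package_id, order_count) of the active package sold most often
    in the 7 days ending on `today`, or None if there is no such sale."""
    return conn.execute(
        """
        SELECT s.package_id, COUNT(*) AS cnt
        FROM sales AS s
        JOIN packages AS p ON p.package_id = s.package_id
        WHERE s.datesales >= date(?, '-6 days')
          AND p.status = 'active'
        GROUP BY s.package_id
        ORDER BY cnt DESC, s.package_id   -- package_id breaks ties deterministically
        LIMIT 1
        """,
        (today,),
    ).fetchone()


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE sales (package_id INTEGER, datesales TEXT);
    CREATE TABLE packages (package_id INTEGER PRIMARY KEY, status TEXT);
    -- hypothetical timestamps; the question does not give real ones
    INSERT INTO sales VALUES
        (1, '2024-01-05 09:00:00'),
        (2, '2024-01-05 10:00:00'),
        (1, '2024-01-06 09:00:00'),
        (1, '2024-01-07 09:00:00'),
        (2, '2024-01-08 09:00:00'),
        (2, '2024-01-09 09:00:00'),
        (3, '2024-01-10 09:00:00');
    INSERT INTO packages VALUES (1, 'inactive'), (2, 'active'), (3, 'active');
    """
)

if __name__ == "__main__":
    print(most_ordered_active_package(conn, "2024-01-10"))
```

Package 1 also has three sales, but the status filter removes it before the counts are compared, so package 2 wins.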

MySQL: Get the minimum record for a user on a given day

I have a table of events, each with someone in charge. There may be multiple events per day, but I need a query that returns the first event for each user on a given day.
For example, if I have the following table of events:
+----------+-------------+---------------------+
| event_id | director_id | event_start |
+----------+-------------+---------------------+
| 1 | 111 | 2015-04-27 10:00:00 |
+----------+-------------+---------------------+
| 2 | 222 | 2015-04-27 11:00:00 |
+----------+-------------+---------------------+
| 3 | 333 | 2015-04-27 12:00:00 |
+----------+-------------+---------------------+
| 4 | 111 | 2015-04-27 13:00:00 |
+----------+-------------+---------------------+
| 5 | 222 | 2015-04-27 09:00:00 |
+----------+-------------+---------------------+
I would like the following returned:
+----------+-------------+---------------------+
| event_id | director_id | event_start |
+----------+-------------+---------------------+
| 1 | 111 | 2015-04-27 10:00:00 |
+----------+-------------+---------------------+
| 5 | 222 | 2015-04-27 09:00:00 |
+----------+-------------+---------------------+
| 3 | 333 | 2015-04-27 12:00:00 |
+----------+-------------+---------------------+
I thought a query like the following would have worked, but it turns out that MySQL does not support MIN in the WHERE clause (simple SQL query giving Invalid use of group function):
SELECT
event_id, director_id, MIN(event_start) AS event_start
FROM events
WHERE MIN(event_start) >= '2015-04-27 00:00:00'
AND MIN(event_start) < '2015-04-28 00:00:00'
GROUP BY director_id;
How can I do this in the most efficient way possible? My events table may easily have 10,000-100,000 records.
You can get the minimum event time on each day with a query similar to yours:
SELECT director_id, date(event_start) as dte, MIN(event_start) AS event_start
FROM events e
GROUP BY director_id, date(event_start);
You can then use this as a subquery to get all other information from the row:
select e.*
from events e join
(SELECT e.director_id, date(e.event_start) as dte, MIN(e.event_start) AS event_start
FROM events e
GROUP BY e.director_id, date(e.event_start)
) ee
on e.director_id = ee.director_id and
e.event_start = ee.event_start; -- note, this has both the date and time
If you want to restrict the results to a single day, you can put the where clause in the subquery.
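Here is a small sketch of that variant, with the day filter inside the subquery and the join made on both the director and the start time. It runs on SQLite through Python's sqlite3 module rather than MySQL. Event 6 (the day before) is an extra row that is not in the question's table; it is added only to show that earlier days do not interfere. The function name is hypothetical.

```python
import sqlite3


def first_event_per_director(conn, day_start, day_end):
    """Return each director's earliest event with day_start <= event_start < day_end."""
    return conn.execute(
        """
        SELECT e.event_id, e.director_id, e.event_start
        FROM events AS e
        JOIN (
            -- restrict to the day first, then take the minimum per director
            SELECT director_id, MIN(event_start) AS first_start
            FROM events
            WHERE event_start >= ? AND event_start < ?
            GROUP BY director_id
        ) AS f
          ON f.director_id = e.director_id
         AND f.first_start = e.event_start
        ORDER BY e.director_id
        """,
        (day_start, day_end),
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE events (event_id INTEGER PRIMARY KEY, director_id INTEGER, event_start TEXT);
    INSERT INTO events VALUES
        (1, 111, '2015-04-27 10:00:00'),
        (2, 222, '2015-04-27 11:00:00'),
        (3, 333, '2015-04-27 12:00:00'),
        (4, 111, '2015-04-27 13:00:00'),
        (5, 222, '2015-04-27 09:00:00'),
        (6, 111, '2015-04-26 08:00:00');  -- hypothetical row from the previous day
    CREATE INDEX idx_events_director_start ON events (director_id, event_start);
    """
)

if __name__ == "__main__":
    for row in first_event_per_director(conn, "2015-04-27 00:00:00", "2015-04-28 00:00:00"):
        print(row)
```

Filtering before the MIN matters: if the minimum were taken over all days and filtered afterwards, director 111 would disappear from the result, because that director's overall earliest event is on the previous day.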
You can't use aggregate functions in the where clause of a query. One way to do what you want is to use a left join like so:
select e1.*
from events e1
left join events e2
on e1.director_id = e2.director_id
and e1.event_start > e2.event_start
and date(e1.event_start) = date(e2.event_start)
where e2.director_id is null
fiddle here
Performance is likely to improve if you have an index on (director_id, event_start).
You can also further limit the result size by changing and date(e1.event_start) = date(e2.event_start) to check for specific dates.
You can give this a try:
SELECT
e1.*
FROM events AS e1
INNER JOIN ( SELECT director_id, MIN(event_start) AS `eventStart`
FROM `events`
-- filter the day before taking MIN, so earlier days cannot hide that day's first event
WHERE event_start >= '2015-04-27 00:00:00'
AND event_start < '2015-04-28 00:00:00'
GROUP BY director_id ) AS e2
ON e1.director_id = e2.director_id
AND e1.event_start = e2.eventStart;
Here is the sqlfiddle.

MySQL report -- fill in empty dates

I am building a query to return daily sales data. My current query returns a table similar to this:
----------------------------------
| DATE | SKU | TOTAL |
----------------------------------
| 2014-11-01 | AV155_A | 209.00 |
| 2014-11-02 | AV155_B | 627.00 |
| 2014-11-04 | AV155_C | 279.00 |
| 2014-11-05 | AV155 | 279.00 |
| 2014-11-08 | AV1556_A | 209.00 |
| 2014-11-09 | AV1556_B | 627.00 |
| 2014-11-10 | AV1556_C | 279.00 |
| 2014-11-12 | AV1556 | 279.00 |
What I would like is a results table that displays every day, even if there are no data points for that particular day. Something like this:
----------------------------------
| DATE | SKU | TOTAL |
----------------------------------
| 2014-11-01 | AV155_A | 209.00 |
| 2014-11-02 | AV155_B | 627.00 |
| 2014-11-03 | | 0 |
| 2014-11-04 | AV155_C | 279.00 |
| 2014-11-05 | AV155 | 279.00 |
| 2014-11-06 | | 0 |
| 2014-11-07 | | 0 |
| 2014-11-08 | AV1556_A | 209.00 |
| 2014-11-09 | AV1556_B | 627.00 |
| 2014-11-10 | AV1556_C | 279.00 |
| 2014-11-11 | | 0 |
| 2014-11-12 | AV1556 | 279.00 |
The query I currently have looks like this:
select
DATE_FORMAT(created_on, '%m-%d-%Y') as date,
sku,
SUM(price) as total
FROM order_items
WHERE created_on between FROM_UNIXTIME(1415577600) AND NOW()
GROUP BY MONTH(created_on), DAY(v.created_on), order_item_sku;
You need to use an outer join. The easiest way is if you have a calendar table, but you can make one on the fly:
select c.thedate, oi.sku, coalesce(sum(oi.price), 0) as total
from (select date('2014-11-01') as thedate union all
select date('2014-11-02') union all
select date('2014-11-03') union all
select date('2014-11-04') union all
select date('2014-11-05') union all
select date('2014-11-06') union all
select date('2014-11-07') union all
select date('2014-11-08') union all
select date('2014-11-09') union all
select date('2014-11-10') union all
select date('2014-11-11') union all
select date('2014-11-12')
) c left join
order_items oi
on c.thedate = date(oi.created_on) and
-- the range filter lives in the ON clause; in WHERE it would drop the empty days
oi.created_on between FROM_UNIXTIME(1415577600) AND NOW()
group by c.thedate, oi.sku
Here's an answer that addresses the need for a flexible list of dates. You need to figure out a way to get a virtual table containing all the dates in the appropriate range, and then join them to the summary. Here’s a query that will get the dates in the range.
SELECT mintime + INTERVAL seq.seq DAY AS reportdate
FROM (
SELECT MIN(DATE(created_on)) AS mintime,
MAX(DATE(created_on)) AS maxtime
FROM order_items
WHERE created_on >= starting_time
AND created_on <= NOW()
) AS order_items
JOIN seq_0_to_999 AS seq
ON seq.seq <= TIMESTAMPDIFF(DAY,mintime,maxtime)
What’s going on here? Three things.
We have a subquery which determines the first and last day (min and max created_on) we care about reporting.
We apply a time range to that query. I like to avoid using BETWEEN for timestamp ranges because it often gets the ending time wrong in an off-by-one-second error.
We have a table called seq_0_to_999. It contains a sequence of a thousand cardinal numbers: the integers starting at zero. More about this in a moment.
Then, you can join that as a subquery to your aggregate query to get all the dates in the range listed, like so.
select DATE_FORMAT(d.reportdate, '%m-%d-%Y') as date,
sku,
SUM(price) as total
FROM (
SELECT mintime + INTERVAL seq.seq DAY AS reportdate
FROM (
SELECT MIN(DATE(created_on)) AS mintime,
MAX(DATE(created_on)) AS maxtime
FROM order_items
WHERE created_on >= starting_time
AND created_on <= NOW()
) AS order_items
JOIN seq_0_to_999 AS seq
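ON seq.seq <= TIMESTAMPDIFF(DAY,mintime,maxtime)
) AS d
LEFT JOIN order_items ON d.reportdate = DATE(order_items.created_on)
-- the range filter belongs in the ON clause; in WHERE it would discard the empty days
AND order_items.created_on >= starting_time
AND order_items.created_on <= NOW()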
GROUP BY d.reportdate, sku
ORDER BY d.reportdate, sku
It looks like a big nasty hairball of a query. But if you think of it as a sandwich made of various layers of queries, it really isn't that complicated.
It uses LEFT JOIN so it makes sure all the dates in the range are preserved even if there's no corresponding data in your order_items table.
Finally, what about this seq_0_to_999 table? Where do we get those integers starting with zero? The answer is this: we have to arrange to do that; those numbers aren’t built in to MySQL. (They are built into the MySQL fork called MariaDB.) Create a short table with the integers from 0-9 in it, like so:
DROP TABLE IF EXISTS seq_0_to_9;
CREATE TABLE seq_0_to_9 AS
SELECT 0 AS seq UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4
UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9;
Then create a view that joins that table with itself to generate 1000 combinations like this:
DROP VIEW IF EXISTS seq_0_to_999;
CREATE VIEW seq_0_to_999 AS (
SELECT (a.seq + 10 * (b.seq + 10 * c.seq)) AS seq
FROM seq_0_to_9 a
JOIN seq_0_to_9 b
JOIN seq_0_to_9 c
);
I wrote this up in some detail at http://www.plumislandmedia.net/mysql/filling-missing-data-sequences-cardinal-integers/
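The pieces above (a digits table, a sequence view built from it, a derived list of report dates, and a LEFT JOIN to the data) can be tried end to end with the following sketch. It uses SQLite through Python's sqlite3 module instead of MySQL, so date(x, '+N days') and julianday() replace INTERVAL arithmetic and TIMESTAMPDIFF. The sequence is limited to 0-99 to keep it short, the time-of-day values in created_on are made up, and the function name is hypothetical.

```python
import sqlite3


def daily_totals(conn):
    """Return (date, sku, total) for every day between the first and last order,
    with a 0 total on days that have no orders."""
    return conn.execute(
        """
        SELECT d.reportdate, oi.sku, COALESCE(SUM(oi.price), 0) AS total
        FROM (
            SELECT date(r.mintime, '+' || s.seq || ' days') AS reportdate
            FROM (
                SELECT MIN(date(created_on)) AS mintime,
                       MAX(date(created_on)) AS maxtime
                FROM order_items
            ) AS r
            JOIN seq_0_to_99 AS s
              -- <= so that the last day of the range is included
              ON s.seq <= julianday(r.maxtime) - julianday(r.mintime)
        ) AS d
        LEFT JOIN order_items AS oi ON date(oi.created_on) = d.reportdate
        GROUP BY d.reportdate, oi.sku
        ORDER BY d.reportdate, oi.sku
        """
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE seq_0_to_9 (seq INTEGER)")
conn.executemany("INSERT INTO seq_0_to_9 VALUES (?)", [(n,) for n in range(10)])
conn.executescript(
    """
    CREATE VIEW seq_0_to_99 AS
        SELECT a.seq + 10 * b.seq AS seq
        FROM seq_0_to_9 AS a CROSS JOIN seq_0_to_9 AS b;
    CREATE TABLE order_items (created_on TEXT, sku TEXT, price REAL);
    INSERT INTO order_items VALUES
        ('2014-11-01 10:00:00', 'AV155_A', 209.00),
        ('2014-11-02 10:00:00', 'AV155_B', 627.00),
        ('2014-11-04 10:00:00', 'AV155_C', 279.00),
        ('2014-11-05 10:00:00', 'AV155', 279.00),
        ('2014-11-08 10:00:00', 'AV1556_A', 209.00),
        ('2014-11-09 10:00:00', 'AV1556_B', 627.00),
        ('2014-11-10 10:00:00', 'AV1556_C', 279.00),
        ('2014-11-12 10:00:00', 'AV1556', 279.00);
    """
)

if __name__ == "__main__":
    for row in daily_totals(conn):
        print(row)
```

With the sample rows this yields twelve lines, one per day from 2014-11-01 to 2014-11-12. The four days without orders appear with an empty SKU and a total of 0.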

MySQL query based on time range, group users, and sum values over a sliding window

I want to create a new Table B based on the information from another existing Table A. I'm wondering if MySQL has the functionality to take into account a range of time and group column A values then only sum up the values in a column B based on those groups in column A.
Table A stores logs of events like a journal for users. There can be multiple events from a single user in a single day. Say hypothetically I'm keeping track of when my users eat fruit and I want to know how many fruit they eat in a week (7days) and also how many apples they eat.
So in Table B I want to count for each entry in Table A, the previous 7 day total # of fruit and apples.
EDIT:
I'm sorry, I oversimplified the information I gave and didn't think my example through.
I initially have only Table A. I'm trying to create Table B from a query.
Assume:
User/id can log an entry multiple times in a day.
sum counts should be for id between date and date - 7 days
fruit column stands for the total # of fruit during the 7 day interval ( apples and bananas are both fruit)
The data doesn't only start at 2013-9-5. It can date back to 2000, and I want to apply the 7-day sliding window over all the dates from 2000 to 2013.
The sum count is over a sliding window of 7 days
Here's an example:
Table A:
| id | date-time | apples | banana |
---------------------------------------------
| 1 | 2013-9-5 08:00:00 | 1 | 1 |
| 2 | 2013-9-5 09:00:00 | 1 | 0 |
| 1 | 2013-9-5 16:00:00 | 1 | 0 |
| 1 | 2013-9-6 08:00:00 | 0 | 1 |
| 2 | 2013-9-9 08:00:00 | 1 | 1 |
| 1 | 2013-9-11 08:00:00 | 0 | 1 |
| 1 | 2013-9-12 08:00:00 | 0 | 1 |
| 2 | 2013-9-13 08:00:00 | 1 | 1 |
note: user 1 logged 2 entries on 2013-9-5
The result after the query should be Table B.
Table B
| id | date-time | apples | fruit |
--------------------------------------------
| 1 | 2013-9-5 08:00:00 | 1 | 2 |
| 2 | 2013-9-5 09:00:00 | 1 | 1 |
| 1 | 2013-9-5 16:00:00 | 2 | 3 |
| 1 | 2013-9-6 08:00:00 | 2 | 4 |
| 2 | 2013-9-9 08:00:00 | 2 | 3 |
| 1 | 2013-9-11 08:00:00 | 2 | 5 |
| 1 | 2013-9-12 08:00:00 | 0 | 3 |
| 2 | 2013-9-13 08:00:00 | 2 | 4 |
At 2013-9-12 the sliding window moves and only includes 9-6 to 9-12. That's why id 1 goes from a sum of 2 apples to 0 apples.
You need years in your data to be able to use date arithmetic correctly. I added them.
There's an odd thing in your data. You seem to have multiple log entries for each person for each day. You're assuming an implicit order setting the later log entries somehow "after" the earlier ones. If SQL and MySQL do that, it's only by accident: there's no implicit ordering of rows in a table. Plus if we duplicate date/id combinations, the self join (read on) has lots of duplicate rows and ruins the sums.
So we need to start by creating a daily summary table of your data, like so:
select id, `date`, sum(apples) as apples, sum(banana) as banana
from fruit
group by id, `date`
This summary will contain at most one row per id per day.
Next we need to do a limited cross product self-join, so we get seven days' worth of fruit eating.
select --whatever--
from (
-- summary query --
) as a
join (
-- same summary query once again
) as b
on ( a.id = b.id
and b.`date` between a.`date` - interval 6 day AND a.`date` )
The between clause in the on gives us the seven days (today, and the six days prior). Notice that the table in the join with the alias b is the seven day stuff, and the a table is the today stuff.
Finally, we have to summarize that result according to your specification. The resulting query is this.
select a.id, a.`date`,
sum(b.apples) + sum(b.banana) as fruit_last_week,
a.apples as apple_today
from (
select id, `date`, sum(apples) as apples, sum(banana) as banana
from fruit
group by id, `date`
) as a
join (
select id, `date`, sum(apples) as apples, sum(banana) as banana
from fruit
group by id, `date`
) as b on (a.id = b.id and
b.`date` between a.`date` - interval 6 day AND a.`date` )
group by a.id, a.`date`, a.apples
order by a.`date`, a.id
Here's a fiddle: http://sqlfiddle.com/#!2/670b2/15/0
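The daily summary plus windowed self-join can be checked against Table A with the sketch below. It runs on SQLite through Python's sqlite3 module rather than MySQL, writes the summary once as a common table expression instead of repeating it, and sums apples over the window as well (the query above reports only the current day's apples). The column name logged_at and the function name are assumptions.

```python
import sqlite3


def seven_day_totals(conn):
    """Return (id, day, apples_7d, fruit_7d) per user and day, where the window
    covers that day and the six days before it."""
    return conn.execute(
        """
        WITH daily AS (
            -- at most one row per id per day
            SELECT id, date(logged_at) AS day,
                   SUM(apples) AS apples, SUM(banana) AS banana
            FROM fruit
            GROUP BY id, date(logged_at)
        )
        SELECT a.id, a.day,
               SUM(b.apples) AS apples_7d,
               SUM(b.apples) + SUM(b.banana) AS fruit_7d
        FROM daily AS a
        JOIN daily AS b
          ON b.id = a.id
         AND b.day BETWEEN date(a.day, '-6 days') AND a.day
        GROUP BY a.id, a.day
        ORDER BY a.day, a.id
        """
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE fruit (id INTEGER, logged_at TEXT, apples INTEGER, banana INTEGER);
    INSERT INTO fruit VALUES
        (1, '2013-09-05 08:00:00', 1, 1),
        (2, '2013-09-05 09:00:00', 1, 0),
        (1, '2013-09-05 16:00:00', 1, 0),
        (1, '2013-09-06 08:00:00', 0, 1),
        (2, '2013-09-09 08:00:00', 1, 1),
        (1, '2013-09-11 08:00:00', 0, 1),
        (1, '2013-09-12 08:00:00', 0, 1),
        (2, '2013-09-13 08:00:00', 1, 1);
    """
)

if __name__ == "__main__":
    for row in seven_day_totals(conn):
        print(row)
```

Because the rows are summarized per day first, user 1's two entries on 2013-09-05 collapse into a single line. The remaining lines match the apples and fruit columns of Table B, including the drop to 0 apples on 2013-09-12 when 2013-09-05 leaves the window.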
Assumptions:
one row per id/date
the counts should be for id between date and date - 7 days
"fruit" = "banana"
the "date" column is actually a date (including year) and not just month/day
then this SQL should do the trick:
INSERT INTO B
SELECT a1.id, a1.date, SUM( a2.apples ), SUM( a2.banana ) -- same column order as B: apples, then fruit
FROM (SELECT DISTINCT id, date
FROM A
WHERE date > NOW() - INTERVAL 7 DAY
) a1
JOIN A a2
ON a2.id = a1.id
AND a2.date <= a1.date
AND a2.date >= a1.date - INTERVAL 7 DAY
GROUP BY a1.id, a1.date
Some questions:
Are the above assumptions correct?
Does table A contain more fruits than just Bananas and Apples? If so, what does the real structure look like?