MySQL get latest consecutive rows set by date

I am retrieving the following rows with this query:
SELECT id, entry_id, DATE(entry_date)
FROM entries
WHERE entry_id = 51
ORDER BY entry_date DESC
+-----+----------+---------------------+
| id | entry_id | entry_date |
+-----+----------+---------------------+
| 84 | 51 | 2021-02-27 xx:xx:xx |<---
| 81 | 51 | 2021-02-26 xx:xx:xx | |
| 76 | 51 | 2021-02-25 xx:xx:xx | |-- consecutive set
| 74 | 51 | 2021-02-25 xx:xx:xx | |
| 73 | 51 | 2021-02-24 xx:xx:xx |<---
| 52 | 51 | 2021-02-20 xx:xx:xx |
| 44 | 51 | 2021-02-19 xx:xx:xx |
| 32 | 51 | 2021-02-18 xx:xx:xx |
| . | .. | ... |
| . | .. | ... |
+-----+----------+---------------------+
The data type of entry_date is TIMESTAMP. The time part does not matter here; I am only concerned with the dates, not the times.
I want to get only the rows with the "latest consecutive dates", OR the first and last date of the latest consecutive set, for an "entry_id".
For example, for entry_id = 51, I want only these rows:
+-----+----------+------------+
| id | entry_id | entry_date |
+-----+----------+------------+
| 84 | 51 | 2021-02-27 |
| 81 | 51 | 2021-02-26 |
| 76 | 51 | 2021-02-25 |
| 74 | 51 | 2021-02-25 |
| 73 | 51 | 2021-02-24 |
+-----+----------+------------+
OR I want to get the first and last date of the "latest consecutive dates" set for entry_id = 51,
e.g. in this case entry_date 2021-02-24 and 2021-02-27.
I don't have any experience writing such queries. I could get all the records for entry_id = 51 ordered DESC and write a script to find the latest consecutive rows, but since there are hundreds of thousands of rows, processing all of them just to get the latest consecutive rows can be inefficient.
Please note that there can be several entries with the same date (in this case, 2021-02-25); these should also be included in the result.
Edit: I am using MySQL 5.6.
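For comparison, the script-side approach mentioned above can be sketched in Python. This is a minimal sketch, assuming the rows for one entry_id have already been fetched as (id, datetime) pairs; the function names and the times in the sample rows are hypothetical (the question only shows the dates). It walks the distinct dates from newest to oldest and stops at the first gap of more than one day.

```python
from datetime import date, datetime, timedelta

def latest_run_bounds(timestamps):
    """Return (first_day, last_day) of the latest run of consecutive calendar days.

    Only the date part of each timestamp is used, so several entries on the
    same day collapse into one day. Returns None for empty input.
    """
    days = sorted({ts.date() for ts in timestamps}, reverse=True)
    if not days:
        return None
    first = last = days[0]
    for day in days[1:]:
        if first - day > timedelta(days=1):  # a missing day ends the run
            break
        first = day
    return first, last

def latest_run_rows(rows):
    """rows: list of (id, datetime). Return the ids that fall inside the latest run."""
    bounds = latest_run_bounds(ts for _, ts in rows)
    if bounds is None:
        return []
    first, last = bounds
    return [row_id for row_id, ts in rows if first <= ts.date() <= last]

# Hypothetical sample mirroring the question (times invented; only dates matter).
rows = [
    (84, datetime(2021, 2, 27, 10, 0)),
    (81, datetime(2021, 2, 26, 9, 30)),
    (76, datetime(2021, 2, 25, 18, 0)),
    (74, datetime(2021, 2, 25, 8, 15)),
    (73, datetime(2021, 2, 24, 12, 0)),
    (52, datetime(2021, 2, 20, 7, 0)),
    (44, datetime(2021, 2, 19, 23, 59)),
    (32, datetime(2021, 2, 18, 0, 1)),
]

print(latest_run_bounds(ts for _, ts in rows))
print(latest_run_rows(rows))  # [84, 81, 76, 74, 73]
```

This still requires fetching every row for the entry_id, which is the cost the question wants to avoid; the SQL answers push the same logic into the database.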

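This is a type of gaps-and-islands problem, solved here by using lead() to determine where there is a gap of more than one day. Note that lead() is a window function, and window functions require MySQL 8.0; they are not available in MySQL 5.6.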
select entry_id, min(entry_date), max(entry_date)
from (select e.*,
sum(case when date(entry_date) < next_entry_date - interval 1 day then 1 else 0 end) over (partition by entry_id order by entry_date desc) as grp
from (select e.*,
lead(date(entry_date)) over (partition by entry_id order by entry_date) as next_entry_date
from entries e
) e
) e
where grp = 0
group by entry_id;
The cumulative sum is then computed in reverse (descending date) order, so the latest group is the one with a cumulative sum of 0.
Here is a db<>fiddle.
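The lead() plus running-sum idea can be tried locally without a MySQL server. This is a minimal sketch that runs the same query shape on SQLite through Python's sqlite3 module (it assumes a SQLite build with window-function support, 3.25 or later); SQLite's date() and julianday() stand in for MySQL's DATE() and "- interval 1 day", and the sample times and helper names are hypothetical.

```python
import sqlite3

# Hypothetical sample rows mirroring the question; the times are invented.
ROWS = [
    (84, 51, "2021-02-27 10:00:00"),
    (81, 51, "2021-02-26 09:30:00"),
    (76, 51, "2021-02-25 18:00:00"),
    (74, 51, "2021-02-25 08:15:00"),
    (73, 51, "2021-02-24 12:00:00"),
    (52, 51, "2021-02-20 07:00:00"),
    (44, 51, "2021-02-19 23:59:00"),
    (32, 51, "2021-02-18 00:01:00"),
]

# SQLite dialect: julianday(...) - 1 plays the role of "- interval 1 day".
QUERY = """
SELECT entry_id, MIN(date(entry_date)) AS first_day, MAX(date(entry_date)) AS last_day
FROM (
    SELECT e.*,
           -- a row starts a new (older) group when the next day is more than 1 day later
           SUM(CASE WHEN julianday(date(entry_date)) < julianday(next_day) - 1
                    THEN 1 ELSE 0 END)
               OVER (PARTITION BY entry_id ORDER BY entry_date DESC) AS grp
    FROM (
        SELECT e.*,
               LEAD(date(entry_date)) OVER (PARTITION BY entry_id
                                            ORDER BY entry_date) AS next_day
        FROM entries e
    ) e
) e
WHERE grp = 0
GROUP BY entry_id
"""

def latest_run(rows):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE entries (id INTEGER, entry_id INTEGER, entry_date TEXT)")
    conn.executemany("INSERT INTO entries VALUES (?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result

print(latest_run(ROWS))  # [(51, '2021-02-24', '2021-02-27')]
```

The latest row has no next day, so its CASE compares against NULL and contributes 0, which is why the newest run ends up in group 0.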

You might try finding the gap dates with a self join, using GROUP_CONCAT to collect them, SUBSTRING_INDEX to split out the first two, and then using those two dates to get all rows bounded by them.
SELECT q.*
FROM entries q
JOIN (
SELECT entry_id,
SUBSTRING_INDEX(dts, ',', 1) AS upperDate,
SUBSTRING_INDEX(SUBSTRING_INDEX(dts, ',', 2), ',', -1) AS lowerDate
FROM (
SELECT x.entry_id,
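-- end-of-run dates (no entry on the next day), latest first; DISTINCT avoids repeated dates
GROUP_CONCAT(DISTINCT IF(y.entry_date IS NULL, DATE(x.entry_date), NULL)
ORDER BY IF(y.entry_date IS NULL, DATE(x.entry_date), NULL) DESC) AS dts
FROM entries x
LEFT JOIN entries y ON (
x.entry_id = y.entry_id AND
DATE(x.entry_date) = DATE(y.entry_date) - INTERVAL 1 DAY
)
GROUP BY 1
) z
) bounds ON (
q.entry_id = bounds.entry_id AND
DATE(q.entry_date) <= bounds.upperDate AND
DATE(q.entry_date) > bounds.lowerDate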
)
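The anti-join step (a day ends a run when there is no entry on the following day) can be tried on SQLite as well. This is a minimal sketch, not the answer's exact query: SQLite has no SUBSTRING_INDEX, so the end days are fetched in Python and the two latest are used as bounds. The helper names and sample times are hypothetical. It also handles an entry_id with only one run, where there is no second end day to use as a lower bound.

```python
import sqlite3

# Hypothetical sample rows mirroring the question; the times are invented.
ROWS = [
    (84, 51, "2021-02-27 10:00:00"), (81, 51, "2021-02-26 09:30:00"),
    (76, 51, "2021-02-25 18:00:00"), (74, 51, "2021-02-25 08:15:00"),
    (73, 51, "2021-02-24 12:00:00"), (52, 51, "2021-02-20 07:00:00"),
    (44, 51, "2021-02-19 23:59:00"), (32, 51, "2021-02-18 00:01:00"),
]

# Anti-join: keep days that have no matching entry on the following day.
END_DAYS = """
SELECT DISTINCT date(x.entry_date) AS day
FROM entries x
LEFT JOIN entries y
       ON y.entry_id = x.entry_id
      AND date(y.entry_date) = date(x.entry_date, '+1 day')
WHERE x.entry_id = ? AND y.id IS NULL
ORDER BY day DESC
"""

def make_db(rows):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE entries (id INTEGER, entry_id INTEGER, entry_date TEXT)")
    conn.executemany("INSERT INTO entries VALUES (?, ?, ?)", rows)
    return conn

def latest_run_rows(conn, entry_id):
    ends = [day for (day,) in conn.execute(END_DAYS, (entry_id,))]
    if not ends:
        return []
    # Rows up to the latest end day, and after the previous run's end day if there is one.
    sql = "SELECT id FROM entries WHERE entry_id = ? AND date(entry_date) <= ?"
    params = [entry_id, ends[0]]
    if len(ends) > 1:
        sql += " AND date(entry_date) > ?"
        params.append(ends[1])
    sql += " ORDER BY entry_date DESC"
    return [row_id for (row_id,) in conn.execute(sql, params)]

print(latest_run_rows(make_db(ROWS), 51))  # [84, 81, 76, 74, 73]
```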
You can avoid the self join with some variables, but that adds a certain level of complexity of its own and makes the logic a little harder to read and maintain.
SELECT entry_id, entry_date FROM (
SELECT entry_id, entry_date, elapsed,
@latch := if(@currID <> entry_id, FALSE, @latch), -- reset the latch when a new entry_id starts
NOT @latch AS keep, -- keep rows until a gap has been seen
@latch := if(elapsed > 1, TRUE, @latch), -- a gap of more than one day closes the latest run
@currID := entry_id
FROM (
SELECT entry_id, entry_date,
TIMESTAMPDIFF(DAY, @prevDate, DATE(entry_date)) AS elapsed, -- whole days since the previous (earlier) row
@prevDate := DATE(entry_date)
FROM (
SELECT entry_id, entry_date
FROM entries
JOIN (SELECT @currID := 0, @prevDate := null, @latch := false) v
ORDER BY entry_id, entry_date ASC
) z
) y ORDER BY entry_id, entry_date DESC
) x WHERE keep

In MySQL 8, it can be done like this:
WITH e2 AS (
SELECT entry_id, entry_date
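-- DATEDIFF counts calendar days; subtracting two DATE values directly treats them as YYYYMMDD numbers, which breaks across month boundaries
, COALESCE(DATEDIFF(LAG(DATE(entry_date))
OVER (PARTITION BY entry_id
ORDER BY entry_date DESC),
DATE(entry_date)), 0)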
AS date_diff
FROM entries
), e3 AS (
SELECT e2.*
, MAX(date_diff)
OVER (PARTITION BY e2.entry_id
ORDER BY e2.entry_date DESC
ROWS UNBOUNDED PRECEDING)
AS max_diff
FROM e2
)
SELECT entry_id
, DATE(MIN(entry_date)) AS min_date
, DATE(MAX(entry_date)) AS max_date
FROM e3
WHERE max_diff <= 1
GROUP BY entry_id;
Result
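+----------+------------+------------+
| entry_id | min_date   | max_date   |
+----------+------------+------------+
| 51       | 2021-02-24 | 2021-02-27 |
+----------+------------+------------+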
See DB Fiddle for demo.
If you only want entry_id = 51, add the WHERE entry_id = 51 condition to the first CTE (e2).
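The LAG-plus-running-MAX idea can be checked on SQLite through Python's sqlite3 module. This is a minimal sketch assuming SQLite 3.25 or later (for window functions); julianday() is SQLite's stand-in for a calendar-aware day difference such as MySQL's DATEDIFF. The sample rows and the helper name are hypothetical, chosen so that the latest run crosses a month boundary and a second entry_id has a single row.

```python
import sqlite3

# Hypothetical rows: entry 51's latest run crosses the end of February 2021.
ROWS = [
    (1, 51, "2021-03-01 09:00:00"),
    (2, 51, "2021-02-28 22:00:00"),
    (3, 51, "2021-02-27 01:00:00"),
    (4, 51, "2021-02-24 12:00:00"),
    (5, 52, "2021-01-10 08:00:00"),
]

QUERY = """
WITH e2 AS (
    SELECT entry_id, entry_date,
           -- days between this row and the previous (later) row; 0 for the newest row
           COALESCE(LAG(julianday(date(entry_date))) OVER (PARTITION BY entry_id
                                                           ORDER BY entry_date DESC)
                    - julianday(date(entry_date)), 0) AS date_diff
    FROM entries
), e3 AS (
    SELECT e2.*,
           -- largest gap seen so far, walking from newest to oldest
           MAX(date_diff) OVER (PARTITION BY entry_id
                                ORDER BY entry_date DESC
                                ROWS UNBOUNDED PRECEDING) AS max_diff
    FROM e2
)
SELECT entry_id, date(MIN(entry_date)) AS min_date, date(MAX(entry_date)) AS max_date
FROM e3
WHERE max_diff <= 1
GROUP BY entry_id
ORDER BY entry_id
"""

def latest_run(rows):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE entries (id INTEGER, entry_id INTEGER, entry_date TEXT)")
    conn.executemany("INSERT INTO entries VALUES (?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result

print(latest_run(ROWS))
# [(51, '2021-02-27', '2021-03-01'), (52, '2021-01-10', '2021-01-10')]
```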

Related

How to get only first rows from table ordered by duplicating values in column?

I have a table that looks like this:
id | size | price | date
0 | 30 | 800 | 2021-10-01
1 | 30 | 900 | 2021-10-02
2 | 32 | 700 | 2021-09-11
3 | 30 | 800 | 2021-09-21
4 | 32 | 800 | 2021-09-01
5 | 32 | 0 | 2021-10-03
And I need to get the latest prices for sizes <= 'size', check them for zero and get the first non-zero value. I tried sorting the table by size DESC, date DESC, but I can't take only the first row for each duplicated size.
SELECT *
FROM (SELECT *
FROM `prices`
WHERE model_id = '269'
AND partner_id = '0'
AND size <= '32'
AND date_time <= '2021-10-19'
ORDER BY size DESC, date_time DESC
) AS t_1
GROUP BY size
LIMIT 1
does not help. The first result I want is:
id | size | price | date
5 | 32 | 0 | 2021-10-03
1 | 30 | 900 | 2021-10-02
Then I want to get 900.
Using exists logic we can try:
SELECT p1.*
FROM prices p1
WHERE NOT EXISTS (SELECT 1 FROM prices p2
WHERE p2.size = p1.size AND p2.date > p1.date);
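The NOT EXISTS version can be checked against the question's sample rows on SQLite via Python's sqlite3 module. A minimal sketch; the helper name is hypothetical and the ORDER BY is added only to make the output order predictable.

```python
import sqlite3

# Sample rows from the question: (id, size, price, date).
PRICES = [
    (0, 30, 800, "2021-10-01"),
    (1, 30, 900, "2021-10-02"),
    (2, 32, 700, "2021-09-11"),
    (3, 30, 800, "2021-09-21"),
    (4, 32, 800, "2021-09-01"),
    (5, 32, 0, "2021-10-03"),
]

QUERY = """
SELECT p1.id, p1.size, p1.price, p1.date
FROM prices p1
WHERE NOT EXISTS (SELECT 1 FROM prices p2
                  WHERE p2.size = p1.size AND p2.date > p1.date)  -- no newer row for this size
ORDER BY p1.size DESC
"""

def latest_per_size(rows):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE prices (id INTEGER, size INTEGER, price INTEGER, date TEXT)")
    conn.executemany("INSERT INTO prices VALUES (?, ?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result

print(latest_per_size(PRICES))
# [(5, 32, 0, '2021-10-03'), (1, 30, 900, '2021-10-02')]
```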
More typically, on MySQL 8+, we would handle this using ROW_NUMBER:
WITH cte AS (
SELECT *, ROW_NUMBER() OVER (PARTITION BY size ORDER BY date DESC) rn
FROM prices
)
SELECT id, size, price, date
FROM cte
WHERE rn = 1;
One advantage of using ROW_NUMBER is that should there be two or more latest records tied on the same date, we can simply add another sorting level to the ORDER BY clause to break the tie.
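The question's second step (take the latest row per size, then the first non-zero price from the largest size down) can be layered on top of ROW_NUMBER. This is a minimal sketch on SQLite (3.25 or later, for window functions); the size filter parameter, the `id DESC` tie-breaker and the helper name are assumptions for illustration, not part of the original answer.

```python
import sqlite3

# Sample rows from the question: (id, size, price, date).
PRICES = [
    (0, 30, 800, "2021-10-01"),
    (1, 30, 900, "2021-10-02"),
    (2, 32, 700, "2021-09-11"),
    (3, 30, 800, "2021-09-21"),
    (4, 32, 800, "2021-09-01"),
    (5, 32, 0, "2021-10-03"),
]

QUERY = """
WITH cte AS (
    SELECT *,
           -- id DESC is an assumed tie-breaker for two rows on the same date
           ROW_NUMBER() OVER (PARTITION BY size ORDER BY date DESC, id DESC) AS rn
    FROM prices
    WHERE size <= ?
)
SELECT price
FROM cte
WHERE rn = 1 AND price <> 0   -- latest row per size, skipping zero prices
ORDER BY size DESC            -- largest size first
LIMIT 1
"""

def first_nonzero_latest_price(rows, max_size):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE prices (id INTEGER, size INTEGER, price INTEGER, date TEXT)")
    conn.executemany("INSERT INTO prices VALUES (?, ?, ?, ?)", rows)
    row = conn.execute(QUERY, (max_size,)).fetchone()
    conn.close()
    return row[0] if row else None

print(first_nonzero_latest_price(PRICES, 32))  # 900
```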
SELECT *
FROM prices p
WHERE
size <= '32'
AND date = (SELECT max(date) from prices where size = p.size and price <> 0)

The division of each row by the sum of the rows that have the same values in two columns

I want to divide each row by the sum of all rows that have the same Dateadded and fundid, but my query seems to be wrong because the results are not what I was expecting.
My table schema looks like this (simplified; my real table has many more columns):
+----+--------+------------+--------+
| id | fundid | Dateadded | amount |
+====+========+============+========+
| 1 | 45 | 21-02-2018 | 5412 |
| 2 | 45 | 21-02-2018 | 5414 |
| 3 | 45 | 21-02-2018 | 1412 |
| 4 | 45 | 22-02-2018 | 5756 |
| 5 | 45 | 22-02-2018 | 4412 |
| 6 | 45 | 25-02-2018 | 2532 |
| 7 | 45 | 26-02-2018 | 7892 |
| 8 | 45 | 26-02-2018 | 8143 |
+----+--------+------------+--------+
Rows with ids 1, 2, 3 should be calculated together because they have
the same fundid and date.
Rows with ids 4, 5: same thing.
Row with id 6: it is just one row.
Rows with ids 7, 8: same thing.
My SQL query:
SELECT fundid
, Dateadded
, ( amount / SUM(amount) ) AS AvgRow
FROM stock2
GROUP
BY fundid
, Dateadded
ORDER
BY DateAdded ASC
Is this what you want?
select t.*, t.amount / tt.total_amount
from stock2 t join
(select fundid, dateadded, sum(amount) as total_amount
from stock2 t
group by fundid, dateadded
) tt
using (fundid, dateadded);
Or is this?
select fundid, dateadded, sum(t.amount) / tt.total_amount
from stock2 t cross join
(select sum(amount) as total_amount
from stock2 t
) tt
group by fundid, dateadded, tt.total_amount;
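The first variant (join each row to its group total) can be checked on SQLite with the question's sample rows. A minimal sketch; the helper name is hypothetical. One dialect difference matters here: SQLite divides two integers with integer division, so the amount is multiplied by 1.0 first, whereas MySQL's `/` already returns a decimal result.

```python
import sqlite3

# Sample rows from the question: (id, fundid, Dateadded, amount).
STOCK2 = [
    (1, 45, "21-02-2018", 5412),
    (2, 45, "21-02-2018", 5414),
    (3, 45, "21-02-2018", 1412),
    (4, 45, "22-02-2018", 5756),
    (5, 45, "22-02-2018", 4412),
    (6, 45, "25-02-2018", 2532),
    (7, 45, "26-02-2018", 7892),
    (8, 45, "26-02-2018", 8143),
]

QUERY = """
SELECT t.id, t.fundid, t.dateadded, t.amount,
       t.amount * 1.0 / tt.total_amount AS share  -- * 1.0 avoids SQLite integer division
FROM stock2 t
JOIN (SELECT fundid, dateadded, SUM(amount) AS total_amount
      FROM stock2
      GROUP BY fundid, dateadded) tt
USING (fundid, dateadded)
ORDER BY t.id
"""

def row_shares(rows):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE stock2 (id INTEGER, fundid INTEGER, dateadded TEXT, amount INTEGER)")
    conn.executemany("INSERT INTO stock2 VALUES (?, ?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result

for row in row_shares(STOCK2):
    print(row)
```

Within each (fundid, Dateadded) group the shares add up to 1, and the single-row group (id 6) gets a share of exactly 1.0.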
Check out a very well-explained answer to a similar issue about GROUP BY usage here.
As in the situation described there, your query is ambiguous about which amount value should be used for each row. For example, if you try:
SELECT fundid, Dateadded, ( AVG(amount) / SUM(amount) ) AS AvgRow FROM stock2 GROUP BY fundid, Dateadded ORDER BY DateAdded ASC
it will run, because AVG(amount) is unambiguous for each (fundid, Dateadded) pair that should be calculated together (although AVG(amount) / SUM(amount) simply equals 1 divided by the number of rows in the group).
It seems you are looking for something like:
SELECT st.fundid, st.Dateadded, (st.amount / st2.total) AS AvgRow
FROM stock2 st
inner join
(select fundid, Dateadded, sum(amount) as total
from stock2
GROUP BY fundid, Dateadded) st2
on st.fundid = st2.fundid and st.Dateadded = st2.Dateadded
order by st.Dateadded

How to Select First Date, Previous Date, Latest Date where first date is higher than a reference date

I want to SELECT the latest date, the second-latest date and the first date FROM table1, where the first date is later than a reference date found in table2. That reference date should be the latest one in table2. I have what is supposed to be a solution, but it does not return any output if there is ONLY one record in table1. Example tables:
table1
Reg ID | DateOfAI | byTechnician
2GP001 | 2015-01-13 | 31
2GP001 | 2015-02-18 | 31
2GP001 | 2017-11-10 | 45
2GP001 | 2017-11-30 | 32
2GP044 | 2017-11-30 | 28
2GP001 | 2017-12-23 | 32
table2
Reg ID | DateOfCalving | DryOffDate
2GP001 | 2016-01-14 |
2GP070 | 2016-01-14 |
2GP065 | 2017-04-08 |
2GP001 | 2017-04-12 |
My expected output would be:
Reg ID | LatestDateOfCalving | 1stDateOfAI | PreviousAIDate | LastestAIDate
2GP001 | 2017-04-12 | 2017-11-10 | 2017-11-30 | 2017-12-23
I have searched everywhere, to the moon and back... still no luck. These are the queries that I have used.
The first:
SELECT b.actualDam,COUNT(x.actualDam) AS ilanba, max(b.breedDate) AS huli, max(x.breedDate) AS nex,MIN(x.breedDate) AS una,IFNULL(c.calvingDate,NULL) AS nganak,r.*,h.herdID,a.animalID,a.regID, IFNULL(a.dateOfBirth,NULL) AS buho
FROM x_animal_breeding_rec b
LEFT JOIN x_animal_calving_rec c ON b.recID=c.brecID
LEFT JOIN x_herd_animal_rel r ON b.actualDam=r.animal
LEFT JOIN x_herd h ON r.herd=h.herdID
LEFT JOIN x_animal_main_info a ON b.actualDam=a.animalID
JOIN x_animal_breeding_rec x ON b.actualDam = x.actualDam AND x.breedDate < b.breedDate
WHERE h.herdID = ? AND x.mateType = ? AND x.recFlag = ? GROUP BY b.actualDam
And the second one that I've tried is this code:
SELECT b.recID
, b.actualDam
, b.breedDate
, min(b.breedDate) AS una
, max(b.breedDate) AS huli
, COUNT(b.actualDam) AS sundot
, b.mateType
, b.recFlag
, a.animalID
, a.regID
, h.*
FROM
( SELECT c.recID, c.actualDam
, c.breedDate
, c.mateType
, c.recFlag
, CASE WHEN @prev=c.recID THEN @i:=@i+1 ELSE @i:=1 END i
, @prev:=c.recID prev
FROM x_animal_breeding_rec c
, ( SELECT @prev:=null,@i:=0 ) vars
ORDER BY c.recID,c.breedDate DESC
) b
LEFT JOIN x_animal_main_info a ON b.actualDam=a.animalID
LEFT JOIN x_herd_animal_rel h ON b.actualDam=h.animal
WHERE i <= 2 GROUP BY b.actualDam HAVING h.herd = ? AND b.mateType = ? AND b.recFlag = ? ORDER BY b.breedDate DESC
Another problem is that the first solution returns a WRONG COUNT. The second solution returns the CORRECT COUNT, but the wrong dates. I hope you can give me an idea. Thanks in advance.
The following query answers your question:
SELECT
RegID,
LatestDateOfCalving,
MIN(DateOfAI) AS 1stDateOfAI,
REPLACE(SUBSTRING_INDEX(GROUP_CONCAT(DateOfAI ORDER BY DateOfAI DESC), ',', 2), CONCAT(MAX(DateOfAI), ','), '') AS PreviousAIDate,
MAX(DateOfAI) AS LatestAIDate
FROM (
SELECT
t1.RegID,
LatestDateOfCalving,
DateOfAI,
IF(DateOfAI >= LatestDateOfCalving, 1, 0) AS dates
FROM table1 AS t1
INNER JOIN (
SELECT
RegID,
MAX(DateOfCalving) AS LatestDateOfCalving
FROM table2 GROUP BY RegID
) AS tt2 ON t1.RegID = tt2.RegID) AS x
WHERE dates = 1
GROUP BY RegID
HAVING COUNT(dates) >= 3;
Output:
+--------+---------------------+-------------+----------------+--------------+
| RegID | LatestDateOfCalving | 1stDateOfAI | PreviousAIDate | LatestAIDate |
+--------+---------------------+-------------+----------------+--------------+
| 2GP001 | 2017-04-12 | 2017-11-10 | 2017-11-30 | 2017-12-23 |
+--------+---------------------+-------------+----------------+--------------+
DEMO
In a subquery we select RegID and LatestDateOfCalving from table2 to get a reference date. We then join it to table1 and flag each record according to whether DateOfAI is on or after LatestDateOfCalving (IF(DateOfAI >= LatestDateOfCalving, 1, 0)). The outer query (SELECT RegID, LatestDateOfCalving, MIN(DateOfAI) AS 1stDateOfAI, MAX(DateOfAI) AS LatestAIDate, ...) keeps only the records whose DateOfAI is on or after LatestDateOfCalving (WHERE dates = 1, where 1 is the flag for the condition being true), and only the RegIDs that have at least 3 such records (HAVING COUNT(dates) >= 3). In the outer query, the REPLACE(SUBSTRING_INDEX(GROUP_CONCAT(...))) construct extracts PreviousAIDate from a comma-separated list of dates.
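The same join, flag, count and pick-second-latest logic can be mirrored in plain Python, which may help when checking the expected output. A minimal sketch: the table contents come from the question, while ai_summary and its min_count parameter are hypothetical names; min_count=3 reproduces the HAVING COUNT(dates) >= 3 condition, and RegIDs without any calving record are dropped, as with the INNER JOIN.

```python
from collections import defaultdict

# Sample rows from the question.
table1 = [  # (RegID, DateOfAI, byTechnician)
    ("2GP001", "2015-01-13", 31),
    ("2GP001", "2015-02-18", 31),
    ("2GP001", "2017-11-10", 45),
    ("2GP001", "2017-11-30", 32),
    ("2GP044", "2017-11-30", 28),
    ("2GP001", "2017-12-23", 32),
]
table2 = [  # (RegID, DateOfCalving)
    ("2GP001", "2016-01-14"),
    ("2GP070", "2016-01-14"),
    ("2GP065", "2017-04-08"),
    ("2GP001", "2017-04-12"),
]

def ai_summary(table1, table2, min_count=3):
    # Latest calving date per RegID (the tt2 subquery); ISO dates compare as strings.
    latest_calving = {}
    for reg_id, calving in table2:
        latest_calving[reg_id] = max(calving, latest_calving.get(reg_id, calving))
    # AI dates on or after that reference date (the "dates = 1" flag).
    ai_dates = defaultdict(list)
    for reg_id, ai_date, _tech in table1:
        ref = latest_calving.get(reg_id)
        if ref is not None and ai_date >= ref:
            ai_dates[reg_id].append(ai_date)
    result = []
    for reg_id in sorted(ai_dates):
        dates = sorted(ai_dates[reg_id])
        if len(dates) >= min_count:
            # (RegID, LatestDateOfCalving, 1stDateOfAI, PreviousAIDate, LatestAIDate)
            result.append((reg_id, latest_calving[reg_id], dates[0], dates[-2], dates[-1]))
    return result

print(ai_summary(table1, table2))
# [('2GP001', '2017-04-12', '2017-11-10', '2017-11-30', '2017-12-23')]
```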

Optimizing SQL Query for max value with various conditions from a single MySQL table

I have the following SQL query
SELECT *
FROM `sensor_data` AS `sd1`
WHERE (sd1.timestamp BETWEEN '2017-05-13 00:00:00'
AND '2017-05-14 00:00:00')
AND (`id` =
(
SELECT `id`
FROM `sensor_data` AS `sd2`
WHERE sd1.mid = sd2.mid
AND sd1.sid = sd2.sid
ORDER BY `value` DESC, `id` DESC
LIMIT 1)
)
Background:
I've checked the validity of the query by changing LIMIT 1 to LIMIT 0, and the query works without any problem. However, with LIMIT 1 the query never completes; it just keeps loading until I shut down and restart.
Breaking the Query down:
I have broken down the query with the date boundary as follows:
SELECT *
FROM `sensor_data` AS `sd1`
WHERE (sd1.timestamp BETWEEN '2017-05-13 00:00:00'
AND '2017-05-14 00:00:00')
This takes about 0.24 seconds to return the query with 8200 rows each having 5 columns.
Question:
I suspect the second half of my query is incorrect or poorly optimized.
The tables are as follows:
Current Table:
+------+-------+-------+-----+-----------------------+
| id | mid | sid | v | timestamp |
+------+-------+-------+-----+-----------------------+
| 51 | 10 | 1 | 40 | 2015-05-13 11:56:01 |
| 52 | 10 | 2 | 39 | 2015-05-13 11:56:25 |
| 53 | 10 | 2 | 40 | 2015-05-13 11:56:42 |
| 54 | 10 | 2 | 40 | 2015-05-13 11:56:45 |
| 55 | 10 | 2 | 40 | 2015-05-13 11:57:01 |
| 56 | 11 | 1 | 50 | 2015-05-13 11:57:52 |
| 57 | 11 | 2 | 18 | 2015-05-13 11:58:41 |
| 58 | 11 | 2 | 19 | 2015-05-13 11:58:59 |
| 59 | 11 | 3 | 58 | 2015-05-13 11:59:01 |
| 60 | 11 | 3 | 65 | 2015-05-13 11:59:29 |
+------+-------+-------+-----+-----------------------+
Q: How would I get the MAX(v) for each sid for each mid?
NB#1: In the example above, rows 53, 54 and 55 all have the same value (40), but I would like to retrieve the row with the most recent timestamp, which is row 55.
Expected Output:
+------+-------+-------+-----+-----------------------+
| id | mid | sid | v | timestamp |
+------+-------+-------+-----+-----------------------+
| 51 | 10 | 1 | 40 | 2015-05-13 11:56:01 |
| 55 | 10 | 2 | 40 | 2015-05-13 11:57:01 |
| 56 | 11 | 1 | 50 | 2015-05-13 11:57:52 |
| 58 | 11 | 2 | 19 | 2015-05-13 11:58:59 |
| 60 | 11 | 3 | 65 | 2015-05-13 11:59:29 |
+------+-------+-------+-----+-----------------------+
NB#2:
Since this table has over 110 million entries, it is critical to have date boundaries, which limit the query to ~8000 entries over a 24-hour period.
The query can be written as follows:
SELECT t1.id, t1.mid, t1.sid, t1.v, t1.ts
FROM yourtable t1
INNER JOIN (
SELECT mid, sid, MAX(v) as v
FROM yourtable
WHERE ts BETWEEN '2015-05-13 00:00:00' AND '2015-05-14 00:00:00'
GROUP BY mid, sid
) t2
ON t1.mid = t2.mid
AND t1.sid = t2.sid
AND t1.v = t2.v
INNER JOIN (
SELECT mid, sid, v, MAX(ts) as ts
FROM yourtable
WHERE ts BETWEEN '2015-05-13 00:00:00' AND '2015-05-14 00:00:00'
GROUP BY mid, sid, v
) t3
ON t1.mid = t3.mid
AND t1.sid = t3.sid
AND t1.v = t3.v
AND t1.ts = t3.ts;
Edit and Explanation:
The first sub-query (first INNER JOIN) fetches MAX(v) per (mid, sid) combination. The second sub-query identifies MAX(ts) for every (mid, sid, v). At this point, the two queries do not influence each other's results. It is also important to note that the ts date-range selection is done independently in each of the two sub-queries, so the final query has fewer rows to examine and no additional WHERE filters to apply.
Effectively, this translates into getting MAX(v) per (mid, sid) combination initially (first sub-query); and if there is more than one record with the same value MAX(v) for a given (mid, sid) combo, then the excess records get eliminated by the selection of MAX(ts) for every (mid, sid, v) combination obtained by the second sub-query. We then simply associate the output of the two queries by the two INNER JOIN conditions to get to the id of the desired records.
Demo
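The double-join query can be run on SQLite with the question's sample rows to confirm it returns the expected output. A minimal sketch: the table is named sensor_data and the timestamp column ts (following the answer's placeholder), the helper name is hypothetical, and ORDER BY is added only for a stable result order.

```python
import sqlite3

# Sample rows from the question: (id, mid, sid, v, ts).
ROWS = [
    (51, 10, 1, 40, "2015-05-13 11:56:01"),
    (52, 10, 2, 39, "2015-05-13 11:56:25"),
    (53, 10, 2, 40, "2015-05-13 11:56:42"),
    (54, 10, 2, 40, "2015-05-13 11:56:45"),
    (55, 10, 2, 40, "2015-05-13 11:57:01"),
    (56, 11, 1, 50, "2015-05-13 11:57:52"),
    (57, 11, 2, 18, "2015-05-13 11:58:41"),
    (58, 11, 2, 19, "2015-05-13 11:58:59"),
    (59, 11, 3, 58, "2015-05-13 11:59:01"),
    (60, 11, 3, 65, "2015-05-13 11:59:29"),
]

QUERY = """
SELECT t1.id, t1.mid, t1.sid, t1.v, t1.ts
FROM sensor_data t1
INNER JOIN (                      -- MAX(v) per (mid, sid)
    SELECT mid, sid, MAX(v) AS v
    FROM sensor_data
    WHERE ts BETWEEN '2015-05-13 00:00:00' AND '2015-05-14 00:00:00'
    GROUP BY mid, sid
) t2 ON t1.mid = t2.mid AND t1.sid = t2.sid AND t1.v = t2.v
INNER JOIN (                      -- latest ts per (mid, sid, v) breaks ties
    SELECT mid, sid, v, MAX(ts) AS ts
    FROM sensor_data
    WHERE ts BETWEEN '2015-05-13 00:00:00' AND '2015-05-14 00:00:00'
    GROUP BY mid, sid, v
) t3 ON t1.mid = t3.mid AND t1.sid = t3.sid AND t1.v = t3.v AND t1.ts = t3.ts
ORDER BY t1.mid, t1.sid
"""

def groupwise_max(rows):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sensor_data (id INTEGER, mid INTEGER, sid INTEGER, v INTEGER, ts TEXT)")
    conn.executemany("INSERT INTO sensor_data VALUES (?, ?, ?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result

print([row[0] for row in groupwise_max(ROWS)])  # [51, 55, 56, 58, 60]
```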
select * from sensor_data s1 where s1.v in (select max(v) from sensor_data s2 group by s2.mid)
union
select * from sensor_data s1 where s1.v in (select max(v) from sensor_data s2 group by s2.sid);
IN ( SELECT ... ) does not optimize well. It is even worse because of being correlated.
What you are looking for is a groupwise max.
Please provide SHOW CREATE TABLE; we need to know at least what the PRIMARY KEY is.
Suggested code
You will need:
With the WHERE: INDEX(timestamp, mid, sid, v, id)
Without the WHERE: INDEX(mid, sid, v, timestamp, id)
Code:
SELECT id, mid, sid, v, timestamp
FROM ( SELECT @prev_mid := 99999, -- some value not in table
@prev_sid := 99999,
@n := 0 ) AS init
JOIN (
SELECT @n := if(mid != @prev_mid OR
sid != @prev_sid,
1, @n + 1) AS n,
@prev_mid := mid,
@prev_sid := sid,
id, mid, sid, v, timestamp
FROM sensor_data
WHERE timestamp >= '2017-05-13'
AND timestamp < '2017-05-13' + INTERVAL 1 DAY
ORDER BY mid DESC, sid DESC, v DESC, timestamp DESC
) AS x
WHERE n = 1
ORDER BY mid, sid; -- optional
Notes:
The index is 'composite' and 'covering'.
This should make one pass over the index, thereby providing 'good' performance.
The final ORDER BY is optional; the results may be in reverse order.
All the DESC in the inner ORDER BY must be in place to work correctly (unless you are using MySQL 8.0).
Note how the WHERE avoids including both midnights? And avoids manually computing leap-days, year-ends, etc?
With the WHERE (and the associated INDEX), there will be filtering, but also a 'sort'.
Without the WHERE (and with the other INDEX), a sort will not be needed.
You can test the performance of any competing formulations via this trick, even if you do not have enough rows (yet) to get reliable timings:
FLUSH STATUS;
SELECT ...
SHOW SESSION STATUS LIKE 'Handler%';
This can also be used to compare different versions of MySQL and MariaDB -- I have seen 3 significantly different performance characteristics in a related groupwise-max test.

MySQL UNION does not seem to work correctly

I have an SQL query I am using to pull data from an orders database. I am querying 2 tables and combining the results using UNION ALL. However, the UNION ALL does not seem to work as expected. Here is the query I am using:
SELECT year(oc_order.date_added) AS year, COUNT(oc_order.order_id) as cnt, SUM( ifnull(oc_order.new_total,oc_order.total) ) as total
FROM oc_order
WHERE oc_order.order_status_id IN (1,3,5)
AND MONTH(oc_order.date_added) BETWEEN '01' AND '02'
AND DAY(oc_order.date_added) BETWEEN '01' AND '31'
GROUP BY year(oc_order.date_added)
UNION ALL
SELECT ifnull(year(str_to_date(oc_return_custom.date_added,'%d-%m-%Y %H:%i:%s')),year(str_to_date(oc_return_custom.date_added,'%Y-%m-%d %H:%i:%s')) ) AS year, COUNT(oc_return_custom.return_id) as cnt, SUM( oc_return_custom.total ) as total
FROM oc_return_custom
WHERE ifnull(MONTH(str_to_date(oc_return_custom.date_added,'%d-%m-%Y %H:%i:%s')),MONTH(str_to_date(oc_return_custom.date_added,'%Y-%m-%d %H:%i:%s')) ) BETWEEN '01' AND '02'
AND ifnull(DAY(str_to_date(oc_return_custom.date_added,'%d-%m-%Y %H:%i:%s')),DAY(str_to_date(oc_return_custom.date_added,'%Y-%m-%d %H:%i:%s')) ) BETWEEN '01' AND '31'
GROUP BY ifnull(year(str_to_date(oc_return_custom.date_added,'%d-%m-%Y %H:%i:%s')),year(str_to_date(oc_return_custom.date_added,'%Y-%m-%d %H:%i:%s')) )
ORDER BY year DESC
This is what I get from the query:
+=======+========+=======+
| year | cnt | total |
+=======+========+=======+
| 2016 | 200 | 1000 |
| 2016 | 50 | 200 |
| 2015 | 100 | 800 |
| 2015 | 10 | 50 |
+=======+========+=======+
But this is what I wanted to get:
+=======+========+=======+
| year | cnt | total |
+=======+========+=======+
| 2016 | 250 | 1200 |
| 2015 | 110 | 850 |
+=======+========+=======+
Can someone tell me what I am doing wrong???
Notes:
The oc_order table's date_added column is DATETIME, whereas oc_return_custom's date_added column is just text.
UNION ALL simply appends the two result sets, each produced by its own GROUP BY; it does not merge rows that have the same year.
To get the expected result set you have to wrap the query in a subquery and apply an additional GROUP BY:
SELECT year, SUM(cnt) AS cnt, SUM(total) AS total
FROM ( ... your query here ...) AS t
GROUP BY year
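The difference between the plain UNION ALL and the wrapped version can be seen on a tiny SQLite database. A minimal sketch: the tables order_rows and return_rows and their rows are hypothetical stand-ins for oc_order and oc_return_custom, and the date parsing from the original query is left out.

```python
import sqlite3

# Each branch is grouped on its own, then the branches are appended.
INNER = """
SELECT year, COUNT(order_id) AS cnt, SUM(total) AS total FROM order_rows GROUP BY year
UNION ALL
SELECT year, COUNT(return_id) AS cnt, SUM(total) AS total FROM return_rows GROUP BY year
"""

# Wrapping the union in a derived table and grouping again merges rows per year.
COMBINED = f"""
SELECT year, SUM(cnt) AS cnt, SUM(total) AS total
FROM ({INNER}) AS t
GROUP BY year
ORDER BY year DESC
"""

def run(sql):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE order_rows (order_id INTEGER, year INTEGER, total INTEGER)")
    conn.execute("CREATE TABLE return_rows (return_id INTEGER, year INTEGER, total INTEGER)")
    conn.executemany("INSERT INTO order_rows VALUES (?, ?, ?)",
                     [(1, 2016, 10), (2, 2016, 20), (3, 2015, 5)])
    conn.executemany("INSERT INTO return_rows VALUES (?, ?, ?)",
                     [(1, 2016, 3), (2, 2015, 1), (3, 2015, 2)])
    result = conn.execute(sql).fetchall()
    conn.close()
    return result

print(len(run(INNER)))  # 4 rows: two per year, one from each table
print(run(COMBINED))    # [(2016, 3, 33), (2015, 3, 8)]
```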