Retrieve highest value per n rows - mysql

I have a table where each row contains a value and a datetime. It has hundreds of thousands of rows. I would like to select the highest (max) value every n rows.
I had previously used a query to get the highest value every hour, but this isn't quite what I am looking for:
SELECT datetime, MAX(value)
FROM `table`
GROUP BY date_format(datetime, '%Y-%m-%d %H')
Any advice would be greatly appreciated!

You can enumerate the rows and then aggregate to your heart's desire:
select min(rn), max(rn), min(datetime), max(datetime), max(value)
from (select t.*, (@rn := @rn + 1) rn
from `table` t cross join
(select @rn := 0) params
order by datetime
) t
group by floor((rn - 1) / @n)
order by min(rn);
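On MySQL 8.0 or later, the same bucketing can be sketched with ROW_NUMBER() instead of a user variable. This is only an illustration: `readings` is a hypothetical table name standing in for `table`, and 100 is an arbitrary example value for n.

```sql
-- Sketch for MySQL 8.0+; `readings` and the bucket size 100 are placeholder assumptions.
SELECT MIN(rn)       AS first_row,
       MAX(rn)       AS last_row,
       MIN(datetime) AS bucket_start,
       MAX(datetime) AS bucket_end,
       MAX(value)    AS max_value
FROM (
    SELECT r.*,
           ROW_NUMBER() OVER (ORDER BY datetime) AS rn  -- 1, 2, 3, ... in time order
    FROM readings r
) numbered
GROUP BY FLOOR((rn - 1) / 100)   -- rows 1-100 -> bucket 0, rows 101-200 -> bucket 1, ...
ORDER BY first_row;
```

Subtracting 1 before dividing keeps row n in the first bucket rather than starting a new one.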

Related

MySQL: How to find the maximum length of an uninterrupted sequence of certain values?

Given a table:
date value
02.10.2019 1
03.10.2019 2
04.10.2019 2
05.10.2019 -1
06.10.2019 1
07.10.2019 1
08.10.2019 2
09.10.2019 2
10.10.2019 -1
11.10.2019 2
12.10.2019 1
How to find the maximum length of an uninterrupted sequence of positive values (4 in that example)?
This is a gaps-and-islands problem. One simple method is the difference of row numbers to identify the islands:
select min(date), max(date), count(*) as length
from (select t.*,
row_number() over (order by date) as seqnum_1,
row_number() over (partition by sign(value) order by date) as seqnum_2
from t
) t
group by sign(value), (seqnum_1 - seqnum_2)
order by count(*) desc
limit 1;
This is a little hard to explain. I find that if you stare at the results of the subquery, you will see how the difference identifies the groups.
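To make the staring easier, the subquery can be run on its own with the difference shown as an extra column. This is a sketch against the same table t(date, value) from the question; the alias island_key is introduced here only for illustration.

```sql
-- Inspect the intermediate columns (MySQL 8.0+); island_key is a hypothetical alias.
SELECT t.date,
       t.value,
       ROW_NUMBER() OVER (ORDER BY date) AS seqnum_1,
       ROW_NUMBER() OVER (PARTITION BY SIGN(value) ORDER BY date) AS seqnum_2,
       ROW_NUMBER() OVER (ORDER BY date)
         - ROW_NUMBER() OVER (PARTITION BY SIGN(value) ORDER BY date) AS island_key
FROM t
ORDER BY t.date;
```

Within an uninterrupted run both counters advance by one per row, so their difference stays constant; when a row of the other sign interrupts the run, only seqnum_1 advances and the difference jumps. For the sample data the three positive runs get the keys 0, 1 and 2, while the two negative rows get 3 and 7. Because a positive and a negative run can share a key by coincidence, the outer query also groups by sign(value).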
Assuming there are no gaps in the dates, another method finds the next non-positive number (if any):
select t.*,
datediff(coalesce(next_end_date, max_date + interval 1 day), date) as num
from (select t.*,
min(case when value <= 0 then date end) over (order by date desc) as next_end_date,
max(date) over () as max_date
from t
) t
where value > 0
order by datediff(coalesce(next_end_date, max_date + interval 1 day), date) desc
limit 1;

Average timediff of 2nd and 3rd datetimes for a group

I need to find the average time in days between a customer's second order and third order.
I know that I need to use timestampdiff, but I am at a loss for how to select the second and third dates. I assume it needs some sort of nested query.
SELECT CustomerID,
OrderDate,
diff,
avg(timestampdiff(day, start_date, end_date)) AS average_days
FROM () o3
WHERE date3, date2
ORDER BY CustomerID, OrderDate;
To achieve your desired result, first number each customer's orders by date, which is what ROW_NUMBER with PARTITION BY CustomerID would do. Then keep only the rows with RowNumber IN (2,3) and take the DATEDIFF between the two dates. The following query emulates the row number with user variables:
SELECT CustomerID,datediff(MAX(OrderDate),MIN(OrderDate))
FROM
(
SELECT *,
@row_num :=IF(@prev_value = concat_ws('',CustomerID),@row_num+1,1)AS RowNumber
, @prev_value := concat_ws('',CustomerID)
FROM your_table A
ORDER BY CustomerID,OrderDate
)B
WHERE B.RowNumber IN (2,3)
GROUP BY CustomerID;
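On MySQL 8.0 or later, the same idea can be sketched with a real ROW_NUMBER() and an outer AVG to get the single average the question asks for. The table name your_table is taken from the answer above; the aliases rn, days_between, ranked and per_customer are introduced here for illustration.

```sql
-- Sketch for MySQL 8.0+; assumes your_table(CustomerID, OrderDate).
SELECT AVG(days_between) AS average_days
FROM (
    SELECT CustomerID,
           DATEDIFF(MAX(OrderDate), MIN(OrderDate)) AS days_between
    FROM (
        SELECT CustomerID,
               OrderDate,
               ROW_NUMBER() OVER (PARTITION BY CustomerID ORDER BY OrderDate) AS rn
        FROM your_table
    ) ranked
    WHERE rn IN (2, 3)
    GROUP BY CustomerID
    HAVING COUNT(*) = 2   -- skip customers who have no third order
) per_customer;
```

The HAVING clause matters: without it, a customer with exactly two orders would contribute a difference of 0 days and pull the average down.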

Stop query when SUM is reached (mysql)

I have a database with columns I am working on. What I am looking for is the date associated with the row where the SUM(#) reaches 6 in a query. The query I have now gives the date when the number in the column is six, but not the date when the sum of the rows so far reaches six. Example below:
Date number
---- ------
6mar16 1
8mar16 4
10mar16 6
12mar16 2
I would like a query that returns the 10mar16 date, because that is the date on which the running total first reaches 6 (1 + 4 + 6 = 11). The earlier dates only total 5.
Here is an example of a query I have been working on:
SELECT max(date) FROM `numbers` WHERE `number` > 60
You could use this query, which tracks the accumulated sum and then returns the first one that meets the condition:
select date
from (select * from mytable order by date) as base,
(select @sum := 0) init
where (@sum := @sum + number) >= 6
limit 1
Most databases support ANSI standard window functions. In this case, cumulative sum is your friend:
select t.*
from (select t.*, sum(number) over (order by date) as sumnumber
from t
) t
where sumnumber >= 6
order by sumnumber
fetch first 1 row only;
In MySQL versions before 8.0, which lack window functions, you need variables:
select t.*
from (select t.*, (@sumn := @sumn + number) as sumnumber
from t cross join (select @sumn := 0) params
order by date
) t
where sumnumber >= 6
order by sumnumber
limit 1;
Awesome!!!! It seems to be working great. Here is the code that I used.
SELECT date, id, crewname
FROM (select * FROM flightrecord WHERE `crewname` = 'brayn'
ORDER BY dutyTimeArrive DESC) as base,
(select @sum := 0) init
WHERE (@sum := @sum + tankDropCount) >= 6
limit 1

MySQL - Rank per month across several months

I'm using MySQL database. I'm looking to generate the rank of customers month by month for the last 6 months.
I just got the following query to work to determine the rank of a customer in a monthly poll. It reports the rank correctly only if the date range falls within a single month.
select
t1.*,
@rownum := @rownum + 1 AS RANK
from
(
select
date_format(EVE_DATE,'%Y-%m') as MON_DATE,
CUST,
SUM(POLL) as SCORE
from
TABLE
where
EVE_DATE >= '2016-01-01' and EVE_DATE <= '2016-01-31'
group by
MON_DATE,
CUST
order by
SCORE desc
)t1,
(SELECT @rownum := 0) r
order by
RANK DESC
The problem is that if I change the date range to span multiple months, the rank shown isn't right. Digging deeper, I realized that when the range spans several months, every customer is listed once per month. The output therefore has number_of_customers * number_of_months rows, and the row counter runs across all of them, so it is no longer a meaningful rank per month.
For example, with 100 customers and a one-month range, the maximum rank is 100, which is correct. With a two-month range, the rank can run from 1 to 200, which is incorrect: there are still only 100 customers, but each appears twice, once per month.
How could I correct the query below to show the rank per month correctly?
select
t2.*
from
(
select
t1.*,
@rownum := @rownum + 1 AS RANK
from
(
select
date_format(EVE_DATE,'%Y-%m') as MON_DATE,
CUST,
SUM(POLL) as SCORE
from
TABLE
where
EVE_DATE >= (curdate() - INTERVAL 3 MONTH)
group by
MON_DATE,
CUST
order by
SCORE desc
)t1,
(SELECT @rownum := 0) r
order by
RANK DESC
)t2
where
t2.CUST= 'customerA'
order by
t2.MON_DATE desc
I'd appreciate any help here to get me going please.
I think you want the inner subquery to aggregate only by customer, not by customer and date:
select t1.*,
@rownum := @rownum + 1 AS RANK
from (select CUST, SUM(POLL) as SCORE
from TABLE
where EVE_DATE >= '2016-01-01' and EVE_DATE <= '2016-01-31'
group by CUST
order by SCORE desc
) t1 cross join
(SELECT @rownum := 0) r
order by RANK DESC;
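If the goal is instead a separate ranking inside each month, as the question describes, one possible sketch on MySQL 8.0 or later is to restart the rank per month with a window function. Here `polls` is a hypothetical table name standing in for TABLE, and monthly_rank is an alias chosen for illustration (RANK itself is a reserved word in 8.0).

```sql
-- Sketch for MySQL 8.0+; `polls` and monthly_rank are placeholder names.
SELECT MON_DATE, CUST, SCORE, monthly_rank
FROM (
    SELECT m.*,
           RANK() OVER (PARTITION BY MON_DATE       -- ranking restarts every month
                        ORDER BY SCORE DESC) AS monthly_rank
    FROM (
        SELECT DATE_FORMAT(EVE_DATE, '%Y-%m') AS MON_DATE,
               CUST,
               SUM(POLL) AS SCORE
        FROM polls
        WHERE EVE_DATE >= CURDATE() - INTERVAL 6 MONTH
        GROUP BY MON_DATE, CUST
    ) m
) ranked
WHERE CUST = 'customerA'
ORDER BY MON_DATE DESC;
```

The customer filter sits in the outermost query on purpose: applying it before RANK() would leave only one customer per month to rank, and every rank would come out as 1. RANK() gives tied scores the same rank; ROW_NUMBER() could be swapped in if ties should be broken arbitrarily.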

MySQL - How to select rows with the min(timestamp) per hour of a given date

I have a table of production readings and need to get a result set containing a row for the min(timestamp) for EACH hour.
The column layout is quite simple:
ID,TIMESTAMP,SOURCE_ID,SOURCE_VALUE
The data sample would look like:
123,'2013-03-01 06:05:24',PMPROD,12345678.99
124,'2013-03-01 06:15:17',PMPROD,88888888.99
125,'2013-03-01 06:25:24',PMPROD,33333333.33
126,'2013-03-01 06:38:14',PMPROD,44444444.44
127,'2013-03-01 07:12:04',PMPROD,55555555.55
128,'2013-03-01 10:38:14',PMPROD,44444444.44
129,'2013-03-01 10:56:14',PMPROD,22222222.22
130,'2013-03-01 15:28:02',PMPROD,66666666.66
Records are added to this table throughout the day and the source_value is already calculated, so no sum is needed.
I can't figure out how to get a row for the min(timestamp) for each hour of the current_date.
select *
from source_readings
use index(ID_And_Time)
where source_id = 'PMPROD'
and date(timestamp)=CURRENT_DATE
and timestamp =
( select min(timestamp)
from source_readings use index(ID_And_Time)
where source_id = 'PMPROD'
)
The above code, of course, gives me one record. I need one record for the min(hour(timestamp)) of the current_date.
My result set should contain the rows for IDs: 123,127,128,130. I've played with it for hours. Who can be my hero? :)
Try below:
SELECT * FROM source_readings
JOIN
(
SELECT ID, DATE_FORMAT(timestamp, '%Y-%m-%d %H') as current_hour,MIN(timestamp)
FROM source_readings
WHERE source_id = 'PMPROD'
GROUP BY current_hour
) As reading_min
ON source_readings.ID = reading_min.ID
SELECT a.*
FROM Table1 a
INNER JOIN
(
SELECT DATE(TIMESTAMP) date,
HOUR(TIMESTAMP) hour,
MIN(TIMESTAMP) min_date
FROM Table1
GROUP BY DATE(TIMESTAMP), HOUR(TIMESTAMP)
) b ON DATE(a.TIMESTAMP) = b.date AND
HOUR(a.TIMESTAMP) = b.hour AND
a.timestamp = b.min_date
With window function:
WITH ranked AS (
SELECT *, ROW_NUMBER() OVER(PARTITION BY HOUR(timestamp) ORDER BY timestamp) rn
FROM source_readings -- original table
WHERE date(timestamp)=CURRENT_DATE AND source_id = 'PMPROD' -- your custom filter
)
SELECT * -- this will contain `rn` column. you can select only necessary columns
FROM ranked
WHERE rn=1
I haven't tested it, but the basic idea is:
1) ROW_NUMBER() OVER(PARTITION BY HOUR(timestamp) ORDER BY timestamp)
This will give each row a number, starting from 1 for each hour, increasing by timestamp. The result might look like:
|rest of columns |rn
123,'2013-03-01 06:05:24',PMPROD,12345678.99,1
124,'2013-03-01 06:15:17',PMPROD,88888888.99,2
125,'2013-03-01 06:25:24',PMPROD,33333333.33,3
126,'2013-03-01 06:38:14',PMPROD,44444444.44,4
127,'2013-03-01 07:12:04',PMPROD,55555555.55,1
128,'2013-03-01 10:38:14',PMPROD,44444444.44,1
129,'2013-03-01 10:56:14',PMPROD,22222222.22,2
130,'2013-03-01 15:28:02',PMPROD,66666666.66,1
2) Then in the main query we select only the rows with rn=1, in other words the rows that have the lowest timestamp in each hourly partition (the first row of each hour after sorting by timestamp).