Catching latest column value change in SQL - mysql

How can I get the date for the latest value change in one column with one SQL query?
Possible database situation:
Date State
2012-11-25 state one
2012-11-26 state one
2012-11-27 state two
2012-11-28 state two
2012-11-29 state one
2012-11-30 state one
So the result should be 2012-11-29, the date of the latest state change. If I group by the State value, I get the date of the first time each state appears in the database.
The query will group the table on state and show the state and in the date field the latest date created of that state.
From the given input the output would be
Date State
2012-11-30 state one
2012-11-28 state two

This will get you the last state:
-- Query 1
SELECT state
FROM tableX
ORDER BY date DESC
LIMIT 1 ;
Encapsulating the above, we can use it to get the date just before the last change:
-- Query 2
SELECT t.date
FROM tableX AS t
JOIN
( SELECT state
FROM tableX
ORDER BY date DESC
LIMIT 1
) AS last
ON last.state <> t.state
ORDER BY t.date DESC
LIMIT 1 ;
And then use that to find the date (or the whole row) where the last change occurred:
-- Query 3
SELECT a.date -- can also be used: a.*
FROM tableX AS a
JOIN
( SELECT t.date
FROM tableX AS t
JOIN
( SELECT state
FROM tableX
ORDER BY date DESC
LIMIT 1
) AS last
ON last.state <> t.state
ORDER BY t.date DESC
LIMIT 1
) AS b
ON a.date > b.date
ORDER BY a.date
LIMIT 1 ;
Tested in SQL-Fiddle
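As a quick sanity check of Query 3, here is a minimal sketch that runs it against the sample data from the question. It uses Python's built-in sqlite3 module as a stand-in for MySQL, purely so the example is self-contained; that substitution, the helper name latest_change_date and the derived-table alias latest (used instead of last) are assumptions of this sketch, not part of the answer.

```python
import sqlite3

# Sample rows from the question.
SAMPLE = [
    ("2012-11-25", "state one"),
    ("2012-11-26", "state one"),
    ("2012-11-27", "state two"),
    ("2012-11-28", "state two"),
    ("2012-11-29", "state one"),
    ("2012-11-30", "state one"),
]

# Query 3 from the answer; only the alias "last" was renamed to "latest".
QUERY_3 = """
SELECT a.date
FROM tableX AS a
JOIN (
    SELECT t.date
    FROM tableX AS t
    JOIN (
        SELECT state FROM tableX ORDER BY date DESC LIMIT 1
    ) AS latest ON latest.state <> t.state
    ORDER BY t.date DESC
    LIMIT 1
) AS b ON a.date > b.date
ORDER BY a.date
LIMIT 1
"""


def latest_change_date(rows):
    """Return the date of the most recent state change, or None if there is none."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tableX (date TEXT, state TEXT)")
    conn.executemany("INSERT INTO tableX VALUES (?, ?)", rows)
    result = conn.execute(QUERY_3).fetchone()
    conn.close()
    return result[0] if result else None


print(latest_change_date(SAMPLE))  # prints 2012-11-29
```

Note that when the table holds only one state, the inner join finds no row with a different state and the query returns nothing.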
And a solution that uses MySQL variables:
-- Query 4
SELECT date
FROM
( SELECT t.date
, @r := (@s <> state) AS result
, @s := state AS prev_state
FROM tableX AS t
CROSS JOIN
( SELECT @r := 0, @s := ''
) AS dummy
ORDER BY t.date ASC
) AS tmp
WHERE result = 1
ORDER BY date DESC
LIMIT 1 ;

I believe this is the answer:
SELECT
DISTINCT State AS State, `Date`
FROM
Table_1 t1
WHERE t1.`Date`=(SELECT MAX(`Date`) FROM Table_1 WHERE State=t1.State)
...and the test:
http://sqlfiddle.com/#!2/8b0d8/5

If you add another column, `changed` (DATETIME), you can fill it with an update trigger that sets it to NOW(). If you then query your table ordered by the changed column in descending order, the most recently updated row ends up first.
DELIMITER $$
CREATE TRIGGER `trigger` BEFORE UPDATE ON `table`
FOR EACH ROW
BEGIN
SET NEW.changed = NOW();
END$$
DELIMITER ;

Try this:
Select
MAX(`Date`), state from mytable
group by state

If you had been using Postgres, you could compare different rows in the same table using "LEAD ... OVER". I have not managed to find the same functionality in MySQL.
A bit hairy, but I think this will do:
select min(t1.date) from table_1 t1 where
(select count(distinct state) from table_1 where table_1.date>=t1.date)=1
Basically, this asks for the earliest date from which no further changes in state are found in any later rows. Be warned: this query may scale terribly for large data sets.
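To make the reasoning concrete, here is a minimal sketch that runs the same query against the question's sample data. Python's sqlite3 module is used as a stand-in for MySQL so the example is self-contained; that choice, the inner alias later and the helper name first_stable_date are assumptions of this sketch.

```python
import sqlite3

# Sample rows from the question.
SAMPLE = [
    ("2012-11-25", "state one"),
    ("2012-11-26", "state one"),
    ("2012-11-27", "state two"),
    ("2012-11-28", "state two"),
    ("2012-11-29", "state one"),
    ("2012-11-30", "state one"),
]

# For each row, count the distinct states from that date onwards; the earliest
# row for which only one state remains marks the latest change.
QUERY = """
SELECT MIN(t1.date)
FROM table_1 AS t1
WHERE (
    SELECT COUNT(DISTINCT later.state)
    FROM table_1 AS later
    WHERE later.date >= t1.date
) = 1
"""


def first_stable_date(rows):
    """Return the earliest date after which the state never changes again."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE table_1 (date TEXT, state TEXT)")
    conn.executemany("INSERT INTO table_1 VALUES (?, ?)", rows)
    (result,) = conn.execute(QUERY).fetchone()
    conn.close()
    return result


print(first_stable_date(SAMPLE))  # prints 2012-11-29
```

The correlated subquery is evaluated once per row, which is why the cost grows roughly quadratically without a suitable index. Also note that if the state never changes at all, this query returns the earliest date in the table rather than nothing.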

I think your best choice here is analytic functions. Try this - it should be OK performance-wise:
SELECT *
FROM test
WHERE my_date = (SELECT MAX(my_date)
                 FROM (SELECT my_date,
                              state,
                              LAG(state) OVER (ORDER BY my_date) AS lag_val
                       FROM test) a
                 WHERE state != lag_val)
In the inner select, the LAG function gets the previous row's value of the STATE column; the select around it keeps the dates of a change, i.e. the rows whose lag value differs from the current state value. The outermost query then takes the latest of those change dates. I hope that this is what you needed.
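The LAG idea can be tried out with a minimal sketch on the question's sample data. It uses Python's sqlite3 module as a stand-in (an assumption of this sketch; it needs a bundled SQLite of 3.25 or newer for window functions, and MySQL itself supports LAG from version 8.0). The table and column names follow the answer; the helper name change_dates is hypothetical.

```python
import sqlite3

# Sample rows from the question.
SAMPLE = [
    ("2012-11-25", "state one"),
    ("2012-11-26", "state one"),
    ("2012-11-27", "state two"),
    ("2012-11-28", "state two"),
    ("2012-11-29", "state one"),
    ("2012-11-30", "state one"),
]

# Every row whose state differs from the previous row's state is a change.
# The first row has a NULL lag value, so the comparison filters it out.
CHANGES = """
SELECT my_date
FROM (
    SELECT my_date,
           state,
           LAG(state) OVER (ORDER BY my_date) AS lag_val
    FROM test
) AS a
WHERE state <> lag_val
ORDER BY my_date
"""


def change_dates(rows):
    """Return all dates on which the state differs from the previous row."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE test (my_date TEXT, state TEXT)")
    conn.executemany("INSERT INTO test VALUES (?, ?)", rows)
    dates = [row[0] for row in conn.execute(CHANGES)]
    conn.close()
    return dates


dates = change_dates(SAMPLE)
print(dates)       # prints ['2012-11-27', '2012-11-29']
print(max(dates))  # prints 2012-11-29
```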

SELECT MAX(DATE) FROM YOUR_TABLE
Above answer doesn't seem to satisfy what OP needs.
UPDATED ANSWER WITH AFTER INSERT/UPDATE TRIGGER
-- User variables are session-scoped and need no declaration; initialise them once.
SET @latestState = NULL;
SET @latestDate = NULL;
DELIMITER $$
CREATE TRIGGER latestInsertTrigger AFTER INSERT ON myTable
FOR EACH ROW
BEGIN
  -- An INSERT trigger has no OLD row, so every newly inserted row is recorded.
  SET @latestState = NEW.state;
  SET @latestDate = NEW.date;
END$$
CREATE TRIGGER latestUpdateTrigger AFTER UPDATE ON myTable
FOR EACH ROW
BEGIN
  IF OLD.date = NEW.date AND OLD.state <> NEW.state THEN
    SET @latestState = NEW.state;
    SET @latestDate = NEW.date;
  END IF;
END$$
DELIMITER ;
You may use the following query to get the latest record added/updated:
SELECT DATE, STATE FROM myTable
WHERE STATE = @latestState
OR DATE = @latestDate
ORDER BY DATE DESC
;
Results:
DATE STATE
November, 30 2012 00:00:00+0000 state one
November, 28 2012 00:00:00+0000 state two
November, 27 2012 00:00:00+0000 state two
The above query results need to be limited to 2, 3 or n rows, depending on what you need.
Frankly, judging by the data sample you have given, it seems you want the max of both columns, assuming that your state only increases with the date. I only wish the state were an integer :D
Then a union of two max subqueries on both columns would have solved it easily. Still, a regex string manipulation could find the max in the state column. Finally, this approach needs LIMIT x, and it still has a loophole. Anyway, it took me some time to figure out your need :$

Related

MYSQL - Filter consecutive not null dates

Get only the biggest date:
These are check-in and check-out records of employees. Sometimes they make two or more entries in the system in a row; in this sample there were two check-outs in a row. Assuming these rows are always ordered, I would like to keep the latest date in the case of a check-out and the earliest date in the case of a check-in.
In that case I would like to have this:
The smaller date was excluded:
DEMO
Try this. In the big CASE expression I increment a counter column by one whenever checkin switches from null to not null or the other way around. Then it is enough to group by this column, taking the max of checkout and the min of checkin:
select @checkinLag := null, @rn := 0;
select max(id),
functionario,
loja,
min(checkin),
max(checkout)
from (
select case when (checkinLag is null and checkin is not null) or
(checkinLag is not null and checkin is null)
then @rn := @rn + 1 else @rn end rn,
checkin,
checkout,
loja,
id,
functionario
from (
select @checkinLag checkinLag,
@checkinLag := checkin,
checkin,
checkout,
loja,
id,
functionario
from dummyTable
order by coalesce(checkin, checkout)
) a
) a group by functionario, loja, rn
I have used subqueries to guarantee the order in which the expressions are evaluated (assigning and using @checkinLag), as Gordon Linoff pointed out.
Demo
My solution:
Select
*
from dummyTable base
where (base.checkout is null or not exists (
select
1
from dummyTable co
where co.checkout between base.checkout and DATE_ADD(base.checkout, INTERVAL 5 SECOND)
and base.id <> co.id
and base.functionario = co.functionario
and base.loja = co.loja
)) and (base.checkin is null or not exists (
select
1
from dummyTable ci
where ci.checkin between DATE_SUB(base.checkin, INTERVAL 5 SECOND) and base.checkin
and base.id <> ci.id
and base.functionario = ci.functionario
and base.loja = ci.loja
));
You can test the query here. The rows do not need to be ordered. I chose 5 seconds as the interval within which repeated check-ins/check-outs are ignored.

Grouping rows via two different columns in MYSQL

I just want to ask whether it is possible to group rows that share the same value when that value comes from different columns.
In my scenario we should sum up the total minutes when records form "continuous" transactions, that is, when a row's STARTDATETIME matches the ENDDATETIME of the previous row. See image link below for reference.
Thanks guys.
I modified Gordon Linoff's solution ( see my comment under the question):
SELECT
c.employee_id
,MIN(c.start_date) AS start_date
,MAX(c.end_date) AS end_date
,COUNT(*) AS numcontracts,
TIMESTAMPDIFF(minute,MIN(c.start_date),MAX(c.end_date)) AS timediff
FROM
(
SELECT
c0.*
,(@rn := @rn + COALESCE(startflag, 0)) AS cumestarts
FROM
(SELECT c1.*,
(NOT EXISTS (SELECT 1
FROM contracts c2
WHERE c1.employee_id = c2.employee_id AND
c1.start_date = c2.end_date
)
) AS startflag
FROM contracts c1
ORDER BY employee_id, start_date
) c0 CROSS JOIN (SELECT @rn := 0) params
) c
GROUP BY c.employee_id, c.cumestarts
http://rextester.com/VOGMU19779
timediff contains the minutes passed in the combined interval.
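The grouping logic of this query can be sketched outside SQL as follows: a row starts a new group when no row of the same employee ends exactly at its start, and all following rows join that group until the next such start. This is a plain Python illustration under stated assumptions; the function name merge_contiguous and the sample intervals are hypothetical, not taken from the question's data.

```python
from datetime import datetime


def merge_contiguous(contracts):
    """Merge back-to-back intervals per employee.

    contracts: list of (employee_id, start, end) tuples with datetime values.
    Returns (employee_id, start, end, row_count, minutes) per merged group.
    """
    ordered = sorted(contracts, key=lambda c: (c[0], c[1]))
    # All (employee, end) pairs; a start found here continues an earlier row.
    ends = {(emp, end) for emp, _, end in ordered}
    groups = []
    for emp, start, end in ordered:
        starts_new_group = (emp, start) not in ends
        if starts_new_group or not groups or groups[-1][0] != emp:
            groups.append([emp, start, end, 1])
        else:
            groups[-1][2] = max(groups[-1][2], end)
            groups[-1][3] += 1
    # Whole minutes between the group's first start and last end.
    return [
        (emp, s, e, n, int((e - s).total_seconds() // 60))
        for emp, s, e, n in groups
    ]


def at(hour, minute):
    """Build a timestamp on an arbitrary example day."""
    return datetime(2017, 1, 1, hour, minute)


rows = [
    (1, at(8, 0), at(8, 30)),
    (1, at(8, 30), at(9, 15)),   # continues the previous row
    (1, at(10, 0), at(10, 20)),  # gap, so a new group starts
    (2, at(9, 0), at(9, 45)),
]

for group in merge_contiguous(rows):
    print(group)
```

For the sample rows this yields three groups lasting 75, 20 and 45 minutes.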

Eliminate First 14 For Each Symbol From Query

The following query pulls all rows that do not exist in a relative_strength_index table. But I also need to eliminate the first 14 rows for each symbol based on date asc from the historical_data table. I have tried several attempts to do this but am having real trouble with the 14 days. How could this issue be resolved and added into my current query?
Current Query
select *
from historical_data hd
where not exists (select rsi_symbol, rsi_date from relative_strength_index where hd.symbol = rsi_symbol and hd.histDate = rsi_date);
What you want is the first argument of the LIMIT clause, which states which row to start from, combined with an ascending ORDER BY.
select * from historical_data hd where not exists (select rsi_symbol, rsi_date from relative_strength_index where hd.symbol = rsi_symbol and hd.histDate = rsi_date ORDER BY rsi_date ASC LIMIT 14)
Use OFFSET along with LIMIT like this. It will return a maximum of 100,000 rows, starting at row 15:
select *
from historical_data hd
where not exists (select rsi_symbol, rsi_date from relative_strength_index where hd.symbol = rsi_symbol and hd.histDate = rsi_date)
order by hd.histDate asc
limit 100000 offset 14;
Because you're using LIMIT and OFFSET, you will want an ORDER BY that fixes the row order before the limit and offset are applied.
UPDATE: You mentioned "for each symbol", so try this query. It ranks the rows of each symbol by date ascending, then selects only the rows where the rank is >= 15.
SELECT *
FROM
(select hd.*,
CASE WHEN @previous_symbol = hd.symbol THEN @rank := @rank + 1
ELSE @rank := 1
END as row_rank, -- "rank" is a reserved word in MySQL 8.0
@previous_symbol := hd.symbol
from historical_data hd
where not exists (select rsi_symbol, rsi_date from relative_strength_index where hd.symbol = rsi_symbol and hd.histDate = rsi_date)
order by hd.symbol, hd.histDate asc
)T
WHERE T.row_rank >= 15
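For comparison, skipping the first N rows per symbol can also be written without session variables, by counting how many earlier rows each row has. The sketch below swaps in that correlated-count technique (it is not the query above), runs it on Python's sqlite3 module as a stand-in for MySQL, leaves out the NOT EXISTS filter for brevity, and assumes one row per symbol and date. The helper name rows_after_first_n and the sample symbols are hypothetical.

```python
import sqlite3

# A row is kept when at least :n rows of the same symbol have an earlier date.
SKIP_FIRST_N = """
SELECT hd.symbol, hd.histDate
FROM historical_data AS hd
WHERE (
    SELECT COUNT(*)
    FROM historical_data AS earlier
    WHERE earlier.symbol = hd.symbol
      AND earlier.histDate < hd.histDate
) >= :n
ORDER BY hd.symbol, hd.histDate
"""


def rows_after_first_n(rows, n):
    """Return the rows that remain after dropping the first n dates per symbol."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE historical_data (symbol TEXT, histDate TEXT)")
    conn.executemany("INSERT INTO historical_data VALUES (?, ?)", rows)
    result = conn.execute(SKIP_FIRST_N, {"n": n}).fetchall()
    conn.close()
    return result


sample = [
    ("AAA", "2016-01-04"), ("AAA", "2016-01-05"),
    ("AAA", "2016-01-06"), ("AAA", "2016-01-07"),
    ("BBB", "2016-01-04"), ("BBB", "2016-01-05"), ("BBB", "2016-01-06"),
]

# Uses n=2 to keep the sample small; the question would use n=14.
print(rows_after_first_n(sample, 2))
```

With n=2 the sample keeps the last two AAA rows and the last BBB row.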
It's not clear (to me) what resultset you want to return, or the conditions that specify whether a row should be returned.
All we have to go on is a confusingly vague description, to exclude "the first 14 rows", or "the first 14 days" for each symbol.
What we don't have is a representative sample of the data, or an example of what rows should be returned.
Without that, we don't have a way to know if we understand the description of the specification, and we don't have anything to test against or to compare our results to.
So, we are basically just guessing. (Which seems to be the most popular kind of answer provided by the "try this" enthusiasts.)
I can provide some examples of some patterns, which may suit your specification, or may not.
To get the earliest `histdate` for each `symbol`, and add 14 days to that, we can use an inline view. We can then do a semi-join to the `historical_data` data, to exclude rows that have a `histdate` before the date returned from the inline view.
(This is based on an assumption that the datatype of the `histdate` column is DATE.)
SELECT hd.*
FROM ( SELECT d.symbol
, MIN(d.histdate) + INTERVAL 14 DAY AS histdate
FROM historical_data d
GROUP BY d.symbol
) dd
JOIN historical_data hd
ON hd.symbol = dd.symbol
AND hd.histdate > dd.histdate
ORDER
BY hd.symbol
, hd.histdate
But that query doesn't include any reference to the `relative_strength_index` table. The original query includes a NOT EXISTS predicate, with a correlated subquery of the `relative_strength_index` table.
If the goal is get the earliest `rsi_date` for each `rsi_symbol` from that table, and then add 14 days to that value...
SELECT hd.*
FROM ( SELECT rsi.rsi_symbol
, MIN(rsi.rsi_date) + INTERVAL 14 DAY AS rsi_date
FROM relative_strength_index rsi
GROUP BY rsi.rsi_symbol
) rs
JOIN historical_data hd
ON hd.symbol = rs.rsi_symbol
AND hd.histdate > rs.rsi_date
ORDER
BY hd.symbol
, hd.histdate
If the goal is to exclude rows where a matching row in relative_strength_index already exists, I would use an anti-join pattern...
SELECT hd.*
FROM ( SELECT d.symbol
, MIN(d.histdate) + INTERVAL 14 DAY AS histdate
FROM historical_data d
GROUP BY d.symbol
) dd
JOIN historical_data hd
ON hd.symbol = dd.symbol
AND hd.histdate > dd.histdate
LEFT
JOIN relative_strength_index xr
ON xr.rsi_symbol = hd.symbol
AND xr.rsi_date = hd.histdate
WHERE xr.rsi_symbol IS NULL
ORDER
BY hd.symbol
, hd.histdate
These are just example query patterns, which are likely not suited to your exact specification, since they are guesses.
It doesn't make much sense to provide more examples of other patterns, without a more detailed specification.

MySQL aggregate sum of count

I have a simple group by query:
SELECT timestamp, COUNT(users)
FROM my_table
GROUP BY users
How do I add a sum_each_day column that will sum the users count of each row and will aggregate it forward to the next row and so on
The output should be like this:
timestamp | users | sum_each_day
2015-11-27 1 1
2015-11-28 5 6
2015-11-29 3 9
2015-11-30 7 16
Thanks in advance
You could use a sub-query, like this:
SELECT timestamp,
num_users,
(SELECT COUNT(users)
FROM my_table
WHERE timestamp <= main.timestamp) sum_users
FROM (
SELECT timestamp,
COUNT(users) num_users
FROM my_table
GROUP BY timestamp
) main
If you really need this in MySQL it will cost some performance, but I believe a subquery with a count will solve it:
SELECT t1.timestamp, COUNT(*) AS users, (SELECT COUNT(*) FROM my_table t2 WHERE t2.timestamp <= t1.timestamp) AS sum_each_day FROM my_table t1 GROUP BY t1.timestamp
If you display this data through a scripting language like PHP it would be easier to keep a counter and display the aggregate per row.
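The counter approach is a one-pass running total over the grouped rows. Here is a minimal sketch in Python rather than PHP; the daily counts are the ones from the question's expected output, and the function name with_running_total is hypothetical.

```python
from itertools import accumulate

# (timestamp, users) pairs as the grouped query would return them.
daily_counts = [
    ("2015-11-27", 1),
    ("2015-11-28", 5),
    ("2015-11-29", 3),
    ("2015-11-30", 7),
]


def with_running_total(rows):
    """Append a cumulative sum of the counts to each (day, count) row.

    rows must be a list ordered by day, because it is iterated twice.
    """
    totals = accumulate(count for _, count in rows)
    return [(day, count, total) for (day, count), total in zip(rows, totals)]


for row in with_running_total(daily_counts):
    print(row)
```

The third value of each row reproduces the sum_each_day column from the question: 1, 6, 9, 16.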
I would do this using variable:
SET @total := 0;
SELECT timestamp, DayCount, (@total := @total + DayCount) AS Total
FROM
(SELECT timestamp, COUNT(users) AS DayCount
FROM my_table
GROUP BY timestamp) AS t1
Fiddle: I am not using your table structure here, but you can get the idea.
If I understand correctly, this will work:
set @c=0;
SELECT `timestamp`,sum(`users`),(select @c:=@c+sum(`users`))
FROM `my_table`
group by `timestamp`;

Group rows by time interval

I have a database with a very large number of rows from a GPS sender. The GPS sends a new row to the database every second. I want to build a web interface that shows trips, and I don't want to show every row; I want to group the rows into trips. So I need a query that detects a trip by checking whether there are more than 14 minutes until the next row: if so, all rows before the gap become one trip and get a trip number; otherwise the row is added to the current trip.
Try this (example is at http://sqlfiddle.com/#!2/a0c86/39)
SELECT Trip, MIN(Date_Time), MAX(Date_Time)
FROM (
SELECT @Trip := IF(TIMESTAMPDIFF(MINUTE, @Date_Time, Date_Time) <= 20, @Trip, @Trip+1) AS TRIP
, logid
, @Date_Time := Date_time AS Date_Time
FROM gpslog
JOIN (SELECT @TRIP := 1, @Date_Time := null ) AS tmp
ORDER BY Date_Time) AS triplist
GROUP BY Trip
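The same gap-based grouping can be sketched in plain Python, which makes the rule easy to see: a new trip starts whenever the pause since the previous row exceeds a threshold. The function name split_into_trips and the sample timestamps are hypothetical. The default threshold of 20 minutes follows the query above (the question mentions 14), and unlike the query this sketch numbers trips from 1 and compares exact durations rather than whole minutes.

```python
from datetime import datetime, timedelta


def split_into_trips(timestamps, max_gap=timedelta(minutes=20)):
    """Group timestamps into trips separated by pauses longer than max_gap.

    Returns (trip_number, first_timestamp, last_timestamp) per trip.
    """
    trips = []
    for ts in sorted(timestamps):
        if trips and ts - trips[-1][-1] <= max_gap:
            trips[-1].append(ts)   # still within the current trip
        else:
            trips.append([ts])     # pause too long (or first row): new trip
    return [(i, trip[0], trip[-1]) for i, trip in enumerate(trips, start=1)]


log = [
    datetime(2013, 1, 1, 10, 0, 0),
    datetime(2013, 1, 1, 10, 0, 1),
    datetime(2013, 1, 1, 10, 0, 2),
    datetime(2013, 1, 1, 11, 0, 0),  # one-hour pause: second trip
    datetime(2013, 1, 1, 11, 0, 1),
]

for trip in split_into_trips(log):
    print(trip)
```

Passing max_gap=timedelta(minutes=14) would match the threshold stated in the question.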