Comparing Row Values Based On User MySQL - mysql

I thought every MySQL query question would be answered by now, but I have a query that I can't find a good answer to.
I would like to run a query that analyzes the differences between rows in one table, grouped by the SecState column and ordered by Timestamp. That is, for every row, find the next (or previous) reading with the same SecState by Timestamp, and get the differences in the Timestamp, Location, and Days values.
Input Table Example:
+----------+---------------+-------------+-------------+
| SecState | Timestamp | Location | Days |
+----------+---------------+-------------+-------------+
| 1 | 1574614810000 | 0.030520002 | 0.068209626 |
| 2 | 1574614810000 | 0.000491507 | 0.000124721 |
| 1 | 1574614780000 | 0.030519481 | 0.068209626 |
| 2 | 1574614780000 | 0.000491507 | 0.000124721 |
| 3 | 1574614752000 | 1 | 1 |
| 3 | 1574614731000 | 1 | 1 |
| 1 | 1574614750000 | 0.03051896 | 0.068209626 |
| 2 | 1574614750000 | 0.000491493 | 0.000124721 |
| 1 | 1574614720000 | 0.030518439 | 0.068206906 |
| 2 | 1574614720000 | 0.00049148 | 0.000124721 |
| 1 | 1574614690000 | 0.030517918 | 0.068206906 |
| 2 | 1574614690000 | 0.00049148 | 0.000124721 |
| 3 | 1574614671000 | 1 | 1 |
| 3 | 1574614631000 | 1 | 1 |
| 3 | 1574614571000 | 1 | 1 |
| 1 | 1574614660000 | 0.030517397 | 0.068206906 |
| 2 | 1574614660000 | 0.000491467 | 0.000124721 |
| 1 | 1574614630000 | 0.030516876 | 0.068206906 |
+----------+---------------+-------------+-------------+
Thanks!

If you are running MySQL 8.0, you can use the window function lead() to access columns of the next record.
Something like this should be what you want:
select
t.*,
lead(Timestamp) over(partition by SecState order by Timestamp)
- Timestamp TimestampDiff,
lead(Location) over(partition by SecState order by Timestamp)
- Location LocationDiff,
lead(Days) over(partition by SecState order by Timestamp)
- Days DaysDiff
from mytable t
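To try the lead() query without a MySQL 8.0 server, here is a minimal sketch that runs it against an in-memory SQLite database from Python. SQLite (3.25 or later) is only a stand-in that happens to accept the same window syntax, and the five rows are a subset of the question's sample data; only the Timestamp difference is shown, since Location and Days follow the same pattern.

```python
import sqlite3

# In-memory SQLite is an assumption made for the demo; the question is about MySQL.
conn = sqlite3.connect(":memory:")
conn.execute("create table mytable (SecState int, Timestamp int, Location real, Days real)")
conn.executemany(
    "insert into mytable values (?, ?, ?, ?)",
    [
        (1, 1574614810000, 0.030520002, 0.068209626),
        (1, 1574614780000, 0.030519481, 0.068209626),
        (1, 1574614750000, 0.03051896, 0.068209626),
        (3, 1574614752000, 1, 1),
        (3, 1574614731000, 1, 1),
    ],
)

# lead() looks one row ahead within the same SecState, ordered by Timestamp.
query = """
select
    SecState,
    Timestamp,
    lead(Timestamp) over (partition by SecState order by Timestamp) - Timestamp as TimestampDiff
from mytable
order by SecState, Timestamp
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

The last reading of each SecState has no following row, so its difference comes back as NULL (None in Python).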
In earlier versions, you can self-join the table and use a correlated subquery with a not exists condition to locate the next record, like so:
select
t.*,
t1.Timestamp - t.Timestamp TimestampDiff,
t1.Location - t.Location LocationDiff,
t1.Days - t.Days DaysDiff
from mytable t
left join mytable t1
on t1.SecState = t.SecState
and t1.Timestamp > t.Timestamp
and not exists (
select 1
from mytable t2
where
t2.SecState = t.SecState
and t2.Timestamp > t.Timestamp
and t2.Timestamp < t1.Timestamp
)
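As a sanity check, the self-join form can be compared with the lead() form on the same data. The sketch below does that in an in-memory SQLite database from Python; SQLite is a stand-in for MySQL, and the small integer timestamps are made-up values chosen to keep the output readable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table mytable (SecState int, Timestamp int)")
# Hypothetical readings, deliberately inserted out of order.
conn.executemany(
    "insert into mytable values (?, ?)",
    [(1, 300), (1, 100), (1, 200), (2, 150), (2, 400)],
)

window_sql = """
select SecState, Timestamp,
       lead(Timestamp) over (partition by SecState order by Timestamp) - Timestamp
from mytable
order by SecState, Timestamp
"""

# Pre-8.0 form: join to a later row, and require that no row sits in between.
join_sql = """
select t.SecState, t.Timestamp, t1.Timestamp - t.Timestamp
from mytable t
left join mytable t1
    on t1.SecState = t.SecState
    and t1.Timestamp > t.Timestamp
    and not exists (
        select 1
        from mytable t2
        where t2.SecState = t.SecState
          and t2.Timestamp > t.Timestamp
          and t2.Timestamp < t1.Timestamp
    )
order by t.SecState, t.Timestamp
"""

window_rows = conn.execute(window_sql).fetchall()
join_rows = conn.execute(join_sql).fetchall()
print(window_rows == join_rows)
```

The two forms agree here because timestamps are unique within each SecState. If two rows of the same SecState shared a timestamp, the join form would pair a row with every row at the next distinct timestamp, so the results could differ.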

Related

Select complete record with earliest timestamp on a day for each employee [duplicate]

I have a table that stores facial login data of employees based upon employee id. I need to get the earliest login for each employee on a day and all other logins to be ignored. I know how to get latest or earliest record for each employee but I am unable to figure out how to get earliest entry in each day by each employee.
+----+-----------+--------------------------------------+-------------+-----------------------+
| id | camera_id | image_name | employee_id | created_at |
+----+-----------+--------------------------------------+-------------+-----------------------+
| 10 | 2 | pjcc7vf142pec6li7k8kqxuqvnmhm0tyo8ib | 16 | 2020-07-11 10:40:20 |
| 11 | 2 | 9iizfdtk3m81a745ut7tzqzqh8kf9ipz2u02 | 2 | 2020-07-11 10:40:22 |
| 14 | 2 | 3p74yrq35nfaazwdo8auguvn2h5hpugtfvvw | 2 | 2020-07-11 12:07:24 |
| 15 | 2 | hpa2am40ufke7o7q2y733hh83h7ykxxdgkof | 16 | 2020-07-11 12:09:35 |
| 16 | 2 | g7adgyzloab2t4z7xx2id0a9cjqx8ojfni99 | 2 | 2020-07-11 12:09:41 |
| 17 | 2 | tapufkiuj5toxfdoikjicbe3k7tl32yj5khp | 16 | 2020-07-12 12:09:47 |
| 18 | 2 | pjcc7vf142pec6li7k8kqxuqvnmhm0tyo8ib | 16 | 2020-07-12 14:40:20 |
| 19 | 2 | 9iizfdtk3m81a745ut7tzqzqh8kf9ipz2u02 | 2 | 2020-07-12 15:40:22 |
| 20 | 2 | 3p74yrq35nfaazwdo8auguvn2h5hpugtfvvw | 2 | 2020-07-12 16:07:24 |
| 21 | 2 | hpa2am40ufke7o7q2y733hh83h7ykxxdgkof | 16 | 2020-07-12 17:09:35 |
| 22 | 2 | g7adgyzloab2t4z7xx2id0a9cjqx8ojfni99 | 2 | 2020-07-13 12:09:41 |
+----+-----------+--------------------------------------+-------------+-----------------------+
The result should look like this:
+----+-----------+--------------------------------------+-------------+-----------------------+
| id | camera_id | image_name | employee_id | created_at |
+----+-----------+--------------------------------------+-------------+-----------------------+
| 10 | 2 | pjcc7vf142pec6li7k8kqxuqvnmhm0tyo8ib | 16 | 2020-07-11 10:40:20 |
| 11 | 2 | 9iizfdtk3m81a745ut7tzqzqh8kf9ipz2u02 | 2 | 2020-07-11 10:40:22 |
| 17 | 2 | tapufkiuj5toxfdoikjicbe3k7tl32yj5khp | 16 | 2020-07-12 12:09:47 |
| 19 | 2 | 9iizfdtk3m81a745ut7tzqzqh8kf9ipz2u02 | 2 | 2020-07-12 15:40:22 |
| 22 | 2 | g7adgyzloab2t4z7xx2id0a9cjqx8ojfni99 | 2 | 2020-07-13 12:09:41 |
+----+-----------+--------------------------------------+-------------+-----------------------+
You can do:
select *
from t
where (employee_id, created_at) in (
select employee_id, min(created_at)
from t
group by employee_id, date(created_at)
)
how to get earliest entry in each day by each employee
You can filter with a correlated subquery:
select t.*
from mytable t
where t.created_at = (
select min(t1.created_at)
from mytable t1
where
t1.employee_id = t.employee_id
and t1.created_at >= date(t.created_at)
and t1.created_at < date(t.created_at) + interval 1 day
)
This query would take advantage of an index on (employee_id, created_at).
Or, if you are running MySQL 8.0, you can use window functions:
select *
from (
select
t.*,
row_number() over(
partition by employee_id, date(created_at)
order by created_at
) rn
from mytable t
) t
where rn = 1
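The approaches above can be checked against the expected result in the question. This is a minimal sketch using Python's sqlite3 module with an in-memory database as a stand-in for MySQL; it assumes SQLite 3.25 or later, and the camera_id and image_name columns are left out because they play no part in the filter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table mytable (id integer primary key, employee_id int, created_at text)")
conn.executemany(
    "insert into mytable values (?, ?, ?)",
    [
        (10, 16, "2020-07-11 10:40:20"),
        (11, 2, "2020-07-11 10:40:22"),
        (14, 2, "2020-07-11 12:07:24"),
        (15, 16, "2020-07-11 12:09:35"),
        (16, 2, "2020-07-11 12:09:41"),
        (17, 16, "2020-07-12 12:09:47"),
        (18, 16, "2020-07-12 14:40:20"),
        (19, 2, "2020-07-12 15:40:22"),
        (20, 2, "2020-07-12 16:07:24"),
        (21, 16, "2020-07-12 17:09:35"),
        (22, 2, "2020-07-13 12:09:41"),
    ],
)

# Tuple IN: keep rows whose (employee, timestamp) is a per-day minimum.
in_sql = """
select id
from mytable
where (employee_id, created_at) in (
    select employee_id, min(created_at)
    from mytable
    group by employee_id, date(created_at)
)
order by id
"""

# Window form: number the logins of each employee within each day.
window_sql = """
select id
from (
    select t.*,
           row_number() over (
               partition by employee_id, date(created_at)
               order by created_at
           ) as rn
    from mytable t
) ranked
where rn = 1
order by id
"""

in_ids = [r[0] for r in conn.execute(in_sql)]
window_ids = [r[0] for r in conn.execute(window_sql)]
print(in_ids, window_ids)
```

One difference worth knowing: if an employee had two logins with the identical earliest timestamp on a day, the tuple IN form would return both rows, while row_number() would return exactly one.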

Remove duplicates leaving at least one with highest parameter from group

I have following schema:
+--+------+-----+----+
|id|device|token|cash|
+--+------+-----+----+
The device column is unique; token is not unique and is null by default.
What I want to achieve is to set all duplicate token values to the default (null), leaving only the one with the highest cash. If duplicates have the same cash, leave the first one.
I have heard about cursors, but it seems this can be done with an ordinary query.
I tried the following SELECT just to check whether my idea was right, but it seems I'm wrong.
SELECT
*
FROM
db.table
WHERE
db.table.token NOT IN (SELECT
*
FROM
(
SELECT DISTINCT
MAX(db.table.balance)
FROM
db.table
GROUP BY db.table.balance) temp
)
For example:
This table after query
+-----+---------+--------+-------+
| id | device | token | cash|
+-----+---------+--------+-------+
| 1 | dev_1 | tkn_1 | 3 |
| 2 | dev_2 | tkn_1 | 10 |
| 3 | dev_3 | tkn_2 | 10 |
| 4 | dev_4 | tkn_2 | 14 |
| 5 | dev_5 | tkn_3 | 10 |
| 6 | dev_6 | null | 10 |
| 7 | dev_7 | null | 10 |
| 8 | dev_8 | tkn_4 | 11 |
| 8 | dev_8 | tkn_4 | 11 |
| 8 | dev_8 | tkn_5 | 11 |
+-----+---------+--------+-------+
should be:
+-----+---------+--------+-------+
| id | device | token | cash|
+-----+---------+--------+-------+
| 1 | dev_1 | null | 3 |
| 2 | dev_2 | tkn_1 | 10 |
| 3 | dev_3 | null | 10 |
| 4 | dev_4 | tkn_2 | 14 |
| 5 | dev_5 | tkn_3 | 10 |
| 6 | dev_6 | null | 10 |
| 7 | dev_7 | null | 10 |
| 8 | dev_8 | tkn_4 | 11 |
| 8 | dev_8 | null | 11 |
| 8 | dev_8 | tkn_5 | 15 |
+-----+---------+--------+-------+
Thanks in advance :)
Try using an EXISTS subquery:
UPDATE yourTable t1
SET token = NULL
WHERE EXISTS (SELECT 1 FROM (SELECT * FROM yourTable) t2
WHERE t2.token = t1.token AND
t2.cash > t1.cash);
Note that this answer assumes that there would never be a tie for two token records having the same highest cash amount.
To keep exactly one row in the event of ties on the maximum cash, use the id:
update t join
(select tt.*,
(select t3.id
from t t3
where t3.token = tt.token
order by t3.cash desc, t3.id
limit 1 -- the scalar subquery must return a single row
) as max_cash_id
from t tt
) tt
on t.id = tt.id and t.id <> tt.max_cash_id
set t.token = null;
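A different way to express the same rule, if window functions are available (MySQL 8.0 or later), is to rank the rows of each token by cash and clear every row that is not ranked first. The sketch below is not the statement from the answers above; it is an alternative, run here against in-memory SQLite from Python as a stand-in. The table name devices and the sample rows are hypothetical (ids are kept unique, and ids 7 and 8 tie on cash).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table devices (id integer primary key, device text, token text, cash int)")
conn.executemany(
    "insert into devices values (?, ?, ?, ?)",
    [
        (1, "dev_1", "tkn_1", 3),
        (2, "dev_2", "tkn_1", 10),
        (3, "dev_3", "tkn_2", 10),
        (4, "dev_4", "tkn_2", 14),
        (5, "dev_5", "tkn_3", 10),
        (6, "dev_6", None, 10),
        (7, "dev_7", "tkn_4", 11),
        (8, "dev_8", "tkn_4", 11),
    ],
)

# Rank rows per token: highest cash first, lowest id breaks ties ("leave first one").
# Every non-null token row that is not ranked 1 loses its token.
conn.execute("""
update devices
set token = null
where token is not null
  and id not in (
      select id
      from (
          select id,
                 row_number() over (partition by token order by cash desc, id) as rn
          from devices
          where token is not null
      ) ranked
      where rn = 1
  )
""")

result = conn.execute("select id, token from devices order by id").fetchall()
for row in result:
    print(row)
```

Wrapping the ranking in a derived table is also the usual way around MySQL's restriction on selecting from the table being updated, though that behaviour is worth confirming on your own server version.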

Incorrect update with inner join

I encountered a problem that I just can not solve.
For example, I have a table with the columns id, season, episode, and order.
Data in the table looks like:
+--------+---------------+----------------+--------------+
| id | season | episode | order |
+--------+---------------+----------------+--------------+
| 153914 | 1 | 1 | NULL |
| 153915 | 1 | 3 | NULL |
| 153916 | 1 | 2 | NULL |
| 153919 | 1 | 3 | NULL |
| 153920 | 1 | 4 | NULL |
| 153921 | 1 | 3 | NULL |
+--------+---------------+----------------+--------------+
When I run the SELECT query on its own, the rows are numbered in exactly the right order:
SELECT id, season, episode, (@row:=@row+1) as `order`
FROM `shows`, (select @row:=0) as rc
WHERE `show_id`= 14670
ORDER BY CAST(season AS UNSIGNED) ASC, CAST(episode AS UNSIGNED) ASC
+--------+--------+---------+--------+
| id | season | episode | order |
+--------+--------+---------+--------+
| 153914 | 1 | 1 | 1 |
| 153916 | 1 | 2 | 2 |
| 153915 | 1 | 3 | 3 |
| 153919 | 1 | 3 | 4 |
| 153921 | 1 | 3 | 5 |
| 153920 | 1 | 4 | 6 |
+--------+--------+---------+--------+
But when I use the same query as a subquery of an UPDATE statement, it doesn't sort the same way and sets different order values.
UPDATE `shows` f
JOIN
(
SELECT id, (@row:=@row+1) as rowOrder
FROM `shows` as Fl, (select @row:=0) as rc
WHERE Fl.`show_id` = 14670
ORDER BY Fl.season ASC, Fl.episode ASC
) t
ON t.id = f.id
SET f.order = t.rowOrder
mysql> SELECT id, season, episode, `order` FROM `shows` WHERE `show_id`=14670;
+--------+--------+---------+--------+
| id | season | episode | order |
+--------+--------+---------+--------+
| 153914 | 1 | 1 | 1 |
| 153915 | 1 | 3 | 2 |
| 153916 | 1 | 2 | 3 |
| 153919 | 1 | 3 | 4 |
| 153920 | 1 | 4 | 5 |
| 153921 | 1 | 3 | 6 |
+--------+--------+---------+--------+
Please, explain to me why it happens and how to solve it.
MySQL version:
>mysql --version
mysql Ver 14.14 Distrib 5.7.18, for Linux (x86_64) using EditLine wrapper
Hmmm. It would appear that the order by is not affecting the variable. I wonder if this has changed in recent versions of MySQL. It certainly used to work.
In any case, you can fix it by using a subquery:
UPDATE shows s JOIN
(SELECT id, (@row:=@row+1) as rowOrder
FROM (SELECT id
FROM shows s2
WHERE s2.show_id = 14670
ORDER BY s2.season ASC, s2.episode ASC
) s2 CROSS JOIN
(SELECT @row := 0) as rc
) s3
ON s.id = s3.id
SET s.order = s3.rowOrder;
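On MySQL 8.0 or later (so not the 5.7.18 from the question), row_number() sidesteps the question of when user variables are evaluated relative to ORDER BY. The sketch below shows that alternative using in-memory SQLite from Python as a stand-in. Two things are assumptions of the sketch: the column is called sort_order instead of order to avoid quoting a reserved word, and id is added as a tie-breaker so that the three rows with episode 3 get a deterministic order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table shows (id integer primary key, show_id int, season int, episode int, sort_order int)"
)
conn.executemany(
    "insert into shows (id, show_id, season, episode) values (?, ?, ?, ?)",
    [
        (153914, 14670, 1, 1),
        (153915, 14670, 1, 3),
        (153916, 14670, 1, 2),
        (153919, 14670, 1, 3),
        (153920, 14670, 1, 4),
        (153921, 14670, 1, 3),
    ],
)

# The numbering is computed by the window function's own ORDER BY,
# so it does not depend on the order in which rows happen to be read.
conn.execute("""
update shows
set sort_order = (
    select rn
    from (
        select id,
               row_number() over (order by season, episode, id) as rn
        from shows
        where show_id = 14670
    ) ranked
    where ranked.id = shows.id
)
where show_id = 14670
""")

ordered = conn.execute("select id, sort_order from shows order by sort_order").fetchall()
for row in ordered:
    print(row)
```

The resulting numbering matches the output of the standalone SELECT in the question.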

MySQL - Time difference between rows of the same type

I have a table with a list of agent_ids, a previous_status, a new status, and a time stamp. I'm trying to determine the time difference between each status change, by agent, in order to determine how long an agent was active in a particular status.
For example:
+------+--------------+--------------+----------------+----------------------+
| id | agent_id | old_status | new_status | date_time |
+----------------------------------------------------------------------------+
| 1 | 1 | offline | online | 2015-06-11 09:00:01 |
| 2 | 1 | online | busy | 2015-06-11 09:30:23 |
| 3 | 3 | offline | online | 2015-06-11 09:31:27 |
| 4 | 1 | busy | offline | 2015-06-11 09:31:45 |
| 5 | 3 | online | offline | 2015-06-11 09:32:10 |
+----------------------------------------------------------------------------+
The goal is to produce a result with a time_difference column. For row 5, for example, time_difference should be 43 seconds: the difference between row 5 (the most recent status for agent_id 3) and row 3, the previous status for agent_id 3. Likewise, the time_difference for row 4 should be the difference between row 4 and row 2.
You can do something along the lines of
SELECT id, agent_id, old_status, new_status, date_time, seconds
FROM
(
SELECT id, agent_id, old_status, new_status, date_time,
IF(@a = agent_id, TIMESTAMPDIFF(SECOND, @p, date_time), NULL) seconds,
@a := agent_id, @p := date_time
FROM table1 t CROSS JOIN (SELECT @p := NULL, @a := NULL) i
ORDER BY agent_id, id
) q
Output:
+------+----------+------------+------------+---------------------+---------+
| id | agent_id | old_status | new_status | date_time | seconds |
+------+----------+------------+------------+---------------------+---------+
| 1 | 1 | offline | online | 2015-06-11 09:00:01 | NULL |
| 2 | 1 | online | busy | 2015-06-11 09:30:23 | 1822 |
| 4 | 1 | busy | offline | 2015-06-11 09:31:45 | 82 |
| 3 | 3 | offline | online | 2015-06-11 09:31:27 | NULL |
| 5 | 3 | online | offline | 2015-06-11 09:32:10 | 43 |
+------+----------+------------+------------+---------------------+---------+
You can approach this without variables, using a correlated subquery:
select t.*,
timestampdiff(second, t.date_time, t.next_date_time) as secs
from (select t.*,
(select t2.date_time
from table1 t2
where t2.agent_id = t.agent_id and
t2.date_time > t.date_time
order by t2.date_time
limit 1
) as next_date_time
from table1 t
) t
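With window functions (MySQL 8.0 or later), lag() gives the previous status change of the same agent directly. The sketch below reproduces the 1822, 82 and 43 second figures from the question's data using in-memory SQLite from Python. SQLite is a stand-in here, so the difference is computed from strftime('%s', ...) epoch seconds; in MySQL the equivalent expression would be timestampdiff(second, lag(date_time) over (...), date_time).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table table1 (id integer primary key, agent_id int, date_time text)")
conn.executemany(
    "insert into table1 values (?, ?, ?)",
    [
        (1, 1, "2015-06-11 09:00:01"),
        (2, 1, "2015-06-11 09:30:23"),
        (3, 3, "2015-06-11 09:31:27"),
        (4, 1, "2015-06-11 09:31:45"),
        (5, 3, "2015-06-11 09:32:10"),
    ],
)

# lag() fetches the previous date_time of the same agent; the first row of
# each agent has no predecessor, so its difference is NULL.
query = """
select id,
       agent_id,
       cast(strftime('%s', date_time) as integer)
       - cast(strftime('%s', lag(date_time) over (
             partition by agent_id order by date_time
         )) as integer) as seconds
from table1
order by agent_id, date_time
"""
diffs = conn.execute(query).fetchall()
for row in diffs:
    print(row)
```

Note that this measures the time since the previous change, like the variable-based query, whereas the correlated subquery above measures the time until the next change.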

Top 'n' results for each keyword

I have a query to get the top 'n' users who commented on a specific keyword,
SELECT `user` , COUNT( * ) AS magnitude
FROM `results`
WHERE `keyword` = "economy"
GROUP BY `user`
ORDER BY magnitude DESC
LIMIT 5
I have approx 6000 keywords, and would like to run this query to get me the top 'n' users for each and every keyword we have data for. Assistance appreciated.
Since you haven't given the schema for results, I'll assume it's this or very similar (maybe extra columns):
create table results (
id int primary key,
user int,
foreign key (user) references <some_other_table>(id),
keyword varchar(<30>)
);
Step 1: aggregate by keyword/user as in your example query, but for all keywords:
create view user_keyword as (
select
keyword,
user,
count(*) as magnitude
from results
group by keyword, user
);
Step 2: rank each user within each keyword group (note the use of the subquery to rank the rows):
create view keyword_user_ranked as (
select
keyword,
user,
magnitude,
(select count(*)
from user_keyword
where l.keyword = keyword and magnitude >= l.magnitude
) as rank
from
user_keyword l
);
Step 3: select only the rows where the rank is at most some number:
select *
from keyword_user_ranked
where rank <= 3;
Example:
Base data used:
mysql> select * from results;
+----+------+---------+
| id | user | keyword |
+----+------+---------+
| 1 | 1 | mysql |
| 2 | 1 | mysql |
| 3 | 2 | mysql |
| 4 | 1 | query |
| 5 | 2 | query |
| 6 | 2 | query |
| 7 | 2 | query |
| 8 | 1 | table |
| 9 | 2 | table |
| 10 | 1 | table |
| 11 | 3 | table |
| 12 | 3 | mysql |
| 13 | 3 | query |
| 14 | 2 | mysql |
| 15 | 1 | mysql |
| 16 | 1 | mysql |
| 17 | 3 | query |
| 18 | 4 | mysql |
| 19 | 4 | mysql |
| 20 | 5 | mysql |
+----+------+---------+
Grouped by keyword and user:
mysql> select * from user_keyword order by keyword, magnitude desc;
+---------+------+-----------+
| keyword | user | magnitude |
+---------+------+-----------+
| mysql | 1 | 4 |
| mysql | 2 | 2 |
| mysql | 4 | 2 |
| mysql | 3 | 1 |
| mysql | 5 | 1 |
| query | 2 | 3 |
| query | 3 | 2 |
| query | 1 | 1 |
| table | 1 | 2 |
| table | 2 | 1 |
| table | 3 | 1 |
+---------+------+-----------+
Users ranked within keywords:
mysql> select * from keyword_user_ranked order by keyword, rank asc;
+---------+------+-----------+------+
| keyword | user | magnitude | rank |
+---------+------+-----------+------+
| mysql | 1 | 4 | 1 |
| mysql | 2 | 2 | 3 |
| mysql | 4 | 2 | 3 |
| mysql | 3 | 1 | 5 |
| mysql | 5 | 1 | 5 |
| query | 2 | 3 | 1 |
| query | 3 | 2 | 2 |
| query | 1 | 1 | 3 |
| table | 1 | 2 | 1 |
| table | 3 | 1 | 3 |
| table | 2 | 1 | 3 |
+---------+------+-----------+------+
Only top 2 from each keyword:
mysql> select * from keyword_user_ranked where rank <= 2 order by keyword, rank asc;
+---------+------+-----------+------+
| keyword | user | magnitude | rank |
+---------+------+-----------+------+
| mysql | 1 | 4 | 1 |
| query | 2 | 3 | 1 |
| query | 3 | 2 | 2 |
| table | 1 | 2 | 1 |
+---------+------+-----------+------+
Note that when there are ties -- see users 2 and 4 for keyword "mysql" in the examples -- all parties in the tie get the "last" rank, i.e. if the 2nd and 3rd are tied, both are assigned rank 3.
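The three steps and the example data can be replayed end to end. This is a minimal sketch using Python's sqlite3 module with an in-memory database as a stand-in for MySQL; the foreign key is omitted, and the rank column is named rnk here (an assumption of the sketch) because rank became a reserved word in MySQL 8.0.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table results (id integer primary key, user int, keyword text)")
conn.executemany(
    "insert into results values (?, ?, ?)",
    [
        (1, 1, "mysql"), (2, 1, "mysql"), (3, 2, "mysql"), (4, 1, "query"),
        (5, 2, "query"), (6, 2, "query"), (7, 2, "query"), (8, 1, "table"),
        (9, 2, "table"), (10, 1, "table"), (11, 3, "table"), (12, 3, "mysql"),
        (13, 3, "query"), (14, 2, "mysql"), (15, 1, "mysql"), (16, 1, "mysql"),
        (17, 3, "query"), (18, 4, "mysql"), (19, 4, "mysql"), (20, 5, "mysql"),
    ],
)

# Step 1: comment counts per (keyword, user).
conn.execute("""
create view user_keyword as
select keyword, user, count(*) as magnitude
from results
group by keyword, user
""")

# Step 2: rank = number of users in the same keyword with at least this magnitude.
conn.execute("""
create view keyword_user_ranked as
select l.keyword, l.user, l.magnitude,
       (select count(*)
        from user_keyword r
        where r.keyword = l.keyword and r.magnitude >= l.magnitude) as rnk
from user_keyword l
""")

# Step 3: keep the top 2 per keyword.
top2 = conn.execute("""
select keyword, user, magnitude, rnk
from keyword_user_ranked
where rnk <= 2
order by keyword, rnk
""").fetchall()
for row in top2:
    print(row)
```

Because tied users share the "last" rank, users 2 and 4 for "mysql" both get rank 3 and drop out of the top 2, which is why that keyword returns a single row.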
Performance: adding an index to the keyword and user columns will help. I have a table being queried in a similar way with 4000 and 1300 distinct values for the two columns (in a 600000-row table). You can add the index like this:
alter table results add index keyword_user (keyword, user);
In my case, query time dropped from about 6 seconds to about 2 seconds.
You can use a pattern like this (from Within-group quotas (Top N per group)):
SELECT tmp.ID, tmp.entrydate
FROM (
SELECT
ID, entrydate,
IF( @prev <> ID, @rownum := 1, @rownum := @rownum+1 ) AS rank,
@prev := ID
FROM test t
JOIN (SELECT @rownum := NULL, @prev := 0) AS r
ORDER BY t.ID
) AS tmp
WHERE tmp.rank <= 2
ORDER BY ID, entrydate;
+------+------------+
| ID | entrydate |
+------+------------+
| 1 | 2007-05-01 |
| 1 | 2007-05-02 |
| 2 | 2007-06-03 |
| 2 | 2007-06-04 |
| 3 | 2007-07-01 |
| 3 | 2007-07-02 |
+------+------------+
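On MySQL 8.0 or later the same top-N-per-group shape can be written with row_number() instead of user variables. The sketch below uses in-memory SQLite from Python as a stand-in; the contents of the test table are not given above, so the rows are hypothetical, chosen so that the two earliest dates per ID match the output shown. Unlike the variable-based query, which orders only by ID, this version states explicitly that the earliest entrydate values are the ones kept.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table test (ID int, entrydate text)")
# Hypothetical rows: each ID has at least two dates, inserted out of order.
conn.executemany(
    "insert into test values (?, ?)",
    [
        (1, "2007-05-03"), (1, "2007-05-01"), (1, "2007-05-02"),
        (2, "2007-06-04"), (2, "2007-06-03"), (2, "2007-06-05"),
        (3, "2007-07-02"), (3, "2007-07-01"),
    ],
)

# Number the rows of each ID by entrydate and keep the first two.
query = """
select ID, entrydate
from (
    select ID, entrydate,
           row_number() over (partition by ID order by entrydate) as rn
    from test
) numbered
where rn <= 2
order by ID, entrydate
"""
top_two = conn.execute(query).fetchall()
for row in top_two:
    print(row)
```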