MySQL query for row sequences - mysql

Is there a way to express the following query in MySQL:
Let a table have types of rows A, B, C, D, E ... Z and each row represents an event. Find the timestamps and ids of all event sequences A, .. , B, ... , C ordered by timestamp so that timestamp(C) - timestamp(A) < Thresh.
For example consider the following table
| type | timestamp | id |
|------+-----------+-----|
| Z | 19:00 | 20 |
| A | 19:01 | 21 |
| | | |
| . | ... | .. |
| | | |
| A | 20:13 | 50 | *
| B | 20:14 | 51 | *
| D | 20:17 | 52 |
| C | 20:19 | 53 | *
| | | |
| . | ... | .. |
| | | |
| A | 22:13 | 80 | *
| D | 22:14 | 81 |
| B | 22:15 | 82 | *
| K | 22:16 | 83 |
| J | 22:17 | 84 |
| C | 22:19 | 85 | *
| | | |
| . | ... | .. |
| | | |
| A | 23:13 | 100 |
| B | 23:14 | 101 |
| C | 23:50 | 102 |
With Thresh = 10 minutes, the query should yield something along the lines of:
| A_id | B_id | C_id |
|------+------+------|
| 50 | 51 | 53 |
| 80 | 82 | 85 |
Note that the last triplet of A, B and C is not present: the time distance between the last A event and the last C event is more than Thresh.
I suspect that the answer would be something along the lines of "MySQL is not the right tool if you need to ask this kind of question". In that case the followup is, which database is a good candidate to handle this kind of task?
Edit: provided an example

I think you can express this using a self join:
SELECT A.id as A_id, B.id as B_id, C.id as C_id
FROM (
SELECT *
FROM the_table
WHERE type = 'A'
) A
JOIN (
SELECT *
FROM the_table
WHERE type = 'B'
) B
JOIN (
SELECT *
FROM the_table
WHERE type = 'C'
) C ON (
TIME_TO_SEC(TIMEDIFF(C.timestamp, A.timestamp)) < 600 -- threshold here: 10 minutes in seconds (assumes TIME or DATETIME values)
AND B.timestamp BETWEEN A.timestamp AND C.timestamp
)
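To try the self join against the sample rows from the question, a minimal sketch follows. It assumes the timestamp column is a DATETIME; the date 2024-01-01 is invented purely so the times can be stored, and the table and column names follow the answer above. The threshold is written with TIME_TO_SEC(TIMEDIFF(...)) so that it is measured in seconds rather than as a raw numeric difference.

```sql
-- Hypothetical setup; only the starred rows and the last triplet are loaded.
CREATE TABLE the_table (
    id          INT PRIMARY KEY,
    `type`      CHAR(1)  NOT NULL,
    `timestamp` DATETIME NOT NULL
);

INSERT INTO the_table (id, `type`, `timestamp`) VALUES
    (50,  'A', '2024-01-01 20:13:00'),
    (51,  'B', '2024-01-01 20:14:00'),
    (52,  'D', '2024-01-01 20:17:00'),
    (53,  'C', '2024-01-01 20:19:00'),
    (80,  'A', '2024-01-01 22:13:00'),
    (81,  'D', '2024-01-01 22:14:00'),
    (82,  'B', '2024-01-01 22:15:00'),
    (85,  'C', '2024-01-01 22:19:00'),
    (100, 'A', '2024-01-01 23:13:00'),
    (101, 'B', '2024-01-01 23:14:00'),
    (102, 'C', '2024-01-01 23:50:00');

-- Every (A, B, C) combination where B lies between A and C
-- and C is less than 10 minutes (600 seconds) after A.
SELECT A.id AS A_id, B.id AS B_id, C.id AS C_id
FROM the_table AS A
JOIN the_table AS B ON B.`type` = 'B'
JOIN the_table AS C ON C.`type` = 'C'
WHERE A.`type` = 'A'
  AND TIME_TO_SEC(TIMEDIFF(C.`timestamp`, A.`timestamp`)) < 600
  AND B.`timestamp` BETWEEN A.`timestamp` AND C.`timestamp`
ORDER BY A.`timestamp`;
-- Rows returned for this data: (50, 51, 53) and (80, 82, 85)
```

If several B events fall between the same A and C, this form returns one row per B, so a real table may need an extra rule (for example, keeping only the earliest B) to get one row per sequence.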

Related

How can I get the last row from each given row value in a column through date? [duplicate]

I have the following table.
+--------------------+--------------+-------+
Date | SymbolNumber | Value
+--------------------+--------------+-------+
2018-08-31 15:00:00 | 123 | data
2018-09-31 15:00:00 | 456 | data
2018-09-31 15:00:00 | 123 | data
2018-09-31 15:00:00 | 555 | data
2018-10-31 15:00:00 | 555 | data
2018-10-31 15:00:00 | 231 | data
2018-10-31 15:00:00 | 123 | data
2018-11-31 15:00:00 | 123 | data
2018-11-31 15:00:00 | 555 | data
2018-12-31 15:00:00 | 123 | data
2018-12-31 15:00:00 | 555 | data
I need a query that can select the last row of each SymbolNumber stated in the query.
SELECT
*
FROM
MyTable
WHERE
symbolNumber IN (123, 555)
AND
**lastOfRow ordered by latest-date**
Expected results:
2018-12-31 15:00:00 | 123 | data
2018-12-31 15:00:00 | 555 | data
How can I do this?
First, you need a query that gets the latest date for each symbolNumber. Second, you can inner join that result back to the table (on symbolNumber and date) to get the rest of the columns. Like this:
SELECT
t.*
FROM
<table_name> AS t
INNER JOIN
(SELECT
symbolNumber,
MAX(date) AS maxDate
FROM
<table_name>
GROUP BY
symbolNumber) AS latest_date ON latest_date.symbolNumber = t.symbolNumber AND latest_date.maxDate = t.date
The previous query returns the latest row for every symbolNumber in the table. If you want to restrict it to symbolNumbers 123 and 555, make the following modification:
SELECT
t.*
FROM
<table_name> AS t
INNER JOIN
(SELECT
symbolNumber,
MAX(date) AS maxDate
FROM
<table_name>
WHERE
symbolNumber IN (123, 555)
GROUP BY
symbolNumber) AS latest_date ON latest_date.symbolNumber = t.symbolNumber AND latest_date.maxDate = t.date
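On MySQL 8.0 or later, the same "latest row per symbolNumber" idea can also be sketched with a window function. This is an alternative not taken from the answers here; it assumes the MyTable columns shown in the question and a server version that supports ROW_NUMBER().

```sql
-- Assumes MySQL 8.0+ (window functions are not available in 5.x).
SELECT `Date`, SymbolNumber, `Value`
FROM (
    SELECT t.*,
           -- Number the rows of each symbol from newest to oldest
           ROW_NUMBER() OVER (PARTITION BY SymbolNumber
                              ORDER BY `Date` DESC) AS rn
    FROM MyTable AS t
    WHERE SymbolNumber IN (123, 555)
) AS ranked
WHERE rn = 1;  -- keep only the newest row per symbol
```

Unlike the MAX(date) join, this returns exactly one row per symbol even when two rows share the same latest date; which of the tied rows is kept is then arbitrary unless the ORDER BY is extended.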
We can do a "self-left-join" on symbolNumber, and match to other rows in the same group with higher Date value on the right side.
We will eventually consider only those rows, where higher date could not be found (meaning the current row belongs to highest date in the group).
Here is a solution avoiding subquery, and utilizing Left Join:
SELECT t1.*
FROM MyTable AS t1
LEFT JOIN MyTable AS t2
ON t2.symbolNumber = t1.symbolNumber AND
t2.Date > t1.Date -- Joining to a row in same group with higher date
WHERE t1.symbolNumber IN (123, 555) AND
t2.symbolNumber IS NULL -- Higher date not found; so this is highest row
EDIT:
Benchmarking studies comparing Left Join method v/s Derived Table (Subquery)
@Strawberry ran a little benchmark test in 5.6.21. Here's what he found...
DROP TABLE IF EXISTS my_table;
CREATE TABLE my_table
(id SERIAL PRIMARY KEY
,dense_user INT NOT NULL
,sparse_user INT NOT NULL
);
INSERT INTO my_table (dense_user,sparse_user)
SELECT RAND()*100,RAND()*100000;
INSERT INTO my_table (dense_user,sparse_user)
SELECT RAND()*100,RAND()*100000 FROM my_table;
-- REPEAT THIS LINE A FEW TIMES !!!
SELECT COUNT(DISTINCT dense_user) dense
, COUNT(DISTINCT sparse_user) sparse
, COUNT(*) total
FROM my_table;
+-------+--------+---------+
| dense | sparse | total |
+-------+--------+---------+
| 101 | 99999 | 1048576 |
+-------+--------+---------+
ALTER TABLE my_table ADD INDEX(dense_user);
ALTER TABLE my_table ADD INDEX(sparse_user);
-- dense test
SELECT x.*
FROM my_table x
LEFT
JOIN my_table y
ON y.dense_user = x.dense_user
AND y.id < x.id
WHERE y.id IS NULL
ORDER
BY dense_user
LIMIT 10;
+------+------------+-------------+
| id | dense_user | sparse_user |
+------+------------+-------------+
| 1212 | 0 | 1950 |
| 153 | 1 | 23193 |
| 255 | 2 | 27472 |
| 28 | 3 | 86440 |
| 18 | 4 | 47886 |
| 291 | 5 | 76563 |
| 15 | 6 | 85049 |
| 16 | 7 | 78384 |
| 135 | 8 | 52304 |
| 62 | 9 | 40930 |
+------+------------+-------------+
10 rows in set (2.64 sec)
SELECT x.*
FROM my_table x
JOIN
( SELECT dense_user, MIN(id) id FROM my_table GROUP BY dense_user ) y
ON y.dense_user = x.dense_user
AND y.id = x.id
ORDER
BY dense_user
LIMIT 10;
+------+------------+-------------+
| id | dense_user | sparse_user |
+------+------------+-------------+
| 1212 | 0 | 1950 |
| 153 | 1 | 23193 |
| 255 | 2 | 27472 |
| 28 | 3 | 86440 |
| 18 | 4 | 47886 |
| 291 | 5 | 76563 |
| 15 | 6 | 85049 |
| 16 | 7 | 78384 |
| 135 | 8 | 52304 |
| 62 | 9 | 40930 |
+------+------------+-------------+
10 rows in set (0.05 sec)
Uncorrelated query is 50 times faster.
-- sparse test
SELECT x.*
FROM my_table x
LEFT
JOIN my_table y
ON y.sparse_user = x.sparse_user
AND y.id < x.id
WHERE y.id IS NULL
ORDER
BY sparse_user
LIMIT 10;
+--------+------------+-------------+
| id | dense_user | sparse_user |
+--------+------------+-------------+
| 165055 | 75 | 0 |
| 37598 | 63 | 1 |
| 170596 | 70 | 2 |
| 46142 | 87 | 3 |
| 33546 | 21 | 4 |
| 323114 | 87 | 5 |
| 86592 | 96 | 6 |
| 156711 | 36 | 7 |
| 17148 | 62 | 8 |
| 139965 | 71 | 9 |
+--------+------------+-------------+
10 rows in set (0.03 sec)
SELECT x.*
FROM my_table x
JOIN ( SELECT sparse_user, MIN(id) id FROM my_table GROUP BY sparse_user ) y
ON y.sparse_user = x.sparse_user
AND y.id = x.id
ORDER
BY sparse_user
LIMIT 10;
+--------+------------+-------------+
| id | dense_user | sparse_user |
+--------+------------+-------------+
| 165055 | 75 | 0 |
| 37598 | 63 | 1 |
| 170596 | 70 | 2 |
| 46142 | 87 | 3 |
| 33546 | 21 | 4 |
| 323114 | 87 | 5 |
| 86592 | 96 | 6 |
| 156711 | 36 | 7 |
| 17148 | 62 | 8 |
| 139965 | 71 | 9 |
+--------+------------+-------------+
10 rows in set (4.73 sec)
Exclusion Join is 150 times faster
However, as you move further up the result set, the picture begins to change very dramatically...
SELECT x.*
FROM my_table x
JOIN ( SELECT sparse_user, MIN(id) id FROM my_table GROUP BY sparse_user ) y
ON y.sparse_user = x.sparse_user
AND y.id = x.id
ORDER
BY sparse_user
LIMIT 10000,10;
+--------+------------+-------------+
| id | dense_user | sparse_user |
+--------+------------+-------------+
| 9810 | 93 | 10000 |
| 162438 | 4 | 10001 |
| 467371 | 62 | 10002 |
| 8258 | 13 | 10003 |
| 297049 | 17 | 10004 |
| 68354 | 23 | 10005 |
| 192701 | 64 | 10006 |
| 176225 | 92 | 10007 |
| 156595 | 37 | 10008 |
| 318266 | 1 | 10009 |
+--------+------------+-------------+
10 rows in set (9.17 sec)
SELECT x.*
FROM my_table x
LEFT
JOIN my_table y
ON y.sparse_user = x.sparse_user
AND y.id < x.id
WHERE y.id IS NULL
ORDER
BY sparse_user
LIMIT 10000,10;
+--------+------------+-------------+
| id | dense_user | sparse_user |
+--------+------------+-------------+
| 9810 | 93 | 10000 |
| 162438 | 4 | 10001 |
| 467371 | 62 | 10002 |
| 8258 | 13 | 10003 |
| 297049 | 17 | 10004 |
| 68354 | 23 | 10005 |
| 192701 | 64 | 10006 |
| 176225 | 92 | 10007 |
| 156595 | 37 | 10008 |
| 318266 | 1 | 10009 |
+--------+------------+-------------+
10 rows in set (32.19 sec) -- !!!
In summary, the exclusion join (the so-called 'strawberry query') can be (significantly) faster in certain, limited situations. More generally, an uncorrelated query will be faster.

MySQL: Comparing date using IF condition in SELECT statement

Sample data:
db1.locationDetails table
| id | locationUID | locationName |
|----|-------------|--------------|
| 1 | L0001 | Site A |
| 2 | L0002 | Site B |
| 3 | L0003 | Site C |
| 3 | L0004 | Site D |
db2.HealthData table
| id | locationID | Date_Time | memUsage |
|----|-------------|------------------|----------|
| 1 | L0001 | 2018-09-10 11:43 | 35 |
| 2 | L0002 | 2018-09-10 08:22 | 39 |
| 3 | L0003 | 2018-09-10 14:44 | 43 |
| 4 | L0004 | 2018-09-10 16:01 | 72 |
| 5 | L0001 | 2018-09-12 01:26 | 50 |
| 6 | L0002 | 2018-09-12 03:15 | 32 |
I have a query:
SELECT DISTINCT db1.locationDetails.locationUID,
db1.locationDetails.locationName,
MAX(db2.HealthData.Date_Time),
db2.HealthData.memUsage,
IF(DATE(db2.HealthData.Date_Time) = '2018-09-12', "ON", "OFF") AS Status
FROM db1.locationDetails
LEFT JOIN db2.HealthData
ON db1.locationDetails.locationUID = db2.HealthData.locationID
GROUP BY db1.locationDetails.locationUID
Based on my understanding, the 'Status' column should show "ON" if the date is equal to 2018-09-12, but somehow it always returns "OFF", regardless of whether the value in the Date_Time column matches the date specified in the query.
Can anyone tell me what is wrong here? Thanks in advance.
Expected output:
| locationUID | locationName | Date_Time | memUsage | Status |
|-------------|--------------|-----------------|----------|--------|
| L0001 | Site A |2018-09-12 01:26 | 50 | ON |
| L0002 | Site B |2018-09-12 03:15 | 32 | ON |
| L0003 | Site C |2018-09-10 14:44 | 43 | OFF |
| L0004 | Site D |2018-09-10 16:01 | 72 | OFF |
Use a subquery to get your desired result:
select x.locationuid, x.locationname, y.maxtime, y.memusage, case when date(y.maxtime)='2018-09-12' then 'ON' else 'OFF' end as status
from db1.locationDetails x
inner join
(select a.locationID, a.maxtime, b.memUsage
from
-- latest Date_Time per location
(SELECT locationID, MAX(Date_Time) as maxtime FROM db2.HealthData group by locationID) a
-- join back on both location and time so only the latest row is kept
inner join db2.HealthData b on a.locationID=b.locationID and a.maxtime=b.Date_Time) y
on x.locationuid=y.locationID
Add GROUP BY db1.locationDetails.locationUID, db2.HealthData.id to the query:
SELECT DISTINCT db1.locationDetails.locationUID,
db1.locationDetails.locationName,
MAX(db2.HealthData.Date_Time),
db2.HealthData.memUsage,
IF(DATE(db2.HealthData.Date_Time) = '2018-09-12', "ON", "OFF") AS Status
FROM db1.locationDetails
LEFT JOIN db2.HealthData
ON db1.locationDetails.locationUID = db2.HealthData.locationID
GROUP BY db1.locationDetails.locationUID,db2.HealthData.id

mysql: comparing two columns

my tables and their layout:
mysql> select * FROM xt_shipping_zones;
+---------+-------------+---------------------------------------------------------------------------+
| zone_id | zone_name | zone_countries |
+---------+-------------+---------------------------------------------------------------------------+
| 5 | ZONE1 | AT,BE,BG,DK,FI,FR,GR,IE,IT,LV,LT,LU,MC,NL,PL,PT,RO,SM,SE,SK,SI,ES,HU,GB |
| 6 | Deutschland | DE |
| 8 | ZONE2Brutto | AD,NO,VA |
| 9 | ZONE2NETTO | CH,LI |
+---------+-------------+---------------------------------------------------------------------------+
mysql> select * FROM xt_shipping_cost WHERE shipping_geo_zone = 99995 LIMIT 5;
+------------------+-------------+-------------------+-----------------------+--------------------------+------------------------+----------------+------------------+
| shipping_cost_id | shipping_id | shipping_geo_zone | shipping_country_code | shipping_type_value_from | shipping_type_value_to | shipping_price | shipping_allowed |
+------------------+-------------+-------------------+-----------------------+--------------------------+------------------------+----------------+------------------+
| 269 | 34 | 99995 | | 0.31 | 17.99 | 17.0000 | 1 |
| 270 | 34 | 99995 | | 17.99 | 35.99 | 34.0000 | 1 |
| 271 | 34 | 99995 | | 35.99 | 53.99 | 51.0000 | 1 |
| 272 | 34 | 99995 | | 53.99 | 71.99 | 68.0000 | 1 |
| 273 | 34 | 99995 | | 71.99 | 89.99 | 85.0000 | 1 |
+------------------+-------------+-------------------+-----------------------+--------------------------+------------------------+----------------+------------------+
mysql> SELECT * FROM geoip WHERE 92569600 BETWEEN start AND end;
+----------+----------+---------+-----+
| start | end | country | id |
+----------+----------+---------+-----+
| 92569600 | 92585983 | AT | 895 |
+----------+----------+---------+-----+
My Query:
SELECT
xt_shipping_cost.shipping_type_value_from,
xt_shipping_cost.shipping_type_value_to,
xt_shipping_cost.shipping_price,
geoip.country
FROM xt_shipping_cost
INNER JOIN xt_shipping_zones
ON xt_shipping_cost.shipping_geo_zone = xt_shipping_zones.zone_id + 99990
INNER JOIN geoip
ON geoip.country REGEXP xt_shipping_zones.zone_countries
WHERE 34664448 BETWEEN geoip.start AND geoip.end
My Problem:
The query works if there is only ONE entry in xt_shipping_zones.zone_countries, like DE. If there are multiple comma-separated entries, I can't get a match on that row.
Doing it manually:
mysql> SELECT * FROM `xt_shipping_zones` WHERE `zone_countries` REGEXP 'AT';
+---------+-----------+---------------------------------------------------------------------------+
| zone_id | zone_name | zone_countries |
+---------+-----------+---------------------------------------------------------------------------+
| 5 | ZONE1 | AT,BE,BG,DK,FI,FR,GR,IE,IT,LV,LT,LU,MC,NL,PL,PT,RO,SM,SE,SK,SI,ES,HU,GB |
+---------+-----------+---------------------------------------------------------------------------+
SQLFiddle: http://sqlfiddle.com/#!9/68f8d0/1
I hope I have made my problem clear enough.
Thank you
I think you can use find_in_set()
SELECT
xt_shipping_cost.shipping_type_value_from,
xt_shipping_cost.shipping_type_value_to,
xt_shipping_cost.shipping_price,
geoip.country
FROM xt_shipping_cost
INNER JOIN xt_shipping_zones
ON xt_shipping_cost.shipping_geo_zone = xt_shipping_zones.zone_id + 99990
INNER JOIN geoip
ON find_in_set(geoip.country, xt_shipping_zones.zone_countries)
WHERE 34664448 BETWEEN geoip.start AND geoip.end
Note that it is not a good idea to store the values as CSV; that is poor database design.
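As a rough illustration of the normalized alternative, the country list could live in a junction table with one row per zone and country. The table name xt_shipping_zone_countries and its column names are hypothetical, and only a few of the rows from the question are shown.

```sql
-- Hypothetical junction table replacing the comma-separated zone_countries column
CREATE TABLE xt_shipping_zone_countries (
    zone_id      INT     NOT NULL,
    country_code CHAR(2) NOT NULL,
    PRIMARY KEY (zone_id, country_code)
);

-- A few of the pairs from the question's data (zone 5 would be loaded the same way)
INSERT INTO xt_shipping_zone_countries (zone_id, country_code) VALUES
    (6, 'DE'),
    (8, 'AD'), (8, 'NO'), (8, 'VA'),
    (9, 'CH'), (9, 'LI');

-- The lookup becomes a plain equality join, which can use an index
SELECT c.shipping_type_value_from,
       c.shipping_type_value_to,
       c.shipping_price,
       g.country
FROM geoip AS g
INNER JOIN xt_shipping_zone_countries AS zc
        ON zc.country_code = g.country
INNER JOIN xt_shipping_cost AS c
        ON c.shipping_geo_zone = zc.zone_id + 99990
WHERE 34664448 BETWEEN g.start AND g.end;
```

With this layout neither REGEXP nor FIND_IN_SET is needed, and adding a country to a zone is a single INSERT rather than a string edit.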

SQL Query design involving multiple tables

I am working on a MySQL query design that, in my opinion, is pretty hard. I'm not experienced in SQL, so I find it really difficult. The point is:
I've got the 'ordertable' table, which stores the order of some codes (AA, BB, CC..). In another table, 'AllTables', I store the name of the table associated with each code (AA -> tableA). Finally, the 'tableA' table stores some data for different units (unit1, unit2...).
CASE 1.
ordertable : Order of codes is given like:
+----------------+------+
| split_position | code |
+----------------+------+
| 1 | AA |
| 2 | BB |
| 3 | CC |
| 4 | DD |
+----------------+------+
CASE 2.
ordertable Order of codes is given like:
+-------+------+------+------+------+
| id | pos1 | pos2 | pos3 | pos4 |
+-------+------+------+------+------+
| unit1 | AA | BB | DD | CC |
| unit2 | CC | BB | AA | DD |
| unit3 | BB | DD | CC | AA |
+-------+------+------+------+------+
In Case 2 we can also find special codes like 'var15':
+-------+------+-------+------+-------+
| id | pos1 | pos2 | pos3 | pos4 |
+-------+------+-------+------+-------+
| unit1 | AA | var15 | DD | var37 |
| unit2 | CC | BB | AA | DD |
+-------+------+-------+------+-------+
If we find something of the form 'var' + number, the associated table is always the same: 'variable', where the 'id' is the number in the code ('var37' -> id = 37).
variable
+-----+------------+------+--------+
| id | name | time | active |
+-----+------------+------+--------+
| 15 | Pedro | 5 | 1 |
| 17 | Maria | 4 | 1 |
+-----+------------+------+--------+
Info of tables:
AllTables
+------+------------+
| code | name |
+------+------------+
| AA | tableA |
| BB | tableB |
| CC | tableC |
| DD | tableD |
+------+------------+
tableA
+-------+------+------+--------+
| id | name | time | active |
+-------+------+------+--------+
| unit1 | Mark | 11 | 1 |
| unit2 | Jame | 20 | 0 |
+-------+------+------+--------+
tableB
+-------+------+------+--------+
| id | name | time | active |
+-------+------+------+--------+
| unit1 | Mari | 44 | 1 |
| unit3 | nam2 | 57 | 1 |
+-------+------+------+--------+
Given an id='unit1', I'm expecting the next:
Result
+----------------+------+-------+-------+--------+
| split_position | code | name | time | active |
+----------------+------+-------+-------+--------+
| 1 | AA | Mark | 11 | 1 |
| 2 | BB | Mari | 44 | 1 |
| 3 | CC | | | 0 |
| 4 | DD | | | 0 |
+----------------+------+-------+-------+--------+
If the id (unit1) does not exist in tableC or tableD, the associated 'split_position' and 'code' should still appear, but with a 0 in the 'active' field.
It's a bit of a steep learning curve, but basically you have to declare a cursor, loop
over each row in ordertable, select your data, and then UNION the results together using dynamic SQL.
Check this sqlFiddle.
To order the final result by split position, just add ORDER BY split_position ASC to the SQL variable before executing it, like this sqlFiddle.
To solve your problem you would need something like the following:
select split_position, code, name, time, active
from
(
select 'tableA' as tablename, id, `name`, `time`, active
from tableA
union all select 'tableB' as tablename, id, `name`, `time`, active
from tableB
) as tbls
inner join alltables atbls
on tbls.tablename=atbls.name
inner join ordertable ot
on atbls.code=ot.code
where tbls.id='unit1'
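The question also asks that codes with no matching row still appear with active = 0. A minimal sketch of that variant starts from ordertable and uses LEFT JOINs. It assumes the Case 1 layout of ordertable and that every data table has the same columns as tableA; tableC and tableD are not shown in the question, so they are only indicated by a comment.

```sql
SELECT ot.split_position,
       ot.code,
       tbls.`name`,
       tbls.`time`,
       COALESCE(tbls.active, 0) AS active   -- 0 when no row was found for the unit
FROM ordertable AS ot
LEFT JOIN AllTables AS atbls
       ON atbls.code = ot.code
LEFT JOIN (
    SELECT 'tableA' AS tablename, id, `name`, `time`, active FROM tableA
    UNION ALL
    SELECT 'tableB' AS tablename, id, `name`, `time`, active FROM tableB
    -- tableC and tableD would be appended here in the same way
) AS tbls
       ON tbls.tablename = atbls.name
      AND tbls.id = 'unit1'                 -- filter in the ON clause, not in WHERE
ORDER BY ot.split_position;
```

Keeping the unit filter in the ON clause matters: moved to a WHERE clause, it would discard the unmatched CC and DD rows again. For those rows, name and time come back as NULL rather than blank.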

How I create a table without singles records in MySQL

For example, I have the following table (in MySQL):
| a | 1002 |
| b | 1002 |
| c | 1015 |
| a | 1005 |
| b | 1016 |
| a | 1106 |
| d | 1006 |
| a | 1026 |
| f | 1106 |
I want to select the objects that are duplicates.
| a | 1002 |
| a | 1106 |
| a | 1026 |
| a | 1005 |
| b | 1002 |
| b | 1016 |
Thank you
If I understand the question, you want to select rows where the number column is duplicated. One way to do it is to join against a subquery that returns the list of number values that occur more than once.
SELECT A.letter, A.number
FROM myTable A
INNER JOIN (
SELECT number
FROM myTable
GROUP BY number
HAVING COUNT(*) > 1
) B ON A.number = B.number
As an alternative, if you want the list of all values where there are duplicates, you can use group_concat:
select col1, group_concat(col2)
from t
group by col1
having count(*) > 1
This does not return the exact format you want. Instead it would return:
| a | 1002,1106,1026,1005 |
| b | 1002,1016 |
But you might find it useful.
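The expected output in the question lists every row for a and b, which suggests the duplicates are meant per letter rather than per number. Under that reading, a sketch that keeps one output row per original row might look like this; the column names letter and number are the ones assumed in the first answer, since the question does not name them.

```sql
SELECT A.letter, A.number
FROM myTable AS A
INNER JOIN (
    -- letters that occur on more than one row
    SELECT letter
    FROM myTable
    GROUP BY letter
    HAVING COUNT(*) > 1
) AS B ON A.letter = B.letter
ORDER BY A.letter;
```

For the sample table this keeps the four a rows and the two b rows, and drops c, d and f, which each appear only once.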