Using time interval in table for select in another - mysql

I am using a MySQL database with 2 tables:
In one table I have BatchNum and Time_Stamp. In another I have ErrorCode and Time_Stamp.
My goal is to use timestamps in one table as the beginning and end of an interval within which I'd like to select in another table. I would like to select the beginning and end of intervals within which the BatchNum is constant.
CREATE TABLE Batch (BatchNum INT, Time_Stamp DATETIME);
INSERT INTO Batch VALUES (1,'2020-12-17 07:29:36'), (1, '2020-12-17 08:31:56'), (1, '2020-12-17 08:41:56'), (2, '2020-12-17 09:31:13'), (2, '2020-12-17 10:00:00'), (2, '2020-12-17 10:00:57'), (2, '2020-12-17 10:01:57'), (3, '2020-12-17 10:47:59'), (3, '2020-12-17 10:48:59'), (3, '2020-12-17 10:50:59');
CREATE TABLE Errors (ErrorCode INT, Time_Stamp DATETIME);
INSERT INTO Errors VALUES (10, '2020-12-17 07:29:35'), (11, '2020-12-17 07:30:00'), (12, '2020-12-17 07:30:35'), (10, '2020-12-17 07:30:40'), (22, '2020-12-17 10:01:45'), (23, '2020-12-17 10:48:00');
In my example below, I would like something like SELECT BatchNum, ErrorCode, Errors.Time_Stamp WHERE Errors.Time_Stamp BETWEEN beginning_of_batch AND end_of_batch:
+----------+-----------+---------------------+
| BatchNum | ErrorCode | Errors.Time_Stamp   |
+----------+-----------+---------------------+
|        1 |        11 | 2020-12-17 07:30:00 |
|        1 |        12 | 2020-12-17 07:30:35 |
|        1 |        10 | 2020-12-17 07:30:40 |
|        2 |        22 | 2020-12-17 10:01:45 |
|        3 |        23 | 2020-12-17 10:48:00 |
+----------+-----------+---------------------+
I am using an answer from a previous question:
Select on value change
to find BatchNum changes but I don't know how to include this in another select to get the ErrorCodes happening within the interval defined by BatchNum changes.

I think you want:
select b.*, e.ErrorCode, e.Time_Stamp as error_timestamp
from (
    select b.*,
        lead(Time_Stamp) over(order by Time_Stamp) lead_time_stamp
    from Batch b
) b
inner join Errors e
    on e.Time_Stamp >= b.Time_Stamp
    and (e.Time_Stamp < b.lead_time_stamp or b.lead_time_stamp is null)
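If window functions are not available (they need MySQL 8.0 or later), a possible alternative is to collapse each batch to one start and one end timestamp and join on that interval. This is only a sketch: it assumes each BatchNum forms a single contiguous run, as in the sample data, and the aliases batch_start and batch_end are made up for illustration. Unlike the LEAD() query, it ignores errors that fall in the gap between two batches or after the last batch row.

```sql
-- Sketch: one interval per batch, then pick the errors inside it.
-- Assumes a BatchNum never reappears after another batch has started.
SELECT b.BatchNum, e.ErrorCode, e.Time_Stamp
FROM (
    SELECT BatchNum,
           MIN(Time_Stamp) AS batch_start,  -- first row of the batch
           MAX(Time_Stamp) AS batch_end     -- last row of the batch
    FROM Batch
    GROUP BY BatchNum
) AS b
INNER JOIN Errors AS e
    ON e.Time_Stamp BETWEEN b.batch_start AND b.batch_end
ORDER BY e.Time_Stamp;
```

With the sample rows from the question this should return the five rows of the desired output; the error at 07:29:35 is dropped because it precedes the first batch row.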

Related

How to count duplication items?

No customer should have a duplicated Code. As you can see in the data below, Customer A has a duplicated Code of 22 and Customer D has a duplicated Code of 44.
I would like to run a query that returns how many duplications we have; for the data below it should be 4. I have tried using GROUP BY Code and HAVING, but without much luck.
customer Code
------ ---------
A 11
A 22
A 22
B 33
C 22
D 44
D 44
D 44
D 22
We can use GROUP BY and keep the combinations that have more than one row.
create table t(
customer char(1),
Code int);
insert into t values
('A', 11),
('A', 22),
('A', 22),
('B', 33),
('C', 22),
('D', 44),
('D', 44),
('D', 44),
('D', 22);
SELECT
customer,
code,
count(*) "number"
FROM t
GROUP BY
customer,
code
HAVING
COUNT(*) > 1;
customer | code | number
:------- | ---: | -----:
A | 22 | 2
D | 44 | 3
db<>fiddle here

SQL - get distinct values and its frequency count of occurring in a group

I have a table data as:
CREATE TABLE SERP (
id INT(6) UNSIGNED AUTO_INCREMENT PRIMARY KEY,
s_product_id INT,
search_product_result VARCHAR(255)
);
INSERT INTO SERP(s_product_id, search_product_result)
VALUES
(0, 'A'),
(0, 'B'),
(0, 'C'),
(0, 'D'),
(1, 'A'),
(1, 'E'),
(2, 'A'),
(2, 'B'),
(3, 'D'),
(3, 'E'),
(3, 'D');
The data set is as follows:
s_product_id | search_product_result
___________________________________________
0 | A
0 | B
0 | C
0 | D
-------------------------------------------
1 | A
1 | E
-------------------------------------------
2 | A
2 | B
-------------------------------------------
3 | D
3 | E
3 | D
I need to list all distinct search_product_result values and count frequencies of these values occurring in s_product_id.
Required Output result-set:
DISTINCT_SEARCH_PRODUCT | s_product_id_frequency_count
------------------------------------------------------------
A | 3
B | 2
C | 1
D | 2 [occurred twice in 3, but counted only once.]
E | 2
Here, A occurs in three s_product_id : 0, 1, 2, B in two : 0, 2, and so on.
D occurred twice in the same group 3, but is counted only once for that group.
I tried grouping by search_product_result, but this counts D twice in the same group.
select search_product_result, count(*) as Total from serp group by search_product_result
Output:
search_product_result | Total
------------------------------------
A | 3
B | 2
C | 1
D | 3 <---
E | 2
You can try the query below, which uses count(distinct s_product_id):
select search_product_result, count(distinct s_product_id) as Total
from serp group by search_product_result
Use count(distinct ...):
select search_product_result, count(distinct s_product_id, search_product_result) as Total
from SERP
group by search_product_result
see dbfiddle
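Another way to express the same idea, sketched here under the assumption that the table is the SERP table from the question, is to remove the duplicate (s_product_id, search_product_result) pairs first and then count plain rows. The derived-table alias pairs is made up for illustration.

```sql
-- Sketch: de-duplicate the pairs, then count one row per product id.
SELECT search_product_result,
       COUNT(*) AS Total
FROM (
    SELECT DISTINCT s_product_id, search_product_result
    FROM SERP
) AS pairs
GROUP BY search_product_result
ORDER BY search_product_result;
```

For the sample data this counts D once for s_product_id 3, so D comes out as 2 rather than 3.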

mysql running difference with group by

The dataset I am experimenting with has the structure given in this SQLFiddle.
create table readings_tab (id int, site varchar(15), logged_at datetime, reading smallint);
insert into readings_tab values (1, 'A', '2017-08-21 13:22:00', 2500);
insert into readings_tab values (2, 'B', '2017-08-21 13:22:00', 1210);
insert into readings_tab values (3, 'C', '2017-08-21 13:22:00', 3500);
insert into readings_tab values (4, 'A', '2017-08-22 13:22:00', 2630);
insert into readings_tab values (5, 'B', '2017-08-22 13:22:00', 1400);
insert into readings_tab values (6, 'C', '2017-08-22 13:22:00', 3800);
insert into readings_tab values (7, 'A', '2017-08-23 13:22:00', 2700);
insert into readings_tab values (8, 'B', '2017-08-23 13:22:00', 1630);
insert into readings_tab values (9, 'C', '2017-08-23 13:22:00', 3950);
insert into readings_tab values (10, 'A', '2017-08-24 13:22:00', 2850);
insert into readings_tab values (11, 'B', '2017-08-24 13:22:00', 1700);
insert into readings_tab values (12, 'C', '2017-08-24 13:22:00', 4200);
insert into readings_tab values (13, 'A', '2017-08-25 13:22:00', 3500);
insert into readings_tab values (14, 'B', '2017-08-25 13:22:00', 2300);
insert into readings_tab values (15, 'C', '2017-08-25 13:22:00', 4700);
Current Query:
select t.rownum, t.logged_on, t.tot_reading, coalesce(t.tot_reading - t3.tot_reading, 0) AS daily_generation
from
(
select @rn:=@rn+1 AS rownum, date(t.logged_at) AS logged_on, sum(t.reading) AS tot_reading
from readings_tab t, (SELECT @rn:=0) t2
group by date(t.logged_at)
order by date(t.logged_at) desc
) t
left join
(
select @rn:=@rn+1 AS rownum, date(t.logged_at) AS logged_on, sum(t.reading) AS tot_reading
from readings_tab t, (SELECT @rn:=0) t2
group by date(t.logged_at)
order by date(t.logged_at) desc
) t3 on t.rownum = t3.rownum + 1
order by t.logged_on desc;
I am expecting the output below. I don't need the formula (3500+2300+4700, etc.) in the result set; I just included it to make the numbers understandable.
-----------------------------------------------------------------
| logged_on | tot_reading | daily_generation |
-----------------------------------------------------------------
| 2017-08-25 | (3500+2300+4700) = 10500 | (10500 - 8750) = 1750 |
| 2017-08-24 | (2850+1700+4200) = 8750 | (8750-8280) = 470 |
| 2017-08-23 | (2700+1630+3950) = 8280 | (8280-7830) = 450 |
| 2017-08-22 | (2630+1400+3800) = 7830 | (7830-7210) = 620 |
| 2017-08-21 | (2500+1210+3500) = 7210 | 0 |
-----------------------------------------------------------------
I cannot figure out why it doesn't produce expected output. Can someone please help?
If you use variables, make sure they are unique to each subquery; otherwise you can get incorrect results. I suggest the following adjusted query (which has some added columns to help follow what is happening):
select
t.rownum, t.logged_on, t.tot_reading
, coalesce(t.tot_reading - t3.tot_reading, 0) AS daily_generation
, t3.rownum t3_rownum
, t3.tot_reading t3_to_read
, t.tot_reading t_tot_read
from
(
select @rn:=@rn+1 AS rownum, date(t.logged_at) AS logged_on, sum(t.reading) AS tot_reading
from readings_tab t
cross join (SELECT @rn:=0) t2
group by date(t.logged_at)
order by date(t.logged_at) desc
) t
left join
(
select @rn2:=@rn2+1 AS rownum, date(t.logged_at) AS logged_on, sum(t.reading) AS tot_reading
from readings_tab t
cross join (SELECT @rn2:=0) t2
group by date(t.logged_at)
order by date(t.logged_at) desc
) t3 on t.rownum = t3.rownum + 1
order by t.logged_on desc
;
Note I also recommend using explicit CROSS JOIN syntax as it leads to easier comprehension for anyone who needs to maintain this query.
Here is the result (& also see http://sqlfiddle.com/#!9/dcb5e2/1 )
| rownum | logged_on | tot_reading | daily_generation | t3_rownum | t3_to_read | t_tot_read |
|--------|------------|-------------|------------------|-----------|------------|------------|
| 5 | 2017-08-25 | 10500 | 1750 | 4 | 8750 | 10500 |
| 4 | 2017-08-24 | 8750 | 470 | 3 | 8280 | 8750 |
| 3 | 2017-08-23 | 8280 | 450 | 2 | 7830 | 8280 |
| 2 | 2017-08-22 | 7830 | 620 | 1 | 7210 | 7830 |
| 1 | 2017-08-21 | 7210 | 0 | (null) | (null) | 7210 |
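On MySQL 8.0 or later (or MariaDB 10.2 or later) the row-number variables can be avoided altogether with the LAG() window function. The following is a sketch of that alternative, not the approach used above; the derived-table alias daily is made up for illustration.

```sql
-- Sketch: daily totals first, then subtract the previous day's total.
SELECT logged_on,
       tot_reading,
       -- LAG() is NULL for the earliest day, so COALESCE turns it into 0
       COALESCE(tot_reading - LAG(tot_reading) OVER (ORDER BY logged_on), 0)
           AS daily_generation
FROM (
    SELECT DATE(logged_at) AS logged_on,
           SUM(reading)    AS tot_reading
    FROM readings_tab
    GROUP BY DATE(logged_at)
) AS daily
ORDER BY logged_on DESC;
```

Because the window is ordered by logged_on explicitly, the result does not depend on the order in which the server happens to evaluate user variables.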

How to make Mysql variables work in a query

I've been struggling for a while now to generate code for automatic aggregations in my MySQL/MariaDB database. The method that I'm currently trying uses variables. I will admit in advance that I'm not a database expert by any means. I'm totally self-taught and have been struggling to find adequate resources for this particular problem. I've included simplified examples below. Oh, and I'm using MariaDB 10.1.
This code should work in mysql 5.6 as well as mariadb 10.0+, I have tested it on 10.1 and it works.
Here is my Table: and SQL FIDDLE <- doesn't work for some reason. Probably the dynamic columns. I'll leave it in case someone knows why.
CREATE TABLE data_points
(
id BIGINT NOT NULL AUTO_INCREMENT PRIMARY KEY,
device_id INTEGER,
dtime DATETIME,
sf INTEGER(11), -- sample frequency or interval
agg INTEGER(11), -- aggregation type, actually a fk
data_point BLOB,
PRIMARY KEY (id),
UNIQUE (device_id, dtime, sf, agg)
);
Let's insert some data:
INSERT INTO data_points
(device_id, dtime, sf, agg, data_point)
VALUES
(1, '2015-01-02 12:00:00', 1, 60, COLUMN_CREATE('aa', 12, 'ab', 34, 'ac', 45)),
(1, '2015-01-02 13:00:00', 1, 60, COLUMN_CREATE('aa', 12, 'ab', 34, 'ac', 45)),
(1, '2015-01-02 14:00:00', 1, 60, COLUMN_CREATE('aa', 12, 'ab', 34, 'ac', 45)),
(1, '2015-01-02 15:00:00', 1, 60, COLUMN_CREATE('aa', 12, 'ab', 34, 'ac', 45)),
(1, '2015-01-02 16:00:00', 1, 60, COLUMN_CREATE('aa', 12, 'ab', 34, 'ac', 45));
So up to this point everything works just fine. What I'm trying to do is perform aggregations over different time periods; my lowest grain period is 60 seconds. Here is where I have issues. It's probably something obvious.
SELECT
@dp_dtime := MAX(dtime),
@dp_aa := MIN(ROUND(COLUMN_GET(data_point, 'aa' AS DOUBLE), 4)),
@dp_ab := MIN(ROUND(COLUMN_GET(data_point, 'ab' AS DOUBLE), 4)),
@dp_ac := MIN(ROUND(COLUMN_GET(data_point, 'ac' AS DOUBLE), 4))
FROM data_points
WHERE
device_id = 1 AND
dtime BETWEEN '2015/01/02 12:00:00' AND '2015/01/17 23:05:00' AND
sf = 60 AND
agg = 1;
INSERT INTO data_points
(device_id, dtime, sf, agg, data_point)
VALUES (8, @dp_dtime, 300, 2, COLUMN_CREATE('aa', @dp_aa, 'ab', @dp_ab, 'ac', @dp_ac));
This ends up creating another row with NULL everywhere a variable was in the statement.
select @dp_dtime, @dp_aa, @dp_ab, @pd_ac;
-- This results in NULL, NULL, NULL, NULL
At this point I'm pretty sure I'm doing something wrong with the variables.
It's late, after a 14-hour day. Am I even close? Is there a better/easier way?
Any help would be greatly appreciated.
EDIT:
In my real use case the number of columns depends on the type of device we're doing an aggregation for. Columns are named Excel-style, 'aa' through 'zz' possible, although the most I've seen is about 150 columns wide. This may sound like a bad design, but the performance is surprising: I can't tell the difference between these dynamic columns and actual columns (at least as long as you don't need to index on them).
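Try the following queries. In the question, the inserted rows have sf = 1 and agg = 60, while the SELECT filters on sf = 60 AND agg = 1, so no rows match and every aggregate (and therefore every variable) is NULL. The version below uses sf = 1 and agg = 1 in both the INSERT and the SELECT, and drops the duplicate PRIMARY KEY clause from the CREATE TABLE.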
SQL:
CREATE TABLE data_points
(
id BIGINT NOT NULL AUTO_INCREMENT PRIMARY KEY,
device_id INTEGER,
dtime DATETIME,
sf INTEGER(11), -- sample frequency or interval
agg INTEGER(11), -- aggregation type, actually a fk
data_point BLOB,
UNIQUE (device_id, dtime, sf, agg)
);
INSERT INTO data_points
(device_id, dtime, sf, agg, data_point)
VALUES
(1, '2015-01-02 12:00:00', 1, 1, COLUMN_CREATE('aa', 12, 'ab', 34, 'ac', 45)),
(1, '2015-01-02 13:00:00', 1, 1, COLUMN_CREATE('aa', 12, 'ab', 34, 'ac', 45)),
(1, '2015-01-02 14:00:00', 1, 1, COLUMN_CREATE('aa', 12, 'ab', 34, 'ac', 45)),
(1, '2015-01-02 15:00:00', 1, 1, COLUMN_CREATE('aa', 12, 'ab', 34, 'ac', 45)),
(1, '2015-01-02 16:00:00', 1, 1, COLUMN_CREATE('aa', 12, 'ab', 34, 'ac', 45));
select * from data_points;
SELECT
@dp_dtime := MAX(dtime) as dp_dtime,
@dp_aa := MIN(ROUND(COLUMN_GET(data_point, 'aa' AS DOUBLE), 4)) as dp_aa,
@dp_ab := MIN(ROUND(COLUMN_GET(data_point, 'ab' AS DOUBLE), 4)) as dp_ab,
@dp_ac := MIN(ROUND(COLUMN_GET(data_point, 'ac' AS DOUBLE), 4)) as dp_ac
FROM data_points
WHERE
device_id = 1 AND
dtime BETWEEN '2015/01/02 12:00:00' AND '2015/1/17 23:05:00' AND
sf = 1 AND
agg = 1;
INSERT INTO data_points
(device_id, dtime, sf, agg, data_point)
VALUES (8, @dp_dtime, 300, 2, COLUMN_CREATE('aa', @dp_aa, 'ab', @dp_ab, 'ac', @dp_ac));
select * from data_points;
Output:
mysql> select * from data_points;
+----+-----------+---------------------+------+------+----------------------------+
| id | device_id | dtime | sf | agg | data_point |
+----+-----------+---------------------+------+------+----------------------------+
| 1 | 1 | 2015-01-02 12:00:00 | 1 | 1 | aaabacDZ |
| 2 | 1 | 2015-01-02 13:00:00 | 1 | 1 | aaabacDZ |
| 3 | 1 | 2015-01-02 14:00:00 | 1 | 1 | aaabacDZ |
| 4 | 1 | 2015-01-02 15:00:00 | 1 | 1 | aaabacDZ |
| 5 | 1 | 2015-01-02 16:00:00 | 1 | 1 | aaabacDZ |
+----+-----------+---------------------+------+------+----------------------------+
5 rows in set (0.00 sec)
mysql> SELECT
-> @dp_dtime := MAX(dtime) as dp_dtime,
-> @dp_aa := MIN(ROUND(COLUMN_GET(data_point, 'aa' AS DOUBLE), 4)) as dp_aa,
-> @dp_ab := MIN(ROUND(COLUMN_GET(data_point, 'ab' AS DOUBLE), 4)) as dp_ab,
-> @dp_ac := MIN(ROUND(COLUMN_GET(data_point, 'ac' AS DOUBLE), 4)) as dp_ac
-> FROM data_points
-> WHERE
-> device_id = 1 AND
-> dtime BETWEEN '2015/01/02 12:00:00' AND '2015/1/17 23:05:00' AND
-> sf = 1 AND
-> agg = 1;
+---------------------+---------+---------+---------+
| dp_dtime | dp_aa | dp_ab | dp_ac |
+---------------------+---------+---------+---------+
| 2015-01-02 16:00:00 | 12.0000 | 34.0000 | 45.0000 |
+---------------------+---------+---------+---------+
1 row in set (0.00 sec)
mysql> INSERT INTO data_points
-> (device_id, dtime, sf, agg, data_point)
-> VALUES (8, @dp_dtime, 300, 2, COLUMN_CREATE('aa', @dp_aa, 'ab', @dp_ab, 'ac', @dp_ac));
Query OK, 1 row affected (0.00 sec)
mysql> select * from data_points;
+----+-----------+---------------------+------+------+-------------------------------------------------+
| id | device_id | dtime | sf | agg | data_point |
+----+-----------+---------------------+------+------+-------------------------------------------------+
| 1 | 1 | 2015-01-02 12:00:00 | 1 | 1 | aaabacDZ |
| 2 | 1 | 2015-01-02 13:00:00 | 1 | 1 | aaabacDZ |
| 3 | 1 | 2015-01-02 14:00:00 | 1 | 1 | aaabacDZ |
| 4 | 1 | 2015-01-02 15:00:00 | 1 | 1 | aaabacDZ |
| 5 | 1 | 2015-01-02 16:00:00 | 1 | 1 | aaabacDZ |
| 6 | 8 | 2015-01-02 16:00:00 | 300 | 2 | ▒ aaabac (# A# ▒F# |
+----+-----------+---------------------+------+------+-------------------------------------------------+
6 rows in set (0.00 sec)
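As a possible simplification, the aggregation and the insert can be combined into a single INSERT ... SELECT so that no user variables are needed. This is an untested sketch that reuses the filter values from the answer above (sf = 1, agg = 1); the HAVING clause is an addition of mine so that nothing is inserted when no source rows match.

```sql
-- Sketch: aggregate and insert in one statement, with no user variables.
INSERT INTO data_points (device_id, dtime, sf, agg, data_point)
SELECT
    8,
    MAX(dtime),
    300,
    2,
    COLUMN_CREATE(
        'aa', MIN(ROUND(COLUMN_GET(data_point, 'aa' AS DOUBLE), 4)),
        'ab', MIN(ROUND(COLUMN_GET(data_point, 'ab' AS DOUBLE), 4)),
        'ac', MIN(ROUND(COLUMN_GET(data_point, 'ac' AS DOUBLE), 4))
    )
FROM data_points
WHERE device_id = 1
  AND dtime BETWEEN '2015-01-02 12:00:00' AND '2015-01-17 23:05:00'
  AND sf = 1
  AND agg = 1
-- Without GROUP BY the SELECT always yields one row; skip it when empty.
HAVING COUNT(*) > 0;
```

Run this instead of, not after, the variable-based INSERT: both would produce the same (device_id, dtime, sf, agg) combination and the second one would violate the UNIQUE constraint.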
Possibly a simple typo: I see @_dtime.
In the UNIQUE index, put dtime last; it will make the queries faster. Mini index lesson: All = columns should come first in a composite index, in any order (cardinality makes virtually no difference). Then you can put one 'range' (dtime). Any columns after a range are not used for filtering. See my cookbook.
Get rid of id and promote the UNIQUE index to PRIMARY KEY; it will make the queries still faster. Mini index lesson: Secondary keys (such as your UNIQUE) require bouncing between the key and the data. The PRIMARY KEY is clustered with the data (in InnoDB), thereby avoiding the bouncing. Instead, a 'range scan' over the PK is a range over the table.
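Put together, the two index suggestions might lead to a table definition along these lines. It is a sketch only: the NOT NULL constraints are my assumption (primary key columns cannot be NULL), and it presumes nothing else relies on the id column.

```sql
-- Sketch: no surrogate id; equality columns first, the range column last.
CREATE TABLE data_points
(
    device_id  INTEGER  NOT NULL,
    sf         INTEGER  NOT NULL,  -- sample frequency or interval
    agg        INTEGER  NOT NULL,  -- aggregation type, actually a fk
    dtime      DATETIME NOT NULL,
    data_point BLOB,
    PRIMARY KEY (device_id, sf, agg, dtime)
);
```

A query that filters on device_id, sf and agg with = and on dtime with BETWEEN can then be answered by one range scan over the clustered primary key.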

Detecting near duplicates above a threshold

I want to be able to query a table for records I suspect may be nearly duplicates.
I've racked my brains but can't think where to begin with this one, so I've simplified the problem as much as possible, and came to ask here!
Here's my simplified table:
CREATE TABLE sales
(
`id1` int auto_increment primary key,
`amount` decimal(6,2),
`date` datetime
);
Here's some test values:
INSERT INTO sales
(`amount`, `date`)
VALUES
(10, '2013-05-15T11:11:00'),
(11, '2013-05-15T11:11:11'),
(20, '2013-05-15T11:22:00'),
(3, '2013-05-15T12:12:00'),
(4, '2013-05-15T12:12:12'),
(45, '2013-05-15T12:22:00'),
(4, '2013-05-15T12:24:00'),
(8, '2013-05-15T13:00:00'),
(9, '2013-05-15T13:01:00'),
(10, '2013-05-15T14:00:00');
The problem
I want to return sales above amount Y that have neighbour sales above Y recorded within X minutes of them.
ie, from this data:
amt, date
(10, '2013-05-15T11:11:00'),
(11, '2013-05-15T11:11:11'),
(20, '2013-05-15T11:22:00'),
(3, '2013-05-15T12:12:00'),
(4, '2013-05-15T12:12:12'),
(45, '2013-05-15T12:22:00'),
(4, '2013-05-15T12:24:00'),
(8, '2013-05-15T13:00:00'),
(9, '2013-05-15T13:01:00'),
(10, '2013-05-15T14:00:00');
where @yVal = 5 and @xMins = 10
expected result would be:
(10, '2013-05-15T11:11:00'),
(11, '2013-05-15T11:11:11'),
(20, '2013-05-15T11:22:00'),
(8, '2013-05-15T13:00:00'),
(9, '2013-05-15T13:01:00'),
I've put the above into a fiddle: http://sqlfiddle.com/#!2/cf8fe
Any help will be greatly appreciated!
Try something like this:
SELECT DISTINCT s1.* FROM sales s1
LEFT JOIN sales s2
ON (s1.id1 != s2.id1
AND s1.amount >= s2.amount - @xVal AND s1.amount <= s2.amount + @xVal
AND s1.date >= DATE_SUB(s2.date, INTERVAL @xMins minute) AND s1.date <= DATE_ADD(s2.date, INTERVAL @xMins minute)
)
WHERE
s2.id1 is not null
Extends
Fix some errors
Result for your data looks like:
+-----+--------+---------------------+
| id1 | amount | date |
+-----+--------+---------------------+
| 1 | 10.00 | 2013-05-15 11:11:00 |
| 2 | 11.00 | 2013-05-15 11:11:11 |
| 4 | 3.00 | 2013-05-15 12:12:00 |
| 5 | 4.00 | 2013-05-15 12:12:12 |
| 8 | 8.00 | 2013-05-15 13:00:00 |
| 9 | 9.00 | 2013-05-15 13:01:00 |
+-----+--------+---------------------+
Extends 2
SELECT DISTINCT s1.* FROM sales s1
LEFT JOIN sales s2
ON (s1.id1 != s2.id1
AND s2.amount >= @xVal
AND s1.date >= DATE_SUB(s2.date, INTERVAL @xMins minute) AND s1.date <= DATE_ADD(s2.date, INTERVAL @xMins minute)
)
WHERE
s2.id1 is not null
AND s1.amount >= #xVal
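The same condition can also be written with EXISTS, which avoids the DISTINCT and the LEFT JOIN followed by an IS NOT NULL filter. The sketch below uses the variable names from the question (@yVal, @xMins) rather than @xVal, and reads "above Y" as strictly greater than; both choices are assumptions on my part.

```sql
SET @yVal  = 5;   -- amount threshold from the question
SET @xMins = 10;  -- time window in minutes from the question

-- Sketch: keep a sale if some other sale above the threshold
-- lies within @xMins minutes on either side of it.
SELECT s1.*
FROM sales AS s1
WHERE s1.amount > @yVal
  AND EXISTS (
      SELECT 1
      FROM sales AS s2
      WHERE s2.id1 <> s1.id1
        AND s2.amount > @yVal
        AND s2.`date` BETWEEN s1.`date` - INTERVAL @xMins MINUTE
                          AND s1.`date` + INTERVAL @xMins MINUTE
  )
ORDER BY s1.`date`;
```

With the sample rows I would expect this to return ids 1, 2, 8 and 9. The 20.00 sale at 11:22:00 is 10 minutes 49 seconds after the nearest qualifying sale (11:11:11), so it only appears, as in the question's expected result, once @xMins is raised to 11.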