Sum amount of overlapping datetime ranges in MySQL - mysql

I have a question that is almost the same as Sum amount of overlapping datetime ranges in MySQL, so I'm reusing part of his text, hope that is ok...
I have a table of events, each with a StartTime and EndTime (as type DateTime) in a MySQL Table.
I'm trying to output the sum of overlapping times for each type of event and the number of events that overlapped.
What is the most efficient / simple way to perform this query in MySQL?
CREATE TABLE IF NOT EXISTS `events` (
`EventID` int(10) unsigned NOT NULL auto_increment,
`EventType` int(10) unsigned NOT NULL,
`StartTime` datetime NOT NULL,
`EndTime` datetime default NULL,
PRIMARY KEY (`EventID`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 AUTO_INCREMENT=37 ;
INSERT INTO `events` (`EventID`, EventType,`StartTime`, `EndTime`) VALUES
(10001,1, '2009-02-09 03:00:00', '2009-02-09 10:00:00'),
(10002,1, '2009-02-09 05:00:00', '2009-02-09 09:00:00'),
(10003,1, '2009-02-09 07:00:00', '2009-02-09 09:00:00'),
(10004,3, '2009-02-09 11:00:00', '2009-02-09 13:00:00'),
(10005,3, '2009-02-09 12:00:00', '2009-02-09 14:00:00');
# if the query was run using the data above,
# the table below would be the desired output
# Number of Overlapped Events , The event type, | Total Amount of Time those events overlapped.
1,1, 03:00:00
2,1, 02:00:00
3,1, 02:00:00
1,3, 01:00:00
There is a really beautiful solution given there by Mark Byers and I'm wondering if that one can be extended to include "Event Type".
His solution without event type was:
SELECT `Count`, SEC_TO_TIME(SUM(Duration))
FROM (
    SELECT
        COUNT(*) AS `Count`,
        UNIX_TIMESTAMP(Times2.Time) - UNIX_TIMESTAMP(Times1.Time) AS Duration
    FROM (
        SELECT @rownum1 := @rownum1 + 1 AS rownum, `Time`
        FROM (
            SELECT DISTINCT(StartTime) AS `Time` FROM events
            UNION
            SELECT DISTINCT(EndTime) AS `Time` FROM events
        ) AS AllTimes, (SELECT @rownum1 := 0) AS Rownum
        ORDER BY `Time` DESC
    ) As Times1
    JOIN (
        SELECT @rownum2 := @rownum2 + 1 AS rownum, `Time`
        FROM (
            SELECT DISTINCT(StartTime) AS `Time` FROM events
            UNION
            SELECT DISTINCT(EndTime) AS `Time` FROM events
        ) AS AllTimes, (SELECT @rownum2 := 0) AS Rownum
        ORDER BY `Time` DESC
    ) As Times2
    ON Times1.rownum = Times2.rownum + 1
    JOIN events ON Times1.Time >= events.StartTime AND Times2.Time <= events.EndTime
    GROUP BY Times1.rownum
) Totals
GROUP BY `Count`

SELECT
    COUNT(*) AS occurrence
    , sub.EventID
    , SEC_TO_TIME(SUM(LEAST(e1end, e2end) - GREATEST(e1start, e2start))) AS duration
FROM
    ( SELECT
        e1.EventID
        , UNIX_TIMESTAMP(e1.StartTime) AS e1start
        , UNIX_TIMESTAMP(e1.EndTime) AS e1end
        , UNIX_TIMESTAMP(e2.StartTime) AS e2start
        , UNIX_TIMESTAMP(e2.EndTime) AS e2end
      FROM events e1
      INNER JOIN events e2
        ON (e1.EventType = e2.EventType AND e1.EventID <> e2.EventID
        AND NOT (e1.StartTime > e2.EndTime OR e1.EndTime < e2.StartTime))
    ) sub
GROUP BY sub.EventID
ORDER BY occurrence DESC
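One way to carry the event type through the boundary-pairing idea, offered as a rough sketch rather than a tested MySQL query: collect the start and end times per type, pair each boundary with the next one of the same type, then count the events that cover each segment. The SQL is run through Python's bundled sqlite3 module so it is self-contained (SQLite 3.25+ is assumed for window functions). The CTE names times, segs and counted are invented for this example, and on MySQL 8 the strftime('%s', ...) calls would presumably become UNIX_TIMESTAMP(...).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE events (EventID INTEGER PRIMARY KEY, EventType INTEGER,
                     StartTime TEXT, EndTime TEXT);
INSERT INTO events VALUES
 (10001, 1, '2009-02-09 03:00:00', '2009-02-09 10:00:00'),
 (10002, 1, '2009-02-09 05:00:00', '2009-02-09 09:00:00'),
 (10003, 1, '2009-02-09 07:00:00', '2009-02-09 09:00:00'),
 (10004, 3, '2009-02-09 11:00:00', '2009-02-09 13:00:00'),
 (10005, 3, '2009-02-09 12:00:00', '2009-02-09 14:00:00');
""")

QUERY = """
WITH times AS (            -- every distinct boundary, kept per event type
    SELECT EventType, StartTime AS t FROM events
    UNION
    SELECT EventType, EndTime FROM events
),
segs AS (                  -- pair each boundary with the next one of its type
    SELECT EventType, t AS seg_start,
           LEAD(t) OVER (PARTITION BY EventType ORDER BY t) AS seg_end
    FROM times
),
counted AS (               -- how many events of that type cover each segment
    SELECT s.EventType, s.seg_start, s.seg_end, COUNT(*) AS n
    FROM segs s
    JOIN events e
      ON e.EventType = s.EventType
     AND e.StartTime <= s.seg_start
     AND e.EndTime >= s.seg_end
    WHERE s.seg_end IS NOT NULL
    GROUP BY s.EventType, s.seg_start, s.seg_end
)
SELECT n, EventType,
       SUM(CAST(strftime('%s', seg_end) AS INTEGER)
         - CAST(strftime('%s', seg_start) AS INTEGER)) AS seconds
FROM counted
GROUP BY EventType, n
ORDER BY EventType, n
"""

# Rows are (number of overlapping events, event type, total seconds)
overlap_rows = conn.execute(QUERY).fetchall()
for row in overlap_rows:
    print(row)
```

For type 1 this gives 10800, 7200 and 7200 seconds, i.e. 03:00:00, 02:00:00 and 02:00:00, which agrees with the desired output. For type 3 it reports two hours with one event and one hour with two events, which differs from the single type 3 row listed in the question, so the intended definition may need checking.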

Related

How to get count value by one SQL statement?

I have source data like this:
CREATE TABLE `test` (
`startdate` varchar(100) DEFAULT NULL,
`stopdate` varchar(100) DEFAULT NULL,
`code` varchar(100) DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci;
INSERT INTO test (startdate,stopdate,code) VALUES
('20200630','20200731','a01')
,('20200701','2020731','a02')
,('20200702','20200801','a03')
,('20200901','20201001','a04')
,('20200629','20200701','a05')
,('20200621','20200628','a06')
;
I need to get data for every day between 20200701 and 20200703:
select '0701' as a,count(*) as b from test where startdate <= 20200701 and stopdate >= 20200701
union
select '0702' as a,count(*) as b from test where startdate <= 20200702 and stopdate >= 20200702
union
select '0703' as a,count(*) as b from test where startdate <= 20200703 and stopdate >= 20200703
But the problem is that I actually have a lot of data, so I cannot write out a union for every single day.
How can I optimize this statement?
Join with a synthesized table that lists all the dates you want to compare with.
SELECT RIGHT(x.date,4) AS a, COUNT(*) AS b
FROM test
JOIN (
SELECT '20200701' AS date
UNION
SELECT '20200702' AS date
UNION
SELECT '20200703' AS date
) AS x ON x.date BETWEEN test.startdate AND test.stopdate
GROUP BY x.date
A bit clumsy, because it works with varchars that contain a date, but:
with recursive sel as (
select CONVERT('20200701',CHAR(20)) as d
union all
select date_format(adddate(d,interval 1 day),'%Y%m%d')
from sel
where d< '20200703')
select d, count(*)
from sel
left join test on startdate <= d and stopdate >=d
group by d;
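If the dates were stored as real dates instead of varchars, the generated-calendar idea gets a little simpler. A minimal sketch, run through Python's sqlite3 module so it can be tried without a MySQL server. The ISO-formatted dates, the table name ranges and the CTE name days are assumptions made for this example, and the a02 stop date is assumed to be 2020-07-31.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ranges (startdate TEXT, stopdate TEXT, code TEXT);
INSERT INTO ranges VALUES
 ('2020-06-30', '2020-07-31', 'a01'),
 ('2020-07-01', '2020-07-31', 'a02'),
 ('2020-07-02', '2020-08-01', 'a03'),
 ('2020-09-01', '2020-10-01', 'a04'),
 ('2020-06-29', '2020-07-01', 'a05'),
 ('2020-06-21', '2020-06-28', 'a06');
""")

QUERY = """
WITH RECURSIVE days(d) AS (
    SELECT '2020-07-01'                      -- first day wanted
    UNION ALL
    SELECT date(d, '+1 day') FROM days
    WHERE d < '2020-07-03'                   -- last day wanted
)
SELECT d, COUNT(r.code) AS active            -- COUNT(column) ignores NULLs
FROM days
LEFT JOIN ranges r ON r.startdate <= d AND r.stopdate >= d
GROUP BY d
ORDER BY d
"""

daily_counts = conn.execute(QUERY).fetchall()
for day, active in daily_counts:
    print(day, active)
```

Counting r.code rather than using COUNT(*) is a deliberate choice here: with a LEFT JOIN, a day that matches no range still produces one row, so COUNT(*) would report 1 where 0 is probably wanted.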

Get 100 rows with maximum of 10 rows per group

I have the following query, and I would like to get 100 items from the database, but host_id is in the urls table many times, and I would like to get a maximum of 10 unique rows from that table per host_id.
select *
from urls
join hosts using(host_id)
where
(
last_run_date is null
or last_run_date <= date_sub(curdate(), interval 30 day)
)
and ignore_url != 1
limit 100
So, I would like:
Maximum Results = 100
Max Rows Per Host = 10
I am not sure what I would need to do to accomplish this task. Is there a way to do this without a subquery?
Hosts Table
CREATE TABLE `hosts` (
`host_id` INT(10) UNSIGNED NOT NULL AUTO_INCREMENT,
`host` VARCHAR(50) NOT NULL,
`last_fetched` DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP,
`ignore_host` TINYINT(1) UNSIGNED NOT NULL,
PRIMARY KEY (`host_id`),
UNIQUE INDEX `host` (`host`)
)
Urls Table
CREATE TABLE `urls` (
`url_id` INT(10) UNSIGNED NOT NULL AUTO_INCREMENT,
`parent_url_id` INT(10) UNSIGNED NOT NULL,
`scheme` VARCHAR(5) NOT NULL,
`host_id` INT(10) UNSIGNED NOT NULL,
`path` VARCHAR(500) NOT NULL,
`query` VARCHAR(500) NOT NULL,
`date_found` TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
`last_run_date` DATETIME NULL DEFAULT NULL,
`ignore_url` TINYINT(1) UNSIGNED NOT NULL,
PRIMARY KEY (`url_id`),
UNIQUE INDEX `host_path_query` (`host_id`, `path`, `query`)
)
That's it (I hope).
I can't test it for real because I have no data. Please test it and give me a ping.
SELECT *
FROM (
    SELECT
        @nr:=IF(@lasthost = host_id, @nr+1, 1) AS nr,
        u.*,
        @lasthost:=IF(@lasthost = host_id, @lasthost, host_id) AS lasthost
    FROM
        urls u,
        ( SELECT @nr:=4, @lasthost:=-1 ) AS tmp
    WHERE (
        last_run_date IS NULL
        OR last_run_date <= date_sub(curdate(), INTERVAL 30 DAY)
    )
    AND ignore_url != 1
    ORDER BY host_id, last_run_date
) AS t
LEFT JOIN hosts USING(host_id)
WHERE t.nr < 11
LIMIT 100;
OK, first:
I select only the rows matched by your query and order them by host_id and last_run_date.
SELECT
    u.*
FROM
    urls u,
    ( SELECT @nr:=4, @lasthost:=-1 ) AS tmp
WHERE (
    last_run_date IS NULL
    OR last_run_date <= date_sub(curdate(), INTERVAL 30 DAY)
)
AND ignore_url != 1
ORDER BY host_id, last_run_date
Second:
I add two variables, nr and lasthost, and set them in the select. Now
I increment nr on each row and reset it to 1 when the host_id changes. So I get a
list of rows numbered from 1 to n for each host_id.
SELECT
    @nr:=IF(@lasthost = host_id, @nr+1, 1) AS nr,
    u.*,
    @lasthost:=IF(@lasthost = host_id, @lasthost, host_id) AS lasthost
FROM
    urls u,
    ( SELECT @nr:=4, @lasthost:=-1 ) AS tmp
WHERE (
    last_run_date IS NULL
    OR last_run_date <= date_sub(curdate(), INTERVAL 30 DAY)
)
AND ignore_url != 1
ORDER BY host_id, last_run_date
Third:
I wrap this query in a new select so I can join your second table, keep only the rows with nr less than 11, and limit the result to 100.
SELECT *
FROM (
    SELECT
        @nr:=IF(@lasthost = host_id, @nr+1, 1) AS nr,
        u.*,
        @lasthost:=IF(@lasthost = host_id, @lasthost, host_id) AS lasthost
    FROM
        urls u,
        ( SELECT @nr:=4, @lasthost:=-1 ) AS tmp
    WHERE (
        last_run_date IS NULL
        OR last_run_date <= date_sub(curdate(), INTERVAL 30 DAY)
    )
    AND ignore_url != 1
    ORDER BY host_id, last_run_date
) AS t
LEFT JOIN hosts USING(host_id)
WHERE t.nr < 11
LIMIT 100;
That's all.
So you need a limited JOIN. Another guess:
SELECT * FROM hosts
LEFT JOIN urls ON
urls.host_id = hosts.host_id
WHERE urls.host_id IN
(SELECT host_id FROM urls
LIMIT 0,10)
LIMIT 0,100
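On MySQL 8.0 or later, the per-host cap could probably be written with ROW_NUMBER() instead of user variables. The sketch below shows the idea on a toy table through Python's sqlite3 module (SQLite 3.25+ is assumed for window functions). The caps are shrunk to 2 rows per host and 5 rows overall so the effect is visible, and the sample rows and the alias numbered are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE urls (url_id INTEGER PRIMARY KEY, host_id INTEGER, path TEXT)"
)
# Three hosts with four URLs each (made-up data); url_id is assigned 1..12
sample_rows = [(host, f"/page{i}") for host in (1, 2, 3) for i in range(4)]
conn.executemany("INSERT INTO urls (host_id, path) VALUES (?, ?)", sample_rows)

QUERY = """
SELECT url_id, host_id, path
FROM (
    SELECT u.*,
           ROW_NUMBER() OVER (PARTITION BY host_id ORDER BY url_id) AS nr
    FROM urls u
) AS numbered
WHERE nr <= :per_host        -- cap per host
ORDER BY host_id, url_id
LIMIT :total                 -- overall cap
"""

capped = conn.execute(QUERY, {"per_host": 2, "total": 5}).fetchall()
for row in capped:
    print(row)
```

A derived table is still needed, because a window function result cannot be filtered in the WHERE clause of the same query level. With the real schema, the WHERE conditions on last_run_date and ignore_url would go inside the inner select, and the join to hosts outside it.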

Load top 5 records per date

I have a table that stores the quiz scores of different users by date. I want to load the top 5 scorers for every date.
Table sample create statement:
CREATE TABLE `subscriber_score` (
`msisdn` varchar(25) COLLATE utf8_unicode_ci NOT NULL,
`date` date NOT NULL,
`score` int(11) NOT NULL DEFAULT '0',
`total_questions_sent` int(11) NOT NULL DEFAULT '0',
`total_correct_answers` int(11) NOT NULL DEFAULT '0',
`total_wrong_answers` int(11) NOT NULL DEFAULT '0',
PRIMARY KEY (`msisdn`,`date`),
KEY `fk_subscriber_score_subscriber1` (`msisdn`),
CONSTRAINT `fk_subscriber_score_subscriber1` FOREIGN KEY (`msisdn`) REFERENCES `subscriber` (`msisdn`) ON DELETE NO ACTION ON UPDATE NO ACTION
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
Query which I have tried:
SELECT subscriber.msisdn AS msisdn,subscriber.name AS name,subscriber.gender AS gender,tmp2.score AS score,tmp2.date AS winning_date
FROM subscriber,
(SELECT msisdn,tmp.date,tmp.score
FROM subscriber_score,
(SELECT date,MAX(score) AS score
FROM subscriber_score
WHERE date > '2014-10-10' AND date < '2014-11-10' GROUP BY date)
tmp
WHERE subscriber_score.date=tmp.date AND subscriber_score.score=tmp.score)
tmp2
WHERE subscriber.msisdn=tmp2.msisdn ORDER BY winning_date
Actual output: only one top scorer is shown for every date.
Wanted output: the top 5 (or, say, 10) records for every date.
I think you can do this using variables to assign each row a row number, then filter the top 5 for each date.
SELECT  s.name AS name,
        s.gender AS gender,
        s.msisdn,
        ss.winning_date,
        ss.score
FROM    (   SELECT  ss.msisdn,
                    ss.score,
                    @r:= CASE WHEN ss.date = @d THEN @r + 1 ELSE 1 END AS RowNum,
                    @d:= ss.date AS winning_date
            FROM    subscriber_score AS ss
                    CROSS JOIN (SELECT @d:= '', @r:= 0) AS v
            WHERE   ss.date > '2014-10-10'
            AND     ss.date < '2014-11-10'
            ORDER BY ss.date, ss.score DESC
        ) AS ss
        INNER JOIN subscriber AS s
            ON s.msisdn = ss.msisdn
WHERE   ss.RowNum <= 5;
Example on SQL Fiddle
Refer to this query. It's not complete, but I hope it helps:
SELECT SCORE
FROM table
WHERE date='somedate'
ORDER BY SCORE DESC LIMIT 5
select bc.msisdn msisdn,bc.name name,bc.gender gender,ab.score score,ab.date winning_date
from
(
select msisdn,date,score,
dense_rank() over (partition by date order by score desc) rnk
from subscriber_score
) ab,subscriber bc
where bc.msisdn=ab.msisdn and ab.rnk<=5
order by winning_date ;
This is how you can solve your problem in Oracle SQL.
Try the query below:
SELECT subscriber.msisdn AS msisdn,subscriber.name AS name,subscriber.gender AS gender,tmp.score AS score,tmp.date AS winning_date
FROM subscriber inner join
(select msisdn,date, score, ROW_NUMBER() OVER(PARTITION BY date ORDER BY score DESC) AS rn
FROM subscriber_score
WHERE date > '2014-10-10' AND date < '2014-11-10')
tmp
on subscriber.msisdn=tmp.msisdn and tmp.rn<=5
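The two window functions suggested in the answers above behave differently when scores tie, which matters for a "top N per date" list. A small sketch, again through Python's sqlite3 module (SQLite 3.25+ assumed), with made-up scores and a cap of 2 instead of 5. The helper top_per_date and the alias ranked are hypothetical names for this example. ROW_NUMBER() returns at most N rows per date, while DENSE_RANK() returns every row holding one of the N highest distinct scores.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE subscriber_score (msisdn TEXT, date TEXT, score INTEGER)")
conn.executemany(
    "INSERT INTO subscriber_score VALUES (?, ?, ?)",
    [
        ("100", "2014-10-11", 9),
        ("101", "2014-10-11", 9),   # ties with subscriber 100
        ("102", "2014-10-11", 7),
        ("103", "2014-10-11", 5),
        ("100", "2014-10-12", 8),
    ],
)

def top_per_date(rank_expr, top_n):
    """Keep rows whose rank within their date is <= top_n.

    rank_expr is a fixed window expression chosen below, never user input.
    """
    query = f"""
        SELECT date, msisdn, score
        FROM (
            SELECT date, msisdn, score, {rank_expr} AS rnk
            FROM subscriber_score
        ) AS ranked
        WHERE rnk <= ?
        ORDER BY date, score DESC, msisdn
    """
    return conn.execute(query, (top_n,)).fetchall()

# msisdn is added as a tie-breaker so ROW_NUMBER() is deterministic
by_row_number = top_per_date(
    "ROW_NUMBER() OVER (PARTITION BY date ORDER BY score DESC, msisdn)", 2
)
by_dense_rank = top_per_date(
    "DENSE_RANK() OVER (PARTITION BY date ORDER BY score DESC)", 2
)
print(len(by_row_number), len(by_dense_rank))
```

Here ROW_NUMBER() keeps the two subscribers with 9 points on 2014-10-11, while DENSE_RANK() also keeps the subscriber with 7 points, because 7 is the second-highest distinct score for that date.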

Order by multiple conditions

I'm very new to this, and the problem has become ungoogleable (is that a word?).
The rank is by time, but:
a time done with ( A=0 ) AND ( B=0 ) beats everyone
a time done with ( A=0 ) AND ( B=1 ) beats everyone with ( A=1 )
a time done with ( A=1 ) AND ( B=0 ) beats everyone with ( A=1 AND B=1 )
rank example (track=desert)
pos  car     time  A    B
1.   yellow  90    No   No
2.   red     95    No   No
3.   grey    78    No   Yes
4.   orange  253   No   Yes
5.   black   86    Yes  No
6.   white   149   Yes  No
7.   pink    59    Yes  Yes
8.   blue    61    Yes  Yes
To make it even worse, the table accepts multiple records for the same car.
Here are the entries:
create table `rank`
(
`id` int not null auto_increment,
`track` varchar(25) not null,
`car` varchar(32) not null,
`time` int not null,
`a` boolean not null,
`b` boolean not null,
primary key (`id`)
);
insert into `rank` (track,car,time,a,b) values
('desert','red','95','0','0'),
('desert','yellow','89','0','1'),
('desert','yellow','108','0','0'),
('desert','red','57','1','1'),
('desert','orange','120','1','0'),
('desert','grey','85','0','1'),
('desert','grey','64','1','0'),
('desert','yellow','90','0','0'),
('desert','white','92','1','1'),
('desert','orange','253','0','1'),
('desert','black','86','1','0'),
('desert','yellow','94','0','1'),
('desert','white','149','1','0'),
('desert','pink','59','1','1'),
('desert','grey','78','0','1'),
('desert','blue','61','1','1'),
('desert','pink','73','1','1');
please, help? :p
ps: sorry about the example table
To prioritize a, then b, then time, use order by a, b, time.
You can use a not exists subquery to select only the best row per car.
Finally, you can add a Pos column using MySQL's variables, like @rn := @rn + 1.
Example query:
select @rn := @rn + 1 as pos
, r.*
from `rank` r
join (select @rn := 0) init
where not exists
(
select *
from `rank` r2
where r.car = r2.car
and (
r2.a < r.a
or (r2.a = r.a and r2.b < r.b)
or (r2.a = r.a and r2.b = r.b and r2.time < r.time)
)
)
order by
a
, b
, time
See it working at SQL Fiddle.
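With window functions (MySQL 8.0+, where rank is also a reserved word and needs backticks), picking the best row per car and numbering the result can both be done with ROW_NUMBER(). A sketch using the sample rows from the question, run through Python's sqlite3 module (SQLite 3.25+ assumed). The table is named results here to sidestep the reserved word, the track column is left out for brevity, and the aliases best, per_car and pos are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE results (car TEXT, time INTEGER, a INTEGER, b INTEGER)")
conn.executemany("INSERT INTO results VALUES (?, ?, ?, ?)", [
    ("red", 95, 0, 0), ("yellow", 89, 0, 1), ("yellow", 108, 0, 0),
    ("red", 57, 1, 1), ("orange", 120, 1, 0), ("grey", 85, 0, 1),
    ("grey", 64, 1, 0), ("yellow", 90, 0, 0), ("white", 92, 1, 1),
    ("orange", 253, 0, 1), ("black", 86, 1, 0), ("yellow", 94, 0, 1),
    ("white", 149, 1, 0), ("pink", 59, 1, 1), ("grey", 78, 0, 1),
    ("blue", 61, 1, 1), ("pink", 73, 1, 1),
])

QUERY = """
SELECT ROW_NUMBER() OVER (ORDER BY a, b, time) AS pos, car, time, a, b
FROM (
    -- best = 1 marks each car's best run: lowest a, then lowest b, then time
    SELECT car, time, a, b,
           ROW_NUMBER() OVER (PARTITION BY car ORDER BY a, b, time) AS best
    FROM results
) AS per_car
WHERE best = 1
ORDER BY pos
"""

ranking = conn.execute(QUERY).fetchall()
for row in ranking:
    print(row)
```

The resulting order is yellow, red, grey, orange, black, white, pink, blue, which is the ranking given in the question's example table.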

Time interval calculation in time series using SQL

I have a MySQL table like this
CREATE TABLE IF NOT EXISTS `vals` (
`DT` datetime NOT NULL,
`value` INT(11) NOT NULL,
PRIMARY KEY (`DT`)
);
DT is a unique date and time.
data sample:
INSERT INTO `vals` (`DT`,`value`) VALUES
('2011-02-05 06:05:00', 300),
('2011-02-05 11:05:00', 250),
('2011-02-05 14:35:00', 145),
('2011-02-05 16:45:00', 100),
('2011-02-05 18:50:00', 125),
('2011-02-05 19:25:00', 100),
('2011-02-05 21:10:00', 125),
('2011-02-06 00:30:00', 150);
I need to get something like this:
start|end|value
NULL,'2011-02-05 06:05:00',300
'2011-02-05 06:05:00','2011-02-05 11:05:00',250
'2011-02-05 11:05:00','2011-02-05 14:35:00',145
'2011-02-05 14:35:00','2011-02-05 16:45:00',100
'2011-02-05 16:45:00','2011-02-05 18:50:00',125
'2011-02-05 18:50:00','2011-02-05 19:25:00',100
'2011-02-05 19:25:00','2011-02-05 21:10:00',125
'2011-02-05 21:10:00','2011-02-06 00:30:00',150
'2011-02-06 00:30:00',NULL,NULL
I tried the following query:
SELECT T1.DT AS `start`,T2.DT AS `stop`, T2.value AS value FROM (
SELECT DT FROM vals
) T1
LEFT JOIN (
SELECT DT,value FROM vals
) T2
ON T2.DT > T1.DT ORDER BY T1.DT ASC
but it returns too many rows (29 instead of 9), and I could not find any way to limit this using SQL. Is it possible in MySQL?
Use a subquery
SELECT
(
select max(T1.DT)
from vals T1
where T1.DT < T2.DT
) AS `start`,
T2.DT AS `stop`,
T2.value AS value
FROM vals T2
ORDER BY T2.DT ASC
You can also use a MySQL specific solution employing variables
SELECT CAST( @dt AS DATETIME ) AS `start` , @dt := DT AS `stop` , `value`
FROM (SELECT @dt := NULL) dt, vals
ORDER BY DT ASC
But you need to do it precisely:
the ORDER BY must be present, otherwise the variables don't roll over properly;
the variable needs to be reset to NULL within the query, using a subquery to set it, otherwise the second run in a row will not start with NULL.
You can use a server-side variable to simulate it:
select @myvar as `start`, DT as `end`, value, @myvar := DT as next_rows_start
from vals
order by DT
Variables are interpreted from left to right in sequence, so the two references to @myvar (start and next_rows_start) will output two different values.
Just remember to reset @myvar to null before and/or after the query, otherwise the second and subsequent runs will have a wrong first row:
select @myvar := null
This would be easier if the table had a running ID column which corresponds to the times in DT (same order). If you don't want to change the table you can use a temp:
drop table if exists temp;
CREATE TABLE temp (
`id` INT(11) AUTO_INCREMENT,
`DT` datetime NOT NULL,
`value` INT(11) NOT NULL,
PRIMARY KEY (`id`)
);
insert into temp (DT,value) select * from vals order by DT asc;
select t1.DT as `start`, t2.DT as `end`, t2.value
from temp t2
left join temp t1 ON t2.id = t1.id + 1;
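On MySQL 8.0 or later, the previous row's timestamp is also available through the LAG() window function, which avoids the correlated subquery, the user variable and the temporary table. A sketch on the sample data, run through Python's sqlite3 module (SQLite 3.25+ assumed). It produces the eight rows that have a stop time, not the trailing row with a NULL end shown in the desired output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE vals (DT TEXT PRIMARY KEY, value INTEGER NOT NULL)")
conn.executemany("INSERT INTO vals VALUES (?, ?)", [
    ("2011-02-05 06:05:00", 300),
    ("2011-02-05 11:05:00", 250),
    ("2011-02-05 14:35:00", 145),
    ("2011-02-05 16:45:00", 100),
    ("2011-02-05 18:50:00", 125),
    ("2011-02-05 19:25:00", 100),
    ("2011-02-05 21:10:00", 125),
    ("2011-02-06 00:30:00", 150),
])

QUERY = """
SELECT LAG(DT) OVER (ORDER BY DT) AS start,   -- previous row's DT, NULL for the first
       DT AS stop,
       value
FROM vals
ORDER BY DT
"""

intervals = conn.execute(QUERY).fetchall()
for start, stop, value in intervals:
    print(start, stop, value)
```

If the final row with a NULL end is really needed, one option would be to append it with a UNION ALL that selects MAX(DT), NULL, NULL from the same table.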