SQL query to retrieve records closest to timestamp - mysql

I'm trying to retrieve the records from a table in my MySQL database, where:
the timestamp is the closest to a variable I provide; and,
grouped by the fields keyA, keyB, keyC and keyD
I've hard-coded the variable below to test this, but I cannot get the query to work.
SQLFiddle
My current schema is:
CREATE TABLE dataHistory (
timestamp datetime NOT NULL,
keyA varchar(10) NOT NULL,
keyB varchar(10) NOT NULL,
keyC varchar(25) NOT NULL,
keyD varchar(10) NOT NULL,
value int NOT NULL,
PRIMARY KEY (timestamp,keyA,keyB,keyC,keyD)
);
INSERT INTO dataHistory
(timestamp, keyA, keyB, keyC, keyD, value)
VALUES
('2016-05-12 04:15:00', 'value1', 'all', 'value2', 'domestic', 96921),
('2016-05-12 04:05:00', 'value1', 'all', 'value2', 'domestic', 96947),
('2016-05-12 04:20:00', 'value1', 'all', 'value2', 'domestic', 96954),
('2016-05-12 04:15:00', 'value1', 'all', 'value3', 'domestic', 2732),
('2016-05-12 04:10:00', 'value1', 'all', 'value3', 'domestic', 2819),
('2016-05-12 04:20:00', 'value1', 'all', 'value3', 'domestic', 2802);
and the query I currently have is:
SELECT e.difference, e.timestamp, e.keyA, e.keyB, e.keyC, e.keyD, e.value
FROM (SELECT TIMESTAMPDIFF(minute, '2016-05-12 04:11:00', d.timestamp) as difference, d.timestamp, d.keyA, d.keyB, d.keyC, d.keyD, d.value
FROM dataHistory d
GROUP BY d.keyA, d.keyB, d.keyC, d.keyD) as e;
All I can seem to extract from the sample data is the earliest two records and not the two closest to the datetime.
What I receive:
difference timestamp keyA keyB keyC keyD value
-10 May, 12 2016 04:05:00 value1 all value2 domestic 96947
-5 May, 12 2016 04:10:00 value1 all value3 domestic 2819
I am expecting to see:
timestamp keyA keyB keyC keyD value
May, 12 2016 04:15:00 value1 all value2 domestic 96921
May, 12 2016 04:10:00 value1 all value3 domestic 2819
Any assistance would be appreciated!

SELECT e.difference, e.timestamp, e.keyA, e.keyB, e.keyC, e.keyD, e.value
FROM (SELECT ABS(TIMESTAMPDIFF(minute, '2016-05-12 04:11:00', d.timestamp)) as difference, d.timestamp, d.keyA, d.keyB, d.keyC, d.keyD, d.value
FROM dataHistory d
ORDER BY difference) as e
GROUP BY e.keyA, e.keyB, e.keyC, e.keyD;
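This query returns the values you want on the sample data. Be aware, though, that it relies on MySQL taking the first row of each group from the ordered derived table, which is not guaranteed: the values of non-grouped columns are indeterminate, and with ONLY_FULL_GROUP_BY enabled (the default since MySQL 5.7) the query is rejected.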

Does this help?
SELECT
TIMESTAMPDIFF (MINUTE , '2016-05-12 04:11:00' , MainTable.timestamp) AS Difference ,
MainTable.timestamp ,
MainTable.KeyA ,
MainTable.KeyB ,
MainTable.KeyC ,
MainTable.KeyD ,
MainTable.value
FROM
dataHistory AS MainTable
LEFT OUTER JOIN
dataHistory AS SecondaryTable
ON
MainTable.KeyA = SecondaryTable.KeyA
AND
MainTable.KeyB = SecondaryTable.KeyB
AND
MainTable.KeyC = SecondaryTable.KeyC
AND
MainTable.KeyD = SecondaryTable.KeyD
AND
ABS (TIMESTAMPDIFF (MINUTE , '2016-05-12 04:11:00' , MainTable.timestamp)) > ABS (TIMESTAMPDIFF (MINUTE , '2016-05-12 04:11:00' , SecondaryTable.timestamp))
WHERE
SecondaryTable.timestamp IS NULL;
Guy Glantser,
Data Professional,
Madeira - Data Solutions,
http://www.madeiradata.com
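The LEFT JOIN ... IS NULL pattern above keeps a row only when no other row in the same (keyA, keyB, keyC, keyD) group is strictly closer to the target, so a target exactly halfway between two rows returns both. Here is a minimal sketch for trying the pattern against the question's sample data. It assumes Python's built-in sqlite3 module as a stand-in for MySQL. SQLite has no TIMESTAMPDIFF, so distances here are differences of strftime('%s', ...) epoch seconds, and the target time is a bound parameter (:target) rather than a literal.

```python
import sqlite3

# The question's sample rows: (timestamp, keyA, keyB, keyC, keyD, value).
SAMPLE = [
    ("2016-05-12 04:15:00", "value1", "all", "value2", "domestic", 96921),
    ("2016-05-12 04:05:00", "value1", "all", "value2", "domestic", 96947),
    ("2016-05-12 04:20:00", "value1", "all", "value2", "domestic", 96954),
    ("2016-05-12 04:15:00", "value1", "all", "value3", "domestic", 2732),
    ("2016-05-12 04:10:00", "value1", "all", "value3", "domestic", 2819),
    ("2016-05-12 04:20:00", "value1", "all", "value3", "domestic", 2802),
]

# Keep row m only if no row s in the same group is strictly closer to :target.
# Distances are absolute differences of Unix epoch seconds (SQLite stand-in).
ANTI_JOIN = """
SELECT m.timestamp, m.keyC, m.value
FROM dataHistory AS m
LEFT JOIN dataHistory AS s
  ON  s.keyA = m.keyA AND s.keyB = m.keyB
  AND s.keyC = m.keyC AND s.keyD = m.keyD
  AND ABS(strftime('%s', m.timestamp) - strftime('%s', :target))
    > ABS(strftime('%s', s.timestamp) - strftime('%s', :target))
WHERE s.timestamp IS NULL
ORDER BY m.keyA, m.keyB, m.keyC, m.keyD, m.timestamp
"""


def closest_rows(target):
    """Return (timestamp, keyC, value) for the row(s) closest to target in each group."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute(
            "CREATE TABLE dataHistory (timestamp TEXT NOT NULL, keyA TEXT NOT NULL,"
            " keyB TEXT NOT NULL, keyC TEXT NOT NULL, keyD TEXT NOT NULL,"
            " value INTEGER NOT NULL, PRIMARY KEY (timestamp, keyA, keyB, keyC, keyD))"
        )
        con.executemany("INSERT INTO dataHistory VALUES (?, ?, ?, ?, ?, ?)", SAMPLE)
        return con.execute(ANTI_JOIN, {"target": target}).fetchall()
    finally:
        con.close()


if __name__ == "__main__":
    print(closest_rows("2016-05-12 04:11:00"))
```

Passing '2016-05-12 04:11:00' returns the 04:15 row for value2 and the 04:10 row for value3, which matches the expected output in the question. Passing '2016-05-12 04:15:00' returns the two 04:15 rows.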

You are obviously expecting some magic to happen here. You group by some fields and select the column timestamp and its difference to the current time. And somehow you think you should get the closest time to now. Why? Why should this happen? You are not telling the DBMS to do that. You are simply letting it pick one of the matching timestamps arbitrarily. To pick a particular value per group, you need an aggregate function, e.g. MIN to get a minimum value.
You need two steps:
First step: Find the minimum timestamp difference to now per group.
select
keya,
keyb,
keyc,
keyd,
min(abs(timestampdiff(minute, '2016-05-12 04:11:00', timestamp))) as difference
from dataHistory
group by keya, keyb, keyc, keyd;
Second step: With the query from the first step, find the matching records for each of these minimum differences.
select
best.difference,
dh.timestamp,
best.keyA,
best.keyB,
best.keyC,
best.keyD,
dh.value
from
(
select
keya, keyb, keyc, keyd,
min(abs(timestampdiff(minute, '2016-05-12 04:11:00', timestamp))) as difference
from dataHistory
group by keya, keyb, keyc, keyd
) best
join dataHistory dh
on dh.keya = best.keya and dh.keyb = best.keyb
and dh.keyc = best.keyc and dh.keyd = best.keyd
and abs(timestampdiff(minute, '2016-05-12 04:11:00', dh.timestamp)) = best.difference
order by best.keyA, best.keyB, best.keyC, best.keyD;
SQL fiddle: http://sqlfiddle.com/#!9/a6004b/10
(Replace '2016-05-12 04:11:00' with now() in your real query.)
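The two-step query can also be tried outside MySQL. Here is a minimal sketch that uses Python's sqlite3 module as a stand-in for MySQL. It makes two substitutions: epoch-second differences replace TIMESTAMPDIFF(minute, ...), so the difference column is in seconds, and a bound :target parameter replaces the hard-coded time. Like the join on best.difference, it returns every row that ties for the smallest difference.

```python
import sqlite3

# The question's sample rows: (timestamp, keyA, keyB, keyC, keyD, value).
SAMPLE = [
    ("2016-05-12 04:15:00", "value1", "all", "value2", "domestic", 96921),
    ("2016-05-12 04:05:00", "value1", "all", "value2", "domestic", 96947),
    ("2016-05-12 04:20:00", "value1", "all", "value2", "domestic", 96954),
    ("2016-05-12 04:15:00", "value1", "all", "value3", "domestic", 2732),
    ("2016-05-12 04:10:00", "value1", "all", "value3", "domestic", 2819),
    ("2016-05-12 04:20:00", "value1", "all", "value3", "domestic", 2802),
]

# Step 1 (derived table "best"): smallest distance to :target per group.
# Step 2: join back to dataHistory to fetch the row(s) at that distance.
TWO_STEP = """
SELECT best.difference, dh.timestamp, dh.keyC, dh.value
FROM (
    SELECT keyA, keyB, keyC, keyD,
           MIN(ABS(strftime('%s', timestamp) - strftime('%s', :target))) AS difference
    FROM dataHistory
    GROUP BY keyA, keyB, keyC, keyD
) AS best
JOIN dataHistory AS dh
  ON  dh.keyA = best.keyA AND dh.keyB = best.keyB
  AND dh.keyC = best.keyC AND dh.keyD = best.keyD
  AND ABS(strftime('%s', dh.timestamp) - strftime('%s', :target)) = best.difference
ORDER BY dh.keyA, dh.keyB, dh.keyC, dh.keyD, dh.timestamp
"""


def closest_with_difference(target):
    """Return (difference_in_seconds, timestamp, keyC, value) for each closest row."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute(
            "CREATE TABLE dataHistory (timestamp TEXT NOT NULL, keyA TEXT NOT NULL,"
            " keyB TEXT NOT NULL, keyC TEXT NOT NULL, keyD TEXT NOT NULL,"
            " value INTEGER NOT NULL, PRIMARY KEY (timestamp, keyA, keyB, keyC, keyD))"
        )
        con.executemany("INSERT INTO dataHistory VALUES (?, ?, ?, ?, ?, ?)", SAMPLE)
        return con.execute(TWO_STEP, {"target": target}).fetchall()
    finally:
        con.close()


if __name__ == "__main__":
    for row in closest_with_difference("2016-05-12 04:11:00"):
        print(row)
```

For '2016-05-12 04:11:00' this gives differences of 240 and 60 seconds: the 04:15 row for value2 and the 04:10 row for value3. The target '2016-05-12 04:17:30' is 150 seconds from both 04:15 and 04:20, so each group returns two rows.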

Related

SQL multi query

I need some help doing this in one query (if it is possible).
(This is a theoretical example, and I assume the presence of events in event_name, like registration/action etc.)
I have 3 columns:
-user_id
-event_timestamp
-event_name
From these 3 columns I need to create a new table with 4 columns:
-year and month of the user's registration
-number of new user registration in this month
-number of users who returned to the second calendar month after registration
-return probability
The result should look like this:
2019-1 | 1 | 1 | 100%
2019-2 | 3 | 2 | 67%
2019-3 | 2 | 0 | 0%
What I've done so far:
I'm using this toy example of my main table:
CREATE TABLE `main` (
`event_timestamp` timestamp,
`user_id` int(10),
`event_name` char(12)
) DEFAULT CHARSET=utf8;
INSERT INTO `main` (`event_timestamp`, `user_id`, `event_name`) VALUES
('2019-01-23 20:02:21.550', '1', 'registration'),
('2019-01-24 20:03:21.550', '2', 'action'),
('2019-02-21 20:04:21.550', '3', 'registration'),
('2019-02-22 20:05:21.550', '4', 'registration'),
('2019-02-23 20:06:21.550', '5', 'registration'),
('2019-02-23 20:06:21.550', '1', 'action'),
('2019-02-24 20:07:21.550', '6', 'action'),
('2019-03-20 20:08:21.550', '3', 'action'),
('2019-03-21 20:09:21.550', '4', 'action'),
('2019-03-22 20:10:21.550', '9', 'action'),
('2019-03-23 20:11:21.550', '10', 'registration'),
('2019-03-22 20:10:21.550', '4', 'action'),
('2019-03-22 20:10:21.550', '5', 'action'),
('2019-03-24 20:11:21.550', '11', 'registration');
I'm trying to test some queries to create 4 new columns:
This is for column #1: we select the year and month from the timestamp where the event is a registration (I guess), but I need to group it by month (like 2019-11, 2019-12).
SELECT DATE_FORMAT(event_timestamp, '%Y-%m') AS column_1 FROM main
WHERE event_name='registration';
For column #2 we need to count the users with event_name registration in each month, or we could try searching for each user_id's first activity, but I don't know how to do this.
Here are some thoughts about it...
SELECT COUNT(DISTINCT user_id) AS user_count
FROM main
GROUP BY MONTH(event_timestamp);
SELECT COUNT(DISTINCT user_id) AS user_count FROM main
WHERE event_name='registration';
For column #3 we need to match each user_id that has a registration event with any event of the same user in the following calendar month, so we get the users who returned in the next month.
Any idea how to create this query?
This is how to calculate column #4:
SELECT *,
ROUND ((column_3/column_2)*100) AS column_4
FROM main;
I hope you will find the following answer helpful.
The first column extracts the year and month. The new_users column is the COUNT of unique user ids whose event is 'registration'. DISTINCT is needed because the JOIN can duplicate a user who took multiple actions the following month. The returned_users column is the number of users who have an action in the month after their registration. It also needs DISTINCT, since a user can have several actions during one month. The final column is the probability you asked for, computed from the two previous columns.
The JOIN clause is a self-join to bring the users that had at least one action the next month of their registration.
SELECT CONCAT(YEAR(A.event_timestamp),'-',MONTH(A.event_timestamp)),
COUNT(DISTINCT(CASE WHEN A.event_name LIKE 'registration' THEN A.user_id END)) AS new_users,
COUNT(DISTINCT B.user_id) AS returned_users,
CASE WHEN COUNT(DISTINCT(CASE WHEN A.event_name LIKE 'registration' THEN A.user_id END))=0 THEN 0 ELSE COUNT(DISTINCT B.user_id)/COUNT(DISTINCT(CASE WHEN A.event_name LIKE 'registration' THEN A.user_id END))*100 END AS My_Ratio
FROM main AS A
LEFT JOIN main AS B
ON A.user_id=B.user_id AND PERIOD_DIFF(EXTRACT(YEAR_MONTH FROM B.event_timestamp), EXTRACT(YEAR_MONTH FROM A.event_timestamp))=1
AND A.event_name='registration' AND B.event_name='action'
GROUP BY CONCAT(YEAR(A.event_timestamp),'-',MONTH(A.event_timestamp))
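Comparing MONTH() values alone ignores the year, so December + 1 never equals January, and January of one year would match February of any year. A month key such as year * 12 + month grows by exactly 1 from December to January. Here is a minimal sketch of that key. It assumes Python's sqlite3 module as a stand-in for MySQL (strftime replaces YEAR() and MONTH()) and loads the question's toy rows. Unlike the query above, it groups only the registration rows. A returned user is one with an action in the following calendar month.

```python
import sqlite3

# The question's toy rows: (event_timestamp, user_id, event_name).
SAMPLE = [
    ("2019-01-23 20:02:21.550", 1, "registration"),
    ("2019-01-24 20:03:21.550", 2, "action"),
    ("2019-02-21 20:04:21.550", 3, "registration"),
    ("2019-02-22 20:05:21.550", 4, "registration"),
    ("2019-02-23 20:06:21.550", 5, "registration"),
    ("2019-02-23 20:06:21.550", 1, "action"),
    ("2019-02-24 20:07:21.550", 6, "action"),
    ("2019-03-20 20:08:21.550", 3, "action"),
    ("2019-03-21 20:09:21.550", 4, "action"),
    ("2019-03-22 20:10:21.550", 9, "action"),
    ("2019-03-23 20:11:21.550", 10, "registration"),
    ("2019-03-22 20:10:21.550", 4, "action"),
    ("2019-03-22 20:10:21.550", 5, "action"),
    ("2019-03-24 20:11:21.550", 11, "registration"),
]

# month_no = year * 12 + month, so "next month" is always month_no + 1,
# including across a year boundary.
COHORTS = """
WITH ev AS (
    SELECT user_id, event_name,
           strftime('%Y-%m', event_timestamp) AS ym,
           CAST(strftime('%Y', event_timestamp) AS INTEGER) * 12
             + CAST(strftime('%m', event_timestamp) AS INTEGER) AS month_no
    FROM main
)
SELECT a.ym,
       COUNT(DISTINCT a.user_id) AS new_users,
       COUNT(DISTINCT b.user_id) AS returned_users,
       ROUND(100.0 * COUNT(DISTINCT b.user_id) / COUNT(DISTINCT a.user_id)) AS return_pct
FROM ev AS a
LEFT JOIN ev AS b
  ON  b.user_id = a.user_id
  AND b.event_name = 'action'
  AND b.month_no = a.month_no + 1
WHERE a.event_name = 'registration'
GROUP BY a.ym
ORDER BY a.ym
"""


def cohort_stats(rows):
    """Return (year-month, new_users, returned_users, return_pct) per registration month."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE main (event_timestamp TEXT, user_id INTEGER, event_name TEXT)")
        con.executemany("INSERT INTO main VALUES (?, ?, ?)", rows)
        return con.execute(COHORTS).fetchall()
    finally:
        con.close()


if __name__ == "__main__":
    for row in cohort_stats(SAMPLE):
        print(row)
```

A registration on 2019-12-30 followed by an action on 2020-01-02 is counted as a return, because 2019 * 12 + 12 and 2020 * 12 + 1 differ by exactly 1.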
What we will do is to use window functions and aggregation -- window functions to get the earliest registration date. Then some conditional aggregation.
One challenge is the handling of calendar months. To handle this, we will truncate the dates to the beginning of the month to facilitate the date arithmetic:
select yyyymm_reg, count(*) as regs_in_month,
sum( month_2 > 0 ) as visits_2months,
avg( month_2 > 0 ) as return_rate_2months
from (select m.user_id, m.yyyymm_reg,
max( (timestampdiff(month, m.yyyymm_reg, m.yyyymm) = 1) ) as month_1,
max( (timestampdiff(month, m.yyyymm_reg, m.yyyymm) = 2) ) as month_2,
max( (timestampdiff(month, m.yyyymm_reg, m.yyyymm) = 3) ) as month_3
from (select m.*,
cast(concat(extract(year_month from event_timestamp), '01') as date) as yyyymm,
cast(concat(extract(year_month from min(case when event_name = 'registration' then event_timestamp end) over (partition by user_id)), '01') as date) as yyyymm_reg
from main m
) m
where m.yyyymm_reg is not null
group by m.user_id, m.yyyymm_reg
) u
group by u.yyyymm_reg;
Here is a db<>fiddle.
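Truncating both dates to the first of the month is what makes the month arithmetic count calendar months. For example, a registration on 2019-01-31 and a visit on 2019-02-01 are one calendar month apart, although less than a full month has elapsed. Here is a minimal Python sketch of that arithmetic. The names month_start and calendar_months_between are hypothetical helpers, not library functions. In the query above, a user counts towards month_2 when this value is 2 for one of their events.

```python
from datetime import date, datetime


def month_start(ts):
    """Truncate a date or datetime to the first day of its month."""
    return date(ts.year, ts.month, 1)


def calendar_months_between(start, end):
    """Calendar months from start's month to end's month (negative if end is earlier)."""
    a, b = month_start(start), month_start(end)
    return (b.year - a.year) * 12 + (b.month - a.month)


if __name__ == "__main__":
    registered = datetime(2019, 1, 31, 23, 59)
    for seen in (datetime(2019, 2, 1), datetime(2019, 3, 15), datetime(2020, 1, 1)):
        print(seen.date(), calendar_months_between(registered, seen))
```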
Here you go, done in T-SQL:
;with cte as(
select a.* from (
select form,user_id,sum(count_regs) as count_regs,sum(count_action) as count_action from (
select FORMAT(event_timestamp,'yyyy-MM') as form,user_id,event_name,
CASE WHEN event_name = 'registration' THEN 1 ELSE 0 END as count_regs,
CASE WHEN event_name = 'action' THEN 1 ELSE 0 END as count_action from main) a
group by form,user_id) a)
select final.form,final.count_regs,final.count_action,((CAST(final.count_action as float)/(CASE WHEN final.count_regs = '0' THEN '1' ELSE final.count_regs END))*100) as probability from (
select a.form,sum(a.count_regs) count_regs,CASE WHEN sum(b.count_action) is null then '0' else sum(b.count_action) end count_action from cte a
left join
cte b
ON a.user_id = b.user_id and
DATEADD(month,1,CONVERT(date,a.form+'-01')) = CONVERT(date,b.form+'-01')
group by a.form ) final where final.count_regs != '0' or final.count_action != '0'

SQL Query Issue - Picking the minimum time when there is a maximum number

SQL God...I need some help!
I have a data table that has a route_complete_percentage column and a created_at column.
I need two pieces of data:
the time stamp (within created_at column) when the route_complete_percentage is at its minimum but not zero
the time stamp (within created_at column) when the route_complete_percentage is at its maximum; it might be 100% or not, but it is when it's at its highest.
Here is the kicker: there might be multiple time stamps for the highest route completion value. For example,
Example Table
I have multiple values when the route_complete_percentage is at its maximum, but I need the minimum time stamp value.
Here is the query so far...but the two time stamps are the same.
SELECT
A.fc,
A.plan_id,
A.route_id,
mintime.first_scan AS First_Batch_Scan,
min(route_complete_percentage),
maxtime.last_scan AS Last_Batch_Scan,
max(route_complete_percentage)
FROM
(SELECT
fc,
plan_id,
route_id,
route_complete_percentage,
CONCAT(plan_id, '-', route_id) AS JOINKEY
FROM
houdini_ops.BATCHINATOR_SCAN_LOGS_V2
WHERE
fc <> ''
AND order_id <> 'Can\'t find order'
AND source = 'scan'
AND created_at > DATE_ADD(CURDATE(), INTERVAL - 3 DAY)) A
LEFT JOIN
(SELECT
l.fc,
l.route_id,
l.plan_id,
CONCAT(l.plan_id, '-', l.route_id) AS JOINKEY,
CASE
WHEN MIN(route_complete_percentage) THEN CONVERT_TZ(l.created_at, 'UTC', s.time_zone)
END AS first_scan
FROM
houdini_ops.BATCHINATOR_SCAN_LOGS_V2 l
JOIN houdini_ops.O_SERVICE_AREA_ATTRIBUTES s ON l.fc = s.default_station_code
WHERE
l.fc <> ''
AND l.order_id <> 'Can\'t find order'
AND l.source = 'scan'
AND l.created_at > DATE_ADD(CURDATE(), INTERVAL - 3 DAY)
GROUP BY fc , plan_id , route_id) mintime ON A.JOINKEY = mintime.JOINKEY
LEFT JOIN
(SELECT
l.fc,
l.route_id,
l.plan_id,
CONCAT(l.plan_id, '-', l.route_id) AS JOINKEY,
CASE
WHEN MAX(route_complete_percentage) THEN CONVERT_TZ(l.created_at, 'UTC', s.time_zone)
END AS last_scan
FROM
houdini_ops.BATCHINATOR_SCAN_LOGS_V2 l
JOIN houdini_ops.O_SERVICE_AREA_ATTRIBUTES s ON l.fc = s.default_station_code
WHERE
l.fc <> ''
AND l.order_id <> 'Can\'t find order'
AND l.source = 'scan'
AND l.created_at > DATE_ADD(CURDATE(), INTERVAL - 3 DAY)
GROUP BY fc , plan_id , route_id) maxtime ON mintime.JOINKEY = maxtime.JOINKEY
GROUP BY fc , plan_id , route_id
I don't want to meddle with the rest of your query. Here is something that will do what it sounds like you need. There's sample data included. -- I interpreted your blank values as nulls from your sample data.
Basically, what you are looking for is the Minimum created_at value, inside each of the route_complete_percentage groups. So I treated route_complete_percentage as a group identifier. But you only care about two of the groups, so I identify those groups first in the cte, and use them to filter the aggregate query.
if object_id('tempdb.dbo.#Data') is not null drop table #Data
go
create table #Data (
route_complete_percentage int,
created_at datetime
)
insert into #Data (route_complete_percentage, created_at)
values
(0, '20170531 19:58'),
(1, null),
(2, null),
(3, null),
(4, null),
(5, null),
(6, null),
(7, null),
(80, null),
(90, null),
(100, '20170531 20:10'),
(100, '20170531 20:12'),
(100, '20170531 20:15')
;with cteMinMax(min_route_complete_percentage, max_route_complete_percentage) as (
select
min(route_complete_percentage),
max(route_complete_percentage)
from #Data D
-- This ensures the condition that you don't get the timestamp for 0
where D.route_complete_percentage > 0
)
select
route_complete_percentage,
min_created_at = min(created_at)
from #Data D
join cteMinMax MM on D.route_complete_percentage in (MM.min_route_complete_percentage, MM.max_route_complete_percentage)
group by route_complete_percentage
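The same idea can be extended per route, since the question's query groups by fc, plan_id and route_id. First find each route's lowest non-zero and highest percentage, then take the earliest created_at at each of those two levels. Here is a minimal sketch using Python's sqlite3 module as a stand-in for MySQL. The table name scan_logs and all rows are hypothetical. Only route_id, route_complete_percentage and created_at come from the question.

```python
import sqlite3

# Hypothetical scan rows: (route_id, route_complete_percentage, created_at).
ROWS = [
    ("R1", 0, "2017-05-31 19:58:00"),
    ("R1", 10, "2017-05-31 20:01:00"),
    ("R1", 10, "2017-05-31 20:00:00"),
    ("R1", 100, "2017-05-31 20:12:00"),
    ("R1", 100, "2017-05-31 20:10:00"),
    ("R1", 100, "2017-05-31 20:15:00"),
    ("R2", 0, "2017-05-31 08:00:00"),
    ("R2", 35, "2017-05-31 08:05:00"),
    ("R2", 80, "2017-05-31 09:30:00"),
    ("R2", 80, "2017-05-31 09:20:00"),
]

QUERY = """
WITH bounds AS (
    SELECT route_id,
           -- CASE without ELSE gives NULL for 0, and MIN() ignores NULLs
           MIN(CASE WHEN route_complete_percentage > 0
                    THEN route_complete_percentage END) AS min_pct,
           MAX(route_complete_percentage) AS max_pct
    FROM scan_logs
    GROUP BY route_id
)
SELECT b.route_id,
       (SELECT MIN(s.created_at) FROM scan_logs AS s
         WHERE s.route_id = b.route_id
           AND s.route_complete_percentage = b.min_pct) AS first_scan,
       (SELECT MIN(s.created_at) FROM scan_logs AS s
         WHERE s.route_id = b.route_id
           AND s.route_complete_percentage = b.max_pct) AS last_scan
FROM bounds AS b
ORDER BY b.route_id
"""


def first_and_last_scans(rows):
    """Per route: earliest created_at at the lowest non-zero and at the highest percentage."""
    con = sqlite3.connect(":memory:")
    try:
        # created_at is ISO-8601 text, so MIN() on it is chronological.
        con.execute(
            "CREATE TABLE scan_logs (route_id TEXT NOT NULL,"
            " route_complete_percentage INTEGER NOT NULL, created_at TEXT)"
        )
        con.executemany("INSERT INTO scan_logs VALUES (?, ?, ?)", rows)
        return con.execute(QUERY).fetchall()
    finally:
        con.close()


if __name__ == "__main__":
    for row in first_and_last_scans(ROWS):
        print(row)
```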

MySQL aggregate data IN, OUT times

I have a table with something like this:
ID | UID | ACTION | URL | TIMESTAMP
Where ...
ID - primary key
UID - user id
ACTION - IN or OUT
URL - action URL
TIMESTAMP - action TIMESTAMP
How can I aggregate all the data with one query?
I mean... as output I would like a table with UID, URL, TOTAL_TIME, where TOTAL_TIME is the sum of all times between IN and OUT for a given URL...
I tried some custom functions, but without luck...
Example Input (timestamp simplified to show what I mean):
1|13|IN|http://www.gógle.koń|1
2|13|OUT|http://www.gógle.koń|5
...
13454|13|IN|http://www.gógle.koń|550
...
13465|13|OUT|http://www.gógle.koń|600
...
243252|13|IN|http://www.pr0nstaff.meh/tiny_leg_finger|1200
...
245431|13|OUT|http://www.pr0nstaff.meh/tiny_leg_finger|2200
PLEASE NOTE: THERE MAY BE CASES (AND SURELY WILL BE) WHERE THE IN/OUT OF ONE URL IS INTERRUPTED BY AN IN, AN IN/OUT PAIR, OR AN OUT OF ANOTHER URL
... so we cannot simply count from IN to OUT without checking that the site matches.
Output for the example input (for UID = 13) should be:
13|www.gógle.koń|14
13|http://www.pr0nstaff.meh/tiny_leg_finger|1000
Try this, but I'm not sure whether IN/OUT always come in pairs, so please check.
CREATE TABLE test1 (
id INT NOT NULL,
uid INT NOT NULL,
action VARCHAR(3),
url varchar(100),
timestamp1 TIMESTAMP
);
INSERT INTO test1 VALUES
( 1 , 13 , 'IN', 'www.go.com', '2015-01-07 08:00:00'),
( 2 , 13 , 'OUT', 'www.go.com', '2015-01-07 09:00:00'),
( 3 , 14 , 'IN', 'www.go2.com', '2015-01-07 08:30:00'),
( 4 , 14 , 'OUT', 'www.go2.com', '2015-01-07 09:00:00'),
( 5 , 15 , 'IN', 'www.go3.com', '2015-01-07 09:00:00'),
( 6 , 16 , 'OUT', 'www.go3.com', '2015-01-07 09:00:00');
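SELECT i.uid,i.url,SUM(TIMESTAMPDIFF(minute, i.timestamp1, o.timestamp1)) AS diff_minutes
FROM (SELECT id,uid,url,timestamp1
FROM test1
WHERE action = 'IN') i
JOIN (SELECT id,uid,url,timestamp1
FROM test1
WHERE action = 'OUT') o
ON i.uid = o.uid
AND i.url = o.url
-- pair each IN only with the first OUT after it, not with every later OUT
AND o.id = (SELECT MIN(o2.id) FROM test1 o2
WHERE o2.action = 'OUT' AND o2.uid = i.uid AND o2.url = i.url AND o2.id > i.id)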
GROUP BY i.uid,i.url
ORDER BY i.uid,i.url;
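If each IN is joined to every later OUT for the same URL, an early IN gets counted against every later OUT once a user visits the same URL more than once. Pairing each IN with only the first OUT that follows it for the same uid and url avoids that. Here is a minimal sketch under several assumptions: Python's sqlite3 module stands in for MySQL, timestamps are integers as in the question's simplified example, and the table name visits is hypothetical. The two example.org rows are a hypothetical interleaved visit to another URL.

```python
import sqlite3

# Rows modelled on the question's example: (id, uid, action, url, ts).
ROWS = [
    (1, 13, "IN", "http://www.gógle.koń", 1),
    (2, 13, "OUT", "http://www.gógle.koń", 5),
    (13454, 13, "IN", "http://www.gógle.koń", 550),
    (13460, 13, "IN", "http://example.org", 560),
    (13462, 13, "OUT", "http://example.org", 570),
    (13465, 13, "OUT", "http://www.gógle.koń", 600),
    (243252, 13, "IN", "http://www.pr0nstaff.meh/tiny_leg_finger", 1200),
    (245431, 13, "OUT", "http://www.pr0nstaff.meh/tiny_leg_finger", 2200),
]

QUERY = """
SELECT i.uid, i.url, SUM(o.ts - i.ts) AS total_time
FROM visits AS i
JOIN visits AS o
  ON  o.uid = i.uid AND o.url = i.url AND o.action = 'OUT'
  -- only the first OUT after this IN for the same user and URL
  AND o.id = (SELECT MIN(o2.id) FROM visits AS o2
               WHERE o2.uid = i.uid AND o2.url = i.url
                 AND o2.action = 'OUT' AND o2.id > i.id)
WHERE i.action = 'IN'
GROUP BY i.uid, i.url
ORDER BY i.uid, i.url
"""


def total_time_per_url(rows):
    """Return (uid, url, total_time) summed over IN -> next-OUT pairs."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute(
            "CREATE TABLE visits (id INTEGER PRIMARY KEY, uid INTEGER NOT NULL,"
            " action TEXT NOT NULL, url TEXT NOT NULL, ts INTEGER NOT NULL)"
        )
        con.executemany("INSERT INTO visits VALUES (?, ?, ?, ?, ?)", rows)
        return con.execute(QUERY).fetchall()
    finally:
        con.close()


if __name__ == "__main__":
    for row in total_time_per_url(ROWS):
        print(row)
```

With these rows the gógle.koń total is (5 - 1) + (600 - 550) = 54. An IN with no later OUT is dropped by the inner join.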
Try this:
SELECT UID, URL, TIMESTAMPDIFF(HOUR, InTime, OutTime) AS TOTAL_TIME
FROM (SELECT UID, URL,
MAX(CASE WHEN ACTION = 'IN' THEN TIMESTAMP ELSE NULL END) InTime,
MAX(CASE WHEN ACTION = 'OUT' THEN TIMESTAMP ELSE NULL END) OutTime
FROM tableA
GROUP BY UID, URL
) AS A;

Need to rearrange all records into the same rows

I need the following query to return the proj1, proj4 and proj5 columns together in one row per date.
As you can see, the dates are the same, but the values show up in different records.
The MySQL query is as follows:
select DISTINCT dates,proj1,proj4, proj5 from
(SELECT DISTINCT tc.dates AS dates , IF( tc.project_id = 1, tc.minutes, '' ) AS 'proj1',
IF(tc.project_id = 5, tc.minutes, '') AS 'proj5', IF(tc.project_id = 4, tc.minutes, '') AS 'proj4'
FROM timecard AS tc where (tc.dates between '2013-04-01' AND '2013-04-05') ) as X
I need the proj1, proj4 and proj5 values to display in the same row for each date, so the query should return only 5 rows.
You can group by the dates and then use max() to show values that are not empty
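select tc.dates,
max(IF(tc.project_id = 1, tc.minutes, NULL)) as proj1,
max(IF(tc.project_id = 4, tc.minutes, NULL)) as proj4,
max(IF(tc.project_id = 5, tc.minutes, NULL)) as proj5
from timecard as tc
where tc.dates between '2013-04-01' AND '2013-04-05'
group by tc.dates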
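Grouping by date with one MAX(CASE ...) per project is the usual conditional-aggregation pivot. A CASE without ELSE yields NULL, which MAX() ignores, so each cell keeps the non-empty value for that project. Here is a minimal sketch with Python's sqlite3 module standing in for MySQL, with CASE replacing MySQL's IF(). The timecard rows are hypothetical; the column names dates, project_id and minutes come from the question.

```python
import sqlite3

# Hypothetical timecard rows: (dates, project_id, minutes).
ROWS = [
    ("2013-04-01", 1, 120),
    ("2013-04-01", 4, 60),
    ("2013-04-01", 5, 30),
    ("2013-04-02", 1, 90),
    ("2013-04-02", 5, 45),
    ("2013-04-06", 1, 10),  # outside the date range, filtered out
]

QUERY = """
SELECT dates,
       MAX(CASE WHEN project_id = 1 THEN minutes END) AS proj1,
       MAX(CASE WHEN project_id = 4 THEN minutes END) AS proj4,
       MAX(CASE WHEN project_id = 5 THEN minutes END) AS proj5
FROM timecard
WHERE dates BETWEEN '2013-04-01' AND '2013-04-05'
GROUP BY dates
ORDER BY dates
"""


def pivot_minutes(rows):
    """Return one (dates, proj1, proj4, proj5) row per date; missing cells are None."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE timecard (dates TEXT, project_id INTEGER, minutes INTEGER)")
        con.executemany("INSERT INTO timecard VALUES (?, ?, ?)", rows)
        return con.execute(QUERY).fetchall()
    finally:
        con.close()


if __name__ == "__main__":
    for row in pivot_minutes(ROWS):
        print(row)
```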
Try this SQL. It joins timecard to itself once per project, starting from the distinct dates so that a date with a missing project still shows up (it assumes at most one timecard row per date and project):
select d.dates,
p1.minutes as "proj1",
p4.minutes as "proj4",
p5.minutes as "proj5"
from (select distinct dates from timecard
where dates between '2013-04-01' and '2013-04-05') d
left join timecard p1 on p1.dates = d.dates and p1.project_id = 1
left join timecard p4 on p4.dates = d.dates and p4.project_id = 4
left join timecard p5 on p5.dates = d.dates and p5.project_id = 5
order by d.dates

MySQL multidimensional select from views

I would like to display data rearranged year by year, and one possible solution is to use views and select from them. The data matrix is something like this (of course it's a fictitious demo dataset):
USA 2005 22 156
CAN 2005 14 101
MEX 2005 5 32
USA 2006 24 160
CAN 2006 16 103
USA 2007 26 163
MEX 2007 8 35
The SQL code to create and populate the table is:
DROP TABLE IF EXISTS `tab1`;
CREATE TABLE `tab1` (
`id1` int(4) unsigned NOT NULL AUTO_INCREMENT,
`iso3` char(3) NOT NULL,
`year` int(4) unsigned NOT NULL,
`aaa` int(10) DEFAULT NULL,
`bbb` int(10) DEFAULT NULL,
PRIMARY KEY (`id1`)
);
INSERT INTO `tab1` VALUES
('1', 'USA', '2005', '22', '156'),
('2', 'CAN', '2005', '14', '101'),
('3', 'MEX', '2005', '5', '32'),
('4', 'USA', '2006', '24', '160'),
('5', 'CAN', '2006', '16', '103'),
('6', 'USA', '2007', '26', '163'),
('7', 'MEX', '2007', '8', '35');
COMMIT;
And now I would like to obtain for parameter 'aaa' a 2D table like this:
country 2005 2006 2007
USA 22 24 26
CAN 14 16
MEX 5 8
However, the following SQL omits every line with missing data, even if only a single value is missing, and I get only one line:
USA 22 24 26
The SQL code is:
SELECT view2005.Country, view2005.2005, view2006.2006, view2007.2007
FROM view2005, view2006, view2007
WHERE view2005.country = view2006.country
AND view2005.country = view2007.country
Any idea how to do it including lines with missing data? Thanks in advance.
Use left joins, and a view (or table, or inner select like below) which has all distinct countries:
SELECT c.country, view2005.2005, view2006.2006, view2007.2007
FROM (SELECT DISTINCT iso3 AS country FROM tab1) as c
LEFT JOIN view2005 ON view2005.country = c.country
LEFT JOIN view2006 ON view2006.country = c.country
LEFT JOIN view2007 ON view2007.country = c.country
GROUP BY c.country
EDIT:
In a more general context, what you are asking here is to create a pivot of this table, which is a common problem that has common solutions. Here is a nice "How To": http://www.artfulsoftware.com/infotree/queries.php?&bw=1339#78
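The LEFT JOIN approach can be reproduced end to end. The question does not show the view definitions, so the views below are hypothetical: one row per country with that year's aaa value, in a column named after the year. Here is a minimal sketch with Python's sqlite3 module standing in for MySQL, using the question's tab1 rows. A year with no data comes back as NULL (None in Python) instead of dropping the country.

```python
import sqlite3

# The question's tab1 rows: (id1, iso3, year, aaa, bbb).
TAB1 = [
    (1, "USA", 2005, 22, 156),
    (2, "CAN", 2005, 14, 101),
    (3, "MEX", 2005, 5, 32),
    (4, "USA", 2006, 24, 160),
    (5, "CAN", 2006, 16, 103),
    (6, "USA", 2007, 26, 163),
    (7, "MEX", 2007, 8, 35),
]


def pivot_with_views():
    """Return (country, 2005, 2006, 2007) rows, keeping countries with missing years."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute(
            "CREATE TABLE tab1 (id1 INTEGER PRIMARY KEY, iso3 TEXT NOT NULL,"
            " year INTEGER NOT NULL, aaa INTEGER, bbb INTEGER)"
        )
        con.executemany("INSERT INTO tab1 VALUES (?, ?, ?, ?, ?)", TAB1)
        # Hypothetical per-year views (the originals are not shown in the question).
        for year in (2005, 2006, 2007):
            con.execute(
                f'CREATE VIEW view{year} AS SELECT iso3 AS country, aaa AS "{year}"'
                f" FROM tab1 WHERE year = {year}"
            )
        # Start from every country, then LEFT JOIN each year's view.
        return con.execute(
            'SELECT c.country, v5."2005", v6."2006", v7."2007"'
            " FROM (SELECT DISTINCT iso3 AS country FROM tab1) AS c"
            " LEFT JOIN view2005 AS v5 ON v5.country = c.country"
            " LEFT JOIN view2006 AS v6 ON v6.country = c.country"
            " LEFT JOIN view2007 AS v7 ON v7.country = c.country"
            " ORDER BY c.country"
        ).fetchall()
    finally:
        con.close()


if __name__ == "__main__":
    for row in pivot_with_views():
        print(row)
```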
It's better to use an explicit JOIN than an implicit JOIN with WHERE. An additional advantage is that you can convert it to a LEFT JOIN, so data for 2005 that has no related 2006 row (and is not matched) will still be shown.
Use Galz's solution or search as correctly advised for how to create PIVOT queries.
One such logic to create a pivot query would be:
SELECT iso3 AS Country
, SUM(IF(year=2005, aaa, 0)) AS "2005"
, SUM(IF(year=2006, aaa, 0)) AS "2006"
, SUM(IF(year=2007, aaa, 0)) AS "2007"
FROM tab1 AS t
GROUP BY iso3
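Because IF() returns 0 for the rows of other years, a country with no data for a year gets 0 in that column; you would only see NULL there if IF() returned NULL for those rows instead.
You can use the COALESCE() function to make sure 0 is shown and not NULL: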
SELECT iso3 AS Country
, COALESCE( SUM( IF(year=2004, aaa, 0) ) , 0) AS "2004"
, COALESCE( SUM( IF(year=2005, aaa, 0) ) , 0) AS "2005"
, COALESCE( SUM( IF(year=2006, aaa, 0) ) , 0) AS "2006"
, COALESCE( SUM( IF(year=2007, aaa, 0) ) , 0) AS "2007"
FROM tab1 AS t
GROUP BY iso3
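Whether a missing year appears as 0 or NULL depends only on what the conditional expression returns for the other years' rows. With an ELSE 0 branch, SUM() adds zeros. With no ELSE branch (NULL), SUM() over nothing but NULLs returns NULL. Here is a minimal check with Python's sqlite3 module standing in for MySQL, using the question's tab1 rows and CASE in place of MySQL's IF().

```python
import sqlite3

# The question's tab1 rows: (id1, iso3, year, aaa, bbb).
TAB1 = [
    (1, "USA", 2005, 22, 156),
    (2, "CAN", 2005, 14, 101),
    (3, "MEX", 2005, 5, 32),
    (4, "USA", 2006, 24, 160),
    (5, "CAN", 2006, 16, 103),
    (6, "USA", 2007, 26, 163),
    (7, "MEX", 2007, 8, 35),
]

# CAN has no 2007 row: ELSE 0 sums to 0, a missing ELSE sums to NULL.
QUERY = """
SELECT iso3,
       SUM(CASE WHEN year = 2007 THEN aaa ELSE 0 END) AS else_zero,
       SUM(CASE WHEN year = 2007 THEN aaa END)        AS else_null
FROM tab1
GROUP BY iso3
ORDER BY iso3
"""


def zero_vs_null():
    """Return (iso3, else_zero, else_null) for the 2007 column."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute(
            "CREATE TABLE tab1 (id1 INTEGER PRIMARY KEY, iso3 TEXT NOT NULL,"
            " year INTEGER NOT NULL, aaa INTEGER, bbb INTEGER)"
        )
        con.executemany("INSERT INTO tab1 VALUES (?, ?, ?, ?, ?)", TAB1)
        return con.execute(QUERY).fetchall()
    finally:
        con.close()


if __name__ == "__main__":
    for row in zero_vs_null():
        print(row)
```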
Thank you Galz for the link to pivots and thank you ypercube for the SQL. It worked after enclosing the years into quotes to make them CHAR.
I was further intrigued by what happens if I add a row with no values at all, or a row outside the range of years, so I added:
INSERT INTO `tab1` VALUES
('8', 'ATA', '2004', NULL, NULL);
The result was that I got a mix of NULL and INT zero values. This is not good because zero is a valid number and legitimate data. So I have modified the query to get exactly the result I need:
SELECT iso3 AS country,
SUM( IF(year=2004, aaa, NULL) ) AS "2004",
SUM( IF(year=2005, aaa, NULL) ) AS "2005",
SUM( IF(year=2006, aaa, NULL) ) AS "2006",
SUM( IF(year=2007, aaa, NULL) ) AS "2007"
FROM tab1
GROUP BY iso3