Get the top records from each group in MySQL

I have a table event_log with the following columns in MySQL:
CREATE TABLE IF NOT EXISTS `event_log` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`customer_id` varchar(50) DEFAULT NULL,
`event_time` datetime DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 AUTO_INCREMENT=1 ;
Sample data:
id, customer_id, event_time
1 100 '2015-03-22 23:54:37'
2 100 '2015-03-21 23:54:37'
3 100 '2015-03-20 23:54:37'
4 101 '2015-03-19 23:54:37'
5 102 '2015-03-19 23:54:37'
6 102 '2015-03-18 23:54:37'
7 103 '2015-03-17 23:54:37'
8 103 '2015-03-16 23:54:37'
9 103 '2015-03-15 23:54:37'
10 103 '2015-03-14 23:54:37'
I want to group on customer_id and then pick the top 2 records from each group by the event_time column (the two most recent events per customer).
Please suggest an approach.
Thanks,
Faisal Nasir

Here is a version that doesn't use variables:
select el.*
from event_log el
where 2 >= (select count(*)
from event_log el2
where el2.customer_id = el.customer_id and
el2.event_time >= el.event_time
);
This could even have reasonable performance with an index on event_log(customer_id, event_time).
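As a minimal sketch of the index mentioned above, the statements below create it and then check the plan. The index name idx_customer_time is a hypothetical choice, and whether the optimizer actually uses the index depends on the data and the server version.

```sql
-- Composite index covering the correlated lookup (name is an assumption).
CREATE INDEX idx_customer_time ON event_log (customer_id, event_time);

-- Inspect the execution plan for the count-based query.
EXPLAIN
SELECT el.*
FROM event_log el
WHERE 2 >= (SELECT COUNT(*)
            FROM event_log el2
            WHERE el2.customer_id = el.customer_id
              AND el2.event_time >= el.event_time);
```

Note that with this count-based filter, rows that tie on event_time within a customer are counted against each other, so ties can change how many rows come back for that customer.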

One way is to use user-defined variables to pick the 2 most recent entries per customer_id:
SELECT `id`, `customer_id`, `event_time`, row_num
FROM (
  SELECT *,
    @r:= CASE WHEN @g = `customer_id` THEN @r +1 ELSE 1 END row_num,
    @g:= `customer_id`
  FROM event_log
  CROSS JOIN (SELECT @g:= NULL, @r:=0) a
  ORDER BY `customer_id`, `event_time` DESC
) t
WHERE row_num <= 2
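On MySQL 8.0 or later, the same numbering can be sketched with a window function instead of user variables. This is an alternative under that version assumption, not part of the original answer; the id DESC tie-breaker and the alias names ranked and rn are choices made here for illustration.

```sql
-- Number the rows of each customer from newest to oldest, then keep the first two.
SELECT id, customer_id, event_time
FROM (
  SELECT id, customer_id, event_time,
         ROW_NUMBER() OVER (PARTITION BY customer_id
                            ORDER BY event_time DESC, id DESC) AS rn
  FROM event_log
) ranked
WHERE rn <= 2
ORDER BY customer_id, event_time DESC;
```

ROW_NUMBER() does not depend on the evaluation order of variable assignments inside a SELECT, which is the fragile part of the variable-based version.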

Related

Adjacency list model duplicate parent & children

Hierarchical Data in MySQL Using the Adjacency List Model
I have this table named node_structure_data
CREATE TABLE node_structure_data (
id INT(10) UNSIGNED NOT NULL AUTO_INCREMENT,
title VARCHAR(455) NOT NULL,
parent_id INT(10) UNSIGNED DEFAULT NULL,
PRIMARY KEY (id),
FOREIGN KEY (parent_id) REFERENCES node_structure_data (id)
ON DELETE CASCADE ON UPDATE CASCADE
);
Output:
id title parent_id
1 Division NULL
2 Site 1 1
3 Paper 2
4 ms1 3
How can I duplicate a node and its children?
For example Site 1
The id & parent_id should be unique but the title should stay the same.
Expected Output:
id title parent_id
1 Division NULL
2 Site 1 1
3 Paper 2
4 ms1 3
5 Site 1 1
6 Paper 5
7 ms1 6
The following approach first estimates the new maximum id, then uses a recursive CTE to find all children of the desired node 'Site 1' and work out the new parent_id each copy would get, assuming there are no other concurrent writes to the table.
I would recommend running the following in a transaction and locking the table during the operation to prevent concurrent table modifications.
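One way to serialize the copy can be sketched with a named advisory lock around the transaction. This swaps in GET_LOCK() rather than a table lock, so it only protects against other sessions that take the same lock name before writing; the lock name node_copy and the 10 second timeout are assumptions.

```sql
-- Returns 1 if the lock was obtained, 0 on timeout (lock name is hypothetical).
SELECT GET_LOCK('node_copy', 10);

START TRANSACTION;
-- Run the INSERT ... WITH RECURSIVE statement from Query #5 here.
COMMIT;

-- Named locks are not released by COMMIT, so release explicitly.
SELECT RELEASE_LOCK('node_copy');
```

If every writer must be blocked regardless of cooperation, LOCK TABLES is the stricter option, but it requires a separate lock for each alias of the table used in the statement.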
To test this approach I added some additional sample data, which I have included below.
The output of a working DB Fiddle follows:
Schema (MySQL v8.0)
CREATE TABLE node_structure_data (
id INT(10) UNSIGNED NOT NULL AUTO_INCREMENT,
title VARCHAR(455) NOT NULL,
parent_id INT(10) UNSIGNED DEFAULT NULL,
PRIMARY KEY (id),
FOREIGN KEY (parent_id) REFERENCES node_structure_data (id)
ON DELETE CASCADE ON UPDATE CASCADE
);
INSERT INTO node_structure_data
(`id`, `title`, `parent_id`)
VALUES
('1', 'Division', NULL),
('2', 'Site 1', '1'),
('3', 'Paper', '2'),
('4', 'ms1', '3'),
('5', 'ms2', '3'),
('6', 'os1', '4'),
('7', 'os2', '4'),
('8', 'gs1', '1'),
('9', 'hs1', '3'),
('10','js1','9');
Query #1
select 'Before Insert';
Before Insert
Before Insert
Query #2
select * from node_structure_data;
id title parent_id
1 Division NULL
2 Site 1 1
3 Paper 2
4 ms1 3
5 ms2 3
6 os1 4
7 os2 4
8 gs1 1
9 hs1 3
10 js1 9
Query #3
select 'Possible Data Changes';
Possible Data Changes
Possible Data Changes
Query #4
with recursive max_id AS (
  SELECT MAX(id) as id FROM node_structure_data
),
child_nodes AS (
  SELECT
    n.id,
    title,
    parent_id,
    m.id+1 as new_id,
    parent_id as new_parent_id
  FROM
    node_structure_data n
  CROSS JOIN
    max_id as m
  WHERE
    title='Site 1'
  UNION ALL
  SELECT
    n.id,
    n.title,
    n.parent_id,
    @row_num:=IF(@row_num=0,c.new_id,0) + 1 + @row_num as new_id,
    c.new_id
  FROM
    child_nodes c
  INNER JOIN
    node_structure_data n ON n.parent_id = c.id
  CROSS JOIN (
    SELECT @row_num:=0 as rn
  ) as vars
)
SELECT * FROM child_nodes;
id title parent_id new_id new_parent_id
2 Site 1 1 11 1
3 Paper 2 12 11
4 ms1 3 13 12
5 ms2 3 14 12
9 hs1 3 15 12
6 os1 4 16 13
7 os2 4 17 13
10 js1 9 18 15
Query #5 - Performing actual insert
INSERT INTO node_structure_data (title,parent_id)
with recursive max_id AS (
  SELECT MAX(id) as id FROM node_structure_data
),
child_nodes AS (
  SELECT
    n.id,
    title,
    parent_id,
    m.id+1 as new_id,
    parent_id as new_parent_id
  FROM
    node_structure_data n
  CROSS JOIN
    max_id as m
  WHERE
    title='Site 1'
  UNION ALL
  SELECT
    n.id,
    n.title,
    n.parent_id,
    @row_num:=IF(@row_num=0,c.new_id,0) + 1 + @row_num as new_id,
    c.new_id
  FROM
    child_nodes c
  INNER JOIN
    node_structure_data n ON n.parent_id = c.id
  CROSS JOIN (
    SELECT @row_num:=0 as rn
  ) as vars
)
SELECT title,new_parent_id FROM child_nodes ORDER BY new_id;
There are no results to be displayed.
Query #6
select 'AFTER INSERT';
AFTER INSERT
AFTER INSERT
Query #7
select * from node_structure_data;
id title parent_id
1 Division NULL
2 Site 1 1
3 Paper 2
4 ms1 3
5 ms2 3
6 os1 4
7 os2 4
8 gs1 1
9 hs1 3
10 js1 9
11 Site 1 1
12 Paper 11
13 ms1 12
14 ms2 12
15 hs1 12
16 os1 13
17 os2 13
18 js1 15
Let me know if this works for you.

MySQL: Select newest two rows per Group

I have a table like this:
CREATE TABLE `data` (
`id` int(11) NOT NULL,
`deviceId` int(11) NOT NULL,
`position_x` int(11) NOT NULL,
`position_y` int(11) NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;
ALTER TABLE `data`
ADD PRIMARY KEY (`id`);
COMMIT;
id, deviceId, position_x, position_y
1 1 100 0
2 2 150 50
3 3 200 20
4 1 220 20
5 1 210 10
6 2 100 40
7 3 120 50
8 3 130 60
9 2 240 15
I need the "newest" two rows per DeviceID, where a bigger ID means newer.
Right now, I'm selecting the newest row per Device via this query:
SELECT
id,
deviceId,
position_x, position_y
FROM data
WHERE deviceId > 0 AND
id IN (SELECT MAX(id) FROM data GROUP BY deviceId)
And in a loop, where I output the data, I select the second latest row for every deviceId in an individual query, which is kinda slow/dirty:
SELECT
position_x,
position_y
FROM data
WHERE deviceId = :deviceId AND
id < :id
ORDER BY id DESC
LIMIT 1
Is there a way to combine both queries or at least, in one query, select the second row for every deviceId from query 1?
Thanks
You can try using row_number()
select * from
(
SELECT
id,
deviceId,
position_x, position_y,row_number() over(partition by deviceid order by id desc) as rn
FROM data
WHERE deviceId > 0
)A where rn=2
You can use a correlated subquery for this as well:
SELECT d.*
FROM data d
WHERE d.deviceId > 0 AND
d.id = (SELECT d2.id
FROM data d2
WHERE d2.deviceId = d.deviceId
ORDER BY d2.id DESC
LIMIT 1, 1
);
With an index on data(deviceId, id desc), you might be impressed at the performance.
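Both answers above return only the second-newest row per device. To fetch the newest two rows together in one query, a sketch along the same lines (assuming MySQL 8.0 or later for window functions and descending index key parts; the index name idx_device_id and the alias ranked are hypothetical) could look like this:

```sql
-- Index matching the partition and sort order used below (name is an assumption).
CREATE INDEX idx_device_id ON data (deviceId, id DESC);

-- rn = 1 is the newest row per device, rn = 2 the one before it.
SELECT id, deviceId, position_x, position_y, rn
FROM (
  SELECT id, deviceId, position_x, position_y,
         ROW_NUMBER() OVER (PARTITION BY deviceId ORDER BY id DESC) AS rn
  FROM data
  WHERE deviceId > 0
) ranked
WHERE rn <= 2
ORDER BY deviceId, rn;
```

The rn column lets the output loop tell the latest row from the previous one without issuing a second query per device.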

Aggregate rows by id comparing column values

I have the following table that groups users by their permissions
userIds permissions
4,5,7,8 100,1600,500,501,502,400,401,1500,1501
The numbers in the permissions column are the sections ids.
Some of these sections may have other data associated which I retrieved and stored in another table.
sectionId userId resourceId
100 4 NULL
1600 4 NULL
500 4 NULL
501 4 NULL
502 4 NULL
400 4 NULL
401 4 1
1500 4 NULL
1501 4 NULL
100 5 NULL
1600 5 NULL
500 5 NULL
501 5 NULL
502 5 NULL
400 5 NULL
401 5 1,2
1500 5 NULL
1501 5 NULL
100 7 NULL
1600 7 NULL
500 7 NULL
501 7 NULL
502 7 NULL
400 7 NULL
401 7 2
1500 7 NULL
1501 7 NULL
100 8 NULL
1600 8 NULL
500 8 NULL
501 8 NULL
502 8 NULL
400 8 NULL
401 8 1
1500 8 NULL
1501 8 NULL
My goal is to compare, for each user in the userIds column of the first table (split by comma), every row of the second table, in order to check whether the users have the same resourceId value for each specific sectionId.
If two or more users have the same resourceId value for every section, I want to keep them grouped together; otherwise they need to be on different rows.
This is the output I'm expecting from the sample data provided:
userIds permissions
4,8 100,1600,500,501,502,400,401,1500,1501
5 100,1600,500,501,502,400,401,1500,1501
7 100,1600,500,501,502,400,401,1500,1501
UPDATE
I managed to get the desired output in the following way:
-- Numbers table creation
DROP temporary TABLE IF EXISTS tally;
CREATE temporary TABLE tally
(
n INT NOT NULL auto_increment PRIMARY KEY
);
INSERT INTO tally
(n)
SELECT NULL
FROM (SELECT 0 AS N
UNION ALL
SELECT 1
UNION ALL
SELECT 2
UNION ALL
SELECT 3
UNION ALL
SELECT 4
UNION ALL
SELECT 5
UNION ALL
SELECT 6
UNION ALL
SELECT 7
UNION ALL
SELECT 8
UNION ALL
SELECT 9) a,
(SELECT 0 AS N
UNION ALL
SELECT 1
UNION ALL
SELECT 2
UNION ALL
SELECT 3
UNION ALL
SELECT 4
UNION ALL
SELECT 5
UNION ALL
SELECT 6
UNION ALL
SELECT 7
UNION ALL
SELECT 8
UNION ALL
SELECT 9) b;
-- Split users by comma from first table
DROP temporary TABLE IF EXISTS tmppermissions2;
CREATE temporary TABLE tmppermissions2
(
userid VARCHAR(255) NOT NULL,
permissions TEXT NOT NULL
);
INSERT INTO tmppermissions2
SELECT userid,
permissions
FROM (SELECT Substring_index(Substring_index(t.userids, ',', tally.n), ',', -1
)
userId,
t.permissions
permissions
FROM tally
INNER JOIN tmppermissions t
ON Char_length(t.userids) - Char_length(
REPLACE(t.userids, ',',
'')) >=
tally.n - 1
ORDER BY n) AS split;
-- Gets the users with the same permissions
DROP temporary TABLE IF EXISTS sharedprofiles;
CREATE temporary TABLE sharedprofiles
(
userids VARCHAR(255) NOT NULL,
permissions TEXT NOT NULL,
profileid INT(11)
);
INSERT INTO sharedprofiles
SELECT Group_concat(userid),
permissions,
NULL
FROM tmppermissions2
WHERE userid NOT IN (SELECT split.userid
FROM (SELECT Substring_index(Substring_index(r.userids,
',',
t.n), ',', -1)
userId
FROM tally t
INNER JOIN tmppermissions r
ON Char_length(r.userids)
- Char_length(
REPLACE(r.userids, ',',
'')) >=
t.n - 1
WHERE Position(',' IN r.userids) > 0
ORDER BY n) AS split
WHERE split.userid IN (SELECT *
FROM (SELECT Group_concat(userid
ORDER
BY userid ASC)
AS
users
FROM
tmpcurrentresources2
GROUP BY resourceid,
sectionid
ORDER BY users) b
WHERE Position(',' IN b.users) =
0))
GROUP BY permissions
ORDER BY Group_concat(userid);
-- Gets the users with specific permissions
DROP temporary TABLE IF EXISTS singleprofiles;
CREATE temporary TABLE singleprofiles
(
userid VARCHAR(255) NOT NULL,
permissions TEXT NOT NULL,
profileid INT(11)
);
INSERT INTO singleprofiles
SELECT userid,
permissions,
NULL
FROM tmppermissions2
WHERE userid IN (SELECT split.userid
FROM (SELECT Substring_index(Substring_index(r.userids, ',',
t.n),
',', -1)
userId
FROM tally t
INNER JOIN tmppermissions r
ON Char_length(r.userids) -
Char_length(
REPLACE(r.userids, ',',
'')) >=
t.n - 1
WHERE Position(',' IN r.userids) > 0
ORDER BY n) AS split
WHERE split.userid IN (SELECT *
FROM (SELECT Group_concat(userid
ORDER BY
userid ASC)
AS
users
FROM tmpcurrentresources2
GROUP BY resourceid,
sectionid
ORDER BY users) b
WHERE Position(',' IN b.users) = 0))
ORDER BY userid;
-- Merge the results
SELECT *
FROM sharedprofiles
UNION
SELECT *
FROM singleprofiles;
I'm wondering if there is a more concise way to accomplish the same result.
The solution (as I suspect you already know) is to normalise your schema.
So instead of...
userIds permissions
4,5 100,1600,500
...you might have
userIds permissions
4 100
4 1600
4 500
5 100
5 1600
5 500
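A rough sketch of how the grouping might look once the data is normalised, assuming MySQL. The table and column names (user_section, user_section_resource, and their columns) are hypothetical, and the second table holds one row per resource instead of a comma-separated resourceId.

```sql
-- One row per user and section (hypothetical schema).
CREATE TABLE user_section (
  user_id    INT NOT NULL,
  section_id INT NOT NULL,
  PRIMARY KEY (user_id, section_id)
);

-- One row per resource attached to a user's section (hypothetical schema).
CREATE TABLE user_section_resource (
  user_id     INT NOT NULL,
  section_id  INT NOT NULL,
  resource_id INT NOT NULL,
  PRIMARY KEY (user_id, section_id, resource_id)
);

-- Build a per-user signature of sections and resources,
-- then group users whose signatures are identical.
SELECT GROUP_CONCAT(user_id ORDER BY user_id) AS userIds,
       permissions
FROM (
  SELECT us.user_id,
         GROUP_CONCAT(DISTINCT us.section_id ORDER BY us.section_id) AS permissions,
         GROUP_CONCAT(DISTINCT CONCAT(usr.section_id, ':', usr.resource_id)
                      ORDER BY CONCAT(usr.section_id, ':', usr.resource_id)) AS resources
  FROM user_section us
  LEFT JOIN user_section_resource usr
         ON usr.user_id = us.user_id
        AND usr.section_id = us.section_id
  GROUP BY us.user_id
) per_user
GROUP BY permissions, resources;
```

GROUP_CONCAT output is truncated at group_concat_max_len (1024 bytes by default), so long signatures may need that setting raised.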

mysql rank and subtraction of count

I have a table to store the votes. I query out the rank of candidates, and I also want the candidate to see how many votes are required to equal the votes held by the candidate ranked immediately above them.
CREATE TABLE `vote` (
`id` bigint(20) NOT NULL AUTO_INCREMENT,
`candidateid` int(11) NOT NULL,
`openid` varchar(2048) NOT NULL,
`weight` int(11) DEFAULT '1',
`time` bigint(20) DEFAULT NULL,
`date` varchar(56) DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=73 DEFAULT CHARSET=utf8;
select t.* , @curRank := @curRank + 1 AS rank
from
(
  SELECT candidateid,
         count(*) as num
  FROM vote p
  group by candidateid
  ORDER BY num desc
) t, (SELECT @curRank := 0) r
This is what I get so far:
candidateid num rank
1 42 1
6 16 2
8 9 3
2 3 4
7 1 5
4 1 6
I want to get
candidateid num sub rank
1 42 0 1
6 16 26 2
8 9 7 3
2 3 6 4
7 1 2 5
4 1 0 6
e.g. candidateid=6 requires 26 votes to equal the candidate ranked above them. candidateid=2 only needs 6 votes to reach 9, drawing level with candidateid=8.
Just extend your query with an additional variable to calculate the difference:
select t.candidateid , @curRank := @curRank + 1 AS rank, if(@prevote=-1, 0,@prevote-t.num) as sub, @prevote:=t.num as num
from
(
  SELECT candidateid,
         count(*) as num
  FROM vote p
  group by candidateid
  ORDER BY num desc
) t, (SELECT @curRank := 0, @prevote:=-1) r
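On MySQL 8.0 or later, the same result can be sketched with window functions instead of variables. This is an alternative under that version assumption; the candidateid tie-breaker is a choice made here, and `rank` is quoted with backticks because it is a reserved word from MySQL 8.0.2 onward.

```sql
-- LAG() reads the vote count of the row ranked immediately above;
-- COALESCE() turns the missing value for the top row into 0.
SELECT candidateid,
       num,
       COALESCE(LAG(num) OVER (ORDER BY num DESC, candidateid) - num, 0) AS sub,
       ROW_NUMBER() OVER (ORDER BY num DESC, candidateid) AS `rank`
FROM (
  SELECT candidateid, COUNT(*) AS num
  FROM vote
  GROUP BY candidateid
) t
ORDER BY `rank`;
```

Using the same ORDER BY in both window definitions keeps sub and rank consistent with each other when candidates are tied on votes.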

mysql groupwise max as second where condition

I have a working query that seems awfully inefficient; I'm wondering if I'm missing a simple way to improve it.
Simple table:
id date master_id
-------------------------
1 2015-02-01 0
2 2015-02-02 0
3 2015-02-03 0
4 2015-02-04 1
5 2015-02-02 1
6 2015-02-17 1
7 2015-02-27 1
8 2015-01-01 1
Objective: Get all rows where the master_id is zero, OR the master_id is not zero and no other rows of the same master_id have an earlier date. Order every result by date.
Current query, using a groupwise minimum subquery to create the second WHERE condition.
SELECT *
FROM `test`
WHERE `master_id` =0
OR `id` IN (
SELECT test.`id`
FROM (
SELECT `master_id`, MIN(`date`) AS mindate
FROM `test`
WHERE `master_id` <> 0
GROUP BY `master_id`
) AS x
INNER JOIN `test` ON x.`master_id` = test.`master_id`
AND x.mindate= test.`date`
)
ORDER BY `date`
It works, but the EXPLAIN makes it seem inefficient:
id select_type table type possible_keys key key_len ref rows Extra
-------------------------------------------------------------------------------------------------------------
1 PRIMARY test ALL NULL NULL NULL NULL 8 Using where; Using filesort
2 DEPENDENT SUBQUERY <derived3> system NULL NULL NULL NULL 1
2 DEPENDENT SUBQUERY test eq_ref PRIMARY PRIMARY 4 func 1 Using where
3 DERIVED test ALL NULL NULL NULL NULL 8 Using where; Using temporary; Using filesort
Can I improve this? Should I break it into two queries, one for ID=0 and one for the groupwise min? Thanks in advance.
Avoiding the inner join can improve the query:
SELECT *
FROM `test`
WHERE `master_id` =0
OR `id` IN (
SELECT t1.id
FROM (SELECT *
FROM test t2
WHERE t2.master_id!=0
ORDER BY t2.date ASC) t1
GROUP BY t1.master_id
)
ORDER BY `date`;
How about this...
SELECT * FROM test WHERE master_id = 0
UNION
SELECT x.*
FROM test x
JOIN (SELECT master_id,MIN(date) min_date FROM test GROUP BY master_id) y
ON y.master_id = x.master_id
AND y.min_date = x.date;
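Another option, sketched here on the assumption of MySQL 8.0 or later, computes the earliest date per master_id with a window function and filters on it in a single pass, including the final ordering the question asks for. The index name idx_master_date and the alias min_date are hypothetical.

```sql
-- Optional index to support the per-master_id minimum (name is an assumption).
CREATE INDEX idx_master_date ON test (master_id, `date`);

-- Keep every row with master_id = 0, plus the earliest row(s) of each other master_id.
SELECT id, `date`, master_id
FROM (
  SELECT t.id, t.`date`, t.master_id,
         MIN(t.`date`) OVER (PARTITION BY t.master_id) AS min_date
  FROM test t
) x
WHERE master_id = 0
   OR `date` = min_date
ORDER BY `date`;
```

If two rows of the same master_id share the earliest date, both are returned, which matches the behaviour of the MIN(date) join in the queries above.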