How to find most frequent pair in SQL? - mysql

I am trying to write a query in MySQL that will output the most frequently occurring pair of values. I have the following table:
Original Dataset
This table contains users' music streaming activity on a given day. I want to find out which pair of artists was the most frequently played one on a specific day. The answer should be (Pink Floyd, Queen) because 3 users listened to both artists on the same day. How can I achieve this?
I've started by joining the table onto itself using this code:
With temp as (
select person_id, artist_name, count(*) as times_played from users where date_played = '2020-10-01' group by 1,2)
select a.person_id, a.artist_name, b.artist_name from temp a join temp b
on a.person_id = b.person_id and a.artist_name != b.artist_name;
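The result lists every ordered pair of different artists for each person, so each pair shows up twice, e.g. (Pink Floyd, Queen) and (Queen, Pink Floyd).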
I am not sure how to proceed from this point, so any help would be appreciated!
Below is the code to create the table in MySQL:
create table users
(
person_id int,
artist_name varchar(255),
date_played date
);
insert into users
(person_id, artist_name, date_played)
values
(1, 'Pink Floyd', '2020-10-01'),
(1, 'Led Zeppelin', '2020-10-01'),
(1, 'Queen', '2020-10-01'),
(1, 'Pink Floyd', '2020-10-01'),
(2, 'Journey', '2020-10-01'),
(2, 'Pink Floyd', '2020-10-01'),
(2, 'Queen', '2020-10-01'),
(2, 'Pink Floyd', '2020-10-01'),
(3, 'Pink Floyd', '2020-10-01'),
(3, 'Aerosmith', '2020-10-01'),
(3, 'Queen', '2020-10-01'),
(4, 'Pink Floyd', '2020-10-01'),
(4, 'Led Zeppelin', '2020-10-01');

Here's how I solved it, using the trick from the code Tim Biegeleisen provided in this post (joining on u1.artist_name < u2.artist_name so that each pair of artists is counted only once):
WITH temp AS (
SELECT
person_id,
artist_name
FROM users
WHERE date_played = '2020-10-01'
GROUP BY 1,2
)
SELECT *
FROM (
SELECT
u1.artist_name AS artist1,
u2.artist_name AS artist2,
COUNT(*) AS times_played,
RANK() OVER (ORDER BY COUNT(*) DESC) Rnk
FROM temp u1
JOIN temp u2
ON u1.artist_name < u2.artist_name AND u1.person_id = u2.person_id
GROUP BY 1,2
) sub
WHERE Rnk = 1;
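The deduplicate-then-self-join approach can be checked end to end. The sketch below is a stand-in, not the MySQL query itself: it embeds the SQL in Python's built-in sqlite3 module with an in-memory SQLite database (SQLite 3.25 or later is assumed for RANK()), loads the sample rows from the question, renames the CTE to plays (TEMP is an SQLite keyword), and returns every pair tied for rank 1 with its listener count. The function name top_pairs is hypothetical.

```python
import sqlite3

# Sample rows from the question: (person_id, artist_name, date_played)
ROWS = [
    (1, "Pink Floyd", "2020-10-01"), (1, "Led Zeppelin", "2020-10-01"),
    (1, "Queen", "2020-10-01"), (1, "Pink Floyd", "2020-10-01"),
    (2, "Journey", "2020-10-01"), (2, "Pink Floyd", "2020-10-01"),
    (2, "Queen", "2020-10-01"), (2, "Pink Floyd", "2020-10-01"),
    (3, "Pink Floyd", "2020-10-01"), (3, "Aerosmith", "2020-10-01"),
    (3, "Queen", "2020-10-01"), (4, "Pink Floyd", "2020-10-01"),
    (4, "Led Zeppelin", "2020-10-01"),
]

# Dedupe plays per person, pair artists with a < b so each unordered pair
# appears once per person, count listeners, and keep every rank-1 pair.
QUERY = """
WITH plays AS (
    SELECT DISTINCT person_id, artist_name
    FROM users
    WHERE date_played = ?
)
SELECT artist1, artist2, listeners
FROM (
    SELECT u1.artist_name AS artist1,
           u2.artist_name AS artist2,
           COUNT(*) AS listeners,
           RANK() OVER (ORDER BY COUNT(*) DESC) AS rnk
    FROM plays u1
    JOIN plays u2
      ON u1.person_id = u2.person_id
     AND u1.artist_name < u2.artist_name
    GROUP BY u1.artist_name, u2.artist_name
) ranked
WHERE rnk = 1
ORDER BY artist1, artist2
"""


def top_pairs(day):
    """Return [(artist1, artist2, listeners), ...] for the top pair(s) on day."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE users (person_id INTEGER, artist_name TEXT, date_played TEXT)"
    )
    con.executemany("INSERT INTO users VALUES (?, ?, ?)", ROWS)
    result = con.execute(QUERY, (day,)).fetchall()
    con.close()
    return result


print(top_pairs("2020-10-01"))  # [('Pink Floyd', 'Queen', 3)]
```

Returning all rank-1 rows, rather than using LIMIT 1, keeps ties, which is why RANK() is a good fit here.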

We can try handling this requirement using a self join along with the RANK() analytic function:
WITH cte AS (
SELECT
u1.artist_name AS artist1,
u2.artist_name AS artist2,
-- COUNT(DISTINCT ...) so a person who played an artist twice is counted once
RANK() OVER (ORDER BY COUNT(DISTINCT u1.person_id) DESC) rnk
FROM users u1
INNER JOIN users u2
ON u1.artist_name < u2.artist_name AND u1.person_id = u2.person_id
AND u1.date_played = u2.date_played
WHERE
u1.date_played = '2020-10-01'
GROUP BY
u1.artist_name,
u2.artist_name
)
SELECT
artist1,
artist2
FROM cte
WHERE rnk = 1;
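The users table contains repeated plays (persons 1 and 2 each played Pink Floyd twice), so a self-join taken directly on users counts pairs of play rows, not listeners. The sketch below, again SQL embedded in Python's sqlite3 as a stand-in for MySQL, compares COUNT(*) with COUNT(DISTINCT u1.person_id) for one pair; pair_counts and PAIR_SQL are hypothetical names.

```python
import sqlite3

# Sample rows from the question: (person_id, artist_name, date_played)
ROWS = [
    (1, "Pink Floyd", "2020-10-01"), (1, "Led Zeppelin", "2020-10-01"),
    (1, "Queen", "2020-10-01"), (1, "Pink Floyd", "2020-10-01"),
    (2, "Journey", "2020-10-01"), (2, "Pink Floyd", "2020-10-01"),
    (2, "Queen", "2020-10-01"), (2, "Pink Floyd", "2020-10-01"),
    (3, "Pink Floyd", "2020-10-01"), (3, "Aerosmith", "2020-10-01"),
    (3, "Queen", "2020-10-01"), (4, "Pink Floyd", "2020-10-01"),
    (4, "Led Zeppelin", "2020-10-01"),
]

# Self-join directly on users (no dedupe); {expr} is the count expression.
PAIR_SQL = """
SELECT {expr}
FROM users u1
JOIN users u2
  ON u1.person_id = u2.person_id
 AND u1.date_played = u2.date_played
 AND u1.artist_name < u2.artist_name
WHERE u1.artist_name = ? AND u2.artist_name = ?
"""


def pair_counts(artist1, artist2):
    """Return (joined play rows, distinct listeners) for artist1 < artist2."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE users (person_id INTEGER, artist_name TEXT, date_played TEXT)"
    )
    con.executemany("INSERT INTO users VALUES (?, ?, ?)", ROWS)
    params = (artist1, artist2)
    # COUNT(*) counts joined play rows, so repeated plays inflate it.
    raw = con.execute(PAIR_SQL.format(expr="COUNT(*)"), params).fetchone()[0]
    # COUNT(DISTINCT ...) counts each listener once.
    listeners = con.execute(
        PAIR_SQL.format(expr="COUNT(DISTINCT u1.person_id)"), params
    ).fetchone()[0]
    con.close()
    return raw, listeners


print(pair_counts("Pink Floyd", "Queen"))  # (5, 3)
```

On this sample the top pair is the same either way, but only the distinct count matches the "3 users" stated in the question.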

Related

How do I pull random rows from multiple columns and insert them into another table?

I am trying to combine rows from three different tables into another table. I have replicated this in my db-fiddle tables and query below. My issue is that if each table has 3 rows in it, then I should have 27 possible combinations but I am only getting 3. I understand why I am only getting three but I don't know how I can change it to do what I want. As it is now, if the random number selected is 3 it pulls the id = 3 from each table. I want the number to be random for each table.
Practice table and query
create table First (id int(10), first varchar(255));
Insert into First (id, first) values (1, 'John'), (2, 'Bill'), (3, 'Chad');
create table Middle (id int(10), middle varchar(255));
Insert into Middle (id, middle) values (1, 'Ethan'), (2, 'Dave'), (3, 'Ron');
create table Last (id int(10), last varchar(255));
Insert into Last (id, last) values (1, 'Smith'), (2, 'Miller'), (3, 'Darnold');
create table Full (id int(10) auto_increment primary key, full varchar(255));
insert into Full (id) values (1), (2), (3), (4), (5), (6), (7), (8), (9), (10);
Update Full u1
join (select id,
@i:=Floor(1+ RAND() * 3),
(select concat(l.last, ', ', f.first, ' ', m.middle)
from First as f, Middle as m, Last as l
where f.id = @i and m.id = @i and l.id = @i) full
from Full) u2
on u1.id = u2.id
set u1.full = u2.full;
select * from Full;
Edit: I am trying to avoid exact duplicates.
You need three independent random numbers, one per table, to get a proper mix:
Update Full u1
join (select id,
@i:=Floor(1+ RAND() * 3),
@j:=Floor(1+ RAND() * 3),
@k:=Floor(1+ RAND() * 3),
(select concat(l.last, ', ', f.first, ' ', m.middle)
from First as f, Middle as m, Last as l
where f.id = @i and m.id = @j and l.id = @k) full
from Full) u2
on u1.id = u2.id
set u1.full = u2.full;
Or, using explicit JOIN syntax:
Update Full u1
join (select id,
@i:=Floor(1+ RAND() * 3),
@j:=Floor(1+ RAND() * 3),
@k:=Floor(1+ RAND() * 3),
(select concat(l.last, ', ', f.first, ' ', m.middle)
from First as f
left join Middle m on m.id = @j
left join Last l on l.id = @k
where f.id = @i ) full
from Full) u2
on u1.id = u2.id
set u1.full = u2.full;
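Neither query above prevents exact duplicates (the asker's edit): three independent random ids can still produce the same combination twice. A different technique, not taken from either answer, is to enumerate every combination with a CROSS JOIN and take a random sample of those rows. The sketch below assumes SQLite via Python's sqlite3, so it uses RANDOM() and || where MySQL would use RAND() and CONCAT(), and the table and column names (first_names.first_name, etc.) are hypothetical renames.

```python
import sqlite3

# Data from the question; table/column names are hypothetical renames
# to avoid clashing with SQL keywords such as FIRST and LAST.
TABLES = {
    "first_names": ("first_name", [(1, "John"), (2, "Bill"), (3, "Chad")]),
    "middle_names": ("middle_name", [(1, "Ethan"), (2, "Dave"), (3, "Ron")]),
    "last_names": ("last_name", [(1, "Smith"), (2, "Miller"), (3, "Darnold")]),
}

# CROSS JOIN builds all 3 x 3 x 3 = 27 combinations; ORDER BY RANDOM()
# shuffles them and LIMIT keeps n. Each combination occurs once in the
# cross join, so the sample contains no exact duplicates.
QUERY = """
SELECT l.last_name || ', ' || f.first_name || ' ' || m.middle_name
FROM first_names f
CROSS JOIN middle_names m
CROSS JOIN last_names l
ORDER BY RANDOM()
LIMIT ?
"""


def random_full_names(n):
    """Return up to n distinct 'Last, First Middle' strings in random order."""
    con = sqlite3.connect(":memory:")
    for table, (column, rows) in TABLES.items():
        con.execute(f"CREATE TABLE {table} (id INTEGER, {column} TEXT)")
        con.executemany(f"INSERT INTO {table} VALUES (?, ?)", rows)
    names = [row[0] for row in con.execute(QUERY, (n,))]
    con.close()
    return names


print(random_full_names(10))
```

With only 27 combinations, asking for more than 27 rows simply returns all 27. The MySQL equivalent, ORDER BY RAND() LIMIT n, sorts the whole cross join, which is fine at this size.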

In need of a query logic, how to group by id from the users table?

I have two tables:
INSERT INTO `companies` (`name`) VALUES
('Walmart'),
('Disney'),
('Amazon'),
('Unicom'),
('Microsoft'),
('Intel');
INSERT INTO `users` (`id`, `company`) VALUES
(1, 'Disney'),
(2, 'Amazon'),
(3, 'Intel'),
(3, 'Walmart'),
(4, 'Microsoft'),
(4, 'Unicom'),
(5, 'Microsoft');
The result should be the following:
1. 'Walmart', 'Amazon', 'Unicom', 'Microsoft', 'Intel'
2. 'Walmart', 'Disney', 'Unicom', 'Microsoft', 'Intel'
3. 'Disney', 'Amazon', 'Unicom', 'Microsoft'
4. 'Walmart', 'Disney', 'Amazon', 'Intel'
5. 'Walmart', 'Disney', 'Amazon', 'Unicom', 'Intel'
I have tried with:
"SELECT a.name, b.id, b.company FROM users b RIGHT JOIN companies a ON b.company <> a.name"
This gives the correct logic by omitting the company name that's already on the list, but for a user with two companies it processes the same id twice, and each pass omits only one of the two names. How would one approach this query?
The basic idea in the query below is to left join a derived table containing every possible user/company combination (built with a CROSS JOIN) to the users table. Combinations that do match are removed, and the remaining companies are then rolled up into a CSV string for each user using GROUP_CONCAT.
SELECT t1.id, GROUP_CONCAT(t1.name)
FROM
(
SELECT DISTINCT u.id, c.name
FROM users u
CROSS JOIN companies c
) t1
LEFT JOIN users t2
ON t1.name = t2.company AND t1.id = t2.id
WHERE t2.company IS NULL
GROUP BY
t1.id;
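The anti-join can be tried without a MySQL server. The sketch below runs the answer's query in SQLite through Python's sqlite3 (SQLite also provides GROUP_CONCAT, with a comma separator by default); the helper name missing_companies is hypothetical. GROUP_CONCAT does not guarantee an order, so each list is sorted in Python.

```python
import sqlite3

COMPANIES = ["Walmart", "Disney", "Amazon", "Unicom", "Microsoft", "Intel"]
USERS = [(1, "Disney"), (2, "Amazon"), (3, "Intel"), (3, "Walmart"),
         (4, "Microsoft"), (4, "Unicom"), (5, "Microsoft")]

# Every (user, company) pair from a CROSS JOIN, minus the pairs that exist
# in users (LEFT JOIN ... IS NULL), rolled up per user with GROUP_CONCAT.
QUERY = """
SELECT t1.id, GROUP_CONCAT(t1.name)
FROM (
    SELECT DISTINCT u.id, c.name
    FROM users u
    CROSS JOIN companies c
) t1
LEFT JOIN users t2
  ON t1.name = t2.company AND t1.id = t2.id
WHERE t2.company IS NULL
GROUP BY t1.id
"""


def missing_companies():
    """Return {user_id: sorted list of companies the user is NOT linked to}."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE companies (name TEXT)")
    con.execute("CREATE TABLE users (id INTEGER, company TEXT)")
    con.executemany("INSERT INTO companies VALUES (?)", [(c,) for c in COMPANIES])
    con.executemany("INSERT INTO users VALUES (?, ?)", USERS)
    result = {uid: sorted(csv.split(",")) for uid, csv in con.execute(QUERY)}
    con.close()
    return result


print(missing_companies()[3])  # ['Amazon', 'Disney', 'Microsoft', 'Unicom']
```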
Try this
"SELECT a.name, b.id, b.company FROM users b RIGHT JOIN companies a ON b.company <> a.name group by b.company"

Select query with left outer join and sum with group by

I have three tables, for example:
Parent Table :TEST_SUMMARY
Child Tables : TEST_DETAIL, TEST_DETAIL2
My data and the expected output were shown in images; the script below recreates the data.
I tried the two queries below, but neither gives the expected output:
SELECT s.NAME, sum(s.AMT), sum(d.d_amt), sum(d2.d2_amt)
FROM TEST_SUMMARY s LEFT OUTER JOIN TEST_DETAIL d
ON s.ID = d.SUMMARY_ID
LEFT OUTER JOIN TEST_DETAIL2 d2
ON s.ID =d2.SUMMARY_ID
GROUP BY s.NAME
ORDER BY s.NAME;
select rs1.*,rs2.total1,rs3.total2
FROM
(select id, name,amt from TEST_SUMMARY a) RS1,
(select SUMMARY_ID, sum(d_amt) over(partition by summary_id ) total1 from TEST_DETAIL a) RS2,
(select SUMMARY_ID, sum(d2_amt) over(partition by summary_id ) total2 from TEST_DETAIL2 a) RS3
where rs1.id(+)= RS2.SUMMARY_ID
and rs1.id(+)= RS3.SUMMARY_ID;
Queries to create the tables and insert the test data:
CREATE TABLE TEST_SUMMARY(ID NUMBER, NAME VARCHAR2(20 BYTE),AMT NUMBER(10,2));
CREATE TABLE TEST_DETAIL (ID NUMBER, SUMMARY_ID NUMBER, NAME VARCHAR(20), D_AMT NUMBER(10,2));
CREATE TABLE TEST_DETAIL2 (ID NUMBER, SUMMARY_ID NUMBER, NAME VARCHAR(20), D2_AMT NUMBER(10,2));
INSERT INTO TEST_SUMMARY VALUES (1, 'NAME1', 100);
INSERT INTO TEST_SUMMARY VALUES (4, 'NAME1', 150);
INSERT INTO TEST_SUMMARY VALUES (6, 'NAME1', 50);
INSERT INTO TEST_SUMMARY VALUES (2, 'NAME2', 200);
INSERT INTO TEST_SUMMARY VALUES (3, 'NAME3', 300);
INSERT INTO TEST_DETAIL VALUES (1, 1, 'NAME11', 11);
INSERT INTO TEST_DETAIL VALUES (2, 1, 'NAME12', 12);
INSERT INTO TEST_DETAIL2 VALUES (1, 1, 'NAME_2_11', 1);
INSERT INTO TEST_DETAIL2 VALUES (2, 1, 'NAME_2_12', 1);
One way to solve this in both MySQL and Oracle is to use subqueries that pre-aggregate the detail tables by name. That removes the row duplication, so you can then summarise with a normal join:
SELECT ts.name, SUM(ts.amt) amt1, MAX(td1.amt) amt2, MAX(td2.amt) amt3
FROM TEST_SUMMARY ts
LEFT JOIN (
SELECT ts.name, SUM(td.d_amt) amt
FROM TEST_DETAIL td JOIN TEST_SUMMARY ts ON td.summary_id = ts.id
GROUP BY ts.name) td1 ON ts.name = td1.name
LEFT JOIN (
SELECT ts.name, SUM(td.d2_amt) amt
FROM TEST_DETAIL2 td JOIN TEST_SUMMARY ts ON td.summary_id = ts.id
GROUP BY ts.name) td2 ON ts.name = td2.name
GROUP BY ts.name
ORDER BY ts.name
A MySQL SQLfiddle and an Oracle SQLfiddle to test with.
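To see why the first attempt in the question over-counts and why pre-aggregating fixes it, the sketch below runs both against the question's test data. It assumes SQLite through Python's sqlite3 in place of Oracle or MySQL, with column types simplified to INTEGER and TEXT; run_both is a hypothetical helper name.

```python
import sqlite3

# Test data from the question (column types simplified for SQLite).
SUMMARY = [(1, "NAME1", 100), (4, "NAME1", 150), (6, "NAME1", 50),
           (2, "NAME2", 200), (3, "NAME3", 300)]
DETAIL = [(1, 1, "NAME11", 11), (2, 1, "NAME12", 12)]
DETAIL2 = [(1, 1, "NAME_2_11", 1), (2, 1, "NAME_2_12", 1)]

# Both detail tables joined at once: summary row 1 matches 2 x 2 = 4 rows,
# so each of its amounts is added four times.
NAIVE = """
SELECT s.name, SUM(s.amt), SUM(d.d_amt), SUM(d2.d2_amt)
FROM test_summary s
LEFT JOIN test_detail d ON s.id = d.summary_id
LEFT JOIN test_detail2 d2 ON s.id = d2.summary_id
GROUP BY s.name
ORDER BY s.name
"""

# Detail tables pre-aggregated by name, so each summary row joins at most
# one row from each subquery.
PRE_AGGREGATED = """
SELECT ts.name, SUM(ts.amt), MAX(td1.amt), MAX(td2.amt)
FROM test_summary ts
LEFT JOIN (
    SELECT ts.name, SUM(td.d_amt) AS amt
    FROM test_detail td JOIN test_summary ts ON td.summary_id = ts.id
    GROUP BY ts.name
) td1 ON ts.name = td1.name
LEFT JOIN (
    SELECT ts.name, SUM(td.d2_amt) AS amt
    FROM test_detail2 td JOIN test_summary ts ON td.summary_id = ts.id
    GROUP BY ts.name
) td2 ON ts.name = td2.name
GROUP BY ts.name
ORDER BY ts.name
"""


def run_both():
    """Return (naive rows, pre-aggregated rows) for the question's data."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE test_summary (id INTEGER, name TEXT, amt INTEGER)")
    con.execute("CREATE TABLE test_detail "
                "(id INTEGER, summary_id INTEGER, name TEXT, d_amt INTEGER)")
    con.execute("CREATE TABLE test_detail2 "
                "(id INTEGER, summary_id INTEGER, name TEXT, d2_amt INTEGER)")
    con.executemany("INSERT INTO test_summary VALUES (?, ?, ?)", SUMMARY)
    con.executemany("INSERT INTO test_detail VALUES (?, ?, ?, ?)", DETAIL)
    con.executemany("INSERT INTO test_detail2 VALUES (?, ?, ?, ?)", DETAIL2)
    naive = con.execute(NAIVE).fetchall()
    fixed = con.execute(PRE_AGGREGATED).fetchall()
    con.close()
    return naive, fixed


naive, fixed = run_both()
print(naive[0])  # ('NAME1', 600, 46, 4)
print(fixed[0])  # ('NAME1', 300, 23, 2)
```

Summary row 1 has two TEST_DETAIL rows and two TEST_DETAIL2 rows, so the double LEFT JOIN turns it into four rows; the pre-aggregated subqueries return one row per name, so nothing is multiplied.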
You could try this:
SELECT
TEST_SUMMARY.NAME,
TEST_SUMMARY.AMT AS AMT1,
(
SELECT
SUM(TEST_DETAIL.D_AMT)
FROM
TEST_DETAIL
WHERE
TEST_DETAIL.SUMMARY_ID=TEST_SUMMARY.ID
) AS AMT2,
(
SELECT
SUM(TEST_DETAIL2.D2_AMT)
FROM
TEST_DETAIL2
WHERE
TEST_DETAIL2.SUMMARY_ID=TEST_SUMMARY.ID
) AS AMT3
FROM
TEST_SUMMARY
Update
You could do the following if many rows share the same name. The question then is what to do with the amount fields: should they be summed per name, or is MAX enough? That depends on your requirements:
SELECT
TEST_SUMMARY.NAME,
SUM(TEST_SUMMARY.AMT) AS AMT,
SUM(tblAMT2.AMT2) AS AMT2,
SUM(tblAMT3.AMT3) AS AMT3
FROM
TEST_SUMMARY
LEFT JOIN
(
SELECT
SUM(TEST_DETAIL.D_AMT) AS AMT2,
TEST_DETAIL.SUMMARY_ID
FROM
TEST_DETAIL
GROUP BY
TEST_DETAIL.SUMMARY_ID
) AS tblAMT2
ON TEST_SUMMARY.ID=tblAMT2.SUMMARY_ID
LEFT JOIN
(
SELECT
SUM(TEST_DETAIL2.D2_AMT) AS AMT3,
TEST_DETAIL2.SUMMARY_ID
FROM
TEST_DETAIL2
GROUP BY
TEST_DETAIL2.SUMMARY_ID
) AS tblAMT3
ON TEST_SUMMARY.ID=tblAMT3.SUMMARY_ID
GROUP BY
TEST_SUMMARY.NAME
Try this:
SELECT TS.NAME, TS.AMT AS AMT1, SUM(TD.D_AMT) AS AMT2, SUM(TD2.D2_AMT) AS AMT3
FROM TEST_SUMMARY TS LEFT OUTER JOIN TEST_DETAIL TD ON TS.ID = TD.SUMMARY_ID
LEFT OUTER JOIN TEST_DETAIL2 TD2 ON TS.ID = TD2.SUMMARY_ID
GROUP BY TS.NAME, TS.AMT
ORDER BY TS.NAME, TS.AMT

How to get all children of a parent and then their children using recursion in query

I have structure like this:
<Unit>
  <SubUnit1>
    <SubSubUnit1/>
    <SubSubUnit2/>
    ...
    <SubSubUnitN/>
  </SubUnit1>
  <SubUnit2>
    <SubSubUnit1/>
    <SubSubUnit2/>
    ...
    <SubSubUnitN/>
  </SubUnit2>
  ...
  <SubUnitN>
    <SubSubUnit1/>
    <SubSubUnit2/>
    ...
    <SubSubUnitN/>
  </SubUnitN>
</Unit>
This structure has 3 levels: main Unit, SubUnits and SubSubUnits.
I want to select all children by UnitId:
If I search by Unit, I should get the whole tree.
If I search by SubUnit1, I should get SubUnit1 and all of its children.
If I search by SubSubUnit2, I should get only SubSubUnit2 itself.
Here is my attempt:
with a(id, parentid, name)
as (
select id, parentId, name
from customer a
where parentId is null
union all
select a.id, a.parentid, a.Name
from customer
inner join a on customer.parentId = customer.id
)
select parentid, id, name
from customer pod
where pod.parentid in (
select id
from customer grbs
where grbs.parentid in (
select id
from customer t
where t.parentid = @UnitId
))
union
select parentid, id, name
from customer grbs
where grbs.parentid in (
select id
from customer t
where t.parentid = @UnitId
)
union
select parentid, id, name
from customer c
where c.Id = @UnitId
order by parentid, id
I use three UNIONs; it's not elegant, but it works. If the structure has N levels, how do I get the correct result?
DECLARE @Id int = your_UnitId
;WITH cte AS
(
SELECT a.Id, a.parentId, a.name
FROM customer a
WHERE Id = @Id
UNION ALL
SELECT a.Id, a.parentid, a.Name
FROM customer a JOIN cte c ON a.parentId = c.id
)
SELECT parentId, Id, name
FROM cte
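The answer above is T-SQL (SQL Server), where the starting id is held in a local variable. The same anchor-plus-recursive-member pattern is sketched below in SQLite via Python's sqlite3, where it is written WITH RECURSIVE and the id is passed as a ? parameter. The tree rows, with NULL as the root's parentid, are hypothetical data shaped like the question's Unit/SubUnit/SubSubUnit structure; subtree_ids is a hypothetical helper name.

```python
import sqlite3

# Hypothetical sample tree; the root has parentid NULL, as in the
# question's "where parentId is null" anchor.
NODES = [
    (1, None, "Unit"),
    (2, 1, "SubUnit1"),
    (3, 1, "SubUnit2"),
    (4, 2, "SubSubUnit1"),
    (5, 2, "SubSubUnit2"),
    (6, 3, "SubSubUnit1"),
]

# Anchor: the requested node itself. Recursive member: rows whose parentid
# is an id already in the CTE. It stops when no new children are found,
# so it works for any number of levels.
QUERY = """
WITH RECURSIVE cte(id, parentid, name) AS (
    SELECT id, parentid, name FROM customer WHERE id = ?
    UNION ALL
    SELECT a.id, a.parentid, a.name
    FROM customer a JOIN cte c ON a.parentid = c.id
)
SELECT id FROM cte ORDER BY id
"""


def subtree_ids(unit_id):
    """Return the ids of unit_id and all of its descendants, sorted."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE customer (id INTEGER, parentid INTEGER, name TEXT)")
    con.executemany("INSERT INTO customer VALUES (?, ?, ?)", NODES)
    ids = [row[0] for row in con.execute(QUERY, (unit_id,))]
    con.close()
    return ids


print(subtree_ids(2))  # [2, 4, 5]
```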
If a root row is its own parent (parentid = id), a different query is needed. For example, with a schema like this:
CREATE TABLE customer
(
id int,
parentid int,
name nvarchar(10)
)
INSERT customer
VALUES(1, 1, 'aaa'),
(2, 1, 'bbb'),
(3, 2, 'ccc'),
(4, 2, 'ddd'),
(5, 1, 'eee'),
(6, 5, 'fff'),
(7, 5, 'ggg'),
(8, 8, 'hhh'),
(9, 8, 'iii'),
(10, 8, 'jjj')
In this case, use the following query:
DECLARE @Id int = 1 -- your UnitId
;WITH cte AS
(
SELECT a.Id, a.parentId, a.name
FROM customer a
WHERE parentid = @Id
UNION ALL
SELECT a.Id, a.parentid, a.Name
FROM customer a JOIN cte c ON a.parentId = c.id
and c.id != @Id
)
SELECT parentId, Id, name
FROM cte
go
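For the self-parent layout, the sketch below runs the answer's query in SQLite via Python's sqlite3 against the answer's own sample rows, with a named :id parameter in place of the T-SQL variable. The anchor picks up every row whose parentid is the root id, which includes the self-parented root itself, and the c.id != :id condition stops the recursion from expanding the root again, which would otherwise recurse endlessly. The helper name tree_ids is hypothetical.

```python
import sqlite3

# Rows from the answer's example: roots 1 and 8 are their own parents.
CUSTOMERS = [(1, 1, "aaa"), (2, 1, "bbb"), (3, 2, "ccc"), (4, 2, "ddd"),
             (5, 1, "eee"), (6, 5, "fff"), (7, 5, "ggg"), (8, 8, "hhh"),
             (9, 8, "iii"), (10, 8, "jjj")]

# Anchor on parentid = :id (this includes the self-parented root), and never
# recurse from the root row again (c.id != :id) so the self-loop terminates.
QUERY = """
WITH RECURSIVE cte(id, parentid, name) AS (
    SELECT id, parentid, name FROM customer WHERE parentid = :id
    UNION ALL
    SELECT a.id, a.parentid, a.name
    FROM customer a JOIN cte c ON a.parentid = c.id AND c.id != :id
)
SELECT id FROM cte ORDER BY id
"""


def tree_ids(root_id):
    """Return the sorted ids produced by the query for root_id."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE customer (id INTEGER, parentid INTEGER, name TEXT)")
    con.executemany("INSERT INTO customer VALUES (?, ?, ?)", CUSTOMERS)
    ids = [row[0] for row in con.execute(QUERY, {"id": root_id})]
    con.close()
    return ids


print(tree_ids(1))  # [1, 2, 3, 4, 5, 6, 7]
```

As written, this form is meant for root ids: for an inner node such as 2 it returns only the descendants (3 and 4), not the node itself.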

How to combine these two queries into one? (multiple joins against the same table)

Given two tables, one for workers and one for tasks completed by workers,
CREATE TABLE IF NOT EXISTS `workers` (
`id` int(11) NOT NULL,
PRIMARY KEY (`id`)
);
INSERT INTO `workers` (`id`) VALUES
(1);
CREATE TABLE IF NOT EXISTS `tasks` (
`id` int(11) NOT NULL,
`worker_id` int(11) NOT NULL,
`status` int(11) NOT NULL,
PRIMARY KEY (`id`)
);
INSERT INTO `tasks` (`id`, `worker_id`, `status`) VALUES
(1, 1, 1),
(2, 1, 1),
(3, 1, 2),
(4, 1, 2),
(5, 1, 2);
I'm trying to get the number of tasks each worker has with each status code.
I can write either
SELECT w.*
,COUNT(t1.worker_id) as status_1_count
FROM workers w
LEFT JOIN tasks t1 ON w.id = t1.worker_id AND t1.status = 1
WHERE 1
GROUP BY
t1.worker_id
ORDER BY w.id
or
SELECT w.*
,COUNT(t2.worker_id) as status_2_count
FROM workers w
LEFT JOIN tasks t2 ON w.id = t2.worker_id AND t2.status = 2
WHERE 1
GROUP BY
t2.worker_id
ORDER BY w.id
and get the number of tasks with a single given status code, but when I try to get the counts for multiple task statuses in a single query, it doesn't work!
SELECT w.*
,COUNT(t1.worker_id) as status_1_count
,COUNT(t2.worker_id) as status_2_count
FROM workers w
LEFT JOIN tasks t1 ON w.id = t1.worker_id AND t1.status = 1
LEFT JOIN tasks t2 ON w.id = t2.worker_id AND t2.status = 2
WHERE 1
GROUP BY t1.worker_id
,t2.worker_id
ORDER BY w.id
The tasks table is effectively cross-joined with itself (2 status-1 rows × 3 status-2 rows = 6 rows for worker 1), so both counts come out as 6, which is not what I want!
Is there any way to combine these two queries into one such that we can retrieve the counts for multiple task statuses in a single query?
Thanks!
SELECT w.*,
SUM(t1.status = 1) AS status_1_count,
SUM(t1.status = 2) AS status_2_count
FROM workers w
LEFT JOIN tasks t1 ON w.id = t1.worker_id AND t1.status IN (1, 2)
GROUP BY w.id
ORDER BY w.id;
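The conditional-aggregation answer relies on MySQL treating a comparison such as t1.status = 1 as 1 or 0; SQLite does the same, so it can be tried through Python's sqlite3. The sketch adds a hypothetical worker 2 with no tasks: for such a worker every joined column is NULL, so SUM returns NULL, and the sketch wraps each sum in COALESCE to report 0 instead. status_counts is a hypothetical helper name.

```python
import sqlite3

# Tasks from the question: (id, worker_id, status)
TASKS = [(1, 1, 1), (2, 1, 1), (3, 1, 2), (4, 1, 2), (5, 1, 2)]

# One LEFT JOIN, then count each status with SUM over a 0/1 comparison.
# COALESCE turns the NULL sum of a worker with no tasks into 0.
QUERY = """
SELECT w.id,
       COALESCE(SUM(t1.status = 1), 0) AS status_1_count,
       COALESCE(SUM(t1.status = 2), 0) AS status_2_count
FROM workers w
LEFT JOIN tasks t1 ON w.id = t1.worker_id AND t1.status IN (1, 2)
GROUP BY w.id
ORDER BY w.id
"""


def status_counts():
    """Return [(worker_id, status_1_count, status_2_count), ...]."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE workers (id INTEGER PRIMARY KEY)")
    con.execute(
        "CREATE TABLE tasks (id INTEGER PRIMARY KEY, worker_id INTEGER, status INTEGER)"
    )
    # Worker 1 is from the question; worker 2 (no tasks) is hypothetical.
    con.executemany("INSERT INTO workers VALUES (?)", [(1,), (2,)])
    con.executemany("INSERT INTO tasks VALUES (?, ?, ?)", TASKS)
    rows = con.execute(QUERY).fetchall()
    con.close()
    return rows


print(status_counts())  # [(1, 2, 3), (2, 0, 0)]
```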
I'm trying to get the number of tasks each worker has with each status code.
SELECT worker_id, status, COUNT(*)
FROM tasks
GROUP BY worker_id, status;
That's all.
I don't have an instance of MySQL here, but I tested this on a T-SQL (SQL Server) box and it worked.
select distinct t.worker_id,
(select count(*) from tasks t1 where t1.worker_id = t.worker_id and t1.status = 1) as Status1,
(select count(*) from tasks t2 where t2.worker_id = t.worker_id and t2.status = 2) as Status2
from tasks t;
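Subqueries like these must be correlated with the outer row; otherwise each one counts tasks for every worker, and the results only look right when there is a single worker, as in the question's sample. The sketch below (SQLite through Python's sqlite3, with one hypothetical extra task for a worker 2) compares the uncorrelated and correlated forms; compare is a hypothetical helper name.

```python
import sqlite3

# Tasks from the question plus one hypothetical task (id 6) for worker 2,
# so the difference between the two forms is visible.
TASKS = [(1, 1, 1), (2, 1, 1), (3, 1, 2), (4, 1, 2), (5, 1, 2), (6, 2, 1)]

# Uncorrelated: each subquery counts matching tasks across ALL workers.
UNCORRELATED = """
SELECT DISTINCT t.worker_id,
       (SELECT COUNT(*) FROM tasks WHERE status = 1) AS status1,
       (SELECT COUNT(*) FROM tasks WHERE status = 2) AS status2
FROM tasks t
ORDER BY t.worker_id
"""

# Correlated: each subquery is restricted to the outer row's worker_id.
CORRELATED = """
SELECT DISTINCT t.worker_id,
       (SELECT COUNT(*) FROM tasks t1
        WHERE t1.worker_id = t.worker_id AND t1.status = 1) AS status1,
       (SELECT COUNT(*) FROM tasks t2
        WHERE t2.worker_id = t.worker_id AND t2.status = 2) AS status2
FROM tasks t
ORDER BY t.worker_id
"""


def compare():
    """Return (uncorrelated rows, correlated rows)."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE tasks (id INTEGER PRIMARY KEY, worker_id INTEGER, status INTEGER)"
    )
    con.executemany("INSERT INTO tasks VALUES (?, ?, ?)", TASKS)
    result = (con.execute(UNCORRELATED).fetchall(),
              con.execute(CORRELATED).fetchall())
    con.close()
    return result


print(compare())  # ([(1, 3, 3), (2, 3, 3)], [(1, 2, 3), (2, 1, 0)])
```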