SELECT COUNT(*) for unique pairs of IDs - mysql

I have a table like the following, named matches:
match_id (AUTO_INCREMENT)
user_id (INT(11))
opponent_id (INT(11))
date (TIMESTAMP)
What I need to do is SELECT the count of rows where user_id and opponent_id form a unique pair, regardless of which ID is in which column. The goal is to count the total number of matches started between different pairs of users.
So if we have:
user_id = 10 and opponent_id = 11
user_id = 20 and opponent_id = 22
user_id = 10 and opponent_id = 11
user_id = 11 and opponent_id = 10
The result of the query should be 2.
In fact, only 2 matches have been started between different pairs of users: rows 1, 3 and 4 are the same match, because they were played by the same pair of user IDs.
Can anyone help me with this?
I have done similar queries but never on pairs of IDs, always on a single ID.

FancyPants's answer is correct, but when grouping without any aggregate function I prefer DISTINCT:
SELECT COUNT(DISTINCT
LEAST(user_id, opponent_id),
GREATEST(user_id, opponent_id)
)
FROM yourtable;
is sufficient.

SELECT COUNT(*) AS nr_of_matches FROM (
SELECT
LEAST(user_id, opponent_id) AS pl1,
GREATEST(user_id, opponent_id) AS pl2
FROM yourtable
GROUP BY pl1, pl2
) sq
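The pair-normalisation idea can be checked without a MySQL server. The following is a minimal sketch using Python's standard-library sqlite3 module with an in-memory table; SQLite is an assumption here, and because it has no LEAST()/GREATEST() or multi-column COUNT(DISTINCT ...), it uses the GROUP BY form with SQLite's two-argument scalar min()/max(). The function name count_unique_pairs is hypothetical.

```python
import sqlite3


def count_unique_pairs(pairs):
    """Count distinct unordered (user_id, opponent_id) pairs in an in-memory table."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE matches ("
            "match_id INTEGER PRIMARY KEY AUTOINCREMENT, "
            "user_id INTEGER NOT NULL, opponent_id INTEGER NOT NULL)"
        )
        conn.executemany(
            "INSERT INTO matches (user_id, opponent_id) VALUES (?, ?)", pairs
        )
        # SQLite's scalar min()/max() stand in for MySQL's LEAST()/GREATEST():
        # (11, 10) is normalised to (10, 11) before grouping.
        (count,) = conn.execute(
            """
            SELECT COUNT(*) FROM (
                SELECT min(user_id, opponent_id) AS pl1,
                       max(user_id, opponent_id) AS pl2
                FROM matches
                GROUP BY pl1, pl2
            ) AS sq
            """
        ).fetchone()
        return count
    finally:
        conn.close()


# The four rows from the question.
print(count_unique_pairs([(10, 11), (20, 22), (10, 11), (11, 10)]))  # 2
```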

Related

How can I select the second-to-last rows in a mysql table, grouped by column?

Structure is:
CREATE TABLE current
(
id BIGINT NOT NULL AUTO_INCREMENT,
PRIMARY KEY(id),
symbol VARCHAR(5),
UNIQUE (id), INDEX (symbol)
) ENGINE MyISAM;
id | symbol
---|-------
1 | A
2 | B
3 | C
4 | C
5 | B
6 | A
7 | C
8 | C
9 | A
10 | B
I am using the following
SELECT *
FROM current
WHERE id
IN
(
SELECT MAX(id)
FROM current
GROUP BY symbol
)
to return the last record for each symbol:
id | symbol
---|-------
8 | C
9 | A
10 | B
How can I return the next-to-last results in a similar fashion?
I know that I need
ORDER BY id DESC LIMIT 1,1
somewhere, but my foo is weak.
I would like to return:
id | symbol
---|-------
5 | B
6 | A
7 | C
For versions of MySQL prior to 8.0, use a subquery in the WHERE clause to filter out the max id of each symbol, and then aggregate:
SELECT MAX(id) id, symbol
FROM current
WHERE id NOT IN (SELECT MAX(id) FROM current GROUP BY symbol)
GROUP BY symbol
ORDER BY id;
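To see the "exclude each symbol's MAX(id), then take MAX(id) again" idea on the sample data, here is a hedged sketch that runs the same query through Python's sqlite3 module. SQLite rather than MySQL is an assumption; the table name is quoted because CURRENT is also an SQLite keyword, and the helper name second_to_last_per_symbol is hypothetical.

```python
import sqlite3

# Symbols for ids 1..10, copied from the sample data in the question.
SAMPLE_SYMBOLS = ["A", "B", "C", "C", "B", "A", "C", "C", "A", "B"]


def second_to_last_per_symbol(symbols):
    """Return (id, symbol) for each symbol's second-highest id, ordered by id."""
    conn = sqlite3.connect(":memory:")
    try:
        # INTEGER PRIMARY KEY assigns ids 1, 2, 3, ... in insertion order.
        conn.execute('CREATE TABLE "current" (id INTEGER PRIMARY KEY, symbol TEXT)')
        conn.executemany(
            'INSERT INTO "current" (symbol) VALUES (?)', [(s,) for s in symbols]
        )
        return conn.execute(
            """
            SELECT MAX(id) AS id, symbol
            FROM "current"
            -- drop each symbol's latest row, then take the max of what is left
            WHERE id NOT IN (SELECT MAX(id) FROM "current" GROUP BY symbol)
            GROUP BY symbol
            ORDER BY id
            """
        ).fetchall()
    finally:
        conn.close()


print(second_to_last_per_symbol(SAMPLE_SYMBOLS))  # [(5, 'B'), (6, 'A'), (7, 'C')]
```

A symbol with only one row simply disappears from the result, because its only id is removed by the NOT IN filter.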
SELECT *
FROM current
WHERE id IN (
SELECT DISTINCT T.id FROM current AS T
WHERE id=(
SELECT id FROM current
WHERE symbol=T.symbol
ORDER BY id DESC LIMIT 1,1
)
)
Easy if your MySQL version supports ROW_NUMBER (MySQL 8.0 and later).
Just sort descending within each symbol, then take the 2nd row.
WITH CTE AS (
SELECT *
, ROW_NUMBER() OVER (PARTITION BY symbol ORDER BY id DESC) AS symbol_rn
FROM current
)
SELECT id, symbol
FROM CTE
WHERE symbol_rn = 2
ORDER BY id;
In MySQL 5.7 you can simply self-join on the symbol and group by.
The second-to-last row of each symbol is then the one that has exactly 2 rows (itself and one with a higher id) at or above its id.
SELECT c1.id, c1.symbol
FROM current c1
LEFT JOIN current c2
ON c2.symbol = c1.symbol
AND c2.id >= c1.id
GROUP BY c1.id, c1.symbol
HAVING COUNT(c2.id) = 2
ORDER BY c1.id;
id | symbol
---|-------
5 | B
6 | A
7 | C
The performance will really benefit from an index on symbol.
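The self-join generalises to "the n-th row from the end" by changing the number in HAVING. The following sketch is an assumption-laden illustration in SQLite via Python's sqlite3 module, not a MySQL test: the helper name nth_from_last_per_symbol and the index name idx_symbol are hypothetical, and n = 2 reproduces the answer above.

```python
import sqlite3

SAMPLE_SYMBOLS = ["A", "B", "C", "C", "B", "A", "C", "C", "A", "B"]  # ids 1..10


def nth_from_last_per_symbol(symbols, n):
    """Rows that have exactly n same-symbol rows (themselves included) at or above their id."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute('CREATE TABLE "current" (id INTEGER PRIMARY KEY, symbol TEXT)')
        # The answer recommends an index on symbol for the self-join.
        conn.execute('CREATE INDEX idx_symbol ON "current" (symbol)')
        conn.executemany(
            'INSERT INTO "current" (symbol) VALUES (?)', [(s,) for s in symbols]
        )
        return conn.execute(
            """
            SELECT c1.id, c1.symbol
            FROM "current" c1
            LEFT JOIN "current" c2
              ON c2.symbol = c1.symbol
             AND c2.id >= c1.id
            GROUP BY c1.id, c1.symbol
            HAVING COUNT(c2.id) = ?
            ORDER BY c1.id
            """,
            (n,),
        ).fetchall()
    finally:
        conn.close()


print(nth_from_last_per_symbol(SAMPLE_SYMBOLS, 2))  # [(5, 'B'), (6, 'A'), (7, 'C')]
```

Because every row matches itself in the join, the LEFT JOIN behaves like an inner join here; n = 1 returns the latest row of each symbol.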
You can try this:
SELECT *
FROM current
WHERE id
IN (SELECT MAX(id)
FROM current
GROUP BY symbol)
ORDER BY id DESC LIMIT 1,3
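LIMIT 1,3 means: skip the first row of the sorted result and return at most the next 3. You can change the numbers. Note that because the subquery only returns each symbol's MAX(id), this returns the latest row of every symbol except the overall latest one (ids 9 and 8 for the sample data), not each symbol's second-to-last row.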

Select rows with duplicate values that may be in different columns

I have a table of businesses, and each business can have up to 3 phone numbers. I want to find any duplicate phone numbers, but since the phone numbers are in different columns I don't think I can make the classic GROUP BY query work.
Sample data:
ID | Business_Name | phone_main | phone_mobile | phone_tollfree
---|---------------|------------|--------------|---------------
1 | John's Donuts | 555-551-5555 | 555-551-5556 | null
2 | John's Bakery | 555-551-5557 | 555-551-5555 | null
3 | SuperBake! | 555-300-1005 | null | 555-551-5555
4 | Grocery Fred | 555-223-5511 | 555-334-5555 | null
In this case I want to identify records 1, 2, and 3 as being the same. Simply identifying the phone number 555-551-5555 as a number with duplicates would be fine, as I can do a subquery or the calling program can use the phone number and send a new query getting all records with 555-551-5555 in any of the 3 phone columns.
This is on MariaDB if it matters.
Edit (adding my current flailing attempt, since someone seems to really want it):
Here's what I have right now:
SELECT ID, phone_main, phone_mobile, phone_tollfree,
(
SELECT COUNT(*) FROM businesses b
WHERE (
phone_main IS NOT NULL AND (b.phone_mobile=phone_main OR b.phone_tollfree=phone_main)
)
OR (
phone_mobile IS NOT NULL AND (b.phone_main=phone_mobile OR b.phone_tollfree=phone_mobile)
)
OR (
phone_tollfree IS NOT NULL AND (b.phone_main=phone_tollfree OR b.phone_mobile=phone_tollfree)
)
) cnt
FROM businesses HAVING cnt > 1
Problems with this:
It seems to be returning every row in my table.
It won't find duplicates within a single column.
How about combining all the phone columns into one and then counting the occurrences?
I didn't run the code, but it might point you in the right direction:
SELECT phone, COUNT(phone) AS cnt
FROM (
SELECT phone_main AS phone FROM SampleData
UNION ALL
SELECT phone_mobile AS phone FROM SampleData
UNION ALL
SELECT phone_tollfree AS phone FROM SampleData
) AS all_phones
GROUP BY phone
HAVING COUNT(phone) > 1
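Here is a minimal runnable version of the "stack the three columns, then count" idea, using Python's sqlite3 module and the sample rows from the question. The table name businesses comes from the asker's attempt, but running it on SQLite instead of MariaDB is an assumption, and the function name duplicate_phones is hypothetical.

```python
import sqlite3

# (ID, Business_Name, phone_main, phone_mobile, phone_tollfree); None is NULL.
BUSINESSES = [
    (1, "John's Donuts", "555-551-5555", "555-551-5556", None),
    (2, "John's Bakery", "555-551-5557", "555-551-5555", None),
    (3, "SuperBake!", "555-300-1005", None, "555-551-5555"),
    (4, "Grocery Fred", "555-223-5511", "555-334-5555", None),
]


def duplicate_phones(rows):
    """Map each phone number that appears more than once (in any column) to its count."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE businesses (id INTEGER PRIMARY KEY, business_name TEXT, "
            "phone_main TEXT, phone_mobile TEXT, phone_tollfree TEXT)"
        )
        conn.executemany("INSERT INTO businesses VALUES (?, ?, ?, ?, ?)", rows)
        result = conn.execute(
            """
            SELECT phone, COUNT(phone) AS cnt
            FROM (
                SELECT phone_main AS phone FROM businesses
                UNION ALL
                SELECT phone_mobile FROM businesses
                UNION ALL
                SELECT phone_tollfree FROM businesses
            ) AS all_phones
            GROUP BY phone
            -- COUNT(phone) ignores NULLs, so the NULL group (count 0) drops out
            HAVING COUNT(phone) > 1
            """
        ).fetchall()
        return dict(result)
    finally:
        conn.close()


print(duplicate_phones(BUSINESSES))  # {'555-551-5555': 3}
```

Because UNION ALL keeps every value, this also catches duplicates within a single column, which was the asker's second problem.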
E.g.:
SELECT DISTINCT x.*
FROM my_table x
JOIN my_table y
ON y.id <> x.id
AND
( y.phone_main IN(x.phone_main,x.phone_mobile,x.phone_tollfree)
OR y.phone_mobile IN(x.phone_main,x.phone_mobile,x.phone_tollfree)
OR y.phone_tollfree IN(x.phone_main,x.phone_mobile,x.phone_tollfree)
);
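The self-join above can be exercised on the sample data in the same way. This sketch assumes SQLite via Python's sqlite3 module and a table named businesses; the function name duplicate_rows is hypothetical. It returns the IDs of businesses that share at least one number with another business.

```python
import sqlite3

BUSINESSES = [
    (1, "John's Donuts", "555-551-5555", "555-551-5556", None),
    (2, "John's Bakery", "555-551-5557", "555-551-5555", None),
    (3, "SuperBake!", "555-300-1005", None, "555-551-5555"),
    (4, "Grocery Fred", "555-223-5511", "555-334-5555", None),
]


def duplicate_rows(rows):
    """IDs of rows sharing any phone number with a different row."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE businesses (id INTEGER PRIMARY KEY, business_name TEXT, "
            "phone_main TEXT, phone_mobile TEXT, phone_tollfree TEXT)"
        )
        conn.executemany("INSERT INTO businesses VALUES (?, ?, ?, ?, ?)", rows)
        # A NULL on either side makes that IN test NULL (not true),
        # so empty phone columns never produce a match.
        result = conn.execute(
            """
            SELECT DISTINCT x.id
            FROM businesses x
            JOIN businesses y
              ON y.id <> x.id
             AND (y.phone_main IN (x.phone_main, x.phone_mobile, x.phone_tollfree)
               OR y.phone_mobile IN (x.phone_main, x.phone_mobile, x.phone_tollfree)
               OR y.phone_tollfree IN (x.phone_main, x.phone_mobile, x.phone_tollfree))
            ORDER BY x.id
            """
        ).fetchall()
        return [row[0] for row in result]
    finally:
        conn.close()


print(duplicate_rows(BUSINESSES))  # [1, 2, 3]
```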
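You can do the following (note that in MySQL and MariaDB LEAST() returns NULL if any argument is NULL, so rows with an empty phone column get p = NULL, and only each row's smallest number takes part in the comparison):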
with
l as (
select *, least(phone_main, phone_mobile, phone_tollfree) as p
from t
)
select *
from l
where p in (select p from l group by p having count(*) > 1)

SQL/MySQL DELETE all rows EXCEPT 2 of them

I have a database table setup like this:
id | code  | group_id | status
---|-------|----------|----------
1 | abcd1 | group_1 | available
2 | abcd2 | group_1 | available
3 | adsd3 | group_1 | available
4 | dfgd4 | group_1 | available
5 | vfcd5 | group_1 | available
6 | bgcd6 | group_2 | available
7 | abcd7 | group_2 | available
8 | ahgf8 | group_2 | available
9 | dfgd9 | group_2 | available
10 | qwer6 | group_2 | available
In the example above, each group_id has 5 rows (the number is arbitrary for the example; the real row counts are dynamic and vary). I need to remove every row whose status is available except for 2 of them (which 2 does not matter, as long as 2 remain).
Basically, every unique group_id should end up with only 2 rows with a status of available. I can write a simple SQL query to remove all of them, but I'm struggling to come up with one that removes all except 2... please help :)
If code is unique, you can use subqueries to keep the "min" and "max" code of each group:
DELETE FROM t
WHERE t.status = 'available'
AND (t.group_id, t.code) NOT IN (
SELECT group_id, MAX(code)
FROM t
WHERE status = 'available'
GROUP BY group_id
)
AND (t.group_id, t.code) NOT IN (
SELECT group_id, MIN(code)
FROM t
WHERE status = 'available'
GROUP BY group_id
)
Similarly, with an auto-increment id:
DELETE FROM t
WHERE t.status = 'available'
AND t.id NOT IN (
SELECT MAX(id) FROM t WHERE status = 'available' GROUP BY group_id
UNION
SELECT MIN(id) FROM t WHERE status = 'available' GROUP BY group_id
)
In this version I reworked the two subqueries into a single UNION, but the "AND" format would work just as well. Also, if "code" were unique across the whole table, the group_id could be dropped from the NOT IN comparison in the first query (though it would still be needed in the subqueries' GROUP BY clauses).
Edit: MySQL doesn't like subqueries referencing tables being UPDATEd/DELETEd in the WHERE of the query doing the UPDATE/DELETE; in those cases, you can usually double-wrap the subquery to give it an alias, causing MySQL to treat it as a temporary table (behind the scenes).
DELETE FROM t
WHERE t.status = 'available'
AND t.id NOT IN (
SELECT * FROM (
SELECT MAX(id) FROM t WHERE status = 'available' GROUP BY group_id
UNION
SELECT MIN(id) FROM t WHERE status = 'available' GROUP BY group_id
) AS a
)
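A quick way to confirm that "keep MIN(id) and MAX(id) per group" leaves exactly 2 available rows per group is to run it on the sample rows. The sketch below uses Python's sqlite3 module; SQLite is an assumption (it does not raise MySQL's error 1093, so the double wrapping is harmless there but not required), and the table name t and function name keep_min_and_max are hypothetical.

```python
import sqlite3

# (id, code, group_id, status) rows from the question.
SAMPLE_ROWS = [
    (1, "abcd1", "group_1", "available"),
    (2, "abcd2", "group_1", "available"),
    (3, "adsd3", "group_1", "available"),
    (4, "dfgd4", "group_1", "available"),
    (5, "vfcd5", "group_1", "available"),
    (6, "bgcd6", "group_2", "available"),
    (7, "abcd7", "group_2", "available"),
    (8, "ahgf8", "group_2", "available"),
    (9, "dfgd9", "group_2", "available"),
    (10, "qwer6", "group_2", "available"),
]


def keep_min_and_max(rows):
    """Delete available rows except each group's min and max id; return remaining ids."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE t (id INTEGER PRIMARY KEY, code TEXT, group_id TEXT, status TEXT)"
        )
        conn.executemany("INSERT INTO t VALUES (?, ?, ?, ?)", rows)
        conn.execute(
            """
            DELETE FROM t
            WHERE t.status = 'available'
              AND t.id NOT IN (
                SELECT * FROM (
                  SELECT MAX(id) FROM t WHERE status = 'available' GROUP BY group_id
                  UNION
                  SELECT MIN(id) FROM t WHERE status = 'available' GROUP BY group_id
                ) AS a
              )
            """
        )
        return [row[0] for row in conn.execute("SELECT id FROM t ORDER BY id")]
    finally:
        conn.close()


print(keep_min_and_max(SAMPLE_ROWS))  # [1, 5, 6, 10]
```

Rows whose status is not 'available' are never touched, because both the DELETE filter and the subqueries only consider available rows.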
Another alternative (I don't recall whether MySQL complains as much about joins in DELETE/UPDATE):
DELETE t
FROM t
LEFT JOIN (
SELECT MIN(id) AS minId, MAX(id) AS maxId, 1 AS keep_flag
FROM t
WHERE status = 'available'
GROUP BY group_id
) AS tKeep ON t.id IN (tKeep.minId, tKeep.maxId)
WHERE t.status = 'available'
AND tKeep.keep_flag IS NULL
To keep the min and max ids, I think a join is the simplest solution:
DELETE t
FROM t LEFT JOIN
(SELECT group_id, MIN(id) as min_id, MAX(id) as max_id
FROM t
WHERE t.status = 'available'
GROUP BY group_id
) tt
ON t.id IN (tt.min_id, tt.max_id)
WHERE t.status = 'available' AND
tt.group_id IS NULL;
If the column "id" is the PRIMARY KEY or a UNIQUE KEY, then we could use a correlated subquery to get the second lowest value for a particular group_id.
We could then use that to identify rows for group_id that have higher values of the "id" column.
A query something like this:
SELECT t.`id`
, t.`group_id`
FROM `setup_like_this` t
WHERE t.`status` = 'available'
AND t.`id`
> ( SELECT s.`id`
FROM `setup_like_this` s
WHERE s.`status` = 'available'
AND s.`group_id` = t.`group_id`
ORDER
BY s.`id`
LIMIT 1,1
)
We test that as a SELECT first, to examine the rows that are returned. When we are satisfied this query is returning the set of rows we want to delete, we can replace SELECT ... FROM with DELETE t.* FROM to convert it to a DELETE statement to remove the rows.
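The "preview with SELECT, then delete" workflow can be sketched like this. It runs in SQLite via Python's sqlite3 module (an assumption; this does not reproduce MySQL's error 1093) and uses LIMIT 1 OFFSET 1, which means the same as MySQL's LIMIT 1,1. The preview ids are deleted one by one, so the correlated subquery never sees a partially deleted table. The function name trim_groups is hypothetical.

```python
import sqlite3

SAMPLE_ROWS = [
    (1, "abcd1", "group_1", "available"), (2, "abcd2", "group_1", "available"),
    (3, "adsd3", "group_1", "available"), (4, "dfgd4", "group_1", "available"),
    (5, "vfcd5", "group_1", "available"), (6, "bgcd6", "group_2", "available"),
    (7, "abcd7", "group_2", "available"), (8, "ahgf8", "group_2", "available"),
    (9, "dfgd9", "group_2", "available"), (10, "qwer6", "group_2", "available"),
]

# Rows whose id is above the second-lowest available id of their group.
PREVIEW_SQL = """
    SELECT t.id
    FROM setup_like_this t
    WHERE t.status = 'available'
      AND t.id > (
        SELECT s.id
        FROM setup_like_this s
        WHERE s.status = 'available'
          AND s.group_id = t.group_id
        ORDER BY s.id
        LIMIT 1 OFFSET 1
      )
    ORDER BY t.id
"""


def trim_groups(rows):
    """Return (ids chosen by the preview SELECT, ids left after deleting them)."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE setup_like_this "
            "(id INTEGER PRIMARY KEY, code TEXT, group_id TEXT, status TEXT)"
        )
        conn.executemany("INSERT INTO setup_like_this VALUES (?, ?, ?, ?)", rows)
        doomed = [row[0] for row in conn.execute(PREVIEW_SQL)]
        conn.executemany("DELETE FROM setup_like_this WHERE id = ?", [(i,) for i in doomed])
        remaining = [row[0] for row in conn.execute("SELECT id FROM setup_like_this ORDER BY id")]
        return doomed, remaining
    finally:
        conn.close()


print(trim_groups(SAMPLE_ROWS))  # ([3, 4, 5, 8, 9, 10], [1, 2, 6, 7])
```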
Converting this to a DELETE statement raises MySQL error 1093, because the subquery selects from the table being deleted from.
One workaround is to turn the query above into an inline view and then join it to the target table:
DELETE q.*
FROM `setup_like_this` q
JOIN ( -- inline view, query from above returns `id` of rows we want to delete
SELECT t.`id`
, t.`group_id`
FROM `setup_like_this` t
WHERE t.`status` = 'available'
AND t.`id`
> ( SELECT s.`id`
FROM `setup_like_this` s
WHERE s.`status` = 'available'
AND s.`group_id` = t.`group_id`
ORDER
BY s.`id`
LIMIT 1,1
)
) r
ON r.id = q.id
select id, code, group_id, status
from (
select id, code, group_id, status
, ROW_NUMBER() OVER (
PARTITION BY group_id
ORDER BY id DESC) as rownum
from a
) q
where rownum < 3
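The ROW_NUMBER() listing shows which rows to keep; turning it into a delete means removing the rows numbered 3 and above. This is a sketch for SQLite 3.25+ (window-function support) via Python's sqlite3 module. It has not been checked against MySQL 8.0's error 1093 rules. The table name t and function name keep_two_newest are hypothetical.

```python
import sqlite3

SAMPLE_ROWS = [
    (1, "abcd1", "group_1", "available"), (2, "abcd2", "group_1", "available"),
    (3, "adsd3", "group_1", "available"), (4, "dfgd4", "group_1", "available"),
    (5, "vfcd5", "group_1", "available"), (6, "bgcd6", "group_2", "available"),
    (7, "abcd7", "group_2", "available"), (8, "ahgf8", "group_2", "available"),
    (9, "dfgd9", "group_2", "available"), (10, "qwer6", "group_2", "available"),
]


def keep_two_newest(rows):
    """Keep the 2 highest-id available rows per group; return remaining ids."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE t (id INTEGER PRIMARY KEY, code TEXT, group_id TEXT, status TEXT)"
        )
        conn.executemany("INSERT INTO t VALUES (?, ?, ?, ?)", rows)
        # The uncorrelated IN subquery numbers the available rows per group
        # (newest first) and yields the ids ranked 3rd or lower.
        conn.execute(
            """
            DELETE FROM t
            WHERE id IN (
              SELECT id FROM (
                SELECT id,
                       ROW_NUMBER() OVER (PARTITION BY group_id ORDER BY id DESC) AS rownum
                FROM t
                WHERE status = 'available'
              ) AS q
              WHERE rownum > 2
            )
            """
        )
        return [row[0] for row in conn.execute("SELECT id FROM t ORDER BY id")]
    finally:
        conn.close()


print(keep_two_newest(SAMPLE_ROWS))  # [4, 5, 9, 10]
```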

MySql Sum and Count for simple table

Could you help me calculate SUM and COUNT on a simple table?
I have a simple table 'test':
id name value
1 a 4
2 a 5
3 b 3
4 b 7
5 b 1
I need to calculate SUM and COUNT for "a" and "b". I tried this SQL query:
SELECT name, SUM( value ) AS val, COUNT( * ) AS count FROM `test`
Result:
name val count
a 20 5
But should be
name val count
a 9 2
b 11 3
Could you help me with the correct SQL query?
Add GROUP BY. That will cause the query to return a count and sum per group you defined (in this case, per name).
Without GROUP BY you just get the totals and any one of the names (in your case 'a', but it could just as well have been 'b').
SELECT name, SUM( value ) AS val, COUNT( * ) AS count
FROM `test`
GROUP BY name
You need GROUP BY:
select
name,
sum(value) as value,
count(*) as `count`
from test group by name ;
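To confirm the per-name totals, here is a short sketch that loads the 'test' table into an in-memory SQLite database through Python's sqlite3 module and runs the GROUP BY query. SQLite instead of MySQL, the alias cnt (used instead of count) and the function name sum_and_count are assumptions for illustration.

```python
import sqlite3

# (id, name, value) rows from the question.
TEST_ROWS = [(1, "a", 4), (2, "a", 5), (3, "b", 3), (4, "b", 7), (5, "b", 1)]


def sum_and_count(rows):
    """Return (name, SUM(value), COUNT(*)) per name, ordered by name."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE test (id INTEGER PRIMARY KEY, name TEXT, value INTEGER)")
        conn.executemany("INSERT INTO test VALUES (?, ?, ?)", rows)
        # GROUP BY name gives one output row per distinct name.
        return conn.execute(
            "SELECT name, SUM(value) AS val, COUNT(*) AS cnt "
            "FROM test GROUP BY name ORDER BY name"
        ).fetchall()
    finally:
        conn.close()


print(sum_and_count(TEST_ROWS))  # [('a', 9, 2), ('b', 11, 3)]
```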

Fetch 2nd Highest value from MySQL DB with GROUP BY

I have a table tbl_patient and I want to fetch the last 2 visits of each patient in order to compare whether the patient's condition is improving or degrading.
tbl_patient
id | patient_ID | visit_ID | patient_result
1 | 1 | 1 | 5
2 | 2 | 1 | 6
3 | 2 | 3 | 7
4 | 1 | 2 | 3
5 | 2 | 3 | 2
6 | 1 | 3 | 9
I tried the query below to fetch the last visit of each patient:
SELECT MAX(id), patient_result FROM `tbl_patient` GROUP BY `patient_ID`
Now I want to fetch the second-to-last visit of each patient with the query below, but it gives me an error
(#1242 - Subquery returns more than 1 row)
SELECT id, patient_result FROM `tbl_patient` WHERE id <(SELECT MAX(id) FROM `tbl_patient` GROUP BY `patient_ID`) GROUP BY `patient_ID`
Where am I going wrong?
select p1.patient_id, p2.maxid id1, max(p1.id) id2
from tbl_patient p1
join (select patient_id, max(id) maxid
from tbl_patient
group by patient_id) p2
on p1.patient_id = p2.patient_id and p1.id < p2.maxid
group by p1.patient_id
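id1 is the ID of the last visit and id2 is the ID of the second-to-last visit.
Your first query doesn't get the last visits: patient_result is neither aggregated nor in the GROUP BY, so MySQL returns it from an arbitrary row of each group, which here gives 5 and 6 instead of 9 and 2.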
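Since the goal is to compare the last two results, the id1/id2 pair can be joined back to the table to fetch both patient_result values. This extension is a hedged sketch, run in SQLite through Python's sqlite3 module rather than MySQL; the aliases pair, cur and prv and the function name last_two_results are hypothetical.

```python
import sqlite3

# (id, patient_ID, visit_ID, patient_result) rows from the question.
PATIENT_ROWS = [(1, 1, 1, 5), (2, 2, 1, 6), (3, 2, 3, 7), (4, 1, 2, 3), (5, 2, 3, 2), (6, 1, 3, 9)]


def last_two_results(rows):
    """Return (patient_ID, last patient_result, previous patient_result)."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE tbl_patient (id INTEGER PRIMARY KEY, patient_ID INTEGER, "
            "visit_ID INTEGER, patient_result INTEGER)"
        )
        conn.executemany("INSERT INTO tbl_patient VALUES (?, ?, ?, ?)", rows)
        return conn.execute(
            """
            SELECT pair.patient_ID, cur.patient_result, prv.patient_result
            FROM (
              -- id1 = latest id, id2 = highest id below it (as in the answer)
              SELECT p1.patient_ID, p2.maxid AS id1, MAX(p1.id) AS id2
              FROM tbl_patient p1
              JOIN (SELECT patient_ID, MAX(id) AS maxid
                    FROM tbl_patient GROUP BY patient_ID) p2
                ON p1.patient_ID = p2.patient_ID AND p1.id < p2.maxid
              GROUP BY p1.patient_ID, p2.maxid
            ) AS pair
            JOIN tbl_patient cur ON cur.id = pair.id1
            JOIN tbl_patient prv ON prv.id = pair.id2
            ORDER BY pair.patient_ID
            """
        ).fetchall()
    finally:
        conn.close()


print(last_two_results(PATIENT_ROWS))  # [(1, 9, 3), (2, 2, 7)]
```

Patients with a single visit drop out, because the inner join requires an id below the maximum.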
You can try this query:
SELECT patient_ID,visit_ID,patient_result
FROM tbl_patient
where id in (
select max(id)
from tbl_patient
GROUP BY patient_ID)
union
SELECT patient_ID,visit_ID,patient_result
FROM tbl_patient
where id in (
select max(id)
from tbl_patient
where id not in (
select max(id)
from tbl_patient
GROUP BY patient_ID)
GROUP BY patient_ID)
order by 1,2
SELECT id, patient_result FROM `tbl_patient` t1
JOIN (SELECT MAX(id) as max, patient_ID FROM `tbl_patient` GROUP BY `patient_ID`) t2
ON t1.patient_ID = t2.patient_ID
WHERE id <max GROUP BY t1.`patient_ID`
There are a couple of approaches to getting the specified resultset returned in a single SQL statement.
Unfortunately, most of those approaches yield rather unwieldy statements.
The more elegant-looking statements tend to come with poor (or unbearable) performance on large sets, and the statements that tend to perform better look more inelegant.
Three of the most common approaches make use of:
correlated subquery
inequality join (nearly a Cartesian product)
two passes over the data
Here's an approach that uses two passes over the data, using MySQL user variables, which basically emulates the analytic ROW_NUMBER() OVER (PARTITION BY ...) function available in other DBMSs (and in MySQL 8.0 and later):
SELECT t.id
, t.patient_id
, t.visit_id
, t.patient_result
FROM (
SELECT p.id
, p.patient_id
, p.visit_id
, p.patient_result
, @rn := if(@prev_patient_id = patient_id, @rn + 1, 1) AS rn
, @prev_patient_id := patient_id AS prev_patient_id
FROM tbl_patient p
JOIN (SELECT @rn := 0, @prev_patient_id := NULL) i
ORDER BY p.patient_id DESC, p.id DESC
) t
WHERE t.rn <= 2
Note that this involves an inline view, which means there's going to be a pass over all the data in the table to create a "derived table". Then the outer query runs against the derived table, so this is essentially two passes over the data.
This query can be tweaked a bit to improve performance, by eliminating the duplicated value of the patient_id column returned by the inline view. But I show it as above, so we can better understand what is happening.
This approach can be rather expensive on large sets, but is generally MUCH more efficient than some of the other approaches.
Note also that this query will return a row for a patient_id even if only one id value exists for that patient; it does not restrict the result to patients that have at least two rows.
It's also possible to get an equivalent resultset with a correlated subquery:
SELECT t.id
, t.patient_id
, t.visit_id
, t.patient_result
FROM tbl_patient t
WHERE ( SELECT COUNT(1) AS cnt
FROM tbl_patient p
WHERE p.patient_id = t.patient_id
AND p.id >= t.id
) <= 2
ORDER BY t.patient_id ASC, t.id ASC
Note that this is making use of a "dependent subquery", which basically means that for each row returned from t, MySQL is effectively running another query against the database. So, this will tend to be very expensive (in terms of elapsed time) on large sets.
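For a small table, the correlated "at most 2 rows at or above my id" test is easy to check. This sketch runs it in SQLite via Python's sqlite3 module (an assumption; the table name follows the question's tbl_patient), and the function name last_two_visits is hypothetical.

```python
import sqlite3

# (id, patient_ID, visit_ID, patient_result) rows from the question.
PATIENT_ROWS = [(1, 1, 1, 5), (2, 2, 1, 6), (3, 2, 3, 7), (4, 1, 2, 3), (5, 2, 3, 2), (6, 1, 3, 9)]


def last_two_visits(rows):
    """Rows with at most 2 same-patient rows (themselves included) at or above their id."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE tbl_patient (id INTEGER PRIMARY KEY, patient_ID INTEGER, "
            "visit_ID INTEGER, patient_result INTEGER)"
        )
        conn.executemany("INSERT INTO tbl_patient VALUES (?, ?, ?, ?)", rows)
        # The dependent subquery runs once per outer row t.
        return conn.execute(
            """
            SELECT t.id, t.patient_ID, t.visit_ID, t.patient_result
            FROM tbl_patient t
            WHERE (SELECT COUNT(1)
                   FROM tbl_patient p
                   WHERE p.patient_ID = t.patient_ID
                     AND p.id >= t.id) <= 2
            ORDER BY t.patient_ID ASC, t.id ASC
            """
        ).fetchall()
    finally:
        conn.close()


print(last_two_visits(PATIENT_ROWS))
# [(4, 1, 2, 3), (6, 1, 3, 9), (3, 2, 3, 7), (5, 2, 3, 2)]
```

As with the user-variable version, a patient with a single visit still returns that one row.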
As another approach, if there are relatively few id values for each patient, you might be able to get by with an inequality join:
SELECT t.id
, t.patient_id
, t.visit_id
, t.patient_result
FROM tbl_patient t
LEFT
JOIN tbl_patient p
ON p.patient_id = t.patient_id
AND t.id < p.id
GROUP
BY t.id
, t.patient_id
, t.visit_id
, t.patient_result
-- count only real later rows: the newest row's NULL-extended match counts 0
HAVING COUNT(p.id) < 2
Note that this will create a nearly Cartesian product for each patient. For a limited number of id values per patient this won't be too bad, but if a patient has hundreds of id values, the intermediate result can be huge, on the order of O(n^2).
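With a LEFT JOIN, the newest row of each patient still produces one NULL-extended row, so the choice of COUNT expression in HAVING matters. This sketch compares COUNT(1) <= 2 with COUNT(p.id) < 2 on the sample data. It runs in SQLite via Python's sqlite3 module, which is an assumption rather than a MySQL test. The function name inequality_join_ids is hypothetical, and the HAVING text is interpolated, so it should only ever be a trusted constant.

```python
import sqlite3

# (id, patient_ID, visit_ID, patient_result) rows from the question.
PATIENT_ROWS = [(1, 1, 1, 5), (2, 2, 1, 6), (3, 2, 3, 7), (4, 1, 2, 3), (5, 2, 3, 2), (6, 1, 3, 9)]


def inequality_join_ids(rows, having):
    """Ids kept by the inequality self-join for a given (trusted) HAVING condition."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE tbl_patient (id INTEGER PRIMARY KEY, patient_ID INTEGER, "
            "visit_ID INTEGER, patient_result INTEGER)"
        )
        conn.executemany("INSERT INTO tbl_patient VALUES (?, ?, ?, ?)", rows)
        # Each row t is paired with every later row p of the same patient.
        sql = f"""
            SELECT t.id
            FROM tbl_patient t
            LEFT JOIN tbl_patient p
              ON p.patient_ID = t.patient_ID
             AND t.id < p.id
            GROUP BY t.id
            HAVING {having}
            ORDER BY t.id
        """
        return [row[0] for row in conn.execute(sql)]
    finally:
        conn.close()


# COUNT(1) also counts the NULL-extended row, so every row passes here.
print(inequality_join_ids(PATIENT_ROWS, "COUNT(1) <= 2"))   # [1, 2, 3, 4, 5, 6]
# COUNT(p.id) counts only real later visits: keep rows with fewer than 2.
print(inequality_join_ids(PATIENT_ROWS, "COUNT(p.id) < 2"))  # [3, 4, 5, 6]
```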
Try this:
SELECT id, patient_result
FROM tbl_patient AS tp
WHERE id = (SELECT MAX(id) FROM tbl_patient AS tp_prev
WHERE tp_prev.patient_ID = tp.patient_ID
AND tp_prev.id < (SELECT MAX(id) FROM tbl_patient AS tp_max WHERE tp_max.patient_ID = tp.patient_ID))
Why not simply use...
GROUP BY `patient_ID` DESC LIMIT 2
... and do the rest in the next step?