The solution to the topic is evading me.
I have a table that looks like this (besides other fields that are irrelevant to my question):
NAME,CARDNUMBER,MEMBERTYPE
Now, I want a view that shows the rows where the combination of CARDNUMBER and MEMBERTYPE is duplicated. Both of these fields are integers; NAME is a VARCHAR. NAME is not unique, and rows with a duplicate (CARDNUMBER, MEMBERTYPE) pair should show up even when they have the same name.
For example, if the table contained:
JOHN | 324 | 2
PETER | 642 | 1
MARK | 324 | 2
DIANNA | 753 | 2
SPIDERMAN | 642 | 1
JAMIE FOXX | 235 | 6
I would want:
JOHN | 324 | 2
MARK | 324 | 2
PETER | 642 | 1
SPIDERMAN | 642 | 1
This could simply be sorted by CARDNUMBER to make it readable for humans.
What's the most efficient way of doing this?
I believe a JOIN will be more efficient than EXISTS:
SELECT t1.* FROM myTable t1
JOIN (
SELECT cardnumber, membertype
FROM myTable
GROUP BY cardnumber, membertype
HAVING COUNT(*) > 1
) t2 ON t1.cardnumber = t2.cardnumber AND t1.membertype = t2.membertype
Query plan: http://www.sqlfiddle.com/#!2/0abe3/1
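To sanity-check the JOIN against the sample data, here is a minimal sketch that runs it through Python's built-in sqlite3 module. SQLite is only an assumed stand-in for MySQL, and the ORDER BY (for the human-readable sorting the question asks for) is an addition to the answer's query.

```python
import sqlite3

# In-memory SQLite database standing in for the MySQL table from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE myTable (name TEXT, cardnumber INTEGER, membertype INTEGER)")
conn.executemany(
    "INSERT INTO myTable VALUES (?, ?, ?)",
    [
        ("JOHN", 324, 2),
        ("PETER", 642, 1),
        ("MARK", 324, 2),
        ("DIANNA", 753, 2),
        ("SPIDERMAN", 642, 1),
        ("JAMIE FOXX", 235, 6),
    ],
)

# Join each row to the (cardnumber, membertype) pairs that occur more than once.
dupes = conn.execute("""
    SELECT t1.name, t1.cardnumber, t1.membertype
    FROM myTable t1
    JOIN (
        SELECT cardnumber, membertype
        FROM myTable
        GROUP BY cardnumber, membertype
        HAVING COUNT(*) > 1
    ) t2 ON t1.cardnumber = t2.cardnumber AND t1.membertype = t2.membertype
    ORDER BY t1.cardnumber, t1.name
""").fetchall()

for row in dupes:
    print(row)
```

The four rows printed are exactly the ones listed in the question's expected output.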
You can use exists for this:
select *
from yourtable y
where exists (
select 1
from yourtable y2
where y.name <> y2.name
and y.cardnumber = y2.cardnumber
and y.membertype = y2.membertype)
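The `y.name <> y2.name` condition matters because the question says the same name can appear twice with the same card. A minimal sketch, assuming SQLite via Python's sqlite3 as a stand-in for MySQL and a hypothetical data set in which JOHN/324/2 appears twice, compares the EXISTS query with a GROUP BY ... HAVING join:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE yourtable (name TEXT, cardnumber INTEGER, membertype INTEGER)")
# Hypothetical rows: JOHN appears twice with the same card and member type.
conn.executemany(
    "INSERT INTO yourtable VALUES (?, ?, ?)",
    [("JOHN", 324, 2), ("JOHN", 324, 2), ("PETER", 642, 1), ("SPIDERMAN", 642, 1)],
)

# EXISTS only finds a partner row with a *different* name, so both JOHN rows are missed.
exists_rows = conn.execute("""
    SELECT * FROM yourtable y
    WHERE EXISTS (
        SELECT 1 FROM yourtable y2
        WHERE y.name <> y2.name
          AND y.cardnumber = y2.cardnumber
          AND y.membertype = y2.membertype)
    ORDER BY name
""").fetchall()

# Counting rows per (cardnumber, membertype) does not depend on the name at all.
group_rows = conn.execute("""
    SELECT t1.* FROM yourtable t1
    JOIN (SELECT cardnumber, membertype FROM yourtable
          GROUP BY cardnumber, membertype HAVING COUNT(*) > 1) t2
      ON t1.cardnumber = t2.cardnumber AND t1.membertype = t2.membertype
    ORDER BY name
""").fetchall()

print(exists_rows)
print(group_rows)
```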
Since you mentioned that names can be duplicated, and that a duplicate name may still be a different person who should show up in the result set, we need to use GROUP BY ... HAVING COUNT(*) > 1 to reliably detect duplicates. Then join this back to the main table to get your full result list.
Also, since from your comments it sounds like you are wrapping this in a view, you'll need to separate out the subquery: MySQL versions before 5.7.7 do not allow a subquery in the FROM clause of a view.
CREATE VIEW DUP_CARDS
AS
SELECT CARDNUMBER, MEMBERTYPE
FROM mytable
GROUP BY CARDNUMBER, MEMBERTYPE
HAVING COUNT(*) > 1;
-- Alias written consistently as t1: MySQL table aliases are case-sensitive on Unix by default.
CREATE VIEW DUP_ROWS
AS
SELECT t1.*
FROM mytable AS t1
INNER JOIN DUP_CARDS AS DUP
ON (t1.CARDNUMBER = DUP.CARDNUMBER AND t1.MEMBERTYPE = DUP.MEMBERTYPE);
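A minimal sketch of the two-view setup, assuming SQLite via Python's sqlite3 as a stand-in for MySQL. SQLite does not share the old MySQL restriction on subqueries in views, so this only checks that the two views compose as intended on the question's sample rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (NAME TEXT, CARDNUMBER INTEGER, MEMBERTYPE INTEGER)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?)",
    [("JOHN", 324, 2), ("PETER", 642, 1), ("MARK", 324, 2),
     ("DIANNA", 753, 2), ("SPIDERMAN", 642, 1), ("JAMIE FOXX", 235, 6)],
)

# First view: the duplicated (CARDNUMBER, MEMBERTYPE) pairs.
# Second view: every row of mytable that matches one of those pairs.
conn.executescript("""
    CREATE VIEW DUP_CARDS AS
    SELECT CARDNUMBER, MEMBERTYPE
    FROM mytable
    GROUP BY CARDNUMBER, MEMBERTYPE
    HAVING COUNT(*) > 1;

    CREATE VIEW DUP_ROWS AS
    SELECT t1.*
    FROM mytable AS t1
    INNER JOIN DUP_CARDS AS dup
      ON t1.CARDNUMBER = dup.CARDNUMBER AND t1.MEMBERTYPE = dup.MEMBERTYPE;
""")

cards = conn.execute("SELECT * FROM DUP_CARDS ORDER BY CARDNUMBER").fetchall()
view_rows = conn.execute("SELECT * FROM DUP_ROWS ORDER BY CARDNUMBER, NAME").fetchall()
print(cards)
print(view_rows)
```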
If you just need to know which value combinations of the 3 fields are not unique, you could simply do:
SELECT concat(NAME, "|", CARDNUMBER, "|", MEMBERTYPE) AS myIdentifier,
COUNT(*) AS count
FROM myTable
GROUP BY myIdentifier
HAVING count > 1
This gives you every combination of NAME, CARDNUMBER and MEMBERTYPE that is used more than once, with a count of how many times it occurs. It doesn't give you back the entries themselves; you would have to fetch those in a second step.
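A minimal sketch of both steps, assuming SQLite via Python's sqlite3 as a stand-in for MySQL; SQLite's `||` operator plays the role of CONCAT(), and the rows are hypothetical (JOHN/324/2 appears twice).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE myTable (NAME TEXT, CARDNUMBER INTEGER, MEMBERTYPE INTEGER)")
# Hypothetical rows: only JOHN/324/2 is repeated with all three fields equal.
conn.executemany(
    "INSERT INTO myTable VALUES (?, ?, ?)",
    [("JOHN", 324, 2), ("JOHN", 324, 2), ("MARK", 324, 2), ("PETER", 642, 1)],
)

# Step 1: count each NAME|CARDNUMBER|MEMBERTYPE identifier.
combos = conn.execute("""
    SELECT NAME || '|' || CARDNUMBER || '|' || MEMBERTYPE AS myIdentifier,
           COUNT(*) AS cnt
    FROM myTable
    GROUP BY myIdentifier
    HAVING COUNT(*) > 1
""").fetchall()

# Step 2: fetch the actual entries behind the repeated identifiers.
entries = conn.execute("""
    SELECT * FROM myTable
    WHERE NAME || '|' || CARDNUMBER || '|' || MEMBERTYPE IN (
        SELECT NAME || '|' || CARDNUMBER || '|' || MEMBERTYPE
        FROM myTable
        GROUP BY NAME, CARDNUMBER, MEMBERTYPE
        HAVING COUNT(*) > 1)
""").fetchall()

print(combos)
print(entries)
```

Note that MARK/324/2 is not reported here, because this grouping includes NAME.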
Related
I have a somewhat strange requirement in MySQL.
I need to select all records from a table whose last 6 characters are not unique.
For example, if I have this table:
I should select rows 1 and 3, since the last 6 letters of these values are not unique.
Do you have any idea how to implement this?
Thank you for your help.
I use a JOIN against a subquery that counts the occurrences of each distinct ending of the last n characters (2 in my example):
SELECT t.*
FROM t
JOIN (SELECT RIGHT(value, 2) r, COUNT(RIGHT(value, 2)) rc
FROM t
GROUP BY r) c ON c.r = RIGHT(value, 2) AND c.rc > 1
Something like that should work:
SELECT `mytable`.*
FROM (SELECT RIGHT(`value`, 6) AS `ending` FROM `mytable` GROUP BY `ending` HAVING COUNT(*) > 1) `grouped`
INNER JOIN `mytable` ON `grouped`.`ending` = RIGHT(`value`, 6)
But it is not fast: this requires a full table scan. Maybe you should rethink your problem.
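A minimal sketch of this grouped-endings JOIN, assuming SQLite via Python's sqlite3 as a stand-in for MySQL and reusing the five test rows that appear later in this thread. SQLite has no RIGHT(), so `substr(value, -6)` (which counts from the end of the string) is swapped in.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER PRIMARY KEY, value TEXT)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?)",
    [(1, "abcdePuzzle"), (2, "aaaaaaaaaaaaaa"), (3, "abcPuzzle"),
     (4, "aaaaaaaaaaaaaaB"), (5, "Hello")],
)

# Build the set of endings that occur more than once, then join the rows back to it.
rows = conn.execute("""
    SELECT mytable.*
    FROM (SELECT substr(value, -6) AS ending
          FROM mytable GROUP BY ending HAVING COUNT(*) > 1) grouped
    INNER JOIN mytable ON grouped.ending = substr(value, -6)
    ORDER BY id
""").fetchall()
print(rows)
```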
EDIT: I had misunderstood the question previously, and I'd rather not change anything in my initial answer. But if my previous answer is not accepted in some environments and might mislead people, I have to correct it anyway.
SELECT GROUP_CONCAT(id),RIGHT(VALUE,6)
FROM table1
GROUP BY RIGHT(VALUE,6) HAVING COUNT(RIGHT(VALUE,6)) > 1;
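A minimal sketch of the GROUP_CONCAT variant, assuming SQLite via Python's sqlite3 as a stand-in for MySQL, with `substr(value, -6)` in place of RIGHT(VALUE,6). SQLite does not guarantee the order of ids inside GROUP_CONCAT, so the sketch sorts them after splitting.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (id INTEGER PRIMARY KEY, value TEXT)")
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?)",
    [(1, "abcdePuzzle"), (2, "aaaaaaaaaaaaaa"), (3, "abcPuzzle"),
     (4, "aaaaaaaaaaaaaaB"), (5, "Hello")],
)

# One result row per repeated ending, with the matching ids packed into a string.
groups = conn.execute("""
    SELECT GROUP_CONCAT(id), substr(value, -6)
    FROM table1
    GROUP BY substr(value, -6)
    HAVING COUNT(*) > 1
""").fetchall()

for ids, ending in groups:
    print(ending, sorted(int(i) for i in ids.split(",")))
```

This returns one row per ending rather than the original rows, so it suits a report better than a filter.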
Since this question already has good answers, I wrote my query in a slightly different way. And I've tested it with sql_mode=ONLY_FULL_GROUP_BY. ;)
This is what you need: a subquery to get the duplicated RIGHT(value,6) values, and a main query to get the rows matching that condition.
SELECT t.* FROM t WHERE RIGHT(`value`,6) IN (
SELECT RIGHT(`value`,6)
FROM t
GROUP BY RIGHT(`value`,6) HAVING COUNT(*) > 1);
UPDATE
This version avoids the MySQL error in case you have sql_mode=only_full_group_by:
SELECT t.* FROM t WHERE RIGHT(`value`,6) IN (
SELECT DISTINCT right_value FROM (
SELECT RIGHT(`value`,6) AS right_value,
COUNT(*) AS TOT
FROM t
GROUP BY RIGHT(`value`,6) HAVING COUNT(*) > 1) t2
)
This might be faster code, as there is no counting involved.
Live test: https://www.db-fiddle.com/f/dBdH9tZd4W6Eac1TCRXZ8U/0
select *
from tbl outr
where not exists
(
select 1 / 0 -- just a proof that this is not evaluated. won't cause division by zero
from tbl inr
where
inr.id <> outr.id
and right(inr.value, 6) = right(outr.value, 6)
)
Output:
| id | value |
| --- | --------------- |
| 2 | aaaaaaaaaaaaaa |
| 4 | aaaaaaaaaaaaaaB |
| 5 | Hello |
The logic is to check the other rows, i.e. those whose id differs from the outer row's id. If any of those rows has the same last 6 characters as the outer row, the outer row is not shown.
UPDATE
I misunderstood the OP's intent; it's the reverse. Just reverse the logic and use EXISTS instead of NOT EXISTS:
Live test: https://www.db-fiddle.com/f/dBdH9tZd4W6Eac1TCRXZ8U/3
select *
from tbl outr
where exists
(
select 1 / 0 -- just a proof that this is not evaluated. won't cause division by zero
from tbl inr
where
inr.id <> outr.id
and right(inr.value, 6) = right(outr.value, 6)
)
Output:
| id | value |
| --- | ----------- |
| 1 | abcdePuzzle |
| 3 | abcPuzzle |
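A minimal sketch of the correlated subquery in both directions, assuming SQLite via Python's sqlite3 as a stand-in for MySQL and `substr(value, -6)` in place of RIGHT(). It uses `SELECT 1` rather than `1 / 0`, because SQLite returns NULL for division by zero, so that trick would prove nothing there. It also shows that EXISTS and NOT EXISTS split the table into two complementary sets.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (id INTEGER PRIMARY KEY, value VARCHAR(20))")
conn.executemany(
    "INSERT INTO tbl VALUES (?, ?)",
    [(1, "abcdePuzzle"), (2, "aaaaaaaaaaaaaa"), (3, "abcPuzzle"),
     (4, "aaaaaaaaaaaaaaB"), (5, "Hello")],
)

# {op} is filled with EXISTS or NOT EXISTS; everything else is identical.
CORRELATED = """
    SELECT outr.id FROM tbl outr
    WHERE {op} (
        SELECT 1 FROM tbl inr
        WHERE inr.id <> outr.id
          AND substr(inr.value, -6) = substr(outr.value, -6)
    )
    ORDER BY outr.id
"""
shared = [row[0] for row in conn.execute(CORRELATED.format(op="EXISTS"))]
unique = [row[0] for row in conn.execute(CORRELATED.format(op="NOT EXISTS"))]
print(shared, unique)
```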
UPDATE
I tested the queries. The performance of my answer (the correlated EXISTS approach) is not optimal. I'm keeping it anyway, so others will know which approach to avoid :)
GhostGambler's answer is faster than the correlated EXISTS approach. For 5 million rows, it takes only 2.762 seconds:
explain analyze
SELECT
tbl.*
FROM
(
SELECT
RIGHT(value, 6) AS ending
FROM
tbl
GROUP BY
ending
HAVING
COUNT(*) > 1
) grouped
JOIN tbl ON grouped.ending = RIGHT(value, 6)
My answer (correlated EXISTS) takes 4.08 seconds:
explain analyze
select *
from tbl outr
where exists
(
select 1 / 0 -- just a proof that this is not evaluated. won't cause division by zero
from tbl inr
where
inr.id <> outr.id
and right(inr.value, 6) = right(outr.value, 6)
)
The straightforward query is the fastest: no join, just a plain IN query, at 2.722 seconds. It performs practically the same as the JOIN approach, since both have the same execution plan. This is kiks73's answer; I just don't know why he made his second version unnecessarily complicated.
So choosing between SELECT ... WHERE ... IN and SELECT ... JOIN is just a matter of taste, or of which code you find more readable.
explain analyze
SELECT *
FROM tbl
where right(value, 6) in
(
SELECT
RIGHT(value, 6) AS ending
FROM
tbl
GROUP BY
ending
HAVING
COUNT(*) > 1
)
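Test data used (generate_series is a PostgreSQL function, so these timings were presumably measured on PostgreSQL rather than MySQL):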
CREATE TABLE tbl (
id INTEGER primary key,
value VARCHAR(20)
);
INSERT INTO tbl
(id, value)
VALUES
('1', 'abcdePuzzle'),
('2', 'aaaaaaaaaaaaaa'),
('3', 'abcPuzzle'),
('4', 'aaaaaaaaaaaaaaB'),
('5', 'Hello');
insert into tbl(id, value)
select x.y, 'Puzzle'
from generate_series(6, 5000000) as x(y);
create index ix_tbl__right on tbl(right(value, 6));
Performance without the index, and with the index on tbl(right(value, 6)):
JOIN approach:
Without index: 3.805 seconds
With index: 2.762 seconds
IN approach:
Without index: 3.719 seconds
With index: 2.722 seconds
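To reproduce the comparison on a small scale, here is a minimal sketch assuming SQLite via Python's sqlite3 instead of the database used above. A Python generator replaces generate_series, `substr(value, -6)` replaces right(value, 6) (including in the expression index), and N is a hypothetical 20,000 rows rather than 5 million. It only checks that the JOIN and IN forms return the same rows; the printed timings are not comparable to the figures above.

```python
import sqlite3
import time

N = 20_000  # hypothetical size, far below the 5 million rows benchmarked above

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (id INTEGER PRIMARY KEY, value VARCHAR(20))")
conn.executemany(
    "INSERT INTO tbl (id, value) VALUES (?, ?)",
    [(1, "abcdePuzzle"), (2, "aaaaaaaaaaaaaa"), (3, "abcPuzzle"),
     (4, "aaaaaaaaaaaaaaB"), (5, "Hello")],
)
# Python-side replacement for generate_series(6, N).
conn.executemany(
    "INSERT INTO tbl (id, value) VALUES (?, 'Puzzle')",
    ((i,) for i in range(6, N + 1)),
)
# SQLite expression index, the counterpart of the index on right(value, 6).
conn.execute("CREATE INDEX ix_tbl__right ON tbl (substr(value, -6))")

JOIN_SQL = """
    SELECT tbl.* FROM
      (SELECT substr(value, -6) AS ending FROM tbl GROUP BY ending HAVING COUNT(*) > 1) grouped
    JOIN tbl ON grouped.ending = substr(value, -6)
"""
IN_SQL = """
    SELECT * FROM tbl
    WHERE substr(value, -6) IN
      (SELECT substr(value, -6) AS ending FROM tbl GROUP BY ending HAVING COUNT(*) > 1)
"""


def run(sql):
    """Return the sorted ids produced by sql and the elapsed wall-clock time."""
    start = time.perf_counter()
    ids = sorted(row[0] for row in conn.execute(sql))
    return ids, time.perf_counter() - start


join_ids, join_secs = run(JOIN_SQL)
in_ids, in_secs = run(IN_SQL)
print(f"JOIN: {len(join_ids)} rows in {join_secs:.3f}s")
print(f"IN:   {len(in_ids)} rows in {in_secs:.3f}s")
```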
Slightly neater code (if you are using MySQL 8.0, which supports window functions). I can't guarantee the performance, though.
Live test: https://www.db-fiddle.com/f/dBdH9tZd4W6Eac1TCRXZ8U/1
select x.*
from
(
select
*,
count(*) over(partition by right(value, 6)) as unique_count
from tbl
) as x
where x.unique_count = 1
Output:
| id | value | unique_count |
| --- | --------------- | ------------ |
| 2 | aaaaaaaaaaaaaa | 1 |
| 4 | aaaaaaaaaaaaaaB | 1 |
| 5 | Hello | 1 |
UPDATE
I misunderstood the OP's intent; it's the reverse. Just change the count condition:
select x.*
from
(
select
*,
count(*) over(partition by right(value, 6)) as unique_count
from tbl
) as x
where x.unique_count > 1
Output:
| id | value | unique_count |
| --- | ----------- | ------------ |
| 1 | abcdePuzzle | 2 |
| 3 | abcPuzzle | 2 |
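A minimal sketch of the window-function version, assuming SQLite 3.25 or later (which added window functions) via Python's sqlite3 as a stand-in for MySQL 8.0, with `substr(value, -6)` in place of right(value, 6):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (id INTEGER PRIMARY KEY, value VARCHAR(20))")
conn.executemany(
    "INSERT INTO tbl VALUES (?, ?)",
    [(1, "abcdePuzzle"), (2, "aaaaaaaaaaaaaa"), (3, "abcPuzzle"),
     (4, "aaaaaaaaaaaaaaB"), (5, "Hello")],
)

# COUNT(*) OVER (PARTITION BY ...) attaches the size of each ending's group to every row,
# so no separate GROUP BY subquery or join is needed.
rows = conn.execute("""
    SELECT x.*
    FROM (
        SELECT *, COUNT(*) OVER (PARTITION BY substr(value, -6)) AS unique_count
        FROM tbl
    ) AS x
    WHERE x.unique_count > 1
    ORDER BY x.id
""").fetchall()
print(rows)
```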
I am trying to get distinct results from the following table:
id | name | created_on
1 | xyz | 2015-07-04 09:45:14
1 | xyz | 2015-07-04 10:40:59
2 | abc | 2015-07-05 10:40:59
I want each distinct id with its latest created_on, i.e. the following result:
1 | xyz | 2015-07-04 10:40:59
2 | abc | 2015-07-05 10:40:59
How can I get the above result with an SQL query?
Try this:
Select id, name, max(created_on) as created_on from table group by id, name
Try:
select id,max(name), max(created_on) from table_name group by id
Additional Note:
It appears that your table is not normalized: you store the name along with the id in this table. So you may have these two rows at the same time:
id | name | created_on
1 | a | 12-12-12
1 | b | 11-11-11
If that state is not logically possible in your model, you should redesign your database by splitting this table in two: one table holding the id-name relationship, and another holding the id-created_on relationship:
table_1 (id,name)
table_2 (id,created_on)
Now, to get the last created_on for each id:
select id, max(created_on) from table_2 group by id
And if you want to include the name in the query:
select t1.id, t1.name, t2.created_on from table_1 as t1 inner join
(select id, max(created_on) as created_on from table_2 group by id) as t2
on t1.id=t2.id
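A minimal sketch of the two-table design, assuming SQLite via Python's sqlite3 as a stand-in for MySQL and timestamps stored as ISO-formatted TEXT (which sorts chronologically, so MAX() picks the latest). The table names table_1 and table_2 come from the answer; the rows come from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_1 (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE table_2 (id INTEGER, created_on TEXT);
    INSERT INTO table_1 VALUES (1, 'xyz'), (2, 'abc');
    INSERT INTO table_2 VALUES
        (1, '2015-07-04 09:45:14'),
        (1, '2015-07-04 10:40:59'),
        (2, '2015-07-05 10:40:59');
""")

# The GROUP BY id in the derived table yields one latest timestamp per id.
latest = conn.execute("""
    SELECT t1.id, t1.name, t2.created_on
    FROM table_1 AS t1
    INNER JOIN (SELECT id, MAX(created_on) AS created_on
                FROM table_2 GROUP BY id) AS t2
      ON t1.id = t2.id
    ORDER BY t1.id
""").fetchall()
print(latest)
```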
Assuming that id/name is always a pair:
select id, name, max(created_on)
from table
group by id, name;
It is safer to include both in the group by. I also find it misleading to name a column id when it is not unique for the table.
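A minimal sketch of why grouping by both columns is safer, assuming SQLite via Python's sqlite3 as a stand-in for MySQL and a hypothetical table `messy` holding the two inconsistent rows from the normalization note above. On consistent data both forms agree; on inconsistent data, MAX(name) and MAX(created_on) can come from different rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_name (id INTEGER, name TEXT, created_on TEXT)")
conn.executemany(
    "INSERT INTO table_name VALUES (?, ?, ?)",
    [(1, "xyz", "2015-07-04 09:45:14"),
     (1, "xyz", "2015-07-04 10:40:59"),
     (2, "abc", "2015-07-05 10:40:59")],
)
latest = conn.execute(
    "SELECT id, name, MAX(created_on) FROM table_name GROUP BY id, name ORDER BY id"
).fetchall()
print(latest)

# Hypothetical table with one id mapped to two names.
conn.execute("CREATE TABLE messy (id INTEGER, name TEXT, created_on TEXT)")
conn.executemany(
    "INSERT INTO messy VALUES (?, ?, ?)",
    [(1, "a", "12-12-12"), (1, "b", "11-11-11")],
)
by_pair = conn.execute(
    "SELECT id, name, MAX(created_on) FROM messy GROUP BY id, name ORDER BY name"
).fetchall()
by_id = conn.execute(
    "SELECT id, MAX(name), MAX(created_on) FROM messy GROUP BY id"
).fetchall()
print(by_pair)  # one row per (id, name) pair
print(by_id)    # one row, but name 'b' and date '12-12-12' come from different rows
```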
You can use the keyword DISTINCT, like:
SELECT DISTINCT
I have a table like this:
id | val
---------
1 | abc
2 | def
5 | xyz
6 | foo
8 | bar
and a query like
SELECT id, val FROM tab WHERE id IN (1,2,3,4,5)
which returns
id | val
---------
1 | abc
2 | def
5 | xyz
Is there a way to make it return NULLs on missing ids, that is
id | val
---------
1 | abc
2 | def
3 | NULL
4 | NULL
5 | xyz
I guess there should be a tricky LEFT JOIN with the table itself, but I can't wrap my head around it.
EDIT: I see people are thinking I want to "fill the gaps" in a sequence, but actually what I want is to substitute NULL for the missing values from the IN list. For example, this
SELECT id, val FROM tab WHERE id IN (1,100,8,200)
should return
id | val
---------
1 | abc
100 | NULL
8 | bar
200 | NULL
Also, the order doesn't matter much.
EDIT2: Just adding a couple of related links:
How to select multiple rows filled with constants?
Is it possible to have a tableless select with multiple rows?
You could use this trick:
SELECT v.id, t.val
FROM
(SELECT 1 AS id
UNION ALL SELECT 2
UNION ALL SELECT 3
UNION ALL SELECT 4
UNION ALL SELECT 5) v
LEFT JOIN tab t
ON v.id = t.id
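A minimal sketch of this trick for an arbitrary IN list, assuming SQLite via Python's sqlite3 as a stand-in for MySQL. The Python side builds one `SELECT ? AS id` per wanted value and binds the values as parameters; the list (1, 100, 8, 200) is the one from the question's EDIT.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tab (id INTEGER PRIMARY KEY, val TEXT)")
conn.executemany(
    "INSERT INTO tab VALUES (?, ?)",
    [(1, "abc"), (2, "def"), (5, "xyz"), (6, "foo"), (8, "bar")],
)

wanted = [1, 100, 8, 200]
# One "SELECT ? AS id" per wanted value, glued together with UNION ALL.
derived = " UNION ALL ".join(["SELECT ? AS id"] * len(wanted))
rows = conn.execute(
    f"SELECT v.id, t.val FROM ({derived}) v LEFT JOIN tab t ON v.id = t.id",
    wanted,
).fetchall()
print(sorted(rows))
```

Ids missing from tab come back with None (NULL) as their val.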
Yes, you can. But that will be tricky since there are no sequences in MySQL.
I assume you want an arbitrary set of ids, so it's:
SELECT
*
FROM
(SELECT
(two_1.id + two_2.id + two_4.id +
two_8.id + two_16.id) AS id
FROM
(SELECT 0 AS id UNION ALL SELECT 1 AS id) AS two_1
CROSS JOIN (SELECT 0 id UNION ALL SELECT 2 id) AS two_2
CROSS JOIN (SELECT 0 id UNION ALL SELECT 4 id) AS two_4
CROSS JOIN (SELECT 0 id UNION ALL SELECT 8 id) AS two_8
CROSS JOIN (SELECT 0 id UNION ALL SELECT 16 id) AS two_16
) AS sequence
LEFT JOIN
t
ON sequence.id=t.id
WHERE
sequence.id IN (1,2,3,4,5);
It works by combining powers of 2 to generate a consecutive table of numbers (0 to 31 here). Your values are passed in the WHERE clause, so you can substitute any set of values there.
I would recommend handling this in your application, because it will be faster. Doing it in SQL may make sense if you want to reuse this row set elsewhere (e.g. in other queries), but if not, it's a job for your application.
If you need higher values, add more rows to the sequence generator (for example, a two_32 subquery doubles the range to 0 to 63).
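A minimal sketch of the powers-of-two generator, assuming SQLite via Python's sqlite3 as a stand-in for MySQL and a table named t with the question's rows. It first checks that the cross join yields each number from 0 to 31 exactly once, then runs the LEFT JOIN from the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, val TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [(1, "abc"), (2, "def"), (5, "xyz"), (6, "foo"), (8, "bar")],
)

# Each derived table contributes either 0 or one power of two; the cross join of
# five of them produces every sum from 0 to 31 exactly once.
SEQUENCE = """
    SELECT two_1.id + two_2.id + two_4.id + two_8.id + two_16.id AS id
    FROM (SELECT 0 AS id UNION ALL SELECT 1) AS two_1
    CROSS JOIN (SELECT 0 AS id UNION ALL SELECT 2) AS two_2
    CROSS JOIN (SELECT 0 AS id UNION ALL SELECT 4) AS two_4
    CROSS JOIN (SELECT 0 AS id UNION ALL SELECT 8) AS two_8
    CROSS JOIN (SELECT 0 AS id UNION ALL SELECT 16) AS two_16
"""
numbers = sorted(row[0] for row in conn.execute(SEQUENCE))

rows = conn.execute(f"""
    SELECT sequence.id, t.val
    FROM ({SEQUENCE}) AS sequence
    LEFT JOIN t ON sequence.id = t.id
    WHERE sequence.id IN (1, 2, 3, 4, 5)
    ORDER BY sequence.id
""").fetchall()
print(rows)
```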
Not sure if this is possible, but I have a schema like this:
id | user_id | thread_id
1 | 1 | 1
2 | 4 | 1
3 | 1 | 2
4 | 3 | 2
I am trying to retrieve the thread_id of threads that contain both user_id = 1 and user_id = 4. I know that IN (1,4) does not fit my needs, as it is essentially an OR and will pull up record 3 as well, and EXISTS only returns a boolean.
You may use JOIN (that answer already exists) or HAVING, like this:
SELECT
thread_id,
COUNT(1) AS user_count
FROM
t
WHERE
user_id IN (1,4)
GROUP BY
thread_id
HAVING
user_count=2
HAVING fits better when there are many ids, because with JOIN you need one self-join per id. There is a catch, however: the = comparison is only correct if your records are unique per (user_id, thread_id). If a user_id can repeat within a thread, count distinct users instead, i.e. HAVING COUNT(DISTINCT user_id) = 2; a plain >= comparison would also match a thread in which user 1 appears twice and user 4 never appears.
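A minimal sketch of the repeated-user caveat, assuming SQLite via Python's sqlite3 as a stand-in for MySQL and a hypothetical extra row in which user 1 posts twice in thread 2:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, user_id INTEGER, thread_id INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [(1, 1, 1), (2, 4, 1), (3, 1, 2), (4, 3, 2),
     (5, 1, 2)],  # hypothetical extra row: user 1 again in thread 2
)


def threads(having):
    """Return the threads among user_ids 1 and 4 that satisfy the given HAVING condition."""
    sql = f"""
        SELECT thread_id FROM t
        WHERE user_id IN (1, 4)
        GROUP BY thread_id
        HAVING {having}
        ORDER BY thread_id
    """
    return [row[0] for row in conn.execute(sql)]


print(threads("COUNT(*) >= 2"))                # wrongly includes thread 2
print(threads("COUNT(DISTINCT user_id) = 2"))  # only thread 1
```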
Try this with a JOIN. I guess you need an AND condition: the thread must contain both user_id 1 and user_id 4, so join the table to itself on thread_id:
SELECT
t1.thread_id
FROM
t t1
JOIN t t2
ON (t1.thread_id = t2.thread_id)
WHERE t1.user_id = 1
AND t2.user_id = 4
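A minimal sketch of the self-join, assuming SQLite via Python's sqlite3 as a stand-in for MySQL and a table named t with the question's four rows. For contrast, it also runs the same WHERE clause with the join on user_id, which can never match because t1.user_id would have to be both 1 and 4.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, user_id INTEGER, thread_id INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [(1, 1, 1), (2, 4, 1), (3, 1, 2), (4, 3, 2)],
)

# Pair rows from the same thread, then require user 1 on one side and user 4 on the other.
by_thread = conn.execute("""
    SELECT t1.thread_id FROM t t1
    JOIN t t2 ON t1.thread_id = t2.thread_id
    WHERE t1.user_id = 1 AND t2.user_id = 4
""").fetchall()

# Joining on user_id forces t1.user_id = t2.user_id, which contradicts 1 vs 4.
by_user = conn.execute("""
    SELECT t1.thread_id FROM t t1
    JOIN t t2 ON t1.user_id = t2.user_id
    WHERE t1.user_id = 1 AND t2.user_id = 4
""").fetchall()

print(by_thread, by_user)
```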
I have a MySQL table like this:
| id1 | id2 |
| 34567 | 75879 | <---- pair1
| 13245 | 46753 |
| 75879 | 34567 | <---- pair2
| 06898 | 00013 |
with 37 000 entries.
What SQL query can I use to identify duplicate pairs (like pair1 and pair2)?
Thanks
If you want to identify the duplicates and count them at the same time, you could use:
SELECT if(id1 < id2, id1, id2), if (id1 < id2, id2, id1), count(*)
FROM your_table
GROUP BY 1,2
HAVING count(*) > 1
This does not perform a join, which might be faster in the end.
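A minimal sketch of the order-normalizing GROUP BY, assuming SQLite via Python's sqlite3 as a stand-in for MySQL. CASE WHEN is used as a portable equivalent of MySQL's IF(), and the ids are stored as TEXT to keep the leading zeros shown in the question (an assumption about the column type).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE your_table (id1 TEXT, id2 TEXT)")
conn.executemany(
    "INSERT INTO your_table VALUES (?, ?)",
    [("34567", "75879"), ("13245", "46753"), ("75879", "34567"), ("06898", "00013")],
)

# Put the smaller value first so (a, b) and (b, a) land in the same group.
dup_pairs = conn.execute("""
    SELECT CASE WHEN id1 < id2 THEN id1 ELSE id2 END,
           CASE WHEN id1 < id2 THEN id2 ELSE id1 END,
           COUNT(*)
    FROM your_table
    GROUP BY 1, 2
    HAVING COUNT(*) > 1
""").fetchall()
print(dup_pairs)
```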
If you join the table with itself, you can filter out the ones you need (each reversed pair is returned twice, once from each side):
SELECT *
FROM your_table yt1
JOIN your_table yt2
ON yt1.id1 = yt2.id2 AND yt1.id2 = yt2.id1
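A minimal sketch of the reversed-pair self-join, assuming SQLite via Python's sqlite3 as a stand-in for MySQL and TEXT ids as in the question. The second query adds a hypothetical `yt1.id1 < yt1.id2` filter so each reversed pair is listed only once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE your_table (id1 TEXT, id2 TEXT)")
conn.executemany(
    "INSERT INTO your_table VALUES (?, ?)",
    [("34567", "75879"), ("13245", "46753"), ("75879", "34567"), ("06898", "00013")],
)

# Match each row with a row holding the same two values in swapped columns.
both_sides = conn.execute("""
    SELECT yt1.id1, yt1.id2
    FROM your_table yt1
    JOIN your_table yt2 ON yt1.id1 = yt2.id2 AND yt1.id2 = yt2.id1
    ORDER BY yt1.id1
""").fetchall()

# Keeping only the side where id1 < id2 reports each reversed pair once.
once = conn.execute("""
    SELECT yt1.id1, yt1.id2
    FROM your_table yt1
    JOIN your_table yt2 ON yt1.id1 = yt2.id2 AND yt1.id2 = yt2.id1
    WHERE yt1.id1 < yt1.id2
""").fetchall()

print(both_sides)
print(once)
```

A row whose two ids are equal would match itself in this join, so such rows may need separate handling.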
The original post is 1000 years old, but here's another form:
SELECT CONCAT(LEAST(id1, id2), '/', GREATEST(id1, id2)) AS pair, count(*) AS total
FROM your_table
GROUP BY pair HAVING total > 1
ORDER BY total DESC;
May or may not perform as well as the other suggested answers.