Select duplicates while concatenating every one except the first - mysql

I am trying to write a query that selects all of the numbers in my table, but for numbers that have duplicates I want to append something to every occurrence after the first to mark it as a duplicate. I am not sure how to do this.
Here is an example of the table
TableA
ID Number
1 1
2 2
3 2
4 3
5 4
SELECT statement output would be like this.
Number
1
2
2-dup
3
4
Any insight on this would be appreciated.

If your MySQL version doesn't support window functions, you can use a correlated subquery to compute a row number within each Number, then use CASE WHEN to append '-dup' to every row where rn > 1.
create table T (ID int, Number int);
INSERT INTO T VALUES (1,1);
INSERT INTO T VALUES (2,2);
INSERT INTO T VALUES (3,2);
INSERT INTO T VALUES (4,3);
INSERT INTO T VALUES (5,4);
Query 1:
select t1.id,
(CASE WHEN rn > 1 then CONCAT(Number,'-dup') ELSE Number END) Number
from (
SELECT *,(SELECT COUNT(*)
FROM T tt
where tt.Number = t1.Number and tt.id <= t1.id
) rn
FROM T t1
)t1
Results:
| id | Number |
|----|--------|
| 1 | 1 |
| 2 | 2 |
| 3 | 2-dup |
| 4 | 3 |
| 5 | 4 |
If your version supports window functions (MySQL 8.0+), you can use row_number() partitioned by Number to generate the same row number:
select t1.id,
(CASE WHEN rn > 1 then CONCAT(Number,'-dup') ELSE Number END) Number
from (
SELECT *,row_number() over(partition by Number order by id) rn
FROM T t1
)t1

I made a list of all the IDs that weren't dups (left join select) and then compared them to the entire list(case when):
select
case when a.id <> b.min_id then cast(a.Number as varchar(6)) + '-dup' else cast(a.Number as varchar(6)) end as Number
from table_a a
left join (select MIN(b.id) min_id, Number from table_a b group by b.number)b on b.number = a.number
I did this in MS SQL 2016; in MySQL, use CONCAT() in place of the + operator and cast to CHAR rather than varchar.
This populates the table used:
insert into table_a (ID, Number)
select 1,1
union all
select 2,2
union all
select 3,2
union all
select 4,3
union all
select 5,4
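The MIN(id) join above can be tried out in a portable form. The following is a minimal sketch, assuming Python's built-in sqlite3 module as a stand-in for MySQL; the `||` operator and `CAST(... AS TEXT)` are SQLite syntax (MySQL would use CONCAT() and CAST(... AS CHAR)), and the variable names are illustrative only.

```python
import sqlite3

# In-memory SQLite database used only as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_a (ID INTEGER, Number INTEGER);
    INSERT INTO table_a VALUES (1, 1), (2, 2), (3, 2), (4, 3), (5, 4);
""")

# Every row whose ID is not the smallest ID for its Number is a duplicate.
query = """
    SELECT a.ID,
           CASE WHEN a.ID <> b.min_id
                THEN CAST(a.Number AS TEXT) || '-dup'
                ELSE CAST(a.Number AS TEXT)
           END AS Number
    FROM table_a a
    JOIN (SELECT MIN(ID) AS min_id, Number
          FROM table_a
          GROUP BY Number) b
      ON b.Number = a.Number
    ORDER BY a.ID
"""
labeled = conn.execute(query).fetchall()
print(labeled)
```

Because the derived table keeps exactly one row per Number, an inner join is enough here; the left join in the answer above gives the same rows.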

Related

Group overlapping ranges of data in MySQL

Is there an easy way avoiding the usage of cursors to convert this:
+-------+------+-------+
| Group | From | Until |
+-------+------+-------+
| X | 1 | 3 |
+-------+------+-------+
| X | 2 | 4 |
+-------+------+-------+
| Y | 5 | 7 |
+-------+------+-------+
| X | 8 | 10 |
+-------+------+-------+
| Y | 11 | 12 |
+-------+------+-------+
| Y | 12 | 13 |
+-------+------+-------+
Into this:
+-------+------+-------+
| Group | From | Until |
+-------+------+-------+
| X | 1 | 4 |
+-------+------+-------+
| Y | 5 | 7 |
+-------+------+-------+
| X | 8 | 10 |
+-------+------+-------+
| Y | 11 | 13 |
+-------+------+-------+
So far I've tried to assign an ID to each row and GROUP BY that ID, but I can't get any closer without using cursors.
SELECT `Group`, `From`, `Until`
FROM ( SELECT `Group`, `From`, ROW_NUMBER() OVER (PARTITION BY `Group` ORDER BY `From`) rn
FROM test t1
WHERE NOT EXISTS ( SELECT NULL
FROM test t2
WHERE t1.`From` > t2.`From`
AND t1.`From` <= t2.`Until`
AND t1.`Group` = t2.`Group` ) ) t3
JOIN ( SELECT `Group`, `Until`, ROW_NUMBER() OVER (PARTITION BY `Group` ORDER BY `From`) rn
FROM test t1
WHERE NOT EXISTS ( SELECT NULL
FROM test t2
WHERE t1.`Until` >= t2.`From`
AND t1.`Until` < t2.`Until`
AND t1.`Group` = t2.`Group` ) ) t4 USING (`Group`, rn)
fiddle
This should work for any type of overlap (partially overlapping, adjacent, fully included).
It will not work if From and/or Until is NULL.
Could you add an explanation in English? – ysth
The 1st subquery finds the starts of the merged ranges (see the fiddle, where it is executed separately): it looks for a From value in a group that does not fall in the middle or at the end of any other range (an equal start point is allowed).
The 2nd subquery does the same for the Until values of the merged ranges.
Both subqueries additionally number the values they find in ascending order.
The outer query simply joins each range start to its matching end in one row.
If you are using MySQL 8+, you can use row_number() to get the desired result:
SELECT MIN(`FROM`) START,
MAX(`UNTIL`) END,
`GROUP` FROM (
SELECT A.*,
ROW_NUMBER() OVER(ORDER BY `FROM`) RN_FROM,
ROW_NUMBER() OVER(PARTITION BY `GROUP` ORDER BY `UNTIL`) RN_UNTIL
FROM Table_lag A) X
GROUP BY `GROUP`, (RN_FROM - RN_UNTIL)
ORDER BY START;
You can do this with window functions only, using a gaps-and-islands technique.
The idea is to build groups of consecutive records that have the same group and overlapping ranges, using lag() and a window sum(). You can then aggregate the groups:
select grp, min(c_from) c_from, max(c_until) c_until
from (
select
t.*,
sum(lag_c_until < c_from) over(partition by grp order by c_from) mygrp
from (
select
t.*,
lag(c_until, 1, c_until) over(partition by grp order by c_from) lag_c_until
from mytable t
) t
) t
group by grp, mygrp
The column names you chose conflict with SQL keywords (group, from), so I renamed them to grp, c_from and c_until.
Demo on DB Fiddle - with credits to ysth for creating the fiddle in the first place:
grp | c_from | c_until
:-- | -----: | ------:
X | 1 | 4
Y | 5 | 7
X | 8 | 10
Y | 11 | 13
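The same island-building idea can be sketched procedurally. This is a minimal illustration in Python rather than the query above: it compares each range with the running end of the current island (not only the previous row's end), and the function name `merge_ranges` and the tuple layout are assumptions made for the example.

```python
def merge_ranges(rows):
    """Merge overlapping (grp, start, end) ranges within each group.

    A range joins the current island when it belongs to the same group
    and starts at or before the island's current end.
    """
    merged = []
    # Sorting by (grp, start) puts each group's ranges next to each other.
    for grp, start, end in sorted(rows):
        last = merged[-1] if merged else None
        if last and last[0] == grp and start <= last[2]:
            last[2] = max(last[2], end)  # extend the current island
        else:
            merged.append([grp, start, end])  # start a new island
    # Report the islands ordered by their start value.
    return sorted((tuple(r) for r in merged), key=lambda r: r[1])


rows = [("X", 1, 3), ("X", 2, 4), ("Y", 5, 7),
        ("X", 8, 10), ("Y", 11, 12), ("Y", 12, 13)]
print(merge_ranges(rows))
```

With the sample data this yields the four merged ranges from the question. Ranges that merely touch without sharing a value (for example 1-3 and 4-6) are kept separate under this comparison.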
I would use a recursive CTE for this:
with recursive intervals (`Group`, `From`, `Until`) as (
select distinct t1.Group, t1.From, t1.Until
from Table_lag t1
where not exists (
select 1
from Table_lag t2
where t1.Group=t2.Group
and t1.From between t2.From and t2.Until+1
and (t1.From,t1.Until) <> (t2.From,t2.Until)
)
union all
select t1.Group, t1.From, t2.Until
from intervals t1
join Table_lag t2
on t2.Group=t1.Group
and t2.From between t1.From and t1.Until+1
and t2.Until > t1.Until
)
select `Group`, `From`, max(`Until`) as Until
from intervals
group by `Group`, `From`
order by `From`, `Group`;
The anchor expression (select .. where not exists (...)) finds every group and from that won't combine with some earlier from, so it has one row for each row in our eventual output.
Then the recursive query adds rows for the merged intervals for each of those rows.
Finally, grouping by group and from (those are awful column names) gives the biggest interval for each starting group/from.
https://dbfiddle.uk/?rdbms=mysql_8.0&fiddle=9efa508504b80e44b73c952572394b76
Alternatively, you can do it with a straightforward set of joins and subqueries, with no CTE or window functions needed:
select
interval_start_range.grp,
interval_start_range.start,
max(merged.finish) finish
from (
select
interval_start.grp,
interval_start.start,
min(later_interval_start.start) next_start
from (
select distinct t1.grp, t1.start, t1.finish
from Table_lag t1
where not exists (
select 1
from Table_lag t2
where t1.grp=t2.grp
and t1.start between t2.start and t2.finish+1
and (t1.start,t1.finish) <> (t2.start,t2.finish)
)
) interval_start
left join (
select distinct t1.grp, t1.start, t1.finish
from Table_lag t1
where not exists (
select 1
from Table_lag t2
where t1.grp=t2.grp
and t1.start between t2.start and t2.finish+1
and (t1.start,t1.finish) <> (t2.start,t2.finish)
)
) later_interval_start
on interval_start.grp=later_interval_start.grp
and interval_start.start < later_interval_start.start
group by interval_start.grp, interval_start.start
) as interval_start_range
join Table_lag merged
on merged.grp=interval_start_range.grp
and merged.start >= interval_start_range.start
and (interval_start_range.next_start is null or merged.start < interval_start_range.next_start)
group by interval_start_range.grp, interval_start_range.start
order by interval_start_range.start, interval_start_range.grp
(I have renamed the columns here to not need backticks.)
Here there's a select to get all the starts of the intervals we will report, joined to another similar select (you could use a CTE to avoid the redundancy) to find the following start of a reportable interval for the same group (if there is one). That's wrapped in a subquery to get the group, the start value, and the start value of the following reportable interval. Then it just needs to join all the other records that start within that range and pick the maximum ending value.
https://dbfiddle.uk/?rdbms=mysql_5.5&fiddle=151cc933489c299f7beefa99e1959549

Incrementing count ONLY for duplicates in MySQL

Here is my MySQL table. I updated the question by adding an 'id' column to it (as instructed in the comments by others).
id data_id
1 2355
2 2031
3 1232
4 9867
5 2355
6 4562
7 1232
8 2355
I want to add a new column called row_num to assign an incrementing number ONLY for duplicates, as shown below. Order of the results does not matter.
id data_id row_num
3 1232 1
7 1232 2
2 2031 null
1 2355 1
5 2355 2
8 2355 3
6 4562 null
4 9867 null
I followed this answer and came up with the code below. However, it also assigns a count of 1 to non-duplicate values. How can I modify it to add a count only for duplicates?
select data_id,row_num
from (
select data_id,
@row:=if(@prev=data_id,@row,0) + 1 as row_num,
@prev:=data_id
from my_table
)t
If you are running MySQL 8.0, you can do this more efficiently with window functions only:
select
data_id,
case when count(*) over(partition by data_id) > 1
then row_number() over(partition by data_id order by data_id)
end row_num
from mytable
When the window count returns more than 1, you know that the current data_id has duplicates, in which case you can use row_number() to assign the incrementing number.
Note that, in the absence of an ordering column that uniquely identifies each record within the groups sharing the same data_id, it is undefined which record will actually get each number.
I am assuming that id is the column that defines the order on the rows.
In MySQL 8 you can use row_number() to get the number of each data_id and a CASE with EXISTS to exclude the rows which have no duplicate.
SELECT t1.data_id,
CASE
WHEN EXISTS (SELECT *
FROM my_table t2
WHERE t2.data_id = t1.data_id
AND t2.id <> t1.id) THEN
row_number() OVER (PARTITION BY t1.data_id
ORDER BY t1.id)
END row_num
FROM my_table t1;
In older versions you can use a subquery counting the rows with the same data_id but smaller id. With an EXISTS in a HAVING clause you can exclude the rows that have no duplicate.
SELECT t1.data_id,
(SELECT count(*)
FROM my_table t2
WHERE t2.data_id = t1.data_id
AND t2.id < t1.id
HAVING EXISTS (SELECT *
FROM my_table t2
WHERE t2.data_id = t1.data_id
AND t2.id <> t1.id)) + 1 row_num
FROM my_table t1;
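A variation on the correlated-subquery idea can be checked against the sample data. The sketch below assumes Python's built-in sqlite3 module as a stand-in for an older MySQL version, and it replaces the HAVING EXISTS test with a plain total count per data_id; it is an illustration, not the query above.

```python
import sqlite3

# In-memory SQLite database used only as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE my_table (id INTEGER, data_id INTEGER);
    INSERT INTO my_table VALUES
        (1, 2355), (2, 2031), (3, 1232), (4, 9867),
        (5, 2355), (6, 4562), (7, 1232), (8, 2355);
""")

# The first subquery counts all rows sharing the data_id; only when that is
# greater than 1 does the second subquery number the row by its position
# (rows with the same data_id and an id up to the current one).
query = """
    SELECT t1.id, t1.data_id,
           CASE WHEN (SELECT COUNT(*) FROM my_table t2
                      WHERE t2.data_id = t1.data_id) > 1
                THEN (SELECT COUNT(*) FROM my_table t3
                      WHERE t3.data_id = t1.data_id
                        AND t3.id <= t1.id)
           END AS row_num
    FROM my_table t1
    ORDER BY t1.data_id, t1.id
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

A CASE without an ELSE branch yields NULL, which is what leaves row_num empty for the values that occur only once.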
Join with a query that returns the number of duplicates.
select t1.data_id, IF(t2.dups > 1, row_num, '') AS row_num
from (
select data_id,
@row:=if(@prev=data_id,@row,0) + 1 as row_num,
@prev:=data_id
from my_table
order by data_id
) AS t1
join (
select data_id, COUNT(*) AS dups
FROM my_table
GROUP BY data_id
) AS t2 ON t1.data_id = t2.data_id
If you want to have the old "order" of the old table, you need much more code
SELECT
data_id, IF (row_num = 1 AND cntid = 1, NULL,row_num)
FROM
(SELECT
@row:=IF(@prev = t1.data_id, @row, 0) + 1 AS row_num,
cntid,
@prev:=t1.data_id data_id
FROM
(SELECT
*
FROM
my_table
ORDER BY data_id) t1
INNER JOIN (SELECT Count(*) cntid,data_id FROM my_table GROUP BY data_id)t2
ON t1.data_id = t2.data_id) t2
data_id | IF (row_num = 1 AND cntid = 1, NULL,row_num)
------: | -------------------------------------------:
1232 | 1
1232 | 2
2031 | null
2355 | 1
2355 | 2
2355 | 3
4562 | null
9867 | null

Find Values between two different columns in a table

Table one
+----------+----------+
| Column A | Column B |
+----------+----------+
| 2        | 4        |
| 3        | 5        |
| 1        | 2        |
| 1        | 2        |
| 8        | 7        |
+----------+----------+
Output
+---+---+
| 1 | 2 |
| 1 | 2 |
+---+---+
I want to print only the above output, i.e. any duplicated records, without using COUNT. Could someone give an example? Please help.
How about the WHERE clause below?
select * from t where columnA=1 and columnB=2
or
select columnA,columnB from t
group by columnA,columnB
having count(*)>1
or you can use exists
select t1.* from t t1 where exists
(select 1 from t t2 where t2.columnA=t1.columnA
and t2.columnB=t1.columnB group by columnA,columnB
having count(*)>1
)
You possibly want only those rows which are duplicate. If you don't have Window Functions available in your MySQL version, you can do the following:
SELECT
t.*
FROM your_table AS t
JOIN (SELECT columnA, columnB
FROM your_table
GROUP BY columnA, columnB
HAVING COUNT(*) > 1) AS dt
ON dt.columnA = t.columnA AND dt.columnB = t.columnB
Details: In a derived table, we get all the combinations of columnA and columnB that have more than one row (HAVING COUNT(*) > 1).
Now, we simply join this result set back to the main table to get only those rows.
Note: This approach is not needed if you want to fetch only these two columns. A simple GROUP BY with HAVING would suffice, as suggested in the other answer(s). However, if the table has more columns and you need to fetch all of them, not just the columns used to determine duplicates, you will need this approach.
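To see why the join back to the main table matters, the derived-table approach can be run on a table with extra columns. This is a minimal sketch assuming Python's built-in sqlite3 module as a stand-in for MySQL; the `id` and `note` columns are hypothetical additions that are not part of the question's table.

```python
import sqlite3

# In-memory SQLite database used only as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- id and note are hypothetical extra columns added for illustration.
    CREATE TABLE your_table (id INTEGER, columnA INTEGER,
                             columnB INTEGER, note TEXT);
    INSERT INTO your_table VALUES
        (1, 2, 4, 'a'), (2, 3, 5, 'b'), (3, 1, 2, 'c'),
        (4, 1, 2, 'd'), (5, 8, 7, 'e');
""")

# The derived table dt lists the duplicated (columnA, columnB) pairs;
# joining it back returns every full row that belongs to such a pair.
query = """
    SELECT t.*
    FROM your_table AS t
    JOIN (SELECT columnA, columnB
          FROM your_table
          GROUP BY columnA, columnB
          HAVING COUNT(*) > 1) AS dt
      ON dt.columnA = t.columnA AND dt.columnB = t.columnB
    ORDER BY t.id
"""
duplicates = conn.execute(query).fetchall()
print(duplicates)
```

Both rows of the duplicated pair come back with their own id and note values, which a plain GROUP BY on the two columns could not return.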
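You can use the IN operator with a grouped subquery. Note that this compares columnB against the number of rows per columnA, so it matches the sample data only because columnB happens to equal that count (2):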
select *
from tab
where ( columnA, columnB) in
(
select columnA, count(columnA)
from tab
group by columnA
);
or use a self-join:
select t1.columnA, t1.columnB
from tab t1
join
(
select columnA, count(columnA) as columnB
from tab
group by columnA
) t2
on ( t1.columnA = t2.columnA and t1.columnB = t2.columnB );
I would use EXISTS if the table has a primary key column:
SELECT t.*
FROM table t
WHERE EXISTS (SELECT 1 FROM table t1 WHERE t1.col1 = t.col1 AND t1.col2 = t.col2 AND t1.pk <> t.pk);

MySQL Group where any 3 of 5 columns match

I am searching an addresses table for duplicates, using SOUNDEX to find the duplicates. This works fine, but it requires all 5 SOUNDEX columns to match in order to group.
However, I want to GROUP where ANY 3 of my 5 SOUNDEX columns match.
Here is my current query:
SELECT `Address`.`id`,
SOUNDEX(`Address`.`address_company_name`) as soundex_address_company_name,
SOUNDEX(`Address`.`contact_name`) as soundex_contact_name,
SOUNDEX(`Address`.`street_address`) as soundex_street_address,
SOUNDEX(`Address`.`suburb`) as soundex_suburb,
SOUNDEX(`Address`.`city`) as soundex_city,
`Address`.`address_country_id`,
`Address`.`address_zone_id`,
`Address`.`postcode`,
COUNT(*)
FROM
`addresses` AS `Address`
WHERE
((`Address`.`address_company_name` IS NOT NULL)
OR (`Address`.`contact_name` IS NOT NULL))
GROUP BY
SOUNDEX(address_company_name),
SOUNDEX(contact_name),
SOUNDEX(street_address),
SOUNDEX(suburb),
SOUNDEX(city),
address_country_id,
address_zone_id,
postcode
HAVING
COUNT(*) > 1
I understand how to do this with multiple queries, i.e. loop through each address in our database and then re-query the database for addresses that match any 3 of the 5 columns. However, I am hoping to do this in fewer queries, as the above query executes very quickly.
I also understand that, were this possible, some records may be grouped multiple times. I don't mind if this is the case, but I am unsure whether this flies in the face of MySQL logic.
You can try something like this
SELECT a.id, b.id id2, COUNT(*) no_matches
FROM
(
SELECT id,
column_id,
CASE column_id
WHEN 1 THEN SOUNDEX(address_company_name)
WHEN 2 THEN SOUNDEX(contact_name)
WHEN 3 THEN SOUNDEX(street_address)
WHEN 4 THEN SOUNDEX(suburb)
WHEN 5 THEN SOUNDEX(city)
END column_value
FROM addresses a CROSS JOIN
(
SELECT 1 column_id UNION ALL
SELECT 2 UNION ALL
SELECT 3 UNION ALL
SELECT 4 UNION ALL
SELECT 5
) i
WHERE address_company_name IS NOT NULL
OR contact_name IS NOT NULL
) a CROSS JOIN
(
SELECT id,
column_id,
CASE column_id
WHEN 1 THEN SOUNDEX(address_company_name)
WHEN 2 THEN SOUNDEX(contact_name)
WHEN 3 THEN SOUNDEX(street_address)
WHEN 4 THEN SOUNDEX(suburb)
WHEN 5 THEN SOUNDEX(city)
END column_value
FROM addresses a CROSS JOIN
(
SELECT 1 column_id UNION ALL
SELECT 2 UNION ALL
SELECT 3 UNION ALL
SELECT 4 UNION ALL
SELECT 5
) i
WHERE address_company_name IS NOT NULL
OR contact_name IS NOT NULL
) b
WHERE a.column_value = b.column_value
AND a.id < b.id
GROUP BY a.id, b.id
HAVING COUNT(*) > 2
Sample output:
| ID | ID2 | NO_MATCHES |
|----|-----|------------|
| 1 | 2 | 4 |
| 4 | 5 | 3 |
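The pairwise counting behind this result can be sketched outside SQL. The following is a minimal Python illustration under stated assumptions: the SOUNDEX codes are taken as already computed, the codes in the sample are made up, and `similar_pairs` is a hypothetical helper. Unlike the query above, it only compares a column with the same column of the other row.

```python
from itertools import combinations


def similar_pairs(rows, min_matches=3):
    """Return (id_a, id_b, matches) for row pairs sharing enough codes.

    rows maps an address id to a tuple of 5 precomputed SOUNDEX codes.
    Codes are compared position by position; None never matches.
    """
    pairs = []
    for (id_a, a), (id_b, b) in combinations(sorted(rows.items()), 2):
        matches = sum(1 for x, y in zip(a, b) if x is not None and x == y)
        if matches >= min_matches:
            pairs.append((id_a, id_b, matches))
    return pairs


# Made-up codes for illustration only.
codes = {
    1: ("A250", "J500", "M500", "S160", "A245"),
    2: ("A250", "J500", "M500", "S160", "W452"),
    3: ("B620", "K000", "P620", "N000", "C623"),
}
pairs = similar_pairs(codes)
print(pairs)
```

Like the self cross join, this examines every pair of rows, so the work grows quadratically with the number of addresses.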

MySql select next lower number without using limit

Is it possible to select the next lower number from a table without using LIMIT?
E.g. if my table had 10, 3, 2, 1, I'm trying to select * from table where col < 10.
The result I'm expecting is 3. I know I can use LIMIT 1, but can it be done without that?
Try
SELECT MAX(no) no
FROM table1
WHERE no < 10
Output:
| NO |
------
| 3 |
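The MAX-with-filter idea generalizes to any reference value. Below is a minimal sketch assuming Python's built-in sqlite3 module as a stand-in for MySQL; the column is renamed to `num` for the example and `next_lower` is a hypothetical helper.

```python
import sqlite3

# In-memory SQLite database used only as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- num stands in for the column called no in the answer above.
    CREATE TABLE table1 (num INTEGER);
    INSERT INTO table1 VALUES (10), (3), (2), (1);
""")


def next_lower(value):
    """Return the largest num strictly below value, or None if none exists."""
    row = conn.execute(
        "SELECT MAX(num) FROM table1 WHERE num < ?", (value,)
    ).fetchone()
    return row[0]


print(next_lower(10))
print(next_lower(1))
```

MAX over an empty set yields NULL, so the helper returns None when no smaller value exists.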
Try this query
SELECT
*
FROM
(SELECT
@rid:=@rid+1 as rId,
a.*
FROM
tbl a
JOIN
(SELECT @rid:=0) b
ORDER BY
id DESC)tmp
WHERE rId=2;
SQL FIDDLE:
| RID | ID | TYPE | DETAILS |
------------------------------------
| 2 | 28 | Twitter | @sqlfiddle5 |
Another approach
select a.* from supportContacts a inner join
(select max(id) as id
from supportContacts
where
id in (select id from supportContacts where id not in
(select max(id) from supportContacts)))b
on a.id=b.id
SQL FIDDLE:
| ID | TYPE | DETAILS |
------------------------------
| 28 | Twitter | @sqlfiddle5 |
Alternatively, this query will always get the second highest number based on the inner where clause.
SELECT *
FROM
(
SELECT t.col,
(
SELECT COUNT(distinct t2.col)
FROM tableName t2
WHERE t2.col >= t.col
) as rnk -- RANK is a reserved word in MySQL 8.0+, so use another alias
FROM tableName t
WHERE col <= 10
) xx
WHERE rnk = 2 -- <<== means second highest
If you want the second lowest distinct number in the table, you can get it with this query:
SELECT distinct col FROM table1 a
WHERE 2 = (SELECT count(DISTINCT(b.col)) FROM table1 b WHERE a.col >= b.col);
If you later want the third lowest number, pass 3 in place of 2 in the WHERE clause.
If you want the second highest number instead (the next lower number below the maximum), change the condition in the inner query's WHERE clause to
a.col <= b.col