MySQL Duplicate rows - specify columns


How can I run a query that finds duplicate rows? It needs to match on multiple fields, not just one.
Here is the EXPLAIN of the table.
+-------------+--------------+------+-----+-------------------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-------------+--------------+------+-----+-------------------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| token | varchar(64) | NO | MUL | NULL | |
| maxvar | float | NO | | NULL | |
| maxvbr | float | NO | | NULL | |
| minvcr | float | NO | | NULL | |
| minvdr | float | NO | | NULL | |
| atype | int(11) | NO | | NULL | |
| avalue | varchar(255) | NO | | NULL | |
| createddate | timestamp | NO | | CURRENT_TIMESTAMP | |
| timesrun | int(11) | NO | | NULL | |
+-------------+--------------+------+-----+-------------------+----------------+
I need to find all rows that agree on token, maxvar, maxvbr, minvcr, minvdr, atype and avalue. If all of those fields match those in another row, then treat it as a "duplicate".
Ultimately I want to run this as a delete command, but I can easily alter the select.
UPDATE: Still looking for a solution that deletes with a single query in MySQL.

Just join the table to itself and compare the rows. You can keep the duplicate with the lowest ID by requiring the id to be deleted to be greater than the id of a matching row. MySQL does not allow the table being deleted from to be read directly in a subquery, so the id list is wrapped in a derived table:
DELETE FROM my_table WHERE id IN (
SELECT id FROM (
SELECT DISTINCT t1.id
FROM my_table t1
JOIN my_table t2
ON t1.id > t2.id
AND t1.token = t2.token AND t1.maxvar = t2.maxvar
AND t1.maxvbr = t2.maxvbr AND t1.minvcr = t2.minvcr
AND t1.minvdr = t2.minvdr AND t1.atype = t2.atype
AND t1.avalue = t2.avalue
) AS dup);

This query finds all duplicate records that should be deleted, that is, every row in a duplicated group except the one with the lowest id:
SELECT t1.id FROM table_duplicates t1
INNER JOIN (
SELECT MIN(id) id, token, maxvar, maxvbr, minvcr, minvdr, atype, avalue FROM table_duplicates
GROUP BY token, maxvar, maxvbr, minvcr, minvdr, atype, avalue
HAVING COUNT(*) > 1
) t2
ON t1.id <> t2.id AND t1.token = t2.token AND t1.maxvar=t2.maxvar AND t1.maxvbr = t2.maxvbr AND t1.minvcr = t2.minvcr AND t1.minvdr = t2.minvdr AND t1.atype = t2.atype AND t1.avalue = t2.avalue;
This query removes those duplicates, keeping the row with the lowest id in each group:
DELETE t1 FROM table_duplicates t1
INNER JOIN (
SELECT MIN(id) id, token, maxvar, maxvbr, minvcr, minvdr, atype, avalue FROM table_duplicates
GROUP BY token, maxvar, maxvbr, minvcr, minvdr, atype, avalue
HAVING COUNT(*) > 1
) t2
ON t1.id <> t2.id AND t1.token = t2.token AND t1.maxvar=t2.maxvar AND t1.maxvbr = t2.maxvbr AND t1.minvcr = t2.minvcr AND t1.minvdr = t2.minvdr AND t1.atype = t2.atype AND t1.avalue = t2.avalue;
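The keep-the-lowest-id idea can be tried out on a throwaway database. The sketch below is only an illustration under assumptions: it uses Python's built-in sqlite3 module as a stand-in for MySQL, the sample rows are made up, and the table name my_table is borrowed from the first answer. The DELETE is written with an extra derived table (the hypothetical alias keepers) because that shape should also be accepted by MySQL, which refuses a subquery that reads the delete target directly.

```python
import sqlite3

# SQLite stands in for MySQL here; the rows below are invented sample data.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE my_table (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        token TEXT NOT NULL, maxvar REAL NOT NULL, maxvbr REAL NOT NULL,
        minvcr REAL NOT NULL, minvdr REAL NOT NULL,
        atype INTEGER NOT NULL, avalue TEXT NOT NULL
    )
""")
rows = [
    ("abc", 1.0, 2.0, 3.0, 4.0, 1, "x"),  # id 1, kept
    ("abc", 1.0, 2.0, 3.0, 4.0, 1, "x"),  # id 2, duplicate of id 1
    ("abc", 1.0, 2.0, 3.0, 4.0, 2, "x"),  # id 3, differs in atype, kept
    ("abc", 1.0, 2.0, 3.0, 4.0, 1, "x"),  # id 4, duplicate of id 1
    ("def", 1.0, 2.0, 3.0, 4.0, 1, "y"),  # id 5, unique, kept
]
conn.executemany(
    "INSERT INTO my_table (token, maxvar, maxvbr, minvcr, minvdr, atype, avalue) "
    "VALUES (?, ?, ?, ?, ?, ?, ?)",
    rows,
)

# Keep the lowest id of every group of identical rows and delete the rest.
# The inner derived table ("keepers") is materialised before the delete runs.
conn.execute("""
    DELETE FROM my_table
    WHERE id NOT IN (
        SELECT keep_id FROM (
            SELECT MIN(id) AS keep_id
            FROM my_table
            GROUP BY token, maxvar, maxvbr, minvcr, minvdr, atype, avalue
        ) AS keepers
    )
""")

remaining_ids = [r[0] for r in conn.execute("SELECT id FROM my_table ORDER BY id")]
print(remaining_ids)
```

Running the SELECT part of the statement on its own first is a cheap way to check which ids would survive before deleting anything.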

SELECT token, maxvar, maxvbr, minvcr, minvdr, atype, avalue,
COUNT(*)
FROM yourtable
GROUP BY token, maxvar, maxvbr, minvcr, minvdr, atype, avalue
HAVING COUNT(*) > 1
This query returns one row for each combination of values that occurs two or more times in the table, together with the number of times it occurs.
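To make the shape of that result concrete, here is a small sketch. It is an assumption-laden illustration: Python's sqlite3 module replaces MySQL, the table is cut down to three of the compared columns to keep it short, and the rows are invented.

```python
import sqlite3

# SQLite stands in for MySQL; only three of the compared columns are modelled.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE yourtable (id INTEGER PRIMARY KEY, token TEXT, atype INTEGER, avalue TEXT)"
)
conn.executemany(
    "INSERT INTO yourtable (token, atype, avalue) VALUES (?, ?, ?)",
    [("abc", 1, "x"), ("abc", 1, "x"), ("abc", 2, "x"), ("abc", 1, "x")],
)

# One output row per duplicated combination, not one per duplicated record.
dup_groups = conn.execute("""
    SELECT token, atype, avalue, COUNT(*) AS copies
    FROM yourtable
    GROUP BY token, atype, avalue
    HAVING COUNT(*) > 1
""").fetchall()
print(dup_groups)
```

The three identical rows collapse into a single group with a count of 3, and the row that differs in atype does not appear at all. The query therefore identifies which combinations are duplicated, but the ids of the individual rows have to come from a join back to the table.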

Try:
SELECT token, maxvar, maxvbr, minvcr, minvdr, atype, avalue, COUNT(*)
FROM my_table
GROUP BY token, maxvar, maxvbr, minvcr, minvdr, atype, avalue
HAVING COUNT(*) > 1

Related

Coalescing with condition in related table

I am working in MySQL 5.7.35 and I have the following tables:
create table Table1 (
Id int not null auto_increment,
Name varchar(255) not null,
primary key(Id)
);
create table Table2 (
Id int not null auto_increment,
Name varchar(255) not null,
Table1_Id int not null,
primary key(Id),
foreign key(Table1_Id) references Table1(Id)
);
create table Table3 (
Id int not null auto_increment,
Type varchar(255) not null,
Name varchar(255) not null,
Result varchar(255) not null,
Table2_Id int not null,
primary key(Id),
foreign key(Table2_Id) references Table2(Id)
);
Inside, I have the following data:
| Id | Name |
| --- | ---------- |
| 1 | Computer A |
---
| Id | Name | Table1_Id |
| --- | ---------- | --------- |
| 1 | Test Run 1 | 1 |
---
| Id | Type | Name | Result | Table2_Id |
| --- | --------- | --------- | ------- | --------- |
| 1 | Processor | MMX | Pass | 1 |
| 2 | Processor | SSE | Pass | 1 |
| 3 | Processor | SSE 2 | Pass | 1 |
| 4 | Display | Red | Pass | 1 |
| 5 | Display | Green | Pass | 1 |
| 6 | Keyboard | General | Pass | 1 |
| 7 | Keyboard | Lights | Skipped | 1 |
| 8 | Network | Ethernet | Pass | 1 |
| 9 | Network | Wireless | Skipped | 1 |
| 10 | Network | Bluetooth | Fail | 1 |
Desired Query
I would like two columns table1_name and test_result where test_result is a concatenated string with the following logic:
For any given value in Type:
If all are passes, then the result is a Pass
If any are fails, then the result is a Fail
If any are Skipped (providing the first two points are checked), then the result is Skipped.
So for the current data, the output will be:
| table1_name | test_result |
| ----------- | ---------------------------------------------------------------- |
| Computer A | Processor: Pass, Display: Pass, Keyboard: Skipped, Network: Fail |
Current Query
I am struggling to do the coalescing bit when the items I wish to coalesce are in a child table two levels down. My current query is:
select t1.Name as 'table1_name'
-- coalesce to happen here
from Table1 t1
inner join Table2 t2 on t1.Id = t2.Table1_Id
inner join Table3 t3 on t2.Id = t3.Table2_Id;
I have created a db-fiddle to make things easier.

Use GROUP_CONCAT() to collect all Results for each Name and Type combination in your preferred order, and then in another level of aggregation pick the first one:
SELECT table1_name,
GROUP_CONCAT(Type, ': ', SUBSTRING_INDEX(Results, ',', 1) SEPARATOR ', ') test_result
FROM (
SELECT t1.Name table1_name, t3.Type,
GROUP_CONCAT(Result ORDER BY Result = 'Fail' DESC, Result = 'Skipped' DESC) Results
FROM Table1 t1
INNER JOIN Table2 t2 on t1.Id = t2.Table1_Id
INNER JOIN Table3 t3 on t2.Id = t3.Table2_Id
GROUP BY t1.Name, t3.Type
) t
GROUP BY table1_name;
If you want to preserve the order of Types in the results:
SELECT table1_name,
GROUP_CONCAT(Type, ': ', SUBSTRING_INDEX(Results, ',', 1) ORDER BY Id SEPARATOR ', ') test_result
FROM (
SELECT t1.Name table1_name, MIN(t3.Id) Id, t3.Type,
GROUP_CONCAT(Result ORDER BY Result = 'Fail' DESC, Result = 'Skipped' DESC) Results
FROM Table1 t1
INNER JOIN Table2 t2 on t1.Id = t2.Table1_Id
INNER JOIN Table3 t3 on t2.Id = t3.Table2_Id
GROUP BY t1.Name, t3.Type
) t
GROUP BY table1_name;
See the demo.

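This looks like two levels of aggregation. The inner level picks one result per Type (Fail if any test failed, otherwise Skipped if any was skipped, otherwise Pass) and the outer level concatenates them:
select Name, group_concat(type, ': ', result separator ', ') as test_result
from (select t1.Name, t3.type,
(case when sum(result = 'Fail') > 0 then 'Fail'
when sum(result = 'Skipped') > 0 then 'Skipped'
else 'Pass'
end) as result
from Table1 t1 inner join
Table2 t2
on t1.Id = t2.Table1_Id inner join
Table3 t3
on t2.Id = t3.Table2_Id
group by t1.Name, t3.type
) nt
group by Name;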
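The Fail-beats-Skipped-beats-Pass rule can also be expressed as a numeric severity and checked against the sample data from the question. This is a sketch under assumptions, not a drop-in MySQL query: it runs on Python's sqlite3 module, the severity column and the LABELS mapping are hypothetical names introduced here, and the final string concatenation is done in Python rather than with GROUP_CONCAT so that the ordering of the types is explicit.

```python
import sqlite3

# SQLite stands in for MySQL; schema and rows follow the question's sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Table1 (Id INTEGER PRIMARY KEY, Name TEXT NOT NULL);
    CREATE TABLE Table2 (Id INTEGER PRIMARY KEY, Name TEXT NOT NULL, Table1_Id INTEGER NOT NULL);
    CREATE TABLE Table3 (Id INTEGER PRIMARY KEY, Type TEXT NOT NULL, Name TEXT NOT NULL,
                         Result TEXT NOT NULL, Table2_Id INTEGER NOT NULL);
    INSERT INTO Table1 VALUES (1, 'Computer A');
    INSERT INTO Table2 VALUES (1, 'Test Run 1', 1);
    INSERT INTO Table3 VALUES
        (1, 'Processor', 'MMX', 'Pass', 1),
        (2, 'Processor', 'SSE', 'Pass', 1),
        (3, 'Processor', 'SSE 2', 'Pass', 1),
        (4, 'Display', 'Red', 'Pass', 1),
        (5, 'Display', 'Green', 'Pass', 1),
        (6, 'Keyboard', 'General', 'Pass', 1),
        (7, 'Keyboard', 'Lights', 'Skipped', 1),
        (8, 'Network', 'Ethernet', 'Pass', 1),
        (9, 'Network', 'Wireless', 'Skipped', 1),
        (10, 'Network', 'Bluetooth', 'Fail', 1);
""")

# Inner aggregation: the worst severity seen for each (computer, type) pair.
per_type = conn.execute("""
    SELECT t1.Name, t3.Type,
           MAX(CASE t3.Result WHEN 'Fail' THEN 2 WHEN 'Skipped' THEN 1 ELSE 0 END) AS severity
    FROM Table1 t1
    JOIN Table2 t2 ON t1.Id = t2.Table1_Id
    JOIN Table3 t3 ON t2.Id = t3.Table2_Id
    GROUP BY t1.Name, t3.Type
    ORDER BY t1.Name, MIN(t3.Id)
""").fetchall()

# Outer aggregation: one concatenated string per computer, types in first-seen order.
LABELS = {0: "Pass", 1: "Skipped", 2: "Fail"}
parts_by_name = {}
for name, type_, severity in per_type:
    parts_by_name.setdefault(name, []).append(f"{type_}: {LABELS[severity]}")
report = {name: ", ".join(parts) for name, parts in parts_by_name.items()}
print(report)
```

Mapping the results to numbers keeps the precedence in one place, so adding a further status later only means extending the CASE expression and the label mapping.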

Conditionally select some parts of table

I'm trying to conditionally select some columns of my table.
The structure of my table could look weird, but I have no influence on that:
| id | col1|1 | col2|1 | col1|2 | col2|2 | col1|3 | col2|3 |
|:--:|:------------:|:------------:|:------------:|:------------:|:------------:|:------------:|
| 1 | some | meaningless | text | don't | mind | me |
| 2 | abc | def | NULL | NULL | my | text |
| 3 | dummytext... | dummytext... | dummytext... | dummytext... | dummytext... | dummytext... |
This table is divided into 3 parts, marked with a |X at the end.
col1|1 and col2|1
col1|2 and col2|2
col1|3 and col2|3
I only want each part, if col2 of that part IS NOT NULL.
This is my approach:
SELECT t1.`col1|1`, t1.`col2|1`, t2.`col1|2`, t2.`col2|2`, t3.`col1|3`, t3.`col2|3`
FROM tab1 t1
LEFT JOIN tab1 t2 On t1.`id` = t2.`id`
LEFT JOIN tab1 t3 on t1.`id` = t3.`id`
WHERE
t1.`col2|1` IS NOT NULL
AND t2.`col2|2` IS NOT NULL # this column is NULL, so I don't want it (including table t2)
AND t3.`col2|3` IS NOT NULL
AND t1.`id` = 2
AND t2.`id` = 2
AND t3.`id` = 2
It works only if all col2 values are NOT NULL; if one of them IS NULL, the whole result is empty.
If you replaced both NULL values in my example table, you would get all 6 columns, which would be right, as no part would be NULL in that case.
In my example, I want this output:
| col1|1 | col2|1 | col1|3 | col2|3 |
|:------:|:------:|:------:|:------:|
| abc | def | my | text |
Here is a fiddle.

I modified your code:
SELECT t1.`col1|1`, t1.`col2|1`, t2.`col1|2`, t2.`col2|2`, t3.`col1|3`, t3.`col2|3`
FROM
tab1 t
LEFT JOIN tab1 t1 On t.`id` = t1.`id` AND t1.`col2|1` IS NOT NULL
LEFT JOIN tab1 t2 On t.`id` = t2.`id` AND t2.`col2|2` IS NOT NULL
LEFT JOIN tab1 t3 on t.`id` = t3.`id` AND t3.`col2|3` IS NOT NULL
WHERE
t.`id` = 2
It implements the described logic, but it does not remove the NULL columns from the output. The result of the query:
+----+--------+--------+--------+--------+--------+--------+
| | col1|1 | col2|1 | col1|2 | col2|2 | col1|3 | col2|3 |
+----+--------+--------+--------+--------+--------+--------+
| 1 | abc | def | NULL | NULL | my | text |
+----+--------+--------+--------+--------+--------+--------+

In MySQL, a query cannot return a dynamic number of columns. So it is not possible to conditionally select some columns of a table; your query will always return six columns.
But you could select all 3 parts of the table and, for each part, return NULL for col1 whenever col2 of that part is NULL.
SELECT IF(`col2|1` IS NULL, NULL, `col1|1`) AS `col1|1`, `col2|1`,
IF(`col2|2` IS NULL, NULL, `col1|2`) AS `col1|2`, `col2|2`,
IF(`col2|3` IS NULL, NULL, `col1|3`) AS `col1|3`, `col2|3`
FROM tab1
WHERE `id` = 2
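Since the column list is fixed once the query is written, one way to get the four-column output from the question is to drop the unwanted parts in the client after fetching the row. The following is a minimal sketch assuming a Python client and using the built-in sqlite3 module in place of MySQL; the kept dictionary is a name introduced for this illustration.

```python
import sqlite3

# SQLite stands in for MySQL; the single row is id 2 from the question.
conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # lets us read columns by name
conn.execute(
    'CREATE TABLE tab1 (id INTEGER PRIMARY KEY, '
    '"col1|1" TEXT, "col2|1" TEXT, "col1|2" TEXT, "col2|2" TEXT, "col1|3" TEXT, "col2|3" TEXT)'
)
conn.execute("INSERT INTO tab1 VALUES (2, 'abc', 'def', NULL, NULL, 'my', 'text')")

row = conn.execute("SELECT * FROM tab1 WHERE id = ?", (2,)).fetchone()

# Keep a part (col1|N, col2|N) only when its col2 value is not NULL.
kept = {}
for part in (1, 2, 3):
    col1, col2 = f"col1|{part}", f"col2|{part}"
    if row[col2] is not None:
        kept[col1] = row[col1]
        kept[col2] = row[col2]
print(kept)
```

The query itself still returns all six columns; only the presentation layer decides which parts to show.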

Select and Update in same time and gets displayed

I have this query:
SELECT MIN(id),CustomerName, Scenario,StepNo,InTransit,IsAlef,runNo,ResponseLength
FROM `RequestInfo`
WHERE `CustomerName` = 'Hotstar'
AND `ResponseContentType` like '%video/MP2T%'
AND `RequestHttpRequest` like '%segment%' ;
which gives me output like this:-
+---------+--------------+----------+--------+-----------+--------+-------+----------------+----------+
| MIN(id) | CustomerName | Scenario | StepNo | InTransit | IsAlef | runNo | ResponseLength | IsActive |
+---------+--------------+----------+--------+-----------+--------+-------+----------------+----------+
| 139 | HotStar | SearchTv | 1 | No | No | 1 | 410098 | NULL |
+---------+--------------+----------+--------+-----------+--------+-------+----------------+----------+
I want to set the last column, "IsActive", to the string "Yes" when the above data is displayed, but only when IsActive is NULL.

Use the query below. It updates only the row with the lowest matching id, and only if its IsActive is still NULL:
UPDATE RequestInfo R
INNER JOIN (
SELECT MIN(id) AS id
FROM `RequestInfo`
WHERE `CustomerName` = 'Hotstar'
AND `ResponseContentType` LIKE '%video/MP2T%'
AND `RequestHttpRequest` LIKE '%segment%'
) AS T ON R.id = T.id
SET R.IsActive = 'Yes'
WHERE R.IsActive IS NULL;
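A single statement cannot both display the row and modify it, so an application would typically run the lookup and the update together inside one transaction. The sketch below shows that pattern under assumptions: Python's sqlite3 module replaces MySQL, the table is reduced to the columns the filter needs, the sample rows are invented, and the customer name is spelled exactly as stored because SQLite compares strings case-sensitively by default.

```python
import sqlite3

# SQLite stands in for MySQL; only the columns used by the filter are modelled.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE RequestInfo (
        id INTEGER PRIMARY KEY, CustomerName TEXT,
        ResponseContentType TEXT, RequestHttpRequest TEXT, IsActive TEXT
    )
""")
conn.executemany(
    "INSERT INTO RequestInfo VALUES (?, ?, ?, ?, ?)",
    [
        (139, "HotStar", "video/MP2T", "/segment/1.ts", None),
        (140, "HotStar", "video/MP2T", "/segment/2.ts", None),
        (141, "Other", "video/MP2T", "/segment/1.ts", None),
    ],
)

with conn:  # commits on success, rolls back if an exception is raised
    # Step 1: find the row that the SELECT in the question would display.
    (target_id,) = conn.execute("""
        SELECT MIN(id) FROM RequestInfo
        WHERE CustomerName = 'HotStar'
          AND ResponseContentType LIKE '%video/MP2T%'
          AND RequestHttpRequest LIKE '%segment%'
    """).fetchone()
    # Step 2: flag it, but only if it has not been flagged before.
    conn.execute(
        "UPDATE RequestInfo SET IsActive = 'Yes' WHERE id = ? AND IsActive IS NULL",
        (target_id,),
    )

result = conn.execute("SELECT id, IsActive FROM RequestInfo ORDER BY id").fetchall()
print(result)
```

Only the row with the lowest matching id is changed; the other matching row and the row for a different customer keep their NULL.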

Efficient assignment of percentile/rank in MYSQL

I have a couple of very large tables (over 400,000 rows) that look like the following:
+---------+--------+---------------+
| ID | M1 | M1_Percentile |
+---------+--------+---------------+
| 3684514 | 3.2997 | NULL |
| 3684515 | 3.0476 | NULL |
| 3684516 | 2.6499 | NULL |
| 3684517 | 0.3585 | NULL |
| 3684518 | 1.6919 | NULL |
| 3684519 | 2.8515 | NULL |
| 3684520 | 4.0728 | NULL |
| 3684521 | 4.0224 | NULL |
| 3684522 | 5.8207 | NULL |
| 3684523 | 6.8291 | NULL |
+---------+--------+---------------+...about 400,000 more
I need to assign each row in the M1_Percentile column a value that represents "the percent of rows with M1 values lower than or equal to the current row's M1 value".
In other words, I need: M1_Percentile = (number of rows with M1 <= this row's M1) / (total number of rows) * 100
I implemented this successfully, but it is FAR FAR too slow. If anyone could create a more efficient version of the following code, I would really appreciate it!
UPDATE myTable AS X JOIN (
SELECT
s1.ID, COUNT(s2.ID)/ (SELECT COUNT(*) FROM myTable) * 100 AS percentile
FROM
myTable s1 JOIN myTable s2 on (s2.M1 <= s1.M1)
GROUP BY s1.ID
ORDER BY s1.ID) AS Z
ON (X.ID = Z.ID)
SET X.M1_Percentile = Z.percentile;
This is the (correct but slow) result from the above query if the number of rows is limited to the ones you see (10 rows):
+---------+--------+---------------+
| ID | M1 | M1_Percentile |
+---------+--------+---------------+
| 3684514 | 3.2997 | 60 |
| 3684515 | 3.0476 | 50 |
| 3684516 | 2.6499 | 30 |
| 3684517 | 0.3585 | 10 |
| 3684518 | 1.6919 | 20 |
| 3684519 | 2.8515 | 40 |
| 3684520 | 4.0728 | 80 |
| 3684521 | 4.0224 | 70 |
| 3684522 | 5.8207 | 90 |
| 3684523 | 6.8291 | 100 |
+---------+--------+---------------+
Producing the same results for the entire 400,000 rows takes orders of magnitude longer.

I cannot test this, but you could try something like:
update myTable t
set M1_Percentile = (
select count(*)
from myTable t1
where t1.M1 <= t.M1) * 100 / (
select count(*)
from myTable);
UPDATE:
update test t
set m1_pc = (
(select count(*) from test t1 where t1.M1 <= t.M1) * 100 /
( select count(*) from test));
This works in Oracle (the only database I have available). I do remember getting that error in MySQL. It is very annoying.

Fair warning: MySQL isn't my native environment. However, after a little research, I think the following query should be workable. It ranks rows by (M1, ID), so rows with equal M1 values get distinct ranks:
UPDATE myTable AS X
JOIN (
SELECT X2.ID, (
SELECT COUNT(*)
FROM myTable X1
WHERE (X2.M1, X2.ID) >= (X1.M1, X1.ID)) AS RowRank
FROM myTable AS X2
) AS Ranked
ON (X.ID = Ranked.ID)
CROSS JOIN (
SELECT COUNT(*) AS TotalCount
FROM myTable
) AS Total
SET X.M1_Percentile = Ranked.RowRank / Total.TotalCount * 100;
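The self-join in the question compares every row with every other row, which is what makes it slow on 400,000 rows. Sorting the values once and looking up each row's position avoids that. The sketch below does the computation in a client program as one possible approach, under assumptions: Python's sqlite3 module replaces MySQL, and the data is the ten-row sample from the question. On MySQL 8.0 or later, the CUME_DIST() window function may be a way to get the same figure inside the database.

```python
import sqlite3
from bisect import bisect_right

# SQLite stands in for MySQL; the rows are the ten sample rows from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE myTable (ID INTEGER PRIMARY KEY, M1 REAL, M1_Percentile REAL)")
sample = [
    (3684514, 3.2997), (3684515, 3.0476), (3684516, 2.6499), (3684517, 0.3585),
    (3684518, 1.6919), (3684519, 2.8515), (3684520, 4.0728), (3684521, 4.0224),
    (3684522, 5.8207), (3684523, 6.8291),
]
conn.executemany("INSERT INTO myTable (ID, M1) VALUES (?, ?)", sample)

rows = conn.execute("SELECT ID, M1 FROM myTable").fetchall()

# Sort once; bisect_right then gives the number of values <= m1 in O(log n).
sorted_m1 = sorted(m1 for _, m1 in rows)
total = len(sorted_m1)
updates = [
    (bisect_right(sorted_m1, m1) * 100.0 / total, row_id)
    for row_id, m1 in rows
]
conn.executemany("UPDATE myTable SET M1_Percentile = ? WHERE ID = ?", updates)

percentiles = dict(conn.execute("SELECT ID, M1_Percentile FROM myTable"))
print(percentiles)
```

The total work is one sort plus one binary search per row, instead of one comparison per pair of rows. Rows with equal M1 values receive the same percentile, which matches the "equal or lower" definition in the question.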

SQL Group by combination?

I am having problems selecting items from a table where a device_id can be in either the from_device_id column or the to_device_id column. I am trying to return all chats where the given device ID is in the from_device_id or to_device_id column, but only return the latest message.
select chat.*, (select screen_name from usr where chat.from_device_id=usr.device_id limit 1) as from_screen_name, (select screen_name from usr where chat.to_device_id=usr.device_id limit 1) as to_screen_name from chat where to_device_id="ffffffff-af28-3427-a2bc-83865900edbe" or from_device_id="ffffffff-af28-3427-a2bc-83865900edbe" group by from_device_id, to_device_id;
+----+--------------------------------------+--------------------------------------+---------+---------------------+------------------+----------------+
| id | from_device_id | to_device_id | message | date | from_screen_name | to_screen_name |
+----+--------------------------------------+--------------------------------------+---------+---------------------+------------------+----------------+
| 20 | ffffffff-af28-3427-a2bc-83860033c587 | ffffffff-af28-3427-a2bc-83865900edbe | ee | 2011-02-28 12:36:38 | kevin | handset |
| 1 | ffffffff-af28-3427-a2bc-83865900edbe | ffffffff-af28-3427-a2bc-83860033c587 | yyy | 2011-02-27 17:43:17 | handset | kevin |
+----+--------------------------------------+--------------------------------------+---------+---------------------+------------------+----------------+
2 rows in set (0.00 sec)
As expected, two rows are returned. How can I modify this query to only return one row?
mysql> describe chat;
+----------------+---------------+------+-----+-------------------+----------------+
| Field | Type | Null | Key | Default | Extra |
+----------------+---------------+------+-----+-------------------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| from_device_id | varchar(128) | NO | | NULL | |
| to_device_id | varchar(128) | NO | | NULL | |
| message | varchar(2048) | NO | | NULL | |
| date | timestamp | YES | | CURRENT_TIMESTAMP | |
+----------------+---------------+------+-----+-------------------+----------------+
5 rows in set (0.00 sec)

select chat.*,
(select screen_name
from usr
where chat.from_device_id=usr.device_id
limit 1
) as from_screen_name,
(select screen_name
from usr
where chat.to_device_id=usr.device_id
limit 1
) as to_screen_name
from chat
where to_device_id="ffffffff-af28-3427-a2bc-83865900edbe" or
from_device_id="ffffffff-af28-3427-a2bc-83865900edbe"
group by from_device_id, to_device_id
order by date DESC
limit 1;
You need to tell SQL that it should sort the returned data by date to get the most recent chat. Then you just limit the returned rows to 1.

You shouldn't need to use a Group By at all. Rather, you can simply sort by date and use the Limit clause to return only the most recent row. In addition, you shouldn't need subqueries, as you can simply use Joins. If chat.from_device_id and chat.to_device_id are both not-nullable and every device has a row in usr, then you can replace the Left Joins with Inner Joins.
Select chat.id
, chat.from_device_id
, chat.to_device_id
, chat.message
, chat.date
, FromUser.screen_name As from_screen_name
, ToUser.screen_name As to_screen_name
From chat
Left Join usr As FromUser
On FromUser.device_id = chat.from_device_id
Left Join usr As ToUser
On ToUser.device_id = chat.to_device_id
Where chat.to_device_id="ffffffff-af28-3427-a2bc-83865900edbe"
Or chat.from_device_id="ffffffff-af28-3427-a2bc-83865900edbe"
Order By chat.date Desc
Limit 1
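The join-and-limit version can be checked against the two rows shown in the question. This is a sketch under assumptions: Python's sqlite3 module replaces MySQL, the usr table is reduced to the two columns the query reads, and a tie-breaker on chat.id is added to the ordering in case two messages share a timestamp.

```python
import sqlite3

DEVICE = "ffffffff-af28-3427-a2bc-83865900edbe"
OTHER = "ffffffff-af28-3427-a2bc-83860033c587"

# SQLite stands in for MySQL; the rows are the two chats from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE usr (device_id TEXT PRIMARY KEY, screen_name TEXT);
    CREATE TABLE chat (id INTEGER PRIMARY KEY, from_device_id TEXT NOT NULL,
                       to_device_id TEXT NOT NULL, message TEXT NOT NULL, date TEXT);
""")
conn.executemany("INSERT INTO usr VALUES (?, ?)", [(DEVICE, "handset"), (OTHER, "kevin")])
conn.executemany(
    "INSERT INTO chat VALUES (?, ?, ?, ?, ?)",
    [
        (1, DEVICE, OTHER, "yyy", "2011-02-27 17:43:17"),
        (20, OTHER, DEVICE, "ee", "2011-02-28 12:36:38"),
    ],
)

# Newest message in which the device is either sender or receiver.
latest = conn.execute("""
    SELECT chat.id, chat.message,
           FromUser.screen_name AS from_screen_name,
           ToUser.screen_name AS to_screen_name
    FROM chat
    LEFT JOIN usr AS FromUser ON FromUser.device_id = chat.from_device_id
    LEFT JOIN usr AS ToUser ON ToUser.device_id = chat.to_device_id
    WHERE chat.to_device_id = ? OR chat.from_device_id = ?
    ORDER BY chat.date DESC, chat.id DESC
    LIMIT 1
""", (DEVICE, DEVICE)).fetchone()
print(latest)
```

Passing the device ID as a bound parameter rather than pasting it into the SQL string also avoids quoting problems.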

We Keep Coding
