MySQL delete duplicated rows keep none [closed]

My table looks like this:
Event_id | Species
------------------
1        | Dog
1        | Horse
2        | Dog
3        | Cat
4        | Fish
4        | Bird
5        | Cat
I don't want to keep any of the rows that have a duplicated event_id, as I can't be sure about the species type of the event. How do I remove both rows from the table in MySQL? I don't have a unique id for each row.
The output should look like this:
Event_id | Species
------------------
2        | Dog
3        | Cat
5        | Cat
Thanks in advance!

Here's a solution I tested on MySQL 8.0 (required for the WITH clause):
mysql> create table mytable (event_id int, species varchar(20));
mysql> insert into mytable (Event_id,Species) values (1,'Dog'), (1,'Horse'),
(2,'Dog'), (3,'Cat'), (4,'Fish'), (4,'Bird'), (5,'Cat');
mysql> with cte as (select event_id from mytable group by event_id having count(*)>1)
delete mytable from mytable join cte using (event_id);
mysql> select * from mytable;
+----------+---------+
| event_id | species |
+----------+---------+
|        2 | Dog     |
|        3 | Cat     |
|        5 | Cat     |
+----------+---------+

An easy approach would be:
delete t1
from my_tbl as t1
inner join ( select event_id
from my_tbl
group by event_id
having count(*) >1
)as t2
on t1.event_id=t2.event_id;
Demo: https://www.db-fiddle.com/f/7yUJcuMJPncBBnrExKbzYz/155
Or with a subquery:
delete from my_tbl
where event_id not in ( select t1.event_id from (select event_id
from my_tbl
group by event_id
having count(*) =1) as t1
) ;
Demo: https://www.db-fiddle.com/f/7yUJcuMJPncBBnrExKbzYz/151
The query below returns the event_id values that exist only once, so you can delete with the condition event_id not in.
select event_id
from my_tbl
group by event_id
having count(*) =1
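The "remove every event_id that occurs more than once" idea can be tried end to end with a minimal sketch. It assumes Python's built-in sqlite3 module as a stand-in for MySQL, so the multi-table DELETE ... JOIN syntax is replaced by a plain subquery; the helper name delete_duplicated_events is hypothetical.

```python
import sqlite3


def delete_duplicated_events(conn):
    """Delete every row whose event_id occurs more than once (keep none)."""
    conn.execute(
        """
        DELETE FROM mytable
        WHERE event_id IN (
            SELECT event_id
            FROM mytable
            GROUP BY event_id
            HAVING COUNT(*) > 1
        )
        """
    )
    conn.commit()


# In-memory database loaded with the sample rows from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (event_id INTEGER, species TEXT)")
conn.executemany(
    "INSERT INTO mytable (event_id, species) VALUES (?, ?)",
    [(1, "Dog"), (1, "Horse"), (2, "Dog"), (3, "Cat"),
     (4, "Fish"), (4, "Bird"), (5, "Cat")],
)

delete_duplicated_events(conn)
remaining = conn.execute(
    "SELECT event_id, species FROM mytable ORDER BY event_id"
).fetchall()
print(remaining)
```

SQLite accepts a subquery on the table being deleted from. MySQL rejects that form (error 1093), which is why the answers above wrap the subquery in a derived table or use a join.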

Related

Select all records where last n characters in column are not unique

I have a slightly strange requirement in MySQL.
I need to select all records from a table where the last 6 characters are not unique.
For example, if I have this table:
I should select rows 1 and 3, since the last 6 letters of these values are not unique.
Do you have any idea how to implement this?
Thank you for your help.

I use a JOIN against a subquery where I count the occurrences of each unique combination of the last n (2 in my example) characters:
SELECT t.*
FROM t
JOIN (SELECT RIGHT(value, 2) r, COUNT(RIGHT(value, 2)) rc
FROM t
GROUP BY r) c ON c.r = RIGHT(value, 2) AND c.rc > 1

Something like that should work:
SELECT `mytable`.*
FROM (SELECT RIGHT(`value`, 6) AS `ending` FROM `mytable` GROUP BY `ending` HAVING COUNT(*) > 1) `grouped`
INNER JOIN `mytable` ON `grouped`.`ending` = RIGHT(`value`, 6)
but it is not fast. This requires a full table scan. Maybe you should rethink your problem.

EDITED: I had a wrong understanding of the question previously and I don't really want to change anything from my initial answer. But if my previous answer is not acceptable in some environment and it might mislead people, I have to correct it anyhow.
SELECT GROUP_CONCAT(id),RIGHT(VALUE,6)
FROM table1
GROUP BY RIGHT(VALUE,6) HAVING COUNT(RIGHT(VALUE,6)) > 1;
Since this question already has good answers, I wrote my query in a slightly different way. I've tested it with sql_mode=ONLY_FULL_GROUP_BY. ;)

This is what you need: a subquery to get the duplicated right(value,6) values, and the main query to get the rows matching that condition.
SELECT t.* FROM t WHERE RIGHT(`value`,6) IN (
SELECT RIGHT(`value`,6)
FROM t
GROUP BY RIGHT(`value`,6) HAVING COUNT(*) > 1);
UPDATE
This is the solution to avoid the MySQL error in case you have sql_mode=only_full_group_by:
SELECT t.* FROM t WHERE RIGHT(`value`,6) IN (
SELECT DISTINCT right_value FROM (
SELECT RIGHT(`value`,6) AS right_value,
COUNT(*) AS TOT
FROM t
GROUP BY RIGHT(`value`,6) HAVING COUNT(*) > 1) t2
)

This might be fast code, as there is no counting involved.
Live test: https://www.db-fiddle.com/f/dBdH9tZd4W6Eac1TCRXZ8U/0
select *
from tbl outr
where not exists
(
select 1 / 0 -- just a proof that this is not evaluated. won't cause division by zero
from tbl inr
where
inr.id <> outr.id
and right(inr.value, 6) = right(outr.value, 6)
)
Output:
| id | value |
| --- | --------------- |
| 2 | aaaaaaaaaaaaaa |
| 4 | aaaaaaaaaaaaaaB |
| 5 | Hello |
The logic is to test the other rows, i.e. those whose id differs from the outer row's id. If any of those other rows has the same right 6 characters as the outer row, then don't show that outer row.
UPDATE
I misunderstood the OP's intent. It's the reverse. Anyway, just reverse the logic: use EXISTS instead of NOT EXISTS.
Live test: https://www.db-fiddle.com/f/dBdH9tZd4W6Eac1TCRXZ8U/3
select *
from tbl outr
where exists
(
select 1 / 0 -- just a proof that this is not evaluated. won't cause division by zero
from tbl inr
where
inr.id <> outr.id
and right(inr.value, 6) = right(outr.value, 6)
)
Output:
| id | value |
| --- | ----------- |
| 1 | abcdePuzzle |
| 3 | abcPuzzle |
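The EXISTS logic can be checked against the five sample rows with a small sketch. It assumes Python's built-in sqlite3 module instead of MySQL, and uses substr(value, -6) in the role of MySQL's RIGHT(value, 6); the variable name non_unique is only illustrative.

```python
import sqlite3

# In-memory table with the five sample rows used in this answer.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (id INTEGER PRIMARY KEY, value TEXT)")
conn.executemany(
    "INSERT INTO tbl (id, value) VALUES (?, ?)",
    [(1, "abcdePuzzle"), (2, "aaaaaaaaaaaaaa"), (3, "abcPuzzle"),
     (4, "aaaaaaaaaaaaaaB"), (5, "Hello")],
)

# Keep a row only if some *other* row shares its last 6 characters.
non_unique = conn.execute(
    """
    SELECT outr.id, outr.value
    FROM tbl AS outr
    WHERE EXISTS (
        SELECT 1
        FROM tbl AS inr
        WHERE inr.id <> outr.id
          AND substr(inr.value, -6) = substr(outr.value, -6)
    )
    ORDER BY outr.id
    """
).fetchall()
print(non_unique)
```

Swapping EXISTS for NOT EXISTS in the same query gives the rows whose ending is unique instead.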
UPDATE
I tested the query. The performance of my answer (the correlated EXISTS approach) is not optimal. I'm keeping my answer so others will know which approach to avoid :)
GhostGambler's answer is faster than the correlated EXISTS approach. For 5 million rows, his answer takes only 2.762 seconds:
explain analyze
SELECT
tbl.*
FROM
(
SELECT
RIGHT(value, 6) AS ending
FROM
tbl
GROUP BY
ending
HAVING
COUNT(*) > 1
) grouped
JOIN tbl ON grouped.ending = RIGHT(value, 6)
My answer (correlated EXISTS) takes 4.08 seconds:
explain analyze
select *
from tbl outr
where exists
(
select 1 / 0 -- just a proof that this is not evaluated. won't cause division by zero
from tbl inr
where
inr.id <> outr.id
and right(inr.value, 6) = right(outr.value, 6)
)
The straightforward query is the fastest: no join, just a plain IN query, at 2.722 seconds. It has practically the same performance as the JOIN approach, since they have the same execution plan. This is kiks73's answer. I just don't know why he made his second answer unnecessarily complicated.
So it's just a matter of taste, or of choosing which code is more readable: select from in vs select from join.
explain analyze
SELECT *
FROM tbl
where right(value, 6) in
(
SELECT
RIGHT(value, 6) AS ending
FROM
tbl
GROUP BY
ending
HAVING
COUNT(*) > 1
)
Result:
Test data used:
CREATE TABLE tbl (
id INTEGER primary key,
value VARCHAR(20)
);
INSERT INTO tbl
(id, value)
VALUES
('1', 'abcdePuzzle'),
('2', 'aaaaaaaaaaaaaa'),
('3', 'abcPuzzle'),
('4', 'aaaaaaaaaaaaaaB'),
('5', 'Hello');
insert into tbl(id, value)
select x.y, 'Puzzle'
from generate_series(6, 5000000) as x(y);
create index ix_tbl__right on tbl(right(value, 6));
Performance without the index, and with an index on tbl(right(value, 6)):
JOIN approach:
Without index: 3.805 seconds
With index: 2.762 seconds
IN approach:
Without index: 3.719 seconds
With index: 2.722 seconds

Just a bit neater code (if using MySQL 8.0). Can't guarantee the performance though
Live test: https://www.db-fiddle.com/f/dBdH9tZd4W6Eac1TCRXZ8U/1
select x.*
from
(
select
*,
count(*) over(partition by right(value, 6)) as unique_count
from tbl
) as x
where x.unique_count = 1
Output:
| id | value | unique_count |
| --- | --------------- | ------------ |
| 2 | aaaaaaaaaaaaaa | 1 |
| 4 | aaaaaaaaaaaaaaB | 1 |
| 5 | Hello | 1 |
UPDATE
I misunderstood the OP's intent. It's the reverse. Just change the count:
select x.*
from
(
select
*,
count(*) over(partition by right(value, 6)) as unique_count
from tbl
) as x
where x.unique_count > 1
Output:
| id | value | unique_count |
| --- | ----------- | ------------ |
| 1 | abcdePuzzle | 2 |
| 3 | abcPuzzle | 2 |

MySQL JOIN two tables by one value and take other results (without that value too)

I have a problem with a MySQL query.
The table A is this:
A.id
A.value1
A.user
Table B is:
B.id
B.user
I need to find value_that_i_need from the query by searching for B.user.
But I don't only need the values with A.user; I need all values from Table A with the same A.id (inside Table A) that matches B.user.
So I need all distinct ids (where B.user=A.user) and then to search for them inside table A by A.id.
I want to avoid 2 different queries! I have already tried different JOINs, but nothing works for me.
EDIT
OK, I will try to explain the problem in an easier way.
I have this table:
+---------+------------+
| id_user | another_id |
+---------+------------+
id_user -> unique id for each user
another_id -> an id related to something like a group
another_id can be shared by several users, but I need to take only the users who are in the same groups as me.
So I have to check my groups (by searching for my id_user) and then find all users with the same another_id as mine.
The problem is that if I query something like this:
SELECT * FROM table0 AS t0, something_like_groups AS slg
JOIN user_inside_group as uig ON slg.id_group=uig.group_id AND slg.id_user='my_user_id'
WHERE slg.id='id_group' AND t0.user_id=uig.user_id
Actually I have to join 3 tables, but the problem is that I need to find the group I am in and get all information about all users inside that same group (without an additional query).

Perhaps you just want to find the min id based on the B user and then get all the rows from A that match. For example:
drop table if exists t,t1;
create table t( id int,user varchar(10));
create table t1( id int,user varchar(10));
insert into t values
(1,'aaa'),(1,'bbb'),(2,'ccc');
insert into t1 values
(1,'bbb'),(2,'ccc')
;
select t.id,t.user
from t
join
(
select t1.user,min(t.id) minid
from t1
join t on t.user = t1.user
group by t1.user
) s
on t.id = s.minid;
+------+------+
| id   | user |
+------+------+
|    1 | aaa  |
|    1 | bbb  |
|    2 | ccc  |
+------+------+
3 rows in set (0.00 sec)

how to get distinct result in sql?

I am trying to get distinct result of following table
id | name | created_on
1 | xyz | 2015-07-04 09:45:14
1 | xyz | 2015-07-04 10:40:59
2 | abc | 2015-07-05 10:40:59
I want the distinct id with the latest created_on, i.e. the following result:
1 | xyz | 2015-07-04 10:40:59
2 | abc | 2015-07-05 10:40:59
How do I get the above result with an SQL query?

Try this:
Select id, name, max(created_on) as created_on from table group by id

Try:
select id,max(name), max(created_on) from table_name group by id
Additional Note:
As it appears, your table is not normalized. That is, you store the name along with id in this table. So you may have these two rows simultaneously:
id | name | created_on
1 | a | 12-12-12
1 | b | 11-11-11
If that state is not logically possible in your model, you should redesign your database by splitting this table into two separate tables; one for holding id-name relationship, and another to hold id-created_on relationship:
table_1 (id,name)
table_2 (id,created_on)
Now, to get the last created_on for each id:
select id,max(created_on) from table_2 group by id
And if you want to hold name in the query:
select t1.id, t1.name, t2.created_on from table_1 as t1 inner join
(select id, max(created_on) as created_on from table_2 group by id) as t2
on t1.id=t2.id

Assuming that id/name is always a pair:
select id, name, max(created_on)
from table
group by id, name;
It is safer to include both in the group by. I also find it misleading to name a column id when it is not unique for the table.
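As a quick check of the group-by-both-columns approach, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for MySQL. The table name events is hypothetical, chosen because the placeholder name table is a reserved word.

```python
import sqlite3

# In-memory table with the three sample rows from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER, name TEXT, created_on TEXT)")
conn.executemany(
    "INSERT INTO events (id, name, created_on) VALUES (?, ?, ?)",
    [(1, "xyz", "2015-07-04 09:45:14"),
     (1, "xyz", "2015-07-04 10:40:59"),
     (2, "abc", "2015-07-05 10:40:59")],
)

# One row per (id, name) pair, carrying the latest created_on.
# ISO-formatted timestamps stored as text compare correctly as strings.
latest_rows = conn.execute(
    """
    SELECT id, name, MAX(created_on) AS created_on
    FROM events
    GROUP BY id, name
    ORDER BY id
    """
).fetchall()
print(latest_rows)
```

Because every non-aggregated column appears in the GROUP BY, the same query shape is also accepted under sql_mode=only_full_group_by in MySQL.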

You can use the keyword DISTINCT
like
SELECT DISTINCT

SQL Query - Not in a set of already in-use items

I am trying to select jobs that are not currently assigned to a user.
Users table: id | name
Jobs: id | name
Assigned: id | user_id | job_id | date_assigned
I want to select all the jobs that are not currently taken. Example:
Users:
id | name
--------------
1 | Chris
2 | Steve
Jobs
id | name
---------------
1 | Sweep
2 | Skids
3 | Mop
Assigned
id | user_id | job_id | date_assigned
-------------------------------------------------
1 | 1 | 1 | 2012-01-01
2 | 1 | 2 | 2012-01-02
3 | 2 | 3 | 2012-01-05
No two people can be assigned the same job. So the query would return
[1, Sweep]
Since no one is working on it since Chris got moved to Skids a day later.
So far:
SELECT
*
FROM
jobs
WHERE
id
NOT IN
(
SELECT
DISTINCT(job_id)
FROM
assigned
ORDER BY
date_assigned
DESC
)
However, this query returns NULL on the same data set. It does not account for the fact that the Sweep job is now open because no one is currently working on it.

SELECT a.*
FROM jobs a
LEFT JOIN
(
SELECT a.job_id
FROM assigned a
INNER JOIN
(
SELECT MAX(id) AS maxid
FROM assigned
GROUP BY user_id
) b ON a.id = b.maxid
) b ON a.id = b.job_id
WHERE b.job_id IS NULL
This gets the most recent job per user. Once we have a list of those jobs, we select all jobs that aren't on that list.
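The two steps described above (latest assignment per user, then jobs missing from that list) can be sketched against the sample data. This assumes Python's built-in sqlite3 module rather than MySQL, and, like the answer, it assumes that a higher assigned.id means a more recent assignment; the aliases latest and cur are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE jobs (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE assigned (
        id INTEGER PRIMARY KEY,
        user_id INTEGER,
        job_id INTEGER,
        date_assigned TEXT
    );
    INSERT INTO jobs VALUES (1, 'Sweep'), (2, 'Skids'), (3, 'Mop');
    INSERT INTO assigned VALUES
        (1, 1, 1, '2012-01-01'),
        (2, 1, 2, '2012-01-02'),
        (3, 2, 3, '2012-01-05');
    """
)

# Inner query: the newest assignment row per user.
# Outer LEFT JOIN: keep only jobs that match none of those rows.
open_jobs = conn.execute(
    """
    SELECT j.id, j.name
    FROM jobs AS j
    LEFT JOIN (
        SELECT a.job_id
        FROM assigned AS a
        JOIN (
            SELECT MAX(id) AS maxid
            FROM assigned
            GROUP BY user_id
        ) AS latest ON a.id = latest.maxid
    ) AS cur ON j.id = cur.job_id
    WHERE cur.job_id IS NULL
    ORDER BY j.id
    """
).fetchall()
print(open_jobs)
```

Using distinct aliases for each level (j, a, latest, cur) avoids the reuse of a and b in the original query, which works but is harder to read.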

You can try this variant:
select * from jobs
where id not in (
select job_id from (
select user_id, job_id, max(date_assigned)
from assigned
group by user_id, job_id) as t);

I think you might want:
SELECT *
FROM jobs
WHERE id NOT IN (SELECT job_id
from assigned
where user_id is not null
)
This assumes that re-assigning someone changes the user id on the original assignment. Does this happen? By the way, I also simplified the subquery.

First you need to be looking at a list of only current job assignments. Ordering isn't enough. The way you have it set up, you need a distinct subset of job assignments from Assigned that are the most recent assignments.
So you want a grouping subquery something like
select job_id, user_id, max(date_assigned) last_assigned from assigned group by job_id, user_id
Put it all together and you get
select id, name from jobs
where id not in (
select job_id as id from (
select job_id, user_id, max(date_assigned) last_assigned from assigned
group by job_id, user_id
) as latest
)
As an extra feature, you could pass up the value of "last_assigned" and it would tell you how long a job has been idle for.

Using ORDER BY and GROUP BY together

My table looks like this (and I'm using MySQL):
m_id | v_id | timestamp
------------------------
6 | 1 | 1333635317
34 | 1 | 1333635323
34 | 1 | 1333635336
6 | 1 | 1333635343
6 | 1 | 1333635349
My target is to take each m_id one time, and order by the highest timestamp.
The result should be:
m_id | v_id | timestamp
------------------------
6 | 1 | 1333635349
34 | 1 | 1333635336
And I wrote this query:
SELECT * FROM table GROUP BY m_id ORDER BY timestamp DESC
But, the results are:
m_id | v_id | timestamp
------------------------
34 | 1 | 1333635323
6 | 1 | 1333635317
I think this happens because it first does the GROUP BY and then orders the results.
Any ideas? Thank you.

One way to do this that correctly uses group by:
select l.*
from table l
inner join (
select
m_id, max(timestamp) as latest
from table
group by m_id
) r
on l.timestamp = r.latest and l.m_id = r.m_id
order by timestamp desc
How this works:
selects the latest timestamp for each distinct m_id in the subquery
only selects rows from table that match a row from the subquery (this operation -- where a join is performed, but no columns are selected from the second table, it's just used as a filter -- is known as a "semijoin" in case you were curious)
orders the rows
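The three steps listed above can be run against the question's sample rows with a short sketch. It assumes Python's built-in sqlite3 module in place of MySQL, and the table name readings is hypothetical (the placeholder name table is a reserved word).

```python
import sqlite3

# In-memory table with the five sample rows from the question.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE readings (m_id INTEGER, v_id INTEGER, timestamp INTEGER)"
)
conn.executemany(
    "INSERT INTO readings (m_id, v_id, timestamp) VALUES (?, ?, ?)",
    [(6, 1, 1333635317), (34, 1, 1333635323), (34, 1, 1333635336),
     (6, 1, 1333635343), (6, 1, 1333635349)],
)

# Subquery r: latest timestamp per m_id.
# Join: keep only the full rows that carry that latest timestamp.
newest = conn.execute(
    """
    SELECT l.m_id, l.v_id, l.timestamp
    FROM readings AS l
    JOIN (
        SELECT m_id, MAX(timestamp) AS latest
        FROM readings
        GROUP BY m_id
    ) AS r ON l.m_id = r.m_id AND l.timestamp = r.latest
    ORDER BY l.timestamp DESC
    """
).fetchall()
print(newest)
```

If two rows of the same m_id share the maximum timestamp, this pattern returns both of them.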

If you really don't care about which timestamp you'll get and your v_id is always the same for a given m_id, you can do the following:
select m_id, v_id, max(timestamp) from table
group by m_id, v_id
order by max(timestamp) desc
Now, if the v_id changes for a given m_id, then you should do the following:
select t1.* from table t1
left join table t2 on t1.m_id = t2.m_id and t1.timestamp < t2.timestamp
where t2.timestamp is null
order by t1.timestamp desc

Here is the simplest solution
select m_id,v_id,max(timestamp) from table group by m_id;
Group by m_id but get max of timestamp for each m_id.

You can try this
SELECT tbl.* FROM (SELECT * FROM table ORDER BY timestamp DESC) as tbl
GROUP BY tbl.m_id

SQL>
SELECT interview.qtrcode QTR, interview.companyname "Company Name", interview.division Division
FROM interview
JOIN jobsdev.employer
ON (interview.companyname = employer.companyname AND employer.zipcode like '100%')
GROUP BY interview.qtrcode, interview.companyname, interview.division
ORDER BY interview.qtrcode;

I felt confused when I tried to understand the question and answers at first. I spent some time reading and I would like to make a summary.
The OP's example is a little bit misleading.
At first I didn't understand why the accepted answer is the accepted answer. I thought that the OP's request could simply be fulfilled with
select m_id, v_id, max(timestamp) as max_time from table
group by m_id, v_id
order by max_time desc
Then I took a second look at the accepted answer. And I found that actually the OP wants to express that, for a sample table like:
m_id | v_id | timestamp
------------------------
6 | 1 | 11
34 | 2 | 12
34 | 3 | 13
6 | 4 | 14
6 | 5 | 15
he wants to select all columns based only on (group by) m_id and (order by) timestamp.
Then the above SQL won't work. If you still don't get it, imagine you have more columns than m_id | v_id | timestamp, e.g. m_id | v_id | timestamp | columnA | columnB | columnC | .... With group by, you can only select the "group by" columns and aggregate functions in the result.
By now, you should understand the accepted answer.
What's more, check row_number function introduced in MySQL 8.0:
https://www.mysqltutorial.org/mysql-window-functions/mysql-row_number-function/
Finding top N rows of every group
It does a similar thing to the accepted answer.
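A minimal sketch of the row_number approach on the sample table above, assuming Python's built-in sqlite3 module as a stand-in for MySQL 8.0 (window functions need SQLite 3.25 or newer). The table name sample and the alias rn are hypothetical.

```python
import sqlite3

# In-memory table with the five sample rows shown above.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sample (m_id INTEGER, v_id INTEGER, timestamp INTEGER)"
)
conn.executemany(
    "INSERT INTO sample (m_id, v_id, timestamp) VALUES (?, ?, ?)",
    [(6, 1, 11), (34, 2, 12), (34, 3, 13), (6, 4, 14), (6, 5, 15)],
)

# Number the rows of each m_id from newest to oldest, then keep row 1.
top_rows = conn.execute(
    """
    SELECT m_id, v_id, timestamp
    FROM (
        SELECT m_id, v_id, timestamp,
               ROW_NUMBER() OVER (
                   PARTITION BY m_id ORDER BY timestamp DESC
               ) AS rn
        FROM sample
    ) AS ranked
    WHERE rn = 1
    ORDER BY timestamp DESC
    """
).fetchall()
print(top_rows)
```

Changing the filter to rn <= N gives the top N rows of every group, which the join-on-max pattern cannot do directly.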
Some answers are wrong. My MySQL gives me an error.
select m_id,v_id,max(timestamp) from table group by m_id;
#abinash sahoo
SELECT m_id,v_id,MAX(TIMESTAMP) AS TIME
FROM table_name
GROUP BY m_id
#Vikas Garhwal
Error message:
[42000][1055] Expression #2 of SELECT list is not in GROUP BY clause and contains nonaggregated column 'testdb.test_table.v_id' which is not functionally dependent on columns in GROUP BY clause; this is incompatible with sql_mode=only_full_group_by

Why make it so complicated? This worked.
SELECT m_id,v_id,MAX(TIMESTAMP) AS TIME
FROM table_name
GROUP BY m_id

You just need to replace desc with asc. Write the query like below. It will return the values in ascending order.
SELECT * FROM table GROUP BY m_id ORDER BY m_id asc;
