I have 2 tables, one of participants:
+----+------------+-----------+
| id | First Name | Last Name |
+----+------------+-----------+
| 0 | John | Snow |
| 1 | John | Snow |
| 2 | Michael | Jackson |
+----+------------+-----------+
And one pivot table which connects participants with events:
+----+----------------+----------+
| id | participant_id | event_id |
+----+----------------+----------+
| 0 | 0 | 12 |
| 1 | 1 | 35 |
| 2 | 2 | 35 |
+----+----------------+----------+
By mistake there are duplicate entries in the participants' table.
How can I delete the duplicate entries in the participants table and update the pivot table accordingly? The expected results are:
participants:
+----+------------+-----------+
| id | First Name | Last Name |
+----+------------+-----------+
| 0 | John | Snow |
| | | | //deleted
| 2 | Michael | Jackson |
+----+------------+-----------+
pivot table:
+----+----------------+----------+
| id | participant_id | event_id |
+----+----------------+----------+
| 0 | 0 | 12 |
| 1 | 0 | 35 | //participant_id changed from 1 to 0
| 2 | 2 | 35 |
+----+----------------+----------+
This will be a multi-step process:
The first step is to update the pivot (mapping) table. The following query returns every duplicated name together with the lowest id that uses it:
SELECT first_name, last_name, MIN(id) AS first_id
FROM participants
GROUP BY first_name, last_name
HAVING COUNT(*) > 1 -- more than one row means duplicates exist
You can use the above query as a subquery to update the pivot table using a series of Joins:
UPDATE pivot AS m
JOIN participants AS p1
ON p1.id = m.participant_id
JOIN (
SELECT first_name, last_name, MIN(id) AS first_id
FROM participants
GROUP BY first_name, last_name
HAVING COUNT(*) > 1
) AS p2 ON p2.first_name = p1.first_name
AND p2.last_name = p1.last_name
AND p2.first_id <> p1.id -- avoid the original row
SET m.participant_id = p2.first_id -- update the duplicate row's id to first id
Now, you can DELETE the duplicate rows using the same subquery (to find duplicates):
DELETE p1 FROM participants AS p1
JOIN (
SELECT first_name, last_name, MIN(id) AS first_id
FROM participants
GROUP BY first_name, last_name
HAVING COUNT(*) > 1
) AS p2 ON p2.first_name = p1.first_name
AND p2.last_name = p1.last_name
AND p2.first_id <> p1.id -- avoid the original row
Finally, fix the problem at the data definition level so that it cannot happen again, by defining a UNIQUE constraint on (first_name, last_name):
ALTER TABLE participants ADD CONSTRAINT unq_idx_name UNIQUE(first_name, last_name);
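The three steps above can be tried end to end on the sample data. The following is only a minimal sketch using Python's built-in sqlite3 module rather than MySQL: SQLite does not accept the multi-table UPDATE ... JOIN and DELETE ... JOIN syntax, so the same logic is written with subqueries, and a unique index stands in for the ALTER TABLE constraint. The function name dedupe_participants is made up for illustration.

```python
import sqlite3

def dedupe_participants(conn):
    """Repoint pivot rows to the lowest id per name, delete the extras, add a unique index."""
    with conn:  # commit on success, roll back on error
        # Step 1: point every pivot row at the lowest id that shares its participant's name.
        conn.execute("""
            UPDATE pivot
            SET participant_id = (
                SELECT MIN(p2.id)
                FROM participants AS p1
                JOIN participants AS p2
                  ON p2.first_name = p1.first_name
                 AND p2.last_name = p1.last_name
                WHERE p1.id = pivot.participant_id
            )
            WHERE participant_id IN (SELECT id FROM participants)
        """)
        # Step 2: keep only the lowest id of each (first_name, last_name) group.
        conn.execute("""
            DELETE FROM participants
            WHERE id NOT IN (
                SELECT MIN(id) FROM participants GROUP BY first_name, last_name
            )
        """)
        # Step 3: prevent new duplicates (SQLite stand-in for the UNIQUE constraint).
        conn.execute(
            "CREATE UNIQUE INDEX unq_idx_name ON participants (first_name, last_name)"
        )

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE participants (id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT);
    CREATE TABLE pivot (id INTEGER PRIMARY KEY, participant_id INTEGER, event_id INTEGER);
    INSERT INTO participants VALUES (0, 'John', 'Snow'), (1, 'John', 'Snow'), (2, 'Michael', 'Jackson');
    INSERT INTO pivot VALUES (0, 0, 12), (1, 1, 35), (2, 2, 35);
""")
dedupe_participants(conn)

participants_after = conn.execute(
    "SELECT id, first_name, last_name FROM participants ORDER BY id").fetchall()
pivot_after = conn.execute(
    "SELECT id, participant_id, event_id FROM pivot ORDER BY id").fetchall()
print(participants_after)
print(pivot_after)
```

One thing to watch for on real data: if two duplicate participants were registered for the same event, the pivot table ends up with two identical (participant_id, event_id) pairs after step 1, and those would need their own cleanup.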
I have these 2 tables:
1st Table "Users"
+----+-----------+----------+
| ID | FirstName | LastName |
+----+-----------+----------+
| 1 | Jeff | Bezos |
| 2 | Bill | Gates |
| 3 | Elon | Musk |
+----+-----------+----------+
2nd Table "Records"
+----+--------+------------+
| ID | IDUser | RecordDate |
+----+--------+------------+
| 1 | 1 | 15/06/2021 |
| 2 | 2 | 05/06/2021 |
| 3 | 2 | 12/06/2021 |
| 4 | 2 | 02/06/2021 |
| 5 | 1 | 17/06/2021 |
+----+--------+------------+
These 2 tables are linked to each other by a foreign key, Records.IDUser -> Users.ID.
I want to write a query that returns this:
+-----------+----------+----------------+--------------------+
| FirstName | LastName | Latest Record | Number of Records |
+-----------+----------+----------------+--------------------+
| Jeff | Bezos | 17/06/2021 | 2 |
| Bill | Gates | 12/06/2021 | 3 |
| Elon | Musk | NULL | NULL |
+-----------+----------+----------------+--------------------+
You need a LEFT JOIN so that users without records are returned too, combined with the MAX and COUNT aggregate functions.
First version: This will return 0 for the number of records instead of NULL, when there are no records for a specific user. Latest record will be NULL as expected.
SELECT
FirstName,
LastName,
MAX(RecordDate) AS LatestRecord,
COUNT(Records.ID) AS NumberOfRecords
FROM Users LEFT JOIN Records on Users.ID = Records.IDUser
GROUP BY Users.ID;
If you want NULL instead of 0 (which normally you do not want), you can use the IF function like this:
SELECT
FirstName,
LastName,
MAX(RecordDate) AS LatestRecord,
IF(COUNT(Records.ID) > 0, COUNT(Records.ID), NULL) AS NumberOfRecords
FROM Users LEFT JOIN Records on Users.ID = Records.IDUser
GROUP BY Users.ID;
Second version: It might happen that running the above query will return an error, something like:
Error: ER_WRONG_FIELD_WITH_GROUP: ...; this is incompatible with sql_mode=only_full_group_by
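This happens when the ONLY_FULL_GROUP_BY SQL mode is enabled (the default since MySQL 5.7.5) and MySQL cannot determine that the selected columns are functionally dependent on the GROUP BY column, for example when Users.ID is not declared as the primary key. To get around this error, you can use the ANY_VALUE function to select the nonaggregated columns: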
SELECT
ANY_VALUE(FirstName) AS FirstName,
ANY_VALUE(LastName) AS LastName,
MAX(RecordDate) AS LatestRecord,
COUNT(Records.ID) AS NumberOfRecords
FROM Users LEFT JOIN Records on Users.ID = Records.IDUser
GROUP BY Users.ID;
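To check the behaviour of the LEFT JOIN with COUNT and MAX, here is a small sketch using Python's built-in sqlite3 module instead of MySQL. Two things in it are assumptions that go beyond the answer above: the dates are stored as ISO strings (YYYY-MM-DD) so that MAX compares them chronologically, and the query groups by every selected Users column, which is a portable alternative to ANY_VALUE. NULLIF is used as a shorter equivalent of the IF(...) expression.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Users (ID INTEGER PRIMARY KEY, FirstName TEXT, LastName TEXT);
    CREATE TABLE Records (
        ID INTEGER PRIMARY KEY,
        IDUser INTEGER REFERENCES Users (ID),
        RecordDate TEXT  -- assumption: ISO dates, so string order equals date order
    );
    INSERT INTO Users VALUES (1, 'Jeff', 'Bezos'), (2, 'Bill', 'Gates'), (3, 'Elon', 'Musk');
    INSERT INTO Records VALUES
        (1, 1, '2021-06-15'), (2, 2, '2021-06-05'), (3, 2, '2021-06-12'),
        (4, 2, '2021-06-02'), (5, 1, '2021-06-17');
""")

summary = conn.execute("""
    SELECT Users.FirstName,
           Users.LastName,
           MAX(Records.RecordDate)      AS LatestRecord,
           COUNT(Records.ID)            AS NumberOfRecords,  -- 0 for users without records
           NULLIF(COUNT(Records.ID), 0) AS NumberOrNull      -- NULL for users without records
    FROM Users
    LEFT JOIN Records ON Users.ID = Records.IDUser
    GROUP BY Users.ID, Users.FirstName, Users.LastName
    ORDER BY Users.ID
""").fetchall()

for row in summary:
    print(row)
```

COUNT(Records.ID) counts only non-NULL values, which is why the user without records gets 0 rather than 1; COUNT(*) would count the NULL-filled row produced by the LEFT JOIN.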
A LEFT JOIN selects all users, even those that do not have records:
select * from users left join records on records.IDUser = users.ID;
I have a Purchases table, where I'm trying to select all rows where first name, surname and email are duplicates (for all 3).
Purchases table:
| purchase_id | product_id | user_id | firstname | surname | email |
| ------------- | -----------| ------------- | ----------- | --------- | ----------- |
| 1 | 1 | 777 | Sally | Smith | s#gmail.com |
| 2 | 2 | 777 | Sally | Smith | s#gmail.com |
| 3 | 3 | 777 | Sally | Smith | s#gmail.com |
| 4 | 1 | 888 | Bob | Smith | b#gmail.com |
Further to this, each product ID corresponds to a product type in a 'Products' table, and I'm trying to filter by 'lawnmower' purchases (so only product ID 1 & 2)
Products table:
| product_type | product_id |
| ------------- | -----------|
| lawnmower | 1 |
| lawnmower | 2 |
| leafblower | 3 |
I'm hoping to write a query that will return all purchases of the 'lawnmower' type where first name, last name, and email are duplicates (so would return the first two rows of the Purchases table).
This is where my query is at so far, however it's not returning accurate data (e.g. I know I have around 350 duplicates and it's returning 10,000 rows):
SELECT t. *
FROM database_name.purchases t
JOIN (
SELECT firstname, surname, email, count( * ) AS NumDuplicates
FROM database_name.purchases
GROUP BY firstname, surname, email
HAVING NumDuplicates >1
)tsum ON t.firstname = tsum.firstname
AND t.surname = tsum.surname
AND t.email = tsum.email
INNER JOIN database_name.products p2 ON t.product_id = p2.product_id
WHERE p2.product_type = 'lawnmower'
Just wanting to know what I need to tweak in my query syntax.
You know that you should be returning Sally Smith. Create a table from the results of your query above, then run SELECT * on that table with firstname = 'Sally' and surname = 'Smith'. See if you can figure out where you are going wrong based on that. This will help you debug these types of issues yourself in the future.
Your inner SELECT does not filter on the product type, so it returns every customer who has purchased any two items. Joining that to purchases therefore also returns the lawnmower purchases of customers who bought two items of which only one was a lawnmower. Add a filter on the product type in the subquery too:
SELECT t.*
FROM database_name.purchases t
INNER JOIN (SELECT purchases.user_id
FROM database_name.purchases
INNER JOIN database_name.products
ON products.product_id = purchases.product_id
WHERE products.product_type = 'lawnmower'
GROUP BY purchases.user_id
HAVING count(*) > 1) s
ON t.user_id = s.user_id
INNER JOIN database_name.products p
ON t.product_id = p.product_id
WHERE p.product_type = 'lawnmower';
Your schema is also problematic because it is denormalised. firstname, surname and email depend on user_id (note that I only grouped and joined on user_id, which is enough), so they should not be in purchases; only user_id should. Likewise, product_type would be better stored as an ID referencing a product type table.
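To see the difference the extra filter makes, here is a small sketch using Python's built-in sqlite3 module rather than MySQL. The third customer (user 999, who bought one lawnmower and one leafblower) and all email addresses are made-up test data, added only to reproduce the over-counting.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products (product_id INTEGER PRIMARY KEY, product_type TEXT);
    CREATE TABLE purchases (
        purchase_id INTEGER PRIMARY KEY, product_id INTEGER, user_id INTEGER,
        firstname TEXT, surname TEXT, email TEXT
    );
    INSERT INTO products VALUES (1, 'lawnmower'), (2, 'lawnmower'), (3, 'leafblower');
    INSERT INTO purchases VALUES
        (1, 1, 777, 'Sally', 'Smith', 's@example.com'),
        (2, 2, 777, 'Sally', 'Smith', 's@example.com'),
        (3, 3, 777, 'Sally', 'Smith', 's@example.com'),
        (4, 1, 888, 'Bob',   'Smith', 'b@example.com'),
        (5, 1, 999, 'Ann',   'Lee',   'a@example.com'),  -- hypothetical extra customer
        (6, 3, 999, 'Ann',   'Lee',   'a@example.com');
""")

# Duplicates are detected over ALL purchases, then filtered to lawnmowers afterwards.
unfiltered_ids = [r[0] for r in conn.execute("""
    SELECT t.purchase_id
    FROM purchases AS t
    JOIN (SELECT firstname, surname, email
          FROM purchases
          GROUP BY firstname, surname, email
          HAVING COUNT(*) > 1) AS d
      ON t.firstname = d.firstname AND t.surname = d.surname AND t.email = d.email
    JOIN products AS p ON p.product_id = t.product_id
    WHERE p.product_type = 'lawnmower'
    ORDER BY t.purchase_id
""")]

# Duplicates are detected among lawnmower purchases only.
filtered_ids = [r[0] for r in conn.execute("""
    SELECT t.purchase_id
    FROM purchases AS t
    JOIN (SELECT pu.user_id
          FROM purchases AS pu
          JOIN products AS pr ON pr.product_id = pu.product_id
          WHERE pr.product_type = 'lawnmower'
          GROUP BY pu.user_id
          HAVING COUNT(*) > 1) AS s
      ON s.user_id = t.user_id
    JOIN products AS p ON p.product_id = t.product_id
    WHERE p.product_type = 'lawnmower'
    ORDER BY t.purchase_id
""")]

print(unfiltered_ids)  # includes Ann's single lawnmower purchase
print(filtered_ids)    # only Sally's two lawnmower purchases
```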
I have a table like this
| user_id | company_id | employee_id |
|---------|------------|-------------|
| 1 | 2 | 123 |
| 2 | 2 | 123 |
| 3 | 5 | 432 |
| 4 | 5 | 432 |
| 5 | 7 | 432 |
I have a query that looks like this
SELECT COUNT(*) AS Repeated, employee_id, GROUP_CONCAT(user_id) as user_ids, GROUP_CONCAT(username)
FROM user_company
INNER JOIN user ON user.id = user_company.user_id
WHERE employee_id IS NOT NULL
AND user_company.deleted_at IS NULL
GROUP BY employee_id, company_id
HAVING Repeated >1;
The results I am getting look like this
| Repeated | employee_id | user_ids |
|---------|--------------|------------|
| 2 | 123 | 2,3 |
| 2 | 432 | 7,8 |
I need results that look like this
| user_id |
|---------|
| 2 |
| 3 |
| 7 |
| 8 |
I realize my query selects more than that, but that's just to make sure I'm getting the correct data. Now I need a single-column result with each user_id in its own row, so that I can update by user_id in another query. I've tried selecting only the user_id, but then I get just two rows; I need all four rows of duplicates.
Any ideas on how to modify my query?
Here is the query to get all of your user_ids:
SELECT uc.user_id
FROM user_company uc
INNER JOIN
(
SELECT employee_id, company_id
FROM user_company
WHERE employee_id IS NOT NULL
AND deleted_at IS NULL
GROUP BY employee_id, company_id
HAVING COUNT(employee_id) >1
) AS `emps`
ON emps.employee_id = uc.`employee_id`
AND emps.company_id = uc.`company_id`
WHERE uc.deleted_at IS NULL; -- keep soft-deleted rows out of the result
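The query below generates the UPDATE statement you are looking for. Note that GROUP_CONCAT output is truncated at group_concat_max_len (1024 characters by default), so raise that setting if you have many duplicates.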
SELECT CONCAT('UPDATE user_company SET employee_id = null WHERE user_id IN (', GROUP_CONCAT(user_id SEPARATOR ', '),')') AS user_sql
FROM user_company uc
INNER JOIN
(SELECT employee_id, company_id
FROM user_company
WHERE employee_id IS NOT NULL
AND deleted_at IS NULL
GROUP BY employee_id, company_id
HAVING COUNT(employee_id) >1) AS `emps`
ON emps.employee_id = uc.`employee_id`
AND emps.company_id = uc.`company_id`;
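If the update is driven from application code, an alternative to generating SQL text with CONCAT is to fetch the ids and pass them back as bound parameters. The following is a minimal sketch using Python's built-in sqlite3 module rather than MySQL. The soft-deleted row for user 6 is made-up test data, and filtering the outer rows on deleted_at is an assumption about the intended behaviour.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE user_company (
        user_id INTEGER PRIMARY KEY, company_id INTEGER,
        employee_id INTEGER, deleted_at TEXT
    );
    INSERT INTO user_company VALUES
        (1, 2, 123, NULL), (2, 2, 123, NULL),
        (3, 5, 432, NULL), (4, 5, 432, NULL),
        (5, 7, 432, NULL),
        (6, 2, 123, '2021-01-01');  -- hypothetical soft-deleted row
""")

DUPLICATE_IDS_SQL = """
    SELECT uc.user_id
    FROM user_company AS uc
    JOIN (SELECT employee_id, company_id
          FROM user_company
          WHERE employee_id IS NOT NULL
            AND deleted_at IS NULL
          GROUP BY employee_id, company_id
          HAVING COUNT(employee_id) > 1) AS emps
      ON emps.employee_id = uc.employee_id
     AND emps.company_id = uc.company_id
    WHERE uc.deleted_at IS NULL  -- assumption: soft-deleted rows should not be touched
    ORDER BY uc.user_id
"""

# One row per duplicated user_id, as asked for in the question.
duplicate_ids = [row[0] for row in conn.execute(DUPLICATE_IDS_SQL)]
print(duplicate_ids)

# Clear employee_id for those users, passing each id as a bound parameter.
with conn:
    conn.executemany(
        "UPDATE user_company SET employee_id = NULL WHERE user_id = ?",
        [(uid,) for uid in duplicate_ids],
    )

cleared_ids = [row[0] for row in conn.execute(
    "SELECT user_id FROM user_company WHERE employee_id IS NULL ORDER BY user_id")]
print(cleared_ids)
```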
I have two different tables with some of the records having the same sub groups of information, but both having different id values. Below is an example where I have a table of actors from movies and plays.
I would like to query these two tables such that I get a pair of movie_id, play_id values that have all the same actors (i.e. have first_name = given_name and last_name = family_name for each record with the same id).
What would be the appropriate query to accomplish this?
TABLE: movie_actors
| movie_id | first_name | last_name |
|----------+------------+-----------|
| 1 | mary | johnson |
| 1 | john | smith |
| 2 | tom | anderson |
TABLE: play_actors
| play_id | given_name | family_name |
|----------+------------+-------------|
| 23 | mary | johnson |
| 23 | john | smith |
| 31 | marc | anthony |
DESIRED OUTPUT:
| movie_id | play_id |
|----------+---------|
| 1 | 23 |
Use GROUP_CONCAT in subqueries to get all the actors as a single column. Then join the subqueries based on this.
SELECT movie_id, play_id
FROM (SELECT movie_id, GROUP_CONCAT(CONCAT(first_name, '-', last_name) ORDER BY first_name, last_name) AS actors
FROM movie_actors
GROUP BY movie_id) AS m
JOIN (SELECT play_id, GROUP_CONCAT(CONCAT(given_name, '-', family_name) ORDER BY given_name, family_name) AS actors
FROM play_actors
GROUP BY play_id) AS p
ON m.actors = p.actors
Try this:
SELECT DISTINCT `movie_id`, `play_id`
FROM `movie_actors`
INNER JOIN `play_actors`
ON STRCMP(`first_name`,`given_name`) = 0
AND STRCMP(`last_name`,`family_name`) = 0
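Note that first_name must match given_name, and last_name must match family_name, for a row to join. Be aware that this query returns every (movie_id, play_id) pair that shares at least one actor, not only the pairs whose casts are identical.
If you want to restrict the result, for example to one movie_id, just add a WHERE clause specifying the desired value at the end of the query, like: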
WHERE `movie_id` = ** ID **
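For comparison, here is a third way to test for identical casts that needs neither GROUP_CONCAT nor string comparison: count the matching actors per (movie_id, play_id) pair and require that count to equal the size of both casts. This technique is not one of the answers above; it is only a sketch, run through Python's built-in sqlite3 module rather than MySQL, and it assumes an actor appears at most once per movie or play. Movie 3 and play 40 are made-up rows added to create partial matches.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE movie_actors (movie_id INTEGER, first_name TEXT, last_name TEXT);
    CREATE TABLE play_actors (play_id INTEGER, given_name TEXT, family_name TEXT);
    INSERT INTO movie_actors VALUES
        (1, 'mary', 'johnson'), (1, 'john', 'smith'),
        (2, 'tom', 'anderson'),
        (3, 'mary', 'johnson'), (3, 'tom', 'anderson');  -- hypothetical partial match
    INSERT INTO play_actors VALUES
        (23, 'mary', 'johnson'), (23, 'john', 'smith'),
        (31, 'marc', 'anthony'),
        (40, 'mary', 'johnson');                         -- hypothetical partial match
""")

# Pairs that share at least one actor (what a plain join with DISTINCT returns).
shared_pairs = conn.execute("""
    SELECT DISTINCT m.movie_id, p.play_id
    FROM movie_actors AS m
    JOIN play_actors AS p
      ON p.given_name = m.first_name AND p.family_name = m.last_name
    ORDER BY m.movie_id, p.play_id
""").fetchall()

# Pairs whose casts are identical: matches == movie cast size == play cast size.
identical_pairs = conn.execute("""
    SELECT m.movie_id, p.play_id
    FROM movie_actors AS m
    JOIN play_actors AS p
      ON p.given_name = m.first_name AND p.family_name = m.last_name
    GROUP BY m.movie_id, p.play_id
    HAVING COUNT(*) = (SELECT COUNT(*) FROM movie_actors AS m2
                       WHERE m2.movie_id = m.movie_id)
       AND COUNT(*) = (SELECT COUNT(*) FROM play_actors AS p2
                       WHERE p2.play_id = p.play_id)
    ORDER BY m.movie_id, p.play_id
""").fetchall()

print(shared_pairs)
print(identical_pairs)
```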
I have two tables:
t1 with the following columns: name | key | length
t2 with the following columns: name | country.
I need to count all distinct keys with length > 2000, grouped by country. So I wrote:
SELECT count(distinct key), country
from db.t1
inner join db.t2
on t1.name=t2.name
where length>2000
group by country;
But, when I make the query:
SELECT count(distinct key)
from db.t1
where Length>2000;
I expected the totals to be equal, but they differ. For example, the counts from the first query add up to 125494, while the second query returns 121653.
What is the reason for this difference? Note that some rows have an empty country (''). It seems to me that they do not appear as a group; I counted them and found 134 such records, but I can't find out the reason.
Unless key is UNIQUE (in which case, why bother with the DISTINCT keywords?), there is no reason that your two queries should return the same results.
Suppose t1 contains:
+------+-----+--------+
| name | key | length |
+------+-----+--------+
| a | x | 5000 |
| b | x | 5000 |
| b | y | 5000 |
| c | z | 5000 |
+------+-----+--------+
And t2 contains:
+------+---------+
| name | country |
+------+---------+
| a | uk |
| b | fr |
| c | de |
+------+---------+
Then your queries will return:
First query:
SELECT count(distinct key), country
from db.t1
inner join db.t2
on t1.name=t2.name
where length>2000
group by country;
Will yield:
+---------------------+---------+
| count(distinct key) | country |
+---------------------+---------+
| 1 | uk |
| 2 | fr |
| 1 | de |
+---------------------+---------+
Second query:
SELECT count(distinct key)
from db.t1
where Length>2000;
Will yield:
+---------------------+
| count(distinct key) |
+---------------------+
| 3 |
+---------------------+
If you have multiple rows in t2 with the same name, the join will create duplicate rows.
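The example from the answer above can be reproduced with a short script. This is only a sketch using Python's built-in sqlite3 module rather than MySQL; the column name key is quoted because it is a keyword.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (name TEXT, "key" TEXT, length INTEGER);
    CREATE TABLE t2 (name TEXT, country TEXT);
    INSERT INTO t1 VALUES ('a', 'x', 5000), ('b', 'x', 5000), ('b', 'y', 5000), ('c', 'z', 5000);
    INSERT INTO t2 VALUES ('a', 'uk'), ('b', 'fr'), ('c', 'de');
""")

# Distinct keys per country: key 'x' is counted once for uk and once for fr.
per_country = conn.execute("""
    SELECT t2.country, COUNT(DISTINCT t1."key")
    FROM t1
    JOIN t2 ON t1.name = t2.name
    WHERE t1.length > 2000
    GROUP BY t2.country
    ORDER BY t2.country
""").fetchall()

# Distinct keys overall: key 'x' is counted only once.
overall = conn.execute(
    'SELECT COUNT(DISTINCT "key") FROM t1 WHERE length > 2000').fetchone()[0]

sum_of_groups = sum(count for _, count in per_country)
print(per_country)
print(sum_of_groups, overall)
```

The per-country counts add up to more than the overall count because a key that occurs in several countries is distinct within each group but is counted only once in the ungrouped query.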