Merge and then Delete duplicate entries - mysql

I have a MySQL database with some duplicate entries. They share the same value in one field, phone, but differ in other fields. For example, I have two entries with the same phone, but the first entry has rating = default_value and the second has rating = 5.
So I must merge these entries and only then delete the duplicates.
A more general example:
entry1.phone==123
entry1.phone==entry2.phone
entry1.rating!=entry2.rating
entry1.rating==default_value(0)
entry2.rating==5
After the merge:
entry1.phone==123
entry1.rating==5
entry2 is deleted

I don't think you can do this in SQL efficiently. One slow way to do it is something like:
CREATE TEMPORARY TABLE tmp_table (...);
INSERT INTO tmp_table SELECT phone, max(rating) FROM table GROUP BY phone;
TRUNCATE table;
INSERT INTO table SELECT * FROM tmp_table;
A better way would be a stored procedure or an external script. Select all rows from the table ordered by phone and do the grouping/merging/deleting manually (iterate over the results, compare to the phone value from the previous row, if it's different you have a new group, etc.). Writing stored procedures in MySQL is painful though, so I'm not going to write the code for you. :)
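The external-script approach described above could look roughly like the sketch below. It is written in Python and works on plain (phone, rating) tuples rather than a live database; the function name merge_duplicates and the sample rows are made up for illustration, and it assumes "merging" means keeping the highest rating per phone (which is what the question's example with a default of 0 amounts to).

```python
from itertools import groupby
from operator import itemgetter


def merge_duplicates(rows):
    """Collapse rows that share a phone into one row with the highest rating.

    `rows` is an iterable of (phone, rating) tuples. They are sorted by phone
    first so that duplicates sit next to each other, as in the ORDER BY phone
    scan described in the answer.
    """
    merged = []
    ordered = sorted(rows, key=itemgetter(0))
    for phone, group in groupby(ordered, key=itemgetter(0)):
        # A new phone value starts a new group; keep the best rating in it.
        best_rating = max(rating for _, rating in group)
        merged.append((phone, best_rating))
    return merged


# Hypothetical sample data: phone 123 appears twice.
sample_rows = [("123", 0), ("456", 3), ("123", 5), ("789", 0)]
merged_rows = merge_duplicates(sample_rows)
print(merged_rows)  # [('123', 5), ('456', 3), ('789', 0)]
```

In a real script the merged rows would then be written back, for example by updating one row per phone and deleting the rest.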

It sounds like you don't really need to merge any records if you are just trying to update the first record with the non-default rating. I think you can just delete any records with the default rating.
-- Preview the rows that would be deleted
SELECT a.*
FROM tbl a
INNER JOIN tbl b
ON a.Phone = b.Phone
AND a.Rating < b.Rating;
-- Delete them
DELETE a
FROM tbl a
INNER JOIN tbl b
ON a.Phone = b.Phone
AND a.Rating < b.Rating;
If you truly have to update the first record and delete the second record, you can do something similar if you have an autoincrement ID. The next example is what I would do to update the first record if an ID exists. This is only reliable if you only have phone numbers duplicated one time.
UPDATE tbl a
INNER JOIN tbl b
ON a.Phone = b.Phone
AND a.Rating < b.Rating
AND a.ID < b.ID
SET a.Rating = b.Rating;
DELETE b
FROM tbl a
INNER JOIN tbl b
ON a.Phone = b.Phone
AND a.Rating = b.Rating
AND b.ID > a.ID;
Hope this helps.
-Ranthalion
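For phones that are duplicated more than once, one possible variation is to give every row the best rating for its phone and then keep only the lowest ID per phone. The sketch below runs that idea against an in-memory SQLite database from Python, purely so it can be executed anywhere; the sample rows are invented. SQLite accepts these statements as written, whereas MySQL generally refuses a subquery that reads from the table being modified (error 1093), so there the same logic would need joins or a derived table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tbl (ID INTEGER PRIMARY KEY, Phone TEXT, Rating INTEGER);
-- Hypothetical data: phone 123 appears three times.
INSERT INTO tbl (Phone, Rating) VALUES ('123', 0), ('123', 5), ('123', 2), ('456', 4);
""")

# Step 1: every row takes the highest rating found for its phone.
conn.execute("""
UPDATE tbl
SET Rating = (SELECT MAX(b.Rating) FROM tbl b WHERE b.Phone = tbl.Phone)
""")

# Step 2: keep the lowest ID per phone and delete the other copies.
conn.execute("""
DELETE FROM tbl
WHERE ID NOT IN (SELECT MIN(ID) FROM tbl GROUP BY Phone)
""")

deduped = conn.execute("SELECT ID, Phone, Rating FROM tbl ORDER BY ID").fetchall()
print(deduped)  # [(1, '123', 5), (4, '456', 4)]
```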

Related

Delete, Update with derived tables?

I have just studied the FROM clause and derived tables in MySQL, and most websites give examples using the SELECT command:
Example: SELECT * FROM (SELECT * FROM usrs) as u WHERE u.name = 'john'
But when I tried the same with a DELETE or UPDATE command, it did not seem to work.
Example: DELETE FROM (SELECT * FROM usrs) as u WHERE u.name = 'john'
1064 - You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near (SELECT * FROM usrs) as u WHERE u.name = 'john' at line
UPDATE (SELECT * FROM usrs) as u SET u.lname ='smith' WHERE u.name = 'john'
1288 The target table e of the UPDATE is not updatable
So do derived tables not work with DELETE or UPDATE commands? Or is there a way to make it work?
Instead of writing the table name for UPDATE and DELETE, I want to write a subquery that fetches the records and then perform the delete on those records. Is that possible in MySQL?
UPDATED: I have to delete a record and I have three tables; the record may exist in any of them.
My approach: delete from the first table; rows affected? quit; else check the second table; rows affected? quit; else check the third table.
But if I use UNION ALL I could do it this way:
Delete from (select * from tb1 union all select * from tb2 union all select * from tb3) e where e.uname = 'john'
but this query does not seem to work. Could anyone tell me how to delete or update a record when I have more than one table to search? Any help is greatly appreciated.
You can't delete directly from the subquery, but you can still use it if you'd like; you just need to put it in a JOIN:
DELETE usrs
FROM usrs
INNER JOIN (
SELECT * FROM usrs WHERE name = 'john'
) t ON usrs.Id = t.Id
Or you could use IN. MySQL generally rejects a subquery that reads from the table being deleted from (error 1093), so the lookup is wrapped in a derived table:
DELETE FROM usrs
WHERE ID IN (
SELECT ID FROM (
SELECT ID
FROM usrs
WHERE name = 'john'
) t
)
That said, for this example I don't know why you'd want a subquery:
DELETE FROM usrs WHERE name = 'john'
Edit based on comments: to delete from multiple tables at the same time, you can either run multiple DELETE statements or use something like the following:
delete t1, t2, t3
from (select 'john' as usr) t
left join t1 on t.usr=t1.usr
left join t2 on t.usr=t2.usr
left join t3 on t.usr=t3.usr
SQL Fiddle Demo
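The asker's original fallback (delete from the first table, stop if any rows were affected, otherwise try the next one) can also be written as a short client-side loop. The sketch below uses Python with an in-memory SQLite database as a stand-in for MySQL; the function name delete_from_first_match is hypothetical, and the uname column and tb1/tb2/tb3 table names are taken from the question.

```python
import sqlite3


def delete_from_first_match(conn, uname, tables=("tb1", "tb2", "tb3")):
    """Delete `uname` from the first table that contains it.

    Returns the name of the table that lost rows, or None if none matched.
    Table names come from a fixed tuple, never from user input.
    """
    for table in tables:
        cur = conn.execute(f"DELETE FROM {table} WHERE uname = ?", (uname,))
        if cur.rowcount > 0:  # rows affected, so stop here
            conn.commit()
            return table
    return None


conn = sqlite3.connect(":memory:")
for name in ("tb1", "tb2", "tb3"):
    conn.execute(f"CREATE TABLE {name} (uname TEXT)")
# Hypothetical data: 'john' exists in tb2 and tb3, but not in tb1.
conn.execute("INSERT INTO tb2 VALUES ('john')")
conn.execute("INSERT INTO tb3 VALUES ('john')")

hit = delete_from_first_match(conn, "john")
remaining_tb3 = conn.execute("SELECT COUNT(*) FROM tb3").fetchone()[0]
print(hit, remaining_tb3)  # tb2 1
```

Unlike the multi-table DELETE above, this stops at the first table with a match, so the copy in tb3 is left alone.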
Derived tables exist only for the duration of the parent query they're a member of. Assuming that this syntax and the operations were allowed by MySQL, consider what happens:
a) Your main query starts executing
b) the sub-query executes and returns its results as a temporary table
c) the parent update changes that temporary table
d) the parent query finishes
e) temporary tables are cleaned up and deleted
Essentially you'll have done nothing except waste a bunch of cpu cycles and disk bandwidth.
UPDATE queries DO allow you to join against other tables to use in the WHERE clause, e.g.:
UPDATE maintable
LEFT JOIN othertable ON maintable.pk = othertable.fk
SET maintable.somefield='foo'
WHERE othertable.otherfield = 'bar'

Update a field in table A with the max value of field in table B

I want to update a field, let's call it 'field_A', in table 'table_A' with the maximum value that exists of field 'field_B' in 'table_B', but only IF there is a max value for that field 'field_B' in table 'table_B'.
Table 'table_B' has a 'reference' field which contains the 'id' of the table_A record we want to update.
Now I have the following query, which works perfectly.
UPDATE table_A a SET a.field_A = (SELECT MAX(b.field_B)
FROM table_B b WHERE b.reference = a.id)
WHERE a.id IN (
SELECT reference
FROM table_B
GROUP BY reference
HAVING COUNT(reference) > 0
)
So it only updates field_A if records are found for that reference, because I don't want to end up setting field_A to zero when no related records exist.
As I said, this query already works, but it queries table_B twice, which seems inefficient. It can probably be done with a single join, but I can't work out how.
Since this query has to cross-reference a very large number of records, performance really matters here.
With these two nested statements, your UPDATE statement seems quite inefficient to me. Try the SQL statement below; it should do the job.
SQL Fiddle here: http://sqlfiddle.com/#!2/5825a/2
UPDATE table_A a1
JOIN
(
SELECT a.id as id, max(b.field_B) as max_val
FROM
table_A a
LEFT JOIN table_B b ON a.id = b.reference
GROUP BY a.id
) t on a1.id = t.id
SET
a1.field_A = t.max_val
WHERE
(t.max_val IS NOT NULL)
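Whichever form of the UPDATE is used, the behaviour to preserve is that rows of table_A without any match in table_B keep their old value. The sketch below checks that on a tiny in-memory SQLite database from Python. It uses a correlated subquery with EXISTS rather than the MySQL UPDATE ... JOIN syntax above, because that form is portable to SQLite; the sample values are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table_A (id INTEGER PRIMARY KEY, field_A INTEGER);
CREATE TABLE table_B (reference INTEGER, field_B INTEGER);
-- Hypothetical data: id 2 has no rows in table_B.
INSERT INTO table_A (id, field_A) VALUES (1, 10), (2, 20), (3, 30);
INSERT INTO table_B (reference, field_B) VALUES (1, 7), (1, 99), (3, 5);
""")

# Set field_A to the max field_B per reference, only where a match exists.
conn.execute("""
UPDATE table_A
SET field_A = (SELECT MAX(b.field_B) FROM table_B b WHERE b.reference = table_A.id)
WHERE EXISTS (SELECT 1 FROM table_B b WHERE b.reference = table_A.id)
""")

updated = conn.execute("SELECT id, field_A FROM table_A ORDER BY id").fetchall()
print(updated)  # [(1, 99), (2, 20), (3, 5)]
```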

SQL Query to find missing records between two tables and then update the second with the missing records from the first

Two tables, 8 fields in each. Both tables hold the same data, one with 137,002 records (tablea) and one with 135,759 records (tableb). Both tables share a common primary key of three columns (qid, sid, aid).
Is there a single query that will:
1) compare tablea to tableb on the primary key
and
2) if the record is in tablea and not in tableb, copy the record from tablea to tableb?
I would rather update tableb with an SQL query than write a PHP loop that goes through the 137,002 records and compares each one.
Thanks
It should look something like this:
insert into table2 (qid, sid ...)
select
t1.qid,
t1.sid,
...
from table1 t1
where
not exists (select t2.qid, t2.sid, ... from table2 t2 where t2.qid = t1.qid and t2.sid = t1.sid...)
INSERT INTO tableb
SELECT * FROM tablea AS a WHERE NOT EXISTS (SELECT * FROM tableb AS b2 WHERE b2.id = a.id)
Use MERGE, with only the insert branch and no update.
So, the following worked.
insert into f_step_ans (`qid`, `assid`, `sid`, `sas`, `cas`, `etim`, `stim`, `endtim`, `fil`)
select
t1.qid,
t1.assid,
t1.sid,
t1.sas,
t1.cas,
t1.etim,
t1.stim,
t1.endtim,
t1.fil
from f_step_ans_back t1
where
not exists (select t2.qid, t2.sid,t2.assid from f_step_ans as t2 where t2.qid = t1.qid and t2.assid = t1.assid and t2.sid = t1.sid)
1,588 records were moved from the f_step_ans_back table (old backup table) to the f_step_ans table (partially recovered backup + new data). Reporting shows that everything is working like it should be. Thank you all for the help.
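The same INSERT ... SELECT ... WHERE NOT EXISTS pattern can be tried out on a small scale before running it against the real tables. The sketch below does so from Python with an in-memory SQLite database; the tablea/tableb names and the (qid, sid, aid) key come from the question, while the single val column and the sample rows are stand-ins for the real eight fields.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tablea (qid INTEGER, sid INTEGER, aid INTEGER, val TEXT,
                     PRIMARY KEY (qid, sid, aid));
CREATE TABLE tableb (qid INTEGER, sid INTEGER, aid INTEGER, val TEXT,
                     PRIMARY KEY (qid, sid, aid));
-- Hypothetical data: tableb is missing two of tablea's three rows.
INSERT INTO tablea VALUES (1, 1, 1, 'a'), (1, 1, 2, 'b'), (2, 1, 1, 'c');
INSERT INTO tableb VALUES (1, 1, 1, 'a');
""")

before = conn.execute("SELECT COUNT(*) FROM tableb").fetchone()[0]

# Copy only the rows whose composite key is not yet present in tableb.
conn.execute("""
INSERT INTO tableb (qid, sid, aid, val)
SELECT a.qid, a.sid, a.aid, a.val
FROM tablea a
WHERE NOT EXISTS (
    SELECT 1 FROM tableb b
    WHERE b.qid = a.qid AND b.sid = a.sid AND b.aid = a.aid
)
""")

after = conn.execute("SELECT COUNT(*) FROM tableb").fetchone()[0]
copied = after - before
print(copied, after)  # 2 3
```

Running the INSERT a second time copies nothing, since every key from tablea is then already present in tableb.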

Eliminating duplicates from SQL query

What would be the best way to return one item for each id instead of every matching row in the table? Currently the query below returns all manufacturers:
SELECT m.name
FROM `default_ps_products` p
INNER JOIN `default_ps_products_manufacturers` m ON p.manufacturer_id = m.id
I have solved my question by using the DISTINCT keyword in my query:
SELECT DISTINCT m.name, m.id
FROM `default_ps_products` p
INNER JOIN `default_ps_products_manufacturers` m ON p.manufacturer_id = m.id
ORDER BY m.name
There are 4 main ways I can think of to delete duplicate rows.
Method 1
Delete all rows whose rowid is bigger than the smallest (or less than the greatest) rowid for the same key. Example:
delete from tableName a where rowid > (select min(rowid) from tableName b where a.key=b.key and a.key2=b.key2)
Method 2
Usually faster, but you must recreate all indexes, constraints and triggers afterward.
Pull all rows as distinct into a new table, then drop the first table and rename the new table to the old table name. Example:
create table t2 as select distinct * from t1; drop table t1; rename t2 to t1;
Method 3
Delete using WHERE EXISTS based on rowid. Example:
delete from tableName a where exists(select 'x' from tableName b where a.key1=b.key1 and a.key2=b.key2 and b.rowid > a.rowid)
Note: if a column can contain nulls, use nvl on the column name.
Method 4
Collect the first row for each key value and delete the rows not in this set. Example:
delete from tableName a where rowid not in (select min(rowid) from tableName b group by key1, key2)
Note that you don't have to use nvl for method 4.
Using DISTINCT is often a bad practice. It may be a sign that there is something wrong with your SELECT statement, or that your data structure is not normalized.
In your case I would use this (assuming that default_ps_products_manufacturers has unique records):
SELECT m.id, m.name
FROM default_ps_products_manufacturers m
WHERE EXISTS (SELECT 1 FROM default_ps_products p WHERE p.manufacturer_id = m.id)
Or an equivalent query with IN:
SELECT m.id, m.name
FROM default_ps_products_manufacturers m
WHERE m.id IN (SELECT p.manufacturer_id FROM default_ps_products p)
One caveat: among all the possible queries, it is better to pick the one with the best execution plan, which may depend on your vendor and on the physical structure, statistics, etc. of your database.
I think in most cases EXISTS will work better.
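To see that the DISTINCT join and the EXISTS form return the same manufacturers, a quick check on toy data can help. The sketch below runs both from Python against an in-memory SQLite database; the table and column names come from the question, and the manufacturer names and rows are made up. It only compares results and says nothing about which plan is faster on a given server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE default_ps_products_manufacturers (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE default_ps_products (id INTEGER PRIMARY KEY, manufacturer_id INTEGER);
-- Hypothetical data: Acme has two products, Crest has none.
INSERT INTO default_ps_products_manufacturers VALUES (1, 'Acme'), (2, 'Bolt'), (3, 'Crest');
INSERT INTO default_ps_products VALUES (1, 1), (2, 1), (3, 2);
""")

distinct_rows = conn.execute("""
SELECT DISTINCT m.id, m.name
FROM default_ps_products p
INNER JOIN default_ps_products_manufacturers m ON p.manufacturer_id = m.id
ORDER BY m.name
""").fetchall()

exists_rows = conn.execute("""
SELECT m.id, m.name
FROM default_ps_products_manufacturers m
WHERE EXISTS (SELECT 1 FROM default_ps_products p WHERE p.manufacturer_id = m.id)
ORDER BY m.name
""").fetchall()

print(exists_rows)                   # [(1, 'Acme'), (2, 'Bolt')]
print(distinct_rows == exists_rows)  # True
```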

INSERT INTO SELECT WHERE col = col

Please can someone help me. I have a table of users and another table with users and date-times (this is a log table, and multiple dates exist per user). I need to take the most recent date from the log table and insert it into the first table next to the same user.
This is what I have, but it's not working:
INSERT INTO tb1 n (DT)
SELECT w.DT
FROM tb2 w
WHERE w.User = n.User
ORDER BY w.DT DESC
limit 1
You don't need an INSERT statement here, since the records are already present in your table. Instead, UPDATE it with a JOIN:
UPDATE tb1 a
INNER JOIN
(
SELECT user, MAX(DT) maxDT
FROM tb2
GROUP by user
) b ON a.user = b.user
SET a.DT = b.maxDT
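The effect of that update can be checked on a few rows. The sketch below does this from Python with an in-memory SQLite database, using a correlated subquery instead of the MySQL UPDATE ... JOIN syntax so that it runs on SQLite as well. The tb1/tb2, User and DT names come from the question; the user names and timestamps are invented, and dates are stored as ISO-formatted text so that MAX picks the latest one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tb1 ("User" TEXT PRIMARY KEY, DT TEXT);
CREATE TABLE tb2 ("User" TEXT, DT TEXT);
-- Hypothetical data: 'cy' has no log entries.
INSERT INTO tb1 ("User") VALUES ('ann'), ('bob'), ('cy');
INSERT INTO tb2 VALUES
    ('ann', '2013-01-05 10:00:00'),
    ('ann', '2013-02-01 09:30:00'),
    ('bob', '2013-01-20 08:00:00');
""")

# Copy each user's most recent log date into tb1; users without logs are skipped.
conn.execute("""
UPDATE tb1
SET DT = (SELECT MAX(w.DT) FROM tb2 w WHERE w."User" = tb1."User")
WHERE EXISTS (SELECT 1 FROM tb2 w WHERE w."User" = tb1."User")
""")

latest = conn.execute('SELECT "User", DT FROM tb1 ORDER BY "User"').fetchall()
print(latest)
# [('ann', '2013-02-01 09:30:00'), ('bob', '2013-01-20 08:00:00'), ('cy', None)]
```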