MySQL: delete older duplicates

I have a table with these columns:
id, archive_id, ean, index, date, (...)
Some rows share the same archive id and the same ean but have a different index.
In that case I want to delete the older row (based on date), so that each archive_id/index combination ends up with at most one row.

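The following (untested) should work in principle. Note that MySQL rejects a subquery that reads from the table being deleted from (error 1093), so on MySQL a multi-table DELETE with a self-join is the usual workaround:
DELETE FROM someTable WHERE EXISTS ( SELECT id FROM someTable AS subqTable WHERE
subqTable.archive_id = someTable.archive_id
AND subqTable.ean = someTable.ean
-- and other equality comparisons
AND subqTable.date > someTable.date)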
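To try the EXISTS idea without a MySQL server, here is a minimal sketch in Python using the built-in sqlite3 module. SQLite is only a stand-in here (it accepts a correlated subquery on the table being deleted from), and the table name `items`, the helper names and the sample rows are all made up for illustration.

```python
import sqlite3


def delete_older_duplicates(conn):
    """Delete every row for which a newer row with the same archive_id and ean exists."""
    conn.execute(
        """
        DELETE FROM items
        WHERE EXISTS (
            SELECT 1
            FROM items AS newer
            WHERE newer.archive_id = items.archive_id
              AND newer.ean = items.ean
              AND newer.date > items.date
        )
        """
    )
    conn.commit()


def demo_delete_older():
    # Hypothetical sample data; SQLite stands in for MySQL.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE items (id INTEGER PRIMARY KEY, archive_id INTEGER, ean TEXT, date TEXT)"
    )
    conn.executemany(
        "INSERT INTO items (archive_id, ean, date) VALUES (?, ?, ?)",
        [
            (1, "A", "2018-01-01"),  # id 1: older duplicate of id 2
            (1, "A", "2018-02-01"),  # id 2: newest for (1, A)
            (1, "B", "2018-01-15"),  # id 3: no duplicate
            (2, "A", "2018-03-01"),  # id 4: newest for (2, A)
            (2, "A", "2018-01-01"),  # id 5: older duplicate of id 4
        ],
    )
    delete_older_duplicates(conn)
    return conn.execute("SELECT id FROM items ORDER BY id").fetchall()
```

Calling `demo_delete_older()` leaves ids 2, 3 and 4. Rows that tie on date are both kept, so a tie-breaker on id would be needed if exact ties can occur.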

DELETE duplicates.*
FROM _table
JOIN _table AS duplicates
ON (_table.archive_id = duplicates.archive_id AND _table.index = duplicates.index)
WHERE duplicates.date < _table.date;

delete t1
from your_table t1
left join
(
select archive_id, ean, max(date) as mdate -- newest date per group; rows that do not match it are deleted
from your_table
group by archive_id, ean
) t2 on t1.archive_id = t2.archive_id
and t1.ean = t2.ean
and t1.date = t2.mdate
where t2.mdate is null

Related

Delete duplicate rows in MySQL in same table

I have this script running to check for duplicates in my table:
select s.id, t.*
from [stuff] s
join (
select name, city, count(*) as qty
from [stuff]
group by name, city
having count(*) > 1
) t on s.name = t.name and s.city = t.city
This works fine and returns the IDs of the duplicate rows:
myresult = cur.fetchall()
print(myresult)
Example output:
[(84,), (85,), (339,), (340,), (351,), (352,), (416,), (417,), (511,), (512,), (532,), (533,),
(815,), (816,), (978,), (979,), (1075,), (1076,), (1385,), (1386,), (1512,)]
Now I want to delete records 84, 339, 351, 416, etc.
What would be the most convenient way to do so?
MySQL provides you with the DELETE JOIN statement that allows you to remove duplicate rows quickly.
The following statement deletes duplicate rows and keeps the highest id:
DELETE t1 FROM table_name t1
INNER JOIN table_name t2
WHERE
t1.id < t2.id AND
t1.unique_col = t2.unique_col;
In case you want to delete duplicate rows and keep the lowest id, you can use the following statement:
DELETE t1 FROM table_name t1
INNER JOIN table_name t2
WHERE
t1.id > t2.id AND
t1.unique_col = t2.unique_col;
You can also remove duplicate rows in MySQL this way, keeping the lowest customer_id in each group:
DELETE FROM CUSTOMERS
WHERE customer_id NOT IN
(
SELECT
customer_id
FROM
(
SELECT MIN(customer_id) as customer_id
FROM CUSTOMERS
GROUP BY first_name, last_name, phone
) AS duplicate_customer_ids
);
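Since the question already drives the database from Python, here is a rough sketch of the "keep one id per group" delete run through a DB-API cursor. It keeps the highest id, which matches the ids the question wants removed (84, 339, ...). SQLite stands in for MySQL, and the `stuff` rows and function names are invented for the example.

```python
import sqlite3


def delete_duplicates_keep_highest_id(conn):
    """Delete all rows except the highest id in each (name, city) group."""
    cur = conn.execute(
        """
        DELETE FROM stuff
        WHERE id NOT IN (
            SELECT keep_id
            FROM (
                SELECT MAX(id) AS keep_id
                FROM stuff
                GROUP BY name, city
            ) AS keepers
        )
        """
    )
    conn.commit()
    return cur.rowcount  # number of rows deleted


def demo_dedupe():
    # Hypothetical sample data; SQLite stands in for MySQL.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE stuff (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
    conn.executemany(
        "INSERT INTO stuff (id, name, city) VALUES (?, ?, ?)",
        [
            (1, "Ann", "Oslo"),
            (2, "Ann", "Oslo"),
            (3, "Bob", "Rome"),
            (4, "Cid", "Lima"),
            (5, "Cid", "Lima"),
        ],
    )
    deleted = delete_duplicates_keep_highest_id(conn)
    remaining = conn.execute("SELECT id FROM stuff ORDER BY id").fetchall()
    return deleted, remaining
```

The extra derived table (`keepers`) is the same trick as in the statement above: on MySQL it avoids reading directly from the table being deleted from.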

MySQL - Return only first row having the same unique ids

Consider the following table:
As shown in the image, I want to return all the data from only the first row for each distinct id. How can I achieve that in MySQL?
You can filter with a subquery. Assuming that by first you mean the row with the earlier start_time, that would be:
select t.*
from mytable t
where t.start_time = (
select min(t1.start_time) from mytable t1 where t1.call_unique_id = t.call_unique_id
)
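Alternatively, join against the lowest id per call_unique_id (this assumes an auto-increment id column, so that the lowest id is the first row):
select t1.*
from your_table t1
join
(
select min(id) as id
from your_table
group by call_unique_id
) t2 on t1.id = t2.id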
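GROUP BY can also do the job, but only when ONLY_FULL_GROUP_BY is disabled (it is enabled by default since MySQL 5.7.5), and the row returned for each group is then arbitrary rather than guaranteed to be the first. So try:
select * from your_table group by call_unique_id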
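As a quick check of the correlated-subquery filter from the first answer, here is a small sketch in Python with sqlite3 standing in for MySQL. The `note` column and the sample rows are hypothetical; only `call_unique_id` and `start_time` come from the question.

```python
import sqlite3


def first_rows(conn):
    """Return the row with the earliest start_time for each call_unique_id."""
    return conn.execute(
        """
        SELECT t.call_unique_id, t.start_time, t.note
        FROM mytable AS t
        WHERE t.start_time = (
            SELECT MIN(t1.start_time)
            FROM mytable AS t1
            WHERE t1.call_unique_id = t.call_unique_id
        )
        ORDER BY t.call_unique_id
        """
    ).fetchall()


def demo_first_rows():
    # Hypothetical sample data; SQLite stands in for MySQL.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE mytable (call_unique_id TEXT, start_time TEXT, note TEXT)")
    conn.executemany(
        "INSERT INTO mytable VALUES (?, ?, ?)",
        [
            ("a", "2020-01-01 10:00:00", "first a"),
            ("a", "2020-01-01 10:05:00", "second a"),
            ("b", "2020-01-01 09:00:00", "only b"),
            ("c", "2020-01-01 11:30:00", "second c"),
            ("c", "2020-01-01 11:00:00", "first c"),
        ],
    )
    return first_rows(conn)
```

Unlike the bare GROUP BY, this returns a well-defined row per id. If two rows of the same id share the earliest start_time, both come back.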

How to SELECT date and time within that date?

I have read a few articles about this. "Select max date, then max time" seems the most helpful, but I do not see how to apply it.
There are five tables, which I join. I need to select only the one row with the highest date and the highest time from the first table, the same from the second table, and join the rest on some other value. With the code I wrote I get multiple rows, so the time selection seems to be wrong.
It might be doable with a subquery inside a subquery. I've tried something like this:
SELECT * from table1
INNER JOIN table2 ON table1.date = table2.date AND table1.gm = table2.gm
INNER JOIN table3 ON table2.gm = table3.gm ...
WHERE table3.date = :date AND table4.date = :date ...
AND table1.date IN(
SELECT MAX(table1.date) FROM table1 WHERE table1.time IN(
SELECT MAX(table1.time) FROM table1
)
)
AND table2.date IN(
SELECT MAX(table2.date) FROM table1 WHERE table2.time IN(
SELECT MAX(table2.time) FROM table2 )
)
ORDER BY table1.id
The question is:
How do I get a single row after joining all of this, where the date is the highest and the time is the highest on that date?
Thanks!
EDIT: I am sorry, I forgot to say that I need the max time of the max date for each specific value of the gm columns (table1.gm, table2.gm, ... in the example I gave). So I need one row for each .gm value, which is the same in every table, not just one row altogether. The solutions Nick and Salim provided work, but they did not solve my problem.
EDIT: SOLVED! After implementing Nick's solution I just needed to add GROUP BY cntrs_reper.gm_company_no, cntrs_reper.date.
And that's it: for every row in one table, the entries with the highest date and time from the others! Thanks to all.
EDIT: In case it helps, this is the full query:
SELECT cntrs_gm.gm_company_no AS company_c_g,
bns_gms.ded_bns AS ded_bns_gms,
bns_gms.no_ded_bns AS no_ded_bns_gms,
bns_gms.wag_ded_bns AS wag_ded_bns_gms,
cntrs_gm.cur_credit AS cur_credit_c_g,
cntrs_gm.cdrop AS cdrop_c_g,
cntrs_gm.total_jp AS total_jp_c_g,
cntrs_gm.games AS games_c_g,
cntrs_gm.wgames AS wgames_c_g,
cntrs_gm.doors AS doors_c_g,
cntrs_gm.power AS power_c_g,
cntrs_gm.total_in AS total_in_c_g,
cntrs_gm.total_out AS total_out_c_g,
cntrs_gm.total_acc AS total_acc_c_g,
cntrs_gm.total_bet AS total_bet_c_g,
cntrs_gm.total_win AS total_win_c_g,
cntrs_gm.total_bonus AS total_bonus_c_g,
cntrs_gm.date AS date_c_g,
cntrs_reper.gm_company_no AS company_reper,
bns_reper.ded_bns AS ded_bns_reper,
bns_reper.no_ded_bns AS no_ded_bns_reper,
bns_reper.wag_ded_bns AS wag_ded_bns_reper,
cntrs_reper.cur_credit AS cur_credit_reper,
cntrs_reper.cdrop AS cdrop_reper,
cntrs_reper.total_jp AS total_jp_reper,
cntrs_reper.games AS games_reper,
cntrs_reper.wgames AS wgames_reper,
cntrs_reper.doors AS doors_reper,
cntrs_reper.power AS power_reper,
cntrs_reper.total_in AS total_in_reper,
cntrs_reper.total_out AS total_out_reper,
cntrs_reper.total_acc AS total_acc_reper,
cntrs_reper.total_bet AS total_bet_reper,
cntrs_reper.total_win AS total_win_reper,
cntrs_reper.total_bonus AS total_bonus_reper,
cntrs_reper.date AS date_reper,
cntrs_reper.time AS time_reper,
bns_reper.time AS time_c_g,
gms_cfg.gm_no AS machine_id,
gms_cfg.denom_cin AS machine_cin
FROM bns_gms
INNER JOIN cntrs_gm
ON bns_gms.gm_company_no = cntrs_gm.gm_company_no AND bns_gms.date = cntrs_gm.date
INNER JOIN bns_reper
ON cntrs_gm.gm_company_no = bns_reper.gm_company_no
INNER JOIN cntrs_reper
ON bns_reper.gm_company_no = cntrs_reper.gm_company_no AND bns_reper.date = cntrs_reper.date
INNER JOIN gms_cfg
ON cntrs_reper.gm_company_no = gms_cfg.gm_no
WHERE bns_reper.date IN(
SELECT MAX(DATE(bns_reper.date)) FROM bns_reper WHERE bns_reper.time IN(
SELECT MAX(TIME(bns_reper.time)) FROM bns_reper
)
)
AND cntrs_reper.date IN(
SELECT MAX(DATE(cntrs_reper.date)) FROM cntrs_reper WHERE cntrs_reper.time IN(
SELECT MAX(TIME(cntrs_reper.time)) FROM cntrs_reper
)
)
ORDER BY cntrs_gm.gm_company_no
DB example
bns_gms
bns_reper
cntrs_gm
cntrs_reper
gms_cfg
The problem with your current query is that it will select all rows where table1.date is the latest date on which the highest time occurs, which may well be more than one e.g. for data such as
id  date        time
1   2018-03-30  18:40
2   2018-03-31  12:20
3   2018-03-31  19:20
Your WHERE clause:
table1.date IN(
SELECT MAX(table1.date) FROM table1 WHERE table1.time IN(
SELECT MAX(table1.time) FROM table1
)
)
will select rows with id=2 and id=3 as they both have date = '2018-03-31' which is when the maximum time occurs.
What you want to do is select the row which has the latest time on the latest date, for which you could use
table1.date = (SELECT MAX(date) FROM table1) AND
table1.time = (SELECT MAX(time) FROM table1 WHERE date = (SELECT MAX(date) FROM table1))
By using aliasing, that can be simplified (since we already know table1.date = MAX(date) FROM table1) to
table1.date = (SELECT MAX(date) FROM table1) AND
table1.time = (SELECT MAX(time) FROM table1 AS t1 WHERE t1.date = table1.date)
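The difference between the two conditions can be checked against the three sample rows above. This is only a sketch: Python's sqlite3 stands in for MySQL, dates and times are stored as text, and the function names are made up.

```python
import sqlite3


def original_condition(conn):
    """The question's filter: every row on the date where the overall max time occurs."""
    return conn.execute(
        """
        SELECT id FROM table1
        WHERE date IN (
            SELECT MAX(date) FROM table1
            WHERE time IN (SELECT MAX(time) FROM table1)
        )
        ORDER BY id
        """
    ).fetchall()


def latest_on_latest_date(conn):
    """The suggested filter: the latest time on the latest date."""
    return conn.execute(
        """
        SELECT id FROM table1
        WHERE date = (SELECT MAX(date) FROM table1)
          AND time = (SELECT MAX(t1.time) FROM table1 AS t1 WHERE t1.date = table1.date)
        ORDER BY id
        """
    ).fetchall()


def make_sample():
    # Same three rows as in the example data above; SQLite stands in for MySQL.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE table1 (id INTEGER PRIMARY KEY, date TEXT, time TEXT)")
    conn.executemany(
        "INSERT INTO table1 VALUES (?, ?, ?)",
        [
            (1, "2018-03-30", "18:40"),
            (2, "2018-03-31", "12:20"),
            (3, "2018-03-31", "19:20"),
        ],
    )
    return conn
```

With this data `original_condition` returns ids 2 and 3, while `latest_on_latest_date` returns only id 3.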
I don't have MySQL but here is the general idea you can use. I don't have enough points to write a comment so I am responding as a reply. Essentially make a subquery/inline view for each table to select max of a column, then join those subqueries/inline views together.
Here is Oracle syntax. You can convert it to ANSI syntax.
select table1.column1, table2.column2,table3.column3
from
(select id1, max(column1) as column1 from table1 group by id1) table1,
(select id2, max(column2) as column2 from table2 group by id2) table2,
(select id3, max(column3) as column3 from table3 group by id3) table3
where
table1.id1 = table2.id2
and table1.id1 = table3.id3
;

using the same field within a subquery of mysql

How can I make something like this work?
INSERT INTO age.page(domain,title_count,youtube_count,ipaddress,updated)
SELECT * FROM
(
SELECT domain,
COUNT(domain) AS titlecount,
(SELECT COUNT(*) FROM table2 WHERE title = table1.title) AS YoutubeCount, ipaddress,
NOW() AS timeNow
FROM table1
GROUP BY domain
ORDER BY title DESC
) a;
I want to use a subquery to get a count from a different table, using the same field from the main query.
The reason is that I want to run a single query instead of two.
You can do this COUNT in a subquery and then JOIN it with the first table:
INSERT INTO age.page(domain, title_count, youtube_count, ipaddress, updated)
SELECT * FROM
(
SELECT
domain,
COUNT(domain) AS titlecount,
t2.titlecount AS YoutubeCount,
ipaddress,
NOW() AS timeNow
FROM table1
INNER JOIN
(
SELECT title, COUNT(*) Titlecount
FROM table2
GROUP BY title
) AS t2 ON t2.title = table1.title
GROUP BY domain
ORDER BY table1.title DESC
) a;
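The "aggregate first, then join" pattern can be seen in isolation in the sketch below. It is not the exact query above: it only selects (no INSERT), it groups by domain and title so that every selected column is well defined, and it runs on Python's sqlite3 rather than MySQL. The sample rows and function names are invented.

```python
import sqlite3


def counts_per_domain(conn):
    """Count table1 rows per (domain, title) and attach the per-title count from table2."""
    return conn.execute(
        """
        SELECT t1.domain,
               t1.title,
               COUNT(*) AS title_count,
               yt.youtube_count
        FROM table1 AS t1
        INNER JOIN (
            SELECT title, COUNT(*) AS youtube_count
            FROM table2
            GROUP BY title
        ) AS yt ON yt.title = t1.title
        GROUP BY t1.domain, t1.title, yt.youtube_count
        ORDER BY t1.domain
        """
    ).fetchall()


def demo_counts():
    # Hypothetical sample data; SQLite stands in for MySQL.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE table1 (domain TEXT, title TEXT, ipaddress TEXT)")
    conn.execute("CREATE TABLE table2 (title TEXT)")
    conn.executemany(
        "INSERT INTO table1 VALUES (?, ?, ?)",
        [
            ("a.com", "cats", "1.1.1.1"),
            ("a.com", "cats", "1.1.1.1"),
            ("b.com", "dogs", "2.2.2.2"),
        ],
    )
    conn.executemany(
        "INSERT INTO table2 VALUES (?)",
        [("cats",), ("cats",), ("cats",), ("dogs",)],
    )
    return counts_per_domain(conn)
```

Because the derived table has exactly one row per title, the join does not multiply the rows of table1, so `COUNT(*)` still counts table1 rows only. Titles with no match in table2 are dropped by the INNER JOIN; a LEFT JOIN would keep them.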

Diff value last two record by datetime

I have a table with id, item_id, value (int), run (datetime) and I need to select the difference in value between the last two runs per *item_id*.
SELECT item_id, ABS(value1 - value2) AS diff
FROM ( SELECT h.item_id, h.value AS value1, h2.value AS value2
FROM ( SELECT id, item_id, value
FROM table_name
GROUP BY item_id
ORDER BY run DESC) AS h
INNER JOIN ( SELECT id, item_id, value
FROM table_name
ORDER BY run DESC) AS h2
ON h.item_id = h2.item_id AND h.id != h2.id
GROUP BY item_id) AS h3
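I believe this should do the trick for you. Just replace table_name with the correct name.
Explanation:
Basically, I join the table with itself in run DESC order, joining on item_id but requiring different ids. Then I GROUP BY item_id again to drop any third and later rows. Finally, I calculate the difference between them with ABS(value1 - value2). Note that this relies on MySQL picking the first row of an ordered derived table for each group, which is not guaranteed and is rejected when ONLY_FULL_GROUP_BY is enabled.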
SELECT t2.id, t2.item_id, (t2.value- t1.value) valueDiff, t2.run
FROM ( table_name AS t1
INNER JOIN
table_name AS t2
ON t1.run = (SELECT MAX(run) FROM table_name WHERE item_id = t2.item_id AND run < t2.run)
and t1.item_id = t2.item_id)
This assumes you want the difference between each record and the record from the previous run of the same item.
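To restrict the self-join idea to just the last two runs per item, one possible sketch is shown below, again in Python with sqlite3 standing in for MySQL. The table name `item_runs`, the aliases and the sample values are hypothetical, and the result is the signed difference (latest minus previous); wrap it in ABS() if only the magnitude matters.

```python
import sqlite3


def last_two_diff(conn):
    """For each item_id, return latest value minus the value of the run before it."""
    return conn.execute(
        """
        SELECT latest.item_id, latest.value - previous.value AS diff
        FROM item_runs AS latest
        JOIN item_runs AS previous
          ON previous.item_id = latest.item_id
         AND previous.run = (
                SELECT MAX(run) FROM item_runs
                WHERE item_id = latest.item_id AND run < latest.run
             )
        WHERE latest.run = (
            SELECT MAX(run) FROM item_runs WHERE item_id = latest.item_id
        )
        ORDER BY latest.item_id
        """
    ).fetchall()


def demo_last_two_diff():
    # Hypothetical sample data; SQLite stands in for MySQL.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE item_runs (id INTEGER PRIMARY KEY, item_id INTEGER, value INTEGER, run TEXT)"
    )
    conn.executemany(
        "INSERT INTO item_runs (item_id, value, run) VALUES (?, ?, ?)",
        [
            (1, 10, "2020-01-01 00:00:00"),
            (1, 15, "2020-01-02 00:00:00"),
            (1, 12, "2020-01-03 00:00:00"),  # latest for item 1: 12 - 15
            (2, 5, "2020-01-01 00:00:00"),
            (2, 9, "2020-01-02 00:00:00"),   # latest for item 2: 9 - 5
            (3, 7, "2020-01-01 00:00:00"),   # single run: no previous row, so no result
        ],
    )
    return last_two_diff(conn)
```

Items with only one run produce no row because the inner join finds no previous run for them.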