Slow MySQL distinct with where - mysql

I have a large table that has two columns (among others):
event_date
country
This query is very fast:
select distinct event_date from my_table
This query is also very fast:
select * from my_table where country = 'US'
However, this query is very slow:
select distinct event_date from my_table where country = 'US'
I tried adding all combinations of indexes, including one on both columns. Nothing makes the third query faster.
Any insights?

Have you tried staging the results in a temporary table, adding an index, then completing the query from there? Not sure if this will work in MySQL, but it's a trick I use successfully in MSSQL quite often.
CREATE TEMPORARY TABLE IF NOT EXISTS staging AS (
SELECT event_date FROM my_table WHERE country = 'US'
);
CREATE INDEX ix_date ON staging(event_date);
SELECT DISTINCT event_date FROM staging;

ALTER TABLE my_table ADD INDEX my_idx (event_date, country);
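
One way to check whether a composite index is actually picked up is to look at the query plan. The sketch below uses Python's built-in sqlite3 module as a stand-in for MySQL, purely so that it is self-contained; the index name ix_country_date, the payload column and the sample rows are made up. It puts the filter column first so that all 'US' entries sit next to each other in the index, already ordered by event_date. On MySQL the comparable check would be running EXPLAIN on the slow query, and the optimizer there may well decide differently.

```python
import sqlite3

def distinct_dates_with_plan():
    """Return (distinct US dates, query-plan text) for the slow query shape."""
    conn = sqlite3.connect(":memory:")  # SQLite stand-in, not MySQL
    conn.execute("CREATE TABLE my_table (event_date TEXT, country TEXT, payload TEXT)")
    rows = [
        ("2020-01-01", "US", "a"),
        ("2020-01-01", "US", "b"),
        ("2020-01-02", "US", "c"),
        ("2020-01-02", "DE", "d"),
        ("2020-01-03", "DE", "e"),
    ]
    conn.executemany("INSERT INTO my_table VALUES (?, ?, ?)", rows)
    # Hypothetical index: filter column first, DISTINCT column second
    conn.execute("CREATE INDEX ix_country_date ON my_table (country, event_date)")

    query = "SELECT DISTINCT event_date FROM my_table WHERE country = 'US'"
    dates = sorted(r[0] for r in conn.execute(query))
    # The last column of each EXPLAIN QUERY PLAN row is the human-readable detail
    plan = " | ".join(r[-1] for r in conn.execute("EXPLAIN QUERY PLAN " + query))
    conn.close()
    return dates, plan

dates, plan = distinct_dates_with_plan()
print(dates)
print(plan)
```

If the plan text does not mention the index, the engine is scanning the table instead, which would be the first thing to rule out.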

Related

Optimization of specific very slow query

I wrote this query after optimizing a PHP script that did the same thing using 3 different queries and 2 while loops; that PHP script took over 6 hours to run.
So I've compressed everything into a single query that does the same job without any loops:
DELETE table FROM table WHERE id IN (
SELECT id from(
SELECT MAX(data_elab) as data_elab_new, count(*) as volte,t1.* FROM (
SELECT * from table ORDER BY data_elab DESC
)t1
group by cod_dl,issn,variante,add_on having volte>1
)t2
);
Note: the server is very old (Windows, 3 GB of RAM, 32-bit); the table is 204 MB with 100,000 rows and 20 columns; only id is a primary key, and there are no other indexes.
This SELECT on its own took only 20 seconds; the DELETE is the problem:
SELECT id from(
SELECT MAX(data_elab) as data_elab_new, count(*) as volte,t1.* FROM (
SELECT * from table ORDER BY data_elab DESC
)t1
group by cod_dl,issn,variante,add_on having volte>1
)t2
I expected this to speed up the operation a lot, but after more than two hours the query still has not completed and keeps running.
Any advice on how to optimize this query, or did I do something wrong in it?
Thank you.
Assuming data_elab is never repeated for any combination of cod_dl, issn, variante, add_on (I am assuming that is what "univocal" means), this is the form the query you need should take:
DELETE table
FROM table
WHERE (cod_dl, issn, variante, add_on, data_elab) IN (
SELECT cod_dl, issn, variante, add_on, MAX(data_elab) as data_elab_max
FROM table
GROUP BY cod_dl, issn, variante, add_on
HAVING COUNT(*) > 1
);
As MySQL doesn't tend to like DELETEing and SELECTing from the same table in a query, you might have to do some tweaking, something like:
DELETE table
FROM table
WHERE (cod_dl, issn, variante, add_on, data_elab) IN (
SELECT extraLayerOfIndirection.*
FROM (
SELECT cod_dl, issn, variante, add_on, MAX(data_elab) as data_elab_max
FROM table
GROUP BY cod_dl, issn, variante, add_on
HAVING COUNT(*) > 1
) AS extraLayerOfIndirection
);
Also, it's not quite the same, but you might want to consider this instead:
DELETE table
FROM table
WHERE (cod_dl, issn, variante, add_on, data_elab) NOT IN (
SELECT extraLayerOfIndirection.*
FROM (
SELECT cod_dl, issn, variante, add_on, MIN(data_elab) as data_elab_min
FROM table
GROUP BY cod_dl, issn, variante, add_on
) AS extraLayerOfIndirection
);
Instead of only deleting the last of each grouping, this deletes all but the first for each grouping. If you have a lot of repeats, and only want to preserve the first for each anyway, this could result in a much smaller result from the subquery.
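
To make the difference between the two variants concrete, here is a small sketch that runs both against the same sample data. It uses Python's built-in sqlite3 module as a stand-in for MySQL (SQLite 3.15 or newer is assumed for the row-value IN syntax), the table is called records because table is a reserved word, and the grouping is cut down to two columns to keep it short. All of the sample rows are invented.

```python
import sqlite3

SAMPLE_ROWS = [
    (1, "A", "X", "2020-01-01"),
    (2, "A", "X", "2020-02-01"),
    (3, "A", "X", "2020-03-01"),
    (4, "B", "Y", "2020-01-15"),  # this group has no duplicates
]

# Variant 1: delete only the latest row of each group that has duplicates
DELETE_LATEST_OF_DUPLICATE_GROUPS = """
DELETE FROM records
WHERE (cod_dl, issn, data_elab) IN (
    SELECT cod_dl, issn, MAX(data_elab)
    FROM records
    GROUP BY cod_dl, issn
    HAVING COUNT(*) > 1
)
"""

# Variant 2: delete everything except the earliest row of each group
DELETE_ALL_BUT_EARLIEST = """
DELETE FROM records
WHERE (cod_dl, issn, data_elab) NOT IN (
    SELECT cod_dl, issn, MIN(data_elab)
    FROM records
    GROUP BY cod_dl, issn
)
"""

def remaining_ids(delete_sql):
    """Load the sample rows into a fresh table, run one DELETE, return surviving ids."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE records ("
        "id INTEGER PRIMARY KEY, cod_dl TEXT, issn TEXT, data_elab TEXT)"
    )
    conn.executemany("INSERT INTO records VALUES (?, ?, ?, ?)", SAMPLE_ROWS)
    conn.execute(delete_sql)
    ids = [r[0] for r in conn.execute("SELECT id FROM records ORDER BY id")]
    conn.close()
    return ids

print(remaining_ids(DELETE_LATEST_OF_DUPLICATE_GROUPS))
print(remaining_ids(DELETE_ALL_BUT_EARLIEST))
```

With three rows in the same group, the first variant still leaves two of them behind, so it would have to be run repeatedly to fully deduplicate; the second one finishes in a single pass.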

how to optimize a query relating group by and order by

SELECT *
FROM `table_name`
WHERE `id` IN ( SELECT MAX(`id`) FROM `table_name` GROUP BY `name` )
How can we optimize this query?
I would suggest writing the query as:
select t.*
from table_name t
where t.id = (select max(t2.id) from table_name t2 where t2.name = t.name);
Then you want an index on table_name(name, id):
create index idx_table_name_name_id on table_name(name, id);
Your version of the query is going to require aggregation for the subquery -- I don't think MySQL will rewrite it. The aggregation can probably use the index. However, writing the query using an = guarantees an optimal execution plan.
I recommend adding an index on (name, id). This should greatly improve the performance of the subquery, allowing MySQL to quickly look up each id value in the outer query.
CREATE INDEX idx ON table_name (name, id);
Assuming table_name has many columns, then SELECT * would probably preclude the chance that any single index could speed up the outer query. But, at least we can try optimizing the WHERE IN clause.
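
As a quick sanity check that the IN form and the correlated form select the same rows, here is a minimal sketch using Python's built-in sqlite3 module in place of MySQL. The note column and the sample data are made up, and the example says nothing about which form MySQL executes faster; it only confirms the two are interchangeable in terms of results.

```python
import sqlite3

IN_FORM = """
SELECT * FROM table_name
WHERE id IN (SELECT MAX(id) FROM table_name GROUP BY name)
ORDER BY id
"""

CORRELATED_FORM = """
SELECT t.* FROM table_name t
WHERE t.id = (SELECT MAX(t2.id) FROM table_name t2 WHERE t2.name = t.name)
ORDER BY t.id
"""

def latest_row_per_name():
    """Run both query shapes on the same data and return both result lists."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE table_name (id INTEGER PRIMARY KEY, name TEXT, note TEXT)")
    conn.executemany(
        "INSERT INTO table_name VALUES (?, ?, ?)",
        [
            (1, "ann", "old"),
            (2, "bob", "old"),
            (3, "ann", "new"),
            (4, "cat", "only"),
            (5, "bob", "new"),
        ],
    )
    # The composite index suggested in the answers above
    conn.execute("CREATE INDEX idx_table_name_name_id ON table_name (name, id)")
    in_rows = conn.execute(IN_FORM).fetchall()
    correlated_rows = conn.execute(CORRELATED_FORM).fetchall()
    conn.close()
    return in_rows, correlated_rows

in_rows, correlated_rows = latest_row_per_name()
print(in_rows)
print(in_rows == correlated_rows)
```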

Compare data for two versions of the same MySQL database table

I have two MySQL tables that have the exact same structure and mostly the same data. Some of the rows would be different between the two because my client updated the old website instead of the new website. There are hundreds of records and a column is not in place for the last modified date. I have created a new database on localhost and imported the old and new tables. All of the rows of data will need to be compared and differences between the old and new databases will need to be returned. Once the differences are identified, would there be a way to easily migrate the updated data from the old table to the new table? I am a MySQL novice, but I can usually muddle my way through issues. Thanks in advance for your assistance.
I have been looking at the following code, but I am not sure if it is the best answer.
SELECT *,'table_1' AS o FROM table_1
UNION
SELECT *,'table_2' AS o FROM table_2
WHERE some_id IN (
SELECT some_id
FROM (
SELECT * FROM table_1
UNION
SELECT * FROM table_2
) AS x
GROUP BY some_id
HAVING COUNT(*) > 1
)
ORDER BY some_id, o;
This should do the trick. In the subselect used in the where clause, you are finding the primary keys for all rows where every value is the same across both tables. You then exclude rows with those primary keys from the unioned result set. Now how you go about reconciling the differences is a totally different story :)
SELECT * FROM (
SELECT *, 'table 1' FROM table_1
UNION ALL
SELECT *, 'table 2' FROM table_2
) AS combined
WHERE combined.primary_key_field
NOT IN (
SELECT t1.primary_key_field
FROM table_1 AS t1
INNER JOIN table_2 AS t2
ON t1.primary_key_field = t2.primary_key_field
AND t1.some_other_field = t2.some_other_field
AND ... /* join on all fields in tables */
)
A single INSERT INTO ... SELECT query will do.
insert into table_new
select * from table_old
where some_id NOT IN (select some_id from table_new)
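
The "exclude the identical rows" approach from the earlier answer can be tried out on toy data. The sketch below uses Python's built-in sqlite3 module instead of MySQL, with only one non-key column, and the sample rows are invented. Note that comparing columns with = never matches NULL to NULL, so rows containing NULLs would show up as differences in this form.

```python
import sqlite3

DIFF_QUERY = """
SELECT combined.* FROM (
    SELECT *, 'table 1' AS source FROM table_1
    UNION ALL
    SELECT *, 'table 2' AS source FROM table_2
) AS combined
WHERE combined.primary_key_field NOT IN (
    -- keys whose rows are identical in both tables
    SELECT t1.primary_key_field
    FROM table_1 AS t1
    INNER JOIN table_2 AS t2
        ON t1.primary_key_field = t2.primary_key_field
        AND t1.some_other_field = t2.some_other_field
)
ORDER BY combined.primary_key_field, combined.source
"""

def differing_rows():
    """Return every row that is missing from, or different in, the other table."""
    conn = sqlite3.connect(":memory:")
    for name in ("table_1", "table_2"):
        conn.execute(
            f"CREATE TABLE {name} (primary_key_field INTEGER PRIMARY KEY, some_other_field TEXT)"
        )
    conn.executemany(
        "INSERT INTO table_1 VALUES (?, ?)",
        [(1, "same"), (2, "edited on old site"), (3, "only in old")],
    )
    conn.executemany(
        "INSERT INTO table_2 VALUES (?, ?)",
        [(1, "same"), (2, "original")],
    )
    rows = conn.execute(DIFF_QUERY).fetchall()
    conn.close()
    return rows

for row in differing_rows():
    print(row)
```

Rows that appear twice in the output (once per source) were edited; rows that appear once exist in only one of the tables.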

How to make a select statement in update?

I need to update a table, and the Where clause should contain the last (or max) from a certain column, so I made this query:
UPDATE Orders
SET Ordermethod='Pickup'
WHERE orderid IN (
SELECT MAX(orderid)
FROM Orders);
But, for some reason I don't understand, mysql returns this error:
1093 - You can't specify target table 'Bestellingen' for update in FROM clause
I tried different queries, which aren't working either...
Can someone help??
Sorry for the crappy english
This is a MySQL limitation. (As the documentation puts it: "Currently, you cannot update a table and select from the same table in a subquery.") You can work around the limitation by writing your subquery as (SELECT * FROM (SELECT ...) t), so that MySQL will create a temporary table for you:
UPDATE Orders
SET Ordermethod='Pickup'
WHERE orderid IN
( SELECT *
FROM ( SELECT MAX(orderid)
FROM Orders
) t
)
;
UPDATE Orders
SET Ordermethod='Pickup'
WHERE orderid IN( SELECT MAX(orderid) FROM
(
SELECT * FROM Orders
)
AS c1
)

How to remove duplicate entries from a mysql db?

I have a table with some ids + titles. I want to make the title column unique, but it has over 600k records already, some of which are duplicates (sometimes several dozen times over).
How do I remove all duplicates, except one, so I can add a UNIQUE key to the title column after?
This command adds a unique key, and drops all rows that generate errors (due to the unique key). This removes duplicates.
ALTER IGNORE TABLE table ADD UNIQUE KEY idx1(title);
Edit: Note that this command may not work for InnoDB tables for some versions of MySQL. See this post for a workaround. (Thanks to "an anonymous user" for this information.)
Create a new table with just the distinct rows of the original table. There may be other ways but I find this the cleanest.
CREATE TABLE tmp_table AS SELECT DISTINCT [....] FROM main_table
More specifically:
The faster way is to insert distinct rows into a temporary table. Using delete, it took me a few hours to remove duplicates from a table of 8 million rows. Using insert and distinct, it took just 13 minutes.
CREATE TABLE tempTableName LIKE tableName;
CREATE INDEX ix_all_id ON tableName(cellId,attributeId,entityRowId,value);
INSERT INTO tempTableName(cellId,attributeId,entityRowId,value) SELECT DISTINCT cellId,attributeId,entityRowId,value FROM tableName;
-- Empty the original table rather than dropping it, so the INSERT below still has a target
TRUNCATE TABLE tableName;
INSERT tableName SELECT * FROM tempTableName;
DROP TABLE tempTableName;
Since MySQL's ALTER IGNORE TABLE has been deprecated (and removed as of MySQL 5.7), you need to actually delete the duplicate data before adding a unique index.
First write a query that finds all the duplicates. Here I'm assuming that email is the field that contains duplicates.
SELECT
s1.email,
s1.id,
s1.created,
s2.id,
s2.created
FROM
student AS s1
INNER JOIN
student AS s2
WHERE
/* Emails are the same */
s1.email = s2.email AND
/* DON'T select both accounts,
only select the one created later.
The serial id could also be used here */
s2.created > s1.created
;
Next select only the unique duplicate ids:
SELECT
DISTINCT s2.id
FROM
student AS s1
INNER JOIN
student AS s2
WHERE
s1.email = s2.email AND
s2.created > s1.created
;
Once you are sure that the result contains only the duplicate ids you want to delete, run the delete. You have to wrap the table as (SELECT * FROM tblname) so that MySQL doesn't complain.
DELETE FROM
student
WHERE
id
IN (
SELECT
DISTINCT s2.id
FROM
(SELECT * FROM student) AS s1
INNER JOIN
(SELECT * FROM student) AS s2
WHERE
s1.email = s2.email AND
s2.created > s1.created
);
Then create the unique index:
ALTER TABLE
student
ADD UNIQUE INDEX
idx_student_unique_email(email)
;
The query below can be used to delete all the duplicates except the one row with the lowest "id" value:
DELETE t1 FROM table_name t1, table_name t2 WHERE t1.id > t2.id AND t1.name = t2.name
In a similar way, we can keep the row with the highest "id" value instead:
DELETE t1 FROM table_name t1, table_name t2 WHERE t1.id < t2.id AND t1.name = t2.name
This shows how to do it in SQL2000. I'm not completely familiar with MySQL syntax but I'm sure there's something comparable
create table #titles (iid int identity (1, 1), title varchar(200))
-- Repeat this step many times to create duplicates
insert into #titles(title) values ('bob')
insert into #titles(title) values ('bob1')
insert into #titles(title) values ('bob2')
insert into #titles(title) values ('bob3')
insert into #titles(title) values ('bob4')
DELETE T FROM
#titles T left join
(
select title, min(iid) as minid from #titles group by title
) D on T.title = D.title and T.iid = D.minid
WHERE D.minid is null
Select * FROM #titles
delete from student where id in (
SELECT distinct(s1.`student_id`) from student as s1 inner join student as s2
where s1.`sex` = s2.`sex` and
s1.`student_id` > s2.`student_id` and
s1.`sex` = 'M'
ORDER BY `s1`.`student_id` ASC
)
The solution posted by Nitin seems to be the most elegant / logical one.
However it has one issue:
ERROR 1093 (HY000): You can't specify target table 'student' for update in FROM clause
This can however be resolved by using (SELECT * FROM student) instead of student:
DELETE FROM student WHERE id IN (
SELECT distinct(s1.`student_id`) FROM (SELECT * FROM student) AS s1 INNER JOIN (SELECT * FROM student) AS s2
WHERE s1.`sex` = s2.`sex` AND
s1.`student_id` > s2.`student_id` AND
s1.`sex` = 'M'
ORDER BY `s1`.`student_id` ASC
)
Give your +1's to Nitin for coming up with the original solution.
Deleting duplicates on MySQL tables is a common issue, that usually comes with specific needs. In case anyone is interested, here (Remove duplicate rows in MySQL) I explain how to use a temporary table to delete MySQL duplicates in a reliable and fast way (with examples for different use cases).
In this case, something like this should work:
-- create a new temporary table
CREATE TABLE tmp_table1 LIKE table1;
-- add a unique constraint on the column that must not repeat
ALTER TABLE tmp_table1 ADD UNIQUE(title);
-- scan over the table to insert entries; the lowest id of each title wins
INSERT IGNORE INTO tmp_table1 SELECT * FROM table1 ORDER BY id;
-- rename tables
RENAME TABLE table1 TO backup_table1, tmp_table1 TO table1;
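
The same copy, ignore and rename pattern can be tried end to end in a small sketch. This one uses Python's built-in sqlite3 module rather than MySQL, so a few things are substitutions: SQLite has no CREATE TABLE ... LIKE, so the new table is spelled out; INSERT OR IGNORE plays the role of MySQL's INSERT IGNORE; and the rename takes two ALTER TABLE statements. The sample titles are made up.

```python
import sqlite3

def dedupe_by_title():
    """Copy rows into a table with UNIQUE(title), then swap it in for the original."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE table1 (id INTEGER PRIMARY KEY, title TEXT)")
    conn.executemany(
        "INSERT INTO table1 VALUES (?, ?)",
        [(1, "alpha"), (2, "beta"), (3, "alpha"), (4, "alpha"), (5, "gamma"), (6, "beta")],
    )

    # New table with the unique constraint already in place
    conn.execute("CREATE TABLE tmp_table1 (id INTEGER PRIMARY KEY, title TEXT UNIQUE)")
    # Rows arrive in id order, so the lowest id of each title is inserted first
    # and every later duplicate is silently skipped
    conn.execute("INSERT OR IGNORE INTO tmp_table1 SELECT id, title FROM table1 ORDER BY id")

    # Keep the original as a backup and swap the deduplicated table in
    conn.execute("ALTER TABLE table1 RENAME TO backup_table1")
    conn.execute("ALTER TABLE tmp_table1 RENAME TO table1")

    rows = conn.execute("SELECT id, title FROM table1 ORDER BY id").fetchall()
    backup_count = conn.execute("SELECT COUNT(*) FROM backup_table1").fetchone()[0]
    conn.close()
    return rows, backup_count

rows, backup_count = dedupe_by_title()
print(rows)
print(backup_count)
```

Keeping the backup table around means nothing is lost if the wrong duplicate turns out to have been kept.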