Optimization of specific very slow query - mysql

I wrote this query after optimizing a php script that did the same thing but using 3 different queries and 2 loops while...and this php script took over 6 hours to run...
So i've compress all in a simple query that to the same job without any loops...
DELETE table FROM table WHERE id IN (
SELECT id from(
SELECT MAX(data_elab) as data_elab_new, count(*) as volte,t1.* FROM (
SELECT * from table ORDER BY data_elab DESC
)t1
group by cod_dl,issn,variante,add_on having volte>1
)t2
);
Note:the server is very old (Windows,3gb of ram,32bit),table size 204 MB,100.000 row,20 columns,only id is primary key,no indexes.
This query took only 20sec...the delete is the problem....
SELECT id from(
SELECT MAX(data_elab) as data_elab_new, count(*) as volte,t1.* FROM (
SELECT * from table ORDER BY data_elab DESC
)t1
group by cod_dl,issn,variante,add_on having volte>1
)t2
The problem is that I thought of speeding up the operation a lot but actually after more than two hours the query did not complete and continues to works...
Any advice to optimize this query or I did something wrong in the query?
Thank you.

Assuming data_elab is never repeated for any combination of cod_dl, issn, variante, add_on (I am assuming that is what "univocal" means), this the form the query you need should take:
DELETE table
FROM table
WHERE (cod_dl, issn, variante, add_on, data_elab) IN (
SELECT cod_dl, issn, variante, add_on, MAX(data_elab) as data_elab_max
FROM table
GROUP BY cod_dl, issn, variante, add_on
HAVING COUNT(*) > 1
);
As MySQL doesn't tend to like DELETEing and SELECTing from the same table in a query, you might have to do some tweaking, something like:
DELETE table
FROM table
WHERE (cod_dl, issn, variante, add_on, data_elab) IN (
SELECT extraLayerOfIndirection.*
FROM (
SELECT cod_dl, issn, variante, add_on, MAX(data_elab) as data_elab_max
FROM table
GROUP BY cod_dl, issn, variante, add_on
HAVING COUNT(*) > 1
) AS extraLayerOfIndirection
);
Also, it's not quite the same, but you might want to consider this instead:
DELETE table
FROM table
WHERE (cod_dl, issn, variante, add_on, data_elab) NOT IN (
SELECT extraLayerOfIndirection.*
FROM (
SELECT cod_dl, issn, variante, add_on, MIN(data_elab) as data_elab_max
FROM table
GROUP BY cod_dl, issn, variante, add_on
) AS extraLayerOfIndirection
);
Instead of only deleting the last of each grouping, this deletes all but the first for each grouping. If you have a lot of repeats, and only want to preserve the first for each anyway, this could result in a much smaller result from the subquery.

Related

Query time that I can't understand in MYSQL

I'm new to this platform, even this is my first question. Sorry for my bad English, I use translate. Let me know if I have used inappropriate language.
my table is like this
CREATE TABLE tbl_records (
id int(11) NOT NULL,
data_id int(11) NOT NULL,
value double NOT NULL,
record_time datetime NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;
ALTER TABLE tbl_records
ADD PRIMARY KEY (id),
ADD KEY data_id (data_id),
ADD KEY record_time (record_time);
ALTER TABLE tbl_records
MODIFY id int(11) NOT NULL AUTO_INCREMENT;
COMMIT;
my first query
It takes 0.0096 seconds
SELECT b.* FROM tbl_records b
INNER JOIN
(SELECT MAX(id) AS id FROM tbl_records GROUP BY data_id) a
ON a.id=b.id;
my second query
Its takes 2.4957 seconds
SELECT MAX(id) AS id FROM tbl_records GROUP BY data_id;
When I do these operations over and over again, the result is similar.
There are 20 million data in the table.
Why is the one with the subquery faster?
Also what I really need is MAX(record_time) but
SELECT b.* FROM tbl_records b
INNER JOIN
(SELECT MAX(record_time) AS id FROM tbl_records GROUP BY data_id) a
ON a.id=b.id
It takes minutes when I run it.
I also need records such as hourly, daily, and monthly. I couldn't see much performance difference between GROUP BY SUBSTR(record_time,1,10) or GROUP BY DATE_FORMAT(record_time,'%Y%m%d') both take minutes.
What am I doing wrong?
The first query can be simplified to
SELECT * FROM tbl_records
ORDER BY id DESC
LIMIT 1.
The second:
SELECT id FROM tbl_records
ORDER BY data_id DESC
LIMIT 1;
I don't know what the third is trying to do. This does not make sense: MAX(record_time) AS id -- it is a DATETIME that will subsequently be compared to an INT in ON a.id=b.id.
Another option for turning a DATETIME into a DATE is simply DATE(record_time). But it will not be significantly faster.
If the goal is to build daily counts and subtotals, then there is a much better way. Build and incrementally maintain a Summary table .
(responding to Comment)
The GROUP BY that you have is improper and probably incorrect. I took the liberty of changing from id to data_id:
SELECT b.*
FROM
( SELECT data_id, MAX(record_time) AS max_time
FROM tbl_records
GROUP BY data_id
) AS a
FROM tbl_records AS b
ON a.data_id = b.data_id
AND a.max_time = b.record_time
And have
INDEX(data_id, record_time)
Can there be duplicate times for one data_id? To discuss that and other "groupwise-max" queries, see http://mysql.rjweb.org/doc.php/groupwise_max

Slow MySQL distinct with where

I have a large table that has two columns (among others):
event_date
country
This query is very fast:
select distinct event-date from my_table
This query is also very fast:
select * from my_table where country = 'US'
However, this query is very slow:
select distinct event_date from my_table where country = 'US'
I tried adding all combinations of indexes, including one on both columns. Nothing makes the third query faster.
Any insights?
Have you tried staging the results in a temporary table, adding an index, then completing the query from there? Not sure if this will work in MySQL, but it's a trick I use successfully in MSSQL quite often.
CREATE TEMPORARY TABLE IF NOT EXISTS staging AS (
SELECT event_date FROM my_table WHERE country = 'US'
);
CREATE INDEX ix_date ON staging(event_date);
SELECT DISTINCT event_date FROM staging;
ALTER TABLE my_table ADD INDEX my_idx (event_date, country);

mysql - how do I get count of counts

I have a table with duplicate skus.
skua
skua
skub
skub
skub
skuc
skuc
skud
SELECT sku, COUNT(1) AS `Count` FROM products GROUP BY sku;
shows me all the skus that have duplicates and the number of duplicates
skua 2
skub 3
skuc 2
skud 1
I am trying to find how many there are with 2 duplicates, 3 duplicates etc.
i.e.
duplicated count
1 1 (skud)
2 2 (skua, and skuc)
3 1 (skub)
and I don't know how to write the sql. I imagine it needs a subselect...
thanks
Just use your current query as an inline view, and use the rows from that just like it was from a table.
e.g.
SELECT t.Count AS `duplicated`
, COUNT(1) AS `count`
FROM ( SELECT sku, COUNT(1) AS `Count` FROM products GROUP BY sku ) t
GROUP BY t.Count
MySQL refers to an inline view as a "derived table", and that name makes sense, when we understand how MySQL actually processes that. MySQL runs that inner query, and creates a temporary MyISAM table; once that is done, MySQL runs the outer query, using the temporary MyISAM table. (You'll see that if you run an EXPLAIN on the query.)
Above, I left your query just as you formatted it; I'd tend to reformat your query, so that entire query looks like this:
SELECT t.Count AS `duplicated'
, COUNT(1) AS `count`
FROM ( SELECT p.sku
, COUNT(1) AS `Count`
FROM products p
GROUP BY p.sku
) t
GROUP BY t.Count
(Just makes it easier for me to see the inner query, and easier to extract it and run it separately. And qualifying all column references (with a table alias or table name) is a best practice.)
select dup_count as duplicated,
count(*) as `count`,
group_concat(sku) as skus
from
(
SELECT sku, COUNT(1) AS dup_count
FROM products
GROUP BY sku
) tmp_tbl
group by dup_count

order by date in union query

SELECT *
FROM
(
SELECT case_id,diagnosis_title,updated
FROM tbl_case
order by updated desc
) as table1
UNION
select *
FROM
(
select image_id,image_title,updated
from tbl_image
order by updated desc
) as table2
how to display records with mixed order. currently tbl_Case records displaying first and tbl_image record displaying in second section.
i want to mix the output. ORDER BY should work for both table.
Any reason you're doing those outer select *? They're rather pointless, since they just reselect everything you've already selected.
With mysql unions, this is how you order the entire result set:
(SELECT case_id, diagnosis_title, ... FROM ...)
UNION
(SELECT image_id, image_title, ... FROM ...)
ORDER BY ...
With the bracketing in place as it is above, the order by will sort all the records from both results sets together, instead of sorting each individual query's results separately.
Try to simply your query.
SELECT case_id,diagnosis_title,updated
FROM tbl_case
UNION
select image_id,image_title,updated
from tbl_image
ORDER BY updated desc
You could do this, (I'm just guessing about data types)
CREATE TEMPORARY TABLE TempTable
(id int, title varchar(100), updated tinyint(1));
INSERT TempTable
SELECT case_id, diagnosis_title, updated FROM tbl_case ORDER BY updated DESC;
INSERT TempTable
SELECT image_id, image_title, updated FROM tbl_image ORDER BY updated DESC;
SELECT * FROM TempTable;
Obviously, this uses a temporary table instead of a union to achieve what I think you are asking for.

MySQL - SELECT WHERE field IN (subquery) - Extremely slow why?

I've got a couple of duplicates in a database that I want to inspect, so what I did to see which are duplicates, I did this:
SELECT relevant_field
FROM some_table
GROUP BY relevant_field
HAVING COUNT(*) > 1
This way, I will get all rows with relevant_field occuring more than once. This query takes milliseconds to execute.
Now, I wanted to inspect each of the duplicates, so I thought I could SELECT each row in some_table with a relevant_field in the above query, so I did like this:
SELECT *
FROM some_table
WHERE relevant_field IN
(
SELECT relevant_field
FROM some_table
GROUP BY relevant_field
HAVING COUNT(*) > 1
)
This turns out to be extreeeemely slow for some reason (it takes minutes). What exactly is going on here to make it that slow? relevant_field is indexed.
Eventually I tried creating a view "temp_view" from the first query (SELECT relevant_field FROM some_table GROUP BY relevant_field HAVING COUNT(*) > 1), and then making my second query like this instead:
SELECT *
FROM some_table
WHERE relevant_field IN
(
SELECT relevant_field
FROM temp_view
)
And that works just fine. MySQL does this in some milliseconds.
Any SQL experts here who can explain what's going on?
The subquery is being run for each row because it is a correlated query. One can make a correlated query into a non-correlated query by selecting everything from the subquery, like so:
SELECT * FROM
(
SELECT relevant_field
FROM some_table
GROUP BY relevant_field
HAVING COUNT(*) > 1
) AS subquery
The final query would look like this:
SELECT *
FROM some_table
WHERE relevant_field IN
(
SELECT * FROM
(
SELECT relevant_field
FROM some_table
GROUP BY relevant_field
HAVING COUNT(*) > 1
) AS subquery
)
Rewrite the query into this
SELECT st1.*, st2.relevant_field FROM sometable st1
INNER JOIN sometable st2 ON (st1.relevant_field = st2.relevant_field)
GROUP BY st1.id /* list a unique sometable field here*/
HAVING COUNT(*) > 1
I think st2.relevant_field must be in the select, because otherwise the having clause will give an error, but I'm not 100% sure
Never use IN with a subquery; this is notoriously slow.
Only ever use IN with a fixed list of values.
More tips
If you want to make queries faster,
don't do a SELECT * only select
the fields that you really need.
Make sure you have an index on relevant_field to speed up the equi-join.
Make sure to group by on the primary key.
If you are on InnoDB and you only select indexed fields (and things are not too complex) than MySQL will resolve your query using only the indexes, speeding things way up.
General solution for 90% of your IN (select queries
Use this code
SELECT * FROM sometable a WHERE EXISTS (
SELECT 1 FROM sometable b
WHERE a.relevant_field = b.relevant_field
GROUP BY b.relevant_field
HAVING count(*) > 1)
SELECT st1.*
FROM some_table st1
inner join
(
SELECT relevant_field
FROM some_table
GROUP BY relevant_field
HAVING COUNT(*) > 1
)st2 on st2.relevant_field = st1.relevant_field;
I've tried your query on one of my databases, and also tried it rewritten as a join to a sub-query.
This worked a lot faster, try it!
I have reformatted your slow sql query with www.prettysql.net
SELECT *
FROM some_table
WHERE
relevant_field in
(
SELECT relevant_field
FROM some_table
GROUP BY relevant_field
HAVING COUNT ( * ) > 1
);
When using a table in both the query and the subquery, you should always alias both, like this:
SELECT *
FROM some_table as t1
WHERE
t1.relevant_field in
(
SELECT t2.relevant_field
FROM some_table as t2
GROUP BY t2.relevant_field
HAVING COUNT ( t2.relevant_field ) > 1
);
Does that help?
Subqueries vs joins
http://www.scribd.com/doc/2546837/New-Subquery-Optimizations-In-MySQL-6
Try this
SELECT t1.*
FROM
some_table t1,
(SELECT relevant_field
FROM some_table
GROUP BY relevant_field
HAVING COUNT (*) > 1) t2
WHERE
t1.relevant_field = t2.relevant_field;
Firstly you can find duplicate rows and find count of rows is used how many times and order it by number like this;
SELECT q.id,q.name,q.password,q.NID,(select count(*) from UserInfo k where k.NID= q.NID) as Count,
(
CASE q.NID
WHEN #curCode THEN
#curRow := #curRow + 1
ELSE
#curRow := 1
AND #curCode := q.NID
END
) AS No
FROM UserInfo q,
(
SELECT
#curRow := 1,
#curCode := ''
) rt
WHERE q.NID IN
(
SELECT NID
FROM UserInfo
GROUP BY NID
HAVING COUNT(*) > 1
)
after that create a table and insert result to it.
create table CopyTable
SELECT q.id,q.name,q.password,q.NID,(select count(*) from UserInfo k where k.NID= q.NID) as Count,
(
CASE q.NID
WHEN #curCode THEN
#curRow := #curRow + 1
ELSE
#curRow := 1
AND #curCode := q.NID
END
) AS No
FROM UserInfo q,
(
SELECT
#curRow := 1,
#curCode := ''
) rt
WHERE q.NID IN
(
SELECT NID
FROM UserInfo
GROUP BY NID
HAVING COUNT(*) > 1
)
Finally, delete dublicate rows.No is start 0. Except fist number of each group delete all dublicate rows.
delete from CopyTable where No!= 0;
sometimes when data grow bigger mysql WHERE IN's could be pretty slow because of query optimization. Try using STRAIGHT_JOIN to tell mysql to execute query as is, e.g.
SELECT STRAIGHT_JOIN table.field FROM table WHERE table.id IN (...)
but beware: in most cases mysql optimizer works pretty well, so I would recommend to use it only when you have this kind of problem
This is similar to my case, where I have a table named tabel_buku_besar. What I need are
Looking for record that have account_code='101.100' in tabel_buku_besar which have companyarea='20000' and also have IDR as currency
I need to get all record from tabel_buku_besar which have account_code same as step 1 but have transaction_number in step 1 result
while using select ... from...where....transaction_number in (select transaction_number from ....), my query running extremely slow and sometimes causing request time out or make my application not responding...
I try this combination and the result...not bad...
`select DATE_FORMAT(L.TANGGAL_INPUT,'%d-%m-%y') AS TANGGAL,
L.TRANSACTION_NUMBER AS VOUCHER,
L.ACCOUNT_CODE,
C.DESCRIPTION,
L.DEBET,
L.KREDIT
from (select * from tabel_buku_besar A
where A.COMPANYAREA='$COMPANYAREA'
AND A.CURRENCY='$Currency'
AND A.ACCOUNT_CODE!='$ACCOUNT'
AND (A.TANGGAL_INPUT BETWEEN STR_TO_DATE('$StartDate','%d/%m/%Y') AND STR_TO_DATE('$EndDate','%d/%m/%Y'))) L
INNER JOIN (select * from tabel_buku_besar A
where A.COMPANYAREA='$COMPANYAREA'
AND A.CURRENCY='$Currency'
AND A.ACCOUNT_CODE='$ACCOUNT'
AND (A.TANGGAL_INPUT BETWEEN STR_TO_DATE('$StartDate','%d/%m/%Y') AND STR_TO_DATE('$EndDate','%d/%m/%Y'))) R ON R.TRANSACTION_NUMBER=L.TRANSACTION_NUMBER AND R.COMPANYAREA=L.COMPANYAREA
LEFT OUTER JOIN master_account C ON C.ACCOUNT_CODE=L.ACCOUNT_CODE AND C.COMPANYAREA=L.COMPANYAREA
ORDER BY L.TANGGAL_INPUT,L.TRANSACTION_NUMBER`
I find this to be the most efficient for finding if a value exists, logic can easily be inverted to find if a value doesn't exist (ie IS NULL);
SELECT * FROM primary_table st1
LEFT JOIN comparision_table st2 ON (st1.relevant_field = st2.relevant_field)
WHERE st2.primaryKey IS NOT NULL
*Replace relevant_field with the name of the value that you want to check exists in your table
*Replace primaryKey with the name of the primary key column on the comparison table.
It's slow because your sub-query is executed once for every comparison between relevant_field and your IN clause's sub-query. You can avoid that like so:
SELECT *
FROM some_table T1 INNER JOIN
(
SELECT relevant_field
FROM some_table
GROUP BY relevant_field
HAVING COUNT(*) > 1
) T2
USING(relevant_field)
This creates a derived table (in memory unless it's too large to fit) as T2, then INNER JOIN's it with T1. The JOIN happens one time, so the query is executed one time.
I find this particularly handy for optimising cases where a pivot is used to associate a bulk data table with a more specific data table and you want to produce counts of the bulk table based on a subset of the more specific one's related rows. If you can narrow down the bulk rows to <5% then the resulting sparse accesses will generally be faster than a full table scan.
ie you have a Users table (condition), an Orders table (pivot) and LineItems table (bulk) which references counts of Products. You want the sum of Products grouped by User in PostCode '90210'. In this case the JOIN will be orders of magnitude smaller than when using WHERE relevant_field IN( SELECT * FROM (...) T2 ), and therefore much faster, especially if that JOIN is spilling to disk!