How to optimize a query involving GROUP BY and ORDER BY - MySQL

SELECT *
FROM `table_name`
WHERE `id` IN ( SELECT MAX(`id`) FROM `table_name` GROUP BY `name` )
How can we optimize this query?

I would suggest writing the query as:
select t.*
from table_name t
where t.id = (select max(t2.id) from table_name t2 where t2.name = t.name);
Then you want an index on table_name(name, id):
create index idx_table_name_name_id on table_name(name, id);
Your version of the query is going to require aggregation for the subquery -- I don't think MySQL will rewrite it. The aggregation can probably use the index. However, writing the query using an = guarantees an optimal execution plan.
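A minimal sketch for checking that the IN form and the correlated = form return the same rows. It uses Python's built-in sqlite3 module as a stand-in for MySQL, so it only checks the results and says nothing about MySQL's execution plan. The payload column and the sample rows are made up for illustration.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption: results only,
# not execution plans, are comparable between the two engines).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_name (id INTEGER PRIMARY KEY, name TEXT, payload TEXT);
    CREATE INDEX idx_table_name_name_id ON table_name (name, id);
    INSERT INTO table_name (id, name, payload) VALUES
        (1, 'a', 'old'), (2, 'a', 'new'),
        (3, 'b', 'only'),
        (4, 'c', 'old'), (5, 'c', 'mid'), (6, 'c', 'new');
""")

# Original form: aggregate once, then filter with IN.
in_form = conn.execute("""
    SELECT * FROM table_name
    WHERE id IN (SELECT MAX(id) FROM table_name GROUP BY name)
    ORDER BY id
""").fetchall()

# Suggested form: correlated subquery compared with =.
correlated_form = conn.execute("""
    SELECT t.* FROM table_name t
    WHERE t.id = (SELECT MAX(t2.id) FROM table_name t2 WHERE t2.name = t.name)
    ORDER BY t.id
""").fetchall()

print(in_form == correlated_form)
print(correlated_form)
```

On a real MySQL server, running EXPLAIN on both forms is the way to confirm which plan the optimizer actually picks.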

I recommend adding an index on (name, id). This should greatly improve the performance of the subquery, allowing MySQL to quickly look up each id value in the outer query.
CREATE INDEX idx ON table_name (name, id);
Assuming table_name has many columns, then SELECT * would probably preclude the chance that any single index could speed up the outer query. But, at least we can try optimizing the WHERE IN clause.

Related

Slow MySQL distinct with where

I have a large table that has two columns (among others):
event_date
country
This query is very fast:
select distinct event_date from my_table
This query is also very fast:
select * from my_table where country = 'US'
However, this query is very slow:
select distinct event_date from my_table where country = 'US'
I tried adding all combinations of indexes, including one on both columns. Nothing makes the third query faster.
Any insights?
Have you tried staging the results in a temporary table, adding an index, then completing the query from there? Not sure if this will work in MySQL, but it's a trick I use successfully in MSSQL quite often.
CREATE TEMPORARY TABLE IF NOT EXISTS staging AS (
SELECT event_date FROM my_table WHERE country = 'US'
);
CREATE INDEX ix_date ON staging(event_date);
SELECT DISTINCT event_date FROM staging;
ALTER TABLE my_table ADD INDEX my_idx (event_date, country);

Find total records in various tables in a single query

Currently I'm using this query. Is there an alternative that would run faster?
SELECT
SUM(result1),
SUM(result2),
SUM(result3)
FROM (
(
SELECT
0 as result1,0 as result2,COUNT(*) as result3
FROM
table1
)
UNION
(
SELECT
count(*) as result1,0 as result2,0 as result3
FROM
table2
)
UNION
(
SELECT
0 as result1,count(*) as result2,0 as result3
FROM
table3
)
) as allresult
An alternative to the query above:
SELECT (SELECT COUNT(1) FROM table2) AS result1,
(SELECT COUNT(1) FROM table3) AS result2,
(SELECT COUNT(1) FROM table1) AS result3;
Add the table names to the WHERE clause and run the query below (note that it relies on SQL Server system views, not MySQL):
SELECT
T.Name AS TableName,
S.Row_count AS RecordsCount
FROM
sys.dm_db_partition_stats S
INNER JOIN sys.tables T ON T.object_id = S.object_id
Where
Object_Name(S.Object_Id) IN ('Employees','Country')
A very simple way to shave some load off this query:
Use UNION ALL instead of UNION. UNION ALL returns duplicates if there are any; the only difference from what you are using, plain UNION, is that UNION removes those duplicates at the expense of performance. In other words, it does a UNION ALL and then goes back and removes the duplicate entries.
It should improve your query's performance.
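A small sketch comparing the UNION ALL variant with the scalar-subquery variant suggested earlier, to confirm they produce the same three counts. It runs on Python's built-in sqlite3 as a stand-in for MySQL, so the parentheses around each SELECT are dropped (SQLite does not accept them in a compound select). The single column x and the row counts are made up for illustration.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (x INTEGER);
    CREATE TABLE table2 (x INTEGER);
    CREATE TABLE table3 (x INTEGER);
    INSERT INTO table1 VALUES (1), (2), (3);
    INSERT INTO table2 VALUES (1), (2);
    INSERT INTO table3 VALUES (1), (2);
""")

# UNION ALL keeps every branch row, so no duplicate-removal pass is needed.
union_all_counts = conn.execute("""
    SELECT SUM(result1), SUM(result2), SUM(result3)
    FROM (
        SELECT 0 AS result1, 0 AS result2, COUNT(*) AS result3 FROM table1
        UNION ALL
        SELECT COUNT(*), 0, 0 FROM table2
        UNION ALL
        SELECT 0, COUNT(*), 0 FROM table3
    ) AS allresult
""").fetchone()

# Scalar subqueries: one COUNT per table, no outer aggregation at all.
scalar_counts = conn.execute("""
    SELECT (SELECT COUNT(*) FROM table2) AS result1,
           (SELECT COUNT(*) FROM table3) AS result2,
           (SELECT COUNT(*) FROM table1) AS result3
""").fetchone()

print(union_all_counts, scalar_counts)
```

Both variants still count every row of every table, so the gain from either rewrite is likely small compared with the cost of the counts themselves.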
(Copying my comment from this answer)
You can get the row counts for a table from the INFORMATION_SCHEMA as follows (but see caveat below):
SELECT table_rows
FROM information_schema.tables
WHERE table_schema = DATABASE()
AND table_name IN ('table1', 'table2', 'table3');
However, the MySQL documentation notes that these values are not exact for InnoDB tables: "For InnoDB tables, the row count is only a rough estimate used in SQL optimization. (This is also true if the InnoDB table is partitioned.)". If you are using MyISAM, this approach may be sufficient.

Wildly different query performance on similar tables?

I am trying to select duplicate rows from a series of MySQL tables. The following query...
SELECT *
FROM table_name
WHERE column_name
IN (SELECT *
FROM (SELECT column_name
FROM table_name
GROUP BY column_name
HAVING COUNT(*) > 1
) AS subquery
);
...is producing wildly different performance when run in different tables with identical schema and similar number of rows. In one table it executes within a few seconds, in another with identical data types and similar number of rows it is hanging up for an extended period of time (currently at 30 minutes and counting). What possible explanations are there for such a discrepancy?
EDIT - EXPLAIN shows that all the queries return "Impossible WHERE noticed after reading const tables" for the dependent subquery. This is probably a good time to mention that there are no indexes on any of the tables (which I inherited...). Finding duplicate values in what is supposed to be a uniqid column, so that I can turn it into a proper primary key, is the point of this entire snipe hunt.
I'd suggest splitting the subquery out into a temporary table.
CREATE TEMPORARY TABLE IF NOT EXISTS DupeColumn AS (
SELECT column_name
FROM table_name
GROUP BY column_name
HAVING COUNT(*) > 1
);
SELECT t.*
FROM DupeColumn dc
INNER JOIN table_name t
ON dc.column_name = t.column_name;
DROP TEMPORARY TABLE DupeColumn;
In my experience, MySQL is very poor at optimizing
SELECT *
FROM table1
WHERE col1 in (SELECT col2 FROM table2 WHERE ...)
Instead of performing the subquery once and then looking up all the col2 values in table1, it performs a full scan of table1 and then searches for col1 in table2.col2.
It does better when you write a JOIN:
SELECT table1.*
FROM table1
JOIN table2 ON table1.col1 = table2.col2
In your case, this would be done using a subquery for table2:
SELECT t1.*
FROM table_name AS t1
JOIN (SELECT column_name
FROM table_name
GROUP BY column_name
HAVING COUNT(*) > 1) AS t2
ON t1.column_name = t2.column_name
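A minimal sketch checking that the derived-table JOIN returns the same duplicate rows as the IN form. Python's built-in sqlite3 stands in for MySQL here, so it verifies the result set only, not MySQL's plan. The row_no column and the sample values are hypothetical, added just to make the rows distinguishable.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_name (row_no INTEGER, column_name TEXT);
    INSERT INTO table_name VALUES
        (1, 'u1'), (2, 'u2'), (3, 'u2'), (4, 'u3'),
        (5, 'u4'), (6, 'u4'), (7, 'u4');
""")

# IN form: filter rows whose value appears more than once.
in_rows = conn.execute("""
    SELECT * FROM table_name
    WHERE column_name IN (
        SELECT column_name FROM table_name
        GROUP BY column_name HAVING COUNT(*) > 1
    )
    ORDER BY row_no
""").fetchall()

# JOIN form: the derived table has one row per duplicated value,
# so the join cannot multiply rows of table_name.
join_rows = conn.execute("""
    SELECT t1.* FROM table_name AS t1
    JOIN (
        SELECT column_name FROM table_name
        GROUP BY column_name HAVING COUNT(*) > 1
    ) AS t2 ON t1.column_name = t2.column_name
    ORDER BY t1.row_no
""").fetchall()

print(in_rows == join_rows)
print(join_rows)
```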

Alter and Optimize sql query

I need to change this SQL query so that it does NOT use a subquery with IN, because I need it to run faster.
Here is the query I am working on. The table has about 7 million rows.
SELECT `MovieID`, COUNT(*) AS `Count`
FROM `download`
WHERE `UserID` IN (
SELECT `UserID` FROM `download`
WHERE `MovieID` = 995
)
GROUP BY `MovieID`
ORDER BY `Count` DESC
Thanks
Something like this - but (in the event that you switch to an OUTER JOIN) make sure you're counting the right thing...
SELECT x.MovieID
, COUNT(*) ttl
FROM download x
JOIN download y
ON y.userid = x.userid
AND y.movieid = 995
GROUP
BY x.MovieID
ORDER
BY ttl DESC;
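One way "counting the right thing" can go wrong, even with the inner join, is when a user has more than one row for movie 995: the self-join then repeats that user's rows. The sketch below shows the difference on made-up data, assuming (UserID, MovieID) is not unique in download. It uses Python's built-in sqlite3 as a stand-in for MySQL, and the alias ttl follows the query above.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE download (UserID INTEGER, MovieID INTEGER);
    -- User 1 downloaded movie 995 twice (hypothetical duplicate row).
    INSERT INTO download VALUES
        (1, 995), (1, 995), (1, 10),
        (2, 995), (2, 10);
""")

# IN form: each download row is counted once.
in_counts = conn.execute("""
    SELECT MovieID, COUNT(*) AS ttl FROM download
    WHERE UserID IN (SELECT UserID FROM download WHERE MovieID = 995)
    GROUP BY MovieID
    ORDER BY ttl DESC, MovieID
""").fetchall()

# JOIN form: rows of user 1 are matched against both of that user's 995 rows.
join_counts = conn.execute("""
    SELECT x.MovieID, COUNT(*) AS ttl
    FROM download x
    JOIN download y ON y.UserID = x.UserID AND y.MovieID = 995
    GROUP BY x.MovieID
    ORDER BY ttl DESC, x.MovieID
""").fetchall()

print(in_counts)
print(join_counts)
```

If (UserID, MovieID) is unique in the real table, the two forms agree and the join is a safe replacement.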
Use Exists instead, see Optimizing Subqueries with EXISTS Strategy:
Consider the following subquery comparison:
outer_expr IN (SELECT inner_expr FROM ... WHERE subquery_where)
MySQL evaluates queries “from outside to inside.” That is, it first obtains the value of the outer expression outer_expr, and then runs the subquery and captures the rows that it produces.
A very useful optimization is to “inform” the subquery that the only rows of interest are those where the inner expression inner_expr is equal to outer_expr. This is done by pushing down an appropriate equality into the subquery's WHERE clause. That is, the comparison is converted to this:
EXISTS (SELECT 1 FROM ... WHERE subquery_where AND outer_expr=inner_expr)
After the conversion, MySQL can use the pushed-down equality to limit the number of rows that it must examine when evaluating the subquery.
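Applied to the download query, the EXISTS form could look like the sketch below. It runs on Python's built-in sqlite3 as a stand-in for MySQL and only checks that IN and EXISTS return the same counts. The index idx_download_movie_user, the alias download_count, and the sample rows are assumptions made for illustration.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE download (UserID INTEGER, MovieID INTEGER);
    -- Hypothetical index so the pushed-down equality can be an index lookup.
    CREATE INDEX idx_download_movie_user ON download (MovieID, UserID);
    INSERT INTO download VALUES
        (1, 995), (1, 10), (1, 20),
        (2, 995), (2, 10),
        (3, 10), (3, 30);
""")

# Original IN form.
in_counts = conn.execute("""
    SELECT MovieID, COUNT(*) AS download_count FROM download
    WHERE UserID IN (SELECT UserID FROM download WHERE MovieID = 995)
    GROUP BY MovieID
    ORDER BY download_count DESC, MovieID
""").fetchall()

# EXISTS form: the equality on UserID is written inside the subquery.
exists_counts = conn.execute("""
    SELECT d.MovieID, COUNT(*) AS download_count FROM download d
    WHERE EXISTS (
        SELECT 1 FROM download d2
        WHERE d2.MovieID = 995 AND d2.UserID = d.UserID
    )
    GROUP BY d.MovieID
    ORDER BY download_count DESC, d.MovieID
""").fetchall()

print(in_counts == exists_counts)
print(exists_counts)
```

Unlike a plain self-join, EXISTS only tests for a match, so repeated rows for movie 995 do not inflate the counts.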
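If you only need the count for that one movie, filter directly on MovieID; you do not need the subquery. It can be done with MovieID = 995 in the WHERE clause. Note that this is not equivalent to the original query, which counts every movie downloaded by the users who downloaded movie 995.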
SELECT `MovieID`, COUNT(*) AS `Count`
FROM `download`
WHERE `MovieID` = 995
GROUP BY `MovieID`
ORDER BY `Count` DESC

Optimizing database query with up to 10mil rows as result

I have a MySQL query that I need to optimize as much as possible (it should have a load time below 5s, if possible).
The query is as follows:
SELECT domain_id, COUNT(keyword_id) as total_count
FROM tableName
WHERE keyword_id IN (SELECT DISTINCT keyword_id FROM tableName WHERE domain_id = X)
GROUP BY domain_id
ORDER BY total_count DESC
LIMIT ...
X is an integer that comes from an input
domain_id and keyword_id are indexed
database is on localhost, so the network speed should be max
The subquery in the WHERE clause can return up to 10 million results. Also, it seems really hard for MySQL to calculate the COUNT and then ORDER BY that count.
I tried to combine this query with SOLR, but without success; fetching such a high number of rows at once gives both MySQL and SOLR a hard time.
I'm looking for a solution that gives the same results, whether that means using a different technology or improving this MySQL query.
Thanks!
Query logic is this:
We have a domain and we are searching for all the keywords that are being used on that domain (this is the sub query). Then we take all the domains that use at least one of the keywords found on the first query, grouped by domain, with the number of keywords used for each domain, and we have to display it ordered DESC by the number of keywords used.
I hope this makes sense.
You may try JOIN instead of subquery:
SELECT tableName.domain_id, COUNT(tableName.keyword_id) AS total_count
FROM tableName
INNER JOIN tableName AS rejoin
ON rejoin.keyword_id = tableName.keyword_id
WHERE rejoin.domain_id = X
GROUP BY tableName.domain_id
ORDER BY total_count DESC
LIMIT ...
I am not 100% sure but can you try this please
SELECT t1.domain_id, COUNT(t1.keyword_id) as total_count
FROM tableName AS t1 LEFT JOIN
(SELECT DISTINCT keyword_id FROM tableName WHERE domain_id = X) AS t2
ON t1.keyword_id = t2.keyword_id
WHERE t2.keyword_id IS NOT NULL
GROUP BY t1.domain_id
ORDER BY total_count DESC
LIMIT ...
The goal is to replace the WHERE IN clause with an INNER JOIN, which should make the query a lot quicker. In my experience a WHERE IN clause with a subquery often makes the MySQL server struggle, and it is even more noticeable with a huge amount of data. Use WHERE IN only if it makes your query easier to read and understand, if you have a small data set, or if there is no other way to write it (but you will probably find another way anyway :) )
In terms of MySQL, all you can do is minimize disk I/O for the query by using covering indexes, and rewrite the query a little so that it benefits from them.
Since keyword_id has a match in another copy of the table, COUNT(keyword_id) becomes COUNT(*).
The kind of subquery you use is known to be the worst case for MySQL (it executes the subquery for each row), but I am not sure it should be replaced with a JOIN here, because it might be a proper strategy for your data.
As you probably understand, the query like:
SELECT domain_id, COUNT(*) as total_count
FROM tableName
WHERE keyword_id IN (X,Y,Z)
GROUP BY domain_id
ORDER BY total_count DESC
would have the best performance with a covering composite index (keyword_id, domain_id [,...]), so it is a must. On the other hand, a query like:
SELECT DISTINCT keyword_id FROM tableName WHERE domain_id = X
will have the best performance on a covering composite index (domain_id, keyword_id [,...]). So you need both of them.
Hopefully, but I am not sure, when you have the latter index, MySQL can understand that you do not need to select all those keyword_id in the subquery, but you just need to check if there is an entry in the index, and I am sure that it is better expressed if you do not use DISTINCT.
So, I would try to add those two indexes and rewrite the query as:
SELECT domain_id, COUNT(*) as total_count
FROM tableName
WHERE keyword_id IN (SELECT keyword_id FROM tableName WHERE domain_id = X)
GROUP BY domain_id
ORDER BY total_count DESC
Another option is to rewrite the query as follows:
SELECT domain_id, COUNT(*) as total_count
FROM (
SELECT DISTINCT keyword_id
FROM tableName
WHERE domain_id = X
) as kw
JOIN tableName USING (keyword_id)
GROUP BY domain_id
ORDER BY total_count DESC
Once again you need those two composite indexes.
Which one of the queries is quicker depends on the statistics in your tableName.
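A minimal sketch of the two composite indexes together with both rewrites, checking that they return the same ranking. It uses Python's built-in sqlite3 as a stand-in for MySQL, so it cannot tell you which form is faster on your data; for that, run EXPLAIN on the real table. The index names, the value of X, and the sample rows are hypothetical.

```python
import sqlite3

X = 1  # hypothetical domain_id taken from user input

# In-memory SQLite database standing in for MySQL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tableName (domain_id INTEGER, keyword_id INTEGER);
    -- Covers the outer lookup by keyword_id.
    CREATE INDEX idx_keyword_domain ON tableName (keyword_id, domain_id);
    -- Covers the inner lookup by domain_id.
    CREATE INDEX idx_domain_keyword ON tableName (domain_id, keyword_id);
    INSERT INTO tableName VALUES
        (1, 100), (1, 101), (1, 102),
        (2, 100), (2, 101),
        (3, 102),
        (4, 200);
""")

# IN form without DISTINCT in the subquery.
in_query = """
    SELECT domain_id, COUNT(*) AS total_count
    FROM tableName
    WHERE keyword_id IN (SELECT keyword_id FROM tableName WHERE domain_id = ?)
    GROUP BY domain_id
    ORDER BY total_count DESC, domain_id
"""

# Derived-table form: DISTINCT keeps the join from multiplying rows.
join_query = """
    SELECT domain_id, COUNT(*) AS total_count
    FROM (SELECT DISTINCT keyword_id FROM tableName WHERE domain_id = ?) AS kw
    JOIN tableName USING (keyword_id)
    GROUP BY domain_id
    ORDER BY total_count DESC, domain_id
"""

in_result = conn.execute(in_query, (X,)).fetchall()
join_result = conn.execute(join_query, (X,)).fetchall()

print(in_result == join_result)
print(join_result)
```

The extra domain_id in the ORDER BY is only there to make the output order deterministic when two domains have the same count.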