I need to change this SQL query so that it does NOT use a subquery with IN; I need this query to run faster.
Here is the query I am working on. The table has about 7 million rows.
SELECT `MovieID`, COUNT(*) AS `Count`
FROM `download`
WHERE `UserID` IN (
SELECT `UserID` FROM `download`
WHERE `MovieID` = 995
)
GROUP BY `MovieID`
ORDER BY `Count` DESC
Thanks
Something like this - but (in the event that you switch to an OUTER JOIN) make sure you're counting the right thing...
SELECT x.MovieID
, COUNT(*) ttl
FROM download x
JOIN download y
ON y.userid = x.userid
AND y.movieid = 995
GROUP
BY x.MovieID
ORDER
BY ttl DESC;
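To see what "counting the right thing" means here, the sketch below runs the original IN query and the join against a tiny invented data set, using Python's built-in sqlite3 module as a stand-in for MySQL (an assumption: only results are compared, not performance). In the made-up rows, user 1 downloaded movie 995 twice, so the plain self-join counts that user's downloads twice, while joining against the DISTINCT users of movie 995 reproduces the IN result.

```python
import sqlite3

# Invented sample rows: (UserID, MovieID). User 1 downloaded movie 995 twice.
ROWS = [(1, 995), (1, 995), (1, 10), (2, 995), (2, 10), (2, 20), (3, 20)]


def run(sql):
    """Run sql against a fresh in-memory copy of the sample table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE download (UserID INTEGER, MovieID INTEGER)")
    conn.executemany("INSERT INTO download VALUES (?, ?)", ROWS)
    # Sort by MovieID so the results can be compared directly.
    result = sorted(conn.execute(sql).fetchall())
    conn.close()
    return result


IN_QUERY = """
SELECT MovieID, COUNT(*) FROM download
WHERE UserID IN (SELECT UserID FROM download WHERE MovieID = 995)
GROUP BY MovieID
"""

# Plain self-join: every matching y row multiplies the x rows of that user.
JOIN_QUERY = """
SELECT x.MovieID, COUNT(*) FROM download x
JOIN download y ON y.UserID = x.UserID AND y.MovieID = 995
GROUP BY x.MovieID
"""

# Join against the distinct users of movie 995 instead.
DISTINCT_JOIN_QUERY = """
SELECT x.MovieID, COUNT(*) FROM download x
JOIN (SELECT DISTINCT UserID FROM download WHERE MovieID = 995) y
  ON y.UserID = x.UserID
GROUP BY x.MovieID
"""

in_result = run(IN_QUERY)
join_result = run(JOIN_QUERY)
distinct_result = run(DISTINCT_JOIN_QUERY)
print(in_result)
print(join_result)
```

If each user can download a given movie only once (for example, because of a unique key on (UserID, MovieID), which is an assumption about the real schema), the plain join and the IN query agree.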
Use EXISTS instead; see "Optimizing Subqueries with the EXISTS Strategy" in the MySQL manual:
Consider the following subquery comparison:
outer_expr IN (SELECT inner_expr FROM ... WHERE subquery_where)
MySQL evaluates queries “from outside to inside.” That is, it first obtains the value of the outer expression outer_expr, and then runs the subquery and captures the rows that it produces.
A very useful optimization is to “inform” the subquery that the only rows of interest are those where the inner expression inner_expr is equal to outer_expr. This is done by pushing down an appropriate equality into the subquery's WHERE clause. That is, the comparison is converted to this:
EXISTS (SELECT 1 FROM ... WHERE subquery_where AND outer_expr=inner_expr)
After the conversion, MySQL can use the pushed-down equality to limit the number of rows that it must examine when evaluating the subquery.
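Applied to the movie query, the EXISTS form pushes the UserID equality into the subquery by hand. The sketch below checks, on invented rows and with Python's sqlite3 module standing in for MySQL, that both forms return the same counts; it says nothing about speed. On the real 7-million-row table, running EXPLAIN on the MySQL server is the way to see which form is cheaper, and a composite index such as (MovieID, UserID) (a hypothetical choice) is a natural candidate for the subquery.

```python
import sqlite3

# Invented sample rows: (UserID, MovieID).
ROWS = [(1, 995), (1, 10), (2, 995), (2, 10), (2, 20), (3, 20), (3, 30)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE download (UserID INTEGER, MovieID INTEGER)")
conn.executemany("INSERT INTO download VALUES (?, ?)", ROWS)

# Original form: outer_expr IN (SELECT inner_expr ... WHERE subquery_where)
in_sql = """
SELECT MovieID, COUNT(*) AS Count FROM download
WHERE UserID IN (SELECT UserID FROM download WHERE MovieID = 995)
GROUP BY MovieID
"""

# EXISTS form: the equality outer_expr = inner_expr is pushed into the subquery.
exists_sql = """
SELECT d.MovieID, COUNT(*) AS Count FROM download d
WHERE EXISTS (SELECT 1 FROM download s
              WHERE s.MovieID = 995 AND s.UserID = d.UserID)
GROUP BY d.MovieID
"""

in_rows = sorted(conn.execute(in_sql).fetchall())
exists_rows = sorted(conn.execute(exists_sql).fetchall())
conn.close()
print(in_rows)
print(exists_rows)
```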
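Filter directly on MovieID; you do not need a subquery. It can be done with MovieID = 995 in the WHERE clause. (Note that this returns only the download count of movie 995 itself, not the per-movie counts for the other movies downloaded by the same users, which is what the original query computes.)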
SELECT `MovieID`, COUNT(*) AS `Count`
FROM `download`
WHERE `MovieID` = 995
GROUP BY `MovieID`
ORDER BY `Count` DESC
I'm trying to run two queries on the same table to get a COUNT(*) value.
I have this:
SELECT `a`.`name`, `a`.`points` FROM `rank` AS a WHERE `id` = 1
And in the same query I want to do this:
SELECT COUNT(*) FROM `rank` AS b WHERE `b`.`points` >= `a`.`points`
I tried searching but did not find how to do a Count(*) in the same query.
Typically you would not intermingle non-aggregate and aggregate queries in MySQL. You might do this in databases that support analytic (window) functions, such as SQL Server, but not in MySQL versions before 8.0. That being said, your second query can be handled with a correlated subquery in the SELECT clause of the first query. So you may try the following:
SELECT
a.name,
a.points,
(SELECT COUNT(*) FROM `rank` b WHERE b.points >= a.points) AS cnt
FROM `rank` a
WHERE a.id = 1;
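A small self-contained check of the correlated subquery, using Python's sqlite3 module in place of MySQL and invented rows (both assumptions). The table name is quoted with backticks (SQLite accepts them too) because RANK is a reserved word in MySQL 8.0. Note that cnt includes the row itself, since a.points >= a.points is always true.

```python
import sqlite3

# Invented contents of the rank table: (id, name, points).
ROWS = [(1, "alice", 50), (2, "bob", 70), (3, "carol", 50), (4, "dave", 20)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE `rank` (id INTEGER PRIMARY KEY, name TEXT, points INTEGER)")
conn.executemany("INSERT INTO `rank` VALUES (?, ?, ?)", ROWS)

# The subquery in the SELECT list is evaluated against the outer row a.
sql = """
SELECT a.name, a.points,
       (SELECT COUNT(*) FROM `rank` b WHERE b.points >= a.points) AS cnt
FROM `rank` a
WHERE a.id = ?
"""
result = conn.execute(sql, (1,)).fetchone()
conn.close()
print(result)
```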
As I understand the question, you want to find out, for a given id, how many other rows in the table have points greater than or equal to that row's points. This can be achieved with a self-join.
select count(*) from `rank` a join `rank` b on(a.id != b.id) where a.id=1 and b.points >= a.points;
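Because of a.id != b.id, the self-join counts only the other rows, so on the same data it returns one less than the correlated-subquery count, and it does not return name or points. A sketch with invented rows, using Python's sqlite3 module as a stand-in for MySQL:

```python
import sqlite3

# Invented contents of the rank table: (id, name, points).
ROWS = [(1, "alice", 50), (2, "bob", 70), (3, "carol", 50), (4, "dave", 20)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE `rank` (id INTEGER PRIMARY KEY, name TEXT, points INTEGER)")
conn.executemany("INSERT INTO `rank` VALUES (?, ?, ?)", ROWS)

# Self-join: pair row 1 with every *other* row that has at least as many points.
self_join = conn.execute(
    "SELECT COUNT(*) FROM `rank` a JOIN `rank` b ON a.id != b.id "
    "WHERE a.id = 1 AND b.points >= a.points"
).fetchone()[0]

# Correlated-subquery count for comparison (this one includes row 1 itself).
correlated = conn.execute(
    "SELECT (SELECT COUNT(*) FROM `rank` b WHERE b.points >= a.points) "
    "FROM `rank` a WHERE a.id = 1"
).fetchone()[0]
conn.close()
print(self_join, correlated)
```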
I have a table with 38k rows, and I use this query to compare the item id in the items table with the item id in the posted_domains table.
select * from `items`
where `items`.`source_id` = 2 and `items`.`source_id` is not null
and not exists (select *
from `posted_domains`
where `posted_domains`.`item_id` = `items`.`id` and `domain_id` = 1)
order by `item_created_at` asc limit 1
This query took 8 s. I don't know whether the problem is my query or a badly configured MySQL server. The query is generated by a Laravel relation like this:
$items->doesntHave('posted', 'and', function ($q) use ($domain) {
$q->where('domain_id', $domain->id);
});
Correlated subqueries can be rather slow (they are often executed repeatedly, once for each row in the outer query), so this might be faster:
select *
from `items`
where `items`.`source_id` = 2
and `items`.`source_id` is not null
and `items`.`id` not in (
select DISTINCT item_id
from `posted_domains`
where `domain_id` = 1)
order by `item_created_at` asc
limit 1
I say "might" because subqueries in the WHERE clause can also be rather slow in MySQL.
This LEFT JOIN will probably be the fastest:
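One caveat with the NOT IN version: if posted_domains.item_id can ever be NULL, NOT IN returns no rows at all, because a comparison with NULL yields UNKNOWN; NOT EXISTS (the original query) does not have this problem. The sketch below demonstrates this with Python's sqlite3 module standing in for MySQL (the NULL semantics are standard SQL) and a simplified, hypothetical schema containing only the columns the query touches.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE items (id INTEGER PRIMARY KEY, source_id INTEGER, item_created_at TEXT);
CREATE TABLE posted_domains (item_id INTEGER, domain_id INTEGER);
-- Invented data: item 1 is already posted to domain 1, item 2 is not.
INSERT INTO items VALUES (1, 2, '2020-01-01'), (2, 2, '2020-01-02');
INSERT INTO posted_domains VALUES (1, 1);
""")

not_in_sql = """
SELECT id FROM items
WHERE source_id = 2
  AND id NOT IN (SELECT item_id FROM posted_domains WHERE domain_id = 1)
ORDER BY item_created_at LIMIT 1
"""
not_exists_sql = """
SELECT id FROM items
WHERE source_id = 2
  AND NOT EXISTS (SELECT 1 FROM posted_domains
                  WHERE posted_domains.item_id = items.id AND domain_id = 1)
ORDER BY item_created_at LIMIT 1
"""

before = (conn.execute(not_in_sql).fetchall(), conn.execute(not_exists_sql).fetchall())

# A single NULL item_id for domain 1 makes every NOT IN comparison UNKNOWN.
conn.execute("INSERT INTO posted_domains VALUES (NULL, 1)")
after = (conn.execute(not_in_sql).fetchall(), conn.execute(not_exists_sql).fetchall())
conn.close()
print(before, after)
```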
select *
from `items`
LEFT JOIN (
select DISTINCT item_id
from `posted_domains`
where `domain_id` = 1) AS subQ
ON items.id = subQ.item_id
where `items`.`source_id` = 2
and `items`.`source_id` is not null
and subQ.item_id is null
order by `item_created_at` asc
limit 1;
Since this is a "no matches" scenario, it technically doesn't even need to be a subquery; it might be faster as a direct LEFT JOIN, but that will depend on the indexes and possibly on the actual data values.
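A sketch of the direct LEFT JOIN version, under the same assumptions (Python's sqlite3 as a stand-in for MySQL, a simplified hypothetical schema, invented rows). The important detail is that the domain_id = 1 condition belongs in the ON clause: moving it into WHERE throws away exactly the NULL-extended rows the anti-join is looking for.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE items (id INTEGER PRIMARY KEY, source_id INTEGER, item_created_at TEXT);
CREATE TABLE posted_domains (item_id INTEGER, domain_id INTEGER);
-- Invented data: item 1 posted to domain 1, item 2 posted only to domain 7.
INSERT INTO items VALUES (1, 2, '2020-01-01'), (2, 2, '2020-01-02'), (3, 2, '2020-01-03');
INSERT INTO posted_domains VALUES (1, 1), (2, 7);
""")

# Direct anti-join: the domain filter lives in the ON clause.
direct_sql = """
SELECT items.id FROM items
LEFT JOIN posted_domains pd
  ON pd.item_id = items.id AND pd.domain_id = 1
WHERE items.source_id = 2 AND pd.item_id IS NULL
ORDER BY items.item_created_at LIMIT 1
"""

# Wrong variant for contrast: filtering domain_id in WHERE discards the NULL rows.
wrong_sql = """
SELECT items.id FROM items
LEFT JOIN posted_domains pd ON pd.item_id = items.id
WHERE items.source_id = 2 AND pd.domain_id = 1 AND pd.item_id IS NULL
ORDER BY items.item_created_at LIMIT 1
"""

direct = conn.execute(direct_sql).fetchall()
wrong = conn.execute(wrong_sql).fetchall()
conn.close()
print(direct, wrong)
```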
I have a MySQL query that I need to optimize as much as possible (it should load in under 5 s, if possible).
The query is as follows:
SELECT domain_id, COUNT(keyword_id) as total_count
FROM tableName
WHERE keyword_id IN (SELECT DISTINCT keyword_id FROM tableName WHERE domain_id = X)
GROUP BY domain_id
ORDER BY total_count DESC
LIMIT ...
X is an integer that comes from an input.
domain_id and keyword_id are indexed.
The database is on localhost, so network speed should not be a bottleneck.
The subquery in the WHERE clause can return up to 10 million rows. Also, MySQL seems to have a really hard time calculating the COUNT and ordering by it.
I tried combining this query with Solr, without success: fetching such a large number of rows at once is hard for both MySQL and Solr.
I'm looking for a solution that gives the same results, whether that means using a different technology or improving this MySQL query.
Thanks!
The query logic is this:
We have a domain, and we search for all the keywords used on that domain (this is the subquery). Then we take all the domains that use at least one of the keywords found by the subquery, grouped by domain with the number of keywords used by each domain, and we have to display them ordered DESC by that number.
I hope this makes sense.
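To pin down the expected results while trying other technologies, the logic described above can be written as a small in-memory Python sketch. The function name is hypothetical, and it only restates the semantics of the query (counting matching rows per domain, as COUNT(keyword_id) does); it does not solve the performance problem.

```python
from collections import Counter


def domains_sharing_keywords(rows, x):
    """rows: iterable of (domain_id, keyword_id) pairs; x: the input domain."""
    rows = list(rows)
    # Subquery: the set of keywords used on domain x.
    keywords = {kw for dom, kw in rows if dom == x}
    # Outer query: count matching rows per domain (COUNT ... GROUP BY domain_id).
    counts = Counter(dom for dom, kw in rows if kw in keywords)
    # ORDER BY total_count DESC, ties broken by domain_id for determinism.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    sample = [(1, 10), (1, 11), (2, 10), (2, 12), (3, 11), (3, 10), (4, 12)]
    print(domains_sharing_keywords(sample, 1))  # [(1, 2), (3, 2), (2, 1)]
```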
You could try a JOIN instead of the subquery:
SELECT tableName.domain_id, COUNT(tableName.keyword_id) AS total_count
FROM tableName
INNER JOIN tableName AS rejoin
ON rejoin.keyword_id = tableName.keyword_id
WHERE rejoin.domain_id = X
GROUP BY tableName.domain_id
ORDER BY total_count DESC
LIMIT ...
I am not 100% sure, but can you please try this:
SELECT t1.domain_id, COUNT(t1.keyword_id) as total_count
FROM tableName AS t1 LEFT JOIN
(SELECT DISTINCT keyword_id FROM tableName WHERE domain_id = X) AS t2
ON t1.keyword_id = t2.keyword_id
WHERE t2.keyword_id IS NOT NULL
GROUP BY t1.domain_id
ORDER BY total_count DESC
LIMIT ...
The goal is to replace the WHERE IN clause with a join, which should make it a lot quicker (the LEFT JOIN combined with the t2.keyword_id IS NOT NULL filter behaves like an INNER JOIN). A WHERE ... IN (subquery) clause often makes the MySQL server struggle, and this is even more noticeable with huge amounts of data. Use WHERE IN only if it makes your query easier to read and understand, if you have a small data set, or if there is no other way (but you will probably find another way anyway :) )
In terms of MySQL, all you can do is minimize disk I/O for the query by using covering indexes, and rewrite the query a little more efficiently so that it benefits from them.
Since every keyword_id that passes the filter has a match in another copy of the table (so it cannot be NULL), COUNT(keyword_id) can become COUNT(*).
The kind of subquery you use is known to be the worst case for MySQL (versions before 5.6 execute it for each row), but I am not sure it should be replaced with a JOIN here, because it might be a proper strategy for your data.
As you probably understand, a query like:
SELECT domain_id, COUNT(*) as total_count
FROM tableName
WHERE keyword_id IN (X,Y,Z)
GROUP BY domain_id
ORDER BY total_count DESC
would perform best with a covering composite index on (keyword_id, domain_id [, ...]), so that index is a must. On the other hand, a query like:
SELECT DISTINCT keyword_id FROM tableName WHERE domain_id = X
will perform best with a covering composite index on (domain_id, keyword_id [, ...]). So you need both of them.
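On the real server, MySQL's EXPLAIN shows whether an index is covering ("Using index" in the Extra column). As a self-contained stand-in, the sketch below creates the two composite indexes in SQLite through Python's sqlite3 module (index names and data are hypothetical) and asks SQLite's planner how it would run the subquery. SQLite's EXPLAIN QUERY PLAN text is engine-specific and varies between versions, so only the "COVERING INDEX" phrase should be relied on.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tableName (domain_id INTEGER, keyword_id INTEGER)")
conn.executemany("INSERT INTO tableName VALUES (?, ?)",
                 [(d, k) for d in range(1, 6) for k in range(d, d + 4)])

# The two composite indexes suggested above (names are hypothetical).
conn.execute("CREATE INDEX idx_keyword_domain ON tableName (keyword_id, domain_id)")
conn.execute("CREATE INDEX idx_domain_keyword ON tableName (domain_id, keyword_id)")

# The last column of each EXPLAIN QUERY PLAN row is a human-readable description.
plan = [row[-1] for row in conn.execute(
    "EXPLAIN QUERY PLAN SELECT DISTINCT keyword_id FROM tableName WHERE domain_id = ?",
    (1,))]
conn.close()
print(plan)
```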
Hopefully (though I am not sure), once you have the latter index, MySQL will understand that it does not need to select all those keyword_id values in the subquery but only needs to check whether there is an entry in the index; I am sure that this is better expressed without DISTINCT.
So I would try adding those two indexes and rewriting the query as:
SELECT domain_id, COUNT(*) as total_count
FROM tableName
WHERE keyword_id IN (SELECT keyword_id FROM tableName WHERE domain_id = X)
GROUP BY domain_id
ORDER BY total_count DESC
Another option is to rewrite the query as follows:
SELECT domain_id, COUNT(*) as total_count
FROM (
SELECT DISTINCT keyword_id
FROM tableName
WHERE domain_id = X
) as kw
JOIN tableName USING (keyword_id)
GROUP BY domain_id
ORDER BY total_count DESC
Once again, you need those two composite indexes.
Which one of the queries is quicker depends on the statistics in your tableName.
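A quick equivalence check of the IN form and the derived-table form, with invented rows in which domain 1 lists keyword 10 twice and Python's sqlite3 module standing in for MySQL (both assumptions). The DISTINCT in the derived table is what keeps that duplicate from inflating the counts; which form is faster still has to be measured on the real data.

```python
import sqlite3

# Invented (domain_id, keyword_id) rows; domain 1 lists keyword 10 twice.
ROWS = [(1, 10), (1, 10), (1, 11), (2, 10), (2, 12), (3, 11), (3, 10), (4, 12)]
X = 1  # the input domain

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tableName (domain_id INTEGER, keyword_id INTEGER)")
conn.executemany("INSERT INTO tableName VALUES (?, ?)", ROWS)

in_sql = """
SELECT domain_id, COUNT(*) AS total_count FROM tableName
WHERE keyword_id IN (SELECT keyword_id FROM tableName WHERE domain_id = ?)
GROUP BY domain_id
"""
derived_sql = """
SELECT domain_id, COUNT(*) AS total_count
FROM (SELECT DISTINCT keyword_id FROM tableName WHERE domain_id = ?) AS kw
JOIN tableName USING (keyword_id)
GROUP BY domain_id
"""

# Sort by domain_id so the two result sets can be compared directly.
in_rows = sorted(conn.execute(in_sql, (X,)).fetchall())
derived_rows = sorted(conn.execute(derived_sql, (X,)).fetchall())
conn.close()
print(in_rows)
print(derived_rows)
```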
I'd like to list all rows that have a match (the same firm_name) in the same table.
So far I have come up with this:
SELECT *
FROM parim_firms
WHERE firm_name IN (
SELECT firm_name
FROM parim_firms
GROUP BY firm_name
HAVING COUNT(*) > 1
)
But this query keeps running, although the subquery itself runs in 0.1 sec.
How could I optimize this?
I think the subquery executes for each row, not only once. Am I right?
How about joining it?
SELECT a.*
FROM parim_firms a
INNER JOIN
(
SELECT firm_name
FROM parim_firms
GROUP BY firm_name
HAVING COUNT(*) > 1
) b ON a.firm_name = b.firm_name
PS: be sure to add an index on the firm_name column for faster execution.
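A minimal check of the join approach, using Python's sqlite3 module instead of MySQL and a hypothetical parim_firms table with only an id and a firm_name column (the real table presumably has more). The ORDER BY is added only to make the output deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE parim_firms (id INTEGER PRIMARY KEY, firm_name TEXT)")
conn.executemany("INSERT INTO parim_firms (firm_name) VALUES (?)",
                 [("Acme",), ("Globex",), ("Acme",), ("Initech",), ("Globex",)])
# The index suggested in the PS (the index name is hypothetical).
conn.execute("CREATE INDEX idx_firm_name ON parim_firms (firm_name)")

# The duplicated names are computed once in the derived table, then joined back.
sql = """
SELECT a.*
FROM parim_firms a
INNER JOIN (
    SELECT firm_name FROM parim_firms
    GROUP BY firm_name HAVING COUNT(*) > 1
) b ON a.firm_name = b.firm_name
ORDER BY a.id
"""
duplicates = conn.execute(sql).fetchall()
conn.close()
print(duplicates)
```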
I have a table which counts occurrences of one specific action by different users on different objects:
CREATE TABLE `Actions` (
`object_id` int(10) unsigned NOT NULL,
`user_id` int(10) unsigned NOT NULL,
`actionTime` datetime
);
Every time a user performs this action, a row is inserted. I can count how many actions were performed on each object, and order objects by 'activity':
SELECT object_id, count(object_id) AS action_count
FROM `Actions`
GROUP BY object_id
ORDER BY action_count;
How can I limit the results to the top n objects? The LIMIT clause is applied before the aggregation, so it leads to wrong results. The table is potentially huge (millions of rows), and I probably need to count tens of times per minute, so I'd like to do this as efficiently as possible.
Edit: Actually, Machine is right, and I was wrong about when LIMIT is applied. My query returned the correct results, but the GUI presenting them threw me off... this kind of makes the question pointless. Sorry!
Actually... LIMIT is applied last, after any HAVING clause, so it should not give you incorrect results. However, since LIMIT is applied last, it will not make your query execute any faster: a temporary table still has to be created and sorted by action count before the result is cut off. Also, remember to sort in descending order:
SELECT object_id, count(object_id) AS action_count
FROM `Actions`
GROUP BY object_id
ORDER BY action_count DESC
LIMIT 10;
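To confirm that LIMIT keeps the top groups rather than truncating the input rows, here is a sketch with invented data, using Python's sqlite3 module as a stand-in for MySQL (the clause order GROUP BY, ORDER BY, LIMIT is evaluated the same way in both).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Actions (object_id INTEGER NOT NULL, "
             "user_id INTEGER NOT NULL, actionTime TEXT)")
# Invented activity: object 7 gets 5 actions, object 3 gets 3, object 9 gets 1.
actions = ([(7, u, None) for u in range(5)]
           + [(3, u, None) for u in range(3)]
           + [(9, 1, None)])
conn.executemany("INSERT INTO Actions VALUES (?, ?, ?)", actions)

# LIMIT is applied after GROUP BY and ORDER BY, so it keeps the top groups.
top2 = conn.execute("""
SELECT object_id, COUNT(object_id) AS action_count
FROM Actions
GROUP BY object_id
ORDER BY action_count DESC
LIMIT 2
""").fetchall()
conn.close()
print(top2)
```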
You could try adding an index on object_id. That way, only the index needs to be scanned instead of the whole Actions table.
How about:
SELECT * FROM
(
SELECT object_id, count(object_id) AS action_count
FROM `Actions`
GROUP BY object_id
) AS t
ORDER BY action_count DESC
LIMIT 15
Also, if you have some measure of what must be the minimum number of actions to be included (e.g. the top n ones are surely more than 1000), you can increase the efficiency by adding a HAVING clause:
SELECT * FROM
(
SELECT object_id, count(object_id) AS action_count
FROM `Actions`
GROUP BY object_id
HAVING action_count > 1000
) AS t
ORDER BY action_count DESC
LIMIT 15
I know this thread is two years old, but Stack Overflow still surfaces it as relevant, so here is my $0.02. ORDER BY clauses are computationally expensive, so they should be avoided on large tables. A trick I used (partly from Joe Celko's SQL for Smarties) is something like:
SELECT t0.object_id, COUNT(*) AS counter FROM (SELECT object_id, COUNT(*) AS cnt FROM `Actions` GROUP BY object_id) AS t0 JOIN (SELECT object_id, COUNT(*) AS cnt FROM `Actions` GROUP BY object_id) AS t1 ON t1.cnt >= t0.cnt GROUP BY t0.object_id HAVING counter <= 15
This gives you the top 15 most active objects without sorting. Note that as of v5, MySQL only caches result sets for exactly identical queries (whitespace included), so the nested query will not get cached; using a view would resolve that problem. (The query cache was removed entirely in MySQL 8.0.)
Yes, it's three queries instead of two, and the only gain is not having to sort the grouped query, but if you have a lot of groups, it will be faster.
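A runnable sketch of the self-join ranking, with Python's sqlite3 module standing in for MySQL and invented action counts (both assumptions). Each group is joined to every group that is at least as active, so counter is that group's rank; objects tied at the cut-off share the larger rank, so ties can leave fewer than N rows. The join pairs every group with every other group, so its cost grows with the square of the number of groups, which makes it worth benchmarking against ORDER BY ... LIMIT.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Actions (object_id INTEGER, user_id INTEGER)")
# Invented action counts per object: 1 -> 4, 2 -> 1, 3 -> 3, 4 -> 2.
counts = {1: 4, 2: 1, 3: 3, 4: 2}
conn.executemany("INSERT INTO Actions VALUES (?, ?)",
                 [(obj, u) for obj, n in counts.items() for u in range(n)])

N = 2  # how many of the most active objects to keep
sql = """
SELECT t0.object_id, t0.cnt, COUNT(*) AS counter
FROM (SELECT object_id, COUNT(*) AS cnt FROM Actions GROUP BY object_id) AS t0
JOIN (SELECT object_id, COUNT(*) AS cnt FROM Actions GROUP BY object_id) AS t1
  ON t1.cnt >= t0.cnt   -- counter = number of groups at least as active as t0
GROUP BY t0.object_id, t0.cnt
HAVING COUNT(*) <= ?
"""
# No ORDER BY in the SQL; the result is sorted in Python only for display.
top = sorted(conn.execute(sql, (N,)).fetchall())
conn.close()
print(top)
```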
Side note: this query pattern is also really handy for computing medians without sorting.
SELECT * FROM (SELECT object_id, count(object_id) AS action_count
FROM `Actions`
GROUP BY object_id) AS t
ORDER BY action_count DESC LIMIT 10;