Combined SQL query takes too much time - MySQL

One of my MySQL queries takes too much time to execute. In this query, I use the IN operator to fetch data from the database.
My query:
SELECT *
FROM databse_posts.post_feeds
WHERE
post_id IN (SELECT post_id FROM database_users.user_bookmarks where user_id=3) AND
post_date < unix_timestamp();
Individually, both queries execute quickly:
SELECT post_id FROM database_users.user_bookmarks where user_id=3
takes around 400 ms max
and
SELECT * FROM databse_posts.post_feeds Where post_date < unix_timestamp();
takes 300 ms max
But when I combine both queries into one using the IN operator, it takes around 6 to 7 seconds.
Why is this taking so much time?
I have also written other queries of the same type, and none of them take that long.

Instead of a WHERE ... IN (subselect), you could try an INNER JOIN on the subselect:
SELECT *
FROM databse_posts.post_feeds
INNER JOIN (
    SELECT post_id
    FROM database_users.user_bookmarks
    WHERE user_id = 3
) T ON T.post_id = post_feeds.post_id
    AND post_date < unix_timestamp();
Also be sure you have proper indexes on post_feeds.post_id and on user_bookmarks (user_id, post_id).
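If they are missing, a minimal sketch of those indexes (the index names here are made up, and post_feeds.post_id may already be covered if it is the primary key):
CREATE INDEX idx_post_feeds_post_id ON databse_posts.post_feeds (post_id);
CREATE INDEX idx_bookmarks_user_post ON database_users.user_bookmarks (user_id, post_id);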

My approach:
You need to create indexes on post_feeds.post_id, user_bookmarks.post_id, user_bookmarks.user_id, and post_feeds.post_date, then use an INNER JOIN to let the MySQL engine filter and merge rows efficiently:
SELECT pf.*
FROM databse_posts.post_feeds pf
INNER JOIN database_users.user_bookmarks ub
    ON pf.post_id = ub.post_id
WHERE ub.user_id = 3
  AND pf.post_date < unix_timestamp();
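To verify that the indexes are actually used, you can inspect the plan; the exact output depends on your MySQL version and data:
EXPLAIN SELECT pf.*
FROM databse_posts.post_feeds pf
INNER JOIN database_users.user_bookmarks ub
    ON pf.post_id = ub.post_id
WHERE ub.user_id = 3
  AND pf.post_date < unix_timestamp();
The key column of each row in the output should name one of the indexes above; ALL in the type column means a full table scan.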

My rough guess here would be that the WHERE IN expression is doing something of which you might not be aware. Consider your full query:
SELECT *
FROM databse_posts.post_feeds
WHERE
post_id IN (SELECT post_id FROM database_users.user_bookmarks where user_id=3) AND
post_date < unix_timestamp();
MySQL has to compare each record's post_id against the set of post_id values produced by the subquery. Depending on the version, the optimizer may even turn this into a dependent subquery that is re-evaluated for every row, which is much more costly than running the subquery once. MySQL has various tricks at its disposal to speed this up, but a subquery inside a WHERE IN clause is not the same as running that subquery just once.
If this hypothesis is correct, then the following query should also be in the range of 6-7 seconds:
SELECT *
FROM databse_posts.post_feeds
WHERE
post_id IN (SELECT post_id FROM database_users.user_bookmarks where user_id=3)
If so, then we would know the source of slow performance.
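One way to test this hypothesis is to look at the plan (assuming a MySQL version without semi-join optimization, where this behavior is most visible):
EXPLAIN SELECT *
FROM databse_posts.post_feeds
WHERE
post_id IN (SELECT post_id FROM database_users.user_bookmarks where user_id=3);
If the select_type column shows DEPENDENT SUBQUERY for the inner query, MySQL is re-evaluating the subquery per row instead of materializing it once.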

Related

Why is my SQL query with a subquery so slow even though the subquery performs fast?

My query is something like:
SELECT *,
(SELECT COUNT(*) FROM comments WHERE comments.thread = threads.id) AS comments
FROM threads
LIMIT 10
comments.thread is indexed, and queries like this run fast:
SELECT COUNT(*) FROM comments WHERE comments.thread = 'someId'
However, my query is extremely slow. It takes 10 seconds times the LIMIT I define. Why?
For this query:
SELECT t.*,
(SELECT COUNT(*) FROM comments c WHERE c.thread = t.id) AS comments
FROM threads t
LIMIT 10;
You want an index on comments(thread). If your other query runs fast, then I would guess that you already have one.
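If it turns out to be missing, a minimal sketch (the index name is made up):
CREATE INDEX idx_comments_thread ON comments (thread);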
Perhaps the LIMIT and subquery are acting strangely. Is this version also slow?
SELECT t.*,
(SELECT COUNT(*) FROM comments c WHERE c.thread = t.id) AS comments
FROM (SELECT t.*
FROM threads t
LIMIT 10
) t;
Your inner query is a correlated subquery, meaning it uses a value from the outer query, so it executes once for every row of the outer query. Maybe MySQL is not so good at optimizing such queries.
Try this:
SELECT threads.*, count(comments.thread) AS comments
FROM threads
LEFT JOIN comments ON comments.thread = threads.id -- LEFT JOIN keeps threads that have no comments (count 0)
GROUP BY 1,2,3,4,5 -- one number here for each column of the threads table
LIMIT 10
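On MySQL 5.7 and later, which recognize columns as functionally dependent on a grouped primary key, a sketch of the same idea without enumerating column positions (assuming threads.id is the primary key):
SELECT threads.*, COUNT(comments.thread) AS comments
FROM threads
LEFT JOIN comments ON comments.thread = threads.id
GROUP BY threads.id
LIMIT 10;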

Merge multiple SQL queries for efficiency

How would I combine these two SQL queries into a single query for greater speed and efficiency?
SELECT price FROM table LIMIT 30
SELECT AVG(price) as avg FROM table
It will not be faster or more efficient, but I can see uses for it (such as comparing a value to the average), something like this:
SELECT t.price, avgSubQ.theAvg, t.price / avgSubQ.theAvg AS relativeToAvg
FROM table AS t
INNER JOIN (SELECT AVG(price) AS theAvg FROM table) AS avgSubQ
LIMIT 30;
Since the AVG will generate a single record, you could just use a CROSS JOIN:
SELECT price, B.mavg
FROM table
CROSS JOIN (SELECT AVG(price) as mavg FROM table) B
LIMIT 30
Try this query, using a window function (MySQL 8.0+); note that simply mixing AVG(price) with a bare column does not work without a GROUP BY:
SELECT price, AVG(price) OVER () AS avg_price
FROM table
LIMIT 30
Here every row repeats the overall average price next to its own price, so you get what you want in one query.

MySQL nested query speed

I'm coming from a Postgres background and trying to convert my application to MySQL. I have a query which is very fast on Postgres and very slow on MySQL. After doing some analysis, I have determined that one cause of the drastic speed difference is nested queries. The following pseudo query takes 170 ms on Postgres and 5.5 seconds on MySQL.
SELECT * FROM (
SELECT id FROM a INNER JOIN b
) AS first LIMIT 10
On both MySQL and Postgres the speed is the same for the following query (less than 10 ms)
SELECT id FROM a INNER JOIN b LIMIT 10
I have the exact same tables, indices, and data on both databases, so I really have no idea why this is so slow.
Any insight would be greatly appreciated.
Thanks
EDIT
Here is one specific example of why I need to do this. I need to get the sum of maxima. To do that I need a subselect, as shown in the query below.
SELECT SUM(a) AS a
FROM (
SELECT table2.b, MAX(table1.a) AS a
FROM table1
INNER JOIN table2 ON table2.abc_id = table1.abc_id
AND table1.read_datetime >= table2.issuance_datetime
AND table1.read_datetime < COALESCE(table2.unassignment_datetime, DATE('9999-01-01'))
WHERE table1.read_datetime BETWEEN '2012-01-01 10:30:01' AND '2013-07-18 03:03:42' AND table2.c = 0
GROUP BY table2.id, b
) AS first
GROUP BY b
LIMIT 10
Again this query takes 14 seconds on MySQL and 238 ms on Postgres. Here is the output from explain on MySQL:
id,select_type,table,type,possible_keys,key,key_len,ref,rows,Extra
1,PRIMARY,<derived2>,ALL,\N,\N,\N,\N,25584,Using temporary; Using filesort
2,DERIVED,table2,index,PRIMARY,index_table2_on_b,index_table2_on_d,index_table2_on_issuance_datetime,index_table2_on_unassignment_datetime,index_table2_on_e,PRIMARY,4,\N,25584,Using where
2,DERIVED,tz,ref,index_table1_on_d,index_table1_on_read_datetime,index_table1_on_d_and_read_datetime,index_table1_on_4,4,db.table2.dosimeter_id,1,Using where
Jon, answering your comment, here is an example:
drop table if exists temp_preliminary_table;
create temporary table temp_preliminary_table
SELECT table2.b, MAX(table1.a) AS a
FROM table1
INNER JOIN table2 ON table2.abc_id = table1.abc_id
AND table1.read_datetime >= table2.issuance_datetime
AND table1.read_datetime < COALESCE(table2.unassignment_datetime, DATE('9999-01-01'))
WHERE table1.read_datetime BETWEEN '2012-01-01 10:30:01' AND '2013-07-18 03:03:42' AND table2.c = 0
GROUP BY table2.id, b;
-- I suggest you add indexes to this temp table
alter table temp_preliminary_table
add index idx_b(b); -- Add as many indexes as you need
-- Now perform your query on this temp_table
SELECT SUM(a) AS a
FROM temp_preliminary_table
GROUP BY b
LIMIT 10;
This is just an example, splitting your query into three steps.
You need to remember that temp tables in MySQL are only visible to the connection that created them, so any other connection won't see temp tables created by another connection (for better or worse).
This "divide-and-conquer" approach has saved me many headaches. I hope it helps you.
In the nested query, MySQL performs the whole join before applying the LIMIT, while PostgreSQL is smart enough to figure out that it only needs to join some 10 tuples.
Correct me if I am wrong, but why don't you try:
SELECT * FROM a INNER JOIN b LIMIT 10;
Given that table2.id is the primary key, this query, with the LIMIT in the inner query, is functionally equivalent to yours, where the LIMIT is in the outer query, and that is what the PostgreSQL planner figured out:
SELECT table2.b, MAX(table1.a) AS a
FROM table1
INNER JOIN table2 ON table2.abc_id = table1.abc_id
AND table1.read_datetime >= table2.issuance_datetime
AND table1.read_datetime < COALESCE(table2.unassignment_datetime, DATE('9999-01-01'))
WHERE table1.read_datetime BETWEEN '2012-01-01 10:30:01' AND '2013-07-18 03:03:42' AND table2.c = 0
GROUP BY table2.id, b
order by a desc
LIMIT 10

Optimizing database query with up to 10 million rows as result

I have a MySQL query that I need to optimize as much as possible (it should have a load time below 5 seconds, if possible).
The query is as follows:
SELECT domain_id, COUNT(keyword_id) as total_count
FROM tableName
WHERE keyword_id IN (SELECT DISTINCT keyword_id FROM tableName WHERE domain_id = X)
GROUP BY domain_id
ORDER BY total_count DESC
LIMIT ...
X is an integer that comes from an input.
domain_id and keyword_id are indexed.
The database is on localhost, so network speed is not a bottleneck.
The subquery in the WHERE clause can return up to 10 million results. MySQL also seems to have a really hard time calculating the COUNT and ordering by it.
I tried combining this query with Solr, but with no results; retrieving such a high number of rows at once is hard for both MySQL and Solr.
I'm looking for a solution that gives the same results, whether that means a different technology or an improvement to this MySQL query.
Thanks!
The query logic is this:
We have a domain and we search for all the keywords used on that domain (this is the subquery). Then we take all the domains that use at least one of the keywords found by the first query, grouped by domain, with the number of keywords used by each domain, and we display them ordered descending by that keyword count.
I hope this makes sense.
You may try a JOIN instead of the subquery:
SELECT tableName.domain_id, COUNT(tableName.keyword_id) AS total_count
FROM tableName
INNER JOIN tableName AS rejoin
ON rejoin.keyword_id = tableName.keyword_id
WHERE rejoin.domain_id = X
GROUP BY tableName.domain_id
ORDER BY total_count DESC
LIMIT ...
I am not 100% sure, but can you try this, please:
SELECT t1.domain_id, COUNT(t1.keyword_id) as total_count
FROM tableName AS t1 LEFT JOIN
(SELECT DISTINCT keyword_id FROM tableName WHERE domain_id = X) AS t2
ON t1.keyword_id = t2.keyword_id
WHERE t2.keyword_id IS NOT NULL
GROUP BY t1.domain_id
ORDER BY total_count DESC
LIMIT ...
The goal is to replace the WHERE IN clause with an INNER JOIN, which will make the query a lot quicker. A WHERE IN clause always makes the MySQL server struggle, and it is even more noticeable with huge amounts of data. Use WHERE IN only if it makes your query easier to read and understand, you have a small data set, or there is no other way (but you will probably have another way to do it anyway :) )
In terms of MySQL, all you can do is minimize disk I/O for the query using covering indexes, and rewrite it a little more efficiently so that it benefits from them.
Since keyword_id has a match in another copy of the table, COUNT(keyword_id) becomes COUNT(*).
The kind of subquery you use is known to be the worst case for MySQL (it executes the subquery for each row), but I am not sure it should be replaced with a JOIN here, because it might be a proper strategy for your data.
As you probably understand, the query like:
SELECT domain_id, COUNT(*) as total_count
FROM tableName
WHERE keyword_id IN (X,Y,Z)
GROUP BY domain_id
ORDER BY total_count DESC
would have the best performance with a covering composite index (keyword_id, domain_id [,...]), so it is a must. On the other hand, a query like:
SELECT DISTINCT keyword_id FROM tableName WHERE domain_id = X
will have the best performance on a covering composite index (domain_id, keyword_id [,...]). So you need both of them.
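A sketch of those two covering composite indexes (the index names are hypothetical):
CREATE INDEX idx_keyword_domain ON tableName (keyword_id, domain_id);
CREATE INDEX idx_domain_keyword ON tableName (domain_id, keyword_id);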
Hopefully (though I am not sure), when you have the latter index, MySQL can understand that it does not need to select all those keyword_id values in the subquery, but only to check whether an entry exists in the index, and that intent is better expressed without DISTINCT.
So, I would try to add those two indexes and rewrite the query as:
SELECT domain_id, COUNT(*) as total_count
FROM tableName
WHERE keyword_id IN (SELECT keyword_id FROM tableName WHERE domain_id = X)
GROUP BY domain_id
ORDER BY total_count DESC
Another option is to rewrite the query as follows:
SELECT domain_id, COUNT(*) as total_count
FROM (
SELECT DISTINCT keyword_id
FROM tableName
WHERE domain_id = X
) as kw
JOIN tableName USING (keyword_id)
GROUP BY domain_id
ORDER BY total_count DESC
Once again you need those two composite indexes.
Which one of the queries is quicker depends on the statistics in your tableName.

Two SELECT statements with COUNT - very slow

I am trying to get all the entries in a session log table where a session has more than 10 entries (i.e. the count of the session_id is greater than 10). What I have right now are two select statements:
select * from log_metrics where session_id in
( select session_id from log_metrics
group by session_id having count(*) > 10
)
The log_metrics table is quite large, approx. 7,700,000 rows. The inner select takes 12.88 seconds and finds 178,000 session_ids. Written like this, the whole query doesn't finish running, but when adding LIMIT 10 to the end of the outer select it completes in 18 seconds, and LIMIT 100 completes in 3 min 35 sec. I tried adding the LIMIT to the inner select but got the following error:
ERROR 1235 (42000): This version of MySQL doesn't yet support 'LIMIT & IN/ALL/ANY/SOME subquery'
Is there a way to rewrite this query to speed things up? I only need about 5,000 rows from log_metrics returned, not rows for all 178,000 session_ids.
Thanks for any help you might be able to give. I am new to MySQL and to posting, so pardon any etiquette missteps.
select *
from log_metrics a
inner join (select session_id from log_metrics group by session_id having count(*) > 10) b
on a.session_id = b.session_id
Here's a SQL fiddle: http://sqlfiddle.com/#!2/7bed6/3
I have no idea if this will work (I don't know what version of MySQL you have, and I don't have an instance regardless), but would using a JOIN work as you want?
SELECT *
FROM log_metrics a
JOIN (SELECT session_id
FROM log_metrics
GROUP BY session_id
HAVING COUNT(session_id) > 10
LIMIT 5000) b
ON b.session_id = a.session_id
You didn't mention this, but for future questioners: the reason he needs the LIMIT inside the inner query is that he wants (a maximum of) 5,000 session_ids, not 5,000 total rows from the log (which could come to 50,000 rows or more).
Try switching to an EXISTS check instead of the IN clause:
select * from log_metrics a where EXISTS
( select b.session_id from log_metrics b
where a.session_id = b.session_id
group by b.session_id having count(*) > 10
)
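A variant worth trying, assuming your MySQL version allows LIMIT inside an EXISTS subquery (the error above applies only to IN/ALL/ANY/SOME): test for the existence of an 11th row directly, so the scan of a session can stop as soon as 11 matching rows have been seen instead of counting the whole group:
select * from log_metrics a where EXISTS
( select 1 from log_metrics b
  where a.session_id = b.session_id
  limit 1 offset 10 -- true only if at least 11 rows share this session_id
)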