two select statements with count - very slow - mysql

I am trying to get all the entries in a session log table where a session has more than 10 entries (i.e. the count of the session_id is greater than 10). What I have right now are two select statements:
select * from log_metrics where session_id in
( select session_id from log_metrics
  group by session_id having count(*) > 10
)
The log_metrics table is quite large, approx. 7,700,000 rows. The inner select takes 12.88 seconds and finds 178,000 session IDs. Written like this, the whole query never finishes, but with limit 10 added to the end of the outer select it completes in 18 seconds, and with limit 100 in 3 min 35 sec. I tried adding the limit to the inner select instead but got the following error:
ERROR 1235 (42000): This version of MySQL doesn't yet support 'LIMIT & IN/ALL/ANY/SOME subquery'
Is there a way to rewrite this query to speed things up? I only need to get about 5,000 rows from log_metrics, not every row for all 178,000 session IDs.
Thanks for any help you might be able to give. I am new to MySQL and to posting, so pardon any etiquette missteps.

select *
from log_metrics a
inner join (select session_id from log_metrics group by session_id having count(*) > 10) b
on a.session_id = b.session_id
Here's a SQL fiddle: http://sqlfiddle.com/#!2/7bed6/3
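If you do need a LIMIT inside the subquery, the 1235 error from the question can usually be sidestepped by wrapping the limited subquery in one more derived table, since MySQL only rejects a LIMIT that sits directly inside IN; a sketch:
select * from log_metrics where session_id in
( select session_id from
  ( select session_id from log_metrics
    group by session_id having count(*) > 10
    limit 5000
  ) as limited
)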

I have no idea if this will work (I don't know what version of MySQL you have, and I don't have an instance to test on regardless), but would using a JOIN work as you want?
SELECT *
FROM log_metrics a
JOIN (SELECT session_id
      FROM log_metrics
      GROUP BY session_id
      HAVING COUNT(session_id) > 10
      LIMIT 5000) b
ON b.session_id = a.session_id
You didn't mention this, but for future readers: the reason the LIMIT needs to go inside the inner query is that he wants (a maximum of) 5,000 session_ids, not 5,000 total rows from the log (which could come to 50,000 rows or more returned).
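One caveat: without an ORDER BY, MySQL is free to return any 5,000 of the qualifying session_ids, and the choice can differ between runs. If it matters which ones come back, order the inner query before limiting, for example:
SELECT *
FROM log_metrics a
JOIN (SELECT session_id
      FROM log_metrics
      GROUP BY session_id
      HAVING COUNT(session_id) > 10
      ORDER BY session_id -- any deterministic ordering will do
      LIMIT 5000) b
ON b.session_id = a.session_id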

Try switching to an EXISTS check instead of the IN clause:
select * from log_metrics a where EXISTS
( select b.session_id from log_metrics b
  where a.session_id = b.session_id
  group by b.session_id having count(*) > 10
)
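If the GROUP BY inside the EXISTS turns out to be slow, the same condition can also be written as a plain correlated count; a sketch of the equivalent form:
select * from log_metrics a
where ( select count(*) from log_metrics b
        where b.session_id = a.session_id ) > 10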

Related

Two select statements on same table and get Count(*)

I'm trying to do two queries on the same table to get the COUNT(*) value.
I have this
SELECT `a`.`name`, `a`.`points` FROM `rank` AS a WHERE `id` = 1
And in the same query I want to do this
SELECT COUNT(*) FROM `rank` AS b WHERE `b`.`points` >= `a`.`points`
I tried searching but did not find how to do a COUNT(*) in the same query.
Typically you would not intermingle a non-aggregate and an aggregate query in MySQL. You might do this in databases that support analytic functions, such as SQL Server, but not in (the current version of) MySQL. That said, your second query can be handled as a correlated subquery in the select clause of the first query, so you may try the following:
SELECT
a.name,
a.points,
(SELECT COUNT(*) FROM rank b WHERE b.points >= a.points) AS cnt
FROM rank a
WHERE a.id = 1;
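To make the correlated count concrete, here is a self-contained illustration; the table name and data are made up for the demo (rank_demo also avoids rank, which is a reserved word in newer MySQL versions):
CREATE TABLE rank_demo (id INT PRIMARY KEY, name VARCHAR(20), points INT);
INSERT INTO rank_demo VALUES (1, 'alice', 50), (2, 'bob', 70), (3, 'carol', 40);

SELECT a.name, a.points,
       (SELECT COUNT(*) FROM rank_demo b WHERE b.points >= a.points) AS cnt
FROM rank_demo a
WHERE a.id = 1;
-- Returns: alice | 50 | 2 (alice and bob both have points >= 50)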
As I understand the question, you want to find out, for a given id, how many rows in the table have points greater than or equal to that row's points. This can also be done with a self-join (note that a.id != b.id excludes the row itself, so this counts one fewer row than the correlated-subquery version above):
select count(*) from rank a join rank b on (a.id != b.id) where a.id = 1 and b.points >= a.points;

Why is my SQL query with a subquery so slow even though the subquery performs fast?

My query is something like:
SELECT *,
(SELECT COUNT(*) FROM comments WHERE comments.thread = threads.id) AS comments
FROM threads
LIMIT 10
comments.thread is indexed, and queries like this run fast:
SELECT COUNT(*) FROM comments WHERE comments.thread = 'someId'
However, my query is extremely slow. It takes 10 seconds times the LIMIT I define. Why?
For this query:
SELECT t.*,
(SELECT COUNT(*) FROM comments c WHERE c.thread = t.id) AS comments
FROM threads t
LIMIT 10;
You want an index on comments(thread). If your other query runs fast, then I would guess that you already have one.
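In case it is missing after all, it can be created like this (the index name is just illustrative):
CREATE INDEX idx_comments_thread ON comments (thread);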
Perhaps the LIMIT and subquery are acting strangely. Is this version also slow?
SELECT t.*,
(SELECT COUNT(*) FROM comments c WHERE c.thread = t.id) AS comments
FROM (SELECT t.*
FROM threads t
LIMIT 10
) t;
Your inner query is a correlated subquery, meaning it uses a value from the outer query, so it executes once for every row of the outer query. Maybe MySQL is not so good at optimizing such queries.
Try this:
SELECT threads.*, count(comments.thread) AS comments
FROM threads
LEFT JOIN comments ON comments.thread = threads.id -- LEFT JOIN keeps threads with zero comments (their count is 0)
GROUP BY 1,2,3,4,5 -- one number here for each column of the threads table
LIMIT 10

nested query on the same table

Do you think a query like this will create problems in the execution of my software? I need to delete everything in the table except the last two groups of entries, where a group is the set of rows sharing the same insert time.
delete from tableA WHERE time not in
(
SELECT time FROM
(select distinct time from tableA order by time desc limit 2
) AS tmptable
);
Do you have a better solution? I'm using MySQL 5.5.
I don't see anything wrong with your query (the extra derived-table wrapper is in fact what dodges MySQL's ERROR 1093 about selecting from the table you are deleting from), but I prefer using an OUTER JOIN/NULL check, which also removes one of the nested subqueries:
delete a
from tableA a
left join
(
select distinct time
from tableA
order by time desc
limit 2
) b on a.time = b.time
where b.time is null
SQL Fiddle Demo

MySQL nested query speed

I'm coming from a Postgres background and trying to convert my application to MySQL. I have a query which is very fast on Postgres and very slow on MySQL. After doing some analysis, I have determined that one cause of the drastic speed difference is nested queries. The following pseudo query takes 170 ms on Postgres and 5.5 seconds on MySQL.
SELECT * FROM (
SELECT id FROM a INNER JOIN b
) AS first LIMIT 10
On both MySQL and Postgres the speed is the same for the following query (less than 10 ms)
SELECT id FROM a INNER JOIN b LIMIT 10
I have the exact same tables, indices, and data on both databases, so I really have no idea why this is so slow.
Any insight would be greatly appreciated.
Thanks
EDIT
Here is one specific example of why I need to do this. I need to get the sum of per-group maximums, and to do that I need a subselect, as shown in the query below.
SELECT SUM(a) AS a
FROM (
SELECT table2.b, MAX(table1.a) AS a
FROM table1
INNER JOIN table2 ON table2.abc_id = table1.abc_id
AND table1.read_datetime >= table2.issuance_datetime
AND table1.read_datetime < COALESCE(table2.unassignment_datetime, DATE('9999-01-01'))
WHERE table1.read_datetime BETWEEN '2012-01-01 10:30:01' AND '2013-07-18 03:03:42' AND table2.c = 0
GROUP BY table2.id, b
) AS first
GROUP BY b
LIMIT 10
Again this query takes 14 seconds on MySQL and 238 ms on Postgres. Here is the output from explain on MySQL:
id: 1  select_type: PRIMARY  table: <derived2>  type: ALL  possible_keys: NULL  key: NULL  rows: 25584  Extra: Using temporary; Using filesort
id: 2  select_type: DERIVED  table: table2  type: index  possible_keys: PRIMARY, index_table2_on_b, index_table2_on_d, index_table2_on_issuance_datetime, index_table2_on_unassignment_datetime, index_table2_on_e  key: PRIMARY  key_len: 4  rows: 25584  Extra: Using where
id: 2  select_type: DERIVED  table: tz  type: ref  possible_keys: index_table1_on_d, index_table1_on_read_datetime, index_table1_on_d_and_read_datetime  key: index_table1_on_4  key_len: 4  ref: db.table2.dosimeter_id  rows: 1  Extra: Using where
Jon, answering your comment, here is an example:
drop table if exists temp_preliminary_table;
create temporary table temp_preliminary_table
SELECT table2.b, MAX(table1.a) AS a
FROM table1
INNER JOIN table2 ON table2.abc_id = table1.abc_id
AND table1.read_datetime >= table2.issuance_datetime
AND table1.read_datetime < COALESCE(table2.unassignment_datetime, DATE('9999-01-01'))
WHERE table1.read_datetime BETWEEN '2012-01-01 10:30:01' AND '2013-07-18 03:03:42' AND table2.c = 0
GROUP BY table2.id, b;
-- I suggest you add indexes to this temp table
alter table temp_preliminary_table
add index idx_b(b); -- Add as many indexes as you need
-- Now perform your query on this temp_table
SELECT SUM(a) AS a
FROM temp_preliminary_table
GROUP BY b
LIMIT 10;
This is just an example, splitting your query into three steps.
You need to remember that temp tables in MySQL are only visible to the connection that created them, so other connections won't see them (for better or worse).
This "divide-and-conquer" approach has saved me many headaches. I hope it helps you.
In the nested query MySQL materializes the derived table, doing the whole join before applying the limit, while PostgreSQL is smart enough to figure out that it is only necessary to join any 10 tuples.
Correct me if I am wrong, but why don't you try:
SELECT * FROM a INNER JOIN b LIMIT 10;
Given that table2.id is the primary key, this query, with the limit in the inner query, is functionally equivalent to yours with the limit in the outer query, and that is what the PostgreSQL planner figured out.
SELECT table2.b, MAX(table1.a) AS a
FROM table1
INNER JOIN table2 ON table2.abc_id = table1.abc_id
AND table1.read_datetime >= table2.issuance_datetime
AND table1.read_datetime < COALESCE(table2.unassignment_datetime, DATE('9999-01-01'))
WHERE table1.read_datetime BETWEEN '2012-01-01 10:30:01' AND '2013-07-18 03:03:42' AND table2.c = 0
GROUP BY table2.id, b
order by a desc
LIMIT 10

MySQL WHERE IN subquery

I'd like to list all rows that have a match in the same table (i.e. rows whose firm_name appears more than once).
So far I have come up with this:
SELECT *
FROM parim_firms
WHERE firm_name IN (
SELECT firm_name
FROM parim_firms
GROUP BY firm_name
HAVING COUNT(*) > 1
)
But this query keeps running, although the subquery itself runs in 0.1 sec.
How could I optimize this?
I think the subquery executes for each row, not only once. Am I right?
How about joining it instead?
SELECT a.*
FROM parim_firms a
INNER JOIN
(
SELECT firm_name
FROM parim_firms
GROUP BY firm_name
HAVING COUNT(*) > 1
) b ON a.firm_name = b.firm_name
PS: be sure to add an index on the firm_name column for faster execution.
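For example (the index name is illustrative):
ALTER TABLE parim_firms ADD INDEX idx_firm_name (firm_name);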