How to optimize mysql group by with order by - mysql

I am currently experiencing an extremely slow query when using GROUP BY and ORDER BY. I suspect that the indexes are not being used because the GROUP BY is on a different column than the ORDER BY.
sqlFiddle
Foo Table Structure
id -> pk (indexed)
bar_id -> foreign key (indexed)
data -> varchar
created_at -> timestamp (indexed)
Here is the query:
SELECT * FROM foo GROUP BY bar_id ORDER BY created_at DESC
I am basically trying to get the most recent records for each bar_id. However this is taking up to 11 seconds to finish. Is there a better way to do this type of query?

SELECT COUNT(*) FROM foo;
+----------+
| COUNT(*) |
+----------+
| 98304 |
+----------+
1 row in set (0.03 sec)
SELECT x.*
FROM foo x
JOIN
( SELECT bar_id
, MAX(created_at) max_created_at
FROM foo
GROUP
BY bar_id
) y
ON y.bar_id = x.bar_id
AND y.max_created_at = x.created_at;
531 rows in set (0.01 sec)
Note: I've modified your schema slightly.
http://sqlfiddle.com/#!2/a6296/2
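For anyone who wants to try the join-on-MAX pattern without a MySQL server, here is a minimal sketch using Python's built-in sqlite3 module. The sample rows and the composite index foo_bar_created are assumptions made up for illustration (they are not from the fiddle), and SQLite's planner differs from MySQL's, so this only demonstrates that the query returns the newest row per bar_id, not how fast it runs.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption for the demo).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE foo (
        id         INTEGER PRIMARY KEY,
        bar_id     INTEGER NOT NULL,
        data       TEXT,
        created_at TEXT NOT NULL
    );
    -- Hypothetical composite index: lets MAX(created_at) per bar_id
    -- be read from the index instead of the table.
    CREATE INDEX foo_bar_created ON foo (bar_id, created_at);
    -- Made-up sample rows.
    INSERT INTO foo (id, bar_id, data, created_at) VALUES
        (1, 10, 'old',  '2015-01-01 00:00:00'),
        (2, 10, 'new',  '2015-03-01 00:00:00'),
        (3, 20, 'only', '2015-02-01 00:00:00');
""")


def latest_per_bar(conn):
    """Return the most recent foo row for each bar_id."""
    return conn.execute("""
        SELECT x.id, x.bar_id, x.data, x.created_at
        FROM foo x
        JOIN ( SELECT bar_id, MAX(created_at) AS max_created_at
               FROM foo
               GROUP BY bar_id
             ) y
          ON y.bar_id = x.bar_id
         AND y.max_created_at = x.created_at
        ORDER BY x.bar_id
    """).fetchall()


rows = latest_per_bar(conn)
print(rows)
```

Note that if two rows of the same bar_id share the same maximum created_at, both are returned.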

Related

MySQL subselect needed

I just can't figure this out. I need a select statement that will find all job_ids whose profile_sent values are ALL non-zero. In this case the select should return just "2064056592", because all of its rows have a non-zero profile_sent, whereas 4064056590 still has a 0 in one of its rows and so should not be found.
I can obviously get the distinct job_id with:
mysql> Select distinct job_id from Table;
+------------+
| job_id |
+------------+
| 4064056590 |
| 2064056592 |
+------------+
But I have no idea how to write a subselect that keeps only the job_ids whose profile_sent values are all non-zero.
See https://snipboard.io/x4UNKc.jpg for the table structure.
Using a subquery to find all the distinct job_ids with profile_sent as 0
and filtering them out should work:
SELECT
DISTINCT t.`job_id`
FROM
test_table t
WHERE t.`job_id` NOT IN
(SELECT DISTINCT
t1.job_id
FROM
test_table t1
WHERE t1.`profile_sent` = 0)
Another approach is to group the rows by job_id and check that the number of rows with profile_sent = 0 is zero:
SELECT t.`job_id` FROM `test_table` t
GROUP BY t.`job_id`
HAVING SUM(t.`profile_sent` = 0)=0
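Both variants can be checked side by side with a small sketch using Python's sqlite3 module. The rows below are invented, since the real table is only available as a screenshot; only the two job_id values come from the question. SQLite, like MySQL, evaluates the comparison profile_sent = 0 to 1 or 0, so SUM counts the zero rows.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption for the demo).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE test_table (job_id INTEGER, profile_sent INTEGER);
    -- Made-up rows: the first job has one zero, the second has none.
    INSERT INTO test_table (job_id, profile_sent) VALUES
        (4064056590, 1),
        (4064056590, 0),
        (2064056592, 1),
        (2064056592, 3);
""")

# Variant 1: exclude every job_id that has at least one zero row.
not_in_rows = conn.execute("""
    SELECT DISTINCT t.job_id
    FROM test_table t
    WHERE t.job_id NOT IN
        (SELECT t1.job_id FROM test_table t1 WHERE t1.profile_sent = 0)
""").fetchall()

# Variant 2: count the zero rows per job_id and keep groups with none.
having_rows = conn.execute("""
    SELECT t.job_id
    FROM test_table t
    GROUP BY t.job_id
    HAVING SUM(t.profile_sent = 0) = 0
""").fetchall()

print(not_in_rows, having_rows)
```

One caveat: if job_id can be NULL, NOT IN with a subquery that returns a NULL matches nothing, so the GROUP BY form is the safer of the two in that case.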

Why Limit keyword is not working in Mysql?

select count(*) from bill limit 100000;
mysql> select count(*) from `bill` limit 100000;
+----------+
| count(*) |
+----------+
| 47497305 |
+----------+
1 row in set
LIMIT restricts the number of rows returned in the result set, not the number of rows that are processed.
Therefore it has no effect on a query like COUNT(*), which returns a single row.
To achieve this you would have to wrap the query in a subselect, although such a query doesn't make much sense:
SELECT COUNT(*) FROM (
SELECT * FROM bill LIMIT 100000
) t
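The difference is easy to reproduce on a tiny table. This is a sketch using Python's sqlite3 module with a hypothetical 10-row bill table; SQLite applies LIMIT to the result set in the same way as MySQL.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption for the demo).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bill (id INTEGER PRIMARY KEY)")
# Ten made-up rows with ids 1..10.
conn.executemany("INSERT INTO bill (id) VALUES (?)",
                 [(i,) for i in range(1, 11)])

# LIMIT applies to the single-row result, so the count is unaffected.
outer_limit = conn.execute(
    "SELECT COUNT(*) FROM bill LIMIT 3"
).fetchall()

# LIMIT inside the subselect restricts the rows that get counted.
inner_limit = conn.execute(
    "SELECT COUNT(*) FROM (SELECT * FROM bill LIMIT 3) t"
).fetchall()

print(outer_limit, inner_limit)
```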

Is it possible to optimize SQL "order by limit" with large offset

I have a table with 450000 records at MySQL.
The query below cost nearly 3s. Is it possible to lower the time cost?
SELECT * FROM table order by id desc limit 400000, 8000
Assuming id is indexed, there's not a lot you can do, apart from what's been suggested above. That said, I'm surprised it's so slow...
SELECT COUNT(*) FROM my_table;
+----------+
| COUNT(*) |
+----------+
| 450000 |
+----------+
1 row in set (0.12 sec)
SELECT * FROM my_table ORDER BY i DESC LIMIT 400000,8000;
...
8000 rows in set (0.20 sec)

Query won't return results when 0 is returned

I need this query to return results when the count is 0 instead of just producing an empty set. How can I adjust my query so that it produces the following table with a '0' for the count and the appropriate date?
mysql> select count(test.id), date(insert_datetime) date
from db.table test where date(insert_datetime)='2015-08-17'
group by date(insert_datetime);
+--------------+------------+
| count(test.id) | date |
+--------------+------------+
| 42 | 2015-08-17 |
+--------------+------------+
1 row in set (0.14 sec)
mysql> select count(test.id), date(insert_datetime) date
from db.table test where date(insert_datetime)='2015-08-16'
group by date(insert_datetime);
Empty set (0.00 sec)
This should do it:
SELECT theDate AS `date`, IFNULL(subC.theCount, 0) AS `theCount`
FROM (SELECT DATE(20150817) AS `theDate`) AS subD
LEFT JOIN (
SELECT COUNT(test.id) AS theCount, DATE(insert_datetime) AS `theDate`
FROM db.table AS test
WHERE insert_datetime BETWEEN 20150817000000 AND 20150817235959
GROUP BY theDate
) AS subC
USING (theDate)
;
As another user hinted in a now deleted comment: if you are going to need this for a date range, an "all dates" table may come in more handy than the subD subquery; making a SELECT DATE(X) UNION SELECT DATE(Y) UNION SELECT DATE(Z) UNION SELECT DATE(etc...) subquery gets ridiculous fairly quickly.
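A minimal sketch of the "all dates" table idea, using Python's sqlite3 module. The table names all_dates and events and the sample rows are hypothetical. The join condition uses SQLite's date(..., '+1 day'); in MySQL the equivalent would be the_date + INTERVAL 1 DAY.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption for the demo).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE events (
        id              INTEGER PRIMARY KEY,
        insert_datetime TEXT NOT NULL
    );
    -- Hypothetical calendar table: one row per date to report on.
    CREATE TABLE all_dates (the_date TEXT PRIMARY KEY);
    -- Made-up rows: two events on the 17th, none on the 16th.
    INSERT INTO events (insert_datetime) VALUES
        ('2015-08-17 09:00:00'),
        ('2015-08-17 18:30:00');
    INSERT INTO all_dates (the_date) VALUES
        ('2015-08-16'),
        ('2015-08-17');
""")


def counts_per_day(conn, first, last):
    """Return (date, count) for every calendar date in [first, last]."""
    return conn.execute("""
        SELECT d.the_date, COUNT(e.id) AS the_count
        FROM all_dates d
        LEFT JOIN events e
               ON e.insert_datetime >= d.the_date
              AND e.insert_datetime <  date(d.the_date, '+1 day')
        WHERE d.the_date BETWEEN ? AND ?
        GROUP BY d.the_date
        ORDER BY d.the_date
    """, (first, last)).fetchall()


daily = counts_per_day(conn, '2015-08-16', '2015-08-17')
print(daily)
```

COUNT(e.id) only counts non-NULL values, so a date with no matching rows comes back as 0 without needing IFNULL.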

Need advice optimizing SQL query (update on MySQL)

I profiled my database's performance using the slow query log. This turned out to be the number one annoyance:
UPDATE
t1
SET
v1t1 =
(
SELECT
t2.v3t2
FROM
t2
WHERE
t2.v2t2 = t1.v2t1
AND t2.v1t2 <= '2012-04-24'
ORDER BY
t2.v1t2 DESC,
t2.v3t2 DESC
LIMIT 1
);
The subquery itself is already slow. I tried variations with DISTINCT, GROUP BY and more subqueries but nothing performed below 4 seconds. For example the following query
SELECT v2t2, v3t2
FROM t2
WHERE t2.v1t2 <= '2012-04-24'
GROUP BY v2t2
ORDER BY v1t2 DESC
takes:
mysql> SELECT ...
...
69054 rows in set (5.61 sec)
mysql> EXPLAIN SELECT ...
+----+-------------+-------------+------+---------------+------+---------+------+---------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------------+------+---------------+------+---------+------+---------+----------------------------------------------+
| 1 | SIMPLE | t2 | ALL | v1t2 | NULL | NULL | NULL | 5203965 | Using where; Using temporary; Using filesort |
+----+-------------+-------------+------+---------------+------+---------+------+---------+----------------------------------------------+
mysql> SHOW CREATE TABLE t2;
...
PRIMARY KEY (`v3t2`),
KEY `v1t2_v3t2` (`v1t2`,`v3t2`),
KEY `v1t2` (`v1t2`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8
SELECT COUNT(*) FROM t1;
+----------+
| COUNT(*) |
+----------+
| 77070 |
+----------+
SELECT COUNT(*) FROM t2;
+----------+
| COUNT(*) |
+----------+
| 5203965 |
+----------+
I am trying to fetch the newest entry (v3t2) and its parent (v2t2). Should not be that big of a deal, should it? Does anyone have any advice which knobs I should turn? Any help or hint is greatly appreciated!
This should be a more appropriate SELECT statement:
SELECT
t1.v2t1,
(
SELECT
t2.v3t2
FROM
t2
WHERE
t2.v2t2 = t1.v2t1
AND t2.v1t2 <= '2012-04-24'
ORDER BY
t2.v1t2 DESC,
t2.v3t2 DESC
LIMIT 1
) AS latest
FROM
t1
Your ORDER BY ... LIMIT 1 forces the database to scan the whole table just to return a single row. That makes it a strong candidate for indexing.
Before you build the index, check the column's selectivity by running:
SELECT count(*), count(v1t2), count(DISTINCT v1t2) FROM t2;
If the column holds a high number of non-NULL values and the number of distinct values is more than 40% of the non-NULL count, then building an index is a good way to go.
If the index doesn't help, analyze the data in your columns. You're using the condition t2.v1t2 <= '2012-04-24', which gives the planner nothing to work with if the table holds historical records: all rows are expected to be in the past, so a full scan is the best choice anyway and the index is useless.
What you should do instead is think about how to rewrite the query so that only a limited subset of records is checked. Your ORDER BY ... DESC LIMIT 1 construct suggests that you want the most recent entry up to and including '2012-04-24'. Why not try rewriting your query as something like:
SELECT v2t2, v3t2
FROM t2
WHERE t2.v1t2 >= DATE_ADD('2012-04-24', INTERVAL -10 DAY)
GROUP BY v2t2
ORDER BY v1t2 DESC;
This is just an example; knowing the design of your database and the nature of your data, a more precise query can be built.
I would take a look at the indexes built on t2, the table used in the sub-select. You should have an index on v2t2, and possibly one on v1t2 and v3t2 because of the ordering. An index should reduce the time the sub-select spends looking for results before they are used in your update query.
Does this work any better? Gets rid of one of the sorts and groups by the key being used.
UPDATE
t1
SET
v1t1 =
(
SELECT
MAX(t2.v3t2)
FROM
t2
WHERE
t2.v2t2 = t1.v2t1
AND t2.v1t2 <= '2012-04-24'
GROUP BY t2.v1t2
ORDER BY t2.v1t2 DESC
LIMIT 1
);
Alternate Version
UPDATE `t1`
SET `v1t1` = (
SELECT MAX(`t2`.`v3t2`)
FROM `t2`
WHERE `t2`.`v2t2` = `t1`.`v2t1`
AND `t2`.`v1t2` = (
SELECT MAX(`t2`.`v1t2`)
FROM `t2`
WHERE `t2`.`v2t2` = `t1`.`v2t1`
AND `t2`.`v1t2` <= '2012-04-24'
LIMIT 1
)
LIMIT 1
);
And add this index to t2:
KEY `v2t2_v1t2` (`v2t2`, `v1t2`)
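To sanity-check what the two-step MAX version writes into t1, here is a small sketch using Python's sqlite3 module. The column types and sample rows are assumptions (the real SHOW CREATE TABLE output is truncated), the inner subquery is given the alias i purely for readability, and the redundant LIMIT 1 clauses are dropped. It verifies the result of the update only; it says nothing about MySQL's execution plan.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption for the demo).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Assumed column types; only the column names come from the question.
    CREATE TABLE t1 (v2t1 INTEGER PRIMARY KEY, v1t1 INTEGER);
    CREATE TABLE t2 (
        v3t2 INTEGER PRIMARY KEY,
        v2t2 INTEGER NOT NULL,
        v1t2 TEXT    NOT NULL
    );
    -- The suggested composite index: parent first, then date.
    CREATE INDEX v2t2_v1t2 ON t2 (v2t2, v1t2);

    -- Made-up rows: parent 3 has no entries at all.
    INSERT INTO t1 (v2t1) VALUES (1), (2), (3);
    INSERT INTO t2 (v3t2, v2t2, v1t2) VALUES
        (100, 1, '2012-04-20'),
        (101, 1, '2012-04-23'),
        (102, 1, '2012-04-23'),  -- same date as 101, higher v3t2
        (103, 1, '2012-04-30'),  -- after the cutoff
        (200, 2, '2012-05-01');  -- parent 2 only has a later entry
""")

# Step 1 (inner): latest date per parent up to the cutoff.
# Step 2 (outer): highest v3t2 on that date.
conn.execute("""
    UPDATE t1
    SET v1t1 = (
        SELECT MAX(t2.v3t2)
        FROM t2
        WHERE t2.v2t2 = t1.v2t1
          AND t2.v1t2 = (
              SELECT MAX(i.v1t2)
              FROM t2 AS i
              WHERE i.v2t2 = t1.v2t1
                AND i.v1t2 <= '2012-04-24'
          )
    )
""")

updated = conn.execute(
    "SELECT v2t1, v1t1 FROM t1 ORDER BY v2t1"
).fetchall()
print(updated)
```

Parents with no t2 row on or before the cutoff end up with NULL in v1t1, which matches the behaviour of the original ORDER BY ... LIMIT 1 subquery.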