Need advice optimizing SQL query (UPDATE on MySQL)

I profiled my database with the slow query log, and this query turned out to be the number one offender:
UPDATE
t1
SET
v1t1 =
(
SELECT
t2.v3t2
FROM
t2
WHERE
t2.v2t2 = t1.v2t1
AND t2.v1t2 <= '2012-04-24'
ORDER BY
t2.v1t2 DESC,
t2.v3t2 DESC
LIMIT 1
);
The subquery itself is already slow. I tried variations with DISTINCT, GROUP BY and more subqueries, but nothing ran in under 4 seconds. For example, the following query
SELECT v2t2, v3t2
FROM t2
WHERE t2.v1t2 <= '2012-04-24'
GROUP BY v2t2
ORDER BY v1t2 DESC
takes:
mysql> SELECT ...
...
69054 rows in set (5.61 sec)
mysql> EXPLAIN SELECT ...
+----+-------------+-------------+------+---------------+------+---------+------+---------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------------+------+---------------+------+---------+------+---------+----------------------------------------------+
| 1 | SIMPLE | t2 | ALL | v1t2 | NULL | NULL | NULL | 5203965 | Using where; Using temporary; Using filesort |
+----+-------------+-------------+------+---------------+------+---------+------+---------+----------------------------------------------+
mysql> SHOW CREATE TABLE t2;
...
PRIMARY KEY (`v3t2`),
KEY `v1t2_v3t2` (`v1t2`,`v3t2`),
KEY `v1t2` (`v1t2`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8
SELECT COUNT(*) FROM t1;
+----------+
| COUNT(*) |
+----------+
| 77070 |
+----------+
SELECT COUNT(*) FROM t2;
+----------+
| COUNT(*) |
+----------+
| 5203965 |
+----------+
I am trying to fetch the newest entry (v3t2) together with its parent (v2t2). That shouldn't be a big deal, should it? Does anyone have advice on which knobs I should turn? Any help or hint is greatly appreciated!
This should be a more appropriate SELECT statement:
SELECT
t1.v2t1,
(
SELECT
t2.v3t2
FROM
t2
WHERE
t2.v2t2 = t1.v2t1
AND t2.v1t2 <= '2012-04-24'
ORDER BY
t2.v1t2 DESC,
t2.v3t2 DESC
LIMIT 1
) AS latest
FROM
t1

Your ORDER BY ... LIMIT 1 forces the database to scan the whole table in order to return only one row. This looks very much like a candidate for indexing.
Before you build the index, check the column's selectivity by running:
SELECT count(*), count(v1t2), count(DISTINCT v1t2) FROM t2;
If the column has a high number of non-NULL values and the number of distinct values is more than 40% of the non-NULLs, then building an index is a good way to go.
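A minimal sketch of that check, using Python's built-in sqlite3 module as a stand-in for MySQL so it runs anywhere. The 40% threshold is this answer's rule of thumb, not a hard limit, and the helper only tests the distinct-to-non-NULL ratio; whether the non-NULL count is "high" is left to you. The sample rows are invented.

```python
import sqlite3


def column_stats(conn, table, column):
    """Return (rows, non_null, distinct), i.e. the same three numbers as
    SELECT count(*), count(col), count(DISTINCT col)."""
    # Identifiers cannot be bound as parameters, so they are interpolated:
    # only pass trusted table/column names here.
    sql = f"SELECT count(*), count({column}), count(DISTINCT {column}) FROM {table}"
    return conn.execute(sql).fetchone()


def worth_indexing(non_null, distinct, threshold=0.40):
    """Rule of thumb: index when distinct values exceed ~40% of the non-NULLs."""
    return non_null > 0 and distinct / non_null > threshold


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t2 (v1t2 TEXT, v2t2 INTEGER, v3t2 INTEGER PRIMARY KEY)")
# Invented data: 100 rows spread over 30 dates and 5 parents.
conn.executemany(
    "INSERT INTO t2 VALUES (?, ?, ?)",
    [("2012-04-%02d" % (i % 30 + 1), i % 5, i) for i in range(100)],
)

for col in ("v1t2", "v2t2", "v3t2"):
    rows, non_null, distinct = column_stats(conn, "t2", col)
    print(col, rows, non_null, distinct, worth_indexing(non_null, distinct))
```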
If the index provides no help, you should analyze the data in your columns. You're using the condition t2.v1t2 <= '2012-04-24', which, if your table holds a historical set of records, gives the planner nothing to work with: all rows are expected to be in the past, so a full scan is the best choice anyway, and the index is useless.
What you should do instead is think about how to rewrite your query so that only a limited subset of records is checked. Your ORDER BY ... DESC LIMIT 1 construct shows that you probably want the most recent entry up to and including '2012-04-24'. Why don't you try rewriting your query to something like:
SELECT v2t2, v3t2
FROM t2
WHERE t2.v1t2 >= date_add('2012-04-24', INTERVAL -10 DAY)
AND t2.v1t2 <= '2012-04-24'
GROUP BY v2t2
ORDER BY v1t2 DESC;
This is just an example; knowing the design of your database and the nature of your data, a more precise query can be built.
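To see what such a window trades away, here is a sketch in Python with the built-in sqlite3 module standing in for MySQL; SQLite's date(?, '-10 days') plays the part of the date arithmetic in that WHERE clause. The rows and the window_days parameter are invented for illustration. A parent whose newest row is older than the window simply drops out, which is why the window has to come from what you know about the data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t2 (v1t2 TEXT, v2t2 INTEGER, v3t2 INTEGER PRIMARY KEY)")
conn.executemany(
    "INSERT INTO t2 VALUES (?, ?, ?)",
    [
        ("2012-04-20", 1, 10),
        ("2012-04-23", 1, 11),
        ("2012-04-25", 1, 12),  # after the cutoff, never selected
        ("2012-03-01", 2, 20),  # parent 2 has no recent rows
    ],
)


def latest_per_parent(conn, cutoff, window_days=None):
    """Newest v3t2 per parent with v1t2 <= cutoff, optionally only
    looking at rows within window_days before the cutoff."""
    lower = "0000-01-01"  # effectively no lower bound
    if window_days is not None:
        lower = conn.execute(
            "SELECT date(?, ?)", (cutoff, f"-{window_days} days")
        ).fetchone()[0]
    sql = """
        SELECT p.v2t2,
               (SELECT t.v3t2 FROM t2 AS t
                WHERE t.v2t2 = p.v2t2 AND t.v1t2 BETWEEN ? AND ?
                ORDER BY t.v1t2 DESC, t.v3t2 DESC
                LIMIT 1)
        FROM (SELECT DISTINCT v2t2 FROM t2) AS p
        ORDER BY p.v2t2
    """
    return conn.execute(sql, (lower, cutoff)).fetchall()


print(latest_per_parent(conn, "2012-04-24"))                  # full history
print(latest_per_parent(conn, "2012-04-24", window_days=10))  # last 10 days only
```

With the full history parent 2 still gets its old row; with the 10-day window it comes back as NULL (None).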

I would take a look at the indexes available to the subselect on t2. You should have an index on v2t2, and possibly one on v1t2 and v3t2 because of the ordering. The index should reduce the time the subselect spends looking for results before they are used in your UPDATE query.
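One way to check whether such an index is picked up is to look at the query plan. This sketch uses Python's sqlite3 module, with EXPLAIN QUERY PLAN standing in for MySQL's EXPLAIN; the index name v2t2_v1t2_v3t2 is hypothetical and follows the suggestion above: the lookup column v2t2 first, then the two ORDER BY columns. The exact wording of SQLite's plan lines varies between versions, so the sketch only looks for the index name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t2 (v1t2 TEXT, v2t2 INTEGER, v3t2 INTEGER PRIMARY KEY)")
# Hypothetical index: lookup column first, then the ORDER BY columns.
conn.execute("CREATE INDEX v2t2_v1t2_v3t2 ON t2 (v2t2, v1t2, v3t2)")

# The correlated subquery from the question, with t1.v2t1 turned into a parameter.
SUBQUERY = """
    SELECT v3t2 FROM t2
    WHERE v2t2 = ? AND v1t2 <= '2012-04-24'
    ORDER BY v1t2 DESC, v3t2 DESC
    LIMIT 1
"""

# Each plan row has four columns; the last one is the human-readable detail.
plan = [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + SUBQUERY, (1,))]
for line in plan:
    print(line)
```

On MySQL the equivalent check is EXPLAIN on the subquery: the key column should name the new index instead of NULL.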

Does this work any better? It gets rid of one of the sorts and groups by the key being used.
UPDATE
t1
SET
v1t1 =
(
SELECT
MAX(t2.v3t2)
FROM
t2
WHERE
t2.v2t2 = t1.v2t1
AND t2.v1t2 <= '2012-04-24'
GROUP BY t2.v1t2
ORDER BY t2.v1t2 DESC
LIMIT 1
);
Alternate Version
UPDATE `t1`
SET `v1t1` = (
SELECT MAX(`t2`.`v3t2`)
FROM `t2`
WHERE `t2`.`v2t2` = `t1`.`v2t1`
AND `t2`.`v1t2` = (
SELECT MAX(`t2`.`v1t2`)
FROM `t2`
WHERE `t2`.`v2t2` = `t1`.`v2t1`
AND `t2`.`v1t2` <= '2012-04-24'
LIMIT 1
)
LIMIT 1
);
And add this index to t2:
KEY `v2t2_v1t2` (`v2t2`, `v1t2`)
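Before timing the rewrites, it can help to confirm they are equivalent by running all three (the question's original and both versions here) on the same small data set. This sketch uses Python's sqlite3 module in place of MySQL; the rows are invented and deliberately include a tie on v1t2, a row after the cutoff, and a t1 row with no match. The inner subquery of the nested version uses an alias x only for readability.

```python
import sqlite3

SETUP = """
    CREATE TABLE t1 (v1t1 INTEGER, v2t1 INTEGER PRIMARY KEY);
    CREATE TABLE t2 (v1t2 TEXT, v2t2 INTEGER, v3t2 INTEGER PRIMARY KEY);
    CREATE INDEX v2t2_v1t2 ON t2 (v2t2, v1t2);
    INSERT INTO t1 (v2t1) VALUES (1), (2), (3);
    INSERT INTO t2 VALUES
        ('2012-04-20', 1, 10),
        ('2012-04-24', 1, 11),
        ('2012-04-24', 1, 12),  -- same date as 11: tie broken by v3t2
        ('2012-04-25', 1, 13),  -- after the cutoff
        ('2012-01-01', 2, 20);  -- parent 3 has no rows at all
"""

UPDATES = {
    "order_by_limit": """
        UPDATE t1 SET v1t1 = (
            SELECT t2.v3t2 FROM t2
            WHERE t2.v2t2 = t1.v2t1 AND t2.v1t2 <= '2012-04-24'
            ORDER BY t2.v1t2 DESC, t2.v3t2 DESC
            LIMIT 1)
    """,
    "group_by_max": """
        UPDATE t1 SET v1t1 = (
            SELECT MAX(t2.v3t2) FROM t2
            WHERE t2.v2t2 = t1.v2t1 AND t2.v1t2 <= '2012-04-24'
            GROUP BY t2.v1t2
            ORDER BY t2.v1t2 DESC
            LIMIT 1)
    """,
    "nested_max": """
        UPDATE t1 SET v1t1 = (
            SELECT MAX(t2.v3t2) FROM t2
            WHERE t2.v2t2 = t1.v2t1
              AND t2.v1t2 = (SELECT MAX(x.v1t2) FROM t2 AS x
                             WHERE x.v2t2 = t1.v2t1 AND x.v1t2 <= '2012-04-24'))
    """,
}

results = {}
for name, sql in UPDATES.items():
    conn = sqlite3.connect(":memory:")  # fresh copy of the data for each variant
    conn.executescript(SETUP)
    conn.execute(sql)
    results[name] = conn.execute("SELECT v2t1, v1t1 FROM t1 ORDER BY v2t1").fetchall()
    print(name, results[name])
```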

Related

Why is the LIMIT keyword not working in MySQL?

select count(*) from bill limit 100000;
mysql> select count(*) from `bill` limit 100000;
+----------+
| count(*) |
+----------+
| 47497305 |
+----------+
1 row in set
LIMIT limits the number of rows returned in the result set, not the number of rows that are processed.
Therefore it has no effect on a query like COUNT(*), which returns a single row anyway.
To count at most 100000 rows, you would have to wrap the query in a subquery, although such a query doesn't make much sense:
SELECT COUNT(*) FROM (
SELECT * FROM bill LIMIT 100000
) t
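The difference is easy to see on a small table. This sketch uses Python's sqlite3 module instead of MySQL and a made-up bill table with 250 rows; LIMIT 100 stands in for LIMIT 100000.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bill (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO bill (id) VALUES (?)", [(i,) for i in range(1, 251)])

# LIMIT applies to the result set, which is a single row here, so nothing is capped.
uncapped = conn.execute("SELECT COUNT(*) FROM bill LIMIT 100").fetchone()[0]

# Limiting inside a derived table caps the rows that reach COUNT(*).
capped = conn.execute(
    "SELECT COUNT(*) FROM (SELECT * FROM bill LIMIT 100) AS t"
).fetchone()[0]

print(uncapped, capped)  # 250 100
```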

How to optimize mysql group by with order by

I am currently experiencing an extremely slow query when using GROUP BY and ORDER BY. I suspect the indexes are not being used because the GROUP BY is on a different column than the ORDER BY.
Foo Table Structure
id -> pk (indexed)
bar_id -> foreign key (indexed)
data -> varchar
created_at -> timestamp (indexed)
Here is the query:
SELECT * FROM foo GROUP BY bar_id ORDER BY created_at DESC
I am basically trying to get the most recent records for each bar_id. However this is taking up to 11 seconds to finish. Is there a better way to do this type of query?
SELECT COUNT(*) FROM foo;
+----------+
| COUNT(*) |
+----------+
| 98304 |
+----------+
1 row in set (0.03 sec)
SELECT x.*
FROM foo x
JOIN
( SELECT bar_id
, MAX(created_at) max_created_at
FROM foo
GROUP
BY bar_id
) y
ON y.bar_id = x.bar_id
AND y.max_created_at = x.created_at;
531 rows in set (0.01 sec)
Note: I've modified your schema slightly.
http://sqlfiddle.com/#!2/a6296/2
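The same derived-table join can be tried out locally. In this sketch Python's sqlite3 module stands in for MySQL, the rows are invented, and the composite index on (bar_id, created_at) is an assumption about what helps the GROUP BY/MAX step (the answer's modified schema lives only in the fiddle). If two rows of one bar_id share the latest created_at, both are returned.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE foo (id INTEGER PRIMARY KEY, bar_id INTEGER, data TEXT, created_at TEXT);
    -- Assumed composite index: group column first, then the column being maximised.
    CREATE INDEX foo_bar_created ON foo (bar_id, created_at);
    INSERT INTO foo (bar_id, data, created_at) VALUES
        (1, 'a', '2013-01-01 10:00:00'),
        (1, 'b', '2013-01-02 10:00:00'),
        (2, 'c', '2013-01-01 09:00:00'),
        (2, 'd', '2013-01-03 08:00:00'),
        (3, 'e', '2013-01-01 12:00:00');
""")

# Latest row per bar_id: compute MAX(created_at) per group, then join back.
LATEST = """
    SELECT x.*
    FROM foo AS x
    JOIN (SELECT bar_id, MAX(created_at) AS max_created_at
          FROM foo
          GROUP BY bar_id) AS y
      ON y.bar_id = x.bar_id AND y.max_created_at = x.created_at
    ORDER BY x.created_at DESC
"""
rows = conn.execute(LATEST).fetchall()
for row in rows:
    print(row)
```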

MySQL: how to increase speed of a select query with 2 joins and 1 subquery

In a table 'ttraces' I have many records for different tasks (whose value is held in 'taskid' column and is a foreign key of a column 'id' in a table 'ttasks'). Each task inserts a record to 'ttraces' every 8-10 seconds, so caching data to increase performance is not a good idea. What I need is to select only the newest records for each task from 'ttraces', that means the records with the maximum value of the column 'time'. At the moment, I have over 500000 records in the table. The very simplified structure of these two tables looks as follows:
-----------------------
| ttasks |
-----------------------
| id | name | blocked |
-----------------------
---------------------
| ttraces |
---------------------
| id | taskid | time |
---------------------
And my query is shown below:
SELECT t.name,tr.time
FROM
ttraces tr
JOIN
ttasks t ON tr.itask = t.id
JOIN (
SELECT taskid, MAX(time) AS max_time
FROM ttraces
GROUP BY itask
) x ON tr.taskid = x.taskid AND tr.time = x.max_time
WHERE t.blocked
All columns used in WHERE and JOIN clauses are indexed. As of now, the query runs for ~1.5 seconds, and it's extremely important to speed it up. Thanks for any suggestions. BTW: the database is running on a hosted, shared server and I can't move it anywhere else for the moment.
[EDIT]
EXPLAIN SELECT... results are:
+----+-------------+------------+--------+---------------+---------+---------+------------+--------+-------------+
| id | select_type | table      | type   | possible_keys | key     | key_len | ref        | rows   | Extra       |
+----+-------------+------------+--------+---------------+---------+---------+------------+--------+-------------+
| 1  | PRIMARY     | <derived2> | ALL    | NULL          | NULL    | NULL    | NULL       | 74     |             |
| 1  | PRIMARY     | t          | eq_ref | PRIMARY       | PRIMARY | 4       | x.taskid   | 1      | Using where |
| 1  | PRIMARY     | tr         | ref    | taskid,time   | time    | 9       | x.max_time | 1      | Using where |
| 2  | DERIVED     | ttraces    | index  | NULL          | itask   | 5       | NULL       | 570853 |             |
+----+-------------+------------+--------+---------------+---------+---------+------------+--------+-------------+
The engine is InnoDB.
I may be having a bit of a moment, but is this query not logically the same, and (almost certainly) faster?
SELECT t.id, t.name,max(tr.time)
FROM
ttraces tr
JOIN
ttasks t ON tr.itask = t.id
where BLOCKED
group by t.id, t.name
Here's my idea: you need one composite index on ttraces with the taskid and time columns (in that order). Then use this query:
SELECT t.name,
trm.mtime
FROM ttasks AS t
JOIN (SELECT taskid,
Max(time) AS mtime
FROM ttraces
GROUP BY taskid) AS trm
ON t.id = trm.taskid
WHERE t.blocked
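A small sketch of this approach, with Python's sqlite3 module in place of MySQL. The index name ttraces_taskid_time and the sample rows are made up, and ORDER BY t.name is added only so the output is deterministic. Because the derived table already returns one row per task, the outer query joins it to ttasks only and never has to look up ttraces by time.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ttasks (id INTEGER PRIMARY KEY, name TEXT, blocked INTEGER);
    CREATE TABLE ttraces (id INTEGER PRIMARY KEY, taskid INTEGER, time TEXT);
    -- Composite index from the answer: taskid first, then time.
    CREATE INDEX ttraces_taskid_time ON ttraces (taskid, time);
    INSERT INTO ttasks VALUES (1, 'backup', 1), (2, 'import', 0), (3, 'report', 1);
    INSERT INTO ttraces (taskid, time) VALUES
        (1, '2013-05-01 10:00:00'), (1, '2013-05-01 10:00:09'),
        (2, '2013-05-01 10:00:05'),
        (3, '2013-05-01 09:59:58'), (3, '2013-05-01 10:00:07');
""")

NEWEST = """
    SELECT t.name, trm.mtime
    FROM ttasks AS t
    JOIN (SELECT taskid, MAX(time) AS mtime
          FROM ttraces
          GROUP BY taskid) AS trm
      ON t.id = trm.taskid
    WHERE t.blocked
    ORDER BY t.name
"""
rows = conn.execute(NEWEST).fetchall()
print(rows)
```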
Does this code return the correct result? If so, how fast does it run?
SELECT t.name, max_time
FROM ttasks t JOIN (
SELECT taskid, MAX(time) AS max_time
FROM ttraces
GROUP BY taskid
) x ON t.id = x.taskid
If there are many traces for each task then you can keep a table with only the newest traces. Whenever you insert into ttraces you also upsert into ttraces_newest:
insert into ttraces_newest (id, taskid, time) values
(3, 1, '2012-01-01 08:02:01')
on duplicate key update
id = values(id),
`time` = values(`time`)
The primary key of ttraces_newest would be taskid, so that each task keeps exactly one row. Querying ttraces_newest would be cheaper; how much cheaper depends on how many traces there are for each task. Now the query is:
SELECT t.name,tr.time
FROM
ttraces_newest tr
JOIN
ttasks t ON tr.taskid = t.id
WHERE t.blocked
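A sketch of the insert-plus-upsert pattern, using Python's sqlite3 module instead of MySQL. SQLite spells the upsert INSERT ... ON CONFLICT ... DO UPDATE (SQLite 3.24 or later) where MySQL uses ON DUPLICATE KEY UPDATE. The record_trace helper and the sample rows are hypothetical, and the WHERE guard, which stops a late-arriving older trace from overwriting a newer one, is an addition of this sketch rather than part of the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ttasks (id INTEGER PRIMARY KEY, name TEXT, blocked INTEGER);
    CREATE TABLE ttraces (id INTEGER PRIMARY KEY, taskid INTEGER, time TEXT);
    -- One row per task: taskid alone is the key, so a newer trace replaces the old one.
    CREATE TABLE ttraces_newest (taskid INTEGER PRIMARY KEY, id INTEGER, time TEXT);
    INSERT INTO ttasks VALUES (1, 'backup', 1), (2, 'import', 0);
""")


def record_trace(conn, trace_id, taskid, time):
    """Hypothetical helper: append to ttraces and upsert the per-task newest row
    in a single transaction."""
    with conn:
        conn.execute(
            "INSERT INTO ttraces (id, taskid, time) VALUES (?, ?, ?)",
            (trace_id, taskid, time),
        )
        conn.execute(
            """
            INSERT INTO ttraces_newest (taskid, id, time) VALUES (?, ?, ?)
            ON CONFLICT(taskid) DO UPDATE SET id = excluded.id, time = excluded.time
            WHERE excluded.time >= ttraces_newest.time
            """,
            (taskid, trace_id, time),
        )


record_trace(conn, 1, 1, "2013-05-01 10:00:00")
record_trace(conn, 2, 1, "2013-05-01 10:00:09")
record_trace(conn, 3, 2, "2013-05-01 10:00:05")
record_trace(conn, 4, 1, "2013-05-01 09:00:00")  # late, older trace: newest row unchanged

newest = conn.execute(
    """
    SELECT t.name, tr.time
    FROM ttraces_newest AS tr
    JOIN ttasks AS t ON tr.taskid = t.id
    WHERE t.blocked
    """
).fetchall()
print(newest)
```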

MySQL sorting by date with GROUP BY

My table titles looks like this
id |group|date |title
---+-----+--------------------+--------
1 |1 |2012-07-26 18:59:30 | Title 1
2 |1 |2012-07-26 19:01:20 | Title 2
3 |2 |2012-07-26 19:18:15 | Title 3
4 |2 |2012-07-26 20:09:28 | Title 4
5 |2 |2012-07-26 23:59:52 | Title 5
I need latest result from each group ordered by date in descending order. Something like this
id |group|date |title
---+-----+--------------------+--------
5 |2 |2012-07-26 23:59:52 | Title 5
2 |1 |2012-07-26 19:01:20 | Title 2
I tried
SELECT *
FROM `titles`
GROUP BY `group`
ORDER BY MAX( `date` ) DESC
but I'm getting the first result from each group, like this:
id |group|date |title
---+-----+--------------------+--------
3 |2 |2012-07-26 19:18:15 | Title 3
1 |1 |2012-07-26 18:59:30 | Title 1
What am I doing wrong?
Is this query going to be more complicated if I use LEFT JOIN?
This page was very helpful to me; it taught me how to use self-joins to get the max/min/something-n rows per group.
In your situation, it can be applied to the effect you want like so:
SELECT * FROM
(SELECT `group`, MAX(`date`) AS `date` FROM titles GROUP BY `group`)
AS x JOIN titles USING (`group`, `date`);
I found this topic via Google, looked like I had the same issue.
Here's my own solution if, like me, you don't like subqueries:
-- Create a temporary table like the output
CREATE TEMPORARY TABLE titles_tmp LIKE titles;
-- Add a unique key on where you want to GROUP BY
ALTER TABLE titles_tmp ADD UNIQUE KEY `group` (`group`);
-- Read the result into the tmp_table. Duplicates won't be inserted.
INSERT IGNORE INTO titles_tmp
SELECT *
FROM `titles`
ORDER BY `date` DESC;
-- Read the temporary table as output
SELECT *
FROM titles_tmp
ORDER BY `group`;
It performs way better. Here's how to speed it up further if the date column has the same order as the auto-increment column (you then don't need an ORDER BY clause):
-- Create a temporary table like the output
CREATE TEMPORARY TABLE titles_tmp LIKE titles;
-- Add a unique key on where you want to GROUP BY
ALTER TABLE titles_tmp ADD UNIQUE KEY `group` (`group`);
-- Read the result into the tmp_table, in the natural order. Duplicates will update the temporary table with the freshest information.
INSERT INTO titles_tmp
SELECT *
FROM `titles`
ON DUPLICATE KEY
UPDATE `id` = VALUES(`id`),
`date` = VALUES(`date`),
`title` = VALUES(`title`);
-- Read the temporary table as output
SELECT *
FROM titles_tmp
ORDER BY `group`;
Result:
+----+-------+---------------------+---------+
| id | group | date | title |
+----+-------+---------------------+---------+
| 2 | 1 | 2012-07-26 19:01:20 | Title 2 |
| 5 | 2 | 2012-07-26 23:59:52 | Title 5 |
+----+-------+---------------------+---------+
On large tables this method makes a significant difference in performance.
Well, if dates are unique within a group, this will work (if not, you'll see several rows that match the max date in a group). Also, the column names are a poor choice: `group` and `date` can give you syntax errors and the like, especially `group`, which is a reserved word, so they are quoted here:
select t1.* from titles t1, (select `group`, max(`date`) `date` from titles group by `group`) t2
where t2.`date` = t1.`date`
and t1.`group` = t2.`group`
order by t1.`date` desc
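The caveat about duplicate dates is easy to reproduce. This sketch runs the same join, with `group` and `date` quoted, through Python's sqlite3 module as a stand-in for MySQL; the extra row with id 6, which shares group 1's latest date, is invented to create the tie.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE titles (id INTEGER PRIMARY KEY, `group` INTEGER, `date` TEXT, title TEXT)"
)
conn.executemany(
    "INSERT INTO titles VALUES (?, ?, ?, ?)",
    [
        (1, 1, "2012-07-26 18:59:30", "Title 1"),
        (2, 1, "2012-07-26 19:01:20", "Title 2"),
        (3, 2, "2012-07-26 19:18:15", "Title 3"),
        (4, 2, "2012-07-26 20:09:28", "Title 4"),
        (5, 2, "2012-07-26 23:59:52", "Title 5"),
        (6, 1, "2012-07-26 19:01:20", "Title 6"),  # invented: ties with Title 2
    ],
)

LATEST = """
    SELECT t1.*
    FROM titles AS t1
    JOIN (SELECT `group`, MAX(`date`) AS max_date
          FROM titles
          GROUP BY `group`) AS t2
      ON t2.max_date = t1.`date` AND t2.`group` = t1.`group`
    ORDER BY t1.`date` DESC, t1.id
"""
rows = conn.execute(LATEST).fetchall()
for row in rows:
    print(row)  # group 1 appears twice because of the tie
```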
Another approach is to make use of MySQL user variables to identify a "control break" in the group values.
If you can live with an extra column being returned, something like this will work:
SELECT IF(s.group = @prev_group,0,1) AS latest_in_group
, s.id
, @prev_group := s.group AS `group`
, s.date
, s.title
FROM (SELECT t.id,t.group,t.date,t.title
FROM titles t
ORDER BY t.group DESC, t.date DESC, t.id DESC
) s
JOIN (SELECT @prev_group := NULL) p
HAVING latest_in_group = 1
ORDER BY s.group DESC
What this is doing is ordering all the rows by group and by date in descending order. (We specify DESC on all the columns in the ORDER BY, in case there is an index on (group,date,id) that MySQL can do a "reverse scan" on. The inclusion of the id column gets us deterministic (repeatable) behavior, in the case when there are more than one row with the latest date value.) That's the inline view aliased as s.
The "trick" we use is to compare the group value to the group value from the previous row. Whenever we have a different value, we know that we are starting a "new" group, and that this row is the "latest" row (we have the IF function return a 1). Otherwise (when the group values match), it's not the latest row (and we have the IF function return a 0).
Then, we filter out all the rows that don't have that latest_in_group set as a 1.
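The control-break logic itself does not depend on SQL. Here is a minimal Python sketch of the same idea over in-memory tuples (id, group, date, title), using the sample rows from the question; prev_group plays the role of the user variable and, like it, starts out as NULL (None).

```python
from typing import Iterable, List, Optional, Tuple

Row = Tuple[int, int, str, str]  # (id, group, date, title)


def latest_in_group(rows: Iterable[Row]) -> List[Row]:
    """Keep the first row of each group after sorting by group, date and id,
    all descending, mirroring the ORDER BY in the inline view."""
    ordered = sorted(rows, key=lambda r: (r[1], r[2], r[0]), reverse=True)
    latest: List[Row] = []
    prev_group: Optional[int] = None
    for row in ordered:
        if row[1] != prev_group:  # group changed: this is the latest row
            latest.append(row)
        prev_group = row[1]  # remember this row's group for the next comparison
    return latest


titles = [
    (1, 1, "2012-07-26 18:59:30", "Title 1"),
    (2, 1, "2012-07-26 19:01:20", "Title 2"),
    (3, 2, "2012-07-26 19:18:15", "Title 3"),
    (4, 2, "2012-07-26 20:09:28", "Title 4"),
    (5, 2, "2012-07-26 23:59:52", "Title 5"),
]
for row in latest_in_group(titles):
    print(row)
```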
It's possible to remove that extra column by wrapping that query (as an inline view) in another query:
SELECT r.id
, r.group
, r.date
, r.title
FROM ( SELECT IF(s.group = @prev_group,0,1) AS latest_in_group
, s.id
, @prev_group := s.group AS `group`
, s.date
, s.title
FROM (SELECT t.id,t.group,t.date,t.title
FROM titles t
ORDER BY t.group DESC, t.date DESC, t.id DESC
) s
JOIN (SELECT @prev_group := NULL) p
HAVING latest_in_group = 1
) r
ORDER BY r.group DESC
If your id field is an auto-incrementing field, and it's safe to say that the highest value of the id field is also the highest value for the date of any group, then this is a simple solution:
SELECT b.*
FROM (SELECT MAX(id) AS maxid FROM titles GROUP BY `group`) a
JOIN titles b ON a.maxid = b.id
ORDER BY b.date DESC
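Since this relies on ids increasing with dates, it may be worth checking that assumption against your data. The sketch below uses Python's sqlite3 module in place of MySQL; the VIOLATIONS query and the backdated row with id 6 are illustrative additions, not part of the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE titles (id INTEGER PRIMARY KEY, `group` INTEGER, `date` TEXT, title TEXT)"
)
conn.executemany(
    "INSERT INTO titles VALUES (?, ?, ?, ?)",
    [
        (1, 1, "2012-07-26 18:59:30", "Title 1"),
        (2, 1, "2012-07-26 19:01:20", "Title 2"),
        (3, 2, "2012-07-26 19:18:15", "Title 3"),
        (4, 2, "2012-07-26 20:09:28", "Title 4"),
        (5, 2, "2012-07-26 23:59:52", "Title 5"),
    ],
)

# The answer's query: the highest id per group is taken as the latest row.
BY_MAX_ID = """
    SELECT b.* FROM (SELECT MAX(id) AS maxid FROM titles GROUP BY `group`) AS a
    JOIN titles AS b ON a.maxid = b.id
    ORDER BY b.`date` DESC
"""

# Illustrative check: groups where the highest id is NOT the row with the latest date.
VIOLATIONS = """
    SELECT m.`group`
    FROM (SELECT `group`, MAX(id) AS maxid, MAX(`date`) AS maxdate
          FROM titles GROUP BY `group`) AS m
    JOIN titles AS b ON b.id = m.maxid
    WHERE b.`date` <> m.maxdate
"""

before = [row[0] for row in conn.execute(BY_MAX_ID)]
ok_before = conn.execute(VIOLATIONS).fetchall()

# A backdated insert breaks the assumption for group 1.
conn.execute("INSERT INTO titles VALUES (6, 1, '2012-07-26 10:00:00', 'Title 6')")
after = [row[0] for row in conn.execute(BY_MAX_ID)]
broken = conn.execute(VIOLATIONS).fetchall()
print(before, ok_before, after, broken)
```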
Use the MySQL query below to get the latest updated/inserted record from each group.
SELECT * FROM
(
select * from `titles` order by `date` desc
) as tmp_table
group by `group`
order by `date` desc
Use the following query to get the most recent record from each group
SELECT
T1.* FROM
(SELECT
MAX(ID) AS maxID
FROM
T2
GROUP BY Type) AS aux
INNER JOIN
T2 AS T1 ON T1.ID = aux.maxID;
where ID is your auto-increment field and Type is the column holding the type of record you want to group by.
MySQL uses a dumb extension of GROUP BY that is not reliable for getting such results, so you could use this instead:
select t.id, t.`group`, t.`date`, t.title from titles as t where t.id =
(select id from titles where `group` = t.`group` order by `date` desc limit 1);
In this query, the table is scanned in full for each group to find the most recent date. I could not find a better alternative. Hope this helps someone.

Do I need to have a multicolumn index?

EXPLAIN SELECT *
FROM (
`phppos_items`
)
WHERE (
name LIKE 'AB10LA2%'
OR item_number LIKE 'AB10LA2%'
OR category LIKE 'AB10LA2%'
)
AND deleted =0
ORDER BY `name` ASC
LIMIT 16
+----+-------------+--------------+-------+-----------------------------------+------+---------+------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+--------------+-------+-----------------------------------+------+---------+------+------+-------------+
| 1 | SIMPLE | phppos_items | index | item_number,name,category,deleted | name | 257 | NULL | 32 | Using where |
+----+-------------+--------------+-------+-----------------------------------+------+---------+------+------+-------------+
This query takes 9 seconds to run (the table has 1 million + rows).
I have an index on item_number,name,category,deleted separately. How can I speed up this query?
As far as I'm aware, MySQL doesn't know how to perform bitmap OR index scans. But if you have an index on each field, you could rewrite the query as the union of three queries to force it to do such a thing. If so, this will be very fast:
select *
from (
select * from (
select *
from phppos_items
where name like 'AB10LA2%' and deleted = 0
order by `name` limit 16
) t
union
select * from (
select *
from phppos_items
where item_number like 'AB10LA2%' and deleted = 0
order by `name` limit 16
) t
union
select * from (
select *
from phppos_items
where category like 'AB10LA2%' and deleted = 0
order by `name` limit 16
) t
) as top_rows
order by `name` limit 16
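To check that the UNION rewrite returns the same rows as the original OR query, both can be run side by side on a toy table. This sketch uses Python's sqlite3 module as a stand-in for MySQL; the rows and the index names (idx_name and so on) are invented, and the page size is a parameter (2 as well as 16) so the per-branch limiting actually matters. Whether each branch really uses its own index depends on the engine and the data, so check the plan on the real table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE phppos_items (
        id INTEGER PRIMARY KEY, name TEXT, item_number TEXT,
        category TEXT, deleted INTEGER);
    CREATE INDEX idx_name ON phppos_items (name);
    CREATE INDEX idx_item_number ON phppos_items (item_number);
    CREATE INDEX idx_category ON phppos_items (category);
    INSERT INTO phppos_items VALUES
        (1, 'AB10LA2 widget', 'X1', 'tools', 0),
        (2, 'bolt', 'AB10LA2-7', 'hardware', 0),
        (3, 'cable', 'Z9', 'AB10LA2 cables', 0),
        (4, 'AB10LA2 old', 'X2', 'tools', 1),
        (5, 'drill', 'Q5', 'tools', 0);
""")

OR_QUERY = """
    SELECT * FROM phppos_items
    WHERE (name LIKE :pat OR item_number LIKE :pat OR category LIKE :pat)
      AND deleted = 0
    ORDER BY name LIMIT :n
"""

# One branch per column, each limited on its own, then merged and re-limited.
UNION_QUERY = """
    SELECT * FROM (
        SELECT * FROM (SELECT * FROM phppos_items
                       WHERE name LIKE :pat AND deleted = 0
                       ORDER BY name LIMIT :n) AS t
        UNION
        SELECT * FROM (SELECT * FROM phppos_items
                       WHERE item_number LIKE :pat AND deleted = 0
                       ORDER BY name LIMIT :n) AS t
        UNION
        SELECT * FROM (SELECT * FROM phppos_items
                       WHERE category LIKE :pat AND deleted = 0
                       ORDER BY name LIMIT :n) AS t
    ) AS top_rows
    ORDER BY name LIMIT :n
"""


def ids(sql, n):
    """Run a query with the search prefix and page size; return the ids in order."""
    return [row[0] for row in conn.execute(sql, {"pat": "AB10LA2%", "n": n})]


for n in (2, 16):
    print(n, ids(OR_QUERY, n), ids(UNION_QUERY, n))
```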
The OR operator can be poison for an execution plan. You could try to re-phrase your query replacing the OR clauses by an equivalent UNION:
SELECT *
FROM (
SELECT * FROM `phppos_items`
WHERE name LIKE 'AB10LA2%'
UNION
SELECT * FROM `phppos_items`
WHERE item_number LIKE 'AB10LA2%'
UNION
SELECT * FROM `phppos_items`
WHERE category LIKE 'AB10LA2%'
) AS u
WHERE deleted =0
ORDER BY `name` ASC
LIMIT 16
This will allow MySQL to run several sub-queries in parallel before applying the UNION operator to each of the subqueries' results. I know this can help a lot with Oracle. Maybe MySQL can do similar things? Note: I assume that LIKE 'AB10LA2%' is quite a selective filter. Otherwise, this might not improve things due to late ordering and limiting in the execution plan. See Denis's answer for a more general approach.
In any case, I think a multi-column index won't help you because you have '%' signs in your search expressions. That way, only the first column in the multi-column index could be used, the rest would still need index-scanning or a full table scan.