Avoid "filesort" on UNION RESULT - mysql

Sub-query 1:
SELECT * from big_table
where category = 'fruits' and name = 'apple'
order by yyyymmdd desc
Explain:
table | key | extra
big_table | name_yyyymmdd | using where
Looks great!
Sub-query 2:
SELECT * from big_table
where category = 'fruits' and (taste = 'sweet' or wildcard = '*')
order by yyyymmdd desc
Explain:
table | key | extra
big_table | category_yyyymmdd | using where
Looks great!
Now if I combine those with UNION:
SELECT * from big_table
where category = 'fruits' and name = 'apple'
UNION
SELECT * from big_table
where category = 'fruits' and (taste = 'sweet' or wildcard = '*')
Order by yyyymmdd desc
Explain:
table | key | extra
big_table | name | using index condition, using where
big_table | category | using index condition
UNION RESULT| NULL | using temporary; using filesort
Not so good, it uses filesort.
This is a trimmed down version of a more complexed query, here are some facts about the big_table:
big_table has 10M + rows
There are 5 unique "category"s
There are 5 unique "taste"s
There are about 10,000 unique "name"s
There are about 10,000 unique "yyyymmdd"s
I have created single index on each of those fields, plus composite idx such as yyyymmdd_category_taste_name but Mysql is not using it.

SELECT * FROM big_table
WHERE category = 'fruits'
AND ( name = 'apple'
OR taste = 'sweet'
OR wildcard = '*' )
ORDER BY yyyymmdd DESC
And have INDEX(catgory) or some index starting with category. However, if more than about 20% of the table is category = 'fruits' will probably decide to ignore the index and simply do a table scan. (Since you say there are only 5 categories, I suspect the optimizer will rightly eschew the index.)
Or this might be beneficial: INDEX(category, yyyymmdd), in this order.
The UNION had to do a sort (either in memory on disk, it is not clear) because it was unable to fetch the rows in the desired order.
A composite index INDEX(yyyymmdd, ...) might be used to avoid the 'filesort', but it won't use any columns after yyyymmdd.
When constructing a composite index, start with any WHERE columns compared '='. After that you can add one range or group by or order by. More details.
UNION is often a good choice for avoiding a slow OR, but in this case it would need three indexes
INDEX(category, name)
INDEX(category, taste)
INDEX(category, wildcard)
and adding yyyymmdd would not help unless you add a LIMIT.
And the query would be:
( SELECT * FROM big_table WHERE category = 'fruits' AND name = 'apple' )
UNION DISTINCT
( SELECT * FROM big_table WHERE category = 'fruits' AND taste = 'sweet' )
UNION DISTINCT
( SELECT * FROM big_table WHERE category = 'fruits' AND wildcard = '*' )
ORDER BY yyyymmdd DESC
Adding a limit would be even messier. First tack yyyymmdd on the end of each of the three composite indexes, then
( SELECT ... ORDER BY yyyymmdd DESC LIMIT 10 )
UNION DISTINCT
( SELECT ... ORDER BY yyyymmdd DESC LIMIT 10 )
UNION DISTINCT
( SELECT ... ORDER BY yyyymmdd DESC LIMIT 10 )
ORDER BY yyyymmdd DESC LIMIT 10
Adding an OFFSET would be even worse.
Two other techniques -- "covering" index and "lazy lookup" might help, but I doubt it.
Yet another technique is to put all the words in the same column and use a FULLTEXT index. But this may be problematical for several reasons.

This must be also work without UNION
SELECT * from big_table
where
( category = 'fruits' and name = 'apple' )
OR
( category = 'fruits' and (taste = 'sweet' or wildcard = '*')
ORDER BY yyyymmdd desc;

Related

MySQL - Sort table efficiently when querying by another attribute

On this JS fiddle:
http://sqlfiddle.com/#!9/4c7bc0/7
CREATE TABLE big_table(id INTEGER AUTO_INCREMENT PRIMARY KEY,
user_id INTEGER NOT NULL);
The second query
SELECT * FROM big_table WHERE user_id IN ( 1, 100, 1000 )
ORDER BY id DESC LIMIT 10;
Is planned as: Using where; Using index; Using filesort
When big_table contains tens of millions of rows, the filesort kills the performance.
How do I ORDER BY faster?
For this query:
SELECT *
FROM big_table
WHERE user_id IN ( 1, 100, 1000 )
ORDER BY id DESC
LIMIT 10;
You cannot easily get rid of the work for the ORDER BY (there is one possible way, see below). Most importantly, you need an index on big_table(user_id):
CREATE INDEX idx_bigtable_userid ON big_table(user_id);
If there are only a few dozen or hundreds of rows that match, then this should be fine.
Another possibility is to rewrite the query, and use an index on:
CREATE INDEX idx_bigtable_userid ON big_table(user_id, id);
I should start by saying that I'm not 100% sure that this approach will work because of the DESC on id. But, the following query should use the index:
SELECT *
FROM big_table
WHERE user_id = 1
ORDER BY id DESC;
So you can do:
SELECT *
FROM ((SELECT t.*
FROM big_table t
WHERE t.user_id = 1
ORDER BY id DESC
LIMIT 10
) UNION ALL
(SELECT t.*
FROM big_table t
WHERE t.user_id = 100
ORDER BY id DESC
LIMIT 10
) UNION ALL
(SELECT t.*
FROM big_table t
WHERE t.user_id = 1000
ORDER BY id DESC
LIMIT 10
)
) t
ORDER BY id DESC
LIMIT 10;

How to use ORDER BY inside UNION

I want to use ORDER BY on every UNION ALL queries, but I can't figure out the right syntax. This is what I want:
(
SELECT id, user_id, other_id, name
FROM tablename
WHERE user_id = 123 AND user_in IN (...)
ORDER BY name
)
UNION ALL
(
SELECT id, user_id, other_id, name
FROM tablename
WHERE user_id = 456 AND user_id NOT IN (...)
ORDER BY name
)
EDIT:
Just to be clear: I need two ordered lists like this, not one:
1
2
3
1
2
3
4
5
Thank you very much!
Something like this should work in MySQL:
SELECT a.*
FROM (
SELECT ... FROM ... ORDER BY ...
) a
UNION ALL
SELECT b.*
FROM (
SELECT ... FROM ... ORDER BY ...
) b
to return rows in an order we'd like them returned. i.e. MySQL seems to honor the ORDER BY clauses inside the inline views.
But, without an ORDER BY clause on the outermost query, the order that the rows are returned is not guaranteed.
If we need the rows returned in a particular sequence, we can include an ORDER BY on the outermost query. In a lot of use cases, we can just use an ORDER BY on the outermost query to satisfy the results.
But when we have a use case where we need all the rows from the first query returned before all the rows from the second query, one option is to include an extra discriminator column in each of the queries. For example, add ,'a' AS src in the first query, ,'b' AS src to the second query.
Then the outermost query could include ORDER BY src, name, to guarantee the sequence of the results.
FOLLOWUP
In your original query, the ORDER BY in your queries is discarded by the optimizer; since there is no ORDER BY applied to the outer query, MySQL is free to return the rows in whatever order it wants.
The "trick" in query in my answer (above) is dependent on behavior that may be specific to some versions of MySQL.
Test case:
populate tables
CREATE TABLE foo2 (id INT PRIMARY KEY, role VARCHAR(20)) ENGINE=InnoDB;
CREATE TABLE foo3 (id INT PRIMARY KEY, role VARCHAR(20)) ENGINE=InnoDB;
INSERT INTO foo2 (id, role) VALUES
(1,'sam'),(2,'frodo'),(3,'aragorn'),(4,'pippin'),(5,'gandalf');
INSERT INTO foo3 (id, role) VALUES
(1,'gimli'),(2,'boromir'),(3,'elron'),(4,'merry'),(5,'legolas');
query
SELECT a.*
FROM ( SELECT s.id, s.role
FROM foo2 s
ORDER BY s.role
) a
UNION ALL
SELECT b.*
FROM ( SELECT t.id, t.role
FROM foo3 t
ORDER BY t.role
) b
resultset returned
id role
------ ---------
3 aragorn
2 frodo
5 gandalf
4 pippin
1 sam
2 boromir
3 elron
1 gimli
5 legolas
4 merry
The rows from foo2 are returned "in order", followed by the rows from foo3, again, "in order".
Note (again) that this behavior is NOT guaranteed. (The behavior we observer is a side effect of how MySQL processes inline views (derived tables). This behavior may be different in versions after 5.5.)
If you need the rows returned in a particular order, then specify an ORDER BY clause for the outermost query. And that ordering will apply to the entire resultset.
As I mentioned earlier, if I needed the rows from the first query first, followed by the second query, I would include a "discriminator" column in each query, and then include the "discriminator" column in the ORDER BY clause. I would also do away with the inline views, and do something like this:
SELECT s.id, s.role, 's' AS src
FROM foo2 s
UNION ALL
SELECT t.id, t.role, 't' AS src
FROM foo3 t
ORDER BY src, role
Don't use ORDER BY in an individual SELECT statement inside a UNION, unless you're using LIMIT with it.
The MySQL docs on UNION explain why (emphasis mine):
To apply ORDER BY or LIMIT to an individual SELECT, place the clause
inside the parentheses that enclose the SELECT:
(SELECT a FROM t1 WHERE a=10 AND B=1 ORDER BY a LIMIT 10) UNION
(SELECT a FROM t2 WHERE a=11 AND B=2 ORDER BY a LIMIT 10);
However, use of ORDER BY for individual SELECT statements implies
nothing about the order in which the rows appear in the final result
because UNION by default produces an unordered set of rows. Therefore,
the use of ORDER BY in this context is typically in conjunction with
LIMIT, so that it is used to determine the subset of the selected rows
to retrieve for the SELECT, even though it does not necessarily affect
the order of those rows in the final UNION result. If ORDER BY appears
without LIMIT in a SELECT, it is optimized away because it will have
no effect anyway.
To use an ORDER BY or LIMIT clause to sort or limit the entire UNION
result, parenthesize the individual SELECT statements and place the
ORDER BY or LIMIT after the last one. The following example uses both
clauses:
(SELECT a FROM t1 WHERE a=10 AND B=1)
UNION
(SELECT a FROM t2 WHERE a=11 AND B=2)
ORDER BY a LIMIT 10;
It seems like an ORDER BY clause like the following will get you what you want:
ORDER BY user_id, name
You just use one ORDER BY at the very end.
The Union turns two selects into one logical select. The order-by applies to the entire set, not to each part.
Don't use any parens either. Just:
SELECT 1 as Origin, blah blah FROM foo WHERE x
UNION ALL
SELECT 2 as Origin, blah blah FROM foo WHERE y
ORDER BY Origin, z
(SELECT id, user_id, other_id, name
FROM tablename
WHERE user_id = 123
AND user_in IN (...))
UNION ALL
(SELECT id, user_id, other_id, name
FROM tablename
WHERE user_id = 456
AND user_id NOT IN (...)))
ORDER BY name
You can also simplify this query:
SELECT id, user_id, other_id, name
FROM tablename
WHERE (user_id = 123 AND user_in IN (...))
OR (user_id = 456 AND user_id NOT IN (...))

MySQL sorting by date with GROUP BY

My table titles looks like this
id |group|date |title
---+-----+--------------------+--------
1 |1 |2012-07-26 18:59:30 | Title 1
2 |1 |2012-07-26 19:01:20 | Title 2
3 |2 |2012-07-26 19:18:15 | Title 3
4 |2 |2012-07-26 20:09:28 | Title 4
5 |2 |2012-07-26 23:59:52 | Title 5
I need latest result from each group ordered by date in descending order. Something like this
id |group|date |title
---+-----+--------------------+--------
5 |2 |2012-07-26 23:59:52 | Title 5
2 |1 |2012-07-26 19:01:20 | Title 2
I tried
SELECT *
FROM `titles`
GROUP BY `group`
ORDER BY MAX( `date` ) DESC
but I'm geting first results from groups. Like this
id |group|date |title
---+-----+--------------------+--------
3 |2 |2012-07-26 18:59:30 | Title 3
1 |1 |2012-07-26 19:18:15 | Title 1
What am I doing wrong?
Is this query going to be more complicated if I use LEFT JOIN?
This page was very helpful to me; it taught me how to use self-joins to get the max/min/something-n rows per group.
In your situation, it can be applied to the effect you want like so:
SELECT * FROM
(SELECT group, MAX(date) AS date FROM titles GROUP BY group)
AS x JOIN titles USING (group, date);
I found this topic via Google, looked like I had the same issue.
Here's my own solution if, like me, you don't like subqueries :
-- Create a temporary table like the output
CREATE TEMPORARY TABLE titles_tmp LIKE titles;
-- Add a unique key on where you want to GROUP BY
ALTER TABLE titles_tmp ADD UNIQUE KEY `group` (`group`);
-- Read the result into the tmp_table. Duplicates won't be inserted.
INSERT IGNORE INTO titles_tmp
SELECT *
FROM `titles`
ORDER BY `date` DESC;
-- Read the temporary table as output
SELECT *
FROM titles_tmp
ORDER BY `group`;
It has a way better performance. Here's how to increase speed if the date_column has the same order as the auto_increment_one (you then don't need an ORDER BY statement) :
-- Create a temporary table like the output
CREATE TEMPORARY TABLE titles_tmp LIKE titles;
-- Add a unique key on where you want to GROUP BY
ALTER TABLE titles_tmp ADD UNIQUE KEY `group` (`group`);
-- Read the result into the tmp_table, in the natural order. Duplicates will update the temporary table with the freshest information.
INSERT INTO titles_tmp
SELECT *
FROM `titles`
ON DUPLICATE KEY
UPDATE `id` = VALUES(`id`),
`date` = VALUES(`date`),
`title` = VALUES(`title`);
-- Read the temporary table as output
SELECT *
FROM titles_tmp
ORDER BY `group`;
Result :
+----+-------+---------------------+---------+
| id | group | date | title |
+----+-------+---------------------+---------+
| 2 | 1 | 2012-07-26 19:01:20 | Title 2 |
| 5 | 2 | 2012-07-26 23:59:52 | Title 5 |
+----+-------+---------------------+---------+
On large tables this method makes a significant point in terms of performance.
Well, if dates are unique in a group this would work (if not, you'll see several rows that match the max date in a group). (Also, bad naming of columns, 'group', 'date' might give you syntax errors and such specially 'group')
select t1.* from titles t1, (select group, max(date) date from titles group by group) t2
where t2.date = t1.date
and t1.group = t2.group
order by date desc
Another approach is to make use of MySQL user variables to identify a "control break" in the group values.
If you can live with an extra column being returned, something like this will work:
SELECT IF(s.group = #prev_group,0,1) AS latest_in_group
, s.id
, #prev_group := s.group AS `group`
, s.date
, s.title
FROM (SELECT t.id,t.group,t.date,t.title
FROM titles t
ORDER BY t.group DESC, t.date DESC, t.id DESC
) s
JOIN (SELECT #prev_group := NULL) p
HAVING latest_in_group = 1
ORDER BY s.group DESC
What this is doing is ordering all the rows by group and by date in descending order. (We specify DESC on all the columns in the ORDER BY, in case there is an index on (group,date,id) that MySQL can do a "reverse scan" on. The inclusion of the id column gets us deterministic (repeatable) behavior, in the case when there are more than one row with the latest date value.) That's the inline view aliased as s.
The "trick" we use is to compare the group value to the group value from the previous row. Whenever we have a different value, we know that we are starting a "new" group, and that this row is the "latest" row (we have the IF function return a 1). Otherwise (when the group values match), it's not the latest row (and we have the IF function returns a 0).
Then, we filter out all the rows that don't have that latest_in_group set as a 1.
It's possible to remove that extra column by wrapping that query (as an inline view) in another query:
SELECT r.id
, r.group
, r.date
, r.title
FROM ( SELECT IF(s.group = #prev_group,0,1) AS latest_in_group
, s.id
, #prev_group := s.group AS `group`
, s.date
, s.title
FROM (SELECT t.id,t.group,t.date,t.title
FROM titles t
ORDER BY t.group DESC, t.date DESC, t.id DESC
) s
JOIN (SELECT #prev_group := NULL) p
HAVING latest_in_group = 1
) r
ORDER BY r.group DESC
If your id field is an auto-incrementing field, and it's safe to say that the highest value of the id field is also the highest value for the date of any group, then this is a simple solution:
SELECT b.*
FROM (SELECT MAX(id) AS maxid FROM titles GROUP BY group) a
JOIN titles b ON a.maxid = b.id
ORDER BY b.date DESC
Use the below mysql query to get latest updated/inserted record from table.
SELECT * FROM
(
select * from `titles` order by `date` desc
) as tmp_table
group by `group`
order by `date` desc
Use the following query to get the most recent record from each group
SELECT
T1.* FROM
(SELECT
MAX(ID) AS maxID
FROM
T2
GROUP BY Type) AS aux
INNER JOIN
T2 AS T2 ON T1.ID = aux.maxID ;
Where ID is your auto increment field and Type is the type of records, you wanted to group by.
MySQL uses an dumb extension of GROUP BY which is not reliable if you want to get such results therefore, you could use
select id, group, date, title from titles as t where id =
(select id from titles where group = a.group order by date desc limit 1);
In this query, each time the table is scanned full for each group so it can find the most recent date. I could not find any better alternate for this. Hope this will help someone.

How to select last N Rows without using an Index

I have a query that contains several conditions to extract data from a table of 5 million rows. A composite index has been built to partially cover some of these conditions to the extend that I am not able to cover the sorting with an index:
SELECT columns FROM Table WHERE conditions='conditions' ORDER BY id DESC LIMIT N;
The id itself is an auto-increment column. The above query can be very slow (4-5s) as filesort is being used. By removing the ORDER BY clause, I am able to speed up the query by up to 4 times. However the data extracted will be mostly old data.
Since post-processing can be carried out to sort the extracted data, I am more interested in extracting data from roughly the latest N rows from the resultset. My question is, is there a way to do something like this:
SELECT columns FROM Table WHERE conditions='conditions' LIMIT -N;
Since I do not really need a sort and I know that there is very high likelihood that the bottom N rows contain newer data.
Here you go. Keep in mind that there should be no problem in using ORDER BY with any indexed columns, including id.
SET #seq:=0;
SELECT `id`
FROM (
SELECT #seq := #seq +1 AS `seq` , `id`
FROM `Table`
WHERE `condition` = 'whatever'
)t1
WHERE t1.seq
BETWEEN (
(
SELECT COUNT( * )
FROM `Table`
WHERE `condition` = 'whatever'
) -49
)
AND (
SELECT COUNT( * )
FROM `Table`
WHERE `condition` = 'whatever'
);
You can replace the "-49" with an expression like: -1 * ($quantity_desired -1);
Also check out this answer as it might help you:
https://stackoverflow.com/a/725439/631764
And here's another one:
https://stackoverflow.com/a/1441164/631764
Grab the last "few" rows using a between:
SELECT columns
FROM Table
WHERE conditions = 'conditions'
AND id between (select max(id) from table) - 50 AND (select max(id) from table)
ORDER BY id
DESC LIMIT N;
This example gets the last 50 rows, but the id index will be used efficiently. The other conditions and ordering will then be only over 50 rows. Should work a treat.

Do I need to have a multicolumn index?

EXPLAIN SELECT *
FROM (
`phppos_items`
)
WHERE (
name LIKE 'AB10LA2%'
OR item_number LIKE 'AB10LA2%'
OR category LIKE 'AB10LA2%'
)
AND deleted =0
ORDER BY `name` ASC
LIMIT 16
+----+-------------+--------------+-------+-----------------------------------+------+---------+------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+--------------+-------+-----------------------------------+------+---------+------+------+-------------+
| 1 | SIMPLE | phppos_items | index | item_number,name,category,deleted | name | 257 | NULL | 32 | Using where |
+----+-------------+--------------+-------+-----------------------------------+------+---------+------+------+-------------+
This query takes 9 seconds to run (the table has 1 million + rows).
I have an index on item_number,name,category,deleted separately. How can I speed up this query?
Best I'm aware, MySQL doesn't know how to perform bitmap OR index scans. But you could rewrite it as the union of three queries to force it to do such a thing, if you've an index on each field. If so, this will be very fast:
select *
from (
select * from (
select *
from phppos_items
where name like 'AB10LA2%' and deleted = 0
order by `name` limit 16
) t
union
select * from (
select *
from phppos_items
where item_number like 'AB10LA2%' and deleted = 0
order by `name` limit 16
) t
union
select * from (
select *
from phppos_items
where category like 'AB10LA2%' and deleted = 0
order by `name` limit 16
) t
) as top rows
order by `name` limit 16
The OR operator can be poison for an execution plan. You could try to re-phrase your query replacing the OR clauses by an equivalent UNION:
SELECT *
FROM (
SELECT * FROM `phppos_items`
WHERE name LIKE 'AB10LA2%'
UNION
SELECT * FROM `phppos_items`
WHERE item_number LIKE 'AB10LA2%'
UNION
SELECT * FROM `phppos_items`
WHERE category LIKE 'AB10LA2%'
)
WHERE deleted =0
ORDER BY `name` ASC
LIMIT 16
This will allow MySQL to run several sub-queries in parallel before applying the UNION operator to each of the subqueries' results. I know this can help a lot with Oracle. Maybe MySQL can do similar things? Note: I assume that LIKE 'AB10LA2%' is quite a selective filter. Otherwise, this might not improve things due to late ordering and limiting in the execution plan. See Denis's answer for a more general approach.
In any case, I think a multi-column index won't help you because you have '%' signs in your search expressions. That way, only the first column in the multi-column index could be used, the rest would still need index-scanning or a full table scan.