I have a MySQL table:
CREATE TABLE mytable (
id INT NOT NULL AUTO_INCREMENT,
other_id INT NOT NULL,
expiration_datetime DATETIME,
score INT,
PRIMARY KEY (id)
)
I need to run a query of the form:
SELECT * FROM mytable
WHERE other_id=1 AND expiration_datetime > NOW()
ORDER BY score LIMIT 10
If I add this index to mytable:
CREATE INDEX order_by_index
ON mytable ( other_id, expiration_datetime, score);
Would MySQL be able to use the entire order_by_index in the query above?
It seems like it should be able to, but then according to MySQL's documentation: "The index can also be used even if the ORDER BY does not match the index exactly, as long as all of the unused portions of the index and all the extra ORDER BY columns are constants in the WHERE clause."
The above passage seems to suggest that the index would only be used for a constant query, while mine is a range query.
Can anyone clarify whether the index would be used in this case? If not, is there any way I could force the use of the index?
Thanks.
MySQL will use the index to satisfy the where clause, and will use a filesort to order the results.
It can't use the index for the ORDER BY because you are not comparing expiration_datetime to a constant. The rows being returned will therefore not always share a common prefix in the index, so the index can't be used for the sort.
For example, consider a sample set of 4 index records for your table:
a) [1,'2010-11-03 12:00',1]
b) [1,'2010-11-03 12:00',3]
c) [1,'2010-11-03 13:00',2]
d) [2,'2010-11-03 12:00',1]
If I run your query at 2010-11-03 11:00, it will return rows a, b and c. Sorted by score, they come out as a, c, b, which is not the order they appear in the index. Thus MySQL needs to do the extra pass to sort the results and can't use the index for the ORDER BY in this case.
Can anyone clarify if index would be used in this case? If not, any way I could force the use of index?
You have a range in the filtering condition, and the ORDER BY columns don't match the range.
These conditions cannot be served by a single index.
To choose which index to create, you need to run these two queries:
SELECT COUNT(*)
FROM mytable
WHERE other_id = 1
AND (score, id) <
(
SELECT score, id
FROM mytable
WHERE other_id = 1
AND expiration_datetime > NOW()
ORDER BY
score, id
LIMIT 9, 1
)
(The subquery picks the 10th qualifying row; LIMIT 9, 1 is needed because the row comparison requires the subquery to return a single row. The outer query then counts how many rows with other_id = 1 precede that row in (score, id) order.)
and
SELECT COUNT(*)
FROM mytable
WHERE other_id = 1
AND expiration_datetime >= NOW()
and compare their outputs.
If the second query yields about the same number of rows as the first one, or more, then you should use an index on (other_id, score) (and let MySQL filter on expiration_datetime).
If the second query yields significantly fewer rows than the first one, you should use an index on (other_id, expiration_datetime) (and let MySQL sort on score).
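In other words, the two candidate indexes are (names are illustrative):
CREATE INDEX idx_other_score ON mytable (other_id, score);
-- serves the ORDER BY; expiration_datetime is filtered while scanning
CREATE INDEX idx_other_expiration ON mytable (other_id, expiration_datetime);
-- serves the range; score is sorted with a filesort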
This article might be interesting to you:
Choosing index
Sounds like you've already checked the documentation and set up the index. Use EXPLAIN and see...
EXPLAIN SELECT * FROM mytable
WHERE other_id=1 AND expiration_datetime > NOW()
ORDER BY score LIMIT 10
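If the index can't satisfy the ORDER BY, the Extra column of the plan will include Using filesort. Roughly (an illustrative sketch, not actual output):
id  select_type  table    type   key             rows  Extra
1   SIMPLE       mytable  range  order_by_index  ...   Using index condition; Using filesort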
Related
Having trouble with a query. Here is the outline -
Table structure:
CREATE TABLE `world` (
`placeRef` int NOT NULL,
`forenameRef` int NOT NULL,
`surnameRef` int NOT NULL,
`incidence` int NOT NULL
) ENGINE=MyISAM DEFAULT CHARSET=utf8mb3;
ALTER TABLE `world`
ADD KEY `surnameRef_forenameRef` (`surnameRef`,`forenameRef`),
ADD KEY `forenameRef_surnameRef` (`forenameRef`,`surnameRef`),
ADD KEY `forenameRef` (`forenameRef`,`placeRef`);
COMMIT;
This table contains data like the following and has over 600,000,000 rows:
placeRef forenameRef surnameRef incidence
1 1 2 100
2 1 3 600
This represents the number of people with a given forename-surname combination in a place.
I would like to be able to query all the forenames that a surname is attached to, and then perform another search for the places where those forenames exist, with the summed incidence. For example: get all the forenames of people who have the surname "Smith"; then get a list of all those forenames, grouped by place and with the summed incidence. I can do this with the following query:
SELECT placeRef, SUM( incidence )
FROM world
WHERE forenameRef IN
(
SELECT DISTINCT forenameRef
FROM world
WHERE surnameRef = 214488
)
GROUP BY world.placeRef
However, this query takes about a minute to execute and will take more time if the surname being searched for is common.
The root problem is: performing a range query with a group doesn't utilize the full index.
Any suggestions how the speed could be improved?
In my experience, if your query has a range condition (i.e. any kind of predicate other than = or IS NULL), the column for that condition is the last column in your index that can be used to optimize search, sort, or grouping.
In other words, suppose you have an index on columns (a, b, c).
The following uses all three columns. It is able to optimize the ORDER BY c because all rows matching the specific values of a and b are, by definition, tied on those columns, and those matching rows are already stored in order by c, so the ORDER BY is a no-op.
SELECT * FROM mytable WHERE a = 1 AND b = 2 ORDER BY c;
But the next example only uses columns a, b. The ORDER BY needs to do a filesort, because the index is not in order by c.
SELECT * FROM mytable WHERE a = 1 AND b > 2 ORDER BY c;
A similar effect is true for GROUP BY. The following uses a, b for row selection, and it can also optimize the GROUP BY using the index, because each group of values per distinct value of c is guaranteed to be grouped together in the index. So it can count the rows for each value of c, and when it's done with one group, it is assured there will be no more rows later with that value of c.
SELECT c, COUNT(*) FROM mytable WHERE a = 1 AND b = 2 GROUP BY c;
But the range condition spoils that. The rows for each value of c are not grouped together; they may be scattered among the higher values of b.
SELECT c, COUNT(*) FROM mytable WHERE a = 1 AND b > 2 GROUP BY c;
In this case, MySQL can't optimize the GROUP BY in this query. It must use a temporary table to count the rows per distinct value of c.
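You can confirm this with EXPLAIN (illustrative; the exact wording varies by MySQL version):
EXPLAIN SELECT c, COUNT(*) FROM mytable WHERE a = 1 AND b > 2 GROUP BY c;
-- Extra typically includes: Using temporary (and, on older versions, Using filesort)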
MySQL 8.0.13 introduced a new type of optimizer behavior, the Skip Scan Range Access Method. But as far as I know, it only applies to range conditions, not ORDER BY or GROUP BY.
It's still true that if you have a range condition, this spoils the index optimization of ORDER BY and GROUP BY.
Unless I don't understand the task, it seems like this works:
SELECT placeRef, SUM( incidence )
FROM world
WHERE surnameRef = 214488
GROUP BY placeRef;
Give it a try.
It would benefit from a composite index in this order:
INDEX(surnameRef, placeRef, incidence)
Is incidence being updated a lot? If so, leave it off the index.
You should consider moving from MyISAM to InnoDB. It will need a suitable PK, probably
PRIMARY KEY(placeRef, surnameRef, forenameRef)
and it will take 2x-3x the disk space.
I am using MySQL to query a DB. What I would like to do is to query NOT the full database, but only the last 1000 rows ordered by timestamp.
I have tried to use this query, which doesn't work as I would like. I know LIMIT is used to return a fixed number of selected elements, but that is not what I want: I want the query to examine only a fixed number of elements, not scan the whole table.
select * from mutable where name = 'myschema' ORDER BY start_time DESC LIMIT 1000;
Any help?
tl;dr Sort Less Data
I guess your mutable table has an autoincrementing primary key called mutable.mutable_id.
You can then do this:
SELECT mutable_id
FROM mutable
WHERE name = 'myschema'
ORDER BY start_time DESC
LIMIT 1000;
It gives you a result set of the ids of all the relevant rows. The ORDER BY ... LIMIT work then only has to sort mutable_id and start_time values, not whole rows, so it takes less space and time in the MySQL server.
Then you use that query to retrieve the details:
SELECT *
FROM mutable
WHERE mutable_id IN (
SELECT mutable_id
FROM mutable
WHERE name = 'myschema'
ORDER BY start_time DESC
LIMIT 1000
)
ORDER BY start_time DESC;
This will fetch all the data you need without needing to scan and sort the whole table.
If you create an index on name and start_time the subquery will be faster: the query can random-access the index to the appropriate name, then scan the start_time entries one by one until it finds 1000. No need to sort; the index is presorted.
CREATE INDEX x_mutable_start_time ON mutable (name, start_time);
If you're on MySQL 8 you can create a descending index and it's even faster.
CREATE INDEX x_mutable_start_time ON mutable (name, start_time DESC);
This works only with auto_increment
The trick is to sort less data, like O. Jones mentioned. Problem is in telling MySQL how to do so.
MySQL can't know what "last 1000 records" are unless it sorts them based on the query. That's exactly what you want to avoid so you need to tell MySQL how to find "last 1000 records".
This trick consists of telling MySQL at which auto_increment to start looking for the data. The problem is that you're using timestamps so I'm not sure whether this fits your particular use case.
Here's the query:
SELECT * FROM mutable
WHERE `name` = 'myschema'
AND id > (SELECT MAX(id) - 1000 FROM mutable WHERE `name` = 'myschema')
ORDER BY start_time DESC LIMIT 1000;
Problems:
auto_increment values have gaps. They aren't sequential, only unique, generated by a sequentially incrementing algorithm. To get better results, increase the number you subtract: you might get 1000 results, but you might also get 500, depending on your dataset
if you don't have an auto_increment, this is useless
if rows aren't inserted in timestamp order (so that a larger id means a newer start_time), this is useless
Advantages:
- the primary key is used to define the value range (WHERE id > x), so the dataset reduction will be the fastest possible.
I wish to fetch the last 10 rows from the table of 1 M rows.
CREATE TABLE `test` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`updated_date` datetime NOT NULL,
PRIMARY KEY (`id`)
)
One way of doing this is -
select * from test order by -id limit 10;
**10 rows in set (0.14 sec)**
Another way of doing this is -
select * from test order by id desc limit 10;
**10 rows in set (0.00 sec)**
So I did an 'EXPLAIN' on these queries -
Here is the result for the query where I use 'order by desc'
EXPLAIN select * from test order by id desc limit 10;
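The plan looked roughly like this (an illustrative rendering, not verbatim output):
id  select_type  table  type   possible_keys  key      key_len  rows  Extra
1   SIMPLE       test   index  NULL           PRIMARY  4        10    NULL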
And here is the result for the query where I use 'order by -id'
EXPLAIN select * from test order by -id limit 10;
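And this one, again roughly (illustrative):
id  select_type  table  type  possible_keys  key   key_len  rows    Extra
1   SIMPLE       test   ALL   NULL           NULL  NULL     455952  Using filesort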
I thought these would be the same, but it seems there are differences in the execution plans.
RDBMSs use heuristics to calculate the execution plan; they cannot always determine the semantic equivalence of two statements, as that is too difficult a problem (in terms of theoretical and practical complexity).
So MySQL is not able to use the index, because you do not have an index on "-id", which is an expression applied to the field "id". It seems trivial, but RDBMSs must minimize the time needed to compute a plan, so they give up on all but the simple cases.
When an optimization cannot be found for a query (i.e. using an index), the system falls back to the implementation that works in any case: a scan of the full table.
As you can see in the EXPLAIN results:
1: order by id desc
MySQL uses the index on id, so it only needs to read 10 rows, and it doesn't need the filesort algorithm because the rows are already in index order.
2: order by -id
MySQL cannot use the index on id, so it needs to iterate over all the rows (e.g. 455952) to get your expected results, and it must use the filesort algorithm because -id is a computed expression, not an indexed column. So it will obviously take more time :)
You use ORDER BY with an expression that includes terms other than the key column name:
SELECT * FROM t1 ORDER BY ABS(key);
SELECT * FROM t1 ORDER BY -key;
You index only a prefix of a column named in the ORDER BY clause. In this case, the index cannot be used to fully resolve the sort order. For example, if you have a CHAR(20) column, but index only the first 10 bytes, the index cannot distinguish values past the 10th byte and a filesort will be needed.
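For instance (hypothetical table and index):
CREATE TABLE t1 (name CHAR(20));
CREATE INDEX name_prefix ON t1 (name(10));
SELECT * FROM t1 ORDER BY name;  -- filesort: the index only orders the first 10 bytes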
The type of table index used does not store rows in order. For example, this is true for a HASH index in a MEMORY table.
Please follow this link: http://dev.mysql.com/doc/refman/5.7/en/order-by-optimization.html
I'm trying to optimize a query.
My question seems to be similar to MySQL, Union ALL and LIMIT and the answer might be the same (I'm afraid). However in my case there's a stricter limit (1) as well as an index on a datetime column.
So here we go:
For simplicity, let's have just one table with three columns:
md5 (varchar)
value (varchar)
lastupdated (datetime)
There's an index on (md5, lastupdated), so selecting on an md5 key, ordering by lastupdated and limiting to 1 will be optimized.
The search shall return a maximum of one record matching one of 10 md5 keys. The keys have a priority. So if there's a record with prio 1 it will be preferred over any record with prio 2, 3 etc.
Currently UNION ALL is used:
select * from
(
(
select 0 prio, value
from mytable
where md5 = '7b76e7c87e1e697d08300fd9058ed1db'
order by lastupdated desc
limit 1
)
union all
(
select 1 prio, value
from mytable
where md5 = 'eb36cd1c563ffedc6adaf8b74c259723'
order by lastupdated desc
limit 1
)
) x
order by prio
limit 1;
It works, but the UNION seems to execute all 10 queries if 10 keys are provided.
However, from a business perspective, it would be ok to run the selects sequentially and stop after the first match.
Is that possible through plain SQL?
Or would the only option be a stored procedure?
There's a much better way to do this that doesn't need UNION. You really want the groupwise max for each key, with a custom ordering.
Groupwise Max
Order by FIELD()
There's no way the optimizer for UNION ALL can figure out what you're up to.
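A sketch of the FIELD() approach with the two keys from the question (illustrative and untested, but for LIMIT 1 it should be equivalent to the UNION ALL version):
select value
from mytable
where md5 in ('7b76e7c87e1e697d08300fd9058ed1db',
              'eb36cd1c563ffedc6adaf8b74c259723')
order by field(md5, '7b76e7c87e1e697d08300fd9058ed1db',
                    'eb36cd1c563ffedc6adaf8b74c259723'),
         lastupdated desc
limit 1;
It returns the most recent row for the highest-priority key that has any rows at all.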
I don't know if you can do this, but suppose you had a md5prio table with the list of hash codes you know you're looking for. For example.
prio md5
0 '7b76e7c87e1e697d08300fd9058ed1db'
1 'eb36cd1c563ffedc6adaf8b74c259723'
etc
in it.
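In SQL, that helper table might look like this (a hypothetical definition):
create table md5prio (
  prio int not null,
  md5 char(32) not null,
  primary key (md5)
);
insert into md5prio (prio, md5) values
  (0, '7b76e7c87e1e697d08300fd9058ed1db'),
  (1, 'eb36cd1c563ffedc6adaf8b74c259723');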
Then your query could be:
select mytable.*
from mytable
join md5prio on mytable.md5 = md5prio.md5
order by md5prio.prio, mytable.lastupdated desc
limit 1
This might save the repeated queries. You'll definitely need your index on mytable.md5. I am not sure whether your compound index on lastupdated will help; you'll need to try it.
In your case, the most efficient solution may be to build an index on (md5, lastupdated). This index should be used to resolve each subquery very efficiently (looking up the values in the index and then looking up one data page).
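For reference, that would be something like (index name illustrative):
CREATE INDEX idx_md5_lastupdated ON mytable (md5, lastupdated);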
Unfortunately, the groupwise max referenced by Gavin will produce multiple rows when there are duplicate lastupdated values (admittedly, perhaps not a concern in your case).
There is, actually, a MySQL way to get this answer, using group_concat and substring_index:
select p.prio,
       substring_index(group_concat(mt.value order by mt.lastupdated desc), ',', 1)
from mytable mt join
     (select 0 as prio, '7b76e7c87e1e697d08300fd9058ed1db' as md5 union all
      select 1 as prio, 'eb36cd1c563ffedc6adaf8b74c259723' as md5 union all
      . . .
     ) p
     on mt.md5 = p.md5
group by p.prio
order by p.prio
limit 1;
Note that this assumes value contains no commas, since group_concat uses a comma as its separator.
I'm getting performance problems when LIMITing a mysql SELECT with a large offset:
SELECT * FROM table LIMIT m, n;
If the offset m is, say, larger than 1,000,000, the operation is very slow.
I do have to use limit m, n; I can't use something like id > 1,000,000 limit n.
How can I optimize this statement for better performance?
Perhaps you could create an indexing table which provides a sequential key relating to the key in your target table. Then you can join this indexing table to your target table and use a where clause to more efficiently get the rows you want.
#create table to store sequences
CREATE TABLE seq (
seq_no int not null auto_increment,
id int not null,
primary key(seq_no),
unique(id)
);
#create the sequence
TRUNCATE seq;
INSERT INTO seq (id) SELECT id FROM mytable ORDER BY id;
#now get 1000 rows from offset 1000000
SELECT mytable.*
FROM mytable
INNER JOIN seq USING(id)
WHERE seq.seq_no BETWEEN 1000000 AND 1000999;
If records are large, the slowness may be coming from loading the data. If the id column is indexed, then just selecting it will be much faster. You can then do a second query with an IN clause for the appropriate ids (or could formulate a WHERE clause using the min and max ids from the first query.)
slow:
SELECT * FROM table ORDER BY id DESC LIMIT 10 OFFSET 50000
fast:
SELECT id FROM table ORDER BY id DESC LIMIT 10 OFFSET 50000
SELECT * FROM table WHERE id IN (1,2,3...10)
There's a blog post somewhere on the internet arguing that the selection of the rows to show should be as compact as possible (just the ids), and that producing the complete results should then fetch all the data you want for only the rows you selected.
Thus, the SQL might be something like (untested, I'm not sure it actually will do any good):
select A.* from table A
inner join (select id from table order by whatever limit m, n) B
on A.id = B.id
order by A.whatever
If your SQL engine is too primitive to allow this kind of SQL statement, or if, against hope, it doesn't improve anything, it might be worthwhile to break this single statement into multiple statements and capture the ids into a data structure.
Update: I found the blog post I was talking about: it was Jeff Atwood's "All Abstractions Are Failed Abstractions" on Coding Horror.
I don't think there's any need to create a separate indexing table if your table already has a sequential primary key. If so, then you can order by this primary key and then use values of the key to step through:
SELECT * FROM myBigTable WHERE id > :OFFSET ORDER BY id ASC;
Another optimisation would be not to use SELECT * but just the ID so that it can simply read the index and doesn't have to then locate all the data (reduce IO overhead). If you need some of the other columns then perhaps you could add these to the index so that they are read with the primary key (which will most likely be held in memory and therefore not require a disc lookup) - although this will not be appropriate for all cases so you will have to have a play.
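A sketch of that idea (the index name and extra column are illustrative; assumes name is the only other column you need):
CREATE INDEX id_plus_name ON myBigTable (id, name);
SELECT id, name FROM myBigTable WHERE id > :OFFSET ORDER BY id ASC LIMIT 1000;
-- covered by the index, so no row lookups are needed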
Paul Dixon's answer is indeed a solution to the problem, but you'll have to maintain the sequence table and ensure that there are no row gaps.
If that's feasible, a better solution would be to simply ensure that the original table has no row gaps, and starts from id 1. Then grab the rows using the id for pagination.
SELECT * FROM table A WHERE id >= 1 AND id <= 1000;
SELECT * FROM table A WHERE id >= 1001 AND id <= 2000;
and so on...
I ran into this problem recently. There were two parts to the fix. First I had to use an inner select in my FROM clause that did my limiting and offsetting for me on the primary key only:
$subQuery = DB::raw("( SELECT id FROM titles WHERE id BETWEEN {$startId} AND {$endId} ORDER BY title ) as t");
Then I could use that as the FROM part of my query:
$results = DB::query()
->select(
'titles.id',
'title_eisbns_concat.eisbns_concat',
'titles.pub_symbol',
'titles.title',
'titles.subtitle',
'titles.contributor1',
'titles.publisher',
'titles.epub_date',
'titles.ebook_price',
'publisher_licenses.id as pub_license_id',
'license_types.shortname',
$coversQuery
)
->from($subQuery)
->leftJoin('titles', 't.id', '=', 'titles.id')
->leftJoin('organizations', 'organizations.symbol', '=', 'titles.pub_symbol')
->leftJoin('title_eisbns_concat', 'titles.id', '=', 'title_eisbns_concat.title_id')
->leftJoin('publisher_licenses', 'publisher_licenses.org_id', '=', 'organizations.id')
->leftJoin('license_types', 'license_types.id', '=', 'publisher_licenses.license_type_id')
->get();
The first time I created this query I had used the OFFSET and LIMIT in MySQL. This worked fine until I got past page 100; then the offset started getting unbearably slow. Changing that to BETWEEN in my inner query sped it up for any page. I'm not sure why MySQL hasn't sped up OFFSET, but BETWEEN seems to reel it back in.