MySQL indexes / ordering data based on multiple criteria

I have a simple table "t1" that contains:
word VARCHAR(16)
id INT
importance SMALLINT
word and id are unique together, but neither is unique alone.
I added a UNIQUE INDEX(word, id)
My query looks something like this:
SELECT id FROM t1 WHERE word = "something" ORDER BY importance DESC
It takes 0.0002 seconds to execute, say:
SELECT id FROM t1 WHERE word = "something"
but as much as 0.15s with the ORDER BY importance DESC added.
My question is, how can I index/reorder my table so it's organized firstly by word, then by importance without having to do the sorting on the fly?
Can I just reorder the static data so it's sorted by word, importance DESC by default?

To speed up your query, add an index on (word, importance DESC, id).
You can have more than one index on a table so you don't need to remove your existing index if you don't want to.
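For example, something like this (the index name is arbitrary; the DESC key part creates a true descending index on MySQL 8.0+, while older versions parse it but silently ignore the direction):
CREATE INDEX x_word_importance ON t1 (word, importance DESC, id);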

Related

How to query a fixed number of rows ordered by date in mysql?

I am using MySQL to query a DB. What I would like to do is to query NOT the full database, but only the last 1000 rows ordered by timestamp.
I have tried this query, which doesn't work as I would like. I know LIMIT is used to return a fixed number of selected elements, but that is not what I want: I want to query only a fixed number of elements in the first place.
select * from mutable where name = 'myschema' ORDER BY start_time DESC LIMIT 1000;
Any help?
tl;dr Sort Less Data
I guess your mutable table has an autoincrementing primary key called mutable.mutable_id.
You can then do this:
SELECT mutable_id
FROM mutable
WHERE name = 'myschema'
ORDER BY start_time DESC
LIMIT 1000;
It gives you a result set of the ids of all the relevant rows. The ORDER BY ... LIMIT work then only has to sort mutable_id and start_time values, not whole rows, so it takes less space and time in the MySQL server.
Then you use that query to retrieve the details:
SELECT *
FROM mutable
WHERE mutable_id IN (
SELECT mutable_id
FROM mutable
WHERE name = 'myschema'
ORDER BY start_time DESC
LIMIT 1000
)
ORDER BY start_time DESC;
This will fetch all the data you need without needing to scan and sort the whole table.
If you create an index on name and start_time the subquery will be faster: the query can random-access the index to the appropriate name, then scan the start_time entries one by one until it finds 1000. No need to sort; the index is presorted.
CREATE INDEX x_mutable_start_time ON mutable (name, start_time);
If you're on MySQL 8 you can create a descending index and it's even faster.
CREATE INDEX x_mutable_start_time ON mutable (name, start_time DESC);
This works only with auto_increment
The trick is to sort less data, as O. Jones mentioned. The problem is telling MySQL how to do so.
MySQL can't know what "last 1000 records" are unless it sorts them based on the query. That's exactly what you want to avoid so you need to tell MySQL how to find "last 1000 records".
This trick consists of telling MySQL at which auto_increment to start looking for the data. The problem is that you're using timestamps so I'm not sure whether this fits your particular use case.
Here's the query:
SELECT * FROM mutable
WHERE `name` = 'myschema'
AND id > (SELECT MAX(id) - 1000 FROM mutable WHERE `name` = 'myschema')
ORDER BY start_time DESC LIMIT 1000;
Problems:
auto_increment values have gaps. They aren't contiguous, only unique and increasing, so depending on your dataset you might get 1000 results back or only 500. To get better results, increase the number you subtract (see the sketch after this list)
if you don't have an auto_increment, this is useless
if rows are not inserted in chronological order, so that the largest ids do not correspond to the newest timestamps, this is useless
Advantages:
- the primary key is used to define the value range (WHERE id > x), so the dataset reduction is the fastest possible.
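A sketch of that compensation: over-subtract to allow for gaps while LIMIT still caps the output (the 2000 here is an arbitrary choice, not a magic number):
SELECT * FROM mutable
WHERE `name` = 'myschema'
AND id > (SELECT MAX(id) - 2000 FROM mutable WHERE `name` = 'myschema')
ORDER BY start_time DESC LIMIT 1000;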

Why are my indexed fields being returned in a random order rather than alphabetical (ASC) order?

I have a table of users -
id INT
account_id INT
name VARCHAR
email VARCHAR
And I have added an index for (account_id, name) so that users are returned in alphabetical order by name.
However, in some of my queries the users come back in alphabetical order on the name field, while in others they come back in a seemingly random order, as if my index were not being applied.
SELECT * FROM users WHERE account_id = 56; // Index is applied.
// Sorted by name in ASC order.
SELECT * FROM users WHERE account_id = 110; // Index is not applied.
// Not sorted by name.
What might be the reason for this?
(Could it be related to the number of records the query fetches? Could it be because of partitions?)
Kindly help.
An index does not guarantee the order of results. Indexes are used to make searching faster, and in this case, since you are searching by account_id, that is the only index that would be used.
If you want your results ordered, use an ORDER BY clause.
You are misunderstanding what an index does in MySQL. An index is an internal mechanism which allows your database to perform faster on certain fields.
Any data you query can be returned in any order, unless you specifically include an ORDER BY clause.
If you want to sort users by name, your query would become SELECT * FROM users WHERE account_id = 56 ORDER BY name ASC.
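If you want to verify whether the (account_id, name) index actually resolves the sort, a quick check (just a sketch) is:
EXPLAIN SELECT * FROM users WHERE account_id = 56 ORDER BY name ASC;
-- if the Extra column shows "Using filesort", the index was not used for ordering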

mysql order by -id vs order by id desc

I wish to fetch the last 10 rows from the table of 1 M rows.
CREATE TABLE `test` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`updated_date` datetime NOT NULL,
PRIMARY KEY (`id`)
)
One way of doing this is -
select * from test order by -id limit 10;
10 rows in set (0.14 sec)
Another way of doing this is -
select * from test order by id desc limit 10;
10 rows in set (0.00 sec)
So I did an 'EXPLAIN' on these queries -
Here is the EXPLAIN for the query using ORDER BY id DESC:
EXPLAIN select * from test order by id desc limit 10;
And here is the EXPLAIN for the query using ORDER BY -id:
EXPLAIN select * from test order by -id limit 10;
I thought these would be the same, but it seems there are differences in the execution plans.
RDBMSs use heuristics to calculate execution plans; they cannot always determine the semantic equivalence of two statements, as that is too difficult a problem (in terms of both theoretical and practical complexity).
So MySQL is not able to use the index, because you do not have an index on "-id", which is an expression applied to the field "id". It seems trivial, but an RDBMS must minimize the time spent computing plans, so it handles only the simple cases.
When no optimization can be found for a query (i.e. no usable index), the system falls back to the implementation that works in any case: a scan of the full table.
As you can see in the EXPLAIN results:
1 : ORDER BY id DESC
MySQL uses the index on id, so it needs to read only 10 rows. It also doesn't need the filesort algorithm, because the index already provides the order.
2 : ORDER BY -id
MySQL does not use the index on id, so it needs to iterate over all the rows (e.g. 455952) to get your expected results. In this case MySQL must use the filesort algorithm, because the expression -id is not indexed. So it obviously takes more time :)
From the MySQL documentation, the index cannot be used to resolve the ORDER BY in cases like these:
You use ORDER BY with an expression that includes terms other than the key column name:
SELECT * FROM t1 ORDER BY ABS(key);
SELECT * FROM t1 ORDER BY -key;
You index only a prefix of a column named in the ORDER BY clause. In this case, the index cannot be used to fully resolve the sort order. For example, if you have a CHAR(20) column, but index only the first 10 bytes, the index cannot distinguish values past the 10th byte and a filesort will be needed.
The type of table index used does not store rows in order. For example, this is true for a HASH index in a MEMORY table.
Please follow this link: http://dev.mysql.com/doc/refman/5.7/en/order-by-optimization.html
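As a small illustration of the last case, assuming a MEMORY table with a HASH index (table and column names are made up):
CREATE TABLE lookup (k INT, v INT, INDEX USING HASH (k)) ENGINE=MEMORY;
-- a hash index stores no ordering, so this ORDER BY always requires a filesort
SELECT * FROM lookup ORDER BY k;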

MySQL query takes long to execute if using ORDER BY String Column

My query on a table that contains 4 million records executes instantly if I don't use ORDER BY. However, I want to give my clients a way to sort results by the Name field and only show the last 100 of the filtered result. As soon as I add ORDER BY Name, it takes 100 seconds to execute.
My table structure is similar to this:
CREATE TABLE Test(
ID INT PRIMARY KEY AUTO_INCREMENT,
Name VARCHAR(100),
StatusID INT,
KEY (StatusID), -- index on StatusID
KEY (StatusID, Name), -- index on StatusID, Name
KEY (Name) -- index on Name
);
My query simply does something like:
explain SELECT ID, StatusID, Name
FROM Test
WHERE StatusID = 113
ORDER BY Name DESC
LIMIT 0, 100
The EXPLAIN above, when I order by Name, chooses the key StatusID_2, which is the composite index on StatusID, Name.
If I change ORDER BY Name DESC to ORDER BY ID, the EXPLAIN shows only 100 rows examined.
How can I make it so that it also examines only 100 rows when using ORDER BY Name?
One thing you can try: filter on the range of letters that could plausibly cover the 100 expected result rows, like
SELECT *
FROM Test
-- some joins to filter data or get more columns from other tables
WHERE StatusID = 12 AND NAME REGEXP '^[A-H]'
ORDER BY Name DESC
LIMIT 0, 100
Moreover, an index on Name is very important (which you already have): with it, an index range scan is started and query execution stops as soon as the required number of rows has been generated.
ID won't help us here, so the only thing we can try is to exclude letters that cannot appear in the expected result, and that is what the REGEXP does.
It's hard to tell without seeing the joins and the EXPLAIN result, but apparently you're not making use of an index.
It might be because of the joins or because you have another key in the where clause. I'd recommend reading this, it covers all possible cases: http://dev.mysql.com/doc/refman/5.7/en/order-by-optimization.html
Increasing the sort_buffer_size and/or read_rnd_buffer_size might help...
You need a composite key based on the filtering WHERE criteria PLUS the order by... create an index on
( StatusID, Name )
This way, the WHERE jumps right to your StatusID = 12 records and ignores the rest of the 4 million... THEN uses the name as a secondary to qualify the ORDER BY.
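A sketch of that index (the name is arbitrary; note the table definition in the question already contains an equivalent StatusID, Name key):
CREATE INDEX idx_status_name ON Test (StatusID, Name);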
Without seeing the other tables / join criteria and associated indexes, you might also want to try adding MySQL keyword
SELECT STRAIGHT_JOIN ... rest of query
This forces the joins to be processed in the order you wrote them, but I'm unsure of the impact without seeing the other joins, as noted previously.
ADDITION (per feedback)
I would remove the individual index on StatusID only, so the engine doesn't have to guess which one to use. The composite index can serve a StatusID-only query regardless of the name, so you don't need both.
Also, remove the Name-only index UNLESS you will ever be querying PRIMARILY on the name as a WHERE qualifier without the StatusID qualifier... Also, how many total records are even possible for the example StatusIDs you are querying out of the 4 million? You MIGHT want to pull the full set for the StatusID as a sub-query, get a few thousand rows, and have THAT ordered by name, which would be quick... something like...
SELECT *
FROM ( SELECT ID, StatusID, Name
       FROM Test
       WHERE StatusID = 113 ) PreQuery
ORDER BY Name DESC
LIMIT 0, 100

How can I speed up a MySQL query with a large offset in the LIMIT clause?

I'm getting performance problems when LIMITing a MySQL SELECT with a large offset:
SELECT * FROM table LIMIT m, n;
If the offset m is, say, larger than 1,000,000, the operation is very slow.
I do have to use limit m, n; I can't use something like id > 1,000,000 limit n.
How can I optimize this statement for better performance?
Perhaps you could create an indexing table which provides a sequential key relating to the key in your target table. Then you can join this indexing table to your target table and use a where clause to more efficiently get the rows you want.
#create table to store sequences
CREATE TABLE seq (
seq_no int not null auto_increment,
id int not null,
primary key(seq_no),
unique(id)
);
#create the sequence
TRUNCATE seq;
INSERT INTO seq (id) SELECT id FROM mytable ORDER BY id;
#now get 1000 rows from offset 1000000
SELECT mytable.*
FROM mytable
INNER JOIN seq USING(id)
WHERE seq.seq_no BETWEEN 1000000 AND 1000999;
If records are large, the slowness may be coming from loading the data. If the id column is indexed, then just selecting it will be much faster. You can then do a second query with an IN clause for the appropriate ids (or could formulate a WHERE clause using the min and max ids from the first query.)
slow:
SELECT * FROM table ORDER BY id DESC LIMIT 10 OFFSET 50000
fast:
SELECT id FROM table ORDER BY id DESC LIMIT 10 OFFSET 50000
SELECT * FROM table WHERE id IN (1,2,3...10)
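Or, as a sketch of the min/max variant mentioned above (:min_id and :max_id stand in for the smallest and largest ids returned by the first query):
SELECT * FROM table WHERE id BETWEEN :min_id AND :max_id ORDER BY id DESC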
There's a blog post somewhere on the internet saying that the selection of the rows to show should be as compact as possible, thus: just the ids; and producing the complete results should in turn fetch all the data you want for only the rows you selected.
Thus, the SQL might be something like (untested, I'm not sure it actually will do any good):
select A.* from table A
inner join (select id from table order by whatever limit m, n) B
on A.id = B.id
order by A.whatever
If your SQL engine is too primitive to allow this kind of SQL statement, or, against hope, it doesn't improve anything, it might be worthwhile to break this single statement into multiple statements and capture the ids in a data structure.
Update: I found the blog post I was talking about: it was Jeff Atwood's "All Abstractions Are Failed Abstractions" on Coding Horror.
I don't think there's any need to create a separate indexing table if your table already has a primary key. If so, you can order by this primary key and then use values of the key to step through:
SELECT * FROM myBigTable WHERE id > :OFFSET ORDER BY id ASC;
Another optimisation would be not to use SELECT * but just the ID so that it can simply read the index and doesn't have to then locate all the data (reduce IO overhead). If you need some of the other columns then perhaps you could add these to the index so that they are read with the primary key (which will most likely be held in memory and therefore not require a disc lookup) - although this will not be appropriate for all cases so you will have to have a play.
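As a hedged sketch of that covering-index idea (the name column is hypothetical): once the index contains every column the query reads, MySQL can answer it from the index alone, which EXPLAIN reports as "Using index".
CREATE INDEX idx_id_name ON myBigTable (id, name);
SELECT id, name FROM myBigTable WHERE id > :OFFSET ORDER BY id ASC LIMIT 1000;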
Paul Dixon's answer is indeed a solution to the problem, but you'll have to maintain the sequence table and ensure that there are no row gaps.
If that's feasible, a better solution would be to simply ensure that the original table has no row gaps, and starts from id 1. Then grab the rows using the id for pagination.
SELECT * FROM table A WHERE id >= 1 AND id <= 1000;
SELECT * FROM table A WHERE id >= 1001 AND id <= 2000;
and so on...
I ran into this problem recently. The fix had two parts. First, I used an inner select in my FROM clause that did the limiting and offsetting for me on the primary key only:
$subQuery = DB::raw("( SELECT id FROM titles WHERE id BETWEEN {$startId} AND {$endId} ORDER BY title ) as t");
Then I could use that as the FROM part of my query:
$query = DB::query()->select( // assuming Laravel's query builder entry point
'titles.id',
'title_eisbns_concat.eisbns_concat',
'titles.pub_symbol',
'titles.title',
'titles.subtitle',
'titles.contributor1',
'titles.publisher',
'titles.epub_date',
'titles.ebook_price',
'publisher_licenses.id as pub_license_id',
'license_types.shortname',
$coversQuery
)
->from($subQuery)
->leftJoin('titles', 't.id', '=', 'titles.id')
->leftJoin('organizations', 'organizations.symbol', '=', 'titles.pub_symbol')
->leftJoin('title_eisbns_concat', 'titles.id', '=', 'title_eisbns_concat.title_id')
->leftJoin('publisher_licenses', 'publisher_licenses.org_id', '=', 'organizations.id')
->leftJoin('license_types', 'license_types.id', '=', 'publisher_licenses.license_type_id')
->get(); // assuming the builder chain ends here and is executed with get()
The first time I created this query I had used OFFSET and LIMIT in MySQL. This worked fine until I got past page 100; then the offset started getting unbearably slow. Changing that to BETWEEN in my inner query sped it up for any page. I'm not sure why MySQL hasn't sped up OFFSET, but BETWEEN seems to reel it back in.