MySQL query takes long to execute if using ORDER BY String Column

So my query on a table that contains 4 million records executes instantly if I don't use ORDER BY. However, I want to give my clients a way to sort results by the Name field and only show the last 100 of the filtered result. As soon as I add ORDER BY Name it takes 100 seconds to execute.
My table structure is similar to this:
CREATE TABLE Test(
ID INT PRIMARY KEY AUTO_INCREMENT,
Name VARCHAR(100),
StatusID INT,
KEY (StatusID),        -- Index on StatusID
KEY (StatusID, Name),  -- Index on StatusID, Name
KEY (Name)             -- Index on Name
);
My query simply does something like:
explain SELECT ID, StatusID, Name
FROM Test
WHERE StatusID = 113
ORDER BY Name DESC
LIMIT 0, 100
The EXPLAIN above, when I order by Name, gives this result:
StatusID_2 is the composite index on StatusID, Name.
Now if I change ORDER BY Name DESC to ORDER BY ID I get this:
How can I make it so that it also examines only 100 rows when using ORDER BY Name?

One thing you can try is to restrict Name to the letters that could appear in the 100 expected rows, like:
SELECT *
FROM Test
-- Some joins to filter data or get more columns from other tables
WHERE StatusID = 12 AND Name REGEXP '^[A-H]'
ORDER BY Name DESC
LIMIT 0, 100
Moreover, the index on Name is very important (which you already have) – with it, an index range scan is started and query execution stops as soon as the required number of rows has been generated.
We can't make use of ID here; the only thing we can try is to exclude letters which cannot appear in the expected result, and that is what the REGEXP does.

It's hard to tell without the joins and the EXPLAIN result, but apparently you're not making use of an index.
It might be because of the joins, or because you have another key in the WHERE clause. I'd recommend reading this; it covers all the possible cases: http://dev.mysql.com/doc/refman/5.7/en/order-by-optimization.html
Increasing the sort_buffer_size and/or read_rnd_buffer_size might help...
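For example, a minimal sketch of bumping both for the current session only (the values are arbitrary assumptions; tune them to your memory budget):
SET SESSION sort_buffer_size = 8388608;      -- 8 MB for this session
SET SESSION read_rnd_buffer_size = 4194304;  -- 4 MB for this session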

You need a composite key based on the filtering WHERE criteria PLUS the order by... create an index on
( StatusID, Name )
This way, the WHERE jumps right to your StatusID = 12 records and ignores the rest of the 4 million... THEN uses the name as a secondary to qualify the ORDER BY.
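A minimal sketch of that composite index (the index name is an assumption; the table in the question already declares an equivalent KEY):
CREATE INDEX idx_status_name ON Test (StatusID, Name);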
Without seeing the other tables / join criteria and associated indexes, you might also want to try adding the MySQL keyword
SELECT STRAIGHT_JOIN ... rest of query
This makes the query join the tables in the order you have written them, but the impact is unclear without seeing the other joins, as noted previously.
ADDITION (per feedback)
I would remove the individual index on StatusID alone so the engine doesn't have to guess which one to use. The composite index can serve a StatusID-only query regardless of the Name, so you don't need both.
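A sketch of dropping that single-column index, assuming MySQL's auto-generated name for it from the CREATE TABLE in the question:
ALTER TABLE Test DROP INDEX StatusID;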
Also, remove the Name-only index UNLESS you will ever be querying PRIMARILY on the name as a WHERE qualifier without the StatusID qualifier. Also, how many total records are even possible for the example StatusIDs you are querying, out of the 4 million? You MIGHT want to pull the full set for the StatusID as a sub-query, get a few thousand rows, and have THAT ordered by name, which would be quick... something like...
SELECT *
FROM ( SELECT ID, StatusID, Name
       FROM Test
       WHERE StatusID = 113 ) PreQuery
ORDER BY Name DESC
LIMIT 0, 100

Related

How to query a fixed number of rows ordered by date in mysql?

I am using MySQL to query a DB. What I would like to do is to query NOT the full database, but only the last 1000 rows ordered by timestamp.
I have tried the query below, which doesn't work as I would like. I know LIMIT is used to return a fixed number of selected rows, but that is not what I want: I want the query itself to only consider a fixed number of rows.
select * from mutable where name = 'myschema' ORDER BY start_time DESC LIMIT 1000;
Any help?
tl;dr Sort Less Data
I guess your mutable table has an autoincrementing primary key called mutable.mutable_id.
You can then do this:
SELECT mutable_id
FROM mutable
WHERE name = 'myschema'
ORDER BY start_time DESC
LIMIT 1000;
It gives you a result set of the ids of all the relevant rows. The ORDER BY ... LIMIT work then only has to sort mutable_id and start_time values, not whole rows, so it takes less space and time on the MySQL server.
Then you use that query to retrieve the details:
SELECT *
FROM mutable
WHERE mutable_id IN (
SELECT mutable_id
FROM mutable
WHERE name = 'myschema'
ORDER BY start_time DESC
LIMIT 1000
)
ORDER BY start_time DESC;
This will fetch all the data you need without needing to scan and sort the whole table.
If you create an index on name and start_time the subquery will be faster: the query can random-access the index to the appropriate name, then scan the start_time entries one by one until it finds 1000. No need to sort; the index is presorted.
CREATE INDEX x_mutable_start_time ON mutable (name, start_time);
If you're on MySQL 8 you can create a descending index and it's even faster.
CREATE INDEX x_mutable_start_time ON mutable (name, start_time DESC);
This works only with auto_increment
The trick is to sort less data, as O. Jones mentioned. The problem is telling MySQL how to do so.
MySQL can't know what "last 1000 records" are unless it sorts them based on the query. That's exactly what you want to avoid so you need to tell MySQL how to find "last 1000 records".
This trick consists of telling MySQL at which auto_increment to start looking for the data. The problem is that you're using timestamps so I'm not sure whether this fits your particular use case.
Here's the query:
SELECT * FROM mutable
WHERE `name` = 'myschema'
AND id > (SELECT MAX(id) - 1000 FROM mutable WHERE `name` = 'myschema')
ORDER BY start_time DESC LIMIT 1000;
Problems:
auto_increment values have gaps. The numbers aren't sequential, only unique, generated by a sequentially incrementing algorithm. To get better results, increase the number you subtract; depending on your dataset you might get 1000 results back, or you might get only 500.
if you don't have an auto_increment, this is useless
if the rows weren't inserted in timestamp order (so that a larger id means a later start_time), this is useless
Advantages:
- the primary key is used to define the value range (WHERE id > x), therefore the dataset reduction will be the fastest possible.

Fetching random set of values from a huge table

I have a table containing about 1 billion records. It has the following structure:
id | name | first_id | second_id
I also have an array with a set of specific words:
$arr = ['camel', 'toe', 'glasses', 'book'];
I now have to fetch all records from this table where:
- name contains one or more keywords from this array
- first_id matches 8
- second_id matches 55
Those values are made up of course, they change dynamically in my application.
How can I do this so that it's most efficient?
I tried the following:
SELECT *
FROM table t
WHERE (t.name LIKE '%camel%' OR t.name LIKE '%toe%' OR t.name LIKE '%glasses%' OR t.name LIKE '%book%') AND t.first_id = 8 AND t.second_id = 55;
But it executes in about 3.5s.
I just need to get about 3-4 random records from this query, so I also tried limiting the results to 300. But it still took 700ms, which is way too long.
I also tried randomizing limit and offset, but I'd have to count all results earlier, so it would be even slower.
Is there a way to solve this problem?
First, learn how to use EXPLAIN SELECT. This should tell you a bit about how MySQL will pick a strategy for your query.
If just using first_id and second_id reduces the table to a small number of records, it should be pretty fast, but it does mean that you need an index. Only one index can be used, so how you build that index depends on the cardinality of both first_id and second_id. If both only contain a limited number of values (say, under a hundred), you should make an index that references both.
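A minimal sketch of such a combined index (the table and index names are placeholders, since the real table name isn't given):
CREATE INDEX idx_first_second ON my_table (first_id, second_id);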
But if there's still a ton of records in the table even for those first_id and second_id values, it means you need an index on the name field instead.
A regular index will do nothing for you for that field. You need a FULLTEXT index.
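A minimal sketch of that approach, using the column names from the question (my_table and the index name are placeholders; note that MATCH ... AGAINST matches whole words rather than the arbitrary substrings LIKE '%...%' matches, and this returns the first few matches rather than a truly random sample):
ALTER TABLE my_table ADD FULLTEXT INDEX ft_name (name);

SELECT *
FROM my_table t
WHERE MATCH(t.name) AGAINST('camel toe glasses book' IN BOOLEAN MODE)
AND t.first_id = 8 AND t.second_id = 55
LIMIT 4;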

Why are my indexed fields being returned in a random order rather than alphabetical (ASC) order?

I have a table of users -
id INT
account_id INT
name VARCHAR
email VARCHAR
And I have added an index for (account_id, name) so that users are returned in alphabetical order by name.
However, in some of my queries the users are returned in alphabetical order on the name field, but in others they are not, and are returned in a random order - and my index does not seem to be applied.
SELECT * FROM users WHERE account_id = 56;   -- index is applied; sorted by name in ASC order
SELECT * FROM users WHERE account_id = 110;  -- index is not applied; not sorted by name
What might be the reason for this?
(Could it be related to the number of records the query fetches? Could it be because of partitions?)
Kindly help.
An index does not guarantee the order of results. Indexes are used to make searching easier, and in this case, since you are searching by account_id, that is the only index that would be used.
If you want your results ordered use an "ORDER BY" clause.
You are misunderstanding what an index does in MySQL. An index is an internal mechanism which allows your database to perform faster on certain fields.
Any data you query can be returned in any order, unless you specifically include an ORDER BY clause.
If you want to sort users by name, your query would become SELECT * FROM users WHERE account_id = 56 ORDER BY name ASC.
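As a quick sanity check (a sketch, not from the original answer): running EXPLAIN on that query should show the (account_id, name) index in use with no 'Using filesort' in the Extra column, since the index already yields rows in name order within each account_id:
EXPLAIN SELECT * FROM users WHERE account_id = 56 ORDER BY name ASC;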

mysql indexes / ordering data based on multiple criteria

I have a simple table "t1" that contains:
word VARCHAR(16)
id INT
importance SMALLINT
word and id are unique together, but neither is unique alone.
I added a UNIQUE INDEX(word, id)
My query looks something like this:
SELECT id FROM t1 WHERE word = "something" ORDER BY importance DESC
It takes 0.0002 seconds to execute, say:
SELECT id FROM t1 WHERE word = "something"
but as much as 0.15s to execute with the ORDER BY importance DESC.
My question is, how can I index/reorder my table so it's organized firstly by word, then by importance without having to do the sorting on the fly?
Can I just reorder the static data so it's sorted by word, importance DESC by default?
To speed up your query, add an index on (word, importance DESC, id).
You can have more than one index on a table so you don't need to remove your existing index if you don't want to.
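A minimal sketch of that index (the index name is an assumption; descending index keys are only honoured from MySQL 8.0 on, earlier versions parse the DESC but ignore it):
CREATE INDEX idx_word_importance ON t1 (word, importance DESC, id);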

How can I speed up a MySQL query with a large offset in the LIMIT clause?

I'm getting performance problems when LIMITing a mysql SELECT with a large offset:
SELECT * FROM table LIMIT m, n;
If the offset m is, say, larger than 1,000,000, the operation is very slow.
I do have to use limit m, n; I can't use something like id > 1,000,000 limit n.
How can I optimize this statement for better performance?
Perhaps you could create an indexing table which provides a sequential key relating to the key in your target table. Then you can join this indexing table to your target table and use a where clause to more efficiently get the rows you want.
#create table to store sequences
CREATE TABLE seq (
seq_no int not null auto_increment,
id int not null,
primary key(seq_no),
unique(id)
);
#create the sequence
TRUNCATE seq;
INSERT INTO seq (id) SELECT id FROM mytable ORDER BY id;
#now get 1000 rows from offset 1000000
SELECT mytable.*
FROM mytable
INNER JOIN seq USING(id)
WHERE seq.seq_no BETWEEN 1000000 AND 1000999;
If records are large, the slowness may be coming from loading the data. If the id column is indexed, then just selecting it will be much faster. You can then do a second query with an IN clause for the appropriate ids (or could formulate a WHERE clause using the min and max ids from the first query.)
slow:
SELECT * FROM table ORDER BY id DESC LIMIT 10 OFFSET 50000
fast:
SELECT id FROM table ORDER BY id DESC LIMIT 10 OFFSET 50000
SELECT * FROM table WHERE id IN (1,2,3...10)
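And the min/max variant mentioned above, as a sketch (the 1 and 10 here stand for the smallest and largest id returned by the first query):
SELECT * FROM table WHERE id BETWEEN 1 AND 10 ORDER BY id DESC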
There's a blog post somewhere on the internet on why the selection of the rows to show should be as compact as possible (just the ids), and producing the complete results should in turn fetch all the data you want, but only for the rows you selected.
Thus, the SQL might be something like (untested, I'm not sure it actually will do any good):
select A.* from table A
inner join (select id from table order by whatever limit m, n) B
on A.id = B.id
order by A.whatever
If your SQL engine is too primitive to allow this kind of SQL statement, or it doesn't improve anything, it might be worthwhile to break this single statement into multiple statements and capture the ids in a data structure.
Update: I found the blog post I was talking about: it was Jeff Atwood's "All Abstractions Are Failed Abstractions" on Coding Horror.
I don't think there's any need to create a separate index if your table already has one. If so, then you can order by this primary key and then use values of the key to step through:
SELECT * FROM myBigTable WHERE id > :OFFSET ORDER BY id ASC;
Another optimisation would be not to use SELECT * but just the ID, so that it can simply read the index and doesn't have to then locate all the data (reducing IO overhead). If you need some of the other columns, then perhaps you could add these to the index so that they are read along with the primary key (which will most likely be held in memory and therefore not require a disc lookup), although this will not be appropriate for all cases, so you will have to experiment.
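A sketch of that kind of covering index (the extra columns here are purely illustrative placeholders for whatever you actually need to display):
ALTER TABLE myBigTable ADD INDEX idx_cover (id, name, created_at);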
Paul Dixon's answer is indeed a solution to the problem, but you'll have to maintain the sequence table and ensure that there are no row gaps.
If that's feasible, a better solution would be to simply ensure that the original table has no row gaps, and starts from id 1. Then grab the rows using the id for pagination.
SELECT * FROM table A WHERE id >= 1 AND id <= 1000;
SELECT * FROM table A WHERE id >= 1001 AND id <= 2000;
and so on...
I ran into this problem recently. The fix had two parts. First, I had to use an inner select in my FROM clause that did the limiting and offsetting for me on the primary key only:
$subQuery = DB::raw("( SELECT id FROM titles WHERE id BETWEEN {$startId} AND {$endId} ORDER BY title ) as t");
Then I could use that as the FROM part of my query (the surrounding DB::query()->select(...) call and the final ->get() are not in the original snippet and are assumptions; only the column list and joins are original):
$results = DB::query()
    ->select(
        'titles.id',
        'title_eisbns_concat.eisbns_concat',
        'titles.pub_symbol',
        'titles.title',
        'titles.subtitle',
        'titles.contributor1',
        'titles.publisher',
        'titles.epub_date',
        'titles.ebook_price',
        'publisher_licenses.id as pub_license_id',
        'license_types.shortname',
        $coversQuery
    )
    ->from($subQuery)
    ->leftJoin('titles', 't.id', '=', 'titles.id')
    ->leftJoin('organizations', 'organizations.symbol', '=', 'titles.pub_symbol')
    ->leftJoin('title_eisbns_concat', 'titles.id', '=', 'title_eisbns_concat.title_id')
    ->leftJoin('publisher_licenses', 'publisher_licenses.org_id', '=', 'organizations.id')
    ->leftJoin('license_types', 'license_types.id', '=', 'publisher_licenses.license_type_id')
    ->get(); // ->get() assumed; the original snippet ends at the last join
The first time I created this query I had used OFFSET and LIMIT in MySQL. This worked fine until I got past page 100; then the offset started getting unbearably slow. Changing that to BETWEEN in my inner query sped it up for any page. I'm not sure why MySQL hasn't sped up OFFSET, but BETWEEN seems to reel it back in.