MySQL prioritising primary key over column index

Query:
select * from table_a where col_a = 'value1' and col_b ='value2' order by id desc limit 1
Indexes:
col_a is indexed but col_b is not. col_a has a high cardinality (2M).
The entire table has 28M rows. The number of rows with col_a = 'value1' is 22,000.
The latest id is 28M. The latest rows with col_a = 'value1' have ids somewhere in the 25M-25.5M range.
Ideally it should scan only these 22,000 rows and return the result. But we have seen that MySQL scans about 3M rows (28M - 25M in primary key id values) before returning the result.
Using EXPLAIN, we found that the PRIMARY key is used if the limit is set to less than 20, but above that user_id is prioritised.
Has anyone else seen this behaviour? Is there any flag that can be set to avoid scanning the primary key? (We don't want to use FORCE INDEX(col_a_idx).) Is there any other way to avoid this?

Although "How to hint the index to use in a MySQL select query?" discusses the question as stated, I suggest that there is a better way to optimize the query:
INDEX(col_a, col_b, id)
(and drop INDEX(col_a), which becomes redundant because the new index starts with col_a)
This will allow the query to run faster than forcing the use of PRIMARY KEY(id).
With this index, the Optimizer will automatically use it and look at exactly 1 row, not 28M, not 22000.
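To make the "1 row, not 28M, not 22000" claim concrete, here is a toy Python model, not MySQL itself: a descending walk of the PRIMARY KEY that stops at the first row matching both columns, versus a seek into a sorted list of (col_a, col_b, id) tuples standing in for INDEX(col_a, col_b, id). The table size and value distribution are invented for illustration.

```python
import math
from bisect import bisect_right

def pk_backward_scan(rows, a, b):
    """Model of ORDER BY id DESC LIMIT 1 via PRIMARY KEY(id): walk rows from the
    highest id down and stop at the first one where col_a = a and col_b = b."""
    examined = 0
    for row_id, col_a, col_b in sorted(rows, reverse=True):
        examined += 1
        if col_a == a and col_b == b:
            return row_id, examined
    return None, examined

def composite_index_seek(index, a, b):
    """Model of the same query via INDEX(col_a, col_b, id): jump to the end of the
    (a, b) range and read the single entry just before it."""
    hi = bisect_right(index, (a, b, math.inf))
    if hi > 0 and index[hi - 1][:2] == (a, b):
        return index[hi - 1][2], 1
    return None, 1 if hi > 0 else 0

# Invented toy table: (id, col_a, col_b), ids 1..1000
rows = [
    (i, "value1" if 600 <= i < 622 else f"other{i % 50}", "value2" if i % 2 == 0 else "x")
    for i in range(1, 1001)
]
index = sorted((col_a, col_b, row_id) for row_id, col_a, col_b in rows)

print(pk_backward_scan(rows, "value1", "value2"))      # (620, 381)
print(composite_index_seek(index, "value1", "value2"))  # (620, 1)
```

The primary-key walk has to pass every newer row that does not match, which is the "3M rows" effect in miniature; the composite index reads one entry regardless of how far back the match is.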
If col_b is TEXT, this new index won't work. Let's see SHOW CREATE TABLE, and please explain what type of stuff is in col_b.
Perhaps there is a datatype issue. Perhaps there is something goofy about your index(col_a).

MySQL: index for ORDER BY query

Here is the table_a schema I have:
Field          Type
id (PRIMARY)   bigint
status         tinyint
err_code       bigint
...            ...
The SQL I want to execute is:
select * from table_a where id > 123456 and status = -1 and err_code = 100001 order by id asc LIMIT 500
I'd like to run this query in real time.
My question is what kind of index I should use here. I already created a composite index, idx_id_status_err_code, but MySQL does not seem to choose it.
EXPLAIN reports two possible keys, PRIMARY and idx_id_status_err_code, but MySQL uses the primary key instead of idx_id_status_err_code.
Also, there are some concurrent write operations, so I add row locks (FOR UPDATE, not LOCK IN SHARE MODE) to the target rows. I'm not sure whether these write locks will affect the query above.
Any help is appreciated.
where id > 123456 and status = -1 and err_code = 100001 order by id
needs
INDEX(status, err_code,  -- 1st because they are tested with "=", either order
      id)                 -- for the range test (>) and for ORDER BY
Since that index handles all of the WHERE and the ORDER BY, the Optimizer can even handle the LIMIT 500, thereby stopping after 500 rows.
When you start an INDEX with the column(s) of the PRIMARY KEY (id), there is little reason for the Optimizer to pick the INDEX instead of simply reaching into the data. This is especially true since you are fetching columns that are not in the index (SELECT *).
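A toy Python model of the two index shapes, using sorted tuple lists as stand-ins for B-tree indexes and small invented numbers (ids 1 to 2000, id > 100, LIMIT 5) in place of the real ones. It counts index entries read: with id first, every entry past the lower bound has to be tested against status and err_code, which is no better than walking the PRIMARY KEY; with the "=" columns first, the seek lands on the first match and the LIMIT stops the read.

```python
import math
from bisect import bisect_right

def read_id_first(index, min_id, status, err_code, limit):
    """index: sorted (id, status, err_code) tuples, like idx_id_status_err_code.
    Only the id range can be sought; status/err_code are checked per entry."""
    out, read = [], 0
    start = bisect_right(index, (min_id, math.inf, math.inf))  # first id > min_id
    for row_id, st, ec in index[start:]:
        read += 1
        if st == status and ec == err_code:
            out.append(row_id)
            if len(out) == limit:
                break
    return out, read

def read_equality_first(index, min_id, status, err_code, limit):
    """index: sorted (status, err_code, id) tuples, like INDEX(status, err_code, id).
    Seek to (status, err_code, id > min_id); entries then arrive in id order."""
    out, read = [], 0
    start = bisect_right(index, (status, err_code, min_id))
    for st, ec, row_id in index[start:]:
        read += 1
        if (st, ec) != (status, err_code):
            break  # left the (status, err_code) range: no more matches
        out.append(row_id)
        if len(out) == limit:
            break
    return out, read

# Invented data: every 10th row has status -1, every 20th also has err_code 100001
rows = [(i, -1 if i % 10 == 0 else 0, 100001 if i % 20 == 0 else 5) for i in range(1, 2001)]
by_id = sorted(rows)
by_eq = sorted((st, ec, i) for i, st, ec in rows)

print(read_id_first(by_id, 100, -1, 100001, 5))        # ([120, 140, 160, 180, 200], 100)
print(read_equality_first(by_eq, 100, -1, 100001, 5))  # ([120, 140, 160, 180, 200], 5)
```

Both shapes return the same rows in id order; only the number of entries touched differs, and with SELECT * each touched entry would also need a row fetch.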
Avoid "index hints". What helps today may hurt tomorrow (when the data distribution changes).
You mentioned a "row lock"; let's hear more about why you think you need one. If you are afraid that some other thread will change one of the rows this SELECT picked, that is better fixed by adding a suitable WHERE to the UPDATE, to make sure the row still has that status and err_code.
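A sketch of the guarded UPDATE, run against SQLite through Python's standard sqlite3 module only so it executes without a MySQL server; the table layout, the row, and the new status values are hypothetical. The point is the affected-row count: 0 means the row no longer matches what the SELECT saw, so the caller can skip it or retry instead of holding a lock. MySQL reports the same information as the affected-rows count.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_a (id INTEGER PRIMARY KEY, status INTEGER, err_code INTEGER)")
conn.execute("INSERT INTO table_a VALUES (123457, -1, 100001)")
conn.commit()

def claim(conn, row_id, new_status):
    """Change the row only if it still has the status/err_code the SELECT saw.
    Returns the number of rows changed; 0 means another writer got there first."""
    cur = conn.execute(
        "UPDATE table_a SET status = ? "
        "WHERE id = ? AND status = -1 AND err_code = 100001",
        (new_status, row_id),
    )
    conn.commit()
    return cur.rowcount

first = claim(conn, 123457, 1)   # row still matches the guard
second = claim(conn, 123457, 2)  # status is now 1, so the guard no longer matches
print(first, second)  # 1 0
```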

Avoid filesort in simple filtered ordered query

I have a simple table:
CREATE TABLE `user_values` (
`id` bigint NOT NULL AUTO_INCREMENT,
`user_id` bigint NOT NULL,
`value` varchar(100) NOT NULL,
PRIMARY KEY (`id`),
KEY `user_id` (`user_id`,`id`),
KEY `id` (`id`,`user_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci;
on which I am trying to execute the following simple query:
select * from user_values where user_id in (20020, 20030) order by id desc;
I fully expected this query to use an index (either the (user_id, id) one or the (id, user_id) one). Yet it turns out that's not the case.
Running EXPLAIN on this query yields:
id select_type table partitions type key key_len ref rows filtered Extra
1 SIMPLE user_values NULL range user_id 8 NULL 9 100.00 Using index condition; Using filesort
Why is that the case? How can I avoid a filesort on this trivial query?
You can't avoid the filesort in the query you show.
When you use a range predicate (for example, IN ( ) is a range predicate), and an index is used, the rows are read in index order. But there's no way for the MySQL query optimizer to guess that reading the rows in index order by user_id will guarantee they are also in id order. The two user_id values you are searching for are potentially scattered all over the table, in any order. Therefore MySQL must assume that once the matching rows are read, an extra step of sorting the result by id is necessary.
Here's an example of hypothetical data in which reading the rows through an index on user_id does not return them in id order:
id user_id
1 20030
2 20020
3 20016
4 20030
5 20020
So when reading from an index on (user_id, id), the matching rows will be returned in the following order, sorted by user_id first, then by id:
id user_id
2 20020
5 20020
1 20030
4 20030
Clearly, the result is not in id order, so it needs to be sorted to satisfy the ORDER BY you requested.
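The two tables above can be reproduced with a small Python model in which a sorted list of (user_id, id) tuples stands in for INDEX(user_id, id). Filtering that list for the IN values yields ids in index order, which is not id order in either direction, so a sort step is still needed.

```python
# id -> user_id, the hypothetical rows from the first table above
rows = {1: 20030, 2: 20020, 3: 20016, 4: 20030, 5: 20020}

# INDEX(user_id, id): entries sorted by user_id first, then id
index = sorted((user_id, row_id) for row_id, user_id in rows.items())

def range_read(index, wanted):
    """Ids in the order a read of the index returns them for user_id IN (...)."""
    return [row_id for user_id, row_id in index if user_id in wanted]

forward = range_read(index, {20020, 20030})
backward = forward[::-1]  # reading the index in reverse does not help either

print(forward)                                    # [2, 5, 1, 4]
print(forward == sorted(forward))                 # False
print(backward == sorted(forward, reverse=True))  # False
```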
The same kind of effect happens for other types of predicates, for example BETWEEN, <, !=, IS NOT NULL, etc. Every predicate except = is a range predicate.
The only ways to avoid the filesort are to change the query in one of the following ways:
Omit the ORDER BY clause and accept the results in whatever order the optimizer chooses to return them, which could be id order, but only by coincidence.
Change user_id IN (20020, 20030) to user_id = 20020, so there is only one matching user_id. Reading the matching rows from the index then returns them in id order, so the ORDER BY is a no-op. The optimizer recognizes when this is possible and skips the filesort.
MySQL will most likely use an index for the query (unless the user_id values in the query cover most of the rows).
The "filesort" happens in memory (it's not really a file sort) and is used to sort the found rows according to the ORDER BY clause.
You cannot avoid a "sort" in this case.
There were about 9 rows to sort, so it could not have taken long.
How long did the query take? Probably only a few milliseconds, so who cares?
"Filesort" does not necessarily mean that a "file" was involved. In many queries the sort is done in RAM.
Do you use id for anything other than to have a PRIMARY KEY on the table? If not, then this will help a small amount. (The speed-up won't be indicated in EXPLAIN.)
PRIMARY KEY (`user_id`,`id`), -- to avoid secondary lookups
KEY `id` (`id`)  -- to keep auto_increment happy
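A toy Python model of why PRIMARY KEY(user_id, id) can help, assuming InnoDB's layout in which rows are stored in primary-key order and each secondary-index entry carries the primary-key value. Reading through KEY(user_id, id) needs one extra lookup into the clustered PRIMARY KEY(id) per matching row, while with PRIMARY KEY(user_id, id) the matching rows sit next to each other and come back directly. The row data and the user_id distribution are invented.

```python
from bisect import bisect_left, bisect_right

# Invented rows: (id, user_id, value)
rows = [(i, 20000 + i % 7, f"v{i}") for i in range(1, 50)]

def read_via_secondary(rows, user_id):
    """PRIMARY KEY(id) + KEY(user_id, id): the secondary entry only holds the id,
    so each match costs one more lookup into the clustered primary key."""
    clustered = {r[0]: r for r in rows}              # rows stored by id
    secondary = sorted((r[1], r[0]) for r in rows)   # (user_id, id) entries
    lo = bisect_left(secondary, (user_id,))
    hi = bisect_right(secondary, (user_id, float("inf")))
    found, extra_lookups = [], 0
    for _, row_id in secondary[lo:hi]:
        found.append(clustered[row_id])
        extra_lookups += 1
    return found, extra_lookups

def read_via_clustered(rows, user_id):
    """PRIMARY KEY(user_id, id): rows are stored in (user_id, id) order, so the
    matching rows are one contiguous range with no extra lookups."""
    clustered = sorted(((r[1], r[0]), r) for r in rows)
    keys = [k for k, _ in clustered]
    lo = bisect_left(keys, (user_id,))
    hi = bisect_right(keys, (user_id, float("inf")))
    return [r for _, r in clustered[lo:hi]], 0

a, n = read_via_secondary(rows, 20003)
b, m = read_via_clustered(rows, 20003)
print([r[0] for r in a], n, m)  # [3, 10, 17, 24, 31, 38, 45] 7 0
```

Both paths return the same rows in id order; the saving is only in the per-row lookups, which is why EXPLAIN does not show it.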

How to optimise a MySQL query when SHOW FULL PROCESSLIST shows "Sending data" for over 24 hours

I have the following query that runs forever, and I am looking to see if there is any way I can optimise it. It runs on a table with 1,406,480 rows in total. Filename and Ref_No are not indexed, but ID and End_Date both are.
My Query:
INSERT INTO UniqueIDs
(
SELECT
T1.ID
FROM
master_table T1
LEFT JOIN
master_table T2
ON
(
T1.Ref_No = T2.Ref_No
AND
T1.End_Date = T2.End_Date
AND
T1.Filename = T2.Filename
AND
T1.ID > T2.ID
)
WHERE T2.ID IS NULL
AND
LENGTH(T1.Ref_No) BETWEEN 5 AND 10
)
;
I haven't indexed Ref_No because it is a TEXT column, and I get a BLOB/TEXT error when I try to index it.
I would really appreciate it if somebody could advise on how I can speed up this query.
Thanks
Thanks to Bill's advice on multi-column indexes, I have managed to make some headway. I first ran this code:
CREATE INDEX I_DELETE_DUPS ON master_table(id, End_Date);
I then added a new column holding the length of Ref_No, but had to change the statement Bill suggested because my MySQL version is 5.5. So I ran it in 3 steps:
ALTER TABLE master_table
ADD COLUMN Ref_No_length SMALLINT UNSIGNED;
UPDATE master_table SET Ref_No_length = LENGTH(Ref_No);
ALTER TABLE master_table ADD INDEX (Ref_No_length);
The last step was to change the length condition in the WHERE clause of my insert query to:
AND t1.Ref_No_length between 5 and 10;
I then ran this query, and within 15 minutes I had 280k IDs inserted into my UniqueIDs table. I then changed my insert script to see if I could include more length values:
AND t1.Ref_No_length IN (5,6,7,8,9,10,13);
This was to also bring in the rows whose length is 13. This query took a lot longer, 2 hours 50 minutes to be precise, but including length 13 gave me an extra 700k unique IDs.
I am looking at ways to optimise the query with the IN clause, but this is already a big improvement over the original query, which kept running for 24 hours. So thank you so much, Bill.
For the JOIN, you should have a multi-column index on (Ref_No, End_Date, Filename).
You can create a prefix index on a TEXT column like this:
ALTER TABLE master_table ADD INDEX (Ref_No(10));
But that won't help you search based on LENGTH(). An index only helps searches on the indexed value, not on functions of the column.
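To illustrate why LENGTH() cannot use an index on Ref_No, here is a Python model with a few invented Ref_No values: a list sorted by value (like an index on Ref_No) has to be checked entry by entry for a length range, while a list sorted by length (like an index on the stored Ref_No_length column) can be bisected to just the matching range.

```python
from bisect import bisect_left, bisect_right

refs = ["A1", "ABCDE", "ZZ", "B1234567", "C123", "ABCDEFGHIJK", "Q12345"]  # invented

def by_value_index(refs, lo, hi):
    """An index on Ref_No is ordered by value, which says nothing about length,
    so every entry has to be examined for LENGTH(Ref_No) BETWEEN lo AND hi."""
    index = sorted(refs)
    matches = [r for r in index if lo <= len(r) <= hi]
    return sorted(matches), len(index)

def by_length_index(refs, lo, hi):
    """An index on a stored length column is ordered by length, so the BETWEEN
    range can be located by bisection and only that range is read."""
    index = sorted((len(r), r) for r in refs)
    lengths = [n for n, _ in index]
    start, end = bisect_left(lengths, lo), bisect_right(lengths, hi)
    return sorted(r for _, r in index[start:end]), end - start

print(by_value_index(refs, 5, 10))   # (['ABCDE', 'B1234567', 'Q12345'], 7)
print(by_length_index(refs, 5, 10))  # (['ABCDE', 'B1234567', 'Q12345'], 3)
```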
In MySQL 5.7 or later, you can create a virtual column like this, with an index on the values calculated for the virtual column:
ALTER TABLE master_table
ADD COLUMN Ref_No_length SMALLINT UNSIGNED AS (LENGTH(Ref_No)),
ADD INDEX (Ref_No_length);
Then MySQL will recognize that your condition in your query is the same as the expression for the virtual column, and it will automatically use the index (exception: in my experience, this doesn't work for expressions using JSON functions).
But this is no guarantee that the index will help. If most of the rows match the condition of the length being between 5 and 10, the optimizer will not bother with the index. It may be more work to use the index than to do a table-scan.
"the ID and End_Date have both been indexed."
Do you have PRIMARY KEY(id) and, redundantly, INDEX(id)? A PK is already a unique key.
"have both been indexed" -- INDEX(a), INDEX(b) is not the same as INDEX(a,b) -- they have different uses. Read about "composite" indexes.
That query smells a lot like "group-wise" max done in a very slow way. (Alas, that may have come from the online docs.)
I have compiled the fastest ways to do that task here: http://mysql.rjweb.org/doc.php/groupwise_max (There are multiple versions, based on MySQL version and what issues your code can/cannot tolerate.)
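To see what the self-join computes, here is a Python model with invented rows: the literal LEFT JOIN ... WHERE T2.ID IS NULL logic keeps a row only if no row in the same (Ref_No, End_Date, Filename) group has a smaller ID, which is the same as keeping the minimum ID per group, a one-pass job. The LENGTH(T1.Ref_No) filter is left out for brevity, and this is not one of the techniques from the linked page, only a check of the equivalence.

```python
def anti_join(rows):
    """Literal reading of: T1 LEFT JOIN T2 ON same (Ref_No, End_Date, Filename)
    AND T1.ID > T2.ID WHERE T2.ID IS NULL. Quadratic, like the slow query."""
    keep = []
    for id1, key1 in rows:
        if not any(key2 == key1 and id1 > id2 for id2, key2 in rows):
            keep.append(id1)
    return sorted(keep)

def min_id_per_group(rows):
    """One pass: remember the smallest ID seen for each group key."""
    best = {}
    for row_id, key in rows:
        if key not in best or row_id < best[key]:
            best[key] = row_id
    return sorted(best.values())

# Invented rows: (ID, (Ref_No, End_Date, Filename))
rows = [
    (1, ("R1", "2020-01-31", "a.csv")),
    (2, ("R1", "2020-01-31", "a.csv")),
    (3, ("R2", "2020-01-31", "a.csv")),
    (4, ("R1", "2020-02-29", "a.csv")),
    (5, ("R2", "2020-01-31", "a.csv")),
]

print(anti_join(rows))         # [1, 3, 4]
print(min_id_per_group(rows))  # [1, 3, 4]
```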
Please provide SHOW CREATE TABLE. One important question: Is id the PRIMARY KEY?
This composite index may be useful:
(Filename, End_Date, Ref_No, -- first, in any order
ID) -- last
This, as others have noted, is unlikely to be helped by any index, hence T1 will need a full-table-scan:
AND LENGTH(T1.Ref_No) BETWEEN 5 AND 10
If Ref_No cannot be longer than 191 characters, change it to a VARCHAR so that it can be used in an index. Oh, did I ask for SHOW CREATE TABLE? If you can't make it VARCHAR, then my recommended composite index is
INDEX(Filename, End_Date, ID)

MySQL query runs very slowly when using ORDER BY

The following query takes 30 seconds to finish with ORDER BY. Without ORDER BY it finishes in 0.0035 seconds. I already have an index on the field "name". Field "id" is the primary key. I have 400,000 records in this table. Please help: what is wrong with the query when using ORDER BY?
SELECT *
FROM users
WHERE name IS NOT NULL
AND name != ''
AND ( status IS NULL OR status = '0' )
order by id desc
limit 50
Update (solution at the end):
Hi all, thanks for the help. Below are some updates, as you asked.
Below is the output of EXPLAIN:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE users range name name 258 NULL 226009 Using where; Using filesort
Yes, there are around 20 fields in this table.
Below are the indexes that I have:
Keyname Type Cardinality Field
PRIMARY PRIMARY 418099 id
name INDEX 411049 name
Solution:
It turns out the columns with NULL values are the reason. After changing those 2 columns in the WHERE condition to NOT NULL, the query takes just .000x seconds. But strangely, it increases to 29 seconds if I create an index on (status, name, id DESC) or (status, name, id).
You should definitely have a compound index: a single one containing all the fields you need, as DBMSs cannot really use more than one index on a single query.
An OR clause is not very index-friendly, so if you can, I recommend setting status to NOT NULL. I assume NULL does not mean anything different from zero. This will help a lot toward actually using the index.
I do not know how well name != '' is optimized. A semantically equal condition would be name > '' (meaning it sorts later in the alphabet); maybe this also saves you some CPU cycles.
Then you have to decide the order in which your columns appear. A rule of thumb could be cardinality, the number of distinct values a column can have.
For example:
ALTER TABLE users ADD INDEX order1 (status, name, id DESC);
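A toy Python model of reading INDEX(status, name, id) for status = '0' AND name > '' ORDER BY id DESC, with invented user rows and assuming status has already been made NOT NULL as in the update above. Because name is tested with a range, the matching entries come out ordered by name, not by id, so all of them must be read and sorted before the LIMIT can apply. This is one plausible reason why that index did not help in the update above; it is an assumption, not a diagnosis from the EXPLAIN output.

```python
# Invented users after the NOT NULL change: (id, name, status)
users = [(1, "zoe", "0"), (2, "amy", "0"), (3, "", "0"), (4, "bob", "1"), (5, "kim", "0")]

# INDEX(status, name, id): entries sorted by status, then name, then id
index = sorted((status, name, uid) for uid, name, status in users)

# WHERE status = '0' AND name > '': equality on status, range on name
read_order = [uid for status, name, uid in index if status == "0" and name > ""]

# ORDER BY id DESC LIMIT 2 can only be answered after sorting every match
top = sorted(read_order, reverse=True)[:2]

print(read_order)  # [2, 5, 1]
print(top)         # [5, 2]
```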
Edit
You don't need to delete indexes. MySQL will choose the best one very quickly and ignore the rest. They only cost disk space and some CPU cycles on UPDATEs. But if you do not need them in any circumstances you can remove them of course.
The long time is because the access to your table is slow. This is probably caused by dynamic length fields such as TEXT or BLOB. If you do not ALWAYS need these, you can move them to a twin auxiliary table like:
users (id, name, status, group_id)
profile (user_id, birthdate, gender, motto, cv)
This way the essential system operations can be done with restricted information about the user, and all the other content associated with the user only has to be read when it is really needed.
Edit2
You can hint which index MySQL should use by naming it (or several of them) like this:
SELECT id, name FROM users USE INDEX (order1) WHERE name != '' and status = '0' ORDER BY id DESC
Without an EXPLAIN it is hard to say, but most probably you also need an index on the "status" column. Slowness in a single-table query almost always comes down to the query doing a full table scan as opposed to using an index.
Try running:
explain SELECT *
FROM users
WHERE name IS NOT NULL
AND name != ''
AND ( status IS NULL OR status = '0' )
order by id desc
limit 50
Then post the output. You'll probably see it doing a full table scan, because there is no index for status. The MySQL documentation on EXPLAIN describes how to read the output.

Does using "LIMIT 1" speed up a query on a primary key?

If I have a primary key, say id, and I run a simple query on the key such as:
SELECT id FROM myTable WHERE id = X
Will it find one row and then stop looking, since it is a primary key, or would it be better to tell MySQL to limit its select by using LIMIT 1? For instance:
SELECT id FROM myTable WHERE id = X LIMIT 1
Does using “LIMIT 1” speed up a query on a primary key?
No. It's already as fast as can be without LIMIT 1. LIMIT 1 is effectively implied anyway.
Will it find one row and then stop looking as it is a primary key
Yes.
No table scan should be necessary at all here: it's a key-based lookup. The matching row is found and that's the end of the procedure.
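A rough Python model of a unique-key lookup: a binary search over a sorted range of one million ids stands in for the B-tree seek (real B-trees have a much higher fan-out, so they need even fewer steps). The search returns as soon as it finds the key, so there is nothing left for LIMIT 1 to cut short.

```python
def seek(keys, target):
    """Binary search over sorted unique keys, a rough stand-in for a B-tree seek.
    Returns (position or None, number of probes)."""
    lo, hi, probes = 0, len(keys) - 1, 0
    while lo <= hi:
        mid = (lo + hi) // 2
        probes += 1
        if keys[mid] == target:
            return mid, probes  # unique key: the first hit is the only hit
        if keys[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return None, probes

ids = range(1, 1_000_001)  # a unique key, e.g. the id column
pos, probes = seek(ids, 765_432)
print(ids[pos], probes <= 20)  # 765432 True
```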
There is no need to worry about this sort of "optimization".
Only one row will be fetched (per the unique index contract), and the database is able to find all the (1) rows very quickly. It can do this because the underlying structures that back the index (or primary key) support fast seeking by value. There is no table scan involved. (Generally a variant of a B-tree is used, but it might be hash-based, etc. I suppose a smart query optimizer might also be able to pass down additional hints based on the unique constraint in effect, but I don't know enough about this.)