Here is the table_a schema I have:
Field         Type
id (PRIMARY)  bigint
status        tinyint
err_code      bigint
...           ...
The SQL I want to execute is:
select * from table_a where id > 123456 and status = -1 and err_code = 100001 order by id asc LIMIT 500
I'd like to run this query in real time.
My question is: what kind of index should I use here? I already created a composite index -- idx_id_status_err_code -- but it seems that MySQL does not choose it.
There are two possible keys reported by the EXPLAIN statement -- PRIMARY and idx_id_status_err_code -- but MySQL uses the PRIMARY key instead of idx_id_status_err_code.
Another thing: there are some concurrent write operations, so I take a row lock (FOR UPDATE, not share mode) on the target rows. I'm not sure whether these write locks will affect the SELECT mentioned above.
Any help is appreciated.
where id > 123456 and status = -1 and err_code = 100001 order by id
needs
INDEX(status, err_code, -- 1st because they are tested with "=", either order
      id)               -- last, for the range test (>) and for the ORDER BY
Since that handles all of the WHERE and the ORDER BY, the Optimizer can even handle the LIMIT 500, thereby stopping after 500 rows.
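In DDL form, that might look like this (a sketch; the index name is illustrative, the columns come from the question's schema):
ALTER TABLE table_a ADD INDEX idx_status_err_id (status, err_code, id);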
When you start an INDEX with the column(s) of the PRIMARY KEY (id), there is little reason for the Optimizer to pick the INDEX instead of simply reaching into the data. This is especially true since you are fetching columns that are not in the index (SELECT *).
Avoid "index hints". What helps today may hurt tomorrow (when the data distribution changes).
You mentioned a "row lock"; let's hear more about why you think you need one. If you are afraid that some other thread will change one of the rows this SELECT picked, that is better fixed by adding a suitable WHERE to the UPDATE -- to make sure the row still has that status and err_code.
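For instance, a hedged sketch of such a guarded UPDATE (the new status value 0 and the id are placeholders, not from the question):
UPDATE table_a
SET status = 0            -- hypothetical new state
WHERE id = 123457         -- the row the SELECT picked
  AND status = -1         -- re-check: row still in the expected state?
  AND err_code = 100001;  -- re-check: same error code
-- If another session already changed the row, this matches 0 rows.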
I am trying to optimize this MySQL query, and having little experience reading execution plans, I am having a hard time making sense of this one.
My question is: can you help me understand why the execution plan of the new query looks worse than that of the original query, even though the new query performs better in Prod?
SQL needed to reproduce this case is here
I have also kept the relevant table definitions at the end (table bill_range references bill via foreign key bill_id).
The original query takes 10 seconds to complete in Prod:
select *
from bill_range
where (4050 between low and high )
order by bill_id limit 1;
while the new query (where I force/suggest an index) takes 5 seconds to complete in Prod:
select *
from bill_range
use index ( bill_range_low_high_index)
where (4050 between low and high )
order by bill_id limit 1;
But the execution plan suggests the original query is better (this is the part where my understanding seems to be wrong):
Comparing the EXPLAIN output of the two queries (screenshots omitted):
Column "type" for the original query says index, while the new query says ALL.
Column "key" is bill_id (perhaps the index on the FK) for the original query, and NULL for the new query.
Column "rows" for the original query is 1, while for the new query it says 9.
So given all this information, wouldn't it imply that the new query is actually worse than the original query? And if that is true, why is the new query performing better? Or am I reading the execution plan wrong?
Table definitions
CREATE TABLE bill_range (
id int(11) NOT NULL AUTO_INCREMENT,
low varchar(255) NOT NULL,
high varchar(255) NOT NULL,
PRIMARY KEY (id),
bill_id int(11) NOT NULL,
FOREIGN KEY (bill_id) REFERENCES bill(id)
);
CREATE TABLE bill (
id int(11) NOT NULL AUTO_INCREMENT,
label varchar(10),
PRIMARY KEY (id)
);
create index bill_range_low_high_index on bill_range( low, high);
NOTE: The reason I am providing the definitions of both tables is that the original query decided to use the index created for the foreign key to the bill table.
Your index isn't quite optimal for your query. Let me explain if I may.
MySQL indexes use BTREE data structures. Those work well in indexed-sequential access mode (hence the name MyISAM for MySQL's first storage engine). They favor queries that jump to a particular place in an index and then run through it element by element. The typical example is this, with an index on col:
SELECT whatever FROM tbl WHERE col >= constant AND col <= constant2
That is a rewrite of WHERE col BETWEEN constant AND constant2.
Let's recast your query so this pattern is obvious, and so the columns you want are explicit.
select id, low, high, bill_id
from bill_range
where low <= 4050
and high >= 4050
order by bill_id limit 1;
An index on the high column allows a range scan starting with the first eligible row with high >= 4050. Then, we can go on to make it a compound index, including the bill_id and low columns.
CREATE INDEX high_billid_low ON bill_range (high, bill_id, low);
Because we want the lowest matching bill_id, we put that into the index next, then finally the low value. The query planner random-accesses the index to the first eligible row by high, then scans until it finds the first index entry that also meets the low criterion. And then it's done: that's the desired result. It's already ordered by bill_id, so it can stop; the ORDER BY comes from the index. The query can be satisfied entirely from the index -- a so-called covering index.
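One way to verify the covering behavior (a sketch; the exact plan depends on version and data) is to check that EXPLAIN reports "Using index" in the Extra column once high_billid_low exists:
EXPLAIN
SELECT id, low, high, bill_id
FROM bill_range
WHERE low <= 4050 AND high >= 4050
ORDER BY bill_id LIMIT 1;
-- If the optimizer picks high_billid_low, Extra should include "Using index".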
As to why your two queries performed differently: In the first, the query planner decided to scan your data in bill_id order looking for the first matching low/high pair. Possibly it decided that actually sorting a result set would likely be more expensive than scanning bill_ids in order. It looks to me like your second query did a table scan. Why that was faster, who knows?
Notice that this index would also work (note: descending index keys are honored only as of MySQL 8.0; earlier versions parse DESC but ignore it):
CREATE INDEX low_billid_high ON bill_range (low DESC, bill_id, high);
In InnoDB the table's PK id is implicitly part of every index, so there's no need to mention it in the compound index.
And, you can still write it the way you first wrote it; the query planner will figure out what you want.
Pro tip: Avoid SELECT * ... the * makes it harder to reason about the columns you need to retrieve.
I have the following query that runs forever, and I am looking to see if there is any way I can optimise it. It runs on a table that has 1,406,480 rows in total; the ID and End_Date columns have both been indexed, but the Filename and Ref_No columns have not.
My Query:
INSERT INTO UniqueIDs
(
SELECT
T1.ID
FROM
master_table T1
LEFT JOIN
master_table T2
ON
(
T1.Ref_No = T2.Ref_No
AND
T1.End_Date = T2.End_Date
AND
T1.Filename = T2.Filename
AND
T1.ID > T2.ID
)
WHERE T2.ID IS NULL
AND
LENGTH(T1.Ref_No) BETWEEN 5 AND 10
)
;
Explain results: (the EXPLAIN output was attached as a screenshot)
The reason for not indexing the Ref_No is that this is a text column and therefore I get a BLOB/TEXT error when I try and index this column.
I would really appreciate it if somebody could advise on how I can speed up this query.
Thanks
Thanks to Bill's advice about multi-column indexes, I have managed to make some headway. I first ran this:
CREATE INDEX I_DELETE_DUPS ON master_table(id, End_Date);
I then added a new column to hold the length of Ref_No, but had to adapt the query Bill mentioned, as my version of MySQL is 5.5 (no virtual columns). So I ran it in 3 steps:
ALTER TABLE master_table
ADD COLUMN Ref_No_length SMALLINT UNSIGNED;
UPDATE master_table SET Ref_No_length = LENGTH(Ref_No);
ALTER TABLE master_table ADD INDEX (Ref_No_length);
The last step was to change the WHERE clause for the length in my insert query to:
AND t1.Ref_No_length between 5 and 10;
I then ran the query, and within 15 minutes I had 280k IDs inserted into my UniqueIDs table. I then changed my insert script to see if I could include more lengths:
AND t1.Ref_No_length IN (5,6,7,8,9,10,13);
This was to also bring in the values where the length equals 13. The query took a lot longer, 2 hrs 50 mins to be precise, but the additional work of finding all rows with length 13 gave me an extra 700k unique IDs. I am still looking at ways to optimise the IN-clause query, but this is a big improvement over the original, which kept running for 24 hours. So thank you so much, Bill.
For the JOIN, you should have a multi-column index on (Ref_No, End_Date, Filename).
You can create a prefix index on a TEXT column like this:
ALTER TABLE master_table ADD INDEX (Ref_No(10));
But that won't help you search based on the LENGTH(). Indexing only helps search by value indexed, not by functions on the column.
In MySQL 5.7 or later, you can create a virtual column like this, with an index on the values calculated for the virtual column:
ALTER TABLE master_table
ADD COLUMN Ref_No_length SMALLINT UNSIGNED AS (LENGTH(Ref_No)),
ADD INDEX (Ref_No_length);
Then MySQL will recognize that your condition in your query is the same as the expression for the virtual column, and it will automatically use the index (exception: in my experience, this doesn't work for expressions using JSON functions).
But this is no guarantee that the index will help. If most of the rows match the condition of the length being between 5 and 10, the optimizer will not bother with the index. It may be more work to use the index than to do a table-scan.
the ID and End_Date have both been indexed.
You have PRIMARY KEY(id) and redundantly INDEX(id)? A PK is a unique key.
"have both been indexed" -- INDEX(a), INDEX(b) is not the same as INDEX(a,b) -- they have different uses. Read about "composite" indexes.
That query smells a lot like "group-wise" max done in a very slow way. (Alas, that may have come from the online docs.)
I have compiled the fastest ways to do that task here: http://mysql.rjweb.org/doc.php/groupwise_max (There are multiple versions, based on MySQL version and what issues your code can/cannot tolerate.)
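For reference, this particular task -- the smallest ID per (Ref_No, End_Date, Filename) group -- has a standard GROUP BY rewrite (a sketch, not necessarily the fastest variant from that page; names come from the question, and it assumes the grouping columns are NOT NULL, since a NULL in End_Date or Filename behaves differently in the self-join):
INSERT INTO UniqueIDs
SELECT MIN(T1.ID)
FROM master_table T1
WHERE LENGTH(T1.Ref_No) BETWEEN 5 AND 10
GROUP BY T1.Ref_No, T1.End_Date, T1.Filename;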
Please provide SHOW CREATE TABLE. One important question: Is id the PRIMARY KEY?
This composite index may be useful:
(Filename, End_Date, Ref_No, -- first, in any order
ID) -- last
This, as others have noted, is unlikely to be helped by any index, hence T1 will need a full-table-scan:
AND LENGTH(T1.Ref_No) BETWEEN 5 AND 10
If Ref_No cannot be bigger than 191 characters, change it to a VARCHAR so that it can be used in an index. Oh, did I ask for SHOW CREATE TABLE? If you can't make it VARCHAR, then my recommended composite index is
INDEX(Filename, End_Date, ID)
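In DDL form that would be something like the following (a sketch; the index name is illustrative, and if Filename is also a TEXT column it will need a prefix length, e.g. Filename(50)):
ALTER TABLE master_table ADD INDEX idx_file_date_id (Filename, End_Date, ID);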
Given this table in MySQL 5.6:
create table PlayerSession
(
id bigint auto_increment primary key,
lastActivity datetime not null,
player_id bigint null,
...
constraint FK4410E05525A98981
foreign key (player_id) references Player (id)
)
How can it possibly be that this query returns about 2000 rows instantly:
SELECT * FROM PlayerSession
WHERE player_id = ....
ORDER BY lastActivity DESC
but adding LIMIT 1 makes it take 4 seconds, even though all that should do is pick the first result?
Using EXPLAIN I found the only difference to be that without the limit, filesort is used. From what I gather, this should make it slower, not faster. The whole table contains about 2M rows.
Also, adding LIMIT 3 or anything higher than that, gives the same performance as no limit.
And yes, I have since created an index on (player_id, lastActivity), which, surprise surprise, makes it fast again. While that takes the immediate stress out of the situation (the server was rather overloaded), it doesn't really explain the mystery.
What specific version of 5.6? Please provide EXPLAIN FORMAT=JSON SELECT .... Please provide SHOW CREATE TABLE; we need to see the other indexes, plus datatypes.
INDEX(player_id, lastActivity) lets the query avoid "filesort".
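In DDL form (a sketch; the index name is illustrative):
ALTER TABLE PlayerSession ADD INDEX idx_player_activity (player_id, lastActivity);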
A possible reason for the strange timings could be caching. Run each query twice to avoid that hiccup.
The following query takes 30 seconds to finish with the ORDER BY. Without the ORDER BY it finishes in 0.0035 seconds. I already have an index on the field "name". The field "id" is the primary key. I have 400,000 records in this table. Please help: what is wrong with the query when using ORDER BY?
SELECT *
FROM users
WHERE name IS NOT NULL
AND name != ''
AND ( status IS NULL OR status = '0' )
order by id desc
limit 50
Update (solution at the end):
Hi all, thanks for the help. Below are some updates, as you asked.
Here is the output of EXPLAIN:
id  select_type  table  type   possible_keys  key   key_len  ref   rows    Extra
1   SIMPLE       users  range  name           name  258      NULL  226009  Using where; Using filesort
Yes, there are around 20 fields in this table.
Below are the indexes that I have:
Keyname  Type     Cardinality  Field
PRIMARY  PRIMARY  418099       id
name     INDEX    411049       name
Solution:
It turns out the fields with NULL values were the reason. After making the 2 fields in the WHERE condition NOT NULL, it takes just .000x seconds. But the strange thing is, it increases to 29 seconds if I create an index on (status, name, id DESC) or (status, name, id).
You should definitely have a compound index: a single one containing all the fields you need, since MySQL generally cannot use more than one index per table in a single query.
An OR clause is not really index-friendly, so if you can, I recommend making status NOT NULL. I assume NULL does not mean anything different from zero. This will help a lot in actually using the index.
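A sketch of that change, assuming status is (or can become) a TINYINT -- adjust the type to what the table actually uses, and run the UPDATE first so the ALTER does not fail under strict SQL mode:
UPDATE users SET status = 0 WHERE status IS NULL;
ALTER TABLE users MODIFY status TINYINT NOT NULL DEFAULT 0;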
I do not know how well name != '' is optimized. Semantically equivalent is name > '' (meaning it sorts after the empty string); maybe this also saves you some CPU cycles.
Then you have to decide the order in which your columns appear. A rule of thumb is cardinality: the number of distinct values a field can have. Something like this:
ALTER TABLE users ADD INDEX order1 (status, name, id DESC);
Edit
You don't need to delete indexes. MySQL will choose the best one very quickly and ignore the rest. They only cost disk space and some CPU cycles on UPDATEs. But if you do not need them under any circumstances, you can of course remove them.
The long runtime is because access to your table is slow. This is probably caused by dynamic-length fields such as TEXT or BLOB. If you do not ALWAYS need these, you can move them to a twin auxiliary table like:
users (id, name, status, group_id)
profile (user_id, birthdate, gender, motto, cv)
This way the essential system operations can be done with a minimal set of user fields, and all the other stuff, which is really content associated with the user, only has to be touched when it is actually needed.
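In DDL form, the split might look like this (a sketch; the profile column types are assumptions for illustration):
CREATE TABLE profile (
  user_id INT NOT NULL PRIMARY KEY,
  birthdate DATE,
  gender CHAR(1),
  motto VARCHAR(255),
  cv TEXT,
  FOREIGN KEY (user_id) REFERENCES users(id)
);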
Edit2
You can hint MySQL which index to use by specifying it (or several) like this:
SELECT id, name FROM users USE INDEX (order1) WHERE name != '' and status = '0' ORDER BY id DESC
Without an EXPLAIN it is hard to say, but most probably you also need an index on the "status" column. Slowness on a single-table query almost always comes down to the query doing a full table scan as opposed to using an index.
Try doing:
explain SELECT *
FROM users
WHERE name IS NOT NULL
AND name != ''
AND ( status IS NULL OR status = '0' )
order by id desc
limit 50
and post the output. You'll probably see it is doing a full table scan, because it doesn't have an index for status. Here's some documentation on using "explain". If you want more background, this is a nice article on the kind of problem you are having.
If I have a primary key of, say, id, and I do a simple query on the key such as
SELECT id FROM myTable WHERE id = X
Will it find one row and then stop looking, since it is a primary key, or would it be better to tell MySQL to limit its select by using LIMIT 1? For instance:
SELECT id FROM myTable WHERE id = X LIMIT 1
Does using “LIMIT 1” speed up a query on a primary key?
No. It's already as fast as can be without LIMIT 1. LIMIT 1 is effectively implied anyway.
Will it find one row and then stop looking as it is a primary key
Yes.
No table scan should be necessary at all here: it's a key-based lookup. The matching row is found and that's the end of the procedure.
There is no need to worry about this sort of "optimization".
Only one row will be fetched -- per the unique index contract -- and the database is able to find all the (one) matching rows very quickly. It can do this because the underlying structures that back the index (or primary key) support fast seeking by value. There is no table scan involved. (Generally a variant of a B-tree is used, but it might be hash-based, etc. I suppose a smart query optimizer might also be able to pass down additional hints based on the unique constraint in effect, but I don't know enough about this.)
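You can see this in the plan (a sketch; myTable and the value are placeholders):
EXPLAIN SELECT id FROM myTable WHERE id = 42;
-- type: const, rows: 1 -- a single primary-key lookup, no scan.
-- Adding LIMIT 1 should produce the same plan.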