MySQL query runs very slow when using ORDER BY - mysql
The following query takes 30 seconds to finish when it has an ORDER BY clause. Without the ORDER BY it finishes in 0.0035 seconds. I already have an index on the field "name". The field "id" is the primary key. I have 400,000 records in this table. Please help: what is wrong with the query when using ORDER BY?
SELECT *
FROM users
WHERE name IS NOT NULL
AND name != ''
AND ( status IS NULL OR status = '0' )
order by id desc
limit 50
Update (solution at the end):
Hi all, thanks for the help. Here are the updates you asked for.
Below is the output of EXPLAIN:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE users range name name 258 NULL 226009 Using where; Using filesort
Yes, there are around 20 fields in this table.
Below are the indexes that I have:
Keyname Type Cardinality Field
PRIMARY PRIMARY 418099 id
name INDEX 411049 name
Solution:
It turns out the fields with NULL values are the reason. After changing the two fields used in the WHERE condition to NOT NULL, the query takes just .000x seconds. The strange thing is that it goes back up to 29 seconds if I create an index on (status, name, id DESC) or (status, name, id).
You should definitely have a compound index: a single one containing all the fields you need, since a DBMS cannot really use more than one index on a single query.
An OR clause is not really index-friendly, so if you can, I recommend making status NOT NULL. I assume NULL does not mean anything different from '0' here. This will help a lot in actually using the index.
I do not know how well name != '' is optimized. A semantically equivalent condition would be name > '' (meaning it sorts after the empty string); maybe this also saves you some CPU cycles.
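Putting those two suggestions together, a minimal sketch might look like this (assuming a NULL status really is interchangeable with '0'; the VARCHAR(1) type is just a placeholder, match your actual column definition):

UPDATE users SET status = '0' WHERE status IS NULL;  -- fold existing NULLs into '0' first
ALTER TABLE users MODIFY status VARCHAR(1) NOT NULL DEFAULT '0';  -- hypothetical type

SELECT *
FROM users
WHERE name > ''       -- replaces name IS NOT NULL AND name != ''
AND status = '0'      -- the OR against NULL is no longer needed
ORDER BY id DESC
LIMIT 50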
Then you have to decide the order in which your columns appear in the index. A rule of thumb could be cardinality: the number of distinct values a field can have.
Something like this:
ALTER TABLE users ADD INDEX order1 (status, name, id DESC);
Edit
You don't need to delete indexes. MySQL will choose the best one very quickly and ignore the rest. They only cost disk space and some CPU cycles on UPDATEs. But if you do not need them under any circumstances, you can of course remove them.
The long runtime is because access to your table rows is slow. This is probably caused by dynamic-length fields such as TEXT or BLOB. If you do not ALWAYS need these, you can move them to a twin auxiliary table like:
users (id, name, status, group_id)
profile (user_id, birthdate, gender, motto, cv)
This way the essential system operations can be done with a restricted set of user columns, and everything else, which is really content associated with the user, only has to be read when it is actually needed.
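A minimal sketch of that split (column types, and any names beyond the ones listed above, are illustrative guesses, not the real schema):

CREATE TABLE users (
  id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  name VARCHAR(100) NOT NULL,
  status CHAR(1) NOT NULL DEFAULT '0',
  group_id INT UNSIGNED NOT NULL
);

CREATE TABLE profile (
  user_id INT UNSIGNED NOT NULL PRIMARY KEY,  -- same value as users.id
  birthdate DATE NULL,
  gender CHAR(1) NULL,
  motto VARCHAR(255) NULL,
  cv TEXT NULL  -- the large dynamic-length data lives here
);

Queries that filter and sort then touch only the narrow users rows; the wide profile row is joined in only when those extra columns are actually displayed.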
Edit2
You can hint MySQL which index to use by naming it (or several of them) like:
SELECT id, name FROM users USE INDEX (order1) WHERE name != '' and status = '0' ORDER BY id DESC
Without having an EXPLAIN it is hard to say, but most probably you also need an index on the "status" column. Slowness on a single-table query almost always comes down to the query doing a full table scan as opposed to using an index.
Try running:
explain SELECT *
FROM users
WHERE name IS NOT NULL
AND name != ''
AND ( status IS NULL OR status = '0' )
order by id desc
limit 50
and post the output. You'll probably see it is doing a full table scan, because it doesn't have an index for status. Here's some documentation on using "explain". If you want more background, this is a nice article on the kind of problem you are having.
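If that is what the EXPLAIN shows, one possible remedy is an index that covers the status filter together with the ORDER BY column. This is only a sketch: the index name and column choice are guesses, and the OR status IS NULL condition discussed above will still limit how well it can be used:

ALTER TABLE users ADD INDEX idx_status_id (status, id);
-- status first because it is compared for equality,
-- id second so ORDER BY id DESC ... LIMIT 50 can read the index in order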
Related
Order by fields from different tables creates full table scan. Should I combine data into one table?
This is a theoretical question. Sorry, but I don't have working table data to show; I'll try to improvise with a theoretical example. Using MySQL/MariaDB, with indexes on all relevant fields.

I have a system whose historical design has a ProductType table, something like:

ID=1, Description="Milk"
ID=2, Description="Bread"
ID=3, Description="Salt"
ID=4, Description="Sugar"

and so on. Some features in the system rely on the ProductType ID, and the Description is also used in different places, such as for defining different properties of the product type.

There is also a Product table, with fields such as: ID, ProductTypeID, Name. The Product Name does not contain the product type description, so a "Milk bottle 1l" has an entry such as ID=101, ProductTypeID=1, Name="bottle 1l", and "Sugar pack 1kg" is ID=102, ProductTypeID=4, Name="pack 1kg". You get the idea. The system combines the ProductType Description and the Product Name to show full product names to the users. This creates systematic naming for all the products, so there is no way to define a product with a name such as "1l bottle of milk". I know that in English that might be hard to swallow, but it works great in my local language.

Years passed and the database grew to millions of products. Since a full-text index must have all the searched data in one table, I had to store the ProductType Description inside the Product table, in a string field I added that holds different keywords related to the product, so the full-text search can find anything related to the product (type, name, barcode, SKU, etc.).

Now I'm trying to solve the full table scans, and it makes me think the current design might not be optimal and I'll have to redesign and store the full product name (type + name) in the same table. In order to show the proper order of the products there is an ORDER BY TypeDescription ASC, ProductName ASC after the ProductType table is joined to the Product SELECT queries. From my research I see that the database can't use indexes when the ordering is done on fields from different tables, so it does a full table scan to get to the right entries. During pagination there is an ORDER plus LIMIT 50000,100 in the query, which takes a lot of time. There are sections with lots of products, so that ordering and limiting causes very long full table scans.

How would you handle this situation? Change the design and store all query-related data in the Product table? That feels like duplication and not a natural solution. Or maybe there's another way to solve it? Will an index on a VARCHAR column (product name) be efficient for the ORDER BY speed, or will the database still do a full table scan?

My first question here; I couldn't find answers on similar cases. Thanks! I've tried playing with the queries to see if ordering by a VARCHAR field that has an index will work, but EXPLAIN SELECT still shows that the query didn't use the index and did "Using where" instead. :(

UPDATE

Trying to add some more data... The situation is a bit more complicated, and after digging a bit more it looks like the initial question was not in the right direction. I removed the product type from the queries and still have the slow query. I feel like it's a chicken-and-egg situation...
I have a table that maps product IDs to section IDs:

CREATE TABLE `Product2Section` (
  `SectionId` int(10) unsigned NOT NULL,
  `ProductId` int(10) unsigned NOT NULL,
  KEY `idx_ProductId` (`ProductId`),
  KEY `idx_SectionId` (`SectionId`),
  KEY `idx_ProductId_SectionId` (`ProductId`,`SectionId`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 ROW_FORMAT=DYNAMIC

The query (after stripping all fields not relevant to the question):

SELECT DISTINCT DRIVER.ProductId AS ID, p.*
FROM Product2Section AS DRIVER
LEFT JOIN Product p ON (p.ID = DRIVER.ProductId)
WHERE DRIVER.SectionId IN( 544,545,546,548,550,551,552,553,554,555,556,557,558,559,560,561,562,563,564,566,567,568,570,571,572,573,574,575,1337,1343,1353,1358,1369,1385,1956,1957,1964,1973,1979,1980,1987,1988,1994,1999,2016,2020,576,577,578,579,580,582,586,587,589,590,591,593,596,597,598,604,605,606,608,609,612,613,614,615,617,619,620,621,622,624,625,626,627,628,629,630,632,634,635,637,639,640,642,643,644,645,647,648,651,656,659,660,661,662,663,665,667,669,670,672,674,675,677,683,684,689,690,691,695,726,728,729,730,731,734,736,741,742,743,745,746,749,752,758,761,762,763,764,768,769,771,772,773,774,775,776,777 )
ORDER BY p.ProductName ASC
LIMIT 500900,100;

EXPLAIN shows:

id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE DRIVER index idx_SectionId idx_ProductId_SectionId 8 NULL 589966 Using where; Using index; Using temporary; Using filesort
1 SIMPLE p eq_ref PRIMARY,idx_ID PRIMARY 4 4project.DRIVER.ProductId 1 Using where

I've tried to select from the Product table and join Product2Section in order to filter the results, but I get the same result:

SELECT DISTINCT p.ID, p.ProductName
FROM Product p
LEFT JOIN Product2Section p2s ON (p.ID=p2s.ProductId)
WHERE p2s.SectionId IN( 544,545,546,548,550,551,552,553,554,555,556,557,558,559,560,561,562,563,564,566,567,568,570,571,572,573,574,575,1337,1343,1353,1358,1369,1385,1956,1957,1964,1973,1979,1980,1987,1988,1994,1999,2016,2020,576,577,578,579,580,582,586,587,589,590,591,593,596,597,598,604,605,606,608,609,612,613,614,615,617,619,620,621,622,624,625,626,627,628,629,630,632,634,635,637,639,640,642,643,644,645,647,648,651,656,659,660,661,662,663,665,667,669,670,672,674,675,677,683,684,689,690,691,695,726,728,729,730,731,734,736,741,742,743,745,746,749,752,758,761,762,763,764,768,769,771,772,773,774,775,776,777 )
ORDER BY p.ProductName ASC
LIMIT 500900, 100;

EXPLAIN:

id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE p2s index idx_ProductId,idx_SectionId,idx_ProductId_SectionId idx_ProductId_SectionId 8 NULL 589966 Using where; Using index; Using temporary; Using filesort
1 SIMPLE p eq_ref PRIMARY,idx_ID PRIMARY 4 4project.p2s.ProductId 1 Using where

I don't see a way out of this situation.
The two single-column indices on Product2Section serve no purpose. You should change your junction table to:

CREATE TABLE `Product2Section` (
  `SectionId` int unsigned NOT NULL,
  `ProductId` int unsigned NOT NULL,
  PRIMARY KEY (`SectionId`, `ProductId`),
  KEY `idx_ProductId_SectionId` (`ProductId`, `SectionId`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

You said: "There are other queries in the system that probably use the single field indexes." The single-column indices cannot be used for anything that the two composite indices cannot be used for. They are just wasting space and cause unnecessary overhead on inserts and for the optimizer.

Setting one of the composite indices as PRIMARY stops InnoDB from having to create its own internal rowid, which just wastes space. It also adds the uniqueness constraint which is currently missing from your table. From the docs:

Accessing a row through the clustered index is fast because the index search leads directly to the page that contains the row data. If a table is large, the clustered index architecture often saves a disk I/O operation when compared to storage organizations that store row data using a different page from the index record.

This is not significant for a "simple" junction table, as both columns should be stored in both indices, therefore no further read is required.

You said: "that didn't really bother me since there was no real performance hit." You may not see the difference when running an individual query with no contention, but the difference in a highly contended production environment can be huge, due to the amount of effort required.

Do you really need to accommodate 4,294,967,295 (int unsigned) sections? Perhaps the 65,535 provided by smallint unsigned would be enough?

You said: "Might change it in the future. Don't think it will change the performance somehow." Changing SectionId to smallint will reduce each index entry from 8 to 6 bytes. That's a 25% reduction in size. Smaller is faster.

Why are you using LEFT JOIN? The fact that you are happy to reverse the order of the tables in the query suggests it should be an INNER JOIN.

Do you have your buffer pool configured appropriately, or is it set to defaults? Please run ANALYZE TABLE Product2Section; and then provide the output from:

SELECT TABLE_ROWS, AVG_ROW_LENGTH, DATA_LENGTH + INDEX_LENGTH
FROM information_schema.TABLES
WHERE TABLE_NAME = 'Product2Section';

And:

SELECT ROUND(SUM(DATA_LENGTH + INDEX_LENGTH)/POW(1024, 3), 2)
FROM information_schema.TABLES
WHERE TABLE_SCHEMA = 'your_database_name';

And:

SHOW VARIABLES LIKE 'innodb_buffer%';
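For the read query itself, here is a sketch of how it might look after those changes, assuming the INNER JOIN suggestion is adopted (an illustration only, not an exact rewrite; the section list is elided):

SELECT DISTINCT p.ID, p.ProductName
FROM Product2Section p2s
INNER JOIN Product p ON p.ID = p2s.ProductId
WHERE p2s.SectionId IN (544, 545, 546 /* ... rest of the section list ... */)
ORDER BY p.ProductName ASC
LIMIT 500900, 100;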
mysql: index for order by query
Here is the table_a schema I have:

Field        Type
id (PRIMARY) bigint
status       tinyint
err_code     bigint
...          ...

The SQL I want to execute is:

select * from table_a
where id > 123456
and status = -1
and err_code = 100001
order by id asc
LIMIT 500

I'd like to run this query in real time. My question is what kind of index I should use here. I already created a composite index -- idx_id_status_err_code -- but it seems that MySQL does not choose it. There are two possible keys reported by the EXPLAIN statement -- PRIMARY and idx_id_status_err_code -- but MySQL uses the primary key instead of idx_id_status_err_code.

Another thing: there are some concurrent write operations, so I add a row lock (FOR UPDATE, not share mode) to the target rows. I'm not sure if these write locks will affect the SQL I mentioned above. Any help is appreciated.
where id > 123456 and status = -1 and err_code = 100001 order by id

needs

INDEX(status, err_code,  -- 1st, because they are tested with "=", in either order
      id)                -- for the range test (>) and for the ORDER BY

Since that handles all of the WHERE and ORDER BY, the Optimizer can even handle the LIMIT 500, thereby stopping after 500 rows.

When you start an INDEX with the column(s) of the PRIMARY KEY (id), there is little reason for the Optimizer to pick the INDEX instead of simply reaching into the data. This is especially true since you are fetching columns that are not in the index (SELECT *).

Avoid "index hints". What helps today may hurt tomorrow (when the data distribution changes).

You mentioned a "row lock"; let's hear more about why you think you need one. If you are afraid that some other thread will change one of the rows this SELECT picked, that is better fixed by adding a suitable WHERE to the UPDATE -- to make sure the row still has that status and err_code.
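As an illustration of that last point, a sketch with made-up values (assuming the writer flips status once a row has been processed):

UPDATE table_a
SET status = 0        -- hypothetical "processed" value
WHERE id = 123457
AND status = -1
AND err_code = 100001;
-- if another thread already changed the row, this matches zero rows
-- and nothing is silently overwritten, with no SELECT ... FOR UPDATE needed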
How can I speed up a slow SUM + ORDER BY + LIMIT query in MySQL?
I have a simple key-value table with two fields, created like so:

CREATE TABLE `mytable` (
  `key` varchar(255) NOT NULL,
  `value` double NOT NULL,
  KEY `MYKEY` (`key`)
);

The keys are not unique. The table contains over one million records. I need a query that will sum up all the values for a given key and return the top 10 keys. Here's my attempt:

SELECT t.key, SUM(t.value) value
FROM mytable t
GROUP BY t.key
ORDER BY value DESC
LIMIT 0, 10;

But this is very slow. The thing is, without the GROUP BY and SUM it's pretty fast, and without the ORDER BY it's very fast, but for some reason the combination of the two makes it very, very slow. Can anyone explain why this is so, and how it can be sped up?

There is no index on value. I tried creating one but it didn't help. EXPLAIN EXTENDED produces the following in Workbench:

id select_type table type possible_keys key key_len ref rows filtered Extra
1 SIMPLE t index NULL MYKEY 257 NULL 1340532 100.00 "Using temporary; Using filesort"

There are about 400K unique keys in the table. The query takes over 3 minutes to run. I don't know exactly how long because I stopped it after 3 minutes. However, if I remove the index on key, it runs in 30 seconds! Does anyone have any idea why?
The only way to really speed this up, as far as I can see, is to create a separate table with the unique keys in it and maintain the total values there. Then you will be able to index the totals to retrieve the top ten quickly, and the calculation will already be done. As long as the table is not updated in too many places, this shouldn't be a major problem. The fundamental issue with this type of query is that the GROUP BY requires reading in one order while the ORDER BY requires sorting into a different order.
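A minimal sketch of that summary-table idea (the table and trigger names are made up; the totals could just as well be maintained from application code):

CREATE TABLE `mytable_totals` (
  `key` varchar(255) NOT NULL PRIMARY KEY,
  `total` double NOT NULL,
  KEY `idx_total` (`total`)  -- lets the top-10 query read pre-sorted totals
);

-- keep the running totals in step with inserts into mytable
CREATE TRIGGER trg_mytable_totals AFTER INSERT ON mytable
FOR EACH ROW
  INSERT INTO mytable_totals (`key`, `total`)
  VALUES (NEW.`key`, NEW.`value`)
  ON DUPLICATE KEY UPDATE `total` = `total` + NEW.`value`;

-- the top-10 query then becomes a short index scan
SELECT `key`, `total` FROM mytable_totals ORDER BY `total` DESC LIMIT 10;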
mysql: very simple SELECT id ORDER BY LIMIT will not use INDEX as expected (?!)
I have a simple table with about 3 million records. I made the necessary indexes, and I also force the PRIMARY index, but it still doesn't work. It scans nearly all 3 million rows instead of using the index to execute this query (record_id is an INT auto-increment):

EXPLAIN SELECT record_id
FROM myrecords
FORCE INDEX ( PRIMARY )
ORDER BY record_id ASC
LIMIT 2955900 , 300

id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE myrecords index NULL PRIMARY 4 NULL 2956200 Using index

The index is:

Keyname Type Unique Packed Column Cardinality Collation Null
PRIMARY BTREE Yes No record_id 2956742 A No

I would like to know why this forced index is not being used the right way. Without forcing the 'PRIMARY' index I tried both ASC and DESC; the result is the same. The table has been repaired, optimized and analyzed. No luck; the query needs over a minute to execute!

WHAT I EXPECTED: the query should process only 300 rows, since that column is indexed, not nearly all 3 million of them as the EXPLAIN output above shows.
Index lookups are by value, not by position. An index can search for the value 2955900, but you're not asking for that. You're asking for the query to start at an offset of the 2955900th row in the table.

The optimizer can't assume that all primary key values are consecutive, so it's pretty likely that the 2955900th row has a value much higher than that. Even if the primary key values are consecutive, you might have a WHERE condition that only matches, for example, 45% of the rows, in which case the id value on the 2955900th row would be way past the id value 2955900.

In other words, an index lookup of the id value 2955900 will not deliver the 2955900th row. So MySQL can't use the index for a LIMIT's offset. It must scan the rows to count them until it reaches offset+limit rows.

MySQL does have optimizations related to LIMIT, but they are more about stopping a table scan once it has reached the number of rows to return. The optimizer may still report in an EXPLAIN plan that it expects it might have to scan the whole table.

A frequent misunderstanding about FORCE INDEX is that it forces the use of an index. :-) In fact, if the query can't use an index (or if the available indexes don't have any benefit for this query), FORCE INDEX has no effect.

Re your comment: pagination is a frequent bane of data-driven web applications. Despite how common this feature is, it's not easy to optimize. Here are a few tips:

Why are you querying with offset 2955900? Do you really expect users to sift through that many pages? Most users give up after a few pages (exactly how many depends on the type of application and the data).

Reduce the number of queries. Your pagination function could fetch the first 5-10 pages, even if it only shows the first page to the user. Cache the other pages, with the assumption that the user will advance through a few pages. Only if they advance past the cached set of pages does your app have to do another query. You could even cache all 10 pages in Javascript in the client's browser so clicking "Next" is instantaneous for them (at least for those first few pages).

Don't put a "Last" button on any user interface, because people will click it out of curiosity. Notice Google has a "Next" button but not a "Last" button. So the UI itself discourages people from running inefficient queries with high offsets.

If the user is advancing one page at a time, use the highest id value returned by the previous page in the WHERE clause of the next page's query. I.e. the following does use the index, even with no FORCE INDEX hint:

SELECT * FROM thistable WHERE id > 544 LIMIT 20
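For this particular table, a keyset-pagination version might look like the sketch below; the ORDER BY is added so "the highest id from the previous page" stays well defined, and 2955900 is just a placeholder for whatever record_id the previous page ended on:

-- first page
SELECT record_id FROM myrecords ORDER BY record_id ASC LIMIT 300;

-- subsequent pages: seek past the last id seen instead of using a large OFFSET
SELECT record_id
FROM myrecords
WHERE record_id > 2955900  -- last record_id returned by the previous page
ORDER BY record_id ASC
LIMIT 300;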
mysql: how to optimize this when condition search + more?
I have been trying to debug my MySQL search speeds and they are horrible (a couple of seconds to 2 minutes). This is example code for the search; search complexity can become really complicated depending on the user requirements.

SELECT Device,Input,`W(m)`,`L(m)`,VDD,`Temp(C)`,Param,Value
FROM `TABLE_NAME`
WHERE (`Temp(C)`='110' OR `Temp(C)`='125' )
AND (Device='ngear' )
AND (Input='a' OR Input='b' OR Input='a' OR Input='b' OR Input='c' OR Input='b' )
AND (Param='speed' OR Param='leakage' )

Please note this table has no indices and no primary key. This data isn't really relational, as it contains statistical simulation data that is stored in MySQL. The table has about 1 million rows of data. Should I start indexing every column? Any thoughts would be appreciated.

id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE TABLE_NAME ALL NULL NULL NULL NULL 12278640 Using where
You need a composite index, and probably some redefinition of your data types. Here is some general advice:

alter table MQUAL__TSMC_20G_v0d1_Robert_02122011
  add index ( `Temp(C)`, Device, Input, Param);

And your query can be changed to:

where `Temp(C)` in(110, 125)
  AND Device='ngear'
  AND Input in('a','b','c')
  AND Param in('speed', 'leakage');

Given the lack of information about your schema, the data type should be:

`Temp(C)` <-- int
My suggestion is to add an index on the WHERE-clause columns that narrow the data the most. As a guess, temp should be the first column in the index. You want a compound index as well. The general rule is that the index should start with the column that reduces the result set the most and end with the least selective (most common) one.
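One way to check which column actually narrows the data the most before picking the column order is a rough distinct-value count (a sketch; run it against your real table name):

-- higher distinct counts usually make a better leading column for equality/IN filters
SELECT COUNT(DISTINCT `Temp(C)`) AS temp_values,
       COUNT(DISTINCT Device)    AS device_values,
       COUNT(DISTINCT Input)     AS input_values,
       COUNT(DISTINCT Param)     AS param_values
FROM `TABLE_NAME`;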