I had issues with my query that took 17 seconds to execute (350k rows):
SELECT idgps_unit, MAX(dt)
FROM gps_unit_location
GROUP BY 1
EXPLAIN:
1 SIMPLE gps_unit_location index fk_gps2 5 422633
After playing with it, I came up with this solution, which takes 1 second:
SELECT idgps_unit, MAX(dt) FROM (
    SELECT idgps_unit, dt
    FROM gps_unit_location
) d1
GROUP BY 1
EXPLAIN:
1 PRIMARY <derived2> ALL 423344 Using temporary; Using filesort
2 DERIVED gps_unit_location index gps_unit_location_dt_gpsid 10 422617 Using index
And now I am confused: why is query #2 fast, while query #1, which appears to be the same query written more efficiently, is slow?
Indexes: Index1: DT; Index2: idgps_unit; Index3: (idgps_unit, DT).
The execution times are consistent: query #1 always takes 17-19 seconds, while query #2 takes under 1 second.
I am using a GoDaddy VPS (Windows Server 2008, Economy).
Table example:
id | idgps_unit | dt | location
1 | 1 | 2012-01-01 | 1
2 | 1 | 2012-01-02 | 2
3 | 2 | 2012-01-03 | 3
4 | 2 | 2012-01-04 | 4
5 | 3 | 2012-01-05 | 5
First, I'm assuming gps_unit_location is really a table and not a view. Second, I'm also assuming that you have run both queries multiple times, so caching is not the explanation. (Caching would be that you run the first query, it loads the table into the page cache, and the second reads from memory rather than disk.)
Do you have an index on gps_unit_location(idgps_unit)? Are the records very wide? If the answers to these questions are "yes", then the following may be happening.
If so, you might have a curious problem with indexing. You would think that an index would speed up such a query. What it does, though, is look up the values of idgps_unit in order. If the index does not contain the date, then the database needs to fetch the data from each page. If the table does not fit into memory, then this will often result in a cache miss, that is, time spent loading the page.
By contrast, if the table is wide and the engine does a full table scan, then it can zip through the table and extract the two fields of interest. It puts them on the side. If they are small relative to the full table, then sorting them might take very little time. Voila, the query finishes faster.
My guess would be that the second structure removes the use of an index.
By the way, you can fix this by changing the index to gps_unit_location(idgps_unit, dt). By including the field in the index, the query does not have to load the data.
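A minimal sketch of that change (the index name idx_unit_dt is made up, and the question's index list suggests a similar composite index may already exist):

-- Composite index covering both the grouping column and the MAX() column,
-- so the GROUP BY can be answered from the index alone:
ALTER TABLE gps_unit_location
    ADD INDEX idx_unit_dt (idgps_unit, dt);

-- The original query, unchanged, should then show "Using index" in EXPLAIN:
SELECT idgps_unit, MAX(dt)
FROM gps_unit_location
GROUP BY idgps_unit;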
I would say your indexes are not set up properly. Your second query is kind of an inner query, which is effectively creating its own inner index group, if that makes sense!
Related
I am running a query to retrieve some game levels from a MySQL database. The query itself takes around 0.00025 seconds to execute on a database that contains 40 level strings. I thought that was satisfactory, until I got a message from the website host telling me to optimise the below-mentioned query or the script will be removed, since it is putting a lot of strain on their servers.
I tried optimising by using EXPLAIN and EXPLAIN EXTENDED and adjusting the columns accordingly (adding indexes), but I always get the same performance. I also noticed that MySQL didn't use indexes where they were available but instead did a full-table scan.
Results from EXPLAIN EXTENDED:
table id select_type type possible_keys key key_len ref rows Extra
users 1 SIMPLE ALL PRIMARY,id NULL NULL NULL 7 Using temporary; Using filesort
AllTime 1 SIMPLE ref PRIMARY,userid PRIMARY 4 Test.users.id 1
query:
SELECT users.nickname, AllTime.userid, AllTime.id, AllTime.levelname, AllTime.levelstr
FROM AllTime
INNER JOIN users
ON AllTime.userid=users.id
ORDER BY AllTime.id DESC
LIMIT ($value_from_php),20;
The tables:
users
| id(int) | nickname(varchar) |
| (Primary, Auto_increment) | |
|---------------------------|-------------------|
| 1 | username1 |
| 2 | username2 |
| 3 | username3 |
| ... | ... |
and AllTime
| id(int) | userid(int) | levelname(varchar) | levelstr(text) |
| (Primary, Auto_increment) | (index) | | |
|---------------------------|-------------|--------------------|----------------|
| 1 | 2 | levelname1 | levelstr1 |
| 2 | 2 | levelname2 | levelstr2 |
| 3 | 3 | levelname3 | levelstr3 |
| 4 | 1 | levelname4 | levelstr4 |
| 5 | 1 | levelname5 | levelstr5 |
| 6 | 1 | levelname6 | levelstr6 |
| 7 | 2 | levelname7 | levelstr7 |
Is there a way to optimize this query or would I be better off by calling two consecutive queries from php just to avoid the warning?
I am just learning MySQL, so please take that information into account when replying, thank you :)
I'm assuming you're using InnoDB.
For an INNER JOIN, MySQL typically starts with the table with the fewest rows, in this case users. However, since you just want the latest 20 AllTime records joined with the corresponding user records, you actually should start with AllTime since with the LIMIT, it will be the smaller data set.
Use STRAIGHT_JOIN to force the join order:
SELECT users.nickname, AllTime.userid, AllTime.id, AllTime.levelname,
AllTime.levelstr
FROM AllTime
STRAIGHT_JOIN users
ON users.id = AllTime.userid
ORDER BY AllTime.id DESC
LIMIT ($value_from_php),20;
It should be able to use the primary key on the AllTime table and follow it in descending order. It'll grab all the data on the same pages as it goes.
It should also use the primary key on the users table to grab the id and nickname. If there are more than just two columns, you might add a multi-column covering index on (id, nickname) to improve the speed.
If you can, convert the levelstr column to VARCHAR so that the data is stored on the same page as the rest of the row; otherwise, MySQL has to go fetch the text columns separately. This assumes that your rows stay under the roughly 8000-byte row limit for InnoDB. There is no way to avoid the "Using temporary" unless you get rid of the text column.
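A sketch of both suggestions (the index name and the VARCHAR length are assumptions; check your longest levelstr before converting):

-- Covering index so the join can read id and nickname from the index alone
ALTER TABLE users
    ADD INDEX idx_id_nickname (id, nickname);

-- Store the level data inline with the row instead of as an off-page TEXT
-- column; only safe if every existing value fits in the chosen length
ALTER TABLE AllTime
    MODIFY levelstr VARCHAR(2048);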
Most likely, your host has identified this query by using the slow query log, which can identify all queries that don't use an index, or they may have red flagged it because of the Using temporary.
It doesn't look like the query has a problem. Most likely the issue is in the application code, so review that first. Some things to check:
- Check the MySQL query execution plan; possibly you are missing an index.
- Make sure you cache the data in the application and the database (FYI, sometimes you can load the whole database into application memory).
- Make sure you use a connection pool.
- Create a view (a very small chance of improvement).
- Try removing the ORDER BY clause (again, a very small chance it will improve performance).
> The query itself takes around 0.00025 seconds ... I got a message from the website host telling me to optimise the below-mentioned query, or the script will be removed since it is pushing a lot of strain onto their servers.
Ask the website host for more details about why this query has been flagged for attention. A query that trivial is not going to cause strain on anything unless it is being called very frequently.
Find out how many times that query is being run. I will bet you a nickel that your site is getting hammered by a bot and being executed hundreds or thousands of times per minute. If so, then that's your real problem.
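If you have the privileges (on shared hosting you may not, in which case ask the host for their numbers), the general query log can confirm how often the query runs. A sketch; enable it only briefly, since logging adds overhead:

-- Send the general log to a table and switch it on
SET GLOBAL log_output = 'TABLE';
SET GLOBAL general_log = 'ON';

-- ...let normal traffic flow for a minute or two, then count executions:
SELECT COUNT(*) AS executions
FROM mysql.general_log
WHERE argument LIKE 'SELECT users.nickname%';

SET GLOBAL general_log = 'OFF';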
LIMIT ($value_from_php),20; -- if $value_from_php is huge, then the query is slow. This is because all the 'old' rows need to be scanned before getting to the 20 you need.
By "remembering where you left off" you can make every page equally fast. See this for further details: http://mysql.rjweb.org/doc.php/pagination
Is it possible to always satisfy an expected-number-of-rows constraint by filling the remainder of a SQL result set with previously returned data, keeping the data in insertion order? Using MySQL?
Edit
In a web store, I always want to show n elements. I update the shown elements every w seconds, and I want to loop indefinitely.
For example, using table myTable:
+----+
| id |
+----+
| 1 |
| 2 |
| 3 |
| 4 |
| 5 |
+----+
Something like
SELECT id FROM myTable WHERE id > 3 ORDER BY id ALWAYS_RETURN_THIS_NUMBER_OF_ELEMENTS 5
would actually return (where ALWAYS_RETURN_THIS_NUMBER_OF_ELEMENTS doesn't exist)
+----+
| id |
+----+
| 4 |
| 5 |
| 4 |
| 5 |
| 4 |
+----+
This is a very strange need. Here is a method:
SELECT id
FROM (SELECT id
      FROM myTable
      WHERE id > 3
      ORDER BY id
      LIMIT 5
     ) t CROSS JOIN
     (SELECT 1 AS n UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL
      SELECT 4 UNION ALL SELECT 5
     ) n
ORDER BY n.n, id
LIMIT 5;
You may need to extend the list of numbers in n to be sure you have enough rows for the final limit.
No, that's not what LIMIT does. The LIMIT clause is applied as the last step in the statement execution, after aggregation, after the HAVING clause, and after ordering.
I can't fathom a use case that would require the type of functionality you describe.
FOLLOWUP
The query that Gordon Linoff provided will return the specified result, as long as there is at least one row in myTable that satisfies the predicate. Otherwise, it will return zero rows.
Here's the EXPLAIN output for Gordon's query:
id select_type table type key rows Extra
-- ------------ ---------------- ----- ------- ---- -------------------------------
1 PRIMARY <derived2> ALL 5 Using temporary; Using filesort
1 PRIMARY <derived3> ALL 5 Using join buffer
3 DERIVED No tables used
4 UNION No tables used
5 UNION No tables used
6 UNION No tables used
7 UNION No tables used
UNION RESULT <union3,4,5,6,7> ALL
2 DERIVED myTable range PRIMARY 10 Using where; Using index
Here's the EXPLAIN output for the original query:
id select_type table type key rows Extra
-- ----------- ----------------- ----- ------- ---- -------------------------------
1 SIMPLE myTable range PRIMARY 10 Using where; Using index
It just seems like it would be a whole lot more efficient to reprocess the resultset from the original query, if that resultset contains fewer than five (and more than zero) rows. (When that number of rows goes from 5 to 1,000 or 150,000, it would be even stranger.)
The code to get multiple copies of rows from a resultset is quite simple: fetch the rows, and if the end of the result set is reached before you've fetched five (or N) rows, then just reset the row pointer back to the first row, so the next fetch will return the first row again. In PHP using mysqli, for example, you could use:
$result->data_seek(0);
Or, for those still using the deprecated mysql_ interface:
mysql_data_seek($result,0);
But if you're returning only five rows, it's likely you aren't even looping through the result at all, and you already stuffed all the rows into an array. Just loop back through the beginning of the array.
For MySQL interfaces that don't support a scrollable cursor, we'd just store the whole resultset and process it multiple times: with Perl DBI, by using fetchall_arrayref; with JDBC (which is going to store the whole result set in memory anyway without special settings on the connection), by storing the resultset as an object.
Bottom line, squeezing this requirement (to produce a resultset of exactly five rows) back to the database server, and pulling back duplicate copies of a row and/or storing duplicate copies of a row in memory just seems like the wrong way to satisfy the use case. (If there's rationale for storing duplicate copies of a row in memory, then that can be achieved without pulling duplicate copies of rows back from the database.)
It's just very odd that you say you're using/implementing a "circular buffer", but that you choose not to "circle" back around to the beginning of a resultset which contains fewer than five rows, and instead need to have MySQL return you duplicate rows. Just very, very strange.
I'm sure I must be doing something stupid, but as is often the case I can't figure out what it is.
I'm trying to run this query:
SELECT `f`.`FrenchWord`, `f`.`Pronunciation`, `e`.`EnglishWord`
FROM (`FrenchWords` f)
INNER JOIN `FrenchEnglishMappings` m ON `m`.`FrenchForeignKey`=`f`.`id`
INNER JOIN `EnglishWords` e ON `e`.`id`=`m`.`EnglishForeignKey`
WHERE `f`.`Pronunciation` = '[whatever]';
When I run it, what happens seems quite weird to me. I get the results of the query fine, 2 rows in about 0.002 seconds.
However, I also get a huge spike in CPU and SHOW PROCESSLIST shows two identical processes for that query with state 'Copying to tmp table on disk'. These seem to keep running endlessly until I kill them or the system freezes.
None of the tables involved is big - between 100k and 600k rows each. tmp_table_size and max_heap_table_size are both 16777216.
Edit: EXPLAIN on the statement gives:
(Edit: reduced the keylen of Pronunciation to 112.)
+----+-------------+-------+--------+-------------------------------------------------------------+-----------------+---------+----------------------------+------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+--------+-------------------------------------------------------------+-----------------+---------+----------------------------+------+----------------------------------------------+
| 1 | SIMPLE | f | ref | PRIMARY,Pronunciation | Pronunciation | 112 | const | 2 | Using where; Using temporary; Using filesort |
| 1 | SIMPLE | m | ref | tmpindex,CombinedIndex,FrenchForeignKey,EnglishForeignKey | tmpindex | 4 | dict.f.id | 1 | Using index |
| 1 | SIMPLE | e | eq_ref | PRIMARY,id | PRIMARY | 4 | dict.m.EnglishForeignKey | 1 | |
+----+-------------+-------+--------+-------------------------------------------------------------+-----------------+---------+----------------------------+------+----------------------------------------------+
I'd be grateful if someone could point out what might be causing this. What I really don't understand is what MySQL is doing - surely if the query is complete then it doesn't need to do anything else?
UPDATE
Thanks for all the responses. I learnt something from all of them. This query was made massively faster after following the advice of nrathaus. I added a PronunciationHash binary(16) column to FrenchWords that contains unhex( md5 ( Pronunciation ) ). That is indexed with a keylen of 16 (vs 600+ for the varchar index on Pronunciation), and queries are much faster now.
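A sketch of the change described above (the index name is made up; everything else follows the description):

-- Fixed-width 16-byte hash of the long Pronunciation varchar
ALTER TABLE FrenchWords
    ADD COLUMN PronunciationHash BINARY(16),
    ADD INDEX idx_pron_hash (PronunciationHash);

UPDATE FrenchWords SET PronunciationHash = UNHEX(MD5(Pronunciation));

-- The query then probes a 16-byte key instead of a 600+ byte one
SELECT f.FrenchWord, f.Pronunciation, e.EnglishWord
FROM FrenchWords f
INNER JOIN FrenchEnglishMappings m ON m.FrenchForeignKey = f.id
INNER JOIN EnglishWords e ON e.id = m.EnglishForeignKey
WHERE f.PronunciationHash = UNHEX(MD5('[whatever]'));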
As shown by the EXPLAIN, your key size is HUGE: 602. This forces MySQL to write the data out to a temporary table.
You need to reduce (greatly) the keylen; I believe the recommended size is below 128.
I suggest you create a column called MD5_FrenchWord which will contain the MD5 value of FrenchWord, and then use this column for the GROUP BY. This assumes that you are looking for similarities when you group, rather than for the actual value.
You are misusing GROUP BY. This clause is entirely pointless unless you also have a summary function such as MAX(something) or COUNT(*) in your SELECT clause.
Try removing GROUP BY and see if it helps.
It's not clear what you're trying to do with GROUP BY. But you might try SELECT DISTINCT if you're trying to dedup your result set.
Looking further at this question, it seems like you might benefit from a couple of compound indexes.
First, can you make sure your table declarations have NOT NULL in as many columns as possible?
Second, you're retrieving Pronunciation, FrenchWord, and id from your FrenchWords table, so try the following compound index on that table. Your query will then be able to get what it needs directly from the index, saving a bunch of disk I/O. Notice that Pronunciation is mentioned first in the compound index declaration because that's the value you're searching on. This allows MySQL to do a lookup on the index and get the other information it needs directly from the index, without thrashing back to the table itself.
(Pronunciation, FrenchWord, id)
You're retrieving EnglishWord from EnglishWords, looking it up by id. So the same reasoning applies to this compound index:
(id, EnglishWord)
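As a sketch, those two indexes might be declared like this (index names are illustrative; note that the other answer's warning about long varchar keys applies to the Pronunciation column):

ALTER TABLE FrenchWords
    ADD INDEX idx_pron_word_id (Pronunciation, FrenchWord, id);

ALTER TABLE EnglishWords
    ADD INDEX idx_id_word (id, EnglishWord);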
Finally, I can't tell what your ORDER BY is for, once you use SELECT DISTINCT. You might try getting rid of it. But it probably makes no difference.
Give this a try. If your MySQL server is still thrashing after you make these changes, you have some kind of configuration problem.
I feel like the following query is too slow:
(1679.1ms)
SELECT `media_files` . *
FROM `media_files`
INNER JOIN `playlist_media_files` ON `media_files`.`id` = `playlist_media_files`.`media_file_id`
WHERE `media_files`.`type`
IN (
'AudioFile'
)
AND `playlist_media_files`.`playlist_id` =7
ORDER BY media_files.artist ASC , media_files.release_year ASC , media_files.album ASC , media_files.disc_number ASC , media_files.position ASC
EXPLAIN:
+----+-------------+----------------------+--------+---------------------------------------------------------------------------------------+-------------------------------------------+---------+---------------------------------------------------------+------+---------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+----------------------+--------+---------------------------------------------------------------------------------------+-------------------------------------------+---------+---------------------------------------------------------+------+---------------------------------+
| 1 | SIMPLE | playlist_media_files | ref | index_playlist_media_files_on_playlist_id,index_playlist_media_files_on_media_file_id | index_playlist_media_files_on_playlist_id | 4 | const | 3782 | Using temporary; Using filesort |
| 1 | SIMPLE | media_files | eq_ref | PRIMARY,index_media_files_on_type | PRIMARY | 4 | mydb.playlist_media_files.media_file_id | 1 | Using where |
+----+-------------+----------------------+--------+---------------------------------------------------------------------------------------+-------------------------------------------+---------+---------------------------------------------------------+------+---------------------------------+
Every column is indexed.
Can any MySQL expert tell how it can be improved by looking at the EXPLAIN?
The multiple ORDER BY is killing the performance.
Edit: removed private URLs from comments
Update: it seems I can do something like concat(..fields..) AS sort for a late ORDER BY sort.
For this query you would probably benefit from a composite index on (playlist_id, media_file_id) in the playlist_media_files table. That would let MySQL use only the index to find which rows to read from media_files, without having to read actual data from playlist_media_files to see what the value of media_file_id is for every row that satisfies the playlist_id = 7 condition (a lot of them do).
You should then see an additional "Using index" in the first row of the EXPLAIN output.
MySQL would still have to create a temporary table to sort by so many columns, but sorting 4k rows in memory is not that expensive.
So basically try:
ALTER TABLE `playlist_media_files`
ADD INDEX `playlist_media_composite` ( `playlist_id` , `media_file_id` ) ;
and see the results.
Edit: I tried to simulate the same problem on my test db, creating the same tables and filling them with 400k random rows using PHP, trying to get similar index cardinality.
Without composite index the same query has following execution plan:
1 SIMPLE playlist_media_files ref playlist_id,media_file_id playlist_id 4 const 3925 Using temporary; Using filesort
1 SIMPLE media_files eq_ref PRIMARY PRIMARY 4 test.playlist_media_files.media_file_id 1 Using where
And the average result is about:
Showing rows 0 - 29 ( 2,702 total, Query took 0.0359 sec)
After adding the composite index and running ANALYZE TABLE playlist_media_files, EXPLAIN shows:
1 SIMPLE playlist_media_files ref playlist_id,media_file_id,playlist_media_composite playlist_media_composite 4 const 3925 Using index; Using temporary; Using filesort
1 SIMPLE media_files eq_ref PRIMARY PRIMARY 4 test.playlist_media_files.media_file_id 1 Using where
And the average result:
Showing rows 0 - 29 ( 2,702 total, Query took 0.0176 sec)
However, in both cases the sorting was done in memory (and creating the tmp table and sorting still takes 80% of the time here), while your profiling screenshot shows that most of your time is lost on copying the temporary table to disk. That's where the difference comes from. My tables have only the columns required for this query, and my random strings probably weren't as long as yours, while you have a lot more columns and you are selecting all of them while sorting on only a few. So your temporary table doesn't fit in memory, and doing things on disk is obviously a lot slower.
So your main focus here should be either on increasing the buffer sizes to accommodate your big select, or on limiting the selected columns to the ones you actually need.
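Both options are quick to try; a sketch (the sizes and the column list are illustrative, not a recommendation):

-- Option 1: raise the in-memory temporary table ceiling for this session
SET SESSION tmp_table_size = 64 * 1024 * 1024;
SET SESSION max_heap_table_size = 64 * 1024 * 1024;

-- Option 2: select only the columns the page actually needs,
-- so the temporary table stays narrow enough to be sorted in memory
SELECT media_files.id, media_files.artist, media_files.album
FROM media_files
INNER JOIN playlist_media_files
    ON media_files.id = playlist_media_files.media_file_id
WHERE media_files.type IN ('AudioFile')
  AND playlist_media_files.playlist_id = 7
ORDER BY media_files.artist, media_files.release_year, media_files.album,
         media_files.disc_number, media_files.position;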
What is the purpose of all those ORDER BY columns? Ordering by that many columns just doesn't make sense in this case.
Why not just order by one thing?
You might be having problems with database design normalization; are you familiar with that?
A colleague asked me to explain how indexes (indices?) boost performance; I tried to do so, but got confused myself.
I used the model below for explanation (an error/diagnostics logging database). It consists of three tables:
List of business systems, table "System" containing their names
List of different types of traces, table "TraceTypes", defining what kinds of error messages can be logged
Actual trace messages, having foreign keys from System and TraceTypes tables
I used MySQL for the demo; however, I don't recall the table types (storage engines) I used. I think it was InnoDB.
System                          TraceTypes
------------------              -----------------------------------------
| ID | Name      |              | ID | Code    | Description           |
------------------              -----------------------------------------
| 1  | billing   |              | 1  | Info    | Informational message |
| 2  | hr        |              | 2  | Warning | Warning only          |
------------------              | 3  | Error   | Failure               |
                                -----------------------------------------

Traces (System_ID references System.ID; TraceTypes_ID references TraceTypes.ID)
---------------------------------------------------
| ID | System_ID | TraceTypes_ID | Message        |
---------------------------------------------------
| 1  | 1         | 1             | Job starting   |
| 2  | 1         | 3             | System.nullr.. |
---------------------------------------------------
First, I added some records to all of the tables and demonstrated that the query below executes in 0.005 seconds:
select count(*) from Traces
inner join System on Traces.System_ID = System.ID
inner join TraceTypes on Traces.TraceTypes_ID = TraceTypes.ID
where
System.Name='billing' and TraceTypes.Code = 'Info'
Then I generated more data (no indexes yet)
"System" contained about 100 entries
"TraceTypes" contained about 50 entries
"Traces" contained ~10 million records.
Now the previous query took 8-10 seconds.
I created indexes on Traces.System_ID column and Traces.TraceTypes_ID column. Now this query executed in milliseconds:
select count(*) from Traces where System_id=1 and TraceTypes_ID=1;
This was also fast:
select count(*) from Traces
inner join System on Traces.System_ID = System.ID
where System.Name='billing' and TraceTypes_ID=1;
but the previous query, which joined all three tables, still took 8-10 seconds to complete.
Only when I created a compound index (both System_ID and TraceTypes_ID columns included in index), the speed went down to milliseconds.
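For reference, a sketch of that compound index (the name is arbitrary):

CREATE INDEX idx_traces_system_tracetype
    ON Traces (System_ID, TraceTypes_ID);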
The basic statement I was taught earlier is: "all the columns you use for joining must be indexed".
However, in my scenario I had indexes on both System_ID and TraceTypes_ID, yet MySQL didn't use them. The question is: why? My bet is that the item count ratio of 100:10,000,000:50 makes the single-column indexes too large to be used. But is that true?
First, the correct, and the easiest, way to analyze a slow SQL statement is to run EXPLAIN. Find out how the optimizer chose its plan and ponder why, and how to improve it. I'd suggest studying the EXPLAIN results with only the 2 separate indexes to see how MySQL executes your statement.
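For example, a sketch using the query from the question (what you see depends on your MySQL version):

EXPLAIN
SELECT count(*) FROM Traces
INNER JOIN System ON Traces.System_ID = System.ID
INNER JOIN TraceTypes ON Traces.TraceTypes_ID = TraceTypes.ID
WHERE System.Name = 'billing' AND TraceTypes.Code = 'Info';
-- With only the two single-column indexes, check the "type" and "key" columns
-- of the Traces row: a value like "index_merge", or a plain scan with a poor
-- key choice, would explain the 8-10 second runtime.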
I'm not very familiar with MySQL, but it seems there was a restriction in MySQL 4 of using only one index per table involved in a query. There have been improvements on this since MySQL 5 (index merge), but I'm not sure whether they apply to your case. Again, EXPLAIN should tell you the truth.
Even when using 2 indexes per table is allowed (MySQL 5), using 2 separate indexes is generally slower than a compound index: merging 2 separate indexes requires an extra index merge step, compared to the single pass over a compound index.
The article "Multi Column indexes vs Index Merge" (which uses MySQL 5.4.2) might be helpful.
It's not the size of the indexes so much as the selectivity that determines whether the optimizer will use them.
My guess would be that it is using one index, then doing a traditional lookup to move to the other index, and then filtering out rows; in short, you might be looping through two indexes in a nested loop. Please check the execution plan. As I understand it, you should create a composite index on the columns used for filtering or joining, and then use an INCLUDE clause for the columns in the SELECT list. I have never worked with MySQL, so this understanding is based on SQL Server 2005.