I have a pretty simple query over a table with about 14 million records that is taking about 30 minutes to complete. Here is the query:
select a.switch_name, a.recording_id, a.recording_date, a.start_time,
a.duration, a.ani, a.dnis, a.agent_id, a.campaign,
a.call_type, a.agent_call_result, a.queue_name, a.rec_stopped,
a.balance, a.client_number, a.case_number, a.team_code
from recording_tbl as a
where client_number <> '1234567'
Filtering on client_number seems to be the culprit, and that column does have an index. I'm not sure what else to try.
You can start by creating an index on client_number and seeing whether it helps, but you'll get the best results by analyzing your problem with the EXPLAIN command.
http://dev.mysql.com/doc/refman/5.5/en/execution-plan-information.html
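For example, a minimal sketch using the query from the question (column list abbreviated):
EXPLAIN SELECT a.switch_name, a.client_number
FROM recording_tbl AS a
WHERE a.client_number <> '1234567';
The type, key, and rows columns of the output will show whether the index is used and roughly how many rows MySQL expects to scan.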
Is the table MyISAM or InnoDB? If InnoDB, increase the InnoDB buffer pool to a size large enough that the entire table can fit into memory. If MyISAM, it should be loaded into memory automatically via the OS cache buffers. Install more RAM. Install faster disk drives. These seem to be your only options, considering you are doing a full table scan (minus whatever matches the excluded client number, which appears to be your testing client ID?).
It takes a while to load the tables into RAM as well, so don't expect it as soon as the DB starts up.
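If the table is InnoDB, a minimal my.cnf sketch (the 6G figure is illustrative and assumes a dedicated server with roughly 8GB of RAM; on older MySQL versions this setting requires a restart):
[mysqld]
# size the buffer pool so the table's working set fits in memory
innodb_buffer_pool_size = 6G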
Your query is doing a full table scan on the one table in the query, recording_tbl. I am assuming this is a table and not a view, because of the "tbl" prefix. If this is a view, then you need to optimize the view.
There is no need to look at the EXPLAIN. An index is unlikely to be helpful unless 99% or so of the records have a client_number of 1234567. An index might even make things worse, because of a phenomenon called thrashing.
Your problem is either undersized hardware or underallocated resources for the MySQL query engine. I would first look at buffering for the engine, and then the disk hardware and bandwidth to the processor.
Maybe...
where client_number = '1234567'
...would be a bit faster.
If client_number is stored as a numeric field, then
where client_number = 1234567
may be faster, since the string comparison could be causing a cast and possibly preventing the index from being used.
Why do you need to return 14M rows? (I'm assuming that most records do not have the ID you are searching on.)
If you don't need all 14M rows, add LIMIT to the end of your query. Fewer rows -> less memory -> faster query.
Example:
select a.switch_name, a.recording_id, a.recording_date, a.start_time,
a.duration, a.ani, a.dnis, a.agent_id, a.campaign,
a.call_type, a.agent_call_result, a.queue_name, a.rec_stopped,
a.balance, a.client_number, a.case_number, a.team_code
from recording_tbl as a
where client_number <> '1234567'
LIMIT 1000
This would return 1000 rows (an arbitrary 1000, since there is no ORDER BY).
And here's a comparison of how to return the top N rows across different SQL RDBMS:
http://www.petefreitag.com/item/59.cfm
Related
We had to run a couple of UPDATE queries directly on a prod MySQL datastore. MySQL 5.7 is being used.
The first of them was something like below:
Update dbName.tableName
set row1 = 1
where clientID = 123 and
identifier like 'ABC.%'
limit 16000
The above query matched around 16k rows and took 29 seconds to run.
My second query was something like
Update dbName.tableName
set row1 = 1 ,
row2 = 2
where clientId = 123 and
identifier like 'XYZ.%'
limit 8000
This query took only 1.2 seconds to run and matched around 8000 rows.
There is no index on any of the filters or any of the columns that are being modified.
I checked MySQL's performance monitor and there was nothing anomalous happening throughout the duration of either query.
I don't think the MySQL query cache is involved, as I am given to understand that it works only on SELECT queries whose statements are the same when compared "byte for byte".
How is the second query so much faster than the first, when the matching row counts are comparable and the filters are similar?
A LIMIT without an ORDER BY is asking for random rows; beware.
Add a composite index: INDEX(clientId, identifier).
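For example (the index name is arbitrary; the table and column names are taken from the question):
ALTER TABLE dbName.tableName ADD INDEX idx_client_identifier (clientId, identifier);
With this index, both UPDATEs can locate their rows without scanning the whole table, since a LIKE with a fixed prefix ('ABC.%') can still use the second column of the index.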
The timing anomaly may be explained by this:
The first query did not find any of the table in cache (the "buffer pool"), so it read it all from disk. Since there was no useful index, it had to read all. The second query found the entire table cached, so it was much faster. Meanwhile, I will guess that the disk is SSD, not HDD.
A single update modifying 16K rows smells like a poor schema design. Would you like to discuss that further?
The "Query cache" is useful only for SELECTs. The QC is purged by write operations, so it will not explain the difference. It is different than the "buffer pool". (The QC may even be turned off.)
I have several tables with ~15 million rows. When I create an index on the id column and then execute a simple query like SELECT * FROM my_table WHERE id = 1, I retrieve the data within one second. But then, after a few minutes, if I execute the query with a different id, it takes over 15 seconds.
I'm sure it is not the query cache, because I'm trying different ids all the time to make sure I'm not retrieving from the cache. Also, I used EXPLAIN to make sure the index is being used.
The specs of the server are:
CPU: Intel Dual Xeon 5405 Harpertown 2.0Ghz Quad Core
RAM: 8GB
Hard drive 2: 146GB SAS (15k rpm)
Another thing I noticed is that if I execute REPAIR TABLE my_table, the queries complete within one second again. I assume something is being cached, either the table or the index. If so, is there any way to tell MySQL to keep it cached? Is it normal, given the specs of the server, to take around 13 seconds on an indexed table? The index is not unique and each query returns around 3000 rows.
NOTE: I'm using MyISAM, and I know there won't be any writes to these tables; all the queries will be reads.
SOLVED: thank you for your answers; as many of you pointed out, it was the key_buffer_size. I also reordered the tables using the same column as the index so the records are not scattered, and now the queries consistently execute in under 1 second.
Please provide the output of:
SHOW CREATE TABLE my_table;
SHOW VARIABLES LIKE '%buffer%';
Likely causes:
key_buffer_size (when using MyISAM) is not 20% of RAM, or innodb_buffer_pool_size is not 70% of available RAM (when using InnoDB); see the sketch after this list.
Another query (or group of queries) is coming in and "blowing out the cache" (key_buffer or buffer_pool). Look for such queries.
When using InnoDB, you don't have a PRIMARY KEY. (It is really important to have one.)
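A rough sketch of the key_buffer_size adjustment mentioned above (the value assumes a MyISAM-only server with the 8GB of RAM from the question; persist it in my.cnf as well, or it will be lost on restart):
SET GLOBAL key_buffer_size = 1600 * 1024 * 1024;  -- ~20% of 8GB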
For 3000 rows to take 15 seconds to load, I deduce:
The cache for the table (not necessarily for the index) was blown out, and
The 3000 rows were scattered around the table (hence fetching one row does not help much in finding subsequent rows).
Memory allocation blog: http://mysql.rjweb.org/doc.php/memory
Is it normal, given the specs of the server, to take around 13 seconds on an indexed table?
The high variance in response time indicates that something is amiss. With only 8 GB of RAM and 15 million rows, you might not have enough RAM to keep the index in memory.
Is swap enabled on the server? This could explain the extreme jump in response time.
Investigate the memory situation with a tool like top, htop or glances.
The orders table has 2m records. There are ~900K unique ship-to-ids.
There is an index on ship_to_id (the field is int(8)).
The query below takes nearly 10 minutes to complete. I've run SHOW PROCESSLIST, which shows Command = Query and State = Sending data.
When I run explain, the existing index is used, and possible_keys is NULL.
Is there anything I should do to speed this query up? Thanks.
SELECT
ship_to_id as customer_id
FROM orders
GROUP BY ship_to_id
HAVING SUM( price_after_discount ) > 0
It does not look like you have a useful index. Try adding an index on price_after_discount, and add a WHERE condition like this:
WHERE price_after_discount > 0
to minimize the number of rows you need to sum, since you can obviously discard any that are 0.
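A sketch of the suggested rewrite; note that it assumes price_after_discount is never negative (if it can be, the added WHERE clause would change the results):
SELECT ship_to_id AS customer_id
FROM orders
WHERE price_after_discount > 0
GROUP BY ship_to_id
HAVING SUM(price_after_discount) > 0;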
Also try running "top" command and look at the io "wait" column while the query is running. If its high, it means your query causes a lot of disk I/O. You can increase various memory buffers if you have the RAM to speed this up (if you're using innodb) or myisam is done through filesystem cacheing. Restarting the server will flush these caches.
If you do not have enough RAM (which you shouldn't need too much for 2M records) then consider a partitioning scheme against maybe ship-to-ids column (if your version of mysql supports it).
If all the orders in that table aren't current (i.e. not going to change again) then you could archive them off into another table to reduce how much data has to be scanned.
Another option is to throw a last_modified timestamp on the table with an index. You could then keep track of when the query is run and store the results in another table (query_results). When it's time to run the query again, you would only need to select the orders that were modified since the last time the query was run, then use that to update the query_results. The logic is a little more complicated, but it should be much faster assuming a low percentage of the orders are updated between query executions.
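A sketch of that incremental scheme; the column name, index name, query_results table, and the @last_run marker are all placeholders for illustration:
ALTER TABLE orders
  ADD COLUMN last_modified TIMESTAMP NOT NULL
      DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  ADD INDEX idx_last_modified (last_modified);
-- recompute totals only for customers whose orders changed since the last run
REPLACE INTO query_results (customer_id, total)
SELECT ship_to_id, SUM(price_after_discount)
FROM orders
WHERE ship_to_id IN (SELECT DISTINCT ship_to_id
                     FROM orders
                     WHERE last_modified > @last_run)
GROUP BY ship_to_id;
Reading the final list is then just SELECT customer_id FROM query_results WHERE total > 0.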
MySQL will use an index for a GROUP BY, at least according to the documentation on GROUP BY optimization.
To be most useful, all the columns used in the query should be in the index. This prevents the engine from having to reference the original data as well as the index. So, try an index on orders(ship_to_id, price_after_discount).
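For example (the index name is arbitrary):
ALTER TABLE orders ADD INDEX idx_ship_price (ship_to_id, price_after_discount);
This makes the index "covering": MySQL can answer the whole query from the index without touching the table rows at all.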
In a table with 1 million rows, if I run (after restarting the computer, so nothing is cached):
1. SELECT price,city,state FROM tb1 WHERE zipId=13458;
the result is 23 rows in 0.270s
After I run LOAD INDEX INTO CACHE tb1 (key_buffer_size=128M and the total index size for the table is 82M):
2. SELECT price,city,state FROM tb1 WHERE zipId=24781;
the result is 23 rows in 0.252s; Key_reads remains constant and Key_read_requests is incremented by 23
BUT after I load the zipId index into the OS cache, if I run the query again:
3. SELECT price,city,state FROM tb1 WHERE zipId=20548;
the result is 22 rows in 0.006s
This is just a simple example, but I have run tens of tests and combinations, and the results are always the same.
I'm using MySQL with MyISAM on Windows 7 64-bit, and the query cache is 0.
zipId is a regular index (not a primary key).
SHOULDN'T the key cache be faster than the OS cache??
SHOULDN'T there be a huge difference in speed after I load the index into the cache??
(In my tests there is almost no difference.)
I've read a lot of websites, tutorials, and blogs on this matter, but none of them really discuss the difference in speed. So, any ideas or links will be greatly appreciated.
Thank you.
Under normal query processing, MySQL will scan the index for the WHERE clause value (i.e. zipId = 13458). It then uses the index entries to look up the corresponding rows in the MyISAM data file (a second disk access). When the table is loaded into memory, those accesses are served from memory instead of from a real disk read.
The slow part of the query is the lookup from the index into the data file, so loading only the index into memory may not improve the query speed.
One thing to try is EXPLAIN SELECT on your queries to see how the index is being used.
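For example, on the first query from the question:
EXPLAIN SELECT price, city, state FROM tb1 WHERE zipId = 13458;
Check the key and rows columns to confirm the zipId index is chosen and how many rows are examined.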
Edit: since I don't think the answers to your comments will fit in the comment space, I'll answer them here.
MyISAM in and of itself does not cache data rows (only indexes, via the key buffer); it relies upon the OS to do the disk caching of the data file. How much of your table is cached by the OS depends upon what else you are running on the system and how much data you are reading through. Windows in particular does not allow the user much control over what data is cached and for how long.
The OS caches disk blocks (either 4K or 8K chunks) of the index file or the full table file.
SELECT indexed_col FROM tb1 WHERE zipId+0>1
Queries like this, where you use functions on the predicate (WHERE clause), can cause MySQL to do full table scans rather than use any index. As I suggested above, use EXPLAIN SELECT to see what MySQL is doing.
If you want more control over the cache, try using an InnoDB table. The InnoDB engine creates its own cache, which you can size, and it does a better job of keeping the most recently used data in it.
I'm running a MySQL query that joins various tables of 500,000+ rows. Sometimes it takes a second, other times around 15 seconds! This is on my local machine. I have experienced similarly varied times before on other intensive queries; does anyone know why this is?
Thanks
Thanks for the replies - I am using appropriate indexes, inner and left joins, and have a WHERE clause range of one week out of a possible two-year period of invoices. If I keep varying it (so presumably query results are not cached) and re-running, the time varies a lot, even if the number of rows retrieved is similar. The server is not busy. A few scheduled queries run every minute, but they are not intensive and take around 200ms.
The explain plan shows that a table of around 2000 rows is always fully scanned. So maybe those rows are sometimes cached, or maybe the indexes are cached - I didn't know indexes could be cached. I will try again with caching turned off.
Editing again - the query cache is in fact off, and I'm using InnoDB, so it looks like increasing innodb_buffer_pool_size is the way to go.
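A minimal sketch of that change (the 2GB value is illustrative; the buffer pool is resizable online only from MySQL 5.7.5 on - on older versions, set it in my.cnf and restart):
SET GLOBAL innodb_buffer_pool_size = 2 * 1024 * 1024 * 1024;  -- 2GB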
Same query each time?
It's hard to tell, based on what you've posted. If we assume that the schema and data aren't changing, I'd guess that there's something else running on your machine when the queries are long that would explain the difference. It could be that the state of memory is different, so paging is going on; an anti-virus program is running; some other service has started. It's impossible to answer.
Try running
OPTIMIZE TABLE
on the tables involved. That should help refresh some statistics useful to the query planner.
You have not given us much information. If you're using MyISAM tables, it may be a matter of locks.
Are you using ANSI INNER JOINs? A little basic, but don't use "cross joins" - those are the joins with the comma, like:
SELECT * FROM t1, t2 WHERE t1.id_t1=t2.id_t1
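The same query written as an ANSI join:
SELECT * FROM t1 INNER JOIN t2 ON t1.id_t1 = t2.id_t1;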
Last things you may want to try: increase your buffers (InnoDB), your key_buffer (MyISAM), and the query cache buffers.
Here are some common reasons (besides your server simply being too busy):
The slow query is hitting the hard drive. In the fast case, the indexes and data are already cached in MySQL or the OS file cache.
Retrieving the data gets blocked by updates/inserts; for MyISAM tables, in some cases the whole table gets locked whenever someone inserts or updates data in it.
Table statistics are out of date and/or the wrong index gets selected; running ANALYZE TABLE or OPTIMIZE TABLE on the table can help (see the sketch after this list).
You have the query cache enabled; fetching the result of a cached query is fast, but computing a result that is not in the cache might be slow. Try turning off the query cache to check whether the query is always slow when it is not served from the cache.
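A sketch of the statistics refresh mentioned above (the table name is a placeholder):
ANALYZE TABLE your_table;   -- refreshes index statistics
OPTIMIZE TABLE your_table;  -- rebuilds the table and defragments it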
In any case, you should show the output of EXPLAIN on your queries to verify that indexes are being used properly - even when they're not, queries can be fast while everything is in RAM but grind to a halt once they need to hit the hard drive.