MySQL index misbehaving in Rails - mysql

I have a Rails app backed by MySQL; there is a reservations table with an index on the column rescheduled_reservation_id (nullable).
In my Rails app there are two places that query reservations by the rescheduled_reservation_id field, as below:
Transit::Reservation.find_by(rescheduled_reservation_id: 25805)
which produces the following log output:
Transit::Reservation Load (60.3ms) SELECT `transit_reservations`.* FROM `transit_reservations` WHERE `transit_reservations`.`deleted_at` IS NULL AND `transit_reservations`.`rescheduled_reservation_id` = 25805 LIMIT 1
However, the other part of the app:
Transit::Reservation.where(rescheduled_reservation_id: 25805).last
produces the log output below:
Transit::Reservation Load (2.3ms) SELECT `transit_reservations`.* FROM `transit_reservations` WHERE `transit_reservations`.`deleted_at` IS NULL AND `transit_reservations`.`rescheduled_reservation_id` = 25805 ORDER BY `transit_reservations`.`id` DESC LIMIT 1
As can be clearly seen, the first query
Transit::Reservation Load (60.3ms) SELECT `transit_reservations`.* FROM `transit_reservations` WHERE `transit_reservations`.`deleted_at` IS NULL AND `transit_reservations`.`rescheduled_reservation_id` = 25805 LIMIT 1
took around 60 ms, suggesting the index might not have been used properly, compared to roughly 2 ms for this one:
Transit::Reservation Load (2.3ms) SELECT `transit_reservations`.* FROM `transit_reservations` WHERE `transit_reservations`.`deleted_at` IS NULL AND `transit_reservations`.`rescheduled_reservation_id` = 25805 ORDER BY `transit_reservations`.`id` DESC LIMIT 1
I also tried to debug further by running EXPLAIN on both queries, and I got back the same result, i.e. the index on rescheduled_reservation_id being used.
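(The EXPLAIN was run against the generated SQL from the logs above, along the lines of:
EXPLAIN SELECT `transit_reservations`.* FROM `transit_reservations` WHERE `transit_reservations`.`deleted_at` IS NULL AND `transit_reservations`.`rescheduled_reservation_id` = 25805 LIMIT 1;
and likewise for the second query with its ORDER BY ... DESC.)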
Has anyone else run into this issue? I am wondering whether the Rails MySQL connection (I am using the mysql2 gem) might cause the MySQL server to not choose the right index.

It's Rare, but Normal.
The likely answer is that the first occurrence did not find the blocks it needed cached in the buffer_pool. So, it had to fetch them from disk. On a plain ole HDD, a Rule of Thumb is 10ms per disk hit. So, maybe there were 6 blocks that it needed to fetch, leading to 60.3ms.
Another possibility is that other activities were interfering, thereby slowing down this operation.
2.3ms is reasonable for a simple query like that which can be performed entirely from blocks cached in RAM.
Was the server recently restarted? After a restart, there is nothing in cache. Is the table larger than innodb_buffer_pool_size? If so, that would lead to 60ms happening sporadically -- blocks would get bumped out. (Caveat: The buffer_pool should not be made so big that 'swapping' occurs.)
A block is 16KB; it contains some rows of data or rows of index or nodes of a BTree. Depending on the size of the table, even that 'point query' might have needed to look at 6 blocks or more.
If you don't get 2.3ms most of the time, we should dig deeper. (I have hinted at sizes to investigate.)
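To check the sizes involved, something along these lines should work (the table name is taken from the log output above; adjust as needed):
SHOW VARIABLES LIKE 'innodb_buffer_pool_size';
SELECT table_name,
       ROUND((data_length + index_length) / 1024 / 1024) AS total_mb
FROM information_schema.tables
WHERE table_name = 'transit_reservations';
If total_mb is larger than the buffer pool, sporadic cache misses like the 60ms one are to be expected.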

Related

SQL query on MySQL taking three seconds longer with no changes to the database or to the SQL query

I have been asked to diagnose why a query looking something like this
SELECT COUNT(*) AS count
FROM users
WHERE first_digit BETWEEN 500 AND 1500
AND second_digit BETWEEN 5000 AND 45000;
went from taking around 0.3 seconds to execute to suddenly taking over 3 seconds. The system is MySQL running on Ubuntu.
The table is not sorted and contains about 1.5M rows. After I added a composite index I got the execution time down to about 0.2 seconds again; however, this does not explain the root cause of why the execution time suddenly increased so sharply.
How can I begin to investigate the cause of this?
Since your SQL query has not changed, and I interpret your description as meaning the data set has not changed or grown, I suggest you take a look at the following areas, in order:
1) Have you removed the index and run your SQL query again?
2) Other access to the database. Are other applications or users running heavy queries on the same database? Larger data transfers, in particular to and from the database server in question.
A factor of 10 slowdown? A likely cause is going from entirely cached to not cached.
Please show us SHOW CREATE TABLE, EXPLAIN SELECT, the RAM size, and the value of innodb_buffer_pool_size. And how big (in GB) is the table?
Also, did someone happen to do a dump, ALTER TABLE, or OPTIMIZE TABLE just before the slowdown?
The above info will either show what caused caching to fail, or show the need for more RAM.
INDEX(first_digit, second_digit) (in either order) will be "covering" for that query; this will be faster than without any index.
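For example (the index name is arbitrary), the index and a check of the resulting plan might look like:
ALTER TABLE users ADD INDEX idx_digits (first_digit, second_digit);
EXPLAIN SELECT COUNT(*) AS count
FROM users
WHERE first_digit BETWEEN 500 AND 1500
AND second_digit BETWEEN 5000 AND 45000;
If the index is covering, EXPLAIN should report "Using index" in the Extra column.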

MySQL UPDATES get progressively slower

I have an application using a MySQL database hosted on one machine and 6 clients running on other machines that read and write to it over a local network.
I have one main work table which contains about 120,000 items in rows to be worked on. Each client grabs 40 unallocated work items from the table (marking them as allocated), does the work and then writes back the results to the same work table. This sequence continues until there is no more work to do.
I charted the amount of time taken to write back each block of 40 results to the table from one of the clients using UPDATE queries. The duration is fairly small for most of the run, but suddenly it goes up to 300 sec and stays there until all work completes. This rapid increase in the time to execute the queries towards the end is what I need help with.
The clients are not heavily loaded. The server is a little loaded but it has 16GB of RAM, 8 cores and is doing nothing other than hosting this db.
Here is the relevant SQL code.
Table creation:
CREATE TABLE work (
    item_id MEDIUMINT,
    item VARCHAR(255) CHARACTER SET utf8,
    allocated_node VARCHAR(50),
    allocated_time DATETIME,
    result TEXT
);
/* Then insert 120,000 items, which is quite fast. No problem at this point. */
INSERT INTO work VALUES (%s, %s, NULL, NULL, NULL);
Client allocating 40 items to work on:
UPDATE work SET allocated_node = %s, allocated_time=NOW()
WHERE allocated_node IS NULL LIMIT 40;
SELECT item FROM work WHERE allocated_node = %s AND result IS NULL;
Update the row with the completed result (this is the part that gets really slow after a few hours of running):
/* The chart above shows the time to execute 40 of these for each write back of results */
UPDATE work SET result = %s WHERE item = %s;
I'm using MySQL on Ubuntu 14.04, with all the standard settings.
The final table is about 160MB, and there are no indexes.
I don't see anything wrong with my queries, and they work fine apart from the whole thing taking twice as long as it should overall.
Can someone with experience in these matters suggest any configuration settings I should change in MySQL to fix this performance issue, or point out any issues with what I'm doing that might explain the timing in the chart?
Thanks.
Without an index, the complete table is scanned. As the item id gets larger, a greater amount of the table has to be scanned to reach the row being updated.
I would try an index, perhaps even a primary key, on item_id.
Still, the increase in duration seems too high for such a machine and a relatively small database.
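In DDL terms, that suggestion would be something like the following (assuming item_id values are unique, which the question does not state explicitly):
ALTER TABLE work ADD PRIMARY KEY (item_id);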
Given that more details would be required for a proper diagnosis (see below), I see two potential performance-decrease possibilities here.
One is that you're running into a Schlemiel the Painter's problem, which you could ameliorate with
CREATE INDEX work_ndx ON work(allocated_node, item);
but it looks unlikely with so low a cardinality. MySQL shouldn't take so long to locate unallocated nodes.
A more likely explanation could be that you're running into a locking conflict of some kind between clients. To be sure, during those 300 seconds in which the system is stalled, run
SHOW FULL PROCESSLIST
from an administrator connection to MySQL. See what it has to say, and possibly use it to update your question. Also, post the result of
SHOW CREATE TABLE
against the tables you're using.
You should be doing something like this:
START TRANSACTION;
allocate up to 40 nodes using SELECT...FOR UPDATE;
COMMIT WORK;
-- The two transactions serve to ensure that the node selection can
-- never lock more than those 40 nodes. I'm not too sure of that LIMIT
-- being used in the UPDATE.
START TRANSACTION;
select those 40 nodes with SELECT...FOR UPDATE;
<long work involving those 40 nodes and nothing else>
COMMIT WORK;
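In concrete SQL against the work table from the question, that pattern might look something like this (a sketch only; %s placeholders follow the question's convention, item_id is assumed to identify a row, and the client keeps track of the item_id values it claimed):
START TRANSACTION;
SELECT item_id FROM work WHERE allocated_node IS NULL LIMIT 40 FOR UPDATE;
UPDATE work SET allocated_node = %s, allocated_time = NOW()
    WHERE item_id IN (%s);  -- the ids returned by the SELECT above
COMMIT;

START TRANSACTION;
SELECT item, item_id FROM work WHERE item_id IN (%s) FOR UPDATE;
-- ... long client-side work on those 40 items only ...
UPDATE work SET result = %s WHERE item_id = %s;
COMMIT;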
If you use a single transaction and table level locking (even implicitly), it might happen that one client locks all others out. In theory this ought to happen only with MyISAM tables (that only have table-level locking), but I've seen threads stalled for ages with InnoDB tables as well.
Your 'external locking' technique sounds fine.
INDEX(allocated_node) will help significantly for the first UPDATE.
INDEX(item) will help significantly for the final UPDATE.
(A compound index with the two columns will help only one of the updates, not both.)
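In DDL terms (index names here are arbitrary):
ALTER TABLE work ADD INDEX idx_allocated_node (allocated_node), ADD INDEX idx_item (item);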
The reason for the sudden increase: you are continually filling in big TEXT fields, making the table grow. At some point the table is so big that it can no longer be cached in RAM, so each full table scan goes from being served from cache to hitting the disk.
...; SELECT ... FOR UPDATE; COMMIT; -- The FOR UPDATE is useless since the COMMIT happens immediately.
You could play with the "40", though I can't think why a larger or smaller number would help.

How do I prime MySQL so I can benchmark the performance of my query / index?

I have a 25 million row MySQL 5.6 table. I'm in the process of refining my indexes on the table. When I execute a simple query the first time it takes 10 seconds and it only takes 0.1 seconds every subsequent time. When I filter on a different key the execution time jumps back up to 10 seconds.
This behavior tells me I'm reaping the benefits of caching and buffering on the subsequent queries. I understand MySQL implements a query cache, but subsequent queries after RESET QUERY CACHE still take only 0.1 seconds.
I would ideally like to:
1) Call the query a few times to get a baseline average execution time (e.g. 10.38 seconds over 10 runs).
2) Refine my table index design.
3) Call the query a few times to get a new average execution time (e.g. 7.91 seconds over 10 runs).
4) Decide whether to keep or discard the refinements.
How do I prime MySQL so I can benchmark the performance of my query / index without the benefit of the buffering, caching, pre-fetching, etc?
MySQL has a database named performance_schema by default. It can be useful for you. If it does not exist on your server, try turning the feature on. Here's a description: http://dev.mysql.com/doc/refman/5.5/en/performance-schema-quick-start.html
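For example (column names as in the 5.6 performance_schema; timer values are in picoseconds), you can look at per-statement averages like this:
SELECT digest_text, count_star, avg_timer_wait / 1e9 AS avg_ms
FROM performance_schema.events_statements_summary_by_digest
ORDER BY avg_timer_wait DESC
LIMIT 10;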
Executive summary:
Turn off the Query cache;
Set innodb_buffer_pool_size;
Run your test query twice, using the second timing;
Don't try to include all the I/O; it's not realistic.
Details:
First, you need to understand what the "Query cache" is and is not. It is a kludge that is usually best turned off. It records the exact queries (byte for byte) and their result sets, but it gets purged whenever a write occurs. (I am over-simplifying it.) So, either turn it off, or use SELECT SQL_NO_CACHE ... to keep it from confusing the benchmarks. A typical query that can make use of the QC will take about 1 millisecond, regardless of how complex it is. That's not a useful metric.
Now, let's get to the 'real' cache -- InnoDB's "buffer pool". (I assume you are using InnoDB, not MyISAM.) That cache should be about 70% of available RAM (but may not be if you have not set innodb_buffer_pool_size in my.cnf). It caches reads and writes for the blocks that make up InnoDB tables. When you read a record from a table, InnoDB needs to find some block(s) from the index you are using, plus the block(s) that contain the data. Reading a block from disk takes, say, 10ms. If these were recently looked at, then they are very likely to be in the buffer pool. Usually the difference is 10x between not cached and cached. (You are seeing a 100x speedup, so I may not be explaining everything.)
When testing the speed of a given query, I like to run it twice. The first time it will fetch any blocks not yet in the buffer pool, then it will use just CPU effort to perform it. The second run will be only CPU (unless it, say, scans 25M rows and they don't fit in the buffer pool).
To get timings for I/O gets more complex, and often not necessary. I say "not necessary" because comparing two runs that show a difference in CPU time generally implies that the I/O could also be similarly different. Also, timing I/O is not realistic because in "production" the buffer pool is full of stuff cached, hence many queries won't need to hit the disk.
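A minimal version of that procedure, with big_table, some_col, and the value 42 as placeholders, would be:
SELECT SQL_NO_CACHE COUNT(*) FROM big_table WHERE some_col = 42;  -- first run: pulls blocks into the buffer pool
SELECT SQL_NO_CACHE COUNT(*) FROM big_table WHERE some_col = 42;  -- second run: take this timing
SHOW VARIABLES LIKE 'innodb_buffer_pool_size';                    -- confirm the cache is sized sensibly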
If you would like to discuss your app further, we can discuss PARTITIONing (not necessarily beneficial), Data Warehousing (and speed techniques), and/or UUIDs (bad), etc.
MySQL has more powerful ways to judge query performance; the timing helps, but EXPLAIN provides more insight (I believe it forces the query to run uncached, since other databases do this for execution numbers, but I can't find a reference).
If you make index changes to the database, you're first going to want to optimize the table. This updates the index statistics that tell MySQL how to build your explain plan. The syntax is just:
OPTIMIZE TABLE table_name;
Plus some light reading: http://dev.mysql.com/doc/refman/5.1/en/optimize-table.html
Then you're going to want to run an explain plan to tell you what MySQL is really doing with your indices. If you put on indices that never get used, then you're just slowing down your inserts. How to run:
EXPLAIN YOUR_QUERY
Here's an example from MySQL:
mysql> EXPLAIN EXTENDED
-> SELECT t1.a, t1.a IN (SELECT t2.a FROM t2) FROM t1\G
*************************** 1. row ***************************
id: 1
select_type: PRIMARY
table: t1
type: index
possible_keys: NULL
key: PRIMARY
key_len: 4
ref: NULL
rows: 4
Extra: Using index
*************************** 2. row ***************************
id: 2
select_type: DEPENDENT SUBQUERY
table: t2
type: index_subquery
possible_keys: a
key: a
key_len: 5
ref: func
rows: 2
Extra: Using index
2 rows in set, 1 warning (0.00 sec)
mysql> SHOW WARNINGS\G
*************************** 1. row ***************************
Level: Note
Code: 1003
Message: select `test`.`t1`.`a` AS `a`,
<in_optimizer>(`test`.`t1`.`a`,
<exists>(<index_lookup>(<cache>(`test`.`t1`.`a`)
in t2 on a checking NULL having
<is_not_null_test>(`test`.`t2`.`a`)))) AS `t1.a
IN (SELECT t2.a FROM t2)` from `test`.`t1`
1 row in set (0.00 sec)
Notice the rows field; this tells you how many records MySQL had to deal with to get your answer. Since this involves an inner select, the "query cost" should be ~4+2, which is a relative number you can compare against a different query, whereas if it were a join the "query cost" would be ~4x2. You can consider the type a modifier to your cost equation, but it's easiest just to go with the best possible type for whatever operation you're doing. There's a pretty good explanation of types in the blog post linked below.
More explain plan reading: http://dev.mysql.com/doc/refman/5.5/en/execution-plan-information.html
A pretty good explain plan explanation: http://www.sitepoint.com/using-explain-to-write-better-mysql-queries/
You have to use the SQL_NO_CACHE modifier in order to disable the MySQL query cache on a per-query basis:
SELECT SQL_NO_CACHE <columns> FROM table WHERE <condition>;
Another thing you should use is the EXPLAIN command, in order to know which indices are used in your query and which aren't:
EXPLAIN SELECT SQL_NO_CACHE <columns> FROM table WHERE <condition>;
What you are experiencing is the difference between a hot and a cold query/database. A hot query accesses data that is already in memory, whereas a cold query needs to load its data into memory first. That's why, with big databases, you speak of a warm-up phase before the in-memory data cache contains the data necessary for serving most of the requests.
Sure, it can be the query cache, but it's more likely the disk cache. It depends on the memory you are using.
The next question is what you are going to do with your database. These days you are highly likely to throw memory at the problem; a TB of memory costs around 10k, so who cares.
Also use EXPLAIN. Database performance is usually all about the pages you page in from disk and the way the in-memory data is used to fulfill your request.
EXPLAIN is the right tool to use. The query cache you can usually switch off; there is not much use to it. If the result is static, you would most likely use a web cache or site cache for it anyway.
Also check the indexes being used. You will often find ways to ease the pain.
When I was doing this, I used SELECT SQL_NO_CACHE.
Or you can make whitespace changes to the query each time. The query cache indexes by the verbatim query string, and any change, including extra spaces, makes it appear to be a different query.

Does executing a statement always take up memory for the result set?

I was told by a colleague that when the database server executes an SQL statement, it always puts the result data into RAM/swap, and thus it is not practical to select large result sets.
I thought that such code
my $sth = $dbh->prepare('SELECT million_rows FROM table');
$sth->execute;
while (my @data = $sth->fetchrow_array) {
    # process the row
}
retrieves the result set row by row, without it being loaded to RAM.
But I can't find any reference to this in DBI or MySQL docs. How is the result set really created and retrieved? Does it work the same for simple selects and joins?
Your colleague is right.
By default, the perl module DBD::mysql uses mysql_store_result which does indeed read in all SELECT data and cache it in RAM. Unless you change that default, when you fetch row-by-row in DBI, it's just reading them out of that memory buffer.
This is usually what you want unless you have very very large result sets. Otherwise, until you get the last data back from mysqld, it has to hold that data ready and my understanding is that it causes blocks on writes to the same rows (blocks? tables?).
Keep in mind, modern machines have a lot of RAM. A million-row result set is usually not a big deal. Even if each row is quite large at 1 KB, that's only 1 GB RAM plus overhead.
If you're going to process millions of rows of BLOBs, maybe you do want mysql_use_result -- or you want to SELECT those rows in chunks with progressive uses of LIMIT x,y.
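The chunked approach, with big_table, id, and blob_col as placeholder names, would look like repeated queries along these lines:
SELECT id, blob_col FROM big_table ORDER BY id LIMIT 0, 10000;
SELECT id, blob_col FROM big_table ORDER BY id LIMIT 10000, 10000;
-- ... and so on, increasing the offset each time
(Note that large offsets themselves get slower, since MySQL still has to walk past the skipped rows.)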
See mysql_use_result and mysql_store_result in perldoc DBD::mysql for details.
This is not true (if we are talking about the database server itself, not client layers).
MySQL can buffer the whole resultset, but this is not necessarily done, and if done, not necessarily in RAM.
The resultset is buffered if you are using inline views (SELECT FROM (SELECT …)), the query needs to sort (which is shown as using filesort), or the plan requires creating a temporary table (which is shown as using temporary in the query plan).
Even if using temporary, MySQL only keeps the table in memory while its size does not exceed the limit set by tmp_table_size. When the table grows over this limit, it is converted from an in-memory table to MyISAM and stored on disk.
You, though, may explicitly instruct MySQL to buffer the resultset by appending SQL_BUFFER_RESULT instruction to the outermost SELECT.
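For instance (table and column names here are placeholders):
SELECT SQL_BUFFER_RESULT col1, col2 FROM big_table WHERE some_col = 42;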
See the docs for more detail.
No, that is not how it works.
The database will not hold all the rows in RAM/swap.
However, it will try (and MySQL tries hard here) to cache as much as possible: indexes, results, etc. Your MySQL configuration gives values for the available memory buffers for different kinds of caches (for different kinds of storage engines); you should not allow these caches to swap.
Test it
Bottom line: it should be very easy to test this using the client only. (I don't know Perl's DBI; it might, though I doubt it, be doing something that forces MySQL to load everything on prepare.) Anyway... test it:
Actually issue a prepare on SELECT SQL_NO_CACHE million_rows FROM table and then fetch only a few rows out of the millions.
You should then compare performance with SELECT SQL_NO_CACHE only_fetched_rows FROM table and see how that fares.
If the performance is comparable (and fast) then I believe that you can call your colleague's bluff.
Also, if you enable logging of the statements actually issued to mysql and give us a transcript of that, then we (non-Perl folks) can give a more definitive answer on what MySQL would do.
I am not super familiar with this, but it looks to me like DBD::mysql can either fetch everything up front or only as needed, based on the mysql_use_result attribute. Consult the DBD::mysql and MySQL documentation.

Which is faster, key_cache or OS cache?

In a table with 1 million rows, if I do the following (after I restart the computer, so nothing is cached):
1. SELECT price,city,state FROM tb1 WHERE zipId=13458;
the result is 23 rows in 0.270s.
After I run 'LOAD INDEX INTO CACHE tb1' (key_buffer_size=128M and the total index size for the table is 82M):
2. SELECT price,city,state FROM tb1 WHERE zipId=24781;
the result is 23 rows in 0.252s; Key_reads remains constant, Key_read_requests is incremented by 23.
BUT after I load 'zipId' into the OS cache, if I run the query again:
3. SELECT price,city,state FROM tb1 WHERE zipId=20548;
the result is 22 rows in 0.006s.
This it's just a simple example, but I run tens of tests and combinations. But the results are always the same.
I use MySQL with MyISAM, Windows 7 64-bit, and the query cache is 0.
zipId is a regular index (not a primary key).
SHOULDN'T the key_cache be faster than the OS cache?
SHOULDN'T there be a huge difference in speed after I load the index into the cache?
(In my tests there is almost no difference.)
I've read a lot of websites,tutorials and blogs on this matter but none of them really discuss the difference in speed. So, any ideas or links will be greatly appreciated.
Thank you.
Under normal query processing, MySQL will scan the index for the WHERE clause values (i.e. zipId = 13458). Then it uses the index to look up the corresponding values in the MyISAM main table (a second disk access). When the table is loaded into memory, those accesses are served from memory rather than by reading a real disk.
The slow part of the query is the lookup from the index into the main table. So loading the index into memory may not improve the query speed.
One thing to try is EXPLAIN SELECT on your queries to see how the index is being used.
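Using the first query from the question, that would be:
EXPLAIN SELECT price, city, state FROM tb1 WHERE zipId = 13458;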
Edit: Since I don't think the answers to your comments will fit in the comment space, I'll answer them here.
MyISAM in and of itself does not have a data cache. It relies upon the OS to do the disk caching. How much of your table is cached depends upon what else you are running on the system and how much data you are reading through. Windows in particular does not give the user much control over what data is cached and for how long.
The OS caches disk blocks (either 4K or 8K chunks) of the index file or the full table file.
SELECT indexed_col FROM tb1 WHERE zipId+0>1
Queries like this where you use functions on the predicate (Where clause) can cause MySQL to do full table scans rather than using any index. As I suggested above, use EXPLAIN SELECT to see what MySQL is doing.
If you want more control over the cache, try using an InnoDB table. The InnoDB engine creates its own cache, which you can size, and it does a better job of keeping the most recently used data in it.
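A sketch of that switch (the table name comes from the question; the buffer size is only an example and must fit within your RAM):
ALTER TABLE tb1 ENGINE = InnoDB;
-- then set innodb_buffer_pool_size in my.cnf (e.g. innodb_buffer_pool_size = 4G)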