MySQL UPDATES get progressively slower - mysql

I have an application using a MySQL database hosted on one machine and 6 clients running on other machines that read and write to it over a local network.
I have one main work table which contains about 120,000 items in rows to be worked on. Each client grabs 40 unallocated work items from the table (marking them as allocated), does the work and then writes back the results to the same work table. This sequence continues until there is no more work to do.
The chart above shows the amount of time taken to write back each block of 40 results to the table from one of the clients using UPDATE queries. You can see that the duration is fairly small most of the time, but it suddenly goes up to 300 seconds and stays there until all the work completes. This rapid increase in query execution time towards the end is what I need help with.
The clients are not heavily loaded. The server is a little loaded but it has 16GB of RAM, 8 cores and is doing nothing other than hosting this db.
Here is the relevant SQL code.
Table creation:
CREATE TABLE work (
item_id MEDIUMINT,
item VARCHAR(255) CHARACTER SET utf8,
allocated_node VARCHAR(50),
allocated_time DATETIME,
result TEXT);
/* Then insert 120,000 items, which is quite fast. No problem at this point. */
INSERT INTO work VALUES (%s, %s, NULL, NULL, NULL);
Client allocating 40 items to work on:
UPDATE work SET allocated_node = %s, allocated_time=NOW()
WHERE allocated_node IS NULL LIMIT 40;
SELECT item FROM work WHERE allocated_node = %s AND result IS NULL;
Update the row with the completed result (this is the part that gets really slow after a few hours of running):
/* The chart above shows the time to execute 40 of these for each write back of results */
UPDATE work SET result = %s WHERE item = %s;
I'm using MySQL on Ubuntu 14.04, with all the standard settings.
The final table is about 160MB, and there are no indexes.
I don't see anything wrong with my queries and they work fine apart from the whole thing taking twice as long as it should overall.
Can someone with experience in these matters suggest any configuration settings I should change in MySQL to fix this performance issue, or point out anything in what I'm doing that might explain the timing in the chart?
Thanks.

Without an index, the complete table is scanned. As the item id gets larger, a greater portion of the table has to be scanned to reach the row to be updated.
I would try an index, perhaps even a primary key on item_id.
Still, the increase in duration seems too high for such a machine and a relatively small database.
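As a quick sketch of that suggestion (using Python's sqlite3 as a stand-in for MySQL, with the table shape taken from the question), declaring item_id as the primary key turns the per-row lookup into an index search rather than a full scan:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE work (
        item_id INTEGER PRIMARY KEY,  -- indexed automatically
        item TEXT,
        allocated_node TEXT,
        allocated_time TEXT,
        result TEXT
    )
""")
conn.executemany(
    "INSERT INTO work VALUES (?, ?, NULL, NULL, NULL)",
    [(i, f"item-{i}") for i in range(1000)],
)

# With a primary key, the UPDATE locates its row via the index
# instead of scanning every row.
plan = conn.execute(
    "EXPLAIN QUERY PLAN UPDATE work SET result = 'done' WHERE item_id = 123"
).fetchall()
print(plan[0][-1])  # e.g. SEARCH work USING INTEGER PRIMARY KEY (rowid=?)
```

The same check in MySQL is `EXPLAIN UPDATE ...`, where you would look for `type: const` or `ref` instead of `ALL`.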

Given that more details would be required for a proper diagnosing (see below), I see two potential performance decrease possibilities here.
One is that you're running into a Schlemiel the Painter's Problem which you could ameliorate with
CREATE INDEX work_ndx ON work(allocated_node, item);
but it looks unlikely with so low a cardinality. MySQL shouldn't take so long to locate unallocated nodes.
A more likely explanation could be that you're running into a locking conflict of some kind between clients. To be sure, during those 300 seconds in which the system is stalled, run
SHOW FULL PROCESSLIST
from an administrator connection to MySQL. See what it has to say, and possibly use it to update your question. Also, post the result of
SHOW CREATE TABLE
against the tables you're using.
You should be doing something like this:
START TRANSACTION;
allocate up to 40 nodes using SELECT...FOR UPDATE;
COMMIT WORK;
-- The two transactions serve to ensure that the node selection can
-- never lock more than those 40 nodes. I'm not too sure of that LIMIT
-- being used in the UPDATE.
START TRANSACTION;
select those 40 nodes with SELECT...FOR UPDATE;
<long work involving those 40 nodes and nothing else>
COMMIT WORK;
If you use a single transaction and table level locking (even implicitly), it might happen that one client locks all others out. In theory this ought to happen only with MyISAM tables (that only have table-level locking), but I've seen threads stalled for ages with InnoDB tables as well.
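A minimal sketch of that claim-then-work pattern, with Python's sqlite3 standing in for MySQL. SQLite has no SELECT ... FOR UPDATE, so here the claim step is collapsed into one atomic UPDATE inside a short transaction, which serves the same goal: the claim can never hold locks on more than the 40 chosen rows. Table and node names are taken from the question.

```python
import sqlite3

def claim(conn, node, batch=40):
    """Claim up to `batch` unallocated items in one short transaction,
    then return the claimed item names."""
    with conn:  # one short transaction: BEGIN ... COMMIT
        conn.execute(
            """UPDATE work
               SET allocated_node = ?, allocated_time = datetime('now')
               WHERE item_id IN (
                   SELECT item_id FROM work
                   WHERE allocated_node IS NULL LIMIT ?
               )""",
            (node, batch),
        )
        return [r[0] for r in conn.execute(
            "SELECT item FROM work WHERE allocated_node = ? AND result IS NULL",
            (node,),
        )]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE work (item_id INTEGER PRIMARY KEY, item TEXT, "
             "allocated_node TEXT, allocated_time TEXT, result TEXT)")
conn.executemany("INSERT INTO work VALUES (?, ?, NULL, NULL, NULL)",
                 [(i, f"item-{i}") for i in range(100)])

batch = claim(conn, "node-1")
print(len(batch))  # 40
```

In MySQL/InnoDB you would instead claim with SELECT ... FOR UPDATE inside the first transaction, exactly as the answer describes; the structural point (short claim transaction, separate long work phase) is the same.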

Your 'external locking' technique sounds fine.
INDEX(allocated_node) will help significantly for the first UPDATE.
INDEX(item) will help significantly for the final UPDATE.
(A compound index with the two columns will help only one of the updates, not both.)
The reason for the sudden increase: you are continually filling in big TEXT fields, making the table grow. At some point the table is so big that it can no longer be cached in RAM, so each full-table-scan UPDATE goes from being served from cache to hitting disk.
Regarding "...; SELECT ... FOR UPDATE; COMMIT;" -- the FOR UPDATE is useless, since the COMMIT happens immediately.
You could play with the "40", though I can't think why a larger or smaller number would help.
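To illustrate the two-single-column-indexes point (a sqlite3 sketch; the EXPLAIN QUERY PLAN output format is SQLite's, but MySQL's EXPLAIN tells the same story), with both indexes in place the lookup side of each UPDATE's WHERE clause becomes an index search:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE work (item_id INTEGER, item TEXT, "
             "allocated_node TEXT, allocated_time TEXT, result TEXT)")
conn.execute("CREATE INDEX idx_allocated_node ON work(allocated_node)")
conn.execute("CREATE INDEX idx_item ON work(item)")

def plan(sql):
    # Join the query-plan detail strings into one line for inspection.
    return " ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

# The WHERE clause of the first UPDATE (find unallocated rows)...
p1 = plan("SELECT item_id FROM work WHERE allocated_node IS NULL")
# ...and of the final UPDATE (find the finished item by name).
p2 = plan("SELECT item_id FROM work WHERE item = 'item-7'")
print(p1)
print(p2)
```

Each plan should report a SEARCH using the relevant index rather than a full-table SCAN, which is exactly what the compound index could only provide for one of the two statements.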

Related

SQL query on MySQL taking three second longer with no changes to the database or to the SQL query

I have been asked to diagnose why a query looking something like this
SELECT COUNT(*) AS count
FROM users
WHERE first_digit BETWEEN 500 AND 1500
AND second_digit BETWEEN 5000 AND 45000;
suddenly went from taking around 0.3 seconds to execute to taking over 3 seconds. The system is MySQL running on Ubuntu.
The table is not sorted and contains about 1.5M rows. After I added a composite index I got the execution time back down to about 0.2 seconds; however, this does not explain the root cause of why the execution time suddenly increased tenfold.
How can I begin to investigate the cause of this?
Since your SQL query has not changed, and I read your description as saying the data set has not changed or grown, I suggest you take a look at the following areas, in order:
1) Have you removed the index and run your SQL query again?
2) Other access to the database. Are other applications or users running heavy queries on the same database? In particular, look for large data transfers to and from the database server in question.
A factor of 10 slowdown? A likely cause is going from entirely cached to not cached.
Please show us SHOW CREATE TABLE, EXPLAIN SELECT, your RAM size, and the value of innodb_buffer_pool_size. And how big (GB) is the table?
Also, did someone happen to run a dump, ALTER TABLE, or OPTIMIZE TABLE just before the slowdown?
The above info will either show what caused caching to fail, or show the need for more RAM.
INDEX(first_digit, second_digit) (in either order) will be "covering" for that query; this will be faster than without any index.
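Sketching that last point with sqlite3 (the same idea applies in MySQL): because the query touches only first_digit and second_digit, a composite index on those two columns can answer it without reading the table at all, which is what "covering" means.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, "
             "first_digit INTEGER, second_digit INTEGER)")
conn.execute("CREATE INDEX idx_digits ON users(first_digit, second_digit)")
conn.executemany("INSERT INTO users(first_digit, second_digit) VALUES (?, ?)",
                 [(i % 2000, (i * 7) % 50000) for i in range(10000)])

plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT COUNT(*) AS count FROM users "
    "WHERE first_digit BETWEEN 500 AND 1500 "
    "AND second_digit BETWEEN 5000 AND 45000"
).fetchall()
for row in plan:
    print(row[-1])  # the plan should mention a COVERING INDEX
```

In MySQL, EXPLAIN shows the same thing as "Using index" in the Extra column.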

Altering MySQL table column type from INT to BIGINT

I have a table with just under 50 million rows. It hit the limit for INT (2147483647). At the moment the table is not being written to.
I am planning on changing the ID column from INT to BIGINT. I am using a Rails migration to do this with the following migration:
def up
execute('ALTER TABLE table_name MODIFY COLUMN id BIGINT(8) NOT NULL AUTO_INCREMENT')
end
I have tested this locally on a dataset of 2000 rows and it worked ok. Running the ALTER TABLE command across the 50 million should be ok since the table is not being used at the moment?
I wanted to check before I run the migration. Any input would be appreciated, thanks!
We had exactly the same scenario, but with PostgreSQL, and I know how 50M rows can fill up the whole range of INT: gaps in the IDs, generated by deleting rows over time, incomplete transactions, and other factors.
I will explain what we ended up doing, but first: seriously, testing a data migration for 50M rows on 2k rows is not a good test.
There can be multiple solutions to this problem, depending on factors such as which DB provider you are using. We were using Amazon RDS, which has limits on runtime and on what they call IOPS (input/output operations per second). If you run such an intensive query on a DB with those limits, it will exhaust its IOPS quota midway through, and once the IOPS quota runs out the DB becomes too slow and effectively useless. We had to cancel our query and let the IOPS catch up, which takes about 30 minutes to an hour.
If you have no such restrictions and have the DB on premises or similar, then there is another factor: can you afford downtime?
If you can afford downtime and have no IOPS-type restriction on your DB, you can run the query directly. It will take a long time (maybe half an hour or so, depending on many factors), and in the meantime the table will be locked while rows are being changed. Make sure that the table is getting no writes and also no reads during the process, so it runs to completion smoothly without any deadlock-type situation.
What we did to avoid downtime and the Amazon RDS IOPS limits:
In our case we still had about 40M IDs left before the limit when we realized it was going to run out, and we wanted to avoid downtime. So we took a multi-step approach:
Create a new BIGINT column, named new_id or similar (give it a unique index from the start); it is nullable with a default of NULL.
Write background jobs that run a few times each night and backfill the new_id column from the id column. We backfilled about 4-5M rows each night, and a lot more over weekends (as our app had no traffic on weekends).
Once the backfill has caught up, stop all access to the table (we just took our app down for a few minutes at night), and create a new sequence starting from the max(new_id) value, or reuse the existing sequence and bind it to the new_id column with its default set to nextval of that sequence.
Now make new_id NOT NULL, then switch the primary key from id to new_id.
Delete the id column.
Rename new_id to id.
And resume your DB operations.
The above is a minimal write-up of what we did; you can google some nice articles about it, one is this. The approach is not new and is pretty common, so I am sure you will find MySQL-specific write-ups too, or you can just adjust a couple of things in the article above and you should be good to go.
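The backfill step above can be sketched like this, using Python's sqlite3 purely for illustration; the table name t, the batch size, and the plain id -> new_id copy are stand-ins for the real schema. The point is that each batch is its own short transaction, so locks are held only briefly and an IOPS budget is consumed gradually rather than in one burst:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, payload TEXT)")
conn.executemany("INSERT INTO t(payload) VALUES (?)",
                 [(f"row-{i}",) for i in range(10_000)])

# Step 1: add the new wider column, nullable with default NULL,
# uniquely indexed from the start.
conn.execute("ALTER TABLE t ADD COLUMN new_id INTEGER")
conn.execute("CREATE UNIQUE INDEX idx_t_new_id ON t(new_id)")

# Step 2: backfill in small batches so each transaction stays short.
BATCH = 1000
while True:
    with conn:  # one short transaction per batch
        cur = conn.execute(
            "UPDATE t SET new_id = id WHERE id IN "
            "(SELECT id FROM t WHERE new_id IS NULL LIMIT ?)", (BATCH,))
    if cur.rowcount == 0:
        break  # nothing left to backfill

remaining = conn.execute(
    "SELECT COUNT(*) FROM t WHERE new_id IS NULL").fetchone()[0]
print(remaining)  # 0
```

In production this loop would run as the nightly background job, with the batch size tuned to whatever the server (or RDS IOPS quota) tolerates.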

How to improve InnoDB's SELECT performance while INSERTing

We recently switched our tables from MyISAM to InnoDB, specifically so we could make updates to our database while still allowing SELECT queries to occur (i.e. by not locking the entire table for each INSERT).
We have a cycle that runs weekly and INSERTs approximately 100 million rows using "INSERT INTO ... ON DUPLICATE KEY UPDATE ...".
We are fairly pleased with the current update performance of around 2000 insert/updates per second.
However, while this process is running, we have observed that regular queries take very long.
For example, this took about 5 minutes to execute:
SELECT itemid FROM items WHERE itemid = 950768
(When the INSERTs are not happening, the above query takes several milliseconds.)
Is there any way to force SELECT queries to take a higher priority? Otherwise, are there any parameters that I could change in the MySQL configuration that would improve the performance?
We would ideally perform these updates when traffic is low, but anything more than a couple seconds per SELECT query would seem to defeat the purpose of being able to simultaneously update and read from the database. I am looking for any suggestions.
We are using Amazon's RDS as our MySQL server.
Thanks!
I imagine you have already solved this nearly a year later :) but I thought I would chime in. According to MySQL's documentation on internal locking (as opposed to explicit, user-initiated locking):
Table updates are given higher priority than table retrievals. Therefore, when a lock is released, the lock is made available to the requests in the write lock queue and then to the requests in the read lock queue. This ensures that updates to a table are not “starved” even if there is heavy SELECT activity for the table. However, if you have many updates for a table, SELECT statements wait until there are no more updates.
So it sounds like your SELECT is getting queued up behind the inserts/updates until they finish (or at least until there is a pause). Information on altering that priority can be found on MySQL's Table Locking Issues page.
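Whichever lock queue is involved, one practical mitigation is to break the weekly 100M-row job into small batches with a commit (and optionally a short pause) between them, so waiting SELECTs can run in the gaps. A sketch using Python's sqlite3, where the table name, batch size, and pause are all illustrative and SQLite's ON CONFLICT ... DO UPDATE plays the role of MySQL's ON DUPLICATE KEY UPDATE:

```python
import sqlite3
import time

def bulk_upsert(conn, rows, batch=500, pause=0.01):
    """Insert/update in small batches, committing between batches so that
    concurrent readers are not queued behind one giant statement."""
    for start in range(0, len(rows), batch):
        with conn:  # one short transaction per batch
            conn.executemany(
                "INSERT INTO items(itemid, val) VALUES (?, ?) "
                "ON CONFLICT(itemid) DO UPDATE SET val = excluded.val",
                rows[start:start + batch],
            )
        time.sleep(pause)  # give queued SELECTs a chance to run

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (itemid INTEGER PRIMARY KEY, val TEXT)")
bulk_upsert(conn, [(i, f"v{i}") for i in range(2000)], pause=0)
print(conn.execute("SELECT COUNT(*) FROM items").fetchone()[0])  # 2000
```

The throughput cost of the pauses is usually small compared with a five-minute SELECT stall, though the right batch size and pause length have to be found by measurement on the real workload.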

MySQL query slowing down until restart

I have a service that sits on top of a MySQL 5.5 database (INNODB). The service has a background job that is supposed to run every week or so. On a high level the background job does the following:
Do some initial DB read and write in one transaction
Execute UMQ (described below) with a set of parameters in one transaction.
If no records are returned we are done!
Process the result from UMQ (this is a bit heavy, so it is done outside of any DB transaction)
Write the outcome of the previous step to the DB in one transaction (this writes to tables queried by UMQ and ensures that the same records are not found again by UMQ)
Goto step 2.
UMQ - Ugly Monster Query: This is a nasty database query that joins a bunch of tables, has conditions on columns in several of these tables, and includes a NOT EXISTS subquery with some more joins and conditions. UMQ includes an ORDER BY and also has a LIMIT 1000. Even though the query is bad, I have done what I can here: there are indexes on all columns filtered on, and the joins all follow foreign key relations.
I do expect UMQ to be heavy and take some time, which is why it's executed in a background job. However, what I'm seeing is rapidly degrading performance until it eventually causes a timeout in my service (maybe 50 times slower after 10 iterations).
First I thought it was because the data queried by UMQ changes (see step 4 above), but that wasn't it: if I took the last query (the one that caused the timeout) from the slow query log and executed it myself directly, I got the same behavior, right up until I restarted the MySQL service. After the restart, the exact same query on the exact same data that took >30 seconds before now took <0.5 seconds. I can reproduce this behavior every time by restoring the database to its initial state and restarting the process.
Also, using the trick described in this question I could see that the query scans around 60K rows after restart as opposed to 18M rows before. EXPLAIN tells me that around 10K rows should be scanned and the result of EXPLAIN is always the same. No other processes are accessing the database at the same time and the lock_time in the slow query log is always 0. SHOW ENGINE INNODB STATUS before and after restart gives me no hints.
So finally the question: Does anybody have any clue of why I'm seeing this behavior? And how can I analyze this further?
I have the feeling that I need to configure MySQL differently in some way but I have searched and tested like crazy without coming up with anything that makes a difference.
Turns out that the behavior I saw was the result of how the MySQL optimizer uses InnoDB statistics to decide on an execution plan. This article put me on the right track (even though it does not discuss exactly my problem). The most important thing I learned is that MySQL calculates statistics on startup and then only once in a while; those statistics are then used to optimize queries.
The way I had set up the test data, the table T (where most writes are done in step 4) started out empty. After each iteration T would contain more and more records, but the InnoDB statistics had not yet been updated to reflect this. Because of that, the MySQL optimizer kept choosing an execution plan for UMQ (which includes a JOIN with T) that worked well when T was empty but got worse and worse the more records T contained.
To verify this I added an ANALYZE TABLE T; before every execution of UMQ and the rapid degradation disappeared. No lightning performance but acceptable. I also saw that leaving the database for half an hour or so (maybe a bit shorter but at least more than a couple of minutes) would allow the InnoDB statistics to refresh automatically.
In a real scenario the relative difference in index cardinality for the tables involved in UMQ will look quite different and will not change as rapidly so I have decided that I don't really need to do anything about it.
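The same remedy is easy to demonstrate outside MySQL. In this sqlite3 sketch, ANALYZE plays the role of MySQL's ANALYZE TABLE T: after a bulk load changes the table's shape, it refreshes the statistics table that the query planner reads.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, grp INTEGER)")
conn.execute("CREATE INDEX idx_grp ON t(grp)")
conn.executemany("INSERT INTO t(grp) VALUES (?)",
                 [(i % 10,) for i in range(5000)])

# Refresh planner statistics now that the table has real data in it;
# without this the planner works from stale (here: empty-table) stats.
conn.execute("ANALYZE")

stats = conn.execute(
    "SELECT tbl, idx, stat FROM sqlite_stat1 WHERE tbl = 't'"
).fetchall()
print(stats)
```

In SQLite the refreshed numbers land in sqlite_stat1; in MySQL/InnoDB they live in the persistent statistics tables, and the same "run ANALYZE after bulk changes" habit applies.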
Thank you very much for the analysis and answer. I had been chasing this issue for several days during CI on MariaDB 10.1 and Bacula server 9.4 (Debian buster).
The situation was that, after a fresh server installation during a CI cycle, the first two tests (backup and restore) ran smoothly on the unrestarted MariaDB server, and only the third test showed that one particular UMQ took about 20 minutes (building the directory tree during the restore process from a table with about 30k rows).
Unless the MariaDB server was restarted or the table had been analyzed, the problem would not go away. ANALYZE TABLE or the restart changed the cardinality of the fields and the internal query processing exactly as stated in the linked article.

Best Approach for Checking and Inserting Records

EDIT: To clarify the records originally come from a flat-file database and is not in the MySQL database.
In one of our existing C programs, whose purpose is to take data from the flat file and insert it (based on criteria) into the MySQL table:
Open connection to MySQL DB
for record in all_record_of_my_flat_file:
if record contain a certain field:
if record is NOT in sql_table A: // see #1
insert record information into sql_table A and B // see #2
Close connection to MySQL DB
#1: select field from sql_table A where field=XXX
#2: 2 inserts
I believe that management did not feel it was worth adding the functionality to insert the field into the database at the time it is created in the flat file. This is specific to one customer (that I know of). I too felt it odd that we use a tool like this to "sync" the data. I was given the duty of using and maintaining this script, so I haven't heard much about the entire process. The intent is primarily to handle additional records, so this is not the first time it is used.
This is typically done every X months to sync everything up, or so I'm told. I've also been told that this process takes roughly a couple of days. There are (currently) at most 2.5 million records (though not necessarily all 2.5M will be inserted, and most likely far fewer). One of the tables contains 10 fields and the other 5. There isn't much to be done about iterating through the records, since that part can't be changed at the moment. What I would like to do is speed up the part where I query MySQL.
I'm not sure if I have left out any important details -- please let me know! I'm also no SQL expert so feel free to point out the obvious.
I thought about:
Putting all the inserts into a transaction (at the moment I'm not sure how important it is for the transaction to be all-or-none or if this affects performance)
Using INSERT ... WHERE NOT EXISTS
LOAD DATA INFILE (but that would require I create a (possibly) large temp file)
I read that (hopefully someone can confirm) I should drop indexes so they aren't re-calculated.
mysql Ver 14.7 Distrib 4.1.22, for sun-solaris2.10 (sparc) using readline 4.3
Why not upgrade your MySQL server to 5.0 (or 5.1), and then use a trigger so it's always up to date (no need for the monthly script)?
DELIMITER //
CREATE TRIGGER insert_into_a AFTER INSERT ON source_table
FOR EACH ROW
BEGIN
  IF NEW.foo > 1 THEN
    -- Only insert if the row is not already present in table a.
    IF NOT EXISTS (SELECT 1 FROM a WHERE a.id = NEW.id) THEN
      INSERT INTO a (col1, col2) VALUES (NEW.col1, NEW.col2);
      INSERT INTO b (col1, col2) VALUES (NEW.col1, NEW.col2);
    END IF;
  END IF;
END //
DELIMITER ;
Then, you could even setup update and delete triggers so that the tables are always in sync (if the source table col1 is updated, it'll automatically propagate to a and b)...
Here's my thoughts on your utility script...
1) Is just a good practice anyway, I'd do it no matter what.
2) May save you a considerable amount of execution time. If you can solve a problem in straight SQL without iterating in a C program, this can save a fair amount of time. You'll have to profile it in a test environment first to make sure it really does.
3) LOAD DATA INFILE is a tactic to use when inserting a massive amount of data. If you have a lot of records to insert (I'd write a query to do an analysis to figure out how many records you'll have to insert into table B), then it might behoove you to load them this way.
Dropping the indexes before the insert can be helpful to reduce running time, but you'll want to make sure you put them back when you're done.
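That drop-then-rebuild pattern looks like this (a sqlite3 sketch; the table and index names are made up). Rebuilding the index once after the bulk insert replaces thousands of incremental index updates with a single sort-and-build:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (k INTEGER, v TEXT)")
conn.execute("CREATE INDEX idx_k ON t(k)")

rows = [(i, f"v{i}") for i in range(50_000)]

# Drop the index, bulk-insert, then rebuild it once at the end;
# one index build is cheaper than 50,000 incremental index updates.
conn.execute("DROP INDEX idx_k")
with conn:
    conn.executemany("INSERT INTO t VALUES (?, ?)", rows)
conn.execute("CREATE INDEX idx_k ON t(k)")

print(conn.execute("SELECT COUNT(*) FROM t").fetchone()[0])  # 50000
```

Just remember (as the answer says) to recreate every index you dropped; any query that ran between the DROP and the CREATE would fall back to full scans.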
Although... why aren't all the records in table B in the first place? You haven't mentioned how processing works, but I would think it would be advantageous to ensure (in your app) that the records got there without your service script's intervention. Of course, you understand your situation better than I do, so ignore this paragraph if it's off-base. I know from experience that there are lots of reasons why utility cleanup scripts need to exist.
EDIT: After reading your revised post, your problem domain has changed: you have a bunch of records in a (searchable?) flat file that you need to load into the database based on certain criteria. I think the trick to doing this as quickly as possible is to determine where the C application is actually the slowest and spends the most time spinning its proverbial wheels:
If it's reading off the disk, you're stuck, you can't do anything about that, unless you get a faster disk.
If it's doing the SQL query-insert operation, you could try optimizing that, but bear in mind you're doing a compare between two data sources (the flat file and the MySQL database).
A quick thought: doing a LOAD DATA INFILE bulk insert to populate a temporary table very quickly (perhaps even an in-memory table, if MySQL allows that), and then doing the INSERT-if-not-exists from that table, might be faster than what you're currently doing.
In short, do profiling, and figure out where the slowdown is. Aside from that, talk with an experienced DBA for tips on how to do this well.
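The staging-table idea can be sketched as follows (Python's sqlite3 stands in for MySQL, executemany stands in for LOAD DATA INFILE, and the table and column names are invented): bulk-load everything into a temporary table, then let one set-based INSERT ... WHERE NOT EXISTS do the de-duplication instead of a per-record query from C.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_a (field TEXT PRIMARY KEY, extra TEXT)")
conn.execute("INSERT INTO table_a VALUES ('k1', 'old'), ('k2', 'old')")

# Bulk-load the flat-file records into a staging table (the LOAD DATA
# INFILE analogue), then insert only the rows not already present.
conn.execute("CREATE TEMP TABLE staging (field TEXT, extra TEXT)")
flat_file_records = [("k1", "dup"), ("k3", "new"), ("k4", "new")]
conn.executemany("INSERT INTO staging VALUES (?, ?)", flat_file_records)

conn.execute("""
    INSERT INTO table_a(field, extra)
    SELECT s.field, s.extra FROM staging s
    WHERE NOT EXISTS (SELECT 1 FROM table_a a WHERE a.field = s.field)
""")
rows = conn.execute("SELECT field FROM table_a ORDER BY field").fetchall()
print(rows)  # [('k1',), ('k2',), ('k3',), ('k4',)]
```

The existing 'k1' row is left untouched; only the genuinely new records are inserted, in one statement.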
I discussed with another colleague and here is some of the improvements we came up with:
For:
SELECT X FROM TABLE_A WHERE Y=Z;
Change to (currently awaiting verification that X is, and always will be, unique):
SELECT X FROM TABLE_A WHERE X=Z LIMIT 1;
This was an easy change and we saw some slight improvement. I can't really quantify it well, but I did:
SELECT X FROM TABLE_A ORDER BY RAND() LIMIT 1
and compared it against the first two queries. Over a few tests there was about a 0.1 second improvement. Perhaps something was cached, but the LIMIT 1 should help somewhat.
Then another (yet to be implemented) improvement(?):
for record number X in entire record range:
if (no CACHE)
CACHE = retrieve Y records (sequentially) from the database
if (X exceeds the highest record number in cache)
CACHE = retrieve the next set of Y records (sequentially) from the database
search for record number X in CACHE
...etc
I'm not sure what to set Y to; are there any methods for determining a good size to try? The table has 200k entries. I will edit in some results when I finish the implementation.
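The caching idea above can be sketched like this (Python with sqlite3; the table layout, chunk size, and class are all invented for illustration). Each cache miss pulls the next Y records in one query instead of issuing one query per record; a reasonable starting Y is whatever keeps a chunk comfortably in memory (say, 1,000-10,000 rows), then tune by measurement:

```python
import sqlite3

class ChunkCache:
    """Fetch records in sequential chunks of `chunk_size` so that most
    lookups hit the in-memory cache instead of the database."""
    def __init__(self, conn, chunk_size):
        self.conn, self.chunk_size = conn, chunk_size
        self.lo, self.cache = None, {}

    def get(self, record_no):
        # On a miss, load the whole chunk containing record_no.
        if self.lo is None or not (self.lo <= record_no < self.lo + self.chunk_size):
            self.lo = (record_no // self.chunk_size) * self.chunk_size
            self.cache = dict(self.conn.execute(
                "SELECT recno, val FROM records WHERE recno >= ? AND recno < ?",
                (self.lo, self.lo + self.chunk_size)))
        return self.cache.get(record_no)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (recno INTEGER PRIMARY KEY, val TEXT)")
conn.executemany("INSERT INTO records VALUES (?, ?)",
                 [(i, f"v{i}") for i in range(1000)])

cache = ChunkCache(conn, chunk_size=100)
print(cache.get(5), cache.get(250))  # v5 v250
```

Since the flat file is processed sequentially, sequential chunks give a near-100% hit rate after the first record of each chunk.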