I have run into an issue with a MySQL database where, after creating a new table and adding CRUD query logic to my web application (whose backend is written in C), UPDATE queries will sometimes take 10-20 minutes to execute.
The web application has Apache modules that talk to server daemons, which in turn connect to a MySQL (MariaDB 10.4) database. Each server daemon has about 20 worker threads waiting to handle requests from the Apache modules, and each worker thread maintains a persistent connection to the database. I added a new table with the following schema:
CREATE TABLE MyTable
(
table_id INT NOT NULL AUTO_INCREMENT,
index_id INT NOT NULL,
int_column_1 INT DEFAULT 0,
decimal_column_1 DECIMAL(9,3) DEFAULT 0,
decimal_column_2 DECIMAL(9,3) DEFAULT 0,
varchar_column_1 varchar(3000) DEFAULT NULL,
varchar_column_2 varchar(3000) DEFAULT NULL,
deleted tinyint DEFAULT 0,
PRIMARY KEY (table_id),
KEY index_on_index_id (index_id)
)
Then I added the following CRUD operations:
1. RETRIEVE:
SELECT table_id, varchar_column_1, ... FROM MyTable WHERE index_id = ${given index_id}
2. CREATE:
INSERT INTO MyTable (index_id, varchar_column_2, ...) VALUES (${given}, ${given}, ...)
Note: This is done using a prepared statement because ${given varchar_column_2} is a user-entered value.
3. UPDATE:
UPDATE MyTable SET varchar_column_1 = ISNULL(${given varchar_column_2}, `varchar_column_2`) WHERE table_id = ${given table_id}
Note: This is also done using a prepared statement because ${given varchar_column_2} is a user-entered value. The ISNULL is a kludge for the possibility that the given varchar_column_2 is NULL, so that the column is simply set to the value already in the table (see the sketch after this list).
4. DELETE:
UPDATE MyTable SET deleted = 1 WHERE table_id = ${given table_id}
Finally, there is a delete index_id operation:
UPDATE MyTable SET deleted = 1 WHERE index_id = ${given index_id}
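For reference on the UPDATE kludge above: in MariaDB, ISNULL() takes a single argument, and the two-argument "fall back when NULL" pattern is spelled IFNULL() or COALESCE(). A minimal sketch of the intent as I understand it (keep the column's current value when the bound parameter is NULL; the ? stands for the user-supplied value, and this is only a sketch, not the exact production query):

-- Keep the existing value when the bound parameter is NULL
UPDATE MyTable
   SET varchar_column_1 = COALESCE(?, varchar_column_1)
 WHERE table_id = ?;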
This was deployed to a production server without proper testing. On that server, a script I wrote was run to fill MyTable with about 30,000 entries. Then, using the CRUD operations, about 600 updates, 50 creates, 20 deletes, and thousands of retrieves were performed on the table. The problem is that after an hour or two of these operations, an update operation would take 10+ minutes to execute. Eventually, all of the worker threads in the server daemon would be stuck waiting on the update operations, and any other requests to the daemon would time out. This behavior happened twice in one day and once more two days later.
There were three parts of this behavior that really confused me. First, all update operations on the database were blocked, so even if the daemon, or any daemon, was updating a different table in the database, that update would also take 10+ minutes. Second, SELECT operations executed instantly while all the update queries were taking 10+ minutes. Finally, after 10-20 minutes, all of the 20-odd pending update queries would execute successfully, the database would be correctly updated, and the threads would go back to working properly.
I received a dump of the database and ran EXPLAIN ${mysql query} for each of the new CRUD queries, and none produced strange results. In the Extra column, the only entry was "Using where" for the queries that have WHERE clauses. Another potential problem is the use of VARCHARs. Since the UPDATE operations are used the most and seem to be the ones causing the problem, I wondered whether the fact that the varchar values vary a lot in size (from 8 to 500 characters) might lead to some MySQL memory issue that causes the long execution times. I also thought there might be an issue with table-level locks, but running
SHOW STATUS LIKE 'Table%';
returned Table_locks_waited = 0.
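For reference, the checks above in runnable form, plus the InnoDB-level views that would show row-lock waits during a stall (a sketch assuming MariaDB 10.4, where the information_schema InnoDB lock tables are still available; 123 is a placeholder id):

-- EXPLAIN was run against each of the new CRUD queries, e.g.:
EXPLAIN SELECT table_id, varchar_column_1 FROM MyTable WHERE index_id = 123;

-- Table-level lock counters (these cover table locks only, not InnoDB row locks):
SHOW STATUS LIKE 'Table_locks%';

-- InnoDB row-lock waits, which Table_locks_waited does not reflect:
SHOW ENGINE INNODB STATUS;
SELECT * FROM information_schema.INNODB_LOCK_WAITS;
SELECT * FROM information_schema.INNODB_TRX;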
Unfortunately, no database monitoring was being done on the production server that was having issues; I only have the order of the transactions as they happened. Adding to the confusion, each time this issue occurred, the first update query that was blocked was an update to a different table in the database. It was the same query twice (though it is also the most common update query in the application), and it has been in the application for months without any issues.
I tried to reproduce the issue on a server with the same table and CRUD operations, but with only 600 entries in MyTable, making about 100 update requests, 20 create requests, 5 delete requests, and hundreds of retrieve requests. I could not reproduce the update queries taking 10+ minutes. This makes me think that the size of the table may have something to do with it.
I am looking for any suggestions on what might be causing this issue, or any ideas on how to better diagnose the problem.
Sorry for the extremely long question. I am a junior software engineer who is a little in over his head. Any help would be really appreciated, and I can provide any additional information about the database or application if needed.
Related
I have a table with just under 50 million rows. It hit the limit for INT (2147483647). At the moment the table is not being written to.
I am planning on changing the ID column from INT to BIGINT. I am using a Rails migration to do this with the following migration:
def up
execute('ALTER TABLE table_name MODIFY COLUMN id BIGINT(8) NOT NULL AUTO_INCREMENT')
end
I have tested this locally on a dataset of 2,000 rows and it worked OK. Should running the ALTER TABLE command across the 50 million rows be OK, given that the table is not being used at the moment?
I wanted to check before I run the migration. Any input would be appreciated, thanks!
We had exactly the same scenario, but with PostgreSQL, and I know how 50M rows can fill up the whole range of INT: it's the gaps in the ids, generated by deleting rows over time or by other factors such as incomplete transactions.
I will explain what we ended up doing, but first, seriously, testing a data migration for 50M rows on 2k rows is not a good test.
There can be multiple solutions to this problem, depending on factors such as which DB provider you are using. We were using Amazon RDS, which has limits on runtime and on what they call IOPS (input/output operations per second). If you run such an intensive query on a DB with such limits, it will run out of its IOPS quota midway through, and once the IOPS quota runs out the DB becomes too slow to be of much use. We had to cancel our query and let the IOPS catch up, which takes about 30 minutes to an hour.
If you have no such restrictions and have the DB on premises or similar, then there is another factor: can you afford downtime?
If you can afford downtime and have no IOPS-type restriction on your DB, you can run this query directly. It will take a lot of time (maybe half an hour or so, depending on many factors), and in the meantime the table will be locked as rows are being changed. Make sure not only that the table gets no writes, but also no reads during the process, so that it runs through to the end smoothly without any deadlock-type situation.
What we did to avoid downtime and the Amazon RDS IOPS limits:
In our case, we still had about 40M ids left when we realized the range was going to run out, and we wanted to avoid downtime. So we took a multi-step approach:
Create a new BIGINT column, named new_id or something (give it a unique index from the start); it will be nullable with a default of NULL.
Write background jobs that run a few times each night and backfill the new_id column from the id column. We were backfilling about 4-5M rows each night, and a lot more over weekends (our app had no traffic on weekends).
Once you are caught up backfilling, stop all access to the table (we just took our app down for a few minutes at night), and create a new sequence starting from the max(new_id) value, or reuse the existing sequence and bind it to the new_id column with a default of nextval of that sequence.
Switch the primary key from id to new_id; before doing that, make new_id NOT NULL.
Drop the id column.
Rename new_id to id.
Then resume your DB operations.
The above is a minimal write-up of what we did; you can find some nice articles about it with a web search. This approach is not new and is pretty common, so I am sure you will find MySQL-specific ones too, or you can adjust a couple of things from such an article and you should be good to go.
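Translated to MySQL (the steps above are PostgreSQL-flavoured), a rough, untested sketch might look like this; the batch size and index name are placeholders, and each ALTER on a 50M-row table can itself take a while, so measure on a realistic copy first:

-- 1. Add the new column, nullable, with a unique index from the start
ALTER TABLE table_name
  ADD COLUMN new_id BIGINT NULL DEFAULT NULL,
  ADD UNIQUE INDEX idx_new_id (new_id);

-- 2. Backfill in batches from a background job (repeat until no rows remain)
UPDATE table_name SET new_id = id WHERE new_id IS NULL LIMIT 100000;

-- 3. During the maintenance window: make new_id NOT NULL, drop the old column,
--    and swap new_id in as the auto-increment primary key
ALTER TABLE table_name MODIFY COLUMN new_id BIGINT NOT NULL;
ALTER TABLE table_name DROP PRIMARY KEY, DROP COLUMN id;
ALTER TABLE table_name CHANGE COLUMN new_id id BIGINT NOT NULL AUTO_INCREMENT, ADD PRIMARY KEY (id);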
I have an application using a MySQL database hosted on one machine and 6 clients running on other machines that read and write to it over a local network.
I have one main work table which contains about 120,000 items in rows to be worked on. Each client grabs 40 unallocated work items from the table (marking them as allocated), does the work and then writes back the results to the same work table. This sequence continues until there is no more work to do.
[Chart not reproduced here: it shows the time taken to write back each block of 40 results to the table from one of the clients using UPDATE queries.] The duration is fairly small for most of the run, but it suddenly jumps to 300 seconds and stays there until all the work completes. This rapid increase in query execution time towards the end is what I need help with.
The clients are not heavily loaded. The server is a little loaded but it has 16GB of RAM, 8 cores and is doing nothing other than hosting this db.
Here is the relevant SQL code.
Table creation:
CREATE TABLE work (
item_id MEDIUMINT,
item VARCHAR(255) CHARACTER SET utf8,
allocated_node VARCHAR(50),
allocated_time DATETIME,
result TEXT);
/* Then insert 120,000 items, which is quite fast. No problem at this point. */
INSERT INTO work VALUES (%s,%s,NULL,NULL,NULL);
Client allocating 40 items to work on:
UPDATE work SET allocated_node = %s, allocated_time=NOW()
WHERE allocated_node IS NULL LIMIT 40;
SELECT item FROM work WHERE allocated_node = %s AND result IS NULL;
Update the row with the completed result (this is the part that slows down dramatically after a few hours of running):
/* The chart above shows the time to execute 40 of these for each write back of results */
UPDATE work SET result = %s WHERE item = %s;
I'm using MySQL on Ubuntu 14.04, with all the standard settings.
The final table is about 160MB, and there are no indexes.
I don't see anything wrong with my queries and they work fine apart from the whole thing taking twice as long as it should overall.
Can someone with experience in these matters suggest any configuration settings I should change in MySQL to fix this performance issue or please point out any issues with what I'm doing that might explain the timing in the chart.
Thanks.
Without an index, the complete table is scanned. As the item id gets larger, a greater portion of the table has to be scanned to find the row to be updated.
I would try an index, perhaps even a primary key, on item_id.
Still, the increase in duration seems too high for such a machine and a relatively small database.
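If you go that route, the change would be something like the following (a sketch; it assumes item_id really is unique in the work table):

-- Lookups by id then use the index instead of scanning the whole table
ALTER TABLE work ADD PRIMARY KEY (item_id);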
Given that more details would be required for a proper diagnosis (see below), I see two potential explanations for the performance decrease here.
One is that you're running into a Schlemiel the Painter's Problem which you could ameliorate with
CREATE INDEX work_ndx ON work(allocated_node, item);
but it looks unlikely with so low a cardinality. MySQL shouldn't take so long to locate unallocated nodes.
A more likely explanation could be that you're running into a locking conflict of some kind between clients. To be sure, during those 300 seconds in which the system is stalled, run
SHOW FULL PROCESSLIST
from an administrator connection to MySQL. See what it has to say, and possibly use it to update your question. Also, post the result of
SHOW CREATE TABLE
against the tables you're using.
You should be doing something like this:
START TRANSACTION;
allocate up to 40 nodes using SELECT...FOR UPDATE;
COMMIT WORK;
-- The two transactions serve to ensure that the node selection can
-- never lock more than those 40 nodes. I'm not too sure of that LIMIT
-- being used in the UPDATE.
START TRANSACTION;
select those 40 nodes with SELECT...FOR UPDATE;
<long work involving those 40 nodes and nothing else>
COMMIT WORK;
If you use a single transaction and table level locking (even implicitly), it might happen that one client locks all others out. In theory this ought to happen only with MyISAM tables (that only have table-level locking), but I've seen threads stalled for ages with InnoDB tables as well.
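A concrete version of that two-transaction pattern might look like the following, assuming each client passes a unique node name ('client-07' is a placeholder) and that the allocation and the long work are kept in separate, short transactions:

-- Transaction 1: claim up to 40 unallocated rows, then release the locks right away
START TRANSACTION;
SELECT item_id FROM work WHERE allocated_node IS NULL LIMIT 40 FOR UPDATE;
UPDATE work SET allocated_node = 'client-07', allocated_time = NOW()
 WHERE allocated_node IS NULL LIMIT 40;
COMMIT;

-- Transaction 2: re-read only the claimed rows and do the long work on them
START TRANSACTION;
SELECT item FROM work
 WHERE allocated_node = 'client-07' AND result IS NULL
 FOR UPDATE;
-- ... long work involving those 40 items and nothing else ...
COMMIT;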
Your 'external locking' technique sounds fine.
INDEX(allocated_node) will help significantly for the first UPDATE.
INDEX(item) will help significantly for the final UPDATE.
(A compound index with the two columns will help only one of the updates, not both.)
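In SQL, adding the two suggested single-column indexes would look like this (the index names are placeholders):

-- Helps the allocation UPDATE (WHERE allocated_node IS NULL LIMIT 40)
ALTER TABLE work ADD INDEX idx_allocated_node (allocated_node);
-- Helps the result write-back (UPDATE work SET result = ... WHERE item = ...)
ALTER TABLE work ADD INDEX idx_item (item);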
The reason for the sudden increase: You are continually filling in big TEXT fields, making the table size grow. At some point the table is so big that it can no longer be cached in RAM, so each un-indexed update goes from scanning a cached table in memory to doing a full table scan against disk.
...; SELECT ... FOR UPDATE; COMMIT; -- The FOR UPDATE is useless since the COMMIT happens immediately.
You could play with the "40", though I can't think why a larger or smaller number would help.
My application accesses a local DB where it inserts records into a table (roughly 30-40 million a day). I have processes that run, process data, and do these inserts. Part of the process involves selecting an id from an IDs table, which is unique, and this is done using a simple
Begin Transaction
Select top 1 @id = siteid from siteids WITH (UPDLOCK, HOLDLOCK)
delete siteids where siteid = @id
Commit Transaction
I then immediately delete that id from that very table with a separate statement so that no other process grabs it. This is causing tremendous timeout issues, and with only 4 processes accessing it I am surprised. I also get timeout issues when checking my main post table to see whether a record was inserted using the above id. It runs fast, but with all the deadlocks and timeouts I think this indicates poor design and is a recipe for disaster.
Any advice?
EDIT
This is the actual statement that someone else here helped with. I then removed the delete and included it in my code as a separately executed statement. Will the ORDER BY clause really help here?
MySQL 5.1, Ubuntu 10.10 64bit, Linode virtual machine.
All tables are InnoDB.
One of our production machines uses a MySQL database containing 31 related tables. In one table, there is a field containing display values that may change several times per day, depending on conditions.
These changes to the display values are applied lazily throughout the day during usage hours. A script periodically runs, checks a few inexpensive conditions that may cause a change, and updates the display value if a condition is met. However, this lazy method doesn't catch all possible scenarios in which the display value should be updated, in order to keep background process load to a minimum during working hours.
Once per night, a script purges all display values stored in the table and recalculates them all, thereby catching all possible changes. This is a much more expensive operation.
This has all been running consistently for about 6 months. Suddenly, 3 days ago, the run time of the nightly script went from an average of 40 seconds to 11 minutes.
The overall proportions of the stored data have not changed in a significant way.
I have investigated as best I can, and the part of the script that is suddenly running slower is the last update statement that writes the new display values. It is executed once per row, given the (INT(11)) id of the row and the new display value (also an INT).
update `table` set `display_value` = ? where `id` = ?
The funny thing is that the purge of all the previous values is executed as:
update `table` set `display_value` = null
And this statement still runs at the same speed as always.
The display_value field is not indexed. id is the primary key. There are 4 other foreign keys in the table that are not modified at any point during execution.
And the final curveball: if I dump this schema to a test VM and execute the same script, it runs in 40 seconds, not 11 minutes. I have not attempted to rebuild the schema on the production machine, as that's simply not a long-term solution and I want to understand what's happening here.
Is something off with my indexes? Do they get cruft in them after thousands of updates on the same rows?
Update
I was able to completely resolve this problem by running OPTIMIZE on the schema. Since InnoDB doesn't support OPTIMIZE directly, this forced a table rebuild, which resolved the issue. Perhaps I had a corrupted index?
mysqlcheck -A -o -u <user> -p
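For a single table, the same rebuild can be requested directly. On InnoDB, OPTIMIZE TABLE is mapped internally to a recreate plus analyze, so either of these forces the rebuild (a sketch using the question's anonymized table name):

OPTIMIZE TABLE `table`;
-- or, equivalently for InnoDB:
ALTER TABLE `table` ENGINE=InnoDB;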
There is a chance that the UPDATE statement won't use the index on id; however, that's very improbable (if possible at all) for a query like yours.
Is there a chance your table is locked by a long-running concurrent query / DML? Which engine does the table use?
Also, updating the table record by record is not efficient. You can load your values into a temporary table in bulk and update the main table with a single command:
CREATE TEMPORARY TABLE tmp_display_values (id INT NOT NULL PRIMARY KEY, new_display_value INT);
INSERT
INTO tmp_display_values
VALUES
(?, ?),
(?, ?),
…;
UPDATE `table` dv
JOIN tmp_display_values t
ON dv.id = t.id
SET dv.display_value = t.new_display_value;
To start, a few details to describe the situation as a whole:
MySQL (5.1.50) database on a very beefy (32 CPU cores, 64GB RAM) FreeBSD 8.1-RELEASE machine which also runs Apache 2.2.
Apache gets an average of about 50 hits per second. The vast majority of these hits are API calls for a sale platform.
The API calls usually take about half a second or less to generate a result, but can take up to 30 seconds depending on third parties.
Each of the API calls stores a row in a database. The information stored there is important, but only for about fifteen minutes, after which it must expire.
In the table which stores API call information (the schema for this table is below), InnoDB row-level locking is used to synchronize between threads (Apache connections, really) requesting the same information at the same time, which happens often. This means that several threads may be waiting on a row lock for up to 30 seconds, since API calls can take that long (but usually don't); a sketch of this pattern appears after these points.
Above all, the most important thing to note is that everything works perfectly under normal circumstances.
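To make that row-level synchronization concrete, the pattern is presumably something like the following (an assumed sketch only; the actual application queries are not shown in this question):

START TRANSACTION;
-- Other threads asking for the same identifier/zip/income block here until the
-- first thread commits: this is the up-to-30-second row-level wait described above.
SELECT sale_id, start_time, end_time
  FROM sales
 WHERE identifier = ? AND zip_code = ? AND income = ?
 FOR UPDATE;
-- ... call the third party if no fresh row exists, then store the result ...
COMMIT;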
That said, this is the very highly used table (fifty or so INSERTs per second, many SELECTs, row-level locking is utilized) I'm running the DELETE query on:
CREATE TABLE `sales` (
`sale_id` int(32) unsigned NOT NULL auto_increment,
`start_time` int(20) unsigned NOT NULL,
`end_time` int(20) unsigned default NULL,
`identifier` char(9) NOT NULL,
`zip_code` char(5) NOT NULL,
`income` mediumint(6) unsigned NOT NULL,
PRIMARY KEY USING BTREE (`sale_id`),
UNIQUE KEY `SALE_DATA` (`identifier`,`zip_code`,`income`),
KEY `SALE_START` USING BTREE (`start_time`)
) ENGINE=InnoDB DEFAULT CHARSET=ascii ROW_FORMAT=FIXED
The DELETE query looks like this, and is run every five minutes on cron (I'd prefer to run it once per minute):
DELETE FROM `sales` WHERE
`start_time` < UNIX_TIMESTAMP(NOW() - INTERVAL 30 MINUTE);
I've used INT for the time field because it is apparent that MySQL has trouble using indexes with DATETIME fields.
So this is the problem: The DELETE query seems to run fine the majority of the time (maybe 7 out of 10 times). Other times, the query finishes quickly, but MySQL seems to get choked up for a while afterwards. I can't exactly prove it's MySQL that is acting up, but the times the symptoms happen definitely coincide with the times this query is run. Here are the symptoms while everything is choked up:
Logging into MySQL and using SHOW FULL PROCESSLIST;, there are just a few INSERT INTO sales ... queries running, where normally there are more than a hundred. What's abnormal here is actually the lack of any tasks in the process list, rather than there being too many. It seems MySQL stops taking connections entirely.
Checking Apache server-status, Apache has reached MaxClients. All threads are in "Sending reply" status.
Apache begins using lots of system time CPU. Load averages shoot way up, I've seen 1-minute load averages as high as 100. Normal load average for this machine is around 15. I see that it's using system CPU (as opposed to user CPU) because I use GKrellM to monitor it.
In top, there are many Apache processes using lots of CPU.
The web site and API (served by Apache of course) are unreachable most of the time. Some requests go through, but take around three or four minutes. Other requests reply after a time with a "Can't connect to MySQL server through /tmp/mysql.sock" error - this is the same error as I get when MySQL is over capacity and has too many connections (only it doesn't actually say too many connections).
MySQL accepts a maximum of 1024 connections, mysqltuner.pl reports "[!!] Highest connection usage: 100% (1025/1024)", meaning it's taken on more than it could handle at one point. Generally under normal conditions, there are only a few hundred concurrent MySQL connections at most. mysqltuner.pl reports no other issues, I'd be happy to paste the output if anybody wants.
Eventually, after about a minute or two, things recover on their own without any intervention. CPU usage goes back to normal, Apache and MySQL resume normal operations.
So, what can I do? :) How can I even begin to investigate why this is happening? I need that DELETE query to run for various reasons, why do things go bonkers when it's run (but not all the time)?
A hard one. This is not an answer but the start of a brainstorm.
I would say, maybe, a re-index problem on delete; in the docs we can find DELETE QUICK followed by OPTIMIZE TABLE, to try to avoid the repeated index-leaf merging.
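As a sketch of that suggestion (DELETE ... QUICK and OPTIMIZE TABLE are both real statements; note that OPTIMIZE TABLE rebuilds the table and blocks writes while it runs, so it would have to go in an off-peak window):

-- Periodic cleanup with the QUICK modifier
DELETE QUICK FROM sales
 WHERE start_time < UNIX_TIMESTAMP(NOW() - INTERVAL 30 MINUTE);

-- Off-peak: defragment / rebuild the table and its indexes
OPTIMIZE TABLE sales;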
Another possibility is a chain of deadlocks on the delete with at least one other thread: row locks could pause the delete operation, and the delete operation could pause the next row locks. Then you get either a detected deadlock, or an undetected one and therefore a timeout. How do you detect such concurrency-aborted exceptions? Do you re-run your transactions? If your threads take a lot of different row locks within the same transactions, chances are the first deadlock will impact more and more threads (a traffic jam).
Have you tried locking the table in the delete transaction? Check the manual for the way of locking tables in a transaction with InnoDB, or for getting a shared lock on all rows. It may take some time to get the table to yourself, but if your delete is fast, no one will notice you've held the table for only a second.
Even if you haven't tried that, it may be what the delete is effectively doing. Check the documentation on implicit locks as well: your delete query should be using the start_time index, so I'm fairly sure your current delete is not locking all rows (not completely sure; InnoDB locks all examined rows, not only the rows matching the WHERE condition), but the delete is almost certainly blocking inserts. Examples of deadlocks with transactions performing deletes are explained there. Good luck! For me it's too late in the day to work through all the lock-isolation impacts.
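If you do experiment with an explicit table lock around the delete, a minimal sketch would be the following; keep in mind that LOCK TABLES commits any open transaction and blocks every other reader and writer while it is held:

LOCK TABLES sales WRITE;   -- nobody else can read or write `sales` until UNLOCK
DELETE FROM sales
 WHERE start_time < UNIX_TIMESTAMP(NOW() - INTERVAL 30 MINUTE);
UNLOCK TABLES;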
Edit: you could try changing your DELETE to an UPDATE that sets deleted = 1, and perform the real delete at low-usage times (if you have any), changing the client queries to check this indexed deleted status.
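A sketch of that soft-delete variant; the deleted column and the index name are hypothetical additions to the schema above:

-- One-time schema change
ALTER TABLE sales
  ADD COLUMN deleted TINYINT NOT NULL DEFAULT 0,
  ADD INDEX idx_deleted (deleted);

-- Every few minutes: mark expired rows instead of deleting them
UPDATE sales SET deleted = 1
 WHERE deleted = 0
   AND start_time < UNIX_TIMESTAMP(NOW() - INTERVAL 30 MINUTE);

-- Off-peak: physically remove what was marked
DELETE FROM sales WHERE deleted = 1;

Client queries would then add AND deleted = 0 to their WHERE clauses, as suggested above.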