I'm getting VERY bad performance with UPDATE on MySQL. My update statement is quite basic:
UPDATE `tbl_name`
SET `field1` = 'value1', `field2` = 'value2' .. `fieldN` = 'valueN'
WHERE `tbl_name`.`id` = 123;
There are only a few values (15), all of type TEXT, and the WHERE condition is a single match on id.
The values are JSON strings (but this should not matter to MySQL, which should treat them as plain text).
"tbl_name" has few records (around 4k).
The problem is that this UPDATE statement takes 8 seconds to execute (time taken from the MySQL slow query log).
I'm running MySQL alone on an EC2 High-CPU Medium instance, and I find it hard to believe this performance is "normal"; I would expect much better.
Do you have any idea how to investigate the problem?
** UPDATE **
Thank you for your fast answers, table is InnoDB and id is a PRIMARY, UNIQUE. Values are TEXT (not varchar)
** UPDATE bis **
No, id is an integer, all other fields are TEXT
Since MySQL does not support EXPLAIN for UPDATE statements before version 5.6.3, we're quite blind about this query. Try a USE INDEX hint...
I've run the same statement on my server. Everything was fine with 15 TEXT fields and 4096 rows of quite arbitrary text. It was fine with both USE INDEX(PRIMARY) and IGNORE INDEX(PRIMARY) hints.
So I suppose the problem is with your SQL server, installation package, or something else, not the query...
I have run into an issue when using a MySQL database where, after creating a new table and adding CRUD query logic to my web application (with a backend written in C), update queries will sometimes take 10-20 minutes to execute.
The web application has Apache modules that talk to server daemons that have a connection to a MySQL (MariaDB 10.4) database. The server daemons each have about 20 worker threads, waiting to handle any requests from the Apache modules. The worker threads maintain a constant connection to the database. I added a new table with the following schema:
CREATE TABLE MyTable
(
table_id INT NOT NULL AUTO_INCREMENT,
index_id INT NOT NULL,
int_column_1 INT DEFAULT 0,
decimal_column_1 DECIMAL(9,3) DEFAULT 0,
decimal_column_2 DECIMAL(9,3) DEFAULT 0,
varchar_column_1 varchar(3000) DEFAULT NULL,
varchar_column_2 varchar(3000) DEFAULT NULL,
deleted tinyint DEFAULT 0,
PRIMARY KEY (table_id) ,
KEY index_on_index_id (index_id)
)
Then I added the following CRUD operations:
1. RETRIEVE:
SELECT table_id, varchar_column_1, ... FROM MyTable WHERE index_id = ${given index_id}
2. CREATE:
INSERT INTO MyTable (index_id, varchar_column_2, ...) VALUES ( ${given}, ${given})
Note: This is done using a prepared statement because ${given varchar_column_2} is a user-entered value.
3. UPDATE:
UPDATE MyTable SET varchar_column_1 = ISNULL(${given varchar_column_2}, `varchar_column_2`) WHERE table_id = ${given table_id}
Note: This is also done using a prepared statement because ${given varchar_column_2} is a user-entered value. Also, the isnull is a kludge for the possibility that the given varchar_column_2 might be null, in which case the column is just set to the value already in the table.
4. DELETE:
UPDATE MyTable SET deleted = 1 WHERE table_id = ${given table_id}
Finally, there is a delete index_id operation:
UPDATE MyTable SET deleted = 1 WHERE index_id = ${given index_id }
This was deployed to a production server without proper testing. On that production server, a script I wrote was run that filled MyTable with about 30,000 entries. Then, using the CRUD operations, about 600 updates, 50 creates, 20 deletes, and thousands of retrieves were performed on the table. The problem is that after some time (an hour or two) of these operations being performed, the update operation would take 10+ minutes to execute. Eventually, all of the worker threads in the server daemon would be stuck waiting on update operations, and any other requests to the daemon would time out. This behavior happened twice in one day and once more two days later.
There were three parts of this behavior that really confused me. One is that all update operations on the database were being blocked. So even if the daemon, or any daemon, was updating a different table in the database, that update would take 10+ minutes. The next is that select operations would execute instantly while all the update queries were taking 10+ minutes. Finally, after 10-20 minutes, all of the 20-ish update queries would successfully execute, the database would be correctly updated, and the threads would go back to working properly.
I received a dump of the database and ran EXPLAIN ${mysql query} for each of the new CRUD queries, and none produced strange results. In the "Extra" column, the only entry was "Using where" for the queries that have WHERE clauses. Another potential problem is the use of varchars. Since the UPDATE operations are used the most and are the ones that seem to be causing the problem, I thought that because the varchars change size a lot (they range from 8 chars to 500 chars), the server might run into some memory issue that causes the long execution time. I also thought there might be an issue with table-level locks, but running
SHOW STATUS LIKE 'table%';
returned table_locks_waited = 0.
Unfortunately, no database monitoring was being done on the production server that was having issues; I only have the order of the transactions as they happened. From that, each time this issue occurred, the first update query that was blocked was an update to a different table in the database. It was the same query twice (it is also the most common update query in the application), but it has been in the application for months without any issues.
I tried to reproduce this issue on a server with the same table and CRUD operations, but with only 600 entries in MyTable, making about 100 update requests, 20 create requests, 5 delete requests, and hundreds of get requests. I could not reproduce the update queries taking 10+ minutes. This makes me think that maybe the size of the table has something to do with it.
I am looking for any suggestions on what might be causing this issue, or any ideas on how to better diagnose the problem.
Sorry for the extremely long question. I am a junior software engineer who is in a little over his head. Any help would be really appreciated. I can also provide any additional information about the database or application if needed.
We have just migrated from MySQL to PostgreSQL. A particular set of rows is heavily updated every minute. For the whole period when the product was running on MySQL we had no issues, but after moving to PostgreSQL we have faced many deadlocks.
Table structure.
Create table tab(col1 int , col2 int , col3 int, PRIMARY KEY(col1));
No index.
Deadlock query -
Update tab set col2=col2+1 where col3=xx;
(Yes, the condition matches more than one row.)
My question: how did MySQL handle this situation to avoid deadlocks? (I am asking on the assumption that the problem in PostgreSQL with this query is that the rows are visited in a different order each time a concurrent update happens.)
I may have hit deadlocks in MySQL too, but definitely not to the extent that I have with PostgreSQL.
I have already gone through the question posted at https://dba.stackexchange.com/questions/151813/why-can-mysql-handle-multiple-updates-concurrently-and-postgresql-cant
The answer posted there was not very convincing, as the author mostly complained about the update architecture of PostgreSQL and HOT updates.
I want to know the difference in architecture that enables MySQL to avoid this problem.
At a guess, MySQL (presumably with InnoDB tables) is probably doing the updates in a consistent order each time, while PostgreSQL's access is generally unordered. This makes sense, given that InnoDB uses index-organized tables while PostgreSQL uses heaps.
PostgreSQL unfortunately does not support UPDATE ... ORDER BY. You can take a row-lock before you UPDATE to ensure reliable ordering at the cost of an extra round-trip, e.g.
BEGIN;
SELECT 1 FROM tab WHERE col3 = xx ORDER BY col1 FOR UPDATE;
UPDATE tab SET col2=col2+1 WHERE col3=xx;
COMMIT;
(I'd love to have UPDATE ... ORDER BY support in PostgreSQL. Patches welcome!)
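The ordering argument can be illustrated outside the database. The following is only a sketch in Python, with threading.Lock objects standing in for row locks; update_rows and run_demo are hypothetical names, and nothing here reflects how PostgreSQL or InnoDB actually implement locking. Each worker is handed the same rows in a different order, but always acquires the locks in sorted key order, so no worker ever holds a later lock while waiting for an earlier one.

```python
import threading


def update_rows(locks, counters, row_ids):
    """Lock every row in ascending key order, then increment its counter."""
    ordered = sorted(row_ids)  # the consistent order is what prevents deadlock
    for rid in ordered:
        locks[rid].acquire()
    try:
        for rid in ordered:
            counters[rid] += 1
    finally:
        for rid in reversed(ordered):
            locks[rid].release()


def run_demo(workers=4, rounds=200):
    """Run concurrent workers that each see the rows in a different order."""
    row_ids = [3, 1, 2]
    locks = {rid: threading.Lock() for rid in row_ids}
    counters = {rid: 0 for rid in row_ids}

    def worker(visit_order):
        for _ in range(rounds):
            update_rows(locks, counters, visit_order)

    threads = []
    for i in range(workers):
        k = i % len(row_ids)
        # Rotate the list so each worker receives a different visiting order.
        visit_order = row_ids[k:] + row_ids[:k]
        threads.append(threading.Thread(target=worker, args=(visit_order,)))
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counters
```

If the sorted() call were removed, two workers could each hold one lock while waiting for the other's, which is the same shape of deadlock that concurrent unordered multi-row updates can produce.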
I've come across a situation where I need to select a huge amount of data (say 100k records which look like ID | {"points":"9","votes":"2","breakdown":"0,0,0,1,1"}), process it in PHP and then put it back. The question is about putting it back efficiently. I saw a solution using INSERT ... ON DUPLICATE KEY UPDATE, and I saw a solution with UPDATE using CASE. Are there any other solutions? Which would be the most efficient way to update a huge data array?
The better choice is a simple update.
When you write data back through an insert that hits a duplicate key, the DB does more work: it tries to insert, verifies constraints, raises the duplicate-key condition, updates the row, and verifies constraints again.
Update
I ran tests on my local PC for INSERT INTO ... ON DUPLICATE KEY UPDATE and UPDATE statements against a table with 43k rows.
The first approach (INSERT ... ON DUPLICATE KEY UPDATE) was about 40% faster.
But both finished in under 1.5s. I suppose your PHP code will be the bottleneck of your approach, and you should not worry about the speed of the MySQL statements. Of course, this holds only if your table is not huge and does not have tens of millions of rows.
Update 2
My local PC uses MySQL 5.6 in default configuration.
CPU: 8Gb
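For reference, here is a minimal sketch of how the client side could build one multi-row INSERT ... ON DUPLICATE KEY UPDATE statement per batch instead of sending one statement per record. It is written in Python rather than the PHP from the question, and the function name build_upsert, the table name scores, and the column names are all hypothetical. The %s placeholders follow the parameter style used by the common Python MySQL drivers; the code only builds the statement and its parameter list and does not talk to a database.

```python
def build_upsert(table, key_col, value_col, rows):
    """Return (sql, params) for one multi-row INSERT ... ON DUPLICATE KEY UPDATE.

    rows is a sequence of (key, value) pairs. Identifiers are interpolated
    directly, so they must come from trusted code, never from user input.
    """
    if not rows:
        raise ValueError("rows must not be empty")
    # One "(%s, %s)" group per row; the values travel separately as parameters.
    placeholders = ", ".join(["(%s, %s)"] * len(rows))
    sql = (
        f"INSERT INTO `{table}` (`{key_col}`, `{value_col}`) "
        f"VALUES {placeholders} "
        f"ON DUPLICATE KEY UPDATE `{value_col}` = VALUES(`{value_col}`)"
    )
    params = [value for row in rows for value in row]
    return sql, params
```

With 100k records it is probably worth splitting the rows into batches of a few thousand so that each statement stays under the server's maximum packet size; the right batch size is something to measure rather than assume.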
MySQL 5.1, Ubuntu 10.10 64bit, Linode virtual machine.
All tables are InnoDB.
One of our production machines uses a MySQL database containing 31 related tables. In one table, there is a field containing display values that may change several times per day, depending on conditions.
These changes to the display values are applied lazily throughout the day during usage hours. A script periodically runs and checks a few inexpensive conditions that may cause a change, and updates the display value if a condition is met. However, in order to keep background process load to a minimum during working hours, this lazy method doesn't catch all possible scenarios in which the display value should be updated.
Once per night, a script purges all display values stored in the table and recalculates them all, thereby catching all possible changes. This is a much more expensive operation.
This has all been running consistently for about 6 months. Suddenly, 3 days ago, the run time of the nightly script went from an average of 40 seconds to 11 minutes.
The overall proportions of the stored data have not changed in a significant way.
I have investigated as best I can, and the part of the script that is suddenly running slower is the last update statement that writes the new display values. It is executed once per row, given the (INT(11)) id of the row and the new display value (also an INT).
update `table` set `display_value` = ? where `id` = ?
The funny thing is, that the purge of all the previous values is executed as:
update `table` set `display_value` = null
And this statement still runs at the same speed as always.
The display_value field is not indexed. id is the primary key. There are 4 other foreign keys in table that are not modified at any point during execution.
And the final curve ball: If I dump this schema to a test VM, and execute the same script it runs in 40 seconds not 11 minutes. I have not attempted to rebuild the schema on the production machine, as that's simply not a long term solution and I want to understand what's happening here.
Is something off with my indexes? Do they get cruft in them after thousands of updates on the same rows?
Update
I was able to completely resolve this problem by running optimize on the schema. Since InnoDB doesn't support optimize, this forced a rebuild, and resolved the issue. Perhaps I had a corrupted index?
mysqlcheck -A -o -u <user> -p
There is a chance that the UPDATE statement won't use an index on id; however, that is very improbable (if possible at all) for a query like yours.
Is there a chance your table is locked by a long-running concurrent query / DML? Which engine does the table use?
Also, updating the table record by record is not efficient. You can load your values into a temporary table in bulk and update the main table with a single command:
CREATE TEMPORARY TABLE tmp_display_values (id INT NOT NULL PRIMARY KEY, new_display_value INT);
INSERT
INTO tmp_display_values
VALUES
(?, ?),
(?, ?),
…;
UPDATE `table` dv
JOIN tmp_display_values t
ON dv.id = t.id
SET dv.display_value = t.new_display_value;
Is running an update using WHERE pkey IN () more efficient than individual update statements?
update table_name set col='val' where primary_key in (..)
vs
update table_name set col='val' where primary_key = xx1
update table_name set col='val' where primary_key = xx2
...
There will be 1000s of updates on a table with millions of rows.
Yes, IN () is much faster, as the query optimizer can do one pass over the key index to update many rows in one hit. As long as there isn't a SELECT in the brackets, it will be faster.
As to how many ids to pack into the brackets, find out the max packet size for your deployment server, and work it out based on the longest an INT can be in base-10 digits.
It's almost always better to execute fewer queries rather than more, because that gives the server the opportunity to optimize the operation. That said, I don't think any server will let you pass millions of value arguments to an IN() clause, so you might want to batch them up, updating perhaps 50 at a time, or something like that.
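The batching idea can be sketched as follows. This is an illustration in Python, not part of the original answers: the function name batched_in_updates and the default batch size are assumptions, and the %s placeholders follow the parameter style of the common Python MySQL drivers. The generator only produces (sql, params) pairs; executing them on a cursor is left to the caller.

```python
def batched_in_updates(table, set_col, key_col, value, ids, batch_size=1000):
    """Yield (sql, params) pairs, each updating at most batch_size keys.

    Identifiers are interpolated directly and must come from trusted code;
    the new value and the keys are passed as parameters.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    ids = list(ids)
    for start in range(0, len(ids), batch_size):
        chunk = ids[start:start + batch_size]
        # One placeholder per key in this chunk.
        placeholders = ", ".join(["%s"] * len(chunk))
        sql = (
            f"UPDATE `{table}` SET `{set_col}` = %s "
            f"WHERE `{key_col}` IN ({placeholders})"
        )
        yield sql, [value] + chunk
```

A caller would loop over the generator and run each pair with something like cursor.execute(sql, params), committing per batch or once at the end depending on how long it is acceptable to hold the row locks.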
I think IN will be faster, but it will be limited by the number of entries. You will not be able to create a statement with 100K values in most cases. Parsing time can also become significant, since the database's parser might not be optimized for very large statements.
At the same time, I would like to say that updating by primary key is an inherently expensive operation. Beyond some percentage of changed rows, it will be faster to create a new table with the updated data and then reindex it.