Magento 1.8: Lock wait timeout issues when customer is checking out - mysql

My website is experiencing issues at checkout. I'm using Magento Enterprise 1.8 and my checkout module is Idev's Onestepcheckout.
The issue we are seeing is that the eav_entity_store table is taking an exceedingly long time (up to 51 seconds) to return an order number to Mage_Eav_Model_Entity_Type.
What I do know is that the query that fetches it runs inside a transaction as SELECT ... FOR UPDATE, so the row being accessed is locked until the transaction completes. I've looked at other parts of the code, as well as the PHP code that runs during the transaction while the row is locked (we're using InnoDB, so the lock should be released once the transaction is committed). I'm just not seeing anything there (or in the slow query logs) that should cause a lock wait anywhere near 51 seconds.
I have considered that requests may be stacking up and slowly creeping up in time as they wait, but I'm seeing the query time jump from 6 ms to 20,000 ms to 50,000 ms. It isn't a case of 100-200 requests stacking up, as there are only a few dozen of these a day.
I'm aware that MySQL uses parent locking, but there are no FKs related to this table whatsoever. There are two BTREE indexes that at one point were FKs but have since been altered (that happened years ago). For those who aren't Magento-savvy, the eav_entity_store table has fewer than 50 rows and is only 5 columns wide (4 smallint and a varchar). I seriously doubt table size or improper indexing is the culprit. In the spirit of TL;DR, however, I will say that the two BTREE indexes are on the two columns by which we select from this table.
One possibility is that I may need to replace the two indexes with a compound index, as the ONLY reads from this table come from a query that filters on both columns (WHERE [Column with Index A] AND [Column with Index B]). I simply don't know whether row-level locking, with the indexes currently on the table, would prevent this query from accessing another row in the table.
At this point, I've become convinced that the underlying issue is strictly DB-related, but any Magento or MySQL advice on this would be greatly appreciated. Anybody still reading this can hopefully appreciate that I have already exhausted a number of options and am seriously stumped. Any info that you think may help is welcome. Thanks.
Edit: The exact error we are seeing is:
Error message: SQLSTATE[HY000]: General error: 1205 Lock wait timeout exceeded; try restarting transaction

Issue solved. It wasn't a problem with MySQL. For some reason, generating invoice numbers was taking an obscene amount of time. The company doesn't use invoices from Magento, so we turned them off and the problem was solved. No full RCA was done on what specifically was wrong with invoice generation.
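The resolution fits a common pattern. Whatever runs between the SELECT ... FOR UPDATE and the COMMIT keeps the row locked, so slow work in that window (here apparently invoice numbering, although no RCA was done) makes every other checkout wait. The roughly 51-second waits in the question are also close to InnoDB's default innodb_lock_wait_timeout of 50 seconds, which is when error 1205 is raised. The Python sketch below is only an analogy. A threading.Lock stands in for the row lock, and the function name and the scaled-down timings are hypothetical.

```python
import threading
import time


def second_checkout_result(hold_seconds: float, lock_wait_timeout: float) -> str:
    """Simulate two checkouts competing for the same locked sequence row.

    A threading.Lock stands in for the InnoDB row lock taken by
    SELECT ... FOR UPDATE; hold_seconds is the work done before COMMIT.
    """
    row_lock = threading.Lock()
    lock_taken = threading.Event()

    def first_checkout() -> None:
        with row_lock:                # SELECT ... FOR UPDATE takes the lock
            lock_taken.set()
            time.sleep(hold_seconds)  # slow work inside the transaction
        # leaving the with-block plays the role of COMMIT releasing the lock

    holder = threading.Thread(target=first_checkout)
    holder.start()
    lock_taken.wait()  # make sure the first checkout holds the lock first

    # The second checkout waits at most lock_wait_timeout for the same row.
    acquired = row_lock.acquire(timeout=lock_wait_timeout)
    if acquired:
        row_lock.release()
    holder.join()
    return "ok" if acquired else "1205 Lock wait timeout exceeded"


print(second_checkout_result(hold_seconds=0.5, lock_wait_timeout=0.1))
print(second_checkout_result(hold_seconds=0.01, lock_wait_timeout=0.5))
```

In this model the fix is to shorten (or, as here, remove) the work done while the lock is held. Raising the timeout would likely only hide the waits.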

Related

Deleting old records - MySQL

I am currently looking for a solution to a basic problem: deleting old records.
To explain the situation, I have a table, which I'll call table1, that holds only a small number of records. It usually stays empty because it is used to relay messages. These messages are read within two seconds of being added to the database, and then deleted so that they aren't read again.
However, if one of the clients supposed to receive the messages from table1 goes offline, several messages can become pending. Sometimes hundreds. Sometimes thousands, or even hundreds of thousands, if not more.
Not only does this hurt the performance of the client, which will have to process a huge number of messages, it also hurts the performance of the database, which is kept in memory and should hold as few records as possible.
Given that the clients check for new messages every second, what would be the best way to delete old records? I've thought about adding timestamps, but won't computing a timestamp on every insert hurt performance? I've tried it out, and all those queries ended up in the slow query log.
What would the best solution be? I've thought about something like checking whether the table was altered in the past 5 seconds. If it wasn't, we can be sure that all messages that should be relayed have already been relayed, and the table can be wiped. But how can this be done?
I've thought about events running every couple of minutes, but I'm not sure how to implement something that would have no (or negligible) impact on the SELECT/INSERT/DELETE queries.
PS: This came up when I noticed that some clients were offline and there were 8 million messages pending.
EDIT:
I had forgotten to mention that the storage engine is MEMORY, so all records are kept in RAM. That's the main reason I want to get rid of these records: keeping millions of records that shouldn't even be there in RAM has an impact on system resources.
Here is an extract from the slow query log:
# Query_time: 0.000283 Lock_time: 0.000070 Rows_sent: 0 Rows_examined: 96
SET timestamp=1387199997;
DELETE FROM messages WHERE clientid='100';
[...]
# Query_time: 0.000178 Lock_time: 0.000054 Rows_sent: 0 Rows_examined: 96
SET timestamp=1387199998;
DELETE FROM messages WHERE clientid='14';
So I guess they do have a fairly small delay, but is it in any way meaningful in MySQL? In "real life", 0.0003 s could be ignored as insignificant. Can the same be said about MySQL, with connections that have approximately 10 ms ping?
Your question is interesting, but hasn't a lot of detail, so I can only give general points of view.
Firstly - there exist already a number of message queuing solutions which may do what you need out of the box. They hide the underlying implementation of data storage, clean-up etc. and allow you to focus on the application logic. RabbitMQ is a popular open source option.
Secondly, unless you are working with constrained hardware, 100s of thousands of records in a MySQL table is not a performance problem in most cases, nor is generating a time stamp on insert. So, I would recommend building a solution that's obvious and straightforward (and therefore less error prone) - add a timestamp column to your message table, and find a way of removing messages older than 5 minutes. You could add this to the logic which cleans up the records after delivery. As long as your queries are hitting indexed columns, I don't think you have to worry about hundreds of thousands of records.
I would put some energy into creating a performance test suite that allows you to experiment with solutions and see which is really faster. That can be tricky, especially if you want to test scenarios with multiple clients, but you will learn a lot more about the performance characteristics of the app by working through those scenarios.
EDIT:
You can have one column in your table automatically set a timestamp value - I've always found this to be extremely fast. As in - it's never been a problem on very large tables (tens of millions of rows).
I've not got much experience with the MEMORY storage engine, but the MySQL documentation suggests that data modification actions (like INSERT, UPDATE or DELETE) can be slow due to locking time. That's borne out by your statistics, where the lock time is roughly 25 to 30% of the total.
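Those percentages can be checked directly against the two slow-log entries quoted in the question. Lock time is about a quarter to a third of each query's total, but the whole query takes well under a millisecond, which is a small fraction of the ~10 ms round trip the asker mentions. The function name and the 10 ms default below are just for illustration.

```python
def log_ratios(query_time: float, lock_time: float, round_trip: float = 0.010):
    """Return (lock share of query time, query time as a share of one round trip)."""
    return lock_time / query_time, query_time / round_trip


# Query_time and Lock_time values (seconds) from the slow-log extract above.
for query_time, lock_time in [(0.000283, 0.000070), (0.000178, 0.000054)]:
    lock_share, vs_ping = log_ratios(query_time, lock_time)
    print(f"lock {lock_share:.0%} of query time; "
          f"query {vs_ping:.1%} of a 10 ms round trip")
```

At these figures, network latency rather than lock time dominates what a client sees per query.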
I've had a similar problem.
A couple of questions: First, how long should undelivered messages dwell in the system? Forever? A day? Ten seconds?
Second, what is the consequence of erroneously deleting an undelivered message? Does it cause the collapse of the global banking system? Does it cause a hospital patient not to receive a needed injection? Or does a subsequent message simply cover for the missing one?
The best situation is short dwell time and low error consequence. If the error consequence is high, none of this is wise.
Setting up the solution took several steps for me.
First, write some code to fetch the max id from the messages table.
SELECT MAX(message_id) AS max_message_id FROM message
Then, an hour later, or ten seconds, or a day, or whatever, delete all the messages with id numbers less than or equal to the one recorded on the previous run.
DELETE FROM message WHERE message_id <= ?max_message_id
If all is functioning correctly, there won't be anything to delete. But if you have a bunch of stale messages for a client that's gone walkabout, pow, they're gone.
Finally, before putting this into production, wait for a quiet moment in your system, and, just once, issue the command
TRUNCATE TABLE message
to clear out any old rubbish in the table.
You can do this with an event (a stored job in the MySQL database) by creating a little one-row, one-column table to store the max_message_id.
EDIT
You can also alter your table to add a message_time column, in such a way that it gets set automatically whenever you insert a row. Issue these three statements at a time when your system is quiet and you can afford to trash all extant messages.
TRUNCATE TABLE message;
ALTER TABLE message ADD COLUMN message_time TIMESTAMP
NOT NULL
DEFAULT CURRENT_TIMESTAMP;
ALTER TABLE message ADD INDEX message_time (message_time);
Then you can just use a single statement to clean out the old records, like so.
DELETE FROM message WHERE message_time <= NOW() - INTERVAL 1 HOUR
(or whatever interval is appropriate). You should definitely alter an empty or almost-empty table because it takes time to alter lots of rows.
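The same timestamp approach can be sketched with Python's sqlite3 module, so it runs without a MySQL server. SQLite has no TIMESTAMP type or INTERVAL arithmetic, so this version (an assumption, not the answer's exact schema) stores epoch seconds that default to the current time, and passes the cutoff in as a parameter. The indexed message_time column plays the same role as above. The function names are hypothetical.

```python
import sqlite3
import time


def create_message_table(conn: sqlite3.Connection) -> None:
    # message_time defaults to "now" in epoch seconds, like DEFAULT CURRENT_TIMESTAMP.
    conn.execute(
        "CREATE TABLE message ("
        " message_id INTEGER PRIMARY KEY AUTOINCREMENT,"
        " clientid TEXT NOT NULL,"
        " message_time INTEGER NOT NULL"
        "   DEFAULT (CAST(strftime('%s', 'now') AS INTEGER)))"
    )
    conn.execute("CREATE INDEX message_time ON message (message_time)")


def delete_older_than(conn: sqlite3.Connection, max_age_seconds: int, now: float) -> int:
    """Delete messages at least max_age_seconds old; return how many were removed."""
    cutoff = int(now) - max_age_seconds
    deleted = conn.execute(
        "DELETE FROM message WHERE message_time <= ?", (cutoff,)
    ).rowcount
    conn.commit()
    return deleted


conn = sqlite3.connect(":memory:")
create_message_table(conn)
now = time.time()
conn.executemany(
    "INSERT INTO message (clientid, message_time) VALUES (?, ?)",
    [("100", int(now) - 7200), ("100", int(now) - 10)],  # 2 hours old, 10 s old
)
conn.execute("INSERT INTO message (clientid) VALUES ('14')")  # uses the default
print(delete_older_than(conn, 3600, now))  # removes only the 2-hour-old row
```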
This is a good solution because there's a chance that you don't have to alter your message-processing client code at all. (Of course, if you did SELECT * anywhere, you probably will have to alter it. Pro-tip: never use SELECT * in application code.)

Mysql Lock times in slow query log

I have an application that has been running fine for quite a while, but recently a couple of queries have started popping up in the slow query log.
All the queries are complex, ugly multi-join SELECT statements that could use refactoring. I believe all of them involve BLOBs, meaning their temporary tables get written to disk. What makes me curious is why some of them have a lock time associated with them. None of the queries use any specific locking set by the application. As far as I know, by default you can read against locks unless explicitly specified otherwise.
So my question is: what scenarios would cause a SELECT statement to wait for a lock (and thereby be reported in the slow query log)? Assume both InnoDB and MyISAM environments.
Could the disk interaction be listed as some sort of lock time? If yes, is there documentation around that says this?
thanks in advance.
MyISAM will give you concurrency problems: the entire table is locked while an insert is in progress.
InnoDB should have no problems with reads, even while a write/transaction is in progress, thanks to its MVCC.
However, a query showing up in the slow query log doesn't necessarily mean it is slow. How many seconds does it take, and how many records are being examined?
Put "EXPLAIN" in front of the query to get a breakdown of the examinations going on for the query.
here's a good resource for learning about EXPLAIN (outside of the excellent MySQL documentation about it)
I'm not certain about MySQL, but I know that in SQL Server, SELECT statements do NOT read through locks by default. Reading through them would allow you to read uncommitted data, and potentially see duplicate records or miss a record entirely. This is because, if another process is writing to the table, the database engine may decide to reorganize some data and shift it around on disk. It might move a record you already read to the end, so you see it again, or move one from the end to a position you've already passed.
There's a guy on the net somewhere who actually wrote a couple of scripts to prove that this happens. I tried them once, and it only took a few seconds before a duplicate showed up. Of course, he designed the scripts to make it more likely to happen, but it proves that it definitely can happen.
Reading through locks is acceptable if your data doesn't need to be accurate, and it can certainly help prevent deadlocks. However, if you're working on an application that deals with something like people's money, it's very bad.
In SQL Server you can use the WITH NOLOCK hint to tell your SELECT statement to ignore locks. I'm not sure what the equivalent in MySQL would be, but maybe someone else here will say.

MySQL query slowing down until restart

I have a service that sits on top of a MySQL 5.5 database (InnoDB). The service has a background job that is supposed to run every week or so. At a high level, the background job does the following:
1. Do some initial DB reads and writes in one transaction.
2. Execute UMQ (described below) with a set of parameters in one transaction. If no records are returned, we are done!
3. Process the result from UMQ (this is a bit heavy, so it is done outside of any DB transaction).
4. Write the outcome of the previous step to the DB in one transaction (this writes to tables queried by UMQ and ensures that the same records are not found again by UMQ).
5. Go to step 2.
UMQ - Ugly Monster Query: This is a nasty database query that joins a bunch of tables, has conditions on columns in several of those tables, and includes a NOT EXISTS subquery with more joins and conditions. UMQ has an ORDER BY and a LIMIT 1000. Even though the query is bad, I have done what I can: there are indexes on all filtered columns, and the joins all follow foreign key relations.
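The job's control flow can be sketched as a loop. This is a minimal Python sketch. The four callables are hypothetical placeholders for the real transactions and processing, so it shows only the structure of the job and is not the asker's code.

```python
from typing import Callable, List, Sequence


def run_background_job(
    initial_step: Callable[[], None],
    run_umq: Callable[[], Sequence[tuple]],
    process: Callable[[Sequence[tuple]], List[tuple]],
    write_outcome: Callable[[List[tuple]], None],
) -> int:
    """Run the job to completion and return how many UMQ batches were handled."""
    initial_step()               # step 1: initial reads/writes, one transaction
    batches = 0
    while True:
        rows = run_umq()         # step 2: UMQ (ORDER BY ... LIMIT 1000), one transaction
        if not rows:             # no records returned: done
            return batches
        outcome = process(rows)  # step 3: heavy work, outside any transaction
        write_outcome(outcome)   # step 4: one transaction; rows no longer match UMQ
        batches += 1             # step 5: go back to step 2


# Hypothetical stand-in data: 2,500 records that UMQ would find.
pending = list(range(2500))


def fake_umq():
    return [(r,) for r in pending[:1000]]


def fake_write(outcome):
    done = {r for (r,) in outcome}
    pending[:] = [r for r in pending if r not in done]


print(run_background_job(lambda: None, fake_umq, lambda rows: list(rows), fake_write))
```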
I do expect UMQ to be heavy and take some time, which is why it's executed in a background job. However, what I'm seeing is rapidly degrading performance until it eventually causes a timeout in my service (maybe 50 times slower after 10 iterations).
At first I thought it was because the data queried by UMQ changes (see step 4 above), but that wasn't it. If I took the last query (the one that caused the timeout) from the slow query log and executed it myself directly, I got the same slow behavior, but only until I restarted the MySQL service. After the restart, the exact same query on the exact same data, which took >30 seconds before, took <0.5 seconds. I can reproduce this every time by restoring the database to its initial state and restarting the process.
Also, using the trick described in this question I could see that the query scans around 60K rows after restart as opposed to 18M rows before. EXPLAIN tells me that around 10K rows should be scanned and the result of EXPLAIN is always the same. No other processes are accessing the database at the same time and the lock_time in the slow query log is always 0. SHOW ENGINE INNODB STATUS before and after restart gives me no hints.
So finally the question: Does anybody have any clue of why I'm seeing this behavior? And how can I analyze this further?
I have the feeling that I need to configure MySQL differently in some way but I have searched and tested like crazy without coming up with anything that makes a difference.
It turns out that the behavior I saw was the result of how the MySQL optimizer uses InnoDB statistics to decide on an execution plan. This article put me on the right track (even though it does not exactly discuss my problem). The most important thing I learned is that MySQL calculates statistics on startup and then once in a while, and these statistics are then used to optimize queries.
The way I had set up the test data, table T (where most writes are done in step 4) started out empty. After each iteration T contained more and more records, but the InnoDB statistics had not yet been updated to reflect this. Because of this, the MySQL optimizer kept choosing an execution plan for UMQ (which includes a JOIN with T) that worked well when T was empty but worse and worse the more records T contained.
To verify this I added an ANALYZE TABLE T; before every execution of UMQ and the rapid degradation disappeared. No lightning performance but acceptable. I also saw that leaving the database for half an hour or so (maybe a bit shorter but at least more than a couple of minutes) would allow the InnoDB statistics to refresh automatically.
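Here is a sketch of that workaround, refreshing the optimizer statistics right before each UMQ run. It uses Python's sqlite3 module, whose ANALYZE statement plays a similar role to MySQL's ANALYZE TABLE. The function name and the small demo table are hypothetical. The table name is interpolated into the SQL, so it must come from trusted code, never from user input.

```python
import sqlite3


def run_with_fresh_stats(conn, table, query, params=()):
    """Refresh optimizer statistics for `table`, then run `query`.

    In MySQL the first statement would be ANALYZE TABLE <table>.
    """
    conn.execute(f"ANALYZE {table}")  # table name must be trusted, not user input
    return conn.execute(query, params).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, parent_id INTEGER)")
conn.execute("CREATE INDEX t_parent ON t (parent_id)")
conn.executemany("INSERT INTO t (parent_id) VALUES (?)", [(i % 10,) for i in range(1000)])
rows = run_with_fresh_stats(conn, "t", "SELECT COUNT(*) FROM t WHERE parent_id = ?", (3,))
print(rows)
# SQLite keeps the refreshed statistics in sqlite_stat1.
print(conn.execute("SELECT tbl, idx, stat FROM sqlite_stat1").fetchall())
```

Analyzing on every iteration has its own cost. It is probably only worth it when a table's size changes sharply between runs, as T does in this test setup.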
In a real scenario the relative difference in index cardinality for the tables involved in UMQ will look quite different and will not change as rapidly so I have decided that I don't really need to do anything about it.
Thank you very much for the analysis and answer. I had been chasing this issue for several days during CI on MariaDB 10.1 and Bacula server 9.4 (Debian buster).
The situation was that, after a fresh server installation during a CI cycle, the first two tests (backup and restore) ran smoothly on the unrestarted MariaDB server. Only the third test showed that one particular UMQ took about 20 minutes (building the directory tree during the restore process from a table with about 30k rows).
Unless the MariaDB server was restarted or the table was analyzed, the problem would not go away. ANALYZE TABLE or the restart changed the cardinality of the fields and the internal query processing, exactly as described in the linked article.

InnoDB deadlock with lock modes S and X

In my application, I have two queries that occur from time to time (from different processes), that cause a deadlock.
Query #1
UPDATE tblA, tblB SET tblA.varcharfield=tblB.varcharfield WHERE tblA.varcharfield IS NULL AND [a few other conditions];
Query #2
INSERT INTO tmp_tbl SELECT * FROM tblA WHERE [various conditions];
Both of these queries take a significant time, as these tables have millions of rows. When query #2 is running, it seems that tblA is locked in mode S. It seems that query #1 requires an X lock. Since this is incompatible with an S lock, query #1 waits for up to 30 seconds, at which point I get a deadlock:
Serialization failure: 1213 Deadlock found when trying to get lock; try restarting transaction
Based on what I've read in the documentation, I think I have a couple options:
Set an index on tblA.varcharfield. Unfortunately, I think that this would require a very large index to store the field of varchar(512). (See edit below... this didn't work.)
Disable locking with SET SESSION TRANSACTION ISOLATION LEVEL READ UNCOMMITTED. I don't understand the implications of this, and am worried about corrupt data. I don't use explicit transactions in my application currently, but I might at some point in the future.
Split my time-consuming queries into small pieces so that they can queue and run in MySQL without reaching the 30-second timeout. This wouldn't really fix the heart of the issue, and I am concerned that when my database servers get busy that the problem will occur again.
Simply retrying queries over and over again... not an option I am hoping for.
How should I proceed? Are there alternate methods I should consider?
EDIT: I have tried setting an index on varcharfield, but the table is still locking. I suspect that the locking happens when the UPDATE portion is actually executing. Are there other suggestions to get around this problem?
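Option 3 from the question (splitting the long UPDATE into smaller pieces) might look like the sketch below. It walks tblA in primary-key ranges and commits after each range, so locks are released between chunks. It uses Python's sqlite3 module so it runs without MySQL. The id and b_id columns and the join condition are hypothetical stand-ins for the question's "[a few other conditions]". A correlated subquery replaces MySQL's multi-table UPDATE tblA, tblB syntax.

```python
import sqlite3


def copy_in_chunks(conn: sqlite3.Connection, chunk_size: int) -> int:
    """Fill tblA.varcharfield from tblB one id range at a time; return rows updated."""
    (max_id,) = conn.execute("SELECT COALESCE(MAX(id), 0) FROM tblA").fetchone()
    updated = 0
    for start in range(1, max_id + 1, chunk_size):
        end = start + chunk_size - 1
        cur = conn.execute(
            "UPDATE tblA"
            " SET varcharfield = (SELECT tblB.varcharfield FROM tblB"
            "                     WHERE tblB.id = tblA.b_id)"
            " WHERE varcharfield IS NULL"
            "   AND id BETWEEN ? AND ?"
            "   AND EXISTS (SELECT 1 FROM tblB WHERE tblB.id = tblA.b_id)",
            (start, end),
        )
        updated += cur.rowcount
        conn.commit()  # short transaction: locks are released after every chunk
    return updated


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tblB (id INTEGER PRIMARY KEY, varcharfield TEXT)")
conn.execute("CREATE TABLE tblA (id INTEGER PRIMARY KEY, b_id INTEGER, varcharfield TEXT)")
conn.executemany("INSERT INTO tblB VALUES (?, ?)", [(i, f"value {i}") for i in range(1, 6)])
conn.executemany("INSERT INTO tblA (id, b_id) VALUES (?, ?)", [(i, i) for i in range(1, 6)])
print(copy_in_chunks(conn, chunk_size=2))
```

As the asker notes, this shortens each lock window rather than removing the conflict.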
A. If we assume that indexing varcharField takes a lot of disk space and that adding a new column will not hit you hard, I can suggest the following approach:
create new field with datatype "tinyint"
index it.
this field will store 0 if varcharField is null and 1 - otherwise.
rewrite the first query to do update relying on new field. In this case it will not cause entire table locking.
Hope it helps.
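Here is a sketch of the flag-column idea, again using Python's sqlite3 module so it runs anywhere. The varchar_is_set column, the index name and the b_id join are hypothetical. Whatever writes varcharfield must also keep the flag in sync. The claim that this avoids locking the entire table in InnoDB comes from the answer and is not something this SQLite sketch can demonstrate.

```python
import sqlite3


def add_flag_column(conn: sqlite3.Connection) -> None:
    # Small integer flag: 0 when varcharfield IS NULL, 1 otherwise (TINYINT in MySQL).
    conn.execute("ALTER TABLE tblA ADD COLUMN varchar_is_set INTEGER NOT NULL DEFAULT 0")
    conn.execute("UPDATE tblA SET varchar_is_set = (varcharfield IS NOT NULL)")
    conn.execute("CREATE INDEX idx_tblA_varchar_is_set ON tblA (varchar_is_set)")
    conn.commit()


def fill_missing(conn: sqlite3.Connection) -> int:
    """Query #1 rewritten to select rows by the indexed flag instead of IS NULL."""
    cur = conn.execute(
        "UPDATE tblA"
        " SET varcharfield = (SELECT tblB.varcharfield FROM tblB"
        "                     WHERE tblB.id = tblA.b_id),"
        "     varchar_is_set = 1"
        " WHERE varchar_is_set = 0"
        "   AND EXISTS (SELECT 1 FROM tblB WHERE tblB.id = tblA.b_id"
        "               AND tblB.varcharfield IS NOT NULL)"
    )
    conn.commit()
    return cur.rowcount


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tblB (id INTEGER PRIMARY KEY, varcharfield TEXT)")
conn.execute("CREATE TABLE tblA (id INTEGER PRIMARY KEY, b_id INTEGER, varcharfield TEXT)")
conn.executemany("INSERT INTO tblB VALUES (?, ?)", [(1, "x"), (2, "y")])
conn.executemany(
    "INSERT INTO tblA VALUES (?, ?, ?)",
    [(1, 1, None), (2, 2, "already set"), (3, 1, None)],
)
add_flag_column(conn)
print(fill_missing(conn))  # fills rows 1 and 3
```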
You can index only part of the varchar column. It will still work and will require less space. Just specify the prefix length in the index:
CREATE INDEX someindex ON sometable (varcharcolumn(32))
I was able to solve the issue by adding explicit LOCK TABLES statements around both queries. This turned out to be a better solution, since each query affects so many records and both are background processes. They now wait on each other.
http://dev.mysql.com/doc/refman/5.0/en/lock-tables.html
While this is an okay solution for me, it obviously isn't the answer for everyone. Holding a WRITE lock means that other sessions cannot read the table; only a READ lock allows others to read it.

Common-practice in dealing with high-load tables in MySQL

I have a table in MySQL 5 (InnoDB) that is used as a daemon Processing Queue, thus it is being accessed very often. It is typical to have around 250 000 records inserted per day. When I select records to be processed, they are read using a FOR UPDATE query to eliminate race conditions (everything is Transaction Based).
Now I am developing a "queue archive", and I have run into a serious deadlock problem. I need to delete "executed" records from the table as they are processed (live), yet the table deadlocks every once in a while when I do so (two or three times per day).
I thought of moving to delayed deletion (once per day at low-load times), but this would not eliminate the problem, only make it less obvious.
Is there a common-practice in dealing with high-load tables in MySQL?
InnoDB locks all rows it examines, not only those requested.
See this question for more details.
You need to create an index that would exactly match your search condition to get rid of unnecessary locks, and make sure it is used.
Unfortunately, DML queries in MySQL do not accept hints.
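A way to check that a statement's search condition is actually served by an index is sketched below with Python's sqlite3 module and EXPLAIN QUERY PLAN. In MySQL you would read EXPLAIN output instead. The InnoDB locking behaviour itself is not reproduced here. The queue table, its status column and the index name are hypothetical.

```python
import sqlite3


def plan(conn: sqlite3.Connection, sql: str, params=()) -> str:
    """Return SQLite's query plan for `sql` as one string (detail column only)."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(str(row[-1]) for row in rows)


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE queue (id INTEGER PRIMARY KEY, status TEXT NOT NULL, payload TEXT)"
)
pick_work = "SELECT id, payload FROM queue WHERE status = ? LIMIT 10"

before = plan(conn, pick_work, ("pending",))  # full scan: every row is examined
conn.execute("CREATE INDEX idx_queue_status ON queue (status)")
after = plan(conn, pick_work, ("pending",))   # search through the new index
print(before)
print(after)
```

If the plan still shows a full scan, the index does not match the condition. In InnoDB, per the answer above, that would mean many more rows get locked than the ones actually being processed.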