Two MySQL requests at the same time - Performance issue

I have a MySQL server with many innodb tables.
I have a background script that does A LOT of deletes and inserts in a single request: it deletes many millions of rows from table 2, then inserts many millions of rows into table 2 using data from table 1:
INSERT INTO table 2 (date)
SELECT date from table 1 GROUP BY date
(The actual request is more complex, but this shows the kind of request I am running.)
At the same time, I am going to run a second background script that does about a million INSERT or UPDATE requests on table 3, but separately (I execute the first UPDATE query, then an INSERT query, and so on).
My issue is that when only one script is running, it is fast: say 30 minutes each, so 1 hour total. But when the two scripts run at the same time, it is VERY slow: it takes about 5 hours instead of 1.
So first, I would like to know what can cause this. Is it because of I/O performance (for example, MySQL is writing to two different tables, so it is slow to switch between the two)?
And how could I fix this? For example, it would be great if I could pause the big INSERT query while my second background script is running, but I can't find a way to do something like this.
I am not an expert at MySQL administration. If you need more information, please let me know!
Thank you !!

30 minutes for a million INSERTs is not fast. Do you have an index on the date column (or whatever column you are using to pivot on)?
Regarding your original question: it's difficult to say much without knowing the details of both your scripts and the table structures, but one possible reason why the scripts run reasonably quickly separately is that you are doing similar kinds of SELECT queries, which might be getting cached by MySQL and then reused for subsequent queries (this only applies to versions that still have the query cache, which was removed in MySQL 8.0). But if you are running two queries in parallel, then the SELECTs for the corresponding query might not stay in the cache (because there are two concurrent processes which send new queries all the time).
You might want to explicitly disable the cache for some queries which you are sure you only run once (using the SQL_NO_CACHE modifier) and see if it changes anything. But I'd look into indexing and into your table structure first, because 30 minutes seems to be extremely slow :) E.g. you might also want to introduce partitioning by date for your tables, if you know that you always select entries in a given period (say by month). The exact tricks depend on your data.
UPDATE: Another issue might be that both your queries work with the same table (table 1), and the default transaction isolation level in MySQL (InnoDB) is REPEATABLE READ, as far as I remember. So it might be that one query is waiting until the other is done with the table to satisfy the transaction isolation level. You might want to lower the transaction isolation level if you are sure that table 1 does not change while the scripts are working on it.

You can use the event scheduler to have MySQL launch these queries at different hours of the day. Another Stack Overflow question has an example of how to do it: MySQL Event Scheduler on a specific time everyday
Another thing to keep in mind is to use EXPLAIN to see why the query is that slow.

Related

Will a MySQL SELECT statement interrupt INSERT statement?

I have a MySQL table that keeps gaining new records every 5 seconds.
The questions are:
can I run a query on this set of data that may take more than 5 seconds?
if SELECT statement takes more than 5s, will it affect the scheduled INSERT statement?
what happens when an INSERT statement is invoked while a SELECT is still running? Will the SELECT get the newly inserted records?
I'll go over your questions and some of the comments you added later.
can I run a query on this set of data that may take more than 5 seconds?
Can you? Yes. Should you? It depends. In a MySQL configuration I set up, any query taking longer than 3 seconds was considered slow and logged accordingly. In addition, you need to keep in mind the frequency of the queries you intend to run.
For example, if you try to run a 10 second query every 3 seconds, you can probably see how things won't end well. If you run a 10 second query every few hours or so, then it becomes more tolerable for the system.
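To put rough numbers on that, here is a back-of-the-envelope sketch. The helper name `peak_overlap` is made up for illustration, and it assumes each query's runtime stays constant under load; in practice overlapping queries tend to slow each other down, so the real pile-up would be worse.

```python
import math

def peak_overlap(duration_s, interval_s):
    """Most queries running at once when a new one starts every
    interval_s seconds and each takes duration_s seconds.
    Assumes the runtime does not grow under load."""
    return math.ceil(duration_s / interval_s)

# A 10 second query launched every 3 seconds: several copies overlap.
slow = peak_overlap(10, 3)

# The same query launched every 3 hours never overlaps with itself.
rare = peak_overlap(10, 3 * 3600)

print(slow, rare)
```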
That being said, slow queries can often benefit from optimizations, such as not scanning the entire table (e.g. searching by primary key), and using the EXPLAIN keyword to get the database's query planner to tell you how it intends to work on that internally (e.g. is it using PKs, FKs, or indices, or is it scanning all table rows?).
if SELECT statement takes more than 5s, will it affect the scheduled INSERT statement?
"Affect" in what way? If you mean "prevent insert from actually inserting until the select has completed", that depends on the storage engine. For example, MyISAM and InnoDB are different, and that includes locking policies. For example, MyISAM tends to lock entire tables while InnoDB tends to lock specific rows. InnoDB is also ACID-compliant, which means it can provide certain integrity guarantees. You should read the docs on this for more details.
what happens when an INSERT statement is invoked while a SELECT is still running? Will the SELECT get the newly inserted records?
Part of "what happens" is determined by how the specific storage engine behaves. Regardless of what happens, the database is designed to answer application queries in a way that's consistent.
As an example, if the select statement were to lock an entire table, then the insert statement would have to wait until the select has completed and the lock has been released, meaning that the app would see the results prior to the insert's update.
I understand that locking database can prevent messing up the SELECT statement.
It can also create a potentially unacceptable performance bottleneck, especially if, as you say, the system is inserting lots of rows every 5 seconds, and depending on how frequently you run your queries, how efficiently they've been built, etc.
what is the good practice to do when I need the data for calculations while those data will be updated within short period?
My recommendation is to simply accept the fact that the calculations are based on a snapshot of the data at the specific point in time the calculation was requested and to let the database do its job of ensuring the consistency and integrity of said data. When the app requests data, it should trust that the database has done its best to provide the most up-to-date piece of consistent information (i.e. not providing a row where some columns have been updated, but others yet haven't).
With new rows coming in at the frequency you mentioned, reasonable users will understand that the results they're seeing are based on data available at the time of request.
All of your questions are related to table locking.
The answers all depend on the way the database is configured.
Read : http://www.mysqltutorial.org/mysql-table-locking/
Perform Select Statement While insert statement working
If you want to run a SELECT statement while an INSERT is still executing, you should open a new connection and close it each time you check. For example, if I insert lots of records and want to find out whether the last record has been inserted by running a SELECT query, I have to open and close the connection inside the for or while loop.
# connection 1: send the request that stores the data
insert statement  # takes a long time
# connection 2: run the select statement in a while loop
while True:
    cnx.open()
    select statement
    cnx.close()
    # break out of the while loop once you get the result

MySQL "pileup" when importing rows

I have the following cron process running every hour to update global game stats:
Create temporary table
For each statistic, insert rows into the temporary table (stat key, user, score, rank)
Truncate main stats table
Copy data from temporary table to main table
The last step causes massive backlog in queries. Looking at SHOW PROCESSLIST I see a bunch of updating-status queries that are stuck until the copy completes (which may take up to a minute).
However I did notice that it's not like it has consecutive query IDs piling up, many queries get completed just fine. So it almost seems like it's a "thread" that gets stuck or something. Also of note is that the stuck updates have nothing in common with the ongoing copy (different tables, etc)
So:
Can I have cron connect to MySQL on a dedicated "thread" such that its disk activity (or whatever it is) doesn't lock other updates, OR
Am I misinterpreting what's going on, and if so how can I find out what the actual case is?
Let me know if you need any more info.
MySQL threads are not perfectly named. If you're a Java dev, for example, you might make some untrue assumptions about MySQL threads based on your Java knowledge.
For some reason that's hard to diagnose from a distance, your copy step is blocking some queries from completing. If you're curious about which ones, try running
SHOW FULL PROCESSLIST
and try to make sense of the result.
In the meantime, you might consider a slightly different approach to refreshing these hourly stats.
create a new, non temporary table, calling it something like stats_11 for the 11am update. If the table with that name already existed, drop the old one first.
populate that table as needed.
add the indexes it needs. Sometimes populating the table is faster if the indexes aren't in place while you're doing it.
create or replace view stats as select * from stats_11
Next hour, do the same with stats_12. The idea is to have your stats view pointing to a valid stats table almost always.
This should reduce your exposure time to the stats-table building operation.
If the task is to completely rebuild the table, this is the best:
CREATE TABLE new_stats LIKE stats;
... fill up new_stats by whatever means ...
RENAME TABLE stats TO old_stats, new_stats TO stats;
DROP TABLE old_stats;
There is zero interference because the stats table is always available and always has a complete set of rows. (OK, RENAME does take a minuscule amount of time.)
No VIEWs, no TEMPORARY table, no copying the data over, no need for 24 tables.
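For anyone who wants to try the swap without a MySQL server at hand, here is a minimal runnable sketch of the same idea. It uses SQLite (Python's built-in sqlite3 module) purely as a stand-in, so the DDL differs: SQLite has no CREATE TABLE ... LIKE and no multi-table RENAME TABLE, so the two renames are wrapped in one transaction instead. The column names and the helper `rebuild_stats` are assumptions made up for the example.

```python
import sqlite3

# SQLite stands in for MySQL so the sketch runs anywhere.
# isolation_level=None turns off implicit transactions, so the
# BEGIN/COMMIT below are the only transaction boundaries.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE stats (stat_key TEXT, user TEXT, score INTEGER)")
conn.execute("INSERT INTO stats VALUES ('kills', 'alice', 10)")

def rebuild_stats(conn, rows):
    """Build a fresh table off to the side, then swap it in."""
    conn.execute("DROP TABLE IF EXISTS new_stats")
    # MySQL: CREATE TABLE new_stats LIKE stats;
    conn.execute(
        "CREATE TABLE new_stats (stat_key TEXT, user TEXT, score INTEGER)"
    )
    conn.executemany("INSERT INTO new_stats VALUES (?, ?, ?)", rows)
    # MySQL: RENAME TABLE stats TO old_stats, new_stats TO stats;
    # SQLite needs two renames, made atomic by the transaction.
    conn.execute("BEGIN")
    conn.execute("ALTER TABLE stats RENAME TO old_stats")
    conn.execute("ALTER TABLE new_stats RENAME TO stats")
    conn.execute("COMMIT")
    conn.execute("DROP TABLE old_stats")

rebuild_stats(conn, [("kills", "alice", 12), ("kills", "bob", 7)])
rows = conn.execute("SELECT user, score FROM stats ORDER BY user").fetchall()
print(rows)
```

Readers only ever see the old complete table or the new complete table, because all the slow work happens in new_stats before the swap.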
You could consider doing the task "continually", rather than hourly. This becomes especially beneficial if the table gets so big that the hourly cron job takes more than one hour!

How to improve InnoDB's SELECT performance while INSERTing

We recently switched our tables to use InnoDB (from MyISAM) specifically so we could take advantage of the ability to make updates to our database while still allowing SELECT queries to occur (i.e. by not locking the entire table for each INSERT)
We have a cycle that runs weekly and INSERTS approximately 100 million rows using "INSERT INTO ... ON DUPLICATE KEY UPDATE ..."
We are fairly pleased with the current update performance of around 2000 insert/updates per second.
However, while this process is running, we have observed that regular queries take very long.
For example, this took about 5 minutes to execute:
SELECT itemid FROM items WHERE itemid = 950768
(When the INSERTs are not happening, the above query takes several milliseconds.)
Is there any way to force SELECT queries to take a higher priority? Otherwise, are there any parameters that I could change in the MySQL configuration that would improve the performance?
We would ideally perform these updates when traffic is low, but anything more than a couple seconds per SELECT query would seem to defeat the purpose of being able to simultaneously update and read from the database. I am looking for any suggestions.
We are using Amazon's RDS as our MySQL server.
Thanks!
I imagine you have already solved this nearly a year later :) but I thought I would chime in. According to MySQL's documentation on internal locking (as opposed to explicit, user-initiated locking):
Table updates are given higher priority than table retrievals. Therefore, when a lock is released, the lock is made available to the requests in the write lock queue and then to the requests in the read lock queue. This ensures that updates to a table are not “starved” even if there is heavy SELECT activity for the table. However, if you have many updates for a table, SELECT statements wait until there are no more updates.
So it sounds like your SELECT is getting queued up until your inserts/updates finish (or at least until there's a pause). Information on altering that priority can be found on MySQL's Table Locking Issues page.

How can I parallelize Writes to the same row in MySQL?

I'm currently building a system that does running computations, and every 5 seconds inserts or updates information based on those computations to a few rows in MySQL. I'm working on running this system on a few different servers at once right now with a few agents that are each doing similar processing and then writing on the same set of rows. I already randomize the order in which each agent writes its set of rows, but there's still a lot of deadlock happening. What's the best/fastest way to get through those deadlocks? Should I just rerun the query each time one happens, or do row locks, or something else entirely?
I suggest you try something that won't require more than one client to update your 'few rows.'
For example, you could have each agent that produces results do an INSERT to a staging table that uses the MEMORY storage engine.
Then, every five seconds you can run a MySQL event (a stored procedure within the server) that loops through all the rows in that table, posting their results to your 'few rows' and then deleting them. If it's important for the rows in your staging table to be processed in order, then you can use an AUTO_INCREMENT id field. But it might not be important for them to be in order.
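Here is a rough, runnable sketch of that staging-table pattern. SQLite (Python's sqlite3 module) stands in for MySQL, and a plain function stands in for the MySQL event. The table names `totals` and `staging`, the helpers `agent_report` and `drain_staging`, and the "latest value wins" rule are all assumptions for illustration; your real rows may need a different merge rule.

```python
import sqlite3

conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute(
    "CREATE TABLE totals (row_key TEXT PRIMARY KEY, value INTEGER NOT NULL)"
)
conn.execute(
    "CREATE TABLE staging ("
    "id INTEGER PRIMARY KEY AUTOINCREMENT, row_key TEXT, value INTEGER)"
)

def agent_report(conn, row_key, value):
    """Agents only append to staging; they never touch the shared rows."""
    conn.execute(
        "INSERT INTO staging (row_key, value) VALUES (?, ?)", (row_key, value)
    )

def drain_staging(conn):
    """Single writer: apply staged rows in id order, then delete them."""
    conn.execute("BEGIN IMMEDIATE")
    try:
        last_id = conn.execute("SELECT MAX(id) FROM staging").fetchone()[0]
        if last_id is not None:
            staged = conn.execute(
                "SELECT row_key, value FROM staging WHERE id <= ? ORDER BY id",
                (last_id,),
            ).fetchall()
            for row_key, value in staged:
                # MySQL equivalent: INSERT ... ON DUPLICATE KEY UPDATE
                conn.execute(
                    "INSERT OR REPLACE INTO totals VALUES (?, ?)",
                    (row_key, value),
                )
            # Only delete what was read; later arrivals wait for the next run.
            conn.execute("DELETE FROM staging WHERE id <= ?", (last_id,))
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise

agent_report(conn, "cpu", 5)
agent_report(conn, "cpu", 9)
agent_report(conn, "mem", 3)
drain_staging(conn)
totals = dict(conn.execute("SELECT row_key, value FROM totals"))
print(totals)
```

Because only the drain step ever updates the shared rows, the agents cannot deadlock against each other on them.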
If you want to get fancier and more scalable than that, you'll need a queue management system like Apache ActiveMQ.

MySQL row locking myisam innodb

I've got a theoretical question and can't find a good solution for this on the net:
For a tblA with 100,000 recs.
I want to have multiple processes/apps running, each of which accesses tblA.
I don't want the apps to access the same recs. ie, I want appA to access the 1st 50 rows, with appB accessing the next 50, and appC accessing the next 50 after that..
So basically I want the apps to do a kind of fetch on the next "N" recs in the table. I'm looking for a way to access/process the row data as fast as possible, essentially running the apps in a simultaneous manner. but I don't want the apps to process the same rows.
So, just how should this kind of process be set up?
Is it simply doing a kind of:
select from tblA limit 50
and doing some kind of row locking for each row (which requires innodb)
Pointers/pseudo code would be useful.
Here are some posts from the DBA Stack Exchange on this:
https://dba.stackexchange.com/q/10017/877
https://dba.stackexchange.com/a/4470/877
They discuss SELECT ... LOCK IN SHARE MODE and the potential headaches that come with it.
Percona wrote a nice article on this along with SELECT ... FOR UPDATE
Your application should handle what data it wants to access. Create a pointer in that. If you're using stored procedures, use another table to store the pointers. Each process would "reserve" a set of rows before beginning processing. Every process should check for the max of that and also see if it is greater than the length of the table.
If you are specifically looking to process the first set, second set, etc., then you can use LIMIT offset,count (i.e. LIMIT 0,50; LIMIT 50,50; LIMIT 100,50) with an ORDER BY. Locking is not necessary since the processes won't even try to access each other's record sets. But I can't imagine a scenario where that would be a good implementation.
An alternative is just to use an UPDATE with a LIMIT, then SELECT the records that were updated. You can use the process ID, a random number, or something else that is almost guaranteed to be unique across processes. Add a "status" field to your table indicating whether the record is available for processing (i.e. its value is NULL). Then each process would update the status field to "own" the record for processing.
UPDATE tblA SET status=1234567890 WHERE status IS NULL LIMIT 50;
SELECT * FROM tblA WHERE status=1234567890;
This would work for MyISAM or InnoDB. With InnoDB you would be able to have multiple updates running at once, improving performance.
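Here is a small runnable sketch of that claim-then-select pattern, with SQLite (Python's sqlite3 module) standing in for MySQL. Stock SQLite has no UPDATE ... LIMIT, so the sketch picks the rows with a subquery; on MySQL you would use the UPDATE ... LIMIT 50 statement shown above. The helper `claim_batch`, the `payload` column, and the UUID token are assumptions for illustration.

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute(
    "CREATE TABLE tblA (id INTEGER PRIMARY KEY, payload TEXT, status TEXT)"
)
conn.executemany(
    "INSERT INTO tblA (id, payload) VALUES (?, ?)",
    [(i, f"row {i}") for i in range(1, 121)],
)

def claim_batch(conn, batch_size=50):
    """Mark up to batch_size unclaimed rows with a unique token and
    return the ids of the rows this caller now owns."""
    token = uuid.uuid4().hex  # stands in for a process ID or random number
    # MySQL: UPDATE tblA SET status = %s WHERE status IS NULL LIMIT 50
    conn.execute(
        "UPDATE tblA SET status = ? WHERE id IN "
        "(SELECT id FROM tblA WHERE status IS NULL ORDER BY id LIMIT ?)",
        (token, batch_size),
    )
    claimed = conn.execute(
        "SELECT id FROM tblA WHERE status = ? ORDER BY id", (token,)
    )
    return [row[0] for row in claimed]

batch_a = claim_batch(conn)  # first worker
batch_b = claim_batch(conn)  # second worker
batch_c = claim_batch(conn)  # third worker gets whatever is left
print(len(batch_a), len(batch_b), len(batch_c))
```

Each claim is a single UPDATE statement, so two workers never end up owning the same row, and a worker that finds no free rows just gets an empty list.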
The problem with these solutions is lag time. Suppose process A and process B both start at precisely 12:00:00. In an application there are several blocks of distinct code leading up to the locks/DMLs, so the processing time for each would vary: process A may complete first, or it may be process B. If process A is setting the lock and process B modifies the record first, you're in trouble. This is the trouble with forking.