MySQL "pileup" when importing rows - mysql

I have the following cron process running every hour to update global game stats:
1. Create a temporary table
2. For each statistic, insert rows into the temporary table (stat key, user, score, rank)
3. Truncate the main stats table
4. Copy the data from the temporary table to the main table
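For illustration, the job boils down to roughly this (table and column names are simplified, and the per-statistic query is a stand-in):
CREATE TEMPORARY TABLE stats_tmp LIKE stats;

-- one of these per statistic (game_scores is a hypothetical source table)
INSERT INTO stats_tmp (stat_key, user_id, score, `rank`)
SELECT 'total_score', user_id, SUM(score), 0
FROM game_scores
GROUP BY user_id;

TRUNCATE TABLE stats;                       -- step 3
INSERT INTO stats SELECT * FROM stats_tmp;  -- step 4: the copy that backs everything up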
The last step causes a massive backlog of queries. Looking at SHOW PROCESSLIST I see a bunch of queries in "updating" status that are stuck until the copy completes (which may take up to a minute).
However, I did notice that it's not consecutive query IDs piling up; many queries complete just fine. So it almost seems like a "thread" gets stuck or something. Also of note: the stuck updates have nothing in common with the ongoing copy (different tables, etc.).
So:
Can I have cron connect to MySQL on a dedicated "thread" such that its disk activity (or whatever it is) doesn't lock other updates, OR
Am I misinterpreting what's going on, and if so how can I find out what the actual case is?
Let me know if you need any more info.

MySQL threads are not perfectly named. If you're a Java dev, for example, you might make some untrue assumptions about MySQL threads based on your Java knowledge.
For some reason that's hard to diagnose from a distance, your copy step is blocking some queries from completing. If you're curious about which ones, try
SHOW FULL PROCESSLIST
and try to make sense of the result.
In the meantime, you might consider a slightly different approach to refreshing these hourly stats.
create a new, non-temporary table, calling it something like stats_11 for the 11am update. If a table with that name already exists, drop the old one first.
populate that table as needed.
add the indexes it needs. Sometimes populating the table is faster if the indexes aren't in place while you're doing it.
create or replace view stats as select * from stats_11
Next hour, do the same with stats_12. The idea is to have your stats view pointing to a valid stats table almost always.
This should reduce your exposure time to the stats-table building operation.
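A rough sketch of one hour's run (column names and the game_scores source table are placeholders; the first time through, the existing stats base table would need to be renamed away so the view can take over its name):
DROP TABLE IF EXISTS stats_11;
CREATE TABLE stats_11 (
  stat_key VARCHAR(32),
  user_id  INT,
  score    INT
);

INSERT INTO stats_11
SELECT 'total_score', user_id, SUM(score)
FROM game_scores
GROUP BY user_id;

ALTER TABLE stats_11 ADD INDEX (stat_key, user_id);  -- indexes added after the load

CREATE OR REPLACE VIEW stats AS SELECT * FROM stats_11;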

If the task is to completely rebuild the table, this is the best:
CREATE TABLE new_stats LIKE stats;
... fill up new_stats by whatever means ...
RENAME TABLE stats TO old_stats, new_stats TO stats;
DROP TABLE old_stats;
There is zero interference because the real table (stats) is always available and always has a complete set of rows. (OK, RENAME does take a minuscule amount of time.)
No VIEWs, no TEMPORARY table, no copying the data over, no need for 24 tables.
You could consider doing the task "continually", rather than hourly. This becomes especially beneficial if the table gets so big that the hourly cron job takes more than one hour!

Related

Proper way to sync a table from another similar table with a few different columns while inserts and updates are happening

We need to alter an existing InnoDB table with 10+ million records to add a few columns. We tried a simple ALTER TABLE query and it took almost an hour to complete. However, the change was not reflected, and no error details were available.
So, we are trying this approach:
creating a new table with the same schema,
then altering that table,
then syncing up data from the existing table,
then renaming the first table to a different name (the application will error during this window) and then renaming the 2nd table to the production name used by the application.
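(For illustration only, with accounts and the added column as stand-ins for our real names, the plan is roughly:)
CREATE TABLE accounts_new LIKE accounts;
ALTER TABLE accounts_new ADD COLUMN extra_flag VARCHAR(64) NULL;   -- the new columns
INSERT INTO accounts_new SELECT a.*, NULL FROM accounts AS a;      -- initial bulk copy
-- ... re-apply whatever changed in accounts while the copy was running ...
RENAME TABLE accounts TO accounts_old, accounts_new TO accounts;   -- both renames in one statement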
Problem at hand
I am not sure how to go ahead with the syncing while the application is live.
I think we should go ahead with syncing, instead of just dumping and restoring. If dumping is to be done, it should be done after shutting down traffic.
Edits can happen to the table in question as transactions occur. So, in addition to sanity checks on the total number of accounts migrated, we need to ensure we don't lose any edits made to the table during the migration.
Is a stored procedure needed in this scenario?
Update
We need to make sure no updates or inserts to the existing table (being written by the application) are missed. Not sure if a stored procedure is the solution here.
Do we need to shut down writes completely for this? Is there any way of doing this while keeping the application running?

MySQL Table Times Out During Drop

I have a table with several hundred million rows of data. I want to delete the table, but every operation I perform on it loses the connection after running for 50,000+ seconds (about 16 hours), which is under the 60,000-second timeout I have set in the database. I've tried creating a stored procedure with the DROP TABLE code, thinking that if I send the work to the DB to perform, it will not need a connection to process it, but it does the same thing. Is it just timing out? Or do I need to do something else?
Instead, do TRUNCATE TABLE. Internally it creates an equivalent, but empty, table, then swaps it in. This technique might take only a second, even for a very big table.
If you are deleting most of a table, then it is usually faster (sometimes a lot faster) to do
CREATE TABLE `new` LIKE `real`;
INSERT INTO `new`
SELECT ... FROM `real`
WHERE ...;  -- the rows you want to keep
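Presumably you would then finish with the same kind of swap as in the earlier answer, something like this (backticks because new and real happen to be reserved words in MySQL):
RENAME TABLE `real` TO old_data, `new` TO `real`;  -- atomic swap
DROP TABLE old_data;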
Why do you need to delete everything?
For other techniques in massive deletes, including big chunks out of a huge table, see https://mariadb.com/kb/en/mariadb/big-deletes/

Two MySQL requests at the same time - Performance issue

I have a MySQL server with many InnoDB tables.
I have a background script that does a LOT of deletes/inserts with one request: it deletes many millions of rows from table 2, then inserts many millions of rows into table 2 using data from table 1:
INSERT INTO table2 (date)
SELECT date FROM table1 GROUP BY date;
(The actual query is more complex, but this shows the kind of request I am doing.)
At the same time, I am going to run a second background script that does about a million INSERT or UPDATE requests, but separately (I mean, I execute one update query, then an insert query, and so on) against table 3.
My issue is that when one script is running it is fast; let's say each takes 30 minutes, so 1 hour total. But when the two scripts run at the same time it is VERY slow, taking maybe 5 hours instead of 1.
So first, I would like to know what could cause this. Is it because of I/O performance? (Like MySQL is writing to two different tables, so it is slow to switch between the two?)
And how could I fix this? If I could pause the big INSERT query while my second background script is running, that would be great, for example... but I can't find a way to do something like this.
I am not an expert at MySQL administration. If you need more information, please let me know!
Thank you !!
30 minutes for a million INSERTs is not fast. Do you have an index on the date column (or whatever column you are pivoting on)?
Regarding your original question: it's difficult to say much without knowing the details of both your scripts and the table structures, but one possible reason why the scripts run reasonably quickly separately is that you are doing similar kinds of SELECT queries, which might be getting cached by MySQL and then reused for subsequent queries. But if you are running two queries in parallel, the SELECTs for the corresponding query might not stay in the cache (because there are two concurrent processes which send new queries all the time).
You might want to explicitly disable the cache for queries which you are sure you only run once (using the SQL_NO_CACHE modifier) and see if it changes anything. But I'd look into indexing and into your table structure first, because 30 minutes seems extremely slow :) E.g. you might also want to introduce partitioning by date for your tables, if you know that you always choose entries in a given period (say, by month). The exact tricks depend on your data.
UPDATE: Another issue might be that both your queries work with the same table (table 1), and the default transaction isolation level in MySQL is REPEATABLE READ, AFAIR. So it might be that one query is waiting until the other is done with the table in order to satisfy the transaction isolation level. You might want to lower the transaction isolation level if you are sure that table 1 is not changed while the scripts are working on it.
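For what it's worth, a minimal sketch of both suggestions, using the simplified table1 name from the question:
-- bypass the query cache for a one-off query (the query cache only exists in
-- MySQL 5.x; it was removed in MySQL 8.0)
SELECT SQL_NO_CACHE date FROM table1 GROUP BY date;

-- relax isolation for the batch session, only if table1 really is static
-- while the scripts run
SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED;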
You can use the event scheduler to have MySQL launch these queries at different hours of the day. Another related Stack Overflow question has an example of how to do it: MySQL Event Scheduler on a specific time everyday
Another thing to keep in mind is to use EXPLAIN to see what could be the reason the query is that slow.
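As a rough sketch (the event name and schedule are made up), an event that runs the rebuild once a day could look like this:
SET GLOBAL event_scheduler = ON;  -- needs a privileged account

CREATE EVENT rebuild_table2
ON SCHEDULE EVERY 1 DAY
STARTS CURRENT_TIMESTAMP + INTERVAL 1 HOUR
DO
  INSERT INTO table2 (date)
  SELECT date FROM table1 GROUP BY date;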

Will cron job for creating db affect the user?

I have a website where I need to create a temporary database table which is recreated every 5 hours. It takes about 0.5 seconds to complete the action.
Site analytics shows about 5 hits/second. This figure may gradually increase.
Question
The cron job empties the db and then recreates it. Does that mean that while someone is accessing a page which populates data from the temporary db while the cron job is running, they may get "no data found" or incomplete data?
Or
Is this scenario taken care of by MySQL due to locking?
From my tests, if one MySQL client attempts to drop a database while another client has one of its tables locked, the client will wait until the table is unlocked.
However, the client dropping the database cannot itself hold a lock on any of the database's tables either. So depending on what you are doing, you may need to use some other method to serialise requests and so on. For example, if the job needs to drop and re-create the database, create the table(s) and populate them before other clients use them, table locking will be a problem because there won't always be a table to lock.
Consider using explicit locking with GET_LOCK() to coordinate operations on the "temporary" database.
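A hedged sketch of that coordination (the lock name and timeouts are arbitrary):
-- in the cron job, around the drop / re-create / repopulate work:
SELECT GET_LOCK('stats_rebuild', 10);      -- wait up to 10 seconds for the lock
-- ... drop, re-create and populate the table(s) here ...
SELECT RELEASE_LOCK('stats_rebuild');

-- in the web request, before reading:
SELECT GET_LOCK('stats_rebuild', 5);
-- ... read the data ...
SELECT RELEASE_LOCK('stats_rebuild');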
Also consider rethinking your strategy. Dropping and re-creating entire databases on a regular basis is not a common practice.
Instead of dropping and recreating, you might create the new table first under a temporary name, populate it, and then drop the old one while renaming the new one.
Additionally, you should either make your web app retry if the table is not found, to cope with the small time window where the table does not exist, or operate on a view instead of renaming tables.
As far as I know, when you lock the table, others can't access it until you unlock it, but other connections will only wait about 0.5 seconds, so your users may have to wait an extra 0.5 seconds while you recreate the table.
Don't worry about missing data, only about some delay.

How to avoid to blow up transaction log?

I have a table which stores the results of a complex query. This table is truncated and repopulated once per hour. As you might assume, this is for performance reasons, so the application reads this table rather than running the query.
Is truncate-and-insert the only way to handle this task cheaply, or are there other possibilities with respect to the transaction log?
If I am assuming right, you are using this table as a temp table to store some records and want to remove all records from it every hour, right?
Truncate is always minimally logged. So yes, truncate and then insert will work. Another option is to create a new table with the same structure, drop the old table, and then rename the new table to the old table's name.
If you want to avoid the above, you can explore the "simple" recovery model (this has implications for point-in-time recovery - so be very careful with this if you have other tables in the same database). Or you can create a new database which will just have this one table, and set recovery for that DB to "simple". The simple recovery model will help you keep your t-log small.
Lastly, if you have to have full recovery and also cannot use the "truncate" or "drop" options from above, you should at the very least back up your t-log at very regular intervals (depending on how fast it's growing and how much space you have).
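The recovery-model advice is SQL Server terminology; there, a rough sketch (database name and backup path are placeholders) might look like:
-- option 1: switch a dedicated staging database to simple recovery
ALTER DATABASE StatsStaging SET RECOVERY SIMPLE;

-- option 2: stay in full recovery but back the log up frequently
BACKUP LOG StatsStaging TO DISK = N'D:\Backups\StatsStaging_log.trn';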