Too many unknown writes in MySQL

I have a MySQL database in my production environment. It had about 430 million rows, of which 190 million were not of any use, so I started deleting these rows range by range, at night, as it would have affected my app's performance in the daytime.
Now my monitoring app is showing 100% I/O, most of it writes (12-30 MB/s, 400-500 writes/sec).
But when I check the process list, I don't find any INSERT or UPDATE query, or any rollback.
What could the possible issue be, or how can I find any hidden query that may be writing to MySQL?
(In iotop, I found that the write operations are being done by mysqld only.)
One more thing: I can see writes at 80 MB/s in iotop, but when I check directory sizes under /, I don't see any rise in any directory's size.

Back away slowly... and wait.
InnoDB doesn't change the actual data in the tablespace files with each DML query.
It writes the changes to memory, of course, and then to the redo log, at which point they are effectively "live" and safely persisted to disk... but they are not yet applied to the actual data (tablespace) files. InnoDB then syncs the changes to the data files in the background, but in the meantime, other queries use a combination of the tablespace and log contents to determine what the "official" table data currently contains. This is, of course, an oversimplification, but MVCC necessarily means the physical data is a superset, though not necessarily a proper superset, of the logical data.
That's very likely to be the explanation for what you are seeing now.
It makes sense that free/used disk space isn't changing, because finalizing the deletion of those rows will only really be marking the space inside the tablespace files as unused. They shouldn't grow or shrink.
Resist the temptation to try to "fix" it, and whatever you do, don't restart the server... because the best possible outcome of a restart is that InnoDB picks up where it left off, since it still has the same work to do either way.
SHOW ENGINE INNODB STATUS takes some practice to interpret but will likely be another key to the puzzle.
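A hedged pointer for reading that output: the TRANSACTIONS section reports the purge backlog as "History list length", which should trend downward as InnoDB finishes cleaning up the deleted rows.
SHOW ENGINE INNODB STATUS\G
-- under TRANSACTIONS, watch "History list length": a large but slowly
-- shrinking value means purge is still working through the deleted rows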

Is the delete operation still ongoing? DELETE can be extremely slow and generate a lot of writes. It is often better to create a new identical table, copy the rows you want to KEEP over to it, and then swap it with the production table, instead of deleting rows in the production table directly.
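A minimal sketch of that copy-and-swap pattern (table names and the keep_condition are hypothetical, and rows written between the copy and the swap would still need handling):
CREATE TABLE mytable_new LIKE mytable;          -- same structure and indexes
INSERT INTO mytable_new
  SELECT * FROM mytable WHERE keep_condition;   -- copy only the rows to keep
RENAME TABLE mytable TO mytable_old,            -- atomic swap
             mytable_new TO mytable;
DROP TABLE mytable_old;                         -- reclaims the space in one go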
If the DELETE has already finished and you suspect that there are other queries running, you can enable query log for a few seconds and see which queries are executed:
TRUNCATE TABLE mysql.general_log;  -- start from an empty log table
SET GLOBAL log_output = 'TABLE';   -- send the general log to mysql.general_log
SET GLOBAL general_log = 'ON';     -- start logging every statement
SELECT SLEEP(10);                  -- capture ten seconds of traffic
SET GLOBAL general_log = 'OFF';    -- stop logging
Then SELECT from mysql.general_log to see which queries executed during the 10-second sleep.
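For example (these columns exist in the standard mysql.general_log table):
SELECT event_time, command_type, argument
FROM mysql.general_log
ORDER BY event_time;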


Long-running InnoDB query generates a big undo file in MariaDB

I have a big query in PHP that uses MYSQLI_USE_RESULT so it doesn't load all the results into PHP memory.
If I used MYSQLI_STORE_RESULT, it would pull all of the result data into memory at once, which takes multiple GB of RAM, instead of fetching row by row.
It returns millions of rows, and each row generates an API request, so the query will be running for days.
In the meantime, I have other MySQL queries that update/insert the tables involved in the first query, and I think this causes the undo log to grow without stopping.
I set up innodb_undo_tablespaces=2 and innodb_undo_log_truncate=ON
so the undo log is separated from ibdata1, but the undo files stay big until I kill the queries that have been running for days.
I executed "SET SESSION TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;" before running the long-running query, hoping that it would prevent the undo file from growing, but it didn't.
The other queries that are updating/inserting use autocommit.
Within 1-2 days, the undo file is already 40 GB.
The question: how do I prevent this undo file from growing? I don't want to keep the previous version of the data while the query is running. It's not important if I get updated data instead of the data as it was at the time of the query.
Regardless of your transaction isolation level, a given query will always establish a fixed snapshot, which requires the data to be preserved in the state it was when the query started.
In other words, READ-COMMITTED or READ-UNCOMMITTED allow subsequent queries in the same transaction to see updated data, but a single query will never see a changing data set. Thus concurrent updates to data will force old record versions to be copied to the undo log, and those record versions will be preserved there until your long-running query is finished.
READ-UNCOMMITTED doesn't help any more than READ-COMMITTED. In fact, I've never needed to use READ-UNCOMMITTED for any reason. Allowing "dirty reads" of unfinished transactions breaks rules of ACID databases, and leads to anomalies.
The only way to avoid long-lasting growth of your undo log is to finish your query.
The simplest way to achieve this is to use multiple short-running queries, each fetching a subset of the result. Finish each query in a timely way.
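A sketch of that chunked approach, assuming an indexed id column (table and column names are hypothetical); each query finishes in seconds, so no snapshot stays open and purge can reclaim the undo log between chunks:
SELECT id, payload
FROM big_table
WHERE id > @last_seen_id   -- placeholder: the last id processed so far
ORDER BY id
LIMIT 10000;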
Another solution would be to run the whole query for the millions of rows of result, and store the result somewhere that isn't constrained by InnoDB transaction isolation.
MyISAM table
Message queue
Plain file on disk
Cache like Memcached or Redis
PHP memory (but you said you aren't comfortable with this because of the size)
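For the plain-file option, the server can write the result itself with SELECT ... INTO OUTFILE, so the query finishes as fast as MySQL can produce rows, and the API calls happen afterwards from the file. A sketch (path and names are hypothetical; the account needs the FILE privilege and the path must satisfy secure_file_priv):
SELECT id, payload
FROM big_table
INTO OUTFILE '/var/lib/mysql-files/big_result.csv'
FIELDS TERMINATED BY ',' ENCLOSED BY '"'
LINES TERMINATED BY '\n';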

Will removing MySQL log files impact the performance of the database?

One of our users is using Cloud SQL (MySQL).
They turned on the general log flag, and log_output is FILE.
They need these general logs because of some special circumstances.
MySQL has generated about 8 TB of general logs, and these logs result in much heavier disk usage.
Here is the tricky part:
They want to remove these general log files [1] to decrease the size of the disk.
However, this is their production database. They are afraid this operation will impact their database's performance.
Since these log files are located in /var/log/mysql.log, the remove-logs operation will execute at the OS level, right? -> This is the part we are not so sure about.
If our user executes this truncate API, will the operation affect their database's performance?
Is there any best practice for this kind of situation?
P.S.: Our user doesn't want to turn off the general log flag. They will try to truncate these logs once in a while. But for now, they need to truncate the huge amount of logs that they accumulated in the past few months.
[1] https://cloud.google.com/sql/docs/mysql/admin-api/v1beta4/instances/truncateLog
I understand that you have turned on the general log flag with log_output set to FILE, and you want to remove these general log files to decrease the size of the disk.
According to the official documentation:
To make your general or slow query logs available, enable the corresponding flag and set the log_output flag to FILE. This makes the log output available using the Logs Viewer in the Google Cloud Platform Console. Note that Stackdriver logging charges apply.
If log_output is set to NONE, you will not be able to access the logs. If you set log_output to TABLE, the log output is placed in a table in your database. If this table becomes large, it can affect instance restart time or cause the instance to lose its SLA coverage; for this reason, the TABLE option is not recommended. If needed, you can truncate your log tables by using the API. For more information, see the instances.truncateLog reference page.
Instances: truncateLog truncates the MySQL general and slow query log tables.
If I understood correctly, you cannot "truncate the huge amount of logs that they accumulated in the past few months" via that API, because you did not set log_output to TABLE; therefore there are no log tables to be truncated.
Regarding database performance: TRUNCATE TABLE Statement
On a system with a large InnoDB buffer pool and innodb_adaptive_hash_index enabled, TRUNCATE TABLE operations may cause a temporary drop in system performance due to an LRU scan that occurs when removing an InnoDB table's adaptive hash index entries. The problem was addressed for DROP TABLE in MySQL 5.5.23 (Bug #13704145, Bug #64284) but remains a known issue for TRUNCATE TABLE (Bug #68184).
Here you can check MySQL Server Log Maintenance.
Removing MySQL general_log FILES should not impact the performance of the database.
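On a self-managed server, the maintenance page linked above boils down to: rename the file at the OS level, then ask mysqld to reopen its general log. A sketch (the file path is whatever general_log_file points at):
-- after renaming the file at the OS level, e.g.
-- mv /var/log/mysql.log /var/log/mysql-old.log
FLUSH GENERAL LOGS;  -- mysqld closes the renamed file and opens a fresh one
-- the old file can then be archived or deleted without touching the server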

tmp table not growing when running ALTER to add new column

I have a table of about 25 GB that I have to add a column to.
I run a script to do it, and while it executes I can see the temp table in the data directory, but it stays stuck at about 480 K. I can see in the processlist that the ALTER is running and there are no issues.
If I kill the script after a long period of activity, the query remains in the "killed" state in the processlist, and the tmp file starts growing until the query is actually gone (i.e., it goes from the "killed" state to disappearing from the processlist altogether).
When I run the following (before killing the query):
SELECT * FROM global_temporary_tables\G
it doesn't show any rows being added either.
Is there anything else I can do?
Firstly, what your "ps" output shows may have nothing to do with anything. Don't rely upon what "ps" says: it can include stale data.
If the process has been killed (SIGKILL, not SIGTERM), I guarantee you it is no longer delivering any output to anywhere. If it's been SIGTERMed, it depends what signal handlers you've attached. I'm going to hazard a wild guess that you haven't registered any signal handlers.
Most production DBMSs set up storage in chunks: X amount of space is obtained, which may contain "slack" room that enables rows and/or columns to be added (I do NOT say that the two mechanisms are identical). Just because something didn't grow in a manner you could perceive doesn't mean the changes weren't made. Why not check the data dictionary and interrogate the current structure of the table?
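A hedged example of that data-dictionary check (schema and table names are hypothetical):
SELECT COLUMN_NAME, COLUMN_TYPE, IS_NULLABLE
FROM information_schema.COLUMNS
WHERE TABLE_SCHEMA = 'mydb'
  AND TABLE_NAME = 'mytable'
ORDER BY ORDINAL_POSITION;
-- if the new column is listed here, the ALTER completed, whatever the files look like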
Did you COMMIT your changes? In some DBMSs, DDL operations are regarded as committable/rollbackable (yecch) events.

how to clear/flush mysql innodb buffer pool?

I'm perf tuning a large query, and want to run it from the same baseline before and after, for comparison.
I know about the MySQL query cache, but it's not relevant to me, since the two queries would not be cached anyway.
What is being cached is the InnoDB pages, in the buffer pool.
Is there a way to clear the entire buffer pool so I can compare the two queries from the same starting point?
Whilst restarting the MySQL server after running each query would no doubt work, I'd like to avoid this if possible.
WARNING : The following only works for MySQL 5.5 and MySQL 5.1.41+ (InnoDB Plugin)
Tweak the duration of entries in the InnoDB Buffer Pool with these settings:
SET GLOBAL innodb_old_blocks_time=250;  -- 250 ms, i.e. 0.25 seconds
SET GLOBAL innodb_old_blocks_pct=5;
SET GLOBAL innodb_max_dirty_pages_pct=0;
When you are done testing, set them back to the defaults:
SET GLOBAL innodb_old_blocks_time=0;
SET GLOBAL innodb_old_blocks_pct=37;
SET GLOBAL innodb_max_dirty_pages_pct=90;  -- use 75 for MySQL 5.5 / MySQL 5.1 InnoDB Plugin
Check out the definitions of these settings:
MySQL 5.5
innodb_old_blocks_time
innodb_old_blocks_pct
innodb_max_dirty_pages_pct
MySQL 5.1.41+
innodb_old_blocks_time
innodb_old_blocks_pct
innodb_max_dirty_pages_pct
Much simpler... Run this twice
SELECT SQL_NO_CACHE ...;
And look at the second timing.
The first one warms up the buffer_pool; the second one avoids the QC by having SQL_NO_CACHE. (In MySQL 8.0, leave off SQL_NO_CACHE; it is gone.)
So the second timing is a good indication of how long it takes in a production system with a warm cache.
Further, look at the Handler counts:
FLUSH STATUS;
SELECT ...;
SHOW SESSION STATUS LIKE 'Handler%';
gives a reasonably clear picture of how many rows are touched. That, in turn, gives you a good feel for how much effort the query takes. Note that this can be run quite successfully (and quickly) on small datasets. Then you can (often) extrapolate to larger datasets.
A "Handler_read" might be reading an index row or a data row. It might be the 'next' row (hence probably cached in the block that was read for the previous row), or it might be random (hence possibly subject to another disk hit). That is, the technique fails to help much with "how many blocks are needed".
This Handler technique is impervious to what else is going on; it gives consistent results.
"Handler_write" indicates that a tmp table was needed.
Numbers that approximate the number of rows in the table (or a multiple of it) probably indicate a table scan (or several). A number that is the same as the LIMIT might mean you built such a good index that the query could stop as soon as the LIMIT was satisfied.
If you do flush the buffer_pool, you could watch for changes in Innodb_buffer_pool_reads to give a precise(?) count of the number of pages read in a cold system. This would include non-leaf index pages, which are almost always cached. If anything else is going on in the system, this STATUS value should not be trusted because it is 'global', not 'session'.
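If you do sample it, a sketch (again: it is a global counter, so only trust the delta on an otherwise idle server):
SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_reads';  -- note the value
-- ... run the query under test ...
SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_reads';  -- subtract: pages that had to come from disk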

Innodb Performance Optimization

One portion of my site requires bulk inserts; it takes around 40 minutes for InnoDB to load the file into the database. I have been digging around the web and found a few things.
innodb_autoinc_lock_mode=2 (it won't generate consecutive keys)
UNIQUE_CHECKS=0; (disable unique key checks)
FOREIGN_KEY_CHECKS=0 (disable foreign key checks)
--log_bin=OFF (turn off the binary log used for replication)
Problem
I want to set the first 3 options for just one session, i.e. during the bulk insert. The first option does not work; MySQL says unknown system variable 'innodb_autoinc_lock_mode'. I am using MySQL 5.0.4.
As for the last option, I would like to turn it off, but I am wondering: if I need replication later, will it just start working if I turn it on again?
Suggestions
Any other suggestions on how to improve bulk inserts/updates for the InnoDB engine? Or please comment on my findings.
Thanks
Assuming you are loading the data in a single or few transactions, most of the time is likely to be spent building indexes (depending on the schema of the table).
Do not do a large number of small inserts with autocommit enabled; that will destroy performance, with a sync for each commit.
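A minimal sketch of batching instead of per-row autocommit (the batch size is illustrative):
SET autocommit = 0;
START TRANSACTION;
-- INSERT the next 10k-100k rows here
COMMIT;  -- one sync per batch instead of one per row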
If your table is bigger than (or nearly as big as) the InnoDB buffer pool, you are in trouble; a table which can't fit in RAM with its indexes cannot be inserted into efficiently, as it will have to do READS to insert. This is so that existing index blocks can be updated.
Remember that disc writes are OK (they are mostly sequential, and you have a battery-backed RAID controller, right?), but reads are slow and need to be avoided.
In summary
Do the insert in a small number of big-ish transactions, say 10k-100k rows each. Don't make the transactions too big or you'll exhaust the logs.
Get enough RAM that your table fits in memory; set the InnoDB buffer pool appropriately (you are running x86_64, right?)
Don't worry about the operation taking a long time, as due to MVCC, your app will be able to operate on the previous versions of the rows assuming it's only reading.
Don't make any of the optimisations listed above; they're probably a waste of time (don't take my word for it - benchmark the operation on a test system in your lab with/without them).
Turning unique checks off is actively dangerous as you'll end up with broken data.
To answer the last part of your question: no, it won't just start working again; if the inserts are not replicated but subsequent updates are, the result will not be a pretty sight. Disabling foreign and unique keys should be OK, provided you re-enable them afterwards and deal with any constraint violations.
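If the concern is only the bulk load itself, there is also a session-scoped switch that skips binary logging for that one connection (it needs the SUPER privilege), avoiding a server restart with --log_bin=OFF; the replication caveat above still applies, since replicas will never see these rows:
SET SESSION sql_log_bin = 0;  -- statements on this connection skip the binary log
-- ... bulk load ...
SET SESSION sql_log_bin = 1;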
How often do you have to do this? Can you load smaller datasets more frequently?