Why should we store log files and bin-log files on different paths or disks in MySQL?

I have a replication setup for my MySQL databases. The log files and the bin-log files are all written to one path, which is the default: my MySQL data directory.
I have read that for better performance one should store them separately.
Can anyone explain how this improves performance? Is there documentation available on this, and what is the reason why one should do so?

Mainly because reads and writes can then happen almost in parallel. "Stored separately" here means on different physical disks.
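If you want to try this, the binary log location is controlled by the log_bin option, so a minimal my.cnf sketch could look like the following (the /data and /binlogs mount points are hypothetical and assumed to be on different physical disks; MySQL needs a restart and write permission on both directories):
[mysqld]
# data files on one disk (hypothetical mount point)
datadir   = /data/mysql
# binary logs on a different disk (hypothetical mount point)
log_bin   = /binlogs/mysql-bin
# the replication relay log can be moved the same way
relay_log = /binlogs/mysql-relay-bin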
Linux and H/W optimizations for MySQL is a nice presentation of ways to improve MySQL performance: it presents benchmarks and conclusions on when to use SSDs and when to use SCSI disks, and which kinds of processors are better for which tasks.
Very good presentation, a must read for any DBA!!

It also can be really embarrassing to have your log files fill the file system and bring the database to a halt.

One consideration is that using a separate disk for binlogging introduces another SPOF since if MySQL cannot write the binlog it will croak the same as if it couldn't write to the data files. Otherwise, adding another disk just better separates the two tasks so that binlog writes and data file writes don't have to contend for resources. With SSDs this is much less of an issue unless you have some crazy heavy write load and are already bound by SSD performance.
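If you want to check where your server is currently writing these files before deciding anything, the locations are visible as server variables (the output is of course installation-specific):
SHOW VARIABLES LIKE 'datadir';
SHOW VARIABLES LIKE 'log_bin%';
SHOW VARIABLES LIKE 'relay_log%';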

It's mostly for cases where your database write traffic is so high that a single disk volume can't keep up while writing for both data files and log files. Disks have a finite amount of throughput, and you could have a very busy database server.
But it's not likely that separating data files from binlogs will give better performance for queries, because MySQL writes to the binlog at commit time, not at query time. If your disks were too slow to keep up with the traffic, you'd see COMMIT become a bottleneck.
The system I currently support stores binlogs in the same directory as the datadir. The datadir is on a RAID10 volume over 12 physical drives. This has plenty of throughput to support our workload. But if we had about double our write traffic, this RAID array wouldn't be able to keep up.
You don't need to do every tip that someone says gives better performance, because any given tip might make no difference to your application's workload. You need to measure many metrics of performance and resource use, and come up with the right tuning or configuration to help the bottlenecks under your workload.
There is no magic configuration that makes everything have high performance.
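As one rough way of measuring whether binlog writes and data file writes are actually contending, MySQL 5.7+ with the sys schema can report I/O grouped by file. A sketch, using the unformatted x$ version of the view so the ORDER BY works on raw byte counts:
-- top files by bytes written; binlog files and .ibd data files show up separately
SELECT file, count_write, total_written
FROM sys.x$io_global_by_file_by_bytes
ORDER BY total_written DESC
LIMIT 10;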

Related

Distributing MySQL storage to drives by function

I'm wondering if MySQL has any capability to specify that data belonging to a certain account (representing e.g., a particular app, or a particular corporate program) be stored at some particular place in the filesystem (such as a particular drive or RAID), instead of bundling it inside the same physical file structure that is shared by every other account, table, and data element managed by MySQL for everybody on that server.
I'm aware that I can jigger MySQL to store its entire data bundle at a place other than the default place, but I was hoping there might be a way to do this by function, for "some data but not all data."
In MySQL 8.0, there are options to specify the location for each table or tablespace. See https://dev.mysql.com/doc/refman/8.0/en/innodb-create-table-external.html
In earlier versions of MySQL, these options didn't work consistently. You could specify the directory for individual table partitions, if your table was partitioned, but not for a non-partitioned table. Go figure. :-)
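For example, on MySQL 8.0 with innodb_file_per_table enabled (and the target directory made known to InnoDB, e.g. via innodb_directories), per-table placement looks roughly like this; the table name and path are invented for illustration:
-- hypothetical table placed on a separate, faster volume
CREATE TABLE app1.big_history (
  id      BIGINT NOT NULL PRIMARY KEY,
  payload TEXT
) ENGINE=InnoDB
  DATA DIRECTORY = '/fast_disk/mysql-external';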
That said, I've never encountered a situation where it was worth the time to specify the physical location of tables. Basically, if your performance depends on the difference between one RAID filesystem vs. carefully choosing among different drives, you're already losing.
Instead, I've always done this approach:
Use one datadir on a fast RAID filesystem. Use the default configuration of all tables and logs under this datadir.
Allocate as much RAM as I can afford to the InnoDB buffer pool (up to the size of the database, of course; no need to use more RAM than that). RAM is orders of magnitude faster than any disk, even an SSD, so you'd prefer to be reading data out of RAM. (See the configuration sketch after this list.)
If that's not enough performance, there are other things you can do to optimize, like creating indexes, or modifying the application code to do more caching to reduce database reads, or using a message queue to postpone database writes.
If that's still not enough performance, then scale out to multiple database servers. In other words, sharding.
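As a rough sketch of the configuration side of this approach (the sizes below are placeholders; choose values based on your own RAM and data size):
[mysqld]
# give InnoDB most of the machine's RAM, up to roughly the size of the data
innodb_buffer_pool_size = 64G
# multiple buffer pool instances reduce contention on large pools
innodb_buffer_pool_instances = 8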

MySQL: Speed over reliability config

For my development machine I need no data consistency in case of a crash. Is there a config for a Debian-like system that optimizes MySQL for speed, even if it sacrifices reliability?
So something like: Cache the last 1 GB in RAM. Don't touch the disk with data until the 1 GB is used.
What kind of queries are going on? One of my mantras: "You cannot configure your way out of a performance problem."
Here's one thing that speeds up InnoDB, wrt transactions:
innodb_flush_log_at_trx_commit = 2
There is a simple way to speed up single-row inserts by a factor of 10.
Some 'composite' indexes can speed up a SELECT by a factor of 100.
Reformulating a WHERE can sometimes speed up a query by a factor of 100.
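To make the last few tips concrete (the orders table, its columns, and the values here are invented; the insert tip is presumably about batching rows into multi-row INSERT statements):
-- a composite index matching the WHERE and ORDER BY lets InnoDB locate and
-- order the matching rows via the index instead of scanning the whole table
ALTER TABLE orders ADD INDEX idx_cust_created (customer_id, created_at);

SELECT id, total
FROM orders
WHERE customer_id = 42
  AND created_at >= '2024-01-01'
ORDER BY created_at;

-- batching many rows into one INSERT avoids per-statement overhead and is the
-- usual way to speed up single-row inserts by roughly an order of magnitude
INSERT INTO orders (customer_id, created_at, total) VALUES
  (42, '2024-01-01', 10.00),
  (43, '2024-01-01', 12.50),
  (44, '2024-01-02',  7.25);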
You can disable many of the InnoDB configurations for durability, at the cost of an increased risk of losing data. But sometimes you want to operate the database in "running with scissors" mode because the original data is safely stored somewhere else, and the copy in your test database is easily recreated.
This blog describes Reducing MySQL durability for testing. You aren't going to see any official MySQL recommendation to do this for any purpose other than testing!
Here's a summary of changes you can make in your /etc/my.cnf:
[mysqld]
# log_bin (comment this out to disable the binary log)
# sync_binlog=0 (irrelevant if you don't use the binary log)
sync_frm=0  # skip syncing .frm files to disk (option removed in MySQL 8.0)
innodb_flush_log_at_trx_commit=0  # flush the redo log about once per second instead of at every commit
innodb_doublewrite=0  # disable the doublewrite buffer
innodb_checksums=0  # disable InnoDB page checksums (deprecated in favor of innodb_checksum_algorithm)
innodb_support_xa=0  # disable two-phase commit support (removed in later MySQL versions)
innodb_log_file_size=2048M # or more
He also recommends increasing innodb_buffer_pool_size, but the right size depends on your available RAM.
For what it's worth, I recently tried to set innodb_flush_log_at_trx_commit=0 in the configuration in the default Vagrant box I built for developers on my team, but I had to back out that change because it was causing too much lost time for developers who were getting corrupted databases. Just food for thought. Sometimes it's not a good tradeoff.
This doesn't do exactly what you asked (keep the last 1GB of data in RAM), as it still operates InnoDB with transaction logging and the log flushes to disk once per second. There's no way to turn that off in MySQL.
You could try using MyISAM, which uses buffered writes for data and index, and relies on the filesystem buffer. Therefore it could cache some of your data (in practice I have found that the buffer flushes to disk pretty promptly, so you're unlikely to have a full 1GB in RAM at any time). MyISAM has other problems, like lack of support for transactions. Developing with MyISAM and then using InnoDB in production can set you up for some awkward surprises.
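One way to catch that mismatch early is to compare which engine each table actually uses in your development and production environments, for example:
-- list user tables and their storage engines
SELECT table_schema, table_name, engine
FROM information_schema.tables
WHERE table_type = 'BASE TABLE'
  AND table_schema NOT IN ('mysql', 'information_schema', 'performance_schema', 'sys');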
Here are a couple of other changes you could make in your MySQL sessions for the sake of performance, but I don't recommend these even for development, because they can change your application's behavior.
set session unique_checks=0;
set session foreign_key_checks=0;
Some people recommend using the MEMORY storage engine. That has its own problems, like size limits, table-locking, and lack of support for transactions.
I've also experimented with trying to put tables or tmpdir onto a tmpfs, but I found that didn't give nearly the performance boost you might expect. There's overhead in an RDBMS that is not directly related to disk I/O.
You might also like to experiment with MyRocks, a version of MySQL that includes the RocksDB storage engine. Facebook developed it and released it as open source. See Facebook rocks an open source storage engine for MySQL (InfoWorld). They promise it reduces I/O, compresses data, and does other neat things.
But again, it's a good rule of thumb to make your development environment as close as possible to your production environment. Using a different storage engine creates a risk of not discovering some bugs until your code reaches production.
Bottom line: Tuning MySQL isn't a magic bullet. Maybe you should consider designing your application to make more use of microservices, caches, and message queues, and less reliance on direct SQL queries.
Also, I'd recommend to always supply your developers the fastest SSD-based workstation you can afford. Go for the top of the line on CPU and RAM and disk speed.
Bill Karwin's answer has useful MySQL settings to improve performance. I have used them all and was able to achieve roughly a 2x performance improvement.
However, what gave me the biggest performance boost (nearly 15x faster) for my use case, which was reloading a MySQL dump, was to mount the underlying filesystem (ext4) using the nobarrier option.
mount -o remount,nobarrier /
You should only consider this if you have a separate partition (or logical volume) mounted at /var/lib/mysql, so that you can make this tradeoff only for MySQL, not your entire system.
Although this answer may not address exactly what you asked, consider creating your tables with the MEMORY engine as documented here: http://dev.mysql.com/doc/refman/5.7/en/memory-storage-engine.html
A typical use case for the MEMORY engine involves these characteristics:
Operations involving transient, non-critical data such as session management or caching. When the MySQL server halts or restarts, the data in MEMORY tables is lost.
In-memory storage for fast access and low latency. Data volume can fit entirely in memory without causing the operating system to swap out virtual memory pages.
A read-only or read-mostly data access pattern (limited updates).
Give that a shot.
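A minimal example of such a table (the names are invented) might be:
-- transient session cache; contents disappear when the server restarts
CREATE TABLE session_cache (
  session_id CHAR(32) NOT NULL PRIMARY KEY,
  user_id    INT,
  payload    VARBINARY(1024),
  expires_at DATETIME
) ENGINE=MEMORY;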
My recommendation, even for a development machine, would be to use the default InnoDB. If you need transactions, InnoDB will be helpful.
This blog can help you run MySQL off tmpfs: http://jotschi.de/2014/02/03/high-performance-mysql-testdatabase/. User Jotschi also discusses this in SO answer #10692398

Apache & MySQL with Persistent Disks to Multiple Instances

I plan to mount persistent disks at the Apache (/var/www) and MySQL (/var/lib/mysql) folders to avoid having to replicate information between servers.
Has anyone done tests that show whether the I/O performance of a persistent disk is similar when the same disk is attached to 100 instances versus only 2 instances? Also, is there a limit on how many instances one persistent disk can be attached to?
I'm not sure exactly what setup you're planning to use, so it's a little hard to comment specifically.
If you plan to attach the same persistent disk to all servers, note that a disk can only be attached to multiple instances in read-only mode, so you may not be able to use temporary tables, etc. in MySQL without extra configuration.
It's a bit hard to give performance numbers for a hypothetical configuration; I'd expect performance would depend on amount of data stored (e.g. 1TB of data will behave differently than 100MB), instance size (larger instances have more memory for page cache and more CPU for processing I/O), and access pattern. (Random reads vs. sequential reads)
The best option is to set up a small test system and run an actual load test using something like apachebench, jmeter, or httperf. Failing that, you can try to construct an artificial load that's similar to your target benchmark.
Note that just running bonnie++ or fio against the disk may not tell you if you're going to run into problems; for example, it could be that a combination of sequential reads from one machine and random reads from another causes problems, or that 500 simultaneous sequential reads from the same block causes a problem, but that your application never does that. (If you're using Apache+MySQL, it would seem unlikely that your application would do that, but it's hard to know for sure until you test it.)

Can massive writing & deleting files hurt our server performance?

We run a system that, for caching purposes, currently writes and deletes about 1,000 small files (10 KB each) every hour.
In the near future this number will rise to about 10,000 - 20,000 files being written and deleted every hour.
For every file that is written, a new row is added to our MySQL DB, and that row is deleted when the file is deleted an hour later.
My question:
Can these constant write & delete operations eventually hurt our server's performance somehow?
(btw we currently run this on a VPS and soon on a dedicated server.)
Can writing and deleting so many rows eventually slow our DB?
This greatly depends on operating system, file system and configuration of file system caching. Also this depends on whether your database is stored on the same disk as files that are written/deleted.
Usually, operations that affect file system structure, such as file creations and file deletions, require some synchronous disk IO, so the operating system will not lose these changes after a power failure. Though, some operating systems and file systems may support a more relaxed policy for this. For example, the UFS file system on FreeBSD has a nice "soft updates" option that does this. ext3 on Linux probably has a similar feature.
Once you move to a dedicated server, I think it would be reasonable to attach several HDDs to it and make sure the database is stored on one disk while the massive file operations are performed on another disk. In that case DB performance should not be affected.
You should make some calculations and estimate the needed throughput for the storage. In your worst-case scenario, 20,000 files x 10 KB = 200 MB per hour, which is a very low requirement.
Deleting a file, on modern filesystems, takes very little time.
In my opinion you don't have to worry, especially if your application creates and deletes files sequentially.
Consider also that modern operating systems cache parts of the file system in memory to improve performance and reduce disk access (this is true especially for multiple deletes).
Your database will grow, but engines are optimized for that; no need to worry about it.
The only downside is that handling many small files could cause disk fragmentation if your file system is susceptible to it.
For a performance bonus, you should consider using separate physical storage for these files (e.g. a different disk drive or disk array) so you get the full transfer bandwidth with no other interference.

How long should a 20GB restore take in MySQL? (A.k.a. Is something broken?)

I'm trying to build a dev copy of a production MySQL database by loading one of the backups. How long should it take to do this if the uncompressed dump is ~20G?
This command has been running for something like 24h with 10% CPU load and I'm wondering if it's just slow or if it/I am doing something wrong.
mysql -u root -p < it_mysql_dump.sql
BTW it's on a beefy desktop dev machine with plenty of ram, but it might be reading and writing the same HDD. I think I'm using InnoDB.
Restoring MySQL dumps can take a long time. This is because loading a dump really does rebuild the entire tables, including their indexes.
Exactly what you need to do to fix it depends on the engine, but in general I would say do the following:
Zeroth rule: Only use a 64-bit OS.
Make sure that you have enough physical RAM to fit the biggest single table into memory; include any overhead for the OS in this calculation (NB: on operating systems that use 4k pages, i.e. all of them, the page tables themselves take up a lot of memory on large-memory systems; don't forget this).
Tune the innodb_buffer_pool so that it is bigger than the largest single table; or, if using MyISAM, tune the key_buffer so that it is big enough to hold the indexes of the largest table (a configuration sketch follows these steps).
Be patient.
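A rough my.cnf sketch of that tuning, intended only for the duration of a large restore (the sizes are placeholders; scale them to your RAM and largest tables, and re-tighten durability afterwards):
[mysqld]
# InnoDB: buffer pool larger than the biggest single table
innodb_buffer_pool_size = 32G
# a larger redo log means less checkpoint flushing during the bulk load
innodb_log_file_size = 2G
# relax durability while restoring (not for production use)
innodb_flush_log_at_trx_commit = 2
# MyISAM: key buffer big enough for the largest table's indexes
key_buffer_size = 8G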
Now, if you are still finding that it is slow having done the above, it may be that your particular database has a very tricky structure to restore.
Personally I've managed to rebuild a server with ~ 2Tb in < 48 hours, but that was a particular case.
Be sure that your development system has production-grade hardware if you intend to load production data into it.
In particular, if you think that you can bulk-load data into tables which don't fit into memory (or at least, mostly into memory), forget it.
If this all seems like too much, remember that you can just use a filesystem or LVM snapshot online with InnoDB, and then just copy the files. With MyISAM it's a bit trickier but can still be done.
Open another terminal, run mysql, and count the rows in some of the tables in your dump (SELECT COUNT(*) FROM table). Compare to the source database. That'll tell you the progress.
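If you'd rather not run a potentially slow COUNT(*) against large InnoDB tables while they're being loaded, the approximate counts in information_schema give a cheaper (if rough) progress indicator; replace the schema name with the one being restored:
-- table_rows is only an estimate for InnoDB, but it is cheap to query
SELECT table_name, table_rows
FROM information_schema.tables
WHERE table_schema = 'your_database'
ORDER BY table_rows DESC;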
I INSERTed about 80GB of data into MySQL over a network in about 14 hours. They were one-insert-per-row dumps (slow) with a good bit of overhead, inserting on a server with fast disks.
24 hours is possible if the hardware is old enough, or your import is competing with something else for disk IO and memory.
I just went through the experience of restoring a 51.8 GB database from a 36.8 GB mysqldump file to create an IMDb database. For me the restore, which was not done over the network but from a file on the native machine, took a little under 4 hours.
The machine is a quad-core server running Windows Server 2008. People have wondered if there is a way to monitor progress. There actually is: you can watch the restore create the database files by going to the ProgramData directory, finding the MySQL subdirectory, and then finding the subdirectory with your database name.
The files are gradually built in the directory and you can watch them build up. No small comfort when you have a production issue and you are wondering if the restore job is hung up or just taking a long time.