Distributing MySQL storage to drives by function - mysql

I'm wondering if MySQL has any capability to specify that data belonging to a certain account (representing, e.g., a particular app or a particular corporate program) be stored at some particular place in the filesystem (such as a particular drive or RAID), instead of bundling it inside the same physical file structure that is shared by every other account, table, and data element managed by MySQL for everybody on that server.
I'm aware that I can jigger MySQL to store its entire data bundle at a place other than the default place, but I was hoping there might be a way to do this by function, for "some data but not all data."

In MySQL 8.0, there are options to specify the location for each table or tablespace. See https://dev.mysql.com/doc/refman/8.0/en/innodb-create-table-external.html
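As a rough illustration of the per-table option, assuming MySQL 8.0 with InnoDB file-per-table tablespaces: the DATA DIRECTORY clause on CREATE TABLE places that table's data file under the given directory. The Python helper below only builds the statement text. The helper name, table name, columns, and path are all hypothetical, and in recent 8.0 releases the directory may also need to be listed in innodb_directories, so check the linked manual page for your version.

```python
def create_table_with_location(table: str, directory: str) -> str:
    """Build a CREATE TABLE statement that stores the table's data file
    under `directory` instead of the server's datadir.

    Illustration only: identifiers and paths are interpolated without
    escaping, so never pass untrusted input to this helper.
    """
    return (
        f"CREATE TABLE {table} (\n"
        f"  id INT PRIMARY KEY,\n"
        f"  payload VARCHAR(255)\n"
        f") ENGINE=InnoDB DATA DIRECTORY = '{directory}'"
    )


# Hypothetical table and mount point for one application's data.
statement = create_table_with_location("app1_events", "/mnt/fast_raid/mysql")
print(statement)
```

The printed statement can be run through any MySQL client; the MySQL server's OS user needs write access to the target directory.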
In earlier versions of MySQL, these options didn't work consistently. You could specify the directory for individual table partitions, if your table was partitioned, but not for a non-partitioned table. Go figure. :-)
That said, I've never encountered a situation where it was worth the time to specify the physical location of tables. Basically, if your performance depends on the difference between one RAID filesystem vs. carefully choosing among different drives, you're already losing.
Instead, I've always taken this approach:
Use one datadir on a fast RAID filesystem. Use the default configuration of all tables and logs under this datadir.
Allocate as much RAM as I can afford to the InnoDB buffer pool (up to the size of the database of course - no need to use more RAM than that). RAM is orders of magnitude faster than any disk, even an SSD. So you'd prefer to be reading data out of RAM.
If that's not enough performance, there are other things you can do to optimize, like creating indexes, or modifying the application code to do more caching to reduce database reads, or using a message queue to postpone database writes.
If that's still not enough performance, then scale out to multiple database servers. In other words, sharding.

Related

Should unused tables be archived?

There is a table in our database that takes about 25GB. It is no longer used by the current code.
Will it give any performance improvements (for rest of the tables) if we archive this table, even though it's not queried/used? Please provide explanation.
We are using MySQL with AWS Aurora.
Archiving the table will not have any performance impact on Aurora. Unused pages are eventually evicted from the buffer pool [1], and after that they are never pulled back onto the DB instances unless you run a query that touches those pages.
You would continue to pay storage costs (and other indirect costs like snapshots) by keeping the unused tables around. A better option would be to move the unused data to a new cluster, create a snapshot of it, and remove the cluster. You can always recover the data when you need it by restoring the snapshot. The original database can then be cleaned up by dropping the unused tables. This way you end up paying only for the snapshot, which is cheaper.
You could also export the data out of MySQL (as CSV, let's say) and store it in S3/Glacier. The only caveat is that when you need to access the data, loading it back into an existing or new database cluster can be a much more time-consuming effort.
[1] The buffer pool uses LRU for eviction. When your workload runs for long enough, you eventually end up evicting all the pages associated with the unused table. Link: https://dev.mysql.com/doc/refman/5.5/en/innodb-buffer-pool.html
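To make the eviction argument concrete, here is a minimal sketch of a plain LRU cache in Python. It is a simplification: InnoDB actually uses a variation of LRU (with a midpoint insertion strategy), and the class name, page ids, and capacity below are hypothetical. It shows only the general effect, which is that pages of a table nobody queries fall out of the cache once the live workload has touched enough other pages.

```python
from collections import OrderedDict


class LRUPool:
    """A plain LRU cache of page ids with a fixed capacity."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.pages = OrderedDict()  # least recently used first

    def touch(self, page):
        """Access a page, loading it (and evicting the LRU page) if needed."""
        if page in self.pages:
            self.pages.move_to_end(page)  # mark as most recently used
        else:
            self.pages[page] = True
            if len(self.pages) > self.capacity:
                self.pages.popitem(last=False)  # evict least recently used


pool = LRUPool(capacity=4)

# Pages of the unused table were read once, long ago.
for page in ["old:1", "old:2"]:
    pool.touch(page)

# The live workload keeps touching its own pages.
for _ in range(3):
    for page in ["live:1", "live:2", "live:3", "live:4"]:
        pool.touch(page)

cached = list(pool.pages)
print(cached)  # only "live:" pages remain in the pool
```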
Yes, archiving will improve performance, and it will also reduce the size and duration of backup/recovery cycles.
I have tried it on different projects in my recent full-time job and the results are amazing. To those who disagree I would only say:
A reduced footprint reduces disk I/O and scans.
A reduced footprint reduces buffer requirements and hence RAM requirements.
YES, archiving infrequently used data will ease the burden on faster and more frequently accessed data storage systems. Older data that is unlikely to be needed often is put on systems that don't need the speed and accessibility of systems that contain data still in use.
Archived data is stored on a lower-cost tier of storage, which reduces primary storage consumption and related costs. Typically, data deduplication is performed on data being moved to a lower storage tier, which reduces the overall storage footprint and lowers secondary storage costs.

MySQL: Speed over reliability config

For my development machine I need no data consistency in case of a crash. Is there a config for a Debian-like system, that optimizes MySQL for speed (even if it sacrifices reliability)?
So something like: Cache the last 1 GB in RAM. Don't touch the disk with data until the 1 GB is used.
What kind of queries are going on? One of my mantras: "You cannot configure your way out of a performance problem."
Here's one thing that speeds up InnoDB, wrt transactions:
innodb_flush_log_at_trx_commit = 2
There is a simple way to speed up single-row inserts by a factor of 10.
Some 'composite' indexes can speed up a SELECT by a factor of 100.
Reformulating a WHERE can sometimes speed up a query by a factor of 100.
You can disable many of the InnoDB durability settings, at the cost of an increased risk of losing data. But sometimes you want to operate the database in "running with scissors" mode because the original data is safely stored somewhere else, and the copy in your test database is easily recreated.
This blog describes Reducing MySQL durability for testing. You aren't going to see any official MySQL recommendation to do this for any purpose other than testing!
Here's a summary of changes you can make in your /etc/my.cnf:
[mysqld]
# log_bin (comment this out to disable the binary log)
# sync_binlog=0 (irrelevant if you don't use the binary log)
sync_frm=0
innodb_flush_log_at_trx_commit=0
innodb_doublewrite=0
innodb_checksums=0
innodb_support_xa=0
innodb_log_file_size=2048M # or more
The blog's author also recommends increasing innodb_buffer_pool_size, but the right size depends on your available RAM.
For what it's worth, I recently tried to set innodb_flush_log_at_trx_commit=0 in the configuration in the default Vagrant box I built for developers on my team, but I had to back out that change because it was causing too much lost time for developers who were getting corrupted databases. Just food for thought. Sometimes it's not a good tradeoff.
This doesn't do exactly what you asked (keep the last 1GB of data in RAM), as it still operates InnoDB with transaction logging and the log flushes to disk once per second. There's no way to turn that off in MySQL.
You could try using MyISAM, which uses buffered writes for data and index, and relies on the filesystem buffer. Therefore it could cache some of your data (in practice I have found that the buffer flushes to disk pretty promptly, so you're unlikely to have a full 1GB in RAM at any time). MyISAM has other problems, like lack of support for transactions. Developing with MyISAM and then using InnoDB in production can set you up for some awkward surprises.
Here are a couple of other changes you could make in your MySQL sessions for the sake of performance, but I don't recommend these even for development, because they can change your application's behavior.
set session unique_checks=0;
set session foreign_key_checks=0;
Some people recommend using the MEMORY storage engine. That has its own problems, like size limits, table-locking, and lack of support for transactions.
I've also experimented with trying to put tables or tmpdir onto a tmpfs, but I found that didn't give nearly the performance boost you might expect. There's overhead in an RDBMS that is not directly related to disk I/O.
You might also like to experiment with MyRocks, a version of MySQL including the RocksDB storage engine for MySQL. Facebook developed it and released it as open-source. See Facebook rocks an open source storage engine for MySQL (InfoWorld). They promise it reduces I/O, it compresses data, and does other neat things.
But again, it's a good rule of thumb to make your development environment as close as possible to your production environment. Using a different storage engine creates a risk of not discovering some bugs until your code reaches production.
Bottom line: Tuning MySQL isn't a magic bullet. Maybe you should consider designing your application to make more use of microservices, caches, and message queues, and less reliance on direct SQL queries.
Also, I'd recommend to always supply your developers the fastest SSD-based workstation you can afford. Go for the top of the line on CPU and RAM and disk speed.
@Bill Karwin's answer has useful MySQL settings to improve performance. I have used them all and was able to achieve a roughly 2x performance improvement.
However, what gave me the biggest performance boost (nearly 15x faster) for my use case, which was reloading a MySQL dump, was to mount the underlying filesystem (ext4) using the nobarrier option.
mount -o remount,nobarrier /
More info here
You should only consider this if you have a separate partition (or logical volume) mounted at /var/lib/mysql, so that you can make this tradeoff only for MySQL, not your entire system.
Although this answer may not hit exactly the questions you ask, consider creating your tables with MEMORY engine as documented here: http://dev.mysql.com/doc/refman/5.7/en/memory-storage-engine.html
A typical use case for the MEMORY engine involves these characteristics:
Operations involving transient, non-critical data such as session management or caching. When the MySQL server halts or restarts, the data in MEMORY tables is lost.
In-memory storage for fast access and low latency. Data volume can fit entirely in memory without causing the operating system to swap out virtual memory pages.
A read-only or read-mostly data access pattern (limited updates).
Give that a shot.
My recommendation, even for a development machine, would be to use the default InnoDB. If you choose to do transactions, InnoDB will be helpful.
This blog can help you run MySQL off of tmpfs: http://jotschi.de/2014/02/03/high-performance-mysql-testdatabase/. User Jotschi also speaks about that in a SO answer #10692398

Apache & MySQL with Persistent Disks to Multiple Instances

I plan to mount persistent disks at the Apache (/var/www) and MySQL (/var/lib/mysql) folders to avoid having to replicate information between servers.
Has anyone run tests to find out whether the I/O performance of a persistent disk is similar when the same disk is attached to 100 instances as when it is attached to only 2? Also, is there a limit on how many instances one persistent disk can be attached to?
I'm not sure exactly what setup you're planning to use, so it's a little hard to comment specifically.
If you plan to attach the same persistent disk to all servers, note that a disk can only be attached to multiple instances in read-only mode, so you may not be able to use temporary tables, etc. in MySQL without extra configuration.
It's a bit hard to give performance numbers for a hypothetical configuration; I'd expect performance to depend on the amount of data stored (e.g. 1TB of data will behave differently than 100MB), instance size (larger instances have more memory for page cache and more CPU for processing I/O), and access pattern (random reads vs. sequential reads).
The best option is to set up a small test system and run an actual load test using something like apachebench, jmeter, or httperf. Failing that, you can try to construct an artificial load that's similar to your target benchmark.
Note that just running bonnie++ or fio against the disk may not tell you if you're going to run into problems; for example, it could be that a combination of sequential reads from one machine and random reads from another causes problems, or that 500 simultaneous sequential reads from the same block causes a problem, but that your application never does that. (If you're using Apache+MySQL, it would seem unlikely that your application would do that, but it's hard to know for sure until you test it.)

Can massive writing & deleting files hurt our server performance?

We run a system that, for caching purposes, currently writes and deletes about 1,000 small files (10 KB each) every hour.
In the near future this number will rise to about 10,000 - 20,000 files being written and deleted every hour.
For every file that is written, a new row is added to our MySQL DB, and that row is deleted when the file is deleted an hour later.
My question:
Can this excessive write & delete operation hurt our server performance eventually somehow?
(btw we currently run this on a VPS and soon on a dedicated server.)
Can writing and deleting so many rows eventually slow our DB?
This greatly depends on operating system, file system and configuration of file system caching. Also this depends on whether your database is stored on the same disk as files that are written/deleted.
Usually, operations that affect file system structure, such as file creations and file deletions, require some synchronous disk I/O so that the operating system will not lose these changes after a power failure. However, some operating systems and file systems support a more relaxed policy for this. For example, the UFS file system on FreeBSD has a nice "soft updates" option that does this. ext3 on Linux probably has a similar feature.
Once you move to a dedicated server, I think it would be reasonable to attach several HDDs to it and make sure that the database is stored on one disk while the massive file operations are performed on another disk. In that case DB performance should not be affected.
You should make some calculations and estimate the throughput needed from the storage. In your worst-case scenario, 20,000 files x 10 KB = 200 MB per hour, which is a very low requirement.
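The estimate above can be worked through in a few lines of Python. This is only back-of-the-envelope arithmetic using the figures from the question (20,000 files per hour at 10 KB each, with every file and its row created once and deleted once); the variable names are mine, and real workloads are burstier than a flat average.

```python
FILES_PER_HOUR = 20_000   # worst case from the question
FILE_SIZE_KB = 10         # size of each cache file
SECONDS_PER_HOUR = 3600

# Data volume written per hour, in decimal megabytes.
mb_per_hour = FILES_PER_HOUR * FILE_SIZE_KB / 1000

# Average write rate in kilobytes per second.
kb_per_second = FILES_PER_HOUR * FILE_SIZE_KB / SECONDS_PER_HOUR

# Each file is created once and deleted once, and the same holds for
# its row in the database (one INSERT plus one DELETE).
file_ops_per_second = 2 * FILES_PER_HOUR / SECONDS_PER_HOUR
row_ops_per_second = 2 * FILES_PER_HOUR / SECONDS_PER_HOUR

print(f"{mb_per_hour:.0f} MB/hour, about {kb_per_second:.1f} KB/s on average")
print(f"about {file_ops_per_second:.1f} file operations per second")
print(f"about {row_ops_per_second:.1f} row operations per second")
```

On average that is roughly 56 KB/s and about 11 file operations per second.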
Deleting a file, on modern filesystems, takes a very little time.
In my opinion you don't have to worry, especially if your application creates and deletes files sequentially.
Consider also that modern operating systems cache parts of the file system in memory to improve performance and reduce disk access (this is especially true for multiple deletes).
Your database will grow, but database engines are optimized for that, so there is no need to worry about it.
The only downside is that handling many small files could cause disk fragmentation if your file system is prone to it.
For a performance bonus, you should consider using separate physical storage for these files (e.g. a different disk drive or disk array) so that you get the full transfer bandwidth with no interference from other I/O.

Why should we store log files and bin-log files on different path or disks in mysql

I have MySQL databases with replication set up. The log files and the bin-log files are all at one path, which is the default MySQL data directory.
I have read that for better performance one should store them separately.
Can anyone explain how this improves performance, and why one should do so? Is there any documentation available on this?
Mainly because reads and writes can then be made almost in parallel. "Stored separately" here means on different disks.
Linux and H/W optimizations for MySQL is a nice presentation of ways to improve MySQL performance - it presents benchmarks and conclusions of when to use SSD disks and when to use SCSI disks, what kind of processors are better for what tasks.
Very good presentation, a must read for any DBA!!
It also can be really embarrassing to have your log files fill the file system and bring the database to a halt.
One consideration is that using a separate disk for binlogging introduces another SPOF since if MySQL cannot write the binlog it will croak the same as if it couldn't write to the data files. Otherwise, adding another disk just better separates the two tasks so that binlog writes and data file writes don't have to contend for resources. With SSDs this is much less of an issue unless you have some crazy heavy write load and are already bound by SSD performance.
It's mostly for cases where your database write traffic is so high that a single disk volume can't keep up while writing for both data files and log files. Disks have a finite amount of throughput, and you could have a very busy database server.
But it's not likely that separating data files from binlogs will give better performance for queries, because MySQL writes to the binlog at commit time, not at query time. If your disks were too slow to keep up with the traffic, you'd see COMMIT become a bottleneck.
The system I currently support stores binlogs in the same directory as the datadir. The datadir is on a RAID10 volume over 12 physical drives. This has plenty of throughput to support our workload. But if we had about double our write traffic, this RAID array wouldn't be able to keep up.
You don't need to do every tip that someone says gives better performance, because any given tip might make no difference to your application's workload. You need to measure many metrics of performance and resource use, and come up with the right tuning or configuration to help the bottlenecks under your workload.
There is no magic configuration that makes everything have high performance.