Should unused tables be archived? - mysql

There is a table in our database that takes about 25GB. It is no longer used by the current code.
Will it give any performance improvements (for rest of the tables) if we archive this table, even though it's not queried/used? Please provide explanation.
We are using MySQL with AWS Aurora.

Archiving tables will not have any impact on Aurora. Unused pages are evicted from buffer pool eventually [1], and since then, they never get pulled back onto the db instances, unless you make a query that would touch those pages.
You would continue to pay storage costs (and other in-direct costs like snapshots) by keeping them as unused. A better option would be to move the unused data to a new cluster, create a snapshot out of it, and remove the cluster. You can always recover the data when you need it by restoring a snapshot. The original database can then be cleaned by dropping these unused tables. This way you end up only paying for the snapshot, which is cheaper.
You could also export the data out of mysql (CSV let say) and store it in S3/Glacier. Only caviat is that when you need to access the data, it can end up being a much more time consuming effort to load it back to an existing or new database cluster.
[1] Buffer pool uses LRU for eviction. When you workload runs for long, you would eventually end up evicting all the pages associated with the unused table. Link: https://dev.mysql.com/doc/refman/5.5/en/innodb-buffer-pool.html

Yes, archiving will improve performance also along with reduction in side and quickness of of backup/recovery cycles.
I have tried it on different projects in my recent full time job and results are amazing. For those who deny I would only say:
Reduction in footprint reduce disk IO and scans
Reduction in foot print reduce buffer requirements and hence RAM requirements.

YES, archiving infrequently used data will ease the burden on faster and more frequently accessed data storage systems. Older data that is unlikely to be needed often is put on systems that don’t need to have the speed and accessibility of systems that contain data still in use
Archived data is stored on a lower-cost tier of storage, serving as a way to reduce primary storage consumption and related costs. Typically, data reduplication is performed on data being moved to a lower storage tier, which reduces the overall storage footprint and lowers secondary storage costs

Related

Distributing MySQL storage to drives by function

I'm wondering if MySQL has any capability to specify that data belonging to a certain account (representing e.g., a particular app, or a particular corporate program) be stored at at some particular place in the filesystem (such as a particular drive or RAID), instead of bundling it inside the same physical file structure that is shared by every other account, table, and data element managed by MySQL for everybody on that server.
I'm aware that I can jigger MySQL to store its entire data bundle at a place other than the default place, but I was hoping there might be a way to do this by function, for "some data but not all data."
In MySQL 8.0, there are options to specify the location for each table or tablespace. See https://dev.mysql.com/doc/refman/8.0/en/innodb-create-table-external.html
In earlier versions of MySQL, these options didn't work consistently. You could specify the directory for individual table partitions, if your table was partitioned, but not for a non-partitioned table. Go figure. :-)
That said, I've never encountered a situation where it was worth the time to specify the physical location of tables. Basically, if your performance depends on the difference between one RAID filesystem vs. carefully choosing among different drives, you're already losing.
Instead, I've always done this approach:
Use one datadir on a fast RAID filesystem. Use the default configuration of all tables and logs under this datadir.
Allocate as much RAM as I can afford to the InnoDB buffer pool (up to the size of the database of course - no need to use more RAM than that). RAM is orders of magnitude faster than any disk, even an SSD. So you'd prefer to be reading data out of RAM.
If that's not enough performance, there are other things you can do to optimize, like creating indexes, or modifying the application code to do more caching to reduce database reads, or using a message queue to postpone database writes.
If that's still not enough performance, then scale out to multiple database servers. In other words, sharding.

Large log data in Couchbase

I have a couchbase server to storing a huge data.
This data growing daily, but i also daily delete after process it.
Current, this data has about 1320168 items count, with 2.97G of Data Usage
But why Disc Usage is very large with 135G ???
My disc is lowing space to store more data.
Could delete this data log files to reduce disc Usage?
Couchbase uses an append-only format for storage. This means that every update or delete operation is actually stored as a new entry in the storage file and consumes more disk space.
Then a process called compaction occurs, that will reclaim the unnecessary used space. Compaction can either be configured to run automatically, when a certain fragmentation % is reached in your cluster, or manually on each node.
IIRC auto-compaction is not on by default.
So what you probably want to do is run compaction on your cluster. Note that it may require quite a large amount of diskspace, as noted here...
See the doc on how to perform compaction (in your case at the end of business day I guess you have an "off-peak" window where you currently delete and could perform compaction).
PS: Maybe guys in the official forums may have more insight and recommandations to offer.

Can massive writing & deleting files hurt our server performance?

We run a system that for cache purposes, currently writes and deletes about 1,000 small files (10k) every hour.
In the near future this number will raise to about 10,000 - 20,000 files being written and deleted every hour.
For every files that is being written a new row on our mysql DB is added and deleted when the file is deleted an hour later.
My question:
Can this excessive write & delete operation hurt our server performance eventually somehow?
(btw we currently run this on a VPS and soon on a dedicated server.)
Can writing and deleting so many rows eventually slow our DB?
This greatly depends on operating system, file system and configuration of file system caching. Also this depends on whether your database is stored on the same disk as files that are written/deleted.
Usually, operation that affect file system structure such as file creations and file deletions require some synchronous disk IO, so operating system will not loose these changes after power failure. Though, some operating systems and file systems may support more relaxed policy for this. For example, UFS file system on FreeBSD has nice "soft updates" option that does this. Probably etx3/Linus should have similar feature.
Once you will move to dedicated server I think it would be reasonable to attach several HDDs to it and to make sure that database is stored on once disk while massive file operations are performed on another disk. In this cases DB performance should not be affected.
You should make some calculations and estimate needed throughtput for the storage. In your worst scenario, 20000 files x 10K = 200MB per hour which is a very low requirement.
Deleting a file, on modern filesystems, takes a very little time.
In my opinion you don't have to worry, especially if your applications creates and deletes files sequentially.
Consider also that modern operative systems cache parts of file system in memory to improve performance and reduce disk access (this is true especially for multiple deletes).
Your database will grow but engines are optimized for it, no need to care about it.
Only downside is that handling many small files could cause disk fragmentation if your file system is subjected to it.
For a performance bonus, you should consider to use a separate phisical storage for these files (e.g. a different disk drive or disk array) so you will take advantage of full bandwidth transfer with no other interferences.

Why not use MySQL like memcached?

For the same reason it makes sense to use both NoSQL and RDBMSs in one application, It makes sense to me to have an RDBMS cache besides NoSQL cache.
I was thinking about using a MySQL server with all tables using the memory engine.
Is there any caveat to this approach?
Just to clarify, I am suggesting here to use a MySQL server for caching purposes only, not for the actual data storage of my application.
Memory tables are stored entirely in memory, so it is very fast.
It uses hash indexes which are also very fast, great for temp table purposes and lookups.
Memory tables have table level locks, so if concurrency is required, this is a problem
No transactions
When server shuts down or crashes ALL ROWS ARE LOST
though table definition stays the same, the data would be all gone.
You may want to check out the official documents on the Memory Engine
EDIT:
The Memory Storage Engine is a good candidate for caching purposes.
The following are a few things that the Memory Engine is good for:
Lookup/mapping tables
caching results of periodically added data
for data analysis purposes
Sessions management
Low latency operations
Better than other strategies such as CREATE TEMPORARY TABLE as the Memory table persists (if that is what you need)
There are a few negatives:
It does not support TEXT or BLOB columns, the table would be converted to a MyISAM on disk if such an event happens.
The table should not hold too much data, as it takes up resources that can otherwise be allocated to indexes/query caches
All in all the memory engine should be the best choice for you if you need caching.
another reason,
RAM is much limited compared to disk space,
you can have disk drive up to terabytes,
but hardly can let memory goes up-to terabytes
If you got the $$, the following article about MySQL and InnoDB's capabilities will be of high interest.
It outperforms any sort of cache, memory engine or memcached. Drawback is - it requires ram, lots of it.

Best storage engine for constantly changing data

I currently have an application that is using 130 MySQL table all with MyISAM storage engine. Every table has multiple queries every second including select/insert/update/delete queries so the data and the indexes are constantly changing.
The problem I am facing is that the hard drive is unable to cope, with waiting times up to 6+ seconds for I/O access with so many read/writes being done by MySQL.
I was thinking of changing to just 1 table and making it memory based. I've never used a memory table for something with so many queries though, so I am wondering if anyone can give me any feedback on whether it would be the right thing to do?
One possibility is that there may be other issues causing performance problems - 6 seconds seems excessive for CRUD operations, even on a complex database. Bear in mind that (back in the day) ArsDigita could handle 30 hits per second on a two-way Sun Ultra 2 (IIRC) with fairly modest disk configuration. A modern low-mid range server with a sensible disk layout and appropriate tuning should be able to cope with quite a substantial workload.
Are you missing an index? - check the query plans of the slow queries for table scans where they shouldn't be.
What is the disk layout on the server? - do you need to upgrade your hardware or fix some disk configuration issues (e.g. not enough disks, logs on the same volume as data).
As the other poster suggests, you might want to use InnoDB on the heavily written tables.
Check the setup for memory usage on the database server. You may want to configure more cache.
Edit: Database logs should live on quiet disks of their own. They use a sequential access pattern with many small sequential writes. Where they share disks with a random access work load like data files the random disk access creates a big system performance bottleneck on the logs. Note that this is write traffic that needs to be completed (i.e. written to physical disk), so caching does not help with this.
I've now changed to a MEMORY table and everything is much better. In fact I now have extra spare resources on the server allowing for further expansion of operations.
Is there a specific reason you aren't using innodb? It may yield better performance due to caching and a different concurrency model. It likely will require more tuning, but may yield much better results.
should-you-move-from-myisam-to-innodb
I think that that your database structure is very wrong and needs to be optimised, has nothing to do with the storage