Difference between in-memory databases and on-disk databases - MySQL

Recently I heard about the concept of an in-memory database.
In any type of database, the data is ultimately stored on the computer, and our program gets the data from there. How are in-memory database operations faster compared to the others?
Does an in-memory database load all the data from the database into memory (RAM)?
Thanks in advance.

An in-memory database (IMDB; also main memory database system or MMDB or memory resident database) is a database management system that primarily relies on main memory for computer data storage. It is contrasted with database management systems that employ a disk storage mechanism. Main memory databases are faster than disk-optimized databases since the internal optimization algorithms are simpler and execute fewer CPU instructions. Accessing data in memory eliminates seek time when querying the data, which provides faster and more predictable performance than disk.
Applications where response time is critical, such as those running telecommunications network equipment and mobile advertising networks, often use main-memory databases.
In reply to your query: yes, it loads the data into the RAM of your computer.
On-Disk Databases
All data stored on disk, disk I/O needed to move data into main memory when needed.
Data is always persisted to disk.
Traditional data structures like B-Trees designed to store tables and indices efficiently on disk.
Virtually unlimited database size.
Support very broad set of workloads, i.e. OLTP, data warehousing, mixed workloads, etc.
In-Memory Databases
All data stored in main memory, no need to perform disk I/O to query or update data.
Data is persistent or volatile depending on the in-memory database product.
Specialized data structures and index structures assume data is always in main memory.
Optimized for specialized workloads; i.e. communications industry-specific HLR/HSS workloads.
Database size limited by the amount of main memory.

MySQL offerings
MySQL has several "Engines". In all engines, actions are performed in RAM. The Engines differ significantly in how good they are at making sure the data "persists" on disk.
ENGINE=MEMORY -- This is not persistent; the data is found only in RAM. It is limited to some preset max size. On a power failure, all data (in a MEMORY table) is lost.
ENGINE=MyISAM -- This is an old engine; it persists data to disk, but in the case of power failure, sometimes the indexes are corrupted and need 'repairing'.
ENGINE=InnoDB -- This is the preferred engine. It not only persists to disk but 'guarantees' consistency even across power failures.
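A quick illustration (a hedged sketch; the table and column names are invented, not taken from the question) - the engine is chosen per table with the ENGINE clause:

    -- Illustrative sketch only; names are made up for the example.
    CREATE TABLE session_cache (k VARCHAR(64) PRIMARY KEY, v VARCHAR(255)) ENGINE=MEMORY;  -- RAM only, lost on restart
    CREATE TABLE legacy_stats (id INT PRIMARY KEY, hits INT) ENGINE=MyISAM;                -- on disk, weaker crash safety
    CREATE TABLE orders (id INT PRIMARY KEY, total DECIMAL(10,2)) ENGINE=InnoDB;           -- on disk, crash-safe
    SHOW TABLE STATUS LIKE 'orders';  -- reports which engine an existing table uses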

An in-memory db usually holds the whole database in memory (like the MySQL MEMORY engine).
This is a huge performance boost, but RAM is expensive and often not persistent, so you would lose data on restart.
There are some ways to mitigate the latter issue, e.g. timed snapshots or replication to a disk-based database.
There are also some hybrid types, with just a part of the db in memory.

There are also in-memory databases like Tarantool that can work with data sets larger than available RAM. Tarantool is able to work with such sets because it is optimized for fast random writes, which are the main bottleneck that arises.

Related

Distributing MySQL storage to drives by function

I'm wondering if MySQL has any capability to specify that data belonging to a certain account (representing, e.g., a particular app or a particular corporate program) be stored at some particular place in the filesystem (such as a particular drive or RAID), instead of bundling it inside the same physical file structure that is shared by every other account, table, and data element managed by MySQL for everybody on that server.
I'm aware that I can jigger MySQL to store its entire data bundle at a place other than the default place, but I was hoping there might be a way to do this by function, for "some data but not all data."
In MySQL 8.0, there are options to specify the location for each table or tablespace. See https://dev.mysql.com/doc/refman/8.0/en/innodb-create-table-external.html
In earlier versions of MySQL, these options didn't work consistently. You could specify the directory for individual table partitions, if your table was partitioned, but not for a non-partitioned table. Go figure. :-)
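For reference, a hedged MySQL 8.0 sketch of the per-table option (the table name and path below are made up; in 8.0 the target directory also has to be known to the server, e.g. listed in innodb_directories):

    -- Hypothetical example: put this one table's .ibd file on a different filesystem.
    CREATE TABLE app_archive (
        id  BIGINT PRIMARY KEY,
        doc TEXT
    ) ENGINE=InnoDB
      DATA DIRECTORY = '/mnt/slow_raid/mysql-external';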
That said, I've never encountered a situation where it was worth the time to specify the physical location of tables. Basically, if your performance depends on the difference between one RAID filesystem vs. carefully choosing among different drives, you're already losing.
Instead, I've always taken this approach:
Use one datadir on a fast RAID filesystem. Use the default configuration of all tables and logs under this datadir.
Allocate as much RAM as I can afford to the InnoDB buffer pool (up to the size of the database, of course - no need to use more RAM than that); see the sketch after this list. RAM is orders of magnitude faster than any disk, even an SSD, so you'd prefer to be reading data out of RAM.
If that's not enough performance, there are other things you can do to optimize, like creating indexes, or modifying the application code to do more caching to reduce database reads, or using a message queue to postpone database writes.
If that's still not enough performance, then scale out to multiple database servers. In other words, sharding.
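For the buffer-pool point above, a minimal sketch (the 48GB figure is only an assumed example for a machine with plenty of RAM; since MySQL 5.7 this variable can be changed at runtime):

    -- Illustrative value only; size it to your RAM and working set.
    SET GLOBAL innodb_buffer_pool_size = 48 * 1024 * 1024 * 1024;  -- 48GB
    SHOW VARIABLES LIKE 'innodb_buffer_pool_size';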

Should unused tables be archived?

There is a table in our database that takes about 25GB. It is no longer used by the current code.
Will it give any performance improvement (for the rest of the tables) if we archive this table, even though it's not queried/used? Please provide an explanation.
We are using MySQL with AWS Aurora.
Archiving tables will not have any impact on Aurora. Unused pages are eventually evicted from the buffer pool [1], and from then on they never get pulled back onto the db instances unless you run a query that touches those pages.
You would continue to pay storage costs (and other indirect costs like snapshots) by keeping the unused data around. A better option would be to move the unused data to a new cluster, create a snapshot of it, and remove the cluster. You can always recover the data when you need it by restoring the snapshot. The original database can then be cleaned up by dropping these unused tables. This way you end up only paying for the snapshot, which is cheaper.
You could also export the data out of MySQL (as CSV, say) and store it in S3/Glacier. The only caveat is that when you need to access the data again, loading it back into an existing or new database cluster can be a much more time-consuming effort.
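A rough sketch of the plain-MySQL version of that export (table and file names are assumptions; Aurora itself has no local file access, so there you would use its built-in export-to-S3 feature instead):

    -- Dump the unused table to CSV, copy the file to S3/Glacier, then drop the table.
    SELECT *
    INTO OUTFILE '/var/lib/mysql-files/big_unused_table.csv'
    FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
    LINES TERMINATED BY '\n'
    FROM big_unused_table;

    DROP TABLE big_unused_table;  -- only after the exported file has been verified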
[1] The buffer pool uses LRU for eviction. When your workload runs for long enough, you would eventually end up evicting all the pages associated with the unused table. Link: https://dev.mysql.com/doc/refman/5.5/en/innodb-buffer-pool.html
Yes, archiving will improve performance, along with reducing database size and speeding up backup/recovery cycles.
I have tried it on different projects in my recent full-time job and the results are amazing. For those who doubt it, I would only say:
A smaller footprint reduces disk IO and scans.
A smaller footprint reduces buffer requirements and hence RAM requirements.
YES, archiving infrequently used data will ease the burden on faster and more frequently accessed data storage systems. Older data that is unlikely to be needed often is put on systems that don't need the speed and accessibility of systems containing data still in use.
Archived data is stored on a lower-cost tier of storage, serving as a way to reduce primary storage consumption and related costs. Typically, data deduplication is performed on data being moved to a lower storage tier, which reduces the overall storage footprint and lowers secondary storage costs.

Does MySQL scale on a single multi-processor machine?

My application's typical DB usage is to read/update one large table. I wonder whether MySQL scales read operations on a single multi-processor machine? How about write operations - are they able to utilize multiple processors?
By the way - unfortunately I am not able to optimize the table schema.
Thank you.
Setup details:
x64, quad core
Single hard disk (no RAID)
Plenty of memory (4GB+)
Linux 2.6
MySQL 5.5
If you're using conventional hard disks, you'll often find you run out of IO bandwidth before you run out of CPU cores. The only way to pin a four core machine is to have a very high performance SSD striped RAID array.
If you're not able to optimize the schema you have very limited options. This is like asking to tune a car without lifting the hood. Maybe you can change the tires or use better gasoline, but fundamental performance gains come from several factors, including, most notably, additional indexes and strategically de-normalizing data.
In database land, 4GB of memory is almost nothing, 8GB is the absolute minimum for a system with any loading, and a single disk is a very bad idea. At the very least you should have some form of mirroring for data integrity reasons.

Why not use MySQL like memcached?

For the same reason it makes sense to use both NoSQL and RDBMSs in one application, it makes sense to me to have an RDBMS cache alongside a NoSQL cache.
I was thinking about using a MySQL server with all tables using the memory engine.
Is there any caveat to this approach?
Just to clarify, I am suggesting here to use a MySQL server for caching purposes only, not for the actual data storage of my application.
Memory tables are stored entirely in memory, so they are very fast.
They use hash indexes, which are also very fast - great for temp table purposes and lookups.
Memory tables have table-level locks, so if concurrency is required, this is a problem.
No transactions.
When the server shuts down or crashes, ALL ROWS ARE LOST - though the table definition stays the same, the data will all be gone.
You may want to check out the official documents on the Memory Engine
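As a minimal sketch of such a cache-only table (the names and sizes are assumptions; note the hash index for exact-match lookups):

    -- Cache lookups only; all rows vanish on restart, so nothing here may be the sole copy of the data.
    CREATE TABLE page_cache (
        cache_key   VARCHAR(64)     NOT NULL,
        cache_value VARBINARY(8192) NOT NULL,
        PRIMARY KEY (cache_key) USING HASH
    ) ENGINE=MEMORY;

    INSERT INTO page_cache VALUES ('user:42:profile', 'serialized profile data');
    SELECT cache_value FROM page_cache WHERE cache_key = 'user:42:profile';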
EDIT:
The Memory Storage Engine is a good candidate for caching purposes.
The following are a few things that the Memory Engine is good for:
Lookup/mapping tables
caching results of periodically added data, for data analysis purposes
Sessions management
Low latency operations
Better than other strategies such as CREATE TEMPORARY TABLE as the Memory table persists (if that is what you need)
There are a few negatives:
It does not support TEXT or BLOB columns; a table that needs them would have to be converted to an on-disk MyISAM table instead.
The table should not hold too much data, as it takes up resources that can otherwise be allocated to indexes/query caches.
All in all the memory engine should be the best choice for you if you need caching.
Another reason: RAM is much more limited than disk space. You can have disk drives of up to terabytes, but you can hardly get memory up to terabytes.
If you've got the $$, the following article about MySQL and InnoDB's capabilities will be of high interest.
It outperforms any sort of cache, memory engine, or memcached. The drawback is that it requires RAM - lots of it.

What is a buffer write?

Can someone help me understand what a buffer write is? I'm particularly interested in learning its function in the context of database systems, MySQL say. Some of the following would be helpful:
What is the purpose of a buffer write
Are there any performance advantages
An example of a buffer write in a database application
I came across this term several times and I was unable to discern its meaning.
Thanks
Imagine a computer with 100M of memory, running a database. The database stores data in files, but also keeps 50M of it in memory in buffers. If there is a request for data (SELECT or INSERT), the request can be handled from buffers in memory, which is much faster than going all the way to disk.
Buffering requests for access to information in a database's files is essentially caching requests for disk I/O. If information is INSERTed and then DELETEd within a very short period of time, writing it to disk may be unnecessary. Not writing (buffering) greatly increases performance.
If there is a request to INSERT 100M of data into the database, then all pending writes (from buffers to disk) must be done. Then at least half the new data is written to disk. Data has to be written because there isn't enough memory for the 100M of new data plus 50M of old data to all reside in memory. This need to write some existing buffers to disk is a performance hit. Luckily, it is only the buffers holding changed or new records that need to be written out (or flushed) to disk. Those changed buffers are referred to as "dirty."
After the aforementioned INSERT of 100M, some 50M of the new data may temporarily be held in memory until it is most convenient to write, because not writing increases performance. A convenient time to write changed records back to disk is when the system has been idle for a while. Writing (buffer writes) when the system is idle doesn't lower performance.
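To make that concrete in MySQL terms (a hedged example; it assumes InnoDB and reads a standard INFORMATION_SCHEMA view), you can watch how many buffer-pool pages are currently dirty, i.e. modified in memory but not yet written back to disk:

    -- MODIFIED_DATABASE_PAGES counts the "dirty" buffers still awaiting a flush to disk.
    SELECT POOL_ID,
           DATABASE_PAGES,
           MODIFIED_DATABASE_PAGES
    FROM information_schema.INNODB_BUFFER_POOL_STATS;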