Microsoft SQL Server has a nice feature that allows a database to be expanded automatically when it becomes full. In MySQL, as I understand it, a database is in fact a directory with a bunch of files corresponding to various objects. Does this mean that the concept of database size doesn't apply, and that a MySQL database can grow as big as the available disk space allows without any additional concern? If so, is this behavior the same across different storage engines?
It depends on the engine you're using. A list of the ones that come with MySQL can be found here.
MyISAM tables have a file per table. This file can grow to your file system's limit. As a table gets larger, you'll have to tune it, because the default data- and index-pointer sizes limit how big it can get. Also, this MyISAM documentation page says:
There is a limit of 2^32 (~4.295E+09) rows in a MyISAM table. If you build MySQL with the --with-big-tables option, the row limitation is increased to (2^32)^2 (1.844E+19) rows. See Section 2.16.2, “Typical configure Options”. Binary distributions for Unix and Linux are built with this option.
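The two limits in that quote are plain powers of two: 2^32 and (2^32)^2 = 2^64. A quick Python check of the arithmetic (it only verifies the numbers, it does not query MySQL):

```python
# Row limits from the MyISAM documentation quoted above.
default_limit = 2 ** 32             # default build
big_tables_limit = (2 ** 32) ** 2   # --with-big-tables build, i.e. 2 ** 64

for label, limit in [("default", default_limit), ("--with-big-tables", big_tables_limit)]:
    # ":.3e" converts the integer to a float and prints it in scientific notation
    print(f"{label}: {limit} rows (~{limit:.3e})")
```

This prints 4294967296 rows (~4.295e+09) and 18446744073709551616 rows (~1.845e+19). The manual's 1.844E+19 simply truncates 1.8446... instead of rounding it.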
InnoDB can operate in three different modes: using shared InnoDB data files, using a whole disk as the data file, or using innodb_file_per_table.
The shared data files are pre-created for your MySQL instance. You typically allocate a large amount of space and monitor it. When it starts filling up, you need to configure another file and restart the server. You can also set the last file to autoextend, so that InnoDB adds a chunk of space to it when it starts to fill up. I typically don't use this feature, as you never know when you'll take the performance hit of extending the file. This page talks about configuring it.
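These shared files, and the autoextend behaviour, are set with the innodb_data_file_path option: a semicolon-separated list of entries of the form file_name:file_size[:autoextend[:max:max_file_size]], where only the last file may autoextend. As a rough illustration of that syntax, here is a small Python sketch that parses such a value. The parser and its names (parse_data_file_path, DataFile) are hypothetical; it only understands K/M/G sizes and ignores raw-partition entries and Windows drive-letter paths.

```python
from dataclasses import dataclass
from typing import List, Optional

# Size suffixes accepted by this sketch, as multiples of bytes.
_UNITS = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}


def _to_bytes(size: str) -> int:
    """Convert a size such as '12M' to bytes."""
    unit = size[-1:].upper()
    if unit not in _UNITS or not size[:-1].isdigit():
        raise ValueError(f"unsupported size: {size!r}")
    return int(size[:-1]) * _UNITS[unit]


@dataclass
class DataFile:
    name: str
    size_bytes: int
    autoextend: bool = False
    max_bytes: Optional[int] = None  # only meaningful together with autoextend


def parse_data_file_path(value: str) -> List[DataFile]:
    """Parse 'name:size[:autoextend[:max:size]];...' into DataFile records."""
    specs = [s.strip() for s in value.split(";") if s.strip()]
    files: List[DataFile] = []
    for i, spec in enumerate(specs):
        parts = spec.split(":")
        if len(parts) < 2:
            raise ValueError(f"missing size in entry: {spec!r}")
        entry = DataFile(parts[0], _to_bytes(parts[1]))
        options = parts[2:]
        if options:
            if options[0] != "autoextend":
                raise ValueError(f"unexpected option in entry: {spec!r}")
            if i != len(specs) - 1:
                raise ValueError("only the last data file may use autoextend")
            entry.autoextend = True
            if len(options) == 3 and options[1] == "max":
                entry.max_bytes = _to_bytes(options[2])
            elif len(options) != 1:
                raise ValueError(f"malformed autoextend options: {spec!r}")
        files.append(entry)
    return files


if __name__ == "__main__":
    for f in parse_data_file_path("ibdata1:12M;ibdata2:50M:autoextend:max:500M"):
        print(f)
```

In that example, ibdata1 is a fixed 12 MB file and ibdata2 starts at 50 MB and grows on demand up to 500 MB; adding a third file later is the "configure another file and restart" step.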
I've never used a whole disk as a data file, but it can be done. Instead of pointing to a file, I believe you point your InnoDB data file setting at the unformatted, unmounted device.
innodb_file_per_table makes InnoDB tables act like MyISAM tables: each table gets its own file. The last time I used this, the files did not shrink when you deleted rows from them. When a table is dropped, its file is removed; when it is altered (rebuilt), the file is recreated at its new size.
The Archive engine stores rows compressed with zlib, so it behaves roughly like a gzipped MyISAM table.
A memory table doesn't use disk at all. In fact, when a server restarts, all the data is lost.
Merge tables are like a poor man's partitioning for MyISAM tables: a set of identical MyISAM tables is queried as if it were one. Apart from the .frm table definition and a small .MRG file listing the underlying tables, the only files are the MyISAM ones.
CSV tables are wrappers around CSV files, so the usual file system limits apply. They are not very fast, since they can't have indexes.
I don't think anyone uses BDB anymore; at least, I've never used it. It uses Berkeley DB as a back end. I'm not familiar with its restrictions.
Federated tables are used to connect to and query tables on other database servers. Again, there is only an FRM file.
The Blackhole engine doesn't store anything locally. It's used primarily for creating replication logs and not for actual data storage, since there is no data storage :)
MySQL Cluster is completely different: it stores just about everything in memory (recent editions allow disk storage) and is very different from all the other engines.
What you describe is roughly true for MyISAM tables. For InnoDB tables the picture is different and more similar to what other DBMSs do: one (or a few) big files with a complex internal structure for the whole server. To optimize it, you can use a whole disk (or partition) as the file (at least on Unix-like systems, where everything is a file).
Related
I have largish (InnoDB) tables in a database, and apparently users can run SELECTs with JOINs that produce large (and therefore on-disk) temporary tables. Sometimes these are so large that they exhaust disk space, leading to all sorts of weird issues.
Is there a way to limit temp table maximum size for an on-disk table, so that the table doesn't overgrow the disk? tmp_table_size only applies to in-memory tables, despite the name. I haven't found anything relevant in the documentation.
There's no option for this in MariaDB and MySQL.
I ran into the same issue some months ago. I searched a lot and finally partially solved it by creating a dedicated storage area on the NAS for temporary datasets.
Create a folder on your NAS or a partition on an internal HDD (by definition, it will be limited in size), mount it, and in the MySQL option file (my.cnf on Linux, my.ini on Windows) point tmpdir at this drive in the [mysqld] section, using the line for your platform (Linux first, then Windows):
tmpdir="/mnt/DBtmp/"
tmpdir="T:/"
Restart the MySQL service after this change.
With this approach, once the drive is full, queries that need on-disk temporary tables still run into "weird issues", but the other problems (such as the main data disk filling up) are gone.
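Because a dedicated tmpdir partition can still fill up, it helps to watch its free space. Here is a minimal Python sketch using the standard shutil.disk_usage. The function name and the 10% threshold are my own assumptions, and the script falls back to the system temp directory as a placeholder; substitute your MySQL tmpdir (SELECT @@tmpdir shows the value the server is using).

```python
import shutil
import tempfile


def tmpdir_low_on_space(path: str, min_free_fraction: float = 0.10) -> bool:
    """Return True when the filesystem holding `path` has less than
    `min_free_fraction` of its total capacity free."""
    usage = shutil.disk_usage(path)  # named tuple (total, used, free), in bytes
    return usage.free / usage.total < min_free_fraction


if __name__ == "__main__":
    # Placeholder: replace with your MySQL tmpdir, e.g. the /mnt/DBtmp/ mount above.
    path = tempfile.gettempdir()
    usage = shutil.disk_usage(path)
    print(f"{path}: {usage.free / 2**30:.1f} GiB free of {usage.total / 2**30:.1f} GiB")
    if tmpdir_low_on_space(path):
        print("WARNING: less than 10% free; large on-disk temporary tables may fail")
```

Run from cron or a monitoring agent, this gives you a warning before the partition fills, rather than finding out from failing queries.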
There was a discussion about an option disk-tmp-table-size, but it looks like the commit did not make it through review or got lost for some other reason (at least the option does not exist in the current code base anymore).
I guess your next best option (besides adding storage) is to tune MySQL so that it avoids on-disk temporary tables; there are some tips for this on DBA Stack Exchange. Another option is to put the "on-disk" temporary tables on a RAM disk, if you have enough RAM and are only short of disk space.
While it does not answer the question for MySQL, MariaDB has tmp_disk_table_size and potentially also useful max_join_size settings. However, tmp_disk_table_size is only for MyISAM or Aria tables, not for InnoDB. Also, max_join_size works only on the estimated row count of the join, not the actual row count. On the bright side, the error is issued almost immediately.
In SQL Server, partitioning follows a chain like this:
Table -> on Partition Scheme -> on Filegroups (f1, f2, f3, f4, ...)
By comparison with Oracle:
A filegroup in SQL Server is similar to a tablespace in Oracle: it is a logical storage unit for table and index data that can contain one or more OS files.
But what about MariaDB? Does it have filegroups?
Not as such. What are you trying to achieve? Keep in mind that some features like that exist because disks used to be smaller than databases. Today, that is rarely an issue. Furthermore, RAID controllers, SANs, etc. eliminate the need (or even the desirability) of manually deciding which file goes where. OSes have ways to concatenate multiple volumes, even on the fly. Etc.
MyISAM lets you say where the data file goes and where the index file goes (the DATA DIRECTORY and INDEX DIRECTORY table options). But MyISAM is all but dead. Even there, it was folly to put the data on one drive and the indexes on another: in performing a query, first the index is accessed, then the data, so the two drives are rarely busy at the same time. That is, little if any performance was gained. Simple RAID striping is likely to do better.
InnoDB has a way to spell out ibdata1, ibdata2, etc. That dates back to the days when the OS could not make a file bigger than 2GB or 4GB. It is essentially never used today.
InnoDB tables can either be all in ibdata1 or scattered among individual .ibd files. But I don't really think this is what you are talking about. With this "file per table", tiny tables are inefficiently stored. MySQL 8.0 will improve on that slightly by letting you put multiple tables in a given "tablespace", akin to .ibd file.
An InnoDB tablespace contains all the data and indexes for a given table or set of tables. With file_per_table, each partition of a partitioned table lived in its own .ibd file. This may be changing with 8.0.
All of these are hardly worth noting. I would guess that only 1% of systems need to even think about it. Simply let MySQL/MariaDB do what it wants; it's good enough.
A related thing: in the '80s and '90s some vendors offered "raw device" access because they thought they could do better than going through the OS. Again, OSes have improved, RAID controllers are sophisticated, and SANs exist, so raw access is no longer important. (I don't think MySQL ever had it.) It's bound to have been a big development and maintenance burden for the vendor.
How many DBAs have put tmpdir on a separate partition, only to find that things crash because it was not big enough? Ditto for RAM disks.
We have a local server for running tests, part of this is the database is dropped very regularly (before almost every scenario). The database itself only contains the rows required to carry out the test.
Is there a way to use InnoDB so that it never flushes to disk and works more like the MEMORY storage engine but would still remain true to the features of InnoDB that we would expect in production?
No, you can't make a table both InnoDB and MEMORY; they are different storage engines and can't be combined.
You can, however, use a temporary InnoDB table. The table will still be written to disk, but it is visible only to the session that created it and is dropped (both data and table) when the session ends.
Does PostgreSQL have an equivalent of MySQL memory tables?
These MySQL memory tables can persist across sessions (unlike temporary tables, which are dropped at the end of the session). I haven't been able to find anything in PostgreSQL that can do the same.
No, at the moment they don't exist in PostgreSQL. If you truly need a memory table you can create a RAM disk, add a tablespace for it, and create tables on it.
If you only need a table whose contents are visible across sessions, you can use an UNLOGGED table. These are not true memory tables, but they behave surprisingly similarly when the table data is significantly smaller than the system RAM.
Global temporary tables would be another option, but PostgreSQL does not support them as of 9.2.
I'm answering a four-year-old question because it still comes up at the top of Google search results.
There is no built in way to cache a full table in memory, but there is an extension that can do this.
In Memory Column Store is a library that acts both as a drop-in extension and as a columnar storage and execution engine. You can refer here for the documentation. It provides a load function that you can use to load an entire table into memory.
The advantage is that the table is stored inside Postgres's shared_buffers, so when executing a query Postgres immediately sees that the pages are in memory and fetches them from there.
The downside is that shared_buffers is not really designed to operate this way, and instabilities might occur (usually they don't), but to be safe you can run this configuration on a secondary cluster or machine.
All other usual caveats about postgres and shared_buffers still apply.
I noticed that my database server supports the Memory database engine. I want to make a database I have already made running InnoDB run completely in memory for performance.
How do I do that? I explored PHPMyAdmin, and I can't find a "change engine" functionality.
Assuming you understand the consequences of using the MEMORY engine as mentioned in comments, and here, as well as some others you'll find by searching about (no transaction safety, locking issues, etc) - you can proceed as follows:
MEMORY tables are stored differently than InnoDB, so you'll need to use an export/import strategy. First dump each table separately to a file using SELECT * FROM tablename INTO OUTFILE 'table_filename'. Create the MEMORY database and recreate the tables you'll be using with this syntax: CREATE TABLE tablename (...) ENGINE = MEMORY;. You can then import your data using LOAD DATA INFILE 'table_filename' INTO TABLE tablename for each table.
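The three steps above can be scripted per table. Here is a Python sketch that only builds the statement strings (it does not connect to MySQL). The function name, database names and dump directory are hypothetical; you still have to supply each table's column definitions, and the target database is assumed to exist already. Keep in mind that INTO OUTFILE and LOAD DATA INFILE read and write files on the database server host, need the FILE privilege, and are restricted by secure_file_priv. MEMORY tables also cannot hold BLOB or TEXT columns.

```python
from typing import List


def memory_migration_sql(src_db: str, dst_db: str, table: str,
                         columns_ddl: str, dump_dir: str) -> List[str]:
    """Build the dump / create / load statements for copying one table
    into a MEMORY table. Identifiers are assumed trusted (no escaping)."""
    dump_file = f"{dump_dir.rstrip('/')}/{table}.txt"
    return [
        # 1. Dump the rows to a file on the database server host.
        f"SELECT * FROM `{src_db}`.`{table}` INTO OUTFILE '{dump_file}';",
        # 2. Recreate the table with the MEMORY engine (no BLOB/TEXT columns allowed).
        f"CREATE TABLE `{dst_db}`.`{table}` ({columns_ddl}) ENGINE = MEMORY;",
        # 3. Load the dumped rows into the new table.
        f"LOAD DATA INFILE '{dump_file}' INTO TABLE `{dst_db}`.`{table}`;",
    ]


if __name__ == "__main__":
    # Hypothetical names: source database "app", MEMORY database "app_mem".
    for stmt in memory_migration_sql("app", "app_mem", "users",
                                     "id INT PRIMARY KEY, name VARCHAR(50)",
                                     "/var/lib/mysql-files"):
        print(stmt)
```

You would loop over your tables and feed the resulting statements to the mysql client or your driver of choice.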
It is also possible to place the MySQL data directory on a tmpfs, thus speeding up database reads and writes. It might not be the most efficient way to do this, but sometimes you can't just change the storage engine.
Here is my fstab entry for my MySQL data directory:
none /opt/mysql/server-5.6/data tmpfs defaults,size=1000M,uid=999,gid=1000,mode=0700 0 0
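An fstab entry has six whitespace-separated fields: the device (none, since tmpfs has no backing device), the mount point, the filesystem type, the mount options, the dump flag and the fsck pass number. The Python sketch below (parse_fstab_line is a hypothetical helper) splits the entry above into those fields. Note that uid=999 and gid=1000 are specific to the author's system: with mode=0700 only the owner can access the directory, so the uid must be the account mysqld runs as. A tmpfs is also empty after every reboot, so the data directory has to be repopulated each time.

```python
from typing import Dict, NamedTuple


class FstabEntry(NamedTuple):
    device: str              # "none": tmpfs is not backed by a block device
    mount_point: str         # where the filesystem is mounted
    fs_type: str             # "tmpfs" keeps files in RAM (and swap)
    options: Dict[str, str]  # comma-separated mount options
    dump: int                # used by dump(8); 0 = don't back up
    fsck_pass: int           # fsck order; 0 = don't check


def parse_fstab_line(line: str) -> FstabEntry:
    """Split one fstab line into its six fields and its comma-separated options."""
    device, mount_point, fs_type, opts, dump, fsck_pass = line.split()
    options: Dict[str, str] = {}
    for opt in opts.split(","):
        key, _, value = opt.partition("=")
        options[key] = value  # flag options such as "defaults" map to ""
    return FstabEntry(device, mount_point, fs_type, options, int(dump), int(fsck_pass))


if __name__ == "__main__":
    entry = parse_fstab_line(
        "none /opt/mysql/server-5.6/data tmpfs "
        "defaults,size=1000M,uid=999,gid=1000,mode=0700 0 0"
    )
    print(entry.mount_point, entry.fs_type, entry.options)
```

The size=1000M option caps the tmpfs, so the whole data directory must fit in 1000 MB of RAM.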
You may also want to take a look at the innodb_flush_log_at_trx_commit=2 setting. Maybe this will speed up MySQL sufficiently.
innodb_flush_log_at_trx_commit controls how often InnoDB flushes its redo log to disk. With the default value of 1, the log is flushed at every transaction commit (with autocommit, that means every INSERT), which causes more I/O load. When set to 2, the log is written to the operating system at each commit but flushed to disk only about once per second.
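The trade-off between the values is about which failure can cost you the last second or so of committed transactions. This Python sketch encodes that behaviour as I understand it from the MySQL manual (values 0, 1 and 2). It is an illustration rather than an exact model, since the once-per-second flush is not guaranteed to happen exactly every second; the names Failure and may_lose_recent_commits are my own.

```python
from enum import Enum


class Failure(Enum):
    MYSQLD_CRASH = "mysqld process crash"
    OS_CRASH = "operating system crash or power loss"


# innodb_flush_log_at_trx_commit value ->
#   (when the redo log is written to the OS, when it is flushed to disk)
FLUSH_MODES = {
    0: ("about once per second", "about once per second"),
    1: ("at every commit", "at every commit"),  # default
    2: ("at every commit", "about once per second"),
}


def may_lose_recent_commits(setting: int, failure: Failure) -> bool:
    """True if roughly the last second of committed transactions may be lost."""
    if setting not in FLUSH_MODES:
        raise ValueError(f"unknown setting: {setting}")
    if setting == 1:
        return False  # the log is flushed to disk at every commit
    if setting == 2:
        # The log already reached the OS at commit, so only an OS-level failure loses it.
        return failure is Failure.OS_CRASH
    return True  # setting 0: the log may not have left mysqld yet


if __name__ == "__main__":
    for value, (write, flush) in FLUSH_MODES.items():
        risky = [f.value for f in Failure if may_lose_recent_commits(value, f)]
        print(f"{value}: write {write}, flush {flush}; at risk on: {risky or 'none'}")
```

So 2 is a reasonable middle ground for a test or development server: a MySQL crash loses nothing, and only a machine crash or power loss can cost you about a second of commits.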
Memory Engine is not the solution you're looking for. You lose everything that you went to a database for in the first place (i.e. ACID).
Here are some better alternatives:
Don't use joins: very few large applications (e.g. Google, Flickr, Netflix) rely on them, because joins perform poorly across large data sets.
A LEFT [OUTER] JOIN can be faster than an equivalent subquery because the server might be able to optimize it better—a fact that is not specific to MySQL Server alone.
- The MySQL Manual
Make sure the columns you're querying against have indexes. Use EXPLAIN to confirm they are being used.
Use the query cache and increase its size (query_cache_size; note that the query cache was removed in MySQL 8.0), and give your indexes more memory (key_buffer_size for MyISAM, innodb_buffer_pool_size for InnoDB) so that they stay in memory and frequent lookups are served from cache.
Denormalize your schema, especially for simple joins (e.g. fetching fooId from barMap).
The last point is key. I used to love joins, but then I had to run joins on a few tables with 100M+ rows. No good. You're better off inserting the data you're joining against into the target table (if it's not too much) and querying against indexed columns; then your query takes a few ms.
I hope those help.
If your database is small enough (or you add enough memory), your database will effectively run in memory, since your data will be cached after the first request.
Changing the database table definitions to use the memory engine is probably more complicated than you need.
If you have enough memory to load the tables with the MEMORY engine, you have enough to tune the InnoDB settings (chiefly innodb_buffer_pool_size) to cache everything anyway.
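To check whether everything would fit, compare the combined data and index size of your InnoDB tables with innodb_buffer_pool_size. The sizes can be read from the DATA_LENGTH and INDEX_LENGTH columns of information_schema.TABLES. The Python helper below only does the comparison on numbers you supply; the table sizes in the example are made up, and the 25% headroom is an arbitrary assumption of mine, not a MySQL rule.

```python
from typing import Iterable, Tuple

MiB = 1024 ** 2


def fits_in_buffer_pool(tables: Iterable[Tuple[str, int, int]],
                        pool_bytes: int, headroom: float = 0.25) -> bool:
    """tables: (name, data_length, index_length) rows, e.g. from
    SELECT TABLE_NAME, DATA_LENGTH, INDEX_LENGTH FROM information_schema.TABLES
    WHERE TABLE_SCHEMA = 'mydb' AND ENGINE = 'InnoDB';
    Returns True if data + indexes, plus `headroom`, fit in the buffer pool."""
    total = sum(data + index for _name, data, index in tables)
    return total * (1 + headroom) <= pool_bytes


if __name__ == "__main__":
    # Hypothetical sizes: 860 MiB of data and indexes in total.
    sample = [("orders", 600 * MiB, 200 * MiB), ("customers", 50 * MiB, 10 * MiB)]
    print(fits_in_buffer_pool(sample, 1024 * MiB))  # 860 MiB * 1.25 = 1075 MiB -> False
    print(fits_in_buffer_pool(sample, 2048 * MiB))  # -> True
```

If it fits, InnoDB will keep the working set in memory after warm-up while still giving you transactions and crash safety.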
"How do I do that? I explored PHPMyAdmin, and I can't find a "change engine" functionality."
In direct response to this part of your question, you can issue ALTER TABLE tbl ENGINE=MEMORY; (or whichever engine you want), and MySQL will recreate the table with that engine.
In place of the Memory storage engine, one can consider MySQL Cluster. It is said to give similar performance while supporting disk-backed operation for durability. I've not tried it, but it looks promising (and it has been in development for a number of years).
You can find the official MySQL Cluster documentation here.
Additional thoughts:
RAM disk: point the temporary directory MySQL uses (tmpdir) at a RAM disk; very easy to set up.
memcache: a memcached server is easy to set up; use it to store the results of your queries for X amount of time.
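The memcache idea is the cache-aside pattern: look the query up in the cache first, and only hit MySQL on a miss, storing the result for a fixed time. The Python sketch below uses an in-process dictionary as a hypothetical stand-in for a memcached client so it runs without a server; TTLCache and run_query are illustrative names, and run_query represents whatever function actually executes the SQL.

```python
import time
from typing import Any, Callable, Dict, Tuple


class TTLCache:
    """In-process stand-in for memcached: caches values for `ttl_seconds`."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl = ttl_seconds
        self.clock = clock
        self._store: Dict[str, Tuple[float, Any]] = {}  # key -> (expires_at, value)

    def get_or_compute(self, key: str, compute: Callable[[], Any]) -> Any:
        """Cache-aside: return the cached value if it has not expired,
        otherwise call `compute` (e.g. run the SQL query) and cache its result."""
        now = self.clock()
        entry = self._store.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]
        value = compute()
        self._store[key] = (now + self.ttl, value)
        return value


if __name__ == "__main__":
    def run_query() -> list:  # hypothetical stand-in for a real MySQL query
        print("querying MySQL...")
        return [("alice", 3), ("bob", 5)]

    cache = TTLCache(ttl_seconds=60)
    sql = "SELECT name, orders FROM customers"
    cache.get_or_compute(sql, run_query)  # miss: runs the query
    cache.get_or_compute(sql, run_query)  # hit: served from the cache
```

With a real memcached client the structure stays the same: a get on the key, and on a miss, running the query and setting the result with an expiry time.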