MySQL tmpdir is out of space during large index creation

I am trying to index a varchar(255) column of a table with 2 billion rows. The index creation stops with the error below:
Error number 28 means 'No space left on device'.
This post suggests I can just change the tmpdir to another partition with more disk space.
Is this advised? Are there any downsides to doing so (slower index creation, etc.)?
If not, what are the detailed steps I need to go through to change MySQL's tmpdir effectively?

Advised: Yes. Easy: yes.
In the old days, disk drives were small; often a machine would have multiple drives to make it possible to store lots of data, and file systems had limitations. (Long ago, FAT16 on DOS topped out at 2 billion bytes, nowhere near enough for 2 billion rows.)
So a setting for tmpdir made it easy to chop up the disk usage. In rare cases such separation can benefit performance, but even that advantage is vanishing with the advent of SSDs. Meanwhile, for really huge datasets, RAID striping is a better way to get performance than manually splitting up disk allocation.
Find the my.cnf configuration file. In the [mysqld] section, add
tmpdir = /.../...
to point to some otherwise unused directory on a disk with lots of free space.
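For example, assuming /mnt/bigdisk/mysqltmp is an empty directory on the larger partition (the path here is purely illustrative):
# my.cnf, [mysqld] section
tmpdir = /mnt/bigdisk/mysqltmp
The directory must be writable by the mysqld user (e.g. chown mysql:mysql /mnt/bigdisk/mysqltmp). tmpdir is not dynamic, so restart MySQL, then verify with:
SHOW VARIABLES LIKE 'tmpdir';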
Meanwhile... "2 billion rows" is large enough to lead to numerous challenges. You may want to start a question on dba.stackexchange.com to discuss potential issues.
innodb_tmpdir
From the changelog for 5.6.29 and 5.7.11 (Feb 2016), innodb_tmpdir may be a better thing to change:
A new InnoDB configuration option, innodb_tmpdir, allows you to configure a separate directory for temporary files created during online ALTER TABLE operations that rebuild the table. This option was introduced to help avoid MySQL temporary directory overflows that could occur as a result of large temporary files created during online ALTER TABLE operations. innodb_tmpdir can be configured dynamically using a SET statement.
Online ALTER TABLE operations that rebuild a table also create an intermediate table file in the same directory as the original table. The innodb_tmpdir option is not applicable to intermediate table files. (Bug #19183565)
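For instance, on a version with this option, you can redirect just the rebuild's sort files without touching the global tmpdir. A sketch (directory, table, and column names are illustrative):
-- the directory must already exist and be writable by mysqld
SET GLOBAL innodb_tmpdir = '/mnt/bigdisk/mysqltmp';
-- the online index build now sorts in that directory
ALTER TABLE big_table ADD INDEX idx_col (col), ALGORITHM=INPLACE, LOCK=NONE;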

Related

How can I limit the size of temporary tables?

I have largish (InnoDB) tables in a database; apparently the users are capable of making SELECTs with JOINs that result in temporary, large (and thus on-disk) tables. Sometimes, those are so large that they exhaust disk space, leading to all sorts of weird issues.
Is there a way to limit temp table maximum size for an on-disk table, so that the table doesn't overgrow the disk? tmp_table_size only applies to in-memory tables, despite the name. I haven't found anything relevant in the documentation.
There's no option for this in either MariaDB or MySQL.
I ran into the same issue as you some months ago. I searched a lot and finally partially solved it by creating a special storage area on the NAS for temporary datasets.
Create a folder on your NAS, or a partition on an internal HDD; it will by definition be limited in size. Then mount it and, in the MySQL configuration file, assign the temporary storage to that drive (Linux or Windows, respectively):
tmpdir="/mnt/DBtmp/"
tmpdir="T:\"
The MySQL service must be restarted after this change.
With this approach, once the drive is full, you still have "weird issues" with on-disk queries, but the other issues are gone.
There was a discussion about an option disk-tmp-table-size, but it looks like the commit did not make it through review or got lost for some other reason (at least the option does not exist in the current code base anymore).
I guess your next best bet (besides increasing storage) is to tune MySQL not to create on-disk temp tables; there are some tips for this on dba.stackexchange.com. Another attempt could be to create a ramdisk for the storage of the "on-disk" temp tables, if you have enough RAM and only lack disk storage.
While it does not answer the question for MySQL, MariaDB has tmp_disk_table_size and the potentially also useful max_join_size. However, tmp_disk_table_size only applies to MyISAM and Aria tables, not InnoDB. And max_join_size works only on the estimated row count of the join, not the actual row count; on the bright side, the error is issued almost immediately.
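As a sketch, on MariaDB (the values are illustrative):
-- cap on-disk temporary tables (MyISAM/Aria only) at ~10GB
SET GLOBAL tmp_disk_table_size = 10 * 1024 * 1024 * 1024;
-- abort any join whose estimated row count exceeds the limit
SET SESSION max_join_size = 100000000;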

FileGroup in MariaDB

In SQL Server, partitioning follows a hierarchy like:
Table -> Partition Scheme -> Filegroup (f1, f2, f3, f4, ...)
Comparing with Oracle:
A filegroup in SQL Server is similar to a tablespace in Oracle: it is logical storage for table and index data that can consist of one or multiple OS files.
But what about MariaDB: does it have filegroups?
Not as such. What are you trying to achieve? Keep in mind that features like that exist because disks used to be smaller than databases. Today, that is rarely an issue. Furthermore, RAID controllers, SANs, etc. eliminate the need (or even the desirability) of manually deciding which file goes where. OSes have ways to concatenate multiple volumes, even on the fly. Etc.
MyISAM has the ability to say where the data file goes and where the index file goes, but MyISAM is all but dead. Even there, it was folly to put the data on one drive and the indexes on another: in performing a query, first the index is accessed, then the data, so little, if any, performance was gained. Simple RAID striping is likely to do better.
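For the record, the MyISAM syntax for this looks like the sketch below (paths and table name are illustrative):
CREATE TABLE t_example (
  id INT PRIMARY KEY,
  val VARCHAR(100)
) ENGINE=MyISAM
  DATA DIRECTORY = '/disk1/mysql-data'
  INDEX DIRECTORY = '/disk2/mysql-index';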
InnoDB has a way to spell out ibdata1, ibdata2, etc. That dates back to the days when the OS could not make a file bigger than 2GB or 4GB. It is essentially never used today.
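The spelling-out happens in my.cnf and looks something like this (sizes are illustrative):
# system tablespace split across two files; only the last may autoextend
innodb_data_file_path = ibdata1:1G;ibdata2:1G:autoextend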
InnoDB tables can either live entirely in ibdata1 or be scattered among individual .ibd files. But I don't think this is quite what you are talking about. With "file per table", tiny tables are stored inefficiently; MySQL 5.7 and 8.0 improve on that slightly by letting you put multiple tables in a single "general tablespace", akin to a shared .ibd file.
An InnoDB tablespace contains all the data and indexes for a given table or set of tables. In partitioned tables with file_per_table, each partition lives in its own .ibd file; this may be changing with 8.0.
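A sketch of what that sharing looks like with the general-tablespace syntax (names are illustrative):
CREATE TABLESPACE ts_small ADD DATAFILE 'ts_small.ibd';
CREATE TABLE tiny1 (id INT PRIMARY KEY) TABLESPACE ts_small;
CREATE TABLE tiny2 (id INT PRIMARY KEY) TABLESPACE ts_small;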
All of these are hardly worth noting. I would guess that only 1% of systems need to even think about it. Simply let MySQL/MariaDB do what it wants; it's good enough.
A related thing... In the '80s and '90s some vendors offered "raw device" access because they thought they could do better than going through the OS. Again, OSes have improved, RAID controllers are sophisticated, and SANs exist, so raw access is no longer important. (I don't think MySQL ever had it.) It's bound to have been a big development and maintenance problem for the vendor.
How many DBAs have put tmpdir on a separate partition, only to find that things crash because it was not big enough? Ditto for RAM disks.

MySql Performance of Innodb with single large data file vs. multiple data files per table

InnoDB allows the option of using a single data file for everything or one data file per table by setting the following in your my.cnf file:
[mysqld]
innodb_file_per_table
Comparing 8 databases of roughly 20 tables each, with a single 60G ibdata file vs. the same 60G distributed fairly evenly across 160 individual data files in the one-per-table setup: does one setup generally perform better than the other? Are there any considerations that would favor one approach over the other?
Benchmark it! We don't know your typical usage pattern or types of queries (Full scans? Narrow lookups based on index? Lots of updates or nearly read-only?).
innodb_file_per_table is easier to maintain — e.g. you can recover disk space after cleaning up and optimizing a single table; the default one-large-file will only grow.
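For example, to get space back after a big cleanup (the table name is illustrative):
-- with innodb_file_per_table=ON this rebuilds the table and shrinks its .ibd file;
-- with the shared ibdata1, freed space is reused internally but the file never shrinks
OPTIMIZE TABLE big_table;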

Skip copying to tmp table on disk in MySQL

I have a question about large MySQL queries. Is it possible to skip the "copying to tmp table on disk" step that MySQL takes for large queries, or to make it go faster? This step is taking far too long for my queries to return. I read in the MySQL documentation that MySQL does this to save memory, but I don't care about saving memory; I just want my query results back fast, and I have enough memory on my machine. Also, my tables are properly indexed, so that's not why my queries are slow.
Any help?
Thank you
There are a few things you can do to lessen the impact of this:
OPTION #1: Increase the variables tmp_table_size and/or max_heap_table_size
These options govern how large an in-memory temp table can be before it is deemed too large and is paged to disk as a temporary MyISAM table. The larger these values are, the less likely you are to see 'copying to tmp table on disk'. Please make sure your server has enough RAM, and that max_connections is moderately configured, in case a single DB connection needs a lot of RAM for its own temp tables.
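A sketch (256M is illustrative; raise both, since the effective in-memory limit is the smaller of the two):
SET GLOBAL tmp_table_size = 256 * 1024 * 1024;
SET GLOBAL max_heap_table_size = 256 * 1024 * 1024;
-- verify the new values:
SHOW VARIABLES LIKE 'tmp_table_size';
SHOW VARIABLES LIKE 'max_heap_table_size';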
OPTION #2: Use a RAM disk for tmp tables
You should be able to configure a RAM disk in Linux and then set the tmpdir in mysql to be the folder that has the RAM disk mounted.
For starters, configure a RAM disk in the OS
Create a folder in Linux called /var/tmpfs
mkdir /var/tmpfs
Next, add this line to /etc/fstab (for example, if you want a 16GB RAM disk)
none /var/tmpfs tmpfs defaults,size=16g 0 0
and reboot the server.
Note: It is possible to make a RAM disk without rebooting. Just remember to still add the aforementioned line to /etc/fstab so the RAM disk survives a server reboot.
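Assuming the /etc/fstab line above is already in place, the no-reboot version is simply:
mount /var/tmpfs
or, without the fstab entry yet:
mount -t tmpfs -o size=16g none /var/tmpfs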
Now for MySQL:
Add this line in /etc/my.cnf
[mysqld]
tmpdir=/var/tmpfs
and restart mysql.
OPTION #3: Get tmp tables into the RAM disk ASAP (assuming you applied OPTION #2 first)
You may want to force tmp tables into the RAM disk as quickly as possible so that MySQL does not spin its wheels migrating large in-memory tmp tables into a RAM disk. Just add this to /etc/my.cnf:
[mysqld]
tmpdir=/var/tmpfs
tmp_table_size=2K
and restart mysql. This will cause even the tiniest temp table to be brought into existence right in the RAM disk. You could periodically run ls -l /var/tmpfs to watch temp tables come and go.
Give it a Try !!!
CAVEAT
If you see nothing but temp tables in /var/tmpfs 24/7, this could impact OS functionality/performance. To make sure /var/tmpfs does not get overpopulated, look into tuning your queries. Once you do, you should see fewer temp tables materializing in /var/tmpfs.
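To judge whether the tuning is working, compare these counters over time:
SHOW GLOBAL STATUS LIKE 'Created_tmp%tables';
-- Created_tmp_tables: implicit temp tables created (in memory or on disk)
-- Created_tmp_disk_tables: the subset that spilled to disk (or to the RAM disk)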
You can also skip the "copying to tmp table on disk" part entirely (not covered in the selected answer)
if you avoid some data types:
Variable-length data types (including BLOB and TEXT) are not supported by the MEMORY engine.
from https://dev.mysql.com/doc/refman/8.0/en/memory-storage-engine.html
(or https://mariadb.com/kb/en/library/memory-storage-engine/ if you are using MariaDB).
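As a sketch (table and columns are hypothetical), selecting a truncated string instead of the raw TEXT column lets the implicit temporary table use the MEMORY engine:
-- `body` is TEXT, so this would force an on-disk temp table:
--   SELECT author, body FROM articles GROUP BY author, body;
-- SUBSTRING() yields a VARCHAR, so this can stay in memory:
SELECT author, SUBSTRING(body, 1, 255) AS body_excerpt
FROM articles
GROUP BY author, body_excerpt;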
If your temporary table is small enough: as said in the selected answer, you can
Increase the variables tmp_table_size and/or max_heap_table_size
But if you split your query into smaller queries (you have not posted the query, which makes your problem hard to analyze), you may be able to make each piece fit inside an in-memory temporary table.
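One way to split, as a sketch (the schema is hypothetical): stage an intermediate aggregate in an explicit in-memory temporary table, then join against it:
CREATE TEMPORARY TABLE stage ENGINE=MEMORY
  SELECT customer_id, SUM(amount) AS total
  FROM orders
  GROUP BY customer_id;
SELECT c.name, s.total
FROM customers c
JOIN stage s ON s.customer_id = c.customer_id;
DROP TEMPORARY TABLE stage;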

MySQL database size

Microsoft SQL Server has a nice feature which allows a database to be automatically expanded when it becomes full. In MySQL, I understand that a database is, in fact, a directory with a bunch of files corresponding to various objects. Does this mean that the concept of database size is not applicable and a MySQL database can be as big as available disk space allows, without any additional concern? If yes, is this behavior the same across different storage engines?
It depends on the engine you're using. A list of the ones that come with MySQL can be found here.
MyISAM tables have a file per table. This file can grow up to your file system's limit. As a table gets larger, you'll have to tune it, as there are index and data size options that limit the default size (see the sketch after the quoted passage below). Also, this MyISAM documentation page says:
There is a limit of 2^32 (~4.295E+09) rows in a MyISAM table. If you build MySQL with the --with-big-tables option, the row limitation is increased to (2^32)^2 (1.844E+19) rows. See Section 2.16.2, "Typical configure Options". Binary distributions for Unix and Linux are built with this option.
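The tuning mentioned above is done with table options, e.g. (table name and values illustrative):
-- raises the internal pointer size so MyISAM can address more rows/data
ALTER TABLE big_myisam_table MAX_ROWS=1000000000 AVG_ROW_LENGTH=100;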
InnoDB can operate in three different modes: using InnoDB table files (a shared system tablespace), using a whole disk as a table file, or using innodb_file_per_table.
Table files are pre-created per your MySQL instance. You typically create a large amount of space and monitor it. When it starts filling up, you need to configure another file and restart your server. You can also set it to autoextend, so that it will add a chunk of space to the last table file when it starts to fill up. I typically don't use this feature, as you never know when you'll take the performance hit for extending the table. This page talks about configuring it.
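The autoextend variant looks like this in my.cnf (sizes are illustrative):
# start at 10G, grow the last file in chunks as needed, but never past 50G
innodb_data_file_path = ibdata1:10G:autoextend:max:50G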
I've never used a whole disk as a table file, but it can be done. Instead of pointing to a file, I believe you point your InnoDB table files at the un-formatted, unmounted device.
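Historically that raw-partition configuration looked something like the sketch below (device name illustrative; the feature is deprecated in recent versions). 'newraw' initializes the partition on the first start, after which you change it to 'raw':
# first start only:
innodb_data_file_path = /dev/sdb1:100Gnewraw
# every start after that:
innodb_data_file_path = /dev/sdb1:100Graw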
innodb_file_per_table makes InnoDB tables act like MyISAM tables. Each table gets its own table file. Last time I used this, the table files did not shrink if you deleted rows from them. When a table is dropped or altered, the file resizes.
The Archive engine is essentially a compressed (zlib), insert-only MyISAM-like table.
A MEMORY table doesn't use disk at all. In fact, when the server restarts, all the data is lost.
Merge tables are a poor man's partitioning for MyISAM tables. They cause a bunch of identically-structured tables to be queried as if they were one. Aside from the .frm table definition and a small .MRG file listing the underlying tables, no files exist other than the MyISAM ones.
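A sketch (table names are hypothetical):
CREATE TABLE log_all (
  id INT,
  msg VARCHAR(255)
) ENGINE=MERGE UNION=(log_2019, log_2020) INSERT_METHOD=LAST;
-- log_2019 and log_2020 must be identically-defined MyISAM tables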
CSV tables are wrappers around CSV files. The usual file system limits apply here. They are not too fast, since they can't have indexes.
I don't think anyone uses BDB any more. At least, I've never used it. It uses Berkeley DB as a back end. I'm not familiar with its restrictions.
Federated tables are used to connect to and query tables on other database servers. Again, there is only an FRM file.
The Blackhole engine doesn't store anything locally. It's used primarily for creating replication logs and not for actual data storage, since there is no data storage :)
MySQL Cluster is completely different: it stores just about everything in memory (recent editions allow disk storage) and is very different from all the other engines.
What you describe is roughly true for MyISAM tables. For InnoDB tables the picture is different, and more similar to what other DBMSes do: one (or a few) big files with a complex internal structure for the whole server. To optimize it, you can use a whole disk (or partition) as a file (at least on Unix-like systems, where everything is a file).