How to compress an InnoDB table on disk? - mysql

I have a table 'text' initially created with the following script:
CREATE TABLE IF NOT EXISTS `text` (
`old_id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`old_text` mediumblob NOT NULL,
`old_flags` tinyblob NOT NULL,
PRIMARY KEY (`old_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 MAX_ROWS=10000000 AVG_ROW_LENGTH=10240 AUTO_INCREMENT=8500 ;
I would like to compress it.
I tried the following script for this:
ALTER TABLE text ENGINE=InnoDB ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=4;
But the table requires the same disk space after I ran it (I use innodb_file_per_table).
The idea initially came up after using an archiver to compress a backup of the tables - the compressed size is about 2% of the original size.
How can I compress an InnoDB table so that it takes less disk space?
Thanks.

Compression does not affect already-used space, so the best way to reduce the size of all the data is to delete the database server's data, create the tables again using ROW_FORMAT=COMPRESSED, and load the data back from a backup.
I can confirm that this works; I have tested it myself.
All necessary steps are listed here:
http://code.openark.org/blog/mysql/upgrading-to-barracuda-getting-rid-of-huge-ibdata1-file
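A sketch of that procedure on MySQL 5.5/5.6 with innodb_file_per_table (the dump and reload themselves happen outside SQL, e.g. with mysqldump), assuming the original table definition from the question:
-- ROW_FORMAT=COMPRESSED needs the Barracuda file format and file-per-table tablespaces:
SET GLOBAL innodb_file_format = 'Barracuda';
SET GLOBAL innodb_file_per_table = 1;
-- After dumping the data, drop and recreate the table compressed:
DROP TABLE `text`;
CREATE TABLE `text` (
`old_id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`old_text` mediumblob NOT NULL,
`old_flags` tinyblob NOT NULL,
PRIMARY KEY (`old_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=4;
-- ...then reload the rows from the backup.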

Related

JSON to MariaDB triples in storage size

I am trying to move my file-based organization of JSON files to MariaDB. There are approximately 2,000,000 JSON files, which in my file-based system are stored zipped. The total storage space for the zipped JSON files is 7GB.
When I inserted all the records into MariaDB, the table storage became 35GB.
I altered my table to be compressed and the table size is now 15GB.
Is there a way to reduce the table size even more?
Is it normal for the storage to double when data is added to MariaDB?
This is my table:
CREATE TABLE `sbpi_json` (
`fileid` int(11) NOT NULL,
`json_data` longtext COLLATE utf8_bin NOT NULL,
`idhash` char(32) COLLATE utf8_bin NOT NULL,
`sbpi` int(15) NOT NULL,
`district` int(2) NOT NULL,
`index_val` int(2) NOT NULL,
`updated` text COLLATE utf8_bin NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_bin ROW_FORMAT=COMPRESSED;
ALTER TABLE `sbpi_json`
ADD PRIMARY KEY (`fileid`),
ADD UNIQUE KEY `idhash` (`idhash`),
ADD KEY `sbpi` (`sbpi`);
The JSON column in question is json_data, correct? It averages (uncompressed) about 10KB, correct? In the file implementation, there are multiple 'versions' of each, correct? If so, how do you tell which one you want to deliver to the user?
Most compression techniques give you 3:1; InnoDB compression gives you 2:1. This is partially because it has things that it can't (or won't) compress.
Compressing just the JSON column (in client code) and storing it in a MEDIUMBLOB will probably take less space in InnoDB than using COMPRESSED. (But this will not be a huge savings.)
Focus on how you pick which 'version' of the JSON to deliver to the user. Optimize the schema around that. Then decide on how to store the data.
If the table can efficiently say which file contains the desired JSON, then that will be the best approach. And use some normal, fast-to-uncompress technique; don't focus on maximal compression.
If char(32) COLLATE utf8_bin is a hex string, use ascii, not utf8.
If it is hex, then UNHEX to further shrink it to only BINARY(16).
When a row is bigger than 8KB, some of the data (probably json_data) is stored "off-record". This implies an extra disk access and disk allocation is a bit more sloppy. Hence, storing that column as a file ends up taking about the same amount of time and space.
The OS probably allocates space in 4KB chunks. InnoDB uses 16KB blocks.
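A sketch of that idea in SQL (not the exact approach above, which would compress in the application): the hypothetical columns json_zip and idhash_bin hold the compressed JSON and the unhexed hash, and MySQL's COMPRESS()/UNCOMPRESS() stand in for client-side zlib:
ALTER TABLE `sbpi_json`
ADD COLUMN `json_zip` MEDIUMBLOB NULL,
ADD COLUMN `idhash_bin` BINARY(16) NULL;
UPDATE `sbpi_json`
SET json_zip = COMPRESS(json_data),
idhash_bin = UNHEX(idhash);
-- Reading a row back:
SELECT UNCOMPRESS(json_zip) AS json_data, HEX(idhash_bin) AS idhash
FROM `sbpi_json` WHERE fileid = 1;
-- Once verified, the original json_data and idhash columns (and their indexes) could be dropped.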
It's the text type that takes too much space.
You can try to replace it with a smaller variant of the text type, if you can take for granted that the reduced length is enough.
Also, replacing char(32) with varchar(32) will help if those values are not always full length.
Or you can go with varchar even for the textual field, but have a look at this answer before doing so.
Hope I helped!
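A sketch of those column changes; the VARCHAR lengths here are assumptions and are only safe if every existing value fits within them:
ALTER TABLE `sbpi_json`
MODIFY `json_data` MEDIUMTEXT COLLATE utf8_bin NOT NULL,
MODIFY `idhash` VARCHAR(32) COLLATE utf8_bin NOT NULL,
MODIFY `updated` VARCHAR(64) COLLATE utf8_bin NOT NULL;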

Is there a way to compare two create table queries and alter the existing table by adding new columns?

So in this case, I will get the whole database schema multiple times, but every time the table structure might be slightly different from the previous one. Since I already have data inside, is there a way to write a query that compares the new schema with the existing table and just adds the new columns?
For example I already have this table in my database.
CREATE TABLE `Ages` (
`AgeID` int(11) DEFAULT NULL,
`AgeName` varchar(32) DEFAULT NULL,
`AgeAbbreviation` varchar(13) DEFAULT NULL,
`YouthAge` varchar(15) DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;
And the new schema that I get has the same table but with different columns.
CREATE TABLE `Ages` (
`AgeID` int(11) DEFAULT NULL,
`AgeName` varchar(32) DEFAULT NULL,
`AgeAbbreviation` varchar(13) DEFAULT NULL,
`YouthAge` varchar(15) DEFAULT NULL,
`AgeLimit` varchar(20) DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;
In this case the column AgeLimit should be added to the existing table.
You should be able to do it by looking at the table definitions in the metadata tables (information_schema).
You can always look into the existing schema using the information_schema database, which holds the metadata.
You can then import your new schema into a temporary database, creating all tables according to the new schema and then again look into the metadata.
You might be able to use dynamic SQL inside a stored procedure to execute ALTER TABLE statements created from those differences at runtime.
But I think this is a lot easier from the backend Node.js server, because you can easily do steps 1 and 2 from Node.js as well (it's in fact just querying a bunch of tables), and you have way more possibilities to calculate the differences and to create and execute the appropriate queries.
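A sketch of the comparison as a single query, assuming the new schema has been imported into a temporary database; `new_schema` and `live_db` are hypothetical database names, and defaults/nullability would still need handling:
SELECT CONCAT('ALTER TABLE `live_db`.`', n.TABLE_NAME, '` ADD COLUMN `',
n.COLUMN_NAME, '` ', n.COLUMN_TYPE, ';') AS alter_stmt
FROM information_schema.COLUMNS n
LEFT JOIN information_schema.COLUMNS o
ON o.TABLE_SCHEMA = 'live_db'
AND o.TABLE_NAME = n.TABLE_NAME
AND o.COLUMN_NAME = n.COLUMN_NAME
WHERE n.TABLE_SCHEMA = 'new_schema'
AND o.COLUMN_NAME IS NULL;
-- For the Ages example this would emit roughly:
-- ALTER TABLE `live_db`.`Ages` ADD COLUMN `AgeLimit` varchar(20);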
EDIT 1
If you don't have the possibility of creating a temporary database from the new schema, you will have to find some other way to extract information from it. I suspect you have an SQL script with (among others) a bunch of CREATE TABLE ... statements, because that's typically what mysqldump creates. So you'll have to parse this script. Again, this seems to be way easier in JavaScript, if it is even possible in a MySQL stored procedure. If your schema is as well structured as your examples, it's actually just a few lines of code.
EDIT 2
And maybe you can even get some inspiration from here: Compare two MySQL databases. There are some tools mentioned there that synchronize databases.

MySQL: column size limit

I'm currently working on a Windows OS and I have installed MySQL community server 5.6.30 and everything is fine. I have a script that initializes the DB and again, everything works fine.
Now I'm trying to run this script on a Linux environment -- same MySQL version -- and I get the following error:
ERROR 1074 (42000) at line 3: Column length too big for column
'txt' (max = 21845); use BLOB or TEXT instead
Script -
DROP TABLE IF EXISTS text;
CREATE TABLE `texts` (
`id` BINARY(16) NOT NULL DEFAULT '\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0',
`txt` VARCHAR(50000) DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE=INNODB DEFAULT CHARSET=utf8;
Obviously there's some MySQL server configuration on my Windows OS that I need to replicate on Linux; can anyone share any ideas?
Update 1
On AWS's RDS it also works, and I'm pretty sure that's just a service on top of Linux, so it's obviously just a configuration issue.
Does anybody know how to reach VARCHAR(50000) with utf8? I don't want to use TEXT or MEDIUMTEXT or anything else, just plain old VARCHAR(size).
Update 2
I appreciate the different solutions that were suggested, but I'm not looking for a new solution; I'm only looking for an answer as to why VARCHAR(50000) works under Windows but not under Linux.
By the way, I'm using character set utf8 and collation utf8_general_ci.
Answer
To answer my own question: it was an issue with the SQL_MODE. It was set to STRICT_TRANS_TABLES and should have been removed.
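To check and change it (a sketch; SET GLOBAL only affects new connections, so use SET SESSION for the current one):
SELECT @@GLOBAL.sql_mode;
SET GLOBAL sql_mode = '';
-- With strict mode off, MySQL should silently convert the oversized VARCHAR to a
-- TEXT type and issue a Note instead of raising error 1074, which is presumably
-- why the same script appeared to work on the Windows install.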
According to the documentation:
Although InnoDB supports row sizes larger than 65,535 bytes
internally, MySQL itself imposes a row-size limit of 65,535 for the
combined size of all columns:
mysql> CREATE TABLE t (a VARCHAR(8000), b VARCHAR(10000),
-> c VARCHAR(10000), d VARCHAR(10000), e VARCHAR(10000),
-> f VARCHAR(10000), g VARCHAR(10000)) ENGINE=InnoDB;
ERROR 1118 (42000): Row size too large. The maximum row size for the
used table type, not counting BLOBs, is 65535. You have to change some
columns to TEXT or BLOBs
(Unfortunately, this example does not provide the character set so we don't really know how large the columns are.)
The utf8 encoding uses 1, 2, or 3 bytes per character. So, the maximum number of characters that can safely fit in a page of 65,535 bytes (the MySQL maximum) is 21,845 characters (21,845*3 = 65,535).
Despite the versions being similar, it would appear that Windows is being conservative in its space allocation and guaranteeing that you can store any characters in the field. Linux seems to have a more laissez-faire attitude: you can store some strings with over 21,845 characters, depending on the characters.
I have no idea why this difference would exist in the same version. Both methods are "right" in some sense. There are simple enough work-arounds:
Use TEXT.
Switch to a character set that has shorter characters (which presumably still covers what you want to store).
Reduce the size of the field.
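The second work-around as a complete statement (a sketch; for the first one replace the txt definition with `txt` TEXT or MEDIUMTEXT, and for the third shrink it to something like VARCHAR(21000), since the exact ceiling depends on the other columns in the row):
DROP TABLE IF EXISTS texts;
CREATE TABLE `texts` (
`id` BINARY(16) NOT NULL DEFAULT '\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0',
`txt` VARCHAR(50000) CHARACTER SET latin1 DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE=INNODB DEFAULT CHARSET=utf8;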
Please simply use TEXT to declare the txt column:
DROP TABLE IF EXISTS text;
CREATE TABLE `texts` (
`id` BINARY(16) NOT NULL DEFAULT '\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0',
`txt` TEXT DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE=INNODB DEFAULT CHARSET=utf8;
utf8 needs up to 3 bytes per character. utf8mb4: 4; latin1: 1; ascii: 1; etc. VARCHAR(N) is implemented as a 1- or 2-byte length in front of the bytes for the text. That is allowed to hold N characters (not bytes). So, if you say you want utf8, then 3*N must be less than 65535, the max value for a 2-byte length.
Be glad you are not running in some old version, where VARCHAR had a limit of 255.
If your txt does not need characters other than ascii or English, then use CHARACTER SET latin1.
In InnoDB, when there are 'long' fields (big varchars, texts, blobs, etc), some or all of the column is stored in a separate block(s). There is a limit of about 8000 bytes for what is stored together in the record.
If you really need 50K of utf8, then MEDIUMTEXT is what you need. It uses a 3-byte length and can hold up to 16M bytes (5M characters, possibly more, since utf8 is a variable length encoding).
Most applications can (should?) use either ascii (1 byte per character) or utf8mb4 (1-4 bytes per character). The latter allows for all languages, including Emoji and the 4-byte Chinese characters that utf8 cannot handle.
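A quick way to see the per-character byte cost of the character sets mentioned above (information_schema keeps this in the MAXLEN column):
SELECT CHARACTER_SET_NAME, MAXLEN
FROM information_schema.CHARACTER_SETS
WHERE CHARACTER_SET_NAME IN ('ascii', 'latin1', 'utf8', 'utf8mb4');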
As for why Windows and Linux work differently here, I don't know. Are you using the same version? Suggest you file a bug report with http://bugs.mysql.com . (And provide a link to it from this Question.)
If you absolutely must use varchar - which is a bad solution to this problem! - then here's something you can try:
CREATE TABLE `texts` (
`id` BINARY(16) NOT NULL DEFAULT '\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0',
`txt` VARCHAR(20000) DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE=INNODB DEFAULT CHARSET=utf8;
CREATE TABLE `texts2` (
`id` BINARY(16) NOT NULL DEFAULT '\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0',
`txt` VARCHAR(20000) DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE=INNODB DEFAULT CHARSET=utf8;
CREATE TABLE `texts3` (
`id` BINARY(16) NOT NULL DEFAULT '\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0',
`txt` VARCHAR(10000) DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE=INNODB DEFAULT CHARSET=utf8;
That adds up to 50,000 characters. Your client application will have to manage breaking the text up into separate chunks and creating the records in each table. Likewise, reading the text back in will require three SELECT statements (or one join, as sketched below), but you will then have your 50,000 characters.
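A sketch of the read as one round trip, assuming each chunk row shares the same id (the id literal below is only an example):
SELECT CONCAT_WS('', t1.txt, t2.txt, t3.txt) AS full_txt
FROM texts t1
LEFT JOIN texts2 t2 ON t2.id = t1.id
LEFT JOIN texts3 t3 ON t3.id = t1.id
WHERE t1.id = UNHEX('000102030405060708090A0B0C0D0E0F');
-- CONCAT_WS skips NULL chunks, so shorter texts that never reach texts2/texts3 still come back intact.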
It's just not at all recommended to do this with any database implementation.
I've worked in a few environments where large text was stored in columns in the database, and it always wound up causing more problems than it solved.
These should really be spooled to files on disk, and a reference to the full path to the file stored in the database.
Then run some indexing engine over this corpus of documents.
You will get greater scalability from this, and easier management.
Just to add, for more clarity: if you are using a solution that definitely requires a long VARCHAR (like in my case, when trying to configure WatchDog.NET to use a MySQL database for a .NET web API log), you can sign into the MySQL database as the root user and then run:
SET GLOBAL sql_mode = ""

Can I specify a different data directory for each database running on a single MySQL installation?

My question is:
Can I specify a different data directory for each database running on a single MySQL installation? I have multiple large databases, and I want to point each to its own directory, each on a different mount (unique disk).
An image is worth a thousand words, so let me illustrate:
Trying something like this for some reason creates the DB and table, but ignores the DATA DIRECTORY and INDEX DIRECTORY options:
CREATE DATABASE `DB1` /*!40100 COLLATE 'latin1_swedish_ci' */;
USE DB1;
CREATE TABLE `onDisk1` (
`id` INT(11) NULL DEFAULT NULL
)
COLLATE='latin1_swedish_ci'
ENGINE=MyISAM
DATA DIRECTORY='/mnt/windows/share_data/mysql'
INDEX DIRECTORY='/mnt/windows/share_data/mysql'
;
MySQL supports this; you can follow the steps here:
https://dev.mysql.com/doc/refman/5.6/en/multiple-data-directories.html
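A few checks worth running before relying on the DATA DIRECTORY / INDEX DIRECTORY options (a sketch; support depends on engine and version):
SHOW VARIABLES LIKE 'have_symlink'; -- must not be DISABLED for MyISAM DATA DIRECTORY to work
SELECT @@sql_mode; -- if NO_DIR_IN_CREATE is set, both options are silently ignored
SHOW VARIABLES LIKE 'innodb_file_per_table'; -- InnoDB (5.6+) honours DATA DIRECTORY only with file-per-table tablespaces, and does not support INDEX DIRECTORY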

Can't create InnoDB table (error -1)

I'm porting a rather simple table to my live DB server, and it's giving me this strange error when I try to create an InnoDB table. The table create is:
CREATE TABLE `cobertura` (
`id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
`cep` int(8) unsigned zerofill NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `id` (`id`),
KEY `idx_cep` (`cep`)
) ENGINE=InnoDB;
If I change the engine to MyISAM it works; if I change the table name to something else, it works.
If I create the table as MyISAM and then ALTER the engine to InnoDB, I get error 121.
I tried looking in the folder where MySQL stores the files to see if there's any leftover file there; nothing.
Any ideas?
A dash (-) in the database name can prevent MariaDB (and therefore MySQL) from setting the engine to InnoDB. Though this is at best a half-answer: while I am trying to import an entire database back into the system, there are other tables that get created first without problems. Unfortunately this issue is now forcing itself upon me and I do not have the time to start a whole new database naming policy, so for now I'm changing the engine for that particular database to MyISAM instead.
For general troubleshooting, try:
SHOW ENGINES;
...and if it turns out InnoDB isn't installed, then try this:
INSTALL PLUGIN innodb SONAME 'ha_innodb.so';
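If the engine is installed but the CREATE still fails, one more thing worth checking on MySQL 5.6 or later is whether an orphaned entry for the old table is still present in the InnoDB data dictionary even though the files on disk are gone (the LIKE pattern is just an example):
SELECT * FROM information_schema.INNODB_SYS_TABLES WHERE NAME LIKE '%cobertura%';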