JSON to MariaDB triples in storage size - MySQL

I am trying to move my file-based organization of JSON files to MariaDB. There are approximately 2,000,000 JSON files, which are zipped in my file-based system. The total storage space for the zipped JSON files is 7GB.
When I inserted all the records into MariaDB, the table storage became 35GB.
I altered my table to be compressed, and the table size is now 15GB.
Is there a way to reduce the table size even more?
Is it normal for the storage to double when data is added to MariaDB?
This is my table:
CREATE TABLE `sbpi_json` (
`fileid` int(11) NOT NULL,
`json_data` longtext COLLATE utf8_bin NOT NULL,
`idhash` char(32) COLLATE utf8_bin NOT NULL,
`sbpi` int(15) NOT NULL,
`district` int(2) NOT NULL,
`index_val` int(2) NOT NULL,
`updated` text COLLATE utf8_bin NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_bin ROW_FORMAT=COMPRESSED;
ALTER TABLE `sbpi_json`
ADD PRIMARY KEY (`fileid`),
ADD UNIQUE KEY `idhash` (`idhash`),
ADD KEY `sbpi` (`sbpi`);

The JSON column in question is json_data, correct? It averages (uncompressed) about 10KB, correct? In the file implementation, there are multiple 'versions' of each, correct? If so, how do you tell which one you want to deliver to the user?
Most compression techniques give you about 3:1; InnoDB compression gives you about 2:1. This is partially because InnoDB has structures that it can't (or won't) compress.
Compressing just the JSON column (in client code) and storing it in a MEDIUMBLOB will probably take less space in InnoDB than using COMPRESSED. (But this will not be a huge savings.)
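A minimal Python sketch of that client-side approach, assuming the standard-library zlib module as the fast codec (any similar codec would do) and a MEDIUMBLOB column in place of the LONGTEXT json_data. The helper names pack_json/unpack_json and the sample document are hypothetical:

```python
import json
import zlib

def pack_json(doc: dict, level: int = 6) -> bytes:
    """Serialize a JSON document compactly and deflate it for a BLOB column."""
    raw = json.dumps(doc, separators=(",", ":")).encode("utf-8")
    return zlib.compress(raw, level)

def unpack_json(blob: bytes) -> dict:
    """Inflate the stored bytes and parse them back into a Python object."""
    return json.loads(zlib.decompress(blob).decode("utf-8"))

# Hypothetical record; the real documents average about 10KB each.
sample = {"fileid": 1, "items": [{"district": 7, "index_val": 3}] * 200}
blob = pack_json(sample)
print(len(json.dumps(sample)), len(blob))
assert unpack_json(blob) == sample
```

Because the bulky column already arrives compressed, running it through ROW_FORMAT=COMPRESSED as well is unlikely to gain much more.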
Focus on how you pick which 'version' of the JSON to deliver to the user. Optimize the schema around that. Then decide on how to store the data.
If the table can efficiently say which file contains the desired JSON, then that will be the best approach. And use some normal, fast-to-uncompress technique; don't focus on maximal compression.
If the char(32) COLLATE utf8_bin column holds a hex string, use CHARACTER SET ascii, not utf8.
If it is hex, then UNHEX it to shrink it further, to only BINARY(16).
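In client code, UNHEX corresponds to turning the 32 hex characters into 16 raw bytes. A minimal Python sketch, assuming idhash holds an MD5-style hex digest (the hashlib call only produces a sample value):

```python
import hashlib

# Hypothetical 32-character hex value, like the ones in the idhash column.
idhash_hex = hashlib.md5(b"example file contents").hexdigest()

# Equivalent of MySQL UNHEX(idhash): 32 hex chars -> 16 bytes for BINARY(16).
idhash_bin = bytes.fromhex(idhash_hex)

# Equivalent of LOWER(HEX(col)) when reading the value back.
roundtrip = idhash_bin.hex()

print(len(idhash_hex), len(idhash_bin))  # 32 16
assert roundtrip == idhash_hex
```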
When a row is bigger than 8KB, some of the data (probably json_data) is stored "off-record". This implies an extra disk access, and disk allocation is a bit sloppier. Hence, storing that column as a file ends up taking about the same amount of time and space.
The OS probably allocates space in 4KB chunks. InnoDB uses 16KB blocks.
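As a rough illustration of why block size matters, here is a Python sketch that rounds a stored size up to whole blocks, using the 4KB and 16KB figures above. The 3,000- and 10,000-byte sizes are hypothetical, and this worst-case rounding is a simplification of what InnoDB actually does:

```python
def allocated_bytes(size: int, block: int) -> int:
    """Round a byte count up to a whole number of allocation blocks."""
    return -(-size // block) * block  # ceiling division

for size in (3_000, 10_000):
    # Space used with 4KB OS blocks versus 16KB InnoDB pages.
    print(size, allocated_bytes(size, 4 * 1024), allocated_bytes(size, 16 * 1024))
```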

It's the text type that takes too much space.
You can try to replace it with a smaller variant of the TEXT type, if you can take for granted that the shorter maximum length is enough.
Also, replacing char(32) with varchar(32) will help if those values are not always full length.
Or you can go with VARCHAR even for the textual field, but keep an eye on what this answer says before doing so.
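For reference, the four TEXT variants differ in their maximum length in bytes and in the size of their length prefix. A small Python sketch that picks the smallest one for a given maximum document size (the function name and the 40,000-byte example are hypothetical):

```python
# (type name, length-prefix bytes, maximum length in bytes)
TEXT_TYPES = [
    ("TINYTEXT", 1, 2**8 - 1),
    ("TEXT", 2, 2**16 - 1),
    ("MEDIUMTEXT", 3, 2**24 - 1),
    ("LONGTEXT", 4, 2**32 - 1),
]

def smallest_text_type(max_bytes: int) -> str:
    """Return the smallest TEXT type whose byte limit covers max_bytes."""
    for name, _prefix, limit in TEXT_TYPES:
        if max_bytes <= limit:
            return name
    raise ValueError("value exceeds the LONGTEXT limit")

print(smallest_text_type(40_000))  # TEXT
```

The limits are in bytes, not characters, so with utf8 a document of N characters may need up to 3N bytes.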
Hope I helped!

Related

MySQL: column size limit

I'm currently working on Windows, where I have installed MySQL Community Server 5.6.30, and everything is fine. I have a script that initializes the DB, and again, everything works fine.
Now I'm trying to run this script in a Linux environment (same MySQL version), and I get the following error:
ERROR 1074 (42000) at line 3: Column length too big for column 'txt' (max = 21845); use BLOB or TEXT instead
Script:
DROP TABLE IF EXISTS texts;
CREATE TABLE `texts` (
`id` BINARY(16) NOT NULL DEFAULT '\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0',
`txt` VARCHAR(50000) DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE=INNODB DEFAULT CHARSET=utf8;
Obviously there's some MySQL server configuration on my Windows OS that I need to replicate on Linux; can anyone share any ideas?
Update 1
On AWS RDS it also works, and I'm pretty sure that's just a service on top of Linux, so obviously it's just a config issue.
Does anybody know how to reach VARCHAR(50000) with UTF8? I don't want to use TEXT or MEDIUMTEXT or anything else, just plain old VARCHAR(size).
Update 2
I appreciate the different solutions that were suggested, but I'm not looking for a new solution; I'm only looking for an answer as to why VARCHAR(50000) works under Windows and doesn't under Linux.
By the way, I'm using character set utf8 and collation utf8_general_ci.
Answer
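To answer my own question: it was an issue with SQL_MODE. It was set to STRICT_TRANS_TABLES, which should have been removed.
(Without strict mode, MySQL does not actually create a utf8 VARCHAR(50000); it silently converts the column to the smallest TEXT type that can hold it, here MEDIUMTEXT, and issues a warning that SHOW WARNINGS displays.)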
According to the documentation:
Although InnoDB supports row sizes larger than 65,535 bytes internally, MySQL itself imposes a row-size limit of 65,535 for the combined size of all columns:
mysql> CREATE TABLE t (a VARCHAR(8000), b VARCHAR(10000),
-> c VARCHAR(10000), d VARCHAR(10000), e VARCHAR(10000),
-> f VARCHAR(10000), g VARCHAR(10000)) ENGINE=InnoDB;
ERROR 1118 (42000): Row size too large. The maximum row size for the used table type, not counting BLOBs, is 65535. You have to change some columns to TEXT or BLOBs
(Unfortunately, this example does not specify the character set, so we don't really know how large the columns are.)
The utf8 encoding uses 1, 2, or 3 bytes per character. So the maximum number of characters that can safely fit in a row of 65,535 bytes (the MySQL maximum) is 21,845 characters (21,845 * 3 = 65,535).
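The same arithmetic as a quick Python check, dividing the 65,535-byte row limit by each character set's maximum bytes per character (the widths are the ones listed in the answers on this page; the function name is hypothetical):

```python
ROW_LIMIT = 65_535  # MySQL's combined row-size limit in bytes (BLOB/TEXT excluded)

MAX_BYTES_PER_CHAR = {"latin1": 1, "ascii": 1, "utf8": 3, "utf8mb4": 4}

def max_varchar_chars(charset: str) -> int:
    """Upper bound on VARCHAR length; length prefixes and other columns also count."""
    return ROW_LIMIT // MAX_BYTES_PER_CHAR[charset]

for charset in MAX_BYTES_PER_CHAR:
    print(charset, max_varchar_chars(charset))
```

The utf8 result matches the "max = 21845" in the error message from the question.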
Despite the versions being similar, it would appear that Windows is being conservative in its space allocation and guaranteeing that you can store any characters in the field. Linux seems to have a more laissez-faire attitude. You can store some strings with over 21,845 characters, depending on the characters.
I have no idea why this difference would exist in the same version. Both methods are "right" in some sense. There are simple enough work-arounds:
Use TEXT.
Switch to a character set that has shorter characters (which is presumably what you want to store).
Reduce the size of the field.
Please simply use TEXT to declare the txt column:
DROP TABLE IF EXISTS texts;
CREATE TABLE `texts` (
`id` BINARY(16) NOT NULL DEFAULT '\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0',
`txt` TEXT DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE=INNODB DEFAULT CHARSET=utf8;
utf8 needs up to 3 bytes per character. utf8mb4: 4; latin1: 1; ascii: 1; etc. VARCHAR(N) is implemented as a 1- or 2-byte length in front of the bytes for the text. That is allowed to hold N characters (not bytes). So, if you say you want utf8, then 3*N must be at most 65535, the max value for a 2-byte length.
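A rough Python model of that storage rule: a 1-byte prefix when the column's maximum byte length is at most 255, otherwise 2 bytes, followed by the actual encoded bytes. It covers only utf8/utf8mb4 (both encoded as UTF-8 here), models the general rule rather than InnoDB's exact record layout, and the function name is hypothetical:

```python
BYTES_PER_CHAR = {"utf8": 3, "utf8mb4": 4}  # maximum bytes per character

def varchar_storage(value: str, n_chars: int, charset: str = "utf8") -> int:
    """Approximate bytes a VARCHAR(n_chars) value needs: length prefix + data bytes."""
    max_bytes = n_chars * BYTES_PER_CHAR[charset]
    if max_bytes > 65_535:
        # Mirrors the per-column check behind "max = 21845"; the whole-row
        # limit (which also counts the prefix and other columns) is separate.
        raise ValueError(f"VARCHAR({n_chars}) in {charset} may need {max_bytes} bytes")
    prefix = 1 if max_bytes <= 255 else 2  # the prefix stores a byte count
    return prefix + len(value.encode("utf-8"))

print(varchar_storage("abc", 10))   # 4: 1-byte prefix (30-byte maximum) + 3 bytes
print(varchar_storage("abc", 100))  # 5: 2-byte prefix (300-byte maximum) + 3 bytes
```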
Be glad you are not running in some old version, where VARCHAR had a limit of 255.
If your txt does not need characters other than ASCII or English, then use CHARACTER SET latin1.
In InnoDB, when there are 'long' fields (big varchars, texts, blobs, etc), some or all of the column is stored in a separate block(s). There is a limit of about 8000 bytes for what is stored together in the record.
If you really need 50K of utf8, then MEDIUMTEXT is what you need. It uses a 3-byte length and can hold up to 16M bytes (5M characters, possibly more, since utf8 is a variable length encoding).
Most applications can (should?) use either ascii (1 byte per character) or utf8mb4 (1-4 bytes per character). The latter allows for all languages, including Emoji and the 4-byte Chinese characters that utf8 cannot handle.
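To check whether existing text fits MySQL's 3-byte utf8 or needs utf8mb4, you can look at the UTF-8 width of each character. A minimal Python sketch; the function names and sample strings are hypothetical:

```python
def max_utf8_width(text: str) -> int:
    """Largest number of UTF-8 bytes used by any single character in text."""
    return max((len(ch.encode("utf-8")) for ch in text), default=0)

def fits_mysql_utf8(text: str) -> bool:
    """MySQL's utf8 (utf8mb3) stores at most 3 bytes per character."""
    return max_utf8_width(text) <= 3

samples = ["plain ascii", "caf\u00e9", "\u4e2d\u6587", "emoji \U0001F600", "rare CJK \U00020000"]
for sample in samples:
    print(repr(sample), max_utf8_width(sample), fits_mysql_utf8(sample))
```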
As for why Windows and Linux work differently here, I don't know. Are you using the same version? Suggest you file a bug report with http://bugs.mysql.com . (And provide a link to it from this Question.)
If you absolutely must use VARCHAR (which is a bad solution to this problem!), then here's something you can try:
CREATE TABLE `texts` (
`id` BINARY(16) NOT NULL DEFAULT '\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0',
`txt` VARCHAR(20000) DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE=INNODB DEFAULT CHARSET=utf8;
CREATE TABLE `texts2` (
`id` BINARY(16) NOT NULL DEFAULT '\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0',
`txt` VARCHAR(20000) DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE=INNODB DEFAULT CHARSET=utf8;
CREATE TABLE `texts3` (
`id` BINARY(16) NOT NULL DEFAULT '\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0',
`txt` VARCHAR(10000) DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE=INNODB DEFAULT CHARSET=utf8;
That's 50,000 characters in total. Now your client application will have to manage breaking up the text into separate chunks and creating the records in each table. Likewise, reading the text back in will require three SELECT statements, but you will then have 50,000 characters.
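The client-side bookkeeping described there could look roughly like this Python sketch. The chunk sizes mirror the three tables (20,000 + 20,000 + 10,000 characters); the function names are hypothetical, and the actual INSERT/SELECT calls are left out:

```python
CHUNK_LIMITS = (20_000, 20_000, 10_000)  # texts, texts2, texts3

def split_text(text: str, limits=CHUNK_LIMITS) -> list[str]:
    """Split text into consecutive pieces, one per table, in table order."""
    if len(text) > sum(limits):
        raise ValueError("text longer than the combined VARCHAR capacity")
    pieces, start = [], 0
    for limit in limits:
        pieces.append(text[start:start + limit])
        start += limit
    return pieces

def join_text(pieces: list[str]) -> str:
    """Reassemble the pieces read back with one SELECT per table."""
    return "".join(pieces)
```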
It's just not at all recommended to do this with any database implementation.
I've worked in a few environments where large text was stored in columns in the database, and it always wound up causing more problems than it solved.
These should really be spooled to files on disk, and a reference to the full path to the file stored in the database.
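A minimal Python sketch of that layout, using the standard-library sqlite3 module as a stand-in for MySQL so the example is self-contained; the documents table and the helper names are hypothetical:

```python
import sqlite3
import tempfile
from pathlib import Path

def store_document(conn: sqlite3.Connection, root: Path, doc_id: int, text: str) -> None:
    """Write the text to a file and keep only its path in the database."""
    path = root / f"{doc_id}.txt"
    path.write_text(text, encoding="utf-8")
    conn.execute("INSERT INTO documents (id, path) VALUES (?, ?)", (doc_id, str(path)))

def load_document(conn: sqlite3.Connection, doc_id: int) -> str:
    """Look up the stored path, then read the file contents."""
    (path,) = conn.execute("SELECT path FROM documents WHERE id = ?", (doc_id,)).fetchone()
    return Path(path).read_text(encoding="utf-8")

with tempfile.TemporaryDirectory() as tmp:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE documents (id INTEGER PRIMARY KEY, path TEXT NOT NULL)")
    store_document(conn, Path(tmp), 1, "x" * 50_000)
    assert len(load_document(conn, 1)) == 50_000
```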
Then run some indexing engine over this corpus of documents.
You will get greater scalability from this, and easier management.
Just to add more clarity: if you are using a solution that definitely requires a long VARCHAR, as in my case when configuring WatchDog.NET to use a MySQL database for a .NET Web API log, you can sign in to MySQL as the root user and then run:
SET GLOBAL sql_mode = '';
(This applies to connections opened after the change and does not survive a server restart.)

Display collation in create table output

I'm writing a set of SQL statements in MySQL to create and modify a few tables. I need to get my output to match a document of sample output exactly (this is for school).
When I show my create table statements, all varchar columns need to look like this:
`name` varchar(10) COLLATE utf8_unicode_ci DEFAULT NULL,
but they weren't showing the collation. I tried changing the declaration to
name varchar COLLATE utf8_unicode_ci DEFAULT NULL,
but this caused the output to show both the charset and collation, and I need to be showing just the collation. The sample output document was created on Unix, while I am on Windows, so this could be the source of the difference, but I need to know for sure.
Is there a way I can alter my queries to show the collation, or is this just a Unix/Windows inconsistency?
To be honest, I doubt very much that anyone intends for you to obtain output that is identical verbatim; it's more likely that they require it to be semantically identical. However, you might play around with the table's default charset/collation to see whether that makes a difference to the output obtained from SHOW CREATE TABLE:
ALTER TABLE foo CHARACTER SET utf8 COLLATE utf8_bin;
Failing that, it could be a difference between MySQL versions.

Create table automatically according to input file in mysql

I have a file with a large number of columns, and I want to load this file into a MySQL table.
The thing is, if we have a file with, say, 8 columns, then we first create the table with:
CREATE TABLE `input` (
`idInput` varchar(45) DEFAULT NULL,
`row2` varchar(45) DEFAULT NULL,
`col3` varchar(45) DEFAULT NULL,
`col4` varchar(45) DEFAULT NULL,
`col5` varchar(45) DEFAULT NULL,
`col6` varchar(45) DEFAULT NULL,
`col7` varchar(45) DEFAULT NULL,
`col8` varchar(45) DEFAULT NULL
);
Then we load the file with:
LOAD DATA INFILE "FILE" INTO TABLE input;
But I have a file with 150 columns, and I want to insert it into a MySQL table automatically (so that I don't have to create the table first). The first row of my file is a header, which should become the column names in the table, and each column has a different data type.
So is there an easy way to do this, so that afterwards I can do different things with this table?
I am using the MySQL command-line client version 5.5.20 (Windows 7).
You can try using the Sequel Pro MySQL client.
With this tool you can use the option "File->Import", and in the "CSV Import Field Mapping" window, instead of importing into an existing table, you can choose the "New" button.
It's better if your CSV has a header line describing the column names, so it gives the right column names. The tool is also good at guessing the column types from the content.
You may run into problems if VARCHAR(255) is set as the default type for text fields. If that is the case, change the type of those fields to TEXT.
Use phpMyAdmin. It can create a table based on the first line of the file and guess the table structure. Click the "Import" link and select your file. Don't forget to select the format that fits your file, most often CSV.
If the file is too big for phpMyAdmin, I sometimes "head" the file and use the head file in phpMyAdmin to create the table, then import the whole file using the LOAD DATA command.
It makes my life easier.
I don't think this is possible using plain MySQL. Somehow the column definitions would have to be guessed. You'll probably have to use a secondary language to read the first row, create the table, and then import.
You can do this using mysqldump though.
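Along the lines of the "secondary language" suggestion, here is a minimal Python sketch that reads the header row and a sample of the data, guesses a type per column, and prints a CREATE TABLE plus a matching LOAD DATA INFILE statement. The type rules (INT/BIGINT, DOUBLE, otherwise TEXT), the comma delimiter, and the names guess_type and build_ddl are assumptions; review the generated statements before running them:

```python
import csv
import io

INT32_MIN, INT32_MAX = -2**31, 2**31 - 1

def _all_parse(values: list[str], cast) -> bool:
    """True if there is at least one value and every value parses with cast."""
    try:
        for v in values:
            cast(v)
    except ValueError:
        return False
    return bool(values)

def guess_type(values: list[str]) -> str:
    """Guess a MySQL column type from sample values (empty strings are ignored)."""
    non_empty = [v for v in values if v != ""]
    if _all_parse(non_empty, int):
        fits_int = all(INT32_MIN <= int(v) <= INT32_MAX for v in non_empty)
        return "INT" if fits_int else "BIGINT"
    if _all_parse(non_empty, float):
        return "DOUBLE"
    # TEXT rather than VARCHAR(255): 150 utf8 VARCHAR(255) columns could need
    # 150 * 255 * 3 = 114,750 bytes, over the 65,535-byte row limit.
    return "TEXT"

def build_ddl(csv_file, table: str, file_path: str, sample_rows: int = 1000) -> str:
    """Return CREATE TABLE and LOAD DATA statements for a CSV with a header row."""
    reader = csv.reader(csv_file)
    header = next(reader)
    samples = [row for _, row in zip(range(sample_rows), reader)]
    columns = []
    for i, name in enumerate(header):
        col_values = [row[i] if i < len(row) else "" for row in samples]
        columns.append(f"  `{name}` {guess_type(col_values)} DEFAULT NULL")
    create = f"CREATE TABLE `{table}` (\n" + ",\n".join(columns) + "\n);"
    load = (
        f"LOAD DATA INFILE '{file_path}' INTO TABLE `{table}`\n"
        "  FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '\"'\n"
        "  LINES TERMINATED BY '\\n'\n"
        "  IGNORE 1 LINES;"
    )
    return create + "\n" + load

print(build_ddl(io.StringIO("id,name,score\n1,alice,3.5\n2,bob,4\n"), "input", "FILE"))
```

Sampling only the first rows keeps this fast on a large file, at the risk of guessing a numeric type for a column that later contains text.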
As I understand it, you have a generated text file with different data types, ready to load from the command line. Here are the instructions from MySQL.
To create:
https://dev.mysql.com/doc/refman/5.7/en/mysql-batch-commands.html
To alter:
https://dev.mysql.com/doc/refman/5.7/en/alter-table.html
These are all command lines that I also use. (If someone has a handy video that describes every step of using one of those MySQL development environments, one where it doesn't take 20 steps to load a table, that might be nice. Still, it will probably always be faster to type one in by hand and use one step, or to edit a dump.)

How to compress InnoDB table on disc?

I have a table 'text' initially created with the following script:
CREATE TABLE IF NOT EXISTS `text` (
`old_id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`old_text` mediumblob NOT NULL,
`old_flags` tinyblob NOT NULL,
PRIMARY KEY (`old_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 MAX_ROWS=10000000 AVG_ROW_LENGTH=10240 AUTO_INCREMENT=8500 ;
I would like to compress it.
I tried the following script for this:
ALTER TABLE text ENGINE=InnoDB ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=4;
But the table requires the same disc space after I ran it (I use innodb_file_per_table).
The idea initially came from using an archiver to compress a backup of the tables: the compressed size was 2% of the original size.
How can I compress an InnoDB table so that it requires less disk space?
Thanks.
Compression does not affect already used space, so the best way to reduce the size of all data is to delete the DB server data, create the tables again using ROW_FORMAT=COMPRESSED, and load the data from a backup.
I can confirm that this works; I have tested it myself.
All the necessary steps are listed here:
http://code.openark.org/blog/mysql/upgrading-to-barracuda-getting-rid-of-huge-ibdata1-file

What data type to use in MySQL to store images?

I need to store users' images and resumes in the database.
I am using a MySQL database and PHP5. I need to know which data types I should use.
And also, how do I set a limit (maximum size) for uploaded data?
What you need, according to your comments, is a 'BLOB' (Binary Large OBject) for both image and resume.
You can find a good answer to your question on the MySQL site itself; refer to their forum (the answer doesn't use PHP):
http://forums.mysql.com/read.php?20,17671,27914
According to them, use the LONGBLOB data type. With that, you can only store images smaller than 1MB by default, although this can be changed by editing the server config file (the max_allowed_packet setting). I would also recommend using MySQL Workbench for ease of database management.
This can be done from the command line. This will create a column for your image with a NOT NULL property.
CREATE TABLE `test`.`pic` (
`idpic` INTEGER UNSIGNED NOT NULL AUTO_INCREMENT,
`caption` VARCHAR(45) NOT NULL,
`img` LONGBLOB NOT NULL,
PRIMARY KEY(`idpic`)
)
ENGINE = InnoDB;