How can I insert huge BLOBs into a MySQL database (InnoDB)?
Fields of type LONGBLOB support data sizes of up to 4 GB according to the MySQL manual. But how does data of such a huge size get into the database?
I tried to use
INSERT INTO table (bindata) VALUES ( LOAD_FILE('c:/tmp/hugefile') );
which fails once hugefile is larger than about 500 MB. I have set max_allowed_packet to an appropriately large value; innodb_buffer_pool_size doesn't seem to have any influence.
My server machine runs Windows Server 2003 and has 2 GB RAM. I'm using MySQL 5.0.74-enterprise-nt.
BLOBs are cached in memory, which is why you end up with three copies of a BLOB while inserting it into the database.
Your 500 MB BLOB therefore occupies about 1,500 MB in RAM, which is most likely what hits your memory limit.
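A workaround that is often suggested (just a sketch, with made-up table and file names, and assuming you can split the file into pieces beforehand) is to insert the BLOB in chunks, so that no single statement has to carry the whole file:
-- First piece creates the row, the rest are appended:
INSERT INTO hugedata (id, bindata)
VALUES (1, LOAD_FILE('c:/tmp/hugefile.part1'));

UPDATE hugedata
SET bindata = CONCAT(bindata, LOAD_FILE('c:/tmp/hugefile.part2'))
WHERE id = 1;

UPDATE hugedata
SET bindata = CONCAT(bindata, LOAD_FILE('c:/tmp/hugefile.part3'))
WHERE id = 1;
Each piece still has to fit within max_allowed_packet, and the server still materializes the growing BLOB while appending, but the per-statement footprint is much smaller.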
I do not know which client/API you use, but from my own Java and Objective-C clients it seems that MySQL does not really support streaming of BLOBs. You need enough memory to hold the whole BLOB as a byte array in RAM (on both the server and the client side), more than once! Moving to 64-bit Linux helps, but is not the desired solution.
MySQL is not made for BLOB handling (it is fine for small BLOBs :-). It occupies two or three times the RAM needed to store or read the BLOB.
You would have to use another database such as PostgreSQL to get real BLOB support, sorry.
Both MyRocks (MySQL) and Cassandra use an LSM architecture to store their data. I populated around 5 million rows in MySQL with MyRocks as the storage engine, and the same rows in Cassandra. In Cassandra it takes only 1.7 GB of disk space, while in MySQL with MyRocks it takes 19 GB.
Am I missing something? Both use the same LSM mechanism, so why do they differ in data size?
Update:
I guess it has something to do with the text column. My table structure is (bigint, bigint, varchar, text).
Rows populated: 300,000
MyRocks data size: 185 MB
Cassandra data size: 13 MB
But if I remove the text column:
MyRocks: 21.6 MB
Cassandra: 11 MB
Any idea about this behaviour?
Well, the reason for the above behaviour is that rocksdb_block_size was set to 4 KB. With such small data blocks the compressor finds less data to compress at a time. Setting it to 16 KB solved the issue; now I get a data size similar to Cassandra's.
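For reference, a quick sketch of where the setting lives and how to verify it (rocksdb_block_size is, as far as I know, a startup option, so the value goes into my.cnf / my.ini rather than being changed at runtime):
-- In my.cnf under [mysqld] (assumed placement):
--   rocksdb_block_size = 16384
-- After a restart, check what the server is actually using:
SHOW GLOBAL VARIABLES LIKE 'rocksdb_block_size';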
Not 100% sure about MyRocks, but Cassandra is an LSM-based key-value store, which means that if a column is null it won't be stored on disk at all. A traditional RDBMS will still consume some space for it (varchars, null markers, pointers, etc.), so this may account for part of your lost space.
Additionally, Cassandra compresses data. Try:
ALTER TABLE myTable WITH compression = { 'enabled' : false };
How much time does it take to load a 1 GB CSV file into MySQL?
System Configuration:-
4GB RAM
Windows 2007
I don't have much of an idea about Windows, but if you can ensure that your biggest single table fits in the storage engine's buffer pool, it should not take more than 30 to 45 minutes (worst case), even with your given system configuration.
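As a rough sketch (the file path, table name and CSV format are assumptions, and the speed-ups are the usual InnoDB ones), such a bulk load typically looks like this:
SET autocommit = 0;
SET unique_checks = 0;            -- skip unique checks during the load
SET foreign_key_checks = 0;       -- skip FK checks during the load

LOAD DATA INFILE 'C:/data/big_file.csv'
INTO TABLE my_table
FIELDS TERMINATED BY ',' ENCLOSED BY '"'
LINES TERMINATED BY '\r\n'
IGNORE 1 LINES;                   -- skip the header row

SET foreign_key_checks = 1;
SET unique_checks = 1;
COMMIT;
Loading everything in one transaction avoids per-row commits, which is usually the biggest win.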
I exported a database in phpMyAdmin. The database info says size: 2.7 GiB, overhead: 43.5 KiB. After I exported it to my local computer (Win 7), the dump file is only 495 MB.
Questions:
1. Size: 2.7 GiB = 2.9 GB. Why is it only 495 MB after exporting to my local computer?
2. What does 'overhead' mean?
If there are deletes or updates, some databases don't free the storage slots by themselves, and the difference from the space that is really needed can be huge, especially after many updates and deletes.
For example, an update can lead to a newly used slot while the old slot is only marked as empty; the database keeps it for future needs. Some databases have a special command for compacting. But after exporting and importing, all the unused space is freed.
Just try OPTIMIZE TABLE on the tables of your old database and compare them afterwards.
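A quick way to see and reclaim that space (database and table names here are placeholders):
SHOW TABLE STATUS FROM my_database;     -- the Data_free column shows the overhead
OPTIMIZE TABLE my_database.my_table;    -- rebuilds the table and frees the unused space
On InnoDB, OPTIMIZE TABLE is mapped to a table rebuild, which has the same effect.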
We are currently running several performance tests on MySQL to compare it against an approach we are developing for a database prototype. In short: the database is empty; given a huge CSV file, load the data into memory as fast as possible.
We are testing on a 12-core Westmere server with 48 GB RAM, so memory consumption is not a real issue right now.
The problem is the following. We have chosen MySQL (widely used, open source) for comparison. Since our prototype is an in-memory database, we have chosen the MEMORY engine in MySQL.
We insert this way (files are up to 26 GB):
drop table if exists a.a;
SET @@max_heap_table_size=40000000000;
create table a.a(col_1 int, col_2 int, col_3 int) ENGINE=MEMORY;
LOAD DATA CONCURRENT INFILE "/tmp/input_files/input.csv" INTO TABLE a.a FIELDS TERMINATED BY ";";
Performing this load on a 2.6 GB file takes about 80 s, which is four times slower than a simple line count (wc -l). Using MyISAM is only 4 seconds slower, even though it is writing to disk.
What am I doing wrong here? I would assume that a write using the MEMORY engine must be far faster than with MyISAM, and I don't understand why wc -l (both are single-threaded, but writing to memory should not be that slow) is so much faster.
PS: changing read_buffer_size or any of the other variables I found while googling did not result in significant improvements.
Try setting the following variables as well:
max_heap_table_size = 40G
bulk_insert_buffer_size = 32M
read_buffer_size = 1M
read_rnd_buffer_size = 1M
It may reduce query execution time slightly.
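For example (a session-level sketch; the values mirror the suggestions above, and the effective maximum may be capped by your server configuration):
SET SESSION max_heap_table_size     = 40 * 1024 * 1024 * 1024;
SET SESSION bulk_insert_buffer_size = 32 * 1024 * 1024;
SET SESSION read_buffer_size        = 1024 * 1024;
SET SESSION read_rnd_buffer_size    = 1024 * 1024;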
Also, CONCURRENT works only with MyISAM tables, and according to the manual it actually slows down inserts; see the LOAD DATA INFILE documentation.
I think you can't compare the speed of an insert, which is a write operation, with wc -l, which is a read operation; writes are always slower than reads.
Loading 2.6 GB of data into RAM is going to take a considerable amount of time. It mostly depends on the write speed of your RAM and the I/O configuration of your OS.
Hope this helps.
I think the reason you didn't see a significant difference between the MEMORY engine and the MyISAM engine is disk caching. You have 48 GB of RAM and are only loading 2.6 GB of data.
The MyISAM engine is writing to 'files', but the OS is using its file caching features to make those file writes actually happen in RAM; it then 'lazily' flushes them to disk. Since you mentioned wc, I'll assume you are using Linux. Read up on the dirty_ratio and dirty_background_ratio kernel settings as a starting point for understanding how that works.
I want to insert a file into a MySQL database residing on a remote web server, using a web service.
My question is: what type of table column (e.g. VARCHAR, etc.) will store a file? And will the INSERT statement be somewhat different in the case of a file?
File size limits by MySQL type:
TINYBLOB: 255 bytes (0.000255 MB)
BLOB: 65,535 bytes (0.0655 MB)
MEDIUMBLOB: 16,777,215 bytes (16.78 MB)
LONGBLOB: 4,294,967,295 bytes (4,294.97 MB = 4.295 GB)
Yet, in most cases, I would NOT recommend storing big blobs of bytes in the database, even if it supports them, because it will increase the overall database size and may cause real performance issues. You can read more on the topic here. Many databases that care about consistent performance won't even let you do such a thing: AWS DynamoDB, for example, which is known to perform extremely well at any scale, limits a single item record to 400 KB. MongoDB allows 16 MB, which is also already too much, in my opinion. MySQL allows the full 4 GB if you wish, but again, think twice before doing that. The case where you may be OK storing a big blob of data in these column types is a low-traffic database where you just want to keep everything in one place for faster development, like an internal system at a small company.
The BLOB datatype is best for storing files.
See: How to store .pdf files into MySQL as BLOBs using PHP?
The MySQL BLOB reference manual has some interesting comments
The other answers will give you a good idea of how to accomplish what you asked for....
However
There are not many cases where this is a good idea. It is usually better to store only the filename in the database and keep the file itself on the file system.
That way your database is much smaller, can be moved around more easily and, more importantly, is quicker to back up / restore.
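A minimal sketch of that approach (all table and column names here are made up for illustration):
CREATE TABLE uploaded_files (
    id         BIGINT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
    file_name  VARCHAR(255)  NOT NULL,
    file_path  VARCHAR(1024) NOT NULL,   -- where the actual file lives on disk
    size_bytes BIGINT UNSIGNED NOT NULL,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
The application then reads and writes the file on disk and only touches the row for metadata lookups.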
You need to use a BLOB type; there are TINYBLOB, BLOB, MEDIUMBLOB, and LONGBLOB. As with other types, choose one according to your size needs.
TINYBLOB 255
BLOB 65535
MEDIUMBLOB 16777215
LONGBLOB 4294967295
(in bytes)
The INSERT statement itself is fairly normal. You need to read the file (for example with fread in PHP) and escape the binary data with addslashes, or better, bind it as a parameter of a prepared statement, before putting it into the query.
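For illustration, a minimal sketch in plain SQL (table, column and file names are assumptions; LOAD_FILE needs the FILE privilege, a file readable by the server within secure_file_priv, and a max_allowed_packet large enough for the file):
CREATE TABLE documents (
    id   INT AUTO_INCREMENT PRIMARY KEY,
    name VARCHAR(255),
    data LONGBLOB                        -- the file contents go here
);

INSERT INTO documents (name, data)
VALUES ('report.pdf', LOAD_FILE('/var/lib/mysql-files/report.pdf'));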