MySQL: Saving large BLOBs

I'm testing an application written in Qt that deals with PDFs saved in a database. I was having trouble saving anything larger than about 1 MB: the application would crash. After some reading on Google I ended up changing max_allowed_packet, which let me save blobs.
I plotted several uploads of PDFs of different sizes and measured about 200 KB/sec while saving files. Then came the surprise: checking the database, I realized that anything over roughly 5 MB would not be stored. There is no error, and the handshake between the application and MySQL seems to go fine, since I don't get any errors.
I have some experience with MySQL and Oracle, but I have never dealt with BLOBs.
I read in a post somewhere that I should try changing the value of innodb_log_file_size (I tried 2000000000), but MySQL tells me it is a read-only variable. Could somebody help me fix this problem? I'm running MySQL on Ubuntu.

It's not surprising that your saves fail around that size, because the default InnoDB log file size is 48MB (50331648 bytes).
Your InnoDB log file size must be at least 10x the size of the largest BLOB you try to save. In other words, you can save a BLOB only if it is no larger than 1/10th of the log file size. This started being enforced in MySQL 5.6; before that it was recommended in the manual, but not enforced.
You can change the log file size, but it requires restarting the MySQL Server. The steps are documented here: https://dev.mysql.com/doc/refman/5.7/en/innodb-data-log-reconfiguration.html
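For example, assuming MySQL reads its settings from my.cnf, the change itself is just an edit like the following plus a restart (the 256M figure is only an illustration; size it to at least 10x your largest document):

[mysqld]
innodb_log_file_size = 256M

If I recall correctly, on MySQL 5.6.8 and later InnoDB resizes the redo log files automatically on restart; on older versions you also have to remove the old ib_logfile* files after a clean shutdown, as described in the linked page.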
P.S. As for the comments about storing images in the database vs. as files on disk, this is a long debate. Some people will make unequivocal statements that it's bad to store images in the database, but there are pros and cons on both sides of the argument. See my answer to Should I use MySQL blob field type?

Bill has the immediate answer. But I suggest that you will get bigger and bigger documents and will hit one limit after another.
Meanwhile, unless you have a very recent MySQL, changing innodb_log_file_size is a pain.
If you ever get to 1GB, you will hit the limit for max_allowed_packet. Even if you got past that, you would hit another hard limit: 4GB, the maximum size of a LONGBLOB or LONGTEXT.
I suggest you bite the bullet and pick one of the following:
Plan A: Put documents in the file system, or
Plan B: Chunk the documents into pieces, storing them across multiple BLOB rows. This avoids all the limits, even the 4GB one, but the code for input and output will be messier (see the sketch below).
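As a rough illustration of Plan B, here is a minimal C sketch. The table name doc_chunks, its layout (doc_id INT, seq INT, chunk MEDIUMBLOB), and the 256KB chunk size are assumptions for the example, and error handling is omitted:

#include <stdio.h>
#include <string.h>
#include <mysql.h>

#define CHUNK_SIZE (256 * 1024)   /* keep each piece well under max_allowed_packet */

int store_chunked(MYSQL *conn, int doc_id, const char *path)
{
    FILE *fp = fopen(path, "rb");
    if (!fp) return -1;

    const char *sql = "INSERT INTO doc_chunks (doc_id, seq, chunk) VALUES (?, ?, ?)";
    MYSQL_STMT *stmt = mysql_stmt_init(conn);
    mysql_stmt_prepare(stmt, sql, strlen(sql));

    static char buf[CHUNK_SIZE];
    int seq = 0;
    size_t n;
    while ((n = fread(buf, 1, sizeof buf, fp)) > 0) {
        unsigned long len = (unsigned long)n;
        MYSQL_BIND bind[3];
        memset(bind, 0, sizeof bind);
        bind[0].buffer_type = MYSQL_TYPE_LONG;  bind[0].buffer = &doc_id;   /* which document */
        bind[1].buffer_type = MYSQL_TYPE_LONG;  bind[1].buffer = &seq;      /* chunk order */
        bind[2].buffer_type = MYSQL_TYPE_BLOB;  bind[2].buffer = buf;       /* the piece itself */
        bind[2].buffer_length = len;            bind[2].length = &len;
        mysql_stmt_bind_param(stmt, bind);
        mysql_stmt_execute(stmt);
        seq++;
    }
    mysql_stmt_close(stmt);
    fclose(fp);
    return 0;
}

Reassembly is the reverse: SELECT the chunks ORDER BY seq and concatenate them on the client.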

Related

What is the difference between storing data in a blob vs. storing a pointer to a file?

I have a question about the blob data type in MySQL.
I read that the data type can be used to store files. I also read that an alternative is to store the file on disk and include a pointer to its location in the database (via a varchar column).
But I'm a little confused, because I've read that blob fields are not stored in-row and require a separate look-up to retrieve their contents. So is that any different from storing a pointer to a file on the file system?
I read that the data type can be used to store files.
According to the MySQL manual page on BLOB, "A BLOB is a binary large object that can hold a variable amount of data."
Since it is a data type designed for binary data, it is commonly used to store files in binary form; storing image files is a very common use in web applications.
For a web application this means you first read the file into its binary representation and store that, and every time you need the file you reverse the process to get it back into its original form.
Besides that, storing a large amount of data in your DB MAY slow it down, especially on systems that are not dedicated solely to hosting a database.
I also read that an alternative is to store the file on disk and include a pointer to its location in the database
Bearing in mind all the considerations above, a common practice for web applications is to store files somewhere other than MySQL and simply store their paths in the database. This approach MAY speed up your database when dealing with large amounts of data.
But I'm a little confused because I've read that blob fields are not stored in-row and require a separate look-up to retrieve its contents.
That actually depends on which storage engine you are using, since each engine treats and stores data differently. For InnoDB, which is suited for relational workloads, you may want to read the MySQL Performance Blog article on how BLOBs are stored in MySQL.
In short, from MySQL 5 onward the BLOB is stored as follows:
Innodb stores either whole blob on the row page or only 20 bytes BLOB pointer giving preference to smaller columns to be stored on the page, which is reasonable as you can store more of them.
So you are probably thinking by now that the right way to go is to store files separately, but there are some advantages to using BLOBs. The first (in my opinion) is backups. I manage a small server and had to write a separate subroutine just to copy the files referenced by paths onto another storage disk (we couldn't afford a decent tape backup system). If I had designed my application to use BLOBs, a simple mysqldump would have been all I needed to back up the whole database.
The backup advantages of storing BLOBs are discussed further in this post, where the person answering had a problem similar to mine.
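For instance, with BLOBs the whole backup can boil down to a single command along these lines (mydb is a placeholder; --hex-blob keeps binary columns safe in the dump, and --single-transaction gives a consistent snapshot for InnoDB tables):

mysqldump --single-transaction --hex-blob mydb > mydb_backup.sql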
Another advantage is security and the ease of managing permissions and access. All the data inside your MySQL server is password protected, and you can easily manage which of your users has access to what.
In an application that relies on the MySQL privilege system for authentication and access, this is certainly a plus: it would be a little harder for, say, an intruder to retrieve an image (or a binary file such as a zip) from your disk, or for a user without the right privileges to access it.
So I'd say:
If you manage your own MySQL server and all the data in it, must do regular backups, intend (or might someday need) to change OS, and have decent hardware with MySQL tuned for it, go with BLOBs.
If you will not be managing MySQL yourself (as on a web host, for example) and don't intend to change OS or make backups, stick with varchar columns pointing to your files.
I hope it helped. Cheers
If you store data in a BLOB field, you are making it part of your object abstraction.
BLOB advantages:
Should you want to remove a row containing a BLOB, remove it as part of a parent/child table relationship, or drop the whole table hierarchy, the BLOB is handled automatically and has the same lifetime as any other object in the database.
Your scripts need access to nothing but the database to get everything they require. In many situations, direct file access opens a whole can of worms around bypassing access or security restrictions; for example, with file access they may have to mount the filesystems that contain the actual files. With BLOBs in the database, you only have to be able to connect to the database, no matter where you are.
If you store data in a file and the file is replaced, removed, or becomes inaccessible, your database will never know - in effect, you cannot guarantee integrity. It is also difficult to reliably support multiple versions when using files, and if you use and depend on transactions, it becomes almost impossible.
File advantages:
Some databases handle BLOBs rather poorly. For example, while the official BLOB limit in MySQL is 4GB, in reality it is only 1MB in the default configuration. You can increase this to 16-32MB by tweaking both client and server configuration to raise the MySQL command buffer (max_allowed_packet), but this has a lot of other implications for performance and security.
Even if a database does not have such odd size limits, it will always have some overhead for storing a BLOB compared to a plain file. Also, if the BLOB is large, some databases do not provide an interface to access it piece by piece or stream it, which can be a big impediment for your workflow.
In the end, it is up to you. I typically try to keep data in BLOBs unless this creates unreasonable performance problems.
Yes, MySQL BLOBs that don't fit within the same page as their row get stored on overflow pages. Note that some BLOBs are small enough that they are stored with the rest of the row, like any other column. Overflow pages are not adjacent to the page the row is stored on, so reading them may require extra I/O.
On the other hand, just like with any other page type, blob pages can occupy memory in the InnoDB buffer pool, so reading the blobs subsequently is very fast even if they are on separate pages. Files can be cached by the operating system, but typically they're read from disk.
Here are a few other factors that may affect your decision:
Blobs are stored logically with a row. This means if you DELETE the row, the associated blob is deleted automatically. But if you store the blob outside the database, you end up with orphaned blob files after you delete rows from the database. You have to do manual steps to find and delete these files.
Blobs stored in the row also follow transaction semantics. For instance, a new blob or an updated blob is invisible to other transactions until you commit. You can also roll back a change. Storing blobs in files outside the database makes this a lot harder.
When you back up a database containing blobs, the database is a lot bigger of course, but when you backup, you get all the data and associated blobs in one step. If you store blobs externally, you have to back up the database and also back up the filesystem where you store blob files. If you need to ensure that the data and blobs are captured from one instant in time, you pretty much need to use some kind of filesystem snapshots.
If you use replication, storing blobs in the database is the only way to ensure they are copied to the replica automatically.
Filesystem access will be faster than going through the database. BLOB columns also have some disadvantages for indexing, sorting, etc., which you could still do with a filename column if you wished to in the future.
The database can also grow quickly with large BLOBs, and then tasks like backups become slower. I would go with a file location in the database and the physical storage on the file system.
A better approach is to store your files in a filesystem folder and point to their paths through a varchar field in the database. One of the drawbacks of saving files in the database is that it can slow it down and reduce its performance.

Stream a file to MySQL in C

I've been searching all over the place for a way to stream a file into MySQL using C, and I can't find anything. This is pretty easy to do in C++, C#, and many other languages, but I can't find anything for straight C.
Basically, I have a file, and I want to read that file into a TEXT or BLOB column in my MySQL database. This can be achieved pretty easily by looping through the file and using successive CONCAT() calls to append the data to the column. However, I don't think that is an elegant solution, and it is probably very error prone.
I've looked into prepared statements using mysql_stmt_init() and all the binds, etc., but they don't seem to accept a FILE pointer for reading the data into the database.
It is important to note I am working with very large files that cannot be stored in RAM, so reading the entire file into a temporary variable is out of the question.
Simply put: how can I read a file from disk into a MySQL database using C? Keep in mind that there needs to be some kind of buffer (i.e., something like BUFSIZ, given the size of the files). Has anyone achieved this? Is it possible? I'm looking for a solution that works with both text and binary files.
Can you use LOAD DATA INFILE in a call to mysql_query()?
char statement[STMT_SIZE];
/* note: identifiers such as table names are quoted with backticks, not single quotes */
snprintf(statement, STMT_SIZE,
         "LOAD DATA INFILE '%s' INTO TABLE `%s`",
         filename, tablename);
mysql_query(conn, statement);
See
http://dev.mysql.com/doc/refman/5.6/en/load-data.html and http://dev.mysql.com/doc/refman/5.6/en/mysql-query.html for the corresponding pages in the MySQL docs.
You can use a loop to read through the file, but instead of using a function like fgets() that reads one line at a time, use a lower-level function like read() or fread() that will fill an arbitrary-sized buffer at a time:
allocate large buffer
open file
while NOT end of file
fill buffer
CONCAT to MySQL
close file
release buffer
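To make that loop concrete, here is a minimal C sketch of the fread()/CONCAT() approach. It assumes an open MYSQL *conn, a table docs with an id column and a LONGBLOB column body, and a row already inserted for doc_id; those names are made up for the example and error handling is omitted:

#include <stdio.h>
#include <mysql.h>

void append_file(MYSQL *conn, int doc_id, const char *path)
{
    FILE *fp = fopen(path, "rb");
    static char buf[BUFSIZ];
    static char esc[2 * BUFSIZ + 1];      /* escaping can double the size */
    static char query[2 * BUFSIZ + 128];
    size_t n;

    while ((n = fread(buf, 1, sizeof buf, fp)) > 0) {
        /* escape the raw bytes so they can be embedded in the SQL text */
        mysql_real_escape_string(conn, esc, buf, (unsigned long)n);
        snprintf(query, sizeof query,
                 "UPDATE docs SET body = CONCAT(IFNULL(body, ''), '%s') WHERE id = %d",
                 esc, doc_id);
        mysql_query(conn, query);
    }
    fclose(fp);
}

Each UPDATE makes the server locate the row and append to it, so this trades server CPU and round trips for a small, predictable client-side memory footprint.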
I don't like answering my own questions, but I feel the need in case someone else is looking for a solution to this down the road.
Unless I'm missing something, my research and testing have shown that I have three general options:
Decent Solution: use a LOAD DATA INFILE statement to send the file
pros: only one statement will ever be needed. Unlike loading the entire file into memory, you can tune the performance of LOAD DATA on both the client and the server to use a given buffer size, and you can make that buffer much smaller, which will give you "better" buffer control without making numerous calls
cons: First of all, the file absolutely MUST be in a given format, which can be difficult to do with binary blob files. Also, this takes a fair amount of work to set up, and requires a lot of tuning. By default, the client will try to load the entire file into memory, and use swap-space for the amount of the file that does not fit into memory. It's very easy to get terrible performance here, and every time you wish to make a change you have to restart the mysql server.
Decent Solution: Have a buffer (eg, char buf[BUFSIZ]), and make numerous queries with CONCAT() calls to update the content
pros: uses the least amount of memory, and gives the program better control over how much memory is being used
cons: takes up A LOT of processing time because you are making numerous mysql calls, and the server has to find the given row, and then append a string to it (which takes time, even with caching)
Worst Solution: Try to load the entire file into memory (or as much as possible), and make only one INSERT or UPDATE call to mysql
pros: limits the amount of processing performance needed on the client, as only a minimum number of calls (preferably one) will need to be buffered and executed.
cons: takes up a TON of memory. If you have numerous clients making these large calls simultaneously, the server will run out of memory quickly, and any performance gains will turn to losses very quickly.
In a perfect world, MySQL would implement a feature that allowed buffered streaming of queries, something akin to buffering a video: you open a MySQL connection, open a 'query connection' within it, stream the data in buffered chunks, and then close the 'query connection'.
However, this is NOT a perfect world, and there is no such thing in MySQL. That leaves us with the three options shown above. I decided to stick with the second, making numerous CONCAT() calls, because my current server has plenty of processing time to spare and I'm very limited on memory in the clients. For my particular situation, trying to beat my head against tuning LOAD DATA INFILE doesn't make sense. Every application, however, will have to analyze its own problem.
I'll stress none of these are "perfect" for me, but you can only do the best with what you have.
Points to Adam Liss for giving the LOAD DATA INFILE direction.

Upload large files to BLOB

I'm working on saving big files (~200 MB) directly into the DB.
I'm having an issue with that.
It is caused by a huge increase in RAM usage (about 3 GB of RAM and 3 GB of swap) at the stage when the file is saved to the DB:
#job.pdf = params[:job][:pdf].read
After this is completed there is still some RAM and swap in use.
Is there some way to optimize that?
P.S. The project is on Rails 3.0.3, uses MySQL, and runs on Mongrel.
In MySQL, to be able to save or read BLOB fields larger than about 1MB, you have to increase the server-side parameter max_allowed_packet beyond its default. In practice, you can't go much further than 16-32MB for this parameter. The price for this increase is that every new DB client will consume at least that much memory, and in general server performance will suffer.
In other words, MySQL does not really support handling BLOB fields larger than about 1MB (if you can't or don't want to fiddle with server configuration) or around 16MB (even if you do).
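For reference, raising the limit is usually done in my.cnf (the 64M value is only an example):

[mysqld]
max_allowed_packet = 64M

or at runtime, affecting new connections only:

SET GLOBAL max_allowed_packet = 64 * 1024 * 1024;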
This can be a philosophical question - is it a good idea to keep big blobs in the database at all? I think for many tasks (though not all) it is a great idea, and because MySQL is so bad at this (and for a host of other reasons), I simply avoid using it as my SQL server solution.
Instead, I use PostgreSQL, which supports BLOBs (actually, BYTEA) up to its advertised limit of 1GB without any tweaks on client or server. On top of that, it transparently compresses large values with an LZ-family algorithm - slightly worse than gzip, but still much better than no compression at all.

Hibernate - save large files

I need to save large files (up to 5Mb) to DB (MySql) via Hibernate.
Changing the max_allowed_packet parameter does not look like a good idea to me.
Is there a way, or a Hibernate technique, to do this? For example, automatically breaking the data into small portions and inserting them one by one in a single transaction.
I believe this should be possible.
Since the limit is enforced on the server side, you can't work around it from the client, so you need to increase max_allowed_packet to at least 5MB.
The maximum is 1GB according to the manual, so 5MB shouldn't be too large.

Database for a large number of 1kB data chunks (MySQL?)

I have a very large dataset, with each item roughly 1kB in size. The data needs to be queried rapidly by many applications distributed over a network. The dataset has more than a million items (and needs to grow to 500 million+ 1kB data chunks).
What would be the best method of storing this dataset (it needs to allow adding more items and reading them rapidly, but never modifying data once added)? Would a MySQL DB using the binary BLOB format be appropriate?
Or should each of these be stored as files on a file system?
edit: the number is 1 million items now, but needs to be able to scale to well over 500 million items easily.
Since there is no need to index anything inside the object, I would have to say a filesystem is probably your best bet, not a relational database. Since there is only a unique ID and a blob, there really isn't any structure here, so there is no value in putting it in a database.
You could use a web server to provide access to the repository, and then a caching solution like nginx with memcached to keep it all in memory and scale out using load balancing.
And if you run into further performance issues, you can remove the filesystem and roll your own, as Facebook did with their photo system. This can reduce unnecessary I/O operations, such as pulling unneeded metadata (like security information) from the file system.
If you need to retrieve the saved data, then storing it in files is certainly not a good idea.
MySQL is a good choice, but make sure you have the right indexes set.
Regarding binary BLOBs: it depends on what you plan to store. Give us more details.
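If you do go the MySQL route, a minimal schema sketch for the "unique ID plus ~1kB chunk" case might look like this (table and column names are made up for the example; VARBINARY(1024) keeps items in-row as long as they really stay at or under 1kB, otherwise use a BLOB type):

CREATE TABLE chunks (
    id   BIGINT UNSIGNED NOT NULL PRIMARY KEY,
    data VARBINARY(1024) NOT NULL
) ENGINE=InnoDB;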
That's one GB of data. What are you going to use the database for?
That's small enough to just be a file; read it into RAM when starting up.
Scaling to 500 million is easy - that just takes some more machines.
Depending on the precise application characteristics, you might be able to normalize or compress the data in RAM.
You might be able to keep things on disk and use a database, but that seriously limits your scalability in terms of simultaneous access. You get about 50 random accesses per second from a disk, so just count how many disks you need.