Hibernate - save large files - mysql

I need to save large files (up to 5MB) to the DB (MySQL) via Hibernate.
Changing the max_allowed_packet parameter doesn't look like a good idea to me.
Is there a way, or a Hibernate technique, to do this? For example, automatically breaking the data into small portions and inserting them one by one in a single transaction.
I believe this should be possible.

Since the limit is enforced on the server side, you can't work around it on the client side, so you need to increase max_allowed_packet to at least 5MB.
The maximum is 1GB according to the manual, so 5MB shouldn't be too large.
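For reference, a minimal JPA/Hibernate mapping for such a file could look like the sketch below. The entity and column names are made up for illustration; the point is that Hibernate sends the whole byte[] in a single INSERT, so the server's max_allowed_packet still has to be at least as large as the file.

```java
import javax.persistence.*;

// Hypothetical entity for illustration: the file is mapped to a single BLOB column.
// Hibernate does not split the value into pieces; the whole byte[] travels in one
// packet, so max_allowed_packet must be >= the largest file you plan to store.
@Entity
@Table(name = "stored_file")
public class StoredFile {

    @Id
    @GeneratedValue(strategy = GenerationType.IDENTITY)
    private Long id;

    private String name;

    @Lob
    @Column(name = "content", columnDefinition = "MEDIUMBLOB") // MEDIUMBLOB holds up to 16MB
    private byte[] content;

    // getters/setters omitted for brevity
}
```

In other words, there is no built-in Hibernate mechanism that chunks a single BLOB for you; if you want the "small portions in one transaction" approach, you have to model the chunks yourself (see the chunking sketch further down).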

Related

MySQL, Saving large blobs

I'm testing an application written in Qt that deals with PDFs saved in a database. I was having trouble saving anything larger than about 1MB; the application would crash. After reading on Google I ended up changing MAX_ALLOWED_PACKET, which let me save blobs.
I plotted several uploads of PDFs of different sizes and got a rate of about 200KB/sec when saving files. Then came my surprise: checking the database, I realized that anything over around 5MB would not be stored. There is no error, and the handshake between the application and MySQL seems to go fine, as I don't get any errors.
I have some experience with MySQL and Oracle, but I have never dealt with blobs.
I read in a post somewhere that I should try to change the value of innodb_log_file_size (I tried 2000000000), but MySQL tells me it is a read-only variable. Could somebody help me fix this problem? I'm running MySQL on Ubuntu.
It's not surprising that you hit a limit at around 5MB, because the default InnoDB log file size is 48MB (50331648 bytes).
Your InnoDB log file size must be at least 10x the size of the largest blob you try to save. In other words, you can save a blob only if it's no larger than 1/10th of the log file size. This started being enforced in MySQL 5.6; before that it was recommended in the manual, but not enforced.
You can change the log file size, but it requires restarting the MySQL Server. The steps are documented here: https://dev.mysql.com/doc/refman/5.7/en/innodb-data-log-reconfiguration.html
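If you want to see what the current setting implies before reconfiguring, a quick JDBC check (sketch below; the connection URL and credentials are placeholders) can read the variable and print the approximate largest blob the 1/10th rule allows.

```java
import java.sql.*;

// Sketch: read innodb_log_file_size and estimate the largest blob the
// "blob must be < 10% of the redo log" rule will accept.
// Connection URL and credentials are placeholders.
public class RedoLogCheck {
    public static void main(String[] args) throws SQLException {
        try (Connection c = DriverManager.getConnection(
                "jdbc:mysql://localhost:3306/test", "user", "password");
             Statement st = c.createStatement();
             ResultSet rs = st.executeQuery(
                 "SHOW VARIABLES LIKE 'innodb_log_file_size'")) {
            if (rs.next()) {
                long logFileSize = Long.parseLong(rs.getString("Value"));
                System.out.printf("innodb_log_file_size = %d bytes%n", logFileSize);
                System.out.printf("largest safe blob ~ %d bytes%n", logFileSize / 10);
            }
        }
    }
}
```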
P.S. As for the comments about storing images in the database vs. as files on disk, this is a long debate. Some people will make unequivocal statements that it's bad to store images in the database, but there are pros and cons on both sides of the argument. See my answer to Should I use MySQL blob field type?
Bill has the immediate answer. But I suspect you will keep getting bigger and bigger documents and will hit one limit after another.
Meanwhile, unless you have a very recent MySQL version, changing innodb_log_file_size is a pain.
If you ever get to 1GB, you will hit the limit on max_allowed_packet. Even if you got past that, you would hit another hard limit: 4GB, the maximum size of LONGBLOB or LONGTEXT.
I suggest you bite the bullet and go with one of these:
Plan A: Put the documents in the file system, or
Plan B: Chunk the documents into pieces, storing them as multiple BLOB rows (a sketch follows below). This avoids all the limits, even the 4GB one, but the input and output code will be messy.
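If Plan B is the route you take, the sketch below shows one way to do the chunking with plain JDBC. The document_chunk table layout and the 1MB chunk size are assumptions for illustration, not anything MySQL or Hibernate prescribes.

```java
import java.sql.*;
import java.util.Arrays;

// Sketch of Plan B: split a document into fixed-size pieces and store them as
// multiple rows in one transaction. The table layout is hypothetical:
//   CREATE TABLE document_chunk (
//     doc_id   BIGINT NOT NULL,
//     chunk_no INT    NOT NULL,
//     data     MEDIUMBLOB NOT NULL,
//     PRIMARY KEY (doc_id, chunk_no)
//   ) ENGINE=InnoDB;
public class ChunkedBlobStore {

    private static final int CHUNK_SIZE = 1 << 20; // 1MB, safely under max_allowed_packet

    public static void save(Connection conn, long docId, byte[] document) throws SQLException {
        conn.setAutoCommit(false);
        try (PreparedStatement ps = conn.prepareStatement(
                "INSERT INTO document_chunk (doc_id, chunk_no, data) VALUES (?, ?, ?)")) {
            int chunkNo = 0;
            for (int offset = 0; offset < document.length; offset += CHUNK_SIZE) {
                int end = Math.min(offset + CHUNK_SIZE, document.length);
                ps.setLong(1, docId);
                ps.setInt(2, chunkNo++);
                ps.setBytes(3, Arrays.copyOfRange(document, offset, end));
                ps.executeUpdate();
            }
            conn.commit();
        } catch (SQLException e) {
            conn.rollback();
            throw e;
        }
    }
}
```

Reading the document back is the mirror image: select the chunks ordered by chunk_no and concatenate them.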

Compaction while insertion Mysql

I have a use case where I store JSON payloads in a MySQL column. At my scale the data is growing very quickly, because my payloads are big, mostly in the KB range.
I am trying to find the best way to do some compaction while inserting. MySQL provides AES_ENCRYPT.
My question is:
does this impact performance at large scale? Is there any other way possible?
I am currently using the InnoDB engine.
Encryption does not reduce the size of the data and can even add a little: AES padding adds between 1 and 16 bytes. Encryption also adds the time to encrypt/decrypt; depending on the hardware, this can be anywhere from minimal to a rather large hit.
There are a couple of possible solutions:
Compress the data with a method such as zip, compress, or one of their relatives (see the sketch after this list). Depending on the data, this can produce anywhere from little to a substantial reduction in size; with the right data, compression can give a ~90% reduction. It adds CPU overhead, but in many cases it is faster overall because less data is read from disk.
Save large data in files and put the filename in the database. This is a fairly standard way to handle large data with databases.
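To illustrate the first option, the helper below compresses the payload on the client with the JDK's own java.util.zip before it is written to a BLOB column. The class and method names are made up, and the actual saving depends entirely on how repetitive the JSON is.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Sketch: gzip a JSON payload on the client before writing it to a BLOB column,
// and gunzip it after reading it back. Uses only the JDK. Tiny payloads may even
// grow slightly because of the ~20-byte gzip header; KB-sized JSON usually shrinks a lot.
public class PayloadCompressor {

    public static byte[] gzip(String json) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
            gz.write(json.getBytes(StandardCharsets.UTF_8));
        }
        return out.toByteArray();
    }

    public static String gunzip(byte[] compressed) throws IOException {
        try (GZIPInputStream gz = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            return new String(gz.readAllBytes(), StandardCharsets.UTF_8);
        }
    }
}
```

Compressing on the client also means only the compressed bytes cross the network, which is a nice side benefit over server-side COMPRESS().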
There are various encryption and compression functions you can use; they are listed at this link.
The main ones are:
1) AES_ENCRYPT()
2) DES_ENCRYPT()
3) ENCRYPT()
4) COMPRESS()
Some of them, such as DES_ENCRYPT(), SHA1() and MD5(), are either less secure or simply too old (DES in particular). Note that only COMPRESS() actually reduces the size of the data; the others are encryption or hashing functions.
I have not seen much information about exactly how the data gets compressed, so I suggest you test AES_ENCRYPT(), ENCRYPT() and COMPRESS() yourself and check the latency and response time of your server to figure it out.
Otherwise, as far as I know, those functions are mostly used to secure communication between servers and end-user applications; I am not sure that compressing this data will gain you a lot of storage on your server.
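If you do run that comparison, one quick way is to let MySQL report the sizes itself. The sketch below is only a rough test harness: the connection details and the sample payload are placeholders.

```java
import java.sql.*;

// Sketch: ask MySQL how much COMPRESS() shrinks a sample payload.
// Connection details and the sample payload are placeholders.
public class CompressTest {
    public static void main(String[] args) throws SQLException {
        String payload = "{\"example\":true,\"items\":[1,2,3]}".repeat(100); // placeholder payload
        try (Connection c = DriverManager.getConnection(
                "jdbc:mysql://localhost:3306/test", "user", "password");
             PreparedStatement ps = c.prepareStatement(
                 "SELECT LENGTH(?) AS raw_len, LENGTH(COMPRESS(?)) AS compressed_len")) {
            ps.setString(1, payload);
            ps.setString(2, payload);
            try (ResultSet rs = ps.executeQuery()) {
                if (rs.next()) {
                    System.out.printf("raw=%d bytes, compressed=%d bytes%n",
                            rs.getLong("raw_len"), rs.getLong("compressed_len"));
                }
            }
        }
    }
}
```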
But regarding the payload issue you describe, it really depends on what you want to do and what kind of organisation you are in (enterprise, small business, personal project). If you are building a small business or personal project, MySQL with InnoDB is fine; for big projects, such as enterprise ones, I would suggest looking at Oracle or SQL Server, which can handle much larger quantities of data (not free), or PostgreSQL, which is free.
Cheers, Andy

Which would be more efficient, having each user create a database connection, or caching?

I'm not sure if caching would be the correct term for this, but my objective is to build a website that displays data from my database.
My problem: there is a high probability of a lot of traffic, and all the data is contained in the database.
My hypothesized solution: would it be faster to create a separate program (in Java, for example) that connects to the database every couple of seconds and updates the HTML files (where the data is displayed) with the new data? (This would also improve security, as users would never connect to the database directly.) Or should I just have each user create a connection to MySQL (using PHP) and fetch the data?
If you've had any experience with a similar situation, please share. I'm sorry if I didn't word the title correctly; this is a pretty specific question and I'm not even sure I explained myself clearly.
Here are some thoughts for you to think about.
First, I do not recommend generating files; trust MySQL instead, but work on configuring your environment to support your traffic and application.
You should understand your data a little better: how often does the data in your tables change? What kind of queries are you running against it? Are those queries optimized?
Make sure your tables are optimized and indexed correctly, and that all your queries run fast (nothing causing long row locks).
If your tables are not updated very often, consider using the MySQL query cache, as it will reduce your IO and increase query speed. (But beware: if your tables are updated all the time, it will badly hurt your server's performance.)
Having the query cache set to "ON" is, in my experience, almost always a bad idea unless the data in your tables never changes. With it set to "ON", MySQL caches every query; then, as soon as data in a table changes, MySQL has to clear the affected cache entries, and it ends up working harder at invalidation, which gives you bad performance. I like to keep it set to "DEMAND";
from there you can control which queries should be cached and which should not, using SQL_CACHE and SQL_NO_CACHE.
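For MySQL 5.x (the query cache was removed entirely in MySQL 8.0), the on-demand setup looks roughly like the sketch below; the table names are placeholders.

```java
import java.sql.*;

// Sketch for MySQL 5.x: with query_cache_type = DEMAND, nothing is cached
// unless a query explicitly asks for it with SQL_CACHE.
// (The query cache no longer exists in MySQL 8.0.) Table names are placeholders.
public class OnDemandCacheExample {
    public static void main(String[] args) throws SQLException {
        try (Connection c = DriverManager.getConnection(
                "jdbc:mysql://localhost:3306/test", "user", "password");
             Statement st = c.createStatement()) {

            // Usually set permanently in my.cnf as query_cache_type = 2 (DEMAND)
            st.execute("SET GLOBAL query_cache_type = DEMAND");

            // Cached: rarely-changing lookup data
            st.executeQuery("SELECT SQL_CACHE id, name FROM countries").close();

            // Not cached: frequently-updated table
            st.executeQuery("SELECT SQL_NO_CACHE id, total FROM orders WHERE id = 42").close();
        }
    }
}
```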
Another thing you want to review is your server configuration and specs.
How much physical RAM does your server have?
What type of hard drives are you using? SSDs? If not, at what speed do they rotate? Perhaps 15k RPM?
What OS are you running MySQL on?
How is the RAID set up on your hard drives? RAID 10 or RAID 50 will help you out a lot here.
Your processor speed will also make a big difference.
If you are not using MySQL 5.6.20+, you should consider upgrading, as MySQL has been improved in ways that will help you even more.
Is your innodb_buffer_pool_size set to about 75% of your total physical RAM? Are you using InnoDB tables?
You can also use MySQL replication to increase the read sources for the data: you have multiple servers with the same data and can point half of your traffic to read from server A and the other half from server B, so the same work is handled by multiple servers.
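In its simplest form, that read/write split can live in application code. The sketch below just opens separate connections for writes (primary) and reads (replica); the host names and credentials are placeholders, and in practice a connection pool or proxy usually sits in front of this.

```java
import java.sql.*;

// Sketch: simplest possible read/write split over MySQL replication.
// Host names and credentials are placeholders; real applications usually
// put a connection pool or proxy in front of this.
public class ReplicatedDataSource {

    private static final String PRIMARY = "jdbc:mysql://db-primary:3306/app";
    private static final String REPLICA = "jdbc:mysql://db-replica-1:3306/app";

    // All writes go to the primary so replication can fan them out.
    public static Connection writeConnection() throws SQLException {
        return DriverManager.getConnection(PRIMARY, "app_user", "password");
    }

    // Reads can be served by a replica, spreading the read load.
    public static Connection readConnection() throws SQLException {
        return DriverManager.getConnection(REPLICA, "app_user", "password");
    }
}
```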
Here is one argument for you to think about: Facebook uses MySQL, handles millions of hits per second, and is up essentially all of the time. True, they have an enormous budget and a huge network, but the idea here is to trust MySQL to get the job done.

Upload large files to BLOB

I'm working on saving big files (~200MB) directly into the DB.
I have an issue with that.
It is caused by huge use of free RAM (about 3GB of RAM and 3GB of swap) at the stage when the file is saved to the DB:
@job.pdf = params[:job][:pdf].read
After this completes, some RAM and swap remain in use.
Is there some way to optimize that?
P.S. The project is on Rails 3.0.3, uses MySQL, and runs on Mongrel.
In MySQL, to be able to save or read BLOB fields larger than about 1MB, you have to increase the server-side parameter max_allowed_packet beyond its default. In practice, you can't go much further than 16-32MB for this parameter. The price for this increase is that every new DB client may consume at least that much memory, and in general server performance will suffer.
In other words, MySQL does not really support handling BLOB fields larger than 1MB (if you can't or don't want to fiddle with server configuration) to around 16MB (even if you are willing to).
Whether it is a good idea to keep big blobs in a database at all can be a philosophical question. I think for many tasks (though not all) it is a great idea, and because MySQL is so bad at it (and for a host of other reasons), I simply avoid using it as my SQL server solution.
Instead, I use PostgreSQL, which handles BLOBs (actually, BYTEA) perfectly well up to its advertised limit of 1GB without any tweaks on the client or server. On top of that, it transparently compresses them with an LZ-family algorithm: slightly worse than gzip, but still much better than no compression at all.
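For comparison, storing a large file in a PostgreSQL BYTEA column needs no server-side size tuning. The sketch below uses plain JDBC with placeholder table, path, and connection details; note the whole byte[] still sits in client memory, so for multi-hundred-MB files PostgreSQL's large-object API is the usual alternative.

```java
import java.nio.file.Files;
import java.nio.file.Paths;
import java.sql.*;

// Sketch: insert a large file into a PostgreSQL BYTEA column via JDBC.
// Table name, file path and connection details are placeholders.
// The byte[] is held entirely in client memory; for very large files,
// PostgreSQL's large-object (lo_*) API is the usual alternative.
public class ByteaUpload {
    public static void main(String[] args) throws Exception {
        byte[] pdf = Files.readAllBytes(Paths.get("/tmp/job.pdf"));
        try (Connection c = DriverManager.getConnection(
                "jdbc:postgresql://localhost:5432/app", "app_user", "password");
             PreparedStatement ps = c.prepareStatement(
                 "INSERT INTO jobs (name, pdf) VALUES (?, ?)")) {
            ps.setString(1, "job.pdf");
            ps.setBytes(2, pdf); // maps to BYTEA
            ps.executeUpdate();
        }
    }
}
```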

Database for Large number of 1kB data chunks (MySQL?)

I have a very large dataset, with each item roughly 1kB in size. The data needs to be queried rapidly by many applications distributed over a network. The dataset has more than a million items now and will grow to 500 million+ 1kB data chunks.
What would be the best way to store this dataset? It needs to allow adding more items and reading them rapidly, but already-added data is never modified. Would a MySQL DB using a binary blob column be appropriate?
Or should each of these be stored as files on a file system?
Edit: the number is 1 million items now, but it needs to be able to scale well past 500 million items easily.
Since there is no need to index anything inside the object, I would say a filesystem is probably your best bet, not a relational database. Since there's only a unique ID and a blob, there really isn't any structure here, so there's no value in putting it in a database.
You could use a web server to provide access to the repository, and then a caching solution such as nginx with memcache to keep it all in memory and scale out using load balancing.
If you run into further performance issues, you can remove the filesystem and roll your own storage, like Facebook did with their photo system. That can cut unnecessary IO operations, such as pulling unneeded metadata (for example, security information) from the file system.
If you need to retrieve the saved data, then storing it in files is certainly not a good idea.
MySQL is a good choice, but make sure you have the right indexes set.
Regarding the binary blob: it depends on what you plan to store. Give us more details.
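A minimal layout along those lines, assuming lookups are always by a single ID, could be as simple as the sketch below; the table and column names are hypothetical.

```java
import java.sql.*;

// Sketch: the simplest schema for ~1kB chunks looked up by ID.
//   CREATE TABLE chunk (
//     id   BIGINT NOT NULL PRIMARY KEY,   -- the only index needed
//     data VARBINARY(2048) NOT NULL       -- ~1kB payloads fit comfortably
//   ) ENGINE=InnoDB;
// Table and column names are hypothetical.
public class ChunkLookup {

    public static byte[] fetch(Connection conn, long id) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(
                "SELECT data FROM chunk WHERE id = ?")) {
            ps.setLong(1, id);
            try (ResultSet rs = ps.executeQuery()) {
                return rs.next() ? rs.getBytes("data") : null;
            }
        }
    }
}
```

With InnoDB, lookups by the primary key go straight to the clustered index, which is about as fast as MySQL gets for this access pattern.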
That's one GB of data right now. What are you going to use the database for?
That's definitely small enough to be just a file; read it into RAM when starting up.
Scaling to 500 million items is easy; that just takes some more machines.
Depending on the precise application characteristics, you might be able to normalize or compress the data in ram.
You might be able to keep things on disk and use a database, but that seriously limits your scalability in terms of simultaneous access. You get roughly 50 accesses per second from a spinning disk, so just count how many disks you need (for example, 2,500 uncached reads per second would need around 50 disks).