MySQL - Base64 vs BLOB

For the sake of simplicity, suppose I'm developing a mobile app like Instagram. Users can download images from the server, and upload images of their own. Currently the server stores all images (in reality, just small thumbnails) in a MySQL database as BLOBs. It seems that the most common way to transfer images is by using Base64 encoding, which leaves me with two options:
Server stores all images as BLOBs. To upload an image, client encodes it into Base64 string, then sends it to the server. Server decodes image BACK into binary format and stores it as BLOB in the database. When client requests an image, server re-encodes the image as Base64 string and sends it to the client, who then decodes it back to binary for display.
Server stores all images as Base64 strings. To upload an image, client encodes it into Base64 string and sends it to the server. Server does no encoding or decoding, but simply stores the string in the database. When client requests an image, the Base64 string is returned to the client, who then decodes it for display.
Clearly, option #1 requires significantly more processing on the server, as images must be encoded/decoded with every single request. This makes me lean toward option #2, but some research has suggested that storing Base64 string in MySQL is much less efficient than storing the image directly as BLOB, and is generally discouraged.
I'm certainly not the first person to encounter this situation, so does anybody have suggestions on the best way to make this work?

JSON assumes utf8, hence is incompatible with images unless they are encoded in some way.
Base64 is almost exactly 8/6 times as bulky as binary (BLOB). One could argue that it is easily affordable. 3000 bytes becomes about 4000 bytes.
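That 8/6 ratio is easy to verify (a quick Python sketch; the 3000-byte figure is just an illustration):

```python
import base64

blob = b"\x00\x01\x02" * 1000      # 3000 bytes of arbitrary binary data
encoded = base64.b64encode(blob)   # every 3 input bytes become 4 output bytes

print(len(blob), len(encoded))     # 3000 4000
```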
Everyone should be able to accept arbitrary 8-bit codes, but not everybody does. Base-64 may be the simplest and overall best compromise for not having to deal with 8-bit data.
Since these are "small", I would store them in a table, not a file. I would, however, store them in a separate table and JOIN by an appropriate id when you need them. This allows queries that don't need the image to run faster because they are not stepping over the BLOBs.
Technically, TEXT CHARACTER SET ascii COLLATE ascii_bin would do, but BLOB makes it clearer that there is not really any usable text in the column.
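The separate-table layout suggested above can be sketched like this (Python with SQLite standing in for MySQL; table and column names are made up for illustration):

```python
import sqlite3

# Thumbnails live in their own table, so queries that only touch `users`
# never have to step over the BLOB pages.
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE users  (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE thumbs (user_id INTEGER PRIMARY KEY REFERENCES users(id),
                         img BLOB);
""")
db.execute("INSERT INTO users VALUES (1, 'alice')")
db.execute("INSERT INTO thumbs VALUES (1, ?)", (b"\x89PNG\r\n...",))

# JOIN by id only when the image is actually needed
name, img = db.execute(
    "SELECT u.name, t.img FROM users u JOIN thumbs t ON t.user_id = u.id"
).fetchone()
```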

Why would you base64-encode the images on the wire? I think you're starting from a wrong assumption.

I don't see why the DB server shouldn't always keep binary data in its native form. Thus, use a BLOB.
(But even if you did store the data as a Base64 string, there would be no need to concern yourself with encoding/decoding performance, because the I/O impact will be more significant.)
I don't get why the client should send the data in base64 though. Why not just "stream" it using a simple HTTP call?

Related

Delphi Indy HTTPServer: Write text in Chunks possible?

Is it possible to send big amounts of text (csv database export) through an Indy HTTP Server in chunks to a requesting client to avoid hitting memory restrictions?
I am implementing a REST interface in an application server written in Delphi 10.4.2, as I cannot expose the database connection for several reasons that are not negotiable.
The data will be consumed by statisticians using R.
As the amount of data can grow up to a GB, I have a bad feeling about filling a string and writing it to the connection.
@MichaSchumann,
Modern web browsers often receive gzip-compressed data from a web server such as Apache 2.4 (for the Windows 10 version, see XAMPP). The browser unpacks the data and displays the contents based on the file and its content type (which can be text, such as HTML, or binary, such as pictures).
Indy 9 sends its packets one-to-one; that is, the data is neither secured nor compressed before sending.
So you could gzip your CSV file first. CSV files consist mostly of ASCII text, which packs quickly into a binary file that is dramatically smaller, and transmitting the data stream is correspondingly faster (think about how much time a 1 GiB file needs otherwise!).
Note: you should send a header with each chunk.
In this header, store the relevant information about the size of the transmitted chunk and its position.
That way you can implement your own FTP-like server that remembers its "position" state.
This is useful when the connection breaks.
Old FTP servers have the drawback that after a broken connection they must restart at the position of the first chunk. This means you always have to send the file from 0 .. size.
Feel free to ask further questions; I am open, and you are welcome.
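The chunk-header scheme described above can be sketched in Python (the header fields and chunk size here are my own illustration, not an Indy or HTTP convention):

```python
import gzip, json

CHUNK = 64 * 1024

def chunked_stream(data: bytes, start: int = 0):
    """Yield (header, payload) pairs. `start` lets a broken transfer
    resume from a known position instead of restarting at byte 0."""
    packed = gzip.compress(data)          # CSV text compresses very well
    for pos in range(start, len(packed), CHUNK):
        payload = packed[pos:pos + CHUNK]
        header = {"pos": pos, "size": len(payload), "total": len(packed)}
        yield json.dumps(header).encode("ascii"), payload
```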

Is it worth it memory-wise to store images as data URLs in a database instead of as static images?

I have a website in which I let users upload images. I convert these images to data URLs through HTML5 and store them as text fields in a database: http://en.wikipedia.org/wiki/Data_URI_scheme
I figured this would reduce the time for page load since I need to make fewer HTTP requests even though the main HTML page would be much longer.
I'm suspicious, though, that these images stored as data URLs take up more space in the database than their static counterparts on disk. I noticed that a data URL for an image had 250K characters (so I assume it is stored in 250 KB), but when I right-clicked and saved the same image to disk, it was only 180K.
Do data URLs significantly inflate the memory required to store an image?
Yes. Images stored as data URIs go through base64 encoding, which inflates their size by approximately 30%.
You're approaching this wrong.
Store the files themselves on the disk, separated by folders. I like naming the images as hashes of their contents. That way you can avoid duplicates easily if you want.
Store data on those images on the database. Include the path, upload time, uploading user, position of Saturn's moons, and whatever other data you may want on the database, except for the image itself, which should be stored on the file system.
If you wish, you can generate the Data URI in real time (like I said, it's a simple base64_encode).
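The two ideas above, hash-based filenames and on-the-fly data URIs, can be sketched in Python (function names, folder layout, and the `.img` extension are made up for illustration):

```python
import base64, hashlib, os

def store_image(data: bytes, root: str = "uploads") -> str:
    """Name the file after a hash of its contents, so identical uploads
    collapse onto the same path (free de-duplication)."""
    digest = hashlib.sha256(data).hexdigest()
    path = os.path.join(root, digest[:2], digest + ".img")
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "wb") as f:
        f.write(data)
    return path

def to_data_uri(data: bytes, mime: str = "image/png") -> str:
    # the "simple base64_encode" step, generated in real time when needed
    return "data:%s;base64,%s" % (mime, base64.b64encode(data).decode("ascii"))
```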
It can also take more than 250KB of space if you're using a multi-byte character encoding for your DB column. If they are 250K characters, it may be using between 500K and 1M of disk space.
Not only do they take up more space, but they cannot be cached by the user's browser independently of the page they are on. This can reduce performance significantly.
It's definitely not worth it memory-wise. As said in other answers:
base64-encoded data grows by about 30% (see Rikudo's answer)
I would not recommend storing image data in a database at all, but there is controversy, see here
Serving base64-encoded data URLs in your main HTML may be OK for small images, but since we are talking about user-uploaded images, I doubt they are that lightweight (no more than 10K, it depends a bit). Also, Daniel's note about caching weighs strongly. But in fact, that's another question already.

blob vs text in forum

I'm writing a forum website, but I am having a problem designing the database to hold the posts. From what I have read, phpBB 3 stores user-posted data as BLOBs, while some forums store it as text. Does one have an advantage over the other? What about maximum characters if one were to store the data as text? I have never retrieved BLOB data from a database and parsed it into text (or whatever data it is meant to be, for that matter), but I think learning how to use it would be interesting.
The text data for the posts will be UTF-8 encoded. I'm using a MySQL database.
Also, how would you allow uploading images for the forum, and should the images be stored as part of the database (in a BLOB, for example), or as individual files?
Any recommendations/suggestions are welcome.
TEXT and BLOB are nearly identical; the main difference is that you have to take care of the encoding yourself when using BLOB. On the other hand, with a TEXT field you are forced to use the field's encoding...
Storing files in the database has some advantages and disadvantages:
Advantages:
Backups and switches to another server are easier to achieve
No need to add another method besides the database to access the files
Disadvantages:
Accessing an image means retrieving its bytes and perhaps storing them as temporary local files, which may be more complicated
File systems are simply faster due to lower overhead
The decision is yours. Most opinions I have heard so far are to store the pictures as files, but those are opinions; try to base your decision on your project's needs.

Storing image in database directly or as base64 data?

The common method to store images in a database is to convert the image to base64 data before storing the data. This process will increase the size by 33%. Alternatively it is possible to directly store the image as a BLOB; for example:
$image = new Imagick("image.jpg");
$data = $image->getImageBlob();
$data = $mysqli->real_escape_string($data);
$mysqli->query("INSERT INTO images (data) VALUES ('$data')");
and then display the image with
echo '<img src="data:image/jpeg;base64,' . base64_encode($data) . '" />';
With the latter method, we save 1/3 storage space. Why is it more common to store images as base64 in MySQL databases?
UPDATE: There are many debates about advantages and disadvantages of storing images in databases, and most people believe it is not a practical approach. Anyway, here I assume we store image in database, and discussing the best method to do so.
I contend that images (files) are NOT usually stored in a database base64 encoded. Instead, they are stored in their raw binary form in a binary column, blob column, or file.
Base64 is only used as a transport mechanism, not for storage. For example, you can embed a base64 encoded image into an XML document or an email message.
Base64 is also stream friendly. You can encode and decode on the fly (without knowing the total size of the data).
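That stream-friendly property can be sketched in Python: buffer to multiples of 3 bytes so no chunk boundary ever introduces '=' padding mid-stream, and the total size never needs to be known up front.

```python
import base64

def b64_stream(chunks):
    """Base64-encode an iterable of byte chunks on the fly.
    Concatenating the encoded pieces yields exactly the same result
    as encoding the whole input at once."""
    buf = b""
    for chunk in chunks:
        buf += chunk
        cut = len(buf) - len(buf) % 3   # largest 3-byte-aligned prefix
        yield base64.b64encode(buf[:cut])
        buf = buf[cut:]
    if buf:
        yield base64.b64encode(buf)     # final piece may carry the padding
```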
While base64 is fine for transport, do not store your images base64 encoded.
Base64 provides no checksum or anything of any value for storage.
Base64 encoding increases the storage requirement by 33% over a raw binary format. It also increases the amount of data that must be read from persistent storage, which is still generally the largest bottleneck in computing. It's generally faster to read less bytes and encode them on the fly. Only if your system is CPU bound instead of IO bound, and you're regularly outputting the image in base64, then consider storing in base64.
Inline images (base64 encoded images embedded in HTML) are a bottleneck themselves--you're sending 33% more data over the wire, and doing it serially (the web browser has to wait on the inline images before it can finish downloading the page HTML).
On MySQL, and perhaps similar databases, for performance reasons, you might wish to store very small images in binary format in BINARY or VARBINARY columns so that they are on the same page as the primary key, as opposed to BLOB columns, which are always stored on a separate page and sometimes force the use of temporary tables.
If you still wish to store images base64 encoded, please, whatever you do, make sure you don't store base64 encoded data in a UTF8 column then index it.
Pro base64: the encoded representation you handle is a pretty safe string. It contains neither control chars nor quotes. The latter point helps against SQL injection attempts. I wouldn't expect any problem to just add the value to a "hand coded" SQL query string.
Pro BLOB: the database manager software knows what type of data to expect and can optimize for that. If you stored base64 in a TEXT field, it might try to build an index or other data structures for it, which would be really nice and useful for "real" text data but pointless and a waste of time and space for image data. And BLOB is the smaller representation, in number of bytes.
Just to give one example of why we decided to store images in the DB rather than as files or on a CDN: storing images of signatures.
We tried a CDN, cloud storage, and files, and finally decided to store them in the DB. We are happy with that decision, as it proved us right in subsequent events when we moved, upgraded our scripts, and migrated the sites several times.
In our case, we wanted the signatures to stay with the records that belong to the author of the documents.
Storing them as files risks them going missing or being deleted by accident.
We stored them as BLOBs in binary format in MySQL, and later as base64-encoded images in a text field. The decision to change to base64 was due to the smaller resulting size for some reason, and faster loading; the BLOB was slowing down the page load for some reason.
In our case, the solution of storing signature images in the DB (whether as BLOB or base64) was driven by:
Most signature images are very small.
We don't need to index the signature images stored in the DB;
indexing is done on the primary key.
We may have to move or switch servers, and moving physical image files to different servers could leave images not found due to changed links.
It would be embarrassing to ask the authors to re-sign their signatures.
It is more secure to save them in the DB than to expose them as files, which could be downloaded if security is compromised. Storing them in the DB gives us better control over access.
With any future migrations, changes of web design, hosting, or servers, we have zero worries about reconciling signature file names against physical files; it is all in the DB!
I recommend looking at modern databases like NoSQL stores, and I also agree with user1252434's post. For instance, I am storing a few < 500 KB PNGs as base64 in my MongoDB with binary set to true, with no performance hit at all. MongoDB can also be used to store large files like 10 MB videos, and that can offer huge time-saving advantages in metadata searches for those videos; see storing large objects and files in MongoDB.

What's the most space-efficient way to store large amounts of text in MySQL?

I'm writing a webcrawler in Python that will store the HTML code of a large set of pages in a MySQL database. I'd like to make sure my methods of storage and processing are optimal before I begin processing data. I would like to:
Minimize storage space used in the database - possibly by minifying HTML code, Huffman encoding, or some other form of compression. I'd like to maintain the possibility of fulltext searching the field - I don't know if compression algorithms like Huffman encoding will allow this.
Minimize the processor usage necessary to encode and store large volumes of rows.
Does anyone have any suggestions or experience in this or a similar issue? Is Python the optimal language to be doing this in, given that it's going to require a number of HTTP requests and regular expressions plus whatever compression is optimal?
If you don't mind the HTML being opaque to MySQL, you can use the COMPRESS function to store the data and UNCOMPRESS to retrieve it. You won't be able to use the HTML contents in a WHERE clause (using, e.g., LIKE).
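For reference, MySQL's COMPRESS() output is, to my understanding, a 4-byte little-endian uncompressed length followed by a zlib stream, so it can be reproduced or unpacked client-side without touching the database (a sketch; verify against your MySQL version before relying on it):

```python
import struct, zlib

def mysql_compress(data: bytes) -> bytes:
    # COMPRESS() maps the empty string to an empty result
    if not data:
        return b""
    return struct.pack("<I", len(data)) + zlib.compress(data)

def mysql_uncompress(blob: bytes) -> bytes:
    if not blob:
        return b""
    (length,) = struct.unpack("<I", blob[:4])  # declared uncompressed size
    out = zlib.decompress(blob[4:])
    assert len(out) == length
    return out
```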
Do you actually need to store the source in the database?
Trying to run LIKE queries against the data is going to perform terribly anyway.
Store the raw data on the file system as standard files. Just don't stick them all in one folder; use hashes of the id to store them in predictable folders.
(While it is of course perfectly possible to store the text in the database, it bloats the size of your database and makes it harder to work with. Backups are (much!) bigger, changing storage engine becomes more painful, etc. Scaling your file system is usually just a case of adding another hard disk. That doesn't work so easily with a database - you start needing to shard.)
... to do any sort of searching on the data, you're looking at building an index. I only have experience with SphinxSearch, but it allows you to specify a filename in the input database.