how blobs are stored in mysql - mysql

I have read that some storage engine of mysql may store blobs "externally".
My question is:
What is the benefit of storing blobs in external space and not
in-line?
How are the externally stored blobs tied to the original
record?

Blobs are "external" in the sense that they may not be stored on same page with the rest of the columns of a given row. They are still stored in the same tablespace file with the row they are associated with.
The advantage is that blobs can be larger than a single page. InnoDB pages are of fixed size, by default they are 16KiB. Blobs may be up to 64KiB, and Mediumblob and Longblob may be up to 16MiB and 4GiB, respectively. That's obviously going to take quite a few pages.
How they are tied to the row varies by row format. In all cases, if a given Blob happens to fit in the page, it is stored in the page. That is, even though Blob allows up to 64KiB, you might typically store shorter content.
If the Blob is too long to be stored in the page, there are two ways this is handled:
In row format of redundant or compact, the first 768 bytes is stored with the row, and then there's a pointer to the page where the remainder of the Blob begins to be stored.
In row format of dynamic or compressed, the Blob is replaced in the row page with a 20-byte pointer to the page where the entire Blob begins.
External pages for the Blob are allocated one by one up to 32 pages (half of one megabyte). If the Blob is larger than that, further pages are allocated 64 at a time (1MiB).
Text/Mediumtext/Longtext and Varchar/Varbinary are also handled similarly. They can be larger than a single page, so they may need to be stored on multiple pages.

Related

Innodb + 8K length row limit

I've a Innodb table with, amongst other fields, a blob field that can contain up to ~15KB of data. Reading here and there on the web I found that my blob field can lead (when the overall fields exceed ~8000 bytes) to a split of some records into 2 parts: on one side the record itself with all fields + the leading 768 bytes of my blob, and on another side the remainder of the blob stored into multiple (1-2?) chunks stored as a linked list of memory pages.
So my question: in such cases, what is more efficient regarding the way data is cached by MySQL ? 1) Let the engine deal with the split of my data or 2) Handle the split myself in a second table, storing 1 or 2 records depending on the length of my blob data and having these records cached by MySQL ? (I plan to allocate as much memory as I can for this to occur)
Unless your system's performance is so terrible that you have to take immediate action, you are better off using the internal mechanisms for record splitting.
The people who work on MySQL and its forks (e.g. MariaDB) spend a lot of time implementing and testing optimizations. You will be much happier with simple application code; spend your development and test time on your application's distinctive logic rather than trying to work around internals issues.

Storing base64 encoded data as BLOB or TEXT datatype

We have a MySQL InnoDB table holding ~10 columns of small base64 encoded javascript files and png (<2KB size) images base64 encoded as well.
There are few inserts and a lot of reads comparatively, however the output is being cached on a Memcached instance for some minutes to avoid subsequent reads.
As it is right now we are using BLOB for those columns, but I am wondering if there is an advantage in switching to TEXT datatype in terms of performance or snapshot backing up.
My search digging indicates that BLOB and TEXT for my case are close to identical and since I do not know before-hand what type of data are actually going to be stored I went for BLOB.
Do you have any pointers on the TEXT vs BLOB debate for this specific case?
One shouldn't store Base64-encoded data in one's database...
Base64 is a coding in which arbitrary binary data is represented using only printable text characters: it was designed for situations where such binary data needs to be transferred across a protocol or medium that can handle only printable-text (e.g. SMTP/email). It increases the data size (by 33%) and adds the computational cost of encoding/decoding, so it should be avoided unless absolutely necessary.
By contrast, the whole point of BLOB columns is that they store opaque binary strings. So just go ahead and store your stuff directly into your BLOB columns without first Base64-encoding them. (That said, if MySQL has a more suitable type for the particular data being stored, you may wish to use that instead: for example, text files like JavaScript sources could benefit from being stored in TEXT columns for which MySQL natively tracks text-specific metadata—more on this below).
The (erroneous) idea that SQL databases require printable-text encodings like Base64 for handling arbitrary binary data has been perpetuated by a large number of ill-informed tutorials. This idea appears to be seated in the mistaken belief that, because SQL comprises only printable-text in other contexts, it must surely require it for binary data too (at least for data transfer, if not for data storage). This is simply not true: SQL can convey binary data in a number of ways, including plain string literals (provided that they are properly quoted and escaped like any other string); of course, the preferred way to pass data (of any type) to your database is through parameterised queries, and the data types of your parameters can just as easily be raw binary strings as anything else.
...unless it's cached for performance reasons...
The only situation in which there might be some benefit from storing Base64-encoded data is where it's usually transmitted across a protocol requiring such encoding (e.g. by email attachment) immediately after being retrieved from the database—in which case, storing the Base64-encoded representation would save from having to perform the encoding operation on the otherwise raw data upon every fetch.
However, note in this sense that the Base64-encoded storage is merely acting as a cache, much like one might store denormalised data for performance reasons.
...in which case it should be TEXT not BLOB
As alluded above: the only difference between TEXT and BLOB columns is that, for TEXT columns, MySQL additionally tracks text-specific metadata (such as character encoding and collation) for you. This additional metadata enables MySQL to convert values between storage and connection character sets (where appropriate) and perform fancy string comparison/sorting operations.
Generally speaking: if two clients working in different character sets should see the same bytes, then you want a BLOB column; if they should see the same characters then you want a TEXT column.
With Base64, those two clients must ultimately find that the data decodes to the same bytes; but they should see that the stored/encoded data has the same characters. For example, suppose one wishes to insert the Base64-encoding of 'Hello world!' (which is 'SGVsbG8gd29ybGQh'). If the inserting application is working in the UTF-8 character set, then it will send the byte sequence 0x53475673624738676432397962475168 to the database.
if that byte sequence is stored in a BLOB column and later retrieved by an application that is working in UTF-16*, the same bytes will be returned—which represent '升噳扇㡧搲㥹扇全' and not the desired Base64-encoded value; whereas
if that byte sequence is stored in a TEXT column and later retrieved by an application that is working in UTF-16, MySQL will transcode on-the-fly to return the byte sequence 0x0053004700560073006200470038006700640032003900790062004700510068—which represents the original Base64-encoded value 'SGVsbG8gd29ybGQh' as desired.
Of course, you could nevertheless use BLOB columns and track the character encoding in some other way—but that would just needlessly reinvent the wheel, with added maintenance complexity and risk of introducing unintentional errors.
* Actually MySQL doesn't support using client character sets that are not byte-compatible with ASCII (and therefore Base64 encodings will always be consistent across any combination of them), but this example nevertheless serves to illustrate the difference between BLOB and TEXT column types and thus explains why TEXT is technically correct for this purpose even though BLOB will actually work without error (at least until MySQL adds support for non-ASCII compatible client character sets).

To BLOB or not to BLOB

I am in the process of writing a web app backed up by a MySQL database where one of the tables has a potential for getting very large (order of gigabytes) with a significant proportion of table operations being writes. One of the table columns needs to store a string sequence that can be quite big. In my tests thus far it has reached a size of 289 bytes but to be on the safe side I want to design for a maximum size of 1 kb. Currently I am storing that column as a MySQL MediumBlob field in an InnoDB table.
At the same time I have been googling to establish the relative merits and demerits of BLOBs vs other forms of storage. There is a plethora of information out there, perhaps too much. What I have gathered is that InnoDB stores the first few bytes (789 if memory serves me right) of the BLOB in the table row itself and the rest elsewhere. I have also got the notion that if a row has more than one BLOB (which my table does not) per column then the "elsewhere" is a different location for each BLOB. That apart I have got the impression that accessing BLOB data is significantly slower than accessing row data (which sounds reasonable).
My question is just this - in light of my BLOB size and the large potential size of the table should I bother with a blob at all? Also, if I use some form of inrow storage instead will that not have an adverse effect on the maximum number of rows that the table will be able to accommodate?
MySQL is neat and lets me get away with pretty much everything in my development environment. But... that ain't the real world.
I'm sure you've already looked here but it's easy to overlook some of the details since there is a lot to keep in mind when it comes to InnoDB limitations.
The easy answer to one of your questions (maximum size of a table) is 64TBytes. Using variable size types to move that storage into a separate file would certainly change the upper limit on number of rows but 64TBytes is quite a lot of space so the ratio might be very small.
Having a column with a 1KByte string type that is stored inside the table seems like a viable solution since it's also very small compared to 64TBytes. Especially if you have very strict requirements for query speed.
Also, keep in mind that the InnoDB 64TByte limit might be pushed down by the the maximum file size for the OS you're using. You can always link several files together to get more space for your table but then it's starting to get a bit more messy.
if the BLOB data is more then 250kb it is not worth it. In your case i wouldn't bother myself whit BLOB'n. Read this

MySQL Blob vs. Disk for "video frames"

I have a c++ app that generates 6x relatively small image-like integer arrays per second. The data is 64x48x2-dimensional int (ie, a grid of 64x48 two-dimensional vectors, with each vector consisting of two floats). That works out to ~26kb per image. The app also generates a timestamp and some features describing the data. I want to store the timestamp and the features in a MySQL db column, per frame. I also need to store the original array as binary data, either in a file on disc or as a blob field in the database. Assume that the app will be running more or less nonstop, and that I'll come up with a way to archive data older than a certain age, so that storage does not become a problem.
What are the tradeoffs here for blobs, files-on-disc, or other methods I may not even be thinking of? I don't need to query against the binary data, but I need to query against the other metadata/features in the table (I'll definitely have an index built against timestamp), and retrieve the binary data. Does the equation change if I store multiple frames in a single file on disk, vs. one frame per file?
Yes, I've read MySQL Binary Storage using BLOB VS OS File System: large files, large quantities, large problems and To Do or Not to Do: Store Images in a Database, but I think my question differs because in this case there are going to be millions of identically-dimensioned binary files. I'm not sure how the performance hit to maintaining that many small files in a filesystem compares to storing that many files in db blob columns. Any perspective would be appreciated.
At a certain point, querying for many blobs becomes unbearably slow. I suspect that even if your identically dimensioned binary files this will be the case. Moreover you will still need some code to access and process the blobs. And this doesn't take advantage of file caching that might speed up image queries straight from the file system.
But! The link you provided did not mention object based databases, which can store the data you described in a way that you can access it extremely quickly, and possibly return it in native format. For a discussion see the link or just search google, there are many discussions:
Storing images in NoSQL stores
I would also look into HBase.
I figured since you were not sure about what to use in the first place(and there were no answers), an alternative solution might be appropriate.

MySQL: What is a page?

I can't for the life of me remember what a page is, in the context of a MySQL database. When I see something like 8KB/page, does that mean 8KB per row or ...?
Database pages are the internal basic structure to organize the data in the database files. Following, some information about the InnoDB model:
From 13.2.11.2. File Space Management:
The data files that you define in the configuration file form the InnoDB tablespace. The files are logically concatenated to form the tablespace. [...] The tablespace consists of database pages with a default size of 16KB. The pages are grouped into extents of size 1MB (64 consecutive pages). The “files” inside a tablespace are called segments in InnoDB.
And from 13.2.14. Restrictions on InnoDB Tables
The default database page size in InnoDB is 16KB. By recompiling the code, you can set it to values ranging from 8KB to 64KB.
Further, to put rows in relation to pages:
The maximum row length, except for variable-length columns (VARBINARY, VARCHAR, BLOB and TEXT), is slightly less than half of a database page. That is, the maximum row length is about 8000 bytes. LONGBLOB and LONGTEXT columns must be less than 4GB, and the total row length, including BLOB and TEXT columns, must be less than 4GB.
Well,
its not really a question about MySql its more about what page size is in general in memory management.
You can read about that here: http://en.wikipedia.org/wiki/Page_(computer_memory)
Simply put its the smallest unit of data that is exchanged/stored.
The default page size is 4k which is probably fine.
If you have large data-sets or only very few write operations it may improve performance to raise the page size.
Have a look here: http://db.apache.org/derby/manuals/tuning/perf24.html
Why? Because more data can be fetched/addressed at once.
If the probability is high that the desired data is in close proximity to the data you just fetched, or directly afterwards (well its not really in 3d space but i think you get what i mean), you can just fetch it along in one operation and take better advantage of several caching and data fetching technologies, in general from your hard drive.
But on the other side you waste space if you have data that doesn't fill up the page size or is just a little bit more or something.
I personally never had a case where tuning the page size was important. There were always better approaches to optimize performance, and if not, it was already more than fast enough.
It's the size of which data is stored/read/written to disk and in memory.
Different page sizes might work better or worse for different work loads/data sets; i.e. sometimes you might want more rows per page, or less rows per page. Having said that, the default page size is fine for the majority of applications.
Note that "pages" aren't unique for MySQL. It's an aspect of a parameter for all databases.