To BLOB or not to BLOB


I am in the process of writing a web app backed by a MySQL database where one of the tables has the potential to grow very large (on the order of gigabytes), with a significant proportion of table operations being writes. One of the table columns needs to store a string sequence that can be quite big. In my tests thus far it has reached a size of 289 bytes, but to be on the safe side I want to design for a maximum size of 1 KB. Currently I am storing that column as a MySQL MEDIUMBLOB field in an InnoDB table.
At the same time I have been googling to establish the relative merits and demerits of BLOBs vs other forms of storage. There is a plethora of information out there, perhaps too much. What I have gathered is that InnoDB stores the first few bytes (768 if memory serves me right) of the BLOB in the table row itself and the rest elsewhere. I have also got the notion that if a row has more than one BLOB column (which my table does not) then the "elsewhere" is a different location for each BLOB. That apart, I have got the impression that accessing BLOB data is significantly slower than accessing row data (which sounds reasonable).
My question is just this: in light of my BLOB size and the large potential size of the table, should I bother with a BLOB at all? Also, if I use some form of in-row storage instead, will that not have an adverse effect on the maximum number of rows that the table will be able to accommodate?
MySQL is neat and lets me get away with pretty much everything in my development environment. But... that ain't the real world.

I'm sure you've already looked here but it's easy to overlook some of the details since there is a lot to keep in mind when it comes to InnoDB limitations.
The easy answer to one of your questions (maximum size of a table) is 64 TB. Using variable-size types to move that storage out of the row would certainly change the upper limit on the number of rows, but 64 TB is quite a lot of space, so the difference might be very small.
Having a column with a 1 KB string type that is stored inside the table seems like a viable solution, since it's also very small compared to 64 TB, especially if you have very strict requirements for query speed.
Also, keep in mind that the InnoDB 64 TB limit might be pushed down by the maximum file size for the OS you're using. You can always link several files together to get more space for your table, but then it starts to get a bit messier.
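As a rough back-of-the-envelope check of the numbers above, this sketch estimates how many rows of about 1 KB would fit under a 64 TB limit. The overhead factor of 1.2 is an illustrative assumption (page headers, indexes, free space), not an InnoDB figure.

```python
# Rough upper bound on row count under a table-size limit.
# OVERHEAD_FACTOR is an assumed allowance, not a measured InnoDB value.
TABLE_LIMIT_BYTES = 64 * 1024**4   # 64 TB limit quoted in the answer
ROW_BYTES = 1024                   # 1 KB design maximum from the question
OVERHEAD_FACTOR = 1.2              # assumption for illustration only


def max_rows(limit_bytes: int, row_bytes: int, overhead: float = 1.0) -> int:
    """Return how many rows fit, treating each row as row_bytes * overhead."""
    return int(limit_bytes // (row_bytes * overhead))


if __name__ == "__main__":
    print(max_rows(TABLE_LIMIT_BYTES, ROW_BYTES))                   # no overhead
    print(max_rows(TABLE_LIMIT_BYTES, ROW_BYTES, OVERHEAD_FACTOR))  # with overhead
```

Even with a generous overhead allowance the result is in the tens of billions of rows, so the row-count ceiling is unlikely to be the practical constraint.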

If the BLOB data is more than 250 KB it is not worth it. In your case I wouldn't bother with BLOBs. Read this

Related

MySQL - Where can I find metrics on the performance of Blob vs file system?

I know and understand that there are performance hits in storing blob data in the database, but the blob portion of the data is going to be rarely retrieved/viewed, it is for smaller data (the vast majority under 256k with a max of 10mb), it is not going to be used by most customers, and the total rows is expected to be relatively low, very likely under half a million, if not less. Also some of the data is dynamic and can change for some users, as in it's not static images. In other words we're at the edge of whether or not it's worth it.
I keep reading that it's better to store in the file system but I can't find actual metrics that show the performance difference, just people repeating each other without any concrete proof or metrics. For us it may be worth the performance cost in exchange for being fully ACID as well as guaranteeing that all our backups are completely synched.
That being said does anyone know or have any real world metrics to show the performance difference between storing items as blobs vs in the file system. I'm trying to understand if the performance penalty is worth it or not rather than blindly following the general rule of thumb and after spending at least 2-3 hours I've yet to be able to see anyone show any actual numbers...
UPDATE: MySQL with an InnoDB table. The actual data table has a link to the blob data table, so the blob is not in the main table and is only retrieved when need be to avoid any I/O issues. In other words, instead of the path to the data on the filesystem, it's an ID to another table with only blobs. How does that compare in terms of performance? Is it 25% worse? Is it 100%? Is it 200-500%? Is it 1000%?
If the cost is only 100%-200% it is probably worth it for us because again the data is rarely retrieved. So even if we had say 10,000 concurrent users, maybe only 50 users would be retrieving their blob data concurrently at best. Yes the data is specific to each user, it isn't images.
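The layout described in the update (a main table holding only an ID that points into a blob-only table) could be sketched roughly as follows. This uses Python's built-in sqlite3 as a stand-in for MySQL/InnoDB, and the table and column names (`items`, `blobs`, `blob_id`) are hypothetical. It shows the structure only and says nothing about relative performance.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; all names here are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE blobs (id INTEGER PRIMARY KEY, data BLOB NOT NULL);
    CREATE TABLE items (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        blob_id INTEGER REFERENCES blobs(id)  -- NULL when the item has no blob
    );
""")


def add_item(name, data=None):
    """Insert an item and, if given, its blob, in a single transaction."""
    with conn:  # commits on success, rolls back on error
        blob_id = None
        if data is not None:
            blob_id = conn.execute(
                "INSERT INTO blobs (data) VALUES (?)", (data,)).lastrowid
        return conn.execute(
            "INSERT INTO items (name, blob_id) VALUES (?, ?)",
            (name, blob_id)).lastrowid


def load_blob(item_id):
    """Fetch the blob only when it is actually needed."""
    row = conn.execute(
        "SELECT b.data FROM items i JOIN blobs b ON b.id = i.blob_id "
        "WHERE i.id = ?", (item_id,)).fetchone()
    return row[0] if row else None
```

Because both inserts share one transaction, the item and its blob are committed or rolled back together, which is the ACID property the question wants to keep.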

The effect of the fields length on the querying time

I have a MySQL database in which I keep information about items, along with a description of each.
The thing is that the description column can hold up to 150 chars, which I think is long, and I wondered if it slows down queries. Also, I wanted to know if it's recommended to shrink the integer type: if I have a price which is normally not that big, should I limit the column to SMALLINT/MEDIUMINT?
The columns are something like this:
id name category publisher mail price description
Thanks in advance.

Store your character data as varchar() and not as char() and read up on the MySQL documentation on these data types (here). This only stores the characters actually in the description, plus a few more bytes of overhead.
As for whether longer fields imply worse-performing queries, that is a complicated subject. Obviously, at the extreme, having maximum-size records is going to slow things down versus a 10-byte record. The reason has to do with I/O performance. MySQL reads in pages, and a page can contain one or more records. The records on the page are then processed.
The more records that fit on the page, the fewer the I/Os.
But then it gets more complicated, depending on the hardware and the storage engine. Disks, nowadays, do read-aheads as do operating systems. So, the next read of a page (if pages are not fragmented and are adjacent to each other) may be much faster than the read of the initial page. In fact, you might have the next page in memory before processing on the first page has completed. At that point, it doesn't really matter how many records are on each page.
And, 200 bytes for a record is not very big. You should worry first about getting your application working and second about getting it to meet performance goals. Along the way, make reasonable choices, such as using varchar() instead of char() and appropriately sized numerics (you might consider fixed point numeric types rather than float for monetary values).
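To make the "more records per page, fewer I/Os" point concrete, here is a small idealised calculation. It assumes a 16 KB page (the InnoDB default) and ignores page headers, fill factor, and per-row overhead, so treat the numbers as orders of magnitude only.

```python
import math

PAGE_BYTES = 16 * 1024  # InnoDB default page size; adjust if yours differs


def records_per_page(record_bytes: int, page_bytes: int = PAGE_BYTES) -> int:
    """Idealised count: ignores page headers, fill factor, and row overhead."""
    return max(1, page_bytes // record_bytes)


def pages_to_scan(row_count: int, record_bytes: int) -> int:
    """Pages a full scan must read for row_count records of record_bytes each."""
    return math.ceil(row_count / records_per_page(record_bytes))


if __name__ == "__main__":
    for size in (50, 200):
        print(size, records_per_page(size), pages_to_scan(1_000_000, size))
```

For a million rows, 200-byte records need roughly four times as many pages as 50-byte records, but as noted above, read-ahead and caching can hide much of that difference.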

It is only you that considers 150 long - the database most likely does not, as they're designed to handle much more at once. Do not consider sacrificing your data for "performance". If the nature of your application requires you to store up to 150 characters of text at once, don't be afraid to do so, but do look up optimization tips.
Using proper data types, though, can help you save space. For instance, if you have a field which is meant to store values 0 to 20, there's no need for an INT field type. A TINYINT will do.
The documentation lists the data types and provides information on how much space they use and how they're managed.
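As an illustration of picking a proper type, this sketch returns the smallest signed MySQL integer type that can hold a given range, based on the storage sizes (1, 2, 3, 4, and 8 bytes) listed in the MySQL documentation. The helper name `smallest_int_type` is made up for this example.

```python
# Storage size in bytes of each MySQL integer type, smallest first.
INT_TYPES = [
    ("TINYINT", 1),
    ("SMALLINT", 2),
    ("MEDIUMINT", 3),
    ("INT", 4),
    ("BIGINT", 8),
]


def smallest_int_type(lo: int, hi: int) -> str:
    """Return the smallest signed MySQL integer type that holds lo..hi."""
    for name, size in INT_TYPES:
        bits = size * 8
        if -(2 ** (bits - 1)) <= lo and hi <= 2 ** (bits - 1) - 1:
            return name
    raise ValueError("range does not fit in BIGINT")


if __name__ == "__main__":
    print(smallest_int_type(0, 20))      # the 0 to 20 example from the answer
    print(smallest_int_type(0, 50_000))
```

Unsigned variants double the positive range, so a value that narrowly misses a signed type may still fit its UNSIGNED form.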

MySQL: What is a page?

I can't for the life of me remember what a page is, in the context of a MySQL database. When I see something like 8KB/page, does that mean 8KB per row or ...?

Database pages are the basic internal structure used to organize the data in the database files. Here is some information about the InnoDB model:
From 13.2.11.2. File Space Management:
The data files that you define in the configuration file form the InnoDB tablespace. The files are logically concatenated to form the tablespace. [...] The tablespace consists of database pages with a default size of 16KB. The pages are grouped into extents of size 1MB (64 consecutive pages). The “files” inside a tablespace are called segments in InnoDB.
And from 13.2.14. Restrictions on InnoDB Tables
The default database page size in InnoDB is 16KB. By recompiling the code, you can set it to values ranging from 8KB to 64KB.
Further, to put rows in relation to pages:
The maximum row length, except for variable-length columns (VARBINARY, VARCHAR, BLOB and TEXT), is slightly less than half of a database page. That is, the maximum row length is about 8000 bytes. LONGBLOB and LONGTEXT columns must be less than 4GB, and the total row length, including BLOB and TEXT columns, must be less than 4GB.
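The figures quoted above can be tied together with a little arithmetic. This sketch uses only the defaults from the quoted documentation (16 KB pages, 64 pages per extent); the function names are invented for the example, and the row-length value is an upper bound because the docs say "slightly less than" half a page.

```python
PAGE_BYTES = 16 * 1024          # default page size from the quoted docs
PAGES_PER_EXTENT = 64           # 64 consecutive pages per extent
EXTENT_BYTES = PAGE_BYTES * PAGES_PER_EXTENT  # works out to 1 MB


def max_inline_row_bytes(page_bytes: int = PAGE_BYTES) -> int:
    """Upper bound only: the real limit is slightly less than half a page."""
    return page_bytes // 2


def pages_and_extents(data_bytes: int) -> tuple[int, int]:
    """Whole pages and extents needed to hold data_bytes (ceiling division)."""
    pages = -(-data_bytes // PAGE_BYTES)
    extents = -(-pages // PAGES_PER_EXTENT)
    return pages, extents


if __name__ == "__main__":
    print(EXTENT_BYTES, max_inline_row_bytes())
    print(pages_and_extents(10 * 1024 * 1024))
```

So "16KB/page" describes the unit of storage, not the size of a row: one page usually holds many rows.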

Well,
it's not really a question about MySQL; it's more about what a page is in memory management in general.
You can read about that here: http://en.wikipedia.org/wiki/Page_(computer_memory)
Simply put, it's the smallest unit of data that is exchanged/stored.
A common default page size is 4k (InnoDB's own default is 16KB), which is probably fine.
If you have large data-sets or only very few write operations it may improve performance to raise the page size.
Have a look here: http://db.apache.org/derby/manuals/tuning/perf24.html
Why? Because more data can be fetched/addressed at once.
If the probability is high that the desired data is in close proximity to the data you just fetched, or directly after it (well, it's not really in 3D space, but I think you get what I mean), you can fetch it along in one operation and take better advantage of the various caching and data-fetching technologies, especially those of your hard drive.
On the other hand, you waste space if your data doesn't fill up a page, or only slightly overflows one.
I personally never had a case where tuning the page size was important. There were always better approaches to optimize performance, and if not, it was already more than fast enough.

It's the unit in which data is stored, read, and written, both on disk and in memory.
Different page sizes might work better or worse for different workloads and data sets; i.e. sometimes you might want more rows per page, or fewer rows per page. Having said that, the default page size is fine for the majority of applications.
Note that "pages" aren't unique to MySQL. They are a concept, and often a tunable parameter, in virtually all databases.

Find out how much storage a row is taking up in the database

Is there a way to find out how much space (on disk) a row in my database takes up?
I would love to see it for SQL Server CE, but failing that SQL Server 2008 works (I am storing about the same data in both).
The reason I ask is that I have an Image column in my SQL Server CE db (it is a varbinary(max) in the SQL 2008 db) and I need to know how many rows I can store before I max out the memory on my device.

This may not be 100% what you wanted, but if you want to know how much space an image takes, just do:
SELECT [RaportID]
      ,DATALENGTH([RaportPlik]) AS [FileSize]
      ,[RaportOpis]
      ,[RaportDataOd]
      ,[RaportDataDo]
FROM [Database]
Any additional calculation (prediction and so on) you need to do yourself.

A varbinary(max) column could potentially contain up to 2GB of data by itself for each row. For estimated use based on existing data, perhaps you could do some analysis using the DATALENGTH function to work out what space a typical one of your images is taking up, and extrapolate from there.

You can only make rough guesses - there is no exact answer to the question "how many rows I can store before I max out the memory on my device" since you do not have exclusive use of your device - other programs take resources too, and you can only know how much storage is available at the present time, not at some time in the future. Additionally, your images are likely compressed and therefore take variable amounts of RAM.
For guessing purposes, simply the size of your image is a good approximation of the row size; the overhead of the row structure is negligible.
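The extrapolation suggested in these answers might look something like this. The sample sizes and the free-space figure are made up for illustration; in practice the sizes would come from a DATALENGTH query over existing rows.

```python
from statistics import mean


def estimate_row_capacity(sample_sizes: list[int], free_bytes: int,
                          per_row_overhead: int = 0) -> int:
    """Estimate how many more rows fit, from sampled image sizes in bytes."""
    if not sample_sizes:
        raise ValueError("need at least one sample size")
    avg_row = mean(sample_sizes) + per_row_overhead
    return int(free_bytes // avg_row)


# Hypothetical sample: image sizes in bytes, as DATALENGTH might report them.
samples = [48_000, 52_000, 50_000]
capacity = estimate_row_capacity(samples, free_bytes=100 * 1024**2)
print(capacity)
```

This is only a guess, for the reasons given above: free space changes over time and image sizes vary, so it is worth leaving a safety margin.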

I can store a lot of data (<=4GB) in one table column. But is it a good idea?

To make a long story short, one part of the application I'm working on needs to store a somewhat large volume of data in a database, for another part of the application to pick up later on. Normally this would be < 2000 rows, but can occasionally exceed 300,000 rows. The data needs to be temporarily stored and can be deleted afterwards.
I've been playing around with various ideas and one thing came to mind today. The LONGTEXT datatype can store a maximum of 2^32 bytes, which equates to 4 GB. Now, that's a lot of stuff to cram into one table row. Mind you, the data would probably not exceed 60-80 MB at the very most. But my question is, is it a good idea to actually do that?
The two solutions I'm currently toying with are something like this:
1. Inserting all data as individual rows into a "temporary" table that would be truncated after finish.
2. Inserting all data as a serialized string into a LONGTEXT column in a row that would be deleted after finish.
Purely from a performance perspective, would it be better to store the data as potentially >300,000 individual rows, or as a 60 MB LONGTEXT entry?
If it's a wash, I will probably go with the LONGTEXT option, as it would make the part of the application that picks up the data easier to write. It would also tie in better with yet another part, which would increase the overall performance of the application.
I would appreciate any thoughts on this.

Serializing all that data into a LONGTEXT... blasphemy!! :)
Seriously though, it occurs to me that if you do that, you would have no choice but to extract it all in one giant piece. If you spread it into individual rows, on the other hand, you can have your front-end fetch it in smaller batches.
At least giving yourself that option seems the smart thing to do. (Keep in mind that underestimating the future size requirements of one's data can be a fatal error!)
And if you design your tables right, I doubt very much that 60 MiB of data spread over 300,000 rows would be any less efficient than fetching 60 MiB of text and parsing that on the front-end.
Ultimately the question is: do you think your front-end can parse the text more efficiently than MySQL can fetch it?
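The "fetch it in smaller batches" idea can be sketched with the standard DB-API `fetchmany` call. Python's built-in sqlite3 stands in for MySQL here, and the `staging` table and its columns are hypothetical.

```python
import sqlite3


def fetch_in_batches(conn, batch_size=1000):
    """Yield lists of rows so the caller never holds the full result set."""
    cur = conn.execute("SELECT id, payload FROM staging ORDER BY id")
    while True:
        batch = cur.fetchmany(batch_size)
        if not batch:
            break
        yield batch


# In-memory SQLite stands in for MySQL; table and column names are made up.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE staging (id INTEGER PRIMARY KEY, payload TEXT)")
conn.executemany("INSERT INTO staging (payload) VALUES (?)",
                 ((f"item-{i}",) for i in range(2500)))

batch_sizes = [len(batch) for batch in fetch_in_batches(conn, batch_size=1000)]
print(batch_sizes)  # [1000, 1000, 500]
```

With a single serialized LONGTEXT value there is no equivalent: the whole string has to be read and parsed before the first item can be processed.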

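This should be fine as long as you use an in-memory storage engine. In MySQL, this means using the MEMORY storage engine instead of InnoDB or MyISAM (note that MEMORY tables cannot contain BLOB or TEXT columns, so this only suits the individual-rows option). Otherwise disk usage will bring your app to its knees.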

What kind of data is it, and how will it be used? It would probably be much better to store and process it in your application's memory. At the very least, it will be much faster and will not load the DB engine.

You could always store it in the database in the 300,000-row format and use memcached to cache the data so you don't have to query it again. Please note that memcached keeps it in the machine's memory, so if you're using a lot of this data you may want to set a low expiry on it. But memcached significantly reduces the time to fetch data because you don't have to run queries on every page load.
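The cache-with-expiry pattern described here can be illustrated without a memcached server. The `TTLCache` class below is a tiny in-process stand-in written for this example, not the memcached API; the injectable `clock` parameter exists only to make the expiry behaviour easy to test.

```python
import time


class TTLCache:
    """Tiny in-process stand-in for a cache with expiry (not memcached itself)."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._store = {}  # key -> (expires_at, value)

    def get_or_load(self, key, loader):
        """Return the cached value; call loader() only on a miss or after expiry."""
        entry = self._store.get(key)
        now = self.clock()
        if entry is not None and now < entry[0]:
            return entry[1]
        value = loader()  # e.g. the expensive database query
        self._store[key] = (now + self.ttl, value)
        return value
```

A short expiry keeps memory use down for large result sets, at the cost of hitting the database more often.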

If you're going to just be writing a large, temporary BLOB you might consider writing to a temporary file on a shared file system instead.
