MySQL - Define max size of image

From many websites I've learned the types of BLOBs for saving images, like e.g. here. So, given that the max size of a regular BLOB is 65,535 bytes (64KB) and the bigger MEDIUMBLOB holds up to 16MB, what is the best approach to save images that are at most 100KB? Should I pick MEDIUMBLOB and limit it to a max of 100KB? If yes, does this affect the real size of the database at the end in terms of reserved space for the unused MEDIUMBLOB size? I mean, if MEDIUMBLOB reserves 16MB of database space when uploading only a 100KB image, will the reserved space never be used? Or do I understand this wrong?

There is a wee amount of overhead in choosing larger BLOB (or text) types for storage: BLOB columns are variable-length, so a MEDIUMBLOB does not reserve 16MB per value; it stores the actual bytes plus a length prefix (2 bytes for BLOB, 3 for MEDIUMBLOB, 4 for LONGBLOB). Since you are already storing a hundred kilobytes of data, another byte or two or three or four is negligible. Probably not even measurable.
Choose the data type that is most appropriate for the data you need to store. If performance becomes an issue, think about it then. Often premature optimization is merely premature: it is not an optimization and it is a hindrance to future development.
I'm not arguing for things that don't make sense -- don't use text for a three-character code. But you will have many, many, many other performance issues before worrying about a less than 0.01% size increase due to the blob type that you choose.
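To make that concrete, here is a minimal sketch (the table and column names are hypothetical) of the MEDIUMBLOB-plus-cap approach the question describes. Note that MySQL only enforces CHECK constraints from version 8.0.16 onward:

    -- MEDIUMBLOB is variable-length: a 100KB image consumes ~100KB plus a
    -- 3-byte length prefix, not a reserved 16MB.
    CREATE TABLE images (
        id   INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
        data MEDIUMBLOB NOT NULL,
        -- Enforced on MySQL 8.0.16+; older versions parse but ignore CHECK.
        CONSTRAINT chk_image_size CHECK (LENGTH(data) <= 102400)  -- 100KB cap
    );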

Related

The effect of field length on querying time

I have a MySQL database in which I keep information about items, including a description.
The thing is that the description column can hold up to 150 characters, which I think is long, and I wondered if it slows down querying. I also wanted to know whether it's recommended to shrink the integer sizes: if I have a price which is normally not that big, should I limit the column to SMALLINT/MEDIUMINT?
The columns are something like this:
id name category publisher mail price description
Thanks in advance.
Store your character data as varchar() and not as char() and read up on the MySQL documentation on these data types (here). This only stores the characters actually in the description, plus a few more bytes of overhead.
As for whether or not longer fields imply worse-performing queries: that is a complicated subject. Obviously, at the extreme, maximum-size records are going to slow things down versus a 10-byte record. The reason has to do with I/O performance. MySQL reads in pages, and a page can contain one or more records. The records on the page are then processed.
The more records that fit on the page, the fewer the I/Os.
But then it gets more complicated, depending on the hardware and the storage engine. Disks, nowadays, do read-aheads as do operating systems. So, the next read of a page (if pages are not fragmented and are adjacent to each other) may be much faster than the read of the initial page. In fact, you might have the next page in memory before processing on the first page has completed. At that point, it doesn't really matter how many records are on each page.
And, 200 bytes for a record is not very big. You should worry first about getting your application working and second about getting it to meet performance goals. Along the way, make reasonable choices, such as using varchar() instead of char() and appropriately sized numerics (you might consider fixed point numeric types rather than float for monetary values).
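A minimal sketch of such a table, with column names taken from the question and sizes assumed purely for illustration:

    -- VARCHAR stores only the characters actually present plus a 1-2 byte
    -- length prefix; DECIMAL is a fixed-point type suited to prices.
    CREATE TABLE items (
        id          INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
        name        VARCHAR(100) NOT NULL,
        category    VARCHAR(50),
        publisher   VARCHAR(100),
        mail        VARCHAR(255),
        price       DECIMAL(8,2),     -- fixed-point rather than FLOAT
        description VARCHAR(150)      -- 150 characters is small for a database
    );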
It is only you who considers 150 long - the database most likely does not, as databases are designed to handle much more at once. Do not consider sacrificing your data for "performance". If the nature of your application requires you to store up to 150 characters of text at once, don't be afraid to do so, but do look up optimization tips.
Using proper data types, though, can help you save space. For instance, if you have a field which is meant to store values 0 to 20, there's no need for an INT field type. A TINYINT will do.
The documentation lists the data types and provides information on how much space they use and how they're managed.
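For example (the column name is made up), a value known to stay in the range 0 to 20 fits comfortably in a single byte:

    -- TINYINT UNSIGNED covers 0-255 in 1 byte; INT would take 4 bytes.
    ALTER TABLE items ADD COLUMN condition_grade TINYINT UNSIGNED NOT NULL DEFAULT 0;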

To BLOB or not to BLOB

I am in the process of writing a web app backed by a MySQL database, where one of the tables has the potential to get very large (on the order of gigabytes), with a significant proportion of table operations being writes. One of the table columns needs to store a string sequence that can be quite big. In my tests thus far it has reached a size of 289 bytes, but to be on the safe side I want to design for a maximum size of 1KB. Currently I am storing that column as a MySQL MediumBlob field in an InnoDB table.
At the same time I have been googling to establish the relative merits and demerits of BLOBs vs other forms of storage. There is a plethora of information out there, perhaps too much. What I have gathered is that InnoDB stores the first few bytes (768, for the COMPACT and REDUNDANT row formats) of the BLOB in the table row itself and the rest elsewhere. I have also got the notion that if a row has more than one BLOB column (which my table does not), the "elsewhere" is a different location for each BLOB. That apart, I have got the impression that accessing BLOB data is significantly slower than accessing row data (which sounds reasonable).
My question is just this - in light of my BLOB size and the large potential size of the table, should I bother with a BLOB at all? Also, if I use some form of in-row storage instead, will that not have an adverse effect on the maximum number of rows the table can accommodate?
MySQL is neat and lets me get away with pretty much everything in my development environment. But... that ain't the real world.
I'm sure you've already looked here but it's easy to overlook some of the details since there is a lot to keep in mind when it comes to InnoDB limitations.
The easy answer to one of your questions (the maximum size of a table) is 64TBytes. Using variable-size types to move that storage into a separate file would certainly change the upper limit on the number of rows, but 64TBytes is quite a lot of space, so the practical effect might be very small.
Having a column with a 1KByte string type that is stored inside the table seems like a viable solution since it's also very small compared to 64TBytes. Especially if you have very strict requirements for query speed.
Also, keep in mind that the InnoDB 64TByte limit might be pushed down by the maximum file size of the OS you're using. You can always link several files together to get more space for your table, but then it starts to get a bit more messy.
If the BLOB data is more than 250KB, it is not worth it. In your case I wouldn't bother with BLOBs. Read this
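On the in-row question: for a value capped at 1KB, one plausible alternative is a VARBINARY column (a sketch under that assumption; the table and column names are invented):

    -- VARBINARY(1024) stays well under InnoDB's ~8000-byte row limit, so the
    -- value lives in the row itself with no off-page BLOB storage.
    CREATE TABLE sequences (
        id  BIGINT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
        seq VARBINARY(1024) NOT NULL
    ) ENGINE=InnoDB;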

MySQL: What is a page?

I can't for the life of me remember what a page is, in the context of a MySQL database. When I see something like 8KB/page, does that mean 8KB per row or ...?
Database pages are the basic internal structure used to organize the data in the database files. Here is some information about the InnoDB model:
From 13.2.11.2. File Space Management:
The data files that you define in the configuration file form the InnoDB tablespace. The files are logically concatenated to form the tablespace. [...] The tablespace consists of database pages with a default size of 16KB. The pages are grouped into extents of size 1MB (64 consecutive pages). The “files” inside a tablespace are called segments in InnoDB.
And from 13.2.14. Restrictions on InnoDB Tables
The default database page size in InnoDB is 16KB. By recompiling the code, you can set it to values ranging from 8KB to 64KB.
Further, to put rows in relation to pages:
The maximum row length, except for variable-length columns (VARBINARY, VARCHAR, BLOB and TEXT), is slightly less than half of a database page. That is, the maximum row length is about 8000 bytes. LONGBLOB and LONGTEXT columns must be less than 4GB, and the total row length, including BLOB and TEXT columns, must be less than 4GB.
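On MySQL 5.6 and later you can also inspect the page size of a running server directly, since it is exposed as a read-only system variable:

    -- Returns 16384 (16KB) on a default installation.
    SHOW VARIABLES LIKE 'innodb_page_size';
    -- Equivalent: SELECT @@innodb_page_size;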
Well,
it's not really a question about MySQL; it's more about what a page is in general in memory management.
You can read about that here: http://en.wikipedia.org/wiki/Page_(computer_memory)
Simply put, it's the smallest unit of data that is exchanged/stored.
The default page size is 4K, which is probably fine.
If you have large data-sets or only very few write operations, raising the page size may improve performance.
Have a look here: http://db.apache.org/derby/manuals/tuning/perf24.html
Why? Because more data can be fetched/addressed at once.
If the probability is high that the desired data is in close proximity to the data you just fetched, or directly afterwards (well, it's not really in 3D space, but I think you get what I mean), you can just fetch it along in one operation and take better advantage of several caching and data-fetching technologies, in general from your hard drive.
But on the other side, you waste space if your data doesn't fill up the page, or spills just a little bit over it.
I personally never had a case where tuning the page size was important. There were always better approaches to optimize performance, and if not, it was already more than fast enough.
It's the unit in which data is stored/read/written on disk and in memory.
Different page sizes might work better or worse for different workloads/data sets; i.e. sometimes you might want more rows per page, or fewer rows per page. Having said that, the default page size is fine for the majority of applications.
Note that "pages" aren't unique to MySQL; they're a tunable parameter in virtually all databases.

In 64 bit systems, a 32 bit column occupies less space than a 64 bit one?

In MySQL running on a 64-bit OS, does every column occupy at least 64 bits? Would a bit occupy a whole byte, or a whole word?
Here's what I think you're looking for:
http://dev.mysql.com/doc/refman/5.5/en/storage-requirements.html
"The storage requirements for data vary, according to the storage engine being used for the table in question. Different storage engines use different methods for recording the raw data and different data types. In addition, some engines may compress the information in a given row, either on a column or entire row basis, making calculation of the storage requirements for a given table or column structure [difficult]."
every column occupies at least 64 bits?
Do you actually mean column, or row? I think you mean row, but I'm not entirely sure. I can't quite figure out why you would need to know the size of the column description because AFAIK it takes up O(1) space...
A bit would occupy a whole byte? or a whole word?
In memory: Not sure actually - I'm leaning towards it occupying an entire word so that meaningful comparisons could be made, but I really don't know. Unless you're talking about using the MEMORY storage engine in which case...
On disk: It really does depend on your storage engine (as mdma said) and row sizes vary vastly depending on your storage engine and character set. For example, the article linked above explains some of the optimizations that MyISAM does to record as few bits as needed for each row.
It will be 32 bits on disk. I don't know about memory, but I do know that 32-bit numbers will be upcast to 64 bits for comparisons (even on 32-bit systems, which gives 64-bit systems a bit of a boost).
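For the on-disk side, the per-type sizes in the manual linked above are fixed and independent of the OS word size; a small illustration (the table name is made up):

    -- Per the MySQL storage-requirements page:
    --   TINYINT = 1 byte, INT = 4 bytes, BIGINT = 8 bytes, and
    --   BIT(M) takes approximately (M+7)/8 bytes, so BIT(1) rounds up to 1 byte.
    CREATE TABLE word_size_demo (
        flag  BIT(1),   -- about 1 byte of row storage, not a 64-bit word
        small INT,      -- 4 bytes even on a 64-bit OS
        big   BIGINT    -- 8 bytes
    );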

How important is it to select the smallest possible data type when designing a database?

How much of a difference does using tinyint or smallint (when applicable) instead of just int make? Or restricting a char field to the minimum number of characters needed?
Do these choices affect performance or just allocated space?
On an indexed field in a significantly large table, the size of your field can have a large effect on performance. On a non-indexed field it's not nearly as important, but it still has to write the extra data.
That said, the downtime of a resize of a large table can be several minutes or even several hours, so don't make columns smaller than you'd imagine ever needing.
Yes it affects performance too.
If the indexes are larger, it takes longer to read them from disk, and less can be cached in memory.
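Rough, hypothetical arithmetic behind that point (the table and column are invented): the index stores the key bytes for every entry, so the per-key difference multiplies out across the table.

    -- A secondary index on a BIGINT key carries 8 bytes per entry vs 4 for
    -- INT: over 100 million rows, that is roughly 400MB of extra index data
    -- to read from disk and hold in the buffer pool.
    CREATE INDEX idx_user ON events (user_id);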
I've frequently seen these three schema design defects causing problems:
A varchar(n) field was created with n only big enough for the sample of data that the designer had pulled in, not the global population: fine in unit tests, silent truncations in the real world.
A varchar(n) used where the data is fixed size. This masks data bugs.
A char(n) used for variable-length data. This provides performance improvements (by enabling the data to sit in-line in the row on disc), but all the client code (and various stored procs/views etc.) needs to cope with whitespace-padding issues (and often it doesn't). Whitespace padding can be difficult to track down, because spaces don't show up too well, and various libraries/SQL clients suppress them.
I've never seen a well-intentioned (i.e. not just using varchar(255) for all cols) but conservative selection of the wrong data size cause a significant performance problem. By significant, I mean a factor of 10. I regularly see algorithmic design flaws (missing indexes, sending too much data over the wire, etc.) causing much bigger performance hits.
Both, in some cases. But imo, it's more a question of design than of performance and storage considerations. The reason you don't make everything varchar(...) is that it doesn't accurately reflect what sort of data should be stored there, and it reduces your data's integrity and type-safety.
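To illustrate that design angle (the table is invented for the example): narrow types document and enforce what each column may hold, independent of any space savings.

    CREATE TABLE orders (
        status  TINYINT UNSIGNED NOT NULL,  -- small bounded code: 1 byte
        country CHAR(2) NOT NULL,           -- genuinely fixed-size: char fits
        note    VARCHAR(255)                -- variable-length free text
    );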