The MySQL manual states that a field of type DATE takes up three bytes, but if the date is 0000-00-00, does it still take up those bytes? If so, is there a recommended way to reduce storage, such as setting the field to NULL?
InnoDB (which you should be using; do not use MyISAM) uses zero bytes for a field that is NULL, but the full number of bytes for a zero value date.
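A minimal sketch of that change, assuming a hypothetical orders table with an order_date column (depending on your sql_mode, you may need to relax NO_ZERO_DATE for the UPDATE to run):

    -- allow NULL, then replace zero dates with NULL
    ALTER TABLE orders MODIFY order_date DATE NULL DEFAULT NULL;
    UPDATE orders SET order_date = NULL WHERE order_date = '0000-00-00';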
Using NULL may allow InnoDB to store a few more rows per page. I say may because you might have a bunch of other non-NULL fields per row, so the savings will be small. If you can make the change, InnoDB can fit more rows in the same size buffer pool, incur less frequent I/O to read pages (because they stay in the buffer pool), and therefore perform better.
Those are a lot of conditions and caveats. The net benefit to performance is likely to be very modest.
I suggest this should not be the focus of your optimization efforts. You'll get better bang for the buck by concentrating on:
Analyzing queries so you can choose the right indexes.
Using Memcached for caching on a case-by-case basis in your app.
Designing the application architecture for better scaling.
Upgrading your system RAM and buffer pool size until the buffer pool holds more of your database pages (see the sketch after this list).
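A minimal sketch of that last item, assuming a server with RAM to spare (the 4 GB figure is only an illustration):

    -- check the current buffer pool size, in bytes
    SHOW VARIABLES LIKE 'innodb_buffer_pool_size';
    -- resize at runtime (MySQL 5.7.5+); older versions need my.cnf plus a restart
    SET GLOBAL innodb_buffer_pool_size = 4294967296;  -- 4 GB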
So I am trying to figure out a little bit of optimization with regards to MySQL and its row-sorting functionality. As I understand it, you can set a maximum row comparison value, and it is a good idea to set this fairly high if your machine's memory can take it, to reduce I/O. My question is: does the memory get allocated dynamically as you load in more things to sort, or statically as one massive block? Basically, if I know 100% for sure that I will never have more than, say, 1000 rows to sort, would it be more efficient to set a maximum of, say, 1200 rows (to give a small buffer just in case) versus 1 million? Thanks for your answers, and sorry if I'm not explicit enough; I'm still very new to SQL and MySQL.
When MySQL needs to sort a result set, such as to satisfy a SELECT ... ORDER BY, it will act in one of several ways:
If the ORDER BY can be handled by an INDEX, no sort is needed.
If the number of rows is 'small' (and some other criteria are met), it will create a MEMORY table in RAM and use an in-memory sort to do the work. This is very fast. Such temp MEMORY tables are limited by both tmp_table_size and max_heap_table_size. Since multiple connections may be doing this simultaneously, it is a good idea not to set those higher than, say, 1% of RAM (see the sketch after this list).
If that overflows or otherwise fails, a MyISAM table is built instead. This can have essentially unlimited size. Still, because of caching, it may or may not spill to disk, thereby incurring I/O.
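A minimal sketch of that sizing rule, assuming a server with 32 GB of RAM (the figures are only an illustration):

    -- in-memory temp tables are capped by the smaller of these two variables;
    -- ~1% of a 32 GB server is roughly 320 MB
    SET GLOBAL tmp_table_size      = 335544320;
    SET GLOBAL max_heap_table_size = 335544320;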
There are other cases where MySQL will sort things. For example, creating an index may gather all the info, sort it, then spew the info into the BTree for the index. This probably involves an Operating System sort, which, again, may or may not involve I/O.
1200 rows is likely to be done in RAM; 1M rows is likely to involve I/O.
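If you want to check which path a particular query takes, EXPLAIN will show it (the table and column names here are hypothetical):

    EXPLAIN SELECT * FROM orders ORDER BY created_at LIMIT 1000;
    -- Extra column shows "Using filesort" -> MySQL had to sort the rows
    -- no "Using filesort"                 -> an index on created_at satisfied the ORDER BY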
But, the bottom line is: Don't worry about it. If you need ORDER BY, use it. Let MySQL do it the best way it can.
I have an InnoDB table with, amongst other fields, a BLOB field that can contain up to ~15 KB of data. Reading here and there on the web, I found that my BLOB field can lead (when the overall fields exceed ~8000 bytes) to some records being split into two parts: on one side the record itself with all fields plus the leading 768 bytes of my BLOB, and on the other side the remainder of the BLOB stored in multiple (1-2?) chunks organized as a linked list of pages.
So my question: in such cases, what is more efficient regarding the way data is cached by MySQL? 1) Let the engine deal with the split of my data, or 2) handle the split myself in a second table, storing 1 or 2 records depending on the length of my BLOB data, and have those records cached by MySQL? (I plan to allocate as much memory as I can for this to occur.)
Unless your system's performance is so terrible that you have to take immediate action, you are better off using the internal mechanisms for record splitting.
The people who work on MySQL and its forks (e.g. MariaDB) spend a lot of time implementing and testing optimizations. You will be much happier with simple application code; spend your development and test time on your application's distinctive logic rather than trying to work around internals issues.
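That said, how much of the BLOB stays in the row depends on the row format. A hedged sketch (table name hypothetical), noting that DYNAMIC is the default in recent MySQL versions anyway and that older versions also require innodb_file_format=Barracuda: with ROW_FORMAT=DYNAMIC, a long BLOB is stored entirely off-page with only a 20-byte pointer in the row, instead of the 768-byte in-row prefix used by the older COMPACT format.

    -- see the current row format
    SHOW TABLE STATUS LIKE 'my_blob_table';
    -- store long BLOB values fully off-page
    ALTER TABLE my_blob_table ROW_FORMAT=DYNAMIC;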
I am in the process of writing a web app backed by a MySQL database where one of the tables has the potential to get very large (on the order of gigabytes), with a significant proportion of table operations being writes. One of the table columns needs to store a string sequence that can be quite big. In my tests thus far it has reached a size of 289 bytes, but to be on the safe side I want to design for a maximum size of 1 KB. Currently I am storing that column as a MySQL MEDIUMBLOB field in an InnoDB table.
At the same time I have been googling to establish the relative merits and demerits of BLOBs vs other forms of storage. There is a plethora of information out there, perhaps too much. What I have gathered is that InnoDB stores the first few bytes (768 if memory serves me right) of the BLOB in the table row itself and the rest elsewhere. I have also got the notion that if a row has more than one BLOB column (which my table does not), then the "elsewhere" is a different location for each BLOB. Apart from that, I have got the impression that accessing BLOB data is significantly slower than accessing row data (which sounds reasonable).
My question is just this: in light of my BLOB size and the large potential size of the table, should I bother with a BLOB at all? Also, if I use some form of in-row storage instead, will that not have an adverse effect on the maximum number of rows that the table will be able to accommodate?
MySQL is neat and lets me get away with pretty much everything in my development environment. But... that ain't the real world.
I'm sure you've already looked here but it's easy to overlook some of the details since there is a lot to keep in mind when it comes to InnoDB limitations.
The easy answer to one of your questions (maximum size of a table) is 64 TB. Using variable-size types to move that storage into a separate file would certainly change the upper limit on the number of rows, but 64 TB is quite a lot of space, so the impact might be very small.
Having a column with a 1 KB string type stored inside the table seems like a viable solution, since it's also very small compared to 64 TB, especially if you have very strict requirements for query speed.
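For instance, a hedged sketch of keeping the value in-row (table and column names are hypothetical): a VARBINARY(1024), or VARCHAR(1024) if the data is text, stays inside the row as long as the whole row fits in roughly half a page, avoiding the off-page indirection a MEDIUMBLOB may incur.

    -- keep the ~1 KB value in the row instead of in a MEDIUMBLOB
    ALTER TABLE big_table MODIFY payload VARBINARY(1024) NOT NULL;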
Also, keep in mind that the InnoDB 64 TB limit might be pushed down by the maximum file size for the OS you're using. You can always link several files together to get more space for your table, but then it starts to get a bit more messy.
If the BLOB data is more than 250 KB it is not worth it. In your case I wouldn't bother with BLOBs. Read this
I have a table with 70 million records and there is an index missing. I want to estimate the time it will take to add the index, without backing up the table and building the index on the backed-up copy.
I am just wondering if it will be twice as slow (linear) or if it is exponential.
database: mysql 5.0
Thanks a lot
(Disclaimer: I have minimal experience on MySQL)
It should be somewhere in-between.
The absolute lowest complexity of the whole operation would be the one that appears when just reading all records in order, which is a linear process, O(n). This is an I/O-bound operation and there is not much that can be done about it; the modern caching in most OSes may help, but only for a DB that is in use and fits in the available memory.
In most SQL engines, indexes are some variation of a B-tree. The CPU complexity of inserting a single record into such a tree is roughly O(log(n)), where n is its size. For n records we get a complexity of O(n log(n)). The total complexity of the operation should be O(n log(n)).
Of course, it's not quite that simple. Computing the index tree is not really CPU-heavy and since the index pages should fit in RAM on any modern system, the operation of inserting a single node when the tree is not rebalanced would be close to O(1) time-wise: a single disk operation to update a leaf page of the index.
Since the tree does get rebalanced, however, things are probably a bit more complex. Multiple index pages may have to be committed to disk, thus increasing the necessary time. As a rough guess, I'd say O(n log(n)) is a good start...
It should never come anywhere close to an exponential complexity, though.
EDIT:
It just occurred to me that 70,000,000 B-tree entries may not, in fact, fit in the in-memory cache. It would depend heavily on what is being indexed. INTEGER columns would probably be fine, but TEXT columns are another story altogether. If the average field length is 100 bytes (e.g. HTTP links or 30 characters of non-English UTF-8 text) you'd need more than 7 GB of memory to store the index.
Bottom line:
If the index fits in the cache, then since building the index should be a single DB transaction, it would be I/O-bound and roughly linear, as all the records have to be parsed and then the index itself has to be written out to permanent storage.
If the index does not fit in the cache, then the complexity rises, as I/O wait-times on the index itself become involved in each operation.
What thkala describes is true for inserting individual rows, but when creating a new index, no reasonable RDBMS will just do n inserts; rather, it will construct the index directly, starting with the leaf nodes. This process will almost certainly be I/O-bound.
So, in practical terms, re-indexing time should be linear: twice as long for twice as many records.
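A rough back-of-envelope check on that (the smaller row count and the table and column names are only assumptions): going from 35 million to 70 million rows changes n*log2(n) from about 35M x 25.1 to 70M x 26.1, roughly a 2.08x increase, which is why doubling the table roughly doubles the build time in practice. The statement itself is just:

    -- on MySQL 5.0 this rebuilds the whole table and will be I/O-bound
    ALTER TABLE big_table ADD INDEX idx_missing (missing_col);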
I can't for the life of me remember what a page is, in the context of a MySQL database. When I see something like 8KB/page, does that mean 8KB per row or ...?
Database pages are the basic internal structure used to organize the data in the database files. Below is some information about the InnoDB model.
From 13.2.11.2. File Space Management:
The data files that you define in the configuration file form the InnoDB tablespace. The files are logically concatenated to form the tablespace. [...] The tablespace consists of database pages with a default size of 16KB. The pages are grouped into extents of size 1MB (64 consecutive pages). The “files” inside a tablespace are called segments in InnoDB.
And from 13.2.14. Restrictions on InnoDB Tables
The default database page size in InnoDB is 16KB. By recompiling the code, you can set it to values ranging from 8KB to 64KB.
Further, to put rows in relation to pages:
The maximum row length, except for variable-length columns (VARBINARY, VARCHAR, BLOB and TEXT), is slightly less than half of a database page. That is, the maximum row length is about 8000 bytes. LONGBLOB and LONGTEXT columns must be less than 4GB, and the total row length, including BLOB and TEXT columns, must be less than 4GB.
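If you just want to see what page size your server is actually running with, a quick check (the innodb_page_size variable is available on MySQL 5.6 and later; older versions are fixed at 16 KB unless recompiled):

    SHOW VARIABLES LIKE 'innodb_page_size';
    -- typically returns 16384, i.e. the 16 KB default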
Well, it's not really a question about MySQL; it's more about what a page is in general, in memory management.
You can read about that here: http://en.wikipedia.org/wiki/Page_(computer_memory)
Simply put, it's the smallest unit of data that is exchanged/stored.
The default page size is 4 KB, which is probably fine.
If you have large data-sets or only very few write operations it may improve performance to raise the page size.
Have a look here: http://db.apache.org/derby/manuals/tuning/perf24.html
Why? Because more data can be fetched/addressed at once.
If the probability is high that the desired data is in close proximity to the data you just fetched, or directly afterwards (well, it's not really in 3D space, but I think you get what I mean), you can just fetch it along in one operation and take better advantage of several caching and data-fetching technologies, in general from your hard drive.
But on the other hand, you waste space if you have data that doesn't fill up the page size, or exceeds it by just a little bit.
I personally never had a case where tuning the page size was important. There were always better approaches to optimize performance, and if not, it was already more than fast enough.
It's the size in which data is stored/read/written, on disk and in memory.
Different page sizes might work better or worse for different workloads/data sets; i.e. sometimes you might want more rows per page, or fewer rows per page. Having said that, the default page size is fine for the majority of applications.
Note that "pages" aren't unique for MySQL. It's an aspect of a parameter for all databases.