MySQL specifies the row format of a table as either fixed or dynamic, depending on the column data types. If a table has a variable-length column data type, such as TEXT or VARCHAR, the row format is dynamic; otherwise, it's fixed.
My question is, what's the difference between the two row formats? Is one more efficient than the other?
The difference really only matters for MyISAM; other storage engines do not care about it.
EDIT: Many users commented that InnoDB does care: link 1 by steampowered, link 2 by Kaan.
With fixed-width rows in MyISAM, there are a few advantages:
No row fragmentation: It is possible with variable width rows to get single rows split into multiple sections across the data file. This can increase disk seeks and slow down operations. It is possible to defrag it with OPTIMIZE TABLE, but this isn't always practical.
Data file pointer size: In MyISAM, there is a concept of a data file pointer, which is used whenever MyISAM needs to reference the data file. For example, indexes use it to refer to where the row actually is. With fixed-width rows, this pointer is based on the row offset in the file (i.e. rows are 1, 2, 3 regardless of their size). With variable width, the pointer is based on the byte offset (i.e. rows might be at 1, 57, 163). The result is that with large tables the pointer needs to be larger, which can add a lot of overhead to the table, since every index entry carries one.
Easier to fix in the case of corruption: Since every row is the same size, a corrupted fixed-width MyISAM table is much easier to repair, and you will only lose the data that is actually corrupted. With variable width, it is in theory possible for the variable-width pointers to get messed up, which can hose your data badly.
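To put rough numbers on the data file pointer point, here is a minimal sketch. It assumes a simplified model in which the pointer is the smallest whole number of bytes that can address every row (fixed format, row number) or every byte of the data file (dynamic format, byte offset). Real MyISAM pointer sizing also depends on settings such as myisam_data_pointer_size and the MAX_ROWS / AVG_ROW_LENGTH table options, so the figures are only illustrative, and the helper names are hypothetical.

```python
def pointer_bytes(max_value: int) -> int:
    """Smallest whole number of bytes able to represent 0..max_value."""
    n = 1
    while max_value >= 256 ** n:
        n += 1
    return n


def fixed_pointer_bytes(row_count: int) -> int:
    # Fixed-width rows (simplified model): the pointer is a row number 0..row_count-1.
    return pointer_bytes(row_count - 1)


def dynamic_pointer_bytes(row_count: int, avg_row_bytes: int) -> int:
    # Dynamic rows (simplified model): the pointer is a byte offset into the data file.
    return pointer_bytes(row_count * avg_row_bytes - 1)


print(fixed_pointer_bytes(10_000_000))         # 3
print(dynamic_pointer_bytes(10_000_000, 200))  # 4
```

In this model, ten million 200-byte rows need a 3-byte row number but a 4-byte byte offset, and that extra byte is repeated in every index entry that points at a row.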
Now the primary drawback of fixed width is that it wastes more space. For example, you need to use CHAR fields instead of VARCHAR fields, so you end up with extra space taken up.
Normally, you won't have much choice in the format, since it is dictated by the schema. However, if you only have a few VARCHARs or a single BLOB/TEXT column, it might be worth trying to optimize toward the fixed format. For example, consider switching the only VARCHAR to a CHAR, or splitting the BLOB into its own table.
You can read even more about this at:
http://dev.mysql.com/doc/refman/5.0/en/static-format.html
http://dev.mysql.com/doc/refman/5.0/en/dynamic-format.html
One key difference occurs when you update a record. If the row format is fixed, the length of the record never changes. In contrast, if the row format is dynamic and the new data makes the record longer, a link (called the overflow pointer) is used to point to the "overflow" data.
This fragments the table and generally slows things down. There is a command to defragment (OPTIMIZE TABLE), which somewhat mitigates the issue.
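The update-and-overflow behaviour can be illustrated with a toy model. This is a hypothetical, heavily simplified picture of a dynamic data file (segments in file order, each record a chain of segments), not MyISAM's actual on-disk format; the class and method names are made up for illustration.

```python
class ToyDynamicFile:
    """Hypothetical, heavily simplified model of a dynamic-format data file.

    The file is a list of segments in file order; each record is a chain of
    segment indices. A record that grows past the space it already owns gets
    an overflow segment appended at the end of the file and linked to it.
    """

    def __init__(self):
        self.segments = []  # byte capacity of each segment, in file order
        self.records = {}   # record id -> list of segment indices (the chain)

    def insert(self, rid, size):
        self.segments.append(size)
        self.records[rid] = [len(self.segments) - 1]

    def update(self, rid, new_size):
        chain = self.records[rid]
        capacity = sum(self.segments[i] for i in chain)
        if new_size > capacity:
            # No room in place: append an overflow segment at the end and link it.
            self.segments.append(new_size - capacity)
            chain.append(len(self.segments) - 1)

    def fragments(self, rid):
        return len(self.records[rid])

    def optimize(self):
        # Rough analogue of OPTIMIZE TABLE: rewrite each record as one
        # contiguous segment, so every chain has length 1 again.
        new_segments, new_records = [], {}
        for rid, chain in self.records.items():
            new_segments.append(sum(self.segments[i] for i in chain))
            new_records[rid] = [len(new_segments) - 1]
        self.segments, self.records = new_segments, new_records


f = ToyDynamicFile()
f.insert(1, 50)
f.insert(2, 50)
f.update(1, 120)        # record 1 now spans two non-adjacent segments
print(f.fragments(1))   # 2
f.optimize()
print(f.fragments(1))   # 1
```

With a fixed row format, update() would never need the overflow branch, because the record length cannot change.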
This page in MySQL's documentation seems to contradict the top answer here, in that DYNAMIC row format means something for InnoDB tables as well:
https://dev.mysql.com/doc/refman/5.7/en/innodb-row-format.html
Fixed means that every row is exactly the same size. That means that if the 3rd row on a data page needs to be loaded, it will be at exactly PageHeader+2*RowSize, saving some access time.
In order to find the beginning of a dynamic record, the list of record offsets must be consulted, which involves an extra indirection.
In short, yes, there's a slight performance hit for dynamic rows. No, it's not a very big one. If you think it will be a problem, test for it.
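The addressing difference described in this answer can be sketched as follows. PAGE_HEADER is a hypothetical header size chosen for illustration; the point is only that the fixed format computes a row's position with arithmetic, while the dynamic format has to look it up in a list of record offsets first.

```python
PAGE_HEADER = 100  # hypothetical page header size in bytes, for illustration only


def fixed_row_offset(index, row_size, header=PAGE_HEADER):
    # Fixed format: pure arithmetic. The 3rd row (index 2) starts at
    # header + 2 * row_size.
    return header + index * row_size


def build_offsets(row_sizes, header=PAGE_HEADER):
    # Dynamic format: rows have different sizes, so their start offsets
    # must be recorded somewhere.
    offsets, pos = [], header
    for size in row_sizes:
        offsets.append(pos)
        pos += size
    return offsets


def dynamic_row_offset(index, offsets):
    # One extra indirection: consult the offset list before reading the row.
    return offsets[index]


print(fixed_row_offset(2, 100))                                 # 300
print(dynamic_row_offset(2, build_offsets([40, 75, 120])))      # 215
```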
Fixed should be faster and more secure than dynamic, with the drawback of a fixed character length.
You can find this information here: http://dev.mysql.com/doc/refman/5.0/en/static-format.html
Related
I have a MariaDB InnoDB table with several million rows, but with short, fixed-width rows consisting of numbers and timestamps only.
We usually search, filter and sort the rows using any of the existing columns.
We want to add a column to store an associated URL for each row. Ideally every row will have its URL.
We know for a fact that we won't be sorting, searching and filtering by the url column.
We don't mind truncating the URL to its first 255 bytes, so we are going to give it the VARCHAR type.
But of course that column's width would be variable. The whole record will become variable-width and the width of the original record will double in many cases.
We were considering the alternative of using a different, secondary table for storing the varchar.
We could join them when querying the data or, probably more efficiently, just fetch the URLs for the page we are showing.
Would this approach be advisable?
Is there a better alternative that would also allow us to preserve performance?
Update: As user Bill Karwin noted in one comment below, InnoDB does not benefit from fixed width as much as MyISAM does, so the real issue here is about the size of the row and not so much about the fixed versus variable width discussion.
Assuming you have control over how the URL is generated, you may want to make it fixed-length. YouTube video IDs, for instance, are always 11 characters long, drawn from a base64-style alphabet. This fixes the variable-length problem and avoids joining tables.
If changing URI generation is not an option, you have a few alternatives to make it fixed-length:
You could pad each URL with a special character so that every value is exactly 255 characters long in the database, and strip the padding just before returning it. This is not a clean solution, but it makes DQL operations faster than joining.
You could fetch the URL separately, as you have stated, but beware that two HTTP requests may be more time-consuming than any option that needs just one request.
You could join with another table only when the user requires it, as opposed to it being the default.
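The padding option could look roughly like this on the application side. The padding character and the 255 width are assumptions (the character must never occur in a real URL), and this version truncates by characters; a strict 255-byte limit would require encoding to bytes first.

```python
PAD_CHAR = "\x00"  # assumption: a character that never appears in a real URL
WIDTH = 255        # column width from the question


def pad_url(url: str, width: int = WIDTH, pad: str = PAD_CHAR) -> str:
    if pad in url:
        raise ValueError("URL contains the padding character")
    # Truncate to the column width (in characters), then right-pad to exactly `width`.
    return url[:width].ljust(width, pad)


def unpad_url(stored: str, pad: str = PAD_CHAR) -> str:
    # Strip the padding just before handing the value back to the caller.
    return stored.rstrip(pad)


stored = pad_url("https://example.com/a")
print(len(stored))        # 255
print(unpad_url(stored))  # https://example.com/a
```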
Consider that having variable length may not be as big a problem, depending on your needs. The only issue might be if you're grossly oversizing fields, but it doesn't seem to be your case.
I have a table (millions of rows) where one of the columns is a TEXT field (it stores JSON blobs). But only about 10-20% of the values are actually non-NULL.
What is the best practice when it comes to sparse columns?
Should I
a) Just keep the table as is or
b) Create a new table with just that Text column?
If I am not mistaken, option (a) is fine because InnoDB will only allocate as much space as is needed for that TEXT column, right? Is there any reason to go with option (b)? It seems like option (b) will just add complexity in querying (joining) these tables and use more space as well.
MySQL (InnoDB storage engine) stores nothing for a NULL. Well, each row has a bitfield with 1 bit for each nullable column. The bitfield is followed by data values for non-NULL columns. And variable-length columns like VARCHAR, TEXT, BLOB, or JSON take only the space needed given their length.
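A toy encoder makes the "nothing stored for NULL" idea concrete: one bit per nullable column in a bitmap, followed by length-prefixed data only for the non-NULL values. This is a simplified illustration with a made-up layout, not InnoDB's actual COMPACT/DYNAMIC record format.

```python
import struct


def encode_row(values):
    """Toy row encoding: NULL bitmap, then 2-byte length + bytes for each
    non-NULL value. Hypothetical layout, not InnoDB's real record format."""
    bitmap = bytearray((len(values) + 7) // 8)  # 1 bit per (nullable) column
    body = bytearray()
    for i, v in enumerate(values):
        if v is None:
            bitmap[i // 8] |= 1 << (i % 8)  # mark column i as NULL; store no data
        else:
            data = v.encode("utf-8")
            # Length prefix (max 65535 bytes in this toy), then only the bytes used.
            body += struct.pack("<H", len(data)) + data
    return bytes(bitmap + body)


print(len(encode_row(["abc", None])))     # 6
print(len(encode_row(["x" * 1000, None])))  # 1003
print(len(encode_row([None, None])))      # 1
```

A NULL costs only its bit in the bitmap, and a non-NULL value costs its length prefix plus its actual bytes, which is why a sparse TEXT column adds little for the rows that leave it NULL.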
So I'd suggest keeping your table as is, keep the TEXT field in the table, and make it NULL when there's no JSON data.
P.S.: Aren't you using the JSON data type?
You mentioned the storage/space consideration. I think the most important thing is how you will use the data. If your performance is okay with doing a LIKE '%...%' match, then just leave it.
Denormalizing the data would allow you to query/index the content better.
In general, it does not matter whether you do (a) or (b). But here are some more considerations:
If you do SELECT * but ignore that column, then (a) is wasteful.
Certain InnoDB ROW_FORMATs will put 'short' strings in the table, not separate; others will store them in a separate block, leaving behind 20 or 767 bytes in the main block. (It gets rather tedious and confusing to see if this will really matter for (a).)
(b) involves a LEFT JOIN in your code when you do want the column. You may consider this a bother.
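For option (b), the query shape is a LEFT JOIN that yields NULL for rows without JSON. The sketch below uses Python's built-in sqlite3 purely as a stand-in so it runs anywhere; SQLite has no InnoDB row formats, so it only illustrates the schema and query, and the table and column names are hypothetical.

```python
import sqlite3


def demo_left_join():
    # In-memory SQLite as a stand-in for MySQL; names are hypothetical.
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        -- Only rows that actually have JSON get an entry in the side table.
        CREATE TABLE item_json (
            item_id INTEGER PRIMARY KEY REFERENCES items(id),
            doc TEXT NOT NULL
        );
        INSERT INTO items VALUES (1, 'a'), (2, 'b'), (3, 'c');
        INSERT INTO item_json VALUES (2, '{"k": 1}');
    """)
    rows = conn.execute("""
        SELECT i.id, i.name, j.doc
        FROM items AS i
        LEFT JOIN item_json AS j ON j.item_id = i.id
        ORDER BY i.id
    """).fetchall()
    conn.close()
    return rows


print(demo_left_join())
# [(1, 'a', None), (2, 'b', '{"k": 1}'), (3, 'c', None)]
```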
When a TEXT or BLOB column is used in the InnoDB engine, the value stored in that field is stored off-page rather than in the page (the default InnoDB page size is 16KB).
My Questions are
1. What does off-page mean?
2. How is it accessed when retrieving the value from off-page?
(To continue with what rlanvin and fangxing started, and to say more about question 2.)
The ROW_FORMAT controls some of the details about which column(s) are stored 'off-page', and also whether the entire column is off-page or whether only the part after the first 767 bytes.
If any or all of the value is stored off-page, a 20-byte "pointer" is left behind. It contains various fields that together identify the 16KB page to go to in order to find the column value. All pages are 16KB (assuming the default; almost no one ventures into using a different page size). Hence, a big column could take multiple blocks.
The blocks come from 'extents', which are 1MB allocations (64 pages of 16KB) taken from the 'tablespace' that contains the table. Originally, ibdata* were the only tablespaces. Later, with innodb_file_per_table, a table could be in its own tablespace. In the near future (8.0), you will be able to define tablespaces and put whichever tables you desire into them.
When doing SHOW TABLE STATUS (and having file_per_table), notice that the Data_free is 4MB, 5MB, 6MB, or 7MB, reflecting the consumption of an extent.
Where are you headed with the question? One direction could be into optimizations.
If you don't SELECT that column, then the overflow pages don't need to be accessed. (Note: This is a strong argument against blindly using SELECT *.)
If some big columns could fit, but not all, some will be stored on-page; the rest will be off-page.
There are multiple ROW_FORMATs; each is slightly better for certain types of data.
If 767 bytes are held in the main page, it seems like LEFT(col, 10) would not have to get the extra pages. I suspect this optimization is missing.
There are more possible optimizations, many of which are not implemented.
Why is "off page" beneficial?
If you are searching for some rows, the search can (hopefully) take place only in the on-page blocks.
When you find the one (or few) row you want, only then do you incur the extra disk hits to reach into the off-page blocks.
A partial, crude, summary of ROW_FORMATs:
If the value (TEXT/BLOB/VARCHAR/VARBINARY) is shorter than 40 bytes, it is stored on-page.
If the value is bigger than 40 bytes, but not really big, it is put either on-page or off-page, depending on whether there is enough room for it. That is, larger columns will be moved off-page until the rest fits.
If the value is really big, then COMPACT will put 768 bytes on-page and add a 20-byte 'pointer' to the rest (which will be off-page).
If the value is really big and the format is DYNAMIC or COMPRESSED, only the 20-byte pointer stays on-page.
KEY_BLOCK_SIZE has some impact on on/off-page.
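The crude summary can be turned into a small function that estimates how many bytes of one value stay in the main record. The 40/768/20 thresholds come from the summary; treating REDUNDANT like COMPACT follows the MySQL manual. The fits_on_page flag is a hypothetical stand-in for InnoDB's real "does the row still fit" check, and KEY_BLOCK_SIZE is ignored.

```python
POINTER_BYTES = 20  # pointer left on-page when (part of) a value is off-page
PREFIX_BYTES = 768  # on-page prefix kept by REDUNDANT/COMPACT for long values


def on_page_bytes(value_len, row_format, fits_on_page):
    """Crude model: bytes of one TEXT/BLOB/VARCHAR/VARBINARY value that stay
    in the main record. fits_on_page is a hypothetical flag standing in for
    InnoDB's real check of whether the row fits with the value inline."""
    fmt = row_format.upper()
    if fmt not in ("REDUNDANT", "COMPACT", "DYNAMIC", "COMPRESSED"):
        raise ValueError(f"unknown row format: {row_format}")
    if value_len < 40 or fits_on_page:
        return value_len  # short, or enough room: fully on-page
    if fmt in ("REDUNDANT", "COMPACT"):
        if value_len <= PREFIX_BYTES:
            return value_len  # no longer than the prefix: stays inline
        return PREFIX_BYTES + POINTER_BYTES  # 768-byte prefix + pointer to the rest
    return POINTER_BYTES  # DYNAMIC/COMPRESSED: only the pointer stays on-page


print(on_page_bytes(10_000, "COMPACT", False))  # 788
print(on_page_bytes(10_000, "DYNAMIC", False))  # 20
```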
What does off-page mean?
It means that it is stored in a separate area, so it doesn't bloat the main index. This way, the size of the pages can stay identical and exact (the exact storage space required for all other types is known), and it makes accessing the data faster.
How is it accessed when retrieving the value from off-page?
A 20-byte pointer is stored and it contains the address of the rest of the column.
PS. I think this question would be better suited for DBA
From https://dev.mysql.com/doc/refman/5.5/en/innodb-compression-internals.html
In an InnoDB table, BLOB, VARCHAR, and TEXT columns that are not part
of the primary key may be stored on separately allocated overflow
pages. We refer to these columns as off-page columns. Their values are
stored on singly-linked lists of overflow pages.
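To make "singly-linked lists of overflow pages" concrete for question 2, here is a toy model: the 20-byte pointer is reduced to just the first page number, and reading the value means following each page's "next" link until the chain ends. PAGE_PAYLOAD, the page numbering and the (chunk, next_page_no) layout are all hypothetical; real overflow pages have their own headers and need not be consecutive.

```python
PAGE_PAYLOAD = 16 * 1024 - 64  # hypothetical usable bytes per 16KB overflow page


def write_overflow(pages, value, first_page_no):
    """Split value across pages; each page stores (chunk, next_page_no)."""
    chunks = [value[i:i + PAGE_PAYLOAD] for i in range(0, len(value), PAGE_PAYLOAD)]
    chunks = chunks or [b""]
    page_no = first_page_no
    for k, chunk in enumerate(chunks):
        next_no = page_no + 1 if k < len(chunks) - 1 else None  # None ends the list
        pages[page_no] = (chunk, next_no)
        page_no += 1
    return first_page_no  # what the on-page pointer would lead to


def read_overflow(pages, first_page_no):
    """Follow the singly-linked list from the page named by the pointer."""
    parts, page_no = [], first_page_no
    while page_no is not None:
        chunk, page_no = pages[page_no]
        parts.append(chunk)
    return b"".join(parts)


pages = {}
value = b"x" * 40_000
start = write_overflow(pages, value, 100)
print(len(pages))                            # 3
print(read_overflow(pages, start) == value)  # True
```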
I know that changing a table with fixed width rows to have variable width rows (by changing a CHAR column to a VARCHAR) has performance implications.
However my question is, given a preexisting table with variable width rows (due to many VARCHAR columns), and thus with that performance penalty already paid, would adding another variable length column further impact performance?
My hunch is that it wouldn't, the biggest performance penalty would be switching from fixed width rows to variable width rows and that adding another variable width column would have a negligible impact.
Yes and no. It is true that variable-width character columns are slightly slower than fixed-width character columns. But the "penalty" (or performance cost) is cumulative and per column. So every column you add to your query (fixed width or otherwise) is going to impact performance (as you query more data, it takes longer to fetch all of it).
Each variable-length column you add to the table makes retrieving the data slower.
Another consideration is whether the variable-length columns are part of the query (filter/WHERE clause) and whether you are going to use them in indexes. Variable-length fields in an index also add to the index overhead. For details, you will need to look at the documentation of the particular database you are using, e.g. http://dev.mysql.com/doc/refman/5.6/en/innodb-table-and-index.html
Sure. Extracting a data row into the component fields will take a few extra cycles.
That, however, will be more than offset by the almost certain reduction in row size — meaning more rows per data page and thus faster lookups across the board.
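The "more rows per data page" argument is simple arithmetic. The sketch below assumes a 16KB page and ignores page headers and per-row overhead; the 200-byte and 120-byte row sizes are made-up examples.

```python
PAGE_SIZE = 16 * 1024  # default InnoDB page size; overhead ignored in this sketch


def rows_per_page(avg_row_bytes, page_size=PAGE_SIZE):
    return page_size // avg_row_bytes


def pages_needed(n_rows, avg_row_bytes):
    per_page = rows_per_page(avg_row_bytes)
    return -(-n_rows // per_page)  # ceiling division


print(rows_per_page(200), pages_needed(1_000_000, 200))  # 81 12346
print(rows_per_page(120), pages_needed(1_000_000, 120))  # 136 7353
```

Under these assumptions, shrinking the average row from 200 to 120 bytes cuts the pages a full scan must read by roughly 40%.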
It will make a tiny (measured in microseconds) difference to data retrieval performance, BUT the human performance impact of using the wrong datatype just to squeeze every last drop out of the database could be large and therefore costly.
Use the datatype most appropriate for the attribute you're persisting in the database.
Don't be driven by "performance"; be driven by the usual guidelines for software development, like readability, maintainability, usability, etc.
Use the wrong datatype and your code will be more complex (possibly losing more performance than you gained), and you'll regret ever doing it. And I doubt you would ever notice those gains anyway.
Only do such things when you have proof that there's a problem, and the problem is large enough to matter. Doing what you're proposing is called "premature optimization", and is probably the worst design strategy there is.
I've heard (from a colleague, who heard it from another developer) that VARCHAR columns should always be put at the end of a table definition in MySQL, because they are variable in length and could therefore slow down queries.
The research I've done on Stack Overflow seems to contradict this, however, and suggests that column order is important, while there is varying agreement on how much this applies to VARCHARs.
He wasn't specific about storage engines, or about whether this only applied to VARCHAR columns which are infrequently accessed.
Asking that question about "MySQL" is not helpful, as MySQL relegates storage to storage engines, and they implement storage in very different ways. It makes sense to ask this question for any individual storage engine.
In the MEMORY engine, variable length data types do not exist. A VARCHAR is silently changed into a CHAR. In the context of your question: It does not matter where in a table definition you put your VARCHAR.
In the MyISAM engine, if a table has no variable-length data whatsoever (VARCHAR, VARBINARY or any TEXT or BLOB type), it is of the FIXED variant of MyISAM, that is, records have a fixed byte length. This can have performance implications, especially if data is deleted and inserted repeatedly (i.e. the table is not append-only). As soon as any variable-length data type is part of a table definition, it becomes the DYNAMIC variant of MyISAM, and MyISAM internally changes all but the shortest CHAR types to VARCHAR. Again, the position and even the definition of CHAR/VARCHAR do not matter.
In the InnoDB engine, data is stored in pages of 16 KB size. A page has a page footer with a checksum, and a page header, with among other things a page directory. The page directory contains for each row the offset of that row relative to the beginning of the page. A page also contains free space, and all I/O is done in pages.
Hence InnoDB can, as long as there is free space in a page, grow VARCHAR in place, and move rows around inside a page, without incurring any additional I/O. Also, since all rows are being addressed as (pagenumber, page directory entry), movement of a row inside a page is localized to the page and not visible from the outside.
It also means that for InnoDB too, the order of columns inside a row does not matter at all.
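A toy page model shows why moving a row inside a page is invisible from the outside: callers hold (page_no, slot), and only the page directory knows the byte offset. This is a hypothetical simplification (one directory entry per row, a bump allocator for free space, no reorganization of the holes left behind), not InnoDB's actual page layout.

```python
PAGE_SIZE = 16 * 1024


class ToyPage:
    """Hypothetical model: rows are addressed externally as (page_no, slot);
    the page directory maps slot -> (offset, length) inside the page."""

    def __init__(self, page_no):
        self.page_no = page_no
        self.data = bytearray(PAGE_SIZE)
        self.directory = {}  # slot -> (offset, length)
        self.free_start = 0  # simplistic bump allocator over the page's free space

    def _alloc(self, length):
        if self.free_start + length > PAGE_SIZE:
            # In this toy, running out of room is where extra I/O (a page split) would start.
            raise RuntimeError("page full")
        offset = self.free_start
        self.free_start += length
        return offset

    def insert(self, slot, row):
        offset = self._alloc(len(row))
        self.data[offset:offset + len(row)] = row
        self.directory[slot] = (offset, len(row))
        return (self.page_no, slot)  # the external address never includes the offset

    def update(self, slot, row):
        offset, length = self.directory[slot]
        if len(row) > length:
            offset = self._alloc(len(row))  # grown row moves within the same page
        self.data[offset:offset + len(row)] = row
        self.directory[slot] = (offset, len(row))  # only the directory changes

    def read(self, slot):
        offset, length = self.directory[slot]
        return bytes(self.data[offset:offset + length])


page = ToyPage(7)
addr = page.insert(0, b"short")
page.insert(1, b"other")
page.update(0, b"a much longer value")
print(addr, page.read(0))  # (7, 0) b'a much longer value'
```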
These are the three storage engines that are most commonly used with MySQL, and order of columns does not matter for any of these three. It may be that other, more exotic storage engines exist for which this is not true.
It does not matter. And some engines store varlena types in a separate area (e.g. TOAST in Postgres).
Moreover, the logical order (what you see when you select *) may actually differ from the physical order (how it's stored, which is based on the order in which you've created the actual columns using subsequent alter table statements).
http://www.sqlskills.com/BLOGS/PAUL/post/Inside-the-Storage-Engine-Anatomy-of-a-record.aspx