When a TEXT or BLOB column is used with the InnoDB engine, the value stored in that field is stored off-page rather than in the page. (The default size of an InnoDB page is 16KB.)
My questions are:
1. What does off-page mean?
2. How is the value accessed when it is retrieved from off-page?
(To continue with what rlanvin and fangxing started, and to say more about question 2.)
The ROW_FORMAT controls some of the details about which column(s) are stored 'off-page', and also whether the entire column is off-page or only the part after the first 768 bytes.
If any or all of a column is stored off-page, a 20-byte "pointer" is left behind. This contains various fields that have the effect of identifying the 16KB page to go to in order to find the column value. All pages are 16KB (assuming the default; almost no one ventures into having a different page size). Hence, a big column could take multiple blocks.
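As a rough illustration of "a big column could take multiple blocks", here is a minimal sketch of the arithmetic. It assumes the whole 16KB page is usable for column data, which is not quite true (real pages carry headers), and the function name is made up for this example.

```python
import math

PAGE_SIZE = 16 * 1024  # default InnoDB page size in bytes


def overflow_pages(off_page_bytes: int, page_size: int = PAGE_SIZE) -> int:
    """Rough count of pages needed for the off-page part of one column value.

    Assumption: every byte of a page can hold column data. Real pages have
    headers and trailers, so the true count can be slightly higher.
    """
    if off_page_bytes <= 0:
        return 0
    return math.ceil(off_page_bytes / page_size)


print(overflow_pages(100_000))  # prints 7
```

So a 100,000-byte TEXT value stored entirely off-page would need about seven extra page reads, on top of the read of the page that holds the 20-byte pointer.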
The blocks come from 'extents', which are 1MB allocations (64 pages at the default 16KB page size) taken from the 'tablespace' that contains the table. Originally, ibdata* were the only tablespaces. Later, with innodb_file_per_table, a table could be in its own tablespace. In the near future (8.0), you will be able to define tablespaces and put whichever tables you desire into them.
When doing SHOW TABLE STATUS (and having file_per_table), notice that the Data_free is 4MB, 5MB, 6MB, or 7MB, reflecting the consumption of an extent.
Where are you headed with the question? One direction could be into optimizations.
If you don't SELECT that column, then the overflow pages don't need to be accessed. (Note: This is a strong argument against blindly using SELECT *.)
If some big columns could fit, but not all, some will be stored on-page; the rest will be off-page.
There are multiple ROW_FORMATs; each is slightly better for certain types of data.
If 768 bytes are held in the main page, it seems like LEFT(col, 10) would not have to fetch the extra pages. I suspect this optimization is missing.
There are more possible optimizations, many of which are not implemented.
Why is "off page" beneficial?
If you are searching for some rows, the search can (hopefully) take place only in the on-page blocks.
When you find the one (or few) row you want, only then do you incur the extra disk hits to reach into the off-page blocks.
A partial, crude, summary of ROW_FORMATs:
If the value (TEXT/BLOB/VARCHAR/VARBINARY) is shorter than 40 bytes, it is stored on-page.
If the value is bigger than 40 bytes, but not really big, it is either put on-page or off-page, depending on whether there is enough room for it. That is, larger columns will be moved off-page until the rest will fit.
If the value is really big, then COMPACT will put 768 bytes on-page, and add a 20-byte 'pointer' to the rest (which will be off-page).
If the value is really big and the format is DYNAMIC or COMPRESSED, then there is only the 20-byte pointer on-page.
KEY_BLOCK_SIZE has some impact on on/off-page.
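The crude summary above can be written down as a small decision function. This is only a sketch of that summary, not of InnoDB's real logic: the function name and the `room_on_page` parameter are invented for illustration, "really big" is modelled simply as "does not fit in the remaining room", and KEY_BLOCK_SIZE is ignored.

```python
POINTER_SIZE = 20    # bytes left on-page to locate the off-page data
PREFIX_SIZE = 768    # bytes COMPACT keeps on-page for a big value
SMALL_LIMIT = 40     # values shorter than this always stay on-page


def column_layout(value_len: int, row_format: str, room_on_page: int):
    """Return (on_page_bytes, off_page_bytes) for one column value.

    room_on_page is a hypothetical input: how many bytes are still free
    for this value in the record.
    """
    row_format = row_format.upper()
    if row_format not in ("COMPACT", "DYNAMIC", "COMPRESSED"):
        raise ValueError(f"unsupported row format: {row_format}")

    # Short values, and values that fit, stay entirely on-page.
    if value_len < SMALL_LIMIT or value_len <= room_on_page:
        return (value_len, 0)

    if row_format == "COMPACT":
        if value_len <= PREFIX_SIZE:
            # Assumption: nothing would be left over for the overflow page.
            return (value_len, 0)
        # 768-byte prefix plus pointer on-page, the rest off-page.
        return (PREFIX_SIZE + POINTER_SIZE, value_len - PREFIX_SIZE)

    # DYNAMIC / COMPRESSED: only the pointer stays on-page.
    return (POINTER_SIZE, value_len)
```

For example, `column_layout(5000, "COMPACT", 1000)` gives `(788, 4232)`, while the same value under DYNAMIC gives `(20, 5000)`.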
What does off-page mean?
It means that the value is stored in a separate area, so it doesn't bloat the main index. This way, the size of the pages can stay identical and exact (the exact storage space required for all other types is known), and it makes accessing the data faster.
How is it accessed when retrieving the value from off-page?
A 20-byte pointer is stored and it contains the address of the rest of the column.
PS. I think this question would be better suited for DBA
From https://dev.mysql.com/doc/refman/5.5/en/innodb-compression-internals.html
In an InnoDB table, BLOB, VARCHAR, and TEXT columns that are not part
of the primary key may be stored on separately allocated overflow
pages. We refer to these columns as off-page columns. Their values are
stored on singly-linked lists of overflow pages.
Related
I have a MariaDB InnoDB table with several million rows, but with short, fixed-width rows consisting of numbers and timestamps only.
We usually search, filter and sort the rows using any of the existing columns.
We want to add a column to store an associated "url" for each row. Ideally every row will have its URL.
We know for a fact that we won't be sorting, searching and filtering by the url column.
We don't mind truncating the URL to its first 255 bytes, so we are going to give it the VARCHAR type.
But of course that column's width would be variable. The whole record will become variable-width and the width of the original record will double in many cases.
We were considering the alternative of using a different, secondary table for storing the varchar.
We could join them when querying the data or, probably even more efficiently, just fetch the URLs for the page we are showing.
Would this approach be advisable?
Is there a better alternative that would also allow us to preserve performance?
Update: As user Bill Karwin noted in one comment below, InnoDB does not benefit from fixed width as much as MyISAM does, so the real issue here is about the size of the row and not so much about the fixed versus variable width discussion.
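The "secondary table, fetch only the URLs for the page being shown" idea from the question could look roughly like the sketch below. It uses Python's built-in sqlite3 module purely so the example runs on its own; it says nothing about MariaDB/InnoDB performance. The table and column names (`events`, `event_urls`, `amount`) are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Narrow main table: numbers and timestamps only.
    CREATE TABLE events (
        id     INTEGER PRIMARY KEY,
        ts     INTEGER NOT NULL,
        amount INTEGER NOT NULL
    );
    -- Side table holding the variable-length URL, one row per event.
    CREATE TABLE event_urls (
        event_id INTEGER PRIMARY KEY REFERENCES events(id),
        url      VARCHAR(255) NOT NULL
    );
""")
conn.executemany("INSERT INTO events VALUES (?, ?, ?)",
                 [(i, 1000 + i, i * 10) for i in range(1, 101)])
conn.executemany("INSERT INTO event_urls VALUES (?, ?)",
                 [(i, f"https://example.com/item/{i}") for i in range(1, 101)])


def fetch_page(conn, page, page_size=10):
    """Sort and paginate on the narrow table, then look up URLs by id."""
    rows = conn.execute(
        "SELECT id, ts, amount FROM events ORDER BY ts DESC LIMIT ? OFFSET ?",
        (page_size, page * page_size),
    ).fetchall()
    ids = [row[0] for row in rows]
    if not ids:
        return []
    placeholders = ",".join("?" * len(ids))
    urls = dict(conn.execute(
        f"SELECT event_id, url FROM event_urls WHERE event_id IN ({placeholders})",
        ids,
    ))
    return [(id_, ts, amount, urls.get(id_)) for id_, ts, amount in rows]


first_page = fetch_page(conn, 0)
```

Searching, filtering and sorting touch only the narrow table; the second query reads at most one page's worth of URLs by primary key.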
Assuming you have control over how the URL is generated, you may want to change it to a fixed-length form. YouTube video IDs, for instance, are always 11 characters long and base-64. This fixes the variable-length problem and avoids joining tables.
If changing URI generation is not an option, you have a few alternatives to make it fixed-length:
You could fill in the blanks with a special character to force every URL to be 255 characters long within the database, and remove it just before returning the URL. This is not a clean solution, but it makes DQL operations faster than joining.
You could fetch the url as you have stated, but beware that two http requests may be more time consuming than any other option with just one request.
You could join with another table only when the user requires it, as opposed to it being the default.
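For the padding alternative in the list above, the application-side half might be sketched like this. The pad character is an assumption (a raw space, since it cannot appear unencoded in a URL), the helper names are hypothetical, and the sketch counts characters rather than bytes, which only matches for ASCII URLs.

```python
URL_WIDTH = 255
PAD_CHAR = " "  # hypothetical choice: a raw space is not valid inside a URL


def pad_url(url: str, width: int = URL_WIDTH) -> str:
    """Truncate to `width` characters, then right-pad to exactly `width`."""
    return url[:width].ljust(width, PAD_CHAR)


def unpad_url(stored: str) -> str:
    """Strip the padding again just before returning the URL to the caller."""
    return stored.rstrip(PAD_CHAR)


stored = pad_url("https://example.com/some/path")
```

Every stored value is then exactly 255 characters, at the cost of the wasted space the answer already warns about.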
Consider that having variable length may not be as big a problem, depending on your needs. The only issue might be if you're grossly oversizing fields, but it doesn't seem to be your case.
There are 210 columns in my table, with around 10,000 rows. Each row is unique and there is a primary key on the table. The thing is, we always have to run a select-all query on the table to get the data for all the sites.
Currently, the problem is that it takes too much time; the data returned is around 10 MB, and it will be larger in the future.
The table has varchar, text and date types in it.
Is there any way I can modify the structure or something to make my retrieval faster? More indexing, or breaking down the table? (Although I think denormalized data is good for retrieval.)
Update: "why do wider tables slow down the query performance?"
Thanks..!
why do wider tables slow down the query performance?
InnoDB stores "wide" tables in a different way. Instead of having all the columns together in a single string (plus overhead, such as lengths, etc), it does the following:
If the total of all the columns for a given row exceeds about 8KB, it will move some of the data to another ("off-record") storage area.
Which columns are moved off-record depends on the sizes of the columns, etc.
The details depend on the ROW_FORMAT chosen.
"Off-record" is another 16KB block (or blocks).
Later, when doing SELECT * (or at least fetching the off-record column(s)), it must do another disk fetch.
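The "move some columns off-record until the row fits" behaviour described above can be pictured with a greedy sketch. This is not InnoDB's actual algorithm: it assumes every listed column is eligible to move, that the largest columns go first, that each moved column leaves a 20-byte pointer behind, and that the limit is exactly 8KB. All names are made up.

```python
ROW_LIMIT = 8 * 1024   # "about 8KB", taken here as exactly 8192 bytes
POINTER_SIZE = 20      # assumed size of what stays behind for a moved column


def spill_columns(column_sizes: dict, row_limit: int = ROW_LIMIT):
    """Return (on_record_sizes, off_record_names) for one row.

    column_sizes maps a column name to the byte length of its value.
    """
    on_record = dict(column_sizes)
    off_record = []
    # Consider the biggest values first.
    for name in sorted(column_sizes, key=column_sizes.get, reverse=True):
        if sum(on_record.values()) <= row_limit:
            break  # the row fits now
        if column_sizes[name] <= POINTER_SIZE:
            break  # moving this (or anything smaller) would not save space
        on_record[name] = POINTER_SIZE
        off_record.append(name)
    return on_record, off_record


layout, moved = spill_columns({"id": 4, "title": 200, "body": 6000, "notes": 5000})
```

Here only `body` is moved: once it is replaced by a pointer the row totals 5,224 bytes and fits, so `notes` stays in the record and costs no extra fetch.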
What to do?
Rethink having so many columns.
Consider "vertical partitioning", wherein you have another table(s) that contains selected TEXT columns. Suggest picking groups of columns based on access patterns in your app.
For columns that are usually quite long, consider compressing them in the client and storing into a BLOB instead of a TEXT. Most "text" shrinks 3:1. Blobs are sent off-record the same as Texts, however, these compressed blobs would be smaller, hence less likely to spill.
Do more processing in SQL -- to avoid returning all the rows, or to avoid returning the full text, etc. When blindly shoveling lots of text to a client, the network and client become a significant factor in the elapsed time, not just the SELECT itself.
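The "compress in the client and store into a BLOB" suggestion from the list above might look like this on the application side. It is a minimal sketch using Python's standard zlib module; the helper names are hypothetical, and the actual ratio depends entirely on the data.

```python
import zlib


def pack_text(text: str) -> bytes:
    """Compress text in the client; the result goes into a BLOB column."""
    return zlib.compress(text.encode("utf-8"))


def unpack_text(blob: bytes) -> str:
    """Reverse of pack_text, applied after reading the BLOB back."""
    return zlib.decompress(blob).decode("utf-8")


sample = "InnoDB moves long values to off-record storage. " * 200
blob = pack_text(sample)
```

The column must be a BLOB (not TEXT) because the compressed bytes are not valid character data. The trade-off is that the server can no longer search or compare the column's contents.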
I've heard (from a colleague, who heard it from another developer) that VARCHAR columns should always be put at the end of a table definition in MySQL, because they are variable in length and could therefore slow down queries.
The research I've done on Stack Overflow seems to contradict this, however, and suggests that column order is important, while there is varying agreement on how much this applies to VARCHARs.
He wasn't specific about storage engines, or about whether this only applied to VARCHAR columns which are infrequently accessed.
Asking that question about "MySQL" is not helpful, as MySQL relegates storage to storage engines, and they implement storage in very different ways. It makes sense to ask this question for any individual storage engine.
In the MEMORY engine, variable length data types do not exist. A VARCHAR is silently changed into a CHAR. In the context of your question: It does not matter where in a table definition you put your VARCHAR.
In the MyISAM engine, if a table has no variable length data whatsoever (VARCHAR, VARBINARY or any TEXT or BLOB type) it is of the FIXED variant of MyISAM, that is, records have a fixed byte length. This can have performance implications, especially if data is deleted and inserted repeatedly (i.e. the table is not append only). As soon as any variable length data type is part of a table definition it becomes the DYNAMIC variant of MyISAM, and MyISAM internally changes any but the shortest CHAR type to VARCHAR. Again, position and even definition of CHAR/VARCHAR do not matter.
In the InnoDB engine, data is stored in pages of 16 KB size. A page has a page footer with a checksum, and a page header, with among other things a page directory. The page directory contains for each row the offset of that row relative to the beginning of the page. A page also contains free space, and all I/O is done in pages.
Hence InnoDB can, as long as there is free space in a page, grow VARCHAR in place, and move rows around inside a page, without incurring any additional I/O. Also, since all rows are being addressed as (pagenumber, page directory entry), movement of a row inside a page is localized to the page and not visible from the outside.
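A toy model may make the "(pagenumber, page directory entry)" addressing easier to picture. It follows the simplified description above (one directory entry per row) and is not InnoDB's real page layout; the class and method names are invented, and space freed by a moved row is simply not reclaimed.

```python
class Page:
    """Toy 16KB page: a byte buffer plus a directory of (offset, length)."""

    def __init__(self, page_number: int, size: int = 16 * 1024):
        self.page_number = page_number
        self.data = bytearray(size)
        self.directory = []   # slot number -> (offset, length)
        self.free_start = 0   # first unused byte in the page

    def _append(self, row: bytes) -> int:
        """Copy row into free space and return its offset."""
        if self.free_start + len(row) > len(self.data):
            raise ValueError("page full")
        offset = self.free_start
        self.data[offset:offset + len(row)] = row
        self.free_start += len(row)
        return offset

    def insert(self, row: bytes):
        """Store a row; return its external address (page number, slot)."""
        offset = self._append(row)
        self.directory.append((offset, len(row)))
        return (self.page_number, len(self.directory) - 1)

    def update(self, slot: int, new_row: bytes) -> None:
        """Rewrite a row elsewhere in the page; the slot number is unchanged."""
        offset = self._append(new_row)
        self.directory[slot] = (offset, len(new_row))

    def read(self, slot: int) -> bytes:
        offset, length = self.directory[slot]
        return bytes(self.data[offset:offset + length])


page = Page(page_number=7)
addr = page.insert(b"alice|short")
page.insert(b"bob|value")
page.update(addr[1], b"alice|a much longer value than before")
```

After the update, the row lives at a different offset, but anything outside the page that refers to it as `(7, 0)` still finds it. That is the sense in which movement inside a page is invisible from the outside.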
It also means that for InnoDB too, the order of columns inside a row does not matter at all.
These are the three storage engines that are most commonly used with MySQL, and order of columns does not matter for any of these three. It may be that other, more exotic storage engines exist for which this is not true.
It does not matter. And some engines store varlena types in a separate area (e.g. TOAST in Postgres).
Moreover, the logical order (what you see when you select *) may actually differ from the physical order (how it's stored, which is based on the order in which you've created the actual columns using subsequent alter table statements).
http://www.sqlskills.com/BLOGS/PAUL/post/Inside-the-Storage-Engine-Anatomy-of-a-record.aspx
With MyISAM, having variable-length columns (varchar, blob) in the table really slowed queries, so I came across advice on the net to move varchar columns into a separate table.
Is that still an issue with InnoDB? I don't mean cases where introducing many varchar rows into the table causes page splits. I just mean: should you consider, for example, moving post_text (a single BLOB field in the table) into another table, speaking performance-wise about InnoDB?
As far as I know BLOBs (and TEXTs) are actually stored outside of the table, VARCHARs are stored in the table.
VARCHARs are bad for read performance because each record can be of variable length and that makes it more costly to find fields in a record.
BLOBs are slow because the value has to be fetched separately and it will very likely require another read from disk or cache.
To my knowledge InnoDB doesn't do anything differently in this respect so I would assume the performance characteristics hold.
I don't think moving BLOB values really helps - other than reducing overall table size which has a positive influence on performance regardless.
VARCHARs are a different story. You will definitely benefit here. If all your columns are of defined length (and I guess that means you can't use BLOBs either?) the field lookup will be faster.
If you're just 'reading' the VARCHAR and BLOB fields, I'd say this is worth a shot. But if your select query needs to compare a value from a VARCHAR or a BLOB, you're out of luck.
So yes you can definitely gain performance here but make sure you test that you're actually gaining performance and that the increase is worth the aggressive denormalization.
PS.
Another way of 'optimizing' VARCHAR read performance is to simply replace them by CHAR fields (of fixed length). This could benefit read performance, so long as the increase in disk space is acceptable.
InnoDB stores data completely differently than MyISAM.
In MyISAM, all indexes (primary or otherwise) are stored in the MYI file and contain a pointer to the data stored in the MYD file. Variable-length rows shouldn't directly affect query speed, but the MYD file does tend to get more fragmented with variable-length rows, because the hole left behind when you delete a row can't necessarily be filled in with the row you insert next. If you update a variable-length value to make it longer, you might have to move it somewhere else, which means it will tend to get out of order with respect to the indexes over time, making range queries slower (if you're running it on a spinning disk, where seek times are important).
InnoDB stores data clustered in pages in a B-tree on the primary key. So long as the data will fit in a page it is stored in the page whether you're using a BLOB or VARCHAR. As long as you aren't trying to insert inordinately long values on a regular basis it shouldn't matter whether your rows are fixed-length or variable-length.
MySQL specifies the row format of a table as either fixed or dynamic, depending on the column data types. If a table has a variable-length column data type, such as TEXT or VARCHAR, the row format is dynamic; otherwise, it's fixed.
My question is, what's the difference between the two row formats? Is one more efficient than the other?
The difference really only matters for MyISAM, other storage engines do not care about the difference.
EDIT : Many users commented that InnoDB does care: link 1 by steampowered, link 2 by Kaan.
With MyISAM with fixed width rows, there are a few advantages:
No row fragmentation: It is possible with variable width rows to get single rows split into multiple sections across the data file. This can increase disk seeks and slow down operations. It is possible to defrag it with OPTIMIZE TABLE, but this isn't always practical.
Data file pointer size: In MyISAM, there is a concept of a data file pointer, which is used when it needs to reference the data file. For example, this is used in indexes when they refer to where the row actually is. With fixed-width rows, this pointer is based on the row offset in the file (i.e. rows are 1, 2, 3 regardless of their size). With variable width, the pointer is based on the byte offset (i.e. rows might be 1, 57, 163). The result is that with large tables, the pointer needs to be larger, which then adds potentially a lot more overhead to the table.
Easier to fix in the case of corruption. Since every row is the same size, if your MyISAM table gets corrupted it is much easier to repair, so you will only lose data that is actually corrupted. With variable width, in theory it is possible that the variable width pointers get messed up, which can result in hosing data in a bad way.
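The data file pointer point above is easy to check with arithmetic. The sketch below only counts how many whole bytes are needed to address the last row, either by row number or by byte offset. It does not reproduce how MyISAM actually picks its pointer size, and the table dimensions are invented.

```python
def pointer_bytes(max_value: int) -> int:
    """Smallest number of whole bytes that can represent max_value."""
    return max(1, (max_value.bit_length() + 7) // 8)


rows = 50_000_000   # hypothetical table size
row_size = 100      # hypothetical average row length in bytes

# Fixed-width rows: the pointer is a row number (0 .. rows - 1).
fixed_ptr = pointer_bytes(rows - 1)

# Variable-width rows: the pointer is a byte offset into the data file.
dynamic_ptr = pointer_bytes(rows * row_size - 1)

print(fixed_ptr, dynamic_ptr)  # prints 4 5
```

One extra byte per pointer looks small, but it is repeated in every entry of every index on the table.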
Now the primary drawback of fixed width is that it wastes more space. For example, you need to use CHAR fields instead of VARCHAR fields, so you end up with extra space taken up.
Normally, you won't have much choice in the format, since it is dictated by the schema. However, if you only have a few varchars or a single blob/text, it might be worth trying to optimize towards this. For example, consider switching the only varchar into a char, or splitting the blob into its own table.
You can read even more about this at:
http://dev.mysql.com/doc/refman/5.0/en/static-format.html
http://dev.mysql.com/doc/refman/5.0/en/dynamic-format.html
One key difference occurs when you update a record. If the row format is fixed, there is no change in the length of the record. In contrast, if the row format is dynamic and the new data causes the record to increase in length, a link (the overflow pointer) is used to point to the "overflow" data.
This fragments the table and generally slows things down. There is a command to defragment (OPTIMIZE TABLE), which somewhat mitigates the issue.
This page in MySQL's documentation seems to contradict the top answer here, in that DYNAMIC row format means something for InnoDB tables as well:
https://dev.mysql.com/doc/refman/5.7/en/innodb-row-format.html
Fixed means that every row is exactly the same size. That means that if the 3rd row on a data page needs to be loaded, it will be at exactly PageHeader+2*RowSize, saving some access time.
In order to find the beginning of a dynamic record, the list of record offsets must be consulted, which involves an extra indirection.
In short, yes, there's a slight performance hit for dynamic rows. No, it's not a very big one. If you think it will be a problem, test for it.
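The difference between the two lookups described in this answer can be sketched in a few lines. The header size, row size and offsets below are made-up numbers, not values from any real storage engine.

```python
def fixed_row_offset(page_header: int, row_size: int, row_index: int) -> int:
    """Fixed rows: the position is pure arithmetic (row_index is 0-based)."""
    return page_header + row_index * row_size


def dynamic_row_offset(record_offsets: list, row_index: int) -> int:
    """Dynamic rows: the position must be looked up in a list of offsets."""
    return record_offsets[row_index]


# Third row (index 2) with a hypothetical 120-byte header and 50-byte rows:
third_fixed = fixed_row_offset(page_header=120, row_size=50, row_index=2)

# The same row when the first two rows are 37 and 81 bytes long:
third_dynamic = dynamic_row_offset([120, 157, 238], 2)
```

The fixed case computes `120 + 2 * 50 = 220` directly; the dynamic case needs one extra memory read into the offset list, which is the "extra indirection" mentioned above.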
Fixed should be faster and more secure than dynamic, with the drawback of having a fixed char length.
You can find this information here: http://dev.mysql.com/doc/refman/5.0/en/static-format.html