With MyISAM, having variable-length columns (VARCHAR, BLOB) on a table really slowed queries, so much so that I encountered advice on the net to move the VARCHAR columns into a separate table.
Is that still an issue with InnoDB? I don't mean cases where inserting many variable-length rows into the table causes page splits. I just mean: should you consider, for example, moving post_text (a single BLOB field in the table) into another table, speaking performance-wise about InnoDB?
As far as I know, BLOBs (and TEXTs) are actually stored outside of the table, while VARCHARs are stored in the table.
VARCHARs are bad for read performance because each record can be of variable length and that makes it more costly to find fields in a record.
BLOBs are slow because the value has to be fetched separately and it will very likely require another read from disk or cache.
To my knowledge InnoDB doesn't do anything differently in this respect so I would assume the performance characteristics hold.
I don't think moving BLOB values really helps - other than reducing overall table size which has a positive influence on performance regardless.
VARCHARs are a different story. You will definitely benefit here. If all your columns are of defined length (and I guess that means you can't use BLOBs either?) the field lookup will be faster.
If you're just reading the VARCHAR and BLOB fields, I'd say this is worth a shot. But if your select query needs to compare a value from a VARCHAR or a BLOB, you're pretty much out of luck.
So yes, you can definitely gain performance here, but make sure you test that you're actually gaining performance and that the increase is worth the aggressive denormalization.
PS.
Another way of 'optimizing' VARCHAR read performance is to simply replace them with CHAR fields (of fixed length). This could benefit read performance, so long as the increase in disk space is acceptable.
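For instance, a minimal sketch of that trade (the table and column names are made up; note that CHAR is only truly fixed-width under a single-byte character set such as latin1):

-- Hypothetical: a column whose values are almost always near the
-- maximum length anyway, so padding to a fixed width costs little.
ALTER TABLE users MODIFY username CHAR(32) NOT NULL;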
InnoDB stores data completely differently than MyISAM.
In MyISAM, all indexes--primary or otherwise--are stored in the MYI file and contain a pointer to the data stored in the MYD file. Variable-length rows shouldn't directly affect query speed, but the MYD file does tend to get more fragmented with variable-length rows, because the hole left behind when you delete a row can't necessarily be filled by the row you insert next. And if you update a variable-length value to make it longer, you might have to move it somewhere else, which means the data will tend to get out of order with respect to the indexes over time, making range queries slower (if you're running on a spinning disk, where seek times matter).
InnoDB stores data clustered in pages in a B-tree on the primary key. So long as the data fits in a page, it is stored in the page whether you're using a BLOB or a VARCHAR. As long as you aren't trying to insert inordinately long values on a regular basis, it shouldn't matter whether your rows are fixed-length or variable-length.
Related
There are 210 columns in my table, with around 10,000 rows. Each row is unique and there is a primary key on the table. The thing is, we always have to run a select-all query on the table to get data for all the sites.
Currently, the problem is that it takes too much time; the data returned is around 10 MB and will get larger in the future.
The table has varchar, text and date types in it.
Is there any way I can modify the structure to make retrieval faster? More indexing, or breaking down the table? (Although I think denormalized data is good for retrieval.)
Update: "why do wider tables slow down the query performance?"
Thanks!
why do wider tables slow down the query performance?
InnoDB stores "wide" tables in a different way. Instead of having all the columns together in a single string (plus overhead, such as lengths, etc), it does the following:
If the total of all the columns for a given row exceeds about 8KB, it will move some of the data to another ("off-record") storage area.
Which columns are moved off-record depends on the sizes of the columns, etc.
The details depend on the ROW_FORMAT chosen (see the sketch after this list).
"Off-record" is another 16KB block (or blocks).
Later, when doing SELECT * (or at least fetching the off-record column(s)), it must do another disk fetch.
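A minimal sketch of pinning down the row format and checking what a table actually uses (the table and schema names here are hypothetical):

-- ROW_FORMAT=DYNAMIC stores a spilled column fully off-record,
-- leaving only a 20-byte pointer in the row itself.
CREATE TABLE posts (
    id INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
    title VARCHAR(200),
    body TEXT
) ENGINE=InnoDB ROW_FORMAT=DYNAMIC;

-- See which row format a table actually got:
SELECT TABLE_NAME, ROW_FORMAT
FROM information_schema.TABLES
WHERE TABLE_SCHEMA = 'mydb' AND TABLE_NAME = 'posts';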
What to do?
Rethink having so many columns.
Consider "vertical partitioning", wherein you have another table(s) that contains selected TEXT columns. Suggest picking groups of columns based on access patterns in your app.
For columns that are usually quite long, consider compressing them in the client and storing into a BLOB instead of a TEXT. Most "text" shrinks 3:1. Blobs are sent off-record the same as Texts; however, these compressed blobs would be smaller, hence less likely to spill.
Do more processing in SQL -- to avoid returning all the rows, or to avoid returning the full text, etc. When blindly shoveling lots of text to a client, the network and client become a significant factor in the elapsed time, not just the SELECT itself (see the sketch below).
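As a hedged sketch of that last point (the table and column names are made up):

-- Send a short preview rather than the full TEXT, and cap the rows:
SELECT id,
       LEFT(body, 200)   AS preview,   -- first 200 characters only
       CHAR_LENGTH(body) AS full_len   -- lets the client show "read more"
FROM posts
ORDER BY id DESC
LIMIT 50;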
While searching through SO, I've found two contradicting answers (and even a comment pointing out the contradiction), but no definitive answer:
The problem is: is there any performance benefit, if you store a TEXT/BLOB field outside of a table?
We assume:
You SELECT correctly (only selecting the TEXT/BLOB when required, no SELECT *)
Tables are indexed properly, where it makes sense (so it's not a matter of 'if you index it')
The database design doesn't really matter. This is a question to identify MySQL's behaviour in this special case, not to solve certain database design problems. Let's assume this database has only one table (or two, if the TEXT/BLOB gets separated)
Used engine: InnoDB (others would be interesting too, if they behave differently)
This post states that putting the TEXT/BLOB into a separate table only helps if you're already SELECTing in a wrong way (always SELECTing the TEXT/BLOB even when it's not necessary) - basically stating that TEXT/BLOB in the same table is the better solution (less complexity, no performance hit, etc.) since the TEXT/BLOB is stored separately anyway:
The only time that moving TEXT columns into another table will offer any benefit is if there is a tendency to usually select all columns from tables. This is merely introducing a second bad practice to compensate for the first. It should go without saying that two wrongs are not the same as three lefts.
MySQL Table with TEXT column
This post, however, states that:
When a table has TEXT or BLOB columns, the table can't be stored in memory
Does that mean that it's already enough to have a TEXT/BLOB inside a table, to have a performance hit?
MySQL varchar(2000) vs text?
My Question basically is: What's the correct answer?
Does it really matter if you store TEXT/BLOB into a separate table, if you SELECT correctly?
Or does even having a TEXT/BLOB inside a table, create a potential performance hit?
Update: Barracuda is the default InnoDB file format since version 5.7.
If available on your MySQL version, use the InnoDB Barracuda file format using
innodb_file_format=barracuda
in your MySQL configuration and set up your tables using ROW_FORMAT=Dynamic (or Compressed) to actually use it.
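For example (a sketch with a made-up table name; on 5.6 and earlier, both DYNAMIC and COMPRESSED require the Barracuda file format):

-- Move an existing table to the Barracuda DYNAMIC row format:
ALTER TABLE posts ROW_FORMAT=DYNAMIC;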
Doing this will make InnoDB store BLOBs, TEXTs and bigger VARCHARs outside the row pages, and thus make things a lot more efficient. See this MySQLperformanceblog.com blog article for more information.
As far as I understand it, using the Barracuda format makes storing TEXT/BLOB/VARCHARs in separate tables no longer worthwhile for performance reasons. However, I think it's always good to keep proper database normalization in mind.
One performance gain is to have a table with fixed-length records. This would mean no variable-length fields like VARCHAR or TEXT/BLOB. With fixed-length records, MySQL doesn't need to "seek" the end of a record, since it knows the record size and can compute each record's offset. It also knows how much memory it needs to load X records. Tables with fixed-length records are less prone to fragmentation, since space made available by deleted records can be fully reused. MyISAM tables actually have a few other benefits from fixed-length records.
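A quick way to see which variant MyISAM picked for a table (a sketch; SHOW TABLE STATUS reports it in the Row_format column):

-- Only fixed-width types => Row_format: Fixed
CREATE TABLE t_fixed (id INT, code CHAR(8)) ENGINE=MyISAM;

-- Any VARCHAR/TEXT/BLOB makes it Dynamic
CREATE TABLE t_dyn (id INT, code VARCHAR(8)) ENGINE=MyISAM;

SHOW TABLE STATUS LIKE 't_%';  -- inspect the Row_format column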
Assuming you are using innodb_file_per_table, keeping the text/blob in a separate table will increase the likelihood that file system caching will be used, since that table will be smaller.
That said, this is a micro optimization. There are many other things you can do to get much bigger performance gains. For example, use SSD drives. It's not going to give you enough of a performance boost to push out the day of reckoning when your tables get so big you'll have to implement sharding.
You don't hear about databases using the "raw file system" anymore even though it can be much faster. "Raw" is when the database accesses the disk hardware directly, bypassing any file system. I think Oracle still supports this. But it's just not worth the added complexity, and you have to really know what you are doing. In my opinion, storing your text/blob in a separate table just isn't worth the added complexity for the possible performance gain. You really need to know what you are doing, and your access patterns, to take advantage of it.
I'm designing a new forum for my company, and I wanted to confirm that saving the forum posts in a MySQL database would be scalable and would have good performance.
The posts may be around 400 characters (maybe I will limit them to 400 chars). If I save 400 chars of text in a MySQL field, and the table has 10 million rows, will it affect performance?
My main constraint is performance. Can someone please shed light on this?
There are two data types to consider: VARCHAR or TEXT.
Which datatype you decide on depends on:
How frequently you display it
The total number of characters you store
TEXT and BLOB are stored off the table, with the table just having a pointer to the location of the actual storage.
VARCHAR is stored inline with the table. VARCHAR is faster when the size is reasonable, but which would actually be faster depends upon your data and your hardware; you'd want to benchmark a real-world scenario with your data.
VARCHAR (stored inline) is usually faster IF the data is frequently retrieved (included by most queries). However, for a large volume of data that is not normally retrieved (that is, not referenced by any query), then it may be better to not have the data stored inline. There is an upper limit on the row size, for data stored inline.
When a table has TEXT or BLOB columns, the table can't be stored in memory. This means every query (which doesn't hit cache) has to access the file system - which is orders of magnitude slower than memory.
If your post content is large, use a TEXT field, but store the TEXT field in a separate table that is only accessed when you actually need it. This way, the original table can be stored in memory and will be much faster.
Think of it as separating the data into one "memory table" and one "file table". The reason for doing this is to avoid accessing the filesystem except when necessary (i.e. only when you need the text).
You can try (posts, post_text) or (post_details, posts) or something like that.
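A minimal sketch of that split, using the (posts, post_text) naming from above (the exact columns are made up):

-- "Memory table": small rows, scanned for every listing.
CREATE TABLE posts (
    id INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
    author_id INT UNSIGNED NOT NULL,
    created_at DATETIME NOT NULL
) ENGINE=InnoDB;

-- "File table": the bulky text, fetched only when a post is displayed.
CREATE TABLE post_text (
    post_id INT UNSIGNED PRIMARY KEY,
    body TEXT NOT NULL,
    FOREIGN KEY (post_id) REFERENCES posts(id)
) ENGINE=InnoDB;

-- A listing touches only the small table:
SELECT id, author_id, created_at FROM posts ORDER BY created_at DESC LIMIT 20;

-- Displaying one post joins in the text:
SELECT p.id, t.body
FROM posts p
JOIN post_text t ON t.post_id = p.id
WHERE p.id = 42;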
I've heard (from a colleague, who heard it from another developer) that VARCHAR columns should always be put at the end of a table definition in MySQL, because they are variable in length and could therefore slow down queries.
The research I've done on Stack Overflow seems to contradict this, however, and suggests that column order is important, though with varying agreement on how much this applies to VARCHARs.
He wasn't specific about storage engines, or about whether this only applied to VARCHAR columns which are infrequently accessed.
Asking that question about "MySQL" is not helpful, as MySQL delegates storage to storage engines, and they implement storage in very different ways. It makes sense to ask this question for any individual storage engine.
In the MEMORY engine, variable length data types do not exist. A VARCHAR is silently changed into a CHAR. In the context of your question: It does not matter where in a table definition you put your VARCHAR.
In the MyISAM engine, if a table has no variable-length data whatsoever (VARCHAR, VARBINARY or any TEXT or BLOB type), it is of the FIXED variant of MyISAM, that is, records have a fixed byte length. This can have performance implications, especially if data is deleted and inserted repeatedly (i.e. the table is not append-only). As soon as any variable-length data type is part of a table definition, it becomes the DYNAMIC variant of MyISAM, and MyISAM internally changes any but the shortest CHAR type to VARCHAR. Again, position and even definition of CHAR/VARCHAR do not matter.
In the InnoDB engine, data is stored in pages of 16 KB size. A page has a page footer with a checksum, and a page header, with among other things a page directory. The page directory contains for each row the offset of that row relative to the beginning of the page. A page also contains free space, and all I/O is done in pages.
Hence InnoDB can, as long as there is free space in a page, grow VARCHAR in place, and move rows around inside a page, without incurring any additional I/O. Also, since all rows are being addressed as (pagenumber, page directory entry), movement of a row inside a page is localized to the page and not visible from the outside.
It also means that for InnoDB too, the order of columns inside a row does not matter at all.
These are the three storage engines that are most commonly used with MySQL, and order of columns does not matter for any of these three. It may be that other, more exotic storage engines exist for which this is not true.
It does not matter. And some engines store varlena types in a separate area (e.g. TOAST in Postgres).
Moreover, the logical order (what you see when you select *) may actually differ from the physical order (how it's stored, which is based on the order in which you've created the actual columns using subsequent alter table statements).
http://www.sqlskills.com/BLOGS/PAUL/post/Inside-the-Storage-Engine-Anatomy-of-a-record.aspx
I'm working on an application that will be implementing a hex value as a business key (in addition to an auto increment field as primary key) similar to the URL id seen in Gmail. I will be adding a unique constraint to the column and was originally thinking of storing the value as a bigint to get away from searching a varchar field but was wondering if that's necessary if the field is unique.
Internal joins would be done using the auto increment field and the hex value would be used in the where clause for filtering.
What sort of performance hit would there be in simply storing the value as a varchar(x), or perhaps a char(x) over the additional work in doing the conversion to and from hex to store the value as an integer in the database? Is it worth the additional complexity?
I did a quick test on a small number of rows (50k) and had similar search result times. If there is a large performance issue would it be linear, or exponential?
I'm using InnoDB as the engine.
Is your hex value a GUID? Although I used to worry about the performance of such long items as indexes, I have found that on modern databases the performance difference on even millions of records is fairly insignificant.
A potentially larger problem is the memory that the index consumes (16 byte vs 4 byte int, for example), but on servers that I control I can allocate for that. As long as the index can be in memory, I find that there is more overhead from other operations that the size of the index element doesn't make a noticeable difference.
On the upside, if you use a GUID you gain server independence for records created and more flexibility in merging data on multiple servers (which is something I care about, as our system aggregates data from child systems).
There is a graph in this article that seems to back up my suspicion: Myths, GUID vs Autoincrement
The hex value is generated from a UUID (Java's implementation); it's hashed and truncated to a smaller length (likely 16 characters). The algorithm for this is still under discussion (currently SHA). An advantage I see of storing the value in hex vs. integer is that if we needed to grow the size (which I don't see happening with this application at 16 chars), we could simply increase the truncated length and leave the old values without fear of collision. Converting to integer values wouldn't work as nicely for that.
The reason for the truncation vs. simply using a GUID/UUID is simply to make the URLs and APIs (which is where these will be used) more friendly.
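For what it's worth, a sketch of the two storage options being weighed (column names are hypothetical; 16 hex characters pack into 8 raw bytes):

-- Option A: store the hex string as-is.
CREATE TABLE items_a (
    id INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
    url_key CHAR(16) NOT NULL UNIQUE   -- e.g. '9f86d081884c7d65'
) ENGINE=InnoDB;

-- Option B: pack the 16 hex chars into 8 raw bytes.
CREATE TABLE items_b (
    id INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
    url_key BINARY(8) NOT NULL UNIQUE
) ENGINE=InnoDB;

INSERT INTO items_b (url_key) VALUES (UNHEX('9f86d081884c7d65'));
SELECT id FROM items_b WHERE url_key = UNHEX('9f86d081884c7d65');
SELECT LOWER(HEX(url_key)) FROM items_b WHERE id = 1;  -- back to the URL form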
All else being equal, keeping the data smaller will make it run faster. Mostly because it'll take less space, so less disk i/o, less memory needed to hold the index, etc etc. 50k rows isn't enough to notice that though...