Limiting the size of TEXT in MySQL for speed - mysql

I have a table with a field that needs to store 512 characters. My question is this: can you limit the length of a TEXT field (VARCHAR stops at 255 - I can't use it) to 512 characters? If I do that, will MySQL run through the table quicker due to the fixed data length, or does it act a bit like VARCHAR in that the length can vary?
I don't have a table (at all) yet, but I'm very conscious about speed and size for the future. I found this article: link text incredibly useful - maybe you will too!
Thanks a lot,
James

Update your MySQL server. Since 5.0.3, a VARCHAR can store up to 65,535 bytes.
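As a rough sanity check on that limit, the sketch below estimates how many characters one VARCHAR column could declare under a 65,535-byte row limit. The helper name, the 2-byte length prefix, and the assumption of a single NOT NULL column in the row are illustrative simplifications, not a description of MySQL internals.

```python
# Rough estimate only: assumes a single NOT NULL VARCHAR column in the row,
# a 2-byte length prefix, and the worst-case bytes per character of the charset.
ROW_LIMIT_BYTES = 65535
LENGTH_PREFIX_BYTES = 2


def max_varchar_chars(bytes_per_char):
    """Largest N for VARCHAR(N) that fits in the row limit (hypothetical helper)."""
    return (ROW_LIMIT_BYTES - LENGTH_PREFIX_BYTES) // bytes_per_char


for charset, width in [("latin1", 1), ("utf8 (3-byte)", 3), ("utf8mb4", 4)]:
    print(charset, max_varchar_chars(width))

# A 512-character value needs at most 512 * 4 bytes in utf8mb4,
# which is far below the limit.
print(512 * 4 <= ROW_LIMIT_BYTES - LENGTH_PREFIX_BYTES)
```

Under these assumptions a 512-character column fits comfortably in any of the listed character sets.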

You don't really say how you are using the table, so I'll just throw this in. One option is to separate out the text field into another table and just have a reference to it in your original table.

Related

mysql mediumtext & max row size

I am confused about two conflicting statements. I read repeatedly that the max size of a row is 64KB. So, for example, what does it mean to say that a mediumtext column can be 16MB if the whole row maxes out at 64KB? (I am using InnoDB.)
I have two related questions as well.
In practice, is a mediumtext field fixed or variable length? On dump it looks variable, but is it actually, or is the db record always allotting 16MB for it?
Is there a consensus that mediumtext is as useful as any smaller relative such as varchar, or is the general wisdom that it is better to use files for text over 60KB or so?
I know these are newbie questions, but I did a fair amount of searching before coming here, and I did not find answers. Your help will be appreciated.
Rows are not limited to 64KB, not even in any qualified way that I can think of. If you have a link to where it says that limit, let me know.
I have collected most of the 'hard' limits in MySQL here.
MEDIUMTEXT is implemented as a 3-byte length plus zero to 16MB bytes of text. The other TEXT and BLOB datatypes are similarly designed. They are all "variable length", not "fixed".
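The relationship between the length prefix and the maximum size can be sketched as follows. The 3-byte prefix for MEDIUMTEXT comes from the answer above; the 1-, 2- and 4-byte prefixes for the other types are filled in here on the assumption that they follow the same design, and the helper name is made up for illustration.

```python
# Maximum byte length implied by an n-byte length prefix: 2**(8*n) - 1.
# Prefix widths other than MEDIUMTEXT's are assumed to follow the same design.
PREFIX_BYTES = {"TINYTEXT": 1, "TEXT": 2, "MEDIUMTEXT": 3, "LONGTEXT": 4}


def max_length_bytes(prefix_bytes):
    """Largest length that an unsigned integer of prefix_bytes bytes can hold."""
    return 2 ** (8 * prefix_bytes) - 1


for name, n in PREFIX_BYTES.items():
    print(f"{name}: {max_length_bytes(n):,} bytes")
```

Only the prefix and the actual text are stored, which is why none of these types reserve their maximum size per row.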
In general, use the datatype that works best for the data and the application code manipulating it.
As it turns out, LONGTEXT and LONGBLOB are difficult to load directly because of a 'variable' max_allowed_packet which has a default of 16MB. I'm not sure whether you need to make sure that (innodb_log_file_size * innodb_log_files_in_group) > 10 * the largest TEXT you will have. (I don't think I have heard of anyone having such an issue.)
There is a slight advantage in declaring VARCHARs with a reasonable size, not the 255 that is commonly seen. There is probably no difference between larger varchars and TEXT, except for what the cutoff is.
File versus text/blob... This is a common question on the forum, usually relating to the storage of images in BLOBs. There are advantages/disadvantages of either approach.

Performance using varchar(750) or text [duplicate]

I've got a messages table in MySQL which records messages between users. Apart from the typical ids and message types (all integer types) I need to save the actual message text as either VARCHAR or TEXT. I'm setting a front-end limit of 3000 characters which means the messages would never be inserted into the db as longer than this.
Is there a rationale for going with either VARCHAR(3000) or TEXT? There's something about just writing VARCHAR(3000) that feels somewhat counter-intuitive. I've been through other similar posts on Stack Overflow but would be good to get views specific to this type of common message storing.
TEXT and BLOB may be stored off the table, with the table just having a pointer to the location of the actual storage. Where it is stored depends on lots of things like data size, column size, row_format, and MySQL version.
VARCHAR is stored inline with the table. VARCHAR is faster when the size is reasonable. Which one is faster overall depends on your data and your hardware, so you'd want to benchmark a real-world scenario with your data.
Can you predict how long the user input would be?
VARCHAR(X)
Max Length: variable, up to 65,535 bytes (64KB)
Case: user name, email, country, subject, password
TEXT
Max Length: 65,535 bytes (64KB)
Case: messages, emails, comments, formatted text, html, code, images, links
MEDIUMTEXT
Max Length: 16,777,215 bytes (16MB)
Case: large json bodies, short to medium length books, csv strings
LONGTEXT
Max Length: 4,294,967,295 bytes (4GB)
Case: textbooks, programs, years of log files, harry potter and the goblet of fire, scientific research logging
There's more information on this question.
Just to clarify the best practice:
Text format messages should almost always be stored as TEXT (they end up being arbitrarily long)
String attributes should be stored as VARCHAR (the destination user name, the subject, etc...).
I understand that you've got a front end limit, which is great until it isn't. *grin* The trick is to think of the DB as separate from the applications that connect to it. Just because one application puts a limit on the data, doesn't mean that the data is intrinsically limited.
What is it about the messages themselves that forces them to never be more than 3000 characters? If it's just an arbitrary application constraint (say, for a text box or something), use a TEXT field at the data layer.
Short answer: No practical, performance, or storage, difference.
Long answer:
There is essentially no difference (in MySQL) between VARCHAR(3000) (or any other large limit) and TEXT. The former will truncate at 3000 characters; the latter will truncate at 65535 bytes. (I make a distinction between bytes and characters because a character can take multiple bytes.)
For smaller limits in VARCHAR, there are some advantages over TEXT.
"smaller" means 191, 255, 512, 767, or 3072, etc, depending on version, context, and CHARACTER SET.
INDEXes are limited in how big a column can be indexed. (767 or 3072 bytes; this is version and settings dependent)
Intermediate tables created by complex SELECTs are handled in two different ways -- MEMORY (faster) or MyISAM (slower). When 'large' columns are involved, the slower technique is automatically picked. (Significant changes coming in version 8.0; so this bullet item is subject to change.)
Related to the previous item, all TEXT datatypes (as opposed to VARCHAR) jump straight to MyISAM. That is, TINYTEXT is automatically worse for generated temp tables than the equivalent VARCHAR. (But this takes the discussion in a third direction!)
VARBINARY is like VARCHAR; BLOB is like TEXT.
A table with several 'large' VARCHARs could hit a limit of 64KB for the whole table definition; switching to TEXT is a simple and practical fix. (Example: (42000) Row size too large, from an Oracle dump to a MySQL dump )
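The 191 and 255 figures in the list above fall out of dividing an index byte limit by the worst-case bytes per character. The sketch below does that arithmetic; the per-charset byte widths and the helper name are assumptions added for illustration, and which limit applies still depends on version and settings.

```python
# Index key limits in bytes mentioned above; which one applies depends on
# MySQL version, row format, and settings.
INDEX_LIMITS_BYTES = (767, 3072)
# Assumed worst-case bytes per character for some common character sets.
BYTES_PER_CHAR = {"latin1": 1, "utf8 (3-byte)": 3, "utf8mb4": 4}


def max_indexable_chars(limit_bytes, bytes_per_char):
    """Number of whole characters that fit into an index key of limit_bytes."""
    return limit_bytes // bytes_per_char


for limit in INDEX_LIMITS_BYTES:
    for charset, width in BYTES_PER_CHAR.items():
        print(limit, charset, max_indexable_chars(limit, width))
```

This is why VARCHAR(191) shows up so often in utf8mb4 schemas that need the whole column indexed under the 767-byte limit.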
Rebuttal to other answers
The original question asked one thing (which datatype to use); the accepted answer answered something else (off-record storage). That answer is now out of date.
When this thread was started and answered, there were only two "row formats" in InnoDB. Soon afterwards, two more formats (DYNAMIC and COMPRESSED) were introduced.
The storage location for TEXT and VARCHAR() is based on size, not on name of datatype. For an updated discussion of on/off-record storage of large text/blob columns, see this .
Disclaimer: I'm not a MySQL expert ... but this is my understanding of the issues.
I think TEXT is stored outside the mysql row, while I think VARCHAR is stored as part of the row. There is a maximum row length for mysql rows .. so you can limit how much other data you can store in a row by using the VARCHAR.
Also due to VARCHAR forming part of the row, I suspect that queries looking at that field will be slightly faster than those using a TEXT chunk.
The preceding answers don't insist enough on the main problem: even in very simple queries like
(SELECT t2.* FROM t1, t2 WHERE t2.id = t1.id ORDER BY t1.id)
a temporary table can be required, and if a VARCHAR field is involved, it is converted to a CHAR field in the temporary table. So if your table has, say, 500,000 rows with a VARCHAR(65000) field, this column alone will use 500,000 × 65,000 = 3.25 × 10^10 bytes (about 32.5 GB). Such temp tables can't be handled in memory and are written to disk. The impact can be expected to be catastrophic.
Source (with metrics): https://nicj.net/mysql-text-vs-varchar-performance/
(This refers to the handling of TEXT vs VARCHAR in "standard"(?) MyISAM storage engine. It may be different in others, e.g., InnoDB.)
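The size estimate in this answer can be reproduced with a few lines of arithmetic. The helper name is hypothetical, and it assumes one byte per character and that every value is padded to the declared length, as the answer describes for the temporary table.

```python
def temp_table_column_bytes(rows, declared_chars, bytes_per_char=1):
    """Worst-case size if every VARCHAR value is padded to its declared length."""
    return rows * declared_chars * bytes_per_char


# Figures taken from the example above: 500,000 rows of VARCHAR(65000).
size = temp_table_column_bytes(rows=500_000, declared_chars=65_000)
print(f"{size:,} bytes, about {size / 10**9:.1f} GB")
```

With a multi-byte character set the figure grows by the same factor as bytes_per_char.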
Varchar is for small data like email addresses, while Text is for much bigger data like news articles, Blob for binary data such as images.
Varchar generally performs better because it can be handled entirely in memory, but this will not be the case if the data is too big, like varchar(4000) for example.
Text, on the other hand, does not stick to memory and is affected by disk performance, but you can avoid that by separating text data in a separate table and apply a left join query to retrieve text data.
Blob is much slower so use it only if you don't have much data like 10000 images which will cost 10000 records.
Follow these tips for maximum speed and performance:
Use varchar for name, titles, emails
Use Text for large data
Separate text in different tables
Use Left Join queries on an ID such as a phone number
If you are going to use Blob apply the same tips as in Text
This will make queries cost milliseconds on tables with data >10 M and size up to 10GB guaranteed.
There is a HUGE difference between VARCHAR and TEXT. While VARCHAR fields can be indexed, TEXT fields cannot. VARCHAR fields are stored inline while TEXT fields are stored offline; only pointers to the TEXT data are actually stored in the records.
If you have to index your field for faster search, update or delete, then go for VARCHAR, no matter how big. A VARCHAR(10000000) will never be the same as a TEXT field because these two data types are different in nature.
If:
you use your field only for archiving
you don't care about data retrieval speed
you care about speed, but you will use the '%LIKE%' operator in your search query, so indexing will not help much
you can't predict a limit on the data length
then go for TEXT.
Just a correction to so many answers here, even if it is a bit late to the party.
Text fields can be fully indexed by MySQL as per their documentation.
Link provided
https://dev.mysql.com/doc/refman/5.6/en/column-indexes.html
Overall, Varchar fields are slower to write to than Text fields, but it only matters if you have a multitude of write requests.

MySQL Maximum Row Length - Possible to increase?

I was wondering if it is possible to increase the maximum row length of MySQL with InnoDB engine. The current is 8KB.
I would also like to know what enforces this limitation. I do not remember having such a limitation with Oracle 10 or MSSQL 2005.
Thank you!
You have to change the innodb_page_size, check http://www.mysqlperformanceblog.com/2006/06/04/innodb-page-size/ It can be 8K, 16K, 32K or 64K; 16K is the default.
MS SQL Server has this limitation as well. To get around this, you may use LOBs (like CLOB, BLOB, TEXT, etc) since they are not part of the record itself.
Alternatively you may split your table into two tables linked 1:1. Maybe you could give some background what you try to achieve and why your table is that big.
This is a restriction on InnoDB tables and cannot be changed. According to the docs:
The maximum row length, except for variable-length columns (VARBINARY, VARCHAR, BLOB and TEXT), is slightly less than half of a database page. That is, the maximum row length is about 8000 bytes.
You will have to think about redesigning your table.

Which DATATYPE is better to use TEXT or VARCHAR?

This question is based on two things: performance and size.
Which DATATYPE is better to use, TEXT or VARCHAR? In terms of performance, which one will hurt and which one will improve it?
It depends on what you're using it for. I hate to give such a generic answer, but it's true. Generally, try to get the data type as specific as you can. If your strings will never exceed some upper limit of characters, then go with VARCHAR because it will be a little more efficient. If you need more space, go with TEXT. If you aren't sure how much space your text will take up, you should probably go with TEXT; the performance difference isn't very large, and it's better to be future-proof than risk having to change it later when your requirements change. Just my two cents.
In the comments, Pitarou points out that, if MySQL creates a temporary table for your query (see this), TEXT columns will not be stored in memory and will have to be read from the disk, which is much slower. (Source, bottom of the page.) This shouldn't matter for most queries, though.
In case anyone was wondering how PostgreSQL compares, I found this benchmark that shows that CHAR, VARCHAR, and TEXT all perform equally well. So if you're using Postgres, it doesn't matter what type you use.
From version 5.0.3 onwards, the limit of VARCHAR was increased from 0-255 to 0-65,535 (subject to the maximum row size (65,535 bytes, which is shared among all columns) and the character set used).
Ref. http://dev.mysql.com/doc/refman/5.0/en/char.html
If you are using TEXT, the maximum length is fixed at 64K, even if you need a smaller limit.
So it is better to go with VARCHAR with an explicit limit than with TEXT.
If requirement is more than 64K go with MEDIUMTEXT or LONGTEXT accordingly.
Queries against the TEXT table were always 3 times slower than those against the VARCHAR table (averages: 0.10 seconds for the VARCHAR table, 0.29 seconds for the TEXT table). The difference is 100% repeatable.
Benchmark from http://forums.mysql.com/read.php?24,105964,105964
With VARCHAR you can set a limit on how many chars it will accept per record; TEXT is (virtually) unlimited... not exactly sure about performance, but I would assume a more specific datatype (VARCHAR) would be faster.
VARCHAR should have a better performance since it has a limited size. In fact, in all of my experiences with MySQL, the search operation was always faster with VARCHAR than TEXT. Anyway, it's based on my experience. You should check the documentation to find out more about it.
It really depends on your data type.
If your field is fixed-length (e.g. a 32-character hash value), then use CHAR. This has better performance because every entry takes up the same space per row.
The standard limit for VARCHAR was 255 characters but I think it's been increased now. TEXT is pretty damn long and is generally only used for big content like a whole blog post, and comments if you don't want a limit.
With regard to size there is no (or very little) difference between VARCHAR and TEXT since they just store what they need to. CHAR fields will always take up their allotted length.
Performance-wise, VARCHAR is usually faster. VARCHARs can be indexed too which leads to faster searching.
MySQL will internally convert TEXT to varchar while creating temporary tables. So it is better to use VARCHAR if possible. There are a few minor bugs related to TEXT column such as...
http://bugs.mysql.com/bug.php?id=36676
In my opinion, VARCHAR is the best option when you know the length of the data. It will also reduce wasted memory allocations and space issues. TEXT will consume 255 whereas VARCHAR will consume only as much as the values you give it.
As for performance, VARCHAR is also faster than TEXT.
There is a subtle difference in text and varchar. I have a table as shown:
CREATE TABLE `test`.`tbl`(
`kee` BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
`txt` TEXT(100),
`vrchr` VARCHAR(100),
PRIMARY KEY (`kee`)
);
I insert a row:
INSERT INTO `tbl`
(`txt`,
`vrchr`)
VALUES ('1
2
3',
'1
2
3');
The column txt has value:
1
2
3
and column vrchr has value:
1

MySQL: Why use VARCHAR(20) instead of VARCHAR(255)? [duplicate]

Possible Duplicate:
Are there disadvantages to using a generic varchar(255) for all text-based fields?
In MySQL you can choose a length for the VARCHAR field type. Possible values are 1-255.
But what are its advantages if you use VARCHAR(255) that is the maximum instead of VARCHAR(20)? As far as I know, the size of the entries depends only on the real length of the inserted string.
size (bytes) = length+1
So if you have the word "Example" in a VARCHAR(255) field, it would have 8 bytes. If you have it in a VARCHAR(20) field, it would have 8 bytes, too. What is the difference?
I hope you can help me. Thanks in advance!
Check out: Reference for Varchar
In short there isn't much difference unless you go over the size of 255 in your VARCHAR which will require another byte for the length prefix.
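A small sketch of that storage rule, for illustration only: the helper name and its parameters are made up, and it assumes the length prefix is 1 byte when the column can never need more than 255 bytes and 2 bytes otherwise. Storage engines and row formats may differ in detail.

```python
def varchar_storage_bytes(value, declared_chars, encoding="latin-1", max_bytes_per_char=1):
    """Estimated bytes for one VARCHAR value: encoded data plus length prefix."""
    # 1-byte prefix if the column can never need more than 255 bytes, else 2.
    prefix = 1 if declared_chars * max_bytes_per_char <= 255 else 2
    return len(value.encode(encoding)) + prefix


print(varchar_storage_bytes("Example", 20))   # VARCHAR(20)
print(varchar_storage_bytes("Example", 255))  # VARCHAR(255)
print(varchar_storage_bytes("Example", 256))  # VARCHAR(256)
```

Under these assumptions "Example" takes the same space in VARCHAR(20) and VARCHAR(255), and one extra byte once the declared size passes 255 bytes.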
The length indicates more of a constraint on the data stored in the column than anything else. This inherently constrains the MAXIMUM storage size for the column as well. IMHO, the length should make sense with respect to the data. If you're storing a Social Security # it makes no sense to set the length to 128 even though it doesn't cost you anything in storage if all you actually store is an SSN.
There are many valid reasons for choosing a value smaller than the maximum that are not related to performance. Setting a size helps indicate the type of data you are storing and can also act as a last-gasp form of validation.
For instance, if you are storing a UK postcode then you only need 8 characters. Setting this limit helps make clear the type of data you are storing. If you chose 255 characters it would just confuse matters.
I don't know about MySQL but in SQL Server it will let you define fields such that the total number of bytes used is greater than the total number of bytes that can actually be stored in a record. This is a bad thing. Sooner or later you will get a row where the limit is reached and you cannot insert the data.
It is far better to design your database structure to consider row size limits.
Additionally yes, you do not want people to put 200 characters in a field where the maximum value should be 10. If they do, it is almost always bad data.
You say, well I can limit that at the application level. But data does not get into the database just from one application. Sometimes multiple applications use it, sometimes data is imported and sometimes it is fixed manually from the query window (update all the records to add 10% to the price for instance). If any of these other sources of data don't know about the rules you put in your application, you will have bad, useless data in your database. Data integrity must be enforced at the database level (which doesn't stop you from also checking before you try to enter data) or you have no integrity. Plus it has been my experience that people who are too lazy to design their database are often also too lazy to actually put the limits into the application and there is no data integrity check at all.
They have a word for databases with no data integrity - useless.
There is a semantic difference (and I believe that's the only difference): if you try to fill 30 non-space characters into varchar(20), it will produce an error, whereas it will succeed for varchar(255). So it is primarily an additional constraint.
Well, if you want to allow for a larger entry, or limit the entry size perhaps.
For example, you may have first_name as a VARCHAR 20, but perhaps street_address as a VARCHAR 50 since 20 may not be enough space. At the same time, you may want to control how large that value can get.
In other words, you have set a ceiling of how large a particular value can be, in theory to prevent the table (and potentially the index/index entries) from getting too large.
You could just use CHAR, which is a fixed width as well, but unlike VARCHAR, which can be smaller, CHAR pads the values (although this makes for quicker SQL access).
From a database perspective, performance-wise, I do not believe there is going to be a difference.
However, I think a lot of the decision on the length to use comes down to what you are trying to accomplish and documenting the system to accept just the data that it needs.