I have a field "descr" varchar(15000) which will regularly hold between 1000 and 5000 characters of text per row, is returned in every query, and will be frequently searched with LIKE '%search%' (the database is MySQL 5.5 and fulltext indexes aren't available). The text entered is research data, so uniqueness is not required, but searchability is.
The table is InnoDB with UTF-8 encoding. The number of rows is not large (30,000). The maximum index length on the VARCHAR is 255 characters, yet when I search the column, it correctly returns a row containing 3,000 characters.
I have read a lot on indexing, the most relevant being MySQL: Large VARCHAR vs. TEXT?:
TEXT is stored off the table, with the table having a pointer to the location of the actual storage.
VARCHAR is stored inline and is much faster when the size is reasonable and the data is frequently retrieved.
I need help understanding:
1. What is the performance impact on retrieval (with 30,000 rows) of switching to a TEXT field?
2. Is a VARCHAR index workable for searching through 3,000-5,000 character fields? How is the search able to find strings longer than the 255-character index anyway?
Or would you advise going with MEDIUMTEXT?
Thanks for your input.
First, it doesn't matter how you index the column if your search uses LIKE '%search%', because a pattern with a leading wildcard can't make use of a BTREE index. It's going to do a table scan regardless of whether you choose VARCHAR or TEXT.
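To see why a leading wildcard defeats a B-tree, here is a minimal Python sketch that models an index as a sorted list (an assumption for illustration; a real InnoDB B-tree is paged and ordered by collation, but the principle is the same). A prefix pattern such as 'apple%' maps to a contiguous key range found by binary search, while '%apple%' can match anywhere, so every key must be examined. This also explains why a search can match text beyond the 255-character index: a '%...%' search never consults the index and compares against the full column value. The function names are hypothetical.

```python
import bisect

def prefix_search(sorted_keys: list[str], prefix: str) -> list[str]:
    """LIKE 'prefix%': binary-search to the first candidate, stop at the first miss."""
    start = bisect.bisect_left(sorted_keys, prefix)
    matches = []
    for key in sorted_keys[start:]:
        if not key.startswith(prefix):
            break  # keys are ordered, so no later key can match
        matches.append(key)
    return matches

def contains_search(sorted_keys: list[str], needle: str) -> tuple[list[str], int]:
    """LIKE '%needle%': ordering gives no starting point, so scan every key."""
    examined = 0
    matches = []
    for key in sorted_keys:
        examined += 1
        if needle in key:
            matches.append(key)
    return matches, examined

keys = sorted(["apple pie", "banana bread", "cherry tart", "pineapple cake"])
print(prefix_search(keys, "apple"))    # ['apple pie']
print(contains_search(keys, "apple"))  # (['apple pie', 'pineapple cake'], 4)
```

The prefix search touches only the keys in its range; the contains search always examines all of them, which is the table scan described above.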
Second, there's no difference between how InnoDB stores VARCHAR vs. TEXT. They are both handled as variable-length strings. If they can fit on the same data page with other columns, then they do. If they're too long for a page (or really a little less than half of one page, because each page must fit at least two rows, plus some header information), then only 768 bytes are stored on the page, and the remainder of the string goes on overflow pages. If you declare the table to use ROW_FORMAT=DYNAMIC, all of the string goes on overflow pages when it doesn't fit on the page. See also http://www.mysqlperformanceblog.com/2010/02/09/blob-storage-in-innodb/
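The placement rule above can be sketched as a small Python model. It is deliberately simplified and rests on assumptions: a default 16KB page, a single long column, and a hypothetical "a little less than half a page" threshold; real InnoDB moves the longest columns off-page one at a time until the row fits. The 768-byte local prefix and the 20-byte overflow pointer follow the description above and the linked article.

```python
# Simplified model of where InnoDB keeps a long VARCHAR/TEXT value.
PAGE_SIZE = 16 * 1024                  # assumed default innodb_page_size
MAX_INLINE_ROW = PAGE_SIZE // 2 - 200  # hypothetical "a little less than half a page"
POINTER_BYTES = 20                     # reference to the overflow pages
PREFIX_BYTES = 768                     # local prefix kept by COMPACT/REDUNDANT

def on_page_bytes(value_len: int, other_cols_len: int, row_format: str) -> int:
    """Return how many bytes of the long value stay on the data page."""
    if other_cols_len + value_len <= MAX_INLINE_ROW:
        return value_len  # fits: stored inline, whether VARCHAR or TEXT
    if row_format.upper() == "DYNAMIC":
        return POINTER_BYTES  # the entire value moves to overflow pages
    return PREFIX_BYTES + POINTER_BYTES  # 768-byte prefix stays, rest overflows

for length in (3000, 15000):
    for fmt in ("COMPACT", "DYNAMIC"):
        print(length, fmt, on_page_bytes(length, 100, fmt))
```

Note that the column's data type never enters the decision; only the value's length and the row format do.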
You may also be interested in my presentation Full Text Search Throwdown, in which I compare companion technologies such as Sphinx Search.
I know that limiting the row size will keep the size at a set maximum. However, if I limit all my rows to the maximum that they will ever need, does that increase performance on the rows (i.e., take up less space)?
Or does MySQL dynamically allocate the size necessary for rows, so that the maximum row length has no effect on it?
MySQL will not dynamically size fields; the potential required size is determined by the data type you choose. However, you can choose appropriate data types to help optimize the database.
For example, the TINYINT, SMALLINT, MEDIUMINT, INT, and BIGINT data types require 1, 2, 3, 4, and 8 bytes respectively. If you are only ever going to store integers up to 10,000, SMALLINT would be the best choice: it can store values up to 65,535 (unsigned) with plenty of extra room and requires half the size of an INT.
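The choice described above is mechanical enough to sketch in a few lines of Python. The helper name is hypothetical; the sizes and unsigned upper bounds are those of MySQL's integer types, and the sketch deliberately handles unsigned values only.

```python
# (type name, storage bytes, unsigned upper bound) for MySQL integer types.
UNSIGNED_INT_TYPES = [
    ("TINYINT", 1, 2**8 - 1),
    ("SMALLINT", 2, 2**16 - 1),
    ("MEDIUMINT", 3, 2**24 - 1),
    ("INT", 4, 2**32 - 1),
    ("BIGINT", 8, 2**64 - 1),
]

def smallest_unsigned_int(max_value: int) -> tuple[str, int]:
    """Return (type name, bytes) of the smallest unsigned type holding max_value."""
    if max_value < 0:
        raise ValueError("this sketch only handles unsigned values")
    for name, size, upper in UNSIGNED_INT_TYPES:
        if max_value <= upper:
            return name, size
    raise ValueError(f"{max_value} does not fit in BIGINT UNSIGNED")

print(smallest_unsigned_int(10_000))  # ('SMALLINT', 2)
```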
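With variable-length types like VARCHAR, a column whose maximum length is 255 bytes or less uses a one-byte prefix to store the length; anything longer uses a two-byte prefix. Limiting VARCHARs to at most 255 where possible is best. As an example, with a single-byte character set such as latin1, the string foo requires 4 bytes in both a VARCHAR(3) field and a VARCHAR(255) field. If the maximum length of the field were 500, foo would require 5 bytes. (With a multi-byte character set the threshold is still 255 bytes, so a utf8 VARCHAR(255), which can hold up to 765 bytes, already uses a two-byte prefix.)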
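The arithmetic above can be reproduced with a short Python sketch. It assumes the maximum bytes per character listed below and uses Python codecs to approximate how MySQL would encode the value; the helper and table names are hypothetical.

```python
# Maximum bytes per character for a few MySQL character sets, paired with a
# Python codec used to measure the stored value (an approximation).
CHARSETS = {
    "latin1": ("latin-1", 1),
    "utf8": ("utf-8", 3),      # MySQL's 3-byte utf8 (utf8mb3)
    "utf8mb4": ("utf-8", 4),
}

def varchar_bytes(value: str, declared_chars: int, charset: str = "latin1") -> int:
    """Storage for one VARCHAR value: length prefix + the encoded bytes."""
    codec, max_bytes_per_char = CHARSETS[charset]
    max_bytes = declared_chars * max_bytes_per_char
    prefix = 1 if max_bytes <= 255 else 2  # threshold is on bytes, not characters
    return prefix + len(value.encode(codec))

print(varchar_bytes("foo", 3))            # 4
print(varchar_bytes("foo", 255))          # 4
print(varchar_bytes("foo", 500))          # 5
print(varchar_bytes("foo", 255, "utf8"))  # 5
```

The last call shows the multi-byte case: the declared length is still 255, but the column's maximum is 765 bytes, so the prefix grows to two bytes.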
When possible with short (or sometimes even long) text columns, the CHAR type can be beneficial if the strings always have the same or a very similar length. Also, if you're using the MyISAM storage engine and all of your columns are fixed-length, you can use the FIXED row format, which is faster to read and easier to cache than the dynamic format.
More information about the MySQL data types is available in the manual.
I don't know the purpose of your question, but if you want to minimize the storage space used by the database, you can use built-in compression, which is available with the InnoDB storage engine.
More information: Enabling Compression for a Table (dev.mysql.com)
I don't know how it affects performance (speed), as I have never used that feature, but I think compression may slow your database significantly. However, if you have limited disk space on a hosting server, it may be useful for applications without heavy load.
MySQL does not automatically allocate the necessary size, so you want to define it yourself to improve the database's performance and keep it at a minimum size.
One nice approach I came across the other day, though, was to use NVARCHAR instead of VARCHAR when the VARCHAR was to be used for long strings that were still not long enough to warrant TEXT, which requires a lot of space.
I've got a messages table in MySQL which records messages between users. Apart from the typical ids and message types (all integer types), I need to save the actual message text as either VARCHAR or TEXT. I'm setting a front-end limit of 3000 characters, which means messages will never be inserted into the db longer than this.
Is there a rationale for going with either VARCHAR(3000) or TEXT? There's something about just writing VARCHAR(3000) that feels somewhat counter-intuitive. I've been through other similar posts on Stack Overflow but would be good to get views specific to this type of common message storing.
TEXT and BLOB may be stored off the table, with the table just having a pointer to the location of the actual storage. Where it is stored depends on lots of things like data size, column size, row_format, and MySQL version.
VARCHAR is stored inline with the table. VARCHAR is faster when the size is reasonable, but where that tradeoff lies depends on your data and your hardware; you'd want to benchmark a real-world scenario with your data.
Can you predict how long the user input would be?
VARCHAR(X)
Max Length: variable, up to 65,535 bytes (64KB)
Case: user name, email, country, subject, password
TEXT
Max Length: 65,535 bytes (64KB)
Case: messages, emails, comments, formatted text, html, code, images, links
MEDIUMTEXT
Max Length: 16,777,215 bytes (16MB)
Case: large json bodies, short to medium length books, csv strings
LONGTEXT
Max Length: 4,294,967,295 bytes (4GB)
Case: textbooks, programs, years of log files, Harry Potter and the Goblet of Fire, scientific research logging
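Because the limits above are in bytes while a front-end limit like 3000 is usually in characters, a worst-case conversion helps when picking a type. A minimal Python sketch, assuming utf8mb4 (up to 4 bytes per character) as the default; the helper name is hypothetical.

```python
# Maximum byte lengths of the MySQL TEXT family.
TEXT_TYPES = [
    ("TINYTEXT", 2**8 - 1),
    ("TEXT", 2**16 - 1),
    ("MEDIUMTEXT", 2**24 - 1),
    ("LONGTEXT", 2**32 - 1),
]

def smallest_text_type(max_chars: int, max_bytes_per_char: int = 4) -> str:
    """Pick the smallest TEXT type whose byte limit covers max_chars in the worst case."""
    worst_case_bytes = max_chars * max_bytes_per_char
    for name, limit in TEXT_TYPES:
        if worst_case_bytes <= limit:
            return name
    raise ValueError("too large for LONGTEXT")

print(smallest_text_type(3000))  # TEXT (3000 * 4 = 12,000 bytes)
```

So a 3000-character message fits comfortably in TEXT even if every character needs four bytes.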
There's more information on this question.
Just to clarify the best practice:
Text format messages should almost always be stored as TEXT (they end up being arbitrarily long)
String attributes should be stored as VARCHAR (the destination user name, the subject, etc...).
I understand that you've got a front end limit, which is great until it isn't. *grin* The trick is to think of the DB as separate from the applications that connect to it. Just because one application puts a limit on the data, doesn't mean that the data is intrinsically limited.
What is it about the messages themselves that forces them to never be more than 3000 characters? If it's just an arbitrary application constraint (say, for a text box or something), use a TEXT field at the data layer.
Short answer: No practical, performance, or storage, difference.
Long answer:
There is essentially no difference (in MySQL) between VARCHAR(3000) (or any other large limit) and TEXT. The former will truncate at 3000 characters; the latter will truncate at 65535 bytes. (I make a distinction between bytes and characters because a character can take multiple bytes.)
For smaller limits in VARCHAR, there are some advantages over TEXT.
"smaller" means 191, 255, 512, 767, or 3072, etc, depending on version, context, and CHARACTER SET.
INDEXes are limited in how big a column can be indexed. (767 or 3072 bytes; this is version and settings dependent)
Intermediate tables created by complex SELECTs are handled in two different ways -- MEMORY (faster) or MyISAM (slower). When 'large' columns are involved, the slower technique is automatically picked. (Significant changes coming in version 8.0; so this bullet item is subject to change.)
Related to the previous item, all TEXT datatypes (as opposed to VARCHAR) jump straight to MyISAM. That is, TINYTEXT is automatically worse for generated temp tables than the equivalent VARCHAR. (But this takes the discussion in a third direction!)
VARBINARY is like VARCHAR; BLOB is like TEXT.
A table with several 'large' VARCHARs could hit the 64KB limit on the combined size of all columns in a row; switching to TEXT is a simple and practical fix. (Example: (42000) Row size too large, from an Oracle dump to a MySQL dump.)
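The "smaller" limits such as 191 and 255 fall out of the byte-based index limits: the longest indexable prefix in characters is the index byte limit divided by the character set's maximum bytes per character. A minimal Python sketch (the helper name is hypothetical; 767 and 3072 are the limits mentioned above). With utf8 and a 767-byte limit it gives 255, which matches the 255-character maximum index in the first question.

```python
def max_index_prefix_chars(index_limit_bytes: int, max_bytes_per_char: int) -> int:
    """Longest column prefix, in characters, that an index can cover."""
    return index_limit_bytes // max_bytes_per_char

for limit in (767, 3072):
    for charset, bpc in (("latin1", 1), ("utf8", 3), ("utf8mb4", 4)):
        print(limit, charset, max_index_prefix_chars(limit, bpc))
```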
Rebuttal to other answers
The original question asked one thing (which datatype to use); the accepted answer answered something else (off-record storage). That answer is now out of date.
When this thread was started and answered, there were only two "row formats" in InnoDB. Soon afterwards, two more formats (DYNAMIC and COMPRESSED) were introduced.
The storage location for TEXT and VARCHAR() is based on size, not on name of datatype. For an updated discussion of on/off-record storage of large text/blob columns, see this .
Disclaimer: I'm not a MySQL expert ... but this is my understanding of the issues.
I think TEXT is stored outside the MySQL row, while VARCHAR is stored as part of the row. There is a maximum row length for MySQL rows, so using VARCHAR limits how much other data you can store in a row.
Also due to VARCHAR forming part of the row, I suspect that queries looking at that field will be slightly faster than those using a TEXT chunk.
The preceding answers don't stress the main problem enough: even in very simple queries like
(SELECT t2.* FROM t1, t2 WHERE t2.id = t1.id ORDER BY t1.id)
a temporary table can be required, and if a VARCHAR field is involved, it is converted to a CHAR field in the temporary table. So if your table has, say, 500,000 rows with a VARCHAR(65000) field, this column alone will use 65,000 × 500,000 = 3.25 × 10^10 bytes (about 32.5 GB). Such temp tables can't be handled in memory and are written to disk. The impact can be expected to be catastrophic.
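The estimate above is simple multiplication once the column is padded to its declared width. A Python sketch, assuming a single-byte character set by default and ignoring per-row overhead; the helper name is hypothetical.

```python
def temp_table_column_bytes(rows: int, declared_chars: int, max_bytes_per_char: int = 1) -> int:
    """Bytes used by one VARCHAR column once padded to fixed width in a temp table."""
    return rows * declared_chars * max_bytes_per_char

size = temp_table_column_bytes(500_000, 65_000)
print(f"{size:,} bytes = {size / 1e9:.1f} GB")  # 32,500,000,000 bytes = 32.5 GB
```

With a multi-byte character set the padded width, and therefore the total, grows by the bytes-per-character factor.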
Source (with metrics): https://nicj.net/mysql-text-vs-varchar-performance/
(This refers to the handling of TEXT vs VARCHAR in "standard"(?) MyISAM storage engine. It may be different in others, e.g., InnoDB.)
Varchar is for small data like email addresses, while Text is for much bigger data like news articles, Blob for binary data such as images.
VARCHAR performs better because it is handled entirely in memory, but this will not be the case if the data is too big, e.g. VARCHAR(4000).
TEXT, on the other hand, is not kept in memory and is affected by disk performance, but you can avoid that by moving text data to a separate table and using a LEFT JOIN query to retrieve it.
BLOB is much slower, so use it only if you don't have much data, e.g. 10,000 images, which will cost 10,000 records.
Follow these tips for maximum speed and performance:
Use varchar for name, titles, emails
Use Text for large data
Separate text in different tables
Use Left Join queries on an ID such as a phone number
If you are going to use Blob apply the same tips as in Text
This will make queries take milliseconds on tables with more than 10 M rows and up to 10 GB in size, guaranteed.
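The "separate text in different tables, then LEFT JOIN" tip can be sketched with Python's built-in sqlite3 module. SQLite is only a stand-in for MySQL here, so this shows the schema pattern, not MySQL's performance; the table and column names are hypothetical.

```python
import sqlite3

# SQLite stands in for MySQL; table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE article (
        id    INTEGER PRIMARY KEY,
        title VARCHAR(200) NOT NULL      -- short, frequently read column stays here
    );
    CREATE TABLE article_body (
        article_id INTEGER PRIMARY KEY REFERENCES article(id),
        body       TEXT NOT NULL         -- large text lives in its own table
    );
""")
conn.executemany("INSERT INTO article (id, title) VALUES (?, ?)",
                 [(1, "First"), (2, "Second")])
conn.execute("INSERT INTO article_body (article_id, body) VALUES (?, ?)",
             (1, "long text " * 500))

# List pages read only the narrow table...
titles = conn.execute("SELECT id, title FROM article ORDER BY id").fetchall()

# ...and the text is joined in only when it is actually needed.
rows = conn.execute("""
    SELECT a.id, a.title, LENGTH(b.body)
    FROM article AS a
    LEFT JOIN article_body AS b ON b.article_id = a.id
    ORDER BY a.id
""").fetchall()
print(titles)  # [(1, 'First'), (2, 'Second')]
print(rows)    # [(1, 'First', 5000), (2, 'Second', None)]
```

The LEFT JOIN keeps articles that have no body row, returning NULL (None in Python) for the missing text.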
There is a HUGE difference between VARCHAR and TEXT. While VARCHAR fields can be indexed, TEXT fields cannot. VARCHAR fields are stored inline while TEXT fields are stored offline; only pointers to the TEXT data are actually stored in the records.
If you have to index your field for faster search, update or delete, then go for VARCHAR, no matter how big. A VARCHAR(10000000) will never be the same as a TEXT field, because these two data types are different in nature.
Go for TEXT if:
you use your field only for archiving,
you don't care about data retrieval speed,
you care about speed but will use LIKE '%...%' in your search query, so indexing will not help much, or
you can't predict a limit on the data length.
Just a correction to so many answers here, even if it is a bit late to the party.
TEXT fields can be indexed by MySQL as per their documentation, although a prefix length must be specified for the index.
Link provided
https://dev.mysql.com/doc/refman/5.6/en/column-indexes.html
Overall, VARCHAR fields take longer to write to than TEXT fields, but it only matters if you have a multitude of write requests.
I am updating a row in MySQL which currently has 15 varchar(2500) columns plus a few other small sized columns. I need to allow the user to enter more data in these columns, would it be better to convert them to BLOBS? Is there a maximum recommended number of BLOBS per row?
What is best practice with regards to not exceeding the maximum number of bytes per row whilst still allowing the user flexibility with regards to how much they enter in each column?
Many thanks.
No, there's no maximum recommended number of BLOBS on a row. You are really only limited by the maximum size of the row. There is some performance overhead for accessing each BLOB (the row contains a pointer to the BLOB data which is stored elsewhere, vs. accessing a column value actually stored within the row.)
The best practice is to include the most frequently accessed columns within the row, and limit the size of variable length columns to practical limits. Consider TEXT and/or BLOB columns when the size of the column exceeds the maximum allowed by other datatypes, or when the length of the row exceeds the maximum allowed. Alternatively, consider separate child table(s) for less frequently accessed "long-ish" columns.
I don't want to use TEXT if possible because of the extra overhead. Is it ok to have a VARCHAR(900) or some other high number in the ()?
If you are sure you won't exceed the limits for your column in the future, it's fine.
http://dev.mysql.com/doc/refman/5.6/en/char.html
Your row, however, shouldn't exceed the maximum row size of 65,535 bytes. This limit applies to all the columns in the table combined.
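To check a table such as the 15 × VARCHAR(2500) one from the earlier question against this limit, here is a rough Python sketch. It assumes each VARCHAR counts its maximum byte length plus a 1- or 2-byte length prefix, and each TEXT column counts about 10 bytes (the manual gives 9 to 12 bytes for the BLOB/TEXT types); it ignores NULL flags and other small columns. The helper names are hypothetical.

```python
ROW_LIMIT = 65_535
TEXT_COST = 10  # assumed contribution of one TEXT column toward the row limit

def varchar_cost(declared_chars: int, max_bytes_per_char: int) -> int:
    """Maximum byte length plus the 1- or 2-byte length prefix."""
    max_bytes = declared_chars * max_bytes_per_char
    return max_bytes + (1 if max_bytes <= 255 else 2)

def row_fits(varchar_cols: list[int], text_cols: int, max_bytes_per_char: int) -> tuple[bool, int]:
    """Return (fits, total) for a table of VARCHAR columns plus some TEXT columns."""
    total = sum(varchar_cost(n, max_bytes_per_char) for n in varchar_cols)
    total += text_cols * TEXT_COST
    return total <= ROW_LIMIT, total

print(row_fits([2500] * 15, 0, 3))  # (False, 112530): 15 x VARCHAR(2500) in utf8
print(row_fits([2500] * 15, 0, 1))  # (True, 37530): same columns in latin1
print(row_fits([], 15, 3))          # (True, 150): all 15 switched to TEXT
```

Under these assumptions, the character set alone decides whether the original table fits, and moving the columns to TEXT leaves plenty of headroom.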
TEXT and BLOB columns are just pointed to on disk by an address in the row. Therefore the table can't be stored in memory.
This means that every time you select that column, you have to access the file system, which is slower. A recommendation I have often seen is to keep the text column in a separate table.
But a limit of 900 should be ok to keep as varchar.
It's OK; there is no problem using VARCHAR(900) if you are limited to storing 900 characters.