mysql large varchar or text - indexing with frequently accessed field - mysql

I have a field "descr" varchar(15000) which will regularly hold between 1000 and 5000 characters of text per row, is returned in every query, and will be frequently searched with LIKE %search% (the database is mysql 5.5 and fulltext indexes aren't available). The text inputted is research data and so being unique is not a need - but being searchable is.
The table is innodb with utf-8 encoding. The number of rows is not large (30,000). Maximum index size on the varchar is (255), yet when I do a search on the column, it correctly returns a row with 3000 character input.
I have read a lot on indexing, the most relevant being MySQL: Large VARCHAR vs. TEXT?:
TEXT is stored off the table with the table .
VARCHAR is stored inline and is much faster when the size is reasonable and the data is
frequently retrieved.
I need help understanding:
1. What is the performance impact on retrieval (with 30,000 rows)
going to a text field and
2. Is a varchar index workable for
searching through 3000-5000 character fields? How is the search able
to find strings with data longer than the 255 varchar index anyway?
Or would you advise going with mediumtext?
Thanks for your input.

First, it doesn't matter how you index the column if your search uses LIKE %search% because that predicate can't make use of a BTREE index. It's going to do a table-scan regardless of whether you choose VARCHAR or TEXT.
Second, there's no difference between how InnoDB stores VARCHAR vs. TEXT. They are both handled as variable-length strings. If they can fit on the same data page with other columns, then they do. If they're too long for a page (or really a little less than half of one page, because each page must fit at least two rows, plus some header information), then only 768 bytes are stored on the page, and the remainder of the string goes on overflow pages. Unless you declare the table to to use ROW_FORMAT=DYNAMIC, in which case all of the string goes on overflow pages if it doesn't fit on the page. See also http://www.mysqlperformanceblog.com/2010/02/09/blob-storage-in-innodb/
You may also be interested in my presentation Full Text Search Throwdown. I compare other companion technology, such as Sphinx Search.

Related

Would it be effective to store text of different sizes in different mysql tables?

I am building a database system that will be storing large amounts of text.
The text will be stored in a table with an id column and one varchar/text column.
I was wondering if it would be more effective to use a single table which employed a large varchar, or multiple tables, each employing a different text type.
The multiple table option would contain several different tables, each employing a different kind of text (tinytext, text, etc.), and the system would store text in the most appropriate one based on the length of the text.
I am concerned with both speed and storage space, and would like a solution to balances both.
Edit -
The text table will not be searched on, but it may be joined (usually an id number will be determined, then a single row accessed).
Size will typically be smaller that text, but some will be large enough to require mediumtext. I doubt that longtext will be needed.
Keep it simple! Seriously.
Unless you have an overwhelming majority of text items that are 255 characters or shorter, just use TEXT or LONGTEXT. Spend your time doing interesting things with your text, not fiddling around with complex data structures. Get your project done now; optimize later.
Disk drives and RAM are getting cheaper much faster than your time is.
If your app requirements absolutely need you to use varchar data, for its brevity and searchability, instead of text data, you can do the following.
Create an article table, with one row per text article. It will have all the stuff you need to manage an article, including let's say the title, author, and an article_id.
Create a second table called something like article_text. It will have, possibly, four columns.
article_id foreign key to article table.
language a language code, if you happen to store translations of articles
ordinal a sequence number
textfrag varchar(255) part of the text.
Store each article's text in an series of article_text rows with ascending ordinal values. Each textfrag will hold up to 255 characters of your text. To retrieve an article's text you'll use a query like this.
SELECT textfrag
FROM article_text
WHERE language = 'en_US' /* or whatever */
AND article_id = 23456 /* or whatever */
ORDER BY ordinal
Then, you'll fetch a bunch of rows, concatenate the contents of the textfrag items, and there's your article with no effective length limit. If you create an index with all the fields in it, your retrieval time will be really fast because all retrievals will come from the index.
(article_id, language, ordinal, textfrag)
If you do your best to spit the text into fragments at word boundaries, and you use MyISAM, and you use FULLTEXT indexes, you'll get a very effective fulltext search system.

Permance using varchar(750) or text [duplicate]

I've got a messages table in MySQL which records messages between users. Apart from the typical ids and message types (all integer types) I need to save the actual message text as either VARCHAR or TEXT. I'm setting a front-end limit of 3000 characters which means the messages would never be inserted into the db as longer than this.
Is there a rationale for going with either VARCHAR(3000) or TEXT? There's something about just writing VARCHAR(3000) that feels somewhat counter-intuitive. I've been through other similar posts on Stack Overflow but would be good to get views specific to this type of common message storing.
TEXT and BLOB may by stored off the table with the table just having a pointer to the location of the actual storage. Where it is stored depends on lots of things like data size, columns size, row_format, and MySQL version.
VARCHAR is stored inline with the table. VARCHAR is faster when the size is reasonable, the tradeoff of which would be faster depends upon your data and your hardware, you'd want to benchmark a real-world scenario with your data.
Can you predict how long the user input would be?
VARCHAR(X)
Max Length: variable, up to 65,535 bytes (64KB)
Case: user name, email, country, subject, password
TEXT
Max Length: 65,535 bytes (64KB)
Case: messages, emails, comments, formatted text, html, code, images, links
MEDIUMTEXT
Max Length: 16,777,215 bytes (16MB)
Case: large json bodies, short to medium length books, csv strings
LONGTEXT
Max Length: 4,294,967,29 bytes (4GB)
Case: textbooks, programs, years of logs files, harry potter and the goblet of fire, scientific research logging
There's more information on this question.
Just to clarify the best practice:
Text format messages should almost always be stored as TEXT (they end up being arbitrarily long)
String attributes should be stored as VARCHAR (the destination user name, the subject, etc...).
I understand that you've got a front end limit, which is great until it isn't. *grin* The trick is to think of the DB as separate from the applications that connect to it. Just because one application puts a limit on the data, doesn't mean that the data is intrinsically limited.
What is it about the messages themselves that forces them to never be more then 3000 characters? If it's just an arbitrary application constraint (say, for a text box or something), use a TEXT field at the data layer.
Short answer: No practical, performance, or storage, difference.
Long answer:
There is essentially no difference (in MySQL) between VARCHAR(3000) (or any other large limit) and TEXT. The former will truncate at 3000 characters; the latter will truncate at 65535 bytes. (I make a distinction between bytes and characters because a character can take multiple bytes.)
For smaller limits in VARCHAR, there are some advantages over TEXT.
"smaller" means 191, 255, 512, 767, or 3072, etc, depending on version, context, and CHARACTER SET.
INDEXes are limited in how big a column can be indexed. (767 or 3072 bytes; this is version and settings dependent)
Intermediate tables created by complex SELECTs are handled in two different ways -- MEMORY (faster) or MyISAM (slower). When 'large' columns are involved, the slower technique is automatically picked. (Significant changes coming in version 8.0; so this bullet item is subject to change.)
Related to the previous item, all TEXT datatypes (as opposed to VARCHAR) jump straight to MyISAM. That is, TINYTEXT is automatically worse for generated temp tables than the equivalent VARCHAR. (But this takes the discussion in a third direction!)
VARBINARY is like VARCHAR; BLOB is like TEXT.
A table with several 'large' VARCHARs could hit a limit of 64KB for the whole table definition; switching to TEXT is a simple and practical fix. (Example: (42000) Row size too large, from an Oracle dump to a MySQL dump )
Rebuttal to other answers
The original question asked one thing (which datatype to use); the accepted answer answered something else (off-record storage). That answer is now out of date.
When this thread was started and answered, there were only two "row formats" in InnoDB. Soon afterwards, two more formats (DYNAMIC and COMPRESSED) were introduced.
The storage location for TEXT and VARCHAR() is based on size, not on name of datatype. For an updated discussion of on/off-record storage of large text/blob columns, see this .
Disclaimer: I'm not a MySQL expert ... but this is my understanding of the issues.
I think TEXT is stored outside the mysql row, while I think VARCHAR is stored as part of the row. There is a maximum row length for mysql rows .. so you can limit how much other data you can store in a row by using the VARCHAR.
Also due to VARCHAR forming part of the row, I suspect that queries looking at that field will be slightly faster than those using a TEXT chunk.
The preceding answers don't insist enough on the main problem: even in very simple queries like
(SELECT t2.* FROM t1, t2 WHERE t2.id = t1.id ORDER BY t1.id)
a temporary table can be required, and if a VARCHAR field is involved, it is converted to a CHAR field in the temporary table. So if you have in your table say 500 000 lines with a VARCHAR(65000) field, this column alone will use 6.5*5*10^9 byte. Such temp tables can't be handled in memory and are written to disk. The impact can be expected to be catastrophic.
Source (with metrics): https://nicj.net/mysql-text-vs-varchar-performance/
(This refers to the handling of TEXT vs VARCHAR in "standard"(?) MyISAM storage engine. It may be different in others, e.g., InnoDB.)
Varchar is for small data like email addresses, while Text is for much bigger data like news articles, Blob for binary data such as images.
The performance of Varchar is more powerful because it runs completely from memory, but this will not be the case if data is too big like varchar(4000) for example.
Text, on the other hand, does not stick to memory and is affected by disk performance, but you can avoid that by separating text data in a separate table and apply a left join query to retrieve text data.
Blob is much slower so use it only if you don't have much data like 10000 images which will cost 10000 records.
Follow these tips for maximum speed and performance:
Use varchar for name, titles, emails
Use Text for large data
Separate text in different tables
Use Left Join queries on an ID such as a phone number
If you are going to use Blob apply the same tips as in Text
This will make queries cost milliseconds on tables with data >10 M and size up to 10GB guaranteed.
There is a HUGE difference between VARCHAR and TEXT. While VARCHAR fields can be indexed, TEXT fields cannot. VARCHAR type fields are stored inline while TEXT are stored offline, only pointers to TEXT data is actually stored in the records.
If you have to index your field for faster search, update or delete than go for VARCHAR, no matter how big. A VARCHAR(10000000) will never be the same as a TEXT field bacause these two data types are different in nature.
If you use you field only for archiving
you don't care about data
speed retrival
you care about speed but you will use the operator
'%LIKE%' in your search query so indexing will not help much
you
can't predict a limit of the data length
than go for TEXT.
Just a correction to so many answers here, even if it is a bit late to the party.
Text fields can be fully indexed by MySQL as per their documentation.
Link provided
https://dev.mysql.com/doc/refman/5.6/en/column-indexes.html
Overall Varchar fields are longer to write to then Text fields, but it only matters if you have multitude of write requests

TEXT vs VARCHAR in InnoDB MySQL 5.5. When to use each one

What are the main differences between text and varchar in InnoDB storage engine at MySQL 5.5?
Is text or varchar(5000) used if we are talking about a variable text field of not more than 5000 chars?
According to a forum post in innodb.com
From InnoDB's perspective varchar, text and blob are laid out on the disk in exactly same way. If you are interested in the internal details, there is a difference in the way InnoDB passes the varchar and text values back to MySQL. In case of varchar the value is passed back inside the row buffer whereas in case of text the row buffer contains a pointer to a chunk of memory containing actual value. But again this is more like a detail of MySQL storage engine interface and should be irrelevant to the end user.
Also check this.
Generally speaking the usage of TEXT was to "attach" infrequently-used, large data to a row with minimal performance impact when it's not being accessed, e.g. you wouldn't want to search/index the contents. It was also the only option above the old 255 byte max for VARCHARs.
However InnoDB has blurred the once clear line between VARCHAR and TEXT. The three main differences I can find are:
One remaining performance difference is that TEXT fields force temporary tables to disk during SELECT statements. This can be a performance hit for smaller data, but see this post for how using really big VARCHARs in this scenario might be worse due to their conversion to fixed-size in MEMORY storage. ie. large VARCHARS will fill up the memory buffers and may still go to disk.
BLOB and TEXT columns cannot have DEFAULT values. reference
Only the first N chars of TEXT fields are indexed. reference
So while its true they're now stored in the same structure, it's always worth benchmarking your entire app to see what other related differences may affect performance, sorting, search, storage size, packet size, backups/replication etc.
Varchar is stored in the table, but text is linked to the table, so performance gain when using VARCHAR.
for more varchar vs text - MySQL

Field/Data Type to Store Articles in MySQL Database

I can't help but believe this topic has been written about over and over again but I'm having trouble finding any good, solid information.
What data type should I use to store 200 to 400 words of text? What about longer articles that could approach two or three thousand words?
What options should affect my decision? I don't plan to search this data but I can't completely rule out the possibility that I may want to do that later.
Unfortunately my background is MS Access where the only option for this was a memo field. It doesn't appear to be quite so simple with MySQL.
If you're using MySQL 5.0.3 or later, go VARCHAR. It can hold 65k bytes. As long as you have only 1 long VARCHAR per row, you should be fine.
Otherwise go with text.
From the mysql manual:
BLOB and TEXT differ from VARBINARY
and VARCHAR in the following ways:
There is no trailing-space removal for
BLOB and TEXT columns when values are
stored or retrieved. Before MySQL
5.0.3, this differs from VARBINARY and VARCHAR, for which trailing spaces are
removed when values are stored.
On comparisons, TEXT is space extended
to fit the compared object, exactly
like CHAR and VARCHAR.
For indexes on BLOB and TEXT columns,
you must specify an index prefix
length. For CHAR and VARCHAR, a prefix
length is optional. See Section 7.5.1,
“Column Indexes”.
BLOB and TEXT columns cannot have
DEFAULT values.
Also nice to know (from the manual):
Instances of BLOB or TEXT columns in
the result of a query that is
processed using a temporary table
causes the server to use a table on
disk rather than in memory because the
MEMORY storage engine does not support
those data types
which you really should take into account when formulating queries which use TEXT.
A TEXT field should be big enough to store most articles. Seems to be about equivalent to Access's Memo type. It can hold up to 65535 chars, which would be somewhere around...i dunno...10-12,000 words, on average?
The TEXT data type is a safe bet for your situation, VARCHARs are usually used when they need to be indexed or there is a well-defined value to be stored (IP address, zip code, etc).

Which DATATYPE is better to use TEXT or VARCHAR?

This question is based on two things performance and size
Which DATATYPE is better to use TEXT or VARCHAR? Based on performance which will affect and which will impove?
It depends on what you're using it for. I hate to give such a generic answer, but it's true. Generally, try to get the data type as specific as you can. If your strings will never exceed some upper limit of characters, then go with VARCHAR because it will be a little more efficient. If you need more space, go with TEXT. If you aren't sure how much space your text will take up, you should probably go with TEXT; the performance difference isn't very large, and it's better to be future-proof than risk having to change it later when your requirements change. Just my two cents.
In the comments, Pitarou points out that, if MySQL creates a temporary table for your query (see this), TEXT columns will not be stored in memory and will have to be read from the disk, which is much slower. (Source, bottom of the page.) This shouldn't matter for most queries, though.
In case anyone was wondering how PostgreSQL compares, I found this benchmark that shows that CHAR, VARCHAR, and TEXT all perform equally well. So if you're using Postgres, it doesn't matter what type you use.
From V 5.0.3 onwards, Limit of VARCHAR is increased from 0-256 to 0-65,535 (subject to the maximum row size (65,535 bytes, which is shared among all columns) and the character set used.)
Ref. http://dev.mysql.com/doc/refman/5.0/en/char.html
If you are using TEXT that is fixed 64k length, even if you required lesser limit
So Better to go with VARCHAR with higher limit than TEXT.
If requirement is more than 64K go with MEDIUMTEXT or LONGTEXT accordingly.
Queries against the TEXT table were always 3 times slower than those against the VARCHAR table (averages: 0.10 seconds for the VARCHAR table, 0.29 seconds for the TEXT table). The difference is 100% repeatable.
Benchmark from http://forums.mysql.com/read.php?24,105964,105964
VARCHAR you can set a limit for how many chars it will accept per record, text is (virtually) unlimited... not exactly sure about performance, but i would assume a more specific datatype (varchar) would be faster.
VARCHAR should have a better performance since it has a limited size. In fact, in all of my experiences with MySQL, the search operation was always faster with VARCHAR than TEXT. Anyway, it's based on my experience. You should check the documentation to find out more about it.
It really depends on your data type.
If your field is fixed-length (e.g. a 32-character hash value), then use CHAR. This has better performance because every entry takes up the same space per row.
The standard limit for VARCHAR was 255 characters but I think it's been increased now. TEXT is pretty damn long and is generally only used for big content like a whole blog post, and comments if you don't want a limit.
With regard to size there is no (or very little) difference between VARCHAR and TEXT since they just store what they need to. CHAR fields will always take up their allotted length.
Performance-wise, VARCHAR is usually faster. VARCHARs can be indexed too which leads to faster searching.
MySQL will internally convert TEXT to varchar while creating temporary tables. So it is better to use VARCHAR if possible. There are a few minor bugs related to TEXT column such as...
http://bugs.mysql.com/bug.php?id=36676
As per my Opinion VARCHAR is best option when u know the length of characters. It will also reduce garbage Memory Allocations and space issue. TEXT will consume 255 where as VARCHAR will consume as u give the values to it.
As per performance, VARCHAR is also faster then TEXT.
There is a subtle difference in text and varchar. I have a table as shown:
CREATE TABLE `test`.`tbl`(
`kee` BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
`txt` TEXT(100),
`vrchr` VARCHAR(100),
PRIMARY KEY (`kee`)
);
I insert a row:
INSERT INTO `tbl`
(`txt`,
`vrchr`)
VALUES ('1
2
3',
'1
2
3');
The column txt has value:
1
2
3
and column vrchr has value:
1