Estimating the size needed for text in MySQL - mysql

When I have a text field, I generally store it as varchar. But then I encounter the issue of how do I know the limit I should place? Estimating how much text a user will type seems very imprecise.
As varchar uses as much space as needed, is it better to set the limit to far greater than you estimate?
Is there any disadvantage to using the various Text datatypes? Can it be searched using the "LIKE" operator and wildcards?

I think there could be cases where it's better to shrink the value of varchar. But in many cases it's fine to use 255 as it's limit.
Varchar can be searched with LIKE and wildchards.

If the user is likely to type more than 255 characters (the usual limit for a VARCHAR) then you need TEXT, but it can be indexed. MySQL has a FULLTEXT index for TEXT columns, which works reasonably well, but wouldnt be as quick as a CHAR or VARCHAR column.
Why not try a test out on your local machine with some dummy data to see how the performance compares, and if you can do the queries you would like quickly enough?

Related

MySQL best way to store long strings

I'm looking for some advice on the best way to store long strings of data from the mySQL experts.
I have a general purpose table which is used to store any kind of data, by which I mean it should be able to hold alphanumeric and numeric data.
Currently, the table structure is simple with an ID and the actual data stored in a single column as follows:
id INT(11)
data VARCHAR(128)
I now have a requirement to store a larger amount of data (up to 500 characters) and am wondering whether the best way would be to simply increase the varchar column size, or whether I should add a new column (a TEXT type column?) for the times I need to store longer strings.
If any experts out there has any advice I'm all ears!
My preferred method would be to simply increase the varchar column, but that's because I'm lazy.
The mySQL version I'm running is 5.0.77.
I should mention the new 500 character requirement will only be for the odd record; most records in the table will be not longer than 50 characters.
I thought I'd be future-proofing by making the column 128. Shows how much I knew!
Generally speaking, this is not a question that has a "correct" answer. There is no "infinite length" text storage type in MySQL. You could use LONGTEXT, but that still has an (absurdly high) upper limit. Yet if you do, you're kicking your DBMS in the teeth for having to deal with that absurd blob of a column for your 50-character text. Not to mention the fact that you hardly do anything with it.
So, most futureproofness(TM) is probably offered by LONGTEXT. But it's also a very bad method of resolving the issue. Honestly, I'd revisit the application requirements. Storing strings that have no "domain" (as in, being well-defined in their application) and arbitrary length is not one of the strengths of RDBMS.
If I'd want to solve this on the "application design" level, I'd use NoSQL key-value store for this (and I'm as anti-NoSQL-hype as they get, so you know it's serious), even though I recognize it's a rather expensive change for such a minor change. But if this is an indication of what your DBMS is eventually going to hold, it might be more prudent to switch now to avoid this same problem hundred times in the future. Data domain is very important in RDBMS, whereas it's explicitly sidelined in non-relational solutions, which seems to be what you're trying to solve here.
Stuck with MySQL? Just increase it to VARCHAR(1000). If you have no requirements for your data, it's irrelevant what you do anyway.
Careful if using text. TEXT data is not stored in the database server’s memory, therefore, whenever you query TEXT data, MySQL has to read from it from the disk, which is much slower in comparison with CHAR and VARCHAR as it cannot make use of indexes.The better way to store long string will be nosql databases
We can use varchar(<maximum_limit>). The maximum limit that we can pass is 65535 bytes.
Note: This maximum length of a VARCHAR is shared among all columns except TEXT/BLOB columns and the character set used.

Permance using varchar(750) or text [duplicate]

I've got a messages table in MySQL which records messages between users. Apart from the typical ids and message types (all integer types) I need to save the actual message text as either VARCHAR or TEXT. I'm setting a front-end limit of 3000 characters which means the messages would never be inserted into the db as longer than this.
Is there a rationale for going with either VARCHAR(3000) or TEXT? There's something about just writing VARCHAR(3000) that feels somewhat counter-intuitive. I've been through other similar posts on Stack Overflow but would be good to get views specific to this type of common message storing.
TEXT and BLOB may by stored off the table with the table just having a pointer to the location of the actual storage. Where it is stored depends on lots of things like data size, columns size, row_format, and MySQL version.
VARCHAR is stored inline with the table. VARCHAR is faster when the size is reasonable, the tradeoff of which would be faster depends upon your data and your hardware, you'd want to benchmark a real-world scenario with your data.
Can you predict how long the user input would be?
VARCHAR(X)
Max Length: variable, up to 65,535 bytes (64KB)
Case: user name, email, country, subject, password
TEXT
Max Length: 65,535 bytes (64KB)
Case: messages, emails, comments, formatted text, html, code, images, links
MEDIUMTEXT
Max Length: 16,777,215 bytes (16MB)
Case: large json bodies, short to medium length books, csv strings
LONGTEXT
Max Length: 4,294,967,29 bytes (4GB)
Case: textbooks, programs, years of logs files, harry potter and the goblet of fire, scientific research logging
There's more information on this question.
Just to clarify the best practice:
Text format messages should almost always be stored as TEXT (they end up being arbitrarily long)
String attributes should be stored as VARCHAR (the destination user name, the subject, etc...).
I understand that you've got a front end limit, which is great until it isn't. *grin* The trick is to think of the DB as separate from the applications that connect to it. Just because one application puts a limit on the data, doesn't mean that the data is intrinsically limited.
What is it about the messages themselves that forces them to never be more then 3000 characters? If it's just an arbitrary application constraint (say, for a text box or something), use a TEXT field at the data layer.
Short answer: No practical, performance, or storage, difference.
Long answer:
There is essentially no difference (in MySQL) between VARCHAR(3000) (or any other large limit) and TEXT. The former will truncate at 3000 characters; the latter will truncate at 65535 bytes. (I make a distinction between bytes and characters because a character can take multiple bytes.)
For smaller limits in VARCHAR, there are some advantages over TEXT.
"smaller" means 191, 255, 512, 767, or 3072, etc, depending on version, context, and CHARACTER SET.
INDEXes are limited in how big a column can be indexed. (767 or 3072 bytes; this is version and settings dependent)
Intermediate tables created by complex SELECTs are handled in two different ways -- MEMORY (faster) or MyISAM (slower). When 'large' columns are involved, the slower technique is automatically picked. (Significant changes coming in version 8.0; so this bullet item is subject to change.)
Related to the previous item, all TEXT datatypes (as opposed to VARCHAR) jump straight to MyISAM. That is, TINYTEXT is automatically worse for generated temp tables than the equivalent VARCHAR. (But this takes the discussion in a third direction!)
VARBINARY is like VARCHAR; BLOB is like TEXT.
A table with several 'large' VARCHARs could hit a limit of 64KB for the whole table definition; switching to TEXT is a simple and practical fix. (Example: (42000) Row size too large, from an Oracle dump to a MySQL dump )
Rebuttal to other answers
The original question asked one thing (which datatype to use); the accepted answer answered something else (off-record storage). That answer is now out of date.
When this thread was started and answered, there were only two "row formats" in InnoDB. Soon afterwards, two more formats (DYNAMIC and COMPRESSED) were introduced.
The storage location for TEXT and VARCHAR() is based on size, not on name of datatype. For an updated discussion of on/off-record storage of large text/blob columns, see this .
Disclaimer: I'm not a MySQL expert ... but this is my understanding of the issues.
I think TEXT is stored outside the mysql row, while I think VARCHAR is stored as part of the row. There is a maximum row length for mysql rows .. so you can limit how much other data you can store in a row by using the VARCHAR.
Also due to VARCHAR forming part of the row, I suspect that queries looking at that field will be slightly faster than those using a TEXT chunk.
The preceding answers don't insist enough on the main problem: even in very simple queries like
(SELECT t2.* FROM t1, t2 WHERE t2.id = t1.id ORDER BY t1.id)
a temporary table can be required, and if a VARCHAR field is involved, it is converted to a CHAR field in the temporary table. So if you have in your table say 500 000 lines with a VARCHAR(65000) field, this column alone will use 6.5*5*10^9 byte. Such temp tables can't be handled in memory and are written to disk. The impact can be expected to be catastrophic.
Source (with metrics): https://nicj.net/mysql-text-vs-varchar-performance/
(This refers to the handling of TEXT vs VARCHAR in "standard"(?) MyISAM storage engine. It may be different in others, e.g., InnoDB.)
Varchar is for small data like email addresses, while Text is for much bigger data like news articles, Blob for binary data such as images.
The performance of Varchar is more powerful because it runs completely from memory, but this will not be the case if data is too big like varchar(4000) for example.
Text, on the other hand, does not stick to memory and is affected by disk performance, but you can avoid that by separating text data in a separate table and apply a left join query to retrieve text data.
Blob is much slower so use it only if you don't have much data like 10000 images which will cost 10000 records.
Follow these tips for maximum speed and performance:
Use varchar for name, titles, emails
Use Text for large data
Separate text in different tables
Use Left Join queries on an ID such as a phone number
If you are going to use Blob apply the same tips as in Text
This will make queries cost milliseconds on tables with data >10 M and size up to 10GB guaranteed.
There is a HUGE difference between VARCHAR and TEXT. While VARCHAR fields can be indexed, TEXT fields cannot. VARCHAR type fields are stored inline while TEXT are stored offline, only pointers to TEXT data is actually stored in the records.
If you have to index your field for faster search, update or delete than go for VARCHAR, no matter how big. A VARCHAR(10000000) will never be the same as a TEXT field bacause these two data types are different in nature.
If you use you field only for archiving
you don't care about data
speed retrival
you care about speed but you will use the operator
'%LIKE%' in your search query so indexing will not help much
you
can't predict a limit of the data length
than go for TEXT.
Just a correction to so many answers here, even if it is a bit late to the party.
Text fields can be fully indexed by MySQL as per their documentation.
Link provided
https://dev.mysql.com/doc/refman/5.6/en/column-indexes.html
Overall Varchar fields are longer to write to then Text fields, but it only matters if you have multitude of write requests

mySQL, type and length recommendation for Twitter message

I'm looking to store Tweets in mySQL. I would only need the message body and am looking for a recommendation as far as type and character length. Varchar or Text, if either of those.. how long? Tweets are limited to 140 characters, but I know there is a series of possible gotchas like re-tweets, #'s and links. Anybody went down this path? What worked for you?
I found this but this relates to the user name and not the message..
Twitter name length in DB
I think VARCHAR(255) is best for you. There isn't much reason to make a VARCHAR less than 255 since it is a variable length string it takes the same amount of space to store whether you limit it to 255 or 200. I don't think you have need for a TEXT data type. Side note, just in case: Watch out for injection attacks. Don't trust tweets, haha.
varchar can be used up to 255 i think that is more than enough buffer space for the gotchas.
even though i think you may be able to get by with varchar 200
It looks like it depends on the storage engine you are using. As VARCHAR fields, as opposed to TEXT fields, are stored in-row for the MyISAM storage engine. (I'm not sure but it seems there is not much difference when using innoDB)
So I would go with varchar.

Limiting the size of TEXT in MySQL for speed

I have a table with a field that needs to store 512 characters. My question is this: can you limit the length of a TEXT field (VARCHAR stops at 255 - I can't use it) to 512 characters? If I do that, will MySQL run through the table quicker due to the fixed data length, or does it act a bit like VARCHAR in that the length can vary?
I don't have a table (at all) yet, but I'm very conscious about speed and size for the future. I found this article: link text incredibly useful - maybe you will to!
Thanks a lot,
James
Update your mysql server. Since 5.0.3 you can store up to 65535 bytes in VARCHAR.
You don't really say how you are using the table, so I'll just throw this in. One option is to separate out the text field into another table and just have a reference to it in your original table.

Which DATATYPE is better to use TEXT or VARCHAR?

This question is based on two things performance and size
Which DATATYPE is better to use TEXT or VARCHAR? Based on performance which will affect and which will impove?
It depends on what you're using it for. I hate to give such a generic answer, but it's true. Generally, try to get the data type as specific as you can. If your strings will never exceed some upper limit of characters, then go with VARCHAR because it will be a little more efficient. If you need more space, go with TEXT. If you aren't sure how much space your text will take up, you should probably go with TEXT; the performance difference isn't very large, and it's better to be future-proof than risk having to change it later when your requirements change. Just my two cents.
In the comments, Pitarou points out that, if MySQL creates a temporary table for your query (see this), TEXT columns will not be stored in memory and will have to be read from the disk, which is much slower. (Source, bottom of the page.) This shouldn't matter for most queries, though.
In case anyone was wondering how PostgreSQL compares, I found this benchmark that shows that CHAR, VARCHAR, and TEXT all perform equally well. So if you're using Postgres, it doesn't matter what type you use.
From V 5.0.3 onwards, Limit of VARCHAR is increased from 0-256 to 0-65,535 (subject to the maximum row size (65,535 bytes, which is shared among all columns) and the character set used.)
Ref. http://dev.mysql.com/doc/refman/5.0/en/char.html
If you are using TEXT that is fixed 64k length, even if you required lesser limit
So Better to go with VARCHAR with higher limit than TEXT.
If requirement is more than 64K go with MEDIUMTEXT or LONGTEXT accordingly.
Queries against the TEXT table were always 3 times slower than those against the VARCHAR table (averages: 0.10 seconds for the VARCHAR table, 0.29 seconds for the TEXT table). The difference is 100% repeatable.
Benchmark from http://forums.mysql.com/read.php?24,105964,105964
VARCHAR you can set a limit for how many chars it will accept per record, text is (virtually) unlimited... not exactly sure about performance, but i would assume a more specific datatype (varchar) would be faster.
VARCHAR should have a better performance since it has a limited size. In fact, in all of my experiences with MySQL, the search operation was always faster with VARCHAR than TEXT. Anyway, it's based on my experience. You should check the documentation to find out more about it.
It really depends on your data type.
If your field is fixed-length (e.g. a 32-character hash value), then use CHAR. This has better performance because every entry takes up the same space per row.
The standard limit for VARCHAR was 255 characters but I think it's been increased now. TEXT is pretty damn long and is generally only used for big content like a whole blog post, and comments if you don't want a limit.
With regard to size there is no (or very little) difference between VARCHAR and TEXT since they just store what they need to. CHAR fields will always take up their allotted length.
Performance-wise, VARCHAR is usually faster. VARCHARs can be indexed too which leads to faster searching.
MySQL will internally convert TEXT to varchar while creating temporary tables. So it is better to use VARCHAR if possible. There are a few minor bugs related to TEXT column such as...
http://bugs.mysql.com/bug.php?id=36676
As per my Opinion VARCHAR is best option when u know the length of characters. It will also reduce garbage Memory Allocations and space issue. TEXT will consume 255 where as VARCHAR will consume as u give the values to it.
As per performance, VARCHAR is also faster then TEXT.
There is a subtle difference in text and varchar. I have a table as shown:
CREATE TABLE `test`.`tbl`(
`kee` BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
`txt` TEXT(100),
`vrchr` VARCHAR(100),
PRIMARY KEY (`kee`)
);
I insert a row:
INSERT INTO `tbl`
(`txt`,
`vrchr`)
VALUES ('1
2
3',
'1
2
3');
The column txt has value:
1
2
3
and column vrchr has value:
1