MySQL TEXT or VARCHAR - mysql

We have a very large historical table with a column that holds at most 500 UTF8 characters, and its disk usage is growing really fast!
We're adding at least 2 million rows a day, and we were wondering which would do a better job (mostly for storage, but for performance as well): TEXT or VARCHAR(512)?

VARCHAR is probably preferable in your case from both the storage and performance perspective. View this oft-reposted article.

This is useful information; I think that, in general, VARCHAR is usually the better bet.
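A rough back-of-the-envelope sketch of the storage side, based on the length-prefix rules in the manual's storage requirements section (a VARCHAR or TEXT value takes its byte length plus a small length prefix). The average value size is a made-up assumption, and engine overhead (row headers, page fill, indexes) is ignored:

```python
# Rough per-value storage estimate for the table in the question.
# Assumption: engine overhead (row headers, page fill, indexes) is ignored.

ROWS_PER_DAY = 2_000_000   # from the question
AVG_VALUE_BYTES = 300      # hypothetical average encoded length of one value

# VARCHAR(512) in utf8 can exceed 255 bytes, so it carries a 2-byte length prefix.
VARCHAR_PREFIX_BYTES = 2
# A TEXT value also carries a 2-byte length prefix.
TEXT_PREFIX_BYTES = 2

varchar_per_row = AVG_VALUE_BYTES + VARCHAR_PREFIX_BYTES
text_per_row = AVG_VALUE_BYTES + TEXT_PREFIX_BYTES


def daily_megabytes(bytes_per_row: int) -> float:
    """Bytes added per day for this one column, in megabytes (10**6 bytes)."""
    return bytes_per_row * ROWS_PER_DAY / 1_000_000


print("VARCHAR(512):", varchar_per_row, "bytes/row,", daily_megabytes(varchar_per_row), "MB/day")
print("TEXT:        ", text_per_row, "bytes/row,", daily_megabytes(text_per_row), "MB/day")
```

Under these assumptions the per-value byte count comes out the same for both types, so any storage difference would have to come from how the storage engine lays out rows rather than from the length prefix.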

From the MySQL manual:
In most respects, you can regard a BLOB column as a VARBINARY column that can be as large as you like. Similarly, you can regard a TEXT column as a VARCHAR column. BLOB and TEXT differ from VARBINARY and VARCHAR in the following ways:
There is no trailing-space removal for BLOB and TEXT columns when values are stored or retrieved. Before MySQL 5.0.3, this differs from VARBINARY and VARCHAR, for which trailing spaces are removed when values are stored.
On comparisons, TEXT is space extended to fit the compared object, exactly like CHAR and VARCHAR.
For indexes on BLOB and TEXT columns, you must specify an index prefix length. For CHAR and VARCHAR, a prefix length is optional. See Section 7.5.1, “Column Indexes”.
BLOB and TEXT columns cannot have DEFAULT values.
http://dev.mysql.com/doc/refman/5.0/en/blob.html

Related

MYSQL: Difference between Binary and Blob

I'm trying to understand MySQL data types, but I don't get the difference between the (VAR)BINARY data fields and the BLOB fields. What is the difference between these types?
BLOBs can be much bigger: up to 4 GB for a LONGBLOB.
Also, reading the MySQL manual online:
BLOB and TEXT differ from VARBINARY and VARCHAR in the following ways:
There is no trailing-space removal for BLOB and TEXT columns when values are stored or retrieved. Before MySQL 5.0.3, this differs from VARBINARY and VARCHAR, for which trailing spaces are removed when values are stored.
On comparisons, TEXT is space extended to fit the compared object, exactly like CHAR and VARCHAR.
For indexes on BLOB and TEXT columns, you must specify an index prefix length. For CHAR and VARCHAR, a prefix length is optional. See Section 7.5.1, “Column Indexes”.
BLOB and TEXT columns cannot have DEFAULT values.
The BINARY and VARBINARY types are binary strings whose actual values are stored in the table. The actual values of BLOB (and TEXT) types are stored elsewhere in the database, with a 256-byte alias to that slot placed in the table; the BLOB can therefore be "any" size (up to the maximum).

At which point does MySQL start treating VARCHAR cols like TEXT cols?

I'm aware that since MySQL 5, VARCHAR can have a length of up to 65,000. VARCHAR is stored inline, which means faster retrievals, as opposed to TEXT, which is stored outside of the table. That said, the documentation states that MySQL will treat LONG VARCHAR exactly like TEXT.
According to this Source:
From storage prospective BLOB, TEXT as well as long VARCHAR are handled same way by Innodb. This is why Innodb manual calls it “long columns” rather than BLOBs.
When does MySQL start treating VARCHAR like TEXT? At what character count does MySQL make this distinction, so that VARCHAR stops being stored inline?
Short answer: A "long" VARCHAR is a normal VARCHAR and will be inline.
MySQL won't magically start treating a straight VARCHAR as a text type. It'll always be stored inline. As of 5.0.3, the upper limit for VARCHAR was raised to 65,535 bytes. A VARCHAR also carries a 2-byte length prefix if its values may need more than 255 bytes (1 byte otherwise). The effective limit is still bounded by the maximum row size of 65,535 bytes, which is shared among all columns. LONG VARCHAR is actually a different type, kept for compatibility, which maps to MEDIUMTEXT.
See: http://dev.mysql.com/doc/refman/5.0/en/char.html and http://dev.mysql.com/doc/refman/5.0/en/blob.html
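To make the length-prefix and row-size arithmetic concrete, here is a minimal sketch. The function names are hypothetical, and the model is simplified: it assumes the VARCHAR is the only column in the row and that a nullable column costs one extra byte for its NULL flag.

```python
MAX_ROW_BYTES = 65_535  # maximum row size, shared among all columns


def length_prefix_bytes(declared_chars: int, max_bytes_per_char: int) -> int:
    """1 prefix byte if values never need more than 255 bytes, otherwise 2."""
    return 1 if declared_chars * max_bytes_per_char <= 255 else 2


def max_varchar_chars(max_bytes_per_char: int, nullable: bool = True) -> int:
    """Largest N for VARCHAR(N) when it is the only column in the row.

    Simplified model: 2 bytes for the length prefix, plus 1 byte for the
    NULL flag if the column is nullable.
    """
    available = MAX_ROW_BYTES - 2 - (1 if nullable else 0)
    return available // max_bytes_per_char


# latin1 uses 1 byte per character; utf8 uses up to 3.
print(length_prefix_bytes(255, 1), length_prefix_bytes(255, 3))
print(max_varchar_chars(1, nullable=False), max_varchar_chars(3))
```

The practical upshot is that the declared length you can actually get depends on the character set and on what else is in the row, not just on the 65,535 figure.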

Field/Data Type to Store Articles in MySQL Database

I can't help but believe this topic has been written about over and over again but I'm having trouble finding any good, solid information.
What data type should I use to store 200 to 400 words of text? What about longer articles that could approach two or three thousand words?
What options should affect my decision? I don't plan to search this data but I can't completely rule out the possibility that I may want to do that later.
Unfortunately my background is MS Access where the only option for this was a memo field. It doesn't appear to be quite so simple with MySQL.
If you're using MySQL 5.0.3 or later, go VARCHAR. It can hold 65k bytes. As long as you have only 1 long VARCHAR per row, you should be fine.
Otherwise go with text.
From the mysql manual:
BLOB and TEXT differ from VARBINARY and VARCHAR in the following ways:
There is no trailing-space removal for BLOB and TEXT columns when values are stored or retrieved. Before MySQL 5.0.3, this differs from VARBINARY and VARCHAR, for which trailing spaces are removed when values are stored.
On comparisons, TEXT is space extended to fit the compared object, exactly like CHAR and VARCHAR.
For indexes on BLOB and TEXT columns, you must specify an index prefix length. For CHAR and VARCHAR, a prefix length is optional. See Section 7.5.1, “Column Indexes”.
BLOB and TEXT columns cannot have DEFAULT values.
Also nice to know (from the manual):
Instances of BLOB or TEXT columns in the result of a query that is processed using a temporary table causes the server to use a table on disk rather than in memory because the MEMORY storage engine does not support those data types
which you really should take into account when formulating queries which use TEXT.
A TEXT field should be big enough to store most articles. It seems to be about equivalent to Access's Memo type. It can hold up to 65,535 bytes, which would be somewhere around 10,000-12,000 words of plain English text, on average.
The TEXT data type is a safe bet for your situation; VARCHARs are usually used when they need to be indexed or there is a well-defined value to be stored (IP address, zip code, etc).
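The word estimate can be sanity-checked with a little arithmetic. This is only a sketch: the average word length of 5 characters and the bytes-per-character figures are assumptions, not measurements.

```python
TEXT_MAX_BYTES = 65_535  # capacity of a TEXT column, in bytes


def estimated_words(capacity_bytes: int, avg_word_chars: int = 5,
                    bytes_per_char: int = 1) -> int:
    """Rough number of words that fit, counting one space after each word."""
    bytes_per_word = (avg_word_chars + 1) * bytes_per_char
    return capacity_bytes // bytes_per_word


# Plain ASCII English text: 1 byte per character.
print(estimated_words(TEXT_MAX_BYTES))
# Worst case for utf8 text where every character needs 3 bytes.
print(estimated_words(TEXT_MAX_BYTES, bytes_per_char=3))
```

Either way, articles of two or three thousand words fit in a TEXT column with room to spare.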

Why would I use VARCHAR over TEXT in MySQL?

I noticed that in MySQL, both VARCHAR and TEXT offer variable-sized data. Well, VARCHAR is a bit more efficient in data storage, but still, TEXT, MEDIUMTEXT, and LONGTEXT offer a lot more potential. So, what are the real uses of VARCHAR?
First of all, you should read the 10.4. String Types section of the MySQL manual: it will give you all the information you are looking for:
10.4.1. The CHAR and VARCHAR Types
10.4.3. The BLOB and TEXT Types
A couple of important differences:
Difference in the amount of text they can contain:
VARCHAR has quite a small size limit; with the newest versions of MySQL, it's 64 KB for the whole row, shared among all of its columns, which is not that much.
The TEXT family has virtually no limit, as a LONGTEXT can contain up to 2^32 - 1 bytes.
There are differences in indexing and sorting, if I'm not mistaken; quoting the page about TEXT:
About sorting : "Only the first max_sort_length bytes of the column are used when sorting."
And, about performances : "Instances of BLOB or TEXT columns in the result of a query that is processed using a temporary table causes the server to use a table on disk rather than in memory"
Considering this information, if you are sure that your strings will not be too long, and that you'll always be able to store them in a VARCHAR, I would use a VARCHAR.
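For a sense of scale, the capacities of the TEXT family can be laid out side by side. A small sketch, with a hypothetical helper that picks the smallest type able to hold a given number of bytes:

```python
# (type name, maximum value size in bytes, length-prefix bytes)
TEXT_TYPES = [
    ("TINYTEXT", 2**8 - 1, 1),
    ("TEXT", 2**16 - 1, 2),
    ("MEDIUMTEXT", 2**24 - 1, 3),
    ("LONGTEXT", 2**32 - 1, 4),
]


def smallest_text_type(max_value_bytes: int) -> str:
    """Return the smallest TEXT-family type that can hold max_value_bytes."""
    for name, capacity, _prefix in TEXT_TYPES:
        if max_value_bytes <= capacity:
            return name
    raise ValueError("value is too large for any TEXT type")


for name, capacity, prefix in TEXT_TYPES:
    print(f"{name:<10} up to {capacity:>13,} bytes, {prefix}-byte length prefix")

print(smallest_text_type(500 * 3))  # e.g. 500 utf8 characters at 3 bytes each
```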

Questions about types in MySQL

So I was creating a table for comments, and I was wondering. What would be a good type for comment details? I put longtext. Well then, why would people need varchar if longtext can handle it? Also, which type would I want for usernames?
What is the purpose of "primary" for index? What is the purpose of index?
Update:
Let's say a comment was actually a review.
It is true that TEXT can handle any input you'd place in a VARCHAR or CHAR field. In fact, TEXT could handle any data you might want to put in DECIMAL, INT, or almost any other type as well. Following this logic, we might as well make every column a TEXT type.
But this would be a mistake. Why? Because using the appropriate column type for the expected input allows the database to better optimize queries, uses less disk space and makes the data model easier to understand and maintain.
Regarding the questions: a username column should use VARCHAR(20), since you would want and expect most usernames to be short, usually no more than 10-20 characters long. For a review column (like a movie review or book review), a TEXT type would be appropriate, as reviews can span anything from a single paragraph to several pages.
In regards to indexes, try this link:
http://20bits.com/articles/interview-questions-database-indexes/
That depends on what a "comment" is in your system. Typically VARCHAR is pretty standard for both comments and usernames. Declared as VARCHAR(255), this limits you to 255 characters, which is generally pretty acceptable. If you need more characters in your comments, you can bump it up to TEXT, which gives you a little over 65k bytes.
For more information, see the String Types Reference.
TEXT NOT NULL. That gives sufficient room, has a 2-byte overhead, and generally presents no problems.
Regarding TEXT:
On comparisons, TEXT is space extended to fit the compared object, exactly like CHAR and VARCHAR.
For indexes on BLOB and TEXT columns, you must specify an index prefix length. For CHAR and VARCHAR, a prefix length is optional. See Section 7.4.2, “Column Indexes”.
BLOB and TEXT columns cannot have DEFAULT values.
If you use the BINARY attribute with a TEXT data type, the column is assigned the binary collation of the column character set.
Regarding VARCHAR:
Values in VARCHAR columns are variable-length strings. The length can be specified as a value from 0 to 255 before MySQL 5.0.3, and 0 to 65,535 in 5.0.3 and later versions. The effective maximum length of a VARCHAR in MySQL 5.0.3 and later is subject to the maximum row size (65,535 bytes, which is shared among all columns) and the character set used.
More at: http://dev.mysql.com/doc/refman/5.0/en/blob.html
Have a look at this web page, it lists all the MySQL field types and describes what they are and how they're different from each other.