What is the best way to store a long hex string that has more than 1500 characters?
What is the best type?
Byte
Text
LongText
VarChar
BINARY is fixed-length (values are padded) and has a maximum length of 255 bytes,
so it's no good for you.
VARCHAR and TEXT both have a maximum length of
65535 bytes (64kB). TEXT uses 2 extra bytes to store the length of the data; VARCHAR uses 1 or 2.
LONGTEXT has a maximum length of 4294967295 bytes (4GB)
and uses 4 extra bytes to store the length of the data.
If you know that you will always have less than 64kB of data in your HEX string, I would choose TEXT, otherwise go with LONGTEXT.
If you pack it into bytes (from hex), then VARBINARY(...) or BLOB.
If you keep it as hex, VARCHAR(...) CHARACTER SET ascii or TEXT CHARACTER SET ascii. I mention ascii especially if your default is utf8; there is no need for utf8 for hex digits.
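To make the size difference between the two options concrete, here is a minimal Python sketch. The sample value and the variable names are invented for the illustration; the point is only that packing hex into bytes halves the storage, because every two hex digits become one byte.

```python
# Hypothetical 1500-character hex string, standing in for the real data.
hex_string = "0123456789abcdef" * 93 + "0123456789ab"

# Kept as hex: every digit is an ASCII character, so it takes 1 byte per
# character in an ascii column (VARCHAR/TEXT CHARACTER SET ascii).
as_text = hex_string.encode("ascii")

# Packed: two hex digits become one byte (what you would put in VARBINARY/BLOB).
packed = bytes.fromhex(hex_string)

print(len(as_text))  # 1500
print(len(packed))   # 750

# Unpacking restores the original (lowercase) hex string.
restored = packed.hex()
```

On the MySQL side the same conversion can be done with UNHEX() on the way in and HEX() on the way out, if you prefer not to do it in the application.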
After reading the documentation, I understood that a VARCHAR value is stored with a one-byte or two-byte length prefix that records its length. I understand too that, for a VARCHAR, each character might have a different length in bytes depending on the character itself.
So my question is:
How does the DBMS determine each character's length after it's stored?
Meaning: after a string is stored, let's say it's 4 characters long, and let's suppose that the first character is 1 byte long, the second 2 bytes, the 3rd 3 bytes and the 4th 4 bytes.
How does the DB know how long each character is when retrieving the string, so as to read it correctly?
I hope the question is clear, sorry for any English mistakes I made. Thanks
The way UTF-8 works as a variable-length encoding is that the 1-byte characters can only use 7 bits of that byte.
If the high bit is 0, then the byte is a 1-byte character (which happens to be encoded in the same way as the 128 ASCII characters).
If the high bit is 1, then it's a multi-byte character.
(For the full byte-layout table, see https://en.wikipedia.org/wiki/UTF-8)
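The rule described above can be sketched in a few lines of Python. This is only an illustration of how a reader of UTF-8 bytes can find character boundaries, not what any particular database engine literally runs; the function names are made up for the example.

```python
def utf8_sequence_length(lead_byte: int) -> int:
    """Return how many bytes the character starting with lead_byte occupies."""
    if lead_byte & 0b10000000 == 0b00000000:  # 0xxxxxxx: 1 byte (ASCII range)
        return 1
    if lead_byte & 0b11100000 == 0b11000000:  # 110xxxxx: 2 bytes
        return 2
    if lead_byte & 0b11110000 == 0b11100000:  # 1110xxxx: 3 bytes
        return 3
    if lead_byte & 0b11111000 == 0b11110000:  # 11110xxx: 4 bytes
        return 4
    # 10xxxxxx is a continuation byte and can never start a character.
    raise ValueError("not a valid UTF-8 lead byte")


def split_characters(data: bytes) -> list:
    """Split UTF-8 encoded bytes into one bytes object per character."""
    chunks = []
    i = 0
    while i < len(data):
        n = utf8_sequence_length(data[i])
        chunks.append(data[i:i + n])
        i += n
    return chunks


# One character each of 1, 2, 3 and 4 bytes: 'a', U+00E9, U+20AC, U+1F600.
sample = "a\u00e9\u20ac\U0001F600".encode("utf-8")
lengths = [len(chunk) for chunk in split_characters(sample)]
print(lengths)  # [1, 2, 3, 4]
```

So no per-character length needs to be stored: the first byte of every character already says how many bytes follow.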
If you're talking about UTF-8, that's not quite how it works. The leading bits of the first byte of each character say how many bytes the character occupies, and every following (continuation) byte starts with the bits 10, so it can store one-, two-, three- or four-byte characters fairly efficiently. This is in contrast to UTF-32, where every character is always four bytes, something that is obviously very wasteful for some types of text.
When using UTF-8, or any character set where the characters are a variable number of bytes, there's a disconnect between the length of the string in bytes and the length of the string in characters. In a fixed-length system like Latin1, which is rigidly 8-bit, there's no such drift.
Internally the database is most concerned with the length of a field in terms of bytes. The length in terms of characters is only explicitly exposed when calling functions like CHAR_LENGTH() (in MySQL, LENGTH() returns the length in bytes), as otherwise it's just a bunch of bytes that, if necessary, can be interpreted as a string.
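The gap between the two kinds of length is easy to see outside the database as well. A small Python sketch with an arbitrary sample string follows; in MySQL the two numbers correspond to CHAR_LENGTH() (characters) and LENGTH() (bytes).

```python
text = "na\u00efve \u20ac5"  # "naive" with a diaeresis, a space, a euro sign, "5"

char_length = len(text)                       # length in characters
byte_length_utf8 = len(text.encode("utf-8"))  # length in bytes when stored as UTF-8

print(char_length)       # 8
print(byte_length_utf8)  # 11

# In a single-byte character set such as Latin1 the two lengths always agree.
latin_text = "na\u00efve"
latin_chars = len(latin_text)
latin_bytes = len(latin_text.encode("latin-1"))
print(latin_chars, latin_bytes)  # 5 5
```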
Historically speaking, the database stored the length of a field in bytes in a single byte, followed by the data itself. That's why VARCHAR(255) is so prevalent: it's the longest string you can represent with a single-byte length field. Newer databases like Postgres allow character fields of up to 1GB, so they use up to four bytes to represent the length.
This is probably a stupid question, but I need to ask...
I've created a MySQL table to handle images, called images. In it, I have an attribute called extension that keeps the extension of the image.
Most of the accepted image extensions are jpg, png, gif, bmp, jpeg or tiff. In other words, a maximum of 4 characters long.
Now, should the attribute be declared in the MySQL table like:
extension char(4)
or
extension varchar(4)
There's probably no impact whatsoever on performance, but I do want the model to be optimized from the get-go...
Anyone?
Depends....
If you look at this from the MySQL documentation
Value       CHAR(4)  Storage Required  VARCHAR(4)  Storage Required
''          '    '   4 bytes           ''          1 byte
'ab'        'ab  '   4 bytes           'ab'        3 bytes
'abcd'      'abcd'   4 bytes           'abcd'      5 bytes
'abcdefgh'  'abcd'   4 bytes           'abcd'      5 bytes
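As you can see, a 4-character value takes 4 bytes in CHAR(4) but 5 bytes in VARCHAR(4). If the vast majority of extensions were 4 characters long, CHAR would be more space efficient.
In your case I am guessing that 3-character extensions will be the majority. Those take 4 bytes in either type (3 bytes of data plus 1 length byte in VARCHAR), so VARCHAR costs you nothing extra for them and is a fine choice.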
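The storage figures from the documentation table can be reproduced with a short Python sketch. It assumes a single-byte character set such as latin1 and non-strict SQL mode (where over-long values are truncated rather than rejected); the function names are invented for the illustration.

```python
def char_storage(n: int) -> int:
    """CHAR(n) with a single-byte character set: always n bytes (space-padded)."""
    return n


def varchar_storage(value: str, n: int) -> int:
    """VARCHAR(n) with n <= 255 and a single-byte character set:
    the stored bytes plus a 1-byte length prefix."""
    stored = value[:n]  # assumption: non-strict mode truncates to n characters
    return len(stored.encode("latin-1")) + 1


values = ["", "ab", "abcd", "abcdefgh"]
table = [(v, char_storage(4), varchar_storage(v, 4)) for v in values]
for value, char_bytes, varchar_bytes in table:
    print(repr(value), char_bytes, varchar_bytes)
```

For a 3-character value such as 'jpg', both functions return 4, so the two types only differ for values of 2 characters or fewer (VARCHAR is smaller) and for 4-character values (CHAR is smaller).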
James :-)
Edited, I was making a wrong assumption on my previous answer. I'll just paste you an excerpt from http://dev.mysql.com/doc/refman/5.0/en/char.html (emphasis added)
The CHAR and VARCHAR types are similar, but differ in the way they are stored and retrieved. As of MySQL 5.0.3, they also differ in maximum length and in whether trailing spaces are retained.
The CHAR and VARCHAR types are declared with a length that indicates the maximum number of characters you want to store. For example, CHAR(30) can hold up to 30 characters.
The length of a CHAR column is fixed to the length that you declare when you create the table. The length can be any value from 0 to 255. When CHAR values are stored, they are right-padded with spaces to the specified length. When CHAR values are retrieved, trailing spaces are removed.
Values in VARCHAR columns are variable-length strings. The length can be specified as a value from 0 to 255 before MySQL 5.0.3, and 0 to 65,535 in 5.0.3 and later versions. The effective maximum length of a VARCHAR in MySQL 5.0.3 and later is subject to the maximum row size (65,535 bytes, which is shared among all columns) and the character set used.
In contrast to CHAR, VARCHAR values are stored as a one-byte or two-byte length prefix plus data. The length prefix indicates the number of bytes in the value. A column uses one length byte if values require no more than 255 bytes, two length bytes if values may require more than 255 bytes.
I want to store paragraphs in my database. This includes some bold and italic characters. Is there any datatype in MySQL for this; or, should I store the data in a varchar column?
How should new lines and bold/italic characters be stored in MySQL?
You can't. You have to store the HTML code in a VARCHAR/TEXT field in MySQL and then just echo it without using htmlspecialchars().
You can use one of the data types below, depending on your requirements. Don't worry about bold or italic, since all of that is just characters inside the HTML.
TEXT A character string with a maximum length of 65535 bytes.
BLOB A binary string with a maximum length of 65535 bytes.
MEDIUMTEXT A character string with a maximum length of 16777215 bytes.
MEDIUMBLOB A binary string with a maximum length of 16777215 bytes.
LONGTEXT A character string with a maximum length of 4294967295 bytes.
LONGBLOB A binary string with a maximum length of 4294967295 bytes.
According to the MySQL documentation:
A column uses one length byte if values require no more than 255 bytes, two length bytes if values may require more than 255 bytes
AND
The maximum row size constrains the number of columns because the total width of all columns cannot exceed this size. For example, utf8 characters require up to three bytes per character, so for a CHAR(255) CHARACTER SET utf8 column, the server must allocate 255 × 3 = 765 bytes per value. Consequently, a table cannot contain more than 65,535 / 765 = 85 such columns.
For clarity, what then is the maximum value I can set in the varchar argument so it only uses 1 byte to store its length?
From the MySQL documentation:
The CHAR and VARCHAR types are declared with a length that indicates the maximum number of characters you want to store. For example, CHAR(30) can hold up to 30 characters.
A [VARCHAR] column uses one length byte if values require no more than 255 bytes, two length bytes if values may require more than 255 bytes.
This makes the answer to your question depend on the character encoding.
With a single-byte encoding like windows-1252 (which MySQL calls latin1), the character length is the same as the byte length, so you can use a VARCHAR(255).
With UTF-8 (MySQL's utf8 character set), a VARCHAR(N) may require up to 3N bytes, as would be the case if all characters were in the range U+0800 to U+FFFF. Thus, VARCHAR(85) is the largest declaration that guarantees a single-byte length prefix (requiring a maximum of 255 bytes).
(Note that MySQL's utf8 character set does not support characters outside the BMP. The official definition of UTF-8 allows 4 bytes per character; MySQL's utf8mb4 character set supports that, and with it the limit drops to VARCHAR(63).)
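The arithmetic behind these limits is just integer division of 255 by the maximum number of bytes per character. A hedged Python sketch follows; it relies on the documentation rule quoted above, and the helper name is made up. The utf8mb4 row (4 bytes per character) is included for comparison.

```python
def max_varchar_for_one_byte_prefix(max_bytes_per_char: int) -> int:
    """Largest N such that VARCHAR(N) can never need more than 255 bytes."""
    return 255 // max_bytes_per_char


# Maximum bytes per character for a few MySQL character sets.
charsets = {"latin1": 1, "utf8": 3, "utf8mb4": 4}
limits = {name: max_varchar_for_one_byte_prefix(width)
          for name, width in charsets.items()}
print(limits)  # {'latin1': 255, 'utf8': 85, 'utf8mb4': 63}
```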
For clarity, what then is the maximum value I can set in the varchar argument so it only uses 1 byte to store its length?
This depends on the character set of the VARCHAR column (which the collation implies).
As you noted, UTF8 may use up to three bytes per character, so if you declare a UTF8 column more than 85 characters long, there is a chance that it will use more than 255 bytes to store its data, and hence the length has to be stored in a two-byte field.
If you use latin1, each character is stored in 1 byte.
So the answer is:
VARCHAR(85) COLLATE UTF8_GENERAL_CI
or
VARCHAR(255) COLLATE LATIN1_GENERAL_CI
I think you're confusing string size with character representation.
For instance, you could have a character that takes 4 bytes to represent, and put it inside a string whose maximum storage size requires only one byte to hold the length, since the string never needs more than 255 bytes.
Reading this question, a doubt popped into my head:
char and varchar can store up to 255 chars
text can store up to 65k chars
char size in bytes is number of chars
varchar size in bytes is number of chars used + 1
So how many bytes does TEXT actually occupy? ~65KB, or number of chars used + 1?
TEXT is a variable-length datatype, with a maximum of 65,535 bytes.
LONGTEXT can hold up to about 4 billion bytes (4,294,967,295).
To answer your question: it's variable length, and it will only occupy the space needed for the data you actually store.
TEXT occupies the actual length of your data in bytes + 2 bytes.
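The same "data length plus prefix" rule applies to the whole TEXT family; only the size of the length prefix changes. A minimal Python sketch follows, with the prefix sizes taken from the MySQL storage requirements documentation and function names invented for the example.

```python
# Length-prefix size in bytes for each type in the TEXT family.
LENGTH_PREFIX_BYTES = {"TINYTEXT": 1, "TEXT": 2, "MEDIUMTEXT": 3, "LONGTEXT": 4}


def max_length(type_name: str) -> int:
    """Largest value, in bytes, that the length prefix can describe."""
    return 2 ** (8 * LENGTH_PREFIX_BYTES[type_name]) - 1


def storage_required(type_name: str, data: bytes) -> int:
    """Bytes occupied by a value: its actual length plus the length prefix."""
    if len(data) > max_length(type_name):
        raise ValueError(f"value too long for {type_name}")
    return len(data) + LENGTH_PREFIX_BYTES[type_name]


value = b"x" * 1500  # hypothetical 1500-byte value
text_bytes = storage_required("TEXT", value)
longtext_bytes = storage_required("LONGTEXT", value)
print(text_bytes)          # 1502
print(longtext_bytes)      # 1504
print(max_length("TEXT"))  # 65535
```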