Why do some MySQL data types require extra bytes?

I was reading about MySQL data type sizes. I saw that VARCHAR takes 1 or 2 extra bytes, MEDIUMTEXT requires 3 extra bytes, and LONGTEXT requires 4 extra bytes. What is the reason for this behaviour?

When MySQL (or any database or computer language) stores a variable length string, there are basically two ways to store the value:
The length can be encoded followed by the characters in the string
The end of the string can be marked by a special character (typically the NUL character, '\0')
Databases (almost?) always use length encoding. So, when you store 'ABC' as a variable length string, in the database storage it looks like:
3 A B C
When you store 'A':
1 A
That way, MySQL knows when one string ends and the next begins. The different lengths for the different types are based on the maximum length of the string. So, 1 byte can hold values from 0 to 255. 2 bytes can hold values from 0 to 65,535 and so on.
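As a rough illustration of length encoding, here is a minimal Python sketch. The function names are made up for this example and the little-endian prefix is an assumption; this is not MySQL's actual on-disk code, only the general idea of a 1- or 2-byte length prefix.

```python
import struct

# Hypothetical helpers for illustration only; not MySQL's real storage code.
PREFIX_FORMATS = {1: "<B", 2: "<H"}  # 1 byte: 0..255, 2 bytes: 0..65,535

def encode_varstring(data: bytes, prefix_bytes: int = 1) -> bytes:
    """Return data preceded by a length prefix of 1 or 2 bytes."""
    return struct.pack(PREFIX_FORMATS[prefix_bytes], len(data)) + data

def decode_varstrings(buf: bytes, prefix_bytes: int = 1) -> list:
    """Split a buffer of back-to-back length-prefixed strings."""
    out, pos = [], 0
    while pos < len(buf):
        (n,) = struct.unpack_from(PREFIX_FORMATS[prefix_bytes], buf, pos)
        pos += prefix_bytes
        out.append(buf[pos:pos + n])
        pos += n
    return out

stored = encode_varstring(b"ABC") + encode_varstring(b"A")
print(stored)                     # b'\x03ABC\x01A'
print(decode_varstrings(stored))  # [b'ABC', b'A']
```

A 300-byte value does not fit a 1-byte prefix (struct.pack raises an error), which is why longer types need 2, 3 or 4 prefix bytes.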
When you use a fixed-length character type, say char(3), then 'ABC' looks like:
A B C
This occupies three bytes, or more depending on the character encoding. No length prefix is needed, because the length is known from the table metadata.
With char(3), the string 'A' also occupies three slots:
A _ _   (each _ is a padding space)
The extra two are occupied by spaces. For long strings, this is generally a big waste of space, which is why most strings are stored as varchar rather than char.

Related

mySQL - Does Int(9.455.487) take more space than string(John) in mySQL?

I understood that in a database an int takes less space than a string. But what if the int is longer than the string? For example, 9.455.487 vs "John". Which one will take more space? TY
From the documentation, size of int is 4 bytes, whereas for char it is "M × w bytes, 0 <= M <= 255, where w is the number of bytes required for the maximum-length character in the character set." and M is the declared column size.
So when you talk of how much space is taken, the int will take up 4 bytes for a value as long as the value is within the range of int. A string like "John", if declared as char(4) will take up 4 * w bytes, so at least 4 bytes assuming w is 1.
Long story short, the size of a number is not how many characters long it is when you write it out, but the number of bytes to represent it in the binary form.
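To make that concrete, here is a small Python sketch. It only mimics the sizes with the struct module (a 4-byte signed integer standing in for INT); it is not how MySQL itself is implemented.

```python
import struct

number = 9455487

as_int = struct.pack("<i", number)     # 4-byte signed integer, comparable to INT
as_text = str(number).encode("ascii")  # the same digits stored as characters
name = "John".encode("ascii")          # single-byte characters

print(len(as_int))   # 4
print(len(as_text))  # 7
print(len(name))     # 4
```

The binary integer is 4 bytes no matter how many digits it prints as, while the digit string grows with every character.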
You should be aware of what an "int" (integer) is and what strings are. An integer always has a fixed length, meaning the number of bytes in its binary representation. Strings, on the other hand, are sequences of bytes. So, depending on the encoding, each symbol may take one or more bytes.
The fact that 9.455.487 looks "longer" than "John" is irrelevant here. What is relevant is how the DBMS (or any other environment) represents those things. You're seeing a "longer" integer versus a "shorter" string, but that is only a matter of "screen" representation (i.e. what you see on the screen).
To answer the question: in MySQL, INT is 4 bytes, while string data types such as VARCHAR may have dynamic length. The fixed-length string data type is CHAR, and from that viewpoint your number and your string will have the same length (4 bytes, for CHAR(4) in a single-byte character set). Strings and integers are simply different things, and their visual representation should not confuse you. These entities have different internal structures and therefore should not be compared on "length" according to how they look.
Also, be aware that an integer does not always take 4 bytes: even in MySQL your number may belong to, for example, the BIGINT data type (which is 8 bytes long). And, as mentioned above, strings also have the encoding issue. For instance, a UTF-8 encoded string may need two (or even more) bytes to represent some non-ASCII symbols. In that case each such symbol adds more than 1 byte to the total string length.
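The encoding point can be checked with a few lines of Python. The sample strings are arbitrary choices for this sketch, and the 8-byte format only stands in for BIGINT.

```python
import struct

def utf8_size(text: str) -> int:
    """Number of bytes the text occupies when encoded as UTF-8."""
    return len(text.encode("utf-8"))

for text in ["John", "Jöhn", "日本語"]:
    # characters vs. bytes: only pure ASCII gives the same count
    print(text, len(text), utf8_size(text))

# An 8-byte signed integer, comparable to BIGINT
print(struct.calcsize("<q"))  # 8
```

Here "Jöhn" is 4 characters but 5 bytes, and "日本語" is 3 characters but 9 bytes.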

Mysql byte length

I've come across a website which states the following:
Ex: CountryCode CHAR(3) CHARSET utf8
We are asking for a column with 3 characters exactly. The required storage for this column
will be such that any 3-letter name must fit in. This means (3 characters) times (3 bytes
per character) = 9 bytes of storage. So CHAR and utf8 together may be less than ideal.
VARCHAR behaves better: it only requires as many bytes per character as described above. So
the text "abc" will only require 3 bytes
Do I need 3 bytes or 9 bytes for the text 'abc' (with utf8 and char(3))?
Thanks
MySQL's internal structure places CHAR fields directly within the table structure, e.g. a simple table like:
create table foo (
id int,
name char(3) character set utf8
);
would produce an on-disk record that looks like
xxxxccccccccc
^^^^-- 4 bytes of int storage space
    ^^^^^^^^^-- 9 bytes of utf-8 char space
Since MySQL has no way of knowing in advance what kind of text you'll be storing in that char field, it HAS to assume worst-case, and allocates as much space as 3 chars of 'absolutely the longest possible' utf-8 text might take. If it didn't, then an overly long string would overflow the on-disk storage and start scribbling on an adjacent record.
varchar, on the other hand, stores a 1- or 2-byte length prefix followed by only the bytes the value actually uses. That means 'abc' in a utf8 varchar(3) takes 3 bytes of data plus 1 length byte, while a value made of three 3-byte characters would take 9 bytes plus the length byte.
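The arithmetic from the quoted passage can be sketched in Python. Both helper names are hypothetical, and the 3 bytes per character is the maximum for MySQL's utf8 character set as stated in the quote.

```python
def char_reserved_bytes(declared_chars: int, max_bytes_per_char: int) -> int:
    """Worst-case space a fixed-length CHAR(n) column must be able to hold."""
    return declared_chars * max_bytes_per_char

def varchar_data_bytes(value: str, encoding: str = "utf-8") -> int:
    """Bytes of character data in a variable-length value (length prefix not included)."""
    return len(value.encode(encoding))

print(char_reserved_bytes(3, 3))  # 9 -> CHAR(3) with utf8
print(varchar_data_bytes("abc"))  # 3 -> 'abc' needs one byte per character
```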

char or varchar

This is probably a stupid question, but I need to ask...
I've created a MySQL table to handle images called images. In it, I have an attribute that keeps the extension of the image called extension.
Most of the accepted image extensions are jpg, png, gif, bmp, jpeg or tiff. In other words, a maximum of 4 characters long.
Now, should the attribute be declared in the MySQL table like:
extension char(4)
or
extension varchar(4)
There's probably no impact whatsoever on performance, but I do want the model to be optimized from the get-go...
Anyone?
Depends....
If you look at this from the MySQL documentation
Value       CHAR(4)  Storage Required  VARCHAR(4)  Storage Required
''          '    '   4 bytes           ''          1 byte
'ab'        'ab  '   4 bytes           'ab'        3 bytes
'abcd'      'abcd'   4 bytes           'abcd'      5 bytes
'abcdefgh'  'abcd'   4 bytes           'abcd'      5 bytes
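As you can see, 4 characters take 4 bytes in CHAR but 5 bytes in VARCHAR. If the vast majority of extensions are 4 characters long, CHAR is more space efficient.
In your case I am guessing that 3-character extensions will be the majority. Those take 4 bytes in either type (3 data bytes plus 1 length byte for VARCHAR), so VARCHAR saves nothing here and costs one extra byte for jpeg and tiff.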
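The table can be reproduced with a short Python sketch. It assumes a single-byte character set, a 1-byte length prefix and non-strict mode (over-long values are truncated); the function names are invented for this example.

```python
def char_storage(value: str, n: int):
    """CHAR(n): truncate to n characters, then right-pad with spaces."""
    stored = value[:n].ljust(n)
    return stored, len(stored.encode("latin-1"))

def varchar_storage(value: str, n: int):
    """VARCHAR(n): truncate to n characters, store 1 length byte plus the data."""
    stored = value[:n]
    return stored, 1 + len(stored.encode("latin-1"))

for v in ["", "ab", "abcd", "abcdefgh", "jpg"]:
    print(repr(v), char_storage(v, 4), varchar_storage(v, 4))
```

For a 3-character value such as 'jpg' the sketch gives 4 bytes for both types.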
James :-)
Edited, I was making a wrong assumption on my previous answer. I'll just paste you an excerpt from http://dev.mysql.com/doc/refman/5.0/en/char.html (emphasis added)
The CHAR and VARCHAR types are similar, but differ in the way they are stored and retrieved. As of MySQL 5.0.3, they also differ in maximum length and in whether trailing spaces are retained.
The CHAR and VARCHAR types are declared with a length that indicates the maximum number of characters you want to store. For example, CHAR(30) can hold up to 30 characters.
The length of a CHAR column is fixed to the length that you declare when you create the table. The length can be any value from 0 to 255. When CHAR values are stored, they are right-padded with spaces to the specified length. When CHAR values are retrieved, trailing spaces are removed.
Values in VARCHAR columns are variable-length strings. The length can be specified as a value from 0 to 255 before MySQL 5.0.3, and 0 to 65,535 in 5.0.3 and later versions. The effective maximum length of a VARCHAR in MySQL 5.0.3 and later is subject to the maximum row size (65,535 bytes, which is shared among all columns) and the character set used.
In contrast to CHAR, VARCHAR values are stored as a one-byte or two-byte length prefix plus data. The length prefix indicates the number of bytes in the value. A column uses one length byte if values require no more than 255 bytes, two length bytes if values may require more than 255 bytes.
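The rule in the last paragraph of the excerpt fits in a few lines of Python. The function name is hypothetical; the 255-byte threshold is the one quoted above.

```python
def varchar_prefix_bytes(max_chars: int, max_bytes_per_char: int) -> int:
    """Length-prefix size for a VARCHAR column, per the rule quoted above."""
    max_bytes = max_chars * max_bytes_per_char
    return 1 if max_bytes <= 255 else 2

print(varchar_prefix_bytes(255, 1))  # 1 -> VARCHAR(255), single-byte charset
print(varchar_prefix_bytes(255, 3))  # 2 -> VARCHAR(255), up to 3 bytes per character
print(varchar_prefix_bytes(85, 3))   # 1 -> 85 * 3 = 255 bytes still fits one byte
```

Note that the threshold is in bytes, not characters, so the same declared length can need a different prefix under a multi-byte character set.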

Can MySQL VARCHAR's store any chars?

Can MySQL VARCHAR's store any char like nulls and newlines?
MySQL can store any chars, but depending on the specified encoding, some characters may be misrepresented if different from your display encoding.
Varchars can contain any chars, even 0x00, since a varchar is not a null-terminated string but a length-specified string, where the length is stored at the beginning of the string as a 1- or 2-byte value.
Char columns are not null terminated either: they are fixed length and right-padded with spaces, so a 0x00 byte is stored like any other character (and storage always takes up the full length).
If you want to store true byte data though, you should use the BLOB type. That is also a length specified type, which is guaranteed to contain whatever data you put into it, and is not affected by encodings at all.
You should probably use the BINARY and VARBINARY data types for storing strings of non-printable characters, especially if you want strings of single bytes instead of actual text characters (which can be multi-byte).
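A small Python sketch shows why a length-specified string can hold 0x00 while a null-terminated one cannot. The 1-byte prefix is chosen only to keep the illustration short.

```python
value = b"AB\x00CD"  # five bytes with an embedded 0x00

# Length-specified: the prefix says how many bytes follow, so 0x00 is just data.
length_prefixed = bytes([len(value)]) + value
n = length_prefixed[0]
recovered = length_prefixed[1:1 + n]

# Null-terminated: reading stops at the first 0x00, so the tail is lost.
terminated = value + b"\x00"
c_style = terminated[:terminated.index(b"\x00")]

print(recovered)  # b'AB\x00CD'
print(c_style)    # b'AB'
```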

How many characters in varchar(max)?

How many characters can a SQL Server 2008 database field contain when the data type is VARCHAR(MAX)?
From http://msdn.microsoft.com/en-us/library/ms176089.aspx
varchar [ ( n | max ) ]
Variable-length, non-Unicode character data. n can be a value from 1 through 8,000. max indicates that the maximum storage size is 2^31-1 bytes. The storage size is the actual length of data entered + 2 bytes. The data entered can be 0 characters in length. The ISO synonyms for varchar are char varying or character varying.
1 character = 1 byte. And don't forget the 2 bytes of storage overhead. So, 2^31-3 characters.
For future readers who need this answer quickly:
2^31-1 = 2 147 483 647 characters, or roughly 2.147 billion
See the MSDN reference table for maximum numbers/sizes.
Bytes per varchar(max), varbinary(max), xml, text, or image column: 2^31-1
There's a two-byte overhead for the column, so the actual data is 2^31-3 max bytes in length. Assuming you're using a single-byte character encoding, that's 2^31-3 characters total. (If you're using a character encoding that uses more than one byte per character, divide by the total number of bytes per character. If you're using a variable-length character encoding, all bets are off.)
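The numbers in these answers are easy to verify in Python. The sketch simply follows the reasoning above (a 2-byte overhead, then division by bytes per character); the 2-bytes-per-character line is a hypothetical fixed-width encoding used only as an example.

```python
max_bytes = 2**31 - 1
overhead = 2  # the two-byte overhead described in the answer above

print(max_bytes)                    # 2147483647
print(max_bytes - overhead)         # 2147483645, i.e. 2^31-3
print((max_bytes - overhead) // 2)  # 1073741822 characters at 2 bytes each
```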
There are a few gotchas worth mentioning. You may need to force the use of varchar(max); see
https://dba.stackexchange.com/questions/18483/varcharmax-field-cutting-off-data-after-8000-characters
Also, PRINT only handles 8000 chars; see "How to print VARCHAR(MAX) using Print Statement?"