MySQL - Does int(9.455.487) take more space than string(John) in MySQL?

I understood that in a database an int takes less space than a string. But what if the int is really longer than the string? For example, 9.455.487 vs "John". Which one will take more space? TY

From the documentation, the size of an INT is 4 bytes, whereas for CHAR it is "M × w bytes, 0 <= M <= 255, where w is the number of bytes required for the maximum-length character in the character set," and M is the declared column length.
So when you talk about how much space is taken, the INT will take up 4 bytes for any value within the INT range. A string like "John", if declared as CHAR(4), will take up 4 × w bytes, so at least 4 bytes (exactly 4 when w is 1).
Long story short, the size of a number is not how many characters long it is when you write it out, but the number of bytes needed to represent it in binary form.
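As a rough sketch of that point (hypothetical table and column names; sizes are the documented ones):

-- INT is always 4 bytes, whether it stores 7 or 9455487;
-- CHAR(4) reserves 4 × w bytes, i.e. 4 bytes with a single-byte charset such as latin1.
CREATE TABLE size_demo (
    big_number INT,
    short_name CHAR(4)
) CHARACTER SET latin1;

INSERT INTO size_demo VALUES (9455487, 'John');  -- both values occupy about 4 bytes of data here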

You should be aware of what an "int" (integer) is and what strings are. An integer always has a fixed length, meaning how many bytes its binary representation uses. Strings, on the other hand, are sequences of bytes, so depending on the encoding, each symbol may be one or more bytes.
The fact that 9.455.487 is "longer" than "John" is irrelevant here. What is relevant is how the DBMS (or any other environment) will represent those values. You're seeing a "longer" integer versus a "shorter" string, but that is only a matter of screen representation (i.e. what you see when the values are printed).
To answer the question: in MySQL, INT is 4 bytes, while string data types may have dynamic length, such as VARCHAR. The fixed-length string type is CHAR, and from that viewpoint your number and your string happen to take the same space (4 bytes). Strings and integers are simply different kinds of things to compare for "length", and their visual representation should not confuse you: these entities have different internal structures and therefore should not be compared by how long they look on screen.
Also, be aware that an integer will not always be 4 bytes: even in MySQL, your number may be stored in, for example, a BIGINT column (which is 8 bytes). And, as mentioned above, for strings there is also the encoding issue. For instance, a UTF-8 encoded string may need two (or even more) bytes to represent some non-ASCII symbols; in that case each such symbol adds more than 1 byte to the total string length.
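To make those two caveats concrete, a minimal sketch with hypothetical column names (byte sizes as documented by MySQL):

CREATE TABLE id_demo (
    small_id INT,                                -- 4 bytes, range -2,147,483,648 to 2,147,483,647
    big_id   BIGINT,                             -- 8 bytes, for values outside the INT range
    label    VARCHAR(20) CHARACTER SET utf8mb4   -- 1 to 4 bytes per character, plus a length byte
);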

Related

SQL VARCHAR length

I read a lot of articles and SO questions about the topic, but all they did was to confuse me more.
I'm trying to understand what is the longest string that a VARCHAR can hold, and how to define it.
In some places it said that a VARCHAR can be created with a max length of 255 (i.e. VARCHAR(255)) - I don't understand if it means 255 bytes or characters.
In other places it said that a VARCHAR can hold up to 8000 bytes, and that the length of the longest string then depends on the language (whether it's 1 byte per character, such as Latin, or more).
In simple terms, what does the n in VARCHAR(n) stand for, and what is the range of n?
Is it bytes? Is it a number of characters? Between 0-255? Between 0-8000?
How is a really long text saved? Does it get split into multiple columns?
VARCHAR stores strings at 1 byte per symbol (as opposed to NVARCHAR, which can use 2 or more bytes per character). You can read the details here.
A common misconception is to think that in CHAR(n) and VARCHAR(n), n defines the number of characters. But in CHAR(n) and VARCHAR(n), n defines the string length in bytes (0-8,000); n never defines the number of characters that can be stored. This is similar to the definition of NCHAR(n) and NVARCHAR(n).
The misconception happens because when using single-byte encoding, the storage size of CHAR and VARCHAR is n bytes and the number of characters is also n. However, for multi-byte encoding such as UTF-8, higher Unicode ranges (128-1,114,111) result in one character using two or more bytes. For example, in a column defined as CHAR(10), the Database Engine can store 10 characters that use single-byte encoding (Unicode range 0-127), but fewer than 10 characters when using multi-byte encoding (Unicode range 128-1,114,111). For more information about Unicode storage and character ranges, see Storage differences between UTF-8 and UTF-16.
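As a side note, the byte-versus-character distinction itself is easy to observe. A minimal MySQL sketch (MySQL being the tag on the original question), assuming a UTF-8 connection character set:

-- 'é' is one character but needs 2 bytes in UTF-8
SELECT CHAR_LENGTH('José') AS char_count,  -- 4
       LENGTH('José')      AS byte_count;  -- 5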

mysql char, varchar and decimal byte size

I was reading the mysql documentation on the byte size for different data types, but was a little confused when it came to char, varchar and decimal.
Can somebody help explain the bytes for these three data types, and also answer how many bytes for the following:
char(7)
varchar(9)
decimal(15,2)
decimal(11,6)
Thanks
CHAR(N) is probably the most confusing because a char is not a fixed byte size across character sets. Furthermore, different row formats handle this problem differently. Tersely, if you're using ROW_FORMAT=COMPACT, ROW_FORMAT=DYNAMIC or ROW_FORMAT=COMPRESSED, then CHAR(N) reserves a minimum of N bytes in order to achieve in-place updates without fragmentation. If more bytes are required as a result of the character encoding, then it will use more as necessary, trying to use as few as possible, and no more than the maximum character byte length × N. If you're using ROW_FORMAT=REDUNDANT, then CHAR(N) always uses the maximum character byte length × N.
VARCHAR(N) and VARBINARY(N) set a maximum character length per column of N. Below N, MySQL uses the number of bytes required given the string and the character encoding used. MySQL then uses one additional byte to record the length of the string if the string is below 256 bytes; if the length of the string is greater than 255 bytes, then it uses 2 bytes to record the length. VAR columns are storage efficient, but for string columns with frequent UPDATEs, one can trade storage for performance by using a fixed-length column such as BINARY.
The DECIMAL description is pretty self-explanatory:
"Values for DECIMAL (and NUMERIC) columns are represented using a binary format that packs nine decimal (base 10) digits into four bytes. Storage for the integer and fractional parts of each value are determined separately. Each multiple of nine digits requires four bytes, and the “leftover” digits require some fraction of four bytes. The storage required for excess digits is given by the following table"

How does SQL determine a character's length in a varchar?

After reading the documentation, I understood that there is a one-byte or two-byte length prefix to a varying character so as to determine its length. I understand too that, for a varchar, each character might have a different length in bytes depending on the character itself.
So my question is:
How does the DBMS determine each character's length after it's stored?
Meaning: after a string is stored, let's say it's 4 characters long, and let's suppose that the first character is 1 byte long, the second 2 bytes, the third 3 bytes and the fourth 4 bytes.
How does the DB know how long each character is when retrieving the string, so as to read it correctly?
I hope the question is clear, sorry for any English mistakes I made. Thanks
The way UTF-8 works as a variable-length encoding is that the 1-byte characters can only use 7 bits of that byte.
If the high bit is 0, then the byte is a 1-byte character (which happens to be encoded in the same way as the 128 ASCII characters).
If the high bit is 1, then it's a multi-byte character.
(Picture of the UTF-8 byte layout from https://en.wikipedia.org/wiki/UTF-8)
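You can see those lead and continuation bits by dumping the bytes yourself; a quick MySQL sketch (assuming the literal is converted to utf8mb4):

SELECT HEX(CONVERT('aé€' USING utf8mb4)) AS utf8_bytes;
-- returns 61C3A9E282AC:
--   0x61 = 01100001                 -> high bit 0: a 1-byte (ASCII) character, 'a'
--   0xC3 0xA9 = 11000011 10101001   -> '110' lead byte plus one '10' continuation byte: the 2-byte 'é'
--   0xE2 0x82 0xAC                  -> '1110' lead byte plus two continuation bytes: the 3-byte '€'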
If you're talking about UTF-8, that's not quite how it works. It uses the highest bit in each byte to indicate that the character continues into the next byte, and can store one, two, three or four byte characters fairly efficiently. This is in contrast to UTF-32 where every character is automatically four bytes, something that is obviously very wasteful for some types of text.
When using UTF-8, or any character set where the characters are a variable number of bytes, there's a disconnect between the length of the string in bytes and the length of the string in characters. In a fixed-length system like Latin1, which is rigidly 8-bit, there's no such drift.
Internally the database is most concerned with the length of a field in terms of bytes. The length in terms of characters is only explicitly exposed when calling functions like LENGTH(), as otherwise it's just a bunch of bytes that, if necessary, can be interpreted as a string.
Historically speaking, databases stored the length of a field in a single byte, followed by the data itself. That's why VARCHAR(255) is so prevalent: it's the longest string you can represent with a single-byte length field. Newer databases like Postgres allow character fields of a gigabyte or more, so they use four or more bytes to represent the length.

Recommended way to store a string in this case?

I am storing strings and 99.5+% are less than 255 characters, so I store them in a VARCHAR(255).
The thing is, some of them can be 4kb or so. What's the best way to store those?
Option #1: store them in another table with a pointer to the main.
Option #1.0: add an INT column with DEFAULT NULL and the pointer will be stored there
Option #1.1: the pointer will be stored in the VARCHAR(255) column, e.g 'AAAAAAAAAAA[NUMBER]AAAAAAAAAAAA'
Option #2: increase the size of VARCHAR from 255 to 32767
What's the best of the above, Option #1.0, Option #1.1 or Option #2, performance wise?
Increase the size of your field to fit the max size of your string. A VARCHAR will not use the space unless needed.
VARCHAR values are stored as a 1-byte or 2-byte length prefix plus data. The length prefix indicates the number of bytes in the value. A column uses one length byte if values require no more than 255 bytes, two length bytes if values may require more than 255 bytes.
http://dev.mysql.com/doc/refman/5.0/en/char.html
The MySQL definition says that a VARCHAR(N) value takes L + 1 bytes if column values require 0-255 bytes, and L + 2 bytes if values may require more than 255 bytes, where L is the length in bytes of the stored string.
So I guess that option #2 is quite okay, because the small strings will still take far less space than 32767 bytes.
EDIT:
Also imagine the countless problems options 1.0 and 1.1 would raise when you actually want to query a string without knowing whether it exceeds the length or not.
Option #2 is clearly best. It just adds 1 byte to the size of each value, and doesn't require any complicated joins to merge in the fields from the second table.
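If you do go with option #2, it's a one-line change (hypothetical table and column names; note that with a multi-byte character set such as utf8mb4 a very wide VARCHAR can run into MySQL's 65,535-byte row-size limit, in which case TEXT is the usual fallback):

ALTER TABLE messages MODIFY body VARCHAR(8000);
-- Every value in the column now carries a 2-byte length prefix instead of 1 (the extra byte mentioned above);
-- short strings otherwise keep taking only the bytes they actually need.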

Why some MySQL data type require some extra bytes?

I was reading about the MySQL data type sizes. I saw that VARCHAR takes an extra 1 or 2 bytes, MEDIUMTEXT requires an extra 3 bytes, and LONGTEXT requires an extra 4 bytes. What is the reason for this MySQL behaviour?
When MySQL (or any database or computer language) stores a variable length string, there are basically two ways to store the value:
The length can be encoded followed by the characters in the string
The end of the string can be marked by a special character (typically the NUL byte, '\0')
Databases (almost?) always use length encoding. So, when you store 'ABC' as a variable length string, in the database storage it looks like:
3 A B C
When you store 'A':
1 A
That way, MySQL knows when one string ends and the next begins. The different lengths for the different types are based on the maximum length of the string. So, 1 byte can hold values from 0 to 255. 2 bytes can hold values from 0 to 65,535 and so on.
When you use a regular character expression, say char(3), then 'ABC' looks like:
A B C
This occupies three bytes (or whatever the character encoding requires). The length is known from the table metadata.
With char(3), the string 'A' also occupies three slots:
A [space] [space]
The extra two are occupied by spaces. For long strings, this is generally a big waste of space, which is why most strings are stored as varchar rather than char.
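A small sketch of that trade-off (hypothetical table; the exact on-disk layout also depends on the row format):

CREATE TABLE pad_demo (
    fixed_col CHAR(3),     -- always reserves 3 bytes in latin1; 'A' is stored padded to 'A  '
    var_col   VARCHAR(3)   -- stores a length byte plus only the bytes actually used
) CHARACTER SET latin1;

INSERT INTO pad_demo VALUES ('A', 'A');
-- fixed_col: 3 bytes of storage ('A' plus two pad spaces)
-- var_col:   2 bytes of storage (length byte 1, then 'A')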