Does empty LONGTEXT string takes 4GB of disc space? - mysql

I've been reading about disc usage/space for different strings, it says that LONGTEXT takes 4GB.
Is that disk space declared for FULLY FILLED column or JUST CREATED (Empty)
Thank You.

The answer is: L + 4 bytes, where L < 2^32
Variable-length string types are stored using a length prefix plus
data. The length prefix requires from one to four bytes depending on
the data type, and the value of the prefix is L (the byte length of
the string). For example, storage for a MEDIUMTEXT value requires L
bytes to store the value plus three bytes to store the length of the
value.
Source: https://dev.mysql.com/doc/refman/5.7/en/storage-requirements.html#data-types-storage-reqs-strings

Related

mysql char, varchar and decimal byte size

I was reading the mysql documentation on the byte size for different data types, but was a little confused when it came to char, varchar and decimal.
Can somebody help explain the bytes for these three data types, and also answer how many bytes for the following:
char(7)
varchar(9)
decimal(15,2)
decimal(11,6)
Thanks
CHAR(N) is probably the most confusing because a char is not a fixed byte size across character sets. Furthermore, different row formats handle this problem differently. Tersely, if you're using ROW_FORMAT=COMPACT, ROW_FORMAT=DYNAMIC or ROW_FORMAT=COMPRESSED then CHAR(N) reverse a minimum of N bytes in order to achieve updates in place without fragmentation. If more bytes are required as the result of a different character encoding than it will use more as necessary, trying to use as few as possible, and NO MORE than the maximum character byte length * N is used. If you're using ROW_FORMAT=REDUNDANT, than CHAR(N) always uses the maximum character byte length * N.
VARCHAR(N) and VARBINARY(N) sets a maximum character length per column of N. Below N, MySQL uses the number of bytes required given the string and character encoding used. MySQL then uses one additional byte to record the length of the string if the string is below 256 bytes. If the length of the string is greater than 255 bytes than it uses 2 bytes to record the length of the string. VAR columns are storage efficient but for string columns with frequent UPDATES, one can trade storage for performance by using a fixed length column such as BINARY.
The DECIMAL description is pretty self explanatory:
"Values for DECIMAL (and NUMERIC) columns are represented using a binary format that packs nine decimal (base 10) digits into four bytes. Storage for the integer and fractional parts of each value are determined separately. Each multiple of nine digits requires four bytes, and the “leftover” digits require some fraction of four bytes. The storage required for excess digits is given by the following table"

mySQL - Does Int(9.455.487) take more space than string(John) in mySQL?

I understood that in a database an int takes less space than a string. But what if the int is really longer than the string. For example 9.455.487 vs "John". Which one will take more space? TY
From the documentation, size of int is 4 bytes, whereas for char it is "M × w bytes, 0 <= M <= 255, where w is the number of bytes required for the maximum-length character in the character set." and M is the declared column size.
So when you talk of how much space is taken, the int will take up 4 bytes for a value as long as the value is within the range of int. A string like "John", if declared as char(4) will take up 4 * w bytes, so at least 4 bytes assuming w is 1.
Long story short, the size of a number is not how many characters long it is when you write it out, but the number of bytes to represent it in the binary form.
You should be aware of what "int" (integer) is and what strings are. Integer always has some length and that length means how many bytes are in it's binary representation. On the other hand, strings are sequences of bytes. So, depending of encoding, each symbol may be one or more bytes.
The thing that 9.455.487 is "longer" than "John" is irrelevant here. What is relevant - is how DBMS (or whatever other environment) will represent those things. You're seeing "longer" integer versus "shorter" string while it's not so, it's only a matter of "screen" representation (i.e. what you see on the screen).
Answering question - for MySQL, INT is 4 bytes, while string data types may have dynamic length - such as VARCHAR. Static string length date type is CHAR and from that viewpoint, your number and your string will have same length (4 bytes). Strings and integers are just different things to compare for "length". And visual representation should not confuse you. This entities have different internal structure, and, therefore, should not be compared on "length" according to their visual representation.
Also, you should be aware that not always integer will have 4 bytes length - since even in MySQL your number may belong to, for example, BIGINT data type (which is 8 bytes length). And, as mentioned above, for strings there's also encoding issue. For instance, UTF-8 encoded string may have two (or even more) bytes to represent some non-ASCII symbols. In this case each symbol will add more that 1 byte to total string length.

Recommended way to store a string in this case?

I am storing strings and 99.5+% are less than 255 characters, so I store them in a VARCHAR(255).
The thing is, some of them can be 4kb or so. What's the best way to store those?
Option #1: store them in another table with a pointer to the main.
Option #1.0: add an INT column with DEFAULT NULL and the pointer will be stored there
Option #1.1: the pointer will be stored in the VARCHAR(255) column, e.g 'AAAAAAAAAAA[NUMBER]AAAAAAAAAAAA'
Option #2: increase the size of VARCHAR from 255 to 32767
What's the best of the above, Option #1.0, Option #1.1 or Option #2, performance wise?
Increase the size of your field to fit the max size of your string. A VARCHAR will not use the space unless needed.
VARCHAR values are stored as a 1-byte or 2-byte length prefix plus
data. The length prefix indicates the number of bytes in the value. A
column uses one length byte if values require no more than 255 bytes,
two length bytes if values may require more than 255 bytes.
http://dev.mysql.com/doc/refman/5.0/en/char.html
The MySQL Definition says that VARCHAR(N) will take up to L + 1 bytes if column values require 0 – 255 bytes, L + 2 bytes if values may require more than 255 bytes where L is the length in bytes of the stored string.
So I guess that option #2 is quite okay, because the small strings will still take less space than 32767 bytes.
EDIT:
Also imagine the countless problems options 1.0 and 1.1 would raise when you actually want to query a string without knowing whether it exceeds the length or not.
Option #2 is clearly best. It just adds 1 byte to the size of each value, and doesn't require any complicated joins to merge in the fields from the second table.

MySQL char & varchar character sets & storage sizes

Wondering how much actual storage space will be taken up by these two datatypes, as the MySQL documentation is slightly unclear on the matter.
CHAR(M) M × w bytes, 0 <= M <= 255, where w is the number of bytes
required for the maximum-length character in the character set
VARCHAR(M), VARBINARY(M) L + 1 bytes if column values require 0 – 255
bytes, L + 2 bytes if values may require more than 255 bytes
This seems to imply to me that, given a utf8-encoded database, a CHAR will always take up 32 bits per character, whilst a VARCHAR will take between 8 and 32 depending on the actual byte length of the characters stored. Is that correct? Or does a VARCHAR imply an 8-bit character width, and storing multi-octet UTF8 characters actually consumes multiple 'characters' from the VARCHAR? Or does the VARCHAR also always store 32 bits per character? So many possibilities.
Not something I've ever had to worry this much about before, but I'm starting to hit in-memory temp table size limits and I don't necessarily want to have to increase MySQL's available pool (for the second time).
CHAR and VARCHAR both count characters. Both of them count the maximum storage that they might require given the character encoding and length. For ASCII, that's 1 byte per character. For UTF-8, that's 3 bytes per character (not 4 as you'd expect, because MySQL's Unicode support is limited for some reason, and it doesn't support any Unicode characters which would require 4 bytes in UTF-8). So far, CHAR and VARCHAR are the same.
Now, CHAR just goes ahead and reserves this amount of storage.
VARCHAR instead allocated 1 or 2 bytes, depending on whether this maximum storage is < 256 or ≥ 256. And the actual amount of space occupied by the entry is these one or two bytes, plus the amount of space actually occupied by the string.
Interestingly, this makes 85 a magic number for UTF-8 VARCHAR:
VARCHAR(85) uses 1 byte for the length because the maximum possible length of 85 UTF-8 characters is 3 × 85 = 255.
VARCHAR(86) uses 2 byte for the length because the maximum possible length of 86 UTF-8 characters is 3 × 86 = 258.

MySQL entry and space used

I have a MySQL table with one of the columns like "varchar(255)". Will the Database use 255bytes of space even if that column is empty? or only if it has some data and the amount of space used is proportional to the data?
every cell will take only the amount of space proportional to the data.
http://dev.mysql.com/doc/refman/5.0/en/char.html
only if it has some data and the amount of space used is proportional to the data.
VARCHAR(M) takes N+1 bytes or more depending on the size of the data you're adding. A blank field (empty string) will still consume 1 byte (that is the +1 on the N+1) that is used to indicate where that field's data ends - so you have 1 byte for the terminator.
From MySQL's website:
VARCHAR(M), VARBINARY(M)
L + 1 bytes if column values require 0 – 255 bytes,
L + 2 bytes if values may require more than 255 bytes
where L is the length of your data. In your case, you'll be consuming (data length + 1) on your VARCHAR(255) field.