Why can't tinyint store more than the number 255 in MySQL? - mysql

If TINYINT can store three characters, for example, why can't it store up to the number 999?

Because it takes only 8 bit and hence can encode no more than 2^8 = 256 values.
The three characters you see in something like '123' are the result of the binary to decimal conversion. You cannot store arbitrary 3 characters there.

It is 8 bits and can actually store a maximum value of 255. 8 bits have 256 possible states including zero.

Related

mysql char, varchar and decimal byte size

I was reading the mysql documentation on the byte size for different data types, but was a little confused when it came to char, varchar and decimal.
Can somebody help explain the bytes for these three data types, and also answer how many bytes for the following:
char(7)
varchar(9)
decimal(15,2)
decimal(11,6)
Thanks
CHAR(N) is probably the most confusing because a char is not a fixed byte size across character sets. Furthermore, different row formats handle this problem differently. Tersely, if you're using ROW_FORMAT=COMPACT, ROW_FORMAT=DYNAMIC or ROW_FORMAT=COMPRESSED then CHAR(N) reverse a minimum of N bytes in order to achieve updates in place without fragmentation. If more bytes are required as the result of a different character encoding than it will use more as necessary, trying to use as few as possible, and NO MORE than the maximum character byte length * N is used. If you're using ROW_FORMAT=REDUNDANT, than CHAR(N) always uses the maximum character byte length * N.
VARCHAR(N) and VARBINARY(N) sets a maximum character length per column of N. Below N, MySQL uses the number of bytes required given the string and character encoding used. MySQL then uses one additional byte to record the length of the string if the string is below 256 bytes. If the length of the string is greater than 255 bytes than it uses 2 bytes to record the length of the string. VAR columns are storage efficient but for string columns with frequent UPDATES, one can trade storage for performance by using a fixed length column such as BINARY.
The DECIMAL description is pretty self explanatory:
"Values for DECIMAL (and NUMERIC) columns are represented using a binary format that packs nine decimal (base 10) digits into four bytes. Storage for the integer and fractional parts of each value are determined separately. Each multiple of nine digits requires four bytes, and the “leftover” digits require some fraction of four bytes. The storage required for excess digits is given by the following table"

How does CHAR(n) represent its data in binary?

I am currently reading Robin Nixon's Learning PHP, MySQL, & JavaScript book. In particular the chapters on databases using MySQL. Here is the topic that I am struggling to thoroughly understand.
"The year field, however, has more-predictable values, so instead of VARCHAR we use the more efficient CHAR(4) data type. The parameter of 4 allows for 4 bytes of data, supporting all years from –999 to 9999; a byte comprises 8 bits and can have the values 00000000 through 11111111, which are 0 to 255 in decimal."
In the example above, CHAR(4) is set to store 4 characters that will represent a four digit year entry. Why is it that -999 to 9999 are the only options. How is it represented in binary that it is limited to just those values? My original thought process was that since 4 bytes is 32 bits and if all 32 bits are set to 1, 2^32 is about 4 Billion unsigned and about 2 Billion signed, that would be the amount that could be stored.
I appreciate any explanations.
The book you quoted seems to be suggesting using CHAR to store year values, even though this does not store in binary, it stores digit characters. You are correct that four bytes could store a much greater range if you use the bytes to store a binary value of the number.
It's surprising that the book is concerned about the single extra byte needed to encode the length for a VARCHAR, then it uses CHAR to store numeric year values in such an inefficient way.
It's an example of "penny wise, pound foolish."
Take this book (or any book) with a grain of salt. Remember that tech books are written by people, and they can make mistakes.
First all, for limited year values use YEAR (yyyy, 1901 to 2155, or 0000): 1 byte.
INT(4) can be stored in 2 bytes and would be best (as it has no year 2156 problem). Actually 4 bytes; SMALL INT takes 2 bytes, as #BillKarwin mentions.
CHAR(4) in general (single byte charset) is limited to 4 bytes, and but CHAR is padded with spaces, like '712 ' and cannot immediatedly be used with arithmetic.
VARCHAR(4) needs more than CHAR(4) and allows also smaller strings '712'.

char or varchar

This is probably a stupid question, but i need to ask...
I've created a MySQL table to handle images called images. In it, I have an attribute that keeps the extension of the image called extension.
Most of the accepted images extensions are either jpg or png or gif or bmp or jpeg or tiff In other words, a maximum of 4 characters long.
Now, should the attribute be declared in the MySQL table like:
extension char(4)
or
extension varchar(4)
There's probably no impact what so ever on performance, but i do want the model to be optimize from the get go...
Anyone?
Depends....
If you look at this from the MySQL documentation
Value CHAR(4) Storage Required VARCHAR(4) Storage Required
'' ' ' 4 bytes '' 1 byte
'ab' 'ab ' 4 bytes 'ab' 3 bytes
'abcd' 'abcd' 4 bytes 'abcd' 5 bytes
'abcdefgh' 'abcd' 4 bytes 'abcd' 5 bytes
As you can see 4 characters for CHAR takes 4 bytes, while VARCHAR takes 5. If the vast majority of extensions would be 4 characters then CHAR would be more space efficient.
In your case I am guessing that 3 will be majority so VARCHAR is the better choice.
James :-)
Edited, I was making a wrong assumption on my previous answer. I'll just paste you an excerpt from http://dev.mysql.com/doc/refman/5.0/en/char.html (emphasis added)
The CHAR and VARCHAR types are similar, but differ in the way they are stored and retrieved. As of MySQL 5.0.3, they also differ in maximum length and in whether trailing spaces are retained.
The CHAR and VARCHAR types are declared with a length that indicates the maximum number of characters you want to store. For example, CHAR(30) can hold up to 30 characters.
The length of a CHAR column is fixed to the length that you declare when you create the table. The length can be any value from 0 to 255. When CHAR values are stored, they are right-padded with spaces to the specified length. When CHAR values are retrieved, trailing spaces are removed.
Values in VARCHAR columns are variable-length strings. The length can be specified as a value from 0 to 255 before MySQL 5.0.3, and 0 to 65,535 in 5.0.3 and later versions. The effective maximum length of a VARCHAR in MySQL 5.0.3 and later is subject to the maximum row size (65,535 bytes, which is shared among all columns) and the character set used.
In contrast to CHAR, VARCHAR values are stored as a one-byte or two-byte length prefix plus data. The length prefix indicates the number of bytes in the value. A column uses one length byte if values require no more than 255 bytes, two length bytes if values may require more than 255 bytes.

MySQL char & varchar character sets & storage sizes

Wondering how much actual storage space will be taken up by these two datatypes, as the MySQL documentation is slightly unclear on the matter.
CHAR(M) M × w bytes, 0 <= M <= 255, where w is the number of bytes
required for the maximum-length character in the character set
VARCHAR(M), VARBINARY(M) L + 1 bytes if column values require 0 – 255
bytes, L + 2 bytes if values may require more than 255 bytes
This seems to imply to me that, given a utf8-encoded database, a CHAR will always take up 32 bits per character, whilst a VARCHAR will take between 8 and 32 depending on the actual byte length of the characters stored. Is that correct? Or does a VARCHAR imply an 8-bit character width, and storing multi-octet UTF8 characters actually consumes multiple 'characters' from the VARCHAR? Or does the VARCHAR also always store 32 bits per character? So many possibilities.
Not something I've ever had to worry this much about before, but I'm starting to hit in-memory temp table size limits and I don't necessarily want to have to increase MySQL's available pool (for the second time).
CHAR and VARCHAR both count characters. Both of them count the maximum storage that they might require given the character encoding and length. For ASCII, that's 1 byte per character. For UTF-8, that's 3 bytes per character (not 4 as you'd expect, because MySQL's Unicode support is limited for some reason, and it doesn't support any Unicode characters which would require 4 bytes in UTF-8). So far, CHAR and VARCHAR are the same.
Now, CHAR just goes ahead and reserves this amount of storage.
VARCHAR instead allocated 1 or 2 bytes, depending on whether this maximum storage is < 256 or ≥ 256. And the actual amount of space occupied by the entry is these one or two bytes, plus the amount of space actually occupied by the string.
Interestingly, this makes 85 a magic number for UTF-8 VARCHAR:
VARCHAR(85) uses 1 byte for the length because the maximum possible length of 85 UTF-8 characters is 3 × 85 = 255.
VARCHAR(86) uses 2 byte for the length because the maximum possible length of 86 UTF-8 characters is 3 × 86 = 258.

How many characters in varchar(max)?

How many characters can a SQL Server 2008 database field contain when the data type is VARCHAR(MAX)?
From http://msdn.microsoft.com/en-us/library/ms176089.aspx
varchar [ ( n | max ) ]
Variable-length, non-Unicode character
data. n can be a value from 1 through
8,000. max indicates that the maximum
storage size is 2^31-1 bytes. The
storage size is the actual length of
data entered + 2 bytes. The data
entered can be 0 characters in length.
The ISO synonyms for varchar are char
varying or character varying.
1 character = 1 byte. And don't forget 2 bytes for the termination. So, 2^31-3 characters.
For future readers who need this answer quickly:
2^31-1 = 2 147 483 647 characters, or roughly 2.147 billion
See the MSDN reference table for maximum numbers/sizes.
Bytes per varchar(max),
varbinary(max), xml, text, or image
column: 2^31-1
There's a two-byte overhead for the column, so the actual data is 2^31-3 max bytes in length. Assuming you're using a single-byte character encoding, that's 2^31-3 characters total. (If you're using a character encoding that uses more than one byte per character, divide by the total number of bytes per character. If you're using a variable-length character encoding, all bets are off.)
There are a few gotchas worth mentioning - you may need to force the use of varchar(max)
https://dba.stackexchange.com/questions/18483/varcharmax-field-cutting-off-data-after-8000-characters
and print only handles 8000 chars
How to print VARCHAR(MAX) using Print Statement?