MySQL InnoDB DECIMAL: is data size driven by column declaration or by actual data?

I'm using MySQL, and all my tables use the InnoDB engine. I have some columns declared as DECIMAL(38, 0), and they are used extensively. According to the MySQL documentation (http://dev.mysql.com/doc/refman/5.5/en/storage-requirements.html), a 38-digit value requires 17 bytes (38 = 4 × 9 + 2 digits, so 4 × 4 + 1 = 17 bytes). Okay.
But does that mean that any value stored in this column will take 17 bytes? For example, will the value 432 take only 4 bytes (I really hope so...), or will it take 17 bytes anyway?
Finally, I know that in Oracle the size occupied depends on the actual values stored. But is it optimized that way in MySQL as well?

I think the answer is that it will take 17 bytes anyway. Notice that the linked manual page describes no means for the DBMS to record how "long" a DECIMAL value is. By comparison, for a VARCHAR(255) CHARACTER SET ascii column there is a single byte at the start of the value that indicates how long the value is (one byte can record a length of up to 255 bytes). For a VARCHAR(1000) CHARACTER SET ascii column there are two bytes to indicate the length. For DECIMAL no such length record is described, which leads me to conclude that the column always takes the maximum amount of space.

DECIMAL is "fixed length", so every value in a DECIMAL(38, 0) column requires 17 bytes.
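The 17-byte figure can be reproduced with a short calculation. The following is a minimal sketch, assuming the rule from the linked storage-requirements page: each group of nine digits takes 4 bytes, and leftover digits take 0 to 4 bytes, computed separately for the integer and fractional parts. The function name decimal_storage_bytes is hypothetical, not a MySQL API.

```python
# Bytes needed for the leftover (fewer than nine) decimal digits,
# following the table in the MySQL storage-requirements page.
LEFTOVER_BYTES = {0: 0, 1: 1, 2: 1, 3: 2, 4: 2, 5: 3, 6: 3, 7: 4, 8: 4}


def decimal_storage_bytes(precision: int, scale: int = 0) -> int:
    """Return the fixed storage size of a DECIMAL(precision, scale) column.

    Hypothetical helper for illustration; the size depends only on the
    column declaration, never on the value stored.
    """
    if not 0 <= scale <= precision:
        raise ValueError("scale must be between 0 and precision")
    total = 0
    # Integer and fractional digits are packed separately.
    for digits in (precision - scale, scale):
        full_groups, leftover = divmod(digits, 9)
        total += full_groups * 4 + LEFTOVER_BYTES[leftover]
    return total


print(decimal_storage_bytes(38, 0))  # 17
```

Because the value itself is not an input, 432 and a full 38-digit number cost the same 17 bytes in a DECIMAL(38, 0) column.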

Related

Recommended way to store a string in this case?

I am storing strings and 99.5+% are less than 255 characters, so I store them in a VARCHAR(255).
The thing is, some of them can be 4kb or so. What's the best way to store those?
Option #1: store them in another table, with a pointer from the main table.
Option #1.0: add an INT column with DEFAULT NULL to hold the pointer.
Option #1.1: store the pointer in the VARCHAR(255) column itself, e.g. 'AAAAAAAAAAA[NUMBER]AAAAAAAAAAAA'.
Option #2: increase the size of VARCHAR from 255 to 32767.
Which of the above is best performance-wise: Option #1.0, Option #1.1 or Option #2?
Increase the size of your field to fit the max size of your string. A VARCHAR will not use the space unless needed.
VARCHAR values are stored as a 1-byte or 2-byte length prefix plus
data. The length prefix indicates the number of bytes in the value. A
column uses one length byte if values require no more than 255 bytes,
two length bytes if values may require more than 255 bytes.
http://dev.mysql.com/doc/refman/5.0/en/char.html
The MySQL documentation says that VARCHAR(N) takes L + 1 bytes if column values require 0 – 255 bytes, and L + 2 bytes if values may require more than 255 bytes, where L is the length in bytes of the stored string.
So I guess that option #2 is quite okay, because the small strings will still take less space than 32767 bytes.
EDIT:
Also imagine the countless problems options 1.0 and 1.1 would raise when you actually want to query a string without knowing whether it exceeds the length or not.
Option #2 is clearly best. It adds just 1 byte to the size of each value (the length prefix grows from one byte to two), and it doesn't require any complicated joins to merge in the fields from the second table.

char or varchar

This is probably a stupid question, but I need to ask...
I've created a MySQL table to handle images called images. In it, I have an attribute that keeps the extension of the image called extension.
Most of the accepted image extensions are jpg, png, gif, bmp, jpeg or tiff. In other words, a maximum of 4 characters long.
Now, should the attribute be declared in the MySQL table like:
extension char(4)
or
extension varchar(4)
There's probably no impact whatsoever on performance, but I do want the model to be optimized from the get-go...
Anyone?
Depends....
If you look at this from the MySQL documentation
Value       CHAR(4)  Storage Required  VARCHAR(4)  Storage Required
''          '    '   4 bytes           ''          1 byte
'ab'        'ab  '   4 bytes           'ab'        3 bytes
'abcd'      'abcd'   4 bytes           'abcd'      5 bytes
'abcdefgh'  'abcd'   4 bytes           'abcd'      5 bytes
As you can see, a 4-character value takes 4 bytes in CHAR(4) but 5 bytes in VARCHAR(4). If the vast majority of extensions were 4 characters long, CHAR would be more space efficient.
In your case I am guessing that 3-character extensions will be the majority, and those take 4 bytes either way, so VARCHAR is the better choice.
James :-)
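The storage column of the table above can be recomputed with a few lines. This is a rough sketch, assuming a single-byte character set (so one character is one byte) and the truncation behaviour of non-strict SQL mode for the over-long value; the function names are hypothetical.

```python
DECLARED_LENGTH = 4      # CHAR(4) / VARCHAR(4), as in the table above
LENGTH_PREFIX_BYTES = 1  # assumption: at most 4 bytes per value, well under 255


def char_storage_bytes(value: str) -> int:
    """CHAR pads every value to the declared length, so the size is constant."""
    return DECLARED_LENGTH


def varchar_storage_bytes(value: str) -> int:
    """VARCHAR stores a length prefix plus the (possibly truncated) data."""
    stored = value[:DECLARED_LENGTH]  # truncation as in non-strict SQL mode
    return LENGTH_PREFIX_BYTES + len(stored.encode("ascii"))


for value in ("", "ab", "abcd", "abcdefgh"):
    print(repr(value), char_storage_bytes(value), varchar_storage_bytes(value))
```

Under these assumptions a three-letter extension such as jpg costs 4 bytes in either type, and only four-letter extensions cost one byte more in VARCHAR(4).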
Edited, I was making a wrong assumption on my previous answer. I'll just paste you an excerpt from http://dev.mysql.com/doc/refman/5.0/en/char.html (emphasis added)
The CHAR and VARCHAR types are similar, but differ in the way they are stored and retrieved. As of MySQL 5.0.3, they also differ in maximum length and in whether trailing spaces are retained.
The CHAR and VARCHAR types are declared with a length that indicates the maximum number of characters you want to store. For example, CHAR(30) can hold up to 30 characters.
The length of a CHAR column is fixed to the length that you declare when you create the table. The length can be any value from 0 to 255. When CHAR values are stored, they are right-padded with spaces to the specified length. When CHAR values are retrieved, trailing spaces are removed.
Values in VARCHAR columns are variable-length strings. The length can be specified as a value from 0 to 255 before MySQL 5.0.3, and 0 to 65,535 in 5.0.3 and later versions. The effective maximum length of a VARCHAR in MySQL 5.0.3 and later is subject to the maximum row size (65,535 bytes, which is shared among all columns) and the character set used.
In contrast to CHAR, VARCHAR values are stored as a one-byte or two-byte length prefix plus data. The length prefix indicates the number of bytes in the value. A column uses one length byte if values require no more than 255 bytes, two length bytes if values may require more than 255 bytes.

MySQL char & varchar character sets & storage sizes

Wondering how much actual storage space will be taken up by these two datatypes, as the MySQL documentation is slightly unclear on the matter.
CHAR(M) M × w bytes, 0 <= M <= 255, where w is the number of bytes
required for the maximum-length character in the character set
VARCHAR(M), VARBINARY(M) L + 1 bytes if column values require 0 – 255
bytes, L + 2 bytes if values may require more than 255 bytes
This seems to imply to me that, given a utf8-encoded database, a CHAR will always take up 32 bits per character, whilst a VARCHAR will take between 8 and 32 depending on the actual byte length of the characters stored. Is that correct? Or does a VARCHAR imply an 8-bit character width, and storing multi-octet UTF8 characters actually consumes multiple 'characters' from the VARCHAR? Or does the VARCHAR also always store 32 bits per character? So many possibilities.
Not something I've ever had to worry this much about before, but I'm starting to hit in-memory temp table size limits and I don't necessarily want to have to increase MySQL's available pool (for the second time).
CHAR and VARCHAR both count characters, and both are sized by the maximum storage they might require given the declared length and the character encoding. For ASCII, that's 1 byte per character. For MySQL's utf8 character set, that's 3 bytes per character (not 4 as you'd expect, because utf8 in MySQL is limited to characters that need at most 3 bytes in UTF-8; the separate utf8mb4 character set allows 4). So far, CHAR and VARCHAR are the same.
Now, CHAR just goes ahead and reserves this amount of storage.
VARCHAR instead allocates 1 or 2 bytes for a length prefix, depending on whether this maximum storage is < 256 or ≥ 256 bytes. The actual space occupied by an entry is this one- or two-byte prefix plus the space actually occupied by the string.
Interestingly, this makes 85 a magic number for UTF-8 VARCHAR:
VARCHAR(85) uses 1 byte for the length because the maximum possible length of 85 UTF-8 characters is 3 × 85 = 255 bytes.
VARCHAR(86) uses 2 bytes for the length because the maximum possible length of 86 UTF-8 characters is 3 × 86 = 258 bytes.
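The threshold can be checked numerically. Here is a small sketch, assuming the rule stated above (one length byte if the column's maximum byte length is at most 255, two otherwise); varchar_length_prefix_bytes is a hypothetical helper, and the bytes-per-character figure is passed in by the caller.

```python
def varchar_length_prefix_bytes(max_chars: int, max_bytes_per_char: int) -> int:
    """Return the size of the VARCHAR length prefix (1 or 2 bytes).

    max_chars is the declared length M in VARCHAR(M); max_bytes_per_char
    is the widest character in the column's character set (1 for ascii,
    3 for MySQL's utf8).
    """
    max_bytes = max_chars * max_bytes_per_char
    return 1 if max_bytes <= 255 else 2


for declared in (85, 86):
    # 3 bytes per character, as for MySQL's utf8 character set
    print(declared, varchar_length_prefix_bytes(declared, 3))
```

The same function shows that a single-byte character set keeps the one-byte prefix up to VARCHAR(255).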

Sql Msg 1701, Level 16, State 1 while making a very wide table for testing purposes

I am building a huge table to simulate a very rough scenario in SQL (a huge table with 1024 attributes and, of course, a lot of rows, if you wonder); the data type of every attribute is float.
To do so I am using another table which has 300 attributes and I am doing something like
SELECT [x1]
,[x2]
,[x3]
,[x4]
,[x5]
,[x6]
,[x7]
,[x8]
,[x9]
,[x10]
,[x11]
,[x12]
,[x13]
,[x14]
,[x300]
,x301= x1
,x302= x2
...
,x600= x300
,x601= x1
,x602= x2
...
,x900= x300
,x901= x1
,x902= x2
...
,x1000= x100
,x1001= x101
,x1002= x102
,x1003= x103
,x1004= x104
...
,x1024= x124
INTO test_1024
FROM my_300;
However an error is present:
Msg 1701, Level 16, State 1, Line 2
Creating or altering table 'test_1024' failed because the minimum row size
would be 8326, including 134 bytes of internal overhead. This exceeds the
maximum allowable table row size of 8060 bytes.
How can I overcome this issue? (I know SQL Server can handle 1024 columns...)
You will have to change your data types to either varchar, nvarchar, varbinary or text to circumvent this error - or break the input into several tables (or better yet, find a better way to structure your data...which I know isn't always possible depending on constraints).
To read more about the 'why' - check out this article which explains it better than I could: http://blog.sqlauthority.com/2007/06/23/sql-server-2005-row-overflow-data-explanation/
Let's have a look at the figures in the error message.
'8326, including 134 bytes of internal overhead' means that the data alone takes 8326-134=8192 bytes.
Given that the number of columns is 1024, it's exactly 8192÷1024=8 bytes per column.
Moving on to the overhead, of those 134 bytes, your 1024 columns require 1024÷8=128 bytes for the NULL bitmap.
As for the remaining 134-128=6 bytes, I am not entirely sure but we can very well consider that size a constant overhead.
Now, let's try to estimate the maximum possible number of float columns per table in theory.
The maximum row size is said to be 8060 bytes.
Taking off the constant overhead, it's 8060-6=8054 bytes.
As we now know, one float column takes 8 bytes of data plus 1 bit in the bitmap, which is 8×8+1=65 bits.
The data + NULL bitmap size in bits is 8054×8=64432.
The estimated maximum number of float columns per table is therefore 64432÷65≈991 columns.
So, commenting out 33 columns in your script should result in successful creation of the table.
To verify, uncommenting one back should produce the error again.
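The estimate above can be double-checked with a short script. This is a sketch under the answer's own assumptions: 8 bytes per float column, a NULL bitmap of one bit per column rounded up to whole bytes, and the 6 remaining bytes treated as constant overhead (that constant is inferred from the error message, not taken from documentation). The function names are hypothetical.

```python
import math

MAX_ROW_BYTES = 8060      # limit quoted in the error message
FIXED_OVERHEAD_BYTES = 6  # assumption: the unexplained remainder of the 134 bytes
FLOAT_BYTES = 8           # data bytes per float column, derived above


def min_row_bytes(float_columns: int) -> int:
    """Minimum row size for a table made only of float columns."""
    null_bitmap_bytes = math.ceil(float_columns / 8)  # one bit per column
    return float_columns * FLOAT_BYTES + null_bitmap_bytes + FIXED_OVERHEAD_BYTES


def max_float_columns() -> int:
    """Largest column count whose minimum row size still fits the limit."""
    count = 0
    while min_row_bytes(count + 1) <= MAX_ROW_BYTES:
        count += 1
    return count


print(min_row_bytes(1024))  # 8326, matching the error message
print(max_float_columns())  # 991
```

Rounding the bitmap up to whole bytes gives the same answer as the bit-level estimate: 991 columns need 8058 bytes, while 992 would need 8066.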
SQL Server limits row sizes to approximately 8 KB. Certain column types are excluded from this total, but the value of each individual column must fit within the 8 KB limit, and a certain amount of data will be placed in the row itself as a pointer. If you are exceeding this limit, you should step back and reconsider your schema; you do NOT need 300 columns in a table.

MySQL entry and space used

I have a MySQL table with a column declared as "varchar(255)". Will the database use 255 bytes of space even if that column is empty? Or will it use space only if the column has some data, in proportion to that data?
Every cell takes only an amount of space proportional to the data.
http://dev.mysql.com/doc/refman/5.0/en/char.html
Only if it has some data, and the amount of space used is proportional to the data.
VARCHAR(M) takes L + 1 bytes or more, depending on the size of the data you're adding. A blank field (empty string) still consumes 1 byte (the +1 in L + 1): that byte is a length prefix stored in front of the data, recording how many bytes the value occupies. It is not a terminator.
From MySQL's website:
VARCHAR(M), VARBINARY(M)
L + 1 bytes if column values require 0 – 255 bytes,
L + 2 bytes if values may require more than 255 bytes
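where L is the length of your data in bytes. In your case, you'll be consuming (data length + 1) bytes on your VARCHAR(255) field, assuming a single-byte character set; with a multi-byte character set, values in a VARCHAR(255) column may require more than 255 bytes, so the prefix is 2 bytes.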