I am trying to estimate the actual disk space required for each record of my table in MySQL.
The table has a structure like this:
ID INT 4 byte;
VARCHAR(34) 34 byte;
INT 4 byte;
INT(5) 4 byte;
INT 4 byte;
INT 4 byte which is also a FOREIGN KEY;
So there are 5 INT fields and a VARCHAR of a maximum of 34 chars (i.e. 34 bytes).
I have 2 questions:
1) The total should be 54 bytes per record (less when the VARCHAR is shorter, of course). Am I right, or are there also some overhead bytes that I should consider when estimating disk usage?
2) I used INT(5) instead of CHAR(5) because I need to store exactly 5 digits in that field (I will enforce that in the application with a regexp and a string-length check, since I know that INT(5) can hold an int with more than 5 digits).
Can this be considered a disk-usage optimization, since an INT takes 4 bytes while a CHAR(5) takes 5 bytes, i.e. 1 more byte per record?
Thanks for the attention!
One record itself will use
1 byte in offsets
0 bytes in NULLable bits
5 bytes in "extra bytes" header
4 bytes ID
6 bytes transaction id
7 bytes rollback pointer
0-3*34 bytes in VARCHAR(34) (one character may take up to 3 bytes because of UTF8)
4*4 bytes in other integers
Each distinct value of FK will lead to one record in a secondary index. it will use
5 bytes in "extra bytes" header
4 bytes INT for FK value
4 bytes INT for Primary key
Other overhead is page level:
120 bytes per page (16k) in headers
page fill factor 15/16 - i.e. one page may contain 15k in records.
And the last - add space used by non-leaf pages, which should be small anyway
So, to answer the questions: 1) Yes, there will be some overhead, which you can calculate using the information above.
2) CHAR(5) in UTF8 will add a byte for its length, so INT looks reasonable to use.
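The figures above can be added up mechanically. The following is a minimal sketch, not a definitive model of InnoDB: the constants are copied from the answer, the function names are made up for illustration, and it assumes that the 120 bytes of page headers fit in the 1k left free by the 15/16 fill factor.

```python
# Rough per-record estimate using only the byte counts listed in the answer.
# Function and constant names are hypothetical, chosen for this sketch.

USABLE_BYTES_PER_PAGE = 15 * 1024  # 16k page at a 15/16 fill factor (assumption: headers fit in the rest)


def clustered_row_bytes(varchar_bytes: int) -> int:
    """Bytes used by one record in the clustered (primary key) index."""
    offsets = 1            # length byte for the single VARCHAR column
    nullable_bits = 0      # no NULLable columns
    extra_header = 5       # "extra bytes" record header
    primary_key = 4        # ID INT
    transaction_id = 6
    rollback_pointer = 7
    other_ints = 4 * 4     # the four remaining INT columns
    return (offsets + nullable_bits + extra_header + primary_key
            + transaction_id + rollback_pointer + varchar_bytes + other_ints)


def secondary_index_entry_bytes() -> int:
    """Bytes used by one record in the foreign-key secondary index."""
    return 5 + 4 + 4       # header + FK value + primary key value


def rows_per_page(row_bytes: int) -> int:
    """How many records of the given size fit in one leaf page."""
    return USABLE_BYTES_PER_PAGE // row_bytes


if __name__ == "__main__":
    full = clustered_row_bytes(34)  # 34 single-byte characters
    print(full, secondary_index_entry_bytes(), rows_per_page(full))
```

With a full 34-character ASCII value, this gives 73 bytes per record rather than the 54 assumed in the question, before counting the secondary index and non-leaf pages.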
I am currently learning SQL.
When looking at the INT, I came to the understanding that an INT type is 4 bytes long, which translates to 8 bits each byte, leading to each INT being 32 bits.
However, for INT it is said that the max value for unsigned types is (2^32)-1 where the -1 is accounting for 0 value. I understand that the 32 comes from the fact that each int is 32 bits.
My question is where does the 2 come from in the calculation?
My intuition is telling me that each bit will have some sort of measure valued at 2.
int is actually a signed value in SQL. The range is from -2^31 through 2^31 - 1, which is -2,147,483,648 to 2,147,483,647. There are exactly 2^32 possible values in this range. Note that it includes 0.
An unsigned integer would range from 0 to 2^32-1, that is up to 4,294,967,295. The - 1s are because 0 is included in the range, so the counting starts at 0 rather than 1.
The range of possible values is easier to see with fewer bits. For instance, 3 bits can represent the signed values from -4 to 3:
Bits  Unsigned  Signed
000   0          0
001   1          1
010   2          2
011   3          3
100   4         -4
101   5         -3
110   6         -2
111   7         -1
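The table can be reproduced for any width. This is a small sketch that assumes two's-complement encoding (which is what the table shows); the function names are illustrative only.

```python
def int_ranges(bits: int):
    """Return ((unsigned_min, unsigned_max), (signed_min, signed_max)) for a bit width."""
    unsigned = (0, 2 ** bits - 1)
    signed = (-(2 ** (bits - 1)), 2 ** (bits - 1) - 1)
    return unsigned, signed


def as_signed(pattern: int, bits: int) -> int:
    """Interpret an unsigned bit pattern as a two's-complement signed value."""
    # Patterns with the top bit set represent negative numbers.
    return pattern - 2 ** bits if pattern >= 2 ** (bits - 1) else pattern


if __name__ == "__main__":
    # Regenerate the 3-bit table: bit pattern, unsigned value, signed value.
    for pattern in range(2 ** 3):
        print(f"{pattern:03b}  {pattern}  {as_signed(pattern, 3):>2}")
    print(int_ranges(32))
```

Each bit doubles the number of distinct patterns, which is where the base of 2 comes from: n bits give 2^n patterns, whichever way they are split between negative and non-negative values.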
Computers use the binary system for storing values. Let's try an analogy between the binary and decimal systems:
Consider the base-10 (decimal) system. If you have a number with 32 decimal places, each place having a value of 0-9, you have 10^32 possible values (obviously enough; we use this system on a daily basis).
Now consider the base-2 system, which is the one used by computers (for practical reasons: two states are easiest to distinguish and the wiring logic is simplest). Every place (bit) has a value of 0 or 1, so there are 2^32 possible values.
In a DECIMAL(M, D) column MySQL gives the option for the range of D to be 0 to 30.
Is there a subtle reason that I'm missing for the option of 0? Isn't a decimal with nothing after the decimal point an integer?
When and why would I want to specify a DECIMAL that has no decimal places?
The number range of the DECIMAL type is much greater than for an INTEGER or BIGINT. The greatest number you are able to store in a DECIMAL(65, 0) is 65 nines. The largest number in an unsigned BIGINT is 18446744073709551615.
DECIMAL(x, 0) is often a little more expensive for small numbers. Consider using a defined INTEGER type if your numbers are in the range for one of those.
The storage requirement in bytes for a DECIMAL(x, 0) field depends on x according to this formula:
Storage = 4 * floor(x / 9) + Leftover
Leftover = round_up((x % 9) / 2) (i.e., about half of the leftover digits)
You can read more about storage requirements for numeric types in the MySQL manual and compare for yourself.
Besides allowing to store values bigger than BIGINT, you can use DECIMAL(x,0) if you want to:
allow values in the range -9, ... , +9: use DECIMAL(1,0) (uses 1 byte)
allow values in the range -99, ... , +99: use DECIMAL(2,0) (uses 1 byte)
allow values in the range -999, ... , +999: use DECIMAL(3,0) (uses 2 bytes)
allow values in the range -9999, ... , +9999: use DECIMAL(4,0) (uses 2 bytes)
...
allow values in the range -999999999, ... , +999999999: use DECIMAL(9,0) (uses 4 bytes)
... etc (up to DECIMAL(65,0) which uses 29 bytes)
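The byte counts in this list follow from the packing rule: every full group of nine digits takes four bytes, and the leftover digits take about half a byte each, rounded up. The sketch below applies that rule separately to the integer and fractional parts; the function names are hypothetical, not MySQL API.

```python
# Bytes needed for 0..8 leftover digits (index = number of leftover digits).
LEFTOVER_BYTES = [0, 1, 1, 2, 2, 3, 3, 4, 4]


def part_bytes(digits: int) -> int:
    """Storage for one side of the decimal point."""
    return 4 * (digits // 9) + LEFTOVER_BYTES[digits % 9]


def decimal_storage_bytes(m: int, d: int = 0) -> int:
    """Storage for DECIMAL(m, d): integer and fractional parts are packed separately."""
    if not (1 <= m <= 65 and 0 <= d <= 30 and d <= m):
        raise ValueError("expected 1 <= M <= 65, 0 <= D <= 30 and D <= M")
    return part_bytes(m - d) + part_bytes(d)


if __name__ == "__main__":
    for m in (1, 2, 3, 4, 9, 65):
        print(f"DECIMAL({m},0): {decimal_storage_bytes(m)} bytes")
```

Comparing the result with the fixed 1, 2, 3, 4 and 8 bytes of the integer types shows where DECIMAL(x, 0) costs more for small ranges.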
In MySQL,
DECIMAL(3,2) means 3 digits in total, 2 of them after the decimal point, like 3.42
DECIMAL(3,0) means 3 digits in total and no digits after the decimal point, like 345
If you try to store a value outside that range, MySQL in non-strict SQL mode clips it to the nearest end of the range (for example 999 or -999 for DECIMAL(3,0)); in strict mode the statement fails with an error instead.
In an unsigned BIGINT you can only store a number no larger than 18 446 744 073 709 551 615. That is 20 digits, but in a DECIMAL you can specify up to 65 digits. Also, with an int you can't directly constrain the number of digits to a low number (e.g. to one). So DECIMAL is more flexible, and if you need to expand the column in an existing database, it is easier.
I'm creating a form for sending private messages and want to set the maxlength value of a textarea appropriate to the max length of a text field in my MySQL database table. How many characters can a type text field store?
If a lot, would I be able to specify length in the database text type field as I would with varchar?
See for maximum numbers:
http://dev.mysql.com/doc/refman/5.0/en/storage-requirements.html
TINYBLOB, TINYTEXT L + 1 bytes, where L < 2^8 (255 Bytes)
BLOB, TEXT L + 2 bytes, where L < 2^16 (64 Kilobytes)
MEDIUMBLOB, MEDIUMTEXT L + 3 bytes, where L < 2^24 (16 Megabytes)
LONGBLOB, LONGTEXT L + 4 bytes, where L < 2^32 (4 Gigabytes)
L is the number of bytes in your text field. So the maximum number of chars for TEXT is 2^16 - 1, i.e. 65,535 chars (using single-byte characters).
UTF-8/multibyte encoding: with a multibyte encoding each character might consume more than 1 byte of space. For UTF-8, space consumption is 1 to 4 bytes per char.
TINYTEXT: 255 bytes
TEXT: 65,535 bytes
MEDIUMTEXT: 16,777,215 bytes
LONGTEXT: 4,294,967,295 bytes
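Because these limits are in bytes, a safe textarea maxlength depends on the character set. The sketch below derives the byte limits from the length-prefix sizes and divides by the worst-case bytes per character; the helper names are hypothetical, and the byte-length check assumes the column stores UTF-8.

```python
# Size of the length prefix stored with each value, per type.
LENGTH_PREFIX_BYTES = {"TINYTEXT": 1, "TEXT": 2, "MEDIUMTEXT": 3, "LONGTEXT": 4}


def max_bytes(text_type: str) -> int:
    """Largest value length in bytes: the biggest number the prefix can hold."""
    return 2 ** (8 * LENGTH_PREFIX_BYTES[text_type]) - 1


def guaranteed_chars(text_type: str, max_bytes_per_char: int) -> int:
    """Characters that always fit, even if every one needs the maximum byte count."""
    return max_bytes(text_type) // max_bytes_per_char


def fits(value: str, text_type: str, encoding: str = "utf-8") -> bool:
    """Check an actual string against the byte limit of the column type."""
    return len(value.encode(encoding)) <= max_bytes(text_type)


if __name__ == "__main__":
    for name in LENGTH_PREFIX_BYTES:
        print(name, max_bytes(name), guaranteed_chars(name, 4))
```

For a private-message form, `guaranteed_chars("TEXT", 4)` would be a conservative maxlength for a column that can hold 4-byte UTF-8 characters.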
Type        Approx. Length  Exact Max. Length Allowed
TINYTEXT    256 Bytes       255 characters
TEXT        64 Kilobytes    65,535 characters
MEDIUMTEXT  16 Megabytes    16,777,215 characters
LONGTEXT    4 Gigabytes     4,294,967,295 characters
Basically, it's like:
"Exact Max. Length Allowed" = "Approx. Length" in bytes - 1
Note: If using multibyte characters (like Arabic, where each Arabic character takes 2 bytes), a TINYTEXT column can hold at most 127 Arabic characters rather than the 255 listed under "Exact Max. Length Allowed" (space, dash, underscore, and other such characters are still 1-byte characters).
TINYTEXT 255 bytes
TEXT 65,535 bytes ~64 KB
MEDIUMTEXT 16,777,215 bytes ~16MB
LONGTEXT 4,294,967,295 bytes ~4GB
TINYTEXT is a string data type that can store up to 255 characters.
TEXT is a string data type that can store up to 65,535 characters. TEXT is commonly used for brief articles.
LONGTEXT is a string data type with a maximum length of 4,294,967,295 characters. Use LONGTEXT if you need to store large text, such as a chapter of a novel.
According to http://dev.mysql.com/doc/refman/5.0/en/storage-requirements.html, the limit is L + 2 bytes, where L < 2^16, or 64k.
You shouldn't need to concern yourself with limiting it, it's automatically broken down into chunks that get added as the string grows, so it won't always blindly use 64k.
How many characters can a type text field store?
According to the documentation, you can use a maximum of 21,844 characters if the charset is UTF8.
If a lot, would I be able to specify length in the db text type field as I would with varchar?
You don't need to specify the length. If you need more characters, use the MEDIUMTEXT or LONGTEXT data types. With VARCHAR, specifying the length is not for the storage requirement; it is only for how the data is retrieved from the database.
TEXT is a string data type that can store up to 65,535 characters.
But still if you want to store more data then change its data type to LONGTEXT
ALTER TABLE name_tabel MODIFY text_field LONGTEXT CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL;
For MySQL version 8.0:
Numeric Type Storage Requirements
Data Type                    Storage Required
TINYINT                      1 byte
SMALLINT                     2 bytes
MEDIUMINT                    3 bytes
INT, INTEGER                 4 bytes
BIGINT                       8 bytes
FLOAT(p)                     4 bytes if 0 <= p <= 24, 8 bytes if 25 <= p <= 53
FLOAT                        4 bytes
DOUBLE, REAL                 8 bytes
DECIMAL(M,D), NUMERIC(M,D)   Varies; see following discussion
BIT(M)                       approximately (M+7)/8 bytes
Values for DECIMAL (and NUMERIC) columns are represented using a binary format that packs nine decimal (base 10) digits into four bytes. Storage for the integer and fractional parts of each value are determined separately. Each multiple of nine digits requires four bytes, and the “leftover” digits require some fraction of four bytes. The storage required for excess digits is given by the following table.
Date and Time Type Storage Requirements
For TIME, DATETIME, and TIMESTAMP columns, the storage required for tables created before MySQL 5.6.4 differs from tables created from 5.6.4 on. This is due to a change in 5.6.4 that permits these types to have a fractional part, which requires from 0 to 3 bytes.
Data Type   Storage Required Before MySQL 5.6.4   Storage Required as of MySQL 5.6.4
YEAR        1 byte                                1 byte
DATE        3 bytes                               3 bytes
TIME        3 bytes                               3 bytes + fractional seconds storage
DATETIME    8 bytes                               5 bytes + fractional seconds storage
TIMESTAMP   4 bytes                               4 bytes + fractional seconds storage
As of MySQL 5.6.4, storage for YEAR and DATE remains unchanged. However, TIME, DATETIME, and TIMESTAMP are represented differently. DATETIME is packed more efficiently, requiring 5 rather than 8 bytes for the nonfractional part, and all three parts have a fractional part that requires from 0 to 3 bytes, depending on the fractional seconds precision of stored values.
Fractional Seconds Precision   Storage Required
0                              0 bytes
1, 2                           1 byte
3, 4                           2 bytes
5, 6                           3 bytes
For example, TIME(0), TIME(2), TIME(4), and TIME(6) use 3, 4, 5, and 6 bytes, respectively. TIME and TIME(0) are equivalent and require the same storage.
For details about internal representation of temporal values, see MySQL Internals: Important Algorithms and Structures.
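The two tables above combine into one small calculation. This sketch covers only the 5.6.4-and-later figures; the function name is an assumption made for illustration.

```python
# Base sizes as of MySQL 5.6.4, from the table above.
BASE_BYTES = {"YEAR": 1, "DATE": 3, "TIME": 3, "DATETIME": 5, "TIMESTAMP": 4}


def temporal_storage_bytes(type_name: str, fsp: int = 0) -> int:
    """Storage for a temporal column with fractional seconds precision fsp (0-6)."""
    if fsp not in range(7):
        raise ValueError("fractional seconds precision must be 0..6")
    base = BASE_BYTES[type_name]
    if type_name in ("YEAR", "DATE"):
        if fsp:
            raise ValueError(f"{type_name} has no fractional part")
        return base
    # Precisions 1-2 need 1 extra byte, 3-4 need 2, 5-6 need 3.
    return base + (fsp + 1) // 2


if __name__ == "__main__":
    print([temporal_storage_bytes("TIME", fsp) for fsp in (0, 2, 4, 6)])
```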
String Type Storage Requirements
In the following table, M represents the declared column length in characters for nonbinary string types and bytes for binary string types. L represents the actual length in bytes of a given string value.
Data Type Storage Required
CHAR(M) The compact family of InnoDB row formats optimize storage for variable-length character sets. See COMPACT Row Format Characteristics. Otherwise, M × w bytes, 0 <= M <= 255, where w is the number of bytes required for the maximum-length character in the character set.
BINARY(M) M bytes, 0 <= M <= 255
VARCHAR(M), VARBINARY(M) L + 1 bytes if column values require 0 − 255 bytes, L + 2 bytes if values may require more than 255 bytes
TINYBLOB, TINYTEXT L + 1 bytes, where L < 2^8
BLOB, TEXT L + 2 bytes, where L < 2^16
MEDIUMBLOB, MEDIUMTEXT L + 3 bytes, where L < 2^24
LONGBLOB, LONGTEXT L + 4 bytes, where L < 2^32
ENUM('value1','value2',...) 1 or 2 bytes, depending on the number of enumeration values (65,535 values maximum)
SET('value1','value2',...) 1, 2, 3, 4, or 8 bytes, depending on the number of set members (64 members maximum)
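A few rows of this table are easy to get wrong by one byte, so here is a hedged sketch of the VARCHAR, ENUM and SET rules. The function names are hypothetical. The 255-value threshold for ENUM is not stated in the table; it is the point at which a one-byte index no longer suffices.

```python
def varchar_storage_bytes(value_bytes: int, max_column_bytes: int) -> int:
    """L + 1 if values in the column need at most 255 bytes, otherwise L + 2."""
    prefix = 1 if max_column_bytes <= 255 else 2
    return value_bytes + prefix


def enum_storage_bytes(value_count: int) -> int:
    """1 byte for up to 255 enumeration values, 2 bytes up to 65,535."""
    if not 1 <= value_count <= 65535:
        raise ValueError("ENUM allows 1..65,535 values")
    return 1 if value_count <= 255 else 2


def set_storage_bytes(member_count: int) -> int:
    """One bit per member, rounded up to 1, 2, 3, 4 or 8 bytes."""
    if not 1 <= member_count <= 64:
        raise ValueError("SET allows 1..64 members")
    needed = (member_count + 7) // 8
    return needed if needed <= 4 else 8


if __name__ == "__main__":
    # A 10-byte value in VARCHAR(34) with a 3-byte charset (max 102 bytes per value).
    print(varchar_storage_bytes(10, 34 * 3))
```

Note that the prefix size depends on the column's maximum byte length, not on the individual value, so a VARCHAR(100) in a 3-byte character set always uses the 2-byte prefix.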
In MySQL, if we create a field of type INT and do not specify any length, it automatically becomes int(11), and if we set the UNSIGNED or UNSIGNED ZEROFILL attribute it turns into int(10).
Where does that one unit of length go?
An int value can be -2147483648, which is 11 characters including the minus sign, so the default display width is 11.
An unsigned int does not allow negative numbers, so by default it needs a display width of only 10.
As the documentation below shows, the number of bits required to store SIGNED INT and UNSIGNED INT is the same, the range of storable numbers is merely shifted:
Unsigned type can be used to permit
only nonnegative numbers in a column
or when you need a larger upper
numeric range for the column. For
example, if an INT column is UNSIGNED,
the size of the column's range is the
same but its endpoints shift from
-2147483648 and 2147483647 up to 0 and 4294967295.
http://dev.mysql.com/doc/refman/5.0/en/numeric-types.html
According to the documentation, this number is merely the display width.
For example, INT(4) specifies an INT with a display width of four
digits.
The display width does not constrain the range of values that can be
stored in the column. Nor does it prevent values wider than the column
display width from being displayed correctly. For example, a column
specified as SMALLINT(3) has the usual SMALLINT range of -32768 to
32767, and values outside the range permitted by three digits are
displayed in full using more than three digits.
The default display width for an UNSIGNED INT is one fewer than that for a non-UNSIGNED INT simply because you will never be displaying a - character.
Note that you can still specify whatever display width you like. This is just the default.
The use of the term "digits" in the documentation is slightly misleading here.
Just in case anyone doesn't quite understand Shakti's answer (as I didn't), here's a visual representation of why:
Signed minimum:
- 2 1 4 7 4 8 3 6 4 8
1 2 3 4 5 6 7 8 9 10 11
Unsigned max (the signed max, 2147483647, is also 10 digits):
4 2 9 4 9 6 7 2 9 5
1 2 3 4 5 6 7 8 9 10
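The same count can be done in code. This sketch only illustrates the 4-byte INT case discussed here; the function name is made up, and it makes no claim about how MySQL picks the default width for other integer types.

```python
def int_display_width(unsigned: bool) -> int:
    """Characters needed to print the widest value of a 4-byte INT."""
    if unsigned:
        widest = 2 ** 32 - 1      # 4294967295
    else:
        widest = -(2 ** 31)       # -2147483648, the minus sign counts as a character
    return len(str(widest))


if __name__ == "__main__":
    print(int_display_width(unsigned=False), int_display_width(unsigned=True))
```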
While I was creating stress data for a table, I found that the following files were generated.
-rw-rw---- 1 mysql mysql 8858 Jul 28 06:47 card.frm
-rw-rw---- 1 mysql mysql 7951695624 Jul 29 20:48 card.MYD
-rw-rw---- 1 mysql mysql 51360768 Jul 29 20:57 card.MYI
Actually, I inserted 1985968 records into this table, but the index file size is unbelievable.
Structure of the table is
create table card(
company_id int(10),
emp_number varchar(100),
card_date varchar(10),
time_entry text,
total_ot varchar(15),
total_per varchar(15),
leave_taken double,
total_lop double,
primary key (company_id,emp_number,card_date),
index (company_id,card_date)
);
Is there any way to reduce the filesize of the MYD?
Please note that .MYI is your index, and .MYD is your data. The only way to reduce the size of your .MYD is to delete rows or alter your column sizes.
50MB for an index on 2 million rows is not large.
Let's look at the size breakdown of your table:
company_id - 4 Bytes
emp_number - 101 Bytes
card_date - 11 Bytes
total_ot - 17 Bytes
total_per - 17 Bytes
leave_taken - 9 Bytes
total_lop - 9 Bytes
time_entry - avg(length(time_entry)) + 3 Bytes
This gives us a row length of about 171 bytes plus the average length of time_entry. If time_entry averages out at 100 bytes, you're looking at 271 * 2000000 = 542MB.
Of significance to me is the number of VARCHARs. Does employee number need to be a varchar(100), or even a varchar at all? You're duplicating that data in its entirety in your index on (company_id,emp_number,card_date), as you're indexing the whole column.
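To make the arithmetic easy to rerun with a different average, here is a minimal sketch that simply adds up the per-column figures listed above (which sum to 171 bytes before the time_entry content). The names are hypothetical, and the per-column numbers are the answer's upper-bound estimates, not measured values.

```python
# Per-column upper bounds, copied from the breakdown above.
FIXED_COLUMN_BYTES = {
    "company_id": 4,
    "emp_number": 101,
    "card_date": 11,
    "total_ot": 17,
    "total_per": 17,
    "leave_taken": 9,
    "total_lop": 9,
}
TIME_ENTRY_OVERHEAD = 3  # the "+ 3 Bytes" listed for time_entry


def estimated_row_bytes(avg_time_entry_bytes: int) -> int:
    """Upper-bound row length for a given average time_entry length."""
    return sum(FIXED_COLUMN_BYTES.values()) + TIME_ENTRY_OVERHEAD + avg_time_entry_bytes


def estimated_table_bytes(rows: int, avg_time_entry_bytes: int) -> int:
    """Estimated data size for the whole table."""
    return rows * estimated_row_bytes(avg_time_entry_bytes)


if __name__ == "__main__":
    print(estimated_row_bytes(100), estimated_table_bytes(2_000_000, 100))
```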
You probably don't need a varchar here, and you possibly don't need it included in the primary key.
Do you really need time_entry to be a TEXT field? This is likely the biggest consumer of space in your database.
Why are you using varchar(10) for card date? If you used DATETIME you'd only use 8 Bytes instead of 11, TIMESTAMP would be 4 Bytes, and DATE would be 3 Bytes.
You're also adding 1 Byte for every column that can be NULL.
Also try running ANALYZE/REPAIR/OPTIMIZE TABLE commands as well.
A lot depends on how big that time_entry text field can be. I'm going to assume it's small, less than 100 bytes. Then you have roughly 4 + 100 + 10 + 100 + 15 + 15 + 8 + 8 = roughly 300 bytes of data per record. You have 2 million records. I'd expect the database to be 600 megabytes. In fact you are showing 8000 megabytes of data in the MYD on disk, or a factor of 12x. Something's not right.
Your best diagnostic tool is SHOW TABLE STATUS. In particular, check Avg_row_length and Data_length; they will give you some insight into where the space is going.
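Even before running that command, the file listing in the question gives a rough average row length. This sketch just divides the reported .MYD size by the reported row count; the variable and function names are made up, and the 300-byte figure is the estimate from this answer.

```python
MYD_BYTES = 7_951_695_624   # size of card.MYD from the listing in the question
ROW_COUNT = 1_985_968       # rows the asker reports inserting
EXPECTED_ROW_BYTES = 300    # rough per-record estimate from this answer


def average_row_bytes(data_bytes: int, rows: int) -> float:
    """Average bytes of data file consumed per row."""
    return data_bytes / rows


if __name__ == "__main__":
    avg = average_row_bytes(MYD_BYTES, ROW_COUNT)
    print(f"average row: {avg:.0f} bytes, {avg / EXPECTED_ROW_BYTES:.1f}x the estimate")
```

An average of roughly 4,000 bytes per row suggests that either time_entry is far larger than assumed or the data file contains a lot of unused space, which is what Avg_row_length and Data_free would help distinguish.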
If you're using MyISAM tables, you may find that myisamchk will help make the table smaller. This tool particularly helps if you inserted and then deleted a lot of rows from the database. "optimize table" can help too. MyISAM does support read-only compressed tables via myisampack. I'd treat that as a last resort, though.