I came across a website that states the following:
Ex: CountryCode CHAR(3) CHARSET utf8
We are asking for a column with 3 characters exactly. The required storage for this column
will be such that any 3-letter name must fit in. This means (3 characters) times (3 bytes
per character) = 9 bytes of storage. So CHAR and utf8 together may be less than ideal.
VARCHAR behaves better: it only requires as many bytes per character as described above. So
the text "abc" will only require 3 bytes
Do I need 3 bytes or 9 bytes for the text 'abc' (with utf8 and CHAR(3))?
Thanks
MySQL's internal structure places CHAR fields directly within the table structure. For example, a simple table like:
create table foo (
    id int,
    name char(3)
);
would produce an on-disk record that looks like
xxxxccccccccc
^^^^-- 4 bytes of int storage space
    ^^^^^^^^^-- 9 bytes of utf-8 char space
Since MySQL has no way of knowing in advance what kind of text you'll be storing in that char field, it HAS to assume worst-case, and allocates as much space as 3 chars of 'absolutely the longest possible' utf-8 text might take. If it didn't, then an overly long string would overflow the on-disk storage and start scribbling on an adjacent record.
varchar, on the other hand, only has a small 'stub' data section in the table's raw data, and the varchar's contents are stored elsewhere. That means that your varchar(3) will always occupy the same amount of table-space storage, no matter WHAT kind of character set you're using.
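The arithmetic behind the 3-versus-9 question can be checked with a small sketch. It is written in Python rather than SQL, it assumes MySQL's legacy utf8 character set (at most 3 bytes per character), and the helper names are made up for illustration.

```python
# Assumption: MySQL's legacy "utf8" charset uses at most 3 bytes per character.
MAX_BYTES_PER_CHAR = 3

def worst_case_char_bytes(declared_chars, max_bytes_per_char=MAX_BYTES_PER_CHAR):
    """Bytes a fixed-width CHAR(n) must reserve so that any n characters fit."""
    return declared_chars * max_bytes_per_char

def actual_utf8_bytes(text):
    """Bytes the text really needs once encoded as UTF-8."""
    return len(text.encode("utf-8"))

print(worst_case_char_bytes(3))     # 9: space reserved for CHAR(3)
print(actual_utf8_bytes("abc"))     # 3: what 'abc' itself needs
print(actual_utf8_bytes("\u65e5\u672c\u8a9e"))  # 9: three 3-byte characters
```

So 'abc' itself is only 3 bytes of data; the 9 bytes are what a fixed-width slot has to reserve for the worst case.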
I have a table of user entries, and for every entry I have an array of (2-byte) integers to store (15-25, sporadically even more). The array elements will be written and read all at the same time, it is never needed to update or to access them individually. Their order matters. It makes sense to think of this as an array object.
I have many millions of these user entries and want to store this with the minimum possible amount of disk space. I'm however struggling with MySQL's lack of Array datatype.
I've been considering the following options.
Do it the MySQL way. Make a table my_data with columns user_id, data_id and data_int. To make this efficient, one needs an index on user_id, totalling well over 10 bytes per integer.
Store the array in text format. This takes ~6.5 bytes per integer.
Make 35-40 columns ("enough") and let -32768 mean 'empty' (since this value cannot occur in my data). This takes 3.5-4 bytes per integer, but is somewhat ugly (as I have to impose a strict limit on the number of elements in the array).
Is there a better way to do this in MySQL? I know MySQL has an efficient varchar type, so ideally I'd store my 2-byte integers as 2-byte chars in a varchar (or a similar approach with blob), but I'm not sure how to do that. Is this possible? How should this be done?
You could store them as separate SMALLINT NULL columns.
In MyISAM, this uses 2 bytes of data + 1 bit of null indicator for each value.
In InnoDB, the null indicators are encoded into the column's field start offset, so they don't take any extra space, and null values are not actually stored in the row data. If the rows are small enough that all the offsets are 1 byte, then this uses 3 bytes for every existing value (1 byte offset, 2 bytes data), and 1 byte for every nonexistent value.
Either of these would be better than using INT with a special value to indicate that it doesn't exist, since that would be 4 bytes of data for every value.
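A rough per-row estimate using only the figures quoted above might look like the sketch below. It is in Python, it ignores row headers and any other engine overhead, it assumes MyISAM rounds the null bits up to whole bytes, and the function names are illustrative only.

```python
import math

def myisam_bytes(total_columns):
    """2 data bytes per SMALLINT column plus 1 null-indicator bit per column."""
    return 2 * total_columns + math.ceil(total_columns / 8)

def innodb_bytes(total_columns, present_values):
    """1 offset byte per column, plus 2 data bytes per value that is not NULL."""
    return total_columns + 2 * present_values

def int_sentinel_bytes(total_columns):
    """4 bytes per INT column when a special value stands in for 'missing'."""
    return 4 * total_columns

# Example: 40 columns, 20 of which actually hold a value.
print(myisam_bytes(40))        # 85
print(innodb_bytes(40, 20))    # 80
print(int_sentinel_bytes(40))  # 160
```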
See NULL in MySQL (Performance & Storage)
The best answer was given in the comments, so I'll repost it here with some use-ready code, for further reference.
MySQL has a varbinary type that works really well for this: you can simply use PHP's pack/unpack functions to convert them to and from binary form, and store that binary form in the database using varbinary. Example code for the conversion is below.
function pack24bit($n) { // input: 24-bit integer, output: binary string of 3 bytes (big-endian)
    $b1 = ($n >> 16) & 0xFF; // most significant byte
    $b2 = ($n >> 8) & 0xFF;  // middle byte
    $b3 = $n & 0xFF;         // least significant byte
    return pack('CCC', $b1, $b2, $b3);
}
function unpack24bit($packed) { // input: binary string of 3 bytes, output: 24-bit integer
    $arr = unpack('C3b', $packed); // yields keys b1, b2, b3
    return ($arr['b1'] << 16) | ($arr['b2'] << 8) | $arr['b3'];
}
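The original question was about arrays of 2-byte integers, and the same pack/unpack idea carries over to a whole array at once. Below is a minimal sketch using Python's struct module instead of PHP; the function names and the big-endian byte order are assumptions made for the example. The resulting byte string is what would be stored in the varbinary column.

```python
import struct

def pack_int16_array(values):
    """Pack signed 16-bit integers into a big-endian byte string, 2 bytes each."""
    return struct.pack(f">{len(values)}h", *values)

def unpack_int16_array(blob):
    """Inverse of pack_int16_array; the element count follows from the length."""
    count = len(blob) // 2
    return list(struct.unpack(f">{count}h", blob))

data = [12, -7, 32000, 0, -32768]
blob = pack_int16_array(data)
print(len(blob))                         # 10 (2 bytes per integer)
print(unpack_int16_array(blob) == data)  # True
```

Element order is preserved, and no separator or sentinel value is needed because the array length is implied by the length of the stored value.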
In my development I always give my VARCHAR columns a character length, e.g. summary VARCHAR(250), because I think "250 characters should be enough to hold the summary." Then I end up lengthening it later when a content manager tells me they're getting an error when they write the summary. Very rarely do I have a situation where I know for sure how long the text in a column should be, exactly or at most. So it seems like I should just use NVARCHAR, which I assume has as its underlying data structure a dynamic character array as opposed to a fixed-size character array. Or is there some other reason why I should be using VARCHAR(somenumber)?
NVARCHAR does not mean at all what you think it does. The N just means that it uses Unicode characters instead of ASCII (which, by the way, is probably a good idea for summary fields). NVARCHAR columns still require a length.
You are mistaken about the NVARCHAR. NVARCHAR is used for Unicode strings. You can also specify a max length for NVARCHAR. NVARCHARs take up more space than a VARCHAR but have a wider range of characters and symbols that can be used.
In SQL Server, if you don't specify the max length, it defaults to 1. So just declaring it as NVARCHAR is the same as NVARCHAR(1). The same goes for VARCHAR. Also, the length is not a fixed-size for VARCHAR and NVARCHAR, it only specifies the max size.
You tagged this question both MySQL and SQL Server; I don't know what it defaults to in MySQL.
You could go with VARCHAR(MAX) but be careful with that. Using MAX changes the way the SQL Server database engine stores the data. It stores it like a TEXT field which doesn't have all the functionality of a VARCHAR field. Use VARCHAR(8000) instead.
Is this for SQL Server?
The main question is whether or not the application needs to support cultural information such as various languages.
If you need such support, NVARCHAR() stores Unicode characters and supports all types of languages. Remember, every character requires at least two bytes.
If the application is internal for just a US company, VARCHAR() will save you one byte per character.
If I remember correctly from the SQL Server internals book, a VARCHAR(25) holding the string "HELLO WORLD" stores 11 characters plus an overhead of two bytes.
So let's review.
DECLARE @A CHAR(11) = 'HELLO WORLD' -- Takes 11 bytes to store
DECLARE @B VARCHAR(11) = 'HELLO WORLD' -- Takes 13 bytes to store
DECLARE @C NVARCHAR(11) = 'HELLO WORLD' -- Takes 24 bytes to store
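Those three figures can be reproduced with a quick sketch. It is in Python, not T-SQL, and it assumes a single-byte code page for CHAR/VARCHAR, UTF-16 for NVARCHAR, and the two-byte variable-length overhead mentioned above; the function names are hypothetical.

```python
VARIABLE_LENGTH_OVERHEAD = 2  # bytes, the figure quoted above

def char_bytes(text, declared_length):
    """CHAR(n): always n bytes in a single-byte code page (space padded)."""
    padded = text[:declared_length].ljust(declared_length)
    return len(padded.encode("latin-1"))

def varchar_bytes(text):
    """VARCHAR: one byte per character plus the variable-length overhead."""
    return len(text.encode("latin-1")) + VARIABLE_LENGTH_OVERHEAD

def nvarchar_bytes(text):
    """NVARCHAR: UTF-16 code units plus the variable-length overhead."""
    return len(text.encode("utf-16-le")) + VARIABLE_LENGTH_OVERHEAD

s = "HELLO WORLD"
print(char_bytes(s, 11))  # 11
print(varchar_bytes(s))   # 13
print(nvarchar_bytes(s))  # 24
```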
Good luck with your text.
John Miner
The Crafty DBA
I understood that in a database an int takes less space than a string. But what if the int is really longer than the string? For example 9.455.487 vs "John". Which one will take more space? TY
From the documentation, size of int is 4 bytes, whereas for char it is "M × w bytes, 0 <= M <= 255, where w is the number of bytes required for the maximum-length character in the character set." and M is the declared column size.
So when you talk of how much space is taken, the int will take up 4 bytes for a value as long as the value is within the range of int. A string like "John", if declared as char(4) will take up 4 * w bytes, so at least 4 bytes assuming w is 1.
Long story short, the size of a number is not how many characters long it is when you write it out, but the number of bytes to represent it in the binary form.
You should be aware of what an "int" (integer) is and what strings are. An integer always has a fixed length, and that length is the number of bytes in its binary representation. Strings, on the other hand, are sequences of bytes, so, depending on the encoding, each symbol may take one or more bytes.
The fact that 9.455.487 looks "longer" than "John" is irrelevant here. What is relevant is how the DBMS (or whatever other environment) represents those things. You're seeing a "longer" integer versus a "shorter" string, but that is only a matter of "screen" representation (i.e. what you see on the screen).
To answer the question: for MySQL, INT is 4 bytes, while string data types such as VARCHAR may have dynamic length. The static-length string data type is CHAR, and from that viewpoint your number and your string will have the same length (4 bytes, assuming CHAR(4) in a single-byte character set). Strings and integers are just different things to compare for "length", and the visual representation should not confuse you. These entities have different internal structures and therefore should not be compared on "length" according to their visual representation.
Also, be aware that an integer will not always be 4 bytes long: even in MySQL your number may belong to, for example, the BIGINT data type (which is 8 bytes long). And, as mentioned above, for strings there is also the encoding issue. For instance, a UTF-8 encoded string may use two (or even more) bytes to represent some non-ASCII symbols. In that case each such symbol adds more than 1 byte to the total string length.
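The difference between "how long it looks" and "how many bytes it takes" can be illustrated outside the database. This is only a sketch in Python: struct's fixed-size formats stand in for INT and BIGINT, and the variable names are invented for the example.

```python
import struct

number = 9455487
name = "John"

int_bytes = struct.pack("<i", number)          # 4-byte signed integer, like INT
bigint_bytes = struct.pack("<q", number)       # 8-byte signed integer, like BIGINT
digits_as_text = str(number).encode("ascii")   # the same number stored as text
name_bytes = name.encode("utf-8")

print(len(int_bytes))       # 4
print(len(bigint_bytes))    # 8
print(len(digits_as_text))  # 7
print(len(name_bytes))      # 4
print(len("J\u00f6hn".encode("utf-8")))  # 5: the non-ASCII letter takes 2 bytes
```

The number occupies 4 bytes as an INT regardless of how many digits it has on screen; it only grows with its digit count when it is stored as text.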
I was reading about MySQL data type sizes. I saw that VARCHAR takes an extra 1 or 2 bytes, MEDIUMTEXT requires an extra 3 bytes, and LONGTEXT requires an extra 4 bytes. What is the reason for this MySQL behaviour?
When MySQL (or any database or computer language) stores a variable length string, there are basically two ways to store the value:
The length can be encoded followed by the characters in the string
The end of the string can be marked by a special character (typically the NUL character, '\0')
Databases (almost?) always use length encoding. So, when you store 'ABC' as a variable length string, in the database storage it looks like:
3 A B C
When you store 'A':
1 A
That way, MySQL knows when one string ends and the next begins. The different lengths for the different types are based on the maximum length of the string. So, 1 byte can hold values from 0 to 255. 2 bytes can hold values from 0 to 65,535 and so on.
When you use a regular character expression, say char(3), then 'ABC' looks like:
A B C
This occupies three bytes/whatever (depending on the character coding). The length is known from the table metadata.
With char(3), the string 'A' also occupies three slots:
A
---^space here
--------^space here
The extra two are occupied by spaces. For long strings, this is generally a big waste of space, which is why most strings are stored as varchar rather than char.
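The two layouts described above can be mimicked in a few lines. This is a simplified sketch in Python, not MySQL's actual row format: it assumes ASCII data and a one-byte length prefix, and the function names are made up.

```python
def encode_varchar(text):
    """Length-prefixed: one length byte, then the data (at most 255 bytes)."""
    data = text.encode("ascii")
    if len(data) > 255:
        raise ValueError("a one-byte length prefix can only describe 0..255 bytes")
    return bytes([len(data)]) + data

def encode_char(text, width):
    """Fixed-width: no prefix, right-padded with spaces to the declared width."""
    data = text.encode("ascii")
    if len(data) > width:
        raise ValueError("value is longer than the declared width")
    return data.ljust(width, b" ")

print(encode_varchar("ABC"))  # b'\x03ABC'
print(encode_varchar("A"))    # b'\x01A'
print(encode_char("ABC", 3))  # b'ABC'
print(encode_char("A", 3))    # b'A  '
```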
This is probably a stupid question, but I need to ask...
I've created a MySQL table called images to handle images. In it, I have an attribute called extension that keeps the extension of the image.
The accepted image extensions are jpg, png, gif, bmp, jpeg or tiff. In other words, a maximum of 4 characters.
Now, should the attribute be declared in the MySQL table like:
extension char(4)
or
extension varchar(4)
There's probably no impact whatsoever on performance, but I do want the model to be optimized from the get-go...
Anyone?
Depends....
If you look at this from the MySQL documentation
Value       CHAR(4)   Storage Required   VARCHAR(4)   Storage Required
''          '    '    4 bytes            ''           1 byte
'ab'        'ab  '    4 bytes            'ab'         3 bytes
'abcd'      'abcd'    4 bytes            'abcd'       5 bytes
'abcdefgh'  'abcd'    4 bytes            'abcd'       5 bytes
As you can see, 4 characters take 4 bytes in a CHAR column, while in a VARCHAR column they take 5. If the vast majority of extensions were 4 characters long, then CHAR would be more space efficient.
In your case I am guessing that 3-character extensions will be the majority, so VARCHAR is the better choice.
James :-)
Edited, I was making a wrong assumption on my previous answer. I'll just paste you an excerpt from http://dev.mysql.com/doc/refman/5.0/en/char.html (emphasis added)
The CHAR and VARCHAR types are similar, but differ in the way they are stored and retrieved. As of MySQL 5.0.3, they also differ in maximum length and in whether trailing spaces are retained.
The CHAR and VARCHAR types are declared with a length that indicates the maximum number of characters you want to store. For example, CHAR(30) can hold up to 30 characters.
The length of a CHAR column is fixed to the length that you declare when you create the table. The length can be any value from 0 to 255. When CHAR values are stored, they are right-padded with spaces to the specified length. When CHAR values are retrieved, trailing spaces are removed.
Values in VARCHAR columns are variable-length strings. The length can be specified as a value from 0 to 255 before MySQL 5.0.3, and 0 to 65,535 in 5.0.3 and later versions. The effective maximum length of a VARCHAR in MySQL 5.0.3 and later is subject to the maximum row size (65,535 bytes, which is shared among all columns) and the character set used.
In contrast to CHAR, VARCHAR values are stored as a one-byte or two-byte length prefix plus data. The length prefix indicates the number of bytes in the value. A column uses one length byte if values require no more than 255 bytes, two length bytes if values may require more than 255 bytes.
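The length-prefix rule from the excerpt can be written down as a short calculation. The sketch below is in Python; it assumes a single-byte character set by default and that overlong values are truncated to the column length (as in the last row of the documentation table, which applies when strict SQL mode is off). The function names are hypothetical.

```python
def length_prefix_bytes(max_chars, max_bytes_per_char=1):
    """1 prefix byte if the column can never exceed 255 bytes, otherwise 2."""
    return 1 if max_chars * max_bytes_per_char <= 255 else 2

def varchar_storage(value, max_chars, encoding="latin-1", max_bytes_per_char=1):
    """Length prefix plus encoded data; overlong values are truncated first."""
    stored = value[:max_chars]
    prefix = length_prefix_bytes(max_chars, max_bytes_per_char)
    return prefix + len(stored.encode(encoding))

for v in ["", "ab", "abcd", "abcdefgh"]:
    print(repr(v), varchar_storage(v, 4))  # 1, 3, 5, 5 bytes

print(length_prefix_bytes(255))     # 1
print(length_prefix_bytes(100, 3))  # 2: up to 300 bytes with a 3-byte charset
```

For VARCHAR(4) this gives the same storage figures as the VARCHAR(4) column of the table quoted earlier.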