Can MySQL VARCHARs store any characters?

Can MySQL VARCHARs store any character, such as nulls and newlines?

MySQL can store any characters, but depending on the column's encoding, some characters may be misrepresented if that encoding differs from your display encoding.
VARCHARs can contain any characters, even 0x00, since a VARCHAR is not a null-terminated string but a length-prefixed one: the length is stored at the beginning of the value as a 1- or 2-byte prefix.
CHAR columns are not null-terminated either. They are right-padded with spaces to the declared length, so storage always takes up the full length, and trailing spaces are removed on retrieval by default.
If you want to store true byte data, though, you should use the BLOB type. It is also length-prefixed, holds exactly the bytes you put into it, and is not affected by character encodings at all.
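To illustrate why a length-prefixed value can hold 0x00 and newlines while a null-terminated one cannot, here is a minimal Python sketch. The helper names and the 1-byte prefix layout are assumptions made for illustration; this is not MySQL's actual on-disk format.

```python
def read_c_string(buf: bytes) -> bytes:
    """Read up to (not including) the first NUL byte, like a C-style string."""
    end = buf.find(b"\x00")
    return buf if end == -1 else buf[:end]


def write_length_prefixed(payload: bytes) -> bytes:
    """Prepend a 1-byte length (hypothetical layout, at most 255 bytes)."""
    if len(payload) > 255:
        raise ValueError("payload too long for a 1-byte length prefix")
    return bytes([len(payload)]) + payload


def read_length_prefixed(buf: bytes) -> bytes:
    """Read exactly as many bytes as the prefix announces."""
    length = buf[0]
    return buf[1:1 + length]


value = b"line1\nline2\x00tail"          # contains a newline and a NUL byte
stored = write_length_prefixed(value)

print(read_length_prefixed(stored))      # the full value survives
print(read_c_string(value))              # a NUL-terminated read stops early
```

The length-prefixed read never inspects the payload bytes, which is why no byte value needs to be reserved as a terminator.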

You should probably use the BINARY and VARBINARY data types for storing strings of non-printable characters, especially if you want strings of single bytes instead of actual text characters (which can be multi-byte).

Related

How does SQL determine a character's length in a varchar?

After reading the documentation, I understood that there is a one-byte or two-byte length prefix to a varying character so as to determine its length. I understand too that, for a varchar, each character might have a different length in bytes depending on the character itself.
So my question is:
How does the DBMS determine each character's length after it's stored?
Meaning: after a string is stored, say it's 4 characters long, and suppose that the first character is 1 byte long, the second 2 bytes, the third 3 bytes and the fourth 4 bytes.
How does the DB know how long is each character when retrieving the string so as to read it correctly?
I hope the question is clear, sorry for any English mistakes I made. Thanks
The way UTF-8 works as a variable-length encoding is that the 1-byte characters can only use 7 bits of that byte.
If the high bit is 0, then the byte is a 1-byte character (which happens to be encoded in the same way as the 128 ASCII characters).
If the high bit is 1, then it's a multi-byte character.
Picture from https://en.wikipedia.org/wiki/UTF-8
If you're talking about UTF-8, that's not quite how it works. The high bits of the first byte of each character indicate how many bytes the character occupies (0xxxxxxx for one byte, 110xxxxx for two, 1110xxxx for three, 11110xxx for four), and every following byte of that character starts with the bits 10. This lets it store one-, two-, three- or four-byte characters fairly efficiently. This is in contrast to UTF-32, where every character is always four bytes, which is very wasteful for some types of text.
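As a rough sketch of how a reader can tell where each character ends, the following Python splits a UTF-8 byte string by looking only at the first byte of each character. The function names are made up for this example, and validation of continuation bytes is deliberately left out.

```python
def utf8_sequence_length(lead: int) -> int:
    """Return how many bytes a UTF-8 character occupies, given its first byte."""
    if lead & 0b1000_0000 == 0b0000_0000:
        return 1                      # 0xxxxxxx: ASCII range
    if lead & 0b1110_0000 == 0b1100_0000:
        return 2                      # 110xxxxx
    if lead & 0b1111_0000 == 0b1110_0000:
        return 3                      # 1110xxxx
    if lead & 0b1111_1000 == 0b1111_0000:
        return 4                      # 11110xxx
    raise ValueError(f"not a valid UTF-8 lead byte: {lead:#04x}")


def split_utf8(data: bytes) -> list:
    """Split UTF-8 bytes into one bytes object per character."""
    chars = []
    i = 0
    while i < len(data):
        n = utf8_sequence_length(data[i])
        chars.append(data[i:i + n])
        i += n
    return chars


# Four characters taking 1, 2, 3 and 4 bytes respectively.
sample = "a\u00e9\u20ac\U0001F600".encode("utf-8")
print([len(c) for c in split_utf8(sample)])
```

This mirrors the example in the question: the stored value is just 10 bytes, and the per-character boundaries are recovered from the bytes themselves rather than from any extra metadata.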
When using UTF-8, or any character set where the characters are a variable number of bytes, there's a disconnect between the length of the string in bytes and the length of the string in characters. In a fixed-length system like Latin1, which is rigidly 8-bit, there's no such drift.
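A small Python illustration of that disconnect, using an arbitrary sample word chosen for this example:

```python
text = "na\u00efve"                         # "naive" with a diaeresis: 5 characters

char_count = len(text)                      # length in characters
utf8_bytes = len(text.encode("utf-8"))      # the accented letter takes 2 bytes
latin1_bytes = len(text.encode("latin-1"))  # always 1 byte per character

print(char_count, utf8_bytes, latin1_bytes)
```

In Latin-1 the byte count and the character count coincide, while in UTF-8 they diverge as soon as a non-ASCII character appears.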
Internally the database is most concerned with the length of a field in bytes. The length in characters is only explicitly exposed when calling functions like CHAR_LENGTH() (in MySQL, LENGTH() returns the length in bytes); otherwise it's just a bunch of bytes that, if necessary, can be interpreted as a string.
Historically, databases stored the length of a field in bytes in a single byte, followed by the data itself. That's why VARCHAR(255) is so prevalent: 255 is the longest length you can represent with a single-byte length field. Newer databases like Postgres allow character fields of up to about 1 GB, so they use a larger length field of up to four bytes.

MySQL hex strings low bytes first

If I insert 0xFF into a binary column, MySQL (5.7) assumes these are the high bytes.
e.g. if the column is BINARY(2):
+--------------------+
| HEX(binary_column) |
+--------------------+
| FF00 |
+--------------------+
Just for convenience, how would you get MySQL to interpret a hex string normally?
P.S. Also tried UNHEX()
binary is not really a numerical datatype. It is a special type of string used to store binary data like files. In contrast to e.g. char, binary does not have a character set, and comparisons are done on the numeric byte values.
That behaviour is similar to how other programming languages treat strings and byte arrays, and is expected in MySQL too; see The BINARY and VARBINARY Types:
When BINARY values are stored, they are right-padded with the pad value to the specified length. The pad value is 0x00 (the zero byte). Values are right-padded with 0x00 on insert, and no trailing bytes are removed on select. All bytes are significant in comparisons, including ORDER BY and DISTINCT operations. 0x00 bytes and spaces are different in comparisons, with 0x00 < space.
You seem to be looking for binary numbers, so you may want to use a numeric type. You can use e.g. int (or bit(16)) and still insert values like 0xFF (just not as '0xFF' without further casting), and you can still display them with e.g. hex(0xFF) in the way you want.
If you want to use binary values (or need large values of more than 8 bytes), you can use lpad to fill them with leading zero bytes, e.g.
select hex(lpad(0xFF,2,0x0))
You have to know (or query) the size of your column, and you will probably run into a lot of issues with this, starting with the simple task of adding two binary values. So, to keep it simple, use a numeric type.
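The difference between the right-padding described in the quoted documentation and the left-padding done with lpad can be mimicked in Python. This is only an analogy with made-up helper names, not a call into MySQL.

```python
def binary_store(value: bytes, width: int) -> bytes:
    """Right-pad with 0x00, as the quoted docs describe for BINARY(width)."""
    if len(value) > width:
        raise ValueError("value longer than column width")
    return value.ljust(width, b"\x00")


def left_pad(value: bytes, width: int) -> bytes:
    """Left-pad with 0x00, analogous to lpad(value, width, 0x0)."""
    if len(value) > width:
        raise ValueError("value longer than column width")
    return value.rjust(width, b"\x00")


stored = binary_store(b"\xff", 2)
padded = left_pad(b"\xff", 2)

print(stored.hex().upper())             # FF00: the byte string is padded on the right
print(padded.hex().upper())             # 00FF: reads like the number 255
print(int.from_bytes(padded, "big"))    # 255
```

Treating the value as a byte string pads on the right. Only a numeric interpretation (or explicit left-padding) puts the zeros in front.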

mySQL - Does Int(9.455.487) take more space than string(John) in mySQL?

I understood that in a database an int takes less space than a string. But what if the int is really longer than the string. For example 9.455.487 vs "John". Which one will take more space? TY
From the documentation, size of int is 4 bytes, whereas for char it is "M × w bytes, 0 <= M <= 255, where w is the number of bytes required for the maximum-length character in the character set." and M is the declared column size.
So when you talk of how much space is taken, the int will take up 4 bytes for a value as long as the value is within the range of int. A string like "John", if declared as char(4) will take up 4 * w bytes, so at least 4 bytes assuming w is 1.
Long story short, the size of a number is not how many characters long it is when you write it out, but the number of bytes to represent it in the binary form.
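A quick way to see this is to compare the binary form of the number with its written-out form. The sketch below uses Python's struct module with a 4-byte signed integer as a stand-in for a 4-byte INT; the byte order chosen is an arbitrary assumption.

```python
import struct

as_int = struct.pack("<i", 9455487)      # 4-byte signed integer
as_text = "9455487".encode("ascii")      # the digits written out as text
name = "John".encode("ascii")            # one byte per character in ASCII

print(len(as_int), len(as_text), len(name))
```

The number needs 4 bytes in binary form regardless of how many digits it has when printed, while the same digits stored as text need one byte each.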
You should be aware of what an "int" (integer) is and what strings are. An integer always has a fixed length, and that length is the number of bytes in its binary representation. Strings, on the other hand, are sequences of bytes, so, depending on the encoding, each symbol may take one or more bytes.
The fact that 9.455.487 is "longer" than "John" is irrelevant here. What is relevant is how the DBMS (or whatever other environment) represents those things. You're seeing a "longer" integer versus a "shorter" string, but that is only a matter of "screen" representation (i.e. what you see on the screen).
To answer the question: for MySQL, INT is 4 bytes, while string data types such as VARCHAR may have dynamic length. The static-length string data type is CHAR, and from that viewpoint your number and your string will have the same length (4 bytes). Strings and integers are just different things to compare for "length", and their visual representation should not confuse you. These entities have different internal structures and, therefore, should not be compared on "length" according to their visual representation.
Also, you should be aware that an integer will not always be 4 bytes long, since even in MySQL your number may belong to, for example, the BIGINT data type (which is 8 bytes long). And, as mentioned above, for strings there's also the encoding issue. For instance, a UTF-8 encoded string may need two (or even more) bytes to represent some non-ASCII symbols. In this case each such symbol will add more than 1 byte to the total string length.

Why do some MySQL data types require extra bytes?

I was reading about MySQL data type sizes. I saw that VARCHAR takes 1 or 2 extra bytes, MEDIUMTEXT requires 3 extra bytes, and LONGTEXT requires 4 extra bytes. What is the reason for this behaviour?
When MySQL (or any database or computer language) stores a variable length string, there are basically two ways to store the value:
The length can be encoded followed by the characters in the string
The end of the string can be marked by a special character (typically the NUL byte, '\0')
Databases (almost?) always use length encoding. So, when you store 'ABC' as a variable length string, in the database storage it looks like:
3 A B C
When you store 'A':
1 A
That way, MySQL knows when one string ends and the next begins. The different lengths for the different types are based on the maximum length of the string. So, 1 byte can hold values from 0 to 255. 2 bytes can hold values from 0 to 65,535 and so on.
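The idea can be sketched in Python as follows. The helper names, the little-endian byte order and the 255/65,535 thresholds applied to byte counts are assumptions for illustration, not a description of MySQL's real storage format.

```python
def prefix_size(max_bytes: int) -> int:
    """Pick a 1-byte prefix if every possible length fits in it, else 2 bytes."""
    if max_bytes > 65535:
        raise ValueError("this sketch only covers 1- and 2-byte prefixes")
    return 1 if max_bytes <= 255 else 2


def encode(value: bytes, max_bytes: int) -> bytes:
    """Store the length first, then the data."""
    if len(value) > max_bytes:
        raise ValueError("value exceeds declared maximum")
    return len(value).to_bytes(prefix_size(max_bytes), "little") + value


def decode_all(buf: bytes, max_bytes: int) -> list:
    """Walk a buffer of back-to-back values using only the length prefixes."""
    size = prefix_size(max_bytes)
    values, i = [], 0
    while i < len(buf):
        n = int.from_bytes(buf[i:i + size], "little")
        i += size
        values.append(buf[i:i + n])
        i += n
    return values


row = encode(b"ABC", 255) + encode(b"A", 255)
print(row)                      # b'\x03ABC\x01A'
print(decode_all(row, 255))     # [b'ABC', b'A']
print(len(encode(b"ABC", 65535)))  # 5: two prefix bytes plus three data bytes
```

The same three characters cost one extra byte when the declared maximum fits in a single-byte length and two extra bytes when it does not, which is the "extra 1/2 bytes" the question asks about.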
When you use a fixed-length character type, say char(3), then 'ABC' looks like:
A B C
This occupies three bytes/whatever (depending on the character coding). The length is known from the table metadata.
With char(3), the string 'A' also occupies three slots:
A
---^space here
--------^space here
The extra two are occupied by spaces. For long strings, this is generally a big waste of space, which is why most strings are stored as varchar rather than char.

Are there any illegal characters in MySQL which may not be stored in a field?

I'm looking for a shorthand solution to storing an md5 hash inside of a MySQL table, as string data. I had the idea that base256 could reduce the length of the string by half, down to a 16 digit string instead of 32 digits of hex. So I take hex and divide it up into chunks of two digits programmatically, then convert each set of two digits to ASCII. For example:
4cf5f5941a02573dc007e60442f5358a
is shortened to
Lõõ”W=ÀæBõ5Š
and it's OK if these characters don't print properly - I just need to store them. Would MySQL accept that sort of ASCII data into a text field without complaining?
MySQL will accept these values, but you must be very careful when writing them. I strongly suggest binding parameters.
You might want to look into COMPRESS() and UNCOMPRESS() as an alternative:
INSERT INTO ... SET hashcode=COMPRESS('4cf5f5941a02573dc007e60442f5358a');
and
SELECT UNCOMPRESS(hashcode) AS hashcode FROM ... WHERE
might do the trick in a more readable way.
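For reference, the "base256" idea from the question amounts to turning the 32 hex digits into 16 raw bytes, which is the same conversion MySQL's UNHEX() and HEX() functions perform. A minimal Python sketch of that round trip, under the assumption that the raw bytes would then go into a binary column through a bound parameter:

```python
hex_digest = "4cf5f5941a02573dc007e60442f5358a"

raw = bytes.fromhex(hex_digest)   # 16 raw bytes, two hex digits per byte
restored = raw.hex()              # back to the 32-character hex form

print(len(hex_digest), len(raw))
print(restored == hex_digest)
```

Because several of these bytes are outside the ASCII range, treating them as text in a character column is where encoding problems tend to appear; passing them as bytes avoids any character-set conversion.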