SQL max size LONGTEXT - mysql

I created this table:
CREATE TABLE Hospital_MedicalRecord(
recNo CHAR(5),
patient CHAR(9),
doctor CHAR(9),
enteredOn DATETIME NOT NULL,
diagnosis LONGTEXT NOT NULL,
treatment TEXT(1000),
PRIMARY KEY (recNo, patient),
CONSTRAINT FK_patient FOREIGN KEY (patient) REFERENCES Hospital_Patient(NINumber),
CONSTRAINT FK_doctor FOREIGN KEY (doctor) REFERENCES Hospital_Doctor(NINumber)
ON DELETE CASCADE
);
How can I make diagnosis hold long text but never more than 2^24 bytes? I've looked into LONGTEXT but couldn't find a way to limit it, since I believe it can go up to 2^32 bytes?

Use MEDIUMTEXT.
https://dev.mysql.com/doc/refman/8.0/en/string-type-overview.html
MEDIUMTEXT [CHARACTER SET charset_name] [COLLATE collation_name]
A TEXT column with a maximum length of 16,777,215 (2^24 − 1) characters. The effective maximum length is less if the value contains multibyte characters. Each MEDIUMTEXT value is stored using a 3-byte length prefix that indicates the number of bytes in the value.
The wording is a little strange. The length limit is really on bytes, not characters.

LONGTEXT holds up to 2^32 − 1 bytes, which is roughly 4,000,000,000 characters if you do not use multibyte characters.
Then there is MEDIUMTEXT with 2^24 − 1 bytes, around 16,000,000 characters. TEXT is limited to 2^16 − 1 bytes, which is much smaller: about 64,000 characters (again, without multibyte characters).
What you need is MEDIUMTEXT.
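
The limits quoted in these answers follow directly from the size of each type's length prefix. A minimal Python sketch of that arithmetic; the helper name smallest_text_type is hypothetical, and the prefix sizes (1 to 4 bytes) are taken from the MySQL manual page linked above:

```python
# A value stored with an n-byte length prefix can hold at most 2^(8n) - 1 bytes.
TEXT_TYPES = [
    ("TINYTEXT", 1),
    ("TEXT", 2),
    ("MEDIUMTEXT", 3),
    ("LONGTEXT", 4),
]

def max_bytes(prefix_bytes):
    """Largest value length, in bytes, that fits the given length prefix."""
    return 2 ** (8 * prefix_bytes) - 1

def smallest_text_type(limit_bytes):
    """Hypothetical helper: smallest TEXT type able to hold limit_bytes."""
    for name, prefix in TEXT_TYPES:
        if max_bytes(prefix) >= limit_bytes:
            return name
    raise ValueError("no TEXT type is large enough")

for name, prefix in TEXT_TYPES:
    print(f"{name:<10} {max_bytes(prefix):>13,} bytes")

print(smallest_text_type(2 ** 24 - 1))
```

Note that the cap is in bytes, so a MEDIUMTEXT column in a multibyte character set holds fewer than 16,777,215 characters.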

Related

Why table's index storage size is bigger after change charset from utf8mb4 to utf8?

I executed: alter table device_msg convert to character set 'utf8' COLLATE 'utf8_unicode_ci';
As I expected, the table's data size got smaller.
But at the same time, the table's index size got bigger.
What happened, and why?
PS: the table's data size and index size are taken from information_schema.TABLES.
DbEngine: InnoDB
Table Before:
CREATE TABLE `device_msg` (
`id` bigint(20) NOT NULL AUTO_INCREMENT,
`sn` varchar(30) COLLATE utf8_unicode_ci NOT NULL,
`time` datetime(3) NOT NULL,
`msg` json NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `device_UNIQUE` (`sn`,`time`)
) ENGINE=InnoDB AUTO_INCREMENT=62077733 DEFAULT CHARSET=utf8mb4;
Table After:
CREATE TABLE `device_msg` (
`id` bigint(20) NOT NULL AUTO_INCREMENT,
`sn` varchar(30) COLLATE utf8_unicode_ci NOT NULL,
`time` datetime(3) NOT NULL,
`msg` json NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `device_UNIQUE` (`sn`,`time`)
) ENGINE=InnoDB AUTO_INCREMENT=62077733 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
Before:
totalSize: 2.14 GB
indexSize: 282.98 MB
dataSize: 1.86 GB
avg_row_len: 297B
After
totalSize: 1.93 GB
indexSize: 413.97 MB
dataSize: 1.52 GB
avg_row_len: 260B
If the data in information_schema.TABLES is not accurate, how can I make it accurate?
The space taken by utf8mb4 and then by utf8 is the same (assuming there were no 4-byte characters beforehand), in spite of the numbers you show.
This ALTER required rebuilding the table and the indexes.
InnoDB stores the data and each secondary index in a BTree.
Depending on the order in which you insert elements into a BTree, more or fewer "block splits" will occur.
So you can't really say whether it was the character set change or the rebuild that led to the index getting bigger and the data getting smaller.
I say it was not the charset change.
Just my opinion, based on what I read in the MySQL documentation about this limitation:
https://dev.mysql.com/doc/refman/5.6/en/innodb-restrictions.html
By default, the index key prefix length limit is 767 bytes.
If an indexed column exceeds this size, the index is truncated to a prefix of the column.
Suppose your indexed column holds 255 characters.
With utf8mb4, 1 character = up to 4 bytes, so the limit is 191 characters (767 / 4, rounded down).
So 191 characters are added to the index, and the other 255 - 191 = 64 characters are left out of it.
When you change the encoding to utf8 (1 character = up to 3 bytes), the indexable limit becomes 255 characters (767 / 3, rounded down).
That means all 255 characters of your column value are added to the index without truncation.
The number of characters added to the index increases from 191 to 255, so the index size increases too.

Specified key was too long; max key length is 767 bytes

I understand the InnoDB index max length is 767 bytes.
CREATE TABLE `user` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`email` varchar(254) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
.....
`token` varchar(128) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
`rank` int(11) NOT NULL,
PRIMARY KEY (`id`),
KEY `user_token_index` (`token`)
) ENGINE=InnoDB AUTO_INCREMENT=2 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;
I want to create an index on the email column:
alter table agent add UNIQUE index idx_on_email (email);
But got the error message:
Specified key was too long; max key length is 767 bytes.
But the token column is only 128 bytes long and email is 254 bytes, neither of which is above 767 bytes. I hope someone can help. Thanks in advance!
With utf8mb4, varchar(254) means 254 characters, and each character can take up to 4 bytes, so an index on the email field needs up to 1016 bytes (254 * 4). The token index fits because 128 * 4 = 512 bytes.
You may want to look at this article:
http://wildlyinaccurate.com/mysql-specified-key-was-too-long-max-key-length-is-767-bytes/
So you could shorten the email column to at most 191 characters (191 * 4 = 764 bytes), for example varchar(100).
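
The byte counts in this answer can be reproduced with a short Python sketch. The function names are hypothetical; the 767-byte limit comes from the error message, and the per-character maxima are the bytes MySQL reserves for each character set:

```python
KEY_LIMIT_BYTES = 767  # per-column index key limit quoted in the error message

# Maximum bytes reserved per character for each character set.
MAX_BYTES_PER_CHAR = {"latin1": 1, "utf8": 3, "utf8mb4": 4}

def index_bytes(chars, charset):
    """Worst-case bytes needed to index a column of `chars` characters."""
    return chars * MAX_BYTES_PER_CHAR[charset]

def max_indexable_chars(charset):
    """Longest column, in characters, that stays within the key limit."""
    return KEY_LIMIT_BYTES // MAX_BYTES_PER_CHAR[charset]

print(index_bytes(254, "utf8mb4"))     # email: 1016, over the limit
print(index_bytes(128, "utf8mb4"))     # token: 512, within the limit
print(max_indexable_chars("utf8mb4"))  # 191
print(max_indexable_chars("utf8"))     # 255
```

The same numbers explain why varchar(191) is the usual ceiling for indexed utf8mb4 columns under this limit.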
An alternative would be to reassess the nature and constraints of the data stored in that table, and how it relates to other data in JOINs, and then decide whether a utf8mb4 charset and collation is really needed.
Example: if the data stored and/or compared will never contain characters longer than 3 bytes, you may just replace the charset and collation with utf8 and utf8_general_ci respectively (or similar). You may go even shorter for ASCII-only data.
This kind of assessment is good practice anyway, and may bring improved performance for free.

What does size limit on MySQL index mean?

I have a table created like so:
CREATE TABLE `my_table` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`info` varchar(50) DEFAULT NULL,
`some_more_info` smallint(5) unsigned NOT NULL
PRIMARY KEY (`id`),
KEY `my_index` (`some_more_info`,`info`(24)),
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8
My question is about the second key called my_index. What does the "(24)" size limit mean? The actual size of the column is 50, but the index is only 24 characters.
Does this mean that MySQL indexes only the first 24 characters of the column info?
In short, yes: only the first 24 characters are used to build the BTree index. Prefix lengths apply to string types such as varchar and text; numeric columns are always indexed in full.
Yes.
The entire description about the index length can be found here:
http://dev.mysql.com/doc/refman/5.0/en/create-index.html
Prefix lengths are given in characters for nonbinary string types and
in bytes for binary string types. That is, index entries consist of
the first length characters of each column value for CHAR, VARCHAR,
and TEXT columns, and the first length bytes of each column value for
BINARY, VARBINARY, and BLOB columns.
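Also, your CREATE statement has misplaced commas: one is missing after the some_more_info column definition, and there is an extra one after the KEY line.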
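
To see what a prefix length means in practice, here is a toy Python model of the idea. It is only an illustration, not how InnoDB implements its BTree; the function build_prefix_index and the sample rows are hypothetical:

```python
from collections import defaultdict

PREFIX_LEN = 24  # matches `info`(24) in the index definition

def build_prefix_index(rows):
    """Map (some_more_info, first 24 chars of info) to a list of row ids."""
    index = defaultdict(list)
    for row_id, info, some_more_info in rows:
        index[(some_more_info, info[:PREFIX_LEN])].append(row_id)
    return index

rows = [
    (1, "a" * 24 + "-first", 7),
    (2, "a" * 24 + "-second", 7),
    (3, "short value", 7),
]
index = build_prefix_index(rows)

# Rows 1 and 2 differ only after character 24, so they share one index entry.
candidates = index[(7, "a" * 24)]
print(candidates)
```

Because both long values collapse to the same key, a lookup on the full string has to read the candidate rows and compare the complete column value. That is the trade-off of a prefix index: smaller index entries, but less selectivity.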

MySQL VARCHAR(256) + MySQL INT = how many bytes?

CREATE SCHEMA IF NOT EXISTS `utftest` DEFAULT CHARACTER SET utf16;
CREATE TABLE IF NOT EXISTS `metadata_labels` (`metadata_id` INT NOT NULL , `label` VARCHAR(256) NOT NULL , PRIMARY KEY (`metadata_id`, `label`));
however I get the following error msg:
Specified key was too long; max key length is 767 bytes
Please advise
UTF16 uses up to 32 bits (4 bytes) per character in MySQL. 4 x 256 = 1024 > 767.
If possible, I would recommend using something other than UTF16 VARCHAR for your key.
In UTF8, it would require 3 x 256 + 4 = 772 bytes. UTF16 would take about a third more (4 x 256 + 4 = 1028 bytes).
You shouldn't use a primary key that's so wide; for an index to be efficient, the storage for each index should be kept to a minimum.
If you need to prevent duplicates, I would recommend adding a calculated field that contains a hash of the contents (e.g. sha1) and create a unique constraint on that instead.
Alternatively, use latin1 as the character encoding for the label field to reduce the number of bytes to 256 + 4 = 260.
If Unicode is a must and hashes are out of the picture, you should shorten the column to about 250 characters for UTF8 or 190 characters for UTF16.
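
The hash suggestion works because a digest has a fixed length no matter how long the label is. A minimal Python sketch, assuming the application computes the hash before inserting; the function name label_hash is hypothetical:

```python
import hashlib

def label_hash(label):
    """Hex SHA-1 digest of the label, encoded as UTF-8 before hashing."""
    return hashlib.sha1(label.encode("utf-8")).hexdigest()

short_digest = label_hash("a")
long_digest = label_hash("x" * 256)

# Both digests are 40 hex characters, whatever the input length.
print(len(short_digest), len(long_digest))
```

Stored as 40 single-byte characters (or as 20 raw bytes in a binary column), such a value stays far below the 767-byte limit, so a unique constraint on it is not affected by the label's character set.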

varchar(255) vs tinytext/tinyblob and varchar(65535) vs blob/text

By definition:
VARCHAR: The range of Length is 1 to 255 characters. VARCHAR values are sorted and compared in case-insensitive fashion unless the BINARY keyword is given. x+1 bytes
TINYBLOB, TINYTEXT: A BLOB or TEXT column with a maximum length of 255 (2^8 - 1) characters x+1 bytes
So based on this, I created the following table:
CREATE TABLE `user` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`name` varchar(255),
`lastname` tinytext,
PRIMARY KEY (`id`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1
Which is better to use, a varchar or a tinytext, and why?
Is it the same for:
VARCHAR: The range of Length is > 255 characters. VARCHAR values are sorted and compared in case-insensitive fashion unless the BINARY keyword is given. x+2 bytes
BLOB, TEXT A BLOB or TEXT column with a maximum length of 65535 (2^16 - 1) characters x+2 bytes
In this case varchar is better.
Note that a varchar can be from 0 to 65,535 characters long in MySQL 5.0.3 and later:
Values in VARCHAR columns are variable-length strings. The length can be specified as a value from 0 to 255 before MySQL 5.0.3, and 0 to 65,535 in 5.0.3 and later versions. The effective maximum length of a VARCHAR in MySQL 5.0.3 and later is subject to the maximum row size (65,535 bytes, which is shared among all columns) and the character set used. See Section E.7.4, “Table Column-Count and Row-Size Limits”.
Blobs are saved in a separate section of the file.
They require an extra file read to include in the data.
For this reason a varchar is fetched much faster.
If you have a large blob that you access infrequently, then a blob makes more sense.
Storing the blob data in a separate (part of the) file allows your core data file to be smaller and thus be fetched more quickly.
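
As for the "x+1 bytes" and "x+2 bytes" figures in the definitions quoted in the question, they are the value's length in bytes plus a length prefix. A small Python sketch of that rule, assuming the latin1 character set from the example table (one byte per character); the function names are hypothetical:

```python
def length_prefix_bytes(max_value_bytes):
    """1-byte prefix if values never exceed 255 bytes, otherwise 2 bytes."""
    return 1 if max_value_bytes <= 255 else 2

def varchar_storage(value, declared_chars):
    """Bytes used by a latin1 VARCHAR(declared_chars) holding `value`."""
    data = len(value.encode("latin-1"))  # latin1: one byte per character
    return data + length_prefix_bytes(declared_chars)

def tinytext_storage(value):
    """Bytes used by a latin1 TINYTEXT holding `value` (1-byte prefix)."""
    return len(value.encode("latin-1")) + 1

print(varchar_storage("Smith", 255))   # 5 + 1 = 6
print(tinytext_storage("Smith"))       # 5 + 1 = 6
print(varchar_storage("Smith", 1000))  # 5 + 2 = 7
```

So in terms of bytes per value, varchar(255) and tinytext are the same; the difference between them lies in how they are stored and fetched, as described above.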