MySQL index for long strings


I have a MySQL InnoDB table where I want to store long strings (up to 20k characters). Is there any way to create an index on this field?

You can put an MD5 of the field into another field and index that. Then, when you search, you match against both the full field, which is not indexed, and the MD5 field, which is indexed.
SELECT *
FROM tbl_name
WHERE large_field = "hello world hello world ..."
AND large_field_md5 = md5("hello world hello world ...")
large_field_md5 is indexed, so we go directly to the record that matches. Once in a blue moon it might need to test 2 records if there is a duplicate MD5.

You will need to limit the length of the index, otherwise you are likely to get error 1071 ("Specified key was too long"). The MySQL manual entry on CREATE INDEX describes this:
Indexes can be created that use only the leading part of column values, using col_name(length) syntax to specify an index prefix length:
Prefixes can be specified for CHAR, VARCHAR, BINARY, and VARBINARY columns.
BLOB and TEXT columns also can be indexed, but a prefix length must be given.
Prefix lengths are given in characters for nonbinary string types and in bytes for binary string types. That is, index entries consist of the first length characters of each column value for CHAR, VARCHAR, and TEXT columns, and the first length bytes of each column value for BINARY, VARBINARY, and BLOB columns.
It also adds this:
Prefix support and lengths of prefixes (where supported) are storage engine dependent. For example, a prefix can be up to 1000 bytes long for MyISAM tables, and 767 bytes for InnoDB tables.
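The difference between a prefix counted in characters and one counted in bytes can be sketched in a few lines of Python. This is only an illustration of the quoted rule, not of how MySQL stores index entries; the helper names `char_prefix` and `byte_prefix` and the sample value are made up for this example.

```python
# Illustration only: character-counted vs byte-counted prefixes.
# char_prefix / byte_prefix are hypothetical helpers, not MySQL functions.

value = "héllo wörld"           # nonbinary string (think CHAR/VARCHAR/TEXT)
raw = value.encode("utf-8")     # the same data as bytes (think BINARY/VARBINARY/BLOB)


def char_prefix(s: str, length: int) -> str:
    """Return the first `length` characters, as for nonbinary string types."""
    return s[:length]


def byte_prefix(b: bytes, length: int) -> bytes:
    """Return the first `length` bytes, as for binary string types."""
    return b[:length]


char_key = char_prefix(value, 2)   # two whole characters
byte_key = byte_prefix(raw, 2)     # two bytes, which splits the 2-byte 'é'

print(repr(char_key), len(char_key.encode("utf-8")), "bytes")
print(repr(byte_key))
```

A two-character prefix of a multi-byte string can occupy more than two bytes, which is why the byte limits quoted above translate into fewer characters for multi-byte character sets.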

Here is an example of how you could do that. As @Gidon Wise mentioned in his answer, you can index an additional field. In this case it will be query_md5.
CREATE TABLE `searches` (
`id` int(10) UNSIGNED NOT NULL,
`query` varchar(10000) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
`query_md5` varchar(32) COLLATE utf8mb4_unicode_ci DEFAULT NULL
) ENGINE=InnoDB;
ALTER TABLE `searches`
ADD PRIMARY KEY (`id`),
ADD KEY `searches_query_md5_index` (`query_md5`);
To guard against two different strings sharing the same MD5 hash, double-check by also comparing the full value, i.e. add and `query` = '...' to the WHERE clause.
The query will look like this:
select * from `searches` where `query_md5` = "b6d31dc40a78c646af40b82af6166676" and `query` = 'long string ...'
Here b6d31dc40a78c646af40b82af6166676 is the MD5 hash of the 'long string ...' value. I think this can improve query performance, and you can be sure that you will get the right results.
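The lookup pattern (find candidates through the indexed hash, then confirm against the full string) can be sketched in Python. This is a toy in-memory model under stated assumptions, not MySQL behaviour: the class `HashIndexedTable` and its methods are hypothetical, a dict stands in for the index, and a list stands in for the table.

```python
import hashlib
from collections import defaultdict


def md5_hex(text: str) -> str:
    """Hex MD5 of the UTF-8 encoded text (32 hex characters)."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


class HashIndexedTable:
    """Hypothetical stand-in for a table with an indexed hash column."""

    def __init__(self):
        self.rows = []                     # (id, query) tuples, the "table"
        self.by_hash = defaultdict(list)   # hash -> row positions, the "index"

    def insert(self, row_id: int, query: str) -> None:
        self.by_hash[md5_hex(query)].append(len(self.rows))
        self.rows.append((row_id, query))

    def find(self, query: str):
        # Step 1: narrow down using the indexed hash.
        candidates = self.by_hash.get(md5_hex(query), [])
        # Step 2: compare the full string to rule out hash collisions.
        return [self.rows[i] for i in candidates if self.rows[i][1] == query]


table = HashIndexedTable()
table.insert(1, "long string one " * 500)
table.insert(2, "long string two " * 500)
print(table.find("long string two " * 500)[0][0])
```

The second comparison is cheap because it only runs on the handful of rows that share the hash, usually exactly one.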

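Use the SHA2 function with a fixed hash length and store the result in a generated column. Add this to your table (your_table is a placeholder name):
ALTER TABLE `your_table`
ADD COLUMN `hash` varbinary(32) GENERATED ALWAYS AS (unhex(sha2(`your_text`,256))),
ADD UNIQUE KEY `ix_hash` (`hash`);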
Read about the SHA2 function
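To see why varbinary(32) is the right size for this column, here is a small Python check of the digest lengths. It assumes the text is hashed as UTF-8 bytes; `bytes.fromhex` plays the role that UNHEX plays in the SQL above, and the sample text is made up.

```python
import hashlib

# Made-up sample standing in for a long text value.
text = "some long text " * 1000

hex_digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
raw_digest = bytes.fromhex(hex_digest)   # analogous to UNHEX(SHA2(..., 256))

# The digest size is fixed no matter how long the input is.
print(len(text), "characters in,", len(hex_digest), "hex chars,", len(raw_digest), "raw bytes")
```

Storing the raw 32 bytes instead of the 64-character hex string halves the size of every index entry.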

Related

Why mysql query is slow without quotation mark?

The table DDL is as follows:
CREATE TABLE `video` (
`short_id` varchar(50) NOT NULL,
`prob` float DEFAULT NULL,
`star_id` varchar(50) NOT NULL,
`qipu_id` int(11) NOT NULL,
`cloud_url` varchar(100) DEFAULT NULL,
`is_identical` tinyint(1) DEFAULT NULL,
`quality` varchar(1) DEFAULT NULL,
PRIMARY KEY (`short_id`),
KEY `ix_video_short_id` (`short_id`),
KEY `sid` (`star_id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
The video table has 4.5 million rows.
I ran the same query twice in the mysql shell client. The only difference is that in the WHERE clause one compares star_id to a quoted value and the other to an unquoted value:
select * from video where star_id="215343405";
12914 rows in set (0.22 sec)
select * from video where star_id=215343405;
12914 rows in set (3.17 sec)
The one with quotation marks is more than 10x faster than the other (I have created an index on star_id). I noticed that the slow one does not use the index. How does MySQL process these queries?
mysql> explain select * from video where star_id=215343405;
Thanks in advance!

This is answered in the manual:
For comparisons of a string column with a number, MySQL cannot use an
index on the column to look up the value quickly. If str_col is an
indexed string column, the index cannot be used when performing the
lookup in the following statement:
SELECT * FROM tbl_name WHERE str_col=1;
The reason for this is that there are many different strings that may convert to the value 1, such as '1', ' 1', or '1a'.
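The point about many strings converting to the same number can be illustrated with a rough Python approximation. This is not MySQL's actual conversion routine: the function `approx_to_number` and its regular expression are assumptions made for this sketch, which simply takes the leading numeric part of a string and treats anything else as 0.

```python
import re

# Hypothetical approximation: optional whitespace, sign, digits, fraction, exponent.
_NUMERIC_PREFIX = re.compile(r"\s*[+-]?(?:\d+(?:\.\d*)?|\.\d+)(?:[eE][+-]?\d+)?")


def approx_to_number(s: str) -> float:
    """Convert the leading numeric part of s to a float; 0.0 if there is none."""
    m = _NUMERIC_PREFIX.match(s)
    return float(m.group(0)) if m else 0.0


candidates = ["1", " 1", "1a", "01", "1.0", "abc"]
matches = [s for s in candidates if approx_to_number(s) == 1]
print(matches)
```

Because several distinct strings land on the same number, a string-ordered index cannot be used to jump straight to "the" matching entry; every row's value has to be converted and compared.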

If you do not use quotation marks, MySQL treats the value as an integer and must convert the column value for every record. Therefore the database needs a lot of time.

The quotes define the expression as a string, whereas without the quotes it is evaluated as a number. This means that MySQL is forced to perform a type conversion, converting each row's string value to a number, to do a proper comparison.
As the doc above says,
For comparisons of a string column with a number, MySQL cannot use an
index on the column to look up the value quickly. If str_col is an
indexed string column, the index cannot be used when performing the
lookup...
However, the inverse is not true: in that case the index can be used, but using a string as the value still causes a poorer execution plan (as illustrated by jkavalik's sqlfiddle), where "Using where" appears instead of the faster "Using index condition". The main difference between the two is that the former requires a row lookup, while the latter can get the data directly from the index.
You should definitely modify the column data type (assuming it truly is only meant to contain numbers) to the appropriate data type ASAP, but make sure that no queries are actually using single quotes, otherwise you'll be back where you started.

Cakephp 3 create i18n table in phpmyadmin issue

I have a problem creating the i18n table for the CakePHP 3 Translate Behavior. My database is managed in phpMyAdmin, and the error occurs when I execute this piece of code from the official cookbook:
CREATE TABLE i18n (
id int NOT NULL auto_increment,
locale varchar(6) NOT NULL,
model varchar(255) NOT NULL,
foreign_key int(10) NOT NULL,
field varchar(255) NOT NULL,
content text,
PRIMARY KEY (id),
UNIQUE INDEX I18N_LOCALE_FIELD(locale, model, foreign_key, field),
INDEX I18N_FIELD(model, foreign_key, field)
);
phpMyAdmin says:
1071 - Specified key was too long; max key length is 767 bytes
I'm using utf8_unicode_ci. Should I go for utf8_general_ci?
Thanks for your help.

There is no difference in size requirements between utf8_unicode and utf8_general, they only differ with regards to sorting.
By default, the index (key prefix) limit is 767 bytes for InnoDB tables (and 1000 bytes for MyISAM). If applicable, enable the innodb_large_prefix option (it is enabled by default as of MySQL 5.7), which raises the limit to 3072 bytes. Alternatively, make the VARCHAR columns smaller and/or change their collation: the locale column (which holds ISO locale/country codes) surely doesn't use Unicode characters, and chances are that your model and column/field names also use only ASCII characters and are well below 255 characters long.
With an ASCII collation the VARCHAR columns require only 1 byte per char, unlike with UTF-8, which can require up to 3 bytes (or 4 bytes for the mb4 variants), which alone already causes the index size limit to be exceeded (3 * 255 * 2 = 1530).
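The arithmetic in this answer can be reproduced with a short Python sketch. The bytes-per-character table reflects the worst-case figures quoted above (1 for ASCII, 3 for utf8, 4 for the mb4 variants); the helper `varchar_key_bytes` is a hypothetical name, and the sketch only does the multiplication, it does not model how a storage engine applies its limit.

```python
# Worst-case bytes per character, as quoted in the answer above.
BYTES_PER_CHAR = {"ascii": 1, "utf8": 3, "utf8mb4": 4}


def varchar_key_bytes(length: int, charset: str) -> int:
    """Worst-case bytes needed to index a full VARCHAR(length) column."""
    return length * BYTES_PER_CHAR[charset]


# The two VARCHAR(255) columns (model and field) from the i18n table.
for charset in ("ascii", "utf8", "utf8mb4"):
    per_column = varchar_key_bytes(255, charset)
    print(f"{charset}: {per_column} bytes per column, {2 * per_column} for both")
```

With utf8 the two columns come to 1530 bytes, the figure given in the answer, while an ASCII character set needs 510.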
See also
MySQL 5.7 Manual > Character Sets and Collations
MySQL 5.7 Manual > Limits on InnoDB Tables > Maximums and Minimums
MySQL 5.7 Manual > InnoDB Startup Options and System Variables > innodb_large_prefix

I reduced the column sizes in my statement:
model varchar(85) NOT NULL,
field varchar(85) NOT NULL,
With model and field at 85 characters, which I think is enough, MySQL accepts it.
Hope that will help someone.

Database field lengths for storing data with unknown and technically unlimited length?

I have to store DOIs in a MySQL database. The handbook says:
There is no limitation on the length of a DOI name.
So far, the maximum length of a DOI in my current data is 78 chars. Which field length would you recommend in order to not waste storage space and to be on the safe side? In general:
How do you handle the problem of not knowing the maximum length of input data that has to be stored in a database, considering space and the efficiency of transactions?
EDIT
There are these two (simplified) tables document and topic with a one-to-many relationship:
CREATE TABLE document
(
ID int(11) NOT NULL,
DOI ??? NOT NULL,
PRIMARY KEY (ID)
);
CREATE TABLE topic
(
ID int(11) NOT NULL,
DocID int(11) NOT NULL,
Name varchar(255) NOT NULL,
PRIMARY KEY (ID),
FOREIGN KEY (DocID) REFERENCES document(ID), UNIQUE(DocID)
);
I have to run the following (simplified) query for statistics, returning the total value of referenced topic-categories per document (if there are any references):
SELECT COUNT(topic.Name) AS number, document.DOI
FROM document LEFT OUTER JOIN topic
ON document.ID = topic.DocID
GROUP BY document.DOI;
The collation used is utf8_general_ci.

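TEXT and VARCHAR can store up to 64KB. If you're being extra paranoid, use LONGTEXT, which allows 4GB, though if the names are actually longer than 64KB then that is a really abusive standard. A large VARCHAR is probably a reasonable accommodation; note that the 65,535 limit is in bytes and is shared by the whole row, so with a multi-byte character set the maximum declared length is lower (21,844 characters for utf8).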
Since VARCHAR is variable length then you really only pay for the extra storage if and when it's used. The limit is just there to cap how much data can, theoretically, be put in the field.

Space is not a problem; indexing may be a problem. Please provide the queries that will need an index on this column. Also provide the CHARACTER SET needed. With those, we can discuss the ramifications of various cutoffs: 191, 255, 767, 3072, etc.
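The cutoffs mentioned here follow from dividing a byte limit by the worst-case bytes per character. A minimal Python sketch, assuming 3 bytes per character for utf8 and 4 for utf8mb4, and using the 767 and 3072 byte limits discussed elsewhere on this page (`max_indexable_chars` is a hypothetical helper):

```python
# Assumed worst-case bytes per character for each character set.
BYTES_PER_CHAR = {"latin1": 1, "utf8": 3, "utf8mb4": 4}


def max_indexable_chars(limit_bytes: int, charset: str) -> int:
    """Largest character count whose worst-case size fits in limit_bytes."""
    return limit_bytes // BYTES_PER_CHAR[charset]


for limit in (767, 3072):
    for charset in ("latin1", "utf8", "utf8mb4"):
        print(limit, charset, max_indexable_chars(limit, charset))
```

This is where 191 (767 bytes with utf8mb4) and 255 (767 bytes with utf8) come from. A 78-character DOI fits under either limit in any of these character sets.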

Hash method for database string search?

I have a MySQL InnoDB database and one of the fields in a table is term VARCHAR(255) CHARACTER SET utf8 NOT NULL
This is too large, as it can be 255*3 = 765 bytes. It's still within the 767-byte limit InnoDB has, but I want to speed up searches based on the term as well as save space by reducing the size of the indexes.
Instead of using the term as a key, I decided to use a hash of term.
What kind of hash method should I use?
edit: I am storing search terms, e.g. "how to find a new car", "iphone 5", "best yugioh card" etc

The best way is to use MD5 like this:
CREATE TABLE termtable
(
id int not null auto_increment,
term VARCHAR(255) CHARACTER SET utf8 NOT NULL,
termhash char(32) not null,
primary key (id),
key (termhash)
);
If you are looking for one specific value and those values could be lengths well beyond 32 characters, you could store the hash value:
INSERT INTO termtable (term,termhash)
VALUES ('a long string',MD5('a long string'));
That way, you just search on the hash value to retrieve results:
SELECT * FROM termtable WHERE termhash = MD5('a long string');

MySQL includes the MD5 algorithm. The resulting hash is only 32 hex characters, or 16 binary "bytes".
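The size claim is easy to check in Python using the example search terms from the question. This assumes the terms are hashed as UTF-8 bytes, which may differ from what a given MySQL connection character set produces.

```python
import hashlib

# Example search terms taken from the question.
terms = ["how to find a new car", "iphone 5", "best yugioh card"]

for term in terms:
    digest = hashlib.md5(term.encode("utf-8"))
    # hexdigest() is the 32-character form; digest() is the 16-byte binary form.
    print(term, len(digest.hexdigest()), len(digest.digest()))

sizes = {(len(hashlib.md5(t.encode("utf-8")).hexdigest()),
          len(hashlib.md5(t.encode("utf-8")).digest())) for t in terms}
```

Every term yields the same fixed size, so an index on the hash takes 16 bytes (binary) or 32 characters (hex) per entry instead of up to 765 bytes for the full term.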

What does size limit on MySQL index mean?

I have a table created like so:
CREATE TABLE `my_table` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`info` varchar(50) DEFAULT NULL,
`some_more_info` smallint(5) unsigned NOT NULL
PRIMARY KEY (`id`),
KEY `my_index` (`some_more_info`,`info`(24)),
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8
My question is about the second key called my_index. What does the "(24)" size limit mean? The actual size of the column is 50, but the index is only 24 characters.
Does this mean that MySQL indexes only the first 24 characters of the column info?

In short, yes: only the first 24 characters are used to build the BTree index. Prefix lengths apply only to string types such as VARCHAR and TEXT; they cannot be given for numeric columns.
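What a 24-character prefix means for lookups can be sketched with a toy Python model. This is an illustration under simplifying assumptions, not InnoDB's BTree: a dict keyed by the first 24 characters stands in for the index, and the names `build_prefix_index` and `lookup` are hypothetical.

```python
from collections import defaultdict

PREFIX_LEN = 24  # matches `info`(24) in the question


def build_prefix_index(values, prefix_len=PREFIX_LEN):
    """Map each value's leading prefix to the positions of the rows that have it."""
    index = defaultdict(list)
    for pos, value in enumerate(values):
        index[value[:prefix_len]].append(pos)
    return index


def lookup(values, index, wanted, prefix_len=PREFIX_LEN):
    """Use the prefix to find candidates, then compare the full value."""
    candidates = index.get(wanted[:prefix_len], [])
    return [pos for pos in candidates if values[pos] == wanted]


infos = [
    "abcdefghijklmnopqrstuvwxyz-one",   # same first 24 characters as the next row
    "abcdefghijklmnopqrstuvwxyz-two",
    "short",
]
index = build_prefix_index(infos)
print(len(index), lookup(infos, index, infos[1]))
```

Values that differ only after the 24th character share an index entry, so the full column still has to be checked for those rows; the index is smaller at the cost of being less selective.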

Yes.
The entire description about the index length can be found here:
http://dev.mysql.com/doc/refman/5.0/en/create-index.html
Prefix lengths are given in characters for nonbinary string types and
in bytes for binary string types. That is, index entries consist of
the first length characters of each column value for CHAR, VARCHAR,
and TEXT columns, and the first length bytes of each column value for
BINARY, VARBINARY, and BLOB columns.
Also, your CREATE statement has some misplaced commas.
