I have an InnoDB MySQL table with a VARCHAR(100) field called 'word'.
I created a B-tree index on that field.
If I do a query like SELECT * FROM table WHERE word = 'non-linear', I get all variations of that word, so the results include Non-linear, Non-Linear, non-Linear, etc.
It seems that the index doesn't care about capitalization.
Does this mean that the index has one record for this word?
If so, is there a way to get a list of all the keys in the index, so that I would essentially have a list of unique terms?
String comparisons are not case-sensitive for non-binary string types (VARCHAR, CHAR, TEXT) unless the column uses a case-sensitive or binary collation; the default collations (e.g. latin1_swedish_ci, utf8mb4_0900_ai_ci) are case-insensitive, and the index is built according to the column's collation. For binary strings (BINARY, VARBINARY, BLOB), comparisons use the numeric values of the bytes in the operands; this means that for alphabetic characters, comparisons are case-sensitive.
More information is available in the MySQL docs.
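As for the follow-up question: the index stores one entry per table row, not one per distinct word; the variants simply compare as equal under the collation. To get the list of unique terms, a DISTINCT query will collapse collation-equal values (a minimal sketch, assuming the table is named `mytable`):

```sql
-- One representative per collation-equal group:
-- 'non-linear', 'Non-Linear', 'non-Linear' collapse into a single row.
SELECT DISTINCT word
FROM mytable
ORDER BY word;

-- To list the distinct stored spellings instead, force a byte-wise comparison:
SELECT DISTINCT BINARY word
FROM mytable;
```

Since `word` is indexed, the first query can typically be satisfied by scanning the index alone.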
I have a situation where we're storing long unique IDs (up to 200 characters) that are single TEXT entries in our database. The problem is we're using a FULLTEXT index for speed purposes and it works great for the smaller GUID style entries. The problem is it won't work for the entries > 84 characters due to the limitations of innodb_ft_max_token_size, which apparently cannot be set > 84. This means any entries more than 84 characters are omitted from the Index.
Sample Entries (actual data from different sources I need to match):
AQMkADk22NgFmMTgzLTQ3MzEtNDYwYy1hZTgyLTBiZmU0Y2MBNDljMwBGAAADVJvMxLfANEeAePRRtVpkXQcAmNmJjI_T7kK7mrTinXmQXgAAAgENAAAAmNmJjI_T7kK7mrTinXmQXgABYpfCdwAAAA==
AND
<j938ir9r-XfrwkECA8Bxz6iqxVth-BumZCRIQ13On_inEoGIBnxva8BfxOoNNgzYofGuOHKOzldnceaSD0KLmkm9ET4hlomDnLu8PBktoi9-r-pLzKIWbV0eNadC3RIxX3ERwQABAgA=#t2.msgid.quoramail.com>
AND
["ca97826d-3bea-4986-b112-782ab312aq23","ca97826d-3bea-4986-b112-782ab312aaf7","ca97826d-3bea-4986-b112-782ab312a326"]
So what are my options here? Is there any way to get the unique strings of 160 (or so) characters working with a FULLTEXT index?
What's the most efficient Index I can use for large string values without spaces (up to 200 characters)?
Here's a summary of the discussion in comments:
The IDs have multiple formats: either a single token of variable length up to 200 characters, or an "array", i.e. a JSON-formatted document containing multiple tokens. These entries come from different sources, and the format is outside of your control.
The FULLTEXT index implementation in MySQL has a maximum token size of 84 characters (innodb_ft_max_token_size), so it cannot match tokens longer than that.
You could use a conventional B-tree index (not FULLTEXT) to index longer strings, up to 3072 bytes per key in current versions of MySQL. But this would not support the case of JSON arrays with multiple tokens: a B-tree index cannot search for words in the middle of a string, and neither can an index satisfy a LIKE predicate whose pattern starts with a wildcard.
Therefore to use a B-tree index, you must store one token per row. If you receive a JSON array, you would have to split this into individual tokens and store each one on a row by itself. This means writing some code to transform the content you receive as id's before inserting them into the database.
MySQL 8.0.17 supports a new kind of index on a JSON array, called a Multi-Value Index. If you could store all your tokens as a JSON array, even those that are received as single tokens, you could use this type of index. But this also would require writing some code to transform the singular form of id's into a JSON array.
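A hedged sketch of the multi-valued index approach, assuming MySQL 8.0.17+ and illustrative table/column names; every entry, including singular IDs, is stored as a JSON array:

```sql
CREATE TABLE external_ids (
  id     BIGINT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
  tokens JSON NOT NULL,
  -- Multi-valued index: one index entry per array element
  INDEX idx_tokens ((CAST(tokens AS CHAR(200) ARRAY)))
);

-- Single tokens are wrapped into one-element arrays before insertion:
INSERT INTO external_ids (tokens) VALUES
  ('["ca97826d-3bea-4986-b112-782ab312aq23","ca97826d-3bea-4986-b112-782ab312aaf7"]');

-- MEMBER OF is one of the predicates that can use a multi-valued index:
SELECT id
FROM external_ids
WHERE 'ca97826d-3bea-4986-b112-782ab312aq23' MEMBER OF (tokens);
```

The transformation to a uniform JSON-array format would live in the application code that ingests the IDs.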
The bottom line is that there is no single solution for indexing the text if you must support any and all formats. You either have to suffer with non-optimized searches, or else you need to find a way to modify the data so you can index it.
Create a new table with 2 columns: the token, a VARCHAR(200) CHARSET ascii COLLATE ascii_bin (Base64 needs case sensitivity), plus the id of the matching row in your main table.
That table may have multiple rows for one row in your main table.
Use some simple parsing to find the string (or strings) in each entry and add them to this new table.
PRIMARY KEY(that-big-column)
Update your code to also INSERT rows into this table as new data arrives.
Now a simple B-tree lookup plus a join will handle all your queries.
TEXT cannot be indexed without a prefix length, but VARCHAR up to some limit can be indexed in full. 200 characters of ascii is only 200 bytes, well below the 3072-byte limit.
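A minimal sketch of that layout, with assumed table and column names (`main` with a BIGINT `id` primary key):

```sql
-- One row per token; the same main_id may appear for several tokens.
CREATE TABLE main_token (
  token   VARCHAR(200) CHARACTER SET ascii COLLATE ascii_bin NOT NULL,
  main_id BIGINT UNSIGNED NOT NULL,
  PRIMARY KEY (token)
);

-- Lookup: a B-tree point query on the token, then a join back to the main table:
SELECT m.*
FROM main_token AS t
JOIN main AS m ON m.id = t.main_id
WHERE t.token = 'ca97826d-3bea-4986-b112-782ab312aq23';
```

If the same token could legitimately map to several main rows, PRIMARY KEY (token, main_id) would be the safer choice.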
I need to store, query and update a large number of file hashes. What would be the optimal MySQL schema for this kind of table? Should I use a hash index, e.g.
CREATE INDEX hash_index ON Hashes(id) USING HASH;
Can I reuse the PK hash for the index? (As I understand it, USING HASH will create a hash of the hash.)
File hashes are fixed-length data items (unless you change the hash type after you have created some rows). If you represent your file hashes in hexadecimal or Base64, they'll consist of letters and digits. For example, SHA-256 hashes in hex take 64 characters (each character encodes four bits).
These characters are all 8-bit characters, so you don't need unicode. If you're careful about filling them in, you don't need case sensitivity either. Eliminating all these features of database columns makes the values slightly faster to search.
So, make your hashes fixed-length ASCII columns, using DDL like this:
hash CHAR(64) COLLATE 'ascii_bin'
You can certainly use such a column as a primary key.
Raymond correctly pointed out that MySQL doesn't offer hash indexes except for certain types of tables. That's OK: ordinary BTREE indexes work reasonably well for this kind of information.
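Putting that together, a minimal sketch; the table name, payload column and engine are assumptions:

```sql
CREATE TABLE Hashes (
  hash CHAR(64) CHARACTER SET ascii COLLATE ascii_bin NOT NULL,
  path VARCHAR(1024) NOT NULL,   -- illustrative payload column
  PRIMARY KEY (hash)             -- InnoDB builds a B-tree, even if you write USING HASH
) ENGINE = InnoDB;

-- Point lookup: an exact-match probe on the clustered primary key.
-- SHA2(..., 256) returns the 64-character lowercase hex digest.
SELECT path
FROM Hashes
WHERE hash = SHA2('file contents', 256);
```

Equality probes on a B-tree are O(log n) and, in practice, close enough to a true hash index that the distinction rarely matters here.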
I have a table with approximately 2 million records. It has a column with date values stored as strings (all in the same format). Now I need to filter records based on this string date column. I tried STR_TO_DATE, but it takes ages to fetch records, as this column doesn't have an index.
Can anyone help me add an index for this?
Wrapping a column in a function in a predicate forces MySQL to evaluate the function on every row, effectively disabling a potentially more efficient index seek or range scan.
As an example, MySQL can't use an index range scan with this
... FROM t WHERE STR_TO_DATE(t.mycol,'%Y-%m-%d') = '2018-04-23'
^^^^^^^^^^^^ ^^^^^^^^^^^^
but having the SQL reference a bare column would allow MySQL to consider using a range scan operation on an appropriate index ...
... FROM t WHERE t.mycol = DATE_FORMAT('2018-04-23','%Y-%m-%d')
^^^^^^^
A first cut at a suitable index for the latter query would be
CREATE INDEX t_IX1 ON t (mycol)
This isn't necessarily the best index for the query. It really depends on the query. For example, a covering index might be a more suitable choice.
The question mentions storing "date" values as strings, presumably in a CHAR or VARCHAR column. Note that MySQL implements a native DATE datatype, which is purpose-built for storing "date" values.
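If changing the stored format isn't an option, MySQL 5.7+ can index a generated column that parses the string once per write instead of once per row per query. A sketch, assuming the column is `mycol` in `'%Y-%m-%d'` format (both assumptions; adjust the format string to match the actual data):

```sql
ALTER TABLE t
  ADD COLUMN mycol_date DATE
    GENERATED ALWAYS AS (STR_TO_DATE(mycol, '%Y-%m-%d')) STORED,
  ADD INDEX t_IX2 (mycol_date);

-- The predicate now references a bare indexed column,
-- so MySQL can use a seek or range scan on t_IX2:
SELECT *
FROM t
WHERE mycol_date = '2018-04-23';
```

A virtual (non-STORED) generated column with an index would also work and avoids duplicating the data on disk.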
Good day,
I have a concern about indexing in MySQL. I am trying to limit the index size for a specific DB table column, with column names like ###ID.
The IDs appear to be unique within the first 8 bytes, rather than over their entire length.
Thanks in advance.
As the MySQL documentation on creating indexes describes:
For string columns, indexes can be created that use only the leading part of column values, using col_name(length) syntax to specify an index prefix length.
Prefixes can be specified for CHAR, VARCHAR, BINARY, and VARBINARY column indexes.
Prefixes must be specified for BLOB and TEXT column indexes.
Prefix limits are measured in bytes, whereas the prefix length in CREATE TABLE, ALTER TABLE, and CREATE INDEX statements is interpreted as number of characters for nonbinary string types (CHAR, VARCHAR, TEXT) and number of bytes for binary string types (BINARY, VARBINARY, BLOB). Take this into account when specifying a prefix length for a nonbinary string column that uses a multibyte character set.
For spatial columns, prefix values cannot be given, as described later in this section.
The statement shown here creates an index using the first 10 characters of the name column (assuming that name has a nonbinary string type):
CREATE INDEX part_of_name ON customer (name(10));
You can set the prefix length after the field name. You must drop the old key first:
ALTER TABLE mytab
  DROP KEY `parent`,
  ADD KEY `parent` (`parent`(8));
If I have a column short_title in a MySQL table and it is defined as UNIQUE, do I also have to add a FULLTEXT index for it to be searchable really fast? Or does UNIQUE already guarantee it will be searched quickly (without a full table scan)?
Thanks, Boda Cydo
UNIQUE will locate the verbatim short_title using the underlying index.
If you need a word match (as opposed to verbatim match), use FULLTEXT index.
Also note that by default the B-tree indexes in MyISAM on VARCHAR columns are subject to key compression. This can slow down searches for titles closer to the end of the alphabet:
Index search time depends on the value being searched
Finally, the VARCHAR keys tend to be large in size.
For the fastest verbatim searches, you could store an MD5 hash of the title in a BINARY(16) column, create a UNIQUE index over it (which avoids key compression) and search on the hash.
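A hedged sketch of that scheme on MySQL 5.7+, using a generated column so the hash stays in sync automatically (the table name `articles` is an assumption):

```sql
ALTER TABLE articles
  ADD COLUMN title_md5 BINARY(16)
    GENERATED ALWAYS AS (UNHEX(MD5(short_title))) STORED,
  ADD UNIQUE KEY uk_title_md5 (title_md5);

-- Verbatim lookup via the fixed-width 16-byte binary key:
SELECT *
FROM articles
WHERE title_md5 = UNHEX(MD5('Some exact title'));
```

On older versions, the same effect requires maintaining the hash column in application code or triggers. MD5 is fine here because it is used only as a lookup key, not for security.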