How to search for a text? (MySQL)

I have this table:
bussId | nameEn | keywords
500 | name1 name2 | keyword1 keyword2
I want to return bussId 500 if the user searches for keyword1, keyword2, name2, or name1.
So I should use this query SELECT * FROM business WHERE nameEn LIKE '%searched_word%'.
But this query can't use an index on nameEn or keywords. According to "Comparison of B-Tree and Hash Indexes": "The index also can be used for LIKE comparisons if the argument to LIKE is a constant string that does not start with a wildcard character".
I have this solution, I want to create another table and insert all the single words:
bussId | word
500 | name1
500 | name2
500 | keyword1
500 | keyword2
Then I will search for the bussId using this query:
SELECT * WHERE word LIKE 'searched_word%'.
That way I can be sure that MySQL will use the index and the search will be faster, but this table will contain about 20 million rows!
Is there another solution?

You have to use a FULLTEXT index, which is available with MyISAM, or with InnoDB from MySQL 5.6 onwards:
mysql> ALTER TABLE business ADD FULLTEXT(nameEn, keywords);
And here is your request:
mysql> SELECT * FROM business
-> WHERE MATCH (nameEn, keywords) AGAINST ('searched_word');

Did you try the Instr() or Locate() functions? Here is an SO discussion comparing them with LIKE; they may perform better than a pattern with a leading % wildcard. They still run full table scans, though, and I'm not aware of how the MySQL query optimizer uses indexes with string functions.
SELECT * FROM business WHERE Instr(nameEN, 'search_word') > 0
OR
SELECT * FROM business WHERE Locate('search_word', nameEN) > 0
Also, there may be other areas of optimization. See whether other potential indexes are available in the business table; explicitly list specific columns instead of the asterisk (*) if not all columns are being used; and parse the nameEN and keywords columns by spaces so that each column holds one value (with potential to transpose), then use an implicit join (WHERE) or an explicit join (JOIN). This might even be a table design issue, given the challenge of storing multiple values in a single field.

With newer versions of MySQL you don't need to switch the engine to MyISAM; InnoDB also supports FULLTEXT indexes (I've tested this on 5.6.15; it is supported from version 5.6.4 onwards).
So if your server version is 5.6.4 or higher, you just need to add a FULLTEXT index to your table and select with MATCH(...) AGAINST(...), as in the example below:
CREATE FULLTEXT INDEX idx ON business (nameEn);
SELECT * FROM business
WHERE match(nameEn)against('+searched_word' IN BOOLEAN MODE);

Use the statement below; it should return the expected result:
SELECT * FROM business WHERE ((nameEn LIKE 'searched_word%' OR nameEn LIKE '%searched_word%') OR (keywords LIKE 'searched_word%' OR keywords LIKE '%searched_word%')) AND bussID = 500;
This should work.

20 million records is quite a lot, and a mapping table with a VARCHAR column could take up to the maximum allowed characters in bytes for each row, plus 32 bits for the integer column.
What if you just created a table like (id INT, crc INT UNSIGNED) and stored only the CRC32 value of the text data? CRC32() returns an unsigned 32-bit value, so the column should be unsigned. It is also case sensitive, so you need to convert to uppercase or lowercase while populating the data and do the same when comparing.
I agree with the full-text approach but to save space and use the advantage of indexing, you can try something like below.
Create Temporary TABLE t (id INT, crc INT UNSIGNED);
Insert Into t
Select 500, CRC32(UPPER('name1'))
Union Select 500, CRC32(UPPER('name2'))
Union Select 500, CRC32(UPPER('keyword1'))
Union Select 500, CRC32(UPPER('keyword2'));
Select * From t Where crc = CRC32(UPPER('keyword2'));
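To get a feel for the CRC approach outside the database, here is a minimal Python sketch. It assumes Python's zlib.crc32 is an acceptable stand-in for MySQL's CRC32() (both are documented as 32-bit unsigned checksums, but their agreement is not verified here), and the rows list and function names are made up for illustration. It shows the case normalization and why a signed 32-bit column may be too small. Note that different words can share a checksum, so a real lookup would still need to confirm the match against the original text.

```python
import zlib

def word_crc(word: str) -> int:
    """Case-insensitive CRC-32 of a word, as an unsigned 32-bit integer."""
    return zlib.crc32(word.upper().encode("utf-8"))

# Hypothetical mapping table: (business id, crc of one word).
rows = [(500, word_crc(w)) for w in ("name1", "name2", "keyword1", "keyword2")]

def lookup(searched_word: str):
    """Return the business ids whose stored crc equals the crc of the search word."""
    target = word_crc(searched_word)
    return sorted({biz_id for biz_id, crc in rows if crc == target})

print(lookup("KeyWord2"))  # case differences do not matter after UPPER

# The standard CRC-32 check value for b"123456789" is above the signed
# 32-bit maximum, which is why an unsigned column type is the safer choice.
print(zlib.crc32(b"123456789") > 2**31 - 1)
```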

Related

MySQL fulltext or exact value search is very slow

I have this table of 9.5 million rows and I need to perform both fulltext and exact value search over the same column.
Although there are two indexes on this column, one BTREE and one FULLTEXT, the database engine doesn't use either and goes through all 9.5M rows.
select * from mytable
where match(document) against ('+111/05257' in boolean mode)
or document = '111/05257';
-- very slow, takes ~ 9 seconds
-- possible keys: both
-- used key: none :(
If I use only one type of search, queries run fast.
select * from mytable where document = '111/05257';
-- very fast, around 80 ms
-- used key: btree
select * from mytable where match(document) against ('+111/05257' in boolean mode)
-- very fast, around 100 ms
-- used key: fulltext
Given the poorly structured data in the document column, ranging from '1/XA' through '5778292019' to 'S:NXA/0001/XA2019/111/05257', I need both exact and partial (fulltext) search over this column.
Wildcard searches ('%111/05257%') also perform terribly over btree index.
Any idea how to solve this?
Thank you all
Queries involving OR are notoriously hard to optimize. A common solution is to change them into two queries, and UNION the results:
select * from mytable
where match(document) against ('+111/05257' in boolean mode)
UNION
select * from mytable
where document = '111/05257';
Each of the respective queries should be free to use a different index. The UNION will eliminate any rows in common from the two results.
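The duplicate-removal behaviour of UNION can be tried in a small sandbox. The sketch below uses Python's built-in sqlite3 rather than MySQL, so it says nothing about MySQL's index choice; SQLite has no MATCH ... AGAINST, so a LIKE predicate stands in for the fulltext branch, and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER PRIMARY KEY, document TEXT)")
conn.executemany(
    "INSERT INTO mytable (document) VALUES (?)",
    [("111/05257",), ("S:NXA/0001/XA2019/111/05257",), ("1/XA",)],
)

# Stand-ins for the two predicates: a partial match and an exact match.
partial = "SELECT id, document FROM mytable WHERE document LIKE '%111/05257'"
exact = "SELECT id, document FROM mytable WHERE document = '111/05257'"

union_rows = conn.execute(f"{partial} UNION {exact} ORDER BY id").fetchall()
union_all_rows = conn.execute(f"{partial} UNION ALL {exact}").fetchall()

print(union_rows)           # row 1 matches both predicates but appears once
print(len(union_all_rows))  # UNION ALL keeps the duplicate
```

If the two branches are known never to overlap, UNION ALL avoids the deduplication step; here they do overlap, so plain UNION is the one that matches the original OR query.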

MySql Slow Request Time

I have a very good PC and I was wondering why it takes so long to make a very simple request to one of my tables.
My pc:
i9 10900kf (10 cores)
64 GB ram
2TB NVME SSD
RTX 3090 Ventus
My table has 531 732 rows (this is not a lot of rows) with 39 columns.
I have the following indexes on my table:
DisplayName
TmiSentTs
Username
Channel
DisplayName_TmiSentTs
Username_TmiSentTs
When I make the following query, it takes 66.016 seconds to get a response:
SELECT from_unixtime(TmiSentTs/1000,'%Y/%m/%d %H:%i:%s'),
DisplayName, message
FROM MY_TABLE
WHERE displayname LIKE '%ab%'
ORDER BY TmiSentTs DESC;
I don't think this is normal.
I tried to:
change my innodb_write_io_threads from 4 to 32
change my innodb_read_io_threads from 4 to 32
But none of this works and I have another table (in another database) with 37 325 332 rows and it takes 2 seconds for a similar query.
EDIT:
After a bit of research, I found that this
SELECT * FROM pepegaclapwr.twitchmessages where instr(username,'aa')<>0;
Is faster than
SELECT * FROM pepegaclapwr.twitchmessages where username like '%aa%';
For the same result
An index on the displayname column won't help this query. Any LIKE condition with wildcards at the start of the pattern you are searching for cannot use an index, so it is bound to do a table-scan.
Think about a telephone book. If I ask you to look up last names that start with "T" it's easy because the book is sorted by last name. But if I ask you to look up names where "T" is the 4th letter (or anything after the first letter), the fact that the book is sorted doesn't help. You still have to read the whole book page by page to find the names I asked for.
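The telephone-book analogy can be sketched in a few lines of Python. This is only an illustration, not how MySQL implements its B-tree: a sorted list stands in for the index, the sample names are made up, and the "\uffff" upper bound assumes no name contains that character.

```python
from bisect import bisect_left

# Hypothetical sample data; a sorted list stands in for a B-tree index.
names = sorted(["Adams", "Baker", "Stone", "Taylor", "Thomas", "Turner", "Young"])

def prefix_search(sorted_names, prefix):
    """Find names starting with `prefix` using two binary searches."""
    lo = bisect_left(sorted_names, prefix)
    # prefix + "\uffff" sorts after every sample name that begins with `prefix`,
    # so `hi` is the first position past the matching range.
    hi = bisect_left(sorted_names, prefix + "\uffff")
    return sorted_names[lo:hi]

def substring_search(all_names, needle):
    """Find names containing `needle` anywhere; every entry must be inspected."""
    return [n for n in all_names if needle in n]

print(prefix_search(names, "T"))     # jumps straight to the "T" section
print(substring_search(names, "o"))  # sort order is no help, so it scans everything
```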
To optimize the kind of query like the one you show above, you may find it easier to use a fulltext index, but that's only if you are searching for whole words. It looks like you are searching for any string that contains "ab" somewhere in it. This is not a whole word, so a fulltext index won't help either.
In that case, your only solution is to add another column to the table to indicate whether the row contains "ab", and then index that column. In MySQL 5.7 and later, you can make a virtual column based on an expression.
ALTER TABLE MyTable
ADD COLUMN displayname_ab TINYINT NOT NULL AS (displayname LIKE '%ab%'),
ADD INDEX (displayname_ab);
Then search:
SELECT ... FROM MyTable WHERE displayname_ab = true;
In MySQL 8.0 you don't even need to make a virtual column, you can make an expression index directly from an expression on an existing column:
ALTER TABLE MyTable
ADD INDEX ((displayname LIKE '%ab%'));
Then if you search using the exact same expression, it will use the index:
SELECT ... FROM MyTable WHERE displayname LIKE '%ab%';
But this fixes the string you need into the virtual column definition. It doesn't help if you need to search for "cd" tomorrow, or any other pattern.

Why using IN(...) when selecting on indexed fields, will kill the performance of SELECT query?

"Avoid using IN(...) when selecting on indexed fields. It will kill the performance of SELECT query."
I found this here: https://wikis.oracle.com/pages/viewpage.action?pageId=27263381
Can you explain it? Why that will kill performance? And what should I use instead of IN. "OR" statement maybe?
To tell the truth, that statement contradicts many hints that I have read in books and articles on MySQL.
Here is an example: http://www.mysqlperformanceblog.com/2010/01/09/getting-around-optimizer-limitations-with-an-in-list/
Moreover, expr IN(value, ...) itself has additional enhancements for dealing with large value lists, since it is supposed to be used as a useful alternative to certain range queries:
If all values are constants, they are evaluated according to the type of expr and sorted. The search for the item then is done using a binary search. This means IN is very quick if the IN value list consists entirely of constants.
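The "sort once, then binary search" idea from that passage can be sketched as follows. This is a toy model in Python, not MySQL's actual implementation, and the function names are invented for the example.

```python
from bisect import bisect_left

def make_in_list(constants):
    """Sort the constant list once, up front."""
    return sorted(constants)

def in_list(sorted_constants, value):
    """Binary search: O(log n) comparisons per probed value."""
    i = bisect_left(sorted_constants, value)
    return i < len(sorted_constants) and sorted_constants[i] == value

ids = make_in_list([567899, 1, 42, 1000])
print(in_list(ids, 567899))
print(in_list(ids, 7))
```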
Still overusing INs may result in slow queries. Some cases are noted in the article.
Because MySQL can't optimize it.
Here is an example:
explain select * from keywordmaster where id in (1, 567899);
plan (sorry for external link. Doesn't show correctly here)
here is another query:
explain
select * from keywordmaster where id = 1
union
select * from keywordmaster where id = 567899
plan
As you can see in the second query we get ref as const and type is const instead of range. MySQL can't optimize range scans.
Prior to MySQL 5.0 it seems that MySQL would only use a single index for a table. So, if you had SELECT * FROM tbl WHERE (a = 6 OR b = 33), it could choose to use either the a index or the b index, but not both. Note that it says fields, plural. I suspect the advice comes from that time, and the workaround was to UNION the OR results, like so:
SELECT * FROM tbl WHERE (a = 6)
UNION
SELECT * FROM tbl WHERE (b = 33)
I believe IN is treated the same as a group of ORs, so using ORs won't help.
An alternative is to create a temporary table to hold the values of your IN-clause and then join with that temporary table in your SELECT.
For example:
CREATE TEMPORARY TABLE temp_table (v VARCHAR(255));  -- length 255 is an arbitrary choice
INSERT INTO temp_table VALUES ('foo');
INSERT INTO temp_table VALUES ('bar');
SELECT * FROM temp_table tmp, orig_table orig
WHERE tmp.v = orig.value;
DROP TEMPORARY TABLE temp_table;
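To check that joining against a temporary table returns the same rows as an IN list, here is a small sandbox using Python's built-in sqlite3. It is not MySQL, so it demonstrates equivalence of results only, not performance; the contents of orig_table are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orig_table (id INTEGER PRIMARY KEY, value TEXT)")
conn.executemany(
    "INSERT INTO orig_table (value) VALUES (?)",
    [("foo",), ("baz",), ("bar",)],
)

# Load the would-be IN-list values into a temporary table.
conn.execute("CREATE TEMPORARY TABLE temp_table (v TEXT)")
conn.executemany("INSERT INTO temp_table VALUES (?)", [("foo",), ("bar",)])

joined = conn.execute(
    "SELECT orig.id, orig.value FROM temp_table tmp "
    "JOIN orig_table orig ON tmp.v = orig.value ORDER BY orig.id"
).fetchall()
with_in = conn.execute(
    "SELECT id, value FROM orig_table WHERE value IN ('foo', 'bar') ORDER BY id"
).fetchall()

print(joined)
print(joined == with_in)
conn.execute("DROP TABLE temp_table")
```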

What is the most efficient MySQL query to find all entries that start with a number?

In a database that has over 1 million entries, occasionally we need to find all rows whose name column starts with a number.
This is what currently is being used, but it just seems like there may be a more efficient manner in doing this.
SELECT * FROM mytable WHERE name LIKE '0%' OR name LIKE '1%' OR name ...
etc...
Any suggestions?
select * from table where your_field regexp '^[0-9]'
Hey,
you should add an index with a length of 1 to the field in the db. The query will then be significantly faster.
ALTER TABLE `database`.`table` ADD INDEX `indexName` ( `column` ( 1 ) )
Felix
My guess is that the indexes on the table aren't being used efficiently (if at all)
Since this is a char field of some type, and if this is the primary query on this table, you could restructure your indexes (and my mysql knowledge is a bit short here, somebody help out) such that this table is ordered (clustered index in ms sql) by this field, thus you could say something like
select * from mytable where name >= char(48) and name < char(58)
Do some testing there, I'm not 100% on the details of how mysql would rank those characters, but that should get you going.
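How the boundary characters behave can be checked outside the database. The sketch below assumes plain code-point ordering (Python's string comparison); a MySQL collation may rank punctuation differently, so treat it only as a way to reason about the bounds. It compares a half-open range from "0" up to ":" (the character after "9") with strict bounds at chr(47) and chr(57). The sample names are made up.

```python
names = ["/tmp", "0day", "42nd Street", "9 Lives", "9", ":colon", "Alpha"]

def in_digit_range(name):
    # "0" is chr(48) and ":" is chr(58), the character right after "9".
    return "0" <= name < ":"

def strict_between(name):
    # Strict bounds at "/" (chr(47)) and "9" (chr(57)), for comparison.
    return chr(47) < name < chr(57)

print([n for n in names if in_digit_range(n)])
print([n for n in names if strict_between(n)])
```

With code-point ordering, the strict bounds let "/tmp" through and drop everything that starts with "9", while the half-open range keeps exactly the names that begin with a digit.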
Another option is to have a new column that gives you a true/false on "starts_with_number". You could setup a trigger to populate that column. This might give the best and most predictable results.
If you're not actually using each and every field in the rows returned, and you really want to wring every drop of efficiency out of this query, then don't use select *, but instead specify only the fields you want to process.
I'm thinking...
SELECT * FROM myTable WHERE LEFT(name, 1) BETWEEN '0' AND '9'

How to speed up SELECT .. LIKE queries in MySQL on multiple columns?

I have a MySQL table for which I do very frequent SELECT x, y, z FROM table WHERE x LIKE '%text%' OR y LIKE '%text%' OR z LIKE '%text%' queries. Would any kind of index help speed things up?
There are a few million records in the table. If there is anything that would speed up the search, would it seriously impact disk usage by the database files and the speed of INSERT and DELETE statements? (no UPDATE is ever performed)
Update: Quickly after posting, I have seen a lot of information and discussion about the way LIKE is used in the query; I would like to point out that the solution must use LIKE '%text%' (that is, the text I am looking for is prepended and appended with a % wildcard). The database also has to be local, for many reasons, including security.
An index wouldn't speed up the query, because for textual columns indexes work by indexing N characters starting from left. When you do LIKE '%text%' it can't use the index because there can be a variable number of characters before text.
What you should be doing is not use a query like that at all. Instead you should use something like FTS (Full Text Search) that MySQL supports for MyISAM tables. It's also pretty easy to make such indexing system yourself for non-MyISAM tables, you just need a separate index table where you store words and their relevant IDs in the actual table.
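As a rough picture of the "separate index table" idea, here is a minimal in-memory sketch in Python. The sample rows, the whitespace tokenization, and the function names are assumptions for illustration; a real version would live in a database table with an index on the word column.

```python
from collections import defaultdict

# Hypothetical rows from the main table: id -> searchable text.
rows = {
    500: "name1 name2 keyword1 keyword2",
    501: "name3 keyword2",
}

def build_word_index(table):
    """Map each lowercased word to the set of row IDs that contain it."""
    index = defaultdict(set)
    for row_id, text in table.items():
        for word in text.lower().split():
            index[word].add(row_id)
    return index

word_index = build_word_index(rows)

def search(word):
    """Whole-word lookup; no scan over the original rows is needed."""
    return sorted(word_index.get(word.lower(), set()))

print(search("keyword2"))
print(search("name1"))
```

Like a fulltext index, this only answers whole-word lookups; it does not find "word" inside "keyword2".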
Update
Full text search is available for InnoDB tables with MySQL 5.6+.
An index won't help text matching with a leading wildcard. An index can be used for:
LIKE 'text%'
But I'm guessing that won't cut it. For this type of query you really should be looking at a full text search provider if you want to scale the amount of records you can search across. My preferred provider is Sphinx, very full featured/fast etc. Lucene might also be worth a look. A fulltext index on a MyISAM table will also work, but ultimately pursuing MyISAM for any database that has a significant amount of writes isn't a good idea.
An index can not be used to speed up queries where the search criteria starts with a wildcard:
LIKE '%text%'
An index can be (and might be, depending on selectivity) used for search terms of the form:
LIKE 'text%'
Add a Full Text Index and Use MATCH() AGAINST().
Normal indexes will not help you with like queries, especially those that utilize wildcards on both sides of the search term.
What you can do is add a full text index on the columns that you're interested in searching and then use a MATCH() AGAINST() query to search those full text indexes.
Add a full text index on the columns that you need:
ALTER TABLE table ADD FULLTEXT INDEX index_table_on_x_y_z (x, y, z);
Then query those columns:
SELECT * FROM table WHERE MATCH(x,y,z) AGAINST("text")
From our trials, we found these queries to take around 1ms in a table with over 1 million records. Not bad, especially compared to the equivalent wildcard LIKE %text% query which takes 16,400ms.
Benchmarks
MATCH(x,y,z) AGAINST("text") takes 1ms
LIKE %text% takes 16400ms
16400x faster!
I would add that in some cases you can speed up the query using an index together with like/rlike if the field you are looking at is often empty or contains something constant.
In that case it seems that you can limit the rows which are visited using the index by adding an "and" clause with the fixed value.
I tried this for searching 'tags' in a huge table which usually does not contain a lot of tags.
SELECT * FROM objects WHERE tags RLIKE "(^|,)tag(,|$)" AND tags != ''
If you have an index on tags you will see that it is used to limit the rows which are being searched.
Maybe you can try upgrading from MySQL 5.1 to MySQL 5.7.
I have about 70,000 records. And run following SQL:
select * from comics where name like '%test%';
It takes 2000ms in mysql5.1.
And it takes 200ms in mysql5.7 or mysql5.6.
Another way:
You can maintain calculated columns with those strings REVERSEd and use
SELECT x, y, z FROM table WHERE x LIKE 'text%' OR y LIKE 'text%' OR z LIKE 'text%' OR xRev LIKE 'txet%' OR yRev LIKE 'txet%' OR zRev LIKE 'txet%'
Example of how to ADD a stored persisted column
ALTER TABLE table ADD COLUMN xRev VARCHAR(N) GENERATED ALWAYS AS (REVERSE(x)) STORED;
and then create indexes on xRev, yRev, etc.
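Why the reversed column helps can be seen in a short Python sketch: a suffix search on x becomes a prefix range scan on the reversed values. The sorted list is a stand-in for an index on xRev, and the sample values are hypothetical. Note that this covers values that start or end with the text; a value with the text strictly in the middle is still not reachable through either index.

```python
from bisect import bisect_left

# Hypothetical column values; x_rev mimics an indexed column holding REVERSE(x).
values = ["context", "pretext", "texture", "plain"]
x_rev = sorted(v[::-1] for v in values)

def ends_with(suffix):
    """Suffix search on x done as a prefix range scan on the reversed values."""
    prefix = suffix[::-1]
    lo = bisect_left(x_rev, prefix)
    # Assumes no value contains "\uffff", so this bounds the prefix range.
    hi = bisect_left(x_rev, prefix + "\uffff")
    return sorted(r[::-1] for r in x_rev[lo:hi])

print(ends_with("text"))
```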
Another alternative to avoid full table scans is selecting substrings and checking them in the having statement:
SELECT
al3.article_number,
SUBSTR(al3.article_number, 2, 3) AS art_nr_substr,
SUBSTR(al3.article_number, 1, 3) AS art_nr_substr2,
al1.*
FROM
t1 al1
INNER JOIN t2 al2 ON al2.t1_id = al1.id
INNER JOIN t3 al3 ON al3.id = al2.t3_id
WHERE
al1.created_at > '2018-05-29'
HAVING
(art_nr_substr = "FLA" OR art_nr_substr = 'VKV' OR art_nr_substr2 = 'PBR');
When you optimize a SELECT foo FROM bar WHERE baz LIKE 'ZOT%' query, you want the index length to at least match the number of characters in the request.
Here is a real life example from just now:
Here is the query:
EXPLAIN SELECT COUNT(*) FROM client_detail cd
JOIN client_account ca ON cd.client_acct_id = ca.client_acct_id
WHERE cd.first_name LIKE 'XX%' AND cd.last_name_index LIKE 'YY%';
With no index:
+-------+
| rows |
+-------+
| 13994 |
| 1 |
+-------+
So first try a 4x4 index:
CREATE INDEX idx_last_first_4x4 on client_detail(last_name_index(4), first_name(4));
+------+
| rows |
+------+
| 7035 |
| 1 |
+------+
A bit better, but COUNT(*) shows there are only 102 results. So let's now add a 2x2 index:
CREATE INDEX idx_last_first_2x2 on client_detail(last_name_index(2), first_name(2));
yields:
+------+
| rows |
+------+
| 102 |
| 1 |
+------+
Both indexes are still in place at this point, and MySQL chose the latter index for this query. However, it will still choose the 4x4 index if that is more efficient.
Index ordering may be useful, try the 2x2 before the 4x4 or vice-versa to see how it performs for your environment. To re-order an index you have to drop and re-create the earlier one.