MySQL search text in best performance - mysql

Is there any way to search for specific text in MySQL without using full-text search?
I know LIKE is a solution, but a wildcard at the beginning of the pattern prevents MySQL from using an index, so it does not perform well on large data sets.

Please specify more details of your use case.
Meanwhile, I have found this to be beneficial in some use cases. For example, suppose you wanted to search for a bracketed word:
WHERE MATCH(col) AGAINST('+word' IN BOOLEAN MODE)
AND col LIKE '%[word]%'
The MATCH would rapidly find the few rows with "word", then the LIKE would slowly check those few rows. It gives reasonably fast overall speed while checking for some types of non-words.
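As a rough sketch of how an application might assemble that two-stage filter, the Python below builds a parameterized query. The function names, the `docs` table, and the `%s` placeholder style (as used by common MySQL drivers for Python) are assumptions for illustration, and the table and column names must come from trusted code, never from user input.

```python
def escape_like(text, escape_char="\\"):
    """Escape LIKE wildcards so the text is matched literally."""
    # Escape the escape character first, then the two LIKE wildcards.
    for ch in (escape_char, "%", "_"):
        text = text.replace(ch, escape_char + ch)
    return text


def build_prefilter_query(table, column, word, literal):
    """Return (sql, params): MATCH narrows by word, LIKE verifies the literal."""
    sql = (
        f"SELECT * FROM {table} "
        f"WHERE MATCH({column}) AGAINST (%s IN BOOLEAN MODE) "
        f"AND {column} LIKE %s"
    )
    params = ("+" + word, "%" + escape_like(literal) + "%")
    return sql, params


# Hypothetical table and column names, mirroring the example above.
sql, params = build_prefilter_query("docs", "col", "word", "[word]")
```

The values travel as bound parameters, so only the identifiers are interpolated into the SQL text.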

Related

MySQL - Querying against a non-indexable field

The webpage in question is https://www.christart.com/poetry/
I have a MySQL table with a little over 7,000 records of poem entries. I'm getting requests from my users to be able to run queries against the body of the poems, but the bodies are saved in a 'text' column.
I know how to write the SQL statement. That's easy enough. My concern is the load on the database. I always index columns that are queried or joined on, but I can't index a 'text' column.
There must be a way. How should I approach this?
You could use a full text index:
CREATE FULLTEXT INDEX poem_contents ON poems(body);
And then search using match:
SELECT *
FROM poems
WHERE MATCH(body) AGAINST ('some phrase' IN BOOLEAN MODE)
There's no reason that you can't index a text field. That being said, there's probably very little value in indexing a text field that's containing entire poems.
If your database only has 7,000 rows, you probably won't see a massive performance hit unless you scale much larger than it currently is. For a larger scale, a better solution would probably be to extract keywords from the body and search on those.
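A minimal sketch of the keyword-extraction idea, assuming the keywords would be stored in a separate, indexed table keyed by poem id. The stopword set and the minimum length are illustrative choices, not values from the original answer.

```python
import re

# Illustrative stopword set; a real list would be much longer.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "it"}


def extract_keywords(body, min_length=3):
    """Return unique lowercase keywords in order of first appearance."""
    seen = set()
    keywords = []
    for word in re.findall(r"[a-z]+", body.lower()):
        if len(word) < min_length or word in STOPWORDS or word in seen:
            continue
        seen.add(word)
        keywords.append(word)
    return keywords


keywords = extract_keywords("The rose is red, the violet's blue")
```

Each keyword could then be inserted as its own row, which lets an ordinary index answer exact-word lookups.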
I think you should explore Apache Lucene or a similar project that provides full-text search. Alternatively, you could look at MongoDB instead of MySQL; it has a number of index types. There are also Solr and Elasticsearch, which use Lucene under the hood.
I assume the poem body will be stored in a varchar column. I don't know whether indexing is possible on varchar, and I don't think it is wise to index the entire poem body. Something like Lucene/Solr provides a better option.
Please note, I am not related to any of the product mentioned above.

To keep periods in acronyms or not in a database?

Acronyms are a pain in my database, especially when doing a search. I haven't decided if I should accept periods during search queries. These are the problems I face when searching:
'IRQ' will not find 'I.R.Q.'
'I.R.Q' will not find 'IRQ'
'IRQ.' or 'IR.Q' will not find 'IRQ' or 'I.R.Q.'
etc...
The same problem goes for ellipses (...) or three series of periods.
I just need to know what directions should I take with this issue:
Is it better to remove all periods when inserting the string to the database?
If so what regex can I use to identify periods (instead of ellipses or three series of periods) to identify what needs to be removed?
If it is possible to keep the periods in acronyms, how can it be scripted in a query to find 'I.R.Q' if I input 'IRQ' in the search field, through MySQL using regex or maybe a MySQL function I don't know about?
My responses for each question:
Is it better to remove all periods when inserting the string to the database?
Yes and no. You want the database to have the original text. If you want, create a separate field that is "cleaned up" to search against. Here, you can remove periods, make everything lowercase, etc.
If so what regex can I use to identify periods (instead of ellipses or three series of periods) to identify what needs to be removed?
/\.+/
That finds one or more periods in a given spot. But you'll want to integrate it with your search formula.
Note: regex on a database isn't known to have high performance. Be cautious with this.
Other note: you may want to use full-text search in MySQL. This also isn't known for high performance on data sets of more than about 1,000 entries. If you have big data and need full-text search, use Sphinx (available as a MySQL plug-in and RAM-based indexing system).
If it is possible to keep the periods in acronyms, how can it be scripted in a query to find 'I.R.Q' if I input 'IRQ' in the search field, through MySQL using regex or maybe a MySQL function I don't know about?
Yes, by having the 2 fields I described in the first bullet's answer.
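The two-field approach described above can be sketched as follows: the original text is stored untouched, and a normalized copy is stored next to it for searching. The function name and the exact cleanup rules (lowercase, drop periods, collapse whitespace) are assumptions for illustration. The same function has to be applied to the user's search input.

```python
import re


def normalize_for_search(text):
    """Lowercase, drop periods, and collapse runs of whitespace."""
    text = text.lower().replace(".", "")
    return re.sub(r"\s+", " ", text).strip()


# Stored value and search input normalize to the same string.
stored = normalize_for_search("I.R.Q.")
query = normalize_for_search("IRQ")
```

Because ellipses are made of periods, they disappear in the normalized copy as well, while the original column keeps them.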
You need to consider the sanctity of your input. If it is not yours to alter then don't alter it. Instead you should have a separate system to allow for text searching, and that can alter the text as it sees fit to be able to handle these types of issues.
Have a read up on Lucene, and specifically Lucene's standard analyzer, to see the types of changes that are commonly carried out to allow successful searching of complex text.
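I think you can use MySQL's REGEXP operator to match an acronym with optional periods. MySQL patterns take no delimiters, and the backslash must be doubled inside a string literal:
SELECT col1, col2...coln FROM yourTable WHERE colWithAcronym REGEXP 'I\\.?R\\.?Q\\.?'
If you use PHP you can build the pattern with this simple loop:
$result = "";
// str_split() is needed because foreach cannot iterate over a string directly
foreach (str_split($yourAcronym) as $char) {
    $result .= $char . "\\.?"; // each letter followed by an optional period
}
// $result now holds e.g. I\.?R\.?Q\.? ; pass it to REGEXP as a bound parameter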
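For comparison, here is the same pattern-building idea as a small Python sketch. The function name is hypothetical, and Python's `re` module is used only to check the pattern locally; MySQL's regex engine differs in details, although the constructs used here (a literal character followed by an optional escaped period) are common to both.

```python
import re


def acronym_pattern(acronym):
    """Build a pattern that allows an optional period after each letter."""
    # Ignore periods the user typed so "I.R.Q" and "IRQ" give the same pattern.
    letters = [ch for ch in acronym if ch != "."]
    return "".join(re.escape(ch) + r"\.?" for ch in letters)


pattern = acronym_pattern("IRQ")
```

The resulting string would be passed to `REGEXP` as a bound parameter rather than pasted into the SQL text.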
The functionality you are looking for is full-text search. At the time of the linked MySQL 5.0 documentation, MySQL supported this for MyISAM tables but not for InnoDB; InnoDB gained FULLTEXT indexes in MySQL 5.6. (http://dev.mysql.com/doc/refman/5.0/en/fulltext-search.html)
Alternatively, you could go for an external framework that provides that functionality. Lucene is a popular open-source one. (lucene.apache.org)
There are two methods:
1. Store the data with the symbols removed from the text and match against that.
2. Use a regex, for example:
select * from table where acronym regexp '^[A-Z]+[.]?[A-Z]+[.]?[A-Z]+[.]?$';
Please note, however, that this requires the acronym to be stored in uppercase. If you don't want the case to matter, just change [A-Z] to [A-Za-z].

How to optimize full text searches?

I know there are a lot of questions on this subject already, but I need more specific information. So here goes:
Ideally what should be the maximum length of characters upon which a full text search can be performed using minimal resources (CPU, memory)?
When should I decide between using the LIKE %$str% and full-text search?
Is it important to have both versions LIKE %$str% and full-text search implemented and use the optimal one dynamically?
As far as I know it depends on the number of words, not characters. The fewer, the faster mysql will be. But don't let that get in your way.
Never use LIKE if you can use a full-text search. Except maybe for queries that you would manually run once in a while and you don't want to slow down the INSERTs on that table.
You know the speed of select vs speed of insert tradeoff in indexes, right?
Always use FT (full-text) search in queries that you don't run manually. LIKE is slow and gets much slower as the number of rows increases, because the MySQL engine has to look at EVERY row to answer your query. FT keeps an index and knows exactly where to look.

MySQL Fulltext search but using LIKE

I've recently been doing some string searches on a table with about 50k strings in it, fairly large I'd say but not that big. I was doing some nested queries for a 'search within results' kind of thing, using LIKE to match a searched keyword.
I came across MySQL's full-text search and tried it, so I added a fulltext index to my str column. I'm aware that full-text searches don't work on derived tables or even on views, so queries with sub-selects will not fit. As mentioned, I was running nested queries, for example:
SELECT s2.id, s2.str
FROM
(
    SELECT s1.id, s1.str
    FROM
    (
        SELECT id, str
        FROM strings
        WHERE str LIKE '%term%'
    ) AS s1
    WHERE s1.str LIKE '%another_term%'
) AS s2
WHERE s2.str LIKE '%a_much_deeper_term%';
This is not actually applied to any code yet; I was just running some tests. Searching strings like this can easily be done with Sphinx (performance-wise), but let's assume Sphinx is not available and I want to know how well this works in a pure SQL query. Running this query on a table without a full-text index takes about 2.97 seconds (depending on the search term). However, running it on a table with a full-text index on the str column finished in about 104 ms, which is fast (I think?).
My question is simple: is it valid, or good practice at all, to use LIKE on a table with a full-text index, when normally we would use MATCH and AGAINST?
Thanks!
In this case you do not necessarily need subselects. You can simply use:
SELECT id, str
FROM strings
WHERE str LIKE '%term%'
AND str LIKE '%another_term%'
AND str LIKE '%a_much_deeper_term%'
... but this also raises a good question: the order in which you exclude the rows. I guess MySQL is smart enough to assume that the longest term will be the most restrictive, so by starting with a_much_deeper_term it will eliminate most of the records and then perform additional comparisons on only a few rows. By contrast, if you start with term you will probably end up with many candidate records, which you then have to compare against the rest of the terms.
The interesting part is that you can force the order in which the comparisons are made by using your original subselect example. This gives you the opportunity to decide which term is the most restrictive based on more than just the length, for example:
the ratio of consonants to vowels
the longest chain of consonants of the word
the most used vowel in the word
...etc. You can also apply some heuristics based on the type of textual information you are handling.
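The simplest of these heuristics, "longest term first", can be sketched as below. The helper names and the `%s` placeholder style are assumptions, the terms are not escaped for LIKE wildcards, and nothing here guarantees that MySQL evaluates the predicates in the written order; the sketch only shows how an application could rank the terms.

```python
def order_terms(terms):
    """Rank terms by assumed restrictiveness: longest first, ties keep input order."""
    return sorted(terms, key=len, reverse=True)


def build_and_like(table, column, terms):
    """Return (sql, params) with one LIKE predicate per term, longest term first."""
    ordered = order_terms(terms)
    where = " AND ".join(f"{column} LIKE %s" for _ in ordered)
    params = tuple(f"%{term}%" for term in ordered)
    return f"SELECT id, {column} FROM {table} WHERE {where}", params


sql, params = build_and_like(
    "strings", "str", ["term", "another_term", "a_much_deeper_term"]
)
```

A different ranking, such as one based on the consonant-to-vowel ratio, would only require swapping the `key` function in `order_terms`.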
Edit:
This is just a hunch, but it could be possible to apply the LIKE to the words in the fulltext index itself, and then match the rows against the index as if you had searched for full words.
I'm not sure if this is actually done, but it would be a smart thing for the MySQL people to pull off. Also note that this theory can only work if all possible occurrences are in fact in the fulltext index. For this you need that:
Your search pattern must be at least as long as the minimal word length. (If you are searching for %id%, for example, it can be part of a 3-letter word too, which is excluded from the FULLTEXT index by default.)
Your search pattern must not be a substring of any listed excluded word for example: and, of etc.
Your pattern must not contain any special characters.
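The three conditions above could be checked in application code before deciding whether a pattern is even a candidate for the index-based approach. This is a hypothetical pre-check, not something MySQL does: the minimum length of 4 assumes the MyISAM default for `ft_min_word_len`, the stopword set is a tiny illustrative subset, and "special characters" is interpreted conservatively as anything other than ASCII letters and digits.

```python
import re

MIN_WORD_LEN = 4  # assumption: MyISAM default ft_min_word_len
STOPWORDS = {"and", "of", "the", "for", "with"}  # illustrative subset only


def could_use_fulltext(pattern, min_len=MIN_WORD_LEN, stopwords=STOPWORDS):
    """Return True if a '%term%' pattern passes the three conditions listed above."""
    term = pattern.strip("%")
    # Condition 3: no special characters (conservative: letters and digits only).
    if not re.fullmatch(r"[A-Za-z0-9]+", term):
        return False
    # Condition 1: at least as long as the minimal indexed word length.
    if len(term) < min_len:
        return False
    # Condition 2: not a substring of any excluded word.
    lowered = term.lower()
    return not any(lowered in word for word in stopwords)
```

For example, `%id%` fails the length check, while `%term%` passes all three.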

MYSQL Fulltext search and LIKE

I am working with MySQL full text search but find it lacking in situations where your string is part of a word within a field. If my field is "New York Times" and I search for "Time" I get no results. The hackish way to solve this is to set up two queries, one that does a full text search and the other that does:
SELECT * FROM ___ WHERE field LIKE '%searchterm%'
Is there any way that I can set up my full text search to solve this issue so I don't have to run the extra query?
You can use wild cards in your full text search. More info here
SELECT * FROM _____ WHERE MATCH (title) AGAINST ('time*' IN BOOLEAN MODE);
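A small sketch of how user input might be turned into such a boolean-mode string. The function name is hypothetical, and prefixing each word with `+` (which requires every word to be present) is an added assumption; the query above uses a bare `time*`. A trailing `*` only matches words that begin with the term, so a search for "ime" would still not find "Times".

```python
import re


def to_boolean_prefix_query(user_input):
    """Turn free text into '+word*' terms, dropping boolean-mode operators."""
    # Keep only word characters so user-typed operators such as * or + are ignored.
    words = re.findall(r"\w+", user_input)
    return " ".join(f"+{word}*" for word in words)


against = to_boolean_prefix_query("new york time")
```

The result would be passed as the bound parameter inside `AGAINST (... IN BOOLEAN MODE)`.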
I've basically given up on MySQL's full-text search in favor of Sphinx, a dedicated full-text search engine that implements MySQL's network protocol, so you can "interpose" it between your clients and a MySQL server, losing nothing and gaining rich, serious full-text capabilities. Maybe it's not suitable for your needs (which you don't fully express), but I think it's at least worth checking out!