Realtime search against only 1 column in Mysql - without any plugins - mysql

However I found some threads about this, but nothing fits to my case.
I have a search field in my mobile app, where after text change, the real time search is running via calling my API.
The search request starts only if there are 3 or more characters entered and is searching ONLY in 1 DB column, called TITLE. So each time the user enters a letter, a query is searching for it.
Currently I have it like this (I know this solution is very bad). $searchedword is the word user entered:
if (!empty($searchedword)&&strlen($searchedword)>2 ) {$searchedword=strtolower($searchedword);
$sql = "SELECT * FROM TABLE ";$result = $mysqli->query($sql); $output='';
if ($result->num_rows > 0) {
while($data=$result->fetch_array()) {
$title=strtolower($data['title']);$content=$data['content'];
if (strpos($title,$searchedword) !== false ) {$output.=$title.','.$content;}
}}
So this just checks, if the title from DB contains the searched word. This works very well, but I think it is very bad according to performance, because each time the user enters a letter to the search field, each time all the data from the table are queried and looked for that word.
I want to recreate my code to meet the best performance.
So my first question is, should I add a FULLTEXT INDEX to the TITLE column in DB, will it help or will it just increase the disk space? As I am just searching against 1 column and in this column is just a title (1 or 2 words max).
And second question, what should be the best query for my case and of course with the best performance? As I need to search after each letter which user enters.
Can I use the search this way?
SELECT * FROM TABLE WHERE MATCH (title) AGAINST ('$searchedword' IN NATURAL LANGUAGE MODE)
However it seems, this will return only if the word completely matches the title, but returns nothing when the word is part of the title, so it is not a good solution.
The only solution which works is this:
SELECT * FROM TABLE WHERE title LIKE '%$searchedword%' "
but what about performance? And I don't understand how this works, because searchedword are converted to lowercase and I have removed the accents from that word, and the TITLE column in DB has accents and also Uppercase, but this search works very well!

If your title column has a collation like utfmb4_general_ci, you don't have to worry about dealing with upper case, lower case, and diacritical marks in your MySQL WHERE clauses. MySQL will do it for you. It is really good at handling character sets and collations in all kinds of languages. (Such things are very helpful to Swedish-language users, and the inventors of MySQL are Swedish.)
FULLTEXT with NATURAL LANGUAGE MODE is probably not the right approach for this application. It works on words, not chunks of letters. So it probably won't give you anything until your user has typed a whole word, and not a stop word. And, it is a little squirrely when you search a table with only a few rows. So, that might be a problem if you're just getting started.
It does order the results by the closeness of the match, so the most likely hit is the first one. So, if you know you have a phrase to search, it's good.
For your progressive-search application you may want to use one of these two LIKE queries.
SELECT title FROM tbl WHERE title LIKE CONCAT('$searchedword', '%') /*insecure*/
or this one which is much slower but finds your partial match anywhere in the title, not just at the beginning.
SELECT title FROM tbl WHERE title LIKE CONCAT('%', '$searchedword', '%') /*insecure*/
Avoid running these queries until you have gathered at least a few letters from your user, otherwise you'll get absurdly many results.
In these cases say SELECT title not SELECT *, and create an ordinary index on the title column. That way MySQL can satisfy the whole query from the index, which will make it much faster.
And, use MySQL's WHERE functionality to do the matching. Don't fetch the whole table from MySQL and search it in your php program.
And, use prepared statements. Because cybercreeps.

Related

Get mySQL full text match score for strings not in the table (optimally in a mixed result set with matches from the table)?

This must be a niche scenario since I have not been able to find a similar question around and in my brief testing in my SQL workbench just using the string in place of the column name did not work.
eg:
SELECT MATCH ('fork') AGAINST ('user entered text about forks' IN NATURAL LANGUAGE MODE);
Doesn't work...
I have a query that returns matches on a full text index with the relevance score as one of the columns returned. In this app, I am looking for "search suggestions" in a suggestions table that is built off the websites search index content. The user side also stores everything they search for in their local browser storage.
Currently, I have front end code that uses regex to pull matches from their local storage search history (up to 5) and then sends what they typed (as they type) to the back end to get the best matches from the suggestions table.
The way it works now, is the (up to 5) history matches are shown first, then the rest are filled in up to 10 total matches from the back end. What I would prefer, is that I send the history matches to the back end and include them in the FT match query in some way so that the result set contains all matched suggestions from the table + the history matches sent from the front end, but all sorted by the full text match relevance score to get them all in order of relevance. The new way may result in no history matches showing or it might result in more than 5 history matches showing, it would all boil down the releveance score.
Is something like this possible? The only other way I could image doing this is somehow creating a temporary table with a full text index, on the fly, and then joining that table in my current query, then removing the temp table when its done. The problem with that, in my mind, is that this is all happening in real time as the user types so I don't want to add something like that if its going to bog down the response time. Is there a fast/optimal way of doing this? Is there a way that would also remove the temporary table when the query ends?
Or is there some other command that can just give me a score based on string value against what the user typed in like what I tried above?
EDIT:
It looks like my temporary table idea could work:
https://dev.mysql.com/doc/refman/8.0/en/create-temporary-table.html
I'll just have to see what kind of perforamce impact this has. Im still interested to hear thoughts on if this is the best / only way or if there is a better one.
The CREATE TEMPORARY TABLE route was the way to go here. I tested it out and its working.
Worthy of note to future travelers. I had to switch my main table from innodb to myisam for this to work. I was able to mix/match the myisam temp table with the innodb main table, but the scoring algorithms are different so the innodb matches were taking priority due to higher scores. This was not an issue for me as I really did not need / use transactions for the primary suggestions table so I just made them both MyISAM engines.
Another item of note, is that I had to switch to splitting the user's query into "words" and ecapsulating them in "*" and running the match as a boolean search instead of natural language becausae in the case of the temp table, a user would likely have entered similar searches which would mean most of the words were in more than 50% of the rows so no matches were returning. Boolean search works around this. Again, not a big deal for my particular use case.
Had I needed to stay in innodb for this, it would have been a problem because from what I can tell, there is no way to set a full text index on an innodb temporary table.

mysql database field type for search query

I tried searching in different terms & got some answers too but they were not matching to my requirements. like This Link
I am using a sql statement something like below to fetch matching results from MySQL table.
SELECT statements... WHERE keyword_title_field REGEXP 'abc|axy|91store';
My questions is:
What data type (e.g. varchar, text etc) should i choose for keyword_title_field field in MySQL table to fetch results quickly without putting much load on table/server.
My current data type is Text due to unknown character length supply by user. Is this best suited or should i change?
Though it's not mandatory but any reference reading along with answer would be great for my understanding.
Here are some things to consider:
When you use any field in conditions (like REGEXP, LIKE or even '=') it is importand that you put an INDEX on the field. This will make MySQL not search every record 1 by 1, but find it via its INDEX instead. So make sure to look into that -> https://www.tutorialspoint.com/mysql/mysql-indexes.htm
The less characters allowed in your field, the smaller the INDEX is. You however have variable lengths to consider, so a TEXT is fine. If you know the maximum length and it's less than 256 characters, use a VARCHAR. Just make sure to index the field.
Note that REGEXP is relatively slow. LIKE '%term%' would be prefered, but that of course depends on your needs. If it's just 'abc' OR 'axy' OR '91store', you could consider this query: SELECT statements... WHERE keyword_title_field IN ('abc', 'axy', '91store');

How do I make data retrieval faster? [duplicate]

I am building a search feature for the messages part of my site, and have a messages database with a little over 9,000,000 rows, and and index on the sender, subject, and message fields. I was hoping to use the LIKE mysql clause in my query, such as (ex)
SELECT sender, subject, message FROM Messages WHERE message LIKE '%EXAMPLE_QUERY%';
to retrieve results. unfortunately, MySQL doesn't use indexes when a leading wildcard is present , and this is necessary for the search query could appear anywhere in the message (this is how the wildcards work, no?). Queries are very very slow and I cannot use a full text index either, because of the annoying 50% rule (I just can't afford to rule that much out). Is there anyway (or even, any alternative to this) to optimize a query using like and two wildcards? Any help is appreciated.
You should either use full-text indexes (you said you can't), design a full-text search by yourself or offload the search from MySQL and use Sphinx/Lucene. For Lucene you can use Zend_Search_Lucene implementation from Zend Framework or use Solr.
Normal indexes in MySQL are B+Trees, and they can't be used if the starting of the string is not known (and this is the case when you have wildcard in the beginning)
Another option is to implement search on your own, using reference table. Split text in words and create table that contains word, record_id. Then in the search you split the query in words and search for each of the words in the reference table. In this way you are not limitting yourself to the beginning of the whole text, but only to the beginning of the given word (and you'll match the rest of the words anyway)
'%EXAMPLE_QUERY%'; is a very very bad idea .. am going to give you some
A. Avoid wildcards at the start of LIKE queries use 'EXAMPLE_QUERY%'; instead
B. Create Keywords where you can easily use MATCH
If you want to stick with using MySQL, you should use FULL TEXT indexes. Full text indexes index words in a text block. You can then search on word stems and return the results in order of relevance. So you can find the word "example" within a block of text, but you still can't search efficiently on "xampl" to find "example".
MySQL's full text search is not great, but it is functional.
http://dev.mysql.com/doc/refman/5.1/en/fulltext-search.html
select * from emp where ename like '%e';
gives emp_name that ends with letter e.
select * from emp where ename like 'A%';
gives emp_name that begins with letter a.
select * from emp where ename like '_a%';
gives emp_name in which second letter is a.

MySQL Fulltext search but using LIKE

I'm recently doing some string searches from a table with about 50k strings in it, fairly large I'd say but not that big. I was doing some nested queries for a 'search within results' kinda thing. I was using LIKE statement to get a match of a searched keyword.
I came across MySQL's Full-Text search which I tried so I added a fulltext index to my str column. I'm aware that Full-text searches doesn't work on virtually created tables or even with Views so queries with sub-selects will not fit. I mentioned I was doing a nested queries, example is:
SELECT s2.id, s2.str
FROM
(
SELECT s1.id, s1.str
FROM
(
SELECT id, str
FROM strings
WHERE str LIKE '%term%'
) AS s1
WHERE s1.str LIKE '%another_term%'
) AS s2
WHERE s2.str LIKE '%a_much_deeper_term%';
This is actually not applied to any code yet, I was just doing some tests. Also, searching strings like this can be easily achieved by using Sphinx (performance wise) but let's consider Sphinx not being available and I want to know how this will work well in pure SQL query. Running this query on a table without Full-text added takes about 2.97 secs. (depends on the search term). However, running this query on a table with Full-text added to the str column finished in like 104ms which is fast (i think?).
My question is simple, is it valid to use LIKE or is it a good practice to use it at all in a table with Full-text added when normally we would use MATCH and AGAINST statements?
Thanks!
In this case you not neccessarily need subselects. You can siply use:
SELECT id, str
FROM item_strings
WHERE str LIKE '%term%'
AND str LIKE '%another_term%'
AND str LIKE '%a_much_deeper_term%'
... but also raises a good question: the order in which you are excluding the rows. I guess MySQL is smart enough to assume that the longest term will be the most restrictive, so starting with a_much_deeper_term it will eliminate most of the records then perform addtitional comparsion only on a few rows. - Contrary to this, if you start with term you will probably end up with many possible records then you have to compare them against the st of the terms.
The interesting part is that you can force the order in which the comparsion is made by using your original subselect example. This gives the opportunity to make a decision which term is the most restrictive based upon more han just the length, but for example:
the ratio of consonants a vowels
the longest chain of consonants of the word
the most used vowel in the word
...etc. You can also apply some heuristics based on the type of textual infomation you are handling.
Edit:
This is just a hunch but it could be possible to apply the LIKE to the words in the fulltext indexitself. Then match the rows against the index as if you have serched for full words.
I'm not sure if this is actually done, but it would be a smart thing to pull off by the MySQL people. Also note that this theory can only be used if all possible ocurrences arein fact in the fulltext search. For this you need that:
Your search pattern must be at least the size of the miimal word-length. (If you re searching for example %id% then it can be a part of a 3 letter word too, which is excluded by default form FULLTEXT index).
Your search pattern must not be a substring of any listed excluded word for example: and, of etc.
Your pattern must not contain any special characters.

How to search for rows containing a substring?

If I store an HTML TEXTAREA in my ODBC database each time the user submits a form, what's the SELECT statement to retrieve 1) all rows which contain a given sub-string 2) all rows which don't (and is the search case sensitive?)
Edit: if LIKE "%SUBSTRING%" is going to be slow, would it be better to get everything & sort it out in PHP?
Well, you can always try WHERE textcolumn LIKE "%SUBSTRING%" - but this is guaranteed to be pretty slow, as your query can't do an index match because you are looking for characters on the left side.
It depends on the field type - a textarea usually won't be saved as VARCHAR, but rather as (a kind of) TEXT field, so you can use the MATCH AGAINST operator.
To get the columns that don't match, simply put a NOT in front of the like: WHERE textcolumn NOT LIKE "%SUBSTRING%".
Whether the search is case-sensitive or not depends on how you stock the data, especially what COLLATION you use. By default, the search will be case-insensitive.
Updated answer to reflect question update:
I say that doing a WHERE field LIKE "%value%" is slower than WHERE field LIKE "value%" if the column field has an index, but this is still considerably faster than getting all values and having your application filter. Both scenario's:
1/ If you do SELECT field FROM table WHERE field LIKE "%value%", MySQL will scan the entire table, and only send the fields containing "value".
2/ If you do SELECT field FROM table and then have your application (in your case PHP) filter only the rows with "value" in it, MySQL will also scan the entire table, but send all the fields to PHP, which then has to do additional work. This is much slower than case #1.
Solution: Please do use the WHERE clause, and use EXPLAIN to see the performance.
Info on MySQL's full text search. This is restricted to MyISAM tables, so may not be suitable if you wantto use a different table type.
http://dev.mysql.com/doc/refman/5.0/en/fulltext-search.html
Even if WHERE textcolumn LIKE "%SUBSTRING%" is going to be slow, I think it is probably better to let the Database handle it rather than have PHP handle it. If it is possible to restrict searches by some other criteria (date range, user, etc) then you may find the substring search is OK (ish).
If you are searching for whole words, you could pull out all the individual words into a separate table and use that to restrict the substring search. (So when searching for "my search string" you look for the the longest word "search" only do the substring search on records containing the word "search")
I simply use SELECT ColumnName1, ColumnName2,.....WHERE LOCATE(subtr, ColumnNameX)<>0
To get rows with ColumnNameX having the substring.
Replace <> with = to get rows NOT having the substring.