How to search for rows containing a substring? - mysql

If I store an HTML TEXTAREA in my ODBC database each time the user submits a form, what's the SELECT statement to retrieve 1) all rows which contain a given sub-string 2) all rows which don't (and is the search case sensitive?)
Edit: if LIKE "%SUBSTRING%" is going to be slow, would it be better to get everything & sort it out in PHP?

Well, you can always try WHERE textcolumn LIKE "%SUBSTRING%" - but this is likely to be slow, because the leading wildcard prevents the query from using an index on the column.
It depends on the field type - a textarea usually won't be saved as VARCHAR, but rather as (a kind of) TEXT field. If the column has a FULLTEXT index, you can use the MATCH ... AGAINST operator instead.
To get the rows that don't match, simply put a NOT in front of the LIKE: WHERE textcolumn NOT LIKE "%SUBSTRING%".
Whether the search is case-sensitive or not depends on how you store the data, especially what COLLATION you use. By default, the search will be case-insensitive.
Updated answer to reflect question update:
As I said, WHERE field LIKE "%value%" is slower than WHERE field LIKE "value%" if the column field has an index, but it is still considerably faster than fetching all values and having your application filter them. The two scenarios:
1/ If you do SELECT field FROM table WHERE field LIKE "%value%", MySQL will scan the entire table, and only send the fields containing "value".
2/ If you do SELECT field FROM table and then have your application (in your case PHP) filter only the rows with "value" in it, MySQL will also scan the entire table, but send all the fields to PHP, which then has to do additional work. This is much slower than case #1.
Solution: Please do use the WHERE clause, and use EXPLAIN to see the performance.
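To make the LIKE / NOT LIKE pair concrete, here is a minimal sketch. It uses Python's built-in sqlite3 module purely as a stand-in so the example is self-contained; the table name submissions and column body are made up for illustration. SQLite's LIKE happens to be case-insensitive for ASCII letters by default, which roughly mirrors a case-insensitive MySQL collation, but the exact behaviour on your MySQL server depends on the column's collation.

```python
import sqlite3

def split_by_substring(conn, needle):
    """Return (matching, non_matching) bodies, with the filtering done in SQL."""
    # The value is passed as a bound parameter, never pasted into the SQL text.
    # Note: % or _ inside `needle` would still act as LIKE wildcards here.
    pattern = f"%{needle}%"
    matching = [row[0] for row in conn.execute(
        "SELECT body FROM submissions WHERE body LIKE ? ORDER BY id",
        (pattern,))]
    non_matching = [row[0] for row in conn.execute(
        "SELECT body FROM submissions WHERE body NOT LIKE ? ORDER BY id",
        (pattern,))]
    return matching, non_matching

# Hypothetical sample data standing in for stored textarea submissions.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE submissions (id INTEGER PRIMARY KEY, body TEXT)")
conn.executemany(
    "INSERT INTO submissions (body) VALUES (?)",
    [("Hello World",), ("goodbye",), ("say HELLO again",)])

matching, non_matching = split_by_substring(conn, "hello")
print(matching)
print(non_matching)
```

Only the rows that pass the WHERE clause cross the database boundary, which is the point of scenario 1 above.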

Info on MySQL's full text search. In the MySQL version documented here (5.0) this is restricted to MyISAM tables (InnoDB gained FULLTEXT support in MySQL 5.6), so it may not be suitable if you want to use a different table type.
http://dev.mysql.com/doc/refman/5.0/en/fulltext-search.html
Even if WHERE textcolumn LIKE "%SUBSTRING%" is going to be slow, I think it is probably better to let the Database handle it rather than have PHP handle it. If it is possible to restrict searches by some other criteria (date range, user, etc) then you may find the substring search is OK (ish).
If you are searching for whole words, you could pull out all the individual words into a separate table and use that to restrict the substring search. (So when searching for "my search string", you look up the longest word, "search", and only do the substring search on records containing the word "search".)

I simply use SELECT ColumnName1, ColumnName2,..... WHERE LOCATE(substr, ColumnNameX)<>0
To get rows with ColumnNameX having the substring.
Replace <> with = to get rows NOT having the substring.

Related

Realtime search against only 1 column in Mysql - without any plugins

I found some threads about this, but nothing fits my case.
I have a search field in my mobile app, where after text change, the real time search is running via calling my API.
The search request starts only if there are 3 or more characters entered and is searching ONLY in 1 DB column, called TITLE. So each time the user enters a letter, a query is searching for it.
Currently I have it like this (I know this solution is very bad). $searchedword is the word the user entered:
if (!empty($searchedword) && strlen($searchedword) > 2) {
    $searchedword = strtolower($searchedword);
    $sql = "SELECT * FROM TABLE";
    $result = $mysqli->query($sql);
    $output = '';
    if ($result->num_rows > 0) {
        while ($data = $result->fetch_array()) {
            $title = strtolower($data['title']);
            $content = $data['content'];
            if (strpos($title, $searchedword) !== false) { $output .= $title . ',' . $content; }
        }
    }
}
So this just checks whether the title from the DB contains the searched word. This works very well, but I think it is very bad for performance, because each time the user enters a letter in the search field, all the data from the table is fetched and scanned for that word.
I want to recreate my code to meet the best performance.
So my first question is, should I add a FULLTEXT INDEX to the TITLE column in DB, will it help or will it just increase the disk space? As I am just searching against 1 column and in this column is just a title (1 or 2 words max).
And second question, what should be the best query for my case and of course with the best performance? As I need to search after each letter which user enters.
Can I use the search this way?
SELECT * FROM TABLE WHERE MATCH (title) AGAINST ('$searchedword' IN NATURAL LANGUAGE MODE)
However it seems, this will return only if the word completely matches the title, but returns nothing when the word is part of the title, so it is not a good solution.
The only solution which works is this:
SELECT * FROM TABLE WHERE title LIKE '%$searchedword%'
but what about performance? And I don't understand how this works, because $searchedword is converted to lowercase and I have removed the accents from it, while the TITLE column in the DB has accents and uppercase letters, yet this search works very well!
If your title column has a collation like utf8mb4_general_ci, you don't have to worry about dealing with upper case, lower case, and diacritical marks in your MySQL WHERE clauses. MySQL will do it for you. It is really good at handling character sets and collations in all kinds of languages. (Such things are very helpful to Swedish-language users, and the inventors of MySQL are Swedish.)
FULLTEXT with NATURAL LANGUAGE MODE is probably not the right approach for this application. It works on words, not chunks of letters. So it probably won't give you anything until your user has typed a whole word, and not a stop word. And, it is a little squirrely when you search a table with only a few rows. So, that might be a problem if you're just getting started.
It does order the results by the closeness of the match, so the most likely hit is the first one. So, if you know you have a phrase to search, it's good.
For your progressive-search application you may want to use one of these two LIKE queries.
SELECT title FROM tbl WHERE title LIKE CONCAT('$searchedword', '%') /*insecure*/
or this one which is much slower but finds your partial match anywhere in the title, not just at the beginning.
SELECT title FROM tbl WHERE title LIKE CONCAT('%', '$searchedword', '%') /*insecure*/
Avoid running these queries until you have gathered at least a few letters from your user, otherwise you'll get absurdly many results.
In these cases say SELECT title not SELECT *, and create an ordinary index on the title column. That way MySQL can satisfy the whole query from the index, which will make it much faster.
And, use MySQL's WHERE functionality to do the matching. Don't fetch the whole table from MySQL and search it in your php program.
And, use prepared statements. Because cybercreeps.
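The advice above (wait for a few letters, match in the WHERE clause, bind the value instead of interpolating it) can be sketched as follows. This is an illustration only: it uses Python's built-in sqlite3 module as a stand-in for MySQL so it runs on its own, the helper names like_pattern and search_titles are made up, and the sample titles are invented. One extra detail it shows is escaping % and _ typed by the user, so they are matched literally rather than acting as wildcards. In MySQL the backslash is already the default LIKE escape character, so the explicit ESCAPE clause is there mainly for the SQLite stand-in.

```python
import sqlite3

def like_pattern(user_text, anywhere=False):
    """Escape LIKE wildcards in user input, then add our own % wildcard(s)."""
    escaped = (user_text.replace("\\", "\\\\")
                        .replace("%", "\\%")
                        .replace("_", "\\_"))
    return f"%{escaped}%" if anywhere else f"{escaped}%"

def search_titles(conn, user_text, anywhere=False, min_chars=3):
    """Prefix search by default; anywhere=True gives the slower substring search."""
    if len(user_text) < min_chars:
        return []  # too few letters typed: skip the query entirely
    rows = conn.execute(
        "SELECT title FROM tbl WHERE title LIKE ? ESCAPE '\\' ORDER BY title",
        (like_pattern(user_text, anywhere),))
    return [row[0] for row in rows]

# Hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (id INTEGER PRIMARY KEY, title TEXT)")
conn.execute("CREATE INDEX idx_title ON tbl (title)")
conn.executemany(
    "INSERT INTO tbl (title) VALUES (?)",
    [("Star Wars",), ("Stargate",), ("Lone Star",), ("100% Wolf",), ("100 Days",)])

print(search_titles(conn, "sta"))                 # prefix match
print(search_titles(conn, "sta", anywhere=True))  # match anywhere in the title
print(search_titles(conn, "100%"))                # literal percent sign
```

With PHP and mysqli the same shape applies: prepare the statement once with a ? placeholder and bind the already-built pattern string.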

mysql database field type for search query

I tried searching with different terms and found some answers (like This Link), but they did not match my requirements.
I am using a sql statement something like below to fetch matching results from MySQL table.
SELECT statements... WHERE keyword_title_field REGEXP 'abc|axy|91store';
My question is:
What data type (e.g. varchar, text etc.) should I choose for the keyword_title_field field in the MySQL table to fetch results quickly without putting much load on the table/server?
My current data type is TEXT because the length of the user-supplied text is unknown. Is this best suited or should I change it?
It's not mandatory, but any reference reading along with the answer would be great for my understanding.
Here are some things to consider:
When you use a field in conditions it is important that you put an INDEX on the field. For '=' and for LIKE patterns without a leading wildcard, this lets MySQL find matching records via the index instead of checking every record one by one; REGEXP and LIKE '%term%' still have to examine every row. So make sure to look into that -> https://www.tutorialspoint.com/mysql/mysql-indexes.htm
The fewer characters allowed in your field, the smaller the INDEX is. You however have variable lengths to consider, so a TEXT is fine (an index on a TEXT column must specify a prefix length). If you know the maximum length and it's less than 256 characters, use a VARCHAR. Just make sure to index the field.
Note that REGEXP is relatively slow. LIKE '%term%' would be preferred, but that of course depends on your needs. If the field must be exactly 'abc' OR 'axy' OR '91store' (rather than merely contain one of them), you could consider this query: SELECT statements... WHERE keyword_title_field IN ('abc', 'axy', '91store');
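The difference between the two alternatives is easy to miss: REGEXP 'abc|axy|91store' matches any row that contains one of the terms, while IN (...) matches only rows whose value equals one of them. A minimal sketch of both, using Python's built-in sqlite3 as a stand-in for MySQL; the table name items and the helper names are made up, and the "contains" variant is written as OR-ed LIKE clauses rather than REGEXP because the stand-in has no REGEXP by default. Both helpers assume a non-empty list of terms.

```python
import sqlite3

def find_exact(conn, terms):
    """Rows whose field equals one of the terms (the IN (...) variant)."""
    placeholders = ", ".join("?" for _ in terms)
    sql = ("SELECT keyword_title_field FROM items "
           f"WHERE keyword_title_field IN ({placeholders}) ORDER BY id")
    return [row[0] for row in conn.execute(sql, list(terms))]

def find_containing(conn, terms):
    """Rows whose field contains any of the terms (like REGEXP 'a|b|c')."""
    clauses = " OR ".join("keyword_title_field LIKE ?" for _ in terms)
    sql = f"SELECT keyword_title_field FROM items WHERE {clauses} ORDER BY id"
    return [row[0] for row in conn.execute(sql, [f"%{t}%" for t in terms])]

# Hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE items (id INTEGER PRIMARY KEY, keyword_title_field TEXT)")
conn.executemany(
    "INSERT INTO items (keyword_title_field) VALUES (?)",
    [("abc",), ("my abc shop",), ("91store",), ("other",)])

terms = ["abc", "axy", "91store"]
print(find_exact(conn, terms))
print(find_containing(conn, terms))
```

Only the placeholders are generated dynamically; the terms themselves are always bound as parameters.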

How to speed up search MySQL? Is fulltext search with special characters possible?

I have strings like the following in my VARCHAR InnoDB table column:
"This is a {{aaaa->bbb->cccc}} and that is a {{dddd}}!"
Now, I'd like to search for e.g. {{xxx->yyy->zzz}}. Brackets are part of the string. Sometimes it is searched together with another column, but that one only contains an ordinary id and hence doesn't need to be considered (I guess).
I know I can use LIKE or REGEXP. But these (already tried) ways are too slow. Can I introduce a fulltext index? Or should I add another helping table? Should I replace the special characters {, }, -, > to get words for the fulltext search? Or what else could I do?
The search works with some ten-thousand rows and I assume that I often get about one hundred hits.
This link should give you all the info you need regarding FULLTEXT indexes in MySQL.
MySQL dev site
The section that you will want to pay particular attention to is:
"Full-text searching is performed using MATCH() ... AGAINST syntax. MATCH() takes a comma-separated list that names the columns to be searched. AGAINST takes a string to search for, and an optional modifier that indicates what type of search to perform. The search string must be a string value that is constant during query evaluation. This rules out, for example, a table column because that can differ for each row."
So in short, to answer your question: you should see an improvement in query execution times by implementing a full text index on wide VARCHAR columns, provided you are using a compatible storage engine (InnoDB or MyISAM).
Also here is an example of how you can query the full text index and also an additional ID field as hinted in your question:
SELECT *
FROM table
WHERE MATCH (fieldlist) AGAINST ('search text here')
AND ( field2= '1234');

How do I make data retrieval faster? [duplicate]

I am building a search feature for the messages part of my site, and have a messages database with a little over 9,000,000 rows, and an index on the sender, subject, and message fields. I was hoping to use the LIKE mysql clause in my query, such as (ex)
SELECT sender, subject, message FROM Messages WHERE message LIKE '%EXAMPLE_QUERY%';
to retrieve results. Unfortunately, MySQL doesn't use indexes when a leading wildcard is present, and the leading wildcard is necessary because the search term could appear anywhere in the message (this is how the wildcards work, no?). Queries are very, very slow and I cannot use a full text index either, because of the annoying 50% rule (I just can't afford to rule that much out). Is there any way (or even any alternative) to optimize a query using LIKE and two wildcards? Any help is appreciated.
You should either use full-text indexes (you said you can't), design a full-text search by yourself or offload the search from MySQL and use Sphinx/Lucene. For Lucene you can use Zend_Search_Lucene implementation from Zend Framework or use Solr.
Normal indexes in MySQL are B+Trees, and they can't be used if the starting of the string is not known (and this is the case when you have wildcard in the beginning)
Another option is to implement search on your own, using a reference table. Split the text into words and create a table that contains word, record_id. Then in the search you split the query into words and search for each of the words in the reference table. In this way you are not limiting yourself to the beginning of the whole text, but only to the beginning of a given word (and you'll match the rest of the words anyway).
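A minimal sketch of that reference-table idea, under stated assumptions: Python's built-in sqlite3 stands in for MySQL so the example runs on its own, the table name message_words and the helpers index_message and search are made up, and "word" is naively taken to mean a run of ASCII letters or digits. Each query word is matched as a prefix (word LIKE 'term%'), which is the kind of pattern an ordinary index on word can serve in MySQL.

```python
import re
import sqlite3

WORD_RE = re.compile(r"[a-z0-9]+")  # naive tokenizer, an assumption of this sketch

def index_message(conn, record_id, text):
    """Store one (word, record_id) row per distinct word of the message."""
    words = set(WORD_RE.findall(text.lower()))
    conn.executemany(
        "INSERT INTO message_words (word, record_id) VALUES (?, ?)",
        [(word, record_id) for word in words])

def search(conn, query):
    """Ids of records that contain, for every query term, a word starting with it."""
    result = None
    for term in WORD_RE.findall(query.lower()):
        ids = {row[0] for row in conn.execute(
            "SELECT DISTINCT record_id FROM message_words WHERE word LIKE ?",
            (term + "%",))}
        result = ids if result is None else result & ids
    return sorted(result or [])

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE message_words (word TEXT, record_id INTEGER)")
conn.execute("CREATE INDEX idx_word ON message_words (word)")

# Hypothetical messages.
index_message(conn, 1, "An example query for testing")
index_message(conn, 2, "Another test message")
index_message(conn, 3, "Nothing relevant here")

print(search(conn, "exam"))       # prefix of "example"
print(search(conn, "test mess"))  # both terms must match some word
print(search(conn, "xampl"))      # mid-word fragments are not found
```

As the last call shows, this trades away matches in the middle of a word in exchange for index-friendly prefix lookups.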
'%EXAMPLE_QUERY%'; is a very, very bad idea. Here are some suggestions:
A. Avoid wildcards at the start of LIKE queries; use 'EXAMPLE_QUERY%' instead.
B. Create keywords so that you can easily use MATCH.
If you want to stick with using MySQL, you should use FULL TEXT indexes. Full text indexes index words in a text block. You can then search on word stems and return the results in order of relevance. So you can find the word "example" within a block of text, but you still can't search efficiently on "xampl" to find "example".
MySQL's full text search is not great, but it is functional.
http://dev.mysql.com/doc/refman/5.1/en/fulltext-search.html
select * from emp where ename like '%e';
gives emp_name that ends with letter e.
select * from emp where ename like 'A%';
gives emp_name that begins with letter a.
select * from emp where ename like '_a%';
gives emp_name in which second letter is a.
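The three wildcard patterns above can be tried out with a small sketch. It uses Python's built-in sqlite3 as a stand-in for MySQL, with an invented emp table; % matches any run of characters (including none) and _ matches exactly one character. Case handling here follows SQLite's default (case-insensitive for ASCII), whereas in MySQL it depends on the column's collation.

```python
import sqlite3

def names_like(conn, pattern):
    """Return ename values matching a LIKE pattern, in insertion order."""
    rows = conn.execute(
        "SELECT ename FROM emp WHERE ename LIKE ? ORDER BY id", (pattern,))
    return [row[0] for row in rows]

# Hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (id INTEGER PRIMARY KEY, ename TEXT)")
conn.executemany(
    "INSERT INTO emp (ename) VALUES (?)",
    [("Alice",), ("Dave",), ("Bob",), ("Karen",), ("Aaron",)])

print(names_like(conn, "%e"))   # ends with e
print(names_like(conn, "A%"))   # begins with a
print(names_like(conn, "_a%"))  # second letter is a
```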

Using Full-Text Search instead of "Like '%____%' query

I am querying a table (about 150,000 rows and growing) with a big varchar field (size 2000) which can't be indexed (and there's no point even if it could be). I am using Sql Server 2008.
The query I used till now was:
select * from tbl_name
where field_name like '%bla bla%'
("bla bla" is according to what the user searched for)
In order to improve performance, I want to start using the Full-Text Search feature (I have already defined a catalog and a text index on this field).
I am a bit confused by what I read about querying with this option.
What query should I use in order to get exactly the same results as the query I used before?
Comments:
I would like to get results which are not case sensitive, as it worked before (meaning if the user searches for "LG" he will also get results that contain "Lg").
If user enters "Sams" he will also get "Samsung".
Thanks!
Eran.
CONTAINS() will get you the LIKE() functionality you are seeking with one exception - I noticed in the comments that you also want to match the second entry - "hhhEranttt". Unfortunately, due to the lack of suffix search this is currently not possible.
For the other entries you can run a prefix search - CONTAINS(field_name, '"eran*"') which matches all the other entries since full-text searches are case-insensitive.
HTH.