How to perform a smart MySQL text search in a column?

I am trying to search for a shop name in one of my MySQL tables; the table has a field called fullName. So far I have been using MySQL's SOUNDS LIKE operator, but here is an example where it fails:
Say I have the string Banana's Shop. Then using SOUNDS LIKE with a query of 'nana' or 'bananas' won't give me the result. Here's my current query:
SELECT `fullName` FROM `shop` WHERE `fullName` SOUNDS LIKE 'nana';
Is there a better way to do a simple search like this in MySQL, one that is smarter so that typos would still match?

The ancient and slightly honorable SOUNDEX algorithm used by SOUNDS LIKE doesn't handle suffix sounds. That is, nana doesn't, and can't, match banana. banani will match banana, however.
Two utterances don't necessarily sound alike unless they have the same number of syllables. It's good for matching stuff like surnames: Smith, Schmitt, and Schmidt all have the same SOUNDEX value.
Calling SOUNDEX 'smart text search' is an exaggeration. http://en.wikipedia.org/wiki/Soundex
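To make the behaviour described above concrete, here is a minimal Python sketch of the classic four-character American Soundex. It is an illustration only: MySQL's SOUNDEX() returns strings of arbitrary length and differs in some details, so the codes below are those of the textbook variant and not necessarily what MySQL prints. The function name soundex is just a label chosen for this sketch.

```python
def soundex(word: str) -> str:
    """Classic 4-character American Soundex (textbook variant, not MySQL's exact one)."""
    codes = {}
    for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                           ("L", "4"), ("MN", "5"), ("R", "6")):
        for ch in letters:
            codes[ch] = digit

    letters = [c for c in word.upper() if c.isalpha()]
    if not letters:
        return ""

    result = letters[0]                 # the first letter is kept as-is
    prev = codes.get(letters[0], "")    # its code still suppresses an equal neighbour
    for ch in letters[1:]:
        if ch in "HW":
            continue                    # H and W do not separate equal codes
        code = codes.get(ch, "")        # vowels (and anything unlisted) have no code
        if code and code != prev:
            result += code
        prev = code                     # a vowel resets prev, so "n-a-n" codes both n's
    return (result + "000")[:4]         # pad or truncate to one letter plus three digits


for w in ("Smith", "Schmitt", "Schmidt", "banana", "banani", "nana"):
    print(w, soundex(w))
```

Because the code always starts with the first letter of the word, nana starts with N and banana starts with B, so the two can never be equal no matter how similar the rest sounds.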
You might consider MySQL FULLTEXT search, which you can look up. This does a certain amount of phrase matching. That is, if you had "banana shop" and "banana slug" in your column, the word "banana" would have a shot at matching both those values.
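Be careful with FULLTEXT. It works counterintuitively when you have fewer than a couple of hundred rows in the table you're searching. For example, on MyISAM tables in natural language mode, a word that appears in half or more of the rows is treated as a stopword and matches nothing.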
But that's not a typo-friendly word matcher. What you're asking isn't really easy.
You could consider the Levenshtein algorithm (which you can look up). But it's a hairball to get working properly.
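As a rough idea of what the Levenshtein approach involves, here is a small Python sketch that computes the edit distance in application code rather than inside MySQL. The helper fuzzy_word_match and its max_distance threshold are hypothetical names invented for this example; comparing the query against each word of the shop name is one possible design, not the only one.

```python
import re


def levenshtein(a: str, b: str) -> int:
    """Number of single-character inserts, deletes and substitutions turning a into b."""
    previous = list(range(len(b) + 1))   # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # delete ca
                               current[j - 1] + 1,       # insert cb
                               previous[j - 1] + cost))  # substitute, or keep if equal
        previous = current
    return previous[-1]


def fuzzy_word_match(query: str, name: str, max_distance: int = 1) -> bool:
    """Hypothetical helper: True if any word of name is within max_distance edits of query."""
    words = re.sub(r"[^\w\s]", "", name.lower()).split()  # "Banana's Shop" -> bananas, shop
    return any(levenshtein(query.lower(), w) <= max_distance for w in words)


print(fuzzy_word_match("bananas", "Banana's Shop"))
print(fuzzy_word_match("banans", "Banana's Shop"))
print(fuzzy_word_match("nana", "Banana's Shop"))
```

This handles typos such as banans, but a fragment like nana is still three edits away from bananas, so substring matching remains a separate problem.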

Related

How to search inside a SQL table for a phrase

I am currently using MySQL but I am willing to migrate if necessary to any solution suggested.
I am looking for an easy way to implement a search on a table.
The table has multiple entries with data similar to what will be found on user accounts, like names, addresses, phone numbers and a text column that contains comments of arbitrary length.
I want to make a search so that I can go over all rows and columns and find the best matching row. Slight misspellings should be corrected (not very important). But most important is the ability to search across all columns at once.
Table can have as many as 20,000 rows.
Search parameter will be for example: "Company First Name"
Expected results:
company|Contact First Name|Address|...|...
example 2, slightly misspelled search parameters : "Pinaple Street Compani"
Expected results row:
company|pinapple street|..|...
companie|pinapple street|..|...
company|pinaple street|..|...
EDIT:
Forgot to clarify that multiple searches will be done at the same time, so it has to be fast (around 100 searches at the same time). Also, the language of the data is not English and the database is utf8 with support for non-English characters.
The misspelling problem is hard, if not impossible, to solve well in pure MySQL.
The multiple-column FULLTEXT search isn't so bad.
Your query will look something like this ...
SELECT column, column
FROM table
WHERE MATCH(Company, FirstName, LastName, whatever, whatever)
AGAINST('search terms' IN NATURAL LANGUAGE MODE)
It will produce a bunch of results, ordered by what MySQL guesses is the most likely hit first. MySQL's guesses aren't great, but they're usually adequate.
You'll need a FULLTEXT index matching the list of columns in your MATCH() clause. Creating that index looks like this.
ALTER TABLE book
ADD FULLTEXT INDEX Fulltext_search_index_1
(Company, FirstName, LastName, whatever, whatever);
Comments in your question notwithstanding, you just need an index for the group of columns which you will search.
20K rows won't be a big burden on any recent-vintage server hardware.
Misspelling: You could try SOUNDEX(), but it's an early 20th century algorithm (patented in 1918) designed to index people's surnames in American English. It's designed to get many false positive hits, and it really is dumber than a bucket of rocks.
If you really do need spell correction you may need to investigate Sphinx.

How do I make a MySQL query that searches for terms both forwards and backwards?

I have the following search terms (from three example words) that a user may enter:
"goofy plastic toy"
"goofy toy plastic"
"plastic goofy toy"
"plastic toy goofy"
"toy goofy plastic"
"toy plastic goofy"
How do I write a proper SELECT statement that can look at all of these orderings without having to hardcode OR?
So just:
SELECT a
FROM b
WHERE c = '<search statements in all possible ways separated by spaces here>';
The number of words a user enters is unbounded. It could be two, it could be ten, who knows.
I am trying to search 100,000 product titles and their descriptions and pull out which records contain all words in any order.
If you just want to return rows that contain all of those words in any order (even if they also contain other words) then you may want to use a FULLTEXT index like this:
SELECT a
FROM b
WHERE MATCH (c) AGAINST ('+goofy +plastic +toy' IN BOOLEAN MODE);
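Since the number of words is not known in advance, the boolean-mode string has to be built from the user's input. The following Python sketch shows one way to do that; build_boolean_query is a hypothetical helper, and the %s placeholder assumes a database driver that uses that parameter style. Words that are not in the index at all (very short words, stopwords) may need to be dropped first, which this sketch does not attempt.

```python
import re


def build_boolean_query(user_input: str) -> str:
    """Turn free text into '+word +word ...' so every word is required, in any order."""
    # Keeping only runs of word characters also strips boolean-mode operators
    # such as + - " ( ) * that the user may have typed.
    words = re.findall(r"\w+", user_input)
    return " ".join("+" + w for w in words)


# Table and column names a, b, c are the placeholders used in the question.
SQL = "SELECT a FROM b WHERE MATCH (c) AGAINST (%s IN BOOLEAN MODE)"
params = (build_boolean_query("toy goofy plastic"),)

print(SQL)
print(params)
```

The search string is passed as a bound parameter rather than concatenated into the SQL text, so the user's input never becomes part of the statement itself.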
If you don't care about performance you can use regular expressions. If you do, you should really consider restructuring your database schema. Relational databases are made to work with atomic data, not data sets. The simplest update would be to make the column an ENUM or SET, depending on which is more suitable for your case.

Can anyone simplify the WHERE condition of this MySQL SELECT?

Hi. Can anyone simplify the WHERE condition of this MySQL SELECT statement? It takes a long time to return the result, or it asks for SET SQL_BIG_SELECTS=1.
In the query below:
The postcode contains values like BH12 or SW10,
The req_area contains data like Kensington and Chelsea, SW10,
The region has values like Kensington and Chelsea,
The town_area has values like West Brompton, Chelsea.
select `a`.`user_id` AS `user_id`,`a`.`req_area` AS `req_area`,`a`.`req_area2` AS `req_area2`,`a`.`req_area3` AS `req_area3`,
`a`.`req_property_type` AS `req_property_type`,`a`.`req_bedrooms` AS `req_bedrooms`,`b`.`latitude` AS `latitude`,
`b`.`longitude` AS `longitude`,`b`.`postcode` AS `postcode`
from (`cff_user_property_req_view` `a` join `cff_uk_short_postcodes` `b`)
where
(`b`.`postcode` regexp concat("'",TRIM(`a`.`req_area`),'|',TRIM(`a`.`req_area2`),'|',TRIM(`a`.`req_area3`),"'")>=1 or
`b`.`region` regexp concat("'",TRIM(`a`.`req_area`),'|',TRIM(`a`.`req_area2`),'|',TRIM(`a`.`req_area3`),"'")>=1 or
`b`.`town_area` regexp concat("'",concat('[[:<:]]',`a`.`req_area`,'[[:>:]]'),'|',concat('[[:<:]]',`a`.`req_area2`,'[[:>:]]'),'|',concat('[[:<:]]',`a`.`req_area3`,'[[:>:]]'),"'")>=1)
order by `a`.`user_id`;
Thanks in advance.
This is slow because your query has to evaluate three regular expressions for every row of the cross product of the two tables. Regular expressions are slow, and anything that has to go through the whole table to find matching rows is rather slow as well. There is little you can do while preserving the exact semantics of the query you have given.
So instead of asking for ways to improve that query, you might be better off describing what it is you're trying to achieve, and then finding a better way to model it. Fulltext search indices might help. Splitting columns into words and storing those words in an extra table might help. I'm not sure whether it would be better to edit your question, or to leave this question as it now stands and ask a completely new question for that.
You probably should also give an example of what req_area should look like in cases where you expect a match. As the req_area fields are always included in a regular expression, your example won't yield a match, as this long req_area of “Kensington and Chelsea, SW10” is not included in its entirety in any of the other values from your example. Providing some actual examples using sqlfiddle would make it easier for others to experiment with possible queries, thus increasing both the quality of the answers you receive (as the queries have actually been checked) and the chances of receiving any answers at all (because people can go ahead and develop their answers through experiments).
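The point about the match direction can be checked outside the database. The Python sketch below uses the sample values from the question and Python's re module as a stand-in for MySQL's REGEXP; it ignores the extra quote characters the original CONCAT adds and treats req_area as literal text, so it is only an approximation of the real query.

```python
import re

req_area = "Kensington and Chelsea, SW10"
columns = {
    "postcode": "SW10",
    "region": "Kensington and Chelsea",
    "town_area": "West Brompton, Chelsea",
}

# Direction used in the query: req_area is the pattern, the column is the subject.
# The whole long req_area would have to occur inside the shorter column value.
pattern_matches = {
    col: re.search(re.escape(req_area), value) is not None
    for col, value in columns.items()
}

# Opposite direction, for comparison: does the column value occur inside req_area?
reverse_matches = {col: value in req_area for col, value in columns.items()}

print(pattern_matches)
print(reverse_matches)
```

With these sample values none of the three columns matches in the direction the query uses, which is why concrete examples of expected matches would help.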

MySQL Fulltext search but using LIKE

I've recently been doing some string searches on a table with about 50k strings in it, fairly large I'd say but not that big. I was doing some nested queries for a 'search within results' kind of thing. I was using LIKE to match a searched keyword.
I came across MySQL's full-text search, which I tried, so I added a fulltext index to my str column. I'm aware that full-text searches don't work on derived tables or even on views, so queries with sub-selects will not fit. I mentioned I was doing nested queries; an example is:
SELECT s2.id, s2.str
FROM
(
SELECT s1.id, s1.str
FROM
(
SELECT id, str
FROM strings
WHERE str LIKE '%term%'
) AS s1
WHERE s1.str LIKE '%another_term%'
) AS s2
WHERE s2.str LIKE '%a_much_deeper_term%';
This is actually not applied to any code yet; I was just doing some tests. Also, searching strings like this can easily be done with Sphinx (performance-wise), but let's assume Sphinx is not available and I want to know how well this will work in a pure SQL query. Running this query on a table without a full-text index takes about 2.97 seconds (depending on the search term). However, running it on a table with a full-text index on the str column finishes in about 104 ms, which is fast (I think?).
My question is simple: is it valid to use LIKE, or is it good practice to use it at all, on a table with a full-text index, when normally we would use MATCH and AGAINST?
Thanks!
In this case you don't necessarily need subselects. You can simply use:
SELECT id, str
FROM strings
WHERE str LIKE '%term%'
AND str LIKE '%another_term%'
AND str LIKE '%a_much_deeper_term%'
...but this also raises a good question: the order in which you exclude rows. I guess MySQL is smart enough to assume that the longest term will be the most restrictive, so starting with a_much_deeper_term it will eliminate most of the records and then perform the additional comparisons on only a few rows. Conversely, if you start with term you will probably end up with many candidate records, which you then have to compare against the rest of the terms.
The interesting part is that you can force the order in which the comparisons are made by using your original subselect example. This gives you the opportunity to decide which term is the most restrictive based on more than just its length, for example:
the ratio of consonants to vowels
the longest chain of consonants of the word
the most used vowel in the word
...etc. You can also apply heuristics based on the type of textual information you are handling.
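The idea of applying the most restrictive term first can be sketched in application code. The Python function below is a hypothetical illustration (filter_all_terms and its restrictiveness parameter are names made up here); it says nothing about what MySQL's optimizer actually does with a chain of LIKE conditions.

```python
def filter_all_terms(strings, terms, restrictiveness=len):
    """Keep strings containing every term, testing the most restrictive term first.

    restrictiveness is any function scoring a term; the default, len, treats the
    longest term as the most restrictive. Matching here is case-sensitive,
    unlike LIKE under MySQL's default case-insensitive collations.
    """
    ordered = sorted(terms, key=restrictiveness, reverse=True)
    remaining = list(strings)
    for term in ordered:
        # Each pass only scans the rows that survived the previous, stricter term.
        remaining = [s for s in remaining if term in s]
    return remaining


rows = [
    "a term and another_term",
    "term only",
    "term another_term a_much_deeper_term",
]
print(filter_all_terms(rows, ["term", "another_term", "a_much_deeper_term"]))
```

A different heuristic, such as counting consonants, can be passed as restrictiveness without changing the filtering loop.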
Edit:
This is just a hunch, but it could be possible to apply the LIKE to the words in the fulltext index itself, and then match the rows against the index as if you had searched for full words.
I'm not sure if this is actually done, but it would be a smart thing for the MySQL people to pull off. Also note that this theory can only work if all possible occurrences are in fact in the fulltext index. For this you need that:
Your search pattern must be at least as long as the minimum word length. (If you are searching for, say, %id%, it can also be part of a 3-letter word, which is excluded from the FULLTEXT index by default.)
Your search pattern must not be a substring of any listed excluded word for example: and, of etc.
Your pattern must not contain any special characters.

Extra fulltext ordering criteria beyond default relevance

I'm implementing an ingredient text search, for adding ingredients to a recipe. I've currently got a full text index on the ingredient name, which is stored in a single text field, like so:
"Sauce, tomato, lite, Heinz"
I've found that because there are a lot of ingredients with very similar names in the database, simply sorting by relevance doesn't work that well a lot of the time. So, I've found myself sorting by a bunch of my own rules of thumb, which probably duplicates a lot of the full-text search algorithm which spits out a numerical relevance. For instance (abridged):
ORDER BY
[ingredient name is exactly search term],
[ingredient name starts with search term],
[ingredient name starts with any word from the search and contains all search terms in some order],
[ingredient name contains all search terms in some order],
...and so on. Each of these is defined in the SELECT specification as an expression returning either 1 or 0, and so I order by those in sequential order.
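The ordering described above can be sketched in Python as a tuple of 1/0-style flags used as a sort key, which mirrors ORDER BY flag1 DESC, flag2 DESC, and so on. The function rank_key and the sample ingredient names other than the Heinz one are made up for this illustration.

```python
def rank_key(name: str, search: str):
    """Tuple of rule-of-thumb flags; earlier flags dominate later ones."""
    n = name.lower()
    s = search.lower().strip()
    terms = s.split()
    has_all = all(t in n for t in terms)   # contains all search terms in some order
    return (
        n == s,                                          # name is exactly the search term
        n.startswith(s),                                 # name starts with the search term
        has_all and any(n.startswith(t) for t in terms), # starts with any search word, has all
        has_all,                                         # has all terms
    )


ingredients = ["Sauce, tomato, lite, Heinz", "Tomato sauce", "Tomato", "Soup, tomato"]
search = "tomato sauce"

# True sorts after False, so reverse=True puts the best matches first;
# ties keep their original relative order because the sort is stable.
ranked = sorted(ingredients, key=lambda name: rank_key(name, search), reverse=True)
print(ranked)
```

Keeping the rules in one function is the application-side equivalent of defining the flag expressions once, for instance in a view or stored procedure.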
I would love to hear suggestions for:
A better way to define complicated order-by criteria in one place, say perhaps in a view or stored procedure that you can pass just the search term to and get back a set of results without having to worry about how they're ordered?
A better tool for this than MySQL's fulltext engine -- perhaps if I was using Sphinx or something [which I've heard of but not used before], would I find some sort of complicated config option designed to solve problems like this?
Some google search terms which might turn up discussion on how to order text items within a specific domain like this? I haven't found much that's of use.
Thanks for reading!
Jeremy,
What you are looking for is Rank Boosting which is supported by Solr. Here is a link where you can read more about this:
http://wiki.apache.org/solr/SolrRelevancyCookbook#Ranking_Terms