How to search for Soundex() substrings in MySQL? - mysql

i got a problem with the Joomla! 3 integrated search engine. This engine's indexer creates so called soundex-values when indexing content like, for example
Testobject,
Testobject 1,
Testobject 2239923,
Textobject ....
which all have the same soundex-value of T23123.
Now my problem is, if i do a search for Test, then there won't be any results since the soundex-value for this term is T230.
The query used by the search engine is:
SELECT DISTINCT t.term_id AS id, t.term AS term
FROM tablename AS t
WHERE t.soundex = SOUNDEX('test')
I checked the soundex_match function in this topic, but unfortunately this cannot resolve my problem, because it does not compare soundex values.
I want to avoid hacking the cms core and would like understand if there is some kind of approximation procedure available to compare soundex-values like for regular queries when using the % symbol which i could then try to implement using a plugin or whatever.
The MSSQL DIFFERENCE function mentioned here would be ideal, if it would be available in MySQL and ready to use a soundex value as second parameter.
I am not very well experienced in MySQL and have no idea how to improve the query to also match soundex-substrings.

You're probably looking to calculate the Levenshtein distance; but if you simply want to find those records that start with something that sounds similar to the search term, you can strip any trailing 0 (which is merely used for padding) and then search for soundex strings with the resulting prefix:
WHERE t.soundex LIKE CONCAT(TRIM(TRAILING '0' FROM SOUNDEX('test')), '%')

Related

How can I find a list of rows which contain a string similar to "Butterflies" in MySQL? [duplicate]

I have a table dictionary which contains a list of words Like:
ID|word
---------
1|hello
2|google
3|similar
...
so i want if somebody writes a text like
"helo iam looking for simlar engines for gogle".
Now I want to check every word if it exists in the database, if not it should
get me the similar word for the word. For example: helo = hello, simlar = similar, gogle = google.
Well, i want to fix the spelling errors. In my database i have a full dictionary of all english words. I coudn't find any mysql function which helps me. LIKE isn't helpfull in my situation.
you can use soundex() function for comparing phonetically
your query should be something like:
select * from table where soundex(word) like soundex('helo');
and this will return you the hello row
There is a function that does roughly want you want, but it's intensive and will slow queries down. You might be able to use in your circumstances, I have used it before. It's called Levenshtein. You can get it here How to add levenshtein function in mysql?
What you want to do is called a fuzzy search. You could use the SOUNDEX function in MySQL, documented here:
http://dev.mysql.com/doc/refman/5.7/en/string-functions.html#function_soundex
You query would look like:
SELECT * FROM dictionary where SOUNDEX(word) = SOUNDEX(:yourSearchTerm)
... where your search term is bound to the :yourSearchTerm parameter value.
A next step would be to try implementing and making use of a Levenshtein function in MySQL. One is described here:
http://www.artfulsoftware.com/infotree/qrytip.php?id=552
The Levenshtein distance between two strings is the minimum number of
operations needed to transform one string into the other, where an
operation may be insertion, deletion or substitution of one character.
You might also consider looking into databases that are aimed at full text searching, such as Elastic Search, which provides this natively:
https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-fuzzy-query.html

Search in multiple column with at least 2 words in keyword

I have a table which store some datas. This is my table structure.
Course
Location
Wolden
New York
Sertigo
Seatlle
Monad
Chicago
Donner
Texas
I want to search from that table for example with this keyword Sertigo Seattle and it will return row number two as a result.
I have this query but doesn't work.
SELECT * FROM courses_data a WHERE CONCAT_WS(' ', a.Courses, a.Location) LIKE '%Sertigo Seattle%'
Maybe anyone knows how to make query to achieve my needs?
If you want to search against the course and location then use:
SELECT *
FROM courses_data
WHERE Course = 'Sertigo' AND Location = 'Seattle';
Efficient searching is usually implemented by preparing the search string before running the actual search:
You split the search string "Sertigo Seattle" into two words: "Sertigo" and "Seattle". You trim those words (remove enclosing white space characters). You might also want to normalize the words, perhaps convert them to all lower case to implement a case insentive search.
Then you run a search for the discrete words:
SELECT *
FROM courses_data
WHERE
(Course = 'Sertigo' AND Location = 'Seattle')
OR
(Course = 'Seattle' AND Location = 'Sertigo');
Of course that query is created using a prepared statement and parameter binding, using the extracted and trimmed words as dynamic parameters.
This is is much more efficient than using wildcard based search with the LIKE operator. Because the database engine can make use of the indexes you (hopefully) created for that table. You can check that by using EXPLAIN feature MySQL offers.
Also it does make sense to measure performance: run different search approaches in a loop, say 1000 times, and take the required time. You will get a clear and meaningful example. Also monitoring CPU and memory usage in such a test is of interest.

MYSQL search for right words | fixing spelling errors

I have a table dictionary which contains a list of words Like:
ID|word
---------
1|hello
2|google
3|similar
...
so i want if somebody writes a text like
"helo iam looking for simlar engines for gogle".
Now I want to check every word if it exists in the database, if not it should
get me the similar word for the word. For example: helo = hello, simlar = similar, gogle = google.
Well, i want to fix the spelling errors. In my database i have a full dictionary of all english words. I coudn't find any mysql function which helps me. LIKE isn't helpfull in my situation.
you can use soundex() function for comparing phonetically
your query should be something like:
select * from table where soundex(word) like soundex('helo');
and this will return you the hello row
There is a function that does roughly want you want, but it's intensive and will slow queries down. You might be able to use in your circumstances, I have used it before. It's called Levenshtein. You can get it here How to add levenshtein function in mysql?
What you want to do is called a fuzzy search. You could use the SOUNDEX function in MySQL, documented here:
http://dev.mysql.com/doc/refman/5.7/en/string-functions.html#function_soundex
You query would look like:
SELECT * FROM dictionary where SOUNDEX(word) = SOUNDEX(:yourSearchTerm)
... where your search term is bound to the :yourSearchTerm parameter value.
A next step would be to try implementing and making use of a Levenshtein function in MySQL. One is described here:
http://www.artfulsoftware.com/infotree/qrytip.php?id=552
The Levenshtein distance between two strings is the minimum number of
operations needed to transform one string into the other, where an
operation may be insertion, deletion or substitution of one character.
You might also consider looking into databases that are aimed at full text searching, such as Elastic Search, which provides this natively:
https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-fuzzy-query.html

Performance of LIKE 'xyz%' v/s LIKE '%xyz'

I was wondering how the LIKE operator actually work.
Does it simply start from first character of the string and try matching pattern, one character moving to the right? Or does it look at the placement of the %, i.e. if it finds the % to be the first character of the pattern, does it start from the right most character and starts matching, moving one character to the left on each successful match?
Not that I have any use case in my mind right now, just curious.
edit: made question narrow
If there is an index on the column, putting constant characters in the front will lead your dbms to use a more efficient searching/seeking algorithm. But even at the simplest form, the dbms has to test characters. If it is able to find it doesn't match early on, it can discard it and move onto the next test.
The LIKE search condition uses wildcards to search for patterns within a string. For example:
WHERE name LIKE 'Mickey%'
will locate all values that begin with 'Mickey' optionally followed by any number of characters. The % is not case sensitive and not accent sensitive and you can use multiple %, for example
WHERE name LIKE '%mouse%'
will return all values with 'mouse' (or 'Mouse' or 'mousé') in it.
The % is inclusive, meaning that
WHERE name like '%A%'
will return all that starts with an 'A', contain 'A' or end with 'A'.
You can use _ (underscore) for any character on a single position:
WHERE name LIKE '_at%'
will give you all values with 'a' as the second letter and 't' as the third. The first letter can be anything. For example: 'Batman'
In T-SQL, if you use [] you can find values in a range.
WHERE name LIKE '[c-f]%'
it will find any value beginning with letter between c and f, inclusive. Meaning it will return any value that start with c, d, e or f. This [] is T-SQL only. Use [^ ] to find values not in a range.
Finding all values that contain a number:
WHERE name LIKE '%[0-9]%'
returns everything that has a number in it. Example: 'Godfather2'
If you are looking for all values with the 3rd position to be a '-' (dash) use two underscores:
WHERE NAME '__-%'
It will return for example: 'Lo-Res'
Finding the values with names ends in 'xyz' use:
WHERE name LIKE '%xyz'
returns anything that ends with 'xyz'
Finding a % sign in a name use brackets:
WHERE name LIKE '%[%]%'
will return for example: 'Top%Movies'
Searching for [ use brackets around it:
WHERE name LIKE '%[[]%'
gives results as: 'New York [NY]'
The database collation's sort order determines both case sensitivety and the sort order for the range of characters. You can optionally use COLLATE to specify collation sort order used by the LIKE operator.
Usually the main performance bottleneck is IO. The efficiency of the LIKE operator can be only important if your whole table fits in the memory otherwise IO will take most of the time.
AFAIK oracle can use indexes for prefix matching. (like 'abc%'), but these index cannot be used for more complex expressions.
Anyway if you have only this kind of queries you should consider using a simple index on the related column. (Probably this is true for other RDBMS's as well.)
Otherwise LIKE operator is generally slow, but most of the RDBMS have some kind of full text searching solution. I think the main reason of the slowness is that LIKE is too general. Usually full text indexes has lots of different options which can tell the database what you really want to search for, and with these additional information the DB can do its task in a more efficient way.
As a rule of thumb I think if you want to search in a text field and you think performance can be an issue, you should consider your RDBMS's full text searching solution, or the real goal is not text searching, but this is some kind of "design side effect", for example xml/json/statuses stored in a field as text, then probably you should consider choosing a more efficient data storing option. (if there is any...)

extracting strings from mysql field

total slow moment day, i need to extract different areas based on what language is selected from a field in a mysql database
ex:
<!--:en-->Overview<!--:--><!--:es-->Overview<!--:--><!--:fr-->Présentation<!--:--><!--:ar-->نظرة عامة<!--:-->
so if my language is french for example, i want the part between <!--:fr--> and <!--:-->
any ideas?
Strings processing is not the strongest part of MySQL. But here is one idea:
SELECT SUBSTRING_INDEX(SUBSTRING_INDEX(column_name, '<!--:fr-->', -1), '<!--:-->', 1) FROM table_name
The easier way would be using a substring. You can find the index for the language on the string first. After that, find the index of the end marker () and extract what's in the middle, which is the value you want.
A more elaborated way would be using regular expressions. The implementation depends on the language you are coding on.