MySQL Find similar strings - mysql

I have an InnoDB database table of 12 character codes which often need to be entered by a user.
Occasionally, a user will enter the code incorrectly (for example typing a lower case L instead of a 1 etc).
I'm trying to write a query that will find similar codes to the one they have entered but using LIKE '%code%' gives me way too many results, many of which contain only one matching character.
Is there a way to perform a more detailed check?
Edit - Case sensitive not required.
Any advice appreciated.
Thanks.

Have a look at soundex. Commonly misspelled strings have the same soundex code, so you can query for:
where soundex(Code) like soundex(UserInput)

use without wildcard % for that
SELECT `code` FROM table where code LIKE 'user_input'
thi wil also check the space
SELECT 'a' = 'a ', return 1 whereas SELCET 'a' LIKE 'a ' return 0
reference

Related

How to find words by partially strings

I have been trying to solve this problem for hours, but I dont know how to approach it, so I would need a push to a right direction.
I want to create a page where users can find the appropriate word, by providing word length and characters.
For example, user wants to find all the 5 letter words, where the second letter is R and fourth V, like this:
_R_V_
I have a table with column WORDS with words "letter", "moon", "drive", "mrive" and the query should return: "drive" and "mrive".
Is it possible to do it in MySQL?
While I was looking for the direction I found that I should create a trie structure. I dont know how to do that, but I will learn it if there is no easier way.
Yes, you can use LIKE :
SELECT * FROM YourTable t
WHERE t.word_col LIKE '_R_V_'
_ Wildcard stands for any single character. This will also force the string to be 5 characters in length, since % wildcard is not used.
You can find a great explanation about LIKE wildcards in the link above.

Isolate an email address from a string using MySQL

I am trying to isolate an email address from a block of free field text (column name is TEXT).
There are many different variations of preceding and succeeding characters in the free text field, i.e.:
email me! john#smith.com
e:john#smith.com m:555-555-5555
john#smith.com--personal email
I've tried variations of INSTR() and SUBSTRING_INDEX() to first isolate the "#" (probably the one reliable constant in finding an email...) and extracting the characters to the left (up until a space or non-qualifying character like "-" or ":") and doing the same thing with the text following the #.
However - everything I've tried so far hasn't filtered out the noise to the level I need.
Obviously 100% accuracy isn't possible but would someone mind taking a crack at how I can structure my select statement?
There is no easy solution to do this within MySQL. However you can do this easily after you have retrieved it using regular expressions.
Here would be a an example of how to use it in your case: Regex example
If you want it to select all e-mail addresses from one string: Regex Example
You can use regex to extract the ones where it does contain an e-mail in MySQL but it still doesn't extract the group from the string. This has to be done outside MySQL
SELECT * FROM table
WHERE column RLIKE '\w*#\w*.\w*'
RLIKE is only for matching it, you can use REGEXP in the SELECT but it only returns 1 or 0 on whether it has found a match or not :s
If you do want to extract it in MySQL maybe this other stackoverflow post helps you out. But it seems like a lot of work instead of doing it outside MySQL
Now in MySQL 5 and 8 you can use REGEXP_SUBSTR to isolate just the email from a block of free text.
SELECT *, REGEXP_SUBSTR(`TEXT`, '([a-zA-Z0-9._%+\-]+)#([a-zA-Z0-9.-]+)\.([a-zA-Z]{2,4})') AS Emails FROM `mytable`;
If you want to get just the records with emails and remove duplicates ...
SELECT DISTINCT REGEXP_SUBSTR(`TEXT`, '([a-zA-Z0-9._%+\-]+)#([a-zA-Z0-9.-]+)\.([a-zA-Z]{2,4})') AS Emails FROM `mytable` WHERE `TEXT` REGEXP '([a-zA-Z0-9._%+\-]+)#([a-zA-Z0-9.-]+)\.([a-zA-Z]{2,4})';

Performance of LIKE 'xyz%' v/s LIKE '%xyz'

I was wondering how the LIKE operator actually work.
Does it simply start from first character of the string and try matching pattern, one character moving to the right? Or does it look at the placement of the %, i.e. if it finds the % to be the first character of the pattern, does it start from the right most character and starts matching, moving one character to the left on each successful match?
Not that I have any use case in my mind right now, just curious.
edit: made question narrow
If there is an index on the column, putting constant characters in the front will lead your dbms to use a more efficient searching/seeking algorithm. But even at the simplest form, the dbms has to test characters. If it is able to find it doesn't match early on, it can discard it and move onto the next test.
The LIKE search condition uses wildcards to search for patterns within a string. For example:
WHERE name LIKE 'Mickey%'
will locate all values that begin with 'Mickey' optionally followed by any number of characters. The % is not case sensitive and not accent sensitive and you can use multiple %, for example
WHERE name LIKE '%mouse%'
will return all values with 'mouse' (or 'Mouse' or 'mousé') in it.
The % is inclusive, meaning that
WHERE name like '%A%'
will return all that starts with an 'A', contain 'A' or end with 'A'.
You can use _ (underscore) for any character on a single position:
WHERE name LIKE '_at%'
will give you all values with 'a' as the second letter and 't' as the third. The first letter can be anything. For example: 'Batman'
In T-SQL, if you use [] you can find values in a range.
WHERE name LIKE '[c-f]%'
it will find any value beginning with letter between c and f, inclusive. Meaning it will return any value that start with c, d, e or f. This [] is T-SQL only. Use [^ ] to find values not in a range.
Finding all values that contain a number:
WHERE name LIKE '%[0-9]%'
returns everything that has a number in it. Example: 'Godfather2'
If you are looking for all values with the 3rd position to be a '-' (dash) use two underscores:
WHERE NAME '__-%'
It will return for example: 'Lo-Res'
Finding the values with names ends in 'xyz' use:
WHERE name LIKE '%xyz'
returns anything that ends with 'xyz'
Finding a % sign in a name use brackets:
WHERE name LIKE '%[%]%'
will return for example: 'Top%Movies'
Searching for [ use brackets around it:
WHERE name LIKE '%[[]%'
gives results as: 'New York [NY]'
The database collation's sort order determines both case sensitivety and the sort order for the range of characters. You can optionally use COLLATE to specify collation sort order used by the LIKE operator.
Usually the main performance bottleneck is IO. The efficiency of the LIKE operator can be only important if your whole table fits in the memory otherwise IO will take most of the time.
AFAIK oracle can use indexes for prefix matching. (like 'abc%'), but these index cannot be used for more complex expressions.
Anyway if you have only this kind of queries you should consider using a simple index on the related column. (Probably this is true for other RDBMS's as well.)
Otherwise LIKE operator is generally slow, but most of the RDBMS have some kind of full text searching solution. I think the main reason of the slowness is that LIKE is too general. Usually full text indexes has lots of different options which can tell the database what you really want to search for, and with these additional information the DB can do its task in a more efficient way.
As a rule of thumb I think if you want to search in a text field and you think performance can be an issue, you should consider your RDBMS's full text searching solution, or the real goal is not text searching, but this is some kind of "design side effect", for example xml/json/statuses stored in a field as text, then probably you should consider choosing a more efficient data storing option. (if there is any...)

How to get records in mysql where first letter equals 'X', when dealing with spanish characters?

As an example:
select *
from recording as r
where left(r.title, 1) = 'p'
This picks all the recordings where the recording title starts with the letter p. I can use other letters...
But, the problem is that i'm dealing with a spanish table that contains recordings like ¿-Por qué? or «Por un amor» o even ¡..Pon la mesa!, etc... And i do want those recordings to be part of my result set.
How do i get the recordings that start with a specific letter only?
Do i need to create a separate column in the table, let's call it 'sort name', and insert the recording title with all of those special characters stripped out? Are they other solutions, special functions that i'm not aware of in mysql?
Thanks
You can use mysql regexp:
SELECT * FROM recording WHERE title REGEXP '^[^a-zA-Z]*[pP]'
You can use the SQL LIKE wildcard '_' this way :
SELECT * FROM recording AS r WHERE r.title LIKE '_P%'
It will accept any character before the P letter. (And only one character)
If you don't wan't to match words such as 'Ap...', you could use the SQL regex function (Google it, it's quite well documented), or check it afterward using PhP if the case is rare enough.
Edit : Ok, so you want to match titles with several special characters behind the first letter.
In that case, using the regexp function (SQL side or PhP side, depends on your needs) should be the best way

MySQL search within the last 5 characters in a column?

My user table has a column "name" which contains information like this:
Joe Lee
Angela White
I want to search for either first name or last name efficiently. First name is easy, I can do
SELECT * FROM user WHERE name LIKE "ABC%"
But for last name, if I do
SELECT * FROM user WHERE name LIKE "%ABC"
That would be extremely slow.
So I am thinking about counting the characters of the input, for example, "ABC" has 3 characters, and if I can search only the last three characters in name column, that would be great. So I want something like
SELECT * FROM user WHERE substring(name, end-3, end) LIKE "ABC%"
Is there anything in MySQL that can do this?
Thanks so much!
PS. I cannot do fulltext because our search engine doesn't support that.
The reason that
WHERE name LIKE '%ith'
is a slow way to look for 'John Smith' by last name is the same reason that
WHERE Right(name, InStr(name, ' ' )) LIKE 'smi%'
or any other expression on the column is slow. It defeats the use of the index for quick lookup and leaves the MySQL server doing a full table scan or full index scan.
If you were using Oracle (that is, if you worked for a formerly wealthy employer) you could use function indexes. As it is you have to add some extra columns or some other helping data to accelerate your search.
Your smartest move is to split your first and last names into separate columns. Several other people have pointed out good reasons for doing that.
If you can't do that you could try creating an extra column which contains the name string reversed, and create an index on that column. That column will have, for example, 'John Smith' stored as 'htimS nhoJ'. Then you can search as follows.
WHERE nameReversed LIKE CONCAT(REVERSE('ith'),'%')
This search will use the index and be decently fast. I've had good success with it.
You're close. In MySQL you should be able to use InStr(str, substr) and Right(str, index) to do the following:
SELECT * FROM user WHERE Right(name, InStr(name, " ")) LIKE "ABC%"
InStr(name, " ") returns the index of the Space character (you may have to play with the " " syntax). This index is then used in the Right() function to search for only the last name (basically; problems arise when you have multiple names, multiple spaces etc). LIKE "ABC%" would then search for a last name starting with ABC.
You cannot use a fixed index as names that are more than 3 or less than 3 characters long would not return properly as you suggest.
However, as Zane said, it's a much better practise to use seperate fields.
If it is a MyIsam table, you may use Free text search to do the same.
You can use the REGEXP operator:
SELECT * FROM user WHERE name REGEXP "ABC$"
http://dev.mysql.com/doc/refman/5.1/en/regexp.html