Match specific string before user input - mysql

I have the following strings:
SDZ420-1241242,
AS42-9639263,
SPF3-2352353
I want to "escape" the SDZ420- part while searching and search using only the trailing digits. So far I've tried RLIKE '^[a-zA-Z\d-]', which works, but I am confused about how to append the digits that follow (user input, say 1241242) to it. I cannot use LIKE '%$input', since that would return a row even if I just input '242' as the search string.
In simple words, a user input of '1241242' should return the row with 'SDZ420-1241242'. Is there any approach other than creating a separate table with the numbers only?

Note that, without jumping through some crazy hoops, this search needs to hit every row in the table; if you have an index on this column, it is not going to be used. (An index of the proper kind, which they tend to be, is generally used only when you search on the start of the value, that is, with LIKE 'needle%', and not with RLIKE.) If that's a problem, storing the digits separately and putting an index on that column is probably the simplest way to solve your problem here.
To query for the final few digits, why not:
SELECT * FROM foo WHERE colName LIKE ?
with the string made in your programming language via:
String searchTerm = "%-" + digits;

You can also pass in the number as a string and use:
where substring_index(colname, '-', -1) = ?
This does not require changing the value in the application code.
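A minimal sketch of both suggestions, using Python's built-in sqlite3 module as a stand-in for MySQL (an assumption, made only so the example runs anywhere). The table foo and column colName come from the answer above; the helper names are hypothetical. SQLite has no substring_index, so the second approach is emulated on the client side with rsplit.

```python
import sqlite3

def find_by_like(conn, digits):
    """Bind '%-' + digits so the match must start right after the dash."""
    # Escape LIKE wildcards so a '%' or '_' typed by the user is taken literally.
    escaped = (digits.replace("\\", "\\\\")
                     .replace("%", "\\%")
                     .replace("_", "\\_"))
    rows = conn.execute(
        "SELECT colName FROM foo WHERE colName LIKE ? ESCAPE '\\'",
        ("%-" + escaped,),
    )
    return [r[0] for r in rows]

def find_by_suffix(conn, digits):
    """Client-side stand-in for: substring_index(colName, '-', -1) = ?"""
    rows = conn.execute("SELECT colName FROM foo")
    return [r[0] for r in rows if r[0].rsplit("-", 1)[-1] == digits]

# Sample data taken from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE foo (colName TEXT)")
conn.executemany(
    "INSERT INTO foo VALUES (?)",
    [("SDZ420-1241242",), ("AS42-9639263",), ("SPF3-2352353",)],
)
```

With this data, searching for 1241242 finds the first row with either helper, while searching for 242 finds nothing, because the pattern requires the dash immediately before the digits.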

Related

How do I create a SELECT conditional in MySQL where the conditional is the character length of the LIKE match?

I am working on a search function, where the matches are weighted based on certain conditions. One of the conditions I want to add weight to is matches where the character length of the query string in a LIKE match is longer than 4.
This is roughly what I want the query to look like. %s is meant to represent the actual match found by LIKE, but I don't think it does. I'm wondering if there is a special variable in MySQL that represents the precise character match found by LIKE.
SELECT help.*,
IF(CHAR_LENGTH(%s) > 4, 2, 0) w
FROM help
WHERE (
(title LIKE '%this%' OR title LIKE '%testy%' OR title LIKE '%test%') OR
(content LIKE '%this%' OR content LIKE '%testy%' OR content LIKE '%test%')
) LIMIT 1000
edit: I could in the PHP split the search string array into two arrays based on the character length of the elements, with two separate queries that return different values for 'w', then combine the results, but I'd rather not do that, as it seems to me that would be awkward, messy, and slow.
Check out FULLTEXT as another way to discover rows. It will be faster, but won't address your question.
This probably has the effect you want.
SELECT ....
IF ( (title LIKE '%testy%' OR
content LIKE '%testy%'), 2, 0)
....
Note that the "match" in your LIKEs includes the %, so it is the entire length of the string. I don't think that is what you wanted.
REGEXP "(this|testy|that)" will match either 4 or 5 characters (in this example). It may be possible to do something with REGEXP_REPLACE to replace that with the empty string, then see how much it shrank.
I think the answer to my question is that what I wanted to do isn't possible. There is no special variable in MySQL representing the core character match in a WHERE conditional where LIKE is the operator. The match is the contents of the returned data row.
What I did to reach my objective was take the original dynamic list of search tokens, iterate through that list, and perform a search on each token, with the SQL tailored to the conditions that matched each token.
As I did this I built an array of the search results, using the id for the database row as the index for the array. This allowed me to perform calculations with the array elements, while avoiding duplicates.
I'm not posting the PHP code because the original question was about the SQL.
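For readers who want the shape of that approach anyway, here is a minimal sketch in Python rather than PHP. It is not the code described above: the rows are a hypothetical in-memory list of (id, title, content) tuples standing in for the help table, and the substring test stands in for one LIKE '%token%' query per token. The weight of 2 for tokens longer than 4 characters comes from the question.

```python
def weighted_search(rows, tokens, threshold=4, bonus=2):
    """Run one search per token and accumulate a weight per row id."""
    results = {}  # row id -> accumulated weight; keying by id avoids duplicates
    for token in tokens:
        # Tokens longer than the threshold earn the bonus, others earn 0.
        weight = bonus if len(token) > threshold else 0
        needle = token.lower()
        for row_id, title, content in rows:
            # Stand-in for: title LIKE '%token%' OR content LIKE '%token%'
            if needle in title.lower() or needle in content.lower():
                results[row_id] = results.get(row_id, 0) + weight
    return results

# Hypothetical sample rows: (id, title, content)
sample_rows = [
    (1, "This is a test", "plain"),
    (2, "Testy title", "x"),
    (3, "other", "none"),
]
sample_weights = weighted_search(sample_rows, ["this", "testy", "test"])
```

Rows that match only short tokens are still returned, with weight 0; only a match on a longer token such as testy raises the weight.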

Performance of LIKE 'xyz%' v/s LIKE '%xyz'

I was wondering how the LIKE operator actually works.
Does it simply start from the first character of the string and try to match the pattern, moving one character to the right each time? Or does it look at the placement of the %, i.e. if it finds the % to be the first character of the pattern, does it start from the rightmost character and match from there, moving one character to the left on each successful match?
Not that I have any use case in my mind right now, just curious.
edit: made question narrow
If there is an index on the column, putting constant characters at the front of the pattern lets your DBMS use a more efficient searching/seeking algorithm. But even in the simplest form, the DBMS has to test characters. If it can tell early on that a value doesn't match, it can discard that value and move on to the next one.
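The difference can be illustrated with a toy model in which a sorted Python list plays the role of the index. This is only a sketch of the idea, not how any particular DBMS implements LIKE, and the function names are hypothetical.

```python
from bisect import bisect_left

def prefix_search(sorted_values, prefix):
    """LIKE 'prefix%': binary-search to the first candidate, then read a run."""
    start = bisect_left(sorted_values, prefix)
    matches = []
    for value in sorted_values[start:]:
        if not value.startswith(prefix):
            break  # sorted order guarantees that no later value can match
        matches.append(value)
    return matches

def suffix_search(values, suffix):
    """LIKE '%suffix': the ordering is no help, so every value is tested."""
    return [v for v in values if v.endswith(suffix)]

# The "index": column values kept in sorted order.
index = sorted(["abcxyz", "xyzabc", "xyz", "mxyz", "xy"])
```

prefix_search touches only the matching run plus one extra value, while suffix_search always walks the whole list, which is the full scan the answer refers to.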
The LIKE search condition uses wildcards to search for patterns within a string. For example:
WHERE name LIKE 'Mickey%'
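will locate all values that begin with 'Mickey', optionally followed by any number of characters. Whether the comparison is case sensitive or accent sensitive depends on the collation; under a case-insensitive, accent-insensitive collation it is neither. You can also use more than one %, for example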
WHERE name LIKE '%mouse%'
will return all values with 'mouse' (or 'Mouse' or 'mousé') in it.
The % also matches zero characters, meaning that
WHERE name LIKE '%A%'
will return all values that start with 'A', contain 'A', or end with 'A'.
You can use _ (underscore) for any character on a single position:
WHERE name LIKE '_at%'
will give you all values with 'a' as the second letter and 't' as the third. The first letter can be anything. For example: 'Batman'
In T-SQL, if you use [] you can find values in a range.
WHERE name LIKE '[c-f]%'
will find any value beginning with a letter between c and f, inclusive, that is, any value that starts with c, d, e, or f. The [] syntax is T-SQL only. Use [^ ] to find values not in a range.
Finding all values that contain a number:
WHERE name LIKE '%[0-9]%'
returns everything that has a number in it. Example: 'Godfather2'
If you are looking for all values with a '-' (dash) in the third position, use two underscores:
WHERE name LIKE '__-%'
It will return, for example: 'Lo-Res'
To find values that end in 'xyz', use:
WHERE name LIKE '%xyz'
which returns anything that ends with 'xyz'.
To find a % sign in a name, use brackets (again T-SQL):
WHERE name LIKE '%[%]%'
will return, for example: 'Top%Movies'
To search for [, put brackets around it:
WHERE name LIKE '%[[]%'
gives results such as: 'New York [NY]'
The database collation's sort order determines both case sensitivity and the sort order for the range of characters. You can optionally use COLLATE to specify the collation used by the LIKE operator.
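One way to check your reading of the two standard wildcards is to translate a LIKE pattern into a regular expression. The sketch below does that in Python. It assumes a case-insensitive comparison by default, and it does not model the T-SQL [] ranges, ESCAPE clauses, or accent-insensitive collations; like_to_regex and like are hypothetical helper names.

```python
import re

def like_to_regex(pattern, case_insensitive=True):
    """Translate the LIKE wildcards % and _ into a compiled regex."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")  # any run of characters, including none
        elif ch == "_":
            parts.append(".")   # exactly one character
        else:
            parts.append(re.escape(ch))  # everything else is literal
    flags = re.DOTALL | (re.IGNORECASE if case_insensitive else 0)
    return re.compile("".join(parts), flags)

def like(value, pattern):
    """True if the whole value matches the LIKE pattern."""
    return like_to_regex(pattern).fullmatch(value) is not None
```

For example, like("Batman", "_at%") and like("Lo-Res", "__-%") are both true, while like("xyzabc", "%xyz") is false because the pattern has to cover the whole value.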
Usually the main performance bottleneck is I/O. The efficiency of the LIKE operator matters only if your whole table fits in memory; otherwise I/O will take most of the time.
AFAIK Oracle can use indexes for prefix matching (LIKE 'abc%'), but these indexes cannot be used for more complex expressions.
Anyway, if you have only this kind of query, you should consider using a simple index on the related column. (This is probably true for other RDBMSs as well.)
Otherwise the LIKE operator is generally slow, but most RDBMSs have some kind of full-text search solution. I think the main reason for the slowness is that LIKE is too general. Full-text indexes usually have lots of different options that tell the database what you really want to search for, and with this additional information the DB can do its task more efficiently.
As a rule of thumb, if you want to search in a text field and you think performance can be an issue, you should consider your RDBMS's full-text search solution. If the real goal is not text searching and the search is some kind of "design side effect" (for example XML/JSON/statuses stored in a field as text), then you should probably consider a more efficient way of storing the data (if there is one).

MySql, search with LIKE %str%

I search my table with a query that contains a LIKE '%str%' clause.
Is there a way to know where the string 'str' was found in the sentence?
I would like to print 'str' with markup (bold).
For this I need to know exactly where 'str' begins in each row that contains it.
You can get the string position using the POSITION function ( http://dev.mysql.com/doc/refman/5.0/en/string-functions.html#function_position ). However, that's probably not the best way to do it, since if the string occurs more than once it will only return the first position. It would be easier just to replace the string with the string wrapped in whatever markup you want.
If you want an all MySQL solution this would probably work:
SELECT REPLACE(exampleTable.field, 'search_string', '<b>search_string</b>')
FROM exampleTable
WHERE exampleTable.field LIKE '%search_string%';
However, I would recommend doing any replacement like this on the PHP/ASP side, using the string replacement tools of the respective language.
Sure, you want INSTR().
You could also use it in your WHERE clause, though you'd want to compare performance between that and LIKE:
SELECT INSTR(`field`, 'str') FROM `table` WHERE 0 < INSTR(`field`, 'str')
Remember that INSTR() returns a 1-based index, that is, the first character is position 1, not position 0; 0 is reserved for "not found".
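If the replacement is done on the application side, as recommended above, it could look like the following sketch. It is written in Python rather than PHP or ASP; instr and highlight are hypothetical helper names, and HTML-escaping of the surrounding text is left out for brevity.

```python
import re

def instr(haystack, needle):
    """1-based position of the first occurrence, 0 if absent (like INSTR).

    Unlike MySQL's INSTR on non-binary strings, this comparison is
    case sensitive.
    """
    return haystack.find(needle) + 1

def highlight(text, needle):
    """Wrap every case-insensitive occurrence of needle in <b>...</b>."""
    if not needle:
        return text
    return re.sub(
        re.escape(needle),
        lambda m: "<b>" + m.group(0) + "</b>",  # keep the original casing
        text,
        flags=re.IGNORECASE,
    )
```

Because highlight replaces every occurrence, it avoids the "first position only" limitation of POSITION and INSTR.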

Converting a upper case database to proper case

I am new to SQL and I have several large databases with upper case first and last names that I need to convert to proper case in SQL Server 2008.
I am using the following to do this:
update database
Set FirstNames = upper(substring(FirstNames, 1, 1))
+ lower(substring(FirstNames, 2, (len(FirstNames) - 1) ))
I was wondering if there was any way to adapt this so that a field with two first names is also updated (currently I make the change and then go through and manually change the second name).
I have looked over the other answers in this field and they all seem quite long compared to the query above.
Also, is there any way to assist with converting the Mc surnames (I will manually change the others)? MCDONALD to McDonald; again, I am just using the above query but replacing FirstNames with LastName.
This is probably best done outside of SQL. However, if there is a requirement to do it on the server or if speed isn't an issue (because it will be an issue so you need to figure out if you care), the way you are going about it is probably the best way of doing so. If you want, you could create a UDF that puts all of the logic in one area.
Here is some code I came across (with attribution and more information below it):
CREATE FUNCTION dbo.fCapFirst(@input NVARCHAR(4000)) RETURNS NVARCHAR(4000)
AS
BEGIN
DECLARE @position INT
WHILE IsNull(@position,Len(@input)) > 1
SELECT @input = Stuff(@input,IsNull(@position,1),1,upper(substring(@input,IsNull(@position,1),1))),
@position = charindex(' ',@input,IsNull(@position,1)) + 1
RETURN (@input)
END
--Call it like so
select dbo.fCapFirst(Lower(Column)) From MyTable
I got this code from http://www.sqlteam.com/forums/topic.asp?TOPIC_ID=37760 . There is more information, along with other suggestions, in that forum thread as well.
As for dealing with cases like the McDonald, I would suggest one of two ways to handle this. One would be to put a search in the above UDF for key names ('McDonald', 'McGrew', etc.) or for patterns (the first two letters are Mc then make the next one capital, etc.) The second way would be to put these cases (the full names) in a table and have their replacement value in a second column. Then simply do a replace. Most likely, however, it will be easiest to identify rules like Mc then capitalize instead of trying to list every last-name possibility.
Don't forget you may want to modify the above UDF to include dashes, not just spaces.
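If the conversion is done outside of SQL, as suggested at the start of this answer, the rules discussed here (capitalise after spaces and dashes, treat Mc as a pattern) can be sketched in a few lines. This is Python, not T-SQL, and it only handles ASCII letters; proper_case is a hypothetical helper name, and the Mc rule is the simple "first two letters are Mc, capitalise the next one" pattern, so it will also rewrite names where that is not wanted.

```python
import re

def proper_case(name):
    """Capitalise each run of letters; also capitalise the letter after 'Mc'."""
    def fix_word(match):
        word = match.group(0).capitalize()
        if len(word) > 2 and word.startswith("Mc"):
            # MCDONALD -> Mcdonald -> McDonald
            word = "Mc" + word[2:].capitalize()
        return word
    # Spaces, dashes and apostrophes are not letters, so each part of a
    # compound name is capitalised separately and the separators are kept.
    return re.sub(r"[A-Za-z]+", fix_word, name)
```

For example, proper_case("MCDONALD") gives "McDonald" and proper_case("MARY ANN") gives "Mary Ann".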
Maybe this is too long but it is very easy and can be adapted for -, ', etc:
UPDATE tbl SET LastName = Case when (CharIndex(' ',lastname,1)<>0) then (Upper(Substring(lastname,1,1))+Lower(Substring(lastname,2,CharIndex(' ',lastname,1)-1)))+
(Upper(Substring(lastname,CharIndex(' ',lastname,1)+1,1))+
Lower(Substring(lastname,CharIndex(' ',lastname,1)+2,Len(lastname)-(CharIndex(' ',lastname,1)-1))))
else (Upper(Substring(lastname,1,1))+Lower(Substring(lastname,2,Len(lastname)-1))) end,
FirstName = Case when (CharIndex(' ',firstname,1)<>0) then (Upper(Substring(firstname,1,1))+Lower(Substring(firstname,2,CharIndex(' ',firstname,1)-1)))+
(Upper(Substring(firstname,CharIndex(' ',firstname,1)+1,1))+
Lower(Substring(firstname,CharIndex(' ',firstname,1)+2,Len(firstname)-(CharIndex(' ',firstname,1)-1))))
else (Upper(Substring(firstname,1,1))+Lower(Substring(firstname,2,Len(firstname)-1))) end;
Tony Rogerson has code that deals with:
double barrelled names eg Arthur Bentley-Smythe
Control characters
I haven't used it myself though...

Regexp MySql- Only strings containing two words

I have a table with rows of strings.
I'd like to search for those strings that consist of only two words.
I tried a few ways with [[:space:]] etc., but MySQL was also returning three- and four-word strings.
Try this:
select * from yourTable WHERE field REGEXP('^[[:alnum:]]+[[:blank:]]+[[:alnum:]]+$');
More details in the link:
http://dev.mysql.com/doc/refman/5.1/en/regexp.html
^\w+\s\w+$ should do well. (Note that MySQL versions before 8.0 do not support \w and \s in REGEXP; there, use POSIX classes such as [[:alnum:]] and [[:space:]] instead.)
Note: what I have seen more and more lately is that close to nobody uses the ^ and $ anchors.
They are absolutely needed if you want to tell whether a string starts or ends with something, or want to match the whole string exactly, as you do. Unanchored patterns like the one you used (I assume something like \w[:space]\w) match anywhere in the string, which means they also match whenever the condition is true for any part of a longer string!
Keep that in mind and Regex will serve you well :)
REGEXP ('^[a-z0-9]*[[:space:]][a-z0-9]*$')
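The effect of the anchors is easy to try outside the database. The sketch below uses Python's re module, whose syntax differs in places from MySQL's REGEXP, so treat it as an illustration of anchoring only; the names are hypothetical.

```python
import re

ANCHORED = re.compile(r"^\w+\s\w+$")   # the whole string must be two words
UNANCHORED = re.compile(r"\w+\s\w+")   # two words anywhere in the string

def is_two_words(text):
    """True only if the string is exactly word, one whitespace, word."""
    return ANCHORED.search(text) is not None

def contains_two_words(text):
    """True if any part of the string looks like two words."""
    return UNANCHORED.search(text) is not None
```

With "one two three", contains_two_words returns True, which is the behaviour described in the question, while is_two_words returns False.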