regular expression adding/removing pluralization from subject - mysql

I would like to perform a MySQL regular expression which will check a string for both singular and plural version of the string.
I am ignoring special characters and complex grammar rules and simply want to add or remove an 's'.
For example if the user enters 'shoes' it should return matches for both 'shoes' and 'shoe'. Conversely, if the user enters 'shoe' it should return matches for both 'shoe' and 'shoes.
I am able to optionally check against the plural version of the match as follows:
WHERE Store.tags RLIKE ('(^|,)+[[:space:]]*shoe(s)*[[:space:]]*(,|$)+')
I have been reading up about positive/negative look-ahead or look-behind and I seem to be constantly chasing my tail.

Well if you're only asking about removing or adding an s ... make it simple as your requirements ...
Just check if last letter is an s.. if yes -> remove it for singular, if no, add one for plural
WHERE (a LIKE '%shoes%' OR a LIKE '%shoe%')
It's as dumb as it gets but seeing you only need to check for s .. it's good enough.

Related

Performance of LIKE 'xyz%' v/s LIKE '%xyz'

I was wondering how the LIKE operator actually work.
Does it simply start from first character of the string and try matching pattern, one character moving to the right? Or does it look at the placement of the %, i.e. if it finds the % to be the first character of the pattern, does it start from the right most character and starts matching, moving one character to the left on each successful match?
Not that I have any use case in my mind right now, just curious.
edit: made question narrow
If there is an index on the column, putting constant characters in the front will lead your dbms to use a more efficient searching/seeking algorithm. But even at the simplest form, the dbms has to test characters. If it is able to find it doesn't match early on, it can discard it and move onto the next test.
The LIKE search condition uses wildcards to search for patterns within a string. For example:
WHERE name LIKE 'Mickey%'
will locate all values that begin with 'Mickey' optionally followed by any number of characters. The % is not case sensitive and not accent sensitive and you can use multiple %, for example
WHERE name LIKE '%mouse%'
will return all values with 'mouse' (or 'Mouse' or 'mousé') in it.
The % is inclusive, meaning that
WHERE name like '%A%'
will return all that starts with an 'A', contain 'A' or end with 'A'.
You can use _ (underscore) for any character on a single position:
WHERE name LIKE '_at%'
will give you all values with 'a' as the second letter and 't' as the third. The first letter can be anything. For example: 'Batman'
In T-SQL, if you use [] you can find values in a range.
WHERE name LIKE '[c-f]%'
it will find any value beginning with letter between c and f, inclusive. Meaning it will return any value that start with c, d, e or f. This [] is T-SQL only. Use [^ ] to find values not in a range.
Finding all values that contain a number:
WHERE name LIKE '%[0-9]%'
returns everything that has a number in it. Example: 'Godfather2'
If you are looking for all values with the 3rd position to be a '-' (dash) use two underscores:
WHERE NAME '__-%'
It will return for example: 'Lo-Res'
Finding the values with names ends in 'xyz' use:
WHERE name LIKE '%xyz'
returns anything that ends with 'xyz'
Finding a % sign in a name use brackets:
WHERE name LIKE '%[%]%'
will return for example: 'Top%Movies'
Searching for [ use brackets around it:
WHERE name LIKE '%[[]%'
gives results as: 'New York [NY]'
The database collation's sort order determines both case sensitivety and the sort order for the range of characters. You can optionally use COLLATE to specify collation sort order used by the LIKE operator.
Usually the main performance bottleneck is IO. The efficiency of the LIKE operator can be only important if your whole table fits in the memory otherwise IO will take most of the time.
AFAIK oracle can use indexes for prefix matching. (like 'abc%'), but these index cannot be used for more complex expressions.
Anyway if you have only this kind of queries you should consider using a simple index on the related column. (Probably this is true for other RDBMS's as well.)
Otherwise LIKE operator is generally slow, but most of the RDBMS have some kind of full text searching solution. I think the main reason of the slowness is that LIKE is too general. Usually full text indexes has lots of different options which can tell the database what you really want to search for, and with these additional information the DB can do its task in a more efficient way.
As a rule of thumb I think if you want to search in a text field and you think performance can be an issue, you should consider your RDBMS's full text searching solution, or the real goal is not text searching, but this is some kind of "design side effect", for example xml/json/statuses stored in a field as text, then probably you should consider choosing a more efficient data storing option. (if there is any...)

Finding small letter between two capital letters - MySQL

I've got problem - I need to find every single phrase like AbC (small b, between two Capital letters).
For Example a statement:
Little John had a ProBlEm and need to know how to do tHiS.
I need to select ProBlEm and tHiS (you see, BlE and HiS, one small letter in between two capital).
How can I select this?
In MySQL you can use a binary (to ensure case sensitivity) regular expression to filter for those records that contain such a pattern:
WHERE my_column REGEXP BINARY '[[:upper:]][[:lower:]][[:upper:]]'
However, it is not so straightforward to extract the substrings which match such a pattern from within MySQL. One can use a UDF, e.g. lib_mysqludf_preg, but it's probably a task more suited to being performed within your application layer. In either case, regular expressions can again help to simplify this task.
Firstly you have split the String. Please refer this SO Question
and then search each retrive word like
substring(word,2) LIKE '[A-Z]' COLLATE latin1_general_cs

Ideas for Find and Replace character

I need to search address fields and change one character to upper case if there is an apartment number. So '521 Main St. #3b' would change to '521 Main St. #3B'.
The way I know to do this would be to write a program that loops through the recordset, looks at the address field for the last character to see if it's an alpha, then if the character before it is a numeric, change the case of the last char and update the record.
Is this something that would be quicker/simpler with regular expressions (haven't ever used)?
If so, is this best done from within a programming environmnet or using a text editor such as Textmate or vi ? The data is in MySQL and Excel, but I can export it to a text file.
Thanks.
I solved this using TextMate which, once I began to understand a little regex, was simple. (details here Regex Syntax for making the last character Uppercase in TextMate)
Still, I wonder if something like sed or awk, (which I started to try out) might be a better tool. And the SQL solution that Olexa provided works. I just don't know how to have it apply to the entire recordset.
If the data is stored in MySQL, then it is better to process it there:
UPDATE addresses
SET address = CONCAT(LEFT(address, CHAR_LENGTH(address) - 1), UPPER(RIGHT(address, 1)))
WHERE address REGEXP BINARY '#[[:digit:]]+[[:lower:]]{1}$'
;
I've added BINARY because otherwise REGEXP is not case-sensitive, but BINARY may need to be omitted to support multi-byte strings. In this case, surplus updates will be made, but the result would be correct anyway.
P. S. An example on SQL Fiddle showing which values are affected, and how they are affected: http://sqlfiddle.com/#!2/b29326/1

MySQL: REGEXP to remove part of a record

I have a table "locales" with a column named "name". The records in name always begin with a number of characters folowed by an underscore (ie "foo_", "bar_"...). The record can have more then one underscore and the pattern before the underscore may be repeated (ie "foo_bar_", "foo_foo_").
How, with a simple query, can I get rid of everything before the first underscore including the first underscore itself?
I know how to do this in PHP, but I cannot understand how to do it in MySQL.
SELECT LOCATE('_', 'foo_bar_') ... will give you the location of the first underscore and SUBSTR('foo_bar_', LOCATE('_', 'foo_bar_')) will give you the substring starting from the first underscore. If you want to get rid of that one, too, increment the locate-value by one.
If you now want to replace the values in the tables itself, you can do this with an update-statement like UPDATE table SET column = SUBSTR(column, LOCATE('_', column)).
select substring('foo_bar_text' from locate('_','foo_bar_text'))
MySQL REGEXs can only match data, they can't do replacements. You'd need to do the replacing client-side in your PHP script, or use standard string operations in MySQL to do the changes.
UPDATE sometable SET somefield=RIGHT(LENGTH(somefield) - LOCATE('_', somefield));
Probably got some off-by-one errors in there, but that's the basic way of going about it.

MYSQL SELECT LIKE "masks"

I'm using a field in a table to hold information about varios checkboxes (60).
The field is parsed to a string to something like this
"0,0,0,1,0,1,0,1,..."
Now I want to make a search using a similar string to match the fields. I.e.
"?,?,1,?,?,1,..."
where the "?" means that it must be 0 or 1 (doesn't matter), but the "1" must match.
As i've seen the '%' is somewhat innapropriate for this case, don't?
Obviosly both strings have the same lenght.
Suggestions?
You can use the underscore (_) character to match a single character in the mask.
Taken from MySQL documentation.