MySql Not Like Regexp? - mysql

I'm trying to find rows where the first character is not a digit. I have this:
SELECT DISTINCT(action) FROM actions
WHERE qkey = 140 AND action NOT REGEXP '^[:digit:]$';
But, I'm not sure how to make sure it checks just the first character...

First there is a slight error in your query. It should be:
NOT REGEXP '^[[:digit:]]'
Note the double square brackets: the POSIX class [:digit:] is only valid inside a bracket expression. You could also rewrite it as the following, which additionally excludes the empty string:
REGEXP '^[^[:digit:]]'
Also note that using REGEXP prevents an index from being used and will result in a table scan or index scan. If you want a more efficient query you should try to rewrite the query without using REGEXP if it is possible:
SELECT DISTINCT(action) FROM actions
WHERE qkey = 140 AND action < '0'
UNION ALL
SELECT DISTINCT(action) FROM actions
WHERE qkey = 140 AND action >= ':'
Then add an index on (qkey, action). It's not as pleasant to read, but it should give better performance. If you only have a small number of actions for each qkey, it probably won't give any noticeable performance increase, so you can stick with the simpler query.
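As a rough check of the range rewrite above, here is a minimal sketch that runs the two-range query against an in-memory SQLite database from Python. SQLite and its default binary string comparison are assumptions standing in for MySQL (a MySQL collation may order punctuation differently), and the sample rows and the index name idx_qkey_action are made up for illustration.

```python
import sqlite3

# In-memory SQLite database as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE actions (qkey INTEGER, action TEXT)")
rows = [
    (140, "1start"), (140, "9lives"), (140, "run"), (140, "Jump"),
    (140, "-dash"), (140, ":colon"), (140, ""), (140, "run"), (141, "skip"),
]
conn.executemany("INSERT INTO actions VALUES (?, ?)", rows)
# Hypothetical index name; the answer only asks for an index on (qkey, action).
conn.execute("CREATE INDEX idx_qkey_action ON actions (qkey, action)")

# ':' is the character right after '9' in ASCII, so the two ranges
# together cover every string that does not start with a digit.
range_query = """
SELECT DISTINCT action FROM actions WHERE qkey = 140 AND action < '0'
UNION ALL
SELECT DISTINCT action FROM actions WHERE qkey = 140 AND action >= ':'
"""
via_ranges = sorted(r[0] for r in conn.execute(range_query))

# Reference answer computed in Python: first character is not an ASCII digit.
expected = sorted({a for q, a in rows if q == 140 and not ("0" <= a[:1] <= "9")})

print(via_ranges)
print(via_ranges == expected)
```

Note that the empty string falls into the first range, which mirrors NOT REGEXP '^[[:digit:]]' rather than REGEXP '^[^[:digit:]]'.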

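Your current regex only matches values that consist of exactly one character, so it does not test the first character alone. Remove the $ from the end, since it means "end of value"; without it, only the first character is checked unless you tell the pattern to check more.
^[[:digit:]] means "start of the value, followed by one digit" (the [:digit:] class has to sit inside a bracket expression), so combine it with NOT REGEXP to keep the rows that do not start with a digit.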

Related

Match on substring

How would I do the following query? I want to match on the entire string minus the last two characters. In python it would be:
>>> x='hello'
>>> x[:-2]
'hel'
Conceptually, in SQL it would be:
select * from main_tmp where vid_country_key[:-2] = '2wh_gQwkMQg'
How would I actually do this? And would the above be faster or slower than:
select * from main_tmp where vid_country_key LIKE '2wh_gQwkMQg%'
As you probably know, the two example queries you provided are not equivalent, but altering the second like so should give the same results as the first, and might keep the potential indexing advantage of the second:
SELECT *
FROM main_tmp
WHERE vid_country_key LIKE '2wh_gQwkMQg%'
AND LENGTH(vid_country_key) = LENGTH('2wh_gQwkMQg')+2
;
If it doesn't use the index (assuming there is one), you could try changing AND to HAVING, or look into USE INDEX.
Oh, for the record, if you wanted to know the actual syntax for your first example, it would be
WHERE LEFT(vid_country_key, GREATEST(LENGTH(vid_country_key)-2, 0)) = '2wh_gQwkMQg'
The GREATEST is needed in case the length of the field is less than 2.
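The two forms above can be compared with a small sketch, again using an in-memory SQLite database from Python as an assumed stand-in for MySQL (substr and max replace LEFT and GREATEST). The sample keys are invented. One detail worth seeing in practice: the _ in '2wh_gQwkMQg%' is itself a LIKE wildcard, so the sketch also builds an escaped pattern.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE main_tmp (vid_country_key TEXT)")
keys = ["2wh_gQwkMQgUS", "2wh_gQwkMQgGBR", "2wh_gQwkMQg",
        "2whXgQwkMQgUS", "zzzzzzzzzzzUS", "a"]
conn.executemany("INSERT INTO main_tmp VALUES (?)", [(k,) for k in keys])

target = "2wh_gQwkMQg"

# Slice form: everything except the last two characters equals the target.
slice_rows = sorted(r[0] for r in conn.execute(
    "SELECT vid_country_key FROM main_tmp "
    "WHERE substr(vid_country_key, 1, max(length(vid_country_key) - 2, 0)) = ?",
    (target,)))

# LIKE form with the pattern taken literally: '_' matches any one character.
unescaped_rows = sorted(r[0] for r in conn.execute(
    "SELECT vid_country_key FROM main_tmp "
    "WHERE vid_country_key LIKE ? AND length(vid_country_key) = length(?) + 2",
    (target + "%", target)))

# LIKE form with '_' and '%' escaped so they only match themselves.
escaped = target.replace("\\", "\\\\").replace("_", "\\_").replace("%", "\\%")
escaped_rows = sorted(r[0] for r in conn.execute(
    "SELECT vid_country_key FROM main_tmp "
    "WHERE vid_country_key LIKE ? ESCAPE '\\' "
    "AND length(vid_country_key) = length(?) + 2",
    (escaped + "%", target)))

print(slice_rows)
print(unescaped_rows)
print(escaped_rows)
```

With this data the escaped LIKE form agrees with the slice form, while the unescaped pattern also picks up '2whXgQwkMQgUS'.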

Performance of LIKE 'xyz%' v/s LIKE '%xyz'

I was wondering how the LIKE operator actually works.
Does it simply start from the first character of the string and try to match the pattern, moving one character to the right at a time? Or does it look at the placement of the %, i.e. if the % is the first character of the pattern, does it start from the rightmost character and match towards the left, one character per successful match?
Not that I have any use case in my mind right now, just curious.
edit: made question narrow
If there is an index on the column, putting constant characters in the front will lead your dbms to use a more efficient searching/seeking algorithm. But even at the simplest form, the dbms has to test characters. If it is able to find it doesn't match early on, it can discard it and move onto the next test.
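To make the index argument concrete, here is a toy model in Python, not how any particular DBMS is implemented: a sorted list plays the role of the index, a constant prefix becomes a range lookup via binary search, and a leading % forces a test of every value. The function names prefix_matches and suffix_matches are hypothetical.

```python
from bisect import bisect_left

def prefix_matches(sorted_values, prefix):
    """Model of LIKE 'prefix%' on an index: two binary searches bound the range.

    Assumes a non-empty prefix and plain code-point ordering.
    """
    lo = bisect_left(sorted_values, prefix)
    # Smallest string that sorts after every string starting with the prefix.
    upper = prefix[:-1] + chr(ord(prefix[-1]) + 1)
    hi = bisect_left(sorted_values, upper)
    return sorted_values[lo:hi]

def suffix_matches(values, suffix):
    """Model of LIKE '%suffix': the ordering does not help, so scan everything."""
    return [v for v in values if v.endswith(suffix)]

names = sorted(["abcxyz", "xyz", "xyzabc", "xyzzy", "xya", "xz", "zxyz"])

print(prefix_matches(names, "xyz"))
print(suffix_matches(names, "xyz"))
```

The prefix lookup touches only a contiguous slice of the sorted data, while the suffix search has to inspect every entry.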
The LIKE search condition uses wildcards to search for patterns within a string. For example:
WHERE name LIKE 'Mickey%'
will locate all values that begin with 'Mickey', optionally followed by any number of characters. Whether the match is case sensitive or accent sensitive depends on the collation, not on the %. You can use multiple % wildcards, for example
WHERE name LIKE '%mouse%'
will return all values with 'mouse' in it (and, under a case-insensitive and accent-insensitive collation, also 'Mouse' or 'mousé').
The % also matches zero characters, meaning that
WHERE name LIKE '%A%'
will return all values that start with 'A', contain 'A', or end with 'A'.
You can use _ (underscore) for any character on a single position:
WHERE name LIKE '_at%'
will give you all values with 'a' as the second letter and 't' as the third. The first letter can be anything. For example: 'Batman'
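The % and _ wildcards shown so far can be tried out with a short sketch, assuming an in-memory SQLite database from Python. SQLite's LIKE is case-insensitive for ASCII letters by default, which here plays the role of a case-insensitive collation; the table name people and its rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT)")
names = ["Batman", "Catwoman", "Mickey Mouse", "mouse trap", "attic"]
conn.executemany("INSERT INTO people VALUES (?)", [(n,) for n in names])

def like(pattern):
    """Return the names matching a LIKE pattern, in a fixed order."""
    cur = conn.execute(
        "SELECT name FROM people WHERE name LIKE ? ORDER BY name", (pattern,))
    return [r[0] for r in cur]

# '_' stands for exactly one character, '%' for zero or more.
second_third = like("_at%")
contains_mouse = like("%mouse%")
starts_with_mickey = like("Mickey%")

print(second_third)
print(contains_mouse)
print(starts_with_mickey)
```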
In T-SQL, if you use [] you can find values in a range.
WHERE name LIKE '[c-f]%'
This finds any value beginning with a letter between c and f, inclusive, i.e. any value that starts with c, d, e or f. The [] syntax is T-SQL only. Use [^ ] to find values not in a range.
Finding all values that contain a number:
WHERE name LIKE '%[0-9]%'
returns everything that has a number in it. Example: 'Godfather2'
If you are looking for all values with a '-' (dash) in the 3rd position, use two underscores:
WHERE name LIKE '__-%'
It will return for example: 'Lo-Res'
To find values that end in 'xyz', use:
WHERE name LIKE '%xyz'
returns anything that ends with 'xyz'
To find a % sign in a name, put it in brackets:
WHERE name LIKE '%[%]%'
will return for example: 'Top%Movies'
To search for [, put brackets around it:
WHERE name LIKE '%[[]%'
gives results as: 'New York [NY]'
The database collation's sort order determines both case sensitivity and the sort order for the range of characters. You can optionally use COLLATE to specify the collation used by the LIKE operator.
Usually the main performance bottleneck is IO. The efficiency of the LIKE operator itself only matters if your whole table fits in memory; otherwise IO will take most of the time.
AFAIK Oracle can use indexes for prefix matching (LIKE 'abc%'), but these indexes cannot be used for more complex expressions.
Anyway, if you only have this kind of query, you should consider using a simple index on the related column. (This is probably true for other RDBMSs as well.)
Otherwise the LIKE operator is generally slow, but most RDBMSs have some kind of full-text search solution. I think the main reason for the slowness is that LIKE is too general. Full-text indexes usually have lots of options that tell the database what you really want to search for, and with this additional information the DB can do its task more efficiently.
As a rule of thumb, if you want to search in a text field and you think performance can be an issue, you should consider your RDBMS's full-text search solution. If the real goal is not text searching but the search is a kind of "design side effect", for example xml/json/statuses stored in a field as text, then you should probably consider choosing a more efficient way of storing the data (if there is one).

Removing single quotes from comparison in select statement

I have a table where a field can have single quotes, but I need to be able to search by that field without single quotes. For example, if the search query is "Johns favorite", I need to be able to find a row where that field contains "John's favorite". I was looking into regex for it, but that seems to return a 0 or 1 when used in a select statement, if I'm understanding it correctly.
Take a look at:
http://www.artfulsoftware.com/infotree/queries.php#552
This will give you the distance between two strings. For example, you can check whether the Levenshtein distance is less than 3, which means that fewer than 3 single-character edits are needed to turn one string into the other.
Try using REPLACE:
SELECT
IF(
REPLACE("John's favorite","'","") = "Johns favorite" ,
"found",
"not found"
)
It's not optimal but it should do the job.
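The REPLACE idea can also be applied to a column in a WHERE clause. The following is a minimal sketch using an in-memory SQLite database from Python as an assumed stand-in for MySQL; the table name items, the column title, and the rows are hypothetical. Wrapping the column in a function generally means an ordinary index on it cannot be used, which fits the "not optimal" caveat.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (title TEXT)")
titles = ["John's favorite", "Johns favorite", "Jane's favorite", "O'Neil's pick"]
conn.executemany("INSERT INTO items VALUES (?)", [(t,) for t in titles])

def search(term):
    """Compare column and search term with single quotes stripped from both."""
    normalized = term.replace("'", "")
    cur = conn.execute(
        # '''' is a SQL string literal containing one single quote.
        "SELECT title FROM items WHERE replace(title, '''', '') = ? ORDER BY title",
        (normalized,))
    return [r[0] for r in cur]

print(search("Johns favorite"))
print(search("ONeils pick"))
```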

Get all records between two alpha variables in alpha order mysql

I have a database of words for dictionary lookup purposes. What I need to be able to do with MySQL is allow a user to input two variables (alpha), and my script will return every word that starts with either of those variables and everything in between.
Let's say the two variables are:
$letters1 = abor
$letters2 = accr
I want to get every word that starts with abor through accr. I need to return every word that would fit between those two starting points. So an example SQL statement that I know does not work but might help you understand what I am asking:
SELECT word from table1 WHERE word LIKE '%abor%' THROUGH '%accr%' ORDER BY word ASC
I know that THROUGH is not an operator but that's the general idea of what I need to accomplish.
If you merely want words that start with letters between the two variables, you can use MySQL's BETWEEN ... AND ... operator:
SELECT word FROM table1 WHERE word BETWEEN 'abor' AND 'accr' ORDER BY word
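One thing to watch with BETWEEN is that it compares whole strings, so a longer word that starts with the upper bound (such as 'accrue') sorts after 'accr' and is left out. The sketch below shows this with an in-memory SQLite database from Python (an assumed stand-in for MySQL, using binary string comparison and an invented word list), together with one possible workaround: a half-open range whose upper bound is the prefix with its last character incremented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (word TEXT)")
words = ["abolish", "aborigine", "abort", "about", "abyss",
         "accord", "accr", "accrue", "accuse"]
conn.executemany("INSERT INTO table1 VALUES (?)", [(w,) for w in words])

low, high = "abor", "accr"

# Plain BETWEEN: inclusive on both ends, but compares complete strings.
between_rows = [r[0] for r in conn.execute(
    "SELECT word FROM table1 WHERE word BETWEEN ? AND ? ORDER BY word",
    (low, high))]

# Half-open range: 'accr' -> 'accs', so every word starting with 'accr' fits.
upper = high[:-1] + chr(ord(high[-1]) + 1)
prefix_rows = [r[0] for r in conn.execute(
    "SELECT word FROM table1 WHERE word >= ? AND word < ? ORDER BY word",
    (low, upper))]

print(between_rows)
print(prefix_rows)
```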

mysql fulltext MATCH,AGAINST returning 0 results

I am trying to follow: http://dev.mysql.com/doc/refman/4.1/en/fulltext-natural-language.html
in an attempt to improve search queries, both in speed and the ability to order by score.
However, I am having trouble with this SQL ("skitt" is used as a search term just so I can try to match Skittles):
SELECT
id,name,description,price,image,
MATCH (name,description)
AGAINST ('skitt')
AS score
FROM
products
WHERE
MATCH (name,description)
AGAINST ('skitt')
It returns 0 results. I am trying to find out why. I think I might have set my indexes up wrong, but I'm not sure; this is the first time I've strayed away from LIKE!
Here is my table structure and data:
Thank you!
By default certain words are excluded from the search. These are called stopwords. "a" is an example of a stopword. You could test your query by using a word that is not a stopword, or you can disable stopwords:
How can I write full search index query which will not consider any stopwords?
If you also want to match prefixes, use the truncation operator in boolean mode (for example, AGAINST ('skitt*' IN BOOLEAN MODE)):
*
The asterisk serves as the truncation (or wildcard) operator. Unlike the other operators, it should be appended to the word to be affected. Words match if they begin with the word preceding the * operator.