Match on substring - mysql

How would I do the following query? I want to match on the entire string minus the last two characters. In python it would be:
>>> x='hello'
>>> x[:-2]
'hel'
Conceptually, in SQL it would be:
select * from main_tmp where vid_country_key[:-2] = '2wh_gQwkMQg'
How would I actually do this? And would the above be faster or slower than:
select * from main_tmp where vid_country_key LIKE '2wh_gQwkMQg%'

As you probably know, the two example queries you provided are not equivalent, but altering the second like so should give the same results as the first, and might keep the potential indexing advantage of the second:
SELECT *
FROM main_tmp
WHERE vid_country_key LIKE '2wh_gQwkMQg%'
AND LENGTH(vid_country_key) = LENGTH('2wh_gQwkMQg')+2
;
If it doesn't use the index (assuming there is one), you could try changing AND to HAVING, or look into USE INDEX.
Oh, for the record, if you wanted to know the actual syntax for your first example, it would be
WHERE LEFT(vid_country_key, GREATEST(LENGTH(vid_country_key)-2, 0)) = '2wh_gQwkMQg'
The GREATEST is needed in case the length of the field is less than 2.
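As a rough check of the reasoning above, here is a small Python sketch comparing the two predicates on made-up sample keys. The function names are hypothetical, and the comparison is exact: it ignores that LIKE treats _ as a single-character wildcard and may be case-insensitive depending on collation, and that MySQL's LENGTH counts bytes while CHAR_LENGTH counts characters.

```python
def matches_by_slice(value: str, target: str) -> bool:
    # Analogue of: LEFT(value, GREATEST(LENGTH(value) - 2, 0)) = target
    keep = max(len(value) - 2, 0)
    return value[:keep] == target


def matches_by_prefix_and_length(value: str, target: str) -> bool:
    # Analogue of: value LIKE 'target%' AND LENGTH(value) = LENGTH(target) + 2
    return value.startswith(target) and len(value) == len(target) + 2


target = "2wh_gQwkMQg"
# Made-up sample values: two extra characters, none, three, and a short string.
keys = ["2wh_gQwkMQgUS", "2wh_gQwkMQg", "2wh_gQwkMQgUSA", "x"]

for key in keys:
    print(key, matches_by_slice(key, target), matches_by_prefix_and_length(key, target))
```

For a non-empty target the two predicates agree on every input; they differ only when the target is the empty string and the value is shorter than two characters.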

Related

How do I create a SELECT conditional in MySQL where the conditional is the character length of the LIKE match?

I am working on a search function, where the matches are weighted based on certain conditions. One of the conditions I want to add weight to is matches where the character length of the query string in a LIKE match is longer than 4.
This is roughly what I want the query to look like. %s is meant to represent the actual match found by LIKE, but I don't think it does. I'm wondering if there is a special variable in MySQL that represents the precise character match found by LIKE.
SELECT help.*,
IF(CHAR_LENGTH(%s) > 4, 2, 0) w
FROM help
WHERE (
(title LIKE '%this%' OR title LIKE '%testy%' OR title LIKE '%test%') OR
(content LIKE '%this%' OR content LIKE '%testy%' OR content LIKE '%test%')
) LIMIT 1000
edit: I could in the PHP split the search string array into two arrays based on the character length of the elements, with two separate queries that return different values for 'w', then combine the results, but I'd rather not do that, as it seems to me that would be awkward, messy, and slow.
Check out FULLTEXT as another way to discover rows. It will be faster, but won't address your question.
This probably has the effect you want.
SELECT ....
IF ( (title LIKE '%testy%' OR
content LIKE '%testy%'), 2, 0)
....
Note that the "match" in your LIKEs includes the %, so it is the entire length of the string. I don't think that is what you wanted.
REGEXP "(this|testy|that)" will match either 4 or 5 characters (in this example). It may be possible to do something with REGEXP_REPLACE to replace that with the empty string, then see how much it shrank.
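The "see how much it shrank" idea can be sketched in Python, with re.sub standing in for REGEXP_REPLACE. This is only an illustration of the arithmetic: matched_length is a made-up helper, Python's matching is case-sensitive by default (MySQL's depends on collation), and the shrinkage is the total length of all matches, not of a single one.

```python
import re


def matched_length(text: str, words: list[str]) -> int:
    # Try longer words first so "testy" is preferred over "test".
    ordered = sorted(words, key=len, reverse=True)
    pattern = "|".join(re.escape(word) for word in ordered)
    # Remove every match, then measure how many characters disappeared.
    return len(text) - len(re.sub(pattern, "", text))


words = ["this", "testy", "test"]
print(matched_length("a testy case", words))
print(matched_length("a test case", words))
print(matched_length("nothing here", words))
```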
I think the answer to my question is that what I wanted to do isn't possible. There is no special variable in MySQL representing the core character match in a WHERE conditional where LIKE is the operator. The match is the contents of the returned data row.
What I did to reach my objective was to take the original dynamic list of search tokens, iterate through that list, and perform a search on each token, with the SQL tailored to the conditions that matched each token.
As I did this I built an array of the search results, using the id for the database row as the index for the array. This allowed me to perform calculations with the array elements, while avoiding duplicates.
I'm not posting the PHP code because the original question was about the SQL.
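For readers who want the shape of that approach, here is a minimal Python sketch. It is not the poster's PHP: score_rows, the sample rows, and the in-memory substring test (standing in for one LIKE '%token%' query per token) are all assumptions made for illustration.

```python
def score_rows(rows, tokens):
    """Accumulate a weight per row id, keyed by id so duplicates are merged."""
    scores = {}
    for token in tokens:
        # Tokens longer than 4 characters earn the extra weight.
        weight = 2 if len(token) > 4 else 0
        needle = token.lower()
        for row_id, title, content in rows:
            # Stand-in for: title LIKE '%token%' OR content LIKE '%token%'
            if needle in title.lower() or needle in content.lower():
                scores[row_id] = scores.get(row_id, 0) + weight
    return scores


# Hypothetical rows of (id, title, content).
rows = [
    (1, "This is testy", "x"),
    (2, "a test", "y"),
    (3, "none", "zzz"),
]
print(score_rows(rows, ["this", "testy", "test"]))
```

Row 3 matches no token, so it never enters the result.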

How to avoid a specific character in MySQL

I have a SQL table, with genetic information (name of the gene, function, strand...)
I want to retrieve the amount of chromosomes (21 as I'm working with the human genome). Problem is that some chromosomes are "repeated". For example:
SELECT DISTINCT chrom FROM table LIMIT 6;
chr1
chr10
chr10_GL383545v1_alt
chr10_GL383546v1_alt
chr11
chr11_JH159136v1_alt
As you can see I have more than one chr10, so if I count the DISTINCT chromosomes I get about 6000.
I've tried using NOT LIKE "_" but it didn't work. I thought I could "force" the result with LIKE "chr1" and so on, but that feels like cheating and is not exactly what I'm looking for. I would like a way to exclude every "_", but running
SELECT COUNT(DISTINCT chrom) NOT LIKE "_" FROM table; gives me back just 1 result...
LEFT is not optimal either, because I would have to specify the length of the string, and, I want a system that I could use without knowing anything about the expected result. So running a LEFT "", 4 and LEFT "", 5 is not what I'm searching for.
Is there a way I can count everything that does NOT CONTAIN a certain character? Is there a better strategy?
Thank you very much!
Underscore is a wildcard character itself, so it must be escaped. Furthermore you want to match any characters before and after that underscore character so the % wildcard is needed around the escaped underscore.
SELECT count(chrom) FROM table WHERE chrom NOT LIKE '%\_%';
Also you could use substring_index() to get distinct string before the underscore and count those:
SELECT COUNT(DISTINCT SUBSTRING_INDEX(chrom, '_', 1)) FROM table;
Although that is almost definitely going to be slower.
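To see what the SUBSTRING_INDEX variant computes, here is a Python analogue on the six sample values from the question. The split call plays the role of SUBSTRING_INDEX(chrom, '_', 1); this is a sketch of the logic only, not of MySQL's performance.

```python
chroms = [
    "chr1",
    "chr10",
    "chr10_GL383545v1_alt",
    "chr10_GL383546v1_alt",
    "chr11",
    "chr11_JH159136v1_alt",
]

# Keep only the part before the first underscore, then deduplicate.
base_names = {chrom.split("_", 1)[0] for chrom in chroms}
print(sorted(base_names))
print(len(base_names))
```

One difference from the NOT LIKE filter: this version also counts a chromosome that appears only in its _alt form, whereas the filter counts only rows that contain no underscore at all.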
The problem with SELECT COUNT(DISTINCT chrom) NOT LIKE "_" FROM table; is the location of the comparison and the lack of the % wildcards in the LIKE comparison string.
Either of the following should work for you:
SELECT COUNT(DISTINCT chrom) FROM table WHERE chrom NOT LIKE '%|_%' ESCAPE '|';
Using ESCAPE and specifying an escape character after the LIKE is easier than using \ in many cases since, depending on your scenario, you may need to remember to double-escape the backslash (or, if you are writing this in, say, PHP, triple-escape it).
SELECT COUNT(DISTINCT chrom) FROM table WHERE LOCATE('_', chrom) = 0;
LOCATE() is also easier to use here, but I believe it would be slower than just doing a LIKE. The performance difference is probably insignificant, so in most cases it's just preference.
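The effect of the unescaped underscore can be tried out with Python's bundled sqlite3 module as a stand-in. SQLite is not MySQL, but it shares the _ and % wildcards and the ESCAPE clause; the table name genes and the sample rows are made up, and instr() is used in place of LOCATE() because SQLite lacks the latter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE genes (chrom TEXT)")  # 'genes' is a made-up table name
sample = [
    "chr1", "chr1", "chr10", "chr10_GL383545v1_alt",
    "chr10_GL383546v1_alt", "chr11", "chr11_JH159136v1_alt",
]
conn.executemany("INSERT INTO genes (chrom) VALUES (?)", [(c,) for c in sample])


def count_distinct(where_clause: str) -> int:
    # Fixed, trusted strings only; never build SQL this way from user input.
    sql = f"SELECT COUNT(DISTINCT chrom) FROM genes WHERE {where_clause}"
    return conn.execute(sql).fetchone()[0]


# Unescaped: _ matches any single character, so every non-empty value is excluded.
unescaped = count_distinct("chrom NOT LIKE '%_%'")
# Escaped: only values containing a literal underscore are excluded.
escaped = count_distinct("chrom NOT LIKE '%|_%' ESCAPE '|'")
# Position-based check, analogous to LOCATE('_', chrom) = 0 in MySQL.
by_position = count_distinct("instr(chrom, '_') = 0")

print(unescaped, escaped, by_position)
```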
Use REGEXP if you wish to keep it simple. LIKE is faster though.
SELECT count(chrom) FROM table WHERE chrom NOT REGEXP '_';
I also recommend INSTR which I think will perform better than REGEXP.
SELECT count(chrom) FROM table WHERE INSTR(chrom, '_')=0;

How to use prefix wildcards like '*abc' with match-against

I have the following query :
SELECT * FROM `user`
WHERE MATCH (user_login) AGAINST ('supriya*' IN BOOLEAN MODE)
Which outputs all the records starting with 'supriya'.
Now I want something that will find all the records ending with e.g. 'abc'.
I know that * cannot be prepended, and it doesn't work when I try it. I have searched a lot but couldn't find anything regarding this.
If I give query the string priya ..it should return all records ending with priya.
How do I do this?
MATCH doesn't work with leading wildcards, so matching with *abc won't work. You will have to use LIKE to achieve this:
SELECT * FROM user WHERE user_login LIKE '%abc';
This will be very slow however.
If you really need to match for the ending of the string, and you have to do this often while the performance is killing you, a solution would be to create a separate column in which you reverse the strings, so you got:
user_login    user_login_rev
xyzabc        cbazyx
Then, instead of looking for '%abc', you can look for 'cba%' which is much faster if the column is indexed. And you can again use MATCH if you like to search for 'cba*'. You will just have to reverse the search string as well.
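Why the reversed column helps can be sketched in Python, with a sorted list playing the role of the index on user_login_rev. The logins and the find_by_suffix helper are hypothetical; the point is only that a suffix search on the original string becomes a prefix search, and therefore a contiguous range, on the reversed one.

```python
import bisect

logins = ["xyzabc", "supriya", "priya", "abcxyz"]  # made-up sample data

# Sorted (reversed_login, login) pairs stand in for an index on user_login_rev.
rev_index = sorted((login[::-1], login) for login in logins)


def find_by_suffix(suffix: str) -> list[str]:
    prefix = suffix[::-1]  # reverse the search string as well
    # Jump to the first entry whose reversed login is >= the prefix.
    start = bisect.bisect_left(rev_index, (prefix,))
    matches = []
    for reversed_login, login in rev_index[start:]:
        if not reversed_login.startswith(prefix):
            break  # sorted order means no later entry can match
        matches.append(login)
    return matches


print(find_by_suffix("priya"))
print(find_by_suffix("abc"))
```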
I believe the selection of FULL-TEXT Searching isn't relevant here. If you are interested in searching some fields based on wildcards like:
%word% ( word anywhere in the string)
word% ( starting with word)
%word ( ending with word)
best option is to use LIKE clause as GolezTrol has mentioned.
However, if you are interested in advanced/text based searching, FULL-TEXT search is the option.
Limitations with LIKE:
There are some limitations with this clause. Suppose you use something like 'good%' (anything starting with good). It may return irrelevant results like goods, goody.
So make sure you understand what you are doing and what is required.

What's the difference between '=' operator and LIKE when not using wildcards

I'm asking this question because I couldn't find an existing one that covers the same situation. When I use LIKE, I get CONSISTENT RESULTS, and when I use the (=) operator I get INCONSISTENT RESULTS.
THE CASE
I have a BIG VIEW (viewX) with multiple inner joins and left joins, where some columns have null values, because the database definition allows for that.
When I open this VIEW I see for example: 8 rows as result.
When I run, for example, select * from viewX where column_int = 34 and type_string = 'xyz', the query returns 100 rows that do not appear in the result of the view. [INCONSISTENT]
BUT
When I run select * from viewX where column_int = 34 and type_string like 'xyz', the query returns only 4 rows, all of which appear in the view when I open it (see 1.) [CONSISTENT]
Does anyone have an idea of what is happening here?
From the documentation.....
'Per the SQL standard, LIKE performs matching on a per-character basis, thus it can produce results different from the = comparison operator: '
more importantly (when using LIKE):
'string comparisons are not case sensitive unless one of the operands is a binary string'
from :
http://dev.mysql.com/doc/refman/5.0/en/string-comparison-functions.html
Per the MySQL documentation LIKE does function differently than =, especially when you have trailing or leading spaces.
You need to post your actual query but I'm guessing it's related to the known variances.
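The trailing-space variance mentioned above can be modelled in a few lines of Python. This is a simplified model, not MySQL itself: it assumes a PAD SPACE collation (where = ignores trailing spaces on CHAR/VARCHAR values), it leaves case sensitivity out entirely, and both function names are made up.

```python
def equals_pad_space(a: str, b: str) -> bool:
    # Model of '=' under a PAD SPACE collation: trailing spaces are ignored.
    return a.rstrip(" ") == b.rstrip(" ")


def like_without_wildcards(value: str, pattern: str) -> bool:
    # Model of LIKE with no % or _ in the pattern: compared character by
    # character, so trailing spaces are significant.
    return value == pattern


stored = "xyz "  # hypothetical column value with a trailing space
print(equals_pad_space(stored, "xyz"))
print(like_without_wildcards(stored, "xyz"))
```

Under these assumptions a value such as 'xyz ' satisfies = 'xyz' but not LIKE 'xyz', which is one way the two operators can return different row counts.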

MySql Not Like Regexp?

I'm trying to find rows where the first character is not a digit. I have this:
SELECT DISTINCT(action) FROM actions
WHERE qkey = 140 AND action NOT REGEXP '^[:digit:]$';
But, I'm not sure how to make sure it checks just the first character...
First there is a slight error in your query. It should be:
NOT REGEXP '^[[:digit:]]'
Note the double square brackets. You could also rewrite it as the following to avoid also matching the empty string:
REGEXP '^[^[:digit:]]'
Also note that using REGEXP prevents an index from being used and will result in a table scan or index scan. If you want a more efficient query you should try to rewrite the query without using REGEXP if it is possible:
SELECT DISTINCT(action) FROM actions
WHERE qkey = 140 AND action < '0'
UNION ALL
SELECT DISTINCT(action) FROM actions
WHERE qkey = 140 AND action >= ':'
Then add an index on (qkey, action). It's not as pleasant to read, but it should give better performance. If you only have a small number of actions for each qkey then it probably won't give any noticeable performance increase, so you can stick with the simpler query.
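The range rewrite relies on ':' being the character immediately after '9' in ASCII. A short Python check of that logic follows, assuming plain byte-order comparison (a MySQL collation may order characters differently); the helper names and sample values are made up.

```python
import re


def not_digit_first_regex(value: str) -> bool:
    # Analogue of: action NOT REGEXP '^[[:digit:]]'
    return re.match(r"[0-9]", value) is None


def not_digit_first_range(value: str) -> bool:
    # Analogue of: action < '0' OR action >= ':'
    return value < "0" or value >= ":"


samples = ["0", "0abc", "9z", "/x", ":x", "abc", "", "a1"]
for s in samples:
    print(repr(s), not_digit_first_regex(s), not_digit_first_range(s))
```

Both predicates accept the empty string, which matches the remark that NOT REGEXP '^[[:digit:]]' also matches empty values.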
Your current regex will match values consisting of exactly one digit, not the first character only. Just remove the $ from the end of it, that means "end of value". It'll only check the first character unless you tell it to check more.
^[[:digit:]] will work; that means "start of the value, followed by one digit". Note that the character class needs the double brackets.
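The effect of the anchors can be seen with Python's re module. Python has no POSIX [[:digit:]] class, so [0-9] stands in for it here, and the sample values are made up.

```python
import re

values = ["1abc", "abc", "7", "", "a1"]

# '^[0-9]$' requires the whole value to be exactly one digit.
exactly_one_digit = [v for v in values if re.search(r"^[0-9]$", v)]

# Negating '^[0-9]' keeps values whose first character is not a digit,
# including the empty string.
not_starting_with_digit = [v for v in values if not re.search(r"^[0-9]", v)]

# '^[^0-9]' requires a first character that is a non-digit, so the empty
# string is excluded.
starting_with_non_digit = [v for v in values if re.search(r"^[^0-9]", v)]

print(exactly_one_digit)
print(not_starting_with_digit)
print(starting_with_non_digit)
```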