How to avoid a specific character in MySQL - mysql

I have a SQL table, with genetic information (name of the gene, function, strand...)
I want to retrieve the amount of chromosomes (21 as I'm working with the human genome). Problem is that some chromosomes are "repeated". For example:
SELECT DISTINCT chrom FROM table LIMIT 6;
chr1
chr10
chr10_GL383545v1_alt
chr10_GL383546v1_alt
chr11
chr11_JH159136v1_alt
As you can see I have more than one chr10, so if I count the DISTINCT chromosomes I get about 6000.
I've tried using NOT LIKE "_" but didn't work. I've thought I could "force" the result with LIKE "chr1" and so on, but I feel like cheating and is not exactly what I'm searching for. I would like a way to avoid every "_", but running
SELECT COUNT(DISTINCT chrom) NOT LIKE "_" FROM table; gives me back just 1 result...
LEFT is not optimal either, because I would have to specify the length of the string, and, I want a system that I could use without knowing anything about the expected result. So running a LEFT "", 4 and LEFT "", 5 is not what I'm searching for.
Is there a way I can count everything that does NOT CONTAIN a certain character? Is there a better strategy?
Thank you very much!

Underscore is a wildcard character itself, so it must be escaped. Furthermore you want to match any characters before and after that underscore character so the % wildcard is needed around the escaped underscore.
SELECT COUNT(DISTINCT chrom) FROM table WHERE chrom NOT LIKE '%\_%';
Also you could use substring_index() to get distinct string before the underscore and count those:
SELECT COUNT(DISTINCT SUBSTRING_INDEX(chrom, '_', 1)) FROM table;
Although that is almost definitely going to be slower.

The problem with SELECT COUNT(DISTINCT chrom) NOT LIKE "_" FROM table; is the location of the comparison and the lack of the % wildcards in the LIKE comparison string.
Either of the following should work for you:
SELECT COUNT(DISTINCT chrom) FROM table WHERE chrom NOT LIKE '%|_%' ESCAPE '|';
Using ESCAPE and specifying an escape character after the LIKE is often easier than using \, since, depending on your scenario, you may need to remember to double the backslash (or, if you are writing the query inside, say, a PHP string, triple-escape it).
SELECT COUNT(DISTINCT chrom) FROM table WHERE LOCATE('_', chrom) = 0;
LOCATE() is also easy to use here: it returns 0 when the substring is not found. I believe it would be slower than a plain LIKE, but the performance difference is probably insignificant, so in most cases it's just a matter of preference.

Use REGEXP if you wish to keep it simple. LIKE is faster, though.
SELECT count(chrom) FROM table WHERE chrom NOT REGEXP '_';
I also recommend INSTR which I think will perform better than REGEXP.
SELECT count(chrom) FROM table WHERE INSTR(chrom, '_')=0;
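For anyone who wants to try the filters from the answers above without a MySQL server, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in. This is an assumption: SQLite also treats _ as a single-character LIKE wildcard and supports ESCAPE and INSTR, but it has no SUBSTRING_INDEX and its backslash handling differs from MySQL's. The table name genes and the sample rows are hypothetical.

```python
import sqlite3

# Hypothetical sample data modelled on the output shown in the question.
SAMPLE = [
    "chr1", "chr1", "chr10", "chr10_GL383545v1_alt",
    "chr10_GL383546v1_alt", "chr11", "chr11_JH159136v1_alt",
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE genes (chrom TEXT)")
conn.executemany("INSERT INTO genes (chrom) VALUES (?)", [(c,) for c in SAMPLE])


def count_where(condition):
    """Count distinct chrom values that satisfy the given WHERE condition."""
    sql = "SELECT COUNT(DISTINCT chrom) FROM genes WHERE " + condition
    return conn.execute(sql).fetchone()[0]


# Unescaped: _ is a wildcard, so every non-empty value is filtered out.
unescaped = count_where("chrom NOT LIKE '%_%'")
# Escaped with an explicit escape character: only literal underscores count.
escaped = count_where("chrom NOT LIKE '%|_%' ESCAPE '|'")
# Plain substring search, no wildcards involved.
with_instr = count_where("INSTR(chrom, '_') = 0")

print(unescaped, escaped, with_instr)
```

The unescaped pattern leaves nothing to count, while the escaped LIKE and the INSTR version both keep only chr1, chr10 and chr11.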

Related

mysql MATCH AGAINST weird characters query

I have a table where the field "company_name" has weird characters, like "à","ö","¬","©","¬","†", etc. I want to return all "company_name"s that contain these characters anywhere within the string. My current query looks like this:
SELECT * FROM table WHERE
MATCH (company_name) AGAINST ('"Ä","à","ö","¬","©","¬","†"' in natural language mode);
But I keep getting no data from the query. I know this can't be the case, as there are definitely examples of them I can find manually. To be clear, the query itself isn't throwing any errors, just not returning any data.
The minimum word length for full-text search is 3 or 4 characters, depending on the storage engine.
You can change it; see the manual:
https://dev.mysql.com/doc/refman/8.0/en/fulltext-fine-tuning.html
Or use regular expressions:
SELECT * FROM table WHERE
company_name REGEXP '[Äàö¬©¬†]+';
SELECT *
FROM table
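WHERE company_name LIKE '%[^0-9a-zA-Z !"#$%&''()*+,\-./:;<=>?@\[\^_`{|}~\]\\]%' ESCAPE '\'
This will find any wacky stuff, including wide characters or 'euro-ASCII' or emoji. Note that bracketed character classes inside LIKE are SQL Server syntax; MySQL's LIKE only understands % and _, so in MySQL the same idea has to be written with REGEXP.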

Why isn't MySQL REGEXP filtering out these values?

So I'm trying to find what "special characters" have been used in my customer names. I'm going through updating this query to find them all one-by-one, but it's still showing all customers with a - despite me trying to exclude that in the query.
Here's the query I'm using:
SELECT * FROM customer WHERE name REGEXP "[^\da-zA-Z\ \.\&\-\(\)\,]+";
This customer (and many others with a dash) are still showing in the query results:
Test-able Software Ltd
What am I missing? Based on that regexp, shouldn't that one be excluded from the query results?
Testing it on https://regex101.com/r/AMOwaj/1 shows there is no match.
Edit - So I want to FIND any which have characters other than the ones in the regex character set. Not exclude any which do have these characters.
Your code checks whether the string contains at least one character that does not belong to the character class, while you want to ensure that no character in the string belongs to it.
You can use ^ and $ to check the whole string at once:
SELECT * FROM customer WHERE name REGEXP '^[^\da-zA-Z .&\-(),]+$';
This would probably be simpler expressed with NOT, and without negating the character class:
SELECT * FROM customer WHERE name NOT REGEXP '[\da-zA-Z .&\-(),]';
Note that you don't need to escape all the characters within the character class, except probably for -.
Use [0-9] or [[:digit:]] to match digits irrespective of MySQL version.
Use the hyphen where it can't make part of a range construction.
Fix the expression as
SELECT * FROM customer WHERE name REGEXP "[^0-9a-zA-Z .&(),-]+";
If the entire text should match this pattern, enclose with ^ / $:
SELECT * FROM customer WHERE name REGEXP "^[^0-9a-zA-Z .&(),-]+$";
- implies a range except if it is first. (Well, after the "not" (^).)
So use
"[^-0-9a-zA-Z .&(),]"
I removed the + at the end because you don't really care how many; this way it will stop after finding one.
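The hyphen-placement point from the answers above can be checked quickly outside the database. This is only a sketch with Python's re module as a stand-in, which is an assumption: MySQL's regex engine differs in details (for example, \d support depends on the MySQL version). The first pattern is what the original one would look like if the backslashes were dropped by MySQL's string-literal parser, which I believe happens in the default SQL mode; treat that as an assumption too. The sample names other than the one from the question are hypothetical.

```python
import re

# The original pattern with the backslashes stripped (assumed effect of
# MySQL string-literal parsing). Here "&-(" forms a range from & to (,
# so the hyphen itself is NOT part of the excluded set.
stripped = re.compile(r"[^da-zA-Z .&-(),]")

# Hyphen moved to the end of the class, where it cannot form a range,
# and [0-9] used instead of \d.
fixed = re.compile(r"[^0-9a-zA-Z .&(),-]")

names = ["Test-able Software Ltd", "Plain Name (UK), Ltd.", "Caf@ Holdings"]

# Names that each pattern reports as containing a "special" character.
flagged_by_stripped = [n for n in names if stripped.search(n)]
flagged_by_fixed = [n for n in names if fixed.search(n)]

print(flagged_by_stripped)
print(flagged_by_fixed)
```

With the hyphen at the end of the class, only the name containing @ is reported; the stripped pattern also reports the name with a dash.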

What is the purpose of using WHERE COLUMN like '%[_][01][7812]' in SQL statements?

What is the purpose of using WHERE COLUMN like '%[_][01][7812]' in SQL statements?
I get some result, but don't know how to use properly.
I see that it is searching through the base, but I don't understand the pattern.
Like selects strings similar to a pattern. The pattern you're looking at uses several wildcards, which you can review here: https://www.w3schools.com/SQL/sql_wildcards.asp
Briefly, the query seems to be matching any row where COLUMN ends in an _, then a 0 or a 1, then a 7, 8, 1, or 2. (So it would match 'blah_07' but not 'blah_81', 'blah_0172', or 'blah18'.)
First of all, as you might be aware, the WHERE clause is used for filtering rows.
In your case, WHERE column LIKE '%[_][01][7812]' means: find the rows where the column ends with something matching [_][01][7812]; anything (or nothing) can stand in place of the %.
declare
@searchString varchar(50) = '[_][01][7812]',
@testString varchar(50) = 'BeginningOfString' + '[_][01][7812]' + 'EndofString'
select CHARINDEX(@searchString, @testString), @testString, LEN(@testString) as [totalLength]
set @testString = '[_][01][7812]' + 'EndofString'
select CHARINDEX(@searchString, @testString), @testString, LEN(@testString) as [totalLength]
set @testString = 'BeginningOfString' + '[_][01][7812]'
select CHARINDEX(@searchString, @testString), @testString, LEN(@testString) as [totalLength]
Although you've tagged your post MySQL, that code seems unlikely to have been written for it. That LIKE pattern, to me, resembles Microsoft SQL Server's variation on the syntax, where it would match anything ending with an underscore followed by a zero or a one, followed by a 7, an 8, a 1 or a 2.
So your example 'TA01_55_77' would not match, but 'TA01_55_18' would, as would 'GZ01_55_07'
(In SQL Server, enclosing a wildcard character like '_' in square brackets escapes it, turning it into a literal underscore.)
Of course, there may be other RDBMSes with similar syntax, but what you've presented doesn't seem like it would work on the data you've got if running in MySQL.
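To make the SQL Server reading of that pattern concrete, here is a rough sketch that translates '%[_][01][7812]' into an ordinary regular expression and runs it over the example strings mentioned in the answers above. The translation is my own assumption about the equivalent regex (Python's re is used purely for illustration and is not how the database evaluates LIKE).

```python
import re

# Assumed regex equivalent of the SQL Server pattern '%[_][01][7812]':
#   %        -> .*      (any run of characters, possibly empty)
#   [_]      -> _       (a literal underscore)
#   [01]     -> [01]
#   [7812]   -> [7812]
pattern = re.compile(r".*_[01][7812]", re.DOTALL)

candidates = [
    "blah_07", "blah_81", "blah_0172", "blah18",
    "TA01_55_77", "TA01_55_18", "GZ01_55_07",
]

# fullmatch mirrors LIKE, which must match the whole value.
results = {s: bool(pattern.fullmatch(s)) for s in candidates}

for value, matched in results.items():
    print(value, matched)
```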

how to handle white spaces in sql

I want to write an SQL query that will fetch all the students who live in a specific Post Code. Following is my query.
SELECT * FROM `students` AS ss WHERE ss.`postcode` LIKE 'SE4 1NA';
Now the issue is that in the database some records are saved without the white space in the postcode, like SE41NA, and some may also be in lowercase, like se41na or se4 1na.
The query gives me different results based on how the record is saved. Is there any way in which I can handle this?
Using regexp is one way to do it. This performs a case insensitive match by default.
SELECT * FROM students AS ss
WHERE ss.postcode REGEXP '^SE4[[:space:]]?1NA$';
[[:space:]]? matches an optional space character.
REGEXP documentation MySQL
Whether case matters depends on the collation of the string/column/database/server. But, you can get around it by doing:
WHERE UPPER(ss.postcode) LIKE 'SE4%1NA'
The % will match any number of characters, including none. It is a bit too general for what you might really need -- but it should work fine in practice.
The more important issue is that your database does not validate the data being put into it. You should fix the application so the postal codes are correct and follow a standard format.
Use a combination of UPPER and REPLACE.
SELECT *
FROM students s
WHERE UPPER(REPLACE(s.postcode, ' ', '')) LIKE '%SE41NA%'
SELECT * FROM students AS ss
WHERE UPPER(REPLACE(ss.postcode, ' ', '')) = 'SE41NA' ;
SELECT *
FROM students AS ss
WHERE UPPER(ss.postcode) LIKE REPLACE(UPPER('SE4 1NA'), ' ', '%');
I propose replacing the spaces in the search term with the '%' placeholder, and converting both sides of the LIKE operator to upper case.
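The UPPER plus REPLACE idea from the answers above can be tried out with a small sketch like the following. It uses Python's built-in sqlite3 as a stand-in for MySQL (an assumption; both provide UPPER and REPLACE with the same meaning for this case), and the sample rows and the name column are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (name TEXT, postcode TEXT)")
conn.executemany(
    "INSERT INTO students (name, postcode) VALUES (?, ?)",
    [
        ("Ann", "SE4 1NA"),
        ("Ben", "SE41NA"),
        ("Cat", "se41na"),
        ("Dan", "se4 1na"),
        ("Eve", "SW1A 1AA"),  # different postcode, should not match
    ],
)

# Normalise the stored value (strip spaces, upper-case) and compare it
# with a search term normalised the same way.
search = "SE4 1NA".replace(" ", "").upper()
matches = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM students "
        "WHERE UPPER(REPLACE(postcode, ' ', '')) = ? ORDER BY name",
        (search,),
    )
]

print(matches)
```

Wrapping the column in functions generally prevents the use of an ordinary index on postcode, so storing a normalised copy of the value may be worth considering on large tables.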

Having trouble matching a single character in an SQL table

I need to use the '_' wildcard to find all ids that are only one letter long; there are a few of them. However, when I run my query, no rows are returned.
Here's my query:
SELECT *
FROM table
WHERE id LIKE '_';
I have a table, let's call it Table1, that has two columns, id and name.
id has either 1 or 2 characters to label a name. I'm trying to find only the names where the id is a single character. Here's an example of the table:
id name
A Alfred
AD Andy
B Bob
BC Bridget
I only want to return Alfred and Bob in this example.
I don't want the solution but any advice or ideas would be helpful.
Here is a screenshot of my query:
http://i.imgur.com/EWTfoVI.png?1
And here is a small example of my table:
http://i.imgur.com/urGRZeK.png?1
So in this example of my table I would ideally like only East Asia... to be returned.
If I search specifically for the character it works, but for some strange reason the '_' wildcard doesn't.
For example:
SELECT *
FROM icao
WHERE prefix_code ='Z';
This works.
Try using TRIM
Select *
FROM [Table]
where TRIM(ID) LIKE '_';
In MySQL, the underscore is used to represent a wildcard for a single character. You can read more about that Pattern Matching here.
The way you have it written, your query will pull any rows where the id column is just one single character, you don't need to change anything.
Here is an SQL Fiddle example.
EDIT
One troubleshooting tip is to be sure there is no whitespace before/after the prefix code. If there is, and you need to remove it, add TRIM():
SELECT *
FROM myTable
WHERE TRIM(id) LIKE '_';
Here is an example with TRIM.
EDIT 2
A little explanation to your weird behavior, hopefully. In MySQL, if there is trailing white space on a character, it will still match if you say id = 'Z'; as seen by this fiddle now. However, leading white space will not match this, but will still be corrected by TRIM(), because that removes white space on the front and back end of the varchar.
TL;DR You have trailing white space after Z and that's causing the problem.
The most likely explanation for the behavior you observe is trailing spaces (or other whitespace) in the value. That is, you see one character
'A'
But the value may actually be stored as two (or more) characters.
'A '
To see what's actually stored, you can use the HEX and LENGTH functions.
SELECT t.foo
, LENGTH(t.foo)
, HEX(t.foo)
FROM mytable t
WHERE t.foo LIKE 'A%'
The % is a wildcard for the LIKE operator that matches any number of characters (zero, one or more).
You can use the RTRIM() function to remove trailing spaces...
SELECT RTRIM(t.foo)
, LENGTH(RTRIM(t.foo))
, HEX(RTRIM(t.foo))
FROM mytable t
WHERE t.foo LIKE 'A%'
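The trailing-space diagnosis above can be reproduced with a small sketch. It uses Python's built-in sqlite3 as a stand-in for MySQL, which is an assumption: LIKE '_', TRIM, LENGTH and HEX behave the same way for this case, but SQLite's = comparison does not ignore trailing spaces, so that part of the behaviour described above is not reproduced here. The table name table1 and the padded values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (id TEXT, name TEXT)")
conn.executemany(
    "INSERT INTO table1 (id, name) VALUES (?, ?)",
    [
        ("A ", "Alfred"),   # one visible character plus a trailing space
        ("AD", "Andy"),
        ("B ", "Bob"),      # same problem
        ("BC", "Bridget"),
    ],
)

# '_' must match exactly one character, so the padded ids are missed.
plain = [r[0] for r in conn.execute(
    "SELECT name FROM table1 WHERE id LIKE '_' ORDER BY name")]

# Trimming first makes the single-character ids visible again.
trimmed = [r[0] for r in conn.execute(
    "SELECT name FROM table1 WHERE TRIM(id) LIKE '_' ORDER BY name")]

# LENGTH and HEX reveal what is really stored (20 is the space).
stored = conn.execute(
    "SELECT id, LENGTH(id), HEX(id) FROM table1 WHERE name = 'Alfred'"
).fetchone()

print(plain, trimmed, stored)
```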
SELECT *
FROM table
WHERE LENGTH(id)=1
Strange..., in my case works perfectly (I am using mysql 5.5).
Please, try this:
select * from mysql.help_topic where name like '_';
What result set do you get?