Using REGEXP to change spaces into hyphens in limited situations - mysql

I have a keywords column and it contains stuff like this:
apples, oranges, pine apple
I'm trying to change the spaces to hyphens using this query:
UPDATE articles SET keywords = REPLACE(keywords," ","-") WHERE
keywords REGEXP '[A-Z] [A-Z]' limit 1;
But this adds hyphens where I don't want them, like this:
apples,-oranges,-pine-apple
Can this be done with REGEXP? Or will I need to involve PHP?
Thank you.

You're selecting rows based on the regular expression, but how does REPLACE() know about that? It's going to replace spaces with hyphens, just like you told it.
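There are a few options for adding regexp-based search and replace in MySQL, such as a UDF, and MariaDB supports it natively. Note that the pattern has to capture the letters on either side of the space and put them back, otherwise they are replaced along with the space:
UPDATE articles SET keywords = REGEXP_REPLACE(keywords, '([A-Za-z]) ([A-Za-z])', '\\1-\\2');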
Also worth mentioning that a LIMIT clause without an ORDER BY clause is not very helpful.
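To see why the letters around the space have to be captured and put back, here is a minimal sketch in Python. It uses Python's re module as a stand-in for a regexp replace function in the database, so the function names are made up for illustration and the syntax of the replacement string will differ in SQL.

```python
import re

KEYWORDS = "apples, oranges, pine apple"

def naive_replace(text: str) -> str:
    # Replaces the whole match (letter, space, letter) with a hyphen,
    # so the two letters are lost.
    return re.sub(r"[A-Za-z] [A-Za-z]", "-", text)

def hyphenate_inner_spaces(text: str) -> str:
    # Captures the letter on each side of the space and puts both back,
    # so only the space itself turns into a hyphen.
    return re.sub(r"([A-Za-z]) ([A-Za-z])", r"\1-\2", text)

print(naive_replace(KEYWORDS))           # apples, oranges, pin-pple
print(hyphenate_inner_spaces(KEYWORDS))  # apples, oranges, pine-apple
```

The spaces after the commas are left alone because they are preceded by a comma, not a letter.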

How to avoid a specific character in MySQL

I have a SQL table, with genetic information (name of the gene, function, strand...)
I want to retrieve the number of chromosomes (21, as I'm working with the human genome). The problem is that some chromosomes are "repeated". For example:
SELECT DISTINCT chrom FROM table LIMIT 6;
chr1
chr10
chr10_GL383545v1_alt
chr10_GL383546v1_alt
chr11
chr11_JH159136v1_alt
As you can see I have more than one chr10, so if I count the DISTINCT chromosomes I get about 6000.
I've tried using NOT LIKE "_" but it didn't work. I thought I could "force" the result with LIKE "chr1" and so on, but that feels like cheating and is not exactly what I'm looking for. I would like a way to exclude every "_", but running
SELECT COUNT(DISTINCT chrom) NOT LIKE "_" FROM table; gives me back just 1 result...
LEFT is not optimal either, because I would have to specify the length of the string, and I want a system that I could use without knowing anything about the expected result. So running a LEFT "", 4 and LEFT "", 5 is not what I'm looking for.
Is there a way I can count everything that does NOT CONTAIN a certain character? Is there a better strategy?
Thank you very much!
Underscore is a wildcard character itself, so it must be escaped. Furthermore you want to match any characters before and after that underscore character so the % wildcard is needed around the escaped underscore.
SELECT COUNT(DISTINCT chrom) FROM table WHERE chrom NOT LIKE '%\_%';
Also you could use substring_index() to get distinct string before the underscore and count those:
SELECT COUNT(DISTINCT SUBSTRING_INDEX(chrom, '_', 1)) FROM table;
Although that is almost definitely going to be slower.
The problem with SELECT COUNT(DISTINCT chrom) NOT LIKE "_" FROM table; is the location of the comparison and the lack of the % wildcards in the LIKE comparison string.
Either of the following should work for you:
SELECT COUNT(DISTINCT chrom) FROM table WHERE chrom NOT LIKE '%|_%' ESCAPE '|';
Using ESCAPE and specifying an escape character after the LIKE is easier than using \ in many cases since, depending on your scenario, you may need to remember to double-escape the \ (or, if you are writing this in, say, PHP, triple-escape it).
SELECT COUNT(DISTINCT chrom) FROM table WHERE LOCATE('_', chrom) = 0;
LOCATE() is also easy to use here, but I believe it would be slower than just doing a LIKE. The performance difference is probably pretty insignificant, so in most cases it's just preference.
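If you want to see the effect of escaping the underscore, here is a small sketch that runs the LIKE comparison against an in-memory SQLite database from Python. SQLite is only a stand-in for MySQL here (its LIKE also treats _ as a single-character wildcard and supports ESCAPE), and the table name genes and the helper count_distinct are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE genes (chrom TEXT)")
sample = [
    "chr1", "chr1", "chr10", "chr10_GL383545v1_alt",
    "chr10_GL383546v1_alt", "chr11", "chr11_JH159136v1_alt",
]
conn.executemany("INSERT INTO genes (chrom) VALUES (?)", [(c,) for c in sample])

def count_distinct(where_clause: str) -> int:
    # Counts distinct chrom values among the rows kept by the WHERE clause.
    sql = f"SELECT COUNT(DISTINCT chrom) FROM genes WHERE {where_clause}"
    return conn.execute(sql).fetchone()[0]

# Unescaped: '_' matches any single character, so every row is excluded.
unescaped = count_distinct("chrom NOT LIKE '%_%'")

# Escaped: only rows containing a literal underscore are excluded.
escaped = count_distinct("chrom NOT LIKE '%|_%' ESCAPE '|'")

print(unescaped, escaped)  # 0 3
```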
Use REGEXP if you wish to keep it simple. LIKE is faster, though.
SELECT count(chrom) FROM table WHERE chrom NOT REGEXP '_';
I also recommend INSTR which I think will perform better than REGEXP.
SELECT count(chrom) FROM table WHERE INSTR(chrom, '_')=0;

Use REPLACE() to ORDER BY a mySQL SELECT alphanumerically when special characters are present

I had done several different searches on SO looking for a simple solution to sorting MySQL results alphanumerically where some fields may have special characters present. The solution:
"SELECT *, REPLACE(title, '\"', '') AS indexTitle ORDER BY indexTitle ASC";
In this case I'm stripping the double quote (escaped here) from strings that begin with one.
This probably wouldn't be a great solution where the types of special characters are not known, but for a simple sort it works nicely.
Hopefully this helps someone.
One way to do this would be to write your own function to strip non-alphanumeric characters from a String. Google found me this example (I've not checked it!). Then you could write something like:
SELECT *, remove_non_alphanum_char_f(title) AS indexTitle ORDER BY indexTitle ASC;
Though of course, as @arkascha has pointed out in the comments above, this is slow and not scalable. A better solution is to go back a step and, if possible, ensure the data in your table is in the correct format to begin with. If you really need the special characters, it may be less of an overhead to add an extra column to your table which is the title column with the special characters stripped; then you could just order by that column. You could perform the stripping at the point when you insert into the table.

How to remove fake names using regular expression in mysql query?

I want to remove the names that may have been registered as fake names, as the developer forgot to put validation on the registration form.
To check whether a name is fake, I am checking whether the name contains any numbers.
This is the query I have written, but it's not working:
SELECT registration.regi_id, student.first_name,
student.cont_no, student.email_id,
registration.college,
registration.event_name,
registration.accomodation
FROM student, registration
WHERE student.stud_id = registration.stud_id
AND student.first_name NOT RLIKE '%[0-9]%'
How can I fix this problem?
Sorry for my language issues.
P.S.
There are many names in the "first_name" field like "asdfasdf12323"; I don't want that kind of name to be shown in the list.
Your column may contain alphanumeric characters as well. You need to filter both numbers and alphanumeric characters.
For alphanumeric characters, try REGEXP '^[A-Za-z0-9]+$'
For numbers, try REGEXP '[0-9]'
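Well, as far as the regex is concerned, RLIKE is a synonym for REGEXP, so your pattern is already being treated as a regular expression. The problem is the % signs: they are wildcards only in LIKE, and in a regex they match literal percent characters, so '%[0-9]%' only matches a digit that sits between two percent signs. Drop them, and don't add a * quantifier either, since [0-9]* also matches the empty string and would therefore match every name. Your last clause would look like so: AND student.first_name NOT REGEXP '[0-9]'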
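Here is a quick way to compare the candidate patterns outside the database. This is only a sketch in Python, using re.search as a stand-in for MySQL's REGEXP (both look for a match anywhere in the string); the sample names and the helper matches are made up.

```python
import re

names = ["asdfasdf12323", "Alice", "Bob7", "Carol"]

def matches(pattern: str, value: str) -> bool:
    # True if the pattern matches anywhere in the value.
    return re.search(pattern, value) is not None

# '%' is a literal character in a regex, so nothing matches and nothing is filtered out.
kept_percent = [n for n in names if not matches(r"%[0-9]%", n)]

# '[0-9]*' also matches the empty string, so every name matches and all are filtered out.
kept_star = [n for n in names if not matches(r"[0-9]*", n)]

# '[0-9]' matches any name containing at least one digit.
kept_plain = [n for n in names if not matches(r"[0-9]", n)]

print(kept_percent)  # ['asdfasdf12323', 'Alice', 'Bob7', 'Carol']
print(kept_star)     # []
print(kept_plain)    # ['Alice', 'Carol']
```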

SQL LIKE wildcard space character

Let's say I have a string in which the words are separated by 1 or more spaces and I want to use that string in an SQL LIKE condition. How do I write my SQL to tell it to match 1 or more blank space characters in my string? Is there an SQL wildcard that I can use to do that?
Let me know
If you're just looking to get anything with at least one blank / whitespace character, then you can do something like the following: WHERE myField LIKE '% %'
If your dialect allows it, use SIMILAR TO, which allows for more flexible matching, including the normal regular expression quantifiers '?', '*' and '+', with grouping indicated by '()'
where entry SIMILAR TO 'hello +there'
will match 'hello there' with any number of spaces between the two words.
I guess in MySQL this is
where entry RLIKE 'hello +there'
I know this is late, but I never found a solution to this in relation to a LIKE question.
There is no way to do what you're wanting within a SQL LIKE. What you would have to do is use REGEXP and [[:space:]] inside your expression.
So to find one or more spaces between two words..
WHERE col REGEXP 'firstword[[:space:]]+secondword'
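For anyone who wants to try the pattern before putting it in a query, here is a rough sketch in Python. Python's re module does not support the POSIX class [[:space:]], so \s is used as the closest equivalent; the helper name has_spaced_pair is made up.

```python
import re

# \s+ plays the role of [[:space:]]+ : one or more whitespace characters.
PAIR = re.compile(r"firstword\s+secondword")

def has_spaced_pair(text: str) -> bool:
    # True if the two words appear with at least one whitespace character between them.
    return PAIR.search(text) is not None

print(has_spaced_pair("firstword secondword"))       # True
print(has_spaced_pair("firstword     secondword"))   # True
print(has_spaced_pair("firstwordsecondword"))        # False
```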
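Another way to match a space is to use [], if your dialect's LIKE supports bracketed character sets (SQL Server's does; MySQL's LIKE treats the brackets as literal characters).
It's done like this:
LIKE '%[ ]%'
This matches any value that contains at least one space.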
You can't do this using LIKE, but if you know this condition can exist in your data, what you can do is use regular expression matching as you insert the data into the table, detect it up front, and set a flag in a separate column created for this purpose.
I just replace the whitespace chars with '%'. Let's say I want to do a LIKE query on a string like this: 'I want to query this string with a LIKE'
@search_string = 'I want to query this string with a LIKE'
@search_string = ("%"+@search_string+"%").tr(" ", "%")
@my_query = MyTable.find(:all, :conditions => ['my_column LIKE ?', @search_string])
First I add the '%' to the start and end of the string with
("%"+@search_string+"%")
and then replace other remaining whitespace chars with '%' like so
.tr(" ", "%")
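The same idea can be sketched in Python against an in-memory SQLite database, which is only a stand-in here for whatever database you are using. The table my_table, its sample rows and the helper to_like_pattern are made up for the example.

```python
import sqlite3

def to_like_pattern(search: str) -> str:
    # Wrap the string in % and turn every space into %.
    return "%" + search.replace(" ", "%") + "%"

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (my_column TEXT)")
conn.executemany(
    "INSERT INTO my_table (my_column) VALUES (?)",
    [("hello   there world",), ("hello, there big world",), ("goodbye",)],
)

pattern = to_like_pattern("hello there world")
found = [
    row[0]
    for row in conn.execute(
        "SELECT my_column FROM my_table WHERE my_column LIKE ? ORDER BY rowid",
        (pattern,),
    )
]

print(pattern)  # %hello%there%world%
print(found)    # ['hello   there world', 'hello, there big world']
```

Note that % matches any run of characters, not just spaces, which is why the second row is found as well.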
http://www.techonthenet.com/sql/like.php
The patterns that you can choose from are:
% allows you to match any string of any length (including zero length)
_ allows you to match on a single character
I think the question is not about matching arbitrary spaces, but about matching two strings where one follows a pattern and the other has the wrong number of spaces because of typos.
In my case I have to compare two fields from different tables, one preloaded and the other typed in by users, so sometimes they don't respect the pattern 100%.
The solution was to use LIKE in the join
Select table1.field
from table1
left join table2 on table1.field like('%' + replace(table2.field,' ','%')+'%')
If the condition:
WHERE myField LIKE '%Hello world%'
doesn't work, try
WHERE myField LIKE '%Hello%'
and
WHERE myField LIKE '%world%'
This approach is helpful in a few specific use cases; hope this helps.

mysql query to match sentence against keywords in a field

I have a mysql table with a list of keywords such as:
id | keywords
---+--------------------------------
1 | apple, oranges, pears
2 | peaches, pineapples, tangerines
I'm trying to figure out how to query this table using an input string of:
John liked to eat apples
Is there a mysql query type that can query a field with a sentence and return results (in my example, record #1)?
One way to do it could be to convert apple, oranges, pears to apple|oranges|pears and use RLIKE (ie regular expression) to match against it.
For example, 'John liked to eat apples' matches the regex 'apple|oranges|pears'.
First, to convert 'apple, oranges, pears' to the regex form, replace all ', ' by '|' using REPLACE. Then use RLIKE to select the keyword entries that match:
SELECT *
FROM keywords_table
WHERE 'John liked to eat apples' RLIKE REPLACE(keywords,', ','|');
However, this does depend on your comma separation being consistent (i.e. if there is one row that looks like apples,oranges this won't work, as the REPLACE replaces a comma followed by a space, as per your example rows).
I also don't think it'll scale up very well.
And, if you have a sentence like 'John liked to eat pineapples', it would match both of the rows above (as it does have 'apple' in it). You could then try to add word boundaries to the regex (i.e. WHERE $sentence RLIKE '[[:<:]](apple|oranges|pears)[[:>:]]'), but this would screw up matching when you have plurals ('apples' wouldn't match '[wordboundary]apple[wordboundary]').
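To make the trade-off concrete, here is a small sketch in Python that mimics the RLIKE approach on the two example rows. Python's re module stands in for MySQL's regex engine, \b plays the role of the word-boundary markers, and the helper names are made up. The keywords are not escaped, so this assumes they contain no regex special characters.

```python
import re

rows = {
    1: "apple, oranges, pears",
    2: "peaches, pineapples, tangerines",
}

def to_alternation(keywords: str) -> str:
    # 'apple, oranges, pears' -> 'apple|oranges|pears'
    return keywords.replace(", ", "|")

def matching_ids(sentence: str, whole_words: bool = False) -> list:
    # Returns the ids of the rows whose keyword regex matches the sentence.
    ids = []
    for row_id, keywords in rows.items():
        pattern = to_alternation(keywords)
        if whole_words:
            pattern = r"\b(" + pattern + r")\b"
        if re.search(pattern, sentence):
            ids.append(row_id)
    return ids

print(matching_ids("John liked to eat apples"))                        # [1]
print(matching_ids("John liked to eat pineapples"))                    # [1, 2]
print(matching_ids("John liked to eat apples", whole_words=True))      # []
print(matching_ids("John liked to eat pineapples", whole_words=True))  # [2]
```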
Hopefully this isn't more abstract than what you need, but it may be a good way of doing it.
I haven't tested this but I think it would work. If you can use PHP, you can use str_replace to turn each space into a LIKE condition such as keywords LIKE '%apple%':
$sentence = "John liked to eat apples";
$sqlversion = str_replace(" ", "%' OR keywords LIKE '%", $sentence);
$finalsql = "'%" . $sqlversion . "%'";
$finalsql will then contain:
'%John%' OR keywords LIKE '%liked%' OR keywords LIKE '%to%' OR keywords LIKE '%eat%' OR keywords LIKE '%apples%'
Then just combine it with your SQL statement:
$sql = "SELECT *
FROM keywords_table
WHERE keywords LIKE " . $finalsql;
Storing comma delimited data is... less than ideal.
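If you broke up the string "John liked to eat apples" into individual words, you could use the FIND_IN_SET function. It compares whole list elements and does not ignore the space after each comma, so strip those spaces first:
WHERE FIND_IN_SET('apple', REPLACE(t.keywords, ', ', ',')) > 0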
The performance wouldn't be great - this operation is better suited to Full Text Search.
I'm not aware of any direct solution to that type of query. But Full Text Search is a possibility. If you have a full-text index on the field of interest then a search with OR between each word in the sentence (although I think the OR operator is implied) would find that record ... but it might also find more than you want too.
I really don't think what you are looking for is completely possible but you can look into Full Text Search or SOUNDEX. SOUNDEX, for example, can do something like:
WHERE SOUNDEX(sentence) = SOUNDEX('%'+keywords+'%');
I have never tried it in this context but you should and let me know how it works out.