I would like to extract a specific pattern from a string in MySQL.
The column contains strings such as xxx-atg168d and xxx-atg444-6x. From these strings, I want to extract only atg168 and atg444. How can I do this in MySQL?
**Input_column**
xxx-atg168d
xxx-atg444-6x
xxx-atg1689d
xxx-atg16507d
xxx-atg444d-6x
xxx-atg444c-6x
**Output_column**
atg168
atg444
atg1689
atg16507
atg444
atg444
Something like this may meet your specification:
SUBSTRING_INDEX(SUBSTR( t.col ,INSTR( t.col ,'-')+1),'-',1)
This assumes that you want to return the portion of the string following the first dash character, up to the next dash character (if present). If no dash characters exist within the string, the entire string will be returned.
EDIT
Oops. That expression also includes the trailing "d". If it's just a trailing "d" character that needs to be removed:
TRIM(TRAILING 'd' FROM SUBSTRING_INDEX(SUBSTR( t.col ,INSTR( t.col ,'-')+1),'-',1))
In the more general case, to remove any "non-digit" character from the end (not just a "d"), things get pretty ugly. We need to check the rightmost character and see whether it matches a character we want to keep. If it doesn't, we shorten the string by one character.
IF( INSTR('0123456789',RIGHT(
@t := SUBSTRING_INDEX(SUBSTR( t.col ,INSTR( t.col ,'-')+1),'-',1)
,1))
, @t
, SUBSTRING( @t, 1, CHAR_LENGTH( @t )-1)
)
I made use of a user-defined variable here to avoid repeating the same expression multiple times. It's not required that we do that. The @t := assignment can be removed, and the other occurrences of @t can be replaced with the expression that was assigned to @t.
The literal '0123456789' in that expression is the set of characters that we don't want to remove from the end of the string.
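As a sketch of the same idea without the user-defined variable (assigning variables inside expressions is deprecated as of MySQL 8.0), the token can be computed once in a derived table. The aliases s, x and token, and the inline sample rows, are assumptions made for illustration.

```sql
-- Compute the token between the first and second dash once, then
-- drop its last character only if that character is not a digit.
SELECT x.col,
       IF(INSTR('0123456789', RIGHT(x.token, 1)) > 0,
          x.token,
          LEFT(x.token, CHAR_LENGTH(x.token) - 1)) AS trimmed
FROM (
    SELECT s.col,
           SUBSTRING_INDEX(SUBSTR(s.col, INSTR(s.col, '-') + 1), '-', 1) AS token
    FROM (
        SELECT 'xxx-atg168d' AS col      -- token atg168d, trimmed atg168
        UNION ALL
        SELECT 'xxx-atg444-6x'           -- token atg444,  trimmed atg444
    ) AS s
) AS x;
```

Like the expression above, this removes at most one trailing non-digit character.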
Use the SUBSTRING function, like this:
select SUBSTRING(column_name ,5,6) from table_name;
Here 5 is the starting position and 6 is the length of the substring to extract.
Thanks, spencer, for your suggestions. I edited your code to get a solution for my query. Here is the updated query:
left(substring_index (substr(subid,instr(subid,'-')+1),'-',1) , char_length(substring_index (substr(subid,instr(subid,'-')+1),'-',1))-1)
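On MySQL 8.0 or later, the extraction can also be sketched with REGEXP_SUBSTR. This is not available in MySQL 5.x, and it assumes the wanted token is always "atg" followed by one or more digits, which is an inference from the sample data. The sample rows are inlined so the query can be run as-is.

```sql
-- Sketch for MySQL 8.0+: return the first match of 'atg' plus digits.
SELECT s.subid,
       REGEXP_SUBSTR(s.subid, 'atg[0-9]+') AS extracted
FROM (
    SELECT 'xxx-atg168d' AS subid          -- atg168
    UNION ALL SELECT 'xxx-atg444-6x'       -- atg444
    UNION ALL SELECT 'xxx-atg1689d'        -- atg1689
    UNION ALL SELECT 'xxx-atg16507d'       -- atg16507
    UNION ALL SELECT 'xxx-atg444d-6x'      -- atg444
    UNION ALL SELECT 'xxx-atg444c-6x'      -- atg444
) AS s;
```

Note that an unconditional LEFT(..., n - 1) also shortens tokens that already end in a digit: for xxx-atg444-6x the token atg444 becomes atg44. A pattern-based extraction avoids that case.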
Related
I need to use a special character such as the backslash ( \ ) in MySQL string functions, but it doesn't work properly. For example, I can't search for this character on its own. The functions I need and have tested are LOCATE, INSTR, SUBSTRING_INDEX and CONCAT, and I also need it when setting a variable's value.
For example:
SELECT LOCATE("\", "Schools.co\m", 1) AS MatchPosition;
select SUBSTRING_INDEX("footba\l","\",1)
I would appreciate it if anybody could help.
Backslash needs to be escaped. To fix your SUBSTRING_INDEX example, consider the following:
SELECT SUBSTRING_INDEX("footba\\l","\\",1) FROM dual
Here, backslash has to be escaped both in the string literal and in the text to match.
To escape a literal backslash inside a LIKE expression, use four backslashes, e.g.
SELECT 'match' FROM dual WHERE "footba\\l" LIKE '%\\\\%';
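The LOCATE example from the question can be fixed the same way. This is a minimal sketch that assumes the default sql_mode (NO_BACKSLASH_ESCAPES not enabled); CHAR(92) is shown as an alternative way to produce a backslash without writing one in a literal.

```sql
-- Backslash doubled in both the search string and the subject string.
SELECT LOCATE('\\', 'Schools.co\\m', 1) AS MatchPosition;      -- 11

-- CHAR(92) is the backslash character, so no escaping is needed for it.
SELECT LOCATE(CHAR(92), 'Schools.co\\m', 1) AS MatchPosition;  -- 11
```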
I have tried:
....WHERE fieldname REGEXP '.*\(.*\).*';
But this returns every record in the table.
You should use double backslashes when escaping a regex special metacharacter in a REGEXP pattern. Also, since REGEXP also finds partial matches, you do not need the .* at the start/end.
So, you could fix the expression as
WHERE fieldname REGEXP '\\(.*\\)';
Or just use LIKE where % matches any amount of arbitrary chars (but must match the whole entry unlike REGEXP):
WHERE fieldname LIKE '%(%)%';
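To see why the single-backslash pattern matched every row, it helps to look at what the regex engine actually receives. In a MySQL string literal, a backslash before a character with no special escape meaning is simply dropped, so the parentheses arrive unescaped and act as a group. A small sketch, assuming the default sql_mode:

```sql
-- The string parser strips the single backslashes.
SELECT '.*\(.*\).*' AS pattern_as_parsed;                -- .*(.*).*

-- With one backslash the pattern is a group around .* and matches anything.
SELECT 'test no' REGEXP '.*\(.*\).*' AS single_backslash;   -- 1

-- With two backslashes the parentheses are literal.
SELECT 'test no'     REGEXP '\\(.*\\)' AS double_backslash; -- 0
SELECT 'test (here)' REGEXP '\\(.*\\)' AS with_parens;      -- 1
```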
A MySQL fiddle:
DROP TABLE IF EXISTS t;
CREATE TABLE t (word varchar(255));
INSERT INTO t (word)
VALUES
('test (here)'),
('test (here) test'),
('(here) test'),
('test no'),
('no test');
SELECT * FROM t WHERE word REGEXP '\\(.*\\)';
SELECT * FROM t WHERE word LIKE '%(%)%';
To get entries like text ) and ( here, you may use
SELECT * FROM t WHERE word REGEXP '\\(.*\\)|\\).*\\(';
SELECT * FROM t WHERE word LIKE '%(%)%' OR word LIKE '%)%(%';
LIKE would be more efficient than REGEXP, but to answer in REGEXP terms...
I suspect the OP wants parentheses to be somewhere in the middle of row text.
The problem with the REGEXP supplied is that expressions are greedy.
As soon as the .* finishes matching, it has already soaked up the entire record.
Try replacing the first two .* expressions with [^(]* followed by a literal ( and [^)]* followed by a literal ), and removing the final .* as it is superfluous.
'[^\\(]*\\([^\\)]*\\)'
Basically, this expression says
Look for zero or more non ( characters
then look for a single ( character
then look for zero or more non ) characters
then look for a single ) character
The () may be anywhere in the record and may contain zero or more characters inside the ().
Might I suggest fiddling in https://regex101.com/
Hope this helps.
I have a list of numbers in some fields in a table, for example something like this:
2033,1869,1914,1913,19120,1911,1910,1909,1908,1907,1866,1921,1922,1923
Now I'm trying to write a query that checks whether a number is found in the row. I can't use LIKE, because it may return false positives: a search for 1912 in the field above would match because of the number 19120, which is obviously not what we want. We also can't search with a comma appended or prepended, because the first and last numbers in the list don't have one.
So, onto using REGEXP I go... I tried this, but it doesn't work (it returns a result):
SELECT * FROM cat_listing WHERE cats REGEXP '[^0-9]*1912[^0-9]*';
I imagine the reason it still finds something is the * quantifier: it found [^0-9] zero times after 1912, so it considers that a match.
I'm not sure how to modify it to do what I want.
In your case, it seems word boundaries are necessary:
SELECT * FROM cat_listing WHERE cats REGEXP '[[:<:]]1912[[:>:]]';
[[:<:]] is the beginning of a word and [[:>:]] is the end. See reference:
[[:<:]], [[:>:]]
These markers stand for word boundaries. They match the beginning and end of words, respectively. A word is a sequence of word characters that is not preceded by or followed by word characters. A word character is an alphanumeric character in the alnum class or an underscore (_).
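The [[:<:]] and [[:>:]] markers belong to the older regex library. From MySQL 8.0.4 on, REGEXP is implemented with ICU, which does not support them; the word-boundary marker there is \b, written as \\b inside a string literal. A sketch for those versions, using literal lists in place of the cats column:

```sql
-- MySQL 8.0.4+ only: \\b is a word boundary in the ICU regex engine.
SELECT '2033,1869,19120,1911' REGEXP '\\b1912\\b' AS no_match;   -- 0
SELECT '2033,1912,1911'       REGEXP '\\b1912\\b' AS has_match;  -- 1
```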
Another option is FIND_IN_SET():
SELECT * FROM cat_listing WHERE find_in_set('1912', cats) <> 0;
Returns 0 if str is not in strlist or if strlist is the empty string. Returns NULL if either argument is NULL. This function does not work properly if the first argument contains a comma (“,”) character.
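A quick check of that behaviour on literal lists, as a sketch. FIND_IN_SET returns the 1-based position of the element, or 0 when it is absent:

```sql
SELECT FIND_IN_SET('1912', '2033,1869,19120,1911') AS pos;  -- 0 (19120 is a different element)
SELECT FIND_IN_SET('1911', '2033,1869,19120,1911') AS pos;  -- 4
```

Elements are compared exactly, so a list stored with spaces after the commas (for example '2033, 1911') would not match '1911'.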
No need to use a regex just because the column value has no comma at either end:
SELECT
cats
FROM cat_listing
WHERE INSTR(CONCAT(',', cats, ','), ',1912,')
;
Please comment if adjustment / further detail is required.
In MySQL (I'm using 5.1.48), the following regular expressions return true, i.e. 1.
SELECT '10-5' REGEXP '10-5' as temp;
SELECT '10/5' REGEXP '10/5' as temp;
SELECT '1*5' REGEXP '1*5' as temp;
The following expressions, however, return false, i.e. 0.
SELECT '10+5' REGEXP '10+5' as temp;
SELECT '10*5' REGEXP '10*5' as temp;
To use a literal instance of a special character in a regular
expression, precede it by two backslash (\) characters. The MySQL
parser interprets one of the backslashes, and the regular expression
library interprets the other.
Escaping + and * in the preceding two statements makes them return true, i.e. 1, as follows.
SELECT '10+5' REGEXP '10\\+5' as temp;
SELECT '10*5' REGEXP '10\\*5' as temp;
If this is the case, then why does * in the following statement (the last one in the first snippet) not need to be escaped?
SELECT '1*5' REGEXP '1*5' as temp;
It returns true, i.e. 1, without escaping *, whereas the following similar statement (the last one in the second snippet) returns false.
SELECT '10*5' REGEXP '10*5' as temp;
It requires * to be escaped. Why?
An unescaped asterisk, as you know, means "zero or more of the preceding character", so "1*5" means "any number of 1s, followed by a 5".
The key is this info from the doc:
A REGEXP pattern match succeeds if the pattern matches anywhere in the value being tested. (This differs from a LIKE pattern match, which succeeds only if the pattern matches the entire value.)
So, "1*5" ("any number of 1s, followed by a 5") will match the string "1*5" by only seeing the "5". "10*5" ("1, followed by any number of 0s, followed by a 5") won't match the string "10*5" because the "*" character breaks it up.
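The partial-match behaviour is easy to confirm by anchoring the pattern with ^ and $, which forces it to cover the whole value. A short sketch:

```sql
-- Unanchored: the pattern only needs to match somewhere (here, the lone 5).
SELECT '1*5'  REGEXP '1*5'       AS unanchored;        -- 1

-- Anchored: "any number of 1s, then 5" must be the entire value.
SELECT '1*5'  REGEXP '^1*5$'     AS anchored_literal;  -- 0
SELECT '115'  REGEXP '^1*5$'     AS anchored_ones;     -- 1

-- Escaped asterisk: matches the literal text 10*5.
SELECT '10*5' REGEXP '^10\\*5$'  AS escaped;           -- 1
```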
Hope that helps.
I have an SQL column where the entries are strings. I need to display those entries after trimming the last two characters, e.g. if the entry is 199902345 it should output 1999023.
I tried looking into TRIM, but it looks like it only trims when we know what the last two characters are. In my case, I don't know what those last two characters are; they just need to be discarded.
So, in short, which MySQL string operation trims the last two characters of a string?
I must add that the length of the string is not fixed. It could be 9 characters, 11 characters or anything else.
To select all characters except the last n from a string (or, put another way, to remove the last n characters from a string), use the SUBSTRING and CHAR_LENGTH functions together:
SELECT col
, /* ANSI Syntax */ SUBSTRING(col FROM 1 FOR CHAR_LENGTH(col) - 2) AS col_trimmed
, /* MySQL Syntax */ SUBSTRING(col, 1, CHAR_LENGTH(col) - 2) AS col_trimmed
FROM tbl
To remove a specific substring from the end of string, use the TRIM function:
SELECT col
, TRIM(TRAILING '.php' FROM col) AS col_trimmed
FROM tbl
-- index.php becomes index
-- index.php.php becomes index (!)
-- index.txt remains index.txt
Why not use the LEFT(string, length) function instead of SUBSTRING?
LEFT(col,char_length(col)-2)
See https://dev.mysql.com/doc/refman/5.7/en/string-functions.html#function_left for more on MySQL string functions.
substring().
http://dev.mysql.com/doc/refman/5.0/en/string-functions.html
You can use LENGTH(that_string) minus the number of characters you want to remove as the length argument of SUBSTRING(), or perhaps use the TRIM() function.
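A worked example of the length-based approach on the value from the question, as a sketch. One caveat when choosing between the two length functions: LENGTH() counts bytes while CHAR_LENGTH() counts characters, so they differ for multibyte text. The second query assumes a utf8mb4 connection character set.

```sql
-- Drop the last two characters regardless of what they are.
SELECT LEFT('199902345', CHAR_LENGTH('199902345') - 2) AS trimmed;  -- 1999023

-- Bytes versus characters (assumes utf8mb4, where 'ï' takes 2 bytes).
SELECT LENGTH('naïve') AS bytes, CHAR_LENGTH('naïve') AS chars;     -- 6 and 5
```

For plain ASCII digits the two functions give the same result, so either works for the column described in the question.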