How to get first alphanumeric character after the first occurrence of a set of possible symbols using MySQL REGEX - mysql

I'm having difficulty with a regular expression.
I have a MySQL table with ABC-notated tunes in it that look a bit like this:
X: 1
T: Spórt
M: 6/8
L: 1/8
R: jig
K: Dmaj
|:AdF ~A3 | GBE ~G3 | AdF ~A3 |
GBE cde | AdF ~A3 | GBE ~G3 |
cdc A2G | EAA D3 :|
I want to make a search function in MySQL that will list tunes in order by their starting note. In this case I need to return A.
Most tunes begin with either a bar-line |, a repeat bar-line |:, or no bar-line (which means I have to match the first character on the first line that has bar-lines in it).
Any suggestions of what regex would do this? I find regular expressions extremely confusing!

Try: \|:?([A-Ga-g])
\| matches |
:? matches : if there is one
([A-Ga-g]) Gets a note.
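To return the note itself rather than just test for a match, something like the following might work. It is only a sketch and assumes MySQL 8.0 or later, where REGEXP_SUBSTR exists; the string literal stands in for the column holding the tune. REGEXP_SUBSTR returns the whole match rather than the capture group, so RIGHT(..., 1) takes the last character of the match, which is the note. Tunes with no bar-line at all are not handled here.

```sql
-- Assumes MySQL 8.0+; the string literal stands in for the tune column.
-- '\\|' in the SQL string reaches the regex engine as \| (a literal bar-line).
SELECT RIGHT(
         REGEXP_SUBSTR('K: Dmaj\n|:AdF ~A3 | GBE ~G3 |', '\\|:?[A-Ga-g]'),
         1
       ) AS starting_note;
```

For the sample tune the match is |:A, so starting_note should come back as A.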

Related

MySQL - Searching for CC numbers

I inherited a MySQL server that has CC numbers stored in plaintext. Due to PCI requirements, I need to find the numbers and mask them. The trick is that they are stored in a field with other text as well. I need to find a way to search for CC numbers and change just those, not the rest of the text.
I have tried the masking feature in MySQL, but it doesn't work for this version. I also looked up a few different sites but can't seem to find anything that will really help with my particular instance.
Edit
To explain better: the previous admin didn't tell the operators not to take CC info through the live chat system. The system is using SSL, but the chat history is stored in plain text in a MySQL DB. The company isn't PCI compliant (as far as getting scanned and SAQ is concerned), so we cannot have CC numbers stored anywhere. But the numbers are given in the middle of a conversation. If they were in their own column, that wouldn't be a big deal.
EDIT
I have tried using REGEXP to search for CC #'s, but now I am getting an operand error, which I believe is caused by lazy quantifiers.
SELECT * FROM table_name Where text regexp '^4[0-9]{12}(?:[0-9]{3})?$'
Any Ideas?
You could potentially use a regular expression to search for 16 to 19 consecutive digits (using LIKE if you have the numbers separated from the text, or otherwise just REGEXP).
The MySQL manual gives this example (where 5 is the number of characters to match, and ^ and $ anchor the match to the beginning and end of the value):
mysql> SELECT * FROM pet WHERE name REGEXP '^.{5}$';
+-------+--------+---------+------+------------+-------+
| name  | owner  | species | sex  | birth      | death |
+-------+--------+---------+------+------------+-------+
| Claws | Gwen   | cat     | m    | 1994-03-17 | NULL  |
| Buffy | Harold | dog     | f    | 1989-05-13 | NULL  |
+-------+--------+---------+------+------------+-------+
Would end up something like:
REGEXP '^[0-9]{16,19}$'
https://dev.mysql.com/doc/refman/8.0/en/pattern-matching.html
And lookie here too:
Regex to match a digit two or four times
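The error from the '(?:...)' pattern in the question is probably because older MySQL versions don't understand non-capturing groups, rather than anything to do with lazy quantifiers. If the server is MySQL 8.0 or later, REGEXP_REPLACE could mask digit runs in place while leaving the surrounding chat text alone. This is only a sketch: the 13 to 19 digit range is an assumption about card number lengths, the sample text is made up, and the pattern will also mask any other long digit run (order numbers, unseparated phone numbers), so it is worth checking the matches with a SELECT before running any UPDATE.

```sql
-- Assumes MySQL 8.0+ (REGEXP_REPLACE is not available in 5.x).
-- No ^ or $ anchors, because the number sits in the middle of other text.
SELECT REGEXP_REPLACE(
         'my card is 4111111111111111 thanks',
         '[0-9]{13,19}',
         '[masked]'
       ) AS cleaned;
```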

REGEX_REPLACE not matching all chars from the beginning to the first occurrence of a 5 digits word

I've this record in a Mysql table:
ADDRESS
----------------------------------
sdasd 4354 ciao 12345 sdsdsa asfds
I would like to match all chars from the beginning to the first occurrence of a 5 digits word, including it.
In this case, using REGEXP_REPLACE, I would like to remove the substring matched and return sdsdsa asfds.
What I've tried to do is this:
SELECT REGEXP_REPLACE(ADDRESS, '^.+\b\d{5}\b.','') FROM `mytable`
The regexp seems to work testing it in this snippet and I cannot understand why Mysql won't.
MySQL supports POSIX regex, which doesn't support Perl-like escapes such as \b and \d.
This regex should work for you:
SELECT REGEXP_REPLACE
('sdasd 4354 ciao 12345 sdsdsa asfds', '^.+[[:<:]][0-9]{5}[[:blank:]]+', '') as val;
+--------------+
| val |
+--------------+
| sdsdsa asfds |
+--------------+
RegEx Details:
^.+: Match 1 or more of any characters at the start (greedy)
[[:<:]]: Match a word boundary (zero width)
[0-9]{5}: Match exactly 5 digits
[[:blank:]]+: Match 1 or more of whitespaces (tab or space)
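On MySQL 8.0.4 and later, which uses the ICU regex library, the [[:<:]] marker is not accepted, but \b and \d are. The backslashes have to be doubled inside a SQL string literal, otherwise MySQL consumes them before the regex engine sees them. A sketch under that assumption, with .+? made lazy so it stops at the first 5-digit word rather than the last:

```sql
-- Assumes MySQL 8.0.4+ (ICU regex). '\\b' reaches the regex engine as \b.
SELECT REGEXP_REPLACE(
         'sdasd 4354 ciao 12345 sdsdsa asfds',
         '^.+?\\b\\d{5}\\b.',
         ''
       ) AS val;
```

With the sample address this should leave sdsdsa asfds, the same result as the POSIX version above.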

Remove characters from specific field on tables

I need to clean up a database where one of the columns (TOTAL_AREA) has some characters on some of the entries (not all of them)
Such as 5000㎡
I need to clean all the fields that have this entry to show only 500
How can I do it with SQL? I looked at TRIM but couldn't find a way to select all entries that have a character after the number and then TRIM it.
Any help would be appreciated
Thanks
This is pretty easy. MySQL does implicit conversion, ignoring characters after the digits. So, you can do:
select (col * 1.0 / 10)
For your example, this will return 500.
Assuming you want to get rid of all characters that are not digits, you can use REGEXP_REPLACE, e.g.:
create or replace table x(s string);
insert into x values
('111'),
('abc234xyz'),
('5000㎡'),
('9000㎡以上');
select s, regexp_replace(s, '[^\\d]*(\\d+)[^\\d]*', '\\1') from x;
-----------+--------------------------------------------------+
S | REGEXP_REPLACE(S, '[^\\D]*(\\D+)[^\\D]*', '\\1') |
-----------+--------------------------------------------------+
111 | 111 |
abc234xyz | 234 |
5000㎡ | 5000 |
9000㎡以上 | 9000 |
-----------+--------------------------------------------------+
What we do there is match sequences of 0-or-more non-digit characters, followed by 1-or-more digit characters, and again 0-or-more non-digit characters, and produce only the middle sequence.
Note, that you can use a different regexp depending what characters exactly you want to keep/remove.
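The CREATE OR REPLACE TABLE ... STRING example above does not look like MySQL syntax, so here is a rough MySQL equivalent. It assumes MySQL 8.0 or later for REGEXP_REPLACE; mytable is a hypothetical table name, and TOTAL_AREA is assumed to be a text column. Note that stripping every non-digit would also remove decimal points, so the character class would need adjusting if the areas can be fractional.

```sql
-- Assumes MySQL 8.0+; mytable is a hypothetical table name.
CREATE TABLE mytable (TOTAL_AREA VARCHAR(50));
INSERT INTO mytable VALUES ('111'), ('5000㎡'), ('9000㎡以上');

-- Preview which rows would change and what they would become.
SELECT TOTAL_AREA,
       REGEXP_REPLACE(TOTAL_AREA, '[^0-9]', '') AS cleaned
FROM mytable
WHERE TOTAL_AREA REGEXP '[^0-9]';

-- Then strip every non-digit character from just those rows.
UPDATE mytable
SET TOTAL_AREA = REGEXP_REPLACE(TOTAL_AREA, '[^0-9]', '')
WHERE TOTAL_AREA REGEXP '[^0-9]';
```

The WHERE clause keeps the UPDATE away from rows that are already clean, such as '111'.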

How to match hyphen delimited in any order

I need to match a set of characters delimited by a hyphen - for example:
B-B/w-W/Br-W-Br
Where the / are part of what I need, up to 20 spaces.
G-R-B, G/R-B-B/W-O
So I need a regex that covers between the -'s in any order (G-R-B could also be R-B-G)
I've been playing around with a bunch of combo's, but I can't come up with something that will match any order.
The plan is to search this way using mysql. So, it'll be something like
select * from table1 where pinout REGEXP '';
I just can't get the regex right :/
Description
This expression will match the string providing each of the hyphen delimited values are included in the string. The color values can appear in the string in any order so this expression will match W/Br-b-B/w and B/w-W/Br-b... or any other combinations which include those colors.
^ # match the start of the string
(?=.*?(?:^|-)W\/Br(?=-|$)) # require the string to have a w/br
(?=.*?(?:^|-)b(?=-|$)) # require the string to have a b
(?=.*?(?:^|-)B\/w(?=-|$)) # require the string to have a b/w
.* # match the entire string
MySQL (at least in the 5.x versions documented below) doesn't support lookarounds, so this will need to be broken into a group of WHERE conditions:
mysql> SELECT * FROM dog WHERE ( color REGEXP '.*(^|-)W\/Br(-|$)' and color REGEXP '.*(^|-)b(-|$)' and color REGEXP '.*(^|-)B\/w(-|$)' );
+-------+--------+---------+------+------------+---------------------+
| name  | owner  | species | sex  | birth      | color               |
+-------+--------+---------+------+------------+---------------------+
| Claws | Gwen   | cat     | m    | 1994-03-17 | B-B/w-W/Br-W-Br     |
| Buffy | Harold | dog     | f    | 1989-05-13 | G-R-B, G/R-B-B/W-O  |
+-------+--------+---------+------+------------+---------------------+
See also this working sqlfiddle: http://sqlfiddle.com/#!2/943af/1/0
Using a regex in conjunction with a MySql where statement can be found here: http://dev.mysql.com/doc/refman/5.1/en/pattern-matching.html
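The same idea applied to the G-R-B example from the question can be sketched as follows. REGEXP matches anywhere in the string, so the leading .* should be optional, and the / does not need escaping. The string literal stands in for the pinout column. Matching is usually case-insensitive for non-binary strings, so b and B would be treated alike unless a binary or case-sensitive collation is used.

```sql
-- Each condition checks for one hyphen-delimited value; order doesn't matter.
-- The literal 'R-B-G' stands in for the pinout column.
SELECT 'R-B-G' REGEXP '(^|-)G(-|$)'
   AND 'R-B-G' REGEXP '(^|-)R(-|$)'
   AND 'R-B-G' REGEXP '(^|-)B(-|$)' AS has_g_r_b;
```

This should return 1 for R-B-G, G-R-B, or any other ordering of the three values.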
I might have misunderstood from your example, try this:
-*([a-zA-Z/]+)-*
The capture region can be altered to include your specific letters of interest, e.g. [GRBWOgrbwo/].
Edit: I don't think this will help you in the context you're using it, but I'll leave it here for posterity.

MySQL - UNHEX(HEX(UTF-8)) issue

I've got a database with UTF-8 characters in it, which are improperly displayed. I figured that I could use UNHEX(HEX(column)) != column condition to know what fields have UTF-8 characters in them. The results are rather interesting:
id | content | HEX(content) | UNHEX(HEX(content)) LIKE '%c299%' | UNHEX(HEX(content)) LIKE '%FFF%' | UNHEX(HEX(content))
49829102 | | C299 | 0 | 0 | c299
874625485 | FFF | 464646 | 0 | 1 | FFF
How is this possible and, possibly, how can I find the row with this character in it?
-- edit(2): since my edit has been removed (probably when JamWaffles was fixing my beautiful data table), here it is again: as editor strips out UTF-8 characters, the content in first row is \uc299 (if that's not clear ;) )
-- edit(3): I've figured out what the issue is - the actual representation of UNHEX(HEX(content)) is WRONG - to display my multibyte character I had to do the following: SELECT UNHEX(SUBSTR(HEX(content),1)). Sadly UNHEX(C299) doesn't work as UNHEX(C2)+UNHEX(99) so it's back to the drawing board.
There are two ways to determine if a string contains UTF-8 specific characters. The first is to see if the string has values outside the ASCII character set:
SELECT _utf8 'amńbcd' REGEXP '[^[.NUL.]-[.DEL.]]';
The second is to compare the binary and character lengths:
SELECT LENGTH(_utf8 'amńbcd') <> CHAR_LENGTH(_utf8 'amńbcd');
Both return TRUE.
See http://sqlfiddle.com/#!2/d41d8/9811
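Applied to a table, the length comparison might look like the sketch below. mytable is a hypothetical table name; id and content are the columns from the question. It assumes the column really is declared with a UTF-8 character set: if UTF-8 bytes were stored in a latin1 column, every byte counts as one character and the two lengths would be equal.

```sql
-- mytable is a hypothetical table; utf8mb4 is assumed for the column.
CREATE TABLE mytable (id INT, content VARCHAR(100)) CHARACTER SET utf8mb4;
INSERT INTO mytable VALUES (1, 'FFF'), (2, 'amńbcd');

-- LENGTH counts bytes, CHAR_LENGTH counts characters, so they differ
-- only when at least one character takes more than one byte.
SELECT id, content, HEX(content)
FROM mytable
WHERE LENGTH(content) <> CHAR_LENGTH(content);
```

Only the row containing ń should come back, since FFF is pure ASCII.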