MySQL regex quantifiers not working the way I expect

I'm quite new to regular expressions, and I'm not getting what I expect when using regex in MySQL. I tested these regular expressions at "https://regexr.com/", which gives me the results I expect. The query below returns 3 columns:
one_or_more: I'm expecting to get 6, but I'm getting 0. Doesn't "\s+" mean one or more?
zero_or_more: I'm expecting 6, but I'm getting 7. If "\s*" means zero or more, shouldn't the match start one character earlier to include the whitespace?
zero_or_once: I'm expecting 6, but I'm getting 7. If "\s?" means zero or one, shouldn't the match start one character earlier to include the whitespace?
SELECT
# returns 0, expected 6
REGEXP_INSTR("Birch Street, Boston, MA 02131, United States", "\s+street") one_or_more,
# returns 7, expected 6
REGEXP_INSTR("Birch Street, Boston, MA 02131, United States", "\s*street") zero_or_more,
# returns 7, expected 6
REGEXP_INSTR("Birch Street, Boston, MA 02131, United States", "\s?street") zero_or_once
FROM
DUAL;
Any help is appreciated. Thank you. Paul

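You need to use a double backslash (\\), in which case you'll get the expected results, i.e.:
REGEXP_INSTR("Birch Street, Boston, MA 02131, United States", "\\s+street")
With a single backslash, MySQL's string-literal parser reads "\s" as a plain "s" (backslash sequences it doesn't recognize are reduced to the bare character), so the regex library actually receives s+street, s*street and s?street. Matching here is case-insensitive, as the 7 results show, so the last two patterns match "Street" at position 7, while s+street needs at least one extra "s" before "street" and finds no match. With "\\s", the regex library receives \s, and all three patterns match starting at the space in position 6.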
To use a literal instance of a special character in a regular
expression, precede it by two backslash (\) characters. The MySQL
parser interprets one of the backslashes, and the regular expression
library interprets the other.
https://dev.mysql.com/doc/refman/8.0/en/regexp.html#regexp-syntax
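To see the two parsing stages side by side, here is a small Python sketch. The helpers mysql_unescape and regexp_instr are hypothetical stand-ins, not MySQL APIs: the first approximates how MySQL reads a quoted string literal (only the escapes listed in the code are translated, \% and \_ keep their backslash, and any other \x becomes x), and the second mimics REGEXP_INSTR's 1-based result with case-insensitive matching. Python's re module is not MySQL's regex engine, but it agrees on these simple patterns.

```python
import re

# Hypothetical approximation of MySQL string-literal escapes (simplified subset).
# \% and \_ keep their backslash; any other \x is read as plain x.
MYSQL_ESCAPES = {"0": "\0", "'": "'", '"': '"', "b": "\b", "n": "\n",
                 "r": "\r", "t": "\t", "Z": "\x1a", "\\": "\\"}
KEEP_BACKSLASH = {"%", "_"}


def mysql_unescape(literal: str) -> str:
    """Hypothetical helper: the string MySQL's parser hands to the regex library."""
    out, i = [], 0
    while i < len(literal):
        ch = literal[i]
        if ch == "\\" and i + 1 < len(literal):
            nxt = literal[i + 1]
            if nxt in KEEP_BACKSLASH:
                out.append("\\" + nxt)
            else:
                out.append(MYSQL_ESCAPES.get(nxt, nxt))
            i += 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)


def regexp_instr(subject: str, pattern_literal: str) -> int:
    """Hypothetical helper: 1-based start of the first match, or 0, case-insensitive."""
    pattern = mysql_unescape(pattern_literal)
    m = re.search(pattern, subject, re.IGNORECASE)
    return m.start() + 1 if m else 0


ADDRESS = "Birch Street, Boston, MA 02131, United States"

# Pattern literals exactly as typed between the SQL quotes (Python raw strings).
for lit in (r"\s+street", r"\s*street", r"\s?street",
            r"\\s+street", r"\\s*street", r"\\s?street"):
    print(f"{lit:<12} -> {mysql_unescape(lit)!r:<14} REGEXP_INSTR = {regexp_instr(ADDRESS, lit)}")
```

The single-backslash literals report 0, 7 and 7 because the engine sees s+street, s*street and s?street; the doubled-backslash versions all report 6.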

Related

Reduce multiple spaces in a string to a single space

Trying to clean up string data. TRIM removed leading and trailing spaces. Using REPLACE as REPLACE(col_name, " ", "") removed all spaces. I need a solution that produces the expected output.
Sample data:
7136 South Yale #300 Tulsa,;Oklahoma
428 NW 10th St. OKC
2903 W. Britton Road OKC
Expected output :
7136 South Yale #300 Tulsa,;Oklahoma
428 NW 10th St. OKC
2903 W. Britton Road OKC
I use MySQL 5.7.
On MySQL 8+ we could just do a regex replacement of \s{2,} with a single space (written as '\\s{2,}' inside an SQL string literal). On 5.7 this is a bit harder. Assuming that each address contains at most one block of two or more unwanted spaces, we can use substring operations here:
SELECT address,
CASE WHEN address LIKE '%  %'
THEN CONCAT(SUBSTRING(address, 1, INSTR(address, '  ') - 1), ' ',
LTRIM(SUBSTRING(address, INSTR(address, '  '))))
ELSE address END AS output
FROM yourTable;
The above logic uses an INSTR() trick to find the start of the block of two or more spaces. It builds the output address from the two substrings on either side of that position: LTRIM() strips the run of spaces from the front of the second substring, and CONCAT() puts a single space back between them.
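For comparison, here is a Python sketch of both approaches. collapse_regex plays the role of the MySQL 8 route, REGEXP_REPLACE(address, '\\s{2,}', ' '), and collapse_first_block mirrors the 5.7 CASE/INSTR/LTRIM query step by step. Both function names are hypothetical, and the extra spaces in the test strings are assumed, since the spacing in the question's sample data was lost.

```python
import re


def collapse_regex(address: str) -> str:
    # Hypothetical helper, same idea as MySQL 8 REGEXP_REPLACE(address, '\\s{2,}', ' ').
    # Note that \s also matches tabs and line breaks.
    return re.sub(r"\s{2,}", " ", address)


def collapse_first_block(address: str) -> str:
    # Hypothetical helper mirroring the 5.7 query: find the first "  " (INSTR),
    # keep the text before it, add one space, then LTRIM the remainder.
    pos = address.find("  ")  # 0-based; INSTR would return pos + 1
    if pos == -1:             # CASE ... ELSE address END
        return address
    return address[:pos] + " " + address[pos:].lstrip(" ")


print(collapse_first_block("7136 South Yale #300   Tulsa,;Oklahoma"))
print(collapse_first_block("428  NW  10th St. OKC"))  # only the first block is fixed
print(collapse_regex("428  NW  10th St. OKC"))
```

The second line still has two spaces before "10th", which is the one-block limitation the 5.7 answer assumes away; the regex version fixes every block.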

Filtering phone numbers REGEXP

I have a MySQL query to find 10-digit phone numbers that start with +1:
SELECT blah
FROM table
WHERE phone REGEXP'^\\+[1]{1}[0-9]{10}$'
How can I filter this REGEXP further to only search certain 3-digit area codes? (i.e., international 10-digit phone numbers that share the US number format)
I tried using the IN clause, i.e. IN('+1809%','+1416%'), but ended up with a syntax error:
WHERE phone REGEXP'^\\+[1]{1}[0-9]{10}$' IN('+1809%','+1416%')
You may use a grouping construct with an alternation operator here, like
REGEXP '^\\+1(809|416)[0-9]{7}$'
The (809|416) group matches any one of the listed area codes, and you can add more with further | alternatives. Since the area code already accounts for 3 of the 10 digits, only 7 trailing digits remain, hence [0-9]{7}. Note that MySQL versions prior to 8.0 do not support non-capturing groups, so you may only use capturing ones.
Also, the [1]{1} pattern is equivalent to 1. Every pattern is matched exactly once by default, so {1} is always redundant, and it makes little sense to use a character class [...] with a single symbol inside: classes are meant for two or more symbols, or to avoid escaping some special characters. Since 1 is a word character that needs no escaping, the square brackets are redundant here too.
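A quick way to sanity-check the pattern outside MySQL is Python's re module. Mind the escaping difference: the SQL literal '^\\+1(809|416)[0-9]{7}$' reaches the regex engine as ^\+1(809|416)[0-9]{7}$, which is what the raw string below contains. The phone numbers are invented test values, and the variable names are illustrative only.

```python
import re

# What MySQL's regex library receives after the SQL literal is parsed.
AREA_CODES = re.compile(r"^\+1(809|416)[0-9]{7}$")
# The original pattern, with and without the redundant [1]{1}.
ORIGINAL = re.compile(r"^\+[1]{1}[0-9]{10}$")
SIMPLIFIED = re.compile(r"^\+1[0-9]{10}$")

samples = ["+18095551234", "+14165551234", "+12125551234", "+1809555123"]
for phone in samples:
    # Columns: number, matches an allowed area code, original == simplified
    print(phone,
          bool(AREA_CODES.search(phone)),
          bool(ORIGINAL.search(phone)) == bool(SIMPLIFIED.search(phone)))
```

Only the 809 and 416 numbers with exactly 7 trailing digits match, and the simplified pattern agrees with the original on every sample.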

SSRS White space issue

I have 2 scenarios
Scenario 1:
abc Ins Services,
123 Pine St Fl 23
San Francisco, CA, USA
SCENARIO 2:
abc Ins Services,
#4567
123 Pine St Fl 23
San Francisco, CA, USA
All fields are dynamic, and I used Trim in every expression, but white space still appears as shown in scenario 1. I don't want this white space.
The space that you're seeing there isn't a space character; it's a line return. Line returns can be stored in the address strings in your database, and they are hard to see when you preview results in a program like SSMS. The Trim function only removes spaces. Line returns are usually made up of ASCII characters 13 (carriage return) and 10 (line feed). To remove them, you can use the Replace function like so:
=REPLACE(REPLACE(<string to search in>, CHR(13), ""), CHR(10), "")
This allows you to add your own line returns where you actually want them.
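The same nested replace in Python, for anyone who wants to see what each step removes. strip_line_returns is a hypothetical helper mirroring the SSRS expression, and the sample value assumes a CR+LF pair stored after the company name, as in scenario 1.

```python
def strip_line_returns(text: str) -> str:
    # Hypothetical helper, same order as =REPLACE(REPLACE(s, CHR(13), ""), CHR(10), "")
    return text.replace(chr(13), "").replace(chr(10), "")


company = "abc Ins Services,\r\n"        # assumed CR (13) + LF (10) after the name
print(repr(company.strip(" ")))           # space-only trim: the line return survives
print(repr(strip_line_returns(company)))  # line return removed
```

Trimming only spaces leaves '\r\n' in place, which is why Trim alone did not help.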

What does the SSIS CODEPOINT value 26 represent?

In one of the existing SSIS projects, I found a Conditional Split with the expression CODEPOINT(column)==26.
But I couldn't find what the value 26 represents. When I looked up CODEPOINT values for letters, they start from 65, and for the digits 0-9 they start from 48.
The CODEPOINT expression documentation merely states the following:
Returns the Unicode code point of the leftmost character of a character expression
Exploring Wikipedia turned up this passage on Unicode encodings:
The first 128 Unicode code points, U+0000 to U+007F, used for the C0 Controls and Basic Latin characters and which correspond one-to-one to their ASCII-code equivalents
Sweet. Putting that together and consulting the ASCII table lets me determine that 26 is the SUB (substitute) control character. It's Ctrl-Z, for those really wanting to try this at home (and working under a Unix(tm) variant). CP/M and MS-DOS also historically used this character as an end-of-file marker in text files.
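The lookup can be reproduced with Python's ord(), which, like CODEPOINT, returns the Unicode code point of a character. codepoint below is a hypothetical helper that looks only at the leftmost character; returning None for an empty string is this sketch's own choice, not documented SSIS behaviour. The last line shows where Ctrl-Z comes from: Ctrl plus a letter produces that letter's code minus 64.

```python
from typing import Optional


def codepoint(value: str) -> Optional[int]:
    # Hypothetical helper: code point of the leftmost character, like CODEPOINT(value).
    # None for empty input is an assumption of this sketch.
    return ord(value[0]) if value else None


print(codepoint("A"), codepoint("0"), codepoint("\x1a rest of row"))  # 65 48 26
print(ord("Z") - 64)  # 26: Ctrl-Z produces the SUB character
```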

invalid column count on line 1

I'm trying to load a CSV file that has 21 columns and 240 rows through phpMyAdmin. The most common error message is:
"invalid column count on line 1" (using csv import)
though when using load data, I get:
"error: #1083 – Field separator argument is not what is expected; check the manual"
Columns separated with ,
Columns enclosed with "
Columns escaped with \
Lines terminated with auto (though I've tried \r, \n, \r\n and any combination of the 3)
I have also tried escaping the quotes and commas, but it doesn't seem to do anything.
This is the first row of the data:
Denis,NULL,Wirtz,"221Maryland Hall 3400 North Charles Street\, Baltimore\, MD 21236",,410-516-7006,410-516-5528,wirtz#jhu.edu,Theophilus Halley Smoot Professor,NULL,NULL,"K.L. Yap\, S.I. Fraley\, M.M. Thiaville\, N. Jinawath\, K. Nakayama\, J.-L. Wang\, T.-L. Wang\, D. Wirtz\, and I.-M. Shih\, ÒNAC1 is an actin-binding protein that is essential for effective cytokinesis in cancer cellsÓ\, Cancer Research 72: 4085_4096 (2012).D.H. Kim\, S.B. Khatau\, Y. Feng\, S. Walcott\, S.X. Sun\, G.D. Longmore\, and D. Wirtz\, ÒActin cap associated focal adhesions and their distinct role in cellular mechanosensingÓ\, Scientific Reports (Nature) 2:555-568 (2012).S.I. Fraley\, Y. Feng\, G.D. Longmore\, and D. Wirtz\, ""Dimensional and temporal controls of cell migration by zyxin and binding partners in three-dimensional matrix""\, Nature Communications 3:719-731 (2012)P.-H. Wu\, C.M. Hale\, J.S.H. Lee\, Y. Tseng\, and D. Wirtz\, ÒHigh-throughput ballistic injection nanorheology (htBIN) to measure cell mechanicsÓ\, Nature Protocols 7: 155_170 (2012)C.M. Hale\, W.-C. Chen\, S.B. Khatau\, B.R. Daniels\, J.S.H. Lee\, and D. Wirtz\, ÒSMRT analysis of MTOC and nuclear positioning reveals the role of EB1 and LIC1 in single-cell polarizationÓ\, Journal of Cell Science124: 4267-4285 (2011).D. Wirtz\, K. Konstantopoulos\, and P.C. Searson\, ÒPhysics of cancer: the role of physical interactions and mechanical forces in cancer metastasisÓ\, Nature Reviews Cancer 11: 512-522 (2011)",,NULL,http://www.jhu.edu/chembe/faculty-template/DenisWirtz.jpg,Department of Chemical and Biomolecular Engineering,NULL,Whiting School of Engineering,"Postdoctoral\, Physics\, Biophysics. ESPCI\, Paris. 1993 - 1994Ph.D.\, Cemical Engineering. Stanford University. 1993M.S.\, Chemical Engineering. Stanford University. 1989B.S.\, Physics Engineering. Free University of Brussels. 1983-1988",Johns_Hopkins_University
Any help is greatly appreciated.
There are backslashes in front of commas inside double-quoted strings. If your import utility doesn't honor the enclosing quotes or doesn't treat \, as an escaped comma, those commas will act as column separators, and you will get the wrong number of columns:
"221Maryland Hall 3400 North Charles Street\, Baltimore\, MD 21236"
Similarly, doubled double quotes ("") are the usual way to escape a single double quote within a quoted string. But if the parser reads them as a string terminator, that's another way your column count can be thrown off.
I have seen Excel mess up exported data in many fascinating ways, but this one is new to me.
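One way to see how parser settings change the column count is Python's csv module. It is not what phpMyAdmin uses, but it exposes the same knobs (quote character, escape character, doubled quotes). The sample line is a shortened, hypothetical version of the question's first row, and parse is a hypothetical helper.

```python
import csv

LINE = 'Denis,NULL,"Street\\, Baltimore\\, MD 21236",x'
QUOTED_QUOTE = 'a,"He said ""hi""",b'


def parse(line: str, **fmt) -> list:
    # Hypothetical helper: parse a single CSV line with the given dialect options.
    return next(csv.reader([line], **fmt))


# Quotes honored, no escape character: 4 columns, backslashes kept in the value.
print(parse(LINE))
# Quotes honored, backslash as escape character: 4 columns, backslashes removed.
print(parse(LINE, escapechar="\\"))
# Quotes ignored: every comma separates, so 6 columns.
print(parse(LINE, quoting=csv.QUOTE_NONE))
# Doubled quotes ("") read as one literal quote: 3 columns.
print(parse(QUOTED_QUOTE))
```

Only the run that ignores the enclosing quotes produces extra columns, so checking whether the importer honors "Columns enclosed with" and "Columns escaped with" the way you expect is a reasonable first step.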