I have a MySQL query to find 10 digit phone numbers that start with +1
SELECT blah
FROM table
WHERE phone REGEXP'^\\+[1]{1}[0-9]{10}$'
How can I filter this REGEXP further to only search certain 3 digit area codes? (ie. International 10 digit phone numbers who share US number format)
I tried using the IN clause ie. IN('+1809%','+1416%') but ended up with error in syntax
WHERE phone REGEXP'^\\+[1]{1}[0-9]{10}$' IN('+1809%','+1416%')
You may use a grouping construct with an alternation operator here, like
REGEXP '^\\+1(809|416)[0-9]{7}$'
             ^^^^^^^^^
Just subtract 3 from 10 to match the trailing digits. Note that in MySQL versions prior to 8.x, you cannot use non-capturing groups; only capturing groups are available.
Also, the [1]{1} pattern is equivalent to 1: every pattern is matched exactly once by default, so {1} is always redundant. It also makes little sense to use a character class [...] with a single symbol inside. A character class is meant for two or more symbols, or for avoiding escapes, and 1 does not need escaping because it is a word char, so the square brackets are redundant here.
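To sanity-check the alternation before running it against a table, the same pattern can be tried in Python. This is only a sketch: the sample numbers and the helper name has_allowed_area_code are made up for illustration, and Python's raw string needs a single backslash where the MySQL string literal needs two.

```python
import re

# Same pattern as in the MySQL answer: +1, then area code 809 or 416,
# then the remaining 7 digits (10 - 3).
AREA_CODE_PATTERN = re.compile(r"^\+1(809|416)[0-9]{7}$")


def has_allowed_area_code(phone: str) -> bool:
    """Return True for +1 numbers whose area code is 809 or 416."""
    return AREA_CODE_PATTERN.match(phone) is not None


# Hypothetical sample data, not taken from the question.
samples = ["+18095551234", "+14165551234", "+12125551234", "+1809555123"]
matches = [p for p in samples if has_allowed_area_code(p)]
print(matches)
```

The third sample fails on the area code and the fourth is one digit short, so only the first two are kept.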
I have column in tableau with following values:
1234
3456
6789
camp-1
camp-2
camp-3
I only want to show filter with values
camp-1
camp-2
camp-3
How can I only select the alphabetic values in filter in tableau?
Your example is not clear about what you want to include and what you want to exclude. To explain better, I took an elaborated example
Case-1 If you want to search/filter for digits at start, use this calculated field
REGEXP_MATCH([Field1], '^[0-9]')
Case-2 If you want to search for numbers anywhere, use this
REGEXP_MATCH([Field1], '(.*)[0-9]')
Case-3 If digits only are required
REGEXP_MATCH([Field1], '^[0-9]+$')
Case-4 For a letter at the start, use this
REGEXP_MATCH([Field1], '^[:alpha:]')
Results of all matches are shown below
Note Combining numbers anywhere AND alphabet at start you can filter out case1, case2 and case3 only.
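The four cases can be tried outside Tableau with a short Python sketch. Two assumptions: Python's re module does not understand POSIX classes such as [:alpha:], so [A-Za-z] stands in for it here, and the pattern names in the dictionary are invented for this example. The sample values are the ones from the question.

```python
import re

# Values from the question's column.
values = ["1234", "3456", "6789", "camp-1", "camp-2", "camp-3"]

# One pattern per case in the answer. [A-Za-z] is a stand-in for [:alpha:].
patterns = {
    "digit_at_start": r"^[0-9]",
    "digit_anywhere": r"[0-9]",
    "digits_only": r"^[0-9]+$",
    "letter_at_start": r"^[A-Za-z]",
}


def classify(value: str) -> dict:
    """Report which of the four patterns match the given value."""
    return {name: re.search(p, value) is not None for name, p in patterns.items()}


# Keep only the values that start with a letter, as the question asks.
letters_first = [v for v in values if classify(v)["letter_at_start"]]
print(letters_first)
```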
Good Luck
If the Tableau column contains a mixture of numbers and text, the column will be a text column and all content will be considered as text. This reduces the problem to that of identifying specific rows that contain non-numeric values.
This requires some string manipulation and comparison. If you know that the structure of the content in those rows is predictable (eg the first character is always a letter when there are non numeric characters in the row) then a simple equation will filter on those rows:
if ascii(left([Text And Numbers],1) )>57 then 'text' else 'number' END
This exploits the observation that the ASCII decimal code for the digit 9 is 57 and most of the ASCII characters with higher codes are letters or punctuation (which is a fair assumption if nothing other than numbers, letters or punctuation are present in your data).
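The same first-character test can be sketched in Python, where ord() plays the role of Tableau's ASCII(). The function name text_or_number is hypothetical, and the sketch assumes the value is never empty.

```python
def text_or_number(value: str) -> str:
    """Label a value by the ASCII code of its first character.

    ord("9") is 57, so any first character with a higher code is
    treated as text. Assumes value is a non-empty string.
    """
    return "text" if ord(value[0]) > 57 else "number"


labels = [text_or_number(v) for v in ["1234", "camp-1"]]
print(labels)
```

Characters with codes below 48, such as a space or a minus sign, are labelled "number" by this rule, so it only suits data that fits the assumption stated above.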
Obviously, if letters and numbers could appear anywhere in the string, you need a more complex function. Tableau lets you use regular expressions, which can express much more complex text analysis, such as whether any alphabetic character is present in a string (see this for some ideas of the appropriate regex expressions).
I'm trying to cleanse a data set from erroneous phone number entries. Having trouble making the regular expression for the filter in MySQL.
The structure is the following:
First digit is in 2-9
Second and third digits can be any numeral except they may not be the same number
Fourth digit is in 2-9
Fifth and sixth digits can be any numeral except '11'
I've landed on a few rather elaborate regular expressions which didn't quite work, but I'm sure there is a simpler approach.
A "valid" number might look like:
2028658680
7137038891
My filter usually misses cases such as:
6778914351
7777777777
6178116678
Note that these numbers are completely made up.
This is possible, but it will be long and ugly. With a more robust regex engine you can do lookaround and even conditional statements, but MySQL doesn't support such things as far as I know.
^[2-9](?:0[1-9]|1[02-9]|2[013-9]|3[0-24-9]|4[0-35-9]|5[0-46-9]|6[0-57-9]|7[0-689]|8[0-79]|9[0-8])[2-9](?:1[02-9]|[02-9]1|[02-9]{2})[0-9]{4}$
https://regex101.com/r/qPuS5W/1
Explanation:
[2-9] First digit is any number from 2 to 9.
(?:0[1-9]|1[02-9]|2[013-9]|3[0-24-9]|4[0-35-9]|5[0-46-9]|6[0-57-9]|7[0-689]|8[0-79]|9[0-8]) Non capturing group that contains 10 alternatives starting with each number 0 to 9 followed by any number except that number.
(?:1[02-9]|[02-9]1|[02-9]{2}) Non capturing group that matches either 1 followed by a number that isn't 1, a number that isn't 1 followed by 1, or two numbers that aren't 1.
[0-9]{4} 4 of any number.
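The pattern can be checked against the numbers from the question with a small Python sketch. Python's re module accepts the (?:...) groups as written. As far as I know, MySQL versions before 8.0 do not, so there each (?: would probably have to become a plain (. The function name is_valid_phone is made up for this example.

```python
import re

# The answer's pattern, split across lines to mirror the explanation.
VALID_PHONE = re.compile(
    r"^[2-9]"  # first digit 2-9
    # second and third digits differ from each other
    r"(?:0[1-9]|1[02-9]|2[013-9]|3[0-24-9]|4[0-35-9]"
    r"|5[0-46-9]|6[0-57-9]|7[0-689]|8[0-79]|9[0-8])"
    r"[2-9]"  # fourth digit 2-9
    r"(?:1[02-9]|[02-9]1|[02-9]{2})"  # fifth and sixth digits are not "11"
    r"[0-9]{4}$"  # last four digits are unrestricted
)


def is_valid_phone(number: str) -> bool:
    """Return True if the 10-digit number satisfies all four rules."""
    return VALID_PHONE.match(number) is not None


valid = ["2028658680", "7137038891"]  # "valid" examples from the question
invalid = ["6778914351", "7777777777", "6178116678"]  # cases the old filter missed
print([is_valid_phone(n) for n in valid + invalid])
```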
I am trying to write one single formula to identify all the patterns in a column/field. For example: Below are the five different patterns
AG 5643 895468 UWEB
7546 695321 IJJK
PE 45612384
8642567921
16724385
Formula for
First pattern: Contains 4 numbers 6 numbers
'*[0-9][0-9][0-9][0-9] [0-9][0-9][0-9][0-9][0-9][0-9] *' This is not working. Can we specify the length? Something like this [0-9]{4} - 4 digit number?
First pattern should pick second one also.
3rd one: first 2 characters are alphabets 8 or 10 digit numbers
4th one: 10 digit number
5th one 8 digit number
Thanks in advance!
If you're working in MySQL you can use regular expressions with the RLIKE filter operator.
For example, WHERE text RLIKE '[0-9]{8}' finds all the rows with any consecutive sequence of eight digits in them anywhere. (http://sqlfiddle.com/#!9/44996/1/0)
WHERE text RLIKE '^[0-9]{8}$' finds the rows consisting of nothing but an eight-digit sequence. (http://sqlfiddle.com/#!9/44996/2/0)
WHERE text RLIKE '^[0-9A-Z]{2} ' finds the rows starting with two letters or digits and then a space. (http://sqlfiddle.com/#!9/44996/3/0)
You get the idea. Regular expressions have a lot of power to them, generally beyond the scope of a SO answer to explain. Beware, though. This is a common saying: If you solve a problem with a regular expression, now you have two problems. You need to be careful with them.
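The patterns can be tried against the five sample rows from the question with a short Python sketch. The helper rlike is a hypothetical stand-in for MySQL's RLIKE, built on re.search, which also matches anywhere in the string unless anchored. Note that $ (not the LIKE wildcard %) is the regex end-of-string anchor, and that Python matching is case-sensitive, which may differ from MySQL depending on collation. The last pattern answers the question about specifying a length with {4} and {6}.

```python
import re

# The five sample rows from the question.
rows = [
    "AG 5643 895468 UWEB",
    "7546 695321 IJJK",
    "PE 45612384",
    "8642567921",
    "16724385",
]


def rlike(pattern: str, text: str) -> bool:
    """Rough stand-in for MySQL's RLIKE: True if the pattern matches anywhere."""
    return re.search(pattern, text) is not None


eight_anywhere = [r for r in rows if rlike(r"[0-9]{8}", r)]
eight_only = [r for r in rows if rlike(r"^[0-9]{8}$", r)]
two_then_space = [r for r in rows if rlike(r"^[0-9A-Z]{2} ", r)]
# Four digits, a space, then six digits (the first pattern in the question).
four_six = [r for r in rows if rlike(r"[0-9]{4} [0-9]{6}", r)]

print(eight_anywhere, eight_only, two_then_space, four_six, sep="\n")
```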
I am trying to scrape prices from any given URL. I am using CsQuery and for the life of me, I cannot figure out the best way to find all items on a page that might be a price. A bonus would be figuring out the most likely price by size / color of the text and how close it is to the top of the page. I was thinking maybe looking at a Regex solution, but I am not sure if that is the correct way to go with CsQuery.
Well, if a currency sign is present, you might do something like:
(?:\$|\£)(\d+(?!\d*,\d)|\d{1,3}((, ?)\d{3}?)?(\3\d{3}?){0,4})(\.\d{1,2})?(?=[^\d,]|, (?!\d{3,})|$)
(?:\$|\£) -- matches literal currency symbols. You can remove this
if you can't count on the presence of currency symbols,
but it's a great anchor if you can
(\d+ -- matches any number of digits
(?!\d*,\d) as long as not followed by comma digit
|
\d{1,3} -- otherwise matches between 1 and 3 digits
(
(, ?) -- looks for a comma followed by a possible space
captures as \3
\d{3}?) -- followed by 3 digits
? -- zero or one times
(\3 -- looks for the same pattern of comma with or without space
\d{3}? -- followed by 3 digits
){0,4}) -- between 0 and 4 times, more on that below
(\. -- literal period
\d{1,2} -- followed by one or two digits
)? -- zero or one times (so, optional)
(?=[^\d,]|, (?!\d{3,})|$)
Another thing you might do is to limit how many repetitions of comma groups there can be, it might help weed out high numbers that aren't likely prices. If you're not expecting anything over 999,999, you might do this (but if you're dealing with foreign currencies, inflation has made some astronomically high--a loaf of bread in Zimbabwe costs fifty million).
For easy reading, I'll show you how to limit the repetitions to 7
Change the 4 (the only 4 in the whole regex) to 6 (the number you want minus 1, because one comma group is matched beforehand to establish the comma pattern).
(?:\$|\£)(\d+(?!\d*,\d)|\d{1,3}((, ?)\d{3}?)?(\3\d{3}?){0,6})(\.\d{1,2})?(?=[^\d,]|, (?!\d{3,})|$)
You can see this in action at: https://regex101.com/r/oU2nW2/1
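For a quick experiment outside regex101, the first version of the pattern (with {0,4}) can be run with Python's re module. This is only a sketch: the question is about C# and CsQuery, the sample sentence is invented, and the pound sign is written without the backslash, which should be equivalent.

```python
import re

# The answer's pattern with the {0,4} repetition limit.
PRICE = re.compile(
    r"(?:\$|£)(\d+(?!\d*,\d)|\d{1,3}((, ?)\d{3}?)?(\3\d{3}?){0,4})"
    r"(\.\d{1,2})?(?=[^\d,]|, (?!\d{3,})|$)"
)


def find_prices(text: str) -> list:
    """Return every substring of text that the price pattern matches."""
    return [m.group(0) for m in PRICE.finditer(text)]


# Hypothetical page text, not taken from the question.
sample = "Was $1,234.56 now £999 only"
print(find_prices(sample))
```

In a scraper, a function like this would be applied to the text of each candidate element, with ranking by position or styling handled separately.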
Does anyone know how to match even and odd numbers of letters using REGEXP in MySQL? I need to match an even number of A's followed by an odd number of G's and then at least one TC. For example, acgtccAAAAGGGTCatg would match. It's for DNA sequencing.
An even number of A's can be expressed as (AA)+ (one or more instance of AA; so it'll match AA, AAAA, AAAAAA...). An odd number of Gs can be expressed as G(GG)* (one G followed by zero or more instances of GG, so that'll match G, GGG, GGGGG...).
Put that together and you've got:
/(AA)+G(GG)*TC/
However, since regex engines will try to match as much as possible, this expression will actually match a substring of AAAGGGTC (ie. AAGGGTC)! In order to prevent that, you could use a negative lookbehind to ensure that the character before the first A isn't another A:
/(?<!A)(AA)+G(GG)*TC/
...except that MySQL doesn't support lookarounds in their regexes.
What you can do instead is specify that the pattern either starts at the beginning of the string (anchored by ^), or is preceded by a character that's not A:
/(^|[^A])(AA)+G(GG)*TC/
But note that with this pattern an extra character will be captured if the match isn't at the start of the string, so you'll have to chop off the first character of the match if it's not an A.
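The difference between the unguarded and guarded patterns, and the chopping of the extra leading character, can be sketched in Python. The names NAIVE, GUARDED and guarded_match are made up for this example, and the slash delimiters are dropped because Python does not use them.

```python
import re

NAIVE = re.compile(r"(AA)+G(GG)*TC")
GUARDED = re.compile(r"(^|[^A])(AA)+G(GG)*TC")


def guarded_match(seq: str):
    """Return the matched run without the guard character, or None."""
    m = GUARDED.search(seq)
    if m is None:
        return None
    # Group 1 is either empty (start of string) or the single non-A
    # character before the run, so strip exactly that many characters.
    return m.group(0)[len(m.group(1)):]


# An odd number of A's: the naive pattern still finds a substring.
naive_hit = NAIVE.search("AAAGGGTC").group(0)
guarded_hit = guarded_match("AAAGGGTC")
example_hit = guarded_match("acgtccAAAAGGGTCatg")
print(naive_hit, guarded_hit, example_hit)
```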
You can maybe try something like (AA)*(GG)*GTC
I think that would do the trick. Don't know if there's a special syntax for mysql though