Wrap integers in quotes from JSON data

I created this question yesterday.
I've since realised there are a few other bits of data that cause issues with the solutions I received, so I thought it best to ask a new question.
Take the following example data:
"87",0000,0767,"078",0785,"0723",23487, "061 904 5284","17\/10\/2016","some.name.789@hotmail.com"
Using the accepted solution from above, (?<!")(\b\d+\b)(?!"), the middle number of the date string (between the two \/) ends up wrapped, and both the quoted number containing spaces and the email address break as well.
The issues can be seen here: https://regex101.com/r/qVQYA7/6
My Solution
The following does seem to work for me, but it seems a bit messy. I have a feeling there's a much more succinct way to achieve the same result:
,(?<!("|\/|\\))(\b\d+\b)(?!("|\/|\\|( \d)))
Replace with: ,"$2"
https://regex101.com/r/qVQYA7/5
EDIT:
@Federico, this screenshot shows that spaces before or after commas break the replacement.

From reading both of your questions, I understand that you want to wrap in double quotes the numbers that aren't already quoted. For this, a simple regex like the following works:
(?<=,)(\d+)(?=,)
With the replacement string: "$1"
Update: now that you have updated the question, you can use this regex instead, which also tolerates whitespace around the numbers:
(?<=,)\s*(\d+)\s*(?=,)
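As a quick check, here is a minimal Python sketch of the updated regex applied to the sample data from the question. Using Python's re module is an assumption on my part (the question does not say which engine is in use), and the variable names are only for illustration.

```python
import re

# Sample data from the question (raw string so the backslashes stay literal).
data = r'"87",0000,0767,"078",0785,"0723",23487, "061 904 5284","17\/10\/2016","some.name.789@hotmail.com"'

# A bare run of digits sitting between two commas, with optional whitespace
# around it. The lookarounds do not consume the commas, so back-to-back
# numbers such as 0000,0767 are both matched.
pattern = re.compile(r'(?<=,)\s*(\d+)\s*(?=,)')

quoted = pattern.sub(r'"\1"', data)
print(quoted)
```

Note that a bare number at the very start or end of the input has a comma on only one side, so this pattern leaves it unquoted.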

My input pattern doesn't work

I've created a regex for checking a date format (01-01-0000 to 31-12-9999).
I tried an example regex, and it works, so there is something wrong with my regex, but when I try it in a debugger (regexr) it works just fine.
What am I missing?
([0]{1}[1-9]{1}|[1-2]{1}[0-9]{1}|[3]{1}[0-1]{1})(\-)([0]{1}[1-9]{1}|[1]{1}[0-2]{1})(\-)\d{4}
New regex after edit:
(0[1-9]|[12][0-9]|3[01])-(0[1-9]|1[0-2])-\d{4}
I use an HTML input of type text, and put the regex in pattern="my pattern".
Thanks in advance (:
Edit: Fixed the regex according to Casimir et Hippolyte's comment, and now it works.
Your regex looks OK; at least it matches both of your sample dates (tested on regex101.com).
You can simplify it a little:
No need for [...] around a single char (e.g. change [0] to 0).
No need for capturing groups around a dash (e.g. change (-) to -).
It is strange that you used capturing groups for the day and month but not for the year field (I added one in the example below).
So try the following regex:
(0[1-9]|[12][0-9]|3[01])-(0[1-9]|1[0-2])-(\d{4})
It is, however, not clear whether you really need capturing groups.
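A minimal Python sketch for trying the simplified pattern outside the browser. An HTML pattern attribute has to match the whole field value, so re.fullmatch is used here to approximate that behaviour; the name is_valid_date is made up for illustration.

```python
import re

DATE_RE = re.compile(r'(0[1-9]|[12][0-9]|3[01])-(0[1-9]|1[0-2])-(\d{4})')


def is_valid_date(text):
    """Return True if text has the shape DD-MM-YYYY (format check only)."""
    return DATE_RE.fullmatch(text) is not None


for sample in ["01-01-0000", "31-12-9999", "32-01-2000", "15-13-2000", "31-02-2000"]:
    print(sample, is_valid_date(sample))
```

The pattern only checks the shape of the input, so an impossible date such as 31-02-2000 still passes.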

Regular Expression for HTML attributes

I need to write a regular expression to catch the following things in bold
class="something_A211"
style="width:380px;margin-top: 20px;"
I have no idea how to write it, can someone help me?
I need this because in an HTML file I have to replace these attributes with nothing (with Notepad++), so that I end up with a clean <tr>, <td>, or any other tag.
Thank you
You can use a regex like this to capture the content:
((?:class|style)=".*?")
However, if you just want to match and delete that you can get rid of capturing groups:
(?:class|style)=".*?"
For all constructions like something="data", you can use this.
[^\s]*?\=\".*?\"
https://regex101.com/r/oQ5dR0/1
The link shows you what everything does.
To explain it briefly: a non-space character can occur any number of times before the "=", followed by the quotes and the content inside them.
The question mark in .*? (any character, any number of times) makes the match lazy, so it consumes as few characters as possible instead of running on to a later closing quote further along.
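To see what the replacement leaves behind, here is a small Python sketch of the class/style pattern. The sample <td> line is made up from the two attributes in the question, and the second pattern (with a leading \s*) is my own assumed variant, not something from the answers above.

```python
import re

html = '<td class="something_A211" style="width:380px;margin-top: 20px;">cell</td>'

# The pattern from the answer: removes the attributes but leaves the
# surrounding spaces behind.
attrs_only = re.sub(r'(?:class|style)=".*?"', '', html)

# Assumed variant: also swallow the whitespace in front of each attribute,
# so the tag comes out clean.
with_space = re.sub(r'\s*(?:class|style)=".*?"', '', html)

print(attrs_only)
print(with_space)
```

The same search patterns should be usable in the Notepad++ Replace dialog with the regular expression search mode selected and an empty replacement.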

Regex find two characters in order, between others, ignoring punctuation

I'm trying to filter using regex in MySQL.
The field is a text field and I want to find all that match 'MD' or similar ('M.D.', 'M. D.', 'DDS, M.D.' etc.).
I do not want to accept those that contain M and D as a part of another acronym (e.g., 'DMD'). However 'DMD, M.D.' I would want to find.
Apologies if this is a simple task - I read through some regex tutorials and couldn't figure this out! Thanks.
Update:
With help from the suggestions I arrived at the following solution:
(\s|^)M\.?\s*D\.?
which works for all of my cases. The quotes in my question were only there to indicate a string; they are not part of the string.
You can use a regex like this:
\b(M\.?\s*D\.?|D\.?\s*D\.?\s*S\.?)
If I have understood your requirement:
'([^'.]*[ ,]*M[. ]*D[. ]*)'
This looks for M and D, preceded by a space, a comma, or ', separated by zero or more dots and spaces, and followed by '.
It matches all the contents between the ' marks.
test: https://regex101.com/r/oV2kV8/2
In the end I found this solution works:
(\s|^)M\.?\s*D\.?(\s|$)
This allows for the 'MD' to be at the start or after another credential and to have spaces or periods or nothing between the letters.
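A quick way to sanity-check this pattern against the cases from the question is a small Python sketch like the one below. Python's re module is only a stand-in here; MySQL's regex syntax and string escaping may differ, so treat the results as indicative. The last sample ('MD, PhD') is an extra case I added.

```python
import re

MD_RE = re.compile(r'(\s|^)M\.?\s*D\.?(\s|$)')

samples = ['MD', 'M.D.', 'M. D.', 'DDS, M.D.', 'DMD', 'DMD, M.D.', 'MD, PhD']

# Map each sample to whether the pattern is found anywhere in it.
results = {s: bool(MD_RE.search(s)) for s in samples}

for sample, found in results.items():
    print(repr(sample), found)
```

One thing this shows: because the pattern requires whitespace or the end of the string after the credential, 'MD, PhD' is not matched (the D is followed by a comma).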

Regex Between HTML Tags - VBA

I have a page full of html data that I am scraping from.
There is one occurrence of a "gross amount" field that I am trying to extract.
<h3 id="cart_trans_detail_ach_grossamount_lbl">Gross Amount</h3>
<p id="cart_trans_detail_ach_grossamount_txt">$76.99 USD</p>
All I want to get from this is $76.99 USD
I have tried using RegexBuddy to put something together, but regex is not my strong suit. Even something simple like this: <p id="cart_trans_detail_ach_grossamount_txt">(.*)</p> matches the whole string and not just what is between the tags.
Any ideas?
First of all, using a regex to parse HTML is not recommended; you should use an HTML/XML parsing library instead. But if you really need a regular expression here, what you are missing is the lazy (non-greedy) modifier ? after the *, which makes the match stop at the first </p> it finds.
<p id="cart_trans_detail_ach_grossamount_txt">(.*?)</p>
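To make the difference visible, here is a minimal Python sketch comparing the greedy and lazy versions. The second paragraph appended to the sample is hypothetical; it is only there so that a later </p> exists on the same line.

```python
import re

# The snippet from the question plus a hypothetical second paragraph on the
# same line, so that more than one </p> is available to match.
html = ('<p id="cart_trans_detail_ach_grossamount_txt">$76.99 USD</p>'
        '<p>something else</p>')

prefix = r'<p id="cart_trans_detail_ach_grossamount_txt">'

# Greedy: .* runs to the LAST </p> on the line.
greedy = re.search(prefix + r'(.*)</p>', html).group(1)

# Lazy: .*? stops at the FIRST </p>.
lazy = re.search(prefix + r'(.*?)</p>', html).group(1)

print(greedy)
print(lazy)
```

Whichever engine is used, the text between the tags ends up in capture group 1, not in the overall match, so that is the value to read.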
Try this pattern:
(?<=grossamount_txt">\$)(\d*\.?\d*) USD
It works in Python and PHP, and it should also work in Java.
group(1) returns only the amount, without anything else.
The first parenthesized group is a positive lookbehind, which checks that the amount is preceded by the string grossamount_txt">$.
The second group then matches a numeric amount, which may have an integer part and a decimal part.
Finally, the last part of the pattern is the literal " USD".
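A minimal Python sketch of the pattern, run against the two lines from the question (variable names are only for illustration):

```python
import re

html = '''<h3 id="cart_trans_detail_ach_grossamount_lbl">Gross Amount</h3>
<p id="cart_trans_detail_ach_grossamount_txt">$76.99 USD</p>'''

# The lookbehind asserts the prefix without including it in the match.
amount_re = re.compile(r'(?<=grossamount_txt">\$)(\d*\.?\d*) USD')

match = amount_re.search(html)
amount = match.group(1)  # the number only
whole = match.group(0)   # the number plus " USD"

print(amount)
print(whole)
```

As far as I know, the VBScript.RegExp engine that is commonly used from VBA does not support lookbehind, so there the prefix would probably have to be matched normally and the amount read from a capture group.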
You can test it here
https://www.regex101.com/#python
where you can also find some more detailed explanation.
Here is an explanation of how lookaround works:
http://www.regular-expressions.info/lookaround.html
Hope it helps.

RegEx: Link Twitter-Name Mentions to Twitter in HTML

I want to do THIS, just a little bit more complicated:
Let's say I have an HTML input:
Don't break!
Some Twitter Users: @codinghorror, @spolsky, @jarrod_dixon and @blam4c.
You can't reach me at blam4c@example.com.
Is there a good regex to replace the Twitter username mentions with links to Twitter, but leave @example (the email address at the bottom) AND @test (in the link title, i.e. inside HTML tags) alone?
It should probably also avoid adding links inside existing links, i.e. not break this:
Hello @someone there!
My current attempt is to add ">" at the beginning of the string, then use this regex:
Search: '/>([^<]*\s)\@([a-z0-9_]+)([\s,.!?])/i'
Replace: '>\1@\2\3'
Then remove the ">" I added in step 1.
But that won't match anything but the "@blam4c". I know WHY it does so, that's not the problem.
I would like to find a solution that finds and replaces all twitter user name mentions without destroying the HTML. Maybe it might even be better to code this without RegEx?
First, keep the angle brackets out of your regexps.
Use an HTML parser and XPath to select the text nodes you are interested in processing, then consider a regexp for matching only @refs in those nodes.
I'll leave it to other people to give a specific answer to the regex part.
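To illustrate the idea of only touching text nodes, here is a rough Python sketch. It swaps in the standard-library html.parser event parser instead of XPath, assumes mentions are written with @ and that links should point at https://twitter.com/<name>, and uses made-up names (MentionLinker, link_mentions_html). It is a sketch under those assumptions, not a complete HTML rewriter; for example, doctype declarations are dropped and text inside <script> is not treated specially.

```python
import re
from html.parser import HTMLParser

MENTION_RE = re.compile(r'(?<!\w)@(\w+)')
LINK = r'<a href="https://twitter.com/\1">@\1</a>'  # assumed link format


class MentionLinker(HTMLParser):
    """Re-emits the HTML as parsed, linking @mentions in text outside <a>."""

    def __init__(self):
        super().__init__(convert_charrefs=False)
        self.out = []
        self.link_depth = 0  # > 0 while inside an existing <a> element

    def handle_starttag(self, tag, attrs):
        if tag == 'a':
            self.link_depth += 1
        # Raw tag text, so attributes such as title="@test" stay untouched.
        self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        self.out.append(self.get_starttag_text())  # e.g. <br/>

    def handle_endtag(self, tag):
        if tag == 'a' and self.link_depth > 0:
            self.link_depth -= 1
        self.out.append(f'</{tag}>')

    def handle_data(self, data):
        # Only text nodes outside existing links are rewritten.
        if self.link_depth == 0:
            data = MENTION_RE.sub(LINK, data)
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f'&{name};')

    def handle_charref(self, name):
        self.out.append(f'&#{name};')

    def handle_comment(self, data):
        self.out.append(f'<!--{data}-->')


def link_mentions_html(html):
    parser = MentionLinker()
    parser.feed(html)
    parser.close()
    return ''.join(parser.out)


sample = ('<p title="@test">Hi @codinghorror, mail blam4c@example.com or see '
          '<a href="/x">Hello @someone there!</a></p>')
print(link_mentions_html(sample))
```

The regex itself never sees a tag or an attribute, which is the point of keeping the angle brackets out of it.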
I agree with ddaa, there's almost no sane way to attack this without stripping the html links out first.
Presumably you'd be starting out with an actual Twitter message, which cannot by definition include any manually entered hyperlinks.
For example, here's how I found this question (the link resolves to this question so don't bother clicking it!)
Some Twitter Users: @codinghorror, @spolsky, @jarrod_dixon and @blam4c. http://bit.ly/2phvZ1
In this case, it's easy:
var msg = "Some Twitter Users: @codinghorror, @spolsky, @jarrod_dixon and @blam4c. http://bit.ly/2phvZ1";
// Verbatim string (@"...") so that \w reaches the regex engine unescaped.
var html = Regex.Replace(msg, @"(?<!\w)(@(\w+))", "$1");
(this might need some tweaking, I'd like to test it against a corpus, but it seems correct for the average Twitter message)
As for your more complicated cases (with HTML markup embedded in the tweets), I have no idea. Way too hard for me.
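For the plain-text case, the same idea can be sketched in Python as follows. The link format (https://twitter.com/<name>) and the function name link_mentions are assumptions for illustration, and the sample message is shortened and extended with the email address from the question.

```python
import re

# "@" followed by word characters, but only when the "@" is not itself
# preceded by a word character (which rules out email addresses).
MENTION_RE = re.compile(r'(?<!\w)@(\w+)')


def link_mentions(text):
    """Wrap each @mention in a link to the (assumed) profile URL."""
    return MENTION_RE.sub(r'<a href="https://twitter.com/\1">@\1</a>', text)


msg = "Some Twitter Users: @codinghorror and @blam4c. Mail: blam4c@example.com"
linked = link_mentions(msg)
print(linked)
```

The negative lookbehind is what keeps blam4c@example.com intact: the @ there follows a word character, so it is not treated as the start of a mention.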
This regexp might work a bit better: /\B\@([\w\-]+)/gim
Here's a jsFiddle example of it in action: http://jsfiddle.net/2TQsx/4/