VScode - replace captured group with the values - html

I have a bunch of strings in my code such as:
<td style="background-color:#fdfdff"> </td>
and
<td> </td>
in one file.
The goal is to replace the space in the first example with 0, and the space in the second example with - (a dash).
I'm using VS Code regex find and replace, but I can't find a way to replace captured groups with specific values, since the $1 and $2 groups refer to the original matched text.
This is just an example of how I'm trying to achieve this, but VS Code doesn't ignore the grouped regex.

An alternative process is to use a snippet, which can do conditional replacements. With this snippet:
"replaceTDs": {
    "prefix": "tdr", // whatever prefix you want
    "body": [
        "${TM_SELECTED_TEXT/(?<=\">)( )|( )/${1:+0}${2:+-}/g}"
    ]
}
The conditional replacements can be quite simple since you first find and select only the two alternative texts you are interested in. So
find (old version): <td\s*(style="[^"]*"\s*)> </td>|<td> </td>
This simpler find will probably work for you:
<td\s*(style="[^"]*")?\s*> </td>
Don't replace; instead press Ctrl+Shift+L, which selects all matches of your two alternatives. Then press Esc to move focus from the find widget to the editor.
Then apply your snippet: in this case, type tdr and press Tab,
and all the changes are made. You just have to make the snippet one time and then do a single find.
This technique scales a little better than running as many find/replaces as you have replacements to do. Even with more conditional replacements it would probably be a simple change to the one snippet to add more replacements.
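For comparison, the same idea (one find, with the replacement chosen by which alternative matched) can be sketched outside VS Code. This is a minimal Python sketch, not part of the snippet approach itself; the names TD_CELL, fill_cell and fill_empty_cells are made up for illustration, and it assumes each empty cell contains exactly one space, as in the question.

```python
import re

# Group 2 is set only when the cell has a style attribute.
TD_CELL = re.compile(r'(<td(\s+style="[^"]*")?\s*>) (</td>)')


def fill_cell(match: re.Match) -> str:
    # Styled cells get "0", plain cells get "-".
    filler = "0" if match.group(2) else "-"
    return f"{match.group(1)}{filler}{match.group(3)}"


def fill_empty_cells(html: str) -> str:
    return TD_CELL.sub(fill_cell, html)


sample = '<td style="background-color:#fdfdff"> </td>\n<td> </td>'
print(fill_empty_cells(sample))
```

Adding another case would mean adding one more optional group and one more branch in fill_cell, which mirrors how the snippet grows by one more conditional.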
Also you can simplify this even more if you use a keybinding to trigger your snippet (you don't have to change focus from the find widget - or create the separate snippet). So with no snippet, but this keybinding:
{
    "key": "alt+w",
    "command": "editor.action.insertSnippet",
    "args": {
        "snippet": "${TM_SELECTED_TEXT/(?<=\">)( )|( )/${1:+0}${2:+-}/g}"
    },
    "when": "editorHasSelection"
}

You can use
Search for (?<=<td\s+style="[^"]*">) (?=</td>) and replace with 0, and
Search for <td> </td> and replace with <td>-</td>, no need for a regex here.
Note that capturing groups are meant to keep captured substrings.
The first pattern matches
(?<=<td\s+style="[^"]*">) - a position in the string that is immediately preceded by <td, one or more whitespace characters, style=", zero or more characters other than ", and then ">
(a single space) - a literal space character
(?=</td>) - immediately to the right, there must be </td>.
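The two-pass approach above can also be tried in a script. One caveat for this sketch: Python's re module only accepts fixed-width lookbehinds, so the pattern from the answer cannot be used as-is there. The version below is an adaptation that captures the opening tag and writes it back instead; the function name two_pass_replace is hypothetical.

```python
import re


def two_pass_replace(html: str) -> str:
    # Pass 1: styled cells. The opening tag is captured and restored with
    # \g<1>, because a variable-width lookbehind is not allowed in Python.
    html = re.sub(r'(<td\s+style="[^"]*">) (?=</td>)', r'\g<1>0', html)
    # Pass 2: plain cells. No regex needed, as noted in the answer.
    return html.replace('<td> </td>', '<td>-</td>')


print(two_pass_replace('<td style="background-color:#fdfdff"> </td><td> </td>'))
```

The \g<1>0 form is used because \10 would be read as a reference to group 10.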

Related

How to use regex (regular expressions) in Notepad++ to remove all HTML and JSON code that does not contain a specific string?

Using regular expressions (in Notepad++), I want to find all JSON sections that contain the string foo. Note that the JSON just happens to be embedded within a limited set of HTML source code which is loaded into Notepad++.
I've written the following regex to accomplish this task:
({[^}]*foo[^}]*})
This works as expected in all the input that is possible.
I want to improve my workflow, so instead of just finding all such JSON sections, I want to write a regex to remove all the HTML & JSON that does not match this expression. The result will be only JSON sections that contain foo.
I tried using the Notepad++ regex Replace functionality with this find expression:
(?:({[^}]*?foo[^}]*?})|.)+
and this replace expression:
$1\n\n$2\n\n$3\n\n$4\n\n$5\n\n$6\n\n$7\n\n$8\n\n$9\n\n
This successfully works for the last occurrence of foo within the JSON, but does not find the rest of the occurrences.
How can I improve my code to find all the occurrences?
Here is a simplified minimal example of input and desired output. I hope I haven't simplified it too much for it to be useful:
Simplified input:
<!DOCTYPE html>
<html>
<div dat="{example foo1}"> </div>
<div dat="{example bar}"> </div>
<div dat="{example foo2}"> </div>
</html>
Desired output:
{example foo1}
{example foo2}
You can use
{[^}]*foo[^}]*}|((?s:.))
Replace with (?1:$0\n). Details:
{[^}]*foo[^}]*} - {, zero or more chars other than }, foo, zero or more chars other than } and then a }
| - or
((?s:.)) - Capturing group 1: any one char ((?s:...) is an inline modifier group where . matches all chars including line break chars, same as if you enabled . matches newline option).
The (?1:$0\n) replacement pattern replaces with an empty string if Group 1 was matched, else the replacement is the match text + a newline.
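To see the "keep the match, drop every other character" logic in isolation, here is a small Python sketch of the same pattern. Python has no (?1:...) conditional in replacement strings, so the condition is written as a replacement function; the names KEEP_OR_DROP and keep_foo_sections are assumptions made for this example.

```python
import re

# Either a {...foo...} section, or (group 1) any single character.
KEEP_OR_DROP = re.compile(r"\{[^}]*foo[^}]*\}|((?s:.))")


def keep_foo_sections(text: str) -> str:
    def repl(match: re.Match) -> str:
        # Group 1 matched: a stray character, so drop it.
        if match.group(1) is not None:
            return ""
        # Otherwise keep the whole section and add a newline.
        return match.group(0) + "\n"

    return KEEP_OR_DROP.sub(repl, text)


sample = (
    '<!DOCTYPE html>\n<html>\n'
    '<div dat="{example foo1}"> </div>\n'
    '<div dat="{example bar}"> </div>\n'
    '<div dat="{example foo2}"> </div>\n'
    '</html>\n'
)
print(keep_foo_sections(sample))
```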
Updates
The comment section was full, so I'm suggesting some code here instead.
Let me know if this is close to your intended result:
Find: ({.+?[\n]*foo[ \d]*})|.*?
Replace all: $1
Also added Toto's example

Extract and Create Property in JSON file with RegEx

I have the following JSON file. Dotted across the file is the following:
"properties": {
"Name": "Darlington",
"Description": "<br><br><br> <table border=\"1\" padding=\"0\"> <tr><td>CCGcode</td><td>00C</td></tr> <tr><td>CCGname_short</td><td>Darlington</td></tr>"
}
Using RegEx, I would like to extract the CCG Code property and add it back in so that the above becomes:
"properties": {
"Name": "Darlington",
"CCGcode": "00C",
"Description": "<br><br><br> <table border=\"1\" padding=\"0\"> <tr><td>CCGcode</td><td>00C</td></tr> <tr><td>CCGname_short</td><td>Darlington</td></tr>"
}
I've tried all sorts and I just can't get it to work. I am using Sublime Text.
^("Description":").*?<td>CCGcode<\/td><td>([^<>\n]*).*$
The above selects the code, but not sure how I can get it to create the property.
Try this
( *)"Description".*?CCGcode.*?<td>([^<]+)
This one is for Sublime Text 3.
Find what:
( *)("Description".*?CCGcode.*?<td>)([^<]+)
Replace with:
\1"CCGcode": "\3",\n\1\2\3
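A rough Python equivalent of this find/replace is sketched below, in case you want to run it over the file with a script rather than in the editor. The function name add_ccg_code and the shortened sample text are assumptions for illustration. In this sketch the replacement ends with \3 so that the captured code also stays inside the Description text.

```python
import re

# Group 1: indentation, group 2: Description up to the code cell, group 3: the code.
PATTERN = re.compile(r'( *)("Description".*?CCGcode.*?<td>)([^<]+)')


def add_ccg_code(text: str) -> str:
    # Insert a new property line above Description, reusing its indentation.
    return PATTERN.sub(r'\1"CCGcode": "\3",\n\1\2\3', text)


sample = (
    '  "Name": "Darlington",\n'
    '  "Description": "<table> <tr><td>CCGcode</td><td>00C</td></tr> </table>"\n'
)
print(add_ccg_code(sample))
```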
There's a very simple, but not so elegant, solution. Replace
"Description":.*?<td>CCGcode<\/td><td>([A-Z\d]*)<\/td>
with
"CCGcode":"\1",\n \0
I don't know how Sublime handles replacements, but you may have to change the \0 and \1 in the replacement to something else, e.g. $0 and $1.
What it does is find the Description entry and the CCGcode cell that follows it, capturing the code into capture group 1.
It then replaces capture group 0 (the whole matched text) with the new CCGcode JSON property followed by the original text.
It's a pretty fragile solution, but it works for your sample case.
Check out example at regex101.
Regards

REGEX in mysql table containing html data

I have a table that stores html templates in a mysql database. Now I have to perform some text replacement on them. However my target text is also present in some of the anchor tags and I don't want that to be replaced.
EX :
<body> ... (has huge html crap)... .........(Some more html crap) ... (a bit more of html crap) ... </body>
Task is to replace the occurrences of the "KEYWORD" with "NEW KEYWORD" in the body but not the urls.
It would also be helpful if I can first find such cases where the KEYWORD is a part of a link in a given template.
MySQL is not well suited to this kind of advanced string manipulation (versions before 8.0 have no regex replace function at all).
However, if you were to have a one-time-use PHP script do the editing (ie. select from the table, for each row process and update), you can do this:
// foreach row as $row
$newtext = preg_replace("(<a\b.*?>(*SKIP)(*FAIL)|KEYWORD)","NEW KEYWORD",$row['data']);
What this does is look for links (very approximate Regex but should suffice in almost all cases here), then skip over them. Then, it looks for KEYWORD and replaces it with NEW KEYWORD.
You can use this to quickly and easily handle the replacement.
If that "almost all cases" thing above turns out to not be enough, you can use DOMDocument to load the HTML into a parser and process text nodes only from there.
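The skip-the-links idea can be illustrated in other regex engines too. Python's re module has no (*SKIP)(*FAIL), so the sketch below uses a different but related technique: it matches either a link tag or the keyword, and the replacement function puts link tags back unchanged. The names LINK_OR_KEYWORD and replace_outside_links are assumptions, and the link pattern is just as approximate as the one above.

```python
import re

# Group 1 is set when an opening <a ...> tag was matched instead of the keyword.
LINK_OR_KEYWORD = re.compile(r"(<a\b.*?>)|KEYWORD")


def replace_outside_links(html: str, new_text: str = "NEW KEYWORD") -> str:
    def repl(match: re.Match) -> str:
        if match.group(1) is not None:
            return match.group(1)  # a link tag: leave it exactly as it was
        return new_text            # a keyword outside any tag attribute

    return LINK_OR_KEYWORD.sub(repl, html)


sample = '<p>KEYWORD</p><a href="http://example.com/KEYWORD">KEYWORD</a>'
print(replace_outside_links(sample))
```

As with the PHP version, only the opening tag is protected, so a KEYWORD in the visible link text is still replaced.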
Maybe you could find the cases where the KEYWORD is a part of a link with something like this:
SELECT * FROM tbl WHERE html REGEXP '<a[^>]*KEYWORD';

Powershell modifying HTML from ConvertTo-HTML

I have a script that generates an array of objects that I want to email out in HTML format. That part works fine. I am trying to modify the HTML string to make certain rows a different font color.
Part of the html string looks like this (2 rows only):
<tr>
<td>ABL - Branch5206 Daily OD Report</td>
<td>'\\CTB052\Shared_Files\FIS-BIC Reporting\Report Output Files\ABL\Operations\Daily\ABL - Branch5206 Daily OD Report.pdf'</td>
<td>13124</td>
<td>4/23/2013 8:05:34 AM</td>
<td>29134</td>
<td>0</td>
<td>Delivered</td>
</tr>
<tr>
<td>ABL - Branch5206 Daily OD Report</td>
<td>'\\CTB052\Shared_Files\FIS-BIC Reporting\Report Output Files\ABL\Operations\Daily\ABL - Branch5206 Daily OD Report.xls'</td>
<td>15716</td>
<td>4/23/2013 8:05:34 AM</td>
<td>29134</td>
<td>0</td>
<td>Delivered</td>
</tr>
I tried regex to add a font color to the beginning and end of the rows where the row ends with "Delivered":
$email = [regex]::Replace($email, "<tr><td>(.*?)Delivered</td></tr>", '<tr><font color = green><td>$1Delivered</td></font></tr>')
This didn't work (I am not sure if you can set font color for a whole row like that).
Any ideas on how to do this easily/efficiently? I have to do it on several different statuses (like Delivered)
Disclaimer: HTML cannot be parsed by a regular expression parser. A regular expression will NOT provide a general solution to this problem. If your HTML structure is well known and you don't have any other <tr></tr> elements, though, the following might work. On that note, is there some reason you can't modify the HTML generation to do this up front, instead of waiting until the HTML has already been generated?
Try this command:
PS > $email = $email -replace '(?s)<tr>(.*?)<td>Delivered</td>(.*?)</tr>','<tr style="color: #FF0000">$1<td>Delivered</td>$2</tr>'
The first string is the pattern. The (?s) tells the parser to allow . to accept newlines; this is called "single line" mode. Then it grabs a <tr> element that contains the string <td>Delivered</td>. The two capture groups grab everything else in the <tr> element around the <td>Delivered</td> string. Take note of the question marks following the *s. * by itself is greedy and matches as much text as possible; *? matches as little text as possible. If we just used * here, it would treat your entire string as one match and only replace the first <tr>.
The second string is the replacement. It puts the <tr> element and its contents back in place with an added style attribute, using the $1 and $2 group references to restore the text around the status cell.
One other minor note is the quoting. I tend toward single quotes anyway, but in this case, you're likely to have double quotes in the replacement string. So single quotes are probably the way to go.
As for how you could do this for different statuses, regular expressions really aren't designed for conditional content like that; it's like trying to use a screwdriver as a drill. You can hard code several replaces or loop over status/color pairs and build your pattern and replace strings from them. A full blown HTML parser would be more efficient if you can find one for .NET; you might try to get away with an XML parser if you can guarantee it's valid XML. Or, going back to my question at the beginning, you could modify the HTML generation. If your e-mails are few in number, though, this may not be a bottleneck worth addressing. Development time spent is also costly. See if it's fast enough and try a different route if not.
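The "loop over status/color pairs" option could look roughly like the sketch below. It is written in Python rather than PowerShell purely for illustration. The "Failed" status, the colors, and the name colorize_rows are all assumptions. One deliberate difference from the one-liner above: the groups here are not allowed to run past </tr>, so a row without the status cannot be merged into the next row that has it.

```python
import re

# Hypothetical status/color pairs; only "Delivered" appears in the question.
STATUS_COLORS = {"Delivered": "green", "Failed": "red"}

# Any text, including newlines, that stays inside the current row.
WITHIN_ROW = r"((?:(?!</tr>).)*?)"


def colorize_rows(html, status_colors):
    for status, color in status_colors.items():
        pattern = re.compile(
            "<tr>" + WITHIN_ROW + "<td>" + re.escape(status) + "</td>"
            + WITHIN_ROW + "</tr>",
            re.DOTALL,
        )
        # Rebuild the row with a style attribute on its <tr>.
        html = pattern.sub(
            lambda m, s=status, c=color: (
                f'<tr style="color: {c}">{m.group(1)}<td>{s}</td>{m.group(2)}</tr>'
            ),
            html,
        )
    return html


rows = (
    "<tr>\n<td>A</td>\n<td>Failed</td>\n</tr>\n"
    "<tr>\n<td>B</td>\n<td>Delivered</td>\n</tr>"
)
print(colorize_rows(rows, STATUS_COLORS))
```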
Credit where it's due: I took the HTML style attribute from @FrankieTheKneeMan.

Potential pitfalls with my new markup language?

Something that's really bothered me about XHTML, and XML in general, is why it's necessary to indicate which tag you're closing. Things like <b>bold <i>bold and italic</b> just italic</i> aren't legal anyway. Thus I think using {} makes more sense. Anyway, here's what I came up with:
doctype;
html
{
    head
    {
        title "my webpage"
        javascript '''
            // code here
            // single quotes do not allow variable substitution, like PHP
            // triple quotes can be used like Python
        '''
    }
    body
    {
        table {
            tr {
                td "cell 1"
                td "cell 2"
                td #var|filter1|filter2:arg
            }
        }
        p "variable #var in a string"
        p "variable #{var|withfilter}"
        input(type=password, value=secret); // attributes are specified like this
        br; // semi-colons are used on elements that don't have content
        p { "strings are" "automatically" "concatenated together" #andvars "too" }
    }
}
Tags that contain only one element do not need to be enclosed in braces (for example, in td "cell 1" the td is closed immediately after the text). Strings are output directly, except that double-quoted strings allow variable substitution and single-quoted strings do not. I'm adopting a filtering scheme similar to Django's. The thing I'm most concerned about, I think, is variable substitution in double quotes. I don't want people to have to switch to single quotes everywhere because things that shouldn't be treated as variables are being picked up as variables. I don't think the # character is very commonly used in code. I was going to use $ like PHP, but jQuery uses that, and I want to allow people to do substitutions in their JS too (of course, if they don't need to, they should use single quotes!).
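To make the substitution concern concrete, here is one way the #var and #{var|filter} forms could be expanded. This is only a guess at the behaviour, written in Python for illustration: the filter names, the substitute function, and the choice to leave unknown names untouched are all assumptions, and filter arguments (filter2:arg) are not handled.

```python
import re

# Hypothetical filters; the real filter set is not specified in the post.
FILTERS = {"upper": str.upper, "trim": str.strip}

# Either #{name|filter|...} or a bare #name.
VAR = re.compile(r"#\{(\w+)((?:\|\w+)*)\}|#(\w+)")


def substitute(text, variables, filters=FILTERS):
    def repl(match):
        name = match.group(1) or match.group(3)
        if name not in variables:
            # Not a known variable (e.g. a CSS color like #fff): leave as-is.
            return match.group(0)
        value = str(variables[name])
        # Apply each filter named after the variable, left to right.
        for filter_name in filter(None, (match.group(2) or "").split("|")):
            value = filters[filter_name](value)
        return value

    return VAR.sub(repl, text)


print(substitute("variable #{var|upper} and color #fff", {"var": "x"}))
```

Leaving unknown names alone is one possible answer to the worry about text being treated as a variable when it shouldn't be, at the cost of silently ignoring typos in variable names.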
Templates will use "dictionaries". By default, it uses this HTML dict, with familiar tags, but you can easily add your own. "Tags" may consist of not just one, but multiple HTML tags.
Still need to decide how to do loops and including partials...
Edit: Started an open source project, for those interested.
I believe you can get close to that with the syntax of the Tcl scripting language.
The thing I like the most about your idea is the removal of the (to me very) redundant information in the closing tags, which has its roots in SGML markup.
Another clean option, IMO, is to go down the road of using indentation to specify scope, eliminating braces altogether. With the assumption of a little editor support, I can imagine this happening.
I think it's possibly stifling that globally used specifications cater to the theoretical person using vi or Notepad to type out their markup...