Using grep in BBEdit - html

I'd like BBEdit to search some HTML and match every paragraph tag that contains a text string like "myText".
This sort of works but often matches beyond the closing ">" of the tag.
<p.*myText[^>]*>
As I understand it, this should match the opening angle bracket-"p", then any number of characters until it finds "myText", then any number of characters that are NOT ">" until it finds the closing ">". What's wrong?

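The .* is greedy and also matches ">", so it can run past the end of the tag to the last "myText" on the line. Restrict the match to characters inside the tag: use <p\s[^>]*myText[^>]*> (from a comment by Wiktor Stribiżew).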
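To see the difference outside BBEdit, here is a minimal Python sketch comparing the two patterns. The sample line and the variable names are made up for illustration, and Python's re module is only standing in for BBEdit's grep engine, which may differ in other details.

```python
import re

# Original attempt: ".*" is greedy and is allowed to cross ">"
greedy = re.compile(r'<p.*myText[^>]*>')
# Suggested fix: "[^>]*" can never leave the opening tag
fixed = re.compile(r'<p\s[^>]*myText[^>]*>')

# Hypothetical sample: "myText" appears inside the <p> tag and again later on
line = '<p class="myText box">hello</p> <span>myText</span>'

greedy_match = greedy.search(line).group(0)
fixed_match = fixed.search(line).group(0)

print(greedy_match)  # runs through to the end of the line
print(fixed_match)   # <p class="myText box">
```

The \s after <p also keeps the pattern from matching other tags that start with "p", such as <pre>.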

Related

RegEx replace only occurrences outside of <h> html tags

I would like to regex replace Plus in the below text, but only when it's not wrapped in a header tag:
<h4 class="Somethingsomething" id="something">Plus plan</h4>The <b>Plus</b> plan starts at $14 per person per month and comes with everything from Basic.
In the above I would like to replace the second "Plus" but not the first.
My regex attempt so far is:
(?!<h\d*>)\bPlus\b(?!<\\h>)
Meaning:
Do not capture the following if it is inside an <h + 1 digit + 0 or more characters> tag that ends with a closing <\h>
Capture only if the group "Plus" is surrounded by spaces or white space
However - this captures both occurrences. Can someone point out my mistake and correct this?
I want to use this in VBA, but this should be a general regex question, as far as I understand.
Somewhat related but not addressing my problem in regex
Not relevant, as not RegEx
You can use
\bPlus\b(?![^>]*<\/h\d+>)
See the regex demo. To use the match inside the replacement pattern, use the $& backreference in your VBA code.
Details:
\bPlus\b - a whole word Plus
(?![^>]*<\/h\d+>) - a negative lookahead that fails the match if, immediately to the right of the current location, there are
[^>]* - zero or more chars other than >
<\/h - </h string
\d+ - one or more digits
> - a > char.
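As a rough check of the pattern above, the following Python sketch applies it to the sample text from the question. The replacement word "Premium" is a made-up placeholder. Python spells the whole-match backreference \g<0>, where VBA uses $&.

```python
import re

text = ('<h4 class="Somethingsomething" id="something">Plus plan</h4>'
        'The <b>Plus</b> plan starts at $14 per person per month.')

# Whole word "Plus", unless only non-">" characters separate it from a closing </hN>
pattern = re.compile(r'\bPlus\b(?![^>]*</h\d+>)')

# "Premium" is a hypothetical replacement chosen for this example
replaced = pattern.sub('Premium', text)

# Reuse the matched text in the replacement via \g<0>
wrapped = pattern.sub(r'[\g<0>]', text)

print(replaced)
print(wrapped)
```

Only the "Plus" inside <b> changes. The one in the header stays because the lookahead can reach </h4> without crossing a ">".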

Search and replace outer tag in Atom using REGEX

Using Atom, I'm trying to replace the outer tag structure for multiple different texts within a document. I'm also using regex, in which I'm not versed enough to come up with my own solution.
HTML to be searched <span class="klass">Any text string</span>
Replace it with <code>Any text string</code>
My REGEX (<?span class="klass">)+[\w]+(<?/span>)
Is there a wildcard to "keep" the [\w] part into the replaced result?
You can use a capture group to capture the text in between the <span> tags during the match, and then use it to build the <code> output you want. Try the following find and replace:
Find:
<span class="klass">(.*?)</span>
Replace:
<code>$1</code>
Here $1 represents the text matched by (.*?), which we captured in the search. Note that we use .*? rather than just .* when capturing between tags. The .*? form is a "lazy" (non-greedy) dot: it tells the engine to stop at the first closing </span> tag. Without the ?, the match would be greedy and would consume as much as possible, ending only at the last </span> tag it can reach.
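The lazy versus greedy behaviour can be sketched in Python as follows. The sample string is invented for illustration, and Python writes the group reference as \1 where Atom uses $1.

```python
import re

# Hypothetical input with two spans on one line
html_text = '<span class="klass">first</span> and <span class="klass">second one</span>'

# Lazy: each span is matched and rewritten separately
lazy_result = re.sub(r'<span class="klass">(.*?)</span>', r'<code>\1</code>', html_text)

# Greedy: one match runs from the first opening tag to the last closing tag
greedy_result = re.sub(r'<span class="klass">(.*)</span>', r'<code>\1</code>', html_text)

print(lazy_result)    # <code>first</code> and <code>second one</code>
print(greedy_result)  # <code>first</span> and <span class="klass">second one</code>
```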

IE(11) does not escape "<" character when concatenated with an alphabetical character

I posted a similar question earlier but all replies missed the point or just assumed something basic/simple, so I'll try to explain again. Please read on if you wish to help...
I want to be able to type something like <this> is visible and have it show up on a rendered page. When I type the same text without this site's code-text, this is what I get: "something like is visible". Notice that the text between the "<" and ">" is missing.
In fact, I had to add the ">" character, otherwise this text would never have shown up. This issue does not happen if the "<" character is not concatenated (i.e.: "something like < this> is visible")
The reason for that is that IE believes I am creating a tag. I want to escape the "<" special character.
Conversion does not work (i.e.: converting "<" to &lt; or &#60;).
Thank you.
I can't insert examples into my comment, but have you tried the following:
Something like &lt;this&gt; is visible.
You can use this page as a reference.
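If the page text is generated by a program rather than typed by hand, the escaping can be done automatically. The sketch below assumes the HTML is produced from Python, which the question does not state, and uses the standard-library html module.

```python
import html

raw = 'Something like <this> is visible.'

# html.escape turns &, < and > (and quotes by default) into character references
escaped = html.escape(raw)

print(escaped)  # Something like &lt;this&gt; is visible.

# html.unescape reverses the conversion
round_trip = html.unescape(escaped)
```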

Sublime Text regex to find and replace whitespace between two xml or html tags?

I'm using Sublime Text and I need to come up with a regex that will find the whitespaces between a certain opening and closing tag and replace them with commas.
Example: Replace white space in
<tags>This is an example</tags>
so it becomes
<tags>This,is,an,example</tags>
Thanks!
You just have to use a simple regex like:
\s+
And replace it with a comma.
Working demo
This will find instances of
<tags>...</tags>
with whitespace between the tags
(<tags>\S+)\s(.+</tags>)
This will replace the first whitespace with a comma
\1,\2
Open Find and Replace [OS X Cmd+Opt+F :: Windows Ctrl+H]
Use the two values above to find and replace and use the 'Replace All' option. Repeat until all the whitespaces are converted to commas.
The best answer is probably a quick script but this will get you there fairly fast without needing to do any coding.
You can replace any one or more whitespace chunks in between two tags using a single regular expression:
(?s)(?:\G(?!\A)|<tags>(?=.*?</tags>))(?:(?!</?tags>).)*?\K\s+
See the regex demo. Details
(?s) - a DOTALL inline modifier, makes . match line breaks
(?:\G(?!\A)|<tags>(?=.*?</tags>)) - either the end of the previous successful match (\G(?!\A)) or (|) a <tags> substring that is immediately followed by any zero or more chars, as few as possible, and then </tags> (see (?=.*?</tags>))
(?:(?!</?tags>).)*? - any char that does not start a <tags> or </tags> substring, zero or more occurrences but as few as possible
\K - match reset operator
\s+ - one or more whitespaces (NOTE: use \s if each whitespace must be replaced).
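For comparison, here is how the same result could be scripted. Python's re module supports neither \G nor \K, so this sketch swaps in a different technique: it matches each <tags>...</tags> block and rewrites its contents in a callback. The function name commas_inside_tags and the sample string are hypothetical.

```python
import re

def commas_inside_tags(text, tag="tags"):
    """Replace each run of whitespace between <tag> and </tag> with a comma."""
    # Capture the opening tag, the (lazy) content, and the closing tag
    block = re.compile(r'(<{0}>)(.*?)(</{0}>)'.format(re.escape(tag)), re.DOTALL)

    def fix(match):
        # Only the content between the tags is rewritten
        inner = re.sub(r'\s+', ',', match.group(2))
        return match.group(1) + inner + match.group(3)

    return block.sub(fix, text)

sample = 'keep these spaces <tags>This is an   example</tags> and these'
converted = commas_inside_tags(sample)
print(converted)  # keep these spaces <tags>This,is,an,example</tags> and these
```

Whitespace outside the tags is left alone, and a run of several spaces becomes a single comma, which matches the \s+ choice described above.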

Regex conditional

How would I write a RegEx to:
Find a match where the first instance of a > character is before the first instance of a < character.
(I am looking for bad HTML where the closing > initially in a line has no opening <.)
It's a pretty bad idea to try to parse html with regex, or even try to detect broken html with a regex.
What happens when there is a line break so that the > character is the first character on the line, for example (valid HTML)?
You might get some mileage from reading the answers to this question also: RegEx match open tags except XHTML self-contained tags
Would this work?
string =~ /^[^<]*>/
This should start at the beginning of the line, skip over all characters that aren't an opening '<', and then match if it finds a closing '>'.
^[^<>]*>
if you need the corresponding < as well,
^[^<>]*>[^<]*<
If there is a possibility of tags before the first >,
^[^<>]*(?:<[^<>]+>[^<>]*)*>
Note that it can give false positives, e.g.
<!-- > -->
is valid HTML, but the regex will flag it.
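A small Python sketch of the last pattern, applied line by line, including the comment false positive. The helper name has_stray_gt and the sample lines are made up for illustration. re.match already anchors at the start of the string, so the leading ^ is dropped.

```python
import re

# Optional well-formed tags, then a ">" that has no opening "<"
STRAY_GT = re.compile(r'[^<>]*(?:<[^<>]+>[^<>]*)*>')

def has_stray_gt(line):
    """Return True if the line contains a '>' before any unmatched '<'."""
    return STRAY_GT.match(line) is not None

# Hypothetical sample lines
lines = [
    'text> more',        # stray ">" at the start
    '<p>ok</p>',         # well formed
    '<p>ok</p> oops>',   # stray ">" after well-formed tags
    '<!-- > -->',        # valid HTML comment, but still flagged
]

flags = [has_stray_gt(line) for line in lines]
print(flags)  # [True, False, True, True]
```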