regex newline character error - multiline

I am trying to make my regex work across multiple lines, but the "m" flag didn't seem to help. My regex works on the first line but not on the following lines.

You can skip the match part and just do it all in one step:
> "the *text* is to be replaced \n by *text*".replace(/\*([\s\S]*?)\*/g, '<i>$1</i>');
"the <i>text</i> is to be replaced \n by <i>text</i>"
. matches any character, but it excludes newlines. [\s\S] matches any character including newlines.
I changed your search regex to \*([\s\S]*?)\*, which non-greedily matches the stuff between the asterisks.
The replacement string is <i>$1</i>. $1 is replaced with the contents of the first capturing group, which is your text.
Also, because it looks like you're trying to convert Markdown to HTML, try using a pre-made JS converter: http://www.showdown.im/
You can use it like this:
var str = "the *text* is to be *replaced \n by* *text*";
alert(str.replace(/\*([\s\S]*?)\*/g, '<i>$1</i>'));
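The difference between . and [\s\S] described above can be checked with a small script. This is only a sketch on the sample string from the answer; the variable names are made up for illustration, and the s (dotAll) flag is shown as an alternative that is available in ES2018 and later engines.

```javascript
// Sample input with a real newline inside the second pair of asterisks.
const input = "the *text* is to be *replaced \n by* *text*";

// `.` does not match a newline, so the pair that spans the line break
// is not found and the asterisks end up paired incorrectly.
const dotOnly = input.replace(/\*(.*?)\*/g, "<i>$1</i>");

// [\s\S] matches any character, including newlines.
const withClass = input.replace(/\*([\s\S]*?)\*/g, "<i>$1</i>");

// With the `s` flag, `.` also matches newlines.
const withDotAll = input.replace(/\*(.*?)\*/gs, "<i>$1</i>");

console.log(JSON.stringify(dotOnly));
console.log(JSON.stringify(withClass));
console.log(JSON.stringify(withDotAll));
```

Note that the m flag only changes where ^ and $ match; it has no effect on what . matches.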

Related

How to use regex (regular expressions) in Notepad++ to remove all HTML and JSON code that does not contain a specific string?

Using regular expressions (in Notepad++), I want to find all JSON sections that contain the string foo. Note that the JSON just happens to be embedded within a limited set of HTML source code which is loaded into Notepad++.
I've written the following regex to accomplish this task:
({[^}]*foo[^}]*})
This works as expected on every input I expect to encounter.
I want to improve my workflow, so instead of just finding all such JSON sections, I want to write a regex to remove all the HTML & JSON that does not match this expression. The result will be only JSON sections that contain foo.
I tried using the Notepad++ regex Replace functionality with this find expression:
(?:({[^}]*?foo[^}]*?})|.)+
and this replace expression:
$1\n\n$2\n\n$3\n\n$4\n\n$5\n\n$6\n\n$7\n\n$8\n\n$9\n\n
This successfully works for the last occurrence of foo within the JSON, but does not find the rest of the occurrences.
How can I improve my code to find all the occurrences?
Here is a simplified minimal example of input and desired output. I hope I haven't simplified it too much for it to be useful:
Simplified input:
<!DOCTYPE html>
<html>
<div dat="{example foo1}"> </div>
<div dat="{example bar}"> </div>
<div dat="{example foo2}"> </div>
</html>
Desired output:
{example foo1}
{example foo2}
You can use
{[^}]*foo[^}]*}|((?s:.))
Replace with (?1:$0\n). Details:
{[^}]*foo[^}]*} - {, zero or more chars other than }, foo, zero or more chars other than } and then a }
| - or
((?s:.)) - Capturing group 1: any one char ((?s:...) is an inline modifier group where . matches all chars including line break chars, same as if you enabled . matches newline option).
The (?1:$0\n) replacement pattern replaces with an empty string if Group 1 was matched, else the replacement is the match text + a newline.
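The (?1:$0\n) conditional replacement is specific to Notepad++ (Boost regex). To see the same logic outside the editor, here is a rough JavaScript sketch that emulates it with a replacement callback. The variable names are hypothetical, and [\s\S] stands in for (?s:.), which JavaScript does not support in that form.

```javascript
// Simplified input from the question.
const html = [
  "<!DOCTYPE html>",
  "<html>",
  '<div dat="{example foo1}"> </div>',
  '<div dat="{example bar}"> </div>',
  '<div dat="{example foo2}"> </div>',
  "</html>",
].join("\n");

// First alternative: a {...} section containing foo.
// Second alternative: any single character, captured in group 1.
const pattern = /\{[^}]*foo[^}]*\}|([\s\S])/g;

// If group 1 took part in the match, drop the character;
// otherwise keep the whole match and append a newline.
const result = html.replace(pattern, (match, anyChar) =>
  anyChar !== undefined ? "" : match + "\n"
);

console.log(result);
```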
Updates
The comment section was full, so I am suggesting a pattern here instead.
Let me know if this is close to your intended result.
Find: ({.+?[\n]*foo[ \d]*})|.*?
Replace all: $1
Also added Toto's example

Regex to match style=' '

I'm using a series of regex patterns to remove HTML elements from my code. I need to also remove the style="{stuff}" attributes that are also present in the file.
At the moment I have style.*?, which matches only the word style, however I thought that by adding .*? to the regex it would also match with zero to unlimited characters after the style declaration?
I also have style={0,1}"{0,1}.*?"{0,1} which matches:
style=""
style="
style
But does not match style="something", again in this regex I would expect the .*? to match everything between the first " and the second ", but this is not the case. What do I need to do to change this regex so that it will match with all of the following:
style="font-family:"Open Sans", Arial, sans-serif;background-color:rgb(255, 255, 255);display:inline !important;"
style=""
style="something"
style
The pattern style.*? does not match the following parts because nothing comes after the non-greedy quantifier, so it matches as little as possible, which is zero characters.
You could use an optional group and a negated character class:
\bstyle(?:="[^"]*")?
In parts
\bstyle Word boundary, match style
(?: Non capturing group
=" Match = and opening "
[^"]* Match any char except " 0+ times, using a negated character class
" Match closing "
)? Close group and make it optional
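As a quick check of the pattern broken down above, this sketch runs it against the simple samples from the question. It also shows one caveat: because [^"]* stops at the first double quote, a value that itself contains unescaped double quotes (like the font-family sample) is only matched up to that inner quote. The variable names are illustrative only.

```javascript
const stylePattern = /\bstyle(?:="[^"]*")?/;

// Simple cases from the question: each one is matched in full.
const samples = ['style=""', 'style="something"', "style"];
const matched = samples.map((s) => s.match(stylePattern)[0]);

// A value with nested double quotes is cut off at the first inner quote.
const nested = 'style="font-family:"Open Sans", Arial, sans-serif;"';
const nestedMatch = nested.match(stylePattern)[0];

console.log(matched);
console.log(nestedMatch);
```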
If you want to accept either single or double quotes, and require the closing quote to be the same as the opening one (so that the mismatched quotes in style="' are not accepted as a pair), you could capture the opening quote with (["']) and use the backreference \1 for the closing quote:
\bstyle(?:=(["'])[^"]*\1)?
Here's what I cooked up. It uses positive lookbehind (?<=...) and lookahead (?=...) to ensure that the found match is inside an HTML tag:
(?<=<[a-zA-Z][^<>]*?)\sstyle(?:="[^"]*")?(?=[\s>])(?=[^<>]*>)
It will match any whitespace before the "style", so that a removal of all matches goes from <a stuff="..." style="width:18px;" href="someurl"> to <a stuff="..." href="someurl"> without leaving a double space behind where it was removed.
Note that some regex engines (Python's re module, for example) do not support variable-length lookbehind. You can work around this by turning the first and last parts, the lookbehind and the final lookahead, into capture groups, so that the pattern captures the whole HTML tag. Then replace each match with $1$2 instead of an empty string, which writes the tag back without the style="..." part.
The resulting regex for that would be:
(<[a-zA-Z][^<>]*?)\sstyle(?:="[^"]*")?(?=[\s>])([^<>]*>)
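A minimal sketch of the capture-group variant in JavaScript, using the tag from the answer plus one made-up sample to show that text outside a tag is left alone. The variable names are hypothetical. Each pass removes one style attribute per tag, which is enough for these inputs.

```javascript
const tagStyle = /(<[a-zA-Z][^<>]*?)\sstyle(?:="[^"]*")?(?=[\s>])([^<>]*>)/g;

// Tag from the answer: the style attribute and its leading space are removed.
const before = '<a stuff="..." style="width:18px;" href="someurl">';
const after = before.replace(tagStyle, "$1$2");

// The text style="x" between the tags is not inside a tag, so it stays.
const mixed = '<p style="color:red">style="x"</p>';
const untouched = mixed.replace(tagStyle, "$1$2");

console.log(after);
console.log(untouched);
```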

Extract string from HTML tag [VB.Net] [duplicate]

This question already has answers here:
How do I match any character across multiple lines in a regular expression?
(26 answers)
Closed 4 years ago.
How do I match and replace text using regular expressions in multiline mode?
I know the RegexOptions.Multiline option, but what is the best way to specify match all with the new line characters in C#?
Input:
<tag name="abc">this
is
a
text</tag>
Output:
[tag name="abc"]this
is
a
text
[/tag]
Aahh, I found the actual problem. '&' and ';' in Regex are matching text in a single line, while the same need to be escaped in the Regex to work in cases where there are new lines also.
If you mean there has to be a newline character for the expression to match, then \n will do that for you.
Otherwise, I think you might have misunderstood the Multiline/Singleline flags. RegexOptions.Multiline only changes where ^ and $ match. If you want your expression to match across several lines, you actually want RegexOptions.Singleline. It treats the entire input string as a single line, so that . also matches newline characters. Is this what you're after...?
Example
Regex rx = new Regex("<tag name=\"(.*?)\">(.*?)</tag>", RegexOptions.Singleline);
String output = rx.Replace("Text <tag name=\"abc\">test\nwith\nnewline</tag> more text...", "[tag name=\"$1\"]$2[/tag]");
Here's a regex to match. It requires the RegexOptions.Singleline option, which makes the . match newlines.
<(\w+) name="([^"]*)">(.*?)</\1>
After this regex matches, the first group contains the tag name, the second the value of the name attribute, and the third the content between the tags. So the replacement string could look like this:
[$1 name="$2"]$3[/$1]
In C#, this looks like:
newString = Regex.Replace(oldString,
    @"<(\w+) name=""([^""]*)"">(.*?)</\1>",
    "[$1 name=\"$2\"]$3[/$1]",
    RegexOptions.Singleline);
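For comparison, here is the same replacement sketched in JavaScript, where the s (dotAll) flag plays the role that RegexOptions.Singleline plays in .NET. This is not the C# code from the answers, only an illustration of the same pattern; the input string and variable names are assumptions.

```javascript
const oldString = 'Text <tag name="abc">this\nis\na\ntext</tag> more text...';

// \1 requires the closing tag to use the same name as the opening tag.
// The `s` flag lets (.*?) run across the line breaks.
const pattern = /<(\w+) name="([^"]*)">(.*?)<\/\1>/gs;
const newString = oldString.replace(pattern, '[$1 name="$2"]$3[/$1]');

// Without the `s` flag the pattern finds nothing, because `.` stops at "\n".
const withoutFlag = oldString.replace(
  /<(\w+) name="([^"]*)">(.*?)<\/\1>/g,
  '[$1 name="$2"]$3[/$1]'
);

console.log(newString);
console.log(withoutFlag === oldString);
```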

Sublime Text regex to find and replace whitespace between two xml or html tags?

I'm using Sublime Text and I need to come up with a regex that will find the whitespaces between a certain opening and closing tag and replace them with commas.
Example: Replace white space in
<tags>This is an example</tags>
so it becomes
<tags>This,is,an,example</tags>
Thanks!
You just have to use a simple regex like:
\s+
And replace it with a comma.
This will find instances of
<tags>...</tags>
with whitespace between the tags
(<tags>\S+)\W(.+</tags>)
This will replace the first whitespace with a comma
\1,\2
Open Find and Replace [OS X Cmd+Opt+F :: Windows Ctrl+H]
Use the two values above to find and replace and use the 'Replace All' option. Repeat until all the whitespaces are converted to commas.
The best answer is probably a quick script but this will get you there fairly fast without needing to do any coding.
You can replace any one or more whitespace chunks in between two tags using a single regular expression:
(?s)(?:\G(?!\A)|<tags>(?=.*?</tags>))(?:(?!</?tags>).)*?\K\s+
Details:
(?s) - a DOTALL inline modifier, makes . match line breaks
(?:\G(?!\A)|<tags>(?=.*?</tags>)) - either the end of the previous successful match (\G(?!\A)) or (|) <tags> substring that is immediately followed with any zero or more chars, as few as possible and then </tags> (see (?=.*?</tags>))
(?:(?!</?tags>).)*? - any char that does not start a <tags> or </tags> substring, zero or more occurrences but as few as possible
\K - match reset operator
\s+ - one or more whitespaces (NOTE: use \s if each whitespace must be replaced).
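The \G and \K operators used above are not available in every regex flavor. As a rough alternative for scripting the same job, this JavaScript sketch first matches each <tags>...</tags> block and then replaces the whitespace runs only inside its content. It is a different technique from the single-regex solution, and the sample text and names are made up for illustration.

```javascript
const source = "<tags>This is an  example</tags> keep these spaces";

// Outer replace: find each <tags>...</tags> block.
// Inner replace: turn every whitespace run in the content into one comma.
const result = source.replace(
  /(<tags>)([\s\S]*?)(<\/tags>)/g,
  (whole, open, body, close) => open + body.replace(/\s+/g, ",") + close
);

console.log(result);
```

Whitespace outside the tags is left untouched, and the double space inside becomes a single comma because of \s+.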

Escape HTML as well as \r \t and \n from a string

I am trying to index solar search from a string built in code, which contains HTML tags. Does anyone know how I can remove all of these characters from the String?
Currently, I am using
answers << answer.feedback.replaceAll('\\<.*?>','')
I want to escape all the HTML characters and all the \n \t and \r. How to do this?
Do you want to escape the HTML tags so that <span> becomes &lt;span&gt;, or do you want to REMOVE the tags themselves? Your original question is ambiguous.
For the first scenario:
answer.feedback.encodeAsHTML()
(see http://grails.org/doc/latest/ref/Plug-ins/codecs.html for further info)
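To make the two scenarios concrete, here is a small JavaScript sketch of each: escaping the markup versus removing it, followed by replacing \r, \n and \t with spaces. The helper names are hypothetical and this is not the Grails encodeAsHTML codec. The tag-stripping regex is a simple approximation that assumes no > appears inside attribute values.

```javascript
// Scenario 1: escape HTML special characters (ampersand first).
function escapeHtml(text) {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// Scenario 2: remove the tags themselves.
function stripTags(text) {
  return text.replace(/<[^>]*>/g, "");
}

// Replace each run of \r, \n and \t with a single space, then trim.
function collapseControlWhitespace(text) {
  return text.replace(/[\r\n\t]+/g, " ").trim();
}

const feedback = "<span>Good\r\n\tanswer</span>";
const escaped = escapeHtml(feedback);
const cleaned = collapseControlWhitespace(stripTags(feedback));

console.log(JSON.stringify(escaped));
console.log(JSON.stringify(cleaned));
```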