Can't figure out a regex with line break - HTML

I have written a very simple regular expression to search an HTML document for a particular attribute, because we are modifying 40+ templates that were edited with a horrible WYSIWYG editor. Basically, it added style="font..." attributes everywhere, so I want to delete them all.
The problem is that some of them have line breaks between the styles (like you would typically write CSS), and I can't figure out how to include line breaks in my expression.
Here is what I have:
style="font(.*?)"
I am using TextMate to search for it, and it works great except for styles that have hard line breaks in them.
Any help?

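Use this regex: style="font([\s\S]*?)". The dot (.) does not match \n by default, whereas [\s\S] matches any character, including line breaks.

Putting (?s) at the front of your regex causes . to match newlines as well, in flavors that support that flag (Ruby-style engines spell the same option (?m) instead).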

This is the most straightforward way to do it:
style="font([^"]*)"
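
To compare the suggestions above outside TextMate, here is a minimal sketch using Python's re module. Python is an assumption here (TextMate uses a different regex engine, so flag spellings may differ), and the sample markup is made up for illustration.

```python
import re

# Hypothetical sample: a style attribute that the editor broke across two lines.
html = '<p style="font-family: Arial;\n  font-size: 12px;" class="x">Hi</p>'

patterns = {
    "original": r'style="font(.*?)"',            # . stops at \n, so this finds nothing here
    "any-char class": r'style="font([\s\S]*?)"',  # [\s\S] matches line breaks too
    "dotall flag": r'(?s)style="font(.*?)"',      # (?s) lets . match \n in Python
    "negated class": r'style="font([^"]*)"',      # everything up to the closing quote
}

# Delete each match and keep the result per pattern for comparison.
results = {name: re.sub(p, "", html) for name, p in patterns.items()}
for name, out in results.items():
    print(f"{name}: {out!r}")
```

The negated character class needs no flags at all, which is why it tends to behave the same way across editors.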

Related

Regex match and delete everything before string (opening html tag)

I'm using Dreamweaver and Notepad++ and have searched high and low but nothing seems to work from what I've found.
I've got a whole stack of HTML pages, and I need to remove from all of them everything above, but not including, the first tag in the document. Specifically, everything before the string "<h1" (no quotes). I've tried various examples in Notepad++, and it finds the first h1 tag but doesn't replace everything before it.

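Assuming you want to lose everything in your file before the "<h1" text, specify ".*<[hH]1" as the search pattern and "<h1" as the replacement, and check the box marked ". matches newline". Works for me. Note that .* is greedy, so if a file contains more than one h1 tag, this removes everything up to the last one.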
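
If the files may contain more than one h1 tag, a lazy match anchored at the start of the file is safer. The following is a minimal Python sketch of that idea, not a Notepad++ recipe; the function name and the sample page are hypothetical.

```python
import re

def strip_before_first_h1(text: str) -> str:
    # \A anchors at the very start; .*? is lazy, so it stops at the FIRST <h1.
    # The lookahead leaves the <h1 itself (and its original case) untouched.
    return re.sub(r'\A.*?(?=<h1)', '', text, count=1,
                  flags=re.DOTALL | re.IGNORECASE)

# Hypothetical page with two headings.
page = '<html>\n<body>\n<div>nav</div>\n<H1 class="t">Title</H1>\n<h1>Second</h1>\n'
print(strip_before_first_h1(page))
```

A file with no h1 tag at all is returned unchanged, because the pattern simply fails to match.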

You can do this from the Command Line or a text editor that allows you to search-replace multiple files. However, are you sure the content is the same in every html file?

How to stop inline HTML comments from going to the next line when using Reformat Code in PHPStorm?

I am using the PHPStorm IDE, and I put HTML comments after closing tags.
I use Reformat Code (cmd+alt+L) a lot. I am aware that the Code Style settings in Preferences control what Reformat Code does. However, I cannot find a way to stop it from moving an HTML comment that sits on the same line as a closing tag onto the next line, which is annoying.
How can I keep HTML comments inline with their closing tags when using Reformat Code?

AFAIK there is no such option.
https://youtrack.jetbrains.com/issue/WEB-5070 -- star/vote/comment to get notified on progress.

cgi/perl/html - what characters to escape when printing into html?

I got an input file that I need to print directly into an html page.
I did $inputfile =~ s/\n/<br>/g; Are there any other special characters I should be aware of maybe other than < and > when printing this $inputfile to html?

You absolutely should use HTML::Escape instead of doing some ill-conceived hackjob which will cause everyone who deals with your code (you included) to curse your name in the future.
It's simple - install HTML::Escape via CPAN, then use it thus:
use HTML::Escape qw(escape_html);
my $escaped_string = escape_html($string);
Note that if you want to preserve whitespace formatting, you should use a module for that as well, such as HTML::FromText. The code above will not automagically convert line breaks to tags, because that is completely different from escaping unsafe characters to HTML entities.
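
The same two-step idea (escape first, then convert line breaks) can be sketched in Python with the standard library's html.escape. This is an analogue for illustration, not the Perl modules named above, and the function name to_html is hypothetical.

```python
import html

def to_html(text: str) -> str:
    # Escape &, <, > and both quote characters first...
    escaped = html.escape(text, quote=True)
    # ...then turn line breaks into <br> tags, so the tags themselves are not escaped.
    return escaped.replace("\n", "<br>\n")

print(to_html('a < b && "c"\nnext'))
```

The order matters: inserting the br tags before escaping would turn them into visible text.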

Regular Expression to Retrieve text between two html tags with Visual Studio's search-replace feature

I'm trying to use Visual Studio's search-replace function to remove tags that don't do anything. The intent is to simplify some HTML before I paste it into a SharePoint page.
This is what I'm using in the Find box: \<font\>{~(.*\<font\>.*)}\</font\>
And the Replace box has: \1
However, the expression comes up with no matches, even though I have plenty of places like this <font> xxxx </font> within the HTML. I could move the .* outside the parentheses, but then the expression matches most of the line where I have multiple sets of font tags, some of which actually do something.
I'm thinking this would be much easier if the IDE used the same regular expression engine as the languages for which it is the primary development tool.

I just had to review the documentation for VS 2010. Using a minimal match # was all I needed: \<font\>{.#}\</font\>.
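
For readers more used to Perl-style syntax, the minimal match above corresponds roughly to a lazy quantifier. Here is a small Python sketch (Python is an assumption, and the sample line is invented) showing why the greedy form matches most of the line while the lazy form does not.

```python
import re

# Hypothetical line with two attribute-less font tags and one that does something.
html = '<font>plain</font> and <font color="red">red</font> and <font>more</font>'

# Lazy: each <font>...</font> pair is matched separately and unwrapped.
lazy = re.sub(r'<font>(.+?)</font>', r'\1', html)

# Greedy: one match runs from the first <font> to the last </font>.
greedy = re.sub(r'<font>(.+)</font>', r'\1', html)

print(lazy)
print(greedy)
```

Only the literal <font> opening tag is matched, so the tag with a color attribute is left alone in the lazy version.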

I was trying to replace all span tags with div tags, and I solved a similar problem with the following regex. I had to escape both the > and < and the double quotes of the class attribute.
Find: \<span class=\"label\"\>{.#}\</span\>
Replace: <div class="label">\1</div>

How to remove all empty tags in X/HTML code at once?

For example, I want to remove all the highlighted tags:
[screenshot: http://shup.com/Shup/299976/110220132930-My-Desktop.png]

You could use a regular expression in any editor that supports them. For instance, I tested this one in Dreamweaver:
<(?!\!|input|br|img|meta|hr)[^/>]*?>[\s]*?</[^>]*?>
Just make a search and replace all (with the regex as search string and nothing as replacement). Note however that this may remove necessary whitespace. If you just want to remove empty tags without anything in between,
<(?!\!|input|br|img|meta|hr)[^/>]*?></[^>]*?>
would be the way to go.
Update: if you want to remove &nbsp;s as well:
<(?!\!|input|br|img|meta|hr)[^/>]*?>(?:[\s]|&nbsp;)*?</[^>]*?>
I did not verify this one - it should be OK though, try it out :-)
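
As a quick check of the whitespace-and-&nbsp; idea outside Dreamweaver, here is a minimal Python sketch. Python's re module is an assumption (Dreamweaver's engine may differ in details), and the sample markup is invented.

```python
import re

# Tags that hold only whitespace or &nbsp; entities; the lookahead skips
# comments/doctypes and a few elements that never have content.
EMPTYISH = re.compile(
    r'<(?!\!|input|br|img|meta|hr)[^/>]*?>(?:[\s]|&nbsp;)*?</[^>]*?>'
)

sample = '<td>&nbsp;</td><br><span>\n</span><p>text</p>'
print(EMPTYISH.sub('', sample))
```

As the answer warns for the plain version, the closing tag is not checked against the opening one, so mismatched pairs would be removed too.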

If this is only about quickly editing a file, and your editor supports regular expression replacement, you can use a regex like this:
<[^>]+></[^>]+>
Search for this regex, and replace with an empty string.
Note: This isn't safe in any way - don't rely on it, as it can find more things than just valid, empty tags. (It would also find <a></b> for example.) There is no safe way to do this with regexes - but if you check each replacement manually, you should be fine. If you need real safe replacement, then either you'll have to find an editor that supports this (JEdit may be a good bet, but I haven't checked), or you'll have to parse the file yourself - e.g. using XSLT.

What you're asking for sounds like a job for regular expressions. Many editors support regular expression find/replace. Personally, I'd probably do this from the command-line with Perl (sed would also work), but that's just me.
perl -pe 's|<([^\s>]+)[^>]*></\1>||g' < file.html > new_file.html
or if you're brave, edit the file in place:
perl -pe 's|<([^\s>]+)[^>]*></\1>||g' -i file.html
This will remove:
<p></p>
<p id="foo"></p>
but not:
<p>hello world</p>
<p></a>
Warning: things like <img src="pic.png"></img> and <br></br> will also be removed. It's not obvious from your question, but I'll assume this is undesirable. Maybe you're not worried because you know all your images are declared like this <img src="pic.png"/>. Otherwise the regular expression will need to be modified to account for this, but I decided to start simple for an easier explanation...
It works by matching the opening tag: a literal < followed by the tag name (one or more characters which are not whitespace or > = [^\s>]+), any attributes (zero or more characters which aren't > = [^>]*), and then a literal >; and a closing tag with the same name: this takes advantage of the fact that we captured the tag name, so we can use a backreference = </\1>. The matches are then replaced with the empty string.
If the syntax/terminology used here is unfamiliar to you, I'm a fan of the perlre documentation page. Regular expression syntax in other languages should be very similar if not identical to this, so hopefully this will be useful even if you don't Perl :)
Oh, one more thing. If you have things like <div><p></p></div>, these will not be picked up all at once. You'll have to do multiple passes: the first will remove the <p></p>, leaving a <div></div> to be removed by the second. In Perl, the substitution operator returns the number of replacements made, so you can:
perl -pe '1 while s|<([^\s>]+)[^>]*></\1>||g' < file.html > new_file.html
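
The same repeat-until-nothing-changes loop can be sketched in Python, where re.subn reports how many replacements a pass made. This is only an illustration of the technique above; the function name is hypothetical, and it shares the caveats already mentioned (for example, <br></br> would be removed too).

```python
import re

# Opening tag, optional attributes, then a closing tag with the same name.
EMPTY_PAIR = re.compile(r'<([^\s>]+)[^>]*></\1>')

def remove_empty_tags(html: str) -> str:
    # Keep making passes until one removes nothing, so nested empties collapse.
    while True:
        html, count = EMPTY_PAIR.subn('', html)
        if count == 0:
            return html

print(remove_empty_tags('<div><p id="foo"></p></div><p>hello world</p><p></a>'))
```

Unlike perl -p, which works line by line, this operates on the whole string, so a tag pair split across lines is still only matched if nothing at all sits between the two tags.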
