Can Vim highlight matching HTML tags like Notepad++? - html

Vim has support for matching pairs of curly brackets, parentheses, and square brackets. This is great for editing C-style languages like PHP and JavaScript. But what about matching HTML tags?
Notepad++ has had this feature for as long as I’ve been using it. Being able to spot where blocks of HTML begin and end is very useful. What I’m looking for is something like this for Vim (see the green div tags):
A bonus feature: highlighting unclosed HTML tags, like the red tag in the above screenshot.
matchit has been proposed as the next-best thing, but it requires an extra keystroke to use its functionality. I’d like to be able to see where the blocks of HTML begin and end without an extra keypress.
I’ve trawled the internet to find something like this for Vim. Apparently, I’m not the only one, according to two other StackOverflow questions and a Nabble thread.
I’ve almost resigned myself to Vim not being able to visually match HTML tags. Is it possible for Vim to do this?
Addendum: If it is not currently possible to do this with any existing plugins, does any Vimscript wizard out there have any pointers on how to approach writing a suitable plugin?

I had to work with some HTML today so thought I would tackle this. Added a ftplugin to vim.org that should solve your problem.
You can get it here on vim.org.
You can get it here on github.
Hope it works for you. Let me know if you have any problems.

Greg's MatchTag.vim plugin is awesome, but I wanted something more. I wanted the enclosing tags to always be highlighted, not just when the cursor is on one of the tags.
So I wrote MatchTagAlways which does everything that Greg's MatchTag does and also always highlights the enclosing tag, no matter where the cursor is in the code. It also works with unclosed tags and HTML templating languages like Jinja or Handlebars.
Here's a GIF showing it in action:

I came here looking for matching html style angle brackets in Vim. This seems to work:
:set mps+=<:>
:help matchpairs

Related

Using Extended Find and Replace + Reg Expressions to find and remove an anchor tag in HTML

I'm trying to use regular expressions to find and remove all anchor tags identical to these -
title name
title name part II
Where ONLY the filename and titles change, and I need to leave the title (unlinked) behind.
Because of my word-processing background I naively tried the following wildcards in my extended find and replace with regular expressions checked:
*
it does not work of course, not even to remove the entire link and text.
After much searching and reading I'm still just guessing at how to do this, since no example I've found does exactly what I need using extended find and replace. All of this is over my head.
I have searched "How To Use regular expressions in Search and Replace" with HomeSite, Dreamweaver, topsite and other software similar to what I'm using to edit my HTML docs, without success. I've read several tutorials on using RegExp and I'm learning, but I still cannot seem to do what I need. I have read how to use RegExp in PHP, Perl and C++ but cannot carry this over to what I need.
I'm willing to use other text editing software to accomplish this as I need to remove about 4,000 of these wma file links, while leaving the titles and other tags untouched.
I have searched similar questions here on stackoverflow. And read up on using regular expressions in general, but I cannot follow what is explained enough to adapt it to what I need. This is such a big subject.
This is what I have so far:
<a href="mms:\/\/media\.domain\.com\/CME\/ \.wma"> <\/a>
The parts where I've left spaces are what's giving me trouble.
Thanks
With some help on a forum I was able to work out my answer, here it is
Find:
([^<]+)
Replace:
\1
Note the parentheses around the [^<]+: they create a sub-expression that can be referenced by the \1. I do not know exactly how this all works, but I found it in an article.
This find and replace finds my anchor tag and title and replaces it with just the title.
I have to give thanks to http://forums.devshed.com for the pieces I was missing.
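For anyone who wants to see the sub-expression idea outside an editor dialog, here is a minimal Python sketch of the same find and replace. The host and path follow the partial pattern from the question; the file names, titles and surrounding paragraph tags in the sample are made up for illustration, and the full pattern is my assumption about what the final expression looked like.

```python
import re

# Hypothetical sample input: file names and titles are invented for illustration.
html = (
    '<p><a href="mms://media.domain.com/CME/track01.wma">title name</a></p>\n'
    '<p><a href="mms://media.domain.com/CME/track02.wma">title name part II</a></p>\n'
)

# [^"]+ stands in for the changing file name; ([^<]+) captures the link text.
pattern = re.compile(
    r'<a href="mms://media\.domain\.com/CME/[^"]+\.wma">([^<]+)</a>'
)

# \1 in the replacement refers back to the first parenthesized group,
# so each link is replaced by its own title.
stripped = pattern.sub(r"\1", html)
print(stripped)
```

Links that do not match the pattern (a different host, or a title containing another tag) are left untouched, which is probably what you want when other anchors in the file must survive.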

Alternatives to Regular Expression for HTML

I've seen over and over, and over and over and over on Stack Overflow that Regular Expressions are NOT a good fit for XHTML. What I haven't seen however is an alternative.
Most text editors have a built-in RegEx search and replace that is just super easy to use. Well, except for the fact that it doesn't work well with HTML. Is there some tool or language that is meant for parsing and replacing XHTML? It would be great if you could say "find all paragraph tags that have the class of 'quote' that are within the DIV with the class of 'monkey', and then add an H2 tag with 'Monkey Quote' inside".
Another example that I'm struggling with finding a solution to is to find all words within Paragraph tags and wrap a SPAN tag around them (for word-by-word highlighting audio). That kind of stuff.
Is there a tool or language that is meant for this kind of thing?
From your last comment, I'm assuming you'd like something useful from the command-line.
If so, this is answered pretty well here:
Grep and Sed Equivalent for XML Command Line Processing
If you have a well formed document, XSLT and XPATH can do what you need.
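As a rough illustration of the XPath route, here is a sketch using Python's standard-library ElementTree, which supports a limited XPath subset. It assumes the document is well-formed XML, and it assumes the new H2 should go directly before each matching paragraph, since the question does not say where it belongs. The sample markup is invented.

```python
import xml.etree.ElementTree as ET

# Invented sample; it must be well-formed for an XML parser to accept it.
xhtml = """<body>
  <div class="monkey">
    <p class="quote">Ook.</p>
    <p>Not a quote.</p>
  </div>
  <div class="other">
    <p class="quote">Left alone.</p>
  </div>
</body>"""

root = ET.fromstring(xhtml)

# [@class='...'] compares the whole attribute value, so class="quote big"
# would not match this simple predicate.
for div in root.findall(".//div[@class='monkey']"):
    for p in div.findall("p[@class='quote']"):
        h2 = ET.Element("h2")
        h2.text = "Monkey Quote"
        # Assumed placement: insert the heading just before the paragraph.
        div.insert(list(div).index(p), h2)

result = ET.tostring(root, encoding="unicode")
print(result)
```

Only the paragraph inside the "monkey" div gets a heading; the quote in the other div is left as it was. Real-world HTML that is not well-formed would need an HTML-aware parser instead.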

Why should I use BBCode but not HTML in comment forms?

I'm writing a comment parsing function in PHP.
Since BBCode is not a real markup language, I've never liked the writing style.
So I'm giving visitors the ability to use basic HTML code in comment forms.
And when posting, PHP will check for disallowed and invalid tags/attributes, and either replace or remove them.
I believe it does the same job and output exactly the same as with BBCode.
If this is true, why does BBCode exist? Does BBCode have any advantages over HTML?
update
as monochrome answered
If you're confident that your HTML filter is safe enough, you should be fine though
well, I'm not confident writing the filter myself, but there are some top-rated filters out there like PHP Simple HTML DOM Parser, HTML Purifier, htmLawed...
BBCode was developed by UBB and is still widely used, for example in phpBB.
Are the developers from UBB/phpBB not confident about their skills to write a perfect HTML filter? (I guess not)
Also, like the Markdown that StackOverflow's using...if HTML+Parser does the job, why invent another "language" anyway? (except for saving a few bits...)
Its main advantage is the prevention of unwanted code injections. That's why I would use something like BBCode or Markdown.
At least you should work with a White-List of allowed HTML-Tags and not with blacklisting.
BBCode eliminates the issue that your HTML filter might have bugs that let the commenter post code he's not supposed to post.
If you're confident that your HTML filter is safe enough, you should be fine though.
Another problem is that HTML comments might break your layout, e.g. when the commenter puts in a single closing </div> or something like that.
BBCode became popular as a way of allowing users limited access to HTML while trying to prevent XSS. It became popular before there were solutions like HTML Purifier. In reality, BBCode and HTML Purifier each have their own security problems. It's just that BBCode was a simpler solution to this problem.
Using BBCode and converting every left angle bracket with htmlspecialchars seems to be totally XSS-free to me (unless the BBCode parser is really badly designed).
Ultimately, both of them reach the same goal. I currently choose BBCode because HTML Purifier automatically strips tags instead of replacing the left angle bracket with its HTML special character. At least in the demo, I didn't see an option for keeping the left angle bracket.
So there's a problem when we want users to be able to type a literal < and have it converted to &lt; automatically rather than stripped. There is also the matter of being too lazy to validate all the data.
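The escape-first approach described above can be sketched in a few lines. This is Python rather than PHP, with html.escape playing the role of htmlspecialchars; the function name and the two-tag whitelist are hypothetical, and a real parser would need to handle more cases (URLs, nesting rules, unbalanced tags).

```python
import html
import re

# Hypothetical whitelist: BBCode tag name -> HTML element emitted for it.
ALLOWED_TAGS = {"b": "strong", "i": "em"}


def render_bbcode(comment: str) -> str:
    """Escape all HTML first, then turn whitelisted BBCode into markup."""
    # After this step every < or > typed by the user is an entity,
    # so no user-supplied tag can reach the page.
    rendered = html.escape(comment)
    for bb_tag, html_tag in ALLOWED_TAGS.items():
        pattern = re.compile(
            r"\[{0}\](.*?)\[/{0}\]".format(bb_tag),
            re.IGNORECASE | re.DOTALL,
        )
        rendered = pattern.sub(r"<{0}>\1</{0}>".format(html_tag), rendered)
    return rendered


print(render_bbcode("[b]bold[/b] <script>alert(1)</script>"))
```

The only angle brackets in the output are the ones the renderer itself emits, which is the property the answers above are relying on. Unmatched BBCode tags are simply left as literal text.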

Need HTML characters stripped out of excel export, but effects preserved

I'm exporting data using CF9's cfspreadsheet tags and functions, and some columns have HTML-formatted text in them. I need to strip out the HTML tags, and convert characters like &lt and &amp to their equivalents. However, I'd also like to keep the effects of bold tags and paragraph tags if possible.
I know I can use rereplace, and others to brute force the output, but I was hoping for a more elegant solution.
Any ideas?
Thanks for the help!
I need to strip out the HTML tags, and convert characters like &lt and &amp to their equivalents. However, I'd also like to keep the effects of bold tags and paragraph tags if possible.
I know I can use rereplace, and others to brute force the output, but I was hoping for a more elegant solution.
I do not think such a function exists in CF. It would require some sort of html=>excel conversion of the styles. This thread says that functionality did not even exist in POI (which is used by cfspreadsheet) until recently. So my guess would be it does not exist within the CF spreadsheet functions either.
If you are willing to work lower level, you might check the latest version of POI. See if the mentioned patch is available in the main distribution. Otherwise, rereplace() sounds like the simplest approach.
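To show what the strip-and-decode step involves, here is a small Python sketch using the standard-library HTMLParser. It is not ColdFusion and not a POI feature; the class and function names are hypothetical. It drops tags, decodes entities, and turns each closing paragraph tag into a line break. Bold formatting is simply lost, since plain cell text cannot carry it.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects text content, decoding entities and dropping tags."""

    def __init__(self):
        # convert_charrefs=True makes the parser decode entities such as &amp;
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)

    def handle_endtag(self, tag):
        # Keep the visual effect of paragraphs as line breaks.
        if tag == "p":
            self.parts.append("\n")

    def text(self):
        return "".join(self.parts).strip()


def strip_html(fragment):
    parser = TextExtractor()
    parser.feed(fragment)
    parser.close()
    return parser.text()


print(strip_html("<p>Fish &amp; <b>chips</b> &lt; 5</p><p>Second</p>"))
```

The same idea in CF would be a couple of rereplace() calls plus an entity-decoding step, applied to each cell value before it is written to the sheet.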

How do you find mismatched tags in HTML?

I've inherited some rather large static HTML files that need to be fixed up to work in webkit-based browsers, Safari in particular. One of the common bugs I've found that cause rendering differences is missing </div> tags. (Both IE7+ and FF3+ seem to ignore these, or make good guesses as to where to close the DIVs, and render as expected.) I'm used to using vim with HTML syntax highlighting for editing, but end up writing awk scripts to match starting and ending tags.
What is your favorite tool or technique for matching start and end tags in a large HTML file?
UPDATE: I'm currently in a shop that targets HTML 4.01 Strict, not XHTML.
The W3C HTML Validator works fairly well, or if you want something a little simpler then the Tidy Firefox plugin also works.
The W3C Validator can be (extremely) verbose, but it does check for missing closing tags.
HTML Tidy is a great command-line tool. I often use it with Wget.
Most IDEs usually let you know via highlighting, fuzzy-underline or a warning.
Div Checker is a great tool that focuses on div tags specifically, whereas other tools were only able to tell me that "some tag was missing somewhere".
Div Checker removes other tags, code, and most comments, to create a clean visual structure of just the divs themselves.
From this div map, it's fairly easy to see if nested divs are correctly paired.
I was able to locate a missing div left out by a WordPress theme developer, with the help of this tool.
Here is the Posted Answer from #noah-whitmore that enlightened me to this awesome tool.
There are a couple other useful tools mentioned in that thread as well, such as unclosed-tag-finder (visually not so easy to read, but helpful if your missing tag is not a div).
vim/gvim & NetBeans both do a great job of tag matching
What is your favorite tool or technique for matching start and end tags in a large HTML file?
A text editor with a built-in XML well-formedness checker, combined with using XHTML for everything.
Sublime Text with the Tag plugin has a Tag Lint feature which aims to check correctness of opened and closed tags.
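If you would rather script the check than write awk, the stack-based idea behind these tools can be sketched with Python's standard-library HTMLParser. This is a rough sketch, not a validator: the class and function names are hypothetical, the void-element list is partial, and elements whose end tags are optional in HTML 4.01 (p, li, td and so on) will be reported as unclosed if their end tags are omitted.

```python
from html.parser import HTMLParser

# Elements that never take a closing tag (partial list).
VOID_TAGS = {"area", "base", "br", "col", "hr", "img", "input", "link", "meta", "param"}


class TagBalanceChecker(HTMLParser):
    def __init__(self):
        super().__init__()
        self.open_tags = []  # stack of (tag, line number) still waiting to close
        self.problems = []

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.open_tags.append((tag, self.getpos()[0]))

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        # Look down the stack for the nearest matching start tag.
        for index in range(len(self.open_tags) - 1, -1, -1):
            if self.open_tags[index][0] == tag:
                # Everything opened after it was never closed.
                for unclosed, line in self.open_tags[index + 1:]:
                    self.problems.append(
                        f"<{unclosed}> opened on line {line} is never closed"
                    )
                del self.open_tags[index:]
                return
        self.problems.append(
            f"</{tag}> on line {self.getpos()[0]} has no matching start tag"
        )


def find_mismatched_tags(source):
    checker = TagBalanceChecker()
    checker.feed(source)
    checker.close()
    for tag, line in checker.open_tags:
        checker.problems.append(f"<{tag}> opened on line {line} is never closed")
    return checker.problems


sample = """<body>
<div id="outer">
<div id="inner">
<p>text</p>
</div>
</body>"""

for problem in find_mismatched_tags(sample):
    print(problem)
```

Note that with nested divs the checker can only say that one div in the stack is unclosed. It blames the outermost one still open, so the reported line is a starting point for the search rather than the exact location of the missing tag.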