Can Emacs re-indent a big blob of HTML for me?

When editing HTML in Emacs, is there a way to automatically pretty-format a blob of markup, changing something like this:
<table>
<tr>
<td>blah</td></tr></table>
...into this:
<table>
  <tr>
    <td>
      blah
    </td>
  </tr>
</table>

You can do sgml-pretty-print and then indent-for-tab on the same region/buffer, provided you are in html-mode or nxml-mode.
sgml-pretty-print inserts newlines at the proper places and indent-for-tab adds nice indentation. Together they produce properly formatted HTML/XML.
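As a rough illustration of what those two commands achieve together, here is a Python sketch using the standard html.parser module. It is not how sgml-pretty-print works internally; it simply puts every tag and text run on its own line and indents each line by its nesting depth. The VOID set and the two-space indent are assumptions, and character references in text come out decoded.

```python
from html.parser import HTMLParser

# Void elements never get a closing tag, so they must not increase the depth.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class PrettyPrinter(HTMLParser):
    """Put every tag and text run on its own line, indented by depth."""

    def __init__(self, indent="  "):
        super().__init__()  # convert_charrefs=True: entities are decoded
        self.indent = indent
        self.depth = 0
        self.lines = []

    def _emit(self, text):
        self.lines.append(self.indent * self.depth + text)

    def handle_starttag(self, tag, attrs):
        self._emit(self.get_starttag_text())  # keep the tag as written
        if tag not in VOID:
            self.depth += 1

    def handle_startendtag(self, tag, attrs):
        self._emit(self.get_starttag_text())  # e.g. <br/>: no depth change

    def handle_endtag(self, tag):
        self.depth = max(self.depth - 1, 0)
        self._emit(f"</{tag}>")

    def handle_data(self, data):
        if data.strip():  # drop whitespace-only runs between tags
            self._emit(data.strip())


def pretty(html):
    parser = PrettyPrinter()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.lines)


print(pretty("<table>\n<tr>\n<td>blah</td></tr></table>"))
```

Run on the markup from the question, this yields the nested layout shown above, with each level indented by two spaces.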

By default, when you visit a .html file in Emacs (22 or 23), it will put you in html-mode. That is probably not what you want. You probably want nxml-mode, which is seriously fancy. nxml-mode seems to come only with Emacs 23, although you can download it for earlier versions of Emacs from the nXML web site. There is also a Debian and Ubuntu package named nxml-mode. You can enter nxml-mode with:
M-x nxml-mode
You can view nxml mode documentation with:
C-h i g (nxml-mode) RET
All that being said, you will probably have to use something like Tidy to re-format your xhtml example. nxml-mode will get you from
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head></head>
<body>
<table>
<tr>
<td>blah</td></tr></table>
</body>
to
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
  <head></head>
  <body>
    <table>
      <tr>
        <td>blah</td></tr></table>
  </body>
</html>
but I don't see a more general facility to do line breaks on certain xml tags as you want. Note that C-j will insert a new line with proper indentation, so you may be able to do a quick macro or hack up a defun that will do your tables.

http://www.delorie.com/gnu/docs/emacs/emacs_277.html
Select the region you want to fix (to select the whole buffer, use C-x h), then use:
C-M-q
Reindent all the lines within one parenthetical grouping (indent-sexp).
C-M-\
Reindent all lines in the region (indent-region).

I wrote a function myself to do this for XML, which works well in nxml-mode. It should work pretty well for HTML as well:
(defun jta-reformat-xml ()
  "Reformats xml to make it readable (respects current selection)."
  (interactive)
  (save-excursion
    (let ((beg (point-min))
          (end (point-max)))
      (if (and mark-active transient-mark-mode)
          (progn
            (setq beg (min (point) (mark)))
            (setq end (max (point) (mark))))
        (widen))
      (setq end (copy-marker end t))
      (goto-char beg)
      (while (re-search-forward ">\\s-*<" end t)
        (replace-match ">\n<" t t))
      (goto-char beg)
      (indent-region beg end nil))))
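To see what the function does outside Emacs, its two steps can be sketched in Python: break the line wherever one tag follows another (the re-search-forward/replace-match loop), then indent by nesting depth. The second step is only a crude stand-in for indent-region, which follows the major mode's real indentation rules. reformat_xml is a hypothetical name.

```python
import re


def reformat_xml(text, indent="  "):
    # Step 1, mirroring the Elisp loop: ">" + optional whitespace + "<"
    # becomes ">" + newline + "<".
    text = re.sub(r">\s*<", ">\n<", text)

    # Step 2, a naive substitute for indent-region: track nesting depth.
    out, depth = [], 0
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("</"):
            depth = max(depth - 1, 0)  # a closing tag dedents itself
        out.append(indent * depth + line)
        opens_level = (line.startswith("<")
                       and not line.startswith(("</", "<?", "<!"))
                       and not line.endswith("/>")  # self-closing tag
                       and "</" not in line)        # closed on same line
        if opens_level:
            depth += 1
    return "\n".join(out)


print(reformat_xml("<table><tr><td>blah</td></tr></table>"))
```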

In Emacs 25, which I'm currently building from source, assuming you are in HTML mode, use Ctrl-x h to select all, and then press Tab.

You can do a replace regexp (C-q C-j inserts a literal newline into the replacement string):
M-x replace-regexp
\(</[^>]+>\)
\1C-q C-j
Then indent the whole buffer:
C-x h
M-x indent-region
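For comparison, here is the same search-and-replace expressed in Python (an illustration only; in Emacs you type the pattern interactively). The Emacs \( \) group and \1 back-reference correspond directly to ( ) and \1 in Python's re module, and the C-q C-j in the replacement becomes \n.

```python
import re

html = "<table>\n<tr>\n<td>blah</td></tr></table>"

# Put a newline after every closing tag, as the replace-regexp above does.
broken = re.sub(r"(</[^>]+>)", r"\1\n", html)
print(broken)
```

After this step each closing tag ends its line, and indent-region can then indent the result.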

This question is quite old, but I wasn't really happy with the various answers. A simple way to re-indent an HTML file, provided you are running a relatively recent version of Emacs (I am running 24.4.1), is to:
open the file in emacs
mark the entire file with C-x h (note: if you would like to see what is being marked, add (setq transient-mark-mode t) to your .emacs file)
execute M-x indent-region
What's nice about this method is that it does not require any plugins (Conway's suggestion), it does not require a replace regexp (nevcx's suggestion), nor does it require switching modes (jfm3's suggestion). Jay's suggestion is in the right direction: in general, executing C-M-q will indent according to a mode's rules (for example, in my experience C-M-q works in js-mode and in several other modes). But neither html-mode nor nxml-mode seems to implement C-M-q.

Tidy can do what you want, but it seems to work only on the whole buffer (and the result is XHTML):
M-x tidy-buffer

You can pipe a region to xmllint (if you have it) using:
M-|
Shell command on region: xmllint --format -
The result will end up in a new buffer.
I do this with XML, and it works, though I believe xmllint needs certain other options to work with HTML or other not-perfect XML. nxml-mode will tell you if you have a well-formed document.
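If Python is available but xmllint is not, the standard xml.dom.minidom module can do a similar job for well-formed XML. This swaps in a different tool rather than reproducing xmllint, and like xmllint without --html it rejects input that is not well-formed. format_xml is a hypothetical helper name. Whitespace already present between tags is kept as text and shows up as extra whitespace-only lines, so it works best on compact markup.

```python
from xml.dom import minidom


def format_xml(text, indent="  "):
    """Roughly what `xmllint --format -` does for well-formed XML.

    parseString raises xml.parsers.expat.ExpatError if the input is not
    well-formed, much as xmllint reports a parser error.
    """
    root = minidom.parseString(text).documentElement
    # Pretty-printing the root element (not the document) omits the
    # <?xml ...?> declaration that xmllint would add.
    return root.toprettyxml(indent=indent)


print(format_xml("<table><tr><td>blah</td></tr></table>"))
```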

The easiest way to do it is via the command line.
Make sure you have tidy installed
type tidy -i -m <<file_name>>
Note that the -m option modifies the original file in place, replacing it with the tidied version. If you don't want that, you can type tidy -i -o <<tidied_file_name>> <<untidied_file_name>>
The -i is for indentation. Alternatively, you can create a .tidyrc file that has settings such as
indent: auto
indent-spaces: 2
wrap: 72
markup: yes
output-xml: no
input-xml: no
show-warnings: yes
numeric-entities: yes
quote-marks: yes
quote-nbsp: yes
quote-ampersand: no
break-before-br: no
uppercase-tags: no
uppercase-attributes: no
This way all you have to do is type tidy -o <<tidied_file_name>> <<untidied_file_name>>.
For more just type man tidy on the command line.

Related

Ignore any blank space or line break in git-diff

I have the same HTML file rendered in two different ways and want to compare it using git diff, taking care of ignoring every white-space, tab, line-break, carriage-return, or anything that is not strictly the source code of my files.
I'm actually trying this:
git diff --no-index --color --ignore-all-space <file1> <file2>
but when some HTML tags are collapsed onto one line (instead of one per line and indented), git diff detects it as a difference (while for me it is not).
<html><head><title>TITLE</title><meta ......
is different from
<html>
<head>
<title>TITLE</title>
<meta ......
What option am I missing to accomplish this and treat the two as the same?
git diff supports comparing files line by line or word by word, and it also lets you define what counts as a word. Here you can define every non-space character as a word for the comparison. That way, it will ignore all whitespace, including spaces, tabs, line breaks and carriage returns, as you need.
To achieve this, use the --word-diff-regex option with the value [^[:space:]]; quote it so the shell does not treat the brackets as a glob pattern. Refer to the git diff documentation for details.
git diff --no-index --word-diff-regex='[^[:space:]]' <file1> <file2>
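To get a feel for what that regex does, here is a small Python analogue: every non-whitespace character becomes a token and difflib compares the token sequences, so line breaks and indentation cannot produce a difference. It only mimics the idea; git's own word-diff algorithm and output format are different, and word_changes is a hypothetical name.

```python
import difflib
import re


def nonspace_tokens(text):
    # Like --word-diff-regex='[^[:space:]]': every non-whitespace character
    # is its own "word"; whitespace (including newlines) only separates them.
    return re.findall(r"\S", text)


def word_changes(old, new):
    """List (opcode, old_text, new_text) for each differing run of tokens."""
    a, b = nonspace_tokens(old), nonspace_tokens(new)
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    return [(op, "".join(a[i1:i2]), "".join(b[j1:j2]))
            for op, i1, i2, j1, j2 in matcher.get_opcodes() if op != "equal"]


a_html = "<html><head><title>TITLE</title><meta>\n"
b_html = "<html>\n<head>\n<title>TI==TLE</title>\n<meta>\n"
print(word_changes(a_html, b_html))
```

Only the inserted == is reported; the different line layout of the two files produces no change at all.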
Here's an example. I created two files, with a.html as follows:
<html><head><title>TITLE</title><meta>
With b.html as follows:
<html>
<head>
<title>TI==TLE</title>
<meta>
By running
git diff --no-index --word-diff-regex='[^[:space:]]' a.html b.html
It highlights the difference between TITLE and TI{+==+}TLE in the two files in plain mode, as follows. You can also specify --word-diff=<mode> to display the results in a different mode. The mode can be color, plain, porcelain or none, with plain as the default.
diff --git a/d.html b/a.html
index df38a78..306ed3e 100644
--- a/d.html
+++ b/a.html
@@ -1 +1,4 @@
<html>
<head>
<title>TI{+==+}TLE</title>
<meta>
Executing command git diff --help gives some options like
--ignore-cr-at-eol
Ignore carriage-return at the end of line when doing a comparison.
--ignore-space-at-eol
Ignore changes in whitespace at EOL.
-b, --ignore-space-change
Ignore changes in amount of whitespace. This ignores whitespace at line end, and considers all other sequences of one or more whitespace characters to be equivalent.
-w, --ignore-all-space
Ignore whitespace when comparing lines. This ignores differences even if one line has whitespace where the other line has none.
--ignore-blank-lines
Ignore changes whose lines are all blank.
You can combine these according to your needs. The command below worked for me:
git diff --ignore-blank-lines --ignore-all-space --ignore-cr-at-eol
This does the trick for me:
git diff --ignore-blank-lines
git diff compares files line by line.
It compares the first line of file1 with the first line of file2; since they are not the same, it reports a difference.
Ignoring whitespace means that foo bar will match foobar if they are on the same line. Since one of your files spans multiple lines and the other has only one line, the files will always differ.
If you really want to check that the files contain the exact same non-whitespace characters, you could try something like this:
diff <(perl -ne 's/\s*//g; print' file1) <(perl -ne 's/\s*//g; print' file2)
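The same check can be done in Python without process substitution, which may be handy where bash is not available. It is a sketch under the same assumption as the perl version: all whitespace is discarded, so foo bar and foobar count as equal. The script name you save it under and the same/different output are illustrative choices.

```python
import re
import sys


def strip_whitespace(text):
    # Same effect as perl -ne 's/\s*//g; print' applied to every line:
    # spaces, tabs, carriage returns and newlines are all removed.
    return re.sub(r"\s+", "", text)


def same_non_whitespace(text1, text2):
    return strip_whitespace(text1) == strip_whitespace(text2)


if __name__ == "__main__" and len(sys.argv) == 3:
    with open(sys.argv[1], encoding="utf-8") as f1, \
         open(sys.argv[2], encoding="utf-8") as f2:
        same = same_non_whitespace(f1.read(), f2.read())
    print("same" if same else "different")
    sys.exit(0 if same else 1)  # exit status usable in scripts, like diff
```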
Hope it solves your problem!

How can I replace and multiply dimensions of img tags in Perl or Ruby?

I have a folder full of html files created for a Kindle ebook. The images are coded with width and height, as per the Kindle guidelines:
<img width="328" height="234" src="images/224p_fmt.jpeg" alt="224p.tif"/>
What I need to create/find is a script that will process all the image tags, multiply the width and height attributes by a specified amount (coded into the script) and write them back into the html files.
So, for the above example, say I want to multiply by 1.5, and wind up with
<img width="492" height="351" src="images/224p_fmt.jpeg" alt="224p.tif"/>
Scripts like this are not my forte, so help appreciated. I especially am unclear on how to write a script that I can run on file(s) from the command line and just input/output html.
I assume the meat of the code would be something like
s/<img width="([0-9]+)" height="([0-9]+)" src="(.*?)" alt=".*"/>/'<img width="'.$1*1.5.'" height="'.$2*1.5.'" src="'.$3.'" alt=""/>'/eg;
Which I realize is incorrect (the multiplication part) which is why help appreciated.
You've already got the main regex figured out; you just need to tweak it and decide on a language. Using regexes on HTML is not optimal, but since this case is fairly straightforward, it's probably OK.
perl -pi.bak -we 's/<img width="([0-9]+)" height="([0-9]+)"/q(<img width=") .
$1*1.5 . q(" height=") . $2*1.5 . q(")/eg;' yourfile.html
Note the use of the alternate quoting q(...), since using single quotes on the command line will conflict with the shell quoting.
There's no need to touch any parts you're not changing, unless you feel the need to make a stricter match. If you do, you can add a look-ahead assertion:
(?=\s*src=".*?"\s*alt=".*?"\/>)
This part will remain unchanged by the substitution.
In Python I'd do it like this.
import sys, re
source = sys.stdin.read()
def multi(by):
    def handler(m):
        # round() so that 328 * 1.5 is written as "492", not "492.0"
        updated = round(int(m.group(2)) * by)
        return m.group(1) + str(updated)
    return handler
sys.stdout.write(re.sub(r'((?:width|height)=["\'])(\d+)', multi(1.5), source))
Then you can handle input and output on the command line using < and >.
$ python resize.py < index.html > new_file.html
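Since the question mentions a folder full of files, here is a sketch that applies the same width/height regex to every *.html file in a directory and keeps a .bak copy of each, similar to perl -i.bak. The FACTOR constant, the .bak naming and the invocation (for example python resize_folder.py path/to/book, with resize_folder.py a hypothetical name) are assumptions. Like the regex above, it rescales every width and height attribute, not only those on img tags, and Python's round() rounds exact halves to the nearest even number.

```python
import re
import shutil
import sys
from pathlib import Path

FACTOR = 1.5  # the multiplier "coded into the script"
DIMENSION = re.compile(r'((?:width|height)=["\'])(\d+)')


def scale(html, factor=FACTOR):
    """Multiply every width/height attribute value, rounding to whole pixels."""
    return DIMENSION.sub(
        lambda m: m.group(1) + str(round(int(m.group(2)) * factor)), html)


def process_folder(folder):
    """Rewrite every *.html file in `folder` in place, keeping a .bak copy."""
    for path in sorted(Path(folder).glob("*.html")):
        shutil.copy2(path, path.with_name(path.name + ".bak"))
        path.write_text(scale(path.read_text(encoding="utf-8")),
                        encoding="utf-8")


if __name__ == "__main__" and len(sys.argv) == 2:
    process_folder(sys.argv[1])
```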
I would look into using the nokogiri gem to parse the HTML, search for image tags, extract the width and height attributes and then output the changed document so you can save it.
More information at the nokogiri tutorial page.
You're right, it can be done with a small Ruby script. It can look like this:
source = '<img width="328" height="234" src="images/224p_fmt.jpeg" alt="224p.tif"/>'
data = source.scan(/<img width="([0-9]+)" height="([0-9]+)" src="(.*?)" alt=".*?"\/>/).flatten
source.sub!("width=\"#{data[0]}\"", "width=\"#{(data[0].to_i * 1.5).round}\"")
source.sub!("height=\"#{data[1]}\"", "height=\"#{(data[1].to_i * 1.5).round}\"")
puts source
Of course, it's a quick and dirty script, far from perfect, and it has some drawbacks.

How to indent html with xmllint?

I'm outputting html that's all crushed together, and would like to convert it to have proper indentation. I've been trying to use xmllint for this, but with no joy. E.g. when this is in file.html:
<table><tr><td><b>Foo</b></td></tr></table>
<table><tr><td>Bar</td></tr></table>
I get:
$ xmllint --format file.html
file.html:2: parser error : Extra content at the end of the document
<table><tr><td>Bar</td></tr></table>
^
<<< exit status [1] >>>
But when file.html contains either of those lines alone, it works fine (removing the second line):
$ xmllint --format file.html
<?xml version="1.0"?>
<table>
<tr>
<td>
<b>Foo</b>
</td>
</tr>
</table>
When I include the --html option, it's more likely to run without errors, but then it doesn't indent.
Any suggestions? Are there any other (*nix) tools I can use for this? Thanks ...
As user 4M01 suggested: On the command line, append the pipe with a call to HTML tidy.
HTML output from xmllint will be repaired; tidy will wrap some reasonable ... around your html fragment.
xmllint --xpath "//tr[6]/td[7]" --html - | tidy -q
tidy -i sets the indent: auto config value. If instead of auto I set it to yes, I consistently got better indentation style:
tidy --indent yes
I think this is because the HTML you have supplied doesn't have a single root element, which makes it invalid XML.
Try adding the body tag and run xmllint again on it.
<body><table><tr><td><b>Foo</b></td></tr></table>
<table><tr><td>Bar</td></tr></table></body>
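The root-element explanation is easy to check with Python's xml.etree.ElementTree, which, like xmllint without --html, refuses a document with two top-level elements and accepts the same markup once it is wrapped in <body>. This is a swapped-in illustration rather than xmllint itself; ET.indent needs Python 3.9 or later, and parses_as_xml and pretty_xml are hypothetical names.

```python
import xml.etree.ElementTree as ET

FRAGMENT = ("<table><tr><td><b>Foo</b></td></tr></table>\n"
            "<table><tr><td>Bar</td></tr></table>")


def parses_as_xml(text):
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:  # e.g. "junk after document element"
        return False


def pretty_xml(text, indent="  "):
    root = ET.fromstring(text)
    ET.indent(root, space=indent)  # rewrites whitespace-only text/tails
    return ET.tostring(root, encoding="unicode")


print(parses_as_xml(FRAGMENT))                      # two top-level elements
print(pretty_xml("<body>" + FRAGMENT + "</body>"))  # one root: indents fine
```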
Have you tried HTML Tidy? More information about it is available at W3 and SourceForge. There is even a GUI tool available, known as GuiTidy. These tools are great: they not only help with proper indentation but also validate HTML code.
Hope this helps.

How to fold/unfold HTML tags with Vim

Is there some plugin to fold HTML tags in Vim?
Or there is another way to setup a shortcut to fold or unfold html tags?
I would like to fold/unfold html tags just like I do with indentation folding.
I have found zfat (or, equally, zfit) works well for folding with HTML documents. za will toggle (open or close) an existing fold. zR opens all the folds in the current document, zM effectively re-enables all existing folds marked in the document.
If you find yourself using folds extensively, you could make some handy keybindings for yourself in your .vimrc.
If you indent your HTML the following should work:
set foldmethod=indent
The problem with this, I find, is there are too many folds. To get around this I use zO and zc to open and close nested folds, respectively.
See help fold-indent for more information:
The folds are automatically defined by the indent of the lines.
The foldlevel is computed from the indent of the line, divided by the
'shiftwidth' (rounded down). A sequence of lines with the same or higher fold
level form a fold, with the lines with a higher level forming a nested fold.
The nesting of folds is limited with 'foldnestmax'.
Some lines are ignored and get the fold level of the line above or below it,
whichever is lower. These are empty or white lines and lines starting
with a character in 'foldignore'. White space is skipped before checking for
characters in 'foldignore'. For C use "#" to ignore preprocessor lines.
When you want to ignore lines in another way, use the 'expr' method. The
indent() function can be used in 'foldexpr' to get the indent of a line.
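The rule quoted above can be approximated in Python to see which fold level each line of indented HTML would get. This is a sketch: shiftwidth=2 and tabstop=8 are assumed values, only the blank-line and 'foldignore' rules from the help text are modelled, and 'foldnestmax' is ignored.

```python
def fold_levels(lines, shiftwidth=2, tabstop=8, foldignore="#"):
    """Approximate fold levels for foldmethod=indent (see :help fold-indent)."""
    raw = []
    for line in lines:
        text = line.expandtabs(tabstop)
        stripped = text.lstrip()
        if not stripped or stripped[0] in foldignore:
            raw.append(None)  # blank or ignored line: level from neighbours
        else:
            # indent divided by 'shiftwidth', rounded down
            raw.append((len(text) - len(stripped)) // shiftwidth)

    levels = []
    for i, level in enumerate(raw):
        if level is None:
            above = next((raw[j] for j in range(i - 1, -1, -1)
                          if raw[j] is not None), None)
            below = next((raw[j] for j in range(i + 1, len(raw))
                          if raw[j] is not None), None)
            known = [x for x in (above, below) if x is not None]
            level = min(known) if known else 0  # "whichever is lower"
        levels.append(level)
    return levels


html = ["<table>", "  <tr>", "    <td>blah</td>", "", "  </tr>", "</table>"]
print(fold_levels(html))
```

With two-space indentation every nesting level of the HTML becomes one fold level, which is why indented markup folds reasonably well with this method.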
Folding html with foldmethod syntax, which is simpler.
This answer is based on "HTML syntax folding in vim", whose author is @Ingo Karkat.
Set your fold method to syntax with the following
Vim command line: :set foldmethod=syntax
or put the setting in ~/.vim/after/ftplugin/html.vim
setlocal foldmethod=syntax
Also note that, so far, the default syntax script only folds a multi-line
tag itself, not the text between the opening and closing tag.
So, this gets folded:
<div
class="foo"
id="bar"
>
And this doesn't
<div>
<b>text between here</b>
</div>
To fold the text between tags, you need to extend the syntax script with the following, best placed in ~/.vim/after/syntax/html.vim.
The syntax folding is performed between all but void HTML elements (those which don't have a closing tag, like <br>):
syntax region htmlFold start="<\z(\<\(area\|base\|br\|col\|command\|embed\|hr\|img\|input\|keygen\|link\|meta\|para\|source\|track\|wbr\>\)\@![a-z-]\+\>\)\%(\_s*\_[^/]\?>\|\_s\_[^>]*\_[^>/]>\)" end="</\z1\_s*>" fold transparent keepend extend containedin=htmlHead,htmlH\d
Install the js-beautify command (JavaScript version):
npm -g install js-beautify
wget --no-check-certificate https://www.google.com.hk/ -O google.index.html
js-beautify -f google.index.html -o google.index.bt.html
(Screenshots, not preserved: the original http://www.google.com.hk HTML, and the same page after js-beautify with Vim folding.)
Adding on to the answer by James Lai:
Initially my foldmethod was syntax, so zfat didn't work (zf only creates folds when foldmethod is manual or marker).
The solution is to set the foldmethod to manual:
:setlocal foldmethod=manual
To check which foldmethod is in use:
:setlocal foldmethod?
Firstly set foldmethod=syntax and try zfit to fold start tag and zo to unfold tags, It works well on my vim.

How can one close HTML tags in Vim quickly?

It's been a while since I've had to do any HTML-like code in Vim, but recently I came across this again. Say I'm writing some simple HTML:
<html><head><title>This is a title</title></head></html>
How do I write those closing tags for title, head and html down quickly? I feel like I'm missing some really simple way here that does not involve me going through writing them all down one by one.
Of course I can use CtrlP to autocomplete the individual tag names but what gets me on my laptop keyboard is actually getting the brackets and slash right.
I find the xmledit plugin pretty useful. It adds two pieces of functionality:
When you open a tag (e.g. type <p>), it expands the tag as soon as you type the closing > into <p></p> and places the cursor inside the tag in insert mode.
If you then immediately type another > (e.g. you type <p>>), it expands that into
<p>
</p>
and places the cursor inside the tag, indented once, in insert mode.
The xml vim plugin adds code folding and nested tag matching to these features.
Of course, you don't have to worry about closing tags at all if you write your HTML content in Markdown and use %! to filter your Vim buffer through the Markdown processor of your choice :)
I like minimal things,
imap ,/ </<C-X><C-O>
I find it more convenient to make Vim write both the opening and closing tags for me, instead of just the closing one. You can use the excellent ragtag plugin by Tim Pope. Usage looks like this (let | mark the cursor position):
you type:
span|
press CTRL+x SPACE
and you get
<span>|</span>
You can also use CTRL+x ENTER instead of CTRL+x SPACE, and you get
<span>
|
</span>
Ragtag can do more than just this (e.g. insert <%= stuff around this %> or a DOCTYPE). You probably want to check out other plugins by the author of ragtag, especially surround.
Check this out..
closetag.vim
Functions and mappings to close open HTML/XML tags
https://www.vim.org/scripts/script.php?script_id=13
I use something similar.
If you're doing anything elaborate, sparkup is very good.
An example from their site:
ul > li.item-$*3 expands to:
<ul>
<li class="item-1"></li>
<li class="item-2"></li>
<li class="item-3"></li>
</ul>
with a <C-e>.
To do the example given in the question,
html > head > title{This is a title}
yields
<html>
<head>
<title>This is a title</title>
</head>
</html>
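To make the abbreviation syntax concrete, here is a toy expander in Python that handles only the pieces used in the two examples above: a tag name, an optional .class (with $ replaced by the repeat counter), optional {text}, an optional *N repeat, and > for nesting. It is a hypothetical sketch for illustration, not sparkup's implementation or its full grammar.

```python
import re

# tag, optional .class, optional {text}, optional *N
ELEM = re.compile(
    r"^([a-z][a-z0-9]*)(?:\.([\w$-]+))?(?:\{([^}]*)\})?(?:\*(\d+))?$")


def expand(abbr, indent="  "):
    parts = [p.strip() for p in abbr.split(">")]

    def render(i, depth):
        m = ELEM.match(parts[i])
        if m is None:
            raise ValueError(f"unsupported abbreviation part: {parts[i]!r}")
        tag, cls, text, count = m.groups()
        pad = indent * depth
        lines = []
        for n in range(1, int(count or 1) + 1):
            # "$" in the class name becomes the 1-based repeat counter
            attr = f' class="{cls.replace("$", str(n))}"' if cls else ""
            if i + 1 < len(parts):  # has children: opening/closing lines
                lines.append(f"{pad}<{tag}{attr}>")
                lines.extend(render(i + 1, depth + 1))
                lines.append(f"{pad}</{tag}>")
            else:                   # innermost element stays on one line
                lines.append(f"{pad}<{tag}{attr}>{text or ''}</{tag}>")
        return lines

    return "\n".join(render(0, 0))


print(expand("ul > li.item-$*3"))
print(expand("html > head > title{This is a title}"))
```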
There is also a zencoding vim plugin: https://github.com/mattn/zencoding-vim
tutorial: https://github.com/mattn/zencoding-vim/blob/master/TUTORIAL
Update: this now called Emmet: http://emmet.io/
An excerpt from the tutorial:
1. Expand Abbreviation
Type abbreviation as 'div>p#foo$*3>a' and type '<c-y>,'.
---------------------
<div>
<p id="foo1">
</p>
<p id="foo2">
</p>
<p id="foo3">
</p>
</div>
---------------------
2. Wrap with Abbreviation
Write as below.
---------------------
test1
test2
test3
---------------------
Then do visual select(line wize) and type '<c-y>,'.
If you request 'Tag:', then type 'ul>li*'.
---------------------
<ul>
<li>test1</li>
<li>test2</li>
<li>test3</li>
</ul>
---------------------
...
12. Make anchor from URL
Move cursor to URL
---------------------
http://www.google.com/
---------------------
Type '<c-y>a'
---------------------
<a href="http://www.google.com/">Google</a>
---------------------
Mapping
I like to have my block tags (as opposed to inline) closed immediately and with as simple a shortcut as possible (I like to avoid special keys like CTRL where possible, though I do use closetag.vim to close my inline tags). I like to use this shortcut when starting blocks of tags (thanks to @kimilhee; this is a take-off of his answer):
inoremap ><Tab> ><Esc>F<lyt>o</<C-r>"><Esc>O<Space>
Sample usage
Type:
<p>[Tab]
Result:
<p>
|
</p>
where | indicates cursor position.
Explanation
inoremap means create the mapping in insert mode
><Tab> means a closing angle bracket and a tab character; this is what is matched
><Esc> means end the first tag and escape from insert into normal mode
F< means find the previous opening angle bracket on the line
l means move the cursor right one (don't copy the opening angle bracket)
yt> means yank from the cursor position up to, but not including, the next closing angle bracket (i.e. copy the tag's contents)
o</ means start new line in insert mode and add an opening angle bracket and slash
<C-r>" means paste in insert mode from the default register (")
><Esc> means close the closing tag and escape from insert mode
O<Space> means start a new line in insert mode above the cursor and insert a space
Check out vim-closetag
It's a really simple script (also available as a Vundle plugin) that closes (X)HTML tags for you. From its README:
If this is the current content:
<table|
Now you press >, the content will be:
<table>|</table>
And now if you press > again, the content will be:
<table>
|
</table>
Note: | is the cursor here
Here is yet another simple solution, based on writing that is easy to find on the Web:
Auto closing an HTML tag
:iabbrev </ </<C-X><C-O>
Turning completion on
autocmd FileType xml set omnifunc=xmlcomplete#CompleteTags
allml (now Ragtag) and omni-completion (<C-X><C-O>)
don't work in files like .py or .java.
If you want to close tags automatically in those files,
you can map like this:
imap <C-j> <ESC>F<lyt>$a</^R">
(^R is Control+R: to type it, press Control+V and then Control+R.)
(| is the cursor position.)
now if you type..
<p>abcde|
and type ^j
then it close the tag like this..
<p>abcde</p>|
Building off of the excellent answer by @KeithPinson (sorry, not enough reputation points to comment on your answer yet), this alternative prevents the autocomplete from copying anything extra inside the HTML tag (e.g. classes, ids, etc.) that should not be copied to the closing tag.
UPDATE I have updated my response to work with filename.html.erb files.
I noticed my original response didn't work in files commonly used in Rails views, like some_file.html.erb, when I was using embedded Ruby (e.g. <p>Year: <%= @year %></p>). The code below will work with .html.erb files.
inoremap ><Tab> ><Esc>?<[a-z]<CR>lyiwo</<C-r>"><Esc>O
Sample usage
Type:
<div class="foo">[Tab]
Result:
<div class="foo">
|
</div>
where | indicates cursor position
And as an example of adding the closing tag inline instead of block style:
inoremap ><Tab> ><Esc>?<[a-z]<CR>lyiwh/[^%]><CR>la</<C-r>"><Esc>F<i
Sample usage
Type:
<div class="foo">[Tab]
Result:
<div class="foo">|<div>
where | indicates cursor position
It's true that both of the above examples rely on >[Tab] to signal a closing tag (meaning you would have to choose either inline or block style). Personally, I use the block-style with >[Tab] and the inline-style with >>.
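To see why yanking only the tag name matters, here is a rough single-line Python model of what the two mappings copy into the register before writing the closing tag. It ignores that Vim's ? search can cross lines and that iw depends on 'iskeyword'; the function names are hypothetical.

```python
import re


def close_tag_yank_to_gt(line):
    """Model of  F< l yt> : copy everything between the last '<' on the
    line and the following '>', attributes included."""
    start = line.rfind("<")
    return "</" + line[start + 1:line.index(">", start)] + ">"


def close_tag_yank_word(line):
    """Model of  ?<[a-z] l yiw : find the last '<' followed by a lowercase
    letter and copy only the word (tag name) after it."""
    starts = [m.start() for m in re.finditer(r"<[a-z]", line)]
    name = re.match(r"\w+", line[starts[-1] + 1:]).group()
    return "</" + name + ">"


for tag in ("<p>", '<div class="foo">'):
    print(close_tag_yank_to_gt(tag), close_tag_yank_word(tag))
```

For a bare <p> both give </p>, but with attributes the yt> version produces '</div class="foo">', which is exactly what the yiw variant avoids.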