How to edit HTML in Vim? - html

I'm new to Vim and I'm trying to get used to it. I just created a .vimrc file and got Vim to display line numbers and do incremental searching. I also enabled syntax highlighting. Now I want to enable things to make writing HTML easier. I searched for html.vim in /usr/share/vim and found this:
/usr/share/vim/vim72/syntax/html.vim
/usr/share/vim/vim72/ftplugin/html.vim
/usr/share/vim/vim72/indent/html.vim
Now, what do I have to do to enable HTML auto indentation? Copy those files to ~/.vim? Symlink them? Or does Vim automagically load them from /usr/share/vim/? (It already does HTML syntax highlighting, so I think that's possible - but it doesn't do HTML auto indenting)
I heard set autoindent in .vimrc would do the trick, but what's with .c files? I thought they needed set cindent, but does cindent work with HTML?

The very first thing you should do is try vimtutor and complete it a couple of times. Once the basics are covered you can start to play with plugins…
SnipMate is inspired by TextMate's snippets and is just as beautiful: it has a lot of HTML snippets by default, and it's extremely easy to add your own. To use it, type div, then hit Tab to obtain:
<div id="|">
</div>
with the caret between the "" ready for you to type an id; hit Tab again to move the caret to the blank line:
<div id="myId">
  |
</div>
Beautiful. Many editors have this feature, though.
If you have a lot of HTML to write — say a few emails/newsletters a day — another plugin called SparkUp allows you to produce complex HTML with only a few key strokes and some CSS knowledge. You start by typing something like:
table[id=myTable] > tr*3 > td*2 > img
then you hit <C-e> (Ctrl+E) to obtain:
<table cellspacing="0" id="myTable">
  <tr>
    <td>
      <img src="|" alt="" />
    </td>
    <td>
      <img src="" alt="" />
    </td>
  </tr>
  <tr>
    <td>
      <img src="" alt="" />
    </td>
    <td>
      <img src="" alt="" />
    </td>
  </tr>
  <tr>
    <td>
      <img src="" alt="" />
    </td>
    <td>
      <img src="" alt="" />
    </td>
  </tr>
</table>
with the caret inside the first empty "". Hit <C-n> and <C-p> to go to the next/previous field.
Magical. The plugin is available for more editors, though.
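To make the expansion concrete, here is a toy Python expander for a tiny subset of that abbreviation syntax: tag names, one [attr=value], *N repetition and > for nesting. It is a hypothetical illustration of the idea, not Sparkup's code; the real plugin supports much more syntax, adds attributes such as cellspacing, src and alt, and places the caret in the empty fields.

```python
import re

# One step of the abbreviation: tag, optional [attr=value], optional *count.
STEP = re.compile(r"^(\w+)(?:\[(\w+)=([^\]]*)\])?(?:\*(\d+))?$")
VOID_TAGS = {"img", "br", "hr", "input"}  # written as <tag /> with no children


def expand(abbrev: str, indent: str = "  ") -> str:
    """Expand e.g. 'ul > li*3' into indented HTML (toy subset of the syntax)."""
    steps = [s.strip() for s in abbrev.split(">")]
    return "\n".join(_render(steps, 0, indent))


def _render(steps, depth, indent):
    """Render the first step (repeated *count times) with the rest nested inside."""
    if not steps:
        return []
    m = STEP.match(steps[0])
    if m is None:
        raise ValueError(f"unsupported step: {steps[0]!r}")
    tag, attr, value, count = m.groups()
    attrs = f' {attr}="{value}"' if attr else ""
    pad = indent * depth
    lines = []
    for _ in range(int(count or 1)):
        children = _render(steps[1:], depth + 1, indent)
        if tag in VOID_TAGS:
            lines.append(f"{pad}<{tag}{attrs} />")
        elif children:
            lines.extend([f"{pad}<{tag}{attrs}>", *children, f"{pad}</{tag}>"])
        else:
            lines.append(f"{pad}<{tag}{attrs}></{tag}>")
    return lines


print(expand("table[id=myTable] > tr*3 > td*2 > img"))
```

Running it prints the same nesting as the table above (three rows of two cells, each holding an img), without Sparkup's extra attributes.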
I second the recommendations for text objects and Surround.vim, which are unbelievably useful.
Another cool feature is the visual-block mode (:help visual-block) where you can select columns of text. Say you have:
<ul>
  <li><p>My text doesn't mean anything</p></li>
  <li><p>My text doesn't mean anything</p></li>
  <li><p>My text doesn't mean anything</p></li>
  <li><p>My text doesn't mean anything</p></li>
</ul>
Place your cursor on the > of the first <li>, then hit <C-v> and move the cursor down to the fourth <li>. Hit I (capital I) to enter INSERT mode just before the >, type class="myElement" (with a leading space), then hit <Esc> to obtain:
<ul>
  <li class="myElement"><p>My text doesn't mean anything</p></li>
  <li class="myElement"><p>My text doesn't mean anything</p></li>
  <li class="myElement"><p>My text doesn't mean anything</p></li>
  <li class="myElement"><p>My text doesn't mean anything</p></li>
</ul>
Oh yeah!
Seriously, Vim is great.
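The visual-block insert above boils down to inserting the same text at the same column on every line of the block. Here is a rough Python model of that with a hypothetical function name; following :help v_b_I, lines too short to reach the block's column are left unchanged.

```python
def block_insert(lines, first, last, column, text):
    """Insert `text` at `column` on lines first..last (0-based, inclusive).

    Models Vim's <C-v> ... I{text}<Esc>: lines that do not extend into the
    block (i.e. are not longer than `column`) are left unchanged.
    """
    out = list(lines)
    for i in range(first, last + 1):
        if len(out[i]) > column:
            out[i] = out[i][:column] + text + out[i][column:]
    return out


lines = [
    "<ul>",
    "<li><p>My text doesn't mean anything</p></li>",
    "<li><p>My text doesn't mean anything</p></li>",
    "<li><p>My text doesn't mean anything</p></li>",
    "<li><p>My text doesn't mean anything</p></li>",
    "</ul>",
]
col = lines[1].index(">")  # the cursor sits on the ">" of the first <li>
for line in block_insert(lines, 1, 4, col, ' class="myElement"'):
    print(line)
```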

Take a look at the AutoCloseTag plugin to close tags as you type them. And set autoindent should handle HTML indentation for you.
Also, read the docs in :help text-objects to learn about the inner and outer tag selections. For example, in normal mode you can type cit to change the text inside the current tag. Or, in visual mode, at will expand the selection to include the tag around the cursor.
Finally, look at the Surround.vim plugin, which can surround a selection or text object with a tag, or change the tag around it.

Related

Notepad++ XML Tools Autospacing Logic

PLEASE HELP ME
So I started using the CTRL+ALT+SHIFT+B autospacing feature that comes with the XML Tools plugin in Notepad++ v7.2. Everything works fine; I just have a question about the logic the plugin uses. Take this excerpt of code:
<tr>
    <td>
        <img id="codeImg" alt="matrix code" src="http://i860.photobucket.com/albums/ab170/gondrongsolo/background.gif">
        </td>
        <td>
            <ul>
                <li>
                    <em>Python</em> programming</li>
                <li>Shell scripting</li>
                <li>Reddit addict</li>
                <li>Fitness nut</li>
                <li>Raidboss Gamer</li>
            </ul>
        </td>
    </tr>
Why does the autospacing feature format it like this? Shouldn't the first <tr> have the same indentation as the </tr> closing tag? I find this happens every time I insert a tag that doesn't need a closing tag, such as img or p tags.
I would really like to be able to fix the auto formatting because it allows me to read my code more clearly and would greatly appreciate a response. If I need a different plugin please direct me!
Although you do not need to write img as a self-closing tag in HTML, you can close it with a slash at the end of the tag. This plugin is happy when you write <img ... />, and doing so causes no errors on the web page. The plugin then reads the markup correctly and fixes the indentation, resulting in the following formatting:
<tr>
    <td>
        <img id="codeImg" alt="matrix code" src="http://i860.photobucket.com/albums/ab170/gondrongsolo/background.gif"/>
    </td>
    <td>
        <ul>
            <li>
                <em>Python</em> programming</li>
            <li>Shell scripting</li>
            <li>Reddit addict</li>
            <li>Fitness nut</li>
            <li>Raidboss Gamer</li>
        </ul>
    </td>
</tr>
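The underlying reason is that an XML tool follows XML rules, where every element must be closed; an <img> without the closing slash looks like an element that is still open, so everything after it is treated as nested one level deeper. Assuming XML Tools' pretty-printer follows that nesting, the same distinction can be seen with Python's standard xml.etree.ElementTree parser (this is not the plugin's code, and the src value is shortened):

```python
import xml.etree.ElementTree as ET

# Shortened versions of the cell from the question (src trimmed for brevity).
UNCLOSED = '<td><img id="codeImg" alt="matrix code" src="background.gif"></td>'
CLOSED = '<td><img id="codeImg" alt="matrix code" src="background.gif"/></td>'


def is_well_formed(fragment):
    """Return True if `fragment` parses as XML."""
    try:
        ET.fromstring(fragment)
    except ET.ParseError:
        return False
    return True


print(is_well_formed(UNCLOSED))  # False: </td> arrives while <img> is still open
print(is_well_formed(CLOSED))    # True: <img .../> is an empty element
```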

How to close <IMG> tags and add ALT text with Regex & WebStorm

I am having to edit a large number of HTML files (produced by an email generator) that contain code that doesn't pass the validator. In particular, there are hundreds of <IMG> tags that need to be closed (<IMG/>), and many that are missing ALT text (though some already contain ALT text). I am using WebStorm 10, and my thought was to do the search and replace with regex, but I'm not that skilled at it, so I am banging my head a bit.
I got this far:
search for
(<img)([^/>]+)(>)
and replace with
$1 $2 alt="" /$3
but this gives me duplicate ALT text if the tag already contains an ALT text, and in that case, it adds an extra "/" inside the IMG tag, just before the ALT. I also want to make sure if the IMG tag is closed that it doesn't get an extra "/".
So if I have something like
<img src="foobar.jpg" width="108" height="71" style="display:block" >
I want to get
<img src="foobar.jpg" width="108" height="71" style="display:block" alt="" />
(but if the tag has an ALT or is already closed, leave it alone)
I spent an hour on this and solved much of the issue, but it's not perfect and doesn't work in the cases just above. Any help appreciated.
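If a scripted batch pass is acceptable instead of WebStorm's replace dialog, the conditional logic ("add alt only if missing, add the slash only if not already closed") is easier to express with a replacement function than with a single replacement string. A minimal Python sketch, assuming no attribute value contains a ">" and that no other attribute name ends in alt (data-alt would fool the check):

```python
import re

# "<img", its attributes, optional whitespace, an optional "/", then ">".
# Assumes no attribute value contains a ">" character.
IMG_TAG = re.compile(r"(<img\b)([^>]*?)\s*(/?)>", re.IGNORECASE)
HAS_ALT = re.compile(r"\balt\s*=", re.IGNORECASE)


def fix_img(match):
    """Add alt="" if missing and close the tag if needed; otherwise keep it."""
    start, attrs, slash = match.groups()
    has_alt = HAS_ALT.search(attrs) is not None
    if has_alt and slash:
        return match.group(0)  # already has an alt and is closed: leave it alone
    if not has_alt:
        attrs += ' alt=""'
    return f"{start}{attrs} />"


def fix_html(html):
    return IMG_TAG.sub(fix_img, html)


print(fix_html('<img src="foobar.jpg" width="108" height="71" style="display:block" >'))
```

For the example tag from the question this produces exactly the desired <img src="foobar.jpg" width="108" height="71" style="display:block" alt="" />, while a tag that already has an alt and a closing slash is returned unchanged.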

Image source not found in images content element

This is an extract of the rendered code of the frontend.
<div style="width: 345px;" class="csc-textpic-imagewrap">
  <ul>
    <li style="width: 0px;" class="csc-textpic-image csc-textpic-firstcol">
      <img width="" height="" border="0" alt="" src="">
    </li>
    <li style="width: 335px;" class="csc-textpic-image csc-textpic-lastcol">
      <img width="335" height="381" border="0" alt="" src="uploads/pics/katze_start_01.jpg">
    </li>
  </ul>
</div>
The first <li><img> contains no src, height, width, alt, etc. In the backend there is an image, so it should work. I use the standard CSS Styled Content framework and render the content elements via styles.content.getLeft (which works). For some reason (it doesn't seem logical) it does not render the first image of an "image" content element...
What is the error here?
Check in the Install Tool whether ImageMagick/GraphicsMagick works. There are five tests defined under "Image Processing".
If everything works there, check whether your image file name contains special characters (e.g. German umlauts) or spaces. Replace the spaces (I guess you uploaded the files via scp/FTP; files uploaded through TYPO3 should have them replaced automatically) and check the Install Tool options "[SYS][UTF8filesystem]" and "[SYS][systemLocale]" in the "All Configuration" section.
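To spot the file names this answer warns about, a short Python scan is enough. It is a sketch: "uploads/pics" is taken from the rendered HTML in the question, so point it at wherever your images actually live. It only lists candidates and does not check the Install Tool settings.

```python
from pathlib import Path


def suspicious_names(folder):
    """Return paths under `folder` whose file name has a space or a non-ASCII character."""
    found = []
    for path in Path(folder).rglob("*"):
        if path.is_file() and (" " in path.name or not path.name.isascii()):
            found.append(str(path))
    return sorted(found)


if __name__ == "__main__":
    # "uploads/pics" comes from the rendered HTML above; adjust as needed.
    for name in suspicious_names("uploads/pics"):
        print(name)
```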

Does anyone know of a text editor that allows you to indent highlighted chunks of code all at once?

I am extremely OCD when it comes to the layout of my code, and it's a pain to press the down arrow key and Tab a hundred times in a row. Does anyone use a text editor that has the function of indenting chunks of code at the same time? Such as, if I have this:
<div>
    <img src="blahblah" style="float:left" />
    <span>Hey it's a picture.</span>
</div>
<div>
    <img src="somephoto" style="float:right" />
    <span>Another picture</span>
</div>
...then I come back later and want to wrap both divs in another div, but it comes out like this:
<div>
<div>
    <img src="blahblah" style="float:left" />
    <span>Hey it's a picture.</span>
</div>
<div>
    <img src="somephoto" style="float:right" />
    <span>Another picture</span>
</div>
</div>
When I want it to look like this:
<div>
    <div>
        <img src="blahblah" style="float:left" />
        <span>Hey it's a picture.</span>
    </div>
    <div>
        <img src="somephoto" style="float:right" />
        <span>Another picture</span>
    </div>
</div>
Obviously this is a minor example, but in large files this becomes quite a hassle. I use Bluefish on Ubuntu and Notepad++ on Windows, and neither seems to come with the capability to indent a block of code all at once. What are your thoughts?
Any decent text editor, including Notepad++, can do this.
Select the lines and press Tab.
Yup, go with Notepad++, it's brilliant. Don't forget Shift+Tab too, though; it's just as useful :)
Any decent text editor will have this functionality implemented into the editor.
Others have already mentioned Notepad++, but there's also Geany, which is available for Windows and Linux (it can also auto-close tags), DroidEdit for Android, Smultron for OS X users, and, if you love web-based apps like I do, the CodeMirror JavaScript library.
If you want to indent a large selection of code use Tab.
If you want to unindent/outdent a large selection of code use SHIFT + Tab.
This becomes very helpful. Also, Ctrl+F finds text and Ctrl+H replaces text (Notepad++ has a Replace All function, but Geany does not).
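Under the hood, Tab and Shift+Tab on a multi-line selection just add or remove one indentation level on every selected line. A small Python model of those two operations, assuming one level is four spaces (editors let you configure this, and this outdent sketch ignores tab characters):

```python
import textwrap

INDENT = "    "  # one indentation level; an assumption, editors make this configurable


def indent_block(text):
    """Like Tab on a selection: add one level to every non-blank line."""
    return textwrap.indent(text, INDENT)


def outdent_block(text):
    """Like Shift+Tab: remove up to one level of leading spaces from each line."""
    out = []
    for line in text.split("\n"):
        leading = len(line) - len(line.lstrip(" "))
        out.append(line[min(leading, len(INDENT)):])
    return "\n".join(out)


inner = "<div>\n    <span>Another picture</span>\n</div>"
wrapped = "<div>\n" + indent_block(inner) + "\n</div>"
print(wrapped)
```

Wrapping existing divs in a new one then amounts to indenting them once and adding the outer tags around the result, which is what selecting the lines and pressing Tab does in the editor.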

Edit many HTML-files with Regex, empty alt-tag once in 'img src' occurring twice

To begin and to be clear, I am using Regex to edit existing HTML code on many files and NOT to parse HTML.
Summary: I need to remove the content of the alt attribute of one 'img src' tag, in about 4500 HTML files.
Here is an actual sample of the existing HTML markup:
<!-- End Bottom Bar --><img src="image/sdim0490.jpg" alt="sdim0490.jpg" border="0" width="1" height="1" /><!-- Google Analytics Script -->
What I need to do is to remove the content of the alt-Tag so it's empty. There are about 4500 html-pages affected in various folders. I am using Notepad++ that allows editing of files contained in a folder using Regex.
The most difficult problem for me is that each HTML page has at least 2 'img src' tags, but I only need to edit one of them, namely the last occurrence.
That specific 'img src' tag is nested between the two comments, as shown in the example above, on ALL pages to be edited. There is a catch, though: sometimes there are one or more additional empty lines above and/or below the comments. I know that doesn't make it simpler. But in every case both comments are there, as outlined above. Of course, the content of the alt attribute differs on every page and is unique to each page to be edited.
The result after applying the regex-edit should look like this:
<!-- End Bottom Bar --><img src="image/sdim0490.jpg" alt="" border="0" width="1" height="1" /><!-- Google Analytics Script -->
It does not matter whether the extra empty lines above and/or below are removed or not. What matters only is that the alt-tag is empty.
I hope you can help me create a regular expression that will NOT affect any other 'img src' tag in the markup.
The reason for having an empty alt attribute is that a decorative image, or any other image that is not significant to the content, should be marked with an empty alt attribute. At least that is what I was told about how search bots value and differentiate images (among many other aspects).
You may wonder why I'm setting a width and height to '1'. Well I use this technique to pre-load the next image to be viewed that is of significance to the following page. This may increase the browsing experience for the user.
Looking forward to receiving some feedback.
Thank you for your attention, Hans.
UPDATE to my question:
After some more thinking I found that I've got a single value to search for. It's: border="0"
And that value is not needed at all. So an alternative would be to search for that and replace it as below including the preceding alt="xyz":
replace: alt="xyz" border="0" with: alt=""
That would serve my intention fully.
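The replacement from the update fits in one regular expression. The sketch below checks it in Python, assuming (as the update says) that border="0" only occurs on the tag being edited; the pattern uses only basic regex syntax, so the same search/replace pair should carry over to Notepad++'s Regular expression search mode.

```python
import re

# alt="<anything but a quote>", whitespace, then border="0". Assumes, as the
# update says, that border="0" only occurs on the tag being edited.
ALT_THEN_BORDER = re.compile(r'alt="[^"]*"\s+border="0"')


def clear_alt(html):
    """Replace alt="..." border="0" with alt=""."""
    return ALT_THEN_BORDER.sub('alt=""', html)


print(clear_alt('<!-- End Bottom Bar --><img src="image/sdim0490.jpg" alt="sdim0490.jpg" '
                'border="0" width="1" height="1" /><!-- Google Analytics Script -->'))
```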
As you do not want to parse the HTML files, it is possible to use a regex to do what you wish.
In Python, here's the code of the program that does it:
import re
text = '''<img src="image/sdim0490.jpg" alt="bling" border="0" width="1" height="1" />
<!-- End
Bottom Bar -->
##############################
<img src="image/sdim0491.jpg" alt="bling" border="0" width="1" height="1" />
##############################
<!-- Google
Analytics Script
-->
<img src="image/sdim0492.jpg" alt="bling" border="0" width="1" height="1" />'''
# Group 1: the "End Bottom Bar" comment up to and including alt=" of the next <img>.
# Group 2: the alt text to remove.
# Group 3: the rest of that <img> tag through the "Google Analytics Script" comment.
# \s+ tolerates newlines inside the comments; re.DOTALL lets .*? span lines.
regx = re.compile(r'(<!--\s+End\s+Bottom\s+Bar\s+-->'
                  r'.*?'
                  r'<img\s+src="image/.+?"\s+alt=")(.*?)("\s+.*? />'
                  r'.*?'
                  r'<!--\s+Google\s+Analytics\s+Script\s+-->)',
                  re.DOTALL)
print(regx.sub(r'\1\3', text))
Result:
<img src="image/sdim0490.jpg" alt="bling" border="0" width="1" height="1" />
<!-- End
Bottom Bar -->
##############################
<img src="image/sdim0491.jpg" alt="" border="0" width="1" height="1" />
##############################
<!-- Google
Analytics Script
-->
<img src="image/sdim0492.jpg" alt="bling" border="0" width="1" height="1" />
\s is equivalent to [ \t\n\r\f\v]. I replaced the blanks in the pattern with \s+ to account for the fact that, in HTML files, elements are sometimes broken by randomly placed newlines. Opponents of processing SGML/HTML/XML files with regexes often cite this to argue that such files must always be handled with a parser, which I don't agree with. Note that my code assumes such randomly placed newlines can occur between words, but not inside a word.
Here's a regex (C#)... of course an HTML parser has distinct advantages. It would be interesting to see which is faster; my money is on the regex. Maintainability likely goes to the HTML parser.
using System.Text.RegularExpressions;
string input = @"<img src=""image/sdim0490.jpg"" alt=""bling"" border=""0"" width=""1"" height=""1"" /> <!-- End Bottom Bar --><img src=""image/sdim0490.jpg"" alt=""bling"" border=""0"" width=""1"" height=""1"" /><!-- Google Analytics Script --> <img src=""image/sdim0490.jpg"" alt=""bling"" border=""0"" width=""1"" height=""1"" />";
string pattern = @"(?<=\<!-- End Bottom Bar --><img[^>]+alt="")([^""]*)(?="".*<!-- Google Analytics Script -->)";
string html = Regex.Replace(input, pattern, "", RegexOptions.IgnoreCase);
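Here's a sed command to clear the alt attributes in img tags for all HTML files (sed has no lazy *? quantifier and uses \1 rather than $1 for backreferences; -i '' is the BSD/macOS form, with GNU sed write plain -i):
sed -i '' -E 's/(<img[^>]*)alt="[^"]*"/\1alt=""/g' /somepath/*.html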
Sometimes it's good to sleep on things. This morning I had the idea that led to the solution using Notepad++ (which, by the way, is a pretty decent editor).
Since the 'img src=' tag occurs at least twice, I tried to find a pattern that is unique to the 'img src=' tag in question. It did not occur to me earlier that the ending height="1" of that tag is unique enough. With this I did not need to take the lines above and below into account, as I had initially assumed. I also removed the border="0", as this belongs in CSS and not in the markup.
Finally I found the search string I was looking for and entered it as follows (Search Mode: Regular expression, checked):
Strings:
Search string:
alt="(.*).jpg"(.*)width="1" height="1" />
Replace string:
alt="" width="1" height="1" />
This gave 3937 hits. Bingo! (So my estimate of about 4500 files was quite close.)
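Outside Notepad++, the same search/replace pair can be applied to a whole folder tree with a short Python script. This is a sketch, not part of the original workflow: it assumes the pages are UTF-8 encoded, and the "." before "jpg" is escaped so it only matches a literal dot. Run it with dry_run=True first and compare the count with the number of hits Notepad++ reports.

```python
import re
from pathlib import Path

# The Notepad++ search/replace pair from above, except that the "." before
# "jpg" is escaped so it only matches a literal dot.
SEARCH = re.compile(r'alt="(.*)\.jpg"(.*)width="1" height="1" />')
REPLACE = 'alt="" width="1" height="1" />'


def rewrite_folder(root, dry_run=True):
    """Apply SEARCH/REPLACE to every .html file under `root` (recursively).

    Returns the number of replacements. Files are assumed to be UTF-8;
    newline="" keeps their original line endings when writing back.
    """
    hits = 0
    for path in sorted(Path(root).rglob("*.html")):
        with open(path, encoding="utf-8", newline="") as f:
            text = f.read()
        new_text, n = SEARCH.subn(REPLACE, text)
        hits += n
        if n and not dry_run:
            with open(path, "w", encoding="utf-8", newline="") as f:
                f.write(new_text)
    return hits
```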
A more general solution, for anyone who just wants to find any img tag with a non-blank alt attribute, also works in Notepad++:
(<img [^>]*alt=")[^"]+("[^>]*>)
Replace the double quotes with single quotes in the expression if that's what your markup uses, but you cannot mix the two in one expression because of the possible "Person's object" edge case, where an apostrophe appears between double quotes (or, more rarely, vice versa).
Then in the replace field you'd use the captures you got from the find:
\1\2
What the regex search does is:
1.) Find an opening img tag
2.) Check that it has an alt attribute before the img tag closes
3.) Make sure the alt attribute isn't blank already
Then the replace simply ignores the content that was between the quotes and the output is a blank alt attribute. With a few modifications you could find the alt attributes that are empty and fill them, or re-fill the ones with content or all kinds of things.
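The same find/replace pair behaves identically in Python's re module (which also accepts \1\2 in the replacement), so it is easy to try on sample tags before running it over real files. A minimal sketch with a hypothetical helper name:

```python
import re

# Group 1: "<img " and everything up to and including alt=".
# Group 2: the closing quote and the rest of the tag.
# The non-empty alt text between them ([^"]+) is dropped by the replacement.
NONBLANK_ALT = re.compile(r'(<img [^>]*alt=")[^"]+("[^>]*>)')


def blank_alts(html):
    """Empty every non-blank alt attribute on img tags."""
    return NONBLANK_ALT.sub(r"\1\2", html)


print(blank_alts('<img src="a.jpg" alt="Person" />'))
```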