Structuring html document - html

I have taken over some software which produces an HTML document with no structure.
The HTML itself is good enough: properly closed and nested and whatnot. But it is almost impossible to read with the human eye, because the line breaks fall wherever the text editor used to view the document pleases.
So, my question is as follows.
Does anyone know of an online parser or program that takes a messy, more or less minified HTML document and shows it in a human-readable form? Preferably it would also indent the various tags to show their nesting levels.
Thanks in advance.

Maybe try this (I just picked the first result of a Google search for 'html online tidy'): http://infohound.net/tidy/

Try this.
It is online and it is free.

Almost any HTML editor will have this capability. For instance: HTMLKit

The JS Beautifier also works with HTML: http://jsbeautifier.org/
There are other, similar tools available online if you search.
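If you would rather script this than paste the document into an online tool, a rough equivalent can be put together with Python's standard html.parser module. This is only a minimal sketch, not what Tidy or JS Beautifier do internally: the names Indenter and prettify are made up for illustration, it assumes the markup is already properly nested (as the question says), and it normalizes whitespace everywhere, so it is not safe for pre, script or style content.

```python
from html.parser import HTMLParser

# Elements that never take a closing tag in HTML.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class Indenter(HTMLParser):
    """Re-emit HTML with one tag or text run per line, indented by depth."""

    def __init__(self, indent="  "):
        # Keep entities such as &amp; exactly as written in the source.
        super().__init__(convert_charrefs=False)
        self.indent = indent
        self.depth = 0
        self.lines = []
        self.text = []  # pieces of the text run currently being collected

    def _flush_text(self):
        # Collapse runs of whitespace and emit the text at the current depth.
        text = " ".join("".join(self.text).split())
        self.text = []
        if text:
            self.lines.append(self.indent * self.depth + text)

    def _emit(self, markup):
        self._flush_text()
        self.lines.append(self.indent * self.depth + markup)

    def handle_decl(self, decl):
        self._emit(f"<!{decl}>")

    def handle_comment(self, data):
        self._emit(f"<!--{data}-->")

    def handle_starttag(self, tag, attrs):
        self._emit(self.get_starttag_text())
        if tag not in VOID_TAGS:
            self.depth += 1

    def handle_startendtag(self, tag, attrs):
        # Self-closing syntax such as <br/>: no change in depth.
        self._emit(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        self._flush_text()  # text belongs to the element being closed
        self.depth = max(self.depth - 1, 0)
        self._emit(f"</{tag}>")

    def handle_data(self, data):
        self.text.append(data)

    def handle_entityref(self, name):
        self.text.append(f"&{name};")

    def handle_charref(self, name):
        self.text.append(f"&#{name};")


def prettify(html, indent="  "):
    parser = Indenter(indent)
    parser.feed(html)
    parser.close()
    parser._flush_text()  # trailing text after the last tag
    return "\n".join(parser.lines)
```

Calling prettify on the contents of the generated file and printing the result gives one tag per line with two-space indentation. For anything beyond eyeballing the structure, a dedicated tool such as HTML Tidy is probably the safer choice.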

Related

When "viewing source", some sites have neat markup, some sites don't. Why? (pic attached)

Notice how in the 'ugly' side, the doctype is all the way indented and some of the meta lines extend past the left indent.
How can I get my markup looking neat when viewing source in a browser? Is there a certain way to encode the code while using an editor? I use Notepad++ by the way.
Large blocks of unindented code like the ones on the left-hand side are probably being written out server-side, so although the tag that creates them is nicely indented in your HTML, the server script's output will not honour that.
It's not about encoding, it's about writing neat source code, haha. If you are outputting from PHP or something similar, you can keep track of how far to indent each thing, or you can use some sort of template output function that keeps track of how many tags are open and indents by the correct amount each time. But there is no point in having neat HTML; the only important thing is that it's valid. Developer tools will make it neat for you when you're trying to debug, and actually removing all the whitespace used to make it neat can reduce your page size quite a bit.
The ugly ones probably look pretty in the underlying php or other source. Once generated into HTML it looks ugly, and very few programmers will try to make that pretty too - it's not worth it.
It's funny that what you list as "ugly" seems properly indented to me... at least from what I can tell from the screenshot.
In any case, it doesn't matter. Most of the time these days, sites are made with something dynamic, and a lot of the HTML formatting isn't explicitly output.
If you were to view the source on many of my sites, it is all rammed together on one line, as that is how I echo it out. I don't see the point in wasting bytes on line feeds. Especially these days with all of the browser tools available that reformat the source while debugging.
I use Eclipse to do my coding and I can use Source->Format to clean up my code and format it nicely.
For Notepad++, I believe you can use HTML tidy as per: Formatting code in Notepad++
TextFX -> HTML Tidy -> Tidy: Reindent XML
You really want your HTML code to look like this:
view-source:http://lightningsoul.com/
That way it uses the minimum amount of data to present itself to the browser. Remember that indentation and whitespace consume data just like any other character.
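How much the whitespace actually costs is easy to measure rather than guess. The following is a minimal sketch under stated assumptions: the function names strip_indentation and size_report are invented for illustration, and the regex simply deletes whitespace between tags, which can change rendering around inline elements and is not safe for pre blocks. It also reports gzip-compressed sizes, since many servers compress responses and the saving may look different after compression.

```python
import gzip
import re


def strip_indentation(html):
    """Delete whitespace that sits between two tags (naive, see caveats)."""
    return re.sub(r">\s+<", "><", html.strip())


def size_report(html):
    """Return byte counts for the page as written and with whitespace removed."""
    compact = strip_indentation(html)
    raw_bytes = html.encode("utf-8")
    compact_bytes = compact.encode("utf-8")
    return {
        "raw": len(raw_bytes),
        "compact": len(compact_bytes),
        "raw_gzip": len(gzip.compress(raw_bytes)),
        "compact_gzip": len(gzip.compress(compact_bytes)),
    }
```

Running size_report on a saved copy of one of your own pages shows whether the difference matters for your site.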

Javadocs without HTML

Robert C. Martin's book Clean Code contains the following:
HTML in source code comments is an abomination [...] If comments are going to be extracted by some tool (like Javadoc) to appear in a Web page, then it should be the responsibility of that tool, and not the programmer, to adorn the comments with appropriate HTML.
I kind of agree - source code surely would look cleaner without the HTML tags - but how do you make decent-looking Javadoc pages then? There's no way to even separate paragraphs without using an HTML tag. The Javadoc manual says it clearly:
A doc comment is written in HTML.
Are there some preprocessor tools that could help here? Markdown syntax might be appropriate.
I agree. (This is also the reason why I am -strongly- opposed to C#-style "XML comment blocks"; the Javadoc DSL at least provides some escape for top-level entities!). To this end I simply do not try to make the javadoc look pretty...
...anyway, you may be interested in Doxygen. Here is a very quick post Doxygen versus Javadoc. It also brings up the issues that you do :-)
HTML is not something I'd like to see in "normal" comments. But for tools like Javadoc, HTML makes it possible to add formatting information, bullet points, etc.
I would distinguish these two things:
Non-Javadoc code comments are for the programmer who maintains or enhances the code in question. They have to dig through existing sources, and any HTML in comments just doesn't make things easier. So, ban it in normal comments.
Javadoc comments are used to generate documentation. Use HTML where it helps. But a very limited subset of HTML should suffice.
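On the preprocessor idea from the question: the core of such a tool is small. Here is a minimal sketch, not an existing Javadoc plugin, written in Python for brevity. The function name plain_to_javadoc_html and the conventions it understands (blank lines separate paragraphs, lines starting with "- " form a bullet list) are assumptions made up for this example. It would have to run over the comment text before Javadoc sees it.

```python
import html
import re


def plain_to_javadoc_html(comment):
    """Turn plain-text paragraphs and '- ' bullets into <p> and <ul> markup."""
    out = []
    # Blank lines separate blocks.
    for block in re.split(r"\n\s*\n", comment.strip()):
        lines = [line.strip() for line in block.splitlines()]
        if not lines:
            continue  # empty input produces no output
        if all(line.startswith("- ") for line in lines):
            items = "".join(
                f"<li>{html.escape(line[2:], quote=False)}</li>" for line in lines
            )
            out.append(f"<ul>{items}</ul>")
        else:
            # Escape &, < and > so generics like List<String> survive.
            out.append("<p>" + html.escape(" ".join(lines), quote=False) + "</p>")
    return "\n".join(out)
```

With something like this the source comment stays free of tags, and only the limited subset mentioned above (paragraphs and bullet points) is generated.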

WYSIWYG browser editor that generates *good* HTML?

I'm searching for a "suck less" WYSIWYG in-browser X?HTML editor that generates good HTML code.
(no <font>, <foo style="...">, <p></p><span></span><p><span> </span><span><span>blah</span></span></p> and so on -- <b> and <i> etc. are OK).
It should be easy to use, as it is going to be used by people who do not know what HTML is.
Any suggestions?
Extra points for Copy-and-Paste-from-Word-readiness! :-)
(I found a lot of editors but they all create that <font> and nested <span> crap that breaks site design and bloats a site with one table up to 100kB.)
Download the current version of CKEditor and look at the XHTML output sample. It shows how to use full WYSIWYG editing without generating font tags or inline styles. You just need to adjust the configuration to your needs.
What about WYMeditor?
WYMeditor has been created to generate perfectly structured XHTML strict code, to conform to the W3C XHTML specifications and to facilitate further processing by modern applications.
With WYMeditor, the code can't be contaminated by visual informations like font styles and weights, borders, colors, ... The end-user defines content meaning, which will determine its aspect by the use of style sheets. The result is easy and quick maintenance of information.
I've used it a little and while it takes quite a bit of tweaking if you have very specific needs, it does work out of the box for simple XHTML editing. If you set up specially annotated CSS files then it will detect the styles you want users to use and block level elements to which they apply. You can also tell it how to display these styles in the editor (which might be different from how you want them displayed in the resulting XHTML).
Of course, it generates XHTML, not HTML, so it may not meet your exact needs.
Wikipedia has a category for them:
http://en.wikipedia.org/wiki/Category:JavaScript-based_HTML_editors
You can use Markdown with the WMD UI, it's the one used by Stack Overflow. It always produces valid HTML code.
I just recently searched for an editor to create solid documentation, whose output is suitable for Subversion diffs: https://superuser.com/questions/126621/wysiwyg-editor-for-structured-text-suitable-for-svn-versioning
The editor that was suggested - "KompoZer" - turned out to be fantastic, especially because it generates very clean HTML (in my opinion). And I say that, although I had originally preferred something leaner than HTML.
P.S. Reading your question again, I'm not sure what you mean by a "browser editor" - are you looking for an editor that can be integrated into an HTML page? KompoZer is based on a browser, but it probably cannot be integrated into an HTML page.
I recently switched one of my projects to markdown to avoid this exact issue. There's still a bit of a learning curve for the users but I haven't had to deal with the usual issues that occur when they copy/paste content from Word and wonder why it blew up.
Having said that, I prefer CKEditor over TinyMCE and the Telerik controls. I've generally found it generates somewhat cleaner HTML.
There are several WYSIWYG editors out there for embedding within your website.
WYMeditor (http://www.wymeditor.org/) looks very nice and seems to be a good fit for targeting clean and valid XHTML results.
Spaw2. Although it's kinda abandoned now.
The Apple Cocoa NSTextView class exports quite nice html, where all the fiddling is done through specifying a style sheet in the header. The Apple TextEdit editor uses this.
http://tinymce.moxiecode.com/ - easy to use, can import from Word, and can restrict formatting to predefined CSS styles to provide consistent output.
This post is 8+ years old now but still relevant...
I found an awesome GitHub page with a curated list of WYSIWYG editors, including a few WYSIWYM ones which guarantee sane HTML. As of 2018, the most current and best WYSIWYM one looks like ProseMirror, or maybe ORY Editor if you're looking for something to edit entire webpages(!) in one text field.

How do you find mismatched tags in HTML?

I've inherited some rather large static HTML files that need to be fixed up to work in webkit-based browsers, Safari in particular. One of the common bugs I've found that cause rendering differences is missing </div> tags. (Both IE7+ and FF3+ seem to ignore these, or make good guesses as to where to close the DIVs, and render as expected.) I'm used to using vim with HTML syntax highlighting for editing, but end up writing awk scripts to match starting and ending tags.
What is your favorite tool or technique for matching start and end tags in a large HTML file?
UPDATE: I'm currently in a shop that targets HTML 4.01 Strict, not XHTML.
The W3C HTML Validator works fairly well, or if you want something a little simpler, the Tidy Firefox plugin also works.
The W3C Validator can be (extremely) verbose, but it does check for missing closing tags.
HTML Tidy is a great command-line tool. I often use it with wget.
Most IDEs will let you know via highlighting, a fuzzy underline, or a warning.
Div Checker is a great tool that focuses on div tags specifically.
While other tools were only able to tell me that "some tag was missing somewhere", Div Checker removes other tags, code, and most comments to create a clean visual structure of just the divs themselves.
From this div map, it's fairly easy to see whether nested divs are correctly paired!
With the help of this tool, I was able to locate a missing div left out by a WordPress theme developer.
Here is the posted answer from #noah-whitmore that pointed me to this awesome tool.
There are a couple of other useful tools mentioned in that thread as well, such as unclosed-tag-finder (visually not so easy to read, but helpful if your missing tag is not a div).
vim/gvim & NetBeans both do a great job of tag matching
What is your favorite tool or technique for matching start and end tags in a large HTML file?
A text editor with a built-in XML well-formedness checker, combined with using XHTML for everything.
Sublime Text with the Tag plugin has a Tag Lint feature which aims to check the correctness of opened and closed tags.
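If you end up scripting this anyway (the question mentions awk), the same stack-based idea can be sketched with Python's standard html.parser module, which also gives you line numbers. This is a minimal sketch, not a validator: the names TagChecker and find_mismatches are invented here, and the tracked parameter is an assumption added so that you can restrict the check to div tags, since HTML 4.01 Strict legally omits end tags for elements such as p and li.

```python
from html.parser import HTMLParser

# Elements that never take a closing tag in HTML.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class TagChecker(HTMLParser):
    """Track open elements on a stack and record tags that do not pair up."""

    def __init__(self, tracked=None):
        super().__init__()
        self.tracked = tracked  # None means "check every non-void tag"
        self.stack = []         # (tag, line) for each element still open
        self.problems = []

    def _is_tracked(self, tag):
        if tag in VOID_TAGS:
            return False
        return self.tracked is None or tag in self.tracked

    def handle_starttag(self, tag, attrs):
        if self._is_tracked(tag):
            self.stack.append((tag, self.getpos()[0]))

    def handle_endtag(self, tag):
        if not self._is_tracked(tag):
            return
        line = self.getpos()[0]
        if tag not in [open_tag for open_tag, _ in self.stack]:
            self.problems.append(f"line {line}: </{tag}> has no matching start tag")
            return
        # Everything opened after the matching start tag was never closed.
        while self.stack:
            open_tag, open_line = self.stack.pop()
            if open_tag == tag:
                break
            self.problems.append(
                f"line {open_line}: <{open_tag}> is not closed "
                f"before </{tag}> on line {line}"
            )


def find_mismatches(html, tracked=None):
    checker = TagChecker(tracked)
    checker.feed(html)
    checker.close()
    for tag, line in checker.stack:
        checker.problems.append(f"line {line}: <{tag}> is never closed")
    return checker.problems
```

For the missing </div> case, calling find_mismatches(text, tracked={"div"}) on the file contents lists the line of each div that is never closed. Tag names are lowercased by the parser, so the tracked set should be lowercase too.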

Equivalent of LaTeX's \label and \ref in HTML

I have an FAQ in HTML (example) in which the questions refer to each other a lot. That means whenever we insert/delete/rearrange the questions, the numbering changes. LaTeX solves this very elegantly with \label and \ref -- you give items simple tags and LaTeX worries about converting to numbers in the final document.
How do people deal with that in HTML?
ADDED: Note that this is no problem if you don't have to actually refer to items by number, in which case you can set a tag with
<a name="foo">
and then link to it with
<a href="#foo">some non-numerical way to refer to foo</a>.
But I'm assuming "foo" has some auto-generated number, say from an <ol> list, and I want to use that number to refer to and link to it.
There is nothing like this in HTML.
The way you would normally solve this is to have the HTML for the links generated, either by parsing the HTML itself and inserting the TOC (you can do that on the server, before you send the HTML out to the browser, or on the client, by traversing the DOM with a little piece of ECMAScript and simply collecting and inspecting all <a> elements) or by generating the entire HTML document from a higher-level source like a database, an XML document, Markdown or – why not? – even LaΤΕΧ.
I know it's not widely supported by browsers, but you can do this using CSS counters.
Also, consider using ids instead of names for your anchors.
Instead of \label{key} use <a name="key" />. Then link using <a href="#key">Link</a>.
PrinceXML can do that, but that's about it. I suppose it'd be best to use server-side scripting.
Here's how I ended up solving this with a PHP script:
http://yootles.com/genfaq
It's roughly as convenient as \label and \ref in LaTeX and even auto-generates the index of questions.
And I put it on an etherpad instance which is handy when multiple people are contributing questions to the FAQ.
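For anyone who wants to roll their own, the generate-it-from-a-source approach can be sketched in a few lines of Python. This is not the PHP script linked above, just a minimal illustration under some loud assumptions: the {label:key} and {ref:key} placeholder syntax and the function name resolve_refs are invented here, and labels are numbered in order of appearance, which only matches the numbering of an <ol> if every list item carries exactly one label.

```python
import re

# Hypothetical placeholder syntax, chosen for this example only.
LABEL = re.compile(r"\{label:([\w-]+)\}")
REF = re.compile(r"\{ref:([\w-]+)\}")


def resolve_refs(source):
    """Replace {label:key} with an anchor and {ref:key} with a numbered link."""
    # Pass 1: number the labels in document order, like LaTeX's first run.
    numbers = {}
    for number, match in enumerate(LABEL.finditer(source), start=1):
        key = match.group(1)
        if key in numbers:
            raise ValueError(f"duplicate label: {key}")
        numbers[key] = number

    def link(match):
        key = match.group(1)
        if key not in numbers:
            raise KeyError(f"undefined label: {key}")
        return f'<a href="#{key}">{numbers[key]}</a>'

    # Pass 2: emit anchors for the labels, then resolve every reference.
    with_anchors = LABEL.sub(lambda m: f'<a id="{m.group(1)}"></a>', source)
    return REF.sub(link, with_anchors)
```

Because the numbers are recomputed on every run, inserting, deleting or rearranging questions in the source needs no manual renumbering, which is the property \label and \ref provide in LaTeX.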