What do <o:p> elements do anyway?

I've been running into some (standard) issues with Microsoft Office injecting its nasty markup into HTML after forwarding an email via Outlook.
I'm interested to know:
Is there a resource that explains what <o:p> elements actually do?
What other MSO elements are commonly injected?

I couldn't find any official documentation (no surprise there), but according to this interesting article, those elements are injected so that Word can convert the HTML back into a fully compatible Word document, with everything preserved.
The relevant paragraph:
Microsoft added the special tags to Word's HTML with an eye toward backward compatibility. Microsoft wanted you to be able to save files in HTML complete with all of the tracking, comments, formatting, and other special Word features found in traditional DOC files. If you save a file in HTML and then reload it in Word, theoretically you don't lose anything at all.
This makes lots of sense.
For your specific question: the o in <o:p> means "Office namespace", so anything following o: in a tag means "I'm part of the Office namespace". In the case of <o:p> it just means paragraph, the equivalent of the ordinary <p> tag.
I assume that every HTML tag has its Office "equivalent", and that Office has more besides.
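If the practical goal is just to get rid of these elements after Outlook has injected them, here is a minimal sketch in Python using only the standard library's html.parser. It drops every tag whose name starts with the o: prefix and keeps the text inside it. The class and function names are hypothetical. It assumes the injected tags use the o: prefix; other Office prefixes (for example v: or w:) would need to be passed in if they appear in your markup. End tags are re-emitted in lowercase and processing instructions are dropped, so treat this as a cleanup sketch rather than a faithful round-trip.

```python
from html.parser import HTMLParser


class OfficeTagStripper(HTMLParser):
    """Re-emits HTML but drops tags in Office namespaces such as <o:p>.

    Text inside the dropped tags is kept; ordinary HTML tags pass through.
    """

    def __init__(self, prefixes=("o:",)):
        # Keep entities like &nbsp; exactly as written instead of converting them.
        super().__init__(convert_charrefs=False)
        self.prefixes = tuple(prefixes)
        self.out = []

    def _is_office(self, tag):
        return tag.startswith(self.prefixes)

    def handle_starttag(self, tag, attrs):
        if not self._is_office(tag):
            self.out.append(self.get_starttag_text())  # original text, attributes intact

    def handle_startendtag(self, tag, attrs):
        if not self._is_office(tag):
            self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if not self._is_office(tag):
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")

    def handle_comment(self, data):
        self.out.append(f"<!--{data}-->")

    def handle_decl(self, decl):
        self.out.append(f"<!{decl}>")  # keep the doctype


def strip_office_tags(html, prefixes=("o:",)):
    stripper = OfficeTagStripper(prefixes)
    stripper.feed(html)
    stripper.close()
    return "".join(stripper.out)


print(strip_office_tags('<p class=MsoNormal>Hello<o:p></o:p></p>'))  # <p class=MsoNormal>Hello</p>
```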

Related

Do we really need to use </body> and </html> closing tags?

More often than not I see HTML without the closing tags, especially body and html.
According to:
http://www.w3.org/TR/html5/sections.html#the-body-element
http://www.w3.org/TR/html5/semantics.html#the-html-element
These can be omitted, but what about cross-device issues? For example, what about running such HTML on Android or Windows Phone devices, or anything else you know of where missing closing tags would not work?

Do we need it? Well, that depends on your DTD. If you're using XHTML, then yes, you will need it to conform. For accessibility's sake I would include the closing tags: you never know if there's a screen reader (or other piece of software) out there that only parses valid XHTML, and you could be hindering partially sighted people, for example.
Google will also, apparently, rank your valid documents higher than invalid documents in their listings.
Here's a document by a friend of a friend that answers your question a bit better; granted, it was written in 2008, but I think some of the points still apply.
If you ever need to use the same HTML in an XHTML application, you won't need to mess around with it; you can just copy it across and not have to worry about conforming (because you already do).
On a separate note, you are essentially future-proofing your markup. Who's to say that the spec won't eventually change to "You must include the closing head and body tags"? You won't need to worry if you already have them. It is, however, highly unlikely that the spec will change to, "You must not include the closing head and body tags".
As a great man once said:
Should I close the lid of the toilet when I'm finished? Yes, especially if the wife is going to use it after me.
- Darren Gourley (Nov 2015)

Use https://validator.w3.org/ and select your standards target. If it says your document passes, then I think it's good enough.
Please bear in mind that the HTML5 spec is still being defined and evolving.

Technically you could omit the html, head and body tags altogether, as long as the markup meets the following conditions:
http://www.w3.org/TR/2011/WD-html5-20110525/syntax.html#optional-tags
Regarding your comment about half your team using them and half not, I would suggest that, as long as either option is technically standards compliant, you just choose one and move on, since the entire subject is open for discussion and interpretation. My personal opinion is that it's probably more important for your team to get on the same page and produce work of a similar standard, especially if you have more than one person working on a project simultaneously.

You can leave out the end tags. Indeed you can leave out the opening tags, too (obviously not if you are using any attributes on them).
Not only is that the case with the more recent standard, but it's been the case since the very beginning. (The obvious exception is if you are using the XML syntax, since XML itself requires every element to be explicitly closed.)
Browsers have been dealing with HTML documents that lack the trailing closing tags since the 1990s. If the standards hadn't allowed it, they'd probably still have dealt with them, much as they try their best to deal with all manner of messy code. (This causes its own problems, which was one of the motivations behind XML not allowing optional tags, but that's another matter.)
Many people consider it poor style. I would be one of them. But it's certainly widely supported.
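The difference between HTML's optional end tags and XML's rule is easy to see with a strict parser. Here is a minimal Python sketch (the function name is hypothetical) that runs markup through the standard library's XML parser. That parser rejects any element that is not explicitly closed, whereas a browser's HTML parser would accept all of these fragments.

```python
import xml.etree.ElementTree as ET


def is_well_formed_xml(markup):
    """Return True if markup parses as XML, i.e. every element is explicitly closed."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True


print(is_well_formed_xml("<html><body><p>Hi</p></body></html>"))  # True
print(is_well_formed_xml("<html><body><p>Hi</body></html>"))      # False: </p> omitted
```

The XML empty-element form (<br/>) also counts as explicitly closed, which is why XHTML pages use it.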

Why use quote tags when quotation marks will do?

I'm new to coding and have a trivial question: Why is it necessary to use the quote tag when it's easier to just write out the quotation marks in the text?

It's All About Semantics
The purpose of HTML is to allow you to add semantic information to the resource. In other words, when you surround a quote with quote tags, you are telling the program that will use this resource what the content means. Programs aren't always just browsers that render the HTML into an image for us to view; they might be screen readers for the vision impaired, or a program that reads information from a web page and inserts data into a database (such as a search engine's web crawler).
Why Semantics?
A similar question to yours would be, "why use a heading tag when I could use a tag with a custom style to make the font larger and bold?"
The reason is that by marking text with a heading tag (h1, h2, etc.) you are telling the program reading the HTML document that the content has special meaning. The program can then do things with the document besides simply displaying it to the user; if the HTML document has heading tags in place, the program could automatically create a table of contents by simply listing out the contents of the heading tags (similar to how a Wikipedia article can automatically create a table of contents at the top of the page).
So, everything starts with adding semantic information. As others have pointed out, you can style the content of a quote tag, whereas you cannot style content within two quote characters. This is a by-product of adding semantics, however, and not necessarily the end goal. Of course, you could have styled the quotes by surrounding them with <div class="quote">..</div>. However, by doing this you lose the ability to have the browser help you render the quote as you like (see the CSS quotes property), or even to have the browser render the quotation marks using the default quotation marks of the user's locale.
Even after this, there is more to the quote element than styling. For example, the quote element lets you attach additional information, such as the address of the source, with a cite attribute.
How could semantic information be used for quotes?
For example, let's say I create an HTML page called "http://example.com/MyThoughts". In that page, I have the following HTML...
<p>The W3C page <cite>About W3C</cite> says the W3C's mission is
<q cite="http://www.w3.org/Consortium/">To lead the World Wide Web to its
full potential by developing protocols and guidelines that ensure
long-term growth for the Web</q>. I disagree with this mission.</p>
Notice the cite attribute on the <q> element here (not to be confused with the <cite> element). If people added quote elements in this way, we could now create a web crawler that goes through the internet looking for pages that have a quote and citation. Then, using that data, we could create a database of documents and their citations. We could create a new site where the output might look like...
Resource: "About W3C"
Location: http://www.w3.org/Consortium/
Documents Citing This Resource:
    Resource: "My Thoughts"
    Location: http://example.com/MyThoughts
...or equally useful...
Resource: "My Thoughts"
Location: http://example.com/MyThoughts
Documents Cited in This Resource:
    Resource: "About W3C"
    Location: http://www.w3.org/Consortium/
As you can see, we have created an application that joins together data from other websites without the need for APIs or direct database access. That is the power of adding semantics to your documents.
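Here is a minimal sketch of that crawler's indexing step in Python, using only the standard library's html.parser. It assumes the pages have already been fetched into a dict that maps each page URL to its HTML, so no real crawling happens here. The names CiteCollector and build_citation_index are hypothetical. The collector also checks blockquote, del and ins, which accept cite as well.

```python
from collections import defaultdict
from html.parser import HTMLParser

# Elements that accept a cite attribute.
CITING_ELEMENTS = {"q", "blockquote", "del", "ins"}


class CiteCollector(HTMLParser):
    """Collects the cite URLs found on quoting/editing elements of one page."""

    def __init__(self):
        super().__init__()
        self.cited = []

    def handle_starttag(self, tag, attrs):
        if tag not in CITING_ELEMENTS:
            return
        for name, value in attrs:
            if name == "cite" and value:
                # The spec allows the URL to be surrounded by spaces.
                self.cited.append(value.strip())


def build_citation_index(pages):
    """Map each cited URL to the set of pages citing it.

    pages: dict of page URL -> already-fetched HTML (hypothetical input).
    """
    index = defaultdict(set)
    for page_url, html in pages.items():
        collector = CiteCollector()
        collector.feed(html)
        collector.close()
        for source in collector.cited:
            index[source].add(page_url)
    return dict(index)


pages = {
    "http://example.com/MyThoughts": (
        "<p>The W3C page <cite>About W3C</cite> says the W3C's mission is "
        '<q cite="http://www.w3.org/Consortium/">To lead the World Wide Web to its '
        "full potential by developing protocols and guidelines that ensure "
        "long-term growth for the Web</q>. I disagree with this mission.</p>"
    )
}
print(build_citation_index(pages))
```

The resulting dict is the "Documents Citing This Resource" view. Each page's collector.cited list gives the "Documents Cited in This Resource" view.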
Conclusion
When doing day-to-day development work, the interesting things that can be done by introducing semantics into your HTML documents are typically ignored in favor of ensuring that the site "looks good in a browser". Still, adding semantic information can help even there, for example by helping you style your quotes correctly for the specific user.
To be sure, if you used a quote character instead of a quote element Tim Berners-Lee is not going to come bust down your door. However, the built-in browser rendering of quotes specific to the user's locale is a nice carrot on the stick.

It certainly is not necessary, but there are a number of reasons why using <q> might be preferred:
The quote element is more semantically explicit than quotation marks
The quote element can be styled (as noted by Juhana)
The quote element allows for quotation characters other than '"' (for example, the guillemet «)
The quote element can be used with the cite attribute to explicitly link the quote and the source

What is the cite attribute for?

The cite attribute specifies the address of the source of the quoted text, I think, but who uses this information?
For example:
<blockquote cite="http://www.example.com/quote">
<p>“A quote”</p>
<footer>Person quoted</footer>
</blockquote>
The source of the quoted text isn't visible to the end-user in a normal browser, so who does use this information, and how?

First, blockquote isn't the only element on which you can use the cite attribute.
You can also use the cite attribute on the following elements:
<blockquote>
<del>
<ins>
<q>
Why would one use cite on the above elements?
To point to the source the content is taken from, or to the document that explains a change or deletion.
Here is what w3.org says:
User agents may allow users to follow such citation links, but they are primarily intended for private use (e.g., by server-side scripts collecting statistics about a site's edits), not for readers.
Now, the question: who uses it?
The cite attribute is used to identify the online source of the quotation in the form of a URI (for example, "http://sourcewebsite.doc/document.html").
The value of the cite attribute isn't rendered on screen (although this potentially useful metadata could be extracted and written back into the web page through the magic of DOM scripting).
As such, browser support for this attribute is marked as none. But because it has other potential uses (search engine indexing, retrieval via DOM scripting, and more), and native support may improve in future browser versions, you should use the cite attribute whenever you use the above elements.
So, currently no one uses it, but in future it may be used by user agents or search engines, so it's better to use it.
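As an illustration of the "server-side scripts collecting statistics about a site's edits" use from the w3.org quote, here is a Python sketch using html.parser that counts <ins> and <del> elements per cite URL. The class and function names and the /changelog#42 URL are hypothetical, and grouping uncited edits under None is just one possible choice.

```python
from collections import Counter
from html.parser import HTMLParser


class EditStats(HTMLParser):
    """Counts <ins>/<del> elements, grouped by the cite URL that explains them."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        if tag in ("ins", "del"):
            cite = dict(attrs).get("cite")
            # Edits without a cite attribute are grouped under None.
            source = cite.strip() if cite else None
            self.counts[(tag, source)] += 1


def edit_stats(html):
    parser = EditStats()
    parser.feed(html)
    parser.close()
    return parser.counts


page = (
    '<p>Price: <del cite="/changelog#42">$10</del> <ins cite="/changelog#42">$12</ins>'
    " <ins>(new)</ins></p>"
)
print(edit_stats(page))
```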

Both the <cite> tag and the cite attribute are for semantic purposes, which means that they are simply for giving a website more "meaning". For example, I could use a <div> tag for a quote, rather than using a <blockquote> tag, but this provides less meaning to the browser, and hence using <blockquote> is recommended for quotes.
The same is true of the <cite> tag and the cite attribute. As per the MDN definition for the cite attribute:
Use the cite attribute on a <blockquote> or <q> element to reference
an online resource for a source.
"so who does use this information, and how?" I believe that search engines (e.g. Google) would use this information to show potential links between documents. If you think about it, this is a major point. Take a Google search for "samsung" as an example.
Notice how Google shows a "Samsung Group" information box on the right. The people who work at Google don't write this information; rather, it is sourced from Wikipedia. This information becomes more relevant to the search "samsung" when it is also written on other websites, with the cite attribute linking it to Wikipedia (hence increasing Wikipedia's relevancy). This would explain why Wikipedia's information is used here, and not some primary school's website about Samsung phones.
The cite attribute simply provides more meaning to the website. Tim Berners-Lee has described the semantic web as a component of "Web 3.0"; in other words, many parts of the evolving HTML language exist simply to add more meaning to the web page, as a step closer to Web 3.0.
TL;DR: in simpler terms, the cite attribute just provides more meaning to the web page, and may be used by search engines for better linkage between pages.

W3C has this to say:
The value of this attribute is a URI that designates a source document or message. This attribute is intended to give information about the source from which the quotation was borrowed.
It's not visible and I can't think of anywhere it's used except perhaps by search engines.

It is meant to be used by machines that collect and arrange data, e.g. search engines, but it can be used by any machine. It is meant to make web pages more systematic for machines to read, since they cannot tell from context alone which part of the text represents a citation or a quote.
You can look up Semantic Web for more information.
http://en.wikipedia.org/wiki/Semantic_Web

Yes, the source of the quotation isn't visible to the end user. So it's just a reference to the source.
Definition from WHATWG.ORG:
Content inside a q element must be quoted from another source, whose address, if it has one, may be cited in the cite attribute. The source may be fictional, as when quoting characters in a novel or screenplay. If the cite attribute is present, it must be a valid URL potentially surrounded by spaces.

Quoted from W3Schools:
The cite attribute is not supported by any of the major browsers.
However, search engines may use it to get more information about the quotation.
http://www.w3schools.com/tags/att_q_cite.asp

It's just another chunk of metadata that can be used by server-side scripts to collect statistics, or by front-end developers to add functionality (they can choose to print the source, allow access to the original source, etc.).
It's just good practice to have the original source written down somewhere, although it is not actually very useful to the end user.

Author HTML for MS Word

My objective is to generate HTML markup that targets MS Word. So far I've found that if all the styles are inline on each element, the document renders properly when opened in Word. However, it is a lengthy task.
<h1 style="font-family:Arial">Inventory</h1>
This is how I try to achieve formatting. If I want to maintain a constant font across the document, I'd have to add font-family to every element in my HTML, as I've done above.
Later, I came across a CodeProject article (http://www.codeproject.com/KB/office/Wordyna.aspx). Now I am sort of convinced that you can declare the styles globally, but the styling language and formatting used there are not like CSS, and I think they're proprietary to MS Word document formatting. I am looking for any tutorials or articles on this styling.
PS: I am aware of OpenXML etc., but I feel it's too complex for me to implement at this point.
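Short of Word's own global styling, the repetitive inline approach described in the question can at least be automated. Here is a minimal Python sketch that assumes a single shared set of declarations (BASE_STYLE, hypothetical) that every generated element should carry. element() reproduces the <h1 style="font-family:Arial">Inventory</h1> line above and lets individual elements add or override declarations. It does not use Word's proprietary styling. It only generates the inline styles that, per the question, Word already renders properly.

```python
from html import escape

# Hypothetical shared "stylesheet": every element gets these declarations inline.
BASE_STYLE = {"font-family": "Arial"}


def style_attr(extra=None):
    """Merge the base style with per-element overrides into a style attribute value."""
    merged = {**BASE_STYLE, **(extra or {})}
    return "; ".join(f"{prop}:{value}" for prop, value in merged.items())


def element(tag, text, **extra_style):
    """Render <tag style="...">text</tag>; use _ for - in keywords, e.g. font_size."""
    extra = {name.replace("_", "-"): value for name, value in extra_style.items()}
    return f'<{tag} style="{escape(style_attr(extra))}">{escape(text)}</{tag}>'


print(element("h1", "Inventory"))  # <h1 style="font-family:Arial">Inventory</h1>
print(element("p", "Widgets: 12", font_size="10pt"))
```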

Word --should-- open valid HTML (read: not Microsoft's proprietary HTML-ish mess) without fail, as it's the rendering engine Outlook uses when you open an HTML email. You could go to the effort of building a document entirely with inline styles (read: the only best practice for Microsoft), as we do for HTML emails, but I suspect there are several different ways to skin this cat.
Personally, if I were trying to get a rich-text formatted document from HTML into Word, I'd use a tool such as PHPDocX to build a proper Word document natively; then, if I really wanted Word HTML, I could simply save it as HTML from Word. I've had to do something similar with Excel: it will accept CSV, but the outcome is always better with XLSX, and there's a similar plugin for easily authoring a proper XLSX document.
If that's too difficult a route (and it's not that bad, trust me), then I'd stick to formatting that follows HTML email rules. Simple guides are all over the web. And since Outlook 2007 and later use Word's HTML rendering engine, one could deduce that Word has the same limitations as Outlook.

HTML: Include, or exclude, optional closing tags?

Some HTML¹ closing tags are optional, i.e.:
</HTML>
</HEAD>
</BODY>
</P>
</DT>
</DD>
</LI>
</OPTION>
</THEAD>
</TH>
</TBODY>
</TR>
</TD>
</TFOOT>
</COLGROUP>
Note: Not to be confused with closing tags that are forbidden to be included, i.e.:
</IMG>
</INPUT>
</BR>
</HR>
</FRAME>
</AREA>
</BASE>
</BASEFONT>
</COL>
</ISINDEX>
</LINK>
</META>
</PARAM>
Note: XHTML is different from HTML. XHTML is a form of XML, which requires every element to be closed. A closing tag can be forbidden in HTML, yet mandatory in XHTML.
Are the optional closing tags
ideally included, but we'll accept them if you forgot them, or
ideally not included, but we'll accept them if you put them in
In other words, should I include them, or should I not include them?
The HTML 4.01 spec talks about closing element tags being optional, but doesn't say if it's preferable to include them, or preferable to not include them.
On the other hand, a random article on DevGuru says:
The ending tag is optional. However, it is recommended that it be included.
The reason I ask is because you just know it's optional for compatibility reasons; and they would have made them (mandatory | forbidden) if they could have.
Put it another way: What did HTML 1, 2, 3 do with regards to these, now optional, closing tags. What does HTML 5 do? And what should I do?
Note
Some elements in HTML are forbidden from having closing tags. You may disagree with that, but that is the specification, and it's not up for debate. I'm asking about optional closing tags, and what the intention was.
Footnotes
¹ HTML 4.01

There are cases where explicit tags help, but sometimes it's needless pedantry.
Note that the HTML spec clearly specifies when it's valid to omit tags, so it's not always an error.
For example you never need </body></html>. Nobody ever remembers to put <tbody> explicitly (to the point that XHTML made exceptions for it).
You don't need </head><body> unless you have DOM-manipulating scripts that actually search <head> (then it's better to close it explicitly, because rules for implied end of <head> could surprise you).
Nested lists are actually better off without </li>, because then it's harder to create an erroneous ul > ul tree.
Valid:
<ul>
  <li>item
    <ul>
      <li>item
    </ul>
</ul>
Invalid:
<ul>
  <li>item</li>
  <ul>
    <li>item</li>
  </ul>
</ul>
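To make the ul > ul point concrete, here is a rough Python checker built on html.parser. The names are hypothetical, and it is not a full HTML5 tree builder. It tracks only ul, ol and li, and models just one implied end tag: a new <li> closes the previous open <li>. That is enough to tell the two examples above apart.

```python
from html.parser import HTMLParser


class NestedListChecker(HTMLParser):
    """Counts lists whose direct parent is another list (the ul > ul mistake).

    Only ul, ol and li are tracked; the only implied end tag modelled is
    that a new <li> closes the previous open <li>.
    """

    LISTS = {"ul", "ol"}
    TRACKED = LISTS | {"li"}

    def __init__(self):
        super().__init__()
        self.stack = []
        self.errors = 0

    def handle_starttag(self, tag, attrs):
        if tag not in self.TRACKED:
            return
        if tag == "li" and self.stack and self.stack[-1] == "li":
            self.stack.pop()  # <li> implies </li> for the previous item
        if tag in self.LISTS and self.stack and self.stack[-1] in self.LISTS:
            self.errors += 1  # a list directly inside a list, outside any <li>
        self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag not in self.TRACKED or tag not in self.stack:
            return  # ignore untracked or stray end tags
        # Pop up to and including the matching element, closing implied <li>s.
        while self.stack.pop() != tag:
            pass


def has_list_in_list(html):
    checker = NestedListChecker()
    checker.feed(html)
    checker.close()
    return checker.errors > 0


valid = "<ul>\n<li>item\n<ul>\n<li>item\n</ul>\n</ul>"
invalid = "<ul>\n<li>item</li>\n<ul>\n<li>item</li>\n</ul>\n</ul>"
print(has_list_in_list(valid), has_list_in_list(invalid))  # False True
```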
And keep in mind that end tags are implied whether you try to close all elements or not. Putting end tags won't automatically make parsing more robust:
<p>foo <p>bar</p> baz</p>
will parse as:
<p>foo </p><p>bar</p> baz<p></p>
(the stray final </p> even produces an extra, empty paragraph).
It can only help when you validate documents.

The optional ones are all elements where it should be semantically clear where they end, without needing the end tag.
E.g., each <li> implies a </li> if there isn't one right before it.
The forbidden end tags would all immediately follow their start tags, since those elements have no content, so it would be kind of redundant to have to type <img src="blah" alt="blah"></img> every time.
I almost always use the optional tags (unless I have a very good reason not to) because they make for more readable and maintainable code.
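Since the forbidden end tags carry no information, a linter-style check for them is simple. Here is a minimal Python sketch using html.parser. The set is the HTML 4.01 list from the question, and the names are hypothetical. html.parser's default handle_startendtag forwards <br/> to handle_endtag, so the sketch overrides it to avoid flagging the self-closing syntax.

```python
from html.parser import HTMLParser

# Elements whose end tags are forbidden in HTML 4.01 (the list in the question).
FORBIDDEN_END = {
    "img", "input", "br", "hr", "frame", "area", "base",
    "basefont", "col", "isindex", "link", "meta", "param",
}


class ForbiddenEndTagFinder(HTMLParser):
    def __init__(self):
        super().__init__()
        self.found = []

    def handle_startendtag(self, tag, attrs):
        # The base class would call handle_endtag for <br/>; self-closing syntax is fine.
        pass

    def handle_endtag(self, tag):
        if tag in FORBIDDEN_END:
            self.found.append(tag)


def forbidden_end_tags(html):
    finder = ForbiddenEndTagFinder()
    finder.feed(html)
    finder.close()
    return finder.found


print(forbidden_end_tags('<img src="blah" alt="blah"></img>'))  # ['img']
```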

I am adding some links here to help you with the history of HTML, for you to understand the various contradictions. This is not the answer to your question, but you will know more after reading these various digests.
How Did We Get Here? – Dive Into HTML5
The History of the Web
Brief History of HTML
HTML’s History – HTML WG Wiki
Some excerpts from Dive Into HTML5:
[T]he fact that “broken” HTML markup still worked in web browsers led authors to create broken HTML pages. A lot of broken pages. By some estimates, over 99% of HTML pages on the web today have at least one error in them. But because these errors don’t cause browsers to display visible error messages, nobody ever fixes them.
The W3C saw this as a fundamental problem with the web, and they set out to correct it. XML, published in 1997, broke from the tradition of forgiving clients and mandated that all programs that consumed XML must treat so-called “well-formedness” errors as fatal. This concept of failing on the first error became known as “draconian error handling,” after the Greek leader Draco who instituted the death penalty for relatively minor infractions of his laws. When the W3C reformulated HTML as an XML vocabulary, they mandated that all documents served with the new application/xhtml+xml MIME type would be subject to draconian error handling. If there was even a single well-formedness error in your XHTML page […] web browsers would have no choice but to stop processing and display an error message to the end user.
This idea was not universally popular. With an estimated error rate of 99% on existing pages, the ever-present possibility of displaying errors to the end user, and the dearth of new features in XHTML 1.0 and 1.1 to justify the cost, web authors basically ignored application/xhtml+xml. But that doesn’t mean they ignored XHTML altogether. Oh, most definitely not. Appendix C of the XHTML 1.0 specification gave the web authors of the world a loophole: “Use something that looks kind of like XHTML syntax, but keep serving it with the text/html MIME type.” And that’s exactly what thousands of web developers did: they “upgraded” to XHTML syntax but kept serving it with a text/html MIME type.
Even today, millions of web pages claim to be XHTML. They start with the XHTML doctype on the first line, use lowercase tag names, use quotes around attribute values, and add a trailing slash after empty elements like <br /> and <hr />. But only a tiny fraction of these pages are served with the application/xhtml+xml MIME type that would trigger XML’s draconian error handling. Any page served with a MIME type of text/html — regardless of doctype, syntax, or coding style — will be parsed using a “forgiving” HTML parser, silently ignoring any markup errors, and never alerting end users (or anyone else) even if the pages are technically broken.
XHTML 1.0 included this loophole, but XHTML 1.1 closed it, and the never-finalized XHTML 2.0 continued the tradition of requiring draconian error handling. And that’s why there are billions of pages that claim to be XHTML 1.0, and only a handful that claim to be XHTML 1.1 (or XHTML 2.0). So are you really using XHTML? Check your MIME type. (Actually, if you don’t know what MIME type you’re using, I can pretty much guarantee that you’re still using text/html.) Unless you’re serving your pages with a MIME type of application/xhtml+xml, your so-called “XHTML” is XML in name only.
[T]he people who had proposed evolving HTML and HTML forms were faced with two choices: give up, or continue their work outside of the W3C. They chose the latter, registered the whatwg.org domain, and in June 2004, the WHAT Working Group was born.
[T]he WHAT working group was quietly working on a few other things, too. One of them was a specification, initially dubbed Web Forms 2.0, which added new types of controls to HTML forms. (You’ll learn more about web forms in A Form of Madness.) Another was a draft specification called “Web Applications 1.0,” which included major new features like a direct-mode drawing canvas and native support for audio and video without plugins.
In October 2009, the W3C shut down the XHTML 2 Working Group and issued this statement to explain their decision:
When W3C announced the HTML and XHTML 2 Working Groups in March 2007, we indicated that we would continue to monitor the market for XHTML 2. W3C recognizes the importance of a clear signal to the community about the future of HTML.
While we recognize the value of the XHTML 2 Working Group’s contributions over the years, after discussion with the participants, W3C management has decided to allow the Working Group’s charter to expire at the end of 2009 and not to renew it.
The ones that win are the ones that ship.

The reason I ask is because you just know it's optional for compatibility reasons; and they would have made them (mandatory | forbidden) if they could have.
That's an interesting inference. My reading of it is that just about any time a tag could be reliably inferred, the tag is optional. The design suggests that the intention was to make it quick and easy to write.
What did HTML 1, 2, 3 do with regards to these, now optional, closing tags.
The DTD for HTML 2 is embedded in the RFC which, along with the original HTML DTD, has optional start and end tags all over the place.
HTML 3 was abandoned (thanks to the browser wars) and replaced with HTML 3.2 (which was designed to describe the then current state of the web).
What does HTML 5 do?
HTML 5 was geared towards "paving the cowpaths" from the outset.
And what should I do?
Ah, now that is subjective and argumentative :)
Some people think that explicit tags are better for readability and maintainability by virtue of being in front of the reader's eyes.
Some people think that inferred tags are better for readability and maintainability by virtue of not cluttering up the editor.

What does HTML 5 do?
The answer to this question is in the W3C Working Draft:
http://www.w3.org/TR/html5/syntax.html#syntax-tag-omission
And what should I do?
It's a matter of style. I try to never omit end tags because it helps me to be rigorous and not omit tags that are necessary.

If it is superfluous, leave it out.
If it serves a purpose (even a seemingly trivial purpose, such as appeasing your IDE or appeasing your eyes), leave it in.
It's rare in a well-defined spec to see optional items that do not affect behavior, with the exception of "comments", of course. But the HTML spec is less of a design spec and more of a record of the state of current major implementations. So when an item is optional in HTML and seems to serve no purpose, we may guess that its optional nature merely documents a quirk in a specific browser.
Looking at the HTML5 spec section linked above, you see that the optional tags are strangely linked to the presence of comments! That should tell you that the authors are not wearing design hats. They are instead playing the game of "document the quirks" in major implementations. So we can't take the spec too seriously in this respect.
So, the solution is: Don't sweat it. Move on to something that actually matters. :)

I think the best answer is to include closing tags for readability or error detection. However, if you have lots of generated HTML (say, tables of data), you could save significant bandwidth by omitting optional tags.
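To put a number on the bandwidth argument, here is a small Python sketch that renders the same generated table with and without the optional </td> and </tr> end tags. The function name and the data are made up for illustration. The saving is a fixed 5 bytes per cell plus 5 per row, before any HTTP compression is applied.

```python
from html import escape


def render_table(rows, close_optional=True):
    """Render rows of cell values as a compact, single-line HTML table.

    With close_optional=False the optional </td> and </tr> end tags are left
    out; the HTML parser infers them, so the resulting table is equivalent.
    """
    td_end = "</td>" if close_optional else ""
    tr_end = "</tr>" if close_optional else ""
    parts = ["<table>"]
    for row in rows:
        parts.append("<tr>")
        for cell in row:
            parts.append(f"<td>{escape(str(cell))}{td_end}")
        parts.append(tr_end)
    parts.append("</table>")
    return "".join(parts)


rows = [[i, i * i] for i in range(100)]
full = render_table(rows)
short = render_table(rows, close_optional=False)
print(len(full) - len(short))  # 1500 (5 bytes per cell plus 5 per row)
```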

My recommendation is that you omit most optional close tags, and all optional attributes that you can get away with. Many IDEs will complain, so you may not be able to get away with omitting some of these, but it is generally better for smaller file size and less clutter. If you have code generators, definitely omit end tags there, because you can get some good size reduction from it. Usually it doesn't really matter one way or the other.
But when it does matter, act on it. On some recent work of mine I was able to reduce the size of my rendered HTML from 1.5 MB to 800 KB by eliminating most of the generated end tags and the redundant value attributes on <option> tags, where the text of the element was the same as the value. I have about 200 tags. I could implement this some other way entirely, but that would be more work ($$$), so this lets me easily make the page more responsive.
Just out of curiosity, I found that removing the quotes around attributes that didn't need them would save 20 KB, but my IDE (Visual Studio) doesn't like it. I was also surprised to find that the really long IDs ASP.NET generates account for 20% of my file.
The idea that we could ever get any relevant fraction of HTML strictly valid was misguided in the first place, so do whatever works best for you and your customers. Most tools I have ever seen or used say they generate XHTML, but they don't really get it 100% right, and there isn't any benefit to strict adherence anyway.

Personally, I'm a fan of XHTML and, like ghoppe, "I try to never omit end tags because it helps me to be rigorous and not omit tags that are necessary."
but
If you're deliberately using HTML 4.n, one can't argue that including them makes it easier to consume the document, as the notion of well-formedness as opposed to validity is an XML concept, and you lose that benefit when you forbid certain close tags. So the only issue becomes validity... and if it's still valid without them... you might as well save the bandwidth, no?

Using end tags makes dealing with fragments easier because their behaviour does not depend on sibling elements. This reason alone should be compelling enough. Does anyone deal with monolithic HTML documents anymore?

In some curly-bracket languages like C#, you can omit the curly braces around the body of an if statement if that body is a single statement. For example...
if ([condition])
    [code]
but you can't do this...
if ([condition])
    [code]
    [code]
The third line won't be part of the if statement, even though the indentation suggests it is. It hurts readability, and bugs can easily be introduced and be difficult to find.
For the same reasons, I close all tags. Tags like the img tag do still need to be closed, just not with a separate closing tag.

Do whatever you feel makes the code more readable and maintainable.
Personally I would always be inclined to close <td> and <tr>, but I would never bother with <li>.

If you were writing an HTML parser, would it be easier to parse HTML that included optional closing tags, or HTML that doesn't? I think the optional closing tags being present would make it easier, as I wouldn't have to infer where the closing tag should be.
For that reason, I always include the optional closing tags - on the theory that my page might render faster, as I'm creating less work for the browser's HTML parser.

For elements whose closing tags are forbidden, use syntax like <img />, where the /> closes the tag; this is accepted in XML.
