Is semantic markup too open-ended? [closed] - html

Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 9 years ago.
Improve this question
I am taking a peek at Dive Into HTML5. It seems nice and interesting, but I am puzzled.
In the 1990s, at the time when Netscape was the browser and HTML was HTML2 or HTML3, there were a lot of tags: address, cite, code... Most of them are unused as of today, probably even obsolete.
HTML5 introduces tags to express "semantic meaning" to the tag itself. This is all fun and games, but I see something very strange in this approach. Technically, the semantics can be very open ended. HTML5 has tags for article, time, navigation bars, footer. Why shouldn't it contain tags for post icon, author's place, name and surname, or whatever else you want to assign specific semantics to (I'm confident <rant> and <nsfw> would be very important tags): ? I thought XML was the strategy to assign semantics to stuff. Nothing forbids you to put an XML chunk under a XHTML div element, and assign a stylesheet to it so to style it properly, or to delegate to the proper viewer the handling of that namespace (for example, when handling RSS or SVG).
In conclusion, I don't understand the reason behind this extensions focused towards semantics, when it's clear that semantic is a very broad topic, which is guaranteed to require a potentially infinite amount of semantic tags. Since I am pretty sure there are clever people at W3C, I think I'm wrong, but I'd like to know why.

Why are tags for article, time, navigation bars, footer useful?
Because they facilitate parsing for text processing tools like Google.
It's nothing about semantics (at least in 'broad' meaning). Instead they just say: here is the body of page (most important text part) and there is the navigation bar full of links. With such an approach you can easily extract just what you need.

I too hate the way that W3C is going with their specs. There are many things that I don't like, and this "semantics" fad is one of them. (Others include taking forever to complete their specs and leaving too many important details for the browsers to implement as they choose)
Most of all I don't like it because it makes my work as a web developer more difficult. I often have to make a choice whether to make the webpage "semantically correct" or "visually/aesthetically pleasing". The latter wins of course, because that is what the users want, but as a result validations start failing and the whole thing gets quite non-semantic (tables for layout and other things).
Another issue at which I frown is that they have officialy declared that the "class" attribute is for semantics, but then they used it for visual presentation selectors in CSS.
Bottom line - DON'T MIX SEMANTICS AND VISUAL REPRESENTATION. If you use some mechanism for describing semantics (like tag names, attribute values, or what not else), then don't use it for funcional/visual purposes and vice versa.
If I would design HTML, I would simply add an attribute "semantic" which could (like the "class" attribute) be added to any tag. Then there would be a number of predefined values like all those headers/footers/articles/quotes/etc.
Tags would define functionality. Basically you could reduce HTML tags to just a handful, like "div", "table/tr/td", "a", "img", "form", "input" and "select". I probably missed a few but this is the bulk. Visual styling would be accomplished through CSS.
This way the three areas - semantics, visual representation, and functionality - would be completely independent and wouldn't clash in real life solutions.
Of course, I don't think W3C is interested in practical solutions...

There is already a lot of semantics in HTML markup in the forms of classes and IDs, of which there is a (near) infinite amount of possibilities of, And everyone has their own way of handling these semantics. One of the goals of HTML5 is to try to bring some structure to this. you will still be able to extend the semantics of tags with classes and ids. It will also most likely make things easier for search engines.

Look at it from the angle of trying to make statements either about the page, or about objects referenced from the page. If you see a <footer> tag, all you can say is "stuff in here is a footer" and pass it by. As such, adding custom tags is not as generic a solution as adding attributes and allowing people to use their own choice of URIs to specify predicates and optionally values - RDFa wins hands-down because you can express any triple-statement you like from RDF in a page, one way or another.

I just want to address one part of your question. You say:
In the nineties, at the time when
Netscape was the browser and html was
HTML2 or HTML3, there were a lot of
tags: address, cite, code... Most of
them are unused as of today, probably
even obsolete.
There are a great deal of tags to choose from in html, but the lack of usage does not imply that they are obsolete. In particular the header tags <h1>, etc, and <ul>, <ol> are used to join items into lists in a way I consider semantic. Many people may not use tags semantically, but the effort to create microformats is an ongoing continuation of the idea you consider an artifact of the 1990s. Efforts to make the semantic web be a winner keeps going, despite full-text search and link analysis (in the form of Google) being the winner as far as how to find and understand the web.
It would be great to see an updated version of Google's Web Stats which show "html as she is spoke." But you are right that many tags are underused.
Whether html5 will be successful is an open and interesting question, but the tags you describe as obsolete didn't go anywhere, they were there in HTML 4.01 and xhtml. HTML5 seems to be an effort to solidify what is useful in tags. In the end if html5 gets support in browsers and makes the job of web developers easier, it will succeed. xhtml2 failed because it roundly failed to gain adoption in browsers and did nothing to make the job of web page makers easier. The forces working on html5 seem keenly aware of the failure of xhtml2, and I think are avoiding having html5 suffer a similar fate.

"Why shouldn't it contain tags for post icon, author's place, name and surname, or whatever else you want to assign specific semantics to (I'm confident and would be very important tags): ?"
You use <dialog> to describe conversations or comments. Rant and NSFW are subjective terms therefore it makes sense not to use them.
From what I understand a bunch of experienced web developers did research and looked for what most websites have in common in html. They noticed that most websitse have id="header", id="footer", id="section" and id="nav" tags so they decided that we need HTML tags to replace those id's. So in other words, don't expect them to give you a HUGE amount of HTML vocabulary. Just keep it simple as possible as you can while addressing the MOST common needed HTML tags.
NAV tag is VERY important for providing accessibility as well. You want them to know where the navigation is rather than to force them to find whether links are for navigation or not.

I disagree with adding extra tags. If detailed vocabulary were actually import then there could be a different tag name for every word in the dictionary. Additional tags names are not helpful as they may communicate additional meaning to humans, but do nothing to facilitate machine parsing of the language. This is why I don't like the "semantic" tags for HTML5 as I believe this to be slippery slope to providing a vocabulary too complex while only providing a weak solution to a problem not fully addressed.
In my opinion markup language structure data as much as describe it in a tree diagram form. Through parsing of the structure and proper use of semantic conventions, such as RDFa, context can be leveraged to provide specific meaning to otherwise generic tag names. In such as case excessive vocabulary need not exist and structurally redundant tag names, such as footer and aside, could be eliminated. The final objective is to make content faster and more accurate to interpret by both humans and machines simultaneously while using as little code as possible to achieve that result. How that solution is lesser important, except to HTML5.

I thought XML was the strategy to assign semantics to stuff.
As far as I know, no it wasn’t. XML allows new languages to be defined which are all parsed in the same way, because they all use the XML syntax.
It doesn’t, of itself, provide any way to add meaning (“semantic” just means “meaningful”) to those languages. And until computers get artificial intelligence, they don’t actually understand meaning, so meaning is just what is agreed between human beings. HTML is the most commonly-used language with agreed meaning of its tags.
As HTML is so common, it’s helpful to add a few meaningful tags to it that are quite general in their application. The new HTML5 tags are aimed at that. The HTML5 spec’s authors could indeed carry on down this route, creating tags for every specific bit of meaning possible, but as they’re not robots, they probably won’t.
<section> is useful, and general enough to be meaningfully applicable in lots of documents. <author-last-name> isn’t. Distinguishing between the two is a judgment call, which is why humans, and not computers, write the spec.
For custom semantics that are too specific to be added to HTML as tags, HTML5 defines microdata.

I've been reading Andy Clark's book Transcending CSS (page 33).
...,it is now widely accepted that presentational names such as header, left, or red that describe an element's look or position are poor choices.
After reading these lines I asked myself: hey, aren't there elements in HTML5 spec such as header, footer?? Why is footer more semantic ? Andy in his book advocates to use site-info for the ID of the footer div and this makes more sense IMHO. Footer is a presentational name (describes the element's position).

In a word, AJAX. The new tags are meant to support what real-world developers are doing by replacing some of the <div class="sidebar-wrap"><div class="styling-hook"><div><ul class="nav"> type of divitis many websites suffer from. The only <div> left in the HTML5 is the styling hook.
The semantics that get promoted to tags from classes are those that developers have freely adopted en-masse as best practices, given an extended xhtml/css adoption period. Check out the WHATWG developer's edition of the spec's sections pagehere. The document itself is a pleasure, but I won't spoil it if you haven't seen it yet.
One of the less obvious reasons for some decisions made by the W3C is the importance of Webkit. If you look, you can see that they were better than some at taking the current work of the HTML5 Working Group and implementing ideas. They have historically been way out ahead in compliance (see here). The W3C placed a high priority on their (i.e. Android, iPhone, the Googlebot, Chrome, Safari, Dreamweaver, etc.,). Google, framework users, Wordpress/Moveable Type/Joomla! type users and others wanted self contained building blocks, so this is the style we get.
Facebook is modular. Responsive design's grids are modular. Wordpress is modular. Ajax works best with modular page structures. Widgets are modules. Plug-ins are modules. It would seem that we should be trying to figure out stuff like how to apply these tags to make it easier to hook the appropriate elements and activate them in our document/application/info-network hybrid Web 2.0.
In closing, HTML5 is meant to be written as xml (again, see the spec) in order to ensure that tools and machines making ajax requests for a portion of a document will get a well-formed useful response. How awesome in combination with things like media queries for devices like feed readers, braille printers, annotators, etc.,. I see a (near)future where anything with good semantic content is it's own newsfeed automagically! This only happens if developers adopt and write compliant documents.

Related

Practically speaking, why semantic markup?

Does Google really care if I use an <h5> as a <b> tag?
What are some real-world, practical reasons I should care about semantic markup?
A few examples
Many visually impaired people rely on speech browsers to read pages back to them. These programs cannot interpret pages very well unless they are clearly explained. In other words semantic code aids accessibility
Search engines need to understand what your content is about in order to rank you properly on search engines.
Semantic code tends to improve your placement on search engines, as it is easier for the "search engine spiders" to understand.
However, semantic code has other benefits too:
As you can see from the example above, semantic code is shorter and so downloads faster.
Semantic code makes site updates easier because you can apply design style to headings across an entire site instead of on a per page basis.
Semantic code is easier for people to understand too so if a new web designer picks up the code they can learn it much faster.
Because semantic code does not contain design elements it is possible to change the look and feel of your site without recoding all of the HTML.
Once again, because design is held separately from your content, semantic code allows anybody to add or edit pages without having to have a good eye for design.
You simply describe the content and the cascading style sheet defines what that content looks like.
Source: boagworld
Semantics and the Web
Semantics are the implied meaning of a subject, like a word or sentence. It aids how humans (and these days, machines) interpret subject matter. On the web, HTML serves both humans and machines, suggesting the purpose of the content enclosed within an HTML tag. Since the dawn of HTML, elements have been revised and adapted based on actual usage on the web, ideally so that authors can navigate markup with ease and create carefully structured documents, and so that machines can infer the context of the wonderful collection of data we humans can read.
Until — and perhaps even after — machines can understand language and all its nuances at the same level as a human, we need HTML to help machines understand what we mean. A computer doesn’t care if you had pizza for dinner. It likely just wants to know what on earth it should do with that information.
HTML semantics are a nuanced subject, widely debated and easily open to interpretation. Not everyone agrees on the same thing right away, and this is where problems arise.
Allow me to paint a picture:
You are busy creating a website.
You have a thought, “Oh, now I have to add an element.”
Then another thought, “I feel so guilty adding a div. Div-itis is terrible, I hear.”
Then, “I should use something else. The aside element might be appropriate.”
Three searches and five articles later, you’re fairly confident that aside is not semantically correct.
You decide on article, because at least it’s not a div.
You’ve wasted 40 minutes, with no tangible benefit to show for it.
— Divya Manian
This generated a storm of responses, both positive and negative. In Pursuing Semantic Value By Jeremy Keith argued that being semantically correct is not fruitless, and he even gave an example of how <section> can be used to adjust a document’s outline. He concludes:
But if you can get past the blustery tone and get to the kernel of the article, it’s a fairly straightforward message: don’t get too hung up on semantics to the detriment of other important facets of web development.
— Jeremy Keith
Naming Things
Of all the possible new element names in HTML5, the spec is pretty set on things like <nav> and <footer>. If you’ve used either of those as a class or id in your own markup, it’s no coincidence. Studies of the web from the likes of Google and Opera (amongst others) looked at which names people were using to hint at the purpose of a part of their HTML documents. The authors of the HTML5 spec recognised that developers needed more semantic elements and looked at what classes and IDs were already being used to convey such meaning.
Of course, it isn’t possible to use all of the names researched, and of the millions of words in the English language that could have been used, it’s better to focus on a small subset that meets the demands of the web. Yet some people feel that the spec isn’t yet doing so.
Source: html5doctor (This goes on for quite a while so I've only put a few examples here.)
Hope this helps!

Why does CSS work with fake elements?

In my class, I was playing around and found out that CSS works with made-up elements.
Example:
imsocool {
color:blue;
}
<imsocool>HELLO</imsocool>
When my professor first saw me using this, he was a bit surprised that made-up elements worked and recommended I simply change all of my made up elements to paragraphs with ID's.
Why doesn't my professor want me to use made-up elements? They work effectively.
Also, why didn't he know that made-up elements exist and work with CSS. Are they uncommon?
Why does CSS work with fake elements?
(Most) browsers are designed to be (to some degree) forward compatible with future additions to HTML. Unrecognised elements are parsed into the DOM, but have no semantics or specialised default rendering associated with them.
When a new element is added to the specification, sometimes CSS, JavaScript and ARIA can be used to provide the same functionality in older browsers (and the elements have to appear in the DOM for those languages to be able to manipulate them to add that functionality).
(There is a specification for custom elements, but they have specific naming requirements and require registering using JavaScript.)
Why doesn't my professor want me to use made-up elements?
They are not allowed by the HTML specification
They might conflict with future standard elements with the same name
There is probably an existing HTML element that is better suited to the task
Also; why didn't he know that made-up elements existed and worked with CSS. Are they uncommon?
Yes. People don't use them because they have the above problems.
TL;DR
Custom tags are invalid in HTML. This may lead to rendering issues.
Makes future development more difficult since code is not portable.
Valid HTML offers a lot of benefits such as SEO, speed, and professionalism.
Long Answer
There are some arguments that code with custom tags is more usable.
However, it leads to invalid HTML. Which is not good for your site.
The Point of Valid CSS/HTML | StackOverflow
Google prefers it so it is good for SEO.
It makes your web page more likely to work in browsers you haven't tested.
It makes you look more professional (to some developers at least)
Compliant browsers can render [valid HTML faster]
It points out a bunch of obscure bugs you've probably missed that affect things you probably haven't tested e.g. the codepage or language set of the page.
Why Validate | W3C
Validation as a debugging tool
Validation as a future-proof quality check
Validation eases maintenance
Validation helps teach good practices
Validation is a sign of professionalism
YADA (yet another (different) answer)
Edit: Please see the comment from BoltClock below regarding type vs tag vs element. I usually don't worry about semantics but his comment is very appropriate and informative.
Although there are already a bunch of good replies, you indicated that your professor prompted you to post this question so it appears you are (formally) in school. I thought I would expound a little bit more in depth about not only CSS but also the mechanics of web browsers. According to Wikipedia, "CSS is a style sheet language used for describing ... a document written in a markup language." (I added the emphasis on "a") Notice that it doesn't say "written in HTML" much less a specific version of HTML. CSS can be used on HTML, XHTML, XML, SGML, XAML, etc. Of course, you need something that will render each of these document types that will also apply styling. By definition, CSS does not know / understand / care about specific markup language tags. So, the tags may be "invalid" as far as HTML is concerned, but there is no concept of a "valid" tag/element/type in CSS.
Modern visual browsers are not monolithic programs. They are an amalgam of different "engines" that have specific jobs to do. At a bare minimum I can think of 3 engines, the rendering engine, the CSS engine, and the javascript engine/VM. Not sure if the parser is part of the rendering engine (or vice versa) or if it is a separate engine, but you get the idea.
Whether or not a visual browser (others have already addressed the fact that screen readers might have other challenges dealing with invalid tags) applies the formatting depends on whether the parser leaves the "invalid" tag in the document and then whether the rendering engine applies styles to that tag. Since it would make it more difficult to develop/maintain, CSS engines are not written to understand that "This is an HTML document so here are the list of valid tags / elements / types." CSS engines simply find tags / elements / types and then tell the rendering engine, "Here are the styles you should apply." Whether or not the rendering engine decides to actually apply the styles is up it.
Here is an easy way to think of the basic flow from engine to engine: parser -> CSS -> rendering. In reality it is much more convoluted but this is good enough for starters.
This answer is already too long so I will end there.
Unknown elements are treated as divs by modern browsers. That's why they work. This is part of the oncoming HTML5 standard that introduces a modular structure to which new elements can be added.
In older browsers (I think IE7-) you can apply a Javascript-trick after which they will work as well.
Here is a related question I found when looking for an example.
Here is a question about the Javascript fix. Turns out it is indeed IE7 that doesn't support these elements out of the box.
Also; why didn't he know that made-up tags existed and worked with CSS. Are they uncommon?
Yes, quite. But especially: they don't serve additional purpose. And they are new to html5. In earlier versions of HTML an unknown tag was invalid.
Also, teachers seem to have gaps in their knowledge, sometimes. This might be due to the fact that they need to teach students the basics about a given subject, and it doesn't really pay off to know all ins and outs and be really up to date.
I once got detention because a teacher thought I programmed a virus, just because I could make a computer play music using the play command in GWBasic. (True story, and yes, long ago). But whatever the reason, I think the advice not to use custome elements is a sound one.
Actually you can use custom elements. Here is the W3C spec on this subject:
http://w3c.github.io/webcomponents/spec/custom/
And here is a tutorial explaining how to use them:
http://www.html5rocks.com/en/tutorials/webcomponents/customelements/
As pointed out by #Quentin: this is a draft specification in the early days of development, and that it imposes restrictions on what the element names can be.
There are a few things about the other answers that are either just poorly phrased or perhaps a little incorrect.
FALSE(ish): Non-standard HTML elements are "not allowed", "illegal", or "invalid".
Not necessarily. They're "non-conforming". What's the difference? Something can "not conform" and still be "allowed". The W3C aren't going to send the HTML police to your home and haul you away.
The W3C left things this way for a reason. Conformance and specifications are defined by a community. If you happen to have a smaller community consuming HTML for more specific purposes and they all agree on some new Elements they need to make things easier, they can have what the W3C refers to as "other applicable specifications". (this is a gross over simplification, obviously, but you get the idea)
That said, strict validators will declare your non-standard elements to be "invalid". but that's because the validator's job is to ensure conformance to whatever spec it's validating for, not to ensure "legality" for the browser or for use.
FALSE(ish): Non-standard HTML elements will result in rendering issues
Possibly, but unlikely. (replace "will" with "might") The only way this should result in a rendering issue is if your custom element conflicts with another specification, such as a change to the HTML spec or another specification being honored within the same system (such as SVG, Math, or something custom).
In fact, the reason CSS can style non-standard tags is because the HTML specification clearly states that:
User agents must treat elements and attributes that they do not understand as semantically neutral; leaving them in the DOM (for DOM processors), and styling them according to CSS (for CSS processors), but not inferring any meaning from them
Note: if you want to use a custom tag, just remember a change to the HTML spec at a later time could blow your styling up, so be prepared. It's really unlikely that the W3C will implement the <imsocool> tag, however.
Non-standard tags and JavaScript (via the DOM)
The reason you can access and alter custom elements using JavaScript is because the specification even talks about how they should be handled in the DOM, which is the (really horrible) API that allows you to manipulate the elements on your page.
The HTMLUnknownElement interface must be used for HTML elements that are not defined by this specification (or other applicable specifications).
TL;DR: Conforming to the spec is done for purposes of communication and safety. Non-conformance is still allowed by everything but a validator, whose sole purpose is to enforce conformity, but whose use is optional.
For example:
var wee = document.createElement('wee');
console.log(wee.toString()); //[object HTMLUnknownElement]
(I'm sure this will draw flames, but there's my 2 cents)
According to the specs:
CSS
A type selector is the name of a document language element type written using the syntax of CSS qualified names
I thought this was called the element selector, but apparently it is actually the type selector. The spec goes on to talk about CSS qualified names which put no restriction on what the names actually are. That is to say that as long as the type selector matches CSS qualified name syntax it is technically correct CSS and will match the element in the document. There is no CSS-specific restriction on elements that do not exist in a particular spec -- HTML or otherwise.
HTML
There is no official restriction on including any tags in the document that you want. However, the documentation does say
Authors must not use elements, attributes, or attribute values for purposes other than their appropriate intended semantic purpose, as doing so prevents software from correctly processing the page.
And it later says
Authors must not use elements, attributes, or attribute values that are not permitted by this specification or other applicable specifications, as doing so makes it significantly harder for the language to be extended in the future.
I'm not sure specifically where or if the spec says that unkown elements are allowed, but it does talk about the HTMLUnknownElement interface for unrecognized elements. Some browsers may not even recognize elements that are in the current spec (IE8 comes to mind).
There is a draft for custom elements, though, but I doubt it is implemented anywhere yet.
This is possible with html5 but you need to take into consideration of older browsers.
If you do decide to use them then, make sure to COMMENT your html!! Some people may have some trouble figuring out what it is so a comment could save them a ton of time.
Something like this,
<!-- Custom tags in use, refer to their CSS for aid -->
When you make your own custom tag/elements the older browsers will have no clue what that is just like html5 elements like nav/section.
If you are interested in this concept then I recommend to do it the right way.
Getting started
Custom Elements allow web developers to define new types of HTML
elements. The spec is one of several new API primitives landing under
the Web Components umbrella, but it's quite possibly the most
important. Web Components don't exist without the features unlocked by
custom elements:
Define new HTML/DOM elements Create elements that extend from other
elements Logically bundle together custom functionality into a single
tag Extend the API of existing DOM elements
There is a lot you can do with it and it does make your script beautiful as this article likes to put it. Custom Elements defining new elements in HTML.
So lets recap,
Pros
Very elegant and easy to read.
It is nice to not see so many divs. :p
Allows a unique feel to the code
Cons
Older browser support is a strong thing to consider.
Other developers may have no clue what to do if they don't know about custom tags. (Explain to them or add comments to inform them)
Lastly one thing to take into consideration, but I am unsure, is block and inline elements. By using custom tags you are going to end up writing more css because of the custom tag won't have a default side to it.
The choice is entirely up to you and you should base it on what the project is asking for.
Update 1/2/2014
Here is a very helpful article I found and figured I would share, Custom Elements.
Learn the tech Why Custom Elements? Custom Elements let authors define
their own elements. Authors associate JavaScript code with custom tag
names, and then use those custom tag names as they would any standard
tag.
For example, after registering a special kind of button called
super-button, use the super button just like this:
Custom elements are still elements. We
can create, use, manipulate, and compose them just as easily as any
standard or today.
This seems like a very good library to use but I did notice it didn't pass Window's Build status. This is also in a pre-alpha I believe so I would keep an eye on this while it develops.
Why doesn't he want you to use them? They are not common nor part of the HTML5 standard.
Technically, they are not allowed. They are a hack.
I like them myself, though. You may be interested in XHTML5. It allows you to define your own tags and use them as part of the standard.
Also, as others have pointed out, they are invalid and thus not portable.
Why didn't he know that they exist? I don't know, except that they are not common. Possibly he was just not aware that you could.
Made-up tags are hardly ever used, because it's unlikely that they will work reliably in every current browser, and every future browser.
A browser has to parse the HTML code into elements that it knows, to made-up tags will be converted into something else to fit in the document object model (DOM). As the web standards doesn't cover how to handle everyting that is outside of the standards, web browsers tend to handle non-standars code in different ways.
Web development is tricky enough with a bunch of different browsers that have their own quirks, without adding another element of uncertainty. The best bet it to stick with things that are actually in the standards, that is what the browser vendors try to follow, so that has the best chance to actually work.
I think made-up tags are just potentially more confusing or unclear than p's with IDs (some block of text generally). We all know a p with an ID is a paragraph, but who knows what made-up tags are intended for? At least that's my thought. :) Therefore this is more of a style / clarity issue than one of functionality.
Others have made excellent points but its worth noting that if you look at a framework such as AngularJS, there is a very valid case for custom elements and attributes. These convey not only better semantic meaning to the xml, but they also can provide behavior, look and feel for the web page.
CSS is a style sheet language that can be used to present XML documents, not only (X)HTML documents. Your snippet with the made-up tags could be part of a legal XML document; it would be one if you enclose it in a single root element. Probably you already have a <html> ...</html> around it? Any current browser can display XML documents.
Of course it is not a very good XML document, it lacks a grammar and an XML declaration. If you use an HTML declaration header instead (and probably a server configuration that sends the correct mime type) it would instead be illegal HTML.
(X)HTML has advantages over plain XML as elements have a semantic meaning that is useful in the context of a web page presentation. Tools can work with this semantics, other developers know the meaning, it is less error prone and better to read.
But in other contexts it is better to use CSS with XML and/or XSLT to do the presentation. This is what you did. As this wasn't your task, you didn't know what you were doing, and HTML/CSS is the better way to go most of the time you should stick to it in your scenario.
You should add an (X)HTML header to your document so tools can give you meaningful error messages.
...I simply change all of my made up tags to paragraphs with ID's.
I actually take issue with his suggestion of how to do it properly.
A <p> tag is for paragraphs. I see people using it all the time instead of a div -- simply for spacing purposes or because it seems gentler. If it's not a paragraph, don't use it.
You don't need or want to stick ID's on everything unless you need to target it specifically (e.g. with Javascript). Use classes or just a straight-up div.
From its early days CSS was designed to be markup agnostic so it can be used with any markup language producing tree alike DOM structures (SVG for example). Any tag that comply to name token production is perfectly valid in CSS. So your question is rather about HTML than CSS itself.
Elements with custom tags are supported by HTML5 specification. HTML5 standardize the way how unknown elements must be parsed in the DOM. So HTML5 is the first HTML specification that enables custom elements strictly speaking. You just need to use HTML5 doctype <!DOCTYPE html> in your document.
As of custom tag names themselves...
This document http://www.w3.org/TR/custom-elements/ recommends custom tags you choose to contain at least one '-' (dash) symbol. This way they will not conflict with future HTML elements. Therefore you'd better change your doc to something like this:
<style>
so-cool {
color:blue;
}
</style>
<body>
<so-cool>HELLO</so-cool>
</body>
Surprisingly, nobody (including my past self) mentioned accessibility. Another reason that using valid tags instead of custom ones is for compatibility with the greatest amount of software, including screen-readers and other tools that people need for accessibility purposes. Moreover, accessibility laws like WAI require making accessible websites, which generally means requiring them to use valid markup.
Apparently nobody mentioned it, so I will.
This is a by-product of browser wars.
Back in the 1990’s when the Internet was first starting to go mainstream, competition incrased in the browser market. To stay competitive and draw users, some browsers (most notably Internet Explorer) tried to be helpful and “user-friendly” by attempting to figure out what page designers meant and thus allowed markup that are incorrect (e.g., <b><i>foobar</b></i> would correctly render as bold-italics).
This made sense to some degree because if one browser kept complaining about syntax errors while another ate anything you threw at it and spit out a (more-or-less) correct result, then people would naturally flock to the latter.
While many thought the browser wars were over, a new war between browser vendors has reignited in the past few years since Chrome was released, Apple started growing again and pushing Safari, and IE lost its dominance. (You could call it a “cold war” due to the perceived cooperation and support of standards by browser vendors.) Therefore, it is not a surprise that even contemporary browsers which supposedly conform strictly to web standards actually try to be “clever” and allow standard-breaking behavior such as this in order to try to gain an advantage as before.
Unfortunately, this permissive behavior led to a massive (some might even say cancerous) growth of poorly marked up webpages. Because IE was the most lenient and popular browser, and due to Microsoft’s continued flouting of standards, IE became infamous for encouraging and promoting bad design and propagating and perpetuating broken pages.
You may be able to get away with using quirks and exploits like that on some browsers for now, but other than the occasional puzzle or game or something, you should always stick to web standards when creating web pages and sites to ensure they display correctly and avoid them becoming broken (possibly completely ignored) with a browser update.
While browsers will generally relate CSS to HTML tags regardless of whether or not they are valid, you should ABSOLUTELY NOT do this.
There is technically nothing wrong with this from a CSS perspective. However, using made up tags is something you should NEVER do in HTML.
HTML is a markup language, which means that each tag corresponds to a specific type of information.
Your made up tags don't correspond to any type of information. This will create problems from web crawlers, such as Google.
Read more information on the importance of correct markup.
Edit
Divs refer to groups of multiple related elements, meant to be displayed in block form and can be manipulated as such.
Spans refer to elements that are to be styled differenly than the context they are currently in and are meant to be displayed inline, not as a block. An example is if a few words in a sentence needs to be all caps.
Custom tags do not correlate to any standards and thus span/div should be used with class/ID properties instead.
There are very specific exemptions to this, such as Angular JS
Although CSS has a thing called a "tag selector," it doesn't actually know what a tag is. That's left for the document's language to define. CSS was designed to be used not just with HTML, but also with XML, where (assuming you're not using a DTD or other validation scheme) the tags can be just about anything. You could use it with other languages too, though you would need to come up with your own semantics for exactly what things like "tags" and "attributes" correspond to.
Browsers generally apply CSS to unknown tags in HTML, because this is considered better than breaking completely: at least they can display something. But it is very bad practice to use "fake" tags deliberately. One reason for this is that new tags do get defined from time to time, and if one is defined that looks sort of like your fake tag but doesn't quite work the same way, that can cause problems with your site on new browsers.
Why does CSS work with fake elements? Because it doesn't hurt anyone because you're not supposed to use them anyways.
Why doesn't my professor want me to use made-up elements? Because if that element is defined by a specification in the future your element will have an unpredictable behavior.
Also, why didn't he know that made-up elements exist and work with CSS. Are they uncommon? Because he, like most other web developers, understand that we shouldn't use things that might break randomly in the future.

Is it OK to use unknown HTML tags?

Correct me if I'm mistaken, but AFAIK, unknown HTML tags in markup (i.e. tags not defined in the HTML spec, like say, <foobar>) will eventually be treated as a regular <div> in an HTML 5 browser environment.
I'm thinking: how supportable is this practice? I mean, if I use unknown HTML tags in my markup, what pitfalls can I expect? Will a velociraptor pounce on me within the next few seconds?
The reason I ask is that if these tags defer to <div>, I can potentially use these tags in a more semantic manner than, say, assigning class names that identify modules. Have a look at this article, for example, of a .media class. Now what if instead of writing up that CSS to target .media, I make it target <media> instead? In my opinion, that makes the markup much more readable and maintainable, but I do acknowledge that it's not "correct" HTML.
EDIT
Just to be transparent, I did find this SO question from a few years back. It's been closed as off-topic, but I feel that I have a valid point in my own wording. It's a close duplicate, I admit, but it's from a few years back, so there might have been changes in the general environ of opinions across web developers about the topic.
user1309389 had a very good answer, and I agree with the appeal to the spec. But I disagree with their conclusion, and I think they're wrong about made-up elements leading to "undefined behaviour". I want to propose an alternative way of thinking about it, rooted in how the spec and browsers actually handle made-up elements.
It's 2015, we're on the verge of the CustomElement spec being widely adopted, and polyfills are readily available. Now is a great time to be wondering about "making up your own elements". In the near future, you'll be able to create new elements with your own choice of tag and behaviour in a fully standard and supported way that everyone can love. But until this lands in all browsers, and until the majority of people are using supporting browsers, you can take advantage of the hard work of the Polymer or X-Tags projects to check out the future of custom elements in a nearly-standard and mostly-supported way that quite a few people can love. This is probably the "right thing" to do. But it doesn't answer your question, and frankly I find "just use X" or "don't do X" to be less helpful than "here's how the spec covers this, and here's what browsers do". So, here's what I love.
Against the heartfelt recommendation (and sometimes screaming) of much of the web dev community, I've been working with "made-up" elements for the past year, in all of my production projects, without a polyfill, and I've had no unsolvable issues (yet), and no complaints. How? By relying on the standard behaviour of HTMLUnknownElement, the part of the W3C spec that covers the case of "made-up" elements. If a browser encounters an unrecognized HTML element, there is a well-defined and unambiguous way that it should be handled, and HTMLUnknownElement defines that behaviour, and you can build on top of that. HTMLUnknownElement also has to be powerful and correct enough to "not break the web" when encountering all the tags that are now obsolete, like the <blink> tag. It's not recommended that you use HTMLUnknownElement, but in theory and in practice, there's absolutely no harm in doing so, if you know what you're doing.
So how does HTMLUnknownElement work? It is just an extension of the HTMLElement interface, which is the standard interface underlying all HTML elements. Unlike most other elements however, HTMLUnknownElement doesn't add any special behaviour — you get a raw element, unadorned with any special behaviour nor constraining rules about use. The HTMLDivElement interface works almost exactly the same way, extending HTMLElement and adding almost no additional behaviour. Put simply, making up your own element is almost identical to using a div or span.
What I like about "making-up" elements is the change of mindset. You should use or invent HTML elements based on several factors, ranging from how clear it makes the markup to read, to how the browser and screen readers and search engines parse your code, to how likely your code is to be "correct" by some objective measure. I sparingly use made-up elements, but I use in exactly the way Richard described, to make things more meaningful for the author of the HTML, not just meaningful to a computer service that extracts metadata. When used in a consistent way across a team, there can be a big benefit since made-up elements can concisely express what they're for.
I particularly like using made-up elements to indicate when I will be using JS to define extra behaviour for an element. For instance, if I have an element that will have children added/removed by JS, I will use a made-up element as a clue that this element is subject to special behaviour. By the same token, I don't use a made-up element when a standard element will suffice. You will see <dynamic-list> live happily next to <div> in my code.
Now, about those pesky validators. Yes, using made-up elements isn't "valid" in the sense that it won't pass a "validator". But many commonly used features, patterns, and systems of modern HTML and JS development fail all the W3C validators. The validators aren't the law — the spec is. And the law isn't binding — the implementations in all the browsers are. The utility of validators has been dimishing for years as the flexability of HTML has been increasing, and as browsers have shifted in their relationship to the spec. Validators are great for people who aren't comfortable with HTML and need guidance. But if you're comfortable taking your guidance from the spec and from browser implementations, there's no reason to worry about being flunked by the validator. Certainly, if you follow many of the guidelines offered by Google, Apple, Microsoft, etc, for implementing any experimental features, you'll be working outside the bounds of the validator. This is absolutely an okay thing to do, so long as you're doing it deliberately and you know enough about what you're doing.
Therefore, if you're going to make up your own elements and rely on HTMLUnknownElement, don't just treat it like a wild west. You need to follow a few simple rules.
You have to use a hyphen in the name of your tag. If you do, you are guaranteed to never collide with a future edition of the HTML spec. So never say <wrong>, always say <quite-right>.
Made-up elements can't be self-closing — you have to close them with a closing tag. You can't just say <wrong> or <still-wrong />, you have to say <totally-good></totally-good>.
You have to define a display property for your element in CSS, otherwise the rendering behaviour is undefined.
That's about it. If you do these things, you should be good to use made-up elements in IE9 and up, relying on the safety net of the HTMLUnknownElement. For me, the benefits far, far outweigh the costs, so I've been using this pattern heavily. I run a SaaS site catering to major industrial corporations, and I've had no trouble or complaints thus far. If you have to support older versions of IE, it's wise to stay far away from any "2015" technology or their crude approximations, and stay safely within the well-trodden parts of the spec.
So, in summary, the answer to your question is "yes, if you know what you're doing".
You should always approach HTML as it is defined in its respective specification. "Defining" new tags is a bit of an extreme approach. It might pass a browser check because it implements various failsafes, but there is no guarantee of this. You're entering the land of Undefined Behaviour, at best. Not to mention you will fail validation tests, but you seem to be aware of that.
If you wish to be more semantically expressive in your markup, you can use HTML5 which defines quite a bit of more descriptive tags for describing the structure of your page instead of generic divs which need to be appended ids or classes.
In the end, a short answer: No, it's bad practice, you shouldn't do it and there could be unforeseen problems later on in your development.
No. You will fail validation, you will get random issues cross browser and you WILL be eaten by said dinosaurs. CSS is the answer if you want your page to behave predictably.
Yes We Can.
There is a new spec going on about custom elements/tag - http://w3c.github.io/webcomponents/spec/custom/.
Only issue with this is you have to use js to register your new element
You can read more about this at
https://developers.google.com/web/fundamentals/getting-started/primers/customelements
Rule #1 of browser interoperability is: don't have errors. No matter how many browsers you test in, there are always browsers you can't test, for instance because they don't exist yet.
Also, unknown elements will be treated as <span>, not <div> by most browsers currently.
If it's really source readability(*) you're after, you should look into XML+XSLT.
That way, you can use all the tag names you want, and make them behave in any way you like and you don't have to worry that <media> will be a real element in some future version of HTML.
One good real world example is the element <picture>. If a website ever used <picture> and relied on the notion that this element would have no styles or special content by itself, they are in trouble now!
(*) With XML+XSLT, the readability will be in the XML part, not the XSLT part, obviously.
Generally not recommendable, e.g. IE wont apply css-styles to unknown tags.
All other browsers render unknown tags as inline-Elements (which causes problems with nesting).
I recommend you the following article: http://diveintohtml5.info/ There is a section about unknown tags.
In my case I use a lot of them into my Webkit-powered game GUI system, and everything works.
W3C says:
HTML5 supports unknown tags as inline elements, but it recommends CSS
styling for it.
Here is the source: https://www.w3schools.com/html/html5_browsers.asp
In your example you are talking about <media>, it's could be great but if html6 adds this tag for another element, your code won't be retrocompatible.
The one downside that worries me is what if a custom tag I use now, becomes an official HTML-tag next year or even later?
Therefore how about this: Instead of custom tags use 'div' + a custom CSS-class.
CSS-classes are meant to be custom, it is definitely ok to have your own custom CSS-classes. Your div can then further have any number of CSS-classes associated with it making the semantic machinery even more flexible, you could call it multiple inheritance.
Instead of div you could use span for the same purpose. I would like to use something shorter actually, say p but unfortunately p has its own special behavior of what happens if you don't close it.
But definitely if you go the route of expressing semantics with CSS-classes then it does help to use a tag-name that is as short as possible. I wish there was something shorter than div, say for instance t for 'tag'.
By default, custom elements main parts of many JavaScript-Framwork. It is state of the art in modern javascript.:
var XFoo = document.registerElement('x-foo', {
prototype: Object.create(HTMLElement.prototype)
});
https://developer.mozilla.org/en-US/docs/Web/Web_Components/Using_custom_elements
https://www.html5rocks.com/en/tutorials/webcomponents/customelements/
Is it OK to use unknown HTML tags? If you ask wrong questions, you get allways wrong answers. If you define your unknown tag with javascript as a CustomeElement , your unknown tag is no longer part of the HTML5/HTML-definition.
Yes, you can use unknown tags, if you define the tag in your javascript.
What's wrong with the judicious use of < !-- your stuff here -- >. It worked for scripts back around the time of the Cretaceous–Paleogene boundary. Dinosaurs ceased to be a problem around that time, that is apart from the flying, feathered variety.
It's bad practice and you should not do it. The browser renders it as div as fallback solution in most cases but it's not valid html and therefore never will pass a validity test.

how much does HTML5 outline algorithm boost SEO in general? [closed]

Closed. This question is off-topic. It is not currently accepting answers.
Want to improve this question? Update the question so it's on-topic for Stack Overflow.
Closed 10 years ago.
Improve this question
So as the title suggests, im interested / curious about how much does HTML 5 outlining help boost SEO in general.
there are alot of tutorials and explanations as to what HTML5 will do for XYZ, and alot of how-tos but for some of the pluses that HTML5 currently(and future) brings, there isn't a clear mention(that i know of) as to how much better it will/or curently make things.
I work in a...normal team i guess in where we have dedicated graphic designers, coders, programmers etc and part of this team of course are the SEO guys.
Now MANY websites show (under the document outline tool) that many designers/developers are using HTML5 but when you look at their document outline, its all over the place and even have unnamed sections in their outlines.
my guess is that, theyre just using the regular DOC type of HTML <!DOCTYPE HTML> and rolling with it instead of typing the long/massive doctype and everything therein etc...
but that said, and as oppose to the regular "old" rules and debates of H1 tags etc...
Since many of HTML5s features are already in play, and most current browsers supporting them(at diff levels), does it hurt when having a semi messed up, unnamed sections in your outline hurt?
Overall, if i convert my page from html of old to new HTML5 standards and proper outlining etc, will it make my standings better?
in a simple question, FOR EXAMPLE, if my current page rankings are say 5/100 on google,
will implementing a better, newer doc outline with HTML5 bring me higher, say 3/100 or even 1/100.
You are under the wrongful impression that formatting of the code, or use of standards has something to do with how search engines rank your page. This isn't actually the case.
What does happen with semantic tags/classes/ids, validating documents, and standards compliant markup is make it a lot easier for search engines interpret the page properly.
The content, at which levels / order and importance (say with the header tags) mark how relevant it is, and a properly formed document only helps in recognizing the different sections, but has no inherent effect on how well its ranked.
In the end though SEO is slightly guessing how smart the search engine algorithms are, one might for instance assume that Google has some sort of consideration of grading an HTML4 markup page vs HTML5. Since HTML5 is more recent, it's likely to get some leverage over HTML4 markup since HTML4 is outdated.
There is no boost from using HTML5. Semantic markup helps SEO but the version you use, or format (as in microformats) does not. It's the content that gets ranked, not your markup.
I'm not sure what outlining means. Do you intent to verify that your document doesn't have untitled sections like in this article?
Which is your question:
Does Google like HTML5 more than other HTML's?
Does Google like valid HTML5 more than other valid HTML's that don't have valid HTML5 structre (i.e. messy or untitled sections)?
This is just my gut feeling but I'm guessing it doesn't matter what HTML you use. This article indicates code validity is not an issue in page ranking. Here it says Doctype isn't even registered by Google. (Just a couple of links, don't know how realiable they are.) It makes sense thought. PageRank algorithm can't really (again just my gut feeling) be so simple that would put a lot of weight on the validity of code or the used HTML language. Unique content, locality, links to other quelity sites etc. are far more important then few missing title tags or the fact source code is written using the latest HTML specification. Where would it end? Should PHP pages rank better than "pure" HTML. Is JavaScript an boosting factor or not? What JS library would be the most important one?
Sorry if I'm splitting hairs but used technology shouldn't matter, content should. (Don't really know if only content should matter thought...)

We hear so much about "semantic html". Where/what are the algorithms reading our semantic html?

I keep making attempts at properly using HTML5 but I feel like it's still not even close to anything semantically valuable.
My attempts:
HTML5 Article node Architecture
HTML5 Blog Page Architecture
But there's such subtleties in every single tag!
My question is, what specific software out there on the web is actually doing things like processing our HTML DOM, calculating and comparing elements to say "oh, this is a <header>, and it's just after <section>, and it has <time> in it, so the <time> tag must be "metadata" in relation to the <header>...", and saying "The content within the <time> tag not only is the "published time", but also relates to the author's birthday, so it must be a special post (say because there was also a <cite> or <address class='vcard'> tag in there too)".
I mean, what benefit am I ever going to get in using HTML5 if I don't know the algorithms that are interpreting it? If I just stuck with the basic div, ol, ul, li, p, a, h[1-6] tags, I could do everything with half the number of DOM elements.
Looking forward to some specific algorithms that I can use to shape how I structure the DOM from here on out.
I'm at the point where I don't even think we should be using HTML5 tags at all. For example, on the iPhone especially, the goal should be to minimize dom elements to decrease load time. Plus, if the iPhone site is a mirror of the traditional browser version, the search engines won't even see the iPhone site (ideally). So there's no real point in making the DOM semantic. So if I can use 1/2 the amount of <div> tags to achieve the same layout as if I used a somewhat "semantic HTML5" rendition, and that's a good thing for the iPhone, why don't I do that for the regular browser too? That's where I'm coming from.
Articles like this are basically saying it's pointless to worry about semantic HTML.
What algorithms are reading your semantic HTML? Google, that's who. Their algorithm tries to extract every bit of meaning from pages that it can, because that helps Google construct smart, relevant search results. For one example, Google tries to determine the dates of things by reading the HTML and gives headers extra consideration in determining the overall topic of a page.
Also, your assertion that we shouldn't use HTML5 tags on the iPhone "to minimize dom elements" isn't founded in any technical basis. HTML5 doesn't dictate that we use more DOM elements, and in fact it can let us leave out tags that would be required by XHTML. You should use HTML5 on the iPhone more than anywhere else. For example, the new input types like number and email don't do much on the desktop, but that extra information can really make things nicer on the iPhone by allowing it to present an appropriate interface.
Whenever a "machine" tries to make sense of your content.
In addition to search engines (→ SEO), screen readers (→ Accessibility) interpret the markup. They get better from version to version.
Also, think of all the tools that might come one day. The great thing about the Web is, that all the web pages could still exist in 5, 10, 100 … years from now. Imagine the user-agents and algorithms and search tools that might exist then, and how they could extract the meaning of your old documents.
Search engines can/will better interpret your pages which combined with other factors will result in better rankings for your pages.
Moreover if you use the tags consistently and semantically, you could build your own reusable widgets and libraries that derive knowledge from the HTML structure independent of how the data is stored in the backend.
Consider this sample Google search where you can filter results by date. By using semantic HTML, for let's say, <article> and <time>, you can write a simple crawler that recreates this functionality or allows users to specify a timespan within which to search articles in your own site(s).
Off the top of my head, I don’t know of any algorithms making use of the new semantic tags in HTML5. (Obviously, that doesn’t mean there aren’t any.)
But the idea that you should tailor your HTML to specific algorithms is, I think, a bit contrary to how the web works. The web is worldwide, and will hopefully be around for a long time. We can’t know what uses our HTML will be put to, and useful algorithms can’t be written until there’s a good amount of actual content out there.
The <a> tag wasn’t designed with Google’s PageRank algorithm in mind. Some people thought links would be useless if they weren’t inherently two-way, because you’d get too many broken links when one end went away.
Of course, if the vague possibility of undefined future benefits makes it not worth using some or all HTML5 tags for whatever project you’re working on, don’t use them.
For me, the benefit of using them is that there’s a well-known, public, non-proprietary specification that tells you, and anyone else working on the code, what we’ve agreed the tags mean. Future developers don’t just get a <div> with a class name that I made up in a coffee-fuelled 7 p.m. code print, they get a tag designed and documented by people smarter and more experienced than me. There’s also the chance that the code will become more useful in future if people use the meaning contained in HTML5 tags in algorithms, whereas there’s less chance of that if it’s all just a bunch of <div>s.
I don’t think the size increase of our pages from HTML5 tags is particularly worth worrying about though. After gzipping, the size increases aren’t enough to worry about, especially as mobile performance is as much hampered by the latency (which you can’t do much about) as the bandwidth. Plus mobile bandwidth is likely to trend up, rather than down.