HTML -- data-content-field - html

I noticed on a website that within a h1, they had a data-content-field so the code was
<h1 data-content-field="site-title">
My Amazing Website
</h1>
Is there any need in the data-content-field? I've personally never used it before. I've heard that it can be good for SEO and Google's bots when it comes to indexing websites.
I've seen them been used on other tags as well as the site title.
Would it hurt my site if I were to add / remove it (especially for SEO).
Thanks!

No, no. You're missing the point.
All data- attributes are custom attributes the can be added by the user. For example in my applications I use data-id, data-int and some others.
The advantage is that you can put here your own attributes. Please note that the data- tag is supported in HTML5 and above.
There is no connection between data-content-field and SEO.
I hope it helps you! Read some tutorials about SEO, they will probably make you a better programmer/webmaster.

All data-* attributes are for internal use within a page or site, basically in client-side scripting, site specific search systems and things like that. General search engines should be expected to ignore them. The example you mention looks rather misguided, among other things because h1 is meant to be page-specific, not site-specific and because here a class attribute would intuitively look more suitable - but in the absence of an address of the page, this is just an educated guess.

Related

Use header tag in html

Why would one use <header> tags or <footer> or <address> tags? Is it just for SEO, or another reason?
I ask this question because IE8 and older doesn't support many of these elements.
These are all tags introduced with HTML5. They are part of an evolution of HTML. They're not supported in IE8 because they were introduced after support for IE8 ended. They were introduced to provide more logical elements that were commonly used in web page designs.
If you need to support IE8, you can do so by not using these tags and sticking with <div> tags with classes, such as:
<div class="header"></div>
<div class="footer"></div>
<div class="address"></div>
accompanied, of course, by CSS styles for each.
They have nothing to do with SEO.
The tags you describe, such as a header tag, are what is known as 'syntactic sugar'. That is, it makes it easier to read and know what the intent of the tag is. This is good for human readers, but it is especially useful for automated systems.
Compare these examples that all could mean exactly the same thing:
<header>...</header>
<div class="header">...</div>
<div class="hdr">...</div>
Note that header is easy to differentiate from the two div tags. Only if you know what the class attribute means will you understand what the div tags are defining. Because the class attribute value is free-form, it means there is no standard definition.
SEO is one example of an automated system that might need to read the tag directly and understand that it has a semantic meaning. A particularly observant SEO engine might understand that the above three tags all refer to the same semantic definition, but you will agree that writing an engine that would presciently know all three mean 'header' would be difficult. In fact, there is nothing to disambiguate the latter two from <div class="zebra">.
Having automated systems be able to read your code and understand the semantic meaning goes well beyond SEO: it can make automated handling of mobile versions easier, for instance. You no longer have to hand-roll code for your specific implementation. You can use Javascript libraries that might do handy things with your headers, such as allow them to be sortable or auto-link them. Anyone writing those libraries has an easier time doing it.
You also ask why you would use something not supported by older browsers that are still widely used. That is a question of demographic: what is your app targeting? If you need the 6% of the world that is using an old browser to utilize your application, then you should absolutely use backwards compatible techniques. If, on the other hand, you want to make the UX the best possible for a set of users likely to be using a modern browser, then you should use the new tags. (Note that having bad UX or long development time as you roll your own solutions to things may cost you more than 6% of your application's userbase...)

What is the use of HTML5 semantic markup for intranet applications?

As far as I understand, the only real advantage of HTML5 semantic markup is for search engines and web crawlers to interpret the document better.
Since intranet applications have nothing to do with search engines or web crawlers, what are the advantages of using semantic markup in HTML5?
There is no straightforward example to point out, but the website (even intranet) can be consumed by different user agents (on different devices).
You are probably familiar with Skype (and the iOS Safari) making phone-number-like words clickable. In the future I can easily imagine mobile browsers being smarter to assist the user in completing tasks on the page, like importing a clearly indicated contact to the address book.
Screenreaders for blind people?
While there is not a whole lot of immediate benefit for non-disabled people, it is still good practice. Does your company not have any externally facing sites? If it does, do those people not look at internal page code? Good practices spread just like bad ones.
see also: http://en.wikipedia.org/wiki/Semantic_HTML
Simply said there the only rule you have to follow if you create your html documents is that it is valid html, otherwise you will have the problem that the browser would try to correct your broken syntax which may result in defects of the visual representation of your content.
In modern browsers you can use display to given any element - with some limitations e.g. with input element - the same visual look and behavior then any other element.
So if you ask what are the advantages of using semantic markup in HTML5 you should ask, why to use any of the semantic markup if it is possible to have the same result using css.
The short answer is, no one will stop you if it is your own project where you are responsible to - except the client that probably gives you requirements.
It is similar to asking: Language xyz provides comments and there is a syntax for doc-comments, but why should I use them?.
Using the semantics wisely increases the readability and thus maintainability. You are not required to use every possibility of semantics at all costs.
Using them will help you to get into the code again if you haven't looked at it for a longer time, e.g. to distinguish between the elements that encapsulate logical parts and elements that are used for styling. Especially if you use a template engine to create your code or to search for certain elements in multiple files.
Even if you are now the only one who works on the code it may happen that if the project grows that you need other people to work on the code. Or for the situation, you for some reason are not available, someone else needs to maintain your code, a good markup is essential.
Using the correct markup and additions like WAI-ARIA is not only essential for handicapped people, but also allows the browser to recognize the meaning of elements, allowing to e.g. improve the keyboard navigation. Especially in a productive environment where you need to type much, it is often faster to navigate with keyboard then using a trackpad or a mouse.

HTML5 for marking up functionality - what semantic tags should I use?

When it comes to writing blog markup, I absolutely understand the use of article and section tags. But my masthead sections have two widgets. One has a search engine embedded and the other is marketing copy leading to an FAQ page.
What would be the correct HTML5 markup in this case? How do I mark up widget functionality?
my masthead sections have two widgets. One has a search engine embedded...
A search engine embedded? Do you mean a search field, i.e. a text field into which you can type search terms? For that, you want <input type="search">.
...and the other is marketing copy leading to an FAQ page.
Does this really qualify as a “widget”? If it’s marketing copy “leading” to an FAQ page, that just sounds like a link to me, which has been semantically represented in HTML since version 1 with the <a> element.
HTML is pretty simple, you really don’t want to over-think it. You don’t need specific tags for everything people could possibly give a name to. (What exactly is a “widget”? Isn’t it just a section of the page?) For most things, <section> is fine.
While HTML5 is a big improvement, there's one thing it doesn't fix: The subjectivity of what is considered proper semantics for every situation.
And, I doubt HTML will ever fix that.
If you're already using HTML5 containers for other more obvious parts of the page, I wouldn't sweat these too elements much. You could put the marketing stuff in an aside. Search could be considered a form of nav. But...I don't think bad karma will come your way if you just stick them in a couple of divs, either. ;)

Embed sandboxed HTML on a webpage + SEO

I want to embed some HTML on my website... I would like that:
SEO: that content can be crawled and indexed
Integration: it renders nicely (does not break my DOM trees for instance, or does not inherit my styles)
Security: it remains safe for our user (javascript disabled)
Flexibility: the HTML can be completely free (don't want any BBCode or MarkDown or even TinyMCE, it's our users that are writing the HTML code...)
I saw that I might be able to use the IFrame for that, but I am not sure it is a very good solution concerning my SEO constraint.
Any answer would be greatly appreciated!!! Thanks.
For your requirements (rendering and security, primarily), IFRAME seems to be your only option, especially when we consider no rules are specified for the HTML content except the JS removal. Even some CSS + 'a' tag can bring a serious security risk, like overlaying outgoing links on your standard interface.
For the SEO part, you can use SEO maps to show the search engines the relation between the content and the container, also use html tags like link to make connection.
To make sure the user's html is safe then you should use HTMLPurifer. In terms of the rest of the question, you should split this up into multiple questions.

Is semantic markup too open-ended? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 9 years ago.
Improve this question
I am taking a peek at Dive Into HTML5. It seems nice and interesting, but I am puzzled.
In the 1990s, at the time when Netscape was the browser and HTML was HTML2 or HTML3, there were a lot of tags: address, cite, code... Most of them are unused as of today, probably even obsolete.
HTML5 introduces tags to express "semantic meaning" to the tag itself. This is all fun and games, but I see something very strange in this approach. Technically, the semantics can be very open ended. HTML5 has tags for article, time, navigation bars, footer. Why shouldn't it contain tags for post icon, author's place, name and surname, or whatever else you want to assign specific semantics to (I'm confident <rant> and <nsfw> would be very important tags): ? I thought XML was the strategy to assign semantics to stuff. Nothing forbids you to put an XML chunk under a XHTML div element, and assign a stylesheet to it so to style it properly, or to delegate to the proper viewer the handling of that namespace (for example, when handling RSS or SVG).
In conclusion, I don't understand the reason behind this extensions focused towards semantics, when it's clear that semantic is a very broad topic, which is guaranteed to require a potentially infinite amount of semantic tags. Since I am pretty sure there are clever people at W3C, I think I'm wrong, but I'd like to know why.
Why are tags for article, time, navigation bars, footer useful?
Because they facilitate parsing for text processing tools like Google.
It's nothing about semantics (at least in 'broad' meaning). Instead they just say: here is the body of page (most important text part) and there is the navigation bar full of links. With such an approach you can easily extract just what you need.
I too hate the way that W3C is going with their specs. There are many things that I don't like, and this "semantics" fad is one of them. (Others include taking forever to complete their specs and leaving too many important details for the browsers to implement as they choose)
Most of all I don't like it because it makes my work as a web developer more difficult. I often have to make a choice whether to make the webpage "semantically correct" or "visually/aesthetically pleasing". The latter wins of course, because that is what the users want, but as a result validations start failing and the whole thing gets quite non-semantic (tables for layout and other things).
Another issue at which I frown is that they have officialy declared that the "class" attribute is for semantics, but then they used it for visual presentation selectors in CSS.
Bottom line - DON'T MIX SEMANTICS AND VISUAL REPRESENTATION. If you use some mechanism for describing semantics (like tag names, attribute values, or what not else), then don't use it for funcional/visual purposes and vice versa.
If I would design HTML, I would simply add an attribute "semantic" which could (like the "class" attribute) be added to any tag. Then there would be a number of predefined values like all those headers/footers/articles/quotes/etc.
Tags would define functionality. Basically you could reduce HTML tags to just a handful, like "div", "table/tr/td", "a", "img", "form", "input" and "select". I probably missed a few but this is the bulk. Visual styling would be accomplished through CSS.
This way the three areas - semantics, visual representation, and functionality - would be completely independent and wouldn't clash in real life solutions.
Of course, I don't think W3C is interested in practical solutions...
There is already a lot of semantics in HTML markup in the forms of classes and IDs, of which there is a (near) infinite amount of possibilities of, And everyone has their own way of handling these semantics. One of the goals of HTML5 is to try to bring some structure to this. you will still be able to extend the semantics of tags with classes and ids. It will also most likely make things easier for search engines.
Look at it from the angle of trying to make statements either about the page, or about objects referenced from the page. If you see a <footer> tag, all you can say is "stuff in here is a footer" and pass it by. As such, adding custom tags is not as generic a solution as adding attributes and allowing people to use their own choice of URIs to specify predicates and optionally values - RDFa wins hands-down because you can express any triple-statement you like from RDF in a page, one way or another.
I just want to address one part of your question. You say:
In the nineties, at the time when
Netscape was the browser and html was
HTML2 or HTML3, there were a lot of
tags: address, cite, code... Most of
them are unused as of today, probably
even obsolete.
There are a great deal of tags to choose from in html, but the lack of usage does not imply that they are obsolete. In particular the header tags <h1>, etc, and <ul>, <ol> are used to join items into lists in a way I consider semantic. Many people may not use tags semantically, but the effort to create microformats is an ongoing continuation of the idea you consider an artifact of the 1990s. Efforts to make the semantic web be a winner keeps going, despite full-text search and link analysis (in the form of Google) being the winner as far as how to find and understand the web.
It would be great to see an updated version of Google's Web Stats which show "html as she is spoke." But you are right that many tags are underused.
Whether html5 will be successful is an open and interesting question, but the tags you describe as obsolete didn't go anywhere, they were there in HTML 4.01 and xhtml. HTML5 seems to be an effort to solidify what is useful in tags. In the end if html5 gets support in browsers and makes the job of web developers easier, it will succeed. xhtml2 failed because it roundly failed to gain adoption in browsers and did nothing to make the job of web page makers easier. The forces working on html5 seem keenly aware of the failure of xhtml2, and I think are avoiding having html5 suffer a similar fate.
"Why shouldn't it contain tags for post icon, author's place, name and surname, or whatever else you want to assign specific semantics to (I'm confident and would be very important tags): ?"
You use <dialog> to describe conversations or comments. Rant and NSFW are subjective terms therefore it makes sense not to use them.
From what I understand a bunch of experienced web developers did research and looked for what most websites have in common in html. They noticed that most websitse have id="header", id="footer", id="section" and id="nav" tags so they decided that we need HTML tags to replace those id's. So in other words, don't expect them to give you a HUGE amount of HTML vocabulary. Just keep it simple as possible as you can while addressing the MOST common needed HTML tags.
NAV tag is VERY important for providing accessibility as well. You want them to know where the navigation is rather than to force them to find whether links are for navigation or not.
I disagree with adding extra tags. If detailed vocabulary were actually import then there could be a different tag name for every word in the dictionary. Additional tags names are not helpful as they may communicate additional meaning to humans, but do nothing to facilitate machine parsing of the language. This is why I don't like the "semantic" tags for HTML5 as I believe this to be slippery slope to providing a vocabulary too complex while only providing a weak solution to a problem not fully addressed.
In my opinion markup language structure data as much as describe it in a tree diagram form. Through parsing of the structure and proper use of semantic conventions, such as RDFa, context can be leveraged to provide specific meaning to otherwise generic tag names. In such as case excessive vocabulary need not exist and structurally redundant tag names, such as footer and aside, could be eliminated. The final objective is to make content faster and more accurate to interpret by both humans and machines simultaneously while using as little code as possible to achieve that result. How that solution is lesser important, except to HTML5.
I thought XML was the strategy to assign semantics to stuff.
As far as I know, no it wasn’t. XML allows new languages to be defined which are all parsed in the same way, because they all use the XML syntax.
It doesn’t, of itself, provide any way to add meaning (“semantic” just means “meaningful”) to those languages. And until computers get artificial intelligence, they don’t actually understand meaning, so meaning is just what is agreed between human beings. HTML is the most commonly-used language with agreed meaning of its tags.
As HTML is so common, it’s helpful to add a few meaningful tags to it that are quite general in their application. The new HTML5 tags are aimed at that. The HTML5 spec’s authors could indeed carry on down this route, creating tags for every specific bit of meaning possible, but as they’re not robots, they probably won’t.
<section> is useful, and general enough to be meaningfully applicable in lots of documents. <author-last-name> isn’t. Distinguishing between the two is a judgment call, which is why humans, and not computers, write the spec.
For custom semantics that are too specific to be added to HTML as tags, HTML5 defines microdata.
I've been reading Andy Clark's book Transcending CSS (page 33).
...,it is now widely accepted that presentational names such as header, left, or red that describe an element's look or position are poor choices.
After reading these lines I asked myself: hey, aren't there elements in HTML5 spec such as header, footer?? Why is footer more semantic ? Andy in his book advocates to use site-info for the ID of the footer div and this makes more sense IMHO. Footer is a presentational name (describes the element's position).
In a word, AJAX. The new tags are meant to support what real-world developers are doing by replacing some of the <div class="sidebar-wrap"><div class="styling-hook"><div><ul class="nav"> type of divitis many websites suffer from. The only <div> left in the HTML5 is the styling hook.
The semantics that get promoted to tags from classes are those that developers have freely adopted en-masse as best practices, given an extended xhtml/css adoption period. Check out the WHATWG developer's edition of the spec's sections pagehere. The document itself is a pleasure, but I won't spoil it if you haven't seen it yet.
One of the less obvious reasons for some decisions made by the W3C is the importance of Webkit. If you look, you can see that they were better than some at taking the current work of the HTML5 Working Group and implementing ideas. They have historically been way out ahead in compliance (see here). The W3C placed a high priority on their (i.e. Android, iPhone, the Googlebot, Chrome, Safari, Dreamweaver, etc.,). Google, framework users, Wordpress/Moveable Type/Joomla! type users and others wanted self contained building blocks, so this is the style we get.
Facebook is modular. Responsive design's grids are modular. Wordpress is modular. Ajax works best with modular page structures. Widgets are modules. Plug-ins are modules. It would seem that we should be trying to figure out stuff like how to apply these tags to make it easier to hook the appropriate elements and activate them in our document/application/info-network hybrid Web 2.0.
In closing, HTML5 is meant to be written as xml (again, see the spec) in order to ensure that tools and machines making ajax requests for a portion of a document will get a well-formed useful response. How awesome in combination with things like media queries for devices like feed readers, braille printers, annotators, etc.,. I see a (near)future where anything with good semantic content is it's own newsfeed automagically! This only happens if developers adopt and write compliant documents.