HTML5 header tag position - html

In the XHTML source of all my websites, navigation and breadcrumbs appear below the page content, yet visually they appear above it. I do this because I believe search engines then treat the content as more relevant.
In all the HTML5 examples I've seen, the order is classical:
header, body section, footer.
From an SEO point of view, when working on an HTML5 page, is it better to use the classical tag order or the one I have used until now in XHTML?

Unfortunately, this is more or less outdated advice.
Both Google and Bing have for many years now had the ability to render the DOM of the page and determine the actual layout of the page regardless of how the code is structured.
The old theory behind this technique was that search engines would only index the first 100 KB or so of a page, and in some instances that could be taken up by templated boilerplate code. That restriction doesn't really exist anymore, and to be honest, if your pages are reaching that kind of file size you probably have other things to consider.

I think it is better when the content with keywords appears earlier in the source code. For the general link structure it doesn't matter where the main navigation links are placed.
But maybe search engines can weight the structure differently when you use standard semantic ids like navigation, breadcrumbs, content and footer? In that case the position would not matter. Isn't the semantic thing one of the big advantages of HTML5?!
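For reference, the source-order approach described in the question can be sketched roughly as follows. This is only an illustration, assuming a flexbox layout; the element contents and link targets are made up, and the question does not say which CSS technique was actually used.

```html
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="utf-8">
  <title>Source order vs. visual order</title>
  <style>
    /* A flex column lets `order` decide where each block is painted */
    body   { display: flex; flex-direction: column; margin: 0; }
    nav    { order: 1; } /* painted first, although it comes later in the source */
    main   { order: 2; }
    footer { order: 3; }
  </style>
</head>
<body>
  <main>
    <h1>Page title</h1>
    <p>The main content comes first in the source.</p>
  </main>
  <nav aria-label="Breadcrumb">
    <a href="/">Home</a> &gt; <a href="/section/">Section</a>
  </nav>
  <footer>Footer text</footer>
</body>
</html>
```

Keyboard focus and screen readers follow the source order, not the painted order, so with this layout the navigation is reached after the content.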

Related

How do you create a web page for reader mode?

What do I do to take advantage of Reader Mode in browsers to present a simplified, cleaner version of a web page?
I've used Google looking for information on how a page is coded or otherwise set up for reader mode and I have not found anything. Is there a document or web page somewhere that explains reader mode and how to set up a page which can take advantage of it?
There are no standards for how reading mode works, and it works quite differently from browser to browser. You can help ensure reading mode works well with your sites by sticking to conventions for your document title, and providing certain metadata elements. I’ve documented this in detail here.
In theory: Do nothing.
Site authors generally don't like Reader mode - it hides their adverts, and throws away the design. It is designed to work in spite of sites and not require specific effort on their parts.
That said, it is more likely to work well when given high quality, well-structured markup.
Write valid HTML. Make use of semantic elements such as <main>, <nav>, <header>, <footer>, <h1> to <h6> and so on.
Have you read this? https://mathiasbynens.be/notes/safari-reader
Here are the conclusion notes taken straight from the website
Conclusion
This definitely needs more investigating, but so far, these appear to
be the most important factors for Safari’s Reader functionality to
kick in:
Use the right markup, i.e. make sure the most important content is wrapped inside a container element. Whether you use article, div or even span
doesn’t seem to matter — as long as it’s not p.
The content needs to be long enough. Use enough words, use enough paragraphs, use enough punctuation. Every paragraph should have at least 100 characters.
Reader doesn’t work for local documents.
I think the essential thing is not only to use valid HTML, but to use HTML5 with everything it provides to structure a text semantically: <section> to have clearly separated parts, <article> if you have thematically independent parts (like different blog posts / articles), <header> to clearly mark the header section, <h1> to <h6> for hierarchically different headings (where you should always be careful never to omit a level, i.e. for example not have an h4 directly in an h2 section), <nav> for navigation menus, and paragraphs, lists, footers and so on...
BTW: <div>s don't have any semantic meaning, so you can use these (together with class attributes) to do whatever to tweak the result visually for "not-read-mode".
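Putting the advice from these answers together, a page skeleton might look something like the sketch below. The titles, author name and links are placeholders, and since there is no standard for reader mode, nothing here guarantees that a particular browser will offer it.

```html
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="utf-8">
  <title>Example article title</title>
  <meta name="author" content="Example Author">
</head>
<body>
  <header>
    <nav>
      <a href="/">Home</a>
      <a href="/blog/">Blog</a>
    </nav>
  </header>
  <main>
    <!-- The most important content sits inside one container element -->
    <article>
      <header>
        <h1>Example article title</h1>
      </header>
      <section>
        <h2>First part</h2>
        <p>Body text goes here. Real paragraphs should be reasonably long, since very short ones may not be recognised as article text.</p>
      </section>
      <section>
        <h2>Second part</h2>
        <!-- h3 follows h2, so no heading level is skipped -->
        <h3>A sub-topic</h3>
        <p>More body text goes here, again written as full sentences rather than short fragments or labels.</p>
      </section>
    </article>
  </main>
  <footer>
    <p>Footer text</p>
  </footer>
</body>
</html>
```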

Is it better not to change html markup for the same content on different page?

I am using the section tag to group topics and replies on the forum page. When I need to load a topic and its replies on another page, such as an article page, I use a div tag for the same block and change the topic title from h1 to h2. This is valid, but will it make navigating a bit confusing for assistive technology?
Assuming that the assistive technology you are talking about concerns mainly screenreaders, the best way for you to know how accessible your pages are is by downloading one yourself and testing it out. A free screenreader that I have used to do this is called NVDA but there are more out there.
In general, screenreaders work best when a page has a logical structure behind it. If you are displaying several articles, make sure that each article is located in a similar hierarchical location on the page and that each article itself resembles the others in terms of its structure. Using HTML5 semantic tags like article, aside and the like can be helpful but is not necessary. Screenreaders and other assistive technologies have made do for a lot longer than these tags have been around. They are certainly good to use when possible, but there are other more important ways to make your page accessible to as wide an audience as possible.
Another good thing to do is to use heading tags for titles, and to use them in order. Screenreaders often give users the option to skip from heading to heading in order to get a summary of what is on the page. You can also include visually hidden links (placed far off the edge of the page using CSS) at the top of the page, or in sections where placing a heading may not be appropriate visually. These will be read in context by screenreaders without your sighted users seeing them.
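A rough sketch of such an off-screen link combined with ordered headings is shown below. The class name visually-hidden and the id content are made up for the example; the answer does not prescribe any particular names or offsets.

```html
<style>
  /* Moved off-screen instead of display:none, so screenreaders still announce it */
  .visually-hidden {
    position: absolute;
    left: -9999px;
  }
  /* Bring the link back into view when a keyboard user tabs onto it */
  .visually-hidden:focus {
    left: 0;
  }
</style>

<a class="visually-hidden" href="#content">Skip to main content</a>

<main id="content">
  <!-- Headings descend one level at a time: h1, then h2, then h3 -->
  <h1>Forum</h1>
  <h2>Topic title</h2>
  <h3>Replies</h3>
</main>
```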
If you are concerned about accessibility, a good way to get a clearer picture of how accessible your pages are is by following the WCAG (Web Content Accessibility Guidelines) standard recommendations. WCAG is managed by the W3C, and has various levels of accessibility that you can consider respecting when developing your content. The W3C has a list of validators that can be found here.
To answer your question from comments:
How it sound when read a topic title as h2, click it, then arrive the forum page and this topic title become h1?
This shouldn't confuse most people, especially if you do it consistently. I am assuming that you are making a news-like site.
Above, Levi mentioned article tags. I would recommend using them if you have multiple stories per page. The div tag is roughly the garbage can of the HTML world; you should only use it when nothing else is available. Article tags give your code better semantic value, and they also have another feature, called a role. Roles allow a person using a screen reader to jump around a page, like they can with heading tags.
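As a sketch of what that could look like for the forum case in the question, here is the same topic block marked up as an article on both pages, with only the heading levels changed. The two fragments belong to two different pages, and all titles and text are placeholders.

```html
<!-- Forum page: the topic is the main subject, so its title is the h1 -->
<main>
  <article>
    <h1>Topic title</h1>
    <p>Topic text</p>
    <!-- Replies are nested articles, one heading level below the topic -->
    <article>
      <h2>Reply from user A</h2>
      <p>Reply text</p>
    </article>
  </article>
</main>

<!-- Article page: the same block sits under the page's own h1 -->
<main>
  <h1>Article title</h1>
  <article>
    <h2>Topic title</h2>
    <p>Topic text</p>
    <article>
      <h3>Reply from user A</h3>
      <p>Reply text</p>
    </article>
  </article>
</main>
```

Keeping the same element on both pages means the block's structure stays the same and only its depth in the heading outline changes.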

<nav> vs <article> for SEO

In terms of SEO, if I want to group relevant page content together to maximize search engine readability, should I use the tag <nav> or <article>?
1) It's not there yet.
2) If it was, and you were wrapping menus as article, or wrapping affiliate link-farms as article, Google would slap you (keep that in mind in three or four years).
3) If you have lots of legitimate content, and each piece of content is self-contained (i.e. suitable for article), then not only should you wrap it in an article tag, but you should also learn how to use Google's "Rich Snippets Testing Tool", which was recently renamed "Structured Data Testing Tool".
If you learn how to mark things up, both in an html5-friendly way, and in a Google-friendly microformat, then GoogleBot will grab all of the content it knows how, and it will be displayed in search results and elsewhere, when relevant.
Like I said... ...that's if you've got content which is worthy of doing this, because otherwise, Google will slap you, eventually, if you try to use it for evil.
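As an illustration of marking content up both ways at once, here is a minimal sketch that combines an article element with schema.org microdata. Note that this uses microdata rather than a classic microformat, and the headline, author and date are invented placeholders; which properties a search engine actually uses is up to the search engine.

```html
<article itemscope itemtype="https://schema.org/Article">
  <h1 itemprop="headline">Example headline</h1>
  <p>
    By
    <span itemprop="author" itemscope itemtype="https://schema.org/Person">
      <span itemprop="name">Example Author</span>
    </span>,
    <time itemprop="datePublished" datetime="2024-01-15">15 January 2024</time>
  </p>
  <div itemprop="articleBody">
    <p>Self-contained content goes here.</p>
  </div>
</article>
```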
article tag:-
The <article> tag lets you mark separate entries in an online publication, such as a blog or a magazine. It is expected that when articles are marked with the <article> tag, this will make the HTML code cleaner because it will reduce the need to use <div> tags. Also, search engines will probably put more weight on the text inside the <article> tag than on the contents of other parts of the page.
nav tag:-
Navigation is one of the important factors for SEO, and everything that eases navigation is welcome. The new <nav> tag can be used to identify a collection of links to other pages.
So both tags have their own purpose and can be used according to need.

HTML5 for marking up functionality - what semantic tags should I use?

When it comes to writing blog markup, I absolutely understand the use of article and section tags. But my masthead sections have two widgets. One has a search engine embedded and the other is marketing copy leading to an FAQ page.
What would be the correct HTML5 markup in this case? How do I mark up widget functionality?
my masthead sections have two widgets. One has a search engine embedded...
A search engine embedded? Do you mean a search field, i.e. a text field into which you can type search terms? For that, you want <input type="search">.
...and the other is marketing copy leading to an FAQ page.
Does this really qualify as a “widget”? If it’s marketing copy “leading” to an FAQ page, that just sounds like a link to me, which has been semantically represented in HTML since version 1 with the <a> element.
HTML is pretty simple, you really don’t want to over-think it. You don’t need specific tags for everything people could possibly give a name to. (What exactly is a “widget”? Isn’t it just a section of the page?) For most things, <section> is fine.
While HTML5 is a big improvement, there's one thing it doesn't fix: The subjectivity of what is considered proper semantics for every situation.
And, I doubt HTML will ever fix that.
If you're already using HTML5 containers for other more obvious parts of the page, I wouldn't sweat these two elements much. You could put the marketing stuff in an aside. Search could be considered a form of nav. But... I don't think bad karma will come your way if you just stick them in a couple of divs, either. ;)
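One possible way to combine the suggestions from both answers is sketched below. The /search and /faq URLs and the field name q are assumptions for the example, and as noted above there is room for disagreement about which containers fit best.

```html
<header>
  <!-- Search widget: a plain form with a search-type input -->
  <form role="search" action="/search" method="get">
    <label for="q">Search</label>
    <input type="search" id="q" name="q">
    <button type="submit">Search</button>
  </form>

  <!-- Marketing copy: tangential to the main content, so an aside with a link -->
  <aside>
    <p>Marketing copy goes here. <a href="/faq">Read the FAQ</a></p>
  </aside>
</header>
```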

Is there a way to make search bots ignore certain text? [closed]

I have a blog (you can see it if you want, from my profile), and it's new, as are Google's crawl results for it.
The results were alarming to me. Apparently the most common 2 words on my site are "rss" and "feed", because I use text for links like "Comments RSS", "Post Feed", etc. These 2 words will be present in every post, while other words will be more rare.
Is there a way to make these links disappear from Google's parsing? I don't want technical links getting indexed. I only want content, titles, descriptions to get indexed. I am looking for something other than replacing this text with images.
I found some old discussions on Google, back from 2007 (I think in 3 years many things could have changed, hopefully this too)
This question is not about robots.txt and how to make Google ignore pages. It is about making it ignore small parts of the page, or transforming the parts in such a way that it will be seen by humans and invisible to robots.
There is a simple way to tell google to not index parts of your documents, that is using googleon and googleoff:
<p>This is normal (X)HTML content that will be indexed by Google.</p>
<!--googleoff: index-->
<p>This (X)HTML content will NOT be indexed by Google.</p>
<!--googleon: index-->
In this example, the second paragraph will not be indexed by Google. Notice the “index” parameter, which may be set to any of the following:
index: content surrounded by “googleoff: index” will not be indexed by Google
anchor: anchor text for any links within a “googleoff: anchor” area will not be associated with the target page
snippet: content surrounded by “googleoff: snippet” will not be used to create snippets for search results
all: content surrounded by “googleoff: all” is treated with all of the above attributes
source
Google keeps the content of HTML elements that have the data-nosnippet attribute out of search result snippets (the text itself can still be indexed):
<p>
This text can be included in a snippet
<span data-nosnippet>and this part would not be shown</span>.
</p>
Source: Special tags that Google understands - Inline directives
I work on a site with top-3 google ranking for thousands of school names in the US, and we do a lot of work to protect our SEO. There are 3 main things you could do (which are all probably a waste of time, keep reading):
Move the stuff you want to downplay to the bottom of your HTML and use CSS and/or JavaScript to place it where you want readers to see it. This won't hide it from crawlers, but they'll value it lower.
Replace those links with images (you say you don't want to do that, but don't explain why not)
Serve a different page to crawlers, with those links stripped. There's nothing black hat about this, as long as the content is fundamentally the same as a browser sees. Search engines will ding you if you serve up a page that's significantly different from what users see, but if you stripped RSS links from the version of the page crawlers index, you would not have a problem.
That said, crawlers are smart, and you're not the only site filled with permalink and rss links. They care about context, and look for terms and phrases in your headings and body text. They know how to determine that your blog is about technology and not RSS. I highly doubt those links have any negative effect on your SEO. What problem are you actually trying to solve?
If you want to build SEO, figure out what value you provide to readers and write about that. Say interesting things that will lead others to link to your blog, and crawlers will understand that you're an information source that people value. Think more about what your readers see and understand, and less about what you think a crawler sees.
Firstly, think about the issue. If Google thinks "RSS" is the main keyword, that may suggest the rest of your content is a bit shallow and needs expanding. Perhaps this should be the focus of your attention. If the rest of your content is rich, I wouldn't worry about the issue, as a search engine should know what the page is about from the title and headings. Just make sure RSS etc. is not in a heading or in a bold or strong tag.
Secondly, as you rightly mention, you probably don't want to use images, as they are not accessible to screen readers without alt text, and if they have alt text or supporting text then you add the keyword back in. However, aria live may help you get around this issue, but I'm not an expert on accessibility.
Options:
Use JavaScript to write that bit of content (maybe ajax it in after load). Search engines like Google can execute JavaScript, but I would guess they won't value JS-written content very highly.
Re-word the content or remove duplicates of it, one prominent RSS feed link may be better than several smaller ones dotted around the page.
Use the CSS content property with the :before or :after pseudo-elements to add your content. I'm not sure whether bots index words in CSS content values or relate them to each page, but it seems unlikely. Putting words like RSS in the CSS basically says it's a style thing, not an HTML thing, so even if engines do index it they won't add much, if any, value to it. For example, for an element with the class add-text, the CSS could be:
.add-text:after { content:'View my RSS feed'; }
Note the above will not work in older versions of IE, so you may need some IE conditional comments if you care about that.
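For completeness, the CSS rule above could be paired with markup along these lines. The /feed.xml URL is a made-up placeholder, and whether crawlers really ignore generated content is, as said, only a guess.

```html
<style>
  /* The visible label comes from the stylesheet, not from the HTML text */
  .add-text:after { content: 'View my RSS feed'; }
</style>

<!-- The link has no text of its own in the HTML source -->
<a class="add-text" href="/feed.xml"></a>
```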
"googleon" and "googleoff" are only supported by the Google Search Appliance (when you host your own search results, usually for your own internal website).
They are not supported by Google's web search at all. So please refrain from using them; I also think that answer should not be marked as correct, as it might cause confusion.
Now, to get Google to exclude part of a page, you will need to place that content in a separate file, such as excluded.html, and use an iframe to display that content in the host page.
The iframe tag grabs content from another file and inserts it into the host page. I think there is no other available method so far.
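A minimal sketch of that iframe approach follows. The file name excluded.html comes from the answer; the robots meta tag inside it is an added assumption, on the reasoning that the separate file itself would otherwise still be indexable under its own URL.

```html
<!-- Host page: regular content plus an iframe for the part to keep out -->
<p>This content stays in the host page and can be indexed.</p>
<iframe src="/excluded.html" title="Feed links" width="300" height="80"></iframe>

<!-- excluded.html (a separate file) -->
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="utf-8">
  <title>Feed links</title>
  <!-- Assumption: ask crawlers not to index this fragment on its own -->
  <meta name="robots" content="noindex">
</head>
<body>
  <a href="/feed.xml" target="_parent">Post Feed</a>
</body>
</html>
```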
The only control that you have over the indexing robots, is the robots.txt file. See this documentation, linked by Google on their page explaining the usage of the file.
You can basically prohibit certain links and URLs, but not necessarily keywords.
Other than black-hat server-side methods, there is nothing you can do. You may want to look at why you have those words so often and remove some of them from the site.
It used to be that you could use JS to "hide" things from googlebot, but you can't now that it parses JS. ( http://www.webmasterworld.com/google/4159807.htm )
Google's crawlers are smart, but the people who program them are smarter. Humans always see what is sensible on a page; they will spend time on a blog that has good, rare and unique content.
It is all about common sense: how people visit your blog and how much time they spend there. Google measures search results in the same way. Your page ranking also increases as daily visits increase and as the site content gets better and is updated every day.
This page repeats the word "Answer" many times. That doesn't mean it won't get indexed. What matters is how useful it is to everyone.
I hope this gives you some ideas.
You have to manually detect Googlebot from the request's user agent and serve it slightly different content than you normally serve to your users.