How to tell google this text is part of another article - html

After every article on my website there are previews of other articles, chosen at random.
The problem is that the previews are really big: each has a headline, a subheadline, and 6 rows of text. Sometimes Google thinks they are part of my article.
Is there any way to tell Google that this div contains text from another article?
preview example:

By using the appropriate semantic markup that HTML5 offers, user agents (like Google) would, in principle, be able to understand this; but that, of course, doesn’t necessarily mean that they (currently) support (all of) this.
The teasers should be outside of the main element.
Signal: It’s not part of this page’s main content.
The teasers should be in an aside element.
Signal: It’s only "tangentially related" to the page’s content.
Each teaser should be in its own article element.
Signal: It’s a self-contained item of content.
Each teaser’s link (to the full article) should get the bookmark link type.
Signal: The permalink URL of the teaser/article is not the same as the current page’s URL.
(One could also consider using the blockquote element for the parts taken over literally, i.e., in cases where the teaser doesn’t contain (slightly) different content, like a summary. But it depends on your understanding of your content, if you really quote here.)
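A minimal sketch of how the four signals above could fit together on one page. The URLs, headings, and text are placeholders, and there is no guarantee that any search engine acts on this structure:

```html
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="utf-8">
  <title>Current article</title>
</head>
<body>
  <!-- The page's own content is the only thing inside main -->
  <main>
    <article>
      <h1>Current article</h1>
      <p>Full text of the article this page is about.</p>
    </article>
  </main>

  <!-- Teasers sit outside main, grouped in an aside -->
  <aside>
    <h2>More articles</h2>

    <!-- Each teaser is its own article with a bookmark link to its permalink -->
    <article>
      <h3><a href="/articles/other-article" rel="bookmark">Other article headline</a></h3>
      <p>Subheadline and the first rows of the other article.</p>
    </article>

    <article>
      <h3><a href="/articles/another-article" rel="bookmark">Another article headline</a></h3>
      <p>Subheadline and the first rows of another article.</p>
    </article>
  </aside>
</body>
</html>
```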
However, that doesn’t stop Google from showing parts of the teasers in its SERPs (if its algorithms deem it useful, get confused, or whatever). Without "hacks" (e.g., JS or an iframe), it is neither possible nor intended to hide parts of a page from Google Search and its SERPs.

Wrap the preview article div in
<!--googleoff: all-->
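<!--googleon: all-->
That tells the crawler not to index that part of your page. Note that these tags were documented for the Google Search Appliance; Google's public web search is not documented to honor them, so do not rely on them there.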
You can customize the tag to your preference:
index: content surrounded by “googleoff: index” will not be indexed by Google
anchor: anchor text for any links within a “googleoff: anchor” area will not be associated with the target page
snippet: content surrounded by “googleoff: snippet” will not be used to create snippets for search results
all: content surrounded by “googleoff: all” is treated with all attributes: index, anchor, and snippet
(Source)
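As a sketch, wrapping the preview block could look like this. The class name article-preview is a placeholder, and the comments only have an effect on a crawler that actually honors googleoff/googleon:

```html
<article>
  <h1>Current article</h1>
  <p>Text of the article this page is about.</p>
</article>

<!--googleoff: all-->
<div class="article-preview">
  <h2>Headline of another article</h2>
  <p>Subheadline and the first rows of that article.</p>
</div>
<!--googleon: all-->
```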

Related

div tag and nav tag uses in HTML [duplicate]

Why use HTML5 semantic tags like header, section, nav, and article instead of simply using a div with the preferred CSS applied?
I created a webpage and used those tags, but they made no visible difference compared to div. What is their main purpose?
Is it only about giving the tags appropriate names, or is there more to it?
Please explain. I have gone through many sites, but I could not find these basics.
The Oxford Dictionary states:
semantics: the branch of linguistics and logic concerned with meaning.
As their name says, these tags are meant to improve the meaning of your web page. Good semantics plays an important role in the automated processing of documents. This automated processing happens more often than you realize: every search engine's ranking of a website is derived from automated processing of all the websites out there.
If you visit a (well designed) web page, you as the human reader can immediately (visually) distinguish all the page elements and more importantly understand the content. In the top left you see the company logo, next to it is the site navigation, there is a search bar and some text about the company, a link to a product you can buy and a legal disclaimer at the bottom.
However, machines are dumb and cannot do this:
Looking at the same page as you, all the web crawler would see is an image, a list of anchor tags, a text node, an input field and an image with a link on it. At the bottom there is another text node.
Now, how should it know which part of the document you intended to be the navigation, the main article, or some not-so-important footnote? It can guess by analyzing your document structure, using common criteria that hint at a specific kind of element.
E.g. an ul list of internal links is most likely some kind of page navigation and the text at the end of the document is something necessary but not so important to the everyday viewer (the legal disclaimer).
Now imagine instead of a plain div, a nav element would be used – the machine immediately knows what the purpose of this element is:
<!-- machine: okay, this structure looks like it might be a navigation element? -->
<div><ul><li><a href="internal_link">...</a></li></ul></div>
<!-- machine: ah, a navigation element! -->
<nav><ul><li><a href="internal_link">...</a></li></ul></nav>
Now the text inside a main tag – this is clearly the most important information of the page! Over there to the left, that text node, the image and the anchor node all belong together, because they are grouped inside a section tag, and down at the bottom there is some text inside a footer element (they still don't know the meaning of that text, but now they can deduce it's some sort of fine print).
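To make the description above concrete, here is a rough skeleton of such a page. The company name, links, and text are invented for illustration; only the element names matter:

```html
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="utf-8">
  <title>Example company</title>
</head>
<body>
  <header>
    <img src="logo.png" alt="Example company logo">
    <!-- Site navigation, no guessing needed -->
    <nav>
      <ul>
        <li><a href="/products">Products</a></li>
        <li><a href="/about">About</a></li>
      </ul>
    </nav>
  </header>

  <!-- The most important information on the page -->
  <main>
    <!-- Text, image and link that belong together -->
    <section>
      <h1>About the company</h1>
      <p>Some text about the company.</p>
      <a href="/products/widget"><img src="widget.png" alt="Buy the widget"></a>
    </section>
  </main>

  <!-- Fine print -->
  <footer>
    <p>Legal disclaimer.</p>
  </footer>
</body>
</html>
```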
Example:
You, as the user (reading a page without seeing the actual markup), don't care if an element is enclosed in an <i> or <em> tag. In most browsers both of these tags will be rendered identically – as italic text – and as long as it stands out between the surrounding text it serves its purpose.
However, there is a big difference in terms of semantics:
<i> means italic - it's simply a presentational hint for the browser on how to render the text (italic) and does not necessarily carry deeper semantic information.
<em> means emphasis - it indicates a piece of information that should be stressed. The browser is no longer bound to an italic instruction; it could render the text in italic, bold, underlined, or in a different color... For visually impaired users, a screen reader can raise its voice - whatever method seems best suited in a specific situation to emphasize this information.
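A small illustration of the distinction, with invented sentences. Both lines typically render in italics, but only the second one tells a machine that the word is stressed:

```html
<!-- Italic by convention (a ship's name); no stress is implied -->
<p>We sailed on the <i>Queen Mary</i> last summer.</p>

<!-- Emphasis: stressing this word changes how the sentence should be read -->
<p>You <em>must</em> save the file before closing the editor.</p>
```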
Final thought:
Semantic tags are not the end. There are things like metadata, ontologies, resource description languages which go a step further and help connect data between different web pages and can even help create new knowledge!
E.g. Wikipedia is doing a really bad job at presenting data semantically.
https://en.wikipedia.org/wiki/Barack_Obama
https://en.wikipedia.org/wiki/Donald_Trump
https://en.wikipedia.org/wiki/Joe_Biden
All three are persons who at some point in time were president of the USA.
All three articles contain a sidebar that displays this information, and you can compare them (by opening the pages and switching back and forth), but they are not semantically described.
Instead, suppose Wikipedia used an ontology to describe a person: http://dbpedia.org/ontology/Person
<!-- President is a subclass of Politician which is a subclass of Person -->
<President>
<birthname>Barack Hussein Obama II</birthname>
<birthdate>1961-08-04</birthdate>
<headOf>country::USA</headOf>
<tenure>2009-01-20 – 2017-01-20</tenure>
</President>
Not only could you (and machines) now compare those three directly (on a dynamically generated page!), but you could even create new knowledge, e.g. show a list of all presidents of the United States. That one is quite boring, but there is also cool stuff like: who are all the current world leaders, how many female world leaders do we have, who is the youngest leader, how many types of leaders are there (presidents/emperors/queens/dictators), who served the longest, how many of them are taller than 175 cm and have brown eyes, etc.
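The markup above is pseudo-XML. As one possible real-world counterpart, here is a sketch using JSON-LD with the schema.org vocabulary embedded in an HTML page. This is a different vocabulary from the DBpedia ontology mentioned above, and it is not how Wikipedia actually publishes its data; the tenure is left out because it would need additional modeling:

```html
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Person",
  "name": "Barack Obama",
  "alternateName": "Barack Hussein Obama II",
  "birthDate": "1961-08-04",
  "jobTitle": "44th President of the United States"
}
</script>
```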
In conclusion, good semantics is super cool (but also – on a technical level – hard to achieve and maintain).
There's a nice little article on HTML5 semantics on HTML5Doctor.
Semantics have been a part of HTML in some form or another. It helps you understand what's happening where on the page.
Earlier when <div> was used for pretty much everything, we still implemented semantics by giving it a "semantic" class name or an id name.
These tags help in proper structuring and understanding of the layout.
If you do,
<div class="nav"></div>
as opposed to,
<nav></nav>
OR
<div class="sidebar"></div>
as opposed to,
<aside></aside>
there's nothing wrong with either, but the semantic element (nav, aside) is more readable for you as well as for crawlers, screen readers, etc.
With a div you have to give an id (or class) that tells what kind of content it holds: body, header, footer, etc.
With the semantic elements of HTML5, the element name itself states what kind of content it holds and which part of the website it is for.
Semantic elements are <header>, <footer>, <section>, <aside>, etc.

Preserving good semantics with repetitive content

Say I'm building a typical document editor:
Where the preview (in red) is an up-to-date, formatted view of the form's data.
The preview element contains semantic elements (e.g. h1, h2, main, header, etc.). It's kind of a document in itself, which does make sense, conceptually. But this makes the structure of the real document quite confusing for crawlers and screen readers. There might be, for instance, two h1 or main elements. I'm looking for a way to avoid that.
Plus, there's the problem of repetitive content (see image).
For the accessibility part of the problem, I could just add an aria-hidden="true" attribute to the preview element. In fact, visually-impaired people don't need the preview, it's just redundancy to them, they just need the form.
But for crawlers, here are my options:
Don't use semantic elements inside the preview element, use divs instead (😥).
Host the preview at another URL and insert it via an iframe (that's what I'm doing right now, but it seems hacky to me).
Leave it like that, crawlers don't care.
Any idea/resource/suggestion?
As long as your preview area is clearly indicated for assistive technology, it's perfectly fine to have redundant information. If you have an <iframe>, make sure there's a title attribute on it.
<iframe title="preview area"...>
However, you might have validator issues with multiple structure elements.
For example, HTML only allows one <main> element:
A document must not have more than one main element that does not have the hidden attribute specified.
You can have multiple <header> elements but a <header> has a default role of banner and the banner role says:
Within any document or application, the author SHOULD mark no more than one element with the banner role.
The key here is "should", meaning it's a strong recommendation but not required. You can also get away with multiple banner roles if your preview section has role="document".
I would recommend against replacing the semantic elements with non-semantic ones (div), because an assistive technology user might want to check the actual semantic structure of what's generated. I suppose you could also offer a "show in new tab" option for the preview that uses full semantics, much like your second option but without the iframe.
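A sketch of the editor page under these suggestions. It assumes the preview is served from a separate, hypothetical URL (preview.html) that carries its own h1, main and header, so the editor page itself keeps a single main element:

```html
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="utf-8">
  <title>Document editor</title>
</head>
<body>
  <main>
    <h1>Edit document</h1>

    <form>
      <label for="doc-title">Title</label>
      <input id="doc-title" name="title">
    </form>

    <!-- The title attribute names the frame for assistive technology -->
    <iframe title="Preview area" src="preview.html"></iframe>

    <!-- Alternative to the iframe: open the fully semantic preview on its own -->
    <a href="preview.html" target="_blank">Show preview in new tab</a>
  </main>
</body>
</html>
```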

Semantically, must text which visually looks like a heading use h1-h6 tags?

I have a page which contains a list of items as its content.
When no items exist, the design which I am to implement has a rather large heading reading something like:
'No results for this topic'
Now initially when I saw the design I instinctively wrapped the 'No results' text in a <h2> tag.
Afterwards I noticed that although I included meta content for title and description - Google displayed the 'no results' text as the title in search results - clearly not being the desired result.
Now on one hand I want to stick to semantic markup, but on the other I don't want it to mess up my SEO.
So my question is: Do I really need to use a <h2> element here for semantic markup?
True, the designer decided to display the text to look like a heading - but does this mean semantically that this is a heading?
Just for fun, I checked what Google does when you enter a search phrase with no results:
Result:
The 'No results' isn't displayed like a heading and (hence) not within a h1-h6 tag.
Disclaimer: I tried searching for an answer at W3C here and here but that didn't really help me here.
Edit: I meant the 'No results' to be an example. Actually, I had similar cases where Google picked up other pieces of not-so-relevant text (which I had wrapped in a <h2> because of the design) as the title - even when the page contained many items.
I think such a message shouldn't be in an h2 tag. But other factors also determine what Google will display. Title, description and keywords should all vary between pages, but even that doesn't guarantee Google will use them.
In fact, Google wants to be smarter than we are. For the English version of one of my main pages, Google used the logo's alt text as the page title even though the title is unique, so the result is now displayed as "mainpage - logo" instead of the normal title.
If I were you, I would change "no results" from h2 to regular text, for example a p. You should also consider whether those pages need to be indexed at all.
Google's "guidelines" change very often, and it can even penalize you if you have many subpages that in fact have no content.
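A sketch of the h2-to-p change suggested above: the message keeps its large appearance through CSS but is no longer a heading in the document outline. The class name empty-state and the topic heading are placeholders, and this does not guarantee which text Google picks as the title:

```html
<style>
  /* Visual size only; the element stays a plain paragraph */
  .empty-state {
    font-size: 1.5em;
    font-weight: bold;
  }
</style>

<h1>Topic: gardening</h1>
<p class="empty-state">No results for this topic</p>
```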
-- after editing question --
First check that your meta tags are unique across your pages. That includes search result pages, index pages, pagination pages and so on. As I wrote above, there is no guarantee that Google uses them at all. Google can take any part of your site and display it in search results as the title or description.
A sitemap has no impact on what Google (or other search engines) indexes. It only helps search engines index pages faster, for example pages that are deep in the structure. For sub-pages you don't want indexed, use this in the HTML head:
<meta name="Robots" content="noindex,nofollow" />
to stop search engines that respect this rule from indexing it (of course many crawlers and spam spiders don't respect it). After the change, it takes some time for Google to deindex the page. That depends, of course, on the size of the site and how often Google's spider visits your website.

How should I use html5 elements with modal sections of my web page?

I'm pretty inexperienced as far as html goes and even less so with html5.
I have a question regarding modal popups - page sections that are interacted with using javascript/ajax, but not necessarily displayed on the page all the time. These are not generally in the main html flow - I might for instance place all my modal code at the end of the page for maintainability. The question is - should I be declaring these chunks of the page using html section tags, or something else?
To shed more light on the situation I'm describing, I have an application page. This contains a number of sections (I'm not referring to html5 here). The first section is modal on entering the page - it's a "click to continue if you agree" section. The next 5 chunks belong to a stepped application form - each step is displayed one at a time using a multiview control. Then another modal - a UI block, followed by a final decision section.
Since they are modal, and appear out of the flow, it is probably most suitable to use a div for them. If you do want to use a semantic block, then which you use will depend on what the content is, and how it relates to the rest of the page. The following articles should help you make that decision:
http://html5doctor.com/the-section-element/
http://html5doctor.com/the-article-element/
http://html5doctor.com/avoiding-common-html5-mistakes/ (particularly the first section of that article - "Don’t use section as a wrapper for styling")
The question is - should I be declaring these chunks of the page as sections, or something else
One of the big advantages of HTML5 is that it's semantically readable. If you feel that your modal popups are better described by something like an article tag, then use an article. Use the tag you feel most accurately describes your functionality.
For example, let's say I have a sample page like so:
<html>
<head></head>
<body>
<article>
<!-- Some stuff here -->
</article>
</body>
</html>
I would expect the content of that article tag to fit this definition:
The article element represents a component of a page that consists of a self-contained composition in a document, page, application, or site and that is intended to be independently distributable or reusable, e.g. in syndication. This could be a forum post, a magazine or newspaper article, a blog entry, a user-submitted comment, an interactive widget or gadget, or any other independent item of content.
W3C Specification. The Article Element.
Note: In this context, an article is designed to represent flow content. Given that your aim is not to write flow content (as you correctly put) this is not a good example. This is very clear from the definition I've provided.
Similarly, if I replaced article with section, I would expect it to fit this definition:
Examples of sections would be chapters, the various tabbed pages in a tabbed dialog box, or the numbered sections of a thesis. A Web site's home page could be split into sections for an introduction, news items, and contact information.
W3C Specification. The Section Element
If I were you I would have a look through the spec and think about the following questions:
What does my content actually mean to the user?
How will my content appear to other programmers?
Does the use of this content give me a hint at the correct semantics?
It depends what you have in your modal.
You could have a login form, subscription stuff, advertisements, articles, or a frame of another page, so it only makes sense to use <section> if the modal is actually a meaningful section of the page. For example, if you have an article and want to display the author info in a modal box, then I would say it is acceptable to use <section>.
So overall, if it is part of the content then it sounds OK to use that; if it is not, you should use a <div>.
I would also say that no one has the answer to this, as it is purely a matter of opinion and, quite frankly, doesn't matter.
There is also another way to incorporate modals. As they depend on JavaScript, you could load the popup contents via AJAX without having them in the document flow. A recent project I worked on first renders links to a normal, complete HTML page for the popup contents (e.g. contact forms). If JS is enabled, a parameter is added to the links so that only the main content is loaded via AJAX, without header, menu and sidebars.
As the modal content does not really belong to your site content (if it did, it shouldn't be a popup but part of the document's main content), it shouldn't be marked up with a section, main or article tag. Instead use a div to render the popups, or use an iframe if that is admissible for your project.
It doesn't really matter what tag is used for a modal, as long as it's appropriate for the purpose (don't use a <fieldset>, for example). Usually we see a <div> representing a modal.
You can use the role attribute for semantic information about the purpose of an element. In this case role="dialog" would be appropriate. You can find more info on the role attribute in HTML5 here.
Also note ARIA attributes: they enhance accessibility. For example, aria-hidden="true" specifies that the element is hidden from assistive technology. Screen readers use this to skip the content.
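A minimal sketch of a modal marked up along these lines, placed at the end of the body as described in the question. The ids and text are placeholders, and the script that shows the dialog is assumed to exist elsewhere; it would remove the hidden attribute and manage focus:

```html
<!-- A plain div; the role carries the semantics -->
<div id="terms-modal"
     role="dialog"
     aria-modal="true"
     aria-labelledby="terms-title"
     hidden>
  <h2 id="terms-title">Terms and conditions</h2>
  <p>Click to continue if you agree.</p>
  <button type="button">I agree, continue</button>
</div>
```

The hidden attribute keeps the closed dialog away from both sighted users and assistive technology. If a script hides the dialog with CSS only, toggling aria-hidden="true" alongside it would serve the purpose described above.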

<nav> vs <article> for SEO

In terms of SEO, if I want to group relevant page content together to maximize search engine readability, should I use the tag <nav> or <article>?
1) It's not there yet.
2) If it was, and you were wrapping menus as article, or wrapping affiliate link-farms as article, Google would slap you (keep that in mind in three or four years).
3) If you have lots of legitimate content, and each piece of content is self-contained (ie: suitable for article), then not only should you wrap it in an article tag, but you should also learn how to use Google's "Rich Snippet Tool", which was recently renamed "Structured Data Tool".
If you learn how to mark things up, both in an html5-friendly way, and in a Google-friendly microformat, then GoogleBot will grab all of the content it knows how, and it will be displayed in search results and elsewhere, when relevant.
Like I said, that's if you've got content which is worthy of doing this, because otherwise Google will slap you, eventually, if you try to use it for evil.
article tag:
The <article> tag lets you mark separate entries in an online publication, such as a blog or a magazine. It is expected that marking articles with the <article> tag will make the HTML code cleaner, because it reduces the need for <div> tags. Search engines will also probably put more weight on the text inside the <article> tag than on the content in other parts of the page.
nav tag: Navigation is one of the important factors for SEO, and everything that eases navigation is welcome. The new <nav> tag can be used to identify a collection of links to other pages.
So both tags have their own function and can be used as needed.
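A short sketch of how the two tags divide the work on a blog page. The links and post text are invented, and whether a search engine weights either element differently is an assumption, not something this markup guarantees:

```html
<body>
  <!-- nav: a collection of links to other pages -->
  <nav>
    <ul>
      <li><a href="/">Home</a></li>
      <li><a href="/blog">Blog</a></li>
      <li><a href="/contact">Contact</a></li>
    </ul>
  </nav>

  <main>
    <!-- article: one self-contained entry each -->
    <article>
      <h2>First post</h2>
      <p>Text of the first post.</p>
    </article>
    <article>
      <h2>Second post</h2>
      <p>Text of the second post.</p>
    </article>
  </main>
</body>
```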