I've recently been reading a book called Adaptive Web Design, where I came across hCard and hCalendar, so I went to their respective documentation pages. I don't understand how this works. hCard is used to represent people, and the markup goes like this:
<div class="vcard">
<a class="url fn" href="http://tantek.com/">Tantek Çelik</a>
</div>
Now I know these classes have meanings: url indicates that the link takes the user to a web page, fn signifies the formatted name, and so on.
So do these classes tell search engines that the content is an hCard, or does it render differently? Can someone explain how this works, what the benefits are, whether it matters from an SEO point of view, and whether these classes are predefined?
Edit: So are these classes reserved? What if I use them for other elements? And is there any JavaScript I can call on a button click to save a vCard to the user's computer or device?
This concept allows machines to get detailed information about content. It's quite simple: you know what a given name is. Machines do not... :)
So you need a way to tell a machine what kind of data your HTML contains.
For example, you could enrich your data as in the example below and allow, say, an address book application to get detailed information about which fields should be filled.
<div class="vcard">
<a class="url fn" href="http://tantek.com/">
<span class="given-name">Tantek</span>
<span class="family-name">Çelik</span>
</a>
</div>
This snippet allows the address book application to find the given name easily and put it in the correct field. The order of the elements doesn't matter here.
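To make the address book idea concrete, here is a minimal sketch of how a consumer might pull those fields out of the markup, using only Python's standard html.parser module. The names HCardParser, parse_hcard and HCARD_PROPERTIES are made up for this illustration, and only a handful of hCard properties are handled; a real microformats parser covers far more cases.

```python
from html.parser import HTMLParser

# Property class names to collect; a small, illustrative subset of hCard.
HCARD_PROPERTIES = {"fn", "given-name", "family-name"}
# Elements that never have a closing tag, so they must not be pushed on the stack.
VOID_ELEMENTS = {"area", "base", "br", "col", "embed", "hr", "img",
                 "input", "link", "meta", "source", "track", "wbr"}


class HCardParser(HTMLParser):
    """Collects a few hCard properties from inside a class="vcard" element."""

    def __init__(self):
        super().__init__()
        self.card = {}
        self._stack = []  # one (is_vcard_root, property_names) pair per open element

    def _inside_vcard(self):
        return any(is_root for is_root, _ in self._stack)

    def handle_starttag(self, tag, attrs):
        if tag in VOID_ELEMENTS:
            return
        attrs = dict(attrs)
        classes = set((attrs.get("class") or "").split())
        is_root = "vcard" in classes
        props = set()
        if is_root or self._inside_vcard():
            props = classes & HCARD_PROPERTIES
            # The url property takes its value from href, not from the text.
            if "url" in classes and attrs.get("href"):
                self.card["url"] = attrs["href"]
        self._stack.append((is_root, props))

    def handle_endtag(self, tag):
        if tag not in VOID_ELEMENTS and self._stack:
            self._stack.pop()

    def handle_data(self, data):
        # Text belongs to every property element currently open around it.
        for _, props in self._stack:
            for prop in props:
                self.card[prop] = self.card.get(prop, "") + data


def parse_hcard(html):
    """Returns a dict of the properties found, with whitespace normalized."""
    parser = HCardParser()
    parser.feed(html)
    parser.close()
    return {key: " ".join(value.split()) for key, value in parser.card.items()}
```

Because the fields are looked up by class name rather than by position, the given-name and family-name values come out the same whichever span appears first in the markup.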
Test your "Rich Snippets": http://www.google.com/webmasters/tools/richsnippets
If you haven't declared that you're using the hCard syntax (by using the vcard class), then you're free to use whatever class names you'd like. Even if you did start using the hCard microformat, no styles will be applied implicitly, as microformats are not related to display style.
The purpose of using microformats is to open an interface for exposing metadata. By providing the data in a standardized microformat, anyone parsing your website can use the microformat to find relevant information.
Search engines in particular benefit from this as it allows them to provide more information about a particular resource on their results page.
vCard is a standard for electronic business cards. hCard takes the vCard property labels and uses them as class names around data in HTML. Every hCard starts inside a block that has class="vcard".
Some of these properties have subproperties. For example, the 'tel' value contains 'type' and 'value'. This way you can specify separate home and business phone numbers. The 'adr' type has a lot of subproperties (post-office-box, extended-address, street-address, locality, region, postal-code, country-name, type, value).
<div class="vcard">
<div class="fn">xxxxx</div>
<div class="adr">
<span class="locality">yyyy</span>,
<span class="country-name">zzzzz</span>
</div>
</div>
The class names don't have to mean anything within your page. However, you can always take advantage of them to style your contact information. You could also style them in your browser's User Style Sheet, so that you can find them while you surf the web. (Original source)
Regarding the SEO aspects, please check out this article: Tips for Local Search Engine Optimization for Your Site
I don't know hCard and hCalendar specifically, but as an example, look up a Stack Overflow question on Google: you'll see that the time it was posted appears next to the content, and for many sites the name of the author is displayed as well.
In other words, Google will use these microformats to enhance the search experience by providing metadata for the search result, as parsed from the page.
You help Google, they help you.
I'd recommend using http://schema.org/ for structured markup. Google officially recommends it, and it is also supported by Bing and many other search engines. When you use schema.org markup, search engine crawlers extract data entities from it and display them in search results in a corresponding manner.
So yes, there are benefits to using this kind of markup. It can improve how search engine crawlers treat your pages: your content is indexed properly and, more importantly, categorized properly, so it can appear in customized searches.
I was trying to add microformats to my webpage as follows:
<div itemscope itemtype="http://schema.org/Product">
<span itemprop="brand">Company Name</span>
<span itemprop="name">Product Name</span>
<span itemprop="description">Product Description</span>
Product #: <span itemprop="sku">12345</span>
</div>
I thought this microformat would only show up on a Google search results page. But after adding it, the information became visible on my webpage, and not in good shape.
Is something wrong? Or should I use display:none to make it invisible on my webpage?
Microformats are meant to add machine readable meaning to existing content on the page. They're not invisible meta data, they augment content that's already there. So, yes, it'll show up. You can hide or style it via any of the usual ways in which you hide or style content.
You are using Microdata, not Microformats.
Microdata is a syntax to include structured data within HTML5. Ideally you would use your existing content (i.e., add the needed attributes like itemprop etc. to your already existing markup), and only if that’s not possible, the hidden elements meta and link (which are allowed in the body if used for Microdata).
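As a rough illustration of how a consumer could read such markup, the sketch below collects the itemprop values of a single flat item, taking visible text from ordinary elements and the content (or href) attribute from the hidden meta and link elements. MicrodataParser and parse_microdata are hypothetical names, nested items (an itemscope inside another itemscope) are not handled, and a real consumer would follow the full extraction algorithm in the HTML specification.

```python
from html.parser import HTMLParser

# Elements without a closing tag; only meta and link carry values here.
VOID_ELEMENTS = {"br", "hr", "img", "input", "link", "meta"}


class MicrodataParser(HTMLParser):
    """Reads one flat Microdata item: its itemtype and its itemprop values."""

    def __init__(self):
        super().__init__()
        self.itemtype = None
        self.properties = {}
        self._open = []  # itemprop name (or None) for each open non-void element

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        # itemscope is a boolean attribute, so only its presence matters.
        if "itemscope" in attrs and self.itemtype is None:
            self.itemtype = attrs.get("itemtype")
        name = attrs.get("itemprop")
        if tag in VOID_ELEMENTS:
            # meta keeps its value in content, link in href; neither is rendered.
            value = attrs.get("content") or attrs.get("href")
            if name and value:
                self.properties[name] = value
            return
        self._open.append(name)

    def handle_endtag(self, tag):
        if tag not in VOID_ELEMENTS and self._open:
            self._open.pop()

    def handle_data(self, data):
        # Visible text is the value of every property element open around it.
        for name in self._open:
            if name:
                self.properties[name] = self.properties.get(name, "") + data


def parse_microdata(html):
    """Returns (itemtype, properties) with whitespace in values normalized."""
    parser = MicrodataParser()
    parser.feed(html)
    parser.close()
    properties = {k: " ".join(v.split()) for k, v in parser.properties.items()}
    return parser.itemtype, properties
```

Fed the Product markup from the question, with the SKU moved into a meta element, this returns the brand and name from the visible text and the SKU from the content attribute.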
If you don’t want to use your existing markup and the visible content, you could use an alternative syntax: JSON-LD. This gets included as a data block (using the script element), which is not visible by default.
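For the product in the question, a JSON-LD data block could be generated along these lines. This is only a sketch: product_json_ld is a made-up helper, the field values are the placeholders from the question, and representing the brand as a nested Brand object is one reasonable shape among several that schema.org allows.

```python
import json


def product_json_ld(name, brand, description, sku):
    """Builds a script element holding schema.org Product data as JSON-LD."""
    data = {
        "@context": "https://schema.org",
        "@type": "Product",
        "name": name,
        "brand": {"@type": "Brand", "name": brand},
        "description": description,
        "sku": sku,
    }
    payload = json.dumps(data, indent=2, ensure_ascii=False)
    # Escape "<" so a value containing "</script>" cannot close the element early.
    payload = payload.replace("<", "\\u003c")
    return '<script type="application/ld+json">\n' + payload + "\n</script>"


if __name__ == "__main__":
    print(product_json_ld("Product Name", "Company Name",
                          "Product Description", "12345"))
```

The resulting script element can be placed anywhere in the page; browsers do not render it, so nothing extra appears for visitors.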
Don't try to hide your content with styling; it will have a bad impact on your site. You might get penalized for cloaking if you practice it on all of your pages.
If you are trying to let the bots know about some more info that is not on your page, you can try the Data Highlighter in your search console (Webmaster Tools) for simple things, or JSON-LD on your pages for more complicated cases.
Microformats are HTML, used to publish a standard API that is consumed and used by search engines, browsers, and other web sites. Designed for humans first and machines second, microformats are a set of simple, open data formats built upon existing and widely adopted standards. They are a way to enable "smart scraping" of web pages, so that you can create tools and scripts that losslessly extract machine-readable information from cleanly formatted, human-readable HTML. Structured data is the name given to content that is marked up in a specific way, for example with microformats, to explain what that content is about.
It is always recommended to show the Microdata information rather than hide it. You can try to style it into good shape. It will show up in the Google and Bing result pages as well, but you need to wait a little for that. There is nothing wrong with the markup you applied. SEO simply needs some patience.
Microdata with Schema.org already describes any element better than HTML5 does, so don't the HTML5 elements seem redundant? For example:
<nav itemscope itemtype="http://schema.org/SiteNavigationElement">
<!-- might as well just be... -->
<div itemscope itemtype="http://schema.org/SiteNavigationElement">
and
<article itemscope itemtype="http://schema.org/NewsArticle">
<!-- might as well just be... -->
<div itemscope itemtype="http://schema.org/NewsArticle">
Some elements create an "outline" for the webpage, but aside from that what's the point? Why not just use divs and forget about the semantic tags, and just use Microdata and Schema.org?
The schema.org definitions are specifically for applications such as search engines (From What is schema.org?):
This site provides a collection of schemas, i.e., html tags, that
webmasters can use to markup their pages in ways recognized by major
search providers. Search engines including Bing, Google, Yahoo! and
Yandex rely on this markup to improve the display of search results,
making it easier for people to find the right web pages.
Your mark-up needs to be understood by browsers and screen-readers as well as search engines (from the schema.org Getting started page):
Usually, HTML tags tell the browser how to display the information
included in the tag. For example, <h1>Avatar</h1> tells the browser to
display the text string "Avatar" in a heading 1 format. However, the
HTML tag doesn't give any information about what that text string
means—"Avatar" could refer to the hugely successful 3D movie, or it
could refer to a type of profile picture—and this can make it more
difficult for search engines to intelligently display relevant content
to a user.
So microdata allows you to add additional semantic meaning to your mark-up (using definitions provided by schema.org) which can be ignored by applications which don't need it, such as browsers, and read by applications which do, such as search engines.
Microdata is not a replacement for using the appropriate semantic-HTML tags where available, it should be used to augment that information. So the simple reason to use nav and article tags along with the microdata is that these tags have meaning to browsers and screen-readers, while the microdata does not.
Actually, your examples are fairly simplistic. I would suggest you have a look at some of the examples on the schema.org getting started page to see how microdata can be used more meaningfully.
To see microdata being used in practice, try googling yourself and inspecting the results. If I search for myself, the first three results (LinkedIn, GitHub and my portfolio page) all display information marked up using microdata, which Google can pull from the pages and present to the user to provide more meaningful search results.
The vast majority of terms that we have in schema.org have no overlap with HTML terminology, since they represent kinds of real world thing such as places, processes, products etc.
The problem area highlighted here is the small set of terms around http://schema.org/WebPageElement . I am not aware that any current search engine features make specific use of these, and I would suggest that any publishers who do see value in their use should also employ the corresponding pure HTML markup as well.
I've been researching this and haven't found much in terms of standard solutions for creating a 508-compliant, accessible org chart. We have images that represent organizational structure. One option would be to link to an external file that attempts to represent the relationships in the chart (although I'm not sure there's a commonly accepted way to do this in text for a hierarchical tree). Another would be an image map that doesn't actually link to anything but exists just for the labels, which seems much more of a hack. I also just thought of another potential representation: a separate, linked HTML file that is basically a standard nested list, which can represent unlimited hierarchical complexity. Some labeled items are outside the general hierarchy (groupings of various types within the hierarchy, etc.). Has anyone else run into this, or seen examples of how others have approached it?
Section 508 says, regarding web-based intranet and internet information and applications, which is probably what matters here: “(a) A text equivalent for every non-text element shall be provided (e.g., via "alt", "longdesc", or in element content).” Any solution that fulfills the requirement is 508 compliant. Note that this is a legal and formal matter; it does not imply that the content is really accessible.
So you can, for example, write a textual description of the organization (equivalent in content to the image) into the alt attribute. There is no defined upper limit on its length. Alternatively, you can use the longdesc attribute to link to a page containing an equivalent description, which may use all the expressive power of HTML, e.g. nested lists, or a table (which has accessibility requirements of course). Software support to longdesc is limited, if not anecdotal, but Section 508 explicitly mentions this possibility. Most sensibly, you can write a textual description, using HTML markup as needed, either in the page content (in which case you can use alt="") or on a separate page that you link to.
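As a sketch of the nested-list approach, the following turns a small hierarchy into nested ul/li markup that could go in the page or on the linked description page. The (label, children) tuple shape, the function names and the job titles are all assumptions made for the example, not part of any standard.

```python
from html import escape


def org_chart_items(node, indent=1):
    """Returns the HTML lines for one (label, children) node and its subtree."""
    label, children = node
    pad = "  " * indent
    if not children:
        return [f"{pad}<li>{escape(label)}</li>"]
    lines = [f"{pad}<li>{escape(label)}", f"{pad}  <ul>"]
    for child in children:
        # Each reporting level becomes one more level of list nesting.
        lines.extend(org_chart_items(child, indent + 2))
    lines.extend([f"{pad}  </ul>", f"{pad}</li>"])
    return lines


def render_org_chart(root):
    """Wraps the whole hierarchy in a single top-level list."""
    return "\n".join(["<ul>"] + org_chart_items(root) + ["</ul>"])


if __name__ == "__main__":
    chart = ("Director", [
        ("Engineering Lead", [("Developer", [])]),
        ("Design Lead", []),
    ])
    print(render_org_chart(chart))
```

Screen readers typically announce list nesting, so the reporting structure is conveyed without relying on the image.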
For a more specific answer, I think you need to ask a more specific question – like one with a real image representing an org chart.
I'm working toward a deadline that led me to this question more than five years after it was asked. Even now, if somebody hands me a visually presented org chart with no accessible fallback, Jukka's answer offers the best solution I can think of.
But what if we are part of the creation process (which is always the ideal), able to influence accessibility from the start? With well-structured semantic HTML, is it possible that no fallback will be needed? That's what I've gone in search of now, and here are a couple of resources that may be useful to someone in similar need. Both of these are licensed open source, which in both cases (using the MIT License) simply requires keeping the original copyright and license notice in the source code.
Here's a CSS solution proposed by Erin Sullivan.
And here's another that uses the Treeflex CSS library.
I always try to keep content separate from presentation, and CSS offers the possibility for continually customizing, refining and improving the presentation. I expect to use one of these in my current project, and I hope this research benefits others who are committed to better accessibility.
I remember reading an article about a specific set of attributes that could be used to describe the content, type, update frequency etc of elements with dynamic data in them.
The whole idea behind it, was to specify the sections of your Web Applications that have dynamic data in them and provide bots/crawlers with more information.
Just to illustrate the concept, here's some sample code (the attributes I'm using are of course made up):
<div id="username" dataType="string" updateFrequency="rarely">Theo</div>
<div id="score" dataType="integer" updateFrequency="daily">9001</div>
I know this is really vague. Perhaps someone out there knows what I am talking about
Thanks for any help
ARIA (Accessible Rich Internet Applications) attributes are for improving accessibility, not for search engines. They can be used to describe the roles of elements, e.g. saying that a div element is really a progress bar or that a span is for decoration only.
Invented attributes will generally be ignored by all software. According to HTML5 drafts, and in practice, you can use data-* attributes to associate any invisible data with elements, but they are by definition application-specific and ignored by search engines.
Search engines are generally not interested in the update rates of individual elements. They analyze the update rates of pages on the basis of their own observations.
There are ways to associate, in markup, metadata with specific elements, in a manner that may be observed by major search engines; see Schema.org. But in practice, such markup seems to have an impact on search engines only for pages on major commercial or community sites. For Joe Q. Author’s pages, they are probably write-only information (for now).
I know that Google’s search algorithm is mainly based on PageRank. However, it also analyzes the structure of the document (H1, H2, title and other HTML tags) and uses it to enhance the search results.
What is the name of this technique, "using the document structure to enhance the search results"?
And are there any academic papers to help me study this area?
The fact that Google takes the HTML structure into account is well covered in SEO articles; however, I could not find it in academic papers.
I think it's called "Semantic Markup"
[...] semantic markup is markup that is descriptive enough to allow us and the machines we program to recognize it and make decisions about it. In other words, markup means something when we can identify it and do useful things with it. In this way, semantic markup becomes more than merely descriptive. It becomes a brilliant mechanism that allows both humans and machines to “understand” the same information. http://www.digital-web.com/articles/writing_semantic_markup/
A more practical article here
http://robertnyman.com/2007/10/29/explaining-semantic-mark-up/
SEO has become almost a religion to some people where they obsess about minutiae. Frankly, I'm not convinced that all this effort is justified.
My advice? Ignore what so-called pundits say and just follow Google's guidelines.
You might be looking for an academic answer but honestly, this isn't an academic question beyond the very basics of how Web indexing works. The reality of a modern page indexing and ranking algorithm is far more complex.
You may want to look at one of the earlier works on search engines. Note the authors' names. You may also want to read Google Patent application 20050071741.
These general principles aside, Google's search algorithm is constantly tweaked based on actual and desired results. The exact workings are a closely guarded secret just to make it harder for people to game the system. Much of the "advice" or descriptions on how Google's search algorithm works is pure supposition.
So, apart from having a title and having well-formed and valid HTML, I don't think you're going to find what you're looking for.
Google very deliberately doesn't give away too much information about its search algorithm, so it's unlikely you will find a definitive answer or academic paper that confirms this. If you're interested from an SEO point of view, just write your pages so they are good for humans and the robots will like them too.
To make a page good for humans, you SHOULD use tags such as h1, h2 and so on to create a hierarchical page outline, a bit like this:
h1 "Contact Us"
...h2 "Contact Details"
......h3 "Telephone Numbers"
......h3 "Email Addresses"
...h2 "How To Find Us"
......h3 "By Car"
......h3 "By Train"
The difficulty with your question is that if you put something in your h1 tag hoping that it would increase your position in Google, but it didn't match up with other content on your page, you could look like you are spamming. Similarly, if your page is made up of too many headings and not enough actual content, you could look like you are spamming. It's not as simple as add a h1 and h2 tag and you'll go up! That's why you need to write websites for humans, not robots.
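If you want to check what outline your headings actually produce, a small script can list them in the same indented form as above. This is a minimal sketch built on Python's standard html.parser; HeadingOutline and outline are names invented here, and it says nothing about how any search engine reads the page.

```python
from html.parser import HTMLParser


class HeadingOutline(HTMLParser):
    """Records the level and text of each h1-h6 element, in document order."""

    def __init__(self):
        super().__init__()
        self.headings = []  # list of (level, text) pairs
        self._level = None  # level of the heading currently open, if any
        self._text = []

    def handle_starttag(self, tag, attrs):
        if len(tag) == 2 and tag[0] == "h" and tag[1] in "123456":
            self._level = int(tag[1])
            self._text = []

    def handle_endtag(self, tag):
        if self._level is not None and tag == f"h{self._level}":
            text = " ".join("".join(self._text).split())
            self.headings.append((self._level, text))
            self._level = None

    def handle_data(self, data):
        if self._level is not None:
            self._text.append(data)


def outline(html):
    """Returns one line per heading, indented with three dots per level."""
    parser = HeadingOutline()
    parser.feed(html)
    parser.close()
    return ["..." * (level - 1) + text for level, text in parser.headings]


if __name__ == "__main__":
    page = ("<h1>Contact Us</h1><h2>Contact Details</h2>"
            "<h3>Telephone Numbers</h3><h2>How To Find Us</h2>")
    print("\n".join(outline(page)))
```

A page whose outline reads sensibly on its own, without the body text, is usually one that is structured well for human readers too.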
I have found this paper:
A New Study on Using HTML Structures to Improve Retrieval
however it is an old paper 1999,
still looking for more recent papers.
Check out
http://jcmc.indiana.edu/vol12/issue3/pan.html
http://www.springerlink.com/content/l22811484243r261/
Some time spent on scholar.google.com might help you find what you are looking for
You can also try searching the 'Computer Science' section of arXiv: http://arxiv.org for "search engine" and the various terms that others have suggested.
It contains many academic papers, all freely available... hopefully some of them will be relevant to your research. (Of course the caveat of validating any paper's content applies.)
Like cletus said, follow the Google guidelines.
I did a few tests and came to the conclusion that the title, image alt text and heading tags are the most important. Google AdSense is also worth mentioning: I had the feeling that if you implement it, the rank of your site increases.
I believe what you are interested in is called structural fingerprinting, which is often used to determine the similarity of two structures. In Google's case, that would mean applying a weight to different tags and feeding the result to a secret algorithm that (probably) uses the frequencies of the different elements in the fingerprint. This is deeply rooted in information theory; if you are looking for academic papers on information theory, I would start with "A Mathematical Theory of Communication" by Claude Shannon.
I would also suggest looking at Microformats and RDF. Both are used to enhance searching. These are mostly search engine agnostic, but there are some engine-specific things as well. For Google-specific guidelines for HTML content, read this link.
In short; very carefully. In long:
Quote from The Anatomy of a Large-Scale Hypertextual Web Search Engine:
[...] This gives us some limited
phrase searching as long as there are
not that many anchors for a particular
word. We expect to update the way that
anchor hits are stored to allow for
greater resolution in the position and
docIDhash fields. We use font size
relative to the rest of the document
because when searching, you do not
want to rank otherwise identical
documents differently just because one
of the documents is in a larger
font. [...]
It goes on:
[...] Another big difference between
the web and traditional well controlled collections is that there
is virtually no control over what
people can put on the web. Couple
this flexibility to publish anything
with the enormous influence of search
engines to route traffic and companies
which deliberately manipulating search
engines for profit become a serious
problem. This problem that has not
been addressed in traditional closed
information retrieval systems. Also,
it is interesting to note that
metadata efforts have largely failed
with web search engines, because any
text on the page which is not directly
represented to the user is abused to
manipulate search engines. [...]
The Challenges in a web search engine addresses these issues in a more modern fashion:
[...] Web pages in HTML fall into the middle of this continuum of structure in documents, being neither close to free text nor to well-structured data. Instead HTML markup provides limited structural information, typically used to control layout but providing clues about semantic information. Layout information in HTML may seem of limited utility, especially compared to information contained in languages like XML that can be used to tag content, but in fact it is a particularly valuable source of meta-data in unreliable corpora such as the web. The value in layout information stems from the fact that it is visible to the user [...]:
And adds:
[...] HTML tags can be analyzed for what semantic information can be inferred. In addition to the header tags mentioned above, there are tags that control the font face (bold, italic), size, and color. These can be analyzed to determine which words in the document the author thinks are particularly important. One advantage of HTML, or any markup language that maps very closely to how the content is displayed, is that there is less opportunity for abuse: it is difficult to use HTML markup in a way that encourages search engines to think the marked text is important, while to users it appears unimportant. For instance, the fixed meaning of the tag means that any text in an H1 context will appear prominently on the rendered web page, so it is safe for search engines to weigh this text highly. However, the reliability of HTML markup is decreased by Cascading Style Sheets which separate the names of tags from their representation. There has been research in extracting information from what structure HTML does possess. For instance, [Chakrabarti et al., 2001; Chakrabarti, 2001] created a DOM tree of an HTML page and used this information to increase the accuracy of topic distillation, a link-based analysis technique.
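To illustrate the idea in that passage, here is a toy sketch that counts the words on a page and gives more weight to words inside headings or bold text. The weights in TAG_WEIGHTS are invented for the example, WeightedTerms and score_terms are hypothetical names, and none of this reflects how Google or any other real search engine scores pages.

```python
import re
from collections import Counter
from html.parser import HTMLParser

# Made-up weights, for illustration only; not taken from any search engine.
TAG_WEIGHTS = {"title": 5.0, "h1": 4.0, "h2": 3.0, "h3": 2.0,
               "b": 1.5, "strong": 1.5}
DEFAULT_WEIGHT = 1.0


class WeightedTerms(HTMLParser):
    """Sums a per-word score, weighting each word by its enclosing tags."""

    def __init__(self):
        super().__init__()
        self.scores = Counter()
        self._open = []  # weighted tags currently open, innermost last

    def handle_starttag(self, tag, attrs):
        if tag in TAG_WEIGHTS:
            self._open.append(tag)

    def handle_endtag(self, tag):
        # Assumes well-formed nesting of the weighted tags.
        if self._open and self._open[-1] == tag:
            self._open.pop()

    def handle_data(self, data):
        # A word gets the highest weight among the tags that enclose it.
        weight = max((TAG_WEIGHTS[t] for t in self._open), default=DEFAULT_WEIGHT)
        for word in re.findall(r"[a-z0-9]+", data.lower()):
            self.scores[word] += weight


def score_terms(html):
    """Returns a Counter mapping each lowercased word to its weighted count."""
    parser = WeightedTerms()
    parser.feed(html)
    parser.close()
    return parser.scores
```

With these weights, a word that appears once in an h1 and once in body text scores 5.0, while a word that appears only in body text scores 1.0.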
There are a number of issues a modern search engine needs to combat, for example web spam and blackhat SEO schemes.
Combating webspam with trustrank
Webspam taxonomy
Detecting spam web pages through content analysis
But even in a perfect world, e.g. after eliminating the bad apples from the index, the web is still an utter mess because no two sites have identical structures. There are maps, games, video, photos (flickr) and lots and lots of user-generated content. In other words, the web is still very unpredictable.
Resources
Hypertext and the web:
Extracting knowledge from the World Wide Web
Rich media and web 2.0
Thresher: automating the unwrapping of semantic content from the World Wide Web
Information retrieval
Webspam papers
Combating webspam with trustrank
Webspam taxonomy
Detecting spam web pages through content analysis
To keep it painfully simple: make your information architecture logical. If the most important elements for user comprehension are highlighted with headings and grouped logically, then the document is easier to interpret using information processing algorithms. Magically, it will also be easier for users to interpret. Remember that the search engine algorithms were written by people trying to interpret language.
The Basic Process is:
Write well structured HTML - using header tags to indicate the most critical elements on the page. Use logical tags based on the structure of your information. Lists for lists, headers for major topics.
Supply relevant alt attributes and names for any visual elements, and then use simple CSS to arrange these elements.
If the site works well for users and contains relevant information, you don't risk becoming a blacklisted spammer, and search engine algorithms will favor your page.
I really enjoyed the book Transcending CSS
for a clean explanation of properly structured HTML.
I suggest trying Google scholar as one of your avenues when looking for academic articles
semantic search
I found it interesting that, with no meta keywords or description provided, in a scenario like this:
<p>Some introduction</p>
<h1>headline 1</h1>
<p>text for section one</p>
the "text for section one" is always shown on the search results page.
A newer option from Google is the canonical link element (rel="canonical"), which can now also be used.