We have a hybrid slideshow, meaning each slide has its own unique URL, yet you can click through the whole slideshow without refreshing the page. To achieve that, each unique URL carries a JSON blob with the data for all the other slides, such as headers, subheaders, captions and image URLs.
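Roughly, the setup looks like this (a simplified sketch using the HTML5 History API; the element IDs, field names and data shape are illustrative):

// Simplified sketch: the JSON embedded on each slide's URL describes every
// slide, so the rest of the show can be rendered without a page refresh.
var slides = JSON.parse(document.getElementById('slides-data').textContent);

function showSlide(index) {
  var slide = slides[index];
  document.getElementById('slide-header').textContent = slide.header;
  document.getElementById('slide-subheader').textContent = slide.subheader;
  document.getElementById('slide-caption').textContent = slide.caption;
  document.getElementById('slide-image').src = slide.imageUrl;
  // Give the slide its own URL without reloading the page.
  history.pushState({ index: index }, slide.header, slide.url);
}

// Re-render the right slide when the visitor uses the back/forward buttons.
window.addEventListener('popstate', function (event) {
  if (event.state) showSlide(event.state.index);
});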
Would that affect SEO negatively? Would Google read the JSON? And if so, would they read it as redundant data?
As things stand, this "one-page" architecture will negatively impact your ability to rank on Google.
Here are two public perceptions of how Google handles JavaScript:
1) Google can crawl some content served with JavaScript, but does not prioritize it as highly in SERPs (search engine results pages) as content served as plain HTML pages.
Some technologies, such as the screen readers used by blind visitors, do not process JavaScript well, which keeps applications of this kind from reaching more than 95% of the internet-connected population.
2) Google can crawl JavaScript, but has not yet weighted it in its algorithm because the technology has not yet fully matured.
I am personally in the first camp and believe that JavaScript-driven sites won't be ranked as highly in SERPs until the ancillary browsing technologies that support niche user groups have excellent JavaScript support.
Only time will tell :)
How can we add blank alt tags to the Google map tiles (generated by v3 api) so that they do not lower our accessibility score?
It would be unwise to attempt to do what you are suggesting. By attempting to "improve" the score that your automated tool is giving you, you would almost certainly be degrading accessibility for actual humans.
The issue here is that embedded Google Maps are not accessible for non-sighted visitors, full-stop. Doing hacky things with JavaScript won't fix that. To the best of my knowledge, none of the major interactive maps are very accessible. Here are a few of the reasons why.
If you genuinely are concerned about the accessibility of your webpage, and not just an arbitrary number that some tool gives you, then there are a few things you can do:
Understand that non-sighted visitors won't be able to use interactive maps. Offer alternatives instead, like text directions. Clearly state any relevant addresses in the text of the page.
If your page contains embedded maps, you may wish to hide that content from screen reader users with the aria-hidden attribute (see the sketch below).
The Google Maps web interface offers a reasonably good level of accessibility when it comes to directions between two points. The directions URLs are of the format: https://www.google.com/maps/dir/?api=1&destination=Rockefeller+Plaza,+New+York,+NY+10111.
Use techniques that make use of special markup that is only announced to screen reader users.
Keep in mind that you are creating webpages for actual humans, not robots.
Test your pages using free tools such as NVDA or VoiceOver, in more than one browser.
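To make a couple of the points above concrete, here is a small sketch; the element ids are placeholders, and the destination is just the example address from the URL above:

// Hide the embedded Google Map from screen readers. The attribute can also
// be written directly on the wrapper element in the HTML.
document.getElementById('map-embed').setAttribute('aria-hidden', 'true');

// Offer a plain link to Google Maps directions as a text alternative.
var destination = 'Rockefeller Plaza, New York, NY 10111';
var link = document.createElement('a');
link.href = 'https://www.google.com/maps/dir/?api=1&destination=' +
  encodeURIComponent(destination);
link.textContent = 'Get directions on Google Maps';
document.getElementById('directions').appendChild(link);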
There are some cases where you want to get the most representative image of a web page; for example, Pocket tries to attach an image when you save a web page.
How would you define, programmatically, which image is the key image? What would be the most appropriate way to do so?
Most websites that want to be shared on sites like Facebook or Pocket will have an Open Graph protocol image. This is a meta tag in the document head of the form <meta property="og:image" content="http://URL-TO-YOUR-IMAGE" />. The Open Graph protocol is used and looked for by companies such as Facebook, Pocket and Reddit, and has become fairly widespread.
For websites that do not follow such a standard, developers often use a third-party tool such as Embedly, which has already solved the problem: feed it a URL and it will return information on what content would work well as a thumbnail image.
If you want to build your own engine, you may want to study DOM position analysis and work out your own algorithm by scraping many, many articles and web pages, looking for good patterns.
Study scraper.py to see how reddit uses BeautifulSoup to find representative images from links submitted to it.
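The og:image-first, fall-back-to-the-biggest-image idea can be sketched in browser-side JavaScript like this (reddit's scraper.py does the equivalent in Python with BeautifulSoup; the function name and the size heuristic here are just illustrative):

// Prefer the Open Graph image; otherwise guess by taking the largest <img>
// in the document.
function findRepresentativeImage(doc) {
  var og = doc.querySelector('meta[property="og:image"]');
  if (og && og.content) {
    return og.content;
  }
  var bestSrc = null;
  var bestArea = 0;
  var imgs = doc.querySelectorAll('img');
  for (var i = 0; i < imgs.length; i++) {
    var img = imgs[i];
    var area = (img.naturalWidth || img.width) * (img.naturalHeight || img.height);
    if (area > bestArea) {
      bestArea = area;
      bestSrc = img.src;
    }
  }
  return bestSrc; // null if the page has no usable images
}

A real engine would layer on more heuristics (position in the DOM, aspect ratio, skipping icons and ad banners), which is where the scrape-many-pages-and-look-for-patterns advice comes in.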
I'm working as a web developer and have lots (hundreds) of links to hacks, tutorials and code snippets that I don't want to memorize. I am currently using Evernote to save the content of my links as snippets and have them searchable and always available (even if the source site is down).
I spend a lot of time tagging, sorting, evaluating and saving stuff to Evernote and I'm not quite happy with the outcome. I ended up with a multitude of tags and keep reordering and renaming tags while retagging saved articles.
My Requirements
web based
saving web content as snippets with rich styling (code sections, etc.)
interlinked entries possible
chrome plugin for access to content
chrome plugin for content generation
web app or desktop client for faster sorting / tagging / batch processing
good and flexible search mechanism
(bonus) Google search integration (results from my knowledge base shown within Google search results)
I had a look at Kippt, but that doesn't seem to be a solution for me. If I don't find a better solution, I'm willing to stay with Evernote, as it meets nearly all my needs, but I need a good plan to sort through my links/snippets once and get them in order.
Which solutions do you use and how do you manage your knowledge base?
I'm a big Evernote fan but a stern critic of all my tools. I've stuck with Evernote because I'm happy enough with its fundamental information structures. I am, however, currently working on some apps to provide visualisations and hopefully better ways to navigate complex sets of notes.
A few tips, based on years of using Evernote and wikis for collaboration and software project management:
you can't get away from the need to curate things, regardless of your tool
don't over-think tags; tags in combination with words are a great way to search (you do know you can say tag:blah in a search to combine that with word searches?)
build index pages for different purposes - I'm using a lot more of the internal note links to treat Evernote like a wiki
refactor into smaller notebooks if you use mobile clients a lot, allowing you to choose to have different collections of content with you at different times
I'm looking for ways to prevent indexing of parts of a page. Specifically, the comments on a page, since they add a lot of weight to entries based on whatever users have written, which makes a Google search of the site return lots of irrelevant pages.
Here are the options I'm considering so far:
1) Load comments using JavaScript to prevent search engines from seeing them.
2) Use user agent sniffing to simply not output comments for crawlers.
3) Use search engine-specific markup to hide parts of the page. This solution seems quirky at best, though. Allegedly, this can be done to prevent Yahoo! from indexing specific content:
<div class="robots-nocontent">
This content will not be indexed!
</div>
Which is a very ugly way to do it. I read about a Google solution that looks better, but I believe it only works with Google Search Appliance (can someone confirm this?):
<!--googleoff: all-->
This content will not be indexed!
<!--googleon: all-->
Does anyone have other methods to recommend? Which of the three above would be the best way to go? Personally, I'm leaning towards #2: while it might not work for all search engines, it's easy to target the biggest ones, and it has no side effects for users unless they're deliberately trying to impersonate a web crawler.
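For reference, option #2 would look roughly like this on the server; this is a minimal Node/Express sketch, and the user-agent patterns and the stub render helpers are purely illustrative:

// Skip the comments markup when the User-Agent looks like a known crawler.
var express = require('express');
var app = express();

var CRAWLER_UA = /googlebot|bingbot|slurp|baiduspider/i;

// Stand-ins for whatever actually renders the post and its comments.
function renderPost(id) { return '<article>Post ' + id + '</article>'; }
function renderComments(id) { return '<section id="comments">Comments for post ' + id + '</section>'; }

app.get('/posts/:id', function (req, res) {
  var isCrawler = CRAWLER_UA.test(req.get('User-Agent') || '');
  var html = renderPost(req.params.id);
  if (!isCrawler) {
    html += renderComments(req.params.id); // comments only for human visitors
  }
  res.send(html);
});

app.listen(3000);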
I would go with your JavaScript option. It has two advantages:
1) bots don't see it
2) it would speed up your page load time (load the comments asynchronously and unobtrusively, e.g. via jQuery) ... page load times have a much underrated positive effect on your search rankings
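For example, something along these lines; the #comments container and the comments URL are placeholders:

// Fetch the comments after the page has loaded so they never appear in the
// initial HTML that crawlers fetch.
$(function () {
  $('#comments').load('/comments.html?post=123');
});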
JavaScript is an option, but engines are getting better at reading JavaScript. To be honest, I think you're reading too much into it. Engines love unique content; the more content you have on each page the better, and if the users are providing it... it's the holy grail.
Just because a commenter made a reference to Star Wars on your toaster review doesn't mean you're not going to rank for the toaster model; it just means you might also rank for "star wars toaster".
Another idea: you could show comments only to people who are logged in. CollegeHumor does the same, I believe; they show the number of comments a post has, but you have to log in to see them.
googleoff and googleon are for the Google Search Appliance, which is a search engine Google sells to companies that need to search through their own internal documents. They have no effect on Google's public web search.
I think number 1 is the best solution, actually. Search engines don't like it when you give them different material than you give your users, so number 2 could get you kicked out of the search listings altogether.
This is the first I have heard of search engines providing a method for telling them that part of a page is irrelevant.
Google does have features for webmasters to declare which parts of their site a search engine should use to find pages when crawling:
http://www.google.com/webmasters/
http://www.sitemaps.org/protocol.php
You might be able to relatively de-emphasize some things on the page by specifying the most relevant keywords using META tag(s) in the HEAD section of your HTML pages. I think that is more in line with the engineering philosophy used to architect search engines in the first place.
Look at Google's Search Engine Optimization tips. They spell out clearly what they will and will not let you do to influence how they index your site.
Semantic HTML makes it easier for Google to crawl and 'understand' a website, but what about microformats? Are microformats any more semantic/crawlable than standard HTML markup?
Google announced limited RDFa and microformats support in the last few days.
Links and commentary here:
http://rdfa.info/2009/05/12/google-announces-support-for-rdfa/
Yahoo has been using RDFa and Microformats to drive Search Monkey for some time:
http://developer.yahoo.com/searchmonkey/
Both will probably aid click-through rates, but not necessarily ranking. Expect more search engines to use a wider range of RDFa vocabularies as time goes on. BOSS is also relevant here:
http://developer.yahoo.com/search/boss/
The intent is to help create more search engines, which will then have access to the data in the pages.
For geographical information, Google will parse KML files and index them and the links in them.
I believe that Yahoo has gotten behind RDFa; I don't think that Google has publicly acknowledged it yet.
AFAIK, all major search engines support the rel-nofollow microformat. Beyond that, I'm not aware of any support. However, there are smaller, more specialized search engines that have been specifically designed with microformats in mind. E.g. there are search engines that allow you to do searches on relationships between persons, using the XFN microformat.
As far as I know, Google doesn't actively talk about these microformats as a way of generating your page rank; from what I understand, they're more for other kinds of bots that aren't just building a general-purpose search engine.
At the moment Google has not announced any support for microformats yet. I hope that in the near future it will.
On the other hand, Yahoo has announced that it will support RDFa, eRDF and microformats.
They make your pages more semantic insofar as they force a bit more consistency in the way that you fit your information together. The major search engines can and do read microformats, and often use them to display what Google calls "rich snippets" ( http://googlewebmastercentral.blogspot.com/2009/05/introducing-rich-snippets.html ), which adds some interesting stuff to SERPs. Bing and Yahoo both display these too.