Tabbed Contents for Googlebot

Tabbed Contents for Googlebot - tabs

I have checked a lot of forum questions on how to tell Googlebot to crawl my tabbed contents, but of no luck.
I need someone to punch my face and say that Googlebot will never prioritise tabbed contents.
Or, are there other ways that you might know of that will tell Googlebot to crawl our tabbed contents as well? E.g. disable javascript for Googlebot.
Thanks.

Related

Can I override the title of a bookmark for a web page?

This question has been asked before by someone else entirely, but basically no solution was given (and this was in 2008). Now, in 2013, HTML and browser functionality has increased, so I thought maybe it's a good idea to ask.
Question:
As a developer, how can you make sure that the title of the web page is different from the title of when someone bookmarks your page?
The reason I ask is because there are many websites that have their title, and then some slogan. Or worse, the slogan first and the actual site title after that. In any case, the titles are long, and you want your bookmarks to be concise, preferably one word, right? I want to know if there's any kind of functionality like that available in modern browsers.

As far as I can tell for Internet Explorer you can add a link, perhaps in the footer, for your visitors to click and bookmark your site and edit the default title with this function AddFavorite().
Here is a link to it on MSDN. Please notice that this function is deprecated.
See the demonstration of use below:
<a href="#"
onclick="window.external.AddFavorite(location.href, 'YOUR_TITLE_HERE');
return false">
Bookmark this!
</a>
Although this won't work on other browsers as far as I am aware (surely won't work on Chrome and Firefox) and it's use is abolished when a user decides to bookmark your site manually instead of clicking the link.

Prevent site configuration info from showing up on Google

I have a site that's running WordPress.
The main page has an embedded Flash player and an imbedded iframe, and for some reason, all the configuration info from the Flash player is showing up on Google for my site, and nothing else.
How can I have the main site information show up on Google, without having that Flash player config info show up?
And can I customize what shows up at all?
If there's some way to tag the info I don't want to show up, or tag the info I want to show up, I can probably do most ofthe edits myself, I just don't know where to start...
EDIT: I tried most of the suggestions below, and I didn't get anywhere...
Any other ideas?
Thanks a lot!

If you don't want Google, or other crawler to access certain parts of your website you should use a robots.txt file. Inside you specify which parts are accessible and which aren't, when the crawlers get to your website will always look for this file for instructions.
You can check some documentation on how to do it here and here

In order to influence what text is used on the google search result try putting this within your head tags
<meta name="description" content="WHATEVER YOU WANT DISPLAYED ON GOOGLE">
Source: http://static.googleusercontent.com/external_content/untrusted_dlcp/www.google.com/en/us/webmasters/docs/search-engine-optimization-starter-guide.pdf
Some more information from google on controling parts of a page. Apparently there are google off/google on tags.
http://perishablepress.com/press/2009/08/23/tell-google-to-not-index-certain-parts-of-your-page/
Hope this helps.

If you want Google to index only part of your pages, you can't follow normal SEO routines. You should provide a mechanism to understand whether the current client (requester) is a robot or not. If yes, then don't render that part. This is the only way. Otherwise, a robot either gets the whole rendered content, or doesn't have access based on robots.txt file (Robot Exclusion Protocol).
Another way (which is not really smart, and can't be guaranteed to work) is to dynamically inject your content into the page via JavaScript. Because AMAIK, robots don't run JavaScript.

As search spiders won't render javascript generated markup (JS is not run as it is client-side in the browser), a quick fix would be to don't output any of flash / markup initially in the HTML document and then use JS to add the flash stuff on load.
Note: as far as I'm aware, Google is currently testing a JS reading spider so this may not work long term.

Google is returning this data because it simply can't find any content where it normally would. Search engines require content - they're not advanced enough to process your multimedia to determine what it's all about.
Google will IGNORE your meta description if it doesn't feel that it reflects your page content (of which there is only iframes and JS)
Use SWFObject to provide alternate content for users without flash (including search engines) - ensure it's not some dinky text like "download flash here" - but a lengthy descriptive content piece about your site or media that they would normally experience if they could experience.
Use robots.txt or <meta name="robots" content="noindex,follow"> for the iframe content to prevent it from being indexed.
For the love of all things holy, please look at reducing the number of JS files and inline JS on your site (i'd recommend WP-minify since it's so obvious that you love plugins)

Cleaning up HTML from textarea

I have a page with two textareas, where registered users can fill them with HTML codes. First one has TinyMCE (so HTML is cleaned up), but the other one does not, since I expect the code to be inserted as embed codes from other sites (mostly sites that provide maps, e.g. Google Maps, MapMyRace.com, etc). But problem is that those other sites may provide different tags, not just <embed> or <iframe>. So I can't strip tags because then I might strip tags that I didn't know other sites provided. I will save the HTML in these two textareas into my database, to be retrieved and displayed as parts of some other pages.
Do you have any suggestions to make this setup more secure? Or should I disallow free input of HTML in the 2nd textarea altogether? (Or.. I let the users tick a check box saying "I accept full responsibility for the behavior of the code I am inserting".. LOL)
Your opinion is highly appreciated :)
Thanks

The short answer is : free HTML is insecure and must be avoided. Nothing blocks your user from creating an iframe that redirects the user to some harmful page or put ads on your page or deface your site.
My favorite approach to this problem is to allow the user to paste a link (no the "embed on page" iframe code) in a text box. Then I use regex to identify the pasted link (is it youtube, Bing maps, ...) and I create the HTML from the pasted link, which isn't too complex for most iframe providers. It's much more work for you, and it restricts the APIs you can put on your page, but it's secure.

Letting your users use arbitrary HTML is dangerous. You may want to have a black and white lists of tags that you disallow and allow (respectively).

Methods for preventing search engines from indexing irrelevant content on a page

I'm looking for ways to prevent indexing of parts of a page. Specifically, comments on a page, since they weigh up entries a lot based on what users have written. This makes a Google search on the page return lots of irrelevant pages.
Here are the options I'm considering so far:
1) Load comments using JavaScript to prevent search engines from seeing them.
2) Use user agent sniffing to simply not output comments for crawlers.
3) Use search engine-specific markup to hide parts of the page. This solution seems quirky at best, though. Allegedly, this can be done to prevent Yahoo! indexing specific content:
<div class="robots-nocontent">
This content will not be indexed!
</div>
Which is a very ugly way to do it. I read about a Google solution that looks better, but I believe it only works with Google Search Appliance (can someone confirm this?):
<!--googleoff: all-->
This content will not be indexed!
<!--googleon: all-->
Does anyone have other methods to recommend? Which of the three above would be the best way to go? Personally, I'm leaning towards #2 since while it might not work for all search engines, it's easy to target the biggest ones. And it has no side-effect on users, unless they're deliberately trying to impersonate a web crawler.

I would go with your JavaScript option. It has two advantages:
1) bots don't see it
2) it would speed up your page load time (load the comments asynchronously and unobtrusively, e.g. via jQuery) ... page load times have a much underrated positive effect on your search rankings

Javascript is an option but engines are getting better at reading javascript, to be honest I think your thinking too much into it, Engines love unique content, the more content you have on each page the better and if the users are providing it... its the holy grail.
Just because your commenter made a reference to star wars on your toaster review doesn't mean your not going to rank for the toaster model, it just means you might rank for star wars toaster.
Another idea would be, you could only show comments to people who are logged in, collegehumor do the same I believe, they show the amount of comments a post has but you have to login to see them.

googleoff and googleon are for the Google Search Appliance, which is a search engine they sell to companies that need to search through their own internal documents. It's not effective for the live Google site.
I think number 1 is the best solution, actually. The search engines doesn't like when you give them other material than you give your users so number 2 could get you kicked out from the search listings altogether.

This is the first I have heard that search engines provide a method for informing them that part of a page is irrelevant.
Google has a feature for web masters to declare parts of their site for a web search engine to use to find pages when crawling.
http://www.google.com/webmasters/
http://www.sitemaps.org/protocol.php
You might be able to relatively de-emphasize some things on the page by specifying the most relevant keywords using META tag(s) in the HEAD section of your HTML pages. I think that is more in line with the engineering philosophy used to architect search engines in the first place.
Look at Google's Search Engine Optimization tips. They spell out clearly what they will and will not let you do to influence how they index your site.

Why are page titles on some websites clickable URLs?

Why on sites like Stack Overflow, Techcrunch, Smashing Magazine, etc. are the page titles (i.e. the text at the top of the page) clickable URLs that redirect to the same page that the user is on?
Some examples:
I believe that this does not effect SEO as search engines ignore internal links.
Is it for usability purposes?

It allows you to right-click on it and choose Copy link location (or equivalent) so that you can easily paste it in an email for example. This requires less time than copying it from the location bar, and some people run their browser without a visible location bar to save previous screen space.

More than anything, it provides a link to the default state of the page.
For example, for this very stack overflow page it is a user can get here through any of the following non-default links:
Why are Page Titles on some websites (including Stack Overflow) Clickable URLs?
https://stackoverflow.com/questions/904381#foobar
https://stackoverflow.com/questions/904381?sort=date
While the default link is actually:
Why are Page Titles on some websites (including Stack Overflow) Clickable URLs?
If users are unable to get to the default state, they end up bookmarking or emailing the non-default link which propagates to new users and the problem just multiplies.
Clicking on the title link of the post will restore the default state and strip off any query parameters (?sort=date), named anchors (#foobar) and fix the story slug (/why-are-page-titles/...).

I use it to refresh the page (yes, I could press F5 too).

Yes Jakob Nielsen has stated that linking to yourself is a web design mistake (nr 10). And I agree.
More reading info here. (nr 10)

The URL redirects to the beginning of the page, in case you arrived on the page via a specific answer (all answers are also clickable URLs). This way, you get the URL of the question, not of an answer.

Not sure if this is why they did it, but I find it useful to siphon off tabs:
If I look at something briefly and think "I'd like to read this thoroughly in a minute but continue with what I was doing before", I can do this:
I can right click the link, click "open in a new tab" and then click "back" and continue nicely.

It's called a Permalink... The name implies what it is, a permanent link.
It's the same reason that each answer on SO has a link you can copy.

I think it inherits the behavior from CMS where each question is a node, which has 0<= answered question. Now think you go for a search on apache questions.
The result are displayed one after another.
In terms of CMS this is called a teaser. You get a full page with lots of questions where the question's title link to the full article(question + answers)
Its not a must, but you'll find it on most sites which uses a CMS.

As long as it does not harm anyone why would people be against it?
I prefer to have those links available as hitting refresh would reload all elements of the page instead of just following the direct link (to the same page) that uses cached elements.

Makes sense to me, I find it useful! I have a lot of tabs open so I just right click the link and go back.
To me this makes perfect sense, from a SEO view this is also good! It forces it to read the page because it's linked.

UX-wise clickable titles which don't bring the user anywhere may seem unusable though that leads us into the realm of Affordance Theory and whether or not the affordance is perceptible to users.
For example, clickable page titles may provide:
A simple method for bookmarking a page to the desktop from a browser window.
A context menu with additional choices allowing users to share a blog post or article.
A method for updating the location bar so it's pointing at the canonical URL of the page.
For the sites you mentioned, however, it seems more likely the page titles were turned into hyperlinks using absolute URLs so analytics tooling could pick up inbound link clicks – those sending the referer info – resulting in DCMA takedown notices when people copied work and didn't update the URLs.
You'd be surprised what people do when they're being incentivized to produce work contractually.

We Keep Coding

html mysql json google-apps-script actionscript-3 ms-access google-chrome google-maps reporting-services sql-server-2008

Tabbed Contents for Googlebot - tabs

Related

Can I override the title of a bookmark for a web page?

Prevent site configuration info from showing up on Google

Cleaning up HTML from textarea

Methods for preventing search engines from indexing irrelevant content on a page

Why are page titles on some websites clickable URLs?

Categories

Resources