Configure language domains for search engines (SEO, subdomains)

I have a site that uses different languages (http://www.boletus-app.com). Each language is served on a different subdomain (http://en.boletus-app.com, http://es.boletus-app.com, etc.). The main site (www) shows the language the user chose on previous visits, or the browser language by default.
Which strategy should I use so that Google recognizes this setup? People usually link to the www site.
What I have done so far is add the alternate tag:
<link href="http://es.boletus-app.com" rel="alternate" hreflang="es"/>
Is there anything else I can do?
Thanks!

You can tell Google about language-specific versions of your website in three different ways. One of them is what you are already doing: the rel="alternate" hreflang="x" annotation. Please note that this markup is treated as one signal among others, not as a directive, by the Google crawler.
You may find additional information under this Webmaster Tools link.
Also, you should go through this internationalisation FAQ by Google Webmaster Central, where it is clearly stated that the Google crawler detects the language on a per-URL basis and there is no need to use any special structure. However, it is important that each language version is on a separate URL, and that Googlebot is able to crawl all versions.
So, in my opinion and as stated in the links above, you are in a comfortable position with your current setup.
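For reference, the full set of annotations on each version of the home page could look like the sketch below (the en and es subdomains are taken from your question; the x-default line is an assumption about which version you want to serve as the fallback for unmatched languages):
<link href="http://en.boletus-app.com" rel="alternate" hreflang="en"/>
<link href="http://es.boletus-app.com" rel="alternate" hreflang="es"/>
<link href="http://www.boletus-app.com" rel="alternate" hreflang="x-default"/>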

Make website completely invisible in search results

My client wants me to make his website completely invisible in search results, especially in the Google search engine. The website will be a simple one-page presentation site, which should be normally accessible (no locks, no passwords).
I have several ideas that I am fairly sure will work, and one that I am not sure about.
What should work:
Adjust robots.txt with the following rule:
User-agent: *
Disallow: /
Add a robots meta tag with noindex,nofollow in the page head:
<meta name="robots" content="noindex,nofollow" />
I can detect user-agent strings in all requests, keep a list of known robots and scrapers, and kick out any request that matches (see the sketch below). I guess this should work too.
This link should be helpful for bot detection; these guys seem to provide an API for known user-agent strings.
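A minimal Python sketch of that user-agent check (the bot list and the function name are illustrative only; a real deployment would use a maintained list or an API like the one linked above):
# Reject requests whose User-Agent header matches a known crawler.
KNOWN_BOTS = ["googlebot", "bingbot", "yandex", "duckduckbot", "baiduspider"]

def is_known_bot(user_agent):
    ua = (user_agent or "").lower()
    return any(bot in ua for bot in KNOWN_BOTS)

# In the request handler, check the incoming User-Agent header and respond
# with e.g. 403 Forbidden whenever is_known_bot(...) returns True.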
Domain choice can influence search results too:
My client prefers to stay unlisted in certain countries. I have noticed that Google prefers to serve results according to our IP address, so if your IP is located in Sweden you can be sure that most results are on *.se domains. Commercial domains like *.com, *.net and *.org seem to be visible in all cases, which makes sense. That brought me to the question of how the choice of domain can influence search results. For example, if I prefer to stay hidden from German users, I should of course not take a .de domain, and I should avoid commercial domains. OK, I could choose another country code, but that seems weird; I would prefer to take one of the new domain extensions (.club, .art, .shop, .name, etc.), but I am not sure how Google will treat those domains. If they behave like commercial domains, then I should stay with another country-code domain.
I hope my intentions are clear. I would be glad for your advice: if there is anything more I can do, or if anybody can shed more light on the domain question, I would be very pleased.
The options you have listed will work well, especially the robots.txt rule. Another one to consider is using Google Webmaster Tools to request that the page / website be kept out of the index. It works well because it lets your client handle further pages the same way down the line.

Hyperlink to specific page or folder with default document?

We are in the process of redesigning our company's web site, and we have been told by consultants that it is important that we either:
1.) Always link to a specific page, e.g. foo.com/buy/default.aspx
or
2.) Always link to a directory and let the default document load, e.g. foo.com/buy/ where "default.aspx" is the default document
Is there any practical benefit to either approach? Does being 100% consistent in doing one or the other really gain us anything?
In your first example, always linking to a specific page helps prevent ambiguity in your URLs. Prefer a canonical URL (it can be with or without a file extension). If you have to have multiple URLs for a single resource, for example /products/product1.aspx and /products?productID=product1, then take advantage of the canonical URL markup in your code to specify which one is the proper one.
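For example, if both of those hypothetical URLs serve the same product page, that page could declare its preferred address with a single link element in its head (the URL shown is just the illustrative one from above):
<link rel="canonical" href="http://foo.com/products/product1.aspx" />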
Using extensionless URLs allows you to change technologies later. For example, /blog/post1.aspx is different from /blog/post1.php, say if you ever switched to WordPress (not common, of course, but it happens). It's just an easy way to make the links keep working no matter the technology. Plus, in my opinion it's better to mask the technology stack you're using as much as possible. Extensionless URLs are becoming the norm (again, in my opinion) as people move toward treating URIs as resources, API-style, and toward mixed technology stacks.
The second method you mentioned is good because it helps you make clean URLs.
(Don't click these URLs, they are just examples.)
Okay URL:
http://www.example.com/example/example.php
Clean URL:
http://www.example.com/example/ (so much easier to remember, and shorter to write out)
Here is a good tutorial I found that shows you how to accomplish this with .htaccess: http://www.desiquintans.com/cleanurls
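In case that link ever goes away: a common .htaccess pattern for hiding extensions (a sketch only, assuming an Apache server with mod_rewrite and .php pages; not necessarily exactly what the tutorial uses) is:
RewriteEngine On
# Only rewrite when the request is not an existing directory or file.
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^(.*)$ $1.php [L]
With this in place, a request for http://www.example.com/example/page is served by /example/page.php, while directory URLs like http://www.example.com/example/ are still handled by the default document.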

What are dpuf (extension) files?

I have seen this extension in some URLs and I would like to know what these files are used for.
It seems odd, but I couldn't find any information about them. I think they are specific to some plug-in.
It seems to be connected to the ShareThis buttons on websites.
I found this page, which gives a fairly comprehensive explanation:
This tag is mainly developed for tracking URL sharing on various social networks, so every time anyone copies your blog content there, they get a URL ending with #sthash and an extension of .dpuf or .dpbs.

Cloud-based "knowledge base" approach with links, snippets and excerpts for Google Chrome

I'm working as a web developer and have lots (hundreds) of links to hacks, tutorials and code snippets that I don't want to memorize. I am currently using Evernote to save the content of my links as snippets and have them searchable and always available (even if the source site is down).
I spend a lot of time on tagging, sorting, evaluating and saving stuff to Evernote, and I'm not quite happy with the outcome. I ended up with a multitude of tags and keep reordering and renaming tags while retagging saved articles.
My Requirements
web based
saving web content as snippets with rich styling (code sections, etc.)
interlinked entries possible
chrome plugin for access to content
chrome plugin for content generation
web app or desktop client for faster sorting / tagging / batch processing
good and flexible search mechanism
(bonus) google search integration (search results from KnowledgeBase within google search results)
I had a look at Kippt, but that doesn't seem to be a solution for me. If I don't find anything better, I'm willing to stay with Evernote, as it meets nearly all my needs, but I need a good plan to sort through my links/snippets once and get them in order.
Which solutions do you use and how do you manage your knowledge base?
I'm a big Evernote fan but a stern critic of all my tools. I've stuck with Evernote because I'm happy enough with its fundamental information structures. I am, however, currently working on some apps to provide visualisations and hopefully better ways to navigate complex sets of notes.
A few tips, based on years of using Evernote and wikis for collaboration and software project management:
you can't get away from the need to curate things, regardless of your tool
don't over-think tags; tags in combination with words are a great way to search (you do know you can use tag:blah in a search to combine it with word searches?)
build index pages for different purposes - I'm using a lot more of the internal note links to treat Evernote like a wiki
refactor into smaller notebooks if you use mobile clients a lot, allowing you to choose to have different collections of content with you at different times

Crawling data or using API

How do these sites gather all their data - questionhub, bigresource, thedevsea, developerbay?
Is it legal to show data in a frame the way bigresource does?
#amazed
EDITED: fixed some spelling issues 2011-03-10
How do these sites gather all the data - questionhub, bigresource ...
Here's a very general sketch of what is probably happening in the background at a website like questionhub.com:
1. Spider program (google "spider program" to learn more)
a. Configured to start reading web pages at stackoverflow.com (for example).
b. Run the program so it goes to the home page of stackoverflow.com and starts visiting all the links it finds on those pages.
c. Returns the HTML data from all of those pages.
2. Search index program
Reads the HTML data returned by the spider and creates a search index, storing the words that it found AND the URLs where those words were found.
3. User interface web page
Provides a feature-rich user interface so you can search the sites that have been spidered.
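To make the outline above concrete, here is a minimal Python sketch of the spider-plus-index idea (the start URL is the stackoverflow.com example from above; the requests and beautifulsoup4 packages and the in-memory index are my own illustrative choices, while real systems use components like the Lucene/Solr mentioned later in this answer):
import requests
from bs4 import BeautifulSoup
from collections import defaultdict
from urllib.parse import urljoin

index = defaultdict(set)   # word -> set of URLs where that word was seen

def crawl(start_url, max_pages=10):
    to_visit, seen = [start_url], set()
    while to_visit and len(seen) < max_pages:
        url = to_visit.pop()
        if url in seen:
            continue
        seen.add(url)
        html = requests.get(url, timeout=10).text       # step 1: fetch the page
        soup = BeautifulSoup(html, "html.parser")
        for word in soup.get_text().split():            # step 2: index its words
            index[word.lower()].add(url)
        for a in soup.find_all("a", href=True):         # step 1b: queue its links
            to_visit.append(urljoin(url, a["href"]))

crawl("https://stackoverflow.com/")
print(sorted(index.get("python", set())))               # crude version of step 3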
Is it legal to show data in a frame the way bigresource does?
To be technical, "it all depends" ;-)
Normally, websites want to be visible in Google, so why not in other search engines too. Just as Google displays part of the text that was found when a site was spidered, questionhub.com (or others) has chosen to show more of the text found on the original page, possibly keeping the formatting that was in the original HTML OR changing the formatting to fit their standard visual styling.
A remote site can 'request' that spiders do NOT go through some or all of its web pages by adding rules to a well-known file called robots.txt. Spiders do not have to honor the robots.txt, but a vigilant website will track the IP addresses of spiders that do not honor its robots.txt file and then block those IP addresses from looking at anything on the website. You can find plenty of information about robots.txt here on Stack Overflow OR by running a query on Google.
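As a small illustration of the polite-spider side of this, Python's standard urllib.robotparser module can check robots.txt before a page is fetched (the site URL is the stackoverflow.com example again, and "MySpider" is a made-up user-agent name):
from urllib import robotparser

rp = robotparser.RobotFileParser()
rp.set_url("https://stackoverflow.com/robots.txt")
rp.read()

# A well-behaved spider asks before fetching each URL.
if rp.can_fetch("MySpider", "https://stackoverflow.com/questions"):
    print("robots.txt allows crawling this URL")
else:
    print("robots.txt asks us not to crawl this URL")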
There are several industries (besides Google) built around what you are asking. There are tags on Stack Overflow for search-engine and search; read some of those questions/answers. Lucene/Solr are open-source search-engine components. There is a companion open-source spider, but the name eludes me right now. Good luck.
I hope this helps.
P.S. As you appear to be a new user: if you get an answer that helps you, please remember to mark it as accepted, or give it a + (or -) as a useful answer. This goes for your other posts here too ;-)