What's the best way to handle subdomains in a sitemap? - html

If you have a website www.yourdomain.com and a subdomain blog.yourdomain.com
(both sites containing similar information), what is the best sitemap setup?
Is it best to have one site map for both sites?
(and if so what would this look like?)
Or two separate sitemaps?
Which would be most effective with regard to search traffic optimisation?

If the content is identical, use a rel="canonical" tag to tell Google (and the other search engines) which URL should be used to accrue page rank.
If you don't, Google will split your page rank 'juice' over both pages. Ideally, you want to concentrate your juice on one URL so it gets a better page rank.
Choose either the main site or the subdomain to produce your sitemap for. It doesn't really hurt anything if you do both.
The rel="canonical" tags go in your HTML pages.

You can simply create a single sitemap at:
http://yourdomain.com/sitemap.xml and list all your blog post URLs in this sitemap.
This sitemap will help get the large archive of content pages on blog.yourdomain.com indexed. This way, Google's crawlers can index everything from a single place in less time.
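A rough sketch of what such a combined sitemap could look like (all URLs are placeholders). Note that for a sitemap on yourdomain.com to list blog.yourdomain.com URLs, you generally need to prove you control both hosts, for example by referencing this sitemap from the blog's robots.txt or verifying both hosts in Google Webmaster Tools:

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <!-- Pages on the main site -->
  <url>
    <loc>http://www.yourdomain.com/</loc>
  </url>
  <!-- Posts on the blog subdomain -->
  <url>
    <loc>http://blog.yourdomain.com/my-first-post</loc>
  </url>
</urlset>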

Related

How to make my users' blogs visible in search engines?

On my website, my users can create their own blogs.
When a user creates a blog, all the blog content is saved in the database, and the content is loaded from the database when someone requests it.
My question is: are these blogs searchable by search engines like Google?
If not, how do I make them searchable, or what are the ways I can improve their discoverability in search engines?
If your pages are rendered server side, your articles will be crawled by the bots and indexed in search engines. It's just a matter of time.
However, you can increase your chances of being indexed faster and better with these simple techniques:
Add enough correct meta tags in your HTML head (see the sketch below)
Add a robots.txt file in the root of your site
Add a sitemap file in the root of your site
Add a JSON-LD description of your blog and of each article in the head of your pages
Be sure to use semantic HTML for your content
Provide social links and social pages that link to your site
Those are basic, yet effective, ways to ensure your site is properly indexed in search engines.
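For instance, a minimal head covering the meta-tag and JSON-LD points above might look like this (the title, author and date are just placeholders):

<head>
  <title>My Blog Post Title</title>
  <!-- Basic meta tags for crawlers -->
  <meta name="description" content="A short summary of this blog post." />
  <meta name="robots" content="index, follow" />
  <!-- JSON-LD structured data describing the article -->
  <script type="application/ld+json">
  {
    "@context": "https://schema.org",
    "@type": "BlogPosting",
    "headline": "My Blog Post Title",
    "author": { "@type": "Person", "name": "Author Name" },
    "datePublished": "2013-01-01"
  }
  </script>
</head>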
You can also test your SEO ranking with online tools like this one: rankgen.com

Finding number of pages of a website

I want to find the number of pages of a website. Usually what I look for is a sitemap, but I just encountered a site which does not have one, so I am out of ideas on how to find its total number of pages. I tried to Google the URL, but that did not help much. Is there any other way to find out how many pages a website has?
Thanks in advance.
Ask Google "site:yourdomain.com"
This gives you all indexed pages.
Or use the free tool "Xenu". It crawls the whole site, but it won't find pages which have no internal links pointing to them. You can also export a sitemap with it.
I was about to suggest the same thing :) If this is a website you own, you can also add it to Google Webmaster Tools. It will show you lots of things about your site, including number of links, pages, search terms, etc. It's very useful and free of charge.
I have found a better solution myself. You can go to Google Advanced Search and restrict the search results to your domain name. Leave everything else empty. It would give you the list of all pages cached by Google.
You could also try A1 Website Analyzer. But for all link-checker software, you will have to make sure you configure it correctly to obey/not obey (whatever your needs are) e.g. robots.txt, noindex and nofollow instructions. (A common source of confusion in my experience.)

What is sitemap.xml

Hello fellow programmers,
I am building a website and I read about sitemap.xml, but there is no place where I can find a definition or what it contains.
Can someone help me: what does it do? What is it for? What is in it?
http://www.sitemaps.org/ is the official resource.
The protocol page is probably the most important part of the entire site. It describes how to properly format your sitemap.xml file so that search engines can properly crawl your website.
from sitemaps.org
Sitemaps are an easy way for webmasters to inform search engines about pages on their sites that are available for crawling. In its simplest form, a Sitemap is an XML file that lists URLs for a site along with additional metadata about each URL (when it was last updated, how often it usually changes, and how important it is, relative to other URLs in the site) so that search engines can more intelligently crawl the site.
It provides information for search engines about the structure of your site. Wikipedia article.
It's an XML file that contains all of the URLs in your application, along with some other information that is used to make your site easier to crawl for search engines.
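As a minimal illustration of the format described above (the URL, date and values are placeholders):

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://www.example.com/about</loc>
    <lastmod>2013-01-01</lastmod>      <!-- when the page was last updated -->
    <changefreq>monthly</changefreq>   <!-- how often it usually changes -->
    <priority>0.5</priority>           <!-- importance relative to other URLs -->
  </url>
</urlset>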

How is SEO done for dynamic websites? Does each page have to be built separately as a static page?

I have a food takeaway website where users can search for restaurants by giving their area name. I want my website's LONDON search page to be listed when a user searches Google for TAKEAWAYS IN LONDON.
I think Google doesn't crawl websites with query strings. How can we achieve this?
Remember that Google just looks at the page you create, not how you create it.
Now this basically translates your question to "how do we make our dynamic pages visible to Google"? There are a number of tricks. Basically you build links from your homepage to specific other pages. You could have a "Top 5 searches" box, with links to "http://www.example.com/london/takeaways" etc. On the result pages, you can have links to similar queries, e.g. "http://www.example.com/london/finedining".
Basically, Google will see and follow those links, and remember the results per URL. The basic SEO rules still apply - good content, clear structure, etc. Don't bother Google or users with query strings. Use URL rewriting if a query string is easier for you internally.
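For example, a "Top 5 searches" box on the homepage could be nothing more than plain, crawlable links to the rewritten URLs (the URLs below are just illustrative):

<div class="top-searches">
  <h3>Top 5 searches</h3>
  <ul>
    <!-- Plain anchor links: crawlers can follow these and index each result page -->
    <li><a href="http://www.example.com/london/takeaways">Takeaways in London</a></li>
    <li><a href="http://www.example.com/london/finedining">Fine dining in London</a></li>
    <li><a href="http://www.example.com/manchester/takeaways">Takeaways in Manchester</a></li>
  </ul>
</div>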
Maybe you're supposed to have a sitemap, which could have a discoverable link to a page of yours whose URL is http://www.food.com/london and whose title and heading are 'TAKEAWAYS IN LONDON' (and whose contents you can retrieve dynamically).

Methods for preventing search engines from indexing irrelevant content on a page

I'm looking for ways to prevent indexing of parts of a page. Specifically, comments on a page, since they weigh heavily in how entries rank, based on what users have written. This makes a Google search return lots of irrelevant pages.
Here are the options I'm considering so far:
1) Load comments using JavaScript to prevent search engines from seeing them.
2) Use user agent sniffing to simply not output comments for crawlers.
3) Use search engine-specific markup to hide parts of the page. This solution seems quirky at best, though. Allegedly, this can be done to prevent Yahoo! from indexing specific content:
<div class="robots-nocontent">
This content will not be indexed!
</div>
Which is a very ugly way to do it. I read about a Google solution that looks better, but I believe it only works with Google Search Appliance (can someone confirm this?):
<!--googleoff: all-->
This content will not be indexed!
<!--googleon: all-->
Does anyone have other methods to recommend? Which of the three above would be the best way to go? Personally, I'm leaning towards #2 since while it might not work for all search engines, it's easy to target the biggest ones. And it has no side-effect on users, unless they're deliberately trying to impersonate a web crawler.
I would go with your JavaScript option. It has two advantages:
1) bots don't see it
2) it would speed up your page load time (load the comments asynchronously and unobtrusively, e.g. via jQuery) ... page load times have a much underrated positive effect on your search rankings
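A minimal sketch of that approach, assuming a hypothetical /comments endpoint that returns the comment HTML (here using jQuery, as suggested above):

<div id="comments"></div>
<script src="https://code.jquery.com/jquery-1.9.1.min.js"></script>
<script>
  // Comments are fetched after the page loads, so they are not part of the
  // initial HTML that most crawlers index. "/comments?post=123" is a
  // hypothetical endpoint used only for this sketch.
  $(function () {
    $('#comments').load('/comments?post=123');
  });
</script>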
JavaScript is an option, but engines are getting better at reading JavaScript. To be honest, I think you're reading too much into it. Engines love unique content: the more content you have on each page the better, and if the users are providing it... it's the holy grail.
Just because your commenter made a reference to Star Wars on your toaster review doesn't mean you're not going to rank for the toaster model; it just means you might also rank for "star wars toaster".
Another idea: you could show comments only to people who are logged in. CollegeHumor do the same, I believe; they show the number of comments a post has, but you have to log in to see them.
googleoff and googleon are for the Google Search Appliance, which is a search engine they sell to companies that need to search through their own internal documents. It's not effective for the live Google site.
I think number 1 is the best solution, actually. Search engines don't like it when you give them different material than you give your users, so number 2 could get you kicked out of the search listings altogether.
This is the first I have heard that search engines provide a method for informing them that part of a page is irrelevant.
Google has a feature for webmasters to declare which parts of their site a search engine should use to find pages when crawling.
http://www.google.com/webmasters/
http://www.sitemaps.org/protocol.php
You might be able to relatively de-emphasize some things on the page by specifying the most relevant keywords using META tag(s) in the HEAD section of your HTML pages. I think that is more in line with the engineering philosophy used to architect search engines in the first place.
Look at Google's Search Engine Optimization tips. They spell out clearly what they will and will not let you do to influence how they index your site.