I've created a website with dynamic content, and I want Google to know about all my pages, so I've submitted a file "mysitemap.xml" via Webmaster Tools.
Basically, my links look like mysite.com/one-id/one-name, where one-id is an ID between 1 and 2000 (but it will grow over time...).
I'm wondering whether I need to create a page on my website (a kind of HTML sitemap) listing all these links to help Google's bots find my pages, or whether the XML sitemap is enough for Google.
The problem is that the HTML sitemap would be very ugly and would only be a "for Google" page, so I want to avoid this...
No, the XML sitemap is enough for Google.
A sitemap is for search engines; navigation is for humans.
A sitemap lists each page's URL and can include its update frequency, last modification date, and so on.
Navigation may include dropdown menus, hyperlinks, etc.
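For reference, here is a minimal sketch (not the poster's actual mysitemap.xml) of how such an entry can be generated with Python's standard library; the URL, date and frequency are invented for illustration:

    # Build a one-entry sitemap showing the optional <lastmod> and <changefreq>
    # fields mentioned above. All values here are made up.
    import xml.etree.ElementTree as ET

    urlset = ET.Element("urlset", xmlns="http://www.sitemaps.org/schemas/sitemap/0.9")
    url = ET.SubElement(urlset, "url")
    ET.SubElement(url, "loc").text = "http://mysite.com/42/some-name"  # the page's URL
    ET.SubElement(url, "lastmod").text = "2012-06-01"                  # last modification date
    ET.SubElement(url, "changefreq").text = "weekly"                   # expected update frequency

    ET.ElementTree(urlset).write("mysitemap.xml", encoding="utf-8", xml_declaration=True)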
I'm nearing the end of my first web development project, and I'm looking to build a sitemap for our website as part of search engine optimisation. If I understand correctly, a sitemap, when done properly, is a file that shows a content tree (similar to paths in Windows Explorer) of all the public pages of my website.
For the purpose of my question you're going to need some background information on the site and how it works. The site is about bird migration: a user enters the site on a homepage that holds a search box, searches for a bird species, and if we have data on it, he or she can go to a separate page with information on this bird. From there the user can access statistical data about this species. The page is filled with content that we get from a database.
The URL will look something like http://domain.com/searchbird.html?bird=Sedge%20Warbler?lang=1 for the informational page, and http://domain.com/statistics.html?bird=Sedge%20Warbler?lang=1 for the statistical page.
Every bird species uses the same base HTML file (searchbird.html), which is filled with data based on the ?bird= parameter. I have about four HTML files in my webroot (let's call them index.html, searchbird.html, statistics.html, and about.html).
So when I go to create a sitemap using some sort of sitemap generation tool, I get a sitemap that contains those 4 .html files, which is great! Yet I'm missing the 500 bird species that users are going to be able to find.
Is there a way for me to include every possible URL in the sitemap automatically, and how would I go about doing such a thing? I've used HTML, CSS and JavaScript in the past, but I'm only a beginner. If an executable tool exists for this, that'd be great, but my Google searches haven't been successful yet.
You have to generate the list of URLs for your existing pages.
So dig into your data source (database or whatever you use), find all existing bird species, and generate the two URLs per species.
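For example, a minimal sketch in Python (the sqlite3 file and the species table/column names are assumptions; adapt the query to whatever your data source actually is) that writes one URL pair per species into a plain-text sitemap file:

    # Hedged sketch: "birds.db" and the species table/column are made-up names.
    import sqlite3
    from urllib.parse import quote

    conn = sqlite3.connect("birds.db")
    rows = conn.execute("SELECT name FROM species")

    # A plain list of URLs, one per line, is itself a valid sitemap format.
    with open("sitemap.txt", "w") as f:
        for (name,) in rows:
            bird = quote(name)  # "Sedge Warbler" -> "Sedge%20Warbler"
            f.write("http://domain.com/searchbird.html?bird=%s&lang=1\n" % bird)
            f.write("http://domain.com/statistics.html?bird=%s&lang=1\n" % bird)

(Note that the second query parameter should be joined with & rather than a second ?.)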
Directory for users/bots
It would probably be a good idea (for visitors as well as for bots) to output these links on your website, too. Visitors would have two ways to find a species (search for it or browse the directory), and as most bots don’t use search functions, they wouldn’t be able to find the links on your site otherwise (they would have to use your sitemap, which not all bots do, or they would have to hope to find the links from some other external website).
(If you do this, you could also use a sitemap generator service; but it's usually better to generate it yourself.)
URL design
By the way, you might want to consider changing your URL design to a more human-friendly one. Instead of
http://example.com/searchbird.html?bird=Sedge%20Warbler?lang=1
http://example.com/statistics.html?bird=Sedge%20Warbler?lang=1
you could use something like
http://example.com/en/birds/sedge-warbler
http://example.com/en/birds/sedge-warbler/statistics
where en is the language code for "English" (these are standardized, and users have a chance to understand them, unlike lang=1), and where http://example.com/en/birds could lead to the page listing all species. For other languages, you would of course ideally translate "birds" and "statistics".
Changing the URL design is possible with URL rewriting.
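A hedged sketch of how such a mapping can look with a small server-side router; Flask is purely my assumption here, and the same rewriting can be done with Apache or nginx rules instead:

    # Assumed setup: a Flask app that maps the clean URLs onto the existing
    # bird/lang parameters. Route layout and the language mapping are invented.
    from flask import Flask, render_template

    app = Flask(__name__)
    LANG_IDS = {"en": 1, "nl": 2}  # hypothetical mapping to the old lang=1, lang=2 values

    def slug_to_bird(slug):
        return slug.replace("-", " ").title()  # "sedge-warbler" -> "Sedge Warbler"

    @app.route("/<lang>/birds/<slug>")
    def bird_info(lang, slug):
        return render_template("searchbird.html",
                               bird=slug_to_bird(slug), lang=LANG_IDS.get(lang, 1))

    @app.route("/<lang>/birds/<slug>/statistics")
    def bird_statistics(lang, slug):
        return render_template("statistics.html",
                               bird=slug_to_bird(slug), lang=LANG_IDS.get(lang, 1))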
You can use a sitemap generator, for example https://www.xml-sitemaps.com/. You only need to enter your index URL; the site will crawl all the links and generate the sitemap automatically.
If you use WordPress, you can use a plugin like https://wordpress.org/plugins/google-sitemap-generator/.
Hope that helps.
It's easy to find all child pages of a web page, but it is not trivial to get all parent pages of one. How can I do that using Google?
You can't, and Google cannot help you, as it doesn't index all of the web.
At best it follows links from other pages, or indexes a page because someone explicitly asked to have it indexed.
Create a server. Put an HTML page on it that no other page on your server has a link to. Name the page with some non-guessable UUID in the name.
Google will not find this unless they start to randomly change parts of URLs to test for existing pages (a lengthy process).
Within that page you can have links pointing to other pages. It is a parent page for those pages, it is a web page, and it will not be found via Google.
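A trivial sketch of that setup in Python, just to make it concrete (the file name pattern and the child links are invented):

    # Write an HTML page whose file name contains a random, non-guessable UUID
    # and which links to the pages it should act as a parent for.
    import uuid

    children = ["/child-a.html", "/child-b.html"]  # hypothetical child pages
    filename = "parent-%s.html" % uuid.uuid4()     # e.g. parent-3f2b9c1e-....html

    with open(filename, "w") as f:
        f.write("<!doctype html>\n<title>Unlisted parent page</title>\n")
        for href in children:
            f.write('<a href="%s">%s</a><br>\n' % (href, href))

    print("Upload %s and link to it from nowhere else." % filename)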
I'm looking for ideas/solutions for the following scenario:
I'm a website developer who has been given roughly 150 HTML pages from a third party, who updates and re-issues the HTML pages from time to time.
I'm looking for a way to implement search functionality for these pages and then navigate to the result's location within the page.
I don't want to add navigation tags to the HTML pages, as these would be lost when the third party re-issues the pages.
Ideally, I would like to have a search string, search the HTML files, and then return a list of results (rather like Google results); when the user clicks on the link for a particular result, the page opens and navigates to the result's location within the page.
I'm familiar with C#/JavaScript/jQuery.
Any ideas or suggestions to achieve this would be welcome... or confirmation that this can't be done :)
Don't Google, Bing, and other search engines provide APIs that let you use them to index the site and then use their search capabilities to show results from only your site?
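A hedged sketch of that idea using Google's Custom Search JSON API (this assumes you have created a Custom Search Engine restricted to your site and obtained an API key; verify the endpoint and field names against the current documentation before relying on them):

    # Query Google's Custom Search JSON API and list matching pages.
    # API_KEY and CX are placeholders you would have to create yourself.
    import requests

    API_KEY = "YOUR_API_KEY"
    CX = "YOUR_SEARCH_ENGINE_ID"

    def site_search(query):
        resp = requests.get(
            "https://www.googleapis.com/customsearch/v1",
            params={"key": API_KEY, "cx": CX, "q": query},
        )
        resp.raise_for_status()
        for item in resp.json().get("items", []):
            yield item["title"], item["link"], item.get("snippet", "")

    for title, link, snippet in site_search("example search term"):
        print(title, "->", link)

Jumping to the exact location inside a result page would still need something on your side, for example anchors or element IDs that already exist in the third-party HTML.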
Is it possible to make JSON data readable by a Google spider?
Say, for instance, that I have a JSON feed that contains the data for an e-commerce site. This JSON data is used to populate a human-readable page in the user's browser. (I.e. the translation from JSON data to the displayed page is done inside the user's browser; not my choice, just what I've been given to work with: it's an old legacy CGI application, not an actual server-side scripting language.)
My concern is that the Google spiders will not be able to pick up or directly link to the item in question; when a user clicks on it in Google, they will be presented with an index page full of all the items rather than being taken directly to the item they clicked on.
Is there any way of "informing" the Google spider, in the JSON, that it should send the user to a different link?
While Google does crawl and index JavaScript in some circumstances, it's still best to serve "normal" (X)HTML content if at all possible. In this case, it would help to know the rest of the site's setup, in particular: is the JSON content just used to create a feed of links to the product pages (with static content), or are all product pages also generated from JSON feeds? If the feed is only used to point to the actual product pages (which are static), then one way to make the product pages discoverable could be to create an HTML sitemap page or some other alternate form of navigation. An XML Sitemap file can also help, but I would recommend not using it as the sole way of making the product pages discoverable.
If all of the content is only accessible through JSON feeds, then I think you will have to make some bigger changes if you want that content to be accessible through search results.
One way to handle it could also be to use the new JavaScript crawling/indexing proposal, which basically would result in a headless browser being set up between your site and Google: http://code.google.com/web/ajaxcrawling/ (whether setting this up or revamping the rest of the site is easier is hard to say :-))
You should make a wrapper page in server-side code around the JSON data, and respond to requests with either the wrapper or the regular version depending on the User-Agent.
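A rough sketch of that wrapper, under the assumption of a Python/Flask front end (the template names and the load_item_json helper are invented); keeping the two versions equivalent matters, since serving bots different content otherwise shades into cloaking:

    # Serve a plain HTML rendering of the JSON data to crawlers, and the
    # existing JSON-driven page to normal browsers, based on the User-Agent.
    from flask import Flask, request, render_template

    app = Flask(__name__)
    BOT_SIGNATURES = ("googlebot", "bingbot", "slurp")  # rough, incomplete list

    def load_item_json(item_id):
        # Hypothetical helper: in the real application this would read the
        # item's data from the legacy JSON feed.
        return {"name": "Example item", "price": "9.99"}

    def looks_like_bot():
        ua = request.headers.get("User-Agent", "").lower()
        return any(sig in ua for sig in BOT_SIGNATURES)

    @app.route("/item/<item_id>")
    def item(item_id):
        if looks_like_bot():
            return render_template("item_static.html", **load_item_json(item_id))
        return render_template("item_dynamic.html", item_id=item_id)  # page that fetches the JSON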
Is there a widely used, standard way to index AJAX-loaded content (for search engines)?
For example, indexing HTML content that is dynamically inserted into a page.
Thanks
You may want to consider using some sort of sitemap generator that aggregates all the content you normally load through AJAX.
Sitemaps are particularly beneficial on websites where some areas of the website are not available through the browsable interface, or where webmasters use rich Ajax, Silverlight, or Flash content that is not normally processed by search engines.
From Wikipedia - Sitemaps
Remember that:
Because most web crawlers do not execute JavaScript code, publicly indexable web applications should provide an alternative means of accessing the content that would normally be retrieved with Ajax, to allow search engines to index it.
From Wikipedia - AJAX Drawbacks
In addition you may be interested in checking out the following articles:
Official Google Webmaster Central Blog - A proposal for making AJAX crawlable
SoftwareDeveloper.com - How to: Get Google and AJAX to Play Nice
Crawling Ajax-driven Web 2.0 Applications
One way of doing this is using JS fallbacks for dialog boxes like thickbox: A link would point to the dialog box loading Ajax content, and the fallback href='...' would point to a search engine-readable representation of that content (i.e. the HTML snippet that the AJAX function would load, but surrounded by the necessary HTML body basics).
Example (I pulled rel='box' out of my arse; it's supposed to be the anchor for the box plugin, like rel=thickbox):
<a href='/encyclopedia/definition/mushroom.html' rel='box'>Definition of Mushroom</a>
Clicking on the link in an Ajax/JS-enabled browser will open a nice dialog box with the article.
Clicking on the link without JS (or as a search engine) will lead to a new page containing the article (which needs some server-side intelligence to detect which channel the request came from; see the sketch below).
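A sketch of that server-side intelligence, assuming a Python/Flask backend (jQuery marks its Ajax requests with an X-Requested-With: XMLHttpRequest header, which is what the check relies on; the template names are invented):

    # Return only the article snippet for Ajax requests, and a full standalone
    # page (readable by search engines and non-JS browsers) otherwise.
    from flask import Flask, request, render_template

    app = Flask(__name__)

    @app.route("/encyclopedia/definition/<slug>.html")
    def definition(slug):
        is_ajax = request.headers.get("X-Requested-With") == "XMLHttpRequest"
        template = "article_snippet.html" if is_ajax else "article_full_page.html"
        return render_template(template, slug=slug)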
That's all that comes to my mind in this direction. Ajax and search engines is a widely uncharted field otherwise.
Have Javascript fallbacks. Have a look at Amazon Diamond Search with and without Javascript enabled. Read up on http://www.seroundtable.com/archives/006889.html
I don't really know the answer, but it seems to me that AJAX-loaded content won't help to improve search engine positions, because a search engine can't refer to AJAX-loaded content. In other words, a search engine can't say: "Hey, go here and then click the 3rd button from the top to see the content you're interested in."
I think a good idea is to put this content into XML and link to that XML from the page's markup (like the URL of an RSS feed)...
What about using alternative content for JS-disabled clients (and search engines)? I think there is no other way of letting search engines index your AJAX site properly.
I think that actually only Google really implements a specification for indexing AJAX content.
It's the Google AJAX crawling specification.
We have used it for our website; there is an example on our technical blog of how to do that with Django in a clean way.
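For context, a hedged sketch of the core of that specification in Django (view and template names are made up): the crawler requests the page with ?_escaped_fragment_=... instead of the #! fragment, and the server answers with a pre-rendered HTML snapshot of that state.

    # Django view: serve an HTML snapshot to crawlers that send
    # _escaped_fragment_, and the normal JS application to everyone else.
    from django.shortcuts import render

    def products(request):
        fragment = request.GET.get("_escaped_fragment_")
        if fragment is not None:
            # A crawler asked for /products/?_escaped_fragment_=item=42,
            # i.e. the state a browser would reach via /products/#!item=42.
            return render(request, "products_snapshot.html", {"fragment": fragment})
        return render(request, "products_app.html")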