Google showing "A description for this result is not available because of this site's robots.txt" - html

I created a web application and host the site on WordPress. When I search for its name in Google, the result shows:
A description for this result is not available because of this site's robots.txt
Why is this happening? Is there a problem with a meta tag?

Your site’s robots.txt file is disallowing crawling of the page you found in Google Search. This means Google’s bot won’t visit this page to read its content, so it can list the URL but has nothing to build a description from.
The robots.txt file exists at the URL /robots.txt, e.g., http://example.com/robots.txt.
You’ll want to look for and check any lines starting with Disallow:
Disallow: (with an empty value) allows everything
Disallow: / blocks everything
Disallow: /a blocks every URL whose path starts with /a,
e.g. http://example.com/a, http://example.com/abc, http://example.com/a/b.html etc.
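The prefix rules above can be checked locally. The following is a minimal sketch using Python's standard-library urllib.robotparser. The helper name is_allowed and the example.com URLs are illustrative assumptions, and this parser does plain prefix matching, so it may not reproduce every detail of Google's own matcher (wildcards, for example).

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt: str, url: str, agent: str = "*") -> bool:
    """Return True if `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# The three cases from the list above.
ALLOW_ALL = "User-agent: *\nDisallow:"
BLOCK_ALL = "User-agent: *\nDisallow: /"
BLOCK_A = "User-agent: *\nDisallow: /a"

if __name__ == "__main__":
    for url in ("http://example.com/a",
                "http://example.com/abc",
                "http://example.com/a/b.html",
                "http://example.com/b.html"):
        print(url, "allowed" if is_allowed(BLOCK_A, url) else "blocked")
```

To check a live site instead, RobotFileParser also offers set_url() and read(), which download the file from the URL you give it.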

If you have the Yoast SEO plugin installed, you can find and edit the robots.txt file at
SEO > Tools > File Editor
If you use Google Search Console (highly recommended), you can test your robots.txt file under
Crawl > robots.txt tester
Google Search Console: https://www.google.com/webmasters/tools/home?hl=en

Related

Protect images from Google

I have a folder of pictures on the server, and I would like to load them ONLY for users with the correct password.
In short:
The user enters the password, and then we use Ajax to insert the image into the HTML page as an img src.
I realize that the images can also be requested directly without a password. However, they are in very unusual folder paths.
What I'm interested in:
If Google or any other search engine crawls or indexes my page, will these images also be picked up, and could they appear in Google Image Search?
Reply:
In general, search engines only crawl your HTML pages and the links inside them, not the actual folder structure and files of your server. They shouldn't even have access to your server files :)
If your images are not linked in the page, you should be fine.
That said, you can always use a robots.txt. From the official documentation:
A robots.txt file tells search engine crawlers which pages or files the crawler can or can't request from your site.
Use robots.txt to manage crawl traffic, and also to prevent image, video, and audio files from appearing in Google search results.
Link: https://support.google.com/webmasters/answer/6062608?hl=en
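As a rough illustration of the robots.txt approach, the sketch below blocks Google's image crawler from one folder and checks the result with Python's standard-library parser. The folder name /private-images/ and the example.com URLs are hypothetical placeholders, not paths from the question. Keep in mind that robots.txt is itself publicly readable, so a rule like this names the folder it covers.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: keep Google's image crawler out of one folder.
ROBOTS_TXT = """\
User-agent: Googlebot-Image
Disallow: /private-images/
"""


def can_crawl(agent: str, url: str) -> bool:
    """Return True if `agent` may fetch `url` under ROBOTS_TXT."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    print(can_crawl("Googlebot-Image", "http://example.com/private-images/a.jpg"))
    print(can_crawl("Googlebot-Image", "http://example.com/index.html"))
```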

Disallow google robot from robots.txt and list sitemap instead

I am using a paid geolocation script to direct users to specific sites based on country.
However, I am getting charged a lot because robots keep crawling every page of my large site.
If I were to disallow Google in robots.txt and list a sitemap in the same file, would Google still index my pages without crawling them?
Example:
User-agent: *
Disallow: /
Sitemap: sitemap.xml
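Google can only index a page's content by crawling it, so with a blanket Disallow the sitemap will not get your pages' content indexed.
The best option is to disable the geolocation script when you detect a Google robot (or any other crawler).
You can recognize crawlers in several ways: by HTTP_USER_AGENT, HTTP_FROM, or IP address.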
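A minimal sketch of the User-Agent check might look like the following. The function names and the token list are assumptions made for illustration, and the header value is assumed to come from whatever your framework exposes (HTTP_USER_AGENT in CGI-style environments). A User-Agent string can be spoofed, so treat this as a cost-saving heuristic, not as proof of who is calling.

```python
from typing import Optional

# Illustrative list of substrings that well-known crawlers put in their
# User-Agent header; extend it for the bots you actually see in your logs.
KNOWN_CRAWLER_TOKENS = ("googlebot", "bingbot", "ia_archiver")


def is_known_crawler(user_agent: Optional[str]) -> bool:
    """Return True if the User-Agent header contains a known crawler token."""
    if not user_agent:
        return False
    lowered = user_agent.lower()
    return any(token in lowered for token in KNOWN_CRAWLER_TOKENS)


def should_run_geolocation(user_agent: Optional[str]) -> bool:
    """Skip the paid geolocation lookup for requests that look like crawlers."""
    return not is_known_crawler(user_agent)


if __name__ == "__main__":
    ua = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
    print(should_run_geolocation(ua))
```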

Disable crawling of unwanted subdomains

How can I stop subdomain.domain.com from being crawled and listed by Alexa and other crawlers? In particular, cpanel.domain.com and webmail.domain.com are listed on my Alexa information page, which is annoying :/
From this article: https://alexa.zendesk.com/hc/en-us/articles/200450194-Alexa-s-Web-and-Site-Audit-Crawlers
The Alexa web crawler (robot) identifies itself as “ia_archiver” in the HTTP “User-agent” header field. The Alexa Internet ia_archiver crawler strictly adheres to robots.txt rules.
To prevent ia_archiver from visiting any part of your site, your robots.txt file should look like this:
User-agent: ia_archiver
Disallow: /
You can also restrict crawling of specific directories. For example, to prevent ia_archiver from visiting the images directory (and its subdirectories):
User-agent: ia_archiver
Disallow: /images/
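To see how the directory rule behaves, here is a small check with Python's standard-library urllib.robotparser. The example.com URLs and the file names are made up for illustration, and the results reflect this parser's prefix matching only.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: ia_archiver
Disallow: /images/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def may_fetch(agent: str, path: str) -> bool:
    """Return True if `agent` may fetch `path` on the hypothetical example.com."""
    return parser.can_fetch(agent, "http://example.com" + path)


if __name__ == "__main__":
    # The rule covers the directory and everything below it, for ia_archiver only.
    for path in ("/images/logo.png", "/images/icons/a.png", "/index.html"):
        print(path, may_fetch("ia_archiver", path))
```

Crawlers that are not named in any User-agent group, and have no `*` group to fall back on, are not restricted by this file.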
If you can, place a robots.txt in the root of each subdomain you do not want crawled. If those pages are outside your control, the hosting service should (or could) have put these or similar restrictions in place.

Disallow subdomain url using robots.txt

I would like to ask a question.
I have the domains kiosban.com and store.kiosban.com,
and I want to disallow
store.kiosban.com/template/*
I have this in my store.kiosban.com/robots.txt,
but when I look at Google Webmaster Tools, under Health > Blocked URLs, I get:
robots.txt file Blocked URLs Downloaded Status
http://www.store.kiosban.com/robots.txt - Never
Did I do something wrong?
www.store.kiosban.com and store.kiosban.com are different hosts. You should provide a robots.txt on both hosts, or even better, 301-redirect one form to the other one.
But that doesn’t seem to be the issue in your case. It looks like Google just needs some time to crawl your site and its robots.txt.
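Because robots.txt is looked up per host, each hostname resolves to its own file. The sketch below derives the robots.txt location for a page URL; the function name robots_txt_url and the page path page.html are illustrative assumptions.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url: str) -> str:
    """Return the robots.txt URL that governs `page_url` (same scheme and host)."""
    parts = urlsplit(page_url)
    # Keep scheme and host, replace the path, drop query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


if __name__ == "__main__":
    print(robots_txt_url("http://store.kiosban.com/template/page.html"))
    print(robots_txt_url("http://www.store.kiosban.com/template/page.html"))
```

The two calls return different URLs, which is why a file served on one host does not cover the other unless one redirects to the other.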

How do you create a robots.txt file that blocks all but the root

How do you create a valid robots.txt file that blocks all crawler requests except for the root, i.e. the landing page http://www.mysite.com?
Assuming your default page for the root is named index.htm, I believe this will accomplish what you're looking for.
User-agent: *
Allow: /index.htm
Disallow: /
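Note that these rules allow only the literal path /index.htm; a request for the bare root / still matches Disallow: /. For crawlers that support the $ end-of-URL anchor (Google's does), Allow: /$ permits the root itself.
Google's Webmaster Tools has helpful guidance on writing a robots.txt file, and if you use Webmaster Tools you also get a robots.txt builder/tester.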
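The rules from this answer can be tried out with Python's standard-library urllib.robotparser, as in the sketch below. The page about.html is a made-up example, and the parser applies the first matching rule with plain prefix matching, so other crawlers may resolve the same file differently. Under this parser, the bare root path / is reported as blocked, because / does not start with /index.htm.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: *
Allow: /index.htm
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(url: str) -> bool:
    """Return True if any crawler ("*") may fetch `url` under ROBOTS_TXT."""
    return parser.can_fetch("*", url)


if __name__ == "__main__":
    for url in ("http://www.mysite.com/index.htm",
                "http://www.mysite.com/about.html",
                "http://www.mysite.com/"):
        print(url, "allowed" if allowed(url) else "blocked")
```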