I am using a paid geolocation script to direct users to specific sites based on their country.
However, I am being charged a lot because robots keep crawling every page of my large site.
If I were to disallow Google in robots.txt and provide a sitemap in robots.txt, would Google still index my pages without crawling them?
Example
User-agent: *
Disallow: /
Sitemap: sitemap.xml
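Google can only index the content of pages it is allowed to crawl, so a sitemap alone will not get them indexed.
The best option for you is to disable the geolocation script when you detect Googlebot (or another crawler).
You can recognize crawlers in several ways: by the User-Agent header (HTTP_USER_AGENT), the From header (HTTP_FROM), or the client IP address.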
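A minimal sketch of that idea in Python, assuming your pages are served by a Python backend that can see the request's User-Agent header and client IP. The crawler list and all function names are hypothetical. The reverse-DNS check follows the general approach Google documents for verifying Googlebot (reverse lookup, domain check, forward confirmation); check the current documentation for the exact accepted domains.

```python
import re
import socket

# Hypothetical, non-exhaustive list of crawler User-Agent tokens; extend as needed.
CRAWLER_PATTERN = re.compile(
    r"googlebot|bingbot|slurp|duckduckbot|baiduspider|yandexbot|ia_archiver",
    re.IGNORECASE,
)


def looks_like_crawler(user_agent: str) -> bool:
    """Cheap check: does the User-Agent header name a known crawler?"""
    return bool(user_agent) and CRAWLER_PATTERN.search(user_agent) is not None


def is_verified_googlebot(ip: str) -> bool:
    """Stronger check, because the User-Agent header can be spoofed.

    Reverse-resolve the IP, require a Google host name, then forward-resolve
    that name and confirm it maps back to the same IP.
    gethostbyname_ex only returns IPv4 addresses, so IPv6 is not handled here.
    """
    try:
        host, _aliases, _addrs = socket.gethostbyaddr(ip)
    except OSError:  # socket.herror and socket.gaierror are OSError subclasses
        return False
    if not host.endswith((".googlebot.com", ".google.com")):
        return False
    try:
        _name, _aliases, addresses = socket.gethostbyname_ex(host)
    except OSError:
        return False
    return ip in addresses


def should_run_geolocation(user_agent: str) -> bool:
    """Skip the paid geolocation lookup for anything that identifies as a crawler."""
    return not looks_like_crawler(user_agent)
```

For the cost problem alone, the User-Agent check is usually enough: a client that fakes a crawler User-Agent simply skips the lookup. The DNS verification is only worth its latency where you need to trust that a request really comes from Googlebot.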
I'm developing a Google Workspace extension for Gmail that makes requests to a backend API and fetches images from different CDN servers, so it is not possible to include every possible URL in the urlFetchWhitelist property of the manifest. Is it possible to add only the server's hostname and use some kind of wildcard to cover all of its possible suffixes?
Wildcards are supported in urlFetchWhitelist, but they are limited to sub-domains.
See documentation linked below:
https://developers.google.com/apps-script/add-ons/concepts/workspace-manifests#allowlist_urls
I created a web application and host the site using WordPress. When I search for its name in Google, the result shows:
A description for this result is not available because of this site's robots.txt
Why is this happening? Is there a problem with a meta tag?
Your site’s robots.txt file is disallowing crawling of the page you found in Google Search. This means Google’s bot won’t visit this page to read its content.
The robots.txt file exists at the URL /robots.txt, e.g., http://example.com/robots.txt.
You’ll want to look for and check any lines starting with Disallow:
Disallow: (with an empty value) allows everything
Disallow: / blocks everything
Disallow: /a blocks every URL whose path starts with /a, e.g. http://example.com/a, http://example.com/abc, http://example.com/a/b.html, etc.
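To check how these prefix rules apply to a given URL, Python's standard urllib.robotparser module can serve as a rough test harness. This is a minimal sketch and the helper name allowed is hypothetical. Note that robotparser uses plain prefix matching where the first matching rule wins; it does not implement Google's * and $ path wildcards or its longest-match precedence, so treat the result as an approximation of Googlebot's behavior.

```python
from urllib.robotparser import RobotFileParser


def allowed(robots_txt: str, url: str, agent: str = "Googlebot") -> bool:
    """Return True if robots_txt permits `agent` to fetch `url`.

    Standard-library parser: plain prefix matching, first matching rule wins,
    no support for Google's * and $ path wildcards.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    rules = "User-agent: *\nDisallow: /a"
    for path in ("/a", "/abc", "/a/b.html", "/b"):
        url = "http://example.com" + path
        print(url, "allowed" if allowed(rules, url) else "blocked")
```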
If you have the Yoast SEO plugin installed, you can find and edit the robots.txt file under SEO > Tools > File Editor.
If you use Google Search Console (highly recommended), you can test your robots.txt file:
Go to Crawl > robots.txt Tester.
GSC - https://www.google.com/webmasters/tools/home?hl=en
Based on Google's recent announcement, I need a way to load the Google Maps JavaScript API from my mobile hybrid/Cordova app. I could whitelist file:/// URLs in my console, but I'd rather not, because anyone who learned my client ID could then use it in their own app and I'd have no way to protect myself from that.
Apparently Google now supports some kind of API key, but only for Premium accounts created since January of this year, which mine is not.
Is there some other way to allow my mobile app to access the Google Maps JavaScript API without opening up such a risk?
If you are a Standard Plan user, you need to load the Maps JavaScript API with a key. Given the current limitation on API keys and file:// URLs, you will have to leave the key unrestricted. You can star this bug to be alerted of updates.
If you are a Premium Plan user, you also have the option to use a client ID, which can be more tightly secured. You can file a support case to request that your client ID authorizes only the file:// URL(s) that you are using.
UPDATE
Restrictions for the file:// protocol have since been introduced in the Google Maps JavaScript API. You can find the details in the official documentation:
https://developers.google.com/maps/documentation/javascript/get-api-key#key-restrictions
file:// referers need a special representation to be added to the Key restriction. The "file:/" part should be replaced with "__file_url__" before being added to the Key restriction. For example, "file:///path/to/" should be formatted as "__file_url__//path/to/*". After enabling file:// referers, it is recommended you regularly check your usage, to make sure it matches your expectations.
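The renaming rule quoted above is mechanical, so it can be sketched as a small helper. This is an illustration only: the function name is hypothetical, and appending * to directory URLs is an assumption modeled on the documentation's single example, so drop that step if you want to restrict the key to one specific file.

```python
FILE_PREFIX = "file:/"
RESTRICTION_PREFIX = "__file_url__"


def to_key_restriction(file_url: str) -> str:
    """Rewrite a file:// URL into the form used for API key referrer restrictions."""
    if not file_url.startswith(FILE_PREFIX):
        raise ValueError(f"not a file:// URL: {file_url!r}")
    restriction = RESTRICTION_PREFIX + file_url[len(FILE_PREFIX):]
    # Assumption: treat a trailing slash as "everything under this directory",
    # mirroring the documentation's "__file_url__//path/to/*" example.
    if restriction.endswith("/"):
        restriction += "*"
    return restriction


if __name__ == "__main__":
    print(to_key_restriction("file:///path/to/"))  # prints __file_url__//path/to/*
```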
How can I stop subdomain.domain.com from being crawled and listed by Alexa and other crawlers? Especially cpanel.domain.com and webmail.domain.com, which are listed on my Alexa information page, and that is annoying :/
From this article: https://alexa.zendesk.com/hc/en-us/articles/200450194-Alexa-s-Web-and-Site-Audit-Crawlers
The Alexa web crawler (robot) identifies itself as “ia_archiver” in the HTTP “User-agent” header field. The Alexa Internet ia_archiver crawler strictly adheres to robots.txt rules.
To prevent ia_archiver from visiting any part of your site, your robots.txt file should look like this:
User-agent: ia_archiver
Disallow: /
You can also restrict crawling of specific directories. For example, to prevent ia_archiver from visiting the images directory (and its subdirectories):
User-agent: ia_archiver
Disallow: /images/
If you can, place a robots.txt file in the root of each subdomain you do not want crawled; each host needs its own file. If these pages are outside your control, the hosting provider should (or could) have added this or a similar restriction.
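A minimal sketch of generating those per-subdomain files, assuming you control each subdomain's document root. The function names and directory paths are hypothetical; cPanel and webmail subdomains are often served by the control panel itself, which is exactly the case where the hosting provider has to act. By default it builds the ia_archiver rule quoted above; passing "*" as the agent gives a block-all rule for every compliant crawler.

```python
import tempfile
from pathlib import Path


def build_robots_txt(user_agents=("ia_archiver",), disallow="/"):
    """Build one robots.txt group: the listed agents share a single Disallow rule."""
    lines = [f"User-agent: {agent}" for agent in user_agents]
    lines.append(f"Disallow: {disallow}")
    return "\n".join(lines) + "\n"


def write_robots(docroots, content):
    """Write robots.txt into the root of each (hypothetical) subdomain document root."""
    written = []
    for root in docroots:
        target = Path(root) / "robots.txt"
        target.write_text(content)
        written.append(target)
    return written


if __name__ == "__main__":
    # Throwaway directories stand in for the real document roots of
    # subdomains such as cpanel.domain.com and webmail.domain.com.
    with tempfile.TemporaryDirectory() as tmp:
        roots = [Path(tmp) / "cpanel", Path(tmp) / "webmail"]
        for root in roots:
            root.mkdir()
        for path in write_robots(roots, build_robots_txt()):
            print(path.parent.name, repr(path.read_text()))
```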
I would like to ask a question.
I have a domain, kiosban.com, and a subdomain, store.kiosban.com,
and I want to disallow
store.kiosban.com/template/*
And I have this in my store.kiosban.com/robots.txt
but when I look at Google Webmaster Tools, under the Health menu >> Blocked URLs, I get:
robots.txt file Blocked URLs Downloaded Status
http://www.store.kiosban.com/robots.txt - Never
Did I do something wrong?
www.store.kiosban.com and store.kiosban.com are different hosts. You should provide a robots.txt on both hosts, or even better, 301-redirect one form to the other one.
But that doesn’t seem to be the issue in your case. It looks like Google just needs some time to crawl your site, or rather its robots.txt.
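Because robots.txt rules apply per host (scheme, host name and port), the file that governs a page is always /robots.txt on that page's own host. A small sketch in Python makes the distinction between the two hosts explicit; the helper name and the page paths are hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def governing_robots_url(page_url: str) -> str:
    """Return the robots.txt URL that applies to page_url (same scheme, host and port)."""
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


if __name__ == "__main__":
    for page in ("http://store.kiosban.com/template/a.html",
                 "http://www.store.kiosban.com/template/a.html"):
        print(page, "->", governing_robots_url(page))
```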