<meta name="robots"> not working - html

I have a page at:
https://www.luckycheckout.com/goto/282/cs/1?ct=1
which contains the following line of code in the head section:
<meta name="robots" content="noindex, nofollow" />
I also have "Disallow: /goto" in my "robots.txt" file.
However, despite this, Google Search Console is complaining that the page is:
Indexed, though blocked by robots.txt
As far as I can tell, everything is valid and correct, and the page should not be indexed. Can anybody explain what Google is complaining about?
Thanks

You need to remove the Disallow rule for this path from your robots.txt file. From Google's documentation:
When Googlebot next crawls that page and sees the tag or header, Googlebot will drop that page entirely from Google Search results, regardless of whether other sites link to it.
Important: For the noindex directive to be effective, the page must not be blocked by a robots.txt file. If the page is blocked by a robots.txt file, the crawler will never see the noindex directive, and the page can still appear in search results, for example if other pages link to it.
source: https://support.google.com/webmasters/answer/93710
You can also remove a site you own from the Google index using Google Search Console. You can find more information in the Google Webmaster documentation.
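To see why the Disallow rule gets in the way, here is a minimal sketch using Python's standard urllib.robotparser. It only approximates how a compliant crawler reads robots.txt (it is not Googlebot's own matcher), and the helper name may_crawl is made up for illustration.

```python
from urllib.robotparser import RobotFileParser


def may_crawl(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


PAGE = "https://www.luckycheckout.com/goto/282/cs/1?ct=1"

# The rule from the question: the crawler may not fetch the page,
# so it never gets to read the noindex meta tag inside it.
blocked = may_crawl("User-agent: *\nDisallow: /goto\n", "Googlebot", PAGE)

# With the rule removed (an empty Disallow allows everything), the page
# can be fetched and the noindex tag can finally be seen.
allowed = may_crawl("User-agent: *\nDisallow:\n", "Googlebot", PAGE)

print(blocked, allowed)
```

Once the page is crawlable again, the noindex tag is what actually gets it dropped from the index.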

Related

How to stop bots from crawling or indexing an Angular app

I want to publish an Angular app for testing purposes, but I want to make sure that the site does not get crawled or indexed by bots.
I assume (might be way off!) I would add my <meta> tags simply on my index.html page, and for good measure add a robots.txt file in my root?
These are my meta tags:
<meta name="robots" content="noindex,nofollow">
<meta name="googlebot" content="noindex" />
This is the content of my robots.txt file:
User-agent: *
Disallow: /
Thank you in advance!
Using the robots.txt file you specified will be enough to keep your site from being crawled by bots that follow the robots exclusion standard. With this robots.txt you don't need to specify the meta tags, because a compliant bot reads robots.txt first and won't fetch the site's HTML, so it never reads the meta tags.
The meta tags are used when your robots.txt file would normally allow a page to be crawled, but you want to exclude it at the page level, which allows more granular selection.
Note that some uncommon crawlers may not respect the exclusion standard. If you really want to restrict access to your test site, you should consider making it accessible only after authentication or allowing access only to certain IP addresses.
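If you go the authentication route, the idea can be sketched as a small WSGI wrapper that demands HTTP Basic credentials before anything is served. This is only an illustration in Python, not something specific to Angular hosting; the names USERNAME, PASSWORD, credentials_ok and require_auth are assumptions, and real credentials should come from configuration and be sent over HTTPS only.

```python
import base64
import hmac

# Hypothetical credentials for illustration only.
USERNAME = "tester"
PASSWORD = "change-me"


def credentials_ok(header_value):
    """Return True if an Authorization header carries the expected Basic credentials."""
    if not header_value or not header_value.startswith("Basic "):
        return False
    try:
        raw = base64.b64decode(header_value[len("Basic "):], validate=True)
        decoded = raw.decode("utf-8")
    except (ValueError, UnicodeDecodeError):
        return False
    user, sep, password = decoded.partition(":")
    if not sep:
        return False
    # Compare as bytes in constant time to avoid leaking information via timing.
    user_ok = hmac.compare_digest(user.encode(), USERNAME.encode())
    pass_ok = hmac.compare_digest(password.encode(), PASSWORD.encode())
    return user_ok and pass_ok


def require_auth(app):
    """Wrap a WSGI app so unauthenticated requests (bots included) get a 401."""
    def wrapper(environ, start_response):
        if credentials_ok(environ.get("HTTP_AUTHORIZATION")):
            return app(environ, start_response)
        start_response("401 Unauthorized", [
            ("WWW-Authenticate", 'Basic realm="test site"'),
            ("Content-Type", "text/plain"),
        ])
        return [b"Authentication required\n"]
    return wrapper
```

A crawler that ignores robots.txt still only ever sees the 401 response, which is the point of restricting access rather than asking politely.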

Remove dynamically generated url from google search

I have a page on my website that takes two parameters in the query string.
I don't want this page to show up in Google search, so I added the meta tag below:
<meta name="robots" content="noindex">
But it still shows up in the results as cached. I tried using Google Webmaster Tools to remove the URLs. It removes the requested URL, but it does not stop the page from being indexed, and I still get the URL with some other parameter value. I don't want this page to appear in search at all.
I looked at "remove pages from google dynamic url - robots.txt", but it did not answer my question.
Thanks
It's a known issue. After you have taken all the recommended steps, i.e. added the 'noindex' meta tag, disallowed the page in robots.txt and removed it from search in Webmaster Tools, it may take a couple of weeks until the page completely disappears from search results. It's Google.
So the only way to prevent the page from being visited is to catch the request on the server and redirect.

How to keep the search engines away from some pages on my domain

I've built an admin control panel for my website. I don't want the control panel app to end up in a search engine, since there's really no need for it. I did some research and found that I can probably achieve my goal by using the following tag:
<meta name="robots" content="noindex,nofollow">
Is this true? Are there other, more reliable methods? I'm asking because I'm scared I could mess things up by using the wrong method, and I do want search engines to index my site, just not the control panel...
Thanks
This is true, but on top of that, to be extra safe, you should set this in your .htaccess file:
Header set X-Robots-Tag "noindex, nofollow"
You should also create a new file in the root of your domain, named robots.txt, with this content:
User-agent: *
Disallow: /
And you can be sure that they won't index your content ;)
Google will honor the meta tag by completely dropping the page from its index (source); other crawlers, however, might simply decide to ignore it.
In that particular sense, meta tags are more reliable with Google, because with robots.txt alone, any external source that explicitly links to your admin page (for whatever reason) will make your page appear in the Google index (though without any content, which will probably result in some SERP leeching).
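If the control panel is served by an application rather than static files, the X-Robots-Tag header can also be attached in code, and only for the admin area so the rest of the site stays indexable. Below is a rough Python WSGI sketch of that idea; the /admin/ prefix and the names ADMIN_PREFIX and noindex_admin are assumptions for illustration, not anything from the question.

```python
ADMIN_PREFIX = "/admin/"  # hypothetical location of the control panel


def noindex_admin(app, prefix=ADMIN_PREFIX):
    """Wrap a WSGI app so responses under `prefix` carry X-Robots-Tag."""
    def wrapper(environ, start_response):
        path = environ.get("PATH_INFO", "")

        def patched_start_response(status, headers, exc_info=None):
            if path.startswith(prefix):
                # Same directive values as the meta tag from the question.
                headers = list(headers) + [("X-Robots-Tag", "noindex, nofollow")]
            return start_response(status, headers, exc_info)

        return app(environ, patched_start_response)
    return wrapper
```

For the header to be seen at all, the admin path must not also be disallowed in robots.txt, for the reason given in the answer above. And for a real admin panel, a login requirement matters far more than any robots directive.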

How to add a redirect to a web page where you have limited user privileges

The company I work for has replaced our previously very flexible website with a much more restrictive "website in a box" technology. I have my web pages hosted on Google Sites and would like to redirect people to those pages. When I attempt to do this via JavaScript, it gets stripped from the page when it's saved. I do not have access to the <head> section to attempt the deprecated method of redirecting.
Is there another method available to automatically redirect a customer other than just posting a link in a restricted environment like this?
If you're limited to using HTML to do the redirect, you can use a meta redirect:
<meta http-equiv="refresh" content="0; url=http://example.com/">
Though note that its use is deprecated because it may be disorienting to the user. In addition to the <meta> tag, you can add <link rel="canonical" href="http://example.com/"> to let search engines know that the targeted page is the canonical one.
Edit: if Google Sites won't allow you to change the <head> HTML, the Javascript, or the PHP, then it's time to go searching for solutions within Google Sites itself. One solution that pops up pretty frequently in searches seems to be using a URL Redirect Gadget.
On the page you want to redirect from, click the Edit Page button, then Insert Menu, then More Gadgets. Once there, search for "redirect gadgets" and some widgets that should help will show up.
These instructions are based on advice given in the Google Products forums. I don't have a Google Site myself, so I can't verify that they work.

Preventing Site from Being Indexed by Search Engines

How can I prevent Google and other search engines from indexing my website?
I realize this is a very old question, but I wanted to highlight the comment made by @Julien as an actual answer.
According to Joost de Valk, robots.txt will indeed prevent your site from being crawled by search engines, but links to your site may still appear in search results if other sites have links that point to your site.
The solution is either adding a robots meta tag to the header of your pages:
<meta name="robots" content="noindex,nofollow"/>
Or, a simpler option is to add the following to your .htaccess file:
Header set X-Robots-Tag "noindex, nofollow"
Obviously your web host has to allow .htaccess rules and have the mod_headers module installed for that to work.
Both of these keep search engines from following the links on your pages AND from displaying your pages in search results. Win-Win, baby.
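To double-check that a page really carries the robots meta tag, a small script can parse the HTML and list the directives it finds. This is a minimal sketch using Python's standard html.parser; the names RobotsMetaParser and robots_directives are made up for illustration, and it only looks at the generic "robots" meta name, not bot-specific ones such as "googlebot".

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the directives of every <meta name="robots"> tag in a document."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing tags such as <meta ... />.
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() != "robots":
            return
        for part in attr.get("content", "").split(","):
            part = part.strip().lower()
            if part:
                self.directives.add(part)


def robots_directives(html):
    """Return the set of robots directives declared in the given HTML text."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return parser.directives


page = '<html><head><meta name="robots" content="noindex,nofollow"/></head></html>'
print(sorted(robots_directives(page)))
```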
Create a robots.txt file in your site root with the following content:
# robots.txt for yoursite
User-agent: *
Disallow: /
Search engines (and most robots in general) will respect the contents of this file. You can put any number of Disallow: /path lines for robots to ignore. More details at robotstxt.org.
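When there are several paths to exclude, the file can be generated and sanity-checked before it is deployed. The sketch below builds a robots.txt from a list of paths and checks it with Python's standard urllib.robotparser; the paths, the example.com URLs, the "ExampleBot" user agent and the helper name build_robots_txt are all placeholders, and the standard-library parser is only an approximation of how real crawlers match rules.

```python
from urllib.robotparser import RobotFileParser


def build_robots_txt(disallowed_paths):
    """Return robots.txt text that disallows each given path for all robots."""
    lines = ["User-agent: *"]
    lines += [f"Disallow: {path}" for path in disallowed_paths]
    return "\n".join(lines) + "\n"


# Hypothetical paths, one Disallow line each.
robots_txt = build_robots_txt(["/admin/", "/tmp/"])

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

admin_allowed = parser.can_fetch("ExampleBot", "https://example.com/admin/login")
home_allowed = parser.can_fetch("ExampleBot", "https://example.com/index.html")
print(robots_txt)
print(admin_allowed, home_allowed)
```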