How do I stop search engines indexing a maintenance page - html

I need to set up a maintenance page for a website I'm running, e.g. to display when I'm performing site maintenance (scheduled downtime) or if something really breaks and I need to put up a holding page.
Is there anything special I need to do to ensure that search engine crawlers don't index it and think that it's my site? Or should I return a 404, add a temporary robots.txt file, or something else? I basically don't want them to index it as my site, but I also don't want them to think my site is dead and not come back.
Edit: Here's what I did in Apache:

    ErrorDocument 503 /.server-maintenance.html
    RewriteEngine On
    RewriteRule !^.server-maintenance.html /server-maintenance
    Redirect 503 /server-maintenance

You should send a 503 Service Unavailable HTTP status code, and not a 404. Use this in conjunction with a Retry-After header to tell the robots when to come back.
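For example, in Apache (a sketch assuming mod_rewrite and mod_headers are enabled, and that /maintenance.html is the holding page), something along these lines serves every request with a 503 plus a Retry-After hint:

    ErrorDocument 503 /maintenance.html
    RewriteEngine On
    # Answer everything except the maintenance page itself with a 503
    RewriteCond %{REQUEST_URI} !^/maintenance\.html$
    RewriteRule ^ - [R=503,L]
    # Suggest that crawlers retry in one hour (3600 seconds)
    Header always set Retry-After "3600"

Adjust the file name and retry interval to match your setup.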

You may use a robots.txt file:
http://www.robotstxt.org/
Also, Google has a robots.txt validator in their Webmaster Tools:
https://www.google.com/webmasters/tools/
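For reference, a robots.txt that blocks all crawling looks like this (a temporary 503 is generally the safer option, though, since a blanket Disallow left in place keeps crawlers away from the real site as well):

    User-agent: *
    Disallow: /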

Returning 503 Service Unavailable tells Googlebot to come back later. There's a Google support page describing the HTTP error codes and how Google interprets them.
You can also use the Retry-After response header to suggest the minimum time after which your site should be re-checked for availability.
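The response a crawler sees during maintenance would then look roughly like this (Retry-After accepts either a number of seconds or an HTTP date):

    HTTP/1.1 503 Service Unavailable
    Retry-After: 3600
    Content-Type: text/html; charset=UTF-8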

Another approach would be to not link the maintenance page from any other page on your website (or any other website).

Related

Looking for a script or program to convert all HTTP links to HTTPS

I am trying to find a script or program to convert my HTML website links from http to https.
I have looked through hundreds of search results and web articles, and I used the WordPress SSL plugin, but it missed numerous pages with http links.
Below is one of thousands of my links I need to convert:
http://www.robert-b-ritter-jr.com/2015/11/30/blog-121-we-dont-need-the-required-minimum-distributions-rmds
I am looking for a way to do this quickly instead of one at a time.
The HTTPS Everywhere extension will automatically rewrite insecure HTTP requests to HTTPS. Keep in mind not all websites offer a secure and encrypted connection.
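Note that HTTPS Everywhere only rewrites requests in the visitor's browser; it does not change the links stored in your pages. If the goal is to edit the pages themselves, a small script can do the bulk replacement. A minimal sketch in Python, assuming the pages are static .html files under a public_html directory (the directory name and domain list are placeholders; back up the files first, and this does not cover links stored in a WordPress database):

    import pathlib
    import re

    SITE_ROOT = pathlib.Path("public_html")      # directory containing the HTML files
    DOMAINS = ["www.robert-b-ritter-jr.com"]      # domains known to be reachable over HTTPS

    # Match absolute http:// links for the listed domains only
    pattern = re.compile(
        r"http://(" + "|".join(re.escape(d) for d in DOMAINS) + r")",
        re.IGNORECASE,
    )

    for page in SITE_ROOT.rglob("*.html"):
        text = page.read_text(encoding="utf-8")
        updated = pattern.sub(r"https://\1", text)
        if updated != text:
            page.write_text(updated, encoding="utf-8")
            print("updated", page)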

React SPA app - serving separate prerendered static HTML for SEO, benefits and drawbacks

Are there any benefits or drawbacks to serving a light version of a page, optimized for SEO, when bots crawl, and the full React SPA (a completely JavaScript application) when people come from the web?
Basically the question is: is it an accepted practice to serve a short HTML version that contains only the SEO-important things and strips out everything else for bots, and the full page for users?
Is there any use case or example where somebody has used this technique?
This would be seen as cloaking by the crawlers and could get your site penalized in the search results. If you are serving a prerendered page, you will want to make sure it is the exact page that your users will see after the JavaScript has been executed, in order to prevent any cloaking issues.
You could mount the Prerender.io middleware:
The Prerender.io middleware that you install on your server will check each request to see if it's a request from a crawler. If it is a request from a crawler, the middleware will send a request to Prerender.io for the static HTML of that page. If not, the request will continue on to your normal server routes. The crawler never knows that you are using Prerender.io since the response always goes through your server.
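As a rough illustration of that pattern (not Prerender.io's actual middleware), here is a sketch in Python/Flask: crawler requests, detected by User-Agent, are proxied to a prerender service, while normal visitors get the SPA shell. The service URL, token header, and bot list are assumptions/placeholders; in practice you would install the official middleware for your server stack:

    from flask import Flask, Response, request
    import requests

    app = Flask(__name__, static_folder="build")

    PRERENDER_URL = "https://service.prerender.io/"   # assumed service endpoint
    PRERENDER_TOKEN = "YOUR_TOKEN"                    # placeholder token
    BOT_AGENTS = ("googlebot", "bingbot", "yandexbot", "baiduspider")

    def is_crawler(user_agent: str) -> bool:
        # Very rough crawler detection based on the User-Agent header
        ua = user_agent.lower()
        return any(bot in ua for bot in BOT_AGENTS)

    @app.route("/", defaults={"path": ""})
    @app.route("/<path:path>")
    def serve(path):
        if is_crawler(request.headers.get("User-Agent", "")):
            # Fetch the prerendered static HTML for this URL from the service
            resp = requests.get(
                PRERENDER_URL + request.url,
                headers={"X-Prerender-Token": PRERENDER_TOKEN},
                timeout=10,
            )
            return Response(resp.text, status=resp.status_code, content_type="text/html")
        # Normal visitors get the SPA shell; the JavaScript app takes over client-side
        return app.send_static_file("index.html")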
Related links: Quora SEO post, taekwondomonfils.com SEO

should rel-canonical also include protocol (http/https)?

I'm migrating my website from http to https (although it will still support access via http).
Currently all of my pages have accurate rel-canonical meta tags set in the HTML, but obviously they all point to the canonical http:// URL.
Should I now be updating those to https:// too, or is it OK to leave them as http?
I'm wondering whether Google will penalise me, or start detecting duplicate content, if I start mixing them.
Yes, Google sees http and https as different sites, so you should update them.
A redirect on the server might be sufficient in the short term, but personally I would look to update the pages as soon as you can.
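Once the https version is the one you want indexed, each page's canonical tag should point at its https:// URL, for example (placeholder domain and path):

    <link rel="canonical" href="https://www.example.com/some-page/">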

Link to http or https

While adding a hyperlink to another site (which has SSL), the site's documentation sometimes says to link to the http:// URL instead of the https:// one (e.g. the Play Store, which is a site that uses SSL but does not tell you to link to https; instead, it says to link to http). Either works (both function normally), but would there be a reason to link to http:// instead of https://?
Maybe they don't want the extra encryption slowing down the site, as SSL may decrease performance somewhat.
If users are downloading large, public files, there may be a system burden to encrypt these each time.
Some browsers may not support SSL.
You will probably want the home page accessible via HTTP, so that users don't have to remember to type https to get to it.
Only a specific portion of the site may need secure HTTP (https), not the whole site.
Your site may be indexed mainly under http by search engines.

Disallow subdomain url using robots.txt

I would like to ask you a question...
I have a domain, kiosban.com, and a subdomain, store.kiosban.com,
and I want to disallow
store.kiosban.com/template/*
And I have this on my store.kiosban.com/robots.txt.
But when I look at Google Webmaster Tools, under the Health menu >> Blocked URLs, I get:

    robots.txt file                             Blocked URLs    Downloaded
    http://www.store.kiosban.com/robots.txt    -               Never

Did I do something wrong?
www.store.kiosban.com and store.kiosban.com are different hosts. You should provide a robots.txt on both hosts, or even better, 301-redirect one form to the other.
But that doesn't seem to be the issue in your case. It looks like Google just needs some time to crawl your site and its robots.txt.
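For reference, a robots.txt served at the root of store.kiosban.com that blocks the template URLs would look something like this (the trailing * is not needed, since Disallow rules are prefix matches):

    User-agent: *
    Disallow: /template/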