Google: how to prevent subpages from appearing in results

I have a fairly new website that lets people create their own profiles. The issue is that when someone links to their profile from their own website or blog, that profile shows up in Google searches for my site. So far the one person who has done this has an NSFW profile, which means that when you search for my site on Google, one of the top results is an NSFW page.
How do I prevent Google from listing subpages in the results? Would robots.txt solve this? And if a page is already listed, will adding a robots.txt entry that disallows access to profile pages in general eventually remove it from the results?

robots.txt will solve it to some extent. In my experience, Google still indexes pages that have direct external links pointing to them.
Go to http://webmaster.google.com, claim your website, and then use their URL removal tool.

Yes, see http://www.robotstxt.org/. Just add rules such as "Disallow: /profile/" and Google will stop indexing those pages and, after a while, remove them.
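Before deploying a rule like this, it can help to check which URLs it actually covers. The following is a minimal sketch using Python's standard urllib.robotparser; the example.com URLs and the is_allowed helper are made up for illustration. It only shows how a compliant crawler would interpret the rules, not what Google will end up listing.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: block every crawler from anything under /profile/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /profile/
"""


def is_allowed(robots_txt: str, url: str, user_agent: str = "Googlebot") -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # example.com stands in for the real site.
    for url in ("http://www.example.com/profile/alice",
                "http://www.example.com/about"):
        verdict = "allowed" if is_allowed(ROBOTS_TXT, url) else "blocked"
        print(url, "->", verdict)
```

Note that this checks crawling permission only. As the other answer points out, a blocked page that has external links may still appear in results.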

Related

Google indexing page even when there are no links to it

Google indexed one of my pages even though there are no (I mean NONE) links to that page from anywhere. It's a secret project and only 3 people know about it, but if I google its title, it is clearly in the results.
Does anybody know how they did it? My theory is that when you visit the page in Google Chrome, the URL gets saved to a database and is crawled later.
Are there any pages discussing this? I tried to google it, but did not find anything.
Thanks.

How to find the parent page of a webpage

I have a webpage that cannot be reached through my website.
Say my website is www.google.com and the webpage that I cannot reach from the website is something like www.google.com/iamaskingthis/asdasd. This webpage appears in the Google results when I search for its content, but nothing on my website links to it.
I've already tried analyzing the page source to find its parent location but I can't seem to find it. I want to delete that page, but since I cannot find it, I can't destroy it either.
Thank you
You can use a robots.txt file to prevent search engine bots from visiting a page, so that it does not show up in search results.
For example, you can create a robots.txt file in the root of your website and add the following content to it:
User-agent: *
Disallow: /mysecretpage.html
More details at: http://www.robotstxt.org/robotstxt.html
There is no such concept as a 'parent page'. If you mean the link through which Google found the page, please keep in mind that it need not be under your control: if I put a link to www.google.com/iamaskingthis/asdasd on a page on my website and Googlebot crawls it, Google will know about your page.
To make it short: There is no reliable way of hiding a page on a website. Use authentication, if you want to restrict access.
Google will crawl the page even if the button is gone, as it already has the page stored in its records. The only ways to stop Google from crawling it are robots.txt or simply deleting the page from the server (via FTP or your hosting's control panel).

How to embed/integrate WordPress blog into my own web site?

I have a WordPress blog account already (abc.wordpress.com). And I have my own web site: www.xyz.com
I would like to integrate my WordPress blog content into my own site, ideally as something like blog.xyz.com, or by replacing the home page of xyz.com with abc.wordpress.com.
I know that I can download WordPress' code from wordpress.org and run my own WordPress with my own MySQL database, but WordPress is always releasing new code, and I don't have the time to keep updating the source on my end to match it.
I'm running my own site as a hobby, so I would prefer to let WordPress.com manage the content for me and keep using my blog at abc.wordpress.com, but make the content show up on my own site, xyz.com.
I hope I was clear when explaining this.
Does anyone know a way to do this?
Thanks.
If your main worry is the updates, I would say don't worry. A simple click of the 'Updates' button in the WordPress admin is all you need to apply WordPress updates. A notification will pop up alerting you to any updates.
And as Calle has already mentioned, you can retrieve your content via RSS, or you could just export your current content from WordPress.com, import it into your own site, and manage it there. Everything would be in one spot.
Good Luck.
I don't know how good you are with programming, but there's a PHP library called SimplePie which would help you retrieve your content via RSS (which WordPress automatically generates for you). The address is here: http://simplepie.org/
If you are not very good with programming, perhaps you can get someone to do it for you or find a script which is already written somewhere. I do think RSS is definitely the best way to go.
I also think you exaggerate the problems of hosting WordPress yourself. You don't have to stay constantly up to date, and if you want to, all you have to do is log in from time to time, perhaps once a month (how often are you writing articles?), and click "update". WordPress will do everything for you, both for your plugins and for the WP version.
Using your own domain (xyz.com) and having WordPress redirect users from abc.wordpress.com (your WordPress blog) to that domain requires a premium account.
If you have a premium account, you can just log in to wordpress.com, click 'upgrades' and select 'domains'. From there you will see the option "Map an Existing Domain", where you enter your domain. Your wordpress.com blog is then what will show when users enter your domain's URL (xyz.com).
Alternatively, if you need a workaround with a free wordpress.com account, you can embed your blog, and for that you will need to use an RSS feed. Note: this method will not keep your WordPress styles; it will merely transport the content. Also, by default not all browsers support RSS feeds.
You can view your blog's current feed by adding 'feed' to the end of your wordpress.com URL, e.g. abc.wordpress.com/feed. You can read more about feeds here: http://en.support.wordpress.com/feeds/. Now you are just left with the task of figuring out how to embed the feed into your page.
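One possible way to do that embedding step, sketched in Python with only the standard library (the answers above suggest PHP's SimplePie; this is a different, hand-rolled approach). The sample feed, the feed_to_html name and the limit parameter are all made up for illustration. In practice you would download abc.wordpress.com/feed and pass its text in.

```python
import xml.etree.ElementTree as ET
from html import escape

# A tiny stand-in for what abc.wordpress.com/feed might return (made-up content).
SAMPLE_FEED = """\
<rss version="2.0">
  <channel>
    <title>abc blog</title>
    <item>
      <title>First post</title>
      <link>http://abc.wordpress.com/first-post/</link>
    </item>
    <item>
      <title>Tips &amp; tricks</title>
      <link>http://abc.wordpress.com/tips/</link>
    </item>
  </channel>
</rss>
"""


def feed_to_html(feed_xml: str, limit: int = 5) -> str:
    """Turn the first `limit` RSS items into an HTML list of links."""
    root = ET.fromstring(feed_xml)
    rows = []
    for item in root.findall("./channel/item")[:limit]:
        title = item.findtext("title", default="(untitled)")
        link = item.findtext("link", default="#")
        # Escape both values so feed content cannot break the page markup.
        rows.append(
            f'<li><a href="{escape(link, quote=True)}">{escape(title)}</a></li>'
        )
    return "<ul>\n" + "\n".join(rows) + "\n</ul>"


if __name__ == "__main__":
    print(feed_to_html(SAMPLE_FEED))
```

The resulting fragment can be dropped into a page on xyz.com. As noted above, only the content carries over, not the blog's styling.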
One final Hail Mary you might attempt is simply redirecting your domain to your blog. There is a reference on the different ways to do this here: http://css-tricks.com/redirect-web-page/. For example, place this tag in the <head> section of your domain's pages:
<meta http-equiv="refresh" content="0; URL='http://google.com'" />
(this will redirect after 0 seconds to the specified url)

Short question about Google indexing of website and Google Webmaster Tools

As those of you who know it are aware, in Google Webmaster Tools one can submit a sitemap or sitemap_index file, and Google will then fetch it and crawl the website when it "has time to".
I have searched for this but can't find an answer anywhere...
In the interface of webmaster tools, there is a section for "sitemaps" which lists all sitemaps submitted to google.
To the right of these sitemap names, there is a column labelled something like "web addresses in web index".
This has always shown 0 for all sitemaps.
I am guessing this means the number of pages from the sitemap that are indexed.
My question is, why is this showing 0 all the time? And is this actually the number of pages indexed by Google?
FYI, I have a very good and SE friendly website.
However, you should know it has only been a week that I have submitted the sitemaps.
Any ideas?
Well, sometimes it can take some time. Unfortunately, it's quite random.
It happened to me once that, after submitting 5 different sitemaps for 5 different websites at the same time, 4 were processed within a week and 1 took a month...
Anyway,
in your sitemap, did you put <changefreq>monthly</changefreq> for the main page ?
on the "sitemaps" page, click on the sitemap you sent, check the URL of the sitemap (e.g. Sitemap: http://www.mydomain.com/sitemap.xml) and see if there's any typo.
Finally, did you try hitting the "resent" link on that page?
I have had some experience of the sitemapping process. Some software programs that create the XML sitemap will deliver XML that will get 'stuck'.
Have you tried creating the simplest sitemap possible for your site by hand and submitting that?
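As a rough illustration of what "the simplest sitemap possible" could look like, here is a small Python sketch that writes one using only the required urlset, url and loc elements of the sitemaps.org 0.9 schema. The build_sitemap helper is made up for this example, and the URL reuses the www.mydomain.com placeholder from the answer above.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return a minimal sitemap document (as a string) listing the given URLs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        # <loc> is the only required child; text is XML-escaped on output.
        ET.SubElement(entry, "loc").text = url
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


if __name__ == "__main__":
    # Placeholder URLs; replace with real pages from your own site.
    print(build_sitemap(["http://www.mydomain.com/",
                         "http://www.mydomain.com/about"]))
```

If a bare file like this is accepted while the generated one is not, that points at the generator's output rather than at the site itself.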

Do I need to submit the sitemap to search engines every time it is updated?

If I have a sitemap_index.xml:
<sitemap>
<loc>http://www.domain.com/sitemap.xml</loc>
<lastmod>2010-09-28</lastmod>
</sitemap>
If I change the content or update the page, and then change the lastmod, will I have to submit it again to the search engines, for example in Google Webmaster Tools (the section where you submit sitemaps)?
Thanks
As long as you've told Google about the sitemap, they'll check it periodically. The more often it changes, the more they'll tend to check it.
If you go to Site configuration | Sitemaps, it'll tell you the last date they downloaded your sitemap.
No. It is however worth taking a look at the sitemaps page on webmaster tools every now and then and seeing if any errors were reported with the sitemap.
@Skilldrick is right!
Also, Google states that results are not affected by sitemaps anyway. They only give guidance to the search spider, which makes the final decision!