Google Crawl Errors Because Of External Links - google-maps

I have tons of 404 crawl errors (my old URLs). I removed them via Google Webmaster Tools > Remove URLs tool.
Example: www.mysite.com/page1.html
But some external sites link to my old URLs on their content pages (e.g. www.anothersite.com). Because they still have my old URLs on their pages, my URL removal always fails.
What can I do now? I cannot delete these links; I don't know who owns these websites. And there are tons of external URLs like this; I cannot remove them one by one by pressing the button again and again.
Would robots.txt be enough, or what more can I do?

You don't want to use robots.txt to block the URL (Google does not recommend it).
404s are a perfectly normal (and in many ways desirable) part of the web. You will likely never be able to control every link to your site, or resolve every 404 error listed in Webmaster Tools. Instead, check the top-ranking issues, fix those if possible, and then move on.
https://support.google.com/webmasters/answer/2409439?hl=en

Related

How can I tell Google that I have removed the .html from my URLs?

Hi, I have recently removed the '.html' from the end of my URLs to make them look more professional, which was brilliant. However, when I now find my site on Google, the old URLs that include the '.html' still appear, which sends people to an error page, as expected. How can I tell Google that I have new URL addresses so that people can visit my site again?
Thanks!
The best way to remove .html extensions is with a rewrite rule in your .htaccess file. This way search engines will "understand" it, but you will not see the change in search results immediately, since the search engine crawler will take some time to update.
Also make sure to submit your URL to Google. If you have Google Webmaster Tools you will be able to follow this process and the status of your website more clearly.
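For example, a minimal .htaccess sketch, assuming Apache with mod_rewrite enabled (the rules are an illustration, not taken from the answer above; test them on your own setup):

  RewriteEngine On
  # Permanently redirect requests for /page.html to /page so search engines update their index
  RewriteCond %{THE_REQUEST} \s/+([^?\s]+)\.html[\s?]
  RewriteRule ^ /%1 [R=301,L]
  # Internally serve the .html file when the extensionless URL is requested
  RewriteCond %{REQUEST_FILENAME}.html -f
  RewriteRule ^([^.]+)$ $1.html [L]

The 301 tells Google the move is permanent, which is what makes the old '.html' results drop out of the index over time.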

Chrome says my website contains malware?

Chrome shows this warning while I am accessing my site. After investigating, I cleaned the suspicious code from the site, but Chrome still shows the warning. Then I removed all files from my site and uploaded just index.html (a blank file), but the warning is still showing.
Chrome warnings are based on blacklists which record where malware has been found in a site or domain. This isn't a live "scan" and does not necessarily mean that malware is on that page at that specific time. It is not clear from your question whether you've created a new folder and index.html and are also seeing a malware warning when browsing to that URL, or whether you've replaced your site content with an empty folder and index.html and the warning is still showing. Once you have taken the steps to disinfect the site, you can request a review, which should help remove the warning: http://support.google.com/webmasters/bin/answer.py?hl=en&answer=163633
The malware warning should be taken seriously even if you are confident in your own site content. Crackers use automated toolkits to find vulnerabilities in websites and inject code into them to infect visitors; because these kits are largely automatic, there isn't the protection in obscurity you might otherwise assume.
If you've not been able to find and fix the issue Chrome is warning about, you owe it to your visitors, and your own reputation, to take the site content down until you can resolve the problem.
Google Chrome's malware blacklists should be based on the same data used by Google's Safe Browsing advisory. You can access this information for a particular site (e.g. stackoverflow.com) via the following URL:
http://www.google.com/safebrowsing/diagnostic?site=stackoverflow.com
Just replace the domain with your own and it should give you some indication why your site generated malware warnings in Chrome.
1. In the top-right corner of the browser window, click the Chrome menu.
2. Select Settings.
3. Click Show advanced settings.
4. Under "Privacy," uncheck the box "Protect you and your device from dangerous sites."

How do I Prevent Httrack From Downloading the Same File Again?

I am using httrack to download this website:
http://4minutearticles.com/
However, the problem is that the author has a link back to the main page on every page of his website.
For example http://4minutearticles.com/ext/
The Parent Directory link redirects to the main page, and the software starts downloading it again.
How do I prevent this loop from happening?
Read the answer to the question at the link provided below:
"I have duplicate files! What's going on?"
Link: http://www.httrack.com/html/faq.html#Q1b11
Also have a look at "Filters: Advanced" at the following link:
http://www.httrack.com/html/filters.html
It may help you with your issue.
You can use filters to stop HTTrack from downloading the same files or folders. You can do this by clicking the "Set options" button next to the "Preferences and mirror options" label, then opening the "Scan Rules" tab and using the "Exclude links" button to set the rules as you want (see the example command after the quoted FAQ entry below).
From the HTTrack FAQ entry:
This is generally the case for top indexes (index.html and index-2.html). This is a common issue, but one that can not be easily avoided!
For example, http://www.foobar.com/ and http://www.foobar.com/index.html might be the same page. But if links in the website refer both to http://www.foobar.com/ and to http://www.foobar.com/index.html, these two pages will be caught. And because http://www.foobar.com/ must have a name, as you may want to browse the website locally (the / would give a directory listing, NOT the index itself!), HTTrack must find one. Therefore, two index.html files will be produced, one with the -2 suffix to show that the file had to be renamed.
It might be a good idea to consider that http://www.foobar.com/ and http://www.foobar.com/index.html are the same links, to avoid duplicate files, isn't it? NO, because the top index (/) can refer to ANY filename, and while index.html is generally the default name, index.htm can be chosen, or index.php3, mydog.jpg, or anything you may imagine. (Some webmasters are really crazy.)
Note: In some rare cases, duplicate data files can be found when the website redirects to another file. This issue should be rare, and might be avoided using filters.
See also: Updating a project
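As a concrete sketch of the filter approach for this site, a command line along these lines should work (the /ext/ path comes from the question; the output directory and the exact +/- rules are assumptions to adapt):

  httrack "http://4minutearticles.com/" -O ./4minutearticles-mirror "+*4minutearticles.com/*" "-*4minutearticles.com/ext/*"

The same +/- rules can also be pasted into the "Scan Rules" tab of the Windows GUI.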

How to find the parent page of a webpage

I have a webpage that cannot be accessed through my website.
Say my website is www.google.com, and the webpage that I cannot access through the website is something like www.google.com/iamaskingthis/asdasd. This webpage appears in the Google results when I search for its content, yet there is nothing on my website that sends me to that page.
I've already tried analyzing the page source to find its parent location, but I can't seem to find it. I want to delete that page, but since I cannot find it, I can't destroy it either.
Thank you
You can use a robots.txt file to prevent search engine bots from visiting a page, and thus keep it from showing up in search results.
For example, you can create a robots.txt file in the root of your website and add the following content to it:
User-agent: *
Disallow: /mysecretpage.html
More details at: http://www.robotstxt.org/robotstxt.html
There is no such concept as a 'parent page'. If you mean the link through which Google found the page, please keep in mind that it need not be under your control: if I put a link to www.google.com/iamaskingthis/asdasd on a page on my website and the Googlebot crawls it, Google will know about the page.
In short: there is no reliable way of hiding a page on a website. Use authentication if you want to restrict access.
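As one possible sketch of that authentication advice, assuming an Apache host and an .htpasswd file you have already created (the path below is a placeholder):

  AuthType Basic
  AuthName "Restricted area"
  AuthUserFile /full/path/to/.htpasswd
  Require valid-user

Put these directives in an .htaccess file in the directory you want to protect, and the server will ask for a username and password before serving anything from it.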
Google will crawl the page even if the button is gone, as it already has the page stored in its records. The only way to disallow Google from crawling it is either robots.txt or simply deleting it off the server (via FTP or your hosting's control panel).

Loading resources from html5 filesystem api

I am writing a Chrome extension that dynamically writes some HTML pages and their resources to the file system. I have most things working, but I just noticed that when I try to open one of the pages by navigating to the filesystem:chrome-extension://... URL that I obtain via the fileentry.getURL() method, the page opens, but Chrome does not fetch any of the associated resources: stylesheets, images, etc. Any ideas why this might be? Are there some security flags I need to set to get this working? Am I going about this all wrong?
(One thing that may be relevant is that the resources are identified by relative URLs. But I know they are correct relative to the file, because if I manually resolve them and browse to the URLs I can fetch them.)
The page you include that uses the relative URLs doesn't understand the HTML5 filesystem's mapping. If you change the URLs to point to what the fileentry.getURL() calls give you, then this should work.
There is currently an open bug report about letting relative URLs in resources be used the way you're trying to: http://crbug.com/89271
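A minimal sketch of the absolute-URL approach, using the webkit-prefixed FileSystem API the question describes ('style.css', the 1 MB quota, and the persistent storage type are placeholder assumptions; in Chrome's implementation the entry-to-URL call is toURL()):

  // Request the sandboxed filesystem and resolve a saved resource to an absolute URL.
  window.webkitRequestFileSystem(window.PERSISTENT, 1024 * 1024, function (fs) {
    fs.root.getFile('style.css', {}, function (entry) {
      // toURL() returns a filesystem:chrome-extension://... URL that the saved
      // HTML page can reference directly instead of a relative path.
      var absoluteUrl = entry.toURL();
      console.log(absoluteUrl);
    }, function (err) {
      console.error(err);
    });
  });

Rewriting each relative reference in the generated HTML to the corresponding toURL() value before writing the page out is one way to work around the missing relative resolution.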