How to find the parent page of a webpage - html

I have a webpage that cannot be reached through my website.
Say my website is www.google.com and the webpage I cannot reach from it is something like www.google.com/iamaskingthis/asdasd. This page appears in the Google results when I search for its content, but nothing on my website links to it.
I've already tried analyzing the page source to find its parent location, but I can't seem to find it. I want to delete that page, but since I cannot find it, I can't delete it either.
Thank you

You can use a robots.txt file to tell search engine bots not to crawl a page, so that its content does not show up in search results.
For example, you can create a robots.txt file in the root of your website and add the following content to it:
User-agent: *
Disallow: /mysecretpage.html
More details at: http://www.robotstxt.org/robotstxt.html
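To check that a rule like this matches the page you mean, here is a minimal sketch using Python's standard `urllib.robotparser`. The host `www.example.com` and the helper name `is_allowed` are assumptions made for illustration; the rules are the two lines from the answer.

```python
from urllib.robotparser import RobotFileParser

# The two rules from the answer, as they would appear in robots.txt.
ROBOTS_TXT = """\
User-agent: *
Disallow: /mysecretpage.html
"""


def is_allowed(robots_txt: str, url: str, user_agent: str = "*") -> bool:
    """Return True if the given rules let user_agent fetch url."""
    parser = RobotFileParser()
    # parse() takes the file content as a list of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical URLs on an example host.
secret_ok = is_allowed(ROBOTS_TXT, "https://www.example.com/mysecretpage.html")
index_ok = is_allowed(ROBOTS_TXT, "https://www.example.com/index.html")
print("secret page may be crawled:", secret_ok)
print("index page may be crawled:", index_ok)
```

This only evaluates the rules themselves. It does not show that any particular crawler obeys them, and it does not remove a page that is already indexed.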

There is no such concept as a 'parent page'. If you mean the link through which Google found the page, please keep in mind that it need not be under your control: if I put a link to www.google.com/iamaskingthis/asdasd on a page on my website and Googlebot crawls it, Google will know about your page.
In short: there is no reliable way of hiding a page on a website. Use authentication if you want to restrict access.

Google will crawl the page even if the button is gone, as it already has the page stored in its records. The only ways to stop Google from crawling it are robots.txt or simply deleting it from the server (via FTP or your hosting control panel).

Related

Add html to a site in a site (proxy)

I imported a web proxy from GitHub known as Rhodium onto Replit and, after some editing, was satisfied with the results, but I can't seem to add HTML to a site that is proxied. Example: you use Rhodium to navigate to www.discord.com, and you want HTML added to the page served at "yourdomain.example/service/https://discord.com/". I looked through the files and online, but I wasn't able to find a way to edit the index.html of that specific page. Frankly, I am extremely new to HTML (and to a lot of things in web development).
https://github.com/LudicrousDevelopment/Rhodium
Any help available?
As far as I know, you can't, because of security restrictions: you can't attach content to a website that isn't on the same directory/server.
You can, however, redirect to that site, inside or outside, freely.

Making directory not accessible Jekyll

I made a directory called downloads on my Jekyll site. It will hold files that I want people to be able to access via a direct link, but I don't want them to be able to go to https://example.com/downloads and see a list of all the downloads.
How can I prevent visitors from seeing the index page?
There are two approaches.
You can make an actual /downloads/index.html page that will be shown instead of the default directory-like page.
You can use the jekyll-redirect-from plugin and redirect the url /downloads/ to your 404 page, for example.
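The first approach can be sketched as a small Python helper that drops a placeholder index.html into the directory. The helper name `add_placeholder_index` and the placeholder markup are assumptions for illustration, and the sketch relies on Jekyll copying a file without front matter into the built site unchanged.

```python
import tempfile
from pathlib import Path

# Hypothetical placeholder page; any content works as long as it is not a file listing.
PLACEHOLDER = """<!DOCTYPE html>
<html>
<head><meta charset="utf-8"><title>Downloads</title></head>
<body><p>Nothing to see here.</p></body>
</html>
"""


def add_placeholder_index(site_root, directory="downloads"):
    """Create <site_root>/<directory>/index.html unless it already exists."""
    target = Path(site_root) / directory / "index.html"
    target.parent.mkdir(parents=True, exist_ok=True)
    if not target.exists():
        target.write_text(PLACEHOLDER, encoding="utf-8")
    return target


# Demonstration in a throwaway directory instead of a real Jekyll site.
with tempfile.TemporaryDirectory() as site_root:
    created = add_placeholder_index(site_root)
    print(created.relative_to(site_root), created.exists())
```

For a real site you would pass the path of the Jekyll source directory instead of a temporary one. Files in downloads stay reachable through their direct links.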

Website image doesn't show when linking through another site

I'm linking to my website from another site (for example, my LinkedIn page), and for some reason no preview image is shown for it. Instead, it gets the default blank image. When I link other sites, the image shows correctly. I read somewhere that this has to do with my site not being prefixed with www. by default. Is that relevant?
Here is my LinkedIn page: https://www.linkedin.com/in/stefmoreau
As you can see, some websites show with images but the last 2 don't. They also happen not to redirect to their www.-prefixed version when you view them.
LinkedIn uses the Open Graph Protocol to get images. AFAIK it's not related to the "www" part.
Take great care with LinkedIn: they cache what their bot retrieves, and there's NO refresh for it that you can trigger.
Hence, I'd advise getting it right first using e.g. Facebook's OG implementation, as they at least have a tool that lets you refresh what the crawler fetches.
Linkedin doc
Facebook doc
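To see which Open Graph tags a page exposes before sharing it, a minimal sketch with Python's standard `html.parser` could look like this. The class and function names and the sample markup are assumptions for illustration. A real check would fetch your own page's HTML first.

```python
from html.parser import HTMLParser


class OpenGraphParser(HTMLParser):
    """Collect <meta property="og:..." content="..."> tags from a page."""

    def __init__(self):
        super().__init__()
        self.properties = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        prop = attributes.get("property") or ""
        content = attributes.get("content")
        if prop.startswith("og:") and content is not None:
            # Keep the first value seen for each property.
            self.properties.setdefault(prop, content)


def open_graph_properties(html: str) -> dict:
    parser = OpenGraphParser()
    parser.feed(html)
    parser.close()
    return parser.properties


# Hypothetical page head with two Open Graph tags.
SAMPLE_HTML = """
<html><head>
<title>Example</title>
<meta property="og:title" content="Example site" />
<meta property="og:image" content="https://example.com/preview.png" />
</head><body></body></html>
"""

og = open_graph_properties(SAMPLE_HTML)
print(og.get("og:image", "no og:image tag found"))
```

If the lookup finds no og:image entry for your page, a missing tag is a more likely cause of the blank preview than the www. prefix.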

How can I use Google to find all unknown webpages that point to one specific known web page?

It's easy to find all child pages of a webpage, but it is not trivial to find all of its parent pages. How can I do that using Google?
You can't, and Google cannot help you, as it doesn't index all of the web.
At best it follows links on other pages, or is initiated by someone wanting to have something indexed explicitly.
Create a server. Put an HTML page on it that no other page on your server has a link to. Name the page with some non-guessable UUID in the name.
Google will not find this unless they start to randomly change parts of URLs to test for existing pages (a lengthy process).
Within that page you can have links pointing to other pages. It is a parent page for those pages, is a web page, and will not be found via Google.
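For pages whose HTML you already have, finding the "parent" pages is a reverse lookup over their links. The following is a minimal sketch of that idea, assuming a hypothetical `PAGES` mapping of URL to HTML. It says nothing about pages you have not collected, which is exactly the limitation described above.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect the absolute targets of all <a href="..."> links on a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = set()

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the URL of the page itself.
                self.links.add(urljoin(self.base_url, href))


def parents_of(target_url, pages):
    """Return the URLs in pages (url -> html) that link to target_url."""
    parents = []
    for page_url, html in pages.items():
        collector = LinkCollector(page_url)
        collector.feed(html)
        if target_url in collector.links:
            parents.append(page_url)
    return sorted(parents)


# Hypothetical, already downloaded pages.
PAGES = {
    "https://example.com/": '<a href="/about.html">About</a>',
    "https://example.com/about.html": '<a href="hidden/page.html">Hidden</a>',
    "https://example.com/contact.html": '<a href="/">Home</a>',
}

print(parents_of("https://example.com/hidden/page.html", PAGES))
```

A page that links to the target but is missing from `PAGES` is never reported, just as an unlinked page is never found by a crawler.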

Google Crawl Errors Because Of External Links

I have tons of 404 crawl errors (my old URLs). I deleted them via Google Webmaster Tools > Remove URLs tool.
Example: www.mysite.com/page1.html
But there are some external sites that link to my old URLs on their content pages (e.g. www.anothersite.com), and because they have my old URLs on their pages, my URL removal always fails.
What can I do now? I cannot delete these links, and I don't know who owns these websites. There are tons of external URLs like this, so I cannot remove them one by one by pressing the button again and again.
Is robots.txt enough, or what else can I do?
You don't want to use robots.txt to block the URLs (Google does not recommend it).
404s are a perfectly normal (and in many ways desirable) part of the web. You will likely never be able to control every link to your site, or resolve every 404 error listed in Webmaster Tools. Instead, check the top-ranking issues, fix those if possible, and then move on.
https://support.google.com/webmasters/answer/2409439?hl=en