Do activities in iframe contribute to search history? - html

When you search anything in a browser it will be saved in your search history and that can be used for, e.g. displaying relevant ads to you.
I was wondering, if say you have an iframe linking to another website, will that contribute to your search history?
i.e. If I make a webpage where the user can enter a URL into a text input and the iframe loads the URL entered, will that count in your search history?

By default, iframes do not show up in the browser history, since the browser history is a record of the pages visited.
If you want to add an entry to the browser history (depending on the browser), you can do it with JavaScript's history.pushState(). However, this only works for same-origin URLs; passing a URL from another origin raises an origin (security) error.
https://developer.mozilla.org/en-US/docs/Web/API/History/pushState
Please note that many websites (for example google.com, youtube.com) block being iframed by other sites for security reasons, using the response header X-Frame-Options: SAMEORIGIN.
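As a rough illustration of the same-origin restriction on pushState, here is a minimal sketch. The helper name isSameOrigin and the variable enteredUrl are made up for this example; only the URL class and history.pushState are standard.

```javascript
// Returns true when `targetUrl` has the same origin (scheme, host, port)
// as `currentUrl`, which is the condition history.pushState() enforces.
// The function name is hypothetical, chosen for this sketch.
function isSameOrigin(currentUrl, targetUrl) {
  const current = new URL(currentUrl);
  // Resolve relative targets against the current page, as the browser does.
  const target = new URL(targetUrl, currentUrl);
  return current.origin === target.origin;
}

// Hypothetical browser-side usage (not run here); `enteredUrl` is assumed
// to hold the value of the text input:
// if (isSameOrigin(location.href, enteredUrl)) {
//   history.pushState({ framed: enteredUrl }, "", enteredUrl);
// }
```

Checking first avoids the exception that pushState would otherwise throw for a cross-origin URL.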

Related

Selectively blocking Youtube URLs

Is there a way to selectively block Youtube videos? I know the Chrome policy allows me to block URLs. And I could whitelist URLs. So, I can do this, for example, in a .reg file, for Windows:
[HKEY_LOCAL_MACHINE\SOFTWARE\Policies\Google\Chrome\URLBlacklist]
"1"="youtube.com"
[HKEY_LOCAL_MACHINE\SOFTWARE\Policies\Google\Chrome\URLWhitelist]
"1"="https://www.youtube.com/watch?v=VlPdfLr1FSo"
This would prevent the user from directly browsing to youtube.com or www.youtube.com. But, for that one youtube URL, the user can watch the video. Sounds great, except a few problems.
Once the user gets into Youtube given the above URL that is whitelisted, he is free to click on other links. Or search within the Youtube search interface. Any of those actions will allow the user to view any other youtube video. Note that the URL in Chrome does change to reflect the selected video -- and that video/URL is different from the one I whitelisted. But it's not blocked if the user first goes through the whitelisted URL.
If the user attempts to directly type in another URL (aside from the one that is whitelisted) in the Chrome address bar, then it is blocked. But, navigation within Youtube, after going through the whitelisted URL is not prevented.
Perhaps this is by design. I can see wanting to allow navigation to a given URL, and then movement within an application without everything breaking, in the case of an application that is not a single-page app. But in my use case this is not what I want: once the user is granted access to a short list of whitelisted videos, they can then watch anything they want (even if not approved).
Is it possible to use a combination of the Chrome policy to blacklist and whitelist URLs and a Chrome Extension? Could a Chrome extension read the whitelisted URLs, and before navigation within Chrome to a second URL, it could check if the URL is whitelisted (and not blacklisted) and then prevent navigation?
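To make the idea concrete, the check such an extension would run before each navigation could look roughly like the sketch below. Everything here is an assumption for illustration: the list contents are copied from the policy example above, the function names are invented, and the exact-match rule is much simpler than Chrome's real policy matching. Hooking the check into an extension's navigation events is not shown.

```javascript
// Hypothetical lists mirroring the policy entries in the question.
const blockedHosts = ["youtube.com"];
const allowedUrls = ["https://www.youtube.com/watch?v=VlPdfLr1FSo"];

// True when `hostname` equals `blocked` or is a subdomain of it.
function hostMatches(hostname, blocked) {
  return hostname === blocked || hostname.endsWith("." + blocked);
}

// Decide whether a navigation to `url` should be permitted.
// Assumption: allow-list entries must match the full URL exactly.
function isNavigationAllowed(url) {
  if (allowedUrls.includes(url)) return true; // exact allow-list hit
  const { hostname } = new URL(url);
  return !blockedHosts.some((b) => hostMatches(hostname, b));
}
```

Because the check is applied to every URL rather than only to the first one typed, navigating from the whitelisted video to another video would be rejected.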

Can images from another website create cookies on my site?

I have a static website, it only contains html and css. No javascript, no php, no databases. On this site, I'm using images, which I get from image-hosting websites (like imgur).
I've noticed that when I visit my website (in Google Chrome at least) and click the information button next to the URL, it says there are cookies on this site. If I click on the cookies button, it says "The following cookies were set when you viewed this page" and shows a list of cookies, including some from the sites that I use for image hosting.
If I delete them, they come back after a while, but not immediately. I'm trying to avoid cookies as the site is very simple. Are they considered part of my site? If so, is there anything I can do, except hosting the images myself?
I always thought that if you link to an image directly (as in a link ending in .png for example) it would be the same as if you were hosting the image yourself, and there would be no javascript being run (to save cookies).
Are they considered part of my site?
That depends on your perspective.
The browser doesn't consider them to be part of your site. Cookies are stored on a per-domain basis, so a cookie received in response to a request for an image from http://example.com will belong to http://example.com and not to your site.
However, for the purposes of privacy laws (such as GDPR) they are considered part of your site and, if the third party uses them to track personally identifiable information, you are required to jump through the usual GDPR hoops.
If so, is there anything I can do, except hosting the images myself?
Not really.
I always thought that if you link to an image directly (as in a link ending in .png for example) it would be the same as if you were hosting the image yourself, and there would be no javascript being run (to save cookies).
Cookies are generally set with HTTP response headers, not with JavaScript.
Whenever a browser requests a file from a server, it automatically sends along any cookies it already holds for that server. Image hosting services may use that for different purposes.
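A minimal sketch of that round trip from the server's point of view, assuming a made-up cookie name (visitorId) and value; the function names are also invented for this example. Only the Cookie and Set-Cookie header names are standard.

```javascript
// Parse a Cookie request header ("a=1; b=2") into an object.
function parseCookies(header) {
  const cookies = {};
  for (const part of (header || "").split(";")) {
    const idx = part.indexOf("=");
    if (idx === -1) continue; // skip empty or malformed parts
    cookies[part.slice(0, idx).trim()] = part.slice(idx + 1).trim();
  }
  return cookies;
}

// Decide which headers to send with an image response.
// Assumption: the host issues a "visitorId" cookie to first-time visitors.
function imageResponseHeaders(cookieHeader) {
  const headers = { "Content-Type": "image/png" };
  const cookies = parseCookies(cookieHeader);
  if (!("visitorId" in cookies)) {
    // No JavaScript involved: the cookie travels in a response header.
    headers["Set-Cookie"] = "visitorId=abc123; Max-Age=3600; Path=/";
  }
  return headers;
}
```

On the first image request there is no Cookie header, so the response carries Set-Cookie; on later requests the browser sends the cookie back and the server can recognise the visitor.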
I always thought that if you link to an image directly (as in a link ending in .png for example) it would be the same as if you were hosting the image yourself, and there would be no javascript being run (to save cookies).
So the question is, how do they set these cookies?
Let's say you use a simple img tag to load an image from a hoster.
<img src="https://imageHoster.tld/123xyz.png">
The site imageHoster.tld can handle that request by redirecting all requests to e.g. requestHandler.php, and that file can set the cookie before sending the image with a simple
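<?php
// Send a Set-Cookie header; the cookie expires in one hour
setcookie("cookieName", "whateverValue", time() + 3600);
// Declare the response body as a PNG image
header('Content-Type: image/png');
...
?>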
What happens there is actually the same as if you set the image source like this:
<img src="https://imageHoster.tld/requestHandler.php?img=123xyz">
Are they considered part of my site?
Since these so-called third-party cookies are set when visiting your site, one could consider them part of your site. To be on the safe side, I would at least mention the use of third-party services in the data privacy statement.
If so, is there anything I can do, except hosting the images myself?
Third-party cookies can be disabled in the client's browser, but you can't disable them for the visitors of your site. So no: the only way to keep third parties from setting cookies in the browsers of your visitors is to avoid using their services.

Website image doesn't show when linking through another site

I'm linking my website through another site (for example my LinkedIn page) and for some reason it doesn't show any preview image; instead it shows the default blank image. When linking other sites, it shows correctly. I read somewhere that it has to do with not having my site prefixed with www. by default. Is that relevant?
Here is my LinkedIn page: https://www.linkedin.com/in/stefmoreau
As you can see some websites show with images but the last 2 don't. They also happen to not redirect to their www. prefixed version when viewing them.
LinkedIn uses the Open Graph protocol to get images. AFAIK it's not related to the "www" part.
Take great care with LinkedIn: they cache what their bot retrieves, and there's NO way for you to trigger a refresh.
Hence, I'd advise first getting it right using e.g. Facebook's OG implementation, as they at least have a tool that lets you refresh what the crawler fetches.
Linkedin doc
Facebook doc
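For a quick check of whether a page declares an Open Graph image at all, something like the following sketch could be used. The function name and the example URL in the usage note are invented, and the regex is deliberately simple: it assumes double-quoted attributes with property written before content, so a real HTML parser would be more robust.

```javascript
// Return the content of the first <meta property="og:image"> tag, or null.
// Assumes double-quoted attributes, with `property` before `content`.
function findOgImage(html) {
  const match = html.match(
    /<meta\s+property="og:image"\s+content="([^"]*)"\s*\/?>/i
  );
  return match ? match[1] : null;
}
```

Called on a page source containing <meta property="og:image" content="https://example.com/preview.png" />, it returns that URL; a null result would suggest the crawler has no image to pick up.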

Google analytics and iframe content - will all tracking work?

I've seen a bunch of posts on here about google analytics tracking and iframes and how there could be some issues. Also have seen this: https://developers.google.com/analytics/devguides/collection/gajs/gaTrackingSite#trackingIFrames
I have tracking code in the parent website that I don't care about, and I have tracking code in the page that's embedded in the iframe that I do care about. The iframe content is a completely different domain.
I was wondering if the iframe page will be able to get all the information about demographics and properly be able to send data up to Google for event tracking and whatnot. Again, I don't care about the parent at all in this case. Just that the google analytics code in the iframe works completely on its own.
I feel like the article I posted above from Google is relevant for users that want to somehow link the analytics in the iframe with the analytics in the parent, but I could be mistaken?
The documentation describes exactly what you have to do. It really boils down to:
Load the iframe using _getLinkerUrl to link the visit inside the iframe with the visit on the top frame.
Use P3P headers on the iframed page to work around stupid Internet Explorer.
I would add some notes:
Even if you don't care about the top-level page, you should add a tag to it. If you don't, you can't use _getLinkerUrl and you lose the traffic source, etc. Without _getLinkerUrl, GA inside the iframe will think it's a brand new visit referred from the top-level page.
Setting cookies inside an iframe on a third-party domain is the definition of a third-party cookie. Because of that, any browser that is set to block third-party cookies will block the GA cookies and GA won't work. This includes Safari (both desktop and mobile), which blocks third-party cookies by default. So if visits from Safari or iDevices are important to you (likely these days), this tracking probably won't give you good results. The only solution is to eliminate the iframed page: either put it on your domain or open it in a new window/tab.
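The _getLinkerUrl step could be sketched as follows, assuming the legacy ga.js asynchronous snippet (_gaq, _gat) is already loaded on the parent page. The helper name linkIframeToVisit, the element id myIFrame and the example URL are placeholders for this illustration.

```javascript
// Point an iframe at `url` decorated with the parent's GA visit data.
// `tracker` is expected to be a ga.js tracker object exposing _getLinkerUrl.
function linkIframeToVisit(tracker, iframe, url) {
  iframe.src = tracker._getLinkerUrl(url);
  return iframe.src;
}

// Hypothetical usage on the parent page (browser only, not run here):
// _gaq.push(function () {
//   var tracker = _gat._getTrackerByName();
//   var iframe = document.getElementById("myIFrame"); // assumed element id
//   linkIframeToVisit(tracker, iframe, "https://iframe-content.example/");
// });
```

The iframe should be given its src this way instead of in the HTML, so that the linker parameters are present on the very first load.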

How to find the parent page of a webpage

I have a webpage that cannot be reached through my website.
Say, my website is www.google.com and the webpage that I cannot access using the website is like www.google.com/iamaskingthis/asdasd. This webpage appears on the google results when I type its content, however there is nothing which sends me to that page on my website.
I've already tried analyzing the page source to find its parent location but I can't seem to find it. I want to delete that page, but since I cannot find it, I can't destroy it either.
Thank you
You can use a robots.txt file to ask search engine bots not to crawl a page, which generally keeps it from being shown in search results.
For example, you can create a robots.txt file in the root of your website and add the following content to it:
User-agent: *
Disallow: /mysecretpage.html
More details at: http://www.robotstxt.org/robotstxt.html
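To check that a rule like the one above covers the page in question, a simplified matcher could look like the sketch below. The function names are invented, and the parsing is an assumption-laden simplification: it only reads the "User-agent: *" group, treats each Disallow value as a plain path prefix, and ignores wildcards, Allow rules and groups that list several user agents.

```javascript
// Collect Disallow prefixes from the "User-agent: *" group of a robots.txt.
function disallowedPrefixes(robotsTxt) {
  const prefixes = [];
  let inWildcardGroup = false;
  for (const rawLine of robotsTxt.split(/\r?\n/)) {
    const line = rawLine.split("#")[0].trim(); // drop comments
    const idx = line.indexOf(":");
    if (idx === -1) continue;
    const field = line.slice(0, idx).trim().toLowerCase();
    const value = line.slice(idx + 1).trim();
    if (field === "user-agent") {
      inWildcardGroup = value === "*";
    } else if (field === "disallow" && inWildcardGroup && value) {
      prefixes.push(value);
    }
  }
  return prefixes;
}

// True when `path` starts with any disallowed prefix.
function isDisallowed(robotsTxt, path) {
  return disallowedPrefixes(robotsTxt).some((p) => path.startsWith(p));
}
```

With the two-line file above, isDisallowed reports /mysecretpage.html as blocked and leaves other paths such as /index.html crawlable.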
There is no such concept as a 'parent page'. If you mean the link through which Google found the page, please keep in mind that it need not be under your control: if I put a link to www.google.com/iamaskingthis/asdasd on a page on my website and the Googlebot crawls it, it will know about the page.
To make it short: There is no reliable way of hiding a page on a website. Use authentication, if you want to restrict access.
Google will keep crawling the page even if the button is gone, as it already has the page stored in its records. The only ways to stop Google from crawling it are robots.txt or simply deleting it from the server (via FTP or your hosting's control panel).