iFrames and encodings - html

I am designing a web site that will rely on iFrames to show third party content. Given that, I have two problems.
This third party content may come in different encodings.
Almost nobody declares the encoding of an HTML file.
OK, in this case the browser will try to infer the encoding, but as my tests show, it won't infer the encoding of each iframe separately and, hence, some iframes will have their content messed up.
To reproduce create the following files:
index.html (encoded utf-8)
<html>
<iframe src="utf.html"> </iframe>
ááá
<br />
<iframe src="iso.html"> </iframe>
</html>
utf.html (encoded UTF-8)
<html>
ááááéééé
</html>
iso.html (encoded ISO-8859-1)
<html>
ááééíí
</html>
Right. If you open index.html, you'll see the results won't be perfect.
If I properly add the encoding information in a meta tag, it works.
Remember, I can't change the third-party content. So, long story short, the question is: in my example, is there a way to make all characters display properly by editing only index.html?
Thank you

There is no way to do this client side. The browser will block it because of cross-domain security policies. You will need to proxy the pages through your server and modify the output.
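A minimal sketch of the proxy idea in Python, assuming your server has already fetched the third-party bytes and, if present, the charset from their Content-Type header. The function name `to_utf8_html` and the "try UTF-8, else ISO-8859-1" heuristic are illustrative assumptions, not a general charset detector; the heuristic just happens to cover the two files in the question.

```python
import re
from typing import Optional

META = '<meta charset="utf-8">'
HEAD_RE = re.compile(r"<head(\s[^>]*)?>", re.IGNORECASE)
HTML_RE = re.compile(r"<html(\s[^>]*)?>", re.IGNORECASE)


def to_utf8_html(raw: bytes, declared_charset: Optional[str] = None) -> bytes:
    """Hypothetical proxy step: decode third-party HTML, re-encode it as UTF-8."""
    if declared_charset:
        text = raw.decode(declared_charset, errors="replace")
    else:
        try:
            # Undeclared content: try strict UTF-8 first...
            text = raw.decode("utf-8")
        except UnicodeDecodeError:
            # ...then fall back to ISO-8859-1, which accepts any byte sequence.
            text = raw.decode("iso-8859-1")

    # Label the output explicitly so the browser no longer has to guess:
    # insert after <head>, else after <html>, else at the very start.
    for pattern in (HEAD_RE, HTML_RE):
        text, count = pattern.subn(lambda m: m.group(0) + META, text, count=1)
        if count:
            break
    else:
        text = META + text
    return text.encode("utf-8")
```

Each iframe's `src` would then point at your proxy instead of the third-party URL. The proxy should also send `Content-Type: text/html; charset=utf-8` on its own response, since that header takes precedence over any meta tag left in the original markup.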

Related

Open Graph share debugger scrapes empty html

I'm trying to set up Open Graph meta tags for a website. When I access the site normally using a browser and inspect the source, the tags are there. However, they don't show up when I use the OG debugger.
The site that I'm developing is here: spurafrika-org.vercel.app (Next.js site). It's replacing the original site here: spurafrika.org (WordPress site).
When I use the "See exactly what our scraper sees for your URL" feature, I get this:
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html>
<head><meta http-equiv="content-type" content="text/html; charset=utf-8"></head>
<body><p>ÿþ</p></body>
</html>
See for yourself here. It's vastly different from the actual source of my websites.
I originally thought it might have been a Vercel/Next.js issue, but when I discovered it also happening on the WordPress site, I was very confused: see this. I've checked other sites developed with Next.js and WordPress - the meta tags work fine on the debugger.
Another point of confusion is that the debugger tool seems to be able to pick up that on my Next.js site I've listed https://spurafrika.org as its canonical URL, which it can only tell through my og:url tag. Yet when I view what the debugger supposedly sees, it shows the above empty HTML snippet.
I thought perhaps it might have been an encoding or parsing issue, but I've validated my HTML source using several tools and there are no problems.
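One hedged observation on the debugger output: `ÿþ` is exactly what the two bytes 0xFF 0xFE look like when decoded as ISO-8859-1 or Windows-1252, and that byte pair is the UTF-16 little-endian byte order mark. That suggests, but does not prove, that the scraper received a body it did not decode with the charset it reported. A small Python check for leading BOMs, using only the constants in the standard `codecs` module (the helper name `leading_bom` is illustrative):

```python
import codecs
from typing import Optional

# Longer BOMs first: the UTF-32 LE BOM (FF FE 00 00) begins with the UTF-16 LE one.
KNOWN_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def leading_bom(body: bytes) -> Optional[str]:
    """Return the encoding whose byte order mark starts `body`, if any."""
    for bom, name in KNOWN_BOMS:
        if body.startswith(bom):
            return name
    return None


# How a UTF-16 LE BOM renders when misread as ISO-8859-1:
print(codecs.BOM_UTF16_LE.decode("iso-8859-1"))  # ÿþ
```

Running `leading_bom` on the first bytes your server actually returns to a crawler would confirm or rule this out.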
I'm stumped. Anyone know why this is happening?
I copied your code into a code sandbox and ran it through the debugger: https://developers.facebook.com/tools/debug/?q=https%3A%2F%2Fkzi2c.csb.app%2F
Initially, keeping og:url as https://spurafrika.org/ caused warnings, and og:description was not picked up. Once I pointed it to the actual URL, it all got fixed.
Changing og:url to the correct URL may fix it; give it a shot and let us know.
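To compare what a page declares in og:url with the URL it is actually served from, here is a minimal sketch using Python's standard `html.parser`. If, as the question's own observation about the canonical URL suggests, the debugger treats og:url as canonical and scrapes that address instead, a mismatch would explain seeing the other site's output. The helper names and the normalization rule (lowercased scheme and host, trailing slash ignored) are assumptions for illustration; the debugger's own canonicalization may differ.

```python
from html.parser import HTMLParser
from typing import Dict, Tuple
from urllib.parse import urlsplit


class OpenGraphParser(HTMLParser):
    """Collects <meta property="og:..." content="..."> pairs (first one wins)."""

    def __init__(self) -> None:
        super().__init__()
        self.tags: Dict[str, str] = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        prop = attr.get("property") or ""
        content = attr.get("content")
        if prop.startswith("og:") and content is not None:
            self.tags.setdefault(prop, content)


def _normalize(url: str) -> Tuple[str, str, str]:
    parts = urlsplit(url)
    return (parts.scheme.lower(), parts.netloc.lower(), parts.path.rstrip("/") or "/")


def og_url_points_elsewhere(served_url: str, html: str) -> bool:
    """True if the page declares an og:url that differs from where it is served."""
    parser = OpenGraphParser()
    parser.feed(html)
    parser.close()
    og_url = parser.tags.get("og:url")
    return og_url is not None and _normalize(og_url) != _normalize(served_url)
```

Self-closing tags such as `<meta ... />` also reach `handle_starttag`, because `HTMLParser`'s default `handle_startendtag` forwards to it.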

X-Frame-Options in the meta tag

I've created a test application where I look into different defense techniques against clickjacking and other UI redressing attacks. One of the most used techniques is the X-Frame-Options header along with frame-busting code. What I fail to understand is why the following isn't recommended and, according to OWASP (https://www.owasp.org/index.php/Clickjacking_Defense_Cheat_Sheet), doesn't work (even though it works in my test application: I can't frame the page if the following is included):
<meta http-equiv="X-Frame-Options" content="deny">
Any explanation or link to an answer would be greatly appreciated.
Apparently this is because the META tag might not be received until information has already rendered in the subframe. This still works in browsers such as Chrome and Firefox, but is ignored by IE.
According to many resources (not only your URL, but also e.g. this one) the <meta> tag should be ignored.
If your browser does not ignore it, that does not mean all other browsers behave the same way. So, to be on the safe side, you must specify the HTTP header.
Why is that? Probably one of the reasons is the same as the reason people advise avoiding the following:
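Sending the header instead of the meta tag is usually a one-line server or framework setting. As a framework-neutral sketch, here is a hypothetical WSGI middleware (Python standard library only) that adds `X-Frame-Options: DENY` to every response that does not already set the header; the names `deny_framing` and `hello_app` are illustrative.

```python
from typing import Callable, Iterable, List, Tuple

Headers = List[Tuple[str, str]]


def deny_framing(app: Callable) -> Callable:
    """Wrap a WSGI app so each response carries X-Frame-Options: DENY."""

    def wrapped(environ, start_response):
        def patched_start_response(status: str, headers: Headers, exc_info=None):
            # Respect an explicit value (e.g. SAMEORIGIN) set by the app itself.
            if not any(name.lower() == "x-frame-options" for name, _ in headers):
                headers = headers + [("X-Frame-Options", "DENY")]
            return start_response(status, headers, exc_info)

        return app(environ, patched_start_response)

    return wrapped


def hello_app(environ, start_response) -> Iterable[bytes]:
    body = b"<!DOCTYPE html><title>Not frameable</title>"
    start_response("200 OK", [("Content-Type", "text/html; charset=utf-8")])
    return [body]


application = deny_framing(hello_app)
```

To try it locally, serve `application` with `wsgiref.simple_server.make_server("", 8000, application).serve_forever()` and attempt to frame http://localhost:8000/ from another page.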
<meta name="robots" content="noindex" />
The reason, in my opinion, is that to get this meta tag you need to download and parse the whole page, whereas to read the HTTP header you don't.
In this case, the HTTP header is simply the more efficient option, since the browser can act on it sooner; that could be why you are pushed to drop the meta tags.
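To illustrate the efficiency point: a client can read a response's headers, including X-Frame-Options or X-Robots-Tag (the header counterpart of the robots meta tag), without touching the body at all. A minimal sketch with Python's standard `http.client.parse_headers`, run on a canned response so it needs no network; the response bytes are made up for the example.

```python
import http.client
import io

# A canned HTTP response (illustrative); the body could be arbitrarily large.
RAW_RESPONSE = (
    b"HTTP/1.1 200 OK\r\n"
    b"Content-Type: text/html; charset=utf-8\r\n"
    b"X-Frame-Options: DENY\r\n"
    b"X-Robots-Tag: noindex\r\n"
    b"\r\n"
    b"<!DOCTYPE html><html><head></head><body>Large page body</body></html>"
)


def read_head(stream: io.BufferedIOBase):
    """Read the status line and headers, leaving the body unread in `stream`."""
    status_line = stream.readline().decode("iso-8859-1").strip()
    headers = http.client.parse_headers(stream)  # stops at the blank line
    return status_line, headers


stream = io.BytesIO(RAW_RESPONSE)
status, headers = read_head(stream)
print(status)                              # HTTP/1.1 200 OK
print(headers["X-Frame-Options"])          # DENY
print(stream.tell() < len(RAW_RESPONSE))   # True: the body has not been consumed
```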

IE 10 does not load page in UTF-8

I've got simple HTML pages in Russian with a bit of JS in them.
Every browser works fine except IE10; even IE9 is fine. The following code is included:
<html lang="ru">
<meta http-equiv="Cоntent-Type" content="text/html"; charset="utf-8">
Also, I've added a .htacess file with:
AddDefaultCharset UTF-8
Still, IE10 loads the page in a Cyrillic encoding (cp-1251, I believe); the only way to display the characters correctly is to manually switch to UTF-8 in the browser (or choose auto-detect mode).
I don't understand why IE10 forces 1251 instead of UTF-8.
The website to check is http://btlabs.ru
What really causes the problem is that the HTTP headers sent by the server include
Content-Type: text/html; charset=windows-1251
This overrides any meta tags. You should of course fix the errors with the meta tag as pointed out in other answers, and run a markup validator to check your code, but to fix the actual problem, you need to fix the .htaccess file. Without seeing the file and other server-side issues, it is impossible to tell how to fix that (e.g., server settings might prevent the effect of a per-directory .htaccess file and apply one global file set by the server admin). Note that the file name must have two c's, not one (.htaccess, not .htacess).
You can check which headers are sent using, e.g., Rex Swain's HTTP Viewer.
The reason why things work on other browsers is that they apply the modern HTML5 principle "BOM wins them all". That is, an HTTP header takes precedence over a meta tag in specifying the character encoding, but if the actual data begins with the three bytes that constitute the UTF-8 encoded form of the Byte Order Mark (BOM), then, no matter what, the data will be interpreted as UTF-8 encoded. For some unknown reason, IE 10 does not do that (and neither does IE 11).
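The precedence described above (BOM first, then the HTTP header, then a meta tag) can be sketched as a small Python function. This is a simplified model, not the full HTML encoding-sniffing algorithm: the meta scan is a plain regular expression over the first 1024 bytes, and the `default` fallback is an assumption, since real browsers pick a locale-dependent default.

```python
import re
from email.message import Message
from typing import Optional

BOMS = [
    (b"\xef\xbb\xbf", "utf-8"),
    (b"\xff\xfe", "utf-16-le"),
    (b"\xfe\xff", "utf-16-be"),
]
META_CHARSET = re.compile(rb"""<meta[^>]+charset\s*=\s*["']?([A-Za-z0-9_.:-]+)""", re.IGNORECASE)


def header_charset(content_type: Optional[str]) -> Optional[str]:
    """Extract the charset parameter from a Content-Type header value."""
    if not content_type:
        return None
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()  # lowercased, or None


def effective_charset(body: bytes, content_type: Optional[str] = None,
                      default: str = "windows-1252") -> str:
    # 1. A byte order mark wins over everything else ("BOM wins them all").
    for bom, name in BOMS:
        if body.startswith(bom):
            return name
    # 2. Otherwise the HTTP header wins over any meta tag.
    charset = header_charset(content_type)
    if charset:
        return charset
    # 3. Otherwise look for a meta declaration near the top of the document.
    match = META_CHARSET.search(body[:1024])
    if match:
        return match.group(1).decode("ascii").lower()
    # 4. Otherwise fall back to a default (assumed here; browsers vary).
    return default
```

For a page that starts with a UTF-8 BOM but is served with `charset=windows-1251`, this model returns `utf-8`, as the modern browsers do; per the answer, IE 10 and 11 let the header win instead.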
But this won’t be a problem if you just make the server send an HTTP header that declares UTF-8.
If the server has been set to declare windows-1251 and you cannot possibly change that, then you just need to live with it. Transcode your HTML files to windows-1251 then, and declare windows-1251 in a meta tag. This means that if you need any characters outside the limited repertoire representable in windows-1251, you need to represent them using character references.
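If you do end up transcoding, Python's codec machinery can produce the character references for you: the `xmlcharrefreplace` error handler turns every character that windows-1251 cannot represent into a numeric reference such as `&#10003;`. A minimal sketch; the function name is hypothetical, and the meta rewrite is a naive text substitution that assumes the file currently declares utf-8.

```python
import re

CHARSET_DECL = re.compile(r"""(charset\s*=\s*["']?)utf-8""", re.IGNORECASE)


def to_windows_1251(html: str) -> bytes:
    """Re-declare the page as windows-1251 and encode it, escaping what won't fit."""
    # Naive: rewrites every "charset=utf-8", including any in the page's text.
    html = CHARSET_DECL.sub(r"\1windows-1251", html)
    # Characters outside windows-1251 become &#NNNN; references.
    return html.encode("cp1251", errors="xmlcharrefreplace")
```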
Perhaps it's because your 'o' in 'content' is not an ASCII 'o'. Notice that it is not highlighted in red on Stack Overflow? I copied it into a good text editor and saw that it is indeed not an 'o'. Because the 'o' is not really an ASCII 'o', that whole line should probably be ignored by every web browser, which then has to fall back on its default charset. Microsoft and IE are notorious for picking bad defaults, which is my guess for why it doesn't work in IE. ;)
But codingaround has good advice too: it's best to put quotes around your attribute values. That alone should not break a web browser, though.
You should use a doctype at the start:
<!DOCTYPE html>
<html lang='ru'>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
But the real culprit is your content and charset problem. Notice my line; mine is very different. ;) That's the problem. Note that mine has two ASCII 'o's, one in "Content-Type" and another in 'content='.
As Shawn pointed out, copy and paste this:
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
This is a really good example of how non-ASCII letters that look like English ASCII letters can really mess things up!
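A quick way to catch look-alike characters like this is to list every non-ASCII character in the suspicious line together with its Unicode name. A small Python sketch; it assumes the stray letter is CYRILLIC SMALL LETTER O (U+043E), a common look-alike of the Latin 'o', since the exact character can't be confirmed from the page itself.

```python
import unicodedata
from typing import List, Tuple


def non_ascii_chars(line: str) -> List[Tuple[int, str, str]]:
    """Return (index, character, Unicode name) for each non-ASCII character."""
    return [
        (i, ch, unicodedata.name(ch, "UNKNOWN"))
        for i, ch in enumerate(line)
        if ord(ch) > 127
    ]


# The question's meta line, assuming the stray letter is U+043E.
suspect = '<meta http-equiv="C\u043entent-Type" content="text/html"; charset="utf-8">'
for index, ch, name in non_ascii_chars(suspect):
    print(index, repr(ch), name)  # 19 'о' CYRILLIC SMALL LETTER O
```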
Maybe you forgot changing cоntent=text/html; to cоntent="text/html";
As Shawn has already pointed out, it could also be content="text/html; charset=utf-8".
But as you have tried both things out, can you confirm if the IE10 output looks like this?
I can't really help further with this, as the only thing I have here is an IE 10 online emulator.
So far the possible problems are:
Different o character
The <meta> tag is still outside of <head>; put it inside <head>
Problems with IE handling the content and charset attributes

IE loses automatic UTF-8 encoding in iframe form target

I have an odd problem in IE. It has to do with how IE detects the encoding of an iframe based on its parent content. My application wraps the content of a page in an iframe, and sets the encoding of the parent window to UTF-8 through the Content-Type header. The content of the iframe does not set the encoding through the Content-Type, and picks up the parent window's encoding on its initial load. This is the desired behavior - the content window requires the UTF-8 encoding for some language content, but for complicated reasons beyond my control, it cannot forcibly set its own encoding, so it relies on the parent window's encoding.
The problem arises when the content page is the target of a form action. When the form submits and the page loads in the content window, it auto-selects Western European (Windows) encoding. Does anyone know why? I've tried searching for any sort of documentation on related behavior, but the googles, they do nothing. Any sort of a lead (beyond sending a Content-Type header or a byte-order mark in the content) would be most helpful.
I unfortunately don't have a public place to host this, but copy-pasting these code samples to local files and saving each with UTF-8 encoding without a byte-order mark should consistently reproduce the behavior in all versions of IE.
frame1.html
<meta http-equiv="Content-Type" content="text/html;charset=UTF-8">
<div>エンコード</div>
<iframe src="frame2.html"></iframe>
frame2.html
<form>
<input value="エンコード">
<input type="submit">
</form>
To recap with the example, if you load the page and check the encoding of both the parent and the iframe, you should see "Auto-Select" checked and "UTF-8" selected in both. If you hit Submit in the iframe, the frame will reload and the input text will be garbled. Checking the encoding of the iframe should still show "Auto-Select" checked, but now "Western European (Windows)" will be selected instead of "UTF-8". I need to know if there is anything else I can do to make it automatically preserve the UTF-8 encoding when the form action completes.
Thanks in advance!
When you say you cannot add a Content-Type header/BOM, are you able to add the Content-Type as a meta tag? Something like:
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
Recently I had very similar issues - IE auto-detecting Western European at all times, except when a certain popup window navigated to the page, which then caused IE to pick UTF-8. I was never able to track down exactly what caused it (the resulting page was identical, only the page that linked to it was different!), so we ended up fixing it by forcing UTF-8 across the entire application (with headers).
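Forcing UTF-8 with headers can be done centrally instead of page by page. As a sketch, assuming a Python WSGI stack and that every HTML body the application emits really is UTF-8 (otherwise the label would be wrong), here is a hypothetical middleware that adds the charset to text/html responses that lack one, including the response to the form submission:

```python
from typing import Callable, List, Tuple

Headers = List[Tuple[str, str]]


def force_utf8_html(app: Callable) -> Callable:
    """Add charset=utf-8 to text/html responses that don't declare a charset."""

    def wrapped(environ, start_response):
        def patched_start_response(status: str, headers: Headers, exc_info=None):
            fixed: Headers = []
            for name, value in headers:
                media_type = value.split(";", 1)[0].strip().lower()
                if (name.lower() == "content-type" and media_type == "text/html"
                        and "charset=" not in value.lower()):
                    value = "text/html; charset=utf-8"
                fixed.append((name, value))
            return start_response(status, fixed, exc_info)

        return app(environ, patched_start_response)

    return wrapped
```

An explicit charset already set by the application is left alone, so this only fills the gap that makes IE fall back to auto-detection.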
If you're really unable to modify the inner page in any way, is it possible you could "replace" this page with your own, and then send the content over to the "other" server via an API or HTTP POST where you wouldn't need to worry about IE's "auto-detecting"?
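If you go the "replace the page and relay the data" route, the key detail is to encode the form fields as UTF-8 yourself instead of relying on whatever the browser auto-detected. A sketch with Python's standard `urllib`; the function name, target URL, and field name are placeholders, and the request is only built here, not sent.

```python
from typing import Dict
from urllib.parse import urlencode
from urllib.request import Request


def build_relay_request(url: str, fields: Dict[str, str]) -> Request:
    """Build a form POST whose body is explicitly UTF-8 percent-encoded."""
    body = urlencode(fields, encoding="utf-8").encode("ascii")
    return Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded; charset=UTF-8"},
    )


# Placeholder endpoint; send with urllib.request.urlopen(req) when ready.
req = build_relay_request("https://other-server.example/submit", {"text": "エンコード"})
```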

What's the difference between IFrame and Frame?

Looking at options for embedding the 3D Secure page inside my own order form, I came across the following:
"Some commerce sites will devote the full browser page to the authentication rather than using a frame (not necessarily an iFrame, which is a less secure object anyway)."
from http://en.wikipedia.org/wiki/3-D_Secure
Can someone give me the lowdown as to why iframes are less secure, and cause problems, as opposed to normal frames? And what are the basic differences?
The way I see it, iframe is the way to go.
The difference is an iframe is able to "float" within content in a page, that is you can create an html page and position an iframe within it. This allows you to have a page and place another document directly in it. A frameset allows you to split the screen into different pages (horizontally and vertically) and display different documents in each part.
Read IFrames security summary.
IFrame is just an "internal frame". The reason why it can be considered less secure (than not using any kind of frame at all) is because you can include content that does not originate from your domain.
All this means is that you should only include content you trust, whether in an iFrame or a regular frame.
Frames and IFrames are equally secure (and insecure if you include content from an untrusted source).
Iframes are often used to include complete pages. When those pages are hosted on another domain, you run into cross-site scripting restrictions and similar problems. There are ways to work around this.
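Browsers decide whether a page may script into an iframe by comparing origins: scheme, host and port must all match. A small Python sketch of that comparison using `urllib.parse.urlsplit`; the default-port table and the example URLs are illustrative assumptions, and real browsers apply further rules (for example for `file:` URLs and sandboxed frames) that this ignores.

```python
from typing import Optional, Tuple
from urllib.parse import urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def origin(url: str) -> Tuple[str, str, Optional[int]]:
    """(scheme, host, port) tuple, filling in the scheme's default port."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    port = parts.port or DEFAULT_PORTS.get(scheme)
    return scheme, parts.hostname or "", port  # hostname is already lowercased


def same_origin(a: str, b: str) -> bool:
    return origin(a) == origin(b)
```

For example, an order page on `https://shop.example` and a 3-D Secure page framed from `https://acs.bank.example` are different origins, so neither can read the other's DOM.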
Frames were used to divide your page into multiple parts (for example, a navigation menu on the left). Using them is no longer recommended.
Basically, the difference between the <frame> tag and the <iframe> tag is:
When we use the <frame> tag, the content of a web page consists of frames created using only the <frame> and <frameset> tags (the <body> tag is not used), as in:
<html>
<head>
<title>HTML Frames</title>
</head>
<frameset rows="20%,70%,10%">
<frame name="top" src="/html/top.html" />
<frame name="main" src="/html/main.html" />
<frame name="bottom" src="/html/bottom.html" />
</frameset>
</html>
And when we use <iframe>, the web page does not consist of frames; its content is created using the <body> tag (and the <frame> and <frameset> tags are not used), as in:
<html>
<head>
<title>HTML Iframes</title>
</head>
<body>
<p>See the video</p>
<iframe width="854" height="480" src="https://www.youtube.com/embed/2eabXBvw4oI"
frameborder="0" allowfullscreen>
</iframe>
</body>
</html>
So <iframe> just brings some other source's document into a web page. <iframe> elements are used to specify inline frames, also called floating frames. The World Wide Web Consortium (W3C) included the <iframe> feature in HTML 4.01.
<frameset> tags were used to create frames with the <frame> tag, whereas <iframe> fulfills the functions of both <frame> and <frameset>. Unlike <frame> tags, <iframe> tags can also be placed inside the <body> tag.
Placing an <iframe> is easy: a coder can put the <iframe> tag among the other tags of the page, and add several <iframe> tags if needed. Placing <frame> tags in a <frameset>, on the other hand, is a bit more complicated.
Note: the <frame> and <frameset> tags are deprecated in HTML5.
Now that <frame> and <frameset> are deprecated, web developers use the <body> tag to create a page's content and <iframe> tags to embed documents from other sources. That said, <frame> tags were also used to embed other sources' documents, and <iframe> tags can also be used to create frames.
The only reasons I can think of are actually in the Wikipedia article you referenced. To mention a couple:
"The "Verified by Visa" system has drawn some criticism, since it is hard for users to differentiate between the legitimate Verified by Visa pop-up window or inline frame, and a fraudulent phishing site."
"as of 2008, most web browsers do not provide a simple way to check the security certificate for the contents of an iframe"
If you read the Criticism section in the article it details all the potential security flaws.
Otherwise, the only difference is that an IFrame is an inline frame while a Frame is part of a Frameset, which means the difference causes more layout problems than anything else!
Inline frame is just one "box" and you can place it anywhere on your site.
Frames are a bunch of 'boxes' put together to make one site with many pages.
While the security is the same, it may be easier for fraudulent applications to dupe users using an iframe since they have more flexibility regarding where the frame is placed.