Why do HTML Entities get garbled in View Source? - google-chrome

I've seen this behavior across several different browsers over the years (Chrome, Firefox, and Opera, at least), but most recently it happens only in Opera and Chrome - I think Firefox fixed it at some point. If a page pushes a fairly sizeable chunk of data (several thousand lines of HTML) to the browser and that data contains any HTML entities, the entities come through malformed when you view the source code.
For example, I put a "lower right pencil" entity ( &#x270E; in hex, or &#9998; in decimal) throughout the contents of a page in order to label "Edit" links. However, when I load the same page in any browser and click "View Source", I see random, mangled codes that often do not match what is actually hard-coded into the page HTML. Some examples include:
&x#x2#x270E;, &#x#x270E;, &#x270#x270E;
Examining a Fiddler capture of the actual source code being sent to the browser shows that the browser indeed receives the CORRECT codes. Something seems to go awry as soon as the browser tries to display it in a view-source tab.
It happens with other codes too: &nbsp; becomes &nbnbsp; or &nnbsp;, etc. Mysteriously, these randomize with each refresh. Once in a while they come through correct, though most of the time they get garbled. The codes appear to render correctly on the front-end. Is this just a bug in every major browser, or should I be concerned about data loss when pushing somewhat large data sets over HTTP?
Past Tests
I ran two tests to confirm this:
(1) Spammed a single character into a valid HTML5 page's contents hosted on a public-facing AWS LAMP server. Loaded the page in Opera and viewed source. Most entities were okay, but about halfway down it started to trip up, and continued sporadically throughout:
&#x27#x270E;
(2) Spammed a single character into a valid HTML5 page's contents hosted on an intranet Windows server and served over a NetExtender VPN. Same result as the first test.
&#x270#x270E;✎
Steps to Reproduce:
I have tested this on many different systems (Ubuntu Linux, Windows 7, and Windows 10 so far) on several different networks. However, I would appreciate it if others could confirm this.
Create a valid HTML page and paste a single HTML entity (either decimal or hexadecimal representation) between the body tags.
Copy and paste the character to fill up several hundred lines of content (fewer may be required, but more lines make the issue more likely to appear). For example: ... etc.
Save the page on your web server.
Load the page in a new Opera window.
Right click anywhere in the page and click "Page source"
Copy the source code and either manually examine it or just paste it into the W3 Validator at https://validator.w3.org - it will help to point out the incorrectly formatted HTML Entities.
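The repro page from the steps above can be generated with a small script. This is a hypothetical helper (the entity, line count, and file name are my own choices, not from the post):

```javascript
// Build a valid HTML5 document whose body repeats one entity for `lines`
// lines, matching the reproduction steps: one entity per line of content.
function buildReproPage(entity, lines) {
  const body = Array.from({ length: lines }, () => entity + '<br>').join('\n');
  return '<!DOCTYPE html>\n<html>\n' +
    '<head><meta charset="utf-8"><title>Entity repro</title></head>\n' +
    '<body>\n' + body + '\n</body>\n</html>';
}

// In Node, write it out and drop it on your web server:
// require('fs').writeFileSync('repro.html', buildReproPage('&#x270E;', 500));
```

Comparing this known-good file against what "Page source" shows makes the corruption easy to diff.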
Opera 49.0 Illustration
See below how the Code Inspector shows the correct HTML Entity code. However, when you view Page Source for the same section, the code gets malformed.

Related

Get rendered source code from web components site?

I just tried something rather trivial: get the source code of a web page (by saving it) and count how often a certain phrase occurs in the code.
Turns out, it doesn't work if that page uses Polymer / web components. Is this a browser bug?
Try the following: Go to http://www.google.com/design/icons/ and try to find star_half in the code (the last icon on the page). If you inspect the element inside of Chrome or Firefox, it will lead you to
<i class="md-icon dp48">star_half</i>
but this won't be in the source if you copy the root node or save the html to disk.
Is there a way to get the entire code?
The reason for this behavior is probably how source viewing (and source saving) works in the browser, combined with the fact that shadow roots are attached to web components on the client side.
When you press Ctrl-U on a web page, the browser essentially makes another network call to the same URL to fetch a copy of what the server returned when you hit that URL.
In this case, when the page renders, the browser identifies the icons-layout component and then executes code to attach a shadow root to that node. All of this happens once the page reaches the client (the browser).
When you save this page, you are saving what the server returned, not the current state of the page. You'll see the same behavior if you fire up the Chrome console and try to save an icons-layout node.
Is there a way to get the entire code?
I don't know how to do it from the browser, but PhantomJS provides a way to save client-side-rendered HTML.

SVG stacking, anchor elements, and HTTP fetching

I have a series of overlapping questions, the intersection of which can best be asked as:
Under what circumstances does a # character (a fragment anchor) in a URL trigger an HTTP fetch, in the context of either an <a href> or an <img src>?
Normally, should:
http://foo.com/bar.html#1
and
http://foo.com/bar.html#2
require two different HTTP fetches? I would think the answer should definitely be NO.
More specific details:
The situation that prompted this question was my first attempt to experiment with SVG stacking - a technique where multiple icons can be embedded within a single svg file, so that only a single HTTP request is necessary. Essentially, the idea is that you place multiple SVG icons within a single file, and use CSS to hide all of them, except the one that is selected using a CSS :target selector.
You can then select an individual icon using the # character in the URL when you write the img element in the HTML:
<img
src="stacked-icons.svg#icon3"
width="80"
height="60"
alt="SVG Stacked Image"
/>
When I try this out on Chrome it works perfectly. A single HTTP request is made, and multiple icons can be displayed via the same svg url, using different anchors/targets.
However, when I try this with Firefox (28), I see via the Console that multiple HTTP requests are made - one for each svg URL! So what I see is something like:
GET http://myhost.com/img/stacked-icons.svg#icon1
GET http://myhost.com/img/stacked-icons.svg#icon2
GET http://myhost.com/img/stacked-icons.svg#icon3
GET http://myhost.com/img/stacked-icons.svg#icon4
GET http://myhost.com/img/stacked-icons.svg#icon5
...which of course defeats the purpose of using SVG stacking in the first place.
Is there some reason Firefox is making a separate HTTP request for each URL instead of simply fetching img/stacked-icons.svg once like Chrome does?
This leads into the broader question of - what rules determine whether an # character in the URL should trigger an HTTP request?
Here's a demo in Plunker to help sort out some of these issues
Plunker
stack.svg#circ
stack.svg#rect
A URI has a few basic components:
Protocol - determines how to send the request
Domain - where to send the request
Location - file path within the domain
Fragment - which part of the document to look in
Media Fragment URI
A fragment is simply identifying a portion of the entire file.
Depending on the implementation of the Media Fragment URI spec, it actually might be totally fair game for the browser to send along the Fragment Identifier. Think about a streaming video, some of which has been cached on the client. If the request is for /video.ogv#t=10,20 then the server can save space by only sending back the relevant portion/segment/fragment. Moreover, if the applicable portion of the media is already cached on the client, then the browser can prevent a round trip.
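A rough sketch of what parsing the temporal dimension of a media fragment involves (`#t=10,20` meaning seconds 10 through 20). This is an illustration only, not a spec-complete parser - the full spec also covers NPT, SMPTE, and wall-clock formats:

```javascript
// Parse the "t" (temporal) dimension of a media fragment like "#t=10,20".
// Only plain seconds are handled here, which is enough to show the idea.
function parseTemporalFragment(hash) {
  const m = /^#t=(\d+(?:\.\d+)?)?(?:,(\d+(?:\.\d+)?))?$/.exec(hash);
  if (!m || (m[1] === undefined && m[2] === undefined)) return null;
  return {
    start: m[1] !== undefined ? Number(m[1]) : 0,        // default: clip start
    end: m[2] !== undefined ? Number(m[2]) : Infinity,   // default: clip end
  };
}

console.log(parseTemporalFragment('#t=10,20')); // { start: 10, end: 20 }
console.log(parseTemporalFragment('#icon3'));   // null - not a temporal fragment
```

A server that understands this dimension could respond with only the 10-20s segment; a fragment like #icon3 means nothing to it and stays client-side.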
Think Round Trips, Not Requests
When a browser issues a GET request, it does not necessarily mean that it needs to grab a fresh copy of the file all the way from the server. If it already has a cached version, the request can be answered immediately.
HTTP Status Codes
200 - OK - Got the resource and returned it from the server
304 - Not Modified - Found the resource in the cache, and not enough has changed since the previous request to need a fresh copy.
Caching Disabled
Various things can affect whether or not a client will perform caching: The way the request was issued (F5 - soft refresh; Ctrl + R - hard refresh), the settings on the browser, any development tools that add attributes to requests, and the way the server handles those attributes. Often, when a browser's developer tools are open, it will automatically disable caching so you can easily test changes to files. If you're trying to observe caching behavior specifically, make sure you don't have any developer settings that are interfering with this.
When comparing requests across browsers, to help mitigate the differences between developer-tool UIs, you should use a tool like Fiddler to inspect the actual HTTP requests and responses that go out over the wire. I'll show you both for the simple Plunker example. When the page loads, it should request two different IDs in the same stacked SVG file.
Here are side by side requests of the same test page in Chrome 39, FF 34, and IE 11:
Caching Enabled
But we want to test what would happen on a normal client where caching is enabled. To do this, update your dev tools for each browser, or go to fiddler and Rules > Performance > and uncheck Disable Caching.
Now our request should look like this:
Now all the files are returned from the local cache, regardless of the fragment ID.
The developer tools for a particular browser might display the fragment ID for your own benefit, but Fiddler should always display the address that is actually requested. Every browser I tested omitted the fragment part of the address from the request:
Bundling Requests
Interestingly, Chrome seems to be smart enough to prevent a second request for the same resource, but FF and IE fail to do so when a fragment is part of the address. This is equally true for SVGs and PNGs. Consequently, the first time a page is requested, the browser will load the file once for each time the SVG stack is actually used. Thereafter, it's happy to take the version from the cache, but this hurts performance for new viewers.
Final Score
CON: First Trip - SVG stacks are not fully supported in all browsers; one request is made per instance.
PRO: Second Trip - SVG resources will be cached appropriately
Additional Resources
simurai.com is widely credited as the first application of SVG stacks, and the article explains some browser limitations that are still being worked on
Sven Hofmann has a great article about SVG Stacks that goes over some implementation models.
Here's the source code on GitHub if you'd like to fork it or play around with samples
Here's the official spec for Fragment Identifiers within SVG images
Quick look at Browser Support for SVG Fragments
Bugs
Typically, identical resources requested by a page will be bundled into a single request that satisfies all requesting elements. Chrome appears to have already fixed this, but I've opened bugs for FF and IE in the hope that they fix it one day.
Chrome - This very old bug, which highlighted the issue, appears to have been resolved in WebKit.
Firefox - Bug Added Here
Internet Explorer - Bug Added Here

Webpage doesn't render for some Chrome users

Two of our users report that one of our web pages (http://vdgsa.org/pgs/ads.html) is failing to render properly. Apparently the text of each classified ad is blank, and not taking up space on the screen. For most of our users the page renders correctly.
The common denominator appears to be that the users experiencing problems are using Google Chrome. However, when I look at this page on my computer using the same version of Chrome and the same OS (Windows 7) as one of those users, the page looks fine. Both the HTML and CSS validate. Since I can't duplicate the problem, I'm at a bit of a loss.
Any suggestions?
I use the Chrome extension "AdBlock" and don't see the text of the ads.
If I disable AdBlock and refresh the page, it works normally (I see the text).
If you download one of the main filter lists used by AdBlock, the "EasyList", you will find many references to words like "ad_item" / "ad_title", etc.
You may want to try other names for your CSS classes "ad_item" and "ad_title".
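The collision can be modeled with a few lines. The filter strings below are illustrative examples in the spirit of EasyList, not quoted from it:

```javascript
// Simplified model of ad-blocker element-hiding filters: an element is
// hidden if its class name contains any blocked name. Filter list here
// is hypothetical, for illustration only.
const filters = ['ad_item', 'ad_title', 'ad_box'];

function isLikelyHidden(className) {
  return filters.some(f => className.includes(f));
}

console.log(isLikelyHidden('ad_title'));      // true  - the text gets blanked
console.log(isLikelyHidden('listing_title')); // false - a safer rename
```

This is why the page broke only for users running the extension: everyone else never loads the filter list, so the classes are harmless for them.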

Google Chrome doesn't print my Javascript-and-AJAX-generated content

I am the developer of a webapplication.
Chrome displays my JavaScript-and-AJAX-generated webpages correctly, but when I try to print them (via its native print function) I get a blank page.
Printing works just fine on other browsers.
I have tried printing server-side-generated pages with Chrome and they print fine.
What can be wrong on the webpages of my web application? I think the issue is that those pages are dynamically generated by Javascript and AJAX.
I am saying that because I have just found out that I can't even save those pages correctly with Chrome (all the dynamic HTML is not shown).
I am on Google Chrome 13.0.782.112.
How can I debug and fix this issue?
Is there any workaround?
Is anybody managing to get dynamic-generated content printed with Google Chrome?
This problem is driving me crazy!
P.S.: some of my users are reporting the same problem on Safari :-(
UPDATE: upgraded to Chrome 14.0.835.202 but the issue is still there...
I've had exactly the same problem, though not in Chrome (although I didn't actually test with Chrome). On certain browsers (and I cannot remember which ones offhand - but it was either in IE or FF), any content that is added into the DOM by JavaScript is not printed. What is actually printed is the original document that was served to the browser.
I successfully solved this using the following JavaScript function:
function docw()
{
    var doct = "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Transitional//EN\" \"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd\">";
    document.write(doct + document.getElementsByTagName('html')[0].innerHTML);
}
This is called when JavaScript page manipulation has finished. It reads the entire DOM and then re-writes the whole document back into itself using document.write. This enabled printing for my particular project on both IE and Firefox, although I'm pretty sure one of those already worked in the first place and the other didn't (I can't remember which, and it's not a project I can pull out to test at the moment). Whether this will solve the problem in Chrome I don't know, but it's worth a try.
Edit Terribly sorry, but I'm a complete pleb. I just re-read my old comments and this solution had nothing to do with printing; it was actually to fix a problem where only the original served document would be saved when saving to file. However, that said, I still wonder if it's worth a shot.
This helped me with a related problem - how to view/save dynamically generated HTML itself. I came up with the following bookmarklet.
javascript:(function(){document.write('<pre>'+(document.getElementsByTagName('html')[0].innerHTML.replace(/&/g,'&amp;').replace(/</g,'&lt;').replace(/>/g,'&gt;'))+'</pre>')})()
I run this and 'select all' / copy, and then (in Linux) do 'xclip -out' to direct the large amount of clipboard data to a file.
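The three replace calls in the bookmarklet are just HTML escaping, pulled out here as a plain function (a sketch of the same idea):

```javascript
// Escape the characters that would otherwise be parsed as markup, so the
// page's own HTML can be displayed as visible text inside a <pre> element.
function escapeHtml(html) {
  return html
    .replace(/&/g, '&amp;')  // must run first, or later steps double-escape
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;');
}

console.log(escapeHtml('<b>1 & 2</b>')); // &lt;b&gt;1 &amp; 2&lt;/b&gt;
```

The ampersand replacement has to come first; otherwise the `&lt;` and `&gt;` produced by the later steps would themselves be re-escaped.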
Trevor's answer totally worked for me. With jQuery I simply did something like
$("html").html($("html").html())
and it worked perfectly.

IE form input data disappear after browser refresh

I'm trying to achieve sticky forms without PHP. My setup is AJAX-style JavaScript. Back/forward work fine in both IE and FF, but refresh only works in FF, not IE. No matter what cache options I use - I've even set IE's temporary-files option to never check for updates - the input value is gone after a page refresh (the refresh button or F5).
I've read many posts where people have the opposite problem, and do not want form data to persist across page refresh, and never read from browser cache, but I do.
Any help is appreciated, thanks!
ps. posts like HTML - input value kept after Refresh are exactly the opposite of my problems
IE/Chrome/Safari/Opera/etc. have the expected behavior.
I consider it a bug that FF doesn't actually refresh the fields when you click refresh.
After all, the purpose of the refresh is to dump what you have and reload from the server. For Firefox to then merge any changed information / fields back into the form is unexpected behavior and, IMHO, bad by design.
Also note that this one issue has been fought over at Mozilla for 10 years. It is a source of MANY duplicate bug reports, is considered by many to be a critical failure, and is quite frankly a complete PITA. I don't know how many times I've had to explain to non-techies why the Firefox reload button doesn't, well, reload the page.
Lately I've taken to just telling them that the Firefox reload button is broken and that they have to either hold down the Shift key while clicking refresh or use a different browser. Thankfully we have choices.
--- Update due to a comment stating confusion as to what F5 and Ctrl-F5 are --
All browsers (except Firefox) treat F5 as "reload". Which means reload the page either from cache or from the server if cache is disabled. Firefox does do the reload, but it also repopulates any boxes with stuff you've typed in... Provided those fields still exist. IMHO, this is bad behavior as a page may have changed and you end up in a very invalid state with some things filled in and others not.
To be clear, the cache does not contain what you typed in the page; cache only contains what the server sent you. So Firefox itself takes this extra step of trying to merge previously typed in and unsubmitted data. Again, NONE of the other browsers do this and it is a source of much confusion.
All browsers (including Firefox) treat Ctrl F5 as "reload from server". This ignores any files you've cached (images, css, javascript, etc) and pulls it brand new from the server. Thankfully, Firefox does not merge unsubmitted data back into the page when you do a Ctrl-F5.
Even though I agree with Chris, would this be of any help?
http://snipplr.com/view/799/get-url-variables/
It would allow you to store things in the URL (like PHP's $_GET) and access them with JavaScript.