I've come across an interesting bug either in Google Chrome or, less likely, in the Windows 7 clipboard. The bug happens when performing a copy (Ctrl+C) and paste (Ctrl+V) on selected browser-rendered CSS + HTML containing a mix of text and images, and more specifically occurs when the image in this situation is coded to contain multiple copies at different sizes. I need help verifying which app is the culprit (the browser or the clipboard) and ideally a fix if there is one (e.g., a registry edit perhaps).
There may be some security risks raised by this depending on how this behavior plays out.
Discovery
Let me begin with how I first observed this bug in OneNote 2016...
I sometimes save articles I come across in OneNote for later reading. I'm using a locally installed OneNote 2016 copy on Windows 7 Enterprise SP1 (64-bit). As it's an enterprise copy, the official Web Clipper extensions aren't an option, so I just copy the articles manually when the urge strikes.
Curiously, I found that when I select a large block of text and images and copy and paste it, the images sometimes appear at lower quality in OneNote than they do in my web browser.
As a case study, I was trying to copy this article manually: Meet the frail, small-brained people who first trekked out of Africa.
When copying as a block of images + formatted text (in the browser: highlight the desired region, Ctrl+C; then Ctrl+V in OneNote 2016), the image difference is immediately noticeable. What I discovered is that if I right-click just the article image, choose Copy image, and then paste in OneNote using specifically right-click, Paste Options: > Picture (U) (the rightmost one), I get a much higher resolution image.
Here's an example, with the image repasted above its occurrence from the original bulk paste.
Digging into the web code, I believe I've figured out why. The issue occurs when there are multiple copies of the same image listed in some sort of variable style, which I'm assuming is used to accommodate different screen resolutions, or perhaps layouts specific to mobile versus PC.
The image URLs are contained in the <img ...> tag in the code below:
<figure class="figure">
  <img sizes="" src="http://www.sciencemag.org/sites/default/files/styles/inline__450w__no_aspect/public/gg_61125N_FirstHumans_BottomPart.png?itok=d0BdUhJK" srcset="http://www.sciencemag.org/sites/default/files/styles/inline__450w__no_aspect/public/gg_61125N_FirstHumans_BottomPart.png?itok=d0BdUhJK 1w, http://www.sciencemag.org/sites/default/files/styles/inline__699w__no_aspect/public/gg_61125N_FirstHumans_BottomPart.png?itok=j9n6dGdk 700w" />
  <figcaption>
    <div class="caption"></div>
    <div class="credit">
      Garvin Grullón
    </div>
  </figcaption>
</figure>
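As a quick sanity check of which candidate the browser actually selected, you can inspect the image's currentSrc property in the DevTools console. A minimal sketch, assuming the <figure class="figure"> markup above is the first such figure on the page:

// Run in the browser's DevTools console on the article page.
var img = document.querySelector('figure.figure img');
// src is the fallback URL; currentSrc is the candidate the browser
// actually chose from srcset for the current viewport and pixel density.
console.log('src:        ' + img.src);
console.log('currentSrc: ' + img.currentSrc);

If currentSrc shows the 699w URL while the bulk paste produces the 450w one, that points at the copy step rather than the rendering.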
The difference in pasted images is more glaring when it happens with vector-graphic-type images, which I believe is likely the byproduct of the news folks dumping the vector graphic to a *.png and then resizing it in photo-editing software (kind of defeating the point of vector graphics), hence butchering the rescaled resolution.
I show a couple of instances of the overall OneNote copy/paste issue below to illustrate:
Scope
Testing further, I've found this bug appears to be common to all Office 2016 programs (tested in Word 2016 and PowerPoint 2016) when copying blocks of text and images that contain <img ...> content with multiple URLs in the tag.
I also discovered that when directly pasting images via the right-click > Copy image strategy, only the Picture (U) option in Paste Options pastes the correct higher-res image, matching what's displayed in the browser. Keep Source Formatting (K) and Merge Formatting (M) will, interestingly, still paste the lower-res copy, even with this strategy.
Now, the browser (Chrome version 54) is displaying the proper copy, which led me to test the same article and copy & paste method in Internet Explorer 11. IE 11 also loads the proper highest-resolution image, but unlike with Chrome, when bulk copying and pasting that same article from IE 11 to OneNote 2016, the highest-resolution image is, magically, pasted properly.
That has me thinking the bug is in Chrome's copy & paste.
Indeed, I may have confirmation of this. In Adobe Acrobat Pro (my only other locally installed rich-media software with a clipboard-paste option, to my knowledge), I pasted from the clipboard (in Edit mode: Pages (right menu pane, second from top) > More Insert Options (under the Insert Pages section) > Insert from Clipboard), then copied the image from the resulting block into MS Paint. It is indeed still the lower-res image, as in Office 2016.
Hence I've discovered that the scope is not only OneNote 2016 or Office 2016, but rather all local rich-media apps, including both Microsoft's own apps and third-party apps from Adobe, etc. That said, I'm still not 100% confident that I can pin the bug on Google Chrome, as it's possible, albeit perhaps less likely, that the issue lies in Windows 7's clipboard, as that too would explain the universal effect on local rich-media paste attempts.
Looking one more time at that img tag, I see that srcset is a somewhat new attribute addition to the img tag; an image that uses it is termed a "Responsive Image". The feature was announced by the W3C working group in 2014, and per my reading most third-party mobile and PC browsers claim support for it. As I suspected, it is used with CSS to target different platforms... it appears the large dichotomy in resolution between so-called "Retina" (high-resolution) displays, like those in Apple's mobile devices, and the low-end displays found in budget Android devices and budget PCs in part inspired this attribute's addition.
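To make the selection concrete, here's a simplified sketch of the idea (my own approximation, not the spec's exact algorithm): each w descriptor is divided by the image's layout width to get an effective pixel density, and the browser prefers the smallest candidate that still covers the display's devicePixelRatio.

// Simplified sketch of srcset "w" selection, not the spec's exact rules.
// The two candidates mirror the <figure> markup above (URLs shortened);
// layoutWidth is an assumed resolved value for the sizes attribute.
var candidates = [
  { url: 'inline__450w__no_aspect version', w: 1 },
  { url: 'inline__699w__no_aspect version', w: 700 }
];
var layoutWidth = 450; // assumed CSS pixels
var dpr = window.devicePixelRatio || 1;

var ranked = candidates
  .map(function(c) { return { url: c.url, density: c.w / layoutWidth }; })
  .sort(function(a, b) { return a.density - b.density; });
// Smallest candidate dense enough for this screen, else the densest one.
var dense = ranked.filter(function(c) { return c.density >= dpr; });
var chosen = dense.length ? dense[0] : ranked[ranked.length - 1];
console.log('would pick: ' + chosen.url);

Note the markup above oddly declares 1w for the 450w file, which under this logic makes the 699w candidate win at any normal density -- consistent with Chrome rendering the larger image.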
Humorously, the compatibility site "Can I Use" lists the img srcset attribute as not working in Internet Explorer 11 and working in Google Chrome version 54. The irony is that for this use case the opposite appears to be true -- Chrome 54's copy is not working correctly, while copying from Internet Explorer 11 correctly handles the Responsive Image.
That still doesn't tell us for sure whether it's the browser or the clipboard, but it's certainly interesting to note, given that it suggests a missing piece in Can I Use's testing methodology (or that of whoever they get their data from -- I haven't checked into that yet), as copy & paste is seemingly a common use case for rich content.
Questions
To recap and conclude, I've tracked down the scope, but the following questions remain unsolved:
Which app is responsible -- Windows 7's clipboard app or Google Chrome?
For the offending app, is there a registry setting that I can tweak to have the bulk copy and paste grab whatever image the browser is displaying?
Also, to recap the earlier thought: if the bug is in Google Chrome, there could be some interesting security repercussions, as you're loading URL content into the operating system's clipboard that doesn't match the rendered content in the browser window.
Related
I've seen this behavior across several different browsers over the years (Chrome, Firefox, and Opera, at least), but most recently it happens only in Opera and Chrome -- I think Firefox fixed it at some point. If I have a page that pushes a fairly sizeable chunk of data (several thousand lines of HTML) to the browser, and I use any HTML entities in the data, they come through malformed when I view the source code.
For example, I put a "lower right pencil" entity (&#9998; or &#x270E;, which renders as ✎) throughout the contents of a page in order to label "Edit" links. However, when I load the same page in any browser and click "View Source", I see a random code that often does not match what is actually hard-coded into the page HTML. Some examples include:
&x#x2#x270E;, &#x#x270E;, ɰ#x270E;
Examining a Fiddler capture of the actual source code being sent to the browser shows that the browser indeed receives the CORRECT codes. Something seems to go awry as soon as the browser tries to display it in a view-source tab.
It happens with other codes too: &nbsp; becomes &nbnbsp; or &nnbsp;, etc. Mysteriously, these randomize with each refresh. Once in a while they come through correct, though most of the time they get garbled. The codes appear to render correctly on the front end. Is this just a bug in every major browser, or should I be concerned about data loss when pushing somewhat large data sets over HTTP?
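One quick way to show the garbling happens at display time rather than on the wire (a console sketch; the entity searched for is the pencil example above) is to re-fetch the page and count intact entities in the raw response text:

// Run in the DevTools console on the affected page: re-fetch the raw HTML,
// bypassing the view-source rendering, and count intact entity codes.
fetch(location.href)
  .then(function(r) { return r.text(); })
  .then(function(src) {
    var intact = (src.match(/&#x270E;/g) || []).length;
    console.log('intact &#x270E; entities in raw response: ' + intact);
  });

If the count matches what you hard-coded, the corruption is in the view-source display, consistent with the Fiddler capture above.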
Past Tests
I ran two tests to confirm this:
(1) Spammed a single character into a valid HTML5 page's contents hosted on a public-facing AWS LAMP server, then viewed the page in Opera and viewed its source. Most were okay, but about halfway down it starts to trip up, and continues sporadically throughout:
'#x270E;
(2) Spammed a single character into a valid HTML5 page's contents hosted on an intranet Windows server and served over a NetExtender VPN. Same result as the first test.
ɰ#x270E;✎
Steps to Reproduce:
I have tested this on many different systems (Linux (Ubuntu), Windows 7, and Windows 10 so far) on several different networks. However, I would appreciate it if others could confirm this.
Create a valid HTML page and paste a single HTML entity (either the decimal or hexadecimal representation) between the body tags.
Copy and paste the character to fill up several hundred lines of content (fewer may be required, but more is most likely to produce the same issue). For example: ... etc. (A generator script is sketched after these steps.)
Save the page on your web server.
Load the page in a new Opera window.
Right click anywhere in the page and click "Page source"
Copy the source code and either manually examine it or just paste it into the W3 Validator at https://validator.w3.org -- it will help point out the incorrectly formatted HTML entities.
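To save the manual copy-pasting in the second step, here's a minimal Node.js sketch that generates such a test page (the filename, entity, and repeat count are arbitrary assumptions):

// generate-entity-test.js -- writes a valid HTML5 page filled with one
// repeated HTML entity, for reproducing the view-source garbling.
var fs = require('fs');

var entity = '&#x270E;'; // lower right pencil, hex form
var lines = 500;         // several hundred lines of content

var body = new Array(lines + 1).join(entity + '\n');
var page = '<!DOCTYPE html>\n<html>\n<head><meta charset="utf-8">' +
  '<title>Entity test</title></head>\n<body>\n' + body + '</body>\n</html>\n';

fs.writeFileSync('entity-test.html', page);

Upload the result to your web server, then continue with loading it in Opera and viewing the source.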
Opera 49.0 Illustration
See below how the Code Inspector shows the correct HTML Entity code. However, when you view Page Source for the same section, the code gets malformed.
So here's the story. At my company, you can access previous pay statements online via a payroll website. When you go to look at a statement, it opens in-browser via a PDF viewer. When working properly, it will usually ask if you want to blank out all the data or not (which... why would you want to? The point is to look at your statement). Now, this worked fine in Adobe Reader; you'd just click "yes, show data" and everything displays fine, can be printed, etc. But the company decided that instead of getting Acrobat for editing PDF files, the better (cheaper) option was a cheap/free alternative called "Nuance" something or other. Two users installed this program, and now the browsers open PDFs in-browser with Nuance instead of Adobe Reader. This is a problem, because Nuance doesn't show the option of hiding or showing data like Reader did; rather, it just chooses the "no" option, which results in a blank template PDF coming up.
Now, this whole problem could be solved if we could just get the browsers to use Reader to open PDF files in-browser... obviously it wouldn't be a problem if you could download the PDF, but the site doesn't seem to allow downloading the PDF files. We've tried just about everything we could think of, short of uninstalling Nuance altogether, to get IE or Chrome to open PDF files with Reader, but even after a full IE reset, it uses Nuance to open PDFs inside the browser. Changing the default program for PDFs has yielded no results; IE still uses Nuance in-browser.
Anyone have any thoughts on how to change IE or Chrome to default to using Reader to show pdf files instead of Nuance?
Thanks!
Just change the default program or application that opens PDF files; make it Adobe Reader. You'll have to configure it in Windows, under Default Programs in Control Panel, and you'll have to do this on each computer.
My company has a web application that outputs a PDF which we print on label paper (stickers with product data).
Chrome is the default browser around here. Unfortunately, when we try to print from Chrome the "Fit to Page" checkbox is automatically selected. This screws up the alignment and prints data in the wrong places. If we uncheck 'Fit to Page', it prints perfectly on all machines.
If I skip the Chrome Print dialog and use the system one, it works fine on a Mac, but poorly on Windows machines.
I would really like a way to disable the "Fit to Page" option.
What I've looked at:
Printing Avery 5160 labels with FPDF -- I added /ViewerPreferences << /PrintScaling /None >> to my PDF (see the snippet after this list), but the article Set PDF to print with no scaling says that scaling is controlled by the viewing application (Chrome in my case).
http://productforums.google.com/forum/#!topic/chrome/REy2n67B1fM --not helpful
https://code.google.com/p/chromium/issues/detail?id=158752 --not helpful
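For reference, here's roughly where that preference lives inside the generated PDF -- a hand-written illustration of the document catalog, not actual FPDF output (object numbers are arbitrary):

% PDF document catalog with the print-scaling viewer preference set.
1 0 obj
<< /Type /Catalog
   /Pages 2 0 R
   /ViewerPreferences << /PrintScaling /None >>
>>
endobj

As the linked article notes, this is only a hint; the viewer (here, Chrome's print dialog) is free to ignore it.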
So I'm afraid that I may need to find or make a Chrome extension to do this. Before I dig into that, does anyone know if it's even possible?
Other Facts:
The PDF is being generated by FPDF in PHP. All computers should be using the latest version of Chrome.
I hate to say this, but could you try opening the PDF in a different browser, or use a standalone PDF reader application?
Since the Chromium team has labelled this as a WontFix, Chrome simply might not suffice for your needs.
Overall mission:
For the purpose of printing, I want to download 330 images that are linked from a Pinterest board. Note, not the thumbnails that you see when visiting a board, but the larger images that they link to.
Context:
Go to a Pinterest.com board of choice and view the source. The relevant hi-res image links can be seen in the page source within the attribute data-closeup-url.
Example URL
http://pinterest.com/stonegarden/misc/
The relevant board is invite-only; I reckon that may be relevant with regard to scripts not being 'logged in', etc.
My question is: How can I download all these 330 hi-res images from a board with the least effort? I.e., with a script, with iMacros, or anything else. The end result would be a folder that contains all 330 images.
Edit, as requested:
What I've tried so far
I'm not using any specific programming language; my skills are limited in that field. Either way, I imagine the problem's gonna be the permissions.
Automating with the Firefox extension iMacros -- fails because I can't get it to do anything useful with the image URLs, among other reasons
The solution provided by Benno -- I can paste the relevant URLs, but it fails and says "No permissions for requested resource"
So, how does Pinterest differentiate between a user clicking a thumbnail to get the large image, and Safari trying to download the same resources via the Download window?
Open up your browser's web inspector and go to the Console.
Put in this code:
// Collect every pin's hi-res URL from its data-closeup-url attribute
// into one newline-separated string.
var s = '';
$('div[data-closeup-url]').each(function() {
  s += $(this).data('closeup-url') + "\n";
});
s; // the console echoes this string; copy the URLs from the output
That will give you the URLs of all the images; then you can just copy and paste them into a URL-capturing utility like JDownloader. That takes away the need to do anything related to logging in to their server. Or just write a script that fetches each URL, e.g. with file_get_contents (PHP) -- a rough Node.js version is sketched after the Safari tip below.
If you use Safari, open the download window and paste the list of URLs into it (Ctrl+V or Cmd+V) and it downloads all of them (tested in Safari 6 on Mac).
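And here is that rough Node.js sketch (nothing Pinterest-specific; urls.txt and the pins folder are assumed names). It reads one image URL per line, as produced by the console snippet above, and saves each file locally:

// download-pins.js -- fetch every URL listed in urls.txt into ./pins/.
var fs = require('fs');
var path = require('path');
var http = require('http');
var https = require('https');

var urls = fs.readFileSync('urls.txt', 'utf8').split('\n').filter(Boolean);
if (!fs.existsSync('pins')) fs.mkdirSync('pins');

urls.forEach(function(url) {
  // Name each file after the last segment of the URL path.
  var file = path.join('pins', path.basename(url.split('?')[0]));
  var mod = url.indexOf('https') === 0 ? https : http;
  mod.get(url, function(res) {
    res.pipe(fs.createWriteStream(file));
  });
});

Note this does no error handling or login; it only works for URLs that are fetchable without a session, which may be the catch with an invite-only board, as the question anticipates.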
I managed to collect the behavior of a complex web site into a webarchive. Thereafter I would like to turn that webarchive into an HTML set of nested directories. Yet, when I did it both with Waf and with a commercial product bought on the Apple store, what I get is just the nested directories with the HTML page at the bottom and no images, no CSS, and no working links.
If you are interested the webarchive document is at:
http://www.miafoto.it/it/GiroMilano.webarchive
while the weak product of the extraction is at:
http://www.miafoto.it/it/Giromilano/Pagine/default.aspx
and the empty directories above.
In addition to the different look, the webarchive displays the same behavior as the official web site -- when a listbox value is selected and the button then pushed -- while the extracted version produces a page with no contents, by loading itself rather than the official page.
As you may see, the webarchive is over 1 MB while the extraction is just a little over 1 KB.
What is wrong with it, and how may I perform such an apparently trivial business with usable results?
Thanks,
textutil -convert html example.webarchive
Be careful — the HTML and its files are created in the same folder as the webarchive!
Also, I had to open the .html with a text editor and replace the "file:///image.tiff" links (replace "file:///" with "") so they point to a relative path.
Also, not all browsers display .tiff images.
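If the page has many such links, a small Node.js sketch can do the rewrite (example.html is assumed to be the file textutil produced above; it simply strips the file:/// prefix as described):

// fix-links.js -- strip the "file:///" prefix from all links in the
// converted page so resources resolve relative to the HTML file.
var fs = require('fs');

var file = 'example.html'; // assumed output of the textutil command above
var html = fs.readFileSync(file, 'utf8');
fs.writeFileSync(file, html.split('file:///').join(''));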
I find that this WebArchiveExtractor.app works on my Mac (Mojave OS) –
https://robrohan.github.io/WebArchiveExtractor/
I managed the issue by finding all the parameters being submitted in the page and submitting them from my script as well, ignoring the webarchive.
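In case it helps others take the same route, here's a console sketch for enumerating what a page's forms would submit (nothing site-specific is assumed):

// Run in the DevTools console: list each form's target URL and the
// name/value pairs it would submit.
document.querySelectorAll('form').forEach(function(form) {
  var entries = [];
  new FormData(form).forEach(function(value, name) {
    entries.push(name + '=' + value);
  });
  console.log(form.action, entries.join('&'));
});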
To save HTML pages on a Mac, I use Chrome. Download and install it and save your page as HTML. Safari saves web pages in webarchive format, which for me is very hard to deal with.