PDF links getting stuck while loading in Chrome PDF Viewer

On a page of a website we're building, http://ovsd.nutrislice.com/wellness/, the PDF download links ("Download the Issue") get stuck while loading in Chrome's PDF Viewer, but work in all other browsers by triggering a download. Right click + "Save as" works in Chrome. I realize Chrome is the only browser with a built-in, default PDF viewer.
I figure we can instruct people to right click and then "Save as", but I wanted to see if anyone can spot a problem in either the HTML or the server response that would cause Chrome to fail like that.
It's not a traditional pass-through download of a file sitting on a server somewhere. We use Heroku, and I'm currently storing the PDFs in the DB (I realize the downsides of this, but it was a simpler system than managing off-site files on S3 for now). I'm generating the response dynamically via a Django view, so I wonder if there's something I'm missing in the response headers.
Thanks!

Looks like a bad content-type:
Content-Type:('application/pdf', None)
Check your code where you are assigning a content type to the response. It looks like you're sending a tuple instead of just application/pdf.
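One likely source of that tuple (an assumption, but the shape matches exactly) is Python's mimetypes.guess_type(), which returns a (type, encoding) pair rather than a bare string:

import mimetypes

# guess_type returns a (type, encoding) tuple, not just the type string;
# passing it straight into the response header produces exactly the
# ('application/pdf', None) value seen above.
print(mimetypes.guess_type("issue.pdf"))  # ('application/pdf', None)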

As dgel mentioned, your content type is incorrect:
$ curl -I http://ovsd.nutrislice.com/dbfiles/cms/resources/Vol5_Issue1_5_Dos_and_Donts_for_Supermarket_Survival.pdf
HTTP/1.1 200 OK
Access-Control-Allow-Methods: POST,GET,OPTIONS,PUT,DELETE
Access-Control-Allow-Origin: *
Cache-Control: max-age=90000
Content-Type: ('application/pdf', None) # <- Incorrect
Date: Fri, 09 Nov 2012 19:25:06 GMT
Expires: Fri, 09 Nov 2012 23:20:28 GMT
Last-Modified: Thu, 08 Nov 2012 22:20:28 GMT
Server: gunicorn/0.14.6
Connection: keep-alive
It would also be a good idea to add a Content-Length header.
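In a Django view, both fixes might look like this minimal sketch (Resource and its data field are hypothetical stand-ins for however the PDFs are stored in the DB):

from django.http import HttpResponse

def download_pdf(request, pk):
    # Hypothetical model/field holding the PDF bytes in the database.
    resource = Resource.objects.get(pk=pk)
    pdf_bytes = bytes(resource.data)

    # Pass a plain string as the content type, not a tuple.
    response = HttpResponse(pdf_bytes, content_type='application/pdf')
    # Suggest a download and tell the browser how much data to expect.
    response['Content-Disposition'] = 'attachment; filename="issue.pdf"'
    response['Content-Length'] = str(len(pdf_bytes))
    return response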


How to validate if Chrome is preloading resources hinted in a 103 response?

I am able to validate that we are returning the expected HTTP 103 response:
curl -D - https://local.contra.dev:8080/log-in
HTTP/1.1 103 Early Hints
Link: <https://builds.contra.com>; rel="preconnect"; crossorigin
Link: <https://fonts.googleapis.com/css2?family=Inter:wght@400;500;600;700;900&display=swap>; rel="preload"; as="font"
Link: </static/assets/entry-client-routing.de82cadc.js>; rel="modulepreload"; as="script"; crossorigin
HTTP/1.1 200 OK
cache-control: no-store
referrer-policy: strict-origin-when-cross-origin
x-frame-options: sameorigin
content-type: text/html
content-length: 5430
Date: Tue, 26 Jul 2022 19:19:28 GMT
Connection: keep-alive
Keep-Alive: timeout=72
However, how do I confirm that Google Chrome (currently the only browser that supports 103 Early Hints) is taking advantage of these hints?
I don't see anything in Chrome's Network tab that would indicate that resources are loaded early.
One way to check would be to use the Performance API.
performance.getEntriesByName("https://path/to/your/resource")
You should see a PerformanceResourceTiming object with initiatorType: "early-hints" corresponding to your early-hinted resource, assuming your headers are working. See here: https://chromium.googlesource.com/chromium/src/+/master/docs/early-hints.md#checking-early-hints-preload-is-working.
I've also used network waterfalls in Chrome's devtools, which should show the resource being loaded from disk cache with a low load time. Hopefully better devtools support for tracing early-hints requests will arrive soon.
Note that early hints in Chrome don't work over HTTP/1.1 (the curl output above shows an HTTP/1.1 response, so make sure Chrome actually reaches your server over HTTP/2 or HTTP/3). https://chromium.googlesource.com/chromium/src/+/master/docs/early-hints.md#what_s-not-supported

Best and most efficient way to parse both XML and HTML in C

Hi folks!
I'm looking for the best and most efficient way to parse server responses that contain both HTML and XML. The responses come from servers I need to poll every 5 minutes (there are about five hundred of them on the list currently, but that will double very soon). Each response is stored in a buffer as plain text (read from a socket). I need to parse the HTML part and, if that succeeds (the mandatory things are found), then parse the XML part and extract the statistics to store in the DB. The responses look like this:
HTTP/1.0 200 OK
Connection: close
Content-Length: 682
Content-Type: text/xml; charset=utf-8
Date: Sun, 09 Mar 2014 15:44:52 GMT
Last-Modified: Sun, 09 Mar 2014 15:44:52 GMT
Server: DrWebAV-DeskServer/REL-610-AV-6.02.0.201311040 Linux/x86_64 Lua/5.1.4 OpenSSL/1.0.0e
<?xml version="1.0" encoding="utf-8"?><avdesk-xml-api API='2.1.0' API_BUILD='20130709' branch='REL-610-AV' oper='get-server-info' rc='true' timestamp='20140309154452987' version='6.02.0.201311040'><server><id>00c1d140-d21d-b211-a828-b62919c4250d</id><platform>Linux 2.6.39-gentoo-r3 x86_64 (4 SMP Mon Oct 24 11:04:40 YEKT 2011)</platform><version>6.02.0.201311040</version><statistics from='20140301000000000' till='20140309235959999'><noviruses/><stations total='101'><online>5</online><deinstalled>21</deinstalled><blocked>0</blocked><expired>81</expired><offline>96</offline><activated>74</activated><unactivated>27</unactivated></stations></statistics></server></avdesk-xml-api>
And they could be something like this:
HTTP/1.0 401 Authorization Required
Cache-Control: post-check=0, pre-check=0
Connection: close
Content-Length: 421
Content-Type: text/html; charset=utf-8
Date: Sun, 09 Mar 2014 15:44:22 GMT
Expires: Date: Sat, 27 Nov 2004 10:18:15 GMT
Last-Modified: Date: Sat, 27 Nov 2004 10:18:15 GMT
Pragma: no-cahe
Server: DrWebAV-DeskServer/REL-610-AV-6.02.0.201311040 Linux/x86_64 Lua/5.1.4 OpenSSL/1.0.1
WWW-Authenticate: Basic realm="Dr.Web XML API area"
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN"><HTML><TITLE>Unauthorized</TITLE><BODY><STRONG>Unauthorized</STRONG><P>The error "401 Unauthorized" occured while processing request you had sent.<P><BR><BR><I>Access denied or your browser does not support HTTP authentication!</I><BR><P><BR><BR><HR><P>Dr.Web ® AV-Desk Server REL-610-AV 6.02.0.201311040 Linux/x86_64 Lua/5.1.4 OpenSSL/1.0.1</BODY></HTML>
Concerning the HTML part, I'm basically interested in the HTTP/1.0 STRING status line and the Server: STRING header, and then I need per-tag XML parsing if authorization succeeded.
I have found that libxml2 is suitable for parsing both HTML and XML, but I've been unable to find any real examples of how to use it, just a broad interface description. So, help is needed.
Code examples for libxml2 are here
The mailing list is friendly, and the code is mature and good quality.
However, nothing in your example suggests you need to parse HTML. You need to parse (I think) HTTP to process the headers (and detect the 401 error from the HTTP response), then parse the XML content. Parsing HTTP headers to the level you require is trivial: just strtok the response, separating on line breaks, and the first line has the answer you need. The body of the response starts after a double line break (I think your second example has a paste error). This reduces your task to simply processing HTTP headers and XML (no HTML parsing needed).
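To make the split concrete, here is the same logic as a short Python sketch (the C version amounts to a couple of strtok calls with the line break as the delimiter); buf is the raw response text from the socket, and parse_xml stands in for whatever libxml2 entry point you end up using:

def split_http_response(raw):
    # The headers end at the first blank line (double CRLF);
    # everything after it is the body.
    head, _, body = raw.partition("\r\n\r\n")
    lines = head.split("\r\n")
    status_line = lines[0]  # e.g. "HTTP/1.0 200 OK"
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip()] = value.strip()
    return status_line, headers, body

status_line, headers, body = split_http_response(buf)
if status_line.split()[1] == "200":
    parse_xml(body)  # hand the XML payload to the XML parser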

Override the "cache-control" values in an HTTP response

I have a web page that returns the following header when I access material:
HTTP/1.1 200 OK
Date: Sat, 29 Jun 2013 15:57:25 GMT
Server: Apache
Content-Length: 2247515
Cache-Control: no-cache, no-store, must-revalidate, max-age=-1
Pragma: no-cache, no-store
Expires: -1
Connection: close
Using a Chrome extension, I want to modify this response header so that the material is actually cached instead of wasting bandwidth.
I have the following sample code:
chrome.webRequest.onHeadersReceived.addListener(function(details)
{
    // Delete the required elements
    removeHeader(details.responseHeaders, 'pragma');
    removeHeader(details.responseHeaders, 'expires');
    // Modify cache-control
    updateHeader(details.responseHeaders, 'cache-control', 'max-age=3600');
    console.log(details.url);
    console.log(details.responseHeaders);
    return {responseHeaders: details.responseHeaders};
},
{urls: ["<all_urls>"]}, ['blocking', 'responseHeaders']
);
Which correctly modifies the header to something like this (based on the console.log() output):
HTTP/1.1 200 OK
Date: Sat, 29 Jun 2013 15:57:25 GMT
Server: Apache
Content-Length: 2247515
Cache-Control: max-age=3600
Connection: close
But based on everything I have tried to check this, I cannot see any evidence whatsoever that it has actually happened:
The cache does not contain an entry for this file.
The Network tab in the Developer Console shows no change at all to the HTTP response (I have tried even trivial modifications just for the sake of ensuring that it's not an error, but still no change).
The only real hints I can find are this question, which suggests that my approach should still work, and this paragraph in the webRequest API documentation, which suggests that it won't (but doesn't explain why I can't see any changes whatsoever):
Note that the web request API presents an abstraction of the network stack to the extension. Internally, one URL request can be split into several HTTP requests (for example to fetch individual byte ranges from a large file) or can be handled by the network stack without communicating with the network. For this reason, the API does not provide the final HTTP headers that are sent to the network. For example, all headers that are related to caching are invisible to the extension.
Nothing is working whatsoever (I can't modify the HTTP response header at all), so I think that's my first concern.
Any suggestions as to where I could be going wrong, or how to go about finding what is going wrong here?
If it's not possible, are there any other ways to achieve what I am trying to achieve?
I have recently spent some hours trying to get a file cached, and discovered that the chrome.webRequest and chrome.declarativeWebRequest APIs cannot force resources to be cached. There is no way around it.
The Cache-Control (and other) response headers can be changed, but the change will only be visible in the getResponseHeader method, not in the caching behaviour.

Wikimedia login doesn't give back token

I'm starting to experiment with the Wikimedia API, but I somehow can't get the login request working with an HTTP client (RESTClient for Firefox, among others). This should be fairly simple, but it seems to fail, or I have overlooked something evident.
I am using the instructions from this site.
This is what I insert in RESTClient:
I don't get the MediaWiki API result back, but the help page instead (see below).
What am I doing wrong here? Thanks for any input.
Status Code: 200 OK
Cache-Control: private
Connection: keep-alive
Content-Encoding: gzip
Content-Length: 38052
Content-Type: text/html; charset=utf-8
Date: Mon, 09 Jul 2012 11:50:51 GMT
MediaWiki-API-Error: help
Server: Apache
Vary: Accept-Encoding
X-Cache: MISS from sq33.wikimedia.org, MISS from amssq35.esams.wikimedia.org, MISS from amssq39.esams.wikimedia.org
X-Cache-Lookup: MISS from sq33.wikimedia.org:3128, MISS from amssq35.esams.wikimedia.org:3128, MISS from amssq39.esams.wikimedia.org:80
X-Content-Type-Options: nosniff
<!DOCTYPE HTML>
<html>
<head>
<title>MediaWiki API</title>
</head>
<body>
<pre>
<span style="color:blue;"><?xml version="1.0"?></span>
<span style="color:blue;"><api servedby="mw67"></span>
<span style="color:blue;"><error code="help" info=""
xml:space="preserve"></span>
There are two problems with your request:
You're using the wrong URL. The correct domain is en.wikipedia.org, not www.wikipedia.org.
It seems RESTClient is using the Content-Type of text/plain by default, but the API expects application/x-www-form-urlencoded.
If you correct those two problems, you will get the correct response.
You also might want to indicate in which format you want the response by adding format=xml or format=json to the request. The default is HTML-formatted XML, which is useful for viewing in a browser, but not for consumption by your application.
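Putting those pieces together, a first login request might look like this Python sketch (the requests library form-encodes a data dict and sets the application/x-www-form-urlencoded Content-Type on its own; the credentials are placeholders):

import requests

# First step of the login handshake against the correct domain;
# the response should contain the token for the confirmation step.
response = requests.post(
    "https://en.wikipedia.org/w/api.php",
    data={
        "action": "login",
        "lgname": "MyUserName",      # placeholder username
        "lgpassword": "MyPassword",  # placeholder password
        "format": "json",            # JSON instead of the HTML help page
    },
)
print(response.json())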

Issue with downloading PDF from S3 on Chrome

I'm facing an issue downloading PDF files from Amazon S3 using Chrome.
When I click a link, my controller redirects the request to the file's URL on S3.
It works perfectly with Firefox, but nothing happens with Chrome.
Yet, performing a right click -> "Save link as" does download the file.
And even a copy-paste of the S3 URL into Chrome leads to a blank screen.
Here is some information returned by curl:
Date: Wed, 01 Feb 2012 15:34:09 GMT
Last-Modified: Wed, 01 Feb 2012 04:45:24 GMT
Accept-Ranges: bytes
Content-Type: application/x-pdf
Content-Length: 50024
Server: AmazonS3
My guess is that it's related to an issue with the content type, but nothing I tried has worked.
The canonical Internet media type for a PDF document is actually application/pdf as defined in The application/pdf Media Type (RFC 3778) - please note that application/x-pdf, while commonly encountered and listed as a media type in Portable Document Format as well, is notably absent from the official Application Media Types listed by the Internet Assigned Numbers Authority (IANA).
I'm not aware of why or when application/x-pdf came to life, but apparently the Chrome PDF plugin does not open application/x-pdf documents as of today.
Consequently you should be able to trigger a different behavior in Chrome by changing the media type of the stored objects accordingly.
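For example, with today's boto3 (bucket and key are placeholders), an in-place copy can rewrite the stored media type:

import boto3

s3 = boto3.client("s3")

# Copy the object onto itself, replacing its metadata so the stored
# Content-Type becomes application/pdf.
s3.copy_object(
    Bucket="my-bucket",
    Key="docs/report.pdf",
    CopySource={"Bucket": "my-bucket", "Key": "docs/report.pdf"},
    ContentType="application/pdf",
    MetadataDirective="REPLACE",
)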
Alternative (for authenticated requests)
Another approach would be to force the PDF to download instead of letting Chrome attempt to open it, which can be done by triggering the Content-Disposition: attachment header with your GET request. Please see the S3 documentation for GET Object on how to achieve this via the response-content-disposition request parameter, specifically response-content-disposition=attachment, as demonstrated there in the section Sample Request with Parameters Altering Response Header Values.
This is only available for authenticated requests, though; see the section Request Parameters:
Note: You must sign the request, either using an Authorization header or a pre-signed URL, when using these parameters. They cannot be used with an unsigned (anonymous) request.
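As a sketch of that approach with boto3 (bucket, key, and filename are placeholders), a signed URL carrying the parameter could be generated like this:

import boto3

s3 = boto3.client("s3")

# Ask S3 to serve the object with Content-Disposition: attachment so
# Chrome downloads it instead of opening the built-in PDF viewer.
url = s3.generate_presigned_url(
    "get_object",
    Params={
        "Bucket": "my-bucket",
        "Key": "docs/report.pdf",
        "ResponseContentDisposition": 'attachment; filename="report.pdf"',
    },
    ExpiresIn=3600,  # the link stays valid for one hour
)
print(url)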
There is an HTML-based solution to this. Since Chrome is up to date with HTML5, we can use the shiny new download attribute! With a placeholder URL:
Broken: <a href="https://s3.amazonaws.com/my-bucket/report.pdf">Download</a>
Works: <a href="https://s3.amazonaws.com/my-bucket/report.pdf" download>Download</a>