I recently had to debug an (old) web app that checks a session cookie and was failing under certain circumstances. It turns out that the library code that parses the HTTP Cookie header did not like the fact that one of the cookies in the header had a JSON object as its value:
Cookie: lt-session-data={"id":"0.198042fc1767138e549","lastUpdatedDate":"2020-12-17T10:22:25Z"}; sessionid=a7f2f57d0b9a3247a350d9157fcbf9c2
(The lt-session-data cookie comes from LucidChart and ends up tagged with our domain because the user visited one of our Confluence pages with an embedded LucidChart diagram.)
It seems clear that the library I am using applies the rules of RFC 6265 strictly:
cookie-value = *cookie-octet / ( DQUOTE *cookie-octet DQUOTE )
cookie-octet = %x21 / %x23-2B / %x2D-3A / %x3C-5B / %x5D-7E
; US-ASCII characters excluding CTLs,
; whitespace DQUOTE, comma, semicolon,
; and backslash
According to this grammar, the JSON cookie value breaks at least two of these rules (it contains DQUOTEs and commas), and possibly more, depending on the data in the JSON object (e.g. strings may contain arbitrary characters).
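For reference, both browsers appear to behave like a lenient parser would: split the header on "; ", take everything after the first "=" as the value, and do no validation against the cookie-octet grammar at all. A minimal sketch of that behaviour in JavaScript (illustrative only, not the actual library code):

// Lenient Cookie header parsing: split on "; ", then on the first "=".
// Nothing is checked against the cookie-octet grammar, so the JSON value
// (with its DQUOTEs and commas) passes straight through.
function parseCookieHeader(header) {
  const cookies = {};
  for (const pair of header.split('; ')) {
    const eq = pair.indexOf('=');
    if (eq === -1) continue;                 // ignore malformed pairs
    cookies[pair.slice(0, eq)] = pair.slice(eq + 1);
  }
  return cookies;
}

const header = 'lt-session-data={"id":"0.198042fc1767138e549","lastUpdatedDate":"2020-12-17T10:22:25Z"}; sessionid=a7f2f57d0b9a3247a350d9157fcbf9c2';
console.log(parseCookieHeader(header).sessionid); // a7f2f57d0b9a3247a350d9157fcbf9c2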
My question is: is this common practice? Both Firefox and Chrome seem to accept this cookie without any issue, even though it goes against the RFC. I tried Googling for standards, but only RFC 6265 turns up. It seems people just started putting JSON values in cookies.
If this is now common practice, and not a misguided effort by people who didn't bother to read the relevant standard docs, is there an updated standard?
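For what it's worth, the only way I can see to put JSON in a cookie while staying within RFC 6265 is to percent-encode it first, along these lines (the cookie name here is just an example):

// RFC 6265-friendly JSON cookie: percent-encoding removes the DQUOTEs,
// commas, spaces and any other characters outside the cookie-octet set.
const data = { id: "0.198042fc1767138e549", lastUpdatedDate: "2020-12-17T10:22:25Z" };
document.cookie = "my-data=" + encodeURIComponent(JSON.stringify(data));

// Reading it back:
const raw = document.cookie.split("; ").find(c => c.startsWith("my-data="));
const parsed = raw && JSON.parse(decodeURIComponent(raw.slice("my-data=".length)));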
This is a style question about REST API design.
I have an API which returns a particular resource, and can return either the fields of the resource as a JSON object, or a PDF representation of the resource. The normal REST way of doing that is to use the same URL, but return either the JSON object or the PDF data depending on the "Accept" header of the request.
This has been fine when calling the API from a client application. But now I am writing a web application, and I want to display the PDF. I can fetch the PDF data with XMLHttpRequest, but there isn't an easy way to display it. (I recall some hack involving passing the whole base64-encoded content in the URL, but that is both flaky and disgusting).
The easy way to display a PDF in a web application is window.open(), but I can't pass an Accept header to that (there are a few questions here from people asking how to do that).
This seems like a potentially common situation. What's the best workaround? Stick ?pdf or /pdf or ?accept=pdf onto the URL? Is there a de facto standard? Or is there a solution I haven't thought of (maybe treating "application/pdf" as the default request MIME type, and only returning the JSON object if the Accept header is "application/json")?
Actually, the more I think about it, the more that last idea (return PDF unless application/json is explicitly requested in the Accept header) seems like the right answer. A web browser can display PDFs (at least usually, these days) but not JSON objects, so a request made natively by a browser should always get the PDF, and REST clients that want JSON data would normally ask for application/json explicitly, so the API server should not return JSON by default when it has a more browser-friendly representation.
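A rough sketch of what I have in mind on the server, written here as a hypothetical Node/Express-style handler (the route and helper names are made up):

// Serve the PDF by default; return JSON only when the client explicitly
// lists application/json in its Accept header.
const express = require('express');                      // hypothetical Express setup
const app = express();

app.get('/resources/:id', async (req, res) => {
  const resource = await loadResource(req.params.id);    // hypothetical data access

  const accept = req.headers.accept || '';
  if (accept.includes('application/json')) {
    res.json(resource);                                   // explicit API clients
  } else {
    res.type('application/pdf');
    res.send(await renderResourcePdf(resource));          // hypothetical PDF renderer
  }
});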
Since it took actually writing this whole problem out in the question box to make me think of the answer, I posted the question anyway.
On a project we spent considerable effort working around basic authentication (because webdriver tests depended on it, and webdriver has no API for basic authentication), and I remember basic authentication in the URL clearly not working, i.e. we could not load http://username:password@url
Just google "basic authentication in url" and you will find tons of people complaining: https://medium.com/@lmakarov/say-goodbye-to-urls-with-embedded-credentials-b051f6c7b6a3
https://www.ietf.org/rfc/rfc3986.txt
Use of the format "user:password" in the userinfo field is deprecated.
Today I described this quagmire to a friend, and he said they are using http://username:password@url-style basic authentication in their webdriver tests without any problem.
I went, in my current Chrome v71, to a demo page and to my surprise found that it works just fine: https://guest:guest@jigsaw.w3.org/HTTP/Basic/
How is this possible?? Are we living in parallel dimensions at the same time? Which one is true: is basic authentication using credentials in the URL supported or deprecated? (Or was this maybe added back to Chrome due to complaints of which I can't find any reference?)
Essentially, deprecated does not mean unsupported.
Which one is true: is basic authentication using credentials in the URL supported or deprecated?
The answer is yes, both are true. It is deprecated, but for the most part (anecdotally) still supported.
From the Medium article:
While you would not usually have those hardcoded in a page, when you open a URL like https://user:pass@host and that page makes subsequent requests to resources linked via relative paths, that's when those resources will also get the user:pass@ part applied to them and banned by Chrome right there.
This means URLs like <img src=./images/foo.png> but not URLs like <a href=/foobar>zz</a>.
The RFC states:
Use of the format "user:password" in the userinfo field is
deprecated. Applications should not render as clear text any data
after the first colon (":") character found within a userinfo
subcomponent unless the data after the colon is the empty string
(indicating no password). Applications may choose to ignore or
reject such data when it is received as part of a reference and
should reject the storage of such data in unencrypted form. The
passing of authentication information in clear text has proven to be
a security risk in almost every case where it has been used.
Applications that render a URI for the sake of user feedback, such as
in graphical hypertext browsing, should render userinfo in a way that
is distinguished from the rest of a URI, when feasible. Such
rendering will assist the user in cases where the userinfo has been
misleadingly crafted to look like a trusted domain name
(Section 7.6).
So the use of user:pass@url is discouraged, backed up by specific recommendations and reasons for disabling it. The RFC also states that applications may choose to ignore or reject the userinfo data, but it does not say that they must reject it.
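If you want to avoid the deprecated user:pass@ form entirely, the same credentials can be sent explicitly in an Authorization header instead; a minimal sketch (the URL is a placeholder, and the credentials are the demo ones from the question):

// Basic authentication without embedding credentials in the URL:
// the "user:pass" pair is base64-encoded into an Authorization header.
const user = 'guest';
const pass = 'guest';

fetch('/protected/resource', {                  // placeholder same-origin URL
  headers: { 'Authorization': 'Basic ' + btoa(user + ':' + pass) }
}).then(res => console.log(res.status));        // 200 when the credentials are accepted

This covers programmatic requests; for a top-level browser navigation (as in the webdriver case from the question) there is no way to set that header from the address bar, which is presumably why the user:pass@ form keeps being used.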
Is it necessary to percent-encode a URI before using it in the browser? I.e., when we write a URI in a browser, should it already be percent-encoded, or is it the browser's responsibility to encode the URI before sending the request to the server?
You'll find that most modern browsers will accept a non-encoded URL and they will generally be able to encode reserved characters themselves.
However, it is bad practice to rely on this, because you can end up with unpredictable results. For instance, if you were sending form data to a server using a GET request and someone had typed in a # symbol, the browser will interpret the request differently depending on whether that character was encoded (an unencoded # starts the fragment, which is never sent to the server).
In short, it's always best to encode data manually to get predictable results if you're expecting reserved characters in a request. Fortunately most programming languages used on the web have built in functions for this.
Just to add, you don't need to encode the whole URL - it's usually the data you're sending in a GET request which gets encoded. For example:
http://www.foo.com?data=This%20is%20my%20encoded%20string%20%23
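In JavaScript, for instance, that would look something like this (the URL is the same example as above):

// Encode only the data, not the whole URL, so a reserved character like '#'
// in the user's input cannot change the structure of the URL itself.
const input = 'This is my encoded string #';
const url = 'http://www.foo.com?data=' + encodeURIComponent(input);
// -> http://www.foo.com?data=This%20is%20my%20encoded%20string%20%23

// Note: encodeURI() would NOT have escaped the '#', because it leaves
// reserved characters (#, ?, =, &, /) alone; encodeURIComponent() escapes them.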
As with any user-supplied data, the URLs will need to be escaped and filtered appropriately to avoid all sorts of exploits. I want to be able to:
Put user-supplied URLs in href attributes. (Bonus points if I don't get screwed if I forget to write the quotes.)
...
Forbid malicious URLs such as javascript: stuff or links to evil domain names.
Allow some leeway for the users. I don't want to raise an error just because they forgot to add an http:// or something like that.
Unfortunately, I can't find any "canonical" solution to this sort of problem. The only thing I could find as inspiration is the encodeURI function from JavaScript, but that doesn't help with my second point, since it just does simple URL parameter encoding and leaves special characters such as : and / alone.
OWASP provides a list of regular expressions for validating user input, one of which is used for validating URLs. This is as close as you're going to get to a language-neutral, canonical solution.
More likely you'll rely on the URL parsing library of the programming language in use. Or, use a URL parsing regex.
The workflow would be something like this (a sketch follows the list):
Verify the supplied string is a well-formed URL.
Provide a default protocol such as http: when no protocol is specified.
Maintain a whitelist of acceptable protocols (http:, https:, ftp:, mailto:, etc.)
The whitelist will be application-specific. For an address-book app the mailto: protocol would be indispensable. It's hard to imagine a use case for the javascript: and data: protocols.
Enforce a maximum URL length - this keeps the URL working across browsers and prevents attackers from polluting the page with megabyte-length strings. With any luck your URL-parsing library will do this for you.
Encode the URL string for the usage context (escaped for HTML output, escaped for use in an SQL query, etc.).
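Here is what that workflow could look like in JavaScript, using the built-in WHATWG URL parser (available in modern browsers and Node). The whitelist and the length limit are example values, not requirements:

// Sketch of the workflow above. Returns a normalised URL string, or null
// if the input should be rejected.
const ALLOWED_PROTOCOLS = ['http:', 'https:', 'ftp:', 'mailto:'];
const MAX_URL_LENGTH = 2048;

function sanitizeUserUrl(input) {
  let raw = String(input).trim();
  if (raw.length === 0 || raw.length > MAX_URL_LENGTH) return null;   // length check

  // Default protocol: crude heuristic - prepend http:// when there is no scheme.
  if (!/^[a-z][a-z0-9+.-]*:/i.test(raw)) raw = 'http://' + raw;

  // Well-formedness: let the parser decide.
  let url;
  try {
    url = new URL(raw);
  } catch (e) {
    return null;
  }

  // Protocol whitelist: rejects javascript:, data:, etc.
  if (!ALLOWED_PROTOCOLS.includes(url.protocol)) return null;

  // url.href is normalised and percent-encoded, but it still needs to be
  // escaped for the output context (HTML attribute, SQL, ...) before use.
  return url.href;
}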
Forbid malicious URLs such as javascript: stuff or links to evil domain names.
You can utilize the Google Safe Browsing API to check a domain for spyware, spam or other "evilness".
For the first point, regular attribute encoding works just fine (escape characters into HTML entities; escaping quotes, the ampersand, and angle brackets is enough if attributes are guaranteed to be quoted; escaping all other non-alphanumeric characters as well will keep the attribute safe if it's accidentally left unquoted).
The second point is vague and depends on what you want to do. Just remember to use a whitelist approach instead of a blacklist one: it's possible to use HTML entity encoding and other tricks to get around most simple blacklists.
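A minimal sketch of the aggressive attribute encoding described for the first point (every non-alphanumeric character becomes a numeric character reference); the function name is just for illustration:

// Escape a value for use in an HTML attribute. Escaping every
// non-alphanumeric character keeps the attribute safe even if it ends up
// accidentally unquoted in the markup.
function escapeHtmlAttribute(value) {
  return String(value).replace(/[^a-zA-Z0-9]/gu, ch => '&#' + ch.codePointAt(0) + ';');
}

const link = '<a href=' + escapeHtmlAttribute('https://example.com/?q="hi there"') + '>link</a>';
// The quotes, spaces, slashes, etc. are all entity-encoded, so they cannot
// break out of the attribute or start a new one.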
I was wondering if somebody could shed some light on this browser behaviour:
I have a form with a textarea that is submitted to the server either via XHR (using jQuery; I've also tried with plain XMLHttpRequest just to rule jQuery out, and the result is the same) or the "old-fashioned" way via form submit. In both cases method="POST" is used.
Both ways submit to the same script on the server.
Now the funny part: if you submit via XHR, newline characters are transferred as "%0A" (i.e. \n, if I am not mistaken), and if you submit the regular way they are transferred as "%0D%0A" (i.e. \r\n).
This, of course, causes some problems on the server side, but that is not the question here.
I'd just like to know why there is this difference. Shouldn't newlines be transferred the same way no matter which method of submitting you use? What other differences are there (if any)?
XMLHttpRequest will, when sending XML, strip the CR characters from the stream. This is in accordance with the XML specification, which indicates that CRLF should be normalised to a simple LF.
Hence if you package your content as XML and send it via XHR you will lose the CRs.
Section 3.7.1 of RFC 2616 (HTTP/1.1) allows any of \r\n, \r, or \n to represent a line break:
HTTP relaxes this requirement and allows the
transport of text media with plain CR or LF alone representing a line
break when it is done consistently for an entire entity-body. HTTP
applications MUST accept CRLF, bare CR, and bare LF as being
representative of a line break in text media received via HTTP.
But this does not apply to control structures:
This flexibility regarding
line breaks applies only to text media in the entity-body; a bare CR
or LF MUST NOT be substituted for CRLF within any of the HTTP control
structures (such as header fields and multipart boundaries).
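Since the spec requires the receiver to accept all three forms anyway, the practical fix is usually to normalise line endings once, on the receiving side (or wherever the text is processed); a small sketch:

// Normalise all line-break styles (\r\n, \r, \n) to \n, so it no longer
// matters whether the textarea content arrived via XHR or a regular form post.
function normalizeNewlines(text) {
  return text.replace(/\r\n|\r/g, '\n');
}

normalizeNewlines('line1\r\nline2');  // "line1\nline2"  (regular form submit)
normalizeNewlines('line1\nline2');    // "line1\nline2"  (XHR)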