New line characters get submitted differently - html

I was wondering if somebody could shed some light on this browser behaviour:
I have a form with a textarea that is submitted to the server either via XHR (using jQuery; I've also tried plain XMLHttpRequest just to rule jQuery out, and the result is the same) or the "old fashioned" way via form submit. In both cases method="POST" is used.
Both ways submit to the same script on the server.
Now the funny part: if you submit via XHR new line characters are transferred as "%0A" (or \n if I am not mistaken), and if you submit the regular way they are transferred as "%0D%0A" (or \r\n).
This, of course, causes some problems on the server side, but that is not the question here.
I'd just like to know why this difference? Shouldn't new lines be transferred the same no matter what method of submitting you use? What other differences are there (if any)?

When sending XML, XMLHttpRequest strips the CR characters from the stream. This is in accordance with the XML specification, which requires that CRLF be normalized to a plain LF.
Hence if you package your content as XML and send it via XHR you will lose the CRs.
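You can observe this normalization with any conforming XML parser; a quick illustration in Python (the element name is arbitrary):

```python
import xml.etree.ElementTree as ET

# Per section 2.11 of the XML 1.0 spec, a conforming parser normalizes
# CRLF (and bare CR) in parsed text to a single LF before parsing.
doc = ET.fromstring("<msg>line one\r\nline two</msg>")
print(repr(doc.text))  # -> 'line one\nline two'
```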

Section 3.7.1 of RFC 2616 (HTTP/1.1) allows any of \r\n, \r, or \n to represent a line break in text media:
HTTP relaxes this requirement and allows the
transport of text media with plain CR or LF alone representing a line
break when it is done consistently for an entire entity-body. HTTP
applications MUST accept CRLF, bare CR, and bare LF as being
representative of a line break in text media received via HTTP.
But this does not apply to control structures:
This flexibility regarding
line breaks applies only to text media in the entity-body; a bare CR
or LF MUST NOT be substituted for CRLF within any of the HTTP control
structures (such as header fields and multipart boundaries).
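Given that either form may arrive, a common defensive move on the server is to normalize all three variants to a single canonical form before processing; a minimal sketch in Python (the function name is my own):

```python
import re

def normalize_newlines(text: str) -> str:
    """Collapse CRLF, bare CR, and bare LF to a single LF."""
    # \r\n? matches either a CRLF pair or a lone CR
    return re.sub(r"\r\n?", "\n", text)

print(repr(normalize_newlines("a\r\nb\rc\nd")))  # -> 'a\nb\nc\nd'
```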

Related

How can we send dot in smtp protocol

When using the SMTP protocol, if the message text itself contains a line consisting of a single dot, i.e.
\n
.
\n
what should we do so that it is not confused with the end-of-mail indicator of the SMTP protocol?
The RFC has a separate section about the escape mechanism for sending a lone dot on a line; it's colloquially called "dot stuffing".
In brief, any leading dot on a line needs to be escaped with another dot. So, to send a line containing one literal dot, you'd send
..
and to send a line containing two literal dots, you'd send
...
etc, with the usual <CRLF> line terminators required by the protocol (your example seems to incorrectly assume just <LF> which is often tolerated in practice, but technically not correct). The receiving server correspondingly strips any initial dot from each line of the data.
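The stuffing and unstuffing steps are symmetric; here's a rough sketch in Python (not a full SMTP implementation, just the transform on a message body):

```python
def dot_stuff(body: str) -> str:
    """Sender side: prefix an extra dot to every line that starts with a dot
    (RFC 5321, section 4.5.2)."""
    return "\r\n".join(
        "." + line if line.startswith(".") else line
        for line in body.split("\r\n")
    )

def dot_unstuff(data: str) -> str:
    """Receiver side: strip the leading dot the sender added."""
    return "\r\n".join(
        line[1:] if line.startswith(".") else line
        for line in data.split("\r\n")
    )

print(repr(dot_stuff("hello\r\n.\r\nworld")))  # -> 'hello\r\n..\r\nworld'
```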
The RFC refers to this mechanism as "transparency"; see RFC 5321, section 4.5.2.
Here's a helpful link to the RFC (top of page 37 if the link isn't working). Here's the important takeaway:
The custom of accepting lines ending only in <LF>, as a concession to
non-conforming behavior on the part of some UNIX systems, has proven
to cause more interoperability problems than it solves, and SMTP
server systems MUST NOT do this, even in the name of improved
robustness. In particular, the sequence "<LF>.<LF>" (bare line
feeds, without carriage returns) MUST NOT be treated as equivalent to
<CRLF>.<CRLF> as the end of mail data indication.
Basically SMTP servers are not actually listening for \n.\n as the terminator. Instead they're expecting \r\n.\r\n.
The \n character is a line feed (in the RFC it's called LF), \r is a carriage return (CR), and the two-character sequence \r\n is what the RFC calls CRLF.
So, it's an important distinction that SMTP servers actually terminate data with <CRLF>.<CRLF>, not <LF>.<LF>; otherwise SMTP servers would run into exactly the issue you bring up here.
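In other words, a server reading the DATA stream scans only for the five-byte <CRLF>.<CRLF> sequence; a simplified illustration (a real server would also handle buffering across reads and the edge case at the very start of the stream):

```python
TERMINATOR = b"\r\n.\r\n"

def find_end_of_data(buf: bytes) -> int:
    """Return the index where the message data ends, or -1 if the
    <CRLF>.<CRLF> terminator has not arrived yet."""
    return buf.find(TERMINATOR)

# A bare-LF ".\n" line does NOT end the message:
print(find_end_of_data(b"body\n.\nmore"))         # -> -1
print(find_end_of_data(b"body\r\n.\r\nignored"))  # -> 4
```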

Are JSON objects allowed in HTTP Cookies (they seem to violate RFC6265)?

I recently had to debug an (old) web app that had to check a session cookie and was failing under certain circumstances. It turned out that the library code that had to parse the HTTP Cookie header did not like the fact that one of the cookies in the header had a value that was a JSON object:
Cookie: lt-session-data={"id":"0.198042fc1767138e549","lastUpdatedDate":"2020-12-17T10:22:25Z"}; sessionid=a7f2f57d0b9a3247a350d9157fcbf9c2
(The lt-session-data cookie comes from LucidChart and ends up tagged with our domain because of the user visiting one of our Confluence pages with an embedded LucidChart diagram.)
It seems clear that the library I am using is applying the rules of RFC6265 in a strict manner:
cookie-value = *cookie-octet / ( DQUOTE *cookie-octet DQUOTE )
cookie-octet = %x21 / %x23-2B / %x2D-3A / %x3C-5B / %x5D-7E
; US-ASCII characters excluding CTLs,
; whitespace DQUOTE, comma, semicolon,
; and backslash
According to this, the JSON cookie value breaks at least two rules (no DQUOTE, no comma), and possibly more, depending on the data items in the JSON object (e.g. strings that may contain arbitrary characters).
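The mismatch is easy to verify mechanically; below is a quick Python transcription of the ABNF above (the regex is my own rendering, not from the RFC):

```python
import re

# cookie-value = *cookie-octet / ( DQUOTE *cookie-octet DQUOTE )
# cookie-octet = %x21 / %x23-2B / %x2D-3A / %x3C-5B / %x5D-7E
COOKIE_VALUE = re.compile(
    r'^(?:[\x21\x23-\x2B\x2D-\x3A\x3C-\x5B\x5D-\x7E]*'
    r'|"[\x21\x23-\x2B\x2D-\x3A\x3C-\x5B\x5D-\x7E]*")$'
)

json_value = '{"id":"0.19","lastUpdatedDate":"2020-12-17T10:22:25Z"}'
print(bool(COOKIE_VALUE.match(json_value)))                        # -> False
print(bool(COOKIE_VALUE.match("a7f2f57d0b9a3247a350d9157fcbf9c2")))  # -> True
```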
My question is, is this common practice? Both Firefox and Chrome seem to accept this cookie without any issue, even though it goes against the RFC standard. I tried Googling for standards, but only RFC6265 turns up. It seems people just started putting JSON values in cookies.
If this is now common practice, and not a misguided effort by people who didn't bother to read the relevant standard docs, is there an updated standard?

Why not use enctype="multipart/form-data" always?

By chance I discovered that the Django admin interface always uses enctype="multipart/form-data".
I would like to adopt this pattern, but I am unsure if I see all consequences this has.
Why not use enctype="multipart/form-data" always?
Update
For more than a year now we have always used enctype="multipart/form-data" in some forms. Works fine.
From the RFC that defines multipart/form-data:
Many web applications use the "application/x-www-form-urlencoded"
method for returning data from forms. This format is quite compact,
for example:
name=Xavier+Xantico&verdict=Yes&colour=Blue&happy=sad&Utf%F6r=Send
However, there is no opportunity to label the enclosed data with a
content type, apply a charset, or use other encoding mechanisms.
Many form-interpreting programs (primarily web browsers) now
implement and generate multipart/form-data, but a receiving
application might also need to support the
"application/x-www-form-urlencoded" format.
Aside from letting you upload files, multipart/form-data also allows you to use other charsets and encoding mechanisms. So the only reasons not to use it are:
If you want to save a bit of bandwidth (bearing in mind that this becomes much less of an issue if the request body is compressed).
If you need to support really old clients that can't handle file uploads and only know application/x-www-form-urlencoded, or that have issues handling anything other than ASCII.
There's a bit of overhead with using multipart/form-data for simple text forms. Compare a simple form with name and email.
Default (x-www-form-urlencoded)
Content-Type: application/x-www-form-urlencoded; charset=utf-8
name=Nomen+Nescio&email=foo%40bar.com
multipart/form-data
Content-Type: multipart/form-data; boundary=96a188ad5f9d4026822dacbdde47f43f
--96a188ad5f9d4026822dacbdde47f43f
Content-Disposition: form-data; name="name"
Nomen Nescio
--96a188ad5f9d4026822dacbdde47f43f
Content-Disposition: form-data; name="email"
foo@bar.com
--96a188ad5f9d4026822dacbdde47f43f--
As you can see, you need to transmit a bunch of additional bytes in the body when using multipart encoding (37 bytes vs. 252 bytes in this example).
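You can reproduce roughly the same comparison with a few lines of Python (the boundary and field values mirror the example above; byte counts will vary slightly with formatting details):

```python
from urllib.parse import urlencode

fields = {"name": "Nomen Nescio", "email": "foo@bar.com"}

# application/x-www-form-urlencoded body
urlencoded = urlencode(fields)  # name=Nomen+Nescio&email=foo%40bar.com

# multipart/form-data body with the same boundary as the example
boundary = "96a188ad5f9d4026822dacbdde47f43f"
multipart = "".join(
    f'--{boundary}\r\nContent-Disposition: form-data; name="{k}"\r\n\r\n{v}\r\n'
    for k, v in fields.items()
) + f"--{boundary}--\r\n"

print(len(urlencoded), len(multipart))  # the multipart body is several times larger
```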
But when you add the http headers and apply compression, the relative difference in payload would in most real life cases be much smaller.
The reason to prefer urlencoded over multipart is a small saving in http request size.
TL; DR
There's almost certainly no problem if you're targeting any modern browser and using SSL for any confidential data.
Background
The form-data type was originally developed as an experimental extension for file uploads in browsers, as explained in RFC 1867. There were compatibility issues at the time, but if your target browsers support HTML 4.x, and hence this enctype, you're fine. As you can see here, that's not an issue for any mainstream browser.
As already noted in other answers, it is a more verbose format, but that is also not an issue when you can compress the request or even just rely on the improved speed of communications in the last 20 years.
Finally, you should also consider the potential for abuse of this format. Since it was designed to upload files, there was the potential for it to be used to extract information from the user's machine without their knowledge, or to send confidential information unencrypted, as noted in the HTML spec. Once again, though, modern browsers are so field-hardened that I would be stunned if such low-hanging fruit were left for attackers to abuse, and you can use HTTPS for confidential data.
The enctype attribute specifies how the form data should be encoded when submitting it to the server, and enctype="multipart/form-data" is used when a user wants to upload a file (images, text files, etc.) to the server.

Is it necessary to percent encode a URI before sending it to the browser?

Is it necessary to percent encode a URI before using it in the browser? That is, when we write a URI in a browser, should it already be percent encoded, or is it the browser's responsibility to encode the URI and send the request to the server?
You'll find that most modern browsers will accept a non-encoded URL and they will generally be able to encode reserved characters themselves.
However, it is bad practice to rely on this because you can end up with unpredictable results. For instance, if you were sending form data to a server using a GET request and someone had typed in a # symbol, the browser will interpret the request differently depending on whether that character was encoded or not.
In short, it's always best to encode data manually to get predictable results if you're expecting reserved characters in a request. Fortunately most programming languages used on the web have built in functions for this.
Just to add: you don't need to encode the whole URL - it's usually just the data you're sending in a GET request that gets encoded. For example:
http://www.foo.com?data=This%20is%20my%20encoded%20string%20%23
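Most languages have this built in; in Python, for instance (the string is just the example above):

```python
from urllib.parse import quote, urlencode

data = "This is my encoded string #"

# Encode a single value, treating no characters as safe:
print(quote(data, safe=""))  # -> This%20is%20my%20encoded%20string%20%23

# Or build a whole query string at once
# (note: urlencode uses '+' rather than %20 for spaces):
print("http://www.foo.com/?" + urlencode({"data": data}))
```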

How to stop percent encoding in HTML form submission

I am trying to give users of my website the ability to download files from Amazon S3. The URLs are digitally signed with my AWS private key on my web server, then sent to the client via AJAX and embedded in the action attribute of an HTML form.
The problem arises when the form is submitted. The action attribute of the form contains a URL that includes a digital signature. This signature often contains + symbols, which get percent-encoded, completely invalidating the signature. How can I keep forms from percent-encoding my URLs?
I (respectfully) suggest that you need to more carefully identify the precise nature of the problem, where in the process flow it breaks down, and identify precisely what it is that you actually need to fix. URLEncoding of "+" is the correct thing for the browser to do, because the literal "+" in a query string is correctly interpreted by the server as " " (space).
Your question prompted me to review code I've written that generates signed urls for S3 and my recollection was correct -- I'm changing '+' to %2B, '=' to %3D, and '/' to %2F in the signature... so that is not invalid. This is assuming we are talking about the same thing, such that the "digital signature" you mention in the question is the signature discussed here:
http://docs.aws.amazon.com/AmazonS3/latest/dev/RESTAuthentication.html#RESTAuthenticationQueryStringAuth
Note the signature in the example has a urlencoded '+' in it: Signature=vjbyPxybdZaNmGa%2ByT272YEAiv4%3D
I will speculate that the problem you are having might not be '+' → '%2B' (which should be not only valid, but required)... but perhaps it's a double-encoding, such that you are, at some point, double-encoding it so that '+' → '%2B' → '%252B' ... with the percent sign being encoded as a literal, which would break the signature.
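That kind of double-encoding is easy to reproduce; a quick Python illustration:

```python
from urllib.parse import quote

sig_char = "+"
once = quote(sig_char, safe="")   # '%2B'   -- the correct, required encoding
twice = quote(once, safe="")      # '%252B' -- double-encoded, breaks the signature
print(once, twice)
```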