Suppose that, when using the SMTP protocol, we want to send the following sequence inside the message text:
\n
.
\n
What should we do so that it is not mistaken for the end-of-mail marker in the SMTP protocol?
The RFC has a separate section about the escape mechanism for sending a lone dot on a line; it's colloquially called "dot stuffing".
In brief, any leading dot on a line needs to be escaped with another dot. So, to send a line containing one literal dot, you'd send
..
and to send a line containing two literal dots, you'd send
...
and so on, with the usual <CRLF> line terminators required by the protocol (your example seems to incorrectly assume bare <LF>, which is often tolerated in practice but is technically not correct). The receiving server correspondingly strips the first dot from each line of the data that begins with one.
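To make the mechanism concrete, here is a minimal Python sketch of both sides. The function names dot_stuff and dot_unstuff are made up for illustration, and the sketch assumes the body already uses <CRLF> line endings and is held in memory as a single bytes object; a real client or server would work on a stream.

```python
CRLF = b"\r\n"
END_OF_DATA = CRLF + b"." + CRLF


def dot_stuff(body: bytes) -> bytes:
    """Sender side: escape leading dots, then append the end-of-data marker."""
    lines = body.split(CRLF)
    stuffed = [b"." + line if line.startswith(b".") else line for line in lines]
    return CRLF.join(stuffed) + END_OF_DATA


def dot_unstuff(data: bytes) -> bytes:
    """Receiver side: drop the end-of-data marker and one leading dot per line."""
    if not data.endswith(END_OF_DATA):
        raise ValueError("missing <CRLF>.<CRLF> terminator")
    lines = data[: -len(END_OF_DATA)].split(CRLF)
    return CRLF.join(line[1:] if line.startswith(b".") else line for line in lines)


body = b"hello\r\n.\r\n..\r\nbye"
wire = dot_stuff(body)  # b"hello\r\n..\r\n...\r\nbye\r\n.\r\n"
```

Because every line that starts with a dot gains an extra one, the stuffed body can never contain a line consisting of a single dot, so the terminator stays unambiguous.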
The RFC refers to this mechanism as "transparency"; see RFC 5321, section 4.5.2.
Here's a helpful link to the RFC (top of page 37 if the link isn't working). Here's the important takeaway:
The custom of accepting lines ending only in <LF>, as a concession to
non-conforming behavior on the part of some UNIX systems, has proven
to cause more interoperability problems than it solves, and SMTP
server systems MUST NOT do this, even in the name of improved
robustness. In particular, the sequence "<LF>.<LF>" (bare line
feeds, without carriage returns) MUST NOT be treated as equivalent to
<CRLF>.<CRLF> as the end of mail data indication.
Basically SMTP servers are not actually listening for \n.\n as the terminator. Instead they're expecting \r\n.\r\n.
The \n character is a line feed (the RFC calls it LF), whereas \r\n is a carriage return followed by a line feed (the RFC calls this pair CRLF).
So, it's an important distinction that SMTP servers actually terminate data with <CRLF>.<CRLF> not <LF>.<LF>, otherwise SMTP servers would run into exactly the issue you bring up here.
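As a rough illustration of that distinction, a receiver that follows the RFC would search only for the full <CRLF>.<CRLF> sequence. The helper name find_end_of_data below is hypothetical, and the sketch ignores the corner case of an empty message, where the first <CRLF> is the one that ended the DATA command.

```python
END_OF_DATA = b"\r\n.\r\n"


def find_end_of_data(buffer: bytes) -> int:
    """Return the index just past <CRLF>.<CRLF>, or -1 if it is not present."""
    position = buffer.find(END_OF_DATA)
    return -1 if position == -1 else position + len(END_OF_DATA)


# A bare-LF "\n.\n" inside the text is not treated as the terminator.
print(find_end_of_data(b"line one\n.\nline two"))  # -1
print(find_end_of_data(b"line one\r\n.\r\n"))      # 13
```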
When I connect to an SMTP server, and issue an EHLO (ESMTP) greeting, some servers respond with:
250-STARTTLS
And other servers, respond with:
250 STARTTLS
Which is correct? RFC 3207 suggests that the hyphen is correct. But RFC 2487 suggests there shouldn't be a hyphen. Are they both correct? Of course, deployed code usually takes precedence over standards, but it would be nice to clarify this.
As a random sample, MessageLabs offer "250 STARTTLS" whereas Hotmail/Outlook offer "250-STARTTLS".
The answer is in RFC 2821. It specifies the following:
Normally, the response to EHLO will be a multiline reply. Each line
of the response contains a keyword and, optionally, one or more
parameters. Following the normal syntax for multiline replies, these
keywords follow the code (250) and a hyphen for all but the last
line, and the code and a space for the last line
So RFC 2487 has it with a space instead of a hyphen, because it's the last line of their sample SMTP session. RFC 3207 has it with a hyphen, because it's not the last line (and this holds true for my Hotmail/MessageLabs example above).
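For what it's worth, here is a small Python sketch of reading such a multiline reply under that rule. The function name parse_ehlo_reply and the host name mail.example.com are made up for illustration, and the sketch only handles a complete, successful 250 reply.

```python
from typing import List, Tuple


def parse_ehlo_reply(reply: str) -> Tuple[str, List[str]]:
    """Split a 250 EHLO reply into (greeting, extension lines)."""
    lines = reply.splitlines()
    if not lines:
        raise ValueError("empty reply")
    texts = []
    for index, line in enumerate(lines):
        code, separator, text = line[:3], line[3:4], line[4:]
        is_last = index == len(lines) - 1
        if code != "250":
            raise ValueError(f"unexpected reply code in {line!r}")
        # Hyphen on every line except the last, which uses a space.
        if separator != (" " if is_last else "-"):
            raise ValueError(f"unexpected separator in {line!r}")
        texts.append(text)
    return texts[0], texts[1:]


reply = "250-mail.example.com\r\n250-SIZE 10240000\r\n250 STARTTLS\r\n"
greeting, extensions = parse_ehlo_reply(reply)
# extensions == ["SIZE 10240000", "STARTTLS"]
```

Whether STARTTLS shows up as "250-STARTTLS" or "250 STARTTLS" therefore depends only on its position in the server's list.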
I am trying to give users of my website the ability to download files from Amazon S3. The URLs are digitally signed with my AWS private key on my webserver, then sent to the client via AJAX and embedded in the action attribute of an HTML form.
The problem arises when the form is submitted. The action attribute of the form contains a URL that has a digital signature. This signature often contains + symbols, which get percent-encoded, and that completely invalidates the signature. How can I keep forms from percent-encoding my URLs?
I (respectfully) suggest that you need to more carefully identify the precise nature of the problem, where in the process flow it breaks down, and identify precisely what it is that you actually need to fix. URLEncoding of "+" is the correct thing for the browser to do, because the literal "+" in a query string is correctly interpreted by the server as " " (space).
Your question prompted me to review code I've written that generates signed urls for S3 and my recollection was correct -- I'm changing '+' to %2B, '=' to %3D, and '/' to %2F in the signature... so that is not invalid. This is assuming we are talking about the same thing, such that the "digital signature" you mention in the question is the signature discussed here:
http://docs.aws.amazon.com/AmazonS3/latest/dev/RESTAuthentication.html#RESTAuthenticationQueryStringAuth
Note the signature in the example has a urlencoded '+' in it: Signature=vjbyPxybdZaNmGa%2ByT272YEAiv4%3D
I will speculate that the problem you are having might not be '+' → '%2B' (which should be not only valid, but required)... but perhaps it's a double-encoding, such that you are, at some point, double-encoding it so that '+' → '%2B' → '%252B' ... with the percent sign being encoded as a literal, which would break the signature.
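To illustrate the difference, here is a short Python sketch using the standard urllib.parse module. The raw signature value is simply the documentation example above with its percent-escapes decoded; it is not a real credential.

```python
from urllib.parse import quote, unquote, unquote_plus

signature = "vjbyPxybdZaNmGa+yT272YEAiv4="

# Correct: encode exactly once before placing it in the query string.
encoded_once = quote(signature, safe="")
print(encoded_once)   # vjbyPxybdZaNmGa%2ByT272YEAiv4%3D

# Broken: encoding the already-encoded value turns each '%' into '%25'.
encoded_twice = quote(encoded_once, safe="")
print(encoded_twice)  # vjbyPxybdZaNmGa%252ByT272YEAiv4%253D

# The server decodes once, so the double-encoded value no longer matches.
print(unquote(encoded_once) == signature)   # True
print(unquote(encoded_twice) == signature)  # False

# A literal '+' left unencoded in a query string is decoded as a space.
print(unquote_plus("a+b"))  # a b
```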
As with any user supplied data, the URLs will need to be escaped and filtered appropriately to avoid all sorts of exploits. I want to be able to
Put user supplied URLs in href attributes. (Bonus points if I don't get screwed if I forget to write the quotes)
...
Forbid malicious URLs such as javascript: stuff or links to evil domain names.
Allow some leeway for the users. I don't want to raise an error just because they forgot to add an http:// or something like that.
Unfortunately, I can't find any "canonical" solution to this sort of problem. The only thing I could find as inspiration is the encodeURI function from Javascript but that doesn't help with my second point since it just does a simple URL parameter encoding but leaving alone special characters such as : and /.
OWASP provides a list of regular expressions for validating user input, one of which is used for validating URLs. This is as close as you're going to get to a language-neutral, canonical solution.
More likely you'll rely on the URL parsing library of the programming language in use. Or, use a URL parsing regex.
The workflow would be something like:
Verify the supplied string is a well-formed URL.
Provide a default protocol such as http: when no protocol is specified.
Maintain a whitelist of acceptable protocols (http:, https:, ftp:, mailto:, etc.)
The whitelist will be application-specific. For an address-book app the mailto: protocol would be indispensable. It's hard to imagine a use case for the javascript: and data: protocols.
Enforce a maximum URL length - ensures cross-browser URLs and prevents attackers from polluting the page with megabyte-length strings. With any luck your URL-parsing library will do this for you.
Encode a URL string for the usage context. (Escaped for HTML output, escaped for use in an SQL query, etc.).
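The workflow above could be sketched roughly as follows in Python, using only the standard library. The function name sanitize_url, the 2000-character limit, and the exact scheme whitelist are assumptions for illustration; the "well-formed" check here is only a basic one (a network location must be present for http, https and ftp), so treat this as a starting point rather than a complete validator.

```python
from html import escape
from urllib.parse import urlsplit

ALLOWED_SCHEMES = {"http", "https", "ftp", "mailto"}  # application-specific
MAX_URL_LENGTH = 2000  # assumed limit, adjust to taste


def sanitize_url(raw: str) -> str:
    """Validate a user-supplied URL and return it escaped for an HTML attribute."""
    url = raw.strip()
    if not url or len(url) > MAX_URL_LENGTH:
        raise ValueError("URL is empty or too long")
    parts = urlsplit(url)
    if not parts.scheme:
        # Be lenient: assume http when the user left the protocol out.
        url = "http://" + url
        parts = urlsplit(url)
    if parts.scheme not in ALLOWED_SCHEMES:
        raise ValueError(f"scheme {parts.scheme!r} is not allowed")
    if parts.scheme != "mailto" and not parts.netloc:
        raise ValueError("URL has no host")
    # Escape for the output context (here: a quoted HTML attribute).
    return escape(url, quote=True)


print(sanitize_url("www.example.com/a?b=1&c=2"))
# http://www.example.com/a?b=1&amp;c=2
```

A javascript: URL is rejected by the whitelist check, since its scheme is not in ALLOWED_SCHEMES.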
Forbid malicious URLs such as javascript: stuff or links to evil domain names.
You can utilize the Google Safe Browsing API to check a domain for spyware, spam or other "evilness".
For the first point, regular attribute encoding works just fine (escape characters into HTML entities). Escaping quotes, the ampersand and angle brackets is OK if attributes are guaranteed to be quoted. Escaping all other non-alphanumeric characters as well will keep the attribute safe if it's accidentally left unquoted.
The second point is vague and depends on what you want to do. Just remember to use a whitelist approach instead of a blacklist one; it's possible to use HTML entity encoding and other tricks to get around most simple blacklists.
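As a sketch of the stricter attribute encoding, assuming Python and a made-up helper name encode_html_attribute, every character that is not an ASCII letter or digit can be turned into a numeric entity:

```python
def encode_html_attribute(value: str) -> str:
    """Encode every character except ASCII letters and digits as &#xHH;."""
    return "".join(
        ch if ch.isascii() and ch.isalnum() else f"&#x{ord(ch):X};"
        for ch in value
    )


# Spaces and quotes can no longer end the attribute, even if it is unquoted.
print(encode_html_attribute('x" onmouseover=alert(1)'))
```

This is deliberately aggressive; for attributes that are always quoted, escaping only quotes, ampersands and angle brackets is enough.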
I'm reading email from a maildir and some emails have weird sets of characters in them:
=3D
=09
I think =3D is = and =09 is a space. There are some others, but I'm not sure:
=E2
=80
=93
Does anyone know what these are and what encoding issues I'm dealing with here?
BTW, I tried fetching these emails via POP3 and it's the same thing. The reason I'm posting this on SO is that I'm not using a regular mail client to read the data. I'm reading via PHP out of maildir files. Perhaps a regular email client would detect what encoding this is and deal with it.
Thanks!
That looks like quoted-printable encoding.
This is a way of sending 8-bit character data over a medium that may not preserve the high bit, i.e. one that is not 8-bit clean. In the olden days, some mail servers did not preserve all 8 bits of a byte.
If you're seeing these in the message source but not in your email client, then this is normal.
If you're seeing these in your email client then something is messed up in whatever software the sender is using - most likely, the Content-Transfer-Encoding header has not been properly specified (which tells the email client how to decode it).
If you're writing an email client and want to be able to deal with this, you'll need to read the Content-Transfer-Encoding header. Of course, if you're doing that, you're also going to come up against multipart messages/attachments, base64, and much more.
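To show what the decoding looks like, here is a small sketch in Python rather than PHP, using the standard quopri and email modules. The sample strings are made up from the sequences in the question, and the UTF-8 charset is an assumption; in a real message it comes from the Content-Type header.

```python
import quopri
from email import message_from_bytes, policy

# Decoding a raw quoted-printable fragment by hand.
raw = b"a=3Db=09c =E2=80=93 d"
decoded = quopri.decodestring(raw).decode("utf-8")
print(repr(decoded))  # 'a=b\tc \u2013 d': =3D is '=', =09 is a tab, =E2=80=93 is an en dash

# Letting the email package pick the decoder from the headers.
message_bytes = (
    b"Content-Type: text/plain; charset=utf-8\r\n"
    b"Content-Transfer-Encoding: quoted-printable\r\n"
    b"\r\n"
    b"1 + 1 =3D 2 =E2=80=93 obviously\r\n"
)
message = message_from_bytes(message_bytes, policy=policy.default)
body = message.get_content()
```

Note that =E2=80=93 is a single character: the three bytes are the UTF-8 encoding of an en dash, so the transfer encoding has to be undone before the charset is applied.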
I was wondering if somebody could shed some light on this browser behaviour:
I have a form with a textarea that is submitted to the server either via XHR (using jQuery; I've also tried with plain XMLHttpRequest just to rule jQuery out, and the result is the same) or the "old fashioned" way via form submit. In both cases method="POST" is used.
Both ways submit to the same script on the server.
Now the funny part: if you submit via XHR new line characters are transferred as "%0A" (or \n if I am not mistaken), and if you submit the regular way they are transferred as "%0D%0A" (or \r\n).
This, of course, causes some problems on the server side, but that is not the question here.
I'd just like to know why this difference? Shouldn't new lines be transferred the same no matter what method of submitting you use? What other differences are there (if any)?
When sending XML, XMLHttpRequest strips the CR characters from the stream. This is in accord with the XML specification, which says that CRLF should be normalised to a simple LF.
Hence if you package your content as XML and send it via XHR you will lose the CRs.
Section 3.7.1 of RFC 2616 (HTTP/1.1) allows any of \r\n, \r, or \n to represent a line break in text media:
HTTP relaxes this requirement and allows the
transport of text media with plain CR or LF alone representing a line
break when it is done consistently for an entire entity-body. HTTP
applications MUST accept CRLF, bare CR, and bare LF as being
representative of a line break in text media received via HTTP.
But this does not apply to control structures:
This flexibility regarding
line breaks applies only to text media in the entity-body; a bare CR
or LF MUST NOT be substituted for CRLF within any of the HTTP control
structures (such as header fields and multipart boundaries).
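Given that a server must accept all three forms in text bodies, one pragmatic approach is to normalize line breaks right after decoding the submitted value. This is only a sketch in Python with a hypothetical helper name normalize_newlines; the two sample values mimic what the XHR and regular form submissions in the question send.

```python
from urllib.parse import unquote_plus


def normalize_newlines(text: str) -> str:
    """Convert CRLF and bare CR line breaks to a single LF."""
    return text.replace("\r\n", "\n").replace("\r", "\n")


from_xhr = unquote_plus("line+one%0Aline+two")      # LF only
from_form = unquote_plus("line+one%0D%0Aline+two")  # CRLF

print(from_xhr == from_form)                                          # False
print(normalize_newlines(from_xhr) == normalize_newlines(from_form))  # True
```

The CRLF replacement has to run first; otherwise each CRLF would turn into two line breaks.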