What is the standards-compliant way to advertise STARTTLS in ESMTP capabilities?

When I connect to an SMTP server, and issue an EHLO (ESMTP) greeting, some servers respond with:
250-STARTTLS
And other servers, respond with:
250 STARTTLS
Which is correct? RFC 3207 suggests that the hyphen is correct, but RFC 2487 suggests there shouldn't be a hyphen. Are they both correct? Of course, deployed code usually takes precedence over standards, but it would be nice to clarify this.
As a random sample, MessageLabs offer "250 STARTTLS" whereas Hotmail/Outlook offer "250-STARTTLS".

The answer is in RFC 2821. It specifies the following:
Normally, the response to EHLO will be a multiline reply. Each line
of the response contains a keyword and, optionally, one or more
parameters. Following the normal syntax for multiline replies, these
keywords follow the code (250) and a hyphen for all but the last
line, and the code and a space for the last line
So RFC 2487 has it with a space instead of a hyphen, because it's the last line of their sample SMTP session. RFC 3207 has it with a hyphen, because it's not the last line (and this holds true for my Hotmail/MessageLabs example above).
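The rule quoted above can be sketched as a small parser. This is only an illustration: the function name parse_ehlo_keywords and the sample replies are made up for the example, and it assumes the first reply line is the server greeting rather than a capability.

```python
def parse_ehlo_keywords(lines):
    """Collect capability keywords from the lines of a 250 reply to EHLO.

    "250-" marks a line that is followed by more lines, "250 " marks the
    last line. The first line (greeting/domain) is skipped.
    """
    keywords = []
    for index, raw in enumerate(lines):
        line = raw.rstrip("\r\n")
        code, separator, text = line[:3], line[3:4], line[4:]
        if code != "250" or separator not in ("-", " ", ""):
            raise ValueError(f"not a 250 reply line: {line!r}")
        if index > 0 and text:
            # The keyword is the first word; anything after it is a parameter.
            keywords.append(text.split()[0].upper())
        if separator != "-":
            break  # code followed by a space (or nothing): last line
    return keywords


# Hypothetical replies: STARTTLS in the middle vs. STARTTLS as the last line.
middle = ["250-mail.example.com Hello", "250-SIZE 35882577", "250-STARTTLS", "250 8BITMIME"]
last = ["250-mail.example.com", "250-PIPELINING", "250 STARTTLS"]
```

Both replies advertise STARTTLS; the separator only says whether another line follows.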

Related

How can we send dot in smtp protocol

Suppose that, when using the SMTP protocol, the message text contains a line consisting of a single dot:
\n
.
\n
What should we do so that it is not confused with the end-of-data marker in the SMTP protocol?
The RFC has a separate section about the escape mechanism for sending a lone dot on a line; it's colloquially called "dot stuffing".
In brief, any leading dot on a line needs to be escaped with another dot. So, to send a line containing one literal dot, you'd send
..
and to send a line containing two literal dots, you'd send
...
and so on, each with the usual <CRLF> line terminator required by the protocol (your example seems to assume a bare <LF>, which is often tolerated in practice but is technically incorrect). The receiving server correspondingly strips the initial dot from each line of the data that starts with one.
The RFC refers to this mechanism as "transparency"; see RFC 5321, section 4.5.2.
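As a rough sketch of the sending side, assuming the message body is already available as bytes (the helper names dot_stuff and frame_data are invented for this example):

```python
def dot_stuff(body: bytes) -> bytes:
    """Escape every line that starts with a dot by prepending another dot."""
    # Normalise all line endings to LF first, then emit CRLF as SMTP requires.
    lines = body.replace(b"\r\n", b"\n").replace(b"\r", b"\n").split(b"\n")
    stuffed = [b"." + line if line.startswith(b".") else line for line in lines]
    return b"\r\n".join(stuffed)


def frame_data(body: bytes) -> bytes:
    """Return the bytes to send after DATA: stuffed body plus the terminator."""
    payload = dot_stuff(body)
    if not payload.endswith(b"\r\n"):
        payload += b"\r\n"
    # A line holding a single dot marks the end of the mail data.
    return payload + b".\r\n"
```

In practice a library such as Python's smtplib already does this for you; the sketch is only meant to show the transformation.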
Here's a helpful link to the RFC (top of page 37 if the link isn't working). Here's the important takeaway:
The custom of accepting lines ending only in <LF>, as a concession to
non-conforming behavior on the part of some UNIX systems, has proven
to cause more interoperability problems than it solves, and SMTP
server systems MUST NOT do this, even in the name of improved
robustness. In particular, the sequence "<LF>.<LF>" (bare line
feeds, without carriage returns) MUST NOT be treated as equivalent to
<CRLF>.<CRLF> as the end of mail data indication.
Basically SMTP servers are not actually listening for \n.\n as the terminator. Instead they're expecting \r\n.\r\n.
The \n character is a line feed (in the RFC it's called LF), whereas \r\n is a carriage return followed by a line feed (in the RFC it's called CRLF).
So the important distinction is that the end of the data is marked by <CRLF>.<CRLF>, not <LF>.<LF>; otherwise SMTP servers would run into exactly the issue you bring up here.
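To illustrate the receiving side, here is a minimal sketch that looks only for the CRLF form of the terminator and then undoes the dot stuffing. The name split_data and the sample byte strings are assumptions made for the example; a real server reads incrementally from a socket.

```python
END_OF_DATA = b"\r\n.\r\n"


def split_data(stream: bytes):
    """Return (message, rest) once the terminator has arrived, else None."""
    # Prepend CRLF so that a dot on the very first line is also recognised.
    buffered = b"\r\n" + stream
    end = buffered.find(END_OF_DATA)
    if end == -1:
        return None  # terminator not seen yet; bare "\n.\n" does not count
    raw = buffered[2:end + 2]  # message including its final CRLF
    lines = raw.split(b"\r\n")
    # Undo dot stuffing: drop one leading dot from every line that has one.
    unstuffed = [line[1:] if line.startswith(b".") else line for line in lines]
    return b"\r\n".join(unstuffed), buffered[end + len(END_OF_DATA):]
```

With this approach a body containing "\n.\n" with bare line feeds passes through untouched, because only "\r\n.\r\n" ends the data.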

Some questions about SMTP RFC

I have two questions about SMTP RFC:
What value should I pass as the argument for the EHLO command if I don't have my own domain name?
The domain name given in the EHLO command MUST be either a primary
host name (a domain name that resolves to an address RR) or, if
the host has no name, an address literal, as described in
Section 4.1.3 and discussed further in the EHLO discussion of
Section 4.1.4.
I don't really understand Section 4.1.3. Can you give me an example or rephrase it?
Which headers are required to send in the DATA section?
Thanks in advance.
Argument to EHLO in the absence of a domain name
Section 4.1.3 Address Literals of RFC 2821 says:
Sometimes a host is not known to the domain name system and
communication (and, in particular, communication to report and repair
the error) is blocked. To bypass this barrier a special literal form
of the address is allowed as an alternative to a domain name. For
IPv4 addresses, this form uses four small decimal integers separated
by dots and enclosed by brackets such as [123.255.37.2], which
indicates an (IPv4) Internet Address in sequence-of-octets form.
so a simple EHLO [123.255.37.2] suffices (with the actual IP address of your own host, of course). It could also be a properly formatted IPv6 address literal instead.
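A small sketch of building the address literal with Python's standard ipaddress module. The function name ehlo_argument is made up; the "IPv6:" tag inside the brackets is the form the SMTP RFCs define for IPv6 literals.

```python
import ipaddress


def ehlo_argument(address: str) -> str:
    """Format an IP address as an SMTP address literal for EHLO."""
    ip = ipaddress.ip_address(address)  # raises ValueError on bad input
    if ip.version == 6:
        # IPv6 literals carry an explicit "IPv6:" tag inside the brackets.
        return f"[IPv6:{ip.compressed}]"
    return f"[{ip}]"
```

For example, ehlo_argument("123.255.37.2") gives the literal used in the RFC excerpt above, and the client would then send "EHLO " followed by that string.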
Required headers
Section 3.6. Field definitions of RFC 2822 says:
The only required header fields are the origination date field and
the originator address field(s). All other header fields are
syntactically optional.
so only From: and Date: are required.
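As an illustration, a message carrying just those two header fields can be built with Python's standard email package. The helper name minimal_message and the example address are hypothetical; note that set_content() also adds MIME headers such as Content-Type, which are allowed but not required.

```python
from email.message import EmailMessage
from email.utils import formatdate


def minimal_message(sender: str, body: str) -> EmailMessage:
    """Build a message with only the two required header fields set by hand."""
    msg = EmailMessage()
    msg["From"] = sender                      # originator address field
    msg["Date"] = formatdate(localtime=True)  # origination date field
    msg.set_content(body)                     # adds MIME headers as a side effect
    return msg


msg = minimal_message("alice@example.com", "Hello")
```

In practice you will almost always want To: and Subject: as well, even though the syntax does not demand them.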
If you don't have a domain name, you should use your IP address:
EHLO [192.168.1.1]
It's kind of a ridiculous requirement in the protocol seeing as how there's no real value in this piece of information. The server shouldn't trust it (obviously) and it is trivial for the server to get the IP address of the connecting client, anyway.

weird characters in HTML email

I'm reading email from a maildir and some emails have weird sets of characters in them:
=3D
=09
I think =3D is = and =09 is a space. There are some others, but I'm not sure:
=E2
=80
=93
Does anyone know what these are and what encoding issues I'm dealing with here?
BTW, I tried fetching these email via POP3 and it's the same thing. The reason I'm posting this on SO is not because I'm using a regular mail client to read the data. I'm reading via PHP out of maildir files. Perhaps a regular email client would detect what encoding this is and deal with it.
Thanks!
That looks like quoted-printable encoding.
This is a way of encoding 8-bit data for transmission over a medium which may not preserve the high bit, i.e. one that is not 8-bit clean. In the olden days, some mail servers did not preserve all 8 bits of a byte.
If you're seeing these in the message source but not in your email client, then this is normal.
If you're seeing these in your email client then something is messed up in whatever software the sender is using - most likely, the Content-Transfer-Encoding header has not been properly specified (which tells the email client how to decode it).
If you're writing an email client and want to be able to deal with this, you'll need to read the Content-Transfer-Encoding header. Of course, if you're doing that, you're also going to come up against multipart messages/attachments, base64, and much more.
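For reference, in quoted-printable =3D is "=", =09 is a tab, and =E2=80=93 is the UTF-8 byte sequence for an en dash. The question is about PHP, but the idea is easiest to show with Python's standard library; the function names and the sample message below are invented for this sketch.

```python
import email
import quopri
from email import policy


def decode_qp(raw: bytes, charset: str = "utf-8") -> str:
    """Decode a quoted-printable byte string, then decode the charset."""
    return quopri.decodestring(raw).decode(charset)


def read_text_body(raw_message: bytes) -> str:
    """Let the email package honour Content-Transfer-Encoding and charset."""
    msg = email.message_from_bytes(raw_message, policy=policy.default)
    return msg.get_content()  # assumes a single-part text message


# Hypothetical single-part message as it might sit in a maildir file.
sample = (
    b"Content-Type: text/plain; charset=utf-8\r\n"
    b"Content-Transfer-Encoding: quoted-printable\r\n"
    b"\r\n"
    b"a=3Db =E2=80=93 c\r\n"
)
```

The second function is closer to what a mail client does: it reads the headers first and only then decides how to decode the body.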

Syntax of HTTP-status headers

There're many ways to write an HTTP-status header:
HTTP/1.1 404 Not Found
Status: 404
Status: 404 Not Found
but which is the semantically-correct and spec-compliant way?
Edit: By status headers I mean this, using a function such as PHP's header().
Adding some information some time later, since I came across this question whilst researching something related.
I believe the Status header field was originally invented as part of the CGI specification, RFC 3875:
https://www.rfc-editor.org/rfc/rfc3875#section-6.3.3
To quote:
The Status header field contains a 3-digit integer result code that
indicates the level of success of the script's attempt to handle the
request.
Status = "Status:" status-code SP reason-phrase NL
status-code = "200" | "302" | "400" | "501" | extension-code
extension-code = 3digit
reason-phrase = *TEXT
It allows a CGI script to return a status code to the web server that overrides the default seen in the HTTP status line. Usually the server buffers the result from the script and emits a new header for the client. This one is a valid HTTP header which starts with an amended HTTP status line and omits the script's "Status:" header field (plus some other transformations mandated by the RFC).
So all of your examples can legitimately come from a CGI script, but only the first is really valid in an HTTP response. The latter two are only valid coming from a CGI script (or perhaps a FastCGI application).
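A very simplified sketch of that transformation, to make the difference concrete. The function cgi_to_http is hypothetical; it assumes the script separates header lines with bare newlines and ignores the other rewriting a real server performs (Location handling, for instance). The 200 default for a missing Status field follows RFC 3875.

```python
def cgi_to_http(cgi_output: str, protocol: str = "HTTP/1.1") -> str:
    """Turn parsed-header CGI output into an HTTP response."""
    head, _, body = cgi_output.partition("\n\n")
    status = "200 OK"  # default when the script sends no Status field
    headers = []
    for line in head.splitlines():
        name, _, value = line.partition(":")
        if name.strip().lower() == "status":
            status = value.strip()  # becomes part of the status line
        else:
            headers.append(line)    # passed through unchanged
    # Status line, remaining header fields, blank line, then the body.
    return "\r\n".join([f"{protocol} {status}", *headers, "", ""]) + body
```

So a script that prints "Status: 404 Not Found" ends up producing a response that begins with "HTTP/1.1 404 Not Found" and carries no Status field.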
A CGI script can also operate in "non-parsed header" (NPH) mode, when it generates a complete and valid HTTP header which the web server passes to the client verbatim. As such this shouldn't include a Status: header field.
Note, what I am interested in is which status should win if an NPH script gets it a bit wrong and emits the Status: header field, possibly in addition to the HTTP status line. I can't find any clear indication, so I suspect it is left to the implementation of whatever is parsing the output, either the client or the server.
Since https://www.rfc-editor.org/rfc/rfc2616#section-6 and more specifically https://www.rfc-editor.org/rfc/rfc2616#section-6.1 does not mention use of "Status:" when indicating a status code, and since the official list of headers at http://www.iana.org/assignments/message-headers/message-headers.xml does not mention "Status", I'd be inclined to believe it should not be served with it as a header.
The closest thing I've found to an answer is the FastCGI spec, which says to set status codes through the Status and Location headers.
A lot of them are pretty much arbitrary strings, but here is the W3C's copy of the spec for the commonly used ones:
http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html

New line characters get submitted differently

I was wondering if somebody could shed some light on this browser behaviour:
I have a form with a textarea that is submitted to the server either via XHR (using jQuery; I've also tried with plain XMLHttpRequest just to rule jQuery out, and the result is the same) or the "old fashioned" way via form submit. In both cases method="POST" is used.
Both ways submit to the same script on the server.
Now the funny part: if you submit via XHR new line characters are transferred as "%0A" (or \n if I am not mistaken), and if you submit the regular way they are transferred as "%0D%0A" (or \r\n).
This, of course, causes some problems on the server side, but that is not the question here.
I'd just like to know why this difference? Shouldn't new lines be transferred the same no matter what method of submitting you use? What other differences are there (if any)?
When sending XML, XMLHttpRequest strips the CR characters from the stream. This is in accordance with the XML specification, which says that CRLF is to be normalised to a single LF.
Hence if you package your content as XML and send it via XHR you will lose the CRs.
Section 3.7.1 of RFC 2616 (HTTP/1.1) allows \r\n, \r, or \n to represent a line break in text media:
HTTP relaxes this requirement and allows the
transport of text media with plain CR or LF alone representing a line
break when it is done consistently for an entire entity-body. HTTP
applications MUST accept CRLF, bare CR, and bare LF as being
representative of a line break in text media received via HTTP.
But this does not apply to control structures:
This flexibility regarding
line breaks applies only to text media in the entity-body; a bare CR
or LF MUST NOT be substituted for CRLF within any of the HTTP control
structures (such as header fields and multipart boundaries).
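Since both forms can arrive at the same script, one way to cope on the server side is to normalise line breaks after decoding the form data. The question does not say which server language is used, so this is only a Python sketch, and the function name normalize_newlines is made up.

```python
from urllib.parse import unquote_plus


def normalize_newlines(text: str) -> str:
    """Map CRLF and bare CR to LF so both submission paths look the same."""
    return text.replace("\r\n", "\n").replace("\r", "\n")


# The two encodings described in the question decode to the same text.
from_form = normalize_newlines(unquote_plus("line1%0D%0Aline2"))
from_xhr = normalize_newlines(unquote_plus("line1%0Aline2"))
```

Replacing CRLF before bare CR matters: doing it the other way round would turn each CRLF into two line breaks.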