HTML multipart form - maximum length of "boundary" string?

In a multi-part (i.e. Content-Type=multipart/form-data) form, is there an upper limit on the length of the boundary string that an HTTP server should accept?
As far as I can tell, the relevant RFCs say 70 chars:
RFC 2616 (HTTP/1.1) section "3.7 Media Types" says that the allowed types in the Content-Type header are defined by RFC 1590 (Media Type Registration Procedure).
RFC 1590 updates RFC 1521 (MIME).
RFC 1521 says that a boundary "must be no longer than 70 characters, not counting the two leading hyphens".
The same text also appears in RFC 2046, which obsoletes RFC 1521.
So can I be certain all the major HTTP/1.1 browsers out there today adhere to this limit? Are there any browsers (or other HTTP clients/libraries) known to break this limit?
Is there some other spec or common rule-of-thumb I'm missing that says the string will be shorter than 70 chars? In Chrome(ium) I get something like this: ----WebKitFormBoundaryLu4dNSGEhJZUgoe5, which is obviously shorter than 70 chars.
I'm asking this question because my server is running in an extremely memory-constrained environment, so "malloc a buffer large enough to hold the entire header string" is not an ideal answer.

As you note, RFC 2046 updated the MIME spec but kept the restriction that the boundary string be at most 70 characters, not counting the two leading hyphens.
I think it's a fair assumption that the spec is followed by all major browsers (and all MIME-using clients, like mail programs) since otherwise passing around multipart data would be very risky indeed.
To be sure, I've experimentally verified it for you using the latest versions of:
curl: ----------------------------5a56a6c893f2 (40)
Chrome 30 (WebKit): ----WebKitFormBoundarym0vCJKBpUYdCIWQG (38)
Safari 6 (WebKit, and same as Chrome): ----WebKitFormBoundaryFHUXvJBZwO2JKkNa (38)
Firefox 24: ---------------------------7096603861379320641089344535 (55)
IE 10: ---------------------------7dd1961640278 (40) - same technique as curl!
Apache HttpClient: -----------------------------1294919323195 (42)
Thus not only does every major browser/client conform, but all would let you save at least 15 allocated bytes per boundary buffer compared with the theoretical maximum. If you could trivially switch on user agent, you could squeeze out even more. ;-)
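Since the limit is fixed by the RFC, a memory-constrained server can get away with a small stack buffer instead of a malloc sized to the whole header. A minimal C sketch, assuming you already have the Content-Type value as a NUL-terminated string (the function name is just illustrative, not from any particular server framework):

```c
#include <stdio.h>
#include <string.h>

/* RFC 2046: the boundary is at most 70 characters, not counting the two
 * leading hyphens, so a small fixed-size buffer is always enough. */
#define MAX_BOUNDARY 70

/* Copy the boundary parameter of a Content-Type value into a fixed buffer,
 * prefixed with "--" so it can be matched directly against delimiter lines.
 * Returns 0 on success, -1 if the parameter is missing or over the limit. */
static int parse_boundary(const char *content_type, char out[MAX_BOUNDARY + 3])
{
    const char *p = strstr(content_type, "boundary=");
    size_t len;

    if (!p)
        return -1;
    p += strlen("boundary=");

    if (*p == '"') {                       /* quoted form: boundary="..."   */
        p++;
        len = strcspn(p, "\"");
    } else {                               /* token form: ends at delimiter */
        len = strcspn(p, "; \t\r\n");
    }
    if (len == 0 || len > MAX_BOUNDARY)
        return -1;                         /* reject non-conforming clients */

    out[0] = '-';
    out[1] = '-';
    memcpy(out + 2, p, len);
    out[len + 2] = '\0';
    return 0;
}

int main(void)
{
    char boundary[MAX_BOUNDARY + 3];       /* "--" + 70 chars + NUL */
    const char *hdr =
        "multipart/form-data; boundary=----WebKitFormBoundaryLu4dNSGEhJZUgoe5";

    if (parse_boundary(hdr, boundary) == 0)
        printf("delimiter line prefix: %s\n", boundary);
    return 0;
}
```

Rejecting anything longer than 70 characters is also allowed by the spec, so an over-long boundary can simply be treated as a malformed request.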

When should the ERC-1155 Metadata URI need to be zero-padded to 64 hex characters?

EIP-1155 states that "The string format of the substituted hexadecimal ID MUST be leading zero padded to 64 hex characters length if necessary."
In what situation is a 0-padded hex ID necessary? It is odd they chose to use the keyword MUST here as it seems like the choice of whether to use 64 hex character padding is completely arbitrary.
I understand that there cannot exist more than 2^256 ids (64 hex digits), but wouldn't the choice of metadata URI for an ERC-1155 token be implementation-dependent?
For example, if I wanted to create an ERC-1155 token composed only of 64 NFTs, I'd much prefer defining metadata URLs as follows:
https://{DOMAIN}/1.json
https://{DOMAIN}/2.json
...
https://{DOMAIN}/40.json (64 in hex)
I suspect that ERC-1155 was built with uint256 in mind as the standard for numeric types and that requiring ID to be padded to 64 hex characters means that all 256 bits of information are specified explicitly. Maybe this alleviates potential issues with dirty leading bits?
Padding doesn't appear to be strictly necessary to function - I have seen smart contracts which use unpadded metadata URLs, such as Mining.game (https://mumbai.polygonscan.com/address/0x1a3d0451f48ebef398dd4c134ae60846274b7ce0#code, https://api.mining.game/1.json).
This is on the Polygon testnet, not a mainnet, so keep in mind that code quality may not be stellar. But regardless, it appears to work.
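For what it's worth, the padded form the standard asks for is just the token ID rendered as 64 lowercase hex digits. A minimal C sketch of the {id} substitution on the client side, assuming an ID that fits in 64 bits and using a hypothetical example.com URL in place of {DOMAIN} (a real uint256 ID would pad its upper 192 bits with zeros the same way):

```c
#include <inttypes.h>
#include <stdio.h>

int main(void)
{
    /* ERC-1155 {id} substitution: the token ID as 64 lowercase hex
     * characters, zero padded, with no 0x prefix. */
    uint64_t token_id = 64;          /* decimal 64 == hex 40 */
    char url[128];

    snprintf(url, sizeof url,
             "https://example.com/%064" PRIx64 ".json", token_id);
    printf("%s\n", url);             /* 64 hex digits, ending in ...0040 */
    return 0;
}
```

Whether unpadded URLs also work is then purely up to the clients reading the metadata, which matches what you observed with Mining.game.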

HTML tel: link to emergency number

How do I create an HTML link to an emergency number like 911 or 112?
The RFC (RFC 3966, the tel URI scheme) says:
The phone number can be represented in either global or local notation. All phone numbers MUST use the global form unless they cannot be represented as such. [Emergency numbers ("911", "112")] cannot be represented in global form and need to be represented as a local number with a context.
From the section on local numbers and their context I don't find it easy to understand what a "context" is, let alone what the correct one for this case is. It lists domain names like houston.example.com and numeric prefixes like +1, and in one paragraph it says:
A context consisting of the initial digits of a global number does
not imply that adding these to the local number will generate a valid
E.164 number. It might do so by coincidence, but this cannot be
relied upon. (For example, "911" should be labeled with the context
"+1", but "+1-911" is not a valid E.164 number.)
But the phrasing of this paragraph is again very confusing.
Is tel:112;phone-context=+49 now the correct way of doing it, and is the fact that it is not a valid E.164 number irrelevant?
Or is the fact that it is not a valid E.164 number a problem?
In some other places I see people using a plain tel:112. And again other people recommend tel:112;phone-context=+49, but when I tap that link on Android, the dialer opens with the number
112;746632668398+49
The cited Section 5.1.5 of the RFC states:
A context consisting of the initial digits of a global number does
not imply that adding these to the local number will generate a valid
E.164 number. It might do so by coincidence, but this cannot be
relied upon. (For example, "911" should be labeled with the context
"+1", but "+1-911" is not a valid E.164 number.)
I interpret this to mean that emergency numbers should be labeled with their country-specific context, i.e.
in the US, 911 should be used as tel:911;phone-context=+1
in Germany, 112 should be used as tel:112;phone-context=+49
The rest of the paragraph is about this syntax not being compliant with the E.164 recommendation. As far as I understand, E.164 is irrelevant in this context anyway.

http/2 dynamic table size update clarification

In the HTTP/2 protocol we see the following statement for the dynamic table size update:
SETTINGS_HEADER_TABLE_SIZE (0x1): Allows the sender to inform the
remote endpoint of the maximum size of the header compression
table used to decode header blocks, in octets. The encoder can
select any size equal to or less than this value by using
signaling specific to the header compression format inside a
header block (see [COMPRESSION]). The initial value is 4,096
octets.
The initial size for both encoder and decoder is 4096 bytes according to the RFC.
In the SETTINGS frame in Wireshark, I can see the new table size passed to the endpoint (google.com in this case):
0000 00 00 12 04 00 00 00 00 00 00 01 00 01 00 00 00
0010 04 00 02 00 00 00 05 00 00 40 00
The six bytes 00 01 00 01 00 00 are the SETTINGS_HEADER_TABLE_SIZE entry: identifier 0x0001 followed by the 4-byte value 0x00010000 = 65536.
What I can't understand is this: does it tell the endpoint that the dynamic table the browser uses to decode headers coming from that endpoint is 65536 bytes long, or does it tell the endpoint that its own dynamic table size should be 65536?
And in reverse, I assume that the endpoint must send SETTINGS_HEADER_TABLE_SIZE to tell the browser about its own dynamic table used for decoding headers, but I don't see that setting sent back by the endpoint. Can someone explain this?
Also, there is a signal for a dynamic table size update, mentioned in the HPACK RFC, which is sent inside the HEADERS frame:
A dynamic table size update starts with the '001' 3-bit pattern,
followed by the new maximum size, represented as an integer with a
5-bit prefix (see Section 5.1).
The new maximum size MUST be lower than or equal to the limit
determined by the protocol using HPACK. A value that exceeds this
limit MUST be treated as a decoding error. In HTTP/2, this limit is
the last value of the SETTINGS_HEADER_TABLE_SIZE parameter (see
Section 6.5.2 of [HTTP2]) received from the decoder and acknowledged
by the encoder (see Section 6.5.3 of [HTTP2]).
There is this phrase, "received from the decoder and acknowledged by the encoder", so is this signal sent to limit the encoder's dynamic table size? I'm completely lost, and it is not obvious from Wireshark captures how this is handled correctly.
UPDATE
OK, I looked more at the Wireshark logs from Firefox on walmart.com (since there are a lot of headers involved). Sometimes Firefox sends the dynamic table size update signal in the HEADERS frame with a size smaller than the initial SETTINGS_HEADER_TABLE_SIZE that Firefox sent at the beginning of the connection. I reconstructed Firefox's dynamic table on paper and shrank it the way I expected the dynamic table size update to work; it turns out that shrinking it to the smaller size produces incorrect headers, so apparently the dynamic table size update affects only the remote endpoint (well, I guess it does). I also looked at nghttp2 and a C# implementation, and there they actually shrink the encoder table size while sending the dynamic table size update signal. I get the feeling that everyone has a completely different implementation of this protocol; it's a complete nightmare to understand.
As you figured out, there are multiple things which indicate the table size:
The maximum table size setting (as indicated in an HTTP/2 SETTINGS frame)
The actually used table size - which is encoded in a HEADERS frame in HPACK format
If we only look at the headers which are flowing from the client (browser) to a server, we will see the following things going on:
As long as neither side has any information from the remote side, the default values are used, which means the client expects that the server supports a maximum table size of 4 kB (SETTINGS_HEADER_TABLE_SIZE), and it also uses this size as the initial table size.
The server can optionally inform the client through an HTTP/2 SETTINGS frame that it only supports smaller header tables. This information is contained in the SETTINGS_HEADER_TABLE_SIZE field of a SETTINGS frame which is sent from the server to the client.
The client can adjust the actually used [dynamic] header table size through the Dynamic Table Size Update in a HEADERS frame. This always indicates the table size that is actually used on the encoder side, and which therefore must also be set on the decoder side to be able to recover the same data. The sending side is free to set the actually used table size to anything between 0 and the maximum size that is supported by the remote side (in SETTINGS_HEADER_TABLE_SIZE).
A typical strategy for implementations is to always shrink the used table size when it's currently larger than what the remote supports, and to increase the table size when the remote supports bigger tables and the implementation itself can still go bigger.
There might be some race conditions where one end has already set and used a larger table size than what the remote side actually supports, e.g. because the SETTINGS frame which indicates the lower limit was not received before the client encoded the first set of headers. In that case the remote side might detect the use of a too-large table size and reset the connection. To avoid these situations, both sides of a connection should in practice at least support the default table size of 4 kB, and ideally only increase the limit dynamically and never shrink it.
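To make the wire format concrete: the dynamic table size update instruction is the 3-bit pattern 001 followed by the new size encoded as an HPACK integer with a 5-bit prefix. A minimal C sketch of the encoder side (illustrative only, not taken from nghttp2 or any other real HPACK implementation):

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Encode an HPACK "Dynamic Table Size Update" instruction (RFC 7541, 6.3):
 * the 3-bit pattern '001' followed by the new maximum size as an integer
 * with a 5-bit prefix (RFC 7541, 5.1). Returns the number of bytes written. */
static size_t encode_table_size_update(uint32_t new_size, uint8_t *out)
{
    size_t n = 0;

    if (new_size < 31) {                 /* fits in the 5-bit prefix */
        out[n++] = 0x20 | (uint8_t)new_size;
        return n;
    }
    out[n++] = 0x20 | 0x1F;              /* prefix saturated at 31   */
    new_size -= 31;
    while (new_size >= 128) {            /* continuation bytes, 7 bits each */
        out[n++] = (uint8_t)(new_size % 128) | 0x80;
        new_size /= 128;
    }
    out[n++] = (uint8_t)new_size;
    return n;
}

int main(void)
{
    uint8_t buf[8];
    size_t len = encode_table_size_update(4096, buf);  /* e.g. back to 4 kB */

    for (size_t i = 0; i < len; i++)
        printf("%02x ", buf[i]);
    printf("\n");                        /* prints: 3f e1 1f */
    return 0;
}
```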
Now, I mentioned that one pair of maximum table size setting and actually used table size is used for transmitting HEADERS from one end of the connection (client) to the other (server). But there is also a second pair of both, for the headers which are sent from the server to the client. For this direction the client/browser also indicates in a SETTINGS frame how big the maximum header table is that it supports, and the server sends the size of the header table that it actually uses.

Tiff versus BigTiff

Please let me know if there is another Stack Exchange community this question would be better suited for.
I am trying to understand the basic differences between Tiff and BigTiff. I have looked at various sites, and the only difference that is mentioned is that BigTiff uses 64-bit offsets while Tiff uses 32-bit offsets. That being said, you would need to know which of the two types you are reading. How is this done? According to https://www.leadtools.com/help/leadtools/v19/main/api/tifffmt.html, this is done by reading a file flag. However, the flag they are referring to appears to be unique to their own reader, as I cannot find a corresponding data field in the specifications as shown by http://www.fileformat.info/format/tiff/egff.htm. What am I missing? Does BigTiff use a different file header than Tiff?
Everything you need to know is described in the BigTIFF link posted by @cgohlke. This is just to provide an answer to your question:
Yes, it uses a different file header.
Normal TIFF uses the following header:
2 byte byte order mark, "II" for "Intel"/little endian, or "MM" for "Motorola"/big endian.
The (version) number 42* as a 16 bit value, in the endianness given.
Unsigned 32 bit offset to IFD0
BigTIFF uses a slightly different header:
2 byte byte order mark as above
The (version) number 43 as a 16 bit value, in the endianness given.
Byte size of offset as a 16 bit value, always 8 for BigTIFF
2 byte padding, always 0 for BigTIFF
Unsigned 64 bit offset to IFD0
*) The value 42 was chosen for its "deep philosophical significance". Or according to the official specification, "[a]n arbitrary but carefully chosen number"...
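To make the detection concrete, a minimal C sketch that tells the two apart by reading the header fields described above (error handling kept to a bare minimum):

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Read a 16-bit value from buf, honoring the file's byte order mark. */
static uint16_t read16(const unsigned char *buf, int little_endian)
{
    return little_endian ? (uint16_t)(buf[0] | buf[1] << 8)
                         : (uint16_t)(buf[1] | buf[0] << 8);
}

int main(int argc, char **argv)
{
    unsigned char hdr[8];
    FILE *f;

    if (argc < 2 || !(f = fopen(argv[1], "rb")))
        return 1;
    if (fread(hdr, 1, sizeof hdr, f) != sizeof hdr) {
        fclose(f);
        return 1;
    }
    fclose(f);

    int le;
    if (memcmp(hdr, "II", 2) == 0)
        le = 1;                          /* little endian ("Intel")    */
    else if (memcmp(hdr, "MM", 2) == 0)
        le = 0;                          /* big endian ("Motorola")    */
    else
        return 1;                        /* not a TIFF-family file     */

    uint16_t version = read16(hdr + 2, le);
    if (version == 42)
        printf("classic TIFF (32-bit offset to IFD0)\n");
    else if (version == 43)
        printf("BigTIFF (offset size %u, 64-bit offset to IFD0)\n",
               (unsigned)read16(hdr + 4, le));
    else
        printf("unknown version %u\n", (unsigned)version);
    return 0;
}
```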

Is a Unicode user agent legal inside an HTTP header?

An application I'm maintaining loads user agents extracted from web logs into a MySQL table column using the 'latin1' charset. Occasionally, it fails to load a user agent that looks like this:
Mozilla/5.0 (Iâ?; CPU iPhone OS 5_0_1 like Mac OS X) AppleWebKit/534.46 (KHTML^C like Gecko) Version
I suspect it's choking on Iâ?. I'm working to figure out if this should be supported, or if it's corruption introduced by the upstream logging system. Is this a legal user agent in an HTTP header?
RFC 2616 (HTTP/1.1) says that message header contents must be "consisting of either *TEXT or combinations of token, separators, and quoted-string". If you look at the definitions of TEXT etc., you will find that the legal characters are those with byte values not in the [0, 31] range and not equal to 127; therefore characters such as â are, as far as I can tell, legal as per the spec.
Technically, octets > 127 are allowed in comments. RFC 2616 makes them default to ISO-8859-1, but HTTPbis (the upcoming revision of RFC 2616) has removed that rule so that sometimes in the distant future, we may be able to move to a sane encoding.
Recommendation: strip all octets > 127.
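A minimal C sketch of that recommendation, applied before the value reaches the latin1 column (the function name is illustrative); it also drops control characters such as the ^C visible in the sample user agent:

```c
#include <stdio.h>

/* Remove every octet outside printable ASCII (32..126) in place, following
 * the "strip all octets > 127" recommendation; control characters are
 * dropped as well. */
static void strip_non_ascii(char *s)
{
    char *dst = s;

    for (; *s; s++) {
        unsigned char c = (unsigned char)*s;
        if (c >= 32 && c <= 126)
            *dst++ = (char)c;
    }
    *dst = '\0';
}

int main(void)
{
    char ua[] = "Mozilla/5.0 (I\xc3\xa2?; CPU iPhone OS 5_0_1 like Mac OS X)";

    strip_non_ascii(ua);
    /* prints: Mozilla/5.0 (I?; CPU iPhone OS 5_0_1 like Mac OS X) */
    printf("%s\n", ua);
    return 0;
}
```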
HTTP/1.1 (RFC 2616) refers to ISO-8859-1, which is a Latin-based single-byte character set.
Since HTTP header traffic is supposed to be single-byte, I am also using the latin1 character set for my similar logs. The decision was simply to make my indexes smaller.
If you use UTF8 with VARCHAR, only the characters that are multi-byte require additional bytes, so in table space it's not much extra. However, indexes are stored fixed-width, so they're padded just in case you need the space (UTF8 indexes are three times as large as latin1 indexes).
It doesn't affect me if the occasional odd header is unreadable. However, if you're not indexing the column, you may as well use UTF8.