HTTP: Generating ETag Header - language-agnostic

HTTP: Generating ETag Header - language-agnostic

How do I generate an ETag HTTP header for a resource file?

As long as it changes whenever the resource representation changes, how you produce it is completely up to you.
You should try to produce it in a way that additionally:
doesn't require you to re-compute it on each conditional GET, and
doesn't change if the resource content hasn't changed
Using hashes of content can cause you to fail at #1 if you don't store the computed hashes along with the files.
Using inode numbers can cause you to fail at #2 if you rearrange your filesystem or you serve content from multiple servers.
One mechanism that can work is to use something entirely content dependent such as a SHA-1 hash or a version string, computed and stored once whenever your resource content changes.

An etag is an arbitrary string that the server sends to the client that the client will send back to the server the next time the file is requested.
The etag should be computable on the server based on the file. Sort of like a checksum, but you might not want to checksum every file sending it out.
server client
<------------- request file foo
file foo etag: "xyz" -------->
<------------- request file foo
etag: "xyz" (what the server just sent)
(the etag is the same, so the server can send a 304)
I built up a string in the format "datestamp-file size-file inode number". So, if a file is changed on the server after it has been served out to the client, the newly regenerated etag won't match if the client re-requests it.
char *mketag(char *s, struct stat *sb)
{
sprintf(s, "%d-%d-%d", sb->st_mtime, sb->st_size, sb->st_ino);
return s;
}

From http://developer.yahoo.com/performance/rules.html#etags:
By default, both Apache and IIS embed data in the ETag that dramatically reduces the odds of the validity test succeeding on web sites with multiple servers.
...
If you're not taking advantage of the flexible validation model that ETags provide, it's better to just remove the ETag altogether.

How to generate the default apache etag in bash
for file in *; do printf "%x-%x-%x\t$file\n" `stat -c%i $file` `stat -c%s $file` $((`stat -c%Y $file`*1000000)) ; done
Even when i was looking for something exactly like the etag (the browser asks for a file only if it has changed on the server), it never worked and i ended using a GET trick (adding a timestamp as a get argument to the js files).

Ive been using Adler-32 as an html link shortener. Im not sure whether this is a good idea, but so far, I havent noticed any duplicates. It may work as a etag generator. And it should be faster then trying to hash using an encryption scheme like sha, but I havent verified this. The code I use is:
shortlink = str(hex(zlib.adler32(link)+(2**32-1)/2))[2:-1]

I would recommend not using them and going for last-modified headers instead.
Askapache has a useful article on this. (as they do pretty much everything it seems!)
http://www.askapache.com/htaccess/apache-speed-etags.html

The code example of Mark Harrison is similar to what used in Apache 2.2. But such algorithm causes problems for load balancing when you have two servers with the same file but the file's inode is different. That's why in Apache 2.4 developers simplified ETag schema and removed the inode part. Also to make ETag shorter usually they encoded in hex:
<inttypes.h>
char *mketag(char *s, struct stat *sb)
{
sprintf(s, "\"%" PRIx64 "-%" PRIx64 "\"", sb->st_mtime, sb->st_size);
return s;
}
or for Java
etag = '"' + Long.toHexString(lastModified) + '-' +
Long.toHexString(contentLength) + '"';
for C#
// Generate ETag from file's size and last modification time as unix timestamp in seconds from 1970
public static string MakeEtag(long lastMod, long size)
{
string etag = '"' + lastMod.ToString("x") + '-' + size.ToString("x") + '"';
return etag;
}
public static void Main(string[] args)
{
long lastMod = 1578315296;
long size = 1047;
string etag = MakeEtag(lastMod, size);
Console.WriteLine("ETag: " + etag);
//=> ETag: "5e132e20-417"
}
The function returns ETag compatible with Nginx. See comparison of ETags form different servers

Related

Haproxy frontend configuration to replace response header depending on query string

I used the following haproxy configuration in frontend to modify the response header of requests depending on a query string:
frontend my-frontend
acl is-foo urlp(foo) 1
http-response replace-header Set-Cookie "(.*)" "\1; SameSite=None" if is-foo
Depending on my information from the docs the acl should match for all requests like
example.com?a=b&foo=1&bar=2
example.com?foo=1
example.com?a=b&foo=1
And it should not match for requests like
example.com?a=b&foo=0&bar=2
example.com?a=b
example.com?a=b&foo=bar
The actual result is that the acl matches never.
If i invert the if i.e.: if !is-foo the replace-header happens on every request.
So the problem must be the acl which matches never.
I use haproxy 2.0.15

I got it working by myself.
It seems to be the case that urlp(foo) is not present at runtime when it has been executed for http-response.
So we need to store its value in a temporary variable using set-var(custom.name), before. At runtime in if condition we can access it with var(custom.name) and match it against our condition. I used urlp_val() instead of urlp() here because the value will be casted to int immediately.
frontend my-frontend
http-request set-var(txn.foo) urlp_val(foo)
http-response replace-header Set-Cookie "(.*)" "\1; SameSite=None" if { var(txn.foo) eq 1 }
Thank you for traveling.

Type of field 7 in auth response message of google cast protocol v2

The Google Cast protocol v2 has widely been reverse-engineered and is therefore already well-known. A good example of this is the Cast v2 Node library repository on GitHub which includes a detailed description of the cast v2 protocol.
However, whilst writing my own implementation of the protocol in Java using Netty, I realized that the auth response message is way more complex than described in the linked repository.
According to the repository, the message should look like:
message AuthResponse {
required bytes signature = 1;
required bytes client_auth_certificate = 2;
repeated bytes client_ca = 3;
}
However, the client sends 3 more fields. They have the indices 4, 6 and 7.
Field 4 is of wiretype VARINT and stands, as far as I know, for the SignatureAlgorithm the Cast-enabled device (Chromecast Gen2 and Chromecast Audio) has been challenged with.
Field 6 is also of type VARINT, but I have no idea what it stands for. During testing, it always had the value 0. (Maybe it stands for the client_ca certificate used for signing the client_auth_certificate?)
Field 7 is of wiretype LENGTH_DELIMITED. It is definetly not an UTF-8 encoded String since printing it out results in an unreadable mess. However, the sequence printed out contains the complete address that's also been used in the client_ca and client_auth_certificate, so I believe it has something to do with it. I've already tested whether this might be a certificate or RSA key, but both tests were negative. A file containing the raw byte sequence can be found here.
This brings me finally to my question:
Do you know what fields 6 and 7 stand for? Guesses based on the file's structure are also highly appreciated.

As I've found out, the protocol is practically open-source since the Chromium project includes the corresponding .proto-files in order to support streaming on Cast-enabled devices.
The complete protocol can be found here: https://github.com/chromium/chromium/blob/master/components/cast_channel/proto/cast_channel.proto
The structure of the AuthResponse message is therefore
message AuthResponse {
required bytes signature = 1;
required bytes client_auth_certificate = 2;
repeated bytes intermediate_certificate = 3;
optional SignatureAlgorithm signature_algorithm = 4
[default = RSASSA_PKCS1v15];
optional bytes sender_nonce = 5;
optional HashAlgorithm hash_algorithm = 6 [default = SHA1];
optional bytes crl = 7;
}

How to create a repeatable POST request that contains multipart-form-data?

I am trying to create a POST request that contains multipart-form-data that requires NT Credentials. The authentication request causes the POST to be resent and I get a unrepeatable entity exception.
I tried wrapping the MultipartContent entity that is produced with a BufferedHttpEntity but it throws NullPointerExceptions?
final GenericUrl sau = new GenericUrl(baseURI.resolve("Record"));
final MultipartContent c = new MultipartContent().setMediaType(MULTIPART_FORM_DATA).setBoundary("__END_OF_PART__");
final MultipartContent.Part p0 = new MultipartContent.Part(new HttpHeaders().set("Content-Disposition", format("form-data; name=\"%s\"", "RecordRecordType")), ByteArrayContent.fromString(null, "C_APP_BOX"));
final MultipartContent.Part p1 = new MultipartContent.Part(new HttpHeaders().set("Content-Disposition", format("form-data; name=\"%s\"", "RecordTitle")), ByteArrayContent.fromString(null, "JAVA_TEST"));
c.addPart(p0);
c.addPart(p1);
The documentation for ByteArrayContent says
Concrete implementation of AbstractInputStreamContent that generates repeatable input streams based on the contents of byte array.
Making all the parts repeatable does not solve the problem. Because this code
System.out.println("c.retrySupported() = " + c.retrySupported()); outputs c.retrySupported() = true.
I found the following documentation:
1.1.4.1. Repeatable entities An entity can be repeatable, meaning its content can be read more than once. This is only possible with self
contained entities (like ByteArrayEntity or StringEntity)
I have now converted my MultipartContent to a ByteArrayContent with a multi/part-form media type by extracting the string contents and still get the same error!
But I still get the following exception when I try and call request.execute().
Caused by: org.apache.http.client.NonRepeatableRequestException: Cannot retry request with a non-repeatable request entity.
So how do I go about convincing the ApacheHttpTransport to create a repeatable Entity?

I had to modify all the classes that inherited from HttpContent so that they would report back correctly with .retrySupported() so that the when the ApacheHttpTransport code was entered it would create repeatable content correctly.
The changes were made against version 1.20.0 because that is what I was using. I am submitting a pull request against dev branch HEAD so hopefully, this or some version of this will make it into the next release.
Here are the modifications that need to be merged in.

If content length of all parts in multipart entity is known (returned as a non negative value) the entity will be treated as repeatable. The easiest way to make multipart entity repeatable is to make all its parts repeatable.

How to upload multiple JSON files into CouchDB

I am new to CouchDB. I need to get 60 or more JSON files in a minute from a server.
I have to upload these JSON files to CouchDB individually as soon as I receive them.
I installed CouchDB on my Linux machine.
I hope some one can help me with my requirement.
If possible can someone help me with pseudo code.
My Idea:
Is to write a python script for uploading all JSON files to CouchDB.
Each and every JSON file must be each document and the data present in
JSON must be inserted same into CouchDB
(the specified format with values in a file).
Note:
These JSON files are Transactional, every second 1 file is generated
so I need to read the file upload as same format into CouchDB on
successful uploading archive the file into local system of different folder.

python program to parse the json and insert into CouchDb:
import sys
import glob
import errno,time,os
import couchdb,simplejson
import json
from pprint import pprint
couch = couchdb.Server() # Assuming localhost:5984
#couch.resource.credentials = (USERNAME, PASSWORD)
# If your CouchDB server is running elsewhere, set it up like this:
couch = couchdb.Server('http://localhost:5984/')
db = couch['mydb']
path = 'C:/Users/Desktop/CouchDB_Python/Json_files/*.json'
#dirPath = 'C:/Users/VijayKumar/Desktop/CouchDB_Python'
files = glob.glob(path)
for file1 in files:
#dirs = os.listdir( dirPath )
file2 = glob.glob(file1)
for name in file2: # 'file' is a builtin type, 'name' is a less-ambiguous variable name.
try:
with open(name) as f: # No need to specify 'r': this is the default.
#sys.stdout.write(f.read())
json_data=f
data = json.load(json_data)
db.save(data)
pprint(data)
json_data.close()
#time.sleep(2)
except IOError as exc:
if exc.errno != errno.EISDIR: # Do not fail if a directory is found, just ignore it.
raise # Propagate other kinds of IOError.

I would use CouchDB bulk API, even though you have specified that you need to send them to db one by one. For example, by implementing a simple queue that gets sent out every say 5 - 10 seconds via a bulk doc call will greatly increase performance of your application.
There is obviously a quirk in that and that is you need to know the IDs of the docs that you want to get from the DB. But for the PUTs it is perfect. (it is not entirely true, you can get ranges of docs using bulk operation if the IDs you are using for your docs can be sorted nicely).
From my experience working with CouchDB, I have a hunch that you are dealing with Transactional documents in order to compile them into some sort of sum result and act on that data accordingly (maybe creating next transactional doc in series). For that you can rely on CouchDB by using 'reduce' functions on the views you create. It takes a little practice to get reduce function working properly and is highly dependent on what it is you actually what to achieve and what data you are prepared to emit by the view so I can't really provide you with more detail on that.
So in the end the app logic would go something like that:
get _design/someDesign/_view/yourReducedView
calculate new transaction
add transaction to queue
onTimeout
send all in transaction queue
If I got that first part of why you are using transactional docs wrong all that would really change is the part where you getting those transactional docs in my app logic.
Also, before writing your own 'reduce' function, have a look at buil-in ones (they are alot faster then anything outside of db engine can do)
http://wiki.apache.org/couchdb/HTTP_Bulk_Document_API
EDIT:
Since you are starting, I strongly recommend to have a look at CouchDB Definitive Guide.
NOTE FOR LATER:
Here is one hidden stone (well maybe not so much a hidden stone but not an obvious thing to look out for for the new-comer in any case). When you write reduce function make sure that it does not produce too much output for the query without boundaries. This will extremely slow down the entire view even when you provide reduce=false when getting stuff from it.

So you need to get JSON documents from a server and send them to CouchDB as you receive them. A Python script would work fine. Here is some pseudo-code:
loop (until no more docs)
get new JSON doc from server
send JSON doc to CouchDB
end loop
In Python, you could use requests to send the documents to CouchDB and probably to get the documents from the server as well (if it is using an HTTP API).

You might want to checkout the pycouchdb module for python3. I've used it myself to upload lots of JSON objects into couchdb instance. My project does pretty much the same as you describe so you can take a look at my project Pyro at Github for details.
My class looks like that:
class MyCouch:
""" COMMUNICATES WITH COUCHDB SERVER """
def __init__(self, server, port, user, password, database):
# ESTABLISHING CONNECTION
self.server = pycouchdb.Server("http://" + user + ":" + password + "#" + server + ":" + port + "/")
self.db = self.server.database(database)
def check_doc_rev(self, doc_id):
# CHECKS REVISION OF SUPPLIED DOCUMENT
try:
rev = self.db.get(doc_id)
return rev["_rev"]
except Exception as inst:
return -1
def update(self, all_computers):
# UPDATES DATABASE WITH JSON STRING
try:
result = self.db.save_bulk( all_computers, transaction=False )
sys.stdout.write( " Updating database" )
sys.stdout.flush()
return result
except Exception as ex:
sys.stdout.write( "Updating database" )
sys.stdout.write( "Exception: " )
print( ex )
sys.stdout.flush()
return None
Let me know in case of any questions - I will be more than glad to help if you will find some of my code usable.

REST resource returning different object depending on state

I'm trying to define a REST API and I'm having trouble with one requirement.
I have an action that the API user can do that is the same thing, but can be done in two different ways.
For example, say my user uses my API to change the intensity of a light. I will have an URL something like
api/light/intensity
One option the user has to change the intensity is to set as a % of the maximum luminosity, the other option is setting the intensity as an exact value, in lumens (there is a detector for that) and he can pass the "precision" that can be low, medium and high (it changes the time it takes to get to the correct intensity).
I want the user to be able to GET the current intensity, meaning in which mode he is and depending on the mode, the % or the value in lumens and the precision.
This is where I'm lost, my GET will return a JSON object for example, is it OK to send something like
{
"Mode" = "Percent",
"Percent" = 50.5
}
when I'm in "percentage" mode and
{
"Mode" = "Exact",
"Lumens" = 200,
"Precision" = "High"
}
When I'm in "lumens" mode?
If that seems OK, how would I tell the user which type of "object" he should parse?
What would be the best way to let the user send his changes? I was thinking about having two URL, one for each mode, like
PUT /api/light/intensity/exact and PUT /api/light/intensity/percent
And both being waiting for JSON objects similar to the ones above, without the Mode.

Use HTTP Content negotiation. This allows:
the client to tell the server what representation of a resource it wants to GET,
the server to tell the client what representation of a resource it returns to the client,
the client to tell the server what represenation of a resource it is PUTing to the server.
Define two vendor content types:
application/vnd.com.example.light.intensity.percentage+json
application/vnd.com.example.light.intensity.lumens+json
The client tells the server which of both it wants:
GET /api/light/intensity/
Accept: application/vnd.com.example.light.intensity+percentage
The server responds:
200 OK
Content-Type: application/vnd.com.example.light.intensity+percentage
{
"Percent" = 50.5
}
The client wants to change the intensity:
PUT /api/light/intensity/
Content-Type: application/vnd.com.example.light.intensity+percentage
{
"Percent" = 42.7
}
The server knows from the Content-Type header how to interpret the JSON body. In this example it handles the request as in 'Percent' mode.
If the second content type was used, client and server would know to interpret the request/response as in 'Lumes' mode.
Edit: Note that the GET and PUT request use the same URL because the requests are about the same resource: the light intensity. All that differs is the representation of this resource. The proper way to handle this are content types.

The specifics will depend a bit on your API, and the needs of your users. The same GET method call to a RESTful API should always return the same value: a representation of the resource as defined by the information in the URL and nothing else. If you're maintaining state in the system, you're violating a precept of REST. (Edit: as pointed out by Gimly, that statement is unclear. It's not a violation of RESTful design for the system to maintain its own internal state, especially if a request changes the state of the system with a PUT, POST or DELETE. It's a violation for a request to rely on that state to return a representation of the resource, or to request a state change. Each request should be self-contained.)
I'd use a query string to change the format of the representation:
GET /api/light/intensity
GET /api/light/intensity?f=percent
That way /api/light/intensity always refers to the same resource (defaulting to the "exact" representation, which has the most data), and the query string "filters" the representation, similarly to a search query. It removes some data (in this case, the exact luminosity and precision) in favor of a relative representation in percent of some maximum value. Alternately, you could think of it as controlling the output format: GET /foo.json vs GET /foo.xml. The resource is the same, but the representation differs.
For updating a resource, you can take an object as you've described. Your server will have to understand the different formats, but you could either PUT to the bare URL, or again use a query parameter to control the format expected by the server, and then let your payload be more abstract, using value instead of lumens or percentage:
PUT /api/light/intensity
Payload: {"value": 200, "precision": "high"}
PUT /api/light/intensity?f=percent
Payload: {"value": 50.5}
That allows you to structure the API for your light resource in such a way that intensity is one property of the resource. "Percent" then becomes a convenience representation in the output, so when you return the entire light resource, it would read something like:
"light": {
"name": "the light",
"id": 12345,
"intensity": 200,
"max-intensity": 400,
...
}
So the API user could calculate current percent based on intensity and max-intensity. (You could of course substitute "percent" for "max-intensity" and let the user do the math the other way, but it feels more natural to me to provide absolute values and let the math calculate relative values.
Edit
Please see Tichodroma's answer for the better way of handling this. I'm leaving the answer because the discussion in the comments was useful to me, and may be useful to others in the future.

We Keep Coding

html mysql json google-apps-script actionscript-3 ms-access google-chrome google-maps reporting-services sql-server-2008