Streaming without Content-Length in response - html

I'm using Node.js, Express (and connect), and fluent-ffmpeg.
We want to stream audio files that are stored on Amazon S3 through http.
We have all working, except that we would like to add a feature, the on-the-fly conversion of the stream through ffmpeg.
This is working well, the problem is that some browsers checks in advance before actually getting the file.
Incoming requests containing the Range header, for which we reply with a 206 with all the info from S3, have a fundamental problem: we need to know in advance the content-length of the file.
We don't know that since it is going through ffmpeg.
One solution might be to write out the resulting content-length directly on S3 when storing the file (in a special header), but this means we have to go through the pain of having queues to encode after upload just to know the size for future requests.
It also means that if we change compressor or preset we have to go through all this over again, so it is not a viable solution.
We also noticed big differencies in the way Chrome and Safari request the audio tag src, but this may be discussion for another topic.
Fact is that without a proper content-length header in response everything seems to break or browsers goes in an infinite loop or restart the stream at pleasure.
Ideas?

This seems to be working for me.
It would be great if you could confirm whether it gives the expect results in your browsers too.
res.writeHead(200, {
'Transfer-Encoding': 'chunked'
, 'Content-Type': 'audio/mpeg'
, 'Accept-Ranges': 'bytes' //just to please some players, we do not actually allow seeking
});
Basically, you tell the browser that you are going to stream using chunked encoding. An issue may be that some browsers do not like streaming without know how much bytes they should expect in total.

Related

How to force re-validation of cached resource?

I have a page /data.txt, which is cached in the client's browser. Based on data which might be known only to the server, I now know that this page is out of date and should be refreshed. However, since it is cached, they will not re-request it for a long time (until the cache expires).
The client is now requesting a different page /foo.html. How can I make the client's browser re-request /data.txt and update its cache?
This should be done using HTTP or HTML (not all clients have JS).
(I specifically want to avoid the "cache-busting" pattern of appending version numbers to the /data.txt URL, like /data.txt?v=2. This fills the cache with useless entries rather than replacing expired ones.)
Edit for clarity: I specifically want to cache /data.txt for a long time, so telling the client not to cache it is unfortunately not what I'm looking for (for this question). I want /data.txt to be cached forever until the server chooses to invalidate it. But since the user never re-requests /data.txt, I need to invalidate it as a side effect of another request (for /foo.html).
To expand my comment:
You can use IF-Modified-Since and Etag, and to invalidate the resource that has been already downloaded you may take a look at the different approaches suggested in Clear the cache in JavaScript and fetch(), how do you make a non-cached request?, most of the suggestions there mentioned fetching the resource from JavaScript with no-cache header fetch(url, {cache: "no-store"}).
Or, if you can try sending a Clear-Site-Data header if your clients' browsers are supported.
Or maybe, give up this time only for the cache-busting solution. And if it's possible for you, rename the file to something else rather than adding a querystring as suggested in Revving Filenames: don’t use querystring.
Update after clarification:
If you are not maintaining a legacy implementation with users that already have /data.txt cached, the use of Etag And IF-Modified-Since headers should help.
And for the users with the cached versions, you may redirect to: /newFile.txt or /data.txt?v=1 from /foo.html. The new requests will have the newly added headers.
The first step is to fix your cache headers on the data.txt resource so it uses your desired cache policy (perhaps Cache-Control: no-cache in conjunction with an ETag for conditional validation). Otherwise you're just going to have this problem over and over again.
The next step is to get clients who have it in their cache already to re-request it. In general there's no automatic way to achieve this, but if you know they're accessing foo.html then it should be possible. On that page you can make an AJAX request to data.txt with the Cache-Control: no-cache request header. That should force the browser to bypass the cache and get a fresh version, and the cache should then be repopulated with the new version.
(At least, that's how it's supposed to work. I've never tried this, and I've seen reports here that browsers don't handle Cache-Control request headers properly.)

Handling large data through restapi

I have a RestFul server that is suppuse to return a large json object more specifically an array of objects to browsers. For example 30,000 points will have a size of 6.5mb.
But I get this content mismatch error in browser when speed is slow. I feel it is because large data throught rest api breaks up something. Even in Postman sometimes it fails to render even though i see data of 6.5 mb received.
My Server is in NodeJS. and return content-type header is application/json.
My Question is
Would it make more sense if I return a .json file. Will the browser be able to handle. If yes, then I will download the file and make front end changes.
Old URL - http://my-rest-server/data
Proposed Url - http://my-rest-server/data.json
What would be content-type in the proposed url?
Your client can't possibly expect to want all of the data at once but still, want their data fast data.
...but you might want to look into sending data in chunks and streams:
https://medium.freecodecamp.org/node-js-streams-everything-you-need-to-know-c9141306be93

technical inquiry - HTML transfer of image from server to browser

When an image is uploaded from the client's machine to the client (browser), it requires FileReader() API in html, thereafter a base64 encoded url (say) of the image is sent in chunks to the server, where it needs to be re-assembled. All of this is taken care by the developer.
However, when an image is sent from the server to the client, only the directory path of the image inside the server machine suffices, no chunking and encoding is required.
My questions are:
1. Does the server send the image in chunks to the html file. If it does not, how does sending full images not bottle server's network? What would happen in case of large video files?
2. In what form of binary data is the image sent to the client - base64url / ArrayBuffer / binarystring / text / etc.
3. If the server does send the image in chunks, who is doing the chunking and the re-assembly on the client thereafter?
Thanks.
HTML isn't really important here. What you care about are the transport protocols used - HTTP and TCP, most likely.
HTTP isn't chunked by default, though there are advanced headers that do allow that - those are mostly used to support seeking in large files (e.g. PDF, video). Technically, this isn't really chunking - it's just the infrastructure for allowing for partial downloads (i.e. "Give me data from byte 1024 to byte 2048.").
TCP is a stream-based protocol. From programmer point of view, that's all there is to it - no chunking. Technically, though, it will process your input data and send it as distinct packets that are reassembled in-order on the other side. This is a matter of practicality - it allows you to manage data latency, streaming, packet retransmission etc. TCP handles the specifics during connection negotiation - flow control, window sizing, congestion control...
That's not the end of it, though. All the other layers add their own bits - their own ways to package the payload and split it as necessary, their own ways to handle routing and filtering, their own ways to handle congestion...
Finally, just like HTTP natively supports downloads, it supports uploading data as well. Just send an HTTP request (usually POST or PUT) and write data in a format the server understands - anything from text through base-64 to raw binary data. The limiting factor in your case isn't the server, the browser or HTTP - it's JavaScript. The basic mechanism is still the same - a request followed by a response.
Now, to address your questions:
Server doesn't send images to the HTML file. HTML only contains an URL of the image[1], and when the browser sees an URL in the img tag, it will initiate a new, separate request just for the image data. It isn't fundamentally different from downloading a file from a link. As for the transfer itself, it follows pretty much exactly the same way as the original HTML document - HTTP headers, then the payload.
Usually, raw binary data. HTTP is a text-based protocol, but it's payload can be arbitrary. There's little reason to use Base-64 to transfer image data (though in the past, there have been HTTP and FTP servers that didn't support binary at all, so you had to use something like Base-64).
The HTTP server doesn't care (with the exception of "partial downloads" mentioned above). The underlying network protocols handle this.
[1] Nowadays, there's methods to embed images directly in the HTML text, but it's of varying practicality depending on the image size, caching requirements etc.

How to get browsers to cache video response from Azure CDN?

I have uploaded an mp4 video animation to Azure Blob Storage. The headers are are all default apart from setting the Content-Type to video/mp4. The video can be accessed at http://paddingtondev.blob.core.windows.net/media/1001/animation_default_headers.mp4
I have an Azure CDN sitting over that blob storage account. The URL for the same video through the CDN is http://az593791.vo.msecnd.net/media/1001/animation_default_headers.mp4
When I access the blob-stored video through an HTML5 video element on a web page, the browser (have tested in FF and Chrome) receives the entire video in a 200 HTTP response. Further requests for that video then receive a 304 response from blob storage.
However, when you request the video through the Azure CDN, it helpfully returns it to you as a series of HTTP 206 partial responses. This is in response to the browsers specifying a Range header with the request.
However, further requests for the video through the CDN are NOT cached, and the whole video is re-downloaded by the browser (through a series of further 206 requests).
How do I ensure the video is cached? I understand the usefulness of partial responses, but in our case the video won't be seekable and we only play it when the whole file is downloaded. I can see a few approaches here, but none have helped so far:
Forbid Azure CDN from returning partial responses
Remove range header from original browser request somehow
Persuade browsers to cache 206 partial responses
I have tried adding a max-age Cache-Control header to the file but this had no impact. Ideally we wouldn't even hit Azure at all when re-loading the video (as it will never change), but I'm happy to accept the cost of the HTTP request to Azure if it subsequently returns a 304 .
Caching 206 responses is tricky. The RFC for the client requires that in order to cache the content the ETAG and the range requested must match exactly.
There are a couple of things you can check -
1) Verify that the ETAGS did not change on the request. From the description of your environment (and setting the content expiration date), this sounds unlikely, but it may be an avenue to pursue.
2) More likely is that the range requests are not lining up. A request for byte range 1000-->2000 and a second request of 1500-->2000 would not (per RFC) be served from the client cache. So you may be in a situation of seeing exactly what is supposed to happen with that particular format/client.
I'm pretty sure HTML5 only supports progressive download, so unless you wanted to reconsider the delivery this may be expected behavior.

Custom headers possible with URLRequest/URLStream using method GET?

Quite simple really:
var req:URLRequest=new URLRequest();
req.url="http://somesite.com";
var header:URLRequestHeader=new URLRequestHeader("my-bespoke-header","1");
req.requestHeaders.push(header);
req.method=URLRequestMethod.GET;
stream.load(req);
Yet, if I inspect the traffic with WireShark, the my-bespoke-header is not being sent. If I change to URLRequestMethod.POST and append some data to req.data, then the header is sent, but the receiving application requires a GET not a POST.
The documentation mentions a blacklist of headers that will not get sent. my-bespoke-header is not one of these. It's possibly worth mentioning that the originating request is from a different port on the same domain. Nothing is reported in the policyfile log, so it seems unlikely, but is this something that can be remedied by force loading a crossdomain.xml with a allow-http-request-headers-from despite the fact that this is not a crossdomain issue? Or is it simply an undocumented feature of the Flash Player that it can only send custom headers with a POST request?
From what I can gather, it seems like your assumption about the lack of custom headers support for HTTP GET is indeed an undocumented feature (or a bug?) in the standard libraries.
In any case, you might want to see if as3httpclient would fit your purposes and let you work around this issue. Here's a relevant snippet from a post in the blog of the developer of this library:
"I was not able to set the header of a
HTTP/GET request. Macromedia Flash
Player allows you set the header only
for POST requests. I discussed this
issues with Ted Patrick and he told me
how I can us Socket to achieve the
desired and he was very kind to give a
me code-snippet, which got me
started."
If this limitation was undocumented at one time, that's no longer the case. See:
http://livedocs.adobe.com/flex/3/langref/flash/net/URLRequest.html#requestHeaders
"[...] Due to browser limitations, custom HTTP request headers are only supported for POST requests, not for GET requests. [...]"