When an image is uploaded from the client's machine to the browser, the FileReader API in JavaScript is required; thereafter a base64-encoded URL (say) of the image is sent in chunks to the server, where it needs to be reassembled. All of this is taken care of by the developer.
However, when an image is sent from the server to the client, only the path of the image on the server machine suffices; no chunking or encoding is required.
My questions are:
1. Does the server send the image in chunks to the HTML file? If it does not, how does sending full images not bottleneck the server's network? What would happen in the case of large video files?
2. In what form of binary data is the image sent to the client - a base64 data URL / ArrayBuffer / binary string / text / etc.?
3. If the server does send the image in chunks, who does the chunking, and who reassembles the chunks on the client?
Thanks.
HTML isn't really important here. What you care about are the transport protocols used - HTTP and TCP, most likely.
HTTP isn't chunked by default, though there are headers that allow it - those are mostly used to support seeking in large files (e.g. PDF, video). Technically, this isn't really chunking - it's just the infrastructure for partial downloads (i.e. "Give me data from byte 1024 to byte 2048.").
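For illustration, here's a minimal browser-side sketch of such a partial download; the URL is hypothetical, and it only behaves as expected if the server actually supports byte ranges:

    // Minimal sketch of a partial download; '/video.mp4' is a hypothetical URL.
    async function fetchSlice() {
      const response = await fetch('/video.mp4', {
        headers: { Range: 'bytes=1024-2048' },
      });
      // 206 Partial Content means the server honored the range;
      // 200 OK means it ignored the header and sent the whole file.
      console.log(response.status, response.headers.get('Content-Range'));
      return response.arrayBuffer(); // just the requested slice
    }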
TCP is a stream-based protocol. From the programmer's point of view, that's all there is to it - no chunking. Technically, though, it will process your input data and send it as distinct packets that are reassembled in order on the other side. This is a matter of practicality - it allows for managing latency, streaming, packet retransmission etc. TCP handles the specifics during connection negotiation - flow control, window sizing, congestion control...
That's not the end of it, though. All the other layers add their own bits - their own ways to package the payload and split it as necessary, their own ways to handle routing and filtering, their own ways to handle congestion...
Finally, just as HTTP natively supports downloads, it supports uploads as well. Just send an HTTP request (usually POST or PUT) and write the data in a format the server understands - anything from text through Base-64 to raw binary data. The limiting factor in your case isn't the server, the browser or HTTP - it's JavaScript. The basic mechanism is still the same - a request followed by a response.
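As a rough sketch of such an upload from the browser (the /upload endpoint is hypothetical; how the server stores the bytes is up to you):

    // Send a user-selected file to the server as raw binary.
    // 'file' is a File object from an <input type="file">.
    async function uploadFile(file) {
      const response = await fetch('/upload', {
        method: 'POST',
        headers: { 'Content-Type': file.type || 'application/octet-stream' },
        body: file, // the browser reads and transmits the bytes for you
      });
      return response.ok;
    }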
Now, to address your questions:
1. The server doesn't send images to the HTML file. HTML only contains a URL of the image[1], and when the browser sees a URL in an img tag, it initiates a new, separate request just for the image data. It isn't fundamentally different from downloading a file from a link. The transfer itself works pretty much the same way as for the original HTML document - HTTP headers, then the payload.
2. Usually, raw binary data. HTTP is a text-based protocol, but its payload can be arbitrary. There's little reason to use Base-64 to transfer image data (though in the past, there were HTTP and FTP servers that didn't support binary at all, so you had to use something like Base-64).
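If you want to see those raw bytes yourself, here's a small sketch (the image path is hypothetical):

    // Fetch an image and inspect the bytes the server sent.
    async function inspectImage() {
      const response = await fetch('/cat.png');
      console.log(response.headers.get('Content-Type')); // e.g. "image/png"
      const bytes = new Uint8Array(await response.arrayBuffer());
      console.log(bytes.slice(0, 4)); // PNG magic bytes: 0x89 0x50 0x4E 0x47
    }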
3. The HTTP server doesn't care (with the exception of the "partial downloads" mentioned above). The underlying network protocols handle this.
[1] Nowadays, there are ways to embed images directly in the HTML text (e.g. as data: URIs), but their practicality varies depending on the image size, caching requirements etc.
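For example, a rough Node.js sketch of building such an inline image (the file name is hypothetical):

    // Inline an image into the HTML itself as a data: URI, trading an
    // extra HTTP request for a larger, harder-to-cache document.
    const fs = require('fs');
    const b64 = fs.readFileSync('logo.png').toString('base64');
    const html = `<img src="data:image/png;base64,${b64}">`;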
I'm trying to understand the Google CDN behavior in the following scenario:
Let's assume I have a backend service serving chunked HTTP data. For the sake of the explanation, let's assume that serving a single request takes up to 10s.
Let's imagine the case where a file is requested through the CDN by a client A, and this file is not currently cached in the CDN. The request will go to the backend service, which starts serving the file. Client A will immediately start receiving HTTP chunks.
After 5s, another client B requests the same file. I can envision 3 possible behaviors, but I can't figure out how to control this through the CDN configuration:
Option a: the CDN simply passes the request to the backend service, ignoring that half of the file has already been served and could already be cached. Not desirable, as the backend service will be reached twice and serve the same data twice.
Option b: the CDN puts the second request on "hold", waiting for the first one to finish before serving client B from its cache (in that case, request B does not reach the backend service). OK, but still not amazing, as client B will wait 5s before getting any HTTP data.
Option c: the CDN immediately serves client B the first half of the HTTP chunks, then serves the remaining chunks at the same pace as request A. Ideal!
Any ideas on the current behavior? And what could we do to get option C, which is by far our preferred option?
Thanks, have a great day!
Jeannot
It is important to note that GFE historically cached only complete responses and stored each response as a single unit. As a result, the current behavior follows option A. You can take a look at this help center article for more details.
However, with the introduction of Chunk caching, which is currently in Beta, large response bodies are treated as a sequence of chunks that can each be cached independently. Response bodies less than or equal to 1 MB in size can be cached as a unit, without using chunk caching. Response bodies larger than 1 MB are never cached as a unit. Such resources will either be cached using chunk caching or not cached at all.
Only resources for which byte range serving is supported are eligible for chunk caching. GFE caches only chunk data received in response to byte range requests it initiated, and GFE initiates byte range requests only after receiving a response that indicates the origin server supports byte range serving for that resource.
To be clear, once Chunk caching is in GA, you will be able to achieve your preferred option C.
Regarding your recent query: unfortunately, only resources for which byte range serving is supported are eligible for chunk caching at the moment. You can definitely create a feature request for your use case in the Google Issue Tracker.
The good news is that chunk caching with Cloud CDN is now in GA and you can check the functionality anytime you wish.
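For reference, here's a minimal Node.js sketch of an origin that advertises and honors byte ranges - the prerequisite for chunk caching described above. The file name is hypothetical, and a production origin would need proper validation and error handling:

    // Origin that supports byte range serving (Node.js core only).
    const http = require('http');
    const fs = require('fs');

    http.createServer((req, res) => {
      const size = fs.statSync('bigfile.bin').size;
      const match = /^bytes=(\d*)-(\d*)$/.exec(req.headers.range || '');

      if (!match) {
        // Advertise range support so caches know they may request chunks.
        res.writeHead(200, { 'Accept-Ranges': 'bytes', 'Content-Length': size });
        fs.createReadStream('bigfile.bin').pipe(res);
        return;
      }

      const start = match[1] ? Number(match[1]) : 0;
      const end = match[2] ? Number(match[2]) : size - 1;
      res.writeHead(206, {
        'Accept-Ranges': 'bytes',
        'Content-Range': `bytes ${start}-${end}/${size}`,
        'Content-Length': end - start + 1,
      });
      fs.createReadStream('bigfile.bin', { start, end }).pipe(res);
    }).listen(8080);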
I have a RESTful server that is supposed to return a large JSON object (more specifically, an array of objects) to browsers. For example, 30,000 points have a size of 6.5 MB.
But I get a content mismatch error in the browser when the connection is slow. I suspect it is because sending large data through the REST API breaks something. Even in Postman it sometimes fails to render, even though I can see that 6.5 MB of data was received.
My server is in Node.js, and the returned Content-Type header is application/json.
My question is:
Would it make more sense if I returned a .json file? Will the browser be able to handle it? If yes, then I will download the file and make the front-end changes.
Old URL - http://my-rest-server/data
Proposed Url - http://my-rest-server/data.json
What would be the Content-Type for the proposed URL?
Your client can't possibly expect to get all of the data at once and still get it fast.
...but you might want to look into sending data in chunks and streams:
https://medium.freecodecamp.org/node-js-streams-everything-you-need-to-know-c9141306be93
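For illustration, here's a minimal Node.js sketch that streams a large JSON array element by element instead of buffering the whole 6.5 MB response in memory first; getPoints() is a hypothetical async iterator over your data source:

    const http = require('http');

    http.createServer(async (req, res) => {
      res.writeHead(200, { 'Content-Type': 'application/json' });
      res.write('[');
      let first = true;
      for await (const point of getPoints()) { // hypothetical data source
        res.write((first ? '' : ',') + JSON.stringify(point));
        first = false;
      }
      res.end(']');
    }).listen(3000);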
Is the HTTP response always in HTML? When a mobile application uses HTTP to interact with a back-end server and seeks some data in response, what is the format of that data? Is it always HTML?
No, of course not! HTTP is just a protocol for transferring data from the server to the client.
It's up to the client to decide what to do with the data. If you want, it can be JSON, XML, HTML or just plain text. That's why you can have a look at the source of a webpage: in that case the web browser shows you (almost) the raw data which the server sent.
It's up to you how you want to handle it - as HTML-syntaxed text, plain text, binary, or whatever you can imagine.
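As a small illustration, here's a client-side sketch that picks a parser based on the Content-Type header the server sent:

    // Let the Content-Type header tell the client how to parse the body.
    async function loadResource(url) {
      const response = await fetch(url);
      const type = response.headers.get('Content-Type') || '';
      if (type.includes('application/json')) return response.json();
      if (type.includes('text/')) return response.text();
      return response.arrayBuffer(); // fall back to raw binary
    }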
I am looking for a way to dump the HTTP request & response bodies (JSON format) in RESTEasy on WildFly 8.2.
I've checked this answer (Dump HTTP requests in WildFly 8), but it just dumps the headers.
I want to see the incoming JSON message, and the outgoing one as well.
Can this be done through configuration, without a filter or any coding?
Logging HTTP bodies is not something frequently done, which is probably the primary reason why RequestDumpingHandler in Undertow only logs the header values. Also keep in mind that the request body is not always very interesting to log - think, for example, of WebSockets or big file uploads. You can write your own MessageBodyReader/Writer for JAX-RS, write to a ByteArrayOutputStream first, and then log the captured content before passing it on. However, given how rarely this is feasible in production, I think you're mostly interested in how to do this during development.
You can capture HTTP traffic (and in fact any network traffic) using tcpflow or Wireshark. Sometimes people use tools such as netcat to quickly write traffic to a file. You can also use, for example, the Chrome debugger to read HTTP requests/responses (with their contents).
I am currently trying to take a packet capture and work backwards to determine which objects are associated with each page request. For example, if a packet capture contains requests for 2 different webpages, I want to be able to determine, for each object (TCP stream), which root page it is associated with. Is there an easy way to do this?
I know there are tools that will isolate the TCP streams and pull out the data within them; however, I am not looking to replicate the webpages. I simply want to associate each stream with the original page that requested it.
What you are trying to do is reconstruct the "call graph" of a browsing session. For a simple analysis, you can inspect just the HTTP headers; Bro makes this process very convenient. If site A loads site B, A typically shows up in the Referer header of B's request.
However, if you aim for completeness, this task becomes a daunting challenge: you need to parse the HTTP body payload and even the JavaScript to determine all the URLs that are created at runtime in the client, e.g., via AJAX, iframes, and friends.
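As a toy illustration of the Referer heuristic, assuming you've already extracted { url, referer } pairs from the capture (e.g. from Bro's HTTP log):

    // Group requests under the page that (apparently) triggered them.
    function groupByRootPage(requests) {
      const groups = new Map();
      for (const { url, referer } of requests) {
        const root = referer || url; // no Referer => treat the URL as a root page
        if (!groups.has(root)) groups.set(root, []);
        groups.get(root).push(url);
      }
      // Note: this is only one level deep - a nested load is grouped under
      // its immediate parent, not the original root page.
      return groups;
    }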