Understanding results from Chrome request timing

I am doing some profiling of my web app and I am puzzled by the results I am getting from Chrome's built-in request timing. To keep things simple, let's say I request a web page composed of:
The HTML.
The static files (js, css, etc.).
Data loaded using AJAX.
According to the timing provided by Chrome the main HTML loading time is like this:
So far, so good. If I understand it, this means my server takes around 7-8 secs to process the request and send the response. In fact, according to this info, the first byte arrives at 7.79 secs, so the browser cannot start processing the HTML and loading other resources until this time.
However, if I look at the timing for static resources and AJAX requests, I found something puzzling. For instance, this is the timing for the request for bootstrap.min.css:
Notice the request was queued at 21.6 ms, much earlier than the arrival of the first byte of the HTML. By the way, the file was retrieved from cache (from disk cache, according to Chrome), but how did the browser know the HTML was going to request that file before receiving the response?
In the case of the AJAX request we have something similar:
but this time it is even stranger, because performing this request requires some JavaScript from the HTML to have been processed, yet the first byte had not been received when the AJAX request was queued!
How can this be possible? How am I supposed to interpret these results regarding the performance of the website (i.e. did the HTML really take 7 secs to load the very first byte)?

Related

How does this webpage data access work?

I'm trying to get data from this site: [1] https://www.eurobet.it/it/scommesse/#!/calcio/?temporalFilter=TEMPORAL_FILTER_OGGI_DOMANI
I found this link where I can get the data in JSON format: [2] https://www.eurobet.it/detail-service/sport-schedule/services/discipline/calcio?prematch=1&live=0&temporalFilter=TEMPORAL_FILTER_OGGI_DOMANI
But there is a problem:
The JSON link doesn't work every time; in fact, sometimes I get a 404 error.
I noticed that if I open the first link [1] before opening the second [2] it works perfectly.
This error is also more frequent when I try to scrape other data on the same site: [3] https://www.eurobet.it/detail-service/sport-schedule/services/discipline/calcio/piu-giocate/u-o-goal?prematch=1&live=0&temporalFilter=TEMPORAL_FILTER_OGGI_DOMANI
In this link [3] I try to get all the "u-o-goal" odds, but the link works only if (before starting my program to scrape data) I press the "U/O GOAL" button on the main page [1] -> https://i.stack.imgur.com/Nei5u.png
In my code, I'm using Java and htmlunit to scrape the data.
My question is: how does this webpage work, and why can't I open links [2]/[3] directly? I know there is some sort of request-and-approval mechanism behind it, but I can't see where.
You cannot directly open these URLs since the website (and many like it) uses cookies and bot-prevention/session-tracking techniques so they can gather data about usage of their website, e.g. they expect a "Referer" header to be set.
I'm not going to code a solution for you but I can at least help you understand what you need to do to get to where you want...
I've attempted to summarise how I'd typically unpick a request like this to recreate it, but in its essence, you need to understand the sequence of HTTP requests being made (this is how the web works - HTTP requests).
First you typically start with no session cookies and you access the site directly (no referer).
Once you access a website, typically the server responds with a session cookie for you to communicate back to the server a unique session ID so it has some sort of record of your browser having already been in contact.
Your browser may make more requests (asynchronously) and, in doing so, typically sends the cookies and the referring URL (usually the base URL will work... just don't use something that starts with anything other than "https://www.eurobet.it").
Anything else, you're going to need to figure out yourself. Lots of headers are optional. Lots of query params have defaults.
https://stackoverflow.com/a/64671815/7619034 - here's an answer I've given before that answers this type of question which comes up often enough.
So, to explain a bit further for your specific scenario...
When you access https://www.eurobet.it/it/scommesse/#!/calcio/?temporalFilter=TEMPORAL_FILTER_OGGI_DOMANI, the server responds with HTTP headers:
...
set-cookie: __cfduid=dd38d***********41125; ...
...
The rest doesn't look that relevant:
Going straight to the other request: https://www.eurobet.it/detail-service/sport-schedule/services/discipline/calcio?prematch=1&live=0&temporalFilter=TEMPORAL_FILTER_OGGI_DOMANI
This HTTP request takes (as input):
cookie: __cfduid=dd38d***********41125; mbox=session#6661556c.....b6e8cc1fa6f03#1608242987; at_check=true; s_ecid=MCMID%***********2021453010; AMCVS_45F10C3A53DAEC9F0A490D4D%40AdobeOrg=1; AMCV_45F10C3A53DAEC9F0A490D4D%40AdobeOrg=1075005958%7CMCIDTS%7C18614%7CMCMID%7C91883906030825914429183258312021453010%7CMCAID%7CNONE%7CMCOPTOUT-1608248327s%7CNONE%7CvVersion%7C4.4.1; s_cc=true
...
referer: https://www.eurobet.it/it/scommesse/
...
x-eb-accept-language: it_IT
x-eb-marketid: 5
x-eb-platformid: 1
Cookies are set in an initial request (typically) using the Set-Cookie header and are then passed back to the server in subsequent requests using the cookie header.
I'm not certain how many of these values are relevant, but you'd need to figure out where each came from in the chain of HTTP requests between the initial one and this one, and you'd need to replicate them (see the URL of my previous answer above; warning: this can be time consuming).
The other headers can most likely be set statically since they probably aren't going to change.
If you have access to curl on the command line, you can attempt to reconstruct some of these requests by hand. Some will be time sensitive since cookies do expire after an amount of time (see set-cookie header details for exactly when). Once you've reconstructed a working request, you can then start coding it in your application.
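To make this concrete, here is a minimal sketch in Java (the language mentioned in the question), using the built-in java.net.http.HttpClient: it visits the main page first so the session cookies get set, then replays them together with a Referer against the JSON endpoint. The x-eb-* headers are copied from the capture above; which of them the server actually checks is an assumption you would have to verify.

import java.net.CookieManager;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class EurobetFetchSketch {
    public static void main(String[] args) throws Exception {
        // The CookieManager stores any Set-Cookie values and sends them back automatically
        CookieManager cookies = new CookieManager();
        HttpClient client = HttpClient.newBuilder().cookieHandler(cookies).build();

        // 1) Visit the main page first so the server issues its session cookies
        HttpRequest mainPage = HttpRequest.newBuilder()
                .uri(URI.create("https://www.eurobet.it/it/scommesse/"))
                .GET()
                .build();
        client.send(mainPage, HttpResponse.BodyHandlers.discarding());

        // 2) Request the JSON endpoint, passing a Referer plus the x-eb-* headers
        //    seen in the browser capture (assumption: these are what the server checks)
        HttpRequest json = HttpRequest.newBuilder()
                .uri(URI.create("https://www.eurobet.it/detail-service/sport-schedule/services/discipline/calcio"
                        + "?prematch=1&live=0&temporalFilter=TEMPORAL_FILTER_OGGI_DOMANI"))
                .header("Referer", "https://www.eurobet.it/it/scommesse/")
                .header("x-eb-accept-language", "it_IT")
                .header("x-eb-marketid", "5")
                .header("x-eb-platformid", "1")
                .GET()
                .build();
        HttpResponse<String> response = client.send(json, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode());
        System.out.println(response.body());
    }
}

If this still returns a 404, compare the headers your browser sends with what the sketch sends and add the missing ones one at a time until it works.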
If you can work all this out you should be able to re-construct the chain of HTTP GET requests to get the JSON data you want. Good luck!

Google CDN behavior when serving concurrent (chunked) requests

I'm trying to understand the Google CDN behavior in the following scenario:
Let's assume I have a backend service serving chunked HTTP data. For the sake of the explanation, let's assume that serving a single request takes up to 10s.
Let's imagine the case where a file is requested through the CDN by a client A, and that this file is not currently cached in the CDN. The request will go to the backend service, which starts serving the file. Client A will immediately start receiving HTTP chunks.
After 5s, another client B requests the same file. I can envision 3 possible behaviors, but I can't figure out how to control this through the CDN configuration:
Option a: the CDN simply passes the request to the backend service, ignoring that half of the file has already been served and could already be cached. Not desirable, as the backend service will be reached twice and serve the same data twice.
Option b: the CDN puts the second request on "hold", waiting for the first one to finish before serving client B from its cache (in that case, request B does not reach the backend service). OK, but still not great, as client B will wait 5s before getting any HTTP data.
Option c: the CDN immediately serves the first half of the HTTP chunks and then serves the remaining chunks at the same pace as request A. Ideal!
Any ideas on the current behavior? And what could we do to get option C, which is by far our preferred option?
Tnx, have a great day!
Jeannot
It is important to note that GFE historically cached only complete responses and stored each response as a single unit. As a result, the current behavior will follow option A. You can take a look at this help center article for more details.
However, with the introduction of Chunk caching, which is currently in Beta, large response bodies are treated as a sequence of chunks that can each be cached independently. Response bodies less than or equal to 1 MB in size can be cached as a unit, without using chunk caching. Response bodies larger than 1 MB are never cached as a unit. Such resources will either be cached using chunk caching or not cached at all.
Only resources for which byte range serving is supported are eligible for chunk caching. GFE caches only chunk data received in response to byte range requests it initiated, and GFE initiates byte range requests only after receiving a response that indicates the origin server supports byte range serving for that resource.
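As a quick way to check whether your backend advertises byte range support (and is therefore eligible for chunk caching as described above), a hedged sketch like the following can help; the URL is a placeholder, and Java is used only to stay consistent with the other examples in this document.

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class RangeSupportProbe {
    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();

        // Ask only for the first 1024 bytes; an origin that supports byte range
        // serving should answer 206 Partial Content with a Content-Range header
        HttpRequest probe = HttpRequest.newBuilder()
                .uri(URI.create("https://origin.example.com/large-file.bin")) // placeholder URL
                .header("Range", "bytes=0-1023")
                .GET()
                .build();

        HttpResponse<Void> response = client.send(probe, HttpResponse.BodyHandlers.discarding());
        System.out.println("Status: " + response.statusCode()); // 206 => ranges supported, 200 => full body sent
        response.headers().firstValue("Accept-Ranges")
                .ifPresent(v -> System.out.println("Accept-Ranges: " + v));
        response.headers().firstValue("Content-Range")
                .ifPresent(v -> System.out.println("Content-Range: " + v));
    }
}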
To be more clear, once Chunk caching is in GA, you would be able to achieve your preferred option C.
Regarding your recent query, unfortunately, only resources for which byte range serving is supported are eligible for chunk caching at the moment. You can definitely create a feature request for your use case at Google Issue Trackers.
The good news is that chunk caching with Cloud CDN is now in GA and you can check the functionality anytime you wish.

HTML5 video fully buffered but continuously send network request when seeking

When I have fully buffered an mp4 file and seek, the browser continuously sends network requests and buffers (sometimes ~100ms, sometimes 500ms, depending on the network). Does anyone know why?
Even if I use a local file and seek, it also sends Range requests!
I think if you look at the request in detail you'll see the requests are cancelled (on Chrome anyway, which is what you are using above). See below for example (this happens when moving back along the timeline of a short video):
I suspect that the browser is simply making the request first as an optimisation and then cancelling it when it checks and confirms that it has the video already buffered.
You should also be able to see that the first request has a range request from 0 onwards, while the requests made when you move along the time bar have an offset reflecting where you moved to:
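If you want to reproduce what the browser is doing by hand, here is a rough sketch (assuming a hypothetical video URL and a server that supports byte range serving) that issues the initial "from byte 0" request and a seek-style request with an offset, and prints the Content-Range returned for each.

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class VideoRangeDemo {
    // Hypothetical video URL; any server that supports byte range serving will do
    private static final String VIDEO_URL = "https://media.example.com/clip.mp4";

    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        printRange(client, "bytes=0-");        // what the browser sends when playback starts
        printRange(client, "bytes=1500000-");  // what it sends after seeking further into the file
    }

    private static void printRange(HttpClient client, String range) throws Exception {
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create(VIDEO_URL))
                .header("Range", range)
                .GET()
                .build();
        // Discard the body; we only care about the status code and Content-Range header
        HttpResponse<Void> response = client.send(request, HttpResponse.BodyHandlers.discarding());
        System.out.println(range + " -> " + response.statusCode() + " "
                + response.headers().firstValue("Content-Range").orElse("(no Content-Range)"));
    }
}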

Understanding Firebug's Net panel

I am trying to get the hang of analysing the performance of a web page using Firebug's Net panel.
The following screenshot shows an example of a Google query. For the sake of this discussion I clicked twice, so some requests are cached.
So here are my questions:
1) What is happening between the end of the first request and the beginning of the next request (which is the third one)? In the same context: why does the third request start earlier than the second request?
2) The next 6 requests come from the cache. The purple bar indicates waiting time, and I assumed this is the time the browser is "waiting for the server to do something". So, as the response comes from the cache, what exactly is the browser waiting for? Also: what could be the reason that the waiting time for a 4.4 KB response is longer (63 ms) than for a 126.3 KB response (50 ms)?
3) In the next request there is a fairly long green bar indicating the time spent receiving the response. How come this doesn't seem to be at least roughly proportional to the size of the response?
4) The red vertical line indicates the load event. According to https://developer.mozilla.org/en-US/docs/Web/Events/load this means: "The load event is fired when a resource and its dependent resources have finished loading." In the timeline you can see that there are still a couple of requests performed after the load event. How come? Are they considered not to be dependent, and if so, why?
The response of the first request needs to be parsed to find out what else needs to be loaded. This parsing time causes the gap to the second request. See also my answer to a related question.
Responses coming from cache still have an associated network request, which returns the 304 HTTP status code. You can see the request and response headers, as well as the response headers of the cached response, when you expand the request.
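To see such a revalidation round trip outside the browser, here is a small sketch (in Java, only for illustration; the URL is a placeholder) that fetches a resource, remembers its ETag, and then sends a conditional request with If-None-Match. A server that still has the same version answers 304 Not Modified with an empty body, which is exactly what those cached entries in the Net panel correspond to.

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class ConditionalRequestDemo {
    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        String url = "https://www.example.com/static/app.css"; // placeholder resource

        // First request: fetch the resource and remember its ETag
        HttpRequest first = HttpRequest.newBuilder().uri(URI.create(url)).GET().build();
        HttpResponse<String> fresh = client.send(first, HttpResponse.BodyHandlers.ofString());
        String etag = fresh.headers().firstValue("ETag").orElse(null);
        System.out.println("First request: " + fresh.statusCode() + ", ETag=" + etag);

        if (etag != null) {
            // Revalidation: this is essentially what the browser does for cached entries
            HttpRequest revalidate = HttpRequest.newBuilder()
                    .uri(URI.create(url))
                    .header("If-None-Match", etag)
                    .GET()
                    .build();
            HttpResponse<String> check = client.send(revalidate, HttpResponse.BodyHandlers.ofString());
            // 304 means "use your cached copy"; the response body is empty
            System.out.println("Revalidation: " + check.statusCode());
        }
    }
}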
In contrast to that, there is also a kind of response that is served directly from a special cache called the Back-Forward Cache (or BFCache). These responses happen directly after the browser starts, when you have the option enabled to restore your tabs from the last session, and also when you navigate back and forth in the tab history.
This depends on the network connection speed and the response's size in the first place but also on how long the server takes to send the full response. Why that one request takes that long in comparison to the others can't be explained without knowing what happens on the server side.
The load event is fired when the page request is loaded, including all its dependent resources like CSS, images, JavaScript sources, etc. Requests initiated after the load event are loaded asynchronously, e.g. through an XMLHttpRequest or the defer attribute of the script element.

Determine which webpage an object in a packet capture is associated with

I am currently working on taking a packet capture and working backwards to determine which objects are associated with each page request. For example, if a packet capture contains requests for 2 different webpages, I want to be able to determine, for each object (TCP stream), which root page it is associated with. Is there an easy way to do this?
I know there are tools that will isolate the TCP streams and pull out the data within them; however, I am not looking to replicate the webpage. I am simply looking to associate each stream with the original page that requested it.
What you are trying to do is reconstructing the "call graph" of a browsing session. For a simple analysis, you can inspect just the HTTP headers. Bro makes this process very convenient. If site A loads site B, A typically shows up in the Referer header of B.
However, if you aim for completeness, this task becomes a daunting challenge: you need to parse the HTTP body payload and even JavaScript to determine all the URLs that are being created at runtime in the client, e.g., via AJAX, iframes, and friends.
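As a rough illustration of the simple, header-only approach (not a complete solution), the sketch below groups requests under the root page they trace back to by following Referer chains. The Request record and the sample data are hypothetical stand-ins for whatever your capture tool extracts.

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class RefererGraph {
    // Hypothetical stand-in for one HTTP request pulled out of the capture
    record Request(String url, String referer) {}

    /** Follows Referer links until a request with no referer (a root page) is found. */
    static String rootOf(String url, Map<String, String> refererByUrl) {
        String current = url;
        // Guard against cycles with a simple hop limit
        for (int hops = 0; hops < 100; hops++) {
            String referer = refererByUrl.get(current);
            if (referer == null || referer.equals(current)) {
                return current;
            }
            current = referer;
        }
        return current;
    }

    public static void main(String[] args) {
        // Hypothetical example data; in practice this comes from the packet capture
        List<Request> requests = List.of(
                new Request("https://site-a.example/", null),
                new Request("https://site-a.example/style.css", "https://site-a.example/"),
                new Request("https://cdn.example/lib.js", "https://site-a.example/"),
                new Request("https://site-b.example/", null),
                new Request("https://site-b.example/logo.png", "https://site-b.example/"));

        Map<String, String> refererByUrl = new HashMap<>();
        for (Request r : requests) {
            refererByUrl.put(r.url(), r.referer());
        }

        // Group each object under the root page it traces back to
        Map<String, List<String>> objectsByPage = new HashMap<>();
        for (Request r : requests) {
            objectsByPage.computeIfAbsent(rootOf(r.url(), refererByUrl), k -> new ArrayList<>()).add(r.url());
        }
        objectsByPage.forEach((page, objects) -> System.out.println(page + " -> " + objects));
    }
}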