Issues with images, security and cache

I have some issues with images and secure access to cloud storage (S3, in this case).
I want clients to fetch files directly from S3 to avoid loading my server, so I have two options:
1.) Generate a signed URL on my server and send it back to the client to load from
2.) Generate an Authorization header that the client sends with its request
Both 1.) and 2.) work fine in general, but I run into trouble with images in HTML.
For 1.), images are re-requested every time because the URL changes on each page load, which renders caching useless.
For 2.), caching would work because the URL stays constant, but I have searched the net without luck for a way to add a custom HTTP request header, carrying my security token, to an image request.
So, what options are left for me? How do others solve this?
thanks
Alex
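
One possible direction for option 1.), sketched under assumptions: if the expiry baked into the signed URL is rounded to a fixed window rather than derived from the current second, every URL generated inside that window is identical, so the browser cache can reuse it. The HMAC scheme, `SECRET_KEY`, `WINDOW_SECONDS` and `sign_url` below are hypothetical stand-ins and not the real S3 signing algorithm; whether a given S3 SDK allows pinning the signing time like this would need to be checked separately.

```python
import hashlib
import hmac
from urllib.parse import urlencode

SECRET_KEY = b"demo-secret"   # hypothetical signing key
WINDOW_SECONDS = 3600         # hypothetical cache window (one hour)


def bucketed_expiry(now: int, window: int = WINDOW_SECONDS) -> int:
    """Return the end of the *next* window, so all requests made inside
    the same window share one expiry (and the URL lives >= one window)."""
    return (now // window + 2) * window


def sign_url(path: str, now: int) -> str:
    """Build path?expires=...&sig=... with a toy HMAC-SHA256 signature."""
    expires = bucketed_expiry(now)
    message = f"{path}\n{expires}".encode()
    sig = hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()
    return f"{path}?{urlencode({'expires': expires, 'sig': sig})}"


# Two page loads half an hour apart get the same URL; the next window does not.
url_a = sign_url("/img/a.png", now=7200)
url_b = sign_url("/img/a.png", now=7200 + 1800)
url_c = sign_url("/img/a.png", now=7200 + 3600)
print(url_a == url_b, url_a == url_c)
```

The trade-off in this sketch is that a URL stays valid for up to two windows, so the window size bounds both cache lifetime and exposure if a URL leaks.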

Related

Application Cache - HTML 5

One of the online documents about the HTML5 appcache says that the cached files get updated once an offline user reconnects. I checked the original HTML5 appcache specification from the W3C and could not find anything that supports this statement.
Does anyone know whether this is true?
Thanks in advance

MDN says the following, although if you scroll up on that page it says it's being deprecated.
If an application cache exists, the browser loads the document and its associated resources directly from the cache, without accessing the network. This speeds up the document load time.
The browser then checks to see if the cache manifest has been updated on the server.
If the cache manifest has been updated, the browser downloads a new version of the manifest and the resources listed in the manifest. This is done in the background and does not affect performance significantly.
And logic tells me that it would also depend on the app you're using, the server you're trying to connect to and any special settings it might have, how long your browser keeps its history, what it keeps, and, if you saved the page to view offline, whether you have all the code and images saved in the right location(s).
Example:
Imagine you saved a page to view offline, and that page has a JS event handler that runs a loop making an ajax request every n seconds to do something, like changing a number on the page for as long as you are online. If you connect to the internet while the loop is running and it makes the request to the proper URL with the right arguments, the request should go through, even though the URL in your browser might say something like file:///C:/Users/you/Desktop/....
I've done this before, even though my URL was like the one above. One time I was adding Braintree's drop-in JavaScript to a website and using its API on my backend. Loading the page while offline = nothing. Online = it updated the spot on the page just fine when I had the required arguments and it was pointing to the right URL. If I went offline again, I could refresh the page and see the same images loaded in the <div>, but I couldn't send any data with it.

Displaying remote URL

First I must explain I am a total newbie with regards to web design.
My question is as follows:
I would like to have a remote URL displayed through a different web server. The remote URL resides on an internal, firewalled server, and I would like to give public access to a single page by displaying it on a remote web server that has access to the firewalled page. I have tried iframes, but they load the page from the client's IP, so the page fails to display. I have limited access to the server (cPanel). Please advise how this is possible. The remote URL will require a login; I'm not sure whether this is relevant to the solution.

What you can do is create a page which makes a request to the firewalled page using cURL, HttpWebRequest, or any comparable technology available on the platform you have chosen. It can then trim out the headers and other tags which are not required and render the HTML in a div, or it can simply pass the entire response body through as the response of your page.
This way, no connection is made from the client end; only your server connects to the firewalled server, fetches the page from there, and in turn hands it back.
The only problem here is that forms, images and linked objects might not work properly. You might also have to parse them and replace their URLs so they point to your server, which in turn proxies them.
Here is an example of it:
https://proxify.net/
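
A minimal sketch of this fetch-and-rewrite idea, using only the Python standard library. The `/proxy?url=` route, `rewrite_links` and `fetch_via_server` are hypothetical names chosen for illustration; the sketch handles only double-quoted `href`/`src`/`action` attributes and does not deal with the login the question mentions.

```python
import re
import urllib.request
from urllib.parse import quote, urljoin

PROXY_PREFIX = "/proxy?url="   # hypothetical route on the public server

# Matches href="..." / src="..." / action="..." with double-quoted values.
_ATTR_RE = re.compile(r'\b(href|src|action)="([^"]*)"', re.IGNORECASE)


def rewrite_links(html: str, base_url: str) -> str:
    """Make every href/src/action point back at the proxy route."""
    def repl(match):
        absolute = urljoin(base_url, match.group(2))
        target = PROXY_PREFIX + quote(absolute, safe="")
        return f'{match.group(1)}="{target}"'
    return _ATTR_RE.sub(repl, html)


def fetch_via_server(internal_url: str, timeout: float = 10.0) -> str:
    """Fetch the firewalled page from the server side, then rewrite its links."""
    with urllib.request.urlopen(internal_url, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        html = resp.read().decode(charset, errors="replace")
    return rewrite_links(html, internal_url)


sample = '<img src="/a.png">'
print(rewrite_links(sample, "http://intranet.local/page"))
```

A regular expression is used here only to keep the sketch short; a real HTML parser would be more robust against unusual markup.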

Apache and HTML, post requests and actions - does an absolute URL leading to the same server get parsed as a local URL?

Not 100% sure if this is the right SE site to ask this, so feel free to move/warn me.
If I have a site www.mysite.com with a form on it and define its action as "http://www.mysite.com/handlepost" instead of "/handlepost", does it still get parsed as a local address by apache? That is, will apache figure out that I'm trying to send my form data to the same server the form resides on and do an automatic local post, or will the data be forced to make a round trip, going online, looking up the domain and actually being sent as an outside request?

Apache does not look at this information. It's your browser that does this job.
On the Apache side the job is only outputting content (HTML in this case); Apache does not care how you write your URLs in this content.
On the browser side the page is analysed and GET requests (images, etc.) are sent automatically to all the URLs collected. The browser SHOULD know that a relative URL /foo is in fact http://currentsite/foo (or it's a really dumb browser). That is its job. And then it's its job to send the request to the right server (and to know whether it should make a new DNS request, open a new HTTP connection, reuse an existing open connection, or open several connections in parallel, which browsers cap at a handful per host name, etc.). Apache does nothing in this part of the job.
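
The resolution step described above can be illustrated with Python's `urllib.parse.urljoin`, which follows the same relative-reference rules a browser applies. The page URL is a hypothetical example based on the question; the point is that both spellings of the form action end up as the same absolute URL before any request is sent.

```python
from urllib.parse import urljoin, urlsplit

page_url = "http://www.mysite.com/contact/form.html"  # hypothetical page holding the form

# What the client turns each form action into before sending the POST.
relative_target = urljoin(page_url, "/handlepost")
absolute_target = urljoin(page_url, "http://www.mysite.com/handlepost")

same = relative_target == absolute_target
print(relative_target)                    # http://www.mysite.com/handlepost
print(same)                               # True
print(urlsplit(relative_target).netloc)   # www.mysite.com
```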
So why are absolute URLs bad? Not because of the work the browser has to do to handle them (which is in fact nothing; its job is transforming relative URLs into absolute ones). It's because if your web application uses only relative URLs, the admin of the web server will have far more options for proxying your application. For example:
he will be able to serve your web application on several different DNS domains
(and so make the browser think it's talking to several servers, parallelizing static file downloads)
he could also use this multi-domain setup to deploy the application for different customers
he could build HTTPS access for external network access and HTTP (without the S) access on a local name for the local network
And if your application builds absolute URLs, these tasks become much harder.

Don't use absolute URLs. My feeling is that it will make a round trip in your case, since the action is given as a full URL, so it's better to use relative URLs.

What is the complete process from entering a url to the browser's address bar to get the rendered page in browser?

I've been thinking about this question for a long time. It is a big question, since it touches almost every corner of web development.
In my understanding, the process should be like this:
1.) Enter the URL into the address bar
2.) A request is sent to the DNS server based on your network configuration
3.) DNS resolves the domain name to the real IP address
4.) A request (with complete HTTP headers) is sent to port 80 of the server identified by the IP from step 3 (assuming we don't specify another port)
5.) The server finds the app listening on port 80 (let's say nginx here) and hands it the request, or forwards it to another server (in which case the server from step 3 acts like a load balancer)
6.) nginx tries to match the URL against its configuration and either serves a static page directly or invokes the corresponding script interpreter (e.g. PHP/Python) or another app to produce the dynamic content (with DB queries or other logic)
7.) The HTML is sent back to the browser with complete HTTP response headers
8.) The browser parses the HTML into a DOM using its parser
9.) External resources (JS/CSS/images/Flash/videos...) are requested in sequence (or not?)
10.) JS is executed by the JS engine
11.) CSS is processed by the CSS engine, and the HTML's display is adjusted based on the CSS (also in sequence or not?)
12.) If there's an iframe in the DOM, the same process is run separately for it, steps 1-12
The above is my understanding, but I don't know whether it's correct. How precise is it? Did I miss something?
If it's correct (or almost correct), I hope you will:
Make the description of a step more precise in your own words, or write your own steps if there is a big change
Give a deep explanation of each step you are most familiar with.
One answer per step. Others can add to it in each answer's comments.
And I hope this thread can help all web developers to better understand what we do every day.
I will update this question based on the answers.
Thanks.

As you say this is a broad question where it's possible to go into great detail on a number of topics. There's nothing wrong with the sequence you described, but you're leaving out a lot of detail. To mention a few:
The DNS layer can help direct clients to different servers based on geographical location to help with load balancing and latency minimization, and one server can respond to requests from many different DNS names.
A browser can make different types of requests (GET, POST, HEAD, etc), and usually includes several different headers including cookies, browser capabilities, language preferences, etc.
Most browsers usually maintain a cache in order to avoid downloading stuff many times, and use various techniques to determine whether the cached version of a file is valid.
In modern webpages there's often complex interaction between many different kinds of files (HTML, CSS, images, JavaScript, video, Flash, ...), and web developers often need detailed knowledge of differences among browsers in order to keep their pages working for everyone.
Each of these topics, and many more, could be discussed at length. Perhaps it's more practical to ask more specific questions about the topics you're interested in?

You type maps.google.com (a Uniform Resource Locator) into the address bar of your browser and press Enter.
The host name in a URL maps to one or more IP addresses. The mapping is stored on name servers, and the system that resolves it is called the Domain Name System (DNS).
The browser checks its cache to find the IP address for the host name.
If it doesn't find it there, it asks the OS to look up the IP address (gethostbyname).
The lookup then checks the router's cache.
It then checks the ISP's cache. If the address is not available there, the ISP's resolver makes a recursive query to different name servers.
The resolver queries the com name server (there are many top-level-domain name servers, such as 'in', 'mil', 'us', etc.), which refers it to the google.com name server.
The google.com name server finds the matching IP address for maps.google.com in its DNS records and returns it to your DNS recursor, which sends it back to your browser.
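
From a program's point of view, the whole lookup chain above hides behind a single OS resolver call. A minimal sketch using Python's `socket.getaddrinfo` (the modern counterpart of `gethostbyname`); `resolve` is a hypothetical helper name, and `localhost` is used so the example runs without network access.

```python
import socket
from typing import List


def resolve(host: str, port: int = 80) -> List[str]:
    """Ask the OS resolver for the addresses behind a host name.

    Caches, the router, the ISP resolver and the name servers are all
    consulted (as needed) behind this one call.
    """
    infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    addresses: List[str] = []
    for _family, _type, _proto, _canonname, sockaddr in infos:
        ip = sockaddr[0]
        if ip not in addresses:   # keep order, drop duplicates
            addresses.append(ip)
    return addresses


print(resolve("localhost"))
```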
The browser initiates a TCP connection with the server. It uses a three-way handshake:
The client machine sends a SYN packet to the server over the internet, asking if it is open for new connections.
If the server has open ports that can accept and initiate new connections, it responds with an acknowledgment of the SYN packet using a SYN/ACK packet.
The client receives the SYN/ACK packet from the server and acknowledges it by sending an ACK packet.
A TCP connection is then established for data transmission!
The browser sends a GET request asking for the maps.google.com web page. If you're entering credentials or submitting a form, this could be a POST request.
The server sends the response.
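
The connect, request and response steps above can be reproduced against a throwaway local server. This is only a sketch: `HelloHandler` and `fetch_once` are hypothetical names, the server runs on 127.0.0.1 instead of a real site, and the three-way handshake itself is performed by the kernel inside `socket.create_connection`.

```python
import socket
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class HelloHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        body = b"hello"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def fetch_once() -> bytes:
    server = HTTPServer(("127.0.0.1", 0), HelloHandler)  # port 0 = any free port
    port = server.server_address[1]
    worker = threading.Thread(target=server.handle_request, daemon=True)
    worker.start()

    # create_connection() returns once SYN, SYN/ACK and ACK have been exchanged.
    with socket.create_connection(("127.0.0.1", port), timeout=5) as sock:
        request = (
            "GET / HTTP/1.1\r\n"
            f"Host: 127.0.0.1:{port}\r\n"
            "Connection: close\r\n"
            "\r\n"
        )
        sock.sendall(request.encode("ascii"))
        chunks = []
        while True:               # read until the server closes the connection
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)

    worker.join(timeout=5)
    server.server_close()
    return b"".join(chunks)


raw_response = fetch_once()
print(raw_response.decode("iso-8859-1"))
```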
Once the server supplies the resources (HTML, CSS, JS, images, etc.) to the browser it undergoes the below process:
Parsing - HTML, CSS, JS
Rendering - Construct DOM Tree → Render Tree → Layout of Render Tree → Painting the render tree
The rendering engine starts getting the contents of the requested document from the networking layer. This will usually be done in 8kB chunks.
A DOM tree is built from the response as the chunks arrive.
New requests are made to the server for each new resource found in the HTML source (typically images, style sheets, and JavaScript files).
Once parsing finishes, the browser marks the document as interactive and starts running scripts that are in "deferred" mode: those that should be executed after the document is parsed. The document state is then set to "complete" and a "load" event is fired.
Each CSS file is parsed into a StyleSheet object, where each object contains CSS rules with selectors and objects corresponding to the CSS grammar. The tree built is called the CSSOM.
On top of DOM and CSSOM, a rendering tree is created, which is a set of objects to be rendered. Each of the rendering objects contains its corresponding DOM object (or a text block) plus the calculated styles. In other words, the render tree describes the visual representation of a DOM.
After the construction of the render tree it goes through a "layout" process. This means giving each node the exact coordinates where it should appear on the screen.
The next stage is painting–the render tree will be traversed and each node will be painted using the UI backend layer.
Repaint: When changing element styles which don't affect the element's position on a page (such as background-color, border-color, visibility), the browser just repaints the element again with the new styles applied (that means a "repaint" or "restyle" is happening).
Reflow: When the changes affect document contents or structure, or element position, a reflow (or relayout) happens.

I was also searching for the same thing and found this awesome, detailed answer being built collaboratively on GitHub.

I can describe one point here -
Determining which file/resource to execute, which language interpreter to load.
Pardon me if I am wrong in using interpreter here. There may be other mistakes in my answer, I will try to correct them later and include proper technical terms for things.
When the web server (e.g. Apache) has received the URI, it checks whether any existing rewrite rule matches it. If so, the rewritten URI is used. In either case, if the URI does not end in a file name, the default file is served, which is generally index.html, index.php, etc. Based on the file name's extension, the appropriate Apache module for server-side language support is used, e.g. mod_php for PHP or mod_python for Python. The server-side language interpreter (for interpreted languages like PHP) then prepares the final HTML, or output in some other form, for the web server, which finally sends it as the HTTP response.
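
The decision sequence described here (rewrite rules, then default file, then handler by extension) can be sketched as a small routing function. Everything in it is hypothetical and illustrative: the rewrite rule, the `index.html` default and the handler names are made up and do not reflect how Apache is actually configured or implemented.

```python
import posixpath
import re

# Hypothetical rewrite rule: /blog/42 -> /blog.php?id=42
REWRITE_RULES = [(re.compile(r"^/blog/(\d+)$"), r"/blog.php?id=\1")]
DIRECTORY_INDEX = "index.html"                      # hypothetical default file
HANDLERS = {".php": "php-interpreter", ".py": "python-interpreter"}


def route(uri: str):
    """Return (file path, handler name) for a request URI."""
    # 1. Apply the first matching rewrite rule, if any.
    for pattern, replacement in REWRITE_RULES:
        if pattern.match(uri):
            uri = pattern.sub(replacement, uri)
            break
    # 2. Drop the query string; fall back to the default file for directories.
    path = uri.split("?", 1)[0]
    if path.endswith("/"):
        path += DIRECTORY_INDEX
    # 3. Pick a handler by file extension; anything else is served as-is.
    extension = posixpath.splitext(path)[1]
    return path, HANDLERS.get(extension, "static-file")


print(route("/blog/42"))
print(route("/"))
```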

I hope the above image helps you understand the whole process.
Full article is here

What to do with chrome sending extra requests?

Google Chrome sends multiple requests to fetch a page, and that is apparently not a bug but a feature. And we as developers just have to deal with it.
As far as I could dig out in five minutes, Chrome does that just to make surfing faster, so if one connection gets lost, the second will take over.
I guess if the website is well developed, then its functionality won't break because of this, since multiple requests are nothing new.
But I'm just not sure if I have accounted for all the situations this feature can produce.
Would there be any special situations? Any best practices to deal with them?
Update 1: Now I see why my bank's page throws an error when I open the page with chrome! It says: "Only one window of the browser should be open." That's their solution to security threats?!!

Your best bet is to follow standard web development best practices: don't change application state as a result of a GET call.
If you're worried, I recommend updating your data layer unit tests so that GET calls are duplicated, and ensuring they return the same data.
(I'm not seeing this behaviour with Chrome 8.0.552.224, by the way. Is it very new?)
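
A toy illustration of that advice, with a duplicated GET checked the way such a unit test might check it. `CounterResource` is a hypothetical stand-in for a real data layer.

```python
class CounterResource:
    """Toy resource: GET only reads, POST is the only way to change state."""

    def __init__(self) -> None:
        self.count = 0

    def handle(self, method: str) -> dict:
        if method == "GET":
            # Safe: no side effects, so a duplicated GET is harmless.
            return {"count": self.count}
        if method == "POST":
            self.count += 1
            return {"count": self.count}
        raise ValueError(f"unsupported method: {method}")


resource = CounterResource()
resource.handle("POST")
first = resource.handle("GET")
second = resource.handle("GET")   # e.g. the browser's extra request
print(first == second)
```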

I saw the behavior in question while writing a server application and found that the earlier answers are probably not correct.
Chrome splits a single page load into multiple HTTP requests to fetch resources in parallel. In this case, it is an image, which it fetches with a separate HTTP GET.
I have attached a screenshot of a packet capture from Wireshark.
It is for a simple GET request to port 8080, for which the server returns a hello message.
Chrome sends the second GET request to obtain the favorite icon (favicon), which you see at the top of every open tab. It is NOT a second GET to guard against timeouts or any such thing.
It should be considered another element that differs across browsers. However, making multiple HTTP requests in parallel is pretty much standard in browsers as of 2018.
Here are reference questions that I found later:
Chrome sends two requests SO
Chrome issue on google code

It can also be caused by link tags with empty href attributes, at least in Chromium (v41). For example, each of the following lines will generate an additional request for the page:
<link rel="shortcut icon" href="" />
<link rel="icon" type="image/x-icon" href="" />
<link rel="icon" type="image/png" href="" />
It seems that looking for empty attributes in the page, either href or src, is a good starting point.
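
A small checker for this, sketched with Python's standard `html.parser`. `EmptyUrlFinder` is a hypothetical helper; it flags empty `href`/`src` attributes and, as a related case, empty `url('')` values in inline `style` attributes.

```python
import re
from html.parser import HTMLParser

# Matches url(), url('') and url("") inside an inline style.
_EMPTY_CSS_URL = re.compile(r"""url\(\s*(['"]?)\s*\1\s*\)""")


class EmptyUrlFinder(HTMLParser):
    """Collect (tag, attribute) pairs whose URL value is empty."""

    def __init__(self):
        super().__init__()
        self.hits = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and (value is None or value.strip() == ""):
                self.hits.append((tag, name))
            elif name == "style" and value and _EMPTY_CSS_URL.search(value):
                self.hits.append((tag, name))


finder = EmptyUrlFinder()
finder.feed(
    '<link rel="icon" href="" />'
    '<img src="a.png">'
    '<div style="background-image: url(\'\');"></div>'
)
print(finder.hits)
```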

This behavior can be caused by SRC='' or SRC='#' in an IMG or (as in my case) IFRAME tag. Replacing '#' with 'about:blank' fixed the problem.
At http://forums.mozillazine.org/viewtopic.php?f=7&t=1816755 they say that SCRIPT tags can be the issue as well.

My observation of this characteristic (bug/feature/whatever) occurs when I am typing in a URL and the Autocomplete lands on a match while still typing in the URL.
Chrome takes that match and fetches the page, I assume for the caching benefits that would occur when loading the page yourself....

I have just implemented a single-use GUID token (ASP.NET/T-SQL), which is generated when the first form in a series of two (plus a confirmation page) is generated. The token is recorded as "pending" in the DB when it is generated. It accompanies posts as a hidden field and is finally marked as closed when the user operation (payment) is completed. This mechanism works and prevents any of the forms being resubmitted after the payment is made.
However, I see 2 or 3 (!?) additional tokens generated by additional requests in quick succession. The first request is the one that ends up in front of the user (localhost, i.e. me); where the generated content for the other two requests ends up, I have no idea. I initially wondered why Page_Load handlers were firing multiple times for one page impression, so I tried a flag in HttpContext.Current, but found to my dismay that the subsequent requests come in on the same URL with no post data and empty HttpContext.Current collections, i.e. they are, for practical purposes, completely separate HTTP requests.
How to handle this? Some sort of token and logic to refuse subsequent page body content requests while the first is still processing? I guess this could take place in a global context?

This only happens when I enable "webug" extension (which is a FirePHP replacement for chrome). If I disable the extension, server only gets one request.

I just want to update on this one. I've encountered the same problem but on css style.
I've looked at all my src, href, script tag and none of them had an empty string. The offending entry was this:
<div class="Picture" style="background-image: url('');"> </div>
Make sure you also check your styles for empty url() strings.

I was having this problem, but none of the solutions here were the issue. For me, it was caused by the APNG extension in Chrome (support for animated PNGs). Once I disabled that extension, I no longer saw double requests for images in the browser. I should note that regardless of whether the page was outputting a PNG image, disabling this extension fixed the issue (i.e., APNG seems to cause the issue for images regardless of image type, they don't have to be PNG).
I had many other extensions as well (such as "Web Developer" which many have suggested is the issue), and those were not the problem. Disabling them did not fix the issue. I'm also running in Developer Mode and that didn't make a difference for me at all.

In my case, it was Chrome (v65) making a second request, GET /favicon.ico, even though the response was text/plain and thus clearly had no <link> in it referring to the icon. It stopped doing that after I replied with a 404.
Firefox (v59) was sending 2 requests for favicon; again it stopped doing this after the 404.
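
A minimal sketch of a server that answers the favicon request with a 404, built on Python's standard `http.server`. `PlainTextHandler` and `status_for` are hypothetical names; the helper spins up a one-shot local server purely to show the two status codes.

```python
import threading
from http.client import HTTPConnection
from http.server import BaseHTTPRequestHandler, HTTPServer


class PlainTextHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/favicon.ico":
            # No icon here: say so explicitly instead of serving the page again.
            self.send_response(404)
            self.send_header("Content-Length", "0")
            self.end_headers()
            return
        body = b"hello"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def status_for(path: str) -> int:
    """Start a one-shot local server and return the status code for path."""
    server = HTTPServer(("127.0.0.1", 0), PlainTextHandler)
    worker = threading.Thread(target=server.handle_request, daemon=True)
    worker.start()
    conn = HTTPConnection("127.0.0.1", server.server_address[1], timeout=5)
    conn.request("GET", path)
    response = conn.getresponse()
    response.read()
    status = response.status
    conn.close()
    worker.join(timeout=5)
    server.server_close()
    return status


print(status_for("/"), status_for("/favicon.ico"))
```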

I'm having the same bug. And as in the previous answer, the cause is an extension: in my case, the Validator Chrome extension that I had installed.
Once I disabled the extension, everything worked normally.

In my case I request JSON data from an endpoint on a different server, and the browser first makes an empty request (Request Method: OPTIONS) to check whether the endpoint accepts requests from my origin (a CORS preflight, a consequence of the same-origin policy). Also good to know: it is an Angular 1 app.
In short, I make requests from localhost to an online fake JSON data service.

I had an empty TCP connection opened by Chrome to my simple server before the normal HTML GET request, and a /favicon request after it. The favicon wasn't a problem, but the empty TCP connection was, since my server was waiting either for data or for the connection to be closed. There was no data, and the connection wasn't released for 2 minutes, so the thread hung for 2 minutes.
Jrummell's link in a comment on the original post helped me. It says empty TCP connections can be caused by the "Predict network actions to improve page load performance" setting. I tried turning off the prediction settings one by one, and it worked. In Chrome version 73.0.3683.86 (Official Build) (64-bit) this behavior was caused by the Chrome setting "Use a prediction service to load pages more quickly" being turned on.
So in Chrome ~73 you can try going to Settings -> Advanced -> Privacy and security -> "Use a prediction service to load pages more quickly" and turning it OFF.
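
On the server side, a read timeout keeps such an idle connection from tying up a thread for minutes. This is a sketch under assumptions, not the poster's actual server: `serve_one` and `demo` are hypothetical names, and the 0.2 second timeout is only chosen to make the demo quick.

```python
import socket
import threading


def serve_one(listener, timeout, result):
    """Accept one connection and give up if no data arrives in time."""
    conn, _addr = listener.accept()
    with conn:
        conn.settimeout(timeout)
        try:
            data = conn.recv(4096)
            result.append("data" if data else "closed")
        except socket.timeout:
            # The client connected but never sent anything: release the thread.
            result.append("timed out")


def demo():
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)
    result = []
    worker = threading.Thread(target=serve_one, args=(listener, 0.2, result))
    worker.start()
    # Mimic the speculative connection: connect, then send nothing.
    idle = socket.create_connection(listener.getsockname(), timeout=5)
    worker.join(timeout=5)
    idle.close()
    listener.close()
    return result


print(demo())
```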

It could be a situation where Chrome first sends a request with method OPTIONS, and only the second one is the real request with method GET. In code we usually deal only with GET (or POST/PUT/DELETE...) but not with OPTIONS. Check whether the first request has the method OPTIONS.
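
A minimal sketch of a handler that treats the OPTIONS request separately from the real GET, using Python's standard `http.server`. `ApiHandler`, `request_once`, the `/data` path and the wide-open `Access-Control-Allow-Origin: *` policy are assumptions made for illustration only.

```python
import threading
from http.client import HTTPConnection
from http.server import BaseHTTPRequestHandler, HTTPServer


class ApiHandler(BaseHTTPRequestHandler):
    def _send_cors_headers(self):
        self.send_header("Access-Control-Allow-Origin", "*")  # hypothetical policy

    def do_OPTIONS(self):
        # Preflight: answer with headers only, do no real work.
        self.send_response(204)
        self._send_cors_headers()
        self.send_header("Access-Control-Allow-Methods", "GET, OPTIONS")
        self.send_header("Access-Control-Allow-Headers", "Content-Type")
        self.end_headers()

    def do_GET(self):
        body = b'{"ok": true}'
        self.send_response(200)
        self._send_cors_headers()
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def request_once(method: str):
    """Send one request to a one-shot local server; return (status, allowed methods)."""
    server = HTTPServer(("127.0.0.1", 0), ApiHandler)
    worker = threading.Thread(target=server.handle_request, daemon=True)
    worker.start()
    conn = HTTPConnection("127.0.0.1", server.server_address[1], timeout=5)
    conn.request(method, "/data")
    response = conn.getresponse()
    response.read()
    result = (response.status, response.getheader("Access-Control-Allow-Methods"))
    conn.close()
    worker.join(timeout=5)
    server.server_close()
    return result


print(request_once("OPTIONS"))
print(request_once("GET"))
```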
