There is a company I'm working with that says we are slowing down their web hosting software by hosting images on a separate domain.
I've told them that what we are doing should only speed them up, because there will be fewer file requests to their server.
They replied by saying that because they use HTML 4.0, their server is having to make image requests on the server side before they send content to the user.
This makes no sense to me, and I am trying to disprove this claim.
Am I wrong and just crazy?
I've been looking for articles on this for hours and have had no luck.
Proof that their statement is false would be greatly appreciated, and an article on this topic would be even more helpful.
Your mindset is correct. There is nothing about HTML4 that validates their claim in the context you provided us.
When you make a GET request to the server, you pull an HTML page. The browser then parses the document and makes additional requests, as declared in the document. Images are no exception. When it reaches an image, it makes a GET request to the specified URI to retrieve it. If that URI points to a different domain, the request goes to that domain, not to their server. The server does not make the GET request for you.
Now, they could be doing something special that would cause it to operate more slowly, but nothing about the HTML4 spec would lead to it.
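To make that last point concrete, here is a minimal sketch assuming a Python/Flask server purely for illustration (their actual stack is not known from the question); the image domain is a placeholder. The host's server only returns HTML, and the browser alone follows the cross-domain image URL.

```python
# Minimal sketch (hypothetical Flask app, placeholder domain): the server
# returns HTML and never contacts the image host itself.
from flask import Flask

app = Flask(__name__)

@app.route("/")
def page():
    # No server-side request to images.example-cdn.com happens here; the
    # browser sees this <img> tag and fetches the image on its own.
    return '<html><body><img src="https://images.example-cdn.com/photo.jpg" alt="hosted elsewhere"></body></html>'

if __name__ == "__main__":
    app.run()
```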
This simply has nothing to do with HTML 4, because every image tag can point anywhere, e.g. <img src="http://other-server.de/bla.png" />. So if you point these tags at your own hosting solution, it doesn't slow down their software, unless the tags point at their servers and their servers then fetch the images from the remote server. The browser always loads a resource from the URL you put in the tag.
The exception is if they rewrite the HTML automatically on the fly, so that the image URLs point to their servers.
EDIT: Or maybe the page loads slowly simply because your image-hosting server is responding slowly?
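For completeness, here is a hedged sketch of the rewrite-and-proxy setup described above, the only scenario in which their claim would hold. It assumes a Python/Flask front end purely for illustration and reuses the example URL from above; nothing here is their actual implementation.

```python
# Hypothetical illustration only: the host has rewritten the page's <img> tags
# to point at /proxied-image on its own server, so every image view forces a
# server-side fetch from the remote host before the browser gets any bytes.
from flask import Flask, Response
import requests

app = Flask(__name__)

@app.route("/proxied-image")
def proxied_image():
    # This extra round trip is what would add latency; a plain HTML 4
    # <img src="http://other-server.de/bla.png"> tag never does this.
    remote = requests.get("http://other-server.de/bla.png")
    return Response(
        remote.content,
        status=remote.status_code,
        content_type=remote.headers.get("Content-Type", "image/png"),
    )
```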
I'm new to HTML and coding, period. I've created a basic HTML page. In that page I want to create dropdown selections that produce output from my SQL database (MS SQL Server, not MySQL).
For example: if I select a table or a column from dropdown one and then enter a keyword in selection box two, I want it to produce a table that shows the information in that table/column matching the keyword.
Say I select a medical name from the dropdown and I want it to show only medical names equal to "Diabetes", and then show me those rows from my database in a table. How would I do that in HTML, from connecting to the database, to creating the dropdown selection linked to the database, to selecting the criteria for what I want displayed, and then showing the result in a table or list format?
Thank you in advance
OK, Facu Carbonel's answer is a bit... chaotic, so since this question (surprisingly) isn't closed yet, I'll write one myself and try to do better.
First of all - this is a VERY BROAD topic which I cannot answer directly. I could give a pile of code, but walking through it all would take pages of text and in the end you'd just have a solution for this one particular problem and could start from scratch with the next one.
So instead I'll take the same path that Facu Carbonel took and try to show some directions. I'll put keywords in bold that you can look up and research. They're all pieces of the puzzle. You don't need to understand each of them completely and thoroughly from the beginning, but be aware of what they are and what they do, so that you can google the finer details when you need them.
First of all, you need to understand the roles of the "server side" and "client side".
The client side is the browser (Chrome, Firefox, Internet Explorer, what have you). When you type an address in the address bar (or click a link or whatever), the browser parses the whole thing and extracts the domain name. For example, the link to this question is https://stackoverflow.com/questions/59903087/sql-drop-down-selections-in-html?noredirect=1#comment105933697_59903087 and the domain part of that is stackoverflow.com. The rest of this long gibberish (it's called a "URL", by the way) is also relevant, but that comes later.
With the domain in hand, the browser then uses the DNS system to convert that pretty name into an IP address. Then it connects via the network to the computer (aka "server") designated by that IP address and issues an HTTP request (HTTP, not HTML - don't mix these up, they're not the same thing).
HTTP, by the way, is the protocol that is used on the web to communicate between the server and the browser. It's like a language that they both understand, so that the browser can tell the server hey, give me the page /questions/59903087/sql-drop-down-selections-in-html. And the server then returns the HTML for that page.
This, by the way, is another important point to understand about HTTP. First the browser makes its request, and the server listens. Then the server returns its response, and the browser listens. And then the connection is closed. There's no chit-chat back and forth. The browser can do another request immediately after that, but it will be a new request.
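If you want to see that cycle with your own eyes, here is a small sketch using only Python's standard library (the host and path are taken from the example link above): one request goes out, one response comes back, and then the exchange is over.

```python
# One complete HTTP request/response cycle, done "by hand" with http.client.
import http.client

conn = http.client.HTTPSConnection("stackoverflow.com")  # DNS lookup + connect
conn.request("GET", "/questions/59903087/sql-drop-down-selections-in-html")  # the browser's part
response = conn.getresponse()            # the server's reply
print(response.status, response.reason)  # e.g. 200 OK, or a redirect status
html = response.read()                   # the HTML payload
conn.close()                             # and that's it - no further chit-chat
```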
Now, the browser is actually pretty limited in what it can do. Through these HTTP requests it gets from the server the HTML code, the CSS code and the Javascript code. It also can get pictures, videos and sound files. And then it can display them according to the HTML and CSS. And Javascript code runs inside the browser and can manipulate the HTML and CSS as needed, to respond to the user's actions. But that's all.
It might seem that the Javascript code that runs inside the browser is all powerful, but that is only an illusion as well. It's actually quite limited, and on purpose. In order to prevent bad webpages from doing bad things, the Javascript in each page is essentially limited to that page only.
Note a few things that it CANNOT do:
- It cannot connect to something that doesn't use HTTP. Like an SQL server.
- It can make HTTP requests, but only to the same domain as the page (you can get around this via CORS, but that's advanced stuff you don't need to worry about)
- It cannot access your hard drive (well, it can if the user explicitly selects a file, but that's it)
- It cannot affect other open browser tabs
- It cannot access anything in your computer outside the browser
This, by the way, is called "sandboxing" - like, the Javascript code in the browser is only allowed to play in its sandbox, which is the page in which it was loaded.
OK, so here we can see that accessing your SQL server directly from HTML/CSS/Javascript is impossible.
Fortunately, we still need to talk about the other side of the equation - the web server which responded to the browser's requests and gave it the HTML to display.
It used to be, far back in the early days of the internet, that web servers only returned static files. Those days are long gone. Now we can make the webserver return -- whatever we want. We can write a program that inspects the incoming request from the browser, and then generates the HTML on the fly. Or Javascript. Or CSS. Or images. Or whatever. The good thing about the server side is - we have FULL CONTROL over it. There are no sandboxes, no limits, your program can do anything.
Of course, it can't affect anything directly in the browser - it can only respond to the browser's requests. So to make a useful application, you actually need to coordinate both sides. There's one program running in the browser and one program running on the web server. They talk through HTTP requests and together they accomplish what they need to do. The browser program makes sure to give the user a nice UI, and the server program talks to all the databases and whatnot.
Now, while in the browser you're basically limited to just Javascript and the features the browser offers you, on the server side you can choose what web server software and what programming language you use. You can use the same Javascript, or you can go for something like PHP, Java (not the same as Javascript!), C#, Ruby, Python, and thousands of others. Each language is different and does things its own way, but at the end of the day what it will do is that it will receive the incoming requests from the browser and generate some sort of output that the browser expects.
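To tie this back to your actual question, here is a hedged sketch of what the server-side half could look like, using Python with Flask and pyodbc against MS SQL Server (just one possible choice of language and libraries). The connection string, the table name Patients, and the columns MedicalName and Dosage are made-up placeholders; real code would also HTML-escape the values.

```python
# Hypothetical sketch: the browser requests /search?keyword=Diabetes and the
# server queries MS SQL Server and generates an HTML table on the fly.
from flask import Flask, request
import pyodbc

app = Flask(__name__)

# Placeholder connection string - adjust driver, server, database, credentials.
CONN_STR = (
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=localhost;DATABASE=MyDb;Trusted_Connection=yes;"
)

@app.route("/search")
def search():
    keyword = request.args.get("keyword", "")
    conn = pyodbc.connect(CONN_STR)
    cursor = conn.cursor()
    # Parameterized query: the keyword comes from the browser, so it must
    # never be concatenated into the SQL string.
    cursor.execute(
        "SELECT MedicalName, Dosage FROM Patients WHERE MedicalName = ?",
        keyword,
    )
    rows = cursor.fetchall()
    conn.close()
    # Generate the HTML on the fly, exactly as described above.
    body = "".join(f"<tr><td>{r.MedicalName}</td><td>{r.Dosage}</td></tr>" for r in rows)
    return f"<table border='1'><tr><th>Medical name</th><th>Dosage</th></tr>{body}</table>"

if __name__ == "__main__":
    app.run(debug=True)
```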
So, I hope that this at least gives you some starting point and outlines where to go from here.
First of all there is something that you need to know to do this, and that is the difference between a front-end and a back-end.
HTML is a front-end technology; it's called that because it's what is shown to the user, while the back-end is all the machinery that runs under the hood.
The thing is, in your front-end you can't do back-end things, like running queries against a database, managing sessions and that kind of thing.
For that you need a back-end running behind it, like PHP, Ruby, Node.js or some technology like that.
From the HTML you can only call functions on the server using things like <form action="/log" method="POST">; this would call the action /log that you should have already programmed on your back-end. Don't get confused by this: there are plenty of ways to send requests to your back-end, and this is just one of them.
For your specific case I recommend looking up AJAX, so you can run the query against your database without the browser needing to refresh after the query is made.
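As a hedged illustration of both ideas, here is a minimal Python/Flask back-end (the language and route names are just examples, not a prescription): one route that receives the <form action="/log" method="POST"> submission, and one JSON route that an AJAX call could hit without reloading the page.

```python
# Hypothetical back-end sketch: a classic form action plus an AJAX-friendly
# JSON endpoint. The data returned is placeholder data, not a real query.
from flask import Flask, request, jsonify

app = Flask(__name__)

@app.route("/log", methods=["POST"])
def log():
    # Handles the <form method="POST"> submission; the browser reloads the page.
    keyword = request.form.get("keyword", "")
    return f"Form received keyword: {keyword}"

@app.route("/api/search")
def api_search():
    # The page's Javascript can fetch this URL and update the page in place,
    # with no refresh - that's the AJAX part.
    keyword = request.args.get("keyword", "")
    return jsonify(results=[f"row matching {keyword}"])  # placeholder results

if __name__ == "__main__":
    app.run(debug=True)
```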
Some topics you need to know to understand this are:
- what front-end and back-end are, and their differences
- what client-server architecture is
- AJAX
- HTTP requests
- how to work with a back-end: running queries against the database, making routes, etc.
- and lastly, while your server isn't open to the world under your own domain name, what localhost is and how to use it
I hope this clarifies things a bit. It's not easy, but with a bit of research and practice you'll get there!
Is there a reliable way to stop all browsers from caching an image locally?
This is not (just) about the freshness of the image, but rather a concern about sensitive images being stored on the local drive.
Adding a random url param to the img url as suggested in similar questions does not help because that just ensures the next request is not the last request in cache (at least that is my understanding). What I really need is for the image to never be saved locally or at least not accessible outside the browser session if it is saved.
You need to send appropriate cache-control headers when serving up the response for the image request. See this post for information on standard ways to do this in several programming languages.
How to control web page caching, across all browsers?
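As a minimal sketch, assuming a Python/Flask back end purely for illustration (the file name is a placeholder), the headers in question look roughly like this:

```python
# Serve the image with headers telling browsers and proxies not to store it.
from flask import Flask, send_file

app = Flask(__name__)

@app.route("/secret-image")
def secret_image():
    response = send_file("secret.png", mimetype="image/png")  # placeholder file
    response.headers["Cache-Control"] = "no-store, no-cache, must-revalidate, max-age=0"
    response.headers["Pragma"] = "no-cache"   # legacy HTTP/1.0 caches
    response.headers["Expires"] = "0"
    return response
```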
There is an alternate, and possibly more foolproof yet more complex, approach, which is to put base64-encoded image data directly into the img src attribute. As far as I know this is not subject to caching, as there is no separate HTTP request made to retrieve the image. Of course you still need to make sure the page itself is not cached, which gets back to the initial problem of serving up appropriate headers for the primary HTML request.
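A hedged sketch of that approach, assuming Python on the server and a placeholder file name: the image bytes are base64-encoded and embedded straight into the img tag, so no separate image request (and no separate cache entry) ever exists.

```python
# Build a data: URI so the image travels inside the HTML itself.
import base64

with open("secret.png", "rb") as f:                       # placeholder file
    encoded = base64.b64encode(f.read()).decode("ascii")

img_tag = f'<img src="data:image/png;base64,{encoded}" alt="sensitive image">'
# img_tag is then written into the page; the page itself still needs
# no-cache headers, as noted above.
```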
What security holes can appear on my site by including external images via img tag and how to avoid them?
I'm currently only checking the extension and MIME type of the image on submission (which can be changed after the URL is submitted), and the URL is sanitized before putting it in the src attribute.
There's probably a differentiation to be made here between who is at risk.
If all you're doing is storing URLs, and not uploading images to your server, then your site is probably safe, and any potential risk is to your users who view your site.
In essence, you're putting your trust in the reliability of the browser manufacturers. Things might be fine, but if a security hole in some browser one of your users uses were to arise that involved incorrectly parsing images that contain malicious code, then it's your users who will end up paying for it (you might find GIFAR interesting).
It comes down to whether you trust the browser manufacturers to make secure software, and whether you trust your users to not upload URLs to images that might contain exploits for certain browsers. What might be secure now might not be secure come the next release.
The primary holes that can be exposed are those where corrupted images cause buffer overflows within the browser, allowing arbitrary code execution.
If you're only putting the images into an <img> tag there shouldn't be any vulnerabilities relating to sending alternative MIME types, but never underestimate the stupidity of some web browser developers...
Well, obviously, you're not doing any checks on the data, so the data can be anything (the mime-type reported by the remote server doesn't necessarily tell the truth). Plus, as you said, the data on the remote server can be changed since you're never looking at it after submission.
As such, if the link is put into, let's say, an <img src="..."/>, then any vulnerability that a browser might have in its image handling can be exploited.
"Sanitizing" the URL doesn't help with anything: somebody submitting a link that points to a 'bad' image isn't going to attack his own server.
I maintain a local intranet site that among other things, displays movie poster images from IMDB.com. Until recently, I simply had a perl script download the images I needed and save them to the local server. But that became a HUGE space-hog, so I thought I could simply point my site directly to IMDB servers, since my traffic is very minimal.
The result was that some images would display, while others wouldn't. And images that were displayed, would sometimes disappear after a few refreshes. The images existed on the IMDB servers, they just wouldn't display on my page.
It seems unlikely to me that IMDB would somehow block this kind of access, but is that possible? Is there something that needs to be configured on my end?
I'm out of ideas - it just doesn't make sense to me.
I'm serving my pages with mod_perl and HTML::Mason, if that's relevant.
Thanks,
Ryan
Apache/2.2.14 (Unix) mod_ssl/2.2.14 OpenSSL/0.9.8l DAV/2 mod_perl/2.0.4 Perl/v5.10.0
Absolutely they would block that kind of access. You're using their bandwidth, which they have to pay for, for your web site. Sites will often look at the referrer, see that it's not coming from their site, and either block or throttle access. Likely you're seeing this as an intermittent problem because IMDB is allowing you some amount of use of their images.
To find out more, look at the HTTP logs on your client. Either by using a browser plugin or by scripting it. Look at the HTTP response codes and you'll probably see some 4xx or 5xx responses.
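If you'd rather script it than use a browser plugin, a small sketch like this (Python with the requests library; the poster URL is a placeholder) shows the status codes you get with and without a Referer header:

```python
# Compare responses with no Referer vs. a Referer pointing at your own site.
import requests

url = "https://images.example-imdb.test/poster.jpg"   # placeholder image URL

for headers in ({}, {"Referer": "http://intranet.example.local/movies"}):
    response = requests.get(url, headers=headers)
    print(headers.get("Referer", "(no referer)"), "->", response.status_code)
```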
I would suggest either caching the images locally in a cache that expires unused images (that will balance access against space), or perhaps getting a paid IMDB account. You may be able to get an API key for fetching images that indicates you are a paying customer.
IMDB sure could be preventing your 'bandwidth theft' by checking the "referer". More info here: http://www.thesitewizard.com/archive/bandwidththeft.shtml
Why is it intermittent? Maybe they only implement this on some of the servers in their web farm.
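Just to illustrate the mechanism (this is not IMDB's actual implementation), a referer check on the image server can be as simple as this hedged Python/Flask sketch, with placeholder domain names:

```python
# Hypothetical hotlink/referer check: serve images only when the Referer
# points at one of our own domains; otherwise refuse (or throttle).
from flask import Flask, request, abort, send_from_directory

app = Flask(__name__)
ALLOWED_HOSTS = ("example.com", "www.example.com")   # placeholder domains

@app.route("/images/<path:name>")
def image(name):
    referer = request.headers.get("Referer", "")
    if referer and not any(host in referer for host in ALLOWED_HOSTS):
        abort(403)   # foreign referer: block the hotlink
    return send_from_directory("images", name)
```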
Just to add to the existing answers, what you're doing is called "hotlinking", and people who run websites don't like it very much. Google for "hotlink blocking".
I have an ASP.NET web site technology that I use for scores of clients. Each client gets their own web site (a copy of the core site that can then be customized). The web site includes a fair amount of content - articles on health and wellness - that is loaded from a central content server. I can load the HTML for these articles by copying it from the content server and then inserting the text into the page as it is produced.
Easy so far.
However, these articles contain image references that point back to the central server. The problem I have is that these sites are always accessed (every page) via an SSL link. When a page with an external image reference is loaded, the visitor receives a message that the page "contains both secure and insecure elements" (or something similar) because the images come from the (unsecured) server. There is really no way around this.
So, in your judgment, is it better to:
A) just put a cert on the content server so I can get the images over SSL? Are there problems there due to the page content having two certs? Any other thoughts?
B) change the links to the article presentation page so they don't use SSL? They don't need SSL, but the left side of the page contains lots of links to pages that do need it - all of which are now relative links. Making them all absolute links is grody because each client's site has its own URL, so all links would need to be generated in code (blech).
C) Something else that I haven't thought of? This is where I am hoping that someone with experience in the area will offer something brilliant!
NOTE: I know that I can not get rid of the warning about insecure elements - it is there for a reason. I am just wondering if anyone else has experience in this area and has a reasonable compromise or some new insight.
Not sure how feasible this is, but it may be possible to use a rewrite or proxy module to mirror the (img directory) structure on each clone to that of the central server. With such a rule in place you could use relative img URLs instead and internally rewrite all requests for these images over to the central server, silently.
e.g.:
https://cloneA/banner.jpg -> http://central/static/banner.jpg
https://cloneB/topic7/img/header.jpg -> http://central/static/topic7/header.jpg
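A hedged sketch of that idea, written in Python/Flask purely to illustrate the mechanism (the actual sites are ASP.NET/IIS, where a rewrite or proxy module would play this role; host names come from the example mappings above):

```python
# Each clone serves /static/... over its own HTTPS and silently fetches the
# bytes from the central, non-SSL content server.
from flask import Flask, Response
import requests

app = Flask(__name__)
CENTRAL = "http://central/static"   # internal content server from the example

@app.route("/static/<path:name>")
def proxied_static(name):
    upstream = requests.get(f"{CENTRAL}/{name}")
    return Response(
        upstream.content,
        status=upstream.status_code,
        content_type=upstream.headers.get("Content-Type", "application/octet-stream"),
    )
```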
I'd go with B.
Sadly, I think you'll find this is a fact of life with SSL. Even if you were to put a cert on the other server, I think it may still get confused because of the different sites [can't confirm nor deny, though], and regardless, you don't want to waste the time of your media server by encrypting images.
I figured out a completely different way to import the images late last night after asking this question. In IIS, at least, you can set up "Virtual Directories" that can point essentially anywhere (I'm now evaluating whether to use a dedicated directory on each web server or a URL). If I use a dedicated directory on each server I will have three directories to keep up to date, but at least I don't have 70+.
Since each site will pull the images using resource locations found on the local site, then I don't have to worry about changing the SSL status of any page.