Can an external image URL be a security vulnerability?

What security holes can appear on my site by including external images via img tag and how to avoid them?
I'm currently only checking the extension and MIME type of the image on submission (both of which can change after the URL is submitted), and the URL is sanitized before being put into the src attribute.
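For reference, here is a minimal sketch (in Python) of the kind of submission-time check described above. The scheme and extension whitelists are illustrative assumptions, and, as the answers below point out, none of this protects against the remote file changing after submission:

```python
from urllib.parse import urlparse
from html import escape

# Illustrative whitelists; adjust to your own policy.
ALLOWED_SCHEMES = {"http", "https"}
ALLOWED_EXTENSIONS = {".png", ".jpg", ".jpeg", ".gif", ".webp"}

def validate_image_url(url: str) -> str:
    """Reject obviously bad URLs, then escape the survivor for safe
    embedding inside an HTML attribute."""
    parsed = urlparse(url)
    if parsed.scheme not in ALLOWED_SCHEMES:
        # Blocks javascript:, data:, vbscript:, and similar schemes.
        raise ValueError("only http(s) URLs are allowed")
    if not any(parsed.path.lower().endswith(ext) for ext in ALLOWED_EXTENSIONS):
        raise ValueError("URL does not end in a known image extension")
    # Escaping stops the URL from breaking out of the src="..." attribute.
    return escape(url, quote=True)

img_tag = f'<img src="{validate_image_url("https://example.com/cat.png")}" alt="">'
```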

There's probably a differentiation to be made here between who is at risk.
If all you're doing is storing URLs, and not uploading images to your server, then your site is probably safe, and any potential risk is to your users who view your site.
In essence, you're putting your trust in the reliability of the browser manufacturers. Things might be fine, but if a security hole were to arise in a browser that one of your users runs, one involving incorrect parsing of images that contain malicious code, then it's your users who would end up paying for it (you might find GIFAR interesting).
It comes down to whether you trust the browser manufacturers to make secure software, and whether you trust your users to not upload URLs to images that might contain exploits for certain browsers. What might be secure now might not be secure come the next release.

The primary holes that can be exposed are those where corrupted images cause buffer overflows within the browser, allowing arbitrary code execution.
If you're only putting the images into an <img> tag, there shouldn't be any vulnerabilities relating to the server sending alternative MIME types, but never underestimate the stupidity of some web browser developers...

Well, obviously, you're not doing any checks on the data itself, so the data can be anything (the MIME type reported by the remote server doesn't necessarily tell the truth). Plus, as you said, the data on the remote server can be changed, since you're never looking at it after submission.
As such, if the link is put into, let's say, an <img src="..."/>, then any vulnerability a browser might have in its image handling can be exploited.
"Sanitizing" the URL doesn't help with anything: somebody submitting a link that points to a 'bad' image isn't going to attack his own server.

Related

User submitted images with http:// urls are causing browsers to warn that the page is not secure

I'm working for a forum owner who allows users to submit hotlinked images from other domains in their posts. If they choose to use an http version of the URL, the otherwise clean page becomes insecure in the eyes of a browser, which some percentage of the time triggers a worried email from certain users.
I can't rewrite the URLs, since I can't code against the assumption that future off-site images will have https available. For the same reason, I can't use protocol-relative src attributes. I'm unwilling to fetch and cache the images on our server just so that they can be served over https, because of the computational expense involved.
What can I do? Is there some piece of HTML syntax, or something similar, that I can use to tell the browser "this image doesn't matter and doesn't constitute a security threat"?
This isn't possible. The image may not constitute a security threat itself, but MITM attacks could still lead to images other than the intended one being loaded over the network, and who knows what an attacker might want to replace that image with. My suggestion would be to pass the annoyance on to your users and tell them they can only use https URLs.
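If you adopt that policy, the check is cheap to enforce at submission time. A minimal sketch in Python: it upgrades an http URL when an https variant happens to be reachable right now, and rejects it otherwise (the upgrade probe is an optional assumption layered on top of the answer's advice):

```python
from urllib.parse import urlparse, urlunparse
import urllib.request

def accept_image_url(url: str) -> str:
    """Allow https URLs; try to upgrade http ones, otherwise reject them."""
    parts = urlparse(url)
    if parts.scheme == "https":
        return url
    if parts.scheme != "http":
        raise ValueError("unsupported URL scheme")
    candidate = urlunparse(parts._replace(scheme="https"))
    try:
        # A HEAD request just to see whether an https variant is reachable.
        req = urllib.request.Request(candidate, method="HEAD")
        with urllib.request.urlopen(req, timeout=5):
            pass
        return candidate
    except OSError:
        raise ValueError("no https version available; please submit an https URL")
```

Note the probe only proves https works today; the host could drop it later, which is why rejecting plain http outright is the safer policy.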

Do any common email clients pre-fetch links rather than images?

Although I know a lot of email clients will pre-fetch or otherwise cache images, I am unaware of any that pre-fetch regular links like <a href="...">some link</a>.
Is this a practice done by some email clients? If it is, is there a no-follow type of rel attribute that can be added to the link to help prevent this?
As of Feb 2017, Outlook (https://outlook.live.com/) scans emails arriving in your inbox and sends all found URLs to Bing, to be indexed by the Bing crawler.
This effectively makes all one-time-use links (login, password reset, etc.) useless.
(Users of my service were complaining that one-time login links didn't work for some of them, and it turned out that BingPreview/1.0b was hitting the URL before the user even opened the inbox.)
Drupal seems to be experiencing the same problem: https://www.drupal.org/node/2828034
Although I know a lot of email clients will pre-fetch or otherwise cache images.
That is not even a given anymore.
Many email clients – be they web-based, or standalone applications – have privacy controls that prevent images from being automatically loaded, to prevent tracking of who read a (specific) email.
On the other hand, there are clients, e.g. Gmail's web interface, that try to establish the standard of downloading all referenced external images, presumably to mitigate or invalidate such attempts at user tracking: if a large majority of Gmail users have those images downloaded automatically, whether they actually opened the email or not, the data that can be gained for analytical purposes becomes watered down.
I am unaware of any that pre-fetch regular links like <a href="...">some link</a>
Let's stay with Gmail for example purposes, but others will behave similarly: since Google is always interested in "what's out there on the web", it is highly likely that their crawlers will follow that link to see what it contains or leads to, for their own indexing purposes.
If it is, is there a sort of no-follow type of rel attribute that can be added to the link to help prevent this?
rel="nofollow" concerns ranking rather than crawling, and noindex (whether in robots.txt or via a meta element) also won't keep nosy bots from at least requesting the URL.
Plus, other clients involved – such as a firewall, anti-virus, or anti-malware product – might also request it for analytical purposes without any user actively triggering it.
If you want to be (relatively) sure that an action is triggered only by a (specific) human user, then use URLs in emails or other kinds of messages over the internet only to lead the user to a website where they confirm the action via a form submitted with method=POST. Whether some kind of authentication or CSRF protection is also needed goes a little beyond the scope of this question.
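A minimal sketch of that confirm-via-POST pattern, using Flask for illustration (consume_token is a hypothetical placeholder for whatever invalidates the one-time token; a real deployment would also add the CSRF protection mentioned above):

```python
from flask import Flask, request

app = Flask(__name__)

def consume_token(token: str) -> None:
    """Hypothetical placeholder: look up and invalidate the one-time token."""

@app.route("/reset/<token>", methods=["GET", "POST"])
def reset(token):
    if request.method == "GET":
        # Crawlers, link scanners, and anti-virus prefetchers that merely
        # GET the URL only ever see this form; the token survives.
        return (
            '<form method="POST">'
            '<button type="submit">Confirm password reset</button>'
            '</form>'
        )
    consume_token(token)  # only an explicit POST consumes the token
    return "Your password reset is confirmed."
```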
No common email client has a crawler that searches or pre-builds the documents behind <a> tags, if that is what you're asking; trying to pre-build and cache a web location could be an immense task if the page is dynamic or large enough.
Images are stored locally to reduce the load time of the email, which is a convenience factor and a network-load reduction, but when you open a hyperlink in an email it is loaded in your web browser rather than in the email client.
I just ran a test using analytics to report any server traffic; an email containing just
linktomysite
did not produce any resulting crawls of the site from Outlook 2007, Outlook 2010, Thunderbird, or Apple Mail (Yosemite). You could try a Wireshark scan to check for network traffic from the client to specific outgoing IPs if you're really interested.
You won't find any native email clients that do that, but you could come across some "web accelerators" that, when using a web-based email, could try to pre-fetch links. I've never seen anything to prevent it.
Links (GETs) aren't supposed to "do" anything; only a POST is. For example, the "unsubscribe me" link in your email should not directly unsubscribe the subscriber. It should GET a page from which the subscriber can then POST.
W3Schools gives a good overview of how you should expect a GET to work (caching, etc.):
http://www.w3schools.com/tags/ref_httpmethods.asp

Prevent local caching of images, for security

Is there a reliable way to stop all browsers from caching an image locally?
This is not (just) about the freshness of the image, but rather a concern about sensitive images being stored on the local drive.
Adding a random URL param to the image URL, as suggested in similar questions, does not help, because that only ensures the next request won't be served from the cache; the response can still be stored locally (at least that is my understanding). What I really need is for the image never to be saved locally, or at least not to be accessible outside the browser session if it is saved.
You need to send appropriate Cache-Control headers when serving the response for the image request. See this post for information on standard ways to do this in several programming languages:
How to control web page caching, across all browsers?
There is an alternative, possibly more foolproof yet more complex, approach: populate base64-encoded image data directly into the img src attribute. As far as I know this is not subject to caching, since no separate HTTP request is made to retrieve the image. Of course, you still need to make sure the page itself is not cached, which gets back to the initial problem of serving appropriate headers, this time for the primary HTML request.
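To make both halves of this answer concrete, here is a minimal sketch using Flask (an assumption; any framework that lets you set response headers works the same way). The first part sends no-store headers with the image response; the second builds a base64 data URI so there is no separate image request at all:

```python
import base64
from flask import Flask, Response

app = Flask(__name__)

@app.route("/sensitive.png")
def sensitive_image():
    with open("sensitive.png", "rb") as f:  # hypothetical file name
        data = f.read()
    resp = Response(data, mimetype="image/png")
    # Ask the browser and any intermediaries not to store the image at all.
    resp.headers["Cache-Control"] = "no-store, no-cache, must-revalidate, max-age=0"
    resp.headers["Pragma"] = "no-cache"  # HTTP/1.0 fallback
    resp.headers["Expires"] = "0"
    return resp

def as_data_uri(png_bytes: bytes) -> str:
    """Inline alternative: embed the image bytes in the page itself."""
    return "data:image/png;base64," + base64.b64encode(png_bytes).decode("ascii")

# Usage: render <img src="..."> with the data URI into the (itself uncached) page.
```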

HTML 4.0 problems with hosting images on other domain

There is a company I'm working with that says we are slowing down their web hosting software by hosting images on a separate domain.
I've told them that what we are doing should only speed them up, because there will be fewer file requests to their server.
They replied by saying that because they use HTML 4.0, their server is having to make image requests on the server side before they send content to the user.
This makes no sense to me, and I am trying to disprove the claim.
Am I wrong and just crazy?
I've been looking for articles on this for hours and have had no luck.
Proof that their statement is false would be greatly appreciated, and an article on this topic would be even more helpful.
Your mindset is correct. There is nothing about HTML4 that validates their claim in the context you provided us.
When you make a GET request to the server, you pull down an HTML page. The browser then parses the document and makes additional requests for the resources declared in it. Images are no exception: when the parser reaches an image, it makes a GET request to the specified URI to retrieve it. If that URI is not on the same domain, no request is made to the same domain. The server does not make the GET request for you.
Now, they could be doing something special that would cause it to operate more slowly, but nothing about the HTML4 spec would lead to it.
This simply has nothing to do with HTML 4, because you can point any image at another server with an <img src="http://other-server.de/bla.png" /> tag. The browser always loads resources from the URL you put in the tag, so pointing these tags at your own hosting solution doesn't slow down their software, unless the tags point at their servers and their servers fetch the images from the remote host, or they rewrite the HTML code automatically on the fly so that it points to their servers.
EDIT: Maybe the page loads slowly because your image-hosting server is responding slowly?

Pros and Cons of a separate image server (e.g. images.mydomain.com)?

We have several images and PDF documents that are available via our website. These images and documents are stored in source control and are copied content on deployment. We are considering creating a separate image server to put our stock images and PDF docs on - thus significantly decreasing the bulk of our deployment package.
Does anyone have experience with this approach?
I am wondering about any "gotchas" - like XSS issues and/or browser issues delivering content from the alternate sub-domain?
Pro:
Many browsers will only allocate two sockets for downloading assets from a single host. So if index.html is downloaded from www.domain.com and it references 6 image files, 3 JavaScript files, and 3 CSS files (all on www.domain.com), the browser will download them 2 at a time, with the others blocking until a socket is free.
If you pull the 6 image files off onto a separate host, say images.domain.com, you get an extra two sockets dedicated to downloading your images. This parallelizes the asset download process so, in theory, your page could render twice as fast; a sketch of sharding asset URLs this way follows below.
Con:
If you're using SSL, you would need to either get an additional single-host SSL certificate for images.domain.com or a wildcard SSL certificate for *.domain.com (matches any subdomain). Failure to do so will generate a warning in the browser saying the page contains mixed secure and insecure content.
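If you do shard assets across hosts to win those extra connections, the usual trick is to map each asset path to a host deterministically, so a given image always gets the same URL and stays cacheable. A minimal sketch, with hypothetical shard hostnames:

```python
from zlib import crc32

# Hypothetical shard hosts; use whatever subdomains you actually set up.
SHARD_HOSTS = ["images1.domain.com", "images2.domain.com"]

def shard_url(path: str) -> str:
    """Deterministically pick one shard host per asset path."""
    host = SHARD_HOSTS[crc32(path.encode("utf-8")) % len(SHARD_HOSTS)]
    return f"https://{host}{path}"

print(shard_url("/img/logo.png"))  # always resolves to the same host
```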
With a different domain, you will also not send cookie data with every request. This can improve performance.
Another thing not yet mentioned is that you can use different web servers to serve different sorts of content. For example, your static content could be served via lighttpd or nginx while still serving your dynamic content off Apache.
Pros:
- load balancing
- isolating different functionality
Cons:
- more work (when you create a page on the main site, you have to maintain the resources on the separate server)
Things like XSS are a problem of code not sanitizing input (or output, for that matter). The only issue that could arise is if you have sub-domain-specific cookies that are used for authentication... but that's really a trivial fix.
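For what it's worth, the "trivial fix" usually amounts to scoping the authentication cookie to the parent domain so every subdomain sees it. A minimal sketch with Flask and made-up names:

```python
from flask import Flask, make_response

app = Flask(__name__)

@app.route("/login")
def login():
    resp = make_response("logged in")
    # Domain=.mydomain.com (hypothetical) makes the cookie valid on
    # www.mydomain.com and images.mydomain.com alike.
    resp.set_cookie("session", "abc123", domain=".mydomain.com",
                    secure=True, httponly=True)
    return resp
```

(The flip side, noted in another answer here, is that a parent-domain cookie now rides along on every image request too, which is why some sites deliberately serve static assets from an entirely separate domain instead.)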
If you're serving over HTTPS and you serve an image from an HTTP domain, then browsers will pop up security alert warnings when the page loads.
So if you do HTTPS, you'll need to buy an SSL certificate for your image domain as well if you don't want to annoy the hell out of your users :)
There are other ways around this, but it's not particularly in the scope of this answer - it was just a warning!