What a browser receives as an HTML file can have many different endings on the URL path: .html, .htm, /, .php, .asp, .stm, .cgi, etc.
Is there a way to distinguish, from only the request URL, whether it points to an HTML document or to some additional data (e.g. .png, .css, .js, ...)? This should be determined at the time of the request, so waiting for the Content-Type is not an option.
HTML URLs
google.com/, stackoverflow.com, https://en.wikipedia.org/wiki/Uniform_Resource_Locator, https://www.google.de/search?q=content-length, http://www.w3.org/Protocols/rfc2616/rfc2616-sec4.html
non-HTML URLs
http://cdn.sstatic.net/stackoverflow/img/apple-touch-icon#2.png?v=73d79a89bded, http://ajax.googleapis.com/ajax/libs/jquery/1.7.1/jquery.min.js, http://cdn.sstatic.net/stackoverflow/all.css?v=aaf07438bdbd
Maybe filtering out the non-HTML parts (for example, by the extensions js, css, png, jpg, ...) would work. An alternative would be to filter by the extensions from "What are common file extensions for web programming languages?" and to include directories and domains.
It doesn't have to be perfect; close enough would be good.
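A minimal sketch of that heuristic in Python, assuming the two extension lists below (they are illustrative and non-exhaustive, not any standard). It looks only at the URL path, so the query string and fragment are ignored, and it returns "unknown" rather than guessing when an extension is on neither list:

```python
import posixpath
from urllib.parse import urlsplit

# Hypothetical, non-exhaustive lists; tune them to the traffic you actually see.
ASSET_EXTENSIONS = {".js", ".css", ".png", ".jpg", ".jpeg", ".gif", ".svg",
                    ".ico", ".woff", ".woff2", ".json"}
PAGE_EXTENSIONS = {".html", ".htm", ".shtml", ".stm", ".php", ".asp",
                   ".aspx", ".jsp", ".cgi"}


def classify_url(url: str) -> str:
    """Return "page", "asset" or "unknown" based only on the URL path."""
    # urlsplit() keeps the query (?v=...) and fragment (#...) out of .path.
    path = urlsplit(url).path
    _root, ext = posixpath.splitext(path)
    ext = ext.lower()
    if ext in ASSET_EXTENSIONS:
        return "asset"
    if ext in PAGE_EXTENSIONS or ext == "":
        # Directories ("/") and extensionless paths ("/wiki/Foo") count as pages.
        return "page"
    return "unknown"
```

Schemeless inputs such as stackoverflow.com should get a scheme prepended first; otherwise urlsplit treats the host name as the path and ".com" looks like an extension.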
Is there a way to distinguish, from only the request URL, whether it points to an HTML document or to some additional data (e.g. .png, .css, .js, ...)? This should be determined at the time of the request, so waiting for the Content-Type is not an option.
No, this is not possible.
The web server can do anything it wants in response to a request.
Some responses are static, i.e. files on disk (but even then, the extension does not guarantee the real contents of the file); others are entirely dynamic, and only the server decides what kind of data to return. It could even return a .jpg file in response to a .html request, and the opposite happens a lot in the real world: a .jpg URL that returns an HTML page with a download link for that JPG.
A lot of URLs don't even have an extension, so checking the extension is not a general solution.
The best (earliest) way is to look at the Content-Type response header (assuming it matches the data).
If the client doesn't want to download the full response just to check the Content-Type, it can make a HEAD request, which returns only the HTTP headers.
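A HEAD check might look like this in Python using only the standard library. The function names are made up for this sketch; keep in mind that some servers answer HEAD differently from GET or reject it, and urlopen raises HTTPError for 4xx/5xx responses:

```python
import urllib.request
from typing import Optional

# Media types treated as "an HTML document" in this sketch.
HTML_TYPES = {"text/html", "application/xhtml+xml"}


def media_type(content_type: Optional[str]) -> Optional[str]:
    """Reduce a value like 'text/html; charset=UTF-8' to 'text/html'."""
    if not content_type:
        return None
    return content_type.split(";", 1)[0].strip().lower()


def head_media_type(url: str, timeout: float = 10.0) -> Optional[str]:
    """Send a HEAD request and return the media type from Content-Type, if any."""
    request = urllib.request.Request(url, method="HEAD")
    # A HEAD response carries the headers but no body, so nothing large is downloaded.
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return media_type(response.headers.get("Content-Type"))


def is_html(url: str) -> bool:
    return head_media_type(url) in HTML_TYPES
```

This costs one round trip per URL, which is the price of asking the server instead of guessing from the path.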
No.
URLs are, once you hit the path segment, entirely arbitrary.
Sometimes the URL will include something which happens to match a filename on the HTTP server's hard disk. Sometimes that filename will give a clue about what kind of data is in it. Often it will only give a clue about which program the server will run, and that program can generate content of any kind.
The authoritative description of an HTTP resource's type is the Content-Type response header (and even there, servers sometimes give wrong information).
No, that's not possible (assuming you're looking for something reliable).
In general, the format of a URI is independent of the media type of the resource it identifies. That's how the web works.
The answer below is outdated. In Python, the standard library's mimetypes module does exactly that.
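For example, a sketch with mimetypes, assuming a rough guess is enough. It still only reads the extension, so it shares the limitations described in the other answers, and its table can differ between machines because the module-level functions also load the platform's MIME database. The helper names are hypothetical:

```python
import mimetypes
from typing import Optional
from urllib.parse import urlsplit

HTML_TYPES = {"text/html", "application/xhtml+xml"}


def guess_url_type(url: str) -> Optional[str]:
    """Guess a MIME type from the URL path's extension, or None if unknown."""
    # Strip the query string and fragment first; otherwise older Python
    # versions see ".css?v=..." as the extension and return no match.
    path = urlsplit(url).path
    mime, _encoding = mimetypes.guess_type(path)
    return mime


def probably_html(url: str) -> bool:
    mime = guess_url_type(url)
    # No recognised extension ("/", "/wiki/Foo", "/search"): assume a page.
    return mime is None or mime in HTML_TYPES
```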
Old answer
As a bit of reasoning: file extensions like .html in URLs are implementation details. When you switch from CGI to something else, you are forced either to abandon the URL, breaking links, or to keep an incorrect extension around. See also
Semantic URL Wiki Page and
Cool URIs don't change.
Up until a few weeks ago, any HTML file I loaded in an iFrame was shown within the frame. All of a sudden, Chrome and Firefox ask me whether I want to download the HTML file in the iFrame instead. It's an Apache server, and I believe it was upgraded recently, though I'm not sure how. I was wondering if it had anything to do with the way certain MIME types are processed within an iFrame.
Note: Chrome and Firefox are the only browsers that I've tested this with. I don't think this is a browser issue though.
It's very likely that the MIME type configuration is no longer set up properly on your Apache server. Most of the time, the server configuration sets the MIME type of the returned object based on the extension of the file you're requesting. If your file extensions have changed, or if you're using dynamic URLs that don't end in ".docx" (e.g. URLs processed by an intervening app server that returns the file without setting the MIME type itself), then the browser has no way of knowing what the contents are, and correctly concludes that the best thing to do is to just give you the contents as a file.
So... set the extension of the file you're downloading to .docx or .doc. If you're using a default Apache config, that might do it. If that doesn't work, set the MIME type of the returned object with a URL-based rule in your apache.conf or another Apache config file. Or, if you're using dynamic URLs, explicitly set the MIME type in your code to one of the following:
.doc - application/msword
.dot - application/msword
.docx - application/vnd.openxmlformats-officedocument.wordprocessingml.document
.dotx - application/vnd.openxmlformats-officedocument.wordprocessingml.template
.docm - application/vnd.ms-word.document.macroEnabled.12
.dotm - application/vnd.ms-word.template.macroEnabled.12
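If the file is served from your own code, a framework-neutral Python sketch of setting the type explicitly could look like this. The mapping is the list above; the helper names and the application/octet-stream fallback are assumptions:

```python
import os.path
from typing import List, Tuple

# MIME types for the Word formats listed above.
WORD_MIME_TYPES = {
    ".doc": "application/msword",
    ".dot": "application/msword",
    ".docx": "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
    ".dotx": "application/vnd.openxmlformats-officedocument.wordprocessingml.template",
    ".docm": "application/vnd.ms-word.document.macroEnabled.12",
    ".dotm": "application/vnd.ms-word.template.macroEnabled.12",
}


def content_type_for(filename: str) -> str:
    """Pick the Content-Type from the file extension (case-insensitive)."""
    ext = os.path.splitext(filename)[1].lower()
    return WORD_MIME_TYPES.get(ext, "application/octet-stream")


def response_headers(filename: str, size: int) -> List[Tuple[str, str]]:
    """Headers a dynamic handler could send before writing the file body."""
    return [
        ("Content-Type", content_type_for(filename)),
        ("Content-Length", str(size)),
    ]
```

The list of (name, value) tuples is the shape WSGI's start_response expects, but any framework's header API would do.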
I have to open a file via an HTML link.
The file is located on another computer and addressed by its IP address (passing through a custom server tool).
E.g.:
<a href="http://localhost:PORT/FILE.dotx" download>Download</a>
This works just fine in Firefox and Chrome, but IE (version 8) interprets the file as an XML file and tries to open it directly.
There is no possibility to upgrade or change the client's browser.
Is it possible to force IE to download the file without using PHP, VB or Rails?
(as we don't have an Apache server or anything like that)
lighttpd.conf sample, requires mod_setenv:
$HTTP["querystring"] =~ "^dl$" {
    $HTTP["url"] =~ "\.pdf$" {
        setenv.add-response-header = ( "Content-Disposition" => "attachment" )
    }
}
This is the only reliable way. IE doesn't trust MIME types because of a workaround for an old Apache flaw: Apache sent wrong MIME types, and Microsoft "fixed it" by having IE guess the type from the file contents.
While all PDF files could simply be given the download header, I have chosen to show a neater way: only the ?dl parameter activates this behavior. Plain PDFs still display in the browser, and only a link with ?dl appended gets the special treatment.
I am actually using this technique on my server, but it is implemented in PHP because I can't make do with the static handlers alone. Since I offer images through this, I also add the Cache-Control: no-transform header to prevent Opera Turbo from recompressing the file to be downloaded.
EDIT: Fixed the word Disposition; it has to be capitalized to also work in WebKit-based browsers.
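For comparison, a Python sketch of the same decision for a dynamic handler (the author's own version is in PHP and isn't shown). The function name is hypothetical, and sending Cache-Control: no-transform only on downloads is an assumption:

```python
from typing import List, Tuple
from urllib.parse import urlsplit


def download_headers(url: str) -> List[Tuple[str, str]]:
    """Extra response headers for a request URL, mirroring the ?dl rule above."""
    parts = urlsplit(url)
    headers: List[Tuple[str, str]] = []
    # Only .pdf paths whose query string is exactly "dl" become downloads;
    # plain .pdf links get no extra headers and still display in the browser.
    if parts.path.endswith(".pdf") and parts.query == "dl":
        headers.append(("Content-Disposition", "attachment"))
        # Ask proxies such as Opera Turbo not to recompress the file.
        headers.append(("Cache-Control", "no-transform"))
    return headers
```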
Well, using the HTML5 File API we can read files with the help of an input of type file. What about reading files by a path like
/images/myimage.png
etc.?
Any kind of help is appreciated
Yes, if it is Chrome! Play with the FileSystem API and you will be able to do that.
The simple answer is: no. Once your HTML/CSS/images/JavaScript has been downloaded to the client, you have broken loose from the server.
Simplistic Flowchart
The user requests a URL in the browser (for example, www.mydomain.com/index.html)
The server reads and returns the requested file (www.mydomain.com/index.html)
index.html and its linked resources are downloaded to the user's browser
The user's browser renders the HTML page
The user's browser only fetches the files referenced by the page (such as images/someimages.png and scripts/jquery.js)
Explanation
The problem you are facing here is that once the HTML is rendered locally, it no longer has a connection to the server, so asking which files /images/ contains is not something the client can answer: that directory resides on the server.
Work-around
What you can do, though this sidesteps the premise of the question, is write a server-side script in JSP/PHP/ASP/etc. that traverses the directory you want. In PHP you can do this with opendir() (http://php.net/opendir).
With an XHR/AJAX call you can then request the PHP page to return the directory listing. The easiest way to do this is jQuery's $.post() function in combination with JSON.
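A Python stand-in for the server-side script could look like the sketch below, using only the standard library. The images directory, the /my_image_dirlist path and the extension filter are hypothetical; the filter is one way to limit what the listing exposes:

```python
import json
import os
from http.server import BaseHTTPRequestHandler
from typing import List

# Hypothetical settings; adjust to your own site layout.
IMAGE_DIR = "images"
LISTING_PATH = "/my_image_dirlist"
ALLOWED_EXTENSIONS = {".png", ".jpg", ".jpeg", ".gif", ".svg"}


def list_images(directory: str) -> List[str]:
    """Return the sorted names of regular files in `directory` with an allowed extension."""
    names = []
    with os.scandir(directory) as entries:
        for entry in entries:
            ext = os.path.splitext(entry.name)[1].lower()
            if entry.is_file() and ext in ALLOWED_EXTENSIONS:
                names.append(entry.name)
    return sorted(names)


class ListingHandler(BaseHTTPRequestHandler):
    """Answers GET /my_image_dirlist with a JSON array of file names."""

    def do_GET(self):
        if self.path != LISTING_PATH:
            self.send_error(404)
            return
        body = json.dumps(list_images(IMAGE_DIR)).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)
```

It can be started with HTTPServer(("127.0.0.1", 8000), ListingHandler).serve_forever() (HTTPServer from http.server); the page would then fetch the listing with a GET request, e.g. jQuery's $.getJSON(), since this handler does not implement POST.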
Caution!
Keep in mind that if you use the workaround, you expose a URL that lets anyone see what's in the online directory you request (for example, http://www.mydomain.com/my_image_dirlist.php would return a stringified list of everything inside http://www.mydomain.com/images/, or less, depending on rules in the server-side script).
Notes
http://www.html5rocks.com/en/tutorials/file/filesystem/ (seems to work only in Chrome, but would still not be exactly what you want)
If you don't need all files from a folder, but only those that were downloaded to your browser's cache for the current page, you could search online for ways to access the browser cache (downloaded files) of the currently loaded page, or write something like a DOM walker and CSS reader (regex?) to find all file references.
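The DOM-walker idea can be approximated offline in Python with the standard html.parser module. This sketch collects file references from an HTML string rather than from the live DOM, and it does not read CSS url() references; which tag/attribute pairs count as file references is an assumption:

```python
from html.parser import HTMLParser
from typing import List

# Tag/attribute pairs treated as file references (illustrative, not exhaustive).
URL_ATTRIBUTES = {("img", "src"), ("script", "src"), ("link", "href"),
                  ("source", "src"), ("iframe", "src")}


class ResourceCollector(HTMLParser):
    """Records the URLs of files that an HTML document pulls in."""

    def __init__(self):
        super().__init__()
        self.urls: List[str] = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names; self-closing tags
        # like <img ... /> are routed here by the default handle_startendtag.
        for name, value in attrs:
            if (tag, name) in URL_ATTRIBUTES and value:
                self.urls.append(value)


def referenced_files(html: str) -> List[str]:
    collector = ResourceCollector()
    collector.feed(html)
    collector.close()
    return collector.urls
```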
We have a small Flash component on our website/application to upload multiple files.
This works fine; however, we want to get the Content-Type from the headers, and it's always set to 'application/octet-stream'. From what I've learned, this is due to a security restriction of the Flash sandbox, and FileUpload will never give this to us.
Is there any other way we could do this in Flash (aside from creating an HTML/AJAX multi-file upload)?
Many thanks
We have had a similar problem when uploading from a browser. The content type that gets sent depends on the browser and on what is installed on the client machine. If the client machine does not recognise the extension, it comes back as application/octet-stream.
What we ended up doing was creating our own mapping from the file extension to the content type. That way we could ensure consistency.
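A minimal sketch of such a mapping in Python. The table and function name are hypothetical and site-specific; keeping the table in code, instead of trusting the client's reported type or the server's MIME database, is what makes the result consistent:

```python
import os.path

# Hypothetical, site-specific table: fixed in code so the result does not
# depend on the uploading browser or on any machine's MIME configuration.
EXTENSION_TO_CONTENT_TYPE = {
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".png": "image/png",
    ".gif": "image/gif",
    ".pdf": "application/pdf",
    ".txt": "text/plain",
    ".docx": "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
}
DEFAULT_CONTENT_TYPE = "application/octet-stream"


def content_type_for_upload(filename: str) -> str:
    """Derive the content type from the uploaded file name, ignoring what the client sent."""
    ext = os.path.splitext(filename)[1].lower()
    return EXTENSION_TO_CONTENT_TYPE.get(ext, DEFAULT_CONTENT_TYPE)
```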