The w3schools documentation says:
Without a trailing slash on subfolder addresses, you might generate two requests to the server. Many servers will automatically add a trailing slash to the address, and then create a new request.
It is not clear what exactly this means. What difference does a trailing slash in href URLs make, and is there a best practice for adding one?
These are two different URLs:
http://example.com/foo
http://example.com/foo/
Often, but not always, requesting the first URL will trigger the server to reply with a 301 Permanent Redirect to the second URL. The browser will then have to make a second request to the second URL.
This is most commonly the case when the URL is mapped on to a directory on the server's file system and the index.html (or other directory index) is being loaded.
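The directory case can be sketched in a few lines of Python. This is only an illustration of the decision described above, not how any particular server is implemented; the function name directory_redirect and the set of known directories are made up for the example.

```python
def directory_redirect(path, directories):
    """Return the Location for a 301 redirect, or None if no redirect is needed.

    `directories` is a hypothetical set of URL paths that map to directories
    on the server's file system.
    """
    # A directory requested without its trailing slash gets redirected
    # to the same path with the slash appended.
    if not path.endswith("/") and path in directories:
        return path + "/"
    return None


known_directories = {"/foo"}  # assumed layout, for illustration only
print(directory_redirect("/foo", known_directories))   # /foo/
print(directory_redirect("/foo/", known_directories))  # None
```

With this behaviour, a link to /foo costs two round trips (the 301, then the real response), while a link to /foo/ costs one.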
Servers where the content is being dynamically generated (e.g. with an MVC framework like Perl's Catalyst) are less likely to do this. In that case you often have to be even more careful with where you link to because relative URLs will resolve differently from the two URLs.
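To see how relative URLs resolve differently from the two URLs, here is a small sketch using urljoin from Python's standard library. The relative link bar.html is just an example value.

```python
from urllib.parse import urljoin

# The same relative link, resolved against the two different base URLs.
without_slash = urljoin("http://example.com/foo", "bar.html")
with_slash = urljoin("http://example.com/foo/", "bar.html")

print(without_slash)  # http://example.com/bar.html
print(with_slash)     # http://example.com/foo/bar.html
```

Without the trailing slash, foo is treated as the last segment (a "file") and is replaced; with the slash, foo/ is treated as a directory and the link is resolved inside it.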
Fundamentally, http://example.com/foo and http://example.com/foo/ are two entirely different URLs. Ultimately what's important is how the server serving those URLs will respond when queried for those URLs. And it's entirely up to the server what to do. .../foo may return a file while .../foo/ may return a directory listing. Or both may return a directory listing. Or a file. Or the same file. Or a random new response each and every time.
What W3S is pointing out is that many servers are by default configured to return a redirect response to the canonical version ending in a slash. Meaning, if you're requesting .../foo from that server, it will redirect you to .../foo/, which then causes your client to do a second request to .../foo/. Why or how or when a server may issue this redirect is entirely up to each server, and whether it's really such a popular practice is questionable (as is everything by W3S).
The important thing is that you point your URLs where you mean to point them. Is .../foo the correct URL because it's a file? Or is .../foo/ the correct URL because it's the root of a (virtual) directory? You decide, you make sure your server behaves appropriately.
I am new to web development and SEO, and I am badly stuck on the following point:
We have a sitemap file to help search engine robots index our pages correctly.
A sitemap can contain only URLs from the sitemap's own directory. For example, http://www.example.com/sitemap.xml can contain only links that exist in the same directory. But how does the data transfer protocol (http/https) relate to a directory on my server, if it is just a way to transfer data? I do not have two different source folders on my web server for http and https, and indexing should not change when the protocol in the URL changes. I have the same question about the www subdomain. I know the problem is my misunderstanding of web basics.
Clients (such as search engine indexing bots and browsers) make HTTP requests to servers, which provide a response.
A URL is how a specific resource is located. It will specify the scheme/protocol, hostname, and path (and optionally a few other things).
A URL might specify HTTP or HTTPS (the latter adding an encryption layer).
The hostname portion of a URL might include www in the name or it might not.
When the server receives the request it will run some code to determine how to respond to it. A common and simple approach for that code is to match the path portion of the URL to part of the directory structure of the file system of the computer running the HTTP server software. It may, or may not, use different directories as the root for this depending on the hostname and protocol.
This means that you might have an HTTP server providing both HTTP and HTTPS and mapping www.example.com and example.com onto the same directory resulting in (at least) four different URLs all mapping onto any given file.
Best practice is to pick one of those as the canonical URL, with preference given to HTTPS. There are various arguments for and against including the www, which mostly revolve around convenience and how cookies for the primary hostname will be handled on other subdomains.
When writing absolute URLs (e.g. in sitemaps, emails and business cards), use the canonical URL.
It is generally recommended that the server be configured to issue 301 Redirects from the non-canonical URLs to the canonical equivalent.
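As a rough sketch of that redirect logic in Python: the choice of https://www.example.com as the canonical form, the alias set, and the function name redirect_target are all assumptions for illustration. A real server would normally do this in its own configuration rather than in application code.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumed canonical choice for this example: HTTPS with the www hostname.
CANONICAL_SCHEME = "https"
CANONICAL_HOST = "www.example.com"
ALIASES = {"example.com", "www.example.com"}


def redirect_target(url):
    """Return the canonical URL to 301-redirect to, or None if no redirect applies."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    if host not in ALIASES:
        return None  # not one of our hostnames
    if parts.scheme == CANONICAL_SCHEME and host == CANONICAL_HOST:
        return None  # already canonical
    # Ports are ignored here for simplicity; path, query and fragment are kept.
    return urlunsplit(
        (CANONICAL_SCHEME, CANONICAL_HOST, parts.path or "/", parts.query, parts.fragment)
    )


for url in (
    "http://example.com/sitemap.xml",
    "http://www.example.com/sitemap.xml",
    "https://example.com/sitemap.xml",
    "https://www.example.com/sitemap.xml",
):
    print(url, "->", redirect_target(url))
```

The first three URLs all map to https://www.example.com/sitemap.xml, and the fourth needs no redirect.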
I came across an HTML anchor which reads <a href="\">Home</a>.
Normally we put something like <a href="/">Home</a>, but when I click on Home I am able to go to the index page on the website.
I can't replicate the behavior on localhost.
Why does \ direct to the website's homepage, and was it intentional on the developer's part?
You are correct that it is incorrect, and it's almost certainly not intentional. Backslashes (\) are considered unsafe in URLs, and if a backslash is necessary in your URL you would normally have to encode it as %5C.
Why it works
As Rocket Hazmat pointed out in a comment on your question, most browsers automatically substitute / for \ in URLs.
So the link to \ is converted to /, which requests the root of the current server. The server is probably set up to serve some default file like index.php when it receives a request for a directory, and the result is loading the homepage.
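The substitution can be imitated in Python to see where such a link ends up. This is a simplification: it replaces every backslash in the href, and the helper name resolve_like_browser and the page URL are made up for the example. Python's own urljoin does not perform this substitution.

```python
from urllib.parse import urljoin


def resolve_like_browser(page_url, href):
    """Resolve href against page_url, treating backslashes as forward slashes."""
    # Simplification: every backslash in the href becomes a forward slash.
    return urljoin(page_url, href.replace("\\", "/"))


print(resolve_like_browser("http://example.com/about/team.html", "\\"))
# http://example.com/
```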
Why it doesn't work in localhost
I don't know your local HTTP server setup, but chances are it hasn't been configured to serve a specific page (like index.php) when it receives a request for a directory. So you are likely just seeing a directory listing of whatever is at the root of the HTTP server you are running locally.
What a browser receives as an HTML file can have many different endings on the URL path: .html, .htm, /, .php, .asp, .stm, .cgi, etc.
Is there a way to distinguish, from only the request URL, whether it points to an HTML document or some additional data (e.g. .png, .css, .js, ...)? This should be determined at the time of the request, so waiting for Content-Type is not an option.
HTML URLs
google.com/, stackoverflow.com, https://en.wikipedia.org/wiki/Uniform_Resource_Locator, https://www.google.de/search?q=content-length, http://www.w3.org/Protocols/rfc2616/rfc2616-sec4.html
non-HTML URLs
http://cdn.sstatic.net/stackoverflow/img/apple-touch-icon#2.png?v=73d79a89bded, http://ajax.googleapis.com/ajax/libs/jquery/1.7.1/jquery.min.js, http://cdn.sstatic.net/stackoverflow/all.css?v=aaf07438bdbd
Maybe filtering out the non-HTML parts (for example, by js, css, png, jpg, ...) would work. An alternative would be to filter by the extensions listed in "What are common file extensions for web programming languages?" and include directories and domains.
It does not have to be perfect; close enough would be good.
No, this is not possible.
The webserver can do anything it wants in response to a request.
Some responses can be static, i.e. files on disk (but even then, the extension is no guarantee of the real contents of the file). Others can be totally dynamic, and only the server decides what kind of data to return. It could even return a .jpg file in response to a .html request, and the opposite happens a lot in the real world: a .jpg URL that returns an HTML page with a download link for that jpg.
A lot of URLs don't even have an extension, so checking the extension is no solution in general.
The best (earliest) way is to look at the Content-Type header field (assuming it corresponds with the data).
If the client doesn't want to download the full response, only to check the Content-Type, a HEAD request can be made, which will only return the HTTP headers.
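A minimal sketch of such a HEAD request with Python's standard library follows. The function names media_type and head_media_type are made up for this example, and error handling (timeouts, HTTP errors, servers that reject HEAD) is left out.

```python
from email.message import Message
from urllib.request import Request, urlopen


def media_type(content_type_header):
    """Strip parameters such as charset from a Content-Type header value."""
    msg = Message()
    msg["Content-Type"] = content_type_header
    return msg.get_content_type()


def head_media_type(url, timeout=10):
    """Send a HEAD request and return the media type the server declares.

    Returns None when the response carries no Content-Type header.
    """
    request = Request(url, method="HEAD")
    with urlopen(request, timeout=timeout) as response:
        header = response.headers.get("Content-Type")
    return media_type(header) if header else None


print(media_type("text/html; charset=utf-8"))  # text/html
```

Calling head_media_type on a URL only transfers the headers, so the check is cheap, but it still costs one round trip to the server.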
No.
URLs are, once you hit the path segment, entirely arbitrary.
Sometimes the URL will include something which happens to match a filename on the HTTP server's hard disk. Sometimes that filename will give a clue about what kind of data is in it. Often it will give a clue about how the server will execute a program which will generate content of any kind.
The authoritative description of what an HTTP resource is is the Content-Type response header (and sometimes servers give wrong information there anyway).
No, that's not possible (assuming you're looking for something reliable).
In general, the format of a URI is independent of the media type of the resource it identifies. That's how the web works.
The below answer is deprecated. In Python, there is mimetypes.py in the standard library, which does exactly that.
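For the "close enough" case, a heuristic based on mimetypes might look like the sketch below. The function name probably_html and the rule "no recognisable extension means probably HTML" are assumptions for illustration; as the other answers explain, the server is free to return anything, so this is a guess and nothing more.

```python
import mimetypes
from urllib.parse import urlsplit

HTML_TYPES = {"text/html", "application/xhtml+xml"}


def probably_html(url):
    """Guess from the URL alone whether it is likely to be an HTML document."""
    # Look only at the path, so query strings such as ?v=... are ignored.
    path = urlsplit(url).path
    guessed, _encoding = mimetypes.guess_type(path)
    # Unknown or missing extension: assume a page. Known non-HTML type: not a page.
    return guessed is None or guessed in HTML_TYPES


print(probably_html("http://www.w3.org/Protocols/rfc2616/rfc2616-sec4.html"))
print(probably_html("https://en.wikipedia.org/wiki/Uniform_Resource_Locator"))
print(probably_html("http://cdn.sstatic.net/stackoverflow/all.css?v=aaf07438bdbd"))
```

The first two calls return True and the last returns False. Bare hostnames and unusual filenames can still be misclassified, which is why the Content-Type header remains the only reliable source.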
Old answer
As a bit of reasoning: URLs containing file extensions like .html expose implementation details. When you change from CGI to something else, you would be forced to either abandon the URL, breaking links, or keep a misleading extension around. See also
Semantic URL Wiki Page and
Cool URIs don't change.
I'm assuming that HTML pages essentially compute a root path by taking everything before the first single slash character.
Now, given that assumption, can we tell an HTML page to use a different root? For example, if I have a proxy that is the root, and the proxy has a slash in it:
http://localhost:8080/proxy1/
which I want to use, rather than the normally computed root:
http://localhost:8080/
Is there a way I can modify the way my page computes its own root? i.e.
http://localhost:8080/<ROOT=http://localhost:8080/proxy>
Note that the last URL is, of course, a totally made-up construct to illustrate the end goal.
If this is impossible, which I suspect it is, what is the more general way of dealing with a proxy that has slashes in it?
After thinking some more about this, really there are a few ways to solve the problem addressed here. Here are two:
1) Give up on the complex proxy with multiple slashes in it, and just map your proxy to a domain name
Since "the forward slash (in an absolute url) is automatically replaced with the transfer protocol and domain name of the current website" (from http://www.motive.co.nz/glossary/linking.php?ref), it will just work if your web app uses absolute URLs properly.
2) Embed a root URI in your application options
This can be instrumented to tell a web server that "when you get requests to /, forward them to /x/y/z". As an example, the prometheus monitoring system provides such an option so that the application can set the root for all requests dynamically.
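A rough sketch of the second approach in Python: the application reads a root prefix from its options and prepends it to every link it generates. The option name ROOT_PREFIX and the helper prefixed are hypothetical and are not taken from prometheus or any other tool.

```python
# Hypothetical application option: the path under which the proxy exposes the app.
ROOT_PREFIX = "/proxy1/"


def prefixed(root_prefix, path):
    """Build a root-relative URL that lives under the configured prefix."""
    cleaned = root_prefix.strip("/")
    prefix = "/" + cleaned if cleaned else ""
    return prefix + "/" + path.lstrip("/")


print(prefixed(ROOT_PREFIX, "/static/app.js"))  # /proxy1/static/app.js
print(prefixed("", "/static/app.js"))           # /static/app.js
```

The point of the design is that no template or script hard-codes "/" as the root; every generated link goes through the one helper, so the prefix can change without touching the pages.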
...There are likely other valid solutions/answers/comments on this general problem as well.
Let's say we have a site at test.com. Would test.com/ ever be a different site/file than test.com? I know that the url represents a path to the server to get that file. Going to test.com/file and test.com/file/ could potentially bring up different pages, since the latter is a directory. So I was wondering if the same is true for the root.
Or am I wrong about the URL as well?
It all depends on your environment and how it handles the routing. Many frameworks treat url/ and url as aliases, but many frameworks don't. So the answer to your question is yes, it can be different.
the url represents a path to the server to get that file
This is correct, but this can be any path you choose.
When you create a simple website with nested folders, yes, you can create something like this:
/webroot/index.html
/webroot/blog.html
/webroot/myvideos/list.html
This results in, for example, www.example.com, www.example.com/blog.html and www.example.com/myvideos/list.html.
But with server-side settings called rewrites, you can make your URLs behave any way you want.
You could even redirect URLs to entirely different servers, or make two different URLs go to the same path. Anything you want.
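To give a feel for what a rewrite does, here is a toy rewrite table in Python. Real servers express this in their own configuration syntax; the patterns, targets and the function name rewrite below are invented for illustration, reusing the example files above.

```python
import re

# Each rule maps a URL pattern to the file path that should actually be served.
REWRITES = [
    (re.compile(r"^/$"), "/index.html"),
    (re.compile(r"^/blog/?$"), "/blog.html"),            # /blog and /blog/ are aliases
    (re.compile(r"^/videos/?$"), "/myvideos/list.html"),
]


def rewrite(path):
    """Return the internal path for a requested URL path."""
    for pattern, target in REWRITES:
        if pattern.match(path):
            return target
    return path  # no rule matched: use the path unchanged


print(rewrite("/blog"))     # /blog.html
print(rewrite("/blog/"))    # /blog.html
print(rewrite("/videos"))   # /myvideos/list.html
```

Here /blog and /blog/ happen to be treated the same, but only because the rule says so; dropping the `/?` from the pattern would make them different URLs again.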