I am working on Ruby on Rails (RoR) web apps. My webpage URL looks like this:
http://dev.ibiza.jp:3000/facebook/report?advertiser_id=2102#/dashboard
I understand that advertiser_id is 2102, but I can't work out what #/dashboard points to.
The portion of the URL which follows the # symbol is not normally sent to the server in the request for the page. If you open your web inspector and watch the request for the page, you will see that the #/dashboard portion is not included in the request at all.
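The split between what the server sees and what stays in the browser can be illustrated with the standard URL class (available in browsers and as a global in Node.js). This is only a sketch using the URL from the question; the helper name splitUrl is hypothetical.

```javascript
// Split a URL into the part that goes into the HTTP request and the
// fragment, which the browser keeps to itself.
function splitUrl(href) {
  const url = new URL(href);
  return {
    // What appears in the request line: path plus query string.
    requestTarget: url.pathname + url.search,
    advertiserId: url.searchParams.get("advertiser_id"),
    // Everything from "#" onward; never included in the HTTP request.
    fragment: url.hash,
  };
}

const parts = splitUrl("http://dev.ibiza.jp:3000/facebook/report?advertiser_id=2102#/dashboard");
console.log(parts.requestTarget); // "/facebook/report?advertiser_id=2102"
console.log(parts.advertiserId); // "2102"
console.log(parts.fragment); // "#/dashboard"
```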
On a normal (basic HTML) web page, the # symbol can be used to link to a section within the page, so that the browser jumps down to that section after the page loads.
In fancy Javascript-heavy web applications, the # symbol is commonly followed by more URL paths. For example, in www.example.com/some-path#/other-path/etc, the other-path/etc portion of the URL is not seen by the server, but it is available for Javascript to read in the browser, presumably to display something different based on that path.
So in your case, the first part of the URL is a request to the server:
http://dev.ibiza.jp:3000/facebook/report?advertiser_id=2102
and the second part of the URL could be for Javascript to display a specific view of the page once it has loaded:
#/dashboard
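A minimal sketch of how page Javascript might act on a fragment like #/dashboard. The route table, view names, and the default route are hypothetical; a real app would use whatever router its framework provides.

```javascript
// Hypothetical route table: fragment path -> name of the view to show.
const routes = new Map([
  ["/dashboard", "DashboardView"],
  ["/settings", "SettingsView"],
]);

// Turn a fragment such as "#/dashboard" into a view name.
function resolveView(hash) {
  const path = hash.replace(/^#/, "") || "/dashboard"; // assumed default route
  return routes.get(path) ?? "NotFoundView";
}

// Browser-only wiring: re-render whenever the fragment changes. Changing only
// the fragment does not reload the page or contact the server.
if (typeof window !== "undefined") {
  const render = () => console.log("showing", resolveView(window.location.hash));
  window.addEventListener("hashchange", render);
  render(); // also handle the fragment present on first load
}
```

The browser branch is guarded so that the pure resolveView function can also be exercised outside a browser.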
Technically, the part of the URL after the # symbol is called the fragment identifier. It is typically used to link to a specific piece of content within a web page (such as to cause the browser to jump down to a particular section on the page).
As others have mentioned, this has SEO implications. In order to index pages such as this, you may have to employ different techniques to allow the content that is "behind the # symbol" to be accessible to search engines.
The # symbol introduces an anchor: it makes the browser jump to a specific position on the HTML page.
It's also used as a crawling technique; you can read more here
To provide another example, here's a request to GitHub for the source code of a Java class:
https://github.com/spring-cloud/spring-cloud-consul/blob/master/spring-cloud-consul-discovery/src/main/java/org/springframework/cloud/consul/serviceregistry/ConsulServiceRegistry.java
If you append "#L90" to this URL, the web browser makes the same request, then scrolls to line 90 and highlights that line of code:
https://github.com/spring-cloud/spring-cloud-consul/blob/master/spring-cloud-consul-discovery/src/main/java/org/springframework/cloud/consul/serviceregistry/ConsulServiceRegistry.java#L90
Your web browser made the same request to the GitHub server, but in the anchored case it performed the additional action of highlighting the selected line after the response was received.
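To show that it is the page, not the server, that interprets "#L90", here is a sketch of parsing such a fragment. The accepted formats ("#L90" and a range like "#L90-L95") are an assumption modelled on what GitHub URLs look like, not GitHub's actual code.

```javascript
// Parse a line-anchor fragment into an inclusive line range, or null.
function parseLineAnchor(hash) {
  const m = /^#L(\d+)(?:-L(\d+))?$/.exec(hash);
  if (!m) return null;
  const start = Number(m[1]);
  const end = m[2] ? Number(m[2]) : start;
  // Normalise reversed ranges such as "#L95-L90".
  return { start: Math.min(start, end), end: Math.max(start, end) };
}

console.log(parseLineAnchor("#L90")); // { start: 90, end: 90 }
```

Script on the page would then scroll to and highlight those lines once the response has loaded.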
Everything after # is the hash of the location. The ! that follows is used by search engines to help index AJAX content. What comes after that can be anything, but it is usually formatted to look like a path (hence the /).
Related
Let's say that we have two pages:
https://www.example.com/first/firstpage.html
https://www.example.com/second/secondpage.html
that both load the resource https://www.example.com/resource.js
If I want the server that serves resource.js to be able to serve a different version of resource.js depending on which page the request is coming from, is there a reliable header upon which the full URL of the requesting page can be determined (or maybe there is some other way to determine this)?
I know that there is an Origin header, but from my understanding it only contains the scheme, host, and port, without the path and query string. Is there any way for the server to know the full URL and query string of the page that the request for the resource is coming from?
If this isn't possible, I know it would be easy to include that info in the JS script tag as follows:
<script src="/resource.js?origin=/first/firstpage.html"></script>
But I don't want to have to modify the script tag for each page. Is there some other way to have the page automatically include its own URL in the query string of the resource request (without dynamically loading the resource with my own JS script; HTML only, please!), or any unique identifier, so that the script tag doesn't have to be modified individually on each page?
There's the Referer header that you can use.
Make sure that your response uses Vary: Referer. Otherwise, browsers are going to cache this resource as if the referring page URL didn't matter.
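A minimal Node.js sketch of what this answer describes: choose a variant of resource.js from the Referer header and send Vary: Referer. The variant map and script bodies are hypothetical placeholders. Note that browsers may send only the origin, or no Referer at all, depending on the page's Referrer-Policy, which is another reason for the caution below.

```javascript
// Hypothetical variants of resource.js, keyed by the referring page's path.
const variants = new Map([
  ["/first/firstpage.html", "console.log('first page variant');"],
  ["/second/secondpage.html", "console.log('second page variant');"],
]);
const fallback = "console.log('default variant');";

function handleResource(req, res) {
  // Node lowercases incoming header names; the header is spelled "Referer".
  const referer = req.headers.referer;
  let body = fallback;
  if (referer) {
    try {
      body = variants.get(new URL(referer).pathname) ?? fallback;
    } catch {
      // Malformed Referer: keep the default variant.
    }
  }
  res.statusCode = 200;
  res.setHeader("Content-Type", "text/javascript");
  // Tell caches that the response differs per referring page.
  res.setHeader("Vary", "Referer");
  res.end(body);
}

// To serve it: require("node:http").createServer(handleResource).listen(8080);
```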
I'd plead with you not to do this at all, though. You're going to create a rabbit hole of problems, as not all browsers or proxy servers are well behaved. Some are going to aggressively cache this anyway, no matter what you do with the Vary header.
I've just noticed that the long, convoluted Facebook URLs that we're used to now look like this:
http://www.facebook.com/example.profile#!/pages/Another-Page/123456789012345
As far as I can recall, earlier this year it was just a normal URL-fragment-like string (starting with #), without the exclamation mark. But now it's a shebang or hashbang (#!), which I've previously only seen in shell scripts and Perl scripts.
The new Twitter URLs now also feature the #! symbols. A Twitter profile URL, for example, now looks like this:
http://twitter.com/#!/BoltClock
Does #! now play some special role in URLs, for example for a certain Ajax framework, since the new Facebook and Twitter interfaces are now largely Ajaxified?
Would using this in my URLs benefit my Web application in any way?
This technique is now deprecated.
This used to tell Google how to index the page.
https://developers.google.com/webmasters/ajax-crawling/
This technique has mostly been supplanted by the JavaScript History API, which was introduced alongside HTML5. Under the old scheme, for a URL like www.example.com/ajax.html#!key=value, Google would request the URL www.example.com/ajax.html?_escaped_fragment_=key=value to fetch a non-AJAX version of the contents.
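The URL mapping described above can be sketched as follows. This is a simplified, non-authoritative version of the deprecated scheme: it assumes that control characters, space, "#", "%", "&", "+" and non-ASCII characters in the fragment are percent-encoded, and that an existing query string is extended with "&".

```javascript
// Percent-encode the characters the old crawling scheme required escaping.
function escapeFragmentChar(ch) {
  const code = ch.codePointAt(0);
  if (code <= 0x20 || code >= 0x7f || "#%&+".includes(ch)) {
    return encodeURIComponent(ch);
  }
  return ch;
}

// Map a "#!" URL to the "_escaped_fragment_" URL a crawler would fetch.
function toEscapedFragmentUrl(url) {
  const i = url.indexOf("#!");
  if (i === -1) return url; // not a hashbang URL
  const base = url.slice(0, i);
  const fragment = Array.from(url.slice(i + 2), escapeFragmentChar).join("");
  // Append to an existing query string if there is one.
  const separator = base.includes("?") ? "&" : "?";
  return `${base}${separator}_escaped_fragment_=${fragment}`;
}

console.log(toEscapedFragmentUrl("www.example.com/ajax.html#!key=value"));
// "www.example.com/ajax.html?_escaped_fragment_=key=value"
```

For example, the Twitter URL http://twitter.com/#!/BoltClock maps to http://twitter.com/?_escaped_fragment_=/BoltClock, which the server had to answer with a plain HTML snapshot.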
The octothorpe/number sign/hash mark has a special significance in a URL: it normally identifies the name of a section of a document. The precise term is that the text following the hash is the anchor portion of a URL. If you use Wikipedia, you will see that most pages have a table of contents, and you can jump to sections within the document with an anchor, such as:
https://en.wikipedia.org/wiki/Alan_Turing#Early_computers_and_the_Turing_test
https://en.wikipedia.org/wiki/Alan_Turing identifies the page and Early_computers_and_the_Turing_test is the anchor. The reason that Facebook and other Javascript-driven applications (like my own Wood & Stones) use anchors is that they want to make pages bookmarkable (as suggested by a comment on that answer) or support the back button without reloading the entire page from the server.
In order to support bookmarking and the back button, you need to change the URL. However, if you change the page portion of the URL (with something like window.location = 'http://raganwald.com';), whether to a different URL or to the same URL without an anchor, the browser will load the entire page from that URL. Try this in Firebug or Safari's Javascript console. Load http://minimal-github.gilesb.com/raganwald. Now in the Javascript console, type:
window.location = 'http://minimal-github.gilesb.com/raganwald';
You will see the page refresh from the server. Now type:
window.location = 'http://minimal-github.gilesb.com/raganwald#try_this';
Aha! No page refresh! Type:
window.location = 'http://minimal-github.gilesb.com/raganwald#and_this';
Still no refresh. Use the back button to see that these URLs are in the browser history. The browser notices that we are on the same page but just changing the anchor, so it doesn't reload. Thanks to this behaviour, we can have a single Javascript application that appears to the browser to be on one 'page' but to have many bookmarkable sections that respect the back button. The application must change the anchor when a user enters different 'states', and likewise if a user uses the back button or a bookmark or a link to load the application with an anchor included, the application must restore the appropriate state.
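The "change the anchor on entering a state, restore the state from the anchor" cycle can be sketched like this. The state shape ({ view, item }) and applyState are hypothetical stand-ins for a real application's state and rendering code.

```javascript
// Serialise application state into a fragment, e.g. "#view=board&item=42".
function stateToHash(state) {
  return "#" + new URLSearchParams(state).toString();
}

// Parse a fragment back into a plain state object.
function hashToState(hash) {
  return Object.fromEntries(new URLSearchParams(hash.replace(/^#/, "")));
}

// Hypothetical: replace with code that actually redraws the UI.
function applyState(state) {
  console.log("restoring state", state);
}

// Entering a new state: only the anchor changes, so there is no reload,
// but a history entry is added for the back button.
function goTo(state) {
  window.location.hash = stateToHash(state);
}

if (typeof window !== "undefined") {
  // Restore state on first load (bookmark or link) and on back/forward.
  const restore = () => applyState(hashToState(window.location.hash));
  window.addEventListener("hashchange", restore);
  restore();
}
```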
So there you have it: Anchors provide Javascript programmers with a mechanism for making bookmarkable, indexable, and back-button-friendly applications. This technique has a name: It is a Single Page Interface.
p.s. There is a fourth benefit to this technique: Loading page content through AJAX and then injecting it into the current DOM can be much faster than loading a new page. In addition to the speed increase, further tricks like loading certain portions in the background can be performed under the programmer's control.
p.p.s. Given all of that, the 'bang' or exclamation mark is a further hint to Google's web crawler that the exact same page can be loaded from the server at a slightly different URL. See Ajax Crawling. Another technique is to make each link point to a server-accessible URL and then use unobtrusive Javascript to change it into an SPI with an anchor.
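A sketch of the unobtrusive approach mentioned above: links keep real, server-accessible hrefs (so crawlers and users without Javascript can follow them), and script rewrites them into anchor form for the SPI. The data-spi attribute used to opt links in is a hypothetical convention.

```javascript
// Turn a server-accessible href such as "/pages/Another-Page/123" into "#!/pages/Another-Page/123".
function toHashbangHref(href) {
  // The base URL is only needed to parse relative hrefs; it is discarded.
  const url = new URL(href, "http://placeholder.invalid");
  return "#!" + url.pathname + url.search;
}

if (typeof document !== "undefined") {
  // Only links explicitly marked for the single page interface are rewritten.
  for (const a of document.querySelectorAll("a[data-spi]")) {
    a.setAttribute("href", toHashbangHref(a.getAttribute("href")));
  }
}
```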
Here's the key link again: The Single Page Interface Manifesto
First of all: I'm the author of The Single Page Interface Manifesto cited by raganwald.
As raganwald has explained very well, the most important aspect of the Single Page Interface (SPI) approach used in Facebook and Twitter is the use of the hash (#) in URLs.
The ! character is added only for Google's benefit: this notation is a Google "standard" for crawling AJAX-intensive web sites (in the extreme case, Single Page Interface web sites). When Google's crawler finds a URL with #!, it knows that an alternative conventional URL exists that provides the same page "state", but rendered at load time.
Although the #! combination is very interesting for SEO, it is only supported by Google (as far as I know). With some JavaScript tricks, you can build SPI web sites that are SEO-compatible with any web crawler (Yahoo, Bing...).
The SPI Manifesto and demos do not use Google's ! format in hashes; this notation could easily be added, making SPI crawling even easier (UPDATE: the ! notation is now used and remains compatible with other search engines).
Take a look at this tutorial. It is an example of a simple ItsNat SPI site, but you can pick up some ideas for other frameworks; this example is SEO-compatible with any web crawler.
The hard problem is generating any (or selected) "AJAX page state" as plain HTML for SEO. In ItsNat this is very easy and automatic: the same site is at the same time SPI and page-based for SEO (or for when JavaScript is disabled, for accessibility). With other web frameworks you can always follow the double-site approach, where one site is SPI-based and another is page-based for SEO; for instance, Twitter uses this "double site" technique.
I would be very careful if you are considering adopting this hashbang convention.
Once you hashbang, you can’t go back. This is probably the stickiest issue. Ben’s post put forward the point that when pushState is more widely adopted then we can leave hashbangs behind and return to traditional URLs. Well, fact is, you can’t. Earlier I stated that URLs are forever, they get indexed and archived and generally kept around. To add to that, cool URLs don’t change. We don’t want to disconnect ourselves from all the valuable links to our content. If you’ve implemented hashbang URLs at any point then want to change them without breaking links the only way you can do it is by running some JavaScript on the root document of your domain. Forever. It’s in no way temporary, you are stuck with it.
You really want to use pushState instead of hashbangs, because making your URLs ugly and possibly broken, forever, is a colossal and permanent downside to hashbangs.
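As a sketch of the "run some JavaScript on the root document, forever" point in the quote, here is one way a site might map old hashbang links onto conventional paths with the History API. It assumes http(s) URLs whose fragment after "#!" is a path; hashbangToPath is a hypothetical helper.

```javascript
// "http://twitter.com/#!/BoltClock" -> "/BoltClock"; null if not a hashbang URL.
function hashbangToPath(href) {
  const url = new URL(href);
  if (!url.hash.startsWith("#!")) return null;
  const target = new URL(url.hash.slice(2), url.origin);
  return target.pathname + target.search + target.hash;
}

// This check has to stay on the root document so old links keep working.
// history.replaceState belongs to the same History API as pushState and
// swaps the visible URL without reloading the page.
if (typeof window !== "undefined") {
  const path = hashbangToPath(window.location.href);
  if (path !== null) {
    window.history.replaceState(null, "", path);
    // ...then render the view for `path`, or call location.replace(path)
    // to load it from the server instead.
  }
}
```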
As a follow-up to all this: Twitter, one of the pioneers of hashbang URLs and the single-page interface, admitted that the hashbang system was slow in the long run, and they have started reversing the decision and returning to old-school links.
An article about this is here.
I always assumed the ! just indicated that the hash fragment that followed corresponded to a URL, with ! taking the place of the site root or domain. It could be anything, in theory, but it seems the Google AJAX Crawling API likes it this way.
The hash, of course, just indicates that no real page reload is occurring, so yes, it’s for AJAX purposes. Edit: Raganwald does a lovely job explaining this in more detail.
Is there a reason we include the http / https protocol on the href attribute of links?
Would it be fine to just leave it off:
my site
The inclusion of the “http:” or “https:” part is partly just a matter of tradition, partly a matter of actually specifying the protocol. If it is omitted, the protocol of the current page is used; e.g., //www.example.com becomes http://www.example.com or https://www.example.com depending on the URL of the referring page. If a web page is saved on a local disk and then opened from there, it has no protocol (just the file: pseudo-protocol), so URLs like //www.example.com won’t work, which is one reason for including the “http:” or “https:” part.
Omitting also the “//” part is a completely different issue altogether, turning the URL to a relative URL that will be interpreted as relative to the current base URL.
The reason why www.example.com works when typed or pasted on a browser’s address line is that relative URLs would not make sense there (there is no base URL to relate to), so browser vendors decided to imply the “http://” prefix there.
URLs in href are not restricted to HTTP documents; they can use any protocol the browser supports: ftp, mailto, file, etc.
Also, you can precede a name with '#' to link to an HTML id within the page. You can also give just a file name or directory path, without a protocol, and it will be treated as a relative URL.
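The resolution rules from these answers can be checked with the standard URL constructor, new URL(reference, base), which resolves an href the way a browser does. The page URLs used as bases are hypothetical.

```javascript
// Resolve an href against the page it appears on.
const page = "https://www.example.org/dir/page.html";
const resolveHref = (ref, base = page) => new URL(ref, base).href;

// Protocol-relative: inherits https: from the page.
console.log(resolveHref("//www.example.com")); // "https://www.example.com/"
// No "//": treated as a relative path.
console.log(resolveHref("www.example.com")); // "https://www.example.org/dir/www.example.com"
// Fragment only: same page, different anchor.
console.log(resolveHref("#contact")); // "https://www.example.org/dir/page.html#contact"
// Page saved to disk: the "//" form now points at a file: URL.
console.log(resolveHref("//www.example.com", "file:///home/user/page.html")); // "file://www.example.com/"
// Other protocols are kept as written.
console.log(resolveHref("mailto:someone@example.com")); // "mailto:someone@example.com"
```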
My solution was to trick the browser with a redirect service, such as bit.ly or goo.gl (which will be discontinued soon), among others.
Because the shortened URL is HTTPS, the browser accepts the image link, follows the redirect, and displays the HTTP image without showing the original link.
The annoying part is that, depending on traffic, your redirector's control panel will show thousands of "clicks", which are actually just image displays.
Based on this experience, I'm going to look for a WordPress redirection plugin and create my own redirect links, so that https://mysite.com/id redirects to the http link.
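A self-hosted redirector of the kind described could be as small as the following Node.js handler sketch. The ids and target URLs are hypothetical, and whether a given browser still displays the final http:// image depends on its mixed-content rules, so test this before relying on it.

```javascript
// Hypothetical short ids on https://mysite.com mapped to original http:// URLs.
const links = new Map([
  ["/img1", "http://example.com/images/photo.jpg"],
]);

function handleRedirect(req, res) {
  const { pathname } = new URL(req.url, "https://mysite.com");
  const target = links.get(pathname);
  if (target === undefined) {
    res.statusCode = 404;
    res.end("Not found");
    return;
  }
  // 302 = temporary redirect; the browser then requests the target URL.
  res.statusCode = 302;
  res.setHeader("Location", target);
  res.end();
}

// To serve it: require("node:http").createServer(handleRedirect).listen(8080);
```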
What is the best method of hiding the file name in the URL from the developer's side (for a developer who has no control over the server)? For example, if the page is www.123.co.za/contact.htm, I want the user to see only www.123.co.za. An example of this can be seen at http://www.groupon.co.za
The ways I know of are using one page and dynamically loading page content with AJAX,
or using frames.
(I can't use server options like mod_rewrite, as I don't have access to or control over the server.)
They are using index pages. That means they have a page such as index.html, index.php, or index.aspx, etc. All you have to do is create a directory (for example, 'contact') and put a file named 'index.html' within that directory. Then you can view www.123.co.za/contact/index.html as www.123.co.za/contact. Note that your allowable index page names may vary. If index.* doesn't work for you, contact the host and ask (sometimes it's default.*).
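The directory-index behaviour can be pictured with this simplified model of how a static server maps a request path to a file. The file set and the list of index names are hypothetical (the allowed names are host-dependent), and real servers differ in details; Apache's mod_dir, for example, by default first redirects "/contact" to "/contact/".

```javascript
// Hypothetical files on disk and index names the host recognises.
const files = new Set(["/index.html", "/contact/index.html"]);
const indexNames = ["index.html", "index.php", "default.htm"];

// Return the file that would be served for urlPath, or null for a 404.
function resolveRequest(urlPath) {
  if (files.has(urlPath)) return urlPath;
  const dir = urlPath.endsWith("/") ? urlPath : urlPath + "/";
  for (const name of indexNames) {
    if (files.has(dir + name)) return dir + name;
  }
  return null;
}

console.log(resolveRequest("/contact")); // "/contact/index.html"
```

The same file answers /contact, /contact/ and /contact/index.html, which is the duplicate-URL catch described next.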
The catch with this method is that your page is now viewable at at least three URLs (www.123.co.za/contact, www.123.co.za/contact/, www.123.co.za/contact/index.html). This can hurt your site in search engines, because you may be penalized for "duplicate" content. You could solve this issue with mod_rewrite, but seeing as you can't use that, you can't prevent the aforementioned scenario.
I have a web application that has a constant URL and an internal state machine. The states are changed via POST requests. I know this is bad design and that I should use a REST approach, but given this design I have the following problem.
I use the HTML5 offline cache (the manifest attribute on the html tag). For the first page (the login page), the manifest is parsed and cached as I would expect. But for the second page (the main menu), the manifest included there is not parsed. No events are shown in the Chrome browser. If I change the URL slightly by adding a parameter, the manifest is parsed, but not otherwise.
Even if I include everything in the login page's manifest, the second page downloads the same files again, even though they are specified in the first page's manifest.
Why this behaviour?
To answer my own question: it looked so odd simply because the manifest is only processed on GET requests and ignored on POST requests, even if the POST loads another HTML page. This seems a little silly to me, but it appears to be how it works.
Now it finally works as it is supposed to.