How to mark POSTing URLs? - html

Search engines and pre-fetching browser plugins can cause quite some trouble with <a> elements where the destination page changes the state of the server. In a <form>, I'd mark it as modifying with method="POST". Is there a similar way to mark regular links as modifying?
rel="nofollow" does not solve the problem. From the specification:
By adding rel="nofollow" to a hyperlink, a page indicates that the destination of that hyperlink should not be afforded any additional weight or ranking by user agents which perform link analysis upon web pages (e.g. search engines)

A plain old link can only make GET requests. A GET request, as you indicated, should not trigger any destructive changes.
The solution, if you can't or don't want to have a form in your page at that point, is to have the link point to a page that does have a form. For instance, if you have a "delete" link it might point to a page that says "Are you sure you want to delete X? [delete]".
Then, if you don't want people to have to leave the page every time they delete something, you can implement some AJAX functionality in JavaScript.
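As a rough sketch of the AJAX variant: the click handler sends a POST instead of letting the link navigate with GET. The endpoint shape /items/<id>/delete, the confirm=yes body and the function name are assumptions made up for illustration, not part of the question.

```javascript
// Build the arguments for a state-changing request.
// The "/items/<id>/delete" endpoint is hypothetical.
function buildDeleteRequest(itemId) {
  return {
    url: '/items/' + encodeURIComponent(itemId) + '/delete',
    options: {
      method: 'POST', // a plain link can only issue GET, so use POST here
      headers: { 'Content-Type': 'application/x-www-form-urlencoded' },
      body: 'confirm=yes',
    },
  };
}

// In a browser, a click handler could then do something like:
//   const req = buildDeleteRequest('42');
//   fetch(req.url, req.options).then((res) => { /* update the page */ });
```

The link itself can still point at the confirmation page, so the delete keeps working when JavaScript is unavailable.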

What risks can I avoid by using noreferrer in React links?

I have to link some other external sites.
I know when to use nofollow. But I am not clear when I should use rel=noreferrer.
In short, the noreferrer link type hides referrer information when the link is clicked. A link with the noreferrer link type looks something like this:
<a href="..." rel="noreferrer">Click here for more info</a>
If someone arrives at your site from a link that uses this link type, your analytics won't show who referred that visitor. Instead, the visit will show up as direct traffic in your acquisition channels report.
If you have an external link to someone else's site you don't trust and you want to hide referrer information, you can combine both and use:
<a href="..." rel="nofollow noreferrer">Other Domain Link</a>
I advise you to use nofollow links for the following content:
Links in comments or on forums - Anything that has user-generated content is likely to be a source of spam. Even if you carefully moderate, things will slip through.
Advertisements & sponsored links - Any links that are meant to be advertisements or are part of a sponsorship arrangement must be nofollowed.
Paid links - If you charge in any way for a link (directory submission, quality assessment, reviews, etc.), nofollow the outbound links.
noreferrer doesn't just block the HTTP Referer header; it also prevents a JavaScript exploit involving window.opener.
<a href="..." target="_blank">Link</a>
This looks innocuous enough, but there's a hole: by default, the page that opens the link allows the opened page to call back into it via window.opener. Because the two pages are on different domains, some restrictions apply, but there's still some mischief that can be done:
window.opener.location = 'http://gotcha.badstuff';
With noreferrer most browsers will disallow the window.opener exploit
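A minimal sketch of applying this to existing links. The helper name hardenRel is made up for illustration, and adding noopener alongside noreferrer is an addition to what the answer describes; noopener is the link type that specifically cuts off window.opener.

```javascript
// Return a rel attribute value that includes "noopener" and "noreferrer",
// keeping any tokens (such as "nofollow") that were already present.
function hardenRel(rel) {
  const tokens = (rel || '').split(/\s+/).filter(Boolean);
  for (const needed of ['noopener', 'noreferrer']) {
    if (!tokens.includes(needed)) tokens.push(needed);
  }
  return tokens.join(' ');
}

// Browser-only usage: patch every link that opens a new window.
if (typeof document !== 'undefined') {
  for (const a of document.querySelectorAll('a[target="_blank"]')) {
    a.rel = hardenRel(a.rel);
  }
}
```

Current browsers already treat target="_blank" as if noopener were set, so this mostly matters for older browsers and for hiding the referrer.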
As @unor said, it hides referrer information when the link is clicked. Basically this is a privacy enhancement for when you want to hide from the owner of the linked domain that the user came from your website.
Example:
A user is on your website www.mywebsite.com, where you have a link to newsite.com. When someone clicks the link, the owner of newsite.com knows the visitor came from www.mywebsite.com. Setting rel=noreferrer prevents this information from being revealed.
A good example of how it works starts at 21:28 of this conference talk. This is considered good practice when working server-side (e.g. with Node.js). You can also read about this in the Helmet documentation.
You'll only need to use this on private pages or pages you don't want to advertise. E.g. a webmail or private bug tracker would be considered private, and you don't want to leak any information to the external linked websites.
Sensitive public pages, such as those with medical information or other sensitive topics, may also want to mask the Referer header.

Difference between href="#!" and href="#"

I've just noticed that the long, convoluted Facebook URLs that we're used to now look like this:
http://www.facebook.com/example.profile#!/pages/Another-Page/123456789012345
As far as I can recall, earlier this year it was just a normal URL-fragment-like string (starting with #), without the exclamation mark. But now it's a shebang or hashbang (#!), which I've previously only seen in shell scripts and Perl scripts.
The new Twitter URLs now also feature the #! symbols. A Twitter profile URL, for example, now looks like this:
http://twitter.com/#!/BoltClock
Does #! now play some special role in URLs, like for a certain Ajax framework or something since the new Facebook and Twitter interfaces are now largely Ajaxified?
Would using this in my URLs benefit my Web application in any way?
This technique is now deprecated.
This used to tell Google how to index the page.
https://developers.google.com/webmasters/ajax-crawling/
This technique has mostly been supplanted by the ability to use the JavaScript History API that was introduced alongside HTML5. For a URL like www.example.com/ajax.html#!key=value, Google will check the URL www.example.com/ajax.html?_escaped_fragment_=key=value to fetch a non-AJAX version of the contents.
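The mapping in that example can be sketched as a small function. This is a simplified illustration of the deprecated scheme, not Google's implementation: the function name is made up, and only a few special characters (%, #, & and +) are escaped here.

```javascript
// Rewrite a hashbang URL into the "_escaped_fragment_" form that the
// crawler used to request. Simplified: only %, #, & and + are escaped.
function toEscapedFragmentUrl(url) {
  const bang = url.indexOf('#!');
  if (bang === -1) return url; // not a hashbang URL, nothing to rewrite
  const base = url.slice(0, bang);
  const fragment = url.slice(bang + 2);
  const escaped = fragment.replace(/[%#&+]/g, (c) =>
    '%' + c.charCodeAt(0).toString(16).toUpperCase().padStart(2, '0'));
  // Append to an existing query string if there is one.
  const separator = base.includes('?') ? '&' : '?';
  return base + separator + '_escaped_fragment_=' + escaped;
}

console.log(toEscapedFragmentUrl('http://www.example.com/ajax.html#!key=value'));
// prints http://www.example.com/ajax.html?_escaped_fragment_=key=value
```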
The octothorpe/number sign/hash mark has a special significance in a URL: it normally identifies a section of a document. The precise term is the fragment identifier, though the text following the hash is commonly called the anchor portion of a URL. If you use Wikipedia, you will see that most pages have a table of contents and you can jump to sections within the document with an anchor, such as:
https://en.wikipedia.org/wiki/Alan_Turing#Early_computers_and_the_Turing_test
https://en.wikipedia.org/wiki/Alan_Turing identifies the page and Early_computers_and_the_Turing_test is the anchor. The reason that Facebook and other JavaScript-driven applications (like my own Wood & Stones) use anchors is that they want to make pages bookmarkable (as suggested by a comment on that answer) or support the back button without reloading the entire page from the server.
In order to support bookmarking and the back button, you need to change the URL. However, if you change the page portion (with something like window.location = 'http://raganwald.com';) to a different URL or without specifying an anchor, the browser will load the entire page from the URL. Try this in Firebug or Safari's JavaScript console. Load http://minimal-github.gilesb.com/raganwald. Now in the JavaScript console, type:
window.location = 'http://minimal-github.gilesb.com/raganwald';
You will see the page refresh from the server. Now type:
window.location = 'http://minimal-github.gilesb.com/raganwald#try_this';
Aha! No page refresh! Type:
window.location = 'http://minimal-github.gilesb.com/raganwald#and_this';
Still no refresh. Use the back button to see that these URLs are in the browser history. The browser notices that we are on the same page and only the anchor is changing, so it doesn't reload. Thanks to this behaviour, we can have a single JavaScript application that appears to the browser to be on one 'page' but has many bookmarkable sections that respect the back button. The application must change the anchor when a user enters different 'states'. Likewise, if a user uses the back button, a bookmark or a link to load the application with an anchor included, the application must restore the appropriate state.
So there you have it: anchors provide JavaScript programmers with a mechanism for making bookmarkable, indexable, and back-button-friendly applications. This technique has a name: it is a Single Page Interface.
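The "restore the appropriate state" half of that contract might look roughly like this. The function name, the default state name 'home' and the console.log placeholder are assumptions for illustration; a real application would render the matching view instead.

```javascript
// Map a location.hash value to an application state name.
// An empty hash falls back to a default state ("home" is an assumed name).
function stateFromHash(hash) {
  const name = decodeURIComponent(hash.replace(/^#!?/, ''));
  return name === '' ? 'home' : name;
}

// Browser-only wiring: restore state on load and whenever the anchor changes
// (back button, bookmark, or a link that only changes the fragment).
if (typeof window !== 'undefined') {
  const restore = () =>
    console.log('show state:', stateFromHash(window.location.hash));
  window.addEventListener('hashchange', restore);
  restore();
}
```

Setting window.location.hash = 'try_this' from application code then both records a history entry and triggers the same restore path.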
p.s. There is a fourth benefit to this technique: Loading page content through AJAX and then injecting it into the current DOM can be much faster than loading a new page. In addition to the speed increase, further tricks like loading certain portions in the background can be performed under the programmer's control.
p.p.s. Given all of that, the 'bang' or exclamation mark is a further hint to Google's web crawler that the exact same page can be loaded from the server at a slightly different URL. See Ajax Crawling. Another technique is to make each link point to a server-accessible URL and then use unobtrusive JavaScript to change it into an SPI with an anchor.
Here's the key link again: The Single Page Interface Manifesto
First of all: I'm the author of The Single Page Interface Manifesto cited by raganwald.
As raganwald has explained very well, the most important aspect of the Single Page Interface (SPI) approach used in Facebook and Twitter is the use of the hash # in URLs.
The ! character is added only for Google's purposes; this notation is a Google "standard" for crawling web sites that rely heavily on AJAX (in the extreme, Single Page Interface web sites). When Google's crawler finds a URL with #!, it knows that an alternative conventional URL exists that provides the same page "state", but in this case at load time.
Although the #! combination is very interesting for SEO, it is only supported by Google (as far as I know). With some JavaScript tricks you can build SPI web sites that are SEO-compatible with any web crawler (Yahoo, Bing...).
The SPI Manifesto and demos do not use Google's format of ! in hashes; this notation could easily be added, and SPI crawling could be even easier (UPDATE: the ! notation is now used and remains compatible with other search engines).
Take a look at this tutorial. It is an example of a simple ItsNat SPI site, but you can pick up some ideas for other frameworks; this example is SEO-compatible with any web crawler.
The hard problem is generating any (or selected) "AJAX page state" as plain HTML for SEO. In ItsNat this is very easy and automatic: the same site is at the same time SPI and page-based for SEO (or for accessibility when JavaScript is disabled). With other web frameworks you can always follow the double-site approach, where one site is SPI-based and another is page-based for SEO; for instance, Twitter uses this "double site" technique.
I would be very careful if you are considering adopting this hashbang convention.
Once you hashbang, you can’t go back. This is probably the stickiest issue. Ben’s post put forward the point that when pushState is more widely adopted then we can leave hashbangs behind and return to traditional URLs. Well, fact is, you can’t. Earlier I stated that URLs are forever, they get indexed and archived and generally kept around. To add to that, cool URLs don’t change. We don’t want to disconnect ourselves from all the valuable links to our content. If you’ve implemented hashbang URLs at any point then want to change them without breaking links the only way you can do it is by running some JavaScript on the root document of your domain. Forever. It’s in no way temporary, you are stuck with it.
You really want to use pushState instead of hashbangs, because making your URLs ugly and possibly broken -- forever -- is a colossal and permanent downside to hashbangs.
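To make the "running some JavaScript on the root document, forever" cost concrete, here is a minimal sketch of such a shim. It assumes the new site serves the same content at the plain path; the function name is made up for illustration.

```javascript
// Convert a legacy hashbang fragment such as "#!/BoltClock" into a plain
// path such as "/BoltClock". Returns null when the hash is not a hashbang.
function hashbangToPath(hash) {
  if (!hash.startsWith('#!')) return null;
  const path = hash.slice(2);
  return path.startsWith('/') ? path : '/' + path;
}

// Browser-only: this has to stay on the root document for as long as old
// hashbang links exist, because the fragment is never sent to the server.
if (typeof window !== 'undefined') {
  const path = hashbangToPath(window.location.hash);
  if (path !== null) window.history.replaceState(null, '', path);
}
```

replaceState only rewrites the address bar; the application still has to render the content for the new path itself, or navigate there with window.location instead.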
As a follow-up to all this: Twitter, one of the pioneers of hashbang URLs and the single-page interface, admitted that the hashbang system was slow in the long run, and they have actually started reversing the decision and returning to old-school links.
Article about this is here.
I always assumed the ! just indicated that the hash fragment that followed corresponded to a URL, with ! taking the place of the site root or domain. It could be anything, in theory, but it seems the Google AJAX Crawling API likes it this way.
The hash, of course, just indicates that no real page reload is occurring, so yes, it’s for AJAX purposes. Edit: Raganwald does a lovely job explaining this in more detail.

What is the "one-document-per-URL paradigm"?

What does "one-document-per-URL paradigm" mean with reference to web development?
That if you go to a URI, you get a document, and you always get the same document.
The best way to explain it is to describe how to break it - which is usually achieved with frames or Ajax.
Frames give you a document containing a frameset. You click a link and the page loaded in one of the frames changes. You are viewing "About" instead of "Home", but the URL in the address bar is unchanged, so if you copy the link or bookmark it, you end up at "Home" instead of "About".
You get the same effect when Ajax is overused.
It usually means that under one URL, you should serve only one resource.
Example of right uses: Page with one news article, information about one specific product, etc.
The next step from there would be to allow the user to see the same resource in multiple ways. I.e., by visiting example.com/some/url?xml, a visitor is able to get information about the given resource in XML format. If your page was a list of resources, you could offer a ?rss form of your list, etc.
In contrast to good uses, a bad use would be for different things to appear under the same URL. For instance, when you have a page to search for some product, you would have to avoid using POST for searching, because then you would be violating this principle (the URL always leads to the first search page, not to the result page).
I hope I provided some answer and did not confuse you. :)
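For the search example, keeping the query in the URL (a GET request) gives every result page its own address. A small sketch, where the base URL and parameter names are placeholders chosen for illustration:

```javascript
// Build a bookmarkable search URL by putting the search terms in the
// query string instead of a POST body.
function searchUrl(base, params) {
  const url = new URL(base);
  for (const [key, value] of Object.entries(params)) {
    url.searchParams.set(key, value);
  }
  return url.toString();
}

console.log(searchUrl('http://example.com/search', { q: 'red shoes', page: '2' }));
// prints http://example.com/search?q=red+shoes&page=2
```

A plain HTML form with method="get" produces the same kind of URL without any script.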

When redirecting users from a legacy website to the new one, what is the best way to detect whether or not to show them a custom welcome message?

Say you have a legacy website running on an old code-base that offers certain functionality. The successor website is up and running, providing all the old functionality and more. For some time, there has been an HTML link on the old site pointing to the new one, for those users that care to click over.
Now, the legacy site is reaching its end of life, and you want to automatically redirect users to the new site, for example via a 301 or 302 redirect. However, when a user encounters this redirect, you want to also display a friendly message on the new site welcoming them and explaining why they are not seeing the old version.
When the user clicks an HTML link, the HTTP_REFERER header is populated, and the welcome message can be triggered via that value. However it appears that the same is not true when using 3XX redirect codes.
The top Google hit for this issue has this to say:
"HTTP 1.1 specification states it clearly: if a 3XX code is given, no
Referer value is passed. (eventualy, the URL that pointed to 3XX site)."
(http://www.usenet-forums.com/apache-web-server/37811-how-set-referer-redirect.html#post145986)
However I could not find this statement in a quick read through the spec (https://www.rfc-editor.org/rfc/rfc2616).
Can anyone suggest the proper way to achieve this functionality?
Note: This is not meant to be an all-encompassing solution. We understand that some clients don't even send the HTTP_REFERER header for privacy reasons, but for the sake of argument, let's ignore that use case.
First, this should be a 301, not a 302 redirect. Your redirection is permanent, so you want to indicate that. As for how to flag the redirect, just add a parameter to the URL. Rather than redirecting to http://www.newsite.com, redirect them to http://www.newsite.com?FromOldSite=Y
Could you just redirect them to a specific launch page? For example, if they try to visit http://oldsite.com/desired/page, just send them to http://newsite.com/welcome?nextpage=/desired/page. The welcome page could show the message and then pass them over to the content. Alternatively, you could send them right to the new page with a ?show_welcome=true in the URL.
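A sketch of the launch-page variant, using the URLs from the answer. The function names are hypothetical, and the check on nextpage is an extra precaution that the answer does not mention: without it, the welcome page could be abused to forward visitors to arbitrary sites.

```javascript
// Old site: compute the redirect for any requested legacy URL.
function welcomeRedirect(oldUrl) {
  const old = new URL(oldUrl);
  const target = new URL('http://newsite.com/welcome');
  // Carry the originally requested path (and query) along.
  target.searchParams.set('nextpage', old.pathname + old.search);
  return { status: 301, location: target.toString() };
}

// New site: only accept same-site paths as the continue target.
function safeNextPage(value) {
  const ok = typeof value === 'string' &&
    value.startsWith('/') && !value.startsWith('//');
  return ok ? value : '/';
}

console.log(welcomeRedirect('http://oldsite.com/desired/page').location);
// prints http://newsite.com/welcome?nextpage=%2Fdesired%2Fpage
```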
Not sure how you plan to redirect your users, but if you don't want to "ugly" up your URL, you might just set your own custom header when hitting the old site and then check for it at the new.

Best Way For Back Button To Leave Form Data

I'm creating a web page based on user input from a form. After the user sees the generated page I want to allow them to press the back button and make changes to the form. I would like to display the form as they had filled it out previously. What is the best way to get this behavior (with cross browser support)?
After the user sees the generated page I want to allow them to press the back button and make changes to the form. I would like to display the form as they had filled it out previously.
There is no need to add any clever fancy code; that is what browsers will do by default, unless you take active steps to prevent it, such as:
breaking the cache with Cache-Control/Pragma headers
generating the form page itself from the response to a POST (use POST-Redirect-GET instead)
generating the form elements from script
Cookie solutions are fragile and need special handling if you don't want two tabs open at once to get very confused. Make it easy for yourself: let the browser do the work.
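For the POST-Redirect-GET item in the list above, the server-side decision can be sketched as follows. The /result URL and the function name are assumptions, and the actual form processing is left out.

```javascript
// Decide how the form endpoint responds. After a POST the server answers
// with a redirect, so the generated page is fetched with GET and the form
// page stays in history as an ordinary GET document.
function respondToForm(method) {
  if (method === 'POST') {
    // ...process the submitted data here...
    return { status: 303, headers: { Location: '/result' } };
  }
  // A GET simply serves the form page.
  return { status: 200, headers: { 'Content-Type': 'text/html' } };
}
```

With this in place, pressing back from the result page returns to the form page, which the browser can repopulate on its own.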
jQuery has a nice cookie plugin, which I used to keep exam data in place while the user browsed the site for the answers.
Store the saved information in cookies as delimited data. If the cookie exists, repopulate the form.
If you use document.formName.fieldName syntax, there are no cross-browser issues.
As a fall-back if cookies are disabled, you can store it on the server and do the same with AJAX.
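A minimal sketch of the "delimited data" idea without the jQuery plugin. The function names and the cookie name formData are made up for illustration.

```javascript
// Pack form fields into one cookie-safe string: name=value pairs joined
// with "&", with both parts percent-encoded so the delimiters stay unambiguous.
function encodeFields(fields) {
  return Object.entries(fields)
    .map(([name, value]) =>
      encodeURIComponent(name) + '=' + encodeURIComponent(value))
    .join('&');
}

// Reverse of encodeFields: turn the stored string back into an object.
function decodeFields(cookieValue) {
  const fields = {};
  for (const pair of cookieValue.split('&')) {
    if (pair === '') continue;
    const [name, value = ''] = pair.split('=');
    fields[decodeURIComponent(name)] = decodeURIComponent(value);
  }
  return fields;
}

// Browser usage (not run here):
//   document.cookie = 'formData=' + encodeFields({ name: 'Ann' }) + '; path=/';
```

On page load, the stored string would be read back from document.cookie, decoded, and each value assigned to the matching form field.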