What is the "one-document-per-URL paradigm"? - language-agnostic

What does the "one-document-per-URL paradigm" mean with reference to web development?

That if you go to a URI, you get a document, and you always get the same document.
The best way to explain it is to describe how to break it - which is usually achieved with frames or Ajax.
Frames give you a document containing a frameset. You click a link and the page loaded in one of the frames changes. You are viewing "About" instead of "Home", but the URL in the address bar is unchanged, so if you copy the link or bookmark it, you end up at "Home" instead of "About".
You get the same effect when Ajax is overused.

It usually means that under one URL, you should serve only one resource.
Examples of correct use: a page with one news article, information about one specific product, etc.
The next step from there would be to allow the user to see the same resource in multiple ways. For example, by visiting example.com/some/url?xml, a visitor gets information about that resource in XML format. If your page is a list of resources, you could offer an ?rss form of the list, and so on.
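To make the ?xml idea concrete, here is a minimal JavaScript sketch, assuming a Node-style server that hands us the request URL: the URL identifies one resource, and the query string only selects its representation. The `represent` and `escapeXml` helpers and the product object are hypothetical.

```javascript
// Hypothetical helper: escape the characters that are special in XML/HTML text.
function escapeXml(s) {
  const entities = { '<': '&lt;', '>': '&gt;', '&': '&amp;', '"': '&quot;', "'": '&apos;' };
  return String(s).replace(/[<>&"']/g, (c) => entities[c]);
}

// One resource, one URL; the query string only chooses the representation.
function represent(requestUrl, product) {
  // The base URL is only needed because request URLs are usually path-relative.
  const url = new URL(requestUrl, 'http://example.com');
  if (url.searchParams.has('xml')) {
    return {
      contentType: 'application/xml',
      body: `<product><name>${escapeXml(product.name)}</name></product>`,
    };
  }
  return {
    contentType: 'text/html; charset=utf-8',
    body: `<h1>${escapeXml(product.name)}</h1>`,
  };
}
```

In a Node `http` server you would call `represent(req.url, product)` and send `body` with the returned `contentType`; an ?rss variant would be one more branch.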
In contrast, a bad use would be having different things appear under the same URL. For instance, if you have a page for searching products, you should avoid using POST for the search, because then you would be violating this principle (the URL always leads to the first search page, not to the results page).
I hope I provided some answer and did not confuse you. :)

Related

Difference between href="#!" and href="#" [duplicate]

I've just noticed that the long, convoluted Facebook URLs that we're used to now look like this:
http://www.facebook.com/example.profile#!/pages/Another-Page/123456789012345
As far as I can recall, earlier this year it was just a normal URL-fragment-like string (starting with #), without the exclamation mark. But now it's a shebang or hashbang (#!), which I've previously only seen in shell scripts and Perl scripts.
The new Twitter URLs now also feature the #! symbols. A Twitter profile URL, for example, now looks like this:
http://twitter.com/#!/BoltClock
Does #! now play some special role in URLs, like for a certain Ajax framework or something since the new Facebook and Twitter interfaces are now largely Ajaxified?
Would using this in my URLs benefit my Web application in any way?
This technique is now deprecated.
This used to tell Google how to index the page.
https://developers.google.com/webmasters/ajax-crawling/
For a URL like www.example.com/ajax.html#!key=value, Google would check the URL www.example.com/ajax.html?_escaped_fragment_=key=value to fetch a non-AJAX version of the contents. This technique has mostly been supplanted by the JavaScript History API (history.pushState) that was introduced alongside HTML5.
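The mapping described above can be sketched as a small JavaScript function. This is a simplified, hypothetical helper for illustration only (the scheme itself is deprecated): it percent-encodes a handful of characters that would break the query string and deliberately ignores non-ASCII handling.

```javascript
// Hypothetical helper reproducing the (deprecated) AJAX-crawling URL mapping:
// "#!state" in a pretty URL became "_escaped_fragment_=state" in the query string.
function toEscapedFragmentUrl(prettyUrl) {
  const i = prettyUrl.indexOf('#!');
  if (i === -1) return prettyUrl; // no hashbang: nothing to map
  const base = prettyUrl.slice(0, i);
  const fragment = prettyUrl.slice(i + 2);
  // Percent-encode control characters, space, '#', '%', '&', '+' and DEL, which
  // would otherwise change the meaning of the query string. Non-ASCII handling
  // is left out of this sketch.
  const escaped = fragment.replace(/[\x00-\x20#%&+\x7f]/g, (c) =>
    '%' + c.charCodeAt(0).toString(16).toUpperCase().padStart(2, '0'));
  // Append to an existing query string if there is one.
  const separator = base.includes('?') ? '&' : '?';
  return base + separator + '_escaped_fragment_=' + escaped;
}
```

For example, `toEscapedFragmentUrl('http://twitter.com/#!/BoltClock')` gives `http://twitter.com/?_escaped_fragment_=/BoltClock`.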
The octothorpe (number sign, hash mark) has a special significance in a URL: it normally identifies a section of a document. The text following the hash is the anchor portion of the URL (formally, the fragment identifier). If you use Wikipedia, you will see that most pages have a table of contents, and you can jump to sections within the document with an anchor, such as:
https://en.wikipedia.org/wiki/Alan_Turing#Early_computers_and_the_Turing_test
https://en.wikipedia.org/wiki/Alan_Turing identifies the page and Early_computers_and_the_Turing_test is the anchor. The reason that Facebook and other Javascript-driven applications (like my own Wood & Stones) use anchors is that they want to make pages bookmarkable (as suggested by a comment on that answer) or support the back button without reloading the entire page from the server.
In order to support bookmarking and the back button, you need to change the URL. However, if you change the page portion of the URL (with something like window.location = 'http://raganwald.com';), or set a URL that doesn't specify an anchor, the browser will load the entire page from the server. Try this in Firebug or Safari's Javascript console. Load http://minimal-github.gilesb.com/raganwald. Now, in the Javascript console, type:
window.location = 'http://minimal-github.gilesb.com/raganwald';
You will see the page refresh from the server. Now type:
window.location = 'http://minimal-github.gilesb.com/raganwald#try_this';
Aha! No page refresh! Type:
window.location = 'http://minimal-github.gilesb.com/raganwald#and_this';
Still no refresh. Use the back button to see that these URLs are in the browser history. The browser notices that we are on the same page but just changing the anchor, so it doesn't reload. Thanks to this behaviour, we can have a single Javascript application that appears to the browser to be on one 'page' but to have many bookmarkable sections that respect the back button. The application must change the anchor when a user enters different 'states', and likewise if a user uses the back button or a bookmark or a link to load the application with an anchor included, the application must restore the appropriate state.
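A minimal JavaScript sketch of that state-to-anchor round trip, assuming the app's state fits in a single string: entering a state writes the anchor, and one `hashchange` handler restores state whether the change came from the app, the back button, or a bookmark. The function names and the `'home'` default are hypothetical, and the rendering is only a placeholder.

```javascript
// Turn location.hash ("#try_this", "#!/pages/about", "") into an app state.
function stateFromHash(hash) {
  const state = hash.replace(/^#!?/, ''); // drop the leading "#" (or "#!")
  return state === '' ? 'home' : state;  // hypothetical default state
}

// Turn an app state back into the anchor to put in the address bar.
function hashFromState(state) {
  return '#' + state;
}

// Browser wiring (skipped outside a browser).
if (typeof window !== 'undefined') {
  const restoreState = () => {
    const state = stateFromHash(window.location.hash);
    document.title = 'Viewing ' + state; // placeholder for real rendering
  };
  window.addEventListener('hashchange', restoreState); // back/forward, anchor changes
  window.addEventListener('DOMContentLoaded', restoreState); // loaded from a bookmark or link
}

// Entering a new state is just an anchor change: no reload, but a new history entry.
function enterState(state) {
  window.location.hash = hashFromState(state);
}
```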
So there you have it: Anchors provide Javascript programmers with a mechanism for making bookmarkable, indexable, and back-button-friendly applications. This technique has a name: It is a Single Page Interface.
p.s. There is a fourth benefit to this technique: Loading page content through AJAX and then injecting it into the current DOM can be much faster than loading a new page. In addition to the speed increase, further tricks like loading certain portions in the background can be performed under the programmer's control.
p.p.s. Given all of that, the 'bang' or exclamation mark is a further hint to Google's web crawler that the exact same page can be loaded from the server at a slightly different URL. See Ajax Crawling. Another technique is to make each link point to a server-accessible URL and then use unobtrusive Javascript to change it into an SPI with an anchor.
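The unobtrusive variant mentioned last might look like this JavaScript sketch: the HTML keeps real, server-accessible links, and only when JavaScript runs are site-relative links rewritten into #! anchors. The `toHashbangHref` name and the rule that only links starting with "/" are rewritten are assumptions for illustration.

```javascript
// Hypothetical rule: site-relative links ("/pages/about") become "#!/pages/about";
// links to other sites, protocol-relative links and plain anchors are left alone.
function toHashbangHref(href) {
  if (!href.startsWith('/') || href.startsWith('//')) return href;
  return '#!' + href;
}

// Runs only in a browser; crawlers and users without JavaScript keep the real URLs.
if (typeof document !== 'undefined') {
  document.addEventListener('DOMContentLoaded', () => {
    for (const link of document.querySelectorAll('a[href^="/"]')) {
      link.setAttribute('href', toHashbangHref(link.getAttribute('href')));
    }
  });
}
```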
Here's the key link again: The Single Page Interface Manifesto
First of all: I'm the author of The Single Page Interface Manifesto cited by raganwald.
As raganwald has explained very well, the most important aspect of the Single Page Interface (SPI) approach used by Facebook and Twitter is the use of the hash (#) in URLs.
The ! character is added only for Google's benefit: this notation is a Google "standard" for crawling AJAX-intensive web sites (in the extreme case, Single Page Interface web sites). When Google's crawler finds a URL with #!, it knows that an alternative conventional URL exists that provides the same page "state", but in this case at load time.
Although the #! combination is very interesting for SEO, it is only supported by Google (as far as I know). With some JavaScript tricks, you can build SPI web sites that are SEO-compatible with any web crawler (Yahoo, Bing...).
The SPI Manifesto and demos do not use Google's ! format in hashes; this notation could easily be added, and it could make SPI crawling even easier. (UPDATE: the ! notation is now used, and the site remains compatible with other search engines.)
Take a look at this tutorial. It is an example of a simple ItsNat SPI site, but you can pick up ideas for other frameworks; this example is SEO-compatible with any web crawler.
The hard problem is generating any (or selected) "AJAX page state" as plain HTML for SEO. In ItsNat this is very easy and automatic: the same site is simultaneously SPI and page-based for SEO (or when JavaScript is disabled, for accessibility). With other web frameworks you can always follow the double-site approach, where one site is SPI-based and another is page-based for SEO; for instance, Twitter uses this "double site" technique.
I would be very careful if you are considering adopting this hashbang convention.
Once you hashbang, you can’t go back. This is probably the stickiest issue. Ben’s post put forward the point that when pushState is more widely adopted then we can leave hashbangs behind and return to traditional URLs. Well, fact is, you can’t. Earlier I stated that URLs are forever, they get indexed and archived and generally kept around. To add to that, cool URLs don’t change. We don’t want to disconnect ourselves from all the valuable links to our content. If you’ve implemented hashbang URLs at any point then want to change them without breaking links the only way you can do it is by running some JavaScript on the root document of your domain. Forever. It’s in no way temporary, you are stuck with it.
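The "JavaScript on the root document, forever" point can be made concrete. The fragment after # is never sent to the server, so the server cannot redirect /#!/BoltClock to /BoltClock by itself; only script running in the page can. A minimal JavaScript sketch of such a shim, assuming the hashbang paths map one-to-one onto real paths (`hashbangToPath` is a hypothetical name):

```javascript
// Map a legacy hashbang fragment to the real path it stood for:
// "#!/BoltClock" -> "/BoltClock". Returns null for ordinary anchors.
function hashbangToPath(hash) {
  if (!hash.startsWith('#!')) return null;
  const path = hash.slice(2);
  return path.startsWith('/') ? path : '/' + path;
}

// This has to stay on the root document indefinitely: the server never sees
// the fragment, so it cannot issue the redirect itself.
if (typeof window !== 'undefined') {
  const target = hashbangToPath(window.location.hash);
  if (target) window.location.replace(target); // replace: don't keep the #! URL in history
}
```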
You really want to use pushState instead of hashbangs, because making your URLs ugly and possibly broken -- forever -- is a colossal and permanent downside to hashbangs.
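For comparison, a minimal pushState sketch in JavaScript: the address bar shows a real path such as /pages/about, and the back button fires `popstate`. This assumes the server also answers each of those paths with a full page, so direct visits and bookmarks work. The `navigate` function, the placeholder `render`, and the `data-spa` link attribute are hypothetical.

```javascript
// Navigate to a real path without a page reload; the URL stays clean and
// bookmarkable, e.g. /pages/about instead of /#!/pages/about.
function navigate(path, render) {
  history.pushState({ path }, '', path); // adds a history entry, no request
  render(path);
}

if (typeof window !== 'undefined') {
  // Hypothetical rendering: a real app would fetch and swap in content here.
  const render = (path) => { document.title = 'Viewing ' + path; };

  // Back/forward buttons fire popstate; re-render from the restored URL.
  window.addEventListener('popstate', () => render(window.location.pathname));

  // Intercept clicks on links marked data-spa (a hypothetical convention).
  document.addEventListener('click', (event) => {
    const link = event.target.closest && event.target.closest('a[data-spa]');
    if (!link) return;
    event.preventDefault();
    navigate(link.getAttribute('href'), render);
  });
}
```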
For a good follow-up on all this: Twitter, one of the pioneers of hashbang URLs and the single-page interface, admitted that the hashbang system was slow in the long run and has actually started reversing the decision, returning to old-school links.
The article about this is here.
I always assumed the ! just indicated that the hash fragment that followed corresponded to a URL, with ! taking the place of the site root or domain. It could be anything, in theory, but it seems the Google AJAX Crawling API likes it this way.
The hash, of course, just indicates that no real page reload is occurring, so yes, it’s for AJAX purposes. Edit: Raganwald does a lovely job explaining this in more detail.

Mediawiki - Automatic two-way links between page sections

I want my MediaWiki install to have two classes of pages. (In the users' eyes - the wiki won't have to know the difference.)
I want some pages to be on topics, and others on sources (name of book, video, etc.)
I want to have a topic page "FAA Licenses" like:
==Medical Certificates==
===3rd Class===
Required for student license, and before student solo flights. {{{link/reference/whatever generally around here to Jeppesen Book#pg27-28}}}
And a source page "Jeppesen Book" like:
==pg27-28==
{{{link to FAA Licenses#3rd Class}}}
These source pages will track the source's (book or video) content. I imagine a source page for a book to have page numbers, and for a video to have start and stop times, or section numbers. (The book or video itself won't be on the source pages.)
So, the source pages will really serve two purposes. First, it will be fairly easy to see which parts of the sources have had notes taken and put into the topic pages. (So non-linear note-taking of sources will be easy -- skipping from source to source on topics, rather than digesting an entire source at once.) Second, it will be easy from a topic page to see where to go back to for a more in-depth review.
There are two issues I'm writing about.
(1) I want the workflow to be: the user edits the topic page, putting in links to source pages and sections. I want this single user addition to automatically make the source page link back to this spot. I want the system to handle the two-way linking, assuming the user won't be perfect.
(2) I want the user to be able to put links in the topic page to source pages and sections that might not exist yet. I'd need those links to show up as red, to indicate they need to be created. But, still, once created, I want the system to handle the two-way-linking, even if there were multiple red links to the same area. (I could see building up quite a few red links, then having an unorganized "purge" of them by creating the missing pages and sections, and don't want to have to search for all the links to the new areas.) Ideally, I'd love for these source pages to be auto-generated -- so pages and sections were made as links were made to them, and automatically deleted (or at least the backlinks removed) as links were removed to them.
I don't think the MediaWiki "What links here" functionality does the job. I want this to work on a per-section rather than a per-page basis. And I don't want the user to have to add a "what links here" tag to each section; I want it to be automatic.
The Semantic MediaWiki extension will allow you to get bidirectional linking in a semi-automatic fashion.
https://www.semantic-mediawiki.org/wiki/Help:Link_Template
shows a high level example.
If you dig deeper into SMW and SemanticForms you'll find how with e.g. SemanticForms you can get a user experience that is close to what you are asking for.
See e.g. http://smw.referata.com/wiki/Discourse_DB and http://www.discoursedb.org/wiki/Main_Page for an application of these principles.
I don't think there is an easy way to do that. You could write an extension that provides a parser function that your users can enter, save the source page + source section + target page + target section in a database at links update, then use the ParserSectionCreate hook to show links based on that. Or you can create two types of templates and write a bot that keeps them in sync.
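The core of the bot option might look like the following JavaScript sketch: scan each topic page's wikitext, remember which section each [[Page#Section]] link sits in, and build a reverse index from target section to linking sections. Reading and writing pages (for example through the MediaWiki API) is left out, and `buildSectionBacklinks` is a hypothetical name. Because the index is keyed by link target, links to pages or sections that don't exist yet (red links) are collected too, so the bot can fill in their backlinks once they are created.

```javascript
// pages: { "Page title": "wikitext", ... }
// Returns a Map from "Target#Section" to a Set of "SourcePage#SourceSection".
function buildSectionBacklinks(pages) {
  const index = new Map();
  for (const [title, text] of Object.entries(pages)) {
    let currentSection = '';
    for (const line of text.split('\n')) {
      // Headings like "==Medical Certificates==" or "===3rd Class===".
      const heading = line.match(/^(=+)\s*(.*?)\s*\1\s*$/);
      if (heading) {
        currentSection = heading[2];
        continue;
      }
      // Section links, optionally piped: [[Jeppesen Book#pg27-28|pp. 27-28]].
      for (const m of line.matchAll(/\[\[([^\]|#]+)#([^\]|]+)(?:\|[^\]]*)?\]\]/g)) {
        const target = m[1].trim() + '#' + m[2].trim();
        const from = currentSection ? title + '#' + currentSection : title;
        if (!index.has(target)) index.set(target, new Set());
        index.get(target).add(from);
      }
    }
  }
  return index;
}
```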

an html tag for displaying html received from a specific url

I created an API of sorts, that when you navigate to it, returns information in html.
On my website, I would like to have the web page reach out to the API and display the information as part of the web page (sort of like a web page reaches out for an img). What HTML tag would be best suited to achieving this result? I came across the and tags but am not really sure which would be best.
I am building this myself and thus have full control over how the content is delivered back to the page. Is there a specific pattern that is used for such "modular" sourcing of information? I could rewrite my website to reach out to the API before serving the web page, pull the info itself, and then include the results in the HTML, but a) this would be more complex and require changes in several places, and b) it would become really complex as the number of such API call results I want to include increases.
You can use an iframe for this purpose: when you receive the HTML you want to display, you can write it into the document of the iframe with that ID:
document.getElementById('myIframe').contentWindow.document.write("<html><body>Here is your html</body></html>");
Hope this helps.
As far as I know, using iframes for this is rather discouraged. I always use div tags for such tasks.
document.getElementById("targetdiv").innerHTML = "New HTML-Content";
More info on divs: http://www.w3schools.com/tags/tag_div.asp
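Combining the div approach with a request to the API might look like this JavaScript sketch, using `fetch` and `innerHTML`. The endpoint `/api/widget`, the container id, and `loadFragment` are hypothetical; the API should return an HTML fragment rather than a full `<html>` document, and if it lives on a different origin it has to allow the request via CORS.

```javascript
// Fetch an HTML fragment from your own API and show it inside a container,
// much like an <img> pulls in an image.
async function loadFragment(url, target, fetchImpl = fetch) {
  const response = await fetchImpl(url);
  if (!response.ok) throw new Error('HTTP ' + response.status + ' for ' + url);
  // innerHTML does not run <script> tags; only inject HTML you trust.
  target.innerHTML = await response.text();
  return target;
}

if (typeof document !== 'undefined') {
  // Hypothetical endpoint and container id.
  const container = document.getElementById('targetdiv');
  if (container) {
    loadFragment('/api/widget', container).catch((err) => {
      container.textContent = 'Could not load: ' + err.message;
    });
  }
}
```

Each additional "module" is then one more container plus one more `loadFragment` call, which avoids assembling everything on the server before the page is sent.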

How to mark POSTing URLs?

Search engines and pre-fetching browser plugins can cause quite some trouble with <a> elements where the destination page changes the state of the server. In a <form>, I'd mark it as modifying with method="POST". Is there a similar way to mark regular links as modifying?
rel="nofollow" does not solve the problem. From the specification:
By adding rel="nofollow" to a hyperlink, a page indicates that the destination of that hyperlink should not be afforded any additional weight or ranking by user agents which perform link analysis upon web pages (e.g. search engines)
A plain old link can only make GET requests. A GET request, as you indicated, should not trigger any destructive changes.
The solution, if you can't or don't want to have a form in your page at that point, is to have the link point to a page that does have a form. For instance, if you have a "delete" link it might point to a page that says "Are you sure you want to delete X? [delete]".
Then, if you don't want people to have to leave the page every time they delete something, you can implement some AJAX functionality in JavaScript.
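A JavaScript sketch of that enhancement: the link's `href` still points at the harmless confirmation page, so crawlers and prefetchers only ever issue a GET, while script intercepts the click, confirms inline, and sends the actual change as a POST. The `data-post-action` attribute, the URLs, and `postAction` are hypothetical.

```javascript
// Send the state-changing request as a POST (never a GET).
async function postAction(url, fetchImpl = fetch) {
  const response = await fetchImpl(url, { method: 'POST' });
  if (!response.ok) throw new Error('HTTP ' + response.status + ' for ' + url);
  return response;
}

if (typeof document !== 'undefined') {
  document.addEventListener('click', (event) => {
    // Hypothetical markup: <a href="/items/42/confirm-delete" data-post-action="/items/42/delete">
    const link = event.target.closest && event.target.closest('a[data-post-action]');
    if (!link) return;
    event.preventDefault(); // stay on the page instead of opening the confirm page
    if (!window.confirm('Are you sure you want to delete this?')) return;
    postAction(link.dataset.postAction)
      .then(() => link.remove()) // placeholder: update the page however suits your UI
      .catch((err) => window.alert(err.message));
  });
}
```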

How should I handle autolinking in wiki page content?

What I mean by autolinking is the process by which wiki links inlined in page content are generated into either a hyperlink to the page (if it does exist) or a create link (if the page doesn't exist).
With the parser I am using, this is a two step process - first, the page content is parsed and all of the links to wiki pages from the source markup are extracted. Then, I feed an array of the existing pages back to the parser, before the final HTML markup is generated.
What is the best way to handle this process? It seems as if I need to keep a cached list of every single page on the site, rather than having to extract the index of page titles each time. Or is it better to check each link separately to see if it exists? This might result in a lot of database lookups if the list wasn't cached. Would this still be viable for a larger wiki site with thousands of pages?
In my own wiki I check all the links (without caching), but my wiki is only used by a few people internally. You should benchmark stuff like this.
In my own wiki system, my caching system is pretty simple: when a page is updated, it checks the links to make sure they are valid and applies the correct formatting/location for those that aren't. The cached page is saved as an HTML page in my cache root.
Pages that are marked as 'not created' during the page update are inserted into a database table that holds the page and a CSV list of the pages that link to it.
When someone creates that page it initiates a scan to look through each linking page and re-caches the linking page with the correct link and formatting.
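That bookkeeping (remember which missing pages are linked from where, then re-cache those linkers once the page is created) can be sketched in JavaScript with an in-memory Map standing in for the database table and its CSV column. The `WantedPages` class and its method names are hypothetical.

```javascript
class WantedPages {
  constructor() {
    this.linkers = new Map(); // missing title -> Set of titles that link to it
  }

  // Called when a page is saved: record every link whose target doesn't exist.
  recordLinks(sourceTitle, linkedTitles, pageExists) {
    for (const target of linkedTitles) {
      if (pageExists(target)) continue;
      if (!this.linkers.has(target)) this.linkers.set(target, new Set());
      this.linkers.get(target).add(sourceTitle);
    }
  }

  // Called when a page is created: return the pages whose cached HTML must be
  // rebuilt so their red "create" link turns into a normal link.
  pageCreated(title) {
    const toRecache = [...(this.linkers.get(title) || [])];
    this.linkers.delete(title);
    return toRecache;
  }
}
```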
If you weren't interested in highlighting non-created pages however you could just have a checker to see if the page is created when you attempt to access it - and if not redirect to the creation page. Then just link to pages as normal in other articles.
I tried to do this once and it was a nightmare! My solution was a nasty loop in a SQL procedure, and I don't recommend it.
One thing that gave me trouble was deciding what link to use on a multi-word phrase. Say you had some text saying "I am using Stack Overflow" and your wiki had 3 pages called "stack", "overflow" and "stack overflow"....which part of your phrase gets linked to where? It will happen!
My idea would be to query the titles like SELECT title FROM articles and simply check if each wikilink is in that array of strings. If it is you link to the page, if not, you link to the create page.
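A JavaScript sketch of that idea: the titles come from one query (passed in here as an array), go into a Set for constant-time lookups, and every [[Title]] is replaced with either a view link or a create link. The /wiki/ and /wiki/create URL scheme and the function names are hypothetical.

```javascript
// Minimal HTML escaping for link text.
function escapeHtml(s) {
  const entities = { '<': '&lt;', '>': '&gt;', '&': '&amp;', '"': '&quot;', "'": '&#39;' };
  return s.replace(/[<>&"']/g, (c) => entities[c]);
}

// existingTitles would come from one query such as SELECT title FROM articles.
function renderWikiLinks(text, existingTitles) {
  const titles = new Set(existingTitles); // one in-memory lookup per link, no extra queries
  return text.replace(/\[\[([^\]]+)\]\]/g, (match, rawTitle) => {
    const title = rawTitle.trim();
    const slug = encodeURIComponent(title);
    return titles.has(title)
      ? `<a href="/wiki/${slug}">${escapeHtml(title)}</a>`
      : `<a class="new" href="/wiki/create?title=${slug}">${escapeHtml(title)}</a>`;
  });
}
```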
In a personal project I made with Sinatra, after I run the content through Markdown, I do a gsub to replace wiki words and other things (like [[Here is my link]] and whatnot) with proper links, checking for each one whether the page exists and linking to the create or view page accordingly.
It's not the best, but I didn't build this app with caching/speed in mind. It's a low resource simple wiki.
If speed were more important, you could wrap the app in something that caches it. For example, a Sinatra app can be wrapped with Rack caching middleware.
Based on my experience developing Juli, an offline personal wiki with autolinking, the approach of generating static HTML may solve your issue.
As you suspect, generating autolinked wiki pages takes a long time. However, when generating static HTML, regenerating the autolinked wiki pages happens only when a wiki page is added or deleted (in other words, not when a wiki page is updated), and the regeneration can be done in the background, so usually I don't mind how long it takes. Users only ever see the generated static HTML.
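The regeneration rule can be sketched in JavaScript. This is a hypothetical helper, not Juli's actual code: an ordinary edit only re-renders the edited page, while adding or deleting a page changes which titles exist, so every page is rebuilt (ideally in a background job).

```javascript
// Decide which static pages to rebuild after a change (hypothetical event shape).
// allTitles is the list of titles *after* the change.
function pagesToRegenerate(event, allTitles) {
  switch (event.type) {
    case 'update':
      // Editing text doesn't create or remove titles, so other pages' links are unaffected.
      return [event.title];
    case 'add':
    case 'delete':
      // Any page might mention the added/removed title: rebuild all of them.
      return [...allTitles];
    default:
      throw new Error('unknown event type: ' + event.type);
  }
}
```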