How's Comet implemented?

I wonder how the client side can get a response when the request's connection hasn't finished yet.
What's the principle behind it?
In fact I've read quite a few posts on this subject:
How do I implement basic "Long Polling"?
How does the live, real-time typing work in Google Wave?
But none of them resolves my doubt.

The answer depends on the technique used.
HTTP streaming, using the "hidden iframe" technique, can do this. The server keeps the response open and sends <script> elements to the hidden iframe as events occur; each script element contains some executable JavaScript. The technique relies on the fact that browsers generally interpret an HTML element as soon as it has loaded. In this way there is no need for any sort of polling code on the client: the script tags contain the appropriate function calls, and the browser executes those calls as soon as each script element is completely loaded.
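Here is a rough sketch of that server side, written with Node's built-in http module (the endpoint and the parent.handleMessage function are assumptions, not from the answer above). The page would load this URL in a hidden iframe and define handleMessage on the parent window.

const http = require('http');

http.createServer((req, res) => {
  // Keep the connection open: send headers, then never call res.end()
  // while we still want to push data.
  res.writeHead(200, { 'Content-Type': 'text/html' });
  res.write('<html><body>\n');

  // Every second, push a <script> element into the hidden iframe. The browser
  // executes it as soon as the closing </script> tag arrives, calling a
  // function defined in the parent page.
  const timer = setInterval(() => {
    const payload = JSON.stringify({ time: Date.now() });
    res.write('<script>parent.handleMessage(' + payload + ');</script>\n');
  }, 1000);

  // Stop pushing when the client goes away.
  req.on('close', () => clearInterval(timer));
}).listen(8080);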

Related

Give a crawler pure HTML as opposed to triggering AJAX

If my site is being crawled, what PHP approach should I use so that my content is still delivered even though AJAX will not be triggered?
PHP cannot detect whether the client has AJAX capabilities.
Goal: give the crawler plain HTML.
I checked the suggested answers but I'm still lost.
Google has specifications on this utilizing hashbangs (#!). See here: http://moz.com/blog/how-to-allow-google-to-crawl-ajax-content
Also note that <noscript> is an option, although PHP-inserted content will be transmitted regardless of whether JavaScript is available.
Finally, there are the general principles of Progressive Enhancement. Make your page fully functional without AJAX: pagination with actual links, PHP insertion of results, etc.
Then if JavaScript is available, hijack the links to pull in content without navigating away from the page.
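As a rough sketch of that last step (the .pagination selector, the #results container and the X-Requested-With header are assumptions): crawlers and non-JS clients simply follow the real links, while this script upgrades them when JavaScript is available, and PHP can check the header to decide whether to send a full page or just the fragment.

document.querySelectorAll('.pagination a').forEach(function (link) {
  link.addEventListener('click', function (event) {
    event.preventDefault(); // stop the normal navigation
    fetch(link.href, { headers: { 'X-Requested-With': 'XMLHttpRequest' } })
      .then(function (response) { return response.text(); })
      .then(function (html) {
        // Replace only the results area instead of reloading the page.
        document.getElementById('results').innerHTML = html;
      });
  });
});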

Why are some devs placing the script tag in the bottom of the document? [duplicate]

Possible Duplicate:
Is it practically good to put JS files at the bottom of webpage?
I've seen HTML documents where the developers place the <script src="xxx"> tags at the bottom of the document, and I am wondering why.
I've just read the HTML4 specification regarding the script tag. It says nothing about when and how the script should be loaded, so it's up to the web browser to handle that as it sees fit.
Isn't it reasonable to think that browser authors are aware that loading scripts synchronously, or in any other way that hinders document rendering, would hurt the browsing experience?
In other words, aren't we as web developers better off putting the scripts in the <head> tag?
Scripts require interpretation (which can be comparatively expensive) and may optionally require retrieval via another HTTP request. Placing them at the end of the document allows the rest of the page to load first.
Scripts without async or defer attributes are fetched and executed immediately, before the browser continues to parse the page.
https://developer.mozilla.org/en-US/docs/HTML/Element/script
See also http://dev.w3.org/html5/spec/the-script-element.html, specifically the areas around preparation/blocking which are covered in detail. Note that this is an editor's draft in flux; some of these rules may not be in the final specification or be enforced by all user agents.
As a side note, this blocking behavior is not bad/wrong. Think of a library like Modernizr, which does belong in the head. It alters the DOM in a way that allows CSS to be correctly applied; were it executed in parallel, the results would be incorrect.
Putting script tags at the top of a document can block the loading of the rest of the page; this is especially true of banner ad scripts. If a third-party script takes a long time to load, every DOM element after it is delayed, resulting in a blank or malformed page until the request times out or the content eventually loads.
This way the document will be fully displayed before the JavaScript code kicks in. If a script takes time to load, doesn't work, or is unreachable, it won't leave the user sitting in front of a blank page.
Also, doing DOM manipulations before the body element is fully loaded can produce an error in IE.
A JS script is executed at the point where it occurs in the page load. If it is at the top of the page, the rest of the page won't have loaded yet when it runs, meaning it won't be able to access any elements further down the page.
If the script takes time to run (say, it has a slow loop, throws up an alert() box, or does anything else that blocks execution), then the rest of the page load will be delayed until it finishes. If the script is in the <head>, this can result in the user seeing a blank page while the script does its thing.
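A rough sketch of the timing problem (the #comments id is an assumption): if this file is loaded from <head> without defer, the first lookup runs before the element has been parsed and returns null; waiting for DOMContentLoaded, moving the <script> to the end of <body>, or adding the defer attribute all avoid it.

console.log(document.getElementById('comments')); // null when run from <head>

document.addEventListener('DOMContentLoaded', function () {
  // The whole document has been parsed by now, so the element exists.
  document.getElementById('comments').textContent = 'Script ran safely.';
});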

Can you detect the HTML 5 History API on the server side?

Is there a reliable way to detect this browser capability from the user-agent string?
HTML5 isn't a server-side language.
Anyway, there isn't a way to reliably detect UA capabilities; for instance, the user could have JavaScript turned off, add-ons installed, and so on.
You could use some server-side methods such as PHP's browser detection, but aside from that there's nothing more you can really do, and it's not at all comprehensive.
Everything like this should really be done client-side in JavaScript, where you can easily detect what's available and what isn't. There are a number of libraries that will do this, but it's simple enough to do yourself if you know what you want, so using one shouldn't really be required. Furthermore, you should never do this based on user-agent strings; as mentioned before, add-ons can modify them and change behaviour. Just check for the feature you wish to use rather than restricting yourself to certain browser versions.
Not reliably — you’re stuck with figuring out the browser version from the user-agent string, and maintaining a list of which browser versions support the API.
You could, however, detect it on the client side using JavaScript:
Modernizr
Mark Pilgrim’s suggested History API detection code
and then do a redirect via JavaScript (i.e. by setting window.location) to let the server know whether the API is available or not. That would be the usual way to redirect to a URL starting with # (as per your comment on rudi_visser's answer).
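A rough sketch of that detect-then-redirect idea (the nohistory=1 flag is an assumption; the server would look for it, or you could set a cookie instead):

var supportsHistory = !!(window.history && window.history.pushState);

if (!supportsHistory && window.location.search.indexOf('nohistory=1') === -1) {
  // Reload once with a query parameter the server can inspect.
  window.location = window.location.pathname + '?nohistory=1';
}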
This is not server side (so it probably does not answer your question, I thought it would be helpful though):
Have you looked at Modernizr?
It is a JavaScript file you include in your HTML page. You can then use its properties to detect whether a particular HTML5 feature is supported by the browser.

HTML Snapshot for crawler - Understanding how it works

I'm reading this article today. To be honest, I'm really interested in point "2. Much of your content is created by a server-side technology such as PHP or ASP.NET".
I want to check whether I've understood it correctly :)
I create a PHP script (gethtmlsnapshot.php) in which I include the server-side AJAX page (getdata.php) and escape the parameters (for security). Then I add it at the end of the static HTML page (index-movies.html). Right? Now...
1 - Where do I put that gethtmlsnapshot.php? In other words, I need to call that page (or rather, the crawler needs to). But if I don't have a link to it on the main page, the crawler can't call it :O How can the crawler call the page with the _escaped_fragment_ parameters? It can't know them if I don't specify them somewhere :)
2 - How can the crawler call that page with the parameters? As before, I need links to that script with the parameters, so the crawler can browse each page and save the content of the dynamic result.
Can you help me? And what do you think about this technique? Wouldn't it be better if the crawler developers built their bots in some other way? :)
Let me know what you think. Cheers
I think you got something wrong, so I'll try to explain what's going on here, including the background and alternatives, as this is a very important topic that most of us stumble upon (or at least something similar) from time to time.
Using AJAX, or rather asynchronous incremental page updating (most pages actually use JSON rather than XML), has enriched the web and provided a great user experience.
It has, however, also come at a price.
The main problem was clients that didn't support the XMLHttpRequest object, or JavaScript at all.
In the beginning you had to provide backwards compatibility.
This was usually done by providing ordinary links, capturing the onclick event, and firing an AJAX call instead of reloading the page (if the client supported it).
Today almost every client supports the necessary functions.
So the problem today is search engines, because they don't. Well, that's not entirely true: they partly do (especially Google), but for other purposes.
Google evaluates certain JavaScript code to prevent black-hat SEO (for example, a link pointing somewhere but with JavaScript opening a completely different webpage, or HTML keywords that are invisible to the client because they are removed by JavaScript, or the other way round).
But to keep it simple, it's best to think of a search engine crawler as a very basic browser with no CSS or JS support (it's the same with CSS: it's partly parsed, for special reasons).
So if you have "AJAX links" on your website, and the web crawler doesn't support following them using JavaScript, they just don't get crawled. Or do they?
Well, the answer is that plain JavaScript navigation (like setting document.location) does get followed; Google is often intelligent enough to guess the target.
But AJAX calls are not made, simply because they return partial content, and no meaningful whole page can be constructed from it, as the context is unknown and no unique URI represents the location of the content.
So there are basically 3 strategies to work around that.
Have an onclick event on the links with a normal href attribute as a fallback (IMO the best option, as it solves the problem for clients as well as search engines).
Submit the content pages via your sitemap so they get indexed, but completely apart from your site links (usually pages provide a permalink to these URLs so that external pages can link to them for PageRank).
Use the AJAX crawling scheme.
The idea is to have your JavaScript XMLHttpRequest calls paired with corresponding href attributes that look like this:
www.example.com/ajax.php#!key=value
so the link looks like:
<a href="http://www.example.com/ajax.php#!page=imprint" onclick="handleajax()">go to my imprint</a>
The function handleajax could evaluate the document.location variable to fire the incremental asynchronous page update; it's also possible to pass an id or URL or whatever.
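A rough sketch of such a handler (following the suggestion further down of passing the <a> element itself, so the onclick would be onclick="return handleajax(this)"; the #content container is an assumption):

function handleajax(link) {
  var fragment = link.href.split('#!')[1]; // e.g. "page=imprint"
  var xhr = new XMLHttpRequest();
  xhr.open('GET', '/ajax.php?' + fragment, true);
  xhr.onload = function () {
    // Splice the partial content into the page instead of navigating away.
    document.getElementById('content').innerHTML = xhr.responseText;
  };
  xhr.send();
  return false; // keep the browser from following the href
}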
The crawler, however, recognises the AJAX crawling scheme format and automatically fetches http://www.example.com/ajax.php?_escaped_fragment_=page=imprint instead of http://www.example.com/ajax.php#!page=imprint.
So the query string then contains the hash fragment, from which you can tell which partial content is being requested.
You just have to make sure that http://www.example.com/ajax.php?_escaped_fragment_=page=imprint returns a full website that looks exactly the way the site should look to the user after the XMLHttpRequest update has been made.
A very elegant solution is to pass the <a> element itself to the handler function, which then fetches the same URL as the crawler would have fetched, using AJAX but with additional parameters. Your server-side script then decides whether to deliver the whole page or just the partial content.
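A rough Node sketch of that server-side decision (the original discussion assumes a PHP script such as ajax.php; this is just the same idea in JavaScript, and renderFullPage/renderPartial are hypothetical placeholders for your own templating):

const http = require('http');

http.createServer((req, res) => {
  const url = new URL(req.url, 'http://www.example.com');
  const escaped = url.searchParams.get('_escaped_fragment_'); // crawler request?

  res.writeHead(200, { 'Content-Type': 'text/html' });
  if (escaped !== null) {
    // Crawler: return the complete, indexable page for this state.
    const page = new URLSearchParams(escaped).get('page') || 'home';
    res.end(renderFullPage(page));
  } else {
    // Normal AJAX request from the client-side handler: partial content only.
    res.end(renderPartial(url.searchParams.get('page') || 'home'));
  }
}).listen(8080);

function renderPartial(page) {
  return '<div id="content">Content for ' + page + '</div>';
}
function renderFullPage(page) {
  return '<html><body><nav>site navigation</nav>' + renderPartial(page) + '</body></html>';
}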
It's a very creative approach indeed, and here is my personal pro/con analysis:
Pro:
Partially updated pages receive a unique identifier, at which point they are fully qualified resources in the semantic web.
Partially updated pages receive a unique identifier that can be presented by search engines.
Con:
It's just a fallback solution for search engines, not for clients without JavaScript.
It provides opportunities for black-hat SEO, so Google certainly won't adopt it fully, or rank pages using this technique highly, without proper verification of the content.
Conclusion:
Ordinary links with working fallback href attributes plus an onclick handler are the better approach, because they also provide functionality for old browsers.
The main advantage of the AJAX crawling scheme is that partially updated pages get a unique URI, and you don't have to create duplicate content that somehow serves as the indexable and linkable counterpart.
You could argue that the AJAX crawling scheme is more consistent and easier to implement. I think that's a question of your application design.

Updating display of elements on the web page without refreshing the whole page

Last time I coded a web application was almost 10 years ago. I used Java/JSP/HTML/CSS etc. I've been coding non-web applications only ever since.
When I look at modern sites now (like this one), I realize how my web development skills are obsolete. Maybe the most obvious "feature" that I wouldn't know how to implement now is the update of elements on the page after user input without having to refresh the whole page (e.g. the voting/downvoting here updates the vote count without reloading the whole page). What are the basic technologies behind this?
The techniques come under the umbrella of AJAX:
Ajax (shorthand for asynchronous JavaScript and XML) is a group of interrelated web development techniques used on the client-side to create interactive web applications. With Ajax, web applications can retrieve data from the server asynchronously in the background without interfering with the display and behavior of the existing page. The use of Ajax techniques has led to an increase in interactive or dynamic interfaces on web pages. Data is usually retrieved using the XMLHttpRequest object. Despite the name, the use of XML is not actually required, nor do the requests need to be asynchronous.
Something you should know:
DHTML: the HTML document structure and document events;
JavaScript: used to manipulate the HTML document;
AJAX: JavaScript used to communicate with the server.
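As a rough sketch of the vote-count example from the question (the /vote endpoint, the post id and the element id are assumptions), the point is that only one element changes, with no full page reload:

function upvote(postId) {
  var xhr = new XMLHttpRequest();
  xhr.open('POST', '/vote?post=' + encodeURIComponent(postId), true);
  xhr.onload = function () {
    // Assume the server answers with the new count as plain text.
    document.getElementById('vote-count-' + postId).textContent = xhr.responseText;
  };
  xhr.send();
}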