give a crawler pure html as opposed to trigger ajax - html

If my site is being crawled what PHP method should I use so that although ajax will not be triggered, my content will be transmitted?
PHP cannot detect whether I have ajax capabilities.
Goal: to give the crawler plain html.
I checked suggested answers by I'm still lost.

Google has specifications on this utilizing hashbangs (#!). See here: http://moz.com/blog/how-to-allow-google-to-crawl-ajax-content
Also note that <noscript> is an option, although PHP inserted will be transmitted regardless of JavaScript.
Finally, there are the general principles of Progressive Enhancement. Make your page fully functional without AJAX: pagination with actual links, PHP insertion of results, etc.
Then if JavaScript is available, hijack the links to pull in content without navigating away from the page.

Related

Indexe site in angular by search robots if $locationProvider.html5Mode(true) (without !#)

friends.
I have site written in angular.
How google or other search robots indexing this sites?
What techniques can be used to allow robots to see pages content?
Internet says how optimise site on angular for indexing if $locationProvider.html5Mode(false);
Exist method for indexing site if url of pages like: http://site.com/!#audio/1234
But in my site $locationProvider.html5Mode(true);
That means that url of pages without !#
http://site.com/audio/1234.
But all content of page loades with javascript.
I need help. Thank you!
Your server will receive a request at http://site.com/audio/1234. Somehow you will need to be able to return some content.
For now the only strategy I know is to render that page on the server-side also... Which involved some (or a lot of) work.
It's also good for the browsers that does not support pushStates yet. That way these users can still access the data.
You can also take a look at this http://docs.meteor.com/#spiderable They render the page on the server-side by executing the client-side javascript on the server-side.
Here's a similar question: pushState and SEO

Why AJAX over iFrames?

I am relatively new programmer, talking with a partner he told me, that before AJAX, he used a iframe to send data and change the content(obviously with help of JavaScript).
I understood that both are similar techniques, but i didn't find a article to describe their characteristic,
what are advantages of AJAX over Iframe ?
EDIT
i didnt find any explanation of the technique, but my partner told me he post the data trough a hidden iframe and submit the iframe, sound like just the iframe have to be refreshed, but i never did that
One advantage AJAX has is being able to read the state/status of the
request. You also have access to page headers, which you don't with
Iframes.
Ajax can handle multiple asynch requests. It's a little trickier
with Iframes as you need to create an Iframe per request (and keep
track of all of them to delete them later) instead of recycling the
same one.
Existing libraries are full of AJAX goodness and there is a larger community support base.
iframe
is a way show seperately two (or more) webpages in one
ajax
is a way to merge two (or more) webpages ( or new data ) into one
key advantages to Ajax I find are;
CSS will flow to the page called into it.
A way to retrieve data and update new information to the visitors without page refresh.
A fab mention to this site for it's clever use of Ajax.
A'Google instant' and suggestive searching is achieved via Ajax
Just my two cents:
I agree with Kris above that I wouldn't say they are comparable.
There's on use case that I find iFrames to be easier to work with over AJAX and that is if you need to submit a complicated form to another page but you don't need any response - the iframe route is by far the easiest to code.
Beyond that, AJAX, using a metaphor, acts a very knowledgeable go-between. It will handle multiple requests, the status of those requests, and hand back the data in the format you need.
I just wanted to add this because I didn't see in any of the answers.
The reasons to use Ajax are mostly about control, which you get a lot of. These reasons have been mentioned above.
One serious downside of Ajax, though, is that it is a JS fix. JavaScript is a great language, but people have been throwing it at every problem for a while now, and things which could be optimized if they were built in to the browsers, are now instead being done slowly (compared to compiled languages) with JS.
iFrames are a great example of this. They represent an incredibly common use case, wanting to include some html in some other html. Unfortunately, they aren't very amazing at it, often creating more headache than anything else.
If you want to include something and not have it mess with your site, nor your site to mess with it, iFrames are great. For the more common use case of including some random html in some other html, Ajax is better.
And here is the point I'm trying to make: this is dumb. There is no reason there shouldn't be something like an iFrame that acts more like Ajax. But, by jumping on board (as all of us did) with Ajax, we are now left with no choice.
The biggest reason this is a problem is that JS was never meant to be the absolute building blocks of the internet. Further, it's being used by pretty much every site around to violate user privacy. So, if you're looking for a good reason to use iFrames, this is mine:
It feels good to not need JS. If you can make your site improved by JS rather than dependent on it, that's a hard earned accomplishment, and the site will feel less "hacky" overall.
Anyways, that's just my input.
In my experience data loaded via AJAX is easier to manipulate versus data inside an iFrame. Also AJAX is really good for creating a better user experience. However I am not sure if I would necessarily put iFrames and AJAX in the same category because AJAX is asynchronous content and an iFrame is really just another page being loaded from outside of your site.
Also I could see iFraming creating SEO barriers and creating bad user experience. Honestly though if I had access to content I would prefer AJAX.

Getting html content from one page and adding it to my website

I have affiliated with expedia and I am using their API system. One of their requirements for launching the site is adding the terms and agreements to my page and they give us this page: http://travel.ian.com/index.jsp?pageName=userAgreement&locale=en_US&cid=xxx. I do not want to go to a different site, and I can not copy and paste the information because of updates. I also prefer not to use an iframe. Does anyone have any ideas on how to do this? Here is a webpage using this on their site with their domain: http://www.helloweekends.com/terms.htm. Does anyone know how they did this? Any help would be greatly appreciated!
Since it originates from another domain, it wouldn't be possible to use JavaScript, due to the same origin policy. Also, relying on JavaScript for the update would be trouble for users who has JavaScript disabled, as they wouldn't see the terms. Since you don't want to use an iframe, or copy the content, I guess your best shot would be to scrape their page with a server-side language of your choice, and then display it on your page.
Scraping can be a bit tricky though, if you rely on their markup. If they change their markup, there is a chance that your script will break, thus stop updating the terms.
There are various tutorials available on how to scrape sites. Here are a few PHP examples:
Web scrape with PHP
PHP Screen Scraping Tutorial
Note Make sure that they allow you to scrape the page prior to implementing it, so that you don't violate their rules.
Do you know if their API serves something with JSON? A JSONP call can get the values to you, but it will make your page rely on javascript for the users to see the updated page.
Another option is to use PHP of any other server side language to get the contents of the url, process it and return the block you require.
I would suggest the load() function offered by jQuery. It makes a simple AJAX call to retrieve a file, and you could even use a selector to only grab part of the page. For example, load the contents of a HTML page into a div:
$('#div_id').load('my_file.html');
Or just load a part of the page:
$('#div_id').load('my_file.html #main_text_id');

HTML Snapshot for crawler - Understanding how it works

i'm reading this article today. To be honest, im really interessed to "2. Much of your content is created by a server-side technology such as PHP or ASP.NET" point.
I want understand if i have understood :)
I create that php script (gethtmlsnapshot.php) where i include the server-side ajax page (getdata.php) and i escape (for security) the parameters. Then i add it at the end of the html static page (index-movies.html). Right? Now...
1 - Where i put that gethtmlsnapshot.php? In other words, i need to call (or better, the crawler need) that page. But if i don't have link on the main page, the crawler can't call it :O How can crawler call the page with _escaped_fragment_ parameters? It can't know them if i don't specific them somewhere :)
2 - How can crewler call that page with the parameters? As before, i need link to that script with the parameters, so crewler browse each page and save the content of the dinamic result.
Can you help me? And what do you think about this technique? Won't be better if the developers of crawler do their own bots in some others ways? :)
Let me know what do you think about. Cheers
I think you got something wrong so I'll try to explain what's going on here including the background and alternatives. as this indeed a very important topic that most of us stumbled upon (or at least something similar) from time to time.
Using AJAX or rather asynchronous incremental page updating (because most pages actually don't use XML but JSON), has enriched the web and provided great user experience.
It has however also come at a price.
The main problem were clients that didn't support the xmlhttpget object or JavaScript at all.
In the beginning you had to provide backwards compatibility.
This was usually done by providing links and capture the onclick event and fire an AJAX call instead of reloading the page (if the client supported it).
Today almost every client supports the necessary functions.
So the problem today are search engines. Because they don't. Well that's not entirely true because they partly do (especially Google), but for other purposes.
Google evaluates certain JavaScript code to prevent Blackhat SEO (for example a link pointing somewhere but with JavaScript opening some completely different webpage... Or html keyword codes that are invisible to the client because they are removed by JavaScript or the other way round).
But keeping it simple its best to think of a search engine crawler of a very basic browser with no CSS or JS support (it's the same with CSS, its party parsed for special reasons).
So if you have "AJAX links" on your website, and the Webcrawler doesn't support following them using JavaScript, they just don't get crawled. Or do they?
Well the answer is JavaScript links (like document.location whatever) get followed. Google is often intelligent enough to guess the target.
But ajax calls are not made. simple because they return partial content and no senseful whole page can be constructed from it as the context is unknown and the unique URI doesn't represent the location of the content.
So there are basically 3 strategies to work around that.
have an onclick event on the links with normal href attribute as fallback (imo the best option as it solves the problem for clients as well as search engines)
submitting the content websites via your sitemap so they get indexed, but completely apart from your site links (usually pages provide a permalink to this urls so that external pages link them for the pagerank)
ajax crawling scheme
the idea is to have your JavaScript xmlhttpget requests entangled with corresponding href attributes that look like so:
www.example.com/ajax.php#!key=value
so the link looks like:
go to my imprint
the function handleajax could evaluate the document.location variable to fire the incremental asynchronous page update. its also possible to pass an id or url or whatever.
the crawler however recognises the ajax crawling scheme format and automatically fetches http://www.example.com/ajax.php.php?%23!page=imprint instead of http://www.example.com/ajax.php#!page=imprint
so you the query string then contanis the html fragment from which you can tell which partial content has been updated.
so you have just have to make sure that http://www.example.com/ajax.php.php?%23!page=imprint returns a full website that just looks like the website should look to the user after the xmlhttpget update has been made.
a very elegant solution is also to pass the a object itself to the handler function which then fetches the same URL as the crawler would have fetched using ajax but with additional parameters. Your server side script then decides whether to deliver the whole page or just the partial content.
It's a very creative approach indeed and here comes my personal pr/ con analysis:
pro:
partial updated pages receive a unique identifier at which point they are fully qualified resources in the semantic web
partially updated websites receive a unique identifier that can be presented by search engines
con:
it's just a fallback solution for search engines, not for clients without JavaScript
it provides opportunities for black hat SEO. So Google for sure won't adopt it fully or rank pages with this technique high with out proper verification of the content.
conclusion:
just usual links with fallback legacy working href attributes, but an onclick handler are a better approach because they provide functionality for old browsers.
the main advantage of the ajax crawling scheme is that partially updated websites get a unique URI, and you don't have to do create duplicate content that somehow serves as the indexable and linkable counterpart.
you could argue that ajax crawling scheme implementation is more consistent and easier to implement. I think this is a question of your application design.

Updating display of elements on the web page without refreshing the whole page

Last time I coded a web application was almost 10 years ago. I used Java/JSP/HTML/CSS etc. I've been coding non-web applications only ever since.
When I look at modern sites now (like this one), I realize how my web development skills are obsolete. Maybe the most obvious "feature" that I wouldn't know how to implement now is the update of elements on the page after user input without having to refresh the whole page (e.g. the voting/downvoting here updates the vote count without reloading the whole page). What are the basic technologies behind this?
The techniques come under the umbrella of AJAX:
Ajax (shorthand for asynchronous JavaScript and XML) is a group of interrelated web development techniques used on the client-side to create interactive web applications. With Ajax, web applications can retrieve data from the server asynchronously in the background without interfering with the display and behavior of the existing page. The use of Ajax techniques has led to an increase in interactive or dynamic interfaces on web pages. Data is usually retrieved using the XMLHttpRequest object. Despite the name, the use of XML is not actually required, nor do the requests need to be asynchronous.
Something you should know:
DHTML : HTML Document
structure,document event;
JAVASCRIPT: use javascript to operate the HTML document;
AJAX: use javascript to communicate with the server.