How do search engines deal with AngularJS applications? - html

I see two issues with AngularJS application regarding search engines and SEO:
1) What happens with custom tags? Do search engines ignore the whole content within those tags? i.e. suppose I have
<custom>
<h1>Hey, this title is important</h1>
</custom>
would <h1> be indexed despite being inside custom tags?
2) Is there a way to avoid search engines of indexing {{}} binds literally? i.e.
<h2>{{title}}</h2>
I know I could do something like
<h2 ng-bind="title"></h2>
but what if I want to actually let the crawler "see" the title? Is server-side rendering the only solution?

(2022) Use Server Side Rendering if possible, and generate URLs with Pushstate
Google can and will run JavaScript now so it is very possible to build a site using only JavaScript provided you create a sensible URL structure. However, pagespeed has become a progressively more important ranking factor and typically pages built clientside perform poorly on initial render.
Serverside rendering (SSR) can help by allowing your pages to be pre-generated on the server. Your html containst the div that will be used as the page root, but this is not an empty div, it contains the html that the JavaScript would have generated if it were allowed to run.
The client downloads the HTML and renders it giving a very fast initial load, then it executes the JavaScript replacing the content of the root div with generated content in a process known as hydration.
Many newer frameworks come with SSR built in, notably NextJS.
(2015) Use PushState and Precomposition
The current (2015) way to do this is using the JavaScript pushState method.
PushState changes the URL in the top browser bar without reloading the page. Say you have a page containing tabs. The tabs hide and show content, and the content is inserted dynamically, either using AJAX or by simply setting display:none and display:block to hide and show the correct tab content.
When the tabs are clicked, use pushState to update the URL in the address bar. When the page is rendered, use the value in the address bar to determine which tab to show. Angular routing will do this for you automatically.
Precomposition
There are two ways to hit a PushState Single Page App (SPA)
Via PushState, where the user clicks a PushState link and the content is AJAXed in.
By hitting the URL directly.
The initial hit on the site will involve hitting the URL directly. Subsequent hits will simply AJAX in content as the PushState updates the URL.
Crawlers harvest links from a page then add them to a queue for later processing. This means that for a crawler, every hit on the server is a direct hit, they don't navigate via Pushstate.
Precomposition bundles the initial payload into the first response from the server, possibly as a JSON object. This allows the Search Engine to render the page without executing the AJAX call.
There is some evidence to suggest that Google might not execute AJAX requests. More on this here:
https://web.archive.org/web/20160318211223/http://www.analog-ni.co/precomposing-a-spa-may-become-the-holy-grail-to-seo
Search Engines can read and execute JavaScript
Google has been able to parse JavaScript for some time now, it's why they originally developed Chrome, to act as a full featured headless browser for the Google spider. If a link has a valid href attribute, the new URL can be indexed. There's nothing more to do.
If clicking a link in addition triggers a pushState call, the site can be navigated by the user via PushState.
Search Engine Support for PushState URLs
PushState is currently supported by Google and Bing.
Google
Here's Matt Cutts responding to Paul Irish's question about PushState for SEO:
http://youtu.be/yiAF9VdvRPw
Here is Google announcing full JavaScript support for the spider:
http://googlewebmastercentral.blogspot.de/2014/05/understanding-web-pages-better.html
The upshot is that Google supports PushState and will index PushState URLs.
See also Google webmaster tools' fetch as Googlebot. You will see your JavaScript (including Angular) is executed.
Bing
Here is Bing's announcement of support for pretty PushState URLs dated March 2013:
http://blogs.bing.com/webmaster/2013/03/21/search-engine-optimization-best-practices-for-ajax-urls/
Don't use HashBangs #!
Hashbang URLs were an ugly stopgap requiring the developer to provide a pre-rendered version of the site at a special location. They still work, but you don't need to use them.
Hashbang URLs look like this:
domain.example/#!path/to/resource
This would be paired with a metatag like this:
<meta name="fragment" content="!">
Google will not index them in this form, but will instead pull a static version of the site from the escaped_fragments URL and index that.
Pushstate URLs look like any ordinary URL:
domain.example/path/to/resource
The difference is that Angular handles them for you by intercepting the change to document.location dealing with it in JavaScript.
If you want to use PushState URLs (and you probably do) take out all the old hash style URLs and metatags and simply enable HTML5 mode in your config block.
Testing your site
Google Webmaster tools now contains a tool which will allow you to fetch a URL as Google, and render JavaScript as Google renders it.
https://www.google.com/webmasters/tools/googlebot-fetch
Generating PushState URLs in Angular
To generate real URLs in Angular, rather than # prefixed ones, set HTML5 mode on your $locationProvider object.
$locationProvider.html5Mode(true);
Server Side
Since you are using real URLs, you will need to ensure the same template (plus some precomposed content) gets shipped by your server for all valid URLs. How you do this will vary depending on your server architecture.
Sitemap
Your app may use unusual forms of navigation, for example hover or scroll. To ensure Google is able to drive your app, I would probably suggest creating a sitemap, a simple list of all the URLs your app responds to. You can place this at the default location (/sitemap or /sitemap.xml), or tell Google about it using webmaster tools.
It's a good idea to have a sitemap anyway.
Browser support
Pushstate works in IE10. In older browsers, Angular will automatically fall back to hash style URLs
A demo page
The following content is rendered using a pushstate URL with precomposition:
http://html5.gingerhost.com/london
As can be verified, at this link, the content is indexed and is appearing in Google.
Serving 404 and 301 Header status codes
Because the search engine will always hit your server for every request, you can serve header status codes from your server and expect Google to see them.

Update May 2014
Google crawlers now executes javascript - you can use the Google Webmaster Tools to better understand how your sites are rendered by Google.
Original answer
If you want to optimize your app for search engines there is unfortunately no way around serving a pre-rendered version to the crawler. You can read more about Google's recommendations for ajax and javascript-heavy sites here.
If this is an option I'd recommend reading this article about how to do SEO for Angular with server-side rendering.
I’m not sure what the crawler does when it encounters custom tags.

Let's get definitive about AngularJS and SEO
Google, Yahoo, Bing, and other search engines crawl the web in traditional ways using traditional crawlers. They run robots that crawl the HTML on web pages, collecting information along the way. They keep interesting words and look for other links to other pages (these links, the amount of them and the number of them come into play with SEO).
So why don't search engines deal with javascript sites?
The answer has to do with the fact that the search engine robots work through headless browsers and they most often do not have a javascript rendering engine to render the javascript of a page. This works for most pages as most static pages don't care about JavaScript rendering their page, as their content is already available.
What can be done about it?
Luckily, crawlers of the larger sites have started to implement a mechanism that allows us to make our JavaScript sites crawlable, but it requires us to implement a change to our site.
If we change our hashPrefix to be #! instead of simply #, then modern search engines will change the request to use _escaped_fragment_ instead of #!. (With HTML5 mode, i.e. where we have links without the hash prefix, we can implement this same feature by looking at the User Agent header in our backend).
That is to say, instead of a request from a normal browser that looks like:
http://www.ng-newsletter.com/#!/signup/page
A search engine will search the page with:
http://www.ng-newsletter.com/?_escaped_fragment_=/signup/page
We can set the hash prefix of our Angular apps using a built-in method from ngRoute:
angular.module('myApp', [])
.config(['$location', function($location) {
$location.hashPrefix('!');
}]);
And, if we're using html5Mode, we will need to implement this using the meta tag:
<meta name="fragment" content="!">
Reminder, we can set the html5Mode() with the $location service:
angular.module('myApp', [])
.config(['$location',
function($location) {
$location.html5Mode(true);
}]);
Handling the search engine
We have a lot of opportunities to determine how we'll deal with actually delivering content to search engines as static HTML. We can host a backend ourselves, we can use a service to host a back-end for us, we can use a proxy to deliver the content, etc. Let's look at a few options:
Self-hosted
We can write a service to handle dealing with crawling our own site using a headless browser, like phantomjs or zombiejs, taking a snapshot of the page with rendered data and storing it as HTML. Whenever we see the query string ?_escaped_fragment_ in a search request, we can deliver the static HTML snapshot we took of the page instead of the pre-rendered page through only JS. This requires us to have a backend that delivers our pages with conditional logic in the middle. We can use something like prerender.io's backend as a starting point to run this ourselves. Of course, we still need to handle the proxying and the snippet handling, but it's a good start.
With a paid service
The easiest and the fastest way to get content into search engine is to use a service Brombone, seo.js, seo4ajax, and prerender.io are good examples of these that will host the above content rendering for you. This is a good option for the times when we don't want to deal with running a server/proxy. Also, it's usually super quick.
For more information about Angular and SEO, we wrote an extensive tutorial on it at http://www.ng-newsletter.com/posts/serious-angular-seo.html and we detailed it even more in our book ng-book: The Complete Book on AngularJS. Check it out at ng-book.com.

You should really check out the tutorial on building an SEO-friendly AngularJS site on the year of moo blog. He walks you through all the steps outlined on Angular's documentation. http://www.yearofmoo.com/2012/11/angularjs-and-seo.html
Using this technique, the search engine sees the expanded HTML instead of the custom tags.

This has drastically changed.
http://searchengineland.com/bing-offers-recommendations-for-seo-friendly-ajax-suggests-html5-pushstate-152946
If you use:
$locationProvider.html5Mode(true);
you are set.
No more rendering pages.

Things have changed quite a bit since this question was asked. There are now options to let Google index your AngularJS site. The easiest option I found was to use http://prerender.io free service that will generate the crwalable pages for you and serve that to the search engines. It is supported on almost all server side web platforms. I have recently started using them and the support is excellent too.
I do not have any affiliation with them, this is coming from a happy user.

Angular's own website serves simplified content to search engines: http://docs.angularjs.org/?_escaped_fragment_=/tutorial/step_09
Say your Angular app is consuming a Node.js/Express-driven JSON api, like /api/path/to/resource. Perhaps you could redirect any requests with ?_escaped_fragment_ to /api/path/to/resource.html, and use content negotiation to render an HTML template of the content, rather than return the JSON data.
The only thing is, your Angular routes would need to match 1:1 with your REST API.
EDIT: I'm realizing that this has the potential to really muddy up your REST api and I don't recommend doing it outside of very simple use-cases where it might be a natural fit.
Instead, you can use an entirely different set of routes and controllers for your robot-friendly content. But then you're duplicating all of your AngularJS routes and controllers in Node/Express.
I've settled on generating snapshots with a headless browser, even though I feel that's a little less-than-ideal.

A good practice can be found here:
http://scotch.io/tutorials/javascript/angularjs-seo-with-prerender-io?_escaped_fragment_=tag

As of now Google has changed their AJAX crawling proposal.
Times have changed. Today, as long as you're not blocking Googlebot from crawling your JavaScript or CSS files, we are generally able to render and understand your web pages like modern browsers.
tl;dr: [Google] are no longer recommending the AJAX crawling proposal [Google] made back in 2009.

Google's Crawlable Ajax Spec, as referenced in the other answers here, is basically the answer.
If you're interested in how other search engines and social bots deal with the same issues I wrote up the state of art here: http://blog.ajaxsnapshots.com/2013/11/googles-crawlable-ajax-specification.html
I work for a https://ajaxsnapshots.com, a company that implements the Crawlable Ajax Spec as a service - the information in that report is based on observations from our logs.

I have found an elegant solution that would cover most of your bases. I wrote about it initially here and answered another similar Stack Overflow question here which references it.
FYI this solution also includes hard coded fallback tags in case JavaScript isn't picked up by the crawler. I haven't explicitly outlined it, but it is worth mentioning that you should be activating HTML5 mode for proper URL support.
Also note: these aren't the complete files, just the important parts of those that are relevant. I can't help with writing the boilerplate for directives, services, etc.
app.example
This is where you provide the custom metadata for each of your routes (title, description, etc.)
$routeProvider
.when('/', {
templateUrl: 'views/homepage.html',
controller: 'HomepageCtrl',
metadata: {
title: 'The Base Page Title',
description: 'The Base Page Description' }
})
.when('/about', {
templateUrl: 'views/about.html',
controller: 'AboutCtrl',
metadata: {
title: 'The About Page Title',
description: 'The About Page Description' }
})
metadata-service.js (service)
Sets the custom metadata options or use defaults as fallbacks.
var self = this;
// Set custom options or use provided fallback (default) options
self.loadMetadata = function(metadata) {
self.title = document.title = metadata.title || 'Fallback Title';
self.description = metadata.description || 'Fallback Description';
self.url = metadata.url || $location.absUrl();
self.image = metadata.image || 'fallbackimage.jpg';
self.ogpType = metadata.ogpType || 'website';
self.twitterCard = metadata.twitterCard || 'summary_large_image';
self.twitterSite = metadata.twitterSite || '#fallback_handle';
};
// Route change handler, sets the route's defined metadata
$rootScope.$on('$routeChangeSuccess', function (event, newRoute) {
self.loadMetadata(newRoute.metadata);
});
metaproperty.js (directive)
Packages the metadata service results for the view.
return {
restrict: 'A',
scope: {
metaproperty: '#'
},
link: function postLink(scope, element, attrs) {
scope.default = element.attr('content');
scope.metadata = metadataService;
// Watch for metadata changes and set content
scope.$watch('metadata', function (newVal, oldVal) {
setContent(newVal);
}, true);
// Set the content attribute with new metadataService value or back to the default
function setContent(metadata) {
var content = metadata[scope.metaproperty] || scope.default;
element.attr('content', content);
}
setContent(scope.metadata);
}
};
index.html
Complete with the hardcoded fallback tags mentioned earlier, for crawlers that can't pick up any JavaScript.
<head>
<title>Fallback Title</title>
<meta name="description" metaproperty="description" content="Fallback Description">
<!-- Open Graph Protocol Tags -->
<meta property="og:url" content="fallbackurl.example" metaproperty="url">
<meta property="og:title" content="Fallback Title" metaproperty="title">
<meta property="og:description" content="Fallback Description" metaproperty="description">
<meta property="og:type" content="website" metaproperty="ogpType">
<meta property="og:image" content="fallbackimage.jpg" metaproperty="image">
<!-- Twitter Card Tags -->
<meta name="twitter:card" content="summary_large_image" metaproperty="twitterCard">
<meta name="twitter:title" content="Fallback Title" metaproperty="title">
<meta name="twitter:description" content="Fallback Description" metaproperty="description">
<meta name="twitter:site" content="#fallback_handle" metaproperty="twitterSite">
<meta name="twitter:image:src" content="fallbackimage.jpg" metaproperty="image">
</head>
This should help dramatically with most search engine use cases. If you want fully dynamic rendering for social network crawlers (which are iffy on JavaScript support), you'll still have to use one of the pre-rendering services mentioned in some of the other answers.

With Angular Universal, you can generate landing pages for the app that look like the complete app and then load your Angular app behind it.
Angular Universal generates pure HTML means no-javascript pages in server-side and serve them to users without delaying. So you can deal with any crawler, bot and user (who already have low cpu and network speed).Then you can redirect them by links/buttons to your actual angular app that already loaded behind it. This solution is recommended by official site. -More info about SEO and Angular Universal-

Use something like PreRender, it makes static pages of your site so search engines can index it.
Here you can find out for what platforms it is available: https://prerender.io/documentation/install-middleware#asp-net

Crawlers (or bots) are designed to crawl HTML content of web pages but due to AJAX operations for asynchronous data fetching, this became a problem as it takes sometime to render page and show dynamic content on it. Similarly, AngularJS also use asynchronous model, which creates problem for Google crawlers.
Some developers create basic html pages with real data and serve these pages from server side at the time of crawling. We can render same pages with PhantomJS on serve side which has _escaped_fragment_ (Because Google looks for #! in our site urls and then takes everything after the #! and adds it in _escaped_fragment_ query parameter). For more detail please read this blog .

The crawlers do not need a rich featured pretty styled gui, they only want to see the content, so you do not need to give them a snapshot of a page that has been built for humans.
My solution: to give the crawler what the crawler wants:
You must think of what do the crawler want, and give him only that.
TIP don't mess with the back. Just add a little server-sided frontview using the same API

Related

How do I generate SEO-friendly markup for a single-page web app? [duplicate]

There are a lot of cool tools for making powerful "single-page" JavaScript websites nowadays. In my opinion, this is done right by letting the server act as an API (and nothing more) and letting the client handle all of the HTML generation stuff. The problem with this "pattern" is the lack of search engine support. I can think of two solutions:
When the user enters the website, let the server render the page exactly as the client would upon navigation. So if I go to http://example.com/my_path directly the server would render the same thing as the client would if I go to /my_path through pushState.
Let the server provide a special website only for the search engine bots. If a normal user visits http://example.com/my_path the server should give him a JavaScript heavy version of the website. But if the Google bot visits, the server should give it some minimal HTML with the content I want Google to index.
The first solution is discussed further here. I have been working on a website doing this and it's not a very nice experience. It's not DRY and in my case I had to use two different template engines for the client and the server.
I think I have seen the second solution for some good ol' Flash websites. I like this approach much more than the first one and with the right tool on the server it could be done quite painlessly.
So what I'm really wondering is the following:
Can you think of any better solution?
What are the disadvantages with the second solution? If Google in some way finds out that I'm not serving the exact same content for the Google bot as a regular user, would I then be punished in the search results?
While #2 might be "easier" for you as a developer, it only provides search engine crawling. And yes, if Google finds out your serving different content, you might be penalized (I'm not an expert on that, but I have heard of it happening).
Both SEO and accessibility (not just for disabled person, but accessibility via mobile devices, touch screen devices, and other non-standard computing / internet enabled platforms) both have a similar underlying philosophy: semantically rich markup that is "accessible" (i.e. can be accessed, viewed, read, processed, or otherwise used) to all these different browsers. A screen reader, a search engine crawler or a user with JavaScript enabled, should all be able to use/index/understand your site's core functionality without issue.
pushState does not add to this burden, in my experience. It only brings what used to be an afterthought and "if we have time" to the forefront of web development.
What your describe in option #1 is usually the best way to go - but, like other accessibility and SEO issues, doing this with pushState in a JavaScript-heavy app requires up-front planning or it will become a significant burden. It should be baked in to the page and application architecture from the start - retrofitting is painful and will cause more duplication than is necessary.
I've been working with pushState and SEO recently for a couple of different application, and I found what I think is a good approach. It basically follows your item #1, but accounts for not duplicating html / templates.
Most of the info can be found in these two blog posts:
http://lostechies.com/derickbailey/2011/09/06/test-driving-backbone-views-with-jquery-templates-the-jasmine-gem-and-jasmine-jquery/
and
http://lostechies.com/derickbailey/2011/06/22/rendering-a-rails-partial-as-a-jquery-template/
The gist of it is that I use ERB or HAML templates (running Ruby on Rails, Sinatra, etc) for my server side render and to create the client side templates that Backbone can use, as well as for my Jasmine JavaScript specs. This cuts out the duplication of markup between the server side and the client side.
From there, you need to take a few additional steps to have your JavaScript work with the HTML that is rendered by the server - true progressive enhancement; taking the semantic markup that got delivered and enhancing it with JavaScript.
For example, i'm building an image gallery application with pushState. If you request /images/1 from the server, it will render the entire image gallery on the server and send all of the HTML, CSS and JavaScript down to your browser. If you have JavaScript disabled, it will work perfectly fine. Every action you take will request a different URL from the server and the server will render all of the markup for your browser. If you have JavaScript enabled, though, the JavaScript will pick up the already rendered HTML along with a few variables generated by the server and take over from there.
Here's an example:
<form id="foo">
Name: <input id="name"><button id="say">Say My Name!</button>
</form>
After the server renders this, the JavaScript would pick it up (using a Backbone.js view in this example)
FooView = Backbone.View.extend({
events: {
"change #name": "setName",
"click #say": "sayName"
},
setName: function(e){
var name = $(e.currentTarget).val();
this.model.set({name: name});
},
sayName: function(e){
e.preventDefault();
var name = this.model.get("name");
alert("Hello " + name);
},
render: function(){
// do some rendering here, for when this is just running JavaScript
}
});
$(function(){
var model = new MyModel();
var view = new FooView({
model: model,
el: $("#foo")
});
});
This is a very simple example, but I think it gets the point across.
When I instante the view after the page loads, I'm providing the existing content of the form that was rendered by the server, to the view instance as the el for the view. I am not calling render or having the view generate an el for me, when the first view is loaded. I have a render method available for after the view is up and running and the page is all JavaScript. This lets me re-render the view later if I need to.
Clicking the "Say My Name" button with JavaScript enabled will cause an alert box. Without JavaScript, it would post back to the server and the server could render the name to an html element somewhere.
Edit
Consider a more complex example, where you have a list that needs to be attached (from the comments below this)
Say you have a list of users in a <ul> tag. This list was rendered by the server when the browser made a request, and the result looks something like:
<ul id="user-list">
<li data-id="1">Bob
<li data-id="2">Mary
<li data-id="3">Frank
<li data-id="4">Jane
</ul>
Now you need to loop through this list and attach a Backbone view and model to each of the <li> items. With the use of the data-id attribute, you can find the model that each tag comes from easily. You'll then need a collection view and item view that is smart enough to attach itself to this html.
UserListView = Backbone.View.extend({
attach: function(){
this.el = $("#user-list");
this.$("li").each(function(index){
var userEl = $(this);
var id = userEl.attr("data-id");
var user = this.collection.get(id);
new UserView({
model: user,
el: userEl
});
});
}
});
UserView = Backbone.View.extend({
initialize: function(){
this.model.bind("change:name", this.updateName, this);
},
updateName: function(model, val){
this.el.text(val);
}
});
var userData = {...};
var userList = new UserCollection(userData);
var userListView = new UserListView({collection: userList});
userListView.attach();
In this example, the UserListView will loop through all of the <li> tags and attach a view object with the correct model for each one. it sets up an event handler for the model's name change event and updates the displayed text of the element when a change occurs.
This kind of process, to take the html that the server rendered and have my JavaScript take over and run it, is a great way to get things rolling for SEO, Accessibility, and pushState support.
Hope that helps.
I think you need this: http://code.google.com/web/ajaxcrawling/
You can also install a special backend that "renders" your page by running javascript on the server, and then serves that to google.
Combine both things and you have a solution without programming things twice. (As long as your app is fully controllable via anchor fragments.)
So, it seem that the main concern is being DRY
If you're using pushState have your server send the same exact code for all urls (that don't contain a file extension to serve images, etc.) "/mydir/myfile", "/myotherdir/myotherfile" or root "/" -- all requests receive the same exact code. You need to have some kind url rewrite engine. You can also serve a tiny bit of html and the rest can come from your CDN (using require.js to manage dependencies -- see https://stackoverflow.com/a/13813102/1595913).
(test the link's validity by converting the link to your url scheme and testing against existence of content by querying a static or a dynamic source. if it's not valid send a 404 response.)
When the request is not from a google bot, you just process normally.
If the request is from a google bot, you use phantom.js -- headless webkit browser ("A headless browser is simply a full-featured web browser with no visual interface.") to render html and javascript on the server and send the google bot the resulting html. As the bot parses the html it can hit your other "pushState" links /somepage on the server mylink, the server rewrites url to your application file, loads it in phantom.js and the resulting html is sent to the bot, and so on...
For your html I'm assuming you're using normal links with some kind of hijacking (e.g. using with backbone.js https://stackoverflow.com/a/9331734/1595913)
To avoid confusion with any links separate your api code that serves json into a separate subdomain, e.g. api.mysite.com
To improve performance you can pre-process your site pages for search engines ahead of time during off hours by creating static versions of the pages using the same mechanism with phantom.js and consequently serve the static pages to google bots. Preprocessing can be done with some simple app that can parse <a> tags. In this case handling 404 is easier since you can simply check for the existence of the static file with a name that contains url path.
If you use #! hash bang syntax for your site links a similar scenario applies, except that the rewrite url server engine would look out for _escaped_fragment_ in the url and would format the url to your url scheme.
There are a couple of integrations of node.js with phantom.js on github and you can use node.js as the web server to produce html output.
Here are a couple of examples using phantom.js for seo:
http://backbonetutorials.com/seo-for-single-page-apps/
http://thedigitalself.com/blog/seo-and-javascript-with-phantomjs-server-side-rendering
If you're using Rails, try poirot. It's a gem that makes it dead simple to reuse mustache or handlebars templates client and server side.
Create a file in your views like _some_thingy.html.mustache.
Render server side:
<%= render :partial => 'some_thingy', object: my_model %>
Put the template your head for client side use:
<%= template_include_tag 'some_thingy' %>
Rendre client side:
html = poirot.someThingy(my_model)
To take a slightly different angle, your second solution would be the correct one in terms of accessibility...you would be providing alternative content to users who cannot use javascript (those with screen readers, etc.).
This would automatically add the benefits of SEO and, in my opinion, would not be seen as a 'naughty' technique by Google.
Interesting. I have been searching around for viable solutions but it seems to be quite problematic.
I was actually leaning more towards your 2nd approach:
Let the server provide a special website only for the search engine
bots. If a normal user visits http://example.com/my_path the server
should give him a JavaScript heavy version of the website. But if the
Google bot visits, the server should give it some minimal HTML with
the content I want Google to index.
Here's my take on solving the problem. Although it is not confirmed to work, it might provide some insight or idea's for other developers.
Assume you're using a JS framework that supports "push state" functionality, and your backend framework is Ruby on Rails. You have a simple blog site and you would like search engines to index all your article index and show pages.
Let's say you have your routes set up like this:
resources :articles
match "*path", "main#index"
Ensure that every server-side controller renders the same template that your client-side framework requires to run (html/css/javascript/etc). If none of the controllers are matched in the request (in this example we only have a RESTful set of actions for the ArticlesController), then just match anything else and just render the template and let the client-side framework handle the routing. The only difference between hitting a controller and hitting the wildcard matcher would be the ability to render content based on the URL that was requested to JavaScript-disabled devices.
From what I understand it is a bad idea to render content that isn't visible to browsers. So when Google indexes it, people go through Google to visit a given page and there isn't any content, then you're probably going to be penalised. What comes to mind is that you render content in a div node that you display: none in CSS.
However, I'm pretty sure it doesn't matter if you simply do this:
<div id="no-js">
<h1><%= #article.title %></h1>
<p><%= #article.description %></p>
<p><%= #article.content %></p>
</div>
And then using JavaScript, which doesn't get run when a JavaScript-disabled device opens the page:
$("#no-js").remove() # jQuery
This way, for Google, and for anyone with JavaScript-disabled devices, they would see the raw/static content. So the content is physically there and is visible to anyone with JavaScript-disabled devices.
But, when a user visits the same page and actually has JavaScript enabled, the #no-js node will be removed so it doesn't clutter up your application. Then your client-side framework will handle the request through it's router and display what a user should see when JavaScript is enabled.
I think this might be a valid and fairly easy technique to use. Although that might depend on the complexity of your website/application.
Though, please correct me if it isn't. Just thought I'd share my thoughts.
Use NodeJS on the serverside, browserify your clientside code and route each http-request's(except for static http resources) uri through a serverside client to provide the first 'bootsnap'(a snapshot of the page it's state). Use something like jsdom to handle jquery dom-ops on the server. After the bootsnap returned, setup the websocket connection. Probably best to differentiate between a websocket client and a serverside client by making some kind of a wrapper connection on the clientside(serverside client can directly communicate with the server). I've been working on something like this: https://github.com/jvanveen/rnet/
Use Google Closure Template to render pages. It compiles to javascript or java, so it is easy to render the page either on the client or server side. On the first encounter with every client, render the html and add javascript as link in header. Crawler will read the html only but the browser will execute your script. All subsequent requests from the browser could be done in against the api to minimize the traffic.
This might help you : https://github.com/sharjeel619/SPA-SEO
Logic
A browser requests your single page application from the server,
which is going to be loaded from a single index.html file.
You program some intermediary server code which intercepts the client
request and differentiates whether the request came from a browser or
some social crawler bot.
If the request came from some crawler bot, make an API call to
your back-end server, gather the data you need, fill in that data to
html meta tags and return those tags in string format back to the
client.
If the request didn't come from some crawler bot, then simply
return the index.html file from the build or dist folder of your single page
application.

Add link tag using only for homepage in bigcommerce

I want to add <link rel="alternate" href="https://www.example.com" hreflang="en-us" /> this kinda of link tag in my bigcommerce site using jQuery or anything else...
Tried code:
1:
if ($("body#home").length > 0) {
$('head').add($('<link rel="alternate" href="https://www.example.com" hreflang="en-us" />'));
}
2:
var page = window.location.pathname;
if (page == '/' || page == '/index.html') {
$('head').add($('<link rel="alternate" href="https://www.example.com" hreflang="en-us" />'));
}
3:
if ($("html").hasClass("home")) {
$("head").append("<link rel=alternate href=https://www.example.com hreflang=en-us>");
}
But nothing worked for me....
Let's first give some background on hreflang and the three valid ways that Google will read it..
HTML link element in header. In the HTML section of http://www.example.com/, add a link element pointing to the Spanish version of that webpage at http://es.example.com/, like this:
<link rel="alternate" hreflang="es" href="http://es.example.com/" />
HTTP header. If you publish non-HTML files (like PDFs), you can use an HTTP header to indicate a different language version of a URL:
Link: <http://es.example.com/>; rel="alternate"; hreflang="es"
To specify multiple hreflang values in a Link HTTP header, separate the values with commas like so:
Link: <http://es.example.com/>; rel="alternate"; hreflang="es",<http://de.example.com/>; rel="alternate"; hreflang="de"
Sitemap. Instead of using markup, you can submit language version information in a Sitemap.
Source: Google 'hreflang' Usage
So Method 2 isn't possible since you can't modify or control the headers from your BigCommerce store.
This leaves us with Method 1 or Method 3.
The big question here though is..
"Will Google index & process a dynamically inserted JavaScript hreflang link tag"?
Unfortunately at the time of writing this, I need to wait several days for the Google Webmaster Tool to become active on my test site so I can be certain; while all the 3rd party hreflang test sites I used failed. My gut feeling is that I would not trust it. However, if you have an active Google Webmaster / Search Console account, you can test this by going to: Dashboard > Search Traffic > International Targeting.
But for the sake of argument, let's assume that it will work, and so to answer your specific question, you would go about this method like so...
Within the <head>...</head> block, create an empty link tag like so: <link id="lang1" /> This will have the link element physically in the DOM awaiting its attributes to be dynamically added.
Next, immediately below the link element created above, let's create the JavaScript that will turn this empty link tag into a complete hreflang reference depending on the current page:
<script>
// If current page is homepage, then append the neccessary attributes to the link tag. Else, do nothing.
// If on homepage, the link tag would become: "<link id="lang1" rel="alternate" href="https://www.example.com" hreflang="en-us" />"
window.location.pathname == '/' ? $("#lang1").attr({"rel": "alternate", "href": "https://www.example.com", "hreflang": "en-us"}) : false;
</script>
And that's about it from the coding side. If you run this and inspect the DOM (it won't be viewable in page source), you can confirm that your link tag now reads as: <link id="lang1" rel="alternate" href="https://www.example.com" hreflang="en-us" />
Again, whether or not Google will process this, I don't know.
But here's an alternative I do know will work...
We can follow Method 3 listed above, and submit language version information via your site's sitemap, which can specify which individual and specific pages have alternative language versions.
Now, you do not have access to directly modify your BigCommerce generated Sitemap. But what you do have access to, is to:
Create your own custom sitemap file, and upload it to your store.
Tell Google to use the URL of this custom sitemap, rather than the default BigCommerce one.
There are plenty of resources online on how to create a sitemap, and there are many tools that can help automate this process. Although beware, if you use a custom sitemap, then you will need to maintain it and manually update it whenever you add new pages or products to your store.
I've taken the time to point you to some specific documentation resources that should help you with this task. I will eventually come back to this post to transcribe the content from these links into this post as I do recognize posting links is bad SO practice. A hardass might say "well why are you doing it then", and well my time is limited and I'm trying to be as helpful as I can now upfront.
Here is a link from the Google Docs with information on creating a sitemap with page specific language versions.
Here is a link from the BigCommerce Docs with information on uploading a custom file to your store which can then be accessible via your domain/URL.
Finally, here is a link from the BigCommerce Docs with information on how to direct Google to use a specific/alternate file as your store's sitemap.
Please attempt the code suggestion I wrote for Method #1 and test it using your Google Webmaster's tool to let us know if the hreflang link tag is successfully crawled by Google when dynamically inserted via JavaScript - you would be doing the community a great service as there is no definite answer around this.
Remember, you can officially test this by logging into your Google Webmaster Console and navigating to Dashboard > Search Traffic > International Targeting

Why Twitter Feed isn't loading into my Website?

I am creating a Website using HTML5 and CSS3 myself. I am using jquery.tweet.js javascript file for showing my twitter feed into my website. And, I added below jQuery code into my body element:
<script>
jQuery(document).ready(function($) {
$('#latest-tweets').tweet({
username: 'sumonbdinfo',
count:1,
loading_text: "loading tweets..."
});
});
</script>
But, in my website is only now showing "loading tweets..." text, not showing my Twitter Feeds anymore.
Also, I added the jquery.tweet.js file into my head element and added jquery into my head element. But, still it's not showing on my website.
Should I put the above code into my head element of my website or I just added it in right place?
I pasted below the jquery.tweet.js file what I'm using for showing tweets on my website.
(function(a){a.fn.tweet=function(c){var n=a.extend({username:null,list:null,favorites:false,query:null,avatar_size:null,count:3,fetch:null,page:1,retweets:true,intro_text:null,outro_text:null,join_text:null,auto_join_text_default:"i said,",auto_join_text_ed:"i",auto_join_text_ing:"i am",auto_join_text_reply:"i replied to",auto_join_text_url:"i was looking at",loading_text:null,refresh_interval:null,twitter_url:"twitter.com",twitter_api_url:"api.twitter.com",twitter_search_url:"search.twitter.com",template:"{avatar}{time}{join}{text}",comparator:function(p,o){return o.tweet_time-p.tweet_time},filter:function(o){return true}},c);var b=/\b((?:[a-z][\w-]+:(?:\/{1,3}|[a-z0-9%])|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}\/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'".,<>?«»“”‘’]))/gi;function l(q,r){if(typeof q==="string"){var o=q;for(var p in r){var s=r[p];o=o.replace(new RegExp("{"+p+"}","g"),s===null?"":s)}return o}else{return q(r)}}a.extend({tweet:{t:l}});function f(p,o){return function(){var q=[];this.each(function(){q.push(this.replace(p,o))});return a(q)}}function k(o){return o.replace(/</g,"<").replace(/>/g,"^>")}a.fn.extend({linkUser:f(/(^|[\W])#(\w+)/gi,'$1#$2'),linkHash:f(/(?:^| )[\#]+([\w\u00c0-\u00d6\u00d8-\u00f6\u00f8-\u00ff\u0600-\u06ff]+)/gi,' #$1'),capAwesome:f(/\b(awesome)\b/gi,'<span class="awesome">$1</span>'),capEpic:f(/\b(epic)\b/gi,'<span class="epic">$1</span>'),makeHeart:f(/(<)+[3]/gi,"<tt class='heart'>♥</tt>")});function e(p,o){return p.replace(b,function(s){var r=(/^[a-z]+:/i).test(s)?s:"http://"+s;var u=s;for(var t=0;t<o.length;++t){var q=o[t];if(q.url==r&&q.expanded_url){r=q.expanded_url;u=q.display_url;break}}return''+k(u)+""})}function j(o){return Date.parse(o.replace(/^([a-z]{3})( [a-z]{3} \d\d?)(.*)( \d{4})$/i,"$1,$2$4$3"))}function h(o){var q=(arguments.length>1)?arguments[1]:new Date();var s=parseInt((q.getTime()-o)/1000,10);var p="";if(s<1){p="just now"}else{if(s<60){p=s+" seconds ago"}else{if(s<120){p="a minute ago"}else{if(s<(45*60)){p=(parseInt(s/60,10)).toString()+" minutes ago"}else{if(s<(2*60*60)){p="an hour ago"}else{if(s<(24*60*60)){p=""+(parseInt(s/3600,10)).toString()+" hours ago"}else{if(s<(48*60*60)){p="a day ago"}else{p=(parseInt(s/86400,10)).toString()+" days ago"}}}}}}}return"about "+p}function g(o){if(o.match(/^(#([A-Za-z0-9-_]+)) .*/i)){return n.auto_join_text_reply}else{if(o.match(b)){return n.auto_join_text_url}else{if(o.match(/^((\w+ed)|just) .*/im)){return n.auto_join_text_ed}else{if(o.match(/^(\w*ing) .*/i)){return n.auto_join_text_ing}else{return n.auto_join_text_default}}}}}function d(){var p=("https:"==document.location.protocol?"https:":"http:");var o=(n.fetch===null)?n.count:n.fetch;var r="&include_entities=1&callback=?";if(n.list){return p+"//"+n.twitter_api_url+"/1/"+n.username[0]+"/lists/"+n.list+"/statuses.json?page="+n.page+"&per_page="+o+r}else{if(n.favorites){return p+"//"+n.twitter_api_url+"/favorites/"+n.username[0]+".json?page="+n.page+"&count="+o+r}else{if(n.query===null&&n.username.length==1){return p+"//"+n.twitter_api_url+"/1/statuses/user_timeline.json?screen_name="+n.username[0]+"&count="+o+(n.retweets?"&include_rts=1":"")+"&page="+n.page+r}else{var q=(n.query||"from:"+n.username.join(" OR from:"));return p+"//"+n.twitter_search_url+"/search.json?&q="+encodeURIComponent(q)+"&rpp="+o+"&page="+n.page+r}}}}function m(o,p){if(p){return("user" in o)?o.user.profile_image_url_https:m(o,false).replace(/^http:\/\/[a-z0-9]{1,3}\.twimg\.com\//,"https://s3.amazonaws.com/twitter_production/")}else{return o.profile_image_url||o.user.profile_image_url}}function i(p){var q={};q.item=p;q.source=p.source;q.screen_name=p.from_user||p.user.screen_name;q.avatar_size=n.avatar_size;q.avatar_url=m(p,(document.location.protocol==="https:"));q.retweet=typeof(p.retweeted_status)!="undefined";q.tweet_time=j(p.created_at);q.join_text=n.join_text=="auto"?g(p.text):n.join_text;q.tweet_id=p.id_str;q.twitter_base="http://"+n.twitter_url+"/";q.user_url=q.twitter_base+q.screen_name;q.tweet_url=q.user_url+"/status/"+q.tweet_id;q.reply_url=q.twitter_base+"intent/tweet?in_reply_to="+q.tweet_id;q.retweet_url=q.twitter_base+"intent/retweet?tweet_id="+q.tweet_id;q.favorite_url=q.twitter_base+"intent/favorite?tweet_id="+q.tweet_id;q.retweeted_screen_name=q.retweet&&p.retweeted_status.user.screen_name;q.tweet_relative_time=h(q.tweet_time);q.entities=p.entities?(p.entities.urls||[]).concat(p.entities.media||[]):[];q.tweet_raw_text=q.retweet?("RT #"+q.retweeted_screen_name+" "+p.retweeted_status.text):p.text;q.tweet_text=a([e(q.tweet_raw_text,q.entities)]).linkUser().linkHash()[0];q.tweet_text_fancy=a([q.tweet_text]).makeHeart().capAwesome().capEpic()[0];q.user=l('<a class="tweet_user" href="{user_url}">{screen_name}</a>',q);q.join=n.join_text?l(' <span class="tweet_join">{join_text}</span> ',q):" ";q.avatar=q.avatar_size?l('<a class="tweet_avatar" href="{user_url}"><img src="{avatar_url}" height="{avatar_size}" width="{avatar_size}" alt="{screen_name}\'s avatar" title="{screen_name}\'s avatar" border="0"/></a>',q):"";q.time=l('<span class="tweet_time">{tweet_relative_time}</span>',q);q.text=l('<span class="tweet_text">{tweet_text_fancy}</span>',q);q.reply_action=l('<a class="tweet_action tweet_reply" href="{reply_url}">reply</a>',q);q.retweet_action=l('<a class="tweet_action tweet_retweet" href="{retweet_url}">retweet</a>',q);q.favorite_action=l('<a class="tweet_action tweet_favorite" href="{favorite_url}">favorite</a>',q);return q}return this.each(function(p,s){var r=a('<ul class="tweet_list">');var q='<p class="tweet_intro">'+n.intro_text+"</p>";var o='<p class="tweet_outro">'+n.outro_text+"</p>";var t=a('<p class="loading">'+n.loading_text+"</p>");if(n.username&&typeof(n.username)=="string"){n.username=[n.username]}a(s).bind("tweet:load",function(){if(n.loading_text){a(s).empty().append(t)}a.getJSON(d(),function(u){a(s).empty().append(r);if(n.intro_text){r.before(q)}r.empty();var v=a.map(u.results||u,i);v=a.grep(v,n.filter).sort(n.comparator).slice(0,n.count);r.append(a.map(v,function(w){return"<li>"+l(n.template,w)+"</li>"}).join("")).children("li:first").addClass("tweet_first").end().children("li:odd").addClass("tweet_even").end().children("li:even").addClass("tweet_odd");if(n.outro_text){r.after(o)}a(s).trigger("loaded").trigger((v.length===0?"empty":"full"));if(n.refresh_interval){window.setTimeout(function(){a(s).trigger("tweet:load")},1000*n.refresh_interval)}})}).trigger("tweet:load")})}})(jQuery);
So, please help me anyone how may I fix this issue? It would be really very helpful for me man. Waiting for anyone answers here!
You cannot continue using the seaofclouds tweet.js and expect to pull tweets like you used to last month. The cause of this was that twitter 1.0 api was recently replaced with 1.1 - https://dev.twitter.com/blog/api-v1-is-retired. The author of jquery.tweet.js talks about the API changes and the state of his plugin here: https://github.com/seaofclouds/tweet/#important-note-about-twitters-api-changes-in-2013 and a separate discussion can be found here: https://github.com/seaofclouds/tweet/issues/264 with an alternative (which works) - https://github.com/StanScates/Tweet.js-Mod.
Tweet.js-Mod is a modified version of the original jquery.tweet.js and uses a php script to access twitter's API version 1.1.
Installation instructions can be found here: https://github.com/StanScates/Tweet.js-Mod#how-to-use
Looks like this is a major change to how Twitter is handling requests, version 1.0 of their API allowed unauthenticated requests, but now that has changed where Twitter is pushing developers to use their embedded timelines, which requires authentication.
Discussion of Twitter transition from v1.0 to v1.1
Let me warn you about using the https://github.com/StanScates/Tweet.js-Mod#how-to-use
It works perfectly but some of your clients where the website runs may have this ERROR
ERROR:
XMLHttpRequest cannot load http://api.getmytweets.co.uk/?screenname=philipbeel&limit=5&undefined=undefined. Origin http://plugins.theodin.co.uk is not allowed by Access-Control-Allow-Origin.
Also this method does not use the Oauth method and your security can be compromised. Twitter highly discourages the uses of client side authentication.
Your solution has 2 parts
Use a php based Oauth Script to retrieve you tweets in JSON format
Using Jquery format the JSON and show it into your website
SIMPLE AND SECURE.
HERE IS A COMPLETE TUTORIAL
http://www.webdevdoor.com/javascript-ajax/custom-twitter-feed-integration-jquery/

Cross/Different Domain get src's HTML content

Lets say I have example.html and inside that i have a code like
<iframe src="x.com" id="x"></iframe>
from x.com, I would like to get everything inside
<div class="content">...</div>
into example.html inside
<div class="xCodes">ONTO HERE</div>
So I tried to get the elements inside x.com to show up on example.html and I heard it's not possible to access them for cross domain problems.
I was wondering if there was another way to retrieve HTML tags from x.html into example.html
Maybe without using <iframe />??
Sourced from: http://james.padolsey.com/javascript/cross-domain-requests-with-jquery/
$('.xCodes').load('http://x.com/x.html');
OR
$.ajax({
url: 'http://x.com/x.html',
type: 'GET',
success: function(res) {
var data = $(res.responseText).find('.content').text();
$('.xCodes').html(data);
}
});
If I understand correctly you want to rip the content from a DIV on one site and display it on another. There are several issues with this, but I'll focus on the technological aspect and assume you are acting in good faith with pulling the content.
The real issue you're running up against here is that you don't have access to DOM elements of pages that haven't loaded yet. As such you need to tell the browser to load the data for that page so that you can access the elements that should have loaded on the page and then pull the information out. JQuery has a nice little method to help with that called .load() (http://api.jquery.com/load/).
As an important side note however you can't do this as all modern broswers forbid cross site access in such a manner:
From the JQuery .load() page:
Additional Notes:
Due to browser security restrictions, most "Ajax" requests are subject to the same origin policy; the request can not successfully retrieve data from a different domain, subdomain, or protocol.
And check out:
http://en.wikipedia.org/wiki/Same_origin_policy
One more bit of warning. If you don't control the code on the other site you are potentially exposing yourself to serious security issues so only do this in situations where you have control of the other site or for some reason have absolute faith in that site. Alternatively you should try to, if available, use APIs for the sites/services you are trying to get data from.

Alternative to iFrames with HTML5

I would like to know if there is an alternative to iFrames with HTML5.
I mean by that, be able to inject cross-domains HTML inside of a webpage without using an iFrame.
Basically there are 4 ways to embed HTML into a web page:
<iframe> An iframe's content lives entirely in a separate context than your page. While that's mostly a great feature and it's the most compatible among browser versions, it creates additional challenges (shrink wrapping the size of the frame to its content is tough, insanely frustrating to script into/out of, nearly impossible to style).
AJAX. As the solutions shown here prove, you can use the XMLHttpRequest object to retrieve data and inject it to your page. It is not ideal because it depends on scripting techniques, thus making the execution slower and more complex, among other drawbacks.
Hacks. Few mentioned in this question and not very reliable.
HTML5 Web Components. HTML Imports, part of the Web Components, allows to bundle HTML documents in other HTML documents. That includes HTML, CSS, JavaScript or anything else an .html file can contain. This makes it a great solution with many interesting use cases: split an app into bundled components that you can distribute as building blocks, better manage dependencies to avoid redundancy, code organization, etc. Here is a trivial example:
<!-- Resources on other origins must be CORS-enabled. -->
<link rel="import" href="http://example.com/elements.html">
Native compatibility is still an issue, but you can use a polyfill to make it work in evergreen browsers Today.
You can learn more here and here.
You can use object and embed, like so:
<object data="http://www.web-source.net" width="600" height="400">
<embed src="http://www.web-source.net" width="600" height="400"> </embed>
Error: Embedded data could not be displayed.
</object>
Which isn't new, but still works. I'm not sure if it has the same functionality though.
object is an easy alternative in HTML5:
<object data="https://github.com/AbrarJahin/Asp.NetCore_3.1-PostGRE_Role-Claim_Management/"
width="400"
height="300"
type="text/html">
Alternative Content
</object>
You can also try embed:
<embed src="https://github.com/AbrarJahin/Asp.NetCore_3.1-PostGRE_Role-Claim_Management/"
width=200
height=200
onerror="alert('URL invalid !!');" />
Re-
As currently, StackOverflow has turned off support for showing external URL contents, run code snippet is not showing anything. But for your site, it will work perfectly.
No, there isn't an equivalent. The <iframe> element is still valid in HTML5. Depending on what exact interaction you need there might be different APIs. For example there's the postMessage method which allows you to achieve cross domain javascript interaction. But if you want to display cross domain HTML contents (styled with CSS and made interactive with javascript) iframe stays as a good way to do.
If you want to do this and control the server from which the base page or content is being served, you can use Cross Origin Resource Sharing (http://www.w3.org/TR/access-control/) to allow client-side JavaScript to load data into a <div> via XMLHttpRequest():
// I safely ignore IE 6 and 5 (!) users
// because I do not wish to proliferate
// broken software that will hurt other
// users of the internet, which is what
// you're doing when you write anything
// for old version of IE (5/6)
xhr = new XMLHttpRequest();
xhr.onreadystatechange = function() {
if(xhr.readyState == 4 && xhr.status == 200) {
document.getElementById('displayDiv').innerHTML = xhr.responseText;
}
};
xhr.open('GET', 'http://api.google.com/thing?request=data', true);
xhr.send();
Now for the lynchpin of this whole operation, you need to write code for your server that will give clients the Access-Control-Allow-Origin header, specifying which domains you want the client-side code to be able to access via XMLHttpRequest(). The following is an example of PHP code you can include at the top of your page in order to send these headers to clients:
<?php
header('Access-Control-Allow-Origin: http://api.google.com');
header('Access-Control-Allow-Origin: http://some.example.com');
?>
This also does seem to work, although W3C specifies it is not intended "for an external (typically non-HTML) application or interactive content"
<embed src="http://www.somesite.com" width=200 height=200 />
More info:
http://www.w3.org/wiki/HTML/Elements/embed
http://www.w3schools.com/tags/tag_embed.asp
An iframe is still the best way to download cross-domain visual content. With AJAX you can certainly download the HTML from a web page and stick it in a div (as others have mentioned) however the bigger problem is security. With iframes you'll be able to load the cross domain content but won't be able to manipulate it since the content doesn't actually belong to you. On the other hand with AJAX you can certainly manipulate any content you are able to download but the other domain's server needs to be setup in such a way that will allow you to download it to begin with. A lot of times you won't have access to the other domain's configuration and even if you do, unless you do that kind of configuration all the time, it can be a headache. In which case the iframe can be the MUCH easier alternative.
As others have mentioned you can also use the embed tag and the object tag but that's not necessarily more advanced or newer than the iframe.
HTML5 has gone more in the direction of adopting web APIs to get information from cross domains. Usually web APIs just return data though and not HTML.
I created a node module to solve this problem node-iframe-replacement. You provide the source URL of the parent site and CSS selector to inject your content into and it merges the two together.
Changes to the parent site are picked up every 5 minutes.
var iframeReplacement = require('node-iframe-replacement');
// add iframe replacement to express as middleware (adds res.merge method)
app.use(iframeReplacement);
// create a regular express route
app.get('/', function(req, res){
// respond to this request with our fake-news content embedded within the BBC News home page
res.merge('fake-news', {
// external url to fetch
sourceUrl: 'http://www.bbc.co.uk/news',
// css selector to inject our content into
sourcePlaceholder: 'div[data-entityid="container-top-stories#1"]',
// pass a function here to intercept the source html prior to merging
transform: null
});
});
The source contains a working example of injecting content into the BBC News home page.
You can use an XMLHttpRequest to load a page into a div (or any other element of your page really). An exemple function would be:
function loadPage(){
if (window.XMLHttpRequest){
// code for IE7+, Firefox, Chrome, Opera, Safari
xmlhttp=new XMLHttpRequest();
}else{
// code for IE6, IE5
xmlhttp=new ActiveXObject("Microsoft.XMLHTTP");
}
xmlhttp.onreadystatechange=function(){
if (xmlhttp.readyState==4 && xmlhttp.status==200){
document.getElementById("ID OF ELEMENT YOU WANT TO LOAD PAGE IN").innerHTML=xmlhttp.responseText;
}
}
xmlhttp.open("POST","WEBPAGE YOU WANT TO LOAD",true);
xmlhttp.send();
}
If your sever is capable, you could also use PHP to do this, but since you're asking for an HTML5 method, this should be all you need.
The key issue to load another site's page in your own site's page is security. There is cross-site security policy defined, you would have no chance to load it directly in your iframe if another site has it set to strict same origin policy. Hence to find an alternative to iframe, they must be able to bypass this security policy restriction, even in the future, if any new component is introduced by WSC, it would have similar security restrictions.
For now, the best way to bypass this is to simulate a normal page access in your logic, this means AJAX + server side access maybe a good option. You can set up some proxy at your server side and fetch the page content and load it into the iframe. This works but not perfect as if there is any link or image in the content and they may not be accessible.
Normally if you try to load a page in your own iframe, you would need to check whether the page can be loaded in iframe or not. This post gives some guidelines on how to do the check.
You should have a look into JSON-P - that was a perfect solution for me when I had that problem:
https://en.wikipedia.org/wiki/JSONP
You basically define a javascript file that loads all your data and another javascript file that processes and displays it. That gets rid of the ugly scrollbar of iframes.