I just tried something rather trivial: get the source code of a web page (by saving it) and count how often a certain phrase occurs in the code.
Turns out, it doesn't work if that page uses Polymer / web components. Is this a browser bug?
Try the following: Go to http://www.google.com/design/icons/ and try to find star_half in the code (the last icon on the page). If you inspect the element inside of Chrome or Firefox, it will lead you to
<i class="md-icon dp48">star_half</i>
but this won't be in the source if you copy the root node or save the HTML to disk.
Is there a way to get the entire code?
The reason for this behavior is probably how source viewing (and presumably source saving) works in browsers, combined with the fact that shadow roots are attached to web components on the client side.
When you press Ctrl-U on a web page, the browser essentially makes another network call to the same URL to fetch a copy of what the server returned when you first hit that URL.
In this case, when the page renders, the browser identifies the icons-layout component and then executes code to attach a shadow root to this node. All of this happens after the page reaches the client (browser).
When you save this page, you are saving what the server returned, not the current state of the page. You'll see the same behavior if you fire up the Chrome console and try to copy an icons-layout node.
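You can verify this from the console: the rendered markup lives in the node's shadow root, not in its light DOM (a quick sketch, assuming the shadow root is open):

// Run in the DevTools console on the icons page.
var node = document.querySelector('icons-layout');
node.outerHTML;            // light DOM only: roughly what saving the page gives you
node.shadowRoot.innerHTML; // the client-rendered markup, including the <i> icons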
Is there a way to get the entire code?
I don't know how to do it from the browser, but PhantomJS provides a way to save client-side rendered HTML.
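For example, a minimal PhantomJS sketch (the timeout is an arbitrary allowance for client-side rendering to finish):

// save-rendered.js, run with: phantomjs save-rendered.js
var page = require('webpage').create();
page.open('http://www.google.com/design/icons/', function (status) {
  if (status !== 'success') {
    phantom.exit(1);
  } else {
    window.setTimeout(function () {
      console.log(page.content); // the DOM serialized after client-side rendering
      phantom.exit();
    }, 3000);
  }
});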
Related
I'm studying ways to develop an SEO-friendly React website with CSR.
I have read many articles pointing out that to provide an SEO-friendly website, one should go with the SSR approach.
To my knowledge, when using the browser's view-source feature on a CSR page, the HTML is just a shell that loads a bunch of JavaScript bundle files, and the actual HTML is not present, since view source only shows what the server returned. In SSR, the HTML is rendered on the server and passed to the browser, so the displayed HTML is present in the page's source view.
However, https://divar.ir (a well-known classifieds site) seems to be using CSR: upon clicking any link, the data is fetched from an API endpoint in JSON format via an AJAX call, and the page appears to be rendered client-side.
The thing is, when I view the source of the page, even after clicking a link, I can see the actual HTML that is being displayed.
So to sum it up: how can I use CSR in React and still see, in the page's source view, the actual HTML that is being displayed to the user?
Server-side rendered React applications usually only pre-render the initial page load. Subsequent navigation may still be handled and rendered entirely by the client.
The view-source tool opens the code in a new tab (at least in Chrome), which triggers a fresh load of the current route from the server. If the application is server-side rendered, you will receive a pre-rendered version of that route and therefore see its HTML.
By providing a sitemap of your website, a bot can discover all SEO-relevant routes by visiting the URLs listed in it. Each of those is an independent request to the server and will be pre-rendered, in contrast to how a real user navigates the page by clicking links.
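For illustration, a minimal sketch of server-side pre-rendering with Express and react-dom/server (the App component, port, and file names are placeholders, not taken from the question):

const express = require('express');
const React = require('react');
const { renderToString } = require('react-dom/server');
const App = require('./App'); // hypothetical root component

const app = express();

app.get('*', (req, res) => {
  // Pre-render the requested route to an HTML string on the server,
  // so "view source" shows real markup instead of an empty shell.
  const html = renderToString(React.createElement(App, { url: req.url }));
  res.send('<!DOCTYPE html><html><body>' +
    '<div id="root">' + html + '</div>' +
    '<script src="/bundle.js"></script>' +
    '</body></html>');
});

app.listen(3000);

On the client, the bundle would then call ReactDOM.hydrate (hydrateRoot in React 18) on #root, after which navigation is handled client-side again; that is why only the first load of each route is pre-rendered.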
I am trying to understand the lifecycle of an HTML page, but I cannot find any good resources on it online. So I opened up the F12 tool in IE and did some experiments on my own. Based on those I have drawn some conclusions; can someone please tell me if I am right?
My observations:
1. When a page is requested over HTTP, the browser first receives the HTML skeleton. At this point nothing is displayed to the user.
2. Based on what is in the HTML skeleton, additional requests are sent out for resources (external JavaScript, CSS, images, etc.).
3. The browser waits until it receives an HTTP status code for the script and CSS resources.
4. Once the HTTP status codes for the CSS and JavaScript are received, only then does the browser start loading the document top to bottom, executing whatever embedded JavaScript it encounters on the way.
5. If embedded JavaScript near the top refers to an HTML element further down, the JavaScript will fail.
6. Once the entire document finishes loading, the $(document).ready event fires (if I am using jQuery).
7. The browser does not wait for resources other than scripts and CSS, so resources like images may load after the page is displayed to the user.
You pretty much got it correct.
But it depends on the code (especially points 5, 6 and 7). For example, if the JS at the top is wrapped in $(document).ready, it will not fail, as the sketch below shows.
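A tiny page illustrating points 5 and 6 (a sketch; the jQuery path is a placeholder):

<script src="jquery.js"></script>
<script>
// Runs as soon as it is parsed: #target is not in the DOM yet (point 5).
console.log(document.getElementById('target')); // null

// Deferred until the whole document is parsed (point 6), so this succeeds.
$(document).ready(function () {
  console.log(document.getElementById('target')); // <p id="target">
});
</script>
<p id="target">Hello</p>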
Secondly, I would prefer the Firefox or Chrome F12 tools over IE's; they are much more detailed and developer-friendly. See the Net tab in them to understand further: it shows the exact order in which resources are requested and loaded, which is what you were mainly looking for.
I have recently started using Sencha ExtJS. When we view the page source, it only displays the code as originally written; any styles applied or scripts added later are not there in "View Source".
Same for AJAX: when we load anything into a container, it is not there...
But if we're using Chrome and we inspect the element, it shows everything.
Why this behavior?
View Source in browsers typically only displays the downloaded source without running anything at all (including any JS that would modify the DOM). In fact, Chrome at least issues a separate request when you view source to fetch that code.
As for the reason why, I'm not sure exactly. This is simply the standard, and it is the way "view source" has worked since long before I was ever a web developer. It is similar to making a raw HTTP request (i.e., you just get the source; nothing runs to change it). The term "source" indicates the origin of what you received, unmodified (think "source code").
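If what you want is the markup as it currently stands (including nodes added by ExtJS or loaded via AJAX), serialize the live DOM instead; a quick console sketch:

// Run in the browser's DevTools console.
var rendered = document.documentElement.outerHTML;
console.log(rendered);
// In Chrome's console, copy(rendered) also puts it on the clipboard.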
Because that's just how it works. View source only shows the page as it was first served to the browser.
I have been reading the dev guide but haven't been able to work out how to put my own code into webpages.
I know it is possible because AVG uses it (in its link scanner), and the FastestChrome extension uses it too (highlight something and a link to a search pops up).
I have a background page, but I can't get it to affect the webpages I visit (the permissions are correct, as I can get CSS to apply).
I am probably missing something really simple :/
It's not intuitively presented in the documentation, but your background page cannot access the current webpage because the two run in different contexts. In other words, the background page is its own separate page, so it has no access to any other page's DOM.
If you want to affect the page the user is viewing in the browser, you will need what is referred to as a "content script".
If you want to communicate between content scripts and the background page, you will need the message-passing API (see the sketch after the list below). Check out my extension's source code for reference; I do exactly that.
Just remember...
Background page: used for general logic in your extension, not anything page-specific.
Content scripts: loaded into every page the user sees (per your match patterns), and able to manipulate that specific page.
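A minimal sketch of that message passing (using chrome.runtime; the file names and message shape are placeholders):

// content.js: runs in the page; sends the page's title to the background page.
chrome.runtime.sendMessage({ type: 'pageTitle', title: document.title });

// background.js: receives messages from any content script.
chrome.runtime.onMessage.addListener(function (message, sender) {
  console.log('Title from tab ' + sender.tab.id + ': ' + message.title);
});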
Those extensions probably use content scripts to inject JavaScript into webpages. These scripts run in the context of the web pages and can access the DOM.
You can either define a script to always run in a web page by declaring the script file in the extension manifest, or you can use your background page to inject a script when needed.
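A sketch of both approaches (Manifest V2 syntax; file names and match patterns are placeholders):

// manifest.json (excerpt): always run content.js in matching pages.
"content_scripts": [
  {
    "matches": ["http://*/*", "https://*/*"],
    "js": ["content.js"]
  }
]

// background.js: or inject on demand (requires host or activeTab permission).
chrome.browserAction.onClicked.addListener(function (tab) {
  chrome.tabs.executeScript(tab.id, { file: 'content.js' });
});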
I have a web application that has a constant URL and an internal state machine. The states are changed via POSTs. I know this is bad design and I should use the REST approach, but given this, I have the following problem.
I use the HTML5 offline cache (the manifest attribute on the html tag). For the first page (the login page) it is parsed and cached as I would expect. But for the second page (the main menu) the manifest included there is not parsed; no events are shown in the Chrome browser. If I change the URL a little by adding a parameter, then the manifest is parsed, but not before.
Even if I include everything in the login page's manifest, the second page downloads the same files again, even though they are specified in the manifest for the first page.
Why this behaviour?
To answer it myself: it looked so odd simply because the cache manifest is only parsed on GET requests and ignored on POST requests, even if the POST loads another HTML page. To me this is a little silly, but it seems that is how it works.
Now it finally works as it is supposed to.
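Since the manifest is only honored on GET, one standard workaround (my own addition, not part of the original answer) is the Post/Redirect/Get pattern: answer the POST with a redirect so the follow-up page is fetched via GET and its manifest is parsed. A sketch, with Express used purely for illustration and placeholder routes:

var express = require('express');
var app = express();

app.post('/login', function (req, res) {
  // ...authenticate...
  // Redirect so the next page is fetched via GET,
  // which lets the browser parse that page's manifest attribute.
  res.redirect(303, '/menu');
});

app.get('/menu', function (req, res) {
  // menu.html carries <html manifest="app.appcache">; the manifest is
  // parsed here because this response is to a GET request.
  res.sendFile(__dirname + '/menu.html');
});

app.listen(3000);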