Render a full web page in Node.js code - HTML

I am running a Node.js server, and it is rendering a web page wonderfully. When I look at the page in a browser, it runs exactly as I expect.
However, what I actually want is to fully generate the HTML page - exactly as it appears in the browser - from within the Node.js code, as a single call. So far I have tried this:
http.request("http://localhost:8000/").end();
(with a few variants). This does exactly what it says: it makes a single request to the server for the page. What it doesn't do is actually render the page - pulling in all of the other script files and running the code on the page.
I have explored Express and EJS, and I think I need one of them, but I cannot work out how to do this fairly straightforward task. All it needs to do is render an HTML page, yet it seems far more complex than it should be.

What output do you want? A string of HTML? If so, you may want PhantomJS, the headless browser. You could use it to render the page and then get the rendered DOM as a string of HTML.
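For example, here is a minimal PhantomJS sketch (run with phantomjs render.js); the URL and the one-second delay to let client-side scripts finish are assumptions:

var page = require('webpage').create();

page.open('http://localhost:8000/', function (status) {
  if (status !== 'success') {
    console.log('failed to load page');
    phantom.exit(1);
    return;
  }
  // Give the page's own scripts a moment to run before capturing the DOM.
  setTimeout(function () {
    console.log(page.content); // the rendered DOM as a string of HTML
    phantom.exit();
  }, 1000);
});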

Use Mikeal's Request module to make the HTTP request; once you have captured the response, you can inspect the HTML however you like.
To make that easier, use cheerio, which gives you a jQuery-style API for traversing and manipulating the HTML.
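A minimal sketch of that combination (npm install request cheerio); the URL and the title selector are just placeholders:

var request = require('request');
var cheerio = require('cheerio');

request('http://localhost:8000/', function (error, response, body) {
  if (error) {
    return console.error(error);
  }
  var $ = cheerio.load(body);     // jQuery-style API over the raw HTML
  console.log($('title').text()); // query it as you would with jQuery
});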

Perhaps you are looking for wkhtmltopdf?
In a nutshell, it will render an entire web page (including images and JavaScript) to a PDF document.
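If you would rather drive it from Node, here is a minimal sketch using child_process, assuming the wkhtmltopdf binary is installed and on your PATH; the URL and output filename are placeholders:

var exec = require('child_process').exec;

exec('wkhtmltopdf http://localhost:8000/ page.pdf', function (err, stdout, stderr) {
  if (err) {
    return console.error('wkhtmltopdf failed: ' + stderr);
  }
  console.log('rendered page.pdf');
});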

Related

How is it possible to use jodd.http.HttpRequest to load page content that is generated by JavaScript?

I am trying to load the page content with:
HttpResponse response2 = HttpRequest.get(_PAGE_URL).cookies(response.cookies()).send();
In a browser, the page source is full of JavaScript that generates the DOM, but in the browser's Web Inspector I can see the generated source.
The question is: can I somehow retrieve the generated page content with Jodd's utilities?
You can't. You can only download the static HTML content (as you did); you would then need to render it with some other tool.
Since Java 8 you can use JavaFX's WebView component (as far as I remember), but please look into other tools as well (maybe CEF?)
EDIT
See: https://github.com/igr/web-scraper (based on Selenium WebDriver). One thing I miss is finer control over the request/response.
There is also HtmlUnit, but judging from the reviews, Selenium seems to be the better choice.

Get full HTML after Ajax requests

Using PHP, how can I get the full HTML for a page after all of the Ajax requests on that page have executed?
Basically, I would like the same HTML you see when you inspect an element in Google Chrome: the original HTML plus the extra markup added once the Ajax calls are done.
You'd need to use a headless browser to render the page. Check out Mink, as it appears to do what you're looking for.

Scraping with Cheerio, text is not visible

So I've been web scraping with Cheerio, and I'm able to find the particular HTML element I'm looking for, but for some reason the text is not there.
For example, in my web browser, when I use Inspect Element I see the text "Why Him?".
But when I print out the object while scraping, the text is missing, so when I call the .text() function it doesn't return anything. Why does this happen?
Inspect Element is not a valid test of what Cheerio will be able to see. You must use View Source instead.
Inspect Element is a live view of how the browser has rendered an element after applying all of the various technologies that exist in a browser, including CSS and JavaScript. View Source, on the other hand, is the raw code that the server sent to the browser, which you can generally expect to be the same as what Cheerio will receive. That is, assuming you ensure the HTTP headers are identical, particularly the ones relevant to content negotiation.
It is important to understand that while Cheerio is a DOM parser, it does not simulate a browser. So if the text is added via JavaScript, for example, then the text will not be there because that JavaScript will not have run.
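To see this concretely, here is a small sketch (the markup is made up) showing that text inserted by a script simply does not exist as far as Cheerio is concerned:

var cheerio = require('cheerio');

// The server sends an empty div plus a script that would fill it in a browser.
var html =
  '<div id="title"></div>' +
  '<script>document.getElementById("title").textContent = "Why Him?";</script>';

var $ = cheerio.load(html);
console.log($('#title').text()); // prints an empty string: the script never ran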
If browser simulation is important to you, you should look into using PhantomJS. If you need a highly realistic browser rendering setup, then look into WebDriver and Leadfoot.

Get the "real" source code from a website

I've got a problem getting the "real" source code from a website:
http://sirius.searates.com/explorer
Trying it the normal way (view-source:) in Chrome, I get a different result than when using the Inspect Elements function, and the code I can see with that function is the one I would like to have... How is it possible to get this code?
This usually happens because the UI is actually generated by a client-side Javascript utility.
In this case, most of the screen is generated by Highcharts, and a few elements are generated/modified by Bootstrap.
The DOM inspector will always give you the "current" view of the HTML, while View Source gives you the "initial" view. Since View Source does not run the JavaScript utilities, much of the UI is never generated.
To get the most up-to-date (HTML) source, you can use the DOM inspector to find the root html node, right-click and select "Edit as HTML". Then select-all and copy/paste into your favorite text editor.
Note, though, that this will only give you a snapshot of the page. Most modern web pages are really browser applications and the HTML is just one part of the whole. Copy/pasting the HTML will not give you a fully functional page.
You can get the live HTML with this bookmarklet (save the following as a bookmark URL; the replace() escapes the markup so a stray </textarea> in the page can't break out of the box):
javascript:document.write('<textarea rows="40" cols="120">'+document.body.innerHTML.replace(/</g,'&lt;')+'</textarea>');

One page website and linking

OK guys, what I can't seem to grasp is how, on a one-page website, you link to certain pages/divs while using the scrollTo function.
If you look at Ultranoir.com,
you can see the site is built in the one-page format, but if you watch the URL field, it navigates to subfolders etc. while still loading all content dynamically. How do they achieve this effect while keeping it so clean and ordered? On my current site everything stays at www.url.com/index.html even when I navigate between pages. Any help? Thanks!
They are using hash fragments to load different parts of their pages dynamically, e.g. index.html#!/blog or index.html#!/about.
You can parse the URL client-side using JavaScript and load the correct content through Ajax based on the URL, as in the sketch below.
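A minimal client-side sketch, assuming a hypothetical loadContent() helper that fetches and injects the right fragment via Ajax:

function onHashChange() {
  // "#!/blog" -> "blog"; fall back to a home view when there is no hash
  var route = location.hash.replace(/^#!\//, '');
  loadContent(route || 'home'); // loadContent() is hypothetical
}

window.addEventListener('hashchange', onHashChange);
onHashChange(); // also handle the hash present on the initial page load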
Check out this page for an example implementation of this functionality using PHP and jQuery: http://www.queness.com/post/328/a-simple-ajax-driven-website-with-jqueryphp
They do it by abusing the fragment identifier. A modern approach would use pushState instead; see the sketch below.
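A minimal pushState sketch, again assuming the hypothetical loadContent() helper; the a.nav selector and URLs are placeholders:

// Intercept in-page navigation links and update the address bar without a reload.
document.querySelectorAll('a.nav').forEach(function (link) {
  link.addEventListener('click', function (event) {
    event.preventDefault();
    var href = link.getAttribute('href'); // e.g. "/blog"
    history.pushState(null, '', href);
    loadContent(href); // loadContent() is hypothetical
  });
});

// The back/forward buttons fire popstate; re-render for the restored URL.
window.addEventListener('popstate', function () {
  loadContent(location.pathname);
});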