Using node and node-phantom to scrape AngularJS Application

I have a Node script set up to scrape pages from an AngularJS application and then generate code needed for testing purposes. It works well except for one thing: ng-if. Because ng-if removes elements from the DOM, the script never sees those blocks of code. I can't remove the ng-if directives, so I'm wondering whether there is some way to intercept the HTML between when node-phantom requests the page and when it actually loads everything into PhantomJS's DOM. What I'm hoping to do is simply set all the ng-if conditions to true so that all content is available. Does anyone have any ideas for this?
EDIT: I'm using phantomjs-node, not node-phantom.

My final solution was to scrape the page for all of the comment nodes, filter them to find the ones that contained ng-if, and parse the variable names out of those comments. Then I tapped into Angular's $scope and set all of those variables to true, forcing everything that is hidden on the page to be visible.
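A rough sketch of the comment-parsing step described above, assuming AngularJS leaves a placeholder comment of the form <!-- ngIf: expression --> where an element was removed. The helper names (findNgIfExpressions, variablePaths, setPathTrue) are hypothetical, and the parsing is deliberately simple: it only recognizes plain identifiers and dotted paths, and it does not account for negated conditions such as !flag, which would need false rather than true.

```javascript
// Hypothetical helper: collect the expressions found in ng-if placeholder
// comments, e.g. "<!-- ngIf: user.isAdmin -->" yields "user.isAdmin".
function findNgIfExpressions(html) {
  const pattern = /<!--\s*ngIf:\s*([\s\S]*?)\s*-->/g;
  const expressions = [];
  let match;
  while ((match = pattern.exec(html)) !== null) {
    expressions.push(match[1]);
  }
  return expressions;
}

// Hypothetical helper: pull identifier paths out of an expression.
// String literals are stripped first; keyword literals are skipped.
function variablePaths(expression) {
  const literals = new Set(['true', 'false', 'null', 'undefined']);
  const withoutStrings = expression.replace(/'[^']*'|"[^"]*"/g, '');
  const paths =
    withoutStrings.match(/[A-Za-z_$][\w$]*(?:\.[A-Za-z_$][\w$]*)*/g) || [];
  return paths.filter((path) => !literals.has(path));
}

// Set a dotted path on a scope-like object to true, creating
// intermediate objects where they are missing.
function setPathTrue(scope, path) {
  const keys = path.split('.');
  let target = scope;
  for (const key of keys.slice(0, -1)) {
    if (typeof target[key] !== 'object' || target[key] === null) {
      target[key] = {};
    }
    target = target[key];
  }
  target[keys[keys.length - 1]] = true;
}
```

Inside the page itself, the scope object would presumably be obtained with something like angular.element(node).scope(), and the changes would need a scope.$apply() call before the hidden blocks are rendered.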

Related

Can Go capture a click event in an HTML document it is serving?

I am writing a program for managing an inventory. It serves up HTML based on records from a PostgreSQL database, or writes to the database using HTML forms.
Different functions (adding records, searching, etc.) are accessible using <a></a> tags or form submits, which in turn call functions registered with http.HandleFunc(). Those functions then generate queries, parse results, and render them to HTML templates.
The search function renders query results to an HTML table. To keep the search results page usable and uncluttered, I intend to show only the most relevant information there. However, since there are many more details stored in the database, I need a way to access that information too. To do that, I wanted to make each table row clickable, displaying the details of the selected record in a status area at the bottom or side of the page, for instance.
I could try to follow the pattern that works for running the other functions, that is, use <a></a> tags and http.HandleFunc() to render new content, but this isn't exactly what I want for a couple of reasons.
First: There should be no need to navigate away from the search result page to view the additional details; there are not so many details that a single record's full data should not be able to be rendered on the same page as the search results.
Second: I want the whole row clickable, not merely the text within a table cell, which is what the <a></a> tags get me.
Using the id returned from the database in an attribute, as in <div id="search-result-row-id-{{.ID}}"></div>, I am able to work with individual records, but I have yet to find a way to then capture a click in Go.
Before I run off and write this in JavaScript, does anyone know of a way to do this strictly in Go? I am not particularly averse to using the tried-and-true JS methods, but I am curious to see if it could be done without them.

"does anyone know of a way to do this strictly in Go?"
As others have indicated in the comments, no, Go cannot capture the event in the browser.
For that you will need some JavaScript to send the request for more information to the server (where Go runs).
You could also push all the required information to the browser when you first serve the page and hide or show it with CSS or a JavaScript event handler, but again, that's just regular web development and has nothing to do with Go.
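A minimal sketch of the JavaScript side, assuming the rows carry ids of the form search-result-row-id-{{.ID}} as in the question. The /details endpoint, the element ids search-results and status-area, and the function names are all assumptions made for illustration; the Go side would be an ordinary handler registered with http.HandleFunc() that renders the record's details.

```javascript
// Extract the record ID from an element id such as "search-result-row-id-42".
// Returns null when the id does not follow that pattern.
function recordIdFromRowId(rowId) {
  const match = /^search-result-row-id-(.+)$/.exec(rowId);
  return match ? match[1] : null;
}

// Build the URL of a hypothetical Go handler; the "/details" path is an
// assumption, not something taken from the question.
function detailsUrl(recordId) {
  return '/details?id=' + encodeURIComponent(recordId);
}

// One delegated click listener makes the whole row clickable.
// "search-results" and "status-area" are assumed element ids.
function attachRowClicks(doc, fetchFn) {
  const table = doc.getElementById('search-results');
  const status = doc.getElementById('status-area');
  table.addEventListener('click', async (event) => {
    const row = event.target.closest('[id^="search-result-row-id-"]');
    if (!row) return; // click landed outside any result row
    const response = await fetchFn(detailsUrl(recordIdFromRowId(row.id)));
    status.innerHTML = await response.text();
  });
}

// In the browser: attachRowClicks(document, (url) => fetch(url));
```

Because the listener sits on the container, a click anywhere inside a row is handled, and the search results stay on screen while the details load into the status area.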

HTML page title: localization and empty title

Two questions about a better way of solving the problem:
1) Is there a way, other than JavaScript, to make the HTML page title differ between locales of the client-side code?
I.e., to show the page title in the browser's tab in the corresponding language.
I know I can use JavaScript for this, but maybe there is another way?
2) I set my HTML page title with JavaScript (this is a different case), but there is a delay before the script runs. Is there a way to set the page title to an empty string before the JavaScript is evaluated?
If I remove the <title> tag, I get the page URL.
If I use an empty <title> tag, same thing.
I have to put &nbsp; inside it, which looks a bit ugly.
Any other options?

I don't see any other means but JavaScript on the client side for this, sorry.
For the delay: try an inline script that changes the page title, placed at the top of the page before any other scripts are loaded or executed, but after the page title has been set. This should keep the delay to an absolute minimum.
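As an illustration of what such an inline script might contain, here is a small sketch that picks a title from the browser's preferred language. The TITLES table, its strings, and the titleForLocale helper are hypothetical placeholders; navigator.language is the standard browser property holding the preferred locale.

```javascript
// Hypothetical title table; the keys and strings are placeholders.
const TITLES = { en: 'Welcome', de: 'Willkommen', fr: 'Bienvenue' };

// Pick a title for a locale such as "de-AT": try the full tag first,
// then the base language ("de"), then the supplied fallback.
function titleForLocale(locale, titles, fallback) {
  const normalized = String(locale || '').toLowerCase();
  const base = normalized.split('-')[0];
  return titles[normalized] || titles[base] || fallback;
}

// Runs only where a document exists, i.e. in the browser.
if (typeof document !== 'undefined') {
  document.title = titleForLocale(navigator.language, TITLES, TITLES.en);
}
```

Placed inline directly after the title element, this runs before any external script is fetched, which is what keeps the visible delay small.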

To the first:
Apart from JavaScript, the only way I know would be PHP, but using JavaScript is a lot better and easier.
To the second:
arkascha's post is the answer.

Are there any solutions to make my ajax script stable regardless of HTML changing?

I'm running a content-based website, and I usually use Ajax to dynamically add items to the content list. Every time I update my item structure, I have to change my JavaScript to fit the new structure. Is there any solution that keeps the script stable regardless of changes to the HTML?

Simple: instead of using the DOM to handle your data, process everything upon completion of the Ajax request, and only then call a function that contains all of your display logic. Obviously you can't avoid changing some code somewhere when you, for instance, rename HTML elements, but you can separate concerns so that you only have to touch code in one place.
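A minimal sketch of that separation, assuming the server returns a JSON array of items with title and summary fields. The field names, the markup, the /items.json URL, and the function names are assumptions for illustration only.

```javascript
// Escape text before placing it into markup.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

// The only place that knows the item markup. When the HTML structure
// changes, this is the one function that needs editing.
function renderItem(item) {
  return '<li class="item"><h3>' + escapeHtml(item.title) + '</h3><p>' +
    escapeHtml(item.summary) + '</p></li>';
}

// Turn the whole response into markup in one step.
function renderItems(items) {
  return items.map(renderItem).join('');
}

// Hypothetical usage in the browser (URL and list element are assumed):
// fetch('/items.json')
//   .then((response) => response.json())
//   .then((items) => {
//     document.getElementById('content-list')
//       .insertAdjacentHTML('beforeend', renderItems(items));
//   });
```

The request code never touches the item structure, so a markup change is confined to renderItem.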

I do quite a bit of this in my app, and I follow the same pattern every time:
The view page fires an Ajax request to another page, which I call the "dispatcher". I use this pattern because I want plain output without header, footer, other JS, etc., so the dispatcher is a simple page that receives the Ajax request, calls the appropriate PHP functions, and echoes the results. In some cases it returns JSON strings, while in others it returns HTML or plain text. For your example, return HTML from your server-side language.
Back in the Ajax success callback, set an element's inner HTML (.html()) to the returned HTML content. Have your server-side language do the work of assembling the HTML (or even text, if you're so inclined), because that takes far less work and overhead.
Not too bad, huh?

POSTDATA without buttons in HTML4?

I have graphs in an html page. The graphs are generated by a call to a cgi-bin program in an IMG tag:
<IMG src="http://myserver.com/cgi-bin/StatBarChart.cgi?data=1,2,&data=3,5,1&legend=EC,ER">
Currently, the data for the graphs is passed as GET args (in the URL itself.)
Everything's working OK, but the GET arguments are too long. I want to pass the data via POSTDATA. All the books I have (and the discussions I've found on the web) talk about using POSTDATA in forms that include a Submit button. I just want the graphs to appear as part of the page, without a Submit. Can this be done? Can it be done in HTML4, or does it require JavaScript?

It would require JavaScript, as you would have to fetch the resource yourself and assign it to the img tag. This is not possible in HTML4 alone.
Also, I don't see the problem with a long URL. Your user will never see it (unless they look at the source code, in which case I would no longer consider them a simple "user"), so there is no problem with that either.
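A sketch of the JavaScript approach mentioned above, written for a current browser rather than an HTML4-era one. It assumes the CGI program accepts the same data and legend parameters in a form-encoded POST body and that the page is allowed to request it (same origin, or CORS enabled); the function names are hypothetical.

```javascript
// Encode chart parameters as a form body, repeating "data" once per series.
function buildChartBody(series, legend) {
  const params = new URLSearchParams();
  for (const values of series) {
    params.append('data', values.join(','));
  }
  params.append('legend', legend.join(','));
  return params.toString();
}

// POST the body to the CGI program and display the returned image.
// Assumes the script answers a POST with the image bytes.
async function loadChart(img, url, series, legend) {
  const response = await fetch(url, {
    method: 'POST',
    headers: { 'Content-Type': 'application/x-www-form-urlencoded' },
    body: buildChartBody(series, legend),
  });
  const blob = await response.blob();
  img.src = URL.createObjectURL(blob);
}

// Hypothetical usage, with an <img id="chart"> already on the page:
// loadChart(document.getElementById('chart'),
//   'http://myserver.com/cgi-bin/StatBarChart.cgi',
//   [[1, 2], [3, 5, 1]], ['EC', 'ER']);
```

The image still appears as part of the page without any Submit button; the script simply runs when the page loads.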

WPF, Frame Control, HTML DOM Document access

I've used WindowsHost to host a WebBrowser control, which has allowed me to access the WebBrowser's Document/DOM directly, to read HTML content via mouse clicks on HTML document elements, and also to invoke submits on forms. At the time I was searching, I never found a way to do this, even in .NET 3.5. I've found this post, http://rhizohm.net/irhetoric/blog/72/default.aspx, and it looks like through some magic casting you can expose the DOM. My question is: has anyone done this, and once you have the DOM, is it possible to do invokes to submit content to HTML forms and also to get HTML elements via mouse click events?
Has anyone tried this and been able to do both?
Thanks

I'm using WPF.
Add a reference to:
Microsoft.mshtml
Then:
var doc = (mshtml.HTMLDocument)_wbOne.Document;
And this gives you the raw string:
doc.documentElement.innerHTML
In return, if you know how to get information out of the HTML document, I'd appreciate it.
for example get all the s and and the metas and whatever else might be gettable so i can get the information from them? i don't want to dink around with the html, just get the info from them...:-)
