Extracting values from a login-accessible web page post-JavaScript using Ruby

I use a stock trading website that is only accessible after logging in. Once logged in, there is a stock value I am trying to extract. That number is not readily available and takes a while to load, as it is updated from the company's database.
I am trying to write a script in Ruby that will allow me to extract the number and then use it in my program.
In Firebug, the tag looks like this, but only after the number has loaded:
<span id="ContentPlaceHolderTodaysStock">10,747</span>
I have explored libraries such as hpricot and nokogiri and have tried code similar to the following:
require "nokogiri"
require "open-uri"
doc = Nokogiri::HTML(open("http://website.com/stocks"))
puts doc.xpath("//span/text()")
The problems I run into are:
1) It only reads the HTML from the login page ("website.com") instead of "website.com/stocks".
2) Once I do get past the login, how do I use the HTML after the JavaScript has loaded?
I have also tried Watir, which can get me past problem #1, but then doing something like the following doesn't help with problem #2 because it returns the original HTML source:
require 'net/http'
# Fetches only the raw HTML, before any JavaScript has run.
source = Net::HTTP.get('website.com', '/stocks')
Any help in solving this problem would be greatly appreciated. Thank you!

Since you are able to log in using Watir, you may as well use it to get the text off the page. Watir has built-in methods for waiting for asynchronous components to load - see http://watirwebdriver.com/waiting/.
To get the text, you will want something like:
puts browser.span(:id => 'element_id').when_present.text

If it's being loaded after the fact, it can't be seen by Nokogiri. You'll need to use something like Watir.
once I do get past the login, how do I use the html code after the javascript has loaded?
You can't get there with Nokogiri. The added HTML doesn't exist in Nokogiri's world, since it's given the base HTML via OpenURI. Nokogiri doesn't execute JavaScript.
Watir, on the other hand, can do all that, so it's your only choice. You'll have to figure out how to navigate through the login page, request the stock page, then loop, waiting until the text appears, then grab it and do whatever you want with it.
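Putting both answers together, a minimal sketch might look like this (the login URL, field names, and button locator are assumptions to adjust for the real site; the span ID comes from the question):
require 'watir-webdriver'

browser = Watir::Browser.new

# Log in first. The field and button locators here are guesses and
# need to be adjusted to match the site's actual login form.
browser.goto 'http://website.com'
browser.text_field(:name => 'username').set 'your_username'
browser.text_field(:name => 'password').set 'your_password'
browser.button(:type => 'submit').click

# Request the stock page, then wait for the asynchronously loaded
# value to appear before reading it.
browser.goto 'http://website.com/stocks'
puts browser.span(:id => 'ContentPlaceHolderTodaysStock').when_present.text

browser.close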

Related

"Reverse" JSON Status API

I've been wondering how to fetch the PlayStation server status. They display it on this page:
https://status.playstation.com/en-us/
But PlayStation is known to use APIs instead of PHP database fetches. After looking around in the source code of the site, I found that they have a separate file called /data.json.
https://status.playstation.com/en-us/data.json
The content of this file is the same as the index file (for some reason). They use placeholders like {{endDateTitle}} and {{message}}, but I can't find where they're defined, whether they're filled in by a separate file or pulled from a database using PHP.
How can I "reverse" this site and see if there's a API I can use to display the status on my site?
Maybe I did not get the question right, but it seems pretty straightforward.
If you're using Firefox, open the developer tools, select the Network tab, and reload the page.
You can clearly see the requested URL:
https://status.playstation.com/data/statuses/region/SCEA.json
It seems that an empty status list means "no problems" (since there are currently no problems, I cannot verify this assumption). That's all.
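To consume that endpoint from your own site, a minimal Ruby sketch could look like the following (the response schema is undocumented, so treat this as a starting point, not a finished integration):
require 'net/http'
require 'uri'
require 'json'

# Fetch the region status feed discovered via the Network tab.
uri = URI('https://status.playstation.com/data/statuses/region/SCEA.json')
data = JSON.parse(Net::HTTP.get(uri))

# The exact structure is undocumented; inspect it first, then pick out
# the fields you need. Per the observation above, an empty status list
# appears to mean "no problems".
puts data.inspect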
The double braces {{...}} are used by various HTML templating languages, such as Angular, so you'd have to go through the JS code to understand where they get filled in.

How to get change in HTML DOM in LabVIEW?

I am doing an IoT-related project in LabVIEW, using an Arduino as the hardware.
I was able to switch an LED on the Arduino off/on by pressing OFF/ON on a website, using the DataSocket VI. Now I want to control the intensity of the LED from the website.
I have a range slider on my website, and its real-time value can be viewed in a textarea, div, or input element.
Is there any way I can get that real-time value, as it changes in the HTML DOM, into LabVIEW?
I know that the DataSocket VI returns the HTML source code, but not the HTML DOM.
I don't want to use the Web Publishing Services, as they don't work on my laptop.
This is the link I'm referring to for DataSocket:
Datasocket Labview
You could do something like creating a WebSocket, but I expect the easiest thing is to use a web service. You can create one in LabVIEW, add a setLEDIntensity method to it, and call it from your JS code. You can find a simple example here and in other documents in that community.
Use the WebSocket API for LabVIEW to send and receive data from the web. This is the best option for you.
https://decibel.ni.com/content/docs/DOC-40572

Drupal 7 (VERY) Custom Preview

I have a Drupal site that is being used strictly as a CMS to produce JSON feeds (using services and services_views), which are consumed by a separate site. What I would like to do (and I have a working proof of concept of this) is allow for a "live preview" on the real site, by intercepting the node form preview/submit, encoding the node as JSON, and loading a special page on the live site that consumes that JSON and displays the page accordingly.
The problem with this JSON-ized node is that it's different from the JSON being produced by my view (using services_views). My end goal is to produce JSON that is identical for both previewed and non-previewed objects, without having to maintain separate output methods. (I could easily hand-customize the JSON, but then whenever my view for the public API changes I'd have to make the same changes to the preview JSON; I'm trying to avoid this.)
I'm looking for feedback on this approach. Is what I'm attempting even possible? The ideas I've been able to come up with so far are:
being able to (conditionally) drive my view with data from a non-database source
sneakily inserting data into the view object during one of the stages of execution? Kludgy, but I'm not above that :)
saving a "clone" node (or revision?) of the node being previewed and letting the view use that to display the preview JSON
Maybe this is the wrong approach altogether and there's something better? (Trying to intercept and format the services output in my module... maybe avoid services_views altogether?)
If anyone can offer some advice, insight or opinions on how to best proceed here, I'd be really grateful.
In a custom module, you could set up a page that grabs the JSON output from the view page:
$JSON = file_get_contents($url);
That way the preview stays bound to the view, even if the view changes.
First of all, I think what you are trying to achieve is not an easy task, so good luck.
I think you could intercept the node submission data, create a node programmatically, render that node, and then export the rendered node to JSON. Immediately after you get the JSON, delete the node, because the programmatically created node is only for the preview.
This could be more CPU-demanding, but previewing content exactly as it will look is a hard problem.
The feeds your site reads could be filtered with some parameter to exclude the programmatically created (preview) nodes, even though these nodes will only exist for a very short time.

Automatically copy text from a web page

There is a VPN provider that keeps changing their password. I have an autologin, but obviously the VPN connection drops every time they change the password, and I have to manually copy and paste the new password into the credentials file.
http://www.vpnbook.com/freevpn
This is annoying. I realise the VPN provider probably doesn't want people to be able to do this, but it's not against the ToS and not illegal, so work with me here!
I need a way to automatically generate a file which has nothing in it except
username
password
on separate lines, just like the one above. Downloading the entire page as a text file automatically (I can do that) will therefore not work. OpenVPN will not understand the credentials file unless it is purely and simply
username
password
and nothing more.
So, any ideas?
Ideally, this kind of thing is done via an API that vpnbook provides; then a script can much more easily access the information and store it in a text file.
Barring that (and it looks like vpnbook doesn't have an API), you'll have to use a technique called web scraping.
To automate this via "Web Scraping", you'll need to write a script that does the following:
First, log in to vpnbook.com with your credentials
Then navigate to the page that has the credentials
Then traverse the structure of the page (called the DOM) to find the info you want
Finally, save out this info to a text file.
I typically do web scraping with Ruby and the mechanize library. The first example on the Mechanize examples page shows how to visit the Google homepage, perform a search for "Hello World", and then grab the links in the results one at a time, printing each one out. This is similar to what you are trying to do, except instead of printing you would want to write the text to a file. (Google for how to write a text file with Ruby.)
require 'rubygems'
require 'mechanize'

a = Mechanize.new { |agent|
  agent.user_agent_alias = 'Mac Safari'
}

a.get('http://google.com/') do |page|
  search_result = page.form_with(:id => 'gbqf') do |search|
    search.q = 'Hello world'
  end.submit

  search_result.links.each do |link|
    puts link.text
  end
end
To run this on your computer, you would need to:
a. Install Ruby
b. Save the script in a file called scrape.rb
c. Run it from the command line: ruby scrape.rb
OS X comes with an older Ruby that would work for this. Check out the Ruby site for instructions on how to install a newer version or get it working on your OS.
Before using a gem like mechanize you need to install it:
gem install mechanize
(This depends on RubyGems being installed, which I think typically comes with Ruby.)
If you're new to programming this might sound like a big project, but you'll have an amazing tool in your toolbox for the future, where you'll feel like you can pretty much "do anything" you need to, and not rely on other developers to have happened to have built the software you need.
Note: for sites that rely on JavaScript, mechanize won't work - you can use Capybara + PhantomJS to run an actual browser that can execute JavaScript from Ruby.
Note 2: It's possible that you don't actually have to go through the motions of (1) going to the login page, (2) filling in your info, and (3) clicking "Login". Depending on how their authentication works, you may be able to go directly to the page that displays the info you need and provide your credentials to that page directly, using either basic auth or other means. You'll have to look at how their auth system works and do some trial and error. The most straightforward, most-likely-to-work approach is to do what a real user would do: log in through the login page.
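For instance, if the page happened to be protected by HTTP basic auth (purely hypothetical here; vpnbook's login may work differently), the direct fetch in Ruby could be as simple as this sketch:
require 'open-uri'

# Hypothetical sketch: this only works if the page actually uses HTTP
# basic auth rather than a form-based login.
html = open('http://example.com/protected-page',
            :http_basic_authentication => ['your_username', 'your_password']).read
puts html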
Update
After writing all this, I came across the vpnbook-utils library (during a search for "vpnbook api") which I think does what you need:
...With this little tool you can generate OpenVPN config files for the free VPN provider vpnbook.com...
...it also extracts the ever changing credentials from the vpnbook.com website...
It looks like with one command:
vpnbook config
you can automatically grab the credentials and write them into a config file.
Good luck! I still recommend you learn ruby :)
You don't even need to parse the content. Just do a string search for the second occurrence of Username:, cut everything before that, and use sed to pull out the content between the next two occurrences of <strong> and </strong>. You can use curl or wget -qO- to get the website's content.
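A minimal Ruby sketch of that same idea (the page layout, i.e. two <strong> elements following the second "Username:", is an assumption taken from the description above):
require 'open-uri'

# Grab the raw page; no HTML parser needed.
html = open('http://www.vpnbook.com/freevpn').read

# Cut away everything before the second occurrence of "Username:".
rest = html.split('Username:', 3).last

# The next two <strong>...</strong> pairs should hold the username and
# the password, in that order.
username, password = rest.scan(%r{<strong>(.*?)</strong>}m).first(2).map(&:first)

# Write them to the credentials file: one per line, nothing else.
File.write('credentials.txt', "#{username}\n#{password}\n")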

How to include the result of an api request in a template?

I'm creating a wiki using MediaWiki for the first time. I would like to automatically include all backlinks of the current page in a template (like a "See also" section). I have played with the API successfully, but I still haven't succeeded in including the useful part of the result in my template.
I have been querying Google and Stack Overflow for days (maybe in the wrong way), but I'm still stuck.
Can somebody help me?
As far as I know, there is no reasonable way to do that. Probably the closest you could get is to write JavaScript code that reacts to the presence of a specific HTML element in the page, makes the API request, and then updates the HTML to include the result.
It’s not possible in wikitext to execute any JavaScript or to use more uncommon HTML. As such, you won’t be able to use the MediaWiki API like that.
There are multiple different options you have to achieve something like this though:
You could use the API by including custom JavaScript code in MediaWiki:Common.js. The code there is included automatically and can be used to enhance the wiki experience. This obviously requires JavaScript on the client, so it might not be the best option, but at least you could use the API directly. You would have to add something to figure out where to place the results correctly, though.
A better option would be to use an extension that gives you this output. You can either try to find an extension that already provides this functionality, or write your own that uses the internal MediaWiki API (not the JS one) to access that content.
One extension I can personally recommend that does this (and many other things) is DynamicPageList (full disclosure: I'm somewhat affiliated with that project). It allows you to perform complex page selections.
For example, what you are trying to do is find all pages which link to your page. This can easily be done with DPL like this:
{{ #dpl: linksto = {{FULLPAGENAME}} }}
I wrote a blog post recently showing how to call the API to get the job queue size and display it inside a wiki page. You can read about it at Display MediaWiki job queue size inside your wiki. This solution does require the External Data extension, however. The code looks like:
{{#get_web_data: url={{SERVER}}{{SCRIPTPATH}}/api.php?action=query&meta=siteinfo&siprop=statistics&format=json
| format=JSON
| data=jobs=jobs}}
{{#external_value:jobs}}
You could easily swap in a different API call to get other data. For the specific item you're looking for, #poke's answer above is probably better.