Can I access HTML5 storage using HTMLUnit?

I have a requirement to identify whether a page stores data in, or reads from, HTML5 data stores. I am using HTMLUnit to scrape web pages. I saw in the SourceForge listing that support for HTML5 storage has been implemented. Does HTMLUnit actually create objects for localStorage, sessionStorage, etc.? If so, how can I access them?
I've also thought of scraping all the JavaScript on the page and searching for keywords, but is there a better method than that?

A simple test is to execute JavaScript that calls setItem('key', 'value') and then getItem('key'), and inspect the result. If the stored value comes back, storage is working. Something like the following:
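// Store a value, then read it back; the value of the last expression becomes the script result
ScriptResult result = currentPage.executeJavaScript("window.localStorage.setItem('some_key','some_value');window.localStorage.getItem('some_key');");
// String concatenation prints "null" instead of throwing if nothing was returned
System.out.println("script result: " + result.getJavaScriptResult());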
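If you also want the keyword-scanning fallback mentioned in the question, a minimal sketch using only the JDK could look like this. The class and method names are hypothetical, and the regex is a heuristic: it only catches direct calls such as localStorage.setItem(...) and will miss obfuscated or dynamically built property access.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: scans JavaScript source text for direct Web Storage calls.
// Heuristic only; bracket notation or computed names are not detected.
class StorageKeywordScanner {
    // Matches calls such as "localStorage.setItem" or "sessionStorage.getItem"
    private static final Pattern STORAGE_CALL = Pattern.compile(
            "\\b(localStorage|sessionStorage)\\s*\\.\\s*(setItem|getItem|removeItem|clear|key)\\b");

    // Returns "store.method" pairs found in the script, in first-seen order
    static Set<String> findStorageCalls(String script) {
        Set<String> found = new LinkedHashSet<>();
        Matcher m = STORAGE_CALL.matcher(script);
        while (m.find()) {
            found.add(m.group(1) + "." + m.group(2));
        }
        return found;
    }

    public static void main(String[] args) {
        String js = "var t = localStorage.getItem('token'); sessionStorage.setItem('a', '1');";
        System.out.println(findStorageCalls(js)); // [localStorage.getItem, sessionStorage.setItem]
    }
}
```

The script text itself would come from the page (for example, the contents of its script elements); executing JavaScript through HTMLUnit as shown above remains the more direct check.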

Related

"Reverse" JSON Status API

I've been wondering how to fetch the PlayStation server status. They display it on this page:
https://status.playstation.com/en-us/
But PlayStation is known to use APIs instead of PHP database fetches. After looking around in the source code of the site, I found that they have a separate file called /data.json.
https://status.playstation.com/en-us/data.json
The content of this file is the same as the index file (for some reason). They use placeholders like {{endDateTitle}} and {{message}}, but I can't find where they are defined: whether they are pulled from a separate file or from a database using PHP.
How can I "reverse" this site and see if there's an API I can use to display the status on my site?
Maybe I did not get the question right, but it seems pretty straightforward.
If you're using Firefox, open Developer Tools, go to the Network tab, and reload the page.
You can clearly see the requested URL:
https://status.playstation.com/data/statuses/region/SCEA.json
It seems that an empty list as a status means "No problems" (since there are currently no problems, I cannot verify this assumption). That's all.
The double curly braces {{ }} are used by various HTML templating languages, such as Angular's, so you'd have to go through the JS code to understand where they get filled in.
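A minimal Java 11+ sketch of fetching that endpoint with the JDK's java.net.http client. The class and method names are hypothetical, and the endpoint is undocumented, so it may change or disappear at any time. The sketch stops at returning the raw JSON; checking whether the status list is empty (the unverified assumption above) would need a JSON library.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Sketch: fetch the status JSON that the status page itself requests
// (URL observed in the browser's Network tab; not a documented API).
class PlayStationStatus {
    static final String SCEA_URL = "https://status.playstation.com/data/statuses/region/SCEA.json";

    // Builds a plain GET request that asks for JSON
    static HttpRequest buildRequest(String url) {
        return HttpRequest.newBuilder(URI.create(url))
                .header("Accept", "application/json")
                .GET()
                .build();
    }

    // Performs the request and returns the raw response body
    static String fetch(String url) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newHttpClient();
        HttpResponse<String> response =
                client.send(buildRequest(url), HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() != 200) {
            throw new IOException("Unexpected HTTP status " + response.statusCode());
        }
        return response.body();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(fetch(SCEA_URL));
    }
}
```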

HTML HTTP request: reading the source

When I view a website in my browser (for example https://www.homedepot.ca/en/home/p.725-inch-miter-saw-with-laser.1000748698.html), it contains information that is not in the source code.
For example, the source code of this page doesn't specify a product price:
<span itemprop="price">-</span>
<small>/
each</small>
However, when viewed in a browser, the tag does actually contain a price.
How can I retrieve the product's price from the source code?
Short answer: just by reading the source, you can't. The price is loaded dynamically from their servers (using JavaScript) after the page has loaded.
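To see this concretely, here is a minimal sketch that reads the price straight from the static HTML, the way a plain HTTP client would see it. The class name is hypothetical, and the regex is a simplification for this one tag; a real scraper should use an HTML parser.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch: extract the itemprop="price" span from raw (non-executed) HTML
class StaticPriceCheck {
    private static final Pattern PRICE_SPAN =
            Pattern.compile("<span\\s+itemprop=\"price\"\\s*>([^<]*)</span>");

    // Returns the span's text content, or empty if the tag is absent
    static Optional<String> extractPrice(String html) {
        Matcher m = PRICE_SPAN.matcher(html);
        return m.find() ? Optional.of(m.group(1).trim()) : Optional.empty();
    }

    public static void main(String[] args) {
        // The snippet from the question: the server sends only a placeholder
        String html = "<span itemprop=\"price\">-</span>\n<small>/\neach</small>";
        System.out.println(extractPrice(html).orElse("(no price tag)")); // prints "-"
    }
}
```

Because no JavaScript runs, the only thing recoverable from the source is the "-" placeholder.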
Using appropriate tools (such as the Network tab in the Chrome/Firefox developer console), you can figure out where they retrieve the price from (in this case, a JSON document on their servers). However, even if you used that, there is no guarantee that it will still work tomorrow: they can change the link or the format of the data at any moment.
A good way to get started with the technologies they use is to read up on:
JavaScript
AJAX
JSON
If you are interested in retrieving information from their page programmatically, a good start would be to contact them and ask whether they have a public interface (API) you can use. These are usually more stable.

How to Find a JSON object of a Website

I'm new to the JSON world and I'm trying to find out how to view the JSON object of a webpage. Will every webpage have a JSON object, and if so, how do I find it in order to get the data and display it on my site? I vaguely remember something about using Firebug?
Will every webpage have a JSON object
No.
Many web sites will not use any JSON; many will be completely static (HTML and CSS only).
JSON usually only comes into play if there is a "Web API" (for programmatic access to content), and even then there are non-JSON ways to build APIs (the X in AJAX stands for XML).
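When going through requests in a browser's network panel, the Content-Type header is a quick hint as to whether a response is JSON, XML, or ordinary HTML. A minimal sketch of that check follows; the class and enum names are hypothetical, and since servers do not always label responses correctly, treat the result as a hint rather than proof.

```java
import java.util.Locale;

// Sketch: classify a response by its Content-Type header value
class ResponseFormat {
    enum Format { JSON, XML, HTML, OTHER }

    static Format classify(String contentType) {
        if (contentType == null) return Format.OTHER;
        // Drop parameters such as "; charset=utf-8" and normalise case
        String mediaType = contentType.split(";", 2)[0].trim().toLowerCase(Locale.ROOT);
        if (mediaType.equals("application/json") || mediaType.endsWith("+json")) return Format.JSON;
        if (mediaType.equals("application/xml") || mediaType.equals("text/xml")
                || mediaType.endsWith("+xml")) return Format.XML;
        if (mediaType.equals("text/html")) return Format.HTML;
        return Format.OTHER;
    }

    public static void main(String[] args) {
        System.out.println(classify("application/json; charset=utf-8")); // JSON
        System.out.println(classify("text/xml"));                        // XML
    }
}
```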
To determine how to access a site programmatically, look at the site's developer documentation. If there isn't any documentation, then whatever AJAX traffic a web debugger (like Firebug) shows may well be internal and intended only for the site's own implementation; other uses could well be unwelcome (you could end up violating their intellectual property rights).
Adding sensitive JSON directly to your final HTML page can become a vulnerability. Sensitive JSON should be loaded like an ingredient added to the soup, for example via Ajax on an authenticated page. If the JSON isn't sensitive, you may still want to load it only once it is required, for performance reasons; it really depends on your choice. I have built a library to handle these kinds of requests on the web; check it out: https://github.com/alexmano/jsMan

How to include the result of an api request in a template?

I'm creating a wiki using MediaWiki for the first time. I would like to automatically include all backlinks of the current page in a template (like a "See also" section). I experimented with the API successfully, but I still haven't succeeded in including the relevant part of the result in my template.
I have been querying Google and Stackoverflow for days (maybe in the wrong way) but I'm still stuck.
Can somebody help me?
As far as I know, there is no reasonable way to do that. Probably the closest you could get is to write JavaScript code that reacts to the presence of a specific HTML element in the page, makes the API request, and then updates the HTML to include the result.
It's not possible in wikitext to execute any JavaScript or to use less common HTML elements. As such, you won't be able to use the MediaWiki API like that.
There are multiple different options you have to achieve something like this though:
You could use the API by including custom JavaScript code on MediaWiki:Common.js. The code there is included automatically and can be used to enhance the wiki experience. This obviously requires JavaScript on the client, so it might not be the best option, but at least you could use the API directly. You would still need some way to figure out where to place the results correctly, though.
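Whichever client makes the call, the request itself is a standard Action API query using list=backlinks. A minimal sketch that builds that URL is below; the class name, the api.php location, and the page title are hypothetical placeholders for your own wiki.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Sketch: build the MediaWiki Action API query listing pages that link to a title.
// The api.php endpoint passed in is an assumption; adjust it for your wiki.
class BacklinksQuery {
    static String buildUrl(String apiEndpoint, String pageTitle, int limit) {
        return apiEndpoint
                + "?action=query&list=backlinks"
                + "&bltitle=" + URLEncoder.encode(pageTitle, StandardCharsets.UTF_8)
                + "&bllimit=" + limit
                + "&format=json";
    }

    public static void main(String[] args) {
        // Hypothetical wiki URL and page title, for illustration only
        System.out.println(buildUrl("https://wiki.example.org/w/api.php", "Main Page", 50));
        // https://wiki.example.org/w/api.php?action=query&list=backlinks&bltitle=Main+Page&bllimit=50&format=json
    }
}
```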
A better option would be to use an extension that gives you this output. You can either try to find an extension that already provides this functionality, or write your own that uses the internal MediaWiki API (not the JS one) to access that content.
One extension I can personally recommend that does this (and many other things) is DynamicPageList (full disclosure: I'm somewhat affiliated with that project). It allows you to perform complex page selections.
What you are trying to do is find all pages that link to your page. This can easily be done with DPL like this:
{{ #dpl: linksto = {{FULLPAGENAME}} }}
I wrote a blog post recently showing how to call the API to get the job queue size and display that inside of the wiki page. You can read about it at Display MediaWiki job queue size inside your wiki. This solution does require the External Data extension however. The code looks like:
{{#get_web_data: url={{SERVER}}{{SCRIPTPATH}}/api.php?action=query&meta=siteinfo&siprop=statistics&format=json
| format=JSON
| data=jobs=jobs}}
{{#external_value:jobs}}
You could easily swap in a different API call to get other data. For the specific item you're looking for, poke's answer above is probably better.

passing data to web page (QWebView)

I'm writing a UI for a client that parses some very nested JSON data. This UI is in PySide and I'd like to include some visualization of the data as well. I've recently come across QWebView and this seems like a great way to quickly embed 'stunning' charts into my UI that can potentially also be configured.
So the question is, how can I send 'signals' and data to the page? The one approach that would work is to manually create the page as a temp file and have the webview browse to that, but I think there should be a better way. Is there?
You're probably looking for QWebFrame::addToJavaScriptWindowObject(). With that method, you can export QObjects to JavaScript. These objects can have signals you can connect to in JS, and you can also use properties or methods with return values to obtain some data.
See https://qt-project.org/doc/qt-4.8/qtwebkit-bridge.html for a complete overview on how the C++<->JS bridge works.