I'm trying to count how many times a search box errors out when I bulk-submit test data to a website, so my plan is to wget each query result and check whether the word "Error" appears in the resulting HTML page. I build the query URL and use wget to download the resulting webpage.
However, wget only gets the main content of the HTML and not the result, because the result is filled in by an external JavaScript file. The HTML that I want can only be seen if I right-click and choose View Page Source in my browser. Is there a non-manual way to use wget/curl to download such a page source, instead of having to click through all of them?
The JavaScript is a program, and in general you can't know what a program produces without actually running it. So it's easier to load the JavaScript in a sandboxed browser environment, execute it, and inspect the result for each test case.
Wget and curl can't do that: they have no facility to examine or execute what they fetch. Practically speaking, what you need is a browser that can load and run the script, yet is as easy to drive from the shell as wget/curl. Luckily, such a thing already exists: Selenium. It is a browser automation tool that drives Firefox/Chrome/Internet Explorer, making a running instance of those browsers scriptable and easy to control remotely.
If you want to run these browsers noninteractively, without a GUI, I suggest using a fake (hardware-less) X server such as Xvfb.
Google for "selenium", and google for "headless X". Good luck!
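To make the Selenium route concrete, here is a minimal sketch in Python (assumptions: `pip install selenium`, a matching chromedriver on your PATH, and a made-up query URL; newer Chrome builds also ship a built-in headless mode, which spares you the fake X server):

from selenium import webdriver

options = webdriver.ChromeOptions()
options.add_argument("--headless")                   # built-in headless mode: no GUI, no X server needed
driver = webdriver.Chrome(options=options)
try:
    driver.get("http://example.com/search?q=test")   # hypothetical query URL
    html = driver.page_source                        # the DOM after the JavaScript has run
    print(html.count("Error"))                       # count occurrences of "Error"
finally:
    driver.quit()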
I want to programmatically save a URL, as I can do manually with Chrome's Save As. I'm open to using any tool to accomplish this. If it's not possible with a headless Chrome, a headed (is this the correct word? :D) solution is also acceptable.
Other browsers are also acceptable, but note that I want the URL saved the way a 'sane' browser saves it. (Because the URL in question needs JavaScript rendering, and it crashes phantomjs, too.)
Note that I've tried Selenium's page_source; it didn't output the same content as Chrome's Save As, and its result was incomplete, comparable to a simple curl.
PS: Here's the URL I'm trying to save: https://outline.com/zKpUhM.
Update 1:
I've found the following non-headless solution:
https://github.com/abiyani/automate-save-page-as
I have found an answer in https://splash.readthedocs.io.
After installation, this little shell function does exactly what I want:
full-html () {
  #doc Splash must be up and listening on localhost:8050 -- https://splash.readthedocs.io
  #doc $1 is the URL to render, $2 the output file; fu_wait (default 10s) controls how long the page's JavaScript is given to run.
  #doc 'wait' always waits the full time, and should be strictly < timeout.
  curl --silent "http://localhost:8050/render.html?url=$1&timeout=90&wait=${fu_wait:-10}" -o "$2"
}
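If you'd rather drive Splash from Python than from the shell, here is an equivalent sketch using the requests package (same assumptions: a Splash instance listening on localhost:8050, and `pip install requests`):

import requests

def full_html(url, out_path, wait=10, timeout=90):
    # Ask the local Splash instance to render the page (JavaScript included)
    # and hand back the resulting HTML.
    resp = requests.get(
        "http://localhost:8050/render.html",
        params={"url": url, "wait": wait, "timeout": timeout},
    )
    resp.raise_for_status()
    with open(out_path, "w", encoding="utf-8") as f:
        f.write(resp.text)

full_html("https://outline.com/zKpUhM", "page.html")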
See also: https://github.com/dhamaniasad/HeadlessBrowsers
I have a CGI script written in bash that performs certain actions and calculations on data passed to it from a parent web page. It does not display anything on its own page. When it is complete, I want to launch an ordinary flat HTML web page. How can I do this without user intervention? I know I could create a submit button or a hyperlink, but I just want the script to finish its work and then go to a URL all by itself.
Muru had an idea that might work, although there may be cases where it fails or behaves unexpectedly. I found a very simple solution, however:
curl -s "http://web-page-to-display"
I honestly don't know why I did not think of it sooner. It's pretty obvious, really.
I have some code on my Mac, in the latest version of Python (IDLE 3), that collects certain data from a CSV file that gets sent to me and prints the output in the terminal. I want to create a webpage with a button or link that a user clicks to run the code and display my program's output.
Eventually I want to be able to create a website with multiple links that can do the same kind of operation.
Will I need to create an SQL database? If so, how?...
From the sound of it, you want to use a webpage as a user interface for your Python script. Unfortunately, without a server-side language this is not possible.
Multiple options exist for reacting to a button press on the server side, with PHP being the best known, but Python-only solutions exist too, such as Flask.
If you're just after a local GUI for your script, simpler options exist within Python itself, such as Tk.
Actually, you can expose this function through a web server, and the webpage will then call the server at the right URL.
Since you are using Python, I recommend taking a look at Flask (http://flask.pocoo.org/), a great microframework to get you started.
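To sketch what that looks like (assumptions: `pip install flask`, and process_csv() is a hypothetical stand-in for the CSV-processing code you already have):

from flask import Flask

app = Flask(__name__)

def process_csv():
    # Placeholder for your existing code that reads the CSV and builds the output.
    return "results go here"

@app.route("/")
def index():
    # A page with a single button that triggers the report.
    return '<form action="/run" method="post"><button>Run report</button></form>'

@app.route("/run", methods=["POST"])
def run():
    # Run the existing code and send its output back to the browser.
    return "<pre>{}</pre>".format(process_csv())

if __name__ == "__main__":
    app.run(debug=True)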
I've been wondering how to fetch the PlayStation server status. They display it on this page:
https://status.playstation.com/en-us/
But PlayStation is known to use APIs instead of PHP database fetches. After looking around in the source code of the site, I found that they have a separate file called /data.json.
https://status.playstation.com/en-us/data.json
The content of this file is the same as the index file (for some reason). They use placeholders like {{endDateTitle}} and {{message}}, but I can't find where they're defined, whether they're filled in from a separate file or pulled from a database using PHP.
How can I "reverse" this site and see if there's an API I can use to display the status on my own site?
Maybe I did not get the question right, but it seems pretty straightforward.
If you're using Firefox, open the developer tools, switch to the Network tab, and reload the page.
You can clearly see the requested URL:
https://status.playstation.com/data/statuses/region/SCEA.json
It seems that an empty list as a status means "no problems" (since there are currently no problems, I cannot verify this assumption). That's all.
The double curly braces {{ }} are used by various HTML templating languages, such as Angular, so you'd have to go through the JS code to understand where they get filled in.
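If you want to pull that status into your own site, here's a minimal sketch (assumptions: `pip install requests`; the JSON schema isn't documented, so this just pretty-prints whatever the endpoint returns, with SCEA being the region file the page above happened to request):

import json
import requests

# Fetch the same JSON the status page itself requests.
resp = requests.get("https://status.playstation.com/data/statuses/region/SCEA.json")
resp.raise_for_status()

# The schema isn't documented, so pretty-print it and explore from there.
print(json.dumps(resp.json(), indent=2))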
Currently, when I build my site, I have to manually validate every page at the W3C site (meaning: when Opera pops up, press Ctrl+Alt+Shift+U).
Is it possible to automatically validate every page whenever I build my pages?
P.s.: This page doesn't validate ;)
You can download and install your own copy of the validator - http://validator.w3.org/source/ - and invoke it locally instead of trekking out to w3.org for each page. This still requires running it behind a web server and talking to it over plain HTTP or its API, though. For a simpler setup you may prefer to download the SP library - http://www.jclark.com/sp/index.htm or http://openjade.sourceforge.net/ - on which the W3C validator is based, and invoke the 'nsgmls' command from the command line.
There are of course also many desktop HTML validators that can process a batch of HTML pages at once; they may not be fully automated, but they would certainly be much easier than checking each page by hand. For example http://arealvalidator.com/ (Windows) or http://www.webthing.com/software/validator-lite/install.html (Unix).
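If you'd rather script the batch check yourself, here is a sketch of one possible route; note it swaps in the newer Nu Html Checker's documented HTTP API (out=json) rather than the SGML-based validator described above, and it assumes `pip install requests` and that your built pages sit under build/:

import glob
import requests

# Post every built page to a Nu Html Checker instance and count the errors.
# validator.w3.org/nu is the public instance; for large batches it's kinder to
# run your own copy (the checker is also distributed as a local jar / Docker image).
VALIDATOR = "https://validator.w3.org/nu/?out=json"

for path in glob.glob("build/**/*.html", recursive=True):
    with open(path, "rb") as f:
        resp = requests.post(
            VALIDATOR,
            data=f.read(),
            headers={
                "Content-Type": "text/html; charset=utf-8",
                "User-Agent": "site-build-validator",  # some public instances reject the default UA
            },
        )
    messages = resp.json().get("messages", [])
    errors = [m for m in messages if m.get("type") == "error"]
    print("{}: {} error(s)".format(path, len(errors)))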
It might not be the best choice for you, but there's an Ant task for this: XmlValidate.
If you've got the HTML files in source control like SVN or Git, you can use a pre-commit hook script to run client-side validators on them. Or if you're feeling adventurous, you could use that method to ping another script on the server that validates the live pages...