Web automation - language-agnostic

I'm developing an interface between an old web based application and another one. That old web-based application works fine but there no exists any API to communicate with.
There is any programmatic way to say a web-form something like: enter this value on this field, this one ins other and submit form?
UPDATE: I looking for something like this:
WebAutomation w = new WebAutomation("http://apphost/report");
w.forms[0].input[3].value = 123;
w.forms[0].input[4].value = "hello";
Response r = w.forms[0].submit();
...

Despite the tag on your question, the answer is going to be highly language specific. There are also going to be wide range of solutions depending on how complex of a solution you are willing to implement and how flexible a result you are looking for.
On the one hand you can accomplish a lot in a very short period of time with something like Python's mechanize, but on the other hand, you can really get into the guts and have a lot of control by automating a browser using a COM object such as SHDocVw (Windows-only, of course).
Or, as LoveMeSomeCode suggested, you can really hit your head against the concrete and start forging POST requests, but good-luck figuring out what the server expects if is doing any front-end processing of the form data.
EDIT:
One more option, if you are looking for something that you can come up to speed on quickly, is to use a AutoIt's IE module, which basically provides a programmatic interface over an instance of Internet Explorer (its all COM in underneath, of course). Keep in mind that this will likely be the least supportable option you could choose. I have personally used this to produce proof-of-concept automation suites that were then migrated to a more robust C# implementation where I handled the COM calls myself.

In .NET: http://watin.sourceforge.net/
In ruby: http://wtr.rubyforge.org/
Cross platform: http://seleniumhq.org/

You can, but you have to mock up a POST request. The fields (textboxes, radio buttons, etc.) are transmitted as key-value pairs back to the resource. You need to make a request for this resource(whichever one is used in the SUBMIT action for the FORM tag) and put all your field-value pairs in a POST payload no the request.
Here's a good program to see what values are being transmitted: http://www.httpwatch.com
Or, you can use Firebug, a free Firefox extension.

The Perl module WWW::Mechanize does exactly that. Your
example would look something like this:
use WWW::Mechanize;
my $agent = WWW::Mechanize->new;
$agent->get("http://apphost/report");
my $response = $agent->submit_form(
with_fields => {
field_1_name => 123,
field_2_name => "hello",
},
);
There is also a Python port, and I guess similar libraries exist for many other languages.

Related

Logging service allowing simple <img/> interface

I'm looking to do some dead-simple logging from a web app (client-side) to some remote service/endpoint. Sure, I could roll my own, but for the purpose of this task, let's assume I want an existing service like Logentries/Splunk/Logstash so that my viewers can still log debugging info if my backend goes down.
Most logging services offer an API where I can import some <script/> onto my page and then use an API like LE.log('string', data); [Logentries example]. However that pulls in a JS dependency and uses cross-domain XHR for probably well-founded reasons (like URI length limitations).
My question is if anyone can point me to a service that will let me send simple query params to a "pixel" endpoint (similar to how Google Analytics does it). Something like:
<script>
new Image().src = 'http://something.io/pixel/log/<API_TOKEN>?some_data=1234';
</script>
-- or, in pure HTML --
<img src="http://something.io/pixel/log/<API_TOKEN>?some_data=1234" style="display:none" />
I'd assume some of the big names in the logging-as-a-service space would have something like this but I've not found anything (or it's too specific to turn up any search results).
This would not be for analytics so much as error logging, debugging, etc. Fire-and-forget sort of stuff.
Any advice appreciated.
It's possible to do this with Logentries, they offer a pixel tracker.
They require that data is sent in a base64 encoding, but that's quite simple in Javascript.
From their documentation:
var encoded = encodeURIComponent(btoa("Log message"));
This data can then be used in a pixel tracker like this:
<img src="https://js.logentries.com/v1/logs/{API-TOKEN}?e={ENCODED_DATA}/">

Perl table Extract or other method for multi page table

I'm trying to extract elements from a table, I have successfully used get and HTML:TableExtract to get elements of the table. The problem is the table is multi page and navigated with an arrow button to show additional pages. How would i extract these other pages as they are not new links but I think generated with JS or such?
Specifically I am trying to extract the table under Data for this Data Range at:
http://ycharts.com/companies/GOOG/pe_ratio#series=type:company,id:GOOG,calc:pe_ratio,,id:AAPL,type:company,calc:pe_ratio,,id:AMZN,type:company,calc:pe_ratio&zoom=3&startDate=&endDate=&format=real&recessions=false
See how there is the Viewing x of 45 and the First, Previous, Next, Last button.
The rest of the table elements can be viewed with next, how would i extract these in perl?
Update::
Hi Simbabque, Thanks for the response.
So I see if you click on next it calls:
ng-click="getHistoricalData(historicalData.currentPage+1)"
Is there a way I can call this method? I tried to use click,but it is not bound a name. (JS?)
I was trying to use Mechanize::Firefox now but I feel like their must be an easy way to use regular Mech and call the function and re-read the page?
The website builds up the tables using AJAX requests. Those are a little harder to parse. You can use WWW::Mechanize to fetch the initial page and then hit the AJAX calls for the table. It helps you keep track of cookies and stuff automatically.
use strict; use warnings;
use WWW::Mechanize;
my $mech = WWW::Mechanize->new;
$mech->get('http://ycharts.com/companies/GOOG/pe_ratio#series=type:company,id:GOOG,calc:pe_ratio,,id:AAPL,type:company,calc:pe_ratio,,id:AMZN,type:company,calc:pe_ratio&zoom=3&startDate=&endDate=&format=real&recessions=false');
my $response = $mech->post(
'http://ycharts.com/companies/GOOG/pe_ratio/data_ajax',
{
startDate => '1/1/1962',
endDate => '12/3/2013',
pageNum => 4,
}
);
if ( $response->is_success ) {
print $response->decoded_content; # or whatever
} else {
die $response->status_line;
}
This is just a basic example and will not work. It gives a 403 Forbidden. Probably there is more data required. Use Firebug or a similar tool to inspect what is happening. For example, there's another call to http://ping.chartbeat.net/ping?h=ycharts.com&p=%2Fcompanies%2FGOOG%2Fpe_ratio&u=o3m6snxteynby1b8&d=ycharts.com&g=20054&n=1&f=00001&c=10.81&x=200&y=1812&o=1663&w=658&j=30&R=0&W=1&I=0&E=109&e=6&b=1903&t=usmc0fjfd1j0h87g&V=16&_ happening automatically every now and again, with varying parameters. That is most likely required to keep the session going.
This page is pretty sophisticated. This might not be the best approach.
You could also try to use WWW::Mechanize::Firefox or even Selenium to remote-operate a browser. That will be better suited as it takes care of all the AJAX stuff that is happening.
Or you could look for a public API that just hands over that data voluntarily. I bet there is one around... or just pay for a ycharts pro account and hit the download button. ;-)

Scraping data after filling out form?

I'm doing a little project for my class and I'm just a beginner, so please forgive me if I mix up some of my terminology.
Basically, I'm creating an interactive journey planner for my city's public transit system. Unfortunately, they haven't made all the data I need publicly available. So instead of putting all my time into gathering the data for personal use, I've opted to do some screen scraping - letting their servers calculate the journey info from a START and STOP variable and then displaying the selected info on my page.
So is it possible to fill out a form's fields remotely, and then scrape the data on the page that subsequently loads? And if so, what would be the quickest, most convenient way? This happens to be a case where the data can't be manipulated via the URL, so it has to access the data by filling out the form first.
The website in question:
http://jp.translink.com.au/travel-information/journey-planner
Here is what you can do:
1.) Send a POST Request to the journey-planner with some data like that (be aware that CORS might jump in, then you could use cURL via PHP or whatsoever):
Start:Wickham Tce, Spring Hill
End:Upper Edward St, Spring Hill
SearchDate:10/05/2013 12:00:00 AM
TimeSearchMode:LeaveAfter
SearchHour:7
SearchMinute:40
TimeMeridiem:AM
TransportModes:Bus
TransportModes:Train
TransportModes:Ferry
MaximumWalkingDistance:1500
WalkingSpeed:Normal
ServiceTypes:Regular
ServiceTypes:Express
ServiceTypes:NightLink
FareTypes:Standard
FareTypes:Prepaid
FareTypes:Free
2.) You will get a new response location. This seems to be a REST link. Important for you is the id at the end. You will have to call that page and parse the HTML and look for a div with the HTML-id option-summaries, where you will find more information within the divs travel-option-1 to travel-option-n. You have to look at it carefully in order to find out which information is stored whee and how you will be able to use it.
In order to find such things you should learn how to use Firebug or Chrome's development tools.
This is one way to solve your problem. Probably not the best but still better than "screen-scraping" anything. But it will ask you for a lot of skills and effort. Furthermore if the data provider is going to change just a bit your solution will not work anymore. Additionally they might prevent your access by CORS or anything else (blocking your IP etc.)

How to model complex multi page forms with conditional branches?

It has occured to me that a large part of my job is really just building the same thing over and over again.
These are fundamentally complex multipage forms e.g. mortgage applications, insurance, etc.
Is there a common / well used model for such things? I don't care what language / technology is used. I'm thinking XML / language neutral ideally.
You can also use http://www.springsource.org/spring-web-flow:
Spring Web Flow is a Spring MVC extension that allows implementing the "flows" of a web application. A flow encapsulates a sequence of steps that guide a user through the execution of some business task. It spans multiple HTTP requests, has state, deals with transactional data, is reusable, and may be dynamic and long-running in nature.
It is perfectly integrated also in Groovy & Grails (Webflow Doc) . Groovy is an script-like extension of Java, whereas Grails is webframework that uses among other things Spring, Hibernate...
Personally I use Django to build my forms. I've done complex multi-step forms, where steps are conditional by using the django.contrib.formtools.FormWizard and using a factoryfunction to create the Form class for the step like this:
class SomeWizard(FormWizard):
def process_step(self, request, form, step):
if form.is_valid() and step == 0:
#compute step 2
Step2 = second_step_factory(form)
self.form_list[1] = Step2
And the step it with a placeholder when instantiating the Wizard object:
def some_form_view(request):
Step1 = first_step_factory(request)
placeholder = second_step_factory()
return SomeWizard([Step1, placeholder])(request)
In Django 1.4 the FormWizard has been replaced by a different implementation, I've not looked at that yet.
If you want a language neutral, more declarative option, you can have a look at XForms. Browser support seems a bit abandoned, but there are XSLTs that will transform your XForms into HTML.
The PFBC (PHP Form Builder Class) project is developed with the following goals in mind:
Promote rapid development of forms through an object-oriented PHP structure.
Eliminate the grunt/repetitive work of writing the html and validation when building forms. human error by using a consistent/tested utility.
A sample code will be:
<?php
//PFBC 2.x PHP 5 >= 5.3
session_start();
include($_SERVER["DOCUMENT_ROOT"] . "/PFBC/Form.php");
$form = new PFBC\Form("GettingStarted", 300);
$form->addElement(new PFBC\Element\Textbox("My Textbox:", "MyTextbox"));
$form->addElement(new PFBC\Element\Select("My Select:", "MySelect", array(
"Option #1",
"Option #2",
"Option #3"
)));
$form->addElement(new PFBC\Element\Button);
$form->render();
//PFBC 2.x PHP 5
session_start();
include($_SERVER["DOCUMENT_ROOT"] . "/PFBC/Form.php");
$form = new Form("GettingStarted", 300);
$form->addElement(new Element_Textbox("My Textbox:", "MyTextbox"));
$form->addElement(new Element_Select("My Select:", "MySelect", array(
"Option #1",
"Option #2",
"Option #3"
)));
$form->addElement(new Element_Button);
$form->render();
?>
Check out PHP Form Builder Class Project. Hope this helps... :)
I wouldn't try to be language neutral. Focus on a language like CFML or PHP which do this sort of thing very well. Eg.
<input type="radio" name="type" value='mortgage' onmouseup='updateForm(this)'> Mortgage
<input type="radio" name="type" value='loan' onmouseup='updateForm(this)'> Loan
<cfif form.type eq 'loan'>
<input name="income" type="text">
</cfif>
A very simple example. You could also use logic based on login details, database values, previous forms, etc. CFML also provides advanced form tags (eg, cfinput) which can take care of some of the messy details of dynamic forms for you.
Well, if you don't mind which language is used, and are interested in learning ROR, then here's a decent complex forms tutorial, although you will need some familiarity with the framework to make this work
Perhaps this answer will help you find a simpler jquery-oriented approach, might be a bit old
Take a look at this jQuery multi-step form tutorial too, but it looks like one I'll end up reading during my lunch-break later on as I'm interested in doing this myself
There's bound to be an idiot-proof plugin out there too
I do assume that you are using some kind of DB to form data. and what you want to do is fill out some data on form page one, submit form get the second page and so on.
Option 1 - Use PHP Yii framework.
It has some good built in CRUD(forms) generation support and it can generate simple forms automatically. What you need to do is customize the action to redirect you on next form (second page) and on the final form save all the data. It also has a good ajax based validation.
All you need to do is connect your app to db select a table generate model then generate CRUD. upto this its a 5-10 min task. Then you need to customize the forms define scenarios for validation and modify already defined actions to support the changes.
You can try their sample app Yii Blog. It can explain the process in detail.
Option 2 - USE JavaScript.
Built simple html forms as per your requirements. Then on submit of each page call a JavaScript (onClick event of submit button) that validates the form and stores the form data in JSON/XML object. You can serialize it or hold it in sessions. On submit of final page send an entire JSON/XML data (including data on final page) to your form processing script/ URL in action tag.

Is there a way to 'listen' for a database event and update a page in real time?

I'm looking for a way to create a simple HTML table that can be updated in real-time upon a database change event; specifically a new record added.
In other words, think of it like an executive dashboard. If a sale is made and a new line is added in a database (MySQL in my case) then the web page should "refresh" the table with the new line.
I have seen some information on the new using EVENT GATEWAY but all of the examples use Coldfusion as the "pusher" and not the "consumer". I would like to have Coldfusion both update / push an event to the gateway and also consume the response.
If this can be done using a combination of AJAX and CF please let me know!
I'm really just looking to understand where to get started with real-time updating.
Thank you in advance!!
EDIT / Explanation of selected answer:
I ended up going with #bpeterson76's answer because at the moment it was easiest to implement on a small scale. I really like his Datatables suggestion, and that's what I am using to update in close to real time.
As my site gets larger though (hopefully), I'm not sure if this will be a scalable solution as every user will be hitting a "listener" page and then subsequently querying my DB. My query is relatively simple, but I'm still worried about performance in the future.
In my opinion though, as HTML5 starts to become the web standard, the Web Sockets method suggested by #iKnowKungFoo is most likely the best approach. Comet with long polling is also a great idea, but it's a little cumbersome to implement / also seems to have some scaling issues.
So, let's hope web users start to adopt more modern browsers that support HTML5, because Web Sockets is a relatively easy and scalable way to get close to real time.
If you feel that I made the wrong decision please leave a comment.
Finally, here is some source code for it all:
Javascript:
note, this is a very simple implementation. It's only looking to see if the number of records in the current datatable has changed and if so update the table and throw an alert. The production code is much longer and more involved. This is just showing a simple way of getting a close to real-time update.
<script src="http://ajax.googleapis.com/ajax/libs/jquery/1.6.1/jquery.js"></script>
<script type="text/javascript" charset="utf-8">
var originalNumberOfRecsInDatatable = 0;
var oTable;
var setChecker = setInterval(checkIfNewRecordHasBeenAdded,5000); //5 second intervals
function checkIfNewRecordHasBeenAdded() {
//json object to post to CFM page
var postData = {
numberOfRecords: originalNumberOfRecsInDatatable
};
var ajaxResponse = $.ajax({
type: "post",
url: "./tabs/checkIfNewItemIsAvailable.cfm",
contentType: "application/json",
data: JSON.stringify( postData )
})
// When the response comes back, if update is available
//then re-draw the datatable and throw an alert to the user
ajaxResponse.then(
function( apiResponse ){
var obj = jQuery.parseJSON(apiResponse);
if (obj.isUpdateAvail == "Yes")
{
oTable = $('#MY_DATATABLE_ID').dataTable();
oTable.fnDraw(false);
originalNumberOfRecsInDatatable = obj.recordcount;
alert('A new line has been added!');
}
}
);
}
</script>
Coldfusion:
<cfset requestBody = toString( getHttpRequestData().content ) />
<!--- Double-check to make sure it's a JSON value. --->
<cfif isJSON( requestBody )>
<cfset deserializedResult = deserializeJSON( requestBody )>
<cfset numberOFRecords = #deserializedResult.originalNumberOfRecsInDatatable#>
<cfquery name="qCount" datasource="#Application.DBdsn#" username="#Application.DBusername#" password="#Application.DBpw#">
SELECT COUNT(ID) as total
FROM myTable
</cfquery>
<cfif #qCount.total# neq #variables.originalNumberOfRecsInDatatable#>
{"isUpdateAvail": "Yes", "recordcount": <cfoutput>#qCount.total#</cfoutput>}
<cfelse>
{"isUpdateAvail": "No"}
</cfif>
</cfif>
This isn't too difficult. The simple way would be to add via .append:
$( '#table > tbody:last').append('<tr id="id"><td>stuff</td></tr>');
Adding elements real-time isn't entirely possible. You'd have to run an Ajax query that updates in a loop to "catch" the change. So, not totally real-time, but very, very close to it. Your user really wouldn't notice the difference, though your server's load might.
But if you're going to get more involved, I'd suggest looking at DataTables. It gives you quite a few new features, including sorting, paging, filtering, limiting, searching, and ajax loading. From there, you could either add an element via ajax and refresh the table view, or simply append on via its API. I've been using DataTables in my app for some time now and they've been consistently cited as the number 1 feature that makes the immense amount of data usable.
--Edit --
Because it isn't obvious, to update the DataTable you call set your Datatables call to a variable:
var oTable = $('#selector').dataTable();
Then run this to do the update:
oTable.fnDraw(false);
UPDATE -- 5 years later, Feb 2016:
This is much more possible today than it was in 2011. New Javascript frameworks such as Backbone.js can connect directly to the database and trigger changes on UI elements including tables on change, update, or delete of data....it's one of these framework's primary benefits. Additionally, UI's can be fed real-time updates via socket connections to a web service, which can also then be caught and acted upon. While the technique described here still works, there are far more "live" ways of doing things today.
You can use SSE (Server Sent Events) a feature in HTML5.
Server-Sent Events (SSE) is a standard describing how servers can initiate data transmission towards clients once an initial client connection has been established. They are commonly used to send message updates or continuous data streams to a browser client and designed to enhance native, cross-browser streaming through a JavaScript API called EventSource, through which a client requests a particular URL in order to receive an event stream.
heres a simple example
http://www.w3schools.com/html/html5_serversentevents.asp
In MS SQL, you can attach a trigger to a table insert/delete/update event that can fire a stored proc to invoke a web service. If the web service is CF-based, you can, in turn, invoke a messaging service using event gateways. Anything listening to the gateway can be notified to refresh its contents. That said, you'd have to see if MySQL supports triggers and accessing web services via stored procedures. You'd also have to have some sort of component in your web app that's listening to the messaging gateway. It's easy to do in Adobe Flex applications, but I'm not sure if there are comparable components accessible in JavaScript.
While this answer does not come close to directly addressing your question, perhaps it will give you some ideas as to how to solve the problem using db triggers and CF messaging gateways.
M. McConnell
With "current" technologies, I think long polling with Ajax is your only choice. However, if you can use HTML5, you should take a look at WebSockets which gives you the functionality you want.
http://net.tutsplus.com/tutorials/javascript-ajax/start-using-html5-websockets-today/
WebSockets is a technique for two-way communication over one (TCP) socket, a type of PUSH technology. At the moment, it’s still being standardized by the W3C; however, the latest versions of Chrome and Safari have support for WebSockets.
http://html5demos.com/web-socket
Check out AJAX long polling.
Place to start Comet
No, you can't have any db code execute server side code. But you could write a service to poll the db periodically to see if a new record has been added then notify the code you have that needs pseudo real-time updates.
The browser can receive real-time updates via BOSH connection to Jabber/XMPP server. All bits and pieces can be found in this book http://professionalxmpp.com/ which I highly recommend. If you can anyhow send XMPP message upon record addition in your DB, then it is relatively easy to build the dashboard you want. You need strophe.js, Jabber/XMPP server (e.g. ejabberd), http server for proxying http-bind requests. All the details can be found in the book. A must read which I strongly believe will solve your problem.
The way I would achieve the notification is after the database update has been successfully committed I would publish an event that would tell any listening systems or even web pages that the change has occurred. I've detailed one way of doing this using an e-commerce solution in a recent blog post. The blog post shows how to trigger the event in ASP.NET but the same thing can easily be done in any other language since ultimately the trigger is performed via a REST API call.
The solution in this blog post uses Pusher but there's not reason why you couldn't install your own real-time server or use a Message Queue to communication between your app and the realtime server, which would then push the notification to the web page or client application.