Getting information from a website in Processing? - html

I am currently writing a Processing program, and part of it needs to access some information from a website. The website is an HTML page where the information I need to access and parse is stored. I know how to open an HTML file, but my problem is that the program is supposed to access a list that is only generated after a login on the website. How do I do that?
This is the website, right after loading the HTML file:
http://i.imgur.com/kGIkyle.png
After a login, the website will begin to spit out data every two seconds.
I want to access the data in the ordered list, and I want to do so every two seconds from my Processing program. How do I do that?
This is the website a moment after a login:
http://i.imgur.com/O743fNJ.png

When you use a web browser to submit a login, you're really interacting with the server. Usually the web browser submits a POST request containing the login information (like a username and password), and the server responds with the next webpage to load.
The details of this are going to depend on the website you're interacting with. Some websites might use AJAX to submit the data and then trigger some JavaScript to run.
The point is, you're going to have to understand exactly how the underlying web server and webpage work. Then you're going to have to use the rules of those interactions to issue the appropriate requests from your Processing code.
It might be as simple as submitting the login credentials in the URL itself and then just scraping the information from the webpage.
More likely, you're going to have to interact with some kind of web API and do the requests yourself. Google "Java post request" for more info.
Of course, all of this assumes that the website is open to people using it. If this website isn't yours, it could also be locked down and unavailable to you.
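The POST-then-scrape route described above could look roughly like the following in plain Java, which can also be called from a Processing sketch. This is only a sketch under assumptions: the form field names "username" and "password" and the login URL are hypothetical, and it assumes the list is present in the HTML that the server returns. If the page fills the list with JavaScript after loading, you would have to request whatever URL that script calls instead.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UnsupportedEncodingException;
import java.net.CookieHandler;
import java.net.CookieManager;
import java.net.HttpURLConnection;
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LoginScrapeSketch {

    // Encode form fields as application/x-www-form-urlencoded text.
    static String formBody(Map<String, String> fields) {
        StringBuilder sb = new StringBuilder();
        try {
            for (Map.Entry<String, String> e : fields.entrySet()) {
                if (sb.length() > 0) sb.append('&');
                sb.append(URLEncoder.encode(e.getKey(), "UTF-8"))
                  .append('=')
                  .append(URLEncoder.encode(e.getValue(), "UTF-8"));
            }
        } catch (UnsupportedEncodingException ex) {
            throw new IllegalStateException(ex); // UTF-8 is always available
        }
        return sb.toString();
    }

    // Send the body as a POST request and return the response text.
    static String post(String url, String body) throws IOException {
        HttpURLConnection conn =
            (HttpURLConnection) URI.create(url).toURL().openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        conn.setRequestProperty("Content-Type", "application/x-www-form-urlencoded");
        try (OutputStream out = conn.getOutputStream()) {
            out.write(body.getBytes(StandardCharsets.UTF_8));
        }
        try (InputStream in = conn.getInputStream()) {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            byte[] chunk = new byte[4096];
            int n;
            while ((n = in.read(chunk)) != -1) buf.write(chunk, 0, n);
            return new String(buf.toByteArray(), StandardCharsets.UTF_8);
        }
    }

    // Pull the text of every <li>...</li> element out of an HTML string.
    // A regex is enough for a simple, flat list; it is not a full HTML parser.
    static List<String> listItems(String html) {
        List<String> items = new ArrayList<>();
        Matcher m = Pattern
            .compile("<li(?:\\s[^>]*)?>(.*?)</li>",
                     Pattern.CASE_INSENSITIVE | Pattern.DOTALL)
            .matcher(html);
        while (m.find()) items.add(m.group(1).trim());
        return items;
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 3) {
            // Usage: <loginUrl> <user> <password>. Field names are hypothetical.
            CookieHandler.setDefault(new CookieManager()); // keep the session cookie
            Map<String, String> fields = new LinkedHashMap<>();
            fields.put("username", args[1]);
            fields.put("password", args[2]);
            System.out.println(listItems(post(args[0], formBody(fields))));
        } else {
            // Offline demo on a hand-written page.
            System.out.println(listItems("<ol><li>12</li><li>34</li></ol>"));
        }
    }
}
```

To refresh the data every two seconds in Processing, one option is to compare millis() against the time of the last request inside dra() and fetch again once 2000 ms have passed.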

Related

Caching of web pages and SPA

I have read about SPAs (single page applications) and learned that their biggest advantage is that they save network traffic, because an SPA downloads all (or at least most) of the application resources when the page is first loaded.
But I am not clear on this: suppose I have specified all my resources in my index.jsp and they are downloaded when index.jsp loads. My application navigation starts from index.jsp, so to navigate I submit my form, which has action="user.jsp".
Since the form has action="user.jsp", on submitting it my web browser will send a request to the server to get user.jsp. Please correct me if I am wrong. Or will it be taken from the HTTP cache? But let's say that through some Apache setting (I have read somewhere that it is possible, but I don't know how to do it) I have disabled HTTP caching of web pages; then user.jsp will be downloaded from the server.
I would much appreciate it if somebody could give some good insight on this. Basically I am confused by the fact that action="user.jsp" leads to a call to the server while HTTP/the browser can also cache web pages.
P.S.: I accidentally posted my question here as a guest user and am now unable to remove it, so if you have moderation authority, please remove that question to avoid duplication.

Get downloads & file browsing requests from HTMLFrame

I am able to load and display files using the HTMLLoader class. http://help.adobe.com/en_US/FlashPlatform/reference/actionscript/3/flash/html/HTMLLoader.html
The problem is that when a user navigates to a download link or an upload button, nothing happens. I heard somewhere that any downloads get sent to the user's main documents folder. Is there any way to intercept this and get some details? Someone in my browsing history suggested using the Socket class to fetch its data and the File class to control where it goes. I couldn't make out the demonstration.
Bonus question: what properties do I have to set to make Google understand that this browser is not a bot? I get this in plain text when trying to navigate to http://www.google.com . Its other services work completely fine though.
Google
Sorry...
We're sorry...
... but your computer or network may be sending automated queries. To protect our users, we can't process your request right now.

Prefill webpage from Lotus Notes client application

I am looking to prefill a webpage (a form on a WordPress site) from a Lotus Notes application. My investigations to date suggest that appending a query string to the URL should work; however, I am missing something here.
www.url.com/register?fname="bob"&sname="smith" etc...
I thought this could be done, but am not sure where the starting point would be?
Ideal solution - webservice from the owner of url.com to allow us to populate and get a response once done. They are not keen to do this as it does not save them any time, just our business.
I am looking to understand how to prepopulate the webpage so that the user just has to submit the form at url.com.
You could go mega-sneaky and use the Selenium WebDriver to launch a browser session on your client. It has full control over the browser and can sniff out the fields you need to populate. I wrote some simple code for starters - but the samples on the original web site might be more comprehensive.
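For the query-string route mentioned in the question, here is a minimal Java sketch of building the URL with properly encoded values. Note that the values are not wrapped in quotes; in the question's example the quote characters would become part of the value. The class and method names are made up for illustration, and whether the form actually reads fname and sname from the query string depends on how the form on the site is set up, which is an assumption here.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

class PrefillUrlSketch {

    // Append URL-encoded name=value pairs to a base URL that has no query yet.
    static String withQuery(String baseUrl, Map<String, String> params) {
        StringBuilder sb = new StringBuilder(baseUrl);
        char sep = '?';
        try {
            for (Map.Entry<String, String> e : params.entrySet()) {
                sb.append(sep)
                  .append(URLEncoder.encode(e.getKey(), "UTF-8"))
                  .append('=')
                  .append(URLEncoder.encode(e.getValue(), "UTF-8"));
                sep = '&'; // every pair after the first is joined with '&'
            }
        } catch (UnsupportedEncodingException ex) {
            throw new IllegalStateException(ex); // UTF-8 is always available
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("fname", "bob");
        params.put("sname", "smith");
        System.out.println(withQuery("http://www.url.com/register", params));
    }
}
```

If the form ignores these parameters, the page itself would need a small script or plugin setting to copy them into the fields, which only the site owner can add.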

How to solve this issue with the HTML5 manifest?

From my experiences so far, I've concluded that the HTML5 Manifest scheme was really terribly designed.
My site serves a manifest file when a user is logged in. Unfortunately, when they log out, they can still access the cached protected materials. Can anyone think of a way to fix this?
A manifest file is designed to take a website offline and still let the user navigate it. It essentially just tells the browser to download that stuff and keep it in cache. If you're adding secret stuff to the manifest and the user goes offline, he needs to still be able to access it. Otherwise, what's the point of having a special logged-in manifest file if he has to be logged in (and therefore online)?
You could add JavaScript that checks whether the user is online again and, if he is, tries to validate the "login state" and then redirects or removes the secret stuff from localStorage (assuming you use localStorage to save the "secret" stuff and JavaScript to display it, instead of a manifest file).
Let's say the secret stuff is an image and you are not using a manifest file, but just displaying images when the user is logged in, and it's crucial that the user can't view that image after logout. You would then need to set the HTTP caching headers to no-cache and the expiry date to some date in the past, so that a normal user would not see it anymore. The problem then is that the image is downloaded every time somebody visits the website.
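As a rough illustration of the no-cache headers described above, here is a sketch in Java. It uses the Headers class from the JDK's built-in com.sun.net.httpserver package purely as an example container; the actual site may run on any server stack, so treat the class and method names here as hypothetical and only the header names and values as the point.

```java
import com.sun.net.httpserver.Headers;

class NoCacheSketch {

    // Set response headers that tell browsers not to store or reuse the response.
    static void applyNoCache(Headers headers) {
        headers.set("Cache-Control", "no-store, no-cache, must-revalidate");
        headers.set("Pragma", "no-cache"); // for old HTTP/1.0 caches
        headers.set("Expires", "0");       // treated as already expired
    }

    public static void main(String[] args) {
        Headers h = new Headers();
        applyNoCache(h);
        // Print each header with its list of values.
        h.forEach((name, values) -> System.out.println(name + ": " + values));
    }
}
```

In a real handler built on that package you would pass exchange.getResponseHeaders() to applyNoCache before sending the protected image.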
You need to approach the HTML5 Application Cache in a different way. It is not useful for caching server-side dynamically generated pages, especially those that require a login to reach. The Application Cache has no concept of logins, nor securing a page from somebody with a different/no login.
It is much more appropriate for an AJAX-based site, where all HTML/CSS/JavaScript is static and registered in the Application Cache, and data is instead fetched via AJAX then used to populate pages. If you need to cache data in the application for offline use, then use one of the offline data storage mechanisms such as Local Storage/Session Storage, or IndexedDB, for data.
You can then make your own judgement on how much data you want to cache offline, since there's no way to validate a login without making a call to the server, which is naturally inaccessible whilst offline.
What if, when the user logs out or is not logged in, they get a manifest with only NETWORK: *?

Access XML page via ActionScript 3 (bypassing login screen first)

Need some help here :P
What I'm trying to do is simply get some data from an XML page located on a server.
However, the server first requires a username/password combination before I even get to see the XML content. It presents a login form that requires the user to provide credentials. Once the user hits the login button, a JS function runs which logs the user in and then presents the XML content without ever redirecting the user to a different page.
So what I'm trying to ask is: is there a way (and if so, how) to retrieve the XML of a page that first requires me to provide login details to the server?
Cheers
I'm assuming the XML data on the server is dynamic; otherwise you could obviously just copy the data and bundle it into your own website.
I'm not sure of the nature of this data, but sometimes data can be accessed through a website's backdoor, legally. You could try a quick search to see if this data is available publicly, or even contact the data holder to find out for sure. In any case, you'll need a cross-domain policy file to access data that is not hosted on your own domain.
You cannot load variables or XML data into a Flash movie from another domain. For example, a Flash movie loaded from http://www.yourserver.com/flashmovie.swf can access data residing at http://www.yourserver.com/data.txt. The text file is located within the same domain as the SWF.
However, an attempt to load data from http://www.NotMyServer.com/data.txt will fail and no error messages are displayed. The load action will cause a warning dialog to appear.
Note: This security feature does not affect Flash movies playing in stand-alone projectors.
If the data is publicly available, there is probably a way to bypass this security restriction by using JavaScript and ExternalInterface to capture the data, but I'm not well versed in such routines.
This security restriction is not applicable to AIR applications.
More: Cross-domain policy for Flash movies