Retrieving information from a web page


My application is meant to speed up the retrieval of phone call information from our telephone system.
The best way to get this information is to create a new search on the telephone system's web interface and export the results to an Excel spreadsheet which my application then imports into a DataSet.
To get the export, from the login screen, the process goes as follows:
Log in
Navigate to Reports Page
Click "Extension Detail" link
Select "Extensions" CheckBox
Select the extensions (typically all the ones currently being used) from the listbox
Specify date range
Click on Export button
It's not a big job to do it manually every day, but, for reliability, it would be great if I could make my application do this automatically the first time it starts each day.
Since more than one person in the company is going to use this application, having a Windows Service do it would be even better.
I don't know if it'll help, but the system is Datatex Topaz Next Generation telephone management system: http://www.datatex.co.za/downloads/index.html#TNG
Can anyone give me a basic idea how to do this?
Also, can anyone post links (in comments if need be) to pages where I can learn more about how to do this?

I have done something similar to fetch info from a website. I cannot give you an exact answer, but the idea is to send the login info to the page as form values. If the site relies on cookies, you can use this cookie-aware WebClient:
using System;
using System.Net;

public class CookieAwareWebClient : WebClient
{
    private CookieContainer cookieContainer = new CookieContainer();

    protected override WebRequest GetWebRequest(Uri address)
    {
        WebRequest request = base.GetWebRequest(address);
        // Share one CookieContainer so cookies set by a response
        // are sent back with every later request from this client.
        if (request is HttpWebRequest)
        {
            (request as HttpWebRequest).CookieContainer = cookieContainer;
        }
        return request;
    }
}
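For comparison, the same cookie-aware behaviour can be sketched in Python with the standard library, where an HTTPCookieProcessor plays the role of the shared CookieContainer. This is a minimal sketch, not the answerer's code; the function name make_cookie_aware_opener is hypothetical.

```python
import http.cookiejar
import urllib.request


def make_cookie_aware_opener() -> tuple[urllib.request.OpenerDirector, http.cookiejar.CookieJar]:
    """Return an opener that keeps cookies between requests, plus its jar.

    The HTTPCookieProcessor stores cookies from every response in `jar`
    and attaches matching cookies to every later request.
    """
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar
```

Every opener.open(...) call then behaves like a request made through CookieAwareWebClient: the login cookie set by the server is reused automatically.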
You should be aware that some sites rely on a session id being passed, so the first thing I did was fetch the session id from the page:
var client = new CookieAwareWebClient();
client.Encoding = Encoding.UTF8;
var indexHtml = client.DownloadString(*index page url*);
string sessionID = fetchSessionID(indexHtml);
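fetchSessionID is the answerer's own helper and its body isn't shown. A minimal Python sketch of one way to write it, assuming the session id sits in a hidden input named "sessionid" (the field name posted in the login step); the function and class names are hypothetical, and your page may embed the id differently.

```python
from html.parser import HTMLParser
from typing import Optional


class _HiddenInputFinder(HTMLParser):
    """Records the value of the first <input> whose name equals `field`."""

    def __init__(self, field: str) -> None:
        super().__init__()
        self.field = field
        self.value: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        if tag == "input" and self.value is None:
            attributes = dict(attrs)
            if attributes.get("name") == self.field:
                self.value = attributes.get("value", "")


def fetch_session_id(page_html: str, field: str = "sessionid") -> Optional[str]:
    """Return the session id from the hidden form field, or None if absent."""
    finder = _HiddenInputFinder(field)
    finder.feed(page_html)
    finder.close()
    return finder.value
```

Using an HTML parser instead of a regular expression tolerates attribute order and quoting differences in the page source.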
Then I had to log in, which you can do by posting the form values to the login page. You can find the names of the form elements with "view source", but you need to know a little HTML to do so.
var values = new NameValueCollection();
values.Add("sessionid", sessionID); //Fetched session id
values.Add("brugerid", args[0]); //Username in my case
values.Add("adgangskode", args[1]); //Password in my case
values.Add("login", "Login"); //The login button
//Logging in
client.UploadValues(*url to login*, values); //If all goes well, I'm logged in now
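The login step is an ordinary form POST. A minimal Python sketch of building that request, reusing the field names from the answer's example form (brugerid and adgangskode are specific to that site; copy the real names from "view source" or Fiddler). build_login_request is a hypothetical name.

```python
import urllib.parse
import urllib.request


def build_login_request(login_url: str, session_id: str,
                        user: str, password: str) -> urllib.request.Request:
    """Build a form POST equivalent to the UploadValues call above."""
    fields = {
        "sessionid": session_id,   # fetched from the index page first
        "brugerid": user,          # username field on the answerer's site
        "adgangskode": password,   # password field on the answerer's site
        "login": "Login",          # the submit button's name/value pair
    }
    body = urllib.parse.urlencode(fields).encode("ascii")
    # Supplying data makes urllib send a POST; when no Content-Type is set it
    # adds application/x-www-form-urlencoded, as WebClient.UploadValues does.
    return urllib.request.Request(login_url, data=body, method="POST")
```

Send it through an opener that has an HTTPCookieProcessor so the session cookie from the login response is kept for the following requests.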
And then I could download the page I needed. In your case, you could use DownloadFile(...) if the file always has the same URL (something like Export.aspx?From=2010-10-10&To=2010-11-11), or UploadValues(...), where you specify the values as before and save the result yourself.
string html = client.DownloadString(*url*);
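If the export turns out to be a plain GET with the date range in the query string, the URL can be generated for each day's run. A minimal Python sketch, assuming the hypothetical Export.aspx?From=...&To=... shape from the answer; check the real parameter names with Fiddler first. build_export_url is a hypothetical name.

```python
import datetime
import urllib.parse


def build_export_url(base_url: str, start: datetime.date, end: datetime.date) -> str:
    """Append From/To dates (ISO format) to the export page URL."""
    if end < start:
        raise ValueError("end date is before start date")
    query = urllib.parse.urlencode({"From": start.isoformat(), "To": end.isoformat()})
    return f"{base_url}?{query}"
```

The resulting URL is then fetched with the logged-in, cookie-aware client and the response bytes written to the .xls file the application imports.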
It seems you have a lot more steps than I did, but the principle is the same. To see which values you send to the site to log in, etc., you can use a program such as Fiddler (Windows), which captures the HTTP traffic. Essentially, you just do exactly the same thing, but watch out for values such as the session id, which are temporary.
The best idea is really to use some native way to fetch the data, but if you don't have access to the code, database, etc., you have to do it the ugly way. You may also need an HTML parser to extract the data (actually, you don't, because you export to a file). And last but not least, keep in mind that pages can change, so there is great potential for the login, parsing, etc. to fail.
Please ask if you are uncertain about what is going on.
ADDITION
The CookieAwareWebClient is not my code:
http://code.google.com/p/gardens/source/browse/Montrics/Physical.MyPyramid/CookieAwareWebClient.cs?r=26
Using CookieContainer with WebClient class
I also found some relevant threads:
What's a good tool to screen-scrape with Javascript support?
http://forums.asp.net/t/1475637.aspx

With an HTTP client, you need to do the following:
Log in, using cookies or HTTP authentication
Request a page
Submit form data
This means that you need some class or component in your program that can do HTTP, cookies, authentication and forms. With this, you do the same requests a user would do.

Related

How to create alert from view model

I want to validate a file. If the file is invalid, I want to refresh my page and tell the user that they did not upload a proper file. So I have this in
views/campaign.py:
try:
    wb = load_workbook(mp_file)
except BadZipfile:
    return redirect('campaign_add', client_id)
The only way I know how to do it is to add another attribute to the client class:
is_error = models.BooleanField(default=False)
And then change views/campaign.py to:
try:
    client.is_error = False
    wb = load_workbook(mp_file)
    client.save()
except BadZipfile:
    client.is_error = True
    client.save()
    return redirect('campaign_add', client_id)
With this extra attribute, I can add a check in my campaign.html file: if is_error is true, I show some kind of window with information about the bad file after the page reloads. But is there any way to do this without adding another attribute?

OK, let's say the answer is a little more complicated than you expected.
Modern UIs don't reload the page just to report errors with user input or uploads.
So what is the best user experience here?
User is uploading some file(s) from the page.
You send the file via JavaScript to a dedicated API endpoint for uploads, say /workbook/uploads/. You need to create a handler (view) for this endpoint.
The endpoint returns 200 OK with an empty body on success, or an error, say 400 Bad Request, with detailed JSON in the body describing what's wrong.
You parse the response in JavaScript and show the user what's wrong.
No refreshes are needed. 🙌
A more specific answer would need more of your implementation (view, urls, template).
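The endpoint's contract (200 with an empty body, or 400 with JSON errors) can be sketched without Django. A minimal Python sketch, assuming the check is only "is this a ZIP archive": an .xlsx workbook is a ZIP file, so this catches roughly the case the question handles as BadZipfile, but load_workbook can still reject a ZIP that isn't a workbook. validate_workbook_upload and the JSON shape are hypothetical.

```python
import io
import json
import zipfile


def validate_workbook_upload(data: bytes) -> tuple[int, str]:
    """Return (HTTP status, JSON body) for an uploaded workbook's bytes."""
    if not zipfile.is_zipfile(io.BytesIO(data)):
        errors = {"errors": {"file": "The uploaded file is not a valid Excel workbook."}}
        return 400, json.dumps(errors)
    # Success: empty body, as described above.
    return 200, ""
```

In a Django view the two results would map to HttpResponse(status=200) and JsonResponse(errors, status=400); the JavaScript side only has to look at the status code and display errors["file"].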

Html - single page - staying logged in

I have an Html page with a load of javascript that changes between views.
Some views require the person to be logged in, and consequently prompt for it.
How can I record in JavaScript that the person has successfully logged in, without creating a security issue, so that they do not have to log in again for each view? I do not want to keep going back to the server each time.
Edit:
To explain more, here are the problems I see.
Let's say I have the following in my JavaScript:
var isLoggedIn = true;
var userEmail = "myemail#mysite.com";
Anyone can hack my code to change these values and then get another person's info. That is not good. So instead of isLoggedIn do I need something like a hashed password stored in the javascript:
var userHashedPassword = "shfasjfhajshfalshfla";
But everywhere I read, they say you should not keep any password data in memory for any length of time.
So what variables do I keep and where? The user will be constantly flicking between non-user specific divs and user-based divs, and I do not want them to have to constantly log in each time.
Edit 2:
This is what I am presently doing, but am not happy with.
There is a page/div with 3 radio buttons: Vacant Games (does not require user information), My Games (requires knowledge of the user, who must be signed in), and My Old Games (also requires logged-in status).
When the page first loads, it defaults to Vacant Games and gets the info from the server, which does not require a login.
In the JavaScript I have two variables:
var g_Email = "";
var g_PasswordEncrypted = "";
Note these are both 0 length strings.
If the user wants to view their games, they click the My Games radio button. The code checks whether g_Email and g_PasswordEncrypted are zero-length strings; if they are, it goes to a div where they need to log in.
When the user submits their login info, it goes to the server, which checks their details and sends back an OK message along with all the info (My Games) that the user requested.
So if the login was a success, then
g_Email = "myemail#mysite.com";
g_PasswordEncrypted = "this is an encrypted version of the password";
If there is any failure in login, these two are instead set to "".
Then when the user navigates to any page that requires login, it checks to see if these two strings are filled. If they are, it will not go to a login page when you request information like My Games.
Instead it just sends the info in these strings to the server, along with the My Games request. The server still checks these Email and encrypted password are valid before sending back the info, but at the client side, the user has not had to repeatedly input this info each time.
If there is any failure in the server request, it just sends back an error message (I am using ajax) in the callback function, which knows to set the g_Email and g_PasswordEncrypted to "" if there is anything wrong. (In the latter case, the client side knows it has to re-request the login details because these two strings are "").
The thing I do not like is that I am keeping the encrypted password on the person's client machine. If they walk away from their machine, someone can open the debugger in something like Chrome, extract these details, and use them from their own machine some time later.

If JavaScript loads the content for each view from the server, then it is up to the server to know whether the current session belongs to a logged-in user. If the user is not logged in, the server responds with a login prompt; otherwise it sends the content of the view.
If JavaScript builds the content for the views from data it has already received from the server, then it should keep a variable holding the user's state (logged in / not logged in). Depending on that value, JavaScript either shows a login prompt or displays the content of the view.
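In the first case the server needs a way to recognise a logged-in session without the client resending the password. A minimal Python sketch of a server-side session table, assuming an in-memory store and a 30-minute lifetime (both illustrative choices); the browser would keep only the random token, typically in an HttpOnly cookie, instead of g_Email and g_PasswordEncrypted. SessionStore and its methods are hypothetical names.

```python
import secrets
import time
from typing import Optional


class SessionStore:
    """Maps opaque random tokens to the email of the logged-in user."""

    def __init__(self, lifetime_seconds: float = 30 * 60) -> None:
        self.lifetime = lifetime_seconds
        self._sessions: dict[str, tuple[str, float]] = {}

    def create(self, email: str) -> str:
        """Call once after the password has been verified; returns the token."""
        token = secrets.token_urlsafe(32)
        self._sessions[token] = (email, time.monotonic() + self.lifetime)
        return token

    def user_for(self, token: str) -> Optional[str]:
        """Return the email for a valid token, or None if unknown or expired."""
        entry = self._sessions.get(token)
        if entry is None:
            return None
        email, expires = entry
        if time.monotonic() > expires:
            del self._sessions[token]
            return None
        return email

    def destroy(self, token: str) -> None:
        """Log out: the token stops working immediately."""
        self._sessions.pop(token, None)
```

Because the token expires and can be revoked, someone who copies it from an unattended browser gets far less than they would from a stored password hash.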

Oracle APEX - HTML Links Breaks Session and Requires New Login

Ok so here is what is happening:
I have a client that I am building an application for. My client has a flowchart that they would like posted on the front page of their application. Check. My client then wants this flowchart to be set up as an image map so that a user could click one of the boxes in this flowchart and be taken to a report in another part of the application. Check.
All of that is elementary and, in a technical sense, works. The issue, and it is one I have encountered before with APEX, is that every time a user clicks one of these links, it takes them to the login screen. It seems that linking directly to a page's URL breaks the session and requires you to log in again, even if you are linking from one page in the application to another in the same application.
I have played with all of the authentication settings in the hope of fixing this, and tried to determine what exactly is breaking the session, but with no luck.
Has anyone else had this problem and could share their method for fixing it? I really can't have users logging in every time they click a link, and I also cannot simply remove the authentication on the pages. Thanks in advance.

You should pass the session id in your links. If you don't, APEX will treat the request as a new session. You can tell from the URL: note the session id in your URL when you are on your image map. When you follow one of the links, look at the session id part of the URL again. If they are different, you are starting a new session each time.
/apex/f?p=190:90:1674713700462259:::::
190 -> application id
90 -> page id
1674713700462259 -> Session id
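The check described above, comparing the session part of two URLs, can be automated. A minimal Python sketch that splits the f?p argument into its positions, assuming the documented order App:Page:Session:Request:Debug:ClearCache:itemNames:itemValues:PrinterFriendly; parse_apex_fp is a hypothetical name.

```python
def parse_apex_fp(url: str) -> dict[str, str]:
    """Map each colon-separated f?p position to its name (empty string if unset)."""
    names = ["app_id", "page_id", "session_id", "request", "debug",
             "clear_cache", "item_names", "item_values", "printer_friendly"]
    fp = url.partition("f?p=")[2]
    fp = fp.split("&", 1)[0]  # ignore anything after '&' (extra query parameters)
    # zip stops at the shorter list, so positions missing from the URL are absent.
    return dict(zip(names, fp.split(":")))
```

If parse_apex_fp(before)["session_id"] differs from parse_apex_fp(after)["session_id"], the link started a new session.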
How you pass on the session depends on where you construct your links.
In PL/SQL, you can get it through :SESSION or :APP_SESSION.
For example, in a PL/SQL dynamic region: htp.p('the session id is '||:SESSION);
In JavaScript code, you can use $v("pInstance") to retrieve the value dynamically, or use &APP_SESSION., which APEX substitutes with the value at runtime.
Small example:
function printsome(){
    var d = $("<div></div>");
    d.text('&APP_SESSION. = ' + $v("pInstance"));
    $("body").append(d);
};
So you probably just need to alter the construction of your link somewhat to include the session!

I had assumed the bind variables would do the job, but they didn't help.
The best way is to pass the current session id to an item, then use the item value in the link.
f?p=&APP_ID.:32:&P31_SESSION.::::P32_CUSTOMER_ID:#CUSTOMER_ID#
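When links are assembled outside APEX's own substitution strings, the key point is to keep the current session id in the third position and put item names and values in the seventh and eighth. A minimal Python sketch, assuming the positional order App:Page:Session:Request:Debug:ClearCache:itemNames:itemValues; build_apex_link is a hypothetical name.

```python
from typing import Optional


def build_apex_link(app_id: str, page_id: str, session_id: str,
                    items: Optional[dict[str, str]] = None) -> str:
    """Build an f?p link that reuses session_id; Request, Debug, ClearCache stay empty.

    Assumes item values contain no commas or colons, which would be
    ambiguous in this comma/colon-separated format.
    """
    items = items or {}
    names = ",".join(items.keys())
    values = ",".join(items.values())
    return f"f?p={app_id}:{page_id}:{session_id}::::{names}:{values}"
```

With no items this reproduces the shape of the URL shown earlier, f?p=190:90:1674713700462259:::::, so the user stays in the same session.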

implementing captcha in Flash

I'm developing a flash registration form and I need to incorporate dynamic 'captcha' images for confirmation.
Can anyone recommend the best solution for doing this?

Captcha is used to prevent bots from submitting HTML forms, which is easy for them because HTML is easily understood and processed programmatically. The same is not true for a Flash application: it would be difficult for a bot to submit Flash forms generically if it was not made specifically to target your site.
Therefore you don't need to worry about the spam problem captcha solves when working with a Flash application.

Making a strong captcha is not a trivial task. It must be hard enough for bots to fail, but easy enough for humans to succeed... I would take a look at existing systems and possibly use them. reCAPTCHA is popular http://recaptcha.net/ . It might be possible to use it through flash, but I have not looked into it.

It's not that different from a captcha in an HTML form, really.
Suppose you're using PHP on the server and you have a captcha.php script that generates the captcha image and saves its value in the session. In an HTML form, you'd use an <img> element and set its src to captcha.php. The user would fill in a field with the text they see in the image. In the script that receives the POST, you'd check whether the user input matches the session value.
In a Flash form, it's exactly the same. You load the image by calling captcha.php and ask the user to fill in the extra field. Then, when you post the data to the server, you pass the value the user typed in the captcha field, and the server matches it against the value it stored in the session when you called captcha.php.
So, basically, it's the same as in an HTML form.

Chances are, bots aren't going to be written for your website. If the need ever arises, a simple "add these two numbers for me, k?" would be good enough.
In all honesty, I doubt someone would write letter recognition to sign up a few hundred times on your website =/
You should be more worried about someone decompiling your .swf files and simply sending "register" messages to your server =/
And yes, by that I mean that the captcha must be checked server side; otherwise it's not that hard to get around.
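The "add these two numbers" idea only helps if the expected answer never leaves the server. A minimal Python sketch, assuming a dict stands in for the server-side session (such as PHP's $_SESSION); the Flash movie would display the question text and post back only the user's reply. new_challenge and check_challenge are hypothetical names.

```python
import secrets


def new_challenge(session: dict) -> str:
    """Create an addition question; only the question text goes to the client."""
    a = secrets.randbelow(10) + 1
    b = secrets.randbelow(10) + 1
    session["captcha_answer"] = a + b
    return f"What is {a} + {b}?"


def check_challenge(session: dict, reply: str) -> bool:
    """Check the reply server side; the stored answer is removed so it works once."""
    expected = session.pop("captcha_answer", None)
    try:
        return expected is not None and int(reply.strip()) == expected
    except ValueError:
        return False
```

Removing the answer after one check means a decompiled .swf cannot replay a single solved challenge for many "register" messages.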

We had a strong need to implement CAPTCHA into a flash animation/form.
The most important point to note is that either FF or IE (I can't remember which one) doesn't send any cookies back with a web service call. So if you're submitting your form to a .NET web service, you can't use the session state of the HTTP request to store the captcha text and then compare it with the user-entered captcha value when the form is submitted to the web service (session-enabled web method).
We implemented the following:
Set a unique token value (GUID) on the web page
Pass this token as a flashvar to the Flash movie
Load the captcha image into the Flash movie with the token as a URL parameter, i.e. captchaImg.aspx?t=xxxxxxx
During that request, save the random captcha text in a table together with the token
When the user submits their form, compare the token and the user-entered captcha value with the ones in the table
This approach works very well for us.
It’s also web farm safe.
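The token-plus-table scheme above can be sketched in Python, with a dict standing in for the database table. This is a minimal sketch under those assumptions, not the answerer's .NET code: the class and method names are hypothetical, and the case-insensitive comparison and deleting the row after a check are choices added here.

```python
import secrets
import string
import uuid


class CaptchaTokenStore:
    """Maps a per-page token (passed to the movie as a flashvar) to captcha text.

    The token travels with each request, so no cookie or session state is
    needed; a shared table makes it work across a web farm.
    """

    ALPHABET = string.ascii_uppercase + string.digits

    def __init__(self) -> None:
        self._table: dict[str, str] = {}

    def new_token(self) -> str:
        """Unique token rendered into the page (a GUID, as in the answer)."""
        return str(uuid.uuid4())

    def text_for_image(self, token: str, length: int = 6) -> str:
        """Called by the image handler (captchaImg.aspx?t=...); returns text to draw."""
        text = "".join(secrets.choice(self.ALPHABET) for _ in range(length))
        self._table[token] = text
        return text

    def verify(self, token: str, user_text: str) -> bool:
        """Compare on submit, then delete the row so the pair cannot be replayed."""
        expected = self._table.pop(token, None)
        return expected is not None and user_text.strip().upper() == expected
```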

package {
    import flash.display.Sprite;

    public class Captcha extends Sprite {
        private var question:String = "How do you feel?";
        private var _answer:String;
        private var isRobot:Boolean;

        public function Captcha(answer:String) {
            _answer = answer;
        }

        // Returns true (robot) unless the stored answer is "sad".
        public function checkAnswer():Boolean {
            isRobot = (_answer != "sad");
            return isRobot;
        }
    }
}

Retaining HTTP POST data when a request is interrupted by a login page

Say a user is browsing a website, and then performs some action which changes the database (let's say they add a comment). When the request to actually add the comment comes in, however, we find we need to force them to login before they can continue.
Assume the login page asks for a username and password, and redirects the user back to the URL they were going to when the login was required. That redirect works fine for a URL with only GET parameters, but if the request originally contained some HTTP POST data, that is now lost.
Can anyone recommend a way to handle this scenario when HTTP POST data is involved?
Obviously, if necessary, the login page could dynamically generate a form with all the POST parameters to pass them along (though that seems messy), but even then, I don't know of any way for the login page to redirect the user on to their intended page while keeping the POST data in the request.
Edit: One extra constraint I should have made clear: imagine we don't know whether a login will be required until the user submits their comment. For example, their cookie might have expired between when they loaded the form and when they actually submitted the comment.

This is one good place where Ajax techniques might be helpful. When the user clicks the submit button, show the login dialog on client side and validate with the server before you actually submit the page.
Another way I can think of is showing or hiding the login controls in a DIV tag dynamically in the main page itself.

You might want to investigate why Django removed this feature before implementing it yourself. It doesn't seem like a Django-specific problem, but rather yet another cross-site request forgery attack vector.

Two choices:
Write out the messy form from the login page, and submit it to the original page with JavaScript's form.submit().
Have the login page itself POST to the requesting page (with the previous values), and have that page's controller perform the login verification. Roll this into whatever logic you already have for detecting the not logged in user (frameworks vary on how they do this). In pseudo-MVC:
CommentController {
    void AddComment() {
        if (!Request.User.IsAuthenticated && !AuthenticateUser()) {
            return;
        }
        // add comment to database
    }

    bool AuthenticateUser() {
        if (Request.Form["username"] == "") {
            // show login page
            foreach (Key key in Request.Form) {
                // copy form values
                ViewData.Form.Add("hidden", key, Request.Form[key]);
            }
            ViewData.Form.Action = Request.Url;
            ShowLoginView();
            return false;
        } else {
            // validate login
            return TryLogin(Request.Form["username"], Request.Form["password"]);
        }
    }
}
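The first choice, writing out the form and submitting it with JavaScript, amounts to generating a page of hidden inputs. A minimal Python sketch, assuming the saved POST values are available as a dict; render_repost_form is a hypothetical name. Given the CSRF concern raised above, the target handler should still validate the request as usual rather than trusting the replayed values.

```python
import html


def render_repost_form(action_url: str, fields: dict[str, str]) -> str:
    """Return HTML that re-posts the saved values to action_url on load.

    Every name and value is HTML-escaped so saved user input cannot inject markup.
    """
    inputs = "\n".join(
        f'  <input type="hidden" name="{html.escape(name)}" value="{html.escape(value)}">'
        for name, value in fields.items()
    )
    return (
        f'<form id="repost" method="post" action="{html.escape(action_url)}">\n'
        f"{inputs}\n"
        "</form>\n"
        '<script>document.getElementById("repost").submit();</script>'
    )
```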

Just store all the necessary data from the POST in the session until the login process is complete. Or have some sort of temp table in the DB to store it in and then retrieve it. Obviously this is pseudocode, but:
if ( !loggedIn ) {
    StorePostInSession();
    ShowLoginForm();
}
if ( postIsStored ) {
    RetrievePostFromSession();
}
Or something along those lines.

Collect the data on the page where they submitted it and store it in your backend (database?) while they go through the login sequence, hiding a transaction id or similar on the page with the login form. When they're done, return them to the page they asked for by looking it up with the transaction id on the backend, and either dump all the data they posted into the form so they can preview it again, or just run whatever code that page would run.
Note that many systems, e.g. blogs, get around this by putting login fields in the same form as the one for posting comments, if the user needs to be logged in to comment and isn't yet.
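The transaction-id approach can be sketched as a small store keyed by that id. A minimal Python sketch, assuming a dict stands in for the backend table; a real table would also tie each row to the user and expire old rows. PendingPostStore, save and resume are hypothetical names.

```python
import uuid
from typing import Optional


class PendingPostStore:
    """Keeps an interrupted POST while the user logs in, keyed by transaction id."""

    def __init__(self) -> None:
        self._pending: dict[str, tuple[str, dict[str, str]]] = {}

    def save(self, target_url: str, form: dict[str, str]) -> str:
        """Store the request; return the id to hide in the login form."""
        txn_id = uuid.uuid4().hex
        self._pending[txn_id] = (target_url, dict(form))
        return txn_id

    def resume(self, txn_id: str) -> Optional[tuple[str, dict[str, str]]]:
        """After a successful login, return (target_url, form) once, or None."""
        return self._pending.pop(txn_id, None)
```

Returning the data only once means a reloaded login confirmation page cannot post the same comment twice.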

I know it says language-agnostic, but why not take advantage of the conventions provided by the server-side language you are using? If it were Java, the data could persist by setting a Request attribute. You would use a controller to process the form, detect the login, and then forward through. If the attributes are set, then just prepopulate the form with that data?
Edit: You could also use a Session as pointed out, but I'm pretty sure if you use a forward in Java back to the login page, that the Request attribute will persist.
