Permanent links to thumbnails in Google Drive API

I'm using Google Drive API (PHP) to upload some photos to my Drive. When a file is uploaded, a Google_DriveFile object is returned in the response to confirm the successful transfer. It includes a field called thumbnailLink, accessible through the getThumbnailLink getter. Its content may look like this:
https://lh4.googleusercontent.com/dqVdU195R4_0ZtWxsJlhW1Fr2K30xa2hH3V1KV4UrTBl9QkhOSR0ZqN9HoB-TjEQv8SIJw=s220
Until today, I was sure that the link doesn't change by itself over time. However, when I tried to display a thumbnail of a photo I have on my Drive, using a cached address I keep in my local database, I got a 403 error - you can see it under the mentioned link. I asked the API for the current link to the thumbnail and it's now completely different.
It happened to me only once but for multiple files, i.e. all the files I had on my Drive suddenly got new thumbnail links.
Is there a way to quickly retrieve a thumbnail of a document (preferably, a photo) by some constant value or to be sure that it won't change? The perfect solution would be to access the thumbnail under a link that includes the document's id instead of some hash that may change.

Try this:
https://drive.google.com/thumbnail?authuser=0&sz=w320&id=[fileid]
Where:
sz is the size; you can give it as w (width, e.g. w320) or h (height, e.g. h320)
fileid is the file's ID. You can find it via the "Share" option when you right-click the file in the Google Drive UI.
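For illustration, a minimal sketch of building such a link from a stored file ID in Python (the size and the file ID are placeholders):
def thumbnail_url(file_id, size="w320"):
    # Build a Drive thumbnail link from a file ID; w320 requests a 320px-wide image.
    return f"https://drive.google.com/thumbnail?authuser=0&sz={size}&id={file_id}"

print(thumbnail_url("yourFileIdHere"))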

I have gone through the API documentation, which states:
Important: Thumbnails are invalidated each time the content of the file changes. When supplying thumbnails, it is important to upload new thumbnails each time the content is modified.
According to that, a new thumbnail is generated only when the contents of the file are modified. But in your case a really weird thing happened: the contents did not change, yet the thumbnails did. The documentation offers no batch process for this, but there is another way around it, i.e. a web hook.
According to the documentation there is a web hook available, i.e. the Files: watch process, through which one can track the changes made to a file. So every time the contents change, the hook runs and you can update your cached thumbnail link.
An HTTP request can be sent to start watching a file for changes:
POST https://www.googleapis.com/drive/v2/files/fileId/watch
Here fileId is the ID of the file you want to watch.
In the request body, supply data with the following structure:
id => string. A UUID or similar unique string that identifies this channel.
token => string, optional. An arbitrary string delivered to the target address with each notification delivered over this channel.
expiration => long, optional. Date and time of notification channel expiration, expressed as a Unix timestamp, in milliseconds.
type => string. The type of delivery mechanism used for this channel. The only option is web_hook.
address => string. The address where notifications are delivered for this channel.
If the contents get changed, a new thumbnail is generated, the hook notifies your address, and from there you can fetch the new information.
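As a hedged sketch, such a watch request could be issued with Python and the requests library (the file ID, access token, channel id, and callback address below are all placeholders):
import json
import requests

FILE_ID = "yourFileId"                 # placeholder: the file's ID
ACCESS_TOKEN = "yourOAuthAccessToken"  # placeholder: a valid OAuth 2.0 token

body = {
    "id": "01234567-89ab-cdef-0123-456789abcdef",   # a UUID identifying this channel
    "type": "web_hook",                             # the only supported delivery mechanism
    "address": "https://example.com/notifications", # placeholder: your HTTPS callback
}

resp = requests.post(
    f"https://www.googleapis.com/drive/v2/files/{FILE_ID}/watch",
    headers={"Authorization": f"Bearer {ACCESS_TOKEN}",
             "Content-Type": "application/json"},
    data=json.dumps(body),
)
print(resp.status_code, resp.json())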

Here is another solution. Let's say we store only the Google Drive ID of the images or PDFs (Google generates thumbnails for many file types).
We can send a request to Drive to get a valid thumbnail on demand, since it looks like thumbnails expire even if there are no changes to the file.
In this case each thumbnail lives inside an Angular component. If you use something else, you can create an array of links and iterate through it to create the proper thumbnail links.
Here is the code:
const thumb = () => {
  if (this.item.DriveId) {
    this.getThumb(this.item.DriveId, this.authToken)
      .then(response => {
        console.log(`response from service ${response}`);
        // Set the thumbnail width to 300px (or any other width if needed)
        // by replacing the trailing size suffix of the returned link.
        this.item.externalThumbnailId = response.slice(0, -3) + 300;
      })
      // Here we can handle cases when the API limit (10 requests/sec) is exceeded.
      .catch(e => {
        if (e.data.error.message == 'User Rate Limit Exceeded') {
          console.log('Failed to load thumb, trying one more time');
          setTimeout(thumb, 1000);
        } else {
          console.log(e);
        }
      });
  }
};

// Call this function on component load.
thumb();
Another solution would be to write a backend script that updates the thumbnail links in your DB records.
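A hedged sketch of such a backend script in Python, assuming your DB gives you (record_id, drive_id) pairs and using the Drive v2 files endpoint to re-fetch thumbnailLink (the update_record helper is hypothetical):
import requests

def refresh_thumbnails(rows, access_token):
    # rows: iterable of (record_id, drive_id) pairs from your DB (assumed shape)
    for record_id, drive_id in rows:
        resp = requests.get(
            f"https://www.googleapis.com/drive/v2/files/{drive_id}",
            params={"fields": "thumbnailLink"},
            headers={"Authorization": f"Bearer {access_token}"},
        )
        if resp.ok:
            update_record(record_id, resp.json().get("thumbnailLink"))  # hypothetical DB helper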


Obtain list of My Places from Google Maps

I am trying to obtain the list of places the user has saved on Google Maps. Now I know there isn't an API for this (for whatever reason), but I saw here:
"My Places" Google Maps API
that apparently there used to be a way to obtain the URL, but it does not seem to work with my list of places.
E.g.
https://www.google.com/maps/#46.889424,0.1194148,6z/data=!4m3!11m2!2s1KbZtik1IdXyNhwfXEb3P9vaZvzU!3e3
Does not seem to work if I append &output=kml or &output=json
I created this list on Google Maps, then hit share and obtained that link.
I even tried parsing the resulting HTML, but it seems everything is handled by some JavaScript engine and I can't find any reference to Google IDs there. I don't even know how they handle clicks!
Any help? There must be a way to retrieve this information programmatically!
EDIT:
I managed to get something working by visiting the shared link, then processing the HTML and storing the window.APP_INITIALIZATION_STATE variable. I then convert it to a JavaScript array and loop over it. Deep inside the array/map structure, I managed to get the Google name and Google place ID out of that array. That works to a degree, but with lists over 20 items long, Google only returns the first 20 and waits for the user to 'scroll down' to get the next 20. That scrolling triggers another call to get the next 20 results, which looks a bit like:
https://www.google.com/search?tbm=map&fp=1&authuser=0&hl=en&gl=nl&pb=!4m8!1m3!1d54065472.4384380........
I can see the original feature ID being included at the end of the URL, but I have no idea how to construct this URL in full to get the next 20 items. Any ideas?
Your saved places list actually has what you could call a feature ID attribute. This isn't a common practice and Google frowns upon this technique, but take a look at this URL:
https://www.google.com/maps/preview/entity?authuser=0&hl=en&gl=us&pb=!1m10!1s0x0%3A0x3743ae09a161976b!3m8!1m3!1d14318.72623152007!2d-98.2296425!3d26.2070353!3m2!1i1024!2i768!4f13.1!12m3!2m2!1i392!2i106!13m57!2m2!1i203!2i100!3m2!2i4!5b1!6m6!1m2!1i86!2i86!1m2!1i408!2i200!7m42!1m3!1e1!2b0!3e3!1m3!1e2!2b1!3e2!1m3!1e2!2b0!3e3!1m3!1e3!2b0!3e3!1m3!1e8!2b0!3e3!1m3!1e3!2b1!3e2!1m3!1e9!2b1!3e2!1m3!1e10!2b0!3e3!1m3!1e10!2b1!3e2!1m3!1e10!2b0!3e4!2b1!4b1!9b0!14m3!1snyc5W-WeHY3r5gLwkoRI!7e81!15i10112!15m19!2b1!5m4!2b1!3b1!5b1!6b1!10m1!8e3!14m1!3b1!17b1!24b1!25b1!26b1!30m1!2b1!36b1!52b1!53b1!21m28!1m6!1m2!1i0!2i0!2m2!1i458!2i768!1m6!1m2!1i974!2i0!2m2!1i1024!2i768!1m6!1m2!1i0!2i0!2m2!1i1024!2i20!1m6!1m2!1i0!2i748!2m2!1i1024!2i768!22m1!1e81!29m0!30m1!3b1
Embedded in it is the feature ID from the link you posted:
https://www.google.com/maps/#46.889424,0.1194148,6z/data=!4m3!11m2!2s1KbZtik1IdXyNhwfXEb3P9vaZvzU!3e3
Along with other Maps parameters. When you hit that link, you're actually manually triggering the same callback that Google's own scripts in Maps use to parse the data fed back to the Maps UI. If you look at array item 2, or {c:...}, you'll find a stringified array with the contents of your list. Depending on the programming language you're using, all it takes is a little tweaking (find/replace, loop through, lint and trim, etc.) of this array and you can pull your results. The cool thing is that if you add or remove a place, the next time you hit that endpoint it's updated in real time.
Some people may call it a "hack", but it gets the job done. :)
Hope I pointed you in a useful direction in the event you haven't found a solution; give this a shot.
Note the URL has to be pasted in its entirety; SO truncated the hyperlink. Copy and paste the whole thing in one shot and Google will return a text file with the arrays. In my case I curl the URLs I need and parse the returned strings as needed to pull data from Google where their API has limitations. Just a tip. :)
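If you go the curl route, here is a hedged sketch of fetching and parsing that endpoint with Python (the full pb URL from above is truncated here; the exact anti-hijacking prefix is an observation from such responses, not documented):
import json
import requests

url = "https://www.google.com/maps/preview/entity?authuser=0&hl=en&gl=us&pb=..."  # paste the full URL

raw = requests.get(url).text
# Responses like this start with a short anti-hijacking prefix (e.g. )]}' )
# before the JSON payload, so scan forward to the first bracket.
data = json.loads(raw[raw.index("["):])
print(data[2])  # array item 2 holds the stringified list contents, per the answer above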
Also check Joel's answer; he did some research and refined some of the following information.
Pagination
You can use this tool to decode the pb-parameter. PB stands for protocol buffer (protobuf) and Google uses its own kind of it for Maps. You can find different decoders for this by googling it.
In my case, the pagination was done via one parameter (8iX0). It seems that it always comes with another similar parameter (7i20), but I don't know what that one does. I can't yet confirm that this is always the case, but from my experience you're basically looking for two integers that are 20/40/60 etc. apart.
Here's what this looks like for me:
page 2 (7i20, 8i20)
page 3 (7i20, 8i40)
page 4 (7i20, 8i60)
From this information, I tried 7i20, 8i00 for page 1, and that seemed to work. For lists with >100 items, it just continues like that (8i120, 8i140, etc.).
Here's a code snippet in Python (quick & dirty). Make sure to add (long) delays if your list has many pages, as you will eventually get rate-limited by captchas if you don't. Notice the 8i%s0 in the URL; make sure to put the %s back when you paste your own pb-block.
url = "https://www.google.com:443/search?tbm=map&pb=!7i20!8i%s0!..."
headers = {"Referer": "https://www.google.com/"}
def fetch_stops_from_maps():
new_results = -1
page = 0
results = []
while new_results != 0:
new_results = 0
x = requests.get(url % page, headers=headers)
txt = html.unescape(x.text)
txt = txt.split("\n")[1]
results = re.findall(r"\[null,null,[0-9]{1,2}\.[0-9]{4,15},[0-9]{1,2}\.[0-9]{4,15}]", txt)
print(len(results))
for cord in results:
# curr = the description you can manually type in when saving
curr = txt.split(cord)[1].split("\"]]")[0]
curr = curr[curr.rindex(",\"") + 2:]
cords = str(cord).split(",")
lat = cords[2]
lon = cords[3][:-1]
results.append(s)
new_results += 1
page += 2
Actually getting the correct URL
Getting the correct URL currently seems to be the hardest part of doing this, and I have not fully figured it out either. However, for my use-case this is not really important, so I extracted the correct pb-block once and called it a day.
As explained in the other answers, the id of the list is visible in the basic URL (here, the 2sXX...) when you navigate to the list in your browser. It seems to usually be 24-32 (?) characters long.
.../maps/<coords>/data=!4m3!11m2!2sXXXX...XXXX!3e3
If you have this id, you can put it into an existing protobuf-block and it may work (I only tested this with 3 different lists, which were all created by the same account, so this theory is far from proven).
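For illustration, a hedged sketch of pulling that id out of a shared URL with a regular expression in Python (the pattern is an assumption based on the format shown above):
import re

def extract_list_id(shared_url):
    # The list id sits between "!2s" and "!3e" in the data= block, e.g.
    # .../data=!4m3!11m2!2s1KbZtik1IdXyNhwfXEb3P9vaZvzU!3e3
    match = re.search(r"!2s([A-Za-z0-9_-]+)!3e", shared_url)
    return match.group(1) if match else None

print(extract_list_id(
    "https://www.google.com/maps/#46.889424,0.1194148,6z/"
    "data=!4m3!11m2!2s1KbZtik1IdXyNhwfXEb3P9vaZvzU!3e3"
))  # -> 1KbZtik1IdXyNhwfXEb3P9vaZvzU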
Now, how do you get the block? I would just share the one I have, but because I only understand parts of what it does, I fear that it may contain some personal info. Instead, I will share my process of getting it. For this I use Burpsuite. It's a program mainly used for web-security testing and has a free community edition, however for our use-case it is the perfect tool, because with it you can easily tinker with requests, change small parts in the request, send it again and immediately see if your changes changed the response. However for extracting the pb-block, one should also be able to use any program that can intercept browser traffic.
Here's the basic rundown with Burp:
From GMaps, share a list that has >20 items (this is important) and copy the public link
In Burp, go to the tab "Proxy", make sure "Intercept" is off and click "Open browser" to open the integrated chromium browser
There, paste the link and wait until maps loaded completely
In Burp, turn "Intercept" on, then in google maps, scroll down in the list, until it starts loading new results (always blocks of 20)
Burp now intercepted all requests the browser made since you turned intercepting on. Click "Forward" and go through all requests, until you see a request in the format
GET /search?tbm=map&authuser=0&hl=de&gl=de&pb=!7i20....
This is what you're looking for.
Optionally, you can now right click into the request-text and click "send to repeater", then switch to the repeater-tab. Here you can edit the request and then send it again, being able to see the response immediately. For example, removing the authuser, hl, gl, q, ech, psi url parameters, the request still works flawlessly. If you remove the tch=1 parameter, the response you get will be in a more human readable format.
In the request-text you should now be able to just search for the list-id you got from the link previously and replace it with the id of another list (search bar is at the bottom in burp). As I said, this worked for me, but it may be possible that the pb-block contains some additional metadata that makes lists from different google-accounts or different types of lists incompatible with specific pb-blocks. Just a theory though. Let me know how it goes!
Further automating
I have theorised that one could automate getting the pb-block using requests-html, because it can fully render HTML sites, but it doesn't get updated anymore. Another option (probably the better one) is Selenium Wire, as you should be able to load the page and intercept the requests like we did in Burp. Seems like a whole lot of work tho :D
The only API I was able to find was this:
https://www.google.com/bookmarks/?output=xml
Used in a browser, you would have to first log in through Google's OAuth. It would then return your saved places. I am not sure at the moment how you would embed the authentication to do this programmatically, but this might send you in the right direction.
I was able to extract the data I needed from my google maps list. Below are some comments that expand on some of the other comments here, along with a script that extracts all of the relevant data points from the network response.
Obtaining the underlying URL
You can easily find this URL by just opening the devtools on your browser, going to the network tab, and refreshing the webpage or scrolling down on the list until it loads new results (the list must be larger than 20 results). You should be able to find the network request that starts with https://www.google.com/search?tbm=map&pb... and go from there.
Increase the results size
I was able to increase the number of results returned from the request by changing the value of the 7i20 parameter. From what I can tell, the 7iXX parameter is the size of the page, and the 8iXX parameter is the starting point. I haven't tested how large you can make the page limit, but I tested 100 and it seemed to work fine. This should make dealing with larger lists much easier.
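As a hedged illustration (the parameter layout is taken from the observations above; the pb string here is truncated), bumping the page size could look like:
import re

pb = "!7i20!8i0!..."  # the pb-block captured from your own network tab, truncated here

# Ask for 100 results per page instead of 20; 8i0 keeps the offset at the start.
pb_large = re.sub(r"!7i\d+", "!7i100", pb)
print(pb_large)  # -> !7i100!8i0!...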
Parsing out the data
Instead of using regex to parse out the relevant data from the response, I found that the response is basically just a massive JSON object and I was able to identify the indexes for specific types of data, such as the name of the place, location, notes, etc. See the script below.
If you look at the buildResults function in the script below, you can see the exact indexes used to extract specific pieces of information. These may of course change over time if the network response changes format at all, so use them as a starting point in case the specific values aren't at those indexes anymore; hopefully they would still be close to those locations.
Script to parse the data (javascript / node.js)
// Insert the raw text content from the network response from the
// https://www.google.com/search?tbm=map&pb... url below.
const rawInput = null

function prepare(input) {
  // There are 5 random characters before the JSON object we need to remove.
  // Also I found that the newlines were messing up the JSON parsing,
  // so I removed those and it worked.
  const preparedForParsing = input.substring(5).replace(/\n/g, '')
  const json = JSON.parse(preparedForParsing)
  const results = json[0][1].map(array => array[14])
  return results
}

function prepareLookup(data) {
  // This function takes a list of indexes as arguments,
  // constructs them into a line of code and then
  // execs the retrieval in a try/catch to handle data not being present.
  return function lookup(...indexes) {
    const indexesWithBrackets = indexes.reduce((acc, cur) => `${acc}[${cur}]`, '')
    const cmd = `data${indexesWithBrackets}`
    try {
      const result = eval(cmd)
      return result
    } catch (e) {
      return null
    }
  }
}

function buildResults(preparedData) {
  const results = []
  for (const place of preparedData) {
    const lookup = prepareLookup(place)
    // Use the indexes below to extract certain pieces of data
    // or as a starting point of exploring the data response.
    const result = {
      address: {
        street_address: lookup(183, 1, 2),
        city: lookup(183, 1, 3),
        zip: lookup(183, 1, 4),
        state: lookup(183, 1, 5),
        country_code: lookup(183, 1, 6),
      },
      name: lookup(11),
      tags: lookup(13),
      notes: lookup(25, 15, 0, 2),
      placeId: lookup(78),
      phone: lookup(178, 0, 0),
      coordinates: {
        long: lookup(208, 0, 2),
        lat: lookup(208, 0, 3)
      }
    }
    results.push(result)
  }
  return results
}

const preparedData = prepare(rawInput)
const listResults = buildResults(preparedData)
console.log(listResults)

All data is gone on page reload. Is there any way to avoid that?

I have developed a dashboard application in AngularJS that has search functionality and multiple views showing lots of different data coming from that search.
It's a single-page application, so I am facing an issue: on page reload all the data is gone and the views are blank.
Is there any way to solve this issue, or any approach that would help me create a single-page application that maintains the same data across page reloads?
Any suggestions will be appreciated.
(Screenshots: the search page, the data after search, and the blank data after reload.)
You can use sessionStorage to set & get the data.
Step 1:
Create a factory service that will save and return the saved session data based on the key.
app.factory('storageService', ['$rootScope', function($rootScope) {
  return {
    get: function(key) {
      return sessionStorage.getItem(key);
    },
    save: function(key, data) {
      sessionStorage.setItem(key, data);
    }
  };
}]);
Step 2:
Inject the storageService dependency in the controller to set and get the data from the session storage.
app.controller('myCtrl', ['storageService', function(storageService) {
  // Save session data to storageService on a successful response from the $http service.
  // Note that sessionStorage stores strings, so use JSON.stringify/JSON.parse for objects.
  storageService.save('key', 'value');

  // Get the saved session data from storageService on page reload.
  var sessionData = storageService.get('key');
}]);
You need to save the data before changing the state of the application / moving to another page or route.
So, save that data using one of:
Angular services (save the data, change route, come back, check the data in the service, and reassign your variables from it).
Local storage (save the data, change route, come back, read it back from local storage, and reassign your variables).
$rootScope (set the data on $rootScope and move on; after coming back, check the variables and reassign).
Your problem is not saving the data, but saving the state in your URL. As it is, you only save the "step", but no other information, so when you reload the page (refresh, or close/re-open browser, or open bookmark, or share...) you don't have all the information needed to restore the full page.
Add the relevant bits to the URL (as you have a single page app, probably in the fragment identifier). Use that information to load the data, rather than relying on other mechanisms to pass data between the "pages".
Get the data again on page refresh, with something like:
$rootScope.$on('$stateChangeStart', function() {
  // here make a call to the source
  $http(); // request and try to get the data again
});
Use localStorage, or temporarily save the data for each user in a database on your server and get it back with an API call (localStorage has a 10MB limit).
You can have your code try to retrieve the localStorage values first if they exist.

Facebook Graph API get page post's attachments

I am trying to get the url of an image attachment published by a facebook page. The goal is to embed that image in a webpage in order for the website to always display the last image attachment published by the page. I own both website and FB page.
I have not yet grasped all the details of handling the Facebook Graph API, but here is what I have done so far:
1) in FB developers website, I have created an application, getting its App ID and secret;
2) I used that information to get an access token (I just pasted the following URL in my browser):
https://graph.facebook.com/oauth/access_token?client_id={my-client-id}&client_secret={my-client-secret}&grant_type=client_credentials
3) in my website, I have loaded the Facebook JS SDK after the body tag, and also got my Facebook page ID from the Facebook Page administration;
4) now the real question begins: how can I query Facebook to get the information that I need, i.e. the source URL of the last published image?
The best result I have gotten so far was by making a getJSON call with the help of jQuery:
var fbfeed = $j.getJSON('https://graph.facebook.com/{my-page-id}/feed?access_token={my-token}&fields=attachments&limit=1');
This gets and stores a JSON array in the fbfeed variable (please correct me if I'm wrong). One of the keys of that array is called "src", and it contains the source URL of the attachment: the information I need to embed that picture in my website.
I have the following problems / concerns:
- I have not found a way to retrieve the value of the "url" key. How can I parse the fbfeed variable and extract it?
- I have concerns with my usage of the access token: is it problematic to expose the access token this way, by using it in a jQuery function? Is it a security risk? If so, can I "mimic" this request using a server-side language such as PHP?
- Will this access token expire, i.e. will I need to repeat step 2 from time to time to "refresh" it?
Thanks for your help.
I have managed to get the information I needed using server-side code, although it may not be the cleanest solution: it iterates through the last 5 posts of my page until it finds an image and a post URL:
<?php
$url = 'https://graph.facebook.com/{page-id}/feed?access_token={access-token}&fields=attachments,link&limit=5';
$json = file_get_contents($url);
$json_data = json_decode($json, true);

for ($count = 0; $count < 5; $count++) {
    $imagesource = $json_data['data'][$count]['attachments']['data'][0]['media']['image']['src']; // gets the image url
    $postlink = $json_data['data'][$count]['link']; // gets the post url
    if (isset($imagesource) && isset($postlink)) {
        // do stuff with the image and post url
        break;
    }
}
// then I can do other stuff as fallback if the image url and post url are not found

Posting image to Facebook album with AS3 API

I'm having trouble with posting an image from my canvas application to the user's albums. According to the Facebook docs:
In order to publish a photo to a user's album, you must have the publish_stream permission. With that granted, you can upload a photo by issuing an HTTP POST request with the photo content and an optional description to one of these two Graph API connections:
https://graph.facebook.com/USER_ID/photos - The photo will be published to an album created for your app. We automatically create an album for your app if it does not already exist. All photos uploaded this way will then be added to this same album.
https://graph.facebook.com/ALBUM_ID/photos - The photo will be published to a specific, existing photo album, represented by the ALBUM_ID.
So, going by point one, if I upload an image like this...
Facebook.api("me/photos",imagePostCallback,{message:"",image:myImageBitmap,fileName:''},URLRequestMethod.POST);
...then I can expect it to place my image in an album named for my app, which it will create if necessary?
Not so.
What actually happens when the album doesn't exist is that the uploaded image is pushed into any other handy albums that exist, which are usually for (and created by) other applications. This is a bit of a pain.
So far I've tried the following:
Disabling sandbox mode. I had thought that the app might be unable to create new albums because it was in sandbox mode, however disabling sandbox mode made no difference and I can create albums directly with it enabled.
Checking for the existence of my album and creating it if necessary. I can check for my album and create it if it does not exist, but I cannot then upload an image because the POST call to Facebook.api to upload the image will fail if it is not called as a direct result of a user interaction.
And so now I'm a bit stumped. Obviously I can't have my app possibly posting images to a competitor's album, but at the moment the only alternative I can see involves effectively making the user submit their image twice whenever an album has to be created. Any ideas?
I'm guessing you need the access_token in your params :) When posting something to a user's Facebook, you always need it (it's not always necessary when getting information). The way to get the access token is shown below :)
public function post():void
{
    var _params:Object = new Object();
    _params.access_token = Facebook.getSession().accessToken;
    _params.message = "";
    _params.image = myImageBitmap;
    _params.fileName = "";
    Facebook.api("me/photos", imagePostCallback, _params, URLRequestMethod.POST);
}
Also make sure that you ask for the right permissions with your app.
EDIT
Ok, so I missed your edit a bit there ;) It should be possible to create your own album. Take a look at this PHP code for the Graph API; it should be portable to AS3.
http://developers.facebook.com/blog/post/498/
EDIT2
Ok, I've done some more digging (it seemed interesting to know). This should actually work when using the Graph API.
FB.api('/me/albums', albumCreateCallback, {name: 'name of the album', message: 'description of the album'}, URLRequestMethod.POST);
When you then make another API call in the albumCreateCallback to upload your image, it should work and upload your image (according to what I've found).
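For reference, here is the same two-step flow (create the album, then post to ALBUM_ID/photos as in the docs quoted above) sketched over plain HTTP with Python's requests library; the access token and file name are placeholders:
import requests

ACCESS_TOKEN = "userAccessTokenHere"  # placeholder

# Step 1: create the album, as in the Graph API call above.
album = requests.post(
    "https://graph.facebook.com/me/albums",
    data={"name": "name of the album",
          "message": "description of the album",
          "access_token": ACCESS_TOKEN},
).json()

# Step 2: upload the photo into the newly created album via ALBUM_ID/photos.
with open("photo.png", "rb") as f:  # placeholder file
    result = requests.post(
        f"https://graph.facebook.com/{album['id']}/photos",
        data={"message": "", "access_token": ACCESS_TOKEN},
        files={"source": f},
    ).json()
print(result)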

Retrieving information from a web page

My application is meant to speed up the retrieval of phone call information from our telephone system.
The best way to get this information is to create a new search on the telephone system's web interface and export the results to an Excel spreadsheet which my application then imports into a DataSet.
To get the export, from the login screen, the process goes as follows:
Log in
Navigate to Reports Page
Click "Extension Detail" link
Select "Extensions" CheckBox
Select the extensions (typically all the ones currently being used) from the listbox
Specify date range
Click on Export button
It's not a big job to do it manually every day, but, for reliability, it would be great if I can make my application do this automatically the first time it starts every day.
Since more than 1 person in the company is going to use this application, having a Windows Service do it would be even better.
I don't know if it'll help, but the system is Datatex Topaz Next Generation telephone management system: http://www.datatex.co.za/downloads/index.html#TNG
Can anyone give me a basic idea how to do this?
Also, can anyone post links (in comments if need be) to pages where I can learn more about how to do this?
I have done something similar to fetch info from a website. I cannot give you an exact answer, but the idea is to send the login info to the page with form values. If the site relies on cookies, you can use this cookie-aware WebClient:
using System;
using System.Net;

public class CookieAwareWebClient : WebClient
{
    private CookieContainer cookieContainer = new CookieContainer();

    protected override WebRequest GetWebRequest(Uri address)
    {
        WebRequest request = base.GetWebRequest(address);
        if (request is HttpWebRequest)
        {
            (request as HttpWebRequest).CookieContainer = cookieContainer;
        }
        return request;
    }
}
You should be aware that some sites rely on a session id being passed so the first thing I did was to fetch the session id from the page:
var client = new CookieAwareWebClient();
client.Encoding = Encoding.UTF8;
var indexHtml = client.DownloadString(*index page url*);
string sessionID = fetchSessionID(indexHtml);
Then I had to log in to the page which you can do by uploading values to the page. You can see the specific form elements with "view source" but you have to know a little HTML to do so.
var values = new NameValueCollection();
values.Add("sessionid", sessionID); //Fetched session id
values.Add("brugerid", args[0]); //Username in my case
values.Add("adgangskode", args[1]); //Password in my case
values.Add("login", "Login"); //The login button
//Logging in
client.UploadValues(*url to login*, values); //If all goes perfect, I'm logged in now
And then I could download the page I needed. In your case you may use DownloadFile(...) if the file always has the same URL (something like Export.aspx?From=2010-10-10&To=2010-11-11), or UploadValues(...) where you specify the values as before but save the result.
string html = client.DownloadString(*url*);
It seems you have a lot more steps than I did, but the principle is the same. To see what values you send to the site to log in etc., you can use a program such as Fiddler (Windows) which can capture the network activity. Essentially you just do exactly the same thing, but watch out for things like the session id, which is temporary.
The best idea is really to use some native way to fetch the data, but if you don't have access to the code, database, etc., you have to do it the ugly way. You may also need an HTML parser to extract the data (oops, you don't, because you export to a file). And last but not least, keep in mind that pages can change, so there is great potential for the login or the parsing to fail.
Please ask if you are uncertain about what is going on.
ADDITION
The CookieAwareWebClient is not my code:
http://code.google.com/p/gardens/source/browse/Montrics/Physical.MyPyramid/CookieAwareWebClient.cs?r=26
Using CookieContainer with WebClient class
I also found some relevant threads:
What's a good tool to screen-scrape with Javascript support?
http://forums.asp.net/t/1475637.aspx
With an HTTP client, you need to do the following:
Log in, using cookies or HTTP authentication
Request a page
Submit form data
This means that you need some class or component in your program that can do HTTP, cookies, authentication and forms. With this, you do the same requests a user would do.
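A hedged sketch of that flow in Python with the requests library (the URLs and form field names are placeholders; lift the real ones from "view source" or Fiddler as described above):
import requests

session = requests.Session()  # keeps cookies across requests, like the CookieAwareWebClient above

# Step 1: log in by submitting the login form (field names are placeholders).
session.post("https://phones.example.com/login",
             data={"username": "me", "password": "secret"})

# Steps 2 and 3: request the export page with the search parameters and save the result.
export = session.get("https://phones.example.com/Export.aspx",
                     params={"From": "2010-10-10", "To": "2010-11-11"})
with open("report.xls", "wb") as f:
    f.write(export.content)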