Google Apps Script scrape data from website with OAuth and recaptcha - google-apps-script

Background:
I want to scrape data from a website with Google Apps Script for personal use.
The data is on a member page, so I have to send a username and password in the payload and pass the returned cookies to the member page. This worked before the website was upgraded.
Below is the code I used before:
var url = "https://ww2.metroplex.com.hk/en/member/login";
var formData = {
'username': userEmail,
'password': userPW
};
var options = {
"method": "post",
'contentType': 'application/json',
'payload' : JSON.stringify(formData),
"followRedirects": false,
"muteHttpExceptions": true
};
var response = UrlFetchApp.fetch(url, options);
var responseCode = response.getResponseCode();
var redirectUrl = response.getHeaders()['Location'];
//**************Member Page (after login)******************
var url = "https://ww2.metroplex.com.hk/en/member/login/profile";
if (redirectUrl != null){
url = redirectUrl;
}
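var cookie_string = response.getAllHeaders()['Set-Cookie'] || [];
// getAllHeaders() returns a string when there is only one Set-Cookie header
if (typeof cookie_string == 'string') {
cookie_string = [cookie_string];
}
var cookie = [];
for (var i = 0; i < cookie_string.length; i++) {
// Keep only the name=value pair; drop attributes such as path and expiry
cookie[i] = cookie_string[i].split( ';' )[0];
}
cookie = cookie.join('; ');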
var dataHeaders = {
'Cookie': cookie
};
options = {
"method": "get",
"headers": dataHeaders
};
var dataResponse = UrlFetchApp.fetch(url, options);
var dataResponseCode = dataResponse.getResponseCode();
var html_text = dataResponse.getContentText();
var table_array = html_text.split("Member Summary");
I want to get the member information (Member Summary) from that page.
The website was recently upgraded and the above method no longer works.
table_array should contain two elements, split at the value "Member Summary", but it currently contains a single element identical to html_text.
I traced the network traffic in Chrome and saw a few things that might need to be passed during login:
access token
authorization
recaptcha
I am not sure whether the reCAPTCHA is required.
I don't know how to get it to work. If anyone has experience with this kind of website and can log in and scrape data programmatically, please feel free to discuss.
Below is the website I want to access:
https://ww2.metroplex.com.hk/en/movie/highlight
You can use temporary email for registration:
https://ww2.metroplex.com.hk/en/member/freeregister
and the network trace I performed shows:
the access token issued when you go to the website
an OAuth authorization request
Google reCAPTCHA
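The cookie-forwarding step in the script above can be sketched as a pair of small helpers. This is only a minimal sketch: buildCookieHeader and containsMarker are hypothetical names, not part of the original script, and they do nothing about the OAuth or reCAPTCHA steps. They only make it easier to see whether the login actually succeeded. When the split on "Member Summary" returns a single element, the marker is simply not in the returned HTML, which usually means the member page was not served.

```javascript
// Hypothetical helper (not from the original post): turn the value of
// response.getAllHeaders()['Set-Cookie'] into one Cookie header string.
function buildCookieHeader(setCookie) {
  if (!setCookie) {
    return '';
  }
  // A single Set-Cookie header arrives as a string, several as an array.
  var list = typeof setCookie === 'string' ? [setCookie] : setCookie;
  return list
    .map(function (c) { return c.split(';')[0].trim(); }) // keep name=value only
    .filter(function (pair) { return pair.length > 0; })
    .join('; ');
}

// Hypothetical check: does the fetched page contain the expected marker text?
function containsMarker(html, marker) {
  return html.split(marker).length > 1;
}
```

In the script, this would be used as 'Cookie': buildCookieHeader(response.getAllHeaders()['Set-Cookie']), followed by a containsMarker(html_text, "Member Summary") check before parsing.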

Related

How to integrate Gumroad API with Google Apps Script

I'm trying to see if a user is a paying customer for my gumroad product. I'm trying to integrate the Gumroad API to Google Apps Script.
I have the following code
function checkAccount(){
var token = <<token>>;
var userEmail = Session.getActiveUser().getEmail();
var url = "https://api.gumroad.com/v2/sales";
var headers = {"access_token=" : token};
var options = {
"method" : "GET",
"email" : userEmail,
"headers" : headers
};
var response = UrlFetchApp.fetch(url, options);
var jsonObject = JSON.parse(response.getContentText());
Logger.log(jsonObject);
}
I get the following error: Exception: Request failed for https://api.gumroad.com returned code 401. Gumroad is telling me "401 Unauthorized: you did not provide a valid access token". I've checked the token and it's correct. I've logged the options and headers, and they show up correctly.
I'm just not sure why it's giving me a 401.
Try changing headers to this:
var headers = {
"Authorization": `access_token=${token}`
};
EDIT:
Based on this you could try:
headers: {
Authorization: `Bearer ${token}`}
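Another option, offered only as a sketch: UrlFetchApp ignores unknown option keys such as "email", so request parameters have to go into the URL (or the payload) instead. The code below assumes the sales endpoint accepts access_token and email as query parameters, which is how its curl examples appear to pass them. buildUrl and checkAccountSketch are hypothetical names.

```javascript
// Hypothetical helper: append URL-encoded query parameters to a base URL.
function buildUrl(base, params) {
  var query = Object.keys(params)
    .map(function (key) {
      return encodeURIComponent(key) + '=' + encodeURIComponent(params[key]);
    })
    .join('&');
  return query ? base + '?' + query : base;
}

// Assumption: the endpoint reads access_token and email from the query string.
function checkAccountSketch(token, userEmail) {
  var url = buildUrl('https://api.gumroad.com/v2/sales', {
    access_token: token,
    email: userEmail
  });
  // muteHttpExceptions lets the error body be inspected instead of throwing.
  var response = UrlFetchApp.fetch(url, { method: 'get', muteHttpExceptions: true });
  return JSON.parse(response.getContentText());
}
```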

How to use Google Photos API Method: mediaItems.search in Google apps script for a spreadsheet

I really tried to figure this out on my own...
I am trying to load photo metadata from google photos into a sheet using the Google Photos API and google apps script.
I was able to make some progress after a lot of help on a previous question
Is it possible to load google photos metadata into google sheets?
I now have two functions.
function photoAPI_ListPhotos() - Uses Method: mediaItems.list and gives me all my photos that are not archived
function photoAPI_ListAlbums() - Uses Method: albums.list and gives me all my albums
What I want to do is retrieve all photos from a specific album. Method: mediaItems.search should do this, but it uses the POST method and the previous working examples I found only use GET. Looking at the examples available on that page, there is a JavaScript portion, but it does not work in Apps Script.
The documentation for UrlFetchApp tells me how to format a POST request but not how to add the parameters for authentication.
The External APIs guide also does not give me the examples I am looking for.
I feel like I'm missing some essential tiny piece of info and I hope I'm not wasting everyone's time asking it here. Just a solid example of how to use POST with OAuth in Apps Script should get me where I need to go.
Here is my working function for listing all non-archived photos.
function photoAPI_ListPhotos() {
/*
This function retrieves all photos from your personal google photos account and lists each one with the Filename, Caption, Create time (formatted for Sheet), Width, Height, and URL in a new sheet.
It will not include archived photos, which can be confusing: if you happen to have a large chunk of archived photos, some pages may return only a next page token with no media items.
Requires Oauth scopes. Add the below line to appsscript.json
"oauthScopes": ["https://www.googleapis.com/auth/spreadsheets.currentonly", "https://www.googleapis.com/auth/photoslibrary", "https://www.googleapis.com/auth/photoslibrary.readonly", "https://www.googleapis.com/auth/script.external_request"]
Also requires a standard GCP project with the appropriate Photo APIs enabled.
https://developers.google.com/apps-script/guides/cloud-platform-projects
*/
//Get the spreadsheet object
var ss = SpreadsheetApp.getActiveSpreadsheet();
//Check for presence of target sheet, if it does not exist, create one.
var photos_sh = ss.getSheetByName("photos") || ss.insertSheet("photos", ss.getSheets().length);
//Make sure the target sheet is empty
photos_sh.clear();
var narray = [];
//Build the request string. Max page size is 100. set to max for speed.
var api = "https://photoslibrary.googleapis.com/v1/mediaItems?pageSize=100";
var headers = { "Authorization": "Bearer " + ScriptApp.getOAuthToken() };
var options = { "headers": headers, "method" : "GET", "muteHttpExceptions": true };
//This variable is used if you want to resume the scrape at some page other than the start. This is needed if you have more than 40,000 photos.
//Uncomment the line below and add the next page token for where you want to start in the quotes.
//var nexttoken="";
var param= "", nexttoken;
//Start counting how many pages have been processed.
var pagecount=0;
//Make the first row a title row
var data = [
"Filename",
"description",
"Create Time",
"Width",
"Height",
"ID",
"URL",
"NextPage"
];
narray.push(data);
//Loop through JSON results until a nextPageToken is not returned indicating end of data
do {
//If there is a nextpagetoken, add it to the end of the request string
if (nexttoken)
param = "&pageToken=" + nexttoken;
//Get data and load it into a JSON object
var response = UrlFetchApp.fetch(api + param, options);
var json = JSON.parse(response.getContentText());
//Check if there are mediaItems to process.
if (typeof json.mediaItems === 'undefined') {
//If there are no mediaItems, Add a blank line in the sheet with the returned nextpagetoken
//var data = ["","","","","","","",json.nextPageToken];
//narray.push(data);
} else {
//Loop through the JSON object adding desired data to the spreadsheet.
json.mediaItems.forEach(function (MediaItem) {
//Check if the mediaitem has a description (caption) and make that cell blank if it is not present.
if(typeof MediaItem.description === 'undefined') {
var description = "";
} else {
var description = MediaItem.description;
}
//Format the create date as appropriate for spreadsheets.
var d = new Date(MediaItem.mediaMetadata.creationTime);
var data = [
MediaItem.filename,
"'"+description, //The prepended apostrophe makes captions that are dates or numbers save in the sheet as a string.
d,
MediaItem.mediaMetadata.width,
MediaItem.mediaMetadata.height,
MediaItem.id,
MediaItem.productUrl,
json.nextPageToken
];
narray.push(data);
});
}
//Get the nextPageToken
nexttoken = json.nextPageToken;
pagecount++;
//Continue if the nextPageToken is not null
//Also stop if you reach 400 pages processed, this prevents the script from timing out. You will need to resume manually using the nexttoken variable above.
} while (pagecount<4 && nexttoken);
//Continue if the nextPageToken is not null (This is commented out as an alternative and can be used if you have a small enough collection that it will not time out.)
//} while (nexttoken);
//Save all the data to the spreadsheet.
photos_sh.getRange(1, 1, narray.length, narray[0].length).setValues(narray);
}
You want to retrieve all photos of a specific album using the Google Photos API.
You want to know how to use the mediaItems.search method from Google Apps Script.
You have already been able to retrieve data using the Google Photos API.
If my understanding is correct, how about this sample script? Please think of this as just one of several possible answers.
Sample script 1:
var albumId = "###"; // Please set the album ID.
var headers = {"Authorization": "Bearer " + ScriptApp.getOAuthToken()};
var url = "https://photoslibrary.googleapis.com/v1/mediaItems:search";
var mediaItems = [];
var pageToken = "";
do {
var params = {
method: "post",
headers: headers,
contentType: "application/json",
payload: JSON.stringify({albumId: albumId, pageSize: 100, pageToken: pageToken}),
}
var res = UrlFetchApp.fetch(url, params);
var obj = JSON.parse(res.getContentText());
Array.prototype.push.apply(mediaItems, obj.mediaItems);
pageToken = obj.nextPageToken || "";
} while (pageToken);
Logger.log(mediaItems)
For the mediaItems.search method, albumId, pageSize and pageToken are included in the payload, and the values are sent with the content type application/json.
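The request construction described above can be isolated into two small helpers, shown here as a sketch only. buildSearchPayload and buildSearchOptions are hypothetical names. Leaving pageToken out of the first request instead of sending an empty string is a design choice of this sketch, not something the answer requires.

```javascript
// Hypothetical helper: JSON body for mediaItems.search.
// pageToken is only added once the API has returned one.
function buildSearchPayload(albumId, pageToken) {
  var body = { albumId: albumId, pageSize: 100 };
  if (pageToken) {
    body.pageToken = pageToken;
  }
  return JSON.stringify(body);
}

// Hypothetical helper: UrlFetchApp options for an authorized JSON POST.
function buildSearchOptions(albumId, pageToken, accessToken) {
  return {
    method: 'post',
    contentType: 'application/json',
    headers: { Authorization: 'Bearer ' + accessToken },
    payload: buildSearchPayload(albumId, pageToken),
    muteHttpExceptions: true
  };
}
```

Inside the do...while loop, the call would then read UrlFetchApp.fetch(url, buildSearchOptions(albumId, pageToken, ScriptApp.getOAuthToken())).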
Sample script 2:
Applied to your script, the modification looks like this:
function photoAPI_ListPhotos() {
var albumId = "###"; // Please set the album ID.
var ss = SpreadsheetApp.getActiveSpreadsheet();
var photos_sh = ss.getSheetByName("photos") || ss.insertSheet("photos", ss.getSheets().length);
photos_sh.clear();
var narray = [];
var api = "https://photoslibrary.googleapis.com/v1/mediaItems:search";
var headers = { "Authorization": "Bearer " + ScriptApp.getOAuthToken() };
var nexttoken = "";
var pagecount = 0;
var data = ["Filename","description","Create Time","Width","Height","ID","URL","NextPage"];
narray.push(data);
do {
var options = {
method: "post",
headers: headers,
contentType: "application/json",
payload: JSON.stringify({albumId: albumId, pageSize: 100, pageToken: nexttoken}),
}
var response = UrlFetchApp.fetch(api, options);
var json = JSON.parse(response.getContentText());
if (typeof json.mediaItems === 'undefined') {
//If there are no mediaItems, Add a blank line in the sheet with the returned nextpagetoken
//var data = ["","","","","","","",json.nextPageToken];
//narray.push(data);
} else {
json.mediaItems.forEach(function (MediaItem) {
if(typeof MediaItem.description === 'undefined') {
var description = "";
} else {
var description = MediaItem.description;
}
var d = new Date(MediaItem.mediaMetadata.creationTime);
var data = [
MediaItem.filename,
"'"+description,
d,
MediaItem.mediaMetadata.width,
MediaItem.mediaMetadata.height,
MediaItem.id,
MediaItem.productUrl,
json.nextPageToken
];
narray.push(data);
});
}
nexttoken = json.nextPageToken || "";
pagecount++;
} while (pagecount<4 && nexttoken);
photos_sh.getRange(1, 1, narray.length, narray[0].length).setValues(narray);
}
Note:
This script assumes the following.
The Google Photos API is enabled.
The scope https://www.googleapis.com/auth/photoslibrary.readonly or https://www.googleapis.com/auth/photoslibrary is included in the scopes.
Reference:
Method: mediaItems.search
If I misunderstood your question and this was not the result you want, I apologize.

getting active user name in Google sheets from external domain

I have a published web app:
function doGet(request) {
// DocumentApp.getActiveDocument();
SpreadsheetApp.getActive();
var about = Drive.About.get();
var user = about.name;
// Logger.log(Session.getActiveUser().getEmail());
return ContentService.createTextOutput(user);
}
... at this URL:
https://script.google.com/macros/s/AKfycbzTlhKJXrTAEPEba0l1KWqqzlkul2ntC-0iHi7_POj0wk7j3R6K/exec
This produces the desired result, the user's full name (after access to the user's data is approved; subsequent visits to the URL do not prompt for authentication or approval).
That is the data I want to retrieve from this App Script:
function Test3() {
var options = {
'method' : 'get',
'followRedirects' : true,
// 'validateHttpsCertificates' : 'true',
'muteHttpExceptions' : true,
'contentType' : 'null'
};
var url = "https://script.google.com/macros/s/AKfycbzTlhKJXrTAEPEba0l1KWqqzlkul2ntC-0iHi7_POj0wk7j3R6K/exec"
var response = UrlFetchApp.fetch(url, options);
// var response = test2() ;
// var myName = response.getContentText();
Browser.msgBox("[" + response + "]");
}
but I have not been able to get just that data. Instead I get the HTML text of a page, which turns out to be a Google login page.
Again, opening the URL manually in a browser as any user returns the page with the user name, so why can't it retrieve the result of that page when run from Apps Script?
What am I missing? Surely I'm some simple syntax away from getting that data.
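A possible explanation, not confirmed in this thread: UrlFetchApp sends the request from Google's servers without the browser's login session, so the web app treats it as an anonymous visitor and returns the sign-in page. An approach that is often suggested is to send the calling script's OAuth token as a Bearer header. The sketch below assumes the web app is deployed so that signed-in users may access it and that the calling project holds a scope the web app accepts (a Drive scope is commonly mentioned). buildWebAppOptions and fetchUserName are hypothetical names.

```javascript
// Hypothetical helper: fetch options that carry an OAuth access token.
function buildWebAppOptions(accessToken) {
  return {
    method: 'get',
    followRedirects: true,
    muteHttpExceptions: true,
    headers: { Authorization: 'Bearer ' + accessToken }
  };
}

// Assumption: the web app accepts the calling script's own token.
function fetchUserName(webAppUrl) {
  var options = buildWebAppOptions(ScriptApp.getOAuthToken());
  var response = UrlFetchApp.fetch(webAppUrl, options);
  return response.getContentText();
}
```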

Spotify API authorisation via Google Apps Script

I am using the following code to make requests to the Spotify API via Google Apps Script:
function search() {
var ss = SpreadsheetApp.getActiveSpreadsheet();
var sheet = ss.getActiveSheet();
var artist = sheet.getRange(1,1).getValue();
artist = encodeURIComponent(artist.trim());
var result = searchSpotify(artist);
Logger.log(result);
}
function searchSpotify(artist) {
//searches spotify and returns artist ID
var response = UrlFetchApp.fetch("https://api.spotify.com/v1/search?q=" + artist + "&type=artist&limit=1",
{ method: "GET",
headers:{
"contentType": "application/json",
'Authorization': "Bearer BQBnpSUdaEweirImw23yh2DH8OGhTwh5a_VnY_fgb2BPML0KvFvYd04CaEdUhQN9N4ZUXMIVfJ1MjFe1_j0Gl0UoHDhcoC_dklluZyOkq8Bo6i2_wfxSbGzP3k5EUjUKuULAnmTwCdkdZQnl-SNU0Co"
},
});
json = response.getContentText();
var data = JSON.parse(json);
var uri = data.artists.items[0].uri.slice(15);
var getArtists = getRelatedArtists(uri);
Logger.log(getArtists);
return getArtists;
}
function getRelatedArtists(uri) {
//searches related artists with the returned ID
var response = UrlFetchApp.fetch("https://api.spotify.com/v1/artists/" + uri + "/related-artists",
{ method: "GET",
headers:{
"contentType": "application/json",
'Authorization': "Bearer BQBnpSUdaEweirImw23yh2DH8OGhTwh5a_VnY_fgb2BPML0KvFvYd04CaEdUhQN9N4ZUXMIVfJ1MjFe1_j0Gl0UoHDhcoC_dklluZyOkq8Bo6i2_wfxSbGzP3k5EUjUKuULAnmTwCdkdZQnl-SNU0Co"
},
});
json = response.getContentText();
var data = JSON.parse(json);
var listArtists = [];
for(var i = 0, len = data.artists.length; i < len; i++){
listArtists.push(data.artists[i].name);
}
return listArtists;
}
This works fine using the temporary authorisation token from the Spotify website, but this token expires every hour and so is of no use for an automated script.
I am trying to use my own client ID and secret, which I have set up on Spotify, but I'm struggling to make this work. As I understand it, I may need to add an extra step at the beginning to start the authorisation process, but I've tried all the methods suggested and keep receiving server errors.
According to the documentation, the "Client Credentials Flow" uses Basic authorization.
To use it, you first need to obtain your "client_id" and "client_secret".
Sample script:
var clientId = "### client id ###"; // Please set here.
var clientSecret = "### client secret ###"; // Please set here.
var url = "https://accounts.spotify.com/api/token";
var params = {
method: "post",
headers: {"Authorization" : "Basic " + Utilities.base64Encode(clientId + ":" + clientSecret)},
payload: {grant_type: "client_credentials"},
};
var res = UrlFetchApp.fetch(url, params);
Logger.log(res.getContentText())
According to the curl sample, grant_type must be sent as form data.
Result:
The documentation says that the response looks as follows.
{
"access_token": "NgCXRKc...MzYjw",
"token_type": "bearer",
"expires_in": 3600
}
Note:
This is a simple sample script, so please modify it for your situation.
I prepared this sample script from the sample curl command in the documentation.
Reference:
Client Credentials Flow
Edit:
As your next issue, you want to retrieve the access token from the returned value. If my understanding is correct, how about this modification? Please modify my script as follows.
From:
Logger.log(res.getContentText())
To:
var obj = JSON.parse(res.getContentText());
Logger.log(obj.access_token)
The API returns the value as a string, so it has to be parsed into an object using JSON.parse().
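Putting the two steps together, a minimal sketch could look like the following. parseTokenResponse, getSpotifyToken and buildSpotifyHeaders are hypothetical names. The sketch requests a fresh token on every run, which is simple but not the only option, since the token is valid for the expires_in period.

```javascript
// Hypothetical helper: extract the token and its lifetime from the response body.
function parseTokenResponse(text) {
  var obj = JSON.parse(text);
  if (!obj.access_token) {
    throw new Error('No access_token in response: ' + text);
  }
  return { token: obj.access_token, expiresIn: obj.expires_in || 0 };
}

// Request a token with the Client Credentials Flow, as in the sample script above.
function getSpotifyToken(clientId, clientSecret) {
  var res = UrlFetchApp.fetch('https://accounts.spotify.com/api/token', {
    method: 'post',
    headers: { Authorization: 'Basic ' + Utilities.base64Encode(clientId + ':' + clientSecret) },
    payload: { grant_type: 'client_credentials' } // sent as form data
  });
  return parseTokenResponse(res.getContentText()).token;
}

// Hypothetical helper: headers for the search and related-artists requests.
function buildSpotifyHeaders(token) {
  return { Authorization: 'Bearer ' + token };
}
```

In searchSpotify and getRelatedArtists, the hard-coded headers object would then be replaced by buildSpotifyHeaders(getSpotifyToken(clientId, clientSecret)).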

Google Apps Script (GAS) Using urlfetchapp with username and password

I am not a coder by nature and am self taught in GAS (only code I have used). I work for City College Norwich and I would like to create a script that automatically logs me in to their website so I can fetch timetable data and put it into a spreadsheet.
After doing some research I have given up trying to figure it out so I am asking for help.
I have tried this:
function getTimetables() {
var url = "https://ccn.ac.uk/user/";
var options = {
"method": "post",
"payload": {
"user-login" : "username",
"edit-pass" : "password",
"BUTTON_Submit" : "Log In",
},
"testcookie": 1,
"followRedirects": false
};
var response = UrlFetchApp.fetch(url, options);
if ( response.getResponseCode() == 200 ) {
// Incorrect user/pass combo
Logger.log('Incorrect user/pass combo')
} else if ( response.getResponseCode() == 302 ) {
// Logged-in
var headers = response.getAllHeaders();
if ( typeof headers['Set-Cookie'] !== 'undefined' ) {
// Make sure that we are working with an array of cookies
var cookies = typeof headers['Set-Cookie'] == 'string' ? [ headers['Set-Cookie'] ] : headers['Set-Cookie'];
for (var i = 0; i < cookies.length; i++) {
// We only need the cookie's value - it might have path, expiry time, etc here
cookies[i] = cookies[i].split( ';' )[0];
};
url = "https://mytimetable.ccn.ac.uk/timetable.aspx?week=30&room=C5A";
options = {
"method": "get",
// Set the cookies so that we appear logged-in
"headers": {
"Cookie": cookies.join(';')
}
}
}
}
}
which returns "Incorrect user/pass combo".
And I have tried this:
function getTimetablev2() {
var site = "https://ccn.ac.uk/user"
var USERNAME = PropertiesService.getScriptProperties().getProperty('username');
var PASSWORD = PropertiesService.getScriptProperties().getProperty('password');
var url = PropertiesService.getScriptProperties().getProperty(site);
var headers = {
"Authorization" : "Basic " + Utilities.base64Encode(USERNAME + ':' + PASSWORD)
};
var params
= {
"method":"GET",
"headers":headers
};
var response =
UrlFetchApp.fetch(site, params);
Logger.log(response.getResponseCode())
}
which returns code 200, meaning the login failed.
If anyone can solve this for me I would be forever in your debt as it would save me loads of time. I have created a practical booking system where each teacher has their own spreadsheet with their timetable on and all bookings go to a master spreadsheet us technicians use. If I could automate generating their timetables it would be fantastic.
Without knowing what a valid HTTP POST to https://ccn.ac.uk/user/ looks like, it's hard to answer your question. So...
Using hurl.it with the payload from your first code snippet, it doesn't look like this is a valid POST.
Using Firebug and entering dummy data into the form, you can see how the POST is done correctly.
These are the parameters you should use in your request body.
Your second code snippet is unlikely to work unless the site natively supports authentication other than through this form.
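Login forms often include hidden fields that the browser submits along with the username and password. As a rough sketch of how to collect them in Apps Script, the helper below pulls hidden inputs out of the login page HTML with regular expressions so they can be merged into the POST payload. extractHiddenInputs and buildLoginPayload are hypothetical names, the field names in the usage note are placeholders, and the real names must be read from the form as described above.

```javascript
// Hypothetical helper: collect name/value pairs of <input type="hidden"> tags.
// A regular expression is a rough tool for HTML, but is enough for simple forms.
function extractHiddenInputs(html) {
  var fields = {};
  var inputTags = html.match(/<input\b[^>]*>/gi) || [];
  inputTags.forEach(function (tag) {
    var type = /\btype\s*=\s*["']([^"']*)["']/i.exec(tag);
    var name = /\bname\s*=\s*["']([^"']*)["']/i.exec(tag);
    var value = /\bvalue\s*=\s*["']([^"']*)["']/i.exec(tag);
    if (type && type[1].toLowerCase() === 'hidden' && name) {
      fields[name[1]] = value ? value[1] : '';
    }
  });
  return fields;
}

// Hypothetical helper: merge hidden fields with the credentials to post.
function buildLoginPayload(hiddenFields, credentials) {
  var payload = {};
  Object.keys(hiddenFields).forEach(function (k) { payload[k] = hiddenFields[k]; });
  Object.keys(credentials).forEach(function (k) { payload[k] = credentials[k]; });
  return payload;
}
```

A typical use would be to fetch the login page with a GET request, call extractHiddenInputs on its content, and post buildLoginPayload(hidden, { name: USERNAME, pass: PASSWORD }) with followRedirects set to false, where name and pass stand in for whatever field names the form really uses.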