Obtain list of My Places from Google Maps - google-maps

I am trying to obtain the list of places the user has saved on Google Maps. Now I know there isnt an API for this (for whatever reason), but I saw here:
"My Places" Google Maps API
That apparently there used to be a way to obtain the URL, but it does not seem to work with my list of places.
E.g.
https://www.google.com/maps/#46.889424,0.1194148,6z/data=!4m3!11m2!2s1KbZtik1IdXyNhwfXEb3P9vaZvzU!3e3
Does not seem to work if I append &output=kml or &output=json
I created this list on Google Maps, then hit share and obtained that link.
I even tried parsing the resulting HTML but it seems everything is handled by some Javascript Engine and I can't find any reference to Google Ids there --- I dont even know how they handle clicks!
Any help? There must be a way to retrieve this information programmatically!
EDIT:
I managed to get something working by visiting the shared link, then processing the html and storing the window.APP_INITIALIZATION_STATE variable. I then convert it to an javascript array and loop over it. Deep inside the array/map structure, I managed to get the google name and google place id out of that array. That seems to work a bit, but when trying with lists over 20 items long, google only gets the first 20 and is waiting for the user to 'scroll down' to get the next 20. That seems to trigger another call to get the next 20 results and looks a bit like:
https://www.google.com/search?tbm=map&fp=1&authuser=0&hl=en&gl=nl&pb=!4m8!1m3!1d54065472.4384380........
I can see the original feature id being included at the end of the url, but have no idea how to construct this url in full though to get the next 20 items.... Any ideas?

Your saved places list actually has what you call a feature ID attribute, this isn't a common practice and Google frowns upon this technique but take a look at this URL:
https://www.google.com/maps/preview/entity?authuser=0&hl=en&gl=us&pb=!1m10!1s0x0%3A0x3743ae09a161976b!3m8!1m3!1d14318.72623152007!2d-98.2296425!3d26.2070353!3m2!1i1024!2i768!4f13.1!12m3!2m2!1i392!2i106!13m57!2m2!1i203!2i100!3m2!2i4!5b1!6m6!1m2!1i86!2i86!1m2!1i408!2i200!7m42!1m3!1e1!2b0!3e3!1m3!1e2!2b1!3e2!1m3!1e2!2b0!3e3!1m3!1e3!2b0!3e3!1m3!1e8!2b0!3e3!1m3!1e3!2b1!3e2!1m3!1e9!2b1!3e2!1m3!1e10!2b0!3e3!1m3!1e10!2b1!3e2!1m3!1e10!2b0!3e4!2b1!4b1!9b0!14m3!1snyc5W-WeHY3r5gLwkoRI!7e81!15i10112!15m19!2b1!5m4!2b1!3b1!5b1!6b1!10m1!8e3!14m1!3b1!17b1!24b1!25b1!26b1!30m1!2b1!36b1!52b1!53b1!21m28!1m6!1m2!1i0!2i0!2m2!1i458!2i768!1m6!1m2!1i974!2i0!2m2!1i1024!2i768!1m6!1m2!1i0!2i0!2m2!1i1024!2i20!1m6!1m2!1i0!2i748!2m2!1i1024!2i768!22m1!1e81!29m0!30m1!3b1
Highlighted is the feature ID from the link you posted:
https://www.google.com/maps/#46.889424,0.1194148,6z/data=!4m3!11m2!2s1KbZtik1IdXyNhwfXEb3P9vaZvzU!3e3
Along with other maps parameters; when you hit that link you're actually manually triggering the same callback that Google's own scripts in maps use to parse the data to feed back to the maps UI; if you look at array item 2, or {c:..} you'll find a stringified array with the contents of your list, now depending on the program language you're using all it takes is a little tweaking (find/replace, loop through, lint and trim, etc.) to this array and you can pull your results; the cool thing is if you add or remove a place the next time you hit that end point it's updated in real-time.
Some people may call it a "hack"; but it gets the job done. :)
Hope I pointed you to a direction in the event you haven't found a solution; give this a shot.
Note the URL has to be pasted in its entirety, SO truncated the hyperlink; copy and paste the whole thing in one shot and a text file from Google with the arrays will be produced; in my case I curl the URLs I need and parse the returned strings as needed to pull data from Google where their API has limitations. Just a tip. :)

Also check Joel's Answer who did some research and refined some of the following information.
Pagination
You can use this tool to decrypt the pb-parameter. PB stands for protocol buffer (protobuf) and Google uses its own kind of it for maps. You can find different decoders for this by googling it.
In my case, the pagination was done via one parameter (8iX0). It seems, that it always comes with another similar parameter (7i20) but I don't know that it does. I can't yet confirm that this is always the case, but from my experience you're basically looking for two integers that are 20/40/60 etc. apart.
Here's what this looks like for me:
page 2 (7i20, 8i20)
page 3 (7i20, 8i40)
page 4 (7i20, 8i60)
From this information, I tried 7i20 8i00 for page 1, that seemed to work. For lists with >100 items, it just continues like that (8i120, 8i140 etc.)
Here's a code snippet in python (quick & dirty). Make sure to add (long) delays if your list has many pages as you will get rate-limited by captchas eventually if you don't. Notice the 8i%s0 in the url, make sure to put the %s back when you paste your pb-block.
url = "https://www.google.com:443/search?tbm=map&pb=!7i20!8i%s0!..."
headers = {"Referer": "https://www.google.com/"}
def fetch_stops_from_maps():
new_results = -1
page = 0
results = []
while new_results != 0:
new_results = 0
x = requests.get(url % page, headers=headers)
txt = html.unescape(x.text)
txt = txt.split("\n")[1]
results = re.findall(r"\[null,null,[0-9]{1,2}\.[0-9]{4,15},[0-9]{1,2}\.[0-9]{4,15}]", txt)
print(len(results))
for cord in results:
# curr = the description you can manually type in when saving
curr = txt.split(cord)[1].split("\"]]")[0]
curr = curr[curr.rindex(",\"") + 2:]
cords = str(cord).split(",")
lat = cords[2]
lon = cords[3][:-1]
results.append(s)
new_results += 1
page += 2
Actually getting the correct url
Getting the correct url currently seems to be the hardest part when doing this and I have not fully figured this out aswell. However, for my use-case this is not really important, so I extracted the correct pb-block once and called it a day.
As explained in the other answers, the id of the list is visible in the basic url (here, the 2sXX...) when you navigate to the list in your browser. It seems to usually be 24-32 (?) characters long.
.../maps/<coords>/data=!4m3!11m2!2sXXXX...XXXX!3e3
If you have this id, you can put it into an existing protobuf-block and it may work (I only tested this with 3 different lists, which were all created by the same account, so this theory is far from proven).
Now, how do you get the block? I would just share the one I have, but because I only understand parts of what it does, I fear that it may contain some personal info. Instead, I will share my process of getting it. For this I use Burpsuite. It's a program mainly used for web-security testing and has a free community edition, however for our use-case it is the perfect tool, because with it you can easily tinker with requests, change small parts in the request, send it again and immediately see if your changes changed the response. However for extracting the pb-block, one should also be able to use any program that can intercept browser traffic.
Heres the basic rundown with burp:
From GMaps, share a list that has >20 items (this is important) and copy the public link
In Burp, go to the tab "Proxy", make sure "Intercept" is off and click "Open browser" to open the integrated chromium browser
There, paste the link and wait until maps loaded completely
In Burp, turn "Intercept" on, then in google maps, scroll down in the list, until it starts loading new results (always blocks of 20)
Burp now intercepted all requests the browser made since you turned intercepting on. Click "Forward" and go through all requests, until you see a request in the format
GET /search?tbm=map&authuser=0&hl=de&gl=de&pb=!7i20....
This is what you're looking for.
Optionally, you can now right click into the request-text and click "send to repeater", then switch to the repeater-tab. Here you can edit the request and then send it again, being able to see the response immediately. For example, removing the authuser, hl, gl, q, ech, psi url parameters, the request still works flawlessly. If you remove the tch=1 parameter, the response you get will be in a more human readable format.
In the request-text you should now be able to just search for the list-id you got from the link previously and replace it with the id of another list (search bar is at the bottom in burp). As I said, this worked for me, but it may be possible that the pb-block contains some additional metadata that makes lists from different google-accounts or different types of lists incompatible with specific pb-blocks. Just a theory though. Let me know how it goes!
Further automating
I have theorised that one could automate getting the pb-block using requests-html because it can load html-sites fully but it doesn't get updated anymore. Another option (probably the better one) is Selenium Wire, as you should be able to load the page and intercept the requests, like we did in burp. Seems like a whole lot of work tho :D

This was the only API was able to find was this:
https://www.google.com/bookmarks/?output=xml
Used in a browser you would have to first log in through Google's OAuth. It would then return your saved places. Not sure at the moment how you would embedded the authentication to do this programmatically, but this might send you in the right direction.

I was able to extract the data I needed from my google maps list. Below are some comments that expand on some of the other comments here, along with a script that extracts all of the relevant data points from the network response.
Obtaining the underlying URL
You can easily find this URL by just opening the devtools on your browser, going to the network tab, and refreshing the webpage or scrolling down on the list until it loads new results (the list must be larger than 20 results). You should be able to find the network request that starts with https://www.google.com/search?tbm=map&pb... and go from there.
Increase the results size
I was able to increase the number of results returned from the request by changing the value of the 7i20 parameter. From what I can tell, the 71XX parameter is the size of the page, and the 8iXX parameter is the starting point. I haven't tested how large you can make the page limit, but I tested 100 and it seemed to work fine. This should make dealing with larger lists much easier.
Parsing out the data
Instead of using regex to parse out the relevant data from the response, I found that the response is basically just a massive JSON object and I was able to identify the indexes for specific types of data, such as the name of the place, location, notes, etc. See the script below.
If you look at the buildResults function in the script below, you can see the exact indexes used to extract specific pieces of information. This of course may change over time if the network response changes format at all, so use these as a starting point in the case where the specific values aren't at those indexes anymore. Hopefully they would be close to those locations
Script to parse the data (javascript / node.js)
// Insert the raw text content from the network response from the
// https://www.google.com/search?tbm=map&pb... url below.
const rawInput = null
function prepare(input) {
// There are 5 random characters before the JSON object we need to remove
// Also I found that the newlines were messing up the JSON parsing,
// so I removed those and it worked.
const preparedForParsing = input.substring(5).replace(/\n/g, '')
const json = JSON.parse(preparedForParsing)
const results = json[0][1].map(array => array[14])
return results
}
function prepareLookup(data) {
// this function takes a list of indexes as arguments
// constructs them into a line of code and then
// execs the retrieval in a try/catch to handle data not being present
return function lookup(...indexes) {
const indexesWithBrackets = indexes.reduce((acc, cur) => `${acc}[${cur}]`, '')
const cmd = `data${indexesWithBrackets}`
try {
const result = eval(cmd)
return result
} catch(e) {
return null
}
}
}
function buildResults(preparedData) {
const results = []
for (const place of preparedData) {
const lookup = prepareLookup(place)
// Use the indexes below to extract certain pieces of data
// or as a starting point of exploring the data response.
const result = {
address: {
street_address: lookup(183, 1, 2),
city: lookup(183, 1, 3),
zip: lookup(183, 1, 4),
state: lookup(183, 1, 5),
country_code: lookup(183, 1, 6),
},
name: lookup(11),
tags: lookup(13),
notes: lookup(25,15,0,2),
placeId: lookup(78),
phone: lookup(178,0,0),
coordinates: {
long: lookup(208,0,2),
lat: lookup(208,0,3)
}
}
results.push(result)
}
return results
}
const preparedData = prepare(rawInput)
const listResults = buildResults(preparedData)
console.log(listResults)

Related

Questions on extending GAS spreadsheet usefulness

I would like to offer the opportunity to view output from the same data, in a spreadsheet, TBA sidebar and, ideally another type of HTML window for output created, for example, with a JavaScript Library like THREE.
The non Google version I made is a web page with iframes that can be resized, dragged and opened/closed and, most importantly, their content shares the same record object in the top window. So, I believe, perhaps naively, something similar could be made an option inside this established and popular application.
At the very least, the TBA trial has shown me it useful to view and manipulate information from either sheet or TBA. The facility to navigate large building projects, clone rooms and floors, and combine JSON records (stored in depositories like myjson) for collaborative work is particularly inspiring for me.
I have tried using the sidebar for different HTML files, but the fact only one stays open is not very useful, and frankly, sharing record objects is still beyond me. So that is the main question. Whether Google people would consider an extra window type is probably a bit ambitious, but I think worth asking.
You can't maintain a global variable across calls to HtmlService. When you fire off an HtmlService instance, which runs in the browser, the server side code that launched it exits.
From that point control is client side, in the HtmlService code. If you then launch a server side function (using google.script.run from client side), a new instance of the server side script is launched, with no memory of the previous instance - which means that any global variables are re-initialized.
There are a number of techniques for peristing values across calls.
The simplest one of course is to pass it to the htmlservice in the first place, then to pass it back to server side as an argument to google.script.run.
Another is to use property service to hold your values, and they will still be there when you go back, but there is a 9k maximum entry size
If you need more space, then the cache service can hold 100k in a single entry and you can use that in the same way (although there is a slight chance it will be cleaned away -- although it's never happened for me)
If you need even more space, there are techniques for compressing and/or spreading a single object across several cache entries - as documented here http://ramblings.mcpher.com/Home/excelquirks/gassnips/squuezer. This same method supports Google Drive, or Google cloud storage if you need to persist data even longer
Of course you can't pass non-stringifiable objects like functions and so on, but you can postpone their evaluation and allow the initialized server side script to evaulate them, and even share the same code between server, client or across projects.
Some techniques for that are described in these articles
http://ramblings.mcpher.com/Home/excelquirks/gassnips/nonstringify
http://ramblings.mcpher.com/Home/excelquirks/gassnips/htmltemplateresuse
However in your specific example, it seems that the global data you want is fetched from an external api call. Why not just retrieve it client side in any case ? If you need to do something with it server side, then pass it to the server using google.script.run.
window.open and window.postMessage() solved both the problems I described above.
I hope you will be assured from the screenshot and code that the usefulness of Google sheets can be extended for the common good. At the core is the two methods for inputting, copying and reviewing textual data - spreadsheet for a slice through a set of data, and TBA for navigation of associations in the Trail (x axis) and Branches (y axis), and for working on Aspects (z axis) of the current selection that require attention, in collaborations, from different interests.
So, for example, a nurse would find TBA useful for recording many aspects of an examination of a patient, whereas a pharmacist might find a spreadsheet more useful for stock control. Both record their data in a common object I call 'nset' (hierarchy of named sets), saved in the cloud and available for distribution in collaborative activities.
TBA is also useful for cloning large sets of records. For example, one room, complete with furniture can be replicated on one floor, then that floor, complete with rooms can be replicated for a complete tower.
Being able to maintain parallel nset objects in multiple monitor windows by postMessage means unrivalled opportunities to display the same data in different forms of multimedia, including interactive animation, augmented reality, CNC machine instruction, IOT controls ...
Here is the related code:
From the TBA in sidebar:
window.addEventListener("message", receiveMessage, false);
function openMonitor(nset){
var params = [
'height=400',
'width=400'
].join(',');
let file = 'http://glasier.hk/blazer/model.html';
popup = window.open(file,'popup_window', params);
popup.moveTo(100,100);
}
var popup;
function receiveMessage(event) {
let ed,nb;
ed = event.data;
nb = typeof ed === "string"? ed : nb[0];
switch(nb){
case "Post":
console.log("Post");
popup.postMessage(["Refreshing nset",nset], "http:glasier.hk");
break;
}
}
function importNset(){
google.script.run
.withSuccessHandler(function (code) {
root = '1grsin';
trial = 'msm4r';
orig = 'ozs29';
code = orig;
path = "https://api.myjson.com/bins/"+code;
$.get(path)
.done((data, textStatus, jqXHR) => {
nset = data;
openMonitor(nset);
cfig = nset.cfig;
start();
})
})
.sendCode();
}
From the popup window:
$(document).ready(function(){
name = $(window).attr("name");
if(name === "workshop"){
tgt = opener.location.href;
}
else{
tgt = "https://n-rxnikgfd6bqtnglngjmbaz3j2p7cbcqce3dihry-0lu-script.googleusercontent.com"
}
$("#notice").html(tgt);
opener.postMessage("Post",tgt);
$(window).on("resize",function(){
location.reload();
})
})
}
window.addEventListener("message", receiveMessage, false);
function receiveMessage(event) {
let ed,nb;
ed = event.data;
nb = typeof ed === "string"? ed : ed[0];
switch(nb){
case "Post": popup.postMessage(["nset" +nset], "*"); break;
default :
src = event.origin;
notice = [ed[0]," from ",src ];
console.log(notice);
// $("#notice").html(notice).show();
nset = ed[1];
cfig = nset.cfig;
reloader(src);
}
}
I should explain that the html part of the sidebar was built on a localhost workshop, with all styles and scripts compiled into a single file for pasting in a sidebar html file. The workshop also is available online. The Google target is provided by event.origin in postMessage. This would have to be issued to anyone wishing to make different monitors. For now I have just made the 3D modelling monitor with Three.js.
I think, after much research and questioning around here, this should be the proper answer.
The best way to implement global variables in GAS is through userproperties or script properties.https://developers.google.com/apps-script/reference/properties/properties-service. If you'd rather deal with just one, write them to an object and then json.stringify the object (and json.parse to get it back).

How to restrict fields returned by stackexchange api, and turn off paging?

I'd like to have a list of just the current titles for all questions in one of the smaller (less than 10,000 questions) stackexchange site. I tried the interactive utility here: https://api.stackexchange.com/docs/questions and it both reports the result as a json at the bottom, and produces the requesting url at the top. For example:
https://api.stackexchange.com/2.2/questions?order=desc&sort=activity&tagged=apples&site=cooking
returns this JSON in my browser:
{"items":[{"tags":["apples","crumble"],"owner":{ ...
...
...],"has_more":true,"quota_max":300,"quota_remaining":252}
What is quota? It was 10,000 on one search on one site, but suddenly it's only 300 here.
I won't be doing this very often, what I'd like is the quickest way to edit that (or similar of course) url so I can get a list of all of the titles on a small site. I don't understand how to use paging, and I don't need any of the other fields. I don't care if I get them, but I'm thinking if I exclude them I can have more at once.
If I need to script it, python (2.7) is my preferred (only) language.
quota_max is the number of requests your application is allowed per day. 300 is the default for an unregistered application. This used to be mentioned directly on the page describing throttles, but seems to have been removed. Here is historical information describing the default.
To increase this to 10,000, you need to register an application and then authenticate by passing an access token in your script.
To get all titles on a site, you can use a Python library to help:
StackAPI. The answer below will use this library. DISCLAIMER: I wrote this library
Py-StackExchange
SEAPI
StackPy
Assuming you have registered your application and authenticated we can proceed.
First, install StackAPI (documentation):
pip install stackapi
This code will then grab the 10,000 most recent questions (max_pages * page_size) for the site hardwarerecs. Each page costs you one API hit, so the more items per page, the few API calls.
from stackapi import StackAPI
SITE = StackAPI('hardwarerecs')
SITE.page_size = 100
SITE.max_pages = 100
# Filter to only get question title and link
filter = '!BHMIbze0EQ*ved8LyoO6rNjkuLgHPR'
questions = SITE.fetch('questions', filter=filter)
In the questions variable is a dictionary that looks very similar to the API output, except that the library did all the paging for you. Your data is in questions['data'] and, in this case, contains a list of dictionaries that look like this:
[
...
{u'link': u'http://hardwarerecs.stackexchange.com/questions/29/sound-board-to-replace-a-gl2200-in-a-house-of-worship-foh-setting',
u'title': u'Sound board to replace a GL2200 in a house-of-worship FOH setting?'},
{ u'link': u'http://hardwarerecs.stackexchange.com/questions/31/passive-gps-tracker-logger',
u'title': u'Passive GPS tracker/logger'}
...
]
This result set is limited to only the title and the link because of the filter we applied. You can find the appropriate filter by adjusting what fields you want in the web UI and copying the filter field.
The hardwarerecs parameter that is passed when creating the SITE parameter is the first part of the site's domain URL. Alternatively, you can find it by looking at the api_site_parameter for your site when looking at the /sites end point.

Permanent links to thumbnails in Google Drive API

I'm using Google Drive API (PHP) to upload some photos to my Drive. When a file is uploaded, a Google_DriveFile object is returned in the response to confirm the successful transfer. It includes a field called thumbnailLink, accessible through the getThumbnailLink getter. Its content may look like this:
https://lh4.googleusercontent.com/dqVdU195R4_0ZtWxsJlhW1Fr2K30xa2hH3V1KV4UrTBl9QkhOSR0ZqN9HoB-TjEQv8SIJw=s220
Until today, I was sure that the link doesn't change by itself over time. However, when I tried to display a thumbnail of a photo I have on my Drive, using a cached address I keep in my local database, I got a 403 error - you can see it under the mentioned link. I asked the API for the current link to the thumbnail and it's now completely different.
It happened to me only once but for multiple files, i.e. all the files I had on my Drive suddenly got new thumbnail links.
Is there a way to quickly retrieve a thumbnail of a document (preferably, a photo) by some constant value or to be sure that it won't change? The perfect solution would be to access the thumbnail under a link that includes the document's id instead of some hash that may change.
Try this:
https://drive.google.com/thumbnail?authuser=0&sz=w320&id=[fileid]
Where:
sz is a size, where you may use as w (width), as h (height)
fileid is a file id. You may find it in "share" menu by right click in Google Drive UI.
I have gone through the API Documentation as they have provided:
Important: Thumbnails are invalidated each time the content of the file changes. When supplying thumbnails, it is important to upload new thumbnails each time the content is modified.
According to the information it means that a new Thumbnail is only generated only when the contents of the file are modifided. But in your case it is really weired thing and the contents are not changed but the thumbnail are Changed. As from documentation there is no batch process thing avaiable but another way around is available i.e. Web Hook
According to the Documentation there is web hook available i.e. Files:Watch process through which one can track the changes are made to file. Thus, it means every time contents are changed then hook would run and you can change the cache of the image thumbnail.
HTTP request can be sent to request the watching the files changing
POST https://www.googleapis.com/drive/v2/files/fileId/watch
Here fileID means the ID provided after loading the file.
In the request body, supply data with the following structure:
id ==> string (A UUID or similar unique string that identifies
this channel.)
token# ==> string (An arbitrary string delivered to the target address with
each notification delivered over this channel).
expiration# => long (Date and time of notification channel expiration,
expressed as a Unix timestamp, in milliseconds.)
type ==> string (The type of delivery mechanism used for this channel.
The only option is web_hook.)
address => string (The address where notifications are delivered
for this channel.)
# Optional.
If the contents get changed then new Thumbnail is generated and hook will notify you address and through your address you can fetch new information.
Here is another solution. Let's say we store only GDrive ID of the images or PDFs (google generate thumbs for many file types).
we can send request to gDrive to get valid thumbnail since looks like thumbs will expire even if there is no changes to the file.
In this case each thumbnail inside Angular component. If you use something else you can create array of links and iterate through it to create proper thumb links.
Here is the code:
const thumb = () => {
if (this.item.DriveId) {
this.getThumb(this.item.DriveId, this.authToken)
.then(response => {
console.log(`response from service ${response}`);
// Set thumbnail width size to 300px or any other width if needed
this.item.externalThumbnailId = response.slice(0, -3) + 300;
})
//here we can handle cases when API limit exceeded 10 req in a sec
.catch(e => {
if(e.data.error.message == 'User Rate Limit Exceeded'){
console.log('Failed to load thumb. trying one more time');
setTimeout(thumb, 1000);
} else {
console.log(e);
}
});
}
};
//call this function on component load.
thumb();
Another solution will be to write some backend script that updates thumbs in DB records.

Drive API files.list returning nextPageToken with empty item results

In the last week or so we got a report of a user missing files in the file list in our app. We we're a bit confused at first because they said they only had a couple files that matched our query string, but with a bit of work we were able to reproduce their issue by adding a large number of files to our Google Drive. Previously we had been assuming people would have less than 100 files and hadn't been doing paging to avoid multiple files.list requests.
After switching to use paging, we noticed that on one of our test accounts was sending hundreds and hundreds of files.list requests and most of the responses did not contain any files but did contain a nextPageToken. I'll update as soon as I can get a screenshot - but the client was sending enough requests to heat the computer up and drain battery fairly quickly.
We also found that based on what the query is even though it matches the same files it can have a drastic effect of the number of requests needed to retrieve our full file list. For example, switching '=' to 'contains' in the query param significantly reduces the number of requests made, but we don't see any guarantee that this is a reasonable and generalizeable solution.
Is this the intended behavior? Is there anything we can do to reduce the number of requests that we are sending?
We're using the following code to retrieve files created by our app that is causing the issue.
runLoad: function(pageToken)
{
gapi.client.drive.files.list(
{
'maxResults': 999,
'pageToken': pageToken,
'q': "trashed=false and mimeType='" + mime + "'"
}).execute(function (results)
{
this.filePageRequests++;
if (results.error || !results.nextPageToken || this.filePageRequests >= MAX_FILE_PAGE_REQUESTS)
{
this.isLoading(false);
}
else
{
this.runLoad(results.nextPageToken);
}
}.bind(this));
}
It is, but probably shouldn't be, the correct behaviour.
It generally occurs when using the drive.file scope. What (I think) is happening is that the API layer is fetching all files, and then removing those that are outside of the current scope/query, and returning the remainder to your client app. In theory, a particular page of files could have no files in-scope, and so the returned array is empty.
As you've seen, it's a horribly inefficient way of doing it, but that seems to be the way it is. You simply have to keep following the next page link until it's null.
As to "Is there anything we can do to reduce the number of requests that we are sending?"
You're already setting max results to 999 which is the obvious step. Just be aware that I have seen this value trigger internal errors (timeouts?) which manifest themselves as 500 errors. You might want to sacrifice efficiency for reliability and stick to the default of 100 which seems to be better tested.
I don't know if the code you posted is your actual code, or just a simplified illustration, but you need to make sure you are dealing with 401 errors (auth expiry) and 500 errors (sometimes recoverable with a retry)

Need help parsing through a page & getting all comments found into an Array using a RegEx pattern in Google Apps Script?

My problem lies in 2 parts however I'm hoping solving 1 will fix the other. I've been trying to parse through a page and get all the comments found within a forum thread.
The comments are found using a RegEx pattern and the idea is that whatever lies in the comment will be read into an array until there aren't any more comments left. Each comment div follows this format
<div id="post_message_480683" style="margin-right:2px;"> something </div>
I'm trying to locate up to "post_message_[some number]" since each number seems to be generated randomly and then get whatever is between that particular div. My 1st problem is my RegEx just doesn't seem to be working I've tried a few but none yielded any results (except for when I insert the post message no. in manually), Here's the code so far:
function GetPosts() {
var posts = new Array(60);
var url = "http://forums.blackmesasource.com/showthread.php?p=480683";
var geturl = UrlFetchApp.fetch(url).getContentText().toString();
var post_match = geturl.match(/<div id="post_message_(.+)" style="margin-right:2px;">(\w.+)<\/div>/m);
Logger.log(post_match);
}
Edit: I initially tried getting this info via GAS's Xml.Parse() class but after grabbing the URL I just didn't know what to do since suffixing
.getElement().getElement('div') (I also tried .getElements('div') and other variations with 'body' & 'html')
would cause an error. Here is the last code attempt I tried before trying the RegEx route:
function TestArea() {
var url = "http://forums.blackmesasource.com/showthread.php?p=480683";
var geturl = UrlFetchApp.fetch(url).getContentText().toString();
//after this point things stop making sense
var parseurl = Xml.parse(geturl, true);
Logger.log(geturl);
//None of this makes sense because I don't know HOW!
//The idea: Store each cleaned up Message Div in an Array called posts
//(usually it's no more than 50 per page)
//use a for loop to write each message into a row in GoogleSpreasheet
for (var i = 0; i <= parseurl - 1; i++) {
var display = parseurl[i];
Logger.log(parseurl); }
}
Thanks for reading!
In general like the comment points out - be aware of parsing HTML with RegEx.
In my past personal experience, I've used Yahoo's YQL platform to run the HTML through and using XPath on their service. Seems to work decently well for simple reliable markup. You can then turn that into a JSON or XML REST service that you can grab via UrlFetch and work on that simplified response. No endorsement here, but this might be easier than to bring down the full raw HTML into Google Apps Script. See below for the YQL Console. I also don't know what their quotas are - you should review that.
Of course, the best course is to convince the site owner to provide an RSS feed or an API.