Google Apps Script to download PDFs from UN ODS

Background
The UN Secretary-General and other organs issue hundreds of reports to the General Assembly each year, and there is no unified list of these reports as there is for other documents. There is, however, a simplified URL for reading these reports by document code: http://undocs.org/[document code], where the document code has the format A/[Session]/[Document Number]. For example, the document code "A/71/1" is available at "https://undocs.org/A/71/1".
I'm trying to download all of these documents for the past 15 years, but instead of manually typing in each of these, I'd like to set up a Google Apps Script to do it for me.
Problem
When I try the simple call UrlFetchApp.fetch("http://undocs.org/A/71/1"); for example, it fetches an error page saying that I am using an unauthorized method of accessing the page. This is the same page that shows up if you block cookies, and sometimes when you try to access the page in an incognito window.
Now, I'm not looking to hack into the UN, but simply to download some PDFs that are up for public access. I need to figure out what sort of parameters I need to pass with the .fetch() method for the request to be authorized by the page.
Note: I scoured the undocs.org site looking for any guidance, and I found none.
tl;dr
Trying to access United Nations Official Document System using the UrlFetchApp from Google Apps Script, but I can't figure out how to get the request to be authorized.

Short answer - I don't think you'll be able to get it with a one-line fetch.
If you look at the HTML returned when you fetch https://undocs.org/A/71/1, you'll see that it embeds a frame that gets its content from https://daccess-ods.un.org/access.nsf/Get?OpenAgent&DS=A/71/1&Lang=E. Then, if you look at the HTML returned by that frame, you'll see two things:
A frame that loads https://documents-dds-ny.un.org/prod/ods_mother.nsf?Login&Username=freeods2&Password=1234
A redirect to the actual PDF at https://documents-dds-ny.un.org/doc/UNDOC/GEN/N16/206/02/PDF/N1620602.pdf?OpenElement
I presume that the first link sets a cookie indicating that the login has occurred, which the second link then verifies before returning the content.
Things you could try:
A multi-step fetch, where you first get the content from undocs.org, parse it to get the link to the actual PDF, then log in and fetch the PDF. Your script would have to persist cookies between fetches, though.
Write your script in a different tool (such as Python).
Use a spider/crawler tool to navigate the UN site as if it were a human.
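The multi-step fetch in the first suggestion could be sketched roughly as follows. This is an untested outline, not a confirmed working solution: the helper names are made up for illustration, it assumes the login frame really does set a cookie through a Set-Cookie header (which is only a presumption above), and it assumes the header is exposed under the key 'Set-Cookie'. UrlFetchApp does not keep cookies between calls, so the cookie is copied by hand.

```javascript
// Build the short undocs.org URL for a document code such as "A/71/1".
function buildUndocsUrl(documentCode) {
  return 'https://undocs.org/' + documentCode;
}

// Return the src of the first <frame> or <iframe> in the HTML, or null.
function extractFrameSrc(html) {
  const match = /<i?frame\b[^>]*\bsrc\s*=\s*["']([^"']+)["']/i.exec(html);
  return match ? match[1].replace(/&amp;/g, '&') : null;
}

// Turn Set-Cookie header value(s) into a single Cookie request header,
// keeping only the name=value part of each cookie.
function toCookieHeader(setCookie) {
  if (!setCookie) return '';
  const list = Array.isArray(setCookie) ? setCookie : [setCookie];
  return list.map(function (c) { return c.split(';')[0].trim(); }).join('; ');
}

// Apps Script only: hit the login URL, then fetch the PDF with the cookie attached.
// loginUrl and pdfUrl are the two URLs found in the second frame's HTML.
function fetchPdfWithLogin(loginUrl, pdfUrl) {
  const login = UrlFetchApp.fetch(loginUrl, {
    followRedirects: false,
    muteHttpExceptions: true
  });
  const cookie = toCookieHeader(login.getAllHeaders()['Set-Cookie']);
  const pdf = UrlFetchApp.fetch(pdfUrl, {
    headers: { Cookie: cookie },
    muteHttpExceptions: true
  });
  return pdf.getBlob();
}
```

The first three helpers are plain JavaScript and can be checked outside Apps Script; only fetchPdfWithLogin depends on UrlFetchApp. Pulling the PDF redirect target out of the second frame is left out because its exact markup is not shown above.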

Related

How to use preview image or PDF from Google Drive, using API?

I'm writing a web application that displays the contents of Google Drive (images and files) using the API.
Currently, I can only see thumbnails of images/files (without logging in to Google Drive).
If I want to preview a file, I need to be logged in to Google Drive, and then I can use the link returned in "webViewLink" to actually see the file.
I know I can click on a folder or file in Drive and share it, but I'm afraid my customers will not be able to do that, and it is complicated anyway.
I already display a Google dialog where the customer has to allow access to upload, delete, etc. files, and now he cannot preview the file?
The application is designed to display images and other files to the customer only, inside the app only, and not for sharing. In other words, I want to display images that he can see anyway if he is logged in to Drive.
Is there any other option to let the customer preview the file if he has already allowed full access?
Thanks.
Authorizing with OAuth does not automatically log you in. Users use their credentials to give their permission to create an access token, which needs to be used in any API calls. It does not imply that a browser session was created, that's a separate process.
You'll notice that the webViewLink is just the regular Drive URL with /view at the end. It's a page that requires the user to be signed in:
"webViewLink": "https://drive.google.com/file/d/<FILE-ID>/view?usp=drivesdk",
I'm not aware of any methods to sign in the user at the same time they use OAuth, but if you send your access_token in an Authorization: Bearer <access_token> header when trying to access the above URL you can see the preview without having to sign in. Depending on your platform I think implementing this would be tricky, and maybe not possible in Apps Script alone.
My recommendation as a workaround is to just use a full thumbnail. Don't know if you're aware of this, but the thumbnail URL has a =s parameter at the end that defines its height in pixels:
"thumbnailLink": "https://lh5.googleusercontent.com/<THUMB-ID>=s220",
You can change the default =s220 at the end to a higher size or remove it completely to get pretty much the full size of the image or PDF page. This may be enough for your users to figure out what the file is.
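As a small illustration of the thumbnail workaround, the size suffix can be rewritten with a helper like the one below. The function name is hypothetical, and it assumes the link ends in the =s<number> form shown above.

```javascript
// Replace or remove the trailing "=s<number>" size suffix of a thumbnailLink.
// Pass a size to request that many pixels, or omit it to drop the suffix.
function resizeThumbnailLink(thumbnailLink, size) {
  const base = thumbnailLink.replace(/=s\d+$/, '');
  return size ? base + '=s' + size : base;
}
```

For example, resizeThumbnailLink(file.thumbnailLink, 1000) asks for a larger preview, while calling it without a size removes the suffix entirely.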

How to find parameters (variables) in a web site by using google chrome for JMeter

I just started using JMeter and want to know how to see a page's variables in Google Chrome. I have tried many approaches and watched a lot of videos without finding an answer. Every video says you can find them in the Network tab of the developer tools, but on the website I'm testing I can only see a token inside JavaScript code; there is no token exposed as a variable. Is there a way to import it into JMeter, or another way to find the variables in Google Chrome? In short, I'm working on a project where I want to load test a website, but I'm having problems with the POST request: my threads cannot log in because I cannot find the variables. I need help!
This is what I am trying to reach (screenshot)...
This is my Chrome screenshot
I'm just trying to get the parameters like in the first picture.
There are 2 possibilities:
The "variable" comes in the response, take a closer look at:
response URL as it may be a part of the URL after redirection
response headers (can be observed in the "Network" tab of the developer tools)
response body, you can view page source
The variable gets "calculated" in the browser using JavaScript code. If this is the case, you should find the relevant JavaScript function and implement it using JMeter's JSR223 Test Elements.
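For the first possibility, the underlying idea is to pull the dynamic value out of the previous response and send it back in the next request. The sketch below shows that logic in plain JavaScript; it is not JMeter code, and the field name csrf_token is only a made-up example. Inside JMeter, the same extraction is normally done with a post-processor such as the Regular Expression Extractor or CSS Selector Extractor.

```javascript
// Find an <input> with the given name in an HTML response body
// and return its value attribute, or null if it is not present.
function extractHiddenField(html, fieldName) {
  const tags = html.match(/<input\b[^>]*>/gi) || [];
  for (const tag of tags) {
    const name = /\bname\s*=\s*["']([^"']*)["']/i.exec(tag);
    if (name && name[1] === fieldName) {
      const value = /\bvalue\s*=\s*["']([^"']*)["']/i.exec(tag);
      return value ? value[1] : null;
    }
  }
  return null;
}

// Example: the login page response carries a token that the POST must echo back.
const loginPage = '<form><input type="hidden" name="csrf_token" value="abc123">' +
                  '<input type="text" name="user"></form>';
const token = extractHiddenField(loginPage, 'csrf_token');
```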
You can also try recording your test scenario using BlazeMeter Proxy Recorder, it's capable of exporting recorded scripts in "SmartJMX" mode with automatic detection and correlation of the dynamic parameters. In case #2 it will not help, but if the variable comes with the response most probably it will be able to detect it. More information: How to Cut Your JMeter Scripting Time by 80%

Google apps script cannot get url parameter from page on new Google site

My current site (Golf League) uses several scripts to let players schedule whether they are playing, display various results pages, etc. It seems as though the New Google Sites implementation does not allow a parameter to be passed in the page URL and picked up by an embedded Google web app (published from my script).
This link shows an example: https://sites.google.com/site/kitchenergaffers/home/general-gaffers-information/publish/directory-of-results?display=directory
My web app (built with GAS) implements doGet(e). The "display" parameter tells the script which page to format and display; the script gets it by extracting e.queryString. I use a similar approach for players scheduling their absences, where another URL parameter identifies the player who may be changing their availability.
It seems as though this ability is not going to be supported in the New Google Sites, so I am looking for an alternative (and free) web building facility where I can launch GAS web apps and access the page url parameters the same (or similar) way. Wordpress, Wix etc may be candidates, but it is difficult to tell from their introductory info whether it can be done. If someone has already found a site facility and methodology I would appreciate the guidance.
Just in case anyone finds this in a search, I have found a workaround.
What I had missed is that a script can be the target of a URL and will execute in a browser on its own. It does not need a "hosting" page. So to achieve what I need, instead of sending a link to the Google Sites page, I can send a link to the script directly and it will happily execute in its own browser environment. In some cases, I may need to add a bit of text to the HTML returned by the script to replace what was on the Sites page.
So this link (below) achieves what I needed. Be aware that the links displayed by the script currently still point to the original Sites page.
https://script.google.com/macros/s/AKfycbxichdoGrHbImuudkJbuhhD00GpHvVvc-Ph_BTpSI4863pMevVx/exec?display=directory
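For anyone unfamiliar with the pattern, a doGet(e) handler that switches on the display parameter might look roughly like this. It is a minimal sketch, not the actual league script: the page names and the returned HTML are placeholders, and it reads e.parameter rather than parsing e.queryString by hand.

```javascript
// Hypothetical list of pages the web app knows how to render.
const KNOWN_PAGES = ['home', 'directory', 'schedule'];

// Pick the page named by ?display=..., falling back to 'home'
// for missing or unknown values.
function pickPage(e) {
  const params = (e && e.parameter) || {};
  const page = params.display;
  return KNOWN_PAGES.indexOf(page) !== -1 ? page : 'home';
}

// Apps Script entry point for GET requests to the /exec URL.
function doGet(e) {
  const page = pickPage(e);
  return HtmlService.createHtmlOutput('<p>Showing page: ' + page + '</p>');
}
```

Restricting the parameter to a known list also avoids echoing arbitrary query-string text into the returned HTML.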

Get downloads & file browsing requests from HTMLFrame

I am able to load and display files using the HTMLLoader class. http://help.adobe.com/en_US/FlashPlatform/reference/actionscript/3/flash/html/HTMLLoader.html
The problem is that when a user navigates to a download link or an upload button, nothing happens. I heard somewhere that any downloads get sent to the user's main documents folder. Is there any way to intercept this and get some details? Someone suggested using the Socket class to fetch the data and the File class to control where it goes, but I couldn't follow the demonstration.
Bonus question: what properties do I have to set to make Google understand that this browser is not a bot? I get the following in plain text when trying to navigate to http://www.google.com. Its other services work completely fine, though.
Google
Sorry...
We're sorry...
... but your computer or network may be sending automated queries. To protect our users, we can't process your request right now.

Unable to serve download links in google apps script

UPDATE: I have found a solution. This doesn't necessarily address every case, so I will leave the question open for a short time in case someone can enlighten me more. I solved it by changing the format of the url: Google Drive allows this format for downloading files:
https://docs.google.com/uc?export=download&id=FILE_ID
So I don't know whether this is a problem for other URLs, nor exactly why .getDownloadUrl() doesn't work; maybe someone can explain. But for now this seems to work in the browsers that I can test.
I have a simple WebApp script which I run on a Google Site by adding the Apps Script gadget. The gadget runs exactly as the Forms example on:
https://developers.google.com/apps-script/guides/html/communication#forms
The gadget is designed to do the following: when the page is loaded, a form is returned, and the user must enter a license key to get a link to download a product. My code serves the form OK, and gets the form submit OK; and it then validates the key, and if valid, sends back a link to download. All that works fine; and the problem is that no matter what I try to return for the download link, the caja iframe wrapper is preventing the click on the link from actually downloading the file.
My preferred URL to return is in fact via the Drive API: the download file is on the Google Drive, and I get the download link like so:
DriveApp.getFileById(downloadFileId).getDownloadUrl()
But when the returned link is clicked inside that caja iframe generated for the WebApp gadget, nothing happens. I have tried a few other URL formats pointing to that file on the Drive, but nothing is working for a download.
Is this possible?
The .getDownloadUrl() method returns a temporary URL that can be used to download the file. This URL is valid only for a short period of time, after which it expires and no longer returns the file; that is probably why the links in your web app do not work. I can't remember exactly how long the URL is valid for, but I think it could be as short as 5 minutes.
The permanent download URL is stored in another file property: webContentLink. However, this property is not (yet) available through the built-in Drive service (DriveApp); you must use the Advanced Drive Service to access it. You can enable the Drive API under Advanced Google services in your script. After it is enabled, you can use it like so:
var file = Drive.Files.get(FILE_ID_HERE);
var dlUrl = file.webContentLink;
This will return the link just like the one you found and posted in your update. An advantage of using the Drive API to get the link, instead of hard-coding it, is that if Google ever changes the format of that URL, your code using Drive API to get the link will continue to work, while hard-coded links will not.
Full Drive Web API reference (what Advanced Drive Service uses) is at https://developers.google.com/drive/v2/reference/.
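Putting the two approaches from this thread together, a helper could prefer webContentLink and only fall back to the hand-built URL from the question's update. This is a sketch under assumptions: the function names are hypothetical, the optional driveService argument exists only so the logic can be exercised outside Apps Script, and it assumes the v2 Advanced Drive Service used above, where Files.get returns webContentLink without extra options.

```javascript
// Hard-coded fallback in the format mentioned in the question's update.
function buildDownloadUrl(fileId) {
  return 'https://docs.google.com/uc?export=download&id=' + encodeURIComponent(fileId);
}

// Prefer the file's webContentLink from the Advanced Drive Service;
// fall back to the hard-coded format if the service or property is missing.
function getPermanentDownloadUrl(fileId, driveService) {
  const drive = driveService || (typeof Drive !== 'undefined' ? Drive : null);
  if (drive) {
    const file = drive.Files.get(fileId);
    if (file && file.webContentLink) {
      return file.webContentLink;
    }
  }
  return buildDownloadUrl(fileId);
}
```

In a web app, the returned string would be sent back to the form's success handler in place of the result of .getDownloadUrl().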