Pulling PubMed data into Google Sheets - google-apps-script

I'm looking for some help. I am trying to grab an author's publications from PubMed and populate the data into Google Sheets using Apps Script. I've gotten as far as the code below and am now stuck.
Basically, what I have done is first pull all the PubMed IDs for a particular author, whose name comes from the name of the sheet. Then I tried creating a loop to go through each PubMed ID's JSON summary and pull each field I want. I have been able to pull the pub date. I set it up with the idea that I would run a loop for each field of that PMID I want, store it in an array, and then return it to my sheet. However, I'm now stuck trying to get the second field (title) and all the subsequent fields (e.g. authors, last author, first author, etc.).
Any help would be greatly appreciated.
function IMPORTPMID(){
var ss = SpreadsheetApp.getActiveSpreadsheet();
var sheet = ss.getSheets()[0];
var author = sheet.getSheetName();
var url = ("https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&term=" + author + "[author]&retmode=json&retmax=1000");
var response = UrlFetchApp.fetch(url);
var AllAuthorPMID = JSON.parse(response.getContentText());
var xpath = "esearchresult/idlist";
var patharray = xpath.split("/");
for (var i = 0; i < patharray.length; i++) {
AllAuthorPMID = AllAuthorPMID[patharray[i]];
}
var PMID = AllAuthorPMID;
var PDparsearray = [PMID.length];
var titleparsearray = [PMID.length];
for (var x = 0; x < PMID.length; x++) {
var urlsum = ("https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi?db=pubmed&retmode=json&rettype=abstract&id=" + PMID[x]);
var ressum = UrlFetchApp.fetch(urlsum);
var contentsum = ressum.getContentText();
var jsonsum = JSON.parse(contentsum);
var PDpath = "result/" + PMID[x] + "/pubdate";
var titlepath = "result/" + PMID[x] + "/title";
var PDpatharray = PDpath.split("/");
var titlepatharray = titlepath.split("/");
for (var j = 0; j < PDpatharray.length; j++) {
var jsonsum = jsonsum[PDpatharray[j]];
}
PDparsearray[x] = jsonsum;
}
var tempArr = [];
for (var obj in AllAuthorPMID) {
tempArr.push([obj, AllAuthorPMID[obj], PDparsearray[obj]]);
}
return tempArr;
}

From a PubMed JSON response for a given PubMed ID, you should be able to determine the fieldnames (and paths to them) that you want to include in your summary report. Reading them all is simpler to implement if they are all at the same level, but if some are properties of a sub-field, you can still access them if you give the right path in your setup.
Consider the "source JSON":
[
{ "pubMedId": "1234",
"name": "Jay Sahn",
"publications": [
{ "pubId": "abcd",
"issn": "A1B2C3",
"title": "Dynamic JSON Parsing: A Journey into Madness",
"authors": [
{ "pubMedId": "1234" },
{ "pubMedId": "2345" }
]
},
{ "pubId": "efgh",
...
},
...
],
...
},
...
]
The pubId and issn fields would be at the same level, while the publications and authors would not.
You can retrieve both the pubMedId and publications fields (and others you desire) in the same loop by either 1) hard-coding the field access, or 2) writing code that parses a field path and supplying field paths.
Option 1 is likely to be faster, but much less flexible if you suddenly want to get a new field, since you have to remember how to write the code to access that field, along with where to insert it, etc. God save you if the API changes.
Option 2 is harder to get right, but once right, will (should) work for any field you (properly) specify. Getting a new field is as easy as writing the path to it in the relevant config variable. There are possibly libraries that will do this for you.
To convert the above into spreadsheet rows (one per pubMedId in the outer array, e.g. the IDs you queried their API for), consider this example code:
function foo() {
const sheet = SpreadsheetApp.getActiveSheet(); // or get a sheet reference some other way
const resp = UrlFetchApp.fetch(...).getContentText();
const data = JSON.parse(resp);
// paths relative to the outermost field, which for the imaginary source is an array of "author" objects
const fields = ['pubMedId', 'name', 'publications/pubId', 'publications/title', 'publications/authors/pubMedId'];
const output = data.map(function (author) {
var row = fields.map(function (f) {
var desiredField = f.split('/').reduce(delve_, author);
return JSON.stringify(desiredField);
});
return row;
});
sheet.getRange(1, 1, output.length, output[0].length).setValues(output);
}
function delve_(parentObj, property, i, fullPath) {
// Dive into the given object to get the path. If the parent is an array, access its elements.
if (parentObj === undefined)
return;
// Simple case: parentObj is an Object, and property exists.
const child = parentObj[property];
if (child !== undefined && child !== null) // keep legitimate falsy values such as 0 or ""
return child;
// Not a direct property / index, so perhaps a property on an object in an Array.
if (parentObj.constructor === Array)
return collate_(parentObj, fullPath.splice(i));
console.warn({message: "Unhandled case / missing property",
args: {parent: parentObj, prop: property, index: i, pathArray: fullPath}});
return; // property didn't exist, user error.
}
function collate_(arr, fields) {
// Obtain the given property from all elements of the array.
const results = arr.map(function (element) {
return fields.slice().reduce(delve_, element);
});
return results;
}
Executing this yields output in Stackdriver (the screenshot is not reproduced here).
Obviously you probably want some different (aka real) fields, and probably have other ideas for how to report them, so I leave that portion up to the reader.
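For the actual PubMed case in the question, the hard-coded approach (option 1) could look like the minimal sketch below. It works on an already parsed esummary response and builds one row per PMID. The function name summaryToRows is hypothetical, and the field names authors, name, and lastauthor are assumptions about the esummary JSON (only pubdate and title appear in the question), so check them against a real response before relying on them.

```javascript
// Hypothetical helper: turn a parsed esummary response into one row per PMID.
// Assumes the layout summary.result[<pmid>] used in the question.
// The fields "authors", "name" and "lastauthor" are assumptions, not confirmed by the question.
function summaryToRows(summary, pmids) {
  return pmids.map(function (id) {
    // Fall back to an empty record so a missing PMID gives a blank row instead of an error
    var rec = (summary.result && summary.result[id]) || {};
    var authors = (rec.authors || []).map(function (a) { return a.name; });
    return [
      id,
      rec.pubdate || "",
      rec.title || "",
      authors.join("; "),   // all authors in one cell
      authors[0] || "",     // first author
      rec.lastauthor || ""  // last author
    ];
  });
}
```

In Apps Script, summary would come from JSON.parse(UrlFetchApp.fetch(urlsum).getContentText()), and the returned 2D array can be passed to setValues or returned from a custom function. Because every field is read from the same parsed object, nothing gets overwritten between fields.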
Anyone with improvements to the above is welcome to submit a PR.
Recommended Reading:
Array#reduce
Array#map
Array#splice
Array#slice
Internet references on parsing nested JSON. There are a lot.

Related

XmlService.parse() not able to handle HTML tables

I am looking for help from this community regarding the below issue.
// I am searching my Gmail inbox for a specific email
function getWeeklyEmail() {
var emailFilter = 'newer_than:7d AND label:inbox AND "Report: Launchpad filter"';
var threads = GmailApp.search(emailFilter, 0, 5);
var messages=[];
threads.forEach(function(threads)
{
messages.push(threads.getMessages()[0]);
});
return messages;
}
// Trying to parse the HTML table contained within the email
function getParsedMsg() {
var messages = getWeeklyEmail();
var msgbody = messages[0].getBody();
var doc = XmlService.parse(msgbody);
var html = doc.getRootElement();
var tables = doc.getDescendants();
var templ = HtmlService.createTemplateFromFile('Messages1');
templ.tables = [];
return templ.evaluate();
}
The debugger crashes when I try to step over the XmlService.parse function. The msgbody of the email contains both text and HTML formatted table. I am getting the following error: TypeError: Cannot read property 'getBody' of undefined (line 19, file "Code")
If I remove the getParsedMsg function and instead just display the content of the email, I get the email body along with the element tags etc in html format.
Workaround
Hi! The issue you are experiencing is due to (as you previously mentioned) XmlService only recognising well-formed XML rather than HTML. One possible workaround is to search the string you obtain with getBody() for your desired tags.
In your case the main issue is var doc = XmlService.parse(msgbody);. To work around it, you could search the string for the table tags you need using the JavaScript search method. Here is an example piece of code retrieving an email with a single table:
function getWeeklyEmail() {
var emailFilter = 'newer_than:7d AND label:inbox AND "Report: Launchpad filter"';
var threads = GmailApp.search(emailFilter, 0, 5);
var messages=[];
threads.forEach(function(threads)
{
messages.push(threads.getMessages()[0]);
});
return messages;
}
// Trying to parse the HTML table contained within the email
function getParsedMsg() {
var messages = getWeeklyEmail();
var msgbody = messages[0].getBody();
var indexOrigin = msgbody.search('<table');
var indexEnd = msgbody.search('</table');
// Get what is in between those indexes of the string.
// indexEnd points at the '<' of '</table>', which is 8 characters long,
// so add 8 to include the whole closing tag.
var Table = msgbody.substring(indexOrigin,indexEnd+8);
Logger.log(Table);
}
If you are looking for more than one table in your message, you can change getParsedMsg to the following:
function getParsedMsg() {
  // Upper bound on how many tables to extract; the loop stops early if fewer are found
  var totalTables = 2;
  var messages = getWeeklyEmail();
  var msgbody = messages[0].getBody();
  var Table = [];
  var cursor = 0; // position in msgbody where the next search starts
  for (var i = 0; i < totalTables; i++) {
    // Go over each table, resuming the search where the previous table ended
    var start = msgbody.indexOf('<table', cursor);
    var end = msgbody.indexOf('</table', start);
    if (start === -1 || end === -1) break; // no more complete tables
    // Store each table's string as an element of the array
    Table.push(msgbody.substring(start, end + 8));
    cursor = end + 8;
  }
  Logger.log(Table);
}
This will let you store each table in an element of an array. If you want to use these, you just need to retrieve the elements of this array and use them accordingly (for example, as HTML tables).
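As one way of using an extracted table string, the sketch below converts it into a 2D array of cell text, which is the shape setValues expects. The function name tableToValues is hypothetical. It is regex-based, so it assumes simple markup with no nested tables, and it does not decode HTML entities.

```javascript
// Hypothetical helper: convert one '<table>...</table>' string into a 2D array of cell text.
// Assumes simple markup without nested tables; HTML entities are left as-is.
function tableToValues(tableHtml) {
  // Each <tr>...</tr> becomes one row; \b keeps '<tr' from matching other tag names
  var rows = tableHtml.match(/<tr\b[\s\S]*?<\/tr>/gi) || [];
  return rows.map(function (row) {
    // Header (<th>) and data (<td>) cells are treated alike
    var cells = row.match(/<t[dh]\b[\s\S]*?<\/t[dh]>/gi) || [];
    return cells.map(function (cell) {
      // Strip any remaining tags and surrounding whitespace
      return cell.replace(/<[^>]+>/g, "").trim();
    });
  });
}
```

Rows with differing cell counts would need padding to a common width before being written with setValues.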
I hope this has helped you. Let me know if you need anything else or if you did not understand something. :)

gsheet importjson query for parsing JSONSchema

Using ImportJSON to parse JSONSchema documents and load into GSheet.
I have JSON documents with paths as in the snip below.
I want to output the names of properties in one column and the type in another.
I wanted to see if someone has done this already before I start hacking about with the parseJSON or defaultTransform functions of ImportJSON.
I have added an example GSheet here.
It shows the source, the currently parsed output, and what I need in terms of required output.
/data/schema/properties/plan_id/type
/data/schema/properties/plan_id/maxLength
/data/schema/properties/plan_name/type
/data/schema/properties/plan_name/maxLength
/data/schema/properties/type/type
/data/schema/properties/type/maxLength
/data/schema/properties/quantity_ranges/type
/data/schema/properties/quantity_ranges/maximum
/data/schema/properties/quantity_ranges/minimum
/data/schema/properties/pricing_option/type
/data/schema/properties/pricing_option/maxLength
/data/schema/properties/currency/type
/data/schema/properties/currency/enum
/data/schema/properties/value/type
/data/schema/properties/value/maximum
/data/schema/properties/value/minimum
Thanks in advance!
You want to convert the source JSON into the layout shown in your desired output, using Google Apps Script.
If my understanding is correct, how about this answer? Please think of this as just one of several possible answers.
Sample script:
When you use this sample script, please put =parseObject("SourceJSON!A1") to a cell in your shared Spreadsheet.
function parseObject(range) {
var range = SpreadsheetApp.getActiveSpreadsheet().getRange(range);
var value = range.getValue();
var object = JSON.parse(value);
var res = [];
var headers = ["type", ["maxLength", "maximum"], "minimum", "enum"];
// var headers = ["type", "maxLength", "maximum", "minimum", "enum"];
for (var i in object.data.schema.properties) {
var obj = object.data.schema.properties[i];
for (var j = 0; j < headers.length; j++) {
var temp = [object.data.id, object.data.version];
if (Array.isArray(headers[j])) {
for (var k = 0; k < headers[j].length; k++) {
if (obj[headers[j][k]]) res.push(temp.concat([i, "",obj[headers[j][k]],"",""]));
}
} else {
if (obj[headers[j]]) {
var ar = [i, "","","",""];
ar.splice(j + 1, 1, Array.isArray(obj[headers[j]]) ? obj[headers[j]].join(",") : obj[headers[j]]);
res.push(temp.concat(ar));
}
}
}
}
return res;
}
Result:
Note:
This sample script retrieves the data from the Spreadsheet.
In your DesiredOutput, the values of "maxLength" and "maximum" are put in the same column, and the sample script above does the same. If you want to separate the values of "maxLength" and "maximum", please modify var headers = ["type", ["maxLength", "maximum"], "minimum", "enum"]; to var headers = ["type", "maxLength", "maximum", "minimum", "enum"];.
This sample script is written for the value in your shared Spreadsheet. If you use it on data with a different structure, an error might occur or an unwanted result might be returned, so please be careful.
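If only the two columns from the question are needed (property name and type), a smaller sketch along these lines may be enough. The function name listPropertyTypes is hypothetical, and it assumes the parsed document follows the data/schema/properties layout shown in the question's paths.

```javascript
// Hypothetical helper: list [propertyName, type] pairs from a parsed JSONSchema document.
// Assumes the properties live at doc.data.schema.properties, as in the question's paths.
function listPropertyTypes(doc) {
  var props = (doc.data && doc.data.schema && doc.data.schema.properties) || {};
  return Object.keys(props).map(function (name) {
    // A property without a "type" key gets an empty cell
    return [name, props[name].type || ""];
  });
}
```

The parsed object can come from JSON.parse(range.getValue()) as in the sample script, and the result can be returned directly from a custom function.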

JSON Query in Google Sheets/Scripts

I am importing data from a JSON file using Google Apps Script and Google Sheets. I have learned the basics on this, but the formatting on the JSON file I am attempting to parse is throwing me off.
What is confusing me is how I would search for information based on "name". Currently I am using this:
function JSONReq(url, xpath){
var res = UrlFetchApp.fetch(url);
var content = res.getContentText();
var json = JSON.parse(content);
var patharray = xpath.split("/");
for(var i = 0; i < patharray.length; i++){
json = json[patharray[i]];
}
return json;
}
I'm a bit lost now to be honest with you.
I want to have a cell where I can type a name that I already know of, then find it in the JSON file and return that information however I decide to. I can pull and write to cells, so I have the basics down, but I just can't understand how I could search by the name.
That JSON file is an array of objects. To find a specific object with a given "name", you would parse it into an object (which you do already), then iterate through the array and check the name property of each element:
var myName = "name of thing I want";
var arr = JSON.parse( ... );
for(var i = 0; i < arr.length; ++i) {
var obj = arr[i];
if(obj.name == myName) { // could be done as obj["name"] == ... too
// do stuff with obj
}
}
For your case, you might add additional arguments to your function (i.e. the 2nd argument is the object's property, e.g. "name", and the 3rd is the desired value). This will be fine for any simple key-value properties, but would need specific handling where the value is itself an object (e.g. the "category" field in your specific JSON file).
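A minimal sketch of that idea, kept separate from the fetching so it can be tried on any array; the function name findByProperty is hypothetical:

```javascript
// Hypothetical helper: return the first object in arr whose property `key` equals `value`,
// or null if there is no match. Only simple (non-object) values are compared.
function findByProperty(arr, key, value) {
  for (var i = 0; i < arr.length; i++) {
    if (arr[i] && arr[i][key] === value) {
      return arr[i];
    }
  }
  return null;
}
```

Inside JSONReq, this could be called as findByProperty(json, "name", nameFromCell) after JSON.parse, with the name read from the cell passed in as an argument.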

Can I append data to an existing BigQuery table from a CSV file using the API?

I'm trying to use Google Apps Script to append data into a BigQuery table using the BigQuery API. The data to append is currently CSV format. So far I've found that you can stream data into BigQuery using tabledata().insertAll() but it looks like that requires json format and I'm not even convinced that it would do what I need to. Is there a straightforward solution to this that I'm missing? Because I know BigQuery supports appending, and yet everything I'm finding is really focused on loading data into new tables.
EDIT:
Sounds like tabledata().insertAll() is indeed the right method to use (hopefully). So I converted my file to json instead, but now I'm stuck on how to actually use it. I'm trying to base what I'm doing off of the reference page for it but it's still really confusing for me. Currently I am getting a 404 error when I run my code and it hits the fetch call. I'm trying to do a URL fetch, maybe that's not how I'm supposed to be doing things? I'm really new to APIs and I'm still figuring out how they work. Here's the code I currently have that's causing this:
var tableId = 'users';
var file = DriveApp.getFileById(jsonId);
//I don't know if a blob is the type that I want or not, but I'm trying it
var data = file.getBlob();
var url = 'https://www.googleapis.com/bigquery/v2/projects/PROJECT_ID/datasets/DATASET_ID/tables/tableId/insertAll'
.replace("PROJECT_ID", params.PROJECT_ID)
.replace("DATASET_ID", params.DATASET_ID)
.replace("tableId", tableId);
var response = UrlFetchApp.fetch(url, {
"kind": "bigquery#tableDataInsertAllRequest",
"skipInvalidRows": 0,
"ignoreUnknownValues": 0,
"rows": [
{
"json": data
}
],
headers: {
Authorization: 'Bearer ' + service.getAccessToken()
}
});
var result = JSON.parse(response.getContentText());
Logger.log(JSON.stringify(result, null, 2));
This is not the most direct route from CSV to BigQuery JSON, but it is some code that I am using and it should help you on the BigQuery side.
var PROJECT_ID = "xxx";
var DATASET_ID = "yyy";
function convertValuesToRows(data) {
var rows = [];
var headers = data[0];
// Start at 1 to skip the header row
for (var i = 1, numRows = data.length; i < numRows; i++) {
var row = BigQuery.newTableDataInsertAllRequestRows();
// Build a {header: value} object for this data row
row.json = data[i].reduce(function(obj, value, index) {
obj[headers[index]] = value;
return obj;
}, {});
rows.push(row);
}
return rows;
}
function bigqueryInsertData(data, tableName) {
var insertAllRequest = BigQuery.newTableDataInsertAllRequest();
insertAllRequest.rows = convertValuesToRows(data);
var response = BigQuery.Tabledata.insertAll(insertAllRequest, PROJECT_ID, DATASET_ID, tableName);
if (response.insertErrors) {
Logger.log(response.insertErrors);
}
}
This allows you to supply any GAS-style value matrix (from getValues or indeed Utilities.parseCsv).
convertValuesToRows will take a 2D array of strings (with headers in the first row) and encode it in the format BigQuery needs, e.g.
[["H1", "H2", "H3"],
[1 , 2 , 3 ],
[4 , 5 , 6 ]];
will be added to the rows of the insertAll request in the form of key-value pairs, i.e.
[{H1: 1, H2: 2, H3: 3},
{H1: 4, H2: 5, H3: 6}]
You only need to worry about the first representation, as that is what you pass into bigqueryInsertData together with the name of the table you want to feed the data into (the schema of the table needs to match what you are sending); the converter function is called from within.
Utilities.parseCsv already returns a 2D array of strings, so you can basically call bigqueryInsertData(Utilities.parseCsv(data.getDataAsString()), "myTable").
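If you would rather keep the UrlFetchApp approach from the question, note that the question's call puts the request fields directly into the fetch options, so no method or payload is set and the request goes out as a GET without a body, which may be what produces the 404. The sketch below only builds the URL and options object so the structure is visible; the function name buildInsertAllRequest is hypothetical, and the token is assumed to come from service.getAccessToken() as in the question.

```javascript
// Hypothetical helper: build the URL and UrlFetchApp options for a tabledata.insertAll call.
// rowObjects is an array of plain {column: value} objects.
function buildInsertAllRequest(projectId, datasetId, tableId, rowObjects, accessToken) {
  var url = 'https://www.googleapis.com/bigquery/v2/projects/' + projectId +
      '/datasets/' + datasetId + '/tables/' + tableId + '/insertAll';
  var body = {
    kind: 'bigquery#tableDataInsertAllRequest',
    skipInvalidRows: false,
    ignoreUnknownValues: false,
    // Each row is wrapped as {json: {...}}, as in the question's request body
    rows: rowObjects.map(function (obj) { return {json: obj}; })
  };
  var options = {
    method: 'post',
    contentType: 'application/json',
    payload: JSON.stringify(body), // the request body goes here, not at the top level of options
    headers: {Authorization: 'Bearer ' + accessToken},
    muteHttpExceptions: true // return error responses instead of throwing, so they can be logged
  };
  return {url: url, options: options};
}
```

It would be used as var req = buildInsertAllRequest(...); var response = UrlFetchApp.fetch(req.url, req.options);. The BigQuery advanced service shown above handles the same details for you, which is why it is usually the shorter route.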

Matching array contents to pre-entered data

I have an array, the contents of which are a subset of a list of names that come from a checkbox question in a Google Form. I need to email the people whose names are in the array, I suppose from a hard-coded list (a multi-dimensional array?). I cannot figure out how to perform the search or comparison. Apparently I am supposed to use an object literal, as in the code below:
var formNames = ["Name One", "Name Three"]; // one possibility for example
var objectMatchingNamesToEmails = {
"Name One":"nameone#work.com",
"Name Two":"nametwo#work.com",
"Name Three":"namethree#work.com",
};
You could loop through the array:
var arrayOfEmails,arrayOfNames,L,thisEmail,thisName;
arrayOfNames = ["NameOne","NameTwo"];
arrayOfEmails = [];
L = arrayOfNames.length;//The number of names in the array
for (var i = 0;i<L;i++) {
thisName = arrayOfNames[i];
thisEmail = objectMatchingNamesToEmails[thisName];
arrayOfEmails.push(thisEmail);
};
Create an object literal:
var objectMatchingNamesToEmails;
objectMatchingNamesToEmails = {
"NameOne":"exampleOne#gmail.com",
"NameTwo":"exampleTwo#gmail.com",
"NameThree":"exampleThree#gmail.com",
};
Then after you get the name, the code can look up the correct email:
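var userName,userEmail;
userName = "NameOne"; // placeholder: replace with your code that gets the user name
userEmail = objectMatchingNamesToEmails[userName];
if (userEmail) { // skip names that have no entry in the lookup object
  MailApp.sendEmail(userEmail,subject,body); // subject and body are assumed to be defined elsewhere
}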
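Putting the loop and the lookup together, a minimal sketch might look like the following. The function name emailsForNames and the example addresses are hypothetical; names without an entry are collected separately so they can be reported rather than passed to MailApp as undefined.

```javascript
// Hypothetical helper: map an array of names to email addresses using a lookup object.
// Returns the addresses that were found plus the names that had no entry.
function emailsForNames(names, lookup) {
  var found = [];
  var missing = [];
  names.forEach(function (name) {
    // hasOwnProperty avoids accidental matches on inherited keys such as "toString"
    if (Object.prototype.hasOwnProperty.call(lookup, name)) {
      found.push(lookup[name]);
    } else {
      missing.push(name);
    }
  });
  return {found: found, missing: missing};
}
```

For example, emailsForNames(formNames, objectMatchingNamesToEmails).found gives the list to loop over with MailApp.sendEmail, or to join with commas into a single recipient string.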