I'm using ImportJSON to parse JSON Schema documents and load them into a Google Sheet.
I have JSON documents with paths like those in the snippet below.
I want to output the property names in one column and their types in another.
I wanted to see whether someone has already done this before I start hacking about with the parseJSON or defaultTransform functions of ImportJSON.
I've added an example Google Sheet here.
It shows the source, the currently parsed output, and the required output.
/data/schema/properties/plan_id/type
/data/schema/properties/plan_id/maxLength
/data/schema/properties/plan_name/type
/data/schema/properties/plan_name/maxLength
/data/schema/properties/type/type
/data/schema/properties/type/maxLength
/data/schema/properties/quantity_ranges/type
/data/schema/properties/quantity_ranges/maximum
/data/schema/properties/quantity_ranges/minimum
/data/schema/properties/pricing_option/type
/data/schema/properties/pricing_option/maxLength
/data/schema/properties/currency/type
/data/schema/properties/currency/enum
/data/schema/properties/value/type
/data/schema/properties/value/maximum
/data/schema/properties/value/minimum
Thanks in advance!
You want to achieve the following situation.
From
To
You want to achieve this using Google Apps Script.
That is how I understood your question. If my understanding is correct, how about this answer? Please think of this as just one of several possible answers.
Sample script:
To use this sample script, put =parseObject("SourceJSON!A1") into a cell in your shared Spreadsheet.
function parseObject(range) {
  var range = SpreadsheetApp.getActiveSpreadsheet().getRange(range);
  var value = range.getValue();
  var object = JSON.parse(value);
  var res = [];
  var headers = ["type", ["maxLength", "maximum"], "minimum", "enum"];
  // var headers = ["type", "maxLength", "maximum", "minimum", "enum"];
  for (var i in object.data.schema.properties) {
    var obj = object.data.schema.properties[i];
    for (var j = 0; j < headers.length; j++) {
      var temp = [object.data.id, object.data.version];
      if (Array.isArray(headers[j])) {
        for (var k = 0; k < headers[j].length; k++) {
          if (obj[headers[j][k]]) res.push(temp.concat([i, "", obj[headers[j][k]], "", ""]));
        }
      } else {
        if (obj[headers[j]]) {
          var ar = [i, "", "", "", ""];
          ar.splice(j + 1, 1, Array.isArray(obj[headers[j]]) ? obj[headers[j]].join(",") : obj[headers[j]]);
          res.push(temp.concat(ar));
        }
      }
    }
  }
  return res;
}
Result:
Note:
This sample script retrieves the data from the Spreadsheet.
In your DesiredOutput, the values of "maxLength" and "maximum" are put in the same column, and the sample script above does the same. If you want to separate the values of "maxLength" and "maximum", change var headers = ["type", ["maxLength", "maximum"], "minimum", "enum"]; to var headers = ["type", "maxLength", "maximum", "minimum", "enum"];.
This sample script is written for the value in your shared Spreadsheet. If you use it on data with a different structure, an error might occur or the result might not be what you want, so please be careful.
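If only the property name and its type are needed, as in the original question, the same idea can be reduced to a few lines. The following is a minimal sketch, assuming the parsed document has the data.schema.properties layout shown in the question. The name schemaToRows and the sample object (including its values) are hypothetical and only there for illustration.

```javascript
// Hypothetical helper: one [propertyName, type] row per schema property.
function schemaToRows(doc) {
  var props = (doc.data && doc.data.schema && doc.data.schema.properties) || {};
  return Object.keys(props).map(function (name) {
    var type = props[name].type;
    // In JSON Schema, "type" may be a string or an array of strings.
    if (Array.isArray(type)) type = type.join(",");
    return [name, type === undefined ? "" : type];
  });
}

// Made-up sample in the shape of the question's paths.
var sample = {data: {schema: {properties: {
  plan_id: {type: "string", maxLength: 50},
  quantity_ranges: {type: "number", maximum: 100, minimum: 0}
}}}};
var rows = schemaToRows(sample);
```

In Apps Script, a 2D array like rows can be returned directly from a custom function, and it spills into rows and columns.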
I'm attempting to scrape options pricing data from Yahoo Finance in Google Sheets. Although I'm able to pull the options chain just fine, i.e.
=IMPORTHTML("https://finance.yahoo.com/quote/TCOM/options?date=1610668800","table",2)
I find that it's returning results that don't completely match what's actually shown on Yahoo Finance. Specifically, the scraped results are incomplete - they're missing some strikes. i.e. the first 5 rows of the chart may match, but then it will start returning only every other strike (aka skipping every other strike).
Why would IMPORTHTML be returning "abbreviated" results, which don't match what's actually shown on the page? And more importantly, is there some way to scrape complete data (i.e. that doesn't skip some portion of the available strikes)?
On Yahoo Finance, all the data is available in a large JSON object called root.App.main. To get the complete set of data, proceed as follows:
var source = UrlFetchApp.fetch(url).getContentText()
var jsonString = source.match(/(?<=root.App.main = ).*(?=}}}})/g) + '}}}}'
var data = JSON.parse(jsonString)
You can then fetch whichever information you need. Take a copy of this example: https://docs.google.com/spreadsheets/d/1sTA71PhpxI_QdGKXVAtb0Rc3cmvPLgzvXKXXTmiec7k/copy
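The regular expression above relies on lookbehind support and on the JSON ending in }}}}. As an alternative sketch, the same extraction can be done with indexOf. It assumes the assignment is terminated by }; at the end of a line, which is an assumption about the page source and not something guaranteed. The name extractEmbeddedJson and the sample string are hypothetical.

```javascript
// Hypothetical helper: pull the object assigned to root.App.main out of page source.
function extractEmbeddedJson(source) {
  var marker = "root.App.main = ";
  var start = source.indexOf(marker);
  if (start === -1) return null;
  start += marker.length;
  // Assumption: the assignment ends with "};" followed by a line break.
  // A raw line break cannot occur inside a JSON string, so this cannot match mid-value.
  var end = source.indexOf("};\n", start);
  if (end === -1) return null;
  return JSON.parse(source.substring(start, end + 1));
}

// Made-up page fragment for illustration.
var samplePage = '<script>\nroot.App.main = {"a":{"b":[1,2]}};\n}(this));\n</script>';
var embedded = extractEmbeddedJson(samplePage);
```

In Apps Script, the source argument would be the text returned by UrlFetchApp.fetch(url).getContentText().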
Edit:
If you want a full list of the available data, you can retrieve it with this simple script:
// mike.steelson
let result = [];

function getAllDataJSON(url = 'https://finance.yahoo.com/quote/TCOM/options?date=1610668800') {
  var source = UrlFetchApp.fetch(url).getContentText();
  var jsonString = source.match(/(?<=root.App.main = ).*(?=}}}})/g) + '}}}}';
  var data = JSON.parse(jsonString);
  // data is already a parsed object, so eval is not needed here.
  getAllData(data, 'data');
  var sh = SpreadsheetApp.getActiveSpreadsheet().getActiveSheet();
  sh.getRange(1, 1, result.length, result[0].length).setValues(result);
}

// Recursively collects [path, value] pairs for every leaf value into result.
function getAllData(obj, id) {
  const regex = new RegExp('[^0-9]+');
  for (let p in obj) {
    // Non-numeric keys are quoted, array indexes are not.
    var newid = (regex.test(p)) ? id + '["' + p + '"]' : id + '[' + p + ']';
    if (obj[p] != null) {
      if (typeof obj[p] != 'object' && typeof obj[p] != 'function') {
        result.push([newid, obj[p]]);
      }
      if (typeof obj[p] == 'object') {
        getAllData(obj[p], newid);
      }
    }
  }
}
Here's a simpler way to get the last market price of a given option. Add this function to your Google Sheets Script Editor.
function OPTION(ticker) {
  var ticker = ticker + "";
  var URL = "https://finance.yahoo.com/quote/" + ticker; // full URL, including the scheme
  var html = UrlFetchApp.fetch(URL).getContentText();
  var count = (html.match(/regularMarketPrice/g) || []).length;
  var query = "regularMarketPrice";
  var loc = 0;
  var n = count - 2;
  // Advance to the n-th occurrence of the query string.
  for (var i = 0; i < n; i++) {
    loc = html.indexOf(query, loc + 1);
  }
  var value = html.substring(loc + query.length + 9, html.indexOf(",", loc + query.length + 9));
  return value * 100;
}
In your Google Sheet, enter the Yahoo Finance option ticker as below:
=OPTION("AAPL210430C00060000")
I believe your goal is as follows.
You want to retrieve the complete table from the URL of https://finance.yahoo.com/quote/TCOM/options?date=1610668800, and want to put it to the Spreadsheet.
Issue and workaround:
I could replicate your issue. When I looked at the HTML, unfortunately, I couldn't find any difference between the rows that are shown and the rows that are not. I could also confirm that the complete table is included in the HTML. By the way, when I tested =IMPORTXML(A1,"//section[2]//tr"), I got the same result as with IMPORTHTML. So I think that, in this case, IMPORTHTML and IMPORTXML might not be able to retrieve the complete table.
So, as a workaround, I would like to propose parsing the HTML table and putting it into the sheet with the Sheets API, using Google Apps Script. With this approach, I could confirm that the complete table can be retrieved.
Sample script:
Please copy and paste the following script into the script editor of the Spreadsheet, and enable the Sheets API at Advanced Google services. Then run myFunction in the script editor. The retrieved table is put into the sheet named by sheetName.
function myFunction() {
  // Please set the following variables.
  const url = "https://finance.yahoo.com/quote/TCOM/options?date=1610668800";
  const sheetName = "Sheet1"; // Please set the destination sheet name.
  const sessionNumber = 2; // Please set the number of session. In this case, the table of 2nd session is retrieved.

  const html = UrlFetchApp.fetch(url).getContentText();
  const section = [...html.matchAll(/<section[\s\S\w]+?<\/section>/g)];
  // The index must exist before section[sessionNumber] is read.
  if (section.length > sessionNumber) {
    if (section[sessionNumber].length == 1) {
      const table = section[sessionNumber][0].match(/<table[\s\S\w]+?<\/table>/);
      if (table) {
        const ss = SpreadsheetApp.getActiveSpreadsheet();
        const body = {requests: [{pasteData: {html: true, data: table[0], coordinate: {sheetId: ss.getSheetByName(sheetName).getSheetId()}}}]};
        Sheets.Spreadsheets.batchUpdate(body, ss.getId());
      }
    } else {
      throw new Error("No table.");
    }
  } else {
    throw new Error("No table.");
  }
}
const sessionNumber = 2; corresponds to the 2 in =IMPORTHTML("https://finance.yahoo.com/quote/TCOM/options?date=1610668800","table",2).
References:
Method: spreadsheets.batchUpdate
PasteDataRequest
I am trying to fetch data via an API into Google Sheets. I am able to get "NewConfirmed" and a few other fields, but not the "Countries" data. Please help.
function Covid19() {
// Call the COVID19 API
var response = UrlFetchApp.fetch("https://api.covid19api.com/summary");
// Parse the JSON reply
var json=response.getContentText();
var data=JSON.parse(json);
var sheet = SpreadsheetApp.getActiveSheet();
var i = 2;
for each (var info in data)
{
sheet.getRange(i,1).setValue([info['NewConfirmed']]);
sheet.getRange(i,2).setValue([info['Country']]);
i = i + 1;
}
If you log data, you will see
{Countries=[{CountryCode=AX, TotalRecovered=0.0, NewDeaths=0.0,
Slug=ala-aland-islands, Country=ALA Aland Islands, NewRecovered=0.0,
Date=2020-04-21T12:32:01Z, NewConfirmed=0.0, ...
Thus, in order to retrieve Country and NewConfirmed, you need to define
var data=JSON.parse(json).Countries and then iterate through all the entries in a loop.
Sample based on your code:
function Covid19() {
var response = UrlFetchApp.fetch("https://api.covid19api.com/summary");
var json=response.getContentText();
var data=JSON.parse(json).Countries;
var sheet = SpreadsheetApp.getActiveSheet();
for(var i = 0; i < data.length; i++)
{
sheet.getRange(i+2,1).setValue([data[i]['NewConfirmed']]);
sheet.getRange(i+2,2).setValue([data[i]['Country']]);
}
}
Sidenote:
It is not good practice to use getRange(...).setValue(..) during each
loop iteration. It would be better to write the data into an array and
assign the array with all the data to the sheet after finishing
iteration.
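The batching suggested in the sidenote could look roughly like this. It is a minimal sketch, assuming each entry has the NewConfirmed and Country fields seen in the logged response. The name buildRows and the sample entries are hypothetical.

```javascript
// Hypothetical helper: turn the Countries array into a 2D array of rows.
function buildRows(countries) {
  return countries.map(function (c) {
    return [c.NewConfirmed, c.Country];
  });
}

// Made-up sample entries in the shape of the logged response.
var sampleCountries = [
  {Country: "ALA Aland Islands", NewConfirmed: 0},
  {Country: "Exampleland", NewConfirmed: 12}
];
var rows = buildRows(sampleCountries);

// In Apps Script, a single call would then write everything at once:
// sheet.getRange(2, 1, rows.length, rows[0].length).setValues(rows);
```

One setValues call replaces two setValue calls per country, which matters because each Spreadsheet call is comparatively slow.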
I want to pass an object to a Google Sheet Web App that I have written and have that data appended to the Google Sheet. I want to make sure the data ends up in the correct columns.
I can append the data to the file, but this could cause issues if columns are added, manipulated, etc.
I have created column metadata for each column that corresponds to the object key.
I can read through the column metadata and find what column number each one represents, i.e. if I get the metadata for "orderNumber", I can see it is in column 1.
Code for web app.
function doGet(e) {
var p = e.parameter;
var sheetName = "Orders";
var sheet = SpreadsheetApp.getActiveSpreadsheet().getSheetByName(sheetName);
var appendList = JSON.parse(p.appendList)[0];
var returnValue = {};
returnValue["status"] ="error";
//var appendRow = sheet.getLastRow()+1;
var values = []
for (var key in appendList) {
values.push(appendList[key]);
Logger.log(searchAndReturnCol(key+"Column")); // just to show I can get the column number from meta data
}
sheet.appendRow(values);
return ContentService.createTextOutput(JSON.stringify(returnValue));
}
function testDoGet() { // emulates what will be passed over by the app
var e = [];
var test = [{
'orderNumber' : "vsdv",
'name' : "Bob",
'porkDumpling' : 0,
'prawnDumpling' : 0,
'vegetarianDumpling' : 0,
'sauce' : "Spicey",
'paymentMethod' : "Cash",
'dollarReceivable' : 5,
'dollarReceived' :5,
'collectionDate' : 44234244,
'packed' : "No",
'collected' : "No",
'comments' : "This is a comment"
}]
var mod = JSON.stringify (test)
e.parameter = {
'appendList':mod,
}
doGet(e)
//displayLog (doGet(e));
}
Code to find metadata
function searchAndReturnCol (key){
var colLevel = cSAM.SAM.searchByKey (SSID , key);
return colLevel.matchedDeveloperMetadata[0].developerMetadata.location.dimensionRange.endIndex
}
What I am unsure about is how to bring the two ideas together. I want to check the key in the object and then make sure that this data is inserted into the correct column based on the column metadata.
In your Spreadsheet (the sheet of Orders), each column has developer metadata.
Each developer metadata key is the same as a key of the object you want to put into the Spreadsheet.
You want to put each value into the column whose developer metadata key matches the key in the data you give.
You want to achieve this using Google Apps Script.
If my understanding is correct, how about this modification? Please think of this as just one of several answers.
Modification points:
When doGet is run, the developer metadata is retrieved in column order. The row to put into the Spreadsheet is then built from the given data, using the keys of the retrieved developer metadata.
Modified script:
Please modify your script as follows.
From:
var values = []
for (var key in appendList) {
values.push(appendList[key]);
Logger.log(searchAndReturnCol(key+"Column")); // just to show I can get the column number from meta data
}
To:
var columnToLetter = function(column) { // From https://stackoverflow.com/a/21231012/7108653
var temp, letter = '';
while (column > 0) {
temp = (column - 1) % 26;
letter = String.fromCharCode(temp + 65) + letter;
column = (column - temp - 1) / 26;
}
return letter;
};
var col = sheet.getLastColumn();
var values = [];
for (var i = 0; i < col; i++) {
var d = sheet.getRange(columnToLetter(i + 1) + ":" + columnToLetter(i + 1)).getDeveloperMetadata();
for (var j = 0; j < d.length; j++) {
values.push(appendList[d[j].getKey()]);
}
}
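The core of this modification is building the row in column order and not in the order of the object's keys. A minimal sketch of that step without the Spreadsheet calls, assuming columnKeys holds the metadata keys already read from left to right. The name buildRowFromKeys is hypothetical, and unlike the script above it substitutes an empty string for a missing key.

```javascript
// Hypothetical helper: order a record's values by the column keys.
function buildRowFromKeys(columnKeys, record) {
  return columnKeys.map(function (key) {
    // Use an empty string for keys the record lacks, so later columns do not shift.
    return Object.prototype.hasOwnProperty.call(record, key) ? record[key] : "";
  });
}

// Made-up example: the record's key order differs from the column order.
var columnKeys = ["orderNumber", "name", "sauce"];
var record = {sauce: "Spicey", orderNumber: "vsdv"};
var row = buildRowFromKeys(columnKeys, record);
```

Because the row is driven by the column keys, reordering or inserting columns in the sheet only changes columnKeys, not the incoming data.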
Note:
If the modified script above doesn't retrieve the developer metadata from your Spreadsheet, please add the developer metadata to each column using the following script. If you want to rearrange the keys for columns, please modify keys.
function createDeveloperMetadata() {
var columnToLetter = function(column) { // From https://stackoverflow.com/a/21231012/7108653
var temp, letter = '';
while (column > 0) {
temp = (column - 1) % 26;
letter = String.fromCharCode(temp + 65) + letter;
column = (column - temp - 1) / 26;
}
return letter;
};
var keys = {orderNumber:"",name:"",porkDumpling:"",prawnDumpling:"",vegetarianDumpling:"",sauce:"",paymentMethod:"",dollarReceivable:"",dollarReceived:"",collectionDate:"",packed:"",collected:"",comments:""};
var sheet = SpreadsheetApp.getActiveSheet();
Object.keys(keys).forEach(function(e, i) {
sheet.getRange(columnToLetter(i + 1) + ":" + columnToLetter(i + 1)).addDeveloperMetadata(e, keys[e]);
});
}
Before you add the developer metadata, please check the current metadata, because the same key can be added more than once.
If you want to update all the metadata, I recommend removing it and adding new metadata.
When you modify the script of Web Apps, please redeploy the Web Apps as a new version so that the latest script is reflected. Please be careful about this. When you test the script with testDoGet, redeploying is not required.
Reference:
Class DeveloperMetadata
If I misunderstood your question and this was not the direction you want, I apologize.
I'm looking for some help. I am trying to grab an author's publications from PubMed and populate the data into Google Sheets using Apps Script. I've gotten as far as the code below and am now stuck.
Basically, what I have done was first pull all the Pubmed IDs from a particular author whose name comes from the name of the sheet. Then I have tried creating a loop to go through each Pubmed ID JSON summary and pull each field I want. I have been able to pull the pub date. I had set it up with the idea that I would do a loop for each field of that PMID I want, store it in an array, and then return it to my sheet. However, I'm now stuck trying to get the second field - title - and all the subsequent fields (e.g. authors, last author, first author, etc.)
Any help would be greatly appreciated.
function IMPORTPMID(){
var ss = SpreadsheetApp.getActiveSpreadsheet();
var sheet = ss.getSheets()[0];
var author = sheet.getSheetName();
var url = ("https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&term=" + author + "[author]&retmode=json&retmax=1000");
var response = UrlFetchApp.fetch(url);
var AllAuthorPMID = JSON.parse(response.getContentText());
var xpath = "esearchresult/idlist";
var patharray = xpath.split("/");
for (var i = 0; i < patharray.length; i++) {
AllAuthorPMID = AllAuthorPMID[patharray[i]];
}
var PMID = AllAuthorPMID;
var PDparsearray = [PMID.length];
var titleparsearray = [PMID.length];
for (var x = 0; x < PMID.length; x++) {
var urlsum = ("https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi?db=pubmed&retmode=json&rettype=abstract&id=" + PMID[x]);
var ressum = UrlFetchApp.fetch(urlsum);
var contentsum = ressum.getContentText();
var jsonsum = JSON.parse(contentsum);
var PDpath = "result/" + PMID[x] + "/pubdate";
var titlepath = "result/" + PMID[x] + "/title";
var PDpatharray = PDpath.split("/");
var titlepatharray = titlepath.split("/");
for (var j = 0; j < PDpatharray.length; j++) {
var jsonsum = jsonsum[PDpatharray[j]];
}
PDparsearray[x] = jsonsum;
}
var tempArr = [];
for (var obj in AllAuthorPMID) {
tempArr.push([obj, AllAuthorPMID[obj], PDparsearray[obj]]);
}
return tempArr;
}
From a PubMed JSON response for a given PubMed ID, you should be able to determine the fieldnames (and paths to them) that you want to include in your summary report. Reading them all is simpler to implement if they are all at the same level, but if some are properties of a sub-field, you can still access them if you give the right path in your setup.
Consider the "source JSON":
[
{ "pubMedId": "1234",
"name": "Jay Sahn",
"publications": [
{ "pubId": "abcd",
"issn": "A1B2C3",
"title": "Dynamic JSON Parsing: A Journey into Madness",
"authors": [
{ "pubMedId": "1234" },
{ "pubMedId": "2345" }
]
},
{ "pubId": "efgh",
...
},
...
],
...
},
...
]
The pubId and issn fields would be at the same level, while the publications and authors would not.
You can retrieve both the pubMedId and publications fields (and others you desire) in the same loop by either 1) hard-coding the field access, or 2) writing code that parses a field path and supplying field paths.
Option 1 is likely to be faster, but much less flexible if you suddenly want to get a new field, since you have to remember how to write the code to access that field, along with where to insert it, etc. God save you if the API changes.
Option 2 is harder to get right, but once right, will (should) work for any field you (properly) specify. Getting a new field is as easy as writing the path to it in the relevant config variable. There are possibly libraries that will do this for you.
To convert the above into spreadsheet rows (one per pubMedId in the outer array, e.g. the IDs you queried their API for), consider this example code:
function foo() {
const sheet = /* get a sheet reference somehow */;
const resp = UrlFetchApp.fetch(...).getContentText();
const data = JSON.parse(resp);
// paths relative to the outermost field, which for the imaginary source is an array of "author" objects
const fields = ['pubMedId', 'name', 'publications/pubId', 'publications/title', 'publications/authors/pubMedId'];
const output = data.map(function (author) {
var row = fields.map(function (f) {
var desiredField = f.split('/').reduce(delve_, author);
return JSON.stringify(desiredField);
});
return row;
});
sheet.getRange(1, 1, output.length, output[0].length).setValues(output);
}
function delve_(parentObj, property, i, fullPath) {
// Dive into the given object to get the path. If the parent is an array, access its elements.
if (parentObj === undefined)
return;
// Simple case: parentObj is an Object, and property exists.
const child = parentObj[property];
if (child)
return child;
// Not a direct property / index, so perhaps a property on an object in an Array.
if (parentObj.constructor === Array)
return collate_(parentObj, fullPath.splice(i));
console.warn({message: "Unhandled case / missing property",
args: {parent: parentObj, prop: property, index: i, pathArray: fullPath}});
return; // property didn't exist, user error.
}
function collate_(arr, fields) {
// Obtain the given property from all elements of the array.
const results = arr.map(function (element) {
return fields.slice().reduce(delve_, element);
});
return results;
}
Executing this yields the following output in Stackdriver:
Obviously you probably want some different (aka real) fields, and probably have other ideas for how to report them, so I leave that portion up to the reader.
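For comparison, here is a shorter recursive take on the same path idea. It is only a sketch, not a drop-in replacement for delve_ and collate_: it always maps over arrays, so a numeric index in a path is not supported. The name getByPath is hypothetical, and the sample object is trimmed from the imaginary source JSON above.

```javascript
// Hypothetical helper: follow path parts into a value, mapping over any arrays met on the way.
function getByPath(value, parts) {
  if (parts.length === 0 || value === undefined || value === null) return value;
  if (Array.isArray(value)) {
    // Apply the remaining path to every element of the array.
    return value.map(function (el) { return getByPath(el, parts); });
  }
  return getByPath(value[parts[0]], parts.slice(1));
}

// Trimmed version of the imaginary "author" object.
var author = {
  pubMedId: "1234",
  name: "Jay Sahn",
  publications: [
    {pubId: "abcd", title: "Dynamic JSON Parsing: A Journey into Madness",
     authors: [{pubMedId: "1234"}, {pubMedId: "2345"}]}
  ]
};
var fields = ["pubMedId", "publications/pubId", "publications/authors/pubMedId"];
var row = fields.map(function (f) {
  return JSON.stringify(getByPath(author, f.split("/")));
});
```

Each array on the path adds one level of nesting to the result, which is why the cells are passed through JSON.stringify before being written.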
Anyone with improvements to the above is welcome to submit a PR.
Recommended Reading:
Array#reduce
Array#map
Array#splice
Array#slice
Internet references on parsing nested JSON. There are a lot.
I am importing data from a JSON file using Google Apps Script and Google Sheets. I have learned the basics on this, but the formatting on the JSON file I am attempting to parse is throwing me off.
What is confusing me is how I would search for information based on "name". Currently I am using this:
function JSONReq(url, xpath){
var res = UrlFetchApp.fetch(url);
var content = res.getContentText();
var json = JSON.parse(content);
var patharray = xpath.split("/");
for(var i = 0; i < patharray.length; i++){
json = json[patharray[i]];
}
return json;
}
I'm a bit lost now to be honest with you.
I want to have a cell where I can type a name that I already know, then find it in the JSON file and pull out that information however I decide to. I can pull and write to cells, I have the basics down. But I just can't understand how I could search by the name.
That JSON file is an array of objects. To find a specific object with a given "name", you would parse it into an object (which you do already), then iterate through them and check the name parameter:
var myName = "name of thing I want";
var arr = JSON.parse( ... );
for(var i = 0; i < arr.length; ++i) {
var obj = arr[i];
if(obj.name == myName) { // could be done as obj["name"] == ... too
// do stuff with obj
}
}
For your case, you might add additional arguments to your function (i.e. a 2nd argument for the object's property, e.g. "name", and a 3rd for the desired value). This will be fine for any simple key-value properties, but would need specific handling where the value is itself an object (e.g. the "category" field in your specific JSON file).
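A minimal sketch of that generalization, assuming the parsed JSON is an array of plain objects and the property holds a simple value. The name findByProperty and the sample data are hypothetical.

```javascript
// Hypothetical helper: return the first object whose property equals value, or null.
function findByProperty(arr, prop, value) {
  for (var i = 0; i < arr.length; ++i) {
    if (arr[i] && arr[i][prop] === value) return arr[i];
  }
  return null;
}

// Made-up sample data.
var items = [{name: "alpha", id: 1}, {name: "beta", id: 2}];
var hit = findByProperty(items, "name", "beta");
var miss = findByProperty(items, "name", "gamma");
```

Returning null for a missing name lets the calling code decide what to show in the cell when nothing matches.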