I wanted to thank everyone for being so helpful on this site - it means a lot!
I am trying to import the likes/followers from a Spotify playlist to Google Sheets. It seems like various playlists have a different XPath.
I can extract the likes/followers for most playlists using this formula (B24 contains the URL):
=INDEX(REGEXEXTRACT(IFERROR(QUERY(ARRAY_CONSTRAIN(IMPORTDATA(B24), 500, 5), "select Col5 where Col4 contains 'followers'", 0), QUERY(ARRAY_CONSTRAIN(IMPORTDATA(B24), 500, 7), "select Col7 where Col6 contains 'followers'", 0)), "\d+")*1)
However, some playlist links return an empty result.
Example: https://open.spotify.com/playlist/5aSO2lT7sVPKut6F9L6IAc
Example of a working one: https://open.spotify.com/playlist/7qvQVDnLe4asawpZqYhKMQ
I'm honestly not sure how to add a third alternative to the IFERROR, and I have been blindly changing the column numbers to see what works, with no luck. Any idea on how to figure out which column numbers to use, or any other guidance, would be extremely helpful.
Thank you!!
Issue and workaround:
When I looked at the HTML from both URLs, I thought that in this case the value you want can be retrieved from the JSON data embedded in the HTML. Unfortunately, that JSON data is large, so when IMPORTXML is used, an error occurs because of the data size. So in this answer, I would like to propose a custom function written in Google Apps Script.
Sample script:
Please copy and paste the following Google Apps Script into the script editor of your Google Spreadsheet, then put =SAMPLE("###url###") in a cell. The cell returns the number of followers.
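function SAMPLE(url) {
  // Fetch the playlist page's HTML.
  const res = UrlFetchApp.fetch(url).getContentText();
  // Decode &quot; entities, then capture the JSON assigned to Spotify.Entity.
  const v = res.replace(/&quot;/g, '"').match(/Spotify\.Entity \=([\s\S]+?);/);
  // Return the follower count, or a message if the JSON was not found.
  return v && v.length == 2 ? JSON.parse(v[1].trim()).followers.total : "Value cannot be retrieved.";
}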
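The parsing part of SAMPLE() can be checked without UrlFetchApp by moving it into a plain JavaScript function that takes the HTML as a string. This is a minimal sketch: extractFollowers is a hypothetical name, and the HTML snippet is made up to mimic a page that embeds the playlist JSON as Spotify.Entity = {...}; with &quot; entities, which is an assumption about the page markup.

```javascript
// Hypothetical helper: same regex approach as SAMPLE(), but it takes the HTML
// as a string so it can be tested without UrlFetchApp.
function extractFollowers(html) {
  // Decode &quot; entities so the embedded JSON becomes parseable.
  const decoded = html.replace(/&quot;/g, '"');
  // Capture everything between "Spotify.Entity =" and the first ";".
  const m = decoded.match(/Spotify\.Entity =([\s\S]+?);/);
  if (!m) return "Value cannot be retrieved.";
  return JSON.parse(m[1].trim()).followers.total;
}

// Made-up page fragment for illustration only.
const sampleHtml =
  "<script>Spotify.Entity = {&quot;name&quot;:&quot;Demo&quot;," +
  "&quot;followers&quot;:{&quot;total&quot;:1234}};</script>";
console.log(extractFollowers(sampleHtml)); // 1234
```

Because the lazy pattern stops at the first ";", it assumes the embedded JSON contains no semicolons; a semicolon inside a string value would cut the match short and JSON.parse would throw.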
Result:
When the above script is used for your two URLs, with the following custom formulas in cells "A1" and "A2" respectively, each cell returns the follower count of its playlist.
=SAMPLE("https://open.spotify.com/playlist/5aSO2lT7sVPKut6F9L6IAc")
=SAMPLE("https://open.spotify.com/playlist/7qvQVDnLe4asawpZqYhKMQ")
Note:
This sample script is for the URLs in your question, so it might not work for other URLs. It might also stop working if the structure of the HTML is changed on the server side. Please be careful about this.
References:
Custom Functions in Google Sheets
fetch(url)
I have a range that assigns shifts to a set of employees, in which the row labels are dates (i.e., the Y axis is a chronological set of dates) and the column headers are locations (Building1, Building2, etc.). Each row, then, contains the employees assigned to each location for that day. Alternatively, each column contains a chronological list of who is assigned to the location in that column's header.
I am attempting to match a name, say "John Doe", for each instance he appears throughout the range, and return a two-column list of the dates and locations to which he is assigned. John Doe will be listed many times across the dates in question and at various locations (in multiple columns).
I've reached the limit of my expertise with both Apps Script and FILTER functions, and I greatly appreciate any help. I believe a loop is necessary, but perhaps there is a better way. For what it's worth, my goal is to take this list and put every assignment on the user's calendar (I've already solved that part). TIA everyone!
Sample input and output situation
From your provided Spreadsheet, I believe your goal is as follows.
You want to retrieve, for a given name, a list of the dates and locations where that name appears in the range, using Google Apps Script.
In this case, how about the following sample script?
Sample script:
Please copy and paste the following script into the script editor of the Spreadsheet and save the script. To use it, put the custom function =SAMPLE(Data!A3:F20,"John Doe") in a cell, and the result values are returned.
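// h is the header row (locations); v holds the data rows (a date followed by names).
const SAMPLE = ([h, ...v], searchName) =>
  [["Date", "Location"], ...v.flatMap(([hh, ...vv]) => {
    const i = vv.indexOf(searchName); // first column in this row that holds the name
    return i != -1 ? [[hh, h[i + 1]]] : []; // h[i + 1] skips the date column of the header
  })];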
If you don't want to include the header row, you can also use the following script.
const SAMPLE = ([h, ...v], searchName) =>
v.flatMap(([hh, ...vv]) => {
const i = vv.indexOf(searchName);
return i != -1 ? [[hh, h[i + 1]]] : [];
});
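Note that indexOf returns only the first match in each row, so if a name appears in two locations on the same date, the scripts above report only the first one. The following is a minimal sketch of a variant that reports every matching column; SAMPLE_ALL is a hypothetical name, and the sample grid is made up to follow the shape of Data!A3:F20 (first column holds dates, header row holds locations).

```javascript
// Hypothetical variant of SAMPLE(): reports every matching column in a row,
// not only the first one found by indexOf.
const SAMPLE_ALL = ([header, ...rows], searchName) =>
  rows.flatMap(([date, ...names]) =>
    names
      // header[i + 1] skips the date column of the header row
      .map((name, i) => (name === searchName ? [date, header[i + 1]] : null))
      .filter((pair) => pair !== null)
  );

// Made-up sample: first row is the header, first column holds the dates.
const data = [
  ["", "Building1", "Building2", "Building3"],
  ["8/1/2022", "John Doe", "Jane Roe", "John Doe"],
  ["8/2/2022", "Jane Roe", "John Doe", ""],
];
console.log(SAMPLE_ALL(data, "John Doe"));
// [ [ '8/1/2022', 'Building1' ], [ '8/1/2022', 'Building3' ], [ '8/2/2022', 'Building2' ] ]
```

In a real sheet the date cells arrive as Date objects rather than strings; the matching logic is unaffected.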
Testing:
When this sample script is used with your sample input values, it returns the dates and locations assigned to the searched name.
For "John Doe", your expected output does not include "Building4" on "8/8/2022", as shown in the red cell. I suspect this might be a copying mistake, so the above sample script includes it. If you want to exclude the values of a specific date, please tell me; that can also be achieved.
Reference:
Custom Functions in Google Sheets
The result that you are looking for could be achieved by using Google Sheets built-in functions in a formula:
=ARRAYFORMULA(QUERY(SPLIT(FLATTEN(Data!A4:A20&"💣"&Data!B3:F3&"💣"&Data!B4:F20),"💣"),"SELECT Col1,Col2 WHERE Col3 = 'John Doe'")
Briefly, the above formula uses FLATTEN and the array-handling features of Google Sheets to "unpivot" your double-entry table, then uses QUERY to filter the rows and limit the columns returned.
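The same two steps, unpivot and then filter, can be sketched in plain JavaScript to show what the formula does to the data. This is an illustration, not Sheets code: unpivot and selectFor are hypothetical names, and the small arrays stand in for Data!A4:A20, Data!B3:F3 and Data!B4:F20.

```javascript
// Step 1 (FLATTEN + SPLIT): turn the grid into long rows of [date, location, name],
// walking row by row like FLATTEN does.
function unpivot(dates, locations, grid) {
  const out = [];
  dates.forEach((date, r) => {
    locations.forEach((loc, c) => out.push([date, loc, grid[r][c]]));
  });
  return out;
}

// Step 2 (QUERY "SELECT Col1,Col2 WHERE Col3 = ..."): keep date and location of matches.
const selectFor = (rows, name) =>
  rows.filter(([, , who]) => who === name).map(([date, loc]) => [date, loc]);

// Made-up values standing in for the date column, header row and name grid.
const dates = ["8/1/2022", "8/2/2022"];
const locations = ["Building1", "Building2"];
const grid = [
  ["John Doe", "Jane Roe"],
  ["Jane Roe", "John Doe"],
];
console.log(selectFor(unpivot(dates, locations, grid), "John Doe"));
// [ [ '8/1/2022', 'Building1' ], [ '8/2/2022', 'Building2' ] ]
```

Unpivoting first makes the filter a single comparison per row, which is why this approach also finds a name that appears in several columns of the same date.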
Related
How do you create a "reverse pivot" in Google Sheets?
I'll start by saying that my knowledge of using APIs is extremely limited. I'm impressed I've gotten as far as I have on this.
I've created a workbook in Google Sheets with imported data from the iexcloud API, which I'm using for data on stocks.
The requests have a cell reference in them so they update whenever a different symbol is selected.
So far, everything I've needed to request from it has had the option to format as CSV, so I can get cells with just the values.
However, this last thing I want doesn't have that option, so the whole response is wrapped in [ ] with quotation marks (").
That really messes up what I need it for.
Here's an example:
["PSA" CCI SHO ACC]
with each symbol being in its own cell.
I'm using the Peer Groups request.
A sample request:
> https://sandbox.iexapis.com/stable/stock/aapl/peers?token=Tsk_2b4c7c6fd98542f6a99f904cb7a3e721
Using Find and Replace doesn't work; I assume that's because the data is imported.
I need to use the cells with those symbols: PSA, CCI, SHO, ACC to reference in another request.
I recreated this in another Google Sheet that you can edit. The section in question is highlighted in blue:
https://docs.google.com/spreadsheets/d/1BQ6FBD0S2YkDtDGZGIkDmQoKrQT4VmVDjuNsgV4mrXM/edit?usp=sharing
So I'm wondering if there's a way to have [, ] and " automatically removed from any cells in that row. Alternatively, if I copy and paste the values only (so that I can remove those characters in that row), is there a way to have those values update when the original cells are updated with new symbols?
Or if there's a way I can format the response in sheets.
Any ideas?
I believe your goal is as follows.
You want to convert ["CCI" SBAC CTL TDS RCI RCI-A-CT DTEGY] to CCI SBAC CTL TDS RCI RCI-A-CT DTEGY using the built-in functions of Google Sheets.
Modified formula:
=ARRAYFORMULA(REGEXREPLACE(IMPORTDATA("https://cloud.iexapis.com/stable/stock/"&B3&"/peers?format=psv&token=###"),"[\[\]""]",""))
In this modified formula, [, ] and " are removed using REGEXREPLACE.
Please replace ### with your token in the above formula.
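The character class in the formula can be tried outside Sheets with the equivalent JavaScript regex. In the Sheets formula, "" inside a string literal stands for one double quote, so "[\[\]""]" is the class [\[\]"]. This is a minimal sketch: stripBrackets is a hypothetical name, and the cell values are illustrative, assuming IMPORTDATA splits the response so that only the first and last cells carry the brackets.

```javascript
// Hypothetical helper mirroring REGEXREPLACE(cell, "[\[\]""]", ""):
// the class matches "[", "]" and the double quote; g removes every occurrence.
const stripBrackets = (cells) => cells.map((cell) => cell.replace(/[\[\]"]/g, ""));

// Illustrative cells as they might look after IMPORTDATA.
const imported = ['["CCI"', "SBAC", "CTL", "TDS", "RCI", "RCI-A-CT", "DTEGY]"];
console.log(stripBrackets(imported));
// [ 'CCI', 'SBAC', 'CTL', 'TDS', 'RCI', 'RCI-A-CT', 'DTEGY' ]
```

Wrapping REGEXREPLACE in ARRAYFORMULA plays the role of map() here: the same replacement is applied to every cell of the imported row.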
Result:
In this result, cells C6:I6 hold the values retrieved with =IMPORTDATA("https://cloud.iexapis.com/stable/stock/"&B3&"/peers?format=psv&token=###"), and cell "C9" contains =ARRAYFORMULA(REGEXREPLACE(C6:I6,"[\[\]""]","")). You can also use the modified formula above directly instead.
Note:
In this answer, I removed your token because I thought it was personal information.
Reference:
REGEXREPLACE
When I input
=IMPORTXML("http://www.ilgiornale.it/autore/franco-battaglia.html","//h2")
in my Google Sheet, I get: #N/A Imported content is empty.
However, when I input:
=IMPORTXML("http://www.ilgiornale.it/autore/franco-battaglia.html","*")
I get some content, so I can presume that access to the page is not blocked.
And the page definitely contains several h2 tags.
So what's the issue?
You want to know the reason for the following behavior.
=IMPORTXML("http://www.ilgiornale.it/autore/franco-battaglia.html","//h2") returns #N/A Imported content is empty.
=IMPORTXML("http://www.ilgiornale.it/autore/franco-battaglia.html","*") returns the content.
If my understanding is correct, how about this answer?
Issue:
When I looked at the HTML of http://www.ilgiornale.it/autore/franco-battaglia.html, I noticed the following problem:
window.jQuery || document.write("<script src='/sites/all/modules/jquery_update/replace/jquery/jquery.min.js'>\x3C/script>")
Here, the closing script tag is written as \x3C/script>. Inside a JavaScript string, \x3C is an escape for <, so a browser writes a complete tag, but a parser that reads the raw HTML sees <script src='...'> with no matching </script>. It seems that when IMPORTXML parses this line, the script tag is therefore never closed. I confirmed that when \x3C is converted to <, =IMPORTXML("http://www.ilgiornale.it/autore/franco-battaglia.html","//h2") correctly returns the values of the h2 tags.
This appears to be why =IMPORTXML("http://www.ilgiornale.it/autore/franco-battaglia.html","//h2") returns #N/A Imported content is empty.
As for why =IMPORTXML("http://www.ilgiornale.it/autore/franco-battaglia.html","*") returns content: when I used this formula, I couldn't find the values of the script tag in the result. From this, I suspected that the script tag had an issue, which is how I found the problem above. I also confirmed that when \x3C is converted to <, =IMPORTXML("http://www.ilgiornale.it/autore/franco-battaglia.html","*") returns values that include the contents of the script tag.
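The difference between the raw page text and what a JavaScript engine sees can be shown in a few lines of plain JavaScript. String.raw keeps \x3C as the four literal characters found in the page source, and the replace() is the same one used in Pattern 1 below; nothing here is specific to IMPORTXML.

```javascript
// The line as it appears in the page source: \x3C is four literal characters.
const line = String.raw`document.write("<script src='/sites/all/modules/jquery_update/replace/jquery/jquery.min.js'>\x3C/script>")`;

// Converting the literal \x3C to "<" produces a real closing tag in the raw text.
const fixed = line.replace(/\\x3C/g, "<");
console.log(fixed);

// A JavaScript engine already decodes the escape inside a normal string literal.
console.log("\x3C/script>" === "</script>"); // true
```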
Workarounds:
To avoid the above issue, \x3C has to be converted to <. So how about the following workarounds? They use Google Apps Script. Please think of them as just two of several possible workarounds.
Pattern 1:
In this pattern, the HTML is first downloaded from the URL and the problem is fixed. The modified HTML is then saved as a file, the file is shared, and the file's URL is retrieved. IMPORTXML then reads the values from this URL.
Sample script:
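function myFunction() {
  var url = "http://www.ilgiornale.it/autore/franco-battaglia.html";
  // Download the HTML and turn the literal \x3C escapes into "<" so the script tag is closed.
  var data = UrlFetchApp.fetch(url).getContentText().replace(/\\x3C/g, "<");
  // Save the fixed HTML to Drive and share it so IMPORTXML can read it.
  var file = DriveApp.createFile("htmlData.html", data, MimeType.HTML);
  file.setSharing(DriveApp.Access.ANYONE_WITH_LINK, DriveApp.Permission.VIEW);
  // Direct-download URL of the file, to be used as the IMPORTXML source.
  var endpoint = "https://drive.google.com/uc?id=" + file.getId() + "&export=download";
  Logger.log(endpoint);
}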
To use this script, first run myFunction() and copy the logged endpoint. As a test case, put the endpoint in cell "A1" and put =IMPORTXML(A1,"//h2") in cell "A2". The h2 values are then retrieved.
Pattern 2:
In this pattern, the values of the h2 tags are extracted by parsing the HTML directly and are written to the active sheet.
Sample script:
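function myFunction() {
  var url = "http://www.ilgiornale.it/autore/franco-battaglia.html";
  // Collect every <h2>...</h2> block from the raw HTML (null if there is none).
  var data = UrlFetchApp.fetch(url).getContentText().match(/<h2[\s\S]+?<\/h2>/g);
  if (!data) {
    Logger.log("No h2 tags were found.");
    return;
  }
  // Wrap the blocks in a single root element so XmlService can parse them.
  var xml = XmlService.parse("<temp>" + data.join("") + "</temp>");
  // Text of each h2 as a one-column 2D array for setValues().
  var h2Values = xml.getRootElement().getChildren("h2").map(function(e) {return [e.getValue()]});
  var sheet = SpreadsheetApp.getActiveSheet();
  // Append the values below the last row that has content.
  sheet.getRange(sheet.getLastRow() + 1, 1, h2Values.length, 1).setValues(h2Values);
  Logger.log(h2Values);
}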
When you run the script, the values of the h2 tags are written directly to the active sheet.
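The first step of Pattern 2 (collecting the h2 blocks with a regex) can be tried in plain JavaScript on a string. This sketch swaps the XmlService step for a simple tag-stripping regex, a different and cruder technique that is enough for simple markup; h2Texts is a hypothetical name and the HTML is made up.

```javascript
// Collect each <h2>...</h2> block with the same regex as Pattern 2, then strip
// inner tags with a regex (instead of XmlService) to keep only the text.
function h2Texts(html) {
  const blocks = html.match(/<h2[\s\S]+?<\/h2>/g) || []; // match() returns null when nothing is found
  return blocks.map((b) => [b.replace(/<[^>]+>/g, "").trim()]); // one-column 2D array, like setValues() expects
}

const page = '<div><h2 class="t"><a href="/a">First title</a></h2><p>x</p><h2>Second</h2></div>';
console.log(h2Texts(page)); // [ [ 'First title' ], [ 'Second' ] ]
```

Returning an empty array when match() finds nothing avoids the error that calling join() on null would raise.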
References:
Class UrlFetchApp
Class XmlService
If I misunderstood your question and this was not the direction you want, I apologize.
I have about 3000 rows in my Google Spreadsheet, and each row contains data about one article from our website. One column (e.g. A:A) stores formatted text in HTML. I need to extract all URLs inside the href="" attributes in this column and work with them later. (The result could be an array or a text string separated by commas or spaces in column B.)
I tried to use the REGEXEXTRACT formula, but it gives me only the first result. Then I tried to use REGEXREPLACE, but I'm unable to write a proper expression to get only the URLs.
I know that it is not proper way to use regex to get anything from HTML. Is there another way to extract these values from HTML text in one cell?
Link to sample data: Google Spreadsheet
Thank you in advance! I'm a real newbie here, and to scripting, parsing, etc. too.
How about these samples? I used href=\"(.*?)\" to retrieve the URL. A sample on regex101.com is here.
1. Using Google Sheets functions:
=TEXTJOIN(CHAR(10),TRUE,ARRAYFORMULA(IFERROR(REGEXEXTRACT(SPLIT(a1,">"),"href="&CHAR(34)&"(.*?)"&CHAR(34)))))
Because REGEXEXTRACT returns only the first matched string, the cell data is first split on ">" with SPLIT, and REGEXEXTRACT then retrieves the URL from each piece.
Result:
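The split-then-extract idea behind the formula can be sketched in plain JavaScript: split on ">", take the first href="..." in each piece (as REGEXEXTRACT would), drop pieces with no match (IFERROR plus TEXTJOIN's ignore-empty flag), and join with a newline (CHAR(10)). hrefsBySplit is a hypothetical name and the HTML is made up.

```javascript
// Mirror of TEXTJOIN(CHAR(10), TRUE, IFERROR(REGEXEXTRACT(SPLIT(A1, ">"), ...))).
function hrefsBySplit(cell) {
  return cell
    .split(">") // each piece holds at most one opening tag's attributes
    .map((piece) => piece.match(/href="(.*?)"/)) // first match only, like REGEXEXTRACT
    .filter((m) => m !== null) // pieces without an href are dropped
    .map((m) => m[1]) // captured URL
    .join("\n");
}

const html = '<p>See <a href="https://example.com/a">A</a> and <a href="https://example.com/b">B</a></p>';
console.log(hrefsBySplit(html));
```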
2. Using Google Apps Script:
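function myFunction(str) {
  // Match every href="..." attribute; the g flag lets exec() advance through the string.
  var re = /href="(.*?)"/g;
  var result = "";
  var res; // declared explicitly instead of creating an implicit global
  while ((res = re.exec(str)) !== null) {
    result += res[1] + "\n"; // res[1] is the captured URL
  }
  return result.slice(0, -1); // drop the trailing newline
}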
This script can be used as a custom function. To use it, put =myFunction(A1) in a cell.
Result:
The result is the same as with the method above.
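A shorter variant of the custom function can use String.prototype.matchAll, which should be available in the V8 runtime of Apps Script; treat that availability as an assumption to verify in your project. It collects the same capture groups as the exec() loop. extractHrefs is a hypothetical name.

```javascript
// Same output as the exec() loop: every href="..." value, joined by newlines.
function extractHrefs(str) {
  // String() guards against non-string input such as a number from a cell.
  return Array.from(String(str).matchAll(/href="(.*?)"/g), (m) => m[1]).join("\n");
}

console.log(extractHrefs('<a href="https://example.com/a">A</a><a href="https://example.com/b">B</a>'));
```

matchAll requires the g flag and returns an iterator of match arrays, so Array.from with a mapping function replaces the manual while loop.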
If I misunderstand your question, I'm sorry.
I just discovered Google Apps Script, and I'm stumped on something already...
I am trying to write a script for a Google Spreadsheet that finds certain historical stock prices. I found that the FinanceApp service in Google Apps Script has been deprecated and seemingly replaced by the GOOGLEFINANCE() function in Google Sheets. However, it returns an array when I need only a single cell, and the array is mucking up the works.
So I'd like to write a short script that calls the GOOGLEFINANCE() spreadsheet function and finds just the one piece of info I need from the array it returns. However, I cannot find a way to access spreadsheet functions (SUM, VLOOKUP, GOOGLEFINANCE, etc.) within a script.
Is there a way to access these functions in a script? Or perhaps, is there a new service which replaces the deprecated FinanceApp service?
Many thanks for any assistance!
You can try this:
var trick = SpreadsheetApp.getActiveSheet().getRange('D2').setValue('=GOOGLEFINANCE("GOOG")').getValue();
Native Spreadsheet functions are not supported in Google Apps Script.
You could use a somewhat cumbersome workaround: write a formula into a cell with a script and then read that cell's value back (using the script for both the write and the read), but this is neither very practical nor fast.
You might try the INDEX function combined with GOOGLEFINANCE.
For reference,
=GOOGLEFINANCE("MSFT", "PRICE", "01/01/21")
returns the array:
Date        Close
1/4/2021    217.69
One can add the INDEX function to pick out specific elements from the array using their row and column coordinates.
=INDEX(GOOGLEFINANCE("MSFT", "PRICE", "01/01/21"),2,2)
This returns only the value in row 2, column 2: 217.69.
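When the array is read into a script (for example with Range.getValues() after the write-then-read workaround), INDEX's 1-based row and column map to 0-based array indices. This is a minimal sketch: indexLike is a hypothetical name, and the values are copied from the table above with the date kept as text for simplicity (in a sheet it would be a Date object).

```javascript
// INDEX(range, row, col) is 1-based; a 2D JavaScript array is 0-based.
const indexLike = (values, row, col) => values[row - 1][col - 1];

// Values from the GOOGLEFINANCE("MSFT", "PRICE", "01/01/21") table above.
const priceTable = [
  ["Date", "Close"],
  ["1/4/2021", 217.69],
];
console.log(indexLike(priceTable, 2, 2)); // 217.69
```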
One possible way is .setFormula(). This method behaves like .setValue() and can be used as follows:
var ss = SpreadsheetApp.getActiveSpreadsheet();
var mySheet = ss.getSheets()[0];
// Select the first cell (A1) of the range covering columns A and B
var thisCell = mySheet.getRange('A:B').getCell(1,1);
thisCell.setFormula('=SUM(A2:A4)');
Any formula you pass to this method is a string, so it must be enclosed in ' or " quotes in the .setFormula() call.
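Combining this with the INDEX answer, the formula text can be built as a JavaScript string before it is passed to setFormula(). A template literal lets the double quotes that Sheets needs appear inside the string without escaping. This is a sketch: buildIndexFormula is a hypothetical helper, not part of Apps Script, and the setFormula() call is shown only as a comment because it needs a spreadsheet.

```javascript
// Hypothetical helper: builds the formula text for Range.setFormula().
function buildIndexFormula(ticker, attribute, startDate, row, col) {
  return `=INDEX(GOOGLEFINANCE("${ticker}", "${attribute}", "${startDate}"), ${row}, ${col})`;
}

const formula = buildIndexFormula("MSFT", "PRICE", "01/01/21", 2, 2);
console.log(formula); // =INDEX(GOOGLEFINANCE("MSFT", "PRICE", "01/01/21"), 2, 2)
// In Apps Script: SpreadsheetApp.getActiveSheet().getRange("D2").setFormula(formula);
```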