GoogleAppsScript: How do I trim strings after parsing HTML?

What I'm trying to do is parse and extract the movie's title, without all the HTML gunk, from the web page; it will eventually be saved into a spreadsheet. My code:
function myFunction() {
var url = UrlFetchApp.fetch("http://boxofficemojo.com/movies/?id=clashofthetitans2.htm")
var doc = url.getContentText()
var patt1 = doc.match(/<font face\=\"Verdana\"\ssize\=\"6\"><b>.*?<\/b>/i);
//var cleaned = patt1.replace(/^<font face\=\"Verdana\" size\=\"6\"><b>/,"");
//Logger.log(cleaned); Didn't work, get "cannot find function in object" error.
//so tried making a function below:
String.trim = function() {
return this.replace(/^\W<font face\=\"Verdana\"\ssize\=\"6\"><b>/,""); }
Logger.log(patt1.trim());
}
I'm very new to all of this (programming and Google Apps Script in general). I've been referencing w3schools.com's JavaScript section, but many things there just don't work in Google Apps Script. I'm not sure what's missing here: is my regex wrong? Is there a better or faster way to extract this data than a regex? Any help would be great. Thanks for reading!

While parsing information out of HTML that's not under your control is always a bit of a challenge, there is a way you could make this easier on yourself.
I noticed that the title element of each movie page also contains the movie title, like this:
<title>Wrath of the Titans (2012) - Box Office Mojo</title>
You might have more success parsing the title out of this, as it is probably more stable.
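var response = UrlFetchApp.fetch("http://boxofficemojo.com/movies/?id=clashofthetitans2.htm");
var doc = response.getContentText();
// match() returns null if nothing matches, otherwise an array whose index 1 is the captured title
var match = doc.match(/<title>(.+) \([0-9]{4}\) -/);
if (match) {
  Logger.log("Movie title is " + match[1]);
} else {
  Logger.log("No title found");
}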
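The same idea as plain JavaScript, so it can be checked outside Apps Script. It also explains the original error: String.prototype.match returns an array (or null), not a string, which is why calling replace() on patt1 fails; with a capture group you can read the text straight from index 1 instead of stripping the tags afterwards. This is a minimal sketch: the name extractMovieTitle is hypothetical, and the pattern assumes the "<title>Name (YYYY) - Box Office Mojo</title>" format shown above.

```javascript
// Hypothetical helper: pull the movie name out of a Box Office Mojo page's <title>.
// Assumes the format "<title>Movie Name (YYYY) - Box Office Mojo</title>".
function extractMovieTitle(html) {
  // match() returns null on failure, or an array whose index 1 is the first capture group.
  // The lazy (.+?) stops at the first " (YYYY) -", so the year and site name are excluded.
  var match = html.match(/<title>\s*(.+?) \(\d{4}\) -/i);
  return match ? match[1].trim() : null;
}

// In Apps Script you would pass in UrlFetchApp.fetch(url).getContentText().
```

With "<title>Wrath of the Titans (2012) - Box Office Mojo</title>" as input this returns "Wrath of the Titans", ready to be written to a spreadsheet cell.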

Related

How to translate only text in formatted HTML code using Google Apps Script?

I have been trying to translate text from HTML code. Here is an example:
var s = '<span>X stopped the</span><icon></icon><subject>breakout session</subject>'
When I try =GOOGLETRANSLATE(s,"en","fi") in Google Sheets, it also changes the tag formatting and translates the tags as plain text, whereas only "X stopped the breakout session" should be translated.
Then I tried this function:
function TransLang(string){
return LanguageApp.translate(string,'en', 'fi', {contentType: 'text'});
}
This function worked well for a while, but then I got this error:
Service invoked too many times in one day.
So I am stuck here. Is there any way to translate only the plain text of HTML code without translating or messing up the HTML tags? Is there a regex that can skip the tags and translate all the other text?
I hope I am able to state my problem clearly. Please guide me if you have any suggestions. Thank you
Is the text you want always inside a single <span>? Or could there be more than one span or other element types?
This works for extracting the inner text from a single <span>:
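function getSpanText() {
let s = '<span>X stopped the</span><icon></icon><subject>breakout session</subject>';
// The lookbehind/lookahead keep the tags out of the match; the non-greedy +? stops at the first </span>
let match = s.match(/(?<=<span>).+?(?=<\/span>)/);
let text = match ? match[0] : '';
Logger.log(text);
return text;
}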
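If there can be more than one element, collecting the text of every element by splitting on the tags may be closer to what you need. This is a regex-based approximation, not a real HTML parser: it assumes no ">" inside attribute values, no comments, and it leaves HTML entities undecoded. The name getTextSegments is hypothetical.

```javascript
// Hypothetical helper: return the text between tags as an array of segments.
// Regex-based approximation: assumes no '>' inside attribute values and no HTML comments.
function getTextSegments(html) {
  return html
    .split(/<[^>]*>/)                                  // drop every tag, keep the text between them
    .map(function (t) { return t.trim(); })            // trim surrounding whitespace
    .filter(function (t) { return t.length > 0; });    // discard the empty gaps between adjacent tags
}

var s = '<span>X stopped the</span><icon></icon><subject>breakout session</subject>';
var segments = getTextSegments(s);   // ["X stopped the", "breakout session"]
var sentence = segments.join(' ');   // "X stopped the breakout session"
```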
So, after a lot of digging, I have been able to find what I was looking for.
function Translator(S) {
var sourceLang = "en";
var targetLang = "fi";
var url = 'https://translate.googleapis.com/translate_a/single?client=gtx' +
'&sl=' + sourceLang +
'&tl=' + targetLang +
// encodeURIComponent (unlike encodeURI) also escapes &, # and +, which would otherwise break the query string
'&dt=t&q=' + encodeURIComponent(S);
var result = JSON.parse(UrlFetchApp.fetch(url).getContentText());
return result[0][0][0];
}
This simple function calls the Google Translate web endpoint and extracts the result. The best part is that you do not have to worry about the tags: Google does not translate them, so only the plain text is translated. The one limitation is that API calls are limited, so you cannot make more than 5000 calls per day.
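One caveat with returning result[0][0][0]: that is only the first translated segment, so input with several sentences may come back truncated. A sketch of the request URL plus a helper that joins all segments, written as plain JavaScript so it can be checked outside Apps Script. The response layout (result[0] as a list of [translated, original, ...] segments) is an assumption inferred from how the function above indexes it, since this endpoint is undocumented; buildTranslateUrl and joinTranslatedSegments are hypothetical names. The URL builder uses encodeURIComponent, which, unlike encodeURI, also escapes "&", "#" and "+".

```javascript
// Hypothetical helper: build the translate_a/single URL used above.
function buildTranslateUrl(text, sourceLang, targetLang) {
  return 'https://translate.googleapis.com/translate_a/single?client=gtx' +
    '&sl=' + encodeURIComponent(sourceLang) +
    '&tl=' + encodeURIComponent(targetLang) +
    '&dt=t&q=' + encodeURIComponent(text); // escapes &, #, + and other reserved characters
}

// Assumed response shape: result[0] is an array of segments, each [translated, original, ...].
function joinTranslatedSegments(result) {
  if (!Array.isArray(result) || !Array.isArray(result[0])) return '';
  return result[0]
    .map(function (segment) { return (segment && segment[0]) || ''; })
    .join('');
}

// In Apps Script (not runnable outside it):
// var result = JSON.parse(UrlFetchApp.fetch(buildTranslateUrl(S, 'en', 'fi')).getContentText());
// return joinTranslatedSegments(result);
```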
Why not use LanguageApp.translate in a custom function (Extensions > Apps Script)?
var spanish = LanguageApp.translate('This is a <strong>test</strong>',
'en', 'es', {contentType: 'html'});
// The code will generate "Esta es una <strong>prueba</strong>".
LanguageApp.translate accepts an optional fourth argument, an options object whose contentType can be 'text' or 'html'.
For large tables, be aware that there are daily limits (quotas)!

Error with custom Search and Replace function for Google Sites

I'm trying to use a script to replace a particular string with a different string. I think the code is right, but I keep getting the error "Object does not allow properties to be added or changed."
Does anyone know what could be going wrong?
function searchAndReplace() {
var teams = SitesApp.getPageByUrl("https://sites.google.com/a/directory/teams");
var list = teams.getChildren();
list.forEach(function(element){
page = element.getChildren();
});
page.forEach(function(element) {
var html = element.getHtmlContent();
html.replace(/foo/, 'bar');
element.setHtmlContent = html;
});
};
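Try this. From the JavaScript reference:
The replace() method returns a new string with some or all matches of a pattern replaced by a replacement.
So html.replace(/foo/, 'bar') on its own leaves html unchanged; assign the result with html = html.replace(/foo/, 'bar'). Also, setHtmlContent is a method, so call element.setHtmlContent(html) instead of assigning to it; the assignment is what triggers "Object does not allow properties to be added or changed."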
I think the issue here is that forEach cannot change the array that it is called upon. From developer.mozilla.org "forEach() does not mutate the array on which it is called (although callback, if invoked, may do so)."
Try doing it with a regular loop.
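A minimal sketch of the whole loop with these problems fixed: it keeps every grandchild page instead of overwriting page on each iteration, stores the result of replace(), and calls setHtmlContent() as a method. The page objects are assumed to expose getChildren(), getHtmlContent() and setHtmlContent(html), as in the question's code; the function name replaceInGrandchildren is hypothetical, and the /g flag is added so every occurrence is replaced, not just the first.

```javascript
// Hypothetical sketch: page objects are assumed to provide getChildren(),
// getHtmlContent() and setHtmlContent(html), as used in the question.
function replaceInGrandchildren(rootPage, pattern, replacement) {
  var pages = [];
  // Collect the grandchildren of every child, instead of overwriting one variable.
  rootPage.getChildren().forEach(function (child) {
    pages = pages.concat(child.getChildren());
  });
  pages.forEach(function (page) {
    var html = page.getHtmlContent();
    var updated = html.replace(pattern, replacement); // replace() returns a new string
    if (updated !== html) {
      page.setHtmlContent(updated); // a method call, not a property assignment
    }
  });
  return pages.length; // number of pages examined
}

// In Apps Script (classic Sites):
// var teams = SitesApp.getPageByUrl("https://sites.google.com/a/directory/teams");
// replaceInGrandchildren(teams, /foo/g, 'bar');
```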

Jsoup not displaying tags using the select() function

I'm trying to scrape the teams' win-loss records from the NBA scoreboard on ESPN. Here's an image of the lines of text I want to capture, circled in black:
Can someone try scraping this exact data and seeing if it works? I've been at it for hours and nothing is working. I was able to scrape the team names and start times but when I try using jsoup's select function on the record lines, I get 0 results back. It's as if the tags are hidden from the html hierarchy. Is this possible? I'm new to this and I may be doing something wrong.
Code I have tried:
Document document = Jsoup.connect("http://espn.go.com/nba/scoreboard/_/date/20160315").get();
games = document.select("section.sb-score");
for(Element game : games)
{
mHomeTeam = game.select("td.home").select("div.sb-meta").text();
Elements test = game.select("p.record.overall");
mAwayTeam = game.select("td.away").select("div.sb-meta").text();
mHomeTeamRecord = game.select("td.home").select("div.record-container").select("p.record").text();
mAwayTeamRecord = game.select("td.away").select("div.record-container").select("p.record").text();
mGameStartTime = game.select("span.time").text();
Game newGameObj = new Game(mHomeTeam, mAwayTeam, mGameStartTime, mHomeTeamRecord, mAwayTeamRecord);
mGameList.add(newGameObj);
}
The team win-loss record data is inserted into the page by JavaScript. Jsoup is only an HTML parser and does not execute JavaScript, which is why select() finds no matching tags.
However, the data itself appears to be embedded directly in the page, in a JavaScript object called window.espn.scoreboardData.
Here is how to extract this data:
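import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

Document doc = Jsoup.connect("http://espn.go.com/nba/scoreboard/_/date/20160315").get();
for (Element script : doc.select("script")) {
// data() returns the raw contents of a <script> element
String scriptData = script.data();
if (scriptData.contains("window.espn.scoreboardData")) {
// Parse scriptData to extract team win-loss record ...
}
}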
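The parsing step left as a comment could look roughly like the sketch below. It is written in JavaScript to match the rest of this page, but the same string scanning ports directly to Java. It assumes the script contains an assignment of the form window.espn.scoreboardData = {...}; whose object literal is valid JSON, which is not confirmed here, and the helper name extractAssignedObject is hypothetical. It scans forward to the matching closing brace, skipping braces inside string literals, then hands that text to JSON.parse.

```javascript
// Hypothetical helper: find `<name> = { ... }` in script text and parse the object.
// Assumes the object literal is valid JSON (double-quoted keys and strings).
function extractAssignedObject(scriptData, name) {
  var start = scriptData.indexOf(name);
  if (start < 0) return null;
  var open = scriptData.indexOf('{', start + name.length);
  if (open < 0) return null;

  var depth = 0, inString = false, escaped = false;
  for (var i = open; i < scriptData.length; i++) {
    var ch = scriptData[i];
    if (inString) {
      // Inside a string: only an unescaped quote ends it; braces are ignored.
      if (escaped) escaped = false;
      else if (ch === '\\') escaped = true;
      else if (ch === '"') inString = false;
    } else if (ch === '"') {
      inString = true;
    } else if (ch === '{') {
      depth++;
    } else if (ch === '}') {
      depth--;
      if (depth === 0) {
        return JSON.parse(scriptData.slice(open, i + 1));
      }
    }
  }
  return null; // braces never balanced: the object was not found in full
}
```

Once parsed, the win-loss records can be read from the resulting object; which fields hold them depends on the page's data layout, which this sketch does not assume.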

Spotify List objects created from localStorage data come up blank

I'm working on a Spotify app and trying to create a views.List object from some information stored in our database. On initial load, a POST request fetches the necessary info. I store this in localStorage so that subsequent requests can skip the database and retrieve the object locally. However, the List objects I create from localStorage data come up blank, while the ones from POST requests work just fine.
Here is the snippet I'm using to create the list:
var temp_playlist = models.Playlist.fromURI(playlist.uri);
var tempList = new views.List(temp_playlist, function (track) {
return new views.Track(track, views.Track.FIELD.STAR |
views.Track.FIELD.NAME |
views.Track.FIELD.ARTIST |
views.Track.FIELD.DURATION);
});
document.getElementById("tracklist").appendChild(tempList.node);
playlist.uri in the first line is what I'm retrieving either from a POST or from localStorage. The resulting views.List object (tempList) looks identical in both cases except for tempList.node. For the one created from localStorage, console.log shows these values for innerHTML, innerText, outerHTML, and outerText:
innerHTML: "<div style="height: 400px; "></div>"
innerText: ""
outerHTML: "<div style="height: 400px; "></div>"
outerText: ""
Whereas the one retrieved via POST has the full data:
innerHTML: "<div style="height: 400px; "><a href="spotify:track:07CnMloaACYeFpwgZ9ihfg" class="sp-item sp-track sp-track-availability-0" title="Boss On The Boat by Tosca" data-itemindex="0" data-viewindex="0" style="-webkit-transform: translateY(0px); ">....
innerText: "3Boss On The BoatTosca6:082....
and so forth..
Any help would be greatly appreciated
Solved this.
I am using hide() and show() to render the tabs in my app. I was constructing the tracklist and then calling show() on the div, which led to a blank tracklist. If I show() the div first and then construct the tracklist, it works fine.
The reason (I think) it worked for POSTs is that the data came from the database, and the slightly longer loading time probably meant the tracklist was constructed after the div's show() executed. With localStorage, I guess the tracklist was constructed before the div was even shown, which left it blank.
Using localStorage, I did it this way:
var sp = getSpotifyApi(1);
var m = sp.require("sp://import/scripts/api/models");
var v = sp.require("sp://import/scripts/api/views");
// uri: the playlist URI (here, the value read back from localStorage)
var pl = m.Playlist.fromURI(uri);
var player = new v.Player();
player.track = pl.get(0);
player.context = pl;
var list = new v.List(pl);
XXXXX.append($(list.node));
Hope it helps; it's working for me.
I think I've actually managed to solve this and I think it's bulletproof.
Basically, I was trying to solve this by convincing the API that it needed to redraw the playlist, by hiding, scrolling, or moving things, which worked occasionally but never consistently. It never occurred to me to change the playlist itself, or at least make the API think the playlist had changed.
You can do so by firing an event on the Playlist object.
var models = sp.require('$api/models');
...
// playlist is your Playlist object. Usually retrieved from models.Playlist.fromURI
playlist.notify(models.EVENT.CHANGE, playlist);
These are just standard Spotify functions and the list updates because it thinks something has changed in the playlist. Hope this helps someone!

Sending values through links

Here is the situation: I have 2 pages.
What I want is to have a number of text links (<a href="">) on page 1, all pointing to page 2, but with each link sending a different value.
On page 2 I want to show that value like this:
Hello you clicked {value}
Another point to take into account is that I can't use any PHP in this situation, just HTML.
Can you use any scripting, such as JavaScript? If so, pass the values along in the query string (just add "?ValueName=Value" to the end of your links), then retrieve the query-string value on the target page. The following site shows how to parse it out: Parsing the Query String.
Here's the JavaScript you would need, using the Querystring helper from that article:
var qs = new Querystring();
var v1 = qs.get("ValueName");
From there you should be able to work with the passed value.
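On current browsers you may not need a separate helper library: the built-in URLSearchParams interface parses the query string and decodes the values. A minimal sketch, assuming links on page 1 such as <a href="page2.html?ValueName=Blue%20car">; getQueryValue, greetingFor and the element id "greeting" are hypothetical names.

```javascript
// Read one query-string value; returns null when the parameter is absent.
function getQueryValue(search, name) {
  // URLSearchParams ignores a leading "?" and decodes %-escapes and "+".
  return new URLSearchParams(search).get(name);
}

// Build the message shown on page 2.
function greetingFor(value) {
  return 'Hello you clicked ' + value;
}

// On page 2, in a <script> placed after the target element:
// var value = getQueryValue(window.location.search, 'ValueName');
// document.getElementById('greeting').textContent = greetingFor(value);
```

Setting textContent rather than innerHTML means a crafted link cannot inject markup into page 2.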
JavaScript can get it. Say you're trying to get the query-string value from this URL: http://foo.com/default.html?foo=bar
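var tabvalue = getQueryVariable("foo");
function getQueryVariable(variable)
{
var query = window.location.search.substring(1); // drop the leading "?"
var vars = query.split("&");
for (var i = 0; i < vars.length; i++)
{
var pair = vars[i].split("=");
if (decodeURIComponent(pair[0]) === variable)
{
// Values are percent-encoded in the URL, so decode before use
return decodeURIComponent(pair[1] || "");
}
}
return null; // parameter not present
}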
** Not 100% certain if my JS code here is correct, as I didn't test it.
You might be able to accomplish this using HTML Anchors.
http://www.w3schools.com/HTML/html_links.asp
Append your data to the href attribute of your links and use JavaScript on the second page to parse the URL and display whatever you want.
http://java-programming.suite101.com/article.cfm/how_to_get_url_parts_in_javascript
It's not clean, but it should work.
Use document.location.search and split()
http://www.example.com/example.html?argument=value
var queryString = document.location.search.substring(1); // search is a property, not a function; drop the leading "?"
var parts = queryString.split('=');
document.write(parts[0]); // The argument name
document.write(parts[1]); // The value
Hope it helps
Well, this is pretty basic with JavaScript, but if you want to do more of this, and more advanced things, you should really look into PHP, for instance. With PHP it's easy to pass variables from one page to another. Here's an example:
The URL:
localhost/index.php?myvar=Hello%20World
You can then access myvar in index.php using this bit of code:
$myvar = $_GET['myvar'];
OK, thanks for all your replies. I'll see if I can find a way to use the scripts.
It's really annoying, since I have to work around a CMS in which all pages are created with a WYSIWYG editor that tends to filter out unrecognized tags and scripts.
Edit: OK, it seems that the damn WYSIWYG editor only recognizes HTML tags... (as expected)
Using PHP:
<?php
$passthis = "See you on the other side";
// htmlspecialchars() escapes quotes and < > so the value cannot break out of the attribute
echo '<form action="whereyouwantittogo.php" target="_blank" method="post">' .
'<input type="text" name="passthis1" value="' .
htmlspecialchars($passthis) . '" /> ' .
'<button type="submit" value="Submit">Submit</button>' .
'</form>';
?>
The script for the page you would like to pass the info to:
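<?php
// The value comes from the request, so check it exists and escape it before echoing
$thispassed = isset($_POST['passthis1']) ? htmlspecialchars($_POST['passthis1']) : '';
echo '<textarea>' . $thispassed . '</textarea>';
echo $thispassed;
?>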
Use these two snippets on separate pages, with the second one saved as whereyouwantittogo.php, and you should be in business.