Extracting specific information from a website with ActionScript 3

I'm trying to extract a lot of information from a website, and I'm unfamiliar with the syntax I should use to get specific content. I've tried reading up on RegEx and the match API for ActionScript 3, but I'm still unsure.
This is my code:
var l1:URLLoader = new URLLoader();
l1.addEventListener(Event.COMPLETE, completeHandler);
l1.load(new URLRequest("https://meny.no/oppskrifter/Pasta/baked-feta-pasta/"));
trace("load");
function completeHandler(e:Event):void {
trace("complete")
var s:String = e.target.data;
//var targets:Array = s.match(/(?<=<div class="target">).*(?=<\/div>)/igm);
trace(targets);
//getting the name of the recipe
var targets:Array = s.match(/(?<=<h1 class="c-h1">).*(?=<\/h1>)/igm);
trace(targets);
// getting the ingress of the recipe
targets[1] = s.match(/(?<=<div class="c-recipe__intro">).*(?=<\/div>)/sigm);
trace(targets[1]);
trace("complete2");
}
What I'm trying to get with this line:
targets[1] = s.match(/(?<=<div class="c-recipe__intro">).*(?=<\/div>)/sigm);
is only this text: Oppskrift på TikTok trenden "Baked feta pasta", en enkel pasta med saus av ovnsbakt fetaost og tomater. Retten er enkel med få ingredienser og mye smak. Fetaost, tomater, olivenolje og urter ovnsbakes, og blandes så med kokt pasta.
But instead it gives me everything after it as well.
Anyway, is there a template or something that explains how to get certain information in a more graspable way?
Thanks!
It's similar to this question, but not quite the same: In swf AS3, how do you extract string content from a website

There is, of course, another approach besides RegEx: an algorithmic one.
Something like this will suffice for finding many substrings delimited by a given head and tail:
function findMany(text:String, head:String, tail:String):Array
{
var result:Array = new Array;
// At this point we get several slices, each ends with the provided "tail".
var aList:Array = text.split(tail);
// The last chunk doesn't end with "tail".
aList.pop();
// Iterate over them.
for each (var a:String in aList)
{
// Find out where (and if) the "head" occurs in each slice.
// If there's more than one, take the last.
var anIndex:int = a.lastIndexOf(head);
if (anIndex > -1)
{
// If there is one, add it to the list of results.
// Don't forget the "tail" as .split() method
// cuts the given separator.
result.push(a.substr(anIndex) + tail);
}
}
return result;
}
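As for why the pattern in the question returns everything after the intro: `.*` is greedy, and with the `s` flag it also crosses line breaks, so it runs on to the last `</div>` in the page. A lazy quantifier (`.*?`) stops at the first closing tag instead. Below is a minimal sketch of the difference, written in JavaScript because its regex syntax is close to AS3's. The sample markup and the function names are made up for illustration, and the real page may nest its tags differently.

```javascript
// Made-up sample markup; the real page's structure is an assumption.
const sampleHtml =
  '<h1 class="c-h1">Baked feta pasta</h1>\n' +
  '<div class="c-recipe__intro">Line one.\nLine two.</div>\n' +
  '<div class="other">More content</div>';

// Greedy: [\s\S]* grabs as much as possible, so the match ends at the LAST </div>.
function greedyIntro(html) {
  const m = html.match(/<div class="c-recipe__intro">([\s\S]*)<\/div>/i);
  return m ? m[1] : null;
}

// Lazy: [\s\S]*? grabs as little as possible, so the match ends at the FIRST </div>.
function lazyIntro(html) {
  const m = html.match(/<div class="c-recipe__intro">([\s\S]*?)<\/div>/i);
  return m ? m[1] : null;
}

console.log(lazyIntro(sampleHtml));   // the two intro lines only
console.log(greedyIntro(sampleHtml)); // also swallows the following div
```

In the AS3 pattern from the question, the equivalent change would presumably be `.*?` in place of `.*` (keeping the `s` flag); I have not run that in Flash. Note that if the intro itself contains nested divs, the lazy version stops too early.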

Related

Array of stagewebviews

I need multiple StageWebViews to hold multiple loaded websites at the same time.
I was hoping to manage this by making an array of webview objects, so I could call them later as view[i].
var view:Array=[webview0, webview1, webview2];
for each (var v in view){
var v:StageWebView = new StageWebView();
This gives error: 1086: Syntax error: expecting semicolon before left bracket.
Does someone know how to make an array like that?
You're doing something really weird there in terms of syntax. If you just want an Array of freshly created instances, it goes like this:
// Initialize the array.
var Views:Array = new Array;
// This loop counts 0,1,2.
for (var i:int = 0; i < 3; i++)
{
// Create a new instance.
// Yes, you can omit () with new operator if there are no arguments.
var aView:StageWebView = new StageWebView;
// Assign the new element to your array.
Views[i] = aView;
}
Or, if you need only 3 then you don't need to go algorithmic.
var Views:Array = [new StageWebView, new StageWebView, new StageWebView];
Not on topic but related:
Here is an example of one HTML page holding multiple StageWebViews:
https://www.w3schools.com/graphics/tryit.asp?filename=trymap_basic_many

trying to get properties of objects inside object properties

Sometimes JavaScript is playing with me (although the deal was that I would be playing with it...) This test code below keeps resisting so I'm looking for a little help from more clever people around here.
While answering a recent question, I tried to create a readable list of all the color IDs usable in the Google Advanced Calendar API.
The request is very simple : Calendar.Colors.get()
The response is an object with a couple of properties, each one being other objects with other properties.
I can go down to the second level, but the last level, which is the most useful one in this case, returns a disturbing "undefined" (see partial log below).
And that's my question...
code with comments :
function getColorList(){
var colors = Calendar.Colors.get();
//Logger.log(JSON.stringify(colors));
for(var cat in colors){
Logger.log("category "+cat+" = "+JSON.stringify(colors[cat])+'\n\n')
}
// from there I try the "event" category
var events = colors["event"];
Logger.log('object colors["event"] = '+ JSON.stringify(events))
// then I try to get every properties in this object
for(var val in events){
Logger.log("key "+val+" = "+JSON.stringify(events[val]))
}
}
Full log is viewable here (externalized to keep this reasonably short)
It looks like (key) may indicate a read-only definition, as Sandy was alluding to.
Just make your own object to loop through by converting colors to a string and parsing it back:
var json = JSON.stringify(colors["event"]);
var myObj = JSON.parse(json);
for(var val in myObj){
Logger.log("key "+ val +" = "+JSON.stringify(myObj[val]))
}
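To illustrate why the round trip helps, here is a small sketch in plain JavaScript. The `makeOpaqueColors` object is a hypothetical stand-in, not the real Calendar response: it assumes the keys are enumerable, that reading them directly yields undefined, and that the data is only reachable through serialization. The color values are sample data, and all function names are made up.

```javascript
// Hypothetical stand-in for the API object (an assumption for illustration).
function makeOpaqueColors() {
  const data = {
    "1": { background: "#a4bdfc", foreground: "#1d1d1d" },
    "2": { background: "#7ae7bf", foreground: "#1d1d1d" }
  };
  const opaque = {};
  for (const id of Object.keys(data)) {
    // The key shows up in for...in, but reading it gives undefined.
    Object.defineProperty(opaque, id, {
      enumerable: true,
      get: function () { return undefined; }
    });
  }
  // JSON.stringify() uses toJSON(), so serialization still sees the data.
  Object.defineProperty(opaque, "toJSON", { value: function () { return data; } });
  return opaque;
}

// The technique from the answer: serialize, then parse into a plain object.
function toPlainObject(value) {
  return JSON.parse(JSON.stringify(value));
}

// Read every enumerable key with for...in, as the question's loop does.
function readAll(obj) {
  const out = {};
  for (const key in obj) {
    out[key] = obj[key];
  }
  return out;
}

console.log(readAll(makeOpaqueColors()));                // values are undefined
console.log(readAll(toPlainObject(makeOpaqueColors()))); // values are plain objects
```

The parsed copy is an ordinary object with its own enumerable properties, so the loop behaves as expected regardless of how the original was defined.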

Character encoding issue when using Google Apps Script to extract data from web page

I have written a script using Google Apps Script to extract text from a web page into Google Sheets. I only need this script to work with a specific web page, so it does not need to be versatile. The script works almost exactly as I want it to except that I have run into a character encoding problem. I am extracting both Hebrew and English text. The meta tag in the HTML has charset=Windows-1255. The English extracts perfectly, but the Hebrew displays as black diamonds containing a question mark.
I found this question that says to pass the data into a blob then use the getDataAsString method to convert to another encoding. I tried converting to different encodings and got different results. UTF-8 displays the black diamonds with question marks, UTF-16 displays Korean, ISO 8859-8 returns an error and says it's not a valid parameter, and the original Windows-1255 displays one Hebrew character but a bunch of other gibberish.
However, I am able to copy and paste the Hebrew text into Google Sheets manually and it displays correctly.
I have even tested passing Hebrew directly from Google Apps Script code like so:
function passHebrew() {
return "וַיְדַבֵּר";
}
This displays the Hebrew text properly on Google Sheets.
My code is as follows:
function parseText(book, chapter) {
//var bk = book;
//var ch = chapter;
var bk = '04'; //hard-coded for testing purposes
var ch = '01'; //hard-coded for testing purposes
var url = 'http://www.mechon-mamre.org/p/pt/pt' + bk + ch + '.htm';
var xml = UrlFetchApp.fetch(url).getContentText();
//I had to "fix" these xml errors for XmlService.parse(xml) below
//to function.
xml = xml.replace('<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">', '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "">');
xml = xml.replace('<LINK REL="stylesheet" HREF="p.css" TYPE="text/css">', '<LINK REL="stylesheet" HREF="p.css" TYPE="text/css"></LINK>');
xml = xml.replace('<meta http-equiv="Content-Type" content="text/html; charset=Windows-1255">', '<meta http-equiv="Content-Type" content="text/html; charset=Windows-1255"></meta>');
xml = xml.replace(/ALIGN=CENTER/gi, 'ALIGN="CENTER"');
xml = xml.replace(/<BR>/gi, '<BR></BR>');
xml = xml.replace(/class=h/gi, 'class="h"');
//This section is the specific route to the table in the page I want
var document = XmlService.parse(xml);
var body = document.getRootElement().getChildren("BODY");
var maintable = body[0].getChildren("TABLE");
var maintablechildren = maintable[0].getChildren();
//This creates a two-dimensional array so that I can store the Hebrew
//in the first column and the English in the second column
var array = new Array(maintablechildren.length);
for (var i = 0; i < maintablechildren.length; i++) {
array[i] = new Array(2);
}
//This is where the table gets parsed into the array
for (var i = 0; i < maintablechildren.length; i++) {
var verse = maintablechildren[i].getChildren();
//This is where the encoding problem occurs.
//I originally tried verse[0].getText() but it didn't work.
array[i][0] = Utilities.newBlob(verse[0].getText()).getDataAsString('UTF-8');
//This array receives the English text and works fine.
array[i][1] = verse[1].getText();
}
return array;
}
What am I overlooking, misunderstanding, or doing wrong? I don't have a very good understanding of how encoding works so I don't understand why converting it to UTF-8 isn't working.
Your problem occurs before the lines you've commented as an encoding problem: the default encoding used by UrlFetchApp's .getContentText() is munging the Hebrew text from the start.
You should use the variation of the .getContentText() method that "returns the content of an HTTP response encoded as a string of the given charset." For your case:
var xml = UrlFetchApp.fetch(url).getContentText("Windows-1255");
That should be all you need to change. The blob() work-around is no longer needed (it's harmless, though). Other comments:
The logical OR operator (||) is very helpful for setting default values. I've tweaked the first few lines to enable testing but still let the function operate normally with arguments.
The way you're setting up an empty array before populating it with strings is Bad JavaScript; it's complex code that isn't needed, so toss it. Instead, we'll declare an empty array, then push() rows onto it.
The .replace() functions can be reduced with more clever RegExp use; I've included the URLs for demos of the really tricky ones.
There were \n newline characters in the text which I guessed were unnecessary for your purposes, so I added a replace() for them as well.
Here's what you're left with:
function parseText(book, chapter) {
var bk = book || '04'; //hard-coded for testing purposes
var ch = chapter || '01'; //hard-coded for testing purposes
var url = 'http://www.mechon-mamre.org/p/pt/pt' + bk + ch + '.htm';
var xml = UrlFetchApp.fetch(url).getContentText("Windows-1255");
//I had to "fix" these xml errors for XmlService.parse(xml) below
//to function.
xml = xml.replace(/(<!DOCTYPE.*EN")>/gi, '$1 "">')
.replace(/(<(LINK|meta).*>)/gi,'$1</$2>') // https://regex101.com/r/nH3pU8/1
.replace(/(<.*?=)([^"']*?)([ >])/gi,'$1"$2"$3') // https://regex101.com/r/eP7wO7/1
.replace(/<BR>/gi, '<BR/>')
.replace(/\n/g, '')
//This section is the specific route to the table in the page I want
var document = XmlService.parse(xml);
var body = document.getRootElement().getChildren("BODY");
var maintable = body[0].getChildren("TABLE");
var maintablechildren = maintable[0].getChildren();
//This is where the table gets parsed into the array
var array = [];
for (var i = 0; i < maintablechildren.length; i++) {
var verse = maintablechildren[i].getChildren();
//I originally tried verse[0].getText() but it didn't work. It does now!
var hebrew = verse[0].getText();
//This array receives the English text and works fine.
var english = verse[1].getText();
array.push([hebrew,english]);
}
return array;
}
Results
[
[
"  וַיְדַבֵּר יְהוָה אֶל-מֹשֶׁה בְּמִדְבַּר סִינַי, בְּאֹהֶל מוֹעֵד:  בְּאֶחָד לַחֹדֶשׁ הַשֵּׁנִי בַּשָּׁנָה הַשֵּׁנִית, לְצֵאתָם מֵאֶרֶץ מִצְרַיִם--לֵאמֹר.",
" And the LORD spoke unto Moses in the wilderness of Sinai, in the tent of meeting, on the first day of the second month, in the second year after they were come out of the land of Egypt, saying:"
],
[
"  שְׂאוּ, אֶת-רֹאשׁ כָּל-עֲדַת בְּנֵי-יִשְׂרָאֵל, לְמִשְׁפְּחֹתָם, לְבֵית אֲבֹתָם--בְּמִסְפַּר שֵׁמוֹת, כָּל-זָכָר לְגֻלְגְּלֹתָם.",
" 'Take ye the sum of all the congregation of the children of Israel, by their families, by their fathers' houses, according to the number of names, every male, by their polls;"
],
[
"  מִבֶּן עֶשְׂרִים שָׁנָה וָמַעְלָה, כָּל-יֹצֵא צָבָא בְּיִשְׂרָאֵל--תִּפְקְדוּ אֹתָם לְצִבְאֹתָם, אַתָּה וְאַהֲרֹן.",
" from twenty years old and upward, all that are able to go forth to war in Israel: ye shall number them by their hosts, even thou and Aaron."
],
...
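
The black diamonds in the question are what you get when Windows-1255 bytes are decoded as UTF-8: the Hebrew letters occupy the single bytes 0xE0 to 0xFA, which do not form valid UTF-8 sequences here, so they are replaced with U+FFFD. A minimal sketch of that effect outside Apps Script, using the standard `TextDecoder` API. It assumes a runtime that ships the windows-1255 decoder (a recent Node.js or a browser); the bytes spell a sample word and are not taken from the page, and `decodeAs` is a made-up helper name.

```javascript
// Bytes of the Hebrew word "שלום" in Windows-1255 (sample data for illustration).
const hebrewBytes = new Uint8Array([0xf9, 0xec, 0xe5, 0xed]);

// Decode a byte sequence using the named charset.
function decodeAs(bytes, charset) {
  return new TextDecoder(charset).decode(bytes);
}

console.log(decodeAs(hebrewBytes, "windows-1255")); // the Hebrew word
console.log(decodeAs(hebrewBytes, "utf-8"));        // replacement characters
```

The same bytes give readable text or garbage depending only on the charset named at decode time, which is why the fix belongs at the fetch step and not later.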

How to Use .as Exported File from PhysicsEditor

The question was here for a long time with a bounty and no solution that satisfied me. I erased the first post and am instead posting a question that can be answered quickly with a yes or no, so I can get on with my work.
If you could answer it quickly, before it gets closed as "not a good question": is using a custom shape from PhysicsEditor with Nape the same as doing it with Box2D (apart from the syntax changes, of course)?
If you could take a look at that link and tell me whether it's the same process in Nape, that would be enough. Thanks.
I ask this because I found the Box2D tutorial easier to follow so far.
public var floor:Body;
floor = new Body(BodyType.STATIC);
var floorShape:PhysicsData = new PhysicsData();
floor.shapes.add(floorShape); // Error: Implicit coercion of a value of type PhysicsData to an unrelated type nape.shape:Shape.
floor.space = space;
Update:
According to a comment on this blog post, it sounds like recent versions of Nape have broken compatibility with the physics editor. Specifically, the graphic and graphicUpdate properties no longer exist on body objects. The solution suggested is to remove references to those properties.
I'm not in a position to be able to test this, but you could try updating the createBody method of your floor class as follows:
public static function createBody(name:String /*,graphic:DisplayObject=null*/):Body {
var xret:BodyPair = lookup(name);
//if(graphic==null) return xret.body.copy();
var ret:Body = xret.body.copy();
//graphic.x = graphic.y = 0;
//graphic.rotation = 0;
//var bounds:Rectangle = graphic.getBounds(graphic);
//var offset:Vec2 = Vec2.get(bounds.x-xret.anchor.x, bounds.y-xret.anchor.y);
//ret.graphic = graphic;
/*
ret.graphicUpdate = function(b:Body):void {
var gp:Vec2 = b.localToWorld(offset);
graphic.x = gp.x;
graphic.y = gp.y;
graphic.rotation = (b.rotation*180/Math.PI)%360;
}
*/
return ret;
}

Extracting links and twitter replies from a string

I am getting a string from Twitter into my ActionScript, and it is an unformatted string. I want to be able to extract any links and/or any #replies from the string, then display it in htmlText.
So far I have this
var txt:String = "This is just some text http://www.thisisawebsite.com and some more text via #sumTwitter";
var twitterText:String = txt.slice(txt.indexOf("#"),txt.indexOf(" ",txt.indexOf("#")));
var urlText:String = txt.slice(txt.indexOf("http"),txt.indexOf(" ",txt.indexOf("http")));
var newURL:String = "<a href='" + urlText + "'>" + urlText + "</a>";
var arr:Array = txt.split(urlText);
var newString:String = arr[0] + newURL + arr[1];
var txtField:TextField = new TextField();
txtField.width = 500;
txtField.htmlText = newString;
addChild(txtField);
This is fine for extracting links that finish with a space. But what if, like #sumTwitter, the match finishes at the end of the string? And what if there are multiple links or #'s: is a while loop the best way to handle that?
Regular expressions are the best option for what you want, I think.
Check Grant Skinner's RegExr. You could write and test your own RegExp there, which is very convenient. But you also can find a lot of useful ready-to-use regexps created by different users. Check out the "community" tab in the right panel. There, search by some meaningful keywords like "twitter" and "url" and you'll get a good number of options.
For example,
Grab urls:
http://regexr.com?2s5m4
Capture twitter usernames:
http://regexr.com?2s5m7
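
As a starting point, here is a minimal sketch of the regex approach using the global flag, which handles multiple matches and a match at the end of the string without a hand-written loop. It is written in JavaScript; AS3's `String.match` and `String.replace` accept regexes in much the same way, but the port is untested and AS3 has no `matchAll`. The patterns are deliberately simple assumptions (a URL is `http` or `https` followed by non-space characters, a tag is `#` plus word characters), and the function names are hypothetical.

```javascript
// Collect every URL and every #tag in the text.
function extractUrlsAndTags(text) {
  const urls = text.match(/https?:\/\/\S+/g) || [];
  // Require start-of-string or whitespace before "#" so URL fragments are skipped.
  const tags = Array.from(text.matchAll(/(?:^|\s)(#\w+)/g), (found) => found[1]);
  return { urls: urls, tags: tags };
}

// Wrap each URL in an anchor tag, suitable for assigning to htmlText.
function linkifyUrls(text) {
  return text.replace(/https?:\/\/\S+/g, (url) => '<a href="' + url + '">' + url + "</a>");
}

const txt = "This is just some text http://www.thisisawebsite.com and some more text via #sumTwitter";
console.log(extractUrlsAndTags(txt));
console.log(linkifyUrls(txt));
```

Because `\S+` and `\w+` simply run until the next space or the end of the input, a link or tag at the very end of the string needs no special case.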