Access and Loop External Datasource with Apify Puppeteer Scraper - puppeteer

The Apify Puppeteer Scraper does not expose jQuery in the context object. I need to access an external JSON data source within the Puppeteer Scraper pageFunction and then loop over one of the nodes. Here is what I would do if jQuery were available:
$.get(urlAPI, function(data) {
$.each(data.feed.entry, function(index, value) {
var url = value.URL;
// ... use url here
});
});

As the handlePageFunction runs in the Node.js context, jQuery is not available there. You can easily inject jQuery into the page, for use inside page.evaluate, using the Apify SDK.
async function pageFunction(context) {
const { page, request, log, Apify } = context;
await Apify.utils.puppeteer.injectJQuery(page);
const title = await page.evaluate(() => {
// jQuery is available here because we injected it with the injectJQuery method
return $('title').text()
});
return {
title,
}
}
EDIT: Using requestAsBrowser.
async function pageFunction(context) {
const { page, request, log, Apify } = context;
const response = await Apify.utils.requestAsBrowser({ url: "http://example.com" });
const data = JSON.parse(response.body);
return {
data,
}
}

You don't need jQuery to access an external resource (though you can use it if you are familiar with it).
Usually, we extract external data via common libraries like request or Apify's own httpRequest from a standalone actor. Unfortunately, Puppeteer Scraper doesn't allow the use of libraries (only dynamically downloaded ones, which is probably overkill).
I would just use the browser's modern fetch call. It is nicer than jQuery's AJAX and doesn't require injecting anything.
async function pageFunction(context) {
const { page, request, log, Apify } = context;
const json = await page.evaluate(async () => {
// fetch is built into the browser, so nothing needs to be injected;
// the callback must be async for await to be allowed inside it
const resp = await fetch('http://my-json-url.com');
return resp.json();
});
// Process the JSON
}
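Whichever way the JSON is fetched, the $.each loop from the question can be replaced with plain JavaScript. A minimal sketch, assuming the response has the feed.entry shape with a URL field as in the question; extractEntryUrls is a hypothetical helper name, not part of Apify:

```javascript
// Hypothetical helper: collects the URL field of every entry in the feed.
// Assumes the shape { feed: { entry: [{ URL: '...' }, ...] } } from the question.
function extractEntryUrls(data) {
  const entries = (data && data.feed && data.feed.entry) || [];
  const urls = [];
  for (const entry of entries) {
    if (entry.URL) {
      urls.push(entry.URL);
    }
  }
  return urls;
}

// Example with inline sample data instead of a real response
const sample = {
  feed: { entry: [{ URL: 'http://example.com/a' }, { URL: 'http://example.com/b' }] }
};
console.log(extractEntryUrls(sample));
```

Inside pageFunction, the same loop could run at the "Process the JSON" step on the value returned from page.evaluate.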

How to get metadata from Amazon Kinesis Video Streams via Video.js and http-streaming?

I am working on the client side of Amazon Kinesis Video Streams, using video.js and http-streaming to display video.
However, on the stream server there is some metadata (text only) attached to each fragment (as described here: https://aws.amazon.com/about-aws/whats-new/2018/10/kinesis-video-streams-fragment-level-metadata-support/).
I don't know how to get this data using the AWS JavaScript SDK (e.g. https://docs.aws.amazon.com/AWSJavaScriptSDK/latest/AWS/KinesisVideoMedia.html).
I've tested the getMedia function, but it does not work as expected (it returns the media info only once, not for each fragment):
var kinesisvideomedia = new AWS.KinesisVideoMedia({
//apiVersion: '2017-09-30',
region: options.region,
accessKeyId: options.accessKeyId,
secretAccessKey: options.secretAccessKey,
endpoint: response.DataEndpoint
});
// 3. Create the parameters for getMedia()
var mopts = {
StartSelector: {
StartSelectorType: 'EARLIEST'
},
StreamName: streamName
};
kinesisvideomedia.getMedia(mopts, function (error, vmresp) {
if (error) {
console.log(error);
}
//console.log(vmresp);
});
Many thanks for any support!
Your parameters only tell getMedia to grab the earliest fragment from the stream. If you want to get all the following fragments, you have to pass the ContinuationToken that was returned in the response to the previous getMedia call when making additional calls to getMedia.
Regarding the fragment-level metadata, you need to parse the response payload, for example like in this example, using the video streams parser library.
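As a rough sketch of the first point, the parameters for a follow-up call could be built like this. CONTINUATION_TOKEN is one of the StartSelectorType values that GetMedia accepts; buildGetMediaParams is a hypothetical helper, and how the token is read out of the previous response is not shown here:

```javascript
// Hypothetical helper: builds getMedia() parameters.
// Without a token it starts at the earliest fragment; with a token it
// asks the service to continue from where the token points.
function buildGetMediaParams(streamName, continuationToken) {
  if (!continuationToken) {
    return {
      StreamName: streamName,
      StartSelector: { StartSelectorType: 'EARLIEST' }
    };
  }
  return {
    StreamName: streamName,
    StartSelector: {
      StartSelectorType: 'CONTINUATION_TOKEN',
      ContinuationToken: continuationToken
    }
  };
}

// 'my-stream' and the token value are placeholders
console.log(buildGetMediaParams('my-stream'));
console.log(buildGetMediaParams('my-stream', 'token-from-previous-call'));
```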
getMedia is not well documented in the JavaScript aws-sdk; the main trick is to use request.createReadStream() to stream the media chunks.
You could do it like this:
var kinesisvideomedia = new AWS.KinesisVideoMedia();
var kinesisvideo = new AWS.KinesisVideo();
const params = {
APIName: "GET_MEDIA",
StreamName: streamName
}
kinesisvideo.getDataEndpoint(params, function(err, data) {
if (err) {
throw(err)
}
console.log("Changing endpoint to", data.DataEndpoint);
kinesisvideomedia.endpoint = data.DataEndpoint;
var mopts = {
StartSelector: {
StartSelectorType: 'EARLIEST'
},
StreamName: streamName
};
const request = kinesisvideomedia.getMedia(mopts);
const stream = request.createReadStream();
stream.on('data', function(data) { console.log("data", data)})
});

Is it possible to perform an action with `context` on the init of the app?

I'm simply looking for something like this
app.on('init', async context => {
...
})
Basically I just need to make two calls to the GitHub API, but I'm not sure there is a way to do it without using the API client inside the Context object.
I ended up using probot-scheduler
const createScheduler = require('probot-scheduler')
module.exports = app => {
createScheduler(app, {
delay: false
})
app.on('schedule.repository', context => {
// this is called on startup and can access context
})
}
I tried probot-scheduler but it didn't exist; perhaps it was removed in an update?
In any case, after lots of digging I managed to do it by using the actual app object: its .auth() method returns a promise that resolves to the GitHubAPI interface:
https://probot.github.io/api/latest/classes/application.html#auth
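module.exports = app => {
// app.route() returns an Express router; the '/my-app' prefix is only an example
const router = app.route('/my-app');
router.get('/hello-world', async (req, res) => {
const github = await app.auth();
const result = await github.repos.listForOrg({ org: 'org name' });
console.log(result);
})
}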
.auth() takes the ID of the installation if you wish to access private data. If called without arguments, the client can only retrieve public data.
You can get the installation ID by calling .auth() without parameters, and then listInstallations():
const github = await app.auth();
const result = await github.apps.listInstallations();
console.log(result);
You get an array including IDs that you can pass to .auth().
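Putting the two steps together, a sketch that creates one authenticated client per installation might look like this. It assumes listInstallations() resolves to an Octokit-style response whose data property holds the array; clientsForAllInstallations is a hypothetical name:

```javascript
// Hypothetical helper: returns one authenticated client per installation.
// Assumes an Octokit-style response ({ data: [...] }) from listInstallations().
async function clientsForAllInstallations(app) {
  const github = await app.auth(); // app-level client (no installation ID)
  const { data: installations } = await github.apps.listInstallations();
  return Promise.all(
    installations.map(async (installation) => ({
      installationId: installation.id,
      github: await app.auth(installation.id) // installation-level client
    }))
  );
}
```

Inside module.exports = app => { ... } it could be called as clientsForAllInstallations(app).then(clients => ...).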

DialogflowSDK middleware return after resolving a promise

I'm currently playing around with the actions-on-google Node.js SDK and I'm struggling to work out how to wait for a promise to resolve in my middleware before my intent is executed. I've tried using async/await and returning a promise from my middleware function, but neither method appears to work. I know you typically wouldn't override the intent like I'm doing here, but this is to test what's going on.
const {dialogflow} = require('actions-on-google');
const functions = require('firebase-functions');
const app = dialogflow({debug: true});
function promiseTest() {
return new Promise((resolve,reject) => {
setTimeout(() => {
resolve('Resolved');
}, 2000)
})
}
app.middleware(async (conv) => {
let r = await promiseTest();
conv.intent = r
})
app.fallback(conv => {
const intent = conv.intent;
conv.ask("hello, you're intent was " + intent );
});
It looks like I should at least be able to return a promise (https://actions-on-google.github.io/actions-on-google-nodejs/interfaces/dialogflow.dialogflowmiddleware.html),
but I'm not familiar with TypeScript so I'm not sure if I'm reading these docs correctly.
Is anyone able to advise how to do this correctly? For instance, a real-life example might be that I need to make a DB call and wait for it to return in my middleware before proceeding to the next step.
My function is using the Node.js 8 beta runtime in Google Cloud Functions.
The output of this code is whatever the actual intent was (e.g. the default welcome intent) rather than "Resolved", but there are no errors. So the middleware fires, but execution then moves on to the fallback intent before the promise resolves, i.e. before conv.intent = r is set.
Async code is really fiddly with the V2 API, and for me it only worked properly with Node.js 8. The reason is that from V2 onwards, unless you return the promise, the action returns an empty response because it has finished before the rest of the function is evaluated. There is a lot to work through to figure it out; here's some sample boilerplate I have that should get you going:
'use strict';
const functions = require('firebase-functions');
const {WebhookClient} = require('dialogflow-fulfillment');
const {BasicCard, MediaObject, Card, Suggestion, Image, Button} = require('actions-on-google');
var http_request = require('request-promise-native');
process.env.DEBUG = 'dialogflow:debug'; // enables lib debugging statements
exports.dialogflowFirebaseFulfillment = functions.https.onRequest((request, response) => {
const agent = new WebhookClient({ request, response });
console.log('Dialogflow Request headers: ' + JSON.stringify(request.headers));
console.log('Dialogflow Request body: ' + JSON.stringify(request.body));
function welcome(agent) {
agent.add(`Welcome to my agent!`);
}
function fallback(agent) {
agent.add(`I didn't understand`);
agent.add(`I'm sorry, can you try again?`);
}
function handleMyIntent(agent) {
let conv = agent.conv();
let key = request.body.queryResult.parameters['MyParam'];
var myAgent = agent;
return new Promise((resolve, reject) => {
http_request('http://someurl.com').then(async function(apiData) {
if (key === 'Hey') {
conv.close('Howdy');
} else {
conv.close('Bye');
}
myAgent.add(conv);
return resolve();
}).catch(function(err) {
conv.close(' \nUh, oh. There was an error, please try again later');
myAgent.add(conv);
return resolve();
})})
}
let intentMap = new Map();
intentMap.set('Default Welcome Intent', welcome);
intentMap.set('Default Fallback Intent', fallback);
intentMap.set('myCustomIntent', handleMyIntent);
agent.handleRequest(intentMap);
});
A brief overview of what you need:
you have to return the promise from your intent handler
you have to use a promise-based HTTP client for HTTP requests (the 'request-promise-native' package is used here)
you have to upgrade your plan to allow for outbound HTTP requests (https://firebase.google.com/pricing/)
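The first point is a general property of promise-based dispatch and can be seen without any Dialogflow code. In this sketch, dispatch is a hypothetical stand-in for a framework that awaits whatever the handler returns; it is not the actions-on-google or dialogflow-fulfillment API:

```javascript
// Resolves with `value` after `ms` milliseconds (stands in for an HTTP or DB call).
function delay(ms, value) {
  return new Promise((resolve) => setTimeout(() => resolve(value), ms));
}

// Hypothetical stand-in for a framework: it waits only for what the handler
// returns, then takes a snapshot of the responses collected so far.
async function dispatch(handler) {
  const responses = [];
  await handler(responses);
  return responses.slice();
}

// Starts async work but does not return it, so dispatch cannot wait for it.
function forgetsToReturn(responses) {
  delay(10, 'late answer').then((text) => responses.push(text));
}

// Returns the promise, so dispatch waits until the response has been added.
function returnsPromise(responses) {
  return delay(10, 'answer').then((text) => responses.push(text));
}

dispatch(forgetsToReturn).then((r) => console.log('without return:', r));
dispatch(returnsPromise).then((r) => console.log('with return:', r));
```

The handler that forgets to return its promise yields an empty snapshot, which mirrors the empty response described above.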
So it turns out my issue was caused by an outdated version of the actions-on-google SDK. The Dialogflow Firebase example was using v2.0.0; changing this to 2.2.0 in package.json resolved the issue.

How to load google maps api asynchronously in Angular2

Usually in a plain javascript site, I can use the following script to reference google maps api and set the callback function with initMap.
<script async defer src="https://maps.googleapis.com/maps/api/js?callback=initMap"></script>
What I observed is that in the plain JavaScript site the initMap function is in the window scope, so it can be referenced in the script's query parameter (?callback=initMap). But once I write a component in Angular2 with a component method called initMap, initMap is in the scope of my component, and the async loading script I set up in the index page is not able to find my component's initMap method.
Specifically, I'd like to know how to achieve the same thing in Angular2.
PS: I know there is an angular2-google-maps component available in alpha via npm, but it currently ships with limited capability, so I'd like to know how to load the API in an easier way without using another component, so I can just use the Google Maps API to implement my project.
I see you don't want another component, but Polymer has components that work well with Google APIs. I have Angular2 code that uses the Polymer YouTube data API. I had help getting it set up. Here is the plunker that got me started. I think the hard part is getting set up for that callback; I'm sure you can do it without Polymer. The example shows the tricky part: an Angular service is used to hook everything up.
const url = 'https://apis.google.com/js/client.js?onload=__onGoogleLoaded'
export class GoogleAPI {
loadAPI: Promise<any>
constructor(){
this.loadAPI = new Promise((resolve) => {
window['__onGoogleLoaded'] = (ev) => {
console.log('gapi loaded')
resolve(window.gapi);
}
this.loadScript()
});
}
doSomethingGoogley(){
return this.loadAPI.then((gapi) => {
console.log(gapi);
});
}
loadScript(){
console.log('loading..')
let node = document.createElement('script');
node.src = url;
node.type = 'text/javascript';
document.getElementsByTagName('head')[0].appendChild(node);
}
}
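The core of the service above is turning a global callback into a promise. Stripped of the script loading, that idea can be sketched as follows; callbackAsPromise is a hypothetical helper, and globalObject would be window in the browser:

```javascript
// Hypothetical helper: registers globalObject[name] as a callback and returns
// a promise that resolves with the first argument the callback is called with.
function callbackAsPromise(globalObject, name) {
  return new Promise((resolve) => {
    globalObject[name] = (value) => resolve(value);
  });
}

// Simulate the external script invoking the callback.
// fakeWindow stands in for window so the sketch also runs outside a browser.
const fakeWindow = {};
const loaded = callbackAsPromise(fakeWindow, '__onGoogleLoaded');
fakeWindow.__onGoogleLoaded('api object');
loaded.then((value) => console.log('loaded:', value));
```

Because the callback lives on the global object rather than on a component, the script's callback or onload query parameter can find it by name.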
I came across this while trying to develop a progressive web app, i.e. where there was a possibility of not being online. There was an error in the code examples: onload in the google maps script should be callback. So my modification of user2467174 led to
map-loader.service.ts
import { Injectable } from '@angular/core';
const url = 'http://maps.googleapis.com/maps/api/js?key=xxxxx&callback=__onGoogleLoaded';
@Injectable()
export class GoogleMapsLoader {
private static promise;
public static load() {
// First time 'load' is called?
if (!GoogleMapsLoader.promise) {
// Make promise to load
GoogleMapsLoader.promise = new Promise( resolve => {
// Set callback for when google maps is loaded.
window['__onGoogleLoaded'] = (ev) => {
resolve('google maps api loaded');
};
let node = document.createElement('script');
node.src = url;
node.type = 'text/javascript';
document.getElementsByTagName('head')[0].appendChild(node);
});
}
// Always return promise. When 'load' is called many times, the promise is already resolved.
return GoogleMapsLoader.promise;
}
}
And then I have a component with
import { GoogleMapsLoader } from './map/map-loader.service';
constructor() {
GoogleMapsLoader.load()
.then(res => {
console.log('GoogleMapsLoader.load.then', res);
this.mapReady = true;
});
}
And a template
<app-map *ngIf='mapReady'></app-map>
This way the map div is only put into the DOM if the app is online.
And then in the map.component.ts we can wait until the component is placed into the DOM before loading the map itself.
ngOnInit() {
if (typeof google !== 'undefined') {
console.log('MapComponent.ngOnInit');
this.loadMap();
}
}
Just in case you'd like to make it a static function, which always returns a promise, but only gets the api once.
const url = 'https://maps.googleapis.com/maps/api/js?callback=__onGoogleMapsLoaded&key=YOUR_API_KEY';
export class GoogleMapsLoader {
private static promise;
public static load() {
// First time 'load' is called?
if (!GoogleMapsLoader.promise) {
// Make promise to load
GoogleMapsLoader.promise = new Promise((resolve) => {
// Set callback for when google maps is loaded.
window['__onGoogleMapsLoaded'] = (ev) => {
console.log('google maps api loaded');
resolve(window['google']['maps']);
};
// Add script tag to load google maps, which then triggers the callback, which resolves the promise with window.google.maps.
console.log('loading..');
let node = document.createElement('script');
node.src = url;
node.type = 'text/javascript';
document.getElementsByTagName('head')[0].appendChild(node);
});
}
// Always return promise. When 'load' is called many times, the promise is already resolved.
return GoogleMapsLoader.promise;
}
}
This is how you can get the api in other scripts:
GoogleMapsLoader.load()
.then((_mapsApi) => {
debugger;
this.geocoder = new _mapsApi.Geocoder();
this.geocoderStatus = _mapsApi.GeocoderStatus;
});
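The load-once behaviour of GoogleMapsLoader does not depend on Google Maps itself. A generic sketch of the same caching idea, where loadOnce and the loader argument are hypothetical names:

```javascript
// Hypothetical helper: wraps a promise-returning loader so that it runs at most once.
// Every call returns the same promise, whether or not it has resolved yet.
function loadOnce(loader) {
  let promise = null;
  return function load() {
    if (!promise) {
      promise = loader();
    }
    return promise;
  };
}

// Example: the loader body runs once even though load() is called twice
let calls = 0;
const load = loadOnce(() => {
  calls += 1;
  return Promise.resolve('maps api');
});
load();
load().then((api) => console.log(api, 'loader calls:', calls));
```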
This is what I'm currently using:
loadMapsScript(): Promise<void> {
return new Promise(resolve => {
if (document.querySelectorAll(`[src="${mapsScriptUrl}"]`).length) {
resolve();
} else {
document.body.appendChild(Object.assign(document.createElement('script'), {
type: 'text/javascript',
src: mapsScriptUrl,
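// Pass a function instead of calling doMapInitLogic() immediately,
// and resolve so that callers waiting on the promise can continue
onload: () => {
doMapInitLogic();
resolve();
}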
}));
}
});
}
See my more comprehensive instructions here

DevExtreme datasource can't load Data Service data

I tried to follow the tutorial here, and when I used their data service, it worked just fine.
I changed the source to my own data service (WCF Data Service v5.6, OData V2), and the list just shows the loading indicator and nothing happens.
The code should load any data type; it just has to be mapped accordingly. My service is available through the browser; I checked.
Here is the code:
DevExTestApp.home = function (params) {
var viewModel = {
dataSource: DevExpress.data.createDataSource({
load: function (loadOptions) {
if (loadOptions.refresh) {
try {
var deferred = new $.Deferred();
$.get("http://192.168.1.101/dataservice/dataservice.svc/People")
.done(function (result) {
var mapped = $.map(result, function (data) {
return {
name: data.Name
}
});
deferred.resolve(mapped);
});
}
catch (err) {
alert(err.message);
}
return deferred;
}
}
})
};
return viewModel;
}
What else should I set?
The try-catch block would not help in this case, because data loading is asynchronous. Instead, subscribe to the fail callback:
$.get(url)
.done(doneFunc)
.fail(failFunc);
Another common problem with accessing a web service from JavaScript is the Same-Origin Policy. Your OData service has to support either CORS or JSONP. Refer to this discussion.
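To make the point about failure handling concrete without jQuery, here is a sketch of a load function that maps the result and surfaces failures instead of leaving the list stuck on the loading indicator. fetchJson is a hypothetical promise-returning function supplied by the caller, and the response is assumed to be a plain array of objects with a Name field, as the question's mapping implies:

```javascript
// Hypothetical sketch: loads items and maps them to { name }.
// fetchJson(url) is assumed to return a promise of an array like [{ Name: '...' }].
function loadPeople(fetchJson, url) {
  return fetchJson(url)
    .then((items) => items.map((item) => ({ name: item.Name })))
    .catch((err) => {
      // The promise counterpart of the fail callback: report, then re-throw
      console.error('Loading failed:', err.message);
      throw err;
    });
}

// Example with a stub fetcher instead of a real service
const okFetcher = () => Promise.resolve([{ Name: 'Ann' }, { Name: 'Bob' }]);
loadPeople(okFetcher, 'http://example.com/People').then((people) => console.log(people));
```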