I am trying to download an HLS video once I am logged in (cookies).
HLS videos are made of an .m3u8 playlist and a sequence of segment files: 00001.ts, 00002.ts, ...
I tried two methods.
Method 1: download the files using the browser's downloader:
const client = await page.target().createCDPSession();
await client.send('Page.setDownloadBehavior', {
  behavior: 'allow',
  downloadPath: '/',
});
await page.evaluate((link) => {
  location.href = link;
}, link);
This works for the .m3u8 file, but it does not work for 00001.ts: the browser opens these files inline instead of downloading them.
I have no idea how to force the .ts files to download rather than open.
I also tried another approach (from https://help.apify.com/en/articles/1929322-handling-file-download-with-puppeteer), but it did not work either:
await page.setRequestInterception(true);
await page.goto('https://eedols.com/m3u8/20221205130402-SiBOhsUsJbTdZLP-DfqGGLn5cF-6-l.m3u8');
const xRequest = await new Promise(resolve => {
  page.on('request', interceptedRequest => {
    interceptedRequest.abort(); // stop the request; we resend it ourselves below
    resolve(interceptedRequest);
  });
});
const options = {
  encoding: null,
  method: xRequest.method(),   // use the public accessors, not _method etc.
  uri: xRequest.url(),
  body: xRequest.postData(),
  headers: xRequest.headers()
};
/* add the cookies */
const cookies = await page.cookies();
options.headers.Cookie = cookies.map(ck => ck.name + '=' + ck.value).join(';');
/* resend the request */
const response = await request(options);
It gets stuck.
Any ideas?
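For what it's worth, one way around the browser's inline handling is to skip the download machinery entirely: read the playlist text, resolve the segment URLs, and fetch each segment directly with the session cookies. A sketch under those assumptions (`parseSegments` and the cookie-handling lines are mine, not from the question, and global `fetch` assumes Node 18+):

```javascript
// Extract absolute segment URLs from an m3u8 playlist.
// Lines starting with '#' are tags/comments; the rest are segment paths.
function parseSegments(m3u8Text, playlistUrl) {
  return m3u8Text
    .split('\n')
    .map(line => line.trim())
    .filter(line => line && !line.startsWith('#'))
    .map(line => new URL(line, playlistUrl).href); // resolve relative paths
}

// Usage sketch inside a Puppeteer script (names are assumptions):
// const cookieHeader = (await page.cookies())
//   .map(c => `${c.name}=${c.value}`).join('; ');
// for (const segUrl of parseSegments(playlistText, playlistUrl)) {
//   const res = await fetch(segUrl, { headers: { Cookie: cookieHeader } });
//   fs.writeFileSync(new URL(segUrl).pathname.split('/').pop(),
//                    Buffer.from(await res.arrayBuffer()));
// }
```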
Related
When a page is rendered with the page.setContent method from static HTML content, what is the current folder for relative attributes such as the src of img tags?
For example, for:
await page.setContent('<img src="./pic.jpg" />');
where is the folder ./?
Maybe it's undefined; here are my test results:
const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch({args: ['--no-sandbox', '--disable-setuid-sandbox']});
  const page = await browser.newPage();
  page.on('request', request => console.log('send request: ' + request.url()));
  page.on('console', message => console.log('console: ' + message.text()));
  await page.setContent('<img src="./test.jpg" /><script>console.log("href="+window.location.href);</script>');
  await browser.close();
})();
output:
console: href=about:blank
The page URL is about:blank and no requests are sent.
const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch({args: ['--no-sandbox', '--disable-setuid-sandbox']});
  const page = await browser.newPage();
  page.on('request', request => console.log('send request: ' + request.url()));
  page.on('console', message => console.log('console: ' + message.text()));
  await page.setContent('<base href="https://www.google.com"><img src="./test.jpg" /><script>console.log("href="+window.location.href);</script>');
  await browser.close();
})();
output:
console: href=about:blank
send request: https://www.google.com/test.jpg
console: Failed to load resource: the server responded with a status of 404 ()
The browser requests test.jpg after a base element is appended, while the URL is still about:blank.
const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch({args: ['--no-sandbox', '--disable-setuid-sandbox']});
  const page = await browser.newPage();
  page.on('request', request => console.log('send request: ' + request.url()));
  page.on('console', message => console.log('console: ' + message.text()));
  // set base href to a local URL
  await page.setContent('<base href="file:///abc/index.html"><img src="./test.jpg" /><script>console.log("href="+window.location.href);</script>');
  await browser.close();
})();
output:
console: href=about:blank
console: Not allowed to load local resource: file:///abc/test.jpg
send request: file:///abc/test.jpg
The folder is resolved relative to the URL of the page you are visiting.
For example, if the URL is
mydomain.com/directory1/page.html
the image will be looked up at mydomain.com/directory1/pic.jpg
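This resolution rule matches what the WHATWG URL constructor does, which you can check outside the browser:

```javascript
// A relative src resolves against the page URL, replacing the file name:
const resolved = new URL('./pic.jpg', 'https://mydomain.com/directory1/page.html').href;
// → 'https://mydomain.com/directory1/pic.jpg'
```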
I use the GoLogin service. GoLogin is a browser anti-detect service where I can fake my browser identity / manage the browser fingerprint, so I can do web scraping freely without being detected.
In this case I want to load my extension into that browser using the puppeteer.connect() method.
Here's the code:
const puppeteer = require('puppeteer-core');
const GoLogin = require('gologin');

(async () => {
  const GL = new GoLogin({
    token: 'yU0token',
    profile_id: 'yU0Pr0f1leiD',
  });
  const { status, wsUrl } = await GL.start();
  const browser = await puppeteer.connect({
    browserWSEndpoint: wsUrl.toString(),
    ignoreHTTPSErrors: true,
  });
  const page = await browser.newPage();
  await page.goto('https://myip.link/mini');
  console.log(await page.content());
  await browser.close();
  await GL.stop();
})();
I don't know how to do this. Please help me load my extension when using puppeteer.connect().
Assuming your goal is to load a Chrome extension into your Puppeteer browser:
Find the chrome-extension working directory (see "Where does Chrome store extensions?").
Find your extension ID by going to chrome://extensions/.
Note that --load-extension is a startup flag, so it only takes effect with puppeteer.launch(); puppeteer.connect() attaches to a browser that is already running, so the flags must be passed to that browser (here, the GoLogin profile) before it starts.
Sample code:
const puppeteer = require('puppeteer-core');

const MY_EXTENSION_PATH = '~/Library/Application Support/Google/Chrome/Default/Extensions/cdockenadnadldjbbgcallicgledbeoc/0.3.38_0';

async function loadExtension() {
  return puppeteer.launch({
    headless: false, // extensions generally require a non-headless browser
    args: [
      `--disable-extensions-except=${MY_EXTENSION_PATH}`,
      `--load-extension=${MY_EXTENSION_PATH}`,
    ],
  });
}
Are there any existing samples of how to use Puppeteer with NordVPN?
I tried this:
page = await browser.newPage();
await useProxy(page, `socks5://login:password@fr806.nordvpn.com:1080`);
I also tried:
'--proxy-server=socks5://login:password@fr806.nordvpn.com:1080'
This script works; you just need to change the user/pass to yours. Note that these are not your normal NordVPN user/pass: you need the service/API credentials from your account settings. Change the server to whichever one you need to use.
#!/usr/bin/env node
// Screengrab generator
// outputs a JSON object with a base64 encoded image of the screengrab
// eg;
const puppeteer = require('puppeteer');

let conf = {};
conf.url = "https://www.telegraph.co.uk";

// VPN (service credentials, not your normal login)
conf.vpnUser = conf.vpnUser || 'USERNAME';
conf.vpnPass = conf.vpnPass || 'PASSWORD';
conf.vpnServer = conf.vpnServer || "https://uk1785.nordvpn.com:89";
(async () => {
  const browser = await puppeteer.launch({
    headless: true,
    args: [
      '--disable-dev-shm-usage',
      '--proxy-server=' + conf.vpnServer
    ]
  });
  try {
    const page = await browser.newPage();
    await page.authenticate({
      username: conf.vpnUser,
      password: conf.vpnPass,
    });
    await page.goto(conf.url, { waitUntil: 'networkidle2' });
  } catch (error) {
    console.error(error);
  } finally {
    await browser.close();
  }
})();
I have a Node.js application that creates dynamic content which I want users to download:
static async downloadPDF(res, html, filename) {
  const puppeteer = require('puppeteer');
  const browser = await puppeteer.launch({
    headless: true
  });
  const page = await browser.newPage();
  await page.setContent(html, {
    waitUntil: 'domcontentloaded'
  });
  const pdfBuffer = await page.pdf({
    format: 'A4'
  });
  res.set("Content-Disposition", "attachment;filename=" + filename + ".pdf");
  res.setHeader("Content-Type", "application/pdf");
  res.send(pdfBuffer);
  await browser.close();
}
Is there a way to speed up the whole process? It takes about 10 seconds to create a PDF file of about 100 kB.
I read somewhere that I can launch the headless browser once and then only create a new page instead of launching a browser every time I request the file, but I cannot find the correct way of doing it.
You could move page creation to a util module and hoist the instance to re-use it:
// getPage.js
const puppeteer = require('puppeteer');

let page;

const getPage = async () => {
  if (page) return page;
  const browser = await puppeteer.launch({
    headless: true,
  });
  page = await browser.newPage();
  return page;
};

module.exports = getPage;
Then in the handler, reuse it and skip the browser.close() call:

const getPage = require('./getPage');

static async downloadPDF(res, html, filename) {
  const page = await getPage();
  // setContent / pdf / send as before, without closing the browser
}
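One caveat with the hoisted getPage above: two requests arriving before `page` is set will each launch a browser. Caching the promise itself avoids that race. A minimal sketch (the `once` helper is my own name, and the commented usage lines assume puppeteer is installed):

```javascript
// once(factory): run an async factory a single time and cache the promise,
// so concurrent callers share one in-flight browser launch instead of racing.
function once(factory) {
  let cached;
  return () => (cached ??= factory());
}

// Usage sketch with Puppeteer (names are assumptions):
// const getBrowser = once(() => puppeteer.launch({ headless: true }));
// async function renderPdf(html) {
//   const page = await (await getBrowser()).newPage();
//   await page.setContent(html, { waitUntil: 'domcontentloaded' });
//   const buf = await page.pdf({ format: 'A4' });
//   await page.close(); // close the page, keep the browser alive
//   return buf;
// }
```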
Yes, there is no reason to launch a browser every time. You can point Puppeteer at a new URL and fetch the content; without relaunching each time it will be much faster.
How to implement this? Split your function into three steps:
Create a browser instance. It does not matter whether it is headless or not. If you run the app in an X environment, you can launch a visible window to watch what Puppeteer does.
Create a function that does the main task in a loop.
After each block is done, call await page.goto(url) (where "page" is the instance from browser.newPage()) and run your function again.
Here is one possible solution in a functional style:
Create the instances:

const browser = await puppeteer.launch({ headless: false });
const page = await browser.newPage();
await page.setViewport({ width: 1280, height: 1024 });

I put this inside an async IIFE, like (async () => {})();
Get the data.
In my case, a set of URLs was stored in MongoDB; after fetching them, I ran a loop:

for (const entrie of entries) {
  const url = entrie[1];
  const id = entrie[0];
  await get_aplicants_data(page, url, id, collection);
}
In get_aplicants_data() I implemented the logic for the loaded page:

await page.goto(url); // navigate to the url
// ... code to process the page data

You can also load URLs in a loop and then apply your logic. Hope this helps.
My app records an audiofile from the microphone in the browser, typically Chrome; sends it to Firebase Storage; then a Firebase Cloud Function sends the audiofile to Google Cloud Speech-to-Text. Everything works with IBM Cloud Speech-to-Text. With Google Cloud Speech-to-Text it works if I send the audio/flac sample file "several tornadoes touched down as a line of severe thunderstorms swept through Colorado on Sunday". But when I send an audiofile recorded in the browser I get back an error message:
Error: 3 INVALID_ARGUMENT: Request contains an invalid argument.
Here's the browser code. The audio settings are at the top: audio/webm;codecs=opus and 48000 bits per second. This is the only media file format and encoding that Chrome supports.
navigator.mediaDevices.getUserMedia({ audio: true, video: false })
  .then(stream => {
    var options = {
      audioBitsPerSecond: 48000, // switch to 8000 on slow connections?
      mimeType: 'audio/webm;codecs=opus' // only options on Chrome
    };
    const mediaRecorder = new MediaRecorder(stream, options);
    mediaRecorder.start();

    const audioChunks = [];
    mediaRecorder.addEventListener("dataavailable", event => {
      audioChunks.push(event.data);
    });

    mediaRecorder.addEventListener("stop", () => {
      const audioBlob = new Blob(audioChunks);
      firebase.storage().ref('Users/' + $scope.user.uid + '/Pronunciation_Test').put(audioBlob) // upload to Firebase Storage
        .then(function(snapshot) {
          firebase.storage().ref(snapshot.ref.location.path).getDownloadURL() // get downloadURL
            .then(function(url) {
              firebase.firestore().collection('Users').doc($scope.user.uid).collection("Pronunciation_Test").doc('downloadURL').set({ downloadURL: url })
                .then(function() {
                  console.log("Document successfully written!");
                })
                .catch(function(error) {
                  console.error("Error writing document: ", error);
                });
            })
            .catch(error => console.error(error));
        })
        .catch(error => console.error(error));

      // play back the audio blob
      const audioUrl = URL.createObjectURL(audioBlob);
      const audio = new Audio(audioUrl);
      audio.play();
    });

    setTimeout(() => {
      mediaRecorder.stop();
    }, 3000);
  })
  .catch(function(error) {
    console.log(error.name + ": " + error.message);
  });
Firebase Storage converts the audiofile's content type from webm/opus to application/octet-stream.
Here's my Firebase Cloud Function that gets an audiofile from Firebase Storage and sends it to Google Cloud Speech-to-Text.
exports.Google_Speech_to_Text = functions.firestore.document('Users/{userID}/Pronunciation_Test/downloadURL').onUpdate((change, context) => {
  // Imports the Google Cloud client library
  const speech = require('@google-cloud/speech');

  // Creates a client
  const client = new speech.SpeechClient();

  const downloadURL = change.after.data().downloadURL;
  const gcsUri = downloadURL;
  const encoding = 'application/octet-stream';
  const sampleRateHertz = 48000;
  const languageCode = 'en-US';

  const config = {
    encoding: encoding,
    sampleRateHertz: sampleRateHertz,
    languageCode: languageCode,
  };
  const audio = {
    uri: gcsUri,
  };
  const request = {
    config: config,
    audio: audio,
  };

  // Detects speech in the audio file
  return client.recognize(request)
    .then(function(response) {
      const [responseArray] = response;
      const transcription = responseArray.results
        .map(result => result.alternatives[0].transcript)
        .join('\n');
      console.log(`Transcription: `, transcription);
    })
    .catch((err) => { console.error(err); });
}); // close Google_Speech_to_Text
Here's the list of supported media formats and encodings for Google Cloud Speech-to-Text:
MP3
FLAC
LINEAR16
MULAW
AMR
AMR_WB
OGG_OPUS
SPEEX_WITH_HEADER_BYTE
webm/opus and application/octet-stream aren't on the list.
Am I missing something or is it impossible to record an audiofile in Chrome, save it in Firebase Storage, and then send it to Google Cloud Speech-to-Text? It seems strange that Google products wouldn't work together. Do I have to recode the audiofile with ffmpeg before I send it to Google Cloud Speech-to-Text?
You can change the file type before uploading it:
https://developer.mozilla.org/en-US/docs/Web/API/Blob
mediaRecorder.addEventListener("stop", () => {
  // const audioBlob = new Blob(audioChunks);
  const audioBlob = new Blob(audioChunks, { type: 'audio/webm;codecs=opus' }); // a valid MIME type, unlike 'webm/opus'
  // try changing the type to something supported by Google
});
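Separately, on the Cloud Function side: the RecognitionConfig `encoding` field takes an AudioEncoding enum name, not a MIME type. Recent versions of the Speech-to-Text API also list a WEBM_OPUS encoding, which may fit Chrome recordings. A config sketch under that assumption (not verified against your client library version):

```javascript
// Hypothetical config for browser-recorded audio; WEBM_OPUS availability
// depends on your @google-cloud/speech / API version.
const config = {
  encoding: 'WEBM_OPUS',    // enum name, not 'application/octet-stream'
  sampleRateHertz: 48000,   // matches the MediaRecorder settings above
  languageCode: 'en-US',
};
```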