I was using YouTube's unofficial get_video_info endpoint to get the resolution of a video. Very recently, it stopped working and now returns:
We're sorry...... but your computer or network may be sending automated queries. To protect our users, we can't process your request right now.
Is there another public endpoint via which I can get the size/aspect ratio of the video? I need the endpoint to be accessible without having to use some login or developer key.
I spent a couple of hours debugging youtube-dl, and I found out that for all videos, you can simply get the content of the web page, i.e.
(async () => {
// From the back-end
const urlVideo = "https://www.youtube.com/watch?v=aiSla-5xq3w";
const html = await (await fetch(urlVideo)).text();
console.log(html);
})();
And then you match the content against the following RegExp:
/ytInitialPlayerResponse\s*=\s*({.+?})\s*;\s*(?:var\s+meta|<\/script|\n)/
Sample HTML:
(async () => {
const html = await (await fetch("https://gist.githubusercontent.com/avi12/0255ab161b560c3cb7e341bfb5933c2f/raw/3203f6e4c56fff693e929880f6b63f74929cfb04/YouTube-example.html")).text();
const matches = html.match(/ytInitialPlayerResponse\s*=\s*({.+?})\s*;\s*(?:var\s+meta|<\/script|\n)/);
const json = JSON.parse(matches[1]);
const [format] = json.streamingData.adaptiveFormats;
console.log(`${format.width}x${format.height}`);
})();
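Since the question asks for the size and aspect ratio, the same match can be wrapped in a small helper. This is only a sketch: the names getVideoSize and PLAYER_RESPONSE_REGEX are made up for illustration, and it assumes that some entries in adaptiveFormats (e.g. audio-only ones) may lack width/height, so it picks the first entry that has both.

```javascript
// The RegExp from the answer above.
const PLAYER_RESPONSE_REGEX = /ytInitialPlayerResponse\s*=\s*({.+?})\s*;\s*(?:var\s+meta|<\/script|\n)/;

// Greatest common divisor, used to reduce width:height to lowest terms.
const gcd = (a, b) => (b === 0 ? a : gcd(b, a % b));

// Hypothetical helper: returns { width, height, aspectRatio } or null
// when the page does not contain a usable player response.
function getVideoSize(html) {
  const matches = html.match(PLAYER_RESPONSE_REGEX);
  if (!matches) return null;
  const json = JSON.parse(matches[1]);
  const formats = json.streamingData?.adaptiveFormats ?? [];
  // Assumption: skip entries without dimensions (such as audio-only formats).
  const format = formats.find((f) => f.width && f.height);
  if (!format) return null;
  const divisor = gcd(format.width, format.height);
  return {
    width: format.width,
    height: format.height,
    aspectRatio: `${format.width / divisor}:${format.height / divisor}`,
  };
}
```

Returning null instead of throwing lets the caller decide what to do when YouTube changes the page layout and the RegExp stops matching.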
I'm scraping data from Google using Puppeteer. But before Puppeteer gets to the google page, an annoying popup appears (screenshot).
I want to prevent this from happening so that I don't have to click the "reject" / "allow" button in Puppeteer every time. What are the necessary cookies to achieve it?
const browser = await puppeteer.launch()
const page = await browser.newPage()
await page.goto("https://google.com/")
const cookies = [
{
domain: '.google.com',
expirationDate: 9999999999.298648,
name: 'CONSENT',
...
},
]
await page.setCookie(...cookies)
Instead of trying to hard code the cookies, just save them.
Come up with a file path where the cookies file will be saved, e.g. cookiesPath.
After launching puppeteer, check whether a file exists at that path. If it exists, load it.
Keep your code for navigating the consent form, as you will use it at least once. After accepting/rejecting, save the result of page.cookies() to cookiesPath.
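const fs = require('fs')
const puppeteer = require('puppeteer')
const cookiesPath = './cookies.json';
const browser = await puppeteer.launch()
const page = await browser.newPage()
// existsSync is synchronous, so no await is needed
if(fs.existsSync(cookiesPath)){
// read with fs instead of require() so the path resolves the same way as writeFileSync below
const cookies = JSON.parse(fs.readFileSync(cookiesPath,'utf8'))
await page.setCookie(...cookies)
}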
await page.goto("https://google.com/")
const hasConsentForm = async ()=>{
// create query that will return consent form
const consentFormQuerySelector=''
// page.$eval throws if no element matches the selector,
// so an error here is treated as "no consent form"
try{
return await page.$eval(consentFormQuerySelector,el=>Boolean(el))
}catch(err){
console.log(err)
return false
}
}
const navigateConsentForm = async ()=>{
// add your existing code that navigates the consent form
// ...
// after the consent form is cleared, save cookies
let cookies = await page.cookies()
fs.writeFileSync(cookiesPath,JSON.stringify(cookies,null,2))
}
if(await hasConsentForm()){
await navigateConsentForm()
}
Assuming that the lack of cookies, or the lack of a certain cookie property, is the reason the consent form pops up, this will make it so that after running once, the form doesn't show again.
I'm using this bit of code, which works for its purpose of setting one localStorage value, but I am unsure how to set multiple values using this method. Searching for documentation has led me to many places without clear answers.
const browser = await puppeteer.launch();
browser.on('targetchanged', async (target) => {
const targetPage = await target.page();
const client = await targetPage.target().createCDPSession();
await client.send('Runtime.evaluate', {
expression: `localStorage.setItem('hello', 'world')`,
});
});
Specifically, how can I set hello2, hello3, etc... within the expression?
expression: `localStorage.setItem('hello', 'world')`,
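One possible approach, offered as a sketch rather than a confirmed answer: the expression is ordinary JavaScript source, so it can contain several statements. The helper name buildLocalStorageExpression below is hypothetical; it generates one setItem call per key and uses JSON.stringify so that quotes inside keys or values are escaped.

```javascript
// Hypothetical helper: builds one expression string that sets every key/value pair.
// JSON.stringify quotes and escapes each string so it is safe to embed in source code.
function buildLocalStorageExpression(entries) {
  return Object.entries(entries)
    .map(([key, value]) => `localStorage.setItem(${JSON.stringify(key)}, ${JSON.stringify(String(value))});`)
    .join("\n");
}

// Usage inside the existing 'targetchanged' handler (not executed here):
// await client.send('Runtime.evaluate', {
//   expression: buildLocalStorageExpression({ hello: 'world', hello2: 'world2', hello3: 'world3' }),
// });
```

For only two or three fixed values, writing the statements by hand, separated by semicolons inside the template literal, should work just as well.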
I want to load an image from IPFS into my Vue/Nuxt project. I already installed ipfs with 'npm i ipfs'. But when I run "const node = await Ipfs.create()", it shows an error,
but this error doesn't always happen; many times it works and I can get the image normally. Has anyone encountered this situation and found a solution?
async downloadImg () {
const node = await Ipfs.create()
const { agentVersion, id } = await node.id()
this.agentVersion = agentVersion
this.id = id
const cid = '/ipfs/QmY2dod6X7GFmqnQ6qCBiaeNxJWa3CYQaxEjGUfL5CqMAj'
// load the raw data from js-ipfs (>=0.40.0)
const bufs = []
const a = node.cat(cid)
for await (const buf of node.cat(cid)) {
bufs.push(buf)
}
const data = Buffer.concat(bufs)
const blob = new Blob([data], { type: 'image/jpg' })
this.imageSrc = window.URL.createObjectURL(blob)
},
If I'm right, it can depend on the web protocol you use: over https it works, and over other protocols it does not.
The web crypto API is only available on pages accessed via https. If you're seeing that message, you are probably accessing the page via plain http.
Follow the github link at the top of the stack trace for more explanation and solutions.
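To confirm this is the cause before calling Ipfs.create(), you could check whether the web crypto API is exposed at all. This is a minimal sketch; hasWebCrypto is a made-up name, and the scope parameter exists only so the check can be exercised outside a browser.

```javascript
// Hypothetical helper: true when crypto.subtle is exposed on the given global object.
// On pages loaded over plain http, crypto.subtle is expected to be undefined.
function hasWebCrypto(scope = globalThis) {
  return Boolean(scope.crypto && scope.crypto.subtle);
}

// Possible usage in the component (not executed here):
// if (!hasWebCrypto(window)) {
//   console.warn('Web crypto is unavailable; serve the page over https');
// }
```

In a browser, window.isSecureContext reports the same condition more directly.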
I am trying to get the Google Translate website to do some work for me. The website returns a blank web page with a JSON file. Using a web browser, I can save the JSON file and open it in a text editor.
I am trying to use puppeteer to get this done automatically. Here is my code:
const puppeteer = require('puppeteer');
(async () => {
const browser = await puppeteer.launch({headless:false, args: ["--no-sandbox"]});
const page = await browser.newPage();
// Approach 1:
const response = await page.goto('https://translate.googleapis.com/translate_a/single?client=gtx&sl=en&tl=zh&dt=t&q=Edit%20Report');
let text = await response.text();
console.log(text);
let json = await response.json();
console.log(json);
await browser.close();
})();
When I run this code, the browser is launched, but the returned JSON file still gets saved to disk automatically instead of being printed to the console. Which puppeteer class should I use for this task?
Since it is an API call and the expected result is JSON, you can use simple Node.js or jQuery code to return the response, as below.
$.get('https://translate.googleapis.com/translate_a/single?client=gtx&sl=en&tl=zh&dt=t&q=Edit%20Report', (data) =>
{
console.log(data);
});
But if you are particular about using puppeteer and want to return the response, you would do the following.
Add a jQuery dependency to your project by running
npm install jquery
Import jQuery into the project.
Invoke the code below, without launching the browser.
$.get('https://translate.googleapis.com/translate_a/single?client=gtx&sl=en&tl=zh&dt=t&q=Edit%20Report', (data) =>
{
console.log(data);
});
Here is the link to JSfiddle code https://jsfiddle.net/faizmagic/0h6cm1o4/latest/
I hope this helps.
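For the plain Node.js route mentioned above, here is a sketch that avoids jQuery entirely. It assumes Node 18 or later, where fetch is available globally; the function names buildTranslateUrl and fetchTranslation are invented for this example, and the query parameters are copied from the URL in the question.

```javascript
// Hypothetical helper: builds the request URL from the parameters used in the question.
function buildTranslateUrl(text, source = "en", target = "zh") {
  const params = new URLSearchParams({ client: "gtx", sl: source, tl: target, dt: "t", q: text });
  return `https://translate.googleapis.com/translate_a/single?${params}`;
}

// Hypothetical helper: fetches the URL and parses the body as JSON.
// fetchImpl can be swapped out (for example in tests); it defaults to the global fetch.
async function fetchTranslation(text, fetchImpl = globalThis.fetch) {
  const response = await fetchImpl(buildTranslateUrl(text));
  if (!response.ok) {
    throw new Error(`Request failed with status ${response.status}`);
  }
  return response.json();
}

// Usage (performs a real network request, so it is left commented out):
// fetchTranslation("Edit Report").then((data) => console.log(data));
```

URLSearchParams encodes the space in the query as "+" rather than "%20"; both are valid in a query string.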
I need to execute a script in every Window object created in Chrome – that is:
tabs opened through puppeteer
links opened by click()ing links in puppeteer
all the popups (e.g. window.open or "_blank")
all the iframes contained in the above
it must be executed without me evaluating it explicitly for that particular Window object...
I checked Chrome's documentation and what I should be using is Page.addScriptToEvaluateOnNewDocument.
However, it doesn't seem to be possible to use it through puppeteer.
Any idea? Thanks.
browser.waitForTarget searches for a target in all browser contexts.
An example of finding a target for a page opened
via window.open() or popups:
await page.evaluate(() => window.open('https://www.example.com/'))
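const newWindowTarget = await browser.waitForTarget(target => target.url() === 'https://www.example.com/')
// the predicate only locates the target; evaluate on its page afterwards
const newPage = await newWindowTarget.page()
await newPage.evaluate(() => {
runTheScriptYouLike()
console.log('Hello StackOverflow!')
})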
via browser.pages() or tabs
This snippet evaluates a script in the second tab:
const pageTab2 = (await browser.pages())[1]
const runScriptOnTab2 = await pageTab2.evaluate(() => {
runTheScriptYouLike()
console.log('Hello StackOverflow!')
})
via page.frames() or iframes
An example of evaluating a script inside an iframe:
const frame = page.frames().find(frame => frame.name() === 'myframe')
const result = await frame.evaluate(() => {
return Promise.resolve(8 * 7);
});
console.log(result); // prints "56"
Hope this may help you
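The examples above evaluate a script on one window at a time. To come closer to what the question asks (no explicit evaluation per window), Puppeteer exposes page.evaluateOnNewDocument, which runs a function in each new document of a page, including its child frames. The sketch below combines it with the browser's 'targetcreated' event; runInEveryDocument is a made-up name, and for popups the registration may happen after the first document has already started loading, so treat this as a starting point rather than a guarantee.

```javascript
// Hypothetical helper: registers initScript on every page that exists now
// and on every page created later (new tabs, window.open, target="_blank").
// `browser` is a Puppeteer Browser; `initScript` is a function that Puppeteer
// serializes and runs in each new document of the page.
async function runInEveryDocument(browser, initScript) {
  const register = async (page) => {
    // target.page() can resolve to null for targets that are not pages.
    if (page) await page.evaluateOnNewDocument(initScript);
  };
  for (const page of await browser.pages()) {
    await register(page);
  }
  browser.on("targetcreated", async (target) => register(await target.page()));
}

// Usage (requires puppeteer, so it is left commented out):
// const browser = await puppeteer.launch();
// await runInEveryDocument(browser, () => { window.__injected = true; });
```

Passing the browser in as a parameter keeps the helper independent of how the browser was launched.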