Why does nothing run after page.goto twitter? - puppeteer

I am running Puppeteer against Twitter, with headless set to false and a userDataDir so that cookies are saved.
My code is:
const browser = await puppeteer.launch({
  product: 'firefox',
  headless: false,
  userDataDir: './dataDir'
});
const page = await browser.newPage();
await page.setDefaultNavigationTimeout(0);
await page.goto('https://twitter.com/home');
console.log("hi");
It goes to Twitter and it saves cookies (I logged in, and on the next run I was still logged in), but it never gets to printing "hi" on the console.
When I run it with some other URL, like google.com or news.ycombinator.com, it works fine, but not with Twitter. That makes me think they have some secret sauce running there (although I would expect Google to have that same secret sauce, so hmm).
I have tried setting wait events on page.goto, for example {waitUntil: "domcontentloaded"}, but none of them improve the situation.
So how can I go to Twitter with Puppeteer and have that console.log run after my page.goto?
Edit: I have found that this affects Firefox with Puppeteer; if I run Puppeteer with Chromium, I do not have the problem. I would still like a solution, of course, as I prefer to work in Firefox.
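One thing worth noting about the script above: setDefaultNavigationTimeout(0) disables the navigation timeout, so if the event page.goto waits for never fires under Firefox, the await never settles and nothing after it runs. The sketch below is a generic guard that lets the script continue regardless. It does not fix the underlying Firefox behaviour, and `withTimeout` is a hypothetical helper name, not part of Puppeteer.

```javascript
// Race any promise against a timer so that a navigation which never
// settles cannot block the rest of the script forever.
function withTimeout(promise, ms, label = 'operation') {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(
      () => reject(new Error(`${label} timed out after ${ms} ms`)),
      ms
    );
  });
  // Clear the timer either way so Node can exit cleanly.
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Hypothetical usage inside the script above:
// await withTimeout(page.goto('https://twitter.com/home'), 30000, 'goto')
//   .catch((err) => console.warn(err.message));
// console.log('hi');
```

With this guard the page may still be loading when the script continues, so any later step should wait for the specific element it needs (for example with page.waitForSelector).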

Related

How to prevent puppeteer from crawling my website content

I know that Puppeteer is a simple and great tool that can easily scrape website data.
As far as I know, a browser in headless mode has many properties that differ from a normal browser.
But if I use the following method to connect Puppeteer to a browser that is already open, is there no way to detect it?
First, modify the properties of the desktop Google Chrome shortcut and open the browser:
C:\Users\13632\AppData\Local\Google\Chrome\Application\chrome.exe --remote-debugging-port=9222
const axios = require('axios')
const puppeteer = require('puppeteer')
async function main() {
  // ask the running browser for its DevTools WebSocket endpoint
  const response = await axios.get(`http://127.0.0.1:9222/json/version`);
  const webSocketDebuggerUrl = response.data.webSocketDebuggerUrl;
  // attach to the already-open browser instead of launching a new one
  const browser = await puppeteer.connect({
    browserWSEndpoint: webSocketDebuggerUrl,
    ignoreDefaultArgs: ["--enable-automation"],
    slowMo: 100,
    defaultViewport: { width: 1280, height: 600 },
  });
  // wait for the tab whose URL matches, then get its page object
  let target = await browser.waitForTarget(t => t.url().includes("your url"))
  const page = await target.page();
}
main().catch(console.error)
The method above connects to an already-open browser, which is a normal Google Chrome. It seems impossible to detect that it is driven by an automation tool. Is there any other way for me to judge whether the visitor is a human or a machine?
Browser profiling and automation detection (and beating it) is an entire subfield of its own. Some drivers (chromedriver; I've not used puppeteer) set flags to indicate automated use, but these are easily defeated. (See for instance undetected chromedriver for a package which tries not to be detectable.)
Then there's user profiling (bots tend to click in predictable ways), running JS in the browser to try to detect the environment, blacklisting IPs (most bots are behind proxies), and so on.
Ask yourself: what are you afraid of? And then defend against that. Anything you put on the Internet can and will be crawled, but you can make it hard to do disruptive things like booking all the concert tickets and then reselling them with a 500% markup. Specific challenges like that have specific answers; but there is no foolproof way to detect automated browsers, and trying to build one is a waste of effort.
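To make the "flags" point concrete, here is a minimal sketch of the kind of client-side check that is easy to write and equally easy to defeat. It relies on the standard navigator.webdriver property and on the "HeadlessChrome" token in the headless user agent; the function name `automationSignals` is made up for illustration. A normal Chrome attached through a remote debugging port, as in the question, would likely show neither signal.

```javascript
// Heuristic only: returns the weak automation signals found on a
// navigator-like object. An empty list does NOT prove a human visitor.
function automationSignals(nav) {
  const signals = [];
  // Set by WebDriver-controlled browsers unless the driver hides it.
  if (nav.webdriver === true) {
    signals.push('navigator.webdriver is true');
  }
  // Headless Chrome has historically advertised itself in the user agent.
  if (/HeadlessChrome/.test(nav.userAgent || '')) {
    signals.push('headless user agent');
  }
  return signals;
}

// In a browser: automationSignals(window.navigator)
```

Treat the result as one input among several (rate limits, behaviour, IP reputation), never as a verdict on its own.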

Is AudioWorklet.addModule logged in the Chrome network console?

I'm testing out some audio worklet code by loading an example module from GitHub via AudioWorklet.addModule(githubUrl). However, when I look at the network tab in the developer tools I don't see a network request to GitHub. I know that it is making the request because it was giving a CORS error before I switched to the raw.githubusercontent.com address (now it gives "Uncaught (in promise) DOMException: The user aborted a request"). I want to inspect what the network call returns so that I can diagnose the problem.
There seems to be a bug in Chrome which prevents the network request from being shown in the dev tools. I think it would be a good idea to file a bug for that.
For now you could just use Firefox. It shows the network request in the dev tools.
If you want to keep using Chrome you can proxy your request with fetch() to make it appear in the devtools.
Instead of calling addModule() directly ...
context.audioWorklet.addModule(url)
... you could fetch the source first and then wrap it into an object URL.
fetch(url)
  .then((response) => response.text())
  .then((text) => {
    const blob = new Blob([text], { type: 'application/javascript; charset=utf-8' });
    const objectUrl = URL.createObjectURL(blob);
    return context.audioWorklet.addModule(objectUrl)
      .finally(() => URL.revokeObjectURL(objectUrl));
  })

Google login fails with HTTP error 400 saying "Sorry, something went wrong there. Try again."

Description/background
I had set up a script that opened one of our company's Google Sites in Google Chrome (not headless) and did some automated work on that page. The login information had to be refreshed occasionally, for which I logged in manually. That had been working perfectly for the last couple of months until last week. Today I noticed that I get the above-mentioned error message as a result of a server response with HTTP status 400 upon entering my Gmail address and clicking the Next button.
Steps to reproduce
Puppeteer version: 2.0.0
Platform / OS version: Windows 10
URLs (if applicable): https://sites.google.com/...
Node.js version: v12.13.0
What steps will reproduce the problem?
Run a Puppeteer script to open a Google Site which requires login.
const puppeteer = require('puppeteer');
(async () => {
  try {
    const browser = await puppeteer.launch({headless: false, userDataDir: "<ProfileDirectory>"});
    const pageLogin = await browser.newPage();
    await pageLogin.goto('https://sites.google.com/...', {waitUntil: 'networkidle2'});
    ...
    await browser.close();
  }
  catch (error) {
    // Error objects expose the stack trace as `stack`; `stacktrace` is undefined
    console.log(error.stack);
  }
})();
Manually enter Gmail address and click Next.
Get error message "Sorry, something went wrong there. Try again." as a result of a server response with HTTP status code 400.
Update:
Manually opening Chrome (same userDataDir) and the respective Google site still works as usual.
I recommend using Playwright or Puppeteer with Firefox. It seems that Google adds something to Chrome that lets it detect whether the browser is automated.
One of the comments on this post mentions that Google tries to block logins with Puppeteer, Selenium, etc., which might be why you are getting a 400 error.
One of the recent comments on the aforementioned post links to a gist with example code that might still work; I haven't tried it, though.
While I was doing research on Puppeteer for Firefox, I noticed that (1) Puppeteer downloads and runs its own local Chromium binary and (2) my installed Puppeteer version 2.0.0 was outdated. That means the browser actually used by Puppeteer was probably outdated as well. The solution was as easy as updating Puppeteer to the latest version, 2.1.1.
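To check which browser build Puppeteer is actually driving before and after such an update, something like the following sketch can help. It uses Puppeteer's browser.version(), which returns a string such as "HeadlessChrome/79.0.3945.0" or "Chrome/79.0.3945.0"; the helper names `majorVersion` and `logBrowserVersion` are made up for illustration.

```javascript
// Extract the major version number from a browser.version() string.
// Returns null if the string does not look like "Name/1.2.3".
function majorVersion(versionString) {
  const match = /\/(\d+)\./.exec(versionString);
  return match ? Number(match[1]) : null;
}

// `browser` is the object returned by puppeteer.launch() or puppeteer.connect().
async function logBrowserVersion(browser) {
  const version = await browser.version();
  console.log(`Puppeteer is driving ${version} (major ${majorVersion(version)})`);
  return version;
}

// Hypothetical usage:
// const browser = await puppeteer.launch({ headless: false });
// await logBrowserVersion(browser);
```

Comparing this output with the version of the manually installed Chrome shows quickly whether the bundled browser has fallen behind.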

Programmatically start the performance profiling in Chrome

Is there a way to start the performance profiling programmatically in Chrome?
I want to run a performance test of my web app several times to get a better estimate of the FPS but manually starting the performance profiling in Chrome is tricky because I'd have to manually align the frame models. (I am using this technique to extract the frames)
CMD + Shift + E reloads the page and immediately starts the profiling, which alleviates the alignment problem but it only runs for 3 seconds as explained here. So this doesn't work.
Ideally, I'd like to click on a button to start my test script and also starts the profiling. Is there a way to achieve that?
In case you're still interested, or someone else finds it helpful, there's an easy way to achieve this using Puppeteer's Tracing class.
Puppeteer uses the Chrome DevTools Protocol's Tracing domain under the hood and writes a JSON file to your system that can be loaded in the DevTools Performance panel.
To get a profile trace of your page's loading time, you can implement the following:
const puppeteer = require('puppeteer');
(async () => {
  // launch the Puppeteer browser in headful mode
  const browser = await puppeteer.launch({
    headless: false,
    devtools: true
  });
  // start a page instance in the browser
  const page = await browser.newPage();
  // start the profiling, with a path to the output file and screenshots collected
  await page.tracing.start({
    path: `tests/logs/trace-${new Date().getTime()}.json`,
    screenshots: true
  });
  // go to the page
  await page.goto('http://localhost:8080');
  // wait for as long as you want (page.waitFor is not available in newer Puppeteer releases)
  await new Promise((resolve) => setTimeout(resolve, 4000));
  // or you can wait for an element to appear with:
  // await page.waitForSelector('some-css-selector');
  // stop the tracing
  await page.tracing.stop();
  // close the browser
  await browser.close();
})();
Of course, you'll have to install Puppeteer first (npm i puppeteer). If you don't want to use Puppeteer you can interact with Chrome DevTools Protocol's API directly (see link above). I didn't investigate that option very much since Puppeteer delivers a high level and easy to use API over CDP's API. You can also interact directly with CDP via Puppeteer's CDPSession API.
Hope this helps. Good luck!
You can use the Chrome DevTools Protocol with any driver library listed at https://github.com/ChromeDevTools/awesome-chrome-devtools#protocol-driver-libraries to create a profile programmatically.
Use the Profiler.start method (https://chromedevtools.github.io/devtools-protocol/tot/Profiler#method-start) to start a profile.
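As a rough sketch of that approach through Puppeteer's CDPSession, the helper below wraps an arbitrary async action between Profiler.start and Profiler.stop. The name `profileAction` is made up for illustration, and `client` is assumed to be a session obtained with page.target().createCDPSession(). Note that the Profiler domain records a JavaScript CPU profile, which is not the same as the full trace produced by the Tracing example earlier.

```javascript
// `client` is a CDP session; `action` is any async function to profile.
async function profileAction(client, action) {
  await client.send('Profiler.enable');
  await client.send('Profiler.start');
  let result;
  try {
    await action();
  } finally {
    // Always stop the profiler, even if the action throws.
    // Profiler.stop resolves to an object of the form { profile }.
    result = await client.send('Profiler.stop');
  }
  return result.profile;
}

// Hypothetical usage:
// const client = await page.target().createCDPSession();
// const profile = await profileAction(client, () => page.click('#start-test'));
// require('fs').writeFileSync('run.cpuprofile', JSON.stringify(profile));
```

Calling this once per test run gives each run its own profile with a well-defined start point, which addresses the alignment problem from the question.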

Chromecast - How to reconnect to the session after page refresh?

After reloading the page, the context returned by
cast.framework.CastContext.getInstance()
reports the states 'NOT_CONNECTED' and 'NO_SESSION'.
My code example:
const castContext = window.cast.framework.CastContext.getInstance();
castContext.setOptions({
  receiverApplicationId:
    window.chrome.cast.media.DEFAULT_MEDIA_RECEIVER_APP_ID,
  autoJoinPolicy: window.chrome.cast.AutoJoinPolicy.ORIGIN_SCOPED,
  resumeSavedSession: true
});
await castContext.requestSession(); // wait for prompt
const castSession = castContext.getCurrentSession();
const mediaInfo = new window.chrome.cast.media.MediaInfo(mediaUrl);
const request = new window.chrome.cast.media.LoadRequest(mediaInfo);
await castSession.loadMedia(request);
window.player = new window.cast.framework.RemotePlayer();
window.playerController = new window.cast.framework.RemotePlayerController(
  window.player);
Can you please tell me how to connect to an existing session and receive information about playing media?
I have been looking around for answers for a while, and your switch of port made me think.
Switching ports is not a valid solution!
Why does a solution suddenly stop working without any changes to my Chromecast code?
Answer: Chrome got the hiccups. I guess this is either because of several live refreshes or because something went wrong during development and got cached.
Solution: I cleared the application storage and deleted my Chrome profile. After a restart of Chrome, my solution reconnects to the Chromecast as it did before.
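Independently of the storage fix, the question of how to pick up an existing session after a reload can be sketched with the Cast sender framework's session events: after setOptions, the context emits SESSION_STATE_CHANGED, and a saved session that was rejoined arrives with the state SESSION_RESUMED. The helper name `onSessionResumed` is made up for illustration, and whether a session is actually resumed depends on the options and on the receiver still running.

```javascript
// `framework` is expected to be window.cast.framework once the Cast SDK has loaded.
// `callback` receives the resumed CastSession and its media session (or null).
function onSessionResumed(framework, callback) {
  const context = framework.CastContext.getInstance();
  context.addEventListener(
    framework.CastContextEventType.SESSION_STATE_CHANGED,
    (event) => {
      if (event.sessionState === framework.SessionState.SESSION_RESUMED) {
        const session = context.getCurrentSession();
        // getMediaSession() returns the media currently loaded, if any.
        callback(session, session ? session.getMediaSession() : null);
      }
    }
  );
}

// Hypothetical usage, registered right after castContext.setOptions(...):
// onSessionResumed(window.cast.framework, (session, media) => {
//   console.log('resumed', session, media);
// });
```

With a listener like this in place, the page does not need to call requestSession() again after a refresh, which would otherwise prompt the user a second time.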