chrome headless mode run in batch mode - google-chrome

I am trying to convert a series of HTML files to PDFs and then send the generated PDF files. However, if I start headless Chrome like this:
...chrome.exe --headless --disable-gpu --print-to-pdf=file01.pdf source01.html
The chrome process exits immediately even though it could take some time before the pdf is generated. Is there a way or flag to instruct chrome.exe to wait for the pdf to be saved before exiting?
p.s.
Is there also a way to see HTML/JS errors on the page?
p.p.s
I have given the simplest example.
What I actually run has
--run-all-compositor-stages-before-draw
--virtual-time-budget=10000 # Need to render some stuff first
--enable-logging
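One possible way to handle the batch case is to drive Chrome from a small script that waits for each process to exit and then polls until the PDF actually exists. The following is only a sketch, not a confirmed fix: the function names (build_pdf_command, wait_for_file, convert_to_pdf) and the timeout values are hypothetical, and only the Chrome flags are taken from the question.

```python
import subprocess
import time
from pathlib import Path


def build_pdf_command(chrome, source_html, output_pdf, budget_ms=10000):
    """Build the argument list for one headless print-to-pdf run."""
    return [
        chrome,
        "--headless",
        "--disable-gpu",
        "--run-all-compositor-stages-before-draw",
        f"--virtual-time-budget={budget_ms}",
        "--enable-logging",
        f"--print-to-pdf={output_pdf}",
        str(source_html),
    ]


def wait_for_file(path, timeout_s=30.0, poll_s=0.25):
    """Poll until path exists and is non-empty, or the timeout elapses.

    A non-empty file is only a heuristic; it does not prove that the
    writer has finished.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        candidate = Path(path)
        if candidate.is_file() and candidate.stat().st_size > 0:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_s)


def convert_to_pdf(chrome, source_html, output_pdf, timeout_s=120):
    """Run Chrome, wait for the process to exit, then wait for the PDF."""
    # subprocess.run blocks until the launched process terminates.
    subprocess.run(build_pdf_command(chrome, source_html, output_pdf),
                   timeout=timeout_s, check=False)
    # Assumption: if the launched process returns before the PDF is on
    # disk, polling for the file covers the gap.
    if not wait_for_file(output_pdf, timeout_s=timeout_s):
        raise RuntimeError(f"no PDF produced for {source_html}")
    return Path(output_pdf)
```

Calling convert_to_pdf in a loop over the HTML files would process them one at a time, so each PDF is checked before the next conversion starts.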

Related

Chrome tries to execute .sh file when displaying github page

Whenever I try to display the github page for a bash (sh) script in my regular profile, Chrome tries to execute the script. If I open the page in an Incognito browser Chrome does not try to execute the script and shows the github page with the bash script.
How can I fix my profile to allow me to open github pages with bash (sh) scripts without using the Incognito browser?
PS. Versions or machines do not seem to be the issue...

Why doesn't Chromium headless dump the DOM when I tell it to?

Here's exactly what I did:
I went to https://download-chromium.appspot.com/ and clicked the button.
I ran the file (oddly called chrome-win.exe instead of Chromium.exe).
I went to its install directory and opened a cmd.exe in there.
I ran the command:
chrome.exe --headless --dump-dom "https://www.example.com/"
According to the manual, this is supposed to open that URL headlessly and dump the DOM as text after JavaScript has been executed, to the stdout, meaning the cmd.exe in this case.
Problem: Nothing happens. Literally no output at all. The only thing I can tell happens (and I noticed it by pure coincidence) is that a file called chrome_debug.txt is created in the same directory, with these contents:
[0712/065333.417:ERROR:browser_process_sub_thread.cc(203)] Waited 5 ms for network service
If I instead run the command:
chrome.exe "https://www.example.com/"
It opens the browser and goes to that URL (as expected). So it's not something fundamentally wrong with my Internet connection or computer.
What am I doing wrong?
You might want to try to enable logging by adding the --enable-logging flag to the command line.
Also, although according to this bug report this is no longer necessary, it may be wise to add the --disable-gpu flag to prevent GPU errors from showing in the stdout.
The final command line should look like this:
chrome.exe --headless --enable-logging --disable-gpu --dump-dom "https://www.example.com/"
which returns the DOM of www.example.com/ on Chromium 76.0.3809.87 successfully.
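If the goal is to capture the dumped DOM from a script rather than read it in the console, the same command line can be run through Python's subprocess module. This is a minimal sketch under the assumption that Chrome writes the DOM to stdout as described above; build_dump_dom_command and dump_dom are hypothetical helper names.

```python
import subprocess


def build_dump_dom_command(chrome, url):
    """Build the argument list for a headless --dump-dom run."""
    # Passing arguments as a list avoids shell quoting mistakes
    # around the URL.
    return [
        chrome,
        "--headless",
        "--enable-logging",
        "--disable-gpu",
        "--dump-dom",
        url,
    ]


def dump_dom(chrome, url, timeout_s=60):
    """Run Chrome headlessly and return whatever it printed to stdout."""
    result = subprocess.run(
        build_dump_dom_command(chrome, url),
        capture_output=True,
        text=True,
        timeout=timeout_s,
    )
    if result.returncode != 0:
        # stderr often carries the log lines that explain the failure.
        raise RuntimeError(f"Chrome exited with {result.returncode}: {result.stderr}")
    return result.stdout
```

An empty return value from dump_dom would reproduce the "no output" symptom from the question, in which case the captured stderr and chrome_debug.txt are the places to look.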

Take a screenshot of page using Chrome --headless + cookies/oauth token

I've found that it's possible to take a page screenshot easily (https://superuser.com/questions/1410641/how-to-take-screenshots-of-a-list-of-urls) by calling:
"C:\Program Files (x86)\Google\Chrome\Application\chrome.exe" --headless --disable-gpu --enable-logging --screenshot="C:\path\to\screenshot.png" http://example.com/
However, this starts a new Chrome process, so I can take screenshots of simple pages, but what about pages that require authentication (like OAuth2)?
Can I somehow force Chrome to use an already existing browser session, in which I'm logged in to the page I want to screenshot, or pass it an OAuth2 token or even a cookie?

Keep Chrome running in headless mode

I want to use the Chrome browser in headless mode to produce images (PNG, JPEG) from SVG graphics. The code works in normal interactive mode, but I have problems using it in headless mode.
My main problem is that headless Chrome exits before drawing of the HTML page is completed. As I understand it, if I start Chrome with the following arguments:
chromium --headless http://myserver.org
It exits as soon as the document's onload event fires. But at that moment not all data has been fetched from the server (I am using XMLHttpRequest), and therefore the drawing is not complete.
I found a workaround: starting Chrome with the debugging port enabled, like this:
chromium --headless --remote-debugging-port=7777 http://myserver.org
But this is not what I want, especially since I do not have privileges to open HTTP ports on the node. Is it possible to keep Chrome running longer with other flags? I checked a lot of them but did not find an appropriate one. Or are there other methods to postpone the exit of headless Chrome?
You could try this answer https://stackoverflow.com/a/46424041/4830701
Copied here for reference:
Use the binary /opt/google/chrome/chrome directly, not google-chrome, which points to the bash script /usr/bin/google-chrome.
Taken from comments in
https://developers.google.com/web/updates/2017/04/headless-chrome#screenshots

Open URL in Chrome & save its source code using Command prompt

I am having a hard time finding out how to save a page as HTML or .txt using the command line in the Chrome browser.
This is what I've done so far:
C:\Users\Cipher\AppData\Local\Google\Chrome\Application>chrome.exe --new-window
http://google.com
This command opens a new Chrome window and visits google.com, but I couldn't figure out how to save google.com as an HTML or text file.
Is there any way to do this using the command prompt?
You cannot perform the task you describe manually, but you can perform it using WebDriver automation.
Chrome can be remote controlled using an API called WebDriver (part of the Selenium 2 automation suite). WebDriver has bindings for various programming languages, including JavaScript and Python.
Here is example code for Python (not tested):
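from selenium import webdriver
# With no arguments, Selenium searches the PATH for chromedriver.
driver = webdriver.Chrome()
try:
    driver.get('http://www.google.com/')
    html = driver.page_source
finally:
    # Always close the browser, even if loading the page fails.
    driver.quit()
# The context manager closes the file after writing.
with open("myhtml", "wt", encoding="utf-8") as f:
    f.write(html)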
Original example
Do you really need to open Google Chrome? You can get the page source using Wget (available for UNIX systems or for Windows in this post on SuperUser). Once installed, just use the following command:
wget http://google.com -O yourfilename.html
And this should be all :) I don't think there's a way to tell Chrome to download the HTML from the command line though :(
UPDATE: There's a repo on GitHub called chrome-cli that allows the user to control Chrome from the command line. Downside is that it only works on Mac OS X.
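For machines without Wget, roughly the same download can be sketched with Python's standard library. This is an illustration rather than a drop-in replacement: save_page_source is a hypothetical helper name, and like the Wget command above it saves the raw response body without executing any JavaScript on the page.

```python
import shutil
import urllib.request


def save_page_source(url, dest_path, timeout_s=30):
    """Download the raw response body at url and write it to dest_path."""
    with urllib.request.urlopen(url, timeout=timeout_s) as response, \
            open(dest_path, "wb") as out:
        # Stream the body to disk without decoding it.
        shutil.copyfileobj(response, out)
    return dest_path
```

For example, save_page_source("http://google.com", "yourfilename.html") would mirror the Wget command above.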
I created a small script to perform exactly this task: https://github.com/abiyani/automate-save-page-as . See the demo gif in the README.
It automates the keyboard actions you would otherwise perform to save the page manually (literally sends those key signals to OS). As a side effect of it being used in another project of mine, it's been tested on various linux flavors: Ubuntu, Mint, Fedora, etc - and works fine on all of them. It probably won't work (at least without modifications) on Mac, and certainly not on Windows.
This should work:
cd "c:\Program Files (x86)\Google\Chrome\Application"
chrome.exe --headless --dump-dom --enable-logging --disable-gpu https://www.google.com > c:\yourpath\yourfile.html