Chrome Headless print-to-pdf using PDF file input - google-chrome

I'm trying to use headless Chrome to re-print a PDF file, that is, to take an existing PDF as input and print it to a new PDF.
Similar Post: Print PDF with headless chrome in ubuntu
I've tried several variations of this:
chrome --headless --disable-gpu --print-to-pdf=E:/Python/Working/foo_flat3.pdf file://E:/Python/Working/foo_flat2.pdf
chrome --headless --disable-gpu --print-to-pdf=E:/Python/Working/foo_flat3.pdf "E:/Python/Working/foo_flat2.pdf"
And no PDF file is generated. When I use a webpage, for example:
chrome --headless --disable-gpu --print-to-pdf=E:/Python/Working/foo_flat3.pdf https://www.google.com/
It functions as intended, and "foo_flat3.pdf" is generated.
Why am I doing this? I'm trying to use Chrome's ability to unlock and flatten PDF files, as it is the fastest option I've found. If there are other options, I'd be interested in those too.
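One detail worth checking in the first command: a local file URL for a Windows path normally has three slashes before the drive letter (file:///E:/...), not two. The sketch below builds such a URL with Python's pathlib and assembles the same flags used above. The helper name build_print_command is hypothetical, and it is an assumption, not something confirmed here, that headless Chrome will render a PDF input at all.

```python
from pathlib import PureWindowsPath

def build_print_command(chrome, src_pdf, dst_pdf):
    """Return the argument list for a headless print-to-pdf run (hypothetical helper)."""
    # as_uri() yields a well-formed file URL for an absolute Windows path.
    src_url = PureWindowsPath(src_pdf).as_uri()
    return [
        chrome,
        "--headless",
        "--disable-gpu",
        f"--print-to-pdf={dst_pdf}",
        src_url,
    ]

cmd = build_print_command(
    "chrome",
    "E:/Python/Working/foo_flat2.pdf",
    "E:/Python/Working/foo_flat3.pdf",
)
print(cmd[-1])  # file:///E:/Python/Working/foo_flat2.pdf
```

The resulting list can be handed to subprocess.run(cmd), which also avoids any shell quoting issues.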

Related

chrome headless mode run in batch mode

I am trying to convert a series of HTML files to PDFs and then send the generated PDF files. However, if I start headless Chrome like this:
...chrome.exe --headless --disable-gpu --print-to-pdf=file01.pdf source01.html
The chrome process exits immediately even though it could take some time before the pdf is generated. Is there a way or flag to instruct chrome.exe to wait for the pdf to be saved before exiting?
p.s.
Is there also a way to see HTML/JS errors on the page?
p.p.s
I have given the simplest example.
What I actually run has
--run-all-compositor-stages-before-draw
--virtual-time-budget=10000 # Need to render some stuff first
--enable-logging
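One way to approach the waiting problem from a calling script is to block on the Chrome process and then poll until the output file exists and is non-empty. This is a minimal Python sketch, not a Chrome feature: wait_for_file and its timeout values are hypothetical, and whether the PDF can still be incomplete after the process exits is not confirmed here.

```python
import os
import time

def wait_for_file(path, timeout=30.0, interval=0.25):
    """Poll until `path` exists and is non-empty.

    Returns True on success and False if `timeout` seconds pass first.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if os.path.isfile(path) and os.path.getsize(path) > 0:
            return True
        time.sleep(interval)
    return False

# Example use (assumes chrome.exe is on PATH):
# import subprocess
# subprocess.run(["chrome.exe", "--headless", "--disable-gpu",
#                 "--print-to-pdf=file01.pdf", "source01.html"])
# if wait_for_file("file01.pdf"):
#     ...  # safe to send the PDF
```

subprocess.run already waits for the launched process to exit, so the polling only covers the case where the file shows up slightly later.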

Take a screenshot of page using Chrome --headless + cookies/oauth token

I've found that it's possible to take a page screenshot easily (https://superuser.com/questions/1410641/how-to-take-screenshots-of-a-list-of-urls) by calling:
"C:\Program Files (x86)\Google\Chrome\Application\chrome.exe" --headless --disable-gpu --enable-logging --screenshot="C:\path\to\screenshot.png" http://example.com/
However, this starts a new Chrome process, so I can take screenshots of simple pages, but what about pages that require authentication (such as OAuth2)?
Can I somehow force Chrome to use an already existing browser session in which I'm logged in to the page I want to screenshot, or pass it an OAuth2 token or a cookie?
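One avenue that might be worth trying is pointing the headless run at an existing profile directory with Chrome's --user-data-dir switch, so that stored cookies are available. The sketch below only assembles such a command. build_screenshot_command and the profile path are hypothetical, and it is an assumption that the headless process will pick up the logged-in session (the profile generally must not be in use by another running Chrome instance).

```python
def build_screenshot_command(chrome, url, png_path, profile_dir):
    """Assemble a headless screenshot command that reuses a profile (hypothetical helper)."""
    return [
        chrome,
        "--headless",
        "--disable-gpu",
        "--enable-logging",
        # Assumption: this profile already holds the login cookies for `url`.
        f"--user-data-dir={profile_dir}",
        f"--screenshot={png_path}",
        url,
    ]

cmd = build_screenshot_command(
    r"C:\Program Files (x86)\Google\Chrome\Application\chrome.exe",
    "http://example.com/",
    r"C:\path\to\screenshot.png",
    r"C:\path\to\profile",  # placeholder, not a real location
)
```

Passing cmd to subprocess.run(cmd) as a list avoids having to quote the paths by hand.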

How to access PDF from headless chrome browser

I am trying to access a pdf from a headless chrome browser.
Below is the command I am running from an Administrator command prompt. I can see that the PDF file gets generated, but it contains no data.
The URL in the command below requires a password when I open it in the browser. It would be a great help if you could let me know whether I need to supply a username and password and, if so, how and with what command. Many thanks!
"%chrome%" --headless --disable-gpu --print-to-pdf="C:\Users\i077219\Desktop\temp3.pdf" "https://**ipaddress**/s/opu/odata/sp/CA_OC_OUTPUT_REQUEST_SRV/Items(ApplObjectType='BILLING_DOCUMENT',ApplObjectId='0090004410',ItemId='1')/GetDocument/$value/"

--load-extension parameter for chrome doesn't work (mac)

This is the command to load a single extension
/Applications/Google\ Chrome.app/Contents/MacOS/Google\ Chrome --load-extension="~/Library/Application\ Support/Google/Chrome/Default/Extensions/cflchafndefoljnhhholeekfpgmbphaf/1.51_0/"
It fails with “Failed to load extension from: . Manifest file is missing or unreadable.”.
The weird thing is that if I copy the extension to /tmp/ and load it from there, it works.
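One possible explanation, offered as an assumption rather than a confirmed diagnosis: inside double quotes the shell does not expand ~, and the backslash before the space is kept literally, so Chrome may receive a path that does not exist. A copy under /tmp/ has neither a tilde nor a space, which would fit the observed behaviour. The sketch below expands the path in Python before launching; extension_arg is a hypothetical helper.

```python
import os

def extension_arg(ext_dir):
    """Return a --load-extension argument with ~ expanded to the home directory."""
    return "--load-extension=" + os.path.expanduser(ext_dir)

arg = extension_arg(
    "~/Library/Application Support/Google/Chrome/Default/Extensions/"
    "cflchafndefoljnhhholeekfpgmbphaf/1.51_0/"
)

# Passing the arguments as a list avoids shell quoting entirely:
# import subprocess
# subprocess.run(
#     ["/Applications/Google Chrome.app/Contents/MacOS/Google Chrome", arg]
# )
```

In a shell, the equivalent fix would be to leave the tilde outside the quotes or to use "$HOME/Library/Application Support/..." with no backslash.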

Open URL in Chrome & save its source code using Command prompt

I am having a hard time finding out how to save a page as HTML or .txt from the command line in the Chrome browser.
This is what I've done so far:
C:\Users\Cipher\AppData\Local\Google\Chrome\Application>chrome.exe --new-window http://google.com
This command opens a new Chrome window and visits google.com, but I couldn't figure out how to save google.com as an HTML or text file.
Is there any way to do so from the command prompt?
You cannot perform the task you describe manually, but you can perform it using WebDriver automation.
Chrome can be remote controlled using an API called WebDriver (part of the Selenium 2 automation suite). WebDriver has bindings for various programming languages, including JavaScript and Python.
Here is example code for Python (not tested):
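from selenium import webdriver

# With no arguments, Selenium looks for chromedriver on the PATH.
driver = webdriver.Chrome()
try:
    driver.get('http://www.google.com/')
    html = driver.page_source  # HTML of the page as currently loaded
finally:
    driver.quit()  # close the browser even if the page load fails

# Write the captured source to a file.
with open("myhtml", "wt", encoding="utf-8") as f:
    f.write(html)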
Original example
Do you really need to open Google Chrome? You can get the page source using Wget (available for UNIX systems or for Windows in this post on SuperUser). Once installed, just use the following command:
wget http://google.com -O yourfilename.html
And this should be all :) I don't think there's a way to tell Chrome to download the HTML from the command line though :(
UPDATE: There's a repo on GitHub called chrome-cli that allows the user to control Chrome from the command line. Downside is that it only works on Mac OS X.
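For readers who would rather stay in Python than install Wget, a rough standard-library equivalent of the wget command above could look like this. save_page is a hypothetical helper. Like Wget, it saves the HTML the server returns, not the DOM after any JavaScript has run.

```python
import urllib.request

def save_page(url, dest):
    """Fetch `url` and write the raw response bytes to `dest`.

    Returns the number of bytes written.
    """
    with urllib.request.urlopen(url) as response:
        data = response.read()
    with open(dest, "wb") as f:
        f.write(data)
    return len(data)

# Example:
# save_page("http://google.com", "yourfilename.html")
```

Writing the bytes unchanged sidesteps guessing the page's character encoding.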
I created a small script to perform exactly this task: https://github.com/abiyani/automate-save-page-as . See the demo gif in the README.
It automates the keyboard actions you would otherwise perform to save the page manually (it literally sends those key signals to the OS). As a side effect of it being used in another project of mine, it's been tested on various Linux flavors (Ubuntu, Mint, Fedora, etc.) and works fine on all of them. It probably won't work on Mac, at least without modifications, and certainly not on Windows.
This should work:
cd "c:\Program Files (x86)\Google\Chrome\Application"
chrome.exe --headless --dump-dom --enable-logging --disable-gpu https://www.google.com > c:\yourpath\yourfile.html
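The same --dump-dom approach can be driven from a script instead of a shell redirect. This is a minimal sketch, assuming the Chrome executable path is supplied by the caller; build_dump_dom_command and dump_dom are hypothetical helper names.

```python
import subprocess

def build_dump_dom_command(chrome, url):
    """Return the argument list for a headless --dump-dom run."""
    return [chrome, "--headless", "--disable-gpu", "--dump-dom", url]

def dump_dom(chrome, url, dest):
    """Run headless Chrome and write the DOM it prints on stdout to `dest`."""
    result = subprocess.run(
        build_dump_dom_command(chrome, url),
        capture_output=True,  # collect stdout instead of redirecting with >
        check=True,           # raise CalledProcessError on a non-zero exit code
    )
    with open(dest, "wb") as f:
        f.write(result.stdout)

# Example (paths are placeholders):
# dump_dom(r"c:\Program Files (x86)\Google\Chrome\Application\chrome.exe",
#          "https://www.google.com", r"c:\yourpath\yourfile.html")
```

Capturing stdout in the script keeps the output as bytes, so the shell's redirection and code page handling are not involved.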