How can I save a URL using Chrome in a headless server? - google-chrome

I want to programmatically save a url, as I can manually do with Chrome’s Save As. I’m open to using any tool to accomplish this. If it’s not possible with a headless Chrome, a headed (Is this the correct word? :D) solution is also acceptable.
Other browsers are also acceptable, but note that I want the url saved as a ‘sane’ browser saves it. (Because the url in question needs Javascript rendering, and crashes phantomjs, too.)
Note that I’ve tried Selenium’s page_source, but it didn’t output the same content as Chrome’s Save As; its result was incomplete and comparable to a simple curl.
PS: Here’s the url I’m trying to save https://outline.com/zKpUhM.
Update 1:
I’ve found the following non headless solution:
https://github.com/abiyani/automate-save-page-as

I have found an answer in https://splash.readthedocs.io.
After installation, this little shell function does exactly what I want:
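full-html () {
    # Splash should be up. https://splash.readthedocs.io
    # 'wait' always waits the full time. It should be strictly < timeout.
    # Usage: full-html URL OUTPUT_FILE  (set fu_wait to override the 10 s wait)
    # --get with --data-urlencode keeps '&' and '?' in the target URL intact.
    curl --silent --get "http://localhost:8050/render.html" \
        --data-urlencode "url=$1" \
        --data "timeout=90" \
        --data "wait=${fu_wait:-10}" \
        -o "$2"
}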
See also: https://github.com/dhamaniasad/HeadlessBrowsers
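For anyone calling Splash from a script rather than the shell, here is a minimal Python sketch of the same request. It assumes Splash is listening on localhost:8050 as in the shell function above; the function names splash_render_url and save_rendered are made up for illustration and are not part of Splash.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def splash_render_url(target_url, wait=10, timeout=90, base="http://localhost:8050"):
    """Build a render.html request URL for a local Splash instance (assumed address)."""
    # 'wait' always waits the full time, so it must stay strictly below 'timeout'.
    if wait >= timeout:
        raise ValueError("wait must be strictly less than timeout")
    # urlencode percent-encodes the target URL so its own '?' and '&' survive.
    query = urlencode({"url": target_url, "timeout": timeout, "wait": wait})
    return f"{base}/render.html?{query}"


def save_rendered(target_url, out_path, wait=10):
    """Fetch the rendered HTML from Splash and write it to out_path."""
    with urlopen(splash_render_url(target_url, wait=wait), timeout=120) as resp:
        html = resp.read()
    with open(out_path, "wb") as out:
        out.write(html)
```

Calling save_rendered("https://outline.com/zKpUhM", "page.html") should behave like the shell function, provided Splash is running.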

Related

How to use brave to automate printing html to pdf?

I currently manually open HTML files using Brave Browser and print the files to pdf files. I want to automate this process in the command line. Is there a way to do it? Since Brave is based on chromium, solutions based on chromium and google-chrome are also welcome.
This is a common use for calling the executable in headless or kiosk modes.
Your mileage may vary compared with running a visible browser and a robotic puppet to press the buttons for you, but more often than not this is much simpler for everyday basic use. In a batch file that converts multiple files, each PDF takes a second or so to generate.
Edge is no different from Brave or Chromium here, so find the executable and append the following arguments (this example writes to the Windows user folder):
--headless --enable-logging --print-to-pdf="%UserProfile%\Documents\Demofile.pdf" --disable-extensions --print-to-pdf-no-header --disable-popup-blocking --run-all-compositor-stages-before-draw --disable-checker-imaging "HTTPs://url"
It was so quick that I did not know it had run until I opened the result. Note, however, that the target page must not put up blocking pop-ups as Google does; handling those takes the next step up, a button pusher that stands in for you and accepts the cookies.
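To batch this over many HTML files, the command line can be assembled from a script. The following is a rough Python sketch, not a tested recipe: it reuses only flags from the command above, and the executable name passed in (for example "brave") is an assumption that depends on your installation. The helper names pdf_command and html_to_pdf are hypothetical.

```python
import subprocess
from pathlib import Path


def pdf_command(browser, source, output_pdf):
    """Return the argument list for one headless print-to-PDF run."""
    return [
        str(browser),                     # path or name of the browser executable (assumed)
        "--headless",
        "--disable-extensions",
        "--print-to-pdf-no-header",
        f"--print-to-pdf={output_pdf}",
        str(source),                      # URL or file:// URI of the page to print
    ]


def html_to_pdf(browser, html_file, output_pdf):
    """Print a local HTML file to PDF; raises CalledProcessError if the browser fails."""
    source = Path(html_file).resolve().as_uri()
    subprocess.run(pdf_command(browser, source, output_pdf), check=True)
```

Passing the arguments as a list avoids shell quoting problems with paths that contain spaces.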

Chrome: ERR_BLOCKED_BY_XSS_AUDITOR details

I'm getting this chrome flag when trying to post and then get a simple form.
The problem is that the Developer Console shows nothing about this and I cannot find the source of the problem by myself.
Is there any way to look at this in more detail?
I would like to view the piece of code triggering the error so I can fix it...
A simple way to bypass this error during development is to send the X-XSS-Protection: 0 header to the browser.
Send the header before sending any data to the browser.
In PHP you can send the header like this:
header('X-XSS-Protection:0');
In ASP.NET you can send the header like this:
HttpContext.Response.AddHeader("X-XSS-Protection","0");
or
HttpContext.Current.Response.AddHeader("X-XSS-Protection","0");
In Node.js you can send the header like this:
res.writeHead(200, {'X-XSS-Protection':0 });
// or express js
res.set('X-XSS-Protection', 0);
Chrome v58 might or might not fix your issue... It really depends on what you're actually POSTing. For example, if you're trying to POST some raw HTML/XML data within an input/select/textarea element, your request might still be blocked by the auditor.
In the past few days I hit this issue in two different scenarios: a WYSIWYG client-side editor and an interactive upload form featuring some kind of content preview. I managed to fix them both by base64-encoding the raw HTML before POSTing it, then decoding it on the receiving PHP page. This will most likely fix the issue and, most importantly, increase developers' awareness of the data coming from POST requests, hopefully pushing them to adopt effective data encoding/decoding strategies and strengthen their web applications against XSS-type attacks.
To base64-encode your content on the client side you can either use the native btoa() function, which is supported by most browsers nowadays, or a third-party alternative such as a jQuery plugin (I ended up using this, which worked ok).
To base64-decode the POST data you can then use PHP's base64_decode(str) function, ASP.NET's Convert.FromBase64String(str) or anything else (depending on your server-side scenario).
For further info, check out this blog post that I wrote on the topic.
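As a rough illustration of the round trip described above, here is a small Python sketch standing in for the server side. The function names are hypothetical, and it assumes the client sends UTF-8 text encoded as standard base64. (Note that btoa() itself only accepts Latin-1 characters, so other text has to be converted to bytes first on the client.)

```python
import base64


def encode_html(raw_html: str) -> str:
    """Base64-encode raw HTML so the POST body contains no literal markup."""
    return base64.b64encode(raw_html.encode("utf-8")).decode("ascii")


def decode_html(encoded: str) -> str:
    """Reverse encode_html on the receiving side; rejects non-base64 input."""
    return base64.b64decode(encoded, validate=True).decode("utf-8")
```

The decoded value is still untrusted user input, so it needs the usual sanitising or escaping before it is rendered back into a page.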
In my case, I was a first-time contributor at the Creative forums (some kind of vBulletin construct) and was limited to sending a PM to the moderators before getting forum access. The more popular answers above capture the nature of the issue well.
The command was
http://forums.creative.com/private.php?do=insertpm&pmid=
And as described above the actual data was "raw HTML/XML data within an input/select/textarea element".
The general requirement for handling such a bug (or feature) at the user end is some kind of quick fixit tweak or twiddle. This post discusses the option of clearing cache, resetting Chrome settings, creating a new_user or retrying the operation with a new beta release.
It was also suggested that one launches a new instance with the following:
google-chrome-stable --disable-xss-auditor
On this Windows 10 1703 machine with Chrome 61, the launch actually worked only after modifying the command to:
chrome --disable-xss-auditor
However, on logging back in to the site and attempting the post again, the same error was generated. Perhaps the syntax wants refining or something else is awry.
It then seemed reasonable to launch Edge and repost from there, which turned out to be no problem at all.
This may help in some circumstances. Modify the Apache httpd.conf file and add the following (it requires mod_headers):
Header set X-XSS-Protection 0
It may have been fixed in Version 58.0.3029.110 (64-bit).
I've noticed that if there is an apostrophe ' in the text Chrome will block it.
It works when I change the href from javascript:void(0) to # in the page that makes the POST request.
For example:
<a href="javascript:void(0)">login</a>
Change to:
<a href="#">login</a>
I solved the problem!
In my case, when I submitted the form I was sending the HTML to the action, and the model had a property that accepted the HTML via "AllowHTML".
The solution was to remove this "AllowHTML" property, and then everything worked.
Obviously I no longer send the HTML to the action, because in my case I do not need it.
It is a Chrome bug. The only remedy is to use Firefox until they fix it. The XSS auditor trashing a page that has worked fine for 20 years seems to be a symptom, not a cause.

Wget source code for resulting webpage after querying?

I'm trying to count the number of times a search box errors when I do a bulk input of test data on a website. To do that, I build each query, use wget to download the resulting HTML page, and check whether the word "Error" appears in it.
However, only the main content of the HTML is downloaded and not the result, because the result is produced by an external JavaScript file. The HTML that I want can only be seen if I right-click and choose View Page Source in my browser. Is there a non-manual way to use wget/curl to download such page source instead of having to click through all of the pages?
The JavaScript is a program, and in general you cannot determine the result of a program for arbitrary input without actually running it. Thus, it's easier to load the JavaScript in a sandboxed environment and execute it against your test cases.
Wget and curl can't do that: they have no features for examining or executing what they fetch. Practically speaking, what you need is a browser that can load and run the script but can be driven from the shell the way wget/curl can. Luckily, such a thing already exists: Selenium. It is a browser automation framework that makes a running instance of Firefox, Chrome, or Internet Explorer scriptable and easy to control remotely.
If you want to run these browsers noninteractively, without a GUI, I suggest using a fake (hardware-less) X server such as Xvfb.
Google for: selenium, and google for: headless X. Good luck!
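A minimal Python sketch of that idea follows. The counting logic is separated from the rendering so it can be tried without a browser; the Selenium part assumes the selenium package and a matching Chrome driver are installed, and it uses Chrome's own headless mode in place of a fake X server. The helper names are hypothetical.

```python
def count_error_pages(urls, render, marker="Error"):
    """Count how many rendered pages contain the marker text."""
    return sum(1 for url in urls if marker in render(url))


def selenium_renderer():
    """Return (driver, render) where render(url) gives the page source after loading."""
    from selenium import webdriver  # imported here so the counter works without Selenium

    options = webdriver.ChromeOptions()
    options.add_argument("--headless")
    driver = webdriver.Chrome(options=options)

    def render(url):
        driver.get(url)
        # Assumption: the result is present once the page has loaded; pages that
        # fill in results later need an explicit wait before reading page_source.
        return driver.page_source

    return driver, render
```

Typical use would be driver, render = selenium_renderer(), then count_error_pages(query_urls, render), and finally driver.quit().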

How to disable Google Chrome source compression "?body=1"?

I'm trying to debug some JavaScript for a Rails project and it's incredibly frustrating to go line by line when the source code is compressed in the Sources developer tab.
I know this compression is done by Chrome through the body variable. What I want to know is if there is any way to stop Chrome from compressing files in source view, i.e:
\application.js?body=1 --> \application.js
Thank you for your time.
Compression is being done by Rails. Disable it in your configuration:
# config/environments/production.rb (or whatever environment you're in)
config.assets.compress = false
You might want to investigate a new feature in Chrome called Source Maps.
Source Maps allows Chrome to map the compressed source code it receives to the uncompressed original, which in turn means that you can debug the code, even though it's been compressed.
This feature should help you get around this kind of problem without having to change the compression settings on your server.
You can read more about it here: http://blog.mascaraengine.com/news/2012/4/16/sourcemap-support-in-chrome-greatly-improves-debugging.html
I believe this feature is still in testing and not yet in the final release version of Chrome. I'm sure it will arrive in due course, but for the time being you may need to install the "Canary" version of Chrome, i.e. the pre-release version that includes all the forthcoming features that they're still working on.

Programmatically run webinspector on a page on Chrome and dump the output to a file

Is it possible to make chrome load a webpage and make it dump the output of the webinspector for that page ... all via commandline?
It's not possible to do this with Chrome DevTools. However, you can do similar things with PhantomJS. You can even do network sniffing and much more with PhantomJS. Check it out.