Can Chrome be used, from the command-line, to retrieve a URL's content to a file?

I've been driving myself mad trying to get curl, wget, the Python requests module, and others to simply log me in to a website and pull page text from there. I can certainly request HTML from the site, but only as an anonymous user. I've spent a few hours with tricks like Chrome's "Copy as cURL" feature, but the website in question is smart enough to defend against replayed login requests.
All I want is a way, from the command-line, to do something like:
chrome.exe --output_to_file page.html https://www.endpoint.com/auth_access_only.html
Essentially, I'm looking for chrome to do for me what cURL does, but I want the command-line invocation to be executed as me. I can see how this might open a potential security issue, but I don't mind at all if I have to do something magical to authorize my script. I'm not looking to do anything evil - I just want to be able to write scripts that are as "me" as I am.
I guess that, if it's truly unavoidable, I could suck it up and dust off Internet Explorer. I'd really rather not do that. I'd feel so dirty.

This is possible, but it's not as simple as you're thinking.
You can use the Chrome Debugging Protocol to remote-control Chrome.
You will need to write some code to make this work - I have done similar tasks using the chrome-remote-interface library for Node.js.
Make sure you understand what a browser profile is and where your profile folder lives.
If Chrome is already running using your browser profile: make sure it was launched with --remote-debugging-port=9002 or similar.
If Chrome is not already running using your browser profile: launch it with --user-data-dir="C:\path\to\your\profile" --remote-debugging-port=9002 or similar.
The "running or not" part is a bit tricky - you cannot launch more than one Chrome instance with the same browser profile, but you need to use this user profile because your login data is stored there. It may actually be easiest to create a separate browser profile that is just used for this automated task, and log in to the site there too.
Then, at a high level, your Node.js code will need to connect to Chrome, load the page, wait for the response, and save it to a file. Have a look at the example code for the chrome-remote-interface library - you can definitely piece together what you need from there.
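For example, a minimal sketch using chrome-remote-interface might look like the following. It assumes Chrome is already listening on port 9002 as described above; the URL is the placeholder from your question, and error handling is omitted:

    const fs = require('fs');
    const CDP = require('chrome-remote-interface');

    (async () => {
      // Connect to the Chrome instance started with --remote-debugging-port=9002
      const client = await CDP({ port: 9002 });
      const { Page, Runtime } = client;
      try {
        await Page.enable();
        await Page.navigate({ url: 'https://www.endpoint.com/auth_access_only.html' });
        await Page.loadEventFired(); // wait until the page has finished loading
        // Pull the rendered HTML out of the page and write it to a file
        const { result } = await Runtime.evaluate({
          expression: 'document.documentElement.outerHTML'
        });
        fs.writeFileSync('page.html', result.value);
      } finally {
        await client.close();
      }
    })();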
Another option that uses the same underlying technology is Puppeteer, another tool for automating Chrome. It is designed to start from a fresh profile every time, so if you go this route you'll need to script more of the interaction:
Visit the site's login page
Type the login credentials into the form and click the login button
Visit the site's authenticated page and save it to a file.
The benefit of this approach is that the result should be more reliable, preventing issues like expired login sessions.
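A rough Puppeteer sketch of that flow might look like this - the URLs and form selectors are placeholders that you'd replace with the site's real ones:

    const fs = require('fs');
    const puppeteer = require('puppeteer');

    (async () => {
      const browser = await puppeteer.launch();
      const page = await browser.newPage();
      // 1. Visit the site's login page
      await page.goto('https://www.endpoint.com/login');
      // 2. Type the credentials and submit (selectors are hypothetical)
      await page.type('#username', 'me');
      await page.type('#password', 'secret');
      await Promise.all([
        page.waitForNavigation(),
        page.click('#login-button'),
      ]);
      // 3. Visit the authenticated page and save its HTML to a file
      await page.goto('https://www.endpoint.com/auth_access_only.html');
      fs.writeFileSync('page.html', await page.content());
      await browser.close();
    })();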

Related

Access all new Chrome Notifications programmatically

I have no previous experience with programming Google Chrome plugins, which is why I am starting here to see if what I want to accomplish is possible/reasonable. I do, however, have pretty broad experience in programming in general.
What I want:
I want some kind of "trigger" to go off when a new Chrome Notification (you know, those little pop-ups above the system tray) appears. I want to execute some script/code depending on what information the notification contains, so that, for example, I could have an alarm go off if I receive an email from a certain user with a certain keyword in the subject and get a pop-up from my Gmail Notifier extension.
This is, however, just an example; I have a bunch of ideas for different notifications from different extensions and websites, so don't get caught up on that particular example.
When I look at the Chrome Notification API I see that there is a getAll method that supposedly gets all the "notifications in the system", but I do not find any event for new notifications.
I suppose a possibility would be to poll with getAll a couple of times per second (it needs to be really fast for some implementations I have in mind), but it feels very tacky.
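Something like this crude background-script sketch (assuming a Chrome extension context; as far as I can tell, getAll only reports your own extension's notification IDs):

    let seen = {};
    // Poll a few times per second and diff the notification IDs
    setInterval(() => {
      chrome.notifications.getAll((ids) => {
        for (const id of Object.keys(ids)) {
          if (!seen[id]) {
            seen[id] = true;
            console.log('new notification:', id); // fire the "trigger" here
          }
        }
      });
    }, 250);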
Is there any way to easily access new Notifications programmatically in Chrome?
(I'm open to all solutions, programming languages and such...)
Well, I searched long and hard and got involved with the Chromium dev group and asked around there. As far as I could figure out, there is no reasonable way of accessing all notifications programmatically.
So what I ended up doing was to download the source code of Chromium and build my own custom version with a very crude API added. It worked like a charm and was not as complicated as one might think.
Cheers!

MobaXterm URL Protocol Handler Usage

I want to deploy a series of MobaXterm connections (SSH connections) to our users and would like to create a webpage where the users can invoke a chosen session simply by clicking a link.
I can see that MobaXterm supports this through its URL Protocol Handler (installed by default), but I cannot find documented anywhere the syntax for the HTML links that invoke a named session.
Can anyone help or point me in the right direction to look please?
I found that this:
ssh localhost
allowed me to open a session, but it also created a new 'saved session' and saturated my quota, so I am not sure it is the real solution.
But at least it looks like 'mobaxterm' is the protocol, if that helps anybody ending up here. This feature sure lacks documentation :)
I e-mailed Mobatek with the same question. It turns out if you right-click the "User Sessions" node and pick "Generate HTML web page", it will make a web page containing mobaxterm: protocol links to all your existing sessions. It looks like it's designed for exactly your use case, making a webpage to share with other users, so you may be able to just use that generated page as-is.
If you do want to generate your own links, it's a little trickier. I haven't really tried to understand the encoding yet; it's definitely not designed to be particularly user-readable, but since you can make any session you want and export it to a link it shouldn't be too hard to reverse-engineer the fields you care about.

Windows tool to view website client content without a browser

Per the title, I am looking for a tool, or some initiative already undertaken by other developers, to simply grab data off of websites so one can navigate them without looking at them in the browser. I am fully aware of how most pages work, so what I would like is to look at the data being pulled from them, using Windows technology that has (hopefully) already been written. Does this make sense? Here is an example of what I would like to see in a tool:
a Windows interface that gives me data about a webpage (menus, submenus, button names/captions, etc.)
the ability to execute transactions on those pages by specifying what to do through the tool's interface (click a button, download an image, etc.)
Does anyone know of a tool out there that does such things?
The closest "program" that comes to mind is WWW::Mechanize, advertised as "Handy web browsing in a Perl object". This can in fact be used on Windows; however, you will need Perl.

Getting the same information Firebug can get?

This all goes back to some of my original questions about trying to "index" a webpage. I was originally trying to do it specifically in Java, but now I'm opening it up to any language.
Previously I tried using HtmlUnit and other methods in Java to get the information I needed, but wasn't successful.
The information I need from a webpage I can very easily find with Firebug, and I was wondering if there is any way to duplicate what Firebug is doing, specifically for my needs. When I open Firebug I go to the Net tab, then to the XHR tab, and it shows a constantly updating page with the information the server is updating. Then, when I click on the request and look at the response, it has the information I need - all without ever refreshing the webpage, which is what I am trying to achieve (not to mention the variables it outputs do not show up in the HTML of the webpage).
So can anyone point me in the right direction of how they would go about this?
(I will be putting this information into a MySQL database, which is why I added it as a tag; I still don't know what language would be best to use, though.)
Edit: These requests on the server are somewhat random, and although Firebug shows the URL they come from, when I try to visit that URL in Firefox it comes up trying to open something called application/jos.
Jon, I am fairly certain that you are confusing several technologies here, and the simple answer is that it doesn't work like that. Firebug works specifically because it runs as part of the browser, and (as far as I am aware) runs under a more permissive set of instructions than a JavaScript script embedded in a page.
JavaScript is, for the record, different from Java.
If you are trying to log AJAX calls, your best bet is for the server-side application to log the invoking IP, user agent, cookies, and complete URI to your database on receipt. It will be far better than any client-side solution.
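As a minimal sketch of that server-side logging, assuming the server is yours and runs Node.js with Express (the route and field names here are placeholders; in practice you'd INSERT the record into your MySQL table rather than print it):

    const express = require('express');
    const app = express();

    // Record identifying details of every incoming request before handling it
    app.use((req, res, next) => {
      console.log({
        ip: req.ip,
        userAgent: req.get('User-Agent'),
        cookies: req.get('Cookie'),
        uri: req.originalUrl,
        when: new Date().toISOString(),
      });
      next();
    });

    // Hypothetical AJAX endpoint whose calls get logged by the middleware above
    app.get('/api/data', (req, res) => res.json({ ok: true }));
    app.listen(3000);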
On a note more related to your question, it is not good practice to assume that everyone has read other questions you have posted. Generally speaking, "we" have not. "We" is in quotes because, well, you know. :) It also wouldn't hurt for you to go back and accept a few answers to questions you've asked.
So, the problem is:
With someone else's web-page, hosted on someone else's server, you want to extract select information?
Using cURL, Python, Java, etc. is too painful because the data is continually updating via AJAX (requires a JS interpreter)?
Plain jQuery or iFrame intercepts will not work because of XSS security.
Ditto, a bookmarklet -- which has the added disadvantage of needing to be manually triggered every time.
If that's all correct, then there are 3 other approaches:
Develop a browser plugin... More difficult, but has the power to do everything in one package.
Develop a userscript. This is much easier to do and technologies such as Greasemonkey deal with the XSS problem.
Use a browser macro technology such as Chickenfoot. These all have pluses and minuses, which I won't get into.
Using Greasemonkey:
Depending on the site, this can be quite easy.   The big drawback, if you want to record data, is that you need your own web-server and web-application. But this server can be locally hosted on an XAMPP stack, or whatever web-application technology you're comfortable with.
Sample code that intercepts a page's AJAX data is at: Using Greasemonkey and jQuery to intercept JSON/AJAX data from a page, and process it.
Note that if the target page does NOT use jQuery, the library in use (if any) usually has similar intercept capabilities. Or, listening for DOMSubtreeModified always works, too.
If you're using a library such as jQuery, you may have an option such as the jQuery ajaxSend and ajaxComplete callbacks. These could post requests to your server to log these events (being careful not to end up in an infinite loop).
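A small sketch of those global handlers, assuming jQuery is available to your script and that http://localhost/log is your own (hypothetical) logging endpoint:

    // Fires before every jQuery-initiated AJAX request on the page
    $(document).ajaxSend(function (event, jqxhr, settings) {
      console.log('AJAX request to:', settings.url);
    });

    // Fires after every jQuery-initiated AJAX request completes
    $(document).ajaxComplete(function (event, jqxhr, settings) {
      // Skip our own logging calls so we don't loop forever
      if (settings.url.indexOf('/log') === -1) {
        $.post('http://localhost/log', {
          url: settings.url,
          response: jqxhr.responseText,
        });
      }
    });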

how to browse a web site with a script to get information

I need to write a script that goes to a web site, logs in, navigates to a page, and downloads (and afterwards parses) the HTML of that page.
What I want is a standalone script, not a script that controls Firefox. I don't need any JavaScript support, just simple HTML navigation.
If nothing easy exists to do this... well, then something that acts through a web browser (Firefox or Safari; I'm on a Mac).
thanks
I've no knowledge of pre-built general purpose scrapers, but you may be able to find one via Google.
Writing a web scraper is definitely doable. In my very limited experience (I've written only a couple), I did not need to deal with login/security issues, but in Googling around I saw some examples that dealt with them; I'm afraid I don't remember the URLs for those pages. I did need to know some specifics about the pages I was scraping; having that made it easier to write the scraper, but, of course, the scrapers were limited to use on those pages. However, if you're just grabbing the entire page, you may only need the URL(s) of the page(s) in question.
Without knowing what language(s) would be acceptable to you, it is difficult to help much more. FWIW, I've done scrapers in PHP and Python. As Ben G. said, PHP has cURL to help with this; maybe there are more, but I don't know PHP very well. Python has several modules you might choose from, including lxml, BeautifulSoup, and HTMLParser.
Edit: If you're on Unix/Linux (or, I presume, Cygwin), you may be able to achieve what you want with wget.
If you wanted to use PHP, you could use the cURL functions to build your own simple web page scraper.
For an idea of how to get started, see: http://us2.php.net/manual/en/curl.examples-basic.php
This is PROBABLY a dumb question, since I have no knowledge of Macs, but what language are we talking about here? Also, is this a website that you have control over, or something like a spider bot that Google might use when checking page content? I know that in C# you can load in objects on other sites using an HttpWebRequest and a stream reader... In JavaScript (this would only really work if you know what is SUPPOSED to be there) you could open the web page as the source of an iframe and, using JavaScript, traverse the contents of all the elements on the page... or better yet, use jQuery.
I need to write a script that goes to a web site, logs in, navigates to a page, and downloads (and afterwards parses) the HTML of that page.
To me this just sounds like a POST or GET request to the URL of the login page could do the job. With the proper username and password parameters (depending on the form input names used on the page) set in the request, the result will be the HTML of the page, which you can then parse as you please.
This can be done with virtually any language. What language do you want to use?
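For instance, here is a minimal sketch in Node.js (18+, which has fetch built in); the URL and the username/password field names are placeholders that must match the site's actual login form:

    (async () => {
      const params = new URLSearchParams({
        username: 'me',     // field names must match the form's input names
        password: 'secret',
      });
      // POST the credentials; the body is sent form-encoded automatically
      const response = await fetch('https://example.com/login', {
        method: 'POST',
        body: params,
      });
      const html = await response.text(); // the resulting page HTML
      console.log(html); // ...or parse it as you please
    })();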
I recently did exactly what you’re asking for in a C# project. If login is required, your first request is likely to be a POST including credentials. The response will usually include cookies, which persist the identity across subsequent requests. Use Fiddler to look at what form data (field names and values) is being posted to the server when you log on normally with your browser. Once you have this you can construct an HttpWebRequest with the form data and store the cookies from the response in a CookieContainer.
The next step is to make the request for the content you actually want. This will be another HttpWebRequest with the CookieContainer attached. The response can be read by a StreamReader, which you can then read and convert to a string.
Each time I’ve done this it has usually been a pretty laborious process to identify all the relevant form data and recreate the requests manually. Use Fiddler extensively and compare the requests your browser makes when using the site normally with the requests coming from your script. You may also need to manipulate the request headers; again, use Fiddler to construct these by hand, get them submitting correctly and the response as you expect, then code it. Good luck!
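For comparison, here is the same two-step flow sketched in Node.js rather than C# (URLs and field names are placeholders, and the naive cookie handling below ignores paths, expiry, and redirect chains - a real site may need more care; getSetCookie requires a recent Node):

    (async () => {
      // Step 1: POST the login form and capture the session cookie(s)
      const login = await fetch('https://example.com/login', {
        method: 'POST',
        body: new URLSearchParams({ username: 'me', password: 'secret' }),
        redirect: 'manual', // keep the Set-Cookie headers from this response
      });
      const cookies = login.headers.getSetCookie()
        .map((c) => c.split(';')[0]) // keep just the name=value pair
        .join('; ');

      // Step 2: request the protected page with those cookies attached
      const page = await fetch('https://example.com/members-only', {
        headers: { Cookie: cookies },
      });
      console.log(await page.text());
    })();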