Is there a way to authorize Puppeteer for 2FA authentication?
Scenario:
Run Puppeteer and visit a 2FA protected URL
Enter credentials and wait for redirection
Request one-time passcode
Enter the passcode
Wait for redirection
Close the Puppeteer instance
Run Puppeteer and visit a 2FA protected URL
The protected page should load without asking for the passcode again
This scenario doesn't work in my case :(
Is there any other library that can get through this scenario successfully?
There are two possible scenarios for handling 2FA with puppeteer, depending on the nature of the situation (it's not entirely clear from the way you phrase the question).
Replicating session data (in this scenario, you can't have someone provide you the code the second time, you need to bypass it in the future altogether):
I'm going to assume the site you are dealing with is performing some sort of analysis on the browser to determine whether to prompt for a 2FA code. In my experience, there is sometimes a random element to this that you can't control, but replicating the exact browser state (user data, cookies, everything) is a start. Pair that with a consistent IP address that has answered correctly previously, and I think the chances are very good.
See my code here, or if that is too heavy, here is a simple implementation of the functions I'm using to save the session data: simple code. In short, I'm converting the session data and cookies (everything that distinguishes that instance of Chromium) into a base64 string; later I simply load that data, and the browser assumes the exact state it had previously. I'm pretty sure this is what you want.
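To give a rough idea of the cookie half of that approach, here is a minimal sketch (assuming Puppeteer; the file name session.b64 is a placeholder, and the full version also captures localStorage and other state):

const puppeteer = require('puppeteer');
const fs = require('fs');

// Serialize the page's cookies into a base64 blob, as described above.
async function saveSession(page) {
  const cookies = await page.cookies();
  fs.writeFileSync('session.b64', Buffer.from(JSON.stringify(cookies)).toString('base64'));
}

// Later: decode the blob and hand every cookie back to a fresh browser.
async function restoreSession(page) {
  const blob = fs.readFileSync('session.b64', 'utf8');
  const cookies = JSON.parse(Buffer.from(blob, 'base64').toString('utf8'));
  await page.setCookie(...cookies);
}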
Interactive Bot
I'm unsure if this applies to your use case, but I faced a situation where I needed to pull 2FA codes from a user's phone/email in real time while puppeteer was in the middle of performing a login process. The browser could not re-launch because the 2FA code would no longer be valid. It's not a trivial problem. I ended up using Redis and built a framework, puppeteer-theater, that addresses this use case along with pretty much every other scraping/automation workflow I have encountered.
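As a rough illustration of the hand-off (this is a sketch, not puppeteer-theater's actual API; it assumes the node-redis v4 client, and the key name and selector are made up):

const { createClient } = require('redis');

// Block the login flow until some other process (a UI, an SMS webhook)
// has written the user's one-time code into Redis.
async function waitForCode(key, timeoutMs = 120000) {
  const client = createClient();
  await client.connect();
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    const code = await client.get(key);
    if (code) { await client.quit(); return code; }
    await new Promise(r => setTimeout(r, 1000)); // poll once a second
  }
  await client.quit();
  throw new Error('Timed out waiting for 2FA code');
}

// Meanwhile the puppeteer side stays parked on the 2FA page:
// const code = await waitForCode('2fa:user@example.com');
// await page.type('#otp-input', code);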
Feel free to reach out if you are looking for specific help.
Related
I've been driving myself mad trying to get curl, wget, the Python requests module, and others to simply log me in to a website and pull page text from there. I can certainly request HTML from the site, but only as an anonymous user. I've spent a few hours with tricks like Chrome's "Copy as cURL" feature, but the website in question is smart enough to defend against login playbacks.
All I want is a way, from the command-line, to do something like:
chrome.exe --output_to_file page.html https://www.endpoint.com/auth_access_only.html
Essentially, I'm looking for chrome to do for me what cURL does, but I want the command-line invocation to be executed as me. I can see how this might open a potential security issue, but I don't mind at all if I have to do something magical to authorize my script. I'm not looking to do anything evil - I just want to be able to write scripts that are as "me" as I am.
I guess that, if it's truly unavoidable, I could suck it up and dust off Internet Explorer. I'd really rather not do that. I'd feel so dirty.
This is possible, but it's not as simple as you're thinking.
You can use the Chrome Debugging Protocol to remote-control Chrome.
You will need to write some code to make this work - I have done similar tasks using the chrome-remote-interface library for Node.js.
Make sure you understand what a browser profile is and where your profile folder lives.
If Chrome is already running using your browser profile: make sure it was launched with --remote-debugging-port=9002 or similar.
If Chrome is not already running using your browser profile: launch it with --user-data-dir="C:\path\to\your\profile" --remote-debugging-port=9002 or similar.
The "running or not" part is a bit tricky - you cannot launch more than one Chrome instance with the same browser profile, but you need to use this user profile because your login data is stored there. It may actually be easiest to create a separate browser profile that is just used for this automated task, and log in to the site there too.
Then, at a high level, your Node.js code will need to connect to Chrome, load the page, wait for the response, and save it to a file. Have a look at the example code for the chrome-remote-interface library - you can definitely piece together what you need from there.
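As a starting point, a minimal sketch with chrome-remote-interface might look like this (it assumes Chrome is already running with --remote-debugging-port=9002 as described above, and reuses the URL from your example):

const CDP = require('chrome-remote-interface');
const fs = require('fs');

(async () => {
  const client = await CDP({ port: 9002 });
  const { Page, Runtime } = client;
  try {
    await Page.enable();
    await Page.navigate({ url: 'https://www.endpoint.com/auth_access_only.html' });
    await Page.loadEventFired(); // wait for the page to finish loading
    const { result } = await Runtime.evaluate({
      expression: 'document.documentElement.outerHTML',
    });
    fs.writeFileSync('page.html', result.value); // the cURL-style output to a file
  } finally {
    await client.close();
  }
})();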
Another option, which uses the same underlying technology, is puppeteer, another tool to automate Chrome. It is designed to start from a fresh profile every time, so if you go this route, you'll need to script more interactions:
Visit the site's login page
Type the login credentials into the form and click the login button
Visit the site's authenticated page and save it to a file.
The benefit of this approach is that it should be more reliable, avoiding issues like expired login sessions.
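A sketch of those three steps (the URLs and the #username/#password/#login selectors are placeholders for whatever the real site uses):

const puppeteer = require('puppeteer');
const fs = require('fs');

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();

  // Step 1: the login page.
  await page.goto('https://www.endpoint.com/login');

  // Step 2: credentials, then wait for the post-login redirect.
  await page.type('#username', 'me');
  await page.type('#password', 'secret');
  await Promise.all([page.waitForNavigation(), page.click('#login')]);

  // Step 3: the authenticated page, saved to a file.
  await page.goto('https://www.endpoint.com/auth_access_only.html');
  fs.writeFileSync('page.html', await page.content());

  await browser.close();
})();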
I've been writing an extension that allows the user to issue voice commands to control their browser, and things were going great until I hit a catastrophic problem. It goes like this:
The speech recognition object is in continuous mode, and whenever the onerror: 'no-speech' or onend events fire, it restarts. This way, the extension is constantly waiting to accept input and reacts whenever a command is issued, even after 5 minutes of silence.
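For reference, the restart logic is roughly this (simplified; handleCommand stands in for my actual command dispatcher):

const recognition = new webkitSpeechRecognition();
recognition.continuous = true;
recognition.onresult = (event) => handleCommand(event);
// A 'no-speech' error ends the session, so onend covers both cases:
recognition.onend = () => recognition.start();
recognition.start();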
After a few days of development, today I reached the point where I was testing it in practical use, and I found that after a little while (and with no change to anything on my part), my onend event started firing constantly. As in, looking at the console, I would see 18,000 requests being made in the space of three seconds, all being instantly denied, thus triggering onend and restarting the request.
I'm aware that it would be optimal to wait for sound before sending a request, or to have local speech recognition capabilities without the need for a remote server, but the present API does not allow that.
Are my suspicions correct? Am I getting request limited?
Are my suspicions correct? Am I getting request limited?
Yes
I'm aware that it would be optimal to wait for sound before sending a request, or to have local speech recognition capabilities without the need for a remote server, but the present API does not allow that.
To hide the IP source of your request you can use anonymizer networks like Tor, though it will not be fast.
It's naive to assume Google will spend resources to process all audio recorded on your system. In application development, it is better to rely on an API that provides at least some guarantees. That could be either a commercial API or an open source implementation like CMUSphinx.
With CMUSphinx, you can also properly implement command keyword detection and increase accuracy by specifying the grammar of the commands.
You could also use a Voice Activity Detection (VAD) algorithm to detect when a user is talking. This can be done by setting either a volume threshold or a frequency threshold (human speech is usually below 400 Hz, for example). That way, you won't send useless requests to Google unless those conditions are met. I would not recommend using Tor, as it would significantly increase latency. CMUSphinx is probably the best local option, but if you still want to use a web-based service, I would recommend either using a Voice Activity Detection algorithm or finding different web-based software.
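For example, a simple volume-threshold VAD can be built with the Web Audio API; this is a sketch, and the 0.02 RMS threshold is an arbitrary starting point you would need to tune:

navigator.mediaDevices.getUserMedia({ audio: true }).then((stream) => {
  const ctx = new AudioContext();
  const analyser = ctx.createAnalyser();
  ctx.createMediaStreamSource(stream).connect(analyser);

  const samples = new Float32Array(analyser.fftSize);
  setInterval(() => {
    analyser.getFloatTimeDomainData(samples);
    // Root-mean-square amplitude of the current window:
    const rms = Math.sqrt(samples.reduce((s, v) => s + v * v, 0) / samples.length);
    if (rms > 0.02) {
      // Loud enough to plausibly be speech: only now start recognition.
    }
  }, 100);
});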
I am trying to figure out a way to do an internet connectivity check for an AIR for iOS app. Previously, I was using (against my better judgement) a URLMonitor that checked Google once every 30 seconds. I did not like putting that load onto Google and neither did they; this morning, our network got flagged as a possible DDoS attacker simply from testing the app. So I had to disable this type of check and move on.
I have thought about using the NetworkInfo ANE from Adobe, but that presents its own issues in determining internet connectivity. The only way I can think of doing it is to check for the interfaces "en0" and "pdpxx" (which correspond to the WiFi and cellular interfaces, respectively) and check their IPs to ensure they are not in the 192.168.x.x, 10.x.x.x, or 127.x.x.x ranges. However, I am not entirely sure those are the only private/localhost IP ranges out there, and there is always the possibility that the interface names will change in the future, which would render this monitor useless. There is also the issue of IPv6 possibly throwing a wrench into this method.
Is there another way to check if the user is connected to the internet? I've searched multiple times and it seems that these are the only two ways to check. If that is the case, what is the best way to check?
I'm surprised that you got flagged as a DDoS attacker; are you sure that's what happened?
In any case, if you're not happy putting the load onto someone else's server, then make your own: just a basic setup to use with the URLMonitor. You don't have to use Google's URL with the URLMonitor; you can pass it another URLRequest, which could point to your own server.
monitor = new URLMonitor(new URLRequest("http://www.your-own-server.com"));
This might also be useful if you decide that you want to pass more data between the app and the server. It's your server, so you can do what you want with it.
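The server side can be as small as you like; for instance, a hypothetical Node.js endpoint that just answers the monitor's ping with a 200:

const http = require('http');
http.createServer((req, res) => {
  res.writeHead(200, { 'Content-Type': 'text/plain' });
  res.end('ok'); // any fast 200 response satisfies the URLMonitor
}).listen(8080);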
I don't think there's any other way to check if the user is connected to the internet, and to be honest, I don't see why there would be. Checking the user's interfaces (wan0, etc.) would probably be possible, but you'd need another program, maybe a simple Python or C++ one, that AIR could use to check these things, and that sounds like the long way round.
This all goes back to some of my original questions about trying to "index" a webpage. I was originally trying to do it specifically in Java, but now I'm opening it up to any language.
Before, I tried using HtmlUnit and other methods in Java to get the information I needed, but wasn't successful.
The information I need to get from the webpage I can very easily find with Firebug, and I was wondering if there is any way to duplicate what Firebug is doing, specifically for my needs. When I open Firebug, I go to the NET tab, then to the XHR tab, and it shows a constantly updating list of requests with the information the server is sending. Then, when I click on a request and look at the response, it has the information I need, and this is all without ever refreshing the webpage, which is what I am trying to do (not to mention that the variables it outputs do not show up in the HTML of the webpage).
So can anyone point me in the right direction of how they would go about this?
(I will be putting this information into a MySQL database, which is why I added it as a tag; I still don't know what language would be best to use, though.)
Edit: These requests to the server are somewhat random, and although Firebug shows the URL they come from, when I try to visit that URL in Firefox it tries to open something called application/jos
Jon, I am fairly certain that you are confusing several technologies here, and the simple answer is that it doesn't work like that. Firebug works specifically because it runs as part of the browser and (as far as I am aware) runs with a more permissive set of permissions than a JavaScript script embedded in a page.
JavaScript is, for the record, different from Java.
If you are trying to log AJAX calls, your best bet is for the serverside application to log the invoking IP, useragent, cookies, and complete URI to your database on receipt. It will be far better than any clientside solution.
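As an illustration (a hypothetical Node.js/Express sketch, assuming the server is yours; in practice you would swap console.log for a MySQL INSERT):

const express = require('express');
const app = express();

// Log every incoming request before the normal handlers run.
app.use((req, res, next) => {
  console.log({
    ip: req.ip,
    userAgent: req.get('user-agent'),
    cookies: req.headers.cookie,
    uri: req.originalUrl,
  });
  next();
});

app.listen(3000);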
On a note more related to your question, it is not good practice to assume that everyone has read other questions you have posted. Generally speaking, "we" have not. "We" is in quotes because, well, you know. :) It also wouldn't hurt for you to go back and accept a few answers to questions you've asked.
So, the problem is this:
With someone else's web-page, hosted on someone else's server, you want to extract select information?
Using cURL, Python, Java, etc. is too painful because the data is continually updating via AJAX (requires a JS interpreter)?
Plain jQuery or iFrame intercepts will not work because of XSS security.
Ditto, a bookmarklet -- which has the added disadvantage of needing to be manually triggered every time.
If that's all correct, then there are 3 other approaches:
Develop a browser plugin... More difficult, but has the power to do everything in one package.
Develop a userscript. This is much easier to do and technologies such as Greasemonkey deal with the XSS problem.
Use a browser macro technology such as Chickenfoot. These all have plusses and minuses -- which I won't get into.
Using Greasemonkey:
Depending on the site, this can be quite easy. The big drawback, if you want to record data, is that you need your own web-server and web-application. But this server can be locally hosted on an XAMPP stack, or whatever web-application technology you're comfortable with.
Sample code that intercepts a page's AJAX data is at: Using Greasemonkey and jQuery to intercept JSON/AJAX data from a page, and process it.
Note that if the target page does NOT use jQuery, the library in use (if any) usually has similar intercept capabilities. Or, listening for DOMSubtreeModified always works, too.
If you're using a library such as jQuery, you may have an option such as the jQuery ajaxSend and ajaxComplete callbacks. These could post requests to your server to log these events (being careful not to end up in an infinite loop).
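A sketch of that, using jQuery's global handler (the /log endpoint is a hypothetical name for your own logging server):

$(document).ajaxComplete(function (event, xhr, settings) {
  // Don't log our own logging calls, or we'd loop forever:
  if (settings.url === '/log') return;
  $.post('/log', {
    url: settings.url,
    response: xhr.responseText,
  });
});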
I have an HTML5/jquery mobile web app at http://app.bluedot.mobi. It is used for long distance races to track competitors via SPOT satellite tracking. The issue I have not yet resolved is that when loading the app when no data connection exists, the browser throws a "no data connection" alert popup as it is attempting to fetch the manifest during the checking event. Even when a data connection is present, loading the app can take a very long time. There are ~ 500 files to check. The fastest way to load the app (from a phone) is to be in airplane mode and dismiss the browser's alert - not so elegant.
Rather than force an update on users who tend to be in the backcountry with a spotty connection, I want to use applicationCache.update() programmatically, giving the user control over the process and speeding up app load whether on or offline.
Is this currently possible with the HTML5 spec and respective browser implementations?
Sounds like you need the abort() method. Unfortunately it is very new, and it will probably be a while before it is implemented by the majority of mobile browsers.
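For what it's worth, driving the update by hand looks roughly like this (a sketch; the update-btn/cancel-btn elements are hypothetical, and the abort() call is feature-detected precisely because it is so new):

const cache = window.applicationCache;

// Let the user decide when to check for a new version.
document.getElementById('update-btn').onclick = () => cache.update();

// Switch to the freshly downloaded version once it's ready.
cache.addEventListener('updateready', () => cache.swapCache());

// Let the user bail out of a slow check over a spotty connection.
document.getElementById('cancel-btn').onclick = () => {
  if (typeof cache.abort === 'function') cache.abort();
};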
There are ~ 500 files to check.
It sounds like you're implying that the browser checks each file to see if any of them has changed. That is not correct. The browser only checks whether the manifest file itself has changed, and that is a simple byte-for-byte comparison. If the manifest has not changed, the browser assumes nothing has changed.
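For example, the usual way to force clients to re-download is to change a single byte in the manifest, typically a version comment (the file names here are just illustrative):

CACHE MANIFEST
# v2 -- bump this comment to invalidate the old cache
CACHE:
index.html
app.js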
So if your application is slow to start, it might be because your application is complex and there's a lot of HTML and JavaScript to parse. I would advise you to look at the application and see if there's anything you can optimize. You might want to take a look at Yahoo's Best Practices for Speeding Up Your Web Site page.
For example, I noticed you have a lot of JavaScript code in the HEAD section. The aforementioned article advises moving all JavaScript (to the extent possible) to the bottom of the page, so that the browser can start rendering the page as soon as possible. There's a lot of other sound advice in the article, so take a look; I'm sure you'll find it useful. :-)