I'm creating a small app that automates a few tasks. I'd like to know whether it is possible to tell that someone used the app (e.g. used Puppeteer or some other automation tool).
Is there some giveaway as to whether Puppeteer is browsing the site or a user is browsing it manually (a different user agent or something similar)?
These are the headers I currently receive from Puppeteer v0.12.0:
{
"host": "localhost:3001",
"connection": "keep-alive",
"upgrade-insecure-requests": "1",
"user-agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_0) AppleWebKit/537.36 (KHTML, like Gecko) HeadlessChrome/64.0.3240.0 Safari/537.36",
"accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8",
"accept-encoding": "gzip, deflate"
}
The user-agent contains HeadlessChrome, so a naive detection can be based on that.
Be aware, though, that it is very easy for a crawler to change its user agent: Puppeteer exposes a setUserAgent method on its page instances. This is covered in the Puppeteer API documentation.
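As a rough illustration of that naive check, here is a minimal Python sketch. The function name looks_like_headless_chrome and the plain dict of headers are my own assumptions rather than part of any framework, and the check is trivially defeated once the client overrides its user agent.

```python
def looks_like_headless_chrome(headers):
    """Return True if the User-Agent header contains the HeadlessChrome token."""
    # HTTP header names are case-insensitive, so normalise the keys first.
    normalised = {name.lower(): value for name, value in headers.items()}
    return "HeadlessChrome" in normalised.get("user-agent", "")


# Two of the headers quoted in the question above.
puppeteer_headers = {
    "host": "localhost:3001",
    "user-agent": (
        "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_0) AppleWebKit/537.36 "
        "(KHTML, like Gecko) HeadlessChrome/64.0.3240.0 Safari/537.36"
    ),
}

if __name__ == "__main__":
    print(looks_like_headless_chrome(puppeteer_headers))
```

A request whose user agent has been overridden with a regular Chrome string passes this check unnoticed, so treat it as a hint at best.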
I am using Watir to test an internal application. The tests recently stopped working.
These are the simplified steps of the test:
require 'watir'
Selenium::WebDriver::Chrome.path = 'PATH_TO_CHROME_EXE'
browser = Watir::Browser.new :chrome, :options => {:options => {'useAutomationExtension' => false}}
URL = "TEST_URL"
browser.goto URL
After the goto line executes, the browser fails to navigate to the page. When inspecting the network activity, the request status shows as canceled. I also noticed that the "Sec-Fetch-User" field is populated in the request headers:
Sec-Fetch-Mode: navigate
**Sec-Fetch-User: ?1**
Upgrade-Insecure-Requests: 1
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/77.0.3865.90 Safari/537.36
If I enter the test URL directly into the browser, the field is not populated and the login pop-up appears.
This is my setup:
jruby 9.2.5.0 (2.5.0) 2018-12-06 6d5a228 Java HotSpot(TM) 64-Bit Server VM 25.102-b14 on 1.8.0_102-b14 +jit [mswin32-x86_64]
Version 77.0.3865.90 (Official Build) (64-bit)
Watir 6.16.5
ChromeDriver 77.0.3865.40 (f484704e052e0b556f8030b65b953dce96503217-refs/branch-heads/3865#{#442})
Has anyone run into a similar issue? Would it be possible to suppress the header?
Steve
I ran into exactly the same problem with Sec-Fetch-User: ?1.
The only way I could get the page I wanted to load was to ditch chromedriver and use the Firefox driver instead. Firefox did not send those headers.
I think that, if you really need to use chromedriver and need to change the headers, the way to do it is to use a proxy such as https://github.com/lightbody/browsermob-proxy
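To make the proxy idea concrete, this is a minimal Python sketch of the rewriting step such a proxy would apply to each outgoing request. It does not use BrowserMob Proxy's API; strip_headers and the sample dict are hypothetical names chosen for illustration only.

```python
def strip_headers(headers, unwanted):
    """Return a copy of headers without the named entries (case-insensitive)."""
    blocked = {name.lower() for name in unwanted}
    return {
        name: value
        for name, value in headers.items()
        if name.lower() not in blocked
    }


# Headers as reported in the question above.
outgoing = {
    "Sec-Fetch-Mode": "navigate",
    "Sec-Fetch-User": "?1",
    "Upgrade-Insecure-Requests": "1",
}

if __name__ == "__main__":
    print(strip_headers(outgoing, ["sec-fetch-user"]))
```

Whether removing the header actually gets the request through depends on what the server is rejecting, which the thread does not establish.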
When I use Chrome to open the link https://www.amazon.cn/dp/0132269937, Chrome gives me the correct web page. But when I use wget or curl, I get a 503 error. What is the difference between them? I guess the root cause is a cookie; how can I figure it out?
Try setting Chrome's user agent in the curl/wget request.
For example, with curl:
curl -A "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2403.89 Safari/537.36" https://www.amazon.cn/dp/0132269937
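The same idea can be sketched in Python with only the standard library. The helper names build_request and fetch are mine, and nothing here guarantees the server will accept the request; it only shows how the User-Agent header gets attached.

```python
import urllib.request

# The user agent string from the curl example above.
CHROME_UA = (
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_3) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/54.0.2403.89 Safari/537.36"
)


def build_request(url, user_agent=CHROME_UA):
    """Create a GET request that carries a browser-like User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch(url, user_agent=CHROME_UA):
    """Send the request and return (status code, body bytes)."""
    with urllib.request.urlopen(build_request(url, user_agent), timeout=10) as response:
        return response.status, response.read()
```

Calling fetch("https://www.amazon.cn/dp/0132269937") performs the actual network request; build_request alone lets you inspect the headers first.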
I tried to modify the user agent string using Open Browser with desired_capabilities and discovered that Chrome no longer supports that technique.
After much searching and reading I discovered that a new version of Selenium2Library has a new keyword, Create Webdriver, that is supposed to address this issue.
I modified their example to suit my needs, but no matter what I do, it does not modify the user agent string.
I get no errors, no warnings, no nothing, except a perfectly working browser without a modified user agent string.
I tried to modify other options like --start-maximized with the same result, i.e. no result at all.
Excerpt from keyword that opens Google Chrome and (allegedly) modifies the user agent string:
${options}=    Evaluate    sys.modules['selenium.webdriver'].ChromeOptions()    sys, selenium.webdriver
${options.add_argument}=    Set Variable    user-agent="Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/28.0.1500.95 Safari/537.36 System/ComputerId"
Create WebDriver    Chrome    chrome_options=${options}
Go To    http://www.useragentstring.com
Fashioned after the example given here (at the bottom of the page):
https://github.com/rtomac/robotframework-selenium2library/issues/225
My software setup:
Google Chrome 31.0.1650.59
Selenium 2.39.0
Selenium2library 1.5
Robot Framework 2.8.3
Robot Framework Ride 1.2.2
So what is the problem?
After some more tinkering and reading I managed to figure out a way to get the example working.
Instead of using ${options.add_argument}= I used Call Method ${options} add_argument.
${options}=    Evaluate    sys.modules['selenium.webdriver'].ChromeOptions()    sys, selenium.webdriver
Call Method    ${options}    add_argument    --user-agent=Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/28.0.1500.95 Safari/537.36 System/ComputerId
Create WebDriver    Chrome    chrome_options=${options}
Go To    http://www.useragentstring.com
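For anyone wondering why the first attempt failed silently: as far as I can tell, ${options.add_argument}= is treated as an attribute assignment, so it replaces the method with a string instead of calling it. The Python sketch below shows the difference using FakeOptions, a hypothetical stand-in I wrote for ChromeOptions so the example runs without Selenium installed.

```python
class FakeOptions:
    """Hypothetical stand-in for ChromeOptions; it only mimics add_argument."""

    def __init__(self):
        self.arguments = []

    def add_argument(self, argument):
        # Record a command-line switch to pass to the browser.
        self.arguments.append(argument)


USER_AGENT = (
    "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/28.0.1500.95 Safari/537.36 System/ComputerId"
)

# Roughly what the first attempt did: the assignment overwrites the method
# on this instance and never registers an argument.
broken = FakeOptions()
broken.add_argument = "user-agent=" + USER_AGENT

# Roughly what Call Method does: it invokes the method with the switch.
working = FakeOptions()
working.add_argument("--user-agent=" + USER_AGENT)

if __name__ == "__main__":
    print(broken.arguments)   # no switches recorded
    print(working.arguments)  # one --user-agent switch recorded
```

This would also explain why --start-maximized had no effect with the first form: no argument ever reached the options object.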
I tried setting the user agent in Chrome using RF, and the snippet below works fine for me:
${options}=    Evaluate    sys.modules['selenium.webdriver'].ChromeOptions()    sys, selenium.webdriver
${userAgent}=    Set Variable    --user-agent="Mozilla/5.0 (Linux; Android 4.2.1; en-us; Nexus 5 Build/JOP40D) AppleWebKit/535.19 (KHTML, like Gecko) Chrome/18.0.1025.166 Mobile Safari/535.19"
Call Method    ${options}    add_argument    ${userAgent}
Create WebDriver    Chrome    chrome_options=${options}
I have encountered a similar problem. I tried running your code but had no luck making it work; it just says that the user agent is not defined. I searched around and came across this code, but unfortunately it is written in Python:
from selenium import webdriver
from selenium.webdriver.chrome.options import Options

# Emulate a mobile device: screen metrics plus a mobile user agent string.
mobile_emulation = {
    "deviceMetrics": {"width": 360, "height": 640, "pixelRatio": 3.0},
    "userAgent": "Mozilla/5.0 (Linux; Android 4.2.1; en-us; Nexus 5 Build/JOP40D) AppleWebKit/535.19 (KHTML, like Gecko) Chrome/18.0.1025.166 Mobile Safari/535.19",
}
chrome_options = Options()
chrome_options.add_experimental_option("mobileEmulation", mobile_emulation)
# Older Selenium releases accept chrome_options=; newer ones expect options= instead.
driver = webdriver.Chrome(chrome_options=chrome_options)
I've built a web application that uses CORS to communicate with the API.
The API accepts all origins and some headers, and is written with the Play! Framework.
On every request made to the app, I add these headers:
override def doFilter(action: EssentialAction): EssentialAction = EssentialAction { request =>
action.apply(request).map(_.withHeaders(
"Access-Control-Allow-Origin" -> "*",
"Access-Control-Allow-Methods" -> "GET, POST, PUT, DELETE, OPTIONS",
"Access-Control-Allow-Headers" -> "Origin, X-Requested-With, Content-Type, Accept, Host, Api-Token"
))
}
Everything works great in Firefox and Chrome on the desktop (tested under Windows and Linux), but fails on my Android phone (using both the Chrome browser and the native browser).
I enabled Chrome debugging via USB, and I can clearly see that Chrome doesn't get any further than the OPTIONS request made to the server. Here's the request:
Request URL:http://127.0.0.1:9100/auth/login
Request Headers
Access-Control-Request-Headers:accept, origin, x-requested-with, content-type
Access-Control-Request-Method:POST
Origin:http://192.168.0.15:9000
Referer:http://192.168.0.15:9000/login
User-Agent:Mozilla/5.0 (Linux; Android 4.1.1; HTC One S Build/JRO03C) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/27.0.1453.90 Mobile Safari/537.36
And it stops here, with no response from the server (the Network tab shows "Pending" for this request).
So now I don't know what is wrong. How can I fix this problem?
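As a sanity check on the CORS configuration itself, the preflight above can be compared against the configured response headers. This is a simplified Python sketch with hypothetical helper names (parse_list, preflight_allowed); it ignores credentials, wildcards in the header lists, and CORS-safelisted headers.

```python
def parse_list(value):
    """Split a comma-separated header value into a set of lowercase tokens."""
    return {item.strip().lower() for item in value.split(",") if item.strip()}


def preflight_allowed(request_method, request_headers, allow_methods, allow_headers):
    """Return True if the requested method and headers are all allowed."""
    method_ok = request_method.strip().lower() in parse_list(allow_methods)
    headers_ok = parse_list(request_headers) <= parse_list(allow_headers)
    return method_ok and headers_ok


# Values copied from the filter and the OPTIONS request shown above.
ALLOW_METHODS = "GET, POST, PUT, DELETE, OPTIONS"
ALLOW_HEADERS = "Origin, X-Requested-With, Content-Type, Accept, Host, Api-Token"
REQUESTED_HEADERS = "accept, origin, x-requested-with, content-type"

if __name__ == "__main__":
    print(preflight_allowed("POST", REQUESTED_HEADERS, ALLOW_METHODS, ALLOW_HEADERS))
```

Under these simplifications the configured headers cover everything the preflight asks for, which suggests the request is failing before the CORS check is even reached.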
It may be a red herring, but I see in the headers you show the string:
Request URL:http://127.0.0.1:9100/auth/login
It seems to me that the page tries to connect to localhost (127.0.0.1). On the phone, that address refers to the phone itself, which has no server running, so the request fails.
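A quick way to catch this kind of configuration slip is to check whether the API URL points at a loopback address. The sketch below is a minimal Python example; is_loopback_url is a hypothetical helper name, and it only recognises IP literals and the name "localhost".

```python
import ipaddress
from urllib.parse import urlsplit


def is_loopback_url(url):
    """Return True if the URL's host is localhost or a loopback IP literal."""
    host = urlsplit(url).hostname
    if host is None:
        return False
    if host == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # The host is a regular name, not an IP literal.
        return False


if __name__ == "__main__":
    print(is_loopback_url("http://127.0.0.1:9100/auth/login"))
    print(is_loopback_url("http://192.168.0.15:9000/login"))
```

On the desktop the loopback URL happens to work because the API runs on the same machine as the browser; from the phone, the API would need to be addressed by the machine's LAN address instead.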
Today I was investigating something with Fiddler when I noticed that, whenever I launch Google Chrome, it always sends 3 HEAD requests to domains that seem to be randomly chosen.
Here is a sample:
HEAD http://fkgrekxzgo/ HTTP/1.1
Host: fkgrekxzgo
Proxy-Connection: keep-alive
Content-Length: 0
User-Agent: Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.31 (KHTML, like Gecko) Chrome/26.0.1410.64 Safari/537.31
Accept-Encoding: gzip,deflate,sdch
Do you have any idea why Google Chrome behaves this way?
Thanks guys
This is Chrome checking whether your ISP redirects non-resolving domains to a page such as your.isp.com/search?q=randomnonresolvingdomain
See https://mikewest.org/2012/02/chrome-connects-to-three-random-domains-at-startup
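The idea behind the probe can be sketched in Python as follows. This is not Chrome's implementation: it checks DNS resolution rather than issuing HEAD requests, the 10-letter length is taken from the sample hostname above, and all function names are my own.

```python
import random
import socket
import string


def random_hostname(rng, length=10):
    """Build a random lowercase hostname that should not exist."""
    return "".join(rng.choice(string.ascii_lowercase) for _ in range(length))


def resolve(hostname):
    """Return the resolved IP address, or None if the name does not resolve."""
    try:
        return socket.gethostbyname(hostname)
    except OSError:
        return None


def dns_is_hijacked(resolver=resolve, probes=3, rng=None):
    """Return True if every random probe resolves, which suggests redirection."""
    if rng is None:
        rng = random.Random()
    results = [resolver(random_hostname(rng)) for _ in range(probes)]
    return all(ip is not None for ip in results)
```

On a network that leaves failed lookups alone, none of the random names resolve and dns_is_hijacked() returns False; passing a custom resolver makes the logic easy to exercise without touching the network.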
This algorithm seems unusable behind forward proxy servers. The browser asks for a random page and the proxy always returns some page: an error (50x), a masked error (50x or 40x), or a nice "you are lost" page with HTTP code 200.