Puppeteer: avoiding Akamai detection - puppeteer

I am trying to scrape a page protected by Akamai. There used to be no problems, but for the past couple of days I have been having trouble.
To scrape the site I am using the latest version of Puppeteer, with puppeteer-extra and the stealth plugin.
I have found that it blocks me from the very first connection, so the problem is presumably caused by the browser information being sent.
I tested the browser parameters on https://bot.sannysoft.com/ and they all came out nominal, yet I am still getting blocked. Any ideas?
I will upload the .har file of the blocked request shortly.

Related

CORS still working after disabling browser web security

So I have been testing a website using Selenium, specifically a page with a credit card form embedded in an iframe. I want to access the content of said iframe but due to CORS I get the error message:
Uncaught DOMException: Blocked a frame with origin "<url>" from accessing a cross-origin frame.
I did a quick Google search and learned that you can bypass CORS using the "--disable-web-security" flag, so my code now looks like the following:
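import os
from selenium import webdriver
# Start Chrome with web security (the same-origin checks) disabled
options = webdriver.ChromeOptions()
options.add_argument("--disable-web-security")
self.driver = webdriver.Chrome(os.getenv("CHROME_DRIVER"), options=options)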
Surprisingly, the CORS exception keeps popping up and I'm stuck on where to go from here. I really do have to access the iframe's content; there isn't a workaround for this.
Since I was confused as to why this didn't work, I replicated the problem with another website, in this case Amazon, which functions similarly (credit card form embedded in an iframe). I ran the code with web security enabled and I get the same CORS error, as expected. But then I disable web security, exactly as mentioned before, and it works! I can now access the iframe.
I also downgraded from the current stable version of Chrome (88) to an older one (86), and it still doesn't work. I'm using Ubuntu 20.04.
So now I'm wondering - why isn't the flag working for the first scenario I mentioned? Is there a chance the first website is forcing the browser's web security or something related? I'm not an expert in web development so any input on this would be valuable.
It turns out that all I needed was to add --disable-site-isolation-trials and a non-empty --user-data-dir along with --disable-web-security.
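As a minimal sketch of the flag combination described above, the three arguments could be assembled like this. The helper name build_chrome_args and the temporary profile directory are assumptions made for illustration; only the flag names come from the answer.

```python
import os
import tempfile


def build_chrome_args(user_data_dir=None):
    """Return the three Chrome flags from the answer as a list of strings."""
    if not user_data_dir:
        # The answer asks for a non-empty profile directory. Creating a fresh
        # temporary one is an assumption of this sketch, not part of the answer.
        user_data_dir = tempfile.mkdtemp(prefix="chrome-dev-")
    return [
        "--disable-web-security",
        "--disable-site-isolation-trials",
        "--user-data-dir=" + os.path.abspath(user_data_dir),
    ]


# Possible use with Selenium (shown as comments so the sketch runs without it):
#   options = webdriver.ChromeOptions()
#   for arg in build_chrome_args():
#       options.add_argument(arg)
```

Keeping the flags in one function makes it easy to see that all three are passed together, which is what the answer says was needed.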
For me, adding these flags to my Chrome launcher fixed my CORS issues:
--disable-web-security --user-data-dir=~/chromeTemp
Please note that this only turns the warnings off; it is definitely not a CORS solution.

Google Chrome Translate Page Does Not Work

In Google Chrome, the Translate to English (or any other language) function was working fine, but all of a sudden it stopped working.
By looking at the console, I see error messages when the page tries to translate:
Failed to load resource: the server responded with a status of 403 () https://translate.googleapis.com/translate_a/t?anno=3&client=te_lib&format=html&v=1.0&key=no&logld=vTE_20170619_02&sl=da&tl=en&tc=1&tk=927511.556110&mode=1
I tried completely uninstalling Google Chrome and installing it again, but it still does not work. It also does not work in an Incognito window.
It works fine on other machines. Any ideas?
It looks like you don't have the API key:
key=no
Look at environment variables to see if GOOGLE_API_KEY is set to no and delete it.
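A quick way to check for this outside the browser is sketched below. The function name has_placeholder_api_key is hypothetical; the variable name GOOGLE_API_KEY and the value "no" are taken from the answer above, and treating an empty value as a placeholder too is an assumption of the sketch.

```python
import os


def has_placeholder_api_key(env=None, name="GOOGLE_API_KEY"):
    """Return True if the variable is set to a placeholder such as 'no'."""
    env = os.environ if env is None else env
    value = env.get(name)
    if value is None:
        return False  # variable not set at all, nothing to delete
    # Assumption: an empty value is as useless as the literal "no".
    return value.strip().lower() in ("", "no")


if __name__ == "__main__":
    if has_placeholder_api_key():
        print("GOOGLE_API_KEY holds a placeholder value; consider deleting it.")
    else:
        print("GOOGLE_API_KEY is unset or looks like a real value.")
```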
For now, I would just translate pages the long way, i.e., go to https://translate.google.com, type the page address into the input box, and open the address produced in the output box.
If you want to continue our discussion from the comments, I have created a chat for us to keep talking about your issue.

This webpage has a redirect loop (ERR_TOO_MANY_REDIRECTS)

We have a site that is not working in Google Chrome V44. It works well in IE and Firefox. All of a sudden, after updating Chrome to V44, we are unable to log in to the system and just get this error.
We're trying to figure out why this is happening. We have two instances of our system on our server. Our live site is the one that is not working in Chrome V44, while the other, our demo site, is fine. The only difference between the two sites is that the live one uses SSL. So our first impression is that Chrome V44 has a problem with our site's certificate.
I think Chrome can't establish a secure connection with the site.
Has anyone experienced this issue?
Please help. Thanks.
This is due to an SSL-related bug in Chrome V44: the browser incorrectly sends an HTTPS request header, which causes HTTP_HTTPS to be set on the server, although the HTTPS variable itself is still set correctly. It has been quite widely reported: http://www.zdnet.com/article/brand-new-chrome-44-release-added-a-bug/
https://ma.ttias.be/chrome-44-sending-https-header-by-mistake-breaking-web-applications-everywhere/
In order to stop this, in PHP, I added the following to the very top of my index.php file:
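<?php
// Chrome 44 workaround: on plain-HTTP requests, neutralize the HTTPS header the browser sends by mistake.
if (!isset($_SERVER['HTTPS'])) {
    $_SERVER['HTTP_HTTPS'] = 0;
}
?>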
Make sure there is no whitespace between the ?> and whatever follows it.
I've recently had the Chrome redirect loop on Gmail.
Possibly significantly, I had been doing some work that involved changing my system time, and Gmail hasn't worked since. This guide helped to do that.
There is a workaround, which is to use Gmail in incognito mode. That still works, although it requires you to log in each time.
In that case I would say this is an internal problem with your organization's setup. I would speak with your sysadmin or IT staff. But just to be sure, use your phone carrier's internet or a nearby cafe, basically anything off your network, to check whether you can reproduce the error.
The issue with my MVC solution was that I had recently updated all the NuGet packages in my solution. After the update I forgot to update the
<assemblyBinding xmlns="urn:schemas-microsoft-com:asm.v1">
<dependentAssembly>
section with the new DLL bindings that were installed during the update. Because of a connection string issue, I was not overwriting the current .config file on my hosting server. Once I updated the assemblyBinding section in the .config file, the issue was gone.
There can be many reasons for a redirect loop. If you are confident that your setup is correct, the problem might be in your browser. You can try the following:
Deleting cache and cookies
Correcting your system time (if it is not set to automatic)
Resetting the browser
Source
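To tell whether the loop comes from the server or from the browser's cookies and cache, it can help to trace the redirect chain outside the browser. The following is only a sketch: the names trace_redirects and http_location and the 20-hop limit are assumptions for illustration, not part of any answer above.

```python
import urllib.error
import urllib.request
from urllib.parse import urljoin


class _NoRedirect(urllib.request.HTTPRedirectHandler):
    """Stop urllib from following redirects so each hop can be inspected."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None  # the 3xx response then surfaces as an HTTPError


def http_location(url):
    """Return the absolute redirect target of url, or None if it does not redirect."""
    opener = urllib.request.build_opener(_NoRedirect)
    try:
        with opener.open(url, timeout=10):
            return None
    except urllib.error.HTTPError as err:
        if 300 <= err.code < 400:
            location = err.headers.get("Location")
            return urljoin(url, location) if location else None
        raise


def trace_redirects(start_url, get_location=http_location, max_hops=20):
    """Follow redirects hop by hop; return (chain, looped)."""
    chain = [start_url]
    seen = {start_url}
    url = start_url
    for _ in range(max_hops):
        nxt = get_location(url)
        if nxt is None:
            return chain, False  # reached a page that does not redirect
        chain.append(nxt)
        if nxt in seen:
            return chain, True  # same URL visited twice: a loop
        seen.add(nxt)
        url = nxt
    return chain, True  # hop limit reached; treated as "too many redirects"
```

If trace_redirects reports a loop here as well, with no cookies or cache involved, the cause is more likely on the server side than in the browser.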
You should be able to fix this problem by clearing the cookies in your browser:
Open your Chrome browser.
Type "chrome://settings/clearBrowserData" in the address bar and press Enter.
Make sure you are clearing items from the Beginning of time. Then select Cookies and other site data. Click the Clear browsing data button.
If you found this through a Google search, this tutorial may help you: https://windows10freeapps.com/fix-err_too_many_redirects-error-google-chrome-browser

Chrome HTTP2.0 throws SPDY PROTOCOL ERROR

I'm using Windows 10 Technical Preview. I know it's not yet polished for everyday use, but here is my problem.
I'm developing my web app on a local IIS. It loads most of its data via an ASP.NET MVC API. After the upgrade to Windows 10 I started to get
net::ERR_SPDY_PROTOCOL_ERROR
for all AJAX calls to the API. The HTML page loads normally, but the dynamic loading of data fails. I managed to work around it by starting Chrome with the parameters
--use-spdy=off --use-system-ssl
The strange thing is that on the first start I always get this error and have to restart Chrome. Other browsers fail too, but without a specific error. The transfer uses the HTTP/2 protocol, which is based on SPDY.
Do I have to turn something off in IIS?
Edit:
Seems like an IIS problem with HTTP2.0. When trying to enter the site from Windows 8.1 I get the same error.
Most people’s reaction to this error would be to reload the web page. We would actually recommend this as a first response. Sometimes this even works.
Interesting factoid: SPDY is pronounced “speedy” (it is not an acronym) and was designed to reduce a web page’s load time.
If reloading the web page does not remove the error, close Google Chrome and restart it. This alone will not really solve the problem, though: you also need to clear the cache as soon as the browser has restarted.
Alternatively, flush the DNS cache from cmd with this command: ipconfig /flushdns

Refused to display in a frame because it set 'X-Frame-Options' to 'SAMEORIGIN'

Following the video and sample code from https://developers.google.com/drive/web/quickstart/quickstart-js to upload a file from the local system to Google Drive, I got stuck on an error which states
"Refused to display 'https://accounts.google.com/o/oauth2/auth?client_id=%3CAIzaSyDa2kGIMQCLdfzk…%2Flocalhost&response_type=token&state=513052220%7C0.1330524626&authuser=0' in a frame because it set 'X-Frame-Options' to 'SAMEORIGIN'."
I registered a new client ID and updated the code with the new client ID.
I also disabled Chrome's web security by running "chrome.exe --user-data-dir="c:/temp/chromedev" --disable-web-security" from the Run dialog.
I faced the issue below while loading a PDF document from Google Drive into an iframe.
Problem:
Refused to display 'https://drive.google.com/file/d/1ipnylkWoSx4NS0GtmsQVFUq7ms339DAW/view' in a frame because it set 'X-Frame-Options' to 'sameorigin'.
Reason:
As far as I can tell, the issue occurs because the URL is not treated as an embeddable one, so it is not allowed to load from other websites.
Solution:
I even tried adding ?embedded=true to the URL, but still had no luck. Then I found a solution at https://support.google.com/drive/thread/34363118?hl=en: I replaced view with preview in my URL, and the problem was solved.
https://drive.google.com/file/d/1ipnylkWoSx4NS0GtmsQVFUq7ms339DAW/preview
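The view-to-preview swap described above can be sketched as a small helper. The function name to_preview_url is hypothetical, and the code only rewrites a final /view path segment, leaving any other URL untouched.

```python
from urllib.parse import urlsplit, urlunsplit


def to_preview_url(url):
    """Replace a trailing /view path segment with /preview."""
    parts = urlsplit(url)
    segments = parts.path.split("/")
    if segments and segments[-1] == "view":
        segments[-1] = "preview"
    # Query string and fragment, if any, are carried over unchanged.
    return urlunsplit(parts._replace(path="/".join(segments)))


if __name__ == "__main__":
    print(to_preview_url(
        "https://drive.google.com/file/d/1ipnylkWoSx4NS0GtmsQVFUq7ms339DAW/view"
    ))
```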
Disabling browser security is pointless, since the Google Realtime API is intended to work with default security.
I received this error when I supplied clientId = 'garbage'. Indeed, your client_id=<AIzaSyDa2kGIMQCLdfzk looks like garbage. A correct client_id should look like '1088706429537-4oqhqr7o826ditbok23sll1rund1jim1.apps.googleusercontent.com'
This is not related to disabling security in the Chrome browser.
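A rough sanity check along these lines is sketched below. It is a heuristic only: the function name classify_credential is hypothetical, the .apps.googleusercontent.com suffix comes from the example above, and the assumption that values starting with "AIza" are API keys rather than client IDs is based on common observation, not on a documented format.

```python
def classify_credential(value):
    """Guess whether a string looks like an OAuth client ID or an API key."""
    # Strip whitespace and stray angle brackets left over from placeholders.
    cleaned = value.strip().strip("<>")
    if cleaned.endswith(".apps.googleusercontent.com"):
        return "client_id"
    if cleaned.startswith("AIza"):
        # Assumption: this prefix usually indicates an API key.
        return "api_key"
    return "unknown"


if __name__ == "__main__":
    print(classify_credential("<AIzaSyDa2kGIMQCLdfzk"))
```

If the value passed as client_id is classified as anything other than "client_id", it is worth re-checking what was copied from the developer console.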
I believe there might be some issue with my XAMPP Windows localhost. Deploying the same application on a Node.js server, or hosting it on Dropbox/Google Drive as a web app, works fine.
I don't know if this is still relevant, but I got the same error when loading a Google Doc in an iframe. After stripping the page of everything I had, removing the doctype solved my issue.
After this I found out there is a parameter, ?embedded=true, which allowed me to load it in a fancybox, which looks nicer anyway. Hope it helps someone; sorry if it's not relevant for you.