I am using the snippets below in my script, but the internet connection never actually gets disconnected.
1st try:

await browser.setNetworkConditions({
    latency: 0,
    throughput: 0,
    offline: true
});

2nd try:

browser.setNetworkConnection(0) // airplane mode off, wifi off, data off

3rd try:

await browser.throttle({
    offline: true,
    downloadThroughput: 200 * 1024 / 8,
    uploadThroughput: 200 * 1024 / 8,
    latency: 20
})
I found these commands in the official WebdriverIO documentation, but none of them work for me:
browser.throttle
setNetworkConnection
setNetworkConditions
Could anyone suggest the best way to do this?
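One approach that may be worth trying (a sketch, not verified against this setup): in Chromium-based sessions WebdriverIO can hand you the underlying Puppeteer instance via browser.getPuppeteer(), and Puppeteer's page.setOfflineMode() toggles the DevTools network emulation directly. Picking the first open page below is an assumption.

// Sketch: toggle Chrome's network emulation through the Puppeteer bridge.
// Only works for Chromium/DevTools-capable sessions; run inside an async test.
const puppeteer = await browser.getPuppeteer();
const page = (await puppeteer.pages())[0]; // assume the first open tab is the one under test

await page.setOfflineMode(true);   // cut the connection
// ... exercise the app's offline behaviour here ...
await page.setOfflineMode(false);  // restore connectivity afterwards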
I'm experiencing a console error in both Chrome and Firefox every few minutes, although events keep coming through. This carries on for a while until eventually Chrome logs a 520 error and Firefox reports that it 'can't establish a connection to the server', at which point the EventSource breaks completely.
I tested this locally and had no issues at all in either browser. The difference in our production environment is that we are behind an nginx proxy and CloudFlare security.
These are the headers I'm using in the backend:
'Connection': 'keep-alive',
'Content-Type': 'text/event-stream',
'Cache-Control': 'no-cache',
'X-Accel-Buffering': 'no'
This is the backend code (Node.js):
res.writeHead(200, headers);
res.setTimeout(1000 * 60 * 120)
res.flushHeaders()
req.on('close', () => {
res.end()
})
res.on('timeout', () => res.end())
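For reference, the part of the handler that actually writes to the stream is not shown above; a minimal sketch of what it typically looks like is below. The sendEvent helper and the 30-second heartbeat interval are illustrative, not taken from the original code.

// Illustrative only: push data in the text/event-stream wire format.
const sendEvent = (data) => res.write(`data: ${JSON.stringify(data)}\n\n`)

// Periodic comment lines stop intermediaries from treating the connection
// as idle; EventSource clients silently ignore lines starting with ':'.
const heartbeat = setInterval(() => res.write(': ping\n\n'), 30000)
req.on('close', () => clearInterval(heartbeat))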
And these are the nginx configurations I have tried:
proxy_set_header Connection '';
proxy_http_version 1.1;
chunked_transfer_encoding off;
proxy_buffering off;
proxy_cache off;
http2_max_field_size 64k;
http2_max_header_size 512k;
proxy_read_timeout 12h;
Front-end code
let source = new EventSource('/events')
source.onerror = (e) => console.log(e)
source.onmessage = (msg) => console.log(msg)
(Chrome console and Firefox console screenshots not reproduced here; they show the repeated connection errors described above.)
Appreciate any advice. Thanks.
I went through this myself and it turned out to be caused by the CloudFlare proxy. It seems CloudFlare terminates the connection after some time.
To solve this, simply turn off the CloudFlare proxy as described here: https://docs.readme.com/docs/turn-off-cloudflare-proxy
Make sure to create a subdomain so you don't disable proxying for your entire website.
I am trying to configure Envoy as a load balancer and am currently stuck on fallbacks. In my playground cluster I have 3 backend servers and Envoy as a front proxy. I generate some traffic against Envoy using siege and watch the responses; while doing this, I stop one of the backends.
What I want: Envoy should resend the failed requests from the stopped backend to a healthy one, so that I never get any 5xx responses.
What I get: when I stop a backend I get some 503 responses, and then everything becomes normal again.
What am I doing wrong? I thought fallback_policy should provide this behaviour, but it doesn't work.
Here is my config file:
node:
  id: LoadBalancer_01
  cluster: HighloadCluster
admin:
  access_log_path: /var/log/envoy/admin_access.log
  address:
    socket_address: { address: 0.0.0.0, port_value: 9901 }
static_resources:
  listeners:
  - name: http_listener
    address:
      socket_address: { address: 0.0.0.0, port_value: 80 }
    filter_chains:
    - filters:
      - name: envoy.http_connection_manager
        typed_config:
          "@type": type.googleapis.com/envoy.config.filter.network.http_connection_manager.v2.HttpConnectionManager
          stat_prefix: ingress_http
          codec_type: AUTO
          route_config:
            name: request_route
            virtual_hosts:
            - name: local_service
              domains: ["*"]
              require_tls: NONE
              routes:
              - match: { prefix: "/" }
                route:
                  cluster: backend_service
                  timeout: 1.5s
                  retry_policy:
                    retry_on: 5xx
                    num_retries: 3
                    per_try_timeout: 3s
          http_filters:
          - name: envoy.router
            typed_config:
              "@type": type.googleapis.com/envoy.config.filter.http.router.v2.Router
          access_log:
          - name: envoy.file_access_log
            typed_config:
              "@type": type.googleapis.com/envoy.config.accesslog.v2.FileAccessLog
              path: /var/log/envoy/access.log
  clusters:
  - name: backend_service
    connect_timeout: 0.25s
    type: STATIC
    lb_policy: ROUND_ROBIN
    lb_subset_config:
      fallback_policy: ANY_ENDPOINT
    outlier_detection:
      consecutive_5xx: 1
      interval: 10s
    load_assignment:
      cluster_name: backend_service
      endpoints:
      - lb_endpoints:
        - endpoint:
            address:
              socket_address:
                address: 1.1.1.1
                port_value: 10000
        - endpoint:
            address:
              socket_address:
                address: 2.2.2.2
                port_value: 10000
        - endpoint:
            address:
              socket_address:
                address: 3.3.3.3
                port_value: 10000
    health_checks:
    - http_health_check:
        path: /api/liveness-probe
      timeout: 1s
      interval: 30s
      unhealthy_interval: 10s
      unhealthy_threshold: 2
      healthy_threshold: 1
      always_log_health_check_failures: true
      event_log_path: /var/log/envoy/health_check.log
TL;DR
You can use a circuit breaker (see the config example below), along with your retry_policy and outlier_detection.
Explanation
Context
I have successfully reproduced your issue with your config (except for the health_checks part, which was not needed to reproduce the problem).
I ran Envoy and my backend (2 load-balanced apps) and generated some traffic with hey (50 workers making requests concurrently for 10 seconds):
hey -c 50 -z 10s http://envoy:8080
I stopped one backend app around 5 seconds after the command started.
Result
Digging into Envoy's admin /stats endpoint, I noticed something interesting:
cluster.backend_service.upstream_rq_200: 17899
cluster.backend_service.upstream_rq_503: 28
cluster.backend_service.upstream_rq_retry_overflow: 28
cluster.backend_service.upstream_rq_retry_success: 3
cluster.backend_service.upstream_rq_total: 17930
There were indeed 28 responses with status 503 when I stopped one backend app. But retry did work to some extent: 3 retries were successful (upstream_rq_retry_success), while 28 other retries failed (upstream_rq_retry_overflow), resulting in 503 errors. Why?
From the cluster stats docs:
upstream_rq_retry_overflow : Total requests not retried due to circuit breaking or exceeding the retry budget
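(For reference, these counters can be watched live from Envoy's admin interface while the test runs; with the admin listener from the question's config on port 9901, something like curl -s http://localhost:9901/stats | grep backend_service.upstream_rq does the job.)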
Fix
To solve this, we can add a circuit breaker to the cluster (I have been generous with the max_requests, max_pending_requests and max_retries parameters for the example). The interesting part is the retry_budget.budget_percent value:
clusters:
- name: backend_service
  connect_timeout: 0.25s
  type: STRICT_DNS
  lb_policy: ROUND_ROBIN
  outlier_detection:
    consecutive_5xx: 1
    interval: 10s
  circuit_breakers:
    thresholds:
    - priority: "DEFAULT"
      max_requests: 0xffffffff
      max_pending_requests: 0xffffffff
      max_retries: 0xffffffff
      retry_budget:
        budget_percent:
          value: 100.0
From the retry_budget docs:
budget_percent: Specifies the limit on concurrent retries as a percentage of the sum of active requests and active pending requests. For example, if there are 100 active requests and the budget_percent is set to 25, there may be 25 active retries.
This parameter is optional. Defaults to 20%.
I set it to 100.0 to allow 100% of active retries.
When running the example again with this new config, there is no more upstream_rq_retry_overflow, so no more 503 errors:
cluster.backend_service.upstream_rq_200: 17051
cluster.backend_service.upstream_rq_retry_overflow: 0
cluster.backend_service.upstream_rq_retry_success: 5
cluster.backend_service.upstream_rq_total: 17056
Note that if you experience upstream_rq_retry_limit_exceeded, you can try to set and increase retry_budget.min_retry_concurrency (default when not set is 3):
retry_budget:
  budget_percent:
    value: 100.0
  min_retry_concurrency: 10
I wrote a simple Selenium test (opening a page of a secured site) in headless mode using the Chrome 59 beta.
I'm getting the following exception while executing my code; it is thrown while initializing the driver.
When I rerun the script with the headless option (options.addArguments("headless")) commented out, Chrome opens and the test runs fine, but my objective is to run it headless. Could you please share your thoughts on resolving this?
Exception:
Starting ChromeDriver 2.29.461585 (0be2cd95f834e9ee7c46bcc7cf405b483f5ae83b) on port 4971
Only local connections are allowed.
Exception in thread "main" org.openqa.selenium.WebDriverException: unknown error: Chrome failed to start: exited abnormally
(Driver info: chromedriver=2.29.461585 (0be2cd95f834e9ee7c46bcc7cf405b483f5ae83b),platform=Mac OS X 10.12.2 x86_64) (WARNING: The server did not provide any stacktrace information)
Command duration or timeout: 60.14 seconds
Build info: version: '3.4.0', revision: 'unknown', time: 'unknown'
Here are the steps:
I'm using Scala with SBT on macOS.
Chrome 59 beta version
ChromeDriver 2.29 release version
Added the following dependencies
"org.seleniumhq.selenium" % "selenium-chrome-driver" % "3.4.0"
"org.seleniumhq.selenium" % "selenium-support" % "3.4.0"
"net.lightbody.bmp" % "browsermob-core" % "2.1.4"
Scala Code:
import net.lightbody.bmp.BrowserMobProxyServer
import net.lightbody.bmp.client.ClientUtil
import net.lightbody.bmp.proxy.auth.AuthType
import org.openqa.selenium.WebDriver
import org.openqa.selenium.chrome.{ChromeDriver, ChromeOptions}
import org.openqa.selenium.remote.{CapabilityType, DesiredCapabilities}

val username = "username"
val password = "password"
val domainname = "yoursecuredomain.com"

// Start BrowserMob proxy on a random port and register basic-auth credentials
val browserMobProxyServer = new BrowserMobProxyServer()
browserMobProxyServer.start(0)
browserMobProxyServer.autoAuthorization(domainname, username, password, AuthType.BASIC)
val seleniumProxy = ClientUtil.createSeleniumProxy(browserMobProxyServer)

// Chrome options: headless mode against the Chrome binary on macOS
val options = new ChromeOptions()
options.addArguments("headless")
options.addArguments("--disable-gpu")
options.setBinary("""/Applications/Google Chrome.app/Contents/MacOS/Google Chrome""")

val desiredCapabilities = new DesiredCapabilities()
desiredCapabilities.setCapability(ChromeOptions.CAPABILITY, options)
desiredCapabilities.setCapability(CapabilityType.PROXY, seleniumProxy)

val driver: WebDriver = new ChromeDriver(desiredCapabilities)
val baseUrlString = s"""https://$domainname"""
driver.navigate().to(baseUrlString)
Thread.sleep(3000)
println("title: " + driver.getTitle)

driver.quit()
browserMobProxyServer.abort()
The ChromeDriver 2.29 release notes say:
----------ChromeDriver v2.29 (2017-04-04)----------
Supports Chrome v56-58
so you have to downgrade your Chrome version to one that is compatible with that ChromeDriver release.
source:
https://chromedriver.storage.googleapis.com/2.29/notes.txt
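As an aside, if you end up with more than one ChromeDriver build installed while matching versions, you can point Selenium at a specific binary before creating the driver (the path below is illustrative, not from the original setup):

// Select a specific ChromeDriver binary; adjust the path to your installation.
System.setProperty("webdriver.chrome.driver", "/usr/local/bin/chromedriver")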
Hello, how can I get my CrawlSpider to work? I am able to log in, but nothing happens afterwards and nothing gets scraped. I have also been reading the Scrapy docs and I really don't understand which rules to use for scraping. Why does nothing happen after "Successfully logged in. Let's start crawling!"?
I also had the rule below at the end of my else statement, but removed it because, being inside the else block, it was never used. I then moved it to the top of the start_requests() method, but got errors, so I removed my rules:
rules = (
    Rule(extractor, callback='parse_item', follow=True),
)
my code:
from scrapy.contrib.spiders.init import InitSpider
from scrapy.http import Request, FormRequest
from scrapy.contrib.linkextractors.sgml import SgmlLinkExtractor
from scrapy.contrib.spiders import Rule
from scrapy.contrib.spiders import CrawlSpider, Rule
from scrapy.spider import BaseSpider
from scrapy.selector import HtmlXPathSelector
from linkedconv.items import LinkedconvItem
class LinkedPySpider(CrawlSpider):
    name = 'LinkedPy'
    allowed_domains = ['linkedin.com']
    login_page = 'https://www.linkedin.com/uas/login'
    # start_urls = ["http://www.linkedin.com/csearch/results?type=companies&keywords=&pplSearchOrigin=GLHD&pageKey=member-home&search=Search#facets=pplSearchOrigin%3DFCTD%26keywords%3D%26search%3DSubmit%26facet_CS%3DC%26facet_I%3D80%26openFacets%3DJO%252CN%252CCS%252CNFR%252CF%252CCCR%252CI"]
    start_urls = ["http://www.linkedin.com/csearch/results"]

    def start_requests(self):
        yield Request(
            url=self.login_page,
            callback=self.login,
            dont_filter=True
        )

    # def init_request(self):
    #     """This function is called before crawling starts."""
    #     return Request(url=self.login_page, callback=self.login)

    def login(self, response):
        """Generate a login request."""
        return FormRequest.from_response(response,
            formdata={'session_key': 'myemail@gmail.com', 'session_password': 'mypassword'},
            callback=self.check_login_response)

    def check_login_response(self, response):
        """Check the response returned by a login request to see if we are successfully logged in."""
        if "Sign Out" in response.body:
            self.log("\n\n\nSuccessfully logged in. Let's start crawling!\n\n\n")
            # Now the crawling can begin..
            self.log('Hi, this is an item page! %s' % response.url)
            return
        else:
            self.log("\n\n\nFailed, Bad times :(\n\n\n")
            # Something went wrong, we couldn't log in, so nothing happens.

    def parse_item(self, response):
        self.log("\n\n\n We got data! \n\n\n")
        self.log('Hi, this is an item page! %s' % response.url)
        hxs = HtmlXPathSelector(response)
        sites = hxs.select('//ol[@id=\'result-set\']/li')
        items = []
        for site in sites:
            item = LinkedconvItem()
            item['title'] = site.select('h2/a/text()').extract()
            item['link'] = site.select('h2/a/@href').extract()
            items.append(item)
        return items
My output:
C:\Users\ye831c\Documents\Big Data\Scrapy\linkedconv>scrapy crawl LinkedPy
2013-07-12 13:39:40-0500 [scrapy] INFO: Scrapy 0.16.5 started (bot: linkedconv)
2013-07-12 13:39:40-0500 [scrapy] DEBUG: Enabled extensions: LogStats, TelnetConsole, CloseSpider, WebService, CoreStats, SpiderState
2013-07-12 13:39:41-0500 [scrapy] DEBUG: Enabled downloader middlewares: HttpAuthMiddleware, DownloadTimeoutMiddleware, UserAgentMiddleware, RetryMiddleware, DefaultHeadersMiddleware, RedirectMiddleware, CookiesMiddleware, HttpCompressionMiddleware, ChunkedTransferMiddleware, DownloaderStats
2013-07-12 13:39:41-0500 [scrapy] DEBUG: Enabled spider middlewares: HttpErrorMiddleware, OffsiteMiddleware, RefererMiddleware, UrlLengthMiddleware, DepthMiddleware
2013-07-12 13:39:41-0500 [scrapy] DEBUG: Enabled item pipelines:
2013-07-12 13:39:41-0500 [LinkedPy] INFO: Spider opened
2013-07-12 13:39:41-0500 [LinkedPy] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2013-07-12 13:39:41-0500 [scrapy] DEBUG: Telnet console listening on 0.0.0.0:6023
2013-07-12 13:39:41-0500 [scrapy] DEBUG: Web service listening on 0.0.0.0:6080
2013-07-12 13:39:41-0500 [LinkedPy] DEBUG: Crawled (200) <GET https://www.linkedin.com/uas/login> (referer: None)
2013-07-12 13:39:42-0500 [LinkedPy] DEBUG: Redirecting (302) to <GET http://www.linkedin.com/nhome/> from <POST https://www.linkedin.com/uas/login-submit>
2013-07-12 13:39:45-0500 [LinkedPy] DEBUG: Crawled (200) <GET http://www.linkedin.com/nhome/> (referer: https://www.linkedin.com/uas/login)
2013-07-12 13:39:45-0500 [LinkedPy] DEBUG:
Successfully logged in. Let's start crawling!
2013-07-12 13:39:45-0500 [LinkedPy] DEBUG: Hi, this is an item page! http://www.linkedin.com/nhome/
2013-07-12 13:39:45-0500 [LinkedPy] INFO: Closing spider (finished)
2013-07-12 13:39:45-0500 [LinkedPy] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 1670,
 'downloader/request_count': 3,
 'downloader/request_method_count/GET': 2,
 'downloader/request_method_count/POST': 1,
 'downloader/response_bytes': 65218,
 'downloader/response_count': 3,
 'downloader/response_status_count/200': 2,
 'downloader/response_status_count/302': 1,
 'finish_reason': 'finished',
 'finish_time': datetime.datetime(2013, 7, 12, 18, 39, 45, 136000),
 'log_count/DEBUG': 11,
 'log_count/INFO': 4,
 'request_depth_max': 1,
 'response_received_count': 2,
 'scheduler/dequeued': 3,
 'scheduler/dequeued/memory': 3,
 'scheduler/enqueued': 3,
 'scheduler/enqueued/memory': 3,
 'start_time': datetime.datetime(2013, 7, 12, 18, 39, 41, 50000)}
2013-07-12 13:39:45-0500 [LinkedPy] INFO: Spider closed (finished)
Right now, the crawling ends in check_login_response() because Scrapy has not been told to do anything more.
1st request to the login page using start_requests(): OK
2nd request to POST the login information: OK
whose response is parsed with check_login_response()... and that's it
Indeed check_login_response() returns nothing. To keep the crawling going, you need to return Request instances (that tell Scrapy what pages to fetch next, see Scrapy documentation on Spiders' callbacks)
So, inside check_login_response(), you need to return a Request instance to the starting page containing the links you want to crawl next, probably some of the URLs you defined in start_urls.
def check_login_response(self, response):
    """Check the response returned by a login request to see if we are successfully logged in."""
    if "Sign Out" in response.body:
        self.log("\n\n\nSuccessfully logged in. Let's start crawling!\n\n\n")
        # Now the crawling can begin..
        return Request(url='http://linkedin.com/page/containing/links')
By default, if you do not set a callback for your Request, the spider calls its parse() method (http://doc.scrapy.org/en/latest/topics/spiders.html#scrapy.spider.BaseSpider.parse).
In your case, it will call CrawlSpider's built-in parse() method for you automatically, which applies the Rules you have defined to get next pages.
You must define your CrawlSpider rules within a rules attribute of your spider class, just as you did for name, allowed_domains etc., at the same level.
http://doc.scrapy.org/en/latest/topics/spiders.html#crawlspider-example provides example Rules. The main idea is that you tell the extractor what kind of absolute URL you are interested in within the page, using regular expression(s) in allow. If you do not set allow in your SgmlLinkExtractor, it will match all links.
And each Rule should have a callback to use for these links, in your case parse_item().
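Putting those pieces together, a rules attribute for this spider might look roughly like the sketch below. The allow pattern is only a guess at the company-search URLs you care about, so adjust it to the links you actually want to follow.

# Sketch only: follow company-search result links and hand them to parse_item().
rules = (
    Rule(SgmlLinkExtractor(allow=(r'/csearch/results',)),
         callback='parse_item',
         follow=True),
)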
Good luck with parsing LinkedIn pages, I think a lot of what's in the pages is generated via Javascript and may not be inside the HTML content fetched by Scrapy.
I have a small Grails application running on Tomcat on Ubuntu on a VPS. I use MySQL as my datastore and everything works fine unless I leave the application alone for more than about half a day (8 hours?). I did some searching, and apparently this is the default wait_timeout in mysql.cnf, so after 8 hours the connection dies but Tomcat doesn't know, and the next user who tries to view the site sees the connection-failure error. Refreshing the page fixes this, but I want to get rid of the error altogether. For my version of MySQL (5.0.75) I only have my.cnf, and it doesn't contain such a parameter; in any case, changing this parameter doesn't solve the problem.
This blog post seems to report a similar error, but I still don't fully understand what I need to configure to fix it, and I'm hoping there is a simpler solution than yet another third-party library. The machine I'm running on has 256 MB of RAM and I'm trying to keep the number of running programs/services to a minimum.
Is there something I can configure in Grails / Tomcat / MySql to get this to go away?
Thanks in advance,
Gav
From my catalina.out:
2010-04-29 21:26:25,946 [http-8080-2] ERROR util.JDBCExceptionReporter - The last packet successfully received from the server was 102,906,722 milliseconds$
2010-04-29 21:26:25,994 [http-8080-2] ERROR errors.GrailsExceptionResolver - Broken pipe
java.net.SocketException: Broken pipe
at java.net.SocketOutputStream.socketWrite0(Native Method)
...
2010-04-29 21:26:26,016 [http-8080-2] ERROR util.JDBCExceptionReporter - Already closed.
2010-04-29 21:26:26,016 [http-8080-2] ERROR util.JDBCExceptionReporter - Already closed.
2010-04-29 21:26:26,017 [http-8080-2] ERROR servlet.GrailsDispatcherServlet - HandlerInterceptor.afterCompletion threw exception
org.hibernate.exception.GenericJDBCException: Cannot release connection
at java.lang.Thread.run(Thread.java:619)
Caused by: java.sql.SQLException: Already closed.
at org.apache.commons.dbcp.PoolableConnection.close(PoolableConnection.java:84)
at org.apache.commons.dbcp.PoolingDataSource$PoolGuardConnectionWrapper.close(PoolingDataSource.java:181)
... 1 more
Referring to this article, you have stale connections in your DBCP connection pool that are silently dropped by the OS or a firewall.
The solution is to define a validation query and do a sanity check of the connection before you actually use it in your application.
In Grails this is done by modifying the grails-app/conf/spring/resources.groovy file and adding the following:
beans = {
    dataSource(BasicDataSource) {
        // run the evictor every 30 minutes and evict any connections idle for more than 30 minutes
        minEvictableIdleTimeMillis = 1800000
        timeBetweenEvictionRunsMillis = 1800000
        numTestsPerEvictionRun = 3
        // test the connection while it is idle, before borrowing and after returning it
        testOnBorrow = true
        testWhileIdle = true
        testOnReturn = true
        validationQuery = "SELECT 1"
    }
}
In Grails 1.3.x you can set the evictor values in the DataSource.groovy file to make sure pooled connections are exercised while idle. This ensures the MySQL server will not time out the connection.
production {
    dataSource {
        pooled = true
        // Other database parameters..
        properties {
            maxActive = 50
            maxIdle = 25
            minIdle = 5
            initialSize = 5
            minEvictableIdleTimeMillis = 1800000
            timeBetweenEvictionRunsMillis = 1800000
            maxWait = 10000
        }
    }
}
A quick way to verify this works is to edit the MySQL my.cnf configuration file and, in the [mysqld] section, set the wait_timeout parameter to a low value.
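For example (the 60-second value is just for testing, so that stale connections show up quickly):

[mysqld]
wait_timeout = 60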
Try increasing the number of open MySQL connections by putting the following in your DataSource.groovy:
dataSource {
    driverClassName = "com.mysql.jdbc.Driver"
    pooled = true
    maxActive = 10
    initialSize = 5
    // Remaining connection params
}
If you want to go the whole hog, try implementing a connection pool; here is a useful link on this.
For Grails 1.3.x, I had to add the following code to BootStrap.groovy:
def init = { servletContext ->
    def ctx = servletContext.getAttribute(ApplicationAttributes.APPLICATION_CONTEXT)
    // implement test on borrow
    def dataSource = ctx.dataSource
    dataSource.targetDataSource.setMinEvictableIdleTimeMillis(1000 * 60 * 30)
    dataSource.targetDataSource.setTimeBetweenEvictionRunsMillis(1000 * 60 * 30)
    dataSource.targetDataSource.setNumTestsPerEvictionRun(3)
    dataSource.targetDataSource.setTestOnBorrow(true)
    dataSource.targetDataSource.setTestWhileIdle(true)
    dataSource.targetDataSource.setTestOnReturn(false)
    dataSource.targetDataSource.setValidationQuery("SELECT 1")
}
I also had to import org.codehaus.groovy.grails.commons.ApplicationAttributes
Add these parameters to your dataSource:
testOnBorrow = true
testWhileIdle = true
testOnReturn = true
See this article for more information
http://sacharya.com/grails-dbcp-stale-connections/
Starting from Grails 2.3.6, the default configuration already has options that prevent connections from being closed by timeout.
These are the new defaults:
properties {
    // See http://grails.org/doc/latest/guide/conf.html#dataSource for documentation
    ....
    minIdle = 5
    maxIdle = 25
    maxWait = 10000
    maxAge = 10 * 60000
    timeBetweenEvictionRunsMillis = 5000
    minEvictableIdleTimeMillis = 60000
    validationQuery = "SELECT 1"
    validationQueryTimeout = 3
    validationInterval = 15000
    testOnBorrow = true
    testWhileIdle = true
    testOnReturn = false
    jdbcInterceptors = "ConnectionState;StatementCache(max=200)"
    defaultTransactionIsolation = java.sql.Connection.TRANSACTION_READ_COMMITTED
}
What does your JDBC connection string look like? You can set an autoReconnect param in your data source config, e.g.
jdbc:mysql://hostname/mydb?autoReconnect=true
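In a Grails DataSource.groovy this would look something like the snippet below (the hostname and database name are placeholders):

dataSource {
    driverClassName = "com.mysql.jdbc.Driver"
    url = "jdbc:mysql://hostname/mydb?autoReconnect=true"
    // username, password and the remaining settings as before
}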