Should pika's channel.basic_publish time out sooner if there is no network connection? - pika

I'm in the process of upgrading to pika 1.1.0 and performed some sanity testing.
I have:
1. placed a breakpoint at the basic_publish call below
2. disconnected the network
3. stepped over the call
...and no exception was thrown. Is this expected?
channel.basic_publish(
    exchange=EXCHANGE,
    routing_key=ROUTING_KEY,
    body=message,
    properties=pika.BasicProperties(
        delivery_mode=MQ_TRANSIENT_DELIVERY_MODE,
        headers=headers,
    )
)
The connection is created with:
connection = pika.BlockingConnection(pika.ConnectionParameters(
    host=rabbit_config.host,
    credentials=credentials,
    port=rabbit_config.port,
    connection_attempts=1,
    blocked_connection_timeout=10,
    retry_delay=5,
    socket_timeout=20,
    heartbeat=30,
))
Update:
If I call channel.confirm_delivery() before this, I do successfully get an AMQPError.
However, it doesn't arrive for 60 seconds (which doesn't match any of my ConnectionParameters). How can I make it notice the connection loss more quickly?
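For reference, here is a minimal, self-contained sketch of the publish-with-confirms flow described in the update above; the host, exchange and routing key are placeholders rather than values from the question:

import pika

connection = pika.BlockingConnection(pika.ConnectionParameters(
    host='localhost',               # placeholder
    connection_attempts=1,
    blocked_connection_timeout=10,
    retry_delay=5,
    socket_timeout=20,
    heartbeat=30,
))
channel = connection.channel()
channel.confirm_delivery()  # publisher confirms: basic_publish now waits for the broker's ack

try:
    channel.basic_publish(
        exchange='my_exchange',        # placeholder
        routing_key='my.routing.key',  # placeholder
        body=b'message',
        properties=pika.BasicProperties(delivery_mode=1),
    )
except pika.exceptions.AMQPError:
    # With confirms enabled, a dead link eventually surfaces here; how quickly
    # it is noticed generally depends on the heartbeat/socket settings rather
    # than on basic_publish itself.
    pass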

Related

KCL stops processing data after throwing error Cancelling subscription, and restarting

The KCL logs this error:
ShardConsumerSubscriber:131 - shardId-000000000000: Last request was dispatched at 2020-04-28T12:57:25.166Z, but no response as of 2020-04-28T12:58:00.435Z (PT35.269S). Cancelling subscription, and restarting.
But it never actually restarts the subscription, and no data is processed after that.
Maven dependency used
<dependency>
    <groupId>software.amazon.kinesis</groupId>
    <artifactId>amazon-kinesis-client</artifactId>
    <version>2.2.2</version>
</dependency>
And the Kinesis configuration
KinesisAsyncClient kinesisClient = KinesisAsyncClient.builder()
    .credentialsProvider(new MyCredentialProvider(configVals))
    .region(region)
    .build();

InitialPositionInStreamExtended initialPositionInStreamExtended = InitialPositionInStreamExtended
    .newInitialPosition(InitialPositionInStream.TRIM_HORIZON);

RetrievalConfig retrievalConfig = configsBuilder.retrievalConfig()
    .retrievalSpecificConfig(new PollingConfig(configVals.getStreamName(), kinesisClient)
        .idleTimeBetweenReadsInMillis(10000)
        .maxRecords(50)
        .kinesisRequestTimeout(Duration.ofSeconds(100)));
retrievalConfig.initialPositionInStreamExtended(initialPositionInStreamExtended);
As stated in the issue "KCL 2.0 stops consuming from some shards #463", the most likely cause is that the record processor cannot process its input within 35 seconds. This value is hardcoded in ShardConsumer as the MAX_TIME_BETWEEN_REQUEST_RESPONSE constant and was increased to 1 minute in later versions (I'm using KCL 2.2.10, where it is already 60 seconds).
I've also experienced this issue and additionally had to override the Kinesis client's HTTP client settings to set higher timeouts, something like this:
val kinesisClient = KinesisClientUtil
  .adjustKinesisClientBuilder(
    KinesisAsyncClient.builder.credentialsProvider(credentialsProvider).region(awsRegion)
  )
  .httpClientBuilder(
    NettyNioAsyncHttpClient.builder
      .maxConcurrency(Integer.MAX_VALUE)
      .maxPendingConnectionAcquires(kinesisHttpMaxPendingConnectionAcquires)
      .maxHttp2Streams(kinesisHttpMaxConnections)
      .connectionAcquisitionTimeout(kinesisHttpConnectionAcquisitionTimeout)
      .connectionTimeout(kinesisHttpConnectionTimeout)
      .readTimeout(kinesisHttpReadTimeout)
      .http2Configuration(
        Http2Configuration.builder
          .initialWindowSize(10 * 1024 * 1024)
          .healthCheckPingPeriod(kinesisHttpHealthCheckPingPeriod)
          .maxStreams(kinesisHttpMaxConnections.longValue())
          .build
      )
      .protocol(Protocol.HTTP2)
  )
  .build()
I've set all the timeouts to 120 seconds and it did the trick.

AWS Aurora Serverless - Communication Link Failure

I'm using a MySQL Aurora Serverless cluster (with the Data API enabled) from my Python code and I am getting a communications link failure exception. This usually occurs when the cluster has been dormant for some time.
Once the cluster is active, I get no error, but I have to send 3-4 requests each time before it works fine.
Exception detail:
The last packet sent successfully to the server was 0 milliseconds ago. The driver has not received any packets from the server. An error occurred (BadRequestException) when calling the ExecuteStatement operation: Communications link failure
How can I solve this issue? I am using the standard boto3 library.
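For context, a Data API call of the sort involved here looks roughly like the following; the ARNs, database name and SQL are placeholders, not values from the question:

import boto3

rds_data = boto3.client('rds-data')

response = rds_data.execute_statement(
    resourceArn='arn:aws:rds:us-east-1:123456789012:cluster:my-cluster',         # placeholder
    secretArn='arn:aws:secretsmanager:us-east-1:123456789012:secret:my-secret',  # placeholder
    database='mydb',                                                             # placeholder
    sql='SELECT 1',
)
# While the cluster is still waking up, this call fails with
# BadRequestException: Communications link failure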
Here is the reply from AWS Premium Business Support.
Summary: It is an expected behavior
Detailed Answer:
I can see that you receive this error when your Aurora Serverless instance is inactive, and you stop receiving it once your instance is active and accepting connections. Please note that this is expected behavior. In general, Aurora Serverless works differently than provisioned Aurora: while the cluster is "dormant" it has no compute resources assigned to it, and when a DB connection is received, compute resources are assigned. Because of this behavior, you will have to "wake up" the cluster, and it may take a few minutes for the first connection to succeed, as you have seen.
In order to avoid that, you may consider increasing the timeout on the client side. Also, if you have enabled pause, you may consider disabling it [2]. After disabling pause, you can also adjust the minimum Aurora capacity unit to a higher value to make sure that your cluster always has enough compute resources to serve new connections [3]. Please note that adjusting the minimum ACU might increase the cost of the service [4].
Also note that Aurora Serverless is only recommended for certain workloads [5]. If your workload is highly predictable and your application needs to access the DB on a regular basis, I would recommend using a provisioned Aurora cluster/instance to ensure high availability for your business.
[2] How Aurora Serverless Works - Automatic Pause and Resume for Aurora Serverless - https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/aurora-serverless.how-it-works.html#aurora-serverless.how-it-works.pause-resume
[3] Setting the Capacity of an Aurora Serverless DB Cluster - https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/aurora-serverless.setting-capacity.html
[4] Aurora Serverless Pricing - https://aws.amazon.com/rds/aurora/serverless/
[5] Using Amazon Aurora Serverless - Use Cases for Aurora Serverless - https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/aurora-serverless.html#aurora-serverless.use-cases
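As a rough illustration of the support suggestions above (disabling pause and raising the minimum capacity), here is a hedged boto3 sketch for an Aurora Serverless v1 cluster; the cluster identifier and capacity values are placeholders and should be adapted:

import boto3

rds = boto3.client('rds')

# Disable automatic pause and keep a minimum amount of compute assigned.
# As noted in the support reply, raising the minimum ACU increases cost.
rds.modify_db_cluster(
    DBClusterIdentifier='my-serverless-cluster',  # placeholder
    ScalingConfiguration={
        'MinCapacity': 2,      # always keep at least 2 ACUs warm
        'MaxCapacity': 16,
        'AutoPause': False,    # never pause the cluster
    },
)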
In case it is useful to someone, this is how I manage retries while Aurora Serverless wakes up.
The client returns a BadRequestException, so boto3 will not retry even if you change the retry config for the client; see https://boto3.amazonaws.com/v1/documentation/api/latest/guide/retries.html.
My first option was to use Waiters, but RDSData does not have any waiters. I then tried to create a custom Waiter with an error matcher, but it only matches on the error code and ignores the message, and since a BadRequestException can also be raised by an error in a SQL statement I needed to check the message too. So I ended up using a waiter-like function:
import time

from botocore.exceptions import ClientError

# rds_data is assumed to be a boto3 'rds-data' client created elsewhere.

def _wait_for_serverless():
    delay = 5
    max_attempts = 10
    attempt = 0
    while attempt < max_attempts:
        attempt += 1
        try:
            rds_data.execute_statement(
                database=DB_NAME,
                resourceArn=CLUSTER_ARN,
                secretArn=SECRET_ARN,
                sql='SELECT * FROM dummy'  # the Data API parameter is 'sql'
            )
            return
        except ClientError as ce:
            error_code = ce.response.get("Error").get('Code')
            error_msg = ce.response.get("Error").get('Message')
            # Aurora Serverless is waking up
            if error_code == 'BadRequestException' and 'Communications link failure' in error_msg:
                logger.info('Sleeping ' + str(delay) + ' secs, waiting for RDS connection')
                time.sleep(delay)
            else:
                raise ce
    raise Exception('Waited for RDS Data but still getting error')
and I use it in this way:
def begin_rds_transaction():
    _wait_for_serverless()
    return rds_data.begin_transaction(
        database=DB_NAME,
        resourceArn=CLUSTER_ARN,
        secretArn=SECRET_ARN
    )
I also ran into this issue and, taking inspiration from the solution used by Arless and the conversation with Jimbo, came up with the following workaround.
I defined a decorator which retries the serverless RDS request until the configured number of attempts is exhausted.
import functools
import logging
import time

from sqlalchemy import exc

logger = logging.getLogger()


def retry_if_db_inactive(max_attempts, initial_interval, backoff_rate):
    """
    Retry the function if the serverless DB is still in the process of 'waking up'.
    The retry configuration follows the same concepts as AWS Step Functions retries.
    :param max_attempts: The maximum number of retry attempts
    :param initial_interval: The initial duration to wait (in seconds) when the first
        'Communications link failure' error is encountered
    :param backoff_rate: The factor by which to multiply the previous interval to get the next one
    :return:
    """
    def decorate_retry_if_db_inactive(func):
        @functools.wraps(func)
        def wrapper_retry_if_inactive(*args, **kwargs):
            interval_secs = initial_interval
            attempt = 0
            while attempt < max_attempts:
                attempt += 1
                try:
                    return func(*args, **kwargs)
                except exc.StatementError as err:
                    if hasattr(err.orig, 'response'):
                        error_code = err.orig.response["Error"]['Code']
                        error_msg = err.orig.response["Error"]['Message']
                        # Aurora Serverless is waking up
                        if error_code == 'BadRequestException' and 'Communications link failure' in error_msg:
                            logger.info('Sleeping for ' + str(interval_secs) + ' secs, awaiting RDS connection')
                            time.sleep(interval_secs)
                            interval_secs = interval_secs * backoff_rate
                        else:
                            raise err
                    else:
                        raise err
            raise Exception('Waited for RDS Data but still getting error')
        return wrapper_retry_if_inactive
    return decorate_retry_if_db_inactive
which can then be used something like this:
@retry_if_db_inactive(max_attempts=4, initial_interval=10, backoff_rate=2)
def insert_alert_to_db(sqs_alert):
    with db_session_scope() as session:
        # your db code
        session.add(sqs_alert)
    return None
Please note I'm using sqlalchemy, so the code would need tweaking to suit specific purposes, but hopefully will be useful as a starter.
This may be a little late, but there is a way to deactivate the DORMANT behavior of the database.
When creating the Cluster from the CDK, you can configure an attribute as follows:
new rds.ServerlessCluster(
  this,
  'id',
  {
    engine: rds.DatabaseClusterEngine.AURORA_MYSQL,
    defaultDatabaseName: 'name',
    vpc,
    scaling: {
      autoPause: Duration.millis(0) // Set to 0 to disable
    }
  }
)
The attribute is autoPause. The default value is 5 minutes (the "Communications link failure" message may appear after 5 minutes of not using the DB). The maximum value is 24 hours. However, you can set the value to 0, which disables the automatic pause; after that, the database will not go to sleep even if there are no connections.
When looking at the configuration in AWS (RDS -> Databases -> 'instance' -> Configuration -> Capacity Settings), you'll see this attribute displayed without a value when it is set to 0.
Finally, if you don't want the database to be on all the time, set your own autoPause value so that it behaves as expected.

Scala / Slick, "Timeout after 20000ms of waiting for a connection" error

The block of code below has been throwing an error.
Timeout after 20000ms of waiting for a connection.","stackTrace":[{"file":"BaseHikariPool.java","line":228,"className":"com.zaxxer.hikari.pool.BaseHikariPool","method":"getConnection"
Also, my database accesses seem too slow, with each element of xs.map() taking about 1 second. Below, getFutureItem() calls db.run().
xs.map { x =>
  val item: Future[List[Sometype], List(Tables.myRow)] = getFutureItem(x)
  Await.valueAfter(item, 100.seconds) match {
    case Some(i) => i
    case None => println("Timeout getting items after 100 seconds")
  }
}
Slick logs this with each iteration of an "x" value:
[akka.actor.default-dispatcher-3] [akka://user/IO-HTTP/listener-0/24] Connection was PeerClosed, awaiting TcpConnection termination...
[akka.actor.default-dispatcher-3] [akka://user/IO-HTTP/listener-0/24] TcpConnection terminated, stopping
[akka.actor.default-dispatcher-3] [akka://system/IO-TCP/selectors/$a/0] New connection accepted
[akka.actor.default-dispatcher-7] [akka://user/IO-HTTP/listener-0/25] Dispatching POST request to http://localhost:8080/progress to handler Actor[akka://system/IO-TCP/selectors/$a/26#-934408297]
My configuration:
"com.zaxxer" % "HikariCP" % "2.3.2"
default_db {
  url = ...
  user = ...
  password = ...
  queueSize = -1
  numThreads = 16
  connectionPool = HikariCP
  connectionTimeout = 20000
  maxConnections = 40
}
Is there anything obvious that I'm doing wrong that is causing these database accesses to be so slow and throw this error? I can provide more information if needed.
EDIT: I have received one recommendation that the issue could be a classloader error, and that I could resolve it by deploying the project as a single .jar, rather than running it with sbt.
EDIT2: After further inspection, it appears that many connections were being left open, which eventually led to no connections being available. This can likely be resolved by calling db.close() to close the connection at the appropriate time.
EDIT3: Solved. The connections made by slick exceeded the max connections allowed by my mysql config.
OP wrote:
EDIT2: After further inspection, it appears that many connections were being left open, which eventually led to no connections being available. This can likely be resolved by calling db.close() to close the connection at the appropriate time.
EDIT3: Solved. The connections made by slick exceeded the max connections allowed by my mysql config.

Go write unix /tmp/mysql.sock: broken pipe when sending a lot of requests

I have a Go API endpoint that makes several MySQL queries. When the endpoint receives a small number of requests, it works just fine. However, I am now testing it using Apache Bench with 100 requests. The first 100 all went through, but the second 100 caused this error to appear:
2014/01/15 12:08:03 http: panic serving 127.0.0.1:58602: runtime error: invalid memory address or nil pointer dereference
goroutine 973 [running]:
net/http.func·009()
/usr/local/Cellar/go/1.2/libexec/src/pkg/net/http/server.go:1093 +0xae
runtime.panic(0x402960, 0x9cf419)
/usr/local/Cellar/go/1.2/libexec/src/pkg/runtime/panic.c:248 +0x106
database/sql.(*Rows).Close(0x0, 0xc2107af540, 0x69)
/usr/local/Cellar/go/1.2/libexec/src/pkg/database/sql/sql.go:1576 +0x1e
store.findProductByQuery(0xc2107af540, 0x69, 0x0, 0xb88e80, 0xc21000ac70)
/Users/dennis.suratna/workspace/session-go/src/store/product.go:83 +0xe3
store.FindProductByAppKey(0xc210337748, 0x7, 0x496960, 0x6, 0xc2105eb1b0)
/Users/dennis.suratna/workspace/session-go/src/store/product.go:28 +0x11c
api.SessionHandler(0xb9eff8, 0xc2108ee200, 0xc2108f5750, 0xc2103285a0, 0x0, ...)
/Users/dennis.suratna/workspace/session-go/src/api/session_handler.go:31 +0x2fb
api.func·001(0xb9eff8, 0xc2108ee200, 0xc2108f5750, 0xc2103285a0)
/Users/dennis.suratna/workspace/session-go/src/api/api.go:81 +0x4f
reflect.Value.call(0x3ad9a0, 0xc2101ffdb0, 0x130, 0x48d520, 0x4, ...)
/usr/local/Cellar/go/1.2/libexec/src/pkg/reflect/value.go:474 +0xe0b
reflect.Value.Call(0x3ad9a0, 0xc2101ffdb0, 0x130, 0xc2103c4a00, 0x3, ...)
/usr/local/Cellar/go/1.2/libexec/src/pkg/reflect/value.go:345 +0x9d
github.com/codegangsta/inject.(*injector).Invoke(0xc2103379c0, 0x3ad9a0, 0xc2101ffdb0, 0x4311a0, 0x1db94e, ...)
It looks like it's not caused by the number of concurrent requests but, rather, by something that is not being properly closed. I am already closing every prepared statement that I create in my code. I am wondering if anyone has seen this before.
Edit:
This is how I am initializing my MySQL connection:
func InitStore(environment string) error {
    db, err := sql.Open("mysql", connStr(environment))
    ....
    S = &Store{
        Mysql:       db,
        Environment: environment,
    }
}
This happens only once, when I start the server.
Ok, so I was able to solve this problem and now I can send ~500 requests with a concurrency of 10 with no more broken pipe or "Too many connections" errors.
I think it all comes down to following best practices. When you don't expect multiple rows to be returned, use QueryRow instead of Query and chain it with Scan:
db.QueryRow(...).Scan(...)
If you don't expect rows to be returned and you're not going to re-use your statements, use Exec, not Prepare.
If you have a prepared statement or are querying multiple rows, don't forget to call Close().
Got all of the above from
https://github.com/go-sql-driver/mysql/issues/111
If you use Go 1.2.x you can use db.SetMaxOpenConns to tell the sql package to not open more than X connections. Queries that need a database connection after X connections are already open (and busy) will block until there's an available connection.
That being said: what are the next lines of the "stack trace"? Line ~1093 in http/server.go is the recover code that runs when your serve function fails. It looks more like you are mishandling some data, or missing an error check and then trying to process data when you were actually returned an error.

python: sqlalchemy - how do I ensure connection not stale using new event system

I am using the sqlalchemy package in python. I have an operation that takes some time to execute after I perform an autoload on an existing table. This causes the following error when I attempt to use the connection:
sqlalchemy.exc.OperationalError: (OperationalError) (2006, 'MySQL server has gone away')
I have a simple utility function that performs an insert many:
def insert_data(data_2_insert, table_name):
    engine = create_engine('mysql://blah:blah123@localhost/dbname')
    # Metadata is a Table catalog.
    metadata = MetaData()
    mytable = Table(table_name, metadata, autoload=True, autoload_with=engine)
    for c in mytable.c:
        print c
    column_names = tuple(c.name for c in mytable.c)
    final_data = [dict(zip(column_names, x)) for x in data_2_insert]
    ins = mytable.insert()
    conn = engine.connect()
    conn.execute(ins, final_data)
    conn.close()
It is the following line that takes a long time to execute, since data_2_insert has 677,161 rows.
final_data = [dict(zip(column_names, x)) for x in data_2_insert]
I came across this question, which refers to a similar problem. However, I am not sure how to implement the connection management suggested by the accepted answer, because robots.jpg pointed this out in a comment:
Note for SQLAlchemy 0.7 - PoolListener is deprecated, but the same solution can be implemented using the new event system.
If someone can please show me a couple of pointers on how I could go about integrating the suggestions into the way I use sqlalchemy I would be very appreciative. Thank you.
I think you are looking for something like this:
from sqlalchemy import exc, event
from sqlalchemy.pool import Pool

@event.listens_for(Pool, "checkout")
def check_connection(dbapi_con, con_record, con_proxy):
    '''Listener for Pool checkout events that pings every connection before using it.
    Implements the pessimistic disconnect-handling strategy. See also:
    http://docs.sqlalchemy.org/en/rel_0_8/core/pooling.html#disconnect-handling-pessimistic'''

    cursor = dbapi_con.cursor()
    try:
        cursor.execute("SELECT 1")  # could also be dbapi_con.ping(),
                                    # not sure what is better
    except exc.OperationalError, ex:
        if ex.args[0] in (2006,   # MySQL server has gone away
                          2013,   # Lost connection to MySQL server during query
                          2055):  # Lost connection to MySQL server at '%s', system error: %d
            # caught by pool, which will retry with a new connection
            raise exc.DisconnectionError()
        else:
            raise
If you wish to enable this strategy conditionally, avoid using the decorator here and instead register the listener with the listen() function:
# somewhere during app initialization
if config.check_connection_on_checkout:
    event.listen(Pool, "checkout", check_connection)
More info:
Connection Pool Events
Events API
There is a better way to handle it nowadays: pool_recycle
engine = create_engine('mysql://...', pool_recycle=3600)
MySQL has a default idle timeout of 8 hours. When that timeout is hit, the connection is closed by MySQL, but the engine above it (such as SQLAlchemy) does not know about it.
There are two ways to solve it:
Optimistic - using pool_recycle
Pessimistic - using pool_pre_ping=True
I prefer to go with pool_recycle, as it doesn't emit a SELECT 1 on every connection checkout, which puts less stress on the DB.
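As a minimal sketch of the two options above (the connection URL is a placeholder, not from the original post):

from sqlalchemy import create_engine

# Optimistic: recycle pooled connections before MySQL's idle timeout can close them server-side.
engine = create_engine('mysql://user:password@localhost/dbname', pool_recycle=3600)

# Pessimistic: ping each connection when it is checked out of the pool,
# at the cost of a small round trip per checkout.
engine = create_engine('mysql://user:password@localhost/dbname', pool_pre_ping=True)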