WebLogic clustering configuration

I am developing an application with JDeveloper 11.1.1.6.0. I have a problem when my client application tries to connect to a WebLogic server that is part of a cluster. A certain service that I would like to call runs on this server.
The situation is as follows:
There is a WebLogic instance whose configuration I cannot change at the moment. It contains the following servers and clusters:
Admin server AS - (runs on Machine M1) URL: A, port: 1 - URL for connection t3://A:1
Cluster C containing:
Server S1 - (runs on Machine M1) URL: A, port: 2 - uses Database D1 - URL for connection t3://A:2
Server S2 - (runs on Machine M2) URL: B, port: 1 - uses Database D2 - URL for connection t3://B:1
Server S3 - (runs on Machine M2) URL: B, port: 2 - uses Database D2 - URL for connection t3://B:2
I am trying to connect directly to t3://A:2, not to the cluster or either of the other two servers. However, the connection only works about every third time, presumably because of the three servers within the cluster. The cluster uses unicast for messaging and round-robin affinity for load balancing.
I am trying to find out what causes this. Can I change something in the configuration of the WebLogic instance where my client application runs (integrated or standalone)? Or does the configuration of the instance hosting the server cluster have to be changed?
Thank you in advance!
Best Regards
(23.05.2013)
EDIT:
We use a plain JNDI lookup to access an EJB on the remote server in the described scenario.
Context ctx = new InitialContext();
Object o = ctx.lookup(...)
...
jndi.properties:
java.naming.provider.url=t3://A:2
java.naming.factory.initial=weblogic.jndi.WLInitialContextFactory
It seems to be possible to send the JNDI request to the right server by setting the property PIN_TO_PRIMARY_SERVER. Yet subsequent EJB requests are still routed across the whole cluster using round robin...
Can we do something on the client side to change this behavior so that the specific server with the URL t3://A:2 is always addressed?

I had a similar problem, and after experimenting with the InitialContext environment properties I had very little luck. Instead I had to alter the weblogic-ejb-jar.xml for my stateless session bean.
String destination = "t3://node-alpha:2010";
Hashtable<String, String> env = new Hashtable<String, String>();
env.put( Context.INITIAL_CONTEXT_FACTORY, "weblogic.jndi.WLInitialContextFactory");
env.put( Context.PROVIDER_URL, destination );
// env.put( weblogic.jndi.WLContext.ENABLE_SERVER_AFFINITY, "true" );
// env.put( weblogic.jndi.WLContext.PIN_TO_PRIMARY_SERVER, "true" );
InitialContext ctx = new InitialContext( env );
EJBHome home = (EJBHome) ctx.lookup( JNDI_REMOTE_SYSTEM_SF );
sf = SomeSf.class.cast( home.getClass().getMethod( "create" ).invoke( home ) );
// Check that we are hitting the right server node.
System.out.println( destination + " => " + sf );
Once you start a transaction, you shouldn't change servers, so I would create a stateless bean to receive the targeted calls and from there begin the work you intend to do. You can set a stateless bean as not clusterable in the weblogic-ejb-jar.xml. You actually need to set both items listed below.
<home-is-clusterable>False</home-is-clusterable>
<stateless-bean-is-clusterable>False</stateless-bean-is-clusterable>
What this means is that when you obtain a reference through the initial context, the targeted server hands back a reference to the stateless bean instance on that particular cluster node.
Using the servers
node-alpha:2010
node-alpha:2011
node-beta:3010
node-beta:3011
With home-is-clusterable & stateless-bean-is-clusterable set to true
Here the first entry is the targeted server; the rest are used for failover and/or load balancing (e.g. round robin).
ClusterableRemoteRef(
3980825488277365621S:node-alpha:[2010,2010,-1,-1,-1,-1,-1]:MyDomain:node-alpha
[
3980825488277365621S:node-alpha:[2010,2010,-1,-1,-1,-1,-1]:MyDomain:node-alpha/338,
4236365235325235233S:node-alpha:[2011,2011,-1,-1,-1,-1,-1]:MyDomain:node-alpha/341,
1321244352376322432S:node-beta:[3010,3010,-1,-1,-1,-1,-1]:MyDomain:node-beta/342,
4317823667154133654S:node-beta:[3011,3011,-1,-1,-1,-1,-1]:MyDomain:node-beta/345
]
)/338
With home-is-clusterable & stateless-bean-is-clusterable set to false
weblogic.rmi.internal.BasicRemoteRef - hostID: '-3980825488277365621S:node-alpha:[2010,2010,-1,-1,-1,-1,-1]:MyDomain:node-alpha', oid: '336', channel: 'null'
weblogic-ejb-jar.xml example below.
<weblogic-ejb-jar>
  <weblogic-enterprise-bean>
    <ejb-name>SomeSf</ejb-name>
    <stateless-session-descriptor>
      <pool>
        <max-beans-in-free-pool>42</max-beans-in-free-pool>
      </pool>
      <stateless-clustering>
        <home-is-clusterable>false</home-is-clusterable>
        <stateless-bean-is-clusterable>false</stateless-bean-is-clusterable>
        <stateless-bean-methods-are-idempotent>true</stateless-bean-methods-are-idempotent>
      </stateless-clustering>
    </stateless-session-descriptor>
    <transaction-descriptor>
      <trans-timeout-seconds>20</trans-timeout-seconds>
    </transaction-descriptor>
    <enable-call-by-reference>true</enable-call-by-reference>
    <jndi-name>SomeSf</jndi-name>
  </weblogic-enterprise-bean>
</weblogic-ejb-jar>

Related

How to fix Cloud SQL (MySQL) & Cloud functions slow queries

I have an application that, through Firebase Cloud Functions, connects to a Cloud SQL database (MySQL).
The Cloud SQL machine I am using is the cheapest, lowest tier one (db-f1-micro, shared core, 1 vCPU, 0.614 GB).
Below I describe how my code is organized to execute a simple query.
I have a file called "database.js" which exports my connection (pool) to the db.
const mysqlPromise = require('promise-mysql');
const cf = require('./config');
const connectionOptions = {
  connectionLimit: cf.config.connection_limit, // 250
  host: cf.config.app_host,
  port: cf.config.app_port,
  user: cf.config.app_user,
  password: cf.config.app_password,
  database: cf.config.app_database,
  socketPath: cf.config.app_socket_path
};

if (!connectionOptions.host && !connectionOptions.port) {
  delete connectionOptions.host;
  delete connectionOptions.port;
}

const connection = mysqlPromise.createPool(connectionOptions);
exports.connection = connection;
Here instead is how I use the connection to execute a query within a callable Cloud Function.
Note that the tables are small (no more than 2K records).
// import connection
const db = require("../Config/database");

// define callable function
exports.getProdottiNegozio = functions
  .region("europe-west1")
  .https.onCall(async (data, context) => {
    const { id } = data;
    try {
      const pool = await db.connection;
      const prodotti = await pool.query(`SELECT * FROM products WHERE shop_id=? ORDER BY name`, [id]);
      return prodotti;
    } catch (error) {
      throw new functions.https.HttpsError("failed-precondition", error);
    }
  });
Everything works correctly, in the sense that the query is executed and returns the expected results, but there is a performance problem.
Query execution is sometimes very slow (up to 10 seconds!).
I have noticed that in the morning queries are sometimes quite fast (about 1 second), but at other times they are very slow and make my application sluggish.
Checking the logs in the GCP console, I noticed that this message appears:
severity: "INFO"
textPayload: "2021-07-30T07:44:04.743495Z 119622 [Note] Aborted connection 119622 to db: 'XXX' user: 'YYY' host: 'cloudsqlproxy~XXX.XXX.XXX.XXX' (Got an error reading communication packets)"
At the end of all this I would like some help understanding how to improve the performance of the application.
Is it just a Cloud SQL machine problem? Would it be enough to increase resources to get decent query execution times?
Or am I wrong about the architecture of the code and how I organize the functions and the calls to the db?
Thanks in advance to everyone :)
Don't connect directly to your database from an auto-scaling solution:
You shouldn't use an auto-scaling web service (Firebase Functions) to connect to a database directly. Imagine you get 400 requests: that means 400 connections opened to your database if each function instance tries to connect on startup. Your database will start rejecting (or queuing) new connections. Ideally you should host a service that is online permanently and let the Firebase Function tell that service what to query over an existing connection.
Firebase Functions take their sweet time to start up:
Firebase Functions take 100-300 ms to start (cold start) for each function invocation, so add that to your wait time; even more so if your function relies on a connection to something else before it can respond.
Functions have a short lifespan:
You should also know that Firebase Functions don't live very long. They are meant to be single-task microservices. Their lifespan is 90 seconds if I recall correctly, so make sure your query doesn't take longer than that.
Specific to your issue:
If your database gets slow during the day, it might be because usage increases.
You are using a shared core, which means you share resources on the lowest tier with the other lower-tier databases in that region/zone. You might need to increase resources, for example by moving to a dedicated core, or optimize your query(ies). I'd recommend bumping up your CPU; the cost is really low for the small CPU options.

AWS Aurora Serverless - Communication Link Failure

I'm using a MySQL Aurora Serverless cluster (with the Data API enabled) from my Python code, and I am getting a communications link failure exception. This usually occurs when the cluster has been dormant for some time.
But once the cluster is active, I get no error. I have to send 3-4 requests every time before it works fine.
Exception detail:
The last packet sent successfully to the server was 0 milliseconds
ago. The driver has not received any packets from the server. An error
occurred (BadRequestException) when calling the ExecuteStatement
operation: Communications link failure
How can I solve this issue? I am using the standard boto3 library.
Here is the reply from AWS Premium Business Support.
Summary: It is an expected behavior
Detailed Answer:
I can see that you receive this error when your Aurora Serverless
instance is inactive and you stop receiving it once your instance is
active and accepting connection. Please note that this is an expected
behavior. In general, Aurora Serverless works differently than
Provisioned Aurora , In Aurora Serverless, while the cluster is
"dormant" it has no compute resources assigned to it and when a db.
connection is received, Compute resources are assigned. Because of
this behavior, you will have to "wake up" the clusters and it may take
a few minutes for the first connection to succeed as you have seen.
In order to avoid that you may consider increasing the timeout on the
client side. Also, if you have enabled Pause, you may consider
disabling it [2]. After disabling Pause, you can also adjust the
minimum Aurora capacity unit to higher value to make sure that your
Cluster always having enough computing resource to serve the new
connections [3]. Please note that adjusting the minimum ACU might
increase the cost of service [4].
Also note that Aurora Serverless is only recommend for certain
workloads [5]. If your workload is highly predictable and your
application needs to access the DB on a regular basis, I would
recommend you use Provisioned Aurora cluster/instance to insure high
availability of your business.
[2] How Aurora Serverless Works - Automatic Pause and Resume for Aurora Serverless - https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/aurora-serverless.how-it-works.html#aurora-serverless.how-it-works.pause-resume
[3] Setting the Capacity of an Aurora Serverless DB Cluster - https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/aurora-serverless.setting-capacity.html
[4] Aurora Serverless Price https://aws.amazon.com/rds/aurora/serverless/
[5] Using Amazon Aurora Serverless - Use Cases for Aurora Serverless - https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/aurora-serverless.html#aurora-serverless.use-cases
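Regarding the suggestion to increase the client-side timeout: with boto3 this can be done by passing a botocore Config object when creating the rds-data client. The snippet below is only a minimal sketch with illustrative values; note that, as the next answer explains, boto3 will not automatically retry the BadRequestException itself, so this mainly prevents a single call from timing out while the cluster resumes.
import boto3
from botocore.config import Config

# Give the Data API client more generous timeouts while the cluster wakes up.
# The timeout and retry values here are illustrative, not recommendations.
data_api_config = Config(
    connect_timeout=30,   # seconds allowed to establish the connection
    read_timeout=120,     # seconds allowed to wait for a response
    retries={"max_attempts": 3, "mode": "standard"},
)

rds_data = boto3.client("rds-data", config=data_api_config)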
In case it is useful to someone, this is how I manage retries while Aurora Serverless wakes up.
The client returns a BadRequestException, so boto3 will not retry even if you change the retry config for the client; see https://boto3.amazonaws.com/v1/documentation/api/latest/guide/retries.html.
My first option was to try Waiters, but RDSData does not have any waiters. I then tried to create a custom Waiter with an error matcher, but it only matches the error code and ignores the message; since a BadRequestException can also be raised by an error in a SQL statement, I needed to validate the message too. So I use a kind of waiter function:
def _wait_for_serverless():
    delay = 5
    max_attempts = 10
    attempt = 0
    while attempt < max_attempts:
        attempt += 1
        try:
            rds_data.execute_statement(
                database=DB_NAME,
                resourceArn=CLUSTER_ARN,
                secretArn=SECRET_ARN,
                sql='SELECT * FROM dummy'
            )
            return
        except ClientError as ce:
            error_code = ce.response.get("Error").get('Code')
            error_msg = ce.response.get("Error").get('Message')
            # Aurora serverless is waking up
            if error_code == 'BadRequestException' and 'Communications link failure' in error_msg:
                logger.info('Sleeping ' + str(delay) + ' secs, waiting RDS connection')
                time.sleep(delay)
            else:
                raise ce
    raise Exception('Waited for RDS Data but still getting error')
and I use it in this way:
def begin_rds_transaction():
    _wait_for_serverless()
    return rds_data.begin_transaction(
        database=DB_NAME,
        resourceArn=CLUSTER_ARN,
        secretArn=SECRET_ARN
    )
I also got this issue, and taking inspiration from the solution used by Arless and the conversation with Jimbo, came up with the following workaround.
I defined a decorator which retries the serverless RDS request until the configurable retry duration expires.
import logging
import functools
import time

from sqlalchemy import exc

logger = logging.getLogger()


def retry_if_db_inactive(max_attempts, initial_interval, backoff_rate):
    """
    Retry the function if the serverless DB is still in the process of 'waking up'.
    The configuration of retries follows the same concepts as AWS Step Function retries.
    :param max_attempts: The maximum number of retry attempts
    :param initial_interval: The initial duration to wait (in seconds) when the first 'Communications link failure' error is encountered
    :param backoff_rate: The factor used to multiply the previous interval duration to get the next interval
    :return:
    """
    def decorate_retry_if_db_inactive(func):
        @functools.wraps(func)
        def wrapper_retry_if_inactive(*args, **kwargs):
            interval_secs = initial_interval
            attempt = 0
            while attempt < max_attempts:
                attempt += 1
                try:
                    return func(*args, **kwargs)
                except exc.StatementError as err:
                    if hasattr(err.orig, 'response'):
                        error_code = err.orig.response["Error"]['Code']
                        error_msg = err.orig.response["Error"]['Message']
                        # Aurora serverless is waking up
                        if error_code == 'BadRequestException' and 'Communications link failure' in error_msg:
                            logger.info('Sleeping for ' + str(interval_secs) + ' secs, awaiting RDS connection')
                            time.sleep(interval_secs)
                            interval_secs = interval_secs * backoff_rate
                        else:
                            raise err
                    else:
                        raise err
            raise Exception('Waited for RDS Data but still getting error')
        return wrapper_retry_if_inactive
    return decorate_retry_if_db_inactive
which can then be used like this:
@retry_if_db_inactive(max_attempts=4, initial_interval=10, backoff_rate=2)
def insert_alert_to_db(sqs_alert):
    with db_session_scope() as session:
        # your db code
        session.add(sqs_alert)
    return None
Please note I'm using SQLAlchemy, so the code may need tweaking for specific purposes, but hopefully it will be useful as a starter.
This may be a little late, but there is a way to deactivate the DORMANT behavior of the database.
When creating the Cluster from the CDK, you can configure an attribute as follows:
new rds.ServerlessCluster(this, 'id', {
  engine: rds.DatabaseClusterEngine.AURORA_MYSQL,
  defaultDatabaseName: 'name',
  vpc,
  scaling: {
    autoPause: Duration.millis(0) // Set to 0 to disable
  }
});
The attribute is autoPause. The default value is 5 minutes (the Communication link failure message may appear after 5 minutes of not using the DB) and the maximum value is 24 hours. However, you can set the value to 0, which disables the automatic pause; after that, the database will not go to sleep even if there are no connections.
When looking at the configuration in AWS (RDS -> Databases -> 'instance' -> Configuration -> Capacity settings), you'll notice this attribute has no value when it is set to 0.
Finally, if you don't want the database to be on all the time, set your own autoPause value so that it behaves as expected.
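If you are not managing the cluster through the CDK, a roughly equivalent change can be made from Python with boto3, matching the question's setup. This is only a minimal sketch, assuming an Aurora Serverless v1 cluster and a hypothetical cluster identifier; adjust the capacity values to your workload.
import boto3

rds = boto3.client("rds")

# Disable auto-pause and raise the minimum capacity so the cluster
# keeps compute resources assigned even while idle.
rds.modify_db_cluster(
    DBClusterIdentifier="my-serverless-cluster",  # hypothetical identifier
    ScalingConfiguration={
        "AutoPause": False,
        "MinCapacity": 2,
        "MaxCapacity": 8,
    },
)
As with setting autoPause to 0 in the CDK, keeping the cluster awake means paying for at least the minimum capacity around the clock.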

Parallel processing and querying SQL with dplyr or pool: MySQL server has gone away

There are a couple of earlier related questions, but none of them solve the issue for me:
https://dba.stackexchange.com/questions/160444/parallel-postgresql-queries-with-r
Parallel Database calls with RODBC
"foreach" loop : Using all cores in R (especially if we are sending sql queries inside foreach loop)
My use case is the following: I have a large database of data that needs to be plotted. Each plot takes a few seconds to create due to some necessary pre-processing of the data and the plotting itself (ggplot2), and I need to produce a large number of plots. My plan is to connect to the database via dplyr without downloading all the data to memory, and then use a function that fetches the subset of the data to be plotted. This approach works fine when single-threaded, but when I try to use parallel processing I run into SQL errors related to the connection: "MySQL server has gone away".
Now, I recently solved the same issue working in Python, in which case the solution was simply to kill the current connection inside the function, which forced the establishment of a new connection. I did this using connection.close() where connection is from Django's django.db.
My problem is that I cannot find an R equivalent of this approach. I thought I had found the solution when I found the pool package for R:
This package enables the creation of object pools for various types of
objects in R, to make it less computationally expensive to fetch one.
Currently the only supported pooled objects are DBI connections (see
the DBI package for more info), which can be used to query a database
either directly through DBI or through dplyr. However, the Pool class
is general enough to allow for pooling of any R objects, provided that
someone implements the backend appropriately (creating the object
factory class and all the required methods) -- a vignette with
instructions on how to do so will be coming soon.
My code is too large to post here, but essentially, it looks like this:
# libraries loaded as necessary

# connect to the db in some kind of way
# with dplyr
db = src_mysql(db_database, username = db_username, password = db_password)
# with RMySQL directly
db = dbConnect(RMySQL::MySQL(), dbname = db_database, username = db_username, password = db_password)
# with pool
db = pool::dbPool(RMySQL::MySQL(),
                  dbname = db_database,
                  username = db_username,
                  password = db_password,
                  minSize = 4)
# I tried all 3

# connect to a table
some_large_table = tbl(db, 'table')

# define the function
some_function = function(some_id) {
  # fetch data from table
  subtable = some_large_table %>% filter(id == some_id) %>% collect()
  # do something with the data
  something(subtable)
}

# parallel process
mclapply(vector_of_ids,
         FUN = some_function,
         mc.cores = num_of_threads)
The code you have above is not the equivalent of your Python code, and that is the key difference. What you did in Python is totally possible in R (see MWE below). However, the code you have above is not:
kill[ing] the current connection inside the function, which forced the establishment of a new connection.
What it is trying (and failing) to do is to make a database connection travel from the parent process to each child process opened by the call to mclapply. This is not possible. Database connections can never travel across process boundaries no matter what.
This is an example of the more general "rule" that a child process cannot affect the state of the parent process, period. For example, a child process cannot write to the parent process's memory either, and you can't plot (to the parent process's graphics device) from those child processes.
In order to do the same thing you did in Python, you need to open a new connection inside of the function FUN (the second argument to mclapply) if you want it to be truly parallel. I.e. you have to make sure that the dbConnect call happens inside the child process.
This eliminates the point of pool (though it’s perfectly safe to use), since pool is useful when you reuse connections and generally want them to be easily accessible. For your parallel use case, since you can't cross process boundaries, this is useless: you will always need to open and close the connection for each new process, so you might as well skip pool entirely.
Here's the correct "translation" of your Python solution to R:
library(dplyr)

getById <- function(id) {
  # create a connection and close it on exit
  conn <- DBI::dbConnect(
    drv = RMySQL::MySQL(),
    dbname = "shinydemo",
    host = "shiny-demo.csa7qlmguqrf.us-east-1.rds.amazonaws.com",
    username = "guest",
    password = "guest"
  )
  on.exit(DBI::dbDisconnect(conn))
  # get a specific row based on ID
  conn %>% tbl("City") %>% filter(ID == id) %>% collect()
}

parallel::mclapply(1:10, getById, mc.cores = 12)

Connection already closed

I'm using Grails 2.5.3 and Tomcat 7, and 8 hours after deploying the app our logs start blowing up with "connection already closed" issues. A good assumption is that MySQL is killing the connection after the default wait timeout of 8 hours.
Going by the docs, my pool seems to be configured correctly to keep idle connections alive, but that doesn't seem to be the case.
What might be wrong with my connection pool setting?
dataSource {
    pooled = true
    url = 'jdbc:mysql://******.**********.us-east-1.rds.amazonaws.com/*****'
    driverClassName = 'com.mysql.jdbc.Driver'
    username = '********'
    password = '******************'
    dialect = org.hibernate.dialect.MySQL5InnoDBDialect
    loggingSql = false
    properties {
        jmxEnabled = true
        initialSize = 5
        timeBetweenEvictionRunsMillis = 10000
        minEvictableIdleTimeMillis = 60000
        validationQuery = "SELECT 1"
        initSQL = "SELECT 1"
        validationQueryTimeout = 10
        testOnBorrow = true
        testWhileIdle = true
        testOnReturn = true
        testOnConnect = true
        removeAbandonedTimeout = 300
        maxActive = 100
        maxIdle = 10
        minIdle = 1
        maxWait = 30000
        maxAge = 900000
        removeAbandoned = "true"
        jdbcInterceptors = "org.apache.tomcat.jdbc.pool.interceptor.StatementCache;"
    }
}
hibernate {
    cache.use_second_level_cache = true
    cache.use_query_cache = true
    cache.region.factory_class = 'org.hibernate.cache.ehcache.EhCacheRegionFactory'
}
Also, I have confirmed that the dataSource at runtime is an instance of (org.apache.tomcat.jdbc.pool.DataSource)
UPDATE 1 (NOT FIXED)
We think we may have found the problem! We were storing a domain class in the HTTP session, and after reading a bit about how the session factory works we believe that the stored object was somehow bound to a connection. When a user accessed the domain class from the HTTP session after 8 hours, we think Hibernate held a reference to the dead connection. It's in production now and we are monitoring.
UPDATE 2 (FIXED)
We finally found the problem. Removing removeAbandoned and removeAbandonedTimeout resolved all our problems. We're not entirely sure why this resolved the issue as our assumption was that these two properties exist to prevent exactly what was occurring. The only thought is that our database was more aggressively managing the abandoned connections. It's been over 4 weeks with no issues.
I've had this issue with a completely different setup. It's really not fun to deal with. Basically it boils down to this:
You have some connection somewhere in your application just sitting around while Java is doing some sort of "other" processing. Here's a really basic way to reproduce:
Connection con = (get connection from pool);
Sleep(330 seconds);
con.close();
The code above is not doing anything with the database connection, so Tomcat detects it as abandoned and returns it to the pool at 300 seconds.
Your application has enough traffic that the same connection (both opened and abandoned in the code above) is handed out somewhere else, in a different part of the code.
Either the original code hits 330 seconds and closes the connection, or the new code picks up the connection, finishes, and closes it. At this point two places are using the same connection and one of them has closed it.
The other piece of code using the same connection then tries to either use or close it.
The connection is already closed, producing the above error.
Suggested route to fix:
Use the setting logAbandoned="true" to find where the connections are being abandoned from.
Our URL usually looks like this:
url = "jdbc:mysql://localhost/db?useUnicode=yes&characterEncoding=UTF-8&autoReconnect=true"
Also check the encoding parameters if you don't want to run into such issues.
(see update 2 on question)
Removing removeAbandoned and removeAbandonedTimeout resolved all our problems. Someone may want to provide a more detailed answer on why this worked, because we are not entirely sure.

mysql_connect "bool $new_link = true" is very slow

I'm using the latest version of XAMPP on 64-bit Windows 7.
The problem is that when I use mysql_connect with "bool $new_link" set to true, like so:
mysql_connect('localhost', 'root', 'my_password', TRUE);
script execution time increases dramatically (about 0.5 seconds per connection, and when I have 4 different objects using different connections, it takes ~2 seconds).
Is setting "bool $new_link" to true generally a bad idea, or could it just be a problem with my software configuration?
Thank you.
//Edit:
I'm using a new link because I have multiple objects that use MySQL connections (new objects can be created inside already existing objects, and so on). In the end, when it comes to unsetting objects (I have mysql_close inside my __destruct() functions), I figured that the only way to correctly clean up loose ends would be for every object to have its own connection variable.
I just formatted my PC, so the configuration should be the default one.
Don't open a new connection unless you have a need for one (for instance, accessing multiple databases simultaneously).
Also you don't have to explicitly call mysql_close. I usually just include a function to quickly retrieve an existing db link (or a new one if none exists yet).
function &getDBConn() {
    global $DBConn;
    if (!$DBConn) $DBConn = mysql_connect(...);
    return $DBConn;
}
// now you can just call $dbconn = getDBConn(); whenever you need it
Use "127.0.0.1" instead of "localhost". It improved my performance with mysql_connect from ~1 sek to a couple of milliseconds.
Article about php/mysql_connect and IPv6 on windows: http://www.bluetopazgames.com/uncategorized/php-mysql_connect-is-slow-1-second-for-localhost-windows-7/