I have a running instance of VerneMQ (a cluster of 2 nodes) on Google Kubernetes Engine, using MySQL (Cloud SQL) for auth. The server accepts connections over TLS.
It works fine, but after a few days I start seeing this message in the log:
can't authenticate client {[],<<"Client-id">>} from X.X.X.X:16609 due to plugin_chain_exhausted
The client app (Paho) complains that the server refused the connection as "not authorized" (code=5 in the Paho error).
After a few retries it finally connects, but every time it gets harder and harder until it just won't connect anymore.
If I restart VerneMQ, everything goes back to normal.
I have at most 3 clients connected at the same time.
Clients that are already connected have no issues with pub/sub.
In my configuration I have (among other things):
log.console.level=debug
plugins.vmq_diversity=on
vmq_diversity.mysql.* = all of them set
allow_anonymous=off
vmq_diversity.auth_mysql.enabled=on
It's like the server degrades over time. The status web page reports no problems.
My VerneMQ server was built from the Git repository about a month ago and runs in a Docker container.
What could be the cause?
What else could I check to find possible causes? Maybe a vmq_diversity misconfiguration?
Thanks
To quickly explain the plugin_chain_exhausted log: with VerneMQ you can run multiple authentication/authorization plugins, and they will be checked in a chain. If one plugin allows the client, it will be let in. If no plugin allows the client, you'll see the log line above.
This does not explain the behaviour you describe, though. I don't think I have seen that.
In any case, the first thing to check is whether you actually run multiple plugins. For instance: have you disabled the vmq_passwd and vmq_acl plugins?
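For reference, if only the MySQL-backed auth should be in the chain, the relevant part of vernemq.conf would look roughly like this (a sketch; double-check the key names against the version you built):

plugins.vmq_passwd = off
plugins.vmq_acl = off
plugins.vmq_diversity = on
vmq_diversity.auth_mysql.enabled = on

You can also confirm what is actually loaded at runtime with vmq-admin plugin show, which should make it obvious whether another auth plugin is still part of the chain.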
I am using the Couchbase Server 6.0.2 image from Red Hat
https://access.redhat.com/containers/?tab=overview&get-method=registry-tokens#/registry.connect.redhat.com/couchbase/server
in OpenShift.
The pod is running but does not respond on http://localhost:8091. The logs show the error below.
I have 3 questions:
Why is whoami failing in the entrypoint?
Why isn't the server responding on port 8091?
Does the couchbase server image require root permissions?
It seems the couchbase/server image expects to be run as root, and then creates its own couchbase user and group.
At the end it runs an entrypoint script, which checks whether the user running the whole thing is actually the couchbase user, by executing the whoami command.
This is not the case if you just run it in OpenShift, as the container will be run as some "random" unprivileged user.
This leads to a set of consecutive failures:
Here you will find the evaluation that is done in entrypoint.sh.
Now the whoami command fails since there is no actual user, just said random UID. That failure leaves the first part of the evaluation blank, which results in a failure.
This is a bug in the couchbase/server image, and as such you should, if time allows, contribute to fixing it by opening an issue against that repo.
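To illustrate what goes wrong (a paraphrase, not the exact entrypoint code): when the container runs under an arbitrary OpenShift UID that has no matching passwd entry, whoami itself errors out, so any comparison against "couchbase" can never succeed:

$ whoami
whoami: cannot find name for user ID 1000680000

(the UID shown is just an example of the arbitrary ID OpenShift assigns). If your cluster policy allows it, a common workaround is to let the project's service account run the image with its built-in user via the anyuid SCC, for example (assuming the default service account in your project):

oc adm policy add-scc-to-user anyuid -z default -n <your-project>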
Right now I am connecting to a cluster endpoint that I have set up for an Aurora (MySQL-compatible) DB cluster, and after I do a "failover" from the AWS console, my web application is unable to properly connect to the DB instance that should be writable.
My setup is like this:
Java web app (Tomcat 8) with HikariCP as the connection pool and Connector/J as the MySQL driver. I am evaluating Aurora MySQL to see if it will satisfy some of the needs the application has. The web app sits on an EC2 instance that is in the same VPC and security group as the Aurora MySQL cluster. I am connecting through the cluster endpoint to get to the database.
After a failover, I would expect HikariCP to break connections (it does) and then attempt to reconnect (it does); however, the application must be connecting to the wrong server, because any time a write hits the database, a SQLException is thrown that says:
The MySQL server is running with the --read-only option so it cannot execute this statement
What is the solution here? Should I rework my code to flush DNS after all connections go down, or after I start receiving this error, and then try to re-initiate connections after that? That doesn't seem right...
I don't know why I keep asking questions if I just answer them myself (I should really be more patient), but here's an answer in case anyone stumbles upon this in a Google search:
RDS uses DNS changes behind the cluster endpoint to make failover look "seamless". Since the IP behind the hostname can change, if there is any sort of DNS caching going on, you can see pretty quickly how a change won't be reflected. Here's a page from the AWS docs that goes into it a bit more: https://docs.aws.amazon.com/sdk-for-java/v1/developer-guide/java-dg-jvm-ttl.html
To resolve my issue, I went into the JVM's java.security file and changed the DNS cache TTL (networkaddress.cache.ttl) to 0, just to verify that this was what was happening. Seems correct. Now I just need to figure out how to do it properly...
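For anyone landing here looking for the "proper" way: the AWS page linked above describes capping the JVM's DNS cache via the networkaddress.cache.ttl security property, either in the JVM's java.security file or programmatically before any connection is made. A minimal sketch (the 60-second value is only an example, not a recommendation):

import java.security.Security;

public class JvmDnsTtl {
    public static void main(String[] args) {
        // Limit how long the JVM caches successful DNS lookups, so the cluster
        // endpoint's new IP is picked up shortly after an Aurora failover.
        // Equivalent java.security entry: networkaddress.cache.ttl=60
        Security.setProperty("networkaddress.cache.ttl", "60");
    }
}

Setting it with Security.setProperty only takes effect if it runs before the first lookup, which is why many people prefer setting it in the java.security file instead.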
I am trying to hit 350 users, but JMeter fails the script saying Connection timed out.
I have added the following:
http.connection.stalecheck$Boolean=true in hc.parameter file
httpclient4.retrycount=1
hc.parameter.file=hc.parameter
Is there anything else that I am missing?
This normally indicates a problem on the application under test side, so I would recommend checking the logs of your application for anything suspicious.
If everything seems to be fine there, check the logs of your web and database servers; for instance, Apache HTTP Server allows 150 connections by default, MySQL 100, etc., so you may need to identify whether you are suffering from this kind of limit and what needs to be done to raise it.
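For reference, these are the usual places those limits live (directive names and defaults vary by version and MPM, so treat this as a sketch rather than exact values):

# Apache httpd (prefork/worker/event MPM config):
MaxRequestWorkers 150        # called MaxClients on older Apache versions

# MySQL (my.cnf, [mysqld] section):
max_connections = 100

# or inspect the current value at runtime:
SHOW VARIABLES LIKE 'max_connections';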
And finally, it may simply be a lack of CPU or free RAM on the application under test side, so the next time you run your test, keep an eye on baseline OS health metrics, as the application may respond slowly or even hang if it doesn't have any spare headroom. You can use the JMeter PerfMon plugin to integrate this form of monitoring with your load test.
My Apache access_log is littered with the following entries:
127.0.0.1 - - [06/Apr/2016:11:43:58 +0000] "GET /lookup/503 Over Quota Error Over Quota This application is temporarily over its serving quota. Please try again later. HTTP/1.0" 404 3450 "-" "WordPress/4.2.7; http://snip.com"
And on and on...
Within minutes, Apache spawns several child processes and the MySQL DB fails.
If I restart MySQL and Apache, minutes later the same thing happens.
I suspect it's a WordPress issue since the source of the requests is localhost [127.0.0.1].
Has anyone observed this behaviour before and, if so, how did you resolve it? Or, what further diagnostics could you suggest I use to determine the root cause (e.g. more detailed logging, additional logs, etc.)?
More Details:
WordPress/4.2.7,
Apache/2.4.16,
MySQL/5.5.46
Apache and MySQL DB on same server
Server is Linux with 8 GB RAM
Although not a complete answer, I managed to narrow down the problem to WordPress plugins, but I have not taken the time to identify the exact plugin or why it started happening "all of a sudden".
I disabled all the WordPress plugins and observed that the GET /lookup/503 Over Quota Error... entries were no longer showing up in the Apache logs.
Going on the assumption that this problem was plugin-related, I disabled all plugins (search DuckDuckGo for ways to disable plugins without the WordPress admin console; a couple of approaches are sketched below.)
Then, I started enabling the plugins that I knew the site absolutely needed (sidebar: there were over 40 plugins installed, 30 of which remain disabled as of today. The lesson: only install plugins you need.)
Eventually, I'll go back and identify the exact plugin and what changed to make it freak out (I suspect bad data). For now, all is right with the world.
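For the "disable plugins without the admin console" part, the two common approaches (assuming shell access; WP-CLI may or may not be installed on your host) are roughly:

# With WP-CLI, from the WordPress root:
wp plugin list
wp plugin deactivate --all

# Without WP-CLI, move the plugins directory aside so WordPress loads none of them:
mv wp-content/plugins wp-content/plugins.disabled

Re-enabling is then a matter of activating plugins one at a time (or moving individual plugin folders back) while watching the access_log for the 503 Over Quota entries.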
This looks like an issue with WooCommerce: WooCommerce has a geo-IP lookup function, and when we disabled it, things seemed to work fine.
I am getting intermittent 'Connection Timed Out' errors when a PHP script on my web server connects to the MySQL database server over the private network. However, if I tell the script to use the public network to connect, these errors do not appear.
My connection script is set up so that whenever I try to connect to MySQL, it checks for errors; if there is an error, it sends me an email and then automatically switches to the public network to try that connection. If the public connection fails, it sends me another email and displays a custom web page to the user.
I get about 5 to 10 connection errors every hour. There are hundreds of successful connections every minute.
These machines are dedicated machines. I contacted our hosting company and they tested the routers and cables and said everything is fine. I tried pinging the servers both ways and there are no errors at all for test periods over an hour.
I am using the latest Nginx with the latest PHP and PHP-FPM. MySQL is 5.5.27. These are CentOS 6 64-bit systems with the latest updates.
I've tried many network configuration options and adjustments to the PHP-FPM and MySQL config files, and no matter what I do or change, nothing fixes it.
The weird thing is, everything works great over the public network and pings and file transfer work great over the private network between both machines.
Any ideas?
** UPDATE **
I made some changes to the PHP-FPM config file and to the MySQL config file, and the errors are now down to about 2 to 3 per hour, but the issue is still unresolved.
I'm not sure this is your case, but it's still worth mentioning as it helped me in a similar situation. Basically, there is a cap on the max number of connections in the Linux kernel: https://serverfault.com/questions/10852/what-limits-the-maximum-number-of-connections-on-a-linux-server
Not sure if it is shared between all the networks, but if you think it's worth checking, I'd just raise those variable values, say, twofold and see if it has any effect on how frequently the error happens.
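A rough sketch of what to look at (names and sensible values depend on your kernel and workload, so verify before changing anything):

# Per-process open file/socket limit for the nginx, php-fpm and mysqld processes:
ulimit -n
# System-wide file handle ceiling:
cat /proc/sys/fs/file-max
# Ephemeral port range and listen backlog:
sysctl net.ipv4.ip_local_port_range net.core.somaxconn
# Socket summary; lots of TIME_WAIT sockets can exhaust ports on a busy link:
ss -s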