Issue when trying to connect to the cluster after updating the version of Java SDK - couchbase

We are experiencing the issue when trying to connect to the cluster after updating the version of Java SDK.
The setup of the system is as follows:
We have a web application that is using Java SDK and a Couchbase cluster. In between we have a VIP (Virtual IP Address). We realise that isn’t ideal but we’re not able to change that immediately since VIP was mandated by Tech Ops. VIP is basically only there to reroute the initial request on application startup. That way we can make modifications on the cluster and ensure that when application starts it can find the cluster regardless of the actual nodes in the cluster and their IPs.
Prior to the issue we used JAVA SDK version 1.4.4. Our application would start and Java SDK would initiate a request on port 8091 to VIP. Please note that port 8091 is the only port open on VIP. VIP would reroute the request to one of the node cluster currently in use the cluster would respond to Java SDK. At that point Java SDK would discover all the nodes in the cluster and application would run fine. During up time if we would add, remove a node from the cluster Java SDK would update automatically and everything would run without the issue.
In the last sprint we updated the Java SDK to version 2.1.3. Our application would start and Java SDK would initiate a request on port 11210 to VIP. Since this port is not open the request would fail and Java SDK would throw an exception:
Caused by: java.lang.RuntimeException: java.util.concurrent.TimeoutException
at com.couchbase.client.java.util.Blocking.blockForSingle(Blocking.java:93)
at com.couchbase.client.java.CouchbaseCluster.openBucket(CouchbaseCluster.java:108)
at com.couchbase.client.java.CouchbaseCluster.openBucket(CouchbaseCluster.java:99)
at com.couchbase.client.java.CouchbaseCluster.openBucket(CouchbaseCluster.java:89)
No further request would be made on any port.
It appears the order in which port are being used has been changed between versions. Could somebody please confirm, or dispute, that the order in which ports are being used for cluster discovery has been changed between versions. Also could somebody please provide some advice on how we could resolve the issue. We are trying to understand the clients behavior, if we could open all those ports on the VIP would the client still then function correctly and at full performance?
The issue is happening on our production environment which we cannot use for testing out potential solutions since it will interfere with our products.

In v2.x of the Java SDK, it defaults to 11210 to get the cluster map to bootstrap the application. This is a huge improvement actually as now the map comes from the managed cache and not the cluster manager (8091). The SDK should use 8091 as a fall back if it cannot get the map on 11210 though. Regardless, you really want to get that map from 11210, trust me. It cleans up a lot of problems.
To resolve this long term and follow Couchbase best practices, upgrade to the Java 2.2.x SDK, get rid of the VIP entirely and go with a DNS SRV record instead. That gives you one DNS entry for the SDK connection object and you just manage the node list in DNS. It works great. I say SDK 2.2 as the DNS SRV record solution is fully supported there, in 2.1 it is experimental. VIPs are specifically recommended against by Couchbase these days. In older versions of the SDKs it was fine to do this and it helped with limiting the number of connections from the app to the DB nodes, but that is no longer necessary and can actually be a bad thing.

in addition to Kirk's long term answer (which I also advise you to follow), a shorter term solution may be to deactivate the 11210 bootstraping (carrier bootstrap) through the CouchbaseEnvironment by calling bootstrapCarrierEnabled(false) on the builder.
I don't guarantee that it'll work with a vIP even after that, but that may be worth a try if you're in a hurry.

Related

Couchbase CLI/REST to get cluster current settings

Im trying to find a way to monitor couchbase cluster settings such as memory, email configurations, etc.
Ideally it would be a cli/REST command that describes entire cluster configurations or its particular components.
Couchbase version: 4.5.1- Community Edition
Will appreciate for any advice.
In current CB versions, you can get the email info using http://hostname:8091/settings/alerts and memory info using http://hostname:8091/pools/nodes
For some reason, I cannot seem to access the CB archived documentation to confirm this. Try it out and see if these APIs are available in 4.5.x. The pools API should be available. Not sure on the alerts API.

Using zabbix_sender for host discovery

I'm writing an application which delivers data from remote devices over an HTTP API. These devices are on a mobile data connection and have limited resources.
I wish to receive custom monitoring data over the HTTP API, relying on the security model designed in the application, and push that data to Zabbix directly (or indirectly) from node.js. I do not wish to use Zabbix Agent on the remote devices.
I see that I can use zabbix_sender to send data to a Zabbix server containing a pre-configured host. This works great. I intend to deliver monitoring data over my custom API, and when received give this data to zabbix_sender inside the server network.
The problem is there are many devices in the field and more are being added all the time.
TL;DR:
When zabbix_sender provides a custom hostname which doesn't exist in Zabbix already, it fails.
I would like to auto-add discovered hosts, based upon new hostnames from zabbix_sender. How would I do this?
Also, extra respect if anyone can give examples of how to avoid zabbix_sender and send data directly from node.js to the Zabbix server. I mean: suggest an NPM package that you have experience using. (Update: Found working node.js package here: https://www.npmjs.com/package/node-zabbix-sender)
Zabbix configuration: I'm learning from Zabbix 2.4 installed in Docker, no custom configuration from this Dockerhub: https://hub.docker.com/r/zabbix/zabbix-2.4/
Probably the best would be to use the Zabbix API to create hosts directly.
Alternatively, you could set up an action and emulate active agent connection, which would make Zabbix create the host via the active agent auto-regstration.
You could also use low level discovery (LLD) to send in JSON, which would result in hosts/items being created, based on prototypes.
In all of these cases you have to wait for one minute (by default) for the hosts to appear in the Zabbix cache, then you can send the data.
Also note that Zabbix 2.4 is not supported anymore, it will receive no fixes - it is not a "long-term support" release.

openshift and let's encrypt certificates

Is there any integration for Let's Encrypt in OpenShift (or, is this planned)? Let's encrypt are going to issue certs that expire in 90 days[1] -- and a big part of their plan is to have automation setups via people who use their certs so that they're always updated with new certs. Given this, some integration from OpenShift would be necessary.
Thanks,
[1] https://letsencrypt.org/2015/11/09/why-90-days.html
Currently, the ability to automate ssl certificate renewals and installation on OpenShift Online is not possible because the ssl certificates are stored at the node level, and ssl connections are terminated by the node level proxy (Reference this). If you would like to see it included in future versions, you should vote here and get people to vote on it. It's possible that you could automate it locally somewhat (or build a module to do it) by using the OpenShift Online API. Another suggestion would be to get a free ssl certificate from StartSSL that lasts for a year and install it either using the command line, or the web console.

Running Mule Standalone vs Tomcat in Production

There are many ways of deploying Mule ESB into a production environment. According to the documentation, it appears that running Mule as a standalone service is the recommended way of doing so.
Are there any reasons for NOT running Mule standalone in production? I'm sure its stable, but how does it compare to Tomcat as far as performance, reliability, and resource utilization go?
Should I still consider running it within Tomcat for any reason?
Using Tomcat, or any other web container, allows you to use the web tier of that container for HTTP inbound endpoint (via the Servlet transport) instead of either Mule's HTTP or Jetty transports.
Other differences are found in class loading, handling of hot redeployment and logging.
Now the main reason why people do not use Mule standalone is corporate policy, ie "thou shalt deploy on _". When production teams have gained experience babysitting a particular Java app/web server, they want you to deploy your Mule project in that context so they can administer/monitor it in a well-known and consistent manner.
But if you're happy with the inbound HTTP layer you get in Mule standalone and you are allowed to deploy it in production, then go for it. It's production ready.
Mule actually recommends deploying standalone. Inside a container like e.g. tomcat it has to share the threadpool, heap etc... This can obviously prevent it from performing at it's best.
The main reason you'd want to inside a container like tomcat is to get automatic deployment. I.e. you can just update your Mule application .war and the container will restart mule with the new application. This helps in testing.
Also some transports are specific to running inside a container, like the servlet transport. OTOH when designing solution so Mule transports between your container and your servlets your'e doing it wrong.

Integrate different Nagios webservers

I have different sites running with 4 to 5 server at each location. All the locations have one monitoring server with Nagios. Now I want to create a central location and want to combine all the nagios services running at each location. Can anyone please point me to some documentation for these type of jobs.
There are two approaches that you can take.
Install a new Nagios core as you did at each location and perform active checks on each of the remote hosts. You'll likely end up installing NRPE on each of the remote hosts at each location and can read this document for the details: http://nagios.sourceforge.net/docs/nrpe/NRPE.pdf. If your remote servers are Windows servers, you can use NSClient to much of the same things that NRPE does for Linux hosts. This effectively centralizes your monitoring server. I also wrote some how-to style entries for using NRPE to run privileged commands http://blog.gnucom.cc/?p=479 or to run event handlers http://blog.gnucom.cc/?p=458. If you get tired of installing NRPE, you can use my script here http://blog.gnucom.cc/?p=185. I also have instructions to install NSClient here http://blog.gnucom.cc/?p=201.
Install a new Nagios core as you did at each location and perform passive checks by instructing the remote Nagios cores to feed their results to the new central Nagios core's passive command file. I haven't done this myself, so I'm going to point you to the communities documentation here http://nagios.sourceforge.net/docs/2_0/passivechecks.html. You could probably look at my event handler post to set up event handlers that send checks to the main server.
From my personal experience, the first option I mentioned is easier to implement, and is far easy to administer. However, as your server fleet grows you'll start seeing major CPU bottlenecks with the main Nagios core. This is where passive checks would become beneficial, as the main Nagios core simply waits for critical checks to be sent to it rather than having to check them itself.
Hope this helps. :)
A centralized view tool may be what you are looking for. There are a number of different options available.
Nagiosfusion
MK Livestatus
Nagcen
Thruk