Couchbase compaction and data loss

We had an issue with our cluster recently (currently running 4.5). Compaction was not completing; it seemed to be continually running on a single node and the disk was getting close to full. We canceled the compaction and had to bounce the node to get things stable. We are using a 4-node cluster with a replication factor of 2.
What we are seeing now is that some documents are missing and some documents have reverted to an earlier version.
Is it possible that the scenario described above could cause documents to go missing or revert to a previous version?

Related

MySQL automatic restart after memory runs full

We've been using MySQL on Cloud SQL for quite some time now.
We started with MySQL 5, and after a long wait for the final release of MySQL 8 we decided to upgrade our database server.
As the title suggests, we now see strange behavior in our memory utilization.
Memory constantly fills up until the server's maximum resources are reached, then the instance restarts and memory starts filling up again.
There could be an issue with one of our services, but before the upgrade our memory consumption was more or less constant.
Furthermore, when we upgraded to MySQL 8 we increased resources and switched from db-n1-standard-1 to db-n1-standard-2, to have more resources available as the data grows.
Does anyone recognize this behavior? Is there a change from MySQL 5 to 8? I didn't find any information about it, only some notes that it's normal for MySQL to take as much memory as it can get. But I'm still wondering why it didn't on MySQL 5.
Some more details on the configuration:
We're using a read replica for HA.
Binary logs are activated.
The slow query log is enabled with FILE output.
Everything else is the default Cloud SQL configuration.
Any help is much appreciated.
Best regards,
Chris
Indeed, it seems that MySQL 8 consumes more memory than MySQL 5. As shown in the tests performed by the author of the article MySQL 8 and MySQL 5.7 Memory Consumption on Small Devices, the memory used by version 8 on the same VM settings is considerably higher than on version 5, for both resident and virtual memory. Even though those are tests on small VMs, it's a good indication that the same happens in bigger configurations as well.
So yes, as you mentioned, it's normal for MySQL to take as much memory as it can get, but MySQL 8 does indeed consume more memory than MySQL 5.
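If you want to see where the memory is actually going, a quick check (a sketch only, run with a user that has sufficient privileges) is to compare the usual memory-related settings and the per-component accounting between the two versions:

-- Main memory-related settings (run on both the old and the new instance)
SHOW GLOBAL VARIABLES
 WHERE Variable_name IN ('innodb_buffer_pool_size', 'key_buffer_size',
                         'tmp_table_size', 'max_connections', 'performance_schema');

-- On 5.7+/8.0 the sys schema can break memory usage down by component
SELECT * FROM sys.memory_global_by_current_bytes LIMIT 10;

Note that on Cloud SQL you cannot edit my.cnf directly; these values are controlled through database flags, so the queries above are only meant to show where the difference comes from.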

AWS MySQL RDS: "Got error 28 from storage engine"

I’m working on a relatively small application, serving about 1,500 users and running on a MySQL database that is about 300 MB. The entire system runs on AWS, with a single dedicated EC2 node running the Grails application on Tomcat 8 and a single dedicated MySQL RDS instance. The system has been running live in production for about three years with no database issues. The two largest tables contain about 40k records. The application is built using Grails and Java 1.7.
Yesterday our application began throwing the following exception, with the underlying error message of:
"Got error 28 from storage engine"
The logs available from the RDS admin web console are empty.
Googling has not revealed any promising leads, other than most messages pointing out that the disk is out of space. Since most search results refer to disk space, and being software developers rather than DBAs with significant MySQL expertise, we boosted the storage of the MySQL RDS instance. Unfortunately, today our application is still sporadically throwing the same exception. Having created our MySQL RDS instance with 15 GB of space -- several orders of magnitude more than our application uses -- we are at a loss as to the root cause of this issue. Our guess is that we are hitting some out-of-the-box MySQL limitation, but we have no idea what it may be or how to solve it. Indeed, the whole reason we host on RDS was to avoid issues of this type.
This seems to be a somewhat common MySQL error, but one without a concrete trail for us to follow. Most suggestions talk about checking the filesystem or "inode" space. Given that this is a hosted MySQL RDS instance on AWS, I am unsure whether or how to check such things. Looking at CloudWatch for the RDS instance, I can see that the CPU is idle and that the instance is well under the 15 GB storage limit.
Does anyone have any suggestions for us to investigate?
Given that we are new to RDS, can you please point us to any documentation, or -- even better -- suggest settings we can tweak in the RDS console to help prevent this error from occurring? We moved to RDS thinking that if this was a MySQL sizing or scaling issue, RDS would solve the problem. As a last resort, this morning we deleted about 20k rows of non-essential data. Unfortunately, the issue persists.
A few questions:
Are there any RDS settings we can adjust to avoid this issue?
Can this be solved by moving to a larger RDS instance, perhaps with more memory?
Would we experience this issue if we moved to Aurora?
From your comment, this is definitely a low-storage issue, because 13 GB is very little storage. You can check the free available storage in the dashboard: look at the "Storage" metric under Monitoring, and once it goes past the red line you will start getting error 28. You will have to increase the storage of your RDS instance or free up some space; I would suggest increasing the storage to avoid this issue in the future.
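If you want to check from inside MySQL what is actually eating the space, something along these lines can help (a sketch only; the schema names will of course be your own):

-- Approximate size of each schema, in MB
SELECT table_schema,
       ROUND(SUM(data_length + index_length) / 1024 / 1024, 1) AS size_mb
  FROM information_schema.tables
 GROUP BY table_schema
 ORDER BY size_mb DESC;

-- Binary logs can also quietly consume RDS storage
SHOW BINARY LOGS;

If the binary logs turn out to be the culprit, RDS exposes the mysql.rds_set_configuration stored procedure to limit binlog retention.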

MySQL hanging in Writing to Net

I have a problem where MySQL threads sometimes get stuck in the "Writing to net" status.
I have 4 Apache servers (2.4, with requests load-balanced across them) and 1 MySQL server (MariaDB 10). Apache is running PHP 5.6. All Apache servers have the same configuration, and all servers run on CentOS 7. SELinux is disabled on the Apache servers for debugging reasons. There are no problems in the audit logs on the DB server. All servers are virtual and located on the same VMware cluster.
The problem appears only on specific pages and specific queries to the DB.
Usually there are around 100-200 separate queries on a page, and most of them take 0.0001-0.0010 s. But then I have one query that takes around 1-2 seconds, even though the query itself takes far less time (around 0.0045 s).
The problematic query returns around 8,984 rows, and when executed from the CLI with a debug script it runs fast, as expected.
The strange part is that over time some Apache servers execute that page quickly and some slowly, and it changes during the day. I also tried removing one Apache server from the cluster and then sending the same request; if the server is not under any load, it usually responds fast.
All servers have enough resources (CPU and RAM), so it is definitely not a load issue. They usually have around 4-10 active Apache workers (prefork) and capacity for 100 active workers.
I tried debugging with tcpdump, and when requesting the page I can see the packet flow for the fast queries, then it stops for a while and resumes. I am not sure whether the problem is on the MySQL server or on the Apache servers.
My guess is that I am hitting some kind of limit, but I have no idea which one.
The solution is quite odd.
First, a few more details:
All Apache servers have the same application data (PHP files, images, etc.) mounted from NFS. The NFS share was working fine (low latency, no data corruption).
Solution:
When I was desperate I went through every possible log, and noticed that iptables was dropping some packets from the NFS server. I told myself that I should probably fix that, even though it seemed unrelated.
But after I allowed all traffic from the NFS server to my Apache servers, the MySQL "Writing to net" status disappeared and all websites started responding quickly.
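For anyone debugging similar symptoms, a quick way to see whether threads are piling up in this state, and which network-related limits apply, is something like this (plain diagnostics, nothing specific to my setup):

-- Show what every connection is doing; look for the state 'Writing to net'
SHOW FULL PROCESSLIST;

-- Network-related settings that influence how long a thread can sit in that state
SHOW GLOBAL VARIABLES LIKE 'net_write_timeout';
SHOW GLOBAL VARIABLES LIKE 'max_allowed_packet';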

Cloud-based LAMP cluster

I run a pretty customized cluster for processing large amounts of scientific data, based on a basic LAMP design. In general, I run a separate MySQL server with around 128 GB of RAM and about 1 TB of storage. Separately, I run a head node that serves as an NFS mount point for the data input of my process and as a web server to display results. Finally, I typically have a few compute nodes that get their jobs from a MySQL table, get the data from NFS, do some heavy lifting, then put the results into MySQL.
I have come across a dataset I would like to process which is pretty large (1 TB of input data), and I don't really have the hardware on hand to handle it. As a result, I began investigating Google Compute Engine and the prospect of scaling instances to process these data rapidly, with the results stored in a MySQL instance. Upon completion, the MySQL tables could be dumped from the cloud and brought up locally for analysis. I would have no problem deploying a MySQL server, along with the rest of the LAMP pieces and the compute nodes, but I can't quite figure out how I would do this in the cloud.
A major sticking point seems to be the lack of read/write NFS, which I need to get the data onto several instances, crunch it, then push the results to MySQL. This is a necessary step for me: I would queue hundreds of jobs from the web server, then have the instances (as many as 50-100) pick the jobs up by connecting to a centralized MySQL instance to find out what job each instance needs to do and where the data is, process the data (there is a file conversion that makes the write access necessary), crunch it, then load the results into MySQL. I hope I'm explaining my situation clearly. This seems like a great example of a CPU-intensive process that would scale nicely in the cloud; I just can't seem to put all the pieces together... Any input is appreciated!
It sounds quite possible; I've been doing similar things in GCE for a while now.
NFS mount - you just need to configure it as you would normally: set up the NFS server on the head node, and then configure the clients on the slave nodes to mount it. Here and here are some basic configuration instructions for CentOS 6 that I used to get NFS up and running.
Setting up a LAMP stack is very straightforward. These machines run pretty much vanilla Linux distros, so you can just use yum or apt-get to install the components.
For the cluster, you will probably end up having an image for the head node that you use once, and then another image for the slave nodes that you replicate for each one.
For the scheduler, I've used Condor and SGE successfully, but I'm sure the others would work just as well.
Hope this helps.
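For the job-dispatch part of the question (compute nodes pulling work from a central MySQL table), one simple pattern is to let each instance claim a job atomically so that two nodes never grab the same one. A minimal sketch, assuming a hypothetical jobs table (the names and columns below are illustrative, not from the original setup):

-- Hypothetical job queue table shared by all compute nodes
CREATE TABLE jobs (
    id         INT AUTO_INCREMENT PRIMARY KEY,
    input_path VARCHAR(255) NOT NULL,                 -- location of the input on the NFS share
    status     ENUM('queued','running','done','failed') NOT NULL DEFAULT 'queued',
    worker     VARCHAR(64) NULL,                      -- hostname of the instance that claimed the job
    claimed_at DATETIME NULL
);

-- Each compute node claims the oldest queued job inside a transaction,
-- so two instances can never pick up the same job
START TRANSACTION;
SELECT id INTO @job_id
  FROM jobs
 WHERE status = 'queued'
 ORDER BY id
 LIMIT 1
 FOR UPDATE;

UPDATE jobs
   SET status = 'running',
       worker = 'worker-01',                          -- this instance's hostname
       claimed_at = NOW()
 WHERE id = @job_id;
COMMIT;

The node then reads the input from the NFS mount, does the file conversion and the number crunching, and writes the results back with an ordinary INSERT or UPDATE.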

Quartz JobStoreTX instances disappear on cluster recovery

I have configured two Java WARs with Quartz schedulers (version 2.2.1), started with XMLSchedulingProcessorPlugin. Both web applications are also running in cluster mode (they are deployed on two identical machines), so I enabled the clustering properties for Quartz:
#===========================================================================
# Clustering
#===========================================================================
org.quartz.jobStore.isClustered = true
org.quartz.jobStore.clusterCheckinInterval = 60000
Both applications are running on JBoss AS 7.1, configured with Quartz's JobStoreTX. They save their jobs, triggers and so on in a MySQL database, which is currently set up with Galera (1 virtual IP address, 2 real nodes).
Currently, I am testing the failure of one of the real nodes, so that the jobs keep firing even when a power outage occurs. In that scenario I noticed some failures, such as the one described in this Terracotta issue (the patch is not applied in the current version of Quartz).
In my case, I should have 4 Quartz instances in the QRTZ_SCHEDULER_STATE table... even if one of the MySQL nodes restarts. The fact is that sometimes one or two instances are deleted from the table (maybe the ones that do not have any active job), so I am afraid it is possible to lose both instances of an application during cluster recovery.
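For reference, the instances registered in the cluster can be checked with a query along these lines (assuming the default QRTZ_ table prefix):

-- Which scheduler instances are currently checked in, and when they last checked in
SELECT SCHED_NAME, INSTANCE_NAME, LAST_CHECKIN_TIME, CHECKIN_INTERVAL
  FROM QRTZ_SCHEDULER_STATE;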
Has anyone experienced the same? Is there any solution other than restarting JBoss in order to reload the jobs and triggers?
Thanks in advance.