Service 'memcached' exited with status 139. Restarting. Messages (Couchbase)

I am facing the error below most of the time, and at the same time the bucket appears down (1 node pending). This error appears right after setup, even though the Couchbase server has not been accessed yet.
Service 'memcached' exited with status 139. Restarting. Messages:
2023-01-11T04:44:44.735996+00:00 CRITICAL /opt/couchbase/bin/memcached(_ZN15google_breakpad16ExceptionHandler12GenerateDumpEPNS0_12CrashContextE+0x3ce) [0x400000+0x14fc5e]
2023-01-11T04:44:44.736000+00:00 CRITICAL /opt/couchbase/bin/memcached(_ZN15google_breakpad16ExceptionHandler13SignalHandlerEiP9siginfo_tPv+0x94) [0x400000+0x14ff74]
2023-01-11T04:44:44.736009+00:00 CRITICAL /lib64/libpthread.so.0() [0x7f67cce7e000+0x12b20]
2023-01-11T04:44:44.736015+00:00 CRITICAL /opt/couchbase/bin/memcached() [0x400000+0xd3e93]
2023-01-11T04:44:44.736019+00:00 CRITICAL /opt/couchbase/bin/memcached() [0x400000+0xa3e8c]
2023-01-11T04:44:44.736024+00:00 CRITICAL /opt/couchbase/bin/../lib/libplatform_so.so.0.1.0(_ZN9Couchbase6Thread12thread_entryEv+0xf) [0x7f67cf6f2000+0x14e7f]
2023-01-11T04:44:44.736027+00:00 CRITICAL /opt/couchbase/bin/../lib/libplatform_so.so.0.1.0() [0x7f67cf6f2000+0x95d7]
2023-01-11T04:44:44.736031+00:00 CRITICAL /lib64/libpthread.so.0() [0x7f67cce7e000+0x814a]
2023-01-11T04:44:44.736061+00:00 CRITICAL /lib64/libc.so.6(clone+0x43) [0x7f67ccabb000+0xfcf23]
Couchbase bucket snapshot
Couchbase server details:
Couchbase-server-community-6.6.0-7909
Single Couchbase instance (no other cluster node)
Installed on Operating System: CentOS Linux 8 (Core) (Kernel: Linux 4.18.0-348.7.1.el8_5.x86_64)
Provided 10 GB of memory while setting up the cluster (assigned 100 MB to the created bucket of type ephemeral)
Tried restarting, but it did not help, and no specific cause was found in the logs
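For reference, exit status 139 means the memcached process was killed by signal 11 (SIGSEGV), since 139 = 128 + 11, which matches the google_breakpad crash-handler frames in the trace above. A quick way to confirm the signal name from a shell with bash's kill builtin:
kill -l 11    # prints SEGV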

Related

geth does not persist trie node data from memory to disk on ungraceful system restart

Issue: geth 1.8.22 starts mining from one of the first blocks instead of the last one on system reboot.
What we have
We have 3 synced private geth nodes using PoA (clique).
What happened
One day (a week ago) we had issues with our hosting provider, so we had to restart 2 out of 3 nodes (each node is on a separate VPS). The current block is 4 000 000. When node 1 and node 2 were restarted, they started mining from block 372 instead of the latest one, 4 000 000.
Why it happened (my guess)
Geth 1.8.22 keeps trie node data in RAM instead of on disk. On a graceful node shutdown (for example from the console), this trie node data is saved from RAM to the hard drive. On a forced system shutdown (for example from the hosting admin panel), the trie node data does not have time to be saved to the hard drive. Our nodes had been running for 6 months without any reboot, so I think this trie node data was kept in RAM the whole time and vanished on system reboot (though we still have node 3, which is up and running).
Logs
Here are the logs when I'm trying to run the backup version of one of the nodes:
vladimir@comp:~/Public/projects/ethereum/repro-geth-bug/geth-linux-amd64-1.8.22-7fa3509e$ ./geth --datadir ../opt/ethereum/data/ --networkid 1515 --unlock 0xd6ee38421e1713dd50e888c6d689b82953946bc3 --password ../opt/ethereum/unlock_password --port 30306 --mine
INFO [11-21|17:06:25.374] Maximum peer count ETH=25 LES=0 total=25
INFO [11-21|17:06:25.374] Starting peer-to-peer node instance=Geth/v1.8.22-stable-7fa3509e/linux-amd64/go1.11.5
INFO [11-21|17:06:25.374] Allocated cache and file handles database=/home/vladimir/Public/projects/ethereum/repro-geth-bug/opt/ethereum/data/geth/chaindata cache=512 handles=2048
INFO [11-21|17:06:26.550] Initialised chain configuration config="{ChainID: 1515 Homestead: 1 DAO: <nil> DAOSupport: false EIP150: 2 EIP155: 3 EIP158: 3 Byzantium: 4 Constantinople: 5 ConstantinopleFix: <nil> Engine: clique}"
INFO [11-21|17:06:26.550] Initialising Ethereum protocol versions="[63 62]" network=1515
WARN [11-21|17:06:26.579] Head state missing, repairing chain number=4073749 hash=9bfb53…56d503
INFO [11-21|17:07:45.179] Rewound blockchain to past state number=371 hash=102018…d91947
INFO [11-21|17:07:45.180] Loaded most recent local header number=4073749 hash=9bfb53…56d503 td=8147499 age=2d5h43m
INFO [11-21|17:07:45.180] Loaded most recent local full block number=371 hash=102018…d91947 td=743 age=7mo3w6d
INFO [11-21|17:07:45.180] Loaded most recent local fast block number=4073749 hash=9bfb53…56d503 td=8147499 age=2d5h43m
INFO [11-21|17:07:45.180] Loaded local transaction journal transactions=3 dropped=3
INFO [11-21|17:07:45.180] Regenerated local transaction journal transactions=0 accounts=0
WARN [11-21|17:07:45.180] Blockchain not empty, fast sync disabled
INFO [11-21|17:07:45.623] New local node record seq=6 id=e8c5a9e8848d4e30 ip=127.0.0.1 udp=30306 tcp=30306
INFO [11-21|17:07:45.623] Started P2P networking self=enode://9647000ba2579dd529574b49f472f029839a09257c1bc3ade5135cbbb5f3ceaf1237aff5b6b947d2fa4f218fa24858dc2767bd4b78e082b04c9d013c1482cfa6@127.0.0.1:30306
INFO [11-21|17:07:45.624] IPC endpoint opened url=/home/vladimir/Public/projects/ethereum/repro-geth-bug/opt/ethereum/data/geth.ipc
INFO [11-21|17:07:46.192] Unlocked account address=0xd6ee38421e1713dD50E888c6D689B82953946bC3
INFO [11-21|17:07:46.192] Transaction pool price threshold updated price=1000000000
INFO [11-21|17:07:46.192] Transaction pool price threshold updated price=1000000000
INFO [11-21|17:07:46.192] Etherbase automatically configured address=0xd6ee38421e1713dD50E888c6D689B82953946bC3
INFO [11-21|17:07:46.192] Commit new mining work number=372 sealhash=685e15…2c52df uncles=0 txs=0 gas=0 fees=0 elapsed=75.951µs
INFO [11-21|17:07:46.192] Successfully sealed new block number=372 sealhash=685e15…2c52df hash=0c60ef…f29e6b elapsed=385.27µs
INFO [11-21|17:07:46.192] 🔨 mined potential block number=372 hash=0c60ef…f29e6b
INFO [11-21|17:07:46.193] Commit new mining work number=373 sealhash=337ae5…2b4704 uncles=0 txs=0 gas=0 fees=0 elapsed=222.362µs
INFO [11-21|17:07:47.962] Mapped network port proto=tcp extport=30306 intport=30306 interface="UPNP IGDv1-IP1"
INFO [11-21|17:07:48.391] Mapped network port proto=udp extport=30306 intport=30306 interface="UPNP IGDv1-IP1"
INFO [11-21|17:07:49.625] New local node record seq=7 id=e8c5a9e8848d4e30 ip=128.71.103.50 udp=30306 tcp=30306
INFO [11-21|17:07:51.001] Successfully sealed new block number=373 sealhash=337ae5…2b4704 hash=b67668…81f164 elapsed=4.807s
INFO [11-21|17:07:51.001] 🔨 mined potential block number=373 hash=b67668…81f164
INFO [11-21|17:07:51.002] Commit new mining work number=374 sealhash=c0e9f6…628d51 uncles=0 txs=0 gas=0 fees=0 elapsed=1.434ms
INFO [11-21|17:07:56.001] Successfully sealed new block number=374 sealhash=c0e9f6…628d51 hash=77aae2…9c44e8 elapsed=4.998s
INFO [11-21|17:07:56.001] 🔨 mined potential block number=374 hash=77aae2…9c44e8
INFO [11-21|17:07:56.003] Commit new mining work number=375 sealhash=6f7db7…adca12 uncles=0 txs=0 gas=0 fees=0 elapsed=1.305ms
^CINFO [11-21|17:07:58.483] Got interrupt, shutting down...
INFO [11-21|17:07:58.483] IPC endpoint closed url=/home/vladimir/Public/projects/ethereum/repro-geth-bug/opt/ethereum/data/geth.ipc
INFO [11-21|17:07:58.483] Writing cached state to disk block=374 hash=77aae2…9c44e8 root=e16e04…e93be1
INFO [11-21|17:07:58.483] Persisted trie from memory database nodes=0 size=0.00B time=7.185µs gcnodes=0 gcsize=0.00B gctime=0s livenodes=1 livesize=0.00B
INFO [11-21|17:07:58.483] Writing cached state to disk block=373 hash=b67668…81f164 root=e16e04…e93be1
INFO [11-21|17:07:58.483] Persisted trie from memory database nodes=0 size=0.00B time=2.571µs gcnodes=0 gcsize=0.00B gctime=0s livenodes=1 livesize=0.00B
INFO [11-21|17:07:58.484] Writing cached state to disk block=247 hash=7b422a…5f9a62 root=e16e04…e93be1
INFO [11-21|17:07:58.484] Persisted trie from memory database nodes=0 size=0.00B time=2.784µs gcnodes=0 gcsize=0.00B gctime=0s livenodes=1 livesize=0.00B
INFO [11-21|17:07:58.484] Blockchain manager stopped
INFO [11-21|17:07:58.484] Stopping Ethereum protocol
INFO [11-21|17:07:58.484] Ethereum protocol stopped
INFO [11-21|17:07:58.484] Transaction pool stopped
INFO [11-21|17:07:58.497] Database closed database=/home/vladimir/Public/projects/ethereum/repro-geth-bug/opt/ethereum/data/geth/chaindata
How to fix
The first thing that comes to mind is to restart the geth nodes (gracefully) via cron every day so that the nodes persist trie node data to disk.
How can an ungraceful system shutdown be handled so that the geth node persists its data and keeps mining from the latest block on restart?
Please check the full answer: https://github.com/ethereum/go-ethereum/issues/20383#issuecomment-558107815
In short:
geth persists data after 1 hour's worth of block processing
if your network is very light (i.e. mostly empty blocks), it takes a very long time until blocks are flushed from memory to the hard drive
currently there is no way to configure the persistence interval in geth
Solution: restart geth periodically so it saves data from RAM to the hard drive
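A minimal sketch of that workaround, assuming geth runs under systemd as a unit named geth.service (the unit name, the /etc/cron.d file, and the 03:00 schedule are all assumptions; any graceful stop/start that lets geth run its shutdown hooks will flush the cached trie to disk, as in the "Writing cached state to disk" lines above):
# /etc/cron.d/geth-restart (hypothetical file)
0 3 * * * root /bin/systemctl restart geth.service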

Trying to monitor resource usage of a kvm/qemu virtual machine with mesos

I'm currently deploying a KVM/QEMU virtual machine with Mesos/Marathon. In Marathon, I'm using the built-in Mesos command executor and running the following script:
virsh start centos7.0; while true; do echo 'centos 7.0 guest is running'; sleep 5; done
Note that the while loop is there only to keep the task running. My issue is that I cannot get Mesos to monitor the resource usage of the virtual machine.
When Marathon deploys this task on a Mesos agent, it creates a container that uses the memory and cpu cgroups:
/sys/fs/cgroup/cpu/mesos/31b48dc3-6f09-4b5a-8964-b82d711bb895
/sys/fs/cgroup/memory/mesos/31b48dc3-6f09-4b5a-8964-b82d711bb895
When the virtual machine is kicked off, the virsh start command sends a request to libvirtd. libvirtd then reads the guest.xml file located in /etc/libvirt/qemu/ and sends a request to the qemu/kvm driver to deploy it.
In my guest.xml file I'm using a custom cgroup partition (slice) to monitor my virtual machine's resource usage.
https://libvirt.org/cgroups.html
(for each cgroup)
/sys/fs/cgroup/???/vmHolder.slice/machine-qemu\x2d1\x2dcentos7.0\x2dclone.scope
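To see concretely which controllers that scope was created under (i.e. to fill in the ??? placeholder), the hierarchy can be listed directly on the agent; a sketch, with the glob pattern derived from the escaped scope name shown above:
ls -d /sys/fs/cgroup/*/vmHolder.slice/machine-qemu*centos7.0*.scope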
What I have tried:
I tried deleting the memory and cpu cgroups from this slice by running
cgdelete -r cpu,memory:vmHolder.slice
and then adding my qemu guest process to the mesos controllers
cgclassify -g cpu,memory:mesos/31b48dc3-6f09-4b5a-8964-b82d711bb895 GUEST-PID
When I run the command cat /proc/5531/cgroup, I see:
11:perf_event:/vmHolder.slice/machine-qemu\x2d1\x2dcentos7.0\x2dclone.scope
10:pids:/
9:devices:/vmHolder.slice/machine-qemu\x2d1\x2dcentos7.0\x2dclone.scope
8:cpuset:/vmHolder.slice/machine-qemu\x2d1\x2dcentos7.0\x2dclone.scope/emulator
7:net_prio,net_cls:/vmHolder.slice/machine-qemu\x2d1\x2dcentos7.0\x2dclone.scope
6:freezer:/vmHolder.slice/machine-qemu\x2d1\x2dcentos7.0\x2dclone.scope
5:blkio:/vmHolder.slice/machine-qemu\x2d1\x2dcentos7.0\x2dclone.scope
4:hugetlb:/
3:cpuacct,cpu:/mesos/31b48dc3-6f09-4b5a-8964-b82d711bb895
2:memory:/mesos/31b48dc3-6f09-4b5a-8964-b82d711bb895
1:name=systemd:/vmHolder.slice/machine-qemu\x2d1\x2dcentos7.0\x2dclone.scope
It shows that the process is in those controllers, but when I run systemd-cgtop, the VM's memory usage is not being added. I'm not sure what to do next. Any suggestions?
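One way to check whether the guest's memory is actually being charged to the Mesos container is to read the cgroup's own counters instead of relying on systemd-cgtop; a sketch using the container ID from above (cgroup.procs and memory.usage_in_bytes are standard cgroup v1 files):
cat /sys/fs/cgroup/memory/mesos/31b48dc3-6f09-4b5a-8964-b82d711bb895/cgroup.procs
cat /sys/fs/cgroup/memory/mesos/31b48dc3-6f09-4b5a-8964-b82d711bb895/memory.usage_in_bytes
If the QEMU PID shows up in cgroup.procs but the usage stays near zero, the memory was charged to the original cgroup before the move; with cgroup v1, existing charges are not transferred when a task is reclassified unless memory.move_charge_at_immigrate is enabled.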

Mount LVM2 and recover lost data from failed HDD?

HDD failure occurred.
So a new primary HDD was installed, and the old HDD was added as a secondary one.
I'm trying to mount the secondary HDD, but I keep getting errors.
I made /media/qwe/.
I then connected over SSH (via PuTTY) and ran these commands:
root@chicken [/]# mount /dev/sdb2 /media/qwe
mount: unknown filesystem type 'LVM2_member'
So I got an error.
root@chicken [/]# vgscan
Reading all physical volumes. This may take a while...
Found volume group "VolGroup" using metadata type lvm2
Found volume group "VolGroup" using metadata type lvm2
root@chicken [/]# vgs
VG #PV #LV #SN Attr VSize VFree
VolGroup 1 3 0 wz--n- 1.82t 0
VolGroup 1 3 0 wz--n- 1.82t 0
I use cPanel and WHM.
I am trying to recover the MySQL databases that were lost. I managed to mount sdb1, but I think that's the boot partition. I don't need that; I need to access the other files!
Any help?
You don't need the filesystem to get your data back.
Start by taking an image of the failed disk.
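A minimal sketch of that first step with GNU ddrescue, assuming the failed disk is /dev/sdb and a separate healthy disk with enough space is mounted at /mnt/recovery (both paths are assumptions):
ddrescue -d -r3 /dev/sdb /mnt/recovery/sdb.img /mnt/recovery/sdb.map
Here -d uses direct disk access and -r3 retries bad sectors three times. All further recovery work (scanning for the volume group, pulling the MySQL files) should then be done against the image rather than the failing drive.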

Unable to access Google Compute Engine instance using external IP address

I have a Google Compute Engine instance (CentOS) which I could access using its external IP address until recently.
Now, suddenly, the instance cannot be accessed using its external IP address.
I logged in to the developer console and tried rebooting the instance but that did not help.
I also noticed that the CPU usage is almost at 100% continuously.
On further analysis of the serial port output, it appears that init is not loading properly.
I am pasting below the last few lines from the serial port output of the virtual machine.
rtc_cmos 00:01: RTC can wake from S4
rtc_cmos 00:01: rtc core: registered rtc_cmos as rtc0
rtc0: alarms up to one day, 114 bytes nvram
cpuidle: using governor ladder
cpuidle: using governor menu
EFI Variables Facility v0.08 2004-May-17
usbcore: registered new interface driver hiddev
usbcore: registered new interface driver usbhid
usbhid: v2.6:USB HID core driver
GRE over IPv4 demultiplexor driver
TCP cubic registered
Initializing XFRM netlink socket
NET: Registered protocol family 17
registered taskstats version 1
rtc_cmos 00:01: setting system clock to 2014-07-04 07:40:53 UTC (1404459653)
Initalizing network drop monitor service
Freeing unused kernel memory: 1280k freed
Write protecting the kernel read-only data: 10240k
Freeing unused kernel memory: 800k freed
Freeing unused kernel memory: 1584k freed
Failed to execute /init
Kernel panic - not syncing: No init found. Try passing init= option to kernel.
Pid: 1, comm: swapper Not tainted 2.6.32-431.17.1.el6.x86_64 #1
Call Trace:
[] ? panic+0xa7/0x16f
[] ? init_post+0xa8/0x100
[] ? kernel_init+0x2e6/0x2f7
[] ? child_rip+0xa/0x20
[] ? kernel_init+0x0/0x2f7
[] ? child_rip+0x0/0x20
Thanks in advance for any tips to resolve this issue.
Mathew
It looks like you might have a script or other program that is causing you to run out of inodes.
You can delete the instance without deleting the persistent disk (PD) and create a new VM with a higher capacity using your PD; however, if it's a script causing this, you will end up with the same issue. It's always recommended to back up your PD before making any changes.
Run this command to find more info about your instance:
gcutil --project= getserialportoutput
If the issue still continues, you can either:
- Make a snapshot of your PD and make a copy of the PD, or
- Delete the instance without deleting the PD
Then attach and mount the PD to another VM as a second disk, so you can access it and find out what is causing the issue. Visit this link https://developers.google.com/compute/docs/disks#attach_disk for more information on how to do this.
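Once the PD is attached to the rescue VM, a sketch of mounting it and checking the suspected inode exhaustion (the device name /dev/sdb1 and the mount point are assumptions; the actual device depends on how the disk was attached):
sudo mkdir -p /mnt/rescue
sudo mount /dev/sdb1 /mnt/rescue
df -i /mnt/rescue    # IUse% at 100% confirms the filesystem has run out of inodes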
Visit this page http://www.ivankuznetsov.com/2010/02/no-space-left-on-device-running-out-of-inodes.html for more information about inodes troubleshooting.
Make sure the Allow HTTP traffic setting on the VM is still enabled.
Then check which network firewall you are using and its rules.
If your network is set up to use an ephemeral IP, it will periodically be released back, which causes your IP to change over time. In that case, set it to static/reserved (on the Networks page).
https://developers.google.com/compute/docs/instances-and-network#externaladdresses

Multiple HAProxy instances on OpenShift

I have an application (Node.js) deployed on OpenShift (bronze plan) with the web load balancer activated; the minimum number of active gears is 3 and the maximum is 16.
Sometimes on the main gear I can see more than one HAProxy instance running; for example, right now I have:
> ps -ef|grep /usr/sbin/haproxy
3505 37488 1 1 08:46 ? 00:00:01 /usr/sbin/haproxy -f /var/lib/openshift/<APP_ID>/haproxy//conf/haproxy.cfg -sf 37237
3505 149643 1 1 May28 ? 00:09:08 /usr/sbin/haproxy -f /var/lib/openshift/<APP_ID>/haproxy//conf/haproxy.cfg -sf 114873
Looking at the logs, I can't find any error. Any explanation for this?
Thanks!
This could be a consequence of executing the HAProxy reload script (/etc/init.d/haproxy). A reload usually creates a new haproxy process to accept new connections, while keeping the old process alive as long as there are still open connections to it. Once they are closed, the old haproxy process is terminated.
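That matches the -sf flag visible in the ps output in the question: the reload starts a new haproxy process and passes it the PID of the old one, and the new process asks the old one to finish serving its existing connections and then exit. A sketch of what such a reload amounts to (the pid-file path is an assumption; the config path is taken from the question):
/usr/sbin/haproxy -f /var/lib/openshift/<APP_ID>/haproxy/conf/haproxy.cfg -sf $(cat /var/lib/openshift/<APP_ID>/haproxy/run/haproxy.pid)
So seeing two haproxy processes for a while after a reload or scaling event is expected; the older one disappears once its last open connection closes.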