geth does not persist trie node data from memory to disk on ungraceful system restart

geth does not persist trie node data from memory to disk on ungraceful system restart - ethereum

Issue: geth 1.8.22 starts mining from one of the first blocks instead of the last one on system reboot.
What we have
We have 3 synced private geth nodes using PoA(clique).
What happened
One day(a week ago) we had issues with our hosting provider so we had to restart 2 out of 3 nodes(each node is on separate VPS). Current block is 4 000 000. When node 1 and node 2 were restarted they started mining from block 372 instead of the last one 4 000 000.
Why it happened (my guess)
Geth 1.8.22 keeps some data with trie node data in RAM instead of a disk. On graceful node shutdown(for example from console) this trie node data is saved to hard drive from RAM. On forced system shutdown(for example from hosting admin panel) trie node data does not have time to be saved on a hard drive. We had our nodes running for 6 months without any reboot so I think that this trie node data was kept in RAM for the whole time and it was vanished on system reboot(though we still have node 3 which is up and running).
Logs
Here are the logs when I'm trying to run the backup version of one of the nodes:
vladimir#comp:~/Public/projects/ethereum/repro-geth-bug/geth-linux-amd64-1.8.22-7fa3509e$ ./geth --datadir ../opt/ethereum/data/ --networkid 1515 --unlock 0xd6ee38421e1713dd50e888c6d689b82953946bc3 --password ../opt/ethereum/unlock_password --port 30306 --mine
INFO [11-21|17:06:25.374] Maximum peer count ETH=25 LES=0 total=25
INFO [11-21|17:06:25.374] Starting peer-to-peer node instance=Geth/v1.8.22-stable-7fa3509e/linux-amd64/go1.11.5
INFO [11-21|17:06:25.374] Allocated cache and file handles database=/home/vladimir/Public/projects/ethereum/repro-geth-bug/opt/ethereum/data/geth/chaindata cache=512 handles=2048
INFO [11-21|17:06:26.550] Initialised chain configuration config="{ChainID: 1515 Homestead: 1 DAO: <nil> DAOSupport: false EIP150: 2 EIP155: 3 EIP158: 3 Byzantium: 4 Constantinople: 5 ConstantinopleFix: <nil> Engine: clique}"
INFO [11-21|17:06:26.550] Initialising Ethereum protocol versions="[63 62]" network=1515
WARN [11-21|17:06:26.579] Head state missing, repairing chain number=4073749 hash=9bfb53…56d503
INFO [11-21|17:07:45.179] Rewound blockchain to past state number=371 hash=102018…d91947
INFO [11-21|17:07:45.180] Loaded most recent local header number=4073749 hash=9bfb53…56d503 td=8147499 age=2d5h43m
INFO [11-21|17:07:45.180] Loaded most recent local full block number=371 hash=102018…d91947 td=743 age=7mo3w6d
INFO [11-21|17:07:45.180] Loaded most recent local fast block number=4073749 hash=9bfb53…56d503 td=8147499 age=2d5h43m
INFO [11-21|17:07:45.180] Loaded local transaction journal transactions=3 dropped=3
INFO [11-21|17:07:45.180] Regenerated local transaction journal transactions=0 accounts=0
WARN [11-21|17:07:45.180] Blockchain not empty, fast sync disabled
INFO [11-21|17:07:45.623] New local node record seq=6 id=e8c5a9e8848d4e30 ip=127.0.0.1 udp=30306 tcp=30306
INFO [11-21|17:07:45.623] Started P2P networking self=enode://9647000ba2579dd529574b49f472f029839a09257c1bc3ade5135cbbb5f3ceaf1237aff5b6b947d2fa4f218fa24858dc2767bd4b78e082b04c9d013c1482cfa6#127.0.0.1:30306
INFO [11-21|17:07:45.624] IPC endpoint opened url=/home/vladimir/Public/projects/ethereum/repro-geth-bug/opt/ethereum/data/geth.ipc
INFO [11-21|17:07:46.192] Unlocked account address=0xd6ee38421e1713dD50E888c6D689B82953946bC3
INFO [11-21|17:07:46.192] Transaction pool price threshold updated price=1000000000
INFO [11-21|17:07:46.192] Transaction pool price threshold updated price=1000000000
INFO [11-21|17:07:46.192] Etherbase automatically configured address=0xd6ee38421e1713dD50E888c6D689B82953946bC3
INFO [11-21|17:07:46.192] Commit new mining work number=372 sealhash=685e15…2c52df uncles=0 txs=0 gas=0 fees=0 elapsed=75.951µs
INFO [11-21|17:07:46.192] Successfully sealed new block number=372 sealhash=685e15…2c52df hash=0c60ef…f29e6b elapsed=385.27µs
INFO [11-21|17:07:46.192] 🔨 mined potential block number=372 hash=0c60ef…f29e6b
INFO [11-21|17:07:46.193] Commit new mining work number=373 sealhash=337ae5…2b4704 uncles=0 txs=0 gas=0 fees=0 elapsed=222.362µs
INFO [11-21|17:07:47.962] Mapped network port proto=tcp extport=30306 intport=30306 interface="UPNP IGDv1-IP1"
INFO [11-21|17:07:48.391] Mapped network port proto=udp extport=30306 intport=30306 interface="UPNP IGDv1-IP1"
INFO [11-21|17:07:49.625] New local node record seq=7 id=e8c5a9e8848d4e30 ip=128.71.103.50 udp=30306 tcp=30306
INFO [11-21|17:07:51.001] Successfully sealed new block number=373 sealhash=337ae5…2b4704 hash=b67668…81f164 elapsed=4.807s
INFO [11-21|17:07:51.001] 🔨 mined potential block number=373 hash=b67668…81f164
INFO [11-21|17:07:51.002] Commit new mining work number=374 sealhash=c0e9f6…628d51 uncles=0 txs=0 gas=0 fees=0 elapsed=1.434ms
INFO [11-21|17:07:56.001] Successfully sealed new block number=374 sealhash=c0e9f6…628d51 hash=77aae2…9c44e8 elapsed=4.998s
INFO [11-21|17:07:56.001] 🔨 mined potential block number=374 hash=77aae2…9c44e8
INFO [11-21|17:07:56.003] Commit new mining work number=375 sealhash=6f7db7…adca12 uncles=0 txs=0 gas=0 fees=0 elapsed=1.305ms
^CINFO [11-21|17:07:58.483] Got interrupt, shutting down...
INFO [11-21|17:07:58.483] IPC endpoint closed url=/home/vladimir/Public/projects/ethereum/repro-geth-bug/opt/ethereum/data/geth.ipc
INFO [11-21|17:07:58.483] Writing cached state to disk block=374 hash=77aae2…9c44e8 root=e16e04…e93be1
INFO [11-21|17:07:58.483] Persisted trie from memory database nodes=0 size=0.00B time=7.185µs gcnodes=0 gcsize=0.00B gctime=0s livenodes=1 livesize=0.00B
INFO [11-21|17:07:58.483] Writing cached state to disk block=373 hash=b67668…81f164 root=e16e04…e93be1
INFO [11-21|17:07:58.483] Persisted trie from memory database nodes=0 size=0.00B time=2.571µs gcnodes=0 gcsize=0.00B gctime=0s livenodes=1 livesize=0.00B
INFO [11-21|17:07:58.484] Writing cached state to disk block=247 hash=7b422a…5f9a62 root=e16e04…e93be1
INFO [11-21|17:07:58.484] Persisted trie from memory database nodes=0 size=0.00B time=2.784µs gcnodes=0 gcsize=0.00B gctime=0s livenodes=1 livesize=0.00B
INFO [11-21|17:07:58.484] Blockchain manager stopped
INFO [11-21|17:07:58.484] Stopping Ethereum protocol
INFO [11-21|17:07:58.484] Ethereum protocol stopped
INFO [11-21|17:07:58.484] Transaction pool stopped
INFO [11-21|17:07:58.497] Database closed database=/home/vladimir/Public/projects/ethereum/repro-geth-bug/opt/ethereum/data/geth/chaindata
How to fix
The 1st thing that comes to mind is to restart geth nodes(gracefully) via cron everyday so that nodes persist trie node data on the disk.
How to handle UNgraceful system shutdown so that geth node persists data and keeps mining from the latest block on restart?

Please check the full answer: https://github.com/ethereum/go-ethereum/issues/20383#issuecomment-558107815
In short:
geth persists data after 1 hour worth of block processing
if your network is super light (i.e. mostly empty blocks), it takes a very very long time until blocks are flushed from memory to hard drive
currently there is no way to configure the period of persistency rounds in geth
Solution: restart geth periodically so it saves data from RAM to hard drive

Related

Service ‘memcached’ exited with status 139. Restarting. Messages (Couchbase)

I am facing below error most of time and at same time bucket appear down(1 node pending). This error facing just after setting up and Couchbase server even not accessed yet.
Service 'memcached' exited with status 139. Restarting. Messages:
2023-01-11T04:44:44.735996+00:00 CRITICAL /opt/couchbase/bin/memcached(_ZN15google_breakpad16ExceptionHandler12GenerateDumpEPNS0_12CrashContextE+0x3ce) [0x400000+0x14fc5e]
2023-01-11T04:44:44.736000+00:00 CRITICAL /opt/couchbase/bin/memcached(_ZN15google_breakpad16ExceptionHandler13SignalHandlerEiP9siginfo_tPv+0x94) [0x400000+0x14ff74]
2023-01-11T04:44:44.736009+00:00 CRITICAL /lib64/libpthread.so.0() [0x7f67cce7e000+0x12b20]
2023-01-11T04:44:44.736015+00:00 CRITICAL /opt/couchbase/bin/memcached() [0x400000+0xd3e93]
2023-01-11T04:44:44.736019+00:00 CRITICAL /opt/couchbase/bin/memcached() [0x400000+0xa3e8c]
2023-01-11T04:44:44.736024+00:00 CRITICAL /opt/couchbase/bin/../lib/libplatform_so.so.0.1.0(_ZN9Couchbase6Thread12thread_entryEv+0xf) [0x7f67cf6f2000+0x14e7f]
2023-01-11T04:44:44.736027+00:00 CRITICAL /opt/couchbase/bin/../lib/libplatform_so.so.0.1.0() [0x7f67cf6f2000+0x95d7]
2023-01-11T04:44:44.736031+00:00 CRITICAL /lib64/libpthread.so.0() [0x7f67cce7e000+0x814a]
2023-01-11T04:44:44.736061+00:00 CRITICAL /lib64/libc.so.6(clone+0x43) [0x7f67ccabb000+0xfcf23] hide
coubchbase bucket snapshot
couchbase server details:
Couchbase-server-community-6.6.0-7909
Single Couchbase instance(no other cluster node)
Installed on Operating System: CentOS Linux 8 (Core) ( Kernel: Linux 4.18.0-348.7.1.el8_5.x86_64)
Provided 10GB memory while setup the cluster (assigned 100MB to created bucket of type ephemeral)
Tried restart but not working and no such specific cause found in logs

ECS service keeps deregistering Target Group and start/stop tasks

I have an ECS service that is repeatedly starting and stopping a task running on a EC2 (m5.large) launch type container. The Events tab says these messages in a loop -
service test-service deregistered 1 targets in target-group localhost-localhost-default
service test-service has begun draining connections on 1 tasks.
service test-service deregistered 1 targets in target-group localhost-localhost-default
service test-service has started 2 tasks: task 4e1569b3-a15c-4bac-85f7-396b530113a5 task d5651035-8e3d-48df-b457-d05e5b7be8db.
There is nothing more there to help understand what might be going on. When I checked the Target group itself, the instances are not registered anymore to it. I have allocated memory: 1024 and cpu: 512 for the task which should be enough.
Is there anything I can do to understand what the problem here is ?

On this line,
service test-service has started 2 tasks: task 4e1569b3-a15c-4bac-85f7-396b530113a5 task d5651035-8e3d-48df-b457-d05e5b7be8db.
Task ID is a hyperlink, when you click that it will take you the page where you can find all the details about that particular task.
Here there is a entry "stopped reason" which will show why was the task stopped.
If it stopped because of health check failures, it will show in events page itself.

Error Starting Protocol Stack: Invalid arguement

I am currently trying to work with the geth and I want to start my private Ethereum Network so I can test my applications. However, when I try to use geth --datadir=./chaindata/ but that's only giving me some error in the terminal which I have shown at the bottom of this question. I am aware that there are other users that are having the same problem on Mac OS, which is what I'm using as well.
Here is the terminal output:
Steves-MBP:assignment_1 stevesahayadarlin$ geth --datadir=./chaindata/
WARN [01-06|22:12:18] No etherbase set and no accounts found as default
INFO [01-06|22:12:18] Starting peer-to-peer node instance=Geth/v1.7.3-stable/darwin-amd64/go1.9.2
INFO [01-06|22:12:18] Allocated cache and file handles database=/Users/stevesahayadarlin/Desktop/distributed_exchange_truffle_class_3-master/assignment_1/chaindata/geth/chaindata cache=128 handles=1024
INFO [01-06|22:12:18] Initialised chain configuration config="{ChainID: 15 Homestead: 0 DAO: <nil> DAOSupport: false EIP150: <nil> EIP155: 0 EIP158: 0 Byzantium: <nil> Engine: unknown}"
INFO [01-06|22:12:18] Disk storage enabled for ethash caches dir=/Users/stevesahayadarlin/Desktop/distributed_exchange_truffle_class_3-master/assignment_1/chaindata/geth/ethash count=3
INFO [01-06|22:12:18] Disk storage enabled for ethash DAGs dir=/Users/stevesahayadarlin/.ethash count=2
INFO [01-06|22:12:18] Initialising Ethereum protocol versions="[63 62]" network=1
INFO [01-06|22:12:18] Loaded most recent local header number=0 hash=9b8d4a…9021ba td=131072
INFO [01-06|22:12:18] Loaded most recent local full block number=0 hash=9b8d4a…9021ba td=131072
INFO [01-06|22:12:18] Loaded most recent local fast block number=0 hash=9b8d4a…9021ba td=131072
INFO [01-06|22:12:18] Loaded local transaction journal transactions=0 dropped=0
INFO [01-06|22:12:18] Regenerated local transaction journal transactions=0 accounts=0
INFO [01-06|22:12:18] Starting P2P networking
INFO [01-06|22:12:20] UDP listener up self=enode://258e1a8136fd23d47b97404139841059a37e95751182dde366adc4a22bab88b9580eb53bfb1de937016645817f071d0766a3be66e7e056c8f6afe0a450bb221d#70.106.232.168:30303
INFO [01-06|22:12:20] RLPx listener up self=enode://258e1a8136fd23d47b97404139841059a37e95751182dde366adc4a22bab88b9580eb53bfb1de937016645817f071d0766a3be66e7e056c8f6afe0a450bb221d#70.106.232.168:30303
INFO [01-06|22:12:20] Blockchain manager stopped
INFO [01-06|22:12:20] Stopping Ethereum protocol
INFO [01-06|22:12:20] Ethereum protocol stopped
INFO [01-06|22:12:20] Transaction pool stopped
INFO [01-06|22:12:20] Database closed database=/Users/stevesahayadarlin/Desktop/distributed_exchange_truffle_class_3-master/assignment_1/chaindata/geth/chaindata
INFO [01-06|22:12:20] Mapped network port proto=udp extport=30303 intport=30303 interface="UPNP IGDv1-IP1"
INFO [01-06|22:12:20] Mapped network port proto=tcp extport=30303 intport=30303 interface="UPNP IGDv1-IP1"
Fatal: Error starting protocol stack: listen unix /Users/stevesahayadarlin/Desktop/distributed_exchange_truffle_class_3-master/assignment_1/chaindata/geth.ipc: bind: invalid argument
Steves-MBP:assignment_1 stevesahayadarlin$

Unable to access Google Compute Engine instance using external IP address

I have a Google compute engine instance(Cent-Os) which I could access using its external IP address till recently.
Now suddenly the instance cannot be accessed using its using its external IP address.
I logged in to the developer console and tried rebooting the instance but that did not help.
I also noticed that the CPU usage is almost at 100% continuously.
On further analysis of the Serial port output it appears the init module is not loading properly.
I am pasting below the last few lines from the serial port output of the virtual machine.
rtc_cmos 00:01: RTC can wake from S4
rtc_cmos 00:01: rtc core: registered rtc_cmos as rtc0
rtc0: alarms up to one day, 114 bytes nvram
cpuidle: using governor ladder
cpuidle: using governor menu
EFI Variables Facility v0.08 2004-May-17
usbcore: registered new interface driver hiddev
usbcore: registered new interface driver usbhid
usbhid: v2.6:USB HID core driver
GRE over IPv4 demultiplexor driver
TCP cubic registered
Initializing XFRM netlink socket
NET: Registered protocol family 17
registered taskstats version 1
rtc_cmos 00:01: setting system clock to 2014-07-04 07:40:53 UTC (1404459653)
Initalizing network drop monitor service
Freeing unused kernel memory: 1280k freed
Write protecting the kernel read-only data: 10240k
Freeing unused kernel memory: 800k freed
Freeing unused kernel memory: 1584k freed
Failed to execute /init
Kernel panic - not syncing: No init found. Try passing init= option to kernel.
Pid: 1, comm: swapper Not tainted 2.6.32-431.17.1.el6.x86_64 #1
Call Trace:
[] ? panic+0xa7/0x16f
[] ? init_post+0xa8/0x100
[] ? kernel_init+0x2e6/0x2f7
[] ? child_rip+0xa/0x20
[] ? kernel_init+0x0/0x2f7
[] ? child_rip+0x0/0x20
Thanks in advance for any tips to resolve this issue.
Mathew

It looks like you might have an script or other program that is causing you to run out of Inodes.
You can delete the instance without deleting the persistent disk (PD) and create a new vm with a higher capacity using your PD, however if it's an script causing this, you will end up with the same issue. It's always recommended to backup your PD before making any changes.
Run this command to find more info about your instance:
gcutil --project= getserialportoutput
If the issue still continue, you can either
- Make a snapshot of your PD and make a PD's copy or
- Delete the instance without deleting the PD
Attach and mount the PD to another vm as a second disk, so you can access it to find what is causing this issue. Visit this link https://developers.google.com/compute/docs/disks#attach_disk for more information on how to do this.
Visit this page http://www.ivankuznetsov.com/2010/02/no-space-left-on-device-running-out-of-inodes.html for more information about inodes troubleshooting.

Make sure the Allow HTTP traffic setting on the vm is still enabled.
Then see which network firewall you are using and it's rules.
If your network is set up to use an ephemral IP, it will be periodically released back. This will cause your IP to change over time. Set it to static/reserved then (on networks page).
https://developers.google.com/compute/docs/instances-and-network#externaladdresses

Hot reconfiguration of HAProxy still lead to failed request, any suggestions?

I found there are still failed request when the traffic is high using command like this
haproxy -f /etc/haproxy.cfg -p /var/run/haproxy.pid -sf $(cat /var/run/haproxy.pid)
to hot reload the updated config file.
Here below is the presure testing result using webbench :
/usr/local/bin/webbench -c 10 -t 30 targetHProxyIP:1080
Webbench – Simple Web Benchmark 1.5
Copyright (c) Radim Kolar 1997-2004, GPL Open Source Software.
Benchmarking: GET targetHProxyIP:1080
10 clients, running 30 sec.
Speed=70586 pages/min, 13372974 bytes/sec.
**Requests: 35289 susceed, 4 failed.**
I run command
haproxy -f /etc/haproxy.cfg -p /var/run/haproxy.pid -sf $(cat /var/run/haproxy.pid)
several times during the pressure testing.
In the haproxy documentation, it mentioned
They will receive the SIGTTOU
611 signal to ask them to temporarily stop listening to the ports so that the new
612 process can grab them
so there is a time period that the old process is not listening on the PORT(say 80) and the new process haven’t start to listen to the PORT (say 80), and during this specific time period, it will cause the NEW connections failed, make sense?
So is there any approach that makes the configuration reload of haproxy that will not impact both existing connections and new connections?

On recent kernels where SO_REUSEPORT is finally implemented (3.9+), this dead period does not exist anymore. While a patch has been available for older kernels for something like 10 years, it's obvious that many users cannot patch their kernels. If your system is more recent, then the new process will succeed its attempt to bind() before asking the previous one to release the port, then there's a period where both processes are bound to the port instead of no process.
There is still a very tiny possibility that a connection arrived in the leaving process' queue at the moment it closes it. There is no reliable way to stop this from happening though.

We Keep Coding

html mysql json google-apps-script actionscript-3 ms-access google-chrome google-maps reporting-services sql-server-2008