Avalanche cluster nodes stop automatically without any error - ethereum

I tried to set up Avalanche cluster nodes using avalanche-network-runner:
https://docs.avax.network/subnets/network-runner
I was able to run the nodes, but after a while they stop, as shown below:
[node1] INFO [11-24|17:23:12.281] <C Chain> github.com/ava-labs/coreth/core/blockchain.go:787: State manager shut down t="51.625µs"
[node1] INFO [11-24|17:23:12.281] <C Chain> github.com/ava-labs/coreth/core/blockchain.go:794: Shutting down sender cacher
[node1] INFO [11-24|17:23:12.281] <C Chain> github.com/ava-labs/coreth/core/blockchain.go:798: Closing scope
[node1] INFO [11-24|17:23:12.281] <C Chain> github.com/ava-labs/coreth/core/blockchain.go:802: Waiting for trie re-journal to complete
[node1] INFO [11-24|17:23:12.281] <C Chain> github.com/ava-labs/coreth/core/blockchain.go:805: Blockchain stopped
[11-24|17:23:12.292] INFO local/network.go:740 done stopping network
terminated network %!s(<nil>)
[11-24|17:23:12.292] INFO ux/output.go:14 terminated network %!s(<nil>)
[11-24|17:23:12.292] INFO server/server.go:376 no custom chain installation request, skipping its readiness check
I am not sure what the issue is here.

Related

MSDTC WS-AT, HTTP could not register URL https://+:2372/WsatService/. Your process does not have access rights to this namespace

On a Windows Server 2012 machine, I have a local DTC and a clustered DTC, as you can see here:
Here you can see the clustered DTC in the Failover Cluster Manager:
I have enabled WS-AT with the following command on the clustered DTC:
wsatconfig -network:enable -endpointCert:7c6361568413852afb471d5f8b92604cdde530dd -accountsCerts:3bcf068b0b984d2af9d2efa03e8a489c8483ba11 -virtualServer:ftsappdev -restart
For -endpointCert, I gave the thumbprint of the certificate for ftsappdev (the cluster role), and for -accountsCerts, I gave the thumbprint of the certificate of a JBoss server.
I also have configured WS-AT for the local DTC through the WS-AT tab in Component Services:
In Failover Cluster Manager, when I take the clustered DTC resource offline and then bring it back online, I get the following entry in the Event Viewer Application log:
The MSDTC WS-AT protocol failed at the beginning of recovery. As a result, WS-AT functionality will be disabled.
Protocol ID: c05b9cad-ab24-4bb3-9440-3548fa7b4b1b
Protocol Name: WS-AtomicTransaction 1.1
Exception: Microsoft.Transactions.Bridge.PluggableProtocolException: A channel factory could not be opened. ---> Microsoft.Transactions.Wsat.Messaging.MessagingInitializationException: A channel factory could not be opened. ---> System.ServiceModel.AddressAccessDeniedException: HTTP could not register URL https://+:2372/WsatService/. Your process does not have access rights to this namespace (see http://go.microsoft.com/fwlink/?LinkId=70353 for details). ---> System.Net.HttpListenerException: Access is denied
at System.Net.HttpListener.AddAllPrefixes()
at System.Net.HttpListener.Start()
at System.ServiceModel.Channels.SharedHttpTransportManager.OnOpen()
--- End of inner exception stack trace ---
at System.ServiceModel.Channels.SharedHttpTransportManager.OnOpen()
at System.ServiceModel.Channels.TransportManager.Open(TransportChannelListener channelListener)
at System.ServiceModel.Channels.TransportManagerContainer.Open(SelectTransportManagersCallback selectTransportManagerCallback)
at System.ServiceModel.Channels.TransportChannelListener.OnOpen(TimeSpan timeout)
at System.ServiceModel.Channels.HttpChannelListener`1.OnOpen(TimeSpan timeout)
at System.ServiceModel.Channels.CommunicationObject.Open(TimeSpan timeout)
at System.ServiceModel.Channels.LayeredChannelListener`1.OnOpen(TimeSpan timeout)
at System.ServiceModel.Channels.CommunicationObject.Open(TimeSpan timeout)
at System.ServiceModel.Channels.DatagramChannelDemuxer`2.OnOuterListenerOpen(ChannelDemuxerFilter filter, IChannelListener listener, TimeSpan timeout)
at System.ServiceModel.Channels.SingletonChannelListener`3.OnOpen(TimeSpan timeout)
at System.ServiceModel.Channels.CommunicationObject.Open(TimeSpan timeout)
at System.ServiceModel.Channels.InternalDuplexChannelFactory.OnOpen(TimeSpan timeout)
at System.ServiceModel.Channels.CommunicationObject.Open(TimeSpan timeout)
at System.ServiceModel.Channels.ServiceChannelFactory.TypedServiceChannelFactory`1.OnOpen(TimeSpan timeout)
at System.ServiceModel.Channels.CommunicationObject.Open(TimeSpan timeout)
at System.ServiceModel.ChannelFactory.OnOpen(TimeSpan timeout)
at System.ServiceModel.Channels.CommunicationObject.Open(TimeSpan timeout)
at Microsoft.Transactions.Wsat.Messaging.CoordinationService.OpenChannelFactory[T](ChannelFactory`1 cf)
--- End of inner exception stack trace ---
at Microsoft.Transactions.Wsat.Messaging.CoordinationService.OpenChannelFactory[T](ChannelFactory`1 cf)
at Microsoft.Transactions.Wsat.Messaging.CoordinationService.Initialize(CoordinationServiceConfiguration config)
at Microsoft.Transactions.Wsat.Messaging.CoordinationService..ctor(CoordinationServiceConfiguration config, ProtocolVersion protocolVersion)
at Microsoft.Transactions.Wsat.Protocol.ProtocolState.RecoveryBeginning()
--- End of inner exception stack trace ---
at Microsoft.Transactions.Wsat.Protocol.ProtocolState.RecoveryBeginning()
at Microsoft.Transactions.Wsat.InputOutput.TransactionManagerReceive.RecoveryBeginning()
Process Name: msdtc
Process ID: 12248
In Component Services, when I restart the local DTC, I get the following entry in the Event Viewer Application log:
The WS-AT protocol service successfully completed startup and recovery.
Protocol ID: cc228cf4-a9c8-43fc-8281-8565eb5889f2
Protocol Name: WS-AtomicTransaction 1.0
Process Name: msdtc
Process ID: 7744
Both DTCs run under the user Network Service:
Why does the clustered DTC not have access rights to this namespace, whereas the local DTC does? Both run under the same user.
How can I make the clustered DTC register the URL https://+:2372/WsatService/ successfully?
I finally used port 8444. I had to reserve it with the command:
netsh http add urlacl url=https://+:8444/ user=Everyone
and then I ran wsatconfig specifying port 8444:
wsatconfig -network:enable -port:8444 -accounts:Everyone -endpointcert:7c6361568413852afb471d5f8b92604cdde530dd -accountsCerts:7c6361568413852afb471d5f8b92604cdde530dd,83112f9b598c4341b3975aba413bf04eb71eb679 -traceLevel:ALL -restart
On another occasion, it helped to disable and re-enable Network DTC Access in the properties of the local DTC and the clustered DTC:
Disable Local DTC, Apply and OK:
Enable Local DTC, Apply and OK:
Disable Cluster DTC, Apply and OK:
Enable Cluster DTC, Apply and OK:

Connection reset by Cloudflare when building Packer image

I am trying to build a Packer image for a DigitalOcean droplet. However, when the build process finishes, it fails to create the image because the connection to the API is reset (from what I can tell, the remote address in the error, 104.16.181.15, is a Cloudflare IP).
Any idea why this is happening or what I can do to investigate it further?
==> digitalocean: Gracefully shutting down droplet...
==> digitalocean: Error shutting down droplet: Post https://api.digitalocean.com/v2/droplets/198964166/actions: read tcp 10.0.2.15:44558->104.16.181.15:443: read: connection reset by peer
==> digitalocean: Destroying droplet...
==> digitalocean: Deleting temporary ssh key...
Build 'digitalocean' errored: Error shutting down droplet: Post https://api.digitalocean.com/v2/droplets/198964166/actions: read tcp 10.0.2.15:44558->104.16.181.15:443: read: connection reset by peer

geth does not persist trie node data from memory to disk on ungraceful system restart

Issue: geth 1.8.22 starts mining from one of the first blocks instead of the last one on system reboot.
What we have
We have 3 synced private geth nodes using PoA (clique).
What happened
One day (a week ago) we had issues with our hosting provider, so we had to restart 2 out of 3 nodes (each node is on a separate VPS). The current block is 4 000 000. When node 1 and node 2 were restarted, they started mining from block 372 instead of the last one, 4 000 000.
Why it happened (my guess)
Geth 1.8.22 keeps some trie node data in RAM instead of on disk. On a graceful node shutdown (for example, from the console) this trie node data is written from RAM to the hard drive. On a forced system shutdown (for example, from the hosting admin panel) there is no time to write the trie node data to the hard drive. Our nodes had been running for 6 months without any reboot, so I think this trie node data was kept in RAM the whole time and vanished on system reboot (though we still have node 3, which is up and running).
Logs
Here are the logs when I'm trying to run the backup version of one of the nodes:
vladimir@comp:~/Public/projects/ethereum/repro-geth-bug/geth-linux-amd64-1.8.22-7fa3509e$ ./geth --datadir ../opt/ethereum/data/ --networkid 1515 --unlock 0xd6ee38421e1713dd50e888c6d689b82953946bc3 --password ../opt/ethereum/unlock_password --port 30306 --mine
INFO [11-21|17:06:25.374] Maximum peer count ETH=25 LES=0 total=25
INFO [11-21|17:06:25.374] Starting peer-to-peer node instance=Geth/v1.8.22-stable-7fa3509e/linux-amd64/go1.11.5
INFO [11-21|17:06:25.374] Allocated cache and file handles database=/home/vladimir/Public/projects/ethereum/repro-geth-bug/opt/ethereum/data/geth/chaindata cache=512 handles=2048
INFO [11-21|17:06:26.550] Initialised chain configuration config="{ChainID: 1515 Homestead: 1 DAO: <nil> DAOSupport: false EIP150: 2 EIP155: 3 EIP158: 3 Byzantium: 4 Constantinople: 5 ConstantinopleFix: <nil> Engine: clique}"
INFO [11-21|17:06:26.550] Initialising Ethereum protocol versions="[63 62]" network=1515
WARN [11-21|17:06:26.579] Head state missing, repairing chain number=4073749 hash=9bfb53…56d503
INFO [11-21|17:07:45.179] Rewound blockchain to past state number=371 hash=102018…d91947
INFO [11-21|17:07:45.180] Loaded most recent local header number=4073749 hash=9bfb53…56d503 td=8147499 age=2d5h43m
INFO [11-21|17:07:45.180] Loaded most recent local full block number=371 hash=102018…d91947 td=743 age=7mo3w6d
INFO [11-21|17:07:45.180] Loaded most recent local fast block number=4073749 hash=9bfb53…56d503 td=8147499 age=2d5h43m
INFO [11-21|17:07:45.180] Loaded local transaction journal transactions=3 dropped=3
INFO [11-21|17:07:45.180] Regenerated local transaction journal transactions=0 accounts=0
WARN [11-21|17:07:45.180] Blockchain not empty, fast sync disabled
INFO [11-21|17:07:45.623] New local node record seq=6 id=e8c5a9e8848d4e30 ip=127.0.0.1 udp=30306 tcp=30306
INFO [11-21|17:07:45.623] Started P2P networking self=enode://9647000ba2579dd529574b49f472f029839a09257c1bc3ade5135cbbb5f3ceaf1237aff5b6b947d2fa4f218fa24858dc2767bd4b78e082b04c9d013c1482cfa6@127.0.0.1:30306
INFO [11-21|17:07:45.624] IPC endpoint opened url=/home/vladimir/Public/projects/ethereum/repro-geth-bug/opt/ethereum/data/geth.ipc
INFO [11-21|17:07:46.192] Unlocked account address=0xd6ee38421e1713dD50E888c6D689B82953946bC3
INFO [11-21|17:07:46.192] Transaction pool price threshold updated price=1000000000
INFO [11-21|17:07:46.192] Transaction pool price threshold updated price=1000000000
INFO [11-21|17:07:46.192] Etherbase automatically configured address=0xd6ee38421e1713dD50E888c6D689B82953946bC3
INFO [11-21|17:07:46.192] Commit new mining work number=372 sealhash=685e15…2c52df uncles=0 txs=0 gas=0 fees=0 elapsed=75.951µs
INFO [11-21|17:07:46.192] Successfully sealed new block number=372 sealhash=685e15…2c52df hash=0c60ef…f29e6b elapsed=385.27µs
INFO [11-21|17:07:46.192] 🔨 mined potential block number=372 hash=0c60ef…f29e6b
INFO [11-21|17:07:46.193] Commit new mining work number=373 sealhash=337ae5…2b4704 uncles=0 txs=0 gas=0 fees=0 elapsed=222.362µs
INFO [11-21|17:07:47.962] Mapped network port proto=tcp extport=30306 intport=30306 interface="UPNP IGDv1-IP1"
INFO [11-21|17:07:48.391] Mapped network port proto=udp extport=30306 intport=30306 interface="UPNP IGDv1-IP1"
INFO [11-21|17:07:49.625] New local node record seq=7 id=e8c5a9e8848d4e30 ip=128.71.103.50 udp=30306 tcp=30306
INFO [11-21|17:07:51.001] Successfully sealed new block number=373 sealhash=337ae5…2b4704 hash=b67668…81f164 elapsed=4.807s
INFO [11-21|17:07:51.001] 🔨 mined potential block number=373 hash=b67668…81f164
INFO [11-21|17:07:51.002] Commit new mining work number=374 sealhash=c0e9f6…628d51 uncles=0 txs=0 gas=0 fees=0 elapsed=1.434ms
INFO [11-21|17:07:56.001] Successfully sealed new block number=374 sealhash=c0e9f6…628d51 hash=77aae2…9c44e8 elapsed=4.998s
INFO [11-21|17:07:56.001] 🔨 mined potential block number=374 hash=77aae2…9c44e8
INFO [11-21|17:07:56.003] Commit new mining work number=375 sealhash=6f7db7…adca12 uncles=0 txs=0 gas=0 fees=0 elapsed=1.305ms
^CINFO [11-21|17:07:58.483] Got interrupt, shutting down...
INFO [11-21|17:07:58.483] IPC endpoint closed url=/home/vladimir/Public/projects/ethereum/repro-geth-bug/opt/ethereum/data/geth.ipc
INFO [11-21|17:07:58.483] Writing cached state to disk block=374 hash=77aae2…9c44e8 root=e16e04…e93be1
INFO [11-21|17:07:58.483] Persisted trie from memory database nodes=0 size=0.00B time=7.185µs gcnodes=0 gcsize=0.00B gctime=0s livenodes=1 livesize=0.00B
INFO [11-21|17:07:58.483] Writing cached state to disk block=373 hash=b67668…81f164 root=e16e04…e93be1
INFO [11-21|17:07:58.483] Persisted trie from memory database nodes=0 size=0.00B time=2.571µs gcnodes=0 gcsize=0.00B gctime=0s livenodes=1 livesize=0.00B
INFO [11-21|17:07:58.484] Writing cached state to disk block=247 hash=7b422a…5f9a62 root=e16e04…e93be1
INFO [11-21|17:07:58.484] Persisted trie from memory database nodes=0 size=0.00B time=2.784µs gcnodes=0 gcsize=0.00B gctime=0s livenodes=1 livesize=0.00B
INFO [11-21|17:07:58.484] Blockchain manager stopped
INFO [11-21|17:07:58.484] Stopping Ethereum protocol
INFO [11-21|17:07:58.484] Ethereum protocol stopped
INFO [11-21|17:07:58.484] Transaction pool stopped
INFO [11-21|17:07:58.497] Database closed database=/home/vladimir/Public/projects/ethereum/repro-geth-bug/opt/ethereum/data/geth/chaindata
How to fix
The first thing that comes to mind is to restart the geth nodes gracefully via cron every day, so that the nodes persist trie node data to disk.
How to handle UNgraceful system shutdown so that geth node persists data and keeps mining from the latest block on restart?
Please check the full answer: https://github.com/ethereum/go-ethereum/issues/20383#issuecomment-558107815
In short:
geth persists data after one hour's worth of block processing
if your network is very light (i.e. mostly empty blocks), it takes a very long time until blocks are flushed from memory to the hard drive
currently there is no way to configure the interval between persistence rounds in geth
Solution: restart geth periodically (gracefully) so it saves data from RAM to the hard drive
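
To get a feel for why a nearly idle chain can run for months without a flush, the one-hour rule above can be turned into a rough estimate. This is only a back-of-the-envelope sketch, not geth's actual logic: the 5-second block period is read off the sealing timestamps in the logs, the 1 ms of processing per empty block is an illustrative figure rather than a measured one, and the function name is hypothetical.

```python
SECONDS_PER_HOUR = 3600
SECONDS_PER_DAY = 86400


def days_until_flush(block_period_s: float, processing_time_s: float) -> float:
    """Estimate wall-clock days until accumulated block processing time
    reaches one hour, given a fixed block period and per-block cost."""
    # Number of blocks needed to accumulate one hour of processing time.
    blocks_needed = SECONDS_PER_HOUR / processing_time_s
    # Each of those blocks takes one block period of wall-clock time.
    return blocks_needed * block_period_s / SECONDS_PER_DAY


# Assumed values: 5 s block period, 1 ms processing per empty block.
print(round(days_until_flush(5.0, 0.001), 1))

# For comparison, a hypothetical busy chain spending 0.5 s per block.
print(round(days_until_flush(5.0, 0.5), 2))
```

Under these assumptions the first estimate comes out at roughly 208 days, which is in the same range as the six months of uptime described in the question, while the busy chain would flush after about ten hours.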

Error Starting Protocol Stack: Invalid argument

I am currently trying to work with geth, and I want to start a private Ethereum network so I can test my applications. However, when I run geth --datadir=./chaindata/, I only get the error shown at the bottom of this question. I am aware that other users are having the same problem on macOS, which is what I am using as well.
Here is the terminal output:
Steves-MBP:assignment_1 stevesahayadarlin$ geth --datadir=./chaindata/
WARN [01-06|22:12:18] No etherbase set and no accounts found as default
INFO [01-06|22:12:18] Starting peer-to-peer node instance=Geth/v1.7.3-stable/darwin-amd64/go1.9.2
INFO [01-06|22:12:18] Allocated cache and file handles database=/Users/stevesahayadarlin/Desktop/distributed_exchange_truffle_class_3-master/assignment_1/chaindata/geth/chaindata cache=128 handles=1024
INFO [01-06|22:12:18] Initialised chain configuration config="{ChainID: 15 Homestead: 0 DAO: <nil> DAOSupport: false EIP150: <nil> EIP155: 0 EIP158: 0 Byzantium: <nil> Engine: unknown}"
INFO [01-06|22:12:18] Disk storage enabled for ethash caches dir=/Users/stevesahayadarlin/Desktop/distributed_exchange_truffle_class_3-master/assignment_1/chaindata/geth/ethash count=3
INFO [01-06|22:12:18] Disk storage enabled for ethash DAGs dir=/Users/stevesahayadarlin/.ethash count=2
INFO [01-06|22:12:18] Initialising Ethereum protocol versions="[63 62]" network=1
INFO [01-06|22:12:18] Loaded most recent local header number=0 hash=9b8d4a…9021ba td=131072
INFO [01-06|22:12:18] Loaded most recent local full block number=0 hash=9b8d4a…9021ba td=131072
INFO [01-06|22:12:18] Loaded most recent local fast block number=0 hash=9b8d4a…9021ba td=131072
INFO [01-06|22:12:18] Loaded local transaction journal transactions=0 dropped=0
INFO [01-06|22:12:18] Regenerated local transaction journal transactions=0 accounts=0
INFO [01-06|22:12:18] Starting P2P networking
INFO [01-06|22:12:20] UDP listener up self=enode://258e1a8136fd23d47b97404139841059a37e95751182dde366adc4a22bab88b9580eb53bfb1de937016645817f071d0766a3be66e7e056c8f6afe0a450bb221d@70.106.232.168:30303
INFO [01-06|22:12:20] RLPx listener up self=enode://258e1a8136fd23d47b97404139841059a37e95751182dde366adc4a22bab88b9580eb53bfb1de937016645817f071d0766a3be66e7e056c8f6afe0a450bb221d@70.106.232.168:30303
INFO [01-06|22:12:20] Blockchain manager stopped
INFO [01-06|22:12:20] Stopping Ethereum protocol
INFO [01-06|22:12:20] Ethereum protocol stopped
INFO [01-06|22:12:20] Transaction pool stopped
INFO [01-06|22:12:20] Database closed database=/Users/stevesahayadarlin/Desktop/distributed_exchange_truffle_class_3-master/assignment_1/chaindata/geth/chaindata
INFO [01-06|22:12:20] Mapped network port proto=udp extport=30303 intport=30303 interface="UPNP IGDv1-IP1"
INFO [01-06|22:12:20] Mapped network port proto=tcp extport=30303 intport=30303 interface="UPNP IGDv1-IP1"
Fatal: Error starting protocol stack: listen unix /Users/stevesahayadarlin/Desktop/distributed_exchange_truffle_class_3-master/assignment_1/chaindata/geth.ipc: bind: invalid argument
Steves-MBP:assignment_1 stevesahayadarlin$
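
The post does not include a resolution. One possible cause, offered here as an assumption rather than a confirmed diagnosis, is the length of the geth.ipc path: Unix domain socket paths are limited by the size of sockaddr_un.sun_path, which is 104 bytes on macOS (including the terminating NUL), and bind can fail with "invalid argument" when the path is longer. The sketch below only checks a path against that limit; the helper name is hypothetical.

```python
# Assumed limit: macOS defines sockaddr_un.sun_path as 104 bytes,
# one of which is needed for the terminating NUL.
MACOS_SUN_PATH_BYTES = 104


def ipc_path_fits(path: str, limit: int = MACOS_SUN_PATH_BYTES) -> bool:
    """Return True if the path plus a NUL terminator fits within limit bytes."""
    return len(path.encode("utf-8")) + 1 <= limit


# The socket path taken from the Fatal line in the output above.
ipc_path = (
    "/Users/stevesahayadarlin/Desktop/"
    "distributed_exchange_truffle_class_3-master/"
    "assignment_1/chaindata/geth.ipc"
)

print(len(ipc_path.encode("utf-8")), ipc_path_fits(ipc_path))
```

If the check reports that the path does not fit, moving the data directory to a shorter path, or pointing the socket elsewhere with geth's --ipcpath option, might be worth trying.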

Instance doesn't boot correctly, hangs on "a start job is running for LSB: Raise network Interface.."

My VM was shut down when the trial ended. However, I have since made a payment and started other instances.
The GCE UI shows this system as successfully booted; however, the serial port output shows the following (see image).
Any ideas on how to fix this?
Screenshot of Boot Error:
[ 6.895575] ppdev: user-space parallel port driver
[ 6.951588] ip6_tables: (C) 2000-2006 Netfilter Core Team
[ 6.993046] AVX version of gcm_enc/dec engaged.
[ 6.996351] alg: No test for __gcm-aes-aesni (__driver-gcm-aes-aesni)
[ 7.001659] alg: No test for crc32 (crc32-pclmul)
[ OK ] Started LSB: start firewall.
[***] A start job is running for LSB: Raise network interf...17s / no limit)