What is QoS in the context of Ethereum?

I often see this line in my go-ethereum output.
Recalculated downloader QoS values rtt=5.329575339s confidence=0.923 ttl=17.32992345s
What does QoS stand for and what is its purpose in Ethereum?

It stands for Quality of Service. Geth keeps metrics on each peer you're connected to (such as the measured round-trip time), and the downloader uses these QoS values to tune its request timeouts and keep sync performance up.
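For intuition, a toy Python sketch of the idea (an illustration only, not geth's actual implementation, and the constants are made up): keep a smoothed round-trip-time (rtt) estimate per peer, track how confident that estimate is, and derive a request timeout (ttl) from both:
class PeerQoS:
    def __init__(self):
        self.rtt = 0.0          # smoothed round-trip-time estimate, in seconds
        self.confidence = 0.0   # 0..1, grows as more samples are collected

    def observe(self, sample_rtt: float) -> None:
        """Fold a new RTT measurement into the estimate (exponential smoothing)."""
        if self.confidence == 0.0:
            self.rtt = sample_rtt                        # first sample seeds the estimate
        else:
            self.rtt = 0.9 * self.rtt + 0.1 * sample_rtt
        self.confidence = min(1.0, self.confidence + 0.3)

    def ttl(self) -> float:
        """Request timeout for this peer: generous while confidence is still low."""
        return 3.0 * self.rtt / max(self.confidence, 0.1)

peer = PeerQoS()
for sample in (5.2, 5.4, 5.3):
    peer.observe(sample)
print(f"rtt={peer.rtt:.3f}s confidence={peer.confidence:.2f} ttl={peer.ttl():.3f}s")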

Related

Public transactions in quorum stuck in pending in transaction pool

I followed Quorum's docs and created a 2-node network using the Raft consensus. In the genesis block I pre-allocated funds to one of the accounts. Now I am trying to send a public transaction of some ether to the other node.
However the transaction is getting stuck in the transaction pool and the balances of both nodes remain unchanged.
I have used the same genesis.json file that was provided in the documentation. Is there something I am missing?
Once the two nodes were brought up, I tried running:
eth.sendTransaction({from:current-node-address, to: second-node's-address, value:0x200,gas:21000})
On checking the transactionReceipt with the transaction hash that was generated, it displays null.
It sounds like your network is not minting blocks, so you may have some Raft misconfiguration.
Check the log files for any error messages.
You can also check that both nodes are in the network and that one of them is minting (i.e. is the leader) by inspecting the raft object in the geth console (for example, raft.cluster and raft.role).
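If you prefer to script the same checks, here is a rough web3.py sketch; the RPC URL and the transaction hash are placeholders for your own values, and the snake_case calls assume a recent web3.py version:
import time
from web3 import Web3
from web3.exceptions import TransactionNotFound

w3 = Web3(Web3.HTTPProvider("http://localhost:22000"))  # adjust to your node's RPC endpoint

# With a transaction sitting in the pool, the block number should advance;
# if it does not, nothing is minting and the Raft setup needs fixing.
before = w3.eth.block_number
time.sleep(10)
after = w3.eth.block_number
print("blocks minted in the last 10 seconds:", after - before)

tx_hash = "0x..."  # the hash returned by eth.sendTransaction
try:
    receipt = w3.eth.get_transaction_receipt(tx_hash)
    print("mined in block", receipt.blockNumber)
except TransactionNotFound:
    print("transaction is still pending / not yet mined")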

Kubernetes on GCE / Preventing pods from being evicted with "The node was low on compute resources."

A painful investigation into aspects that, so far, are not well covered by the documentation (at least from what I've been able to find).
My cluster's kube-proxy pods were evicted (more experienced users can probably imagine the issues that followed). I searched a lot, but found no clues about how to bring them back up.
Eventually, describing the affected pod gave a clear reason: "The node was low on compute resources."
Still not that experienced with balancing resources between pods/deployments and the "physical" compute underneath, how would one prioritize (or take a similar approach) to make sure specific pods never end up in such a state?
The cluster was created with fairly low resources in order to get hands-on experience while keeping costs low, and to eventually witness problems like this (gcloud container clusters create deemx --machine-type g1-small --enable-autoscaling --min-nodes=1 --max-nodes=5 --disk-size=30); is using g1-small to be avoided?
If you are using iptables-based kube-proxy (the current best practice), then kube-proxy being killed should not immediately cause your network connectivity to fail, but new services and updates to endpoints will stop working. Still, your apps should continue to work, but degrade slowly.
If you are using userspace kube-proxy, you might want to upgrade.
The error message sounds like it was due to memory pressure on the machine.
When there is memory pressure, Kubelet tries to terminate things in order of lowest to highest QoS level.
If your kube-proxy pod is not using Guaranteed resources, then you might want to change that.
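For illustration, a minimal sketch using the official Kubernetes Python client of what "Guaranteed" means in practice: requests equal to limits for every container. The pod name and image are placeholders, and on GCE kube-proxy itself is normally managed through a node manifest rather than the API, so treat this purely as a demonstration of the QoS class:
from kubernetes import client, config

config.load_kube_config()  # or config.load_incluster_config() when running inside the cluster

# requests == limits for every resource puts the pod in the Guaranteed QoS class,
# making it one of the last candidates for eviction under memory pressure.
resources = client.V1ResourceRequirements(
    requests={"cpu": "100m", "memory": "128Mi"},
    limits={"cpu": "100m", "memory": "128Mi"},
)
pod = client.V1Pod(
    metadata=client.V1ObjectMeta(name="guaranteed-demo"),  # placeholder name
    spec=client.V1PodSpec(
        containers=[
            client.V1Container(name="app", image="nginx:1.9", resources=resources)  # placeholder image
        ]
    ),
)
client.CoreV1Api().create_namespaced_pod(namespace="default", body=pod)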
Other things to look at:
If kube-proxy suddenly used a lot more memory, it could be terminated. Creating a huge number of pods, services, or endpoints could cause it to use more memory.
If you started processes on the machine that are not under Kubernetes control, they could cause the kubelet to make an incorrect decision about what to terminate. Avoid this.
It is possible that on a machine as small as a g1-small, the amount of node resources held back is insufficient, so that too much guaranteed work gets put on the machine -- see allocatable vs capacity (the sketch below shows how to compare them). This might need tweaking.
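A quick way to compare the two, sketched with the Kubernetes Python client (field names follow the v1 Node API):
from kubernetes import client, config

config.load_kube_config()
for node in client.CoreV1Api().list_node().items:
    cap = node.status.capacity        # total resources on the machine
    alloc = node.status.allocatable   # what is actually left for pods to use
    print(node.metadata.name,
          "cpu:", alloc["cpu"], "/", cap["cpu"],
          "memory:", alloc["memory"], "/", cap["memory"])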
See the node out-of-memory (OOM) handling documentation for more detail.

Autoscaling GCE Instance groups based on Cloud pub/sub queue

Can GCE instance groups be scaled up/down based on Google Cloud Pub/Sub queue counts or other asynchronous task queues such as PSQ?
Yes!
The feature is now in alpha: https://cloud.google.com/compute/docs/autoscaler/scaling-queue-based
I haven't tried this myself but looking at the documentation, it looks possible to set up autoscaling against Pub/Sub message queue counts.
This page [0] explains how to set up the autoscaler to scale based on a standard metric provided by the Cloud Monitoring service.
This page [1] explains what metrics you can use with the autoscaler. These two look useful:
pubsub.googleapis.com/subscription/num_outstanding_messages
pubsub.googleapis.com/subscription/num_undelivered_messages
[0] https://cloud.google.com/compute/docs/autoscaler/scaling-cloud-monitoring-metrics
[1] https://cloud.google.com/monitoring/api/metrics
You can't use pubsub metrics (pubsub.googleapis.com/subscription/num_outstanding_messages or pubsub.googleapis.com/subscription/num_undelivered_messages) for that purpose.
According to the docs:
A valid utilization metric for scaling meets the following criteria:
The standard metric has a label for resource_id, and the value of that label for each stream is the ID of an instance.
The standard metric describes how busy an instance is, and the metric value increases or decreases proportionally to the number of virtual machine instances in the group.
The Pub/Sub metrics don't meet those criteria.
However, there are two ways you can use pubsub based autoscaling:
Write your own custom metric: you can use the Cloud Monitoring API to fetch your Pub/Sub time series data, then use it to calculate your own custom monitoring metric, for example the last time series value divided by your average/desired latency (see the sketch after this list).
You can use this method with any async queue solution.
Still in alpha, there is an API for subscriber-based autoscaling: https://cloud.google.com/compute/docs/autoscaler/scaling-queue-based. This solution applies to Google Cloud Pub/Sub only, and you can't use it with other async queue solutions.
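A rough Python sketch of the custom-metric route; PROJECT_ID, SUBSCRIPTION_ID and the target throughput are placeholders, and the exact client surface may differ between versions of the google-cloud-monitoring library:
import time
from google.cloud import monitoring_v3

PROJECT_ID = "my-project"           # placeholder
SUBSCRIPTION_ID = "work-queue-sub"  # placeholder
TARGET_MSGS_PER_INSTANCE = 100.0    # how much backlog one VM is expected to handle

client = monitoring_v3.MetricServiceClient()
project_name = f"projects/{PROJECT_ID}"

now = int(time.time())
interval = monitoring_v3.TimeInterval(
    {"end_time": {"seconds": now}, "start_time": {"seconds": now - 300}}
)
results = client.list_time_series(
    request={
        "name": project_name,
        "filter": (
            'metric.type = "pubsub.googleapis.com/subscription/num_undelivered_messages" '
            f'AND resource.labels.subscription_id = "{SUBSCRIPTION_ID}"'
        ),
        "interval": interval,
        "view": monitoring_v3.ListTimeSeriesRequest.TimeSeriesView.FULL,
    }
)

backlog = 0
for series in results:
    if series.points:
        backlog = series.points[0].value.int64_value  # most recent point comes first

# Publish backlog / TARGET_MSGS_PER_INSTANCE back as a custom metric
# (via client.create_time_series) and point the instance group autoscaler at it.
print("backlog:", backlog, "-> custom metric value:", backlog / TARGET_MSGS_PER_INSTANCE)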

Fiware: Data loss prevention

I'm working with version 0.27.0 of the context broker. I'm using the Cygnus generic enabler, and I have set up an MQTT agent that connects external devices to the context broker.
My major concern right now is how to prevent data loss. I set up the context broker and the Cygnus MongoDB databases as replica sets, but that won't ensure that all data is persisted into the databases. I have seen that Cygnus uses Apache Flume. Looking at its configuration, the re-injection retries can be configured:
# Number of channel re-injection retries before a Flume event is definitely discarded (-1 means infinite retries)
cygnusagent.sources.http-source.handler.events_ttl = -1
Is it a good idea to set the retries value to -1? I have read about events being re-injected into the channel forever.
What can be done to ensure that all the data will be persisted?
Is there any functionality in the FIWARE ecosystem oriented to that purpose?
Regarding Cygnus, the TTL is indeed the way of controlling persistence retries after an error. A retry means the data is re-injected into the internal channel connecting the source (which receives Orion notifications) and the sink (which persists the data in the final storage) for future persistence attempts.
Possible values for this TTL are:
TTL = 0: there are no retries, i.e. if a notified piece of data cannot be persisted in the final storage the first time (because of a network failure, a storage error, whatever), then the data is dropped.
TTL > 0: there are as many retries as the configured TTL. Once the TTL is exhausted, the data is dropped.
TTL = -1: infinite retries, i.e. the data is re-injected into the channel forever until it is persisted or the channel gets full.
As mentioned, a TTL of -1 may consume the channel capacity if the final storage never recovers, preventing newly received data from being put into the channel. Nevertheless, if the final storage never recovers, such a drawback does not really matter, right? :)
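To make the semantics concrete, here is a toy Python model of the behaviour described above; this is not Cygnus code, just an illustration of the re-injection logic:
from collections import deque

CHANNEL_CAPACITY = 1000  # mirrors the channel's configured capacity

def handle_failed_event(channel: deque, event: dict, events_ttl: int) -> None:
    """Decide what happens to an event whose persistence attempt has just failed."""
    if events_ttl == 0:
        return  # no retries: drop the event immediately
    if events_ttl > 0:
        event["retries"] = event.get("retries", 0) + 1
        if event["retries"] > events_ttl:
            return  # configured retries exhausted: drop the event
    # events_ttl == -1 (infinite) or retries remaining: re-inject if there is room
    if len(channel) < CHANNEL_CAPACITY:
        channel.append(event)
    # if the channel is already full, newly notified data can no longer be accepted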
Thus, we could say the rules for choosing a TTL are:
If you don't want retries, simply configure 0.
If you want retries but don't mind losing data after a certain number of retries, then configure a positive value.
If you want retries and don't want to lose data, then configure -1 and a large channel capacity, since the final storage may be down for an unknown amount of time.
In any case, the TTL feature is changing during this sprint. The behaviour will be the same, but instead of being applied to single events it will be applied to batches of events (a batch may consist of a single event, of course). You'll see this change in the next release of Cygnus (0.13.0), which will be available at the end of February 2016 (at the moment of writing this, next week :)). My recommendation is to wait for that release if you want to use the TTL feature intensively.

Are there any protocols/standards on top of TCP optimized for high throughput and low latency?

Are there any protocols/standards that work over TCP that are optimized for high throughput and low latency?
The only one I can think of is FAST.
At the moment I have devised just a simple text-based protocol delimited by special characters. I'd like to adopt a protocol which is designed for fast transfer and supports perhaps compression and minification of the data that travels over the TCP socket.
Instead of using heavyweight TCP, we can get TCP's connection-oriented/reliable behaviour on top of UDP in either of the following ways:
UDP-based Data Transfer Protocol (UDT):
UDT is built on top of User Datagram Protocol (UDP) by adding congestion control and reliability control mechanisms. UDT is an application level, connection oriented, duplex protocol that supports both reliable data streaming and partial reliable messaging.
Acknowledgment:
UDT uses periodic acknowledgments (ACK) to confirm packet delivery, while negative ACKs (loss reports) are used to report packet loss. Periodic ACKs help to reduce control traffic on the reverse path when the data transfer speed is high, because in these situations, the number of ACKs is proportional to time, rather than the number of data packets.
Reliable User Datagram Protocol (RUDP):
It aims to provide a solution where UDP is too primitive because guaranteed-order packet delivery is desirable, but TCP adds too much complexity/overhead.
It extends UDP by adding the following additional features:
Acknowledgment of received packets
Windowing and congestion control
Retransmission of lost packets
Overbuffering (Faster than real-time streaming)
en.wikipedia.org/wiki/Reliable_User_Datagram_Protocol
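As a concrete, if greatly simplified, illustration of the acknowledgement/retransmission idea, here is a stop-and-wait sender over UDP in Python; real protocols such as UDT and RUDP add windowing, congestion control and periodic/negative ACKs on top of this, and the receiver address is a placeholder:
import socket
import struct

ADDR = ("127.0.0.1", 9999)  # placeholder; the receiver must echo the 4-byte sequence number back as the ACK
TIMEOUT_S = 0.2
MAX_RETRIES = 5

def send_reliable(sock: socket.socket, seq: int, payload: bytes) -> bool:
    """Send one packet and retransmit until the matching ACK arrives (or give up)."""
    packet = struct.pack("!I", seq) + payload  # 4-byte sequence number header
    for _ in range(MAX_RETRIES):
        sock.sendto(packet, ADDR)
        try:
            data, _ = sock.recvfrom(4)
            (acked,) = struct.unpack("!I", data)
            if acked == seq:
                return True  # delivery confirmed
        except socket.timeout:
            pass  # lost packet or lost ACK: retransmit
    return False

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.settimeout(TIMEOUT_S)
send_reliable(sock, 1, b"hello")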
If layered on top of TCP, you won't get better throughput or latency than the 'barest' TCP connection.
There are other non-TCP high-throughput and/or low-latency connection-oriented protocols, usually layered on top of UDP.
Almost the only one I know is UDT, which is optimized for networks where high bandwidth or long round-trip times (RTT) make typical TCP retransmission behaviour suboptimal. These are called 'long fat networks' (LFN, pronounced 'elefan').
You may want to consider JMS. JMS can run on top of TCP, and you can get reasonable latency with a message broker like ActiveMQ.
It really depends on your target audience, though. If you're building a game which must run anywhere, you pretty much need to use HTTP or HTTP streaming. If you are pushing around market data on a LAN, then something NOT using TCP would probably suit you better. Tibco RV and JGroups both provide reliable low-latency messaging over multicast.
Just as you mentioned FAST: it is intended for market data distribution, is used by leading stock exchanges, and runs on top of UDP multicast.
In general, given the current level of network reliability, it is often worth putting your protocol on top of UDP.
Anything with a session sequence number, NACKs plus a server-to-client heartbeat, and binary marshalling should get close to theoretical performance (a small sketch of the marshalling part follows).
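A small Python sketch of the binary marshalling point; the field layout is made up for illustration, but it shows how a fixed binary record is both more compact and cheaper to parse than a character-delimited text message:
import struct

# sequence number (u32), timestamp in microseconds (u64), price (f64), quantity (u32)
MSG = struct.Struct("!IQdI")

def encode(seq: int, ts_us: int, price: float, qty: int) -> bytes:
    return MSG.pack(seq, ts_us, price, qty)  # always 24 bytes

def decode(data: bytes):
    return MSG.unpack(data)

binary = encode(42, 1_700_000_000_000_000, 101.25, 500)
text = "42|1700000000000000|101.25|500".encode()
print(len(binary), "bytes binary vs", len(text), "bytes text")
print(decode(binary))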
If you have admin/root privilege on the sending side, you can also try a TCP acceleration driver like SuperTCP.