Performance Query - fiware

I would like to know the maximum data ingress/egress limit (packets per second) of the Orion broker, for both its northbound and southbound interfaces.
Suppose the Orion broker is integrated with the JSON IoT Agent: what would the data ingress/egress limit be then?

There is no hard limit for that. Performance depends on packet size, network, number of nodes, CPU, RAM, the number (and behaviour) of active subscriptions, whether IoT Agents are involved (MQTT? HTTP?), balancing/failover policies, the MongoDB environment... and so on.
So for any kind of performance question, on any system, it is a must to specify the infrastructure, architecture, deployment and use case scenario. Otherwise the question can't be answered.
The best way to find out is to run a test with your concrete scenario in your actual environment. There are some public performance documents (made by the FIWARE Foundation), but they cannot be extrapolated to arbitrary environments.
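If you just need a rough baseline before a proper load test, a tiny sequential probe can give one. The sketch below assumes a local Orion reachable at localhost:1026 and uses the NGSIv2 API; the entity name and values are illustrative only:

```python
# Rough sequential throughput probe against a local Orion Context Broker.
# Assumes Orion (NGSIv2) is reachable at localhost:1026.
import time
import requests

ORION = "http://localhost:1026"

# Create the test entity once (Orion returns 422 if it already exists).
requests.post(f"{ORION}/v2/entities", json={
    "id": "Room1", "type": "Room",
    "temperature": {"value": 20.0, "type": "Number"},
})

N = 1000
start = time.monotonic()
for i in range(N):
    # Each iteration is one northbound attribute update.
    requests.patch(f"{ORION}/v2/entities/Room1/attrs", json={
        "temperature": {"value": 20.0 + (i % 10), "type": "Number"},
    })
elapsed = time.monotonic() - start
print(f"{N} sequential updates in {elapsed:.1f}s ({N / elapsed:.0f} updates/s)")
```

A sequential loop like this only measures latency-bound throughput from a single client; a real test would add concurrency and mirror your actual subscription load.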
Best.

Related

How to understand the role of a queue in a distributed system?

I am trying to understand the use case of a queue in a distributed system.
Also, how does it scale, and how does it avoid being a single point of failure in the system?
Any direct answer or a reference to a document is appreciated.
Use case:
I understand that a queue is a messaging system and that it decouples the systems that communicate with each other. But is that the only point of using a queue?
Scalability:
How does the queue scale for high volumes of data, for both reads and writes?
Reliability:
How does the queue avoid becoming a single point of failure in the system? Does the queue do replication, similar to a data store?
My question is not specific to any particular queue server like Kafka or JMS, just queues in general.
A queue is a mental concept; the implementation decides the answers to all three of your points (use case, scalability, reliability).
A1: No, it is not the only role. Messaging seems to be the main one, but distributed-system signalling is another, by no means any less important. Hoare's seminal CSP paper is the flagship work in this field. Recent decades have added many more options and "smart behaviours" to work with when designing a distributed system's signalling/messaging infrastructure.
A2: Scaling envelopes depend a lot on the implementation. It seems obvious that broker-less queues can work much faster than a centralised, broker-based infrastructure. Transport classes and transport links account for additional latency and performance degradation as data-flow volumes grow. BLOB handling is another performance cliff, as the inefficiencies accumulate down the distributed processing chain. Even zero-copy, (almost) zero-latency smart-queue implementations are still victims of operating-system and similar resource limitations.
A3: Oh sure it is the SPOF, if left on its own. However, theoretical cybernetics keeps us safe: we can build reliable systems out of error-prone components. (M + N)-failure-resilient schemes are thus achievable; the budget, creativity and design discipline are the ceiling any such project has to survive under.
my take:
I would be careful with the term "decouple": if service A calls an API on service B, there is coupling, since there is a contract between the services; this is true even if the communication happens over a queue, a file, or a fax. The key property of queues is that communication between services is asynchronous, which means their runtimes are decoupled: from a practical point of view, either system may go down without affecting the other.
Queues can scale to large volumes of data by partitioning. From the client's point of view there is one queue, but in reality there are many queues/shards, and the number of shards determines how much data can be supported. Of course, sharding a queue is not "free": you lose the global ordering of events, which may need to be addressed in your application.
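As a toy illustration of that trade-off, here is a minimal sketch (shard count and key names made up) of hash-based partitioning: ordering survives per key, but not across keys:

```python
# One logical queue backed by N physical shards, selected by key hash.
# Events for the same key stay ordered; global ordering is lost.
from collections import deque

class PartitionedQueue:
    def __init__(self, num_shards=4):
        self.shards = [deque() for _ in range(num_shards)]

    def _shard_for(self, key):
        # Consistent key -> shard mapping (within this process).
        return self.shards[hash(key) % len(self.shards)]

    def publish(self, key, event):
        self._shard_for(key).append(event)

    def consume(self, key):
        shard = self._shard_for(key)
        return shard.popleft() if shard else None

q = PartitionedQueue()
q.publish("user-42", "created")
q.publish("user-42", "updated")   # same shard as "created", order kept
q.publish("user-7", "created")    # different shard: no ordering vs user-42
print(q.consume("user-42"))       # -> created
```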
A good queue-based solution is reliable thanks to replication, consensus, etc., depending on the set of desired properties. Queues are not very different from databases in this regard.
To give you more direction to dig into:
there is an interesting family of queue features: delivery guarantees such as deliver-exactly-once, deliver-at-most-once, deliver-at-least-once, etc. (see the sketch after this list)
may I recommend Enterprise Integration Patterns - https://www.enterpriseintegrationpatterns.com/patterns/messaging/Messaging.html - this is a good "system design" level of information
queues may participate in distributed transactions, e.g. you could delete a record from a database and write it into a queue, and that pair of operations will either be committed or rolled back as a unit - another interesting topic to explore
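To make the delivery-guarantee point concrete: with at-least-once delivery, the consumer may see duplicates, so a common pattern is an idempotent consumer. A minimal sketch: the message shape is hypothetical, and the in-memory set stands in for durable storage:

```python
# Idempotent consumer: turns at-least-once delivery into effectively-once
# processing by remembering the IDs of messages already handled.
processed_ids = set()  # in production this would live in durable storage

def apply_side_effects(message):
    print("processing", message["payload"])  # hypothetical business logic

def handle(message):
    # Messages are assumed to carry a unique, producer-assigned ID.
    if message["id"] in processed_ids:
        return  # duplicate redelivery: safe to ignore
    apply_side_effects(message)
    processed_ids.add(message["id"])

handle({"id": "m-1", "payload": "charge card"})
handle({"id": "m-1", "payload": "charge card"})  # no double charge
```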

ELI5: How etcd really works and what a consensus algorithm is

I am having a hard time grasping what etcd (in CoreOS) really does, because all this "distributed key-value storage" stuff seems intangible to me. Reading further into etcd, it delves into the Raft consensus algorithm, and then it becomes really confusing to understand.
Let's put it this way: what happens if a cluster system doesn't have etcd?
Thanks for your time and effort!
As someone with no CoreOS experience who is building a distributed system using etcd, I think I can shed some light on this.
The idea with etcd is to provide some very basic primitives that are applicable to building a wide variety of distributed systems. The reason for this is that distributed systems are fundamentally hard. Most programmers don't really grok the difficulties, simply because there are orders of magnitude more opportunities to learn about single-system programs; this has only started to shift in the last five years, since cloud computing made distributed systems cheap to build and experiment with. Even so, there's a lot to learn.
One of the biggest problems in distributed systems is consensus: in other words, guaranteeing that all nodes in a system agree on a particular value. Now, if hardware and networks were 100% reliable then it would be easy, but of course that is impossible. Designing an algorithm that provides meaningful guarantees around consensus is a very difficult problem, and one that a lot of smart people have put a lot of time into. Paxos was the previous state-of-the-art algorithm, but it was very difficult to understand. Raft is an attempt to provide similar guarantees while being much more approachable to the average programmer. Even so, as you have discovered, it is non-trivial to understand its operational details and applications.
In terms of what etcd is specifically used for in CoreOS I can't tell you. But what I can say with certainty is that any data which needs to be shared and agreed upon by all machines in a cluster should be stored in etcd. Conversely, anything that a node (or subset of nodes) can handle on its own should emphatically not be stored in etcd (because it incurs the overhead of communicating and storing it on all nodes).
With etcd it's possible to have a large number of identical machines automatically coordinate, elect a leader, and guarantee an identical history of data in their key-value store, such that:
No etcd node will ever return data which is not agreed upon by the majority of nodes.
For a cluster of size x, any number of machines > x/2 can continue operating and accepting writes even if the others die or lose connectivity.
Machines that lose connectivity (e.g. due to a netsplit) are guaranteed to continue returning correct historical data, even though they will fail to write.
The key-value store itself is quite simple and nothing particularly interesting, but these properties allow one to construct distributed systems that resist individual component failure and can provide reasonable guarantees of correctness.
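To make that concrete, here's what talking to etcd looks like from application code; a minimal sketch assuming the third-party python-etcd3 client and an etcd member on the default port 2379:

```python
# Minimal read/write against etcd, assuming the python-etcd3 client
# and a cluster member reachable at localhost:2379.
import etcd3

client = etcd3.client(host="localhost", port=2379)

# The put returns only after a quorum of nodes has committed the write.
client.put("/cluster/config/db_master", "node-1")

# Any node in the cluster will return the same agreed-upon value.
value, metadata = client.get("/cluster/config/db_master")
print(value)  # b'node-1'
```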
etcd is a reliable system for cluster-wide coordination and state management. It is built on top of Raft.
Raft gives etcd a total ordering of events across a system of distributed etcd nodes. This has many advantages and disadvantages:
Advantages include:
any node may be treated like a master
minimal downtime (a client can try another node if one isn't responding)
avoids split-brain scenarios
a reliable way to build distributed locks for cluster-wide coordination
users of etcd can build distributed systems without ad-hoc, buggy, homegrown solutions
For example: you would use etcd to coordinate an automated election of a new Postgres master, so that there remains only one master in the cluster (see the sketch at the end of this answer).
Disadvantages include:
for safety reasons, it requires a majority of the cluster to commit writes (usually to disk) before replying to a client
requires more network chatter than a single master system
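To make the Postgres-master example above concrete, here is a minimal sketch of lease-based leader election, assuming the third-party python-etcd3 client and a local etcd on the default port; the lock name and TTL are arbitrary:

```python
# Sketch of leader election via an etcd lease-backed lock, assuming the
# python-etcd3 client. Only one contender holds the lock at a time; if
# the leader dies, the lease (ttl) expires and another node can take over.
import etcd3

client = etcd3.client(host="localhost", port=2379)

# ttl=10: the lock auto-expires if the holder stops renewing its lease.
lock = client.lock("postgres-master-election", ttl=10)

if lock.acquire(timeout=5):
    try:
        print("Won the election: promote the local Postgres to master here.")
    finally:
        lock.release()
else:
    print("Another node is master: keep the local Postgres as a replica.")
```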

Stress test cases for web application

What other stress test cases are there, besides finding the maximum number of users who can log into the web application before performance degrades and the application eventually crashes?
This question is hard to answer thoroughly since it's too broad.
Anyway, many stress tests depend on the type and execution flow of your workload. There's an entire subject (taught as a graduate course) dedicated to queueing theory and resource optimization. Most of it can be summarized as follows:
if you have a resource (be it a GPU, CPU, memory bank, mechanical or solid-state disk, etc.), it can serve a certain number of users/requests per second and takes some amount of time X to complete one unit of work. Make sure you don't exceed its limits.
Some systems can also be studied with a probabilistic approach (Little's Law is one of the most fundamental results in these cases).
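For instance, Little's Law says L = λW: the mean number of requests in the system equals the arrival rate times the mean time each request spends in the system. A two-line sanity check with made-up numbers:

```python
# Little's Law: L = lambda * W
arrival_rate = 200.0  # requests per second arriving at the service (lambda)
mean_latency = 0.25   # seconds each request spends in the system (W)

in_flight = arrival_rate * mean_latency
print(in_flight)  # 50.0 -> you need capacity for ~50 concurrent requests
```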
There are a lot of reasons for load/performance testing, many of which may not be important to your project goals. For example:
- What is the performance of the system at a given load? (load test)
- How many users can the system handle while still meeting a specific set of performance goals? (load test)
- How does the performance of the system change over time under a certain load? (soak test)
- When will the system crash under increasing load? (stress test)
- How does the system respond to hardware or environment failures? (stress test)
I've got a post on some common motivations for performance testing that may be helpful.
You should also check out your web analytics data and see what people are actually doing.
It's not enough to simply simulate X users logging in. Find the scenarios that represent the most common user activities (anywhere between 2 and 20 scenarios).
Also, make sure you're not just hitting your cache on reads; add some randomness/diversity to the requests.
I've seen stress tests where all the users were requesting the same data, which won't give you real-world results.
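One way to express such randomized scenarios is a Locust script. A minimal sketch: the /item/<id> and /search endpoints, the task weights and the think times below are placeholders for your own application, not recommendations:

```python
# Sketch of a randomized load scenario using Locust (https://locust.io).
# Run with: locust -f this_file.py --host=https://your-app.example
import random
from locust import HttpUser, task, between

class TypicalVisitor(HttpUser):
    wait_time = between(1, 5)  # think time between user actions

    @task(3)  # weight: browsing is three times as common as searching
    def view_item(self):
        # Random IDs keep reads from all landing on the same cached item.
        self.client.get(f"/item/{random.randint(1, 10_000)}",
                        name="/item/[id]")  # group stats under one label

    @task(1)
    def search(self):
        term = random.choice(["red", "blue", "green"])
        self.client.get(f"/search?q={term}", name="/search")
```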

Does there exist an open-source distributed logging library?

I'm talking about a library that would allow me to log events from different machines and would align these events on a "global" time axis with sufficiently high precision.
Actually, I'm asking because I've written such a thing myself in the course of a cluster computing project, I found it terrifically useful, and I was surprised that I couldn't find any analogues.
Therefore, the point is whether something like this exists (and I'd better contribute to it) or nothing exists (and I'd better write an open-source analogue of my solution).
Here are the features that I'd expect from such a library:
Independence from clock offsets between different machines
Timing precision on the order of at least milliseconds, preferably microseconds
Scalability to thousands of concurrent logging processes, with at least several megabytes of aggregated logs per second
Soft real-time operation (i.e. I don't want to collect 200 big logs from 200 machines and then compute clock offsets and merge them; I want to see what happens "live", perhaps with a small lag of around 10 s)
Facebook's contribution in this area is called 'Scribe'.
Excerpt:
Scribe is a server for aggregating streaming log data. It is designed to scale to a very large number of nodes and be robust to network and node failures. There is a scribe server running on every node in the system, configured to aggregate messages and send them to a central scribe server (or servers) in larger groups.
...
Scribe is implemented as a thrift service using the non-blocking C++ server. The installation at facebook runs on thousands of machines and reliably delivers tens of billions of messages a day.
The API is Thrift-based, so you have good platform coverage, but in case you're looking for a simple integration for Java you may want to have a look at Digg's log4j appender for Scribe.
You could use log4j/log4net targeting a central syslog daemon. log4j has a built-in SyslogAppender, and in log4net you can do it as shown here. log4cpp docs here.
There are Windows implementations of Syslog around if you don't have a Unix system to hand for this.
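The same centralised-syslog idea, shown with Python's standard library for comparison; the daemon address loghost:514 and the logger name are placeholders:

```python
# Send log records to a central syslog daemon using only the stdlib,
# assuming a daemon reachable at loghost:514 over UDP.
import logging
import logging.handlers

handler = logging.handlers.SysLogHandler(address=("loghost", 514))
handler.setFormatter(logging.Formatter("%(name)s: %(levelname)s %(message)s"))

log = logging.getLogger("worker-17")  # placeholder per-machine logger name
log.setLevel(logging.INFO)
log.addHandler(handler)

log.info("job 42 finished in 1.3s")  # arrives at the central syslog server
```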
Use Chukwa. It's an open-source, large-scale log monitoring system.

How can I model this usage scenario?

I want to create a fairly simple mathematical model that describes usage patterns and performance trade-offs in a system.
The system behaves as follows:
clients periodically issue multi-cast packets to a network of hosts
any host that receives the packet, responds with a unicast answer directly
the initiating host caches the responses for some given time period, then discards them
if the cache is populated the next time a request is required, data is pulled from the cache, not the network
packets are of a fixed size and always contain the same information
hosts are symmetric - any host can issue a request and respond to requests
I want to produce some simple mathematical models (and graphs) that describe the trade-offs available given some changes to the above system:
What happens when you vary the amount of time a host caches responses? How much data does this save? How many calls to the network do you avoid? (Clearly this depends on activity.)
Suppose responses are also multicast, and any host that overhears another client's request can cache all the responses it hears, thereby potentially saving itself a network request of its own. How would this affect the overall state of the system?
Now, this one gets a bit more complicated: each request-response cycle alters the state of one other host in the network, so the more activity there is, the faster caches become invalid. How do I model the trade-off between the number of hosts, the rate of activity, the "dirtiness" of the caches (assuming hosts listen in on others' responses), and how all of this changes with the cache validity period? I'm not sure where to begin.
I don't really know what sort of mathematical model I need, or how to construct it. Clearly it's easier to just vary two parameters, but particularly with the last question I've got maybe four changing variables that I want to explore.
Help and advice appreciated.
Investigate tokenised Petri nets. These seem to be an appropriate tool, as they:
provide a graphical representation of the models
provide substantial mathematical analysis
have a large body of prior work and underlying analysis
are (relatively) simple mathematical models
seem directly tied to your problem, in that they deal with constraint-dependent networks that pass tokens only under specified conditions
I found a number of references (quality not assessed) by searching for "token Petri net".
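Complementary to a Petri-net model, a small Monte-Carlo simulation is often the quickest way to explore several interacting parameters at once. A sketch of the first trade-off (cache lifetime vs. network requests avoided); every number in it is made up for illustration:

```python
# Monte-Carlo sketch of the cache trade-off: vary the cache TTL and see
# how many network requests are avoided. All parameters are illustrative.
import random

def simulate(num_hosts=50, request_prob=0.2, cache_ttl=30,
             sim_seconds=10_000, seed=1):
    random.seed(seed)
    expires_at = [0] * num_hosts          # per-host cache expiry time
    network_requests = cache_hits = 0
    for t in range(sim_seconds):
        for h in range(num_hosts):
            if random.random() < request_prob:   # host h needs the data
                if t < expires_at[h]:
                    cache_hits += 1              # served from local cache
                else:
                    network_requests += 1        # multicast query on the wire
                    expires_at[h] = t + cache_ttl
    total = network_requests + cache_hits
    return network_requests, cache_hits / total

for ttl in (5, 30, 120):
    net, hit_ratio = simulate(cache_ttl=ttl)
    print(f"TTL={ttl:4d}s  network requests={net:7d}  hit ratio={hit_ratio:.2f}")
```

Extending the loop with invalidation on observed writes would let you explore the cache "dirtiness" question from the third bullet as well.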