Modern monolitical application - message-queue

I know in these days microservice architecture is the best of :) But we create application on demand so customer is who will determine if application will be on cloud or not. We have old application which we need to modernize to decrease complexity and coupling. It is classic application with many logic in stored procedures. I read about DDD and I think bounded context is nice idea how to decrease complexity. But when there are bounded contexts which are separated, how do they communicate together? In microservice architecture there could be RPC or message queuing system. How to do when I want to create communication between bounded contexts in monolith which they are low coupled? Do you have some experiences?

Related

How to understand a role of a queue in a distributed system?

I am trying to understand what is the use case of a queue in distributed system.
And also how it scales and how it makes sure it's not a single point of failure in the system?
Any direct answer or a reference to a document is appreciated.
Use case:
I understand that queue is a messaging system. And it decouples the systems that communicate between each other. But, is that the only point of using a queue?
Scalability:
How does the queue scale for high volumes of data? Both read and write.
Reliability:
How does the queue not becoming a single point of failure in the system? Does the queue do a replication, similar to data-storage?
My question is not specified to any particular queue server like Kafka or JMS. Just in general.
Queue is a mental concept, the implementation decides about 1 + 2 + 3
A1: No, it is not the only role -- a messaging seems to be main one, but a distributed-system signalling is another one, by no means any less important. Hoare's seminal CSP-paper is a flagship in this field. Recent decades gave many more options and "smart-behaviours" to work with in designing a distributed-system signalling / messaging services' infrastructures.
A2: Scaling envelopes depend a lot on implementation. It seems obvious that a broker-less queues can work way faster, that a centralised, broker-based, infrastructure. Transport-classes and transport-links account for additional latency + performance degradation as the data-flow volumes grow. BLOB-handling is another level of a performance cliff, as the inefficiencies are accumulating down the distributed processing chain. Zero-copy ( almost ) zero-latency smart-Queue implementations are still victims of the operating systems and similar resources limitations.
A3: Oh sure it is, if left on its own, the SPOF. However, Theoretical Cybernetics makes us safe, as we can create reliable systems, while still using error-prone components. ( M + N )-failure-resilient schemes are thus achievable, however the budget + creativity + design-discipline is the ceiling any such Project has to survive with.
my take:
I would be careful with "decouple" term - if service A calls api on service B, there is coupling since there is a contract between services; this is true even if the communication is happening over a queue, file or fax. The key with queues is that the communication between services is asynchronous. Which means their runtimes are decoupled - from practical point of view, either of systems may go down without affecting the other.
Queues can scale for large volumes of data by partitioning. From clients point of view, there is one queue, but in reality there are many queues/shards and number of shards helps to support more data. Of course sharding a queue is not "free" - you will lose global ordering of events, which may need to be addressed in you application.
A good queue based solution is reliable based on replication/consensus/etc - depends on set of desired properties. Queues are not very different from databases in this regard.
To give you more direction to dig into:
there an interesting feature of queues: deliver-exactly-once, deliver-at-most-once, etc
may I recommend Enterprise Architecture Patterns - https://www.enterpriseintegrationpatterns.com/patterns/messaging/Messaging.html this is a good "system design" level of information
queues may participate in distributed transactions, e.g. you could build something like delete a record from database and write it into queue, and that will be either done/committed or rolledback - another interesting topic to explore

what is the difference between distributed computing, microservice and parallel computing

My basic understanding for:
Distributed computing is a model of connected nodes -from hardware perspective they share only network connection- and communicate through messages. each node code be responsible for one part of the business logic as in ERP system there is a node for hr, node for accounting. communication could be HTML, SOA, RCP
Microservice is a service that is responsible for one part of the business logic and communicate with each other usually by http. microservices could share the hardware resources and are accessed by thier api.
Parallel systems are systems which optimize the use of resources. for example multithreaded app running on several thread where sharing memory resources.
I am a little bit confused since microservices are distributed systems, but when running multiple microservices on single hardware resources they are also parallel systems. Am i getting it right here:
Micro services is one way to do distributed computing. There are many more distributed computing models like Map-Reduce and Bulk Synchronous Parallel.
However, as you pointed out, you don't need to use micro servers for a distributed system. You can put all your services on one machine. It's like using a screw driver to hammer a nail ;). Yeah, you'll have parallel computation on a single multi-core machine, but are micro services the right way to achieve it? They might be if you plan to move those services onto separate machines. However, if those services require co-location, then micro services was the wrong tool.
Distributed systems is one way to do parallel computing. There are many different ways to achieve parallel computation, like grid computing, multi-core machines, etc. Many of them are listed in the article I linked.

ELI5: How etcd really works and what is consensus algorithm

I am having hard time to grab what etcd (in CoreOS) really does, because all those "distributed key-value storage" thingy seems intangible to me. Further reading into etcd, it delves into into Raft consensus algorithm, and then it becomes really confusing to understand.
Let's say that what happen if a cluster system doesn't have etcd?
Thanks for your time and effort!
As someone with no CoreOS experience building a distributed system using etcd, I think I can shed some light on this.
The idea with etcd is to give some very basic primitives that are applicable for building a wide variety of distributed systems. The reason for this is that distributed systems are fundamentally hard. Most programmers don't really grok the difficulties simply because there are orders of magnitude more opportunity to learn about single-system programs; this has really only started to shift in the last 5 years since cloud computing made distributed systems cheap to build and experiment with. Even so, there's a lot to learn.
One of the biggest problems in distributed systems is consensus. In other words, guaranteeing that all nodes in a system agree on a particular value. Now, if hardware and networks were 100% reliable then it would be easy, but of course that is impossible. Designing an algorithm to provide some meaningful guarantees around consensus is a very difficult problem, and one that a lot of smart people have put a lot of time into. Paxos was the previous state of the art algorithm, but was very difficult to understand. Raft is an attempt to provide similar guarantees but be much more approachable to the average programmer. However, even so, as you have discovered, it is non-trivial to understand it's operational details and applications.
In terms of what etcd is specifically used for in CoreOS I can't tell you. But what I can say with certainty is that any data which needs to be shared and agreed upon by all machines in a cluster should be stored in etcd. Conversely, anything that a node (or subset of nodes) can handle on its own should emphatically not be stored in etcd (because it incurs the overhead of communicating and storing it on all nodes).
With etcd it's possible to have a large number of identical machines automatically coordinate, elect a leader, and guarantee an identical history of data in its key-value store such that:
No etcd node will ever return data which is not agreed upon by the majority of nodes.
For cluster size x any number of machines > x/2 can continue operating and accepting writes even if the others die or lose connectivity.
For any machines losing connectivity (eg. due to a netsplit), they are guaranteed to continue to return correct historical data even though they will fail to write.
The key-value store itself is quite simple and nothing particularly interesting, but these properties allow one to construct distributed systems that resist individual component failure and can provide reasonable guarantees of correctness.
etcd is a reliable system for cluster-wide coordination and state management. It is built on top of Raft.
Raft gives etcd a total ordering of events across a system of distributed etcd nodes. This has many advantages and disadvantages:
Advantages include:
any node may be treated like a master
minimal downtime (a client can try another node if one isn't responding)
avoids split-braining
a reliable way to build distributed locks for cluster-wide coordination
users of etcd can build distributed systems without ad-hoc, buggy, homegrown solutions
For example: You would use etcd to coordinate an automated election of a new Postgres master so that there remains only one master in the cluster.
Disadvantages include:
for safety reasons, it requires a majority of the cluster to commit writes - usually to disk - before replying to a client
requires more network chatter than a single master system

Performances evaluation with Message Passing

I have to build a distributed application, using MPI.
One of the decision that I have to take is how to map instances of classes into process (and then into machines), in order to take maximum advantages from a distributed environment.
My question is: there is a model that let me choose the better mapping? I mean, some arrangements are surely wrong (for ex., putting in two different machines two objects that should process together a fairly large amount of data, in a sequential manner, without a stream of tokens to process), but there's a systematically way to determine such wrong arrangements, determined by flow of execution, message complexity, time taken by the computation done by the algorithmic components?
Well, there are data flow diagrams. Those can help identify parallelism's opportunities and pitfalls. The references on the wikipedia page might give you some more theoretical grounding.
When I worked at Lockheed Martin, I was exposed to CSIM, a tool they developed for modeling algorithm mapping to processing blocks.
Another thing you might try is the Join Calculus. I've found examples of programming with it to be surprisingly intuitive, and I think it's well grounded in theory. I'm not sure why it hasn't caught on more.
The other approach is the Pi Calculus, and I think that might be more popular, though it seems harder to understand.
A practical solution to this would be using a different model of distributed-memory parallel programming, that directly addresses your concerns. I work on the Charm++ programming system, whose model is that of individual objects sending messages from one to another. The runtime system facilitates automatic mapping of these objects to available processors, to account for issues of load balance and communication locality.

How does the HP (Tandem) Non stop compare with Linux clusters?

HP NonStop systems (previously known as "Tandem") are known for their high availability and reliability, and higher price.
How do Linux or Unix based clusters compare with them, in these respects and others?
On a fault-tolerant machine the fault tolerance is handled directly in hardware and transparent to the application. Programming a cluster requires you to explicitly handle the fault tolerance in the application.
In practice, a clustered application architecture is much more complex to build and error prone than an application built for a fault-tolerant platform such as NonStop. This means that there is a far greater scope for unreliability driven by application bugs, as the London Stock Exchange found out the hard way. They had an incumbent Tandem-based system, which was quite a common architecture for stock exchange trading applications. Their new CEO had the bright idea that Microsoft was the way forward and had a big-5 consultancy build a .Net system based on a cluster of 120 servers.
The problem with clustered applications is that the failures can be correlated. If an application or configuration bug exists in the system it will typically be replicated on all of the nodes. This means that you can get a single situation or event that can take out the whole cluster. The additional complexity of clustered applications makes them more error-prone to develop and deploy, which raises the odds of this happening. A clustered system built on (for example) Linux and J2EE is vulnerable to the same types of failure modes.
IMHO this is a major advantage of older-style mainframe architectures. Several vendors (IBM, HP, DEC and probably several others I can't think of) made fault tolerant systems. The underlying programming model for this type of system is somewhat simpler than a clustered n-tier application server. This means that there is comparatively little to go wrong and for a given amount of effort you can achieve a more reliable system. A surprising number of older architectures are still alive and well and living quite comfortably in their market niches. IBM still sell plenty of Z and I series machines; Unisys still makes the A Series and 2200 series; VMS and NonStop are still alive within HP. The sales of these systems are not all to existing clients - for example a Commercial Underwriting system (GENIUS) runs on the ISeries and is still a market leader in this niche with new rollouts going on as I write this. The application has survived two attempts to rewrite it (1 in in Java and 1 in .Net) that I am aware of and the 'Old School' platform doesn't really seem to be cramping its style.
I wouldn't go shorting any screen-scraper vendors just yet ...
Gray & Reuter's Transaction Processing: Concepts and Techniques is somewhat dry and academic, but has a good treatment of fault-tolerant systems architecture. One of the authors was a key player in the design of Tandem's systems.