Every time I read about how web services should communicate, the first thing that comes up is:
Use REST, because it decouples client and server!
I would like to build a web service where each Query and Command is an HTTP endpoint. With REST I would have fewer endpoints, because REST thinks in terms of resources instead of operations (you typically have more operations than resources).
Why do I have a stronger coupling by using RPC over REST?
What is the benefit of using REST over RPC Json-Messaging style?
Additional information: With messaging I mean synchronous messaging (request/response)
Update: I think it would also be possible, or even better, to have only a single HTTP endpoint that handles a Query or a Command depending on the given HTTP verb.
Before I get to the CQRS part, I'd like to spend a little time talking about the advantages and disadvantages of REST. I don't think it's possible to answer the question before we've established a common understanding of when and why to use REST.
REST
As with most other technology options, REST, too, isn't a silver bullet. It comes with advantages and disadvantages.
I like to use Richardson's Maturity Model, with Martin Fowler's additional level 0, as a thinking tool.
Level 0
Martin Fowler also calls level 0 the swamp of POX, but I think that what really distinguishes this level is simply the use of RPC over HTTP. It doesn't have to be XML; it could be JSON instead.
The primary advantage at this level is interoperability. Most systems can communicate via HTTP, and most programming platforms can handle XML or JSON.
The disadvantage is that systems are difficult to evolve independently of clients (see level 3).
One of the distinguishing traits of this style is that all communication goes through a single endpoint.
Level 1
At level 1, you start to treat various parts of your API as separate resources. Each resource is identified by a URL.
One advantage is that you can now start to use off-the-shelf software, such as firewalls and proxy servers, to control access to various distinct parts of the system. You can also use HTTP redirects to point clients to different endpoints, although there are some pitfalls in that regard.
I can't think of any disadvantages, apart from those of level 0.
Level 2
At this level, not only do you have resources, but you also use HTTP verbs, such as GET, POST, DELETE, etc.
One advantage is that you can now begin to take more advantage of HTTP infrastructure. For instance, you can instruct clients to cache responses to GET requests, whereas other requests typically aren't cacheable. Again, you can use standard HTTP firewalls and proxies to implement caching. You can get 'web-scale' caching for free.
The reason that level 2 builds on level 1 is that you need each resource to be separate, because you want to be able to cache resources independently from each other. You can't do this if you can't distinguish various resources from each other, or if you can't distinguish reads from writes.
The disadvantage is that it may involve more programming work to implement this. Also, all the previous disadvantages still apply. Clients are tightly coupled to your published API. If you change your URL structure, clients break. If you change data formats, clients break.
Still, many so-called REST APIs are designed and published at this level, so in practice it seems that many organisations find this a good trade-off between advantages and disadvantages.
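To make the level 2 idea concrete, here is a minimal sketch of a handler that treats each URL as a separate resource and lets the HTTP verb decide whether the response is cacheable. The RESOURCES table, the handle function and the chosen Cache-Control values are assumptions for illustration, not part of any particular framework.

```python
# Hypothetical in-memory resources, keyed by URL path.
RESOURCES = {"/orders/1": {"id": 1, "status": "open"}}


def handle(method, path):
    """Return (status, headers, body) for a request against one resource."""
    if path not in RESOURCES:
        return 404, {}, None
    if method == "GET":
        # Reads are safe, so clients and proxies may cache the response.
        return 200, {"Cache-Control": "max-age=60"}, RESOURCES[path]
    if method == "DELETE":
        # Writes change state, so the response is marked as not cacheable.
        del RESOURCES[path]
        return 204, {"Cache-Control": "no-store"}, None
    return 405, {"Allow": "GET, DELETE"}, None
```

Because reads and writes are distinguishable per resource, an off-the-shelf HTTP cache could serve repeated GET requests for /orders/1 without understanding anything about orders.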
Level 3
This is the level of REST design that I consider true REST. It's nothing like the previous levels; it's a completely different way to design APIs. In my mind, there's a hard divide between levels 0-2, and level 3.
One distinguishing feature of level 3 is that you must build content negotiation into the API design. Once you have that, though, the reasons to choose this API design style become clearer.
To me, the dominant advantage of level 3 APIs is that you can evolve them independently of clients. If you're careful, you can change the structure, even the navigation graph, of your API without breaking existing clients. If you need to introduce breaking changes, you can use content negotiation to ensure that clients can opt-in to the breaking change, whereas legacy clients will keep working.
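As a rough sketch of the opt-in mechanism described above, the following picks a representation from the Accept header and falls back to the old format for legacy clients. The vendor media type names, the order fields and the negotiate function are all hypothetical, and the sketch ignores q-values, which a real implementation would need to honour.

```python
# Hypothetical vendor media types; the names are assumptions for illustration.
V1 = "application/vnd.example.order.v1+json"
V2 = "application/vnd.example.order.v2+json"


def render_v1(order):
    # Original representation: a single combined name field.
    return {"customer": order["first"] + " " + order["last"]}


def render_v2(order):
    # Breaking change: the name is split into two fields.
    return {"firstName": order["first"], "lastName": order["last"]}


RENDERERS = {V1: render_v1, V2: render_v2}


def negotiate(accept_header, order):
    """Pick the first media type we know; default to V1 for legacy clients."""
    for item in accept_header.split(","):
        media_type = item.split(";")[0].strip()
        if media_type in RENDERERS:
            return media_type, RENDERERS[media_type](order)
    return V1, RENDERERS[V1](order)
```

A client that never heard of V2 keeps sending its old Accept header and keeps receiving the V1 shape, while newer clients ask for V2 explicitly.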
Basically, when I'm asked to write an API where I have no control over clients, my default choice is level 3.
Designing a level 3 REST API requires you to design in a way that's unusual and alien to many, so that's a disadvantage. Another disadvantage is that client developers often find this style of API design unfamiliar, so they often try to second-guess, or reverse-engineer, your URL structure. If they do, you'll have to expend some effort to prevent that as well, since clients that depend on your URL structure prevent you from evolving the API.
In other words, level 3 APIs require considerable development effort, particularly on the server-side, but clients also become more complex.
I will, though, reiterate the advantage: you can evolve a level 3 REST API independently of clients. If you don't control clients, backwards compatibility is critical. Level 3 enables you to evolve APIs while still retaining compatibility. I'm not aware of a way you can achieve this with any of the other styles.
CQRS
Now that we've identified some advantages and disadvantages of REST, we can start to discuss whether it's applicable to CQRS.
The most fundamental agreement between Greg Young and Udi Dahan concerning CQRS is that it's not a top-level architecture.
In a nutshell, the reason for this is that the messages (commands and events) and queries that make up a CQRS system are sensitive to interpretation. In order to do something, a client must know which command to issue, and the server must know how to interpret it. The command, thus, is part of the system.
The system may be distributed across clients and servers, but the messages and data structures are coupled to each other. If you change how your server interprets a given message, that change will impact your clients. You can't evolve clients and servers independently in a CQRS architecture, which is the reason why it's not a top-level architecture.
So, given that it's not a top-level architecture, the transport architecture becomes fairly irrelevant. In a sense, the only thing you need in order to send messages is a single 'service bus' endpoint, which could easily be a level 0 endpoint, if all you need is interoperability. After all, the only thing you do is to put a message on a queue.
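A minimal sketch of such a single 'service bus' endpoint, assuming JSON messages with a "type" field: the endpoint only validates and enqueues, and a separate step dispatches to handlers. The function names, the message shape and the status codes chosen are assumptions, and a real system would use a durable queue rather than an in-process one.

```python
import json
import queue

# In-process stand-in for a real message queue.
bus = queue.Queue()


def service_bus_endpoint(body):
    """Level 0 style: one endpoint accepts any message and enqueues it."""
    message = json.loads(body)
    if not isinstance(message, dict) or "type" not in message:
        return 400  # the endpoint cannot route a message without a type
    bus.put(message)
    return 202  # accepted for later processing


def drain(handlers):
    """Dispatch queued messages to handlers keyed by message type."""
    results = []
    while not bus.empty():
        message = bus.get()
        handler = handlers[message["type"]]
        results.append(handler(message.get("payload")))
    return results
```

Note how the coupling described above shows up here: the client has to know the exact "type" string and payload shape that the server's handler expects.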
The final answer, then, is, as always: it depends.
Is speed of delivery the most important criterion? And can you control all clients and servers at the same time? Then, perhaps level 0 is all you need. Perhaps level 2 is fine.
On the other hand, if you have clients out of your control (mobile apps, hardware (IoT), business partners using your public API, etc.), you must consider how to deal with backwards and forwards compatibility, in which case level 3 is (IMO) required. In that case, though, I'd suggest keeping CQRS an implementation detail.
The best answer would probably be "it depends", but one of the big things about real REST is that it is stateless. And it being stateless means all sorts of things.
Basically, if you look at the HATEOAS constraint, you have the reason for the decoupling [1].
I think that RPC-JSON is not stateful per se, but it definitely is not defined as stateless. So there you have the strongest argument for why the decoupling is very high with REST.
[1] https://en.wikipedia.org/wiki/HATEOAS , http://restfulapi.net/hateoas/
I am seeking Python 2.7 alternatives to ZeroMQ that are released under the BSD or MIT license. I am looking for something that supports request-reply and pub-sub messaging patterns. I can serialize the data myself if necessary. I found Twisted from Twisted Matrix Labs but it appears to require a blocking event loop, i.e. reactor.run(). I need a library that will run in the background and let my application check messages upon certain events. Are there any other alternatives?
Give nanomsg, a ZeroMQ younger sister, a try - same father, same beauty
Yes, it is licensed under MIT/X11 license.
Yes, REQ/REP - allows to build clusters of stateless services to process user requests
Yes, PUB/SUB - distributes messages to large sets of interested subscribers
Has several Python bindings available
https://github.com/tonysimpson/nanomsg-python (recommended)
https://github.com/sdiehl/pynanomsg
https://github.com/djc/nnpy
Differences between nanomsg and ZeroMQ
(State as of 2014/11, v0.5-beta; courtesy of nanomsg.org.)
Licensing
nanomsg library is MIT-licensed. What it means is that, unlike with ZeroMQ, you can modify the source code and re-release it under a different license, as a proprietary product, etc. More reasoning about the licensing can be found here.
POSIX Compliance
ZeroMQ API, while modeled on BSD socket API, doesn't match the API fully. nanomsg aims for full POSIX compliance.
Sockets are represented as ints, not void pointers.
Contexts, as known in ZeroMQ, don't exist in nanomsg. This means simpler API (sockets can be created in a single step) as well as the possibility of using the library for communication between different modules in a single process (think of plugins implemented in different languages speaking each to another). More discussion can be found here.
Sending and receiving functions ( nn_send, nn_sendmsg, nn_recv and nn_recvmsg ) fully match POSIX syntax and semantics.
Implementation Language
The library is implemented in C instead of C++.
From user's point of view it means that there's no dependency on C++ runtime (libstdc++ or similar) which may be handy in constrained and embedded environments.
From nanomsg developer's point of view it makes life easier.
Number of memory allocations is drastically reduced as intrusive containers are used instead of C++ STL containers.
The above also means less memory fragmentation, fewer cache misses, etc.
More discussion on the C vs. C++ topic can be found here and here.
Pluggable Transports and Protocols
In ZeroMQ there was no formal API for plugging in new transports (think WebSockets, DCCP, SCTP) and new protocols (counterparts to REQ/REP, PUB/SUB, etc.). As a consequence, no new transports have been added since 2008. No new protocols were implemented either. The formal internal transport and protocol APIs (see transport.h and protocol.h) are meant to mitigate the problem and serve as a base for creating and experimenting with new transports and protocols.
Please, be aware that the two APIs are still new and may experience some tweaking in the future to make them usable in wide variety of scenarios.
nanomsg implements a new SURVEY protocol. The idea is to send a message ("survey") to multiple peers and wait for responses from all of them. For more details check the article here. Also look here.
In financial services it is quite common to use "deliver messages from anyone to everyone else" kind of messaging. To address this use case, there's a new BUS protocol implemented in nanomsg. Check the details here.
Threading Model
One of the big architectural blunders I've done in ZeroMQ is its threading model. Each individual object is managed exclusively by a single thread. That works well for async objects handled by worker threads, however, it becomes a trouble for objects managed by user threads. The thread may be used to do unrelated work for arbitrary time span, e.g. an hour, and during that time the object being managed by it is completely stuck. Some unfortunate consequences are: inability to implement request resending in REQ/REP protocol, PUB/SUB subscriptions not being applied while application is doing other work, and similar. In nanomsg the objects are not tightly bound to particular threads and thus these problems don't exist.
REQ sockets in ZeroMQ cannot really be used in real-world environments, as they get stuck if a message is lost due to service failure or similar. Users have to use XREQ instead and implement the request re-trying themselves. With nanomsg, the re-try functionality is built into the REQ socket.
In nanomsg, both REQ and REP support cancelling the ongoing processing. Simply send a new request without waiting for a reply (in the case of REQ socket) or grab a new request without replying to the previous one (in the case of REP socket).
In ZeroMQ, due to its threading model, bind-first-then-connect-second scenario doesn't work for inproc transport. It is fixed in nanomsg.
For similar reasons auto-reconnect doesn't work for inproc transport in ZeroMQ. This problem is fixed in nanomsg as well.
Finally, nanomsg attempts to make nanomsg sockets thread-safe. While using a single socket from multiple threads in parallel is still discouraged, the way in which ZeroMQ sockets failed randomly in such circumstances proved to be painful and hard to debug.
State Machines
Internal interactions inside the nanomsg library are modeled as a set of state machines. The goal is to avoid the incomprehensible shutdown mechanism as seen in ZeroMQ and thus make the development of the library easier.
For more discussion see here and here.
IOCP Support
One of the long-standing problems in ZeroMQ was that internally it uses BSD socket API even on Windows platform where it is a second class citizen. Using IOCP instead, as appropriate, would require major rewrite of the codebase and thus, in spite of multiple attempts, was never implemented. IOCP is supposed to have better performance characteristics and, even more importantly, it allows to use additional transport mechanisms such as NamedPipes which are not accessible via BSD socket API. For these reasons nanomsg uses IOCP internally on Windows platforms.
Level-triggered Polling
One of the aspects of ZeroMQ that proved really confusing for users was the ability to integrate ZeroMQ sockets into external event loops by using the ZMQ_FD file descriptor. The main source of confusion was that the descriptor is edge-triggered, i.e. it signals only when there were no messages before and a new one arrived. nanomsg uses level-triggered file descriptors instead, which simply signal when there's a message available, irrespective of whether it was available in the past.
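The difference between the two notification styles can be illustrated with a toy model. This is not ZeroMQ or nanomsg code; the Mailbox class and its method names are made up for the illustration.

```python
class Mailbox:
    """Toy model of an incoming message queue with two notification styles."""

    def __init__(self):
        self.messages = []
        self.edge_pending = False

    def deliver(self, msg):
        # An edge fires only on the empty -> non-empty transition.
        if not self.messages:
            self.edge_pending = True
        self.messages.append(msg)

    def poll_edge(self):
        """Edge-triggered: reports readiness once per transition."""
        fired, self.edge_pending = self.edge_pending, False
        return fired

    def poll_level(self):
        """Level-triggered: reports readiness whenever messages are queued."""
        return bool(self.messages)

    def recv(self):
        return self.messages.pop(0)
```

If two messages arrive and the application reads only one after the first edge notification, a second poll_edge call reports nothing even though a message is still waiting, whereas poll_level keeps reporting readiness until the queue is empty. That is the trap an edge-triggered descriptor sets for an event loop that does not drain the queue fully.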
Routing Priorities
nanomsg implements priorities for outbound traffic. You may decide that messages are to be routed to a particular destination in preference, and fall back to an alternative destination only if the primary one is not available.
For more discussion see here.
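The fallback behaviour described above can be sketched in a few lines. This is only a model of the idea, not nanomsg's implementation; the tuple layout and the convention that a lower number means a more preferred destination are assumptions of the sketch.

```python
def pick_destination(destinations):
    """Choose the preferred destination that is currently available.

    destinations: list of (priority, name, available) tuples, where a
    lower priority number means "more preferred" (an assumption here).
    Returns the chosen name, or None if nothing is available.
    """
    candidates = [(priority, name) for priority, name, available in destinations if available]
    if not candidates:
        return None
    return min(candidates)[1]
```

For example, with a primary at priority 1 and a backup at priority 2, messages go to the primary while it is up and to the backup only once the primary is marked unavailable.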
TCP Transport Enhancements
There's a minor enhancement to TCP transport. When connecting, you can optionally specify the local interface to use for the connection, like this:
nn_connect (s, "tcp://eth0;192.168.0.111:5555").
Asynchronous DNS
DNS queries (e.g. converting hostnames to IP addresses) are done in asynchronous manner. In ZeroMQ such queries were done synchronously, which meant that when DNS was unavailable, the whole library, including the sockets that haven't used DNS, just hung.
Zero-Copy
While ZeroMQ offers a "zero-copy" API, it's not true zero-copy. Rather it's "zero-copy till the message gets to the kernel boundary". From that point on data is copied as with standard TCP. nanomsg, on the other hand, aims at supporting true zero-copy mechanisms such as RDMA (CPU bypass, direct memory-to-memory copying) and shmem (transfer of data between processes on the same box by using shared memory). The API entry points for zero-copy messaging are nn_allocmsg and nn_freemsg functions in combination with NN_MSG option passed to send/recv functions.
Efficient Subscription Matching
In ZeroMQ, simple tries are used to store and match PUB/SUB subscriptions. The subscription mechanism was intended for up to 10,000 subscriptions where simple trie works well. However, there are users who use as much as 150,000,000 subscriptions. In such cases there's a need for a more efficient data structure. Thus, nanomsg uses memory-efficient version of Patricia trie instead of simple trie.
For more details check this article.
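For readers unfamiliar with the data structure, here is a minimal sketch of the "simple trie" approach to prefix-based subscription matching. It is not the ZeroMQ or nanomsg code, and it is the plain variant, not the memory-efficient Patricia trie mentioned above; the class and method names are hypothetical.

```python
class SubscriptionTrie:
    """Simple (uncompressed) trie: one node per character of a subscription."""

    def __init__(self):
        self.children = {}
        self.terminal = False  # True if a subscription ends at this node

    def subscribe(self, prefix):
        node = self
        for ch in prefix:
            node = node.children.setdefault(ch, SubscriptionTrie())
        node.terminal = True

    def matches(self, topic):
        """True if any stored subscription is a prefix of the topic."""
        node = self
        if node.terminal:
            return True  # an empty subscription matches everything
        for ch in topic:
            node = node.children.get(ch)
            if node is None:
                return False
            if node.terminal:
                return True
        return False
```

Every character gets its own node and its own child dictionary, which is why this layout becomes expensive at very large subscription counts; a Patricia trie collapses chains of single-child nodes into one node holding a longer string.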
Unified Buffer Model
ZeroMQ has a strange double-buffering behaviour. Both the outgoing and incoming data is stored in a message queue and in TCP's tx/rx buffers. What it means, for example, is that if you want to limit the amount of outgoing data, you have to set both ZMQ_SNDBUF and ZMQ_SNDHWM socket options. Given that there's no semantic difference between the two, nanomsg uses only TCP's (or equivalent's) buffers to store the data.
Scalability Protocols
Finally, on philosophical level, nanomsg aims at implementing different "scalability protocols" rather than being a generic networking library. Specifically:
Different protocols are fully separated, you cannot connect REQ socket to SUB socket or similar.
Each protocol embodies a distributed algorithm with well-defined prerequisites (e.g. "the service has to be stateless" in case of REQ/REP) and guarantees (if REQ socket stays alive request will be ultimately processed).
Partial failure is handled by the protocol, not by the user. In fact, it is transparent to the user.
The specifications of the protocols are in /rfc subdirectory.
The goal is to standardise the protocols via IETF.
There's no generic UDP-like socket (ZMQ_ROUTER), you should use L4 protocols for that kind of functionality.
How do you sync data between two processes (say client and server) in real time over network?
I have various documents/datasets constructed on the server, which are downloaded and displayed by clients. Once downloaded, the document receives continuous updates in order to remain fresh.
It seems to be a simple and commonly occurring concept, but I cannot find any tools that provide this level of abstraction. I am not even sure what I am looking for. Perhaps there is a similar concept with solid tool support? Perhaps there is a chain of different tools that must be put together? Here's what I have considered so far:
I am required to propagate every change in a single hop (0.5 RTT), which rules out polling (typically >10 RTT) and cache invalidation techniques (1.5 RTT).
Data replication and simple notification broadcasts are not an option, because there is too much data and too many changes. Clients must be able to select specific documents to download and monitor for changes.
I am currently using message passing pattern, which does the job, but it is hopelessly unproductive. It works at way too low level of abstraction. It is laborious, error-prone, and it doesn't scale well with increasing application complexity.
HTTP and other RPC-like techniques are good for the initial fetch, but they encourage polling for subsequent synchronization. When performing reverse requests (from data source to data consumer), change notifications are possible, but it's even more complicated than message passing.
Combining RPC (for the initial fetch) with message passing (for updates) turned out to be a nightmare due to the complexity involved in coordinating communication over the two parallel connections as well as due to the impedance mismatch between the two paradigms. I need something unified.
WebSocket & Comet are popular methods to implement change notification, but they need additional libraries to be productive and I am not aware of any libraries suitable for my application.
Message queues merely put an intermediary on the network while maintaining the basic message passing pattern. Custom message filters/routers allow me to get closer to the live document concept, but I feel like I am implementing custom middleware layer on top of the MQ.
I have tons of additional requirements (native observable data structure API on both ends, incremental updates, custom message filters, custom connection routing, cross-platform, robustness & scalability), but before considering those requirements, I need to find some tools that at least attempt to do what I need. I am trying to avoid in-house frameworks for the standard reasons - cost, time to market, long-term maintenance, and keeping developers happy.
My conclusion at the moment is that there is no such live document synchronization framework. In-house solution is the way to go, but many existing components can be used as part of the solution.
It is pretty simple to layer live document logic on top of WebSocket or any other message passing platform. Server just sends the document as a separate message when the connection is initiated and then after every change. Automated reconnection and some connection monitoring must be added to handle network failures.
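The core of that logic, stripped of the network, might look like the sketch below: a connection is just a callable that sends one message, and the document pushes itself on connect and after every change. The LiveDocument class and its methods are hypothetical, and serialization, reconnection and connection monitoring are deliberately left out.

```python
class LiveDocument:
    """Pushes the full document to every connection on connect and on change."""

    def __init__(self, content):
        self.content = content
        self.connections = []

    def connect(self, send):
        """Register a connection (a callable) and push the current document."""
        self.connections.append(send)
        send(self.content)

    def disconnect(self, send):
        self.connections.remove(send)

    def update(self, content):
        """Replace the document and push the new version to all connections."""
        self.content = content
        for send in self.connections:
            send(content)
```

With a WebSocket library, send would wrap whatever that library uses to write a message to one client; here a plain list's append method is enough to observe the sequence of pushed versions.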
Serialization at both ends is a separate problem targeted by many existing libraries. Detecting changes in server-side data structures (needed to initiate push) is yet another separate problem that has its own set of patterns and tools. Incremental updates and many other issues can be solved by intermediaries intercepting the connection.
This approach will work with current technology at the cost of extensive in-house glue code. It can be incrementally substituted with standard components as they become available.
WebSocket already includes resource URIs, routing, and a few other nice features. Useful intermediaries and libraries will likely emerge in the future. HTTP with text/event-stream MIME type is a possible future alternative to WebSocket. The advantage of HTTP is that existing tools can be reused with little modification.
I've completely thrown away the pattern of combining RPC pull with separate push channel despite rich tool support. Pushing everything in 0.5 RTT requires the push channel to use exactly the same technology as the pull channel, i.e. reverse RPC. Reverse RPC is like message passing except it introduces redundant returns, throws away useful connection semantics, and makes it hard to insert content-agnostic intermediaries into the stream.
I am not able to understand whether HTML5's Server-Sent Events really fit in a ReST architecture. I understand that NOT all aspects of HTML5/HTTP need to fit in a ReST architecture. But I would like to know from experts which half of HTTP SSE is in (the ReSTful half or the other half!).
One view could be that it is ReSTful, because there is an 'initial' HTTP GET request from the client to the server and the remaining can just be seen as partial-content responses of just a different Content-type ("text/event-stream")
The opposing view: a request is sent without any idea of how many responses (events) are going to come back. Is that ReSTful?
Motivation for the question: We are developing the server-side of an app, and we want to support both ReST clients (in general) and Browsers (in particular). While SSEs will work for most of the HTML5 browser clients, we are not sure if SSEs are suitable for support by a pure ReST client. Hence the question.
Edit1:
Was reading Roy Fielding's old article, where he says :
"In other words, a single user request results in a potentially large number of server obligations. As such, a benevolent user can produce a disproportionate load on the publisher or broker that is distributing notifications. On the Internet, we don’t have the luxury of designing just for benevolent users, and thus in HTTP systems we call such requests a denial-of-service exploit.... That is exactly why there is no standard mechanism for notifications in HTTP"
Does that imply SSE is not ReSTful ?
Edit2:
Was going through Twitter's REST API.
While REST puritans might debate whether their REST API is really/fully REST, just the title of the section Differences between Streaming and REST seems to suggest that Streaming (and even SSE) cannot be considered ReSTful. Does anyone contest that?
I think it depends:
Do your server-side events use hypermedia and hyperlinks to describe possible state changes?
The answer to that question is the answer to whether or not they satisfy REST within your application architecture.
Now, the manner in which those events are sent/received may or may not adhere to REST - everything I have read about SSE suggests that they do not. I suspect it will impact several principles, especially layering - though if intermediaries were aware of the semantics of SSE you could probably negate this.
I think this is orthogonal as it's just part of the processing directive for HTML and JavaScript that the browser (via the JavaScript it is running) understands. You should still be able to have client-side application state decoupled from server-side resource state.
Some of the advice I've seen on how to deal with scaling using SSE doesn't fit REST - i.e. introducing custom headers (modifying the protocol).
How do you respect REST while using SSE?
I'd like to see some kind of
<link rel="event" href="http://example.com/user/1" />
Then the processing directives (including code-on-demand such as JavaScript) of whatever content-type/resource you are working with tell the client how to subscribe and utilize the events made available from such a hyperlink. Obviously, the data of those events should itself be hypermedia containing more hyperlinks that control program flow. (This is where I believe you make the distinction between REST and not-REST).
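As a sketch of what such an event could look like on the wire, the function below serialises one event in the text/event-stream format, with a JSON payload that carries hyperlinks. The id, event and data field names come from the event-stream format; the payload shape, the link relations and the format_sse function itself are assumptions for illustration.

```python
import json


def format_sse(event, data, event_id=None):
    """Serialise one event for a text/event-stream response."""
    lines = []
    if event_id is not None:
        lines.append("id: {}".format(event_id))
    lines.append("event: {}".format(event))
    # Each line of the payload gets its own "data:" field.
    for line in json.dumps(data).splitlines():
        lines.append("data: {}".format(line))
    # A blank line terminates the event.
    return "\n".join(lines) + "\n\n"


# Hypothetical payload: state plus links that tell the client what it can do next.
payload = {
    "status": "shipped",
    "links": [
        {"rel": "self", "href": "http://example.com/user/1"},
        {"rel": "track", "href": "http://example.com/user/1/shipments/7"},
    ],
}
message = format_sse("order-updated", payload, event_id=42)
```

The client would follow the links found in the event data rather than constructing URLs itself, which is where the hypermedia constraint comes in.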
At some point the browser could become aware of that link relationship - just like a stylesheet and do some of that fancy wire-up for you, so all you do is just listen for events in JavaScript.
While I do think that your application can still fit a REST style around SSE, they are not REST themselves (Since your question was specifically about their use, not their implementation I am trying to be clear about what I am speaking to).
I dislike that the specification uses HTTP because it does away with a lot of the semantics and effectively tunnels an anemic protocol through an otherwise relatively rich one. This is supposedly a benefit but strikes me as selling dinner to pay for lunch.
ReST clients (in general) and Browsers (in particular).
How is your browser not a REST client? Browsers are arguably the most RESTful clients of all. It's all the crap we stick into them via JavaScript that makes them stop adhering to REST. I suspect/fear that as long as we continue to think about our REST-API 'clients' and our browser clients as fundamentally different, we will still be stuck in this current state - presumably because all the REST people are looking for a hyperlink that the RPC people have no idea needs to exist ;)
I think SSE can be used by a REST API. According to the Fielding dissertation, we have some architectural constraints the application MUST meet, if we want to call it REST.
client-server architecture: ok - the client triggers while the server does the processing
stateless: ok - we still store client state on the client and HTTP is still a stateless protocol
cache: ok - we have to send a no-cache header
uniform interface
identification of resources: ok - we use URIs
manipulation of resources through representations: ok - we can use HTTP methods with the same URI
self-descriptive messages: ok, partially - we use the content-type header, and we can add RDF to the data if we want, but there is no standard that declares the data to be RDF-encoded (we should define a text/event-stream+rdf MIME type or something like that, if that is supported)
hypermedia as the engine of application state: ok - we can send links in the data
layered system: ok - we can add other layers, which can transform the data stream aka. pipes and filters where the pump is the server, the filters are these layers and the sink is the client
code on demand: ok - optional, does not matter
Btw. there is no such rule, that you cannot use different technologies together. So you can use for example a REST API and websockets together if you want, but if the websockets part does not meet at least with the self-descriptive message and the HATEOAS constraints, then the client will be hard to maintain. Scalability can be another problem, since the other constraints are about that.
Since (HTML5) localStorage and its equivalents persist between tabs and windows, I've thought about using it for message passing. The problem is that fetch and store are different operations, and therefore not atomic. I have models that rely on UUID generation, conflict resolution, and beaconing to do the small subset of what I need to do, but my real question is this:
Since the local storage is a shared memory resource, what are the locking mechanisms available for mutual access?
Benjamin Dumke-von der Ehe recently came up with some (experimental) locking code for localStorage: http://balpha.de/2012/03/javascript-concurrency-and-locking-the-html5-localstorage/
Update: the code link in the article is broken, but here on Github is code, specifically mentioning the article.
I think what you really need is Channel Messaging, though as far as I'm aware no-one has implemented it yet. It allows arbitrary client side messaging between scripts.
There aren't any built in. You'll have to come up with your own locking mechanism. You can of course use any of the existing methods that other people have come up with for other things (like locking in memcache, for instance).