JSON-RPC vs JSON Patch - json

There are a lot of comparisons like "REST vs smth" (eg vs Kafka, vs JSON-RPC), but I also see many similarities between JSON-RPC and JSON Patch – both of them specify operation/method, values/parameters, and allow to perform batch requests. The only difference I see is that JSON-RPC also describes response format with IDs and errors, so it looks more mature. But maybe they just have different pros&cons, different appropriate use cases?

JSON-RPC (2.0) is indeed mostly mentioned as a REST alternative. (A look at the more general discussion, REST vs SOAP vs RPC might be due but this is a rabbit hole, seriously. Part of the problem is, people talking about different RESTs and RPCs and confuse them.) Historically, RPC systems often suffered from confounded dependences related to schema requirements, custom serializations, protocol restrictions, and complex RPC/function mappings. JSON-RPC tries to entangle dependencies, it does not want to be everything RPC for everybody but to stay relatively narrow in scope. It supports a small(er) set of commands, and is more opinionated on how to implement it. Nonetheless, it's a complete package, highly flexible, and used for complex APIs that struggle to map all operations to HTTP verbs. Newer capabilities like events, notifications, and batch actions make JSON-RPC very versatile and a good choice for all kinds of projects; the additions are especially useful for action-based microservices.
JSON-Patch (RFC6902), arguably, tries to solve a single common problem of APIs, i.e. partial updates. Instead of PUTting or POSTing (potentially heavyweight) changes to a (historically) growing URL space, using query params to carry the semantics, or introducing other complexities, JSON Patch offers an atomic update request that patches your JSON using a series of "add", "remove", "replace", "move" and "copy" operations. To quote one of the authors of the JSON Patch RFC, Mark Nottingham (IETF):
This has a bunch of advantages. Since it’s a generic format, you can
write server- and client-side code once, and share it among a number
of applications. You can do QA once, and make sure you get it right.
Furthermore, your API will become less complex, because it has less
URI conventions, leading to more flexibility and making it easier to
approach for new developers. Using this approach will also make caches
operate more correctly; since modifications to a resource will
“travel” through its URL, rather than some other one, the right stored
responses will get invalidated.
Similar things can be said about JSON Merge Patch (RFC7396, which can be used for the same purpose but works more like a diff/patch that simply contains the changes instead of using mutating operations. In this respect, JSON Merge Patch is simpler but more limited than JSON Patch, e.g. you cannot set a key to null (since null means remove), merging only works with objects {..}, arrays [..] can only be replaced as a whole, and merging never fails, could cause side-effects and is therefore not transactional. This renders Merge Patch (overly) simplistic and impractical for certain real-world applications. There are more related projects named like JSON-Diffpatch, etc that take similar approaches.
In general, all of them have advantages and disadvantages, and none of them will fit everybody's use-cases. It really depends on what problems you want to solve with your APIs. JSON-RPC is arguably more versatile and mature, however, JSON-Patch is also a sound choice, especially if you control both sides of the communication.

Related

Protocol Buffer vs Json - when to choose one over another

Can anyone explain when to use protocol buffer instead of JSON for micro-services architecture? And vice-versa? Both on synchronous and asynchronous communication.
When to use JSON
You need or want data to be human readable
Data from the service is directly consumed by a web browser
Your server side application is written in JavaScript
You aren’t prepared to tie the data model to a schema
You don’t have the bandwidth to add another tool to your arsenal
The operational burden of running a different kind of network service
is too great
Pros of ProtoBuf
Relatively smaller size
Guarantees type-safety
Prevents schema-violations
Gives you simple accessors
Fast serialization/deserialization
Backward compatibility
While we are at it, have you looked at flatbuffers?
Some of the aspects are covered here google protocol buffers vs json vs XML
Reference:
https://codeclimate.com/blog/choose-protocol-buffers/
https://codeburst.io/json-vs-protocol-buffers-vs-flatbuffers-a4247f8bda6f
I'd use JSON when the consumer is or could possibly be written in a language with built-in native support for JSON (Javascript is an example), a web browser, or where human readability is wanted. Speaking of which, at least for asynchronous calls, many developers enjoy the convenience of examining the contents of the queue directly for debugging and even during the normal course of development. Depending on the tech stack used, it may or may not be worth the trade off to use protobuf just to reduce network load since any performance increase wont buy you much in the async world. And it's not like we need to write a bunch of boiler plate code anymore like we used to with JSON marshalling and unmarshalling in most languages.
I'd use protobuf for everything else... if there are any other use cases left for it with the considerations above. There are advantages you might see, such as performance, network load, the backwards compatibility offered by its versioning scheme, the lovely documentation that magically comes with proto files, and some validation! If for some reason you have a lot of REST or other synchronous calls between microservices, protobuf can be sent over the wire instead of JSON without many trade offs, if any at all, while offering a heap of advantages.

What are the practical disadvantages of using strongly typed data interchange format (eg thrift / capn proto) in a microservices context?

I'm thinking of introducing a strongly typed (read - with predefined schema) data interchange format for communication between our internal services. For example, I guess something like Thrift or Cap'n Proto.
At least two obvious advantages (to me) of using this over something like JSON is that
you would KNOW the exact format of the data the service can expect (so leaves less room for ambiguity and errors while communicating) and
the implementation generally deserializes the raw message for you and it provides methods for accessing the objects.
What are the practical disadvantages for going this route, versus something like JSON?
For context - our system consists of services written in python and java - and possibly other languages in the future, and communicates via HTTP endpoints between services and message brokers like rabbitmq.
As with every strongly typed system, one of the major advantanges is without a doubt that if you make mistakes, it fails early in the process, typically at the compilation stage, which is a good thing.
Second biggest advantage IMHO is what you already said: because the fields and types are well known, the compiler, libraries and related code know what data to expect and thus can be written/organized in a more efficient manner - or in short: performance.
In contrast, a losely typed system (like Avro), while allowing for much greater flexibility without the need of recompiling, comes with the other side of the same coin: the downside of being prone to errors regarding the contents of the message at runtime.
This is because a losely defined system defines only the syntax of a valid document (like for example XML) and leaves the message-level semantics of what's in the document up to the upper layers. A strongly typed system has the knowledge about those message-level semantics already built in at compile time. Therefore, it is easy to detect/decide whether a particular document or message is not only well-formed but valid with regard to the message contents. If you need to do the same with the losely defined system, you need to provide additional information at runtime (like XML schema) and validate your document against it.
Bottom line
What system you prefer is more or less a matter of taste in most cases. I'd make the decision based on the question, how variable the data are that I have to deal with. If it makes sense to use a strongly typed system, I'd go that way, because I like it very much to get informed about errors and mistakes early.
However, if there is a need for very flexible data structures, it may make more sense to go the other road. Although designing a losely typed schema on top of a strongly typed system is surely possible, it is somewhat contradicting and you'll end up with some overly complicated, while overly generic, thing.
Typed
Incoming messages that are type tagged is very liberating, so long as it's possible to tell what the incoming message is without reading all of it. If so then you no longer care so much about message order. This is because it's easy for the recipient of the messages to handle whatever it is sent. So you can have an application which just sits there taking whatever it gets, and just does whatever is appropriate for each one.
Format
A schema language that allows you to define value and size constraints is very useful. It means that the sender of a message cannot accidentally send an invalid one. Moreover the receiver can automatically tell if an incoming message meets the schema. This is a real bonus in implementing a network service; the vast bulk of the message validation is done for you!
By size constraint, I mean that you can specify how long an array is in the schema and the generated code will refuse to handle arrays longer or shorter. By value constraints, imagine a message field called "bearing"; you might want to constrain that to be between 0 and 359.
These both allow you to make a clear, unambiguous statement about what the interface is and have it enforced automatically. How many security bugs have there been recently where some network interface data validation has been badly implemented...
Options
One serialisation standard that does all this is ASN.1. The tools I've used take an ASN.1 schema and produce code to serialise and deserialise, automatically checking that the value and size constraints have been met and also telling you what an incoming message type is. The tools for ASN.1 can be quite elderly and are in need of updating. If updated it would be ideal for every purpose, with both binary and text wire formats available.
There's now JSON schemas too, and they seem to have type, value and size constraints. This might be what you're looking for.
I'm fairly sure that Google Protocol Buffers doesn't do type tagging very well, and doesn't do value and size constraints. I've seen comments in GPB schema along the lines of:
// musn't be greater than 10.
If that's what is being written into a schema, the schema language is arguably inadequate...
I'm not sure of Thrift, I'm not sure it does value constraints (someone correct me if I'm wrong please!).
Disadvantages
Can't think of any! It can irritate developers; code they thought was good can be readily revealed to be producing junk messages, which annoys them intensely...

Messaging library for jeroMQ

I have chosen jeroMQ for building Asynchronous message channel for publishing content from multiple clients. On the other end server side workers processes request and notify client only if server wanted to notify client based on the message received.
On digging deep, looking for messaging library to marshal/un-marshal message. I found kvpmsg class which does the job for simple key-value.
Don't want to re-invent the wheel if some standard library exists, that can be applied for bigger objects
It seems like you are asking for data serialization libraries. Check Wikipedia for a list and a comparison of data serialization formats.
Also there is a relevant entry in ZeroMQ FAQ explaining why ZeroMQ doesn't include any serialization format:
Does ØMQ include APIs for serializing data to/from the wire representation?
No. This design decision adheres to the UNIX philosophy of "do one thing and do it well". In the case of ØMQ, that one thing is moving messages, not marshaling data to/from binary representations.
Some middleware products do provide their own serialization API. We believe that doing so leads to bloated wire-level specifications like CORBA (1055 pages). Instead, we've opted to use the simplest wire formats possible which ensure easy interoperability, efficiency and reduce the code (and bug) bloat.
If you wish to use a serialization library, there are plenty of them out there. See for example
Google Protocol Buffers
MessagePack
JSON-GLib
C++ BSON Library
Note that serialization implementations might not be as performant as you might expect. You may need to benchmark your workloads with several serialization formats and libraries in order to understand performance and which format/implementation is best for your use case (ease of development must also be considered).

IDL for JSON REST/RPC interface

We are designing a fairly complex REST API, in which most of the I/O are JSON encoded objects with a specific structure. One challenge we have found is to document the API in such a way that makes it easier for clients to post correct input and process output. Because the data of both the input and output requires fairly complex JSON objects, client developers often introduce bugs related to the structure of the I/O objects.
With all of the JSON web API's these days, I would have hoped for a general solution, but I am having a hard time finding one. I looked into json-schema which is a json-validation schema but both the IETF draft and implementations seem to be fairly immature (even though they have been around for a while, which is not a good sign).
A slightly different approach is offered by Protocol Buffers and Apache Avro, where the schema is not used for validation, but actually required for the encoding/decoding of the message. Of these 2, Avro seems to have rather limited documentation and implementations. ProtoBuf seems better, but I am not sure if this is really suitable to use in the browser to call a JSON api?
Now I am starting to doubt if I am looking at this from the right angle. Are there other methods available to make my API a bit more strong-typed'ish? Or is a formal description of a JSON REST/RPC API something that defeats the purpose of using JSON?
Edit: 6 months after this topic we found mongoose, which is very close to what we were lookin for.
Below a reply I received by email from Douglas Crockford.
I am not a believer in schemas as an alternative to input validation.
There are properties that cannot be verified from the syntax. I think
that was one of the ways that XML went wrong.
If your formats are too complex, then I would look at simplifying
them.
Such systems exist and I'm the author of one of them. It is called Piqi-RPC and it does IDL-based validation of the input and output parameters for RPC-style APIs over HTTP.
It supports JSON, XML and Google Protocol Buffers as data representation formats for input and output of HTTP POST requests. Clients can choose to use any of the three formats and specify their choice using the standard Accept and Content-Type HTTP headers.
So, yes, in theory, you are looking in the right direction. However, at the moment, Piqi-RPC supports writing servers only in Erlang and it wouldn't be very useful for you if you use a different stack. I heard that Apache Thrift also supports JSON over HTTP transport, but I haven't checked. Another kind of similar system I know of (also for Erlang) is called UBF. I have heard of libraries for Java that can parse and validate JSON based on Protocol Buffers specification (e.g. http://code.google.com/p/protostuff/).
The idea itself is far from being new, but there aren't many systems that approach it in practice. It is a challenging problem.
Historically, IDLs were used for interface definition and binary data serialization and not so much for validating dynamic data interchange formats (e.g. XML and JSON) which emerged later. Sun-RPC IDL and CORBA IDL fall in the first category. WSDL would be one of few examples covering both areas, but it is a terrible piece of technology and it would be a bad choice for most modern systems. In addition, there are many schema languages (also known as DDLs -- data definition languages), most of which are highly specialized and work with only one representation format, e.g. XML or JSON schemas. Few of those have stable implementations.
The Piqi project and Piqi-RPC, which is based on it, are build around several fairly simple realizations:
DLL doesn't have to be explicitly tied to any particular data representation format or built around it. Instead, such language can be fairly universal and cover wide range of practical use-cases (e.g. cross-language data serialization and data validation) and data formats (e.g. JSON, XML, Protocol Buffers).
IDL for RPC-style communication can be implemented as a thin, mostly syntactic layer on top of the universal DDL.
Such IDL and interface specifications can be transport agnostic.
Speaking of REST-style APIs over HTTP compared to RPC-style APIs over HTTP.
With RPC-style APIs, service developer or an automated system have to validate three things: function name (according to some service naming scheme), input and, if you choose so, output.
In case of REST-style APIs, people get themselves in trouble for no good reason. Now, they have a lot more stuff to validate: arbitrarily complex URL syntax, including dynamic parameters encoded in URL segments (for all HTTP methods) and URL query string (only for HTTP GET method), HTTP method correspondence (whether it should be GET, POST, PUT, DELETE, etc.), HTTP body when some parameters go there (sometimes they do it manually twice for parameters represented in JSON and XML), custom HTTP headers, and separately -- service documentation. Imagine an IDL supporting all that!
XML is better for RESTful services in many ways. It has native linking (<link href=, for all those HATEOAS fans), native language support (lang="en") and a great ecosystem.
It is also better for future proofing and future API refactorings. Converting this:
<profile>
<username>alganet</username>
</profile>
To support more usernames:
<profile>
<username>alganet</username>
<username>alexandre</username>
</profile>
Is much more simpler to do without breaking existing clients using XML. JSON is hard on that.
If you really need JSON, JSON-Schema is the way to go. It's immature, but I don't know anything better on that case. Maybe your consumers could choose between XML and JSON, so they can choose between a small payload (JSON) or RESTful candies (XML) using Content Negotiation.
I'd say the answer to your last question is yes. If you need a way to constrain and document the JSON "schema", why didn't you go with XML in the first place? It is not that much harder to parse, and being able to enforce a schema for it is a great advantage.

Parsing language for both binary and character files

The problem:
You have some data and your program needs specified input. For example strings which are numbers. You are searching for a way to transform the original data in a format you need.
And the problem is: The source can be anything. It can be XML, property lists, binary which
contains the needed data deeply embedded in binary junk. And your output format may vary
also: It can be number strings, float, doubles....
You don't want to program. You want routines which gives you commands capable to transform the data in a form you wish. Surely it contains regular expressions, but it is very good designed and it offers capabilities which are sometimes much more easier and more powerful.
ADDITION:
Many users have this problem and hope that their programs can convert, read and write data which is given by other sources. If it can't, they are doomed or use programs like business
intelligence. That is NOT the problem.
I am talking of a tool for a developer who knows what is he doing, but who is also dissatisfied to write every time routines in a regular language. A professional data manipulation tool, something like a hex editor, regex, vi, grep, parser melted together
accessible by routines or a REPL.
If you have the spec of the data format, you can access and transform the data at once. No need to debug or plan meticulously how to program the transformation. I am searching for a solution because I don't believe the problem is new.
It allows:
joining/grouping/merging of results
inserting/deleting/finding/replacing
write macros which allows to execute a command chain repeatedly
meta-grouping (lists->tables->n-dimensional tables)
Example (No, I am not looking for a solution to this, it is just an example):
You want to read xml strings embedded in a binary file with variable length records. Your
tool reads the record length and deletes the junk surrounding your text. Now it splits open
the xml and extracts the strings. Being Indian number glyphs and containing decimal commas instead of decimal points, your tool transforms it into ASCII and replaces commas with points. Now the results must be stored into matrices of variable length....etc. etc.
I am searching for a good language / language-design and if possible, an implementation.
Which design do you like or even, if it does not fulfill the conditions, wouldn't you want to miss ?
EDIT: The question is if a solution for the problem exists and if yes, which implementations are available. You DO NOT implement your own sorting algorithm if Quicksort, Mergesort and Heapsort is available. You DO NOT invent your own text parsing
method if you have regular expressions. You DO NOT invent your own 3D language for graphics if OpenGL/Direct3D is available. There are existing solutions or at least papers describing the problem and giving suggestions. And there are people who may have worked and experienced such problems and who can give ideas and suggestions. The idea that this problem is totally new and I should work out and implement it myself without background
knowledge seems for me, I must admit, totally off the mark.
UPDATE:
Unfortunately I had less time than anticipated to delve in the subject because our development team is currently in a hot phase. But I have contacted the author of TextTransformer and he kindly answered my questions.
I have investigated TextTransformer (http://www.texttransformer.de) in the meantime and so far I can see it offers a complete and efficient solution if you are going to parse character data.
For anyone who will give it a try to implement a good parsing language, the smallest set of operators to directly transform any input data to any output data if (!) they were powerful enough seems to be:
Insert/Remove: Self-explaining
Group/Ungroup: Split the input data into a set of tokens and organize them into groups
and supergroups (datastructures, lists, tables etc.)
Transform
Substituition: Change the content of the tokens (special operation: replace)
Transposition: Change the order of tokens (swap,merge etc.)
Have you investigated TextTransformer?
I have no experience with this, but it sounds pretty good and the author makes quite competent posts in the comp.compilers newsgroup.
You still have to some programming work.
For a programmer, I would suggest:
Perl against a SQL backend.
For a non-programmer, what it sounds like you're looking for is some sort of business intelligence suite.
This suggestion may broaden the scope of your search too much... but here it is:
You could either reuse, as-is, or otherwise get "inspiration" from the [open source] code of the SnapLogic framework.
Edit (answering the comment on SnapLogic documentation etc.)
I agree, the SnapLogic documentation leaves a bit to be desired, in particular for people in your situation, i.e. when just trying to quickly get an overview of what SnapLogic can do, and if it would generally meet their needs, without investing much time and learn the system in earnest.
Also, I realize that the scope and typical uses of of SnapLogic differ, somewhat, from the requirements expressed in the question, and I should have taken the time to better articulate the possible connection.
So here goes...
A salient and powerful feature of SnapLogic is its ability to [virtually] codelessly create "pipelines" i.e. processes made from pre-built components;
Components addressing the most common needs of Data Integration tasks at-large are supplied with the SnapLogic framework. For example, there are components to
read in and/or write to files in CSV or XML or fixed length format
connect to various SQL backends (for either input, output or both)
transform/format [readily parsed] data fields
sort records
join records for lookup and general "denormalized" record building (akin to SQL joins but applicable to any input [of reasonnable size])
merge sources
Filter records within a source (to select and, at a later step, work on say only records with attribute "State" equal to "NY")
see this list of available components for more details
A relatively weak area of functionality of SnapLogic (for the described purpose of the OP) is with regards to parsing. Standard components will only read generic file formats (XML, RSS, CSV, Fixed Len, DBMSes...) therefore structured (or semi-structured?) files such as the one described in the question, with mixed binary and text and such are unlikely to ever be a standard component.
You'd therefore need to write your own parsing logic, in Python or Java, respecting the SnapLogic API of course so the module can later "play nice" with the other ones.
BTW, the task of parsing the files described could be done in one of two ways, with a "monolithic" reader component (i.e. one which takes in the whole file and produces an array of readily parsed records), or with a multi-component approach, whereby an input component reads in and parse the file at "record" level (or line level or block level whatever this may be), and other standard or custom SnapLogic components are used to create a pipeline which effectively expresses the logic of parsing a record (or block or...) into its individual fields/attributes.
The second approach is of course more modular and may be applicable if the goal is to process many different files format, whereby each new format requires piecing together components with no or little coding. Whatever the approach used for the input / parsing of the file(s), the SnapLogic framework remains available to create pipelines to then process the parsed input in various fashion.
My understanding of the question therefore prompted me to suggest SnapLogic as a possible framework for the problem at hand, because I understood the gap in feature concerning the "codeless" parsing of odd-formatted files, but also saw some commonality of features with regards to creating various processing pipelines.
I also edged my suggestion, with an expression like "inspire onself from", because of the possible feature gap, but also because of the relative lack of maturity of the SnapLogic offering and its apparent commercial/open-source ambivalence.
(Note: this statement is neither a critique of the technical maturity/value of the framework per-se, nor a critique of business-oriented use of open-source, but rather a warning that business/commercial pressures may shape the offering in various direction)
To summarize:
Depending on the specific details of the vision expressed in the question, SnapLogic may be worthy of consideration, provided one understands that "some-assembly-required" will apply, in particular in the area of file parsing, and that the specific shape and nature of the product may evolve (but then again it is open source so one can freeze it or bend it as needed).
A more generic remark is that SnapLogic is based on Python which is a very swell language for coding various connectors, convertion logic etc.
In reply to Paul Nathan you mentioned writing throwaway code as something rather unpleasant. I don't see why it should be so. After all, all of our code will be thrown away and replaced eventually, no matter how perfect we wrote it. So my opinion is that writing throwaway code is pretty much ok, if you don't spend too much time writing it.
So, it seems that there are two approaches to solving your solution: either a) find some specific tool intended for the purpose (parse data, perform some basic operations on it and storing it in some specific structure) or b) use some general purpose language with lots of libraries and code it yourself.
I don't think that approach a) is viable because sooner or later you'll bump into an obstacle not covered by the tool and you'll spend your time and nerves hacking the tool, or mailing the authors and waiting for them to implement what you need. I might as well be wrong, so please if you find a perfect tool, drop here a link (I myself am doing lots of data processing in my day job and I can't swear that I couldn't do it more efficiently).
Approach b) may at first seem "unpleasant", but given a nice high-level expressive language with bunch of useful libraries (regexps, XML manipulation, creating parsers...) it shouldn't be too hard, and may be gradually turned into a DSL for the very purpose. Beside Perl which was already mentioned, Python and Ruby sound like good candidates for these languages (I bet some Lisp derivatives too, but I have no experience there).
You might find AntlrWorks useful if you go so far as defining formal grammars for what you're parsing.