Protocol Buffers vs JSON - when to choose one over the other

Can anyone explain when to use protocol buffer instead of JSON for micro-services architecture? And vice-versa? Both on synchronous and asynchronous communication.

When to use JSON
You need or want data to be human readable
Data from the service is directly consumed by a web browser
Your server side application is written in JavaScript
You aren’t prepared to tie the data model to a schema
You don’t have the bandwidth to add another tool to your arsenal
The operational burden of running a different kind of network service is too great
Pros of ProtoBuf
Smaller payload size
Guarantees type-safety
Prevents schema violations
Gives you simple accessors
Fast serialization/deserialization
Backward compatibility
While we are at it, have you looked at FlatBuffers?
Some of these aspects are covered in the comparison of Google Protocol Buffers vs JSON vs XML.
Reference:
https://codeclimate.com/blog/choose-protocol-buffers/
https://codeburst.io/json-vs-protocol-buffers-vs-flatbuffers-a4247f8bda6f
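To make the size and type-safety points above concrete, here is a minimal sketch in Python. It assumes a hypothetical person.proto compiled with protoc into a person_pb2 module; the message and field names are illustrative, not something from the answer.

```python
# Hypothetical schema (person.proto), compiled with: protoc --python_out=. person.proto
#
#   syntax = "proto3";
#   message Person {
#     string name  = 1;
#     int32  id    = 2;
#     string email = 3;
#   }

import json
import person_pb2  # assumed output of protoc, not part of the original answer

record = {"name": "Ada", "id": 42, "email": "ada@example.com"}

# JSON: human readable, but no schema is enforced at serialization time.
json_bytes = json.dumps(record).encode("utf-8")

# Protobuf: schema-checked accessors; wrong field names or types raise errors.
person = person_pb2.Person(name="Ada", id=42, email="ada@example.com")
proto_bytes = person.SerializeToString()

print(len(json_bytes), len(proto_bytes))  # the protobuf encoding is typically noticeably smaller

# Round-trip to show the simple accessors mentioned above.
parsed = person_pb2.Person()
parsed.ParseFromString(proto_bytes)
assert parsed.name == "Ada"
```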

I'd use JSON when the consumer is, or could possibly be, written in a language with built-in native support for JSON (JavaScript is one example), is a web browser, or where human readability is wanted. Speaking of which, at least for asynchronous calls, many developers enjoy the convenience of examining the contents of a queue directly for debugging, and even during the normal course of development. Depending on the tech stack used, it may or may not be worth the trade-off to use protobuf just to reduce network load, since any performance increase won't buy you much in the async world. And it's not like we still need to write a bunch of boilerplate code for JSON marshalling and unmarshalling in most languages.
I'd use protobuf for everything else... if there are any use cases left for it given the considerations above. There are advantages you might see, such as performance, reduced network load, the backward compatibility offered by its versioning scheme, the documentation that comes for free with proto files, and some validation. If for some reason you have a lot of REST or other synchronous calls between microservices, protobuf can be sent over the wire instead of JSON with few trade-offs, if any at all, while offering a heap of advantages.
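As a rough illustration of swapping protobuf for JSON on a synchronous HTTP call, here is a hedged sketch using the Python requests library. The endpoint URL is hypothetical, the Person message is the same made-up one as in the earlier sketch, and application/x-protobuf is a common but unofficial content type.

```python
import requests    # third-party HTTP client, used here only for illustration
import person_pb2  # same hypothetical protoc-generated module as in the earlier sketch

person = person_pb2.Person(name="Ada", id=42)
payload = person.SerializeToString()

resp = requests.post(
    "https://internal.example/api/people",   # hypothetical microservice endpoint
    data=payload,
    headers={"Content-Type": "application/x-protobuf"},
)

# If the service replies with protobuf as well, decode it the same way.
reply = person_pb2.Person()
reply.ParseFromString(resp.content)
```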

Related

Messaging library for jeroMQ

I have chosen jeroMQ for building an asynchronous message channel that publishes content from multiple clients. On the other end, server-side workers process the requests and notify a client only when the server decides to, based on the message received.
Digging deeper, I am looking for a messaging library to marshal/unmarshal the messages. I found the kvpmsg class, which does the job for simple key-value pairs.
I don't want to reinvent the wheel if a standard library exists that can handle bigger objects.
It seems like you are asking for data serialization libraries. Check Wikipedia for a list and a comparison of data serialization formats.
Also there is a relevant entry in ZeroMQ FAQ explaining why ZeroMQ doesn't include any serialization format:
Does ØMQ include APIs for serializing data to/from the wire representation?
No. This design decision adheres to the UNIX philosophy of "do one thing and do it well". In the case of ØMQ, that one thing is moving messages, not marshaling data to/from binary representations.
Some middleware products do provide their own serialization API. We believe that doing so leads to bloated wire-level specifications like CORBA (1055 pages). Instead, we've opted to use the simplest wire formats possible which ensure easy interoperability, efficiency and reduce the code (and bug) bloat.
If you wish to use a serialization library, there are plenty of them out there. See for example
Google Protocol Buffers
MessagePack
JSON-GLib
C++ BSON Library
Note that serialization implementations might not be as performant as you might expect. You may need to benchmark your workloads with several serialization formats and libraries in order to understand performance and which format/implementation is best for your use case (ease of development must also be considered).
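Since the question is about pairing ZeroMQ with a serialization library, here is a minimal sketch of the pattern using pyzmq and msgpack in Python. The original question uses jeroMQ (Java), so treat this purely as an illustration of the idea; the endpoint and message fields are made up.

```python
import zmq       # pyzmq
import msgpack   # msgpack-python

ctx = zmq.Context()

# Receiver (e.g. a server-side worker): bind, then pull and deserialize.
pull = ctx.socket(zmq.PULL)
pull.bind("tcp://127.0.0.1:5555")            # hypothetical endpoint

# Sender: serialize a key-value message and push it over a ZeroMQ socket.
push = ctx.socket(zmq.PUSH)
push.connect("tcp://127.0.0.1:5555")
message = {"type": "publish", "key": "score", "value": 1234}
push.send(msgpack.packb(message))

raw = pull.recv()                            # blocks until the message arrives
decoded = msgpack.unpackb(raw)               # back to a plain dict
print(decoded)
```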

What is the most efficient way to send OData payloads over the wire? "Dense JSON?"

I'm designing a distributed application that will consist of a variety of REST services. Lately I've been going back and forth about whether to implement my REST services using the ASP.NET MVC 4 Web API or OData. Web API seems like it will some day be what I need but right now it's only half baked. Specifically, it only has a partial implementation of OData-style URI querying and doesn't do hypermedia out-of-the-box.
So this forces me to take another long hard look at OData. I really like the URI querying capability and structural hypermedia for lazy loading; I think I will use these features a lot in my application. However, the Atom Pub specification appears to be grossly inefficient.
I recently read a post about an efficient format for OData which mentions "dense JSON" but such a thing does not appear to actually exist. Is this true? And even if there's no such thing as dense JSON, regular JSON is still much more efficient than Atom Pub, correct?
Is there any situation where I would want to use Atom Pub over JSON?
There should be very little difference between ATOM and JSON on the semantic level with OData. Also most OData servers (WCF Data Services for sure) support both, so it's a choice of the client which one to use. As the blog post from Pablo mentions, to get the best payload size you should enable HTTP compression. It works great on both ATOM and JSON.
Reading JSON tends to be faster (XML parsing is kind of expensive), but that's if you're concerned with CPU consumption on the client. If I remember correctly, last time I saw the numbers, the compressed payload size for ATOM and JSON is not that different.
Atom Pub is usually easier to consume in a client that has good XML or Atom libraries available but no good JSON support, and vice versa. Other than that, there should not be much of a difference.
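Since the format is the client's choice and HTTP compression does most of the work on payload size, here is a small hedged sketch of how a client might ask an OData service for JSON with gzip enabled. The service URL is hypothetical, and the requests library transparently decompresses gzip responses.

```python
import requests

# Hypothetical OData service; the $top query option limits the result set.
url = "https://services.example.org/odata/Products?$top=10"

resp = requests.get(
    url,
    headers={
        "Accept": "application/json",   # ask for JSON instead of Atom/XML
        "Accept-Encoding": "gzip",      # let the server compress the payload
    },
)
products = resp.json()                  # requests decompresses and parses for us
```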

Why Thrift, Why not HTTP RPC(JSON+gzip)

Thrift's primary goal is to enable efficient and reliable communication across programming languages. But I think HTTP-RPC can also do that; almost every web developer knows how to work with HTTP, and it is easier to implement HTTP-RPC (JSON) than Thrift.
Maybe Thrift-RPC is faster, so can anyone tell me the difference in performance between them?
A few reasons other than speed:
Thrift generates the client and server code completely, including the data structures you are passing, so you don't have to deal with anything other than writing the handlers and invoking the client. Everything, including parameters and return values, is automatically validated and parsed, so you get sanity checks on your data for free.
Thrift is more compact than HTTP, and can easily be extended to support things like encryption, compression, non-blocking IO, etc.
Thrift can be set up to use HTTP and JSON pretty easily if you want it (say if your client is somewhere on the internet and needs to pass firewalls)
Thrift supports persistent connections and avoids the continuous TCP and HTTP handshakes that HTTP incurs.
Personally, I use thrift for internal LAN RPC and HTTP when I need connections from outside.
I hope all this makes sense to you. You can read a presentation I gave about thrift here:
http://www.slideshare.net/dvirsky/introduction-to-thrift
It has links to a few other alternatives to thrift.
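To make the "generated client" point above concrete, here is a hedged sketch of what calling a Thrift service looks like from Python. It assumes a hypothetical calculator.thrift compiled with the Thrift compiler; the service name and method are made up, not something from the answer.

```python
# Hypothetical IDL (calculator.thrift), compiled with: thrift --gen py calculator.thrift
#
#   service Calculator {
#     i32 add(1: i32 a, 2: i32 b)
#   }

from thrift.transport import TSocket, TTransport
from thrift.protocol import TBinaryProtocol
from calculator import Calculator   # assumed generated module from gen-py/

# Persistent connection: open once, reuse for many calls (no per-call handshake).
socket = TSocket.TSocket("localhost", 9090)
transport = TTransport.TBufferedTransport(socket)
protocol = TBinaryProtocol.TBinaryProtocol(transport)
client = Calculator.Client(protocol)

transport.open()
result = client.add(1, 2)   # parameters and return value are typed and validated
transport.close()
```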
Here is good resource on performance comparison of different serializers: https://github.com/eishay/jvm-serializers/wiki/
Speaking specifically of Thrift vs JSON: Thrift performance is comparable to the best JSON libraries (jackson, protostuff), and serialized size is somewhat lower.
IMO, strongest thrift advantages are convenient interoperable RPC invocations and convenient handling of binary data.

Serialization format common to node.js and ActionScript?

Some of my friends are designing a game, and I am helping them out by implementing the game's backend server. The game is written in Flash, and I plan to develop the server in node.js because (a) it would be a cool project for learning node.js, and (b) it's fast, which is important for games.
The server's architecture is based on messages sent between the server and client (sort of like Minecraft's server protocol). The message format I have so far is a byte (the packet type), two bytes (the message length) and that many bytes (the message data, which is a mapping of key-value pairs). Problem is, I really don't want to develop my own serialization format (because while I probably could, implementing it would be a pain compared to using an existing solution).
Unfortunately, I am having problems finding a good candidate for the message data serialization format.
ActionScript's own remoting format might work, but I don't like it much.
JSON has support in node.js (obviously) and in ActionScript, but it's also textual and I would prefer binary for enhanced speed.
MessagePack looked like a good candidate, but I can't find an ActionScript implementation. (There's one called as3-msgpack on Google Code, but I get weird errors and can't access it.)
BSON has an ActionScript implementation, but no node.js support besides their MongoDB library (and I'm planning on using Redis).
So, can anyone offer any other serialization formats that I might have missed? Or should I just stick with one of these (or roll my own)?
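For the framing described above (one type byte, two length bytes, then the message body), here is a hedged sketch of packing and unpacking such a frame. It uses Python's struct module purely to illustrate the byte layout, even though the question targets node.js and ActionScript, and it uses JSON for the body only as a placeholder.

```python
import json
import struct

PACKET_CHAT = 0x01   # hypothetical packet type

def pack_frame(packet_type: int, body: dict) -> bytes:
    """1 byte type + 2 bytes big-endian length + that many bytes of body."""
    payload = json.dumps(body).encode("utf-8")
    return struct.pack("!BH", packet_type, len(payload)) + payload

def unpack_frame(frame: bytes) -> tuple[int, dict]:
    packet_type, length = struct.unpack("!BH", frame[:3])
    payload = frame[3:3 + length]
    return packet_type, json.loads(payload)

frame = pack_frame(PACKET_CHAT, {"player": "p1", "msg": "hello"})
ptype, body = unpack_frame(frame)
```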
Isn't that why HTTP supports gzipped content? Just use JSON and gzip the content when you send it. The time spent gzipping is more than recovered by the reduced latency of the transmission.
Check this article for more on gzip with Actionscript. On node.js I think that gzip-compress is fairly popular.
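As a rough illustration of the gzip suggestion, here is what compressing a JSON payload looks like in Python; the payload contents are made up, and the same idea applies with the zlib/gzip facilities on the node.js side.

```python
import gzip
import json

# Made-up game configuration blob of the kind a client might download once.
config = {"strings": {f"item_{i}": f"Item number {i}" for i in range(500)},
          "settings": {"difficulty": "normal", "music": True}}

raw = json.dumps(config).encode("utf-8")
compressed = gzip.compress(raw)

print(len(raw), len(compressed))   # repetitive JSON text usually compresses very well

# Round-trip.
restored = json.loads(gzip.decompress(compressed))
assert restored == config
```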
Actually, if I were in your shoes I would implement two methods and time them. Use JSON because it is common and easy to do, but then implement AMQP as well and compare them. If you want to massively scale this, then you might find that AMQP makes it easier. Also, message queuing is just such a nice fit with the node.js world view.
There is AMQP for ActionScript, and someone doing something similar on node.js.
Leverage JSAMF in Node.js for AMF communications with Flash.
http://www.jamesward.com/2010/07/07/amf-js-a-pure-javascript-amf-implementation/
If you wanted to, you could create your entire API in client-side JavaScript and use JSON as the data exchange format, then have ActionScript call it via ExternalInterface, which would make for an elegant solution.
It is worth noting that Flash Player has built in support for decompressing gzip compressed data. It may be worth compressing some of your JSON objects, things like localised string tables, game configuration data, etc which can grow to be a few hundred kb but are only loaded once on game load.
I'm working on a version of MessagePack for AS3.
The current version does the basics (encoding/decoding). Streams are planned for the future.
Check the project page: https://github.com/loteixeira/as3-msgpack

Cross-platform and language (de)serialization

I'm looking for a way to serialize a bunch of C++ structs in the most convenient way so that the serialization is portable across C++ and Java (at a minimum) and across 32bit/64bit, big/little endian platforms. The structures to be serialized just contain data, i.e. they're pure data objects with no state or behavior.
The idea being that we serialize the structs into an octet blob that we can store in a database "generically" and be read out later on. Thus avoiding changing the database whenever a struct changes and also avoiding assigning each data member to a field - i.e. we only want one table to hold everything "generically" as a binary blob. This should make less work for developers and require less changes when structures change.
I've looked at boost.serialize but don't think there's a way to enable compatibility with Java. And likewise for inheriting Serializable in Java.
If there is a way to do it by starting with an IDL file that would be best as we already have IDL files that describe the structures.
Cheers in advance!
I stumbled here, having a very similar question. 6 years later, this might not be useful to you, but hopefully it will be to others.
There are a lot of alternatives, unfortunately with no clear winner (although one could argue that JSON is the clear winner). Even Google has released multiple competing technologies (all of them apparently being used internally):
FlatBuffers: this one seems to meet the requirements from the original question, has interesting benchmarks and supports some form of IDL (I'm personally not familiar with IDL)
Protocol Buffers: mentioned previously.
XFJSON: 5%-12% smaller than JSON.
Not to forget the alternatives posted in the other answers. Here are a few more:
YAML: JSON minus all the double quotes, but using indentation instead. It's more human readable, but probably less efficient, especially as it gets larger.
BSON (Binary JSON)
MessagePack (Another compacted JSON)
With so many variations, JSON is clearly the winner in terms of simplicity/convenience and cross-platform access. It has gained even more popularity in the last couple years, with the rise of JavaScript. A lot of people probably use that as a de-facto solution, without giving it much thought (that's what I originally did :P).
However, if size becomes an issue, but you prefer to keep things simple and not use one of the more advanced libraries, you could just compress JSON using zlib (that's what I'm doing now), or some other cross-platform algorithm (but that's a whole other topic).
To speed up JSON handling in C++, you could also use RapidJSON.
I'm surprised Jon Skeet hasn't already pounced on this one :-)
Protocol Buffers is pretty much designed for this sort of scenario -- passing structured data cross-language.
That said, if you're using a database the way you suggest, you really shouldn't be using a full-strength RDBMS like Oracle or SQL Server but rather a lightweight key-value store such as Berkeley DB or one of the many "cloud table" engines.
If I want to go really, really cross-language, I would normally suggest JSON, given the ease of JavaScript support, the abundance of libraries, and the fact that it is human readable and modifiable (I prefer it to XML because I find it smaller in characters, faster, and more readable). It's not the most space-efficient, however, and a more machine-oriented format like Protocol Buffers or Thrift would have advantages there (Thrift can be generated from an IDL, but it is also designed for encoding services, so it could be heavier than you want).
You need ASN.1! (Some people refer to this as binary XML.) ASN.1 is very compact and thus ideal to transfer data between two systems. And for those who don't think this is ever used: several Internet protocols are based upon the ASN.1 model for data serialization!
Unfortunately, there aren't many libraries available for Java or C++ that will support ASN.1. I had to work with it several years ago and just couldn't find a good, free or inexpensive tool to allow support for ASN.1 in C++. At Objective Systems they are selling ASN.1/XML solutions but it's extremely expensive. (The ASN.1 compiler for C++ and Java, that is!) It costs you an arm and a leg at least! (But then you will have a tool that you can use with only one hand...)
I'd suggest saving the data with SQLite database. The structs can be stored as database rows in SQLite tables.
The resulting database file is binary compatible across many different platforms and can be stored as a BLOB in your main database. I believe the file size is comparable to a compressed XML file with the same data, but memory usage during processing will be significantly less than with an XML DOM.
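A hedged sketch of that idea in Python: store each struct as a row in a small, self-contained SQLite file, which can then travel or be stored as a single blob. The table layout and the sample "struct" are illustrative only.

```python
import sqlite3

# Build a small, self-contained SQLite file; its on-disk format is the same on
# 32/64-bit and big/little-endian platforms, so the file itself is portable.
conn = sqlite3.connect("structs.db")
conn.execute(
    "CREATE TABLE IF NOT EXISTS sensor_reading (id INTEGER, temperature REAL, ts INTEGER)"
)

# Hypothetical stand-in for one of the C++ structs.
conn.execute("INSERT INTO sensor_reading VALUES (?, ?, ?)", (7, 21.5, 1700000000))
conn.commit()

# Reading it back; from Java the same file can be opened with a JDBC SQLite driver.
for row in conn.execute("SELECT id, temperature, ts FROM sensor_reading"):
    print(row)
conn.close()
```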
Why haven't you chosen XML? It perfectly suits your requirements, and both C++ and Java allow for an easy implementation.
Furthermore, I doubt your idea of storing everything as a blob in the database. Use a relational database for what it was designed for, or switch to an object-oriented database such as http://www.versant.com/en_US/products/objectdatabase, which supports both Java and C++.
There is also Avro. See this question for a comparison of Apache Thrift, Protocol Buffers, and others.