Is there a standard mapping between JSON and Protocol Buffers?

From a comment on the announcement blog post:
Regarding JSON: JSON is structured similarly to Protocol Buffers, but protocol buffer binary format is still smaller and faster to encode. JSON makes a great text encoding for protocol buffers, though -- it's trivial to write an encoder/decoder that converts arbitrary protocol messages to and from JSON, using protobuf reflection. This is a good way to communicate with AJAX apps, since making the user download a full protobuf decoder when they visit your page might be too much.
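To make the "trivial via reflection" claim concrete, here is a minimal sketch of such an encoder in Java. It handles only scalar fields, does no string escaping, and the class name is made up for illustration:

import java.util.Map;
import com.google.protobuf.Descriptors.FieldDescriptor;
import com.google.protobuf.Message;

// Naive reflection-based JSON emitter: walks the set fields of any message.
// Scalar fields only; no nesting, no repeated fields, no escaping.
public final class NaiveJsonEncoder {
    public static String toJson(Message message) {
        StringBuilder sb = new StringBuilder("{");
        for (Map.Entry<FieldDescriptor, Object> entry : message.getAllFields().entrySet()) {
            if (sb.length() > 1) sb.append(',');
            sb.append('"').append(entry.getKey().getName()).append("\":");
            Object value = entry.getValue();
            sb.append(value instanceof Number || value instanceof Boolean
                    ? value.toString()
                    : "\"" + value + "\"");
        }
        return sb.append('}').toString();
    }
}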
It may be trivial to cook up a mapping, but is there a single "obvious" mapping between the two that any two separate dev teams would naturally settle on? If two products supported PB data and could interoperate because they shared the same .proto spec, I wonder if they would still be able to interoperate if they independently introduced a JSON reflection of the same spec. There might be some arbitrary decisions to be made, e.g. should enum values be represented by a string (to be human-readable a la typical JSON) or by their integer value?
So is there an established mapping, and any open source implementations for generating JSON encoder/decoders from .proto specs?

Maybe this is helpful: http://code.google.com/p/protobuf-java-format/
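From memory, usage looks roughly like this (a hedged sketch: the exact class and method names may differ between versions of that library, and Person stands in for any generated message class):

import com.googlecode.protobuf.format.JsonFormat;

// message -> JSON text
String json = JsonFormat.printToString(person);
// JSON text -> message
Person.Builder builder = Person.newBuilder();
JsonFormat.merge(json, builder);
Person roundTripped = builder.build();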

From what I have seen, Protostuff is the project to use for any PB work on Java, including serializing it as JSON based on the protocol definition. I have not used it myself, just heard good things.

Yes, since Protocol Buffers version 3.0.0 (released July 28, 2016) there is "A well-defined encoding in JSON as an alternative to binary proto encoding", as mentioned in the release notes:
https://github.com/google/protobuf/releases/tag/v3.0.0
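In Java, this canonical mapping is implemented by the JsonFormat class in the protobuf-java-util artifact. A minimal sketch (Person stands in for any proto3-generated message class):

import com.google.protobuf.util.JsonFormat;

// message -> canonical proto3 JSON
Person person = Person.newBuilder().setName("Bill Monroe").build();
String json = JsonFormat.printer().print(person);
// canonical proto3 JSON -> message
Person.Builder builder = Person.newBuilder();
JsonFormat.parser().merge(json, builder);
Person parsed = builder.build();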

I needed to marshal from GeneratedMessageLite to a JSON object but did not need to unmarshal. I couldn't use the protobuf library in Pangea's answer because it doesn't work with the LITE_RUNTIME option. I also didn't want to burden our already large legacy system with generating more compiled code for the existing protocol buffers. For marshalling to JSON, I went with this simple solution:
// build the message, then let Gson serialize it via its default field reflection
final Person gpb = Person.newBuilder().setName("Bill Monroe").build();
final Gson gson = new Gson();
final String jsonString = gson.toJson(gpb);

One further thought: if protobuf objects have getters/setters, or appropriately named fields, one could simply use the Jackson JSON processor's data binding. By default it handles public getters, any setters, and public fields, but these are just the default visibility levels and can be changed. With that in place, Jackson can serialize/deserialize protobuf-generated POJOs without problems; a sketch follows below.
I have actually used this approach with Thrift-generated objects; the only thing I had to configure there was to disable serialization of the various "isXXX()" methods that Thrift adds for checking whether a field has been explicitly assigned or not.
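For reference, plain Jackson data binding looks like this (a hedged sketch; the Person POJO here is hypothetical, standing in for a generated class with conventional getters/setters):

import com.fasterxml.jackson.databind.ObjectMapper;

public class JacksonDataBindExample {
    // stands in for a generated class exposing getName()/setName()
    public static class Person {
        private String name;
        public String getName() { return name; }
        public void setName(String name) { this.name = name; }
    }

    public static void main(String[] args) throws Exception {
        ObjectMapper mapper = new ObjectMapper();
        Person person = new Person();
        person.setName("Bill Monroe");
        String json = mapper.writeValueAsString(person);    // {"name":"Bill Monroe"}
        Person back = mapper.readValue(json, Person.class); // and back again
        System.out.println(json + " -> " + back.getName());
    }
}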

First of all, I think one should reason very carefully before putting effort into converting a dataset to protobufs. Here are my reasons to convert a dataset to protobufs:
Type safety: a guarantee on the format of the data being considered.
Uncompressed memory footprint of the data. I say uncompressed because after compression there isn't much difference between the size of compressed JSON and compressed proto, and compression itself has a cost. Serialization/deserialization speed is also similar; in fact, Jackson JSON is faster than protobufs. Please check out the following link for more information:
http://technicalrex.com/2014/06/23/performance-playground-jackson-vs-protocol-buffers/
The data needs to be transferred over the network a lot.
That said, once you convert your dataset to Jackson JSON format in the way the protobuf definition is laid out, it can very easily be mapped directly to the protobuf format using Protostuff's JsonIOUtil.mergeFrom function (a usage sketch follows below). Signature of the function:
public static <T> void mergeFrom(JsonParser parser, T message, Schema<T> schema, boolean numeric)
Reference to protostuff
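Usage looks roughly like this (a hedged sketch: I am using the byte[] convenience overload and protostuff's runtime schema support; package names vary between protostuff versions, and the Person POJO is hypothetical):

import io.protostuff.JsonIOUtil;
import io.protostuff.Schema;
import io.protostuff.runtime.RuntimeSchema;

public class ProtostuffJsonMergeExample {
    // plain POJO standing in for your message type
    public static class Person {
        public String name;
    }

    public static void main(String[] args) throws Exception {
        Schema<Person> schema = RuntimeSchema.getSchema(Person.class);
        Person person = schema.newMessage();
        byte[] json = "{\"name\":\"Bill Monroe\"}".getBytes("UTF-8");
        // 'false' = the JSON uses field names rather than field numbers
        JsonIOUtil.mergeFrom(json, person, schema, false);
        System.out.println(person.name); // Bill Monroe
    }
}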

Related

Best way to store golang map[string]interface into data store

Scenario: I have arbitrary JSON documents, ranging in size from 300 KB to 6.5 MB, read from MongoDB. Since the data is very much arbitrary/dynamic I cannot define a struct type in Go, so I am using the map[string]interface{} type, and the JSON string is parsed using encoding/json's Unmarshal method -- somewhat similar to what is mentioned in Generic JSON with interface{}.
Issue: The problem is that parsing the JSON string into map[string]interface{} takes a long time (around 30 ms to 180 ms), compared to PHP parsing JSON using json_encode/decode, igbinary, or msgpack.
Question: Is there any way to pre-process it and store it in a cache?
I mean: parse the string into map[string]interface{} once, serialize it, and store it in some cache; then, when we retrieve it, deserialization should not take much time before proceeding with execution.
Note: I am a newbie to Go; any suggestions are highly appreciated. Thanks.
Updates: I have already tried serialization with Gob, the built-in binary package, and a msgpack implementation for Go. No luck; no improvement in deserialization time.
The standard library package for JSON is notoriously slow. There is a good reason for that: it uses runtime reflection to provide a really flexible interface that is really simple. Hence the unmarshalling being slower than PHP's…
Fortunately, there is an alternative, which is to implement the json.Unmarshaler interface on the types you want to use. If this interface is implemented, the package will use it instead of its standard method, and you can see huge performance boosts.
And to help you, there is a small group of tools that appeared, among which:
https://godoc.org/github.com/benbjohnson/megajson
https://godoc.org/github.com/pquerna/ffjson
(I am listing the main players from memory; there must be others.)
These tools will generate tailored implementations of the json.Unmarshaler interface for the types you request. And with go:generate, you can even integrate them seamlessly into your build step.

JSON.stringify versus serialization

Is JSON.stringify() equivalent to serialization, effectively serialization, or just a necessary step towards serialization?
In other words: is JSON.stringify() sufficient but not necessary for serialization? Necessary but not sufficient? Or neither necessary nor sufficient for serializing JavaScript objects?
Serialization is the act of converting data into a format that can be written to disk or transmitted over the network (or written on paper if that's what you want). Usually serialization transforms objects to text, but that is not required: there are binary serialization formats as well, such as BitTorrent's bencoding and the old/ancient ASN.1 standards.
JSON is one form of text-based serialization format and is currently very popular due to its simplicity. It's not the only one, though. Other popular formats include XML and CSV.
Due to its popularity and its origin in JavaScript object literal syntax, ES5 introduced JSON.stringify() to generate a JSON string from an object. Previously you had to use libraries or write a recursive descent parser to do the job.
So, is JSON.stringify() enough for serialization? Yes, if the output format you want is JSON. No, if you want other output formats such as XML or CSV or bencode.
There are limitations to the JSON format. One limitation is that JSON cannot encode functions, so JSON.stringify() ignores functions/methods when serializing. JSON also can't encode circular references. Most other serialization formats have this limitation as well, but since JSON looks like JavaScript syntax some people assume it can do everything JavaScript object literals can. It can't.
So the relationship between "JSON" and "serialization" is like the relationship between "Toyota Prius" and "car". JSON.stringify() is simply a function that generates JSON strings so I guess that would make it a Toyota factory.
Old question, but the following information may be useful for posterity.
Of course, you can serialise any way you want, including any number of custom methods, but JSON has become an increasingly popular method.
The most obvious benefit of JSON is that it represents objects in the same way that JavaScript object literals do, though it is slightly less flexible. Nevertheless, if you can represent normal data in JavaScript then JSON is a good match.
The most significant feature is that, since it represents objects as well as arrays, it can represent fairly complex & hierarchical data.
For one reason or another, JSON has more-or-less supplanted XML as the preferred serialisation for sending data between the server and browser. It is so useful that many languages include their own JSON functions (PHP, for example, has the better-named json_encode & json_decode functions), as do some modern databases. I myself have found it convenient to use JSON functions to store a more complex data structure in a single field of a database, without JavaScript anywhere in sight.
The short answer is yes: for the most part it is a sufficient step to serializing most (non-binary) data. It is not, however, necessary, as there are alternatives.
Serializing binary data, on the other hand, now that’s another story …
Short answer... Serialize means the same thing as Stringify, IMHO.

Windows Store App: Performance of JSON parser

There are many possibilities for parsing JSON in the context of a Windows Store App, regardless of the language (C#, JavaScript or C++).
For example: .NET 4.5's JsonObject and DataContractJsonSerializer, the JavaScript JSON parser, or an external one like Json.NET.
Does anybody know something about that?
I only read good things about Json.NET's performance.
But is that true, and does it matter for JSON documents which include datasets of 100k JSON objects? Or won't the user notice a difference?
I only have experience using Json.NET ... it just works fast and great! I have also used the library in enterprise projects and was never disappointed!
If it helps, and FWIW, I have recently been collecting some new JSON parsing / deserialization performance data that can be observed over various JSON payload "shapes" (and sizes), using four JSON libraries, here:
https://github.com/ysharplanguage/FastJsonParser#Performances
(.NET's out-of-the-box JavaScriptSerializer vs. JSON.NET vs. ServiceStack vs. JsonParser)
Please note:
these figures are for the full .NET only (i.e., the desktop / server tier; not mobile devices)
I was interested in getting new benchmark figures about parsing / deserialization performance only (i.e., not serialization)
finally, I was also especially interested (although not exclusively) in figures re: strongly typed deserialization use cases (i.e., deserializing into POCOs)
'Hope this helps,

Looking for a Standard Data Interchange Format

We are going to be moving some data around that will have some standard fields as well as some key-value pairs which will vary between data items. Obviously we could encode this in JSON or XML and write our own marshalling/unmarshalling code; however, I was hoping for a standards-based solution that has some or all of the following:
Marshalling/unmarshalling for SharePoint lists/.Net
Marshalling/unmarshalling for Java
Service definitions and semantics for operating on the data across an integration boundary
Security semantics
We are currently looking at the OData protocol to perform this task: http://www.odata.org/
Presumably you've made your decision long ago now, but for anyone else who ends up here, and is perhaps interested in something at a lower level than OData, here's what I'm using for C# to Java data interchange:
Google Protocol buffers as the interchange format:
https://developers.google.com/protocol-buffers/
Marc Gravell's protobuf-net at the C# end:
http://code.google.com/p/protobuf-net/
A library called protostuff at the Java end:
http://code.google.com/p/protostuff/
(I prefer protostuff to the official Google Java implementation of protocol buffers because Google's implementation is built around immutable Java objects.)
Actually, I'm not using pure protocol buffers as the interchange format - I prefix the data with the name of the (outermost) class being transmitted. This makes the data self-identifying for deserializing at the other end.
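The framing itself is trivial. A hedged Java sketch of what I mean (the exact wire layout here -- a length-prefixed UTF-8 name followed by the message bytes -- is my own choice, not a standard):

import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import com.google.protobuf.Message;

public final class SelfIdentifyingFrame {
    // Prefix the payload with the outermost message's name so the receiver
    // knows which parser to dispatch to before reading the protobuf bytes.
    public static byte[] frame(Message message) throws Exception {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        out.writeUTF(message.getDescriptorForType().getName()); // e.g. "Person"
        message.writeTo(out);
        out.flush();
        return bytes.toByteArray();
    }
}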

IDL for JSON REST/RPC interface

We are designing a fairly complex REST API, in which most of the I/O are JSON encoded objects with a specific structure. One challenge we have found is to document the API in such a way that makes it easier for clients to post correct input and process output. Because the data of both the input and output requires fairly complex JSON objects, client developers often introduce bugs related to the structure of the I/O objects.
With all of the JSON web APIs these days, I would have hoped for a general solution, but I am having a hard time finding one. I looked into JSON Schema, which is a JSON validation schema, but both the IETF draft and the implementations seem fairly immature (even though they have been around for a while, which is not a good sign).
A slightly different approach is offered by Protocol Buffers and Apache Avro, where the schema is not used for validation but is actually required for encoding/decoding the message. Of these two, Avro seems to have rather limited documentation and implementations. Protobuf seems better, but I am not sure whether it is really suitable for calling a JSON API from the browser.
Now I am starting to doubt if I am looking at this from the right angle. Are there other methods available to make my API a bit more strong-typed'ish? Or is a formal description of a JSON REST/RPC API something that defeats the purpose of using JSON?
Edit: six months after this topic, we found mongoose, which is very close to what we were looking for.
Below is a reply I received by email from Douglas Crockford:
I am not a believer in schemas as an alternative to input validation. There are properties that cannot be verified from the syntax. I think that was one of the ways that XML went wrong.
If your formats are too complex, then I would look at simplifying them.
Such systems exist and I'm the author of one of them. It is called Piqi-RPC and it does IDL-based validation of the input and output parameters for RPC-style APIs over HTTP.
It supports JSON, XML and Google Protocol Buffers as data representation formats for input and output of HTTP POST requests. Clients can choose to use any of the three formats and specify their choice using the standard Accept and Content-Type HTTP headers.
So, yes, in theory, you are looking in the right direction. However, at the moment, Piqi-RPC supports writing servers only in Erlang and it wouldn't be very useful for you if you use a different stack. I heard that Apache Thrift also supports JSON over HTTP transport, but I haven't checked. Another kind of similar system I know of (also for Erlang) is called UBF. I have heard of libraries for Java that can parse and validate JSON based on Protocol Buffers specification (e.g. http://code.google.com/p/protostuff/).
The idea itself is far from being new, but there aren't many systems that approach it in practice. It is a challenging problem.
Historically, IDLs were used for interface definition and binary data serialization and not so much for validating dynamic data interchange formats (e.g. XML and JSON) which emerged later. Sun-RPC IDL and CORBA IDL fall in the first category. WSDL would be one of few examples covering both areas, but it is a terrible piece of technology and it would be a bad choice for most modern systems. In addition, there are many schema languages (also known as DDLs -- data definition languages), most of which are highly specialized and work with only one representation format, e.g. XML or JSON schemas. Few of those have stable implementations.
The Piqi project and Piqi-RPC, which is based on it, are built around several fairly simple realizations:
A DDL doesn't have to be explicitly tied to any particular data representation format or built around it. Instead, such a language can be fairly universal and cover a wide range of practical use cases (e.g. cross-language data serialization and data validation) and data formats (e.g. JSON, XML, Protocol Buffers).
IDL for RPC-style communication can be implemented as a thin, mostly syntactic layer on top of the universal DDL.
Such IDL and interface specifications can be transport agnostic.
Speaking of REST-style APIs over HTTP compared to RPC-style APIs over HTTP.
With RPC-style APIs, a service developer or an automated system has to validate three things: the function name (according to some service naming scheme), the input and, if you choose so, the output.
In the case of REST-style APIs, people get themselves in trouble for no good reason. Now they have a lot more stuff to validate: arbitrarily complex URL syntax, including dynamic parameters encoded in URL segments (for all HTTP methods) and in the URL query string (only for the HTTP GET method); HTTP method correspondence (whether it should be GET, POST, PUT, DELETE, etc.); the HTTP body when some parameters go there (sometimes they do it manually twice, for parameters represented in JSON and XML); custom HTTP headers; and, separately, service documentation. Imagine an IDL supporting all that!
XML is better for RESTful services in many ways. It has native linking (<link href=, for all those HATEOAS fans), native language support (lang="en") and a great ecosystem.
It is also better for future proofing and future API refactorings. Converting this:
<profile>
<username>alganet</username>
</profile>
To support more usernames:
<profile>
<username>alganet</username>
<username>alexandre</username>
</profile>
Is much simpler to do in XML without breaking existing clients. JSON is hard on that.
If you really need JSON, JSON Schema is the way to go. It's immature, but I don't know anything better for that case. Maybe your consumers could choose between XML and JSON, so they can choose between a small payload (JSON) and RESTful candies (XML) using content negotiation.
I'd say the answer to your last question is yes. If you need a way to constrain and document the JSON "schema", why didn't you go with XML in the first place? It is not that much harder to parse, and being able to enforce a schema for it is a great advantage.