Relations and differences between marshal/unmarshal, encoding/decoding, and serialization/deserialization for JSON? - json

In Go's JSON package, I saw there are marshal, decode and other functions.
I thought that decode was the opposite of marshal, but later realized that I might be wrong.
I think the fundamental question that I have is:
What are the relations and differences between marshal/unmarshal, encoding/decoding, and serialization/deserialization for JSON?
Thanks.
See an example here: Why are json package's Decode and Marshal methods used here?

I would personally say all of those terms are synonyms, though less so with encoding/decoding. In Go, Marshal and Unmarshal happen to be the terms used to describe converting JSON in string form to an object and vice versa. In C#, however, these same operations are called serialize and deserialize; as far as I know, that terminology isn't used in Go at all (at least not in the standard library).
Encoding can be used to describe the format in which some data is stored; the most common use is probably character encoding (UTF-8). In Go it's also used as a noun, in the names of the objects (Encoder, Decoder) that can marshal/unmarshal JSON. Marshal/unmarshal are always used as verbs: you take that action on the JSON.
Encoding is also used in Go to refer to a larger category of packages that deal with the conversion from one encoding to another.
If you told me you were marshalling, unmarshalling, deserializing, or serializing some object or JSON, I would understand exactly what you meant. If you said you were JSON-encoding an object, I would ask a clarifying question. If you said "the response is JSON encoded", I would get what you mean, though I would find it odd that you used those words rather than just saying "the response is JSON". Hope that is more or less the information you're looking for.
Oh also, just for more clarity
Unmarshal == deserialize == decode
Marshal == serialize == encode
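To make the Go side of that concrete, here is a minimal sketch of the marshal/unmarshal (serialize/deserialize, encode/decode) pair; the Person type and its fields are just illustrative, not anything from the question:

package main

import (
	"encoding/json"
	"fmt"
)

// Person is a throwaway type used only for this illustration.
type Person struct {
	Name string `json:"name"`
	Age  int    `json:"age"`
}

func main() {
	// Marshal (serialize/encode): Go value -> JSON bytes.
	b, err := json.Marshal(Person{Name: "Ada", Age: 36})
	if err != nil {
		panic(err)
	}
	fmt.Println(string(b)) // {"name":"Ada","age":36}

	// Unmarshal (deserialize/decode): JSON bytes -> Go value.
	var p Person
	if err := json.Unmarshal(b, &p); err != nil {
		panic(err)
	}
	fmt.Println(p.Name, p.Age) // Ada 36
}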

In the encoding/json package, the Marshal function and its inverse, the Unmarshal function, return and operate on single, fixed byte slices. They transform single objects to bytes and vice versa.
There are also the Encoder and Decoder types. These provide the Encode and Decode methods and operate on streams of bytes, wrapping an io.Writer and an io.Reader respectively. They also allow multiple objects to be serialized or deserialized over those streams, delimited by newlines.
The underlying mechanisms of the Marshal/Unmarshal functions and the Encoder/Decoder types are identical: both go through the same internal encodeState.marshal and decodeState.unmarshal code paths. The only real difference is that they provide alternative entry points for different usage patterns.
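A small sketch of the stream-oriented side, using a bytes.Buffer as the stream; the Item type is just illustrative:

package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
)

// Item is a throwaway type used only for this illustration.
type Item struct {
	ID int `json:"id"`
}

func main() {
	var buf bytes.Buffer

	// Encoder wraps an io.Writer and writes one newline-delimited JSON value per Encode call.
	enc := json.NewEncoder(&buf)
	for _, it := range []Item{{ID: 1}, {ID: 2}} {
		if err := enc.Encode(it); err != nil {
			panic(err)
		}
	}

	// Decoder wraps an io.Reader and reads the values back, one per Decode call.
	dec := json.NewDecoder(&buf)
	for {
		var it Item
		err := dec.Decode(&it)
		if err == io.EOF {
			break // stream exhausted
		}
		if err != nil {
			panic(err)
		}
		fmt.Println(it.ID)
	}
}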

Related

What is the cleanest way to perform nested conversion of numpy types to python (JSON friendly) types?

What is the cleanest way to perform nested conversion of a "deep" object that contains mixed python / numpy types to an object containing only python types?
The question is motivated by the need to send the data as JSON, but here I do not have control over json.dumps() because that is the province of a different application. In other words, I cannot specify the JSON encoder.
One possible solution involves adopting the JSON-encoder solution anyway, followed by a conversion back from JSON with json.loads(). This would mean every message makes two round trips through JSON rather than one, which might not be the end of the world. But is there a "better" alternative?
Note that I need to apply this recursively, so the fact that tolist() or item() sometimes works isn't a complete solution here.

Unified data model for JSON, BSON and YAML

Originally, JSON borrowed its syntax from JavaScript (object literals), but then became a programming language agnostic data interchange format. Its structures (string, array, object) can be mapped directly to primitive data types in most dynamic programming languages and vice versa.
Now, since it is no longer tied to JavaScript, what is the abstract data model of JSON today? In other words, if we compare XML with JSON, is there an XML Infoset equivalent for JSON?
Obviously, JSON is not the only format that can be used for serialization of JSON-like documents. Alternatives include YAML, BSON, and even XML. Is there a name for that unified data model and perhaps a formal specification available?
XML is more complicated than the JSON format. Some common features that XML has and JSON lacks are: namespaces, attributes, and comments. However, both formats can represent any kind of data, though potentially with different structural logic.
What's the abstract data model of JSON? The same as it was when it was created; nothing has changed. JSON served as a data format for server-client communication. It was never tied to JavaScript, since it is just a formatted data string and not some kind of binary executable. Its format originates from JavaScript, yes, but any language can interpret it with a text parser.
I am not sure what kind of information you are looking for, but the process that converts language-specific structured data into strings and vice versa is called serialization/unserialization, though you already know those terms ...
"Unified data model", "formal specification", what are you even looking for ? Are you looking for principles of data formatting ? Data storing ? People need to store/transmit/present their data and they come up with ways to do it, there is nothing more to it.

JSON.stringify versus serialization

Is JSON.stringify() equivalent to serialization, or effectively serialization, or is it just a necessary step towards serialization?
In other words, is JSON.stringify() sufficient but not necessary for serialization? Or is it necessary but not sufficient? Or is it neither necessary nor sufficient for serialization of JavaScript objects?
Serialization is the act of converting data into a format that can be written to disk or transmitted over the network (or written on paper, if that's what you want). Usually serialization transforms objects to text, but that isn't required; there are binary serialization formats as well, such as BitTorrent's bencoding and the old/ancient ASN.1 standards.
JSON is one form of text-based serialization format and is currently very popular due to its simplicity. It's not the only one, though. Other popular formats include XML and CSV.
Due to its popularity and its origin as JavaScript object literal syntax, ES5 introduced JSON.stringify() to generate a JSON string from an object. Previously you had to use libraries or write your own recursive serializer to do the job.
So, is JSON.stringify() enough for serialization? Yes, if the output format you want is JSON. No, if you want other output formats such as XML or CSV or bencode.
There are limitations to the JSON format. One limitation is that JSON cannot encode functions, so JSON.stringify() ignores functions/methods when serializing. JSON also can't encode circular references. Most other serialization formats have this limitation as well, but since JSON looks like JavaScript syntax, some people assume it can do everything JavaScript object literals can. It can't.
So the relationship between "JSON" and "serialization" is like the relationship between "Toyota Prius" and "car". JSON.stringify() is simply a function that generates JSON strings so I guess that would make it a Toyota factory.
Old question, but the following information may be useful for posterity.
Of course, you can serialise any way you want, including any number of custom methods, but JSON has become an increasingly popular method.
The most obvious benefit of JSON is that it represents objects in the same way that JavaScript object literals do, though it is slightly less flexible. Nevertheless, if you can represent normal data in JavaScript then JSON is a good match.
The most significant feature is that, since it represents objects as well as arrays, it can represent fairly complex & hierarchical data.
For one reason or another, JSON has more or less supplanted XML as the preferred serialisation for sending data between the server and browser. It is so useful that many languages include their own JSON functions (PHP, for example, has the better-named json_encode & json_decode functions), as do some modern databases. I myself have found it convenient to use JSON functions to store a more complex data structure in a single field of a database, without JavaScript anywhere in sight.
The short answer is yes: for the most part it is a sufficient step for serializing most (non-binary) data. It is not, however, necessary, as there are alternatives.
Serializing binary data, on the other hand, now that’s another story …
Short answer... Serialize means the same thing as Stringify, IMHO.

Serialize and unserialize

What does serialize do?
Why do we need to serialize an Object and again unserialize it?
Is it for any sort of security measures?
Serialization is the process of turning an object or an object graph into a form that is independent from the specifics of the current execution environment.
Deserialization is the reverse of serialization. It is the process of reading the data written during serialization and restoring the object or object graph in the current execution environment.
Serialization is similar to data marshalling, as both describe writing out an object as execution-independent data. However, serialization is typically tailored to a specific language/platform, often featuring idioms of the host language, while data marshalling aims to be language-neutral, providing a level of interoperability.
Serialization formats may be opaque or transparent. For example, Java serialization is opaque - the data is not used for purposes other than for deserialization. Java also offers an XMLEncoder/XMLDecoder that writes objects as XML in terms of their public properties. That format is transparent and can be processed/manipulated easily.
Serialization itself is not a security measure. In fact it can be a vulnerability when dealing with secured data. Users of serialization should ensure that the serialized data is guarded by at least the same level of security as the original object instance. Failure to do so is opening up the data to unauthorized use.
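As a Go illustration of the transparent-versus-language-specific distinction (not part of the original answer, just an analogy): encoding/json produces a transparent, language-neutral representation, while encoding/gob is a Go-specific binary format roughly analogous to Java's built-in serialization. The Point type below is made up for the example.

package main

import (
	"bytes"
	"encoding/gob"
	"encoding/json"
	"fmt"
	"log"
)

// Point is a throwaway type used only for this illustration.
type Point struct {
	X, Y int
}

func main() {
	p := Point{X: 1, Y: 2}

	// Transparent and language-neutral: any platform can parse this JSON.
	j, err := json.Marshal(p)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(string(j)) // {"X":1,"Y":2}

	// Go-specific binary format: practical only when both ends are Go programs,
	// much like Java serialization is tied to Java.
	var buf bytes.Buffer
	if err := gob.NewEncoder(&buf).Encode(p); err != nil {
		log.Fatal(err)
	}
	fmt.Printf("%d bytes of gob-encoded data\n", buf.Len())
}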
Serialization is the process of converting objects into strings, which can then be unserialized back into the same objects that they originally were.
One reason for serializing an object would be to store the serialized object (a string) in a database; you can then re-create the object later by retrieving the string and passing it to unserialize().
Objects cannot be passed around as objects. We serialize them to text, pass them around, and then unserialize them so that they can be used at more than one place or time.
It's for storing objects in files, databases, or anything else that can store strings, or for passing them to another application/server/whatever.
serialize() gives a string representation of an object, while unserialize() rebuilds the object from a serialized string. Remember that the object's class definition must still be present to rebuild it.
The PHP manual pretty much explains that, too...

Is there a standard mapping between JSON and Protocol Buffers?

From a comment on the announcement blog post:
Regarding JSON: JSON is structured similarly to Protocol Buffers, but protocol buffer binary format is still smaller and faster to encode. JSON makes a great text encoding for protocol buffers, though -- it's trivial to write an encoder/decoder that converts arbitrary protocol messages to and from JSON, using protobuf reflection. This is a good way to communicate with AJAX apps, since making the user download a full protobuf decoder when they visit your page might be too much.
It may be trivial to cook up a mapping, but is there a single "obvious" mapping between the two that any two separate dev teams would naturally settle on? If two products supported PB data and could interoperate because they shared the same .proto spec, I wonder if they would still be able to interoperate if they independently introduced a JSON reflection of the same spec. There might be some arbitrary decisions to be made, e.g. should enum values be represented by a string (to be human-readable a la typical JSON) or by their integer value?
So is there an established mapping, and any open source implementations for generating JSON encoder/decoders from .proto specs?
Maybe this is helpful: http://code.google.com/p/protobuf-java-format/
From what I have seen, Protostuff is the project to use for any PB work on Java, including serializing it as JSON, based on protocol definition. I have not used it myself, just heard good things.
Yes, since Protocol Buffers version 3.0.0 (released July 28, 2016) there is "a well-defined encoding in JSON as an alternative to binary proto encoding", as mentioned in the release notes: https://github.com/google/protobuf/releases/tag/v3.0.0
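As a hedged sketch of what that canonical mapping looks like from Go, using the protojson package; the personpb import path and the Person message are hypothetical stand-ins for whatever your .proto actually generates:

package main

import (
	"fmt"
	"log"

	"google.golang.org/protobuf/encoding/protojson"

	// Hypothetical package generated from your .proto by protoc-gen-go.
	pb "example.com/demo/gen/personpb"
)

func main() {
	msg := &pb.Person{Name: "Bill Monroe"}

	// Marshal the message using the well-defined proto3 JSON mapping.
	data, err := protojson.Marshal(msg)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(string(data)) // e.g. {"name":"Bill Monroe"}

	// The mapping is bidirectional: Unmarshal parses canonical JSON back into the message.
	var round pb.Person
	if err := protojson.Unmarshal(data, &round); err != nil {
		log.Fatal(err)
	}
}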
I needed to marshal from GeneratedMessageLite to a JSON object but did not need to unmarshal. I couldn't use the protobuf library in Pangea's answer because it doesn't work with the LITE_RUNTIME option. I also didn't want to burden our already large legacy system with generating more compiled code for the existing protocol buffers. For marshalling to JSON, I went with this simple solution:
// Build the protobuf message, then let Gson reflect over it to produce a JSON string.
final Person gpb = Person.newBuilder().setName("Bill Monroe").build();
final Gson gson = new Gson();
final String jsonString = gson.toJson(gpb);
One further thought: if protobuf objects have getters/setters, or appropriately named fields, one could simply use Jackson JSON processor's data binding. By default it handles public getters, any setters and public fields, but these are just default visibility levels and can be changed. If so, Jackson can serialize/deserialize protobuf generated POJOs without problems.
I have actually used this approach with Thrift-generated objects; the only thing I had to configure there was to disable serialization of various "isXXX()" methods that Thrift adds for checking if a field has been explicitly assigned or not.
First of all, I think one should reason very carefully before putting effort into converting a data set to protobufs. Here are my reasons to convert a data set to protobufs:
Type safety: a guarantee on the format of the data being considered.
Uncompressed memory footprint of the data. I mention uncompressed because, after compression, there isn't much of a difference between compressed JSON and compressed proto, but compression has a cost associated with it. Also, serialization/deserialization speed is almost the same; in fact, Jackson JSON is faster than protobufs. Please check out the following link for more information:
http://technicalrex.com/2014/06/23/performance-playground-jackson-vs-protocol-buffers/
The protobufs need to be transferred over the network a lot.
That said, once you convert your data set to Jackson JSON in the shape that the protobuf definition describes, it can very easily be mapped directly to protobuf format using Protostuff's JsonIoUtil.mergeFrom function. Signature of the function:
public static <T> void mergeFrom(JsonParser parser, T message, Schema<T> schema, boolean numeric)
Reference to protostuff