Is there a standard to specify a binary format in JSON?

I would like to know whether there is some standard that specifies binary formats using JSON as the describing language, similar to Google's Protocol Buffers.
Protocol buffers seem very powerful but they require parsing of yet another language and considerable overhead, especially for compiled languages such as C++.
So I am wondering whether there is some accepted standard that uses JSON to describe a binary format. (Parsing the binary data might then still require some manual steps, but at least a clear and unique description of the data can be made available.)
To be clear, I am not talking about encoding binary data in JSON, I am talking about describing binary data in JSON.

Head to the Wikipedia listing of serialization formats and evaluate for yourself. I don't know what the right argument is to overcome your programmer's inertia. I'd consider Apache Avro the best fit for your requirement: its schemas are written in JSON.
For the least friction, you could try MessagePack or BSON, which are essentially JSON, just packed more tightly. But because they have no external declaration, they need to be self-describing and so must transport the field names on the wire, which makes them less "binary" and compact than Protocol Buffers or Avro.
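To make the idea in the question concrete, here is a minimal sketch of describing a fixed-layout binary record in JSON and using that description to encode and decode it. The description layout (`byte_order`, `fields`, the type names) is invented for illustration and is not Avro or any other standard.

```python
import json
import struct

# Hypothetical JSON description of a fixed-layout record (not a standard format).
DESCRIPTION = json.loads("""
{
  "byte_order": "little",
  "fields": [
    {"name": "id",    "type": "uint32"},
    {"name": "flags", "type": "uint8"},
    {"name": "value", "type": "float64"}
  ]
}
""")

# Mapping from the illustrative type names to struct format characters.
TYPE_CODES = {
    "uint8": "B", "uint16": "H", "uint32": "I",
    "int32": "i", "float32": "f", "float64": "d",
}


def build_struct(description):
    """Translate the JSON description into a struct.Struct without padding."""
    prefix = "<" if description["byte_order"] == "little" else ">"
    codes = "".join(TYPE_CODES[f["type"]] for f in description["fields"])
    return struct.Struct(prefix + codes)


def encode(description, record):
    """Pack a dict into bytes, in the field order given by the description."""
    layout = build_struct(description)
    return layout.pack(*(record[f["name"]] for f in description["fields"]))


def decode(description, payload):
    """Unpack bytes into a dict keyed by the described field names."""
    layout = build_struct(description)
    names = [f["name"] for f in description["fields"]]
    return dict(zip(names, layout.unpack(payload)))


if __name__ == "__main__":
    payload = encode(DESCRIPTION, {"id": 7, "flags": 1, "value": 2.5})
    print(payload.hex(), decode(DESCRIPTION, payload))
```

A real schema language also has to cover variable-length fields, optional fields and versioning, which is where formats such as Avro earn their keep.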

Related

Protocol Buffers vs XML/JSON for data entry outside of programming effort

I would love to use protocol buffers, but I am not sure if they fit my use case. Here it is:
I have a Quiz app. This requires a bunch of data, like categories, questions, a list of answers (and which ones are correct). I do not want to be responsible for entering this data - I would prefer to pass it off to a non-programmer to serialize all this data for me, in either XML or JSON. Then my app would just read in the data file.
Does Google's Protocol Buffers fit my use case? Or should I stick to a more traditional format like XML or JSON?
I think not: Protobuf is a binary format. So then you would need to support a text format like XML or JSON and Protobuf.
Also, it does not seem you would benefit from Protobuf's better performance at all.

How to generate automatically asn.1 encoded packets?

I want to test my application and I need to generate different load. The application is a SUPL RRLP protocol parser, and I have the ASN.1 specification for this protocol. Packets have a lot of optional fields and the number of variants may be over a billion, so I can't go through all the options manually. I want to automate it.
The first way is to generate packets automatically; the other is to create a lot of different value assignment sets and encode each into binary format.
I found some tools, for example libtasn and Asn1Editor, but the first one can't parse existing ASN.1 spec file; the second one can't encode packets by specification.
I'm reluctant to write yet another ASN.1 parser myself, because I could introduce errors into the test process.
I hoped it would be easy to find something existing, but... I give up.
Maybe someone on Stack Overflow has faced the same problem and found a solution, or has something to recommend? Thanks in advance.
Please try going to https://asn1.io/asn1playground/ and try your specification there. You can ask it to generate a sample value for a given ASN.1 type. You can encode it and edit either the encoded (hex) data, or decoded values to create additional values.
You can also download a free trial of the OSS ASN.1 Tools from http://www.oss.com/asn1/products/asn1-download.html which includes OSS ASN.1 Studio. This also allows you to generate (and modify) sample values for a given ASN.1 type.
Note that these don't generate thousands of different test values for you automatically, but will parse valid value notation and encode the values for you if you are able to generate valid ASN.1 value notation.
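As a rough sketch of how valid value notation could be produced in bulk for such tools, the snippet below enumerates (or randomly samples) the presence of OPTIONAL fields in a SEQUENCE and prints one value assignment per variant. The type name, field names and sample values are hypothetical and are not taken from the RRLP specification.

```python
import itertools
import random

# Hypothetical SEQUENCE: one mandatory field and several OPTIONAL ones.
# Names and values are illustrative only, not from the RRLP spec.
MANDATORY = {"referenceNumber": "1"}
OPTIONAL = {
    "accuracy": "20",
    "useMultipleSets": "TRUE",
    "environment": "badArea",
}


def value_notation(name, type_name, present_optional):
    """Build one ASN.1 value assignment, keeping fields in definition order."""
    fields = dict(MANDATORY)
    for key, sample in OPTIONAL.items():
        if key in present_optional:
            fields[key] = sample
    body = ", ".join(f"{k} {v}" for k, v in fields.items())
    return f"{name} {type_name} ::= {{ {body} }}"


def all_variants(type_name):
    """Yield every combination of present/absent optional fields."""
    keys = list(OPTIONAL)
    counter = 0
    for size in range(len(keys) + 1):
        for combo in itertools.combinations(keys, size):
            yield value_notation(f"testValue{counter}", type_name, set(combo))
            counter += 1


def sampled_variants(type_name, count, seed=0):
    """Yield a reproducible random sample when enumeration is too large."""
    rng = random.Random(seed)
    for n in range(count):
        chosen = {k for k in OPTIONAL if rng.random() < 0.5}
        yield value_notation(f"sample{n}", type_name, chosen)


if __name__ == "__main__":
    for line in all_variants("PositionRequest"):
        print(line)
```

With billions of combinations, sampling with a fixed seed keeps the test set reproducible while staying small enough to encode with an existing ASN.1 tool.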

ASN.1 vs JSON: when is it appropriate to use them?

When is using ASN.1 preferable to using JSON? What are some advantages and disadvantages of both approaches?
ASN.1 and JSON aren't strictly comparable. JSON is a data format. ASN.1 is a schema language plus multiple sets of encoding rules, each of which produces different data formats for a given schema. So, the original question somewhat parallels the question "XML Schema vs. XML: when is it appropriate to use them?" A fairer comparison would be between ASN.1 and JSON Schema.
That said, a few points to consider:
ASN.1 has binary encoding rules. Consider whether binary or text encoding is preferable for your application.
ASN.1 also has XML and JSON encoding rules. You can opt to go with a text-based encoding using ASN.1, if you like.
ASN.1 allows other encoding rules to be developed. Before ITU-T specified encoding rules for JSON, we specified our own rules to encode ASN.1 to JSON. I blogged about this on our company website here
As with XML Schema, tools exist for compiling ASN.1. These are commonly referred to as data binding tools. The compiler output consists of data structures to hold your data, and code for encoding/decoding to/from the various encodings (binary, XML, JSON).
I am not sure what, if any, data binding tools exist for JSON Schema. I am also not sure how mature/stable JSON Schema is, whereas ASN.1 is quite mature and stable.
Choosing between JSON Schema and ASN.1, note that JSON Schema is bound to JSON, whereas ASN.1 is not bound to any particular representation.
You can use ASN.1 regardless of whether you need to serialize messages that might go to a recipient using C, C++, C#, Java, or any other programming language with an ASN.1 encoder/decoder engine. ASN.1 also provides multiple encoding rules which have benefits under different circumstances. For example, DER is used when a canonical encoding is crucial, such as in digital certificates, while PER is used when bandwidth is critical, such as in cellular protocols, and E-XER is used when you don't care about bandwidth and would like to display an encoding in XML for manipulation in a browser or exchange messages with an XML Schema engine.
Note that with a good ASN.1 tool, you don't have to change your application code to switch between these ASN.1 encoding rules. A simple function call can select the encoding rules you would like to use.
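To give a feel for what "canonical" means for DER, here is a small hand-written sketch that encodes only the ASN.1 INTEGER type: tag 0x02, a one-byte length, and the shortest two's-complement content. It is an illustration, not a replacement for an ASN.1 tool, and it deliberately rejects values that would need a long-form length.

```python
def der_integer(value: int) -> bytes:
    """Encode an int as a DER INTEGER (short-form length only)."""
    # Shortest two's-complement width: negatives need one bit less
    # than their magnitude suggests, hence the +1 adjustment.
    adjusted = value + 1 if value < 0 else value
    length = adjusted.bit_length() // 8 + 1
    if length > 127:
        raise ValueError("long-form lengths are not handled in this sketch")
    content = value.to_bytes(length, "big", signed=True)
    return bytes([0x02, length]) + content


if __name__ == "__main__":
    for n in (0, 5, 127, 128, -1, -128, -129):
        print(n, der_integer(n).hex())
```

Because the content must be minimal, each integer has exactly one valid encoding, which is the property that signatures over certificates rely on.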
Here you can find a paper with a thorough study of JSON, XML, ASN.1, EXI and ProtoBuf.

IDL for JSON REST/RPC interface

We are designing a fairly complex REST API, in which most of the I/O are JSON encoded objects with a specific structure. One challenge we have found is to document the API in such a way that makes it easier for clients to post correct input and process output. Because the data of both the input and output requires fairly complex JSON objects, client developers often introduce bugs related to the structure of the I/O objects.
With all of the JSON web API's these days, I would have hoped for a general solution, but I am having a hard time finding one. I looked into json-schema which is a json-validation schema but both the IETF draft and implementations seem to be fairly immature (even though they have been around for a while, which is not a good sign).
A slightly different approach is offered by Protocol Buffers and Apache Avro, where the schema is not used for validation, but actually required for the encoding/decoding of the message. Of these 2, Avro seems to have rather limited documentation and implementations. ProtoBuf seems better, but I am not sure if this is really suitable to use in the browser to call a JSON api?
Now I am starting to doubt if I am looking at this from the right angle. Are there other methods available to make my API a bit more strong-typed'ish? Or is a formal description of a JSON REST/RPC API something that defeats the purpose of using JSON?
Edit: 6 months after this topic we found mongoose, which is very close to what we were looking for.
Below is a reply I received by email from Douglas Crockford:
I am not a believer in schemas as an alternative to input validation. There are properties that cannot be verified from the syntax. I think that was one of the ways that XML went wrong.
If your formats are too complex, then I would look at simplifying them.
Such systems exist and I'm the author of one of them. It is called Piqi-RPC and it does IDL-based validation of the input and output parameters for RPC-style APIs over HTTP.
It supports JSON, XML and Google Protocol Buffers as data representation formats for input and output of HTTP POST requests. Clients can choose to use any of the three formats and specify their choice using the standard Accept and Content-Type HTTP headers.
So, yes, in theory, you are looking in the right direction. However, at the moment, Piqi-RPC supports writing servers only in Erlang and it wouldn't be very useful for you if you use a different stack. I heard that Apache Thrift also supports JSON over HTTP transport, but I haven't checked. Another kind of similar system I know of (also for Erlang) is called UBF. I have heard of libraries for Java that can parse and validate JSON based on Protocol Buffers specification (e.g. http://code.google.com/p/protostuff/).
The idea itself is far from being new, but there aren't many systems that approach it in practice. It is a challenging problem.
Historically, IDLs were used for interface definition and binary data serialization and not so much for validating dynamic data interchange formats (e.g. XML and JSON) which emerged later. Sun-RPC IDL and CORBA IDL fall in the first category. WSDL would be one of few examples covering both areas, but it is a terrible piece of technology and it would be a bad choice for most modern systems. In addition, there are many schema languages (also known as DDLs -- data definition languages), most of which are highly specialized and work with only one representation format, e.g. XML or JSON schemas. Few of those have stable implementations.
The Piqi project and Piqi-RPC, which is based on it, are built around several fairly simple realizations:
A DDL doesn't have to be explicitly tied to any particular data representation format or built around it. Instead, such a language can be fairly universal and cover a wide range of practical use cases (e.g. cross-language data serialization and data validation) and data formats (e.g. JSON, XML, Protocol Buffers).
IDL for RPC-style communication can be implemented as a thin, mostly syntactic layer on top of the universal DDL.
Such IDL and interface specifications can be transport agnostic.
Speaking of REST-style APIs over HTTP compared to RPC-style APIs over HTTP.
With RPC-style APIs, service developer or an automated system have to validate three things: function name (according to some service naming scheme), input and, if you choose so, output.
In case of REST-style APIs, people get themselves in trouble for no good reason. Now, they have a lot more stuff to validate: arbitrarily complex URL syntax, including dynamic parameters encoded in URL segments (for all HTTP methods) and URL query string (only for HTTP GET method), HTTP method correspondence (whether it should be GET, POST, PUT, DELETE, etc.), HTTP body when some parameters go there (sometimes they do it manually twice for parameters represented in JSON and XML), custom HTTP headers, and separately -- service documentation. Imagine an IDL supporting all that!
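The three RPC-style checks mentioned above (function name, input, and optionally output) can be sketched in a few lines. This is not how Piqi-RPC works internally; the `INTERFACE` table, the `add_user` function and the flat field-to-type shapes are assumptions made purely for illustration.

```python
import json

# Hypothetical interface description: function name -> field name -> Python type.
INTERFACE = {
    "add_user": {
        "input": {"name": str, "age": int},
        "output": {"id": int},
    },
}


def check_shape(obj, shape, what):
    """Raise ValueError unless obj is a dict with exactly the described fields."""
    if not isinstance(obj, dict):
        raise ValueError(f"{what} must be a JSON object")
    if obj.keys() != shape.keys():
        raise ValueError(f"{what} fields {sorted(obj)} do not match {sorted(shape)}")
    for key, expected in shape.items():
        value = obj[key]
        # bool is a subclass of int in Python, so reject it for non-bool fields.
        if isinstance(value, bool) and expected is not bool:
            raise ValueError(f"{what}.{key} must be {expected.__name__}")
        if not isinstance(value, expected):
            raise ValueError(f"{what}.{key} must be {expected.__name__}")


def call(function_name, raw_input, handlers):
    """Validate the function name, the input and the output of one RPC call."""
    if function_name not in INTERFACE:
        raise ValueError(f"unknown function: {function_name}")
    spec = INTERFACE[function_name]
    request = json.loads(raw_input)
    check_shape(request, spec["input"], "input")
    response = handlers[function_name](**request)
    check_shape(response, spec["output"], "output")
    return json.dumps(response)
```

A REST-style equivalent would additionally have to describe URL segments, query strings, methods and headers, which is the extra surface the answer is complaining about.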
XML is better for RESTful services in many ways. It has native linking (<link href=, for all those HATEOAS fans), native language support (lang="en") and a great ecosystem.
It is also better for future proofing and future API refactorings. Converting this:
<profile>
  <username>alganet</username>
</profile>
To support more usernames:
<profile>
  <username>alganet</username>
  <username>alexandre</username>
</profile>
is much simpler to do without breaking existing clients when using XML. JSON is hard on that.
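For comparison, the same change in JSON usually means turning a string into an array (or adding a new key), which breaks clients that expect a string. A client can defend against this, as in the sketch below; the plural key name `usernames` is an assumption about how such an API might evolve.

```python
import json


def usernames(profile_json: str) -> list:
    """Return the usernames as a list, whichever shape the server sent."""
    profile = json.loads(profile_json)["profile"]
    # Accept the hypothetical new plural key, else fall back to the old one.
    value = profile.get("usernames", profile.get("username", []))
    return value if isinstance(value, list) else [value]


if __name__ == "__main__":
    print(usernames('{"profile": {"username": "alganet"}}'))
    print(usernames('{"profile": {"usernames": ["alganet", "alexandre"]}}'))
```

The point stands that every client has to opt in to this tolerance, whereas repeated elements are a natural part of XML.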
If you really need JSON, JSON-Schema is the way to go. It's immature, but I don't know anything better on that case. Maybe your consumers could choose between XML and JSON, so they can choose between a small payload (JSON) or RESTful candies (XML) using Content Negotiation.
I'd say the answer to your last question is yes. If you need a way to constrain and document the JSON "schema", why didn't you go with XML in the first place? It is not that much harder to parse, and being able to enforce a schema for it is a great advantage.

What is the difference between YAML and JSON?

What are the differences between YAML and JSON, specifically considering the following things?
Performance (encode/decode time)
Memory consumption
Expression clarity
Library availability, ease of use (I prefer C)
I was planning to use one of these two in our embedded system to store configure files.
Related:
Should I use YAML or JSON to store my Perl data?
Technically YAML is a superset of JSON. This means that, in theory at least, a YAML parser can understand JSON, but not necessarily the other way around.
See the official specs, in the section entitled "YAML: Relation to JSON".
In general, there are certain things I like about YAML that are not available in JSON.
As @jdupont pointed out, YAML is visually easier to look at. In fact the YAML homepage is itself valid YAML, yet it is easy for a human to read.
YAML has the ability to reference other items within a YAML file using "anchors." Thus it can handle relational information as one might find in a MySQL database.
YAML is more robust about embedding other serialization formats such as JSON or XML within a YAML file.
In practice neither of these last two points will likely matter for things that you or I do, but in the long term, I think YAML will be a more robust and viable data serialization format.
Right now, AJAX and other web technologies tend to use JSON. YAML is currently being used more for offline data processes. For example, it is included by default in the C-based OpenCV computer vision package, whereas JSON is not.
You will find C libraries for both JSON and YAML. YAML's libraries tend to be newer, but I have had no trouble with them in the past. See for example Yaml-cpp.
Differences:
YAML, depending on how you use it, can be more readable than JSON
JSON is often faster and is probably still interoperable with more systems
It's possible to write a "good enough" JSON parser very quickly
Duplicate keys, which are potentially valid JSON, are definitely invalid YAML.
YAML has a ton of features, including comments and relational anchors. YAML syntax is accordingly quite complex, and can be hard to understand.
It is possible to write recursive structures in yaml: {a: &b [*b]}, which will loop infinitely in some converters. Even with circular detection, a "yaml bomb" is still possible (see xml bomb).
Because there are no references, it is impossible to serialize complex structures with object references in JSON. YAML serialization can therefore be more efficient.
In some coding environments, the use of YAML can allow an attacker to execute arbitrary code.
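The duplicate-key point in the list above is easy to observe with Python's standard `json` module, which by default silently keeps the last value. The `reject_duplicates` helper below is a hypothetical name; it uses the module's `object_pairs_hook` parameter to turn duplicates into an error instead.

```python
import json


def reject_duplicates(pairs):
    """object_pairs_hook that raises on repeated keys instead of overwriting."""
    seen = {}
    for key, value in pairs:
        if key in seen:
            raise ValueError(f"duplicate key: {key!r}")
        seen[key] = value
    return seen


TEXT = '{"a": 1, "a": 2}'

if __name__ == "__main__":
    print(json.loads(TEXT))  # the later value wins by default
    try:
        json.loads(TEXT, object_pairs_hook=reject_duplicates)
    except ValueError as exc:
        print("rejected:", exc)
```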
Observations:
Python programmers are generally big fans of YAML, because of the use of indentation, rather than bracketed syntax, to indicate levels.
Many programmers consider the attachment of "meaning" to indentation a poor choice.
If the data format will be leaving an application's environment, parsed within a UI, or sent in a messaging layer, JSON might be a better choice.
YAML can be used, directly, for complex tasks like grammar definitions, and is often a better choice than inventing a new language.
Bypassing esoteric theory
This answers the title rather than the details, since most people (like me) arrive from a Google search and read only the title, so I felt it was necessary to explain from a web developer's perspective.
YAML uses space indentation, which is familiar territory for Python developers.
JavaScript developers love JSON because it is a subset of JavaScript and can be directly interpreted and written inside JavaScript, along with using a shorthand way to declare JSON, requiring no double quotes in keys when using typical variable names without spaces.
There are a plethora of parsers that work very well in all languages for both YAML and JSON.
YAML's space format can be much easier to look at in many cases because the formatting requires a more human-readable approach.
YAML's form, while more compact and easier to look at, can be deceptively difficult to hand-edit if you don't have whitespace visible in your editor. Tabs are not spaces, which adds to the confusion if your editor does not convert tab keystrokes into spaces.
JSON is much faster to serialize and deserialize because it has significantly fewer features than YAML to check for, which enables smaller and lighter code to process JSON.
A common misconception is that YAML needs less punctuation and is more compact than JSON, but this is completely false. Whitespace is invisible so it seems like there are fewer characters, but if you count the actual whitespace which is necessary for YAML to be interpreted properly along with proper indentation, you will find YAML actually requires more characters than JSON. JSON doesn't use whitespace to represent hierarchy or grouping and can be easily flattened with unnecessary whitespace removed for more compact transport.
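The "flattening" mentioned above is a one-liner with Python's standard `json` module; the sample data here is made up for illustration.

```python
import json

# Illustrative data only.
DATA = {"name": "Test", "tags": ["a", "b"], "nested": {"x": 1, "y": 2}}

# Human-friendly form: hierarchy shown with newlines and indentation.
PRETTY = json.dumps(DATA, indent=2)

# Transport form: no whitespace at all between tokens.
COMPACT = json.dumps(DATA, separators=(",", ":"))

if __name__ == "__main__":
    print(PRETTY)
    print(COMPACT)
    print(len(PRETTY), "characters versus", len(COMPACT))
```

Both strings parse back to the same data, so the whitespace in JSON is purely cosmetic, whereas in YAML block style it carries the structure.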
The Elephant in the room: The Internet itself
JavaScript clearly dominates the web by a huge margin, and JavaScript developers, along with popular web APIs, overwhelmingly prefer JSON as the data format. It therefore becomes difficult to argue for YAML over JSON when doing web programming in the general sense, as you will likely be outvoted in a team environment. In fact, the majority of web programmers aren't even aware YAML exists, let alone consider using it.
If you are doing any web programming, JSON is the default way to go because no translation step is needed when working with JavaScript, so you must come up with a better argument to use YAML over JSON in that case.
This question is 6 years old, but strangely, none of the answers really addresses all four points (speed, memory, expressiveness, portability).
Speed
Obviously this is implementation-dependent, but because JSON is so widely used, and so easy to implement, it has tended to receive greater native support, and hence speed. Considering that YAML does everything that JSON does, plus a truckload more, it's likely that of any comparable implementations of both, the JSON one will be quicker.
However, given that a YAML file can be slightly smaller than its JSON counterpart (due to fewer " and , characters), it's possible that a highly optimised YAML parser might be quicker in exceptional circumstances.
Memory
Basically the same argument applies. It's hard to see why a YAML parser would ever be more memory efficient than a JSON parser, if they're representing the same data structure.
Expressiveness
As noted by others, Python programmers tend towards preferring YAML, JavaScript programmers towards JSON. I'll make these observations:
It's easy to memorise the entire syntax of JSON, and hence be very confident about understanding the meaning of any JSON file. YAML is not truly understandable by any human. The number of subtleties and edge cases is extreme.
Because few parsers implement the entire spec, it's even harder to be certain about the meaning of a given expression in a given context.
The lack of comments in JSON is, in practice, a real pain.
Portability
It's hard to imagine a modern language without a JSON library. It's also hard to imagine a JSON parser implementing anything less than the full spec. YAML has widespread support, but is less ubiquitous than JSON, and each parser implements a different subset. Hence YAML files are less interoperable than you might think.
Summary
JSON is the winner for performance (if relevant) and interoperability. YAML is better for human-maintained files. HJSON is a decent compromise although with much reduced portability. JSON5 is a more reasonable compromise, with well-defined syntax.
GIT and YAML
The other answers are good. Read those first. But I'll add one other reason to use YAML sometimes: git.
Increasingly, many programming projects use git repositories for distribution and archival. And, while a git repo's history can equally store JSON and YAML files, the "diff" method used for tracking and displaying changes to a file is line-oriented. Since YAML is forced to be line-oriented, any small changes in a YAML file are easier to see by a human.
It is true, of course, that JSON files can be "made pretty" by sorting the strings/keys and adding indentation. But this is not the default and I'm lazy.
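For those less lazy, a minimal sketch of that "made pretty" step: rewriting JSON with sorted keys and one value per line, so that line-oriented diffs stay small. The function name `canonical_json` is just a label chosen here.

```python
import json


def canonical_json(text: str) -> str:
    """Re-serialize JSON with sorted keys and 2-space indentation."""
    data = json.loads(text)
    return json.dumps(data, indent=2, sort_keys=True, ensure_ascii=False) + "\n"


if __name__ == "__main__":
    print(canonical_json('{"b": 1, "a": [1, 2]}'), end="")
```

Running something like this before each commit makes two JSON files with the same content byte-identical regardless of the key order they were written in.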
Personally, I generally use JSON for system-to-system interaction. I often use YAML for config files, static files, and tracked files. (I also generally avoid adding YAML relational anchors. Life is too short to hunt down loops.)
Also, if speed and space are really a concern, I don't use either. You might want to look at BSON.
I find YAML to be easier on the eyes: fewer parentheses, "" etc. Although there is the annoyance of tabs in YAML... but one gets the hang of it.
In terms of performance/resources, I wouldn't expect big differences between the two.
Furthermore, we are talking about configuration files and so I wouldn't expect a high frequency of encode/decode activity, no?
Technically YAML offers a lot more than JSON (YAML v1.2 is a superset of JSON):
comments
anchors and inheritance - example of 3 identical items:
item1: &anchor_name
  name: Test
  title: Test title
item2: *anchor_name
item3:
  <<: *anchor_name
  # You may add extra stuff.
...
Most of the time people will not use those extra features and the main difference is that YAML uses indentation whilst JSON uses brackets. This makes YAML more concise and readable (for the trained eye).
Which one to choose?
YAML's extra features and concise notation make it a good choice for configuration files (non-user provided files).
JSON's limited features, wide support, and faster parsing make it a great choice for interoperability and user provided data.
If you don't need any features which YAML has and JSON doesn't, I would prefer JSON because it is very simple and is widely supported (has a lot of libraries in many languages). YAML is more complex and has less support. I don't think the parsing speed or memory use will be very much different, and maybe not a big part of your program's performance.
Benchmark results
Below are the results of a benchmark to compare YAML vs JSON loading times, on Python and Perl
JSON is much faster, at the expense of some readability, and features such as comments
Test method
100 sequential runs on a fast machine, average number of seconds
The dataset was a 3.44MB JSON file, containing movie data scraped from Wikipedia
https://raw.githubusercontent.com/prust/wikipedia-movie-data/master/movies.json
Linked to from: https://github.com/jdorfman/awesome-json-datasets
Results
Python 3.8.3 timeit
    JSON:          0.108
    YAML CLoader:  3.684
    YAML:         29.763
Perl 5.26.2 Benchmark::cmpthese
    JSON XS:       0.107
    YAML XS:       0.574
    YAML Syck:     1.050
Perl 5.26.2 Dumbbench (Brian D Foy, excludes outliers)
    JSON XS:       0.102
    YAML XS:       0.514
    YAML Syck:     1.027
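The original benchmark script is not shown, so the following is only a sketch of how a comparable measurement could be set up with `timeit`. It uses a small synthetic data set rather than the movie file, and it measures YAML only if the third-party PyYAML package happens to be installed; absolute numbers will therefore differ from those above.

```python
import json
import timeit


def make_dataset(n):
    """Synthetic stand-in for the movie data; field names are illustrative."""
    return [
        {"title": f"Movie {i}", "year": 1900 + i % 120, "cast": ["A", "B"]}
        for i in range(n)
    ]


def average_seconds(loader, text, runs):
    """Average wall-clock time of loader(text) over the given number of runs."""
    return timeit.timeit(lambda: loader(text), number=runs) / runs


def run_benchmark(n=1000, runs=10):
    """Return a dict of average load times in seconds per format."""
    data = make_dataset(n)
    results = {"JSON": average_seconds(json.loads, json.dumps(data), runs)}
    try:
        import yaml  # PyYAML, optional third-party dependency
    except ImportError:
        return results
    results["YAML"] = average_seconds(yaml.safe_load, yaml.safe_dump(data), runs)
    return results


if __name__ == "__main__":
    for name, seconds in run_benchmark().items():
        print(f"{name}: {seconds:.4f}")
```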
From Arnaud Lauret's book “The Design of Web APIs”:
The JSON data format
JSON is a text data format based on how the JavaScript programming language describes data but is, despite its name, completely language-independent (see https://www.json.org/). Using JSON, you can describe objects containing unordered name/value pairs and also arrays or lists containing ordered values, as shown in this figure.
An object is delimited by curly braces ({}). A name is a quoted string ("name") and is separated from its value by a colon (:). A value can be a string like "value", a number like 1.23, a Boolean (true or false), the null value null, an object, or an array. An array is delimited by brackets ([]), and its values are separated by commas (,).
The JSON format is easily parsed using any programming language. It is also relatively easy to read and write. It is widely adopted for many uses such as databases, configuration files, and, of course, APIs.
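As a small illustration of the value kinds listed above, this sketch (not from the book) parses one JSON object with Python's standard `json` module and shows which native type each value becomes.

```python
import json

TEXT = """
{
  "name": "value",
  "number": 1.23,
  "flag": true,
  "nothing": null,
  "object": {"key": "nested"},
  "array": [1, 2, 3]
}
"""

DOC = json.loads(TEXT)

if __name__ == "__main__":
    for name, value in DOC.items():
        # Prints the Python type that each JSON value kind is mapped to.
        print(f"{name}: {type(value).__name__}")
```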
YAML
YAML (YAML Ain’t Markup Language) is a human-friendly, data serialization format. Like JSON, YAML (http://yaml.org) is a key/value data format. The figure shows a comparison of the two.
Note the following points:
There are no double quotes (" ") around property names and values in YAML.
JSON’s structural curly braces ({}) and commas (,) are replaced by newlines and indentation in YAML.
Array brackets ([]) and commas (,) are replaced by dashes (-) and newlines in YAML.
Unlike JSON, YAML allows comments beginning with a hash mark (#).
It is relatively easy to convert one of those formats into the other. Be forewarned though, you will lose comments when converting a YAML document to JSON.
Since this question now features prominently when searching for YAML and JSON, it's worth noting one rarely-cited difference between the two: license. JSON purports to have a license which JSON users must adhere to (including the legally-ambiguous "shall be used for Good, not Evil"). YAML carries no such license claim, and that might be an important difference (to your lawyer, if not to you).
Sometimes you don't have to decide for one over the other.
In Go, for example, you can have both at the same time:
type Person struct {
Name string `json:"name" yaml:"name"`
Age int `json:"age" yaml:"age"`
}
I find both YAML and JSON to be very effective. Only two things really dictate when I use one over the other. The first is what the language is most commonly used with. For example, if I'm using Java or JavaScript, I'll use JSON. For Java, I'll use its own objects, which are pretty much JSON but lacking in some features, and convert them to JSON if I need to, or write JSON in the first place. I do that because it's a common thing in Java and makes it easier for other Java developers to modify my code. The second is whether the program uses the data to remember attributes or receives instructions in the form of a config file. In the latter case I'll use YAML, because it's very easy for humans to read, has nice-looking syntax, and is very easy to modify, even if you have no idea how YAML works. The program then reads it and converts it to JSON, or whatever is preferred for that language.
In the end, it honestly doesn't matter. Both JSON and YAML are easily read by any experienced programmer.
If you are concerned about better parsing speed then storing the data in JSON is the option. I had to parse the data from a location where the file was subject to modification from other users and hence I used YAML as it provides better readability compared to JSON.
And you can also add comments in the YAML file which can't be done in a JSON file.
JSON encodes six data types: Objects (mappings), Arrays, Strings, Numbers, Booleans and Null. It is extremely easy for a machine to parse and provides very little flexibility. The specification is about a page and a half.
YAML allows the encoding of arbitrary Python data and other crazy crap (which leads to vulnerabilities when decoding it). It is hard to parse because it offers so much flexibility. The specification for YAML was 86 pages, the last time I checked. YAML syntax is obviously influenced by Python, but maybe they should have been a little more influenced by the Python philosophy on a few points: e.g. “there should be one—and preferably only one—obvious way to do it” and “simple is better than complex.”
The main benefit of YAML over JSON is that it’s easier for humans to read and edit, which makes it a natural choice for configuration files.
These days, I’m leaning towards TOML for configuration files. It’s not as pretty or as flexible as YAML, but it’s easier both for machines and humans to parse. The syntax is (almost) a superset of INI syntax, but it parses out to JSON-like data structures, adding only one additional type: the date type.