Please help to suggest some merits and demerits of Flatbuffers and CBOR protocols. Both these binary formats claim to be good on their websites, but I am not able to make some good differences between the two.
Flatbuffers:
Advantage:
Strict typing in FlatBuffer, Cap’n proto and other similar solutions is seen as major key point for performance since no additional encoding/decoding is necessary.
The data model allows simple offsetting of typed objects with a compact data structure and fast access
FlatBuffers does not need a parsing/ unpacking step to a secondary representation before you can access data often coupled with per-object memory allocation.
Disadvantage:
New and not standardized like CBOR.
CBOR
Advantage:
Can create and process entirely in stream with no extra memory
Don’t have to pre-define any schema as our data is dynamic and variant
It’s an open international standard from the IETF makes it a even better choice than a proprietary one.
It’s designed for low memory, non-conversion, stream-based processing while also providing extensions for other data types
Disadvantage:
CBOR says that it follows the JSON model (so not strictly typed objects)
It starts with the same types of objects (strings, integers, maps, etc.).
PS:
It feels like managing types in CBOR will be performance costly compared to flatbuffers, but as CBOR is standardized protocol I am inclined to prefer it if this difference is not huge. Please let me know which of two will you all recommend and why.
I think you've already spelled it out quite clearly yourself. FlatBuffer's strength is being able to access the data without parsing/unpacking/allocation, which can give serious performance benefits in some scenarios. But if this doesn't matter to you, e.g. Protocol Buffers may work just as well.
Strong typing vs dynamic typing in data matters a lot too. I'd only use the latter if I wanted generic data storage with no constraints ahead of time.
Btw, if for some reason you prefer dynamic typing, but would also like to have the performance benefits of in-place access, there is actually a format that combines the two: https://google.github.io/flatbuffers/flexbuffers.html
FlatBuffers is not "proprietary". It may have been designed at Google, but it is open source and relied upon by many other companies.
I chose CBOR for my site https://kwippe.com - we use it to store all of the artwork and keyword data as compressed strings within a very small JSON structure, only a few attributes necessary to categorize the file. So the files are very small, and load very fast. I used this for over 30,000 SVG files, which I converted to JSON beforehand. All of the JSON is converted to string and compressed via a string compression library, then saved as part of the smaller JSON object that I encode to CBOR.
I've had very few problems with this CBOR system, and it was far easier to set up than FlatBuffers and some of the other binary solutions that I looked at.
I had this same question and went with CBOR for a couple reasons.
You have a CON that CBOR like JSON doesn't have strict types, true, you'll need to do a little validation to make sure the type you got is one you expected. You're right, this is what a Schema serializer gets you. You lose flexibility of changing types, but you know what you're going to get. I work on embedded in C, and static typing is important.
What you didn't list as a PRO is that CBOR 'can' retain JSON compatibility. That any valid JSON is valid CBOR, but not the other way around. A cbor can have a map item (object, key/value pair) of 1 : 2 that's integer 1 name has the value of integer 2. This isn't great a practice but there could be some uses for it. If you avoid the intentionally incompatible things, CBOR >> JSON conversion can be very handy. When would you use that? Well, I use it for logs. When my CBOR packets hit our server, they are converted to JSON and stored away already human readable for analytics. You can do this with any serializer, but we felt there was a lot less chance for "interpretation" differences in the close conversion.
The main factor for us was the schema was too difficult to share, and synchronize. If you own both sides of an A to B system, a schema is great! You get size efficiency because the map "Apples" : 100 is just stored as [1,100] but you had to get your schema file on both sides and compiled in (if using code generation) before you could get any work done. Ok, but what if you have 10 sides in a star pattern A B C D E F G H I J, where A and J can get messages to each other, B and H almost exclusively chat except for a message that goes to E and never back from, etc... In this scenario a schema can be very difficult! Maybe it's working and you add a whole slew of messages the option is to have old schemas, optional or missing definitions, or you synchronize everyone. For us this was the case and it would have taken place over 4 languages and in systems we didn't own.
Instead, we chose schemaless CBOR and appropriately name each map item. "apples" is for A,B,C, and J. "bananas" is an item that will go to C, H and E but never F, etc. Each side needs to know what it should expect and that's all.
As I understand it, FlatBuffers does have a schema-less mode, but I know little about it. I don't think there is a right answer, but for what it's worth, our web developers took to and understood CBOR right away because it's so similar in look and feel to JSON.
UPDATE: If interested in CBOR, but could really use some schema support and/or a clear way to document what the expected data is. CDDL (RFC 8610) looks to do exactly this. Also supports data definition of JSON because of how similar CBOR and JSON can be. There are also CDDL code generation tools for various languages that will accept the CDDL file, and help generate code for deserializing, parsing, validating the CBOR/JSON data. For me, this was the largest pain point of not having a schema, I was left to do this work and make mistakes on my own.
I would like to know whether there is some standard that specifies binary formats using JSON as the describing language, similar to google's protocol buffers.
Protocol buffers seem very powerful but they require parsing of yet another language and considerable overhead, especially for compiled languages such as C++.
So I am wondering whether there is some accepted standard that uses JSON to describe a binary format. (Parsing the binary data might then still require some manual steps, but at least a clear and unique description of the data can be made available.)
To be clear, I am not talking about encoding binary data in JSON, I am talking about describing binary data in JSON.
Head to the ultimate Wikipedia listing and evaluate for yourself. I don't know what is the right argument to overcome your programmer's inertia. I'd consider Apache Avro the most fitting your requirement - it has JSON description.
For least friction, you could try MessagePack or BSON, which are JSON themselves, just better packed. But, by not having external declaration, need to be self descriptive, so must transport the field names on wire - so it's not as "binary" and compact as Protocol Buffers or Avro.
I would love to use protocol buffers, but I am not sure if they fit my use case. Here it is:
I have a Quiz app. This requires a bunch of data, like categories, questions, a list of answers (and which ones are correct). I do not want to be responsible for entering this data - I would prefer to pass it off to a non-programmer to serialize all this data for me, in either XML or JSON. Then my app would just read in the data file.
Does Google's Protocol Buffers fit my use case? Or should I stick to a more traditional format like XML or JSON?
I think not: Protobuf is a binary format. So then you would need to support a text format like XML or JSON and Protobuf.
Also it does not seem you would benefit from Protobufs better berformance at all.
I want to test my application and I need to generate different load. Application is SUPL RRLP protocol parser, I have ASN.1 specification for this protocol. Packets have a lot of optional fields and number of varians may be over billion - I can't go through all the options manually. I want to automate it.
The first way to generate packets automatically, the other way is to create a lot different value assignments sets and encode each into binary format.
I found some tools, for example libtasn and Asn1Editor, but the first one can't parse existing ASN.1 spec file; the second one can't encode packets by specification.
I'm afraid to create thousandth ASN.1 parser because I can introduce errors in test process.
I hoped it's easy to find something existing, but... I'm capitulating.
Maybe, someone faced with the same problem on stackowerflow and found the solution? Or know something to recommend. I'll thank you.
Please try going to https://asn1.io/asn1playground/ and try your specification there. You can ask it to generate a sample value for a given ASN.1 type. You can encode it and edit either the encoded (hex) data, or decoded values to create additional values.
You can also download a free trial of the OSS ASN.1 Tools from http://www.oss.com/asn1/products/asn1-download.html which includes OSS ASN.1 Studio. This also allows you to generate (and modify) sample values for a given ASN.1 type.
Note that these don't generate thousands of different test values for you automatically, but will parse valid value notation and encode the values for you if you are able to generate valid ASN.1 value notation.
When building HTML forms why do we not always use enctype="multipart/form-data"?
multipart/form-data is a lot bulkier than application/x-www-form-urlencoded; the latter is just a bunch of keys and values (and can be parsed the same way whether for GET or POST), whereas the former requires full MIME support, and is thus more useful when you have data that can't simply be represented as key/value pairs.
Because it's a pain to handle, both on the server and in custom clients. Simple is better than complicated, unless simple just doesn't work.
With PHP it doesn't matter what kind o enctype the form had. You always get key/value pairs.
So if harder coding is the only reason not to and you are using PHP, just use enctype="multipart/form-data".
Is there any other reason?
Mulipart implicits that we are going to use different mime-types. For example, sending a binary file, you will have one part with the x-www-form-urlencoded part and the other with the octet-stream. Most of the times what you send is from the same mime type.