Where shall I use protobuf - json

I was reading this article about protobuf and I wondered where to use it in the projects. I read some articles that said google created protobuf to replace XML, but as far as I know in 2008 (the first release) JSON was already there.
I searched more and I found an article that the writer suggested to use it instead of JSON, but I still don't get the idea completely.
So where shall I use it? Any special scenario, or like JSON whenever that I want to transport data? Any other scenarios?

It is useful whenever you want to serialize/deserialize your data. Typical situations include sending your data to someone else over the network, storing it to disk or keeping it in context while performing asynchronous processes.
Here is a brief explanation about the main differences between protocol buffer, json and XML: https://stackoverflow.com/a/14029040/6681872

Related

How does protobuf helps or make it easier to do schema validation of message stream?

I have a Kafka topic that would contains different kinds of message that is different kinds of stuctures.
The messages of json kind. There is a need to evaluate the schema of every message/record.
I would like to know how we can leverage protobuf here to make things easier. where exactly should I insert protobuf.
Also a message of protobuf encapsulated means that it is always schema safe?
First of all, Avro would make more sense than Protobuf, in terms of Kafka. However, the new version of the Confluent Schema Registry does support Protobuf now, so... moving on...
where exactly should I insert protobuf.
Your messages are already JSON, so you wouldn't really "insert" anything...
If you want to use Protobuf, you would no longer be using JSON. You would have to rewrite both your Producer and Consumer code to not use String(De)serializer or JSON(De)Serializer and rather find some ProtobufSerializer that you would use instead.
The schemas must then be exist on both "sides" of the Kafka topic, in each application's code (or you'd fetch them from a remote Registry). And they are used to validate and (de)serialize the messages that are sent/received.
Also a message of protobuf encapsulated means that it is always schema safe?
Not necessarily, but I don't know enough about Protobuf to really say.

Can CouchDB natively convert plain text to json format?

I'm aware that there are python and powershell methods to convert plain text files, csv's etc.... into json format for upload into NoSQL DBs such as CouchDB.
But according to the CouchDB definitive guide, it makes it seems like there is a native built in way of doing this kind of conversion, without the need for a 3rd party tool.
This older thread appears to hint at this:
Filter and update functions in CouchDB?
This part in particular:
There are other design document functions that are being introduced at the >time of this writing, including _update and _filter that we aren’t covering in >depth here. Filter functions are covered in Chapter 20, Change Notifications. >Imagine a web service that POSTs an XML blob at a URL of your choosing when >particular events occur. PayPal’s instant payment notification is one of >these. With an _update handler, you can POST these directly in CouchDB and it >can parse the XML into a JSON document and save it. The same goes for CSV, >multi-part form, or any other format.
But when I dig deeper I don't find anything concrete.
The supporting wiki link is not clear to me (a beginner with json/NoSQL/curl stuff: http://wiki.apache.org/couchdb/Document_Update_Handlers
Hopefully this is a simple yes/no. And any links to help on this that is better than the above link also appreciated, thank you!
CouchDB supports transforming the internal documents/views into many other formats through the use of show and list functions. It's not a "native" transformation, as you define the transformation yourself, it's not magical.
That being said, there is not a similar mechanism for the reverse (ie: converting some arbitrary format into JSON documents) but you're much better off scripting that with a full-featured language/script and using the bulk docs API to do your imports in batch.

What alternatives are there for creating a REST-full web service API based on JSON?

We're creating a web service and we'd like 2 things:
- to be JSON based
- to be REST-full - how much so, we haven't decided
We've already implemented custom APIs but now we'd like to follow some standards, since at some point it gets a little crazy to remember all the rules, all the exceptions, and all the undocumented parts that the creator also forgot.
Are any of you using some standards that you've found useful? Or At least, what are some alternatives?
So far I know of jsonapi and HAL.
These don't seem to be good enough though, since what we'd optimaly like is to be able to:
+ define, expose and update entities and relations between them
+ define, expose and invoke operations
+ small numbers of requests are preferable, at least where it "makes sense" (i'll leave that as a blank check)
[EDIT]
Apparently, there's OData too: http://www.odata.org/
Are any of you using some standards that you've found useful? Or At least, what are some alternatives?
Between your own question and the comments most of the big names have been mentioned. I just like to also add JSON Hyper Schema:
"JSON Schema is a JSON based format for defining the structure of JSON data. This document specifies hyperlink- and hypermedia-related keywords of JSON Schema."
http://json-schema.org/latest/json-schema-hypermedia.html
It's an extension to JSON schema and fulfils a very similar role to the others mentioned above.
I've been using json-hal for a while and like it a lot, but I'm increasingly drawn to the JSON Schema family of schemas which also handle data model definition and validation. These schemas are also the basis of the excellent Swagger REST API standard:
http://swagger.io/specification/
Hope this helps.

Protocol Buffers vs XML/JSON for data entry outside of programming effort

I would love to use protocol buffers, but I am not sure if they fit my use case. Here it is:
I have a Quiz app. This requires a bunch of data, like categories, questions, a list of answers (and which ones are correct). I do not want to be responsible for entering this data - I would prefer to pass it off to a non-programmer to serialize all this data for me, in either XML or JSON. Then my app would just read in the data file.
Does Google's Protocol Buffers fit my use case? Or should I stick to a more traditional format like XML or JSON?
I think not: Protobuf is a binary format. So then you would need to support a text format like XML or JSON and Protobuf.
Also it does not seem you would benefit from Protobufs better berformance at all.

IDL for JSON REST/RPC interface

We are designing a fairly complex REST API, in which most of the I/O are JSON encoded objects with a specific structure. One challenge we have found is to document the API in such a way that makes it easier for clients to post correct input and process output. Because the data of both the input and output requires fairly complex JSON objects, client developers often introduce bugs related to the structure of the I/O objects.
With all of the JSON web API's these days, I would have hoped for a general solution, but I am having a hard time finding one. I looked into json-schema which is a json-validation schema but both the IETF draft and implementations seem to be fairly immature (even though they have been around for a while, which is not a good sign).
A slightly different approach is offered by Protocol Buffers and Apache Avro, where the schema is not used for validation, but actually required for the encoding/decoding of the message. Of these 2, Avro seems to have rather limited documentation and implementations. ProtoBuf seems better, but I am not sure if this is really suitable to use in the browser to call a JSON api?
Now I am starting to doubt if I am looking at this from the right angle. Are there other methods available to make my API a bit more strong-typed'ish? Or is a formal description of a JSON REST/RPC API something that defeats the purpose of using JSON?
Edit: 6 months after this topic we found mongoose, which is very close to what we were lookin for.
Below a reply I received by email from Douglas Crockford.
I am not a believer in schemas as an alternative to input validation.
There are properties that cannot be verified from the syntax. I think
that was one of the ways that XML went wrong.
If your formats are too complex, then I would look at simplifying
them.
Such systems exist and I'm the author of one of them. It is called Piqi-RPC and it does IDL-based validation of the input and output parameters for RPC-style APIs over HTTP.
It supports JSON, XML and Google Protocol Buffers as data representation formats for input and output of HTTP POST requests. Clients can choose to use any of the three formats and specify their choice using the standard Accept and Content-Type HTTP headers.
So, yes, in theory, you are looking in the right direction. However, at the moment, Piqi-RPC supports writing servers only in Erlang and it wouldn't be very useful for you if you use a different stack. I heard that Apache Thrift also supports JSON over HTTP transport, but I haven't checked. Another kind of similar system I know of (also for Erlang) is called UBF. I have heard of libraries for Java that can parse and validate JSON based on Protocol Buffers specification (e.g. http://code.google.com/p/protostuff/).
The idea itself is far from being new, but there aren't many systems that approach it in practice. It is a challenging problem.
Historically, IDLs were used for interface definition and binary data serialization and not so much for validating dynamic data interchange formats (e.g. XML and JSON) which emerged later. Sun-RPC IDL and CORBA IDL fall in the first category. WSDL would be one of few examples covering both areas, but it is a terrible piece of technology and it would be a bad choice for most modern systems. In addition, there are many schema languages (also known as DDLs -- data definition languages), most of which are highly specialized and work with only one representation format, e.g. XML or JSON schemas. Few of those have stable implementations.
The Piqi project and Piqi-RPC, which is based on it, are build around several fairly simple realizations:
DLL doesn't have to be explicitly tied to any particular data representation format or built around it. Instead, such language can be fairly universal and cover wide range of practical use-cases (e.g. cross-language data serialization and data validation) and data formats (e.g. JSON, XML, Protocol Buffers).
IDL for RPC-style communication can be implemented as a thin, mostly syntactic layer on top of the universal DDL.
Such IDL and interface specifications can be transport agnostic.
Speaking of REST-style APIs over HTTP compared to RPC-style APIs over HTTP.
With RPC-style APIs, service developer or an automated system have to validate three things: function name (according to some service naming scheme), input and, if you choose so, output.
In case of REST-style APIs, people get themselves in trouble for no good reason. Now, they have a lot more stuff to validate: arbitrarily complex URL syntax, including dynamic parameters encoded in URL segments (for all HTTP methods) and URL query string (only for HTTP GET method), HTTP method correspondence (whether it should be GET, POST, PUT, DELETE, etc.), HTTP body when some parameters go there (sometimes they do it manually twice for parameters represented in JSON and XML), custom HTTP headers, and separately -- service documentation. Imagine an IDL supporting all that!
XML is better for RESTful services in many ways. It has native linking (<link href=, for all those HATEOAS fans), native language support (lang="en") and a great ecosystem.
It is also better for future proofing and future API refactorings. Converting this:
<profile>
<username>alganet</username>
</profile>
To support more usernames:
<profile>
<username>alganet</username>
<username>alexandre</username>
</profile>
Is much more simpler to do without breaking existing clients using XML. JSON is hard on that.
If you really need JSON, JSON-Schema is the way to go. It's immature, but I don't know anything better on that case. Maybe your consumers could choose between XML and JSON, so they can choose between a small payload (JSON) or RESTful candies (XML) using Content Negotiation.
I'd say the answer to your last question is yes. If you need a way to constrain and document the JSON "schema", why didn't you go with XML in the first place? It is not that much harder to parse, and being able to enforce a schema for it is a great advantage.