Mashery IODocs - Latency issue due to heavy JSON config file

Mashery IOdocs is really a great tool for documenting APIs.
I'm using it for a fairly big project with more than 50 methods and complex structures sent to the API, so my JSON config file is more than 4000 lines long.
I self-host IOdocs on a VPS along with other things, and the docs are painfully slow because of the long JSON file.
Any ideas for coping with this latency, other than the obvious one of splitting my JSON config file into several smaller ones?

I have a fork of IO Docs with some performance improvements that may help. In this instance they involve stripping out json-minify (which is only used to allow comments in the source specifications), caching the specifications server-side, and no longer loading the specification via a synchronous AJAX call on the client.
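For illustration, the server-side caching part might look something like this minimal sketch (hypothetical names, not the actual IO Docs code): parse each specification once and reuse the object, rather than re-reading and re-minifying the JSON on every request.

```typescript
import { readFileSync } from "fs";

// Hypothetical helper: cache parsed specs by path so each JSON file
// is read and parsed only once per server process.
const specCache = new Map<string, unknown>();

function loadSpec(path: string): unknown {
  let spec = specCache.get(path);
  if (spec === undefined) {
    spec = JSON.parse(readFileSync(path, "utf8")); // no json-minify pass
    specCache.set(path, spec);
  }
  return spec;
}
```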

Related

Protocol Buffers vs JSON - when to choose one over the other

Can anyone explain when to use Protocol Buffers instead of JSON in a microservices architecture, and vice versa? Both for synchronous and asynchronous communication.
When to use JSON
You need or want data to be human-readable
Data from the service is directly consumed by a web browser
Your server-side application is written in JavaScript
You aren't prepared to tie the data model to a schema
You don't have the bandwidth to add another tool to your arsenal
The operational burden of running a different kind of network service is too great
Pros of ProtoBuf
Relatively smaller size
Guarantees type-safety
Prevents schema violations
Gives you simple accessors
Fast serialization/deserialization
Backward compatibility
While we are at it, have you looked at FlatBuffers?
Some of these aspects are also covered in google protocol buffers vs json vs XML
Reference:
https://codeclimate.com/blog/choose-protocol-buffers/
https://codeburst.io/json-vs-protocol-buffers-vs-flatbuffers-a4247f8bda6f
I'd use JSON when the consumer is, or could possibly be, written in a language with built-in native support for JSON (JavaScript is an example) or is a web browser, or where human readability is wanted. Speaking of which, at least for asynchronous calls, many developers enjoy the convenience of examining the contents of the queue directly for debugging, and even during the normal course of development. Depending on the tech stack used, it may or may not be worth the trade-off of using protobuf just to reduce network load, since any performance increase won't buy you much in the async world. And it's not like we need to write a bunch of boilerplate code anymore, the way we used to with JSON marshalling and unmarshalling in most languages.
I'd use protobuf for everything else... if any use cases are left for it after the considerations above. There are advantages you might see, such as performance, reduced network load, the backward compatibility offered by its versioning scheme, the lovely documentation that magically comes with .proto files, and some validation! If for some reason you have a lot of REST or other synchronous calls between microservices, protobuf can be sent over the wire instead of JSON with few trade-offs, if any at all, while offering a heap of advantages.
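To make the size and schema-validation points concrete, here is a rough sketch using the protobufjs npm package (my choice for the example; the posts above don't name a library, and real services often use protoc-generated classes instead):

```typescript
import * as protobuf from "protobufjs";

// Define a tiny schema inline; in a real service this would live in a .proto file.
const { root } = protobuf.parse(`
  syntax = "proto3";
  message User {
    uint32 id = 1;
    string name = 2;
    bool active = 3;
  }
`);
const User = root.lookupType("User");

const payload = { id: 42, name: "Ada", active: true };

// "Relatively smaller size": compare the encoded byte counts.
const jsonBytes = Buffer.byteLength(JSON.stringify(payload));
const protoBytes = User.encode(User.create(payload)).finish().length;
console.log({ jsonBytes, protoBytes }); // protobuf is typically the smaller of the two

// "Prevents schema violations": verify() reports a type mismatch up front.
console.log(User.verify({ id: "not-a-number" })); // -> an "integer expected" error
```

The size gap tends to widen for numeric-heavy data, since JSON spells out every digit and repeats key names while protobuf uses field numbers and varints.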

Stream from JMS queue and store in Hive/MySQL

I have the following setup (that I cannot change) and I'd like some advice from people who have been down that road. I'm not sure if this is the right place to ask, but here goes anyway.
Various JSON messages are placed on different channels of a JMS queue (Universal Messaging/webMethods).
Before the data can be stored in relational-style DBs, it has to be transformed: fields renamed, arrays flattened, and some structures extracted from nested objects.
Data has to be appended to MySQL (as a serving layer for a visualization tool) and Hive (for long-term storage).
We're stuck on Spark 1.4.1 and may move to 1.6.0 in a few months' time. So, structured streaming is not (yet) an option.
At some point the events will be streamed directly to real-time dashboards, so having something in place that is capable of doing that now would be ideal.
Ideally, coding is done in Scala (because we already have a considerable batch-based repo with Spark and Scala), so the minimum requirement is something JVM-based.
I've looked at Spark Streaming, but it does not have a JMS adapter, and as far as I can tell, operating on JSON would be done using a SQLContext instance on the DStream's RDDs. I understand that it's possible to write a custom adapter, but then I'm not sure Spark is still the best/easiest solution. I've also looked at the docs for Samza and Flink but did not find much for JMS and/or JSON, at least not natively.
Apache Camel seems like it might have a substantial set of connectors, but I'm not too familiar with it, and I get the impression it does not do the streaming part, 'just' the bit where you connect to various systems. There's also Akka, although I get the impression it's more of a replacement for messaging systems, and JMS is fixed here.
There is an almost bewildering number of available tools, and at this point I'm at a loss what to look at or what to look out for. Based on your experience, what do you recommend I use to pick up the messages, transform them, and insert them into Hive and MySQL?

Parsing Protocol-Buffers without .proto file

I am reverse-engineering an Android app as part of a security project. My first step is to discover the protocol exchanged between the app and the server. I have found that the protocol being used is Protocol Buffers. Given the nature of protobuf, the original .proto file is needed to deserialize a protobuf-encoded message. Since I don't have it, I used protod to disassemble the Android app and recover any .proto files used.
I have the Android app in a form where it is a bunch of .smali and .so files. Running protod against the .so files yields only one .proto file -- google/protobuf/descriptor.proto.
I was under the impression that users of Protocol Buffers write their own .proto files, which might reference google/protobuf/descriptor.proto, but according to protod, google/protobuf/descriptor.proto is the only proto file used by the app. Could it really be that google/protobuf/descriptor.proto is enough for me to deserialize the messages between the app and the server?
When you write a .proto file you can set an option optimize_for to LITE_RUNTIME (see here) and this will omit the descriptors from the generated code to reduce the size of the binary. I believe this is a common practice for mobile development since code size is a scarce resource in that environment. This may explain why you found only a single .proto file. It is unlikely that the app is actually transferring any data using descriptor.proto since that is mostly an implementation detail of the protocol buffers library.
If you cannot find any other descriptors, your best bet might be to try to interpret the protocol buffers without them. You can read about the protocol buffers wire format here. An easy way to get started would be to create a proto2 message type containing no fields and attempt to parse the data as that type. You can then use the reflection API to examine what are known as the "unknown fields" in the message and try to figure out what they represent.
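As a rough illustration of that wire format (not tied to any particular app), the sketch below walks a raw message buffer and prints each field number, wire type, and raw value; it is essentially a crude version of what protoc's built-in --decode_raw mode does:

```typescript
// Minimal protobuf wire-format walker: no descriptors required.
function walk(buf: Buffer): void {
  let pos = 0;

  // Varints are little-endian base-128: 7 payload bits per byte,
  // with the high bit flagging continuation.
  function readVarint(): number {
    let result = 0;
    let shift = 0;
    let byte: number;
    do {
      byte = buf[pos++];
      result += (byte & 0x7f) * 2 ** shift; // multiply to dodge 32-bit overflow
      shift += 7;
    } while (byte & 0x80);
    return result;
  }

  while (pos < buf.length) {
    const tag = readVarint(); // tag = (field number << 3) | wire type
    const field = Math.floor(tag / 8);
    const wireType = tag & 0x07;
    switch (wireType) {
      case 0: // varint
        console.log(`field ${field}: varint ${readVarint()}`);
        break;
      case 1: // 64-bit scalar
        console.log(`field ${field}: fixed64 ${buf.subarray(pos, pos + 8).toString("hex")}`);
        pos += 8;
        break;
      case 2: { // length-delimited: string, bytes, or a nested message
        const len = readVarint();
        console.log(`field ${field}: bytes ${buf.subarray(pos, pos + len).toString("hex")}`);
        pos += len;
        break;
      }
      case 5: // 32-bit scalar
        console.log(`field ${field}: fixed32 ${buf.subarray(pos, pos + 4).toString("hex")}`);
        pos += 4;
        break;
      default: // wire types 3/4 are deprecated groups; not handled here
        throw new Error(`unsupported wire type ${wireType}`);
    }
  }
}
```

Length-delimited fields are ambiguous without a schema: a wire type of 2 can be a UTF-8 string, raw bytes, or an embedded message, so try re-running the walker on those payloads when guessing.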

How to convert a large JSON file into XML?

I have a large JSON file; its size is 5.09 GB. I want to convert it to an XML file. I tried online converters, but the file is too large for them. Does anyone know how to do that?
The typical way to process XML as well as JSON files is to load them completely into memory. You then have a so-called DOM, which allows various kinds of data processing. But neither XML nor JSON is really designed for storing as much data as you have here. In my experience you will typically run into memory problems as soon as you exceed about 200 MB. This is because the DOM is built from many individual objects, an approach that carries a huge memory overhead, far exceeding the amount of data you actually want to process.
The only way to process files like that is basically a streaming approach. The basic idea: instead of parsing the whole file and loading it into memory, you parse and process it "on the fly". As data is read, it is parsed, and events are triggered to which your software can react and perform actions as needed. (For details, have a look at the SAX API to understand this concept in more depth.)
As you stated, you are processing JSON, not XML. Stream APIs for JSON should be available in the wild as well. Anyway, you could implement one fairly easily yourself: JSON is a pretty simple data format.
Nevertheless, such an approach is not optimal: it typically results in fairly slow processing because of the millions of method invocations involved: for every item encountered, a method has to be called to perform some processing task. This, together with the additional checks about what kind of information you have just encountered in the stream, slows processing down considerably.
You really should consider a different kind of approach: first split your file into many small ones, then perform the processing on them. This might not seem very elegant, but it keeps the task much simpler, and it gives you a major advantage: it becomes much easier to debug your software. Unfortunately you are not very specific about your problem, so I can only guess, but large files typically imply a pretty complex data model. Therefore you will probably be much better off with many small files instead of a single huge one. Later it also lets you dig into individual aspects of your data and the processing as needed. You will probably fail to get any detailed insight while working on a single 5 GB file, and on errors you will have trouble identifying which part of the huge file is causing the problem.
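As a minimal sketch of the splitting idea, assuming the file is one big top-level JSON array (a common shape for exports; adjust for other layouts), the following streams the file, tracks nesting depth and string state, and writes each top-level element to its own small file without ever holding more than one element in memory:

```typescript
import { createReadStream, writeFileSync } from "fs";

function splitTopLevelArray(path: string, onElement: (json: string) => void): void {
  const stream = createReadStream(path, { encoding: "utf8" });
  let depth = 0;        // nesting depth, where 1 means "inside the outer array"
  let inString = false; // inside a JSON string literal?
  let escaped = false;  // previous character was a backslash?
  let buf = "";         // accumulates the current top-level element

  stream.on("data", (chunk: string) => {
    for (const ch of chunk) {
      if (inString) {
        buf += ch;
        if (escaped) escaped = false;
        else if (ch === "\\") escaped = true;
        else if (ch === '"') inString = false;
      } else if (ch === '"') {
        inString = true;
        buf += ch;
      } else if (ch === "{" || ch === "[") {
        depth++;
        if (!(depth === 1 && ch === "[")) buf += ch; // skip only the outer "["
      } else if (ch === "}" || ch === "]") {
        depth--;
        if (depth === 0) { // closing "]" of the outer array
          if (buf.trim()) onElement(buf.trim());
          buf = "";
        } else {
          buf += ch;
        }
      } else if (ch === "," && depth === 1) { // boundary between elements
        if (buf.trim()) onElement(buf.trim());
        buf = "";
      } else {
        buf += ch;
      }
    }
  });
}

let n = 0;
splitTopLevelArray("huge.json", (element) => {
  writeFileSync(`part-${n++}.json`, element); // one small file per element
});
```

Each resulting part file is then small enough for an ordinary in-memory JSON-to-XML conversion.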
As I said, you are unfortunately not very specific about your problem, so I can only give these general recommendations about data processing. Without more details about your data in particular, I cannot say which approach will work best in your case.

Serialization format common to node.js and ActionScript?

Some of my friends are designing a game, and I am helping them out by implementing the game's backend server. The game is written in Flash, and I plan to develop the server in node.js because (a) it would be a cool project for learning node.js, and (b) it's fast, which is important for games.
The server's architecture is based on messages sent between the server and client (sort of like Minecraft's server protocol). The message format I have so far is a byte (the packet type), two bytes (the message length) and that many bytes (the message data, which is a mapping of key-value pairs). Problem is, I really don't want to develop my own serialization format (because while I probably could, implementing it would be a pain compared to using an existing solution).
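For reference, the framing itself is straightforward in node.js. A quick sketch (assuming the two-byte length is big-endian; flip to LE if your spec says otherwise), with the payload kept as opaque bytes so any serialization format can be plugged in:

```typescript
function encodeFrame(packetType: number, payload: Buffer): Buffer {
  const frame = Buffer.alloc(3 + payload.length);
  frame.writeUInt8(packetType, 0);        // 1 byte: packet type
  frame.writeUInt16BE(payload.length, 1); // 2 bytes: payload length
  payload.copy(frame, 3);                 // N bytes: serialized message data
  return frame;
}

function decodeFrame(frame: Buffer): { packetType: number; payload: Buffer } {
  const packetType = frame.readUInt8(0);
  const length = frame.readUInt16BE(1);
  return { packetType, payload: frame.subarray(3, 3 + length) };
}
```

Note that the two-byte length field caps messages at 64 KB, which is worth keeping in mind.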
Unfortunately, I am having problems finding a good candidate for the message data serialization format.
ActionScript's own remoting format might work, but I don't like it much.
JSON has support in node.js (obviously) and in ActionScript, but it's also textual and I would prefer binary for enhanced speed.
MessagePack looked like a good candidate, but I can't find an ActionScript implementation. (There's one called as3-msgpack on Google Code, but I get weird errors and can't access it.)
BSON has an ActionScript implementation, but no node.js support besides their MongoDB library (and I'm planning on using Redis).
So, can anyone offer any other serialization formats that I might have missed? Or should I just stick with one of these (or roll my own)?
Isn't that why HTTP supports gzipped content? Just use JSON and gzip the content when you send it. The time spent gzipping is more than recovered by the reduced latency of the transmission.
Check this article for more on gzip with ActionScript. On node.js I think gzip-compress is fairly popular.
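A minimal sketch of the node.js side using the built-in zlib module (so no extra dependency is strictly required):

```typescript
import { gzipSync, gunzipSync } from "zlib";

const message = { type: "move", x: 12, y: 34 }; // hypothetical game message

// Serialize to JSON, then gzip before writing to the socket.
const wire = gzipSync(Buffer.from(JSON.stringify(message), "utf8"));

// ...and the reverse on receipt.
const decoded = JSON.parse(gunzipSync(wire).toString("utf8"));
console.log(wire.length, decoded);
```

One caveat: gzip adds a fixed header and trailer of a couple dozen bytes, so it only pays off beyond roughly a few hundred bytes of JSON; tiny packets may actually grow.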
Actually, if I were in your shoes, I would implement both methods and time them. Use JSON because it is common and easy to do, but then implement AMQP as well and compare them. If you want to massively scale this, you might find that AMQP makes it easier. Also, message queuing is just such a nice fit for the node.js world view.
AMQP on ActionScript, and someone doing something similar on node.js.
Leverage JSAMF in Node.js for AMF communications with Flash.
http://www.jamesward.com/2010/07/07/amf-js-a-pure-javascript-amf-implementation/
If you wanted to, you could create your entire API in client-side JavaScript and use JSON as the data exchange format, then call the client JavaScript API from ActionScript via ExternalInterface, which would make for an elegant server-side solution.
It is worth noting that Flash Player has built-in support for decompressing gzip-compressed data. It may be worth compressing some of your JSON objects: things like localised string tables, game configuration data, etc., which can grow to a few hundred KB but are only loaded once at game load.
I'm working on a version of MessagePack for AS3.
The current version does the basics (encoding/decoding); streams are planned for the future.
Check the project page: https://github.com/loteixeira/as3-msgpack