Best way to store golang map[string]interface into data store - json

Scenario: I have arbitrary JSON documents, ranging in size from 300 KB to 6.5 MB, read from MongoDB. Since the data is very much arbitrary/dynamic, I cannot define a struct type in Go, so I am using the map[string]interface{} type, and the JSON string is parsed using encoding/json's Unmarshal method, somewhat similar to what is described in Generic JSON with interface{}.
Issue: The problem is that parsing the JSON string into map[string]interface{} takes a long time (around 30 ms to 180 ms), compared to PHP parsing JSON with json_encode/decode, igbinary, or msgpack.
Question: Is there any way to pre-process the data and store it in a cache?
I mean: parse the string into map[string]interface{} once, serialize the result, and store it in some cache; then, when we retrieve it, deserialization should not take much time and execution can proceed.
Note: I am a newbie to Go; any suggestions are highly appreciated. Thanks.
Update: Serialization using the built-in gob and binary packages, as well as a MessagePack implementation for Go, has already been tried. No luck; no improvement in deserialization time.

The standard library package for JSON is notoriously slow. There is a good reason for that: it uses reflection (RTTI) to provide a really flexible interface that is really simple to use. Hence the unmarshalling being slower than PHP's…
Fortunately, there is an alternative, which is to implement the json.Unmarshaler interface on the types you want to decode. If this interface is implemented, the package will use it instead of its standard reflection-based method, so you can see huge performance boosts.
And to help you, a small group of tools has appeared, among which:
https://godoc.org/github.com/benbjohnson/megajson
https://godoc.org/github.com/pquerna/ffjson
(I'm listing the main players from memory; there must be others.)
These tools will generate tailored implementations of the json.Unmarshaler interface for the types you request. And with go:generate, you can even integrate them seamlessly into your build step.

Related

JSON library in Scala and Distribution of the computation

I'd like to process very large JSON files (about 400 MB each) in Scala.
My use case is batch processing. I can receive several very big files (up to 20 GB, later split for processing) at the same moment, and I really want to process them quickly as a queue (but that's not the subject of this post!). So it's really about distributed architecture and performance.
My JSON file format is an array of objects; each JSON object contains at least 20 fields. My flow is composed of two major steps: first, mapping each JSON object to a Scala object; second, applying some transformations to the Scala object's data.
To avoid loading the whole file into memory, I'd like a parsing library that supports incremental parsing. There are so many libraries (Play-JSON, Jerkson, Lift-JSON, the built-in scala.util.parsing.json.JSON, Gson) that I cannot figure out which one to pick, with the added requirement of minimizing dependencies.
Do you have any suggestions for a library I can use for high-volume parsing with good performance?
Also, I'm looking for a way to parallelize the mapping of the JSON file and the transformations made on the fields (across several nodes).
Do you think I can use Apache Spark to do it? Or are there alternative ways to accelerate/distribute the mapping/transformation?
Thanks for any help.
Best regards, Thomas
Considering a scenario without Spark, I would advise streaming the JSON with Jackson's streaming API (Java) (see for example here), mapping each JSON object to a Scala case class, and sending the objects to an Akka router with several routees that do the transformation part in parallel.
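For illustration, here is a minimal sketch of the streaming half in plain Java (the Record class, the "big.json" path, and the process() hand-off are placeholders of mine, not from the original answer). Only one element of the top-level array is materialized at a time, so memory use stays flat regardless of file size:

import com.fasterxml.jackson.core.JsonFactory;
import com.fasterxml.jackson.core.JsonParser;
import com.fasterxml.jackson.core.JsonToken;
import com.fasterxml.jackson.databind.DeserializationFeature;
import com.fasterxml.jackson.databind.ObjectMapper;
import java.io.File;

public class StreamingJsonExample {
    // Placeholder standing in for the real ~20-field objects.
    public static class Record {
        public String id;
        public long value;
    }

    public static void main(String[] args) throws Exception {
        ObjectMapper mapper = new ObjectMapper()
                .configure(DeserializationFeature.FAIL_ON_UNKNOWN_PROPERTIES, false);
        JsonFactory factory = mapper.getFactory();
        try (JsonParser parser = factory.createParser(new File("big.json"))) {
            if (parser.nextToken() != JsonToken.START_ARRAY) {
                throw new IllegalStateException("Expected a top-level JSON array");
            }
            // Advance object by object; only the current Record is in memory.
            while (parser.nextToken() == JsonToken.START_OBJECT) {
                Record r = mapper.readValue(parser, Record.class);
                process(r); // e.g., hand off to an Akka router / worker pool
            }
        }
    }

    private static void process(Record r) { /* transformation step */ }
}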

Windows Store App: Performance of JSON parser

There are many ways to parse JSON in the context of a Windows Store App,
regardless of the language (C#, JavaScript, or C++).
For example: .NET 4.5's JsonObject and DataContractJsonSerializer, JavaScript's JSON parser, or an external one like Json.NET.
Does anybody know anything about that?
I have only read good things about Json.NET's performance.
But is that true, and does it matter for JSON documents containing datasets of 100k JSON objects? Or won't the user notice a difference?
I only have experience with Json.NET ... it works fast and great! I have also used the library in enterprise projects, and it has never disappointed me!
If it helps, and FWIW, I've recently been collecting some new JSON parsing / deserialization performance data that can be observed over various JSON payload "shapes" (and sizes), using four JSON libraries, here:
https://github.com/ysharplanguage/FastJsonParser#Performances
(.NET's out-of-the-box JavaScriptSerializer vs. JSON.NET vs. ServiceStack vs. JsonParser)
Please note:
these figures are for the full .NET only (i.e., the desktop / server tier; not mobile devices)
I was interested in getting new benchmark figures about parsing / deserialization performance only (i.e., not serialization)
finally, I was also especially interested (although not exclusively) in figures re: strongly typed deserialization use cases (i.e., deserializing into POCOs)
'Hope this helps,

Finding it difficult to process JSON in delphi

I'm currently working on an application that will get data about your character from the WoW armory.
Example character: My WoW Character(link)
I will get all the info I want by calling the API provided by Blizzard and I will get the response in JSON.
Example JSON: JSON response for the character above(link)
At first I tried to get the data from the JSON by string manipulation.
That meant splitting my strings and searching for keywords in them to find positions, then formatting the pieces into individual items of data such as talents and stats.
This worked great at the beginning, but as I wanted more data it became harder: with the many functions I ran on all the strings, it became one big blur and it was unclear what I was doing at any given moment.
Is there a good way to process my JSON?
I was thinking about getting the JSON and creating an empty class.
While working through the JSON, it would generate properties and store the values in there.
But I have no idea if and how it's possible to generate properties dynamically.
In the future I would like to get even more data but first I want to get this up and running before even thinking about that.
Does anyone have any ideas/advice on this?
Thanks in advance.
Your JSON seems rather short and basic. It does not seem like you need special speed or exotic features. http://jsonviewer.stack.hu/#http://eu.battle.net/api/wow/character/moonglade/Xaveak?fields=stats,talents
And while you do have a stock JSON parser as part of the DB-Express suite since Delphi XE2, there are still concerns:
1. It is said to cause problems with both speed and reliability.
2. It would make your program dependent on the DB-Express package (why, if you are not actually using it for DB access?).
3. It would bind your future to the Enterprise edition of Delphi.
So you'd better try some 3rd-party library.
One of the fastest would probably be the Synopse JSON parser, a side project of their mORMot library. It is generally good code, with a lot of attention paid to speed, and the developers actively help on their forum.
Another known and widely used library is Henri Gourvest's SuperObject.
It claimed to be the fastest parser for Delphi, and while, given the above, that is probably no longer true, its speed is quite adequate for most tasks. Henri himself does not actively support his past projects, as he is always doing something new, so the scarce documentation (also duplicated in the install package) is all you get officially, plus there is a forum where other users might help you. OTOH, the main idea behind SuperObject's design was uniformity, and while some tasks could really be documented better, that is mostly due to uncertainty about "whether this task would really work in a uniform manner without any special treatment". But usually it does.
PS: Since that is a wiki, you may try to enhance it for future users ;-)
So, coming back to the documentation, you would need:
1) to load the whole JSON into the library. You can do that by creating a TStream from your HTTP library, or by providing a string buffer with the data: that is the "Parsing a JSON data structure" section of the manual;
2) to read values like "name" and "level": described in the "How to read a property value of an object?" section there;
3) to enumerate arrays like "talents": described in the "Browsing data structure" section.
XE3 has "built in" JSON support (see docwiki), but I have heard (haven't used it myself) that it isn't very well optimised.
So perhaps look for some thirdparty option like SuperObject.
Your task is easily achievable using TSvSerializer, which is included in my delphi-oop library. You only need to declare your model types and deserialize from your JSON string. Your model (a very simplified, incomplete, and untested version) should look something like this:
// Note: TObjectList<T> requires Generics.Collections in your uses clause.
type
  TStats = class
  private
    Fhealth: Integer;
  public
    property health: Integer read Fhealth write Fhealth;
    ...
  end;

  TTalent = class
  private
    Ftier: Integer;
  public
    property tier: Integer read Ftier write Ftier;
    ...
  end;

  TMainTalent = class
  private
    Fselected: Boolean;
    Ftalents: TObjectList<TTalent>;
  public
    property selected: Boolean read Fselected write Fselected;
    property talents: TObjectList<TTalent> read Ftalents write Ftalents;
  end;

  TWowCharacter = class
  private
    FlastModified: Int64;
    Fname: string;
    Fstats: TStats;
    Ftalents: TObjectList<TMainTalent>;
  public
    property lastModified: Int64 read FlastModified write FlastModified;
    property name: string read Fname write Fname;
    ...
    property stats: TStats read Fstats write Fstats;
    property talents: TObjectList<TMainTalent> read Ftalents write Ftalents;
    ...
  end;
Then you just need to do:
uses
  SvSerializer;

var
  LWowCharacter: TWowCharacter;
begin
  LWowCharacter := TWowCharacter.FromJson(YourJsonString);
  ...
You can find my contact email in the delphi-oop project; ask me if something's unclear and I'll try to help you in my spare time.

What are some of the common steps for reducing JSON data size?

I am running into problems where some of our data stores are seeing a lot of throughput. We are using POJOs serialized to JSON using Jackson. What are some of the ways we can compress JSON data?
One initial thought suggested using BSON, but apparently it's not much smaller than JSON.
Check out CJSON.
You can see some comparisons here.
If you're not wedded to JSON, you could try MessagePack:

MessagePack is a binary-based efficient object serialization library. It enables to exchange structured objects between many languages like JSON. But unlike JSON, it is very fast and small.

There are implementations in many languages.
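If you are on the JVM, one low-churn way to try it is the jackson-dataformat-msgpack module, which plugs MessagePack in as a Jackson backend so your existing data binding keeps working. A rough sketch, with a made-up Event POJO (the POJO and values are illustrations, not from the question):

import com.fasterxml.jackson.databind.ObjectMapper;
import org.msgpack.jackson.dataformat.MessagePackFactory;

public class MessagePackExample {
    // Made-up POJO standing in for whatever you serialize today.
    public static class Event {
        public String name;
        public long timestamp;
    }

    public static void main(String[] args) throws Exception {
        // Same Jackson data binding as before; only the wire format changes.
        ObjectMapper mapper = new ObjectMapper(new MessagePackFactory());

        Event e = new Event();
        e.name = "login";
        e.timestamp = 1700000000L;

        byte[] packed = mapper.writeValueAsBytes(e);  // compact binary, typically smaller than JSON text
        Event back = mapper.readValue(packed, Event.class);
        System.out.println(back.name + ": " + packed.length + " bytes");
    }
}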

Is there a standard mapping between JSON and Protocol Buffers?

From a comment on the announcement blog post:
Regarding JSON: JSON is structured similarly to Protocol Buffers, but protocol buffer binary format is still smaller and faster to encode. JSON makes a great text encoding for protocol buffers, though -- it's trivial to write an encoder/decoder that converts arbitrary protocol messages to and from JSON, using protobuf reflection. This is a good way to communicate with AJAX apps, since making the user download a full protobuf decoder when they visit your page might be too much.
It may be trivial to cook up a mapping, but is there a single "obvious" mapping between the two that any two separate dev teams would naturally settle on? If two products supported PB data and could interoperate because they shared the same .proto spec, I wonder if they would still be able to interoperate if they independently introduced a JSON reflection of the same spec. There might be some arbitrary decisions to be made, e.g. should enum values be represented by a string (to be human-readable a la typical JSON) or by their integer value?
So is there an established mapping, and any open source implementations for generating JSON encoder/decoders from .proto specs?
Maybe this is helpful: http://code.google.com/p/protobuf-java-format/
From what I have seen, Protostuff is the project to use for any PB work on Java, including serializing it as JSON based on the protocol definition. I have not used it myself; I've just heard good things.
Yes. Since Protocol Buffers version 3.0.0 (released July 28, 2016) there is "a well-defined encoding in JSON as an alternative to binary proto encoding", as mentioned in the release notes:
https://github.com/google/protobuf/releases/tag/v3.0.0
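For Java, that canonical JSON mapping is implemented by the JsonFormat class in the protobuf-java-util artifact. A minimal sketch, assuming a Person message type generated by protoc from a proto3 definition (the Person type itself is an assumption here):

import com.google.protobuf.util.JsonFormat;

// Person is assumed to be a message class generated by protoc (proto3).
Person person = Person.newBuilder().setName("Bill Monroe").build();

// Message -> canonical JSON (lowerCamelCase field names, enums as strings by default).
String json = JsonFormat.printer().print(person);

// JSON -> message.
Person.Builder builder = Person.newBuilder();
JsonFormat.parser().merge(json, builder);
Person roundTripped = builder.build();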
I needed to marshal from GeneratedMessageLite to a JSON object but did not need to unmarshal. I couldn't use the protobuf library in Pangea's answer because it doesn't work with the LITE_RUNTIME option. I also didn't want to burden our already large legacy system with generating more compiled code for the existing protocol buffers. For marshalling to JSON, I went with this simple solution:
final Person gpb = Person.newBuilder().setName("Bill Monroe").build();
final Gson gson = new Gson();
final String jsonString = gson.toJson(gpb);
One further thought: if the protobuf-generated objects have getters/setters or appropriately named fields, one could simply use the Jackson JSON processor's data binding. By default it handles public getters, any setters, and public fields, but those are just the default visibility levels and can be changed. If they are, Jackson can serialize/deserialize protobuf-generated POJOs without problems.
I have actually used this approach with Thrift-generated objects; the only thing I had to configure there was to disable serialization of various "isXXX()" methods that Thrift adds for checking if a field has been explicitly assigned or not.
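For reference, that sort of tweak is a one-liner with Jackson 2.x's visibility settings; a sketch (myGeneratedObject stands in for any Thrift- or protobuf-generated instance, and is not from the original answer):

import com.fasterxml.jackson.annotation.JsonAutoDetect;
import com.fasterxml.jackson.annotation.PropertyAccessor;
import com.fasterxml.jackson.databind.ObjectMapper;

ObjectMapper mapper = new ObjectMapper();
// Stop auto-detecting boolean "isXXX()" is-getters (such as Thrift's isSetName()),
// while regular getters, setters, and public fields are still picked up.
mapper.setVisibility(PropertyAccessor.IS_GETTER, JsonAutoDetect.Visibility.NONE);

String json = mapper.writeValueAsString(myGeneratedObject);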
First of all, I think one should reason very carefully about putting effort into converting a dataset to protobufs. Here are my reasons to convert a dataset to protobufs:
Type safety: a guarantee on the format of the data being considered.
A smaller uncompressed memory footprint for the data. The reason I mention uncompressed is that, post-compression, there isn't much difference between the sizes of compressed JSON and compressed proto, but compression has a cost associated with it. Also, serialization/deserialization speed is almost the same; in fact, Jackson JSON is faster than protobufs. Please check out the following link for more information:
http://technicalrex.com/2014/06/23/performance-playground-jackson-vs-protocol-buffers/
The protobufs need to be transferred over the network a lot.
That said, once you convert your dataset to Jackson JSON format shaped the way the protobuf definition is laid out, it can very easily be mapped directly to protobuf format using Protostuff's JsonIOUtil.mergeFrom function. The signature of the function:
public static <T> void mergeFrom(JsonParser parser, T message, Schema<T> schema, boolean numeric)
Reference to protostuff
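To illustrate the call, here is a minimal sketch using the byte[] overload of mergeFrom (the Person class is hypothetical, and the io.protostuff package names follow newer Protostuff releases; older versions lived under com.dyuproject.protostuff):

import io.protostuff.JsonIOUtil;
import io.protostuff.Schema;
import io.protostuff.runtime.RuntimeSchema;
import java.nio.charset.StandardCharsets;

// With protostuff-runtime, a Schema can be derived from a plain class at runtime.
Schema<Person> schema = RuntimeSchema.getSchema(Person.class);

Person person = new Person();
byte[] json = "{\"name\":\"Bill Monroe\"}".getBytes(StandardCharsets.UTF_8);

// Populate the message in place from the JSON bytes; numeric = false means
// the JSON keys are field names rather than field numbers.
JsonIOUtil.mergeFrom(json, person, schema, false);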