Golang: Parsing benchmark between MessagePack and JSON

We are working on a TCP server which takes simple text-based commands over TCP (similar to Redis).
We are tossing up between using raw text commands, JSON, or MessagePack (http://msgpack.org/).
An example of a command could be:
text command: LOCK some_random_key 1000
JSON command: {"command":"LOCK","key":"some_random_key","timeout":1000}
messagePack: \x83\xA7command\xA4LOCK\xA3key\xAFsome_random_key\xA7timeout\xCD\x03\xE8
Question:
EDIT: I have answered my own question, which was about the speed comparison between parsing JSON and MsgPack. Please see the results in my answer below.

Parsing Speed Comparison:
BenchmarkJSON 100000 17888 ns/op
BenchmarkMsgPack 200000 10432 ns/op
My benchmarking code:
package benchmark

import (
    "encoding/json"
    "testing"

    "github.com/vmihailenco/msgpack"
)

var in = map[string]interface{}{"c": "LOCK", "k": "31uEbMgunupShBVTewXjtqbBv5MndwfXhb", "T/O": 1000, "max": 200}

func BenchmarkJSON(b *testing.B) {
    for i := 0; i < b.N; i++ {
        jsonB := EncodeJSON(in)
        DecodeJSON(jsonB)
    }
}

func BenchmarkMsgPack(b *testing.B) {
    for i := 0; i < b.N; i++ {
        buf := EncodeMsgPack(in) // renamed from b to avoid shadowing *testing.B
        DecodeMsgPack(buf)
    }
}

// Errors are deliberately ignored below: both codecs are fed the same
// known-good map, and error handling would only add noise to the benchmark.

func EncodeMsgPack(message map[string]interface{}) []byte {
    b, _ := msgpack.Marshal(message)
    return b
}

func DecodeMsgPack(b []byte) (out map[string]interface{}) {
    _ = msgpack.Unmarshal(b, &out)
    return
}

func EncodeJSON(message map[string]interface{}) []byte {
    b, _ := json.Marshal(message)
    return b
}

func DecodeJSON(b []byte) (out map[string]interface{}) {
    _ = json.Unmarshal(b, &out)
    return
}

I would suggest doing some benchmarks on the kind of data the machines will actually be exchanging.
I would also suggest trying Protocol Buffers (encoding) + Snappy (compression).
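For the compression half, here is a minimal sketch assuming the github.com/golang/snappy package (the Protocol Buffers half needs a .proto schema, omitted here):

package main

import (
    "fmt"

    "github.com/golang/snappy"
)

func main() {
    payload := []byte(`{"command":"LOCK","key":"some_random_key","timeout":1000}`)

    // Compress the encoded message before writing it to the TCP connection.
    compressed := snappy.Encode(nil, payload)

    // Decompress on the receiving side.
    decompressed, err := snappy.Decode(nil, compressed)
    if err != nil {
        panic(err)
    }
    fmt.Printf("%d to %d bytes: %s\n", len(payload), len(compressed), decompressed)
}

Note that for payloads this small, compression can actually increase the size; it pays off on larger messages.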

msgpack only promises to be shorter than json, not faster to parse. In both cases, your test string is so short and simple that your benchmarking may simply be testing the maturity of the particular implementation, rather than the underlying algorithms.
If all your messages really are this short, parsing speed may be the least of your problems. I'd suggest designing your server such that the parsing bits are easily replaceable and actually profiling the code in action.
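For instance, the parsing layer could sit behind a small interface so implementations can be swapped without touching the server loop (a hypothetical sketch; the Codec name and methods are my own, not from the original post):

package codec

import "encoding/json"

// Codec abstracts the wire format of a command, so JSON, MsgPack,
// or a raw text parser can be exchanged freely.
type Codec interface {
    Encode(cmd map[string]interface{}) ([]byte, error)
    Decode(data []byte) (map[string]interface{}, error)
}

// JSONCodec implements Codec with encoding/json.
type JSONCodec struct{}

func (JSONCodec) Encode(cmd map[string]interface{}) ([]byte, error) {
    return json.Marshal(cmd)
}

func (JSONCodec) Decode(data []byte) (map[string]interface{}, error) {
    var out map[string]interface{}
    err := json.Unmarshal(data, &out)
    return out, err
}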
Donald Knuth made the following statement on optimization:
"We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil"
Lastly, if you really want to know what is going on, you need to profile the code. See
http://blog.golang.org/profiling-go-programs
for an example of how to profile code with Go.
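For benchmarks specifically, a CPU profile can be captured straight from go test and then inspected with pprof:

go test -bench=. -cpuprofile=cpu.out
go tool pprof cpu.out

(Older toolchains also want the test binary as an argument: go tool pprof pkg.test cpu.out.)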

Also, your test cases are reversed: BenchmarkJSON actually calls MsgPack, and BenchmarkMsgPack calls JSON. Could that have something to do with it?

Related

go JSON nightmare - Is there a simple [POSH] ConvertFrom-Json equivalent?

In powershell, if I make a REST call and receive any kind of json response, I can easily $json | ConvertFrom-Json into a proper object so I can make modifications, render specific values, whatever.
It seems like in Go I have to either define a struct, or "dynamically" convert using a map[string]interface{}.
The issue with a struct is that I am writing a rest handler for a platform that, depending on the endpoint, serves wildly different JSON responses, like most REST APIs. I don't want to define a struct for all of the dozens of possible responses.
The problem with map[string]interface{} is that, when printed, it pollutes the output with a bunch of ridiculous 'map' prefixes and unwanted [brackets], e.g.:
[map[current_user_role:admin id:1]]
Is there a way to convert a JSON response like:
{
"current_user_role": "admin",
"id": 1
}
To return a basic:
current_user_role: admin
id: 1
... WITHOUT defining a struct?
Your approach of using a map is right if you don't wish to specify the structure of the data you're receiving. You don't like how it is output from fmt.Println, but I suspect you're confusing the output format with the data representation. Printing the map in a format you find acceptable takes only a couple of lines of code, though it is admittedly not as convenient as in Python or PowerShell.
Here's a working example:
package main

import (
    "encoding/json"
    "fmt"
    "log"
)

var data = []byte(`{
"current_user_role": "admin",
"id": 1
}`)

func main() {
    var a map[string]interface{}
    if err := json.Unmarshal(data, &a); err != nil {
        log.Fatal(err)
    }
    for k, v := range a {
        fmt.Printf("%s: %v\n", k, v)
    }
}
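If you want prettier output for arbitrarily nested values, one option is to re-marshal the map with json.MarshalIndent (reusing a from the example above):

pretty, err := json.MarshalIndent(a, "", "  ")
if err != nil {
    log.Fatal(err)
}
fmt.Println(string(pretty))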

Unmarshalling Entities from multiple JSON arrays without using reflect or duplicating code

I'm making a JSON API wrapper client that needs to fetch paginated results, where the URL to the next page is provided by the previous page. To reduce code duplication for the 100+ entities that share the same response format, I would like to have a single client method that fetches and unmarshals the different entities from all paginated pages.
My current approach, in a simplified (pseudo) version (without error handling etc.):
type ListResponse struct {
    Data struct {
        Results []interface{} `json:"results"`
        Next    string        `json:"__next"`
    } `json:"d"`
}

func (c *Client) ListRequest(uri string) (listResponse ListResponse) {
    // Do an HTTP request to uri and get the body
    body := []byte(`{ "d": { "__next": "URL", "results": []}}`)
    json.Unmarshal(body, &listResponse)
    return
}
func (c *Client) ListRequestAll(uri string, v interface{}) {
    var a []interface{}
    f := c.ListRequest(uri)
    a = append(a, f.Data.Results...)
    next := f.Data.Next
    for next != "" {
        r := c.ListRequest(next)
        a = append(a, r.Data.Results...)
        next = r.Data.Next
    }
    // Marshal the collected results, then unmarshal into the destination slice.
    b, _ := json.Marshal(a)
    json.Unmarshal(b, v)
}

// Then in a method requesting all results for a single entity
var entities []Entity1
client.ListRequestAll("https://foo.bar/entities1.json", &entities)

// and somewhere else
var entities []Entity2
client.ListRequestAll("https://foo.bar/entities2.json", &entities)
The problem, however, is that this approach is inefficient and uses too much memory: it first unmarshals into a generic ListResponse with results as []interface{} (to read the next URL and concatenate the results into a single slice), then marshals that []interface{} again just to unmarshal it immediately afterwards into the destination slice []Entity1.
I might be able to use the reflect package to dynamically create slices of these entities, unmarshal directly into them, and append them afterwards, but if I understand correctly I had better not use reflect unless strictly necessary...
Take a look at the RawMessage type in the encoding/json package. It allows you to defer the decoding of json values until later. For example:
Results []json.RawMessage `json:"results"`
or even...
Results json.RawMessage `json:"results"`
Since json.RawMessage is just a slice of bytes, this will be much more efficient than the intermediate []interface{} you are unmarshalling into.
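A minimal sketch, reusing the names from the question (Entity1 and f are the question's own):

type ListResponse struct {
    Data struct {
        Results []json.RawMessage `json:"results"`
        Next    string            `json:"__next"`
    } `json:"d"`
}

// Decode one raw result directly into the caller's concrete type,
// skipping the intermediate []interface{} entirely.
var e Entity1
if err := json.Unmarshal(f.Data.Results[0], &e); err != nil {
    // handle the error
}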
As for the second part on how to assemble these into a single slice given multiple page reads you could punt that question to the caller by making the caller use a slice of slices type.
// Then in a method requesting all results for a single entity
var entityPages [][]Entity1
client.ListRequestAll("https://foo.bar/entities1.json", &entityPages)
This still has the unbounded memory consumption problem your general design has, however, since you have to load all of the pages / items at once. You might want to consider changing to an Open/Read abstraction like working with files. You'd have some Open method that returns another type that, like os.File, provides a method for reading a subset of data at a time, while internally requesting pages and buffering as needed.
Perhaps something like this (untested):
type PagedReader struct {
    c      *Client
    buffer []json.RawMessage
    next   string
}

func (r *PagedReader) getPage() {
    f := r.c.ListRequest(r.next)
    r.next = f.Data.Next
    r.buffer = append(r.buffer, f.Data.Results...)
}

// ReadItems decodes up to len(output) buffered items, fetching more
// pages as needed, and returns the number of items decoded.
func (r *PagedReader) ReadItems(output []interface{}) int {
    for len(output) > len(r.buffer) && r.next != "" {
        r.getPage()
    }
    n := 0
    for i := 0; i < len(output) && i < len(r.buffer); i++ {
        json.Unmarshal(r.buffer[i], output[i])
        n++
    }
    r.buffer = r.buffer[n:]
    return n
}
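Hypothetical usage of the sketch above; since ReadItems unmarshals into output[i], the caller must pre-fill the slice with pointers:

r := &PagedReader{c: client, next: "https://foo.bar/entities1.json"}
batch := make([]interface{}, 50)
for i := range batch {
    batch[i] = &Entity1{}
}
for {
    n := r.ReadItems(batch)
    if n == 0 {
        break
    }
    // process batch[:n]
}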

Speeding up JSON parsing in Go

We have transaction log files in which each transaction is a single line in JSON format. We often need to take selected parts of the data, perform a single time conversion, and feed results into another system in a specific format. I wrote a Python script that does this as we need, but I hoped that Go would be faster, and would give me a chance to start learning Go. So, I wrote the following:
package main

import (
    "bufio"
    "encoding/json"
    "fmt"
    "os"
    "time"
)

func main() {
    sep := ","
    reader := bufio.NewReader(os.Stdin)
    for {
        data, _ := reader.ReadString('\n')
        byt := []byte(data)
        var dat map[string]interface{}
        if err := json.Unmarshal(byt, &dat); err != nil {
            break
        }
        status := dat["status"].(string)
        a_status := dat["a_status"].(string)
        method := dat["method"].(string)
        path := dat["path"].(string)
        element_uid := dat["element_uid"].(string)
        time_local := dat["time_local"].(string)
        etime, _ := time.Parse("[02/Jan/2006:15:04:05 -0700]", time_local)
        fmt.Print(status, sep, a_status, sep, method, sep, path, sep, element_uid, sep, etime.Unix(), "\n")
    }
}
That compiles without complaint, but I'm surprised at the lack of performance improvement. To test, I placed 2,000,000 lines of logs into a tmpfs (to ensure that disk I/O would not be a limitation) and compared the two versions of the script. My results:
$ time cat /mnt/ramdisk/logfile | ./stdin_conv > /dev/null
real 0m51.995s
$ time cat /mnt/ramdisk/logfile | ./stdin_conv.py > /dev/null
real 0m52.471s
$ time cat /mnt/ramdisk/logfile > /dev/null
real 0m0.149s
How can this be made faster? I have made some rudimentary efforts. The ffjson project, for example, generates static serialization functions so that reflection is unnecessary; however, I have failed so far to get it to work, getting the error:
Error: Go Run Failed for: /tmp/ffjson-inception810284909.go
STDOUT:
STDERR:
/tmp/ffjson-inception810284909.go:9:2: import "json_parse" is a program, not an importable package
Besides, wouldn't what I have above be considered statically typed? Possibly not; I am positively dripping behind the ears where Go is concerned. I have tried selectively disabling different attributes in the Go code to see if one is especially problematic. None have had an appreciable effect on performance. Any suggestions on improving performance, or is this simply a case where compiled languages have no substantial benefit over others?
Try using a type to remove all this unnecessary assignment and type assertion:

type RenameMe struct {
    Status     string `json:"status"`
    Astatus    string `json:"a_status"`
    Method     string `json:"method"`
    Path       string `json:"path"`
    ElementUid string `json:"element_uid"`
    TimeLocal  string `json:"time_local"` // raw string; parsed into Etime after the fact
    Etime      time.Time
}

data := &RenameMe{}
if err := json.Unmarshal(byt, data); err != nil {
    break
}
data.Etime, _ = time.Parse("[02/Jan/2006:15:04:05 -0700]", data.TimeLocal)
I'm not going to test this to ensure it outperforms your code, but I bet it does by a large margin. Give it a try and let me know, please.
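Wired into the original read loop it might look like this (a sketch; bufio.Scanner stands in for the ReadString call, and sep is the separator from the question):

scanner := bufio.NewScanner(os.Stdin)
for scanner.Scan() {
    data := &RenameMe{}
    if err := json.Unmarshal(scanner.Bytes(), data); err != nil {
        break
    }
    data.Etime, _ = time.Parse("[02/Jan/2006:15:04:05 -0700]", data.TimeLocal)
    fmt.Print(data.Status, sep, data.Astatus, sep, data.Method, sep,
        data.Path, sep, data.ElementUid, sep, data.Etime.Unix(), "\n")
}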
http://jsoniter.com/ declares itself to be the fastest JSON parser, and provides both Go and Java implementations. It offers two styles of API, and declaring the JSON object's structure up front is optional.
Check https://github.com/pquerna/ffjson
I saw 3x improvements over the marshal/unmarshal functions of the standard library. It achieves this by generating static serialization code, removing the need for reflection.
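For reference, ffjson is a code generator rather than a drop-in library; the usual workflow is roughly as follows (exact invocation may vary by version):

go get -u github.com/pquerna/ffjson
ffjson myfile.go
# writes myfile_ffjson.go with static MarshalJSON/UnmarshalJSON methods

The generated methods satisfy json.Marshaler/json.Unmarshaler, so encoding/json can pick them up without changes to calling code.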

Go - Modify JSON on the fly

Considering the following input:
"object": {
"array": [
{
"element_field_1": some_value_1,
"element_field_2": some_value_1,
... // More unknown fields
},
...
],
"object_field": foo,
... // Some more unknown fields
}
I need to iterate over every element of the array, modify fields 1 and 2 and then output the JSON object. Here's what I have for now, but it is far from being valid Go code:
func handler(w http.ResponseWriter, r *http.Request) {
    // Transform the request's body to an interface
    j := getDecodedJSON(r)
    // Iterate over every element of the array
    for i := 0; i < len(j["object"]["array"]); i++ {
        rewrite(j["object"]["array"][i])
    }
    // Encoding back to JSON shouldn't be a problem
}

func getDecodedJSON(r *http.Request) map[string]interface{} {
    dec := json.NewDecoder(r.Body)
    var j map[string]interface{}
    if err := dec.Decode(&j); err != nil {
        log.Fatal(err)
    }
    return j
}

func rewrite(element *map[string]interface{}) {
    element["element_field_1"], element["element_field_2"] = lookupValues(element)
}
Basically the error is:
invalid operation: j["object"]["array"] \
(type interface {} does not support indexing)
but of course there's a more conceptual mistake in my approach.
Writing a struct that details the content of the input isn't really an option, since I don't know the JSON keys beforehand.
How can I do this "the Go way"?
EDIT: This is the actual use case:
I have two web services that need a "translator" between them.
Service 1 makes a request to the translator, where a couple of fields are modified, everything else is left intact.
Then the translator takes the modified JSON and replicates the request to service 2.
In other words, this translator acts like a man in the middle for both services. Go seems to be a good option for this given its fast startup times and fast JSON handling.
I don't think it makes sense to detail every JSON field in a Go struct, since I only need to change a few fields. I'm willing to trade some efficiency due to reflection (parsing into a map[string]interface{} should be slower than using a full-blown struct) in exchange for code that is more generic over variations of the JSON input.
Change
j["object"]["array"][i]
to
j["object"].(map[string]interface{})["array"].([]interface{})[i]

Sending a MongoDB query to a different system: converting to JSON and then decoding into BSON? How to do it in Go?

I need to transfer a MongoDB query to a different system, and for this reason I would like to use MongoDB Extended JSON. I need this mostly because I use date comparisons in my queries.
So, the kernel of the problem is that I need to transfer a MongoDB query generated in a node.js back-end to another back-end written in Go.
Intuitively, the most obvious format for sending the query via REST is JSON. But MongoDB queries are not exactly JSON; they are BSON, which contains special constructs for dates.
So the idea is to convert the queries into JSON, using MongoDB Extended JSON to represent the special constructs. After some tests it became clear that such queries do not work as-is: both the MongoDB shell and queries sent via node.js need the special ISODate or new Date constructs.
Finally, the actual question: are there functions to encode/decode from JSON to BSON, taking MongoDB Extended JSON into account, in both JavaScript (node.js) and Go?
Updates
Node.js encoding package
Apparently there is a node.js package that parses and stringifies BSON/JSON.
So half of my problem is resolved. I wonder if there is something like this in Go.
Sample query
For example, the following query is in normal BSON:
{ Tmin: { $gt: ISODate("2006-01-01T23:00:00.000Z") } }
Translated into MongoDB Extended JSON, it becomes:
{ "Tmin": { "$gt" : { "$date" : 1136156400000 }}}
After some research I found the mejson library; however, it's for marshalling only, so I decided to write an unmarshaller.
Behold ejson (I wrote it). Right now it's a very simple ejson -> bson converter; there's no bson -> ejson yet, but you can use mejson for that.
An example:
const j = `{"_id":{"$oid":"53c2ab5e4291b17b666d742a"},"last_seen_at":{"$date":1405266782008},"display_name":{"$undefined":true},
"ref":{"$ref":"col2", "$id":"53c2ab5e4291b17b666d742b"}}`
type TestS struct {
Id bson.ObjectId `bson:"_id"`
LastSeenAt *time.Time `bson:"last_seen_at"`
DisplayName *string `bson:"display_name,omitempty"`
Ref mgo.DBRef `bson:"ref"`
}
func main() {
var ts TestS
if err := ejson.Unmarshal([]byte(j), &ts); err != nil {
panic(err)
}
fmt.Printf("%+v\n", ts)
//or to convert the ejson to bson.M
var m map[string]interface{}
if err := json.Unmarshal([]byte(j), &m); err != nil {
t.Fatal(err)
}
err := ejson.Normalize(m)
if err != nil {
panic(err)
}
fmt.Printf("%+v\n", m)
}