Determine a JSON tag efficiently - json

I have a bunch of JSON files, each containing a very large array of complex data. The JSON files look something like:
ids.json:
{
"ids": [1,2,3]
}
names.json:
{
"names": ["Tyrion","Jaime","Cersei"]
}
and so on. (In reality, the array elements are complex struct objects with tens of fields.)
I want to extract just the tag that specifies what kind of array the file contains. Currently I'm using encoding/json to unmarshal the whole file into a map[string]interface{} and iterating through the map, but that is too costly an operation.
Is there a faster way of doing this, preferably without unmarshaling the entire file?

You can advance the reader to just past the opening curly brace, then use json.Decoder to decode only the first value (the key) from the reader.
Something along these lines:
sr := strings.NewReader(`{
"ids": [1,2,3]
}`)
for {
b, err := sr.ReadByte()
if err != nil {
fmt.Println(err)
return
}
if b == '{' {
break
}
}
d := json.NewDecoder(sr)
var key string
err := d.Decode(&key)
if err != nil {
fmt.Println(err)
return
}
fmt.Println(key)
https://play.golang.org/p/xJJEqj0tFk9
Additionally, you may wrap the io.Reader you obtained from opening the file in a bufio.Reader to avoid many single-byte reads against the file.
This solution assumes the contents are a valid JSON object, which any approach would have to assume anyway.

I had a play around with Decoder.Token() reading one token at a time (see this example, line 87), and this works to extract your array label:
const jsonStream = `{
"ids": [1,2,3]
}`
dec := json.NewDecoder(strings.NewReader(jsonStream))
t, err := dec.Token()
if err != nil {
log.Fatal(err)
}
fmt.Printf("First token: %v\n", t)
t, err = dec.Token()
if err != nil {
log.Fatal(err)
}
fmt.Printf("Second token (array label): %v\n", t)

Related

Go reading map from json stream

I need to parse a really long JSON file (more than a million items). I don't want to load it all into memory; I want to read it chunk by chunk. There's a good example with an array of items here. The problem is that I am dealing with a map, and when I call Decode I get "not at beginning of value".
I can't figure out what should be changed.
const data = `{
"object1": {"name": "cattle","location": "kitchen"},
"object2": {"name": "table","location": "office"}
}`
type ReadObject struct {
Name string `json:"name"`
Location string `json:"location"`
}
func ParseJSON() {
dec := json.NewDecoder(strings.NewReader(data))
tkn, err := dec.Token()
if err != nil {
log.Fatalf("failed to read opening token: %v", err)
}
fmt.Printf("opening token: %v\n", tkn)
objects := make(map[string]*ReadObject)
for dec.More() {
var nextSymbol string
if err := dec.Decode(&nextSymbol); err != nil {
log.Fatalf("failed to parse next symbol: %v", err)
}
nextObject := &ReadObject{}
if err := dec.Decode(&nextObject); err != nil {
log.Fatalf("failed to parse next object")
}
objects[nextSymbol] = nextObject
}
tkn, err = dec.Token()
if err != nil {
log.Fatalf("failed to read closing token: %v", err)
}
fmt.Printf("closing token: %v\n", tkn)
fmt.Printf("OBJECTS: \n%v\n", objects)
}
TL;DR: the first call to Token() moves the decoder past the beginning of the JSON value, and that is why you get the error.
You are working with this struct (link):
type Decoder struct {
// other fields omitted for simplicity
tokenState int
}
Pay attention to the tokenState field. Its value can be one of (link):
const (
tokenTopValue = iota
tokenArrayStart
tokenArrayValue
tokenArrayComma
tokenObjectStart
tokenObjectKey
tokenObjectColon
tokenObjectValue
tokenObjectComma
)
Let's go back to your code. You call the Token() method. It reads the first valid JSON token, {, and changes tokenState from tokenTopValue to tokenObjectStart (link). You are now in the "in-an-object" state.
If you call Decode() at this point you get an error (not at beginning of value). This is because Decode() is only allowed when tokenState is tokenTopValue, tokenArrayStart, tokenArrayValue, or tokenObjectValue, i.e. when the decoder is positioned at a "full" value, not part of one (link).
To avoid this you can simply not call Token() at all and do something like this:
dec := json.NewDecoder(strings.NewReader(data))
objects := make(map[string]*ReadObject)
if err := dec.Decode(&objects); err != nil {
log.Fatalf("failed to decode objects: %v", err)
}
fmt.Printf("OBJECTS: \n%v\n", objects)
Or, if you want to read chunk by chunk, you can keep calling Token() until the decoder is positioned at a "full" value, and then call Decode() on that value (I guess this should work).
After consuming the initial { with your first call to dec.Token(), you must:
use dec.Token() to extract the next key;
after extracting the key, call dec.Decode(&nextObject) to decode the entry.
Example code:
for dec.More() {
key, err := dec.Token()
if err != nil {
// handle error
}
var val interface{}
err = dec.Decode(&val)
if err != nil {
// handle error
}
fmt.Printf(" %s : %v\n", key, val)
}
https://play.golang.org/p/5r1d8MsNlKb

How to read a JSON object in Go without decoding it (for use in reading a large stream)

I am reading JSON in response to an HTTP endpoint and would like to extract the contents of an array of objects which is nested inside. The response can be large so I am trying to use a streaming approach instead of just json.Unmarshal'ing the whole thing. The JSON looks like so:
{
"useless_thing_1": { /* etc */ },
"useless_thing_2": { /* etc */ },
"the_things_i_want": [
{ /* complex object I want to json.Unmarshal #1 */ },
{ /* complex object I want to json.Unmarshal #2 */ },
{ /* complex object I want to json.Unmarshal #3 */ },
/* could be many thousands of these */
],
"useless_thing_3": { /* etc */ },
}
The json library provided with Go has json.Unmarshal which works well for complete JSON objects. It also has json.Decoder which can unmarshal full objects or provide individual tokens. I can use this tokenizer to carefully go through and extract things but the logic to do so is somewhat complex and I cannot then easily still use json.Unmarshal on the object after I've read it as tokens.
The json.Decoder is buffered which makes it difficult to read one object (i.e. { /* complex object I want to json.Unmarshal #1 */ }) and then consume the , myself and make a new json.Decoder - because it will try to consume the comma itself. This is the approach I tried and haven't been able to make work.
I'm looking for a better solution to this problem. Here is the broken code when I tried to manually consume the commas:
// code here that naively looks for `"the_things_i_want": [` and
// puts the next bytes after that in `buffer`
// this is the rest of the stream starting from `{ /* complex object I want to json.Unmarshal #1 */ },`
in := io.MultiReader(buffer, res.Body)
dec := json.NewDecoder(in)
for {
var p MyComplexThing
err := dec.Decode(&p)
if err != nil {
panic(err)
}
// steal the comma from in directly - this does not work because the decoder buffers its input
var b1 [1]byte
_, err = io.ReadAtLeast(in, b1[:], 1) // returns random data from later in the stream
if err != nil {
panic(err)
}
switch b1[0] {
case ',':
// skip over it
case ']':
break // we're done
default:
panic(fmt.Errorf("Unexpected result from read %#v", b1))
}
}
Use Decoder.Token and Decoder.More to decode a JSON document as a stream.
Walk through the document with Decoder.Token to the JSON value of interest. Call Decoder.Decode to unmarshal the JSON value to a Go value. Repeat as needed to slurp up all values of interest.
Here's some code with commentary explaining how it works:
func decode(r io.Reader) error {
d := json.NewDecoder(r)
// We expect that the JSON document is an object.
if err := expect(d, json.Delim('{')); err != nil {
return err
}
// While there are fields in the object...
for d.More() {
// Get field name
t, err := d.Token()
if err != nil {
return err
}
// Skip value if not the field that we are looking for.
if t != "the_things_i_want" {
if err := skip(d); err != nil {
return err
}
continue
}
// We expect JSON array value for the field.
if err := expect(d, json.Delim('[')); err != nil {
return err
}
// While there are more JSON array elements...
for d.More() {
// Unmarshal and process the array element.
var m map[string]interface{}
if err := d.Decode(&m); err != nil {
return err
}
fmt.Printf("found %v\n", m)
}
// We are done decoding the array.
return nil
}
return errors.New("things I want not found")
}
// skip skips the next value in the JSON document.
func skip(d *json.Decoder) error {
n := 0
for {
t, err := d.Token()
if err != nil {
return err
}
switch t {
case json.Delim('['), json.Delim('{'):
n++
case json.Delim(']'), json.Delim('}'):
n--
}
if n == 0 {
return nil
}
}
}
// expect returns an error if the next token in the document is not expectedT.
func expect(d *json.Decoder, expectedT interface{}) error {
t, err := d.Token()
if err != nil {
return err
}
if t != expectedT {
return fmt.Errorf("got token %v, want token %v", t, expectedT)
}
return nil
}
Run it on the playground.

find and delete nested json object in Go

I have a json document of a Kubernetes Pod, here's an example:
https://github.com/itaysk/kubectl-neat/blob/master/test/fixtures/pod-1-raw.json
I'd like to traverse spec.containers[i].volumeMounts and delete those volumeMount objects where the .name starts with "default-token-". Note that both containers and volumeMounts are arrays.
Using jq it took me 1 min to write this 1 line: try del(.spec.containers[].volumeMounts[] | select(.name | startswith("default-token-"))). I'm trying to rewrite this in Go.
While looking for a good json library I settled on gjson/sjson.
Since sjson doesn't support array accessors (the # syntax), and gjson doesn't support getting the path of result, I looked for workarounds.
I've tried using Result.Index to delete the result from the byte slice directly, and succeeded, but for the query I wrote (spec.containers.#.volumeMounts.#(name%\"default-token-*\")|0) the Index is always 0 (I tried different variations of it, same result).
So currently I have some 25 lines of code that use gjson to get spec.containers.#.volumeMounts, iterate through the structure, and eventually use sjson.Delete to delete.
It works, but it feels way more complicated than I expected it to be.
Is there a better way to do this in Go? I'm willing to switch json library if needed.
EDIT: I would prefer to avoid using a typed schema because I may need to perform this on different types, for some I don't have the full schema.
(also removed some distracting details about my current implementation)
The easiest thing to do here is parse the JSON into an object, work with that object, then serialise back into JSON.
Kubernetes provides a Go client library that defines the v1.Pod struct, which you can unmarshal into using the stdlib encoding/json:
// import "k8s.io/api/core/v1"
var pod v1.Pod
if err := json.Unmarshal(podBody, &pod); err != nil {
log.Fatalf("parsing pod json: %s", err)
}
From there you can read pod.Spec.Containers and their VolumeMounts:
// Modify.
for c := range pod.Spec.Containers {
container := &pod.Spec.Containers[c]
// Walk backwards so that removing an element does not shift the
// indices of the elements still to be visited.
for i := len(container.VolumeMounts) - 1; i >= 0; i-- {
if strings.HasPrefix(container.VolumeMounts[i].Name, "default-token-") {
// Remove the VolumeMount at index i.
container.VolumeMounts = append(container.VolumeMounts[:i], container.VolumeMounts[i+1:]...)
}
}
}
https://play.golang.org/p/3r5-XKIazhK
If you're worried about losing some arbitrary JSON which might appear in your input, you may instead wish to define var pod map[string]interface{} and then type-assert each of the properties within, as in spec, ok := pod["spec"].(map[string]interface{}), containers, ok := spec["containers"].([]interface{}) and so on. (encoding/json decodes JSON arrays into []interface{}, so each element needs its own assertion to map[string]interface{}.)
Hope that helps.
ps. The "removing" is following https://github.com/golang/go/wiki/SliceTricks#delete
To take a totally different approach from before, you could create a
type Root struct {
fields struct {
Spec *Spec `json:"spec,omitempty"`
}
other map[string]interface{}
}
with custom UnmarshalJSON which unmarshals into both fields and other, and custom MarshalJSON which sets other["spec"] = json.RawMessage(spec.MarshalJSON()) before returning json.Marshal(other):
func (v *Root) UnmarshalJSON(b []byte) error {
if err := json.Unmarshal(b, &v.fields); err != nil {
return err
}
if v.other == nil {
v.other = make(map[string]interface{})
}
if err := json.Unmarshal(b, &v.other); err != nil {
return err
}
return nil
}
func (v *Root) MarshalJSON() ([]byte, error) {
var err error
if v.other["spec"], err = rawMarshal(v.fields.Spec); err != nil {
return nil, err
}
return json.Marshal(v.other)
}
func rawMarshal(v interface{}) (json.RawMessage, error) {
b, err := json.Marshal(v)
if err != nil {
return nil, err
}
return json.RawMessage(b), nil
}
You then define these sorts of types all the way down through .spec.containers.volumeMounts and have a Container.MarshalJSON which throws away any VolumeMounts we don't like:
func (v *Container) MarshalJSON() ([]byte, error) {
mounts := v.fields.VolumeMounts
// Walk backwards so that removals do not disturb the indices still to be visited.
for i := len(mounts) - 1; i >= 0; i-- {
if strings.HasPrefix(mounts[i].fields.Name, "default-token-") {
mounts = append(mounts[:i], mounts[i+1:]...)
}
}
var err error
if v.other["volumeMounts"], err = rawMarshal(mounts); err != nil {
return nil, err
}
return json.Marshal(v.other)
}
Full playground example: https://play.golang.org/p/k1603cchwC7
I wouldn't do this.

merge two map[string]interface{} from json

I have two JSON inputs built this way:
"count: 1 result: fields"
I would like to concatenate the fields that I find within result without using a defined structure. I have tried many approaches, but most of the time the result is either an error about the type interface{} or the last map overwriting the data.
I would like the fields from both the first and the second map to be merged within "result" in the output.
oracle, err := http.Get("http://XXX:8080/XXXX/"+id)
if err != nil {
panic(err)
}
defer oracle.Body.Close()
mysql, err := http.Get("http://XXX:3000/XXX/"+id)
if err != nil {
panic(err)
}
defer mysql.Body.Close()
oracleJSON, err := ioutil.ReadAll(oracle.Body)
if err != nil {
panic(err)
}
mysqlJSON, err := ioutil.ReadAll(mysql.Body)
if err != nil {
panic(err)
}
var oracleOUT map[string]interface{}
var mysqlOUT map[string]interface{}
json.Unmarshal(oracleJSON, &oracleOUT)
json.Unmarshal(mysqlJSON, &mysqlOUT)
a := oracleOUT["result"]
b := mysqlOUT["result"]
c.JSON(http.StatusOK, gin.H{"result": ????})
This is an example of the JSON:
{"count":1,"result":{"COD_DIPENDENTE":"00060636","MATRICOLA":"60636","COGNOME":"PIPPO"}}
If I have two JSON documents like this, the result of the function should be
`"result":{"COD_DIPENDENTE":"00060636","MATRICOLA":"60636","COGNOME":"PIPPO","COD_DIPENDENTE":"00060636","MATRICOLA":"60636","COGNOME":"PIPPO"}}`
The output you are looking for is not valid JSON. However with a small change you can output something very similar to your example that is valid JSON.
You probably do want to use a defined structure for the portion of the input that has a known structure, so that you can extract the more abstract "result" section more easily.
If you start at the top of the input structure using a map[string]interface{} then you'll have to do a type assertion on the "result" key. For example:
var input map[string]interface{}
err = json.Unmarshal(data, &input)
if err != nil {
return err
}
keys, ok := input["result"].(map[string]interface{})
if !ok {
return errors.New("wasn't the type we expected")
}
However if you used a defined structure for the top level you can do it like the following which feels much cleaner.
type Input struct {
Count int `json:"count"`
Result map[string]interface{} `json:"result"`
}
var input Input
err = json.Unmarshal(data, &input)
if err != nil {
return err
}
// from here you can use input.Result directly without a type assertion
To generate output that has duplicate keys, you could use an array of objects with a single key/value pair in each, then you end up with a valid JSON structure that does not overwrite keys. Here's how to do that (playground link):
package main
import (
"encoding/json"
"fmt"
)
type Input struct {
Count int `json:"count"`
Result map[string]interface{} `json:"result"`
}
type Output struct {
Count int `json:"count"`
Result []map[string]interface{} `json:"result"`
}
var inputdata = [][]byte{
[]byte(`{"count":1,"result":{"COD_DIPENDENTE":"00060636", "MATRICOLA":"60636", "COGNOME":"PIPPO"}}`),
[]byte(`{"count":1,"result":{"COD_DIPENDENTE":"00060636", "MATRICOLA":"60636", "COGNOME":"PIPPO"}}`),
}
func main() {
inputs := make([]Input, len(inputdata))
for i := range inputs {
err := json.Unmarshal(inputdata[i], &inputs[i])
if err != nil {
panic(err)
}
}
var out Output
out.Count = len(inputs)
for _, input := range inputs {
for k, v := range input.Result {
out.Result = append(out.Result, map[string]interface{}{k: v})
}
}
outdata, err := json.Marshal(out)
if err != nil {
panic(err)
}
fmt.Println(string(outdata))
}
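Which produces output that looks like this when formatted (the order of the entries within each input can differ between runs, because Go does not guarantee map iteration order):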
{
"count": 2,
"result": [
{"MATRICOLA": "60636"},
{"COGNOME": "PIPPO"},
{"COD_DIPENDENTE": "00060636"},
{"COGNOME": "PIPPO"},
{"COD_DIPENDENTE": "00060636"},
{"MATRICOLA": "60636"}
]
}

JSON single value parsing

In Python you can take a JSON object and grab a specific item from it directly. In Go, I have to declare a struct, unmarshal into it, and then read the value. Is there a package or an easier way to get a specific value from JSON in Go?
Python:
res = res.json()
return res['results'][0]
Go:
type Quotes struct {
AskPrice string `json:"ask_price"`
}
quote := new(Quotes)
errJson := json.Unmarshal(content, &quote)
if errJson != nil {
return "nil", fmt.Errorf("cannot read json body: %v", errJson)
}
You can decode into a map[string]interface{} and then get the element by key.
func main() {
b := []byte(`{"ask_price":"1.0"}`)
data := make(map[string]interface{})
err := json.Unmarshal(b, &data)
if err != nil {
panic(err)
}
if price, ok := data["ask_price"].(string); ok {
fmt.Println(price)
} else {
panic("wrong type")
}
}
Structs are often preferred as they are more explicit about the type. You only have to declare the fields in the JSON you care about, and you don't need to type assert the values as you would with a map (encoding/json handles that implicitly).
Try either fastjson or jsonparser. jsonparser is optimized for the case when a single JSON field must be selected, while fastjson is optimized for the case when multiple unrelated JSON fields must be selected.
Below is example code for fastjson:
var p fastjson.Parser
// content is a []byte here; use p.Parse instead if you have a string
v, err := p.ParseBytes(content)
if err != nil {
log.Fatal(err)
}
// obtain v["ask_price"] as float64 (yields 0 if the value is missing or is not a JSON number)
price := v.GetFloat64("ask_price")
// obtain v["results"][0] as generic JSON value
result0 := v.Get("results", "0")