Decode JSON as it is still streaming in via net/http

In the past I've used Go to decode JSON from an API endpoint in the manner shown below.
client := &http.Client{}
req, err := http.NewRequest("GET", "https://some/api/endpoint", nil)
res, err := client.Do(req)
defer res.Body.Close()
buf, _ := ioutil.ReadAll(res.Body)
// ... Do some error checking etc ...
err = json.Unmarshal(buf, &response)
I am shortly going to be working on an endpoint that could send me several megabytes of JSON data in the following format.
{
  "somefield": "value",
  "items": [
    { LARGE OBJECT },
    { LARGE OBJECT },
    { LARGE OBJECT },
    { LARGE OBJECT },
    ...
  ]
}
The JSON will at some point contain an arbitrarily long array of large objects. I want to take each one of these objects and place it, separately, onto a message queue. I do not need to decode the objects themselves.
If I used my normal method, this would load the entire response into memory before decoding it.
Is there a good way to split out each of the LARGE OBJECT items while the response is still streaming in and dispatch each one off to the queue? I'm doing this to avoid holding so much data in memory.

Decoding a JSON stream is possible with the json.Decoder.
With Decoder.Decode(), we may read (unmarshal) a single value without consuming and unmarshaling the complete stream. This is cool, but your input is a "single" JSON object, not a series of JSON objects, which means a call to Decoder.Decode() would attempt to unmarshal the complete JSON object with all items (large objects).
What we want is partial, on-the-fly processing of a single JSON object. For this, we may use Decoder.Token(), which parses (advances) only the next token in the JSON input stream and returns it. This is called event-driven parsing.
Of course we have to "process" (interpret and act upon) the tokens and build a "state machine" that keeps track of where we are in the JSON structure we're processing.
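To get a feel for what Token() emits, here is a minimal sketch (assuming the usual imports and a jsonStream string like the one below) that simply prints every token along with its type. Delimiters arrive as json.Delim, object keys and string values as string, and numbers as float64:

dec := json.NewDecoder(strings.NewReader(jsonStream))
for {
    t, err := dec.Token()
    if err == io.EOF {
        break // end of input
    }
    if err != nil {
        log.Fatal(err)
    }
    fmt.Printf("%T: %v\n", t, t)
}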
Here's an implementation that solves your problem.
We will use the following JSON input:
{
  "somefield": "value",
  "otherfield": "othervalue",
  "items": [
    { "id": "1", "data": "data1" },
    { "id": "2", "data": "data2" },
    { "id": "3", "data": "data3" },
    { "id": "4", "data": "data4" }
  ]
}
And we'll read the items, the "large objects", modeled by this type:
type LargeObject struct {
    Id   string `json:"id"`
    Data string `json:"data"`
}
We will also parse and interpret other fields in the JSON object, but we will only log / print them.
For brevity and easy error handling, we'll use this helper error handler function:
he := func(err error) {
    if err != nil {
        log.Fatal(err)
    }
}
And now let's see some action. In the example below, for brevity and to have a working demonstration on the Go Playground, we'll read from a string value. To read from an actual HTTP response body, we only have to change a single line, namely how we create the json.Decoder:
dec := json.NewDecoder(res.Body)
So the demonstration:
dec := json.NewDecoder(strings.NewReader(jsonStream))

// We expect an object
t, err := dec.Token()
he(err)
if delim, ok := t.(json.Delim); !ok || delim != '{' {
    log.Fatal("Expected object")
}

// Read props
for dec.More() {
    t, err = dec.Token()
    he(err)
    prop := t.(string)
    if prop != "items" {
        var v interface{}
        he(dec.Decode(&v))
        log.Printf("Property '%s' = %v", prop, v)
        continue
    }

    // It's the "items". We expect it to be an array
    t, err := dec.Token()
    he(err)
    if delim, ok := t.(json.Delim); !ok || delim != '[' {
        log.Fatal("Expected array")
    }

    // Read items (large objects)
    for dec.More() {
        // Read next item (large object)
        lo := LargeObject{}
        he(dec.Decode(&lo))
        fmt.Printf("Item: %+v\n", lo)
    }

    // Array closing delim
    t, err = dec.Token()
    he(err)
    if delim, ok := t.(json.Delim); !ok || delim != ']' {
        log.Fatal("Expected array closing")
    }
}

// Object closing delim
t, err = dec.Token()
he(err)
if delim, ok := t.(json.Delim); !ok || delim != '}' {
    log.Fatal("Expected object closing")
}
This will produce the following output:
2009/11/10 23:00:00 Property 'somefield' = value
2009/11/10 23:00:00 Property 'otherfield' = othervalue
Item: {Id:1 Data:data1}
Item: {Id:2 Data:data2}
Item: {Id:3 Data:data3}
Item: {Id:4 Data:data4}
Try the full, working example on the Go Playground.
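Since the question says the items themselves don't need to be decoded, the inner loop above could instead decode each element into a json.RawMessage, which just captures the raw bytes, and hand those to the queue. A minimal sketch, where publish is a hypothetical stand-in for your message queue client:

// Read items (large objects) without decoding them
for dec.More() {
    var raw json.RawMessage
    he(dec.Decode(&raw))
    publish([]byte(raw)) // raw holds the item's JSON bytes, undecoded
}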

If you want the parsing to be as efficient as possible, you could read key-value pairs from the stream and tokenize them yourself using the lexer from the mailru/easyjson library:
r := bufio.NewReader(stream)
var err error
for err == nil {
    var pair []byte
    pair, err = r.ReadBytes(',')
    x := jlexer.Lexer{
        Data: pair,
    }
    fmt.Printf("%q = ", x.String())
    x.WantColon()
    fmt.Printf("%d\n", x.Int())
}
Note that error handling and some additional checks are skipped for the sake of simplicity. Here's the full working example: https://play.golang.org/p/kk-7aEotqFd

Related

Proper json unmarshaling in Go with the empty interface

I'm currently learning Go and (probably like many others before me) I'm trying to properly understand the empty interface.
As an exercise, I'm reading a big json file produced by Postman and trying to access just one field (out of the many available).
Here is a simple representation of the JSON without the unnecessary fields I don't want to read (but that are still there):
{
  "results": [
    {
      "times": [
        1,
        2,
        3,
        4
      ]
    }
  ]
}
Since the JSON object is big, I opted out of unmarshaling it with a custom struct, and instead decided to use the empty interface, interface{}.
After some time, I managed to get some working code, but I'm quite sure this isn't the correct way of doing it.
byteValue, _ := ioutil.ReadAll(jsonFile)
var result map[string]interface{}
err = json.Unmarshal(byteValue, &result)
if err != nil {
    log.Fatalln(err)
}
// ESPECIALLY UGLY
r := result["results"].([]interface{})
r1 := r[0].(map[string]interface{})
r2 := r1["times"].([]interface{})
times := make([]float64, len(r2))
for i := range r2 {
    times[i] = r2[i].(float64)
}
Is there a better way to navigate through my JSON object without having to instantiate new variables every time I move deeper and deeper into the object?
Even if the JSON is large, you only have to define the fields you actually care about.
You only need to use JSON tags if the keys aren't valid Go identifiers (the keys are valid identifiers in this case), and even then you can sometimes avoid them by using a map[string]something.
Unless you need the sub-structs for some function or whatnot, you don't need to define them.
Unless you need to reuse the type, you don't even have to name it; you can define the struct at declaration time.
Example:
package main

import (
    "encoding/json"
    "fmt"
)

const s = `
{
  "results": [
    {
      "times": [1, 2, 3, 4]
    }
  ]
}
`

func main() {
    var t struct {
        Results []struct {
            Times []int
        }
    }
    json.Unmarshal([]byte(s), &t)
    fmt.Printf("%+v\n", t) // {Results:[{Times:[1 2 3 4]}]}
}
[...] trying to access just one field (out of the many available).
For this concrete use case I would use a library to query and access to a single value in a known path like:
https://github.com/jmespath/go-jmespath
On the other hand, if you're practicing how to access nested values in a JSON document, I would recommend trying to write a recursive function that follows a path in an unknown structure, the same way (but simpler) that go-jmespath does.
Ok, I challenged myself and spent an hour writing this. It works. Not sure about performance or bugs and it's really limited :)
https://play.golang.org/p/dlIsmG6Lk-p
package main

import (
    "encoding/json"
    "errors"
    "fmt"
    "strings"
)

func main() {
    // I just added a bit more data to the structure to be able to test different paths
    fileContent := []byte(`
    {"results": [
        {"times": [
            1,
            2,
            3,
            4
        ]},
        {"times2": [
            5,
            6,
            7,
            8
        ]},
        {"username": "rosadabril"},
        {"age": 42},
        {"location": [41.5933262, 1.8376757]}
    ],
    "more_results": {
        "nested_1": {
            "nested_2":{
                "foo": "bar"
            }
        }
    }
    }`)

    var content map[string]interface{}
    if err := json.Unmarshal(fileContent, &content); err != nil {
        panic(err)
    }

    // some paths to test
    valuePaths := []string{
        "results.times",
        "results.times2",
        "results.username",
        "results.age",
        "results.doesnotexist",
        "more_results.nested_1.nested_2.foo",
    }

    for _, p := range valuePaths {
        breadcrumbs := strings.Split(p, ".")
        value, err := search(breadcrumbs, content)
        if err != nil {
            fmt.Printf("\nerror searching '%s': %s\n", p, err)
            continue
        }
        fmt.Printf("\nFOUND A VALUE IN: %s\n", p)
        fmt.Printf("Type: %T\nValue: %#v\n", value, value)
    }
}

// search is our fantastic recursive function! The idea is to search the structure
// in a very basic way; for complex querying use jmespath.
func search(breadcrumbs []string, content map[string]interface{}) (interface{}, error) {
    // We should never hit this point, but better safe than sorry: without this
    // check we could incur an out-of-range error below.
    if len(breadcrumbs) == 0 {
        return nil, errors.New("ran out of breadcrumbs :'(")
    }

    // Flag that indicates we are at the end of our trip and we should return the
    // value without more checks.
    lastBreadcrumb := len(breadcrumbs) == 1

    // The current breadcrumb is always the first element.
    currentBreadcrumb := breadcrumbs[0]

    if value, found := content[currentBreadcrumb]; found {
        if lastBreadcrumb {
            return value, nil
        }

        // If the value is a map[string]interface{}, go down the rabbit hole: recursion!
        if aMap, isAMap := value.(map[string]interface{}); isAMap {
            // We call ourselves, popping the first breadcrumb and passing the current map.
            return search(breadcrumbs[1:], aMap)
        }

        // If it's an array of interfaces, the thing gets complicated :(
        if anArray, isArray := value.([]interface{}); isArray {
            for _, something := range anArray {
                if aMap, isAMap := something.(map[string]interface{}); isAMap && len(breadcrumbs) > 1 {
                    if v, err := search(breadcrumbs[1:], aMap); err == nil {
                        return v, nil
                    }
                }
            }
        }
    }
    return nil, errors.New("woops, nothing here")
}

How to read a JSON object in Go without decoding it (for use in reading a large stream)

I am reading JSON in response to an HTTP endpoint and would like to extract the contents of an array of objects which is nested inside. The response can be large, so I am trying to use a streaming approach instead of just json.Unmarshal'ing the whole thing. The JSON looks like so:
{
  "useless_thing_1": { /* etc */ },
  "useless_thing_2": { /* etc */ },
  "the_things_i_want": [
    { /* complex object I want to json.Unmarshal #1 */ },
    { /* complex object I want to json.Unmarshal #2 */ },
    { /* complex object I want to json.Unmarshal #3 */ },
    /* could be many thousands of these */
  ],
  "useless_thing_3": { /* etc */ },
}
The json library provided with Go has json.Unmarshal, which works well for complete JSON documents. It also has json.Decoder, which can unmarshal full objects or provide individual tokens. I can use this tokenizer to carefully go through and extract things, but the logic to do so is somewhat complex, and I can't then easily use json.Unmarshal on an object after I've read it as tokens.
The json.Decoder is buffered, which makes it difficult to read one object (i.e. { /* complex object I want to json.Unmarshal #1 */ }), then consume the , myself and make a new json.Decoder - because the first decoder will try to consume the comma itself. This is the approach I tried and haven't been able to make work.
I'm looking for a better solution to this problem. Here is the broken code from when I tried to manually consume the commas:
// Code here that naively looks for `"the_things_i_want": [` and
// puts the next bytes after that in `buffer`.
// `buffer` is the rest of the stream starting from
// `{ /* complex object I want to json.Unmarshal #1 */ },`
in := io.MultiReader(buffer, res.Body)
dec := json.NewDecoder(in)
for {
    var p MyComplexThing
    err := dec.Decode(&p)
    if err != nil {
        panic(err)
    }
    // Steal the comma from `in` directly - this does not work because
    // the decoder buffers its input.
    var b1 [1]byte
    _, err = io.ReadAtLeast(in, b1[:], 1) // returns random data from later in the stream
    if err != nil {
        panic(err)
    }
    switch b1[0] {
    case ',':
        // skip over it
    case ']':
        break // we're done
    default:
        panic(fmt.Errorf("Unexpected result from read %#v", b1))
    }
}
Use Decoder.Token and Decoder.More to decode a JSON document as a stream.
Walk through the document with Decoder.Token to the JSON value of interest. Call Decoder.Decode to unmarshal the JSON value into a Go value. Repeat as needed to slurp up all values of interest.
Here's some code with commentary explaining how it works:
func decode(r io.Reader) error {
    d := json.NewDecoder(r)

    // We expect that the JSON document is an object.
    if err := expect(d, json.Delim('{')); err != nil {
        return err
    }

    // While there are fields in the object...
    for d.More() {

        // Get field name
        t, err := d.Token()
        if err != nil {
            return err
        }

        // Skip value if not the field that we are looking for.
        if t != "the_things_i_want" {
            if err := skip(d); err != nil {
                return err
            }
            continue
        }

        // We expect a JSON array value for the field.
        if err := expect(d, json.Delim('[')); err != nil {
            return err
        }

        // While there are more JSON array elements...
        for d.More() {

            // Unmarshal and process the array element.
            var m map[string]interface{}
            if err := d.Decode(&m); err != nil {
                return err
            }
            fmt.Printf("found %v\n", m)
        }

        // We are done decoding the array.
        return nil
    }
    return errors.New("things I want not found")
}

// skip skips the next value in the JSON document.
func skip(d *json.Decoder) error {
    n := 0
    for {
        t, err := d.Token()
        if err != nil {
            return err
        }
        switch t {
        case json.Delim('['), json.Delim('{'):
            n++
        case json.Delim(']'), json.Delim('}'):
            n--
        }
        if n == 0 {
            return nil
        }
    }
}

// expect returns an error if the next token in the document is not expectedT.
func expect(d *json.Decoder, expectedT interface{}) error {
    t, err := d.Token()
    if err != nil {
        return err
    }
    if t != expectedT {
        return fmt.Errorf("got token %v, want token %v", t, expectedT)
    }
    return nil
}
Run it on the playground.
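As an aside on the buffering problem from the question: json.Decoder does expose a Buffered() method, which returns a reader over the data the decoder has read ahead but not yet consumed. If you ever do need to hand the remainder of the stream to something else after a Decode, you can stitch it back together. A minimal sketch, assuming dec and res.Body are the decoder and stream from the question:

// Reconstruct the remaining input: first whatever the decoder
// buffered but did not consume, then the rest of the network stream.
rest := io.MultiReader(dec.Buffered(), res.Body)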

Determine a JSON tag efficiently

I have a bunch of JSON files, each containing a very large array of complex data. The JSON files look something like:
ids.json:
{
  "ids": [1,2,3]
}
names.json:
{
  "names": ["Tyrion","Jaime","Cersei"]
}
and so on. (In reality, the array elements are complex struct objects with tens of fields.)
I want to extract just the tag that specifies what kind of array it contains. Currently I'm using encoding/json to unmarshal the whole file into a map[string]interface{} and iterate through the map but that is too costly an operation.
Is there a faster way of doing this, preferably without the involvement of unmarshaling entire data?
You can offset the reader to right after the opening curly brace, then use json.Decoder to decode only the first token from the reader.
Something along these lines:
sr := strings.NewReader(`{
  "ids": [1,2,3]
}`)
for {
    b, err := sr.ReadByte()
    if err != nil {
        fmt.Println(err)
        return
    }
    if b == '{' {
        break
    }
}
d := json.NewDecoder(sr)
var key string
err := d.Decode(&key)
if err != nil {
    fmt.Println(err)
    return
}
fmt.Println(key)
https://play.golang.org/p/xJJEqj0tFk9
Additionally, you may wrap the io.Reader you obtained from Open in a bufio.Reader to avoid multiple single-byte reads.
This solution assumes the contents are a valid JSON object. Not that you could avoid that assumption anyway.
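Applied to a file, the bufio-wrapped version might look like the following sketch (assuming the usual imports and that ids.json holds the example document from the question):

f, err := os.Open("ids.json")
if err != nil {
    log.Fatal(err)
}
defer f.Close()
br := bufio.NewReader(f) // buffers the single-byte reads below
for {
    b, err := br.ReadByte()
    if err != nil {
        log.Fatal(err)
    }
    if b == '{' {
        break
    }
}
var key string
if err := json.NewDecoder(br).Decode(&key); err != nil {
    log.Fatal(err)
}
fmt.Println(key) // ids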
I had a play around with Decoder.Token(), reading one token at a time, and this works to extract your array label:
const jsonStream = `{
  "ids": [1,2,3]
}`
dec := json.NewDecoder(strings.NewReader(jsonStream))
t, err := dec.Token()
if err != nil {
    log.Fatal(err)
}
fmt.Printf("First token: %v\n", t)
t, err = dec.Token()
if err != nil {
    log.Fatal(err)
}
fmt.Printf("Second token (array label): %v\n", t)

Converting YAML to JSON in Go

I have a config file in YAML format, which I am trying to output as JSON via an HTTP API call. I am unmarshalling using gopkg.in/yaml.v2. YAML can have non-string keys, which means the YAML is unmarshalled as map[interface{}]interface{}, which is not supported by Go's JSON marshaller. Therefore I convert to map[string]interface{} before marshalling to JSON. But I still get: json: unsupported type: map[interface {}]interface {}. I don't understand. The variable cfy is not map[interface{}]interface{}.
import (
    "encoding/json"
    "io/ioutil"
    "net/http"

    "gopkg.in/yaml.v2"
)

func GetConfig(w http.ResponseWriter, r *http.Request) {
    cfy := make(map[interface{}]interface{})
    f, err := ioutil.ReadFile("config/config.yml")
    if err != nil {
        // error handling
    }
    if err := yaml.Unmarshal(f, &cfy); err != nil {
        // error handling
    }
    // convert to a type that json.Marshal can digest
    cfj := make(map[string]interface{})
    for key, value := range cfy {
        switch key := key.(type) {
        case string:
            cfj[key] = value
        }
    }
    j, err := json.Marshal(cfj)
    if err != nil {
        // error handling. We get: "json: unsupported type: map[interface {}]interface {}"
    }
    w.Header().Set("content-type", "application/json")
    w.Write(j)
}
Your solution only converts values at the "top" level. If a value is also a map (nested map), your solution does not convert those.
Also, you only "copy" the values with string keys; the rest will be left out of the result map.
Here's a function that recursively converts nested maps:
func convert(m map[interface{}]interface{}) map[string]interface{} {
    res := map[string]interface{}{}
    for k, v := range m {
        switch v2 := v.(type) {
        case map[interface{}]interface{}:
            res[fmt.Sprint(k)] = convert(v2)
        default:
            res[fmt.Sprint(k)] = v
        }
    }
    return res
}
Testing it:
m := map[interface{}]interface{}{
    1:     "one",
    "two": 2,
    "three": map[interface{}]interface{}{
        "3.1": 3.1,
    },
}
m2 := convert(m)
data, err := json.Marshal(m2)
if err != nil {
    panic(err)
}
fmt.Println(string(data))
Output (try it on the Go Playground):
{"1":"one","three":{"3.1":3.1},"two":2}
Some things to note:
To convert interface{} keys, I used fmt.Sprint(), which will handle all types. The switch could have a dedicated string case for keys that are already string values, to avoid calling fmt.Sprint(). This is solely for performance reasons; the result will be the same.
The above convert() function does not go into slices, so if the map contains a value that is a slice ([]interface{}) which may itself contain maps, those will not be converted. A sketch of how slices could be handled follows; for a full solution, see the lib below.
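Here's a minimal sketch of a value-level converter that also recurses into slices (an extension of convert() above for illustration, not the library's actual implementation):

func convertValue(v interface{}) interface{} {
    switch v2 := v.(type) {
    case map[interface{}]interface{}:
        m := map[string]interface{}{}
        for k, val := range v2 {
            m[fmt.Sprint(k)] = convertValue(val)
        }
        return m
    case []interface{}:
        // Convert elements in place; they may themselves be maps or slices.
        for i, val := range v2 {
            v2[i] = convertValue(val)
        }
        return v2
    default:
        return v
    }
}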
There is a lib github.com/icza/dyno which has optimized, built-in support for this (disclosure: I'm the author). Using dyno, this is how it would look:
var m map[interface{}]interface{} = ...
m2 := dyno.ConvertMapI2MapS(m)
dyno.ConvertMapI2MapS() also goes into and converts maps in []interface{} slices.
Also see possible duplicate: Convert yaml to json without struct

How to decode complicated unnamed JSON in Golang

I am currently trying to decode the following JSON structure:
[
  {
    "2015-08-14 19:29:48-04:00": {
      "value": "0.1",
      "measurement_tag_id": "0.1.1a",
      "UTC_time": "2015-08-14 23:29:48",
      "error": "0"
    }
  },
  {
    "2015-08-14 19:37:07-04:00": {
      "value": "0.1",
      "measurement_tag_id": "0.1.1b",
      "UTC_time": "2015-08-14 23:37:07",
      "error": "0"
    }
  },
  {
    "2015-08-14 19:44:16-04:00": {
      "value": "0.1",
      "measurement_tag_id": "0.1.1b",
      "UTC_time": "2015-08-14 23:44:16",
      "error": "0"
    }
  }
]
This is to eventually have a slice of reading structs, defined as follows:
type reading struct {
    Value   string `json:"value"`
    MTID    string `json:"measurement_tag_id"`
    UTCTime string `json:"UTC_time"`
    Error   string `json:"error"`
}
I would then like to add this into an existing structure nested as:
type site struct {
    Name string
    ID   string
    Tags []tag
}

type tag struct {
    ID       string
    Readings []reading
}
I've currently been able to create the base structure for sites and tags from a more typical JSON payload with appropriate keys. I have been unsuccessful though in figuring out how to decode the reading JSON. So far the closest I have gotten is via map[string]interface{} chaining, but this feels incredibly clunky and verbose.
Solution so far for reference:
var readingData []interface{}
if err := json.Unmarshal(file, &readingData); err != nil {
    panic(err)
}
readings := readingData[0].(map[string]interface{})
firstReading := readings["2015-08-14 19:29:48-04:00"].(map[string]interface{})
fmt.Println(firstReading)
value := firstReading["value"].(string)
error := firstReading["error"].(string)
MTID := firstReading["measurement_tag_id"].(string)
UTCTime := firstReading["UTC_time"].(string)
fmt.Println(value, error, MTID, UTCTime)
While I am not sure if it's necessary yet, I would also like to hold on to the arbitrary date keys. My first thought was to create a function that returned a map[string]reading, but I am not sure how feasible this is.
Thanks for the help in advance!
You can have your reading type implement the json.Unmarshaler interface.
func (r *reading) UnmarshalJSON(data []byte) error {
    type _r reading // same structure, but no methods, avoids infinite calls to this method
    m := map[string]_r{}
    if err := json.Unmarshal(data, &m); err != nil {
        return err
    }
    for _, v := range m {
        *r = reading(v)
    }
    return nil
}
https://play.golang.org/p/7X1oB77XL4
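With that method in place, the outer array unmarshals directly into a slice of readings. A minimal usage sketch, where data is assumed to hold the JSON document from the question:

var rs []reading
if err := json.Unmarshal([]byte(data), &rs); err != nil {
    panic(err)
}
fmt.Printf("%+v\n", rs) // [{Value:0.1 MTID:0.1.1a ...} ...]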
Another way is to use a slice of maps to parse, then copy the values into a slice of readings, e.g.:
var readingMaps []map[string]reading // slice of maps of string key to reading value
if err := json.Unmarshal([]byte(data), &readingMaps); err != nil {
    panic(err)
}
readings := []reading{}
for _, m := range readingMaps {
    for _, r := range m {
        readings = append(readings, r)
    }
}
play.golang.org/p/jXTdmaZz7s
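Since the question also mentions wanting to hold on to the arbitrary date keys, the same slice-of-maps decode can preserve them, building the map[string]reading the asker had in mind. A minimal sketch reusing readingMaps from above:

byTime := map[string]reading{} // date key -> reading
for _, m := range readingMaps {
    for ts, r := range m {
        byTime[ts] = r
    }
}
// byTime["2015-08-14 19:29:48-04:00"] now yields the first reading.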