Unmarshal JSON with arbitrary key/value pairs to struct - json

Problem
Found many similar questions (title) but none solved my problem, so here is it.
I have a JSON string that contains some known fields (should always be present) plus any number of unknown/arbitrary fields.
Example
{"known1": "foo", "known2": "bar", "unknown1": "car", "unknown2": 1}
In this example known1 and known2 are known fields. unknown1 and unknown2 are arbitrary/unknown fields.
The unknown fields can have any name (key) and any value. The value type can be either a string, bool, float64 or int.
What I want is to find the simplest and most elegant (idiomatic) way to parse a message like this.
My solution
I've used the following struct:
type Message struct {
Known1 string `json:"known1"`
Known2 string `json:"known2"`
Unknowns []map[string]interface{}
}
Expected result
With this struct and the above sample JSON message I want to achieve a Message like the following (output from fmt.Printf("%+v", msg)):
{Known1:foo Known2:bar Unknowns:[map[unknown1:car] map[unknown2:1]]}
Attempts
1. Simple unmarshal
https://play.golang.org/p/WO6XlpK_vJg
This doesn't work, Unknowns is not filled with the remaining unknown key/value pairs as expected.
2. Double unmarshal
https://play.golang.org/p/Mw6fOYr-Km8
This works but I needed two unmarshals, one to fill the known fields (using an alias type to avoid an infinite loop) and a second one to get all fields as a map[string]interface{} and process the unknowns.
3. Unmarshal and type conversion
https://play.golang.org/p/a7itXObavrX
This works and seems the best option among my solutions.
Any other option?
Option 2 and 3 work but I'm curious if anyone has a simpler/more elegant solution.

TMTOWTDI, and I think you found a reasonable solution. Another option you could consider -- which I guess is similar to your option 2 -- is to unmarshal it onto a map[string]interface{}.
Then range over the keys and values and do whatever you need to with the data, running type assertions on the values as necessary. Depending on what you need to do, you may skip the struct entirely, or still populate it.
package main
import (
"encoding/json"
"fmt"
)
func main() {
jsonMsg := `{"known1": "foo", "known2": "bar", "unknown1": "car", "unknown2": 1}`
var msg map[string]interface{}
fmt.Println(json.Unmarshal([]byte(jsonMsg), &msg))
fmt.Printf("%+v", msg)
}
prints
<nil>
map[known1:foo known2:bar unknown1:car unknown2:1]
https://play.golang.org/p/Bl99Cq5tFWW

Probably there is no existing solution just for your situation.
As an addition to your solutions you may try to use raw parsing libraries like:
https://github.com/buger/jsonparser
https://github.com/json-iterator/go
https://github.com/nikandfor/json
Last one is mine and it has some unfinished features, but raw parsing is done.
You may also use reflection with libraries above to set known fields in the struct.

Related

A better solution to validate JSON unmarshal to nested structs

There appears to be few options to validate the source JSON used when unmarshalling to a struct. By validate I mean 3 main things:
a required field exists in the JSON
the field is the correct type (e.g. don't force a string into an integer)
the field contains a valid value (value range / enum)
For nested structs, I simply mean where an attribute in one struct has the type of another struct:
type Example struct {
Attr1 int `json:"attr1"`
Attr2 ExampleToo `json:"attr2"`
}
type ExampleToo struct {
Attr3 int `json:"attr3"`
}
And this JSON would be valid:
{"attr1": 5, "attr2": {"attr3": 0}}
To keep this simple, I'll focus simply on integers. The concept of "zero values" is the first issue. I could create an UnmarshalJSON method, which is detected by JSON packages, including the standard encoding/json package. The problem with this approach is that is that is does not support nested structs. If ExampleToo has an UnmarshalJSON method, the ExampleToo.UnmarshalJSON() method is never called if unmarshalling to an Example object. It would be possible to write a method Example.UnmarshalJSON() that recursively handled validation, but that seems extremely complex, especially if ExampleToo is reused in many places.
So there appears to be some packages like the go-playground/validator where validation can be specified both as functions and tags. However, this works on the struct created, and not the JSON itself. So if a field is tagged as validation:"required" on an integer, and the integer value is 0, this will return an error because 0 is both a valid value and the "zero value" for integers.
An example of the latter here: https://go.dev/play/p/zqSUksPzUiq
I could also use pointers for everything, checking for nil as missing values. The main problem with that is that it requires dereferencing on each use and is a pretty uncommon practice for things like integers and strings.
One thing that I have also considered is a "sister struct" that uses pointers to do validation for required fields. The process would basically be to write a validation method for each struct, then validate that sister struct. If it works, then deserialize the main struct (without pointers). I haven't started on this, just a concept I've thought about, but I'm hoping there are better validation options.
So... is there a better way to do JSON/YAML input validation on nested structs? I'm happy to mix methods where say UnmarshalJSON is used for doing some work like verifying fields exist, but I'd like to pass that back to the library to let it continue to call UnmarshalJSON on subsequent nested structs. I'd also rather defer to the JSON library for casting values into the struct, etc.

Excessive use of map[string]interface{} in go development?

The majority of my development experience has been from dynamically typed languages like PHP and Javascript. I've been practicing with Golang for about a month now by re-creating some of my old PHP/Javascript REST APIs in Golang. I feel like I'm not doing things the Golang way most of the time. Or more generally, I'm not use to working with strongly typed languages. I feel like I'm making excessive use of map[string]interface{} and slices of them to box up data as it comes in from http requests or when it gets shipped out as json http output. So what I'd like to know is if what I'm about to describe goes against the philosophy of golang development? Or if I'm breaking the principles of developing with strongly typed languages?
Right now, about 90% of the program flow for REST Apis I've rewritten with Golang can be described by these 5 steps.
STEP 1 - Receive Data
I receive http form data from http.Request.ParseForm() as formvals := map[string][]string. Sometimes I will store serialized JSON objects that need to be unmarshaled like jsonUserInfo := json.Unmarshal(formvals["user_information"][0]) /* gives some complex json object */.
STEP 2 - Validate Data
I do validation on formvals to make sure all the data values are what I expect before using it in SQL queries. I treat everyting as a string, then use Regex to determine if the string format and business logic is valid (eg. IsEmail, IsNumeric, IsFloat, IsCASLCompliant, IsEligibleForVoting,IsLibraryCardExpired etc...). I've written my own Regex and custom functions for these types of validations
STEP 3 - Bind Data to SQL Queries
I use golang's database/sql.DB to take my formvals and bind them to my Query and Exec functions like this Query("SELECT * FROM tblUser WHERE user_id = ?, user_birthday > ? ",formvals["user_id"][0], jsonUserInfo["birthday"]). I never care about the data types I'm supplying as arguments to be bound, so they're all probably strings. I trust the validation in the step immediately above has determined they are acceptable for SQL use.
STEP 4 - Bind SQL results to []map[string]interface{}{}
I Scan() the results of my queries into a sqlResult := []map[string]interface{}{} because I don't care if the value types are null, strings, float, ints or whatever. So the schema of an sqlResult might look like:
sqlResult =>
[0] {
"user_id":"1"
"user_name":"Bob Smith"
"age":"45"
"weight":"34.22"
},
[1] {
"user_id":"2"
"user_name":"Jane Do"
"age":nil
"weight":"22.22"
}
I wrote my own eager load function so that I can bind more information like so EagerLoad("tblAddress", "JOIN ON tblAddress.user_id",&sqlResult) which then populates sqlResult with more information of the type []map[string]interface{}{} such that it looks like this:
sqlResult =>
[0] {
"user_id":"1"
"user_name":"Bob Smith"
"age":"45"
"weight":"34.22"
"addresses"=>
[0] {
"type":"home"
"address1":"56 Front Street West"
"postal":"L3L3L3"
"lat":"34.3422242"
"lng":"34.5523422"
}
[1] {
"type":"work"
"address1":"5 Kennedy Avenue"
"postal":"L3L3L3"
"lat":"34.3422242"
"lng":"34.5523422"
}
},
[1] {
"user_id":"2"
"user_name":"Jane Do"
"age":nil
"weight":"22.22"
"addresses"=>
[0] {
"type":"home"
"address1":"56 Front Street West"
"postal":"L3L3L3"
"lat":"34.3422242"
"lng":"34.5523422"
}
}
STEP 5 - JSON Marshal and send HTTP Response
then I do a http.ResponseWriter.Write(json.Marshal(sqlResult)) and output data for my REST API
Recently, I've been revisiting articles with code samples that use structs in places I would have used map[string]interface{}. For example, I wanted to refactor Step 2 with a more standard approach that other golang developers would use. So I found this https://godoc.org/gopkg.in/go-playground/validator.v9, except all it's examples are with structs . I also noticed that most blogs that talk about database/sql scan their SQL results into typed variables or structs with typed properties, as opposed to my Step 4 which just puts everything into map[string]interface{}
Hence, i started writing this question. I feel the map[string]interface{} is so useful because majority of the time,I don't really care what the data is and it gives me to the freedom in Step 4 to construct any data schema on the fly before I dump it as JSON http response. I do all this with as little code verbosity as possible. But this means my code is not as ready to leverage Go's validation tools, and it doesn't seem to comply with the golang community's way of doing things.
So my question is, what do other golang developers do with regards to Step 2 and Step 4? Especially in Step 4...do Golang developers really encourage specifying the schema of the data through structs and strongly typed properties? Do they also specify structs with strongly typed properties along with every eager loading call they make? Doesn't that seem like so much more code verbosity?
It really depends on the requirements just like you have said you don't require to process the json it comes from the request or from the sql results. Then you can easily unmarshal into interface{}. And marshal the json coming from sql results.
For Step 2
Golang has library which works on validation of structs used to unmarshal json with tags for the fields inside.
https://github.com/go-playground/validator
type Test struct {
Field `validate:"max=10,min=1"`
}
// max will be checked then min
you can also go to godoc for validation library. It is very good implementation of validation for json values using struct tags.
For STEP 4
Most of the times, We use structs if we know the format and data of our JSON. Because it provides us more control over the data types and other functionality. For example if you wants to empty a JSON feild if you don't require it in your JSON. You should use struct with _ json tag.
Now you have said that you don't care if the result coming from sql is empty or not. But if you do it again comes to using struct. You can scan the result into struct with sql.NullTypes. With that also you can provide json tag for omitempty if you wants to omit the json object when marshaling the data when sending a response.
Struct values encode as JSON objects. Each exported struct field
becomes a member of the object, using the field name as the object
key, unless the field is omitted for one of the reasons given below.
The encoding of each struct field can be customized by the format
string stored under the "json" key in the struct field's tag. The
format string gives the name of the field, possibly followed by a
comma-separated list of options. The name may be empty in order to
specify options without overriding the default field name.
The "omitempty" option specifies that the field should be omitted from
the encoding if the field has an empty value, defined as false, 0, a
nil pointer, a nil interface value, and any empty array, slice, map,
or string.
As a special case, if the field tag is "-", the field is always
omitted. Note that a field with name "-" can still be generated using
the tag "-,".
Example of json tags
// Field appears in JSON as key "myName".
Field int `json:"myName"`
// Field appears in JSON as key "myName" and
// the field is omitted from the object if its value is empty,
// as defined above.
Field int `json:"myName,omitempty"`
// Field appears in JSON as key "Field" (the default), but
// the field is skipped if empty.
// Note the leading comma.
Field int `json:",omitempty"`
// Field is ignored by this package.
Field int `json:"-"`
// Field appears in JSON as key "-".
Field int `json:"-,"`
As you can analyze from above information given in Golang spec for json marshal. Struct provide so much control over json. That's why Golang developer most probably use structs.
Now on using map[string]interface{} you should use it when you don't the structure of your json coming from the server or the types of fields. Most Golang developers stick to structs wherever they can.

Go json.Unmarshal field case

I'm new to Go. I was trying to fetch and marshal json data to a struct. My sample data looks like this:
var reducedFieldData = []byte(`[
{"model":"Traverse","vin":"1gnkrhkd6ej111234"}
,{"model":"TL","vin":"19uua66265a041234"}
]`)
If I define the struct for receiving the data like this:
type Vehicle struct {
Model string
Vin string
}
The call to Unmarshal works as expected. However, if I use lower case for the fields ("model" and "vin") which actually matches cases for the field names in the data it will return empty strings for the values.
Is this expected behavior? Can the convention be turned off?
Fields need to be exported (declared with an uppercase first letter) or the reflection library cannot edit them. Since the JSON (un)marshaller uses reflection, it cannot read or write unexported fields.
So yes, it is expected, and no, you cannot change it. Sorry.
You can add tags to a field to change the name the marshaller uses:
Model string `json:"model"`
See the documentation for more info on the field tags "encoding/json" supports.

Get field name that err's in Go json unmarshal

I have some big json files that are slightly different in the types that the fields contain.
{ "a":"1" }
vs.
{ "a":1 }
When I unmarshal the second I get:
cannot unmarshal number into Go value of type string
However since these jsons are large I would like to have the actual field that is in error so I can fix them. The UnmarshalTypeError does not hold the Struct's field type.
Does anybody know of a way to get to field name? (not debugging I have a lot of different fields that err)
[EDIT]
I know how to solve the type conversion. What I need is a method to see what fields I need to apply that conversion to.
The short answer is that you can't.
However, to fix your problem, there is multiple solutions:
Dive into the json.Unmarshal source code to change its working and add the information you need: copy the function to a local package, do your edits, and use this function
Use a thrid-party tool to help you, for example a JSON validator compatible with JSON Schema: here is an online example, there is probably some better-suited tool
Now the UnmarshalTypeError, contains the field name.

Unmarshal a JSON array of heterogeneous structs

I want to deserialise an object that includes an array of a some interface Entity:
type Result struct {
Foo int;
Bar []Entity;
};
Entity is an interface that is implemented by a number of struct types. JSON data identifies the struct type with a "type" field in each entity. E.g.
{"type":"t1","field1":1}
{"type":"t2","field2":2,"field3":3}
How would I go about deserialising the Result type in such a way that it correctly populates the array. From what I can see, I have to:
Implement UnmarshalJSON on Result.
Parse Bar as a []*json.RawMessage.
Parse each raw message as map[string]interface{}.
Check "type" field in the raw message.
Create a struct of appropriate type.
Parse the raw message again, this time into the just created struct.
This all sounds very tedious and boring. Is there a better way to do this? Or am I doing it backwards, and there is a more canonical method to handle an array of heterogeneous objects?
I think your process is probably a bit more complicated than it has to be, see http://play.golang.org/p/0gahcMpuQc. A single map[string]interface{} will handle a lot of that for you.
Alternatively, you could make a type like
struct EntityUnion {
Type string
// Fields from t1
// Fields from t2
// ...
}
Unmarshal into that; it will set the Type string and fill in all the fields it can get from the JSON data. Then you just need a small function to copy the fields to the specific type.