UnmarshalJSON any type from []interface{} possible? - json

When you unmarshal JSON to []interface{} is there any way to automatically detect the type besides some standard types like bool, int and string?
What I noticed is the following, Let's say I marshal [uuid.UUID, bool] then the JSON I get looks like:
[[234,50,7,116,194,41,64,225,177,151,60,195,60,45,123,106],true]
When I unmarshal it again, I get the types as shown through reflect:
[]interface{}, bool
I don't understand why it picked []interface{}. If it cannot detect it, shouldn't it be at least interface{}?
In any case, my question is, is it possible to unmarshal any type when the target is of type []interface{}? It seems to work for standard types like string, bool, int but for custom types I don't think that's possible, is it? You can define custom JSON marshal/unmarshal methods but that only works if you decode it into a target type so that it can look up which custom marshal/unmarshal methods to use.

You can unmarshal any type into a value of type interface{}. If you use a value of type []interface{}, you can only unmarshal JSON arrays into it, but yes, the elements of the array may be of any type.
Since you're using interface{} or []interface{}, yes, type information is not available, and it's up to the encoding/json package to choose the best it sees fit. For example, for JSON objects it will choose map[string]interface{}. The full list of default types is documented in json.Unmarshal():
To unmarshal JSON into an interface value, Unmarshal stores one of these in the interface value:
bool, for JSON booleans
float64, for JSON numbers
string, for JSON strings
[]interface{}, for JSON arrays
map[string]interface{}, for JSON objects
nil for JSON null
Obviously if your JSON marshaling/unmarshaling logic needs some pre- / postprocessing, the json package will not miraculously find it out. It can know about those only if you unmarshal into values of specific types (which implement json.Unmarshaler). The json package will still be able to unmarshal them to the default types, but custom logic will obviously not run on them.

Related

A better solution to validate JSON unmarshal to nested structs

There appears to be few options to validate the source JSON used when unmarshalling to a struct. By validate I mean 3 main things:
a required field exists in the JSON
the field is the correct type (e.g. don't force a string into an integer)
the field contains a valid value (value range / enum)
For nested structs, I simply mean where an attribute in one struct has the type of another struct:
type Example struct {
Attr1 int `json:"attr1"`
Attr2 ExampleToo `json:"attr2"`
}
type ExampleToo struct {
Attr3 int `json:"attr3"`
}
And this JSON would be valid:
{"attr1": 5, "attr2": {"attr3": 0}}
To keep this simple, I'll focus simply on integers. The concept of "zero values" is the first issue. I could create an UnmarshalJSON method, which is detected by JSON packages, including the standard encoding/json package. The problem with this approach is that is that is does not support nested structs. If ExampleToo has an UnmarshalJSON method, the ExampleToo.UnmarshalJSON() method is never called if unmarshalling to an Example object. It would be possible to write a method Example.UnmarshalJSON() that recursively handled validation, but that seems extremely complex, especially if ExampleToo is reused in many places.
So there appears to be some packages like the go-playground/validator where validation can be specified both as functions and tags. However, this works on the struct created, and not the JSON itself. So if a field is tagged as validation:"required" on an integer, and the integer value is 0, this will return an error because 0 is both a valid value and the "zero value" for integers.
An example of the latter here: https://go.dev/play/p/zqSUksPzUiq
I could also use pointers for everything, checking for nil as missing values. The main problem with that is that it requires dereferencing on each use and is a pretty uncommon practice for things like integers and strings.
One thing that I have also considered is a "sister struct" that uses pointers to do validation for required fields. The process would basically be to write a validation method for each struct, then validate that sister struct. If it works, then deserialize the main struct (without pointers). I haven't started on this, just a concept I've thought about, but I'm hoping there are better validation options.
So... is there a better way to do JSON/YAML input validation on nested structs? I'm happy to mix methods where say UnmarshalJSON is used for doing some work like verifying fields exist, but I'd like to pass that back to the library to let it continue to call UnmarshalJSON on subsequent nested structs. I'd also rather defer to the JSON library for casting values into the struct, etc.

How can I use serde_json to deserialize arbitrary JSON into a Value-like object of raw bytes?

I'm writing a library to deserialize a subset of JSON into predefined Python types.
I want to deserialize arbitrary JSON into an object that quacks like serde-json's Value. However, I don't want it to deserialize into String's, Number's and Bool's - instead when the deserializer hits one of these I would prefer it simply keeps a reference to the respective byte string so I can efficiently (i.e. without the additional type conversion) parse the byte strings into the correct arbitrary Python types. Something like this:
use serde::Deserialize;
use serde_json::value::RawValue;
use serde_json::Map;
#[derive(Deserialize)]
pub enum MyValue<'a> {
Null,
Bytes(&'a RawValue),
Array(Vec<MyValue<'a>>),
Object(Map<String, MyValue<'a>>),
}
This will require writing a lot of traits so that it behaves like Value, and I'm not even sure if it won't just ignore deserializing the structural parts and put everything into a RawValue.
What is the cleanest way to do this?

What's the difference between UnmarshalText and UnmarshalJson?

In decode.go, it mentions:
// To unmarshal JSON into a value implementing the Unmarshaler interface,
// Unmarshal calls that value's UnmarshalJSON method, including
// when the input is a JSON null.
// Otherwise, if the value implements encoding.TextUnmarshaler
// and the input is a JSON quoted string, Unmarshal calls that value's
// UnmarshalText method with the unquoted form of the string.
What are the differences between UnmarshalText and UnmarshalJSON? Which one is preferred?
Simply:
UnmarshalText unmarshals a text-encoded value.
UnmarshalJSON unmarshals a JSON-encoded value.
Which is preferred depends on what you're doing.
JSON encoding is defined by RFC 7159. If you're consuming or producing JSON documents, you should use JSON encoding.
Text encoding has no standard, and is entirely implementation-dependent. Go implements Text-(un)marshalers for a few types, but there's no guarantee that any other application will understand these formats.
Text-encoding is most commonly used for things like URL query parameters, HTML forms, or other loosely-defined formats.
If you have a choice in the matter, using JSON is probably a better way to go. But again, it depends on what you're doing what makes the most sense.
As it relates to Go's JSON unmarshaler, the JSON unmarshaler will call a type's UnmarshalJSON method, if it's defined, and fall back to UnmarshalText if that is defined.
If you know you'll be using JSON, you should absolutely define an UnmarshalJSON function.
You would generally create an UnmarshalText only if you expected it to be used in non-JSON contexts, with the added benefit that the JSON unmarshaler would also use it, without having to duplicate it (if indeed the same implementation would work for JSON).
Per the documentation:
To unmarshal JSON into a value implementing the Unmarshaler interface,
Unmarshal calls that value's UnmarshalJSON method, including when the
input is a JSON null. Otherwise, if the value implements
encoding.TextUnmarshaler and the input is a JSON quoted string,
Unmarshal calls that value's UnmarshalText method with the unquoted
form of the string.
Meaning: if you want to take some JSON and unmarshal it with some custom logic, you would use UnmarshalJSON. If you want to take the text in a string field of a JSON document and decode that in some special way (i.e. parse it rather than just write it into a string-typed field), you would use UnmarshalText. For example, net.IP implements UnmarshalText so that you can provide a string value like "ipAddress": "1.2.3.4" and unmarshal it into a net.IP field. If net.IP did not implement UnmarshalText, you would only be able to unmarshal the JSON representation of the underlying type ([]byte).

How to unmarshal from interface{} to interface{} in Go

There are multiple nodes in my system which communicate through RPC. I am trying to send a map[string] interface{} to another node through RPC. Sender uses json.Marshal and receiver uses json.Unmarshal to get the map.
Let us say at the sender side, map contains [1] => 2 where 2 is of type uint32.
The problem is Unmarshal tries to find the type of underlying data and converts 2 to float64 type according to its default behavior as specified here https://blog.golang.org/json-and-go. Later, casting float64 to uint32 causes panic.
I refered to How to unmarshal json into interface{} in golang? . But for this, we need to know the type of data. In my case data can be of any type so I want to keep it as interface{}. How do I unmarshal from interface{} to interface{}?
Unfortunately using the encoding/json package you can't, because type information is not transmitted, and JSON numbers by default are unmarshaled into values of float64 type if type information is not present. You would need to define struct types where you explicitly state the field is of type uint32.
Alternatively you may opt to use encoding/gob which does transmit and preserve type information. See this example:
m := map[string]interface{}{"1": uint32(1)}
b := &bytes.Buffer{}
gob.NewEncoder(b).Encode(m)
var m2 map[string]interface{}
gob.NewDecoder(b).Decode(&m2)
fmt.Printf("%T\n%#v\n", m2["1"], m2)
Output (try it on the Go Playground):
uint32
map[string]interface {}{"1":0x1}
The downside of gob is that it's Go-specific unlike the language and platform independent JSON.

Unmarshal a JSON array of heterogeneous structs

I want to deserialise an object that includes an array of a some interface Entity:
type Result struct {
Foo int;
Bar []Entity;
};
Entity is an interface that is implemented by a number of struct types. JSON data identifies the struct type with a "type" field in each entity. E.g.
{"type":"t1","field1":1}
{"type":"t2","field2":2,"field3":3}
How would I go about deserialising the Result type in such a way that it correctly populates the array. From what I can see, I have to:
Implement UnmarshalJSON on Result.
Parse Bar as a []*json.RawMessage.
Parse each raw message as map[string]interface{}.
Check "type" field in the raw message.
Create a struct of appropriate type.
Parse the raw message again, this time into the just created struct.
This all sounds very tedious and boring. Is there a better way to do this? Or am I doing it backwards, and there is a more canonical method to handle an array of heterogeneous objects?
I think your process is probably a bit more complicated than it has to be, see http://play.golang.org/p/0gahcMpuQc. A single map[string]interface{} will handle a lot of that for you.
Alternatively, you could make a type like
struct EntityUnion {
Type string
// Fields from t1
// Fields from t2
// ...
}
Unmarshal into that; it will set the Type string and fill in all the fields it can get from the JSON data. Then you just need a small function to copy the fields to the specific type.