Rust: enable Serde to distinguish between undefined & null [duplicate] - json

I'd like to use Serde to parse some JSON as part of a HTTP PATCH request. Since PATCH requests don't pass the entire object, only the relevant data to update, I need the ability to tell between a value that was not passed, a value that was explicitly set to null, and a value that is present.
I have a value object with multiple nullable fields:
struct Resource {
a: Option<i32>,
b: Option<i32>,
c: Option<i32>,
}
If the client submits JSON like this:
{"a": 42, "b": null}
I'd like to change a to Some(42), b to None, and leave c unchanged.
I tried wrapping each field in one more level of Option:
#[derive(Debug, Deserialize)]
struct ResourcePatch {
a: Option<Option<i32>>,
b: Option<Option<i32>>,
c: Option<Option<i32>>,
}
playground
This does not make a distinction between b and c; both are None but I'd have wanted b to be Some(None).
I'm not tied to this representation of nested Options; any solution that can distinguish the 3 cases would be fine, such as one using a custom enum.

Building off of E_net4's answer, you can also create an enum for the three possibilities:
#[derive(Debug)]
enum Patch<T> {
Missing,
Null,
Value(T),
}
impl<T> Default for Patch<T> {
fn default() -> Self {
Patch::Missing
}
}
impl<T> From<Option<T>> for Patch<T> {
fn from(opt: Option<T>) -> Patch<T> {
match opt {
Some(v) => Patch::Value(v),
None => Patch::Null,
}
}
}
impl<'de, T> Deserialize<'de> for Patch<T>
where
T: Deserialize<'de>,
{
fn deserialize<D>(deserializer: D) -> Result<Self, D::Error>
where
D: Deserializer<'de>,
{
Option::deserialize(deserializer).map(Into::into)
}
}
This can then be used as:
#[derive(Debug, Deserialize)]
struct ResourcePatch {
#[serde(default)]
a: Patch<i32>,
}
Unfortunately, you still have to annotate each field with #[serde(default)] (or apply it to the entire struct). Ideally, the implementation of Deserialize for Patch would handle that completely, but I haven't figured out how to do that yet.

Quite likely, the only way to achieve that right now is with a custom deserialization function. Fortunately, it is not hard to implement, even to make it work for any kind of field:
fn deserialize_optional_field<'de, T, D>(deserializer: D) -> Result<Option<Option<T>>, D::Error>
where
D: Deserializer<'de>,
T: Deserialize<'de>,
{
Ok(Some(Option::deserialize(deserializer)?))
}
Then each field would be annotated as thus:
#[serde(deserialize_with = "deserialize_optional_field")]
a: Option<Option<i32>>,
You also need to annotate the struct with #[serde(default)], so that empty fields are deserialized to an "unwrapped" None. The trick is to wrap present values around Some.
Serialization relies on another trick: skipping serialization when the field is None:
#[serde(deserialize_with = "deserialize_optional_field")]
#[serde(skip_serializing_if = "Option::is_none")]
a: Option<Option<i32>>,
Playground with the full example. The output:
Original JSON: {"a": 42, "b": null}
> Resource { a: Some(Some(42)), b: Some(None), c: None }
< {"a":42,"b":null}

Building up on Shepmaster's answer and adding serialization.
use serde::ser::Error;
use serde::{Deserialize, Deserializer};
use serde::{Serialize, Serializer};
// #region ------ JSON Absent support
// build up on top of https://stackoverflow.com/a/44332837
/// serde Valueue that can be Absent, Null, or Valueue(T)
#[derive(Debug)]
pub enum Maybe<T> {
Absent,
Null,
Value(T),
}
#[allow(dead_code)]
impl<T> Maybe<T> {
pub fn is_absent(&self) -> bool {
match &self {
Maybe::Absent => true,
_ => false,
}
}
}
impl<T> Default for Maybe<T> {
fn default() -> Self {
Maybe::Absent
}
}
impl<T> From<Option<T>> for Maybe<T> {
fn from(opt: Option<T>) -> Maybe<T> {
match opt {
Some(v) => Maybe::Value(v),
None => Maybe::Null,
}
}
}
impl<'de, T> Deserialize<'de> for Maybe<T>
where
T: Deserialize<'de>,
{
fn deserialize<D>(deserializer: D) -> Result<Self, D::Error>
where
D: Deserializer<'de>,
{
let d = Option::deserialize(deserializer).map(Into::into);
d
}
}
impl<T: Serialize> Serialize for Maybe<T> {
fn serialize<S>(&self, serializer: S) -> Result<S::Ok, S::Error>
where
S: Serializer,
{
match self {
// this will be serialized as null
Maybe::Null => serializer.serialize_none(),
Maybe::Value(v) => v.serialize(serializer),
// should have been skipped
Maybe::Absent => Err(Error::custom(
r#"Maybe fields need to be annotated with:
#[serde(default, skip_serializing_if = "Maybe::is_Absent")]"#,
)),
}
}
}
// #endregion --- JSON Absent support
And then you can use it this way:
#[derive(Serialize, Deserialize, Debug)]
struct Rect {
#[serde(default, skip_serializing_if = "Maybe::is_absent")]
stroke: Maybe<i32>,
w: i32,
#[serde(default, skip_serializing_if = "Maybe::is_absent")]
h: Maybe<i32>,
}
// ....
let json = r#"
{
"stroke": null,
"w": 1
}"#;
let deserialized: Rect = serde_json::from_str(json).unwrap();
println!("deserialized = {:?}", deserialized);
// will output: Rect { stroke: Null, w: 1, h: Absent }
let serialized = serde_json::to_string(&deserialized).unwrap();
println!("serialized back = {}", serialized);
// will output: {"stroke":null,"w":1}
I wish Serde had a built-in way to handle JSON's null and absent states.
Update 2021-03-12 - Updated to Maybe::Absent as it is more JSON and SQL DSL idiomatic.
The catch with this approach is that we can express:
type | null with the default Option<type>
type | null | absent with Maybe<type>
But we cannot express
type | absent
The solution would be to refactor Maybe to just have ::Present(value) and ::Absent and support Maybe<Option<type>> for the type | null | absent. So this will give us full coverage.
type | null with the default Option<type>
type | absent with Maybe<type>
type | absent | null with Maybe<Option<type>>
I am trying to implement this without adding a #[serde(deserialize_with = "deserialize_maybe_field")] but not sure it is possible. I might be missing something obvious.

Related

Serde JSON deserializing enums

I have an enum:
#[derive(Serialize, Deserialize)]
enum Action {
Join,
Leave,
}
and a struct:
#[derive(Serialize, Deserialize)]
struct Message {
action: Action,
}
and I pass a JSON string:
"{\"action\":0}" // `json_string` var
but when I try deserialzing this like this:
let msg: Message = serde_json::from_str(json_string)?;
I get the error expected value at line 1 column 11.
In the JSON if I were to replace the number 0 with the string "Join" it works, but I want the number to correspond to the Action enum's values (0 is Action::Join, 1 is Action::Leave) since its coming from a TypeScript request. Is there a simple way to achieve this?
You want serde_repr!
Here's example code from the library's README:
use serde_repr::{Serialize_repr, Deserialize_repr};
#[derive(Serialize_repr, Deserialize_repr, PartialEq, Debug)]
#[repr(u8)]
enum SmallPrime {
Two = 2,
Three = 3,
Five = 5,
Seven = 7,
}
fn main() -> serde_json::Result<()> {
let j = serde_json::to_string(&SmallPrime::Seven)?;
assert_eq!(j, "7");
let p: SmallPrime = serde_json::from_str("2")?;
assert_eq!(p, SmallPrime::Two);
Ok(())
}
For your case:
use serde_repr::{Serialize_repr, Deserialize_repr};
#[derive(Serialize_repr, Deserialize_repr)]
#[repr(u8)]
enum Action {
Join = 0,
Leave = 1,
}
use serde::{Serialize, Deserialize};
#[derive(Serialize, Deserialize)]
struct Message {
action: Action,
}
Without adding any extra dependencies, the least verbose way is possibly to use "my favourite serde trick", the try_from and into container attributes. But in this case I feel that custom implementations of Deserialize and Serialize are more appropriate:
impl<'de> Deserialize<'de> for Action {
fn deserialize<D>(deserializer: D) -> Result<Self, D::Error>
where
D: serde::Deserializer<'de>,
{
match i8::deserialize(deserializer)? {
0 => Ok(Action::Join),
1 => Ok(Action::Leave),
_ => Err(serde::de::Error::custom("Expected 0 or 1 for action")),
}
}
}
impl Serialize for Action {
fn serialize<S>(&self, serializer: S) -> Result<S::Ok, S::Error>
where
S: serde::Serializer,
{
serializer.serialize_i8(match self {
Action::Join => 0,
Action::Leave => 1,
})
}
}
The custom implementations only redirect to serializing/deserializing i8. Playground

Is there a way to derive a struct like with Deserialize to get automatic transform from serde_json::Value?

Deserialising from a string directly into a struct works perfectly. But in some cases, you may already have a serde_json::Value in your hands, and want to try and convert it into a struct.
The following example illustrate just that: loading a Request struct from JSON (in a network library for example), with a type string and a generic content as a Value, and then you want to call a handler (from a client library) with the value transformed into a given struct.
use serde::Deserialize;
use serde_json::{json, Value};
use std::convert::TryFrom;
use std::error::Error;
#[derive(Deserialize)]
struct Request {
#[serde(alias = "type")]
req_type: String,
content: Value
}
#[derive(Deserialize)]
struct Person {
name: String,
age: u8
}
// It there a way to avoid having to declare this???
impl TryFrom<Value> for Person {
type Error = serde_json::Error;
fn try_from(value: Value) -> Result<Self, Self::Error> {
Person::deserialize(value)
}
}
fn say_hello(p: Person) {
println!("Hello, I'm {}, and I'm {} years old!", p.name, p.age);
}
fn main() -> Result<(), Box<dyn Error>> {
let req: Request = Request::deserialize(json!({
"type": "sayHello",
"content": {
"name": "Pierre",
"age": 32
}
}))?;
match req.req_type.as_str() {
"sayHello" => say_hello(req.content.try_into()?),
_ => println!("unknown request")
}
Ok(())
}
So the question is: is there some derive or other magic implemented which would allow the same behaviour as Deserialize from String, so that the client can only write:
#[derive(Deserialize)]
struct Person {
name: String,
age: u8
}
fn say_hello(p: Person) {
println!("Hello, I'm {}, and I'm {} years old!", p.name, p.age);
}
I tried the #[serde(try_from = "Value")] attribute but it does not look like it's intended for that purpose...
There is serde_json::from_value() specifically for this:
pub fn from_value<T>(value: Value) -> Result<T, Error>
where
T: DeserializeOwned,
Given any serde_json::Value and some T: DeserializedOwned, the function will deserialize the Value to that T, if possible.

How can I deserialize a type where all the fields are default values as a None instead?

I have to deserialize JSON blobs where in some places the absence of an entire object is encoded as an object with the same structure but all of its fields set to default values (empty strings and zeroes).
extern crate serde_json; // 1.0.27
#[macro_use] extern crate serde_derive; // 1.0.78
extern crate serde; // 1.0.78
#[derive(Debug, Deserialize)]
struct Test<T> {
text: T,
number: i32,
}
#[derive(Debug, Deserialize)]
struct Outer {
test: Option<Test<String>>,
}
#[derive(Debug, Deserialize)]
enum Foo { Bar, Baz }
#[derive(Debug, Deserialize)]
struct Outer2 {
test: Option<Test<Foo>>,
}
fn main() {
println!("{:?}", serde_json::from_str::<Outer>(r#"{ "test": { "text": "abc", "number": 42 } }"#).unwrap());
// good: Outer { test: Some(Test { text: "abc", number: 42 }) }
println!("{:?}", serde_json::from_str::<Outer>(r#"{ "test": null }"#).unwrap());
// good: Outer { test: None }
println!("{:?}", serde_json::from_str::<Outer>(r#"{ "test": { "text": "", "number": 0 } }"#).unwrap());
// bad: Outer { test: Some(Test { text: "", number: 0 }) }
// should be: Outer { test: None }
println!("{:?}", serde_json::from_str::<Outer2>(r#"{ "test": { "text": "Bar", "number": 42 } }"#).unwrap());
// good: Outer2 { test: Some(Test { text: Bar, number: 42 }) }
println!("{:?}", serde_json::from_str::<Outer2>(r#"{ "test": { "text": "", "number": 0 } }"#).unwrap());
// bad: error
// should be: Outer { test: None }
}
I would handle this after deserialization but as you can see this approach is not possible for enum values: no variant matches the empty string so the deserialization fails entirely.
How can I teach this to serde?
There are two things that need to be solved here: replacing Some(value) with None if value is all defaults, and handling the empty string case for Foo.
The first thing is easy. The Deserialize implementation for Option unconditionally deserializes it as Some if the input field isn't None, so you need to create a custom Deserialize implementation that replaces Some(value) with None if the value is equal to some sentinel, like the default (this is the answer proposed by Issac, but implemented correctly here):
fn none_if_all_default<'de, T, D>(deserializer: D) -> Result<Option<T>, D::Error>
where
T: Deserialize<'de> + Default + Eq,
D: Deserializer<'de>,
{
Option::deserialize(deserializer).map(|opt| match opt {
Some(value) if value == T::default() => None,
opt => opt,
})
}
#[derive(Deserialize)]
struct Outer<T: Eq + Default> {
#[serde(deserialize_with = "none_if_all_default")]
#[serde(bound(deserialize = "T: Deserialize<'de>"))]
test: Option<Test<T>>,
}
This solves the first half of your problem, with Option<Test<String>>. This will work for any deserializable type that is Eq + Default.
The enum case is much more tricky; the problem you're faced with is that Foo simply won't deserialize from a string other than "Bar" or "Baz". I don't really see a good solution for this other than adding a third "dead" variant to the enum:
#[derive(PartialEq, Eq, Deserialize)]
enum Foo {
Bar,
Baz,
#[serde(rename = "")]
Absent,
}
impl Default for Foo { fn default() -> Self { Self::Absent } }
The reason this problem exists from a data-modeling point of view is that it has to account for the possibility that you'll get json like this:
{ "test": { "text": "", "number": 42 } }
In this case, clearly Outer { test: None } is not the correct result, but it still needs a value to store in Foo, or else return a deserialization error.
If you want it to be the case that "" is valid text only if number is 0, you could do something significantly more elaborate and probably overkill for your needs, compared to just using Absent. You'd need to use an untagged enum, which can store either a "valid" Test or an "all empty" Test, and then create a version of your struct that only deserializes default values:
struct MustBeDefault<T> {
marker: PhantomData<T>
}
impl<'de, T> Deserialize<'de> for MustBeDefault<T>
where
T: Deserialize<'de> + Eq + Default
{
fn deserialize<D>(deserializer: D) -> Result<Self, D::Error>
where
D: Deserializer<'de>
{
match T::deserialize(deserializer)? == T::default() {
true => Ok(MustBeDefault { marker: PhantomData }),
false => Err(D::Error::custom("value must be default"))
}
}
}
// All fields need to be generic in order to use this solution.
// Like I said, this is radically overkill.
#[derive(Deserialize)]
struct Test<T, U> {
text: T,
number: U,
}
#[derive(Deserialize)]
#[serde(untagged)]
enum MaybeDefaultedTest<T> {
AllDefault(Test<EmptyString, MustBeDefault<i32>>),
Normal(Test<Foo, i32>),
}
// `EmptyString` is a type that only deserializes from empty strings;
// its implementation is left as an exercise to the reader.
// You'll also need to convert from MaybeDefaultedTest<T> to Option<T>;
// this is also left as an exercise to the reader.
It is now possible to write MaybeDefaulted<Foo>, which will deserialize from things like {"text": "", "number": 0} or {"text": "Baz", "number": 10} or {"text": "Baz", "number": 0}, but will fail to deserialize from {"text": "", "number": 10}.
Again, for the third time, this solution is probably radically overkill (especially if your real-world use case involves more than 2 fields in the Test struct), and so unless you have very intense data modeling requirements, you should go with adding an Absent variant to Foo.
You can look at an example of custom field deserializing.
In particular, you might want to define something like
extern crate serde; // 1.0.78
#[macro_use]
extern crate serde_derive; // 1.0.78
use serde::{Deserialize, Deserializer, de::Visitor};
fn none_if_all_default<'de, T, D>(deserializer: D) -> Result<Option<T>, D::Error>
where
T: Deserialize<'de>,
D: Deserializer<'de> + Clone,
{
struct AllDefault;
impl<'de> Visitor<'de> for AllDefault {
type Value = bool;
// Implement the visitor functions here -
// You can recurse over all values to check if they're
// the empty string or 0, and return true
//...
}
let all_default = deserializer.clone().deserialize_any(AllDefault)?;
if all_default {
Ok(None)
} else {
Ok(Some(T::deserialize(deserializer)?))
}
}
And then do
#[derive(Deserialize)]
struct Outer2 {
#[serde(deserialize_with = "none_if_all_default")]
test: Option<Test<Foo>>,
}

How can I do key-agnostic deserialization of JSON objects?

I am working with a less than ideal API that doesn't follow any rigid standard for sending data. Each payload comes with some payload info before the JSON, followed by the actual data inside which can be a single string or several fields.
As it stands right now, if I were to map every different payload to a struct I would end up with roughly 50 structs. I feel like this is not ideal, because a ton of these structs overlap in all but key. For instance, there are I believe 6 different versions of payloads that could be mapped to something like the following, but they all have different keys.
I have these two JSON examples:
{"key": "string"}
{"key2": "string"}
And I want to serialize both into this struct:
#[derive(Debug, Deserialize)]
struct SimpleString {
key: String,
}
The same can be said for two strings, and even a couple cases for three. The payloads are frustratingly unique in small ways, so my current solution is to just define the structs locally inside the function that deserializes them and then pass that data off wherever it needs to go (in my case a cache and an event handler)
Is there a better way to represent this that doesn't have so much duplication? I've tried looking for things like key-agnostic deserializing but I haven't found anything yet.
You can implement Deserialize for your type to decode a "map" and ignore the key name:
extern crate serde;
extern crate serde_json;
use std::fmt;
use serde::de::{Deserialize, Deserializer, Error, MapAccess, Visitor};
fn main() {
let a = r#"{"key": "string"}"#;
let b = r#"{"key2": "string"}"#;
let a: SimpleString = serde_json::from_str(a).unwrap();
let b: SimpleString = serde_json::from_str(b).unwrap();
assert_eq!(a, b);
}
#[derive(Debug, PartialEq)]
struct SimpleString {
key: String,
}
struct SimpleStringVisitor;
impl<'de> Visitor<'de> for SimpleStringVisitor {
type Value = SimpleString;
fn expecting(&self, formatter: &mut fmt::Formatter) -> fmt::Result {
formatter.write_str("an object with a single string value of any key name")
}
fn visit_map<M>(self, mut access: M) -> Result<Self::Value, M::Error>
where
M: MapAccess<'de>,
{
if let Some((_, key)) = access.next_entry::<String, _>()? {
if access.next_entry::<String, String>()?.is_some() {
Err(M::Error::custom("too many values"))
} else {
Ok(SimpleString { key })
}
} else {
Err(M::Error::custom("not enough values"))
}
}
}
impl<'de> Deserialize<'de> for SimpleString {
fn deserialize<D>(deserializer: D) -> Result<Self, D::Error>
where
D: Deserializer<'de>,
{
deserializer.deserialize_map(SimpleStringVisitor)
}
}

rustc-serialize custom enum decoding

I have a JSON structure where one of the fields of a struct could be either an object, or that object's ID in the database. Let's say the document looks like this with both possible formats of the struct:
[
{
"name":"pebbles",
"car":1
},
{
"name":"pebbles",
"car":{
"id":1,
"color":"green"
}
}
]
I'm trying to figure out the best way to implement a custom decoder for this. So far, I've tried a few different ways, and I'm currently stuck here:
extern crate rustc_serialize;
use rustc_serialize::{Decodable, Decoder, json};
#[derive(RustcDecodable, Debug)]
struct Car {
id: u64,
color: String
}
#[derive(Debug)]
enum OCar {
Id(u64),
Car(Car)
}
#[derive(Debug)]
struct Person {
name: String,
car: OCar
}
impl Decodable for Person {
fn decode<D: Decoder>(d: &mut D) -> Result<Person, D::Error> {
d.read_struct("root", 2, |d| {
let mut car: OCar;
// What magic must be done here to get the right OCar?
/* I tried something akin to this:
let car = try!(d.read_struct_field("car", 0, |r| {
let r1 = Car::decode(r);
let r2 = u64::decode(r);
// Compare both R1 and R2, but return code for Err() was tricky
}));
*/
/* And this got me furthest */
match d.read_struct_field("car", 0, u64::decode) {
Ok(x) => {
car = OCar::Id(x);
},
Err(_) => {
car = OCar::Car(try!(d.read_struct_field("car", 0, Car::decode)));
}
}
Ok(Person {
name: try!(d.read_struct_field("name", 0, Decodable::decode)),
car: car
})
})
}
}
fn main() {
// Vector of both forms
let input = "[{\"name\":\"pebbles\",\"car\":1},{\"name\":\"pebbles\",\"car\":{\"id\":1,\"color\":\"green\"}}]";
let output: Vec<Person> = json::decode(&input).unwrap();
println!("Debug: {:?}", output);
}
The above panics with an EOL which is a sentinel value rustc-serialize uses on a few of its error enums. Full line is
thread '<main>' panicked at 'called `Result::unwrap()` on an `Err` value: EOF', src/libcore/result.rs:785
What's the right way to do this?
rustc-serialize, or at least its JSON decoder, doesn't support that use case. If you look at the implementation of read_struct_field (or any other method), you can see why: it uses a stack, but when it encounters an error, it doesn't bother to restore the stack to its original state, so when you try to decode the same thing differently, the decoder is operating on an inconsistent stack, eventually leading to an unexpected EOF value.
I would recommend you look into Serde instead. Deserializing in Serde is different: instead of telling the decoder what type you're expecting, and having no clear way to recover if a value is of the wrong type, Serde calls into a visitor that can handle any of the types that Serde supports in the way it wants. This means that Serde will call different methods on the visitor depending on the actual type of the value it parsed. For example, we can handle integers to return an OCar::Id and objects to return an OCar::Car.
Here's a full example:
#![feature(custom_derive, plugin)]
#![plugin(serde_macros)]
extern crate serde;
extern crate serde_json;
use serde::de::{Deserialize, Deserializer, Error, MapVisitor, Visitor};
use serde::de::value::MapVisitorDeserializer;
#[derive(Deserialize, Debug)]
struct Car {
id: u64,
color: String
}
#[derive(Debug)]
enum OCar {
Id(u64),
Car(Car),
}
struct OCarVisitor;
#[derive(Deserialize, Debug)]
struct Person {
name: String,
car: OCar,
}
impl Deserialize for OCar {
fn deserialize<D>(deserializer: &mut D) -> Result<Self, D::Error> where D: Deserializer {
deserializer.deserialize(OCarVisitor)
}
}
impl Visitor for OCarVisitor {
type Value = OCar;
fn visit_u64<E>(&mut self, v: u64) -> Result<Self::Value, E> where E: Error {
Ok(OCar::Id(v))
}
fn visit_map<V>(&mut self, visitor: V) -> Result<Self::Value, V::Error> where V: MapVisitor {
Ok(OCar::Car(try!(Car::deserialize(&mut MapVisitorDeserializer::new(visitor)))))
}
}
fn main() {
// Vector of both forms
let input = "[{\"name\":\"pebbles\",\"car\":1},{\"name\":\"pebbles\",\"car\":{\"id\":1,\"color\":\"green\"}}]";
let output: Vec<Person> = serde_json::from_str(input).unwrap();
println!("Debug: {:?}", output);
}
Output:
Debug: [Person { name: "pebbles", car: Id(1) }, Person { name: "pebbles", car: Car(Car { id: 1, color: "green" }) }]
Cargo.toml:
[dependencies]
serde = "0.7"
serde_json = "0.7"
serde_macros = "0.7"