Unable to tackle optional fields in JSON with Rustc-serialize

I am trying to deserialize JSON into a Rust struct using rustc_serialize. The problem is that some JSON documents have optional fields, which may or may not be present. As soon as the first absent field is encountered, the decoder seems to bail out and ignore subsequent fields, even if they are present. Is there a way to overcome this?
Here is the code:
extern crate rustc_serialize;
#[derive(Debug)]
struct B {
some_field_0: Option<u64>,
some_field_1: Option<String>,
}
impl rustc_serialize::Decodable for B {
fn decode<D: rustc_serialize::Decoder>(d: &mut D) -> Result<Self, D::Error> {
Ok(B {
some_field_0: d.read_struct_field("some_field_0", 0, |d| rustc_serialize::Decodable::decode(d)).ok(),
some_field_1: d.read_struct_field("some_field_1", 0, |d| rustc_serialize::Decodable::decode(d)).ok(),
})
}
}
fn main() {
{
println!("--------------------------------\n1st run - all field present\n--------------------------------");
let json_str = "{\"some_field_0\": 1234, \"some_field_1\": \"There\"}".to_string();
let obj_b: B = rustc_serialize::json::decode(&json_str).unwrap();
println!("\nJSON: {}\nDecoded: {:?}", json_str, obj_b);
}
{
println!("\n\n--------------------------------\n2nd run - \"some_field_1\" absent\n---------------------------------");
let json_str = "{\"some_field_0\": 1234}".to_string();
let obj_b: B = rustc_serialize::json::decode(&json_str).unwrap();
println!("\nJSON: {}\nDecoded: {:?}", json_str, obj_b);
}
{
println!("\n\n--------------------------------\n3rd run - \"some_field_0\" absent\n---------------------------------");
let json_str = "{\"some_field_1\": \"There\"}".to_string();
let obj_b: B = rustc_serialize::json::decode(&json_str).unwrap();
println!("\nJSON: {}\nDecoded: {:?}", json_str, obj_b);
}
}
and here's the output:
--------------------------------
1st run - all field present
--------------------------------
JSON: {"some_field_0": 1234, "some_field_1": "There"}
Decoded: B { some_field_0: Some(1234), some_field_1: Some("There") }
--------------------------------
2nd run - "some_field_1" absent
---------------------------------
JSON: {"some_field_0": 1234}
Decoded: B { some_field_0: Some(1234), some_field_1: None }
--------------------------------
3rd run - "some_field_0" absent
---------------------------------
JSON: {"some_field_1": "There"}
Decoded: B { some_field_0: None, some_field_1: None }
Notice that the third run produces an unexpected result. When the decoder fails to find some_field_0 it fails on all subsequent tokens, even though some_field_1 is present.

There's something wrong with your Decodable implementation. Using the automatically-generated implementation works:
#[derive(Debug, RustcDecodable)]
struct B {
some_field_1: Option<String>,
some_field_0: Option<u64>,
}
JSON: {"some_field_1": "There"}
Decoded: B { some_field_1: Some("There"), some_field_0: None }
Using the generated implementation is the right thing to do if you can. If you cannot, here's the right implementation:
impl rustc_serialize::Decodable for B {
fn decode<D: rustc_serialize::Decoder>(d: &mut D) -> Result<Self, D::Error> {
Ok(B {
some_field_0: try!(d.read_struct_field("some_field_0", 0, |d| rustc_serialize::Decodable::decode(d))),
some_field_1: try!(d.read_struct_field("some_field_1", 1, |d| rustc_serialize::Decodable::decode(d))),
})
}
}
The important change is the use of try!. Decoding can fail. By using ok, you were saying that a failed decoding was actually a success, albeit a successful decoding of a None.
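To make the difference between ok and try! concrete, here is a minimal sketch that uses only the standard library. It does not call rustc-serialize: the read_u64 helper and the HashMap input are hypothetical stand-ins for the decoder, and both fields are u64 to keep it short. One detail from the question is worth noting: since the field type is Option<u64> and ok yields Option<T>, the closure in the question is asked to decode a plain u64 rather than an Option<u64>.

```rust
use std::collections::HashMap;

#[derive(Debug, PartialEq)]
struct B {
    some_field_0: Option<u64>,
    some_field_1: Option<u64>,
}

// Hypothetical stand-in for a decoder call: a missing key is a valid `None`,
// while a value that is present but malformed is a real error.
fn read_u64(obj: &HashMap<&str, &str>, key: &str) -> Result<Option<u64>, String> {
    match obj.get(key) {
        None => Ok(None),
        Some(raw) => raw
            .parse::<u64>()
            .map(Some)
            .map_err(|e| format!("field `{}`: {}", key, e)),
    }
}

// Mirrors the question: `.ok()` turns every error into `None`, so a malformed
// field cannot be told apart from an absent one.
fn decode_lenient(obj: &HashMap<&str, &str>) -> B {
    B {
        some_field_0: read_u64(obj, "some_field_0").ok().flatten(),
        some_field_1: read_u64(obj, "some_field_1").ok().flatten(),
    }
}

// Mirrors the answer: `?` (the modern spelling of `try!`) propagates errors,
// and absence is still reported as `None` by the reader itself.
fn decode_strict(obj: &HashMap<&str, &str>) -> Result<B, String> {
    Ok(B {
        some_field_0: read_u64(obj, "some_field_0")?,
        some_field_1: read_u64(obj, "some_field_1")?,
    })
}

fn main() {
    let obj = HashMap::from([("some_field_0", "oops"), ("some_field_1", "7")]);
    println!("{:?}", decode_lenient(&obj));
    println!("{:?}", decode_strict(&obj));
}
```

The lenient version silently reports the malformed field as None, while the strict version surfaces the error and still maps a genuinely missing key to None.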

Related

Serde JSON deserializing enums

I have an enum:
#[derive(Serialize, Deserialize)]
enum Action {
Join,
Leave,
}
and a struct:
#[derive(Serialize, Deserialize)]
struct Message {
action: Action,
}
and I pass a JSON string:
"{\"action\":0}" // `json_string` var
but when I try deserializing it like this:
let msg: Message = serde_json::from_str(json_string)?;
I get the error expected value at line 1 column 11.
In the JSON, if I replace the number 0 with the string "Join" it works, but I want the number to correspond to the Action enum's values (0 is Action::Join, 1 is Action::Leave) since it's coming from a TypeScript request. Is there a simple way to achieve this?
You want serde_repr!
Here's example code from the library's README:
use serde_repr::{Serialize_repr, Deserialize_repr};
#[derive(Serialize_repr, Deserialize_repr, PartialEq, Debug)]
#[repr(u8)]
enum SmallPrime {
Two = 2,
Three = 3,
Five = 5,
Seven = 7,
}
fn main() -> serde_json::Result<()> {
let j = serde_json::to_string(&SmallPrime::Seven)?;
assert_eq!(j, "7");
let p: SmallPrime = serde_json::from_str("2")?;
assert_eq!(p, SmallPrime::Two);
Ok(())
}
For your case:
use serde_repr::{Serialize_repr, Deserialize_repr};
#[derive(Serialize_repr, Deserialize_repr)]
#[repr(u8)]
enum Action {
Join = 0,
Leave = 1,
}
use serde::{Serialize, Deserialize};
#[derive(Serialize, Deserialize)]
struct Message {
action: Action,
}
Without adding any extra dependencies, the least verbose way is possibly to use "my favourite serde trick", the try_from and into container attributes. But in this case I feel that custom implementations of Deserialize and Serialize are more appropriate:
impl<'de> Deserialize<'de> for Action {
fn deserialize<D>(deserializer: D) -> Result<Self, D::Error>
where
D: serde::Deserializer<'de>,
{
match i8::deserialize(deserializer)? {
0 => Ok(Action::Join),
1 => Ok(Action::Leave),
_ => Err(serde::de::Error::custom("Expected 0 or 1 for action")),
}
}
}
impl Serialize for Action {
fn serialize<S>(&self, serializer: S) -> Result<S::Ok, S::Error>
where
S: serde::Serializer,
{
serializer.serialize_i8(match self {
Action::Join => 0,
Action::Leave => 1,
})
}
}
The custom implementations only redirect to serializing/deserializing i8. Playground
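For comparison, the try_from and into container attributes mentioned earlier rest on two plain conversions between Action and an integer. The sketch below shows only those conversions, using the standard library; the serde attribute in the comment is how they would be wired up, assuming serde's derive feature is enabled. The choice of u8 and of String as the error type are assumptions made for the example.

```rust
use std::convert::TryFrom;

// With serde, the conversions below would be attached to the derive like so:
//
//     #[derive(Serialize, Deserialize)]
//     #[serde(try_from = "u8", into = "u8")]
//
// (`into` requires the type to be `Clone`, and the `TryFrom` error must
// implement `Display`, which `String` does.)
#[derive(Debug, Clone, Copy, PartialEq)]
enum Action {
    Join,
    Leave,
}

impl TryFrom<u8> for Action {
    type Error = String;

    // Reject anything that is not a known discriminant.
    fn try_from(n: u8) -> Result<Self, Self::Error> {
        match n {
            0 => Ok(Action::Join),
            1 => Ok(Action::Leave),
            other => Err(format!("expected 0 or 1 for action, got {}", other)),
        }
    }
}

impl From<Action> for u8 {
    fn from(action: Action) -> u8 {
        match action {
            Action::Join => 0,
            Action::Leave => 1,
        }
    }
}

fn main() {
    println!("{:?}", Action::try_from(1));
    println!("{}", u8::from(Action::Join));
}
```

Compared with the hand-written Deserialize and Serialize impls, this keeps the number-to-variant mapping in ordinary conversion traits that can also be reused outside of serialization.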

Is there a way to derive a struct like with Deserialize to get automatic transform from serde_json::Value?

Deserialising from a string directly into a struct works perfectly. But in some cases, you may already have a serde_json::Value in your hands, and want to try and convert it into a struct.
The following example illustrates just that: a Request struct is loaded from JSON (in a network library, for example) with a type string and generic content as a Value, and a handler (from a client library) should then be called with that value converted into a given struct.
use serde::Deserialize;
use serde_json::{json, Value};
use std::convert::{TryFrom, TryInto};
use std::error::Error;
#[derive(Deserialize)]
struct Request {
#[serde(alias = "type")]
req_type: String,
content: Value
}
#[derive(Deserialize)]
struct Person {
name: String,
age: u8
}
// Is there a way to avoid having to declare this???
impl TryFrom<Value> for Person {
type Error = serde_json::Error;
fn try_from(value: Value) -> Result<Self, Self::Error> {
Person::deserialize(value)
}
}
fn say_hello(p: Person) {
println!("Hello, I'm {}, and I'm {} years old!", p.name, p.age);
}
fn main() -> Result<(), Box<dyn Error>> {
let req: Request = Request::deserialize(json!({
"type": "sayHello",
"content": {
"name": "Pierre",
"age": 32
}
}))?;
match req.req_type.as_str() {
"sayHello" => say_hello(req.content.try_into()?),
_ => println!("unknown request")
}
Ok(())
}
So the question is: is there some derive or other magic implemented which would allow the same behaviour as Deserialize from String, so that the client can only write:
#[derive(Deserialize)]
struct Person {
name: String,
age: u8
}
fn say_hello(p: Person) {
println!("Hello, I'm {}, and I'm {} years old!", p.name, p.age);
}
I tried the #[serde(try_from = "Value")] attribute but it does not look like it's intended for that purpose...
There is serde_json::from_value() specifically for this:
pub fn from_value<T>(value: Value) -> Result<T, Error>
where
T: DeserializeOwned,
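Given any serde_json::Value and some T: DeserializeOwned, the function will deserialize the Value to that T, if possible. In the example from the question, the handler call becomes say_hello(serde_json::from_value(req.content)?), and the TryFrom impl for Person can be dropped.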

Rust: enable Serde to distinguish between undefined & null [duplicate]

I'd like to use Serde to parse some JSON as part of an HTTP PATCH request. Since PATCH requests don't pass the entire object, only the relevant data to update, I need to distinguish between a value that was not passed, a value that was explicitly set to null, and a value that is present.
I have a value object with multiple nullable fields:
struct Resource {
a: Option<i32>,
b: Option<i32>,
c: Option<i32>,
}
If the client submits JSON like this:
{"a": 42, "b": null}
I'd like to change a to Some(42), b to None, and leave c unchanged.
I tried wrapping each field in one more level of Option:
#[derive(Debug, Deserialize)]
struct ResourcePatch {
a: Option<Option<i32>>,
b: Option<Option<i32>>,
c: Option<Option<i32>>,
}
This does not make a distinction between b and c; both are None but I'd have wanted b to be Some(None).
I'm not tied to this representation of nested Options; any solution that can distinguish the 3 cases would be fine, such as one using a custom enum.
Building off of E_net4's answer, you can also create an enum for the three possibilities:
#[derive(Debug)]
enum Patch<T> {
Missing,
Null,
Value(T),
}
impl<T> Default for Patch<T> {
fn default() -> Self {
Patch::Missing
}
}
impl<T> From<Option<T>> for Patch<T> {
fn from(opt: Option<T>) -> Patch<T> {
match opt {
Some(v) => Patch::Value(v),
None => Patch::Null,
}
}
}
impl<'de, T> Deserialize<'de> for Patch<T>
where
T: Deserialize<'de>,
{
fn deserialize<D>(deserializer: D) -> Result<Self, D::Error>
where
D: Deserializer<'de>,
{
Option::deserialize(deserializer).map(Into::into)
}
}
This can then be used as:
#[derive(Debug, Deserialize)]
struct ResourcePatch {
#[serde(default)]
a: Patch<i32>,
}
Unfortunately, you still have to annotate each field with #[serde(default)] (or apply it to the entire struct). Ideally, the implementation of Deserialize for Patch would handle that completely, but I haven't figured out how to do that yet.
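Once a patch has been deserialized into the three-state enum, applying it is one match per field. Here is a minimal sketch using only the standard library; the apply method and Resource::apply_patch are hypothetical names that are not part of the answer, and the Patch values are built by hand rather than by Serde.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum Patch<T> {
    Missing,
    Null,
    Value(T),
}

impl<T> Patch<T> {
    // Hypothetical helper: update `target` according to the patch state.
    fn apply(self, target: &mut Option<T>) {
        match self {
            Patch::Missing => {}                  // key absent: keep the current value
            Patch::Null => *target = None,        // explicit null: clear the field
            Patch::Value(v) => *target = Some(v), // value present: overwrite
        }
    }
}

#[derive(Debug, PartialEq)]
struct Resource {
    a: Option<i32>,
    b: Option<i32>,
    c: Option<i32>,
}

struct ResourcePatch {
    a: Patch<i32>,
    b: Patch<i32>,
    c: Patch<i32>,
}

impl Resource {
    // Hypothetical helper: apply every field of the patch in turn.
    fn apply_patch(&mut self, patch: ResourcePatch) {
        patch.a.apply(&mut self.a);
        patch.b.apply(&mut self.b);
        patch.c.apply(&mut self.c);
    }
}

fn main() {
    let mut resource = Resource { a: None, b: Some(1), c: Some(2) };
    // Corresponds to the request body {"a": 42, "b": null}.
    let patch = ResourcePatch {
        a: Patch::Value(42),
        b: Patch::Null,
        c: Patch::Missing,
    };
    resource.apply_patch(patch);
    println!("{:?}", resource);
}
```

Keeping the "leave unchanged" case as its own variant means the update logic never has to guess whether None meant "absent" or "null".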
Quite likely, the only way to achieve that right now is with a custom deserialization function. Fortunately, it is not hard to implement, even to make it work for any kind of field:
fn deserialize_optional_field<'de, T, D>(deserializer: D) -> Result<Option<Option<T>>, D::Error>
where
D: Deserializer<'de>,
T: Deserialize<'de>,
{
Ok(Some(Option::deserialize(deserializer)?))
}
Each field would then be annotated like this:
#[serde(deserialize_with = "deserialize_optional_field")]
a: Option<Option<i32>>,
You also need to annotate the struct with #[serde(default)] (which requires the struct to implement Default), so that missing fields are deserialized to an "unwrapped" None. The trick is to wrap present values in Some.
Serialization relies on another trick: skipping serialization when the field is None:
#[serde(deserialize_with = "deserialize_optional_field")]
#[serde(skip_serializing_if = "Option::is_none")]
a: Option<Option<i32>>,
Playground with the full example. The output:
Original JSON: {"a": 42, "b": null}
> Resource { a: Some(Some(42)), b: Some(None), c: None }
< {"a":42,"b":null}
Building on Shepmaster's answer, this version adds serialization.
use serde::ser::Error;
use serde::{Deserialize, Deserializer};
use serde::{Serialize, Serializer};
// #region ------ JSON Absent support
// build up on top of https://stackoverflow.com/a/44332837
/// serde value that can be Absent, Null, or Value(T)
#[derive(Debug)]
pub enum Maybe<T> {
Absent,
Null,
Value(T),
}
#[allow(dead_code)]
impl<T> Maybe<T> {
pub fn is_absent(&self) -> bool {
match &self {
Maybe::Absent => true,
_ => false,
}
}
}
impl<T> Default for Maybe<T> {
fn default() -> Self {
Maybe::Absent
}
}
impl<T> From<Option<T>> for Maybe<T> {
fn from(opt: Option<T>) -> Maybe<T> {
match opt {
Some(v) => Maybe::Value(v),
None => Maybe::Null,
}
}
}
impl<'de, T> Deserialize<'de> for Maybe<T>
where
T: Deserialize<'de>,
{
fn deserialize<D>(deserializer: D) -> Result<Self, D::Error>
where
D: Deserializer<'de>,
{
Option::deserialize(deserializer).map(Into::into)
}
}
impl<T: Serialize> Serialize for Maybe<T> {
fn serialize<S>(&self, serializer: S) -> Result<S::Ok, S::Error>
where
S: Serializer,
{
match self {
// this will be serialized as null
Maybe::Null => serializer.serialize_none(),
Maybe::Value(v) => v.serialize(serializer),
// should have been skipped
Maybe::Absent => Err(Error::custom(
r#"Maybe fields need to be annotated with:
#[serde(default, skip_serializing_if = "Maybe::is_absent")]"#,
)),
}
}
}
// #endregion --- JSON Absent support
And then you can use it this way:
#[derive(Serialize, Deserialize, Debug)]
struct Rect {
#[serde(default, skip_serializing_if = "Maybe::is_absent")]
stroke: Maybe<i32>,
w: i32,
#[serde(default, skip_serializing_if = "Maybe::is_absent")]
h: Maybe<i32>,
}
// ....
let json = r#"
{
"stroke": null,
"w": 1
}"#;
let deserialized: Rect = serde_json::from_str(json).unwrap();
println!("deserialized = {:?}", deserialized);
// will output: Rect { stroke: Null, w: 1, h: Absent }
let serialized = serde_json::to_string(&deserialized).unwrap();
println!("serialized back = {}", serialized);
// will output: {"stroke":null,"w":1}
I wish Serde had a built-in way to handle JSON's null and absent states.
Update 2021-03-12 - Updated to Maybe::Absent as it is more JSON and SQL DSL idiomatic.
The catch with this approach is that we can express:
type | null with the default Option<type>
type | null | absent with Maybe<type>
But we cannot express
type | absent
The solution would be to refactor Maybe to just have ::Present(value) and ::Absent and support Maybe<Option<type>> for the type | null | absent. So this will give us full coverage.
type | null with the default Option<type>
type | absent with Maybe<type>
type | absent | null with Maybe<Option<type>>
I am trying to implement this without adding a #[serde(deserialize_with = "deserialize_maybe_field")] but not sure it is possible. I might be missing something obvious.
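The refactoring proposed above can be sketched as types alone. This is a std-only illustration, not a Serde integration: the variant name Present follows the text, while describe and the sample values are assumptions made for the example.

```rust
// Two-state wrapper: a field is either absent from the document or present.
#[derive(Debug, Clone, PartialEq)]
enum Maybe<T> {
    Absent,
    Present(T),
}

impl<T> Default for Maybe<T> {
    fn default() -> Self {
        Maybe::Absent
    }
}

impl<T> Maybe<T> {
    fn is_absent(&self) -> bool {
        matches!(self, Maybe::Absent)
    }
}

// `type | absent | null` is then expressed by nesting: Maybe<Option<T>>.
// Hypothetical helper that names the three states.
fn describe(field: &Maybe<Option<i32>>) -> String {
    match field {
        Maybe::Absent => "absent".to_string(),
        Maybe::Present(None) => "null".to_string(),
        Maybe::Present(Some(n)) => format!("value {}", n),
    }
}

fn main() {
    let states = [Maybe::Absent, Maybe::Present(None), Maybe::Present(Some(3))];
    for state in &states {
        println!("{}", describe(state));
    }

    // `type | absent` needs no Option at all.
    let width: Maybe<i32> = Maybe::default();
    println!("{}", width.is_absent());
}
```

With this shape, each of the three combinations listed above maps to exactly one type, and an illegal state such as "null" on a non-nullable field cannot be constructed.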

How can I do key-agnostic deserialization of JSON objects?

I am working with a less than ideal API that doesn't follow any rigid standard for sending data. Each payload comes with some payload info before the JSON, followed by the actual data inside which can be a single string or several fields.
As it stands right now, if I were to map every different payload to a struct I would end up with roughly 50 structs. I feel like this is not ideal, because a ton of these structs overlap in all but key. For instance, there are I believe 6 different versions of payloads that could be mapped to something like the following, but they all have different keys.
I have these two JSON examples:
{"key": "string"}
{"key2": "string"}
And I want to deserialize both into this struct:
#[derive(Debug, Deserialize)]
struct SimpleString {
key: String,
}
The same can be said for two strings, and even a couple of cases for three. The payloads are frustratingly unique in small ways, so my current solution is to define the structs locally inside the function that deserializes them and then pass that data off wherever it needs to go (in my case a cache and an event handler).
Is there a better way to represent this that doesn't have so much duplication? I've tried looking for things like key-agnostic deserializing but I haven't found anything yet.
You can implement Deserialize for your type to decode a "map" and ignore the key name:
extern crate serde;
extern crate serde_json;
use std::fmt;
use serde::de::{Deserialize, Deserializer, Error, MapAccess, Visitor};
fn main() {
let a = r#"{"key": "string"}"#;
let b = r#"{"key2": "string"}"#;
let a: SimpleString = serde_json::from_str(a).unwrap();
let b: SimpleString = serde_json::from_str(b).unwrap();
assert_eq!(a, b);
}
#[derive(Debug, PartialEq)]
struct SimpleString {
key: String,
}
struct SimpleStringVisitor;
impl<'de> Visitor<'de> for SimpleStringVisitor {
type Value = SimpleString;
fn expecting(&self, formatter: &mut fmt::Formatter) -> fmt::Result {
formatter.write_str("an object with a single string value of any key name")
}
fn visit_map<M>(self, mut access: M) -> Result<Self::Value, M::Error>
where
M: MapAccess<'de>,
{
if let Some((_, key)) = access.next_entry::<String, _>()? {
if access.next_entry::<String, String>()?.is_some() {
Err(M::Error::custom("too many values"))
} else {
Ok(SimpleString { key })
}
} else {
Err(M::Error::custom("not enough values"))
}
}
}
impl<'de> Deserialize<'de> for SimpleString {
fn deserialize<D>(deserializer: D) -> Result<Self, D::Error>
where
D: Deserializer<'de>,
{
deserializer.deserialize_map(SimpleStringVisitor)
}
}
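If a hand-written Visitor feels heavy, the same "exactly one entry, any key" rule can be expressed as a plain conversion from a map. The sketch below uses only the standard library. It assumes the payload has already been parsed into a HashMap<String, String> (a type serde_json can deserialize into), and from_single_entry is a hypothetical name.

```rust
use std::collections::HashMap;

#[derive(Debug, PartialEq)]
struct SimpleString {
    key: String,
}

impl SimpleString {
    // Hypothetical helper: accept a map with exactly one entry and keep only
    // its value, ignoring whatever the key was called.
    fn from_single_entry(map: HashMap<String, String>) -> Result<Self, String> {
        let mut values = map.into_values();
        match (values.next(), values.next()) {
            (Some(key), None) => Ok(SimpleString { key }),
            (None, _) => Err("not enough values".to_string()),
            (Some(_), Some(_)) => Err("too many values".to_string()),
        }
    }
}

fn main() {
    let a = HashMap::from([("key".to_string(), "string".to_string())]);
    let b = HashMap::from([("key2".to_string(), "string".to_string())]);
    println!("{:?}", SimpleString::from_single_entry(a));
    println!("{:?}", SimpleString::from_single_entry(b));
}
```

The trade-off is an intermediate HashMap allocation per payload, which the Visitor-based implementation avoids.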

rustc-serialize custom enum decoding

I have a JSON structure where one of the fields of a struct could be either an object, or that object's ID in the database. Let's say the document looks like this with both possible formats of the struct:
[
{
"name":"pebbles",
"car":1
},
{
"name":"pebbles",
"car":{
"id":1,
"color":"green"
}
}
]
I'm trying to figure out the best way to implement a custom decoder for this. So far, I've tried a few different ways, and I'm currently stuck here:
extern crate rustc_serialize;
use rustc_serialize::{Decodable, Decoder, json};
#[derive(RustcDecodable, Debug)]
struct Car {
id: u64,
color: String
}
#[derive(Debug)]
enum OCar {
Id(u64),
Car(Car)
}
#[derive(Debug)]
struct Person {
name: String,
car: OCar
}
impl Decodable for Person {
fn decode<D: Decoder>(d: &mut D) -> Result<Person, D::Error> {
d.read_struct("root", 2, |d| {
let mut car: OCar;
// What magic must be done here to get the right OCar?
/* I tried something akin to this:
let car = try!(d.read_struct_field("car", 0, |r| {
let r1 = Car::decode(r);
let r2 = u64::decode(r);
// Compare both R1 and R2, but return code for Err() was tricky
}));
*/
/* And this got me furthest */
match d.read_struct_field("car", 0, u64::decode) {
Ok(x) => {
car = OCar::Id(x);
},
Err(_) => {
car = OCar::Car(try!(d.read_struct_field("car", 0, Car::decode)));
}
}
Ok(Person {
name: try!(d.read_struct_field("name", 0, Decodable::decode)),
car: car
})
})
}
}
fn main() {
// Vector of both forms
let input = "[{\"name\":\"pebbles\",\"car\":1},{\"name\":\"pebbles\",\"car\":{\"id\":1,\"color\":\"green\"}}]";
let output: Vec<Person> = json::decode(&input).unwrap();
println!("Debug: {:?}", output);
}
The above panics with an EOF, which is a sentinel value rustc-serialize uses in a few of its error enums. The full line is
thread '<main>' panicked at 'called `Result::unwrap()` on an `Err` value: EOF', src/libcore/result.rs:785
What's the right way to do this?
rustc-serialize, or at least its JSON decoder, doesn't support that use case. If you look at the implementation of read_struct_field (or any other method), you can see why: it uses a stack, but when it encounters an error, it doesn't bother to restore the stack to its original state, so when you try to decode the same thing differently, the decoder is operating on an inconsistent stack, eventually leading to an unexpected EOF value.
I would recommend you look into Serde instead. Deserializing in Serde is different: instead of telling the decoder what type you're expecting, and having no clear way to recover if a value is of the wrong type, Serde calls into a visitor that can handle any of the types that Serde supports in the way it wants. This means that Serde will call different methods on the visitor depending on the actual type of the value it parsed. For example, we can handle integers to return an OCar::Id and objects to return an OCar::Car.
Here's a full example:
#![feature(custom_derive, plugin)]
#![plugin(serde_macros)]
extern crate serde;
extern crate serde_json;
use serde::de::{Deserialize, Deserializer, Error, MapVisitor, Visitor};
use serde::de::value::MapVisitorDeserializer;
#[derive(Deserialize, Debug)]
struct Car {
id: u64,
color: String
}
#[derive(Debug)]
enum OCar {
Id(u64),
Car(Car),
}
struct OCarVisitor;
#[derive(Deserialize, Debug)]
struct Person {
name: String,
car: OCar,
}
impl Deserialize for OCar {
fn deserialize<D>(deserializer: &mut D) -> Result<Self, D::Error> where D: Deserializer {
deserializer.deserialize(OCarVisitor)
}
}
impl Visitor for OCarVisitor {
type Value = OCar;
fn visit_u64<E>(&mut self, v: u64) -> Result<Self::Value, E> where E: Error {
Ok(OCar::Id(v))
}
fn visit_map<V>(&mut self, visitor: V) -> Result<Self::Value, V::Error> where V: MapVisitor {
Ok(OCar::Car(try!(Car::deserialize(&mut MapVisitorDeserializer::new(visitor)))))
}
}
fn main() {
// Vector of both forms
let input = "[{\"name\":\"pebbles\",\"car\":1},{\"name\":\"pebbles\",\"car\":{\"id\":1,\"color\":\"green\"}}]";
let output: Vec<Person> = serde_json::from_str(input).unwrap();
println!("Debug: {:?}", output);
}
Output:
Debug: [Person { name: "pebbles", car: Id(1) }, Person { name: "pebbles", car: Car(Car { id: 1, color: "green" }) }]
Cargo.toml:
[dependencies]
serde = "0.7"
serde_json = "0.7"
serde_macros = "0.7"
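The full example above targets Serde 0.7 on a nightly compiler (note the #![feature] and #![plugin] attributes), and later Serde releases use a different trait shape. The underlying idea, that the parser chooses which visitor method to call based on the value it actually found, can be modeled without Serde at all. The following is a toy, std-only sketch: Json, Visitor, and drive are hypothetical stand-ins rather than Serde's real types, and the real trait has many more methods and different signatures.

```rust
// Toy stand-in for an already parsed JSON value.
#[derive(Debug)]
enum Json {
    U64(u64),
    Str(String),
    Object(Vec<(String, Json)>),
}

#[derive(Debug, PartialEq)]
struct Car {
    id: u64,
    color: String,
}

#[derive(Debug, PartialEq)]
enum OCar {
    Id(u64),
    Car(Car),
}

// Toy visitor: one method per value shape; the defaults reject the shape.
trait Visitor {
    type Value;

    fn visit_u64(&self, _v: u64) -> Result<Self::Value, String> {
        Err("unexpected integer".to_string())
    }
    fn visit_str(&self, _v: &str) -> Result<Self::Value, String> {
        Err("unexpected string".to_string())
    }
    fn visit_map(&self, _entries: &[(String, Json)]) -> Result<Self::Value, String> {
        Err("unexpected object".to_string())
    }
}

// The "deserializer" inspects the value and picks the visitor method.
fn drive<V: Visitor>(json: &Json, visitor: &V) -> Result<V::Value, String> {
    match json {
        Json::U64(n) => visitor.visit_u64(*n),
        Json::Str(s) => visitor.visit_str(s),
        Json::Object(entries) => visitor.visit_map(entries),
    }
}

struct OCarVisitor;

impl Visitor for OCarVisitor {
    type Value = OCar;

    // An integer is interpreted as the car's database ID.
    fn visit_u64(&self, v: u64) -> Result<OCar, String> {
        Ok(OCar::Id(v))
    }

    // An object is interpreted as a full `Car`.
    fn visit_map(&self, entries: &[(String, Json)]) -> Result<OCar, String> {
        let mut id = None;
        let mut color = None;
        for (k, v) in entries {
            match (k.as_str(), v) {
                ("id", Json::U64(n)) => id = Some(*n),
                ("color", Json::Str(s)) => color = Some(s.clone()),
                _ => return Err(format!("unexpected field `{}`", k)),
            }
        }
        match (id, color) {
            (Some(id), Some(color)) => Ok(OCar::Car(Car { id, color })),
            _ => Err("missing `id` or `color`".to_string()),
        }
    }
}

fn main() {
    let by_id = Json::U64(1);
    let by_value = Json::Object(vec![
        ("id".to_string(), Json::U64(1)),
        ("color".to_string(), Json::Str("green".to_string())),
    ]);
    println!("{:?}", drive(&by_id, &OCarVisitor));
    println!("{:?}", drive(&by_value, &OCarVisitor));
}
```

Because the visitor is told what was found instead of guessing a type up front, there is no failed first attempt to recover from, which is exactly where the rustc-serialize approach in the question breaks down.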