Serialize/Deserialize CSV with nested enum/struct with serde in Rust - csv

I want to serialize/deserialize a CSV file with variable row length and content, like the following:
./test.csv
Message,20200202T102030,Some message content
Measurement,20200202T102031,10,30,40,2
AnotherMeasurement,20200202T102034,0,2
In my opinion, the easiest way to represent this is the following enum:
#[derive(Debug, Serialize, Deserialize)]
pub enum Record {
Message { timestamp: String, content: String }, // timestamp is String because of simplicity
Measurement { timestamp: String, a: u32, b: u32, c: u32, d: u32 },
AnotherMeasurement { timestamp: String, a: u32, b: u32 },
}
Cargo.toml
[dependencies]¬
csv = "^1.1.6"¬
serde = { version = "^1", features = ["derive"] }
Running the following
main.rs
fn example() -> Result<(), Box<dyn Error>> {
let mut rdr = csv::ReaderBuilder::new()
.has_headers(false)
.delimiter(b',')
.flexible(true)
.double_quote(false)
.from_path("./test.csv")
.unwrap();
for result in rdr.deserialize() {
let record: Record = result?;
println!("{:?}", record);
}
Ok(())
}
fn write_msg() -> Result<(), Box<dyn Error>> {
let msg = Record::Message {
timestamp: String::from("time"),
content: String::from("content"),
};
let mut wtr = csv::WriterBuilder::new()
.has_headers(false)
.flexible(true)
.double_quote(false)
.from_writer(std::io::stdout());
wtr.serialize(msg)?;
wtr.flush()?;
Ok(())
}
fn main() {
if let Err(err) = example() {
println!("error running example: {}", err);
}
if let Err(err) = write_msg() {
println!("error running example: {}", err);
}
}
prints
error running example: CSV deserialize error: record 0 (line: 1, byte: 0): invalid type: unit variant, expected struct variant
error running example: CSV write error: serializing enum struct variants is not supported
Is there an easy solution to do this with serde and csv? I feel like I missed one or two serde attributes, but I was not able to find the right one in the documentation yet.
EDITS
Netwave suggested adding the #[serde(tag = "type")] attribute. Serializing now works, Deserializing gives the following error:
error running example: CSV deserialize error: record 0 (line: 1, byte: 0): invalid type: string "Message", expected internally tagged enum Record
Research I did that did not lead to a solution yet
Is there a way to "flatten" enums for (de)serialization in Rust?
https://docs.rs/csv/1.1.6/csv/tutorial/index.html
Custom serde serialization for enum type
https://serde.rs/enum-representations.html

Make your enum tagged (internally tagged specifically):
#[derive(Debug, Serialize, Deserialize)]
#[serde(tag = "type")]
pub enum Record {
Message { timestamp: String, content: String }, // timestamp is String because of simplicity
Measurement { timestamp: String, a: u32, b: u32, c: u32, d: u32 },
AnotherMeasurement { timestamp: String, a: u32, b: u32 },
}
Playground

Related

Serde JSON deserializing enums

I have an enum:
#[derive(Serialize, Deserialize)]
enum Action {
Join,
Leave,
}
and a struct:
#[derive(Serialize, Deserialize)]
struct Message {
action: Action,
}
and I pass a JSON string:
"{\"action\":0}" // `json_string` var
but when I try deserialzing this like this:
let msg: Message = serde_json::from_str(json_string)?;
I get the error expected value at line 1 column 11.
In the JSON if I were to replace the number 0 with the string "Join" it works, but I want the number to correspond to the Action enum's values (0 is Action::Join, 1 is Action::Leave) since its coming from a TypeScript request. Is there a simple way to achieve this?
You want serde_repr!
Here's example code from the library's README:
use serde_repr::{Serialize_repr, Deserialize_repr};
#[derive(Serialize_repr, Deserialize_repr, PartialEq, Debug)]
#[repr(u8)]
enum SmallPrime {
Two = 2,
Three = 3,
Five = 5,
Seven = 7,
}
fn main() -> serde_json::Result<()> {
let j = serde_json::to_string(&SmallPrime::Seven)?;
assert_eq!(j, "7");
let p: SmallPrime = serde_json::from_str("2")?;
assert_eq!(p, SmallPrime::Two);
Ok(())
}
For your case:
use serde_repr::{Serialize_repr, Deserialize_repr};
#[derive(Serialize_repr, Deserialize_repr)]
#[repr(u8)]
enum Action {
Join = 0,
Leave = 1,
}
use serde::{Serialize, Deserialize};
#[derive(Serialize, Deserialize)]
struct Message {
action: Action,
}
Without adding any extra dependencies, the least verbose way is possibly to use "my favourite serde trick", the try_from and into container attributes. But in this case I feel that custom implementations of Deserialize and Serialize are more appropriate:
impl<'de> Deserialize<'de> for Action {
fn deserialize<D>(deserializer: D) -> Result<Self, D::Error>
where
D: serde::Deserializer<'de>,
{
match i8::deserialize(deserializer)? {
0 => Ok(Action::Join),
1 => Ok(Action::Leave),
_ => Err(serde::de::Error::custom("Expected 0 or 1 for action")),
}
}
}
impl Serialize for Action {
fn serialize<S>(&self, serializer: S) -> Result<S::Ok, S::Error>
where
S: serde::Serializer,
{
serializer.serialize_i8(match self {
Action::Join => 0,
Action::Leave => 1,
})
}
}
The custom implementations only redirect to serializing/deserializing i8. Playground

Deserializing an unknown enum

Why does this code not deserialize the messages correctly?
The first message received is expected to fail, because the data field = {}, but after that the data field matches the Trade structure which is part of the flattened enum Data.
What am I doing wrong?
Here is the output:
***> {"event":"bts:subscription_succeeded","channel":"live_trades_btcusd","data":{}}
===> Error("no variant of enum Data found in flattened data", line: 1, column: 79)
***> {"data":{"id":220315257,"timestamp":"1644180636","amount":0.03208,"amount_str":"0.03208000","price":41586.11,"price_str":"41586.11","type":0,"microtimestamp":"1644180636419000","buy_order_id":1455495830577152,"sell_order_id":1455495778422789},"channel":"live_trades_btcusd","event":"trade"}
===> Error("no variant of enum Data found in flattened data", line: 1, column: 290)
***> {"data":{"id":220315276,"timestamp":"1644180648","amount":0.02037389,"amount_str":"0.02037389","price":41586.11,"price_str":"41586.11","type":1,"microtimestamp":"1644180648235000","buy_order_id":1455495864238080,"sell_order_id":1455495878971395},"channel":"live_trades_btcusd","event":"trade"}
use serde::{Deserialize, Serialize};
use serde_json::json;
use tungstenite::{connect, Message};
use url::Url;
#[derive(Serialize, Deserialize, Debug)]
enum Data {
None,
Trade(Trade),
}
#[derive(Serialize, Deserialize, Debug)]
struct Msg {
channel: String,
event: String,
#[serde(flatten)]
data: Data,
}
#[derive(Serialize, Deserialize, Debug)]
struct Trade {
id: u32,
amount: f32,
amount_str: String,
buy_order_id: u64,
microtimestamp: String,
price: f32,
price_str: String,
sell_order_id: u64,
timestamp: String,
#[serde(rename = "type")]
_type: u8,
}
let result: Result<Msg, serde_json::Error> = serde_json::from_str(msg.to_text().unwrap());
let _value = match result {
Ok(msg) => {
println!("---> {:?}", msg);
}
Err(err) => {
println!("===> {:?}", err);
continue;
}
};
https://pastebin.com/JZqPkmWC
I solved it by both removing the #[serde(flatten)] and also adding #[serde(untagged)]. Thanks for the help!
#[derive(Serialize, Deserialize, Debug)]
#[serde(untagged)]
enum Data {
Trade(Trade),
Order(Order),
None {},
}
#[derive(Serialize, Deserialize, Debug)]
struct Msg {
channel: String,
event: String,
data: Data,
}

Is there a way to derive a struct like with Deserialize to get automatic transform from serde_json::Value?

Deserialising from a string directly into a struct works perfectly. But in some cases, you may already have a serde_json::Value in your hands, and want to try and convert it into a struct.
The following example illustrate just that: loading a Request struct from JSON (in a network library for example), with a type string and a generic content as a Value, and then you want to call a handler (from a client library) with the value transformed into a given struct.
use serde::Deserialize;
use serde_json::{json, Value};
use std::convert::TryFrom;
use std::error::Error;
#[derive(Deserialize)]
struct Request {
#[serde(alias = "type")]
req_type: String,
content: Value
}
#[derive(Deserialize)]
struct Person {
name: String,
age: u8
}
// It there a way to avoid having to declare this???
impl TryFrom<Value> for Person {
type Error = serde_json::Error;
fn try_from(value: Value) -> Result<Self, Self::Error> {
Person::deserialize(value)
}
}
fn say_hello(p: Person) {
println!("Hello, I'm {}, and I'm {} years old!", p.name, p.age);
}
fn main() -> Result<(), Box<dyn Error>> {
let req: Request = Request::deserialize(json!({
"type": "sayHello",
"content": {
"name": "Pierre",
"age": 32
}
}))?;
match req.req_type.as_str() {
"sayHello" => say_hello(req.content.try_into()?),
_ => println!("unknown request")
}
Ok(())
}
So the question is: is there some derive or other magic implemented which would allow the same behaviour as Deserialize from String, so that the client can only write:
#[derive(Deserialize)]
struct Person {
name: String,
age: u8
}
fn say_hello(p: Person) {
println!("Hello, I'm {}, and I'm {} years old!", p.name, p.age);
}
I tried the #[serde(try_from = "Value")] attribute but it does not look like it's intended for that purpose...
There is serde_json::from_value() specifically for this:
pub fn from_value<T>(value: Value) -> Result<T, Error>
where
T: DeserializeOwned,
Given any serde_json::Value and some T: DeserializedOwned, the function will deserialize the Value to that T, if possible.

Something before csv header and how can i deal with it when i use serde in rust

Before csv header(time,ampl), there are some 'invalid' data.
the csv is about:
LECROYWS3024,13568,Waveform
Segments,1,SegmentSize,100002
Segment,TrigTime,TimeSinceSegment1
#1,01-Apr-2021 16:49:34,0
Time,Ampl
-2.510018e-005,0
-2.509968e-005,0
-2.509918e-005,0
-2.509868e-005,0
-2.509818e-005,0
...
when i build and run the exe, then an error is occured as below :
the error is :
CSV deserialize error: record 1 (line: 1, byte: 29): missing field Time
How can I deal with the invalid data with serde or other crates? Thanks!
use std::error::Error;
use std::io;
use std::process;
use serde::Deserialize;
#[derive(Debug, Deserialize)]
struct Record {
Time: Option<f32>,
Ampl:Option<f32>,
}
...
fn example() -> Result<(), Box<dyn Error>> {
let mut rdr = csv::Reader::from_path("foo.csv")?;
for result in rdr.deserialize() {
let record: Record = result?;
let x0= match record.Time{
Some(x)=> x,
None=> 0.0,
};
...
}
Ok(())
}
fn main() {
if let Err(err) = example() {
println!("error running example: {}", err);
process::exit(1);
}
}
You can use the csv crate, which has a custom deserializer: csv::invalid_option.
Then you can use a macro like this in your struct:
#[derive(Debug, Deserialize)]
struct Record {
Time: Option<f32>,
#[serde(deserialize_with = "csv::invalid_option")]
Ampl:Option<f32>,
}
to have invalid data converted to None values

rustc-serialize custom enum decoding

I have a JSON structure where one of the fields of a struct could be either an object, or that object's ID in the database. Let's say the document looks like this with both possible formats of the struct:
[
{
"name":"pebbles",
"car":1
},
{
"name":"pebbles",
"car":{
"id":1,
"color":"green"
}
}
]
I'm trying to figure out the best way to implement a custom decoder for this. So far, I've tried a few different ways, and I'm currently stuck here:
extern crate rustc_serialize;
use rustc_serialize::{Decodable, Decoder, json};
#[derive(RustcDecodable, Debug)]
struct Car {
id: u64,
color: String
}
#[derive(Debug)]
enum OCar {
Id(u64),
Car(Car)
}
#[derive(Debug)]
struct Person {
name: String,
car: OCar
}
impl Decodable for Person {
fn decode<D: Decoder>(d: &mut D) -> Result<Person, D::Error> {
d.read_struct("root", 2, |d| {
let mut car: OCar;
// What magic must be done here to get the right OCar?
/* I tried something akin to this:
let car = try!(d.read_struct_field("car", 0, |r| {
let r1 = Car::decode(r);
let r2 = u64::decode(r);
// Compare both R1 and R2, but return code for Err() was tricky
}));
*/
/* And this got me furthest */
match d.read_struct_field("car", 0, u64::decode) {
Ok(x) => {
car = OCar::Id(x);
},
Err(_) => {
car = OCar::Car(try!(d.read_struct_field("car", 0, Car::decode)));
}
}
Ok(Person {
name: try!(d.read_struct_field("name", 0, Decodable::decode)),
car: car
})
})
}
}
fn main() {
// Vector of both forms
let input = "[{\"name\":\"pebbles\",\"car\":1},{\"name\":\"pebbles\",\"car\":{\"id\":1,\"color\":\"green\"}}]";
let output: Vec<Person> = json::decode(&input).unwrap();
println!("Debug: {:?}", output);
}
The above panics with an EOL which is a sentinel value rustc-serialize uses on a few of its error enums. Full line is
thread '<main>' panicked at 'called `Result::unwrap()` on an `Err` value: EOF', src/libcore/result.rs:785
What's the right way to do this?
rustc-serialize, or at least its JSON decoder, doesn't support that use case. If you look at the implementation of read_struct_field (or any other method), you can see why: it uses a stack, but when it encounters an error, it doesn't bother to restore the stack to its original state, so when you try to decode the same thing differently, the decoder is operating on an inconsistent stack, eventually leading to an unexpected EOF value.
I would recommend you look into Serde instead. Deserializing in Serde is different: instead of telling the decoder what type you're expecting, and having no clear way to recover if a value is of the wrong type, Serde calls into a visitor that can handle any of the types that Serde supports in the way it wants. This means that Serde will call different methods on the visitor depending on the actual type of the value it parsed. For example, we can handle integers to return an OCar::Id and objects to return an OCar::Car.
Here's a full example:
#![feature(custom_derive, plugin)]
#![plugin(serde_macros)]
extern crate serde;
extern crate serde_json;
use serde::de::{Deserialize, Deserializer, Error, MapVisitor, Visitor};
use serde::de::value::MapVisitorDeserializer;
#[derive(Deserialize, Debug)]
struct Car {
id: u64,
color: String
}
#[derive(Debug)]
enum OCar {
Id(u64),
Car(Car),
}
struct OCarVisitor;
#[derive(Deserialize, Debug)]
struct Person {
name: String,
car: OCar,
}
impl Deserialize for OCar {
fn deserialize<D>(deserializer: &mut D) -> Result<Self, D::Error> where D: Deserializer {
deserializer.deserialize(OCarVisitor)
}
}
impl Visitor for OCarVisitor {
type Value = OCar;
fn visit_u64<E>(&mut self, v: u64) -> Result<Self::Value, E> where E: Error {
Ok(OCar::Id(v))
}
fn visit_map<V>(&mut self, visitor: V) -> Result<Self::Value, V::Error> where V: MapVisitor {
Ok(OCar::Car(try!(Car::deserialize(&mut MapVisitorDeserializer::new(visitor)))))
}
}
fn main() {
// Vector of both forms
let input = "[{\"name\":\"pebbles\",\"car\":1},{\"name\":\"pebbles\",\"car\":{\"id\":1,\"color\":\"green\"}}]";
let output: Vec<Person> = serde_json::from_str(input).unwrap();
println!("Debug: {:?}", output);
}
Output:
Debug: [Person { name: "pebbles", car: Id(1) }, Person { name: "pebbles", car: Car(Car { id: 1, color: "green" }) }]
Cargo.toml:
[dependencies]
serde = "0.7"
serde_json = "0.7"
serde_macros = "0.7"