Skip empty or faulty rows with Serde - csv

I have a file with valid rows that I'm parsing to a struct using Serde and the csv crate.
use serde::Deserialize;

#[derive(Debug, Deserialize)]
struct Circle {
    x: f32,
    y: f32,
    radius: f32,
}

fn read_csv(path: &str) -> Result<Vec<Circle>, csv::Error> {
    let mut rdr = csv::ReaderBuilder::new().delimiter(b';').from_path(path)?;
    let res: Vec<Circle> = rdr
        .deserialize()
        .map(|record: Result<Circle, csv::Error>| {
            // Panics on the first row that cannot be deserialized
            record.unwrap_or_else(|err| panic!("There was a problem parsing a row: {}", err))
        })
        .collect();
    Ok(res)
}
This code works most of the time, but the files I receive sometimes contain "empty" rows at the end:
x;y;radius
6398921.770;146523.553;0.13258
6398921.294;146522.452;0.13258
6398914.106;146526.867;0.13258
;;;
This makes the parsing fail with:
thread 'main' panicked at 'There was a problem parsing a row: CSV deserialize error: record 4 (line: 4, byte: 194): field 0: cannot parse float from empty string', src/main.rs:90:41
note: run with RUST_BACKTRACE=1 environment variable to display a backtrace
How can I handle faulty rows without manipulating the file contents beforehand?
Thanks!
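One way to approach this is to keep each row as a Result and decide per row whether to keep it, instead of panicking on the first failure. The following is a minimal sketch of that idea using only the standard library, so it does not call the csv crate at all; parse_row and parse_rows are hypothetical helper names, and the hand-written splitting on ';' is an assumption standing in for the real reader. Since rdr.deserialize() in the question already yields Result<Circle, csv::Error> items, the same match (or a filter_map over the results) should be applicable there in place of unwrap_or_else.

```rust
#[derive(Debug, PartialEq)]
struct Circle {
    x: f32,
    y: f32,
    radius: f32,
}

/// Parse one `;`-separated row. A missing, empty, or non-numeric field is an error.
/// (Hypothetical helper: a stand-in for what a real CSV deserializer does per record.)
fn parse_row(line: &str) -> Result<Circle, String> {
    let fields: Vec<&str> = line.split(';').collect();
    if fields.len() < 3 {
        return Err(format!("expected 3 fields, found {} in {:?}", fields.len(), line));
    }
    let num = |s: &str| {
        s.trim()
            .parse::<f32>()
            .map_err(|e| format!("bad field {:?}: {}", s, e))
    };
    Ok(Circle {
        x: num(fields[0])?,
        y: num(fields[1])?,
        radius: num(fields[2])?,
    })
}

/// Keep the rows that parse and collect the errors separately instead of panicking.
fn parse_rows(input: &str) -> (Vec<Circle>, Vec<String>) {
    let mut circles = Vec::new();
    let mut errors = Vec::new();
    // Skip the header line, then handle every remaining row individually.
    for line in input.lines().skip(1) {
        match parse_row(line) {
            Ok(circle) => circles.push(circle),
            Err(err) => errors.push(err),
        }
    }
    (circles, errors)
}
```

Calling parse_rows on the sample file contents returns the valid circles together with one error for the ";;;" row, so the caller can log or ignore the faulty rows as needed.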

Json file serialization and deserialization

I have a json file:
[
  {
    "name": "John",
    "sirname": "Fogerty",
    "age": 77
  },
  {
    "name": "Dave",
    "sirname": "Mustaine",
    "age": 61
  }
]
I want to read the objects of the User structure from it into an array, then add another element to this array and re-write everything to the same file.
My code:
use serde_derive::{Serialize, Deserialize};
use serde_json::json;
use std::fs::File;

#[derive(Serialize, Deserialize, Debug)]
struct User {
    name: String,
    sirname: String,
    age: u8,
}

fn main() {
    let f = File::open("info.json").unwrap();
    let mut q: Vec<User> = serde_json::from_reader(&f).unwrap();
    q.push(User {
        name: "Daniil".to_string(),
        sirname: "Volkov".to_string(),
        age: 19,
    });
    serde_json::to_writer(f, &q).unwrap();
    println!("{:?}", q);
}
I get an error when starting:
thread 'main' panicked at 'called Result::unwrap() on an Err value: Error("Access denied. (os error 5)", line: 0, column: 0)', src\main.rs:22:34
note: run with RUST_BACKTRACE=1 environment variable to display a backtrace
error: process didn't exit successfully: target\debug\json.exe (exit code: 101).
What do I need to do?
As noted by @Caesar in the comments, File::open opens the file read-only, so you need to change the way you open it. Instead of
let f = File::open("info.json").unwrap();
you should put the following:
let mut f = File::options()
    .read(true)
    .write(true)
    .open("info.json")
    .unwrap();
Reading the file advances the position within it, so before writing (since you want to overwrite the file, not append to it) you need to reset the position to the start. The seek method comes from the std::io::Seek trait, which must be in scope (use std::io::Seek;):
let _ = f.seek(std::io::SeekFrom::Start(0)).unwrap();
Put this right before the following line:
serde_json::to_writer(f, &q).unwrap();
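The open, read, rewind, write sequence can be sketched with the standard library alone. This is a minimal sketch rather than the code from the answer: rewrite_in_place is a hypothetical helper name, the edit closure stands in for the serde_json deserialize/push/serialize step, and the set_len(0) call is an extra precaution that the answer does not mention. It matters only when the new contents are shorter than the old ones, in which case the leftover tail of the old file would otherwise remain after the new data.

```rust
use std::fs::OpenOptions;
use std::io::{Read, Seek, SeekFrom, Write};
use std::path::Path;

/// Read the whole file, let `edit` produce the new contents,
/// then overwrite the file in place through the same handle.
fn rewrite_in_place(
    path: &Path,
    edit: impl FnOnce(String) -> String,
) -> std::io::Result<()> {
    // Open for both reading and writing; `File::open` alone would be read-only.
    let mut f = OpenOptions::new().read(true).write(true).open(path)?;

    let mut old = String::new();
    f.read_to_string(&mut old)?; // the cursor is now at the end of the file

    let new = edit(old);

    f.seek(SeekFrom::Start(0))?; // rewind so the write starts at the beginning
    f.set_len(0)?; // assumption: drop the old bytes so a shorter payload leaves no tail
    f.write_all(new.as_bytes())?;
    Ok(())
}
```

In the question's scenario the array only grows, so the truncation is not strictly needed there, but it keeps the helper safe for edits that remove elements.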

Returns an unknown type in Rust

I deserialize a JSON file into the following structures:
{
  "infos": {
    "info_example": {
      "title": {
        "en": "Title for example",
        "fr": "Titre pour l'exemple"
      },
      "message": {
        "en": "Message for example"
      }
    }
  },
  "errors": {}
}
#[derive(Debug, Deserialize)]
struct Logs {
    infos: Infos,
    errors: Errors,
}

#[derive(Debug, Deserialize)]
struct Infos {
    info_example: Log,
}

#[derive(Debug, Deserialize)]
struct Errors {}

#[derive(Debug, Deserialize)]
struct Log {
    title: MultiString,
    message: MultiString,
}

#[derive(Debug, Deserialize)]
struct MultiString {
    en: String,
    fr: Option<String>,
    de: Option<String>,
}
I would like to create a function that works like this:
logs_manager.get("infos.info_example.message.en")
struct LogsManager {
    logs: Logs,
}

impl LogsManager {
    fn get<T>(&self, element: &str) -> &T {
        let splitted: Vec<&str> = element.split(".").collect();
        // Later it will be a loop, but for now I'm just gonna take the first part
        match splitted[0] {
            "infos" => {
                &self.logs.infos
            },
            "errors" => {
                &self.logs.errors
            }
            _ => panic!()
        }
    }
}
When I try to compile, I get these errors:
Compiling so_question_return_type v0.1.0 (/run/media/anton/data120/Documents/testa)
error[E0308]: mismatched types
--> src/main.rs:26:17
|
20 | fn get<T>(&self, element: &str) -> &T {
| - this type parameter -- expected `&T` because of return type
...
26 | &self.logs.infos
| ^^^^^^^^^^^^^^^^ expected type parameter `T`, found struct `Infos`
|
= note: expected reference `&T`
found reference `&Infos`
error[E0308]: mismatched types
--> src/main.rs:29:17
|
20 | fn get<T>(&self, element: &str) -> &T {
| - this type parameter -- expected `&T` because of return type
...
29 | &self.logs.errors
| ^^^^^^^^^^^^^^^^^ expected type parameter `T`, found struct `Errors`
|
= note: expected reference `&T`
found reference `&Errors`
For more information about this error, try `rustc --explain E0308`.
error: could not compile `so_question_return_type` due to 2 previous errors
You can try compiling it yourself using this file: https://gist.github.com/antoninhrlt/feee9ec4da1cb0edd0e7f426a6c744b0
So I tried creating an enum to work around the return type problem.
enum LogObject<'a> {
    Logs(&'a Logs),
    Infos(&'a Infos),
    Errors(&'a Errors),
    Log(&'a Log),
    MultiString(&'a MultiString),
}
Then I wrap the object I want to return in a LogObject:
"infos" => {
    LogObject::Infos(&self.logs.infos)
}
It works. But I would like to easily retrieve the value inside the enum object.
impl<'a> LogObject<'a> {
    fn retrieve_object<T>(&self) -> &T {
        match *self {
            Self::Logs(object) => object,
            Self::Infos(object) => object,
            Self::Errors(object) => object,
            Self::Log(object) => object,
            Self::MultiString(object) => object,
        }
    }
}
But this gives me another error:
error[E0308]: mismatched types
--> crates/strings/src/manager.rs:63:44
|
61 | fn retrieve_object<T>(&self) -> &T {
| - -- expected `&T` because of return type
| |
| this type parameter
62 | match *self {
63 | Self::Logs(object) => object,
| ^^^^^^ expected type parameter `T`, found struct `structured::Logs`
|
= note: expected reference `&T`
found reference `&structured::Logs`
For more information about this error, try `rustc --explain E0308`.
error: could not compile `strings` due to previous error
I'm lost and don't know how to implement this. After searching for a while, I'm asking here.
Does anyone know how to do this?
PS : I'm using serde for deserialization.
Making a get function was a bad idea.
Now I simply access the fields directly, like this:
logs_manager.logs.infos.info_example.message.en
Also, I'm including the JSON file in the binary with include_str!.
Thank you for the reply, but I think it's better to abandon my first idea.
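For reference, retrieve_object<T> cannot compile because the caller chooses T, while the function body picks the concrete type at runtime from the enum variant. If a string-path lookup is still wanted, one option is to return the enum itself and expose one accessor per variant that returns an Option. The following is a minimal sketch under simplifying assumptions: the types are cut-down stand-ins for the question's structs, the Text variant and the as_text and as_multi_string accessors are hypothetical additions, and the lookup starts from a single Log rather than the full Logs tree.

```rust
// Simplified stand-ins for the question's deserialized types.
#[derive(Debug)]
struct MultiString {
    en: String,
    fr: Option<String>,
}

#[derive(Debug)]
struct Log {
    title: MultiString,
    message: MultiString,
}

/// One variant per kind of node that a lookup can land on.
#[derive(Debug, Clone, Copy)]
enum LogObject<'a> {
    Log(&'a Log),
    MultiString(&'a MultiString),
    Text(&'a str),
}

impl<'a> LogObject<'a> {
    /// Returns the text if this node is a leaf string, otherwise None.
    fn as_text(&self) -> Option<&'a str> {
        match *self {
            LogObject::Text(text) => Some(text),
            _ => None,
        }
    }

    /// Returns the MultiString if this node is one, otherwise None.
    fn as_multi_string(&self) -> Option<&'a MultiString> {
        match *self {
            LogObject::MultiString(multi) => Some(multi),
            _ => None,
        }
    }
}

/// Walk a dotted path such as "message.en" starting from a `Log`.
/// Unknown segments and absent translations yield None instead of panicking.
fn get<'a>(log: &'a Log, path: &str) -> Option<LogObject<'a>> {
    let mut current = LogObject::Log(log);
    for part in path.split('.') {
        current = match (current, part) {
            (LogObject::Log(l), "title") => LogObject::MultiString(&l.title),
            (LogObject::Log(l), "message") => LogObject::MultiString(&l.message),
            (LogObject::MultiString(m), "en") => LogObject::Text(m.en.as_str()),
            (LogObject::MultiString(m), "fr") => LogObject::Text(m.fr.as_deref()?),
            _ => return None,
        };
    }
    Some(current)
}
```

With this shape, get(&log, "message.en").and_then(|o| o.as_text()) gives an Option<&str>, and the caller states which type it expects through the accessor it calls. Direct field access, as chosen above, remains simpler and is checked at compile time.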

Why is data bytes size different when serialized with JSONSerialization

Why does data serialized using JSONSerialization differ from the data produced by the extension below?
let uint8Array: [UInt8] = [123, 234, 255]
let data1 = uint8Array.data // 3 bytes
let data2 = try! JSONSerialization.data(withJSONObject: uint8Array) // 13 bytes

extension Data {
    var bytes: [UInt8] {
        return [UInt8](self)
    }
}

extension Array where Element == UInt8 {
    var data: Data {
        return Data(self)
    }
}
JSON, and therefore JSONSerialization, transforms an object (here an array of Int) into Data as text, according to certain rules: it adds "[", "]" and "," in our case, and for other input values and options it can add ":", "{", "}", "\n", etc. as needed. The text is encoded as UTF-8 (or a UTF-16 variant, but I'll skip that part for simplicity; see the relevant section on Wikipedia).
So the value 123 is written as the character 1 in UTF-8, then 2, then 3, which means a single number already takes 3 bytes.
Your 3 Int values therefore take 9 bytes, to which we add "[" and "]" plus the two commas that separate the values, which makes 9 + 2 + 2 = 13 bytes.
To illustrate, let's add:
extension Data {
    func hexRepresentation(separator: String) -> String {
        map { String(format: "%02hhX", $0) }.joined(separator: separator)
    }
    func intRepresentation(separator: String) -> String {
        map { String(format: "%d", $0) }.joined(separator: separator)
    }
}
And use it to see the values inside data1 and data2:
print(data1.hexRepresentation(separator: "-"))
print(data1.intRepresentation(separator: "-"))
print(data2.hexRepresentation(separator: "-"))
print(data2.intRepresentation(separator: "-"))
$>7B-EA-FF
$>123-234-255
$>5B-31-32-33-2C-32-33-34-2C-32-35-35-5D
$>91-49-50-51-44-50-51-52-44-50-53-53-93
Use whichever interpretation of the raw values you prefer, Int or hex.
We see that using the first method, we get 3 bytes, with the initial values of your array.
But for the JSON one, we can split it as such:
$>91 -> [
$>49-50-51 -> 1, 2 & 3
$>44 -> ,
$>50-51-52 -> 2, 3, 4
$>44 -> ,
$>50-53-53 -> 2, 5, 5
$>93 -> ]
You can check this against any UTF-8 table (such as https://www.utf8-chartable.de): 91 is [, 49 is 1, and so on.
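The byte count above can also be reproduced by building the JSON text by hand. The following is a minimal sketch written in Rust rather than Swift; json_array_text is a hypothetical helper, and it assumes compact output with no whitespace between elements, which matches the 13 bytes shown above.

```rust
/// Build the compact JSON text for an array of bytes by hand,
/// e.g. [123, 234, 255] becomes the 13-character string "[123,234,255]".
fn json_array_text(values: &[u8]) -> String {
    // Each number is written as its decimal digits, one ASCII byte per digit.
    let items: Vec<String> = values.iter().map(|v| v.to_string()).collect();
    // Brackets add 2 bytes, and there is one comma between each pair of items.
    format!("[{}]", items.join(","))
}
```

For the array from the question, the raw slice has a length of 3, while json_array_text(&[123, 234, 255]).len() is 13, and the first byte of the text is 0x5B, the code for "[".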

Unable to deserialize chrono::DateTime from json

I encountered an interesting issue. For some reason serde is unable to deserialize a chrono::DateTime<Utc> object from a string literal in the same format it was serialized in (although it works when I deserialize the variable holding the serialized output):
use chrono; // 0.4.11
use serde_json; // 1.0.48

fn main() {
    let date = chrono::Utc::now();
    println!("{}", date);
    let date_str = serde_json::to_string(&date).unwrap();
    println!("{}", date_str);
    let parsed_date: chrono::DateTime<chrono::Utc> = serde_json::from_str(&date_str).unwrap();
    println!("{}", parsed_date);
    assert_eq!(date, parsed_date);
    let date = "2020-03-28T16:29:04.644008111Z";
    let _: chrono::DateTime<chrono::Utc> = serde_json::from_str(&date).unwrap();
}
Running this on the playground outputs:
Compiling playground v0.0.1 (/playground)
Finished dev [unoptimized + debuginfo] target(s) in 1.01s
Running `target/debug/playground`
thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: Error("invalid type: integer `2020`, expected a formatted date and time string or a unix timestamp", line: 1, column: 4)', src/main.rs:17:44
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
Standard Output
2020-03-28 17:57:04.222452521 UTC
"2020-03-28T17:57:04.222452521Z"
2020-03-28 17:57:04.222452521 UTC
Why is this happening? How should I be doing it?
You need to pass valid JSON. A JSON string must be wrapped in double quotes, and here those quotes have to be part of the string contents:
let date = "\"2020-03-28T16:29:04.644008111Z\"";
You can see it with println!("{:?}", date_str);

Why do I always get a "trailing characters" error when trying to parse data with serde_json?

I have a server that receives requests with a JSON body. When trying to parse the data I always get a "trailing characters" error. This happens only when the JSON comes from Postman:
let type_of_request = parsed_request[1];
let content_of_msg: Vec<&str> = msg_from_client.split("\r\n\r\n").collect();
println!("{}", content_of_msg[1]);
// Will print "{"username":"user","password":"password","email":"dwadwad"}"
let res: serde_json::Value = serde_json::from_str(content_of_msg[1]).unwrap();
println!("The username is: {}", res["username"]);
When the data comes from Postman, this happens:
thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: Error("trailing characters", line: 1, column: 60)', src\libcore\result.rs:997:5
but when having the string inside Rust:
let j = "{\"username\":\"user\",\"password\":\"password\",\"email\":\"dwadwad\"}";
let res: serde_json::Value = serde_json::from_str(j).unwrap();
println!("The username is: {}", res["username"]);
it works like a charm:
The username is: "user"
EDIT: Apparently, because I read the message into a fixed-size buffer and turned the whole buffer into a string, the string kept all the NULL characters from the unused part of the buffer, and those are the trailing characters.
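The cause described in the EDIT can be avoided in two ways, sketched below with the standard library only. Both function names are hypothetical, and n is assumed to be the byte count returned by the read call that filled the buffer. The first function converts only the bytes that were actually received; the second is a fallback for a string that was already built from the whole buffer.

```rust
/// Interpret only the `n` bytes that were actually received as UTF-8 text.
/// `n` is assumed to be the count returned by the read call that filled `buf`.
fn received_text(buf: &[u8], n: usize) -> Result<&str, std::str::Utf8Error> {
    std::str::from_utf8(&buf[..n])
}

/// Fallback: strip NUL padding from a string that was built from the whole buffer.
fn strip_nul_padding(s: &str) -> &str {
    s.trim_end_matches('\0')
}
```

Slicing with the received length is usually the cleaner option, since it never creates the padded string in the first place.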
Looking at the serde_json source code, one finds the following comment above the relevant ErrorCode enum variant:
/// JSON has non-whitespace trailing characters after the value.
TrailingCharacters,
So as the error code implies, you've got some trailing character which is not whitespace. In your snippet, you say:
println!("{}", content_of_msg[1]);
// Will print "{"username":"user","password":"password","email":"dwadwad"}"
If you literally copied and pasted the printed output here, I'd note that I wouldn't expect the output to be wrapped in leading and trailing quotation marks. Did you add these yourself, or were they part of what was printed? If they were printed, I suspect that's the source of your problem.
Edit:
In fact, I can nearly recreate this using a raw string with leading/trailing quotation marks in Rust:
extern crate serde_json;

#[cfg(test)]
mod tests {
    #[test]
    fn test_serde() {
        let s =
            r#""{"username":"user","password":"password","email":"dwadwad"}""#;
        println!("{}", s);
        let _res: serde_json::Value = serde_json::from_str(s).unwrap();
    }
}
Running it via cargo test yields:
test tests::test_serde ... FAILED
failures:
---- tests::test_serde stdout ----
"{"username":"user","password":"password","email":"dwadwad"}"
thread 'tests::test_serde' panicked at 'called `Result::unwrap()` on an `Err` value: Error("trailing characters", line: 1, column: 4)', src/libcore/result.rs:997:5
note: Run with `RUST_BACKTRACE=1` environment variable to display a backtrace.
failures:
tests::test_serde
Note that my printed output also includes leading/trailing quotation marks and I also get a TrailingCharacters error, albeit at a different column.
Edit 2:
Based on your comment that you've added the wrapping quotations yourself, you've got a known good string (the one you've defined in Rust), and one which you believe should match it but doesn't (the one from Postman).
This is a data problem and so we should examine the data. You can adapt the below code to check the good string against the other:
#[test]
fn test_str_comp() {
    // known good string we'll compare against
    let good =
        r#"{"username":"user","password":"password","email":"dwadwad"}"#;
    // lengthened string, additional characters
    // also n and a in username are transposed
    let bad =
        r#"{"useranme":"user","password":"password","email":"dwadwad"}abc"#;
    let good_size = good.chars().count();
    let bad_size = bad.chars().count();
    for (idx, (c1, c2)) in (0..)
        .zip(good.chars().zip(bad.chars()))
        .filter(|(_, (c1, c2))| c1 != c2)
    {
        println!(
            "Strings differ at index {}: (good: `{}`, bad: `{}`)",
            idx, c1, c2
        );
    }
    if good_size < bad_size {
        let trailing = bad.chars().skip(good_size);
        println!(
            "bad string contains extra characters: `{}`",
            trailing.collect::<String>()
        );
    } else if good_size > bad_size {
        let trailing = good.chars().skip(bad_size);
        println!(
            "good string contains extra characters: `{}`",
            trailing.collect::<String>()
        );
    }
    assert!(false);
}
For my example, this yields the failure:
test tests::test_str_comp ... FAILED
failures:
---- tests::test_str_comp stdout ----
Strings differ at index 6: (good: `n`, bad: `a`)
Strings differ at index 7: (good: `a`, bad: `n`)
bad string contains extra characters: `abc`
thread 'tests::test_str_comp' panicked at 'assertion failed: false', src/lib.rs:52:9
note: Run with `RUST_BACKTRACE=1` environment variable to display a backtrace.
failures:
tests::test_str_comp