Unit testing CSV parsing logic - csv

I'm using the csv crate to read CSV files and then parse the content. I would like to unit test the parsing logic. Here's a simplified version of the code:
use std::error::Error;
use std::fs::File;

use csv::StringRecordsIter;

fn main() -> Result<(), Box<dyn Error>> {
    let mut rdr = csv::ReaderBuilder::new()
        .from_path("test.csv")?;
    process(rdr.records());
    Ok(())
}

fn process(iter: StringRecordsIter<File>) -> Result<String, String> {
    for result in iter {
        // Parsing takes place here
        println!("{:?}", result);
    }
    // Post-parsing using entire file content takes place here
    Ok(String::from("My Result Here"))
}
In my unit test I would like to be able to construct sequences of StringRecord objects, pass them to process() and validate the results. I can successfully create a StringRecord using StringRecord::new() and fill it with values using record.push_field("my field value"). However, I'm struggling to create an iterator that returns my values to pass to process(). Any suggestions? I'm happy to change the arguments to process() if this makes things easier.

The suggestion made by Jmb to change the signature of process() to fn process(iter: impl Iterator<Item = csv::Result<StringRecord>>) -> Result<String, String> works nicely.
Here's the solution in detail. First, the only change to process() is to make it accept a wider range of iterator types:
fn process(iter: impl Iterator<Item = csv::Result<StringRecord>>) -> Result<String, String> {
    for result in iter {
        // Parsing takes place here
        println!("{:?}", result);
    }
    // Post-parsing using entire file content takes place here
    Ok(String::from("My Result Here"))
}
main() remains identical, since rdr.records() can still be passed to process(). The test then looks like this:
use csv::StringRecord;

#[test]
fn my_test() -> Result<(), String> {
    let record1 = result_record(&["Value 1", "Value 2"]);
    let record2 = result_record(&["Value 3", "Value 4"]);
    let records = vec![record1, record2];

    let result = process(records.into_iter())?;

    assert_eq!("My Result Here", result);
    Ok(())
}

fn result_record(fields: &[&str]) -> csv::Result<StringRecord> {
    let mut record = StringRecord::new();
    for field in fields {
        record.push_field(field);
    }
    Ok(record)
}
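As an aside (not part of the original answer), another way to drive the same process() signature in a test is to build a reader from an inline CSV string and pass its records() iterator directly, since StringRecordsIter satisfies the same Iterator bound. A minimal sketch, with placeholder field values:

#[test]
fn my_test_from_inline_csv() -> Result<(), String> {
    // Two records, no header row, built entirely in memory.
    let data = "Value 1,Value 2\nValue 3,Value 4";
    let mut rdr = csv::ReaderBuilder::new()
        .has_headers(false)
        .from_reader(data.as_bytes());

    let result = process(rdr.records())?;
    assert_eq!("My Result Here", result);
    Ok(())
}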

Related

Rust Read CSV without header

How does one read a CSV without a header in Rust? I've searched through the docs and gone through something like 15 examples, each of which is subtly not what I'm looking for.
Consider how easy Python makes it:
csv.DictReader(f, fieldnames=['city'])
How do you do this in Rust?
Current attempt:
use std::fs::File;

use serde::Deserialize;

#[derive(Debug, Deserialize)]
struct CityRow {
    city: String,
    pop: u32,
}

fn doit() -> zip::result::ZipResult<()> {
    let filename = "cities.csv";
    let mut zip = zip::ZipArchive::new(File::open(filename).unwrap())?;
    let mut file = zip.by_index(0).unwrap();
    println!("Filename: {}", file.name());

    let mut reader = csv::Reader::from_reader(Box::new(file));
    reader.set_headers(csv::StringRecord::from(vec!["city", "pop"]));

    for record in reader.records() {
        // let record: CityRow = record.unwrap();
        // let record = record?;
        println!("{:?}", record);
    }
    Ok(())
}
Use a ReaderBuilder, and call ReaderBuilder::has_headers to disable header parsing. You can then use StringRecord::deserialize to extract and print each record, skipping the first header row:
let mut reader = csv::ReaderBuilder::new()
    .has_headers(false)
    .from_reader(Box::new(file));

let headers = csv::StringRecord::from(vec!["city", "pop"]);
for record in reader.records().skip(1) {
    let record: CityRow = record.unwrap().deserialize(Some(&headers)).unwrap();
    println!("{:?}", record);
}
(playground)
Smitop's answer didn't totally make sense to me when looking at the underlying code, since the library already assumes headers exist by default. This means the code below should work directly, and I found it did:
// `data` here holds the CSV contents as a String.
let mut reader = csv::Reader::from_reader(data.as_bytes());

for record in reader.deserialize() {
    let record: CityRow = record.unwrap();
    println!("{:?}", record);
}
I checked through the variants in this playground.
For what it's worth, it turned out in my case I had accidentally left a code path in that was reading my csv as a plain file, which is why I had seen headers read as a row. (Oops.)

How to use Rust's tracing_distributed

I am trying to use the Rust tracing-distributed crate, but I am getting strange and unhelpful errors when using it. I assume I am using it wrong, but there is no documentation and there are no examples of how to use it. Here is an example of what I'm trying to do:
let trace = tracing_distributed::register_dist_tracing_root(traceId, remote_parent_span_id);
println!("trace value: {:?}", trace);
// the result of trace is: Err(NoEnabledSpan)
I have tried passing a few things in as the traceID and remote_parent_span_id including:
traceId = remote_parent_span_id = Some(tracing::Span::current())
As well as:
traceId = Some(tracing::Span::current())
remote_parent_span_id = ~someParentRequestIdGeneratedUpstream~
I know that the current span is not disabled from trying:
let span = tracing::Span::current();
if span.is_disabled() {
    println!("CURRENT SPAN DISABLED");
}
So this leads me to think that the issue comes from the subscriber not being set properly. I am setting the subscriber in an init function that is called before the function above; it looks like this:
let subscriber = tracing_subscriber::registry() // provide underlying span data store
    .with(
        tracing_subscriber::fmt::layer()
            .json()
            .with_span_events(FmtSpan::ACTIVE)
            .event_format(stackdriver::StackDriverEventFormat::default())
            .with_filter(tracing_subscriber::filter::dynamic_filter_fn(
                move |m, c| filter_layer.enabled(m, c.to_owned()),
            )),
    );

let _ = tracing::subscriber::set_global_default(subscriber)
    .map_err(|_err| eprintln!("Unable to set global default subscriber"));
Would anyone be willing to provide me with an example of how to use this library? Or can anyone see what I'm doing wrong here? I have tried everything I can think of.
tracing-distributed has a test which demonstrates how to create and use TelemetryLayer.
I made a demo based on it. In this demo, NoEnabledSpan would be caused by a missing #[instrument], which is what creates the Span for the function foo. Hopefully this helps you find the actual cause.
Also, tracing-honeycomb is a good example of tracing-distributed in use; it's worth checking out.
use std::sync::{Arc, Mutex};

use tracing::{Id, info};
use tracing::instrument;
use tracing_distributed::{Event, Span, Telemetry, TelemetryLayer};
use tracing_subscriber::layer::SubscriberExt;
use tracing_subscriber::registry;

#[derive(Default, Debug)]
pub struct BlackholeVisitor;

#[derive(PartialEq, Eq, Hash, Copy, Clone, Debug)]
pub struct TraceId(pub(crate) u128);

type SpanId = tracing::Id;

impl tracing::field::Visit for BlackholeVisitor {
    fn record_debug(&mut self, _: &tracing::field::Field, _: &dyn std::fmt::Debug) {}
}

/// Mock telemetry capability
pub struct TestTelemetry {
    spans: Arc<Mutex<Vec<Span<BlackholeVisitor, SpanId, TraceId>>>>,
    events: Arc<Mutex<Vec<Event<BlackholeVisitor, SpanId, TraceId>>>>,
}

impl TestTelemetry {
    pub fn new(
        spans: Arc<Mutex<Vec<Span<BlackholeVisitor, SpanId, TraceId>>>>,
        events: Arc<Mutex<Vec<Event<BlackholeVisitor, SpanId, TraceId>>>>,
    ) -> Self {
        TestTelemetry { spans, events }
    }
}

impl Telemetry for TestTelemetry {
    type Visitor = BlackholeVisitor;
    type TraceId = TraceId;
    type SpanId = SpanId;

    fn mk_visitor(&self) -> Self::Visitor {
        BlackholeVisitor
    }

    fn report_span(&self, span: Span<BlackholeVisitor, SpanId, TraceId>) {
        // succeed or die. failure is unrecoverable (mutex poisoned)
        let mut spans = self.spans.lock().unwrap();
        spans.push(span);
    }

    fn report_event(&self, event: Event<BlackholeVisitor, SpanId, TraceId>) {
        // succeed or die. failure is unrecoverable (mutex poisoned)
        let mut events = self.events.lock().unwrap();
        events.push(event);
    }
}

#[instrument]
fn foo() {
    let trace = tracing_distributed::register_dist_tracing_root(TraceId(123), Option::<Id>::None);
    println!("trace value: {:?}", trace);
    info!("test");
}

fn main() {
    let spans = Arc::new(Mutex::new(Vec::new()));
    let events = Arc::new(Mutex::new(Vec::new()));
    let cap = TestTelemetry::new(spans.clone(), events.clone());

    let telemetry_layer = TelemetryLayer::new("test_svc_name", cap, |x| x);

    let subscriber = registry::Registry::default()
        .with(tracing_subscriber::fmt::Layer::default())
        .with(telemetry_layer);
    // dbg!(&subscriber);
    tracing::subscriber::set_global_default(subscriber).expect("setting global default failed");

    foo();

    dbg!(&spans);
    dbg!(&events);
}
crate versions:
tracing = "0.1.32"
tracing-distributed = "0.4.0"
tracing-subscriber = "0.3.10"

How can an arbitrary json structure be deserialized with reqwest get in Rust?

I am totally new to Rust and I am trying to find out how I can download and deserialize an arbitrary JSON structure from a URL endpoint.
The respective example on the reqwest README goes like this:
use std::collections::HashMap;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let resp = reqwest::get("https://httpbin.org/ip")
        .await?
        .json::<HashMap<String, String>>()
        .await?;
    println!("{:#?}", resp);
    Ok(())
}
So in the case of this example, the target structure – i.e. a HashMap with strings as keys and strings as values – is known in advance.
But what if I don't know what the structure received from the request endpoint looks like?
You can use serde_json::Value.
#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let resp = reqwest::get("https://httpbin.org/ip")
        .await?
        .json::<serde_json::Value>()
        .await?;
    println!("{:#?}", resp);
    Ok(())
}
You will have to add serde_json to your Cargo.toml file.
[dependencies]
...
serde_json = "1"
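If you then need to look inside the dynamically typed value, serde_json::Value provides accessors such as get() and as_str(). A small sketch, assuming the httpbin.org/ip response shown above (which contains an "origin" field):

// Drill into the untyped JSON returned above.
if let Some(origin) = resp.get("origin").and_then(|v| v.as_str()) {
    println!("origin: {}", origin);
}

// Or branch on the overall shape.
match &resp {
    serde_json::Value::Object(map) => println!("object with {} keys", map.len()),
    other => println!("not an object: {:?}", other),
}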

How can I validate that the headers of a CSV file match my struct?

I need to parse a CSV file, but before actually parsing, I need to check if the file header can be assigned to my needs.
The problem is that some fields may be missing or the order of the fields may be different for different files.
I have a struct for a dish:
struct Dish {
    title: String,
    ingredients: Vec<String>,
    spicy: Option<bool>,
    vegetarian: Option<bool>,
}
I need to generate an error for any CSV file with a header that has missing fields from the structure (not Option) or has extra fields:
title;spicy;vegeterian
title;ingredients;poisoned
The csv crate has support for Serde. The following example, adapted from the docs, should do what you want:
use std::error::Error;
use std::io;
use std::process;

use serde::Deserialize;

#[derive(Debug, Deserialize)]
struct Dish {
    title: String,
    ingredients: Vec<String>,
    spicy: Option<bool>,
    vegetarian: Option<bool>,
}

fn example() -> Result<(), Box<dyn Error>> {
    let mut rdr = csv::Reader::from_reader(io::stdin());
    for result in rdr.deserialize() {
        // Notice that we need to provide a type hint for automatic
        // deserialization.
        let dish: Dish = result?;
        println!("{:?}", dish);
    }
    Ok(())
}

fn main() {
    if let Err(err) = example() {
        println!("error running example: {}", err);
        process::exit(1);
    }
}
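The Serde-based example above will already fail on missing non-Option fields, but it may not reject extra columns on its own. If extra columns must also produce an error, one option is to check the header record against the struct's field names before deserializing. A minimal sketch (not from the csv docs), where the required and allowed column lists are kept in sync with Dish by hand:

use std::collections::HashSet;
use std::error::Error;
use std::io::Read;

fn check_headers<R: Read>(rdr: &mut csv::Reader<R>) -> Result<(), Box<dyn Error>> {
    // Kept in sync with the Dish struct by hand: non-Option fields are required,
    // anything outside `allowed` is an extra column.
    let required = ["title", "ingredients"];
    let allowed = ["title", "ingredients", "spicy", "vegetarian"];

    let headers = rdr.headers()?.clone();

    for name in required {
        if !headers.iter().any(|h| h == name) {
            return Err(format!("missing required column: {}", name).into());
        }
    }
    for h in headers.iter() {
        if !allowed.contains(&h) {
            return Err(format!("unexpected column: {}", h).into());
        }
    }
    Ok(())
}

In the example above, check_headers(&mut rdr)? could be called right after constructing the reader and before the deserialize loop.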

TreeMap<String, String> to json

I have the following code:
extern crate serialize;

use std::collections::TreeMap;
use serialize::json;
use serialize::json::ToJson;
use serialize::json::Json;

fn main() {
    let mut tree_map = get_tree_map(); // : TreeMap<String, String>
    let mut tree_map2 = tree_map.iter().map(|k, v| (k, v.to_json())); //error the type of this value must be known in this context
    let json1 = json::Object(tree_map2);
}
I want to convert tree_map to json. I tried to do it by converting it to TreeMap<String, Json> but failed. How can I do that?
The closure you passed to map takes two parameters, but it should take a single parameter that is a tuple type, because iter() returns an iterator over tuples (see Trait Implementations on Entries). Change |k, v| to |(k, v)| to fix this. (I found this by adding explicit type annotations on k, v: the compiler then complained about the closure not having the right number of parameters.)
There are some other errors however. Instead of using iter(), you might want to use into_iter() to avoid cloning the Strings if you don't need the TreeMap<String, String> anymore. Also, you should add .collect() after .map(...) to turn the iterator into a TreeMap. The compiler will automatically infer the type for tree_map2 based on the requirements for json::Object.
fn main() {
    let mut tree_map = get_tree_map(); // : TreeMap<String, String>
    let mut tree_map2 = tree_map.into_iter().map(|(k, v)| (k, v.to_json())).collect();
    let json1 = Json::Object(tree_map2);
}
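For what it's worth, this question and answer predate Rust 1.0; the serialize crate and TreeMap are long gone. A rough modern equivalent (a sketch, not the original answer's code) would use std::collections::BTreeMap together with serde_json:

use std::collections::BTreeMap;

fn main() {
    let mut tree_map: BTreeMap<String, String> = BTreeMap::new();
    tree_map.insert("key".to_string(), "value".to_string());

    // serde_json serializes a string-keyed map directly as a JSON object.
    let json1 = serde_json::to_string(&tree_map).unwrap();
    println!("{}", json1);
}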