Rust Read CSV without header

How does one read a CSV without a header in Rust? I've searched through the docs and gone through maybe 15 examples, each of which is subtly not what I'm looking for.
Consider how easy Python makes it:
csv.DictReader(f, fieldnames=['city'])
How do you do this in Rust?
Current attempt:
use std::fs::File;
use serde::Deserialize;

#[derive(Debug, Deserialize)]
struct CityRow {
    city: String,
    pop: u32,
}

fn doit() -> zip::result::ZipResult<()> {
    let filename = "cities.csv";
    let mut zip = zip::ZipArchive::new(File::open(filename).unwrap())?;
    let mut file = zip.by_index(0).unwrap();
    println!("Filename: {}", file.name());

    let mut reader = csv::Reader::from_reader(Box::new(file));
    reader.set_headers(csv::StringRecord::from(vec!["city", "pop"]));
    for record in reader.records() {
        // let record: CityRow = record.unwrap();
        // let record = record?;
        println!("{:?}", record);
    }
    Ok(())
}

Use a ReaderBuilder and call ReaderBuilder::has_headers(false) to disable header parsing. You can then use StringRecord::deserialize to extract and print each record, skipping the first (header) row:
let mut reader = csv::ReaderBuilder::new()
    .has_headers(false)
    .from_reader(Box::new(file));
let headers = csv::StringRecord::from(vec!["city", "pop"]);
for record in reader.records().skip(1) {
    let record: CityRow = record.unwrap().deserialize(Some(&headers)).unwrap();
    println!("{:?}", record);
}
(playground)

@smitop's answer didn't totally make sense to me: looking at the underlying code, the library appears to assume headers exist by default. That means the code below should work directly, and I found it did:
let mut reader = csv::Reader::from_reader(data.as_bytes());
for record in reader.deserialize() {
    let record: CityRow = record.unwrap();
    println!("{:?}", record);
}
I checked through the variants in this playground.
For what it's worth, it turned out in my case I had accidentally left a code path in that was reading my csv as a plain file, which is why I had seen headers read as a row. (Oops.)

How can I return a record from a CSV file using the byte position of a line?

I have an assets.csv file of 172 MB, with a million rows and 16 columns. I would like to read it using an offset (bytes/line/record). In the code below, I am using the byte value.
I have stored the required positions (record.position().byte()) in assets_index.csv, and I would like to read a particular line in assets.csv using the saved offset.
I am able to get an output, but I feel there must be a better way to read from a CSV file based on byte position.
Please advise. I am new to programming and to Rust, and learned a lot using the tutorials.
The assets.csv is of this format:
asset_id,year,depreciation,year,depreciation,year,depreciation,year,depreciation,year,depreciation,year,depreciation,year,depreciation,year,depreciation,year,depreciation,year,depreciation,year,depreciation,year,depreciation,year,depreciation,year,depreciation,year,depreciation
1000001,2015,10000,2016,10000,2017,10000,2018,10000,2019,10000,2020,10000,2021,10000,2022,10000,2023,10000,2024,10000,2025,10000,2026,10000,2027,10000,2028,10000,2029,10000
I used another function to get the Position { byte: 172999933, line: 1000000, record: 999999 }.
The assets_index.csv is of this format:
asset_id,offset_inbytes
1999999,172999933
fn read_from_position() -> Result<(), Box<dyn Error>> {
    let asset_pos = 172999933u64;
    let file_path = "assets.csv";
    let mut rdr = csv::ReaderBuilder::new()
        .flexible(true)
        .from_path(file_path)?;
    let mut wtr = csv::Writer::from_writer(io::stdout());
    let mut record = csv::ByteRecord::new();
    while rdr.read_byte_record(&mut record)? {
        let pos = record.position().expect("position of record");
        if pos.byte() == asset_pos {
            wtr.write_record(&record)?;
            break;
        }
    }
    wtr.flush()?;
    Ok(())
}
$ time ./target/release/testcsv
1999999,2015,10000,2016,10000,2017,10000,2018,10000,2019,10000,2020,10000,2021,10000,2022,10000,2023,10000,2024,10000,2025,10000,2026,10000,2027,10000,2028,10000,2029,10000
Time elapsed in readcsv() is: 239.290125ms
./target/release/testcsv 0.22s user 0.02s system 99% cpu 0.245 total
Instead of using from_path you can use from_reader with a File, and seek in that file before creating the csv::Reader:
use std::{error::Error, fs, io::{self, Seek}};

fn read_from_position() -> Result<(), Box<dyn Error>> {
    let asset_pos = 0x116u64; // offset to the only record in the example
    let file_path = "assets.csv";
    let mut f = fs::File::open(file_path)?;
    f.seek(io::SeekFrom::Start(asset_pos))?;
    let mut rdr = csv::ReaderBuilder::new()
        .flexible(true)
        // edit: as noted by @BurntSushi5, we have to disable headers here.
        .has_headers(false)
        .from_reader(f);
    let mut wtr = csv::Writer::from_writer(io::stdout());
    let mut record = csv::ByteRecord::new();
    rdr.read_byte_record(&mut record)?;
    wtr.write_record(&record)?;
    wtr.flush()?;
    Ok(())
}
Then the first record read will be the one you're looking for.
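The idea generalizes beyond the csv crate: once you have an index of byte offsets, you can seek straight to a record instead of scanning the whole file. A minimal standard-library-only sketch, with io::Cursor standing in for the file and a hypothetical record_at helper:

```rust
use std::io::{BufRead, BufReader, Cursor, Seek, SeekFrom};

// Jump straight to a saved byte offset and read one record (line),
// instead of scanning every record from the start of the file.
fn record_at(data: &str, offset: u64) -> String {
    let mut cursor = Cursor::new(data);
    cursor.seek(SeekFrom::Start(offset)).unwrap();
    let mut line = String::new();
    BufReader::new(cursor).read_line(&mut line).unwrap();
    line.trim_end().to_string()
}

fn main() {
    let csv = "asset_id,year\n1000001,2015\n1000002,2016\n";
    // 14 = length of the header line plus its newline, i.e. the
    // byte offset of the first data row.
    println!("{}", record_at(csv, 14)); // prints "1000001,2015"
}
```

With a real file, `fs::File` replaces the cursor and the offsets come from the saved index, as in the answer above.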

JSON to dict with class

I decode some JSON and try to typecast it to a dictionary of [String: <my class>], and it fails. I have found that often the reason I have trouble doing something is a misunderstanding of how Swift works, so here is what I want to happen. Feel free to tell me that I am doing it wrong and that if I do it some other way all will be wonderful.
I want my data to live between runs of the app, so I have to save the data to storage between runs. I have an object (data and associated code), and there are places where changes I make to a copy should reflect back to the original, so it is a class. I have a bunch of these objects, and most of the time I pick the one I want based on an id that is an integer. An array is not good since it would be sparse, because some ids are not used. I came up with a dictionary with the id as key and the structure as value. I turned the key from an Int into a String, because converting a dictionary to JSON is MUCH easier when the key is a string. I save the JSON string. When the app starts again, I read the string in and convert the JSON string to Any. Then I typecast the result to the desired dictionary. This is where it fails: the cast does not work. In my Googling, the samples I found said this should work.
Here is my code:
class Y: Codable, Hashable {
    var a: String = "c"

    static func ==(lhs: Y, rhs: Y) -> Bool {
        return lhs.a == rhs.a
    }

    func hash(into hasher: inout Hasher) {
        hasher.combine(a)
    }
}

struct ContentView: View {
    var body: some View {
        VStack {
            Button("Error") {
                let y = Y()
                var yDict = [String: Y]()
                yDict["0"] = y
                do {
                    let encodedData = try JSONEncoder().encode(yDict)
                    let jsonString = String(data: encodedData, encoding: .utf8)
                    let decoded = try JSONSerialization.jsonObject(with: encodedData, options: [])
                    if let yyDictDec = decoded as? [String: Y] {
                        print("yDict after decode")
                        print(yyDictDec)
                    }
                } catch {
                    print(error.localizedDescription)
                }
                print("x")
            }
        }
    }
}
In this code the if let yyDictDec = is failing, I think, because the prints after it never happen. I can cast it as [String: Any], but I really need it to be my class.
My problem is in converting the JSON back to the dictionary. I feel I am missing something fairly simple.
DonĀ“t use JSONSerialization use JsonDecoder and decode it to the the type it was before encoding. e.g.:
let decoded = try JSONDecoder().decode([String: Y].self, from: encodedData)

Unit testing CSV parsing logic

I'm using the CSV crate to read CSV files. I then parse the content. I would like to unit test the parsing logic. Here's a simplified version of the code:
use std::error::Error;
use std::fs::File;
use csv::StringRecordsIter;

fn main() -> Result<(), Box<dyn Error>> {
    let mut rdr = csv::ReaderBuilder::new()
        .from_path("test.csv")?;
    process(rdr.records());
    Ok(())
}

fn process(iter: StringRecordsIter<File>) -> Result<String, String> {
    for result in iter {
        // Parsing takes place here
        println!("{:?}", result);
    }
    // Post-parsing using entire file content takes place here
    Ok(String::from("My Result Here"))
}
In my unit test I would like to be able to construct sequences of StringRecord objects, pass them to the process() method and validate the results. I can successfully create a StringRecord using the simple StringRecord::new() and fill it with values using record.push_field("my field value"). However, I'm struggling to create an iterator that returns my values to pass to the process(). Any suggestions? I'm happy to change the arguments to process() if this makes things easier.
The suggestion made by Jmb to change the signature of process() to fn process(iter: impl Iterator<Item = csv::Result<StringRecord>>) -> Result<String, String> works nicely.
Here's the solution in detail. First, the only change to process() is to make it accept more types:
fn process(iter: impl Iterator<Item = csv::Result<StringRecord>>) -> Result<String, String> {
    for result in iter {
        // Parsing takes place here
        println!("{:?}", result);
    }
    // Post-parsing using entire file content takes place here
    Ok(String::from("My Result Here"))
}
The main() remains identical, as rdr.records() can still be passed to process(). The testing then looks like this:
#[test]
fn my_test() -> Result<(), String> {
    let record1 = result_record(&["Value 1", "Value 2"]);
    let record2 = result_record(&["Value 3", "Value 4"]);
    let records = vec![record1, record2];
    let result = process(records.into_iter())?;
    assert_eq!("My Result Here", result);
    Ok(())
}

fn result_record(fields: &[&str]) -> csv::Result<StringRecord> {
    let mut record = StringRecord::new();
    for field in fields {
        record.push_field(field);
    }
    Ok(record)
}

Function for reading in JSON files

I am both new to Stack Overflow and to Swift programming (coming from statistics/R), so please bear with me in case I'm doing something super wrong. OK, let's get to my problem: I want to read in multiple JSON files, e.g. metrics.json and accounts.json. After having read various posts on the topic, I have written the following function and created one struct for each of the JSON files, replicating its structure:
func readJsonFile(fileName: String) -> [Metric]? {
    var result: [Metric]?
    if let path = Bundle.main.path(forResource: fileName, ofType: "json") {
        do {
            let jsonData = try Data(contentsOf: URL(fileURLWithPath: path))
            result = try JSONDecoder().decode([Metric].self, from: jsonData)
        } catch {
            fatalError("Unresolved error when loading JSON file \(fileName): \(error)")
        }
    }
    return result
}

struct Metric: Codable {
    let METRIC_ID: Int
    let METRIC_NAME: String
}

struct Account: Codable {
    let ACCOUNT_ID: Int
    let ACCOUNT_NAME: String
}
Now what I am trying to achieve is to somehow provide not just the filename to the function but also the respective struct as a blueprint. I would probably also need to change the function's output type dynamically. It would then look somehow like this:
let metrics = readJsonFile("metrics", Metric)
let accounts = readJsonFile("accounts", Account)
I have a feeling that might be an easy thing if one knows his way around. Unfortunately, I do not. Can someone help me with any suggestions? Or should I in general take a different approach? Also, if there is anything else odd or wrong in the code, happy to receive any constructive feedback. Thanks guys.
I think what I would do is make the function generic over Codable, like this:
enum MyError: Error { case noSuchFile }

func readJsonFile<T: Codable>(fileName: String, type: T.Type) throws -> [T] {
    if let path = Bundle.main.path(forResource: fileName, ofType: "json") {
        let jsonData = try Data(contentsOf: URL(fileURLWithPath: path))
        return try JSONDecoder().decode([T].self, from: jsonData)
    }
    throw MyError.noSuchFile
}
Now you can call it for either file and either struct:
let x = try? self.readJsonFile(fileName:"metric", type:Metric.self)
let y = try? self.readJsonFile(fileName:"account", type:Account.self)
I've revised your function to throw when there's an error, and I count the lack of the appropriate file as an error. The outcome of using try? in the call is that you get an Optional, which is nil if things went badly and an array of the correct struct type if things went well. If you don't like that, you can use do/catch instead, or even try! if you're dead sure this will always work (you should be; they are your files, after all).

How can I validate that the headers of a CSV file match my struct?

I need to parse a CSV file, but before actually parsing, I need to check whether the file's header matches my struct.
The problem is that some fields may be missing or the order of the fields may be different for different files.
I have a struct for a dish:
struct Dish {
    title: String,
    ingredients: Vec<String>,
    spicy: Option<bool>,
    vegetarian: Option<bool>,
}
I need to generate an error for any CSV file whose header is missing fields from the struct (fields that are not Option) or has extra fields:
title;spicy;vegeterian
title;ingredients;poisoned
The csv crate has support for serde. The following example, adapted from the docs, should do what you want:
use std::error::Error;
use std::io;
use std::process;
use serde::Deserialize;

#[derive(Debug, Deserialize)]
struct Dish {
    title: String,
    ingredients: Vec<String>,
    spicy: Option<bool>,
    vegetarian: Option<bool>,
}

fn example() -> Result<(), Box<dyn Error>> {
    let mut rdr = csv::Reader::from_reader(io::stdin());
    for result in rdr.deserialize() {
        // Notice that we need to provide a type hint for automatic
        // deserialization.
        let dish: Dish = result?;
        println!("{:?}", dish);
    }
    Ok(())
}

fn main() {
    if let Err(err) = example() {
        println!("error running example: {}", err);
        process::exit(1);
    }
}