In the example below, each run overwrites the previous test.csv with a new one. How can I append to test.csv instead of overwriting it?
use std::error::Error;
use std::process;

fn run() -> Result<(), Box<dyn Error>> {
    let file_path = std::path::Path::new("test.csv");
    // Writer::from_path creates the file, truncating (overwriting) any existing one.
    let mut wtr = csv::Writer::from_path(file_path)?;
    wtr.write_record(&["City", "State", "Population", "Latitude", "Longitude"])?;
    wtr.write_record(&["Davidsons Landing", "AK", "", "65.2419444", "-165.2716667"])?;
    wtr.write_record(&["Kenai", "AK", "7610", "60.5544444", "-151.2583333"])?;
    wtr.write_record(&["Oakman", "AL", "", "33.7133333", "-87.3886111"])?;
    wtr.flush()?;
    Ok(())
}

fn main() {
    if let Err(err) = run() {
        println!("{}", err);
        process::exit(1);
    }
}
The csv crate provides Writer::from_writer, so you can use anything that implements Write. When using a File, this answer to "What is the best variant for appending a new line in a text file?" shows a solution:
Using OpenOptions::append is the clearest way to append to a file:
use std::fs::OpenOptions;

let file = OpenOptions::new()
    .write(true)
    .append(true) // writes go to the end of the file instead of truncating it
    .open("test.csv")
    .unwrap();
let mut wtr = csv::Writer::from_writer(file);
Will the append solution work if the file does not yet exist?
Just add create(true) to the OpenOptions:
use std::fs::OpenOptions;

let file = OpenOptions::new()
    .write(true)
    .create(true) // create test.csv if it does not exist yet
    .append(true)
    .open("test.csv")
    .unwrap();
let mut wtr = csv::Writer::from_writer(file);
Related
I have an assets.csv file of 172 MB with a million rows and 16 columns. I would like to read a record directly from a saved offset (byte, line, or record). In the code below, I use the byte offset.
I have stored the required positions (record.position().byte(), saved in assets_index.csv), and I would like to read a particular line of assets.csv using the saved offset.
I am able to get the output, but I feel there must be a better way to read from a CSV file based on byte position.
Please advise. I am new to programming and to Rust, and I have learned a lot from the tutorials.
assets.csv has this format:
asset_id,year,depreciation,year,depreciation,year,depreciation,year,depreciation,year,depreciation,year,depreciation,year,depreciation,year,depreciation,year,depreciation,year,depreciation,year,depreciation,year,depreciation,year,depreciation,year,depreciation,year,depreciation
1000001,2015,10000,2016,10000,2017,10000,2018,10000,2019,10000,2020,10000,2021,10000,2022,10000,2023,10000,2024,10000,2025,10000,2026,10000,2027,10000,2028,10000,2029,10000
I used another function to get the Position { byte: 172999933, line: 1000000, record: 999999 }.
assets_index.csv has this format:
asset_id,offset_inbytes
1999999,172999933
use std::{error::Error, io};

fn read_from_position() -> Result<(), Box<dyn Error>> {
    let asset_pos: u64 = 172999933;
    let file_path = "assets.csv";
    let mut rdr = csv::ReaderBuilder::new()
        .flexible(true)
        .from_path(file_path)?;
    let mut wtr = csv::Writer::from_writer(io::stdout());
    let mut record = csv::ByteRecord::new();
    // Scans every record until reaching the one that starts at the saved byte offset.
    while rdr.read_byte_record(&mut record)? {
        let pos = record.position().expect("position of record");
        if pos.byte() == asset_pos {
            wtr.write_record(&record)?;
            break;
        }
    }
    wtr.flush()?;
    Ok(())
}
$ time ./target/release/testcsv
1999999,2015,10000,2016,10000,2017,10000,2018,10000,2019,10000,2020,10000,2021,10000,2022,10000,2023,10000,2024,10000,2025,10000,2026,10000,2027,10000,2028,10000,2029,10000
Time elapsed in readcsv() is: 239.290125ms
./target/release/testcsv 0.22s user 0.02s system 99% cpu 0.245 total
Instead of using from_path, you can use from_reader with a File and seek in that file before creating the csv::Reader:
use std::{error::Error, fs, io::{self, Seek}};

fn read_from_position() -> Result<(), Box<dyn Error>> {
    let asset_pos = 0x116 as u64; // offset to the only record in the example
    let file_path = "assets.csv";
    let mut f = fs::File::open(file_path)?;
    // Jump to the saved byte offset before the CSV reader sees the file.
    f.seek(io::SeekFrom::Start(asset_pos))?;
    let mut rdr = csv::ReaderBuilder::new()
        .flexible(true)
        // edit: as noted by @BurntSushi5, we have to disable headers here;
        // otherwise the record at the offset would be consumed as a header row.
        .has_headers(false)
        .from_reader(f);
    let mut wtr = csv::Writer::from_writer(io::stdout());
    let mut record = csv::ByteRecord::new();
    rdr.read_byte_record(&mut record)?;
    wtr.write_record(&record)?;
    wtr.flush()?;
    Ok(())
}
Then the first record read will be the one you're looking for.
I would like to load the following CSV file, in which the first two lines follow a different layout from the third line onward.
test.csv
[YEAR],2022,[Q],1,
[TEST],mid-term,[GRADE],3,
FirstName,LastName,Score,
AA,aaa,97,
BB,bbbb,15,
CC,cccc,66,
DD,ddd,73,
EE,eeeee,42,
FF,fffff,52,
GG,ggg,64,
HH,h,86,
II,iii,88,
JJ,jjjj,72,
However, I get the following error, which I think is caused by the difference in layout. How do I fix this error and load the CSV file as intended?
error message
StringRecord(["[YEAR]", "2022", "[Q]", "1", ""])
StringRecord(["[TEST]", "mid-term", "[GRADE]", "3", ""])
Error: Error(UnequalLengths { pos: Some(Position { byte: 47, line: 2, record: 2 }), expected_len: 5, len: 4 })
error: process didn't exit successfully: `target\debug\read_csv.exe` (exit code: 1)
main.rs
use csv::Error;
use csv::ReaderBuilder;
use encoding_rs;
use std::fs;

fn main() -> Result<(), Error> {
    let path = "./test.csv";
    let file = fs::read(path).unwrap();
    let (res, _, _) = encoding_rs::SHIFT_JIS.decode(&file);
    let mut reader = ReaderBuilder::new()
        .has_headers(false)
        .from_reader(res.as_bytes());
    for result in reader.records() {
        let record = result?;
        println!("{:?}", record)
    }
    Ok(())
}
Version
cargo = "1.62.0"
rustc = "1.62.0"
csv = "1.1.6"
encoding_rs = "0.8.31"
I can fix this error by enabling ReaderBuilder::flexible, which allows records to have different numbers of fields:
use csv::Error;
use csv::ReaderBuilder;
use encoding_rs;
use std::fs;

fn main() -> Result<(), Error> {
    let path = "./test.csv";
    let file = fs::read(path).unwrap();
    let (res, _, _) = encoding_rs::SHIFT_JIS.decode(&file);
    let mut reader = ReaderBuilder::new()
        .flexible(true) // added: accept records of varying length
        .has_headers(false)
        .from_reader(res.as_bytes());
    for result in reader.records() {
        let record = result?;
        println!("{:?}", record)
    }
    Ok(())
}
How does one read a CSV without a header in Rust? I've searched through the docs and gone through like 15 examples each of which is subtly not what I'm looking for.
Consider how easy Python makes it:
csv.DictReader(f, fieldnames=['city'])
How do you do this in Rust?
Current attempt:
use std::fs::File;
use serde::Deserialize;

#[derive(Debug, Deserialize)]
struct CityRow {
    city: String,
    pop: u32,
}

fn doit() -> zip::result::ZipResult<()>
{
    let filename = "cities.csv";
    let mut zip = zip::ZipArchive::new(File::open(filename).unwrap())?;
    let mut file = zip.by_index(0).unwrap();
    println!("Filename: {}", file.name());
    let mut reader = csv::Reader::from_reader(Box::new(file));
    reader.set_headers(csv::StringRecord([ "city", "pop" ]));
    for record in reader.records() {
        // let record: CityRow = record.unwrap();
        // let record = record?;
        println!("{:?}", record);
    }
    Ok(())
}
Use a ReaderBuilder, and call ReaderBuilder::has_headers to disable header parsing. You can then use StringRecord::deserialize to extract and print each record, skipping the first header row:
let mut reader = csv::ReaderBuilder::new()
    .has_headers(false)
    .from_reader(Box::new(file));
let headers = csv::StringRecord::from(vec!["city", "pop"]);
for record in reader.records().skip(1) {
    let record: CityRow = record.unwrap().deserialize(Some(&headers)).unwrap();
    println!("{:?}", record);
}
(playground)
@smitop's answer didn't entirely make sense to me after looking at the underlying code, since the library assumes by default that a header row exists. That means the code below should work directly, and I found that it did:
let mut reader = csv::Reader::from_reader(data.as_bytes());
for record in reader.deserialize() {
    let record: CityRow = record.unwrap();
    println!("{:?}", record);
}
I checked through the variants in this playground.
For what it's worth, it turned out in my case I had accidentally left a code path in that was reading my csv as a plain file, which is why I had seen headers read as a row. (Oops.)
I'm using the CSV crate to read CSV files. I then parse the content. I would like to unit test the parsing logic. Here's a simplified version of the code:
use std::error::Error;
use std::fs::File;
use csv::StringRecordsIter;

fn main() -> Result<(), Box<dyn Error>> {
    let mut rdr = csv::ReaderBuilder::new()
        .from_path("test.csv")?;
    process(rdr.records())?;
    Ok(())
}

fn process(iter: StringRecordsIter<File>) -> Result<String, String> {
    for result in iter {
        // Parsing takes place here
        println!("{:?}", result);
    }
    // Post-parsing using entire file content takes place here
    Ok(String::from("My Result Here"))
}
In my unit test I would like to be able to construct sequences of StringRecord objects, pass them to the process() method and validate the results. I can successfully create a StringRecord using the simple StringRecord::new() and fill it with values using record.push_field("my field value"). However, I'm struggling to create an iterator that returns my values to pass to the process(). Any suggestions? I'm happy to change the arguments to process() if this makes things easier.
The suggestion made by Jmb to change the signature of process() to fn process(iter: impl Iterator<Item = csv::Result<StringRecord>>) -> Result<String, String> works nicely.
Here's the solution in detail. First, the only change to process() is its signature, which now accepts any iterator of csv::Result<StringRecord>:
use csv::StringRecord;

fn process(iter: impl Iterator<Item = csv::Result<StringRecord>>) -> Result<String, String> {
    for result in iter {
        // Parsing takes place here
        println!("{:?}", result);
    }
    // Post-parsing using entire file content takes place here
    Ok(String::from("My Result Here"))
}
main() stays the same, because rdr.records() can still be passed to process(). The test then looks like this:
#[test]
fn my_test() -> Result<(), String> {
    let record1 = result_record(&["Value 1", "Value 2"]);
    let record2 = result_record(&["Value 3", "Value 4"]);
    let records = vec![record1, record2];
    let result = process(records.into_iter())?;
    assert_eq!("My Result Here", result);
    Ok(())
}

fn result_record(fields: &[&str]) -> csv::Result<StringRecord> {
    let mut record = StringRecord::new();
    for field in fields {
        record.push_field(field);
    }
    Ok(record)
}
I'm trying to return a JSON file using a Nickel template. I found some API sample code that returns a JSON response and modified it:
extern crate rustc_serialize;
#[macro_use]
extern crate nickel;

use nickel::{Nickel, HttpRouter, JsonBody};
use nickel::mimes::MediaType;
use nickel::status::*;
use rustc_serialize::json;
use std::collections::HashMap;

#[derive(RustcDecodable, RustcEncodable)]
struct Person {
    firstname: String,
    lastname: String,
}

fn main() {
    let mut server = Nickel::new();

    server.get("/post", middleware! { |request, mut response|
        let person: Person = Person { firstname: "firstName ".to_string(), lastname: "lastName".to_string() };
        let mut p: Vec<Person> = vec![];
        p.push(person);
        let json_data = json::encode(&p).unwrap();
        let mut data_result = "{\"status\": 200, \"data\":".to_owned();
        data_result.push_str(&json_data.to_string());
        data_result.push_str("}");
        response.set(StatusCode::Ok);
        response.set(MediaType::Json);
        format!("{}", data_result)
    });

    server.get("/json", middleware! { |_, response|
        let mut data = HashMap::new();
        data.insert("name", "user");
        return response.render("app/views/temp.tpl", &data);
        // template source
        //
        //{name: {{name}}}
        //
    });

    server.listen("127.0.0.1:6767");
}
And /post returns this JSON:
{ "status": 200, "data": [{ "firstname": "firstName ", "lastname": "lastName" }]}
/json returns this text:
"name: user"
How can I return a JSON file using templates?
Correction: /json actually returns
{name: user}
All you need to do is add response.set(MediaType::Json); like you already have in the other handler:
#[macro_use]
extern crate nickel;

use nickel::{Nickel, HttpRouter};
use nickel::mimes::MediaType;
use std::collections::HashMap;

fn main() {
    let mut server = Nickel::new();
    server.get("/json", middleware! { |_, mut response| {
        let mut data = HashMap::new();
        data.insert("name", "user");
        // Tell the client that the rendered template is JSON.
        response.set(MediaType::Json);
        return response.render("app/views/temp.tpl", &data);
    }});
    server.listen("127.0.0.1:6767");
}
Now, this may not be a good idea. Creating structured formats (CSV, JSON, XML, etc.) via string concatenation often has problems with malformed documents or improperly escaped data.