Read json file in and write without indentation - json

The following code takes a folder of JSON files (saved with indentation), opens each one, parses the content, serializes it back to JSON, and writes it to a new file.
The same task works in Python, so the data is not the problem. This is the Rust version:
extern crate rustc_serialize;
use rustc_serialize::json;
use std::io::Read;
use std::fs::read_dir;
use std::fs::File;
use std::io::Write;
use std::io;
use std::str;
fn write_data(filepath: &str, data: json::Json) -> io::Result<()> {
    let mut ofile = try!(File::create(filepath));
    let encoded: String = json::encode(&data).unwrap();
    try!(ofile.write(encoded.as_bytes()));
    Ok(())
}
fn main() {
    let root = "/Users/bling/github/data/".to_string();
    let folder_path = root + &"texts";
    let paths = read_dir(folder_path).unwrap();
    for path in paths {
        let input_filename = format!("{}", path.unwrap().path().display());
        let output_filename = str::replace(&input_filename, "texts", "texts2");
        let mut data = String::new();
        let mut f = File::open(input_filename).unwrap();
        f.read_to_string(&mut data).unwrap();
        let json = json::Json::from_str(&data).unwrap();
        write_data(&output_filename, json).unwrap();
    }
}
Have you already spotted an error in my code, or did I get some language concepts wrong? Is the rustc-serialize crate used incorrectly? In the end it does not work as expected, which was to outperform Python.
± % cargo run --release --verbose
Fresh rustc-serialize v0.3.16
Fresh fileprocessing v0.1.0 (file:///Users/bling/github/rust/fileprocessing)
Running `target/release/fileprocessing`
thread '<main>' panicked at 'called `Result::unwrap()` on an `Err` value: SyntaxError("unescaped control character in string", 759, 55)', ../src/libcore/result.rs:736
Process didn't exit successfully: `target/release/fileprocessing` (exit code: 101)
Why does it throw an error? Is my JSON serialization done wrong?
Can I get the object it fails on? What about encoding?
...is the code right, or is there something obviously wrong to someone with more experience?

Wild guess: if the same input file can be parsed by other JSON parsers (e.g. in Python), you may be hitting a rustc-serialize bug that was fixed in https://github.com/rust-lang-nursery/rustc-serialize/pull/142. Try to update?
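On the question of finding the input it fails on: assuming the two numbers in SyntaxError(..., 759, 55) are the 1-based line and column of the failure (an assumption worth checking against the rustc-serialize documentation), a small standard-library helper can print that line with control characters made visible. This is only a sketch; the name offending_line and the sample data are hypothetical.

```rust
// Hypothetical helper: returns the given 1-based line with control characters escaped.
fn offending_line(data: &str, line_no: usize) -> Option<String> {
    data.lines()
        // A line number of 0 has no 0-based equivalent, so `?` returns None for it.
        .nth(line_no.checked_sub(1)?)
        // `escape_default` renders tabs and other control characters as visible escapes.
        .map(|line| line.escape_default().to_string())
}

fn main() {
    // Stand-in for file contents; the second line holds a raw tab inside a string.
    let data = "{\n\"text\": \"a\tb\"\n}";
    match offending_line(data, 2) {
        Some(line) => println!("line 2: {}", line),
        None => println!("no such line"),
    }
}
```

In the program from the question, the same helper could be called on the data string with the line number taken from the error before unwrapping.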

Related

How to write a vector to a json file?

How to write a vector to a JSON file in rust?
Code:
use std::fs::File;
use std::io::prelude::*;
let vec1 = vec![1.0,2.0,2.1,0.6];
let mut file = File::create("results.json").unwrap();
let data = serde_json::to_string(&vec1).unwrap();
file.write(&data);
Error:
mismatched types
expected reference `&[u8]`
found reference `&std::string::String` rustc(E0308)
Instead of writing the data to an in-memory string first, you can also write it directly to the file:
use std::fs::File;
use std::io::{BufWriter, Write};
fn main() -> std::io::Result<()> {
    let vec = vec![1, 2, 3];
    let file = File::create("a")?;
    let mut writer = BufWriter::new(file);
    serde_json::to_writer(&mut writer, &vec)?;
    writer.flush()?;
    Ok(())
}
This approach has a lower memory footprint and is generally preferred. Note that you should use buffered writes to the file, since serialization could otherwise result in many small writes to the file, which would severely reduce performance.
If you need to write the data to memory first for some reason, I suggest using serde_json::to_vec() instead of serde_json::to_string(), since that function will give you a Vec<u8> immediately.
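The type mismatch in the question arises because the write methods take a &[u8], while &data is a &String. A minimal sketch of the in-memory route, using only the standard library: String::as_bytes yields the byte slice, and write_all makes sure the whole buffer is written. The helper name write_text is hypothetical, and the JSON text is hard-coded here rather than produced by serde_json.

```rust
use std::io::{self, Write};

// Hypothetical helper: writes a string to any `Write` sink (a `File`, a `Vec<u8>`, ...).
fn write_text<W: Write>(mut sink: W, text: &str) -> io::Result<()> {
    // `as_bytes` borrows the string's UTF-8 bytes as `&[u8]`, which is what `Write` expects.
    // `write_all` keeps writing until the whole slice is consumed; plain `write` may stop early.
    sink.write_all(text.as_bytes())
}

fn main() -> io::Result<()> {
    // Stand-in for the output of `serde_json::to_string(&vec1)`.
    let data = String::from("[1.0,2.0,2.1,0.6]");
    // An in-memory buffer is used instead of a file to keep the sketch self-contained.
    let mut buffer: Vec<u8> = Vec::new();
    write_text(&mut buffer, &data)?;
    println!("{} bytes written", buffer.len());
    Ok(())
}
```

Passing a File (or a BufWriter around one) as the sink works the same way, since both implement Write.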

Error while reading JSON file in Rust using BufReader: the trait bound Result: std::io::Read is not satisfied [duplicate]

I am trying to read JSON from a file:
use std::error::Error;
use std::fs::File;
use std::io::BufReader;
use std::path::Path;
impl Params {
    pub fn new(raw_opt2: opt::Opt, path: String) -> Self {
        // Open the file in read-only mode with buffer.
        let file = File::open(path);
        let reader = BufReader::new(file);
        Self {
            opt_raw: raw_opt2,
            module_settings: serde_json::from_reader(reader).unwrap(),
        }
    }
}
But I'm getting an error:
error[E0277]: the trait bound `std::result::Result<std::fs::File, std::io::Error>: std::io::Read` is not satisfied
  --> src\params.rs:20:37
   |
20 |         let reader = BufReader::new(file);
   |                                     ^^^^ the trait `std::io::Read` is not implemented for `std::result::Result<std::fs::File, std::io::Error>`
   |
   = note: required by `std::io::BufReader::<R>::new`
The File::open operation returns a Result - signifying that the open operation could succeed or fail.
This is one standout feature of Rust compared to many other languages: it tries to force you to deal with errors. Compare:
C - just returns an int
Python - exceptions (try: finally:)
C++ - exceptions (needs a libstdc++ runtime)
As you can expect, this costs more programming time at the start, but overall leads to far fewer hassles and higher-quality programs.
After the line let file = File::open(path); you have to deal with the result.
If you don't care, and want to crash the program if the file can't be opened:
let file = File::open(path).unwrap();
To make a better error message in the crash:
let file = File::open(path).expect("Unable to open file");
To do it properly - read the Rust book
Most likely, you'll want to return a Result yourself from your function. Then you could rewrite it something like this (to use a match):
impl Params {
    pub fn new(raw_opt2: opt::Opt, path: String) -> Result<Self, std::io::Error> {
        // Open the file in read-only mode with buffer.
        match File::open(path) {
            Ok(file) => {
                let reader = BufReader::new(file);
                Ok(Self {
                    opt_raw: raw_opt2,
                    module_settings: serde_json::from_reader(reader).unwrap(),
                })
            }
            Err(err) => Err(err),
        }
    }
}
.. or a more functional way:
impl Params {
    pub fn new(raw_opt2: opt::Opt, path: String) -> Result<Self, std::io::Error> {
        // Open the file in read-only mode with buffer.
        File::open(path).map(|file| {
            let reader = BufReader::new(file);
            Self {
                opt_raw: raw_opt2,
                module_settings: serde_json::from_reader(reader).unwrap(),
            }
        })
    }
}
Update:
Now I generally use these two libraries for error management:
thiserror: when I'm writing libraries and creating my own error types
anyhow: when writing applications, scripts, or tests, to easily handle all the library errors
.. and of course I didn't mention the ? operator, which makes working with results and options so much easier.
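As a rough illustration of the ? operator mentioned above, here is a standard-library-only sketch. The helper name open_reader and the file name are hypothetical and not part of the original Params code.

```rust
use std::fs::File;
use std::io::{self, BufReader};

// Hypothetical helper: opens a file and wraps it in a buffered reader.
fn open_reader(path: &str) -> io::Result<BufReader<File>> {
    // `?` returns the `Err` to the caller if opening fails; otherwise it yields the `File`.
    let file = File::open(path)?;
    Ok(BufReader::new(file))
}

fn main() {
    // The path is a placeholder; a missing file produces an `Err` instead of a panic.
    match open_reader("settings.json") {
        Ok(_reader) => println!("file opened"),
        Err(err) => eprintln!("unable to open file: {}", err),
    }
}
```

The function body is equivalent to the match version shown earlier, with the Err(err) => Err(err) arm folded into the ? operator.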

How to read a non-UTF8 encoded csv file?

With the csv crate and the latest Rust version (1.31.0), I want to read CSV files with ANSI (Windows-1252) encoding as easily as UTF-8 files.
Things I have tried (with no luck), after reading the whole file in a Vec<u8>:
CString
OsString
Indeed, in my company, we have a lot of CSV files, ANSI encoded.
Also, if possible, I would prefer not to load the entire file into a Vec<u8> but to read it line by line (CRLF line endings), as many of the files are big (50 MB or more…).
In the file Cargo.toml, I have this dependency:
[dependencies]
csv = "1"
test.csv consists of the following content, saved with Windows-1252 encoding:
Café;au;lait
Café;au;lait
The code in main.rs file:
extern crate csv;
use std::error::Error;
use std::fs::File;
use std::io::BufReader;
use std::path::Path;
use std::process;
fn example() -> Result<(), Box<Error>> {
    let file_name = r"test.csv";
    let file_handle = File::open(Path::new(file_name))?;
    let reader = BufReader::new(file_handle);
    let mut rdr = csv::ReaderBuilder::new()
        .delimiter(b';')
        .from_reader(reader);
    // println!("ANSI");
    // for result in rdr.byte_records() {
    //     let record = result?;
    //     println!("{:?}", record);
    // }
    println!("UTF-8");
    for result in rdr.records() {
        let record = result?;
        println!("{:?}", record);
    }
    Ok(())
}
fn main() {
    if let Err(err) = example() {
        println!("error running example: {}", err);
        process::exit(1);
    }
}
The output is:
UTF-8
error running example: CSV parse error: record 0 (line 1, field: 0, byte: 0): invalid utf-8: invalid UTF-8 in field 0 near byte index 3
error: process didn't exit successfully: `target\debug\test-csv.exe` (exit code: 1)
When using rdr.byte_records() (uncommenting the relevant part of code), the output is:
ANSI
ByteRecord(["Caf\\xe9", "au", "lait"])
I suspect this question is underspecified. In particular, it's not clear why your use of the ByteRecord API is insufficient. In the csv crate, byte records exist specifically for cases like this, where your CSV data isn't strictly UTF-8 but is in an alternative encoding such as Windows-1252 that is ASCII compatible. (An ASCII compatible encoding is an encoding in which ASCII is a subset. Windows-1252 and UTF-8 are both ASCII compatible. UTF-16 is not.) Your code sample above shows that you're using byte records, but doesn't explain why this is insufficient.
With that said, if your goal is to get your data into Rust's string data type (String/&str), then your only option is to transcode the contents of your CSV data from Windows-1252 to UTF-8. This is necessary because Rust's string data type uses UTF-8 for its in-memory representation. You cannot have a Rust String/&str that is Windows-1252 encoded because Windows-1252 is not a subset of UTF-8.
Other comments have recommended the use of the encoding crate. However, I would instead recommend the use of encoding_rs, if your use case aligns with the same use cases solved by the Encoding Standard, which is specifically geared towards the web. Fortunately, I believe such an alignment exists.
In order to satisfy your criteria for reading CSV data in a streaming fashion without first loading the entire contents into memory, you need to use a wrapper around the encoding_rs crate that implements streaming decoding for you. The encoding_rs_io crate provides this for you. (It's used inside of ripgrep to do fast streaming decoding before searching UTF-8.)
Here is an example program that puts all of the above together, using Rust 2018:
use std::fs::File;
use std::process;
use encoding_rs::WINDOWS_1252;
use encoding_rs_io::DecodeReaderBytesBuilder;
fn main() {
    if let Err(err) = try_main() {
        eprintln!("{}", err);
        process::exit(1);
    }
}
fn try_main() -> csv::Result<()> {
    let file = File::open("test.csv")?;
    let transcoded = DecodeReaderBytesBuilder::new()
        .encoding(Some(WINDOWS_1252))
        .build(file);
    let mut rdr = csv::ReaderBuilder::new()
        .delimiter(b';')
        .from_reader(transcoded);
    for result in rdr.records() {
        let r = result?;
        println!("{:?}", r);
    }
    Ok(())
}
with the Cargo.toml:
[package]
name = "so53826986"
version = "0.1.0"
edition = "2018"
[dependencies]
csv = "1"
encoding_rs = "0.8.13"
encoding_rs_io = "0.1.3"
And the output:
$ cargo run --release
Compiling so53826986 v0.1.0 (/tmp/so53826986)
Finished release [optimized] target(s) in 0.63s
Running `target/release/so53826986`
StringRecord(["Café", "au", "lait"])
In particular, if you swap out rdr.records() for rdr.byte_records(), then we can see more clearly what happened:
$ cargo run --release
Compiling so53826986 v0.1.0 (/tmp/so53826986)
Finished release [optimized] target(s) in 0.61s
Running `target/release/so53826986`
ByteRecord(["Caf\\xc3\\xa9", "au", "lait"])
Namely, your input contained Caf\xE9, but the byte record now contains Caf\xC3\xA9. This is a result of translating the Windows-1252 codepoint value of 233 (encoded as its literal byte, \xE9) to U+00E9 LATIN SMALL LETTER E WITH ACUTE, which is UTF-8 encoded as \xC3\xA9.
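The byte-level change described above can be reproduced with the standard library alone. This sketch decodes the bytes as ISO-8859-1, which agrees with Windows-1252 for ASCII and for 0xA0 through 0xFF but not for 0x80 through 0x9F, so it only illustrates the é case and is not a substitute for encoding_rs. The helper name latin1_to_string is hypothetical.

```rust
// Decodes bytes as ISO-8859-1 (Latin-1), where every byte maps to the code point
// of the same value. This matches Windows-1252 for ASCII and 0xA0..=0xFF only,
// which is an assumption that holds for the "Café" sample but not in general.
fn latin1_to_string(bytes: &[u8]) -> String {
    bytes.iter().map(|&b| b as char).collect()
}

fn main() {
    let raw = b"Caf\xE9";
    // The raw Windows-1252 bytes are not valid UTF-8 ...
    assert!(std::str::from_utf8(raw).is_err());
    let text = latin1_to_string(raw);
    // ... but after transcoding, the single byte 0xE9 has become the two bytes 0xC3 0xA9.
    assert_eq!(text, "Café");
    assert_eq!(text.as_bytes(), b"Caf\xC3\xA9");
    println!("{:?} -> {:?}", raw, text.as_bytes());
}
```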

How to iterate / stream a gzip file (containing a single csv)?

How can I iterate over a gzipped file which contains a single text file (csv)?
Searching crates.io I found flate2 which has the following code example for decompression:
extern crate flate2;
use std::io::prelude::*;
use flate2::read::GzDecoder;
fn main() {
    let mut d = GzDecoder::new("...".as_bytes()).unwrap();
    let mut s = String::new();
    d.read_to_string(&mut s).unwrap();
    println!("{}", s);
}
How to stream a gzip csv file?
For streaming I/O operations Rust has the Read and Write traits. To iterate over input by lines you usually want the BufRead trait, which you can always get by wrapping a Read implementation in BufReader::new.
flate2 already operates with these traits; GzDecoder implements Read, and GzDecoder::new takes anything that implements Read.
Example decoding stdin (doesn't work well on playground of course):
extern crate flate2;
use std::io;
use std::io::prelude::*;
use flate2::read::GzDecoder;
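fn main() {
    let stdin = io::stdin();
    let stdin = stdin.lock(); // or just open any normal file
    // Note: in flate2 1.x, `GzDecoder::new` returns the decoder directly rather than
    // a `Result`, so the `.expect(...)` call has to be dropped there.
    let d = GzDecoder::new(stdin).expect("couldn't decode gzip stream");
    for line in io::BufReader::new(d).lines() {
        println!("{}", line.unwrap());
    }
}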
You can then decode your lines with your usual ("without gzip") logic; perhaps make it generic by taking any input implementing BufRead.
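A minimal sketch of what "generic over BufRead" could look like, using only the standard library. The function name count_rows is hypothetical, and an in-memory Cursor stands in for the decompressed stream so the example runs without flate2.

```rust
use std::io::{self, BufRead, Cursor};

// Hypothetical helper: counts non-empty lines from any buffered source,
// such as a `BufReader` around a gzip decoder, a locked stdin, or an in-memory cursor.
fn count_rows<R: BufRead>(input: R) -> io::Result<usize> {
    let mut rows = 0;
    for line in input.lines() {
        // Each item is an `io::Result<String>`; `?` forwards read or UTF-8 errors.
        if !line?.trim().is_empty() {
            rows += 1;
        }
    }
    Ok(rows)
}

fn main() -> io::Result<()> {
    // An in-memory stand-in for the decompressed CSV stream.
    let data = Cursor::new("a;b;c\n1;2;3\n".as_bytes());
    println!("rows: {}", count_rows(data)?);
    Ok(())
}
```

Because the function only asks for BufRead, the same logic can be exercised in tests without any compressed input.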

Simple modification of JSON object without serialization gets "cannot borrow immutable borrowed content as mutable"

I have a JSON encoded object in Rust 1.6.0. I want to decode it from JSON, change the value of one key, and convert it back to a JSON encoded string again. I don't want to write a struct to hold the data.
I am using rustc_serialize, which mostly seems to be built around serializing structs and automatically doing that, but I just want a simple JSON modification.
json_contents is a String that has the original, encoded JSON object.
let new_value = json::Json::from_str(&format!("[\"http://localhost:{}\"]", port)).unwrap();
let mut tilejson_0 = json::Json::from_str(&json_contents).unwrap();
let mut myjson = tilejson_0.as_object().unwrap();
myjson.insert("mykey".to_owned(), new_value);
let new_json: String = json::encode(&myjson).unwrap();
However I get the following error:
src/main.rs:53:5: 53:13 error: cannot borrow immutable borrowed content `*myjson` as mutable
src/main.rs:53 myjson.insert("mykey".to_owned(), new_value);
^~~~~~
error: aborting due to previous error
How can I compile this? Is there a better, simpler, JSON Rust library I can use?
Some debugging has fixed this problem for me:
I replaced this code:
let mut myjson = tilejson_0.as_object().unwrap();
With this, to ensure that I had the type that I thought I had:
let mut myjson: BTreeMap<String, json::Json> = tilejson_0.as_object().unwrap();
and I got this compiler error:
src/main.rs:52:54: 52:85 error: mismatched types:
expected `collections::btree::map::BTreeMap<collections::string::String, rustc_serialize::json::Json>`,
found `&collections::btree::map::BTreeMap<collections::string::String, rustc_serialize::json::Json>`
(expected struct `collections::btree::map::BTreeMap`,
found &-ptr) [E0308]
src/main.rs:52 let mut myjson: BTreeMap<String, json::Json> = tilejson_0.as_object().unwrap();
Clearly I was wrong. Rather than an owned BTreeMap, I had a reference to one, &BTreeMap.
The solution was to change the line to this:
let mut myjson = tilejson_0.as_object().unwrap().to_owned();
And everything compiled and worked (so far).
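One thing to keep in mind with this fix: to_owned() on a &BTreeMap clones the map, so the insert modifies the copy and leaves the original value untouched. That is fine here because the copy is what gets encoded. The sketch below shows the same pattern with a plain standard-library BTreeMap standing in for the rustc_serialize object type; the alias Object, the helper with_key, and the sample values are hypothetical.

```rust
use std::collections::BTreeMap;

// Stand-in for the JSON object type, which the error message shows to be a
// `BTreeMap` keyed by `String`. Values are plain strings to keep the sketch small.
type Object = BTreeMap<String, String>;

// Takes a shared reference, clones the map, and modifies only the clone.
fn with_key(original: &Object, key: &str, value: &str) -> Object {
    // `original.insert(..)` would not compile here: `&Object` is an immutable borrow.
    let mut copy = original.to_owned();
    copy.insert(key.to_owned(), value.to_owned());
    copy
}

fn main() {
    let mut original = Object::new();
    original.insert("name".to_owned(), "tiles".to_owned());
    // The URL is a placeholder value for illustration.
    let updated = with_key(&original, "mykey", "http://localhost:8080");
    println!("original has {} key(s), updated has {}", original.len(), updated.len());
}
```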