How to write a vector to a JSON file?

How do I write a vector to a JSON file in Rust?
Code:
use std::fs::File;
use std::io::prelude::*;
let vec1 = vec![1.0,2.0,2.1,0.6];
let mut file = File::create("results.json").unwrap();
let data = serde_json::to_string(&vec1).unwrap();
file.write(&data);
Error:
mismatched types
expected reference `&[u8]`
found reference `&std::string::String` rustc(E0308)

Instead of writing the data to an in-memory string first, you can also write it directly to the file:
use std::fs::File;
use std::io::{BufWriter, Write};

fn main() -> std::io::Result<()> {
    let vec = vec![1, 2, 3];
    let file = File::create("a")?;
    let mut writer = BufWriter::new(file);
    serde_json::to_writer(&mut writer, &vec)?;
    writer.flush()?;
    Ok(())
}
This approach has a lower memory footprint and is generally preferred. Note that you should use buffered writes to the file, since serialization could otherwise result in many small writes to the file, which would severely reduce performance.
If you need to write the data to memory first for some reason, I suggest using serde_json::to_vec() instead of serde_json::to_string(), since that function will give you a Vec<u8> immediately.
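As for the compiler error in the question itself: Write::write takes a &[u8], while &data is a &String. A minimal sketch of the direct fix follows, using only the standard library. The string literal stands in for the output of serde_json::to_string, and the helper name write_json_text and the temp-dir path are made up for illustration.

```rust
use std::fs::File;
use std::io::Write;
use std::path::Path;

/// Hypothetical helper: writes already-serialized JSON text to `path`.
fn write_json_text(path: &Path, data: &str) -> std::io::Result<()> {
    let mut file = File::create(path)?;
    // `as_bytes()` turns the `&str` into the `&[u8]` that `Write` expects.
    // `write_all` keeps writing until every byte is out, whereas a single
    // `write` call is allowed to write only part of the buffer.
    file.write_all(data.as_bytes())
}

fn main() -> std::io::Result<()> {
    // Stand-in for `serde_json::to_string(&vec1).unwrap()` from the question.
    let data = "[1.0,2.0,2.1,0.6]";
    let path = std::env::temp_dir().join("results.json");
    write_json_text(&path, data)?;
    Ok(())
}
```

Passing a &String where a &str is expected works through deref coercion, so the String returned by serde_json::to_string could be handed to such a helper as &data.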

Related

Convert serde_json Value keys to camelCase

I'm writing a CLI tool that reads JSON files and is supposed to convert the JSON object keys into camelCase.
Because this should work with any JSON file, I obviously can't just use strong typing and then #[serde(rename_all = "camelCase")].
I can't seem to find an obvious way in serde_json to make it use the already existing renaming code that serde clearly has and apply it to a serde_json::value::Value.
Am I missing something obvious?
You'll have to write a function that recurses through the serde_json::Value structure and replaces the keys of serde_json::Map whenever it encounters one. That's a bit awkward to implement, as there is no Map::drain.
fn rename_keys(json: &mut serde_json::Value) {
    match json {
        serde_json::Value::Array(a) => a.iter_mut().for_each(rename_keys),
        serde_json::Value::Object(o) => {
            let mut replace = serde_json::Map::with_capacity(o.len());
            o.retain(|k, v| {
                rename_keys(v);
                replace.insert(
                    heck::ToLowerCamelCase::to_lower_camel_case(k.as_str()),
                    std::mem::replace(v, serde_json::Value::Null),
                );
                true
            });
            *o = replace;
        }
        _ => (),
    }
}

use std::io::Read;

fn main() {
    let mut stdin = vec![];
    std::io::stdin()
        .read_to_end(&mut stdin)
        .expect("Read stdin");
    let mut json = serde_json::from_slice::<serde_json::Value>(&stdin).expect("Parse Json");
    rename_keys(&mut json);
    println!("{}", serde_json::to_string_pretty(&json).unwrap());
}
(Note that rename_keys will produce a stack overflow on deep JSON structures, but serde_json only parses to a limited depth by default, so no need to worry. If you do need support for deeply nested structures, have a look at serde_stacker.)
If you're not interested in the serde_json::Value itself and just want to transform a JSON string, there are two more ways to go about this:
You could do the renaming on serialization, by writing a custom serializer for a wrapper struct around serde_json::Value. An example of such a serializer is here, but you'd have to adapt it to be recursive. (Possibly, doing it at deserialization might be easier than at serialization.)
Write a JSON tokenizer (or grab a crate that contains one) to skip creating the actual serde_json::Value structure and do the renaming on the token stream (so there is no need to worry about memory when working with GBs of JSON).
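In the answer above, the key conversion itself is delegated to heck. To make that step concrete, here is a rough, standard-library-only sketch of what such a conversion does. The function to_lower_camel is hypothetical and much simpler than heck: it only treats '_', '-' and ' ' as word separators and does not split on case changes or handle acronyms, so it is not a drop-in replacement.

```rust
/// Hypothetical, simplified stand-in for heck's `to_lower_camel_case`.
/// Only '_', '-' and ' ' are treated as word separators.
fn to_lower_camel(key: &str) -> String {
    let mut out = String::with_capacity(key.len());
    let mut upper_next = false;
    for c in key.chars() {
        if c == '_' || c == '-' || c == ' ' {
            // A separator starts a new word, unless nothing was emitted yet.
            upper_next = !out.is_empty();
        } else if upper_next {
            // First character of a new word is upper-cased.
            out.extend(c.to_uppercase());
            upper_next = false;
        } else if out.is_empty() {
            // Very first character of the key is lower-cased.
            out.extend(c.to_lowercase());
        } else {
            out.push(c);
        }
    }
    out
}

fn main() {
    for key in ["user_name", "first-name", "UserName", "_leading"] {
        println!("{} -> {}", key, to_lower_camel(key));
    }
}
```

Inside rename_keys, a function like this would take the place of the heck call if pulling in the extra dependency is not desired, at the cost of handling fewer naming styles.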

How can I lazily read multiple JSON values from a file/stream in Rust?

I'd like to read multiple JSON objects from a file/reader in Rust, one at a time. Unfortunately serde_json::from_reader(...) just reads until end-of-file; there doesn't seem to be any way to use it to read a single object or to lazily iterate over the objects.
Is there any way to do this? Using serde_json would be ideal, but if there's a different library I'd be willing to use that instead.
At the moment I'm putting each object on a separate line and parsing them individually, but I would really prefer not to need to do this.
Example Use
main.rs
use serde_json;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let stdin = std::io::stdin();
    let stdin = stdin.lock();
    for item in serde_json::iter_from_reader(stdin) {
        println!("Got {:?}", item);
    }
    Ok(())
}
in.txt
{"foo": ["bar", "baz"]} 1 2 [] 4 5 6
example session
Got Object({"foo": Array([String("bar"), String("baz")])})
Got Number(1)
Got Number(2)
Got Array([])
Got Number(4)
Got Number(5)
Got Number(6)
This was a pain when I wanted to do it in Python, but fortunately in Rust this is a directly-supported feature of the de-facto-standard serde_json crate! It isn't exposed as a single convenience function, but we just need to create a serde_json::Deserializer reading from our file/reader, then use its .into_iter() method to get a StreamDeserializer iterator yielding Results containing serde_json::Value JSON values.
use serde_json; // 1.0.39

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let stdin = std::io::stdin();
    let stdin = stdin.lock();
    let deserializer = serde_json::Deserializer::from_reader(stdin);
    let iterator = deserializer.into_iter::<serde_json::Value>();
    for item in iterator {
        println!("Got {:?}", item?);
    }
    Ok(())
}
One thing to be aware of: in the version used here, if a syntax error is encountered, the iterator will start to produce an infinite sequence of error results and never move on. You need to handle errors inside the loop, or the loop will never end. In the snippet above, the ? operator does this: it breaks out of the loop and returns the first serde_json::Result::Err from our function.

Is there a better way to directly convert a Rust BSON document to JSON?

The idea is getting a cursor from Mongo and serializing the result set to JSON in a string. I have working code:
extern crate bson;
extern crate mongodb;
use mongodb::db::ThreadedDatabase;
use mongodb::{Client, ThreadedClient};
extern crate serde;
extern crate serde_json;

fn main() {
    let client =
        Client::connect("localhost", 27017).expect("Failed to initialize standalone client.");
    let coll = client.db("foo").collection("bar");
    let cursor = coll.find(None, None).ok().expect("Failed to execute find.");
    let docs: Vec<_> = cursor.map(|doc| doc.unwrap()).collect();
    let serialized = serde_json::to_string(&docs).unwrap();
    println!("{}", serialized);
}
Is there a better way to do this? If not I will close this thread.
This is the sort of situation that serde-transcode was made for. It converts directly between serde formats: it takes a Deserializer and a Serializer, then calls the corresponding serialize function for each deserialized item. Conceptually this is similar to using serde_json::Value as an intermediate format, but it may preserve some extra type information if the input format provides it.
Unfortunately, the bson crate does not expose bson::de::raw::Deserializer or bson::ser::raw::Serializer, so this is not currently possible. If you look in the documentation, the public Deserializer and Serializer actually refer to different structs, which handle the conversion to and from the Bson enum.
If bson::de::raw::Deserializer were public, the following code would have the desired effect. Hopefully this will be helpful to anyone who has a similar problem (or anyone who wants this enough to raise an issue on their repository).
let mut buffer = Vec::new();
// Manually add array separators because the proper way requires going through
// DeserializeSeed and that is a whole other topic.
buffer.push(b'[');
while cursor.advance().await? {
    let bytes = cursor.current().as_bytes();
    // Create deserializer and serializer
    let deserializer = bson::de::raw::Deserializer::new(bytes, false);
    let serializer = serde_json::Serializer::new(&mut buffer);
    // Transcode between formats
    serde_transcode::transcode(deserializer, serializer).unwrap();
    // Manually add array separator
    buffer.push(b',');
}
// Remove trailing comma and add closing bracket
if buffer.len() > 1 {
    buffer.pop();
}
// The buffer holds bytes, so the bracket must be a byte literal
buffer.push(b']');
// Do something with the result
println!("{}", String::from_utf8(buffer).unwrap());

How to iterate / stream a gzip file (containing a single csv)?

How can I iterate over a gzipped file that contains a single text file (CSV)?
Searching crates.io I found flate2 which has the following code example for decompression:
extern crate flate2;
use std::io::prelude::*;
use flate2::read::GzDecoder;

fn main() {
    let mut d = GzDecoder::new("...".as_bytes()).unwrap();
    let mut s = String::new();
    d.read_to_string(&mut s).unwrap();
    println!("{}", s);
}
How to stream a gzip csv file?
For stream I/O operations, Rust has the Read and Write traits. To iterate over input by lines you usually want the BufRead trait, which you can always get by wrapping a Read implementation in BufReader::new.
flate2 already operates with these traits; GzDecoder implements Read, and GzDecoder::new takes anything that implements Read.
Example decoding stdin (doesn't work well on playground of course):
extern crate flate2;
use std::io;
use std::io::prelude::*;
use flate2::read::GzDecoder;

fn main() {
    let stdin = io::stdin();
    let stdin = stdin.lock(); // or just open any normal file
    // flate2 1.x returns the decoder directly; older 0.2 releases returned
    // an io::Result here, which needed an .expect(...) or similar.
    let d = GzDecoder::new(stdin);
    for line in io::BufReader::new(d).lines() {
        println!("{}", line.unwrap());
    }
}
You can then decode your lines with your usual ("without gzip") logic; perhaps make it generic by taking any input implementing BufRead.
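To illustrate the last point, here is a small sketch of line-processing logic that is generic over BufRead, so the same function accepts a plain file, stdin, or a BufReader wrapped around a GzDecoder. The helper field_counts is a made-up example that only splits on commas; a real CSV parser would also handle quoting. An in-memory Cursor stands in for the decompressed stream so the snippet needs only the standard library.

```rust
use std::io::{self, BufRead};

/// Hypothetical helper: counts the comma-separated fields of each line.
/// Works with any buffered reader, compressed source or not.
fn field_counts<R: BufRead>(input: R) -> io::Result<Vec<usize>> {
    let mut counts = Vec::new();
    for line in input.lines() {
        // Propagate I/O (or decompression) errors to the caller.
        let line = line?;
        counts.push(line.split(',').count());
    }
    Ok(counts)
}

fn main() -> io::Result<()> {
    // Stand-in for `io::BufReader::new(GzDecoder::new(...))`.
    let plain = io::Cursor::new("a,b,c\n1,2,3\n");
    println!("{:?}", field_counts(plain)?);
    Ok(())
}
```

With the gzip example above, the call would become field_counts(io::BufReader::new(d)), with no change to the function itself.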

Read json file in and write without indentation

The following code takes a folder of JSON files (saved with indentation), opens each one, parses its content, serializes it back to JSON, and writes the result to a new file.
The same task works in Python, so the data is not the problem. Here is the Rust version:
extern crate rustc_serialize;
use rustc_serialize::json;
use std::io::Read;
use std::fs::read_dir;
use std::fs::File;
use std::io::Write;
use std::io;
use std::str;

fn write_data(filepath: &str, data: json::Json) -> io::Result<()> {
    let mut ofile = try!(File::create(filepath));
    let encoded: String = json::encode(&data).unwrap();
    try!(ofile.write(encoded.as_bytes()));
    Ok(())
}

fn main() {
    let root = "/Users/bling/github/data/".to_string();
    let folder_path = root + &"texts";
    let paths = read_dir(folder_path).unwrap();
    for path in paths {
        let input_filename = format!("{}", path.unwrap().path().display());
        let output_filename = str::replace(&input_filename, "texts", "texts2");
        let mut data = String::new();
        let mut f = File::open(input_filename).unwrap();
        f.read_to_string(&mut data).unwrap();
        let json = json::Json::from_str(&data).unwrap();
        write_data(&output_filename, json).unwrap();
    }
}
Can you spot an error in my code, or did I get some language concept wrong? Am I using the rustc-serialize crate incorrectly? In the end it does not do what I expected, which was to outperform Python.
± % cargo run --release --verbose
Fresh rustc-serialize v0.3.16
Fresh fileprocessing v0.1.0 (file:///Users/bling/github/rust/fileprocessing)
Running `target/release/fileprocessing`
thread '<main>' panicked at 'called `Result::unwrap()` on an `Err` value: SyntaxError("unescaped control character in string", 759, 55)', ../src/libcore/result.rs:736
Process didn't exit successfully: `target/release/fileprocessing` (exit code: 101)
Why does it throw an error? Is my JSON serialization wrong?
Can I get the object it fails on? What about encoding?
Is the code right, or is there something obviously wrong that someone with more experience can spot?
Wild guess: if the same input file can be parsed by other JSON parsers (e.g. in Python), you may be hitting a rustc-serialize bug that was fixed in https://github.com/rust-lang-nursery/rustc-serialize/pull/142. Try to update?