Optionally `Take` records from a CSV

I'm using the Rust csv crate to read CSV files. I want to give the user the option to take only the first x records from the CSV.
Given a function like `fn read_records(csv_reader: csv::Reader, max_records: Option<usize>) -> ?`, I want to do the following:
```rust
use std::fs::File;
use std::io::BufReader;
use csv as csv_crate;
use self::csv_crate::StringRecordsIntoIter;

/// Read a csv, and print the first n records
fn read_csv_repro(mut file: File, max_read_records: Option<usize>) {
    let mut csv_reader = csv::ReaderBuilder::new()
        .from_reader(BufReader::new(file.try_clone().unwrap()));
    let records: Box<StringRecordsIntoIter<std::io::BufReader<std::fs::File>>> =
        match max_read_records {
            Some(max) => Box::new(csv_reader.into_records().take(max).into_iter()),
            None => Box::new(csv_reader.into_records().into_iter()),
        };
    for result in records {
        let record = result.unwrap();
        // do something with record, e.g. print values from it to console
        let string: Option<&str> = record.get(0);
        println!("First record is {:?}", string);
    }
}

fn main() {
    read_csv_repro(File::open("csv_test.csv").unwrap(), Some(10));
}
```
I'm struggling to get my code to work; the compiler gives the following error:
```text
error[E0308]: mismatched types
  --> src/main.rs:18:22
   |
18 |     Box::new(csv_reader.into_records().take(max).into_iter())
   |              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ expected struct `csv::reader::StringRecordsIntoIter`, found struct `std::iter::Take`
   |
   = note: expected type `csv::reader::StringRecordsIntoIter<_>`
              found type `std::iter::Take<csv::reader::StringRecordsIntoIter<_>>`
```
How can I get the above code to work?

While Nate's answer works for this specific case, the more general solution here is to use trait objects. My impression is that this is what you were intending to do by using Box here. Otherwise, in Nate's solution, the use of Box is completely superfluous.
Here is code that uses trait objects without needing to do `take(std::usize::MAX)` (using Rust 2018):
```rust
use std::fs::File;
use std::io::BufReader;

/// Read a csv, and print the first n records
fn read_csv_repro(file: File, max_read_records: Option<usize>) {
    let csv_reader = csv::ReaderBuilder::new().from_reader(BufReader::new(file));
    // A boxed trait object lets the two match arms return different concrete
    // iterator types, as long as both implement the same Iterator interface.
    let records: Box<dyn Iterator<Item = csv::Result<csv::StringRecord>>> =
        match max_read_records {
            Some(max) => Box::new(csv_reader.into_records().take(max)),
            None => Box::new(csv_reader.into_records()),
        };
    for result in records {
        let record = result.unwrap();
        // do something with record, e.g. print values from it to console
        let string: Option<&str> = record.get(0);
        println!("First record is {:?}", string);
    }
}

fn main() {
    read_csv_repro(File::open("csv_test.csv").unwrap(), Some(10));
}
```
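The question's sketch `fn read_records(csv_reader: csv::Reader, max_records: Option<usize>) -> ?` can be answered the same way: the boxed trait object itself works as the return type. A minimal sketch of that variant (hypothetical signature, assuming a `BufReader<File>` underlying reader as above):
```rust
use std::fs::File;
use std::io::BufReader;

// The `?` in the question's signature can simply be the boxed trait object.
fn read_records(
    csv_reader: csv::Reader<BufReader<File>>,
    max_records: Option<usize>,
) -> Box<dyn Iterator<Item = csv::Result<csv::StringRecord>>> {
    match max_records {
        Some(max) => Box::new(csv_reader.into_records().take(max)),
        None => Box::new(csv_reader.into_records()),
    }
}
```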

You have to `take(std::usize::MAX)` when `max_records` is `None`. It's annoying, but both iterators have to have the same type to be stored in the same variable. Also, the `.into_iter()` calls that you added have no effect, as you were calling them on iterators.
```rust
use std::fs::File;
use std::io::BufReader;
use csv::StringRecordsIntoIter;

fn read_csv_repro(file: File, max_read_records: Option<usize>) {
    let csv_reader = csv::Reader::from_reader(BufReader::new(file));
    let records: Box<std::iter::Take<StringRecordsIntoIter<BufReader<File>>>> =
        match max_read_records {
            Some(max) => Box::new(csv_reader.into_records().take(max)),
            None => Box::new(csv_reader.into_records().take(std::usize::MAX)),
        };
    // ... iterate over `records` as in the question
}
```
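As the earlier answer points out, the `Box` here is superfluous: both match arms already have the identical concrete type `std::iter::Take<StringRecordsIntoIter<...>>`. A minimal sketch of the same trick with the `Box` removed (same hypothetical file as the question):
```rust
use std::fs::File;
use std::io::BufReader;

fn read_csv_repro(file: File, max_read_records: Option<usize>) {
    let csv_reader = csv::Reader::from_reader(BufReader::new(file));
    // Both arms produce the same concrete `Take<...>` type, so no Box is needed.
    let records = match max_read_records {
        Some(max) => csv_reader.into_records().take(max),
        None => csv_reader.into_records().take(std::usize::MAX),
    };
    for result in records {
        println!("First record is {:?}", result.unwrap().get(0));
    }
}
```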

Related

How can I append json data in memory to the end of a new-line delimited json file without reading the file into memory in Rust? [duplicate]

I am using this code to append a new line to the end of a file:
```rust
use std::fs::OpenOptions;
use std::io::{Seek, SeekFrom, Write};

let text = "New line".to_string();

let mut option = OpenOptions::new();
option.read(true);
option.write(true);
option.create(true);

match option.open("foo.txt") {
    Err(e) => {
        println!("Error");
    }
    Ok(mut f) => {
        println!("File opened");
        let size = f.seek(SeekFrom::End(0)).unwrap();
        let n_text = match size {
            0 => text.clone(),
            _ => format!("\n{}", text),
        };
        match f.write_all(n_text.as_bytes()) {
            Err(e) => {
                println!("Write error");
            }
            Ok(_) => {
                println!("Write success");
            }
        }
        f.sync_all();
    }
}
```
It works, but I think it's too difficult. I found `option.append(true);`, but if I use it instead of `option.write(true);` I get "Write error".
Using `OpenOptions::append` is the clearest way to append to a file:
```rust
use std::fs::OpenOptions;
use std::io::prelude::*;

fn main() {
    let mut file = OpenOptions::new()
        .write(true)
        .append(true)
        .open("my-file")
        .unwrap();

    if let Err(e) = writeln!(file, "A new line!") {
        eprintln!("Couldn't write to file: {}", e);
    }
}
```
As of Rust 1.8.0 and RFC 1252, `append(true)` implies `write(true)`. This should not be a problem anymore.
Before Rust 1.8.0, you must use both `write` and `append` — the first one allows you to write bytes into a file, the second specifies where the bytes will be written.
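On a modern toolchain the example above can therefore be trimmed down. A minimal sketch assuming Rust 1.8.0 or later (`create(true)` is added here so the hypothetical file is created if it doesn't exist yet):
```rust
use std::fs::OpenOptions;
use std::io::prelude::*;

fn main() -> std::io::Result<()> {
    // append(true) implies write(true) on Rust 1.8.0+.
    let mut file = OpenOptions::new()
        .create(true)
        .append(true)
        .open("my-file")?;
    writeln!(file, "A new line!")?;
    Ok(())
}
```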

ReScript: manipulate a JSON file

I have this JSON file.
Using ReScript I want to:

1. Read the file.
2. Extract data from the file.
3. Write the result to a new file.

```json
{
  "name": "name",
  "examples": [
    {
      "input": [1, 2],
      "result": 1
    },
    {
      "input": [3, 4],
      "result": 3
    }
  ]
}
```
I was able to achieve this using JavaScript:
```javascript
var file = Fs.readFileSync("file.json", "utf8");
var data = JSON.parse(file);
var name = data.name;
var examples = data.examples;

for (let i = 0; i < examples.length; i++) {
    let example = examples[i];
    let input = example.input;
    let result = example.result;
    let finalResult = `example ${name}, ${input[0]}, ${input[1]}, ${result} \n`;
    Fs.appendFileSync('result.txt', finalResult);
}
```
These are my attempts at writing it in ReScript, and the issues I ran into.
```rescript
let file = Node.Fs.readFileSync("file.json", #utf8)
let data = Js.Json.parseExn(file)
let name = data.name // this doesn't work. The record field name can't be found
```
So I have tried a different approach (which is a little bit limited because I am specifying the type of the data that I want to extract).
#module("fs")
external readFileSync: (
~name: string,
[#utf8],
) => string = "readFileSync"
type data = {name: string, examples: array<Js_dict.t<t>>}
#scope("JSON") #val
external parseIntoMyData: string => data = "parse"
let file = readFileSync(~name="file.json", #utf8)
let parsedData = parseIntoMyData(file)
let name = parsedData.name
let example = parsedData.examples[0]
let input = parsedData.examples[0].input //this wouldn't work
Also tried to use Node.Fs.appendFileSync(...) and I get The value appendFileSync can't be found in Node.Fs
Is there another way to accomplish this?
It's not clear to me why you're using `Js.Dict.t<t>` for your examples, or what the `t` in that type refers to. You certainly could use a `Js.Dict.t` here, and that might make sense if the shape of the data isn't static, but then you'd have to access the data using `Js.Dict.get`. It seems you want to use record field access instead, and if the data structure is static you can do so by just defining the types properly. From the example you give, it looks like these type definitions should accomplish what you want:
```rescript
type example = {
  input: (int, int), // or array<int> if it's not always two elements
  result: int,
}

type data = {
  name: string,
  examples: array<example>,
}
```

detect an enum at runtime and stringify as keys

I have a bunch of interfaces, at least 2-3 levels nested, where some of the leaves are numbers/strings, etc., but others are (numeric) enums.
I don't want to change this.
Now I want to "serialize" objects that implement my interfaces as JSON. Using JSON.stringify is good for almost all cases, except for the enums, which are serialized with their (numeric) values.
I know that it's possible to pass a replacer function to JSON.stringify, but I'm stuck: I'm not sure how to write a function that detects the structure of my object and replaces the enum values with the appropriate names.
Example:
```typescript
enum E { X = 0, Y = 1, Z = 2 }
enum D { ALPHA = 1, BETA = 2, GAMMA = 3 }

interface C { e: E; }
interface B { c?: C; d?: D; }
interface A { b?: B; }

function replacer(this: any, key: string, value: any): any {
    return value;
}

function stringify(obj: A): string {
    return JSON.stringify(obj, replacer);
}

const expected = '{"b":{"c":{"e":"Y"},"d":"ALPHA"}}';
const received = stringify({ b: { c: { e: E.Y }, d: D.ALPHA } });

console.log(expected);
console.log(received);
console.log(expected === received);
```
It's not possible to automatically find out which enum was assigned to a field, not even with TypeScript's emitDecoratorMetadata option. That option can only tell you the field is a Number, and it is only emitted on class fields that have other decorators on them.
The best solution you have is to manually add your own metadata. You can do that using the reflect-metadata node module.
You'd have to find all enum fields on all of your classes and add metadata saying which enum should be used for serializing each field.
```typescript
import 'reflect-metadata';

enum E
{
    ALPHA = 1,
    BETA = 2,
    GAMMA = 3,
}

class C
{
    // flag what to transform during serialization
    @Reflect.metadata('serialization:type', E)
    enumField: E;

    // the rest will not be affected
    number: number;
    text: string;
}
```
This metadata could be added automatically if you can write an additional step for your compiler, but that is not simple to do.
Then in your replacer you'll be able to check whether the field was flagged with this metadata, and if it is, you can replace the numeric value with the enum key.
```typescript
const c = new C();
c.enumField = E.ALPHA;
c.number = 1;
c.text = 'Lorem ipsum';

function replacer(this: any, key: string, value: any): any
{
    const enumForSerialization = Reflect.getMetadata('serialization:type', this, key);
    return enumForSerialization ? enumForSerialization[value] ?? value : value;
}

function stringify(obj: any)
{
    return JSON.stringify(obj, replacer);
}

console.log(stringify(c)); // {"enumField":"ALPHA","number":1,"text":"Lorem ipsum"}
```
This only works with classes, so you will have to replace your interfaces with classes and replace your plain objects with class instances, otherwise it will not be possible for you to know which interface/class the object represents.
If that is not possible for you then I have a much less reliable solution.
You still need to list the enum types for all of the fields of all of your interfaces.
That part could be automated by parsing your TypeScript source code, extracting the enum types for those enum fields, and saving them in a JSON file that you can load at runtime.
Then in the replacer you can guess the interface of an object by checking which fields exist on the `this` object; if they match an interface, you can apply the enum types you listed for that interface.
Did you want something like this? It was the best I could think of without using any reflection.
```typescript
enum E { X = 0, Y = 1, Z = 2 }
enum D { ALPHA = 1, BETA = 2, GAMMA = 3 }

interface C { e: E; }
interface B { c?: C; d?: D; }
interface A { b?: B; }

function replacer(this: any, key: string, value: any): any {
    switch (key) {
        case 'e':
            return E[value];
        case 'd':
            return D[value];
        default:
            return value;
    }
}

function stringify(obj: A): string {
    return JSON.stringify(obj, replacer);
}

const expected = '{"b":{"c":{"e":"Y"},"d":"ALPHA"}}';
const received = stringify({ b: { c: { e: E.Y }, d: D.ALPHA } });

console.log(expected);
console.log(received);
console.log(expected === received);
```
This solution assumes you know the structure of the object, just as you gave in the example.

How to hint the type of a function I do not control?

When parsing a JSON-formatted string I get a linter error:
```typescript
let mqttMessage = JSON.parse(message.toString())
// ESLint: Unsafe assignment of an `any` value. (@typescript-eslint/no-unsafe-assignment)
```
I control the content of message so I would like to tell TS that what comes out of JSON.parse() is actually an Object. How can I do that?
Note: I could silence the warning, but I would like to understand if there is a better way to approach the problem.
The problem is that JSON.parse returns an any type.
That's fair enough, right - TypeScript doesn't know if it's going to parse out to a string, a number, or an object.
You have a linting rule saying 'Don't allow assigning variables as any'.
So yeah, you could coerce the result of your JSON.parse:
```typescript
type SomeObjectIKnowAbout = {
};

const result = JSON.parse(message.toString()) as SomeObjectIKnowAbout;
```
What I tend to like doing in this scenario is creating a specific parsing function that will assert at runtime that the object really is of the shape you are saying, and will do the type casting so that, while you're writing your code, you can treat it as that object.
```typescript
type SomeObjectIKnowAbout = {
    userId: string;
}

type ToStringable = {
    toString: () => string;
}

function parseMessage(message: ToStringable): SomeObjectIKnowAbout {
    const obj = JSON.parse(message.toString()); // I'm not sure why you are parsing after toStringing tbh.
    if (typeof obj === 'object' && obj.userId && typeof obj.userId === 'string') {
        return obj as SomeObjectIKnowAbout;
    }
    else {
        throw new Error("message was not a valid SomeObjectIKnowAbout");
    }
}
```
JSON.parse isn't generic, so we can't supply a generic argument to do it.
You have a couple of options.
The simple thing is that since JSON.parse returns any, you can just define the type of what you're assigning it to:
```typescript
let mqttMessage: MQTTMessage = JSON.parse(message.toString());
```
(I've used MQTTMessage as a stand-in for the appropriate type.)
That may not be typesafe enough for everyone, though, since it makes the assumption that the string defines what you expect it to define. And it has the problem that if you do it elsewhere, you repeat the assumption.
Instead, you could define a function:
```typescript
function parseMQTTMessageJSON(json: string): MQTTMessage {
    const x: object = JSON.parse(json);
    if (x && /* ...appropriate checks for properties here... */ "someProp" in x) {
        return x as MQTTMessage;
    }
    throw new Error(`Incorrect JSON for 'MQTTMessage' type`);
}
```
Then your code is:
```typescript
let mqttMessage = parseMQTTMessageJSON(message.toString());
```
As an alternative to type assertions and runtime wrapper functions, you can utilize declaration merging to augment the global JSON object with a generic overload for the parse method. This will allow you to pass through the expected type and give you improved IntelliSense in case you use a reviver when parsing:
```typescript
interface JSON {
    parse<T = unknown>(text: string, reviver?: (this: any, key: keyof T & string, value: T[keyof T]) => unknown): T
}

type Test = { a: 1, b: "", c: false };

const { a, b, c } = JSON.parse<Test>(
    "{\"a\":1,\"b\":\"\",\"c\":false}",
    // k is "a" | "b" | "c", v is false | "" | 1
    (k, v) => v
);
```
Or, if you are relying on declaration files to augment global interfaces:
```typescript
declare global {
    interface JSON {
        parse<T = unknown>(text: string, reviver?: (this: any, key: keyof T & string, value: T[keyof T]) => unknown): T
    }
}
```

How can one iterate/map over a Js.Json.t that is an array?

I'm trying to decode a JSON array that has the type Js.Json.t (not array(Js.Json.t) apparently). A call to Js.log(jsonList) reveals that it is an array, but I'm not certain how to map over the elements in the array to decode it.
So far, I've got:
```reason
let json_to_list = response => {
  switch (response |> Js.Json.decodeObject) {
  | None => {
      Js.log("Decoding JSON failed!!")
      None
    }
  | Some(jsonObject) =>
    switch (jsonObject->Js.Dict.get("list")) {
    | None => {
        Js.log("JSON didn't have a 'list' key/value.")
        None
      }
    | Some(jsonList) =>
      jsonList
      |> Js.List.map(
           /* compiler is expecting an uncurried function */
           record => {
             switch (record->Js.Dict.get("session-id")) { /* ... */ }
           },
         )
    }
  }
};
```
The compiler is expecting an uncurried function, which I don't know how to provide.
EDIT
I'd like to think I'm closer, but I'm getting `This has type: array(unit) Somewhere wanted: unit` on the line `value |> Array.map(Js.log)` below:
```reason
let json_to_list = response => {
  Js.log("Decoding JSON")
  switch (response |> Js.Json.decodeObject) {
  | None => {
      Js.log("Decoding JSON failed!!")
      None
    }
  | Some(jsonObject) =>
    switch (jsonObject->Js.Dict.get("list")) {
    | None => {
        Js.log("JSON didn't have a 'list' key/value.")
        None
      }
    | Some(jsonArray) =>
      switch (Js.Json.decodeArray(jsonArray)) {
      | None => {
          Js.log("JSON Object wasn't an array.")
          None
        }
      | Some(value) => {
          Js.log("Value length: " ++ string_of_int(value |> Js.Array.length))
          value |> Array.map(Js.log)
          Some(value)
        }
      }
    }
  }
};
```
There are several ways of doing this, depending on what you know about the data at compile-time and what you actually need.
If you know exactly what it is, and there's no chance you could receive anything else, you could just cast it into the type you want without doing any checks at runtime:
```reason
external toMyType: Js.Json.t => array(something) = "%identity"

let myData = toMyType(json)
```
If you don't know the shape of the data until runtime, you can use Js.Json.classify:
```reason
let decodeArrayItem = ...

let myData: array(something) =
  switch (Js.Json.classify(json)) {
  | Js.Json.JSONArray(array) => Array.map(decodeArrayItem, array)
  | _ => []
  };
```
Or, if you could get anything but arrays are all you care about, you can use `Js.Json.decodeArray` as a shorthand, which returns an option you can deal with further:
```reason
let decodeArrayItem = ...

let maybeData: option(array(something)) =
  Js.Json.decodeArray(json)
  |> Option.map(Array.map(decodeArrayItem));
```
Lastly, my recommended option for most scenarios is to use one of the third-party JSON decoder libraries, which tend to be designed for composition and are therefore much more convenient for decoding large data structures. For example, using @glennsl/bs-json (no bias here, obviously):
```reason
module Decode = {
  let arrayItem = ...

  let myData =
    Json.Decode.array(arrayItem)
}

let myData =
  try Decode.myData(json) catch {
  | Json.Decode.DecodeError(_) => []
  }
```
Edit: As for the actual error you get, you can turn a curried anonymous function into an uncurried one just by using a slightly different syntax:
```reason
let curried = record => ...
let uncurried = (. record) => ...
```