I have a Rust function that searches through a large JSON file (about 1,080,000 lines). Currently this function takes about 1 second to search the whole file. The data in the file is mostly lines like this:
{"although":false,"radio":2056538449,"hide":1713884795,"hello":1222349560.787047,"brain":903780409.0046091,"heard":-1165604870.8374772}
How can I improve the performance of this function?
Here is my main.rs file:
use std::collections::VecDeque;
use std::fs::File;
use std::io::BufWriter;
use std::io::{BufRead, BufReader, Write};

fn search(filename: &str, search_line: &str) -> Result<VecDeque<u32>, std::io::Error> {
    let file = File::open(filename)?;
    let mut reader = BufReader::with_capacity(2048 * 2048, file);
    let mut line_numbers = VecDeque::new();
    let mut line_number = 0;
    let start = std::time::Instant::now();
    loop {
        line_number += 1;
        let mut line = String::new();
        let n = reader.read_line(&mut line)?;
        if n == 0 {
            break;
        }
        if line.trim() == search_line {
            line_numbers.push_back(line_number);
            println!(
                "Matching line found on line number {}: {}",
                line_number, line
            );
            break;
        }
    }
    let elapsed = start.elapsed();
    println!("Elapsed time: {:?}", elapsed);
    if line_numbers.is_empty() {
        println!("No lines found that match the given criteria");
    }
    Ok(line_numbers)
}

fn main() {
    let database = "Test.json";
    if let Err(e) = search(database, r#"{"08934":420696969}"#) {
        println!("Error reading file: {}", e);
    }
}
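One general direction (not a definitive answer, just a sketch): most of the per-line cost here is allocating a fresh String and validating UTF-8 for every line. Assuming the file really is one JSON object per line and the match is an exact line comparison, you can reuse a single byte buffer with read_until and compare raw bytes. search_bytes below is a hypothetical variant of the function above, not drop-in tested code.

use std::collections::VecDeque;
use std::fs::File;
use std::io::{BufRead, BufReader};

// Sketch: reuse one byte buffer instead of allocating a new String per line,
// and compare raw bytes so no UTF-8 validation is done on every line.
fn search_bytes(filename: &str, search_line: &str) -> Result<VecDeque<u32>, std::io::Error> {
    let file = File::open(filename)?;
    let mut reader = BufReader::with_capacity(1 << 20, file);
    let needle = search_line.as_bytes();
    let mut line_numbers = VecDeque::new();
    let mut line = Vec::new();
    let mut line_number = 0u32;
    loop {
        line_number += 1;
        line.clear(); // reuse the allocation from the previous iteration
        let n = reader.read_until(b'\n', &mut line)?;
        if n == 0 {
            break;
        }
        // Trim the trailing newline (and optional carriage return) before comparing.
        let mut end = line.len();
        while end > 0 && (line[end - 1] == b'\n' || line[end - 1] == b'\r') {
            end -= 1;
        }
        if &line[..end] == needle {
            line_numbers.push_back(line_number);
            break; // remove this break to collect every match
        }
    }
    Ok(line_numbers)
}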
I have a 172 MB assets.csv file with a million rows and 16 columns. I would like to read it using an offset (bytes/line/record); in the code below I am using the byte value.
I have stored the required positions (record.position().byte(), saved in assets_index.csv) and I would like to read a particular line in assets.csv using the saved offset.
I am able to get an output, but I feel there must be a better way to read from a CSV file based on byte position.
Please advise. I am new to programming and to Rust, and have learned a lot from the tutorials.
The assets.csv is of this format:
asset_id,year,depreciation,year,depreciation,year,depreciation,year,depreciation,year,depreciation,year,depreciation,year,depreciation,year,depreciation,year,depreciation,year,depreciation,year,depreciation,year,depreciation,year,depreciation,year,depreciation,year,depreciation
1000001,2015,10000,2016,10000,2017,10000,2018,10000,2019,10000,2020,10000,2021,10000,2022,10000,2023,10000,2024,10000,2025,10000,2026,10000,2027,10000,2028,10000,2029,10000
I used another function to get the Position { byte: 172999933, line: 1000000, record: 999999 }.
The assets_index.csv is of this format:
asset_id,offset_inbytes
1999999,172999933
use std::error::Error;
use std::io;

fn read_from_position() -> Result<(), Box<dyn Error>> {
    let asset_pos = 172999933 as u64;
    let file_path = "assets.csv";
    let mut rdr = csv::ReaderBuilder::new()
        .flexible(true)
        .from_path(file_path)?;
    let mut wtr = csv::Writer::from_writer(io::stdout());
    let mut record = csv::ByteRecord::new();
    while rdr.read_byte_record(&mut record)? {
        let pos = &record.position().expect("position of record");
        if pos.byte() == asset_pos {
            wtr.write_record(&record)?;
            break;
        }
    }
    wtr.flush()?;
    Ok(())
}
$ time ./target/release/testcsv
1999999,2015,10000,2016,10000,2017,10000,2018,10000,2019,10000,2020,10000,2021,10000,2022,10000,2023,10000,2024,10000,2025,10000,2026,10000,2027,10000,2028,10000,2029,10000
Time elapsed in readcsv() is: 239.290125ms
./target/release/testcsv 0.22s user 0.02s system 99% cpu 0.245 total
Instead of using from_path, you can use from_reader with a File and seek in that file before creating the csv::Reader:
use std::{error::Error, fs, io::{self, Seek}};

fn read_from_position() -> Result<(), Box<dyn Error>> {
    let asset_pos = 0x116 as u64; // offset to only record in example
    let file_path = "assets.csv";
    let mut f = fs::File::open(file_path)?;
    f.seek(io::SeekFrom::Start(asset_pos))?;
    let mut rdr = csv::ReaderBuilder::new()
        .flexible(true)
        // edit: as noted by @BurntSushi5 we have to disable headers here.
        .has_headers(false)
        .from_reader(f);
    let mut wtr = csv::Writer::from_writer(io::stdout());
    let mut record = csv::ByteRecord::new();
    rdr.read_byte_record(&mut record)?;
    wtr.write_record(&record)?;
    wtr.flush()?;
    Ok(())
}
Then the first record read will be the one you're looking for.
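If the goal is to go from an asset_id to its record, a minimal sketch along the same lines (assuming assets_index.csv is small enough to load into a map; the function name read_asset is just for illustration, and the column order follows the formats shown in the question) could look like this:

use std::{collections::HashMap, error::Error, fs, io::{self, Seek}};

// Sketch: load the whole index into a map, then seek directly to the
// requested record in assets.csv. Assumes assets_index.csv has the header
// row `asset_id,offset_inbytes` as shown in the question.
fn read_asset(asset_id: &str) -> Result<(), Box<dyn Error>> {
    let mut index = HashMap::new();
    let mut idx_rdr = csv::Reader::from_path("assets_index.csv")?;
    for result in idx_rdr.records() {
        let record = result?;
        let id = record[0].to_string();
        let offset: u64 = record[1].parse()?;
        index.insert(id, offset);
    }

    let offset = *index.get(asset_id).ok_or("asset_id not found in index")?;
    let mut f = fs::File::open("assets.csv")?;
    f.seek(io::SeekFrom::Start(offset))?;

    let mut rdr = csv::ReaderBuilder::new()
        .flexible(true)
        .has_headers(false) // we seeked past the header, so don't skip a row
        .from_reader(f);

    let mut record = csv::ByteRecord::new();
    if rdr.read_byte_record(&mut record)? {
        let mut wtr = csv::Writer::from_writer(io::stdout());
        wtr.write_record(&record)?;
        wtr.flush()?;
    }
    Ok(())
}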
I am trying to retrieve and parse a JSON file using reqwest.
I used this question as a starting point but it doesn't work with my API.
The error:
Error: reqwest::Error { kind: Decode, source: Error("expected value", line: 1, column: 1) }
#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let resp = reqwest::get("https://tse.ir/json/MarketWatch/data_7.json")
        .await?
        .json::<serde_json::Value>()
        .await?;
    println!("{:#?}", resp);
    Ok(())
}
The API works fine with other languages. Thanks for your help.
Cargo.toml:
[package]
name = "rust_workspace"
version = "0.1.0"
edition = "2021"
[dependencies]
serde_json = "1.0"
serde = { version = "1.0", features = ["derive"] }
reqwest = { version = "0.11", features = ["json", "blocking"] }
tokio = { version = "1", features = ["full"] }
bytes = "1"
This error usually happens when the response doesn't contain the header
"content-type": "application/json"
Even though the content is valid JSON, you will get that error.
To work around it, fetch the body as text and parse it yourself:
let text_response = reqwest::get("...").await?.text().await?;
let resp: serde_json::Value = serde_json::from_str(&text_response)?;
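Put together, a minimal sketch of main with that workaround (same URL and dependencies as in the question) would be:

// Sketch of the workaround: fetch the body as text first, then parse it
// with serde_json, so decoding does not depend on the response headers.
#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let text_response = reqwest::get("https://tse.ir/json/MarketWatch/data_7.json")
        .await?
        .text()
        .await?;
    let resp: serde_json::Value = serde_json::from_str(&text_response)?;
    println!("{:#?}", resp);
    Ok(())
}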
Before the CSV header (Time,Ampl), there are some 'invalid' data lines. The CSV looks like this:
LECROYWS3024,13568,Waveform
Segments,1,SegmentSize,100002
Segment,TrigTime,TimeSinceSegment1
#1,01-Apr-2021 16:49:34,0
Time,Ampl
-2.510018e-005,0
-2.509968e-005,0
-2.509918e-005,0
-2.509868e-005,0
-2.509818e-005,0
...
When I build and run the exe, the following error occurs:
CSV deserialize error: record 1 (line: 1, byte: 29): missing field Time
How can I deal with the invalid data with serde or other crates? Thanks!
use std::error::Error;
use std::io;
use std::process;
use serde::Deserialize;

#[derive(Debug, Deserialize)]
struct Record {
    Time: Option<f32>,
    Ampl: Option<f32>,
}

...

fn example() -> Result<(), Box<dyn Error>> {
    let mut rdr = csv::Reader::from_path("foo.csv")?;
    for result in rdr.deserialize() {
        let record: Record = result?;
        let x0 = match record.Time {
            Some(x) => x,
            None => 0.0,
        };
        ...
    }
    Ok(())
}

fn main() {
    if let Err(err) = example() {
        println!("error running example: {}", err);
        process::exit(1);
    }
}
The csv crate provides a custom deserializer for exactly this: csv::invalid_option.
Then you can use an attribute like this in your struct:
#[derive(Debug, Deserialize)]
struct Record {
    #[serde(deserialize_with = "csv::invalid_option")]
    Time: Option<f32>,
    #[serde(deserialize_with = "csv::invalid_option")]
    Ampl: Option<f32>,
}
to have invalid data converted to None values
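The "missing field Time" error itself comes from the preamble lines before the Time,Ampl header being treated as the header row. One hedged way to handle that, assuming the real header is literally the line Time,Ampl and the file fits in memory, is to skip everything before it and hand the remainder to the csv reader; the sketch below combines that with csv::invalid_option:

use std::error::Error;
use std::fs;
use serde::Deserialize;

#[derive(Debug, Deserialize)]
struct Record {
    #[serde(rename = "Time", deserialize_with = "csv::invalid_option")]
    time: Option<f32>,
    #[serde(rename = "Ampl", deserialize_with = "csv::invalid_option")]
    ampl: Option<f32>,
}

// Sketch: skip everything before the real `Time,Ampl` header, then let the
// csv crate parse the remainder. Assumes the header line is literally
// "Time,Ampl" and that reading the whole file into memory is acceptable.
fn example() -> Result<(), Box<dyn Error>> {
    let contents = fs::read_to_string("foo.csv")?;
    let start = contents.find("Time,Ampl").ok_or("Time,Ampl header not found")?;
    let mut rdr = csv::Reader::from_reader(contents[start..].as_bytes());
    for result in rdr.deserialize() {
        let record: Record = result?;
        println!("{:?} {:?}", record.time, record.ampl);
    }
    Ok(())
}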
For example, each time the code below is run, the previous test.csv file is overwritten with a new one. How can I append to test.csv instead of overwriting it?
extern crate csv;

use std::error::Error;
use std::process;

fn run() -> Result<(), Box<dyn Error>> {
    let file_path = std::path::Path::new("test.csv");
    let mut wtr = csv::Writer::from_path(file_path).unwrap();
    wtr.write_record(&["City", "State", "Population", "Latitude", "Longitude"])?;
    wtr.write_record(&["Davidsons Landing", "AK", "", "65.2419444", "-165.2716667"])?;
    wtr.write_record(&["Kenai", "AK", "7610", "60.5544444", "-151.2583333"])?;
    wtr.flush()?;
    Ok(())
}

fn main() {
    if let Err(err) = run() {
        println!("{}", err);
        process::exit(1);
    }
}
Will the append solution work if the file does not yet exist?
The csv crate provides Writer::from_writer, so you can use anything that implements Write. When using a File, this answer from "What is the best variant for appending a new line in a text file?" shows a solution:
Using OpenOptions::append is the clearest way to append to a file:
use std::fs::OpenOptions;

let mut file = OpenOptions::new()
    .write(true)
    .append(true)
    .open("test.csv")
    .unwrap();
let mut wtr = csv::Writer::from_writer(file);
Will the append solution work if the file does not yet exist?
Just add create(true) to the OpenOptions:
let mut file = OpenOptions::new()
    .write(true)
    .create(true)
    .append(true)
    .open("test.csv")
    .unwrap();
let mut wtr = csv::Writer::from_writer(file);
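One caveat with appending (a sketch below, reusing the column names from the question; append_rows is a hypothetical helper): write_record never writes a header row for you, so on repeated runs you have to decide yourself whether the header is needed, for example by writing it only when the file did not exist yet.

use std::error::Error;
use std::fs::OpenOptions;
use std::path::Path;

// Sketch: append rows to test.csv, writing the header row only when the file
// is newly created, so repeated runs don't duplicate the header.
fn append_rows() -> Result<(), Box<dyn Error>> {
    let path = Path::new("test.csv");
    let write_header = !path.exists();

    let file = OpenOptions::new()
        .create(true)
        .append(true)
        .open(path)?;

    let mut wtr = csv::Writer::from_writer(file);
    if write_header {
        wtr.write_record(&["City", "State", "Population", "Latitude", "Longitude"])?;
    }
    wtr.write_record(&["Kenai", "AK", "7610", "60.5544444", "-151.2583333"])?;
    wtr.flush()?;
    Ok(())
}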