Issue printing header using Rust's CSV crate - csv

Here is my setup:
I am reading a CSV file whose path is passed to the built executable as an argument, and I am using the clap crate to parse that argument.
The file is read without any problem, but I am having trouble printing the headers.
I'd like to print the headers without the quotes, but when I print them, only the first header/column is printed without quotes; the remaining ones keep them.
Here's what I mean:
This is the part of the code that prints the header:
let mut rdr = csv::Reader::from_path(file)?;
let column_names = rdr.headers();
println!("{}", match column_names {
    Ok(v) => v.as_slice(),
    Err(_) => "Error!"
});
With this, this is what the output is:
warning: `csv_reader` (bin "csv_reader") generated 2 warnings
Finished release [optimized] target(s) in 0.13s
Running `target\release\csv_reader.exe -f C:\nkhl\Projects\dataset\hw_25000.csv`
Index "Height(Inches)" "Weight(Pounds)"
()
As you can see, Index does not get printed with the quotes, which is how I'd like the others to be printed. Printing with Debug marker enabled, I get this:
let mut rdr = csv::Reader::from_path(file)?;
let column_names = rdr.headers();
println!("{:?}", match column_names {
Ok(v) => v.as_slice(),
Err(_) => "Error!"
});
warning: `csv_reader` (bin "csv_reader") generated 2 warnings
Finished release [optimized] target(s) in 1.92s
Running `target\release\csv_reader.exe -f C:\nkhl\Projects\dataset\hw_25000.csv`
"Index \"Height(Inches)\" \"Weight(Pounds)\""
()
The CSV can be found here: https://people.sc.fsu.edu/~jburkardt/data/csv/hw_25000.csv
This is how it looks:
"Index", "Height(Inches)", "Weight(Pounds)"
1, 65.78331, 112.9925
2, 71.51521, 136.4873
3, 69.39874, 153.0269
I hope I am doing something utterly silly, but for the life of me, I am unable to figure it out.

Your CSV data contains extraneous spaces after the commas. Because of that, Rust's csv crate treats the quotes around Height(Inches) as part of the header text rather than as quoting characters: the field starts with a space, so the quote is no longer the first character of the field.
Unfortunately, the lack of standardization around CSV makes both interpretations valid.
You can use trim to get rid of the extra spaces:
let data: &[u8] = include_bytes!("file.csv");
let mut rdr = csv::ReaderBuilder::new().trim(csv::Trim::All).from_reader(data);
But csv does the unquoting before it applies the trim, so this still leaves you with the same problem.
You can additionally disable quoting to at least get the same behaviour on all columns (every header then keeps its quotes):
let mut rdr = csv::ReaderBuilder::new().quoting(false).trim(csv::Trim::All).from_reader(data);
If you can somehow remove the spaces from your CSV file, it works just fine:
fn main() {
    let data: &[u8] = br#""Index","Height(Inches)","Weight(Pounds)"
1,65.78331,112.9925
2,71.51521,136.4873
3,69.39874,153.0269"#;
    let mut rdr = csv::Reader::from_reader(data);
    let hd = rdr.headers().unwrap();
    println!("{}", hd.as_slice());
    // prints `IndexHeight(Inches)Weight(Pounds)` without any `"`
}
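If the file itself cannot be changed, another option is to clean each header field after reading it. The following is only a minimal sketch using the standard library: clean_header is a hypothetical helper (not part of the csv crate), and the raw array stands in for the header fields as the reader returned them in the question. It strips surrounding whitespace and one matching pair of double quotes; it does not unescape doubled quotes inside a field.

```rust
/// Hypothetical helper: trim surrounding whitespace, then remove one
/// matching pair of double quotes if the field is wrapped in them.
fn clean_header(field: &str) -> &str {
    let trimmed = field.trim();
    trimmed
        .strip_prefix('"')
        .and_then(|rest| rest.strip_suffix('"'))
        .unwrap_or(trimmed)
}

fn main() {
    // Stand-in for the header fields from the question's file.
    let raw = ["Index", " \"Height(Inches)\"", " \"Weight(Pounds)\""];
    let cleaned: Vec<&str> = raw.iter().copied().map(clean_header).collect();
    // prints `Index, Height(Inches), Weight(Pounds)`
    println!("{}", cleaned.join(", "));
}
```

With the csv crate, the same helper could presumably be mapped over the fields of the header record (a StringRecord can be iterated as &str values) instead of printing as_slice().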

Related

How to deserialize csv based on line format [closed]

I have a csv without headers that can have lines in these three following formats:
char,int,int,string,int
char,int,string
char
The first character defines the format and is one of the values A, B, or C, respectively. Does anyone know a way to deserialize each line based on its format?
Just keep it simple. You can always parse it manually.
use std::io::{self, BufRead, Error, ErrorKind};

pub enum CsvLine {
    A(i32, i32, String, i32),
    B(i32, String),
    C,
}

pub fn read_lines<R: BufRead>(reader: &mut R) -> io::Result<Vec<CsvLine>> {
    let mut lines = Vec::new();
    for line in reader.lines() {
        let line = line?;
        let trimmed = line.trim();
        if trimmed.is_empty() {
            continue
        }
        // Split line by commas
        let items: Vec<&str> = trimmed.split(',').collect();
        match items[0] {
            "A" => {
                lines.push(CsvLine::A (
                    items[1].parse::<i32>().map_err(|e| Error::new(ErrorKind::Other, e))?,
                    items[2].parse::<i32>().map_err(|e| Error::new(ErrorKind::Other, e))?,
                    items[3].to_string(),
                    items[4].parse::<i32>().map_err(|e| Error::new(ErrorKind::Other, e))?,
                ));
            }
            "B" => {
                lines.push(CsvLine::B (
                    items[1].parse::<i32>().map_err(|e| Error::new(ErrorKind::Other, e))?,
                    items[2].to_string(),
                ));
            }
            "C" => lines.push(CsvLine::C),
            x => panic!("Unexpected string {:?} in first column!", x),
        }
    }
    Ok(lines)
}
Calling this function would look something like this:
use std::fs::File;
use std::io::BufReader;

let file = File::open("path/to/data.csv").unwrap();
let mut reader = BufReader::new(file);
let lines: Vec<CsvLine> = read_lines(&mut reader).unwrap();
But keep in mind that I didn't bother to handle a couple of edge cases. It will panic if a line has too few items for its format or starts with an unexpected character, and it makes no attempt to parse more complex strings. For example, "\"quoted strings\"" and "\"string, with, commas\"" would likely cause issues.
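The panics mentioned above can be avoided by looking fields up with get and returning an error instead of indexing directly. This is only a sketch of that idea, not a replacement for a real CSV parser: parse_line, field and int_field are hypothetical helper names, ErrorKind::InvalidData is an arbitrary choice of error kind, and quoted strings are still not handled.

```rust
use std::io::{Error, ErrorKind};

#[derive(Debug, PartialEq)]
pub enum CsvLine {
    A(i32, i32, String, i32),
    B(i32, String),
    C,
}

/// Return the field at `idx`, or an error if the line is too short.
fn field<'a>(items: &[&'a str], idx: usize) -> Result<&'a str, Error> {
    items
        .get(idx)
        .copied()
        .ok_or_else(|| Error::new(ErrorKind::InvalidData, format!("missing field {}", idx)))
}

/// Return the field at `idx` parsed as an i32.
fn int_field(items: &[&str], idx: usize) -> Result<i32, Error> {
    field(items, idx)?
        .trim()
        .parse::<i32>()
        .map_err(|e| Error::new(ErrorKind::InvalidData, e))
}

/// Parse one line; the first field selects the format (A, B or C).
pub fn parse_line(line: &str) -> Result<CsvLine, Error> {
    let items: Vec<&str> = line.trim().split(',').collect();
    match field(&items, 0)? {
        "A" => Ok(CsvLine::A(
            int_field(&items, 1)?,
            int_field(&items, 2)?,
            field(&items, 3)?.to_string(),
            int_field(&items, 4)?,
        )),
        "B" => Ok(CsvLine::B(int_field(&items, 1)?, field(&items, 2)?.to_string())),
        "C" => Ok(CsvLine::C),
        other => Err(Error::new(
            ErrorKind::InvalidData,
            format!("unexpected tag {:?} in first column", other),
        )),
    }
}

fn main() {
    for line in ["A,1,2,text,3", "B,7,other", "C", "A,1"] {
        println!("{:?}", parse_line(line));
    }
}
```

The last input line is deliberately too short, so it yields an Err value rather than a panic.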

Trying to make a command to store and retrieve info from a JSON file

I'm trying to make a command that stores the name, description and image of a character, and another command that retrieves that data in an embed, but I have trouble working with JSON files.
This is my code to add them:
@client.command()
async def addskillset(ctx):
    await ctx.send("Let's add this skillset!")
    questions = ["What is the monster name?","What is the monster description?","what is the monster image link?"]
    answers = []
    #code checking the questions results
    embedkra = nextcord.Embed(title = f"{answers[0]}", description = f"{answers[1]}",color=ctx.author.color)
    embedkra.set_image(url = f"{answers[2]}")
    mess = await ctx.reply(embed=embedkra,mention_author=False)
    await mess.add_reaction('✅')
    await mess.add_reaction('❌')
    def check(reaction, user):
        return user == ctx.author and (str(reaction.emoji) == "✅" or "❌")
    try:
        reaction, user = await client.wait_for('reaction_add', timeout=1000.0, check=check)
    except asyncio.TimeoutError:
        #giving a message that the time is over
    else:
        if reaction.emoji == "✅":
            monsters = await get_skillsets_data() #this data is added at the end
            if str(monster_name) in monsters:
                await ctx.reply("the monster is already added")
            else:
                monsters[str(monster_name)]["monster_name"] = {}
                monsters[str(monster_name)]["monster_name"] = answers[0]
                monsters[str(monster_name)]["monster_description"] = answers[1]
                monsters[str(monster_name)]["monster_image"] = answers[2]
                with open('skillsets.json','w') as f:
                    json.dump(monsters,f)
                await mess.delete()
                await ctx.reply(f"{answers[0]} successfully added to the list")
Code to get the embed with the asked info:
@client.command()
async def skilltest(ctx,*,monster_name):
    data = open('skillsets.json').read()
    data = json.loads(data)
    if str(monster_name) in data:
        name = data["monster_name"]
        description = data["monster_description"]
        link = data["monster_image"]
        embedkra = nextcord.Embed(title = f"{name}", description = f"{description}",color=ctx.author.color)
        embedkra.set_image(url = f"{link}")
        await ctx.reply(embed=embedkra,mention_author=False)
    else:
        # otherwise, it is still None meaning we didn't find it
        await ctx.reply("monster not found",mention_author=False)
My JSON should look like this:
{"katufo": {"monster_name": "Katufo","Monster_description":"Katufo is the best","Monster_image":"#image_link"},
"armor claw":{"monster_name": "Armor Claw","Monster_description":"Armor claw is the best","Monster_image":"#image_link"}}
The get_skillsets_data used in first command:
async def get_skillsets_data():
    with open('skillsets.json','r') as f:
        monsters = json.load(f)
    return monsters
When you retrieve data from your JSON file, index by the monster's key first. For example, name = data["katufo"]["monster_name"] retrieves the monster_name of the key katufo only, and name = data["armor claw"]["monster_name"] does the same for armor claw. To look up whichever monster the user asked for, index with the command argument itself. So try this code:
@client.command()
async def skilltest(ctx,*,monster):
    data = open('skillsets.json').read()
    data = json.loads(data)
    if str(monster) in data:
        # index with the variable itself, not the literal string "monster"
        name = data[monster]["monster_name"]
        description = data[monster]["Monster_description"]
        link = data[monster]["Monster_image"]
        embedkra = nextcord.Embed(title = f"{name}", description = f"{description}",color=ctx.author.color)
        embedkra.set_image(url = f"{link}")
        await ctx.reply(embed=embedkra,mention_author=False)
    else:
        # the requested monster is not a key in the JSON file
        await ctx.reply("monster not found",mention_author=False)
Hope this works for you :)
If your json looks like what you showed above,
{
"katufo":{
"monster_name":"Katufo",
"Monster_description":"Katufo is the best",
"Monster_image":"#image_link"
},
"armor claw":{
"monster_name":"Armor Claw",
"Monster_description":"Armor claw is the best",
"Monster_image":"#image_link"
}
}
then there is no data["monster_name"]: the two objects inside your JSON are keyed katufo and armor claw. To get a field of one of them, you can simply write data['katufo']['monster_name'].
Your problem stems from looking up the monster name like this:
if str(monster_name) in data:
    name = data["monster_name"]
    description = data["monster_description"]
    link = data["monster_image"]
What you could do instead is loop through the values of data, as it contains several monsters, and do the check on each object:
for monster in data.values():
    if str(monster_name) in monster.values():
        name = monster["monster_name"]
        description = monster["Monster_description"]
        link = monster["Monster_image"]
One thing to think about, the way the variables are named is not something I personally recommend. Don't be afraid of adding longer descriptive names so things make more sense for you in the code. Also, in the JSON you provided, there are certain attributes starting with a capital letter, something you should think about.
Edit:
Dicts in python are the equivalent of objects in Javascript and are initialized using the same syntax which we can see below:
monster_data = {}
But since you want a specific structure for these monsters, we can go further and create a function called add_monster_object():
def add_monster_object(original_dict, monster_name):
    # Store a new empty object with the correct keys under the monster's name.
    original_dict[monster_name] = {
        "monster_name": '',
        "monster_description": '',
        "monster_image": ''
    }
    return original_dict
Now every time you run this function with a given name, the dict will contain an object with that name. For example, if the user enters armor sword as the monster name, we can call the function above as add_monster_object(original_dict, monster_name).
This will, if we take your initial dict as an example, return this:
{
"katufo":{
"monster_name":"Katufo",
"Monster_description":"Katufo is the best",
"Monster_image":"#image_link"
},
"armor claw":{
"monster_name":"Armor Claw",
"Monster_description":"Armor claw is the best",
"Monster_image":"#image_link"
},
"armor sword":{
"monster_name":"",
"monster_description":"",
"monster_image":""
}
}
Then you can populate them as you want, or update the function to take more parameters. The important part here is that you take a minute to figure out what you want to keep saved. Then make sure that you can read from and write to the file, and you should have a fairly simple structure going. Warning: this isn't a cut-and-dried method; you will also have to think about special cases, such as adding an object that already exists, and so forth.
If you decide to go with Replit you could use their database to create similar functionality but you wouldn't have to worry about reading and writing to a file.
As it is right now, I still think you need to proceed with your bot and apply some of the changes I mentioned before the next actual problem arrives, as there are many things that aren't quite right. I also suggest you break everything into manageable parts: 1. read from a file; 2. write to a file; 3. write a dict to a file; 4. update a dict; and so forth. Good luck!

Is it possible to parse a text file using Rust's csv crate?

I have a text file with multiple lines. Is it possible to use Rust's csv crate to parse it such that each line is parsed into a different record?
I've tried specifying b'\n' as the field delimiter and left the record terminator as the default. The issue I'm having is that lines can sometimes end with \r\n and sometimes with just \n.
This however raises the UnequalLengths error unless the flexible option is specified because apparently new lines take precedence over field delimiters, so the code below:
use csv::{ByteRecord, Reader as CsvReader, ReaderBuilder, Terminator};

fn main() {
    let data = "foo,foo2\r\nbar,bar2\nbaz\r\n";
    let mut reader = ReaderBuilder::new()
        .delimiter(b'\n')
        .has_headers(false)
        .flexible(true)
        .from_reader(data.as_bytes());
    let mut record = ByteRecord::new();
    loop {
        match reader.read_byte_record(&mut record) {
            Ok(true) => {},
            Ok(false) => { break },
            Err(csv_error) => {
                println!("{}", csv_error);
                break;
            }
        }
        println!("fields: {}", record.len());
        for field in record.iter() {
            println!("{:?}", ::std::str::from_utf8(&field))
        }
    }
}
Will print:
fields: 1
Ok("foo,foo2")
fields: 2
Ok("bar,bar2")
Ok("baz")
I would like the string to be parsed into 3 records with one field each, so the expected output would be:
fields: 1
Ok("foo,foo2")
fields: 1
Ok("bar,bar2")
fields: 1
Ok("baz")
Is it possible to tweak the CSV reader somehow to obtain that behavior?
Conceptually I'd like the field terminator to be None but it seems that the terminator must be a single u8 value
I guess I'll re-post my comment as the answer. More succinctly, as the author of the csv crate, I'd say the answer to your question is "no."
Firstly, it's not clear to me why you're trying to use a csv parser for this task at all. As the comments indicate, it's likely that your question is under-specified. Nevertheless, it seems more prudent to just write your own parser.
Secondly, setting both the delimiter and the terminator to the same thing is probably a condition in which the csv reader should panic or return an error. It doesn't really make sense from the perspective of the parser, and its behavior is likely unspecified.
Finally, it seems to me like your desired output indicates that you should just iterate over the lines in your input. It should give you exactly the output you want, as it handles both \n and \r\n.
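As a rough illustration of that last point, here is a sketch that uses only the standard library's str::lines, which splits on \n and also strips a trailing \r. The function name split_records is made up for this example, and the input string is the one from the question.

```rust
/// Hypothetical helper: treat every line of the input as one record
/// with a single field. `str::lines` handles both `\n` and `\r\n`.
fn split_records(input: &str) -> Vec<&str> {
    input.lines().collect()
}

fn main() {
    let data = "foo,foo2\r\nbar,bar2\nbaz\r\n";
    for record in split_records(data) {
        println!("fields: 1");
        println!("{:?}", record);
    }
}
```

For input coming from a file, BufRead::lines behaves the same way with respect to line endings, yielding owned Strings wrapped in io::Result.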

Replace Quotation in List of Lists R

I am trying to get a JSON response from an API:
test <- GET(url, add_headers(`api_key` = key))
content(test, 'parsed')
When I run content(test, 'parsed'), I get the following error:
# Error: lexical error: invalid string in json text. .Note: Final passage of the "fiscal cliff bill" on January 1
I think this is because of the double quotations. How can I either replace the double quotes or if this is not the problem, how can I fix this issue?
Thanks!
I had run into a similar problem before, and I had intended to write a quick function that uses Jeroen's fix to try to repair the JSON. Since I intended to do it anyway, here's a quick hack attempt.
NB: repairing a structured format like this is speculative at best and most certainly prone to errors. The good news is that I tried to keep this specific enough so that it will not produce false results: it'll either fix what it knows it can, or fail. The "unit-testing" really needs to check other corner-cases. If you find something that this does not fix (and should) or that this breaks (gasp!), please comment!
fix_json_quotes <- function(s) {
  if (length(s) != 1) {
    warning("the argument has length > 1 and only the first element will be used")
    s <- s[[1]]
  }
  stopifnot(is.character(s))
  val <- jsonlite::validate(s)
  while (! val) {
    ind <- attr(val, "offset") - 1
    snew <- gsub("(.*)(['\"])([[:space:],]*)$", "\\1\\\\\\2\\3", substr(s, 1, ind))
    if (snew != substr(s, 1, ind)) {
      s <- paste0(snew, substr(s, ind + 1, nchar(s)))
    } else {
      break
    }
    val <- jsonlite::validate(s)
  }
  if (! val) {
    # still not validating
    stop("unable to fix quotes")
  }
  return(s)
}
Some sample data, unit-testing if you will (testthat is not required for use of the function):
library(testthat)
lst <- list(a="final \"cliff bill\" on")
json <- as.character(jsonlite::toJSON(lst))
json
# [1] "{\"a\":[\"final \\\"cliff bill\\\" on\"]}"
Okay, there should be no change:
expect_equal(json, fix_json_quotes(json))
Some bad data:
# un-escape the double quotes
badlst <- "{\"a\":[\"final \"cliff bill\" on\"]}"
expect_error(jsonlite::fromJSON(badlst))
expect_equal(json, fix_json_quotes(badlst))
PS: this looks specifically for double-quotes, nothing more. However, I believe that there are related errors that this might also be able to fix. I "left room" for this, in the second group within the regex (([\"])); for example, if single-quotes could also cause a problem, then the group could be changed to be ([\"']). I don't know if it's useful or even necessary.

CSVProvider start reading csv at specific row

I want to read a CSV file using the FSharp.Data CsvProvider.
The data looks like:
;Datum;Von;bis;MW
Maximum;16.10.2015;19:00;19:15;9268,000
Minimum;26.12.2015;13:30;13:45;-5195,000
"Datum";"Von";"bis";"Vertikale Netzlast [MW]";
01.01.2015;00:00;00:15;1.216;
01.01.2015;00:15;00:30;1.121;
01.01.2015;00:30;00:45;1.090;
01.01.2015;00:45;01:00;981;
I want to use the following code:
let csvValues = CsvProvider<"http://ws.50hertz.com/web01/api/PhotovoltaicForecast/DownloadFile?fileName=2015.csv&callback=?", ";">.GetSample()
How can I start to read the file at row 5 or if the first column contains "Datum"?
It works with SkipWhile:
let csvValues = CsvProvider<"http://ws.50hertz.com/web01/api/PhotovoltaicForecast/DownloadFile?fileName=2015.csv&callback=?", ";", IgnoreErrors=true>.GetSample()
                    .SkipWhile(fun r -> not (r.Column1.Contains("Datum")))
Alternatively, this also works, using the SkipRows parameter of the provider to skip the leading rows:
let csvValues = CsvProvider<"http://ws.50hertz.com/web01/api/PhotovoltaicForecast/DownloadFile?fileName=2015.csv&callback=?", ";", IgnoreErrors=true, SkipRows=3>.GetSample()