let mut file = Writer::from_path(output_path)?;
file.write_record([5.34534536546, 34556.456456467567567, 345.56465456])?;
Produces the following error:
error[E0277]: the trait bound `{float}: AsRef<[u8]>` is not satisfied
--> src/main.rs:313:27
|
313 | file.write_record([5.34534536546, 34556.456456467567567, 345.56465456])?;
| ------------ ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ the trait `AsRef<[u8]>` is not implemented for `{float}`
| |
| required by a bound introduced by this call
|
= help: the following implementations were found:
<&T as AsRef<U>>
<&mut T as AsRef<U>>
<Arc<T> as AsRef<T>>
<Box<T, A> as AsRef<T>>
and 44 others
note: required by a bound in `Writer::<W>::write_record`
--> /home/mlueder/.cargo/registry/src/github.com-1ecc6299db9ec823/csv-1.1.6/src/writer.rs:896:12
|
896 | T: AsRef<[u8]>,
| ^^^^^^^^^^^ required by this bound in `Writer::<W>::write_record`
Is there any way to use the csv crate with numbers instead of structs or characters?
Only strings or raw bytes can be written to the file; if you give the writer anything else, it doesn't know how to handle the data (as @SilvioMayolo mentioned). You can map your float array to an array of strings, and then write the string array to the file.
let float_arr = [5.34534536546, 34556.456456467567567, 345.56465456];
let string_arr = float_arr.map(|e| e.to_string());
This can obviously be combined into one line without the extra variable, but splitting it apart makes the extra conversion step a little easier to see.
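For instance, a minimal sketch of the combined form (same writer and values as above):
file.write_record([5.34534536546, 34556.456456467567567, 345.56465456].map(|e| e.to_string()))?;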
My dataframe looks like
| ID | Notes |
---------------
| 1 | '{"Country":"USA","Count":"1000"}' |
| 2 | {"Country":"USA","Count":"1000"} |
ID : int
Notes : string
When I use from_json to parse the Notes column, it gives all NULL values.
I need help parsing this Notes column into columns in PySpark.
When you are using the from_json() function, make sure that the column value is exactly a JSON dictionary in string format. In the sample data you have given, the Notes value for ID=1 is not exactly in JSON format (it is a string, but it is enclosed within additional single quotes). This is the reason it returns NULL values. Applying the following code to the input dataframe illustrates this: the first row parses to NULL, while the second row parses correctly.
from pyspark.sql.functions import from_json
from pyspark.sql.types import MapType, StringType

df = df.withColumn("Notes", from_json(df.Notes, MapType(StringType(), StringType())))
You need to change your input data so that every value in the Notes column has the same format, a JSON dictionary as a string and nothing more, because that mismatch is the main reason for the issue. Below is the corrected format that fixes your issue.
| ID | Notes |
---------------
| 1 | {"Country":"USA","Count":"1000"} |
| 2 | {"Country":"USA","Count":"1000"} |
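If the source data cannot be re-supplied, one possible workaround (not part of the original answer) is to strip the stray single quotes before parsing, for example:
from pyspark.sql.functions import regexp_replace

# Remove a leading or trailing single quote so every row is plain JSON text.
df = df.withColumn("Notes", regexp_replace("Notes", r"^'|'$", ""))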
To parse the Notes column values into columns in PySpark, you can simply use the json_tuple() function (no need to use from_json()). It extracts the elements from a JSON column (in string format) and returns the results as new columns.
from pyspark.sql.functions import col, json_tuple

df = df.select(col("id"), json_tuple(col("Notes"), "Country", "Count")) \
       .toDF("id", "Country", "Count")
df.show()
Output:
+---+-------+-----+
| id|Country|Count|
+---+-------+-----+
|  1|    USA| 1000|
|  2|    USA| 1000|
+---+-------+-----+
NOTE: json_tuple() also returns NULL if the column value is not in the correct format (make sure the column values are JSON dictionaries as strings without additional quotes).
I need to process a CSV file obtained from a government site. The file has two different format issues that cannot both be handled by Camel's CsvDataFormat unmarshalling. Minimal test file:
Registration No,Trade Name
"A009928","Rotagen "Combo""
"A010343","Vet Direct Abamectin Wormer, Bot + Tape"
Using this code to unmarshal:
CsvDataFormat csv = new CsvDataFormat();
csv.setDelimiter(",");
csv.setQuoteDisabled(true);
csv.setUseMaps(false);

from("file://c:/temp?fileName=test.csv&noop=true")
    .unmarshal(csv)
    .process(new Processor() {
        public void process(Exchange exchange) throws Exception {
            List<List<String>> rows = (List<List<String>>) exchange.getIn().getBody();
            for (int j = 0; j < rows.size(); j++) {
                List<String> row = rows.get(j);
                for (int i = 0; i < row.size(); i++) {
                    log.info("ITEM[" + row.get(i) + "]");
                }
            }
        }
    });
With setQuoteDisabled(false) I get:
java.lang.IllegalStateException: IOException reading next record: java.io.IOException: (line 2) invalid char between encapsulated token and delimiter
With setQuoteDisabled(true) the file is unmarshalled, but the 3rd line ends up with an additional split at the extra ','.
Here's the output:
13:10| INFO | MainRoute.java 54 | ITEM[Registration No]
13:10| INFO | MainRoute.java 54 | ITEM[Trade Name]
13:10| INFO | MainRoute.java 54 | ITEM["A009928"]
13:10| INFO | MainRoute.java 54 | ITEM["Rotagen "Combo""]
13:10| INFO | MainRoute.java 54 | ITEM["A010343"]
13:10| INFO | MainRoute.java 54 | ITEM["Vet Direct Abamectin Wormer]
13:10| INFO | MainRoute.java 54 | ITEM[ Bot + Tape"]
How can CsvDataFormat be configured to unmarshal both rows correctly?
Well, this is a problem of CSV as a "soft standard". Rows and delimiters are more or less standardized, but when it comes to quotes, it gets complicated.
Since your data is quoted (i.e. every field value is in quotes), the correct configuration would be
setQuoteDisabled(false)
The second record works fine with this configuration.
"A010343","Vet Direct Abamectin Wormer, Bot + Tape"
Because the fields are enclosed in quotes, the comma inside the data is no problem.
However, the first record contains quotes inside the data.
"A009928","Rotagen "Combo""
According to RFC 4180, section 2, rule 7, such quotes must be escaped with an additional quote.
If double-quotes are used to enclose fields, then a double-quote appearing inside a field must be escaped by preceding it with another double quote.
"A009928","Rotagen ""Combo"""
You could try to fix this manually in one record to see if it works like this.
Generally, you have multiple options:
Inform the data provider that the data is not RFC 4180 compliant and ask them to fix it
Fix the data upfront, before you read it with Camel
Parse the data yourself and compensate for the quote problem (see the sketch below)
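As a rough illustration of the last two options, here is a hypothetical helper (not part of the original answer) that doubles any quotes appearing inside field values, assuming the fully quoted two-column layout shown above:
// Hypothetical pre-processing helper: makes a non-compliant line RFC 4180
// compliant by doubling quotes that appear inside field values. Assumes every
// data field is wrapped in quotes and fields are separated by ",".
public static String fixLine(String line) {
    if (!line.contains("\"")) {
        return line; // e.g. the header line: leave it untouched
    }
    String[] fields = line.split("\",\"", -1);
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < fields.length; i++) {
        String value = fields[i];
        if (i == 0 && value.startsWith("\"")) {
            value = value.substring(1);                     // strip the leading quote
        }
        if (i == fields.length - 1 && value.endsWith("\"")) {
            value = value.substring(0, value.length() - 1); // strip the trailing quote
        }
        sb.append('"').append(value.replace("\"", "\"\"")).append('"');
        if (i < fields.length - 1) {
            sb.append(',');
        }
    }
    return sb.toString();
}
Such a helper could be run over the file before the route reads it, or wired in as a preliminary step that rewrites the file.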
The second line of your .csv file violates the rules of quoting in CSV, or at least quoting as it is understood by the default options of commons-csv (the library Camel uses under the hood for this). The default way of dealing with quotes inside a quoted field is to escape each inner quote by doubling it. Keep setQuoteDisabled(false) and correct the second line of your .csv file to:
"A009928","Rotagen ""Combo"""
Or to put the question another way: why is PapaParse's ParseResult.data an empty array when I trim all leading and trailing empty cells inside the step callback? EDIT: Please note I can achieve what I want by mapping over the parsed results and trimming, but I don't want to parse and then map; I'd rather do it all in one go.
Example CSV:
Col 1,Col 2,Col 3
1-1,1-2,
,2-2,2-3
3-1,3-2,3-3
Note that row 1 contains the headers (Col 1, Col 2, etc.). Row 2, col 3 is empty, and row 3, col 1 is empty.
Given that CSV, I want to present this back to the user (as a nicely-formatted
table):
|     |     |     |
|-----|-----|-----|
| 1-1 | 1-2 | |
| 2-2 | 2-3 | |
| 3-1 | 3-2 | 3-3 |
I want to push all rows as far to the left as they can go, and remove all empty
cells from the end of each row.
In other words, I want to trim all empty cells from both the beginning and the
end of each row. Below is the code I'm using. I have put debugger statements inside trimEmptyCells and it does exactly what I expect. However, the ParseResult that parseAndTrim returns contains an empty data array.
export const parseAndTrim = (csv: string): Papa.ParseResult => {
  return Papa.parse(csv, {
    skipEmptyLines: true,
    step: trimEmptyCells,
  });
};

const trimEmptyCells = (results: Papa.ParseResult) => {
  // Note that `_.dropWhile` and `_.dropRightWhile` are [lodash
  // functions](https://lodash.com/docs/4.17.15#dropRight).
  const leftTrimmed = _.dropWhile(results.data, (r) => r === "");
  return _.dropRightWhile(leftTrimmed, (r) => r === "");
};
My first guess was that PapaParse was running into errors with arrays of different lengths, but the errors array is also empty. So I tested what I could (without a step function) at https://www.papaparse.com/demo using the example below, and simply having missing cells (not merely empty ones) throws no errors and returns a proper data array.
Example test input at https://www.papaparse.com/demo
Col 1,Col 2,Col 3
1-1,1-2
,2-2,2-3
Based on this comment from pokoli (the #2 contributor to PapaParse and the #1 contributor since early 2017), I believe this is impossible. pokoli's proposed solution is
You should use Papa.parse to read records as array, filter them and then use Papa.Unparse to write the second file.
I wish I could mutate the data while parsing so it would be faster, but PapaParse is very fast anyway. I was able to parse a 36,000-line CSV in under 300 ms, and unparse it in roughly twice that time. Parsing a 2,000-line CSV took under 30 ms, and unparsing again took about twice as long. My use case will involve CSVs under 2,000 lines 99% of the time, so parsing into a 2D array, filtering, unparsing back into CSV, and then parsing again into JSON won't take too long.
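For reference, a minimal sketch of that parse, trim, and unparse approach (assuming ES-module imports; the function names here are illustrative, not from pokoli's comment):
import Papa from "papaparse";
import _ from "lodash";

// Trim empty cells from both ends of a single row.
const trimRow = (row: string[]): string[] =>
  _.dropRightWhile(_.dropWhile(row, (cell) => cell === ""), (cell) => cell === "");

// Parse the whole CSV, trim every row, then serialise it back with unparse.
export const parseTrimUnparse = (csv: string): string => {
  const result = Papa.parse(csv, { skipEmptyLines: true });
  const rows = result.data as string[][];
  return Papa.unparse(rows.map(trimRow));
};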
EDIT:
My question below still stands, but I appreciate that it's hard to answer without sifting through a pile of code. Therefore, to ask a somewhat similar question: does anyone have any examples of Menhir being used to implement an AST? Preferably not "toy" projects like a calculator, but I would appreciate any help I can get.
Original Question:
I'm trying to implement an abstract syntax tree using Menhir and there's an issue I can't seem to solve. My set up is as follows:
The AST's specification is generated using atdgen. This is basically a file with all of my grammar rules translated to the ATD format. This allows me to serialize the tree to JSON, which is what I use to print out the AST.
In my parser.mly file I have a long list of productions. As I'm using Menhir I can link these productions up to AST node creation, i.e. each production from the parser corresponds to an instruction to record a value in the AST.
The second point is where I'm really struggling to make progress. I have a huge grammar (the ast.atd file is ~600 lines long and the parser.mly file is ~1000 lines long), so it's hard to pin down where I'm going wrong. I suspect I have a type error somewhere along the way.
Snippets of Code
Here's what my ast.atd file looks like:
...
type star = [ Star ]
type equal = [ Equal ]
type augassign = [
| Plusequal
| Minequal
| Starequal
| Slashequal
| Percentequal
| Amperequal
| Vbarequal
| Circumflexequal
| Leftshiftequal
| Rightshiftequal
| Doublestarequal
| Doubleslashequal
]
...
Here's what my parser.mly file looks like:
...
and_expr // Used in: xor_expr, and_expr
: shift_expr
{ $1 }
| and_expr AMPERSAND shift_expr
{ `And_shift ($1, `Ampersand, $3) } ;
shift_expr // Used in: and_expr, shift_expr
: arith_expr
{ $1 }
| shift_expr pick_LEFTSHIFT_RIGHTSHIFT arith_expr
{ `Shift_pick_arith ($1, $2, $3) } ;
pick_LEFTSHIFT_RIGHTSHIFT // Used in: shift_expr
: LEFTSHIFT
{ `Leftshift }
| RIGHTSHIFT
{ `Rightshift } ;
...
The error I get when I try to compile the files with
ocamlbuild -use-menhir -tag thread -use-ocamlfind -quiet -pkgs
'core,yojson,atdgen' main.native
is a type error, i.e.
This expression has type [GIANT TYPE CONSTRUCTION] but an expression
was expected of type [DIFFERENT GIANT TYPE CONSTRUCTION]
I realise that this question is somewhat difficult to answer in the abstract like this, and I'm happy to provide a link to the dropbox of my code, but I'd really appreciate if anyone could point me in the right direction.
Possibly of interest: I have some productions in parser.mly that were initially "empty", which I dealt with by using the OCaml option type (Some and None). Perhaps my issue is there?
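For illustration, here is a hypothetical sketch (not my actual grammar) of the pattern I mean: a production that can match nothing yields an option value, so the ATD type on the other side has to be an option as well.
(* parser.mly -- a rule that may be empty returns an option *)
opt_star
    : /* empty */
        { None }
    | STAR
        { Some `Star } ;

(* ast.atd -- the corresponding type must then be an option too *)
type opt_star = star option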
About examples of code using Menhir: you can have a look at the list on the right of the OPAM menhir page; all of those packages depend on Menhir.
I have an Excel workbook that looks something like this:
/-------------------------------------\
| Lat | Long | Area |
|-------------------------------------|
| 5.3 | 103.8 | AREA_NAME |
\-------------------------------------/
I also have a JSON api with a url of the following structure:
https://example.com/api?token=TOKEN&lat=X.X&lng=X.X
that returns a JSON object with the following structure:
{ "Area": "AREA_NAME", "OTHERS": "Other_details"}
I tried to implement a VBA function that will help me to extract AREA_NAME. However, I keep getting syntax errors. I don't know where I am going wrong.
Function get_p()
Source = Json.Document (Web.Contents("https://example.com/api?token=TOKEN&lat=5.3&lng=103.8"))
name = Source[Area]
get_p = Name
End Function
I intentionally hardcoded the lat and long values for development purposes. Eventually, I want the function to accept lat and long as parameters. I got the first line of the function from the PowerQuery Editor.
Where am I going wrong? How do I do this properly in VBA? Or is there a simpler way using PowerQuery?
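For what it's worth, the Json.Document and Web.Contents calls are Power Query (M) functions rather than VBA, which is why VBA reports syntax errors. A rough sketch of one common plain-VBA approach (untested against the real API; the URL, token, and the naive string extraction are only placeholders):
' Fetch the JSON with MSXML2.XMLHTTP and pull the "Area" value out of the
' response text with plain string functions.
Function GetArea(lat As Double, lng As Double) As String
    Dim http As Object
    Dim body As String
    Dim p As Long, q As Long

    Set http = CreateObject("MSXML2.XMLHTTP")
    http.Open "GET", "https://example.com/api?token=TOKEN&lat=" & lat & "&lng=" & lng, False
    http.send
    body = http.responseText

    ' Naive extraction: find "Area", then take the text between the next
    ' pair of double quotes. A real JSON parser would be more robust.
    p = InStr(body, """Area""")
    If p > 0 Then
        p = InStr(p + 6, body, ":")      ' the colon after "Area"
        p = InStr(p, body, """") + 1     ' opening quote of the value
        q = InStr(p, body, """")         ' closing quote of the value
        GetArea = Mid(body, p, q - p)
    End If
End Function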