Error reading JSON in Rascal with multiple constructors

The following code:
import lang::json::IO;
data Field = field(map[str, str]);
data Block = null()
| block(str \type)
| block(str \type, str id)
| block(str \type, Block next)
| block(str \type, Field field, Input input, Block next);
data Input = input(map[str, Block]);
writeJSON(|file:///rascal/block.json|, block("type","sdlkfjl"));
readJSON(#Block, |file:///rascal/block.json|);
Gives an error:
rascal>readJSON(#Block, |file:///rascal/block.json|);
|std:///lang/json/IO.rsc|(976,2999,<32,0>,<60,179>): IO("Overloading of constructor names is not supported
(Block):$.block")
at *** somewhere ***(|std:///lang/json/IO.rsc|(976,2999,<32,0>,<60,179>))
at readJSON(|std:///lang/json/IO.rsc|(3968,5,<60,172>,<60,177>)ok
rascal>
As shown in the code, my block has some optional fields. What options do I have for writing it to and reading it from JSON?


Convert string column to json and parse in pyspark

My dataframe looks like
|ID|Notes|
---------------
|1|'{"Country":"USA","Count":"1000"}'|
|2|{"Country":"USA","Count":"1000"}|
ID : int
Notes : string
When I use from_json to parse the Notes column, it gives all null values.
I need help parsing this Notes column into separate columns in PySpark.
When you are using the from_json() function, make sure that the column value is exactly JSON (a dictionary) in string format. In the sample data you have given, the Notes value for ID=1 is not exactly in JSON format (it is a string, but enclosed within additional single quotes). This is the reason it is returning NULL values. The following code parses the column on the input dataframe:
df = df.withColumn("Notes",from_json(df.Notes,MapType(StringType(),StringType())))
You need to change your input data so that the entire Notes column is in the same format, i.e., JSON as a string and nothing more, because that is the root cause of the issue. The following is the corrected format that fixes the problem:
| ID | Notes |
---------------
| 1 | {"Country":"USA","Count":"1000"} |
| 2 | {"Country":"USA","Count":"1000"} |
To parse the Notes column values into columns in PySpark, you can simply use the json_tuple() function (no need to use from_json()). It extracts the elements from a JSON column (in string format) and returns the result as new columns.
df = df.select(col("id"),json_tuple(col("Notes"),"Country","Count")) \
.toDF("id","Country","Count")
df.show()
Output:
NOTE: json_tuple() also returns null if the column value is not in the correct format (make sure the column values are json/dictionary as a string without additional quotes).
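For reference, here is a minimal, self-contained sketch of the json_tuple() approach with the imports it needs; the data matches the corrected example above, and the session app name is an arbitrary placeholder:
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, json_tuple

spark = SparkSession.builder.appName("parse-notes").getOrCreate()

# Sample data in the corrected format: Notes holds plain JSON strings.
df = spark.createDataFrame(
    [(1, '{"Country":"USA","Count":"1000"}'),
     (2, '{"Country":"USA","Count":"1000"}')],
    ["id", "Notes"],
)

# json_tuple() pulls the requested keys out of each JSON string as new columns.
df = df.select(col("id"), json_tuple(col("Notes"), "Country", "Count")) \
       .toDF("id", "Country", "Count")
df.show()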

Scenario Outline: Using Whitelists & Ranges?

I have been building up a Cucumber automation framework and have a number of components to test. I have used Scenario Outlines to capture various values and their expected responses.
What I see as a problem:
I have to specify every single type of input data and the error message to go with it. From the example Scenario Outline below, you can see I have certain numbers that are all expected to return the same message. Anything that does not equal these values returns an error message:
Scenario Outline: Number is or is not valid
Given I send an event with the "Number" set to <num>
Then I will receive the following <message>
Examples:
| num | message |
| 0 | "Processed event" |
| 1 | "Processed event" |
| 2 | "Processed event" |
| 3 | "Processed event" |
| 4 | "Processed event" |
| 5 | "Processed event" |
| 6 | "Message failed" |
| -1 | "Message failed" |
| "One" | "Message failed" |
What I would like to do:
I would basically like to have a "whitelist" of good data defined in the Scenario Outline, and if any other value is input, it returns the expected error message. Like the following:
Scenario Outline: Number is or is not valid
Given I send an event with the "Number" set to <num>
Then I will receive the following <message>
Examples:
| num | message |
| 0-5 | "Processed event" |
| Anything else | "Message failed" |
Is this possible with the code behind it? As you can see, it would make an automation suite far more concise and maintainable. If so, please let me know; I'm keen to discuss.
Thanks!
Kirsty
Cucumber is a tool to support BDD. This means that it works really well when you have to communicate about behavior. But this particular problem leans towards validating the properties of the event validator, i.e. property-based testing. So it might be worth splitting the test strategy accordingly.
It appears there is a rule that valid events are processed and invalid events are rejected. This is something you could test with Cucumber. For example:
Feature: Events
This system accepts events. Events are json messages.
Examples of well known valid and invalid json messages
can be found in the supplementary documentation.
Scenario: The system accepts valid events
When a well known valid event is sent
Then the system accepts the event
And responds with "Processed event"
Scenario: The system rejects invalid events
When a well known invalid event is sent
Then the system rejects the event
And responds with "Message failed"
It also appears there is a rule that valid events have a field "Number" set to any value between 0 and 5. And since this sounds like a JSON object, I'm guessing the strings "0", "1", "2", "3", "4", "5" are also valid. Anything else is invalid.
A good way to test this exhaustively is by using a property-based testing framework, for example JQwik. Given a description of a set of either valid or invalid values, it will randomly try a few. For a simplified example:
package my.example.project;
import net.jqwik.api.*;
import static org.assertj.core.api.Assertions.assertThat;
class ValidatorProperties {

    @Provide
    Arbitrary<Object> validValues() {
        Arbitrary<Integer> validNumbers = Arbitraries.integers().between(0, 5);
        Arbitrary<String> validStrings = validNumbers.map(Object::toString);
        return Arbitraries.oneOf(validNumbers, validStrings);
    }

    @Provide
    Arbitrary<Object> invalidValues() {
        Arbitrary<Integer> invalidNumbers = Arbitraries.oneOf(
            Arbitraries.integers().lessOrEqual(-1),
            Arbitraries.integers().greaterOrEqual(6)
        );
        Arbitrary<String> invalidStrings = invalidNumbers.map(Object::toString);
        return Arbitraries.oneOf(
            invalidNumbers,
            invalidStrings,
            Arbitraries.just(null)
        );
    }

    @Property
    void accepts0To5(@ForAll("validValues") Object value) {
        Validator validator = new Validator();
        assertThat(validator.isValid(value)).isTrue();
    }

    @Property
    void rejectsAnythingElse(@ForAll("invalidValues") Object value) {
        Validator validator = new Validator();
        assertThat(validator.isValid(value)).isFalse();
    }

    static class Validator {
        boolean isValid(Object event) {
            return event != null && event.toString().matches("^[0-5]$");
        }
    }
}
Split this way, the Cucumber tests describe how the system should respond to valid and invalid events, while the JQwik tests describe what the properties of a valid or invalid event are. This gives much more clarity on the first and greater fidelity on the second.

How do you write an array of numbers to a csv file?

let mut file = Writer::from_path(output_path)?;
file.write_record([5.34534536546, 34556.456456467567567, 345.56465456])?;
Produces the following error:
error[E0277]: the trait bound `{float}: AsRef<[u8]>` is not satisfied
--> src/main.rs:313:27
|
313 | file.write_record([5.34534536546, 34556.456456467567567, 345.56465456])?;
| ------------ ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ the trait `AsRef<[u8]>` is not implemented for `{float}`
| |
| required by a bound introduced by this call
|
= help: the following implementations were found:
<&T as AsRef<U>>
<&mut T as AsRef<U>>
<Arc<T> as AsRef<T>>
<Box<T, A> as AsRef<T>>
and 44 others
note: required by a bound in `Writer::<W>::write_record`
--> /home/mlueder/.cargo/registry/src/github.com-1ecc6299db9ec823/csv-1.1.6/src/writer.rs:896:12
|
896 | T: AsRef<[u8]>,
| ^^^^^^^^^^^ required by this bound in `Writer::<W>::write_record`
Is there any way to use the csv crate with numbers instead of structs or characters?
Only strings or raw bytes can be written to a file; if you give the writer something else, it doesn't know how to handle the data (as @SilvioMayolo mentioned). You can map your float array to one of strings, and then you will be able to write the string array to the file.
let float_arr = [5.34534536546, 34556.456456467567567, 345.56465456];
let string_arr = float_arr.map(|e| e.to_string());
This can obviously be combined into one line without the extra variable, but splitting it apart makes the extra conversion step easier to see.
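Putting it together, here is a minimal self-contained sketch; the output path is a placeholder:
use csv::Writer;
use std::error::Error;

fn main() -> Result<(), Box<dyn Error>> {
    // Placeholder output path; substitute your own.
    let mut file = Writer::from_path("output.csv")?;

    let float_arr = [5.34534536546, 34556.456456467567567, 345.56465456];
    // write_record wants items that are AsRef<[u8]>, so convert each float to a String first.
    file.write_record(float_arr.map(|e| e.to_string()))?;
    file.flush()?;
    Ok(())
}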

How can I print nulls when converting a dataframe to json in Spark

I have a dataframe that I read from a csv.
CSV:
name,age,pets
Alice,23,dog
Bob,30,dog
Charlie,35,
Reading this into a DataFrame called myData:
+-------+---+----+
| name|age|pets|
+-------+---+----+
| Alice| 23| dog|
| Bob| 30| dog|
|Charlie| 35|null|
+-------+---+----+
Now, I want to convert each row of this dataframe to JSON using myData.toJSON. What I get is the following:
{"name":"Alice","age":"23","pets":"dog"}
{"name":"Bob","age":"30","pets":"dog"}
{"name":"Charlie","age":"35"}
I would like the 3rd row's JSON to include the null value, e.g.
{"name":"Charlie","age":"35", "pets":null}
However, this doesn't seem to be possible. I debugged through the code and saw that Spark's org.apache.spark.sql.catalyst.json.JacksonGenerator class has the following implementation
private def writeFields(
    row: InternalRow, schema: StructType, fieldWriters: Seq[ValueWriter]): Unit = {
  var i = 0
  while (i < row.numFields) {
    val field = schema(i)
    if (!row.isNullAt(i)) {
      gen.writeFieldName(field.name)
      fieldWriters(i).apply(row, i)
    }
    i += 1
  }
}
This seems to skip a column if it is null. I am not quite sure why this is the default behavior, but is there a way to print null values in JSON using Spark's toJSON?
I am using Spark 2.1.0
To print the null values in JSON using Spark's toJSON method, you can use the following code:
myData.na.fill("null").toJSON
It will give you the expected result:
+-------------------------------------------+
|value |
+-------------------------------------------+
|{"name":"Alice","age":"23","pets":"dog"} |
|{"name":"Bob","age":"30","pets":"dog"} |
|{"name":"Charlie","age":"35","pets":"null"}|
+-------------------------------------------+
I hope it helps!
I have modified the JacksonGenerator.writeFields function and included it in my project.
Below are the steps:
1) Create the package 'org.apache.spark.sql.catalyst.json' inside 'src/main/scala/'
2) Copy the existing JacksonGenerator class
3) Create a JacksonGenerator.scala class in that package and paste the copied code
4) Modify the writeFields function:
private def writeFields(row: InternalRow, schema: StructType, fieldWriters: Seq[ValueWriter]): Unit = {
  var i = 0
  while (i < row.numFields) {
    val field = schema(i)
    if (!row.isNullAt(i)) {
      gen.writeFieldName(field.name)
      fieldWriters(i).apply(row, i)
    } else {
      gen.writeNullField(field.name)
    }
    i += 1
  }
}
Tested with Spark 3.0.0:
When creating your spark session, set spark.sql.jsonGenerator.ignoreNullFields to false.
The toJSON function internally uses org.apache.spark.sql.catalyst.json.JacksonGenerator, which in turn takes org.apache.spark.sql.catalyst.json.JSONOptions for configuration.
The latter includes an option ignoreNullFields.
However, toJSON uses the defaults, which in the case of this particular option are taken from the SQL config given above.
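A minimal sketch of setting that flag when building the session (the app name and master are placeholders):
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("json-nulls-example")
  .master("local[*]")
  .config("spark.sql.jsonGenerator.ignoreNullFields", "false")
  .getOrCreate()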
An example with the configuration set to false:
import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{StringType, StructField, StructType}

val schema = StructType(Seq(StructField("a", StringType), StructField("b", StringType)))
val rows = Seq(Row("a", null), Row(null, "b"))
val frame = spark.createDataFrame(spark.sparkContext.parallelize(rows), schema)
println(frame.toJSON.collect().mkString("\n"))
produces
{"a":"a","b":null}
{"a":null,"b":"b"}
import org.apache.spark.sql.Row
import org.apache.spark.sql.types._
import scala.util.parsing.json.JSONObject

// Note: the filter drops null-valued fields, so nulls are omitted from the resulting JSON.
def convertRowToJSON(row: Row): String = {
  val m = row.getValuesMap(row.schema.fieldNames).filter(_._2 != null)
  JSONObject(m).toString()
}

Failing to convert a record with union types to JSON with WebSharper

I'm using WebSharper to convert records/unions to JSON to post to an API. These are the declarations:
[<NamedUnionCases "Icon">]
type IconName =
| Aaabattery
| Abacus
[<NamedUnionCases>]
type Size =
| H1
| H2
| H3
| H4
| H5
type Icon = {title:string; name:IconName; size:Size; updated:DateTime; }
Icon {title = "hello";
name = Aaabattery;
size = H1;
updated = 11/06/2015 3:18:29 p. m.;}
This is how I encode:
let ToJString (jP:Core.Json.Provider, msg:Widget) =
let enc = jP.GetEncoder<Widget>()
enc.Encode msg
|> jP.Pack
|> Core.Json.Stringify
printfn "D:"
let j = Core.Json.Provider.Create()
let data = ToJString(j, widget)
printfn "D: %A" data
The program never reaches the last printfn "D: %A" data. However, if I turn the unions into enums or remove them, it works. What is missing?
[<NamedUnionCases>] without an argument relies on the names of the case arguments to disambiguate between cases. For example, with the following type:
[<NamedUnionCases>]
type Foo =
| Case1 of x: int
| Case2 of y: int
values are serialized as either {"x":123} or {"y":123}, so deserialization is possible by checking which fields are present. But in your type Size, all cases have zero arguments, so they would essentially all be serialized as {}, and the deserializer wouldn't know which case to choose.
There are several solutions:
If you want to serialize these values as objects with a field telling which case it is, use [<NamedUnionCases "fieldName">] to get e.g. {"fieldName":"H1"} (see the sketch at the end of this answer).
If you want to serialize them as constant numbers or strings, use the Constant attribute like this:
type Size =
| [<Constant 1>] H1
| [<Constant 2>] H2
| [<Constant 3>] H3
| [<Constant 4>] H4
| [<Constant 5>] H5
This way, for example, H1 will be serialized simply as 1.
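And for the first option, a minimal sketch of what the Size declaration could look like; the field name "size" here is an arbitrary choice:
[<NamedUnionCases "size">]
type Size =
| H1
| H2
| H3
| H4
| H5
With this, H1 would serialize as {"size":"H1"} rather than as an empty object.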