I understand that we can define a schema for a JSON structure, but I can't find any information on how to tie the schema to the actual JSON data: where exactly within the JSON data do we reference the schema?
I'm trying to understand how I can return JSON data along with its schema, so that the end user or service that consumes my JSON can discover and understand the schema associated with the data.
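The JSON Schema specification itself does not define a standard way for instance data to point at its schema; that association is left to the application (an HTTP `Link: <...>; rel="describedby"` header is one common out-of-band option). A minimal in-band sketch, using a hypothetical envelope and a made-up schema URL:

```python
import json

# Hypothetical convention: wrap the data in an envelope that carries a pointer
# to (or an inline copy of) its schema. This is NOT part of the JSON Schema
# spec; it is just one common application-level pattern.
payload = {
    "$schema": "https://example.com/schemas/person.json",  # hypothetical URL
    "data": {"name": "Alice", "age": 30},
}

body = json.dumps(payload)

# A consumer can discover the schema before touching the data.
received = json.loads(body)
schema_url = received["$schema"]
data = received["data"]
```

The trade-off is that the envelope changes the shape of your responses, so consumers must agree on the convention in advance.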
I am writing a Swift program to interact with our backend database. The database tables can change format; each row comes from DynamoDB and is represented by a JSON object. The program grabs a cache from the server that contains the database structures and then the current contents of the database. I don't want to hardcode the database structures into the app, as that would obviously make it very hard to maintain. What is the best way to read these rows into a generic dictionary object with dynamic names that can hold any of the JSON types?
One way to handle parsing dynamic JSON is to have a key (or a set of keys) that indicates which model type to decode. For example, you can have a type key in your JSON objects:
{
...
"type": "some type"
...
}
and an intermediate model:
struct IntermediateModel: Codable {
let type: String // or enum
}
You first decode all your JSON objects to the intermediate model, then decode them to the actual model type based on the value of type.
I have written some helper protocols that make this easier here.
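The two-stage decode described above is language-agnostic; here is a minimal sketch of the same idea in Python (the model names and keys are illustrative, not from the original post):

```python
import json

RAW = '[{"type": "user", "name": "Alice"}, {"type": "group", "members": 3}]'

# Stage 1: look only at the discriminator key (the "intermediate model").
# Stage 2: dispatch to a concrete decoder based on its value.
def decode_user(obj):
    return ("User", obj["name"])

def decode_group(obj):
    return ("Group", obj["members"])

DECODERS = {"user": decode_user, "group": decode_group}

def decode(obj):
    # Raises KeyError on an unknown "type", which surfaces schema drift early.
    return DECODERS[obj["type"]](obj)

models = [decode(obj) for obj in json.loads(RAW)]
```

In Swift the dispatch table would become a switch over the intermediate model's `type` property, with each branch calling `JSONDecoder().decode(_:from:)` for the concrete type.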
I'd like to define a JSON Schema that describes a hierarchy of data structures, but one of the data items is itself JSON whose structure is unknown at the time of writing the schema (i.e. it is not strongly typed). I would like the validator to verify that this item is valid JSON, and I'd like to avoid having to backslash-escape quotes, as I would if the item were declared as a raw string.
Does JSON Schema support this?
More specifically (if it's relevant), I'm using C# and Newtonsoft's JSON Schema classes.
I think the answer is to just declare the item as 'object': it is parsed as JSON but its members are not validated. I believe this works because 'allow unknown properties' is enabled by default for objects.
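The "declare it as object" idea looks like this in a schema; a minimal sketch (the property names are illustrative):

```json
{
  "type": "object",
  "properties": {
    "id": { "type": "string" },
    "payload": { "type": "object" }
  },
  "required": ["id", "payload"]
}
```

Here `payload` must parse as a JSON object, but its members are not constrained, because `additionalProperties` defaults to allowing anything. One caveat: `"type": "object"` rejects a payload that is an array or a scalar; if the item may be *any* valid JSON value, omitting the `type` keyword entirely (an empty schema for that property) accepts everything.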
What I'm trying to do is similar to the Stack Overflow question here: converting .seq.gz files of JSON records to Parquet files with a proper schema defined.
I don't want to infer the schema; rather, I would like to define my own, ideally using my Scala case classes so they can be reused as models by other jobs.
I'm not sure whether I should deserialise my JSON into a case class and let toDS() implicitly convert my data, like below:
spark.sparkContext
  .sequenceFile(input, classOf[IntWritable], classOf[Text])
  .map { case (_, json) => deserialize[MyClass](json.toString) } // JSON to case class instance
  .toDS() // requires import spark.implicits._ and an Encoder for MyClass
  .write.mode(SaveMode.Overwrite)
  .parquet(outputFile)
...or whether I should use a Spark DataFrame schema instead, or even a Parquet schema. I just don't know how to do it.
My objective is to have full control over my models and, ideally, to map JSON types (a poorer format) to Parquet types.
Thanks!
I have a case where the flow content is always in JSON format and the data inside the JSON always changes (both keys and values). Is it possible to convert this flow content to CSV?
Please note that the keys in the JSON always change.
Many thanks,
To achieve this use case, we need to generate an Avro schema dynamically for each JSON record first, then convert the record to Avro, and finally convert the Avro to CSV.
Flow:
1. SplitJson // split the array of JSON records into individual records
2. InferAvroSchema // infer the Avro schema from each JSON record and store it in an attribute
3. ConvertJSONToAvro // convert each JSON record into an Avro data file
4. ConvertRecord // read the Avro data file dynamically and convert it into CSV format
5. MergeContent (or MergeRecord) // merge the split flowfiles back into one flowfile using the Defragment strategy
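The "infer a schema per record" idea in step 2 can be sketched outside NiFi like this; a toy version for flat records only (InferAvroSchema itself handles much more, such as nesting and unions):

```python
import json

def avro_type(value):
    # Map a JSON value to an Avro primitive type name.
    # Note: bool must be checked before int, since bool is a subclass of int.
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "double"
    if value is None:
        return "null"
    return "string"

def infer_schema(record, name="record0"):
    # Build an Avro record schema from one flat JSON object.
    return {
        "type": "record",
        "name": name,
        "fields": [{"name": k, "type": avro_type(v)} for k, v in record.items()],
    }

rec = json.loads('{"id": 7, "city": "Oslo", "active": true}')
schema = infer_schema(rec)
```

Because the schema is derived per record, changing keys are no problem: each flowfile simply carries its own schema forward to the Avro and CSV steps.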
Save this XML template and upload it to your NiFi instance, then adjust it to your requirements.
I want to convert my nested JSON into CSV. I used
df.write.format("com.databricks.spark.csv").option("header", "true").save("mydata.csv")
But it works for flat JSON, not nested JSON. Is there any way I can convert my nested JSON to CSV? Help will be appreciated, thanks!
When you ask Spark to convert a JSON structure to CSV, Spark can only map the first level of the JSON.
This happens because of the simplicity of the CSV format: it just assigns a value to a name. That is why {"name1":"value1", "name2":"value2"...} can be represented as a CSV with this structure:
name1,name2, ...
value1,value2,...
In your case, you are converting a JSON with several levels, so the Spark exception is saying that it cannot figure out how to convert such a complex structure into a CSV.
If you add only a second level to your JSON, it will work, but be careful: it will drop the names of the second level and include only their values in an array.
You can have a look at this link to see the example for JSON datasets.
As I have no information about the nature of the data, I can't say much more, but if you need to write the information as CSV you will need to simplify the structure of your data.
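"Simplifying the structure" can be as basic as flattening nested objects into dotted column names before writing; a standard-library sketch (the "parent.child" naming is my own choice, not a Spark convention):

```python
import csv
import io
import json

def flatten(obj, prefix=""):
    # Recursively flatten nested objects into "parent.child" columns.
    flat = {}
    for key, value in obj.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, name + "."))
        else:
            flat[name] = value
    return flat

rows = [flatten(json.loads(
    '{"name": "a", "address": {"city": "Oslo", "zip": "0150"}}'
))]

buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=rows[0].keys())
writer.writeheader()
writer.writerows(rows)
csv_text = buf.getvalue()
```

In Spark the equivalent move is to select the nested fields explicitly (e.g. `df.select("name", "address.city", "address.zip")`) so the DataFrame is flat before it reaches the CSV writer.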
Read the JSON file in Spark and create a DataFrame:
val path = "examples/src/main/resources/people.json"
val people = sqlContext.read.json(path)
Save the DataFrame using spark-csv:
people.write
.format("com.databricks.spark.csv")
.option("header", "true")
.save("newcars.csv")
Source:
read json
save to csv