JSON to JSON conversion - json

I have a 3rd-party service that returns the following response:
JSON 1
{"Bag":{"Type":{"$":"LIST"},"Source":{"$":"ABC"},"Id":{"$":"151559458"},"Name":{"$":"Bag list"},"Source":{"$":"ABC"},"CustomerId":{"$":"abc#gmail.com"},"DateTime":{"$":"2014-07-17T12:36:01Z"}}}
But I have to format this JSON into the following format:
JSON2
{"Bag":{"Type":"LIST","Source":"ABC","Id":"151559458","Name":"Bag list","Source":"ABC","CustomerId":"abc#gmail.com","DateTime":"2014-07-17T12:36:01Z"}}
And vice versa: from the client I get JSON2, and I have to send it to the service in JSON1 format.

Given input in the first JSON schema, parse it into a language-specific data structure; you can typically find a library to do this.
Transform that data structure, using the idioms of whichever language you are using, into the equivalent of the second JSON schema.
Serialize the data structure back to JSON text; again, a library will typically do this. A sketch of this approach follows.
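As an illustration of those three steps, here is a minimal Python sketch covering both directions; the file name is hypothetical and the key handling is taken from the example above, so treat it as a sketch rather than a drop-in implementation:
import json

def unwrap(wrapped):
    # JSON1 -> JSON2: replace each {"$": value} wrapper with the bare value
    return {"Bag": {key: val["$"] for key, val in wrapped["Bag"].items()}}

def wrap(flat):
    # JSON2 -> JSON1: wrap each bare value back into a {"$": value} object
    return {"Bag": {key: {"$": val} for key, val in flat["Bag"].items()}}

with open("service_response.json") as f:   # hypothetical file containing JSON1
    json1 = json.load(f)

json2 = unwrap(json1)        # shape the client expects
json1_again = wrap(json2)    # shape the service expects
print(json.dumps(json2, indent=2))
Note that a standard JSON parser keeps only one of the duplicate "Source" keys shown in JSON1, so that duplicate will not survive the round trip.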

You can use the jq tool: http://stedolan.github.io/jq
The conversion is then a console one-liner:
$ jq '{Bag: .Bag | with_entries({key, value: .value."$"})}' file.json
And the result is
{
  "Bag": {
    "Type": "LIST",
    "Source": "ABC",
    "Name": "Bag list",
    "Id": "151559458",
    "DateTime": "2014-07-17T12:36:01Z",
    "CustomerId": "abc#gmail.com"
  }
}

Related

Extract nested JSON content from JSON using the JSON Extractor in JMeter

I have JSON content inside another JSON that I need to extract as it is, without parsing its contents:
{
  "id": 555,
  "name": "aaa",
  "JSON": "{\r\n \"fake1\": {},\r\n \"fake2\": \"bbbb\",\r\n \"fake3\": \"eee\" \r\n}",
  "after1": 1,
  "after2": "test"
}
When I use the JSON Extractor with the JSON Path expression:
$.JSON
It returns:
"{
"fake1": {},
"fake2": "bbbb",
"fake3": "eee"
}"
whereas I need the raw string:
"{\r\n \"fake1\": {},\r\n \"fake2\": \"bbbb\",\r\n \"fake3\": \"eee\" \r\n}"
I think you need to switch to JSR223 PostProcessor instead of the JSON Extractor and use the following code:
// Parse the previous sampler's response and take the value of the "JSON" attribute
// (JsonSlurper returns it with the escape sequences already resolved)
def json = new groovy.json.JsonSlurper().parse(prev.getResponseData()).JSON
// Re-escape it and store it in a JMeter variable so the raw JSON string can be reused
vars.put('rawString', org.apache.commons.text.StringEscapeUtils.escapeJson(json))
You will be able to refer to the extracted value as ${rawString} where required.
More information:
Apache Groovy - Parsing and producing JSON
Apache Groovy: What Is Groovy Used For?
console.log(JSON.stringify(data.JSON))
Here data is your parsed JSON object. First extract/parse the JSON, then re-serialize the nested value with JSON.stringify().
A confusing point is that the key in your JSON object is itself named "JSON".
In JavaScript, once a JSON object is parsed, you can reach any nested value with data.key_name,
where data is the parsed object
and key_name is the key of the nested value.

How to get the results of a TDE in MarkLogic in RDF/JSON or Turtle format?

With any kind of Template Driven Extraction (TDE) in MarkLogic, how can I convert the results I get from the tde:node-data-extract function into RDF/JSON format? The JSON format returned by this method is not compliant with RDF/JSON, so I can't use it directly to insert triples into another database. In this case, I don't want to insert the triples into the same database that I'm applying the template against, I just want to use the template to create triples from XML data.
Here's an example of the JSON output that I get from the tde:node-data-extract function:
{
  "document/pt/document/39627370": [
    {
      "triple": {
        "subject": "http://www.example.com/document/id/39627370",
        "predicate": "http://www.example.com/typeOf",
        "object": {
          "datatype": "http://www.w3.org/2001/XMLSchema#string",
          "value": "http://www.example.com/document"
        }
      }
    },
    {
      "triple": {
        "subject": "http://www.example.com/publisher/Oxford_University_Press",
        "predicate": "http://www.example.com/typeOf",
        "object": {
          "datatype": "http://www.w3.org/2001/XMLSchema#string",
          "value": "http://www.example.com/publisher"
        }
      }
    }
  ]
}
Convert each "triple" property into a triple object using sem.triple(). Then serialize the array of sem.triple objects using sem.rdfSerialize().
https://docs.marklogic.com/sem.triple
https://docs.marklogic.com/sem.rdfSerialize
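If the conversion has to happen outside MarkLogic (for example, before inserting the triples into another store), one option is to walk that JSON output in Python with rdflib. This is a hedged sketch under that assumption, not part of the original answers; the file path is hypothetical:
import json
from rdflib import Graph, URIRef, Literal

# Parse the tde:node-data-extract output shown above (path is hypothetical)
with open("tde_output.json") as f:
    extract = json.load(f)

g = Graph()
for doc_uri, rows in extract.items():      # one entry per source document
    for row in rows:
        t = row["triple"]
        obj = t["object"]
        # Each "object" in the sample carries a datatype and a value, so it is added as a typed literal
        g.add((
            URIRef(t["subject"]),
            URIRef(t["predicate"]),
            Literal(obj["value"], datatype=URIRef(obj["datatype"])),
        ))

print(g.serialize(format="turtle"))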
With help from John and Mads, I found a slight variation that works really well, assuming you're in the query console. $docs is any sequence of documents and $template is the TDE template.
let $jsontriples := tde:node-data-extract($docs, $template)
for $key in map:keys($jsontriples)
let $entry := map:get($jsontriples, $key)
return $entry["triple"]
This will return the triples automatically serialized into Turtle format in the query console Result tab, which you can switch to JSON or Text. I assume John's answer is the most correct in situations where the serialization is not performed automatically (e.g. when not using the query console).

Process events from Event Hub using pyspark - Databricks

I have a Mongo change stream (a pymongo application) that continuously receives the changes in collections. These change documents, as received by the program, are sent to Azure Event Hubs. A Spark notebook has to read the documents as they arrive in Event Hub and do schema matching (match the fields in the document with the Spark table columns) against the Spark table for that collection. If there are fewer fields in the document than in the table, the missing columns have to be added with null.
I am reading the events from Event Hub like below.
spark.readStream.format("eventhubs").options(**config).load()
As described in the documentation, the original message is in the 'body' column of the dataframe, which I am casting to string. So now I have the Mongo document as a JSON string in a streaming dataframe. I am facing the issues below.
I need to extract the individual fields of the Mongo document. This is needed to compare which fields are present in the Spark table and which are missing from the Mongo document. I saw a function called get_json_object(col, path). This essentially returns a string again, so I cannot individually select all the columns.
While from_json can be used to convert the JSON string to a struct type, I cannot specify the schema, because we have close to 70 collections (and a corresponding number of Spark tables), each sending Mongo docs with anywhere from 10 to 450 fields.
If I could convert the JSON string in the streaming dataframe to a JSON object whose schema can be inferred by the dataframe (something like what read.json can do), I could use the SQL '*' notation to extract the individual columns, do a few manipulations, and then save the final dataframe to the Spark table. Is it possible to do that? What is the mistake I am making?
Note: streaming dataframes don't support the collect() method to individually extract the JSON string from the underlying RDD and do the necessary column comparisons. Using Spark 2.4 & Python in Azure Databricks environment 4.3.
Below is the sample data I get in my notebook after reading the events from Event Hub and casting them to string.
{
  "documentKey": "5ab2cbd747f8b2e33e1f5527",
  "collection": "configurations",
  "operationType": "replace",
  "fullDocument": {
    "_id": "5ab2cbd747f8b2e33e1f5527",
    "app": "7NOW",
    "type": "global",
    "version": "1.0",
    "country": "US",
    "created_date": "2018-02-14T18:34:13.376Z",
    "created_by": "Vikram SSS",
    "last_modified_date": "2018-07-01T04:00:00.000Z",
    "last_modified_by": "Vikram Ganta",
    "last_modified_comments": "Added new property in show_banners feature",
    "is_active": true,
    "configurations": [
      {
        "feature": "tip",
        "properties": [
          {
            "id": "tip_mode",
            "name": "Delivery Tip Mode",
            "description": "Tip mode switches the display of tip options between percentage and amount in the customer app",
            "options": [
              "amount",
              "percentage"
            ],
            "default_value": "tip_percentage",
            "current_value": "tip_percentage",
            "mode": "multiple or single"
          },
          {
            "id": "tip_amount",
            "name": "Tip Amounts",
            "description": "List of possible tip amount values",
            "default_value": 0,
            "options": [
              {
                "display": "No Tip",
                "value": 0
              }
            ]
          }
        ]
      }
    ]
  }
}
I would like to separate out the fullDocument in the sample above. When I use get_json_object, I get fullDocument in another streaming dataframe as a JSON string, not as an object. As you can see, there are some array types in fullDocument that I can explode (the documentation says explode is supported in streaming DFs, but I haven't tried it), but there are also some objects (struct types) from which I would like to extract the individual fields. I cannot use the SQL '*' notation because what get_json_object returns is a string, not the object itself.
It is convincing that JSON with such varied schemas is better handled with the schema specified explicitly. So my takeaway is that, in a streaming environment where the incoming stream has very different schemas, it is always better to specify the schema. I am therefore proceeding with get_json_object and from_json, reading the schema from a file, as sketched below.
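For reference, a minimal PySpark sketch of that approach, assuming the per-collection schema was saved earlier as the JSON produced by df.schema.json(); the schema file path is hypothetical, and spark and config are taken from the question's context:
import json
from pyspark.sql import functions as F
from pyspark.sql.types import StructType

# Load a schema saved earlier (e.g. once via df.schema.json()) for this collection.
# The path is hypothetical.
with open("/dbfs/schemas/configurations.json") as f:
    schema = StructType.fromJson(json.load(f))

raw = (spark.readStream                      # spark and config as in the question
       .format("eventhubs")
       .options(**config)
       .load()
       .select(F.col("body").cast("string").alias("json_str")))

# Pull fullDocument out as a string, then parse it with the known schema so the
# individual fields become ordinary columns.
parsed = (raw
          .select(F.get_json_object("json_str", "$.fullDocument").alias("doc_str"))
          .select(F.from_json("doc_str", schema).alias("doc"))
          .select("doc.*"))
Once from_json has a schema, doc.* exposes the individual fields as ordinary columns, which is what get_json_object alone cannot do, since it only returns strings.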

How to parse a JSON file without Spark SQL?

I have the following JSON file.
{
  "reviewerID": "ABC1234",
  "productID": "ABCDEF",
  "reviewText": "GOOD!",
  "rating": 5.0
},
{
  "reviewerID": "ABC5678",
  "productID": "GFMKDS",
  "reviewText": "Not bad!",
  "rating": 3.0
}
I want to parse it without Spark SQL, using a JSON parser.
The result I want is a text file:
ABC1234::ABCDEF::5.0
ABC5678::GFMKDS::3.0
How can I parse the JSON file using a JSON parser in Spark with Scala?
tl;dr Spark SQL supports JSON in the format of one JSON document per file or per line. If you'd like to parse multi-line JSON documents that appear together in a single file, you'd have to write your own Spark support, as it's not currently possible.
A possible solution is to ask the "writer" (the process that writes the files) to be nicer and save one JSON per file; that would make your life much sweeter.
If that does not give you much, you'd have to use the mapPartitions transformation with your parser and somehow do the parsing yourself.
val input: RDD[String] = // ... load your JSONs here
val jsons = input.mapPartitions(json => // ... use your JSON parser here)

How to load multiline JSON in spark with Java

I am looking for a way to load multiline JSON into Spark using Java. The Spark SQLContext has methods to load JSON, but it only supports "one record per line". I have a multiline JSON file that I need to process.
Example input:
The JSON contains words, definitions and example sentences:
{
  "one-armedbandit": [
    {
      "function": "noun",
      "definition": "slot machine",
      "examples": []
    }
  ],
  ...
}
The Spark ingestion methods do indeed expect a JSON-lines format. You could consider using a JSON processor to convert your data to this format before processing.
What I did was read the JSON into a List of POJOs with a JSON processor, and then call parallelize on the SparkContext to get a JavaRDD; a rough sketch of the same idea follows.
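The original answer describes Java; purely as an illustration, here is a hedged Python sketch of the same read-then-parallelize idea. The file path and the word-to-entries layout are assumptions based on the example above, not part of the original answer:
import json
from pyspark import SparkContext

sc = SparkContext(appName="multiline-json")

# Read the whole multi-line JSON file on the driver and parse it with an ordinary
# JSON parser (the Java answer uses a JSON processor producing a List of POJOs).
# The path below is hypothetical.
with open("/path/to/words.json") as f:
    data = json.load(f)

# Flatten the {word: [entries, ...]} mapping into one record per (word, entry) pair,
# then hand the list to Spark with parallelize to get an RDD.
records = [(word, entry) for word, entries in data.items() for entry in entries]
rdd = sc.parallelize(records)
print(rdd.take(1))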