Transform complex JSON files using ADF

I want to transform multiple complex JSON files into one complex JSON file using Azure Data Factory dataflow.
The multiple complex input JSON files are in the following format:
{
"creationDate": "2022-01-19T17:00:17Z",
"count": 2,
"data": [
{
"id": "",
"name": "Project A",
"url": "",
"state": "Open",
"revision": 1,
"visibility": "private",
"lastUpdateTime": "2019-09-23T08:44:45.103Z"
},
{
"id": "",
"name": "Project B",
"url": "",
"state": "Done",
"revision": 1,
"visibility": "private",
"lastUpdateTime": "2019-12-31T09:38:49.16Z"
}
]
}
We want to transform those files into one single JSON file in the following format:
[
{
"date": "2022-01-14",
"count": 2,
"projects": [
{
"name": "Project A",
"state": "",
"lastUpdateTime": ""
},
{
"name": "Project B",
"state": "",
"lastUpdateTime": ""
}
]
},
{
"date": "2022-01-17",
"count": 3,
"projects": [
{
"name": "Project A",
"state": "",
"lastUpdateTime": ""
},
{
"name": "Project B",
"state": "",
"lastUpdateTime": ""
},
{
"name": "Project C",
"state": "",
"lastUpdateTime": ""
}
]
}
]
We were using a Derived Column with the expression #(name=data.name, state=data.state).
Can someone help us with how to do this? We have tried a lot of things, like a derived column and flattening first, but we can't get the output we want...
Thanks!

The solution in the end was pretty close to what we already had.
So our final solution is as follows:
1. A Flatten transformation with Unroll by set to data. We also mapped creationDate to date.
2. A Derived Column transformation that creates a column called projects with the expression #(name=name,state=state,lastUpdateTime=lastUpdateTime,url=url).
3. An Aggregate transformation with Group by on date. Under Aggregates, set count to first(count) and (this was the solution) set projects to collect(projects).
4. A Select transformation that keeps the columns date, count and projects.
5. A Sort transformation sorting on date ascending.
6. A Sink with the file name option set to Output to single file and the partition option set to Single partition (a sketch of the full data flow script follows the note below).
Note:
Because we sink to one big JSON file (Output to single file), the sort order was not written correctly to the JSON, even though everything looked correct when debugging the data flow (data preview). Strange behavior. Once we also set the Sort transformation's partition option to Single partition, the JSON file had the right sort order.
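For reference, here is a rough sketch of the data flow script these steps produce. The transformation names are illustrative and the exact script ADF generates may differ:

source1 foldDown(unroll(data),
    mapColumn(
        date = creationDate,
        count,
        name = data.name,
        state = data.state,
        lastUpdateTime = data.lastUpdateTime,
        url = data.url
    )) ~> Flatten1
Flatten1 derive(projects = #(name=name, state=state, lastUpdateTime=lastUpdateTime, url=url)) ~> DerivedColumn1
DerivedColumn1 aggregate(groupBy(date),
    count = first(count),
    projects = collect(projects)) ~> Aggregate1
Aggregate1 select(mapColumn(date, count, projects)) ~> Select1
Select1 sort(asc(date, true)) ~> Sort1

The sink and its single-file/single-partition options are configured in the UI as described in the steps above.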

Related

JMeter: JSON response manipulation and passing to the next HTTP request

I've got the response from an HTTP GET request as a JSON file and I want to use that JSON and pass it to the next HTTP request. I got the following response data:
{
"apiInfo": {
"id": "23143",
"name": "bookkeeping",
"state": "used",
"data": "15893712000000"
},
"apiDetails": [
{
"bookName": "abc",
"state": "old",
"noOfTimesUsed": "53"
"additionalParam"{
"name": "abc",
"id": "123"
}
},
{
"bookName": "def",
"state": "new",
"noOfTimesUsed": "5",
"action": "keep"
"additionalParam"{
"name": "def",
"id": "456"
}
},
{
"bookName": "xyz",
"state": "avg",
"noOfTimesUsed": "23"
"additionalParam"{
"name": "ghi",
"id": "789"
}
},
{
"bookName": "pqr",
"state": "old",
"noOfTimesUsed": "75",
"action": "discard"
"additionalParam"{
"name": "jkl",
"id": "012"
}
}
]
}
I want to use "apiInfo" & "apiDetails" part from the JSON response and manipulate its data. As you can notice, some array field have attribute "action" in it and some one doesn't. I want to make sure all the field in the array have this data and is assigned as ' "action":"keep" '. Also, I want to add "id" from apiInfo & "name" from additionalParams from apiDetails itself. The end result I want is somewhat like this
"apiDetails": [
{
"id": "23143",
"bookName": "abc",
"state": "old",
"noOfTimesUsed": "53",
"action": "keep",
"name":"abc"
},
{
"id": "23143",
"bookName": "def",
"state": "new",
"noOfTimesUsed": "5",
"action": "keep",
"name":"def"
},
{
"id": "23143",
"bookName": "xyz",
"state": "avg",
"noOfTimesUsed": "23",
"action": "keep",
"name":"ghi"
},
{
"id": "23143",
"bookName": "pqr",
"state": "old",
"noOfTimesUsed": "75",
"action": "keep",
"name":"jkl"
}
]
I've been trying to use a JSR223 Sampler and have been struggling with it. It's a bit complicated and I need help. P.S.: I've tried using JavaScript code to manipulate the results as desired but have been unsuccessful.
Please help.
Thanks, Sid
Add a JSR223 PostProcessor as a child of the request which returns the above JSON.
Put the following code into the "Script" area:
// Parse the previous sampler's response and take the apiDetails array
def apiDetails = new groovy.json.JsonSlurper().parse(prev.getResponseData()).apiDetails
// Force "action": "keep" onto every entry (overwriting any existing value)
apiDetails.each { apiDetail ->
    apiDetail.put('action', 'keep')
}
// Expose the rebuilt JSON as a JMeter variable
vars.put('request', new groovy.json.JsonBuilder(apiDetails: apiDetails).toPrettyString())
That's it; you should be able to refer to the generated request as ${request} where required.
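The snippet above only normalizes the action field. To also copy the id from apiInfo and lift the name out of each entry's additionalParam, as the desired output shows, a sketch along these lines (assuming the response parses as valid JSON) might work:

def json = new groovy.json.JsonSlurper().parse(prev.getResponseData())
def details = json.apiDetails.collect { d ->
    d.put('action', 'keep')                // normalize/overwrite action
    d.put('id', json.apiInfo.id)           // copy id from apiInfo
    d.put('name', d.additionalParam?.name) // lift name out of additionalParam
    d.remove('additionalParam')            // drop the nested object
    d                                      // return the modified entry
}
vars.put('request', new groovy.json.JsonBuilder(apiDetails: details).toPrettyString())

Key order inside each rebuilt entry may differ from the sample shown, but the content matches.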
More information:
Apache Groovy - Parsing and producing JSON
Apache Groovy - Why and How You Should Use It

Deep search in a big JSON object with Ruby

I have a problem with filtering a JSON object in Ruby!
1. My JSON object is a big array of two hashes.
2. Those hashes include other hashes that include other arrays and hashes (oh god! :c).
My goal is to output the big hash that contains a concrete value!
Examples down below:
The JSON file looks like this (the first element is hash 0, the second is hash 1):
[
{
"id": 0,
"firstName": "",
"lastName": "",
"middleName": null,
"email": "",
"phones": [
null,
null
],
"groups": [{
"id": 0,
"name": ""
}],
"disabled": "",
"technologies": [{
"id": 0,
"name": "",
"children": [{
"id": 1,
"name": "",
"children": [{
"id": 2,
"name": "Farmer",
"children": []
}]
}]
}],
"fullName": ""
},
{
"id": 0,
"firstName": "",
"lastName": "",
"middleName": null,
"email": "",
"phones": [
null,
null
],
"groups": [{
"id": 0,
"name": ""
}],
"disabled": "",
"technologies": [{
"id": 0,
"name": "",
"children": [{
"id": 1,
"name": "",
"children": [{
"id": 2,
"name": "Not Farmer",
"children": []
}]
}]
}],
"fullName": ""
}
]
Pseudocode in Ruby (what I want to do):
file = File.read("example.json") #=> read the JSON file
data_hash = JSON.parse(file, object_class: Hash) #=> parse the JSON file
data = data_hash.filter #=> keep only the hashes whose "technologies" is not empty
data.get_hash_by_value(value) #=> e.g. with "Not Farmer" as value, this method must search all of data for that value and output hash 1 (because hash 0 does not include "Not Farmer")
That's the big problem, I don't know what to do!!!
My thought is a recursive search method...
I wrote my own functions. Maybe they can help someone.
require 'json'

data_hash = JSON.parse(File.read("example.json"))
data = []

# True if this item, or any item in its nested "children", is named "Farmer"
def check_children(item)
  return true if item["name"] == "Farmer"
  item["children"].any? { |child_item| check_children(child_item) }
end

data_hash.each do |item|
  next if item["technologies"].empty?
  item["technologies"].each do |technologies_item|
    next if technologies_item["children"].empty?
    technologies_item["children"].each do |children_item|
      data << item if check_children(children_item)
    end
  end
end
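For the generic get_hash_by_value from the pseudocode, a small recursive sketch could look like this (contains_value? is a hypothetical helper name, not part of the original code):

require 'json'

# Recursively search any nested hash/array structure for the given value.
def contains_value?(node, value)
  case node
  when Hash  then node.values.any? { |v| contains_value?(v, value) }
  when Array then node.any? { |v| contains_value?(v, value) }
  else node == value
  end
end

data_hash = JSON.parse(File.read("example.json"))
# Returns the first top-level hash that contains the value anywhere inside it
match = data_hash.find { |hash| contains_value?(hash, "Not Farmer") }

Calling it with "Not Farmer" returns hash 1 and skips hash 0, matching the pseudocode's intent.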

Compare 2 Cucumber JSON reports with Ruby

The problem is: I have 2 Cucumber test reports in JSON format.
I need to remove redundant key-value pairs from those reports and compare them, but I can't figure out how to remove the unnecessary data from the two JSONs because of their structure after JSON.parse (an array or hash with many nested arrays/hashes). Please advise if there are gems or known solutions for this.
The JSON structure is, e.g.:
[
{
"uri": "features/home_screen.feature",
"id": "as-a-user-i-want-to-explore-home-screen",
"keyword": "Feature",
"name": "As a user I want to explore home screen",
"description": "",
"line": 2,
"tags": [
{
"name": "#home_screen",
"line": 1
}
],
"elements": [
{
"keyword": "Background",
"name": "",
"description": "",
"line": 3,
"type": "background",
"before": [
{
"match": {
"location": "features/step_definitions/support/hooks.rb:1"
},
"result": {
"status": "passed",
"duration": 505329000
}
}
],
"steps": [
{
"keyword": "Given ",
"name": "I click OK button in popup",
"line": 4,
"match": {
"location": "features/step_definitions/registration_steps.rb:91"
},
"result": {
"status": "passed",
"duration": 2329140000
}
},
{
"keyword": "And ",
"name": "I click Allow button in popup",
"line": 5,
"match": {
"location": "features/step_definitions/registration_steps.rb:96"
},
"result": {
"status": "passed",
"duration": 1861776000
}
}
]
},
Since you are asking for a gem, you might try iteraptor, which I created exactly for this kind of task.
It allows iterating, mapping and reducing deeply nested structures. For instance, to filter out all the keys called "name" on all levels, you might do:
input.iteraptor.reject(/name/)
A more detailed description can be found on the GitHub page linked above.
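If you'd rather avoid an extra dependency, a gem-free sketch of the same idea (strip_keys is a hypothetical helper, and the volatile keys to drop, such as duration and line, are assumptions):

require 'json'

# Recursively drop the given keys from any nested hash/array structure.
def strip_keys(node, keys)
  case node
  when Hash
    node.reject { |k, _| keys.include?(k) }
        .transform_values { |v| strip_keys(v, keys) }
  when Array
    node.map { |v| strip_keys(v, keys) }
  else
    node
  end
end

keys = %w[duration line] # assumed volatile keys; adjust as needed
a = strip_keys(JSON.parse(File.read("report1.json")), keys)
b = strip_keys(JSON.parse(File.read("report2.json")), keys)
puts a == b ? "reports match" : "reports differ"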

How to format JSON output in PySpark?

I am having trouble preserving the order of my JSON and pretty-printing it in PySpark.
Below is sample code:
json_out = sqlContext.jsonRDD(sc.parallelize([json.dumps(info)]))
# here info is my ordered dictionary
json_out.toJSON().saveAsTextFile("file:///home//XXX//samplejson")
One more thing: I want my output as a single file, not as partitioned datasets.
Could anyone help with pretty-printing and preserving the order of the output JSON in my case?
info sample:
Note: TypeA, TypeB, etc. are lists, meaning there can be more than one product in TypeA or TypeB.
{
"score": {
"right": ,
"wrong":
},
"articles": {
"TypeA": [{
"ID": 333,
"Name": "",
"S1": "",
"S2": "",
"S3": "",
"S4": ""
}],
"TypeB": [{
"ID": 123,
"Name": "",
"T1": "",
"T2": "",
"T3": "",
"T4": "",
"T5": "",
"T6": ""
}]
}
}
(I have tried using json.dumps(info, indent=2), but to no avail.)
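One workaround, as a sketch (assuming info is a collections.OrderedDict built on the driver): skip the jsonRDD/DataFrame round-trip entirely, since inferring a schema re-orders the fields, and write the pretty-printed string directly, coalesced to one partition so only a single part file is produced:

import json

# Pretty-print on the driver; json.dumps preserves OrderedDict key order
pretty = json.dumps(info, indent=2)

# coalesce(1) forces a single output part file instead of a partitioned dataset
sc.parallelize([pretty]).coalesce(1).saveAsTextFile("file:///home//XXX//samplejson")

Note that saveAsTextFile still creates a directory containing one part-00000 file; if you need a bare file, write it with plain Python open()/write() on the driver instead.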

MySQL: parse JSON data and write it to a DB table

I want to parse some not-so-simple JSON data, read it into a model, and then write that modelled data into a DB. Or rather, can I write it into the DB without involving the model at all?
My JSON looks like this:
{
"parent": [{
"id": "1",
"name": "name a",
"children": [{
"id": "34",
"name": "cvfd",
"children": []
}, {
"id": "5643",
"name": "name tyu",
"children": [{
"id": "5433",
"name": "blah",
"children": []
}]
}]
}]
}
And my table would have just two simple columns, "id" and "name".
A plain read/write might be straightforward, but this involves parsing as well.
Any suggestion/advice is appreciated, even an alternative to MySQL itself if there is a better approach.
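One possible approach, as a sketch rather than a full solution (the items table, its column types, the connection details, and the use of the mysql-connector-python package are all assumptions): flatten the tree recursively in Python and bulk-insert the (id, name) pairs, skipping the model layer entirely.

import json
import mysql.connector  # assumption: mysql-connector-python is installed

# Recursively collect (id, name) pairs from the nested "children" tree.
def walk(nodes, rows):
    for node in nodes:
        rows.append((node["id"], node["name"]))
        walk(node.get("children", []), rows)

with open("input.json") as f:
    doc = json.load(f)

rows = []
walk(doc["parent"], rows)

# assumption: a table `items (id VARCHAR(16), name VARCHAR(255))` already exists
conn = mysql.connector.connect(user="user", password="secret", database="mydb")
cur = conn.cursor()
cur.executemany("INSERT INTO items (id, name) VALUES (%s, %s)", rows)
conn.commit()
conn.close()

If you later need the hierarchy as well, add a parent_id column and pass the parent's id down through walk.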