How to access data from nested dicts in JSON? - json

I want to get some data from a json file. I can access everything with the code below.
import json
with open('C:\\Users\\me\\Documents\\stdin.json', 'r', encoding='utf8', errors='ignore') as json_file:
    data = json.load(json_file)
    print("Type: ", type(data))
    print("VM: ", data["Datacenter"])
The .json file looks like this:
{
"Datacenter":[
{
"Folder":[
{
"Folder":[
{
"VirtualMachine":[
{
"moid":"vm-239566",
"name":"DEV CentOS 6"
},
{
"moid":"vm-239464",
"name":"DEV Sles 12"
}
],
"moid":"group-v239127",
"name":"DEV-VMs"
}
],
"moid":"group-v78",
"name":"Test and Dev"
},
{
"VirtualMachine":[
{
"moid":"vm-66130",
"name":"Hyv16-clone"
}
],
"moid":"group-v77",
"name":"Templates"
}
],
"moid":"datacenter-21",
"name":"Datencenter"
}
],
"vSphereHost":"srv01",
"vSphereProductLine":"vpx",
"vSphereServer":"VMware vCenter Server",
"vSphereVersion":"xxx",
"version":"1.0",
"viewType":"VMs and Templates"
}
Note that the original JSON file was much bigger; I deleted lines for readability. Also note that I run everything from the command line, as my IDE always gives me the error UnicodeEncodeError: 'charmap' codec can't encode characters in position 22910-22912: character maps to <undefined>
I tried to use data["VirtualMachine"] instead of data["Datacenter"], but then I get an error: TypeError: 'VirtualMachine' is an invalid keyword argument for this function.
So how can I get/print the moid and name of a VM? I am really new to coding and don't know how to deal with nested dictionaries.

Your question is not entirely clear, but from what you have mentioned, this may help you derive a value from nested JSON if you already have a DataFrame created. You can keep chaining get() until you reach the value you need. Below is a sample that you can use:
import json
data = data.apply(lambda x: json.loads(json.loads(x).get("Folder","{}")).get("moid") if x else None)
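Without a DataFrame, the nested structure from the question can also be walked directly. The following is a minimal sketch, assuming the file has the shape shown in the question (with the trailing commas removed so that it parses); iter_vms is a hypothetical helper name, not part of the json module. It explains why data["VirtualMachine"] fails: that key does not exist at the top level, only inside the nested "Folder" lists.

```python
import json

# Shortened copy of the structure from the question (trailing commas removed).
SAMPLE = """
{"Datacenter": [{
    "Folder": [
        {"Folder": [{"VirtualMachine": [
                        {"moid": "vm-239566", "name": "DEV CentOS 6"},
                        {"moid": "vm-239464", "name": "DEV Sles 12"}],
                     "moid": "group-v239127", "name": "DEV-VMs"}],
         "moid": "group-v78", "name": "Test and Dev"},
        {"VirtualMachine": [{"moid": "vm-66130", "name": "Hyv16-clone"}],
         "moid": "group-v77", "name": "Templates"}],
    "moid": "datacenter-21", "name": "Datencenter"}],
 "version": "1.0"}
"""


def iter_vms(node):
    """Yield every entry stored under a "VirtualMachine" key, at any depth."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "VirtualMachine":
                yield from value
            else:
                yield from iter_vms(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_vms(item)
    # Strings, numbers and other leaf values contain no VMs.


data = json.loads(SAMPLE)
for vm in iter_vms(data):
    print(vm["moid"], vm["name"])
```

With a real file, json.load(json_file) from the question would take the place of json.loads(SAMPLE).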

Related

How to parse a JSON array in Scala and write it to a DataFrame?

Using my Scala HTTP Client I retrieved a response in JSON format from an API GET call.
My end goal is to write this JSON content to an AWS S3 bucket in order to make it available as a table on RedShift running a simple AWS Glue crawler.
My thinking is to parse this JSON message and somehow convert it into a Spark DataFrame, so that later on I can save it to my preferred S3 location as .csv, .parquet, or whatever format.
The JSON file looks like this
{
"response": {
"status": "OK",
"start_element": 0,
"num_elements": 100,
"categories": [
{
"id": 1,
"name": "Airlines",
"is_sensitive": false,
"last_modified": "2010-03-19 17:48:36",
"requires_whitelist_on_external": false,
"requires_whitelist_on_managed": false,
"is_brand_eligible": true,
"requires_whitelist": false,
"whitelist": {
"geos": [],
"countries_and_brands": []
}
},
{
"id": 2,
"name": "Apparel",
"is_sensitive": false,
"last_modified": "2010-03-19 17:48:36",
"requires_whitelist_on_external": false,
"requires_whitelist_on_managed": false,
"is_brand_eligible": true,
"requires_whitelist": false,
"whitelist": {
"geos": [],
"countries_and_brands": []
}
}
],
"count": 148,
"dbg_info": {
"warnings": [],
"version": "1.18.1621",
"output_term": "categories"
}
}
}
The content I would like to map to a Dataframe is the one contained by the "categories" JSON Array.
I have managed to parse the message using json4s.JsonMethods method parse this way:
val parsedJson = parse(request) \\ "categories"
Obtaining the following:
output: org.json4s.JValue = JArray(List(JObject(List((id,JInt(1)), (name,JString(Airlines)), (is_sensitive,JBool(false)), (last_modified,JString(2010-03-19 17:48:36)), (requires_whitelist_on_external,JBool(false)), (requires_whitelist_on_managed,JBool(false)), (is_brand_eligible,JBool(true)), (requires_whitelist,JBool(false)), (whitelist,JObject(List((geos,JArray(List())), (countries_and_brands,JArray(List()))))))), JObject(List((id,JInt(2)), (name,JString(Apparel)), (is_sensitive,JBool(false)), (last_modified,JString(2010-03-19 17:48:36)), (requires_whitelist_on_external,JBool(false)), (requires_whitelist_on_managed,JBool(false)), (is_brand_eligible,JBool(true)), (requires_whitelist,JBool(false)), (whitelist,JObject(List((geos,JArray(List())), (countries_and_brands,JArray(List()))))))))
However, I am completely lost on how to proceed. I have even tried using another library for Scala called uJson:
val json = ujson.read(request)
val tuples = json("response")("categories").arr /* <-- categories is an array */ .map { item =>
(item("id"), item("name"))
}
This time I have only parsed two fields for testing, but this shouldn't change much. Hence, I obtained the following structure:
tuples: scala.collection.mutable.ArrayBuffer[(ujson.Value, ujson.Value, ujson.Value, ujson.Value)] = ArrayBuffer((1,"Airlines",false,"2010-03-19 17:48:36"), (2,"Apparel",false,"2010-03-19 17:48:36"))
However, this time too I do not know how to move forward, and everything I try results in errors, mostly related to format incompatibility.
Please feel free to propose any other approach to achieve my goal, even if it totally changes my workflow. I would rather learn something properly. Thanks
We can use the following code to convert the JSON to a Spark DataFrame/Dataset:
import spark.implicits._ // needed for toDS()
val df00 = spark.read.option("multiline", "true").json(Seq(JSON_OUTPUT).toDS())
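Independent of Spark, the extraction step itself (pull the "categories" array out of the response and keep a flat set of columns) can be sketched in plain Python. This is only an illustration under assumptions: the payload is a shortened copy of the one in the question, and categories_to_rows and rows_to_csv are hypothetical helper names, not part of any library.

```python
import csv
import io
import json

# Shortened copy of the API response from the question.
RESPONSE_TEXT = """
{"response": {"status": "OK", "categories": [
    {"id": 1, "name": "Airlines", "is_sensitive": false,
     "whitelist": {"geos": [], "countries_and_brands": []}},
    {"id": 2, "name": "Apparel", "is_sensitive": false,
     "whitelist": {"geos": [], "countries_and_brands": []}}],
  "count": 148}}
"""


def categories_to_rows(text, columns):
    """Return one flat dict per category, keeping only the requested columns."""
    categories = json.loads(text)["response"]["categories"]
    return [{col: cat.get(col) for col in columns} for cat in categories]


def rows_to_csv(rows, columns):
    """Serialize the rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


columns = ["id", "name", "is_sensitive"]
rows = categories_to_rows(RESPONSE_TEXT, columns)
print(rows_to_csv(rows, columns))
```

The nested "whitelist" object is left out on purpose here, since a flat format such as CSV has no natural place for it.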

Extract a certain JSON object in NiFi

I'm trying to extract a JSON object and store it in HDFS. I'm targeting the message attribute, whose value is a6,b6,c6,d6,e6.
JSON sample:
{
"@timestamp":"2020-07-06T07:35:29.047Z",
"@metadata":{
"beat":"filebeat",
"type":"_doc",
"version":"7.7.1"
},
"log":{
"offset":91,
"file":{
"path":"C:\\Program Files\\Filebeat\\test-kafka\\test_csv.csv"
}
},
"message":"a6,b6,c6,d6,e6",
"input":{
"type":"log"
},
"ecs":{
"version":"1.5.0"
},
"host":{
"name":"host"
},
"agent":{
"version":"7.7.1",
"type":"filebeat",
"ephemeral_id":"0b4a288f-f7ac-4db9-835e-60ca07a45fff",
"hostname":"host",
"id":"5e2fec03-bbdc-4f91-acc9-4ab36c7268db"
}
}
GenerateFlowFile properties
JsonEvaluatePath properties
But the problem is that EvaluateJsonPath is not working as I expected; I thought it would extract only the message attribute.
hadoop@ambari:~$ hdfs dfs -cat /user/test/5a422f02-9074-4384-a3c9-f3e3ce7c2e40
{
"@timestamp":"2020-07-06T07:35:29.047Z",
"@metadata":{
"beat":"filebeat",
"type":"_doc",
"version":"7.7.1"
},
"log":{
"offset":91,
"file":{
"path":"C:\\Program Files\\Filebeat\\test-kafka\\test_csv.csv"
}
},
"message":"a6,b6,c6,d6,e6",
"input":{
"type":"log"
},
"ecs":{
"version":"1.5.0"
},
"host":{
"name":"host"
},
"agent":{
"version":"7.7.1",
"type":"filebeat",
"ephemeral_id":"0b4a288f-f7ac-4db9-835e-60ca07a45fff",
"hostname":"host",
"id":"5e2fec03-bbdc-4f91-acc9-4ab36c7268db"
}
}
Am I missing something?
Since you used EvaluateJsonPath with the destination set to flow file attributes, it extracted message into a flow file attribute, and the content of the flow file is still the same as it was before. You would need to use another processor such as AttributesToJSON before PutHDFS to rewrite the flow file content with the attributes you want. An alternative might be to set the EvaluateJsonPath destination to flow file content, but I'm not sure if that produces valid JSON.
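The difference between a flow file's content and its attributes can be illustrated with a toy model in Python. This is not NiFi code: FlowFile, extract_to_attribute and attributes_to_content are hypothetical names that only mimic the behaviour described above, under the assumption that the attribute is called message.

```python
import json
from dataclasses import dataclass, field


@dataclass
class FlowFile:
    """Toy stand-in for a NiFi flow file: a content payload plus attributes."""
    content: str
    attributes: dict = field(default_factory=dict)


def extract_to_attribute(flow_file, attribute, key):
    """Mimic an extraction whose destination is an attribute: content is untouched."""
    flow_file.attributes[attribute] = json.loads(flow_file.content)[key]
    return flow_file


def attributes_to_content(flow_file, names):
    """Mimic rewriting the content from selected attributes."""
    selected = {name: flow_file.attributes[name] for name in names}
    flow_file.content = json.dumps(selected)
    return flow_file


original = json.dumps({"message": "a6,b6,c6,d6,e6", "input": {"type": "log"}})
ff = extract_to_attribute(FlowFile(original), "message", "message")
unchanged = ff.content == original  # the full JSON is still the content at this point

ff = attributes_to_content(ff, ["message"])
print(ff.content)  # only now does the content hold just the message
```

In this model, writing ff.content right after the first step reproduces what the question observed in HDFS: the complete original document.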

Data Factory - Retrieve value from field with dash "-" from JSON file

In my pipeline I query a 3rd party database through its REST API using a GET request. As output I receive a bunch of JSON files. The number of JSON files I have to download (the same as the number of iterations I will have to use) is in one of the fields in the JSON file. The problem is that the field's name is 'page-count', which contains "-".
@activity('Lookup1').output.firstRow.meta.page.page-count
Data Factory considers dash in field's name as a minus sign, so I get an error instead of value from that field.
{"code":"BadRequest","message":"ErrorCode=InvalidTemplate, ErrorMessage=Unable to parse expression 'activity('Lookup1').output.firstRow.meta.page.['page-count']'","target":"pipeline/Product_pull/runid/f615-4aa0-8fcb-5c0a144","details":null,"error":null}
This is what the structure of the JSON file looks like:
"firstRow": {
"meta": {
"page": {
"number": 1,
"size": 1,
"page-count": 7300,
"record-count": 7300
},
"non-compliant-record-count": 7267
}
},
"effectiveIntegrationRuntime": "intergrationRuntimeTest1",
"billingReference": {
"activityType": "PipelineActivity",
"billableDuration": [
{
"meterType": "SelfhostedIR",
"duration": 0.016666666666666666,
"unit": "Hours"
}
]
},
"durationInQueue": {
"integrationRuntimeQueue": 1
}
}
How to solve this problem?
The syntax below works when retrieving the value of a JSON element whose name contains a hyphen, which the parser otherwise treats as a minus sign. It does not seem to be documented by Microsoft; however, I managed to get this to work through trial and error on a project of mine.
@activity('Lookup1').output.firstRow.meta.page['page-count']
This worked for us too. We had the same issue where we couldn't reference an output field that contained a dash(-). We referenced this post and used the square brackets and single quote and it worked!
Example below.
@activity('get_token').output.ADFWebActivityResponseHeaders['Set-Cookie']
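The same distinction exists in most languages: a key containing a hyphen cannot be reached with dot notation and needs bracket-and-quote access. A small Python sketch, using a shortened copy of the Lookup output from the question (the variable names are made up for illustration):

```python
import json

# Shortened copy of the Lookup activity output from the question.
lookup_output = json.loads("""
{"firstRow": {"meta": {"page": {"number": 1, "size": 1,
                                "page-count": 7300, "record-count": 7300},
                       "non-compliant-record-count": 7267}}}
""")

page = lookup_output["firstRow"]["meta"]["page"]

# Bracket access treats "page-count" as one key; written with dots,
# page.page - count would be read as a subtraction.
page_count = page["page-count"]

# The value can then drive the number of iterations, one per page.
page_numbers = range(1, page_count + 1)
print(page_count, len(page_numbers))
```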

How to parse a dynamic JSON response, get a specific value, and pass it as an input to the next request

I get a .json file as a response from an API. From that file I need to parse and find a specific parameter and pass it as an input to the next request. How do I do that using Katalon?
If I say
response = JSON.parse("response.json");
it says it is unable to identify JSON as valid. Can someone help me out with the solution?
Your JSON is invalid, maybe it is a copy-paste issue.
The valid JSON should be
{
"responseStatusCode": "OK",
"data": {
"screenName": "employeeTimeslip",
"screenType": "Redirect",
"searchResultCount": 0,
"rows": [],
"tabs": [],
"searchParams": {
"employeeID": "000092926",
"timeslipNumber": "201900019701"
}
}
}
So, you were missing a "," between "OK" and "data" and two closing curly braces at the end of the file.
You can check JSON files for validity yourself using online JSON validators, for example, this one.
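A validity check can also be done locally. The sketch below uses Python's standard json module; check_json is a hypothetical helper name, and the broken sample only imitates the missing comma described above, since the original invalid file is not shown.

```python
import json


def check_json(text):
    """Return (parsed, None) for valid JSON, or (None, description) otherwise."""
    try:
        return json.loads(text), None
    except json.JSONDecodeError as err:
        # lineno and colno point at the place where parsing gave up.
        return None, f"line {err.lineno}, column {err.colno}: {err.msg}"


valid = '{"responseStatusCode": "OK", "data": {"rows": []}}'
broken = '{"responseStatusCode": "OK" "data": {"rows": []}}'  # comma missing

parsed_valid, error_valid = check_json(valid)
parsed_broken, error_broken = check_json(broken)
print(error_valid)
print(error_broken)
```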
I found a way to read a specific parameter from the JSON response file, as shown below:
val scn = scenario("ClaimSubmission")
.exec(http("request_2")
.post("URL")
.headers(headers_2)
.body(RawFileBody("json file path"))
.check(jsonPath("$..timeslipnumber").find.saveAs("timeslipnumber")))
The timeslip number is retrieved by this check: .check(jsonPath("$..timeslipnumber").find.saveAs("timeslipnumber"))
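What a recursive-descent expression such as $..key does can be sketched in plain Python. This is an illustration only, not the implementation used by any JSONPath library; find_first is a hypothetical helper, and the payload is the corrected JSON shown earlier. Note that key matching is case-sensitive, so the spelling must match the response exactly.

```python
import json


def find_first(node, key):
    """Return the first value stored under key at any depth, or None."""
    if isinstance(node, dict):
        if key in node:
            return node[key]
        children = list(node.values())
    elif isinstance(node, list):
        children = node
    else:
        return None  # leaf value, nothing to descend into
    for child in children:
        found = find_first(child, key)
        if found is not None:
            return found
    return None


response = json.loads("""
{"responseStatusCode": "OK",
 "data": {"screenName": "employeeTimeslip", "rows": [], "tabs": [],
          "searchParams": {"employeeID": "000092926",
                           "timeslipNumber": "201900019701"}}}
""")

timeslip_number = find_first(response, "timeslipNumber")
print(timeslip_number)
```

The saved value can then be passed into the body or URL of the next request.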

Json.net JObject.Parse erroring on complex json

The following code:
let resp = string (argv.GetValue 0)
let json = JObject.Parse resp
gives this error:
An unhandled exception of type 'Newtonsoft.Json.JsonReaderException'
occurred in Newtonsoft.Json.dll
Input string '2.2.6' is not a valid number.
Path 'items[0].versionName', line 1, position 39.
where argv is this input:
{
"totalCount":1,
"items":[
{
"versionName":"2.2.6",
"phase":"PLANNING",
"distribution":"EXTERNAL",
"source":"CUSTOM",
"_meta":{
"allow":[
"GET",
"PUT",
"DELETE"
],
"href":"url",
"links":[
{
"rel":"versionReport",
"href":"url"
},
{
"rel":"project",
"href":"url"
},
{
"rel":"policy-status",
"href":"url"
}
]
}
}
]
}
How can I fix this? Is there a simple way to implement a json reader that does not error here?
I also get this error:
An unhandled exception of type 'Newtonsoft.Json.JsonReaderException'
occurred in Newtonsoft.Json.dll
Error parsing undefined value. Path 'items[0].name', line 1, position 28.
With this input:
{
"totalCount":1,
"items":[
{
"name":"uaa",
"projectLevelAdjustments":true,
"source":"CUSTOM",
"_meta":{
"allow":[
"GET",
"PUT",
"DELETE"
],
"href":"url",
"links":[
{
"rel":"versions",
"href":"url"
},
{
"rel":"canonicalVersion",
"href":"url"
}
]
}
}
]
}
I am trying to read in JSON of many different schemas that I did not create and do not know. The first error seems to be because it is trying to generate a float from something that should be read as a string. The second sounds like the schema is too complex and a type would be needed to properly parse it using Json.Deserialize, but I'm not sure how to do that, and it would take too much time because there are too many schemas to write types for them all. Is there any way around both of these things?
In C# you can use dynamic with something like this:
dynamic json = JsonConvert.DeserializeObject(resp);
And then you could access properties with something like: json.totalCount.
In F# land, this question gives some suggestions for how you might deal with dynamic objects. If you use the package FSharp.Interop.Dynamic you can get the value of totalCount in your example with something like this:
let value:obj = json?totalCount?Value
which gives 1L on my computer.
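As a side note on the two errors themselves: a quoted "2.2.6" is an ordinary JSON string, so a schema-less parser should not try to read it as a number. One possible explanation, which the question does not confirm, is that the double quotes were stripped before the argument reached the program (for example by the shell). The Python sketch below only shows that the reported positions, 39 and 28, are consistent with that assumption; it does not reproduce the Json.NET behaviour.

```python
import json

# Minified starts of the two inputs from the question.
first = '{"totalCount":1,"items":[{"versionName":"2.2.6"}]}'
second = '{"totalCount":1,"items":[{"name":"uaa"}]}'

# With quotes intact, the version is parsed as a plain string.
version = json.loads(first)["items"][0]["versionName"]

# Assumption: the quotes were lost on the way into argv.
first_unquoted = first.replace('"', "")
second_unquoted = second.replace('"', "")

# 1-based position of the last character of 2.2.6 in the unquoted text.
pos_version_end = first_unquoted.index("2.2.6") + len("2.2.6")
# 1-based position of the "u" that starts uaa in the unquoted text.
pos_name_start = second_unquoted.index("uaa") + 1

print(type(version).__name__, pos_version_end, pos_name_start)

# The unquoted text is no longer valid JSON at all.
try:
    json.loads(first_unquoted)
    unquoted_is_valid = True
except json.JSONDecodeError:
    unquoted_is_valid = False
```

If that is what happened, printing resp before parsing it would show whether the quotes are still present.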