Using my Scala HTTP client, I retrieved a JSON response from an API GET call.
My end goal is to write this JSON content to an AWS S3 bucket in order to make it available as a table on Redshift by running a simple AWS Glue crawler.
My thinking is to parse this JSON message and somehow convert it into a Spark DataFrame, so that later on I can save it to my preferred S3 location as .csv, .parquet, or whatever format works best.
The JSON response looks like this:
{
  "response": {
    "status": "OK",
    "start_element": 0,
    "num_elements": 100,
    "categories": [
      {
        "id": 1,
        "name": "Airlines",
        "is_sensitive": false,
        "last_modified": "2010-03-19 17:48:36",
        "requires_whitelist_on_external": false,
        "requires_whitelist_on_managed": false,
        "is_brand_eligible": true,
        "requires_whitelist": false,
        "whitelist": {
          "geos": [],
          "countries_and_brands": []
        }
      },
      {
        "id": 2,
        "name": "Apparel",
        "is_sensitive": false,
        "last_modified": "2010-03-19 17:48:36",
        "requires_whitelist_on_external": false,
        "requires_whitelist_on_managed": false,
        "is_brand_eligible": true,
        "requires_whitelist": false,
        "whitelist": {
          "geos": [],
          "countries_and_brands": []
        }
      }
    ],
    "count": 148,
    "dbg_info": {
      "warnings": [],
      "version": "1.18.1621",
      "output_term": "categories"
    }
  }
}
The content I would like to map to a DataFrame is the one contained in the "categories" JSON array.
I have managed to parse the message using the parse method from json4s.JsonMethods this way:
val parsedJson = parse(request) \\ "categories"
Obtaining the following:
output: org.json4s.JValue = JArray(List(JObject(List((id,JInt(1)), (name,JString(Airlines)), (is_sensitive,JBool(false)), (last_modified,JString(2010-03-19 17:48:36)), (requires_whitelist_on_external,JBool(false)), (requires_whitelist_on_managed,JBool(false)), (is_brand_eligible,JBool(true)), (requires_whitelist,JBool(false)), (whitelist,JObject(List((geos,JArray(List())), (countries_and_brands,JArray(List()))))))), JObject(List((id,JInt(2)), (name,JString(Apparel)), (is_sensitive,JBool(false)), (last_modified,JString(2010-03-19 17:48:36)), (requires_whitelist_on_external,JBool(false)), (requires_whitelist_on_managed,JBool(false)), (is_brand_eligible,JBool(true)), (requires_whitelist,JBool(false)), (whitelist,JObject(List((geos,JArray(List())), (countries_and_brands,JArray(List()))))))))
However, I am completely lost on how to proceed. I have even tried another Scala library called uJson:
val json = ujson.read(request)
val tuples = json("response")("categories").arr.map { item =>  // "categories" is an array
  (item("id"), item("name"), item("is_sensitive"), item("last_modified"))
}
This time I only parsed a few fields for testing, but this shouldn't change much. I obtained the following structure:
tuples: scala.collection.mutable.ArrayBuffer[(ujson.Value, ujson.Value, ujson.Value, ujson.Value)] = ArrayBuffer((1,"Airlines",false,"2010-03-19 17:48:36"), (2,"Apparel",false,"2010-03-19 17:48:36"))
However, this time too I do not know how to move forward, and everything I try results in errors, mostly related to format incompatibility.
Please feel free to propose any other approach to achieve my goal, even if it completely changes my workflow. I would rather learn to do this properly. Thanks.
You can use the following code to convert the JSON string (JSON_OUTPUT here) to a Spark DataFrame/Dataset:
val df00 = spark.read
  .option("multiline", "true")
  .json(Seq(JSON_OUTPUT).toDS())
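From there, a minimal sketch along these lines should get you to a flat categories table written to S3 (the bucket path is a placeholder, request is the raw JSON string from your HTTP client, and spark.implicits._ is needed for toDS):

import org.apache.spark.sql.functions.{col, explode}
import spark.implicits._

// Turn the raw JSON string returned by the HTTP client into a one-row DataFrame
val raw = spark.read.option("multiline", "true").json(Seq(request).toDS())

// Explode the "categories" array so each category becomes its own row,
// then promote the struct fields (id, name, is_sensitive, ...) to top-level columns
val categoriesDf = raw
  .select(explode(col("response.categories")).as("category"))
  .select("category.*")

// Write to your preferred S3 location; Parquet works well with a Glue crawler
categoriesDf.write
  .mode("overwrite")
  .parquet("s3://your-bucket/your-prefix/categories/")

A Glue crawler pointed at that prefix can then register the data as a catalog table that Redshift Spectrum (or a COPY into Redshift) can query.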
I'm using the Volley library to communicate with my API. I'm pretty new to Android and Kotlin, and I'm confused about extracting keys from the following JSON data:
{
  "message": {
    "_id": "60bc7fa7abeedb25643fa692",
    "hash": "3a54b415461a63abac1fc6dfa0e140584047bd15358e33a177f9505ed2faa4d4",
    "blockchain": "ethereum",
    "amount": 5000,
    "amount_usd": 13352971,
    "from": "d3d69228cb2292f933572399593617f574c70eb1",
    "to": "fe9996da73d6bf5252f15024811954ae37ab68be",
    "__v": 0
  }
}
Volley returns all of this JSON data in a variable called response, and I'm using response.getString("message") to extract the message key, but I don't understand how to extract the internal data such as hash, blockchain, amount, etc.
I'm using the following code to get the JSON data from my backend.
val jsonRequest = JsonObjectRequest(
    Request.Method.GET, url, null,
    { response ->
        tweet_text.setText(response.getString("message"))
        Log.d("resp", response.toString())
    },
    {
        Log.d("err", it.localizedMessage)
    })
Any help would be appreciated, Thanks!
I found it; I just used the getJSONObject() method to make it work:
val jsonRequest = JsonObjectRequest(
    Request.Method.GET, url, null,
    { response ->
        val txn = response.getJSONObject("message")
        // txn can now be used to extract the internal data, e.g.:
        // val hash = txn.getString("hash")
        // val amount = txn.getInt("amount")
    },
    {
        Log.d("err", it.localizedMessage)
    })
My goal is to loop over a JSON response, grab two values, and build out an API call that loads information into a POST request to create a policy I am building.
To start out, I am trying to get two values from the JSON response to assign as variables for building the POST call, which will be the second step. As each different "id" and "name" pair is picked up, I would like to build out a JSON payload and send the POST calls one at a time. The keys "id" and "name" occur multiple times in the response payload, and I am having issues capturing the two keys.
JSON response
data = {
    "data": [
        {
            "id": "02caf2be-3245-4d3d",
            "name": "ORA-FIN-ACTG",
            "description": "Oracle",
            "links": {
                "web": "https://com/",
                "api": "https://com/"
            }
        },
        {
            "id": "03af2f46-fad6-41a1",
            "name": "NBCMAINFRAME",
            "description": "Network",
            "links": {
                "web": "https://com/",
                "api": "https://com/"
            }
        },
        {
            "id": "0649628b-0e3b-48df",
            "name": "CAMS",
            "description": "Customer",
            "links": {
                "web": "https://com/",
                "api": "https://com/"
            }
        },
        {
            "id": "069d4bcf-3e50-4105",
            "name": "SHAREPOINTSITES",
            "description": "Sharepoint",
            "links": {
                "web": "https://com/",
                "api": "https://com/"
            }
        }
    ],
    "took": 0.013,
    "requestId": "1f364470"
}
I have tried various for loops to grab the data. Here is one of them:
data = json.loads(data)
data[0]['data'][0]['name']
for item in range(len(data)):
    print(data[item]['data'][0]['name'])
I have also tried reading it as a dictionary:
for data_dict in data:
    for key, value in data_dict.items():
        team.append(key)
        id.append(value)
print('name = ', team)
print('id = ', id)
I am also getting KeyErrors and a TypeError: the JSON object must be str, bytes or bytearray, not 'dict'.
Any help is appreciated.
FYI, this is the payload I want to populate with the "name" and "id" values:
data = {
    "type": "alert",
    "description": "policy",
    "enabled": "true",
    "filter": {
        "type": "match-any-condition",
        "conditions": [
            {
                "field": "extra-properties",
                "key": "alertOwner",
                "operation": "equals",
                "expectedValue": name
            }
        ]
    },
    "ignoreOriginalResponders": "true",
    "ignoreOriginalTags": "true",
    "continue": "true",
    "name": str(name) + " Policy",
    "message": "{{message}}",
    "responders": [{"type": "team", "id": id}],
    "alias": "{{alias}}",
    "tags": ["{{tags}}"],
    "alertDescription": "{{description}}"
}
The JSON response you have given is already a dictionary, so there is no need to use json.loads on it. The list of items is nested under the "data" key, so you can simply iterate through the array of items like this:
for item in data["data"]:
    print("{} : {}".format(item["id"], item["name"]))
This is the output:
02caf2be-3245-4d3d : ORA-FIN-ACTG
03af2f46-fad6-41a1 : NBCMAINFRAME
0649628b-0e3b-48df : CAMS
069d4bcf-3e50-4105 : SHAREPOINTSITES
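To then build and send one POST per team, a sketch along these lines should work; the endpoint URL and auth header are placeholders, and the requests library is assumed:

import requests

API_URL = "https://example.com/policies"       # placeholder endpoint
HEADERS = {"Authorization": "Bearer <token>"}  # placeholder auth header

for item in data["data"]:
    team_id = item["id"]
    team_name = item["name"]

    payload = {
        "type": "alert",
        "description": "policy",
        "enabled": "true",
        "filter": {
            "type": "match-any-condition",
            "conditions": [
                {
                    "field": "extra-properties",
                    "key": "alertOwner",
                    "operation": "equals",
                    "expectedValue": team_name
                }
            ]
        },
        "ignoreOriginalResponders": "true",
        "ignoreOriginalTags": "true",
        "continue": "true",
        "name": team_name + " Policy",
        "message": "{{message}}",
        "responders": [{"type": "team", "id": team_id}],
        "alias": "{{alias}}",
        "tags": ["{{tags}}"],
        "alertDescription": "{{description}}"
    }

    # One POST per (id, name) pair from the response
    response = requests.post(API_URL, json=payload, headers=HEADERS)
    print(team_name, response.status_code)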
I want to get some data from a json file. I can access everything with the code below.
import json
with open('C:\\Users\\me\\Documents\\stdin.json', 'r', encoding='utf8', errors='ignore') as json_file:
    data = json.load(json_file)
print("Type: ", type(data))
print("VM: ", data["Datacenter"])
The .json file looks like this:
{
  "Datacenter": [
    {
      "Folder": [
        {
          "Folder": [
            {
              "VirtualMachine": [
                {
                  "moid": "vm-239566",
                  "name": "DEV CentOS 6"
                },
                {
                  "moid": "vm-239464",
                  "name": "DEV Sles 12"
                }
              ],
              "moid": "group-v239127",
              "name": "DEV-VMs"
            }
          ],
          "moid": "group-v78",
          "name": "Test and Dev"
        },
        {
          "VirtualMachine": [
            {
              "moid": "vm-66130",
              "name": "Hyv16-clone"
            }
          ],
          "moid": "group-v77",
          "name": "Templates"
        }
      ],
      "moid": "datacenter-21",
      "name": "Datencenter"
    }
  ],
  "vSphereHost": "srv01",
  "vSphereProductLine": "vpx",
  "vSphereServer": "VMware vCenter Server",
  "vSphereVersion": "xxx",
  "version": "1.0",
  "viewType": "VMs and Templates"
}
Note that the original JSON file was much bigger; I deleted lines for readability. Also note that I run everything from the command line, as my IDE always gives me the error UnicodeEncodeError: 'charmap' codec can't encode characters in position 22910-22912: character maps to <undefined>
I tried to use data["VirtualMachine"] instead of data["Datacenter"], but then I get an error... TypeError: 'VirtualMachine' is an invalid keyword argument for this function.
So how can I get/print the moid and name of a VM? I am really new to coding and don't know how to deal with nested dictionaries.
Your question is not entirely clear, but from what you have described, this can help you derive values from nested JSON if you already have a DataFrame created. You can keep adding get() calls until you reach what you require. Below is a sample you can use:
import json
data = data.apply(lambda x: json.loads(json.loads(x).get("Folder","{}")).get("moid") if x else None)
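If the goal is simply to print the moid and name of every VM from the loaded dictionary (no DataFrame involved), a small recursive walk over the nested dicts and lists is enough; this is a minimal sketch assuming the layout shown above:

import json

def find_vms(node):
    """Recursively yield (moid, name) for every VirtualMachine entry."""
    if isinstance(node, dict):
        for vm in node.get("VirtualMachine", []):
            yield vm["moid"], vm["name"]
        for value in node.values():   # descend into nested "Folder" structures
            yield from find_vms(value)
    elif isinstance(node, list):
        for item in node:
            yield from find_vms(item)

with open('C:\\Users\\me\\Documents\\stdin.json', 'r', encoding='utf8', errors='ignore') as json_file:
    data = json.load(json_file)

for moid, name in find_vms(data):
    print(moid, name)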
I need to create an Avro file, but for that I need two things:
1) JSON
2) an Avro schema
Of these two requirements, I have the JSON:
{"web-app": {
"servlet": [
{
"servlet-name": "cofaxCDS",
"servlet-class": "org.cofax.cds.CDSServlet",
"init-param": {
"configGlossary:installationAt": "Philadelphia, PA",
"configGlossary:adminEmail": "ksm#pobox.com",
"configGlossary:poweredBy": "Cofax",
"configGlossary:poweredByIcon": "/images/cofax.gif",
"configGlossary:staticPath": "/content/static",
"templateProcessorClass": "org.cofax.WysiwygTemplate",
"templateLoaderClass": "org.cofax.FilesTemplateLoader",
"templatePath": "templates",
"templateOverridePath": "",
"defaultListTemplate": "listTemplate.htm",
"defaultFileTemplate": "articleTemplate.htm",
"useJSP": false,
"jspListTemplate": "listTemplate.jsp",
"jspFileTemplate": "articleTemplate.jsp",
"cachePackageTagsTrack": 200,
"cachePackageTagsStore": 200,
"cachePackageTagsRefresh": 60,
"cacheTemplatesTrack": 100,
"cacheTemplatesStore": 50,
"cacheTemplatesRefresh": 15,
"cachePagesTrack": 200,
"cachePagesStore": 100,
"cachePagesRefresh": 10,
"cachePagesDirtyRead": 10,
"searchEngineListTemplate": "forSearchEnginesList.htm",
"searchEngineFileTemplate": "forSearchEngines.htm",
"searchEngineRobotsDb": "WEB-INF/robots.db",
"useDataStore": true,
"dataStoreClass": "org.cofax.SqlDataStore",
"redirectionClass": "org.cofax.SqlRedirection",
"dataStoreName": "cofax",
"dataStoreDriver": "com.microsoft.jdbc.sqlserver.SQLServerDriver",
"dataStoreUrl": "jdbc:microsoft:sqlserver://LOCALHOST:1433;DatabaseName=goon",
"dataStoreUser": "sa",
"dataStorePassword": "dataStoreTestQuery",
"dataStoreTestQuery": "SET NOCOUNT ON;select test='test';",
"dataStoreLogFile": "/usr/local/tomcat/logs/datastore.log",
"dataStoreInitConns": 10,
"dataStoreMaxConns": 100,
"dataStoreConnUsageLimit": 100,
"dataStoreLogLevel": "debug",
"maxUrlLength": 500}},
{
"servlet-name": "cofaxEmail",
"servlet-class": "org.cofax.cds.EmailServlet",
"init-param": {
"mailHost": "mail1",
"mailHostOverride": "mail2"}},
{
"servlet-name": "cofaxAdmin",
"servlet-class": "org.cofax.cds.AdminServlet"},
{
"servlet-name": "fileServlet",
"servlet-class": "org.cofax.cds.FileServlet"},
{
"servlet-name": "cofaxTools",
"servlet-class": "org.cofax.cms.CofaxToolsServlet",
"init-param": {
"templatePath": "toolstemplates/",
"log": 1,
"logLocation": "/usr/local/tomcat/logs/CofaxTools.log",
"logMaxSize": "",
"dataLog": 1,
"dataLogLocation": "/usr/local/tomcat/logs/dataLog.log",
"dataLogMaxSize": "",
"removePageCache": "/content/admin/remove?cache=pages&id=",
"removeTemplateCache": "/content/admin/remove?cache=templates&id=",
"fileTransferFolder": "/usr/local/tomcat/webapps/content/fileTransferFolder",
"lookInContext": 1,
"adminGroupID": 4,
"betaServer": true}}],
"servlet-mapping": {
"cofaxCDS": "/",
"cofaxEmail": "/cofaxutil/aemail/*",
"cofaxAdmin": "/admin/*",
"fileServlet": "/static/*",
"cofaxTools": "/tools/*"},
"taglib": {
"taglib-uri": "cofax.tld",
"taglib-location": "/WEB-INF/tlds/cofax.tld"}}}
But how do I create an Avro schema based on it?
I am looking for a programmatic way to do that, since I will have many schemas and cannot create an Avro schema manually every time.
I checked avro-tools-1.8.1.jar, but it cannot create an Avro schema from JSON directly.
I am looking for a JAR or Python code that can create a JSON -> Avro schema.
It is OK if the data types are not perfect (strings, integers and floats are good enough for a start).
This one works well with a simple copy and paste, generating the Avro schema from your JSON:
https://toolslick.com/generation/metadata/avro-schema-from-json
You can use the Kite SDK util to infer an Avro schema from a JSON input:
https://github.com/kite-sdk/kite/blob/master/kite-data/kite-data-core/src/main/java/org/kitesdk/data/spi/JsonUtil.java#L539
Example:
String json = "{\n" +
        " \"id\": 1,\n" +
        " \"name\": \"A green door\",\n" +
        " \"price\": 12.50,\n" +
        " \"tags\": [\"home\", \"green\"]\n" +
        "}\n";

String avroSchema = JsonUtil.inferSchema(JsonUtil.parse(json), "myschema").toString();
System.out.println(avroSchema);
Result:
{
  "type": "record",
  "name": "myschema",
  "fields": [
    {
      "name": "id",
      "type": "int",
      "doc": "Type inferred from '1'"
    },
    {
      "name": "name",
      "type": "string",
      "doc": "Type inferred from '\"A green door\"'"
    },
    {
      "name": "price",
      "type": "double",
      "doc": "Type inferred from '12.5'"
    },
    {
      "name": "tags",
      "type": {
        "type": "array",
        "items": "string"
      },
      "doc": "Type inferred from '[\"home\",\"green\"]'"
    }
  ]
}
You can find the maven dependency here
Give this one a shot: http://www.dataedu.ca/avro
It basically infers an Avro schema that accepts the JSON.
You can even give it a JSON array. What it does is generate an Avro schema that is compatible with all the JSON documents in your array.
There are other tools with which you can verify the result.
If you want to avoid creating a dedicated Avro schema for every JSON format, you can use the rec-avro package.
It allows you to take any Python data structure, including parsed XML or JSON, and store it in Avro without the need for a dedicated schema.
I tested it with Python 3.
You can install it with pip3 install rec-avro, or see the code and docs at https://github.com/bmizhen/rec-avro
I gave JSON-to-Avro example code here: https://stackoverflow.com/a/55444481/6654219
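If the types only need to be roughly right, a small hand-rolled Python sketch like the one below can also infer a flat Avro record schema from a single JSON object; nested records would need more handling, and the schema name here is arbitrary:

import json

def infer_avro_schema(record, name="myschema"):
    """Infer a flat Avro record schema from one JSON object (simple types only)."""
    def avro_type(value):
        if isinstance(value, bool):      # bool must be checked before int
            return "boolean"
        if isinstance(value, int):
            return "long"
        if isinstance(value, float):
            return "double"
        if isinstance(value, list):      # assumes arrays of strings
            return {"type": "array", "items": "string"}
        return "string"

    fields = [{"name": k, "type": avro_type(v)} for k, v in record.items()]
    return {"type": "record", "name": name, "fields": fields}

sample = json.loads('{"id": 1, "name": "A green door", "price": 12.50, "tags": ["home", "green"]}')
print(json.dumps(infer_avro_schema(sample), indent=2))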
I have the following JSON data to post:
{
  "id": 1,
  "name": "Zypher",
  "price": 12.50,
  "tags": [
    {
      "tag": 1,
      "tagName": "X"
    },
    {
      "tag": 2,
      "tagName": "Y"
    },
    {
      "tag": 2,
      "tagName": "Z"
    }
  ]
}
My JMeter test plan is as follows:
- Test Plan
  - Thread Group
    - Http Request Defaults
    - Http Cookie Manager
    - Simple Controller
      - CSV Data Set Config (Sheet_1)
      - Http Header Manager
      - Http Request (the hard-coded JSON above was provided here as body data)
Everything works fine. Now I want to use a CSV to parametrise my JSON.
Sheet_1:
id,name,price
1,Zypher,12.50
I modified the JSON with these 3 parameters and it works for me. Now I want to parametrise the details portion, but I have no idea how to do this.
I want to keep my JSON like this:
{
  "id": ${id},
  "name": ${name},
  "price": ${price},
  "tags": [
    {
      "tag": ${tag},
      "tagName": ${tagName}
    }
  ]
}
How can I dynamically build the tags JSON array for the details portion from the CSV data? I want it to loop over the rows provided in the CSV file.
Updated CSV:
id,name,price,tag,tagname
1,Zypher,12.50,7|9|11,X|Y|Z
It would be great in this format:
id,name,price,tag
1,Zypher,12.50,7:X|9:Y|11:Z
where tag has two properties divided by ":".
You can do it using a JSR223 PreProcessor and the Groovy language, something like this:
Given you have the following CSV file structure:
id,name,price,tag
1,Zypher,12.50,X|Y|Z
And the following CSV Data Set Config settings (define the Variable Names so that they match what the Groovy code reads, i.e. id, name, price and tags):
Add a JSR223 PreProcessor as a child of the HTTP Request sampler and put the following code into the "Script" area:
import groovy.json.JsonBuilder

def json = new JsonBuilder()
// "tags" must match a variable name defined in the CSV Data Set Config
def tagsValues = vars.get("tags").split("\\|")

class Tag { int tag; String tagName }

List<Tag> tagsList = new ArrayList<>()
def counter = 1
tagsValues.each {
    tagsList.add(new Tag(tag: counter, tagName: it))
    counter++
}

json {
    id Integer.parseInt(vars.get("id"))
    name vars.get("name")
    price Double.parseDouble(vars.get("price"))
    tags tagsList.collect { tag ->
        ["tag"    : tag.tag,
         "tagName": tag.tagName]
    }
}

sampler.addNonEncodedArgument("", json.toPrettyString(), "")
sampler.setPostBodyRaw(true)
Remove any hard-coded data from the HTTP Request sampler "Body Data" tab (it should be absolutely blank)
Run your request; the JSON payload should be populated dynamically by the Groovy code.
References:
Parsing and producing JSON - Groovy
Groovy Is the New Black
Update:
for CSV format
id,name,price,tag
1,Zypher,12.50,7:X|9:Y|11:Z
Replace the Groovy code below:
List<Tag> tagsList = new ArrayList<>()
def counter = 1
tagsValues.each {
    tagsList.add(new Tag(tag: counter, tagName: it))
    counter++
}
with
List<Tag> tagsList = new ArrayList<>()
tagsValues.each {
    String[] tag = it.split("\\:")
    tagsList.add(new Tag(tag: Integer.parseInt(tag[0]), tagName: tag[1]))
}
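With that change, a CSV row such as 1,Zypher,12.50,7:X|9:Y|11:Z should produce a request body along these lines:

{
  "id": 1,
  "name": "Zypher",
  "price": 12.5,
  "tags": [
    {
      "tag": 7,
      "tagName": "X"
    },
    {
      "tag": 9,
      "tagName": "Y"
    },
    {
      "tag": 11,
      "tagName": "Z"
    }
  ]
}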