How can I retrieve the values of JSON with R?

I have tried everything to get a specific key:value pair, but I can't seem to find a solution. Any help?
{
  "contracts":[
    {
      "contractid":"BemDRHtv17",
      "cid":"",
      "category":"WORK",
      "mainCategory":"Grundeinkommen",
      "configured":false,
      "customMainCategory":null,
      "customSubCategory":null,
      "customContractPartner":null,
      "amount":209200,
      "interval":"MONTHLY",
      "runTime":null,
      "periodOfNotice":null,
      "cancelationAlert":null,
      "extensionPeriod":null,
      "contractPartner":{
        "creditorId":null,
        "name":null,
        "__typename":"ContractPartner"
      },
      "__typename":"Contract"
    },
P.S. I'm trying to access the key and value of mainCategory / search for its specific values.

You can access a certain key using libraries like rjson or jsonlite, reading the JSON data into R with the fromJSON() function. Let's say you load the data like this:
library(jsonlite)
jsonData = fromJSON("PATH") #Link to the file or html link
Now you can extract what you want and save the result in a matrix (or whatever data class you want to use):
variable <- as.matrix(jsonData$contracts$mainCategory)

Rather than converting your whole JSON to an R object, you can use library(jqr) to access specific elements of the raw JSON:
library(jqr)
jq(js, ".contracts[].mainCategory")
# "Grundeinkommen"
Data
js <- '{
  "contracts":[
    {
      "contractid":"BemDRHtv17",
      "cid":"",
      "category":"WORK",
      "mainCategory":"Grundeinkommen",
      "configured":false,
      "customMainCategory":null,
      "customSubCategory":null,
      "customContractPartner":null,
      "amount":209200,
      "interval":"MONTHLY",
      "runTime":null,
      "periodOfNotice":null,
      "cancelationAlert":null,
      "extensionPeriod":null,
      "contractPartner":{
        "creditorId":null,
        "name":null,
        "__typename":"ContractPartner"
      },
      "__typename":"Contract"
    }
  ]
}'

Convert a nested JSON string column into a map type column in Spark

overall aim
I have data landing in Blob Storage from an Azure service in the form of JSON files, where each line in a file is a nested JSON object. I want to process this with Spark and finally store it as a Delta table with nested struct/map type columns, which can later be queried downstream using the dot notation columnName.key.
data nesting visualized
{
  key1: value1
  nestedType1: {
    key1: value1
    keyN: valueN
  }
  nestedType2: {
    key1: value1
    nestedKey: {
      key1: value1
      keyN: valueN
    }
  }
  keyN: valueN
}
current approach and problem
I am not using the default Spark JSON reader, as it results in incorrect parsing of some files. Instead, I load the files as text files and parse them with UDFs using Python's json module (e.g. below), after which I use explode and pivot to get the first level of keys into columns.
from pyspark.sql.functions import udf
import json

@udf('MAP<STRING,STRING>')
def get_key_val(x):
    try:
        return json.loads(x)
    except (TypeError, ValueError):
        return None
After this initial transformation, I now need to convert the nestedType columns to valid map types as well. Since the initial function returns map<string,string>, the values in the nestedType columns are no longer valid JSON, so I cannot use json.loads; instead I have regex-based string operations:
import re

@udf('MAP<STRING,STRING>')
def convert_map(s):
    try:
        regex = re.compile(r"""\w+=.*?(?:(?=,(?!"))|(?=}))""")
        return dict((a.split('=')[0].strip(), a.split('=')[1]) for a in regex.findall(s))
    except Exception:
        return None
This works for the second level of nesting, but going deeper would require yet another UDF and further complications.
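For what it's worth, the regex step can be avoided entirely if the first UDF re-serializes nested values instead of letting them collapse into key=value strings, so every level stays valid JSON. A minimal plain-Python sketch of that idea (the @udf wrapper is omitted so the snippet is self-contained):

```python
import json

def get_key_val(x):
    # Hypothetical variant of the udf above: nested dicts/lists are
    # re-serialized with json.dumps, so each nested value remains a valid
    # JSON string that json.loads can parse again at the next level.
    try:
        obj = json.loads(x)
        return {k: json.dumps(v) if isinstance(v, (dict, list)) else v
                for k, v in obj.items()}
    except (TypeError, ValueError):
        return None

row = get_key_val('{"a": 1, "nested": {"b": 2}}')
# row["nested"] is the string '{"b": 2}', so json.loads(row["nested"]) works
```

Applied recursively per level, this avoids the brittle `\w+=...` regex, since the values never pass through Python's dict repr.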
question
How can I use a Spark UDF or native Spark functions to parse the nested JSON data so that it is queryable in columnName.key format?
There is no restriction on the Spark version. Hopefully I was able to explain this properly; do let me know if you want me to add some sample data and code. Any help is appreciated.

I want to write XQuery to print specific keys in JSON

This is my sample JSON
{
  "id":"743",
  "groupName":"group1",
  "transation":{
    "101":"success",
    "102":"rejected",
    "301":"processing"
  }
}
Expected Result:
"101"
"102"
"301"
Can anyone please help me to print the above result using XQuery?
I can achieve this through JavaScript, but I need to write in XQuery.
Not knowing how you are reading the JSON document (whether as a doc in the database or by parsing a JSON string), the example below uses xdmp:unquote() to parse a string, but you could instead read the document from the database with fn:doc() or through cts:search().
Then, you can just XPath to the transation fields and return those node names with the name() function:
let $jsonData := xdmp:unquote('
{
  "id":"743",
  "groupName":"group1",
  "transation":{
    "101":"success",
    "102":"rejected",
    "301":"processing"
  }
}')
return
  $jsonData/transation/*/name()

How to convert a multi-dimensional dictionary to json file?

I have uploaded a *.mat file that contains a 'struct' to my JupyterLab using:
from pymatreader import read_mat
data = read_mat(mat_file)
Now I have a multi-dimensional dictionary, for example:
data['Forces']['Ss1']['flap'].keys()
Gives the output:
dict_keys(['lf', 'rf', 'lh', 'rh'])
I want to convert this into a JSON file, keeping exactly the keys that already exist, without doing so manually, because I want to apply this to many *.mat files with varying numbers of keys.
EDIT:
Unfortunately, I no longer have access to MATLAB.
An example of the desired output would look something like this:
json_format = {
    "Forces": {
        "Ss1": {
            "flap": {
                "lf": [1, 2, 3, 4],
                "rf": [4, 5, 6, 7],
                "lh": [23, 5, 6, 654, 4],
                "rh": [4, 34, 35, 56, 66]
            }
        }
    }
}
ANOTHER EDIT:
So after making lists of the subkeys (I won't elaborate on it), I did this:
FORCES = []
for ind in individuals:
    for force in forces:
        for wing in wings:
            FORCES.append({
                ind: {
                    force: {
                        wing: data['Forces'][ind][force][wing].tolist()
                    }
                }
            })
Then, to save:
with open(f'{ROOT_PATH}/Forces.json', 'w') as f:
    json.dump(FORCES, f)
That worked, but only because I looked up all of the keys manually... Also, for some reason, I have square brackets at the beginning and at the end of this JSON file.
The json package will output dictionaries to JSON:
import json

with open('filename.json', 'w') as f:
    json.dump(data, f)
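If the dictionary still contains NumPy arrays (which read_mat typically returns), json.dump will raise a TypeError; the default= hook can convert them on the fly, which also keeps every existing key automatically, with no manual key lists. A sketch, where FakeArray is a stand-in for a NumPy array so the snippet runs without a .mat file; note that dumping the dict itself, rather than a list like FORCES, also avoids the surrounding square brackets:

```python
import json

def encode_array(obj):
    # Assumption: every value json can't serialize is array-like with .tolist()
    if hasattr(obj, "tolist"):
        return obj.tolist()
    raise TypeError(f"cannot serialize {type(obj).__name__}")

class FakeArray:  # stand-in for a NumPy array in this sketch
    def tolist(self):
        return [1, 2, 3, 4]

data = {"Forces": {"Ss1": {"flap": {"lf": FakeArray()}}}}
text = json.dumps(data, default=encode_array)
# text == '{"Forces": {"Ss1": {"flap": {"lf": [1, 2, 3, 4]}}}}'
```

With real data you would pass the same default=encode_array to json.dump when writing the file.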
If you are using MATLAB R2016b or later and want to go straight from MATLAB to JSON, check out jsonencode and jsondecode. For your purposes, jsonencode, per the MathWorks docs, "encodes data and returns a character vector in JSON format".
Here is a quick example that assumes your data is in the MATLAB variable test_data and writes it to the file specified in the variable json_file:
json_data = jsonencode(test_data);
writematrix(json_data,json_file);
Note: Some MATLAB data formats cannot be translated into JSON due to limitations of the JSON specification. However, it sounds like your data fits well within it.

I need to flatten a JSON web response with nested arrays [[ ]] into a DataFrame

I'm trying to convert an http JSON response into a DataFrame, then out to CSV file.
I'm struggling with the JSON into DF.
http line:
http://api.kraken.com/0/public/OHLC?pair=XXBTZEUR&interval=1440
JSON response (part of it; the full response holds 720 records in arrays, reformatted here for readability):
{
  "error": [],
  "result": {
    "XXBTZEUR": [
      [1486252800, "959.7", "959.7", "935.0", "943.6", "945.6", "4423.72544809", 5961],
      [1486339200, "943.8", "959.7", "940.0", "952.9", "953.5", "4464.48492401", 7678],
      [1486425600, "953.6", "990.0", "952.7", "988.5", "977.3", "8123.94462701", 10964],
      [1486512000, "988.4", "1000.1", "963.3", "987.5", "983.7", "10989.31074845", 16741],
      [1486598400, "987.4", "1007.4", "847.9", "926.4", "934.5", "22530.11626076", 52668],
      [1486684800, "926.4", "949.0", "886.0", "939.7", "916.7", "11173.53504917", 12588],
    ],
    "last": 1548288000
  }
}
I get
KeyError: 'XXBTZEUR'
on the json_normalize line. This seems to indicate that json_normalize is trying to build the DF from the "XXBTZEUR" level, not lower down at the record level. How do I get json_normalize to read the records instead, i.e. how do I get it to reference deep enough?
I have read several other posts on this site without understanding what I'm doing wrong.
One post mentions that json.loads() must be used. Is json_string.json() also loading the JSON object or do I need the json.loads() instead?
I also tried variations of json_normalize:
BTCEUR_Daily_Table = json_normalize(json_data[[]])
TypeError: unhashable type: 'list'
Can json_normalize not load an array into a DF row?
code so far:
BTCEUR_Daily_URL = 'http://api.kraken.com/0/public/OHLC?pair=XXBTZEUR&interval=1440'
json_string = requests.get(BTCEUR_Daily_URL)
json_data = json_string.json()
BTCEUR_Daily_Table = json_normalize(json_data, record_path=["XXBTZEUR"])
What I need in result:
In my DF, I just want the arrayed records shown in the "body" of the JSON structure. None of the header & footer are needed.
The solution I found was:
BTCEUR_Daily_Table = json_normalize(data=json_data, record_path=[['result','XXBTZEUR']])
The 2nd parameter specifies the full "path" to the parent label of the records.
Apparently double brackets are needed to specify a full path; otherwise the two labels are taken to mean two top-level names.
Without another post here, I would never have found the solution.
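As for the json.loads question: requests' .json() already parses the response body, equivalent to json.loads(response.text), so no extra load step is needed. A minimal sketch of the accepted call with two inline stand-in rows in place of the live 720-record payload (the column names are my own guesses for Kraken's OHLC fields, not part of the API response):

```python
import pandas as pd

json_data = {
    "error": [],
    "result": {
        "XXBTZEUR": [
            [1486252800, "959.7", "959.7", "935.0", "943.6", "945.6", "4423.72544809", 5961],
            [1486339200, "943.8", "959.7", "940.0", "952.9", "953.5", "4464.48492401", 7678],
        ],
        "last": 1548288000,
    },
}

# record_path walks result -> XXBTZEUR; each inner array becomes one row,
# with integer column labels 0..7 because the records carry no field names.
df = pd.json_normalize(json_data, record_path=[["result", "XXBTZEUR"]])
df.columns = ["time", "open", "high", "low", "close", "vwap", "volume", "count"]
```

From there, df.to_csv("ohlc.csv", index=False) writes the CSV file.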

Parsing nodes of JSON with Scala

I've been asked to parse a JSON file to get all the buses that are over a speed specified by the user.
The JSON file can be downloaded here
It's like this:
{
  "COLUMNS": [
    "DATAHORA",
    "ORDEM",
    "LINHA",
    "LATITUDE",
    "LONGITUDE",
    "VELOCIDADE"
  ],
  "DATA": [
    ["04-16-2015 00:00:55", "B63099", "", -22.7931, -43.2943, 0],
    ["04-16-2015 00:01:02", "C44503", 781, -22.853649, -43.37616, 25],
    ["04-16-2015 00:11:40", "B63067", "", -22.7925, -43.2945, 0],
  ]
}
The thing is: I'm really new to Scala and I have never worked with JSON before (shame on me). What I need is to get the "Ordem", "Linha" and "Velocidade" fields from the DATA node.
I created a case class to hold all the data, so as to later look for the buses that are over the specified speed.
case class Bus(ordem: String, linha: Int, velocidade: Int)
I did this by reading the file as a text file and splitting it. But that way, I need to know the content of the file in advance in order to get to the lines after the DATA node.
I want to know how to do this using a JSON parser. I've tried many solutions, but I couldn't adapt them to my problem, because I need to extract all the lines from the DATA node rather than nodes inside a single node.
Can anyone help me?
P.S.: Sorry for my English; I'm not a native speaker.
First of all, you need to understand the different JSON data types. The basic types in JSON are numbers, strings, booleans, arrays, and objects. The data returned in your example is an object with two keys: COLUMNS and DATA. The COLUMNS key has a value that is an array of strings, and the DATA key has a value that is an array of arrays of strings and numbers.
You can use a library like PlayJSON to work with this type of data:
import play.api.libs.json._

val js = Json.parse(jsonString).as[JsObject]  // jsonString holds the raw JSON text
val keys = (js \ "COLUMNS").as[List[String]]
val values = (js \ "DATA").as[List[List[JsValue]]]
val busses = values.map(valueList => {
  val keyValues = (keys zip valueList).toMap
  for {
    ordem <- keyValues("ORDEM").asOpt[String]
    linha <- keyValues("LINHA").asOpt[Int]
    velocidade <- keyValues("VELOCIDADE").asOpt[Int]
  } yield Bus(ordem, linha, velocidade)
})
Note the use of asOpt when converting the properties to the expected types. This method converts the value to the requested type if possible (wrapped in Some) and returns None otherwise. So, if you want to provide a default value instead of dropping the record, you could use keyValues("LINHA").asOpt[Int].getOrElse(0), for example. Also note that busses is a List[Option[Bus]]; call .flatten on it if you only want the successfully parsed rows.
You can read more about the Play JSON methods used here, like \, as, and asOpt, in their docs.
You can use Spark SQL to achieve it; refer to the section on JSON datasets in the Spark SQL guide.
In essence, use the Spark APIs to load the JSON and register it as a temp table. You can then run your SQL queries against that table.
As seen in Ben Reich's answer, that code works great. Thank you very much.
However, my JSON had some type problems with "LINHA". As can be seen in the JSON example I put in the question, it holds both empty strings ("") and numbers, e.g. 781.
When trying keyValues("LINHA").asOpt[Int].getOrElse(0), it produced an error saying that value flatMap is not a member of Int.
So, I had to change some things:
import scala.io.Source.fromFile
import play.api.libs.json._

case class BusContainer(ordem: String, linha: String, velocidade: Int)

val jsonString = fromFile("./project/rj_onibus_gps.json").getLines.mkString
val js = Json.parse(jsonString).as[JsObject]
val keys = (js \ "COLUMNS").as[List[String]]
val values = (js \ "DATA").as[List[List[JsValue]]]
val buses = values.map(valueList => {
  val keyValues = (keys zip valueList).toMap
  println(keyValues("ORDEM"), keyValues("LINHA"), keyValues("VELOCIDADE"))
  for {
    ordem <- keyValues("ORDEM").asOpt[String]
    linha <- keyValues("LINHA").asOpt[Int].orElse(keyValues("LINHA").asOpt[String])
    velocidade <- keyValues("VELOCIDADE").asOpt[Int]
  } yield BusContainer(ordem, linha.toString, velocidade)
})
Thanks for the help!