Creating an ObjectMapper ArrayNode in Scala by looping over a Spark DataFrame - json

I need to create an array of JSON objects using ObjectMapper.
I have a DataFrame, say dataSchemaDF, which has the columns columnNm and columnType.
Now I want to create an array of JSON like the one below by iterating over all rows of the DF.
[{"ColumnNm":"name1","columnType":"type1"},{"ColumnNm":"name2","columnType":"type2"}]
Below is the code I have written for it.
import com.fasterxml.jackson.databind.ObjectMapper
import com.fasterxml.jackson.databind.node.{ArrayNode, ObjectNode}

val dataSchemaDFColumnsArray = dataSchemaDF.columns
val objectMapperCtrlTopic: ObjectMapper = new ObjectMapper()
var parentArray: ArrayNode = objectMapperCtrlTopic.createArrayNode()
dataSchemaDF.foreach(r => {
  val objectNode: ObjectNode = objectMapperCtrlTopic.createObjectNode()
  for (i <- dataSchemaDFColumnsArray) {
    objectNode.putPOJO(i, r.get(r.fieldIndex(i)))
  }
  println("This is printing objNode: " + objectNode + "\n")
  parentArray.add(objectNode)
  println("This is printing parentArrayNode: " + parentArray + "\n")
})
I want to access the final result of parentArray outside the loop, but it comes out empty. Within the loop, the objectNode for each record gets printed, and parentArray is also updated and printed for every record in the DataFrame.
It seems a local copy of the variable is created and updated. How can I get the global parentArray updated?
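What happens here is that the closure passed to foreach runs on the executors, not on the driver: parentArray is serialized into each task, so every task mutates its own local copy while the driver's instance stays empty. Since a schema DataFrame is normally small, the simplest fix is to collect() the rows to the driver and build the array there. A minimal sketch, assuming the same Jackson setup as above:

import com.fasterxml.jackson.databind.ObjectMapper
import com.fasterxml.jackson.databind.node.ArrayNode

val mapper = new ObjectMapper()
val parentArray: ArrayNode = mapper.createArrayNode()
val columns = dataSchemaDF.columns

// collect() brings the rows back to the driver, so all mutation
// happens in a single JVM and parentArray keeps its contents.
dataSchemaDF.collect().foreach { r =>
  val node = mapper.createObjectNode()
  columns.foreach(c => node.putPOJO(c, r.get(r.fieldIndex(c))))
  parentArray.add(node)
}

println("Final parentArray: " + parentArray)

If the DataFrame were too large to collect, you would instead build the JSON per partition (e.g. with mapPartitions) and combine the pieces on the driver, but for column metadata collect() is the straightforward fix.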


How do I generate a serde_json object from a "." separated text format?

The Problem
I am trying to generate a json object (with serde) by parsing a custom macro format that looks like this:
Plot.Polar.max: 20
Plot.Polar.min: 0
Plot.Polar.numberlabel: 0101
Plot.Polar.chartname: small-chart
Plot.Polar.Var.1:
Plot.Polar.Var.2: A label: with T+ES[T] #Data
What I get stuck on is how to set the keys for the object. In my old JavaScript code I split on \n, ., and :, had a couple of nested loops, and used a reduceRight at the end to create the object like this:
// rowObject equals one row in the old macro format
let rowObject = keys.reduceRight(
  (allKeys, item) => ({ [item]: allKeys }),
  val,
);
My Goal
My goal is to use that JSON object to generate a Highcharts config (JSON) depending on the keys and values from the custom macro. I also want to be able to print just the macro in JSON format, which is why I want to convert the macro to JSON first rather than use a separate data structure (though that might be a good idea?). The JSON I want to produce from the macro is this:
{
  "Plot": {
    "Polar": {
      "max": 20,
      "min": 0
    }
  }
}
What I Have Tried
Map::insert, though I am not sure how to structure the key string. How do I manage the Map objects in this case?
Another solution I see is creating the object from a raw string and merging each rowObject with the main object, though this approach feels a bit hacky.
The current loop I have:
// pseudo
// let mut json_macro = Map::new();
for row in macro_rows.iter() {
    let row_key_value: Vec<&str> = row.split(':').collect();
    let keys = row_key_value[0];
    let value = row_key_value[1];
    let keys_split: Vec<&str> = keys.split('.').collect();
    for key in keys_split.iter() {
        // TODO: accumulate objects into row_object
    }
    // TODO: insert row_object into json_macro
}
The Question
Is it possible to do something like JavaScript's reduceRight, or something similar, in Rust?
Update
I realized that I will have to treat all values as strings because it is impossible to know whether a number is meant to be a string or not. What worked in the end was the solution #gizmo provided.
To insert your row into json_macro, you can fold over keys_split from the left, creating one level of nesting per key, and then insert the value at the last key:
let row_key_value: Vec<&str> = row.split(':').collect();
let keys = row_key_value[0];
let value: Value = serde_json::from_str(row_key_value[1]).unwrap();
let keys_split: Vec<&str> = keys.split('.').collect();

keys_split[..keys_split.len() - 1]
    .iter()
    .fold(&mut json_macro, |object, &key| {
        object
            .entry(key)
            .or_insert(Map::new().into())
            .as_object_mut()
            .unwrap()
    })
    .insert(keys_split.last().unwrap().to_string(), value);
A couple of things to note here about the unwrap()s:
from_str(...).unwrap(): I parse the value as JSON here. This might not be what you want; maybe you want str::parse::<i32> or something else instead. In any case, this parsing might fail.
.as_object_mut().unwrap(): This will explode if the input redefines a key like
Plot.Polar: 0
Plot.Polar.max: 20
In the reverse order, you probably also want to handle the case where the key is already defined as an object (the final insert would overwrite it).
keys_split.last().unwrap() won't fail, but you might want to check whether it's the empty string.

Python3: Loop over objects and add attributes to an array or object

I'm trying to loop through some objects and add certain attributes to an array to be sent back as JSON to the view:
data = {}
camera_logs = CameraLog.objects.filter(camera_id=camera_id)
for log in camera_logs:
    setattr(data, 'celsius', log.celsius)
    setattr(data, 'fahrenheit', log.fahrenheit)
return JsonResponse(data)
I'm quite new to Python, so I'm not sure if I'm even on the right track.
Accessing Python dictionaries is much easier than using setattr.
You have two options: either create a dictionary with IDs as keys, or just a simple list.
import json

data = {}
camera_logs = CameraLog.objects.filter(camera_id=camera_id)
for log in camera_logs:
    data[log.id] = log.celsius
return Response(json.dumps(data))
In the above solution you will have a dictionary with the ID of each CameraLog as the key and its celsius reading as the value. Basically your JSON will look like this:
{
  "1": 20,
  "2": 19,
  "3": 21
}
The second approach is to send a simple list of values, but I guess you would like to know which log had which temperature.
import json

data = []
camera_logs = CameraLog.objects.filter(camera_id=camera_id)
for log in camera_logs:
    data.append(log.celsius)
return Response(json.dumps(data))
Edit to the answer
If you wish to have a list of dicts, do something like this:
import json

data = []
camera_logs = CameraLog.objects.filter(camera_id=camera_id)
for log in camera_logs:
    data.append({
        'camera_id': log.id,
        'celsius': log.celsius,
        'fahrenheit': log.fahrenheit
    })
return Response(json.dumps(data))
You can enhance your query by selecting only the attributes you need from the queryset, using .values_list.
import json

camera_logs = CameraLog.objects.filter(camera_id=camera_id).values_list(
    'celsius', 'fahrenheit')
data = [{"celsius": cel, "fahrenheit": fahr} for cel, fahr in camera_logs]
return Response(json.dumps(data))

Reading massive JSON files into Spark Dataframe

I have a large nested NDJSON (newline-delimited JSON) file that I need to read into a single Spark DataFrame and save to Parquet. In an attempt to render the schema I use this function:
def flattenSchema(schema: StructType, prefix: String = null): Array[Column] = {
  schema.fields.flatMap(f => {
    val colName = if (prefix == null) f.name else prefix + "." + f.name
    f.dataType match {
      case st: StructType => flattenSchema(st, colName)
      case _ => Array(col(colName))
    }
  })
}
on the DataFrame that is returned by reading with
val df = sqlCtx.read.json(sparkContext.wholeTextFiles(path).values)
I've also switched this to val df = spark.read.json(path) so that it only works with NDJSON and not multi-line JSON; same error.
This causes an out-of-memory error on the workers:
java.lang.OutOfMemoryError: Java heap space
I've altered the JVM memory options and the Spark executor/driver options to no avail.
Is there a way to stream the file, flatten the schema, and add to a DataFrame incrementally? Some lines of the JSON contain new fields not present in the preceding entries... so those would need to be filled in later.
No workaround. The issue was with the JVM object limit. I ended up using a Scala JSON parser and building the DataFrame manually.
You can achieve this in multiple ways.
First, while reading, you can provide the schema for the DataFrame yourself, or you can allow Spark to infer the schema on its own.
Once the JSON is in a DataFrame, you can flatten it in the following ways (see the sketch after this list):
a. Using explode() on the DataFrame to flatten it.
b. Using Spark SQL and accessing the nested fields using the . operator. You can find examples here.
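For illustration, a minimal sketch of both flattening approaches; the file name events.ndjson and the user/tags fields are made up for the example:

import org.apache.spark.sql.functions.{col, explode}

// Schema is inferred here; call .schema(...) on the reader to supply your own.
val df = spark.read.json("events.ndjson")

// b. Reach into nested structs with the "." operator.
val flat = df.select(
  col("user.id").as("user_id"),
  col("user.name").as("user_name"))

// a. explode() emits one output row per element of an array column.
val exploded = df.select(col("user.id").as("user_id"), explode(col("tags")).as("tag"))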
Lastly, if you want to add new columns to the DataFrame:
a. First option: using withColumn() is one approach. However, this is applied once for each new column added, over the entire data set.
b. Using SQL to generate a new DataFrame from the existing one - this may be the easiest.
c. Lastly, using map, then accessing the elements: get the old schema, add the new values, create a new schema, and finally get the new DataFrame - as below.
One withColumn call works over the entire RDD, so it is generally not good practice to use that method for every column you want to add. There is a way to work with the columns and their data inside a map function instead: since a single map function does the job, the code that adds the new columns and their data runs in parallel.
a. You can gather the new values based on your calculations.
b. Add these new column values to the main RDD as below:
val newColumns: Seq[Any] = Seq(newcol1, newcol2)
Row.fromSeq(row.toSeq.init ++ newColumns)
Here, row is the reference to the row inside the map method.
c. Create the new schema as below:
val newColumnsStructType = StructType(Seq(StructField("newColName1", IntegerType), StructField("newColName2", IntegerType)))
d. Add to the old schema
val newSchema = StructType(mainDataFrame.schema.init ++ newColumnsStructType)
e. Create the new DataFrame with the new columns:
val newDataFrame = sqlContext.createDataFrame(newRDD, newSchema)
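Putting steps a through e together, a rough end-to-end sketch (the column names and placeholder computations are hypothetical; toSeq.init mirrors the snippet above and drops the row's last original column before appending):

import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{IntegerType, StructField, StructType}

// One map over the data computes all new values in parallel.
val newRDD = mainDataFrame.rdd.map { row =>
  val newcol1 = 1 // placeholder computation
  val newcol2 = 2 // placeholder computation
  Row.fromSeq(row.toSeq.init ++ Seq(newcol1, newcol2))
}

val newColumnsStructType = Seq(
  StructField("newColName1", IntegerType),
  StructField("newColName2", IntegerType))

val newSchema = StructType(mainDataFrame.schema.init ++ newColumnsStructType)
val newDataFrame = sqlContext.createDataFrame(newRDD, newSchema)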

How to create an RDD from another RDD by extracting specific values?

I have an RDD which contains a String and a JSON object (as a String). I extracted the required values from the JSON object. How can I use the values to create a new RDD that stores each value in its own column?
RDD
(1234,{"id"->1,"name"->"abc","age"->21,"class"->5})
From which a map was generated as shown below.
"id"->1,
"name"->"abc",
"age"->21
"id"->2,
"name"->"def",
"age"->31
How to convert this to RDD[(String, String, String)], which stores data like:
1 abc 21
2 def 31
Not in front of a compiler right now, but something like this should work:
def parse(row: (String, JValue)): Seq[(String, String, String)] = {
  // Here goes your code to parse a JSON into a sequence of tuples;
  // it seems like you have this already well in hand.
}
val rdd1 = ??? // Initialize your RDD[(String, JValue)]
val rdd2: RDD[(String, String, String)] = rdd1.flatMap(parse)
flatMap does the trick: your extraction function can extract multiple rows from each JSON input (or none), and they will be seamlessly integrated into the final RDD.
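If the JSON extraction itself is still open, here is one hedged way the body of parse could look using json4s (the id/name/age field names come from the question; the use of json4s and extractOpt is an assumption, not the asker's setup):

import org.apache.spark.rdd.RDD
import org.json4s._

// Assumes each JValue looks like {"id":1,"name":"abc","age":21};
// records missing a field yield an empty Seq and are dropped by flatMap.
def parse(row: (String, JValue)): Seq[(String, String, String)] = {
  implicit val formats: Formats = DefaultFormats
  (for {
    id   <- (row._2 \ "id").extractOpt[Int]
    name <- (row._2 \ "name").extractOpt[String]
    age  <- (row._2 \ "age").extractOpt[Int]
  } yield (id.toString, name, age.toString)).toSeq
}

val rdd2: RDD[(String, String, String)] = rdd1.flatMap(parse)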

Google Dart JSON Extraction

I'm attempting to pull data out of a nested array in JSON but cannot seem to get the values correct. Right now, all values of the nested operatingSystem array print out in the table when I only need the name of the operating system. My code is below; please let me know if you need more information.
Dart:
List<Map> assetList;
// LinkedHashMap preserves key entry order
LinkedHashMap<String, Map> dataMap = new LinkedHashMap<String, Map>();
for (var d in assetList) {
  HashMap rowMap = new HashMap();
  String domainId = d["process"]["processId"];
  // first <td> element, the rest follow in succession
  dataMap[domainId] = rowMap;
  rowMap["domainId"] = domainId;
  // is still not checking for null
  if (d["asset"]["operatingSystem"].containsKey("name")) {
    rowMap["operatingSystem"] = d["asset"]["operatingSystem"]["name"];
  } else {
    rowMap["operatingSystem"] = d["asset"]["operatingSystem"];
  }
  // print out table data for debugging
  print(rowMap.toString());
  print(d);
}
JSON:
"asset":{
"assetId":"8a498592469189660146918d9e2f0000",
"oplock":0,
"domainName":"",
"latitude":58.92,
"ipAddress":"4.4.4.4",
"longitude":-37.23,
"operatingSystem":{
"osId":2,
"oplock":0,
"name":"Windows 8"
}
}
You need to go one level deeper. You are printing out the object's operatingSystem value, but the operatingSystem object has 3 attributes.
The correct syntax is
json["asset"]["operatingSystem"]["name"];
You could also do it like this, which I believe is more standard when it comes to JS and JSON:
json.asset.operatingSystem.name