Parse and store nested array with RestKit - json

I have an iOS project that uses RestKit 0.21.0 to fetch responses from a remote server, parse them, and store them in Core Data. One of the backend JSON responses contains something like this:
"response": [
{
"id": 1,
"start_time": "10:00:00",
"end_time": "14:00:00",
"name": "Object name",
"occurrences": [
"2013-09-13T14:00:00",
"2013-09-20T14:00:00",
"2013-09-27T14:00:00"
]
},
.
.
.
]
In general I'm able to parse the received objects and store them in Core Data. My only problem is with the nested occurrences array.
Do you have any advice on how I should properly parse and store this collection?

I guess you want to map it to dates. To do that you generally need a container. You can also simply map to an array of strings and post-process.
1) Array of strings:
Just add an NSArray property to your destination object and map occurrences to it. This would be a transformable attribute in Core Data (could be transient). Now you can iterate the array and create the dates (could be done in willSave).
2) Relationship to dates:
Create a new entity, call it Occurrence. It has a single date property. Use a 'nil' keypath mapping to create instances of this Occurrence entity and map each of the dates to a new instance (the conversion to NSDate will be done for you). The dates carry no identity, so your only option is to use the date itself as the unique identifier.

Related

Modify Nested JSON fields with Kafka Streams

Is it possible to apply a filter on nested JSON fields with the help of Kafka Streams? If yes, how can those fields be addressed?
For example,
{
    "before": {
        "id": 1,
        "name": "abc"
    },
    "after": {
        "id": 1,
        "name": "xyz"
    }
}
Now, if name is modified in the after field, I do not want to filter the record out; but if fields other than name are modified, I do want to filter that record out.
Thank you.
The deserializer of the configured Stream Serde should return your object type. You can then filter just like a regular Java Stream:
// compareCDCRecords is your own predicate deciding which records to keep
stream.filter(yourMessage -> compareCDCRecords(yourMessage.getBefore(), yourMessage.getAfter()));

Get multiple JSON keypair values within a JSON object only if a specific keypair value matches iterable

I am trying to pull multiple key-value pairs from JSON objects in a JSON file produced via an API call, but only from the JSON objects that contain a specific value for one of their keys; that specific value is obtained by iterating through a list. This is hard to articulate, so let me show you:
Here is a list (a fake list representing my real one) containing the "specific value within a key-value pair" I'm interested in:
mylist = ['device1', 'device2', 'device3']
And here is the json file (significantly abridged, obviously) that I am reading from my python script:
[
    {
        "name": "device12345",
        "type": "other",
        "os_name": "Microsoft Windows 10 Enterprise",
        "os_version": null
    },
    {
        "name": "device2",
        "type": "other",
        "os_name": "Linux",
        "os_version": null
    },
    {
        "name": "device67890",
        "type": "virtual",
        "os_name": "Microsoft Windows Server 2012 R2 Standard",
        "os_version": null
    }
]
I want to iterate through mylist, and for each item that matches the value of the "name" key in a JSON object, print the values of 'name' and 'os_name'. In this case, device2 matches the JSON object containing {'name': 'device2'}, so the code should print BOTH the 'name' value ('device2') and the 'os_name' value ('Linux'), and then continue with the next item in mylist.
Here is my code that does not work; let us assume the JSON file was opened correctly in Python and is defined as my_json_file with type list:
for i in mylist:
    for key in my_json_file:
        if key == i:
            deviceName = i.get('name')
            deviceOS = i.get('os_name')
            print(deviceName)
            print(deviceOS)
I think the problem here is that I can't figure out how to "identify" json objects that "match" items from mylist, since the json objects are "flat" (i.e. 'device2' in my json file doesn't have 'name' and 'os_name' nested under it, which would allow me to dict.get('device2') and then retrieve nested data).
Sorry if I did not understand your question clearly, but it appears you're trying to read values off of the list instead of the JSON file, since you're calling i.get instead of key.get, when key is what actually contains the information.
As for optimization, I'd recommend converting your list of devices into a set. You can then iterate through the JSON array and check whether a given name is in the set, instead of doing it the other way around. The advantage is that a set can answer a membership check in O(1) time, which brings the overall running time down to O(n), where n = size of the JSON array.
device_names = set(mylist)  # set gives O(1) membership checks
for key in my_json_file:
    if key.get('name') in device_names:
        deviceName = key.get('name')
        deviceOS = key.get('os_name')
        print(deviceName)
        print(deviceOS)
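If you also need to look a device up by name afterwards (the dict.get('device2') pattern the question mentions), another option is to build an index over the array first. A minimal sketch, assuming my_json_file has already been loaded via json.load:
# Build a name -> object index so each device can be fetched in O(1) time
by_name = {obj['name']: obj for obj in my_json_file}

for device in mylist:
    match = by_name.get(device)
    if match is not None:
        print(match['name'])
        print(match['os_name'])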

Process events from Event hub using pyspark - Databricks

I have a Mongo change stream (a pymongo application) that is continuously getting the changes in collections. These change documents as received by the program are sent to Azure Event Hubs. A Spark notebook has to read the documents as they get into Event Hub and do Schema matching (match the fields in the document with spark table columns) with the spark table for that collection. If there are fewer fields in the document than in the table, columns have to be added with Null.
I am reading the events from Event Hub like below.
spark.readStream.format("eventhubs").options(**config).load()
As stated in the documentation, the original message is in the 'body' column of the dataframe, which I am converting to a string. Now I have the Mongo document as a JSON string in a streaming dataframe. I am facing the issues below.
I need to extract the individual fields of the Mongo document. This is needed to compare which fields are present in the Spark table and which are missing from the Mongo document. I saw a function called get_json_object(col, path), but it essentially returns a string again, so I cannot individually select all the columns.
from_json can be used to convert the JSON string to a struct type, but I cannot specify the schema, because we have close to 70 collections (and a corresponding number of Spark tables), each sending Mongo docs with anywhere from 10 to 450 fields.
If I could convert the JSON string in the streaming dataframe to a JSON object whose schema can be inferred by the dataframe (something like what read.json can do), I could use the SQL * representation to extract the individual columns, do a few manipulations, and then save the final dataframe to the Spark table. Is it possible to do that? What is the mistake I am making?
Note: a streaming DF doesn't support the collect() method for individually extracting the JSON string from the underlying RDD and doing the necessary column comparisons. I'm using Spark 2.4 and Python in Azure Databricks environment 4.3.
Below is the sample data I get in my notebook after reading the events from event hub and casting it to string.
{
    "documentKey": "5ab2cbd747f8b2e33e1f5527",
    "collection": "configurations",
    "operationType": "replace",
    "fullDocument": {
        "_id": "5ab2cbd747f8b2e33e1f5527",
        "app": "7NOW",
        "type": "global",
        "version": "1.0",
        "country": "US",
        "created_date": "2018-02-14T18:34:13.376Z",
        "created_by": "Vikram SSS",
        "last_modified_date": "2018-07-01T04:00:00.000Z",
        "last_modified_by": "Vikram Ganta",
        "last_modified_comments": "Added new property in show_banners feature",
        "is_active": true,
        "configurations": [
            {
                "feature": "tip",
                "properties": [
                    {
                        "id": "tip_mode",
                        "name": "Delivery Tip Mode",
                        "description": "Tip mode switches the display of tip options between percentage and amount in the customer app",
                        "options": [
                            "amount",
                            "percentage"
                        ],
                        "default_value": "tip_percentage",
                        "current_value": "tip_percentage",
                        "mode": "multiple or single"
                    },
                    {
                        "id": "tip_amount",
                        "name": "Tip Amounts",
                        "description": "List of possible tip amount values",
                        "default_value": 0,
                        "options": [
                            {
                                "display": "No Tip",
                                "value": 0
                            }
                        ]
                    }
                ]
            }
        ]
    }
}
I would like to separate out and take the fullDocument from the sample above. When I use get_json_object, I get the fullDocument in another streaming dataframe as a JSON string, not as an object. As you can see, there are some array types in fullDocument which I can explode (the documentation says explode is supported in a streaming DF, but I haven't tried it), but there are also some objects (struct types) from which I would like to extract the individual fields. I cannot use the SQL '*' notation, because what get_json_object returns is a string, not the object itself.
It's convincing that JSON with this much schema variation would be better handled with the schema specified explicitly. So my takeaway is that in a streaming environment, with very different schemas in the incoming stream, it's always better to specify the schema. I am therefore proceeding with get_json_object and from_json, reading the schema from a file.
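To illustrate that approach, here is a minimal sketch of loading a per-collection schema from a file and applying it with from_json; the path /dbfs/schemas/configurations.json and the idea of one schema file per collection are assumptions for the example, not part of the original setup:
import json
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import StructType

# Load a schema that was previously saved with StructType.jsonValue() (hypothetical file)
with open("/dbfs/schemas/configurations.json") as f:
    schema = StructType.fromJson(json.load(f))

stream = spark.readStream.format("eventhubs").options(**config).load()

# Cast the raw Event Hubs body to a string, parse it with the explicit schema,
# then flatten the struct so each field of the document becomes a real column
parsed = (stream
          .withColumn("body", col("body").cast("string"))
          .withColumn("doc", from_json(col("body"), schema))
          .select("doc.*"))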

How to read invalid JSON format amazon firehose

I've got a horrible scenario in which I want to read the files that Kinesis Firehose creates on our S3.
Kinesis Firehose creates files that don't have every JSON object on a new line; the file is simply a concatenation of JSON objects.
{"param1":"value1","param2":numericvalue2,"param3":"nested {bracket}"}{"param1":"value1","param2":numericvalue2,"param3":"nested {bracket}"}{"param1":"value1","param2":numericvalue2,"param3":"nested {bracket}"}
Now, this scenario is not supported by a normal JSON.parse, and I have tried working with the following regex: .scan(/({((\".?\":.?)*?)})/)
But the scan only works in scenarios without nested brackets, it seems.
Does anybody know a working/better/more elegant way to solve this problem?
The one in the initial answer is for unquoted JSON, which happens sometimes. This one:
({((\\?\".*?\\?\")*?)})
works for both quoted and unquoted JSON.
Beyond that, I improved it a bit to keep it simpler, since you can have integer as well as normal values; anything within string literals is ignored thanks to the double capturing group.
https://regex101.com/r/kPSc0i/1
Modify the input to be one large JSON array, then parse that:
input = File.read("input.json")
json = "[#{input.rstrip.gsub(/\}\s*\{/, '},{')}]"
data = JSON.parse(json)
You might want to combine the first two lines to save some memory:
json = "[#{File.read('input.json').rstrip.gsub(/\}\s*\{/, '},{')}]"
data = JSON.parse(json)
This assumes that } followed by some whitespace followed by { never occurs inside a key or value in your JSON encoded data.
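If you're consuming these files from Python instead, the standard library can handle the concatenation without any regex: json.JSONDecoder.raw_decode parses one object at a time and tells you where it stopped, so braces nested inside string values are handled correctly. A minimal sketch:
import json

def parse_concatenated(blob):
    decoder = json.JSONDecoder()
    objects, index = [], 0
    while index < len(blob):
        # Skip any whitespace between (or before) concatenated objects
        while index < len(blob) and blob[index].isspace():
            index += 1
        if index >= len(blob):
            break
        # raw_decode returns the parsed object and the index where parsing stopped
        obj, end = decoder.raw_decode(blob, index)
        objects.append(obj)
        index = end
    return objects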
As you concluded in your most recent comment, put_record_batch in Firehose requires you to manually put delimiters in your records so they can be easily parsed by consumers. You can add a new line, or some special character that is used solely for parsing, % for example, which should never appear in your payload.
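A minimal sketch of that idea in Python with boto3, appending a newline to each record before batching; the stream name and records are placeholders. Note that put_record_batch accepts at most 500 records per call, so larger inputs need chunking:
import json
import boto3

def send_batch_to_firehose(stream_name, records):
    firehose_client = boto3.client('firehose')
    # Append a newline delimiter so consumers can simply split on '\n'
    entries = [{'Data': json.dumps(record) + '\n'} for record in records]
    firehose_client.put_record_batch(
        DeliveryStreamName=stream_name,
        Records=entries
    )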
The other option would be sending record by record. This is only viable if your use case does not require high throughput. For that you can loop over every record and load it as a stringified data blob. If done in Python, we would have a list "records" holding all our JSON objects.
import json
import boto3

def send_to_firehose(records):
    firehose_client = boto3.client('firehose')
    for record in records:
        data = json.dumps(record)
        firehose_client.put_record(
            DeliveryStreamName='<your stream>',
            Record={'Data': data}
        )
Firehose buffers the data by default before sending it to your bucket, and you should end up with something like this, which will be easy to parse and load into memory in your preferred data structure.
[
    {
        "metadata": {
            "schema_id": "4096"
        },
        "payload": {
            "zaza": 12,
            "price": 20,
            "message": "Testing sendnig the data in message attribute",
            "source": "coming routing to firehose"
        }
    },
    {
        "metadata": {
            "schema_id": "4096"
        },
        "payload": {
            "zaza": 12,
            "price": 20,
            "message": "Testing sendnig the data in message attribute",
            "source": "coming routing to firehose"
        }
    }
]

Order of performing sorting on a big JSON object

I have a big JSON object with a list of "tickets". The schema looks like below:
{
    "Artist": "Artist1",
    "Tickets": [
        {
            "Id": 1,
            "Attr2Array": [
                {
                    "Att41": 1,
                    "Att42": "A",
                    "Att43": null
                },
                {
                    "Att41": 1,
                    "Att42": "A",
                    "Att43": null
                }
            ],
            ...
            (more properties)
            "Price": "20",
            "Description": "I m a ticket"
        },
        {
            "Id": 4,
            "Attr2Array": [
                {
                    "Att41": 1,
                    "Att42": "A",
                    "Att43": null
                },
                {
                    "Att41": 1,
                    "Att42": "A",
                    "Att43": null
                }
            ],
            ...
            (more properties)
            "Price": "30",
            "Description": "I m a ticket"
        }
    ]
}
Each item in the list has around 25-30 properties (some simple types, others complex arrays of nested objects).
I have to read the object from an API endpoint and extract only "Id" and "Description", but the results need to be sorted by "Price", which is, for example, an int.
In what order shall I proceed with this data manipulation?
Shall I take the JSON object, deserialise it into another object with just those 2 properties (which I need), and THEN sort ascending on "Price"?
Please note that after I have the sorted list, I will have to convert it back to a JSON list, because the front end consumes JSON after all.
What I don't like about this approach is the cycle of serialisation and deserialisation that happens.
or
I perform a sort on the JSON object first (using, for example, a binary/bubble sort), then use the object to create a strongly typed (deserialised) object with just those 2 properties, and then serialise it back to pass to the front end.
I don't know how performant the bubble sort will be, or whether I will get any performance gain at all for large chunks of data.
I also need to keep in mind that this implementation may have to take other properties into account, like "availabilitydate", because at a later date the front end could add one more filter, such as "availabilitydate" asc.
Any help is much appreciated.
Thanks!
You can deserialize your JSON string (or file) using the Microsoft System.Web.Extensions and JavaScriptSerializer.DeserializeObject.
First, you must have classes associated to your JSON. To create classes, select your JSON sample data and, in Visual Studio, go to Edit / Paste Special / Paste JSON As Classes.
Next, use this sample to deserialize a JSON string into typed objects, and to sort all Tickets by the Price property using LINQ:
String json = System.IO.File.ReadAllText(@"C:\Data.json");
var root = new System.Web.Script.Serialization.JavaScriptSerializer().Deserialize<Rootobject>(json);
// Price is a string in the JSON, so parse it to get a numeric (not lexicographic) sort
var sortedTickets = root.Tickets.OrderBy(t => int.Parse(t.Price));
// Keep only the two properties the front end needs
var result = sortedTickets.Select(t => new { t.Id, t.Description });