Need to relationalise a nested JSON string using Pyspark - json

I am new to Pyspark and need guidance to perform the following task.
A sample data in the form of a JSON string has been given,
{
"id": "1234",
"location": "znd",
"contact": "{\"phone\": [{\"number\":\"12345\",\"code\":\"111\",\"altno\":\"No\"},{\"number\":\"55656\",\"code\":\"222\",\"altno\":\"Yes\"}]}"
}
This needs to be rationalized as follows, as seen below one row of input will get translated to 2 rows.
{id: "1234", "location": "znd","number": "12345", "code": "111","altno":"No"}
{id: "1234", "location": "znd","number": "55656", "code": "222","altno":"No"}
I have tried to use the explode function but as this is a JSON string, explode does not work on it.
I have read the data into a DF and tried to enforce a struct type to later use explode, but that does not work either.

Related

JSON query matching

For the given input JSON:
{
"person": {
"name": "John",
"age": 25
},
"status": {
"title": "assigned",
"type": 3
}
}
I need to build a string query that I could use to answer if the given JSON matches it or not. For example if the given person's name is "John" and his age is in the range of 20..30 and his status is not 4.
I need the query to be presented as a string and a commonly known library that can run it. I need it on multiple platforms (iOS, Android, Xamarin). I've tried JSON Path and JSON schemas, but didn't really figure out if it's able to achieve that with them. JSON Path seems to be specified on finding a single value in the JSON by a certain condition and JSON Schema mostly checks for the data structure and types.
Ok, found the solution. If I format the whole input object as a single-element array like that:
[
{
"person": {
"name": "John",
"age": 25
},
"status": {
"title": "assigned",
"type": 3
}
}
]
That will allow me to use JSON Path expressions:
$[?(#.person.name == 'John' && #.person.age >= 20 && #.status.type != 4)]
Basically if it doesn't match there won't be a result.

Add data to a json file using Talend

I have the following JSON:
[
{
"date": "29/11/2021",
"Name": "jack",
},
{
"date": "30/11/2021",
"Name": "Adam",
},
"date": "27/11/2021",
"Name": "james",
}
]
Using Talend, I wanna add 2 lines to have something like:
[
{
"company": "AMA",
"service": "BI",
"date": "29/11/2021",
"Name": "jack",
},
{
"company": "AMA",
"service": "BI",
"date": "30/11/2021",
"Name": "Adam",
},
"company": "AMA",
"service": "BI",
"date": "27/11/2021",
"Name": "james",
}
]
Currently, I use 3 components (tJSONDocOpen, tFixedFlowInput, tJSONDocOutput) but I can't have the right configuration of components in order to get the job done !
If you are not comfortable with json .
Just do these steps :
In the metaData just create a FileJson like this then paste it in your job as a tFileInputJson
Your job design and mapping would be
In your tFileOutputJson don't forget to change in the name of the data block "Data" with ""
What you need to do there according to the Talend practices is read your JSON. Then extract each object of it, add your properties and finally rebuild your JSON in a file.
An efficient way to do this is using tMap componenent like this.
The first tFileInputJSON will have to specify what properties it has to read from the JSON by setting your 2 objects in the mapping field.
Then the tMap will simply add 2 columns to your main stream, here is an example with hard coded string values. Depending on you needs, this component will also offer you the possibility to assign dynamic data to your 2 new columns, it's a powerful tool for manipulating the structure of a data stream.
You will find more infos about this component in the official documentation : https://help.talend.com/r/en-US/7.3/tmap/tmap; especially the "tMap scenarios" part.
Note
Instead of using the tMap, if you are comfortable with Java, you can use a tjavaRow instead. Using this, you can setup your 2 new columns with whatever java code you want to put as long as you have defined the output schema of the component.
output_row.Name = input_row.Name;
output_row.date = input_row.date;
output_row.company = "AMA";
output_row.service = "BI";

How to get the full pointer data in Parse.com

i'm making a new application using Parse.com as a backend, i'm trying to make less requests to the Parse, I have a class which is pointing to another object of another class.
Class1(things):
ObjectID Name Category(pointer)
JDFHSJFxv Apple QSGKqf343
Class2(Categories):
ObjectID Name Number Image
QSGKqf343 Fruits 45 http://myserver.com/fruits.jpeg
when i'm trying to retreive data for my first class things using REST API i'm getting this json object :
{
"results": [
{
"Name": "Apple",
"createdAt": "2015-07-12T02:50:20.291Z",
"objectId": "JDFHSJFxv",
"category": {
"__type": "Pointer",
"className": "Teams",
"objectId": "QSGKqf343"
},
"updatedAt": "2015-07-12T02:55:33.696Z"
}
]
}
the json doesn't contains all the data included in the object i'm pointing to, I will have to make another request to get all the data of that object,
is There any way to fix that
You need to tell Parse to return the related object in your query, via the include key.
e.g., add the following to your CURL --data-urlencode 'include=category'

Parsing and manipulating json in Scala

I have this JSON that is returned from a REST-service I'm using.
{
"id": "6804",
"signatories": [
{
"id": "12125",
"fields": [
{
"type": "standard",
"name": "fstname",
"value": "John"
},
{
"type": "standard",
"name": "sndname",
"value": "Doe"
},
{
"type": "standard",
"name": "email",
"value": "john.doe#somwhere.com"
},
{
"type": "standard",
"name": "sigco",
"value": "Company"
}
]
}
]
}
Currently I'm looking into a way to parse this with json4s, iterating over the "fields" array, to be able to change the property "value" of the different objects in there. So far I've tried a few json libs and ended up with json4s.
Json4s allows me to parse the json into a JObject, which I can try extract the "fields" array
from.
import org.json4s._
import org.json4s.native.JsonMethods._
// parse to JObject
val data = parse(json)
// extract the fields into a map
val fields = data \ "signatories" \ "fields"
// parse back to JSON
println(compact(render(fields)))
I've managed to extract a Map like this, and rendered it back to JSON again. What I can't figure out though is, how to loop through these fields and change the property "value" in them?
I've read the json4s documentation but I'm very new to both Scala and it's syntax so I'm having a difficult time.
The question becomes, how do I iterate over a parsed JSON result, to change the property "value"?
Here's the flow I want to achieve.
Parse JSON into iterable object
Loop through and look for certain "names" and change their value, for example fstname, from John to some other name.
Parse it back to JSON, so I can send the new JSON with the updated values back.
I don't know if this is the best way to do this at all, I'd really appreciate input, maybe there's an easier way to do this.
Thanks in advance,
Best regards,
Stefan Konno
You can convert the json into an array of case class which is the easiest thing to do. For example: you can have case class for Fields like
case class Field(`type`: String, name: String, value: String)
and you can convert your json into array of fields like read[Array[Field]](json) where json is
[
{
"type": "standard",
"name": "fstname",
"value": "John"
},
...
]
which will give you an array of fields. Similarly, you can model for your entire Json.
As now you have an array of case classes, its pretty simple to iterate the objects and change the value using case classes copy method.
After that, to convert the array of objects into Json, you can simply use write(objects) (read and write functions of Json4s are available in org.json4s.native.Serialization package.
Update
To do it without converting it into case class, you can use transformField function
parse(json).transformField{case JField(x, v) if x == "value" && v == JString("Company")=> JField("value1",JString("Company1"))}

Access object returned from Newtonsoft json DeserializeObject

Should be a no brainer, but I'm can't seem to access the elements returned from Newtonsoft's json deserializer.
Example json:
{
"ns0:Test": {
"xmlns:ns0": "http:/someurl",
"RecordCount": "6",
"Record": [{
"aaa": "1",
"bbb": "2",
},
{
"aaa": "1",
"bbb": "2",
}]
}
}
var result = Newtonsoft.Json.JsonConvert.DeserializeObject<dynamic>(somestring);
Stripping out the json up to the Record text, i can access the data without issue.
i.e. result.Recordcount
If i leave the json as shown above, can someone enlighten me how to access Recordcount?
All inputs appreciated. Thanks!
For those JSON properties that have punctuation characters or spaces (such that they cannot be made into valid C# property names), you can use square bracket syntax to access them.
Try this:
int count = result["ns0:Test"].RecordCount;