I am trying to read a simple json file into a dataframe. Here is the JSON:
{
"apiName": "bctt-e-card-operations",
"correlationId": "bctt-e-card-operations",
"msg": "{ "MSG_TIP": "1261", "MSG_VER": "02", "LOG_SIS": "32", "LOG_PERN01": "2122", "LOG_NUMN01": "30108916", "MSG_RESTIPA00": "0", "MSG_IDE": "00216452708916", "MSG_REACOD": "000", "SAN_NUM": "000010X01941XXX", "EXT_MOECOD": "978", "SAN_SDIMNT": "00000043830", "LOG_SINMOV": "C", "SAN_SAUMNT": "00000043830", "LOG_SINMOV": "C", "SAN_SCOMNT": "00000043830", "LOG_SINMOV": "C", "SAN_SCODAT": "20220502", "SDF_FORDAD": "0", "CAR_PAGXTR": "0", "CLI_LIMCRE": "0000000", "SDF_AUTCAT": "000000000", "LOG_SINMOV": "C", "SDF_SLDDIV": "000000000", "CLI_SDICSH": "000000000", "LOG_SINMOV": "C", "CLI_SDICPR": "000000000", "LOG_SINMOV": "C", "MSG_DADLGT": "0000" }"
"contentType": "application/json",
"msgFormat": "SIBS"
"step": "response",
"status": "0",
"transTimestamp": "2022-05-02 16:45:28.487",
"operationMetadata": {
"msgType": "PAYMENT",
"msgSubtype": "116100-02300",
"accountId": "10X01941XXX",
"cardNumber": "451344X063617XXX",
"reference": "212230108916",
"originalReference": null,
"reversalReference": null,
"timeout": false
"flow": "PRT"
},
"error": {
"errorCode": null
"errorDescription": null
}
}
Here is my code to read the file
file_location = "/mnt/test/event example.json"
df = spark.read.option("multiline", "true").json(file_location, schema=schema)
display(df)
But as you can see, the JSON file is missing some commas, and the 'msg' key has its value wrapped in quotes. This makes the dataframe return nothing but nulls.
Is there a way to reformat the JSON in PySpark (the file comes like that from the source), removing the quotes and adding the missing commas, so the JSON is properly formatted?
Thanks in advance
You could read the JSON in as a string and use Python's regex sub() function to reformat it.
Example:
import re

json_string = '''"contentType": "application/json",
"msgFormat": "SIBS"
"step": "response",'''

# add the missing comma after the "SIBS" value
x = re.sub('"SIBS"', '"SIBS",', json_string)
print(x)

Which prints:
"contentType": "application/json",
"msgFormat": "SIBS",
"step": "response",
In this case, the function looks for the value "SIBS" and replaces it with "SIBS", (adding the trailing comma). You might need to play around with it and escape any special regex characters.
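Building on the same idea, here is a fuller sketch for the file in the question. It is only a sketch under two assumptions: the defects are exactly (a) the msg object wrapped in quotes and (b) commas missing wherever a line ends in a value and the next line starts a new key. The fix_json helper is hypothetical, not a library function; after repair the text parses with plain json.loads.

```python
import json
import re

def fix_json(raw: str) -> str:
    # Unwrap the quoted "msg" value:  "msg": "{ ... }"  ->  "msg": { ... }
    raw = re.sub(r'"msg":\s*"(\{.*?\})"', r'"msg": \1', raw, flags=re.S)
    # Insert the missing comma when a line ends in a value ("...", a digit,
    # true, false or null) and the next line starts with a quoted key
    raw = re.sub(r'("|\d|true|false|null)[ \t]*\n([ \t]*")', r'\1,\n\2', raw)
    return raw

# A small sample with the same two defects as the file in the question
broken = '{\n"msg": "{ "x": "1" }",\n"a": "2"\n"b": true\n}'
parsed = json.loads(fix_json(broken))
```

In PySpark you could read the raw file with spark.sparkContext.wholeTextFiles(file_location), map fix_json over the file contents, and feed the repaired strings to spark.read.json.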
I have a socket response that is this:
{"op":0,"d":{"author":{"id":"6699457769390473216","name":"Test","verified":false},"unixTime":1597277057132,"id":"6699465549836976128","group":"64632423765273287342","content":"Yo","_id":"5f34838198980c0023fa49e3"},"t":"MESSAGE"}
and I need to access the "d" object. I've tried doing
print(JSON(data)["d"])
and it just returns null every time.
If data is of type String, you are probably using the wrong init method to initialize the JSON object. Try using init(parseJSON:) like this:
let jsonString = """
{
"op": 0,
"d": {
"author": {
"id": "6699457769390473216",
"name": "Test",
"verified": false
},
"unixTime": 1597277057132,
"id": "6699465549836976128",
"group": "64632423765273287342",
"content": "Yo",
"_id": "5f34838198980c0023fa49e3"
},
"t": "MESSAGE"
}
"""
let json = JSON(parseJSON: jsonString)
print(json["d"])
I have input
data = [
{
"details": [
{
"health": "Good",
"id": "1",
"timestamp": 1579155574
},
{
"health": "Bad",
"id": "1",
"timestamp": 1579155575
}
]
},
{
"details": [
{
"health": "Good",
"id": "2",
"timestamp": 1588329978
},
{
"health": "Good",
"device_id": "2",
"timestamp": 1588416380
}
]
}
]
Now I want to convert it to CSV, something like below:
id,health
1,Good - 1579155574,Bad - 1579155575
2,Good - 1588329978,Good - 1588416380
Is this possible?
Currently I am converting it to a plain CSV; my code and its output are below:
f = csv.writer(open("test.csv", "w", newline=""))
f.writerow(["id", "health", "timestamp"])
for record in data:
    for details in record["details"]:
        f.writerow([details["id"],
                    details["health"],
                    details["timestamp"],
                    ])
Response:
id,health,timestamp
1,Good,1579155574
1,Bad,1579155575
2,Good,1579261319
2,Good,1586911295
So how could I get the expected output? I am using python3.
You have almost done the job; I don't think you need the csv module here.
(The .csv extension is just a name that tells people what the file is; CSV, TXT and JSON files are all plain text to the computer.)
I don't know the full shape of your data, but you can build the output you want directly:
output = 'id,health\n'
for record in data:
    # take the id from the first detail entry (falling back to "device_id")
    first = record["details"][0]
    output += f'{first.get("id", first.get("device_id"))},'
    for d in record["details"]:
        output += f'{d["health"]} - {d["timestamp"]},'
    output = output[:-1] + '\n'  # swap the trailing comma for a newline

with open('test.csv', 'w') as op:
    op.write(output)
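If you would rather keep the csv module from your original code, the same grouping works with csv.writer; the rows simply get a variable number of columns, matching the expected output above. This sketch assumes the id always lives in the first detail entry (falling back to device_id, as in your second record):

```python
import csv
import io

data = [
    {"details": [
        {"health": "Good", "id": "1", "timestamp": 1579155574},
        {"health": "Bad", "id": "1", "timestamp": 1579155575},
    ]},
    {"details": [
        {"health": "Good", "id": "2", "timestamp": 1588329978},
        {"health": "Good", "device_id": "2", "timestamp": 1588416380},
    ]},
]

buf = io.StringIO()  # swap in open("test.csv", "w", newline="") to write a real file
writer = csv.writer(buf)
writer.writerow(["id", "health"])
for record in data:
    details = record["details"]
    row_id = details[0].get("id", details[0].get("device_id"))
    # one "health - timestamp" cell per detail entry
    writer.writerow([row_id] + [f'{d["health"]} - {d["timestamp"]}' for d in details])
```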
I have captured two sets of values using the JSON Extractor in JMeter, and I want to concatenate them. Below is an example of the format I want.
The following are the two sets of captured values:
Set 1: [V2520 V2522 V2521 V2500 V2500]
Set 2: [PL PL PL NP NP]
So from the above sets I am looking for something like the following value, because the body I have to send in a subsequent call contains the combination of these two values:
Answer: ["V2520PL", "V2522PL", "V2521PL", "V2500NP", "V2500NP"]
Can you please help me solve this in JMeter using Groovy?
This is the JSON I have:
{ "body": {
"responseObject": [
{
"benefitInfo": [
{
"procedureCode": "V2520",
"modifier": "PL",
"usage": "Dress",
"authorizationID": null,
"description": "ContactLensDisposable",
"id": "96",
"coPayAmount": "25"
},
{
"procedureCode": "V2522",
"modifier": "PL",
"usage": "Dress",
"authorizationID": null,
"description": "ContactLensDisposableBifocal",
"id": "98",
"coPayAmount": "25"
},
{
"procedureCode": "V2521",
"modifier": "PL",
"usage": "Dress",
"authorizationID": null,
"description": "ContactLensDisposableToric",
"id": "97",
"coPayAmount": "25"
},
{
"procedureCode": "V2500",
"modifier": "NP",
"usage": "Dress",
"authorizationID": null,
"description": "ContactLens (Non Plan)",
"id": "89",
"coPayAmount": "0"
},
{
"procedureCode": "V2500",
"modifier": "NP",
"usage": "Dress",
"authorizationID": null,
"description": "ContactLensConventional (Non Plan)",
"id": "157",
"coPayAmount": "0"
}
]
}
]}}
An easy way to do this is to combine them as you collect the values from the JSON when you parse it.
def json = new groovy.json.JsonSlurper().parseText(text)
def answer = json.body.responseObject[0].benefitInfo.collect { it.procedureCode + it.modifier }
assert answer == ["V2520PL", "V2522PL", "V2521PL", "V2500NP", "V2500NP"]
Another method would be to use transpose() and join():
def r = new groovy.json.JsonSlurper().parseText(text).body.responseObject.benefitInfo[0]
def answer = [r.procedureCode, r.modifier].transpose()*.join()
assert answer == ["V2520PL", "V2522PL", "V2521PL", "V2500NP", "V2500NP"]
Add a JSR223 PostProcessor as a child of the request which returns the above JSON.
Put the following code into "Script" area:
def answer = []
def benefitInfos = com.jayway.jsonpath.JsonPath.read(prev.getResponseDataAsString(), '$..benefitInfo')
benefitInfos.each { benefitInfo ->
    benefitInfo.each { entry ->
        answer.add(entry.get('procedureCode') + entry.get('modifier'))
    }
}
vars.put('answer', new groovy.json.JsonBuilder(answer).toPrettyString())
That's it; you will be able to access the generated value as ${answer} wherever required.
References:
Jayway JsonPath
Groovy: Parsing and producing JSON
Apache Groovy - Why and How You Should Use It
I want to parse a JSON file in Spark 2.0 (Scala), and then save the data in a Hive table.
How can I parse the JSON file using Scala?
Example JSON file (metadata.json):
{
"syslog": {
"month": "Sep",
"day": "26",
"time": "23:03:44",
"host": "cdpcapital.onmicrosoft.com"
},
"prefix": {
"cef_version": "CEF:0",
"device_vendor": "Microsoft",
"device_product": "SharePoint Online",
},
"extensions": {
"eventId": "7808891",
"msg": "ManagedSyncClientAllowed",
"art": "1506467022378",
"cat": "SharePoint",
"act": "ManagedSyncClientAllowed",
"rt": "1506466717000",
"requestClientApplication": "Microsoft SkyDriveSync",
"cs1": "0bdbe027-8f50-4ec3-843f-e27c41a63957",
"cs1Label": "Organization ID",
"cs2Label": "Modified Properties",
"ahost": "cdpdiclog101.cgimss.com",
"agentZoneURI": "/All Zones",
"amac": "F0-1F-AF-DA-8F-1B",
"av": "7.6.0.8009.0",
}
},
Thanks
You can use something like:
val jsonDf = sparkSession
  .read
  //.option("wholeFile", true) // if it is not a single-line JSON file
  .json("resources/json/metadata.json")

jsonDf.printSchema()
jsonDf.createOrReplaceTempView("metadata") // registerTempTable is deprecated as of Spark 2.0
More details here: https://spark.apache.org/docs/latest/sql-programming-guide.html#hive-tables
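As a side note on what the inferred schema gives you: nested objects such as syslog become struct columns, addressed with dotted names in SQL (e.g. SELECT syslog.month FROM metadata). A minimal pure-Python sketch of that flattening, using a hypothetical flatten helper and a trimmed sample of the file above, just to illustrate the mapping:

```python
import json

def flatten(obj, prefix=""):
    """Flatten nested dicts into dotted names, mirroring how the
    struct fields inferred by Spark are addressed in SQL."""
    out = {}
    for key, value in obj.items():
        name = prefix + key
        if isinstance(value, dict):
            out.update(flatten(value, prefix=name + "."))
        else:
            out[name] = value
    return out

raw = '{"syslog": {"month": "Sep", "day": "26"}, "extensions": {"eventId": "7808891"}}'
row = flatten(json.loads(raw))
```

For the Hive step itself, once the view exists you can run sparkSession.sql(...) against it, or write the DataFrame with jsonDf.write.saveAsTable(...) if Hive support is enabled in the session.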
In a Django view, I want to output a queryset converted to JSON without the model, pk and fields wrapper text.
my view code:
s = serializers.serialize('json', Item.objects.filter(id=actuators_id))
o = s.strip("[]")
return HttpResponse(o, content_type="application/json")
What I get is this:
{"model": "actuators.acutatoritem", "pk": 1, "fields": {"name": "Environment Heater", "order": 1, "controlid": "AAHE", "index": "1", "param1": "", "param2": "", "param3": "", "current_state": "unknown"}}
What I spend all day NOT getting is this:
{"name": "Environment Heater", "order": 1, "controlid": "AAHE", "index": "1", "param1": "", "param2": "", "param3": "", "current_state": "unknown"}
How can I strip the model, pk and fields text from my output?
Use simplejson to convert the serialized string to Python objects, then pick out the part you need:
import simplejson

s = serializers.serialize('json', Item.objects.filter(id=actuators_id))
js = simplejson.loads(s)
# select the "fields" dict of the first (only) serialized object
fields = js[0]['fields']
return HttpResponse(simplejson.dumps(fields), content_type="application/json")
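On recent Python/Django versions you do not need simplejson at all; the standard-library json module does the same job. The sketch below works on a hard-coded sample string standing in for the serializer output (so it runs outside Django); in a real view you would typically return JsonResponse(fields) instead of building the HttpResponse by hand.

```python
import json

# stands in for serializers.serialize('json', Item.objects.filter(id=...))
serialized = ('[{"model": "actuators.acutatoritem", "pk": 1, '
              '"fields": {"name": "Environment Heater", "order": 1}}]')

# drop the model/pk wrapper and keep only the inner fields dict
fields = json.loads(serialized)[0]["fields"]
payload = json.dumps(fields)
```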