Groovy - Parse XML Response, Select Fields, Create New File

I think I've read every Groovy parsing question on here but I can't seem to find my exact scenario so I'm reaching out for help - please be kind, I'm new to Groovy and I've really bitten off more than I can chew in this latest endeavor.
So I have this XML Response:
<?xml version="1.0" encoding="UTF-8"?>
<worklogs date_from="2020-04-19 00:00:00" date_to="2020-04-25 23:59:59" number_of_worklogs="60" format="xml" diffOnly="false" errorsOnly="false" validOnly="false" addDeletedWorklogs="true" addBillingInfo="false" addIssueSummary="true" addIssueDescription="false" duration_ms="145" headerOnly="false" userName="smm288" addIssueDetails="false" addParentIssue="false" addUserDetails="true" addWorklogDetails="false" billingKey="" issueKey="" projectKey="" addApprovalStatus="true" >
    <worklog>
        <worklog_id></worklog_id>
        <jira_worklog_id></jira_worklog_id>
        <issue_id></issue_id>
        <issue_key></issue_key>
        <hours></hours>
        <work_date></work_date>
        <username></username>
        <staff_id />
        <billing_key></billing_key>
        <billing_attributes></billing_attributes>
        <activity_id></activity_id>
        <activity_name></activity_name>
        <work_description></work_description>
        <parent_key></parent_key>
        <reporter></reporter>
        <external_id />
        <external_tstamp />
        <external_hours></external_hours>
        <external_result />
        <customField_11218></customField_11218>
        <customField_12703></customField_12703>
        <customField_12707></customField_12707>
        <hash_value></hash_value>
        <issue_summary></issue_summary>
        <user_details>
            <full_name></full_name>
            <email></email>
            <user-prop key="auto_approve_timesheet"></user-prop>
            <user-prop key="cris_id"></user-prop>
            <user-prop key="iqn_gl_string"></user-prop>
            <user-prop key="is_contractor"></user-prop>
            <user-prop key="is_employee"></user-prop>
            <user-prop key="it_leadership"></user-prop>
            <user-prop key="primary_role"></user-prop>
            <user-prop key="resource_manager"></user-prop>
            <user-prop key="team"></user-prop>
        </user_details>
        <approval_status></approval_status>
        <timesheet_approval>
            <status></status>
            <status_date></status_date>
            <reviewer></reviewer>
            <actor></actor>
            <comment></comment>
        </timesheet_approval>
    </worklog>
....
....
</worklogs>
And I'm retrieving this XML Response from an API call so the response is held within an object. NOTE: The sample XML above is from Postman.
What I'm trying to do is the following:
1. Only retrieve certain values from this response from all the nodes.
2. Write the values collected to a .json file.
I've created a map but now I'm kind of stuck on how to parse through it and create a .json file out of the fields I want.
This is what I have thus far:
@Grab('org.codehaus.groovy.modules.http-builder:http-builder:0.7.1')
@Grab('oauth.signpost:signpost-core:1.2.1.2')
@Grab('oauth.signpost:signpost-commonshttp4:1.2.1.2')
import groovyx.net.http.RESTClient
import groovyx.net.http.Method
import static groovyx.net.http.ContentType.*
import groovyx.net.http.HttpResponseException
import groovy.json.JsonBuilder
import groovy.json.JsonOutput
import groovy.json.*
// User Credentials
def jiraAuth = ""
// JIRA Endpoints
def jiraUrl = "" //Dev
//def jiraUrl = "" //Production
// Tempo API Tokens
def tempoApiToken = "" //Dev
//def tempoApiToken = "" //Production
// Define Weekly Date Range
def today = new Date()
def lastPeriodStart = today - 8
def lastPeriodEnd = today - 2
def dateFrom = lastPeriodStart.format("yyyy-MM-dd")
def dateTo = lastPeriodEnd.format("yyyy-MM-dd")
def jiraClient = new RESTClient(jiraUrl)
jiraClient.ignoreSSLIssues()
def headers = [
    "Authorization"    : "Basic " + jiraAuth,
    "X-Atlassian-token": "no-check",
    "Content-Type"     : "application/json"
]
def response = jiraClient.get(
    path: "",
    query: [
        tempoApiToken: "${tempoApiToken}",
        format: "xml",
        dateFrom: "${dateFrom}",
        dateTo: "${dateTo}",
        addUserDetails: "true",
        addApprovalStatus: "true",
        addIssueSummary: "true"
    ],
    headers: headers
) { response, worklogs ->
    println "Processing..."
    // Start building the output - creates a worklog map per <worklog> node
    worklogs.worklog.each { worklognodes ->
        def workLog = convertToMap(worklognodes)
        // Print out the map
        println(workLog)
    }
}
// Helper method: recursively convert a parsed XML node to a Map,
// using the "key" attribute as the map key for <user-prop> elements
def convertToMap(nodes) {
    nodes.children().collectEntries {
        if (it.name() == 'user-prop') {
            [it['@key'], it.childNodes() ? convertToMap(it) : it.text()]
        } else {
            [it.name(), it.childNodes() ? convertToMap(it) : it.text()]
        }
    }
}
I'm only interested in parsing out the following fields from each node:
<worklogs>
    <worklog>
        <hours>
        <work_date>
        <billing_key>
        <customField_11218>
        <issue_summary>
        <user_details>
            <full_name>
            <user-prop key="auto_approve_timesheet">
            <user-prop key="it_leadership">
            <user-prop key="resource_manager">
            <user-prop key="team">
            <user-prop key="cris_id">
            <user-prop key="iqn_id">
        <approval_status>
    </worklog>
    ...
</worklogs>
I've tried the following:
1. Converting workLog to a JSON string (JsonOutput.toJson) and then pretty-printing it (JsonOutput.prettyPrint) - but this just returns a collection of JSON responses I can't do anything with (the thought process was: this is as good as I can get, so I'll run the file through a .json-to-.csv converter and get rid of what I don't want) - which is not the solution I ultimately want.
2. Printing the map workLog - this just returns little collections I can't do anything with either.
3. Creating a new file with File and writing workLog out as a .json file - but again, it doesn't translate well.
The results of the println for workLog are below (just so everyone can see that the response is being held and that the map matches the XML response).
[worklog_id: , jira_worklog_id: , issue_id: , issue_key: , hours: , work_date: , username: , staff_id: , billing_key: , billing_attributes: , activity_id: , activity_name: , work_description: , parent_key: , reporter: , external_id:, external_tstamp:, external_hours: , external_result:, customField_11218: , hash_value: , issue_summary: , user_details:[full_name: , email: , auto_approve_timesheet: , cris_id: , iqn_gl_approver: , iqn_gl_string: , iqn_id: , is_contractor: , is_employee: , it_leadership: , primary_role: , resource_manager: , team: ], approval_status: , timesheet_approval:[status: ]]
I would so appreciate it if anyone could offer some insights on how to move forward or even documentation that has good examples of what I'm trying to achieve (Apache's documentation is sorely lacking in examples, in my opinion).

It's not all of the way there, but I was able to get a JSON file created from the XML and map. From there I can use the .json file to create a .csv and then get rid of the columns I don't want.
// Define Weekly Date Range
def today = new Date()
def lastPeriodStart = today - 8
def lastPeriodEnd = today - 2
def dateFrom = lastPeriodStart.format("yyyy-MM-dd")
def dateTo = lastPeriodEnd.format("yyyy-MM-dd")
def jiraClient = new RESTClient(jiraUrl)
jiraClient.ignoreSSLIssues()
// Creates and begins the file
File file = new File("${dateFrom}_RPT05.json")
file.write("")
file.append("[\n")
// Counters used to place commas between the array elements
def arrplace = 0
def arrsize = 0
def headers = [
    "Authorization"    : "Basic " + jiraAuth,
    "X-Atlassian-token": "no-check",
    "Content-Type"     : "application/json"
]
def response = jiraClient.get(
    path: "/plugins/servlet/tempo-getWorklog/",
    query: [
        tempoApiToken: "${tempoApiToken}",
        format: "xml",
        dateFrom: "${dateFrom}",
        dateTo: "${dateTo}",
        addUserDetails: "true",
        addApprovalStatus: "true",
        addIssueSummary: "true"
    ],
    headers: headers
) { response, worklogs ->
    println "Processing..."
    // Gets the size of the array (number of <worklog> nodes)
    worklogs.worklog.each { worklognodes ->
        arrsize = arrsize + 1
    }
    // Start building the output - creates a worklog map
    worklogs.worklog.each { worklognodes ->
        worklognodes = convertToMap(worklognodes)
        // Convert map to a JSON string
        def json_str = JsonOutput.toJson(worklognodes)
        // Adds row to file
        file.append(json_str)
        arrplace = arrplace + 1
        if (arrplace < arrsize) {
            file.append(",")
        }
        file.append("\n")
        print "."
    }
}
file.append("]")
// Helper method (same as above): recursively convert a node to a Map,
// keyed by the "key" attribute for <user-prop> elements
def convertToMap(nodes) {
    nodes.children().collectEntries {
        if (it.name() == 'user-prop') {
            [it['@key'], it.childNodes() ? convertToMap(it) : it.text()]
        } else {
            [it.name(), it.childNodes() ? convertToMap(it) : it.text()]
        }
    }
}
The output is a collection/array of worklogs.
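If you only want the specific fields listed in the question, you can also filter each converted map before writing and let JsonOutput produce the whole array, which removes the manual bracket and comma bookkeeping. A rough, untested sketch (placed inside the same response closure, assuming the convertToMap helper above and the field names from the sample XML):
import groovy.json.JsonOutput

def wanted = ['hours', 'work_date', 'billing_key', 'customField_11218',
              'issue_summary', 'approval_status']
def userProps = ['full_name', 'auto_approve_timesheet', 'it_leadership',
                 'resource_manager', 'team', 'cris_id', 'iqn_id']
def rows = []
worklogs.worklog.each { worklognodes ->
    def workLog = convertToMap(worklognodes)
    // Keep only the wanted top-level keys
    def row = workLog.subMap(wanted)
    // Flatten the nested user_details entries into the same row
    row << (workLog.user_details ?: [:]).subMap(userProps)
    rows << row
}
// One write for the whole array, pretty-printed
new File("${dateFrom}_RPT05.json").text =
        JsonOutput.prettyPrint(JsonOutput.toJson(rows))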

Related

Pulling specific Parent/Child JSON data with Python

I'm having a difficult time figuring out how to pull specific information from a json file.
So far I have this:
# Import json library
import json

# Open json database file
with open('jsondatabase.json', 'r') as f:
    data = json.load(f)

# Assign variables from json data and convert to usable information
identifier = str(data['ID'])
name = str(data['name'])

# Collect data from user to compare with data in json file
print("Please enter your numerical identifier and name: ")
user_id = input("Numerical identifier: ")
user_name = input("Name: ")

if user_id == identifier and user_name == name:
    print("Your inputs matched. Congrats.")
else:
    print("Your inputs did not match our data. Please try again.")
And that works great for a simple JSON file like this:
{
    "ID": "123",
    "name": "Bobby"
}
But ideally I need to create a more complex JSON file and can't find deeper information on how to pull specific information from something like this:
{
    "Parent": [
        {
            "Parent_1": [
                {
                    "Name": "Bobby",
                    "ID": "123"
                }
            ],
            "Parent_2": [
                {
                    "Name": "Linda",
                    "ID": "321"
                }
            ]
        }
    ]
}
Here is an example that you might be able to pick apart.
You could either:
Make a custom de-jsonify object_hook as shown below and do something with it. There is a good tutorial here.
Just gobble up the whole dictionary you get without a custom de-jsonify and drill down into it to make a list or set of the results (a short sketch of this follows after the example).
Example:
import json
from collections import namedtuple

data = '''
{
    "Parents":
    [
        {
            "Name": "Bobby",
            "ID": "123"
        },
        {
            "Name": "Linda",
            "ID": "321"
        }
    ]
}
'''

Parent = namedtuple('Parent', ['name', 'id'])

def dejsonify(json_dict: dict):
    # Turn any dict that has a "Name" key into a Parent tuple
    if json_dict.get("Name"):
        return Parent(json_dict.get('Name'), int(json_dict.get('ID')))
    return json_dict

res = json.loads(data, object_hook=dejsonify)
print(res)

# then we can do whatever... if you need lookups by name/id,
# we could put the result into a dictionary
all_parents = {(p.name, p.id): p for p in res['Parents']}
lookup_from_input = ('Bobby', 123)
print(f'found match: {all_parents.get(lookup_from_input)}')
Result:
{'Parents': [Parent(name='Bobby', id=123), Parent(name='Linda', id=321)]}
found match: Parent(name='Bobby', id=123)
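For the second option (no object_hook), you just index into the parsed structure. A sketch against the asker's nested layout, with a hypothetical file name:
import json

with open('jsondatabase.json', 'r') as f:
    data = json.load(f)

# "Parent" holds a one-element list wrapping a dict of Parent_1, Parent_2, ...
for parent_key, records in data['Parent'][0].items():
    for record in records:  # each value is itself a list of dicts
        print(parent_key, record['Name'], record['ID'])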

creating a nested json document

I have the below document in MongoDB, and I am using the below Python code to save it to a .json file.
file = 'employee'
json_cur = find_document(file)
count_document = emp_collection.count_documents({})
with open(file_path, 'w') as f:
    f.write('[')
    for i, document in enumerate(json_cur, 1):
        print("document : ", document)
        f.write(dumps(document))
        if i != count_document:
            f.write(',')
    f.write(']')
The output is:
{
    "_id": {
        "$oid": "611288c262c5c14df84f649b"
    },
    "Lname": "Borg",
    "Fname": "James",
    "Dname": "Headquarters",
    "Projects": "[{"HOURS": 5.0, "PNAME": "Reorganization", "PNUMBER": 20}]"
}
But I need it like this (the Projects value without quotes):
{
    "_id": {
        "$oid": "611288c262c5c14df84f649b"
    },
    "Lname": "Borg",
    "Fname": "James",
    "Dname": "Headquarters",
    "Projects": [{"HOURS": 5.0, "PNAME": "Reorganization", "PNUMBER": 20}]
}
Could anyone please help me to resolve this?
Thanks,
Jay
You should parse the JSON from the Projects field, like this:
from json import loads
document['Projects'] = loads(document['Projects'])
So,
file = 'employee'
json_cur = find_document(file)
count_document = emp_collection.count_documents({})
with open(file_path, 'w') as f:
    f.write('[')
    for i, document in enumerate(json_cur, 1):
        document['Projects'] = loads(document['Projects'])
        print("document : ", document)
        f.write(dumps(document))
        if i != count_document:
            f.write(',')
    f.write(']')
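One caveat: if some documents already store Projects as a real array rather than a JSON string, loads() would fail on them, so a guarded version (my addition, not part of the original answer) is safer:
from json import loads

# Only decode when the field actually holds a JSON string
if isinstance(document.get('Projects'), str):
    document['Projects'] = loads(document['Projects'])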

Executing Groovy script from JSON

If I want to send a multiline string in JSON using curl, the only way is to concatenate the lines with spaces. That works. But what do I do if I'm sending a Groovy script that will be run, like in the content property?
{
    "name": "helloWorld",
    "content": "repository.createDockerHosted('docker_private', 8082_1, null) repository.createDockerHosted('docker_private_2', 8082, null)",
    "type": "groovy"
}
Is it possible to inform the Groovy compiler that there are two instructions? Maybe by adding some special character to the content value?
Thanks for the help :)
To execute a string as Groovy script, the standard pattern is to use GroovyShell with a Binding:
class Repository {
    def createDockerHosted(a, b, c) {
        println "TRACER createDockerHosted $a $b $c"
    }
}

def content = "repository.createDockerHosted('docker_private', 8082_1, null) ; repository.createDockerHosted('docker_private_2', 8082, null)"

def binding = new Binding()
binding.setProperty('repository', new Repository())
def shell = new GroovyShell(binding)
shell.evaluate(content)
(Note that in the above, content uses ; to separate the two statements.) Output:
$ groovy Example.groovy
TRACER createDockerHosted docker_private 80821 null
TRACER createDockerHosted docker_private_2 8082 null
So for your situation (if I understand it), the only issue is how to extract content from JSON:
def jsonStr = '''
{
    "name": "helloWorld",
    "content": "repository.createDockerHosted('docker_private', 8082_1, null) ; repository.createDockerHosted('docker_private_2', 8082, null)",
    "type": "groovy"
}
'''
def jsonMap = new groovy.json.JsonSlurper().parseText(jsonStr)
def content = jsonMap['content']
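Putting the two pieces together, the string pulled out of the JSON can then be evaluated with the same Binding/GroovyShell pattern as the first snippet (assuming the Repository class defined there):
// Evaluate the script text extracted from the JSON payload
def binding = new Binding()
binding.setProperty('repository', new Repository())
new GroovyShell(binding).evaluate(content)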

Emit Python embedded object as native JSON in YAML document

I'm importing webservice tests from Excel and serialising them as YAML.
But taking advantage of YAML being a superset of JSON, I'd like the request part of the test to be valid JSON, i.e. to have delimiters, quotes and commas.
This will allow us to cut and paste requests between the automated test suite and manual test tools (e.g. Postman.)
So here's how I'd like a test to look (simplified):
- properties:
    METHOD: GET
    TYPE: ADDRESS
    Request URL: /addresses
    testCaseId: TC2
    request:
      {
          "unitTypeCode": "",
          "unitNumber": "15",
          "levelTypeCode": "L",
          "roadNumber1": "810",
          "roadName": "HAY",
          "roadTypeCode": "ST",
          "localityName": "PERTH",
          "postcode": "6000",
          "stateTerritoryCode": "WA"
      }
In Python, my request object has a dict attribute called fields which is the part of the object to be serialised as JSON. This is what I tried:
import yaml

def request_presenter(dumper, request):
    json_string = json.dumps(request.fields, indent=8)
    return dumper.represent_str(json_string)

yaml.add_representer(Request, request_presenter)

test = Test(...)  # including the embedded request object
serialised_test = yaml.dump(test)
I'm getting:
- properties:
    METHOD: GET
    TYPE: ADDRESS
    Request URL: /addresses
    testCaseId: TC2
    request: "{
        \"unitTypeCode\": \"\",\n
        \"unitNumber\": \"15\",\n
        \"levelTypeCode\": \"L\",\n
        \"roadNumber1\": \"810\",\n
        \"roadName\": \"HAY\",\n
        \"roadTypeCode\": \"ST\",\n
        \"localityName\": \"PERTH\",\n
        \"postcode\": \"6000\",\n
        \"stateTerritoryCode\": \"WA\"\n
    }"
...only worse because it's all on one line and has white space all over the place.
I tried using the | style for literal multi-line strings which helps with the line breaks and escaped quotes (it's more involved but this answer was helpful.) However, escaped or multiline, the result is still a string that will need to be parsed separately.
How can I stop PyYaml analysing the JSON block as a string and make it just accept a block of text as part of the emitted YAML? I'm guessing it's something to do with overriding the emitter but I could use some help. If possible I'd like to avoid post-processing the serialised test to achieve this.
Ok, so this was the solution I came up with. Generate the YAML with a placemarker ahead of time. The placemarker marks the place where the JSON should be inserted, and also defines the root-level indentation of the JSON block.
import os
import itertools
import json

def insert_json_in_yaml(pre_insert_yaml, key, obj_to_serialise):
    marker = '%s: null' % key
    marker_line = line_of_first_occurrence(pre_insert_yaml, marker)
    marker_indent = string_indent(marker_line)
    serialised = json.dumps(obj_to_serialise, indent=marker_indent + 4)
    key_with_json = '%s: %s' % (key, serialised)
    serialised_with_json = pre_insert_yaml.replace(marker, key_with_json)
    return serialised_with_json

def line_of_first_occurrence(basestring, substring):
    """
    return the line containing the first occurrence of substring
    """
    lineno = lineno_of_first_occurrence(basestring, substring)
    return basestring.split(os.linesep)[lineno]

def string_indent(s):
    """
    return indentation of a string (number of spaces before a nonspace)
    """
    spaces = ''.join(itertools.takewhile(lambda c: c == ' ', s))
    return len(spaces)

def lineno_of_first_occurrence(basestring, substring):
    """
    return line number of first occurrence of substring
    """
    return basestring[:basestring.index(substring)].count(os.linesep)

embedded_object = {
    "unitTypeCode": "",
    "unitNumber": "15",
    "levelTypeCode": "L",
    "roadNumber1": "810",
    "roadName": "HAY",
    "roadTypeCode": "ST",
    "localityName": "PERTH",
    "postcode": "6000",
    "stateTerritoryCode": "WA"
}

yaml_string = """
---
- properties:
    METHOD: GET
    TYPE: ADDRESS
    Request URL: /addresses
    testCaseId: TC2
    request: null
    after_request: another value
"""
>>> print(insert_json_in_yaml(yaml_string, 'request', embedded_object))

- properties:
    METHOD: GET
    TYPE: ADDRESS
    Request URL: /addresses
    testCaseId: TC2
    request: {
        "unitTypeCode": "",
        "unitNumber": "15",
        "levelTypeCode": "L",
        "roadNumber1": "810",
        "roadName": "HAY",
        "roadTypeCode": "ST",
        "localityName": "PERTH",
        "postcode": "6000",
        "stateTerritoryCode": "WA"
    }
    after_request: another value
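Since JSON is valid YAML, the merged document should load back as real structure rather than a string; a quick sanity check (my sketch, not part of the original solution, and assuming the yaml_string layout above) might be:
import yaml

merged = insert_json_in_yaml(yaml_string, 'request', embedded_object)
doc = yaml.safe_load(merged)
# The embedded JSON comes back as a mapping, not a string
assert doc[0]['properties']['request']['postcode'] == '6000'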

BSON structure created by Apache Spark and MongoDB Hadoop-Connector

I'm trying to save some JSON from Spark (Scala) to MongoDB using the MongoDB Hadoop-Connector. The problem I'm having is that this API always seems to save your data as "{_id: ..., value: {your JSON document}}".
In the code example below, my document gets saved like this:
{
    "_id" : ObjectId("55e80cfea9fbee30aa703261"),
    "value" : {
        "_id" : "55e6c65da9fbee285f2f9175",
        "year" : 2014,
        "month" : 5,
        "day" : 6,
        "hour" : 18,
        "user_id" : 246
    }
}
Is there any way to persuade the MongoDB Hadoop Connector to write the JSON/BSON in the structure you've specified, instead of nesting it under these _id/value fields?
Here's my Scala Spark code:
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.hadoop.conf.Configuration
import org.bson.{BSONObject, Document}
import com.mongodb.hadoop.MongoOutputFormat

val jsonstr = List("""{
  "_id" : "55e6c65da9fbee285f2f9175",
  "year" : 2014,
  "month" : 5,
  "day" : 6,
  "hour" : 18,
  "user_id" : 246}""")

val conf = new SparkConf().setAppName("Mongo Dummy").setMaster("local[*]")
val sc = new SparkContext(conf)

// DB params
val host = "127.0.0.1"
val port = "27017"
val database = "dummy"
val collection = "fubar"

// input is the collection we want to read (not doing so here)
val mongo_input = s"mongodb://$host/$database.$collection"
// output is the collection we want to write
val mongo_output = s"mongodb://$host/$database.$collection"

// Set up extra config for the Hadoop connector
val hadoopConfig = new Configuration()
//hadoopConfig.set("mongo.input.uri", mongo_input)
hadoopConfig.set("mongo.output.uri", mongo_output)

// convert JSON to RDD
val rdd = sc.parallelize(jsonstr)

// write JSON data to DB
val saveRDD = rdd.map { json =>
  (null, Document.parse(json))
}
saveRDD.saveAsNewAPIHadoopFile("file:///bogus",
  classOf[Object],
  classOf[BSONObject],
  classOf[MongoOutputFormat[Object, BSONObject]],
  hadoopConfig)

// Finished
sc.stop
And here's my SBT:
name := "my-mongo-test"

version := "1.0"

scalaVersion := "2.10.4"

// Spark needs to appear in SBT BEFORE the MongoDB connector!
libraryDependencies += "org.apache.spark" %% "spark-core" % "1.4.0"

// MongoDB-Hadoop connector
libraryDependencies += "org.mongodb.mongo-hadoop" % "mongo-hadoop-core" % "1.4.0"
To be honest, I'm kind of mystified at how hard it seems to be to save JSON --> BSON --> MongoDB from Spark. So any suggestions on how to save my JSON data more flexibly would be welcomed.
Well, I just found the solution. It turns out that MongoRecordWriter, which is used by MongoOutputFormat, inserts any value that does not inherit from BSONWritable, MongoOutput, or BSONObject under a value field.
The simplest solution, therefore, is to create an RDD that contains BSONObject values rather than Document values.
I tried this solution in Java, but I'm sure it will work in Scala as well. Here is some sample code:
JavaPairRDD<Object, BSONObject> bsons = values.mapToPair(lineValues -> {
    BSONObject doc = new BasicBSONObject();
    doc.put("field1", lineValues.get(0));
    doc.put("field2", lineValues.get(1));
    return new Tuple2<Object, BSONObject>(UUID.randomUUID().toString(), doc);
});

Configuration outputConfig = new Configuration();
outputConfig.set("mongo.output.uri", "mongodb://localhost:27017/my_db.lines");

bsons.saveAsNewAPIHadoopFile("file:///this-is-completely-unused",
    Object.class,
    BSONObject.class,
    MongoOutputFormat.class,
    outputConfig);
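Translating that back to the asker's Scala code, the fix amounts to emitting a BSONObject instead of a Document in the map step - for example, an untested sketch using the legacy driver's com.mongodb.util.JSON parser:
import com.mongodb.util.JSON
import org.bson.BSONObject

// JSON.parse returns a BasicDBObject, which implements BSONObject,
// so MongoRecordWriter writes it as-is instead of nesting it under "value"
val saveRDD = rdd.map { json =>
  (null, JSON.parse(json).asInstanceOf[BSONObject])
}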