pymongo and output of aggregate - output

Here is my pymongo call
from pymongo import MongoClient
client = MongoClient('localhost', 27017)
db = client['somedb']
collection = db.some_details
pipe = [{'$group': {'_id': '$mvid', 'count': {'$sum': 1}}}]
TestOutput = db.collection.aggregate(pipeline=pipe)
print(list(TestOutput))
client.close()
For some reason the resulting list is empty, while in Robomongo I get non-empty output.
Is the formatting incorrect?
The exact Robomongo query is
db.some_details.aggregate([{$group: {_id: '$mvid', count: {$sum: 1}}}])
UPDATE
The output looks like
{
"result" : [
{
"_id" : "4f973d56a64facfaa7c3r4rf262ad5be695eef329aff7ab4610ddedfb8137427",
"count" : 84.0000000000000000
},
{
"_id" : "a134106e1a1551d296fu777cedc933e7df2d0a9bc5f41de047aba3ee29bace78",
"count" : 106.0000000000000000
},
],
"ok" : 1.0000000000000000
}

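Your code is fine apart from one thing: you are adding db in front of collection again. db.collection refers to a collection literally named "collection", not to the collection variable you assigned from db.some_details, so the aggregation runs against an empty collection.
Here is a modified version of your code: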
from pymongo import MongoClient
client = MongoClient('localhost', 27017)
db = client['somedb']
collection = db.some_details
pipe = [{'$group': {'_id': '$mvid', 'count': {'$sum': 1}}}]
# Notice the below line
TestOutput = collection.aggregate(pipeline=pipe)
print(list(TestOutput))
client.close()
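To see why the original call came back empty, here is a minimal sketch of the attribute-access behaviour. FakeDatabase is a hypothetical stand-in written only for this illustration and is not part of pymongo; it just mimics how a pymongo Database turns an attribute name into a collection name.

```python
class FakeDatabase:
    """Hypothetical stand-in: attribute access yields '<db>.<attribute name>'."""

    def __init__(self, name):
        self.name = name

    def __getattr__(self, collection_name):
        # Called only for attributes that do not exist on the object,
        # which is roughly how a database object hands out collections by name.
        return f"{self.name}.{collection_name}"


db = FakeDatabase("somedb")
collection = db.some_details

print(collection)      # somedb.some_details
print(db.collection)   # somedb.collection
```

The Python variable named collection and the attribute db.collection are unrelated, so the second lookup targets a different (here, nonexistent) collection.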

Related

Pulling specific Parent/Child JSON data with Python

I'm having a difficult time figuring out how to pull specific information from a json file.
So far I have this:
# Import json library
import json
# Open json database file
with open('jsondatabase.json', 'r') as f:
    data = json.load(f)
# assign variables from json data and convert to usable information
identifier = data['ID']
identifier = str(identifier)
name = data['name']
name = str(name)
# Collect data from user to compare with data in json file
print("Please enter your numerical identifier and name: ")
user_id = input("Numerical identifier: ")
user_name = input("Name: ")
if user_id == identifier and user_name == name:
    print("Your inputs matched. Congrats.")
else:
    print("Your inputs did not match our data. Please try again.")
And that works great for a simple JSON file like this:
{
"ID": "123",
"name": "Bobby"
}
But ideally I need to create a more complex JSON file and can't find deeper information on how to pull specific information from something like this:
{
"Parent": [
{
"Parent_1": [
{
"Name": "Bobby",
"ID": "123"
}
],
"Parent_2": [
{
"Name": "Linda",
"ID": "321"
}
]
}
]
}
Here is an example that you might be able to pick apart.
You could either:
1. Make a custom de-jsonify object_hook as shown below and do something with it. There is a good tutorial here.
2. Just gobble up the whole dictionary that you get without a custom de-jsonify, drill down into it, and make a list or set of the results (not shown).
Example:
import json
from collections import namedtuple
data = '''
{
"Parents":
[
{
"Name": "Bobby",
"ID": "123"
},
{
"Name": "Linda",
"ID": "321"
}
]
}
'''
Parent = namedtuple('Parent', ['name', 'id'])
def dejsonify(json_str: dict):
    if json_str.get("Name"):
        parent = Parent(json_str.get('Name'), int(json_str.get('ID')))
        return parent
    return json_str
res = json.loads(data, object_hook=dejsonify)
print(res)
# then we can do whatever... if you need lookups by name/id,
# we could put the result into a dictionary
all_parents = {(p.name, p.id) : p for p in res['Parents']}
lookup_from_input = ('Bobby', 123)
print(f'found match: {all_parents.get(lookup_from_input)}')
Result:
{'Parents': [Parent(name='Bobby', id=123), Parent(name='Linda', id=321)]}
found match: Parent(name='Bobby', id=123)
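The second option (drilling down into the plain dictionary) could look roughly like the sketch below. It uses the nested "Parent" structure from the question; the helper name collect_people is an assumption made for this example.

```python
import json

# The nested structure from the question, as a JSON string.
raw = '''
{
  "Parent": [
    {
      "Parent_1": [{"Name": "Bobby", "ID": "123"}],
      "Parent_2": [{"Name": "Linda", "ID": "321"}]
    }
  ]
}
'''


def collect_people(data):
    """Walk Parent -> Parent_N -> entries and return a set of (ID, Name) pairs."""
    people = set()
    for group in data["Parent"]:            # each dict inside the "Parent" list
        for entries in group.values():      # the lists under "Parent_1", "Parent_2", ...
            for entry in entries:           # the individual {"Name": ..., "ID": ...} dicts
                people.add((entry["ID"], entry["Name"]))
    return people


people = collect_people(json.loads(raw))

print(("123", "Bobby") in people)  # True
print(("123", "Linda") in people)  # False
```

Keeping the IDs as strings means the values returned by input() can be compared directly, without conversion.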

Groovy - Parse XML Response, Select Fields, Create New File

I think I've read every Groovy parsing question on here but I can't seem to find my exact scenario so I'm reaching out for help - please be kind, I'm new to Groovy and I've really bitten off more than I can chew in this latest endeavor.
So I have this XML Response:
<?xml version="1.0" encoding="UTF-8"?>
<worklogs date_from="2020-04-19 00:00:00" date_to="2020-04-25 23:59:59" number_of_worklogs="60" format="xml" diffOnly="false" errorsOnly="false" validOnly="false" addDeletedWorklogs="true" addBillingInfo="false" addIssueSummary="true" addIssueDescription="false" duration_ms="145" headerOnly="false" userName="smm288" addIssueDetails="false" addParentIssue="false" addUserDetails="true" addWorklogDetails="false" billingKey="" issueKey="" projectKey="" addApprovalStatus="true" >
<worklog>
<worklog_id></worklog_id>
<jira_worklog_id></jira_worklog_id>
<issue_id></issue_id>
<issue_key></issue_key>
<hours></hours>
<work_date></work_date>
<username></username>
<staff_id />
<billing_key></billing_key>
<billing_attributes></billing_attributes>
<activity_id></activity_id>
<activity_name></activity_name>
<work_description></work_description>
<parent_key></parent_key>
<reporter></reporter>
<external_id />
<external_tstamp />
<external_hours></external_hours>
<external_result />
<customField_11218></customField_11218>
<customField_12703></customField_12703>
<customField_12707></customField_12707>
<hash_value></hash_value>
<issue_summary></issue_summary>
<user_details>
<full_name></full_name>
<email></email>
<user-prop key="auto_approve_timesheet"></user-prop>
<user-prop key="cris_id"></user-prop>
<user-prop key="iqn_gl_string"></user-prop>
<user-prop key="is_contractor"></user-prop>
<user-prop key="is_employee"></user-prop>
<user-prop key="it_leadership"></user-prop>
<user-prop key="primary_role"></user-prop>
<user-prop key="resource_manager"></user-prop>
<user-prop key="team"></user-prop>
</user_details>
<approval_status></approval_status>
<timesheet_approval>
<status></status>
<status_date></status_date>
<reviewer></reviewer>
<actor></actor>
<comment></comment>
</timesheet_approval>
</worklog>
....
....
</worklogs>
And I'm retrieving this XML Response from an API call so the response is held within an object. NOTE: The sample XML above is from Postman.
What I'm trying to do is the following:
1. Only retrieve certain values from this response from all the nodes.
2. Write the values collected to a .json file.
I've created a map but now I'm kind of stuck on how to parse through it and create a .json file out of the fields I want.
This is what I have thus far
@Grab('org.codehaus.groovy.modules.http-builder:http-builder:0.7.1')
@Grab('oauth.signpost:signpost-core:1.2.1.2')
@Grab('oauth.signpost:signpost-commonshttp4:1.2.1.2')
import groovyx.net.http.RESTClient
import groovyx.net.http.Method
import static groovyx.net.http.ContentType.*
import groovyx.net.http.HttpResponseException
import groovy.json.JsonBuilder
import groovy.json.JsonOutput
import groovy.json.*
// User Credentials
def jiraAuth = ""
// JIRA Endpoints
def jiraUrl = "" //Dev
def jiraUrl = "" //Production
// Tempo API Tokens
//def tempoApiToken = "" //Dev
//def tempoApiToken = "" //Production
// Define Weekly Date Range
def today = new Date()
def lastPeriodStart = today - 8
def lastPeriodEnd = today - 2
def dateFrom = lastPeriodStart.format("yyyy-MM-dd")
def dateTo = lastPeriodEnd.format("yyyy-MM-dd")
def jiraClient = new RESTClient(jiraUrl)
jiraClient.ignoreSSLIssues()
def headers = [
"Authorization" : "Basic " + jiraAuth,
"X-Atlassian-token": "no-check",
"Content-Type" : "application/json"
]
def response = jiraClient.get(
path: "",
query: [
tempoApiToken: "${tempoApiToken}",
format: "xml",
dateFrom: "${dateFrom}",
dateTo: "${dateTo}",
addUserDetails: "true",
addApprovalStatus: "true",
addIssueSummary: "true"
],
headers: headers
) { response, worklogs ->
println "Processing..."
// Start building the Output - Creates a Worklog Map
worklogs.worklog.each { worklognodes ->
def workLog = convertToMap(worklognodes)
// Print out the Map
println (workLog)
}
}
// Helper Method
def convertToMap(nodes) {
    nodes.children().collectEntries {
        if (it.name() == 'user-prop') {
            [it['@key'], it.childNodes() ? convertToMap(it) : it.text()]
        } else {
            [it.name(), it.childNodes() ? convertToMap(it) : it.text()]
        }
    }
}
I'm only interested in parsing out the following fields from each node:
<worklogs>
<worklog>
<hours>
<work_date>
<billing_key>
<customField_11218>
<issue_summary>
<user_details>
<full_name>
<user-prop key="auto_approve_timesheet">
<user-prop key="it_leadership">
<user-prop key="resource_manager">
<user-prop key="team">
<user-prop key="cris_id">
<user-prop key="iqn_id">
<approval_status>
</worklog>
...
</worklogs>
I've tried the following:
1. Converting workLog to a JSON string (JsonOutput.toJson) and then pretty-printing that string (JsonOutput.prettyPrint). This just returns a collection of JSON responses that I can't do anything with. My thinking was that this was as good as I could get, and that I would run the result through a JSON-to-CSV converter and delete what I don't want, but that is not the solution I ultimately want.
2. Printing the workLog map, which just returns little collections that I can't do anything with either.
3. Creating a .json file of workLog using File, but again, it doesn't translate well.
The result of the println for workLog is below (just so everyone can see that the response is being held and that the map matches the XML response).
[worklog_id: , jira_worklog_id: , issue_id: , issue_key: , hours: , work_date: , username: , staff_id: , billing_key: , billing_attributes: , activity_id: , activity_name: , work_description: , parent_key: , reporter: , external_id:, external_tstamp:, external_hours: , external_result:, customField_11218: , hash_value: , issue_summary: , user_details:[full_name: , email: , auto_approve_timesheet: , cris_id: , iqn_gl_approver: , iqn_gl_string: , iqn_id: , is_contractor: , is_employee: , it_leadership: , primary_role: , resource_manager: , team: ], approval_status: , timesheet_approval:[status: ]]
I would so appreciate it if anyone could offer some insights on how to move forward or even documentation that has good examples of what I'm trying to achieve (Apache's documentation is sorely lacking in examples, in my opinion).
This is not all the way there, but I was able to create a JSON file from the XML by way of the map. From there I can use the .json file to create a .csv and then remove the columns I don't want.
// Define Weekly Date Range
def today = new Date()
def lastPeriodStart = today - 8
def lastPeriodEnd = today - 2
def dateFrom = lastPeriodStart.format("yyyy-MM-dd")
def dateTo = lastPeriodEnd.format("yyyy-MM-dd")
def jiraClient = new RESTClient(jiraUrl)
jiraClient.ignoreSSLIssues()
// Creates and Begins the File
File file = new File("${dateFrom}_RPT05.json")
file.write("")
file.append("[\n")
// Defines the File
def arrplace = 0
def arrsize = 0
def headers = [
"Authorization" : "Basic " + jiraAuth,
"X-Atlassian-token": "no-check",
"Content-Type" : "application/json"
]
def response = jiraClient.get(
path: "/plugins/servlet/tempo-getWorklog/",
query: [
tempoApiToken: "${tempoApiToken}",
format: "xml",
dateFrom: "${dateFrom}",
dateTo: "${dateTo}",
addUserDetails: "true",
addApprovalStatus: "true",
addIssueSummary: "true"
],
headers: headers
) { response, worklogs ->
println "Processing..."
// Gets Size of Array
worklogs.worklog.each { worklognodes ->
arrsize = arrsize+1 }
// Start Building the Output - Creates a Worklog Map
worklogs.worklog.each { worklognodes ->
worklognodes = convertToMap(worklognodes)
// Convert Map to a JSON String
def json_str = JsonOutput.toJson(worklognodes)
// Adds Row to File
file.append(json_str)
arrplace = arrplace+1
if(arrplace<arrsize)
{file.append(",")}
file.append("\n")
print "."
}
}
file.append("]")
// Helper Method
def convertToMap(nodes) {
    nodes.children().collectEntries {
        if (it.name() == 'user-prop') {
            [it['@key'], it.childNodes() ? convertToMap(it) : it.text()]
        } else {
            [it.name(), it.childNodes() ? convertToMap(it) : it.text()]
        }
    }
}
The output is a collection/array of worklogs.
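The remaining step, keeping only the wanted fields, comes down to filtering each worklog against a whitelist before serializing it. The sketch below shows that idea in Python rather than Groovy, purely as an illustration of the logic. The sample values, the shortened field lists, and the helper name select_fields are all assumptions made for this example.

```python
import json
import xml.etree.ElementTree as ET

# Tiny invented sample that follows the element names from the question.
SAMPLE = """<worklogs>
  <worklog>
    <worklog_id>1</worklog_id>
    <hours>2.5</hours>
    <work_date>2020-04-20</work_date>
    <user_details>
      <full_name>Example User</full_name>
      <email>user@example.com</email>
      <user-prop key="team">Platform</user-prop>
      <user-prop key="is_employee">true</user-prop>
    </user_details>
    <approval_status>approved</approval_status>
  </worklog>
</worklogs>"""

WANTED_FIELDS = ["hours", "work_date", "approval_status"]  # direct children to keep
WANTED_PROPS = ["team"]                                     # user-prop keys to keep


def select_fields(worklog):
    """Return a flat dict holding only the whitelisted values of one worklog."""
    row = {name: worklog.findtext(name, default="") for name in WANTED_FIELDS}
    details = worklog.find("user_details")
    if details is not None:
        row["full_name"] = details.findtext("full_name", default="")
        for prop in details.findall("user-prop"):
            if prop.get("key") in WANTED_PROPS:
                row[prop.get("key")] = prop.text or ""
    return row


rows = [select_fields(w) for w in ET.fromstring(SAMPLE).findall("worklog")]
print(json.dumps(rows, indent=2))
```

Building the complete list first and serializing it once also avoids writing the brackets and commas of the JSON array by hand.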

Can JSON String format be converted to Actual format using groovy?

I am getting a string in the following format from an external source.
What kind of format is this actually?
{
id=102,
brand=Disha,
book=[{
slr=EFTR,
description=Grammer,
data=TYR,
rate=true,
numberOfPages=345,
maxAllowed=12,
currentPage=345
},
{
slr=EFRE,
description=English,
data=TYR,
rate=true,
numberOfPages=345,
maxAllowed=12,
currentPage=345
}]
}
I want to convert this into actual JSON like this:
{
"id": "102",
"brand": "Disha",
"book": [{
"slr": "EFTR",
"description": "Grammer",
"data": "TYR",
"rate": true,
"numberOfPages": 345,
"maxAllowed": "12",
"currentPage": 345
},
{
"slr": "EFRE",
"description": "English",
"data": "TYR",
"rate": true,
"numberOfPages": 345,
"maxAllowed": "12",
"currentPage": 345
}]
}
Is this achievable using groovy command or code?
A couple of things:
You do not need the Groovy Script test step that is currently there as step3.
For step2, add a 'Script Assertion' with the script given below.
In that script, set nextStepName to the name of the step whose request you want to set.
//Provide the test step name where you want to add the request
def nextStepName = 'step4'
def setRequestToStep = { stepName, requestContent ->
context.testCase.testSteps[stepName]?.httpRequest.requestContent = requestContent
}
//Check the response
assert context.response, 'Response is empty or null'
setRequestToStep(nextStepName, context.response)
EDIT: Based on the discussion with the OP in chat, the OP wants to update the existing request of step4 so that one of its keys takes step2's response as its value.
The samples below demonstrate the input and the desired output.
Let us say, step2's response is:
{
"world": "test1"
}
And step4's existing request is :
{
"key" : "value",
"key2" : "value2"
}
Now, the OP wants to replace the value of key in step4's request with step2's response, and the desired result is:
{
"key": {
"world": "test1"
},
"key2": "value2"
}
Here is the updated script, use it in Script Assertion for step 2:
//Change the key name if required; the step2 response is updated for this key of step4
def keyName = 'key'
//Change the name of test step to expected to be updated with new request
def nextStepName = 'step4'
//Check response
assert context.response, 'Response is null or empty'
def getJson = { str ->
new groovy.json.JsonSlurper().parseText(str)
}
def getStringRequest = { json ->
new groovy.json.JsonBuilder(json).toPrettyString()
}
def setRequestToStep = { stepName, requestContent, key ->
def currentRequest = context.testCase.testSteps[stepName]?.httpRequest.requestContent
log.info "Existing request of step ${stepName} is ${currentRequest}"
def currentReqJson = getJson(currentRequest)
currentReqJson."$key" = getJson(requestContent)
context.testCase.testSteps[stepName]?.httpRequest.requestContent = getStringRequest(currentReqJson)
log.info "Updated request of step ${stepName} is ${getStringRequest(currentReqJson)}"
}
setRequestToStep(nextStepName, context.response, keyName)
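The format in the question looks like the toString() output of a map. If you still have the original Map object, JsonOutput can turn it into valid JSON with this line of code (if the value is already a String, toJson only wraps it in quotes, so this will not help):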
def validJSONString = JsonOutput.toJson(invalidJSONString).toString()
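If only the text is available, it has to be parsed before it can be re-serialized. Below is a rough sketch of such a parser, written in Python to illustrate the approach rather than as a Groovy solution. The function names are made up for this example. It assumes that values never contain commas, braces, or brackets, does no error handling, and guesses types (all-digit tokens become integers, true/false become booleans), because the original types cannot be recovered from this format.

```python
import json

SAMPLE = """{
id=102,
brand=Disha,
book=[{
slr=EFTR,
description=Grammer,
rate=true,
numberOfPages=345
},
{
slr=EFRE,
description=English,
rate=true,
numberOfPages=345
}]
}"""


def convert(token):
    """Guess a type for a bare token (assumption: digits -> int, true/false -> bool)."""
    if token in ("true", "false"):
        return token == "true"
    if token.isdigit():
        return int(token)
    return token


def parse_map_text(text):
    """Parse '{key=value, ...}' text with nested maps and lists into Python objects."""
    pos = 0

    def skip_ws():
        nonlocal pos
        while pos < len(text) and text[pos].isspace():
            pos += 1

    def parse_value():
        nonlocal pos
        skip_ws()
        if text[pos] == "{":
            return parse_container("}", is_map=True)
        if text[pos] == "[":
            return parse_container("]", is_map=False)
        start = pos
        while pos < len(text) and text[pos] not in ",}]":
            pos += 1
        return convert(text[start:pos].strip())

    def parse_container(closer, is_map):
        nonlocal pos
        pos += 1  # consume the opening "{" or "["
        result = {} if is_map else []
        skip_ws()
        while text[pos] != closer:
            if is_map:
                start = pos
                while text[pos] != "=":
                    pos += 1
                key = text[start:pos].strip()
                pos += 1  # consume "="
                result[key] = parse_value()
            else:
                result.append(parse_value())
            skip_ws()
            if text[pos] == ",":
                pos += 1
                skip_ws()
        pos += 1  # consume the closing "}" or "]"
        return result

    return parse_value()


parsed = parse_map_text(SAMPLE)
print(json.dumps(parsed, indent=2))
```

Fields that must stay strings (such as id in the desired output) would need an explicit per-key rule, which this sketch leaves out.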

Reading data from the JSON object

I have JSON data in a file json_format.py as follows:
{
"name" : "ramu",
"place" : "hyd",
"height" : 5.10,
"list" : [1,2,3,4,5,6],
"tuple" : (0,1,2),
"colors" : {"mng":"white","aft" : "blue","night":"red"},
"car" : "None",
"bike" : "True",
}
I'm reading the above with this code:
import json
from pprint import pprint
with open (r'C:/PythonPrograms\Json_example/json_format.py') as jobj:
    fp = jobj.readlines()
b = json.dumps(fp) # ---> I get string
print(type(b))
c = json.loads(b)
print(type(c)) # ---> List
pprint(c)
print(c[0])
pprint(c["name"])
Now, I would like to access the JSON object as c['name'] and the output should be ramu.
Since c is a list, I can't do so. How can I read my JSON data so that I can access it with keys?
Thanks in advance!
You're effectively doing c = json.loads(json.dumps(jobj.readlines())) when you just need:
c = json.load(jobj)
print(c["name"]) # ramu
Also, your JSON is malformed.
There are no tuples in JSON: "tuple" : (0,1,2),
Your last item should not end with a comma: "bike" : "True",
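Putting both fixes together, a self-contained sketch might look like the following. It writes a corrected copy of the data to a temporary file so that it can run anywhere; the file name json_format.json is an assumption made for this example.

```python
import json
import os
import tempfile

# The question's data with the two problems fixed:
# the tuple is now a JSON array and the trailing comma is gone.
CORRECTED = """{
    "name": "ramu",
    "place": "hyd",
    "height": 5.10,
    "list": [1, 2, 3, 4, 5, 6],
    "tuple": [0, 1, 2],
    "colors": {"mng": "white", "aft": "blue", "night": "red"},
    "car": "None",
    "bike": "True"
}"""

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "json_format.json")  # hypothetical file name
    with open(path, "w") as out:
        out.write(CORRECTED)
    # json.load reads and parses the open file in one step.
    with open(path) as jobj:
        c = json.load(jobj)

print(type(c))               # <class 'dict'>
print(c["name"])             # ramu
print(c["colors"]["night"])  # red
```

Note that "None" and "True" remain plain strings here; JSON's own literals are null, true, and false.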

Loading JSON data to a list in a particular order using PyMongo

Let's say I have the following document in a MongoDB database:
{
"assist_leaders" : {
"Steve Nash" : {
"team" : "Phoenix Suns",
"position" : "PG",
"draft_data" : {
"class" : 1996,
"pick" : 15,
"selected_by" : "Phoenix Suns",
"college" : "Santa Clara"
}
},
"LeBron James" : {
"team" : "Cleveland Cavaliers",
"position" : "SF",
"draft_data" : {
"class" : 2003,
"pick" : 1,
"selected_by" : "Cleveland Cavaliers",
"college" : "None"
}
},
}
}
I'm trying to collect a few values under "draft_data" for each player in an ORDERED list. The list needs to look like the following for this particular document:
[ [1996, 15, "Phoenix Suns"], [2003, 1, "Cleveland Cavaliers"] ]
That is, each nested list must contain the values corresponding to the "class", "pick", and "selected_by" keys, in that order. I also need the "Steve Nash" data to come before the "LeBron James" data.
How can I achieve this using pymongo? Note that the structure of the data is not set in stone so I can change this if that makes the code simpler.
I'd extract the data and turn it into a list in Python, once you've retrieved the document from MongoDB:
for doc in db.collection.find():
    for name, info in doc['assist_leaders'].items():
        draft_data = info['draft_data']
        lst = [draft_data['class'], draft_data['pick'], draft_data['selected_by']]
        print(name, lst)
List comprehension is the way to go here (Note: don't forget .iteritems() in Python2 or .items() in Python3 or you'll get a ValueError: too many values to unpack).
import pymongo
import numpy as np
client = pymongo.MongoClient()
db = client[database_name]
dataList = [v for i in ["Steve Nash", "LeBron James"]
            for key in ["class", "pick", "selected_by"]
            for document in db.collection_name.find({"assist_leaders": {"$exists": 1}})
            for k, v in document["assist_leaders"][i]["draft_data"].items()
            if k == key]
print(dataList)
# [1996, 15, "Phoenix Suns", 2003, 1, "Cleveland Cavaliers"]
matrix = np.reshape(dataList, [2,3])
print(matrix)
# A 2x3 array with one row per player. Note that NumPy coerces the
# mixed int/str values to a single dtype, so the numbers come back as strings:
# [ [1996, 15, "Phoenix Suns"],
# [2003, 1, "Cleveland Cavaliers"] ]
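The ordering logic itself does not depend on MongoDB or NumPy. As a minimal sketch, assuming the document has already been fetched into a plain dict (the names PLAYER_ORDER and KEY_ORDER are made up for this example), a nested list comprehension with explicit orderings keeps the original value types:

```python
# Stand-in for one document returned by find_one(); contents copied from the question.
doc = {
    "assist_leaders": {
        "Steve Nash": {
            "team": "Phoenix Suns",
            "position": "PG",
            "draft_data": {"class": 1996, "pick": 15,
                           "selected_by": "Phoenix Suns", "college": "Santa Clara"},
        },
        "LeBron James": {
            "team": "Cleveland Cavaliers",
            "position": "SF",
            "draft_data": {"class": 2003, "pick": 1,
                           "selected_by": "Cleveland Cavaliers", "college": "None"},
        },
    }
}

PLAYER_ORDER = ["Steve Nash", "LeBron James"]   # explicit row order
KEY_ORDER = ["class", "pick", "selected_by"]    # explicit column order

rows = [
    [doc["assist_leaders"][player]["draft_data"][key] for key in KEY_ORDER]
    for player in PLAYER_ORDER
]

print(rows)  # [[1996, 15, 'Phoenix Suns'], [2003, 1, 'Cleveland Cavaliers']]
```

Because both orders are spelled out, the result does not rely on the key order of the stored document.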