Groovy: compare two lazy maps/jsons - json

I have two jsons/lazy maps in the format shown below. I need to compare them to find out whether there is any difference between them. I combine each set of values into a single string so that the comparison is faster, as my actual inputs (i.e. json messages) are going to be really large.
reqJson:
[["B1": 100, "B2": 200, "B3": 300, "B4": 400],["B1": 500, "B2": 600, "B3": 700, "B4": 800], ["B1": 900, "B2": 1000, "B3": 2000, "B4": 3000], ["B1": 4000, "B2": 5000, "B3": 6000, "B4": 7000]]
respJson:
[["B1": 100, "B2": 200, "B3": 300, "B4": 400],["B1": 500, "B2": 600, "B3": 700, "B4": 800], ["B1": 900, "B2": 1000, "B3": 2000, "B4": 3000], ["B1": 4000, "B2": 5000, "B3": 6000, "B4": 7000], ["B1": 8000, "B2": 9000, "B3": 10000, "B4": 11000]]
My code looks like the snippet below, but it does not give the desired result and I cannot figure out what is going wrong. I take each value from the response json and compare it with every value in the request json to find out whether there is a difference.
def diffCounter = 0
Set diffSet = []
respJson.each { respJ ->
    reqJson.any { reqJ ->
        if (respJ.B1+respJ.B2+respJ.B3+respJ.B4 != reqJ.B1+reqJ.B2+reqJ.B3+reqJ.B4) {
            diffCounter += 1
            diffSet << [
                "B1" : respJ.B1,
                "B2" : respJ.B2,
                "B3" : respJ.B3,
                "B4" : respJ.B4
            ]
        }
    }
}
println ("Difference Count: "+ diffCounter)
println ("Difference Set: "+ diffSet)
Actual Output:
Difference Count: 5
Difference Set: [[B1:100, B2:200, B3:300, B4:400], [B1:500, B2:600, B3:700, B4:800], [B1:900, B2:1000, B3:2000, B4:3000], [B1:4000, B2:5000, B3:6000, B4:7000], [B1:8000, B2:9000, B3:10000, B4:11000]]
Expected Output:
Difference Count: 1
Difference Set: [["B1": 8000, "B2": 9000, "B3": 10000, "B4": 11000]]
NOTE: It can also happen that the request-json is bigger than the response-json so in that case I need to store the difference obtained from request-json into the diffSet.
Any inputs/suggestions in this regard will be helpful.

As @daggett mentioned, if your JSONs become more nested/complicated, you will want to use a library to do this job for you.
In your use case of plain Lists of elements (with values that can be concatenated/added to form a unique key for each element) there is no problem with doing it 'manually'.
The problem with your code is that you check whether any reqJson entry has a different sum, which is always true as soon as reqJson contains two or more distinct entries.
What you really want to check is whether there is a matching reqJson entry with the same sum. If you can't find any matching entry, then you know that the entry only exists in respJson.
def diffCounter = 0
Set diffSet = []
respJson.each { respJ ->
    // true as soon as one request entry has the same sum as this response entry
    def foundMatching = reqJson.any { reqJ ->
        respJ.B1 + respJ.B2 + respJ.B3 + respJ.B4 == reqJ.B1 + reqJ.B2 + reqJ.B3 + reqJ.B4
    }
    if (!foundMatching) {
        diffCounter += 1
        diffSet << [
            "B1" : respJ.B1,
            "B2" : respJ.B2,
            "B3" : respJ.B3,
            "B4" : respJ.B4
        ]
    }
}
println ("Difference Count: "+ diffCounter)
println ("Difference Set: "+ diffSet)
You mention that reqJson can become bigger than respJson and that in that case you want to switch the roles of the two arrays in the comparison, so that you always get the unmatched elements from the larger array. A trick to do this is to start by swapping the two variables around.
if (reqJson.size() > respJson.size()) {
    (reqJson, respJson) = [respJson, reqJson]
}
Note that the time complexity of this algorithm is O(m * n * 2i): it grows with the product of the sizes of the two arrays (m and n, here 5 and 4) times the number of property accesses per comparison (i on each of the two elements, here 4 because there are 4 Bs), because each element of the bigger array is potentially checked against every element of the smaller array.
So if the arrays are tens of thousands of elements long, this will become very slow. A simple way to speed it up to O(m * i + n * i) would be to:
make a Set smallArrayKeys out of the concatenated messages/added values of the smaller array
iterate through the bigger array and check whether its concatenated message is contained in the smallArrayKeys Set; if it is not, the entry only exists in the bigger array.
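The two steps above can be sketched as follows. This is a minimal illustration written in Python rather than Groovy; the helper names row_key and unmatched_entries are made up for the example, and it assumes every entry carries the four keys B1 to B4. It uses a tuple of the values as the key instead of their sum, since sums can collide (for example 100 + 200 and 200 + 100).

```python
def row_key(row, fields=("B1", "B2", "B3", "B4")):
    """Build a hashable key from the compared fields of one entry (hypothetical helper)."""
    return tuple(row[f] for f in fields)


def unmatched_entries(req_json, resp_json):
    """Return the entries of the larger list that have no match in the smaller one."""
    # Sorting the pair by length plays the role of the variable swap.
    small, big = sorted((req_json, resp_json), key=len)
    small_keys = {row_key(row) for row in small}  # one pass over the smaller list
    return [row for row in big if row_key(row) not in small_keys]  # one pass over the bigger list


req_json = [
    {"B1": 100, "B2": 200, "B3": 300, "B4": 400},
    {"B1": 500, "B2": 600, "B3": 700, "B4": 800},
]
resp_json = req_json + [{"B1": 8000, "B2": 9000, "B3": 10000, "B4": 11000}]

diff = unmatched_entries(req_json, resp_json)
print("Difference Count:", len(diff))
print("Difference Set:", diff)
```

Set membership tests take constant time on average, so the total work is one pass over each list instead of one pass per pair of entries.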

Related

FastAPI not returning list of SQLAlchemy rows properly

I am trying to return a list of SQLAlchemy rows and output it from a FastAPI endpoint. Each row in the list consists of the actor's name and the actor's total number of lines from a show. The query itself I believe is correct, but when viewing the output from the endpoint, one of the columns is missing for some reason.
The query function:
def get_actors(db: Session, detailed: bool = False) -> list[str]:
    """Return a list of actors and their total lines from the show"""
    if not detailed:
        query = (
            db.query(models.Script.actor, func.count(models.Script.detail))  # (<actor name>, <total lines>)
            .filter(models.Script.actor.isnot(None))  # Skip null actors.
            .group_by(models.Script.actor)  # Group unique actors.
            .order_by(func.count(models.Script.detail).desc())  # Sort by most to least lines.
        )
        actors_list = [actor for actor in query]
        print("ACTORS LIST:", actors_list)  # Debug print, looks fine.
        return actors_list
    ...
FastAPI endpoint
@holy_api.get("/actors", response_class=PrettyJSONResponse)
def get_actors(detailed: bool = False, db: Session = Depends(get_db)):
    """Get a list of all the actors from the show with their total lines, optionally detailed view"""
    return crud.get_actors(db, detailed=detailed)
Now, when I open up /actors, the terminal prints out the debug looking like:
ACTORS LIST: [('Michael Palin', 2454), ('Eric Idle', 2107), ('John Cleese', 2044), ('Graham Chapman', 1848), ('Terry Jones', 1801), ('Carol Cleveland', 277), ('Terry Gilliam', 85), ('Terry\nJones', 35), ('Neil Innes', 12), ('Ian Davidson', 8), ('Connie Booth', 5), ('Katya Wyeth', 4), ('Rita Davies', 3), ('Marjorie Wilde', 3), ('Donna Reading', 2), ('Nicki Howorth', 1), ('Julia Breck', 1), ('Caron Gardener', 1)]
INFO: 127.0.0.1:54057 - "GET /actors HTTP/1.1" 200 OK
This is exactly what I need. But the actual JSON response looks like:
[
{
"actor": "Michael Palin"
},
{
"actor": "Eric Idle"
},
{
"actor": "John Cleese"
},
{
"actor": "Graham Chapman"
},
{
"actor": "Terry Jones"
},
{
"actor": "Carol Cleveland"
},
...
]
The total lines aren't shown next to the actor. Why? I am not sure if this is a problem with FastAPI not parsing the JSON properly or if this is a SQLAlchemy thing or I'm doing something plainly wrong.
Wow! Shortly after posting this question, I fixed it. All I had to do was attach a label to the aggregate count function: func.count(models.Script.detail).label("total_lines") I've been on this problem for hours and I feel really, really dumb right now.

Mule condense data based on a category

Example below. I've got a set of account numbers, each with an Account attribute. Each account number can have up to three categories, and I would like to sum the balances per category for each account number in DataWeave.
Data input
[
{
Account_Number: 1,
Account: 5,
Category: "A",
Balance: 500
},
{
Account_Number: 1,
Account: 5,
Category: "A",
Balance: 700
},
{
Account_Number: 1,
Account: 5,
Category: "B",
Balance: 300
},
{
Account_Number: 1,
Account: 5,
Category: "C",
Balance: 100
},
{
Account_Number: 2,
Account: 10,
Category: "B",
Balance: 300
},
{
Account_Number: 2,
Account: 10,
Category: "B",
Balance: 800
}
]
Data Output
[
{
Account_Number: 1,
Account: 5,
CategoryA_Balance: 1200,
CategoryB_Balance: 300,
CategoryC_Balance: 100
},
{
Account_Number: 2,
Account: 10,
CategoryA_Balance: 0,
CategoryB_Balance: 1100,
CategoryC_Balance: 0
}
]
I assume the categories are dynamic. If not, you can replace the categories variable with a static array.
%dw 2.0
output application/json
var byAcctNbr = payload groupBy ($.Account_Number)
var categories = payload..Category distinctBy $
---
keysOf(byAcctNbr) map ((acctNbr) ->
do {
var item = byAcctNbr[acctNbr]
var outItem = (item[0] default {}) - "Balance" - "Category"
var balances = categories reduce ((category, acc={}) ->
do {
var accounts = item filter ($.Category == category)
---
acc ++ (
("Category" ++ category ++ "_Balance"): if (isEmpty(accounts)) 0
else sum (accounts.Balance)
)
})
---
outItem ++ balances
}
)
A similar solution to sudhish's, broken down step by step for easier understanding:
distinctBy: payload..Category returns every Category value in the input; distinctBy removes the duplicates, leaving [A, B, C]
groupBy: groups the records by account number
(item[0] - "Balance" - "Category"): Account_Number and Account are needed only once per group, so take item[0] and use "-" to drop Balance and Category, which are computed separately
pluck: converts the object keyed by account number into an array
map: iterates over the records of each account number
map over categories: yields [A, B, C] for every account number
filter: selects the records whose Category matches the current category; if any match, sum their Balance, otherwise use 0
sum: adds up the balances of the records selected by filter
%dw 2.0
output application/json
var categories = payload..Category distinctBy $
---
payload groupBy $.Account_Number pluck $ map(item,index)->{
(item[0] - "Balance" - "Category"),
(categories map (cat)->{
("Category" ++ cat ++ "_Balance"):
if (isEmpty(item filter ($.Category == cat)))
0
else
sum((item filter ($.Category == cat)).Balance)
})
}
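For comparison, the same group-then-sum logic can be sketched in Python (this is not DataWeave and is only meant to illustrate the steps). The variable names are made up for the example, and it assumes that Account is constant for a given Account_Number, as it is in the sample input.

```python
payload = [
    {"Account_Number": 1, "Account": 5, "Category": "A", "Balance": 500},
    {"Account_Number": 1, "Account": 5, "Category": "A", "Balance": 700},
    {"Account_Number": 1, "Account": 5, "Category": "B", "Balance": 300},
    {"Account_Number": 1, "Account": 5, "Category": "C", "Balance": 100},
    {"Account_Number": 2, "Account": 10, "Category": "B", "Balance": 300},
    {"Account_Number": 2, "Account": 10, "Category": "B", "Balance": 800},
]

# Distinct categories in first-seen order (the role of `payload..Category distinctBy $`).
categories = list(dict.fromkeys(row["Category"] for row in payload))

# Group by account and sum balances per category; every category starts at 0.
totals = {}  # (Account_Number, Account) -> {category: summed balance}
for row in payload:
    key = (row["Account_Number"], row["Account"])
    per_category = totals.setdefault(key, dict.fromkeys(categories, 0))
    per_category[row["Category"]] += row["Balance"]

# Flatten each group into one output record.
result = [
    {
        "Account_Number": number,
        "Account": account,
        **{f"Category{c}_Balance": balance for c, balance in per_category.items()},
    }
    for (number, account), per_category in totals.items()
]
print(result)
```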

Return a value of dictionary where a variable is inbetween keys or values

I have some data that I think would work best as a dictionary or JSON. The data has an initial category, a, b...z, and five bands within each category.
What I want to be able to do is give a function a category and a value and for the function to return the corresponding band.
I tried to create a dictionary like this where the values of each band are the lower threshold i.e. for category a, Band 1 is between 0 and 89:
bandings = {
'a' :
{
'Band 1' : 0,
'Band 2': 90,
'Band 3': 190,
'Band 4': 420,
'Band 5': 500
},
'b' :
{
'Band 1' : 0,
'Band 2': 500,
'Band 3': 1200,
'Band 4': 1700,
'Band 5': 2000
}
}
So if I was to run a function:
lookup_band(category='a', value=100)
it would return 'Band 2' as 100 is between 90 and 189 in category a
I also experimented with setting keys as ranges but struggled with how to handle a range of > max value in Band 5.
I can change the structure of the dictionary or use a different way of referencing the data.
Any ideas, please?
You can structure your data a little bit differently (using sorted lists instead of dictionaries) and use the bisect module. For example:
from bisect import bisect
bandings = {
    'a': [0, 90, 190, 420, 500],
    'b': [0, 500, 1200, 1700, 2000]
}
def lookup_band(bandings, band, value):
    # bisect returns how many thresholds are <= value, which is the band number
    return 'Band {}'.format(bisect(bandings[band], value))
print(lookup_band(bandings, 'a', 100)) # Band 2
print(lookup_band(bandings, 'b', 1700)) # Band 4
print(lookup_band(bandings, 'b', 9999)) # Band 5
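A small variation that makes the boundary behaviour explicit is sketched below. The name lookup_band_checked and the decision to reject values below the first threshold are assumptions for illustration, not part of the answer above; without the check, such a value would come back as 'Band 0'.

```python
from bisect import bisect_right

BANDINGS = {
    "a": [0, 90, 190, 420, 500],
    "b": [0, 500, 1200, 1700, 2000],
}


def lookup_band_checked(category, value, bandings=BANDINGS):
    """Return the band whose lower threshold is the largest one <= value."""
    thresholds = bandings[category]
    if value < thresholds[0]:
        # Assumed policy: values below the lowest threshold are invalid.
        raise ValueError(f"{value} is below the lowest threshold {thresholds[0]}")
    # bisect_right counts the thresholds that are <= value.
    return f"Band {bisect_right(thresholds, value)}"


print(lookup_band_checked("a", 89))   # Band 1
print(lookup_band_checked("a", 90))   # Band 2 (a threshold value belongs to the band it opens)
print(lookup_band_checked("a", 500))  # Band 5
```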

Loading JSON data to a list in a particular order using PyMongo

Let's say I have the following document in a MongoDB database:
{
"assist_leaders" : {
"Steve Nash" : {
"team" : "Phoenix Suns",
"position" : "PG",
"draft_data" : {
"class" : 1996,
"pick" : 15,
"selected_by" : "Phoenix Suns",
"college" : "Santa Clara"
}
},
"LeBron James" : {
"team" : "Cleveland Cavaliers",
"position" : "SF",
"draft_data" : {
"class" : 2003,
"pick" : 1,
"selected_by" : "Cleveland Cavaliers",
"college" : "None"
}
}
}
}
I'm trying to collect a few values under "draft_data" for each player in an ORDERED list. The list needs to look like the following for this particular document:
[ [1996, 15, "Phoenix Suns"], [2003, 1, "Cleveland Cavaliers"] ]
That is, each nested list must contain the values corresponding to the "class", "pick", and "selected_by" keys, in that order. I also need the "Steve Nash" data to come before the "LeBron James" data.
How can I achieve this using pymongo? Note that the structure of the data is not set in stone so I can change this if that makes the code simpler.
I'd extract the data and turn it into a list in Python, once you've retrieved the document from MongoDB:
for doc in db.collection.find():
    for name, info in doc['assist_leaders'].items():
        draft_data = info['draft_data']
        lst = [draft_data['class'], draft_data['pick'], draft_data['selected_by']]
        print(name, lst)
List comprehension is the way to go here (Note: don't forget .iteritems() in Python2 or .items() in Python3 or you'll get a ValueError: too many values to unpack).
import pymongo
import numpy as np
client = pymongo.MongoClient()
db = client[database_name]
dataList = [v for i in ["Steve Nash", "LeBron James"]
            for key in ["class", "pick", "selected_by"]
            for document in db.collection_name.find({"assist_leaders": {"$exists": 1}})
            for k, v in document["assist_leaders"][i]["draft_data"].items()
            if k == key]
print(dataList)
# [1996, 15, 'Phoenix Suns', 2003, 1, 'Cleveland Cavaliers']
matrix = np.reshape(dataList, [2, 3])
print(matrix)
# Note: NumPy coerces the mixed int/str values to strings:
# [['1996' '15' 'Phoenix Suns']
#  ['2003' '1' 'Cleveland Cavaliers']]
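If the player order and the key order are both known up front, the nested list can also be built directly from the retrieved document without NumPy, which keeps the integers as integers. The sketch below works on a plain dict standing in for a document returned by find(); the helper name draft_rows and the two order constants are made up for the example.

```python
PLAYER_ORDER = ["Steve Nash", "LeBron James"]  # assumed fixed, known order
FIELD_ORDER = ["class", "pick", "selected_by"]


def draft_rows(document, players=PLAYER_ORDER, fields=FIELD_ORDER):
    """Return one [class, pick, selected_by] list per player, in the given order."""
    leaders = document["assist_leaders"]
    return [[leaders[p]["draft_data"][f] for f in fields] for p in players]


# Stand-in for a document fetched with pymongo.
doc = {
    "assist_leaders": {
        "LeBron James": {
            "team": "Cleveland Cavaliers",
            "position": "SF",
            "draft_data": {"class": 2003, "pick": 1,
                           "selected_by": "Cleveland Cavaliers", "college": "None"},
        },
        "Steve Nash": {
            "team": "Phoenix Suns",
            "position": "PG",
            "draft_data": {"class": 1996, "pick": 15,
                           "selected_by": "Phoenix Suns", "college": "Santa Clara"},
        },
    }
}

print(draft_rows(doc))
```

Because the order comes from the two explicit lists, the result does not depend on the order in which the keys are stored in the document.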

Dataframe in R to be converted to sequence of JSON objects

I had asked the same question after editing 2 times of a previous question I had posted. I am sorry for the bad usage of this website. I have flagged it for deletion and I am posting a proper new question on the same here. Please look into this.
I am working on recommender system code. The output has to be converted to a sequence of JSON objects. I have a matrix that serves as a lookup table: for every item ID it lists the closest related items and the similarity scores associated with each combination.
Let me explain through a example.
Suppose I have a matrix
In the below example, Item 1 is similar to Items 22 and 23 with similarity scores 0.8 and 0.5 respectively. And the remaining rows follow the same structure.
X1 X2 X3 X4 X5
1 22 23 0.8 0.5
34 4 87 0.4 0.4
23 7 92 0.6 0.5
I want a JSON structure for every item (every X1 for every row) along with the recommended items and the similarity scores for each combination as a separate JSON entity and this being done in sequence. I don't want an entire JSON object containing these individual ones.
Assume there is one more entity called "coid" that will be given as input to the code. I assume it is XYZ and it is same for all the rows.
{ "_id" : { "coid" : "XYZ", "iid" : "1"}, "items" : [ { "item" : "22", "score" : 0.8},{ "item": "23", "score" : 0.5}] }
{ "_id" : { "coid" : "XYZ", "iid" : "34"},"items" : [ { "item" : "4", "score" : 0.4},{ "item": "87", "score" : 0.4}] }
{ "_id" : { "coid" : "XYZ", "iid" : "23"},"items" : [ { "item" : "7", "score" : 0.6},{ "item": "92", "score" : 0.5}] }
As in the above, each entity is a valid JSON structure/object but they are not put together into a separate JSON object as a whole.
I appreciate all the help done for the previous question but somehow I feel this new alteration I have here is not related to them because in the end, if you do a toJSON(some entity), then it converts the entire thing to one JSON object. I don't want that.
I want individual ones like these to be written to a file.
I am very sorry for my ignorance and inconvenience. Please help.
Thanks.
library(rjson)
## Your matrix
mat <- matrix(c(1,34,23,
22, 4, 7,
23,87,92,
0.8, 0.4, 0.6,
0.5, 0.4, 0.5), byrow=FALSE, nrow=3)
I use a function (with the not very interesting name makejson) that takes a row of the matrix and returns a JSON object. It builds two list objects, _id and items, and combines them into a JSON object.
makejson <- function(x, coid="ABC") {
    `_id` <- list(coid = coid, iid=x[1])
    nitem <- (length(x) - 1) / 2 # Number of items
    items <- list()
    for(i in seq(1, nitem)) {
        items[[i]] <- list(item = x[i + 1], score = x[i + 1 + nitem])
    }
    toJSON(list(`_id`=`_id`, items=items))
}
Then using apply (or a for loop) I use the function for each row of the matrix.
res <- apply(mat, 1, makejson, coid="XYZ")
cat(res, sep = "\n")
## {"_id":{"coid":"XYZ","iid":1},"items":[{"item":22,"score":0.8},{"item":23,"score":0.5}]}
## {"_id":{"coid":"XYZ","iid":34},"items":[{"item":4,"score":0.4},{"item":87,"score":0.4}]}
## {"_id":{"coid":"XYZ","iid":23},"items":[{"item":7,"score":0.6},{"item":92,"score":0.5}]}
The result can be saved to a file with cat by specifying the file argument.
## cat(res, sep="\n", file="out.json")
There is a small difference between your output and mine: in yours the IDs are in quotes ("). If you want that, mat has to be a character matrix (note that this also puts the scores in quotes).
## mat <- matrix(as.character(c(1,34,23, ...
Hope it helps,
alex
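For readers working in Python instead of R, the same one-object-per-line idea can be sketched with the standard json module. This is only an illustration under assumptions: the function name make_record is made up, the rows are hard-coded from the example matrix, and the IDs are converted to strings to match the output requested in the question.

```python
import json

rows = [
    [1, 22, 23, 0.8, 0.5],
    [34, 4, 87, 0.4, 0.4],
    [23, 7, 92, 0.6, 0.5],
]


def make_record(row, coid="XYZ"):
    """Turn one row (id, item ids..., scores...) into a dict for one JSON object."""
    n_items = (len(row) - 1) // 2  # number of recommended items in the row
    items = [
        {"item": str(row[1 + i]), "score": row[1 + n_items + i]}
        for i in range(n_items)
    ]
    return {"_id": {"coid": coid, "iid": str(row[0])}, "items": items}


# One independent JSON object per line, not one enclosing array or object.
lines = [json.dumps(make_record(row)) for row in rows]
print("\n".join(lines))
```

Writing "\n".join(lines) to a file gives the same layout as the cat(res, sep="\n", file=...) call in the R answer.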