Mongolite group by/aggregate on JSON object

I have a JSON document like this in my MongoDB collection:
{
"_id" : ObjectId("59da4aef8c5d757027a5a614"),
"input" : "hi",
"output" : "Hi. How can I help you?",
"intent" : "[{\"intent\":\"greeting\",\"confidence\":0.8154089450836182}]",
"entities" : "[]",
"context" : "{\"conversation_id\":\"48181e58-dd51-405a-bb00-c875c01afa0a\",\"system\":{\"dialog_stack\":[{\"dialog_node\":\"root\"}],\"dialog_turn_counter\":1,\"dialog_request_counter\":1,\"_node_output_map\":{\"node_5_1505291032665\":[0]},\"branch_exited\":true,\"branch_exited_reason\":\"completed\"}}",
"user_id" : "50001",
"time_in" : ISODate("2017-10-08T15:57:32.000Z"),
"time_out" : ISODate("2017-10-08T15:57:35.000Z"),
"reaction" : "1"
}
I need to group by the intent.intent field, and I'm using RStudio with the mongolite library.
What I have tried is:
pp = '[{"$unwind": "$intent"},{"$group":{"_id":"$intent.intent", "count": {"$sum":1} }}]'
stats <- chat$aggregate(
pipeline=pp,
options = '{"allowDiskUse":true}'
)
print(stats)
But it's not working; the output of the above code is
_id count
1 NA 727

Because intent is stored as a serialized JSON string rather than an embedded document, $unwind and "$intent.intent" never reach the nested field, which is why every document ends up in a single NA group.
If you want to keep intent stored as a string, you can split it into an array on the " character and take the element at index 3, which is the intent name.
db.getCollection('test1').aggregate([
{ "$project": { intent_text : { $arrayElemAt : [ { $split: ["$intent", "\""] } ,3 ] } } },
{ "$group": {"_id": "$intent_text" , "count": {"$sum":1} }}
])
Result:
{
"_id" : "greeting",
"count" : 1.0
}
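If you are on MongoDB 4.4 or newer, another option (not part of the answer above, just a sketch) is to parse the serialized intent string inside the pipeline with $function instead of counting quote positions:
db.getCollection('test1').aggregate([
    {
        "$set": {
            "intent_text": {
                "$function": {
                    // Parse the stringified array and take the intent of its first element.
                    "body": function(s) { try { return JSON.parse(s)[0].intent; } catch (e) { return null; } },
                    "args": [ "$intent" ],
                    "lang": "js"
                }
            }
        }
    },
    { "$group": { "_id": "$intent_text", "count": { "$sum": 1 } } }
])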

Related

Splitting Json to multiple jsons in NIFI

I have the JSON file below, which I want to split in NiFi.
Input:
[ {
"id" : 123,
"ticket_id" : 345,
"events" : [ {
"id" : 3322,
"type" : "xyz"
}, {
"id" : 6675,
"type" : "abc",
"value" : "sample value",
"field_name" : "subject"
}, {
"id" : 9988,
"type" : "abc",
"value" : [ "text_file", "json_file" ],
"field_name" : "tags"
}]
}]
and my output should be 3 different JSON objects like the ones below:
{
"id" : 123,
"ticket_id" : 345,
"events.id" :3322,
"events.type":xyz
}
{
"id" : 123,
"ticket_id" : 345,
"events.id" :6675,
"events.type":"abc",
"events.value": "sample value"
"events.field_name":"subject"
}
{
"id" : 123,
"ticket_id" : 345,
"events.id" :9988,
"events.type":"abc",
"events.value": "[ "text_file", "json_file" ]"
"events.field_name":"tags"
}
I want to know whether we can do this using SplitJson. I mean, can SplitJson split the JSON based on the array of JSON objects present inside it?
Please let me know if there is a way to achieve this.
If you want 3 different flow files, each containing one JSON object from the array, you should be able to do it with SplitJson using a JSONPath of $ and/or $.*
Using the reduce function:
function split(json) {
return json.reduce((acc, item) => {
const events = item.events.map((evt) => {
const obj = {id: item.id, ticket_id: item.ticket_id};
for (const k in evt) {
obj[`events.${k}`] = evt[k];
}
return obj;
});
return [...acc, ...events];
}, []);
}
const input = [{"id":123,"ticket_id":345,"events":[{"id":3322,"type":"xyz"},{"id":6675,"type":"abc","value":"sample value","field_name":"subject"},{"id":9988,"type":"abc","value":["text_file","json_file"],"field_name":"tags"}]}];
const res = split(input);
console.log(res);
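For the sample input above, res should come out as three flattened objects, one per event, roughly:
[
  { "id": 123, "ticket_id": 345, "events.id": 3322, "events.type": "xyz" },
  { "id": 123, "ticket_id": 345, "events.id": 6675, "events.type": "abc", "events.value": "sample value", "events.field_name": "subject" },
  { "id": 123, "ticket_id": 345, "events.id": 9988, "events.type": "abc", "events.value": [ "text_file", "json_file" ], "events.field_name": "tags" }
]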

Extract values from dictionary and create list of tuples in Robot Framework

I am trying to extract values from a dictionary and return them as a list of tuples in Robot Framework. Could you suggest how to go about it?
My JSON content looks like this:
{
"_embedded" : {
"products" : [ {
"id" : "BMHY2IZB",
"Name" : "ANR",
"securityType" : "type1",
"_links" : {
"self" : {
"href" : "https://test.com/v1/products/BMHY2IZB"
},
"relatedproducts" : {
"href" : "https://test.com/v1/products/BMHY2IZB/related"
}
}
}, {
"id" : "FXDNZBW",
"Name" : "STREPLC",
"securityType" : "ANV",
"_links" : {
"self" : {
"href" : "https://test.com/v1/products/FXDNZBW"
},
"relatedProducts" : {
"href" : "https://test.com/v1/products/FXDNZBW/related"
}
}
} ]
},
"page" : {
"size" : 20,
"totalElements" : 2,
"totalPages" : 1,
"number" : 0
}
}
And with the below code from Robot Framework:
${fileload} = get file ../../Resources/Sample.json
${json}= to json ${fileload}
${PRD}= get from dictionary ${json} _embedded
${products}= get from dictionary ${PRD} products
${PRDlist} = create list
: FOR ${product} in #{products}
\ append to list ${PRDlist} ${product}
log to console ${PRDlist}
I get a response like this :
[{'id': 'BMHY2IZB', 'Name': 'ANR', 'securityType': 'type1', '_links':
{'self': {'href': 'https://test.com/v1/products/BMHY2IZB'},
'relatedproducts': {'href': 'https://test.com/v1/products/BMHY2
IZB/related'}}}, {'id': 'FXDNZBW', 'Name': 'STREPLC', 'securityType':
'ANV',
'_links': {'self': {'href': 'https://test.com/v1/products/FXDNZBW'},
'relatedProducts': {'href':
'https://test.com/v1/products/FXDNZBW/related'}}}]
But I want the selected values returned as a list of tuples:
[('BMHY2IZB', 'ANR', 'type1'), ('FXDNZBW', 'STREPLC', 'ANV')]
This seems to work:
def APIResponse(data):
    # Unwrap the "_embedded" level and grab the product list.
    products = data.get('_embedded').get('products')
    result = []
    for product in products:
        # Keep only the three fields of interest as a tuple.
        result.append((product.get('id'), product.get('Name'), product.get('securityType')))
    return result

How to find all the json key-value pair by matching the value using json query

I have below JSON structure :
{
"key" : "value",
"array" : [
{ "key" : 1 },
{ "key" : 2, "misc": {
"a": "Apple",
"b": "Butterfly",
"c": "Cat",
"d": "Dog"
} },
{ "key" : 3 }
],
"tokenize" : {
"firstkey" : {
"token" : 0
},
"secondkey" : {
"token" : 1
},
"thirdkey" : {
"token" : 0
}
}
}
I am able to traverse the above structure down to array -> dictionary -> b with the syntax below:
$.array[?(#.key=2)].misc.b
Now I need to print all the tokens which have the value 0. In the same way as shown above, I can traverse up to $.array[?(#.key=2)].tokenize.
How can I query it to print all values having token:0?
To be very precise, I want the output to be shown as :
[
"tokenize" : {
"firstkey" : {
"token" : 0
},
"thirdkey" : {
"token" : 0
}
}
]
The following query already shows something close to what I want, but it does not show the keys ("firstkey" and "thirdkey" in this case).
$.tokenize[?(#.token == 0)]
Please help me to get this as well.
Thanks.
You can try this script.
$.tokenize[?(#.token == 0)].token
Result:
[
0,
0
]
$.tokenize[?(#.token == 0)]~
will output
[
"firstkey",
"thirdkey"
]
For the OP's sample JSON, use https://jsonpath-plus.github.io/JSONPath/demo/ to verify against your data.
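For reference, here is a minimal Node.js sketch using the jsonpath-plus package (the same engine as the demo linked above). Note that the filters below use @ rather than #, which is what jsonpath-plus understands; the data object is trimmed to the tokenize part of the OP's document:
// npm install jsonpath-plus
const { JSONPath } = require('jsonpath-plus');

const data = {
  tokenize: {
    firstkey: { token: 0 },
    secondkey: { token: 1 },
    thirdkey: { token: 0 }
  }
};

// Matching values only -> [0, 0]
const values = JSONPath({ path: '$.tokenize[?(@.token == 0)].token', json: data });

// The ~ suffix returns the property names of the matches -> ["firstkey", "thirdkey"]
const keys = JSONPath({ path: '$.tokenize[?(@.token == 0)]~', json: data });

console.log(values, keys);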

Query for : How many elements of an array are matching within a string in mongoDb

Suppose my JSON is like the following:
{ "id":0,"keywords":"amount,debited,account,ticket,not,generated,now" }
{ "id":1,"keywords":"how,safe,gocash" }
{ "id":2,"keywords":"how,referral,program,gocash,works" }
If my array is like
array =["how","safe","gocash"];
then how do I get the count such that, when checking against the first document, the count is zero; with the second, three; and with the third, two? (That is, how many elements of the array are present in the string.)
Is this possible, and what approach should I adopt?
One way of solving this would require some form of modification to your schema by adding an extra field that holds the keywords in an array. This field becomes quite handy when running an aggregation pipeline to return the desired count of elements of an array that match the original string.
To add the additional field, you can use the Bulk API to update the collection as follows:
var bulk = db.collection.initializeOrderedBulkOp(),
count = 0;
db.collection.find({"keywords": { "$exists": true, "$type": 2 }}).forEach(function(doc) {
var keywordsArray = doc.keywords.split(',');
bulk.find({ "_id": doc._id }).updateOne({
"$set": { "keywordsArray": keywordsArray }
});
count++;
if (count % 100 == 0) {
bulk.execute();
bulk = db.collection.initializeOrderedBulkOp();
}
});
if (count % 100 != 0) { bulk.execute(); }
The above creates an additional field "keywordsArray" that is a result of splitting the keywords string to an array.
After the operation your sample collection would have the documents:
/* 0 */
{
"_id" : ObjectId("561e24e9ba53a16c763eaab4"),
"id" : 0,
"keywords" : "amount,debited,account,ticket,not,generated,now",
"keywordsArray" : [
"amount",
"debited",
"account",
"ticket",
"not",
"generated",
"now"
]
}
/* 1 */
{
"_id" : ObjectId("561e24e9ba53a16c763eaab5"),
"id" : 1,
"keywords" : "how,safe,gocash",
"keywordsArray" : [
"how",
"safe",
"gocash"
]
}
/* 2 */
{
"_id" : ObjectId("561e24e9ba53a16c763eaab6"),
"id" : 2,
"keywords" : "how,referral,program,gocash,works",
"keywordsArray" : [
"how",
"referral",
"program",
"gocash",
"works"
]
}
On to the next stage: in the aggregation framework pipeline, run the following operation, which uses the $let, $size and $setIntersection operators to work out the desired count result:
var array = ["how","safe","gocash"];
db.collection.aggregate([
{
"$project": {
"id": 1, "keywords": 1,
"count": {
"$let": {
"vars": {
"commonToBoth": { "$setIntersection": [ "$keywordsArray", array ] }
},
"in": { "$size": "$$commonToBoth" }
}
}
}
}
])
Sample Output:
/* 0 */
{
"result" : [
{
"_id" : ObjectId("561e24e9ba53a16c763eaab4"),
"id" : 0,
"keywords" : "amount,debited,account,ticket,not,generated,now",
"count" : 0
},
{
"_id" : ObjectId("561e24e9ba53a16c763eaab5"),
"id" : 1,
"keywords" : "how,safe,gocash",
"count" : 3
},
{
"_id" : ObjectId("561e24e9ba53a16c763eaab6"),
"id" : 2,
"keywords" : "how,referral,program,gocash,works",
"count" : 2
}
],
"ok" : 1
}
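As a side note, on MongoDB 3.4 or newer you can skip the extra keywordsArray field entirely and split the string inside the pipeline with the $split operator; a sketch of that variant:
var array = ["how", "safe", "gocash"];
db.collection.aggregate([
    {
        "$project": {
            "id": 1, "keywords": 1,
            "count": {
                "$size": {
                    "$setIntersection": [
                        // Turn the comma-separated string into an array on the fly.
                        { "$split": [ "$keywords", "," ] },
                        array
                    ]
                }
            }
        }
    }
])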

Generating Mongo query from MySQL query

I have been using the following MySQL command to construct a heatmap from log data. However, I have a new data set that is stored in a Mongo database and I need to run the same command.
select concat(a.packages, '&', b.packages) "Concurrent Packages",
count(*) "Count"
from data a
cross join data b
where a.packages<b.packages and a.jobID=b.jobID
group by a.packages, b.packages
order by a.packages, b.packages;
Keep in mind that the tables a and b do not exist prior to the query; they are created from the packages column of the data table, which has jobID as the field I want to check for matches. In other words, if two packages occur within the same job, I want to add an entry to the concurrent usage count. How can I generate a similar query in Mongo?
This is not a "join" of different documents; it is an operation within one document, and can be done in MongoDB.
You have a SQL TABLE "data" like this:
JobID TEXT,
package TEXT
The best way to store this in MongoDB will be a collection called "data", containing one document per JobID that contains an array of packages:
{
_id: <JobID>,
packages: [
"packageA",
"packageB",
....
]
}
[ Note: you could also implement your data table as only one document in MongoDB, containing an array of jobs, each of which contains an array of packages. This is not recommended, because you might hit the 16MB document size limit, and nested arrays are not (yet) well supported by different queries, if you want to use the data for other purposes as well. ]
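For completeness, a minimal shell sketch of loading the SQL rows into that shape; jobID and pkg below are placeholders for the values of one (JobID, package) row, so this is an assumption about how your data arrives:
// One upsert per (JobID, package) row; $addToSet keeps each package listed only once per job.
db.data.update(
    { "_id": jobID },
    { "$addToSet": { "packages": pkg } },
    { "upsert": true }
);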
Now, how to get a result like this ?
{ pair: [ "packageA", "packageB" ], count: 20 },
{ pair: [ "packageA", "packageC" ], count: 11 },
...
As there is no built-in "cross join" of two arrays in MongoDB, you'll have to program it out in the map function of a mapReduce(), emitting each pair of packages as a key:
mapf = function () {
    var that = this;
    // Emit each unordered pair of packages within the same job once.
    this.packages.forEach( function( p1 ) {
        that.packages.forEach( function( p2 ) {
            if ( p1 < p2 ) {
                var key = { "pair": [ p1, p2 ] };
                emit( key, 1 );
            }
        });
    });
};
[ Note: this could be optimized, if the packages arrays were sorted ]
The reduce function is nothing more than summing up the counters for each key:
reducef = function( key, values ) {
    var count = 0;
    values.forEach( function( value ) { count += value } );
    return count;
};
So, for this example collection:
> db.data.find()
{ "_id" : "Job01", "packages" : [ "pA", "pB", "pC" ] }
{ "_id" : "Job02", "packages" : [ "pA", "pC" ] }
{ "_id" : "Job03", "packages" : [ "pA", "pB", "pD", "pE" ] }
we get the following result:
> db.data.mapReduce(
... mapf,
... reducef,
... { out: 'pairs' }
... );
{
"result" : "pairs",
"timeMillis" : 443,
"counts" : {
"input" : 3,
"emit" : 10,
"reduce" : 2,
"output" : 8
},
"ok" : 1,
}
> db.pairs.find()
{ "_id" : { "pair" : [ "pA", "pB" ] }, "value" : 2 }
{ "_id" : { "pair" : [ "pA", "pC" ] }, "value" : 2 }
{ "_id" : { "pair" : [ "pA", "pD" ] }, "value" : 1 }
{ "_id" : { "pair" : [ "pA", "pE" ] }, "value" : 1 }
{ "_id" : { "pair" : [ "pB", "pC" ] }, "value" : 1 }
{ "_id" : { "pair" : [ "pB", "pD" ] }, "value" : 1 }
{ "_id" : { "pair" : [ "pB", "pE" ] }, "value" : 1 }
{ "_id" : { "pair" : [ "pD", "pE" ] }, "value" : 1 }
For more information on mapReduce consult: http://docs.mongodb.org/manual/reference/method/db.collection.mapReduce/ and http://docs.mongodb.org/manual/applications/map-reduce/
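On MongoDB 3.4 or newer, the same pair counting can also be expressed as an aggregation pipeline instead of mapReduce. The following is only a sketch, and it assumes each packages array is stored in sorted order so that every pair is generated in a consistent order:
db.data.aggregate([
    {
        "$project": {
            // Build the list of all package pairs (i < j) within each job document.
            "pairs": {
                "$reduce": {
                    "input": { "$range": [ 0, { "$size": "$packages" } ] },
                    "initialValue": [],
                    "in": {
                        "$let": {
                            "vars": { "i": "$$this", "acc": "$$value" },
                            "in": {
                                "$concatArrays": [
                                    "$$acc",
                                    {
                                        "$map": {
                                            "input": { "$range": [ { "$add": [ "$$i", 1 ] }, { "$size": "$packages" } ] },
                                            "as": "j",
                                            "in": [
                                                { "$arrayElemAt": [ "$packages", "$$i" ] },
                                                { "$arrayElemAt": [ "$packages", "$$j" ] }
                                            ]
                                        }
                                    }
                                ]
                            }
                        }
                    }
                }
            }
        }
    },
    // One document per pair, then count in how many jobs each pair occurs.
    { "$unwind": "$pairs" },
    { "$group": { "_id": { "pair": "$pairs" }, "count": { "$sum": 1 } } }
])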
You can't. Mongo doesn't do joins. Switching from SQL to Mongo is a lot more involved than migrating your queries.
Typically, you would include all the pertinent information in the same record (rather than normalize the information and select it with a join). Denormalize!