Filtering JSONPath with given string value - json

If I have a JSON like so:
{
"data": [
{
"service" : { "id" : 1 }
},
{
"service" : { "id" : 2 }
},
{
"service" : {}
}
]
}
This query works:
$..service[?(@.id==2)]
And gives expected result:
[
{
"id" : 2
}
]
However, if the ids are strings:
{
"data": [
{
"service" : { "id" : "a" }
},
{
"service" : { "id" : "b" }
},
{
"service" : {}
}
]
}
Running a similar query:
$..service[?(@.id == "a")]
gives no results (an empty array).
I am using this evaluator.
I looked through the docs here but could not find anything that points me in the right direction. Does anyone know how to write such a query? Thanks :)

It works without the quotes:
$..service[?(@.id == b)]
This gives the result:
[
{
"id" : "b"
}
]
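If you want to cross-check what the evaluator returns without depending on its quoting rules, here is a minimal plain-JavaScript sketch of the selection the filter is meant to make. The helper name findServicesById is hypothetical, and this is not how any JSONPath library works internally; it just walks the tree and keeps every "service" object whose id strictly equals the given value.

```javascript
// Sample document from the question (string ids).
const serviceDoc = {
  data: [
    { service: { id: "a" } },
    { service: { id: "b" } },
    { service: {} }
  ]
};

// Hypothetical helper: recursively visit every value and collect each
// object stored under a "service" key whose id strictly equals `wanted`.
function findServicesById(node, wanted, found = []) {
  if (node === null || typeof node !== "object") return found;
  for (const [key, value] of Object.entries(node)) {
    const isObject = value !== null && typeof value === "object";
    if (key === "service" && isObject && value.id === wanted) {
      found.push(value);
    }
    findServicesById(value, wanted, found);
  }
  return found;
}

console.log(JSON.stringify(findServicesById(serviceDoc, "a"))); // [{"id":"a"}]
```

Because the comparison uses ===, a string id such as "2" will not match the number 2, which is a handy way to see whether the data or the query is using the wrong type.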

Related

How to find all the json key-value pair by matching the value using json query

I have below JSON structure :
{
"key" : "value",
"array" : [
{ "key" : 1 },
{ "key" : 2, "misc": {
"a": "Apple",
"b": "Butterfly",
"c": "Cat",
"d": "Dog"
} },
{ "key" : 3 }
],
"tokenize" : {
"firstkey" : {
"token" : 0
},
"secondkey" : {
"token" : 1
},
"thirdkey" : {
"token" : 0
}
}
}
I am able to traverse the above structure down to array->dictionary->b with the syntax below:
$.array[?(@.key==2)].misc.b
Now I need to print all the tokens that have the value 0. In the same way as shown above, I can traverse down to $.array[?(@.key==2)].tokenize.
How can I query it to print all the values that have token: 0?
To be very precise, I want the output to look like this:
[
"tokenize" : {
"firstkey" : {
"token" : 0
},
"thirdkey" : {
"token" : 0
}
}
]
The following query already shows something close to what I want, but it does not show the keys ("firstkey" and "thirdkey" in this case).
$.tokenize[?(@.token == 0)]
Please help me to get this as well.
Thanks.
You can try this query:
$.tokenize[?(@.token == 0)].token
Result:
[
0,
0
]
$.tokenize[?(@.token == 0)]~
will output
[
"firstkey",
"thirdkey"
]
for the OP's sample JSON. You can use https://jsonpath-plus.github.io/JSONPath/demo/ to verify it against your own data.
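If the goal is the exact shape asked for in the question (the matching entries together with their key names), one option is to post-process the object in plain JavaScript rather than in JSONPath. This is only a sketch under that assumption; the variable names are made up for illustration and only the "tokenize" part of the sample is reproduced.

```javascript
// The "tokenize" part of the sample document from the question.
const tokenDoc = {
  tokenize: {
    firstkey: { token: 0 },
    secondkey: { token: 1 },
    thirdkey: { token: 0 }
  }
};

// Keep only the entries whose token is 0, preserving their key names.
const matching = Object.fromEntries(
  Object.entries(tokenDoc.tokenize).filter(([, value]) => value.token === 0)
);

console.log(JSON.stringify(Object.keys(matching)));
// ["firstkey","thirdkey"]
console.log(JSON.stringify({ tokenize: matching }));
// {"tokenize":{"firstkey":{"token":0},"thirdkey":{"token":0}}}
```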

Use $Map or $Unwind or both?

I want to get an array of results for each of the values within a nested array. The nesting is horribly deep: objectRawOriginData.Reports.Rows.Rows.Cells. I didn't want to use that as my example for the forum, so I've created a simpler one where the name is at columns.rows.0.value and the value is at columns.rows.1.value.
I've created an example below which might help explain my problem.
What I want to get is an array with the following name/value pairs:
header 1 : value 1
header 2 : value 2
{
"_id" : ObjectId("565baae61506995581569437"),
"objectType" : "Report",
"columns" : [
{
"rows" : [
{
"value" : "header 1"
},
{
"value" : "value 1"
}
]
},
{
"rows" : [
{
"value" : "header 2"
},
{
"value" : "value 2"
}
]
}
]
}
I gave this a go below, but it's not giving me the pair of values. I need not only the position 0 value "header 1" but also the position 1 value "value 1".
db.testing.aggregate(
{ $match : { objectType: "Report"}},
{ $project: {_id: 0, columns: 1, rows:1}},
{ $unwind: "$columns" },
{ $unwind: "$columns.rows" },
{ $match: {"columns.rows.value": "header 1"}},
{ $group: {_id: null, columns: { $push: "$columns" }}},
{ $project: {_id: 0,columns: 1}}
//,{$out : "entity_datapoints"}
)
However that just gives me:
{
"result" : [
{
"columns" : [
{
"rows" : {
"value" : "header 1"
}
}
]
}
],
"ok" : 1.0000000000000000
}
Thanks, Matt
Matt, unfortunately you cannot do this as of now. Projecting key names as the value of another field is not yet possible in MongoDB using aggregation. According to this JIRA ticket, it is currently "planned but not scheduled".
I think you can achieve something similar with map-reduce.
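As a stopgap, the pairing can also be done on the client after a plain find(). The sketch below is ordinary JavaScript, not an aggregation stage; the helper name toNameValuePairs is hypothetical, and it assumes every column has exactly two rows, with the name at position 0 and the value at position 1, as in the sample document.

```javascript
// Document shape from the question (the _id field is omitted here).
const report = {
  objectType: "Report",
  columns: [
    { rows: [{ value: "header 1" }, { value: "value 1" }] },
    { rows: [{ value: "header 2" }, { value: "value 2" }] }
  ]
};

// Hypothetical helper: rows[0] holds the name and rows[1] holds the value.
function toNameValuePairs(doc) {
  return doc.columns.map(({ rows }) => ({ [rows[0].value]: rows[1].value }));
}

console.log(JSON.stringify(toNameValuePairs(report)));
// [{"header 1":"value 1"},{"header 2":"value 2"}]
```

In the shell the same function could be called inside a forEach over the cursor returned by find({ objectType: "Report" }).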

Sub-records in Avro with Morphlines

I'm trying to convert JSON into Avro using the kite-sdk morphline module. After playing around I'm able to convert the JSON into Avro using a simple schema (no complex data types).
Then I took it one step further and modified the Avro schema as displayed below (subrec.avsc). As you can see, the schema contains a subrecord.
As soon as I tried to convert the JSON to Avro using the morphlines.conf and the subrec.avsc it failed.
Somehow the JSON paths "/record_type[]/alert/action" are not translated by the toAvro function.
The morphlines.conf
morphlines : [
{
id : morphline1
importCommands : ["org.kitesdk.**"]
commands : [
# Read the JSON blob
{ readJson: {} }
{ logError { format : "record: {}", args : ["@{}"] } }
# Extract JSON
{ extractJsonPaths { flatten: false, paths: {
"/record_type[]/alert/action" : /alert/action,
"/record_type[]/alert/signature_id" : /alert/signature_id,
"/record_type[]/alert/signature" : /alert/signature,
"/record_type[]/alert/category" : /alert/category,
"/record_type[]/alert/severity" : /alert/severity
} } }
{ logError { format : "EXTRACTED THIS : {}", args : ["@{}"] } }
{ extractJsonPaths { flatten: false, paths: {
timestamp : /timestamp,
event_type : /event_type,
source_ip : /src_ip,
source_port : /src_port,
destination_ip : /dest_ip,
destination_port : /dest_port,
protocol : /proto,
} } }
# Create Avro according to schema
{ logError { format : "WE GO TO AVRO"} }
{ toAvro { schemaFile : /etc/flume/conf/conf.empty/subrec.avsc } }
# Create Avro container
{ logError { format : "WE GO TO BINARY"} }
{ writeAvroToByteArray { format: containerlessBinary } }
{ logError { format : "DONE!!!"} }
]
}
]
And the subrec.avsc
{
"type" : "record",
"name" : "Event",
"fields" : [ {
"name" : "timestamp",
"type" : "string"
}, {
"name" : "event_type",
"type" : "string"
}, {
"name" : "source_ip",
"type" : "string"
}, {
"name" : "source_port",
"type" : "int"
}, {
"name" : "destination_ip",
"type" : "string"
}, {
"name" : "destination_port",
"type" : "int"
}, {
"name" : "protocol",
"type" : "string"
}, {
"name": "record_type",
"type" : ["null", {
"name" : "alert",
"type" : "record",
"fields" : [ {
"name" : "action",
"type" : "string"
}, {
"name" : "signature_id",
"type" : "int"
}, {
"name" : "signature",
"type" : "string"
}, {
"name" : "category",
"type" : "string"
}, {
"name" : "severity",
"type" : "int"
}
] } ]
} ]
}
The { logError { format : "EXTRACTED THIS : {}", args : ["@{}"] } } command outputs the following:
[{
/record_type[]/alert / action = [allowed],
/record_type[]/alert / category = [],
/record_type[]/alert / severity = [3],
/record_type[]/alert / signature = [GeoIP from NL,
Netherlands],
/record_type[]/alert / signature_id = [88006],
_attachment_body = [{
"timestamp": "2015-03-23T07:42:01.303046",
"event_type": "alert",
"src_ip": "1.1.1.1",
"src_port": 18192,
"dest_ip": "46.231.41.166",
"dest_port": 62004,
"proto": "TCP",
"alert": {
"action": "allowed",
"gid": "1",
"signature_id": "88006",
"rev": "1",
"signature" : "GeoIP from NL, Netherlands ",
"category" : ""
"severity" : "3"
}
}],
_attachment_mimetype=[json/java + memory],
basename = [simple_eve.json]
}]
UPDATE 2017-06-22
You MUST populate the data in the structure for this to work, using addValues or setValues:
{
addValues {
micDefaultHeader : [
{
eventTimestampString : "2017-06-22 18:18:36"
}
]
}
}
After debugging the sources of the morphline toAvro command, it appears that the record is the first object to be evaluated, no matter what you put in your mappings structure.
The solution is quite simple, but unfortunately it took some extra time, Eclipse, running the Flume agent in debug mode, cloning the source code and lots of coffee.
Here it goes.
My schema:
{
"type" : "record",
"name" : "co_lowbalance_event",
"namespace" : "co.tigo.billing.cboss.lowBalance",
"fields" : [ {
"name" : "dummyValue",
"type" : "string",
"default" : "dummy"
}, {
"name" : "micDefaultHeader",
"type" : {
"type" : "record",
"name" : "mic_default_header_v_1_0",
"namespace" : "com.millicom.schemas.root.struct",
"doc" : "standard millicom header definition",
"fields" : [ {
"name" : "eventTimestampString",
"type" : "string",
"default" : "12345678910"
} ]
}
} ]
}
morphlines file:
morphlines : [
{
id : convertJsonToAvro
importCommands : ["org.kitesdk.**"]
commands : [
{
readJson {
outputClass : java.util.Map
}
}
{
addValues {
micDefaultHeader : [{}]
}
}
{
logDebug { format : "my record: {}", args : ["@{}"] }
}
{
toAvro {
schemaFile : /home/asarubbi/Development/test/co_lowbalance_event.avsc
mappings : {
"micDefaultHeader" : micDefaultHeader
"micDefaultHeader/eventTimestampString" : eventTimestampString
}
}
}
{
writeAvroToByteArray {
format : containerlessJSON
codec : null
}
}
]
}
]
the magic lies here:
{
addValues {
micDefaultHeader : [{}]
}
}
and in the mappings:
mappings : {
"micDefaultHeader" : micDefaultHeader
"micDefaultHeader/eventTimestampString" : eventTimestampString
}
Explanation:
Inside the code, the first field that is evaluated is micDefaultHeader, of type RECORD. As there is no way to specify a default value for a RECORD (which is logically correct), the toAvro code evaluates it, finds no value configured in mappings, and therefore fails, as it detects (wrongly) that the record is empty when it shouldn't be.
However, looking at the code, you can see that it requires a Map object, which may contain no values, to please the parser and continue to the next element.
So we add a map object using addValues and fill it with an empty map, [{}]. Note that the name must match the name of the record that is causing the empty value, in my case "micDefaultHeader".
Feel free to comment if you have a better solution, as this looks like a "dirty fix".

Finding JSON objects in mongoDB

I'm trying to find objects using the built-in queries and it just doesn't work.
My JSON file is something like this:
{ "Text1":
{
"id":"2"
},
"Text2":
{
"id":"2,3"
},
"Text3":
{
"id":"1"
}
}
And I write this: db.myCollection.find({"id":2})
and it doesn't find anything.
When I write db.myCollection.find() it shows all the data as it should.
Does anyone know how to do this correctly?
It's hard to change the data structure, but since you want only the matching sub-document and you don't know which key holds it (for example, whether the query should be on Text1 or Text2), this is a better data structure for it:
{
"_id" : ObjectId("548dd9261a01c68fab8d67d7"),
"pair" : [
{
"id" : "2",
"key" : "Text1"
},
{
"id" : [
"2",
"3"
],
"key" : "Text2"
},
{
"id" : "1",
"key" : "Text3"
}
]
}
and your query is:
db.myCollection.findOne({'pair.id' : "2"} , {'pair.$':1, _id : 0}).pair // there are better ways (such as aggregation instead of the query above)
As a result you will get:
{
"0" : {
"id" : "2",
"key" : "Text1"
}
}
Update 1 (newbie way)
If you want all the matching documents, not just one, use this:
var result = [];
db.myCollection.find({'pair.id' : "2"} , {'pair.$':1, _id : 0}).forEach(function(item)
{
result.push(item.pair);
});
// the output will be in result
Update 2
Use this query to get all the matching sub-documents:
db.myCollection.aggregate
(
{ $unwind: '$pair' },
{ $match : {'pair.id' : "2"} }
).result
It produces this output:
{
"0" : {
"_id" : ObjectId("548deb511a01c68fab8d67db"),
"pair" : {
"id" : "2",
"key" : "Text1"
}
},
"1" : {
"_id" : ObjectId("548deb511a01c68fab8d67db"),
"pair" : {
"id" : [
"2",
"3"
],
"key" : "Text2"
}
}
}
Since your query specifies a field in a subdocument, this is what will work. See the .find() documentation.
db.myCollection.find({"Text1.id" : "2"}, {"Text1.id": true})
{ "_id" : ObjectId("548dd798e2fa652e675af11d"), "Text1" : { "id" : "2" } }
If the query could be on "Text1" or "Text2", the best thing to do, as mentioned in the accepted answer, is to change your document structure. This can easily be done using the "Bulk" API.
var bulk = db.mycollection.initializeOrderedBulkOp(),
count = 0;
db.mycollection.find().forEach(function(doc) {
var pair = [];
for(var key in doc) {
if(key !== "_id") {
var id = doc[key]["id"].split(/[, ]/);
pair.push({"key": key, "id": id});
}
}
bulk.find({"_id": doc._id}).replaceOne({ "pair": pair });
count++; if (count % 300 == 0){
// Execute per 300 operations and re-Init
bulk.execute();
bulk = db.mycollection.initializeOrderedBulkOp();
}
})
// Clean up queues
if (count % 300 != 0 )
bulk.execute();
Your documents now look like this:
{
"_id" : ObjectId("55edddc6602d0b4fd53a48d8"),
"pair" : [
{
"key" : "Text1",
"id" : [
"2"
]
},
{
"key" : "Text2",
"id" : [
"2",
"3"
]
},
{
"key" : "Text3",
"id" : [
"1"
]
}
]
}
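To check the reshaping step on its own before running it against the database, the loop body of the bulk update above can be tried as a plain function. This is only a sketch: the helper name toPairs is hypothetical, and it assumes every top-level value other than _id is an object with a string "id" field, as in the question's sample.

```javascript
// Original document shape from the question.
const original = {
  Text1: { id: "2" },
  Text2: { id: "2,3" },
  Text3: { id: "1" }
};

// Hypothetical helper mirroring the loop body of the bulk update:
// every top-level key except _id becomes one { key, id } entry,
// and the id string is split on commas or spaces into an array.
function toPairs(doc) {
  return Object.keys(doc)
    .filter((key) => key !== "_id")
    .map((key) => ({ key, id: doc[key].id.split(/[, ]/) }));
}

console.log(JSON.stringify(toPairs(original)));
// [{"key":"Text1","id":["2"]},{"key":"Text2","id":["2","3"]},{"key":"Text3","id":["1"]}]
```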
Running the following query:
db.mycollection.aggregate([
{ "$project": {
"pair": {
"$setDifference": [
{ "$map": {
"input": "$pair",
"as": "pr",
"in": {
"$cond": [
{ "$setIsSubset": [ ["2"], "$$pr.id" ]},
"$$pr",
false
]
}
}},
[false]
]
}
}}
])
returns:
{
"_id" : ObjectId("55edddc6602d0b4fd53a48d8"),
"pair" : [
{
"key" : "Text1",
"id" : [
"2"
]
},
{
"key" : "Text2",
"id" : [
"2",
"3"
]
}
]
}
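For comparison, the selection made by the $map / $setIsSubset / $setDifference combination above can be written in plain JavaScript as a simple filter. This is a client-side sketch, not a replacement for the pipeline; the helper name pairsWithId is hypothetical.

```javascript
// Reshaped document from the answer (the _id field is omitted here).
const reshaped = {
  pair: [
    { key: "Text1", id: ["2"] },
    { key: "Text2", id: ["2", "3"] },
    { key: "Text3", id: ["1"] }
  ]
};

// Keep an entry when its id array contains the wanted value, which is
// what { $setIsSubset: [["2"], "$$pr.id"] } tests for a single value.
function pairsWithId(doc, wanted) {
  return doc.pair.filter((entry) => entry.id.includes(wanted));
}

console.log(JSON.stringify(pairsWithId(reshaped, "2").map((entry) => entry.key)));
// ["Text1","Text2"]
```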

elasticsearch request an element in an array

I have a document indexed in Elasticsearch like this:
{
...
purchase:{
zones: ["FR", "GB"]
...
}
...
}
I use this kind of query to find, for example, documents whose purchase zones include GB:
{
"query": {
"filtered": {
"query": {
"match_all": {}
},
"filter": {
"term": {
"purchase.zones": "GB"
}
}
}
}
}
But with it I get no results.
I would like to perform a query like PHP's in_array("GB", purchase.zones).
Any help would be appreciated.
If your "purchase" field is of nested type, then you have to use a nested query to access "zones".
{
"nested" : {
"path" : "obj1",
"score_mode" : "avg",
"query" : {
"bool" : {
"must" : [
{
"match" : {"obj1.name" : "blue"}
},
{
"range" : {"obj1.count" : {"gt" : 5}}
}
]
}
}
}
}
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-nested-query.html
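The example above uses the field names from the documentation (obj1). Adapted to the question's fields, the request body might look like the sketch below, built here as a JavaScript object. It rests on two assumptions that are not confirmed by the question: that "purchase" really is mapped as nested, and that "zones" is indexed so the exact value "GB" is stored as a single term. The function name buildZoneQuery is hypothetical.

```javascript
// Hypothetical builder: the nested query from the answer, rewritten for
// the question's field names. Assumes "purchase" is mapped as nested.
function buildZoneQuery(zone) {
  return {
    query: {
      nested: {
        path: "purchase",
        query: {
          term: { "purchase.zones": zone }
        }
      }
    }
  };
}

const zoneQueryBody = buildZoneQuery("GB");
console.log(JSON.stringify(zoneQueryBody, null, 2));
```

If "purchase" is a plain object rather than a nested type, the nested wrapper is not needed and the original term filter on "purchase.zones" is the right shape.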