Formatting SOLR response - json

I have a query that returns the following, which is standard Solr output:
{
  "Company": "Nokia",
  "Series": "X",
  "Products": ["3320", "1100"],
  "ProductStatus": ["Continued", "Discontinued"]
}
I want to format the output as follows:
{
  "Company": "Nokia",
  "Series": "X",
  "Products": [
    {
      "name": "3320",
      "Status": "Continued"
    },
    {
      "name": "1100",
      "Status": "Discontinued"
    }
  ]
}
How can I achieve this?
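One possible approach, offered only as a sketch, is to reshape the response on the client after parsing it. The example below assumes that Products and ProductStatus are parallel arrays (same length, same order); the function name pair_products is illustrative and not part of any Solr API.

```python
import json


def pair_products(doc):
    """Return a copy of doc with Products and ProductStatus zipped into objects.

    Assumption: the two lists are parallel (same order, same length).
    """
    # Copy every field except the two that are being merged.
    out = {k: v for k, v in doc.items() if k not in ("Products", "ProductStatus")}
    out["Products"] = [
        {"name": name, "Status": status}
        for name, status in zip(doc["Products"], doc["ProductStatus"])
    ]
    return out


solr_doc = {
    "Company": "Nokia",
    "Series": "X",
    "Products": ["3320", "1100"],
    "ProductStatus": ["Continued", "Discontinued"],
}

reshaped = pair_products(solr_doc)
print(json.dumps(reshaped, indent=2))
```

Note that zip stops at the shorter list, so a document with mismatched array lengths would silently lose entries; a length check may be worth adding if that can happen in your index.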

Related

Sort / filter multiple objects in JQ by date

I'm trying to use JQ to find the most recent artifact in a Nexus API query. Right now, my JSON output looks something like:
{
"items" : [ {
"downloadUrl" : "https://nexus.ama.org/repository/Snapshots/org/sso/browser-manager/1.0-SNAPSHOT/browser-manager-1.0-20180703.144121-1.jar",
"path" : "org/sso/browser-manager/1.0-SNAPSHOT/browser-manager-1.0-20180703.144121-1.jar",
"id" : "V0FEQS1TbmFwc2hvdHM6MzhjZDQ3NTQwMTBkNGJhOTY1N2JiOTEyMTM1ZGRjZWQ",
"repository" : "Snapshots",
"format" : "maven2",
"checksum" : {
"sha1" : "7ac324905fb1ff15ef6020f256fcb5c9f54113ca",
"md5" : "bb25c483a183001dfdc58c07a71a98ed"
}
}, {
"downloadUrl" : "https://nexus.ama.org/repository/Snapshots/org/sso/browser-manager/1.0-SNAPSHOT/browser-manager-1.0-20180703.204941-2.jar",
"path" : "org/sso/browser-manager/1.0-SNAPSHOT/browser-manager-1.0-20180703.204941-2.jar",
"id" : "V0FEQS1TbmFwc2hvdHM6MzhjZDQ3NTQwMTBkNGJhOWM4YjQ0NmRjYzFkODkxM2U",
"repository" : "Snapshots",
"format" : "maven2",
"checksum" : {
"sha1" : "b4ba2049ea828391c720f49b6668a66a8b0bca9c",
"md5" : "6757c55c0e6d933dc90e398204cca966"
}
} ],
"continuationToken" : null
}
I've managed to use JQ to repackage the data as:
.items[] | { "id" : .id, "date" : (.path | scan("[0-9]{8}\\.[0-9-]*")) }
output:
{
"id": "V0FEQS1TbmFwc2hvdHM6MzhjZDQ3NTQwMTBkNGJhOTY1N2JiOTEyMTM1ZGRjZWQ",
"date": "20180703.144121-1"
}
{
"id": "V0FEQS1TbmFwc2hvdHM6MzhjZDQ3NTQwMTBkNGJhOWM4YjQ0NmRjYzFkODkxM2U",
"date": "20180703.204941-2"
}
Now I'm a little stuck trying to figure out which of the two JSON objects is the most recent. How can I sort by date and extract the id for that object?
Is there a better way to filter/sort this data? My example has only 2 items[] in the JSON response, but there may be a larger number of them.
The filter sort_by/1 will sort your timestamps in chronological order, but it requires an array as input, so you could write:
.items
| map({ "id" : .id, "date" : (.path | scan("[0-9]{8}\\.[0-9-]*")) })
| sort_by(.date)
| .[-1]
The trailing .[-1] selects the last item, so with your input the result would be:
{
"id": "V0FEQS1TbmFwc2hvdHM6MzhjZDQ3NTQwMTBkNGJhOWM4YjQ0NmRjYzFkODkxM2U",
"date": "20180703.204941-2"
}
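For comparison, the same extract-then-pick-the-latest logic can be sketched in Python, using the same regular expression as the jq filter. The ids "first" and "second" are shortened placeholders for the real ids, and latest_item is an illustrative name.

```python
import re

# Same pattern as the jq scan(): eight digits, a dot, then digits and hyphens.
STAMP = re.compile(r"[0-9]{8}\.[0-9-]*")


def latest_item(response):
    """Return {"id", "date"} for the item with the greatest timestamp string."""
    dated = []
    for item in response["items"]:
        match = STAMP.search(item["path"])
        if match:  # skip paths that carry no snapshot timestamp
            dated.append({"id": item["id"], "date": match.group(0)})
    # Plain string comparison is enough because the timestamp part is fixed width.
    return max(dated, key=lambda d: d["date"])


response = {
    "items": [
        {
            "id": "first",  # placeholder id
            "path": "org/sso/browser-manager/1.0-SNAPSHOT/browser-manager-1.0-20180703.144121-1.jar",
        },
        {
            "id": "second",  # placeholder id
            "path": "org/sso/browser-manager/1.0-SNAPSHOT/browser-manager-1.0-20180703.204941-2.jar",
        },
    ],
    "continuationToken": None,
}

newest = latest_item(response)
print(newest["id"])
```

As written, max raises ValueError when no path contains a timestamp, so an empty result needs separate handling.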

Firebase JOIN of 2 nodes (friends and posts)

I am trying to find a way to join two nodes in Firebase (a JSON-based structure).
Example data structure:
"users" : {
  "1" : {
    "name" : "Example Name",
    "contacts" : {
      "2" : true,
      "3" : true
    },
    "posts" : {
      "15" : true,
      "28" : true
    }
  }
},
"posts" : {
  "5" : {
    "user" : "2",
    "date_time" : "11.11.2016",
    "text" : "example text"
  },
  "15" : {
    "user" : "1",
    "date_time" : "25.11.2016",
    "text" : "example text"
  }
}
The user should now have a newsfeed screen that lists all posts from all of their contacts, so a join of the two nodes would make the query much more efficient.
Right now I would run one query per contact to get the post IDs, and then a final query to fetch the actual posts.
EDIT: details and question in comments
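The per-contact lookup described above can be sketched as follows. This is only an in-memory illustration: plain dicts stand in for the two Firebase nodes, no Firebase client API is used, and the entry for user "2" is a hypothetical addition so that the example produces a result.

```python
def newsfeed(db, user_id):
    """Collect the posts of every contact of user_id (a client-side join)."""
    contacts = db["users"][user_id].get("contacts", {})
    feed = []
    for contact_id in contacts:
        # Step 1: read the contact's post ids.
        post_ids = db["users"].get(contact_id, {}).get("posts", {})
        # Step 2: resolve each id against the posts node.
        for post_id in post_ids:
            post = db["posts"].get(post_id)
            if post is not None:
                feed.append({"id": post_id, **post})
    return feed


db = {
    "users": {
        "1": {
            "name": "Example Name",
            "contacts": {"2": True, "3": True},
            "posts": {"15": True, "28": True},
        },
        # Hypothetical entry, not in the original sample data.
        "2": {"name": "Contact", "posts": {"5": True}},
    },
    "posts": {
        "5": {"user": "2", "date_time": "11.11.2016", "text": "example text"},
        "15": {"user": "1", "date_time": "25.11.2016", "text": "example text"},
    },
}

feed = newsfeed(db, "1")
print(feed)
```

Contacts without a users entry (such as "3" here) and post ids without a matching post are skipped rather than treated as errors.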

Query for how many elements of an array match within a string in MongoDB

Suppose my documents look like the following:
{ "id":0,"keywords":"amount,debited,account,ticket,not,generated,now" }
{ "id":1,"keywords":"how,safe,gocash" }
{ "id":2,"keywords":"how,referral,program,gocash,works" }
If my array is
array = ["how","safe","gocash"];
how do I count, for each document, how many elements of the array are present in the keywords string? The count should be zero for the first document, three for the second and two for the third.
Is this possible, and if so, what approach should I take?
One way of solving this would require some form of modification to your schema by adding an extra field that holds the keywords in an array. This field becomes quite handy when running an aggregation pipeline to return the desired count of elements of an array that match the original string.
To add the additional field you would need the Bulk API operations to update the collection as follows:
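// Bulk API: add a "keywordsArray" field holding the split keywords.
var bulk = db.collection.initializeUnorderedBulkOp(),
    count = 0;
// "$type": 2 restricts the scan to documents whose keywords field is a string.
db.collection.find({ "keywords": { "$exists": true, "$type": 2 } }).forEach(function (doc) {
    var keywordsArray = doc.keywords.split(',');
    bulk.find({ "_id": doc._id }).updateOne({
        "$set": { "keywordsArray": keywordsArray }
    });
    count++;
    // Flush every 100 queued updates, then start a fresh batch.
    if (count % 100 == 0) {
        bulk.execute();
        bulk = db.collection.initializeUnorderedBulkOp();
    }
});
// Flush whatever is left in the final partial batch.
if (count % 100 != 0) { bulk.execute(); }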
The above creates an additional field "keywordsArray" that is a result of splitting the keywords string to an array.
After the operation your sample collection would have the documents:
/* 0 */
{
"_id" : ObjectId("561e24e9ba53a16c763eaab4"),
"id" : 0,
"keywords" : "amount,debited,account,ticket,not,generated,now",
"keywordsArray" : [
"amount",
"debited",
"account",
"ticket",
"not",
"generated",
"now"
]
}
/* 1 */
{
"_id" : ObjectId("561e24e9ba53a16c763eaab5"),
"id" : 1,
"keywords" : "how,safe,gocash",
"keywordsArray" : [
"how",
"safe",
"gocash"
]
}
/* 2 */
{
"_id" : ObjectId("561e24e9ba53a16c763eaab6"),
"id" : 2,
"keywords" : "how,referral,program,gocash,works",
"keywordsArray" : [
"how",
"referral",
"program",
"gocash",
"works"
]
}
Next, run the following aggregation pipeline, which uses the $let, $size and $setIntersection operators to work out the desired count:
var array = ["how","safe","gocash"];
db.collection.aggregate([
{
"$project": {
"id": 1, "keywords": 1,
"count": {
"$let": {
"vars": {
"commonToBoth": { "$setIntersection": [ "$keywordsArray", array ] }
},
"in": { "$size": "$$commonToBoth" }
}
}
}
}
])
Sample Output:
/* 0 */
{
"result" : [
{
"_id" : ObjectId("561e24e9ba53a16c763eaab4"),
"id" : 0,
"keywords" : "amount,debited,account,ticket,not,generated,now",
"count" : 0
},
{
"_id" : ObjectId("561e24e9ba53a16c763eaab5"),
"id" : 1,
"keywords" : "how,safe,gocash",
"count" : 3
},
{
"_id" : ObjectId("561e24e9ba53a16c763eaab6"),
"id" : 2,
"keywords" : "how,referral,program,gocash,works",
"count" : 2
}
],
"ok" : 1
}
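The count that the pipeline computes is simply the size of a set intersection. A minimal Python sketch of the same arithmetic on the three sample documents follows; match_count is an illustrative name, and the code assumes the keywords are separated by commas with no surrounding whitespace.

```python
def match_count(keywords, wanted):
    """Count how many distinct entries of wanted occur in the comma-separated keywords."""
    return len(set(keywords.split(",")) & set(wanted))


docs = [
    {"id": 0, "keywords": "amount,debited,account,ticket,not,generated,now"},
    {"id": 1, "keywords": "how,safe,gocash"},
    {"id": 2, "keywords": "how,referral,program,gocash,works"},
]
array = ["how", "safe", "gocash"]

counts = {doc["id"]: match_count(doc["keywords"], array) for doc in docs}
print(counts)  # {0: 0, 1: 3, 2: 2}
```

Like $setIntersection, this treats both sides as sets, so repeated keywords are counted once.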

Search within array object

I have the following JSON object:
{
  "Title": "Terminator",
  "Purchases": [
    {"Country": "US", "Site": "iTunes"},
    {"Country": "FR", "Site": "Google"}
  ]
}
Given the above object, here is what the search results should yield:
"Titles on iTunes in US" ==> YES, show "Terminator"
"Titles on Google in FR" ==> YES, show "Terminator"
"Titles on iTunes in FR" ==> NO
However, if I just AND the two conditions (Titles with Purchases.Country="FR" and Titles with Purchases.Site="iTunes"), the query erroneously returns "Terminator", because each condition is met by some purchase. I want both conditions to be met by the same purchase item. The equivalent in Python would be:
def matches(item):
    for purchase in item['Purchases']:
        if purchase['Country'] == "FR" and purchase['Site'] == "iTunes":
            return True
    return False
Currently it works like this:
def matches(item):
    has_fr = has_itunes = False
    for purchase in item['Purchases']:
        if purchase['Country'] == "FR":
            has_fr = True
        if purchase['Site'] == "iTunes":
            has_itunes = True
    return has_itunes and has_fr
How would this be done in ElasticSearch?
First, you need to index the "Purchases" field as a nested field, by defining the mapping of your object type like this:
{
"properties" : {
"Purchases" : {
"type" : "nested",
"properties": {
"Country" : {"type": "string" },
"Site" : {"type": "string" }
}
}
}
}
Only then will ElasticSearch keep the association between the individual countries and the individual sites, as described here.
Next, you should use a nested query, such as this one:
{ "query":
{ "nested" : {
"path" : "Purchases",
"score_mode" : "avg",
"query" : {
"bool" : {
"must" : [
{
"match" : {"Purchases.Country" : "US"}
},
{
"match" : {"Purchases.Site" : "iTunes"}
}
]
}
}
}
}
}
This will return your object if the query combines "US" and "iTunes", but not if it combines "US" and "Google". The details are described here.
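The difference between a plain object field and a nested field can be modelled in a few lines of Python. This is a sketch of the matching semantics only, not of how ElasticSearch is implemented; matches_flattened and matches_nested are illustrative names.

```python
def matches_flattened(item, country, site):
    """Model of a non-nested field: values are pooled per field name,
    so the link between a country and its site is lost."""
    countries = {p["Country"] for p in item["Purchases"]}
    sites = {p["Site"] for p in item["Purchases"]}
    return country in countries and site in sites


def matches_nested(item, country, site):
    """Model of a nested field: both conditions must hold within one purchase."""
    return any(
        p["Country"] == country and p["Site"] == site
        for p in item["Purchases"]
    )


item = {
    "Title": "Terminator",
    "Purchases": [
        {"Country": "US", "Site": "iTunes"},
        {"Country": "FR", "Site": "Google"},
    ],
}

print(matches_flattened(item, "FR", "iTunes"))  # True: the unwanted match
print(matches_nested(item, "FR", "iTunes"))     # False
print(matches_nested(item, "US", "iTunes"))     # True
```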

Generating Mongo query from MySQL query

I have been using the following MySQL command to construct a heatmap from log data. However, I have a new data set that is stored in a Mongo database and I need to run the same command.
select concat(a.packages, '&', b.packages) "Concurrent Packages",
count(*) "Count"
from data a
cross join data b
where a.packages<b.packages and a.jobID=b.jobID
group by a.packages, b.packages
order by a.packages, b.packages;
Note that a and b are not separate tables that exist before the query runs; they are aliases for the data table, which has a packages column and a jobID column that I want to match on. In other words, if two packages appear in the same job, I want to add an entry to the concurrent usage count. How can I write a similar query in Mongo?
This is not a "join" of different documents; it is an operation within one document, and can be done in MongoDB.
You have a SQL TABLE "data" like this:
JobID TEXT,
package TEXT
The best way to store this in MongoDB will be a collection called "data", containing one document per JobID that contains an array of packages:
{
_id: <JobID>,
packages: [
"packageA",
"packageB",
....
]
}
[ Note: you could also implement your data table as only one document in MongoDB, containing an array of jobs which contain each an array of packages. This is not recommended, because you might hit the 16MB document size limit and nested arrays are not (yet) well supported by different queries - if you want to use the data for other purposes as well ]
Now, how do we get a result like this?
{ pair: [ "packageA", "packageB" ], count: 20 },
{ pair: [ "packageA", "packageC" ], count: 11 },
...
As there is no built-in "cross join" of two arrays in MongoDB, you'll have to program it out in the map function of a mapReduce(), emitting each pair of packages as a key:
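mapf = function () {
    // Keep a reference to the document; "this" is rebound inside the callbacks.
    var that = this;
    this.packages.forEach(function (p1) {
        that.packages.forEach(function (p2) {
            // Emit each unordered pair once, in lexical order.
            if (p1 < p2) {
                emit({ "pair": [ p1, p2 ] }, 1);
            }
        });
    });
};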
[ Note: this could be optimized, if the packages arrays were sorted ]
The reduce function is nothing more than summing up the counters for each key:
reducef = function (key, values) {
    // Sum the per-pair counters emitted by the map function.
    var count = 0;
    values.forEach(function (value) { count += value; });
    return count;
};
So, for this example collection:
> db.data.find()
{ "_id" : "Job01", "packages" : [ "pA", "pB", "pC" ] }
{ "_id" : "Job02", "packages" : [ "pA", "pC" ] }
{ "_id" : "Job03", "packages" : [ "pA", "pB", "pD", "pE" ] }
we get the following result:
> db.data.mapReduce(
... mapf,
... reducef,
... { out: 'pairs' }
... );
{
"result" : "pairs",
"timeMillis" : 443,
"counts" : {
"input" : 3,
"emit" : 10,
"reduce" : 2,
"output" : 8
},
"ok" : 1,
}
> db.pairs.find()
{ "_id" : { "pair" : [ "pA", "pB" ] }, "value" : 2 }
{ "_id" : { "pair" : [ "pA", "pC" ] }, "value" : 2 }
{ "_id" : { "pair" : [ "pA", "pD" ] }, "value" : 1 }
{ "_id" : { "pair" : [ "pA", "pE" ] }, "value" : 1 }
{ "_id" : { "pair" : [ "pB", "pC" ] }, "value" : 1 }
{ "_id" : { "pair" : [ "pB", "pD" ] }, "value" : 1 }
{ "_id" : { "pair" : [ "pB", "pE" ] }, "value" : 1 }
{ "_id" : { "pair" : [ "pD", "pE" ] }, "value" : 1 }
For more information on mapReduce consult: http://docs.mongodb.org/manual/reference/method/db.collection.mapReduce/ and http://docs.mongodb.org/manual/applications/map-reduce/
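To check the numbers above without a database, the same pair counting can be sketched in Python with itertools.combinations and collections.Counter. The name pair_counts is illustrative, and removing duplicate packages within a job is an assumption of this sketch, not something the map function does.

```python
from collections import Counter
from itertools import combinations


def pair_counts(jobs):
    """Count, for each unordered package pair, the number of jobs containing both."""
    counts = Counter()
    for packages in jobs.values():
        # Sorting puts each pair in lexical order, like the p1 < p2 test in the map function.
        # set() drops duplicate packages within a job (an assumption of this sketch).
        for pair in combinations(sorted(set(packages)), 2):
            counts[pair] += 1
    return counts


jobs = {
    "Job01": ["pA", "pB", "pC"],
    "Job02": ["pA", "pC"],
    "Job03": ["pA", "pB", "pD", "pE"],
}

counts = pair_counts(jobs)
for pair, n in sorted(counts.items()):
    print(pair, n)
```

On the sample collection this yields 8 distinct pairs from 10 pairings in total, which agrees with the "output" and "emit" counters reported by mapReduce.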
You can't. Mongo doesn't do joins. Switching from SQL to Mongo is a lot more involved than migrating your queries.
Typically, you would include all the pertinent information in the same record (rather than normalize the information and select it with a join). Denormalize!