JSON Path Combining

I have a JSON file like the one below.
{"trackId":610957461,"countryCode":"TR","deviceType":"IPHONE","date":"2020-10-01","rankings":
[
{"keyword":"boyner","rank":1},
{"keyword":"giyim","rank":1},
{"keyword":"ykm","rank":1},
{"keyword":"colin\\s","rank":1},
{"keyword":"erkek giyim","rank":1},
{"keyword":"boyner kart","rank":1},
{"keyword":"giyim siteleri","rank":1}
]}
When I set the JSON path to $, I see only the trackId, countryCode, deviceType, and date columns.
I want the keyword and rank columns in addition to these.
What is the right JSON path for these columns?

With the Jayway JsonPath implementation, this expression
$..["rankings"]..["keyword", "rank"]
outputs
[
{
"keyword" : "boyner",
"rank" : 1
},
{
"keyword" : "giyim",
"rank" : 1
},
{
"keyword" : "ykm",
"rank" : 1
},
{
"keyword" : "colin\\s",
"rank" : 1
},
{
"keyword" : "erkek giyim",
"rank" : 1
},
{
"keyword" : "boyner kart",
"rank" : 1
},
{
"keyword" : "giyim siteleri",
"rank" : 1
}
]
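The expression above returns only the keyword and rank pairs. If the goal is one row per ranking that also carries the top-level fields, one option is to flatten the document in code. The following is a minimal Python sketch under that assumption; `flatten_rankings` is a hypothetical helper name, and the sample is shortened to three rankings.

```python
import json

# Sample document from the question, shortened to three rankings.
raw = """
{"trackId":610957461,"countryCode":"TR","deviceType":"IPHONE","date":"2020-10-01","rankings":
[
{"keyword":"boyner","rank":1},
{"keyword":"giyim","rank":1},
{"keyword":"ykm","rank":1}
]}
"""


def flatten_rankings(doc):
    """Return one flat row per ranking, repeating the top-level fields."""
    # Everything except the nested array is treated as a parent column.
    parent = {key: value for key, value in doc.items() if key != "rankings"}
    # Merge the parent columns into each ranking entry.
    return [{**parent, **entry} for entry in doc.get("rankings", [])]


rows = flatten_rankings(json.loads(raw))
for row in rows:
    print(row)
```

Each resulting row has the six columns trackId, countryCode, deviceType, date, keyword, and rank.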

Related

jq - how can I flat my list of lists into one level list

How can I transform my JSON
{
"clients": [
{
"id" : "qwerty",
"accounts" : [{"number" : "6666"}, {"number" : "7777"}]
},
{
"id" : "zxcvb",
"accounts" : [{"number" : "1111"}, {"number" : "2222"}]
}
]
}
into the following JSON using jq?
{
"items": [
{
"id" : "qwerty",
"number" : "6666"
},{
"id" : "qwerty",
"number" : "7777"
},{
"id" : "zxcvb",
"number" : "1111"
},{
"id" : "zxcvb",
"number" : "2222"
}]
}
Which jq features can help me here? I haven't found a way to do it.
Something like this should do the trick:
{items: [.clients[] | {id} + .accounts[]]}
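In the filter above, .clients[] iterates over the clients, {id} is shorthand for {id: .id}, .accounts[] emits each account object, and + merges the two objects, so one item is produced per account. For readers who want to check the logic outside jq, here is a rough Python equivalent; `flatten_clients` is a name made up for this sketch.

```python
def flatten_clients(doc):
    """Plain-Python mirror of {items: [.clients[] | {id} + .accounts[]]}."""
    return {
        "items": [
            # Like jq's object addition, keys from the account win on conflict.
            {"id": client["id"], **account}
            for client in doc["clients"]
            for account in client["accounts"]
        ]
    }


data = {
    "clients": [
        {"id": "qwerty", "accounts": [{"number": "6666"}, {"number": "7777"}]},
        {"id": "zxcvb", "accounts": [{"number": "1111"}, {"number": "2222"}]},
    ]
}

result = flatten_clients(data)
print(result)
```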

How to find all the json key-value pair by matching the value using json query

I have the JSON structure below:
{
"key" : "value",
"array" : [
{ "key" : 1 },
{ "key" : 2, "misc": {
"a": "Apple",
"b": "Butterfly",
"c": "Cat",
"d": "Dog"
} },
{ "key" : 3 }
],
"tokenize" : {
"firstkey" : {
"token" : 0
},
"secondkey" : {
"token" : 1
},
"thirdkey" : {
"token" : 0
}
}
}
I can traverse the structure above down to array -> misc -> b with the following syntax:
$.array[?(@.key=2)].misc.b
Now I need to print all the tokens that have the value 0. In the same way as shown above, I can traverse as far as $.array[?(@.key=2)].tokenize.
How can I query it to print all values having token: 0?
To be precise, I want the output to look like this:
[
"tokenize" : {
"firstkey" : {
"token" : 0
},
"thirdkey" : {
"token" : 0
}
}
]
The following query already shows something close to what I want, but it does not show the keys ("firstkey" and "thirdkey" in this case).
$.tokenize[?(@.token == 0)]
Please help me get the keys as well.
Thanks.
You can try this expression:
$.tokenize[?(@.token == 0)].token
Result:
[
0,
0
]
In JSONPath-Plus, the ~ operator returns property names instead of values, so
$.tokenize[?(@.token == 0)]~
will output
[
"firstkey",
"thirdkey"
]
for the OP's sample JSON. Use https://jsonpath-plus.github.io/JSONPath/demo/ to verify against your data.
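The two expressions return either the values or the keys, not both together. If the key and value pairs are needed in the shape the question asks for, one option is to filter the object in code after parsing. A minimal Python sketch, assuming the document is already loaded as a dict; `entries_with_token` is a hypothetical helper name.

```python
# Only the "tokenize" part of the sample document is needed here.
doc = {
    "tokenize": {
        "firstkey": {"token": 0},
        "secondkey": {"token": 1},
        "thirdkey": {"token": 0},
    }
}


def entries_with_token(document, wanted):
    """Keep the entries under "tokenize" whose token equals `wanted`."""
    matches = {
        key: value
        for key, value in document["tokenize"].items()
        if value.get("token") == wanted
    }
    return {"tokenize": matches}


result = entries_with_token(doc, 0)
print(result)
```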

Retrieve item list by checking multiple attribute values in MongoDB in golang

This question is about MongoDB: how can I retrieve items by matching an attribute against multiple values? It is like an IN condition in MySQL:
SELECT * FROM venuelist WHERE venueid IN (venueid1, venueid2)
I have attached the JSON data structure I am using (see "JSON structure of MongoDB" below).
For example, each document has a venueList, and each entry in the venue list has a venue id and a sum array holding user-agent names with their total counts as values. User agent here means the user's OS, browser, and device information. In this case I am using the OS distribution, so I count the linux and ubuntu users for each venue id.
It looks like this:
"sum" : [
{
"name" : "linux",
"value" : 12
},
{
"name" : "ubuntu",
"value" : 4
}
],
Finally, I want to get the total count of linux users for a selected list of venue ids in a single find query in MongoDB.
For example, I want to sum the linux user counts where the venue id is VID1212 or VID4343.
Ref: JSON structure of MongoDB
{
"_id" : ObjectId("57f940c4932a00aba387b0b0"),
"tenantID" : 1,
"date" : "2016-10-09 00:23:56",
"venueList" : [
{
"id" : "VID1212",
"sum" : [
{
"name" : "linux",
"value" : 12
},
{
"name" : "ubuntu",
"value" : 4
}
],
"ssidList" : [ // this is the list of SSIDs in the venue
{
"id" : "SSID1212",
"sum" : [
{
"name" : "linux",
"value" : 8
},
{
"name" : "ubuntu",
"value" : 6
}
],
"macList" : [ // this is the MAC list inside a particular SSID, e.g. the MAC list inside SSID1212
{
"id" : "12:12:12:12:12:12",
"sum" : [
{
"name" : "linux",
"value" : 12
},
{
"name" : "ubuntu",
"value" : 1
}
]
}
]
}
]
},
{
"id" : "VID4343",
"sum" : [
{
"name" : "linux",
"value" : 2
}
],
"ssidList" : [
{
"id" : "SSID4343",
"sum" : [
{
"name" : "linux",
"value" : 2
}
],
"macList" : [
{
"id" : "43:43:43:43:43:34",
"sum" : [
{
"name" : "linux",
"value" : 2
}
]
}
]
}
]
}
]
}
I am using Go to work with MongoDB through the mgo.v2 package.
The expected output is:
output
linux : 12+2 = 14
ubuntu : 4+0 = 4
Don't consider the inner lists in venueList.
You'd need to use the aggregation framework, running a pipeline that first filters the documents in the collection on the venueList ids using the $match operator.
The second stage flattens the venueList and sum subdocument arrays so that the data can be processed further down the pipeline as denormalised entries. The $unwind operator is useful here.
A further $match filter is necessary after unwinding so that only the venues you want to aggregate pass to the next stage.
The main stage is $group, which aggregates the filtered documents to create the desired sums using the accumulator operator $sum. For the desired result, you need a ternary operator like $cond to create the independent count fields, since it feeds either the entry's value or 0 to the $sum expression depending on the name.
Putting this all together, consider running the following pipeline:
db.collection.aggregate([
{ "$match": { "venueList.id": { "$in": ["VID1212", "VID4343"] } } },
{ "$unwind": "$venueList" },
{ "$match": { "venueList.id": { "$in": ["VID1212", "VID4343"] } } },
{ "$unwind": "$venueList.sum" },
{
"$group": {
"_id": null,
"linux": {
"$sum": {
"$cond": [
{ "$eq": [ "$venueList.sum.name", "linux" ] },
"$venueList.sum.value", 0
]
}
},
"ubuntu": {
"$sum": {
"$cond": [
{ "$eq": [ "$venueList.sum.name", "ubuntu" ] },
"$venueList.sum.value", 0
]
}
}
}
}
])
For usage with mgo, you can convert the above pipeline using the guidance in http://godoc.org/labix.org/v2/mgo#Collection.Pipe
For a more flexible alternative, which should also perform better than the above and takes into account names in the sum list that are not known in advance, run the following pipeline:
db.collection.aggregate([
{ "$match": { "venueList.id": { "$in": ["VID1212", "VID4343"] } } },
{ "$unwind": "$venueList" },
{ "$match": { "venueList.id": { "$in": ["VID1212", "VID4343"] } } },
{ "$unwind": "$venueList.sum" },
{
"$group": {
"_id": "$venueList.sum.name",
"count": { "$sum": "$venueList.sum.value" }
}
},
{
"$group": {
"_id": null,
"counts": {
"$push": {
"name": "$_id",
"count": "$count"
}
}
}
}
])
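To sanity-check what the grouped pipeline should return on the sample document, the stages can be emulated in plain Python. This is only an illustration of the logic, not mgo or driver code; `sum_by_name` is a name invented for the sketch, and the sample keeps just the top-level sum arrays because the inner lists are to be ignored.

```python
from collections import defaultdict

# Sample document from the question, reduced to the fields the pipeline reads.
doc = {
    "venueList": [
        {"id": "VID1212", "sum": [{"name": "linux", "value": 12},
                                  {"name": "ubuntu", "value": 4}]},
        {"id": "VID4343", "sum": [{"name": "linux", "value": 2}]},
    ]
}


def sum_by_name(docs, venue_ids):
    """Emulate $unwind / $match / $group on venueList.sum for the given ids."""
    totals = defaultdict(int)
    for document in docs:
        for venue in document.get("venueList", []):   # $unwind "$venueList"
            if venue["id"] not in venue_ids:           # second $match
                continue
            for entry in venue.get("sum", []):         # $unwind "$venueList.sum"
                totals[entry["name"]] += entry["value"]  # $group with $sum
    return dict(totals)


totals = sum_by_name([doc], {"VID1212", "VID4343"})
print(totals)
```

On the sample data this gives 14 for linux and 4 for ubuntu, which matches the expected output in the question.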

How to format the TSV file in Druid

I am trying to load a TSV file into Druid using this ingestion spec:
Most recent spec below:
{
"type" : "index",
"spec" : {
"ioConfig" : {
"type" : "index",
"inputSpec" : {
"type": "local",
"baseDir": "quickstart",
"filter": "test_data.json"
}
},
"dataSchema" : {
"dataSource" : "local",
"granularitySpec" : {
"type" : "uniform",
"segmentGranularity" : "hour",
"queryGranularity" : "none",
"intervals" : ["2016-07-18/2016-07-22"]
},
"parser" : {
"type" : "string",
"parseSpec" : {
"format" : "json",
"dimensionsSpec" : {
"dimensions" : ["name", "email", "age"]
},
"timestampSpec" : {
"format" : "yyyy-MM-dd HH:mm:ss",
"column" : "date"
}
}
},
"metricsSpec" : [
{
"name" : "count",
"type" : "count"
},
{
"type" : "doubleSum",
"name" : "age",
"fieldName" : "age"
}
]
}
}
}
If my schema looks like this:
Schema: name email age
and the actual dataset looks like this:
name email age
Bob Jones 23
Billy Jones 45
Is this how the columns should be formatted in the dataset above for a TSV? That is, should name, email, and age come first (the columns), followed by the actual data? I am confused about how Druid will know how to map the columns to the actual dataset in TSV format.
TSV stands for tab-separated values, so it looks the same as CSV but uses tabs instead of commas, e.g.
Name<TAB>Age<TAB>Address
Paul<TAB>23<TAB>1115 W Franklin
Bessy the Cow<TAB>5<TAB>Big Farm Way
Zeke<TAB>45<TAB>W Main St
You use the first line as a header to define your column names, so you can use "name", "age" or "email" as dimensions in your spec file.
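To see how a header row maps column names onto the data rows, here is a small Python sketch using the standard csv module with a tab delimiter. It only illustrates the file format and says nothing about how Druid parses the file internally; the sample text reuses the rows above with real tab characters.

```python
import csv
import io

# The example rows from above, with actual tab characters as separators.
tsv_text = (
    "Name\tAge\tAddress\n"
    "Paul\t23\t1115 W Franklin\n"
    "Bessy the Cow\t5\tBig Farm Way\n"
    "Zeke\t45\tW Main St\n"
)

# DictReader takes the first line as the header and keys every row by it.
reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
rows = list(reader)

for row in rows:
    print(row["Name"], "|", row["Age"], "|", row["Address"])
```

Note that values containing spaces, such as "Bessy the Cow", stay in one column because only tabs separate fields.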
As for GMT and UTC, they are basically the same:
There is no time difference between Greenwich Mean Time and
Coordinated Universal Time
The first is a time zone, the other is a time standard.
By the way, don't forget to include a column with a time value in your TSV file!
So, for example, if you have a TSV file that looks like this:
"name" "position" "office" "age" "start_date" "salary"
"Airi Satou" "Accountant" "Tokyo" "33" "2016-07-16T19:20:30+01:00" "162700"
"Angelica Ramos" "Chief Executive Officer (CEO)" "London" "47" "2016-07-16T19:20:30+01:00" "1200000"
your spec file should look like this:
{
"spec" : {
"ioConfig" : {
"inputSpec" : {
"type": "local",
"baseDir": "path_to_folder",
"filter": "name_of_the_file(s)"
}
},
"dataSchema" : {
"dataSource" : "local",
"granularitySpec" : {
"type" : "uniform",
"segmentGranularity" : "hour",
"queryGranularity" : "none",
"intervals" : ["2016-07-01/2016-07-28"]
},
"parser" : {
"type" : "string",
"parseSpec" : {
"format" : "tsv",
"dimensionsSpec" : {
"dimensions" : [
"position",
"age",
"office"
]
},
"timestampSpec" : {
"format" : "auto",
"column" : "start_date"
}
}
},
"metricsSpec" : [
{
"name" : "count",
"type" : "count"
},
{
"name" : "sum_sallary",
"type" : "longSum",
"fieldName" : "salary"
}
]
}
}
}

MongoDB aggregate and count json paths

I have a MongoDB Collection which contains data elements like this:
{
"_id" : "9878jr23geg",
"element" : {
"name" : "element7",
"Set" : [
{
"SubListA" : [
{
"name" : "AlbertEinstein",
"value" : "45"
},
{
"name" : "JohnDoe",
"value" : "34"
}
]
},
{
"MoreNames" : [
{
"name" : "TimMcGraw",
"value" : "39"
}
]
}
]
}
}
{
"_id" : "275678hfvd",
"element" : {
"name" : "element8",
"Set" : [
{
"SubListA" : [
{
"name" : "AlbertEinstein",
"value" : "45"
},
{
"name" : "JimmyKimmel",
"value" : "41"
}
]
}
]
}
}
I'm trying to count the occurrences of each unique name, grouped by the element of Set to which it belongs. For example, both objects in my example above have an object with name: "AlbertEinstein" inside element.Set.SubListA; therefore I'd expect a return value something along the lines of:
element.Set.SubListA.AlbertEinstein | 2
Essentially, I'd like a count for each of the distinct names when the data is grouped by objects within element.Set.
Ideally, for the example given, I'd like all of:
element.Set.SubListA.AlbertEinstein | 2
element.Set.SubListA.JohnDoe | 1
element.Set.MoreNames.TimMcGraw | 1
element.Set.SubListA.JimmyKimmel | 1
I've tried several aggregate queries but none seems to achieve what I'm trying to do.
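To make the desired result concrete, the counting can be expressed in plain Python over the two sample documents. This is a sketch of the target output only, not a MongoDB aggregation; `count_name_paths` is a hypothetical helper, and the path format simply follows the examples above.

```python
from collections import Counter

# The two sample documents from the question.
docs = [
    {"_id": "9878jr23geg", "element": {"name": "element7", "Set": [
        {"SubListA": [{"name": "AlbertEinstein", "value": "45"},
                      {"name": "JohnDoe", "value": "34"}]},
        {"MoreNames": [{"name": "TimMcGraw", "value": "39"}]},
    ]}},
    {"_id": "275678hfvd", "element": {"name": "element8", "Set": [
        {"SubListA": [{"name": "AlbertEinstein", "value": "45"},
                      {"name": "JimmyKimmel", "value": "41"}]},
    ]}},
]


def count_name_paths(documents):
    """Count each name per sub-list key found inside element.Set."""
    counts = Counter()
    for document in documents:
        for group in document["element"]["Set"]:
            # Each group is an object such as {"SubListA": [...]}.
            for list_name, entries in group.items():
                for entry in entries:
                    counts[f"element.Set.{list_name}.{entry['name']}"] += 1
    return counts


counts = count_name_paths(docs)
for path, total in counts.items():
    print(f"{path} | {total}")
```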