I'm trying to determine the best way to calculate the elapsed time each operation (a series of actions) took. Looking at my example data below, how might I take the min/max timestamp of the "actions" array for each operation whose actions include both 'take' and 'throw':
{
"name" : "test",
"location" : "here",
"operation" "hammer use",
"actions" : [
{
"action" : "take",
"object" : "hammer",
"timestamp" : "12332234234"
},
{
"action" : "drop",
"object" : "hammer",
"timestamp" : "12332234255"
},
{
"action" : "take",
"object" : "hammer",
"timestamp" : "12332234266"
},
{
"action" : "throw",
"object" : "hammer",
"timestamp" : "12332234277"
}
]
},
{
"name" : "test 2",
"location" : "there",
"operation" : "rock use",
"actions" : [
{
"action" : "take",
"object" : "rock",
"timestamp" : "12332534277"
},
{
"action" : "drop",
"object" : "rock",
"timestamp" : "12332534288"
},
{
"action" : "take",
"object" : "rock",
"timestamp" : "12332534299"
},
{
"action" : "throw",
"object" : "rock",
"timestamp" : "12332534400"
}
]
},
{
"name" : "test 3",
"location" : "elsewhere",
"operation" : "seal hose",
"actions" : [
{
"action" : "create",
"object" : "grommet",
"timestamp" : "12332534277"
},
{
"action" : "place",
"object" : "grommet",
"timestamp" : "12332534288"
},
{
"action" : "tighten",
"object" : "hose",
"timestamp" : "12332534299"
}
]
}
Expected output:
{
"operation" : "hammer use",
"elapsed_time" : 123
},
{
"operation" : "rock use",
"elapsed_time" : 123
}
I'm still new to RethinkDB and trying to get the hang of it. So far, I've come up with the following query to pick the specific records I'm interested in from the table:
r.db('test').table('operations').filter(function(row) {
  return row('actions').contains(function(x) { return x('action').eq('take') }).and(
    row('actions').contains(function(x) { return x('action').eq('throw') })
  );
});
I'm still trying to figure out how to aggregate the results by taking the min/max of the timestamp and subtracting them from each other.
I hope there's enough detail there to get an idea for the goal at hand. Let me know otherwise. Any help greatly appreciated.
Well, nobody bit on this, so I had to solve it without any help. It took a bit longer, but I finally figured it out. Here's the query for finding the min/max of the nested timestamp fields above, and the elapsed_time:
r.db('test').table('operations').filter(function(row) {
  return row('actions').contains(function(x) { return x('action').eq("take") }).and(
    row('actions').contains(function(x) { return x('action').eq("throw") })
  );
}).map(function(doc) {
  // doc('actions')('timestamp') plucks every action's timestamp into an array
  return {
    operation: doc('operation'),
    min: doc('actions')('timestamp').min(),
    max: doc('actions')('timestamp').max(),
    elapsed_time: doc('actions')('timestamp').max().sub(doc('actions')('timestamp').min())
  };
});
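One caveat (an assumption on my part, based on the sample data above): the timestamps in the example are strings, and sub() only works on numbers, so the values may need to be coerced first. A minimal sketch of the same query with that coercion added:
r.db('test').table('operations').filter(function(row) {
  return row('actions').contains(function(x) { return x('action').eq('take') }).and(
    row('actions').contains(function(x) { return x('action').eq('throw') })
  );
}).map(function(doc) {
  // Coerce the string timestamps to numbers before taking min/max (assumes they are numeric strings).
  var times = doc('actions')('timestamp').map(function(t) { return t.coerceTo('number'); });
  return {
    operation: doc('operation'),
    elapsed_time: times.max().sub(times.min())
  };
});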
Related
How can I transform my JSON
{
"clients": [
{
"id" : "qwerty",
"accounts" : [{"number" : "6666"}, {"number" : "7777"}]
},
{
"id" : "zxcvb",
"accounts" : [{"number" : "1111"}, {"number" : "2222"}]
}
]
}
into the following type of JSON using jq?
{
"items": [
{
"id" : "qwerty",
"number" : "6666"
},{
"id" : "qwerty",
"number" : "7777"
},{
"id" : "zxcvb",
"number" : "1111"
},{
"id" : "zxcvb",
"number" : "2222"
}]
}
What jq features can help me here? I can't figure out a way to do it.
Something like this should do the trick:
{items: [.clients[] | {id} + .accounts[]]}
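Briefly, what the filter does: .clients[] iterates over the clients array, {id} keeps only the id field of each client, and + .accounts[] merges that object with each of the client's account objects, producing one flat {id, number} object per account; the outer [...] collects them all into the items array.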
I have the JSON you can see below, and I want to sum the values of the two documents, but when I run the aggregation it returns 0. Below you can see the queries I use; the first line is really only there to make sure the path works, and it does. However, when I use this same path in the aggregation query it gives me the "ID" and the "COUNT" with the right values, but the "SUM" is always 0 when it should be 3600. Any idea?
db.getCollection('TEST').find({"prices.year.months.day.csv.price.valPrice":1800})
db.TEST.aggregate([
  { $match: { "location.cp": "20830" } },
  { $group: { _id: "20830", total: { $sum: "$prices.year.months.day.csv.price.valPrice" }, count: { $sum: 1 } } }
])
And this is the JSON:
{
"_id" : "20830:cas:S:3639",
"lodgtype" : "Casa",
"lodg" : "Motrico: country holiday home - San sebastian",
"webid" : "6107939",
"location" : {
"thcod" : "20",
"cp" : "20830",
"th" : "Gipuzkoa",
"geometry" : {
"type" : "Point",
"coordinates" : [
43.31706238,
-2.40293598
]
}
},
"prices" : {
"year" : [
{
"valYear" : "2018",
"months" : [
{
"valMonth" : "02",
"day" : [
{
"valDay" : "13",
"csv" : [
{
"valCsv" : "20180205210908_223",
"price" : [
{
"valPrice" : 1800.0
}
]
}
]
}
]
}
]
}
]
},
"reg" : {
"created" : "20180213",
"updated" : "20180213",
"viewed" : "20180213"
}
},{
"_id" : "TEST20830:cas:S:3639",
"lodgtype" : "Casa",
"lodg" : "TESTMotrico: country holiday home - San sebastian",
"webid" : "6107930",
"location" : {
"thcod" : "20",
"cp" : "20830",
"th" : "Gipuzkoa",
"geometry" : {
"type" : "Point",
"coordinates" : [
43.31706238,
-2.40293598
]
}
},
"prices" : {
"year" : [
{
"valYear" : "2018",
"months" : [
{
"valMonth" : "02",
"day" : [
{
"valDay" : "13",
"csv" : [
{
"valCsv" : "20180205210908_223",
"price" : [
{
"valPrice" : 1800.0
}
]
}
]
}
]
}
]
}
]
},
"reg" : {
"created" : "20180213",
"updated" : "20180213",
"viewed" : "20180213"
}
}
The $sum returns 0 because the dotted path traverses several levels of nested arrays, so the expression resolves to an array rather than a number, and $sum ignores non-numeric values in a $group. Since you have a deeply nested array, you have to $unwind each level to flatten it into a document structure. To count the number of matched documents, you have to add an extra $group right after $match, using $push with $$ROOT to keep the matching data.
db.TEST.aggregate([
  {"$match":{"location.cp":"20830"}},
  {"$group":{
    "_id":"20830",
    "data":{"$push":"$$ROOT"},
    "count":{"$sum":1}
  }},
  {"$unwind":"$data"},
  {"$unwind":"$data.prices.year"},
  {"$unwind":"$data.prices.year.months"},
  {"$unwind":"$data.prices.year.months.day"},
  {"$unwind":"$data.prices.year.months.day.csv"},
  {"$unwind":"$data.prices.year.months.day.csv.price"},
  {"$group":{
    "_id":"20830",
    "total":{"$sum":"$data.prices.year.months.day.csv.price.valPrice"},
    "count":{"$first":"$count"}
  }}
])
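For what it's worth, a slightly shorter pipeline should give the same numbers. This is only a sketch, assuming MongoDB 2.6+ for $size (the collection name TEST comes from the question): unwind the nested arrays directly and count the distinct matched _ids instead of pushing $$ROOT.
db.TEST.aggregate([
  { $match: { "location.cp": "20830" } },
  { $unwind: "$prices.year" },
  { $unwind: "$prices.year.months" },
  { $unwind: "$prices.year.months.day" },
  { $unwind: "$prices.year.months.day.csv" },
  { $unwind: "$prices.year.months.day.csv.price" },
  { $group: {
    _id: "$location.cp",
    total: { $sum: "$prices.year.months.day.csv.price.valPrice" },
    // collect distinct document ids so the count reflects matched documents, not unwound rows
    ids: { $addToSet: "$_id" }
  }},
  { $project: { total: 1, count: { $size: "$ids" } } }
])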
I am using Logstash 2.4 to read JSON messages from a Kafka topic and send them to an Elasticsearch Index.
The JSON format is as below --
{
"schema":
{
"type": "struct",
"fields": [
{
"type":"string",
"optional":false,
"field":"reloadID"
},
{
"type":"string",
"optional":false,
"field":"externalAccountID"
},
{
"type":"int64",
"optional":false,
"name":"org.apache.kafka.connect.data.Timestamp",
"version":1,
"field":"reloadDate"
},
{
"type":"int32",
"optional":false,
"field":"reloadAmount"
},
{
"type":"string",
"optional":true,
"field":"reloadChannel"
}
],
"optional":false,
"name":"reload"
},
"payload":
{
"reloadID":"328424295",
"externalAccountID":"9831200013",
"reloadDate":1446242463000,
"reloadAmount":240,
"reloadChannel":"C1"
}
}
Without any filter in my config file, the target documents from the ES index look like below --
{
"_index" : "kafka_reloads",
"_type" : "logs",
"_id" : "AVfcyTU4SyCFNFP2z5-l",
"_score" : 1.0,
"_source" : {
"schema" : {
"type" : "struct",
"fields" : [ {
"type" : "string",
"optional" : false,
"field" : "reloadID"
}, {
"type" : "string",
"optional" : false,
"field" : "externalAccountID"
}, {
"type" : "int64",
"optional" : false,
"name" : "org.apache.kafka.connect.data.Timestamp",
"version" : 1,
"field" : "reloadDate"
}, {
"type" : "int32",
"optional" : false,
"field" : "reloadAmount"
}, {
"type" : "string",
"optional" : true,
"field" : "reloadChannel"
} ],
"optional" : false,
"name" : "reload"
},
"payload" : {
"reloadID" : "155559213",
"externalAccountID" : "9831200014",
"reloadDate" : 1449529746000,
"reloadAmount" : 140,
"reloadChannel" : "C1"
},
"#version" : "1",
"#timestamp" : "2016-10-19T11:56:09.973Z",
}
}
But I want only the value part of the "payload" field to go into my ES index as the target JSON body, so I tried to use the 'mutate' filter in the config file as below --
input {
kafka {
zk_connect => "zksrv-1:2181,zksrv-2:2181,zksrv-4:2181"
group_id => "logstash"
topic_id => "reload"
consumer_threads => 3
}
}
filter {
mutate {
remove_field => [ "schema","@version","@timestamp" ]
}
}
output {
elasticsearch {
hosts => ["datanode-6:9200","datanode-2:9200"]
index => "kafka_reloads"
}
}
With this filter, the ES documents now look like below --
{
"_index" : "kafka_reloads",
"_type" : "logs",
"_id" : "AVfch0yhSyCFNFP2z59f",
"_score" : 1.0,
"_source" : {
"payload" : {
"reloadID" : "850846698",
"externalAccountID" : "9831200013",
"reloadDate" : 1449356706000,
"reloadAmount" : 30,
"reloadChannel" : "C1"
}
}
}
But actually it should be like below --
{
"_index" : "kafka_reloads",
"_type" : "logs",
"_id" : "AVfch0yhSyCFNFP2z59f",
"_score" : 1.0,
"_source" : {
"reloadID" : "850846698",
"externalAccountID" : "9831200013",
"reloadDate" : 1449356706000,
"reloadAmount" : 30,
"reloadChannel" : "C1"
}
}
Is there a way to do this? Can anyone help me on this?
I also tried the below filter --
filter {
json {
source => "payload"
}
}
But that is giving me errors like --
Error parsing json {:source=>"payload", :raw=>{"reloadID"=>"572584696", "externalAccountID"=>"9831200011", "reloadDate"=>1449093851000, "reloadAmount"=>180, "reloadChannel"=>"C1"}, :exception=>java.lang.ClassCastException: org.jruby.RubyHash cannot be cast to org.jruby.RubyIO, :level=>:warn}
Any help will be much appreciated.
Thanks
Gautam Ghosh
You can achieve what you want using the following ruby filter:
ruby {
code => "
event.to_hash.delete_if {|k, v| k != 'payload'}
event.to_hash.update(event['payload'].to_hash)
event.to_hash.delete_if {|k, v| k == 'payload'}
"
}
What it does is:
remove all fields but the payload one
copy all payload inner fields to the root level
delete the payload field itself
You'll end up with what you need.
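(Side note for anyone on a newer version: the snippet relies on the Logstash 2.x event API, where the event behaves like a hash; from Logstash 5 onward the ruby filter has to go through event.get / event.set instead, so the code above would need adapting.)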
It's been a while, but here is a valid workaround; I hope it's useful.
json_encode {
source => "json"
target => "json_string"
}
json {
source => "json_string"
}
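(In the context of this question, the source for json_encode would presumably be the payload field rather than json; the idea is to re-serialize the nested object to a string and let the json filter parse it back, which places the parsed fields at the root of the event. A mutate/remove_field afterwards can then drop the intermediate json_string field.)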
I am trying to load a TSV into Druid using this ingestion spec:
MOST UPDATED SPEC BELOW:
{
"type" : "index",
"spec" : {
"ioConfig" : {
"type" : "index",
"inputSpec" : {
"type": "local",
"baseDir": "quickstart",
"filter": "test_data.json"
}
},
"dataSchema" : {
"dataSource" : "local",
"granularitySpec" : {
"type" : "uniform",
"segmentGranularity" : "hour",
"queryGranularity" : "none",
"intervals" : ["2016-07-18/2016-07-22"]
},
"parser" : {
"type" : "string",
"parseSpec" : {
"format" : "json",
"dimensionsSpec" : {
"dimensions" : ["name", "email", "age"]
},
"timestampSpec" : {
"format" : "yyyy-MM-dd HH:mm:ss",
"column" : "date"
}
}
},
"metricsSpec" : [
{
"name" : "count",
"type" : "count"
},
{
"type" : "doubleSum",
"name" : "age",
"fieldName" : "age"
}
]
}
}
}
If my schema looks like this:
Schema: name email age
And the actual dataset looks like this:
name email age
Bob Jones 23
Billy Jones 45
Is this how the columns should be formatted in the above dataset for a TSV? That is, should the columns (name email age) come first and then the actual data? I am confused about how Druid will know how to map the columns to the actual data in TSV format.
TSV stands for tab-separated values, so it looks the same as CSV but you use tabs instead of commas, e.g.
Name<TAB>Age<TAB>Address
Paul<TAB>23<TAB>1115 W Franklin
Bessy the Cow<TAB>5<TAB>Big Farm Way
Zeke<TAB>45<TAB>W Main St
You will use the first line as a header to define your column names, so you can use "name", "age" or "email" in the dimensions of your spec file.
As for GMT and UTC, they are basically the same:
There is no time difference between Greenwich Mean Time and
Coordinated Universal Time
The first one is a time zone, the other one is a time standard.
By the way, don't forget to include a column with some time value in your TSV file!
So, for example, if you have a TSV file that looks like:
"name" "position" "office" "age" "start_date" "salary"
"Airi Satou" "Accountant" "Tokyo" "33" "2016-07-16T19:20:30+01:00" "162700"
"Angelica Ramos" "Chief Executive Officer (CEO)" "London" "47" "2016-07-16T19:20:30+01:00" "1200000"
Your spec file should look like this:
{
"spec" : {
"ioConfig" : {
"inputSpec" : {
"type": "local",
"baseDir": "path_to_folder",
"filter": "name_of_the_file(s)"
}
},
"dataSchema" : {
"dataSource" : "local",
"granularitySpec" : {
"type" : "uniform",
"segmentGranularity" : "hour",
"queryGranularity" : "none",
"intervals" : ["2016-07-01/2016-07-28"]
},
"parser" : {
"type" : "string",
"parseSpec" : {
"format" : "tsv",
"dimensionsSpec" : {
"dimensions" : [
"position",
"age",
"office"
]
},
"timestampSpec" : {
"format" : "auto",
"column" : "start_date"
}
}
},
"metricsSpec" : [
{
"name" : "count",
"type" : "count"
},
{
"name" : "sum_sallary",
"type" : "longSum",
"fieldName" : "salary"
}
]
}
}
}
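(One caveat, depending on the Druid version: with a "tsv" or "csv" parseSpec you may also need a "columns" array in the parseSpec listing every field in file order, since delimited data is not self-describing unless header-row support is available and enabled.)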
I want to update an array value that is nested within an array value, i.e. set
status = enabled
where alerts.id = 2
{
"_id" : ObjectId("5496a8ed49847b6cd7c7b350"),
"name" : "joe",
"locations" : [
{
"name": "my location",
"alerts" : [
{
"id" : 1,
"status" : null
},
{
"id" : 2,
"status" : null
}
]
}
]
}
I would have used the positional $ operator, but it cannot be used twice in a statement; multiple positional operators are not supported yet: https://jira.mongodb.org/browse/SERVER-831
How do I issue a statement to only update the status field of an alert matching an id of 2?
UPDATE
If I change the schema as follows:
{
"_id" : ObjectId("5496ab2149847b6cd7c7b352"),
"name" : "joe",
"locations" : {
"my location" : {
"alerts" : [
{
"id" : 1,
"status" : "enabled"
},
{
"id" : 2,
"status" : "enabled"
}
]
},
"my other location" : {
"alerts" : [
{
"id" : 3,
"status" : null
},
{
"id" : 4,
"status" : null
}
]
}
}
}
I can then use:
update({"locations.my location.alerts.id":1},{$set: {"locations.my location.alerts.$.status": "enabled"}});
The problem is that I cannot create indexes on the alert id :-(
It may be better modelled like this, especially if an index on location and/or alerts.id is needed.
{
"_id" : ObjectId("5496a8ed49847b6cd7c7b350"),
"name" : "joe",
"location" : "myLocation",
"alerts" : [{
"id" : 1,
"status" : null
},
{
"id" : 2,
"status" : null
}
]
}
{
"_id" : ObjectId("5496a8ed49847b6cd7c7b350"),
"name" : "joe",
"location" : "otherLocation",
"alerts" : [{
"id" : 1,
"status" : null
},
{
"id" : 2,
"status" : null
}
]
}
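With that model, a single positional operator is enough. A minimal sketch (the collection name users is an assumption; the question never names one):
db.users.update(
  { location: "myLocation", "alerts.id": 2 },
  { $set: { "alerts.$.status": "enabled" } }
);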
I think you are using the wrong tool for the job. What you have in your example is relational data, and it's much easier to handle with a relational database, so I would suggest using an SQL database instead of Mongo.
But if you really want to do it with Mongo, then I guess the only option is to fetch the document, modify it, and put it back, along the lines of the sketch below.
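A minimal shell sketch of that fetch-modify-save approach (the collection name users is an assumption; the question does not give one):
// Fetch the document, flip the matching alert's status in memory, then write it back.
var doc = db.users.findOne({ _id: ObjectId("5496a8ed49847b6cd7c7b350") });
doc.locations.forEach(function(location) {
  location.alerts.forEach(function(alert) {
    if (alert.id === 2) {
      alert.status = "enabled";
    }
  });
});
db.users.save(doc);  // replaces the whole document, so it is not safe under concurrent writes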