I'm currently working on a stream that receives a bunch of strings, and I need to count all the strings. The sums are cumulative: for the second record, the sum is added to the sum of the day before.
The output must be a JSON file that looks like this:
{
"aggregationType" : "day",
"days before" : 2,
"aggregates" : [
{"date" : "2018-03-03",
"sum" : 120},
{"date" :"2018-03-04",
"sum" : 203}
]
}
I created a stream that looks like this:
// env is assumed to be the StreamExecutionEnvironment
val eventStream: DataStream[String] = env.addSource(source)
eventStream
  .keyBy("")
  .timeWindow(Time.days(1), Time.days(1))
  .trigger(new MyTriggerFunc)
  .aggregate(new MyAggregationFunc)
  .addSink(sink)
Thank you in advance for the help :)
Note on working with JSON in Flink:
Use JSONDeserializationSchema to deserialize the events, which will produce ObjectNodes. You can map the ObjectNode to YourObject for convenience or continue working with the ObjectNode.
Tutorial on working with ObjectNode: http://www.baeldung.com/jackson-json-node-tree-model
Back to your case, you can do it like the following:
// env is assumed to be the StreamExecutionEnvironment
val oneMinuteAgg: DataStream[ObjectNode] =
  env.addSource(source)
    .timeWindowAll(Time.minutes(1)) // the windowAll variant for time windows
    .trigger(new MyTriggerFunc)
    .aggregate(new MyAggregationFunc)
This will output a stream of one-minute aggregates:
[
  {
    "date": "2018-03-03",
    "sum": 120
  },
  {
    "date": "2018-03-03",
    "sum": 120
  }
]
Then chain another operator onto "oneMinuteAgg" that rolls the one-minute aggregates up into one-day aggregates:
[...]
oneMinuteAgg
  .timeWindowAll(Time.days(1)) // the windowAll variant for time windows
  .trigger(new Whatever)
  .aggregate(new YourDayAggF)
That will output what you need:
{
  "aggregationType": "day",
  "days before": 4,
  "aggregates": [
    {
      "date": "2018-03-03",
      "sum": 120
    },
    {
      "date": "2018-03-03",
      "sum": 120
    }
  ]
}
I used windowAll() assuming you don't need to key the stream.
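To make the two-stage roll-up concrete, here is a minimal sketch in plain Python (not Flink). It sums one-minute aggregates into per-day counts and then turns them into running totals, as the question describes. The sample values, the "minute" field, and the helper names are made up for illustration, and "days before" is assumed here to be the number of days covered.

```python
import json
from collections import Counter
from itertools import accumulate

def roll_up_to_days(minute_aggregates):
    """Sum per-minute counts into per-day counts, keyed by the date part."""
    per_day = Counter()
    for agg in minute_aggregates:
        day = agg["minute"][:10]  # "2018-03-03T10:15" -> "2018-03-03"
        per_day[day] += agg["sum"]
    return dict(sorted(per_day.items()))

def build_report(per_day):
    """Turn per-day counts into running totals in the shape of the JSON above."""
    days = list(per_day)
    running = accumulate(per_day[d] for d in days)
    return {
        "aggregationType": "day",
        "days before": len(days),  # assumption: number of days covered
        "aggregates": [{"date": d, "sum": s} for d, s in zip(days, running)],
    }

# Hypothetical one-minute aggregates (made-up sample values).
minute_aggregates = [
    {"minute": "2018-03-03T10:15", "sum": 70},
    {"minute": "2018-03-03T10:16", "sum": 50},
    {"minute": "2018-03-04T09:00", "sum": 83},
]

report = build_report(roll_up_to_days(minute_aggregates))
print(json.dumps(report, indent=2))
```

In a real Flink job the first function would correspond to the one-minute window and the second to the one-day window; the running total is the part that needs state carried from one day to the next.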
How do I access the JSON array to display the output of "AdjustedScheduleTime" from the Trip section?
I got it working for StopLabel as shown below, but I'm struggling to access AdjustedScheduleTime.
I tried the following:
["GetNextTripsForStopResponse"]["GetNextTripsForStopResult"]["Route"]["RouteDirection"]["Trips"]["Trip"]["AdjustedScheduleTime"]
but it doesn't work.
override func viewDidLoad() {
    super.viewDidLoad()
    // Do any additional setup after loading the view, typically from a nib.
    let parameters = [
        "appID": "5rt5rydg", // incorrect appID
        "apiKey": "3b5fb15rdgy5454hdrfhr", // incorrect apiKey
        "routeNo": "14",
        "stopNo": "8600",
        "format": "JSON"
    ]
    AF.request("https://api.octranspo1.com/v1.2/GetNextTripsForStop?", method: .post, parameters: parameters,
               encoding: URLEncoding.httpBody, headers: nil).responseJSON { response in
        let swiftyJsonVar = JSON(response.result.value!)
        print(swiftyJsonVar)
        if let busInfo = swiftyJsonVar["GetNextTripsForStopResult"]["StopLabel"].string {
            print(": ", busInfo)
            print("Label1: ", self.label1.text = busInfo)
        }
    }
}
This is the result:
{
"GetNextTripsForStopResult" : {
"Error" : "",
"Route" : {
"RouteDirection" : {
"RouteLabel" : "St-Laurent",
"Error" : "",
"RequestProcessingTime" : "20190112151425",
"Trips" : {
"Trip" : [
{
"AdjustmentAge" : "0.38",
"GPSSpeed" : "0.5",
"Latitude" : "45.429457",
"Longitude" : "-75.684117",
"TripDestination" : "St-Laurent",
"LastTripOfSchedule" : false,
"TripStartTime" : "14:31",
"BusType" : "4LB - IN",
"AdjustedScheduleTime" : "11"
},
{
"AdjustmentAge" : "4.32",
"GPSSpeed" : "0.5",
"Latitude" : "45.413749",
"Longitude" : "-75.689748",
"TripDestination" : "St-Laurent",
"LastTripOfSchedule" : false,
"TripStartTime" : "14:46",
"BusType" : "4LB - IN",
"AdjustedScheduleTime" : "22"
},
{
"AdjustmentAge" : "0.55",
"GPSSpeed" : "31.3",
"Latitude" : "45.399587",
"Longitude" : "-75.727631",
"TripDestination" : "St-Laurent",
"LastTripOfSchedule" : false,
"TripStartTime" : "15:01",
"BusType" : "4L - IN",
"AdjustedScheduleTime" : "37"
}
]
},
"RouteNo" : 14,
"Direction" : "Eastbound"
}
},
"StopLabel" : "MCARTHUR \/ IRWIN MILLER",
"StopNo" : "8600"
}
}
: MCARTHUR / IRWIN MILLER //This is the desired output for StopLabel
OK, so how do you explain JSON? Here's a shot.
First some rules:
When you see an opening {, it means a dictionary: you have to pick a key next.
When you see an opening [, it means an array: you have to pick an index.
When you see "SomeString":, it's a key in a dictionary.
Dictionaries have keys, arrays have indexes. Pick accordingly.
So when we walk through this response:
We see that we start with {. We have a dictionary! We're expecting to see some keys next.
So let's pick a key. We only have one, and it's "GetNextTripsForStopResult". So far we have: swiftyJsonVar["GetNextTripsForStopResult"]
We now look at the contents of "GetNextTripsForStopResult". We see it's also a dictionary. Again we should have some keys, and we do: Error, Route, StopLabel and more. Let's pick a key. Since we're trying to get to "AdjustedScheduleTime", let's pick Route. So far we have: ["GetNextTripsForStopResult"]["Route"]
Now let's look at the contents of Route. It's a dictionary again.
Again we pick a key, and we keep repeating until we hit Trip. You should have: ["GetNextTripsForStopResult"]["Route"]["RouteDirection"]["Trips"]["Trip"]
Let's look at what we have in Trip. What's this? It's an array!
We have to pick an index now, so we need to choose somehow. That's the tricky part: to choose properly we'd need some more information. So let's just ARBITRARILY choose one. Let's take the last one. So we have: ["GetNextTripsForStopResult"]["Route"]["RouteDirection"]["Trips"]["Trip"][2]
Now we can get to our final key, AdjustedScheduleTime. So let's pick it!
["GetNextTripsForStopResult"]["Route"]["RouteDirection"]["Trips"]["Trip"][2]["AdjustedScheduleTime"]
Keep in mind:
These hard-coded indexes are almost NEVER what you want. Maybe you need to show all the AdjustedScheduleTime values to the user, or let the user choose one, or add them all up. That really depends on your application and what you're trying to accomplish. I chose the last index (2) arbitrarily, without any knowledge of your application, the API you're calling, or what you're trying to achieve. It's VERY possible that you don't want the last index.
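The same walk can be sketched in Python, where a dict plays the role of the dictionary and a list the role of the array. This is only an illustration of the rules above, not Swift or SwiftyJSON code, and the JSON is a trimmed copy of the response with only the fields needed here.

```python
import json

# Trimmed copy of the response above (only the fields needed here).
raw = """
{
  "GetNextTripsForStopResult": {
    "Route": {
      "RouteDirection": {
        "Trips": {
          "Trip": [
            {"TripStartTime": "14:31", "AdjustedScheduleTime": "11"},
            {"TripStartTime": "14:46", "AdjustedScheduleTime": "22"},
            {"TripStartTime": "15:01", "AdjustedScheduleTime": "37"}
          ]
        }
      }
    },
    "StopLabel": "MCARTHUR / IRWIN MILLER"
  }
}
"""

response = json.loads(raw)

# Dictionaries: pick a key at each level until we reach the array.
trips = response["GetNextTripsForStopResult"]["Route"]["RouteDirection"]["Trips"]["Trip"]

# Array: either pick one index...
last_time = trips[2]["AdjustedScheduleTime"]

# ...or loop over every element instead of hard-coding an index.
all_times = [trip["AdjustedScheduleTime"] for trip in trips]

print(last_time)   # 37
print(all_times)   # ['11', '22', '37']
```

Looping over the array is usually the safer choice, since the number of trips in the response can change from one call to the next.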
This is the structure of a document I have in a MongoDB collection.
I want to understand how to write a MongoDB aggregation that gives a grouped count over the key "code" and its index position in the nested array (not the priority, which can be any number, whereas the nested "scheduleds" array can hold just 5 values):
{
"_id" : ObjectId("5749e9fde4b0064e7362b560"),
"_class" : "com.weirdcompanyname.core.collectionname",
"rfId" : 1,
"scheduleds" : [
{
"code" : "556e4835f1eae40bdfa2f2001f2afc76",
"type" : "HT",
"priority" : 0
},
{
"code" : "8b2ab67af4f60e42f7ea64813b5795cf",
"type" : "HT",
"priority" : 1
},
{
"code" : "ed17101eb918b4d8c7c598e4884523ea",
"type" : "HT",
"priority" : 2
},
{
"code" : "7e0ffb4db",
"type" : "QZ",
"priority" : 3
},
{
"code" : "1453dfa1794f39b05f0259ad04699073",
"type" : "HT",
"priority" : 4
}
],
"created" : ISODate("2016-05-28T18:57:00.878Z")
}
The result I'm trying to find is:
code index_position count
556e4835f1eae40bdfa2f2001f2afc76 0 100
8b2ab67af4f60e42f7ea64813b5795cf 1 100
ed17101eb918b4d8c7c598e4884523ea 2 100
7e0ffb4db 3 100
1453dfa1794f39b05f0259ad04699073 4 100
I could get my head around unwinding the nested array into single elements and then grouping by code and maybe another column, say priority, to get the count. The problem is getting the index position.
Is this even doable in MongoDB? I've read a lot about it, and I figured that if I had a value whose position I needed, it would be doable. But I don't really have a value to look for: what I'm looking for is each code, its index position in "scheduleds", and the count.
This is what I could do with my limited MongoDB querying skills:
db.collectionname.aggregate([
  {'$match': {'date_key': {'$gte': yesterday_beginning, '$lte': yesterday_end}}},
  {'$unwind': '$scheduleds'},
  {'$group': {'_id': {'code': '$scheduleds.code', 'priority': '$scheduleds.priority'}, 'rfid': {'$addToSet': '$rfId'}}},
  {'$project': {'_id': 0, 'code': '$_id.code', 'priority': '$_id.priority', 'totalRfid': {'$size': '$rfid'}}},
  {'$limit': 1000}
], {allowDiskUse: true})
Alain1405 says here that MongoDB 3.2 supports unwinding of the array index.
Instead of passing a path to the $unwind operator, you can pass an object with the field path and the field includeArrayIndex, which names the new field that will hold the array index.
From MongoDB official documentation:
{
$unwind:
{
path: <field path>,
includeArrayIndex: <string>,
preserveNullAndEmptyArrays: <boolean>
}
}
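To show what includeArrayIndex buys you, here is a small plain-Python emulation of unwinding with an index and then grouping by code and position. It is not MongoDB code. The function name, the "index_position" field name, and the two sample documents (with shortened codes) are made up for illustration.

```python
from collections import Counter

def unwind_with_index(docs, path, index_field):
    """Emit one copy of each document per array element, recording its position."""
    for doc in docs:
        for position, element in enumerate(doc.get(path, [])):
            yield {**doc, path: element, index_field: position}

# Two hypothetical documents shaped like the one in the question (codes shortened).
docs = [
    {"rfId": 1, "scheduleds": [{"code": "aaa"}, {"code": "bbb"}]},
    {"rfId": 2, "scheduleds": [{"code": "aaa"}, {"code": "ccc"}]},
]

# Group by (code, index position) and count, like a $group with {$sum: 1}.
counts = Counter(
    (row["scheduleds"]["code"], row["index_position"])
    for row in unwind_with_index(docs, "scheduleds", "index_position")
)

for (code, position), count in sorted(counts.items()):
    print(code, position, count)
```

In the actual pipeline, the field named by includeArrayIndex can be used inside the $group _id in the same way "index_position" is used in the grouping key here.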
I am a newbie to MongoDB. I am experimenting with the various ways of extracting fields from a document inside a collection.
In the JSON document below, I am finding it difficult to extract the part I need:
{
"_id":1,
"dependencies":{
"a":[
"hello",
"hi"
],
"b":[
"Hmmm"
],
"c":[
"Vanilla",
"Strawberry",
"Pista"
],
"d":[
"Carrot",
"Cauliflower",
"Potato",
"Cabbage"
]
},
"productid":"25",
"date":"Thu Jul 30 11:36:49 PDT 2015"
}
I need to display the following output:
c:[
"Vanilla",
"Strawberry",
"Pista"
]
Can anyone please help me in solving it?
MongoDB aggregation comes to the rescue to get the result you are looking for:
$project: passes along the documents with only the specified fields to the next stage in the pipeline. The specified fields can be existing fields from the input documents or newly computed fields.
db.collection.aggregate( [
{ $project :
{ c: "$dependencies.c", _id : 0 }
}
]).pretty();
For the output you require, we just need to project (display) the field "dependencies.c", so we create a new field "c" and assign the value of "dependencies.c" to it.
Also, by default the "_id" field is displayed along with the result. Since you don't need it, we suppress it by assigning "_id" : <0 or false>, so that it is not displayed in the output.
The above query will fetch the result below:
"c" : [
"Vanilla",
"Strawberry",
"Pista"
]
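If it helps to see what the projection does to a single document, here is a rough plain-Python equivalent. It only mimics the effect of the $project stage above and is not how MongoDB implements it. The helper name project_field is made up.

```python
def project_field(doc, dotted_path, new_name):
    """Return a new dict holding only the value found at dotted_path, under new_name."""
    value = doc
    for key in dotted_path.split("."):
        value = value[key]  # step into the nested dictionary one key at a time
    return {new_name: value}

doc = {
    "_id": 1,
    "dependencies": {
        "a": ["hello", "hi"],
        "b": ["Hmmm"],
        "c": ["Vanilla", "Strawberry", "Pista"],
        "d": ["Carrot", "Cauliflower", "Potato", "Cabbage"],
    },
    "productid": "25",
}

result = project_field(doc, "dependencies.c", "c")
print(result)  # {'c': ['Vanilla', 'Strawberry', 'Pista']}
```

Everything that is not named, including "_id", is left out of the result, which matches the effect of "_id" : 0 in the query.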
I have nested JSON like this:
"disks" : [ {
"name" : "v2.16",
"diskAggregate" : "aggr0",
"diskRPM" : 15000,
"totalSizeBytes" : 1077477376,
"vendorId" : "NETAPP ",
"usedBytes" : 1070071808,
"diskType" : "FCAL",
"uuid" : "4E455441:50502020:56442D31:3030304D:422D465A:2D353230:32353836:30303030:00000000:00000000",
"portName" : "FC:A ",
"raidGroup" : "rg0"
},
{
"name" : "v4.16",
"diskAggregate" : "aggr0",
"diskRPM" : 15000,
"totalSizeBytes" : 1077477376,
"vendorId" : "NETAPP ",
"usedBytes" : 1070071808,
"diskType" : "FCAL",
"uuid" : "4E455441:50502020:56442D31:3030304D:422D465A:2D353230:32353633:34333030:00000000:00000000",
"portName" : "FC:B ",
"raidGroup" : "rg0"
}]
I want to get the sum of 'totalSizeBytes' across the above list of objects.
I used the following code to get it:
val storageDevices = "above given json".toList
val totalCapacity = storageDevices.foldLeft(0) {
  case (sumOfAllDevices, storageDevice) =>
    val sumOfTotalBytesOnStorageDevice = storageDevice.disks.foldLeft(0) {
      case (totalBytesOnDevice, disk) =>
        totalBytesOnDevice + disk.usedBytes.getOrElse(0).toString.toInt
    }
    sumOfAllDevices + sumOfTotalBytesOnStorageDevice
    // Logger.info("dss"+sumOfTotalBytesOnStorageDevice.toString.toInt)
}
This code gives me the total capacity as an Int. But as there are many objects in the disks array, totalCapacity will exceed the Int range. So I want to convert it to Long while doing the addition.
I want the following output:
"totalCapacity": [
{
"name": "192.168.20.22",
"y": 123456789
}
]
How do I convert it to Long to get the exact sum of all 'totalBytesAvailable' from the array/list?
Write the zero values as 0L (a bare 0 is inferred as Int), both in foldLeft(0L) and in getOrElse(0L), so that the compiler performs the additions on Long.
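To see why the Int version goes wrong, here is the arithmetic worked through in Python. Python integers never overflow, so the 32-bit wrap-around is simulated by hand. The helper to_int32 is made up for illustration, and the sizes are the two totalSizeBytes values from the question.

```python
def to_int32(n):
    """Wrap n the way a signed 32-bit Int (as in Scala/Java) would."""
    return (n + 2**31) % 2**32 - 2**31

# totalSizeBytes of the two disks from the question.
sizes = [1077477376, 1077477376]

int_total = 0
for size in sizes:
    int_total = to_int32(int_total + size)  # mimics Int addition with wrap-around

long_total = sum(sizes)  # fits easily in a 64-bit Long

print(int_total)   # -2140012544
print(long_total)  # 2154954752
```

Two disks of this size are already enough to pass the Int maximum of 2147483647, which is why the accumulator has to start as a Long.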
I have a rather complex structure in my JSON and I cannot work out how to query it to get the documents I am interested in. Here is a sample of my data:
{
"_id" : ObjectId("5282bf9ce4b05216ca1b68f8"),
"authorID" : ObjectId("5282a8c3e4b0d7f4f4d07b9a"),
"blogID" : "7180831558698033600",
"blogs" : {
"$" : {
"posts" : [
[
{
"author" : {
"displayName" : "mms",
...
...
...
}}}
So, I am interested in finding all JSON documents that have the author displayName equal to "mms".
My collection name is bz so, a find all query would be: db.dz.find()
What criteria do I have to put inside find() to get only the JSON documents whose author displayName equals "mms"?
Any ideas?
Thank you in advance!
Suppose you have renamed the field "$" to "dollarSign".
Then db.dz.find({"blogs.dollarSign.posts.author.displayName": "mms"}) will fetch the whole documents that match your requirement.