CEP's sequence detection - fiware

While developing for FIWARE's Proton CEP, I came across an issue with Sequence event detection. I'll use the DoSAttack example project, which comes with the software, to explain the issue.
I make two main changes to an original copy of DoSAttack:
-One is to give the ExpectedCrash event 3 more attributes. This way I can log to the DoSAttackTRConsumer file the 3 values that triggered it.
-The other is to change the Cardinality Policy of the agent from Single to Unrestricted. This way the event can be triggered several times in a row as TrafficReports come in (this may be a source of the issue).
I test this result and I find it works ok. I can see in the log that the values that trigger detection are the sequence of 3 values that arrived just before the event, after the first three events have arrived.
This is with the test applied to those 3 values still being the original example test: (TR3.volume>1.50* TR2.volume AND TR2.volume>1.50 * TR1.volume).
The issue arises if I reduce the test to just (TR3.volume>1.50* TR2.volume), for example. Then CEP doesn't hold TR1 correctly: TR1 is now the same as TR2, so CEP loses its "memory" of this value.
Going a step further, I make the test just the condition (3>2), which is always true and should trigger a detection on any event that arrives. In this case, as events arrive, TR1, TR2 and TR3 are all the same and CEP has no memory of past values, even though the agent is of Type: Sequence.
The desired application would be for the CEP to receive 22 readings as a sequence of input events and analyse only the 1st, 8th, 15th and 22nd values of this sequence, each time a new value enters. But I find I can't make CEP remember the values correctly unless I'm testing all of them explicitly in the Condition view-box.
What would be the correct way to analyse the 1st, 8th, 15th and 22nd values that arrived, evaluating each time a new one arrives?
Here is the specification of DoSAttack, after altering it:
{"epn":{"events":[{"name":"TrafficReport","attributes":[{"name":"volume","type":"Integer","dimension":0}]},{"name":"ExpectedCrash","attributes":[{"name":"Cost","type":"Double","dimension":0},{"name":"TR1","type":"Integer","dimension":"0"},{"name":"TR2","type":"Integer","dimension":"0"},{"name":"TR3","type":"Integer","dimension":"0"}]}],"epas":[{"name":"IncreasingTraffic","epaType":"Sequence","context":"3MinAfterStartUp","inputEvents":[{"name":"TrafficReport","alias":"TR1","consumptionPolicy":"Consume","instanceSelectionPolicy":"First"},{"name":"TrafficReport","alias":"TR2","consumptionPolicy":"Consume","instanceSelectionPolicy":"First"},{"name":"TrafficReport","alias":"TR3","consumptionPolicy":"Consume","instanceSelectionPolicy":"First"}],"computedVariables":[],"assertion":"3>2","evaluationPolicy":"Immediate","cardinalityPolicy":"Unrestricted","internalSegmentation":[],"derivedEvents":[{"name":"ExpectedCrash","reportParticipants":false,"expressions":{"Cost":"10","TR1":"TR1.volume","TR2":"TR2.volume","TR3":"TR3.volume"}}],"derivedActions":[]}],"contexts":{"temporal":[{"name":"3MinAfterStartUp","type":"TemporalInterval","atStartup":true,"neverEnding":false,"initiators":[],"terminators":[{"terminatorType":"RelativeTime","terminationType":"Terminate","relativeTime":"180000"}]}],"segmentation":[],"composite":[]},"consumers":[{"name":"SysTemCrashConsumer","type":"File","properties":[{"name":"filename","value":"/opt/tomcat10/sample/DoSAttack_PredictedCrash.txt"},{"name":"formatter","value":"json"},{"name":"delimiter","value":";"},{"name":"tagDataSeparator","value":"="},{"name":"SendingDelay","value":"1000"}],"events":[{"name":"ExpectedCrash"}],"actions":[]},{"name":"DoSAttackTRConsumer","type":"File","properties":[{"name":"filename","value":"/opt/tomcat10/sample/DoSAttack_TrafficReport.txt"},{"name":"formatter","value":"json"},{"name":"delimiter","value":";"},{"name":"tagDataSeparator","value":"="},{"name":"SendingDelay","value":"1000"}],"events":[{"name":"TrafficReport"}]
,"actions":[]}],"producers":[{"name":"TrafficReportFileProducer","type":"File","properties":[{"name":"filename","value":"/opt/tomcat10/sample/DoSAttackScenarioJSON.txt"},{"name":"pollingInterval","value":"1000"},{"name":"sendingDelay","value":"1500"},{"name":"formatter","value":"json"},{"name":"delimiter","value":";"},{"name":"tagDataSeparator","value":"="}],"events":[]}],"actions":[],"name":"DoSAttack"}}
The producer file, DoSAttackScenarioJSON.txt, is still the original one, unaltered:
{"Name":"TrafficReport", "volume":"1000"}
{"Name":"TrafficReport", "volume":"1600"}
{"Name":"TrafficReport", "volume":"2500"}
If you include more than 3 values, you can see that the issue propagates.
If you need more information let me know.
Thank you

In the Sequence pattern, the engine looks for event instances that occurred in a particular order.
In Sequence (A, B, C), the engine looks for three event instances, the first one of type A, the second of type B and the third of type C, where:
(A's detection time) <= (B's detection time) AND (B's detection time) <= (C's detection time)
Usually in a Sequence pattern, either the event types are different, or there is some other condition over the participant events (as in the DoSAttack example).
When you use the same event type in a sequence (e.g., Sequence(A, A, A)), the same event instance can be used in all three places, since it satisfies the detection-time order listed above.
In addition, if you use a "consumptionPolicy": "Consume" for a participant event, then after the event was used to detect the pattern, it will not be used for future detections of this pattern.
This is why when you have a Sequence(A, A, A) with no condition, and event instance A1 of type A arrives, it causes a pattern detection, and since it has Consume policy, it will not be kept for future detections. Later when event A2 of type A arrives, it causes another detection based on A2 alone.
Also, according to the Sequence built-in condition over the detection time, a sequence of events can be detected even though other events arrived in between.
Please describe the pattern you would like to detect. Maybe you can use a Trend or Aggregate EPA instead.
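To make the explanation above concrete, here is a toy model of Sequence(TR1, TR2, TR3) over a single event type. It is only a sketch under my own assumptions and not Proton's actual matching algorithm: every arriving instance is offered to all three operands, candidates are tried in arrival order (my reading of the "First" policy), the only built-in check is the <= ordering of detection times, and matched instances are removed ("Consume"). The function name detect and the data layout are made up for illustration.

```python
from itertools import product

def detect(volumes, assertion):
    """Toy Sequence(TR1, TR2, TR3) over one event type (illustrative only)."""
    buffers = [[], [], []]            # candidate instances per operand
    detections = []
    for t, volume in enumerate(volumes):
        instance = (t, volume)        # (detection time, volume)
        for buf in buffers:           # same type, so it is a candidate everywhere
            buf.append(instance)
        for combo in product(*buffers):
            in_order = combo[0][0] <= combo[1][0] <= combo[2][0]
            if in_order and assertion(*(v for _, v in combo)):
                detections.append(tuple(v for _, v in combo))
                for buf, used in zip(buffers, combo):
                    buf.remove(used)  # "Consume": not available for later detections
                break
    return detections

reports = [1000, 1600, 2500]
print(detect(reports, lambda a, b, c: True))
print(detect(reports, lambda a, b, c: c > 1.5 * b))
print(detect(reports, lambda a, b, c: c > 1.5 * b and b > 1.5 * a))
```

With the always-true assertion each report triggers a detection on itself alone, and with the reduced assertion TR1 and TR2 end up being the same instance. Only the full original assertion forces three distinct instances in this model.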

Related

Data model for timeline event synchronisation

I am looking for ideas on the data model for the following problem (and the proper CS terminology):
A (horizontal) "timeline" with several rows (A,B,C) contains "events" (1,2,3) with different durations (width) at different times (absolute x position or by delay "." after previous event):
A 1111....222222
B 33333
------------------
T 0123456789ABCDEF
(The rows are only interesting for graphical representation of overlapping/parallel "events", so they probably are not essential to the data model.)
Event duration may vary, affecting the whole timing:
A 11....222222
B 33333+3
------------------
T 0123456789ABCDEF
But let event 2 require events 1 and 3 to be finished, so the timing should look like this:
A 11.... 222222
B 33333+3
------------------
T 0123456789ABCDEF
(let's ignore that the original delay at T=7 is now missing.)
Originally I thought I'd have to have some "elastic" synchronization elements, one for each row:
A 11....####222222
B 33333+3#
------------------
T 0123456789ABCDEF
Thus the original problem of how to model and sync the sync elements in the two different "rows". But, as established above, this is only a matter of graphical/parallel representation.
Rather, the sync is a condition that could be "attached" to event 2, modifying or determining its beginning.
If an event "has" a condition, it will not have an absolute or relative start time. Its start can only be determined at the ends of the "linked" events (1 and 3).
So, given (a list of) some events with variable duration and either an absolute start time or a delay relative to another event's end, how could the condition "events 1 and 3 ended" be modelled to determine the start of "event 2"?
(I will prototype this in JavaScript and eventually implement in C/C++, so any sample code provided should not use high-level data types or libraries.)
What you need is an object that I would call a TimeFrame. The object would have the attributes duration, link and type, where link can be a precise time or a link to another TimeFrame and type accounts for the kind of link. For instance, a given TimeFrame that starts at a known time would have that time as its link attribute and the type would be TIME. A TimeFrame that is linked to the end of another would have that other TimeFrame as its link attribute and START-END as its type and so on.
Using the combination between link and type you could also support other types of links such as START-START, END-START or END-END.
UPDATE
Also, in order to allow some time interval between say, the end of a TimeFrame and the start of the next, one can add the attribute lag, which represents any delay between events. So, for instance if tf1 and tf2 are TimeFrames such that tf2 must start 5 time units after the end of tf1 the attributes of tf2 would be link = tf1, type = START-END, duration = <something> and lag = 5. Note also that the lag could be negative, which would extend the expressiveness of the model to a broad range of relationships.
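A minimal sketch of the TimeFrame idea, assuming a type of the form MINE-THEIRS means "my MINE point is tied to the linked frame's THEIRS point" (so START-END starts at the other frame's end, as described above). The attribute is called kind here only to avoid clashing with Python's built-in type, and the numbers are invented. The class uses nothing beyond plain fields, so it should map onto a C struct with a pointer for link.

```python
class TimeFrame:
    def __init__(self, name, duration, link, kind="TIME", lag=0):
        self.name = name
        self.duration = duration
        self.link = link      # a number when kind == "TIME", otherwise a TimeFrame
        self.kind = kind      # "TIME", "START-END", "START-START", "END-END", "END-START"
        self.lag = lag        # delay added to the anchor point; may be negative

    def start(self):
        if self.kind == "TIME":
            return self.link + self.lag
        mine, theirs = self.kind.split("-")
        anchor = self.link.start() if theirs == "START" else self.link.end()
        anchor += self.lag
        # If the link constrains my end, derive my start from it.
        return anchor if mine == "START" else anchor - self.duration

    def end(self):
        return self.start() + self.duration

tf1 = TimeFrame("tf1", duration=4, link=0)                       # starts at t = 0
tf2 = TimeFrame("tf2", duration=6, link=tf1, kind="START-END", lag=5)
tf3 = TimeFrame("tf3", duration=3, link=tf2, kind="END-END")
print(tf2.start(), tf2.end())   # 9 15
print(tf3.start(), tf3.end())   # 12 15
```

Start times are recomputed on demand, so changing a duration upstream shifts everything linked to it. The sketch assumes the links contain no cycles.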
While @Leandro Caniglia nicely rephrased my question into an Object and Attributes, essentially, I see two options:
the whole list of "events" needs to be evaluated at "condition" (start/end) to check dependent "events".
adding a "link" to a "parent" also creates a link to the "child" (no need to evaluate all pending event's links).
Also:
The "link" property needs to be a List or Array to be able to hold several references (e.g. 2:[1,3]).
Analogous to the link property start_me_on_condition, a stop_me_on_condition association may be desirable (see Leandro's suggestion of type; it would need to be extended to support multiple links+type).
An independent delay "event" might be more practical than a lag property.
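The second option (a link to a "parent" also registers the "child") could look roughly like the sketch below. The names Event, wait_for and finish are hypothetical, and the durations are made up: when an event ends, only its own children are checked, and a child starts once all of its parents have ended.

```python
class Event:
    def __init__(self, name, duration, start=None):
        self.name = name
        self.duration = duration
        self.start = start      # known up front only for events without a condition
        self.end = None
        self.parents = []       # events that must have ended before this one starts
        self.children = []      # back-links, filled in by wait_for()
        self.pending = 0        # number of parents that have not ended yet

    def wait_for(self, *parents):
        for parent in parents:
            self.parents.append(parent)
            parent.children.append(self)
        self.pending = len(self.parents)

    def finish(self, end_time):
        """Record the actual end time and return the children that can now start."""
        self.end = end_time
        started = []
        for child in self.children:
            child.pending -= 1
            if child.pending == 0:
                child.start = max(p.end for p in child.parents)
                started.append(child)
        return started

e1 = Event("1", duration=2, start=0)
e3 = Event("3", duration=7, start=2)
e2 = Event("2", duration=6)
e2.wait_for(e1, e3)               # 2:[1,3]

print(e1.finish(2))               # [] because event 3 is still running
print([e.name for e in e3.finish(9)], e2.start)   # ['2'] 9
```

A delay can then be modelled as just another Event placed between the parents and the child, which is the "independent delay event" idea from the last point.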

CEP - Proton: Complex Event Definition

I've installed the Proton GE and performed a simple condition verification on an input event.
My goal is to verify more complex conditions. For example: if the rain level over a period of 48 hours exceeds a limit.
How can I define this verification? Can someone show me an example?
Thank you
Please refer to the fraud sample: https://github.com/ishkin/Proton/tree/master/documentation/sample/fraud
It demonstrates more complex situations and the appropriate definitions. The folder contains the description and explanation of the application: https://github.com/ishkin/Proton/blob/master/documentation/sample/fraud/SuspiciousAccountExample.pdf, and the appropriate artifacts.
On a high level, you need to define a temporal context lasting 48 hours (you need to decide when this context begins; usually some initiator event indicates the beginning of the temporal window) and you need to define an EPA. If it's just filtering out an event based on a threshold, it can be an EPA of type "Basic" with a threshold condition.
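As a rough illustration of the window logic only (this is plain Python, not a Proton definition), the sketch below opens a 48-hour window on the first reading, sums the rain inside it, and reports once per window when the total exceeds the limit. Summing the readings is my assumption about what "rain level over 48 hours" means; the function name and the sample data are made up.

```python
WINDOW_HOURS = 48

def rain_alerts(readings, limit):
    """readings: (hour, millimetres) pairs sorted by hour.
    Returns (hour, running_total) for the first reading in each window
    that pushes the total above the limit."""
    window_start = None
    total = 0.0
    alerted = False
    alerts = []
    for hour, mm in readings:
        # The first reading, or one past the end of the window, acts as initiator.
        if window_start is None or hour - window_start >= WINDOW_HOURS:
            window_start, total, alerted = hour, 0.0, False
        total += mm
        if total > limit and not alerted:
            alerts.append((hour, total))
            alerted = True
    return alerts

readings = [(0, 10), (12, 20), (30, 25), (50, 5), (60, 60)]
print(rain_alerts(readings, limit=50))   # [(30, 55.0), (60, 65.0)]
```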

Proton CEP Fiware: delete old received events

I've got this kind of problem with Proton CEP: I currently have a "Sequence" EPA whose input is 2 events. But these events have different granularity: let's say I have A and B events; I receive N "A" events and M "B" events, where M << N.
So I'd like to have a rule like "if an event of type A is not consumed within X seconds, remove it", otherwise I end up with a long queue of A events; I only need the rule to be evaluated for the (temporally) closest events.
In practice, I've got a fake room temperature sensor that sends its temperature updates every 5 seconds, and I've got another program that checks the external weather and sends it every minute.
Any idea how to solve this situation?
Thank you very much!
I guess that by "consume" you mean arrival, so do you want to evaluate the time the A event took to get to the Proton processor, or the time between A events? Do you want to ensure that the A events are indeed continuous at a fixed rate? "Removing" an event means ignoring it, since events are not kept anywhere, just processed. In the end, what is it that you want to detect here? For example, the trend of the room temperature compared to the outside temperature, and then emit output events accordingly?
Thanks.
All the relevant event instances are kept within the local state of the corresponding EPA.
For each EPA operand you have policies which dictate how the state is gathered and how the matching set for event derivation is built.
For example, the instance selection policy, which is defined per operand and has the values "Each", "First" and "Last", tells you whether all A instances are examined for a match with a B instance, or only the first (in order of arrival), or the last.
The consumption policy says what to do with the operand state once a sequence is detected: should the instances of, say, A which participated in the sequence be removed from the EPA's state (the "consume" value of the policy), or should they remain.
Playing with combinations of those policies should give you the behaviour you require.
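To see what the policies do to the room/outside temperature case, here is a toy model of a Sequence(A, B) state in plain Python. It is an assumption-laden sketch, not Proton code: sequence_ab is a made-up name, only "First" and "Last" are modelled, and consume removes just the A instance that took part in the match.

```python
def sequence_ab(events, selection="Last", consume=True):
    """events: (time, type, value) tuples with type 'A' or 'B', in arrival order.
    Returns the (A value, B value) pairs that would be derived."""
    state_a = []                 # A instances kept in the EPA's local state
    matches = []
    for time, kind, value in events:
        if kind == "A":
            state_a.append((time, value))
        elif state_a:            # a B arrived and at least one A is waiting
            chosen = state_a[-1] if selection == "Last" else state_a[0]
            matches.append((chosen[1], value))
            if consume:
                state_a.remove(chosen)
    return matches

events = [
    (0, "A", 20.0), (5, "A", 20.5), (10, "A", 21.0),
    (12, "B", 8.0),
    (15, "A", 21.5),
    (20, "B", 7.5),
]
print(sequence_ab(events, selection="Last"))    # [(21.0, 8.0), (21.5, 7.5)]
print(sequence_ab(events, selection="First"))   # [(20.0, 8.0), (20.5, 7.5)]
```

In this model "Last" pairs each outside reading with the temporally closest room reading. Older A instances still sit in the state until the surrounding context ends, which is where a suitably short temporal context would come in.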

Creating a logic gate simulator

I need to make an application for creating logic circuits and seeing the results. This is primarily for use in A-Level (UK, 16-18 year olds generally) computing courses.
I've never made any applications like this, so I am not sure of the best design for storing the circuit and evaluating the results (at a reasonable speed, say 100 Hz on a 1.6 GHz single-core computer).
Rather than have the circuit built from the basic gates (and, or, nand, etc) I want to allow these gates to be used to make "chips" which can then be used within other circuits (eg you might want to make a 8bit register chip, or a 16bit adder).
The problem is that the number of gates increases massively with such circuits, such that if the simulation worked on each individual gate it would have 1000's of gates to simulate, so I need to simplify these components that can be placed in a circuit so they can be simulated quickly.
I thought about generating a truth table for each component, so that the simulation could use a lookup table to find the outputs for a given input. It then occurred to me that the size of such tables increases massively with the number of inputs. If a chip had 32 inputs, the truth table would need 2^32 rows. This uses a massive amount of memory, in many cases more than is available, so it isn't practical for non-trivial components. It also won't work with chips that can store their state (eg registers), since they can't be represented as a simple table of inputs and outputs.
I know I could just hardcode things like register chips, however since this is for educational purposes I want it so that people can make their own components as well as view and edit the implementations for standard ones. I considered allowing such components to be created and edited using code (eg dlls or a scripting language), so that an adder for example could be represented as "output = inputA + inputB" however that assumes that the students have done enough programming in the given language to be able to understand and write such plugins to mimic the results of their circuit which is likely to not be the case...
Is there some other way to take a boolean logic circuit and simplify it automatically so that the simulation can determine the outputs of a component quickly?
As for storing the components I was thinking of storing some kind of tree structure, such that each component is evaluated once all components that link to its inputs are evaluated.
eg consider: A.B + C
The simulator would first evaluate the AND gate, and then evaluate the OR gate using the output of the AND gate and C.
However it just occurred to me that cases where the outputs link back round to the inputs will cause a deadlock, because their inputs will never all be evaluated... How can I overcome this, since the program can only evaluate one gate at a time?
Have you looked at Richard Bowles's simulator?
You're not the first person to want to build their own circuit simulator ;-).
My suggestion is to settle on a minimal set of primitives. When I began mine (which I plan to resume one of these days...) I had two primitives:
Source: zero inputs, one output that's always 1.
Transistor: two inputs A and B, one output that's A and not B.
Obviously I'm misusing the terminology a bit, not to mention neglecting the niceties of electronics. On the second point I recommend abstracting to wires that carry 1s and 0s like I did. I had a lot of fun drawing diagrams of gates and adders from these. When you can assemble them into circuits and draw a box round the set (with inputs and outputs) you can start building bigger things like multipliers.
If you want anything with loops you need to incorporate some kind of delay -- so each component needs to store the state of its outputs. On every cycle you update all the new states from the current states of the upstream components.
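A small sketch of that update rule, using an SR latch built from two cross-coupled NOR gates as the looped circuit. The dictionary layout and the names tick and settle are my own choices for illustration. On every tick each gate's new output is computed from the previous tick's outputs, so the feedback loop is no problem.

```python
def nor(a, b):
    return int(not (a or b))

# gate name -> (function, names of the signals it reads)
latch = {
    "q":  (nor, ("r", "nq")),
    "nq": (nor, ("s", "q")),
}

def tick(gates, state, inputs):
    """Compute every gate's next output from the current outputs and inputs."""
    signals = {**state, **inputs}
    return {name: fn(*(signals[i] for i in ins)) for name, (fn, ins) in gates.items()}

def settle(gates, state, inputs, ticks=4):
    for _ in range(ticks):
        state = tick(gates, state, inputs)
    return state

state = {"q": 0, "nq": 1}
state = settle(latch, state, {"s": 1, "r": 0})   # set
print(state)                                     # {'q': 1, 'nq': 0}
state = settle(latch, state, {"s": 0, "r": 0})   # hold: the latch remembers
print(state)                                     # {'q': 1, 'nq': 0}
state = settle(latch, state, {"s": 0, "r": 1})   # reset
print(state)                                     # {'q': 0, 'nq': 1}
```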
Edit Regarding your concerns on scalability, how about defaulting to the first principles method of simulating each component in terms of its state and upstream neighbours, but provide ways of optimising subcircuits:
If you have a subcircuit S with inputs A[m] with m < 8 (say, giving a maximum of 256 rows) and outputs B[n] and no loops, generate the truth table for S and use that. This could be done automatically for identified subcircuits (and reused if the subcircuit appears more than once) or by choice.
If you have a subcircuit with loops, you may still be able to generate a truth table. There are fixed-point finding methods which can help here.
If your subcircuit has delays (and they are significant to the enclosing circuit) the truth table can incorporate state columns. E.g. if the subcircuit has input A, inner state B, and output C, where C <- A and B, B <- A, the truth table could be:
A B | B C
0 0 | 0 0
0 1 | 0 0
1 0 | 1 0
1 1 | 1 1
If you have a subcircuit that the user asserts implements a particular known pattern such as "adder", provide an option for using a hard-coded implementation for updating that subcircuit instead of by simulating its inner parts.
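The truth-table options above can be sketched as follows. This assumes the subcircuit is available as a function from input bits to a tuple of output bits; build_truth_table and the 8-input cap are illustrative choices. The second example treats the inner state B as an extra input and an extra output, which reproduces the state-column table shown above.

```python
from itertools import product

def build_truth_table(fn, n_inputs, max_inputs=8):
    """Evaluate a loop-free subcircuit once for every input combination."""
    if n_inputs > max_inputs:
        raise ValueError("too many inputs for a lookup table")
    return {bits: fn(*bits) for bits in product((0, 1), repeat=n_inputs)}

def full_adder(a, b, carry_in):
    total = a ^ b ^ carry_in
    carry_out = (a & b) | (carry_in & (a ^ b))
    return (total, carry_out)

adder_table = build_truth_table(full_adder, 3)
print(len(adder_table))          # 8
print(adder_table[(1, 1, 0)])    # (0, 1)

# Subcircuit with inner state: inputs (A, B) -> (next B, C), with B <- A and C <- A and B
state_table = build_truth_table(lambda a, b: (a, a & b), 2)
for (a, b), (next_b, c) in state_table.items():
    print(a, b, "|", next_b, c)
```

Once built, the table replaces the gate-by-gate simulation of that subcircuit with a single dictionary lookup, and the same table can be shared by every instance of the chip.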
When I made a circuit emulator (sadly, also incomplete and also unreleased), here's how I handled loops:
Each circuit element stores its boolean value
When an element "E0" changes its value, it notifies (via the observer pattern) all who depend on it
Each observing element evaluates its new value and does likewise
When the E0 change occurs, a level-1 list is kept of all elements affected. If an element already appears on this list, it gets remembered in a new level-2 list but doesn't continue to notify its observers. When the sequence which E0 began has stopped notifying new elements, the next queue level is handled. Ie: the sequence is followed and completed for the first element added to level-2, then the next added to level-2, etc. until all of level-x is complete, then you move to level-(x+1)
This is in no way complete. If you ever have multiple oscillators doing infinite loops, then no matter what order you take them in, one could prevent the other from ever getting its turn. My next goal was to alleviate this by limiting steps with clock-based sync'ing instead of cascading combinatorials, but I never got this far in my project.
You might want to take a look at the From Nand To Tetris in 12 steps course software. There is a video talking about it on youtube.
The course page is at: http://www1.idc.ac.il/tecs/
If you can disallow loops (outputs linking back to inputs), then you can significantly simplify the problem. In that case, for every input there will be exactly one definite output. Cycles however can make the output undecidable (or rather, constantly changing).
Evaluating a circuit without loops should be easy - just use the BFS algorithm with "junctions" (connections between logic gates) as the items in the list. Start off with all the inputs to all the gates in an "undefined" state. As soon as a gate has all inputs "defined" (either 1 or 0), calculate its output and add its output junctions to the BFS list. This way you only have to evaluate each gate and each junction once.
If there are loops, the same algorithm can be used, but the circuit can be built in such a way that it never comes to a "rest" and some junctions are always changing between 1 and 0.
Oops, actually, this algorithm can't be used in this case because the looped gates (and gates depending on them) would forever stay as "undefined".
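A sketch of the BFS evaluation described above, under the assumption that each gate's output junction is named after the gate. The evaluate function and the gate dictionary layout are made up for illustration. Gates caught in a loop never get all their inputs defined, so they are simply missing from the result, which is exactly the limitation just mentioned.

```python
from collections import deque
from operator import and_, or_

def evaluate(gates, inputs):
    """gates: name -> (function, [input junction names]); inputs: junction -> 0 or 1.
    Returns every junction whose value could be determined."""
    values = dict(inputs)
    consumers = {}                       # junction -> gates reading it
    for name, (_, ins) in gates.items():
        for junction in ins:
            consumers.setdefault(junction, []).append(name)
    queue = deque(inputs)
    while queue:
        junction = queue.popleft()
        for name in consumers.get(junction, []):
            fn, ins = gates[name]
            if name not in values and all(i in values for i in ins):
                values[name] = fn(*(values[i] for i in ins))
                queue.append(name)       # the gate's output junction is now defined
    return values

# A.B + C
circuit = {"and1": (and_, ["A", "B"]), "or1": (or_, ["and1", "C"])}
print(evaluate(circuit, {"A": 1, "B": 1, "C": 0})["or1"])   # 1
print(evaluate(circuit, {"A": 0, "B": 1, "C": 0})["or1"])   # 0
```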
You could introduce them to the concept of Karnaugh maps, which would help them simplify truth values for themselves.
You could hard code all the common ones. Then allow them to build their own out of the hard coded ones (which would include low level gates), which would be evaluated by evaluating each sub-component. Finally, if one of their "chips" has less than X inputs/outputs, you could "optimize" it into a lookup table. Maybe detect how common it is and only do this for the most used Y chips? This way you have a good speed/space tradeoff.
You could always JIT compile the circuits...
As I haven't really thought about it, I'm not really sure what approach I'd take.. but it would possibly be a hybrid method and I'd definitely hard code popular "chips" in too.
When I was playing around making a "digital circuit" simulation environment, I had each defined circuit (a basic gate, a mux, a demux and a couple of other primitives) associated with a transfer function (that is, a function that computes all outputs, based on the present inputs), an "agenda" structure (basically a linked list of "when to activate a specific transfer function"), virtual wires and a global clock.
I arbitrarily set the wires to hard-modify the inputs whenever the output changed and the act of changing an input on any circuit to schedule a transfer function to be called after the gate delay. With this at hand, I could accommodate both clocked and unclocked circuit elements (a clocked element is set to have its transfer function run at "next clock transition, plus gate delay", any unclocked element just depends on the gate delay).
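Something along these lines, as a hedged reconstruction rather than the original code: a heap stands in for the linked-list agenda, every gate has a single output wire named after the gate, all wires start at 0, and only unclocked elements are covered. The function name simulate and the parameters gate_delay and until are my own choices.

```python
import heapq
from itertools import count

def simulate(gates, stimuli, gate_delay=1, until=50):
    """gates: name -> (transfer function, [input wires]); stimuli: (time, wire, value)."""
    values = {}
    for name, (_, ins) in gates.items():
        values.setdefault(name, 0)
        for wire in ins:
            values.setdefault(wire, 0)
    order = count()                      # tie-breaker so payloads are never compared
    agenda = [(t, next(order), "set", (wire, v)) for t, wire, v in stimuli]
    agenda += [(0, next(order), "eval", name) for name in gates]  # settle initial state
    heapq.heapify(agenda)
    trace = []
    while agenda:
        time, _, action, payload = heapq.heappop(agenda)
        if time > until:
            break
        if action == "set":
            wire, new = payload
        else:                            # run the gate's transfer function now
            fn, ins = gates[payload]
            wire, new = payload, fn(*(values[w] for w in ins))
        if values[wire] == new:
            continue                     # no change, nothing to propagate
        values[wire] = new
        trace.append((time, wire, new))
        for name, (_, ins) in gates.items():
            if wire in ins:              # changed input: schedule after the gate delay
                heapq.heappush(agenda, (time + gate_delay, next(order), "eval", name))
    return values, trace

gates = {
    "n1":  (lambda a: 1 - a, ["a"]),
    "out": (lambda x, y: x & y, ["n1", "b"]),
}
values, trace = simulate(gates, [(5, "b", 1), (10, "a", 1)])
for entry in trace:
    print(entry)
```

A clocked element could be added by scheduling its transfer function at the next clock transition plus the gate delay instead of at the current time plus the gate delay.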
I never really got around to building a GUI for it, so I've never released the code.

Reconstructing state from time series data events

For a particular project, we acquire data for a number of events and collect variables about those events at the same time. After the data has been collected, we perform a user-customizable analysis on said data to determine whatever it is that the user is interested in.
The data is collected in a form similar to this:
Timestamp Event
0 x = 0
0 y = 1
3 Event A occurred
3 x = 1
4 Event A occurred
4 x = 2
9 Event B occurred
9 y = 2
9 x = 0
To understand the entire state at any time, the most straightforward approach is to walk over the entire set of data. For example, if I start at time 0, and "analyze" until timestamp 5, I know that at that point x = 2, y = 1, and Event A has occurred twice. That's a really simple example. The user might be (and often is) interested in the time between events, say from A to B, and they might specify the first occurrence of A, then B, or the last occurrence of A, then B (respectively, 9-3 = 6 or 9-4 = 5). Like I said, this is easy to analyze when you're walking over the entire set.
Now, we need to adapt the model to analyze an arbitrary window of time. If we look at 0-N, that's the easy case. But if I look at 1-5, for instance, I have no notion of y unless I begin at 0 and know that y was initially 1 and did not change in the window 1-5.
Our approach is to essentially create a dictionary of variables, and run callbacks on events. If one analysis was "What is x when Event A occurs and time is > 3" then we would run that callback on the first Event A, and it would immediately return because time is not greater than 3. It would run again at 4, and it would report that x was 1 at t=4.
To adapt to the "time-windowing", I think I am going to (in the background) tack on additional conditions to the analysis. If their analysis is just "What is x when Event A occurs", and the current window is 1-5, then I will change it to "What is x when Event A occurs and time >= 1 and time <= 5". Then if the next window is 6-10, I can readjust the condition as necessary.
My main question is: what pattern does this fit? We are obviously not the first people to approach a problem like this, but I have not been able to find how others have approached it. I probably just don't know what exactly to search on Google. Is there any other approach besides keeping a dictionary of the entire global state for looking up a single state at a given time? Note also that the data could have several, maybe tens of thousands of records, so the fewer iterations over the data set, the better.
I think your best approach here would be to take periodic "snapshots" of the full state data, say every 1000 samples (for example), along with recording the deltas. When you're storing your data as offsets from some original value (aka deltas), you don't have any choice but to reconstruct the full data starting with the original values. Storing periodic snapshots will lessen the amount of reconstruction you have to do - the design tradeoff is between low storage requirements but long reconstruction time on the one hand, and higher storage requirements but shorter reconstruction time on the other.
MPEGs, for example, store each frame as the differences between the current frame and the previous frame. Ordinarily, this would force an MPEG to be viewed from the beginning, but the format also periodically stores full frames so that the decoder doesn't have to backtrack all the way to the beginning of the file.
You can search by time in Log(N), and you can have a feeling for how many updates are acceptable... hence here's my solution:
Pick a number, N, of updates that are acceptable in order to return a result. 256 might be good, given the scales you've mentioned so far.
Every N records, commit an entry of all state to a dictionary, with a timestamp.
Now, you have a tradeoff, dictionary size against speed. N -> infinity is regular searching. N -> 1 is your current solution. N anywhere in between will require less memory than N = 1, but be slower.
Your implementation is now (for time X):
Log(n) search of subsampled global dictionary to timestamp before X, (timestamped as Y).
Log(n) search of eventlist to timestamp Y, and perform less than N updates.
Picking N as a power of two will even allow you to do some nice shift tricks to do a rounded-down integer divide nice and fast.
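A minimal sketch of this scheme, assuming the log is a list of (timestamp, variable, value) records sorted by timestamp. The class name SnapshotLog and the parameter every (the N above) are made up, and the snapshots are located by record index rather than by a separate timestamp search, which comes to the same thing. Event occurrences could be handled the same way by storing a counter as a variable.

```python
from bisect import bisect_right

class SnapshotLog:
    def __init__(self, records, every=256):
        self.records = records
        self.times = [t for t, _, _ in records]
        self.snap_index = [0]        # record index at which each snapshot was taken
        self.snap_state = [{}]       # full state just before that record
        state = {}
        for i, (_, var, value) in enumerate(records):
            if i > 0 and i % every == 0:
                self.snap_index.append(i)
                self.snap_state.append(dict(state))
            state[var] = value

    def state_at(self, x):
        """State after applying every record with timestamp <= x."""
        stop = bisect_right(self.times, x)             # log search of the event list
        k = bisect_right(self.snap_index, stop) - 1    # latest snapshot not past stop
        state = dict(self.snap_state[k])
        for _, var, value in self.records[self.snap_index[k]:stop]:
            state[var] = value                         # fewer than `every` updates
        return state

records = [(0, "x", 0), (0, "y", 1), (3, "x", 1), (4, "x", 2), (9, "y", 2), (9, "x", 0)]
log = SnapshotLog(records, every=2)
print(log.state_at(5))   # {'x': 2, 'y': 1}
print(log.state_at(3))   # {'x': 1, 'y': 1}
```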