What term is used to describe a queuing/messaging system that acts upon one event after receiving several identical events?

What term (or terms) is used to describe a queuing/messaging system that receives several identical events/messages over a period of time and "buffers" them so that only one is acted upon? In JavaScript, for example, this is usually called "debouncing" (executing a particular function only once in response to several events in a short time). Maybe "debounce/debouncing" is the right term, but I'm wondering if there's a more appropriate term when the "system" is a messaging bus.
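For concreteness, here is roughly the kind of behaviour I mean, sketched in Python; the Debouncer class and the timings are purely illustrative and not tied to any particular messaging product:
import threading
import time
class Debouncer:
    """Collapses bursts of identical events: the handler fires once, a short
    quiet period after the last event for a given key arrives."""
    def __init__(self, handler, quiet_seconds=1.0):
        self.handler = handler
        self.quiet_seconds = quiet_seconds
        self._timers = {}          # event key -> pending timer
        self._lock = threading.Lock()
    def submit(self, key, payload):
        with self._lock:
            timer = self._timers.get(key)
            if timer is not None:
                timer.cancel()     # restart the clock for this key
            timer = threading.Timer(self.quiet_seconds, self._fire, args=(key, payload))
            self._timers[key] = timer
            timer.start()
    def _fire(self, key, payload):
        with self._lock:
            self._timers.pop(key, None)
        self.handler(key, payload)
if __name__ == "__main__":
    debouncer = Debouncer(lambda key, payload: print("acting on", key, payload))
    for _ in range(5):                                   # five identical events in a burst...
        debouncer.submit("order-42", {"status": "updated"})
        time.sleep(0.1)
    time.sleep(2)                                        # ...produce a single handler call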


State definition in Reinforcement learning

When defining the state for a specific problem in reinforcement learning, how do you decide what to include and what to leave out, and how do you draw the line between an observation and a state?
For example, suppose the agent operates in a human-resources planning context where it needs to hire workers based on the demand for jobs, taking into account the cost of hiring them (assuming the budget is limited). Is a state of the form (# workers, cost) a good definition of state?
In general, I don't know what information needs to be part of the state and what should be left out as merely an observation.
Thank you
I am assuming you are formulating this as an RL problem because the demand is an unknown quantity, and perhaps (this is an optional criterion) because the cost of hiring a worker depends on their contribution to the job, which is unknown initially. If, however, both of these quantities are known or can be approximated beforehand, then you can simply run a planning algorithm (or some sort of optimization) to solve the problem.
Having said this, the state in this problem could be something as simple as (#workers). Note that I am not including the cost, because the cost must be experienced by the agent and is therefore unknown until the agent reaches a specific state. Depending on the problem, you might need to add another factor such as "time" or "jobs remaining".
Most of the theoretical results in RL hinge on a key assumption, in several setups, that the environment is Markovian. There are several works where you can get by without this assumption, but if you can formulate your environment in a way that exhibits this property, you will have many more tools to work with. The key idea is that the agent can decide which action to take (in your case, an action could be "hire one more person"; another could be "fire a person") based on the current state, say (#workers = 5, time = 6). Note that we are not yet distinguishing between workers, so the action is firing "a" person rather than firing a specific person x. If the workers have differing capabilities, you may need to add several other factors, each representing whether a given worker is currently hired or still in the pool waiting to be hired, i.e. something like a boolean array of fixed length. (I hope you get the idea of how to form a state representation; this can vary based on the specifics of the problem, which are missing in your question.)
Now, once we have the state definition S and the action definition A (hire / fire), we have the "known" quantities for an MDP setup in an RL framework. We also need an environment that can supply us with the cost when we query it (a reward / cost function) and tell us the outcome of taking a certain action in a certain state (the transition). Note that we don't necessarily need to know these reward / transition functions beforehand, but we should have a means of obtaining these values when we query for a specific (state, action).
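To make these pieces concrete, a toy sketch in Python could look like the following; the HiringEnv class, the demand model and the reward numbers are invented purely for illustration:
import random
from dataclasses import dataclass
@dataclass(frozen=True)
class State:
    workers: int   # number of workers currently hired
    time: int      # current decision step
class HiringEnv:
    """Toy MDP sketch: actions are hire (+1), do nothing (0) or fire (-1);
    demand and costs are made up and only revealed when the agent acts."""
    HIRE, NOOP, FIRE = +1, 0, -1
    def __init__(self, horizon=12, wage=1.0, hire_cost=0.5, revenue_per_job=2.0):
        self.horizon = horizon
        self.wage = wage
        self.hire_cost = hire_cost
        self.revenue_per_job = revenue_per_job
    def reset(self):
        self.state = State(workers=0, time=0)
        return self.state
    def step(self, action):
        workers = max(0, self.state.workers + action)
        demand = random.randint(0, 10)                 # unknown to the agent beforehand
        jobs_done = min(workers, demand)
        reward = (jobs_done * self.revenue_per_job
                  - workers * self.wage
                  - (self.hire_cost if action == self.HIRE else 0.0))
        self.state = State(workers=workers, time=self.state.time + 1)
        return self.state, reward, self.state.time >= self.horizon
env = HiringEnv()
state = env.reset()
state, reward, done = env.step(HiringEnv.HIRE)         # e.g. hire one more worker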
Coming to your final part, the difference between observation and state: there are much better resources to dig deep into this, but in a crude sense, an observation is an agent's (any agent: AI, human, etc.) sensory data. For example, in your case the agent has the ability to count the number of workers currently employed (but it does not have the ability to distinguish between workers).
A state, or more formally a true MDP state, must be something that is Markovian and captures the environment at its fundamental level. So, in order to determine the true cost to the company, the agent may need to be able to differentiate between workers, their working hours, the jobs they are working on, the interactions between workers, and so on. Note that many of these factors may not be relevant to your task, for example a worker's gender. Typically one would like to form a good hypothesis beforehand about which factors are relevant.
Now, even though we can agree that a worker's assignment (to a specific job) may be a relevant feature when making a decision to hire or fire them, your observation does not have this information. So you have two options: either you ignore the fact that this information is important and work with what you have available, or you try to infer these features. If your observation is incomplete for the decision-making in your formulation, we typically classify the setting as a partially observable environment (and use POMDP frameworks for it).
I hope I clarified a few points; however, there is a huge body of theory behind all of this, and the question you asked about "coming up with a state definition" is a matter of research (much like feature engineering and feature selection in machine learning).

How to account for rare events at different time intervals while using LSTM neural networks?

I'm working on an interesting sequence-to-sequence (regression) time series problem where some static features/rare events can change the behavior of future time series. The problem is a forecasting problem, where I use previous time step values to forecast the next time step values and I try to integrate static features + rare events into time step t=0.
In my problem, there is always a rare event at t=0 in addition to some static features that should affect the future behavior of time series.
For clarity, my definition of "rare events": an event happens at a specific time step (e.g., t=0), and another, separate event can happen at any time in the future (e.g., t=n) in addition to the one at t=0. Each event happens only once, at its own time step, and both events can affect the future behavior of the time series from the time they occur.
Even though most of the static features don't change over time, the rare events can differ from each other (they have different characteristics/features). The time of each event is usually known, because it is applied through outside human intervention to optimize the future behavior (increase profit), but the events do not necessarily happen at the same time step for every sample/example.
These events are so rare that it makes some sense to me to treat them as static features at time t=0, but I can't think of a way to include a rare event that happens n time steps later in the future and has different characteristics than the event at t=0.
Below is an example schematic of the problem. There may be multiple samples with varying time steps affected by these unique rare events, but if I don't account for these events, I believe my predictions may suffer.
Can anyone suggest any sources to look at for these types of problems? I may also be missing keywords that are usually used with these types of problems, and that may be one of the reasons why I'm still having difficulty finding good sources. I call them "rare events", but they may be called something else in the literature... At this point, I appreciate any type of source that addresses this issue, such as scientific papers/articles, GitHub code or a code example provided by you, the correct keywords to search for, etc.
Thank you.
Example image to describe the problem
I have looked at the rare events in the picture you mentioned. From the little information present in it, it appears there is some seasonality in those events, so if you are using "rare" to describe a purely random event, I don't think that is correct, because they look periodic.
In short, you are worried about the two kinds of features you are using to train the model:
normal events
rare events
There may be multiple samples with varying time steps affected by these unique rare events, but if I don't account for these events, I believe my predictions may suffer.
If you are not certain whether the rare features you mention are actually contributing, you could switch to an attention-based mechanism, because:
"Attention is all you need"
Models such as BERT are often better than an LSTM here because they learn an attention (importance) weight for every feature, so the model learns automatically how much weight to give to both the rare and the normal features.
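For a rough idea in code, here is a minimal Keras-style sketch; the shapes, layer sizes, and the way the rare events are encoded as an extra per-time-step channel are assumptions for illustration, not a prescription:
import tensorflow as tf
from tensorflow.keras import layers, Model
# Illustrative shapes: T time steps, each carrying the series value plus an
# "event" channel that is zero everywhere except at the step(s) where a rare
# event occurs (its characteristics are encoded in that channel).
T, n_series, n_event_features = 48, 1, 4
series_in = layers.Input(shape=(T, n_series), name="series")
events_in = layers.Input(shape=(T, n_event_features), name="events")
x = layers.Concatenate()([series_in, events_in])   # (batch, T, n_series + n_event_features)
h = layers.LSTM(64, return_sequences=True)(x)      # per-time-step hidden states
# Self-attention over the time steps lets the model weight the steps around a
# rare event more heavily than ordinary steps.
a = layers.MultiHeadAttention(num_heads=4, key_dim=16)(h, h)
out = layers.TimeDistributed(layers.Dense(1))(a)   # one forecast value per time step
model = Model(inputs=[series_in, events_in], outputs=out)
model.compile(optimizer="adam", loss="mse")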
I am explaining in very general terms, as your question was not very specific.
Have a nice day
stay blessed !

Reagent - methods of storing ui state together with (or separate from?) server-persisted state?

I'm using multiple atoms within a map called app-state, and it's working quite well architecturally so far. The state distributed across those atoms is normalised, reflecting how it is stored in Datomic, and of course what the client is initialised with is a specific subset of what's in Datomic. This is me preparing the way to try out DataScript (which is what gave me the aha moment of why client state is so much better fully normalised, even if not using DataScript).
I have a question at this point. We all know that some state in reagent is a reflection of what's in the server's database (typically), but there's also state in reagent concerning solely the current condition of the ui. That state will vanish when the page is re-loaded and there's (typically) no need to store that on the server.
So, I'm looking at my list of atoms and realising that I have some atoms which hold database-record-like maps, i.e. they contain exact reflections of datomic entities, (which arrive by transit), which is great.
But now I notice I also want some ui state per datomic entity.
So the question arises whether to add some keys to what came from Datomic, holding the ui state that is irrelevant to Datomic but that the client needs (i.e., dump it into the same nested map). That is entirely possible, but seems wrong to me, and so suggests another idea (this being my idea as of now): how about a parallel atom per "entity", like #<entity-name>-ui, containing a map (or even a vector of maps, if there are multiple entities), with a set of keys for ui state?
That seems an improvement on what I have ended up with by default as of now, which is a separate atom for every piece of ui state (I've avoided component-local state up to now). (Currently the ui only holds ui state for one record at a time, so these ui atoms need only be concerned with a single current entity.)
But if, say, I made a parallel atom (to avoid mixing ephemeral ui and server state), then ui state could perhaps manageably extend deeper. We could hold, say, ui state per entity, so switching current-entity back and forth would remember ui state.
Since this is Stack Overflow, I have to ask a specific question rather than have this just be a discussion, so: given what I've described, what are some sensible architectural choices in this case for storing state in reagent?
If you are already storing your app state in several component-independent reagent atoms, you can check out https://github.com/day8/re-frame, which is a widely adopted reagent-powered framework for exactly your case. Essentially it stores all the application state in a single reagent atom but has a well-developed infrastructure to support coordinated storage and updates. They have brilliant documentation with a great high-level explanation of the idea.
Regarding your initial question about server/ui state separation: I think you should definitely go this way. It'll give you a better separation of concerns and an easier way to update server and ui data separately. It is very easy to achieve this with re-frame by storing both parts of the state under separate top-level keys in the re-frame application db. E.g.
{:server {:entity-name ...}
 :ui     {:entity-name ...}}
and then create suitable subscriptions (see the re-frame docs) to retrieve it.

What are examples of real-world scenarios where a message queuing system can accept the loss of some messages?

I was reading this blog post, in which the author proposes the following question, in the context of message queues:
does it matter if a message is lost? If you application node, processing the request, dies, can you recover? You’ll be surprised how often it doesn’t actually matter, and you can function properly without guaranteeing all messages are processed
At first I thought that the main point of handling messages was to never lose a single message. After all, a lost message could mean a hotel reservation not booked, a checkout not completed, or some other functionality not carried through, which seems too similar to a bug to me. I suppose I am missing something, so: what are examples of scenarios where it is OK for a messaging system to lose a few messages?
Well, your initial expectation:
the main point of handling messages was to never lose a single message
was just not a correct one.
Right, if one strives for a certain type of robustness, where fail-safe measures have to take all due care and precautions so that not a single message can get lost, then yes, your a priori expressed expectation fits.
This does not mean that all other system designs have to carry the same immense burdens and pay all the incurred costs (resource-wise, latency-wise et al.) that the "100+% guaranteed delivery" systems do (but, again, only if they can).
Anti-pattern cases:
There are many use-cases where an absolute certainty of delivery of each and every message originally sent is actually an anti-pattern.
Just imagine a weakly synchronised system (including ones that have nothing like back-throttling, or even the simplest form of feedback propagation at all), where sensors read an actual temperature, a sound, or a video frame and send a message with those values.
Whenever a postprocessing system gets such information delivered, there may be a reason not to read any and all "old" values, but only the most recent one(s).
If the delivery framework has already received a newer set of values, all the "older" values not yet processed, still hanging at some depth from the queue head, create the anti-pattern: one would not like to have to read and process all of those "older" values, but just the most recent one(s).
Just as no one will make a trade with you based on yesterday's prices, there is no positive value in making any new, current decision based on reading all the "old" temperature readings that still wait in the queue.
Some smart messaging frameworks provide explicit means for taking just the very "newest" message from a given source, thus making it possible to imperatively discard any "older" messages and avoid reading and processing them, precisely because a more recent one is known to be present.
This answers the original question about the assumed main point of handling messages.
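As a minimal illustration of that "take only the newest" idea, using nothing more than Python's standard queue module (no particular messaging framework implied):
import queue
def latest_only(q):
    """Drain everything currently queued and return only the newest item,
    deliberately discarding the stale readings behind it."""
    newest = q.get()                 # block until at least one item arrives
    while True:
        try:
            newest = q.get_nowait()  # keep replacing with any fresher reading
        except queue.Empty:
            return newest
q = queue.Queue()
for reading in (20.1, 20.4, 20.9, 21.3, 21.8):   # five readings arrived while we were busy
    q.put(reading)
print(latest_only(q))                            # -> 21.8, the only value worth acting on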
Efficiency first:
In any case where smart delivery takes place (either deliver an exact copy of the original message content or nothing at all), the resources are used at their best effort, yet without spending a single penny on anything but the "just enough" smart delivery.
Building robustness costs more than that.
Building ultimate robustness costs even more than that.
Systems that do have such an extreme requirement can and may extend the resource-efficient smart delivery so as to reach some requirements-defined level of robustness, at some add-on cost.
The reverse is not possible: an "everything-proof" system cannot easily be given a slimmer form so as to fit onto restricted-resources hardware, or be made to "forget" some "old" messages that are of no positive value at this very moment; on the contrary, such a system makes it a must for the processing element to read and process each and every "unwanted" message, just because it was delivered, even while knowing the core logic needs just the most recent one.
Distributed systems accrue end-to-end latency from many distributed sources, so any rigid-delivery system just blocks and penalises the only element that is (latency-wise) innocent: the receiver.
I suppose it's OK to lose a few messages from some measurement units that deliver the value once in.... Also, for big data analytics solutions, a few lost messages won't make a big difference.
It all depends on the application/larger system. The message queue is only one link in the chain, so to speak. If the application(s) at the ends are prepared to deal with loss, losing some messages is not a problem. If the application(s) rely on total messaging integrity then there will be problems.
An example of a system that will be ok with loss is weather updates for your phone. If a few temperature/wind updates don't make it to you there's no real harm in that.
Now, if you're running a nuclear reactor and you lose a few temperature updates on the core, well that is a problem.
I work a lot on safety-critical, infrastructure-level systems, and am responsible for messaging much of the time. Many of those systems state clearly that messaging may reorder, duplicate, or lose messages; it's just a fact of life where distributed systems and networks are involved. The endpoint systems need to be designed to work correctly in that environment. So they track messages, ack end to end, deal with duplicates and retransmits, etc.
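A small sketch of that endpoint-side discipline, with hypothetical message fields: deduplicate by message id and acknowledge end to end, so duplicates and retransmits stay harmless:
import uuid
class IdempotentConsumer:
    """Remembers which message ids were already processed so that duplicates
    and retransmits are acknowledged but not re-applied."""
    def __init__(self, handler, ack):
        self.handler = handler   # business logic, applied at most once per id
        self.ack = ack           # end-to-end acknowledgement back to the sender
        self.seen = set()        # would be persistent storage in production
    def on_message(self, message):
        msg_id = message["id"]
        if msg_id not in self.seen:
            self.handler(message["body"])
            self.seen.add(msg_id)
        self.ack(msg_id)         # ack duplicates too, so the sender stops retrying
consumer = IdempotentConsumer(handler=print, ack=lambda msg_id: None)
msg = {"id": str(uuid.uuid4()), "body": "temperature=21.8"}
consumer.on_message(msg)
consumer.on_message(msg)         # duplicate delivery: acknowledged, not reprocessed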

CQRS / communication between contexts / eventstore / push or pull?

Communication between bounded contexts in a CQRS/ES architecture is achieved through events; context A generates events in response to commands, and these events are then forwarded to context B through an event bus (message queue).
Or... you can store the events in eventstore (that belongs to context A).
Or... both (store and forward).
My question is: from context B, should I pull the events from context A's event store, or simply consume the events pushed through the event bus?
I'm leaning toward the pulling approach, because then we can do some catching up in context B. In contrast, with the push approach, context B might be unaware of events that were delivered while B was experiencing downtime.
So... does that mean that, when we have an eventstore, we can simply forget about the message queue (it seems redundant)?
Or am I missing something here?
You'll want to review Consume event stream without Pub/Sub
At the DDD Europe conference, I realized that the speakers I talked with where (sic) avoiding Pub/Sub whenever possible.
The discussion that follows may have value. TL;DR: not many fans of pub/sub there.
Konrad Garus on Push or Pull?, describing the Pull design:
In the latter (and simpler) design, they only spread the information that a new event has been saved, along with its sequential ID (so that all projections can estimate how much behind they are). When awakened, the executor can continue along its normal path, starting with querying the event store.
Why? Because handling events coming from a single source is easier, but more importantly because a DB-backed event store trivially guarantees ordering and has no issues with lost or duplicate messages. Querying the database is very fast, given that we’re reading a single table sequentially by primary key, and most of the time the data is in RAM cache anyway. The bottleneck is in the projection thread updating its read model database.
In the large, it comes down to this: when people are thinking about event sourcing, they are really thinking about histories, rather than events in isolation. If what you really want is an ordered sequence of events with no gaps, querying the authority for that sequence is much better than trying to reconstruct it from a bunch of disjoint event messages.
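As a rough sketch of that pull design (the table schema and names here are assumptions, not taken from the post): the projection remembers the last sequential id it applied and queries the event store for everything newer, in order, with no gaps and no duplicates:
import sqlite3
conn = sqlite3.connect(":memory:")   # stand-in for the DB-backed event store
conn.execute("CREATE TABLE events (seq INTEGER PRIMARY KEY, type TEXT, payload TEXT)")
conn.executemany("INSERT INTO events (type, payload) VALUES (?, ?)",
                 [("OrderPlaced", "{...}"), ("OrderShipped", "{...}")])
class Projection:
    def __init__(self, conn):
        self.conn = conn
        self.last_seq = 0            # persisted alongside the read model in practice
    def catch_up(self):
        rows = self.conn.execute(
            "SELECT seq, type, payload FROM events WHERE seq > ? ORDER BY seq",
            (self.last_seq,)).fetchall()
        for seq, etype, payload in rows:
            self.apply(etype, payload)
            self.last_seq = seq      # advance only after the event has been applied
    def apply(self, etype, payload):
        print("updating read model with", etype)
Projection(conn).catch_up()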
But once you decide to do that, the history, and all of the events that appear within it, suddenly become part of the API of context A. What happens when team A decides that a different event store implementation is more suitable? Can they just roll out a new version of their own services, or do we need a grand outage because every consumer also has to get updated?
Similarly, what happens if we decide to refactor context A into context C and context D? Again, do we have to screw around in context B to get the data we need?
Maybe the real problem is that context B is coupled to the histories in context A, and those histories should really be private? Should context B be accessing context A's data, or should it instead be delegating that work to context A's capabilities?
Udi Dahan's essays on SOA may jump-start your thinking in that direction.