Are polling and event-driven programming different words for the same technique?

I studied interrupts vs. cyclical polling and learned the advantage of interrupts: they don't have to wait for a poll. Polling seemed to me just like event-driven programming, or at least similar to a listener; what polling does is much like listening for input or output. Do you agree, or did I misunderstand some crucial difference between polling (cyclical listening) and event-driven programming (also listening, with so-called listeners)?

Nope, quite the contrary: interrupt-driven programming is pretty much what event-driven programming is at the hardware level. Both interrupt-driven code and event-driven code wait for an event before running any code, while polling queries for an event whether or not one actually exists.
However, it should be noted that interrupt- and event-driven programs are generally implemented at a lower level using some form of polling; there is no truly interrupt- or event-driven system that does not involve some sort of polling, although it usually happens in hardware. In the case of interrupts, the CPU effectively polls the interrupt line every clock cycle; likewise with event-driven programming, because resuming a paused thread involves an interrupt being raised by the source of the event (usually a driver).
You could say that interrupt- and event-driven programming is a disciplined way to poll, one with many advantages compared to doing the polling yourself.
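To make the contrast concrete, here is a minimal sketch in Java of the two styles; the device interface and method names (hasData, read, onData) are hypothetical, just enough to show a polling loop that asks whether an event exists versus a listener that only runs when one does.

```java
import java.util.function.Consumer;

public class PollingVsEvents {
    // Polling: query on every cycle, whether or not anything happened.
    static void pollLoop(Device device, Consumer<String> handler) throws InterruptedException {
        while (true) {
            if (device.hasData()) {            // ask "did anything happen?"
                handler.accept(device.read()); // most of the time the answer is "no"
            }
            Thread.sleep(10);                  // poll period: latency vs. CPU trade-off
        }
    }

    // Event-driven: register a handler once; the device (or its driver/framework)
    // invokes it only when an event actually exists.
    static void eventDriven(Device device, Consumer<String> handler) {
        device.onData(handler);                // nothing runs until an event arrives
    }

    // Hypothetical device interface, only to make the sketch self-contained.
    interface Device {
        boolean hasData();
        String read();
        void onData(Consumer<String> handler);
    }
}
```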

Polling and interrupt handling are two ways to find out about events. Neither is in contradiction with event-driven programming, which means building your program around the handling of incoming events.

Both of the answers above are correct and address the original question. I am adding a few points/considerations on event-driven vs. polling as generally used at a slightly higher level, from application containers.
Assume we have a table where the states of some components are written, and based on the state some action needs to be taken; this could be designed either as a queue-based eventing system or as a thread-based polling system.
In the polling system, in the face of system/container termination/shutdown, the polling thread will find the changes in the next cycle and act on them.
In an event-based system, if the container terminates before the event can be processed, the event is lost. Having said that, transactional queues are available where this capability comes out of the box; implementing transactions yourself is, in general, a non-trivial process.
Both methods are viable, and neither is strictly superior or inferior to the other. However, there are considerations for each model.
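As a rough illustration of the polling variant (the StateTable interface and its methods are hypothetical), a scheduled thread that rescans the state table each cycle naturally picks up anything written just before a container shutdown:

```java
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class StatePoller {
    interface StateTable {                       // hypothetical DAO over the state table
        List<String> findUnprocessed();          // rows whose state requires action
        void markProcessed(String id);
    }

    static void start(StateTable table) {
        ScheduledExecutorService poller = Executors.newSingleThreadScheduledExecutor();
        poller.scheduleWithFixedDelay(() -> {
            // Every cycle we re-read the table, so work written just before a
            // shutdown is simply found again on the first cycle after restart.
            for (String id : table.findUnprocessed()) {
                // act on the state change, then record that it was handled
                table.markProcessed(id);
            }
        }, 0, 5, TimeUnit.SECONDS);
    }
}
```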

Related

How to propagate exception on event-driven/message queue microservice?

It's very straightforward, with HTTP calls between microservices, to propagate an exception to the caller/front-end.
But how do you propagate an exception from an event-driven/message-queue (e.g. RabbitMQ) microservice to the caller/front-end?
I would recommend Cadence Workflow, which is a much more powerful solution for microservice orchestration and provides exception propagation across long-running operations out of the box.
It offers a lot of other advantages over using queues for your use case.
Built-in exponential retries with an unlimited expiration interval
Support for long-running, heartbeating operations
Ability to implement complex task dependencies, for example chaining of calls or compensation logic in case of unrecoverable failures (SAGA)
Complete visibility into the current state of the update. For example, when using queues, all you know is whether there are some messages in a queue, and you need an additional DB to track the overall progress. With Cadence, every event is recorded.
Ability to cancel an update in flight.
See the presentation that goes over the Cadence programming model.
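To make the exception-propagation point concrete without tying it to the exact Cadence client API (the interfaces and names below are purely illustrative, not real Cadence signatures), the essence of the orchestration approach is that a failed downstream call surfaces as an ordinary exception in the orchestrating code, where it can trigger compensation (SAGA) instead of vanishing into a queue:

```java
public class OrderWorkflowSketch {
    // Illustrative activity interfaces; in a real orchestrator these would be
    // remote calls with built-in retries and timeouts.
    interface PaymentService { void charge(String orderId); }
    interface ShippingService { void ship(String orderId); }
    interface PaymentCompensation { void refund(String orderId); }

    static class OrderFailedException extends RuntimeException {
        OrderFailedException(String msg, Throwable cause) { super(msg, cause); }
    }

    void processOrder(String orderId, PaymentService payment,
                      ShippingService shipping, PaymentCompensation refund) {
        payment.charge(orderId);
        try {
            shipping.ship(orderId);
        } catch (RuntimeException e) {
            // Compensation (SAGA): undo the earlier step, then propagate the
            // failure to whoever started the workflow, instead of losing it.
            refund.refund(orderId);
            throw new OrderFailedException("shipping failed for " + orderId, e);
        }
    }
}
```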

Solutions for handling millions of timed (scheduled) messages?

I'm evaluating possible solutions for handling a large quantity of queued messages, which must be delivered to workers at a certain date and time. The result of executing them is mostly updates to stored data, and they may or may not be originally triggered by user action.
For example, think of what you'd implement in a hypothetical large-scale StarCraft game server for storing and executing users' actions, like upgrading a building or hatching a soldier, all of which need to be applied to the game state several seconds or minutes after the player initiates them.
The problem is I can't seem to find the right term for this problem area. There are several that look similar, but are different:
cron/task/job scheduler
The content of the queue is not dynamic, it's predefined.
Each task is scheduled.
message queue
The content of the queue is dynamic.
Each task is intended to be delivered immediately.
???
The content of the queue is dynamic.
Each task is scheduled.
If there are message queues that allow conditional delivery of messages, that might be it.
Summary:
What is this kind of technology called?
What are some of the solutions out there?
This just sounds like a trivial priority queue on the surface. The priority in this case is the time of completion, and you check the front of the queue to see when the next event is due. Pretty much every language comes with a priority queue or something that can easily be used as one, so I'm not sure what the actual problem is here.
Is it that you're worried about scalability, when it comes to millions of messages? Obviously 'millions' is a meaningless term - if that's millions per day, it's a trivial problem. If it's millions per second, then you can just scale horizontally, splitting the queue across multiple processes. (And the benefit of such a queue system is that this parallelization is really simple.)
I would bet that when implementing a large scale real-time strategy game server you would hit networking problems long before you start hitting problems with the message queue.
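To sketch the "priority queue keyed by due time" idea in Java (any language with a priority queue works the same way; the message payloads here are placeholders), java.util.concurrent.DelayQueue already combines the ordering with the blocking wait-until-due behaviour:

```java
import java.util.concurrent.DelayQueue;
import java.util.concurrent.Delayed;
import java.util.concurrent.TimeUnit;

public class ScheduledMessageQueue {
    static class ScheduledMessage implements Delayed {
        final String payload;
        final long dueAtMillis;

        ScheduledMessage(String payload, long delayMillis) {
            this.payload = payload;
            this.dueAtMillis = System.currentTimeMillis() + delayMillis;
        }

        @Override
        public long getDelay(TimeUnit unit) {
            return unit.convert(dueAtMillis - System.currentTimeMillis(), TimeUnit.MILLISECONDS);
        }

        @Override
        public int compareTo(Delayed other) {
            return Long.compare(getDelay(TimeUnit.MILLISECONDS), other.getDelay(TimeUnit.MILLISECONDS));
        }
    }

    public static void main(String[] args) throws InterruptedException {
        DelayQueue<ScheduledMessage> queue = new DelayQueue<>();
        queue.put(new ScheduledMessage("hatch soldier", 3_000));    // due in 3 s
        queue.put(new ScheduledMessage("upgrade building", 1_000)); // due in 1 s

        // take() blocks until the earliest message is actually due.
        while (!queue.isEmpty()) {
            ScheduledMessage msg = queue.take();
            System.out.println("delivering: " + msg.payload);
        }
    }
}
```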
Have you tried looking at push queues by Iron.io? The content of the queue can be anything you like, and you specify a webhook to which the messages will be pushed. You can also set a delay for each of the messages.
The webhook is static for each queue, though, and the delay isn't always exactly on time (it could be up to a minute off). If timing is more important, or the ability to provide a different webhook per message matters, try looking at boomerang.io.
They say they are pretty accurate on timing; you can provide a delay or a Unix timestamp for when the webhook should fire, and that is per message. Sounds like either of those might work for you.
For StarCraft, I would use the Red Dwarf server.
For a Java EE app, I would use Quartz Scheduler.
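If you do go the Quartz route, scheduling a one-shot job for a specific date is roughly the following (Quartz 2.x fluent API; the job class and its job data are placeholders):

```java
import static org.quartz.JobBuilder.newJob;
import static org.quartz.TriggerBuilder.newTrigger;

import java.util.Date;
import org.quartz.Job;
import org.quartz.JobDetail;
import org.quartz.JobExecutionContext;
import org.quartz.Scheduler;
import org.quartz.Trigger;
import org.quartz.impl.StdSchedulerFactory;

public class QuartzExample {
    // Placeholder job: applies one queued action to the game state.
    public static class ApplyActionJob implements Job {
        @Override
        public void execute(JobExecutionContext context) {
            System.out.println("applying action: "
                    + context.getJobDetail().getJobDataMap().getString("action"));
        }
    }

    public static void main(String[] args) throws Exception {
        Scheduler scheduler = StdSchedulerFactory.getDefaultScheduler();
        scheduler.start();

        JobDetail job = newJob(ApplyActionJob.class)
                .usingJobData("action", "upgrade building")
                .build();

        Trigger trigger = newTrigger()
                .startAt(new Date(System.currentTimeMillis() + 60_000)) // fire in 1 minute
                .build();

        scheduler.scheduleJob(job, trigger);
    }
}
```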
It seems to me that a queue-based solution would be best in this case for a number of reasons:
Management. Most queuing solutions provide support for inspecting the content of queues, which makes it easier to debug and to take action when certain thresholds are exceeded, ...
Performance. You can divide workload by having multiple enqueue/dequeue processes (gives you the ability to scale out).
Prioritizing. Most queues support prioritizing of messages (probably not all messages are equally important).
...
The remaining problem is that queues deliver messages immediately. There are two ways to solve this: either delay the enqueuing of messages or delay the execution of dequeued messages. I would go with the first approach, delayed enqueuing.
A message then has two properties: (content, delay). You provide the message to a component in your system that queues the message at the appropriate time.
I'm not sure what programming language you're using, but the MS .NET 4 framework has support for such a scenario (delayed execution of tasks).
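The answer mentions .NET 4, but the same delayed-enqueue idea can be sketched in Java by putting a ScheduledExecutorService in front of the actual work queue (the message type and queue are placeholders):

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class DelayedEnqueuer {
    private final ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
    private final BlockingQueue<String> workQueue = new LinkedBlockingQueue<>();

    // (content, delay): hold the message back and only enqueue it when it is due.
    public void submit(String content, long delayMillis) {
        scheduler.schedule(() -> { workQueue.offer(content); }, delayMillis, TimeUnit.MILLISECONDS);
    }

    // Workers consume the queue as usual; by the time a message is visible
    // here it is already due for execution.
    public String takeNext() throws InterruptedException {
        return workQueue.take();
    }
}
```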

Exceptions & Interrupts

When I was searching for a distinction between Exceptions and Interrupts,
I found this question Interrupts and exceptions on SO...
Some answers there were not suitable (at least at the assembly level):
"Exceptions are a software version of an interrupt." But software interrupts exist!
"Interrupts are asynchronous but exceptions are synchronous." Is that right?
"Interrupts occur regularly"
"Interrupts are hardware-implemented traps, exceptions are software-implemented." Same objection as above!
I need to find out whether some of these answers were right; also, I would be grateful if anyone could provide a better answer...
Thanks!
Interrupts are typically a method of signaling a change in hardware state. Peripherals are tied by electrical signals to an interrupt controller, which prioritizes and assigns address vectors to each possible signal. The interrupt controller forwards a detected interrupt condition to the CPU, which may or may not 'interrupt' its present execution state to process the signaled state change (depending on whether interrupts are enabled and/or whether this particular input is non-maskable). Interrupt conditions may, on some architectures, also be initiated by software (for example, the int mnemonic on x86) in addition to hardware input.
Exceptions span a greater range of implementations. In some CPU architectures, such as the 68K, an exception can be similar to an interrupt but is generated by some CPU state that needs to be handled. For example, conditions such as divide by zero, illegal instruction, I/O bus timeout, etc. generate exceptions. By handling those exceptions, one can do things such as emulate instructions and virtually extend the instruction set.
Exceptions may also be a software-only concept, such as in the C++ language, where certain error conditions can be trapped and handled.
So in general, the statements you are trying to find the validity of may be true or false depending on the exact platform you are applying them to.
An exception, as the term is most often used, is a form of control flow in a programming language for dealing with events outside the normal logic flow of the program, so that the business logic does not drown in error-handling logic. The 'handling' of the exception is context-specific. It is more like a kind of GoTo for a number of use cases where that was found useful.
An interrupt is a hardware-assisted 'trap' that triggers certain actions when certain events occur, such as a timer tick or a program "calling" INT 21. There is a registered handler which does something predefined.
Both may or may not be synchronous or asynchronous.

"Work stealing" vs. "Work shrugging"?

Why is it that I can find lots of information on "work stealing" and nothing on "work shrugging" as a dynamic load-balancing strategy?
By "work-shrugging" I mean pushing surplus work away from busy processors onto less loaded neighbours, rather than have idle processors pulling work from busy neighbours ("work-stealing").
I think the general scalability should be the same for both strategies. However I believe that it is much more efficient, in terms of latency & power consumption, to wake an idle processor when there is definitely work for it to do, rather than having all idle processors periodically polling all neighbours for possible work.
Anyway, a quick Google search didn't turn up anything under the heading of "work shrugging" or similar, so any pointers to prior art and the jargon for this strategy would be welcome.
Clarification
I actually envisage the work-submitting processor (which may or may not be the target processor) being responsible for looking around the immediate locality of the preferred target processor (chosen based on data/code locality) to decide whether a near neighbour should be given the new work instead because it has less work to do.
I don't think the decision logic would require much more than an atomic read of the immediate (typically 2 to 4) neighbours' estimated queue length. I do not think this is any more coupling than is implied by the thieves polling and stealing from their neighbours. (I am assuming "lock-free, wait-free" queues in both strategies.)
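Something like the following is what I have in mind for the submit-time decision (a sketch only, with hypothetical worker and queue names): a single atomic read of each neighbour's advertised queue length, then a push to the least loaded candidate.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class ShruggingSubmitter {
    // Each worker advertises an estimated queue length; readers never take a lock.
    static class Worker {
        final AtomicInteger queueLength = new AtomicInteger();
        void push(Runnable task) {
            queueLength.incrementAndGet();
            // ... enqueue onto this worker's (lock-free) deque ...
        }
    }

    // Choose among the preferred target and its 2-4 immediate neighbours.
    static Worker chooseTarget(Worker preferred, Worker[] neighbours) {
        Worker best = preferred;
        int bestLoad = preferred.queueLength.get();   // one atomic read per candidate
        for (Worker n : neighbours) {
            int load = n.queueLength.get();
            if (load < bestLoad) {
                best = n;
                bestLoad = load;
            }
        }
        return best;
    }

    static void submit(Runnable task, Worker preferred, Worker[] neighbours) {
        chooseTarget(preferred, neighbours).push(task);
    }
}
```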
Resolution
It seems that what I meant (but only partially described!) as a "work shrugging" strategy falls in the domain of "normal" upfront scheduling strategies that happen to be smart about processor, cache & memory locality, and that scale well.
I find plenty of references searching on these terms and several of them look pretty solid. I will post a reference when I identify one that best matches (or demolishes!) the logic I had in mind with my definition of "Work Shrugging".
Load balancing is not free; it costs a context switch (into the kernel), time to find the idle processors, and time to choose work to reassign. Especially on a machine where tasks switch all the time, dozens of times per second, this cost adds up.
So what's the difference? Work-shrugging means you further burden over-provisioned resources (busy processors) with the overhead of load-balancing. Why interrupt a busy processor with administrivia when there's a processor next door with nothing to do? Work stealing, on the other hand, lets the idle processors run the load balancer while busy processors get on with their work. Work-stealing saves time.
Example
Consider: Processor A has two tasks assigned to it. They take time a1 and a2, respectively. Processor B, nearby (the distance of a cache bounce, perhaps), is idle. The processors are identical in all respects. We assume the code for each task and the kernel is in the i-cache of both processors (no added page fault on load balancing).
A context switch of any kind (including load-balancing) takes time c.
No Load Balancing
The time to complete the tasks will be a1 + a2 + c. Processor A will do all the work, and incur one context switch between the two tasks.
Work-Stealing
Assume B steals a2, incurring the context switch time itself. The work will be done in max(a1, a2 + c) time. Suppose processor A begins working on a1; while it does that, processor B will steal a2 and avoid any interruption in the processing of a1. All the overhead on B is free cycles.
If a2 was shorter than a1 by at least the switch cost (a2 + c ≤ a1), you have effectively hidden the cost of a context switch in this scenario; the total time is just a1.
Work-Shrugging
Assume B completes a2, as above, but A incurs the cost of moving it ("shrugging" the work). The work in this case will be done in max(a1, a2) + c time; the context switch is now always in addition to the total time, instead of being hidden. Processor B's idle cycles have been wasted, here; instead, a busy processor A has burned time shrugging work to B.
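To put purely illustrative numbers on it: say a1 = 10 ms, a2 = 6 ms and c = 1 ms. With no load balancing the tasks finish in 10 + 6 + 1 = 17 ms; with work-stealing in max(10, 6 + 1) = 10 ms, because the switch is hidden behind a1; with work-shrugging in max(10, 6) + 1 = 11 ms, because A pays for the switch before it can get on with a1.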
I think the problem with this idea is that it makes the threads with actual work to do waste their time constantly looking for idle processors. Of course there are ways to make that faster, like having a queue of idle processors, but then that queue becomes a concurrency bottleneck. So it's just better to have the threads with nothing better to do sit around and look for jobs.
The basic advantage of 'work stealing' algorithms is that the overhead of moving work around drops to 0 when everyone is busy. So there's only overhead when some processor would otherwise have been idle, and that overhead cost is mostly paid by the idle processor with only a very small bus-synchronization related cost to the busy processor.
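For what it's worth, this is the behaviour you get for free from a work-stealing executor such as Java's ForkJoinPool (shown here purely as an illustration: each worker has its own deque, and only workers that run dry go looking for work to steal):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class WorkStealingDemo {
    public static void main(String[] args) throws InterruptedException {
        // Backed by a ForkJoinPool: busy workers never stop to balance,
        // while an idle worker steals from another worker's deque.
        ExecutorService pool = Executors.newWorkStealingPool();

        for (int i = 0; i < 16; i++) {
            final int taskId = i;
            pool.submit(() -> System.out.println(
                    "task " + taskId + " on " + Thread.currentThread().getName()));
        }

        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
    }
}
```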
Work stealing, as I understand it, is designed for highly-parallel systems, to avoid having a single location (single thread, or single memory region) responsible for sharing out the work. In order to avoid this bottleneck, I think it does introduce inefficiencies in simple cases.
If your application is not so parallel that a single point of work distribution causes scalability problems, then I would expect you could get better performance by managing it explicitly as you suggest.
No idea what you might google for though, I'm afraid.
Some issues... if a thread is busy, wouldn't you want it spending its time processing real work instead of speculatively looking for idle threads to offload onto?
How does your thread decide when it has so much work that it should stop doing that work to look for a friend that will help?
How do you know that the other threads don't have just as much work and you won't be able to find a suitable thread to offload onto?
Work stealing seems more elegant, because it solves the same problem (contention) in a way that guarantees that the threads doing the load balancing are only doing it while they would otherwise have been idle.
It's my gut feeling that what you've described will not only be much less efficient in the long run, but will require lots of per-system tweaking to get acceptable results.
Though in your edit you suggest that you want the submitting processor to handle this, not the worker threads as you suggested earlier and in some of the comments here. If the submitting processor is searching for the lowest queue length, you're potentially adding latency to the submit, which isn't really desirable.
But more importantly it's a supplementary technique to work-stealing, not a mutually exclusive technique. You've potentially alleviated some of the contention that work-stealing was invented to control, but you still have a number of things to tweak before you'll get good results, these tweaks won't be the same for every system, and you still risk running into situations where work-stealing would help you.
I think your edited suggestion, with the submission thread doing "smart" work distribution, is potentially a premature optimization against work-stealing. Are your idle threads slamming the bus so hard that your non-idle threads can't get any work done? Then it is time to optimize your work-stealing.
So, in contrast to "work stealing", what is really meant here by "work shrugging" is a normal upfront work-scheduling strategy that is smart about processor, cache & memory locality, and scalable.
Searching on combinations of the terms/jargon above yields many substantial references to follow up. Some address the added complication of machine virtualisation, which wasn't in fact a concern of the questioner, but the general strategies are still relevant.

Real time system exception handling

As always, after some research I was unable to find anything of real value. My question is: how does one go about handling exceptions in a real-time system, where program failure is generally not acceptable (e.g. a nuclear reactor or a heart monitor)?
OK, since everyone got lost on the second piece of this, which had NOTHING to do with the main question: I only had it in there to show how I normally escape code blocks.
Exception handling in real-time/embedded systems has several layers: not just the language-supported options, but also the MMU, CPU exceptions, and one of my favorites, watchdogs.
Language exceptions (C/C++)
- not often used, because it is hard to prove that all exceptions are handled at the right level. Also, it is pretty hard to determine which thread/process should be responsible. Instead, programming by contract is preferred.
Programming style:
- i.e. programming by contract. Additional constraints: MISRA C / MISRA C++. These can be checked to ensure that all possible cases are somehow handled (e.g. no if without else).
Hardware support:
- MMU: use of multiple processes which are protected against each other. This allows a fault in one process to be contained instead of bringing down the whole system.
- watchdog
- CPU exceptions
- multi-core: use of multiple cores to separate critical processes from the rest. This also allows voting mechanisms (you want this and more for your nuclear reactor).
- multi-system
Most important is to define a strategy. Depending on the other non-functional requirements (safety, reliability, security), a strategy needs to be thought out. It can range from graceful degradation to a partial system reboot.
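As a sketch of the watchdog layer (written in Java only for consistency with the other examples here; on real embedded hardware the watchdog is a peripheral you "kick" via a register), the idea is simply that the monitored task must check in periodically or a recovery action fires:

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

public class SoftwareWatchdog {
    private final AtomicLong lastKick = new AtomicLong(System.nanoTime());
    private final long timeoutNanos;
    private final Runnable recoveryAction;

    SoftwareWatchdog(long timeoutMillis, Runnable recoveryAction) {
        this.timeoutNanos = TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        this.recoveryAction = recoveryAction;
    }

    // The supervised task calls this from its main loop to prove it is alive.
    public void kick() {
        lastKick.set(System.nanoTime());
    }

    public void start() {
        ScheduledExecutorService monitor = Executors.newSingleThreadScheduledExecutor();
        monitor.scheduleAtFixedRate(() -> {
            if (System.nanoTime() - lastKick.get() > timeoutNanos) {
                // The strategy decision goes here: degrade gracefully,
                // restart the task, or reboot part of the system.
                recoveryAction.run();
            }
        }, 100, 100, TimeUnit.MILLISECONDS);
    }
}
```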
In a 'real-time', 'nuclear reactor' type system, chances are the exception handling allows the system, instead of failing, to do the next best thing.
Let's say that we have a heart monitor. If it isn't receiving a signal, that might trigger an exception. In that case, the heart monitor might handle the exception by waiting a few seconds and trying again.
In a nuclear reactor, getting to a certain temperature might trigger an exception. In that case, the handling might shut off various parts of the reactor to start to cool it down, and then start them back up when it gets to a reasonable temperature.
Exceptions are meant to have a lower-level system say that it doesn't know what to do, and to have a higher level system handle it. Like in the nuclear reactor, the system that measures temperature probably doesn't know how to turn on parts of the reactor, so it triggers an exception so that some higher-level system can handle it.
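A minimal sketch of the heart-monitor example (the SignalLostException and the sensor method are made up for illustration): the low-level layer only reports that it has no signal, and the higher-level loop decides what the "next best thing" is.

```java
public class HeartMonitor {
    static class SignalLostException extends Exception {}

    interface Sensor {
        // Low-level layer: it only knows it has no signal, not what to do about it.
        int readPulse() throws SignalLostException;
    }

    static void monitor(Sensor sensor) throws InterruptedException {
        while (true) {
            try {
                int pulse = sensor.readPulse();
                System.out.println("pulse: " + pulse);
            } catch (SignalLostException e) {
                // Higher-level policy: don't fail; wait a few seconds and retry,
                // raising an alarm if the signal stays lost.
                System.err.println("signal lost, retrying in 5 s");
                Thread.sleep(5_000);
            }
        }
    }
}
```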
A critical system is like any other system, except that it is specified more clearly, passes through more testing phases, and will generally fail safe.
Regarding your form, yes, it's pretty bad. I do mind the lack of {} very much; it's been said so often that this is just plain bad style, and it leads to confusion when adding new code.