How to scale Azure Functions for high-throughput, short-lived Event Grid events

When publishing a large number of events to a topic (where the retry and time-to-live are in the minutes), many fail to get delivered to the subscribed functions. Does anyone know of any settings or approaches to ensure scaling reacts quickly enough that they don't all get dropped?
I am creating an Azure Function app that essentially passes events to an Event Grid topic at a high rate, while other functions subscribed to the topic handle the events. These events are meant to be short lived and not persist longer than a specified number of minutes. Ideally I want to see the app scale to handle the load without dropping events. The overall goal is that each event triggers an outbound call to my own API endpoint, to test performance/load.
I have reviewed documentation on MSDN and elsewhere, but not much fits my scenario (most of it talks in terms of incoming events, not outbound HTTP calls).
For scaling, I have looked into the host.json settings for HTTP (as there are none for Event Grid triggers, and Event Grid triggers look to be similar to HTTP triggers), and setting those seems to have made some improvement.
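For reference, a rough sketch of the kind of host.json HTTP throttling settings referred to above, assuming the v1 host.json schema; the values are illustrative, not recommendations:

```json
{
  "http": {
    "maxOutstandingRequests": 200,
    "maxConcurrentRequests": 100,
    "dynamicThrottlesEnabled": false
  }
}
```

Raising maxOutstandingRequests lets more deliveries queue inside the host instead of being rejected outright; whether these settings also govern Event Grid triggers is exactly the open question here.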
The end result I expect is that every event published to the topic endpoint gets delivered to a function and executed, with a low delivery-failure/drop rate.
What I am seeing instead is that when publishing many events to a topic (at a consistent rate), the majority of events get dead-lettered/dropped.

The Consumption plan is limited by the computing power assigned to your function. In essence, there are limits up to which it can scale, and beyond those it becomes the bottleneck.
I suggest having a look at the limitations.
And here you can find some insights about the computing power differences.
If you want to enable automatic scaling, or to scale out the number of VM instances yourself, I suggest using an App Service plan. The cheapest option where scaling is supported is the Standard pricing tier.
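For illustration, creating a Standard-tier plan and a simple instance-count autoscale setting with the Azure CLI could look roughly like this (the resource names are hypothetical placeholders):

```
az appservice plan create --name my-plan --resource-group my-rg --sku S1
az monitor autoscale create --resource-group my-rg \
    --resource my-plan --resource-type Microsoft.Web/serverfarms \
    --min-count 1 --max-count 5 --count 1
```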

Related

Google Compute Engine auto scaling based on queue length

We host our infrastructure on Google Compute Engine and are looking into Autoscaling for groups of instances. We do a lot of batch processing of binary data from a queue. In our case, this means:
When a worker is processing data the CPU is always 100%
When the queue is empty we want to terminate all workers
Depending on the length of the queue we want a certain number of workers
However, I'm finding it hard to figure out a way to autoscale this on Google Compute Engine, because it appears to scale only on per-instance metrics such as CPU. From the documentation:
Not all custom metrics can be used by the autoscaler. To choose a valid custom metric, the metric must have all of the following properties:
The metric must be a per-instance metric.
The metric must be a valid utilization metric, which means that data from the metric can be used to proportionally scale up or down the number of virtual machines.
If I'm reading the documentation correctly, this makes it hard to use autoscaling on a global queue length?
Backup solutions
Write a simple auto-scale handler using the Google Cloud API to create or destroy workers via the Instances API
Write a simple auto-scale handler using instance groups and then manually insert/remove instances via InstanceGroups: insert
Write a simple auto-scale handler using InstanceGroupManagers: resize (see the sketch after this list)
Create a custom per-instance metric which measures len(queue)/len(workers) on all workers
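A minimal sketch of the third option above: a handler that polls the queue and resizes a managed instance group through the Compute API. The project, zone, group name, and the queue_length() helper are all hypothetical placeholders:

```python
# Naive autoscale loop: one worker per TASKS_PER_WORKER queued items,
# resized via InstanceGroupManagers.resize once a minute.
import time
import googleapiclient.discovery  # pip install google-api-python-client

PROJECT, ZONE, GROUP = "my-project", "us-central1-a", "worker-group"  # placeholders
TASKS_PER_WORKER = 10  # assumption: one worker comfortably handles 10 queued tasks

def queue_length() -> int:
    raise NotImplementedError("query your own queue backend here")

compute = googleapiclient.discovery.build("compute", "v1")

while True:
    target = -(-queue_length() // TASKS_PER_WORKER)  # ceiling division
    compute.instanceGroupManagers().resize(
        project=PROJECT, zone=ZONE, instanceGroupManager=GROUP, size=target,
    ).execute()
    time.sleep(60)
```

When the queue is empty the target drops to zero workers, matching the requirement above.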
As of February 2018 (Beta), this is possible via "Per-group metrics" in Stackdriver.
Per-group metrics allow autoscaling with a standard or custom metric that does not export per-instance utilization data. Instead, the group scales based on a value that applies to the whole group and corresponds to how much work is available for the group or how busy the group is. The group scales based on the fluctuation of that group metric value and the configuration that you define.
More information at https://cloud.google.com/compute/docs/autoscaler/scaling-stackdriver-monitoring-metrics#per_group_metrics
The how-to is too long to post here.
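For flavor only, the gcloud invocation has roughly this shape; the group name, metric, and per-instance assignment are hypothetical, and the exact flags should be taken from the linked how-to rather than from here:

```
gcloud compute instance-groups managed set-autoscaling worker-group \
    --zone us-central1-a --max-num-replicas 20 \
    --update-stackdriver-metric custom.googleapis.com/queue_depth \
    --stackdriver-metric-single-instance-assignment 10
```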
As far as I understand, this is not implemented yet (as of January 2016). At the moment autoscaling is only targeted at web-serving scenarios, where you want to serve web pages or other web services from your machines and keep some reasonable headroom (e.g. in terms of CPU or other metrics) for spikes in traffic. The system then adjusts the number of instances/VMs to match your target.
You are looking for autoscaling for batch processing scenarios, and this is not catered for at the moment.

Decrease CPU load while calculating for a long time in an AIR mobile application?

I am facing the following problem:
- A complex maths calculation runs during the loading of the application, and it takes a considerably long time (about 20 seconds), during which the CPU sits at nearly 100% and the application looks frozen.
Since it is a mobile application this must be prevented, even at the cost of extending the initial loading time, but there is no direct access to the calculating code since it is inside a 3rd-party library.
Is there a general way to keep an AIR application from hogging most of the CPU?
On desktop, you would use the Workers API. It's pretty new; I'd recommend it for AS3-only projects. If you use Flex, it's better to wait a few months.
Workers is a multi-threading API which allows you to have a UI thread and a working thread. This will still use 100% of the CPU, but the UI won't get stuck. Here are some links to get you started:
Thibault Imbert - sneak peek,
Intro to as 3 workers,
AS3 Workers livedocs
However, on mobile you can't use Workers, so you'd have to break your function apart and insert some delays, using callLater or setTimeout. It's hard to decompose a function like that, but if it has a loop, you can insert a callLater call after every x iterations. You can parametrize both x and the delay of the callLater call to achieve the perfect balance. After callLater is invoked, the UI will be rendered, and events will be generated and caught. If you don't need them, remove their listeners, or stop their propagation with a higher-priority handler. If you need it, I can post a source example of callLater in a loop; a language-agnostic sketch of the same chunking idea follows below.
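The callLater technique is AS3-specific, but the underlying idea, slicing a long loop so the event loop can breathe between chunks, is general. Here is a minimal sketch of it in Python, with asyncio standing in for the AIR frame loop; the chunk size is a tuning knob just like x above:

```python
# Yield to the event loop every `chunk` iterations so the UI stays responsive.
import asyncio

async def heavy_calculation(n: int, chunk: int = 10_000) -> int:
    total = 0
    for i in range(n):
        total += i * i  # stand-in for the expensive per-iteration work
        if i % chunk == 0:
            await asyncio.sleep(0)  # hand control back, like callLater between chunks
    return total

print(asyncio.run(heavy_calculation(1_000_000)))
```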

How can I model this usage scenario?

I want to create a fairly simple mathematical model that describes usage patterns and performance trade-offs in a system.
The system behaves as follows:
clients periodically issue multi-cast packets to a network of hosts
any host that receives the packet, responds with a unicast answer directly
the initiating host caches the responses for some given time period, then discards them
if the cache is still populated the next time a request is required, data is pulled from the cache, not the network
packets are of a fixed size and always contain the same information
hosts are symmetric - any host can issue a request and respond to requests
I want to produce some simple mathematical models (and graphs) that describe the trade-offs available given some changes to the above system:
What happens when you vary the amount of time a host caches responses? How much data does this save? How many calls to the network do you avoid? (clearly depends on activity)
Suppose responses are also multi-cast, and any host that overhears another client's request can cache all the responses it hears - thereby potentially saving itself a network request - how would this affect the overall state of the system?
Now, this one gets a bit more complicated - each request-response cycle alters the state of one other host in the network, so the more activity there is, the quicker caches become invalid. How do I model the trade-off between the number of hosts, the rate of activity, the "dirtiness" of the caches (assuming hosts listen in on others' responses), and how this changes with the cache validity period? Not sure where to begin.
I don't really know what sort of mathematical model I need, or how I construct it. Clearly it's easier to just vary two parameters, but particularly with the last one, I've got maybe four variables changing that I want to explore.
Help and advice appreciated.
Investigate tokenised Petri nets. These seem to be an appropriate tool as they:
provide a graphical representation of the models
provide substantial mathematical analysis
have a large body of prior work and underlying analysis
are (relatively) simple mathematical models
seem to be directly tied to your problem in that they deal with constraint dependent networks that pass tokens only under specified conditions
I found a number of references (quality not assessed) by a search on "token Petri net".
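To make the suggestion concrete, here is a minimal token-marked Petri net sketch, using nothing beyond the standard firing rule (a transition fires only when every input place holds a token); the places model one request/response/cache cycle from the question and are purely illustrative:

```python
# Minimal Petri net: a marking maps places to token counts; a transition
# consumes one token from each input place and produces one in each output.
from dataclasses import dataclass

@dataclass
class Transition:
    name: str
    inputs: list[str]
    outputs: list[str]

def enabled(t: Transition, marking: dict[str, int]) -> bool:
    return all(marking.get(p, 0) > 0 for p in t.inputs)

def fire(t: Transition, marking: dict[str, int]) -> None:
    for p in t.inputs:
        marking[p] -= 1
    for p in t.outputs:
        marking[p] = marking.get(p, 0) + 1

marking = {"client_idle": 1, "host_ready": 1}
cycle = [
    Transition("request", ["client_idle", "host_ready"], ["awaiting_reply"]),
    Transition("respond", ["awaiting_reply"], ["client_idle", "host_ready", "cached"]),
]
for t in cycle:
    if enabled(t, marking):
        fire(t, marking)
print(marking)  # "cached" ends with 1 token: one response now sits in the cache
```

Cache expiry would be another transition consuming tokens from "cached", which is where the validity-period trade-off enters the model.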

Solutions for handling millions of timed (scheduled) messages?

I'm evaluating possible solutions for handling a large quantity of queued messages, which must be delivered to workers at a certain date and time. The result of executing them is mostly updates to stored data, and they may or may not be originally triggered by user action.
For example, think of what you'd implement in a hypothetical large-scale StarCraft game server for storing and executing users' actions, like upgrading a building or hatching a soldier, all of which need to be applied to the game state several seconds or minutes after the player initiates them.
The problem is I can't seem to find the right term for this problem area. There are several that look similar, but are different:
cron/task/job scheduler
The content of the queue is not dynamic, it's predefined.
Each task is scheduled.
message queue
The content of the queue is dynamic.
Each task is intended to be delivered immediately.
???
The content of the queue is dynamic.
Each task is scheduled.
If there are message queues that allow conditional delivery of messages, that might be it.
Summary:
What are these kind of technology called?
What are some of the solutions out there?
This just sounds like a trivial priority queue on the surface. The priority in this case is the time of completion, and you check the front of the queue to see when the next event is due. Pretty much every language comes with a priority queue or something that can easily be used as one, so I'm not sure what the actual problem is here.
Is it that you're worried about scalability, when it comes to millions of messages? Obviously 'millions' is a meaningless term - if that's millions per day, it's a trivial problem. If it's millions per second, then you can just scale horizontally, splitting the queue across multiple processes. (And the benefit of such a queue system is that this parallelization is really simple.)
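A minimal sketch of that idea in Python, using the standard heapq module as the priority queue keyed on due time; the polling loop is deliberately naive, and a real server would sleep until the head of the queue is due:

```python
# Scheduled messages as a priority queue ordered by due timestamp.
import heapq
import time

schedule: list[tuple[float, str]] = []  # (due_timestamp, message)

def enqueue(message: str, delay_s: float) -> None:
    heapq.heappush(schedule, (time.time() + delay_s, message))

def run_due(handler) -> None:
    while schedule and schedule[0][0] <= time.time():
        _, message = heapq.heappop(schedule)
        handler(message)

enqueue("hatch soldier", 2.0)
enqueue("upgrade building", 0.5)
while schedule:
    run_due(print)   # each message prints once its due time arrives
    time.sleep(0.1)
```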
I would bet that when implementing a large scale real-time strategy game server you would hit networking problems long before you start hitting problems with the message queue.
Have you tried looking at push queues from Iron.io? The content of the queue can be anything you like, and you specify a webhook where the messages will be pushed to. You can also set a delay for each of the messages.
The webhook is static for each queue, though, and the delay isn't always exactly on time (it could be up to a minute off). If timing is more important, or the ability to provide a different webhook per message matters, try looking at boomerang.io.
They say they are pretty accurate on the timing; you can provide a delay or a Unix timestamp for the webhook to fire, and that is per message. Sounds like either of those might work for you.
For StarCraft, I would use the Red Dwarf server.
For a Java EE app, I would use Quartz Scheduler.
It seems to me that a queue-based solution would be best in this case for a number of reasons:
Management. Most queuing solutions provide support for inspecting the content of queues, which makes it easier to debug and easier to take action when certain thresholds are exceeded, ...
Performance. You can divide workload by having multiple enqueue/dequeue processes (gives you the ability to scale out).
Prioritizing. Most queues support prioritizing of messages (probably not all messages are equally important).
...
The remaining problem is that queues deliver messages immediately. You have two ways to solve this: either delay the enqueuing of messages or delay the execution of dequeued messages. I would go with the first approach: delayed enqueuing.
A message then has two properties: (content, delay). You provide the message to a component in your system that queues the message at the appropriate time.
I'm not sure what programming language you're using, but the MS .NET 4 framework has support for such a scenario (delayed execution of tasks).
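A sketch of that delayed-enqueuing component, here in Python with one timer per message; the names are illustrative, and a production system would coalesce timers rather than spawn one per message:

```python
# Delayed enqueuing: hold (content, delay) and only put the content on the
# real work queue once the delay has elapsed.
import queue
import threading

work_queue: "queue.Queue[str]" = queue.Queue()

def enqueue_delayed(content: str, delay_s: float) -> None:
    threading.Timer(delay_s, work_queue.put, args=(content,)).start()

enqueue_delayed("upgrade building", 1.5)
print(work_queue.get())  # blocks ~1.5 s, then prints "upgrade building"
```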

Detecting slow readers with zmq (zeromq)

I'm trying to replace a small homegrown messaging system, and am playing around a bit with zmq.
I'll be needing to detect slow readers and boot/disconnect them - a slow reader pretty much meaning a particular consumer whose queue size is above a certain threshold.
So far it seems zmq blocks every consumer if one of them is a bit slow (fair enough), but I can't find any way to detect a potential slow consumer. Does anyone have any experience with whether and how this is possible with zmq, or have any other broker-less messaging system to recommend?
As of zeromq-2.0.7, you can set the ZMQ_HWM option on a ZMQ_PUB socket to control the maximum number of messages that can be queued for a subscriber. Once the high-water mark has been reached, all further messages destined for that subscriber will be dropped until the queue size drops back below the high-water mark. This limits the amount of memory dedicated to what you call a slow reader.
However, because the ZeroMQ library exposes sockets, not clients, there is no way for you to identify and forcibly disconnect unwanted clients without modifying the library itself.
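A minimal pyzmq sketch of setting the high-water mark on a publisher; note that libzmq 3.x and later split ZMQ_HWM into separate send/receive options, so this uses SNDHWM, whereas on 2.0.7 you would set zmq.HWM instead:

```python
# Cap the per-subscriber send queue on a PUB socket; once the high-water
# mark is reached, further messages for that slow subscriber are dropped.
import zmq

ctx = zmq.Context()
pub = ctx.socket(zmq.PUB)
pub.set(zmq.SNDHWM, 1000)  # at most 1000 queued messages per subscriber
pub.bind("tcp://*:5556")
pub.send_string("hello")
```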
There is a section in the ZeroMQ Guide regarding this; it suggests implementing a pattern they call the "Suicidal Snail" pattern.
Basically, it reverses the dependency and tries to convince slow subscribers to disconnect/kill themselves by giving them a way to detect if they have become slow readers.
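A rough sketch of that idea in pyzmq, assuming the publisher stamps each message with its send time as a separate frame; the frame layout and the lag threshold are assumptions for illustration, not the Guide's exact code:

```python
# Suicidal Snail: the subscriber measures how stale incoming messages are
# and disconnects itself once it has fallen too far behind.
import sys
import time
import zmq

MAX_LAG_S = 1.0  # assumed tolerance before the subscriber gives up

ctx = zmq.Context()
sub = ctx.socket(zmq.SUB)
sub.connect("tcp://localhost:5556")
sub.setsockopt_string(zmq.SUBSCRIBE, "")

while True:
    sent_at, payload = sub.recv_multipart()  # [timestamp frame, data frame]
    if time.time() - float(sent_at) > MAX_LAG_S:
        sys.exit("slow reader: disconnecting, per the Suicidal Snail pattern")
    # ...process payload...
```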