I'm developing a web app that needs to handle bursts of very high load.
Once per minute I get a burst of requests lasting a few seconds (~1M-3M requests/sec), and for the rest of the minute I get nothing.
What's my best strategy for handling as many requests/sec as possible at each front-end server, just sending a reply and storing the request in memory somehow so that a DB writer worker can process it in the background later?
The aim is to do as little as possible during the burst, and to write the requests to the DB ASAP after the burst.
Edit: the order of transactions is not important.
We can lose some transactions, but 99% need to be recorded.
The latency of getting all requests into the DB can be a few seconds after the last request has been received; let's say not more than 15 seconds.
This question is kind of vague. But I'll take a stab at it.
1) You need limits. A simple implementation will open millions of connections to the DB, which will obviously perform badly. At the very least, each connection eats megabytes of RAM on the DB. Even with connection pooling, each 'thread' could take a lot of RAM to record its (incoming) state.
If your app server has a limited number of processing threads, you can use HAProxy to "pick up the phone" and buffer the request in a queue for a few seconds until there is a free thread on your app server to handle the request.
In fact, you could just use a web server like nginx to take the request and say "200 OK". Then later, a simple app reads the web log and inserts into DB. This will scale pretty well, although you probably want one thread reading the log and several threads inserting.
2) If your language has coroutines, it may be better to handle the buffering yourself. You should measure the overhead of relying on your language runtime for scheduling.
For example, if each HTTP request is 1K of headers + data, you want to parse it and throw away everything but the one or two pieces of data that you actually need (i.e. the DB ID). If you rely on your language's coroutines as an 'implicit' queue, it will hold a 1K buffer for each coroutine while they are being parsed. In some cases, it's more efficient/faster to have a finite number of workers and manage the queue explicitly. When you have a million things to do, small overheads add up quickly, and the language runtime won't always be optimized for your app.
Also, Go will give you far better control over your memory than Node.js. (Structs are much smaller than objects. The 'overhead' for the Keys to your struct is a compile-time thing for Go, but a run-time thing for Node.js)
3) How do you know it's working? You want to be able to know exactly how you are doing. When you rely on the language co-routines, it's not easy to ask "how many threads of execution do I have and what's the oldest one?" If you make an explicit queue, those questions are much easier to ask. (Imagine a handful of workers putting stuff in the queue, and a handful of workers pulling stuff out. There is a little uncertainty around the edges, but the queue in the middle very explicitly captures your backlog. You can easily calculate things like "drain rate" and "max memory usage" which are very important to knowing how overloaded you are.)
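To make point 3 concrete, here is a minimal sketch of an explicit queue that can answer those questions directly. Python is used purely for illustration; `MeteredQueue` and its method names are made up for this example and do not come from any library, and the injectable `clock` is only there so the numbers can be checked deterministically.

```python
import threading
import time
from collections import deque


class MeteredQueue:
    """FIFO queue that remembers when each item was enqueued (hypothetical helper)."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._lock = threading.Lock()
        self._items = deque()          # entries are (enqueue_time, item)
        self._drained = 0              # items removed so far
        self._started = clock()

    def put(self, item):
        with self._lock:
            self._items.append((self._clock(), item))

    def get(self):
        # Raises IndexError when the queue is empty; a real worker would wait instead.
        with self._lock:
            _, item = self._items.popleft()
            self._drained += 1
            return item

    def backlog(self):
        """How many items are waiting right now."""
        with self._lock:
            return len(self._items)

    def oldest_age(self):
        """Seconds the oldest waiting item has spent in the queue (0.0 if empty)."""
        with self._lock:
            if not self._items:
                return 0.0
            return self._clock() - self._items[0][0]

    def drain_rate(self):
        """Average items removed per second since the queue was created."""
        with self._lock:
            elapsed = self._clock() - self._started
            return self._drained / elapsed if elapsed > 0 else 0.0
```

Polling `backlog()`, `oldest_age()` and `drain_rate()` once a second from a monitoring thread is usually enough to see whether the writers are keeping up.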
My advice: Go with Go. Long term, Go will be a much better choice. The Go runtime is a bit immature right now, but every release is getting better. Node.js is probably slightly ahead in a few areas (maturity, size of community, libraries, etc.)
How about a channel with a buffer size equal to what the DB writer can handle in 15 seconds? When the request comes in, it is sent on the channel. If the channel is full, give some sort of "System Overloaded" error response.
Then the DB writer reads from the channel and writes to the database.
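A rough sketch of that idea, written in Python with a bounded `queue.Queue` standing in for the buffered channel. The capacity, the batch size, the status codes and the `write_batch` callback are all assumptions made for illustration; the real capacity should come from measuring what the DB writer can absorb in 15 seconds.

```python
import queue
import threading

CAPACITY = 100_000  # hypothetical: what the DB writer can handle in 15 seconds

requests = queue.Queue(maxsize=CAPACITY)


def handle_request(payload, q=requests):
    """Front-end handler: enqueue and reply immediately, never block."""
    try:
        q.put_nowait(payload)
    except queue.Full:
        return 503, "System Overloaded"
    return 200, "OK"


def drain_batch(q, max_items):
    """Take up to max_items that are already waiting, without blocking."""
    batch = []
    while len(batch) < max_items:
        try:
            batch.append(q.get_nowait())
        except queue.Empty:
            break
    return batch


def db_writer(q, write_batch, stop, batch_size=500):
    """Background worker: pull requests off the queue and write them in batches."""
    while not (stop.is_set() and q.empty()):
        try:
            first = q.get(timeout=0.1)
        except queue.Empty:
            continue
        write_batch([first] + drain_batch(q, batch_size - 1))


def start_writer(q, write_batch):
    """Start the writer thread; set the returned event to let it finish and exit."""
    stop = threading.Event()
    worker = threading.Thread(target=db_writer, args=(q, write_batch, stop))
    worker.start()
    return worker, stop
```

Writing in batches rather than row by row is a design choice added here, not something the answer specifies; it tends to matter a lot when the backlog has to be flushed within a few seconds.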
For Create operations it is clear that putting the message in the queue is a good idea, in case the processing or creation of that entity takes longer than expected, and because of the other benefits queues bring.
However, for read operations that are time-bound (must return to the UI in less than 3 seconds), it is not entirely clear whether a queue is a good idea.
http://masstransit-project.com/MassTransit/usage/request-response.html provides a nice abstraction but it goes through the queue.
Can someone provide some suggestions as to why or why not I would use MassTransit (or, for that matter, any similar technology such as NServiceBus) for database read operations that are UI time-bound?
Should I use MassTransit only for long-running processes?
Request/Reply is a perfectly valid pattern for timebound operations. Transport costs in case of, for example, RabbitMQ, are very low. I measured performance of request/response using ServiceStack (which is very fast) and MassTransit. There is an initial delay with MassTransit to cache the endpoints, but apart from that the speed is pretty much the same.
Benefits here are:
Retries
Fine tuning of timeouts
Easy scaling with competing consumers
just to name the most obvious ones.
And with error handling, failed requests end up in the error queue, so there is no data loss and you can always look there to find out what went wrong and why.
Update: There is a SOA pattern that describes this (or rather similar) approach. It is called Decoupled Invocation.
I have a custom Python script that monitors the call logs from a Nortel phone system. This phone system is under extremely high volume throughout the day, and it's starting to appear that some records may be getting lost.
Some of you may dislike this, but I'm not interested in sharing the source code or current method in any way. I would rather consider this from a "new project" approach.
I'm looking for insight into the easiest and safest way to reliably monitor heavy data output through a serial port on Linux. I'm not limiting this to any particular set of tools or languages; I want to find out what works best to do this one critical job. I'm comfortable enough parsing the data and inserting it into MySQL that we could just assume the data could be dropped to a text file.
Thank you
Well, the way that I would approach this is to have 2 threads (or processes) working.
Thread 1: The read thread
This thread does nothing but read data from the raw serial port and put the data into a local buffer/queue (In memory is preferred for speed). It should do nothing else. Depending on the clock speed of the serial connection, this should be pretty easy to do.
Thread 2: The processing thread
This thread just sleeps until there is data in the local buffer to process, then reads and processes it. That's it.
The reason for splitting it apart in two, is so that if one is busy (a block in MySQL for the processing thread) it won't affect the other. After all, while the serial port is buffered by the OS, the buffer size is limited.
But then again, any local program is likely going to be way faster than the serial port can send data. Serial transfer is actually quite slow relative to the clock speed of the processor (115.2kbps is about the limit on standard hardware). So unless you're CPU speed bound (such as on an Arduino), I can't see normal conditions affecting it too much. So your choice of language really shouldn't be of too much concern (assuming modern hardware). Stick to what you know.
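As a minimal sketch of the two-thread split, assuming Python since that is what the existing script uses: here `port` is any file-like object that yields lines, and an in-memory `io.BytesIO` stands in for the serial device. How the real port is opened (device path, baud rate, library) is not shown and would depend on your setup; `read_loop`, `process_loop` and `run` are names invented for this example.

```python
import io
import queue
import threading

SENTINEL = None  # marks end of input; a real serial port normally never ends


def read_loop(port, buf):
    """Thread 1: do nothing but read lines and put them in the buffer."""
    for line in port:
        buf.put(line)
    buf.put(SENTINEL)


def process_loop(buf, handle):
    """Thread 2: sleep until there is data in the buffer, then process it."""
    while True:
        line = buf.get()  # blocks while the buffer is empty
        if line is SENTINEL:
            break
        handle(line.decode("ascii", errors="replace").rstrip("\r\n"))


def run(port, handle):
    """Wire the two threads together through an unbounded in-memory queue."""
    buf = queue.Queue()
    reader = threading.Thread(target=read_loop, args=(port, buf))
    worker = threading.Thread(target=process_loop, args=(buf, handle))
    reader.start()
    worker.start()
    reader.join()
    worker.join()


if __name__ == "__main__":
    # Stand-in for the serial port; the sample lines are made up.
    fake_port = io.BytesIO(b"record 1\r\nrecord 2\r\n")
    run(fake_port, print)
```

If the MySQL insert inside `handle` blocks for a while, only the processing thread stalls; the reader keeps emptying the OS serial buffer into the queue.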
I need a 2D array (as JSON) to be sent from server to client. It would be around 400x400 in size, with each entry around 4 characters of text. So that makes it around 640KB of data (400 x 400 x 4 bytes, before JSON quoting and separators).
Which of the following extreme approaches is better?
I make a large HTTP request of all the data at one go.
I make 400 requests - each asking for a single row (around 1.6 KB)
I believe optimal approach would be somewhere in middle. Could anyone give me an idea what might be the optimal single request size for this data?
Thanks.
Couple of considerations for choosing one big vs several small:
In the single request case, you can't do progressive data processing as the data arrives; you need to wait for the full packet to arrive before you can do anything. If it fails, you need to start everything from scratch.
In the multiple requests case, you can do progressive data processing. However, you now have to consider the potential for multiple failures and how to recover from these.
Multiple requests incur overhead for each request. This is additional bandwidth your app will be consuming.
Some HTTP agents limit the number of concurrent requests to the same server, and you might need to do some logic to work around that.
Response compression will work better for the single request case.
Multiple requests won't require you to allocate the full memory for your data. Granted, 640KB is not that big a chunk of memory, so that might not be a big consideration for you, depending on how often you will allocate it.
In the case of early termination of the process (a Cancel button, the app being terminated, or the browser navigating away from your page), the single request will still finish the full response download; however, in the multiple requests case, any request your code hasn't started yet will not be executed.
Honestly, I wouldn't be that worried about the last two and would base my choice on 1) is progressive data processing important; and 2) what your app tolerance is for failures and partial data.
Unless you are dealing with slow (very slow by today's standards) connections and really need incremental updates, do it in one request.
That gives you better efficiency for compressing the response, and avoids the overhead of the extra HTTP requests and response headers.
And you have to keep in mind that the servers may have vulnerabilities with large requests.
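The compression argument can be checked with a rough sketch like the one below. It is written in Python, uses `zlib` as a stand-in for whatever response compression the server applies, and fills the grid with made-up deterministic 4-character entries, so the actual sizes will differ for real data; it also ignores HTTP header overhead, which would only widen the gap.

```python
import json
import zlib

ROWS, COLS = 400, 400


def make_grid():
    """Deterministic 4-character entries standing in for the real data."""
    return [
        [f"{(r * 31 + c * 17) % 10000:04d}" for c in range(COLS)]
        for r in range(ROWS)
    ]


def encode(value):
    """Compact JSON encoding, as bytes."""
    return json.dumps(value, separators=(",", ":")).encode("utf-8")


def single_response_size(grid):
    """Compressed size when the whole grid is sent in one response."""
    return len(zlib.compress(encode(grid)))


def per_row_total_size(grid):
    """Total compressed size when each row is sent as its own response."""
    return sum(len(zlib.compress(encode(row))) for row in grid)


if __name__ == "__main__":
    grid = make_grid()
    print("uncompressed JSON bytes:", len(encode(grid)))
    print("one response, compressed:", single_response_size(grid))
    print("400 responses, compressed:", per_row_total_size(grid))
```

Each separately compressed row pays for its own stream header and cannot reuse patterns seen in earlier rows, which is why the single response tends to come out smaller.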
I'm evaluating possible solutions for handling a large quantity of queued messages, which must be delivered to workers at a certain date and time. The result of executing them is mostly updates to stored data, and they may or may not be originally triggered by user action.
For example, think of what you'd implement in a hypothetical large-scale StarCraft game server for storing and executing users' actions, like upgrading a building or hatching a soldier, all of which need to be applied to the game state several seconds or minutes after the player initiates them.
The problem is I can't seem to find the right term to name this problem area. There are several that look similar, but are different:
cron/task/job scheduler
The content of the queue is not dynamic, it's predefined.
Each task is scheduled.
message queue
The content of the queue is dynamic.
Each task is intended to be delivered immediately.
???
The content of the queue is dynamic.
Each task is scheduled.
If there are message queues that allow conditional delivery of messages, that might be it.
Summary:
What is this kind of technology called?
What are some of the solutions out there?
This just sounds like a trivial priority queue on the surface. The priority in this case is the time of completion, and you check the front of the queue to see when the next event is due. Pretty much every language comes with a priority queue or something that can easily be used as one, so I'm not sure what the actual problem is here.
Is it that you're worried about scalability, when it comes to millions of messages? Obviously 'millions' is a meaningless term - if that's millions per day, it's a trivial problem. If it's millions per second, then you can just scale horizontally, splitting the queue across multiple processes. (And the benefit of such a queue system is that this parallelization is really simple.)
I would bet that when implementing a large scale real-time strategy game server you would hit networking problems long before you start hitting problems with the message queue.
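For what it's worth, the priority-queue idea from this answer fits in a few lines. This is a minimal single-process sketch in Python using the standard `heapq` module; `ScheduledQueue` and the example messages are made up for illustration, and times are plain numbers so the caller decides what clock to use.

```python
import heapq
import itertools


class ScheduledQueue:
    """Priority queue ordered by due time (hypothetical helper)."""

    def __init__(self):
        self._heap = []
        # Tie-breaker so messages with equal due times keep insertion order
        # and the message objects themselves never get compared.
        self._counter = itertools.count()

    def schedule(self, due_at, message):
        heapq.heappush(self._heap, (due_at, next(self._counter), message))

    def next_due(self):
        """Due time of the earliest message, or None if the queue is empty."""
        return self._heap[0][0] if self._heap else None

    def pop_due(self, now):
        """Remove and return every message whose due time has passed."""
        due = []
        while self._heap and self._heap[0][0] <= now:
            due.append(heapq.heappop(self._heap)[2])
        return due
```

A worker loop would sleep until `next_due()`, call `pop_due(now)`, and apply the returned actions to the game state. Splitting across processes then amounts to one such queue per shard of players or game instances.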
Have you tried looking at push queues by Iron.io? The content of the queue can be anything you like, and you specify a webhook to which the messages will be pushed. You can also set a delay for each of the messages.
The webhook is static for each queue, though, and the delay isn't always exactly on time (it could be up to a minute off). If timing is more important, or you need to provide a different webhook per message, try looking at boomerang.io.
They say they are pretty accurate on the timing; you can provide a delay or a Unix timestamp for when the webhook should fire, and that is per message. It sounds like either of those might work for you.
For StarCraft, I would use the Red Dwarf server.
For a Java EE app, I would use Quartz Scheduler.
It seems to me that a queue-based solution would be best in this case for a number of reasons:
Management. Most queuing solutions provide support for inspecting the content of queues, which makes it easier to debug, easier to take action when certain thresholds are exceeded, ...
Performance. You can divide workload by having multiple enqueue/dequeue processes (gives you the ability to scale out).
Prioritizing. Most queues support prioritizing of messages (probably not all messages are equally important).
...
The remaining problem is that messages in a queue are delivered immediately. You have two ways to solve this: either delay the enqueuing of messages, or delay the execution of dequeued messages. I would go with the first approach, delayed enqueuing.
A message then has two properties: (content, delay). You provide the message to a component in your system that queues the message at the appropriate time.
I'm not sure what programming language you're using, but the MS .NET 4 framework has support for such a scenario (delayed execution of tasks).
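A minimal sketch of the delayed-enqueuing component, in Python rather than .NET and only as an illustration: `DelayedEnqueuer` is a made-up name, and it uses one `threading.Timer` per message, which is fine for a demo but would not scale to a very large number of pending messages.

```python
import queue
import threading


class DelayedEnqueuer:
    """Accepts (content, delay) and puts content on the target queue when due."""

    def __init__(self, target):
        self._target = target
        self._timers = []

    def submit(self, content, delay):
        """Schedule content to be enqueued after `delay` seconds."""
        timer = threading.Timer(delay, self._target.put, args=(content,))
        timer.daemon = True
        timer.start()
        self._timers.append(timer)

    def wait(self):
        """Block until every submitted message has been enqueued."""
        for timer in self._timers:
            timer.join()


if __name__ == "__main__":
    work = queue.Queue()
    enqueuer = DelayedEnqueuer(work)
    enqueuer.submit("upgrade finished", 0.2)
    enqueuer.submit("unit hatched", 0.05)
    enqueuer.wait()
    while not work.empty():
        print(work.get())
```

The consumers on the other side of the queue stay completely unaware of the delay; they just see ordinary messages arriving at the right time.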
I'm trying to replace a small homegrown messaging system, and am playing around a bit with zmq.
I'll need to detect slow readers and boot/disconnect them; a slow reader here pretty much means a particular consumer whose queue size is above a certain threshold.
So far it seems zmq blocks every consumer if one of them is a bit slow (fair enough), but I can't find any way to detect a potential slow consumer. Does anyone have experience with whether and how this is possible with zmq, or have another broker-less messaging system to recommend?
As of zeromq-2.0.7, you can set the ZMQ_HWM option on a ZMQ_PUB socket to control the maximum number of messages that can be queued for a subscriber. Once the high-water mark has been reached, all further messages destined for that subscriber will be dropped until the queue size drops back below the high-water mark. This limits the amount of memory dedicated to what you call a slow reader.
However, because the ZeroMQ library exposes sockets, not clients, there is no way for you to identify and forcibly disconnect unwanted clients without modifying the library itself.
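To illustrate the high-water-mark behaviour described above, here is a toy simulation in plain Python. It does not use ZeroMQ or its API at all: `Publisher`, `subscribe`, `publish`, `receive` and the `dropped` counter are invented for this sketch, and the counter exists only to make the dropping visible.

```python
from collections import deque


class Publisher:
    """Toy model of a publisher with a per-subscriber high-water mark."""

    def __init__(self, hwm):
        self._hwm = hwm
        self._queues = {}   # subscriber name -> pending messages
        self.dropped = {}   # subscriber name -> messages dropped so far

    def subscribe(self, name):
        self._queues[name] = deque()
        self.dropped[name] = 0

    def publish(self, message):
        for name, pending in self._queues.items():
            if len(pending) >= self._hwm:
                # Queue is at the high-water mark: drop for this subscriber only.
                self.dropped[name] += 1
            else:
                pending.append(message)

    def receive(self, name):
        """Next pending message for a subscriber, or None if there is none."""
        pending = self._queues[name]
        return pending.popleft() if pending else None
```

In the simulation a subscriber that never calls `receive` stops accumulating messages once it has `hwm` of them pending, while subscribers that keep up are unaffected.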
There is a section in the ZeroMQ Guide regarding this; it suggests implementing a pattern they call the "Suicidal Snail Pattern".
Basically, it reverses the dependency and tries to convince slow subscribers to disconnect/kill themselves by giving them a way to detect if they have become slow readers.
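The core of that idea can be sketched without any messaging library: the publisher stamps each message with its send time, and the subscriber compares the stamp with its own clock and gives up once it has fallen too far behind. The Python below is only an illustration under the assumption that publisher and subscriber clocks are comparable; `TooSlow`, `make_message`, `handle_message` and the `MAX_LAG` value are hypothetical names and numbers, not part of ZeroMQ.

```python
import time

MAX_LAG = 1.0  # seconds; hypothetical threshold for "too slow"


class TooSlow(Exception):
    """Raised by a subscriber that detects it has fallen too far behind."""


def make_message(body, clock=time.time):
    """Publisher side: attach the send time to every message."""
    return (clock(), body)


def handle_message(message, clock, process, max_lag=MAX_LAG):
    """Subscriber side: check how stale the message is before processing it."""
    sent_at, body = message
    lag = clock() - sent_at
    if lag > max_lag:
        # The subscriber removes itself instead of being kicked by the publisher.
        raise TooSlow(f"message is {lag:.2f}s old, limit is {max_lag:.2f}s")
    process(body)
```

A subscriber's receive loop would catch `TooSlow`, log it, and disconnect or exit, so that whatever supervises the process can notice and restart or replace it.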