I am trying to handle the Akka PayloadSizeExceeded exception. Since there seems to be no way to handle it, I would like to know the size of the message being passed. For this I would like to calculate the size of the message, and it needs to be the exact size of the cluster message being passed, not just the size of the JSON inside my request. Is there a way to know this size?
For example, my JSON is 31998 bytes, but when the message is passed between the actors some amount of encoding happens and the actual size of the message being passed grows to 32778 bytes. How can I know this final message size?
When you send a message between actor systems there is an envelope around the message payload, which contains the recipient actor ref, the sender actor ref, an identifier for the serializer to use, and a few other things. Several of those fields contain strings (actor path, actor system name, node hostname), which makes it hard to say exactly how large the overhead is; in your example the difference works out to about 780 bytes, but it will vary.
I'd recommend adding test coverage that sends messages remotely between the expected actors, and then being conservative when deciding how big a payload to accept, given that the hostnames will likely differ in production.
Note that if you need payloads many times larger than the default 128 kB limit, it is a good idea to consider splitting them into smaller messages instead, or to use some form of side channel to transfer the data. Large messages cause head-of-line blocking for other, smaller messages such as the remoting heartbeats, making remoting and the cluster less stable.
The new remoting subsystem (Artery), which is not yet stable/production-ready, has support for a separate channel for large messages, and additionally compresses actor refs so that actors that communicate remotely often are identified by an index into a cache rather than by a full serialized actor ref.
I'm developing a web app that needs to handle bursts of very high load.
Once per minute I get a burst of requests within a very few seconds (~1M-3M/sec), and then for the rest of the minute I get nothing.
What's my best strategy to handle as many req/sec as possible at each front server: just sending a reply and storing the request in memory somehow, to be processed in the background by a DB writer worker later?
The aim is to do as little as possible during the burst, and write the requests to the DB as soon as possible after the burst.
Edit: the order of transactions is not important.
We can lose some transactions, but 99% need to be recorded.
The latency of getting all requests to the DB can be a few seconds after the last request has been received. Let's say no more than 15 seconds.
This question is kind of vague. But I'll take a stab at it.
1) You need limits. A simple implementation will open millions of connections to the DB, which will obviously perform badly. At the very least, each connection eats megabytes of RAM on the DB. Even with connection pooling, each 'thread' could take a lot of RAM to record its (incoming) state.
If your app server has a limited number of processing threads, you can use HAProxy to "pick up the phone" and buffer the request in a queue for a few seconds until there is a free thread on your app server to handle it.
In fact, you could just use a web server like nginx to take the request and say "200 OK". Then, later, a simple app reads the web log and inserts the records into the DB. This will scale pretty well, although you probably want one thread reading the log and several threads inserting.
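A rough sketch of that log-reader approach in Go (the log path, the one-ID-per-line format, and insertRows are all assumptions standing in for your real log format and DB insert):
package main

import (
    "bufio"
    "log"
    "os"
    "sync"
)

// insertRows is a placeholder for your real batched DB insert.
func insertRows(ids []string) {
    log.Printf("inserting %d rows", len(ids))
}

func main() {
    f, err := os.Open("/var/log/nginx/ingest.log") // hypothetical log path
    if err != nil {
        log.Fatal(err)
    }
    defer f.Close()

    lines := make(chan string, 100000)
    var wg sync.WaitGroup

    // Several inserter goroutines, each batching rows before hitting the DB.
    for i := 0; i < 4; i++ {
        wg.Add(1)
        go func() {
            defer wg.Done()
            batch := make([]string, 0, 1000)
            for line := range lines {
                batch = append(batch, line)
                if len(batch) == cap(batch) {
                    insertRows(batch)
                    batch = batch[:0]
                }
            }
            if len(batch) > 0 {
                insertRows(batch) // flush the remainder
            }
        }()
    }

    // One reader feeding the queue.
    scanner := bufio.NewScanner(f)
    for scanner.Scan() {
        lines <- scanner.Text()
    }
    close(lines)
    wg.Wait()
}
It reads the log once rather than tailing it, but the shape is the same: one reader goroutine feeding a queue, several inserters draining it in batches.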
2) If your language has coroutines, it may still be better to handle the buffering yourself; you should measure the overhead of relying on your language runtime for scheduling.
For example, if each HTTP request is 1 KB of headers + data, you want to parse it and throw away everything but the one or two pieces of data you actually need (i.e. the DB ID). If you rely on your language's coroutines as an 'implicit' queue, the runtime will hold a 1 KB buffer for each coroutine while its request is being parsed. In some cases, it's more efficient/faster to have a finite number of workers and manage the queue explicitly. When you have a million things to do, small overheads add up quickly, and the language runtime won't always be optimized for your app.
Also, Go will give you far better control over your memory than Node.js. (Structs are much smaller than objects. The 'overhead' for the keys of your struct is a compile-time thing in Go, but a run-time thing in Node.js.)
3) How do you know it's working? You want to be able to know exactly how you are doing. When you rely on the language co-routines, it's not easy to ask "how many threads of execution do I have and what's the oldest one?" If you make an explicit queue, those questions are much easier to ask. (Imagine a handful of workers putting stuff in the queue, and a handful of workers pulling stuff out. There is a little uncertainty around the edges, but the queue in the middle very explicitly captures your backlog. You can easily calculate things like "drain rate" and "max memory usage" which are very important to knowing how overloaded you are.)
My advice: Go with Go. Long term, Go will be a much better choice. The Go runtime is a bit immature right now, but every release is getting better. Node.js is probably slightly ahead in a few areas (maturity, size of community, libraries, etc.)
How about a channel with a buffer size equal to what the DB writer can handle in 15 seconds? When the request comes in, it is sent on the channel. If the channel is full, give some sort of "System Overloaded" error response.
Then the DB writer reads from the channel and writes to the database.
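A minimal sketch of that idea in Go, assuming the writer can sustain roughly 50k inserts/sec and that the only field you need from each request is an id query parameter (the /ingest endpoint and insertBatch are likewise placeholders):
package main

import (
    "log"
    "net/http"
    "time"
)

// Assumed numbers: the DB writer can sustain ~50k inserts/sec,
// and the queue should absorb at most 15 seconds of backlog.
const writesPerSecond = 50000
const queueSize = writesPerSecond * 15

var queue = make(chan string, queueSize)

func handler(w http.ResponseWriter, r *http.Request) {
    id := r.URL.Query().Get("id") // keep only the field you actually need
    select {
    case queue <- id:
        w.WriteHeader(http.StatusOK)
    default:
        // Channel full: shed load instead of falling over.
        http.Error(w, "System Overloaded", http.StatusServiceUnavailable)
    }
}

// dbWriter drains the queue in batches; insertBatch is a placeholder.
func dbWriter() {
    batch := make([]string, 0, 1000)
    ticker := time.NewTicker(100 * time.Millisecond)
    for {
        select {
        case id := <-queue:
            batch = append(batch, id)
            if len(batch) == cap(batch) {
                insertBatch(batch)
                batch = batch[:0]
            }
        case <-ticker.C:
            if len(batch) > 0 {
                insertBatch(batch)
                batch = batch[:0]
            }
        }
    }
}

func insertBatch(ids []string) { /* real DB insert goes here */ }

func main() {
    go dbWriter()
    http.HandleFunc("/ingest", handler)
    log.Fatal(http.ListenAndServe(":8080", nil))
}
Because the queue is an explicit channel, len(queue) at any moment is your backlog, which also makes drain rate and overload easy to monitor.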
I am writing a monitoring program for a very high-traffic network (HD videos are streamed through the network). Most packets are very large, and I only want to watch the headers (IP and UDP/TCP only). Of course I want to avoid the overhead of copying the entire data. Does libpcap necessarily give me a copy of the whole packet? If yes, is there any library that matches my needs?
There appear to be two questions here:
the one in the title, which sounds as if it's asking whether libpcap copies the packet;
the one in the body, asking whether it always copies the entire packet.
For the first question:
There's probably at least one copy done by any code using the mechanisms atop which libpcap runs in various OSes - a copy from the mbufs/skbuff/STREAMS buffers/whatever to the mechanism's buffer. For Linux, when the tpacket mechanism is not being used, the skbuff might just be queued on the receive queue for the PF_PACKET socket libpcap is using.
There may be another copy - a copy from that buffer to userland; if libpcap is using a "zero-copy" mechanism, such as the Linux tpacket mechanism (which libpcap 1.0 and later use by default), the second copy doesn't happen. It will happen if a zero-copy mechanism isn't being used.
However, if you're using pcap_next() or pcap_next_ex() on a Linux system and the tpacket mechanism is being used, a separate copy is made, from the memory-mapped buffer to a private buffer; that copy doesn't happen if you use pcap_dispatch() or pcap_loop().
For the second question:
That's what the "snaplen" argument to pcap_open_live() and pcap_set_snaplen() is for - it lets you specify that no more than "snaplen" bytes of packet data should be captured, and that means that no more than that many bytes are copied.
Note that this length includes the link-layer headers, and that those can include "metadata" headers such as the radiotap headers you might get on 802.11 adapters. These headers might be variable-length (for example, on 802.11, the 802.11 header is variable-length, and, if you're getting radiotap headers, those are variable-length as well).
In addition, both IPv4 and TCP headers can have options, and IPv6 packets can have extension headers, so the length of IP and TCP headers can also be variable.
This means that you might have to determine a "worst case" snapshot length to use; there's no way to explicitly say "don't give me anything past the TCP/UDP header", you can only say "give me no more than N bytes".
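For instance, if you happen to drive libpcap from Go through the gopacket bindings (purely illustrative; the device name and the plain-Ethernet assumption are mine), a worst-case snapshot length for Ethernet plus maximal IPv4 and TCP headers might be computed like this:
package main

import (
    "fmt"
    "log"

    "github.com/google/gopacket/pcap"
)

func main() {
    // Worst-case headers: Ethernet (14) + optional VLAN tag (4)
    // + maximum IPv4 header (60) + maximum TCP header (60).
    const snaplen = 14 + 4 + 60 + 60 // 138 bytes

    handle, err := pcap.OpenLive("eth0", snaplen, true, pcap.BlockForever)
    if err != nil {
        log.Fatal(err)
    }
    defer handle.Close()

    for {
        data, ci, err := handle.ReadPacketData()
        if err != nil {
            log.Fatal(err)
        }
        // ci.Length is the original size on the wire;
        // len(data) is at most snaplen bytes of captured header data.
        fmt.Printf("wire=%d captured=%d\n", ci.Length, len(data))
    }
}
The same worst-case arithmetic applies if you stay with the C API and pass the value to pcap_open_live() or pcap_set_snaplen().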
I am writing a fairly simple pcap "live" capture engine; however, the packet-processing callback implementation passed to pcap_dispatch takes a relatively long time to process each packet.
Does pcap run every "pcap_handler" callback in a separate thread? If yes, is "pcap_handler" thread-safe, or should care be taken to protect it with critical sections?
Alternatively, does the pcap_dispatch callback work in a serial fashion? E.g. is "pcap_handler" for packet 2 called only after "pcap_handler" for packet 1 is done? If so, is there an approach to avoid accumulating latency?
Thanks,
-V
Pcap basically works like this: There is a kernel-mode driver capturing the packets and placing them in a buffer of size B. The user-mode application may request any number of packets at any time using pcap_loop, pcap_dispatch, or pcap_next (the latter is basically pcap_dispatch with one packet).
Therefore, when you use pcap_dispatch to request some packets, libpcap goes to the kernel and asks for the next packet in the buffer (if there isn't one, the timeout code kicks in, but that is irrelevant for this discussion), transfers it into userland, and deletes it from the buffer. After that, pcap_dispatch calls your handler, reduces its packets-to-do counter, and starts from the beginning. As a result, pcap_dispatch only returns when the requested number of packets has been processed, an error occurred, or a timeout happened.
As you can see, libpcap is completely non-threaded, as most C APIs are. The kernel-mode driver, however, is obviously happy enough to deliver packets to multiple threads (otherwise you wouldn't be able to capture from more than one process), and is completely thread-safe (there is one separate buffer for each user-mode handle).
This implies that you must implement all parallelism yourself. You'd want to do something like this:
pcap_dispatch(P, count, handler, data);
.
.
.
/* A self-contained copy of one captured packet, handed off to a worker. */
struct pcap_work_item {
    struct pcap_pkthdr header;
    u_char data[];   /* flexible array member: caplen bytes of packet data */
};
void handler(u_char *user, const struct pcap_pkthdr *header, const u_char *data)
{
    /* Allocate room for the header plus the captured bytes. */
    struct pcap_work_item *item = malloc(sizeof(*item) + header->caplen);
    if (item == NULL)
        return;
    item->header = *header;
    memcpy(item->data, data, header->caplen);
    queue_work_item(item);   /* hand the copy to a worker thread (defined by you) */
}
Note that we have to copy the packet into the heap, because the header and data pointers are invalid after the callback returns.
The function queue_work_item should find a worker thread and assign it the task of handling the packet. Since you said that your callback takes a 'relatively long time', you will likely need a large number of worker threads. Finding a suitable number of workers is a matter of fine-tuning.
At the beginning of this post I said that the kernel-mode driver has a buffer to collect incoming packets which await processing. The size of this buffer is implementation-defined. The snaplen parameter to pcap_open_live only controls how many bytes of one packet are captured; the number of buffered packets cannot be controlled in a portable fashion. The buffer might be fixed-size; it might grow as more and more packets arrive. However, if it overflows, all further packets are discarded until there is enough space for the next one to arrive. If you want to use your application in a high-traffic environment, you want to make sure that your pcap_dispatch callback completes quickly. My sample callback simply copies the packet and hands it to a worker, so it works fine even in high-traffic environments.
I hope this answers all your questions.
I want to create a fairly simple mathematical model that describes usage patterns and performance trade-offs in a system.
The system behaves as follows:
clients periodically issue multi-cast packets to a network of hosts
any host that receives the packet, responds with a unicast answer directly
the initiating host caches the responses for some given time period, then discards them
if the cache is full the next time a request is required, data is pulled from the cache not the network
packets are of a fixed size and always contain the same information
hosts are symmetric - any host can issue a request and respond to requests
I want to produce some simple mathematical models (and graphs) that describe the trade-offs available given some changes to the above system:
What happens when you vary the amount of time a host caches responses? How much data does this save? How many calls to the network do you avoid? (Clearly it depends on activity.)
Suppose responses are also multicast, and any host that overhears another client's request can cache all the responses it hears, thereby potentially saving itself a network request. How would this affect the overall state of the system?
Now, this one gets a bit more complicated: each request-response cycle alters the state of one other host in the network, so the more activity there is, the quicker caches become invalid. How do I model the trade-off between the number of hosts, the rate of activity, the "dirtiness" of the caches (assuming hosts listen in to others' responses), and how this changes with the cache validity period? Not sure where to begin.
I don't really know what sort of mathematical model I need, or how I construct it. Clearly it's easier to just vary two parameters, but particularly with the last one, I've got maybe four variables changing that I want to explore.
Help and advice appreciated.
Investigate tokenised Petri nets. These seem to be an appropriate tool as they:
provide a graphical representation of the models
provide substantial mathematical analysis
have a large body of prior work and underlying analysis
are (relatively) simple mathematical models
seem to be directly tied to your problem in that they deal with constraint dependent networks that pass tokens only under specified conditions
I found a number of references (quality not assessed) by a search on "token Petri net".
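For reference, one common formulation of the place/transition formalism: a Petri net is a tuple $N = (P, T, F, W, M_0)$ with places $P$, transitions $T$, arcs $F \subseteq (P \times T) \cup (T \times P)$, arc weights $W$, and an initial marking $M_0 : P \to \mathbb{N}$. A transition $t$ is enabled at a marking $M$ iff $M(p) \ge W(p, t)$ for every input place $p$ of $t$, and firing it yields the marking
$M'(p) = M(p) - W(p, t) + W(t, p)$.
One rough mapping onto your system (my suggestion, not from the references): tokens are cached responses, places are per-host cache slots, and transitions are the request/response and cache-expiry events, with the enabling conditions encoding "only respond if you hold the data" and "only hit the network on a cache miss".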
I need a 2D array (as JSON) to be sent from server to client. It would be around 400x400 in size with each entry around 4 characters of text, so that makes it around 640 KB of data.
Which of the following extreme approaches is better?
I make a large HTTP request of all the data at one go.
I make 400 requests - each asking for a single row (around 1.6 KB)
I believe the optimal approach would be somewhere in the middle. Could anyone give me an idea of what the optimal single request size for this data might be?
Thanks.
A couple of considerations for choosing one big request vs. several small ones:
In the single request case, you can't do progressive data processing as the data arrives; you need to wait for the full packet to arrive before you can do anything. If it fails, you need to start everything from scratch.
In the multiple requests case, you can do progressive data processing. However, you now have to consider the potential for multiple failures and how to recover from these.
Multiple requests incur overhead for each request. This is additional bandwidth your app will be consuming.
Some HTTP agents limit the number of concurrent requests to the same server, and you might need to do some logic to work around that.
Response compression will work better for the single request case.
Multiple requests won't require you to allocate the full memory for your data at once. Granted, 640 KB is not that big a chunk of memory, so that might not be a big consideration for you, depending on how often you allocate it.
If the process terminates early (either a Cancel button, the app being terminated, or the browser navigating away from your page), the single request will still finish downloading the full response; with multiple requests, however, any request your code hasn't started yet will not be executed.
Honestly, I wouldn't be that worried about the last two and would base my choice on 1) whether progressive data processing is important, and 2) what your app's tolerance is for failures and partial data.
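For illustration, the multi-request variant with progressive processing might look like this in Go (the /grid endpoint, its start/count parameters, and the 40-row chunk size are all made up):
package main

import (
    "encoding/json"
    "fmt"
    "log"
    "net/http"
)

func main() {
    const totalRows, chunk = 400, 40
    for start := 0; start < totalRows; start += chunk {
        // Hypothetical endpoint returning `chunk` rows of the 2D array as JSON.
        url := fmt.Sprintf("http://example.com/grid?start=%d&count=%d", start, chunk)
        resp, err := http.Get(url)
        if err != nil {
            log.Fatal(err) // a real client would retry just this chunk
        }
        var rows [][]string
        if err := json.NewDecoder(resp.Body).Decode(&rows); err != nil {
            log.Fatal(err)
        }
        resp.Body.Close()
        // Process each chunk as it arrives instead of waiting for the full 640 KB.
        fmt.Printf("got rows %d-%d\n", start, start+len(rows)-1)
    }
}
A failed chunk can be retried on its own instead of restarting the whole transfer, which is the main thing the multi-request variant buys you.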
Unless you are dealing with slow (very slow by today's standards) connections and really need incremental updates, do it in one request.
That gives you better efficiency for compressing the response, and avoids the overhead of the extra HTTP requests and response headers (with a few hundred bytes of headers per round trip, 400 requests could easily add on the order of 100-300 KB of overhead on top of the 640 KB of data).
And you have to keep in mind that the servers may have vulnerabilities with large requests.