Reason to set queue size of ROS publisher or subscriber to a large value - message-queue

When looking over the tutorials for the Robot Operating System (ROS), I found that most example code sets the publisher's queue size to a large value such as 1000. I think this costs the node its real-time responsiveness.
For what purpose do people set it to such a large value?

From ROS docs (http://wiki.ros.org/ROS/Tutorials/WritingPublisherSubscriber):
Message publisher (producer):
"The second parameter to advertise() is the size of the message queue
used for publishing messages. If messages are published more quickly
than we can send them, the number here specifies how many messages to
buffer up before throwing some away."
Message subscriber:
"The second parameter to the subscribe() function is the size of the message queue. If messages are arriving faster than they are being processed, this is the number of messages that will be buffered up before beginning to throw away the oldest ones."
Possible explanation:
Think of the consumer-producer problem.
You can't guarantee that you will consume messages at the rate they arrive. So you create a queue that is filled as messages come in from the sender (some sensor, for instance).
Bad case: if your program is delayed in some other part and you can't read messages at the rate they arrive, the queue grows.
Good case: as soon as your other processing load diminishes, you can read the queue faster and start to drain it. If you have available time, you will eventually reduce the queue size to zero.
So, as for your question: if you set the queue size to a large value, you can guarantee that you won't lose messages. In a simple example you have no memory constraints, so you can do anything you want, like using many gigabytes of RAM for a large queue to ensure it always works. And if you're writing a toy example to explain a concept, you don't want your program to crash for unrelated reasons.
A real-life example is a waiter and a kitchen for washing dishes.
Suppose the customers finish their meals and the waiter takes their dirty dishes to the kitchen. He puts them on a table. Whenever the dishwasher can, he goes to the table, picks up dishes, and washes them. In normal operation the table never fills up. But if someone gives the dishwasher another task, the table starts to fill. At some point the waiter can't place dishes anymore and has to leave tables dirty (a problem in the system). But if the table is artificially large (say, 1000 square units), the waiter will likely still manage his job even while the dishwasher is busy, since after some time the dishwasher will be able to return to cleaning dishes.
OK, long answer, but it may help in understanding queues.

Related

Understanding the max.inflight property of the Kafka producer

I'm benchmarking my Kafka cluster, version 1.0.0-cp1.
In the part of my benchmark that focuses on the maximum possible throughput with an ordering guarantee and no data loss (a topic with only one partition), do I need to set the max.in.flight.requests.per.connection property to 1?
I've read this article, and I understand that I only have to set max.in.flight to 1 if I enable the retry feature on my producer with the retries property.
Another way to ask my question: is one partition + retries=0 (producer props) sufficient to guarantee ordering in Kafka?
I need to know because increasing max.in.flight drastically increases throughput.
Your use case is slightly unclear. You mention ordering and no data loss but don't specify whether you tolerate duplicate messages, so it's unclear if you want At Least Once (QoS 1) or Exactly Once semantics.
Either way, as you're using 1.0.0 and only a single partition, you should have a look at the Idempotent Producer instead of tweaking the producer configs. It allows you to properly and efficiently guarantee ordering and no data loss.
From the documentation:
Idempotent delivery ensures that messages are delivered exactly once
to a particular topic partition during the lifetime of a single
producer.
The early Idempotent Producer was forcing max.in.flight.requests.per.connection to 1 (for the same reasons you mentioned) but in the latest releases it can now be used with max.in.flight.requests.per.connection set to up to 5 and still keep its guarantees.
Using the Idempotent Producer you'll not only get stronger delivery semantics (Exactly Once instead of At least Once) but it might even perform better!
I recommend you check the delivery semantics in the docs: http://kafka.apache.org/documentation/#semantics
Back to your question
Yes: without the idempotent (or transactional) producer, if you want to avoid data loss (QoS 1) and preserve ordering, you have to set max.in.flight.requests.per.connection to 1, enable retries, and use acks=all. As you saw, this comes at a significant performance cost.
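As a sketch of the two configurations discussed here, using the confluent-kafka Python client (broker address and retry count are placeholders):

    # Sketch with the confluent-kafka Python client; the broker address
    # and retry count are placeholders.
    from confluent_kafka import Producer

    # Without idempotence: ordering and no data loss, at a significant
    # throughput cost.
    strict_ordering = Producer({
        'bootstrap.servers': 'localhost:9092',
        'acks': 'all',
        'retries': 5,  # retries enabled; value illustrative
        'max.in.flight.requests.per.connection': 1,
    })

    # Idempotent Producer: keeps the same guarantees with up to 5
    # in-flight requests per connection.
    idempotent = Producer({
        'bootstrap.servers': 'localhost:9092',
        'enable.idempotence': True,  # implies acks=all and retries
    })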
Yes, you must set the max.in.flight.requests.per.connection property to 1.
The article you read contained an initial mistake (since corrected), where the author wrote:
max.in.flights.requests.per.session
which doesn't exist in the Kafka documentation.
This erratum probably comes from the book "Kafka: The Definitive Guide" (1st edition), where on page 52 you can read:
"...so if guaranteeing order is critical, we recommend setting in.flight.requests.per.session=1 to make sure that while a batch of messages is retrying, additional messages will not be sent..."
IMO, it is invaluable to also know about the following issue, which makes things far more interesting and slightly more complicated.
When you enable enable.idempotence=true, every time you send a batch to the broker you also send a sequence number, starting from zero. Brokers store that sequence number on their side too. Say the broker's last stored sequence number is 3. When the next batch arrives, the broker can look at what it has stored and say:
if the incoming sequence is 4 - good, it's a new batch of records
if it's 3 - it's a duplicate
if it's 5 (or higher) - it means messages were lost
And now max.in.flight.requests.per.connection: a producer can have up to this many concurrent requests outstanding without actually waiting for an answer from the broker. When we reach the limit (let's say max.in.flight.requests.per.connection=3), we start asking the broker for the previous results (and at the same time we can't send any new batches, even if they are ready).
Now, for the sake of the example, let's say the broker answers: "1 was OK, I stored it", "2 has failed", and now the important part: because 2 failed, the only possible answer for 3 is "out of order", which means it was not stored. The client now knows it needs to reprocess 2 and 3, builds a list, and resends them - in that exact order, if retries are enabled.
This explanation is probably oversimplified, but it is my basic understanding after reading the source code a bit.
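To make the sequence check concrete, here is a toy illustration in Python (this is not the actual broker code, just the logic described above):

    # Toy illustration of the broker-side sequence check; not actual
    # broker code.
    def check_sequence(last_stored: int, incoming: int) -> str:
        if incoming == last_stored + 1:
            return "ok, new batch"   # store it and advance the sequence
        if incoming <= last_stored:
            return "duplicate"       # already stored, don't store again
        return "out of sequence"     # gap: an earlier batch failed or was lost

    assert check_sequence(3, 4) == "ok, new batch"
    assert check_sequence(3, 3) == "duplicate"
    assert check_sequence(3, 5) == "out of sequence"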

How to notify once a batch in an SQS queue is done?

I have a batch of n messages in an SQS queue and a number of workers. These workers take messages from the queue, process them, and then delete them if they are successful. Once all the workers finish this batch of n messages, I want to perform an additional action. The only problem is figuring out when a batch is complete.
One way to do it is to check that the queue is empty. When I take a look at the SQS API, the only thing that seems close is the ApproximateNumberOfMessages attribute you get from GetQueueAttributes. However, the word "approximate" suggests that it really isn't intended for what I have in mind, and that its purpose is more for scaling up and down the number of workers based on roughly how many messages are in the queue.
What would be the standard way to achieve what I want? Or is SQS ill-fit for this purpose?
SQS doesn't really have any built-in mechanisms for grouping messages. Furthermore, SQS doesn't guarantee that a particular message won't be processed more than once[1], so you can't simply count the number of messages processed.
Instead, you'll probably need to track each message individually in an external datastore, and then after each message is processed, check to see if there are any remaining messages.
For instance:
As you enqueue each message in the group to the original queue, record the message ID in an external database along with a group number of your own invention.
After a worker processes a message, the worker should get the group number for that message from the database (or just include the group number as an attribute in the original message), and delete the message ID from the database (if it wasn't already deleted by another worker, which could happen if two workers got the same message from the queue). The worker should then enqueue a new message containing the group number into a second queue.
Another worker reads messages containing the group number from the second queue, and checks the database to see if any of the original messages for that group number remain. If there are any, this worker does nothing. If there are no more messages for the group, this worker performs your additional action. Be mindful that due to SQS' distributed nature, this final message could also be processed more than once, so the additional action should be idempotent (or at least somehow check to see if it has been performed already).
With this setup, you'll be able to run multiple unrelated batches through the system simultaneously.
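A sketch of the worker side of this scheme with boto3 (the queue URLs are placeholders, and the in-memory dict stands in for the real external datastore):

    # Sketch of the worker side with boto3; queue URLs are placeholders
    # and the in-memory dict stands in for a real external datastore.
    import boto3

    sqs = boto3.client('sqs')
    WORK_QUEUE = 'https://sqs.us-east-1.amazonaws.com/123456789012/work'
    CHECK_QUEUE = 'https://sqs.us-east-1.amazonaws.com/123456789012/group-check'

    groups = {}  # group_id -> set of pending message IDs (use a real DB)

    def work_loop():
        resp = sqs.receive_message(QueueUrl=WORK_QUEUE, MaxNumberOfMessages=1,
                                   MessageAttributeNames=['group_id'])
        for msg in resp.get('Messages', []):
            print('processing', msg['Body'])  # your actual work goes here
            group = msg['MessageAttributes']['group_id']['StringValue']
            groups.get(group, set()).discard(msg['MessageId'])
            sqs.delete_message(QueueUrl=WORK_QUEUE,
                               ReceiptHandle=msg['ReceiptHandle'])
            # Ask the checker to re-examine this group.
            sqs.send_message(QueueUrl=CHECK_QUEUE, MessageBody=group)

    def check_loop():
        resp = sqs.receive_message(QueueUrl=CHECK_QUEUE, MaxNumberOfMessages=1)
        for msg in resp.get('Messages', []):
            if not groups.get(msg['Body']):  # no pending IDs left
                print('group', msg['Body'], 'complete')  # idempotent action here
            sqs.delete_message(QueueUrl=CHECK_QUEUE,
                               ReceiptHandle=msg['ReceiptHandle'])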
You could consider adding a bit of code to your worker process(es) that starts a timer of some sort when it asks for a message and gets nothing back. If your workers ask for messages, process them, and then delete them, and, as you say, the 'batch' is just a collection of messages received around the same time, then presumably if 5 minutes (or some other user-defined period) go by and no new messages come back after repeated requests, you can kick off your 'after batch' process. This will be more accurate if you can scale your workers down to just one by the time you get to the end of the queue (so you can be sure other nodes are not still processing).
This is by no means perfect, and it will depend on the flow and timing of your messages and on how critical it is to define what belongs in a 'batch' and what does not.
Alternatively, if at the front end you know the precise number of messages that go into a batch, you can count down the number of processed messages and know you are done when you get to zero.
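If the batch size is known up front, that countdown can be an atomic decrement in any shared store. A sketch with Redis (the key scheme is made up):

    # Countdown sketch using Redis's atomic decrement; the key scheme is
    # made up for illustration.
    import redis

    r = redis.Redis()

    def start_batch(batch_id: str, n: int):
        # Set once by the front end that enqueues the n messages.
        r.set(f"batch:{batch_id}:remaining", n)

    def on_message_processed(batch_id: str):
        remaining = r.decr(f"batch:{batch_id}:remaining")  # atomic
        if remaining == 0:
            print(f"batch {batch_id} complete")  # kick off the 'after batch' step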

Can I achieve ordered processing with multiple consumers in Kafka?

In Kafka, I have a producer queuing up work of clients. Each piece of work has a client ID on it. Work of different clients can be processed out of order, but work of one client must be processed in order.
To do this, I intend to have (for example) 20 topics to achieve parallelism. The producer will queue up the work of a client ID into topic[client ID mod 20]. I then intend to have many consumers, each capable of processing the work of any client, but I still want the work processed in order. This means that the next piece of work in a topic can't begin to be processed before the previous piece has completed. In case of consumer failure it's OK to process work twice, but it means that the offset of that topic can't progress to the next piece of work.
Note: the number of messages per second is rather small (10s-100s messages).
To sum up:
'At least once' processing of every message (=work)
In-order processing of work for one topic
Multiple consumers for every topic to support consumer failure
Can this be done using Kafka?
Yes, you can do this with Kafka. But you shouldn't do it quite the way you've described. Kafka already supports semantic partitioning within a topic if you provide a key with each message. In this case you'd create a topic with 20 partitions, then make the key for each message the client ID. This guarantees all messages with the same key end up in the same partition, i.e. it will do the partitioning that you were going to do manually.
When consuming, use the high level consumer, which automatically balances partitions across available consumers. If you want to absolutely guarantee at least once processing, you should commit the offsets manually and make sure you have fully processed messages you have consumed before committing them. Beware that consumers joining or leaving the group will cause rebalancing of partitions across the instances, and you'll have to make sure you handle that correctly (e.g. if your processing is stateful, you'll have to make sure that state can be moved between consumers upon rebalancing).
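A sketch of both sides with the kafka-python client (topic name, broker address, and the handle() function are placeholders):

    # Sketch with the kafka-python client; topic name, broker address
    # and handle() are placeholders.
    from kafka import KafkaConsumer, KafkaProducer

    producer = KafkaProducer(bootstrap_servers='localhost:9092')
    # Keying by client ID makes Kafka hash all of one client's work to
    # the same partition, which preserves per-client ordering.
    producer.send('work', key=b'client-42', value=b'do something')
    producer.flush()

    def handle(key, value):
        print(key, value)  # stand-in for the real processing

    # Manual commits: the offset advances only after processing, giving
    # at-least-once semantics.
    consumer = KafkaConsumer('work',
                             bootstrap_servers='localhost:9092',
                             group_id='workers',
                             enable_auto_commit=False)
    for record in consumer:
        handle(record.key, record.value)
        consumer.commit()  # commit only after the work is done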

Is there, or would it be feasible to build, a service providing random elements from a given SQL table?

ABSTRACT
Talking with some colleagues, we came across the "extract a random row from a big database table" issue. It's a classic one, and we know the naive approach (also on SO) is usually something like:
SELECT * FROM mytable ORDER BY RAND() LIMIT 1
THE PROBLEM
We also know a query like that is utterly inefficient and really only usable with very few rows. There are some approaches that achieve better efficiency, like these, still on SO, but they won't work with arbitrary primary keys, and the randomness will be skewed as soon as you have holes in your numeric primary keys. An answer to the last cited question links to this article, which has a good explanation and some clever solutions involving an additional "equal distribution" table that must be maintained whenever the "master data" table changes. But then again, if you have frequent DELETEs on a big table, you'll probably be hurt by the constant updating of the added table. Also note that many solutions rely on COUNT(*), which is ridiculously fast on MyISAM but "just fast" on InnoDB (I don't know how it performs on other platforms, but I suspect the InnoDB case could be representative of other transactional database systems).
In addition to that, even the best solutions I was able to find are fast but not Ludicrous Speed fast.
THE IDEA
A separate service could be responsible to generate, buffer and distribute random row ids or even entire random rows:
it could choose the best method to extract random row IDs depending on how the original PKs are structured. An ordered list of keys could be maintained in RAM by the service (it shouldn't take too many bytes per row beyond the actual size of the PK; it's probably OK up to 100~1000M rows on standard PCs and up to 1~10 billion rows on a beefy server)
once the keys are in memory you have an implicit "row number" for each key, with no holes in it, so it's just a matter of choosing a random number and directly fetching the corresponding key (see the sketch after this list)
a buffer of random keys ready to be consumed could be maintained to quickly respond to spikes in the incoming requests
consumers of the service will connect and request N random rows from the buffer
rows are returned as simple keys or the service could maintain a (pool of) db connection(s) to fetch entire rows
if the buffer is empty the request could block or return an EOF-like response
if data is added to the master table the service must be signaled to add the same data to its copy too, flush the buffer of random picks and go on from that
if data is deleted from the master table the service must be signaled to remove that data too from both the "all keys" list and "random picks" buffer
if data is updated in the master table the service must be signaled to update corresponding rows in the key list and in the random picks
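A toy sketch of the core of such a service in Python (signaling, the random-picks buffer, and DB plumbing omitted; swap-with-last keeps deletions O(1)):

    # Toy core of the proposed service: an in-memory key list gives each
    # key an implicit, hole-free "row number". Signaling/DB plumbing and
    # the random-picks buffer are omitted.
    import random

    class RandomKeyService:
        def __init__(self, keys):
            self.keys = list(keys)  # ordered PKs loaded at startup
            self.index = {k: i for i, k in enumerate(self.keys)}

        def random_keys(self, n):
            # O(1) per pick: choose a random position, return that key.
            return [self.keys[random.randrange(len(self.keys))]
                    for _ in range(n)]

        def add(self, key):
            self.index[key] = len(self.keys)
            self.keys.append(key)

        def remove(self, key):
            # Swap the removed key with the last one to avoid holes.
            i = self.index.pop(key)
            last = self.keys.pop()
            if last != key:
                self.keys[i] = last
                self.index[last] = i

    svc = RandomKeyService(['a', 'b', 'c', 'd'])
    print(svc.random_keys(2))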
WHY WE THINK IT'S COOL
does not touch disks other than the initial load of keys at startup or when signaled to do so
works with any kind of primary key, numerical or not
if you know you're going to update a large batch of data you can just signal it when you're done (i.e. not at every single insert/update/delete on the original data), it's basically like having a fine grained lock that only blocks requests for random rows
really fast on updates of any kind in the original data
offloads some work from the relational db to another, memory only process: helps scalability
responds really fast from its buffers without waiting for any querying, scanning, sorting
could easily be extended to similar use cases beyond the SQL one
WHY WE THINK IT COULD BE A STUPID IDEA
because we had the idea without help from any third party
because nobody (that we've heard of) has ever bothered to do something similar
because it adds complexity in the mix to keep it updated whenever original data changes
AND THE QUESTION IS...
Does anything similar already exist? If not, would it be feasible? If not, why?
The biggest risk with your "cache of eligible primary keys" concept is keeping the cache up to date, when the origin data is changing continually. It could be just as costly to keep the cache in sync as it is to run the random queries against the original data.
How do you expect to signal the cache that a value has been added/deleted/updated? If you do it with triggers, keep in mind that a trigger can fire even if the transaction that spawned it is rolled back. This is a general problem with notifying external systems from triggers.
If you notify the cache from the application after the change has been committed in the database, then you have to worry about other apps that make changes without being fitted with the signaling code. Or ad hoc queries. Or queries from apps or tools for which you can't change the code.
In general, the added complexity is probably not worth it. Most apps can tolerate some compromise and they don't need an absolutely random selection all the time.
For example, the inequality lookup may be acceptable for some needs, even with the known weakness that numbers following gaps are chosen more often.
Or you could pre-select a small number of random values (e.g. 30) and cache them. Let app requests choose from these. Every 60 seconds or so, refresh the cache with another set of randomly chosen values.
Or choose a random value evenly distributed between MIN(id) and MAX(id). Try a lookup by equality, not inequality. If the value corresponds to a gap in the primary key, just loop and try again with a different random value. You can terminate the loop if it's not successful after a few tries. Then try another method instead. On average, the improved simplicity and speed of an equality lookup may make up for the occasional retries.
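A sketch of that last approach (shown with sqlite3 and a hypothetical mytable(id INTEGER PRIMARY KEY) schema):

    # Equality-lookup-with-retry sketch; sqlite3 and the mytable schema
    # are assumptions for illustration.
    import random
    import sqlite3

    def random_row(conn, tries=5):
        lo, hi = conn.execute("SELECT MIN(id), MAX(id) FROM mytable").fetchone()
        for _ in range(tries):
            candidate = random.randint(lo, hi)
            row = conn.execute("SELECT * FROM mytable WHERE id = ?",
                               (candidate,)).fetchone()
            if row is not None:
                return row          # hit an existing id
            # missed a gap in the ids; loop and try another value
        return None                 # give up and fall back to another method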
It appears you are basically addressing a performance issue here. Most DB performance experts recommend having as much RAM as your DB size; then disk is no longer a bottleneck - your DB lives in RAM and flushes to disk as required.
You're basically proposing a custom-developed, in-RAM CDC/hashing system.
You could just build this as a standard database-only application and lock your mapping table in RAM, if your DB supports that.
I guess I am saying that you can address performance issues without developing custom applications; just use existing performance-tuning methods.

Considerations for very large SQL tables?

I'm building, basically, an ad server. This is a personal project that I'm trying to impress my boss with, and I'd love any form of feedback about my design. I've already implemented most of what I describe below, but it's never too late to refactor :)
This is a service that delivers banner ads (http://myserver.com/banner.jpg links to http://myserver.com/clicked) and provides reporting on subsets of the data.
For every ad impression served and every click, I need to record a row that has (id, value), where value is the cash value of the transaction (e.g. -$.001 per served banner ad at $1 CPM, or +$.25 for a click). My output is all based on earnings per impression (abbreviated EPC): SUM(value)/COUNT(impressions), but on subsets of the data, like "earnings per impression where browser == 'Firefox'". The goal is to output something like "Your overall EPC is $.50, but where browser == 'Firefox', your EPC is $1.00", so that the end user can quickly see significant factors in their data.
Because there's a very large number of these subsets (tens of thousands), and reporting output only needs to include the summary data, I'm precomputing the EPC-per-subset with a background cron task, and storing these summary values in the database. Once in every 2-3 hits, a Hit needs to query the Hits table for other recent Hits by a Visitor (e.g. "find the REFERER of the last Hit"), but usually, each Hit only performs an INSERT, so to keep response times down, I've split the app across 3 servers [bgprocess, mysql, hitserver].
Right now, I've structured all of this as 3 normalized tables: Hits, Events and Visitors. Visitors are unique per visitor session, a Hit is recorded every time a Visitor loads a banner or makes a click, and Events map the distinct many-to-many relationship from Visitors to Hits (e.g. an example Event is "Visitor X at Banner Y", which is unique, but may have multiple Hits). The reason I'm keeping all the hit data in the same table is because, while my above example only describes "Banner impressions -> clickthroughs", we're also tracking "clickthroughs -> pixel fires", "pixel fires -> second clickthrough" and "second clickthrough -> sale page pixel".
My problem is that the Hits table is filling up quickly, and slowing down ~linearly with size. My test data only has a few thousand clicks, but already my background processing is slowing down. I can throw more servers at it, but before launching the alpha of this, I want to make sure my logic is sound.
So I'm asking you SO gurus: how would you structure this data? Am I crazy to try to precompute all these tables? Since we rarely need to access Hit records older than one hour, would I benefit from splitting the Hits table into ProcessedHits (with all historical data) and UnprocessedHits (with roughly the last hour's data), or does having the Hit.at date column indexed make this superfluous?
This probably needs some elaboration; sorry if I'm not clear, I've been working on it for the past ~3 weeks straight :) TIA for all input!
You should be able to build an app like this in a way that it won't slow down linearly with the number of hits.
From what you said, it sounds like you have two main potential performance bottlenecks. The first is inserts. If you can have your inserts happen at the end of the table, that will minimize fragmentation and maximize throughput. If they're in the middle of the table, performance will suffer as fragmentation increases.
The second area is the aggregations. Whenever you do a significant aggregation, be careful that you don't cause all in-memory buffers to get purged to make room for the incoming data. Try to minimize how often the aggregations have to be done, and be smart about how you group and count things, to minimize disk head movement (or maybe consider using SSDs).
You might also be able to do some of the accumulations at the web tier based entirely on the incoming data rather than on new queries, perhaps with a fallback of some kind if the server goes down before the collected data is written to the DB.
Are you using InnoDB or MyISAM?
Here are a few performance principles:
Minimize round-trips from the web tier to the DB
Minimize aggregation queries
Minimize on-disk fragmentation and maximize write speeds by inserting at the end of the table when possible
Optimize hardware configuration
Generally you have detailed "accumulator" tables where records are written in realtime. As you've discovered, they get large quickly. Your backend usually summarizes these raw records into cubes or other "buckets" from which you then write reports. Your cubes will probably define themselves once you map out what you're trying to report and/or bill for.
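For instance, the EPC-per-subset precomputation from the question could be a periodic rollup along these lines (sqlite3; the hits and epc_summary schemas are made up):

    # Hypothetical cron-style rollup from the raw hits table into a
    # summary table; all schema names are made up.
    import sqlite3

    def rollup(conn: sqlite3.Connection):
        conn.execute("""
            INSERT OR REPLACE INTO epc_summary (browser, epc)
            SELECT browser, SUM(value) / COUNT(*) AS epc
            FROM hits
            GROUP BY browser
        """)
        conn.commit()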
Don't forget fraud detection if this is a real project.