Messaging queue: parallelism and ordering - message-queue

I have one message queue containing orders (queued by users) and a bunch of workers taking messages from that queue and processing them.
I would like to ensure that orders are processed in parallel when the users are different, but sequentially when they are queued by the same user.
For example :
message_from_alex_1
message_from_david_1
message_from_alex_2
message_from_david_2
message_from_thomas_1
The processing order must guarantee that message_from_alex_1 is always processed before message_from_alex_2, and message_from_david_1 always before message_from_david_2.
Any hints?
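One common way to get "parallel across users, sequential per user" is to partition by key: hash the user id to pick a worker, and give each worker its own FIFO queue. The sketch below does this in-process with Python threads; `KeyPartitionedPool` and `handle` are hypothetical names. In a broker, the same idea corresponds to routing each user's messages to a fixed partition or consumer (for example, keyed partitions in Kafka, or Pulsar's Key_Shared subscription).

```python
import queue
import threading
import zlib
from collections import defaultdict


class KeyPartitionedPool:
    """Route every message to one of N workers chosen by hashing its key.

    All messages with the same key go to the same worker's FIFO queue, so
    they are handled in arrival order; different keys run in parallel.
    """

    def __init__(self, num_workers, handler):
        self.handler = handler
        self.queues = [queue.Queue() for _ in range(num_workers)]
        self.threads = [
            threading.Thread(target=self._run, args=(q,), daemon=True)
            for q in self.queues
        ]
        for t in self.threads:
            t.start()

    def _worker_index(self, key):
        # crc32 is stable across runs, unlike Python's randomized str hash()
        return zlib.crc32(key.encode("utf-8")) % len(self.queues)

    def submit(self, key, message):
        self.queues[self._worker_index(key)].put((key, message))

    def _run(self, q):
        while True:
            item = q.get()  # blocks until work arrives
            if item is None:  # shutdown sentinel
                return
            self.handler(*item)

    def close(self):
        for q in self.queues:
            q.put(None)
        for t in self.threads:
            t.join()


processed = defaultdict(list)
lock = threading.Lock()


def handle(user, message):
    with lock:
        processed[user].append(message)


pool = KeyPartitionedPool(num_workers=2, handler=handle)
for msg in ["message_from_alex_1", "message_from_david_1", "message_from_alex_2",
            "message_from_david_2", "message_from_thomas_1"]:
    user = msg.split("_")[2]  # "message_from_alex_1" -> "alex"
    pool.submit(user, msg)
pool.close()
print(processed["alex"])  # ['message_from_alex_1', 'message_from_alex_2']
```

The trade-off of this design is that a slow user can delay other users who happen to hash to the same worker; more workers make that less likely.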

Related

Group messages by key for pulsar key_shared subscription type

Say I have an unbounded set of keys for messages published to a pulsar topic.
{k0, k1, ..., kn}
And a finite set of expected message categories, where the category information is part of the message payload.
{c0, c1, c2}
Whenever all message categories for a given key are consumed I want to invoke an action in my application. For example, if I see the following key/category pairs, I would expect to see an action invoked.
{(k0, c0), (k0, c1), (k0, c2)} => action invoked for key k0
{(k1, c0), (k1, c1), (k1, c2)} => action invoked for key k1
In order to ensure application resiliency I only ack messages once all categories have been consumed. If a message pertaining to the same category is consumed twice I can ack the older message, holding on to one message per category.
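A minimal sketch of the buffering rule just described: keep at most one un-acked message per (key, category), ack the older copy when a category repeats, and run the action and ack everything once all expected categories are present. `CategoryGrouper`, `ack` and `on_complete` are hypothetical; with a real Pulsar consumer, `ack` would wrap the client's acknowledge call.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    key: str
    category: str
    msg_id: int


class CategoryGrouper:
    """Hold one un-acked message per (key, category) until a key is complete."""

    def __init__(self, expected_categories, ack, on_complete):
        self.expected = frozenset(expected_categories)
        self.ack = ack                  # hypothetical: acknowledges one message
        self.on_complete = on_complete  # hypothetical: the application action
        self.pending = {}               # key -> {category: Message}

    def on_message(self, msg):
        held = self.pending.setdefault(msg.key, {})
        older = held.get(msg.category)
        if older is not None:
            self.ack(older)  # same category seen again: release the older copy
        held[msg.category] = msg
        if self.expected.issubset(held):
            self.on_complete(msg.key)
            for m in held.values():
                self.ack(m)
            del self.pending[msg.key]


acked, completed = [], []
grouper = CategoryGrouper({"c0", "c1", "c2"}, acked.append, completed.append)
for m in [Message("k0", "c0", 1), Message("k0", "c1", 2),
          Message("k0", "c1", 3), Message("k0", "c2", 4)]:
    grouper.on_message(m)
print(completed, [m.msg_id for m in acked])  # ['k0'] [2, 1, 3, 4]
```

Note that the messages sitting in `pending` are exactly the delivered-but-unacked messages that the rest of the question is concerned with.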
Now, let's say I have a single consumer attached to the subscription and configured with the key_shared subscription type. We consume the following key/category pairs.
{(k0, c0), (k0, c1)}
And while waiting for (k0, c2) a second consumer is added to the subscription. According to this issue, the new consumer will not receive messages until the existing consumer acks or nacks the pending messages. This seems to be expected behaviour, and is indeed the behaviour I am seeing.
I am wondering if there is a more idiomatic way to implement this feature. Does it make sense to delay acking messages in order to achieve this grouping behaviour?
Using a partitioned topic with the failover subscription type achieves our design goal. Below is a description of the approaches we explored and the observed behaviour.
Non-partitioned topic with key_shared subscription
When the application is scaled out (more consumers added to the subscription), any pending messages (messages delivered to a consumer, but not yet acked) cause the new consumer to not receive any messages until the pending messages are acked/nacked or the pre-existing consumer unsubscribes.
Partitioned topic with failover subscription
When the application is scaled out, topic/partition pairs are reassigned evenly across consumers and pending messages (if any) are redelivered. Consumers need to be informed when ownership of a topic partition changes in order to clear their internal state; the consumer event listener can be used for this.
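With the failover approach, the in-memory grouping state has to be scoped by partition so that it can be dropped when the partition moves. A minimal sketch, assuming the consumer is told about ownership changes through some callback: in the Pulsar Java client, `ConsumerEventListener` offers `becameActive` and `becameInactive`; the Python class and method names below only mirror them and are hypothetical.

```python
from collections import defaultdict


class PartitionScopedState:
    """Pending {key: {category: payload}} state, kept separately per partition."""

    def __init__(self):
        self.by_partition = defaultdict(dict)

    def record(self, partition, key, category, payload):
        self.by_partition[partition].setdefault(key, {})[category] = payload

    def became_active(self, partition):
        # Start clean: per the behaviour above, pending messages are redelivered
        self.by_partition[partition] = {}

    def became_inactive(self, partition):
        # Another consumer now owns the partition and receives the redeliveries
        self.by_partition.pop(partition, None)

    def pending_keys(self, partition):
        return set(self.by_partition.get(partition, {}))


state = PartitionScopedState()
state.record(0, "k0", "c0", b"payload")
state.record(1, "k1", "c0", b"payload")
state.became_inactive(0)  # e.g. a second consumer joined and took partition 0
print(state.pending_keys(0), state.pending_keys(1))  # set() {'k1'}
```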

Looking for an example of a OBD-II complete data frame

I'm developing an OBD-II reader where I want to send requests to read PID parameters with an STM32 processor. I already understand what should go in the data field, but the ID is giving me a headache. As I have read, one must send 0x7DF to broadcast a request, and each ECU will respond with its own ID. However, I have been asked to do this within the SAE J1939 protocol, which uses the 29-bit extended identifier, and I don't know what I need to add to this ID.
As I stated in the title, could someone show me some actual data from a bus using this method? I've been searching on the internet for real frames but did not have any luck so far.
I would also appreciate it if someone could shed some light on whether OBD-II communication needs some acknowledgment to work properly.
Thanks
I would suggest taking a look at the SAE J1939 documentation, more specifically J1939/21, J1939/71 and J1939/73.
Generally, a J1939 transport protocol response sequence can be processed as follows:
Identify the BAM frame, which indicates that a new sequence is being initiated (via PGN 60416, i.e. 0xEC00, which can be reached with ID 0x1CECFF00).
Extract the J1939 PGN from bytes 6-8 of the BAM payload to use as the identifier of the new frame.
Construct the new data payload by concatenating bytes 2-8 of the data transfer frames (i.e. excluding the 1st byte).
J1939 data transfer messages use ID 0x1CEBFF00 (PGN 60160, i.e. 0xEB00).
For example, if the last 3 bytes of the BAM equal E3FE00, reordering them gives the PGN FEE3, aka Engine Configuration 1 (EC1). In that example, the payload is then found by combining the first 39 bytes across the 6 data transfer packets/frames.
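The reassembly steps above can be sketched as follows. This assumes the standard TP.CM_BAM layout (control byte 0x20, total size in bytes 2-3, packet count in byte 4, PGN in bytes 6-8, all little-endian) and that byte 1 of each TP.DT frame is its sequence number. The frame contents in the usage example are dummy bytes, not real bus data.

```python
BAM_CONTROL_BYTE = 0x20  # TP.CM control byte for a Broadcast Announce Message


def parse_bam(payload: bytes):
    """Return (pgn, total_size, num_packets) from an 8-byte TP.CM_BAM payload."""
    if len(payload) != 8 or payload[0] != BAM_CONTROL_BYTE:
        raise ValueError("not a BAM frame")
    total_size = payload[1] | (payload[2] << 8)                  # bytes 2-3
    num_packets = payload[3]                                     # byte 4
    pgn = payload[5] | (payload[6] << 8) | (payload[7] << 16)    # bytes 6-8
    return pgn, total_size, num_packets


def reassemble(bam: bytes, data_frames):
    """Concatenate bytes 2-8 of each TP.DT frame and trim to the announced size."""
    pgn, total_size, num_packets = parse_bam(bam)
    frames = sorted(data_frames, key=lambda f: f[0])  # byte 1 = sequence number
    if len(frames) != num_packets:
        raise ValueError("missing or extra data transfer frames")
    payload = b"".join(bytes(f[1:8]) for f in frames)
    return pgn, payload[:total_size]


bam = bytes([0x20, 39, 0x00, 6, 0xFF, 0xE3, 0xFE, 0x00])
frames = [bytes([seq]) + bytes(range(seq * 10, seq * 10 + 7)) for seq in range(1, 7)]
pgn, data = reassemble(bam, frames)
print(hex(pgn), len(data))  # 0xfee3 39
```

Six frames carry 6 × 7 = 42 bytes, so the last 3 bytes of the final frame are padding and are dropped by the size trim.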
The administrative control device, or any device issuing the vehicle use status PID, should be sensitive to the run switch status (SPN 3046, 0xFDC0, which can probably be reached with 0xCFDC000) and any other locally defined criteria for authorized use (i.e., driver log-ons) before the vehicle use status PID is used to generate an unauthorized use alarm.
Also, don't forget to read/send the messages with the extended ID, since J1939 uses the 29-bit identifier.
In fact, I would suggest using can-utils to make your analysis even easier. With a simple candump or cansniffer you can see what is coming over your bus.
Some cars' DBC files: https://github.com/commaai/opendbc
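For the 29-bit identifier itself, J1939 packs the priority (3 bits), the data page bits, the PDU format (PF), the PDU specific (PS) byte and the source address into the ID. A minimal sketch (`j1939_can_id` is a hypothetical helper) that reproduces the IDs quoted in this answer, assuming source address 0x00 as in those examples:

```python
def j1939_can_id(priority: int, pgn: int, source_address: int,
                 destination: int = 0xFF) -> int:
    """Build a 29-bit extended CAN ID: priority | EDP+DP | PF | PS | SA."""
    if not 0 <= priority <= 7:
        raise ValueError("priority must be 0-7")
    dp = (pgn >> 16) & 0x03       # extended data page + data page bits
    pf = (pgn >> 8) & 0xFF        # PDU format
    if pf < 240:
        ps = destination & 0xFF   # PDU1: PS carries the destination (0xFF = global)
    else:
        ps = pgn & 0xFF           # PDU2: PS is the group extension, always broadcast
    return (priority << 26) | (dp << 24) | (pf << 16) | (ps << 8) | (source_address & 0xFF)


print(hex(j1939_can_id(7, 0xEC00, 0x00)))  # 0x1cecff00 (TP.CM, BAM)
print(hex(j1939_can_id(7, 0xEB00, 0x00)))  # 0x1cebff00 (TP.DT)
print(hex(j1939_can_id(3, 0xFDC0, 0x00)))  # 0xcfdc000
```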

Duplicates on Apache Beam / Dataflow inputs even when using withIdAttribute

I am trying to ingest data from a 3rd party API into a Dataflow pipeline. Since the 3rd party doesn't make webhooks available, I wrote a custom script that constantly polls their endpoint for more data.
The data is refreshed every 15 minutes, but since I don't want to miss any datapoints and I want to consume new data as soon as it is available, my "crawler" runs every minute. The script then sends the data to a PubSub topic. It's easy to see that PubSub will receive about 15 repeated messages for each datapoint in the source.
My first attempt to identify and discard those repeated messages was to add a custom attribute (eventid) to each PubSub message, built from its [ID + updated_time] at source.
const attributes = {
  eventid: Buffer.from(`${item.lastupdate}|${item.segmentid}`).toString('base64'),
  timestamp: item.timestamp.toString()
};
const dataBuffer = Buffer.from(JSON.stringify(item))
publisher.publish(dataBuffer, attributes)
Then I configured Dataflow with a withIdAttribute() (which is the new idLabel(), based on Record IDs).
PCollection<String> input = p
    .apply("ReadFromPubSub", PubsubIO
        .readStrings()
        .fromTopic(String.format("projects/%s/topics/%s", options.getProject(), options.getIncomingDataTopic()))
        .withTimestampAttribute("timestamp")
        .withIdAttribute("eventid"))
    .apply("OutputToBigQuery", ...);
With that implementation, I was expecting that when the script sends the same datapoint a second time, the repeated eventid would be the same and the message discarded. But for some reason, I still see duplicates on the output dataset.
Some questions:
Is there a clever way to ingest the data to dataflow from that 3rd party API if they don't provide webhooks?
Any ideas on why dataflow is not discarding the messages on this situation?
I know about the 10-minute restriction for deduplication on dataflow, but I see duplicated data even on the 2nd insertion (2 minutes).
Any help will be greatly appreciated!
I think you are on the right track, but instead of the hash I recommend using timestamps. A better way to do this is by using windows. Review this document, which filters out data that falls outside the window.
Regarding the additional duplicate data: if you are using pull subscriptions and the acknowledgement deadline is reached before the data is processed, the message will be resent, as per at-least-once delivery. In this case, increase the acknowledgement deadline; the default is 10 seconds.
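Id-based deduplication only helps within a bounded time window, which matters here because the crawler republishes the same datapoint for about 15 minutes. A minimal sketch of a window-bounded deduplicator (hypothetical class, not Dataflow's implementation; the 600-second window only mirrors the 10-minute figure mentioned in the question, and the event id is made up):

```python
import time


class WindowedDeduplicator:
    """Drop an event id if the same id was already accepted within the window."""

    def __init__(self, window_seconds, clock=time.monotonic):
        self.window = window_seconds
        self.clock = clock
        self.first_seen = {}  # event id -> time it was first accepted

    def accept(self, event_id):
        now = self.clock()
        # Forget ids whose window has expired (O(n) per call; fine for a sketch)
        self.first_seen = {
            k: t for k, t in self.first_seen.items() if now - t < self.window
        }
        if event_id in self.first_seen:
            return False
        self.first_seen[event_id] = now
        return True


fake_now = [0.0]
dedup = WindowedDeduplicator(600, clock=lambda: fake_now[0])
results = []
for minute in (0, 2, 11):  # the same datapoint republished by the crawler
    fake_now[0] = minute * 60.0
    results.append(dedup.accept("segment-7|2020-01-01T10:00"))  # hypothetical id
print(results)  # [True, False, True]
```

The copy at minute 11 passes again because the first one has aged out of the window, so any copy that arrives after the window expires is treated as new.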

RabbitMQ: Are multiple consumers on one queue using a non-polling strategy possible?

We use RabbitMQ to send jobs from a producer on one machine to a small group of consumers distributed across several machines.
The producer generates jobs and places them on the queue, and the consumers check the queue every 10ms to see if there are any unclaimed jobs and fetch a job at a time if a job is available. If one particular worker takes too long to process a job (GC pauses or other transient issue), other consumers are free to remove jobs from the queue so that overall job throughput stays high.
When we originally set up this system, we were unable to figure out how to set up a subscriber relationship for more than one consumer on the queue that would prevent us from having to poll and introduce that little extra bit of latency.
Inspecting the documentation has not yielded satisfying answers. We are new to using message queues and it is possible that we don't know the words that accurately describe the above scenario. This is something like a blackboard system, but in this case the "specialists" are all identical and never consume each other's results -- results are always reported back to the job producer.
Any ideas?
Getting publish/subscribe working is straightforward; I initially had the same problems, but it works well. The project now has some great help pages at http://www.rabbitmq.com/getstarted.html
RabbitMQ has timeouts and a redelivered flag, which can be used as you see fit.
You can also make the workers event-driven, as opposed to checking every 10ms. If you need help with this, I have a small project at http://rabbitears.codeplex.com/ which might help slightly.
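In RabbitMQ itself, several consumers can each call basic.consume on the same queue; the broker pushes deliveries to them round-robin, and basic.qos with a prefetch count of 1 keeps a busy consumer from being handed more than one unacknowledged job. The same competing-consumers shape can be sketched in-process with Python's `queue.Queue` standing in for the broker (an analogy, not the RabbitMQ client API; `run_competing_consumers` is a hypothetical name): each worker blocks on `get()` instead of polling every 10ms.

```python
import queue
import threading


def run_competing_consumers(jobs, num_workers, process):
    """Several workers block on one shared queue; no polling interval is needed."""
    q = queue.Queue()
    results = queue.Queue()

    def worker():
        while True:
            job = q.get()  # blocks until a job is pushed, instead of sleep-and-check
            if job is None:  # stop sentinel
                break
            results.put(process(job))

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for job in jobs:
        q.put(job)
    for _ in threads:
        q.put(None)  # one stop sentinel per worker
    for t in threads:
        t.join()
    return [results.get() for _ in range(results.qsize())]


squares = run_competing_consumers(range(20), num_workers=4, process=lambda n: n * n)
print(sorted(squares) == [n * n for n in range(20)])  # True
```

Whichever worker is idle picks up the next job, so a worker stuck in a GC pause simply stops taking jobs while the others continue.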
Keep in mind that a RabbitMQ channel is not thread-safe, so create a singleton object that handles all the RabbitMQ operations, like the following (the code sample is in Scala):
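import com.rabbitmq.client.{Channel, Connection, ConnectionFactory, MessageProperties, QueueingConsumer}

object QueueManager {
  // Placeholder settings: replace with your own broker details and names
  val RABBITMQ_USERNAME = "guest"
  val RABBITMQ_PASSWORD = "guest"
  val RABBITMQ_VIRTUALHOST = "/"
  val RABBITMQ_PORT = 5672
  val RABBITMQ_HOST = "localhost"
  val EXCHANGE_NAME = "jobs"
  val QUEUE1 = "queue1"
  val QUEUE2 = "queue2"
  val QUEUE1_ROUTING_KEY = "queue1.key"
  val QUEUE2_ROUTING_KEY = "queue2.key"
  val durable = true

  val FACTORY = new ConnectionFactory
  FACTORY.setUsername(RABBITMQ_USERNAME)
  FACTORY.setPassword(RABBITMQ_PASSWORD)
  FACTORY.setVirtualHost(RABBITMQ_VIRTUALHOST)
  FACTORY.setPort(RABBITMQ_PORT)
  FACTORY.setHost(RABBITMQ_HOST)

  val conn: Connection = FACTORY.newConnection
  // One channel owned by this object; do not use it from several threads at once
  val channel: Channel = conn.createChannel

  // Declare the exchange once, then the queue and consumer for queue1
  channel.exchangeDeclare(EXCHANGE_NAME, "direct", durable)
  channel.queueDeclare(QUEUE1, durable, false, false, null)
  channel.queueBind(QUEUE1, EXCHANGE_NAME, QUEUE1_ROUTING_KEY)
  // QueueingConsumer comes from older amqp-client versions and is deprecated in newer ones
  val queue1Consumer = new QueueingConsumer(channel)
  // autoAck = false: each delivery must be acknowledged with channel.basicAck
  channel.basicConsume(QUEUE1, false, queue1Consumer)

  // Same for queue2; each queue needs its own distinct routing key
  channel.queueDeclare(QUEUE2, durable, false, false, null)
  channel.queueBind(QUEUE2, EXCHANGE_NAME, QUEUE2_ROUTING_KEY)
  val queue2Consumer = new QueueingConsumer(channel)
  channel.basicConsume(QUEUE2, false, queue2Consumer)

  def addToQueueOne(obj: String): Unit =
    channel.basicPublish(EXCHANGE_NAME, QUEUE1_ROUTING_KEY, MessageProperties.PERSISTENT_TEXT_PLAIN, obj.getBytes)

  def addToQueueTwo(obj: String): Unit =
    channel.basicPublish(EXCHANGE_NAME, QUEUE2_ROUTING_KEY, MessageProperties.PERSISTENT_TEXT_PLAIN, obj.getBytes)

  // Each call blocks until the next message for that queue arrives
  def getFromQueue1: QueueingConsumer.Delivery = queue1Consumer.nextDelivery

  def getFromQueue2: QueueingConsumer.Delivery = queue2Consumer.nextDelivery
}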
I have written a code sample for 2 queues; you can add more queues the same way.

What is an idempotent operation?

What is an idempotent operation?
In computing, an idempotent operation is one that has no additional effect if it is called more than once with the same input parameters. For example, removing an item from a set can be considered an idempotent operation on the set.
In mathematics, an idempotent operation is one where f(f(x)) = f(x). For example, the abs() function is idempotent because abs(abs(x)) = abs(x) for all x.
These slightly different definitions can be reconciled by considering that x in the mathematical definition represents the state of an object, and f is an operation that may mutate that object. For example, consider the Python set and its discard method. The discard method removes an element from a set, and does nothing if the element does not exist. So:
my_set.discard(x)
has exactly the same effect as doing the same operation twice:
my_set.discard(x)
my_set.discard(x)
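The `discard` example can be checked directly: calling it once or twice leaves the same set. For contrast, `set.remove` raises `KeyError` on the second call, so even though the final set would be the same, blindly retrying it is not safe. (`discard_n_times` is just a helper for the demonstration.)

```python
def discard_n_times(items, x, n):
    s = set(items)
    for _ in range(n):
        s.discard(x)  # removes x if present; otherwise does nothing
    return s


print(discard_n_times({1, 2, 3}, 2, 1) == discard_n_times({1, 2, 3}, 2, 2))  # True

s = {1, 2, 3}
s.remove(2)
try:
    s.remove(2)  # second call fails because 2 is already gone
except KeyError:
    print("second remove raised KeyError")
```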
Idempotent operations are often used in the design of network protocols, where a request to perform an operation is guaranteed to happen at least once, but might also happen more than once. If the operation is idempotent, then there is no harm in performing the operation two or more times.
See the Wikipedia article on idempotence for more information.
An idempotent operation can be repeated an arbitrary number of times and the result will be the same as if it had been done only once. In arithmetic, adding zero to a number is idempotent.
Idempotence is talked about a lot in the context of "RESTful" web services. REST seeks to maximally leverage HTTP to give programs access to web content, and is usually set in contrast to SOAP-based web services, which just tunnel remote procedure call style services inside HTTP requests and responses.
REST organizes a web application into "resources" (like a Twitter user, or a Flickr image) and then uses the HTTP verbs of POST, PUT, GET, and DELETE to create, update, read, and delete those resources.
Idempotence plays an important role in REST. If you GET a representation of a REST resource (e.g., GET a JPEG image from Flickr), and the operation fails, you can just repeat the GET again and again until the operation succeeds. To the web service, it doesn't matter how many times the image is gotten. Likewise, if you use a RESTful web service to update your Twitter account information, you can PUT the new information as many times as it takes in order to get confirmation from the web service. PUT-ing it a thousand times is the same as PUT-ing it once. Similarly, DELETE-ing a REST resource a thousand times is the same as deleting it once. Idempotence thus makes it a lot easier to construct a web service that's resilient to communication errors.
Further reading: RESTful Web Services, by Richardson and Ruby (idempotence is discussed on pages 103-104), and Roy Fielding's PhD dissertation on REST. Fielding was one of the authors of HTTP/1.1 (RFC 2616), which discusses idempotence in section 9.1.2.
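A toy in-memory store (hypothetical, not tied to any web framework) makes the PUT/POST/DELETE distinction concrete: PUT writes a full representation at a client-chosen id, so repeating it changes nothing further, while POST lets the server assign a new id on every call.

```python
import itertools


class ResourceStore:
    def __init__(self):
        self.resources = {}
        self._ids = itertools.count(1)

    def put(self, resource_id, representation):
        # Replace the whole resource at a client-chosen id: repeats change nothing more
        self.resources[resource_id] = dict(representation)
        return resource_id

    def post(self, representation):
        # Server assigns a fresh id each time: every retry creates another resource
        new_id = next(self._ids)
        self.resources[new_id] = dict(representation)
        return new_id

    def delete(self, resource_id):
        # Deleting an absent resource is a no-op, so repeats leave the same state
        self.resources.pop(resource_id, None)


store = ResourceStore()
for _ in range(3):
    store.put("profile", {"bio": "hi"})
print(len(store.resources))  # 1
for _ in range(3):
    store.post({"text": "hello"})
print(len(store.resources))  # 4
```

A real server would typically answer a repeated DELETE with a different status code (for example 404), but the resulting server state is the same, which is what idempotence is about.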
No matter how many times you call the operation, the result will be the same.
Idempotence means that applying an operation once or applying it multiple times has the same effect.
Examples:
Multiplication by zero. No matter how many times you do it, the result is still zero.
Setting a boolean flag. No matter how many times you do it, the flag stays set.
Deleting a row from a database with a given ID. If you try it again, the row is still gone.
For pure functions (functions with no side effects), idempotency implies that f(x) = f(f(x)) = f(f(f(x))) = f(f(f(f(x)))) = ... for all values of x.
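For the pure-function case, it is enough to check f(f(x)) = f(x), because then every further application also returns f(x). A quick sample-based check (`looks_idempotent` is a hypothetical helper, and this is a test on chosen inputs, not a proof):

```python
def looks_idempotent(f, samples):
    """Check f(f(x)) == f(x) on the given sample inputs."""
    return all(f(f(x)) == f(x) for x in samples)


samples = range(-5, 6)
print(looks_idempotent(abs, samples))              # True
print(looks_idempotent(lambda x: 0 * x, samples))  # True: multiplication by zero
print(looks_idempotent(lambda x: x * x, samples))  # False: 2 -> 4 -> 16
```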
For functions with side effects, idempotency furthermore implies that no additional side effects will be caused after the first application. You can consider the state of the world to be an additional "hidden" parameter to the function if you like.
Note that in a world where you have concurrent actions going on, you may find that operations you thought were idempotent cease to be so (for example, another thread could unset the value of the boolean flag in the example above). Basically whenever you have concurrency and mutable state, you need to think much more carefully about idempotency.
Idempotency is often a useful property in building robust systems. For example, if there is a risk that you may receive a duplicate message from a third party, it is helpful to have the message handler act as an idempotent operation so that the message effect only happens once.
A good example for understanding an idempotent operation is locking a car with a remote key.
log(Car.state) // unlocked
Remote.lock();
log(Car.state) // locked
Remote.lock();
Remote.lock();
Remote.lock();
log(Car.state) // locked
lock is an idempotent operation. Even if each call has some side effect, like blinking, the car stays in the same locked state no matter how many times you run the lock operation.
An idempotent operation leaves the system in the same state even if you call it more than once, provided you pass in the same parameters.
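The remote-lock example separates the state that stays fixed from the side effects that repeat. A minimal sketch with a hypothetical `Car` class:

```python
class Car:
    def __init__(self):
        self.locked = False
        self.blinks = 0  # a side effect that happens on every call

    def lock(self):
        self.locked = True  # same resulting state however often this runs
        self.blinks += 1


car = Car()
car.lock()
state_after_one = car.locked
for _ in range(3):
    car.lock()
print(state_after_one, car.locked, car.blinks)  # True True 4
```

Whether the blink counter "breaks" idempotence depends on whether you count it as part of the state; usually, like logging, it is ignored.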
An idempotent operation is an operation, action, or request that can be applied multiple times without changing the result, i.e. the state of the system, beyond the initial application.
EXAMPLES (WEB APP CONTEXT):
IDEMPOTENT:
Making multiple identical requests has the same effect as making a single request. A message in an email messaging system is opened and marked as "opened" in the database. One can open the message many times but this repeated action will only ever result in that message being in the "opened" state. This is an idempotent operation. The first time one PUTs an update to a resource using information that does not match the resource (the state of the system), the state of the system will change as the resource is updated. If one PUTs the same update to a resource repeatedly then the information in the update will match the information already in the system upon every PUT, and no change to the state of the system will occur. Repeated PUTs with the same information are idempotent: the first PUT may change the state of the system, subsequent PUTs should not.
NON-IDEMPOTENT:
If an operation always causes a change in state, like POSTing the same message to a user over and over, resulting in a new message sent and stored in the database every time, we say that the operation is NON-IDEMPOTENT.
NULLIPOTENT:
If an operation has no side effects, like purely displaying information on a web page without any change in a database (in other words you are only reading the database), we say the operation is NULLIPOTENT. All GETs should be nullipotent.
When talking about the state of the system we are obviously ignoring hopefully harmless and inevitable effects like logging and diagnostics.
Just wanted to throw out a real use case that demonstrates idempotence. In JavaScript, say you are defining a bunch of model classes (as in MVC model). The way this is often implemented is functionally equivalent to something like this (basic example):
function model(name) {
function Model() {
this.name = name;
}
return Model;
}
You could then define new classes like this:
var User = model('user');
var Article = model('article');
But if you were to try to get the User class via model('user'), from somewhere else in the code, it would fail:
var User = model('user');
// ... then somewhere else in the code (in a different scope)
var User = model('user');
Those two User constructors would be different. That is,
model('user') !== model('user');
To make it idempotent, you would just add some sort of caching mechanism, like this:
var collection = {};
function model(name) {
if (collection[name])
return collection[name];
function Model() {
this.name = name;
}
collection[name] = Model;
return Model;
}
By adding caching, every call to model('user') returns the same object, so it's idempotent. So:
model('user') === model('user');
Quite detailed and technical answers here. Just adding a simple definition.
Idempotent = Re-runnable
For example,
A Create operation by itself is not guaranteed to run without error if executed more than once.
But an operation like CreateOrUpdate is re-runnable (idempotent).
Idempotent Operations: Operations that have no additional side effects if executed multiple times.
Example: An operation that retrieves values from a data resource and, say, prints them.
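The Create vs CreateOrUpdate distinction, sketched with a plain dict as a hypothetical store:

```python
class AlreadyExists(Exception):
    pass


def create(store, key, value):
    if key in store:
        raise AlreadyExists(key)  # re-running fails once the record exists
    store[key] = value


def create_or_update(store, key, value):
    store[key] = value  # re-running with the same input leaves the same state


store = {}
for _ in range(3):
    create_or_update(store, "user:1", {"name": "Ada"})
print(store)  # {'user:1': {'name': 'Ada'}}

create(store, "user:2", {"name": "Bob"})
try:
    create(store, "user:2", {"name": "Bob"})
except AlreadyExists as exc:
    print("second create failed for", exc)
```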
Non-Idempotent Operations: Operations that would cause some harm if executed multiple times. (As they change some values or states)
Example: An operation that withdraws from a bank account
It is any operation whose nth result matches the 1st result. For instance, the absolute value of -1 is 1. The absolute value of the absolute value of -1 is 1. The absolute value of the absolute value of the absolute value of -1 is 1. And so on. See also: When would be a really silly time to use recursion?
An idempotent operation over a set leaves its members unchanged when applied one or more times.
It can be a unary operation like absolute(x) where x belongs to a set of positive integers. Here absolute(absolute(x)) = x.
It can be a binary operation like union: the union of a set with itself always returns the same set.
cheers
In short, an idempotent operation will not produce different results no matter how many times you perform it.
For example, according to the HTTP specification, GET, HEAD, PUT, and DELETE are idempotent operations; however, POST and PATCH are not. That's why PUT is sometimes used in place of POST.
An operation is said to be idempotent if executing it multiple times is equivalent to executing it once.
For example: setting the volume to 20.
No matter how many times the volume of a TV is set to 20, the end result will be that the volume is 20. Even if a process executes the operation 50/100 times or more, at the end of the process the volume will be 20.
Counter example: increasing the volume by 1. If a process executes this operation 50 times, at the end volume will be initial Volume + 50 and if a process executes the operation 100 times, at the end volume will be initial Volume + 100. As you can clearly see that the end result varies based upon how many times the operation was executed. Hence, we can conclude that this operation is NOT idempotent.
If you think in terms of programming, let's say that I have an operation in which a function f takes foo as the input and the output of f is set to foo back. If at the end of the process (that executes this operation 50/100 times or more), my foo variable holds the value that it did when the operation was executed only ONCE, then the operation is idempotent, otherwise NOT.
foo = <some random value here, let's say -2>
{ foo = f( foo ) }   curly brackets outline the operation
If f returns the square of the input, then the operation is NOT idempotent, because foo at the end will be (-2) raised to the power 2^n, where n is the number of times the operation is executed (4, then 16, then 256, ...).
If f returns the absolute value of the input, then the operation is idempotent, because no matter how many times the operation is executed, foo will be abs(-2).
Here, end result is defined as the final value of variable foo.
In mathematical sense, idempotence has a slightly different meaning of:
f(f(....f(x))) = f(x)
here output of f(x) is passed as input to f again which doesn't need to be the case always with programming.
my 5c:
In integration and networking, idempotency is very important.
Several examples from real-life:
Imagine we deliver data to a target system as a sequence of messages.
1. What would happen if the sequence gets mixed up in the channel (as network packets tend to do :) )? If the target system is idempotent, the result will not be different. If the target system depends on the right order in the sequence, we have to implement a resequencer on the target side, which restores the right order.
2. What would happen if there are duplicate messages? If the target system's channel does not acknowledge in time, the source system (or the channel itself) usually sends another copy of the message. As a result, we can have duplicate messages on the target system side.
If the target system is idempotent, it takes care of it and result will not be different.
If the target system is not idempotent, we have to implement deduplicator on the target system side of the channel.
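A resequencer on the target side can be sketched as a buffer keyed by sequence number; since it already tracks which numbers were delivered, it can drop duplicates too. This assumes the sender stamps consecutive sequence numbers starting at 1 (a hypothetical header, not any specific protocol's field), and `Resequencer` is a hypothetical name.

```python
class Resequencer:
    """Release messages in sequence order and drop duplicates."""

    def __init__(self, deliver):
        self.deliver = deliver
        self.next_seq = 1
        self.buffer = {}  # seq -> message waiting for its predecessors

    def receive(self, seq, message):
        if seq < self.next_seq or seq in self.buffer:
            return  # duplicate of something already delivered or buffered
        self.buffer[seq] = message
        while self.next_seq in self.buffer:
            self.deliver(self.buffer.pop(self.next_seq))
            self.next_seq += 1


out = []
r = Resequencer(out.append)
for seq, msg in [(2, "b"), (1, "a"), (2, "b"), (4, "d"), (3, "c"), (1, "a")]:
    r.receive(seq, msg)
print(out)  # ['a', 'b', 'c', 'd']
```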
For a workflow manager (such as Apache Airflow), if an idempotent operation fails in your pipeline, the system can retry the task automatically without affecting the system. Even if the logs change, that is good, because you can see the incident.
The most important thing here is that your system can retry the failed task without messing up the pipeline (e.g. by appending the same data to a table on each retry).
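One common way to make a load task safe to retry is to replace the data for its partition (here, a day) in a single transaction instead of appending. A sketch using `sqlite3` from the standard library; the `events` table, its columns and the date are made up for the example:

```python
import sqlite3


def load_partition(conn, day, rows):
    """Idempotent load: replace the day's rows instead of appending them."""
    with conn:  # one transaction: the delete and the insert commit together
        conn.execute("DELETE FROM events WHERE day = ?", (day,))
        conn.executemany(
            "INSERT INTO events (day, value) VALUES (?, ?)",
            [(day, v) for v in rows],
        )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (day TEXT, value INTEGER)")
for attempt in range(3):  # simulate the task being retried
    load_partition(conn, "2024-01-01", [1, 2, 3])
print(conn.execute("SELECT COUNT(*) FROM events").fetchone()[0])  # 3
```

An append-only `INSERT` in the same loop would leave 9 rows after three attempts.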
Let's say the client makes a request to the "InstanceA" service, which processes the request, writes the change to the DB, and shuts down before sending the response. Since the client does not see that it was processed, it will retry the same request. The load balancer will forward the request to another service instance, "InstanceB", which will make the same change to the same DB item.
We should use idempotency tokens. When a client sends a request to a service, it should include some kind of request ID that can be saved in the DB to record that we have already executed the request. If the client retries the request, "InstanceB" will check the request ID. Since that particular request has already been executed, it will not make any change to the DB item. Such requests are called idempotent requests: we can send the same request multiple times, but only the first one changes anything.
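A minimal sketch of the request-ID idea, with a dict standing in for the shared DB table (`AccountService` and its fields are hypothetical). In a real multi-instance deployment, the processed-request record has to be written in the same DB transaction as the change itself; otherwise a crash between the two writes reopens the original problem.

```python
import uuid


class AccountService:
    """Apply each request at most once, keyed by a client-supplied request id."""

    def __init__(self):
        self.balance = 0
        self.processed = {}  # request_id -> response returned the first time

    def deposit(self, request_id, amount):
        if request_id in self.processed:
            return self.processed[request_id]  # retry: replay the stored response
        self.balance += amount
        response = {"request_id": request_id, "balance": self.balance}
        self.processed[request_id] = response
        return response


service = AccountService()  # state shared by "InstanceA" and "InstanceB" via the DB
request_id = str(uuid.uuid4())
first = service.deposit(request_id, 100)  # InstanceA applies it, then crashes
retry = service.deposit(request_id, 100)  # client retries, InstanceB receives it
print(service.balance, first == retry)    # 100 True
```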