Do web workers use the actor model?

I have been trying to understand how the actor model and web workers work.
In https://dzone.com/articles/html5-web-workers-classic:
"Web Workers provide a message-passing model, where scripts can communicate only through well-defined immutable messages, do not share any data, and do not use synchronization mechanisms for signaling or data integrity."
To me, it sounds very similar to the actor model. Is there a difference?
UPDATE:
According to Benjamin Erb:
" The fundamental idea of the actor model is to use actors as concurrent primitives that can act upon receiving messages in different ways:
1. Send a finite number of messages to other actors.
2. Spawn a finite number of new actors.
3. Change its own internal behavior, taking effect when the next incoming message is handled."
The first two apply, but how about the last one?

Yes, the actor model describes quite well how workers work. However, the actor model has some more abstract requirements whose fulfilment also depends on your implementation - I recommend complying with them. You can read the full article on Wikipedia.
There's one thing I'd like to point out to you though:
No requirement on order of message arrival
This is something that depends on your implementation, and I strongly recommend complying with this requirement. It means, for example, that if you send data in chunks, you should give the chunks indexes. Worker messages do arrive in the order they are sent, but it's good not to rely on that once your code becomes complicated.
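For example, a minimal sketch of the chunk-indexing idea (the chunks array, the worker variable and the message shape are assumptions for illustration, not part of any Worker API):
// inside the worker: send data in indexed chunks
chunks.forEach((chunk, index) => {
    self.postMessage({ type: "chunk", index: index, payload: chunk });
});
// on the receiving side: reassemble by index instead of relying on arrival order
const received = [];
worker.addEventListener("message", (e) => {
    if (e.data.type === "chunk") {
        received[e.data.index] = e.data.payload;
    }
});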
Change its own internal behavior, taking effect when the next incoming message is handled (as per your "update")
This point kind of conflicts with the previous one. But for anyone who has read at least one web workers tutorial, the answer is obvious: yes, a worker can do that. Consider the following code:
var name = "Worker";
self.addEventListener("message", (e) => {
    // change internal behavior: remember the new name for future messages
    if (typeof e.data.newName == "string") {
        name = e.data.newName;
    }
    // respond using the current internal state
    if (e.data.command == "sendName") {
        self.postMessage({myName: name});
    }
});
Needless to say, if you send a new name to the worker, the responses to "sendName" messages will be different from that point on. Such a change in behaviour is trivial here, but it can be arbitrarily complex.
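For completeness, a main-thread sketch of talking to such a worker (the worker.js file name is only an assumption for the example):
// main thread: assumes the listener above lives in "worker.js"
const worker = new Worker("worker.js");
worker.addEventListener("message", (e) => {
    console.log(e.data.myName); // "Worker" for the first reply, "Bob" after the rename
});
worker.postMessage({ command: "sendName" });
worker.postMessage({ newName: "Bob" });
worker.postMessage({ command: "sendName" });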
If you're interested in the actor model, also have a look at Vert.x, which can be used from JavaScript.
Note: there are ways to block between workers, but those are hacks and are not intended. One I can think of is a synchronous XHR with the server holding the lock. I don't think this counts as an exception to the actor model.


Single Responsibility Principle Core Understanding

In brushing up on the SRP I read this document which I located via Uncle Bob's page on principles of OOD. I find the following passage puzzling and somewhat at odds with the rest of the document:
"If, on the other hand, the application is not changing in ways that cause the the two responsibilities to change at different times, then there is no need to separate them. Indeed, separating them would smell of Needless Complexity. There is a corollary here. An axis of change is only an axis of change if the changes actually occur. It is not wise to apply the SRP, or any other principle for that matter, if there is no symptom."
While I understand that the answer to many software development questions is "it depends", principles like the SRP appear to be almost universally beneficial and to be implemented as a matter of course. The SRP itself affords code high adaptability to future changes in requirements. Isn't the point to separate out responsibilities from the get-go to avoid struggling with highly coupled code and cascading changes later on?
I would really appreciate some clarification on this to make sure my understanding of this core principle is correct. Thanks in advance!
From my humble understanding, in the Modem example that is presented here, it might be possible that the responsibilities of the modem (Connection and Data Exchange) will change as one.
You have two possibilities here:
When the protocol changes, it is possible that only the connection part changes, or only the data exchange part changes. In this case you should have two interfaces, because a change of protocol in the data exchange does not imply a change of protocol in the connection (see the sketch below).
When the protocol changes, it will always change both the connection part and the data exchange part. In that case, you don't need two interfaces, because every time you have to rewrite the connection part, you are sure that the data exchange will change as well. In that case, you have two responsibilities on the same axis of change (the protocol handled by the modem), so you can leave them in a single interface.
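A rough JavaScript sketch of the first case, where the two responsibilities vary independently (all class and method names are made up for illustration):
// Hypothetical split: each class changes for a different reason
class ModemConnection {
    dial(phoneNumber) { /* connection-protocol details live here */ }
    hangUp() { /* ... */ }
}
class ModemDataChannel {
    send(data) { /* data-exchange protocol details live here */ }
    receive() { /* ... */ }
}
In the second case, where both always change together, a single Modem class exposing dial/hangUp/send/receive would not smell of needless complexity.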
The key to this statement is "not changing in ways that cause the two responsibilities to change at different times". Let's say, for the sake of argument, you have a PaymentLogger and a Payment class. Every time you create a new PaymentType (CreditCard, Cash, Paypal, etc.) you need to update the PaymentLogger to log actions specific to that Payment. Instead of splitting out a PaymentLogger class, you could give the Payment class a method called Log which does whatever is specific to itself.
In this case it could be that the act of recording actions should be built into the class itself, since creating a new Payment requires also creating a new PaymentLogger. It's a responsibility that should have been part of Payment all along.
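A minimal JavaScript sketch of that idea (the class and method names are hypothetical):
// Each payment type knows how to log its own actions,
// so adding a new payment type does not require touching a separate logger.
class CreditCardPayment {
    constructor(amount) { this.amount = amount; }
    log() { console.log(`Credit card payment of ${this.amount}`); }
}
class CashPayment {
    constructor(amount) { this.amount = amount; }
    log() { console.log(`Cash payment of ${this.amount}`); }
}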

Message queuing solution for millions of topics

I'm thinking about a system that will notify multiple consumers about events happening to a population of objects. Every subscriber should be able to subscribe to events happening to zero or more of the objects, and multiple subscribers should be able to receive information about events happening to a single object.
I think some message queuing system will be appropriate in this case, but I'm not sure how to handle the fact that I'll have millions of objects - using a separate topic for every object does not sound good [or is it just fine?].
Can you please suggest an approach I should take, and maybe even some open source message queuing system that would be reasonable?
A few more details:
there will be thousands of subscribers [meaning not that many of them],
subscribers will subscribe to tens or hundreds of objects each,
there will be ~5-20 million objects,
the events themselves don't have to carry any message; just the information that the object was changed is enough,
the vast majority of objects will never be subscribed to,
events occur at a maximum rate of a few hundred per second,
ideally the server should run under Linux and be able to integrate with the rest of the ecosystem via HTTP long-poll [using node.js? continuations under Jetty?].
Thanks in advance for your feedback and sorry for somewhat vague question!
I can highly recommend RabbitMQ. I have used it in a couple of projects before and from my experience I think it is very reliable and offers a wide range of configurations. Basically, RabbitMQ is an open-source (Mozilla Public License (MPL)) message broker that implements the Advanced Message Queuing Protocol (AMQP) standard.
As documented on the RabbitMQ web-site:
RabbitMQ can potentially run on any platform that Erlang supports, from embedded systems to multi-core clusters and cloud-based servers.
... meaning that an operating system like Linux is supported.
There is a library for node.js here: https://github.com/squaremo/rabbit.js
It comes with an HTTP-based API for management and monitoring of the RabbitMQ server - including a command-line tool and a browser-based user interface as well - see: http://www.rabbitmq.com/management.html.
In the projects I have been working on, I have communicated with RabbitMQ using C# and two different wrappers, EasyNetQ and Burrow.NET. Both are excellent wrappers for RabbitMQ, but I ended up preferring Burrow.NET as it is easier and more obvious to work with (it doesn't do a lot of magic under the hood) and provides good flexibility to inject loggers, serializers, etc.
I have never worked with the number of objects that you are going to work with - I have worked with thousands (not millions). However, no matter how many objects I have been playing around with, RabbitMQ has always been really stable and has never been the source of errors in the system.
So to sum up - RabbitMQ is simple to use and set up, supports AMQP, can be managed via HTTP and, what I like the most, it's rock solid.
Break the topics up to carry specific events, e.g. "Object updated", "Object deleted"... So clients only have to subscribe to the finite number of event-based topics they are interested in.
Inject headers into your messages when you publish them, and put the intelligence into the clients to use these headers as message selectors. For example, the client knows the list of objects it is interested in - and say you identify an object by an "id" - the id can be a header, and the client will use the id header to determine whether it is interested in the message.
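A sketch of that client-side filtering in JavaScript (the subscription object, the message shape and handleEvent are illustrative assumptions, not any particular broker's API):
// ids this client cares about
const interestingIds = new Set([42, 1337, 2048]);
// hypothetical subscription callback: every message carries an "id" header
subscription.on("message", (headers, body) => {
    if (interestingIds.has(headers.id)) {
        handleEvent(headers.id, body);
    }
});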
Depending on your requirements, you may also want to consider guaranteed delivery, to make sure that a client will receive the message even if it goes offline and comes back later.
The options I would recommend off the top of my head are ActiveMQ, RabbitMQ and Redis Pub/Sub (I haven't really worked with Redis pub/sub, so please do your own due diligence).
Finally here are some performance benchmarks for RabbitMQ and Redis
Just saw that you only have a few hundred messages getting pushed out per second; this is not a big deal for ActiveMQ. I have been using AMQ on a system that processes 240 messages per second, and it just works fine. I do use a thread pool of workers to process the messages asynchronously, though. Look at a framework like Akka if you are in Java land; if not, stick with node.js and the cool ecosystem around it.
If it has to be open source, I'd go for ActiveMQ plus an application server to provide the JMS functionality for topics; it also has Ajax support so you can access the topics from your client.
So you would use the JMS infrastructure to publish the topics for the objects, and you can create topics as you need them.
Besides, by using a Java application server you may be able to take advantage of clustering, load balancing and other high-availability features (obviously depending on the selected product).
Hope that helps!
Since your messages are very small, you might want to consider MQTT, which is designed for small devices, although it works fine on powerful devices as well. The key consideration is the low overhead - basically a 2-byte header for a small message. You probably can't use just any simple or open source MQTT server due to your volume; you may need a heavy-duty dedicated appliance like MessageSight to handle it.
Some more details on your application would certainly help. Also you don't mention security at all. I assume you must have some needs in this area.
I'm not sure about your environment, but here are my two cents. Can you identify each object with a unique ID in your system? If so, you can have a topic per event type: for example, one topic for object deletion events, one for object update events, and so on. Messages carrying the IDs of the affected objects would be published to the corresponding topic whenever an event happens. This limits the number of topics you need.
The second part of your problem is that different subscribers want to subscribe to different objects, so not all subscribers are interested in events on all objects. This comes down to the message selector (filtering) mechanism provided by the messaging framework. You need to determine on what basis a subscriber is interested in a particular object and use that as the filtering criterion; it could be anything: object type, object state, etc. Ultimately your system would consist of one topic per event type, with publishers sending event messages carrying {object-type:object-id} information, and subscribers subscribing to any topic with a filtering criterion.
If the above approach works for you, you can use any messaging solution: ActiveMQ, WMQ, RabbitMQ.

Stateless vs Stateful

I'm interested in articles that have some concrete information about stateless and stateful design in programming. I'm interested because I want to learn more about it, but I really can't find any good articles about it. I've read dozens of articles on the web which vaguely discuss the subject, or which talk about web servers and sessions - which are also about stateful vs stateless, but I'm interested in the stateless vs stateful design of attributes in coding. Example: I've heard that BL classes are stateless by design, entity classes (or at least that's what I call them - like Person(id, name, ..)) are stateful, etc.
I think it's important to know because I believe if I can understand it, I can write better code (e.g. granularity in mind).
Anyway, very briefly, here's what I know about stateful vs stateless:
Stateful (like WinForms): stores the data for further use, but limits the scalability of an application, because it is bound by CPU or memory limits
Stateless (Like ASP.NET - although ASP tries to be stateful with ViewStates):
After actions are completed, the data gets transferred, and the instance gets handed back to the thread pool (Amorphous).
As you can see, it's pretty vague and limited information (and quite focussed on server interaction), so I'd be really grateful if you could provide me with some more tasty bits of information :)
Stateless means there is no memory of the past. Every transaction is performed as if it were being done for the very first time.
Stateful means that there is memory of the past. Previous transactions are remembered and may affect the current transaction.
Stateless:
// The state is derived from what is passed into the function
function addOne(number) {
    return number + 1;
}
Stateful:
// The state is maintained by the function across calls
let _number = 0; // initially zero
function addOne() {
    _number++;
    return _number;
}
Source: https://softwareengineering.stackexchange.com/questions/101337/whats-the-difference-between-stateful-and-stateless
A stateful app is one that stores information about what has happened or changed since it started running. Any public info about what "mode" it is in, or how many records it has processed, or whatever, makes it stateful.
Stateless apps don't expose any of that information. They give the same response to the same request, function or method call, every time. HTTP is stateless in its raw form - if you do a GET to a particular URL, you get (theoretically) the same response every time. The exception of course is when we start adding statefulness on top, e.g. with ASP.NET web apps :) But if you think of a static website with only HTML files and images, you'll know what I mean.
I suggest that you start from a question in StackOverflow that discusses the advantages of stateless programming. This is more in the context of functional programming, but what you will read also applies in other programming paradigms.
Stateless programming is related to the mathematical notion of a function, which when called with the same arguments, always return the same results. This is a key concept of the functional programming paradigm and I expect that you will be able to find many relevant articles in that area.
Another area that you could research in order to gain more understanding is RESTful web services. These are by design "stateless", in contrast to other web technologies that try to somehow keep state. (In fact, what you say about ASP.NET being stateless isn't correct - ASP.NET tries hard to keep state using ViewState and is definitely to be characterized as stateful. ASP.NET MVC, on the other hand, is a stateless technology.) There are many places that discuss the "statelessness" of RESTful web services (like this blog post), but you could again start from an SO question.
The adjectives stateful and stateless refer only to the state of the conversation; they are not connected with the concept of a function that provides the same output for the same input. If they were, any dynamic web application (with a database behind it) would be a stateful service, which is obviously false.
With this in mind, if I entrust the task of keeping conversational state to the underlying technology (such as a cookie or HTTP session), I'm implementing a stateful service, but if all the necessary information (the context) is passed as parameters, I'm implementing a stateless service.
It should be noted that even if the passed parameter is an "identifier" of the conversational state (e.g. a ticket or a sessionId), we are still operating a stateless service, because the conversation itself is stateless (the ticket is continually passed between client and server); it is the two endpoints that are, so to speak, "stateful".
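A small JavaScript illustration of that distinction (the function names and storage are made up):
// Stateful: the service keeps the conversation in memory between calls
const sessions = {};
function addToCartStateful(sessionId, item) {
    sessions[sessionId] = sessions[sessionId] || [];
    sessions[sessionId].push(item);
}
// Stateless: the whole context travels with each request,
// so the endpoints hold the state instead of the service
function addToCartStateless(cart, item) {
    return [...cart, item]; // a new cart is returned, nothing is kept on the server
}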
Money transferred online from one account to another account is stateful, because the receiving account has information about the sender.
Handing over cash from one person to another is stateless, because after the cash is received, the identity of the giver is not there with the cash.
Just to add to the others' contributions... Another way is to look at it from a web server and concurrency point of view...
HTTP is stateless in nature for a reason. In the case of a web server, being stateful means that it would have to remember a user's 'state' from their last connection, and/or keep an open connection to a requester. That would be very expensive and 'stressful' in an application with thousands of concurrent connections.
Being stateless in this case makes for obviously efficient usage of resources, i.e. supporting a connection in a single instance of request and response, with no overhead of keeping connections open and/or remembering anything from the last request.
We make web apps stateful by overriding HTTP's stateless behaviour using session objects. When we use session objects, state is carried, but we are still using only HTTP.
I had the same doubt about stateful vs stateless class design and did some research. I just completed it and my findings have been posted on my blog:
Entity classes need to be stateful.
Helper / worker classes should not be stateful.

changed object state after behavior that used the state

I want to give my previous question a second chance since I think I have chosen a bad example.
The question is how I should deal with situations where an object can still change after I have used it to do something, and the new state is relevant for what is being done.
Example in pseudo-code:
class Book
    method 'addChapter': adds a chapter
class Person
    method 'readBook': read an object of class Book
Now when I ask the person to read a book, at least in PHP where the object will be passed by reference, the book object can still change. I could insert a chapter between chapters 3 and 4 while the person is already reading chapter 6. How can I deal with these kinds of situations?
Maybe by notifying the person that the book has changed? You can do it with events (not sure how events work in PHP). Another way is to implement the Observer/Observable pattern.
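A bare-bones JavaScript sketch of the Observer idea (class and method names are illustrative):
// Book is the observable; readers register to be told about changes
class Book {
    constructor() {
        this.chapters = [];
        this.observers = [];
    }
    subscribe(observer) { this.observers.push(observer); }
    addChapter(chapter) {
        this.chapters.push(chapter);
        this.observers.forEach((o) => o.bookChanged(this));
    }
}
class Person {
    bookChanged(book) {
        // the reader can now decide how to react to the insertion
        console.log(`The book now has ${book.chapters.length} chapters.`);
    }
}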
Any of the above answers can lead to a good solution, depending on your business demands.
You asked "How should I deal with situations..." - it depends:
Is adding a chapter after you have already shipped the book to the user legal? If not, you should throw an exception (I do not know if PHP supports exceptions, but either way you should treat it as an error situation).
Another solution would be to make sure that you expose an object that is already whole and does not expect changes to be made to it - this may be a valid solution, especially if your decision to enable this kind of 'streaming' is performance oriented but you don't have real evidence that this section is a performance bottleneck.
Now let's say the addition of a new chapter is legal.
Do you want the change to be known to existing clients or only to new clients?
If the former, you should implement some kind of notification logic (one of the suggested forms is the publisher/subscriber pattern, but there are others).
If the latter, you should make your book object immutable, so mutating operations will not be seen by existing clients; rather, they would create an entirely new book that would be passed to new clients (Persons), as in the sketch below.
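A JavaScript sketch of the immutable option (names are illustrative):
class ImmutableBook {
    constructor(chapters) {
        this.chapters = Object.freeze([...chapters]);
        Object.freeze(this);
    }
    // "adding" a chapter yields a brand-new book; current readers keep the old one
    withChapter(chapter) {
        return new ImmutableBook([...this.chapters, chapter]);
    }
}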
I could go on and on, but I suggest that next time you elaborate more on the exact problem you are trying to solve, since as you can see, the same problem can have different solutions in different domains.
Seems to me you are attempting to perform concurrent tasks. You might want to consider serializing activities to your objects instead, certainly in the case of PHP.

How to avoid Anemic Domain Models and maintain Separation of Concerns?

How do you make your objects fully cognizant of their roles within the system, and still avoid having too many dependencies within the domain model on the database and service layers?
For example: say you've got an entity with a revision history and several "lookup tables" that the data references. Your entity object should have methods to get details from some of the lookup tables, whether by providing access to the lookup table rows or by delegating methods down to them, but in order to do so it depends on the database layer to read the data from those rows. Also, when the entity is saved, it needs to know not only how to save itself, but also how to save entries into the revision history. Is it necessary to pass references to dozens of different data layer objects and service objects to the model object? This seems like it makes the logic far more complex to understand than just passing thin models back and forth to service layer objects, but I've heard many "wise men" recommending this sort of structure.
Really really good question. I have spent quite a bit of time thinking about such topics.
You demonstrate great insight by noting the tension between an expressive domain model and separation of concerns. This is much like the tension in the question I asked about Tell Don't Ask and Single Responsibility Principle.
Here is my view on the topic.
A domain model is anemic because it contains no domain logic; other objects get and set data using an anemic domain object. What you describe doesn't sound like domain logic to me. It might be, but generally, look-up tables and other such technical language are most likely terms that mean something to us but not necessarily anything to the customers. If this is incorrect, please clarify.
Anyway, the construction and persistence of domain objects shouldn't be contained in the domain objects themselves because that isn't domain logic.
So to answer the question, no, you shouldn't inject a whole bunch of non-domain objects/concepts like lookup tables and other infrastructure details. This is a leak of one concern into another. The Factory and Repository patterns from Domain-Driven Design are best suited to keep these concerns apart from the domain model itself.
But note that if you don't have any domain logic, then you will end up with anemic domain objects, i.e. bags of brainless getters and setters, which is how some shops claim to do SOA / service layers.
So how do you get the best of both worlds? How do you focus your domain objects on only domain logic, while keeping UI, construction, persistence, etc. out of the way? I recommend you use a technique like Double Dispatch, or some form of restricted method access.
Here's an example of Double Dispatch. Say you have this line of code:
entity.saveIn(repository);
In your question, saveIn() would have all sorts of knowledge about the data layer. Using Double Dispatch, saveIn() does this:
repository.saveEntity(this.foo, this.bar, this.baz);
And the saveEntity() method of the repository has all of the knowledge of how to save in the data layer, as it should.
In addition to this setup, you could have:
repository.save(entity);
which just calls
entity.saveIn(this);
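Put together, a minimal JavaScript sketch of that double-dispatch shape might look like this (all names are illustrative, mirroring the snippets above):
class Entity {
    constructor(foo, bar, baz) {
        this.foo = foo;
        this.bar = bar;
        this.baz = baz;
    }
    // the entity only decides what to hand over, not how it is stored
    saveIn(repository) {
        repository.saveEntity(this.foo, this.bar, this.baz);
    }
}
class Repository {
    save(entity) {
        entity.saveIn(this); // dispatch back to the entity
    }
    saveEntity(foo, bar, baz) {
        // all knowledge of the data layer lives here
    }
}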
I re-read this and I notice that the entity is still thin because it is simply dispatching its persistence to the repository. But in this case, the entity is supposed to be thin because you didn't describe any other domain logic. In this situation, you could say "screw Double Dispatch, give me accessors."
And yeah, you could, but IMO it exposes too much of how your entity is implemented, and those accessors are distractions from domain logic. I think the only class that should have gets and sets is a class whose name ends in "Accessor".
I'll wrap this up soon. Personally, I don't write my entities with saveIn() methods, because I think even just having a saveIn() method tends to litter the domain object with distractions. I use either the friend class pattern, package-private access, or possibly the Builder pattern.
OK, I'm done. As I said, I've obsessed on this topic quite a bit.
"thin models to service layer objects" is what you do when you really want to write the service layer.
ORM is what you do when you don't want to write the service layer.
When you work with an ORM, you are still aware of the fact that navigation may involve a query, but you don't dwell on it.
Lookup tables can be a relational crutch that gets used when there isn't a very complete object model. Instead of things referencing things, you have codes, which must be looked up. In many cases, the codes devolve to little more than a static pool of strings with database keys. And the relevant methods wind up in odd places in the software.
However, if there is a more complete object model, we have first-class things instead of these degenerate lookup values.
For example, I've got some business transactions which have one of n different "rate plans" -- a kind of pricing model. Right now, the legacy relational database has the rate plan as a lookup table with a code, some pricing numbers, and (sometimes) a description.
[Everyone knows the codes -- the codes are sacred. No one is sure what the proper descriptions should be. But they know the codes.]
But really, a "rate plan" is an object that is associated with a contract; the rate plan has the method that computes the final price. When an app asks the contract for a price, the contract delegates some of the pricing work to the associated rate plan object.
There may have been some database query going on to lookup the rate plan when producing a contract price, but that's incidental to the delegation of responsibility between the two classes.
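A small JavaScript sketch of that delegation (the names simply mirror the description above):
class RatePlan {
    constructor(baseRate) { this.baseRate = baseRate; }
    // the pricing model lives with the rate plan, not in a lookup code
    priceFor(quantity) { return this.baseRate * quantity; }
}
class Contract {
    constructor(ratePlan) { this.ratePlan = ratePlan; }
    // the contract delegates part of the pricing work to its rate plan
    price(quantity) { return this.ratePlan.priceFor(quantity); }
}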
I agree with DeadBeef - therein lies the tension. I don't really see, though, how a domain model is 'anemic' simply because it doesn't save itself.
There has to be much more to it, i.e. it's anemic because the service is implementing all the business rules and not the domain entity.
Service(IRepository) injected
Save() {
    DomainEntity.DoSomething();
    Repository.Save(DomainEntity);
}
'DoSomething' is the business logic of the domain entity.
**This would be anemic**:
Service(IRepository) injected
Save() {
    if (DomainEntity.IsSomething)
        DomainEntity.SetItProperty();
    Repository.Save(DomainEntity);
}
See the inherent difference? I do :)
Try the "repository pattern" and "Domain driven design". DDD suggests to define certain entities as Aggregate-roots of other objects. Each Aggregate is encapsulated. The entities are "persistence ignorant". All the persistence-related code is put in a repository object which manages Data-access for the entity. This way you don't have to mix persistence-related code with your business logic. If you are interested in DDD, check out eric evans book.