Coarse-grained vs fine-grained - terminology

What is the difference between coarse-grained and fine-grained?
I have searched these terms on Google, but I couldn't find what they mean.

From Wikipedia (granularity):
Granularity is the extent to which a
system is broken down into small
parts, either the system itself or its
description or observation. It is the
extent to which a larger entity is
subdivided. For example, a yard broken
into inches has finer granularity than
a yard broken into feet.
Coarse-grained systems consist of
fewer, larger components than
fine-grained systems; a coarse-grained
description of a system regards large
subcomponents while a fine-grained
description regards smaller components
of which the larger ones are composed.

In simple terms
Coarse-grained - larger components than fine-grained, large subcomponents. Simply wraps one or more fine-grained services together into a more coarse-grained operation.
Fine-grained - smaller components of which the larger ones are composed, lower-level service.
It is better to have more coarse-grained service operations, which are composed of fine-grained operations.

Coarse-grained: A few objects hold a lot of related data, which is why services have a broader scope in functionality. Example: A single "Account" object holds the customer name, address, account balance, opening date, last change date, etc.
Thus: Increased design complexity, smaller number of calls to various service operations.
Fine-grained: More objects, each holding less data, which is why services have a narrower scope in functionality. Example: An Account object holds the balance, a Customer object holds the name and address, an AccountOpenings object holds the opening date, etc.
Thus: Decreased design complexity, higher number of calls to various service operations.
There are relationships defined between these objects.
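As a rough illustration of the two object layouts described above, here is a minimal Python sketch. The class and field names follow the Account / Customer / AccountOpenings example; the sample values are made up.

```python
from dataclasses import dataclass
from datetime import date


# Coarse-grained: one object carries everything about the account.
@dataclass
class CoarseAccount:
    customer_name: str
    address: str
    balance: float
    opening_date: date


# Fine-grained: several small objects, connected by relationships.
@dataclass
class Customer:
    name: str
    address: str


@dataclass
class AccountOpenings:
    opening_date: date


@dataclass
class Account:
    balance: float
    customer: Customer          # relationship to the Customer object
    opening: AccountOpenings    # relationship to the AccountOpenings object


coarse = CoarseAccount("Ada", "1 Main St", 100.0, date(2020, 1, 1))
fine = Account(100.0, Customer("Ada", "1 Main St"), AccountOpenings(date(2020, 1, 1)))
```

Both variables describe the same account; the difference is only in how many objects the data is spread over.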

One more way to understand this is to think in terms of communication between processes and threads. Processes communicate with the help of coarse-grained communication mechanisms like sockets, signal handlers, shared memory, semaphores and files. Threads, on the other hand, have access to the shared memory space that belongs to a process, which allows them to apply finer-grained communication mechanisms.
Source: Java Concurrency in Practice

In the context of services:
http://en.wikipedia.org/wiki/Service_Granularity_Principle
By definition a coarse-grained service operation has broader scope
than a fine-grained service, although the terms are relative. The
former typically requires increased design complexity but can reduce
the number of calls required to complete a task.
A fine-grained service interface is much the same as a chatty interface.
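To make the "number of calls" point concrete, here is a small sketch with a hypothetical in-memory AccountService that counts how often it is called. The service, its method names and its data are assumptions for illustration, not any real API.

```python
class AccountService:
    """Hypothetical in-memory service that counts the calls it receives."""

    def __init__(self):
        self.calls = 0
        self._data = {"name": "Ada", "address": "1 Main St", "balance": 100.0}

    # Fine-grained (chatty) operations: one field per call.
    def get_name(self):
        self.calls += 1
        return self._data["name"]

    def get_address(self):
        self.calls += 1
        return self._data["address"]

    def get_balance(self):
        self.calls += 1
        return self._data["balance"]

    # Coarse-grained operation: everything in one call.
    def get_account_summary(self):
        self.calls += 1
        return dict(self._data)


chatty = AccountService()
summary_chatty = {
    "name": chatty.get_name(),
    "address": chatty.get_address(),
    "balance": chatty.get_balance(),
}

coarse = AccountService()
summary_coarse = coarse.get_account_summary()
```

The client ends up with the same data either way, but the chatty version needs three round trips where the coarse-grained one needs a single call.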

In terms of a dataset such as a text file, coarse-grained means we can transform the whole dataset but not an individual element of it, while fine-grained means we can transform individual elements of the dataset.
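A minimal sketch of that distinction, using a plain Python list as a stand-in for the lines of a text file (the data is made up):

```python
lines = ["alpha", "beta", "gamma"]

# Coarse-grained: one operation applied to the whole dataset, producing a new dataset.
upper_all = [line.upper() for line in lines]

# Fine-grained: change a single element and leave the rest untouched.
patched = list(lines)
patched[1] = patched[1].upper()
```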

Coarse-grained and fine-grained are both about optimizing the number of services; the difference is in the level. An example makes this easier to understand.
Fine-grained: Suppose I have 100 services such as findbyId, findbyCategory, findbyName, and so on. Instead of that many services, why not provide a single find(id, category, name, and so on)? This way we can reduce the number of services. This is just an example, but the goal is to optimize the number of services.
Coarse-grained: Suppose I have 100 clients, and each client has its own set of 100 services, so I have to provide 100*100 services in total. That is very difficult to manage. Instead, I identify the services common to most clients as one shared set and keep the rest separate. For example, if 50 of the 100 services are common, I only have to manage 100*50 + 50.
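Both examples can be sketched in a few lines of Python. The item data and the keyword-argument signature of find are assumptions made for illustration; the service counts are the ones from the example.

```python
ITEMS = [
    {"id": 1, "category": "book", "name": "Dune"},
    {"id": 2, "category": "book", "name": "Emma"},
    {"id": 3, "category": "film", "name": "Dune"},
]


def find(**criteria):
    """Return the items that match every given criterion (id, category, name, ...)."""
    return [
        item for item in ITEMS
        if all(item.get(key) == value for key, value in criteria.items())
    ]


# Service counts: 100 clients, 100 services each, 50 of them common to all clients.
clients, per_client, common = 100, 100, 50
separate_total = clients * per_client                    # 100*100, one copy per client
shared_total = clients * (per_client - common) + common  # 100*50 + 50
```

Here find(name="Dune") and find(category="book", name="Dune") replace what would otherwise be separate findbyName and findbyCategory-style services, and sharing the common set brings the total down from 10000 to 5050.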

Coarse-grained granularity does not always mean bigger components. Taken literally, the word coarse means harsh or not appropriate. For example, in software project management, if you break down a small system into a few components that are equal in size but vary in complexity and features, this could lead to coarse-grained granularity. Conversely, for a fine-grained breakdown, you would divide the components based on the cohesiveness of the functionality each component provides.

Coarse-grained and fine-grained: both of these modes define how the cores are shared between multiple Spark tasks. As the name suggests, fine-grained mode is responsible for sharing the cores at a more granular level. Fine-grained mode has been deprecated by Spark and will soon be removed.

Coarse-grained services provide broader functionality than fine-grained services. Depending on the business domain, a single service can be created to serve a single business unit, or multiple specialised fine-grained services can be created if the subunits are largely independent of each other.
A coarse-grained service may become more difficult to work with and less adaptable to change due to its size, while fine-grained services may introduce the additional complexity of managing multiple services.

Granularity has an important application when storing large-scale data, where space is very important.
The meaning of granularity according to the Oxford dictionary is:
"The scale or level of detail in a set of data."
According to the Cambridge dictionary:
"A lot of small details included in information, making it possible for you to understand very clearly what is happening"
So, going by the meaning of the word, it is a kind of partitioning of data for a continuous process.
Finer granularity uses small partition intervals, so that a detailed representation can be achieved.
Coarser granularity, on the other hand, uses larger intervals, so that storage can be saved.
Which of the two types of granularity to use is application-specific.
For example, suppose we have an application where recent information is more important than older information. Recent data can be stored at finer granularity for a detailed representation, while older data can be represented at coarser granularity.
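A minimal sketch of that idea, assuming the data is a list of numeric readings ordered oldest first. The function names, the averaging rule and the sample values are illustrative choices, not a prescribed method.

```python
def downsample(samples, bucket_size):
    """Average consecutive samples in groups of bucket_size (coarser granularity)."""
    buckets = [samples[i:i + bucket_size] for i in range(0, len(samples), bucket_size)]
    return [sum(bucket) / len(bucket) for bucket in buckets]


def compact(samples, keep_recent, bucket_size):
    """Keep the newest keep_recent samples as-is and store older ones as averages.

    Assumes keep_recent >= 1.
    """
    older, recent = samples[:-keep_recent], samples[-keep_recent:]
    return downsample(older, bucket_size) + recent


readings = [1, 3, 5, 7, 9, 11, 2, 4]  # oldest first
stored = compact(readings, keep_recent=2, bucket_size=3)
```

The two most recent readings keep their full detail, while the six older ones shrink to two averaged values, which is where the storage saving comes from.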

Related

Defining observation space and reward for traffic signal phase optimization for reinforcement learning

I am trying to use Reinforcement Learning for traffic signal phase optimization for improving traffic flow at intersections.
I am aware that in practice we won't be able to get the information about all the vehicles in each of the lanes.
If we use a camera to get information about the queue length, then we can get accurate data only up to, say, 200 meters.
Should I take this into consideration while defining my observation space or can I directly use the data from sumo?
Furthermore, what should be the ideal observation space for such a task?
sumo_rl allows the use of various metrics for reward calculation, such as the pressure metric, the queue length metric, etc. What would be a good choice of reward for my use case, or what factors should I consider while defining my reward?
I have tried getting metrics such as throughput, lane delay and queue length from the e2 detector's output file. For the agent, however, I might not be able to use them (as traci/sumo wrappers offer better implementations?). So how do I use traci to get this modified information?
Yes, you should try to match your observation space as closely to the real world as possible. SUMO can also filter the data directly (for instance with an E3 detector).
If you want to maximize flow, then the reward should also include the flow metric (throughput). It's quite easy to get it via traci (as you already noticed), but I cannot tell how it integrates with your framework since you did not give details about it.
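Independently of the framework, the two suggestions can be sketched in plain Python. Everything below is hypothetical: the input format (distance to the stop line and speed per vehicle), the 0.1 m/s halting threshold and the function names are assumptions, and no traci or sumo_rl calls are shown.

```python
CAMERA_RANGE_M = 200.0  # assumed camera range, taken from the question


def visible_queue_length(vehicles, camera_range=CAMERA_RANGE_M, halt_speed=0.1):
    """Count halted vehicles that lie within camera range of the stop line.

    vehicles: iterable of (distance_to_stop_line_m, speed_m_per_s) tuples.
    """
    return sum(
        1 for distance, speed in vehicles
        if distance <= camera_range and speed < halt_speed
    )


def throughput_reward(departed_before, departed_now):
    """Reward: number of vehicles that cleared the intersection during this step."""
    return departed_now - departed_before


lane = [(15.0, 0.0), (60.0, 0.0), (180.0, 5.0), (250.0, 0.0)]
obs = visible_queue_length(lane)
reward = throughput_reward(departed_before=40, departed_now=43)
```

The halted vehicle at 250 m is ignored because a real camera would not see it, so the agent trains on the same partial information it would get in deployment.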

Reinforcement learning target data

I have a question regarding reinforcement learning. Let's say we have a robot that is able to adapt to changing environments, similar to this paper 1. When there is a change in the environment [light dimming], the robot's performance drops and it needs to explore its new environment by collecting data and running the Q-algorithm again to update its policy to be able to "adapt". The collection of new data and updating of the policy takes about 4/5 hrs. I was wondering: if I have an army of these robots in the same room, undergoing the same environmental changes, can the data collection be sped up so that the policy can be updated more quickly, say in under 1 hour or so, allowing the performance of the robots to increase?
I believe you are talking about scaling learning horizontally as in training multiple agents in parallel.
A3C is one algorithm that does this by training multiple agents in parallel and independently of each other. Each agent has its own environment, which allows it to gain a different experience from the rest of the agents, ultimately increasing the breadth of your agents' collective experience. Eventually each agent updates a shared network asynchronously and you use this network to drive your main agent.
You mentioned that you wanted to use the same environment for all parallel agents. I can think of this in two ways:
If you are talking about a shared environment among agents, then this could possibly speed things up; however, you are likely not going to gain much in terms of performance. You are also very likely to face issues in terms of episode completion: if multiple agents are taking steps simultaneously, then your transitions will be a mess, to say the least. The complexity cost is high and the benefit is negligible.
If you are talking about cloning the same environment for each agent then you end up both gaining speed and a broader experience which translates to performance. This is probably the sane thing to do.
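The cloned-environment idea can be sketched with tabular Q-learning instead of A3C, to keep it short. ToyEnv, its five-state corridor and all parameter values are made up for illustration, and the robots are stepped one after another here, whereas real robots would collect their data concurrently.

```python
import random
from collections import defaultdict


class ToyEnv:
    """Hypothetical 5-state corridor; reaching state 4 gives reward 1."""

    def __init__(self, seed):
        self.rng = random.Random(seed)
        self.state = 0

    def step(self, action):  # action is -1 (left) or +1 (right)
        prev = self.state
        nxt = min(4, max(0, prev + action))
        reward = 1.0 if nxt == 4 else 0.0
        self.state = 0 if nxt == 4 else nxt  # restart the episode at the goal
        return prev, action, reward, nxt


def collect(envs, steps_per_env):
    """Each robot explores its own clone; all transitions go into one pool."""
    pool = []
    for env in envs:
        for _ in range(steps_per_env):
            pool.append(env.step(env.rng.choice((-1, 1))))
    return pool


def q_update(q, pool, alpha=0.5, gamma=0.9):
    """One Q-learning pass over the pooled transitions, updating a shared table."""
    for s, a, r, s2 in pool:
        best_next = 0.0 if s2 == 4 else max(q[(s2, -1)], q[(s2, 1)])
        q[(s, a)] += alpha * (r + gamma * best_next - q[(s, a)])
    return q


robots = [ToyEnv(seed) for seed in range(8)]
pool = collect(robots, steps_per_env=250)  # 8 robots x 250 steps each
q = q_update(defaultdict(float), pool)
```

Eight robots gather 2000 transitions while each one only takes 250 steps, which is the kind of wall-clock saving the question asks about, provided the pooled data is fed into one shared policy.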

How to understand a role of a queue in a distributed system?

I am trying to understand what is the use case of a queue in distributed system.
And also how it scales and how it makes sure it's not a single point of failure in the system?
Any direct answer or a reference to a document is appreciated.
Use case:
I understand that a queue is a messaging system, and that it decouples the systems that communicate with each other. But is that the only point of using a queue?
Scalability:
How does the queue scale for high volumes of data? Both read and write.
Reliability:
How does the queue avoid becoming a single point of failure in the system? Does the queue do replication, similar to data storage?
My question is not specific to any particular queue server like Kafka or JMS. Just in general.
A queue is a mental concept; the implementation decides about 1 + 2 + 3.
A1: No, it is not the only role. Messaging seems to be the main one, but distributed-system signalling is another, by no means less important. Hoare's seminal CSP paper is a flagship in this field. Recent decades gave many more options and "smart behaviours" to work with in designing distributed-system signalling / messaging infrastructures.
A2: Scaling envelopes depend a lot on the implementation. It seems obvious that broker-less queues can work way faster than a centralised, broker-based infrastructure. Transport classes and transport links account for additional latency and performance degradation as the data-flow volumes grow. BLOB handling is another level of performance cliff, as the inefficiencies accumulate down the distributed processing chain. Zero-copy, (almost) zero-latency smart-queue implementations are still victims of operating-system and similar resource limitations.
A3: Oh sure it is the SPOF, if left on its own. However, theoretical cybernetics keeps us safe, as we can create reliable systems while still using error-prone components. (M + N)-failure-resilient schemes are thus achievable; however, budget, creativity and design discipline are the ceiling any such project has to live with.
my take:
I would be careful with the term "decouple": if service A calls an API on service B, there is coupling, since there is a contract between the services; this is true even if the communication happens over a queue, file or fax. The key with queues is that the communication between services is asynchronous, which means their runtimes are decoupled. From a practical point of view, either system may go down without affecting the other.
Queues can scale to large volumes of data by partitioning. From the client's point of view there is one queue, but in reality there are many queues/shards, and the number of shards helps to support more data. Of course, sharding a queue is not "free": you will lose global ordering of events, which may need to be addressed in your application.
A good queue-based solution is made reliable through replication, consensus, etc., depending on the set of desired properties. Queues are not very different from databases in this regard.
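To illustrate the partitioning point, here is a toy in-memory sketch. PartitionedQueue, its method names and the key-hashing rule are assumptions made for illustration; real brokers differ in the details.

```python
import zlib
from collections import deque


class PartitionedQueue:
    """Toy sharded queue: messages with the same key land on the same shard."""

    def __init__(self, num_shards):
        self.shards = [deque() for _ in range(num_shards)]

    def shard_for(self, key):
        # crc32 is stable across runs, unlike Python's built-in hash() for str.
        return zlib.crc32(key.encode("utf-8")) % len(self.shards)

    def publish(self, key, message):
        self.shards[self.shard_for(key)].append((key, message))

    def consume(self, shard_index):
        shard = self.shards[shard_index]
        return shard.popleft() if shard else None


queue = PartitionedQueue(num_shards=4)
for seq in range(3):
    queue.publish("order-1", seq)
    queue.publish("order-2", seq)

shard_index = queue.shard_for("order-1")
order_1_messages = [msg for key, msg in queue.shards[shard_index] if key == "order-1"]
```

Messages for one key stay in order within their shard, but nothing records how messages on different shards interleave, which is the loss of global ordering mentioned above.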
To give you more direction to dig into:
there is an interesting feature of queues: delivery guarantees such as deliver-exactly-once, deliver-at-most-once, etc.
may I recommend Enterprise Integration Patterns - https://www.enterpriseintegrationpatterns.com/patterns/messaging/Messaging.html this is a good "system design" level of information
queues may participate in distributed transactions, e.g. you could build something like delete a record from a database and write it into a queue, and that will be either done/committed or rolled back - another interesting topic to explore

Stress test cases for web application

What other stress test cases are there, other than finding the maximum number of users that can log in to the web application before performance degrades and it eventually crashes?
This question is hard to answer thoroughly since it's too broad.
Anyway, many stress tests depend on the type and execution flow of your workload. There's an entire subject (taught as a graduate course) dedicated to queueing theory and resource optimization. Most of it can be summarized as follows:
if you have a resource (be it a gpu, cpu, memory bank, mechanical or
solid state disk, etc..), it can serve a number of users/requests per
second and takes an X amount of time to complete one unit of work.
Make sure you don't exceed its limits.
Some systems can also be studied with a probabilistic approach (Little's Law is one of the most fundamental rules in these cases)
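Little's Law states that the average number of requests in a system equals the arrival rate multiplied by the average time a request spends in the system (L = λW). A small sketch, with made-up numbers and hypothetical function names:

```python
def requests_in_system(arrival_rate_per_s, avg_time_in_system_s):
    """Little's Law, L = lambda * W: average number of requests in the system."""
    return arrival_rate_per_s * avg_time_in_system_s


def utilization(arrival_rate_per_s, service_time_s, servers=1):
    """Fraction of capacity in use; values at or above 1.0 mean overload."""
    return arrival_rate_per_s * service_time_s / servers


# Assumed workload: 200 requests/s, 0.25 s average response time,
# 0.02 s of service time per request, 8 worker threads.
in_flight = requests_in_system(arrival_rate_per_s=200, avg_time_in_system_s=0.25)
rho = utilization(arrival_rate_per_s=200, service_time_s=0.02, servers=8)
```

With these numbers about 50 requests are in flight on average and the workers are roughly half busy; a stress test would raise the arrival rate until the utilization approaches 1.0.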
There are a lot of reasons for load/performance testing, many of which may not be important to your project goals. For example:
- What is the performance of a system at a given load? (load test)
- How many users can the system handle and still meet a specific set of performance goals? (load test)
- How does the performance of a system change over time under a certain load? (soak test)
- When will the system crash under increasing load? (stress test)
- How does the system respond to hardware or environment failures? (stress test)
I've got a post on some common motivations for performance testing that may be helpful.
You should also check out your web analytics data and see what people are actually doing.
It's not enough to simply simulate X number of users logging in. Find the scenarios that represent the most common user activities (anywhere from 2 to 20 scenarios).
Also, make sure you're not just hitting your cache on reads. Add some randomness / diversity in the requests.
I've seen stress tests where all the users were requesting the same data, which won't give you real-world results.
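One way to get both a realistic scenario mix and varied requests is to generate the request plan randomly. The sketch below is only an assumption about how that could look: the scenario names, the weights and the catalog size are invented, and in practice the weights would come from your analytics data.

```python
import random

# Hypothetical scenario weights; derive real ones from web analytics.
SCENARIOS = {"browse": 0.6, "search": 0.3, "checkout": 0.1}


def build_request_plan(num_requests, catalog_size, seed=None):
    """Return a list of (scenario, item_id) pairs for a load-test run."""
    rng = random.Random(seed)
    names = list(SCENARIOS)
    weights = list(SCENARIOS.values())
    plan = []
    for _ in range(num_requests):
        scenario = rng.choices(names, weights=weights, k=1)[0]
        item_id = rng.randrange(catalog_size)  # vary the target so reads are not all cache hits
        plan.append((scenario, item_id))
    return plan


plan = build_request_plan(num_requests=1000, catalog_size=50_000, seed=42)
```

Passing a seed keeps a run reproducible, so two test runs can be compared against the same sequence of requests.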

Performance evaluation with Message Passing

I have to build a distributed application, using MPI.
One of the decisions I have to make is how to map instances of classes onto processes (and then onto machines), in order to take maximum advantage of a distributed environment.
My question is: is there a model that lets me choose the best mapping? I mean, some arrangements are surely wrong (for example, putting on two different machines two objects that should process a fairly large amount of data together, sequentially, without a stream of tokens to process), but is there a systematic way to identify such wrong arrangements, based on the flow of execution, message complexity, and the time taken by the computation done by the algorithmic components?
Well, there are data flow diagrams. Those can help identify parallelism's opportunities and pitfalls. The references on the wikipedia page might give you some more theoretical grounding.
When I worked at Lockheed Martin, I was exposed to CSIM, a tool they developed for modeling algorithm mapping to processing blocks.
Another thing you might try is the Join Calculus. I've found examples of programming with it to be surprisingly intuitive, and I think it's well grounded in theory. I'm not sure why it hasn't caught on more.
The other approach is the Pi Calculus, and I think that might be more popular, though it seems harder to understand.
A practical solution to this would be using a different model of distributed-memory parallel programming, that directly addresses your concerns. I work on the Charm++ programming system, whose model is that of individual objects sending messages from one to another. The runtime system facilitates automatic mapping of these objects to available processors, to account for issues of load balance and communication locality.
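For a handful of objects, the "surely wrong" arrangements from the question can also be spotted by brute force: score each mapping by how much data has to cross machine boundaries and pick the cheapest. The sketch below is a toy cost model only. The object names, traffic volumes and per-machine limit are invented, and it is not how MPI, CSIM or Charm++ decide placement.

```python
from itertools import product

# Hypothetical message volumes (bytes exchanged) between pairs of objects.
TRAFFIC = {("parser", "indexer"): 900, ("indexer", "store"): 50, ("parser", "logger"): 10}
OBJECTS = ["parser", "indexer", "store", "logger"]


def remote_traffic(mapping):
    """Bytes that must cross machine boundaries under an object-to-machine mapping."""
    return sum(volume for (a, b), volume in TRAFFIC.items() if mapping[a] != mapping[b])


def best_mapping(machines, max_per_machine):
    """Exhaustively search the mappings that respect a per-machine object limit."""
    best = None
    for assignment in product(range(machines), repeat=len(OBJECTS)):
        if any(assignment.count(m) > max_per_machine for m in range(machines)):
            continue  # skip unbalanced placements
        mapping = dict(zip(OBJECTS, assignment))
        if best is None or remote_traffic(mapping) < remote_traffic(best):
            best = mapping
    return best


mapping = best_mapping(machines=2, max_per_machine=2)
```

The search keeps parser and indexer on the same machine because they exchange the most data. Exhaustive search grows exponentially with the number of objects, so beyond toy sizes a heuristic or a runtime that rebalances automatically is the practical route.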