What is utility? - reinforcement-learning

What is utility? - reinforcement-learning

As part of Q learning an objective is to maximize the expected utility. I know
Reading wikipedia :
https://en.wikipedia.org/wiki/Q-learning describes expected utility in following contexts :
It works by learning an action-value function that ultimately gives
the expected utility of taking a given action in a given state and
following the optimal policy thereafter.
One of the strengths of Q-learning is that it is able to compare the
expected utility of the available actions without requiring a model of
the environment.
But does not define what utility is, what is meant by utility ?
When maximizing utility what exactly is being maximized ?

In this case, "utility" means functionality or usefulness. So "maximum functionality" or "maximum usefulness".
Plugging the word into Google gives you:
the state of being useful, profitable, or beneficial.

In general terms, utility means profitable or beneficial (as #Rob posted in his response).
In Q-learning context, utility is closed related (they can be viewed as synonyms) with action-value function, as you read in Wikipedia explanation. Here, the action-value function of policy π is an estimation of the return (long term reward) that the agent is going to obtain if it performs the action a in a given state s and follows the policy π. So, when you maximizes the utility, actually you are maximizing the rewards your agent will obtain. As rewards are defined to achieve a goal, you are maximizing the "quantity" of goal achieved.

Related

Goal Seek in Octave to replicate Excel's 'Solver' Macro

This is essentially a question on fundamentals, and whether or not there is a more efficient way to achieve what I am looking for. I have built a working fluid dynamics calculator in Excel to find the flow rates required for a target pressure loss, the optimisation is handled using Solver but it's very clunky and not user friendly.
I'm trying to replicate the function in Octave since it's widely used here, but I am a complete beginner; I'm probably missing something obvious. I can easily enter all of the math for a single iteration via a series of functions, but my excel file required using the 'Solver' macro, and I'm unsure how to efficiently replicate this in Octave.
I am aware that linprog (in matlab) and glpk (octave) can be used to solve systems of linear equations.
I have a series of nested equations which are all dependant on a single matrix, Q (flow rates at various locations). Many other inputs are required, but they either remain constant throughout calculation (e.g. system geometry) or are dictated by Q (e.g. Reynolds number and loss coefficients). In trying to simplify my problem I have settled on two steps:
Write code to solve my problem, input: Q matrix, output: pressure loss matrix
Create a loop that iterates different Q matrices until some conditions for the pressure loss matrix are met.
I don't think it will be practical to get my expressions into the form of A*x = B (in order to use glpk) given the complexity. In excel, I can point solver at a Q value that drives a multitude of equations that impact pressure loss, and it will find the value I need to achieve a target. How can I most efficiently replicate this functionality in Octave?

First off all Solver is not a macro. Pretty far from.
So, you're going to replicate a comprehensive "What-If" Analysis Plug-in -- so complex in fact, that Microsoft chose to contract a 3rd Party company of experts to develop the tool and provide support for it (successfully based on the 1.2 Billion copies they've distributed).
And you're going to this an inferior coding language that you're a complete beginner with? Cool. I'd like to see this!
Cool. Here's a checklist of Solver's features, so you don't miss anything:
Good Luck!
More Information:
Wikipedia : Solver
Office.com : Define and Solve a Problem by using Solver
Frontline: Official Solver Page: http://solver.com
AppSource.Microsoft.com : Solver (with Video)
Frontline:L Solver International Manazine

What does the EpisodeParameterMemory of keras-rl do?

I have found the keras-rl/examples/cem_cartpole.py example and I would like to understand, but I don't find documentation.
What does the line
memory = EpisodeParameterMemory(limit=1000, window_length=1)
do? What is the limit and what is the window_length? Which effect does increasing either / both parameters have?

EpisodeParameterMemory is a special class that is used for CEM. In essence it stores the parameters of a policy network that were used for an entire episode (hence the name).
Regarding your questions: The limit parameter simply specifies how many entries the memory can hold. After exceeding this limit, older entries will be replaced by newer ones.
The second parameter is not used in this specific type of memory (CEM is somewhat of an edge case in Keras-RL and mostly there as a simple baseline). Typically, however, the window_length parameter controls how many observations are concatenated to form a "state". This may be necessary if the environment is not fully observable (think of it as transforming a POMDP into an MDP, or at least approximately). DQN on Atari uses this since a single frame is clearly not enough to infer the velocity of a ball with a FF network, for example.
Generally, I recommend reading the relevant paper (again, CEM is somewhat of an exception). It should then become relatively clear what each parameter means. I agree that Keras-RL desperately needs documentation but I don't have time to work on it right now, unfortunately. Contributions to improve the situation are of course always welcome ;).

A little late to the party, but I feel like the answer doesn't really answer the question.
I found this description online (https://pytorch.org/tutorials/intermediate/reinforcement_q_learning.html#replay-memory):
We’ll be using experience replay
memory for training our DQN. It stores the transitions that the agent
observes, allowing us to reuse this data later. By sampling from it
randomly, the transitions that build up a batch are decorrelated. It
has been shown that this greatly stabilizes and improves the DQN
training procedure.
Basically you observe and save all of your state transitions so that you can train your network on them later on (instead of having to make observations from the environment all the time).

How to choose action in TD(0) learning

I am currently reading Sutton's Reinforcement Learning: An introduction book. After reading chapter 6.1 I wanted to implement a TD(0) RL algorithm for this setting:
To do this, I tried to implement the pseudo-code presented here:
Doing this I wondered how to do this step A <- action given by π for S: I can I choose the optimal action A for my current state S? As the value function V(S) is just depending on the state and not on the action I do not really know, how this can be done.
I found this question (where I got the images from) which deals with the same exercise - but here the action is just picked randomly and not choosen by an action policy π.
Edit: Or this is pseudo-code not complete, so that I have to approximate the action-value function Q(s, a) in another way, too?

You are right, you cannot choose an action (neither derive a policy π) only from a value function V(s) because, as you notice, it depends only on the state s.
The key concept that you are probably missing here, it's that TD(0) learning is an algorithm to compute the value function of a given policy. Thus, you are assuming that your agent is following a known policy. In the case of the Random Walk problem, the policy consists in choosing actions randomly.
If you want to be able to learn a policy, you need to estimate the action-value function Q(s,a). There exists several methods to learn Q(s,a) based on Temporal-difference learning, such as for example SARSA and Q-learning.
In the Sutton's RL book, the authors distinguish between two kind of problems: prediction problems and control problems. The former refers to the process of estimating the value function of a given policy, and the latter to estimate policies (often by means of action-value functions). You can find a reference to these concepts in the starting part of Chapter 6:
As usual, we start by focusing on the policy evaluation or prediction
problem, that of estimating the value function for a given policy .
For the control problem (finding an optimal policy), DP, TD, and Monte
Carlo methods all use some variation of generalized policy iteration
(GPI). The differences in the methods are primarily differences in
their approaches to the prediction problem.

Measure unit code coverage % for function/method on Cyclomatic Complexity?

Is there an approach to collect/report on unit test coverage based on a complexity ratio such as Cyclomatic Complexity on a method/function level?
The reason/intent is to provide a measurable metric to show any areas that have a higher chance of defects based on complexity actually have appropriate unit test coverage (i.e. a metric away out of '100%' or '80%' coverage by changing the metric to '100% of Cyclomatic Complexity >= 10 for example).
My usecase is currently Java/junit, and a different approach to reach the same intent would also be helpful (doesn't have to be exactly method/function based on Cyclomatic, but similar type of measurement).
EDIT: if there is a code coverage tool with similar features for both java and .NET, that would be phenomenal.
thanky!
-Darren

Disclaimer: I'm a Clover developer at Atlassian. I see that you've tagged your question with 'clover' so I'm answering :-)
In Clover you can do it in two ways (also combine them):
1) You can define a context filter and do not instrument methods with a cyclomatic complexity <= N. See <clover-setup maxComplexity="NN">
2) You can define your custom metric, for instance multiply coverage value (or better - the 'uncovered' value) by a cyclomatic complexity. Such metric would show complex, uncovered methods as more significant that simple, uncovered ones. See <clover-report> / <columns>
Oh. One more thing. Clover automatically does it for you :-) In the HTML report you can see a "tag cloud" of Top Project Risks metrics which takes coverage+complexity to show the most potentially critical parts of the application.
References:
https://confluence.atlassian.com/display/CLOVER/clover-setup
https://confluence.atlassian.com/display/CLOVER/clover-report
https://confluence.atlassian.com/pages/viewpage.action?pageId=71600304
https://confluence.atlassian.com/display/CLOVER/Treemap+Charts

What is Cyclomatic Complexity?

A term that I see every now and then is "Cyclomatic Complexity". Here on SO I saw some Questions about "how to calculate the CC of Language X" or "How do I do Y with the minimum amount of CC", but I'm not sure I really understand what it is.
On the NDepend Website, I saw an explanation that basically says "The number of decisions in a method. Each if, for, && etc. adds +1 to the CC "score"). Is that really it? If yes, why is this bad? I can see that one might want to keep the number of if-statements fairly low to keep the code easy to understand, but is this really everything to it?
Or is there some deeper concept to it?

I'm not aware of a deeper concept. I believe it's generally considered in the context of a maintainability index. The more branches there are within a particular method, the more difficult it is to maintain a mental model of that method's operation (generally).
Methods with higher cyclomatic complexity are also more difficult to obtain full code coverage on in unit tests. (Thanks Mark W!)
That brings all the other aspects of maintainability in, of course. Likelihood of errors/regressions/so forth. The core concept is pretty straight-forward, though.

Cyclomatic complexity measures the number of times you must execute a block of code with varying parameters in order to execute every path through that block. A higher count is bad because it increases the chances for logical errors escaping your testing strategy.

Cyclocmatic complexity = Number of decision points + 1
The decision points may be your conditional statements like if, if … else, switch , for loop, while loop etc.
The following chart describes the type of the application.
Cyclomatic Complexity lies 1 – 10  To be considered Normal
applicatinon
Cyclomatic Complexity lies 11 – 20  Moderate application
Cyclomatic Complexity lies 21 – 50  Risky application
Cyclomatic Complexity lies more than 50  Unstable application

Wikipedia may be your friend on this one: Definition of cyclomatic complexity
Basically, you have to imagine your program as a control flow graph and then
The complexity is (...) defined as:
M = E − N + 2P
where
M = cyclomatic complexity,
E = the number of edges of the graph
N = the number of nodes of the graph
P = the number of connected components
CC is a concept that attempts to capture how complex your program is and how hard it is to test it in a single integer number.

Yep, that's really it. The more execution paths your code can take, the more things that must be tested, and the higher probability of error.

Another interesting point I've heard:
The places in your code with the biggest indents should have the highest CC. These are generally the most important areas to ensure testing coverage because it's expected that they'll be harder to read/maintain. As other answers note, these are also the more difficult regions of code to ensure coverage.

Cyclomatic Complexity really is just a scary buzzword. In fact it's a measure of code complexity used in software development to point out more complex parts of code (more likely to be buggy, and therefore has to be very carefully and thoroughly tested). You can calculate it using the E-N+2P formula, but I would suggest you have this calculated automatically by a plugin. I have heard of a rule of thumb that you should strive to keep the CC below 5 to maintain good readability and maintainability of your code.
I have just recently experimented with the Eclipse Metrics Plugin on my Java projects, and it has a really nice and concise Help file which will of course integrate with your regular Eclipse help and you can read some more definitions of various complexity measures and tips and tricks on improving your code.

That's it, the idea is that a method which has a low CC has less forks, looping etc which all make a method more complex. Imagine reviewing 500,000 lines of code, with an analyzer and seeing a couple methods which have oder of magnitude higher CC. This lets you then focus on refactoring those methods for better understanding (It's also common that a high CC has a high bug rate)

Each decision point in a routine (loop, switch, if, etc...) essentially boils down to an if statement equivalent. For each if you have 2 codepaths that can be taken. So with the 1st branch there's 2 code paths, with the second there are 4 possible paths, with the 3rd there are 8 and so on. There are at least 2**N code paths where N is the number of branches.
This makes it difficult to understand the behavior of code and to test it when N grows beyond some small number.

The answers provided so far do not mention the correlation of software quality to cyclomatic complexity. Research has shown that having a lower cyclomatic complexity metric should help develop software that is of higher quality. It can help with software quality attributes of readability, maintainability, and portability. In general one should attempt to obtain a cyclomatic complexity metric of between 5-10.
One of the reasons for using metrics like cyclomatic complexity is that in general a human being can only keep track of about 7 (plus or minus 2) pieces of information simultaneously in your brain. Therefore, if your software is overly complex with multiple decision paths, it is unlikely that you will be able to visualize how your software will behave (i.e. it will have a high cyclomatic complexity metric). This would most likely lead to developing erroneous or bug ridden software. More information about this can be found here and also on Wikipedia.

Cyclomatic complexity is computed using the control flow graph. The Number of quantitative measure of linearly independent paths through a program's source code is called as Cyclomatic Complexity ( if/ if else / for / while )

Cyclomatric complexity is basically a metric to figure out areas of code that needs more attension for the maintainability. It would be basically an input to the refactoring.
It definitely gives an indication of code improvement area in terms of avoiding deep nested loop, conditions etc.

That's sort of it. However, each branch of a "case" or "switch" statement tends to count as 1. In effect, this means CC hates case statements, and any code that requires them (command processors, state machines, etc).

Consider the control flow graph of your function, with an additional edge running from the exit to the entrance. The cyclomatic complexity is the maximum number of cuts we can make without separating the graph into two pieces.
For example:
function F:
if condition1:
...
else:
...
if condition2:
...
else:
...
Control Flow Graph
You can probably intuitively see why the linked graph has a cyclomatic complexity of 3.

Cyclomatric complexity is a measure of how complex a unit of software is.It measures the number of different paths a program might follow with conditional logic constructs (If ,while,for,switch & cases etc....). If you will like to learn more about calculating it here is a wonderful youtube video you can watch https://www.youtube.com/watch?v=PlCGomvu-NM
It is important in designing test cases because it reveals the different paths or scenarios a program can take .
"To have good testability and maintainability, McCabe recommends
that no program module should exceed a cyclomatic complexity of 10"(Marsic,2012, p. 232).
Reference:
Marsic., I. (2012, September). Software Engineering. Rutgers University. Retrieved from www.ece.rutgers.edu/~marsic/books/SE/book-SE_marsic.pdf

We Keep Coding

html mysql json google-apps-script actionscript-3 ms-access google-chrome google-maps reporting-services sql-server-2008