Intent detection confidence - watson-assistant

The question relates to IBM Watson Assistant dialog management.
I have implemented a digression workflow that is triggered by a certain user intent. Unfortunately, the minimum confidence level at which this intent is recognized is too low for my use case.
Is there a way to adjust/force a certain confidence level for specific intents?
Best regards.

Yes, you can do this globally for the top 10 intents that are returned by using
intents[0].confidence > 0.00
This means that your node will only trigger if the confidence is above a certain threshold.
Alternatively, you can combine this rule with the intent condition on your dialog node to make it more accurate. Specifying [0] means that Watson checks that the highest-scoring intent is above a certain confidence threshold.
#Intent && intents[0].confidence > 0.99
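To make the logic of the combined condition concrete, here is a minimal Python sketch of the same check applied outside the dialog editor. It is not Watson Assistant API code: the function name, the intent name "digress", and the shape of the intents list (dicts with "intent" and "confidence" keys, sorted by descending confidence) are assumptions made for illustration, and it assumes the intent test amounts to the top intent matching.

```python
def node_should_trigger(intents, wanted_intent, threshold):
    """Mimic a condition of the form `#wanted_intent && top confidence > threshold`.

    `intents` is assumed to be a list of dicts sorted by descending
    confidence, e.g. [{"intent": "digress", "confidence": 0.93}, ...].
    """
    if not intents:
        return False
    top = intents[0]  # highest-scoring intent, like index [0] in the condition
    return top["intent"] == wanted_intent and top["confidence"] > threshold


# Hypothetical recognition result, for illustration only.
example = [
    {"intent": "digress", "confidence": 0.995},
    {"intent": "greeting", "confidence": 0.003},
]
triggered = node_should_trigger(example, "digress", 0.99)
```

Raising the threshold makes the node harder to trigger, which is the effect the question asks for.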

State definition in Reinforcement learning

When defining the state for a specific problem in reinforcement learning, how do you decide what to include and what to leave out, and how do you distinguish between an observation and a state?
For example, assume the agent operates in a human resources and planning context, where it needs to hire workers based on the demand for jobs while considering the cost of hiring them (the budget is limited). Is a state of the form (# workers, cost) a good definition of state?
Overall, I don't know what information needs to be in the state and what should be left out because it is rather an observation.
Thank you
I am assuming you are formulating this as an RL problem because the demand is an unknown quantity. Optionally, the cost of hiring may also take into account a worker's contribution towards the job, which is unknown initially. If, however, both of these quantities are known or can be approximated beforehand, then you can just run a planning algorithm to solve the problem (or some other form of optimization).
Having said this, the state in this problem could be something as simple as (#workers). Note I'm not including the cost, because cost must be experienced by the agent, and therefore is unknown to the agent until it reaches a specific state. Depending on the problem, you might need to add another factor of "time", or the "job-remaining".
Most of the theoretical results in RL hinge on the key assumption that the environment is Markovian. There are several works where you can get by without this assumption, but if you can formulate your environment in a way that exhibits this property, you will have many more tools to work with. The key idea is that the agent can decide which action to take (in your case, one action could be: hire one more person; another could be: fire a person) based only on the current state, say (#workers = 5, time = 6). Note that we are not distinguishing between workers yet, so the action is firing "a" person rather than firing a specific person x. If the workers have differing capabilities, you may need to add several other factors representing which workers are currently hired and which are still in the pool, yet to be hired, for example as a boolean array of fixed length. (I hope you get the idea of how to form a state representation; it can vary based on the specifics of the problem, which are missing from your question.)
Now, once we have the state definition S and the action definition A (hire / fire), we have the "known" quantities for an MDP setup in an RL framework. We also need an environment that can supply us with the cost when we query it (reward function / cost function) and tell us the outcome of taking a certain action in a certain state (transition function). Note that we don't necessarily need to know the reward and transition functions beforehand, but we should have a means of getting these values when we query a specific (state, action) pair.
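As a rough illustration of this setup, the sketch below defines a state of the form (#workers, time), the two actions hire and fire, and an environment that returns the next state and the cost only when queried. Everything numeric here is made up: the wage, the shortfall penalty, the demand range, and the class and function names are assumptions, not part of the original problem.

```python
import random
from dataclasses import dataclass

HIRE, FIRE = "hire", "fire"  # the two actions named in the answer


@dataclass(frozen=True)
class State:
    workers: int
    time: int


class HiringEnv:
    """Toy environment; the demand range and cost numbers are invented."""

    def __init__(self, wage=1.0, shortfall_penalty=3.0, seed=0):
        self.wage = wage
        self.shortfall_penalty = shortfall_penalty
        self.rng = random.Random(seed)

    def step(self, state, action):
        """Return (next_state, cost) for a queried (state, action) pair."""
        delta = 1 if action == HIRE else -1
        workers = max(0, state.workers + delta)   # cannot go below zero staff
        demand = self.rng.randint(0, 10)          # hidden from the agent
        shortfall = max(0, demand - workers)
        cost = self.wage * workers + self.shortfall_penalty * shortfall
        return State(workers, state.time + 1), cost


env = HiringEnv()
next_state, cost = env.step(State(workers=5, time=6), HIRE)
```

The agent never sees the demand or the cost function directly; it only experiences the cost after acting, which is why the cost is left out of the state.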
Coming to your final part, the difference between observation and state: there are much better resources for digging deep into this, but in a crude sense, an observation is an agent's (any agent: AI, human, etc.) sensory data. For example, in your case the agent has the ability to count the number of workers currently employed (but it cannot distinguish between workers).
A state, or more formally a true MDP state, must be something that is Markovian and captures the environment at its fundamental level. So, in order to determine the true cost to the company, the agent may need to be able to differentiate between workers, the working hours of each worker, the jobs they are working on, interactions between workers, and so on. Note that many of these factors may not be relevant to your task, for example a worker's gender. Typically one would like to form a good hypothesis beforehand about which factors are relevant.
Now, even though we can agree that a worker's assignment (to a specific job) may be a relevant feature when deciding whether to hire or fire them, your observation does not contain this information. So you have two options: either you ignore the fact that this information is important and work with what you have available, or you try to infer these features. If your observation is incomplete for the decision making in your formulation, the environment is typically classified as partially observable (and POMDP frameworks are used for it).
I hope I clarified a few points. However, there is a huge body of theory behind all of this, and the question you asked about "coming up with a state definition" is a matter of research (much like feature engineering and feature selection in machine learning).

Action masking for continuous action space in reinforcement learning

Is there a way to model action masking for continuous action spaces? I want to model economic problems with reinforcement learning. These problems often have continuous action and state spaces. In addition, the state often influences what actions are possible and, thus, the allowed actions change from step to step.
Simple example:
The agent has wealth (continuous state) and decides how much to spend (continuous action). The next period's wealth is then wealth minus spending. But he is restricted by the budget constraint: he is not allowed to spend more than his wealth. What is the best way to model this?
What I tried:
For discrete actions it is possible to use action masking. So in each time step, I provided the agent with information about which actions are allowed and which are not. I also tried to do this with a continuous action space by providing lower and upper bounds on the allowed actions and clipping the actions sampled from the actor network (e.g. DDPG).
I am wondering if this is a valid thing to do (it works in a simple toy model) because I did not find any RL library that implements this. Or is there a smarter way/best practice to include the information about allowed actions to the agent?
I think you are on the right track. I've looked into masked actions and found two possible approaches: give a negative reward when trying to take an invalid action (without letting the environment evolve), or dive deeper into the neural network code and let the neural network output only valid actions.
I've always considered this last approach as the most efficient, and your approach of introducing boundaries seems very similar to it. So as long as this is the type of mask (boundaries) you are looking for, I think you are good to go.
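The bounding idea can be sketched in a few lines of plain Python. The first helper clips a sampled action to the state-dependent interval, as described in the question. The second is an alternative not mentioned in the thread: it squashes an unbounded actor output with tanh and rescales it, so only valid actions can come out. The function names and the numbers are assumptions for illustration.

```python
import math


def clip_action(raw_action, low, high):
    """Clip a sampled action into the state-dependent interval [low, high]."""
    return min(max(raw_action, low), high)


def scale_action(actor_output, low, high):
    """Map an unbounded actor output into [low, high] via tanh squashing."""
    squashed = math.tanh(actor_output)  # now in [-1, 1]
    return low + 0.5 * (squashed + 1.0) * (high - low)


wealth = 10.0
spending = clip_action(12.5, 0.0, wealth)  # the budget constraint binds here
next_wealth = wealth - spending
```

With clipping, all raw actions above the bound map to the same executed action, so the agent receives no signal about how far it overshot; the rescaling variant avoids that at the price of changing the action parameterization.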

Why Q-Learning is Off-Policy Learning?

Hello Stack Overflow Community!
Currently, I am following David Silver's Reinforcement Learning lectures and I am really confused by one point in his "Model-Free Control" slides.
In the slides, Q-Learning is considered off-policy learning. I could not understand the reason behind that. He also mentions that we have both a target policy and a behaviour policy. What is the role of the behaviour policy in Q-Learning?
When I look at the algorithm, it looks very simple: update your Q(s,a) estimate using the maximum of Q(s',a'). The slides say "we choose the next action using the behaviour policy", but here we choose only the maximum one.
I am so confused about the Q-Learning algorithm. Can you help me please?
Link to the slides (pages 36-38):
http://www0.cs.ucl.ac.uk/staff/d.silver/web/Teaching_files/control.pdf
Check this answer first: https://stats.stackexchange.com/a/184794
As far as I know, the target policy is what we set as our policy; it could be epsilon-greedy or something else. But in the behaviour policy, we just use a greedy policy to select the action without even considering what our target policy is. So it estimates our Q assuming a greedy policy were followed, despite the fact that it is not following a greedy policy.
Ideally, you want to learn the true Q-function, i.e., the one that satisfies the Bellman equation
Q(s,a) = R(s,a) + gamma*E[Q(s',a')] forall s,a
where the expectation is over a' w.r.t the policy.
First, we approximate the problem and get rid of the "forall" because we have access to only a few samples (especially with continuous actions, where the "forall" results in infinitely many constraints). Second, say you want to learn a deterministic policy (if there is an optimal policy, there is a deterministic optimal policy). Then the expectation disappears, but you need to collect samples somehow. This is where the "behavior" policy comes in, which is usually just a noisy version of the policy you want to optimize (the most common choices are epsilon-greedy, or adding Gaussian noise if the action is continuous).
So now you have samples collected from a behavior policy and a target policy (deterministic) which you want to optimize.
The resulting equation is
Q(s,a) = R(s,a) + gamma*Q(s',pi(s'))
the difference between the two sides is the TD error and you want to minimize it given samples collected from the behavior policy
min E[R(s,a) + gamma*Q(s',pi(s')) - Q(s,a)]
where the expectation is approximated with samples (s,a,s') collected using the behavior policy.
If we consider Soroush's pseudocode, then for discrete actions pi(s') = argmax_A Q(s',A), and the update rule is the derivative of the TD(0) error.
EDIT
Just to underline the difference between on- and off-policy. SARSA is on-policy, because the TD error to update the policy is
min E[R(s,a) + gamma*Q(s',a') - Q(s,a)]
a' is the action collected while sampling data using the behavior policy, and it's not pi(s') (the action that the target policy would choose in state s').
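The two TD targets can be compared side by side in a small tabular sketch. The dictionary-based Q-table, the state and action names, and the step size are made up for illustration; only the form of the two targets follows the equations above.

```python
def q_learning_update(Q, s, a, r, s_next, alpha, gamma):
    """Off-policy: bootstrap from the greedy (target-policy) action in s_next."""
    target = r + gamma * max(Q[s_next].values())
    Q[s][a] += alpha * (target - Q[s][a])


def sarsa_update(Q, s, a, r, s_next, a_next, alpha, gamma):
    """On-policy: bootstrap from a_next, the action actually taken in s_next."""
    target = r + gamma * Q[s_next][a_next]
    Q[s][a] += alpha * (target - Q[s][a])


def make_q():
    """Tiny hypothetical Q-table with two states and two actions."""
    return {
        "s0": {"left": 0.0, "right": 0.0},
        "s1": {"left": 1.0, "right": 2.0},
    }


q_off, q_on = make_q(), make_q()
# Same transition (s0, left, r=0, s1); the behavior policy then explored "left".
q_learning_update(q_off, "s0", "left", 0.0, "s1", alpha=0.5, gamma=0.9)
sarsa_update(q_on, "s0", "left", 0.0, "s1", "left", alpha=0.5, gamma=0.9)
```

Both updates consume the same sample, but Q-learning ignores which action the behavior policy took next, while SARSA uses it. That is the whole on-policy versus off-policy distinction in this setting.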
@Soroush's answer is only right if the red text is exchanged. Off-policy learning means you try to learn the optimal policy $\pi$ using trajectories sampled from another policy or policies. This means $\pi$ is not used to generate actual actions that are being executed in the environment. Since A is the executed action from the $\epsilon$-greedy algorithm, it is not from $\pi$ (the target policy) but another policy (the behavior policy, hence the name "behavior").

Downloading Quotes in CSV format from Yahoo Finance - Beta symbol?

By using http://finance.yahoo.com/d/quotes.csv?s=STOCKNAME&f= I am able to download a CSV file. Does anyone know what the symbol for beta is? It should go after &f=; e.g. the symbol for the stock name is n, and it goes in as such: http://finance.yahoo.com/d/quotes.csv?s=STOCKNAME&f=n
Thanks in advance for your help!
Unfortunately, you can't.
There is no beta 'symbol' that lets you download beta using Yahoo's CSV API.
With that being said, it may be important to note:
"Though plenty of financial sites provide them, what risks are you taking by using one of the betas provided by an outside source? Betas provided for you by online services have unknown variable inputs, which in all likelihood are not adaptive to your unique portfolio."
Crucially:
Provided betas are calculated with time frames unknown to their consumers.
Another problem may be the index used to calculate beta.
Another unknown factor of pre-made betas is the method used to calculate them.
Yahoo may therefore not provide beta due to it being liable to misinterpretation based on the above (though this is purely speculative).
So then what?
It's actually pretty straightforward to calculate yourself. All you need to do is:
Decide your time horizon for measurement
Decide an appropriate market to measure against
Ensure your chosen investment and market share matching data points across the chosen period (for ease of calculation)
Decide an appropriate risk-free rate of return
Decide your model of calculation (e.g. regression or the capital asset pricing model, 'CAPM')
The methodology for performing the calculation then depends on what you're trying to accomplish and within what (programming) environment.
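As one possible way to carry out these steps, the sketch below estimates beta as the regression slope of asset returns on market returns (covariance divided by market variance), using only the Python standard library. The prices are invented, the risk-free rate is assumed constant per period, and the function names are hypothetical; real data would come from whatever source you choose.

```python
from statistics import mean


def simple_returns(prices):
    """Period-over-period returns from a price series."""
    return [p1 / p0 - 1.0 for p0, p1 in zip(prices, prices[1:])]


def beta(asset_returns, market_returns, risk_free=0.0):
    """Slope of the regression of excess asset returns on excess market returns."""
    if len(asset_returns) != len(market_returns):
        raise ValueError("return series must share matching data points")
    a = [r - risk_free for r in asset_returns]
    m = [r - risk_free for r in market_returns]
    a_bar, m_bar = mean(a), mean(m)
    cov = sum((x - a_bar) * (y - m_bar) for x, y in zip(a, m))
    var = sum((y - m_bar) ** 2 for y in m)
    return cov / var


# Made-up prices, for illustration only.
market_prices = [100.0, 102.0, 101.0, 104.0, 103.0]
market = simple_returns(market_prices)
asset = [2.0 * r for r in market]  # an asset that moves twice as much as the market
asset_beta = beta(asset, market)
```

The choice of index, time horizon, and return frequency all change the result, which is exactly why a pre-made beta from an outside source is hard to interpret.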

How to prioritize bugs?

In my current company there isn't a clear understanding between the test and development teams as to how severe a bug should be. There are arguments which go back and forth to reduce or increase the severity. As of now, we are not aware of any documents which lay down the rules. The tester raises the bug and assigns a priority based on intuition. The developer then requests a change based on workload or some other factor.
How are the severity and priority of bugs classified? Are there any standards which guide how software defect priorities need to be determined based on customer needs, timelines and other things?
Use priority levels that deliberately have nothing to do with severity or impact, and describe only the conceptual position of the bug in the schedule. This field will determine which bugs get worked on, so it will be a target for negotiation.
Use severity levels that deliberately have concrete, verifiable definitions not open to negotiation, that have nothing to do with scheduling or priority. I've worked successfully with the severity definitions used by the Debian BTS, generalised to apply to programming projects in general.
That way, the severity is much more a matter of verifiable fact, independent of a statement of priority. The priority is then free to be tweaked up and down by negotiation or whatever, without affecting the factual information in the severity field.
Attempting to conflate both “severity” and “priority” into a single field will lead to soul-draining arguments and wasted time. The bug reporter needs a firm guide of fact to determine how “bad” the bug is, and this needs to be easily agreed on by independent parties. The priority, on the other hand, is the correct target for negotiation and scheduling games.
I work on emergency control centre systems, so this set of bug levels is a little, well... extreme:
someone dies
total system failure requiring DR invocation
server failure requiring engineer response
failure involving loss of call continuity
failure involving loss of data
incorrect data recorded
application failure - non-recoverable
application failure - non-recoverable, but automatically restarted
does not meet requirement spec, no workaround
does not meet requirement spec, but has workaround
cosmetic - layout etc.
actually a feature request
That's off the top of my head. In case you were wondering, it's from most extreme to least :-)
Some stuff we used before. We split the defect rating into priority and severity.
Severity (set by submitter during submission of defect)
Highest (5): Data loss, hardware damage possible, or a security-related failure
High (4): Loss of functionality without any reasonable workaround
Medium (3): Loss of functionality with a reasonable workaround
Low (2): Partial loss of a function or a feature set (feature still hits the design requirements)
Lowest (1): A cosmetic error
Priority (adjusted by development, management and QA during defect evaluation)
Highest (5): The system is practically unusable with this defect.
High (4): The defect will have a serious impact on the company’s ability to sell and maintain this system.
Medium (3): The company will lose some money if this defect is in the system, but it might be more important to meet the schedule. Fix after release.
Low (2): Do not delay the release, but do fix this problem afterwards.
Lowest (1): Fix as time and resources allow.
Both numbers together create a risk priority number (RPN): simply multiply severity by priority. A higher result means higher risk. 25 defines the ultimate defect bomb. 1 can be done during idle time or if someone is bored and needs something to do.
First goal: Defects with a rating of highest or high of any kind should be fixed before release.
Second goal: Defects with RPN > 8 should be fixed before releasing the product.
This is of course a little bit artificial, but it helps to give all parties (Support, QA/Test, Engineering, and Product Managers) a tool to set priorities without dismissing the opinion of the other side.
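The RPN arithmetic and the two release goals can be written down as a short sketch. The function names are made up, and treating "highest or high of any kind" as a rating of 4 or 5 on either scale is my reading of the first goal.

```python
def rpn(severity, priority):
    """Risk priority number: severity times priority, each on a 1-5 scale."""
    for value in (severity, priority):
        if not 1 <= value <= 5:
            raise ValueError("ratings must be between 1 and 5")
    return severity * priority


def fix_before_release(severity, priority):
    """Apply both goals: any High/Highest rating, or an RPN above 8."""
    return severity >= 4 or priority >= 4 or rpn(severity, priority) > 8


worst_case = rpn(5, 5)                      # the "ultimate defect bomb"
medium_medium = fix_before_release(3, 3)    # RPN 9, caught by the second goal
low_medium = fix_before_release(2, 3)       # RPN 6, may ship
```

Note that the second goal only adds one combination beyond the first, namely Medium severity with Medium priority, since every other product above 8 already involves a rating of 4 or 5.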
Replace your bug tracking system with FogBugz and get rid of the severity field altogether.
See Priority vs Severity
"Been there done that".
I've had this discussion over and over again, on different projects. We've tried to combine priority with severity, but the lesson I've learned is: do not combine severity with priority!
We've had a lot of brainstorms and meetings which ended with the words "this is it". Multiple guideline documents have been created and spread between the different "parties", but after a while we discovered that it didn't work in the end. Different "parties" think differently about bugs: our helpdesk has a different understanding of priority than the development team or the sales team has.
Having both a severity and a priority level will very quickly become very confusing because:
when using numbers (between 1 to 5) one will not know what each number means
what if an issue has the highest possible priority, but the lowest possible severity - and I'm sure that this will happen!
what if someone reduces a severity, does he need to reduce the priority also?
"So what should you do then?":
Only use one kind of indicator for the 'level' of an issue; it doesn't matter what you call it.
Use numbers (e.g. 1 - 5, but could be more or less depending on your needs) to clearly indicate the importance, but combine each with a keyword so that it's clear what it means (e.g. 'nice to have', 'show stopper'). For some people prio 1 means the most important, for others 5 does, so a keyword to indicate what a number means is necessary.
Make a distinction between a 'normal issue' and a 'red alert'. In our case a 'Red Alert' must be solved immediately and put into production immediately. A normal issue will follow the normal development-test-deployment flow. The priority/severity/whatever-you-call-it should only be set for normal issues and will be ignored for 'red alerts'.
In practice, a 'Red Alert' can become a 'Normal Issue': the support team discovered a major bug and created a 'Red Alert'. But after some investigation we discovered that data had become 'corrupt' in the database since it was inserted there directly and not via the application.
Choose a good tool that allows you to customize the flow; but most tools do.
As for a standard, there is the IEEE guide to classification for software anomalies (IEEE 1044.1-1995), although I am not sure how widely it is adopted.
One option is to have the product owner determine the priority of the bug. While there is some general intuition about how "bad" a bug is, it can be the responsibility of the product owner to set an order of precedence (i.e. bug A should be fixed before bug B, etc.).
The more clear and concise information that can be provided to the product owner, the easier it is for that individual to make those determinations (e.g. how many users have experienced the bug, what features are unavailable as a result of the bug, etc.).
1. Must be done now
2. Must be done before we ship
3. Minor annoyance (doesn't prevent the user from exercising the functionality)
4. Edge case/Remote/Tester-from-Mordor scenario
Well, I just made that up... my point being that categorizing bugs should not be a weekly hour-plus-long ritual.
IMHO, prioritizing according to a flowchart is wasted time. Fix bugs in Cat#1 and Cat#2 as quickly as they surface. If you find yourself swamped by bugs, slow down and reflect. Defer Cat#3 and Cat#4 if the schedule doesn't permit or higher-priority items override.
The critical thing is that all of you have a shared understanding of this severity and expected quality. Don't let compliance to the holy standards of X slow you down from delivering what the customer wants... working software.
Personally I favour the two-tier severity/priority model. I know the arguments for a single level, but in the places I've worked I've generally seen a two-level hierarchy work better.
Severity is set by the support team (based on input from the client). Priority is set by the client (with input from the support team).
For severity I use:
1 - Blocker/show stopper
2 - Major functionality unavailable (or effectively unavailable), no practical work around possible
3 - Major functionality unavailable (or ...), work around possible
4 - Minor functionality unavailable (or effectively unavailable), no work around possible
5 - Minor functionality unavailable (or ...), work around possible
6 - Cosmetic or other trivial
Then for priority I just use High, Medium, Low but anything from 3 - 5 levels works (much more than that is just over the top).
I'd generally then order by Priority first and then severity within that. The important thing about this is that the client has the most important say. If they say the way their logo is printing out on a report is the highest priority then that's what gets looked at BUT it gets looked at after the other client's high priority which is stopping them logging in.
Generally speaking I wouldn't release with any high priority issues or any medium priority issues with severity 1 - 4. Obviously in an ideal world you'd fix everything but I've never been lucky enough to have that option.
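The ordering and release rule described here can be sketched as follows. The dictionary layout, the bug ids, and the function names are assumptions for illustration; the rule itself (priority first, then severity, with High and Medium-with-severity-1-to-4 blocking a release) follows the text above.

```python
PRIORITY_RANK = {"High": 0, "Medium": 1, "Low": 2}


def work_order(bugs):
    """Sort by priority first, then by severity (1 = most severe) within it."""
    return sorted(bugs, key=lambda b: (PRIORITY_RANK[b["priority"]], b["severity"]))


def blocks_release(bug):
    """High priority always blocks; Medium blocks only at severity 1-4."""
    if bug["priority"] == "High":
        return True
    return bug["priority"] == "Medium" and 1 <= bug["severity"] <= 4


# Hypothetical bugs mirroring the logo and login example.
bugs = [
    {"id": "logo-on-report", "priority": "High", "severity": 6},
    {"id": "cannot-log-in", "priority": "High", "severity": 1},
    {"id": "minor-export-glitch", "priority": "Medium", "severity": 5},
]
ordered_ids = [b["id"] for b in work_order(bugs)]
```

Both High-priority bugs come first, and within that group the login blocker outranks the cosmetic logo issue, which matches the behaviour described in the answer.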
The tester tells what is broken
The developer estimates how much work it will be to fix
The customer decides the business value, i.e. the priority.
Set the requirements of the project so you can base the priority of a fix on the priority of the requirements affected by the bug.
I had the same issue with one of our customers. In the end we set up a document together describing what kinds of bugs would match a certain severity. Aside from an occasional discussion, using this document as a guideline appears to work.
But be well aware that test teams and development teams may have very different opinions on what is a severe bug and what is not. From the point of view of the testers a small layout bug can be high priority when a developer would just say that no one will notice.
In our document those bugs can be high priority if they are "brand damaging", i.e. if the layout bug is in the logo or one of the products then it is severe - if it's just a paragraph on the page that is 2 pixels off then it's not.
I use the following categories both for features and bugs:
1. Showstopper, the program (or a major feature) will not work
2. Must have, a significant part of the customers will be bothered by this
3. Would have, some customers will be bothered
4. Nice to have, a few customers want this
Normally you plan to fix 1, 2 and 3, but 3 is often postponed to the next release due to time constraints.
I think this is the scale we used at a previous job:
1. Causes loss of files or system instability.
2. Crashes the program.
3. Feature doesn't work.
4. Feature doesn't work, but there are workarounds.
5. Cosmetic issue.
6. Request for enhancement.
Sometimes this was abused - if a feature was so poorly designed that someone couldn't figure out how to use it, that was classified as a 6, and it never got fixed.
I agree with the FogBugz folks that this should be kept super simple: http://fogbugz.stackexchange.com/questions/352/priority-vs-severity
I made up this scheme, which I find easy to remember:
pS: seconds matter, eg, server is on fire
pM: minutes matter, eg, something is broken
pH: hours matter, ie, don't go to bed till this is done
pd: days matter, ie, normal priority
pw: weeks matter, ie, lower priority
pm: months matter, ie, no hurry
py: years matter, ie, maybe/someday, ie, wishlist
It roughly parallels Debian's scheme: http://www.debian.org/Bugs/Developer#severities
I like it because it straightforwardly combines priority and severity into a single field that's easy to set a value for.
PS: You can also pick intermediate urgencies like "pMH" for in between "minutes matter" and "hours matter". Or "pHd" is in between "hours matter" and "days matter" -- roughly, "don't literally pull an all-nighter for it but don't work on anything else till it's done".