Finite-state Machines: Number of Possible Answers - equation

I'm wondering if there is an equation that gives the number of possible configurations of any given finite-state machine built to handle n inputs and m states.
How many possible solutions are there to any given process when using a finite-state machine to describe it?
I'm asking because I have a problem to solve using finite-state machines and I want to know if there is only one possible solution or many.
[Problem]
Build a finite state machine that produces an output of 1 if the input X, which can take the value 0 or 1, was 101 over the last three clock cycles. X is updated each clock cycle. There are four possible states S0, S1, S2, and S3.

The number of configurations in a FSM is the number of states. It has no memory or context to differentiate being-in-state-X-now from being-in-state-X-then.
Are you talking about the potential paths through the states? i.e. the output sequence it emits as it transitions or, equivalently, the inputs which result in a termination state? These are potentially infinite, depending on the machine.
FSMs are very, very simple. If you're not sure whether you can use them, perhaps you don't have a clear description of the problem.
What is the actual problem?

Related

OpenAI gym GuessingGame-v0 possible solutions

I have been struggling to solve the GuessingGame-v0 environment which is part of the OpenAI gym.
In the environment each episode a random number within a range is selected and the agent must "guess" what this random number is. The agent is only provided with the observation of whether the guess was too large or too small.
After researching how to frame the problem I think it may be possible to frame the problem as a Hidden Markov Model, but I am unsure of how to do this.
Each episode the randomly selected number changes and because of this I don't know how the model won't have to change each episode as the goal state is continually shifting.
I could not find any resources on the environment or any environments similar to it other than the documentation provided by OpenAI which I did not find useful.
I would greatly appreciate any assistance on how to solve this environment.
I'm putting this as an answer so people don't have to read through the list of comments.
You need a program that can simply cycle through:
generate the random number
agent guesses a number (within the allowable guess range)
test whether the number is within 1%.
if the number is within 1%, stop the iteration, maybe print the guess at this point
if the iteration is at step 200, stop the iteration and maybe produce some out that gives the final guessed number and the fact it is not within 1%
if not 200 steps or 1%: a) if the number is too high, record the guess and that it is too high, or b) if the number is too low, record the guess and that it is too low. Iterate through that number bound. Repeat until either the 1% or 200 steps criterion is reached.
Another thought for you: do you need a starting low number and a starting high number?
There are a number of ways in which to implement this solution. There is also a range of programming software in which the solution can be implemented. The particular software you use is probably the one with which you are most familiar.
Good luck!

the number of the output unit and the loss function in binary classification network

Say I have a binary classification task, and I build a neural network to do this.
There are two different framework to choose in which the first is the network has one output unit indicating the probability belonging to one of the class, thus I can use the binary cross-entropy to compute the loss, the second is the network has two output units indicating the probabilities belonging to the two classes separately, also I can use the softmax cross-entropy to compute the loss.
Some suggests to use the first option, my confusion is that what the pros and cons of the two options are, and what the severest problem is if I choose the second framework? Can anyone explain this in detail to me? Thanks in advance.
If you use one output unit then you should understand that you are choosing strictly between two classes. If the probability is high enough then your netwrok chooses class A, otherwise it chooses class B. If you have two output units your network may produce rather low probability for both your units so you will end up with neither A nor B. You should choose among those two approaches depending on what is the real system you're trying to model with your network.

Computation consideration with different Caffe's network topology (difference in number of output)

I would like to use one of Caffe's reference model i.e. bvlc_reference_caffenet. I found that my target class i.e. person is one of the classes included in the ILSVRC dataset that has been trained for the model. As my goal is to classify whether a test image contains a person or not, I may achieve this by the following:
Use inference directly with 1000 number of output. This doesn't
require training/learning.
Change the network topology a little bit with the final FC layer's number of output (num_output) is set to 2 (instead of 1000). Retrain it as a binary classification problem.
My concern is about computational effort at deployment/prediction phase (testing). The latter looks more expensive computationally than the former. This is because during prediction phase it needs to compute those 1000 output possibilities to find the one with the highest score. What I'm not sure is that, it could be the case that there's a heuristic (which I'm not aware of) that simplifies the computation.
Can somebody please help cross check my understanding on this.

Is standard deviation (STDDEV) the right function for the job?

We wrote a monitoring system. This monitor is made of agents. Each agent runs on a different server, and monitors that specific server resources (RAM, CPU, SQL Server Status, Replication Status, Free Disk Space, Internet Access, specific bussiness metrics, etc.).
The agents report every measure they take to a central database where these "observations" are stored.
For example, every few seconds an agent would store in the central database a specific bussiness metric called "unprocessed_files" with its corresponding value:
(unprocessed_files, 41)
That value is constanty being written to our DB (among many others, as explained above).
We are now implementing a client application, a screen, that displays the status of every thing we monitor. So, how can we calculate what's a "normal" value and what's a wrong value?
For example, we know that if our servers are working correctly, the unprocessed_files would always be close to 0, but maybe (We don't know yet), 45 is an acceptable value.
So the question is, should we use the Standard Deviation in order to know what the acceptable range of values is?
ACCEPTABLE_RANGE = AVG(value) +- STDDEV(value) ?
We would like to notify with a red color when something is not going well.
For your backlog (unprocessed file) metric, using a standard deviation to know when to sound an alarm (turn something red) is going to drive you crazy with false alarms.
Why? most of the time your backlog will be zero. So, the standard deviation will also be very close to zero. Standard deviation tells you how much your metric varies. Therefore, whenever you get a nonzero backlog, it will be outside the avg + stdev range.
For a backlog, you may want to turn stuff yellow when the value is > 1 and red when the value is > 10.
If you have a "how long did it take" metric, standard deviation might be a valid way to identify alarm conditions. For example, you might have a web request that usually takes about half a second, but typically varies from 0.25 to 0.8 second. If they suddenly start taking 2.5 seconds, then you know something has gone wrong.
Standard deviation is a measurement that makes most sense for a normal distribution (bell curve distribution). When you handle your measurements as if they fit a bell curve, you're implicitly making the assumption that each measurement is entirely independent of the others. That assumption works poorly for typical metrics of a computing system (backlog, transaction time, load average, etc). So, using stdev is OK, but not great. You'll probably struggle to make sense of stdev numbers: that's because they don't actually make much sense.
You'd be better off, like #duffymo suggested, looking at the 95th percentile (the worst-performing operations). But MySQL doesn't compute those kinds of distributions natively. postgreSQL does. So does Oracle Standard Edition and higher.
How do you determine an out-of-bounds metric? It depends on the metric, and on what you're trying to do. If it's a backlog measurement, and it grows from minute to minute, you have a problem to investigate. If it's a transaction time, and it's far longer than average (avg + 3 x stdev, for example, you have a problem. The open source monitoring system Nagios has worked this out for various kinds of metrics.
Read a book by N. N. Taleb called "The Black Swan" if you want to know how assuming the real world fits normal distributions can crash the global economy.
Standard deviation is just a way of characterizing how much a set of values spreads away from its average (i.e. mean). In a sense, it's an "average deviation from average", though a little more complicated than that. It is true that values which differ from the mean by many times the standard deviation tend to be rare, but that doesn't mean the standard deviation is a good benchmark for identifying anomalous values that might indicate something is wrong.
For one thing, if you set your acceptable range at the average plus or minus one standard deviation, you're probably going to get very frequent results outside that range! You could use the average plus or minus two standard deviations, or three, or however many you want to reduce the number of notifications/error conditions as low as you want, but there's no telling whether any of this actually helps you identify error conditions.
I think your main problem is not statistics. Your problem is that you don't know what kinds of results actually indicate an error. So before you program in any acceptable range, just let the system run for a while and collect some calibration data showing what kinds of values you see when it's running normally, and what kinds of values you see when it's not running normally. Make sure you have some way to tell which are which. Once you have a good amount of data for both conditions, you can analyze it (start with a simple histogram) and see what kinds of values are characteristic of normal operation and what kinds are characteristics of error conditions. Then you can set your acceptable range based on that.
If you want to get fancy, there is a statistical technique called likelihood ratio testing that can help you evaluate just how likely it is that your system is working properly. But I think it's probably overkill. Monitoring systems don't need to be super-precise about this stuff; just show a cautionary notice whenever the readings start to seem abnormal.

comparision and addition of two integers : in detail

I would like to know how many machine cycles does it take to compare two integers and how many if I add that and which one is easier?
basically i m looking to see which one is more expensive generally ??
Also I need an answer from c, c++, java perspective ....
helps is appreciated thanks!!
The answer is yes. And no. And maybe.
There are machines that can compare two values in their spare time between cycles, and others that need several cycles. On the old PDP8 you first had to negate one operand, do an add, and then test the result to do a compare.
But other machines can compare much faster than add, because no register needs to be modified.
But on still other machines the operations take the same time, but it takes several cycles for the result of the compare to make it to a place where one can test it, so, if you can use those cycles the compare is almost free, but fairly expensive if you have no other operations to shove into those cycles.
The simple answer is one cycle, both operations are equally easy.
A totally generic answer is difficult to give, since processor architectures are amazingly complex when you get down into the details.
All modern processors are pipelined. That is, there are no instructions where the operands go in on cycle c, and the result is available on cycle c+1. Instead, the instruction is broken down into multiple steps.
The instructions are read into the front end of the processor, which decodes the instruction. This may include breaking it down into multiple micro-ops. The operands are then read into registers, and then the execution units handle the actual operation. Eventually the answer is returned back to a register.
The instructions go through one pipeline stage each cycle, and modern CPUs have 10-20 pipeline stages. So it could be upto 20 processor cycles to add or compare two numbers. However, once one instruction has been through one stage of the pipeline, another instruction can be read into that stage. The ideal is that each clock cycle, one instruction goes into the front end, while one set of results comes out the other.
There is massive complexity involved in getting all this to work. If you want to do a+b+c, you need to add a+b before you can add c. So a lot of the work in the front end of the processor involves scheduling. Modern processors employ out-of-order execution, so that the processor will examine the incoming instructions, and re-order them such that it does a+b, then gets on with some other work, and then does result+c once the result is available.
Which all brings us back to the original question of which is easier. Because usually, if you're comparing two integers, it is to make a decision on what to do next. Which means you won't know your next instruction until you've got the result of the last one. Because the instructions are pipelined, this means you can lose 20 clock cycles of work if you wait.
So modern CPUs have a branch predictor which makes a guess what the result will be, and continues executing the instructions. If it guesses wrong, the pipeline has to be thrown out, and work restarted on the other branch. The branch predictor helps enormously, but still, if the comparison is a decision point in the code, that is for more difficult for the CPU to deal with than the addition.
Comparison is done via subtraction, which is almost the same as addition, except that the carry and subtrahend are complemented, so a - b - c becomes a + ~b + ~c. This is already accounted for in the CPU and basically takes the same amount of time either way.