Calculate minimum number of test cases for statement coverage and condition coverage respectively - white-box-testing

Could someone draw a flow graph and explain how I get the answer to this question?
Question: Calculate the minimum number of test cases for statement coverage and minimum number of test cases for decision coverage respectively.
READ A
READ B
READ C
IF C>A THEN
IF C>B THEN
PRINT "C MUST BE SMALLER THAN AT LEAST ONE NUMBER"
ELSE
PRINT "PROCEED TO NEXT STAGE"
END IF
ELSE
PRINT "B CAN BE SMALLER THAN C"
END IF

Related

perMANOVA for small sample size

I have data of 6 groups with sample size of n = 2, 10, 2, 9, 3, 1 and I want to perform Permutational multivariate analysis of variance (PERMANOVA) on these data.
My question is: Is it correct to run perMANOVA on these data with the small sample size? The results look strange for me because the group of n = 1 showed insignificant difference to other groups although the graphical representation of the groups clearly show a difference.
Thank you
I would not trust any result with group of n=1 because there is no source of variation to define difference among groups.
I have also received some answers from other platforms. I put them here for information:
The sample size is simply too small to yield a stable solution via manova. Note that the n = 1 cell contributes a constant value for that cell's mean, no matter what you do by way of permutations.
Finally, note that the effective per-cell sample size with unequal cell n for one-way designs tracks well to the harmonic mean of n. For your data set as it stands, that means an "effective" per-cell n of about 2.4. Unless differences are gigantic on the DV set, no procedure (parametric or exact/permutation) will have the statistical power to detect differences with that size.
MANOVA emphasizes the attribute scattering in the study group and the logic of this analysis is based on the scattering of scores. It is not recommended to use small groups with one or more people (I mean less than 20 people) to perform parametric tests such as MANOVA. In my opinion, use non-parametric tests to examine small groups.

Understanding stateful LSTM [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 2 years ago.
Improve this question
I'm going through this tutorial on RNNs/LSTMs and I'm having quite a hard time understanding stateful LSTMs. My questions are as follows :
1. Training batching size
In the Keras docs on RNNs, I found out that the hidden state of the sample in i-th position within the batch will be fed as input hidden state for the sample in i-th position in the next batch. Does that mean that if we want to pass the hidden state from sample to sample we have to use batches of size 1 and therefore perform online gradient descent? Is there a way to pass the hidden state within a batch of size >1 and perform gradient descent on that batch ?
2. One-Char Mapping Problems
In the tutorial's paragraph 'Stateful LSTM for a One-Char to One-Char Mapping' were given a code that uses batch_size = 1 and stateful = True to learn to predict the next letter of the alphabet given a letter of the alphabet. In the last part of the code (line 53 to the end of the complete code), the model is tested starting with a random letter ('K') and predicts 'B' then given 'B' it predicts 'C', etc. It seems to work well except for 'K'. However, I tried the following tweak to the code (last part too, I kept lines 52 and above):
# demonstrate a random starting point
letter1 = "M"
seed1 = [char_to_int[letter1]]
x = numpy.reshape(seed, (1, len(seed), 1))
x = x / float(len(alphabet))
prediction = model.predict(x, verbose=0)
index = numpy.argmax(prediction)
print(int_to_char[seed1[0]], "->", int_to_char[index])
letter2 = "E"
seed2 = [char_to_int[letter2]]
seed = seed2
print("New start: ", letter1, letter2)
for i in range(0, 5):
x = numpy.reshape(seed, (1, len(seed), 1))
x = x / float(len(alphabet))
prediction = model.predict(x, verbose=0)
index = numpy.argmax(prediction)
print(int_to_char[seed[0]], "->", int_to_char[index])
seed = [index]
model.reset_states()
and these outputs:
M -> B
New start: M E
E -> C
C -> D
D -> E
E -> F
It looks like the LSTM did not learn the alphabet but just the positions of the letters, and that regardless of the first letter we feed in, the LSTM will always predict B since it's the second letter, then C and so on.
Therefore, how does keeping the previous hidden state as initial hidden state for the current hidden state help us with the learning given that during test if we start with the letter 'K' for example, letters A to J will not have been fed in before and the initial hidden state won't be the same as during training ?
3. Training an LSTM on a book for sentence generation
I want to train my LSTM on a whole book to learn how to generate sentences and perhaps learn the authors style too, how can I naturally train my LSTM on that text (input the whole text and let the LSTM figure out the dependencies between the words) instead of having to 'artificially' create batches of sentences from that book myself to train my LSTM on? I believe I should use stateful LSTMs could help but I'm not sure how.
Having a stateful LSTM in Keras means that a Keras variable will be used to store and update the state, and in fact you could check the value of the state vector(s) at any time (that is, until you call reset_states()). A non-stateful model, on the other hand, will use an initial zero state every time it processes a batch, so it is as if you always called reset_states() after train_on_batch, test_on_batch and predict_on_batch. The explanation about the state being reused for the next batch on stateful models is just about that difference with non-stateful; of course the state will always flow within each sequence in the batch and you do not need to have batches of size 1 for that to happen. I see two scenarios where stateful models are useful:
You want to train on split sequences of data because these are very long and it would not be practical to train on their whole length.
On prediction time, you want to retrieve the output for each time point in the sequence, not just at the end (either because you want to feed it back into the network or because your application needs it). I personally do that in the models that I export for later integration (which are "copies" of the training model with batch size of 1).
I agree that the example of an RNN for the alphabet does not really seem very useful in practice; it will only work when you start with the letter A. If you want to learn to reproduce the alphabet starting at any letter, you would need to train the network with that kind of examples (subsequences or rotations of the alphabet). But I think a regular feed-forward network could learn to predict the next letter of the alphabet training on pairs like (A, B), (B, C), etc. I think the example is meant for demonstrative purposes more than anything else.
You may have probably already read it, but the popular post The Unreasonable Effectiveness of Recurrent Neural Networks shows some interesting results along the lines of what you want to do (although it does not really dive into implementation specifics). I don't have personal experience training RNN with textual data, but there is a number of approaches you can research. You can build character-based models (like the ones in the post), where your input and receive one character at a time. A more advanced approach is to do some preprocessing on the texts and transform them into sequences of numbers; Keras includes some text preprocessing functions to do that. Having one single number as feature space is probably not going to work all that well, so you could simply turn each word into a vector with one-hot encoding or, more interestingly, have the network learn the best vector representation for each for, which is what they call en embedding. You can go even further with the preprocessing and look into something like NLTK, specially if you want to remove stop words, punctuation and things like that. Finally, if you have sequences of different sizes (e.g. you are using full texts instead of excerpts of a fixed size, which may or may not be important for you) you will need to be a bit more careful and use masking and/or sample weighting. Depending on the exact problem, you can set up the training accordingly. If you want to learn to generate similar text, the "Y" would be the similar to the "X" (one-hot encoded), only shifted by one (or more) positions (in this case you may need to use return_sequences=True and TimeDistributed layers). If you want to determine the autor, your output could be a softmax Dense layer.
Hope that helps.

How to print probability for repeated measures logistic regression?

I would like SAS to print the probability of my binary dependent variable occurring (“Calliphoridae” a particular fly family being present (1) or not (0), at a specific instance for my continuous independent variable (“degree_index” that was recorded from .055 to 2.89, but can be continuously recorded past 2.89 and always increases as time goes on) using Proc GENMOD. How do I change my code to print the probability, for example, that Calliphoridae is present at degree_index=.1?
My example code is:
proc genmod data=thesis descending ;
class Body_number ;
model Calliphoridae = degree_index / dist=binomial link=logit ;
repeated subject=Body_number/ type=cs;
estimate 'degreeindex=.1' intercept 1 degree_index 0 /exp;
estimate 'degree_index=.2' intercept 1 degree_index .1 /exp;run;
I get an output for the contrast estimate results as mean estimate at degree_index=.1 is ..99; degree_index=.2 is .98.
I think that it is correctly modeling the probability...I just didn't include the square of
the degree-day index. If you do, it allows the probability to increase and decrease. I
realized this when I did the probability by hand
(e^-1.1307x+.2119)/(1+e^-1.1307x+.2119) to verify that this really was modeling
probability when y=1 for the mean estimates at specific x values...and then I realized that it is
fitting a regression line and cannot increase and decrease because there is only
one x value. http://www.stat.sc.edu/~hansont/stat704/chapter14a.pdf

What is out of bag error in Random Forests? [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 4 years ago.
Improve this question
What is out of bag error in Random Forests?
Is it the optimal parameter for finding the right number of trees in a Random Forest?
I will take an attempt to explain:
Suppose our training data set is represented by T and suppose data set has M features (or attributes or variables).
T = {(X1,y1), (X2,y2), ... (Xn, yn)}
and
Xi is input vector {xi1, xi2, ... xiM}
yi is the label (or output or class).
summary of RF:
Random Forests algorithm is a classifier based on primarily two methods -
Bagging
Random subspace method.
Suppose we decide to have S number of trees in our forest then we first create S datasets of "same size as original" created from random resampling of data in T with-replacement (n times for each dataset). This will result in {T1, T2, ... TS} datasets. Each of these is called a bootstrap dataset. Due to "with-replacement" every dataset Ti can have duplicate data records and Ti can be missing several data records from original datasets. This is called Bootstrapping. (en.wikipedia.org/wiki/Bootstrapping_(statistics))
Bagging is the process of taking bootstraps & then aggregating the models learned on each bootstrap.
Now, RF creates S trees and uses m (=sqrt(M) or =floor(lnM+1)) random subfeatures out of M possible features to create any tree. This is called random subspace method.
So for each Ti bootstrap dataset you create a tree Ki. If you want to classify some input data D = {x1, x2, ..., xM} you let it pass through each tree and produce S outputs (one for each tree) which can be denoted by Y = {y1, y2, ..., ys}. Final prediction is a majority vote on this set.
Out-of-bag error:
After creating the classifiers (S trees), for each (Xi,yi) in the original training set i.e. T, select all Tk which does not include (Xi,yi). This subset, pay attention, is a set of boostrap datasets which does not contain a particular record from the original dataset. This set is called out-of-bag examples. There are n such subsets (one for each data record in original dataset T). OOB classifier is the aggregation of votes ONLY over Tk such that it does not contain (xi,yi).
Out-of-bag estimate for the generalization error is the error rate of the out-of-bag classifier on the training set (compare it with known yi's).
Why is it important?
The study of error estimates for bagged classifiers in Breiman
[1996b], gives empirical evidence to show that the out-of-bag estimate
is as accurate as using a test set of the same size as the training
set. Therefore, using the out-of-bag error estimate removes the need
for a set aside test set.1
(Thanks #Rudolf for corrections. His comments below.)
In Breiman's original implementation of the random forest algorithm, each tree is trained on about 2/3 of the total training data. As the forest is built, each tree can thus be tested (similar to leave one out cross validation) on the samples not used in building that tree. This is the out of bag error estimate - an internal error estimate of a random forest as it is being constructed.

Of Ways to Count the Limitless Primes [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 7 years ago.
Improve this question
Alright, so maybe I shouldn't have shrunk this question sooo much... I have seen the post on the most efficient way to find the first 10000 primes. I'm looking for all possible ways. The goal is to have a one stop shop for primality tests. Any and all tests people know for finding prime numbers are welcome.
And so:
What are all the different ways of finding primes?
Some prime tests only work with certain numbers, for instance, the Lucas–Lehmer test only works for Mersenne numbers.
Most prime tests used for big numbers can only tell you that a certain number is "probably prime" (or, if the number fails the test, it is definitely not prime). Usually you can continue the algorithm until you have a very high probability of a number being prime.
Have a look at this page and especially its "See Also" section.
The Miller-Rabin test is, I think, one of the best tests. In its standard form it gives you probable primes - though it has been shown that if you apply the test to a number beneath 3.4*10^14, and it passes the test for each parameter 2, 3, 5, 7, 11, 13 and 17, it is definitely prime.
The AKS test was the first deterministic, proven, general, polynomial-time test. However, to the best of my knowledge, its best implementation turns out to be slower than other tests unless the input is ridiculously large.
For a given integer, the fastest primality check I know is:
Take a list of 2 to the square root of the integer.
Loop through the list, taking the remainder of the integer / current number
If the remainder is zero for any number in the list, then the integer is not prime.
If the remainder was non-zero for all numbers in the list, then the integer is prime.
It uses significantly less memory than The Sieve of Eratosthenes and is generally faster for individual numbers.
The Sieve of Eratosthenes is a decent algorithm:
Take the list of positive integers 2 to any given Ceiling.
Take the next item in the list (2 in the first iteration) and remove all multiples of it (beyond the first) from the list.
Repeat step two until you reach the given Ceiling.
Your list is now composed purely of primes.
There is a functional limit to this algorithm in that it exchanges speed for memory. When generating very large lists of primes the memory capacity needed skyrockets.
#akdom's question to me:
Looping would work fine on my previous suggestion, and you don't need to do any calculations to determine if a number is even; in your loop, simply skip every even number, as shown below:
//Assuming theInteger is the number to be tested for primality.
// Check if theInteger is divisible by 2. If not, run this loop.
// This loop skips all even numbers.
for( int i = 3; i < sqrt(theInteger); i + 2)
{
if( theInteger % i == 0)
{
//getting here denotes that theInteger is not prime
// somehow indicate that some number, i, divides it and break
break;
}
}
A Rutgers grad student recently found a recurrence relation that generates primes. The difference of its successive numbers will generate either primes or 1's.
a(1) = 7
a(n) = a(n-1) + gcd(n,a(n-1)).
It makes a lot of crap that needs to be filtered out. Benoit Cloitre also has this recurrence that does a similar task:
b(1) = 1
b(n) = b(n-1) + lcm(n,b(n-1))
then the ratio of successive numbers, minus one [b(n)/b(n-1)-1] is prime. A full account of all this can be read at Recursivity.
For the sieve, you can do better by using a wheel instead of adding one each time, check out the Improved Incremental Prime Number Sieves. Here is an example of a wheel. Let's look at the numbers, 2 and 5 to ignore. Their wheel is, [2,4,2,2].
In your algorithm using the list from 2 to the root of the integer, you can improve performance by only testing odd numbers after 2. That is, your list only needs to contain 2 and all odd numbers from 3 to the square root of the integer. This cuts the number of times you loop in half without introducing any more complexity.
#theprise
If I were wanting to use an incrementing loop instead of an instantiated list (problems with memory for massive numbers...), what would be a good way to do that without building the list?
It doesn't seem like it would be cheaper to do a divisibility check for the given integer (X % 3) than just the check for the normal number (N % X).
If you're wanting to find a way of generating prime numbers, this have been covered in a previous question.