Why is alpha set to 15 in NLTK VADER?

I am trying to understand what VADER does when it analyzes sentences.
Why is the hyper-parameter alpha set to 15 here? I understand that the score is unstable when left unbounded, but why 15?
import math

def normalize(score, alpha=15):
    """
    Normalize the score to be between -1 and 1 using an alpha that
    approximates the max expected value.
    """
    norm_score = score / math.sqrt((score * score) + alpha)
    return norm_score

VADER's normalization equation is x / sqrt(x^2 + alpha), which is the equation of a sigmoid-like curve that saturates at -1 and 1.
I have read the VADER research paper here: http://comp.social.gatech.edu/papers/icwsm14.vader.hutto.pdf
Unfortunately, I could not find any reason why this particular formula, and 15 as the value for alpha, was chosen. However, the experiments and the graph show that as x (the sum of the sentiment scores) grows, the normalized value gets closer to -1 or 1. In other words, as the number of scored words grows, the score tends toward -1 or 1, which means VADER works better with short documents or tweets than with long documents.
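As a quick sanity check (my own snippet, not from the paper), plugging growing raw scores into the normalize function shows this saturation:

import math

def normalize(score, alpha=15):
    return score / math.sqrt((score * score) + alpha)

# Larger raw valence sums (more sentiment-bearing words) push the
# normalized score toward 1.
for raw in (1, 2, 5, 10, 20, 50):
    print(raw, round(normalize(raw), 3))
# prints 0.25, 0.459, 0.791, 0.933, 0.982, 0.997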

Related

Extended Kalman filter (EKF): add a bias to measurements and estimate it

I am working on an EKF and am trying to add a bias (1 degree or 1 arcsec) to the measurements. I have two measurements (angles) and I am estimating position (3) and velocity (3), so the state has 6 elements; I want to extend the filter for bias estimation (7th and 8th states). The idea is to add a bias value to the measurements and estimate those values. For example, one measurement is 190 degrees and the second is 5 degrees. If I add a 1-degree bias to my measurements, the new values are 191 and 6 degrees, respectively. In my simulation the estimated bias starts at 1 degree and goes toward zero, but I expected it to start around zero and converge to approximately 1 degree.
Where am I wrong? Can you share your ideas or provide some documents for this idea?
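For reference, a minimal sketch of the kind of state augmentation being described (my own illustration, not the asker's code; angles_from_state is a hypothetical placeholder for the real measurement geometry). The two biases are appended to the state and added to the predicted angles, so their columns in the measurement Jacobian come out as an identity block:

import numpy as np

def angles_from_state(pos):
    # Hypothetical geometry: the two measured angles as a function of position.
    az = np.arctan2(pos[1], pos[0])
    el = np.arctan2(pos[2], np.hypot(pos[0], pos[1]))
    return np.array([az, el])

def measurement_model(x):
    # State x = [position(3), velocity(3), bias(2)]; the biases are
    # added directly to the predicted angles.
    return angles_from_state(x[:3]) + x[6:8]

def measurement_jacobian(x, eps=1e-6):
    # Numerical Jacobian H (2 x 8); the last two columns are the
    # identity because each bias enters its own angle additively.
    h0 = measurement_model(x)
    H = np.zeros((2, x.size))
    for i in range(x.size):
        dx = np.zeros(x.size)
        dx[i] = eps
        H[:, i] = (measurement_model(x + dx) - h0) / eps
    return H

x = np.zeros(8)
x[:3] = [7000.0, 1000.0, 500.0]        # hypothetical position
print(measurement_jacobian(x).shape)   # (2, 8)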

Can an LSTM be trained for regression with a different number of features in each sample?

In my problem, each training and testing sample has a different number of features. For example, the training samples are as follows:
There are four features in sample1: x1, x2, x3, x4, y1
There are two features in sample2: x6, x3, y2
There are three features in sample3: x8, x1, x5, y3
x denotes a feature and y the target.
Can these samples train for the LSTM regression and make prediction?
Consider the following scenario: you have a (way too small) dataset of 6 sample sequences of lengths {1, 2, 3, 4, 5, 6} and you want to train your LSTM (or, more generally, an RNN) with a minibatch size of 3 (you feed 3 sequences at every training step), that is, you have 2 batches per epoch.
Let's say that, due to randomization, the first batch ends up being constructed from sequences of lengths {2, 1, 5}:
batch 1
----------
2 | xx
1 | x
5 | xxxxx
and the next batch from sequences of lengths {6, 3, 4}:
batch 2
----------
6 | xxxxxx
3 | xxx
4 | xxxx
What people typically do is pad the sample sequences up to the longest sequence in the minibatch (not necessarily the longest sequence overall) and stack the sequences, one on top of another, to get a matrix that can be fed into the RNN. Let's say your features consist of real numbers, so it is not unreasonable to pad with zeros:
batch 1
----------
2 | xx000
1 | x0000
5 | xxxxx
(batch * length = 3 * 5)
(sequence length 5)
batch 2
----------
6 | xxxxxx
3 | xxx000
4 | xxxx00
(batch * length = 3 * 6)
(sequence length 6)
This way, for the first batch your RNN only has to run up to the necessary number of steps (5), which saves some compute. For the second batch it has to go up to the longest one (6).
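A minimal sketch of this padding step in NumPy (the helper name and the boolean mask are my own, not from the answer):

import numpy as np

def pad_batch(sequences, pad_value=0.0):
    # Pad every sequence up to the longest one in this batch and
    # stack them into a (batch, max_len) matrix; mask marks real steps.
    max_len = max(len(s) for s in sequences)
    batch = np.full((len(sequences), max_len), pad_value)
    mask = np.zeros((len(sequences), max_len), dtype=bool)
    for i, s in enumerate(sequences):
        batch[i, :len(s)] = s
        mask[i, :len(s)] = True
    return batch, mask

batch, mask = pad_batch([[1.0, 2.0], [1.0], [1.0, 2.0, 3.0, 4.0, 5.0]])
print(batch.shape)   # (3, 5), like the 3 * 5 batch above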
The padding value is chosen arbitrarily. It usually should not influence anything, unless you have bugs. Trying some bogus values, like Inf or NaN may help you during debugging and verification.
Importantly, when using padding like that, there are some other things to do for the model to work correctly. If you are using backpropagation, you should exclude the results of the padding from both the output computation and the gradient computation (deep learning frameworks will do that for you). If you are training a supervised model, labels should typically also be padded, and the padding should not be counted in the loss. For example, say you compute cross-entropy for the entire batch (padding included). To get a correct loss, the bogus cross-entropy values that correspond to padding should be masked to zero, then each sequence should be summed independently and divided by its real length; that is, the averaging should be performed without taking the padding into account (in my example this is guaranteed by the neutrality of zero with respect to addition). The same rule applies to regression losses and to metrics such as accuracy, MAE, etc. (if you average together with the padding, your metrics will be wrong as well).
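A rough sketch of that masked averaging, assuming a per-step loss array of shape (batch, time) and the boolean mask from the padding step above:

import numpy as np

def masked_mean_loss(per_step_loss, mask):
    # per_step_loss: (batch, time), including bogus values at padded steps.
    # mask: (batch, time) booleans, True at real time steps.
    masked = per_step_loss * mask                     # zero out padding terms
    per_seq = masked.sum(axis=1) / mask.sum(axis=1)   # average over real length
    return per_seq.mean()                             # then average over the batch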
To save even more compute, people sometimes construct batches such that the sequences in a batch have roughly the same length (or even exactly the same, if the dataset allows). This may introduce some undesired effects, though, as long and short sequences are never in the same batch.
To conclude, padding is a powerful tool and if you are attentive, it allows you to run RNNs very efficiently with batching and dynamic sequence length.
Yes. The input_size of your LSTM layer should be the maximum among all input sizes, and you fill the spare cells with zeros:
max(input_size) = 5
input array = [x1, x2, x3]
And you transform it this way:
[x1, x2, x3] -> [x1, x2, x3, 0, 0]
This approach is rather common and does not show any big negative influence on prediction accuracy.
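A tiny sketch of that feature padding (the helper is mine, with a hypothetical fixed maximum of 5 features):

def pad_features(x, max_input_size=5, pad_value=0.0):
    # Pad a single sample's feature vector up to the fixed input size.
    return list(x) + [pad_value] * (max_input_size - len(x))

print(pad_features([1.0, 2.0, 3.0]))   # [1.0, 2.0, 3.0, 0.0, 0.0]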

Forest Fire simulation in Octave or Matlab

On this page https://courses.cit.cornell.edu/bionb441/CA/forest.m
I found a piece of code named "Forest Fire".
I am trying to figure out how this code works, for educational purposes.
Here are the rules:
Cells can be in 3 different states. State=0 is empty, state=1 is burning and state=2 is forest.
If one or more of the 4 neighbors of a cell is burning and it is forest (state=2) then the new state is burning (state=1).
A cell which is burning (state=1) becomes empty (state=0).
There is a low probability (0.000005) of a forest cell (state=2) starting to burn on its own (from lightning).
There is a low probability (say, 0.01) of an empty cell becoming forest to simulate growth.
What is not very clear to me is how this part works:
sum = (veg(1:n,[n 1:n-1])==1) + (veg(1:n,[2:n 1])==1) + ...
(veg([n 1:n-1], 1:n)==1) + (veg([2:n 1],1:n)==1) ;
veg = 2*(veg==2) - ((veg==2) & (sum> 0 | (rand(n,n)< Plightning))) + ...
2*((veg==0) & rand(n,n)< Pgrowth) ;
There is no problem running the code; I am just confused about what these matrices (sum and veg) are, and especially about what (veg(1:n,[n 1:n-1])==1) does.
What I see is that both are matrices, and veg holds the data of the plot (a matrix of 0s, 1s and 2s).
I really appreciate any help you can provide.
A binary comparison operator applied to a matrix and a scalar returns a matrix in which each element is the result of comparing the corresponding element of the original matrix with that scalar.
sum is a matrix in which each cell contains the number of cells adjacent to the corresponding cell of veg that are on fire (==1).
(veg(1:n,[n 1:n-1])==1) is a matrix of logical 1s and 0s (I don't know if the data type is static or dynamic) in which each cell equals 1 when the cell to the left of the corresponding one in veg is on fire (==1).
https://courses.cit.cornell.edu/bionb441/CA/
Look at the URL, go back up the tree to see the source.
The rule:
Cells can be in 3 different states. State=0 is empty, state=1 is burning and state=2 is forest.
If one or more of the 4 neighbors of a cell is burning and it is forest (state=2), then the new state is burning (state=1).
There is a low probability (say 0.000005) of a forest cell (state=2) starting to burn on its own (from lightning).
A cell which is burning (state=1) becomes empty (state=0).
There is a low probability (say, 0.01) of an empty cell becoming forest to simulate growth.
The array is considered to be toroidally connected, so that fire which burns to the left side will start fires on the right. The top and bottom are similarly connected.
The update code:
sum = (veg(1:n,[n 1:n-1])==1) + (veg(1:n,[2:n 1])==1) + ...
(veg([n 1:n-1], 1:n)==1) + (veg([2:n 1],1:n)==1) ;
veg = ...
2*(veg==2) - ((veg==2) & (sum> 0 | (rand(n,n)< Plightning))) + ...
2*((veg==0) & rand(n,n)< Pgrowth) ;
Note that the toroidal connection is implemented by the ordering of subscripts.
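For readers more comfortable with Python, here is a rough NumPy transcription of the same update step (my own, not from the course page); np.roll plays the role of the wrap-around subscripting such as veg(1:n,[n 1:n-1]):

import numpy as np

n = 100
Plightning, Pgrowth = 0.000005, 0.01
veg = 2 * np.ones((n, n), dtype=int)   # start as all forest

def step(veg):
    burning = (veg == 1)
    # Count burning neighbours; np.roll wraps around, giving the
    # toroidal connection that the Matlab subscripting implements.
    neighbours_on_fire = (np.roll(burning, 1, axis=1).astype(int) +
                          np.roll(burning, -1, axis=1) +
                          np.roll(burning, 1, axis=0) +
                          np.roll(burning, -1, axis=0))
    forest, empty = (veg == 2), (veg == 0)
    catches_fire = forest & ((neighbours_on_fire > 0) |
                             (np.random.rand(n, n) < Plightning))
    grows = empty & (np.random.rand(n, n) < Pgrowth)
    # Same arithmetic trick as the Matlab code: forest stays 2,
    # forest that catches fire drops to 1, burning cells become 0,
    # and empty cells may grow back to forest (2).
    return 2 * forest - catches_fire + 2 * grows

veg = step(veg)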

How to print probability for repeated measures logistic regression?

I would like SAS to print the probability of my binary dependent variable occurring (“Calliphoridae”, a particular fly family, being present (1) or not (0)) at a specific value of my continuous independent variable (“degree_index”, which was recorded from .055 to 2.89, but can be recorded past 2.89 and always increases as time goes on), using PROC GENMOD. How do I change my code to print the probability, for example, that Calliphoridae is present at degree_index=.1?
My example code is:
proc genmod data=thesis descending ;
class Body_number ;
model Calliphoridae = degree_index / dist=binomial link=logit ;
repeated subject=Body_number/ type=cs;
estimate 'degreeindex=.1' intercept 1 degree_index 0 /exp;
estimate 'degree_index=.2' intercept 1 degree_index .1 /exp;
run;
In the contrast estimate results I get a mean estimate of .99 at degree_index=.1 and .98 at degree_index=.2.
I think it is correctly modeling the probability; I just didn't include the square of the degree-day index. If you do, it allows the probability to increase and decrease. I realized this when I computed the probability by hand, e^(-1.1307x + .2119) / (1 + e^(-1.1307x + .2119)), to verify that this really was modeling the probability that y=1 for the mean estimates at specific x values, and then I realized that it fits a regression line that cannot both increase and decrease because there is only one x term. http://www.stat.sc.edu/~hansont/stat704/chapter14a.pdf
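That hand calculation is easy to reproduce (the coefficients below are the ones quoted above; whether the signs match your output depends on the descending option):

import math

def inv_logit(eta):
    # Convert a linear predictor into a probability.
    return math.exp(eta) / (1 + math.exp(eta))

b0, b1 = 0.2119, -1.1307   # intercept and degree_index slope from above
for x in (0.1, 0.2):
    print(x, round(inv_logit(b0 + b1 * x), 3))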

Simulating a roll with a biased dice

I did a search but didn't really get any proper hits. Maybe I used incorrect terms?
What I want to ask about is an algorithm for simulating a biased roll rather than a standard, supposedly random roll.
It wouldn't be a problem if you can't give me exact answers (maybe the explanation is lengthy?), but I would appreciate pointers to material I can read about it.
What I have in mind is, for example, to shift the bias towards the 5 and 6 faces so that rolls have a higher chance of landing on a 5 or a 6; that's the sort of problem I'm trying to solve.
[Update]
Upon further thought and by inspecting some of the answers, I've realized that what I want to achieve is really the Roulette Wheel Selection operator that's used in genetic algorithms since having a larger sector means increasing the odds the ball will land there. Am I correct with this line of thought?
In general, if your probabilities are {p1,p2, ...,p6}, construct the following helper list:
{a1, a2, ... a5} = { p1, p1+p2, p1+p2+p3, p1+p2+p3+p4, p1+p2+p3+p4+p5}
Now get a random number X in [0,1]
If
X <= a1 choose 1 as outcome
a1 < X <= a2 choose 2 as outcome
a2 < X <= a3 choose 3 as outcome
a3 < X <= a4 choose 4 as outcome
a4 < X <= a5 choose 5 as outcome
a5 < X choose 6 as outcome
Or, as more efficient pseudocode:
if X > a5 then N=6
elseif X > a4 then N=5
elseif X > a3 then N=4
elseif X > a2 then N=3
elseif X > a1 then N=2
else N=1
Edit
This is equivalent to the roulette wheel selection you mention in your question update: each outcome gets a sector whose size is proportional to its probability, and X picks a point on the wheel.
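A small Python version of that pseudocode (my own transcription): itertools.accumulate builds the helper list and bisect_left finds the first threshold with X <= a_i.

import bisect
import random
from itertools import accumulate

def biased_roll(probs):
    # probs: the six face probabilities {p1, ..., p6}, summing to 1.
    thresholds = list(accumulate(probs))          # [a1, a2, ..., a6]
    x = random.random()                           # X in [0, 1)
    return bisect.bisect_left(thresholds, x) + 1  # faces are 1..6

# Bias towards 5 and 6, as in the question.
print(biased_roll([0.1, 0.1, 0.1, 0.1, 0.3, 0.3]))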
Let's say the die is biased towards a 3.
Instead of picking a random entry from an array 1..6 with 6 entries, pick a random entry from an array 1..6, 3, 3. (8 entries).
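In Python that duplicated-entries trick is a one-liner (the extra 3s are the bias):

import random

faces = [1, 2, 3, 4, 5, 6, 3, 3]   # 3 appears three times, so P(3) = 3/8
print(random.choice(faces))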
Make a two-dimensional array of possible values and their weights. Sum up all the weights, then randomly choose a value in the range from 0 to the sum of the weights.
Now iterate through the array while keeping an accumulator of the weights seen so far. Once the accumulator exceeds your random number, return the die value at that position.
Hope this helps
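A sketch of that accumulator loop (my own code); for what it's worth, random.choices(values, weights=weights) does the same job in one call.

import random

def weighted_die(values, weights):
    # Pick a point in [0, total) and walk the weights until the
    # running total passes it.
    target = random.uniform(0, sum(weights))
    running = 0
    for value, weight in zip(values, weights):
        running += weight
        if target < running:
            return value
    return values[-1]   # guard against floating-point edge cases

print(weighted_die([1, 2, 3, 4, 5, 6], [1, 1, 1, 1, 3, 3]))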
Hmm. Say you want to have a 1/2 chance of getting a six, and a 1/10 chance of getting any other face. To simulate this, you could generate a random integer n in [1, 2, ... , 10] , and the outcome would map to six if n is in [6, 7, 8, 9, 10] and map to n otherwise.
One way that's usually fairly easy is to start with a random number in an expanded range, and break that range up into unequal pieces.
For example, with a perfectly fair (six-sided) die, each number should come up 1/6th of the time. Let's assume you decide on round percentages that total 100: each of the other numbers comes up 16 percent of the time, but 2 comes up 20 percent of the time.
You could do that by generating numbers from 1 to 100. If the number is from 1 to 16, it comes out as a 1. If it's from 17 to 36, it comes out as a 2. If it's from 37 to 52, it comes out as a 3 (and the remaining faces take blocks of 16 apiece).