To teach myself PyMC I am trying to define a simple logistic regression. But I get a ZeroProbability error, and I do not understand exactly why this happens or how to avoid it.
Here is my code:
import pymc
import numpy as np
x = np.array([85, 95, 70, 65, 70, 90, 75, 85, 80, 85])
y = np.array([1., 1., 0., 0., 0., 1., 1., 0., 0., 1.])
w0 = pymc.Normal('w0', 0, 0.000001) # uninformative prior (any real number)
w1 = pymc.Normal('w1', 0, 0.000001) # uninformative prior (any real number)
@pymc.deterministic
def logistic(w0=w0, w1=w1, x=x):
    return 1.0 / (1. + np.exp(-(w0 + w1 * x)))
observed = pymc.Bernoulli('observed', logistic, value=y, observed=True)
And here is the traceback with the error message:
Traceback (most recent call last):
File "/Library/Python/2.7/site-packages/IPython/core/interactiveshell.py", line 2883, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File "<ipython-input-2-43ed68985dd1>", line 24, in <module>
observed = pymc.Bernoulli('observed', logistic, value=y, observed=True)
File "/usr/local/lib/python2.7/site-packages/pymc/distributions.py", line 318, in __init__
**arg_dict_out)
File "/usr/local/lib/python2.7/site-packages/pymc/PyMCObjects.py", line 772, in __init__
if not isinstance(self.logp, float):
File "/usr/local/lib/python2.7/site-packages/pymc/PyMCObjects.py", line 929, in get_logp
raise ZeroProbability(self.errmsg)
ZeroProbability: Stochastic observed's value is outside its support,
or it forbids its parents' current values.
I suspect np.exp is causing the trouble, since it returns inf when the magnitude of the linear term becomes too large.
I know there are other ways to define a logistic regression using PyMC (here is one), but I am interested in knowing why this approach does not work, and how I can define the regression using the Bernoulli object instead of using bernoulli_like.
When you create your normal stochastic with pymc.Normal('w0', 0, 0.000001), PyMC2 initializes its value with a random draw from the prior distribution. Since your prior is so diffuse (a precision of 0.000001 corresponds to a standard deviation of 1000), this can be a value which is so unlikely that the posterior is effectively zero. To fix this, just request a reasonable initial value for your Normal:
w0 = pymc.Normal('w0', 0, 0.000001, value=0)
w1 = pymc.Normal('w1', 0, 0.000001, value=0)
Here is a notebook with a few more details.
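To see why a random starting value can trigger ZeroProbability, here is a minimal NumPy-only sketch (it does not use PyMC). The helper bernoulli_loglik and the example draw w1 = 500 are illustrative assumptions; with a prior standard deviation of 1000, draws of that size are not unusual.

```python
import numpy as np

x = np.array([85, 95, 70, 65, 70, 90, 75, 85, 80, 85])
y = np.array([1., 1., 0., 0., 0., 1., 1., 0., 0., 1.])

def bernoulli_loglik(w0, w1):
    """Bernoulli log-likelihood of y under a logistic model (hypothetical helper)."""
    p = 1.0 / (1.0 + np.exp(-(w0 + w1 * x)))
    # log(0) yields -inf; silence the warning so the result can be inspected
    with np.errstate(divide="ignore"):
        terms = np.where(y == 1, np.log(p), np.log1p(-p))
    return terms.sum()

print(bernoulli_loglik(0.0, 0.0))    # finite: every p is 0.5
print(bernoulli_loglik(0.0, 500.0))  # -inf: p saturates at 1.0 while some y are 0
```

A log-likelihood of -inf means the observed data have probability zero under the current parent values, which is the situation the error message describes.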
You have to put some sort of bound on the probability returned by the logistic function.
Maybe something like
@pymc.deterministic
def logistic(w0=w0, w1=w1, x=x):
    tol = 1e-9
    res = 1.0 / (1. + np.exp(-(w0 + w1 * x)))
    return np.maximum(np.minimum(res, 1 - tol), tol)
I think you forgot the negative inside the exp() function, too.
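The same bounding idea can be tried outside PyMC. The sketch below is a plain NumPy illustration, not part of the model above; the name clipped_logistic and the test inputs are assumptions, and np.clip is used as a one-call equivalent of the nested maximum/minimum.

```python
import numpy as np

def clipped_logistic(z, tol=1e-9):
    """Logistic function whose output is kept inside [tol, 1 - tol]."""
    # exp overflows to inf for very negative z; the result is still a valid 0.0
    with np.errstate(over="ignore"):
        p = 1.0 / (1.0 + np.exp(-z))
    return np.clip(p, tol, 1.0 - tol)

z = np.array([-1e4, 0.0, 1e4])  # extreme values that saturate the unclipped logistic
p = clipped_logistic(z)

# Both log terms of the Bernoulli likelihood stay finite after clipping
print(np.log(p))
print(np.log1p(-p))
```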
@hahdawg's answer is good, but here's something else to consider.
For your uninformative priors on w0 and w1 I would first do an eyeball fit and then use uniforms with limits.
Obviously your w1 is going to be around 1/15 = .07, so a range like .04 to 1.2 might do it.
w0 is going to be in the range of -80/15 = -5.3, so something like -7 to -3 could do it.
I'm just saying this because exp can easily go bananas, so you have to be careful what you feed it.
If your inverse logit function comes out with a value too close to 0 or 1, logistic regression is guaranteed to break.
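As a rough check of those eyeball values, the sketch below evaluates the inverse logit at w1 = 1/15 and w0 = -80/15 for the data in the question. The choice of exactly these two numbers is an assumption taken from the estimates above; together they put the 50% point at x = 80.

```python
import numpy as np

x = np.array([85, 95, 70, 65, 70, 90, 75, 85, 80, 85])

w1 = 1.0 / 15.0    # eyeballed slope
w0 = -80.0 / 15.0  # eyeballed intercept, so that w0 + w1 * 80 is zero

p = 1.0 / (1.0 + np.exp(-(w0 + w1 * x)))
print(p.min(), p.max())  # both stay well away from 0 and 1
```

With parameters in this neighbourhood the probabilities stay far from the saturated region, so the likelihood remains finite.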
Out of curiosity, are you using a thin argument in your call to sample? There was a bug related to that, and it may be the culprit here.
Besides, thinning is not worthwhile in any case.
I'm trying to solve these two integrals. I want to use a numerical approach because C_i will eventually become more complicated and I want the same code to handle all cases. Currently C_i is just a constant, but _quad is not able to solve it. I'm assuming this is because it is a Heaviside function and quad is having trouble finding the bounds a and b. Please correct me if I'm approaching this wrongly.
Equation 33
In [1]: import numpy as np
...: import scipy as sp
...: import sympy as smp
...: from sympy import DiracDelta
...: from sympy import Heaviside
In [2]: C_i = smp.Function('C_i')
In [3]: t, t0, x, v = smp.symbols('t, t0, x, v', positive=True)
In [4]: tot_l = 10
In [5]: C_fm = (1/tot_l)*v*smp.Integral(C_i(t0), (t0, (-x/v)+t, t))
In [6]: C_fm.doit()
Out[6]:
0.1*v*Integral(C_i(t0), (t0, t - x/v, t))
In [7]: C_fm.doit().simplify()
Out[7]:
0.1*v*Integral(C_i(t0), (t0, t - x/v, t))
In [8]: C_fms = C_fm.doit().simplify()
In [9]: t_arr = np.arange(0,1000,1)
In [10]: f_mean = smp.lambdify((x, v, t), C_fms, ['scipy', {'C_i': lambda e: 0.8}])
In [11]: try2 = f_mean(10, 0.1, t_arr)
Traceback (most recent call last):
File "/var/folders/rd/wzfh_5h110l121rmlxn61v440000gn/T/ipykernel_3164/3786931540.py", line 1, in <module>
try2 = f_mean(10, 0.1, t_arr)
File "<lambdifygenerated-1>", line 2, in _lambdifygenerated
return 0.1*v*quad(lambda t0: C_i(t0), t - x/v, t)[0]
File "/opt/anaconda3/lib/python3.9/site-packages/scipy/integrate/quadpack.py", line 348, in quad
flip, a, b = b < a, min(a, b), max(a, b)
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
Equation 34
In [12]: C_i = smp.Function('C_i')
In [13]: t, tao, x, v = smp.symbols('t, tao, x, v', positive=True)
In [14]: I2 = v*smp.Integral((C_i(t-tao))**2, (tao, 0, t))
In [15]: I2.doit()
Out[15]:
v*Integral(C_i(t - tao)**2, (tao, 0, t))
In [16]: I2.doit().simplify()
Out[16]:
v*Integral(C_i(t - tao)**2, (tao, 0, t))
In [17]: I2_s = I2.doit().simplify()
In [18]: tao_arr = np.arange(0,1000,1)
In [19]: I2_sf = smp.lambdify((v, tao), I2_s, ['scipy', {'C_i': lambda e: 0.8}])
In [20]: try2 = I2_sf(0.1, tao_arr)
Traceback (most recent call last):
File "/var/folders/rd/wzfh_5h110l121rmlxn61v440000gn/T/ipykernel_3164/4262383171.py", line 1, in <module>
try2 = I2_sf(0.1, tao_arr)
File "<lambdifygenerated-2>", line 2, in _lambdifygenerated
return v*quad(lambda tao: C_i(t - tao)**2, 0, t)[0]
File "/opt/anaconda3/lib/python3.9/site-packages/scipy/integrate/quadpack.py", line 351, in quad
retval = _quad(func, a, b, args, full_output, epsabs, epsrel, limit,
File "/opt/anaconda3/lib/python3.9/site-packages/scipy/integrate/quadpack.py", line 463, in _quad
return _quadpack._qagse(func,a,b,args,full_output,epsabs,epsrel,limit)
File "/opt/anaconda3/lib/python3.9/site-packages/sympy/core/expr.py", line 345, in __float__
raise TypeError("Cannot convert expression to float")
TypeError: Cannot convert expression to float
So you are passing an unevaluated Integral to lambdify, which in turn translates it into a call to scipy.integrate.quad.
Looks like the integrals can't be evaluated even with doit and simplify calls. Have you actually looked at C_fms and I2_s? That's one of the first things I'd do when running this code!
I've never looked at this approach. I have seen people lambdify the objective expression, and then try to use that in quad directly.
quad has specific requirements (check the docs!). The objective function must return a single number, and the bounds must also be numbers.
In the first error, you are passing the array t_arr as the t bound, and it got the usual ambiguity error when checking whether it is bigger than the other bound. That's the b < a test. quad cannot use arrays as bounds.
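One way around the scalar-bounds restriction is to call quad once per time value. The sketch below assumes the constant C_i = 0.8 from the question; the function names c_i and c_fm and the short t_arr are illustrative, not taken from the original code.

```python
import numpy as np
from scipy.integrate import quad

def c_i(t0):
    """Placeholder inlet concentration; a constant here, as in the question."""
    return 0.8

def c_fm(x, v, t, total_length=10.0):
    """Equation 33 for one scalar t: (1 / L) * v * integral of C_i over [t - x/v, t]."""
    value, _abserr = quad(c_i, t - x / v, t)
    return (1.0 / total_length) * v * value

t_arr = np.arange(0, 5, 1)
# quad needs scalar bounds, so evaluate one t at a time
result = np.array([c_fm(10.0, 0.1, t) for t in t_arr])
print(result)
```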
I'm not sure why the second case avoids this problem; the bounds must be coming from somewhere else. But the error comes when quad calls the objective function and expects a float return. Instead the function returns a sympy expression which sympy can't convert to float. My guess is that there's some variable in the expression that's still a sympy Symbol.
In diagnosing lambdify problems, it's a good idea to look at the generated code. One way is with help on the function, help(I2_sf). But with that you need to be able to read and understand python, including any numpy and scipy functions. That's not always easy.
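The text that help shows comes from the function's docstring, so it can also be printed directly. This is a small sketch that rebuilds the first integral from the question; it assumes sympy and scipy are installed, and the exact layout of the docstring may differ between sympy versions.

```python
import sympy as smp

C_i = smp.Function("C_i")
t, t0, x, v = smp.symbols("t t0 x v", positive=True)
expr = v * smp.Integral(C_i(t0), (t0, t - x / v, t)) / 10

f_mean = smp.lambdify((x, v, t), expr, ["scipy", {"C_i": lambda e: 0.8}])

# The docstring lists the signature, the expression and the generated source code
print(f_mean.__doc__)

# With scalar arguments the generated quad call works
print(f_mean(10, 0.1, 200.0))
```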
Have you tried to use sympy's own numeric integrator? Trying to combine sympy and numpy/scipy often has problems.
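As a sketch of that suggestion, the integral can be given a concrete integrand and evaluated inside sympy, either in closed form with doit or numerically with evalf. The constant 4/5 stands in for C_i = 0.8, and the numeric values for x, v and t are assumptions chosen only for illustration.

```python
import sympy as smp

t, t0, x, v = smp.symbols("t t0 x v", positive=True)

integrand = smp.Rational(4, 5)  # stands in for the constant C_i = 0.8
C_fm = smp.Rational(1, 10) * v * smp.Integral(integrand, (t0, t - x / v, t))

# Closed form: possible here because the integrand is concrete
closed_form = smp.simplify(C_fm.doit())
print(closed_form)

# Numeric evaluation of the still-unevaluated Integral after substituting numbers
value = C_fm.subs({x: 10, v: smp.Rational(1, 10), t: 200}).evalf()
print(value)
```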
I am using an LSTM to summarize a trajectory as shown below:
class RolloutEncoder(nn.Module):
    def __init__(self, config):
        super(RolloutEncoder, self).__init__()
        self._input_size = (
            2048 + 1
        )  # deter_state + imag_reward; fix and use config["deter_dim"] + 1
        self._hidden_size = config["rollout_enc_size"]
        self._lstm = nn.LSTM(self._input_size, self._hidden_size, bias=True)

    def forward(self, traj):
        features = traj["features_pred"]
        rewards = traj["reward_pred"].unsqueeze(1)
        input = torch.cat((features, rewards), dim=2)
        encoding, (h_n, c_n) = self._lstm(input)
        code = h_n.squeeze(0)
        return code
My training loop is something like:
encoder = RolloutEncoder(config)
for e in range(episodes):
    for step in range(steps):
        print(f"Step {step}")
        # calc traj
        code = encoder(traj)
        # some operations that do not modify code but only concat it with some other tensor
        # calc loss
        opt.zero_grad()
        loss.backward()
        opt.step()
On running, I get this error:
Step 0
Step 1
Step 2
Step 3
Step 4
Step 5
Step 6
Step 7
Step 8
Step 9
Step 10
Step 11
Step 12
Step 13
Step 14
Traceback (most recent call last):
File "/path/main.py", line 351, in <module>
agent_loss.backward()
File "/home/.conda/envs/abc/lib/python3.9/site-packages/torch/tensor.py", line 245, in backward
torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
File "/user/.conda/envs/abc/lib/python3.9/site-packages/torch/autograd/__init__.py", line 145, in backward
Variable._execution_engine.run_backward(
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [16, 2049]] is at version 8; expected version 1 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).
On setting anomaly detection to True, it points to this line in the encoder definition:
encoding, (h_n, c_n) = self._lstm(input)
This is a very common error but I am not using any inplace operation. And the error occurs after running some steps successfully which is really weird. On inspecting, I found that the [16, 2049] tensor is one of the weights of the LSTM. I also tried using dummy random tensors in place of features and rewards but the error persists, suggesting that the traj calculation has nothing to do with this error. What might be the reason for this error?
I received the error "too many values to unpack (expected 2)" when running the code below. Can anyone help me? I added more details.
import gensim
import gensim.corpora as corpora
dictionary = corpora.Dictionary(doc_clean)
doc_term_matrix = [dictionary.doc2bow(doc) for doc in doc_clean]
Lda = gensim.models.ldamodel.LdaModel
ldamodel = Lda(doc_term_matrix, num_topics=3, id2word = dictionary, passes=50, per_word_topics = True, eval_every = 1)
print(ldamodel.print_topics(num_topics=3, num_words=20))
for i in range(0, 46):
    for index, score in sorted(ldamodel[doc_term_matrix[i]], key=lambda tup: -1*tup[1]):
        print("subject", i)
        print("\n")
        print("Score: {}\t \nTopic: {}".format(score, ldamodel.print_topic(index, 6)))
Focusing on the loop, since this is where the error is being raised. Let's take it one iteration at a time.
>>> import numpy as np # just so we can use np.shape()
>>> i = 0 # value in first loop
>>> x = sorted( ldamodel[doc_term_matrix[i]], key=lambda tup: -1*tup[1] )
>>> np.shape(x)
(3, 3, 2)
>>> for index, score in x:
...     pass
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: too many values to unpack (expected 2)
Here is where your error is coming from. You are expecting each element of the returned object to unpack into 2 values, but it is a nested structure with no simple, inferable way to unpack it like that. I do not personally have enough experience with this subject material to be able to infer what you might mean to be doing; I can only show you where your problem is coming from. Hope this helps!
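The failure can be reproduced without gensim. The sketch below assumes that, with per_word_topics=True, indexing the model returns three lists (document topics, per-word topics, phi values), which is how the gensim documentation describes it; all numbers are made-up illustration data.

```python
# Hypothetical stand-in for ldamodel[bow] when per_word_topics=True
doc_topics = [(0, 0.7), (1, 0.2), (2, 0.1)]                 # (topic_id, probability)
word_topics = [(5, [0, 1]), (9, [0])]                       # (word_id, [topic_ids])
phi_values = [(5, [(0, 0.9), (1, 0.1)]), (9, [(0, 1.0)])]   # (word_id, [(topic_id, phi)])
result = (doc_topics, word_topics, phi_values)

# Iterating over the outer tuple hands back whole lists, not (index, score) pairs
message = None
try:
    for index, score in result:
        pass
except ValueError as err:
    message = str(err)
print(message)

# Unpack the three parts first, then iterate over the topic distribution only
doc_topics, word_topics, phi_values = result
ranked = sorted(doc_topics, key=lambda pair: -pair[1])
for index, score in ranked:
    print(index, score)
```

If that assumption holds for your gensim version, the loop in the question could iterate over the first element of the returned tuple, or the model could be built without per_word_topics=True.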
In some deep learning models which analyse temporal data (e.g. audio, or video), we use a "time-distributed dense" (TDD) layer. What this creates is a fully-connected (dense) layer which is applied separately to every time-step.
In Keras this can be done using the TimeDistributed wrapper, which is actually slightly more general. In PyTorch it's been an open feature request for a couple of years.
How can we implement time-distributed dense manually in PyTorch?
Specifically for time-distributed dense (and not time-distributed anything else), we can hack it by using a convolutional layer.
Look at the diagram you've shown of the TDD layer. We can re-imagine it as a convolutional layer, where the convolutional kernel has a "width" (in time) of exactly 1, and a "height" that matches the full height of the tensor. If we do this, while also making sure that our kernel is not allowed to move beyond the edge of the tensor, it should work:
self.tdd = nn.Conv2d(1, num_of_output_channels, (num_of_input_channels, 1))
You may need to do some rearrangement of tensor axes. The "input channels" for this line of code are in fact coming from the "freq" axis (the "image's y axis") of your tensor, and the "output channels" will indeed be arranged on the "channel" axis. (The "y axis" of the output will be a singleton dimension of height 1.)
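A minimal sketch of this convolutional trick, including the axis rearrangement, might look as follows. The sizes and variable names are assumptions for illustration. The last lines compare the result with a plain linear map that reuses the convolution's weights, to check that each time step is treated independently.

```python
import torch
from torch import nn
import torch.nn.functional as F

batch, freq, time, num_outputs = 4, 20, 7, 30

# Kernel spans the full "freq" axis and exactly one time step
tdd = nn.Conv2d(1, num_outputs, kernel_size=(freq, 1))

x = torch.randn(batch, 1, freq, time)  # (batch, channel=1, freq, time)
out = tdd(x)                           # (batch, num_outputs, 1, time)
out = out.squeeze(2).transpose(1, 2)   # (batch, time, num_outputs)

# Same computation as a dense layer applied to every time step
weight = tdd.weight.reshape(num_outputs, freq)
reference = F.linear(x.squeeze(1).transpose(1, 2), weight, tdd.bias)
print(out.shape, torch.allclose(out, reference, atol=1e-5))
```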
As pointed out in the discussion you referred to:
Meanwhile this #1935 will make TimeDistributed/Bottle unnecessary for Linear layers.
For a TDD layer, this means applying the linear layer directly to the input that contains the time slices.
In [1]: import torch
In [2]: m = torch.nn.Linear(20, 30)
In [3]: input = torch.randn(128, 5, 20)
In [4]: output = m(input)
In [5]: print(output.size())
torch.Size([128, 5, 30])
The following is a short illustration of the computational results:
In [1]: import torch
In [2]: m = torch.nn.Linear(2, 3, bias=False)
...:
...: for name, param in m.named_parameters():
...:     print(name)
...:     print(param)
...:
weight
Parameter containing:
tensor([[-0.3713, -0.1113],
[ 0.2938, 0.4709],
[ 0.2791, 0.5355]], requires_grad=True)
In [3]: input = torch.stack([torch.ones(3, 2), 2 * torch.ones(3, 2)], dim=0)
...: print(input)
tensor([[[1., 1.],
[1., 1.],
[1., 1.]],
[[2., 2.],
[2., 2.],
[2., 2.]]])
In [4]: m(input)
Out[4]:
tensor([[[-0.4826, 0.7647, 0.8145],
[-0.4826, 0.7647, 0.8145],
[-0.4826, 0.7647, 0.8145]],
[[-0.9652, 1.5294, 1.6291],
[-0.9652, 1.5294, 1.6291],
[-0.9652, 1.5294, 1.6291]]], grad_fn=<UnsafeViewBackward>)
More details of how nn.Linear operates can be found in the documentation of torch.matmul. Note that you may need to add a non-linear function such as torch.tanh() to get exactly the same layer as Dense() in Keras, which supports such a non-linearity through the keyword argument activation='tanh'.
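A rough PyTorch counterpart of a time-distributed Dense layer with a tanh activation could be sketched like this. The layer sizes are the ones from the example above, and the name tdd is an assumption.

```python
import torch
from torch import nn

# Linear acts on the last axis, so every time step is transformed independently
tdd = nn.Sequential(nn.Linear(20, 30), nn.Tanh())

x = torch.randn(128, 5, 20)  # (batch, time, features)
y = tdd(x)                   # (batch, time, 30)
print(y.shape)

# Applying the layer to a single time slice gives the same values
single_step = tdd(x[:, 2])
print(torch.allclose(y[:, 2], single_step, atol=1e-6))
```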
For TimeDistributed with, e.g., CNN layers, the snippet from the PyTorch forum may be useful.
Hi, I'm a Keras beginner.
I'm building a model.
step 1. Input a batch of word-index lists, (BATCH_SIZE, WORD_INDEX_LIST)
step 2. Get the word embedding of each word, (BATCH_SIZE, WORD_LENGTH, EMBEDDING_SIZE)
step 3. Average the word embeddings within each batch item, (BATCH_SIZE, EMBEDDING_SIZE)
step 4. Repeat the vector N times, (BATCH_SIZE, N, EMBEDDING_SIZE)
step 5. Apply a Dense layer at each time step
So I wrote this code.
MAX_LEN = 20  # = WORD_INDEX_LIST
# step 1
layer_target_input = Input(shape=(MAX_LEN,), dtype="int32", name="layer_target_input")
# step2
layer_embedding = Embedding(input_dim = n_symbols+1, output_dim=vector_dim,input_length=MAX_LEN,
name="embedding", weights= [embedding_weights],trainable = False)
encoded_target = layer_embedding(layer_target_input)
# step 3
encoded_target_agg = KL.core.Lambda( lambda x: K.sum(x, axis=1) )(encoded_target)
#step 4
encoded_target_agg_repeat = KL.RepeatVector( MAX_LEN)(encoded_target_agg)
# step 5
layer_annotated_tahn = KL.Dense(output_dim=50, name="layer_tahn")
layer_annotated_tahn_td = KL.TimeDistributed(layer_annotated_tahn) (encoded_target_agg_repeat)
model = KM.Model(input=[layer_target_input], output=[ layer_annotated_tahn_td])
r = model.predict({ "layer_target_input":dev_targ}) # dev_targ = (2, 20, 300)
But when I run this code, the result is below.
Traceback (most recent call last):
File "Main.py", line 127, in <module>
r = model.predict({ "layer_target_input":dev_targ})
File "/usr/local/anaconda/lib/python2.7/site-packages/Keras-1.0.7-py2.7.egg/keras/engine/training.py", line 1180, in predict
batch_size=batch_size, verbose=verbose)
File "/usr/local/anaconda/lib/python2.7/site-packages/Keras-1.0.7-py2.7.egg/keras/engine/training.py", line 888, in _predict_loop
outs[i][batch_start:batch_end] = batch_out
ValueError: could not broadcast input array from shape (30,20,50) into shape (2,20,50)
Why did the batch size change?
What have I done wrong?
The problem is in the Lambda operator. In your case it takes a tensor of shape (batch_size, max_len, embedding_size) and is expected to produce a tensor of shape (batch_size, embedding_size). However, the Lambda op doesn't know what transformation you apply internally, so during graph compilation it mistakenly assumes that the shape doesn't change and reports its output shape as (batch_size, max_len, embedding_size).
The RepeatVector that follows expects its input to be two-dimensional, but never asserts that this is the case. It computes its output shape as (batch_size, num_repetitions, in_shape[1]). Since Lambda mistakenly reported its shape as (batch_size, max_len, embedding_size), RepeatVector now reports its shape as (batch_size, num_repetitions, max_len) instead of the expected (batch_size, num_repetitions, embedding_size). num_repetitions in your case is the same as max_len, so RepeatVector reports its shape as (batch_size, max_len, max_len).
The way TimeDistributed(Dense) works is:
Reshape((-1, input_shape[2]))
Dense()
Reshape((-1, input_shape[1], num_outputs))
By now input_shape[2] is mistakenly assumed to be max_len instead of embedding_size, but the actual tensor that is passed in has the correct shape of (batch_size, max_len, embedding_size), so what ends up happening is:
Reshape((batch_size * embedding_size, max_len))
Dense()
Reshape((batch_size * embedding_size / max_len, max_len, num_outputs))
In your case batch_size * embedding_size / max_len happens to be 2 * 300 / 20 = 30, which is where your wrong shape comes from.
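The shape arithmetic can be replayed with plain NumPy reshapes. This is only an illustration of the bookkeeping described above, not Keras code; the zero arrays and the stand-in weight matrix are assumptions.

```python
import numpy as np

batch_size, max_len, embedding_size, num_outputs = 2, 20, 300, 50

# The tensor that actually flows through the graph has the correct shape
actual = np.zeros((batch_size, max_len, embedding_size))

# TimeDistributed(Dense) believes the last axis has length max_len
flat = actual.reshape(-1, max_len)                      # (600, 20)
dense_out = flat @ np.zeros((max_len, num_outputs))     # (600, 50)
restored = dense_out.reshape(-1, max_len, num_outputs)  # (30, 20, 50)

print(flat.shape, dense_out.shape, restored.shape)
```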
To fix it, you need to explicitly tell Lambda the shape you want it to produce:
encoded_target_agg = KL.core.Lambda( lambda x: K.sum(x, axis=1), output_shape=(vector_dim,))(encoded_target)