I've updated my question based upon the variable dimension of variables.
Suppose the input tensor stores the 3d points with dimension 10x3, 10 means the #points and 3 is the feature dimension (say x,y,z coordinates). The dimension of the variable depends on the input tensor, say its dimension is 10x10. When the input tensor changes its dimension to 50x3, then the dimension of the variable will also have to change to 50x50.
I know in Tensorflow, if the input dimension is changing/unknown, we can declare it as tf.placeholder(None,3). However, I never meet the situation where the size of variable is changing/unknown, it seems that the variable will always have the fixed dimension.
I am currently learning PyTorch and don't know whether PyTorch supports this function. Any information would be appreciated!
========= Original question ========
I have a variable in which the size is changeable when input dimension changes. For example, if input is 10x2, then the variable should be 10x10. If input is 25x2, then the variable should be 25x25. As my understanding, the variable is used to store weights, which normally has fixed dimension. However in my case, the dimension of the variable depends on input data, which can change. Does PyTorch currently supports this kind of function?
Thanks!
Your question is little ambiguous. When you say, your input is say, 10x2, you need to define what the input tensor contains.
I am assuming you are talking about torch.autograd.Variable. If you want to use PyTorch's functionality, what you need to do is to provide your input through a tensor in the desired shape of the target function.
For example, if you want to use RNN implemented in PyTorch for an input sentence of length 10 where each word is represented by a 300 dimensional vector (e.g., word embedding), then you can do as follows.
rnn = nn.RNN(300, 256, 2) # emb_size=300,hidden_size=256,num_layers=2
input = Variable(torch.randn(10, 1, 300)) # sent_length=10,batch_size=1,emb_size=300
h0 = Variable(torch.randn(2, 1, 256)) # num_layers=2,batch_size=1,hidden_size=256
output, hn = rnn(input, h0)
If you have more than 1 sentence, then you can provide them in batch. In that case, you need to pad them to handle variable lengths. As you can see, RNN doesn't care about the sentence length, it can handle variable lengths but to provide many sentences in batch, you need padding. You can explore related functionalities in the official documentation.
Since you didn't mention what is your input actually, I am assuming you need variables with variable number of timesteps, in that case PyTorch can serve your purpose. Actually, PyTorch is developed to meet all basic functionalities that are required to build deep neural network architectures.
Related
I am using LightGBM for regression problems on my project and the input data has 800 numeric variables which is high dimensional and sparse dataset.
I want to use as many variables as possible in each iterations. In this case, should I unlimit max_depth?
Because I set max_depth=2 to overcome overfitting issue but it seems using only 1~3 variables in each iterations and those variables are used reapetedly.
And I want to know how tree depth affects to learning result of regression tree.
Detailed info. of the present model:
Number of input variables=800 (numeric)
target variables=1 (numeric)
objective=regression
max_depth=2
num_leaves=3
num_iterations=2000
learning_rate=0.01
no
13
what's the meaning of 'parameterize' in deep learning? As shown in the photo, does it means the matrix 'A' can be changed by the optimization during training?
Yes, when something can be parameterized it means that gradients can be calculated.
This means that the (dE/dw) which means the derivative of Error with respect to weight can be calculated (i.e it must be differentiable) and subtracted from the model weights along with obviously a learning_rate and other params being included depending on the optimizer.
What the paper is saying is that if you make a binary matrix a weight and then find the gradient (dE/dw) of that weight with respect to a loss and then make an update on the binary matrix through backpropagation, there is not really an activation function (which by requirement must be differentiable) that can keep the values discrete (like 0 and 1) but rather you will end up with continous values (like these decimal values).
Therefore it is saying since the idea of having binary values be weights and for them to be back-propagated in a way where the weights + activation function also yields an updated weight matrix that is also binary is difficult, another solution like the Bernoulli Distribution is used instead to initialize parameters of a model.
Hope this helps,
In my Neural network model, I represent an 8 word-sentence with a 8x256 dimensional embedding matrix. I want to give it to a LSTM as a input where LSTM takes a single word embedding at a time as input and process it. According to pytorch documentation, the input should be in the shape of (seq_len, batch, input_size). What is the correct way to convert my input to desired shape ? I don't want to mixup the numbers by mistake. I am quite new in PyTorch and row-major calculations, therefore I wanted to ask it here. I do it as follows, is it correct ?
x = torch.rand(8,256)
lstm_input = torch.reshape(x,(8,1,256))
Your solution is correct: you added a Singleton dimension for the "batch" dimension, leaving x to be with temporal dimension 8 and input dimension 256.
Since you are new to pytorch, here are a few equivalent ways of doing the same thing:
x = x[:, None, :]
Putting None in the dim=1 indicates to pytorch to add a singelton dimension.
Another way is to use view:
x = x.view(8, 1, 256)
I'm trying to read variable length 1-D inputs into a Tensorflow CNN.
I have previously implemented reading fixed length inputs by first constructing a CSV file (where the first column is the label and the remaining columns are the input values - flattened spectrogram data all padded/truncated to the same length) using tf.TextLineReader().
This time I have a directory full of files each one containing a line of data I want to use as input (flattened spectrogram data again but I do not want to force them to the same dimensions), and the line lengths are not fixed. I'm getting an error trying to use the previous approach of compiling a CSV first. I looked into the documentation of tf.TextLineReader() and it specifies that all CSV rows must be the same shape, so I am stuck! Any help would be much appreciated, thanks :)
I'm assuming that the data isn't changing shape when you have a longer or shorter sample right? By that I mean that if you trained your network on arrays of 1000 pixels for example, with a kernel of say [5,1] size. That [5,1] kernel needs to see the same patterns in the variable length data as it did in the training data. If your data is stretched or shrunk, then the correct solution is to interpolate the data to the same size as the training data so the shapes/patterns match.
Assuming you just want variable length inputs, then in theory you should be able to do this by setting your batch size to 1 and varying the 1st dimension of the data.
So your input placeholder would look like:
X = tf.placeholder(dtype, shape=[1,None,1,1])
The 4 shape arguments are: 1=batch size; None=unknown first dimension size; 1=unused because it's a 1D dataset, 1=one channel images, again unused but necessary for tf.conv2d to receive the expected 4D image.
This is not very different from configuring tensorflow to support variable batch sizes. So you should review this link below and understand that process.
get the size of a variable batch dimension
Note that you can't use a batch size more than 1 here because you wouldn't be able to construct a matrix with missing values in the 2nd dimension. I expect the convolution operations to work with this variable dimension (though I haven't actually tried this).
Another option to deal with this problem would be to pad your inputs with 0's so they all have a common length, but that will need to have been trained into the model up front.
I have a dialog corpus like below. And I want to implement a LSTM model which predicts a system action. The system action is described as a bit vector. And a user input is calculated as a word-embedding which is also a bit vector.
t1: user: "Do you know an apple?", system: "no"(action=2)
t2: user: "xxxxxx", system: "yyyy" (action=0)
t3: user: "aaaaaa", system: "bbbb" (action=5)
So what I want to realize is "many to many (2)" model. When my model receives a user input, it must output a system action.
But I cannot understand return_sequences option and TimeDistributed layer after LSTM. To realize "many-to-many (2)", return_sequences==True and adding a TimeDistributed after LSTMs are required? I appreciate if you would give more description of them.
return_sequences: Boolean. Whether to return the last output in the output sequence, or the full sequence.
TimeDistributed: This wrapper allows to apply a layer to every temporal slice of an input.
Updated 2017/03/13 17:40
I think I could understand the return_sequence option. But I am not still sure about TimeDistributed. If I add a TimeDistributed after LSTMs, is the model the same as "my many-to-many(2)" below? So I think Dense layers are applied for each output.
The LSTM layer and the TimeDistributed wrapper are two different ways to get the "many to many" relationship that you want.
LSTM will eat the words of your sentence one by one, you can chose via "return_sequence" to outuput something (the state) at each step (after each word processed) or only output something after the last word has been eaten. So with return_sequence=TRUE, the output will be a sequence of the same length, with return_sequence=FALSE, the output will be just one vector.
TimeDistributed. This wrapper allows you to apply one layer (say Dense for example) to every element of your sequence independently. That layer will have exactly the same weights for every element, it's the same that will be applied to each words and it will, of course, return the sequence of words processed independently.
As you can see, the difference between the two is that the LSTM "propagates the information through the sequence, it will eat one word, update its state and return it or not. Then it will go on with the next word while still carrying information from the previous ones.... as in the TimeDistributed, the words will be processed in the same way on their own, as if they were in silos and the same layer applies to every one of them.
So you dont have to use LSTM and TimeDistributed in a row, you can do whatever you want, just keep in mind what each of them do.
I hope it's clearer?
EDIT:
The time distributed, in your case, applies a dense layer to every element that was output by the LSTM.
Let's take an example:
You have a sequence of n_words words that are embedded in emb_size dimensions. So your input is a 2D tensor of shape (n_words, emb_size)
First you apply an LSTM with output dimension = lstm_output and return_sequence = True. The output will still be a squence so it will be a 2D tensor of shape (n_words, lstm_output).
So you have n_words vectors of length lstm_output.
Now you apply a TimeDistributed dense layer with say 3 dimensions output as parameter of the Dense. So TimeDistributed(Dense(3)).
This will apply Dense(3) n_words times, to every vectors of size lstm_output in your sequence independently... they will all become vectors of length 3. Your output will still be a sequence so a 2D tensor, of shape now (n_words, 3).
Is it clearer? :-)
return_sequences=True parameter:
If We want to have a sequence for the output, not just a single vector as we did with normal Neural Networks, so it’s necessary that we set the return_sequences to True. Concretely, let’s say we have an input with shape (num_seq, seq_len, num_feature). If we don’t set return_sequences=True, our output will have the shape (num_seq, num_feature), but if we do, we will obtain the output with shape (num_seq, seq_len, num_feature).
TimeDistributed wrapper layer:
Since we set return_sequences=True in the LSTM layers, the output is now a three-dimension vector. If we input that into the Dense layer, it will raise an error because the Dense layer only accepts two-dimension input. In order to input a three-dimension vector, we need to use a wrapper layer called TimeDistributed. This layer will help us maintain output’s shape, so that we can achieve a sequence as output in the end.