Let's say we have the following neural network in PyTorch
seq_model = nn.Sequential(
nn.Linear(1, 13),
nn.Tanh(),
nn.Linear(13, 1))
With the following input tensor
input = torch.tensor([1.0, 1.0, 5.0], dtype=torch.float32).unsqueeze(1)
I can run forward through the net and get
seq_model(input)
tensor([[-0.0165],
[-0.0165],
[-0.2289]], grad_fn=<AddmmBackward0>)
Presumably I can also get a single scalar value as output, but I'm not sure how.
Thank you. I'm trying to use such a network for reinforcement learning, as a value function approximator for game board state evaluation.
The first dimension of input is the number of observations in your minibatch (3); the second dimension is the number of features (1).
If you want to forward a single 3-dimensional input, the network must be modified (nn.Linear(1, 13) becomes nn.Linear(3, 13)) and you must remove the unsqueeze(1) on input. Otherwise, you can merge the three outputs by using a loss function to compute a single scalar from them.
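As a minimal sketch of the first option (the layer sizes are yours; the .item() call is just one way to extract a plain Python float):

import torch
import torch.nn as nn

# Treat the three values as one 3-feature observation rather than a
# minibatch of three 1-feature observations.
seq_model = nn.Sequential(
    nn.Linear(3, 13),
    nn.Tanh(),
    nn.Linear(13, 1))

state = torch.tensor([1.0, 1.0, 5.0])  # shape (3,); no unsqueeze(1)
value = seq_model(state)               # shape (1,): a single value estimate
print(value.item())                    # Python float (untrained net, so random)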
I am a beginner in deep learning.
I am using this dataset and I want my network to detect the keypoints of a hand.
How can I make my output layer's nodes fall in the range [-1, 1] (the range of normalized 2D points)?
Another problem is that when I train for more than 1 epoch, the loss takes negative values.
Criterion: torch.nn.MultiLabelSoftMarginLoss(), optimizer: torch.optim.SGD().
You can find my repo here.
net = nnModel.Net()
net = net.to(device)
criterion = nn.MultiLabelSoftMarginLoss()
optimizer = optim.SGD(net.parameters(), lr=learning_rate)
lr_scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer=optimizer, gamma=decay_rate)
You can use the Tanh activation function, since the image of the function lies in [-1, 1].
The problem of predicting key-points in an image is more of a regression problem than a classification problem (especially if you're making your model outputs + targets fall within a continuous interval). Therefore, I suggest you use the L2 Loss.
In fact, it could be a good exercise to determine, using cross-validation, which of the loss functions appropriate for regression problems gives the lowest expected generalization error. There are several such functions available in PyTorch.
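As a sketch of that suggestion (the 512-feature backbone and 42 outputs, i.e. 21 keypoints × 2 coordinates, are placeholder numbers, not taken from your repo):

import torch
import torch.nn as nn

# Hypothetical regression head: Tanh squashes predictions into [-1, 1].
head = nn.Sequential(
    nn.Linear(512, 42),
    nn.Tanh())

criterion = nn.MSELoss()                      # L2 loss; never negative
features = torch.randn(8, 512)                # fake backbone features, batch of 8
targets = torch.empty(8, 42).uniform_(-1, 1)  # normalized keypoint coordinates
loss = criterion(head(features), targets)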
One way I can think of is to use torch.nn.Sigmoid, which produces outputs in the [0, 1] range, and to scale the outputs to [-1, 1] using the transformation 2*x - 1.
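A one-line sketch of that scaling (the shape is illustrative):

import torch

raw = torch.sigmoid(torch.randn(8, 42))  # sigmoid outputs lie in (0, 1)
out = 2 * raw - 1                        # rescaled to (-1, 1)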
In my neural network model, I represent an 8-word sentence with an 8x256-dimensional embedding matrix. I want to give it to an LSTM as input, where the LSTM takes a single word embedding at a time and processes it. According to the PyTorch documentation, the input should have the shape (seq_len, batch, input_size). What is the correct way to convert my input to the desired shape? I don't want to mix up the numbers by mistake. I am quite new to PyTorch and row-major calculations, so I wanted to ask here. I do it as follows; is it correct?
x = torch.rand(8,256)
lstm_input = torch.reshape(x,(8,1,256))
Your solution is correct: you added a singleton dimension for the "batch" dimension, leaving x with temporal dimension 8 and input dimension 256.
Since you are new to PyTorch, here are a few equivalent ways of doing the same thing:
x = x[:, None, :]
Putting None at dim=1 tells PyTorch to add a singleton dimension there.
Another way is to use view:
x = x.view(8, 1, 256)
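For reference (not mentioned above, but standard PyTorch), unsqueeze does the same thing and reads explicitly:

x = x.unsqueeze(1)   # (8, 256) -> (8, 1, 256)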
I have recently started to learn Deep Learning and CNNs. I have come across the following code which defines a simple CNN.
Can anyone help me to understand how these lines work:
loss = layer_output[:, :, :, 0] - What is the result of this? My question is that the network has not been trained yet; the weights [kernels] have not been calculated yet, so what data is it going to return? Does 0 represent the first kernel?
iterate = K.function([input_img], [loss, grads]) - There is not much documentation available on the Keras site. What I understand is that iterate is a function which takes an input tensor and returns a list of tensors, the first one being loss and the second one grads. But they are defined elsewhere!
Define an input image with these dimensions:
import numpy as np

img_data = np.random.uniform(size=(1, 250, 250, 3))
There is a simple CNN which has one convolutional layer using two 3x3 kernels.
from keras.models import Model
from keras.layers import Input, Conv2D, Flatten, Dense
from keras import backend as K

input = Input(shape=(250, 250, 3), name='input_1')
First_Conv2D = Conv2D(2, kernel_size=(3, 3), padding="same", name='conv2d_1', activation='relu')(input)
flat = Flatten(name='flatten_1')(First_Conv2D)
output = Dense(2, name='dense_1', activation='softmax')(flat)
model = Model(inputs=[input], outputs=[output])
# Map layer names to layer objects, then grab the conv layer's output tensor.
layer_dict = dict([(layer.name, layer) for layer in model.layers])
layer_output = layer_dict['conv2d_1'].output
input_img = model.input
# Calculate loss and gradient.
loss = layer_output[:, :, :, 0]
grads = K.gradients(loss, input_img)[0]
# Define a Keras function
iterate = K.function([input_img], [loss, grads])
# Call iterate function
loss_value, grads_value = iterate([img_data])
Thank You.
This looks like a nasty dissection of Keras as an API; I reckon it creates more confusion than it serves as an introduction to deep learning. Anyway, addressing your questions:
All tensors are symbolic, meaning that until we run a session they do not contain any values; they instead define a directed computation graph. loss = layer_output[:, :, :, 0] is a slicing operation that takes the first element of the last dimension, returning another tensor with 3 dimensions. Only when you run the session with actual inputs do the tensors hold the values on which these operations run. The operations are almost identical to those on NumPy ndarrays, which are not symbolic and do contain values, so you can get an intuition from NumPy.
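To borrow that NumPy intuition, here is the same slicing on a concrete array (the shape matches conv2d_1's output, since padding="same" preserves the 250x250 size):

import numpy as np

a = np.random.uniform(size=(1, 250, 250, 2))  # batch, height, width, 2 kernels
sliced = a[:, :, :, 0]                        # feature map of the first kernel
print(sliced.shape)                           # (1, 250, 250): last dim dropped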
K.function just glues the inputs to the outputs, returning a single operation that, when given the inputs, follows the computation graph from the inputs to the defined outputs. In this case, given a list with a single input, it returns a list of 2 output tensors, loss and grads. Remember that these are defined symbolically; if you try to print one you'll just get what it is, its shape, and its data type. It is the call iterate([img_data]) with a concrete array that actually evaluates them into numbers.
I am working on predicting Semantic Textual Similarity (SemEval 2017 Task 1) between a pair of texts. The similarity score (the output) is a continuous value in [0, 5]. The neural network model (link below) therefore has 6 units in the final layer for predicting values in [0, 5]. The objective function used is the Pearson correlation coefficient, and softmax activation is used. Now, in order to train the model, how do I give the target output values to the model? Since there are 6 output classes, I should probably feed one-hot-encoded vectors of the output. In that case, how can we convert an output (which might be a float value such as 2.33) into a one-hot vector of length 6? Or is there another way of specifying the target output and training the model?
Paper: http://nlp.arizona.edu/SemEval-2017/pdf/SemEval016.pdf
If the value you're trying to predict is continuously-defined, you might be better off configuring this as a regression architecture. This will be simpler to train and interpret and will give you non-integer predictions (which you can then bucket or threshold however you please).
In order to do this, replace your softmax layer with a layer containing a single neuron with a linear activation function. Then you can simply train this network using your real-valued similarity numbers at the output. For loss function, you can use MSE / L2 unless you have a reason to do otherwise.
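A minimal Keras sketch of that change (the layer sizes and the 300-dimensional input are placeholders, not taken from the paper's model):

from keras.models import Sequential
from keras.layers import Dense

# The only part that matters is the head: one linear unit instead of a
# 6-way softmax, so the network predicts a real-valued score directly.
model = Sequential([
    Dense(64, activation='relu', input_shape=(300,)),
    Dense(1, activation='linear')])
model.compile(optimizer='adam', loss='mse')
# model.fit(X, y) with y holding the raw similarity scores, e.g. 2.33;
# no one-hot encoding is needed.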
I am using Pylearn2 or Caffe to build a deep network. My target is ordinal (ordered categorical). I am trying to find a proper loss function but cannot find any in Pylearn2 or Caffe.
I read the paper "Loss Functions for Preference Levels: Regression with Discrete Ordered Labels". I get the general idea, but I am not sure I understand what the thresholds would be if my final layer is a softmax over logistic regression (outputting probabilities).
Can someone help me by pointing to an implementation of such a loss function?
Thanks
For both Pylearn2 and Caffe, your labels will need to be 0-4 instead of 1-5; it's just the way they work. The output layer will be 5 units, each essentially a logistic unit, and the softmax can be thought of as an adaptor that normalizes the final outputs ("softmax" is the commonly used output type). When training, the value of any individual unit is rarely ever exactly 0.0 or 1.0; it's always a distribution across your units, on which log-loss can be calculated. This loss is compared against the "perfect" case and the error is back-propagated to update your network weights. Note that a raw output from Pylearn2 or Caffe is not a specific digit 0, 1, 2, 3, or 4; it's 5 numbers, each associated with the likelihood of one of the 5 classes. When classifying, one just takes the class with the highest value as the 'winner'.
I'll try to give an example. Say I have a 3-class problem and train a network with a 3-unit softmax; the first unit represents the first class, the second unit the second class, and the third unit the third. Say I feed a test case through and get 0.25, 0.5, 0.25. Since 0.5 is the highest, a classifier would say "2". That is the softmax output; it makes sure the sum of the output units is one.
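The same example in NumPy terms (the logits are made up to reproduce those numbers):

import numpy as np

logits = np.array([0.0, 0.69, 0.0])            # made-up pre-softmax scores
probs = np.exp(logits) / np.exp(logits).sum()  # softmax: normalize to sum to 1
print(probs.round(2))                          # ~[0.25, 0.5, 0.25]
print(probs.argmax())                          # 1, i.e. the second class wins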
You should have a look at ordinal (logistic) regression. This is the formal solution to the problem setup you describe (do not use plain regression, as its distance measures of errors are wrong).
https://stats.stackexchange.com/questions/140061/how-to-set-up-neural-network-to-output-ordinal-data
In particular, I recommend looking at the Coral ordinal regression implementation at
https://github.com/ck37/coral-ordinal/issues.