I am adapting an example of a Variational AutoEncoder for MNIST data (2D images) written by someone else (http://louistiao.me/posts/implementing-variational-autoencoders-in-keras-beyond-the-quickstart-tutorial/) to work on music, but I have a problem.
I chose this example because the author doesn't use convolutional layers, only Dense layers, so it is easy to apply to time series.
Image values are in the interval [0,1] (that is, [0,255]/255).
Musical values are in the interval [-1,1].
So I rescaled the musical values to fit the model: x = (x+1)/2 -> [0,1].
Result: the network isn't learning (my val_loss doesn't decrease).
I have no idea why!
PS: I tried changing the activation from relu/sigmoid to tanh instead of changing the data. The loss decreased, but it was negative, and I got no good results either.
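One possible explanation for the negative loss mentioned in the PS, assuming the reconstruction term is a binary cross-entropy (an assumption; the question does not show the loss): that formula is only guaranteed to be non-negative when the targets lie in [0,1]. The sketch below uses plain Python and illustrative function names, and the clipping constant is also an assumption.

```python
import math

def binary_cross_entropy(target, pred, eps=1e-7):
    """Per-element binary cross-entropy; pred is clipped into (0, 1)."""
    p = min(max(pred, eps), 1.0 - eps)
    return -(target * math.log(p) + (1.0 - target) * math.log(1.0 - p))

def rescale_to_unit(x):
    """Map a sample from [-1, 1] to [0, 1], i.e. x = (x + 1) / 2."""
    return (x + 1.0) / 2.0

# Target rescaled into [0, 1]: the loss cannot go below zero.
loss_unit = binary_cross_entropy(rescale_to_unit(-1.0), 0.1)

# Raw target of -1 fed to the same formula: the loss becomes negative.
loss_raw = binary_cross_entropy(-1.0, 0.1)

print(loss_unit >= 0.0, loss_raw < 0.0)
```

If this is what happens, a decreasing but negative loss says little about reconstruction quality, and a loss defined for real-valued targets (for example mean squared error) may be a better fit for data in [-1,1].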
I'm investigating the task of training a neural network to predict one future value given a sinusoidal input. So for example, as seen in the Figure, the input signal is x and the expected output signal y. The model's output is y^. Doing the regression task is fairly straightforward, and there are a lot of choices for this problem. I'm using a simple recurrent neural network with mean-squared error (MSE) loss between y and y^.
Additionally, suppose I know that the sinusoid is made up of N modalities, e.g., at some points, the wave oscillates at 5 Hz, then 10 Hz, then back to 5 Hz, then up to 15 Hz maybe—i.e., N=3.
In this case, I have ground-truth class labels in a vector k and the model does both regression and classification, additionally outputting a vector k^. An example is shown in the Figure. As this is a multi-class problem with exclusivity, I figured binary cross entropy (BCE) loss should be relevant here.
I'm sure there is a lot of research about combining loss functions, but does just adding MSE and BCE make sense? Scaling one up or down by a factor of 10 doesn't seem to change the learning outcome too much. So I was wondering what is considered the standard approach to problems where there is a joint classification and regression objective.
Additionally, on top of just BCE, I want to penalize k^ for quickly jumping around between classes; for example, if the model guesses one class, I'd like it to stay in that one class and switch only when it's necessary. See how in the Figure, there are fast dark blue blips in k^. I would like the same solid bands as seen in k, and naive BCE loss doesn't account for that.
Appreciate any and all advice!
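A minimal sketch of one way to combine the three terms described above, not a standard recipe. It assumes NumPy, integer class labels k, and per-step class scores k_logits. Because the classes are mutually exclusive, it swaps in categorical cross-entropy over a softmax instead of BCE, and it adds a total-variation style penalty on consecutive class probabilities to discourage fast switching. The weights w_cls and w_smooth are hypothetical hyperparameters.

```python
import numpy as np

def softmax(logits):
    """Row-wise softmax with max-subtraction for numerical stability."""
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def joint_loss(y, y_hat, k, k_logits, w_cls=1.0, w_smooth=0.1):
    """y, y_hat: (T,) regression target and prediction.
    k: (T,) integer class labels; k_logits: (T, N) class scores."""
    probs = softmax(k_logits)
    mse = np.mean((y - y_hat) ** 2)
    # Categorical cross-entropy of the true class at each time step.
    ce = -np.mean(np.log(probs[np.arange(len(k)), k] + 1e-12))
    # Mean absolute change of class probabilities between adjacent steps.
    smooth = np.mean(np.abs(probs[1:] - probs[:-1]))
    return mse + w_cls * ce + w_smooth * smooth

k = np.array([0, 0, 0, 1, 1, 1])
steady = 5.0 * np.eye(3)[k]        # confident, stable class scores
flicker = steady.copy()
flicker[1] = [0.0, 0.0, 5.0]       # a one-step blip into class 2
y = np.zeros(6)

loss_steady = joint_loss(y, y, k, steady)
loss_flicker = joint_loss(y, y, k, flicker)
print(loss_steady < loss_flicker)
```

In a training framework the same terms would be written with differentiable tensor operations; the relative weights usually need tuning because the three terms live on different scales.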
I built an LSTM to forecast precipitation, but it doesn't work well.
My code is very simple and the data is very short: it contains only 720 points.
I use MinMaxScaler to scale the data.
This is my code, with seq_len = 12:
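from tensorflow.keras import Sequential, layers
SEQ_LEN = 12  # number of past time steps per input window
model = Sequential([
    layers.LSTM(2, input_shape=(SEQ_LEN, 1)),  # 2 LSTM units, 1 feature per step
    layers.Dense(1),  # single forecast value
])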
My data looks like this:
and the output compared with the true values looks like this:
I use the Adam optimizer and MAE loss, with epochs=10.
Is it underfitting, or is this simple network unable to do the job?
The r2_score is no more than 0.55.
Please tell me how to adjust it. Thanks.
There are many options:
First, find a better window size by varying the length of the input sequences.
Second, try changing the batch size.
Third, switch the optimizer to SGD, since there are few data points, and before training the model find a good learning rate using a LearningRateScheduler callback.
Fourth, try another model architecture, for example one with convolutional layers.
Finally, because the default activation of an LSTM layer is tanh, it sometimes helps to add a Lambda layer after the last layer to scale the output values up.
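The window-size suggestion can be explored with a small helper like the one below. This is a sketch assuming NumPy and a one-step-ahead target; make_windows is an illustrative name and the sine series is only a synthetic stand-in for the 720 precipitation points.

```python
import numpy as np

def make_windows(series, window):
    """Turn a 1-D series into (samples, window, 1) inputs and next-step targets."""
    series = np.asarray(series, dtype=np.float32)
    X = np.stack([series[i:i + window] for i in range(len(series) - window)])
    y = series[window:]            # the value right after each window
    return X[..., np.newaxis], y   # add the feature axis expected by an LSTM

series = np.sin(np.linspace(0, 20 * np.pi, 720))  # synthetic stand-in data
for window in (6, 12, 24):
    X, y = make_windows(series, window)
    print(window, X.shape, y.shape)
```

Each candidate window length gives a different dataset, so the model would be retrained and scored on a held-out part of the series for each one.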
I am working on a project to predict soccer player values from a set of inputs. The data consists of about 19,000 rows and 8 columns (7 columns for input and 1 column for the target) all of numerical values.
I am using a fully connected Neural Network for the prediction but the problem is the loss is not decreasing as it should.
The loss is very large (1e+13) and doesn’t decrease as it should, it just fluctuates.
This is the function I am using to run the model:
import torch

def gradient_descent(model, learning_rate, num_epochs, data_loader, criterion):
    losses = []
    # Pass learning_rate through; without lr=..., Adam uses its default of 1e-3.
    optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)
    for epoch in range(num_epochs):  # one epoch
        for inputs, outputs in data_loader:  # one iteration
            inputs, outputs = inputs.to(torch.float32), outputs.to(torch.float32)
            logits = model(inputs)
            loss = criterion(torch.squeeze(logits), outputs)  # forward pass
            optimizer.zero_grad()  # zero out the gradients
            loss.backward()  # compute the gradients (backward pass)
            optimizer.step()  # take one step
            losses.append(loss.item())
        # mean loss over the batches of this epoch
        loss = sum(losses[-len(data_loader):]) / len(data_loader)
        print(f'Epoch #{epoch}: Loss={loss:.3e}')
    return losses
The model is a fully connected neural network with 4 hidden layers, each with 7 neurons. The input layer has 7 neurons and the output layer has 1. I am using MSE as the loss function. I tried changing the learning rate but the result is still bad.
What could be the reason behind this?
Thank you!
It is difficult to diagnose your problem from the information you provided, but I'll try to point you in some useful directions.
Data Normalization:
The way we initialize the weights in deep NN has a significant effect on the training process. See, e.g.:
He, K., Zhang, X., Ren, S. and Sun, J., Delving deep into rectifiers: Surpassing human-level performance on imagenet classification (ICCV 2015).
Most initialization methods assume the inputs have zero mean and unit variance (or similar statistics). If your inputs violate these assumptions, you will find it difficult to train. See, e.g., this post.
Normalize the Targets:
You are trying to solve a regression problem (MSE loss). It might be the case that your targets are poorly scaled, which causes very large loss values. Try to normalize the targets so that they span a more compact range.
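A minimal sketch of both normalization steps, assuming NumPy arrays. The data here is synthetic, and the helper names (fit_standardizer, standardize, unstandardize) are illustrative rather than taken from the question. The statistics should be computed on the training split only and reused for validation and test data.

```python
import numpy as np

def fit_standardizer(a):
    """Return per-column mean and std; a zero std is replaced by 1."""
    mean = a.mean(axis=0)
    std = a.std(axis=0)
    return mean, np.where(std == 0, 1.0, std)

def standardize(a, mean, std):
    return (a - mean) / std

def unstandardize(a, mean, std):
    """Map model outputs back to the original units."""
    return a * std + mean

rng = np.random.default_rng(0)
X = rng.uniform(0, 100, size=(1000, 7))   # synthetic inputs, 7 features
y = rng.uniform(1e5, 1e8, size=1000)      # synthetic large-valued targets

x_mean, x_std = fit_standardizer(X)
y_mean, y_std = fit_standardizer(y)
X_scaled = standardize(X, x_mean, x_std)
y_scaled = standardize(y, y_mean, y_std)

# MSE of a constant "predict the mean" baseline, before and after scaling.
raw_mse = np.mean((y - y.mean()) ** 2)
scaled_mse = np.mean((y_scaled - y_scaled.mean()) ** 2)
print(f"{raw_mse:.3e} {scaled_mse:.3e}")
```

Predictions made in the scaled space can be converted back with unstandardize(pred, y_mean, y_std) before reporting errors in the original units.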
Learning Rate:
Try and adjust your learning rate: both increasing it and decreasing it by orders of magnitude.
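As an illustration of sweeping the learning rate by orders of magnitude, here is a toy sketch using plain gradient descent on a 1-D least-squares problem in NumPy. It is not the asker's model or optimizer; the problem, the step count and the grid of rates are all assumptions chosen to show the pattern.

```python
import numpy as np

def final_loss(lr, steps=50):
    """Run plain gradient descent on a toy problem and return the final MSE."""
    x = np.linspace(-1.0, 1.0, 101)
    y = 3.0 * x                      # the true weight is 3
    w = 0.0
    for _ in range(steps):
        grad = 2.0 * np.mean(x * (x * w - y))
        w -= lr * grad
    return float(np.mean((x * w - y) ** 2))

learning_rates = [1e-3, 1e-2, 1e-1, 1.0, 10.0]
results = {lr: final_loss(lr) for lr in learning_rates}
best_lr = min(results, key=results.get)

for lr, loss in results.items():
    print(f"lr={lr:g}: final loss={loss:.3e}")
```

On this toy problem the smallest rates barely move, the largest one diverges, and a rate in between converges, which is the kind of picture a coarse sweep is meant to reveal.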
I'm studying supervised deep learning to estimate depth images from monocular images.
The dataset is currently KITTI. The RGB input images come from the KITTI Raw data, and the data from the following link is used as ground truth.
I designed a simple encoder-decoder network, but the results are not very good, so I am trying various things.
While searching for methods, I found that, because the ground truth has many invalid areas (values that cannot be used, as shown in the image below), people train only on the valid areas by masking.
So I trained with masking, but I am curious why this result keeps coming out.
This is the training part of my code.
How can I fix this problem?
for epoch in range(num_epoch):
    model.train()  # training mode
    for batch_idx, samples in enumerate(tqdm(train_loader)):
        x_train = samples['RGB'].to(device)
        y_train = samples['groundtruth'].to(device)
        pred_depth = model(x_train)
        valid_mask = y_train != 0  # keep only pixels that have a ground-truth depth
        valid_gt_depth = y_train[valid_mask]
        valid_pred_depth = pred_depth[valid_mask]
        loss = loss_RMSE(valid_pred_depth, valid_gt_depth)
As far as I understand, you are trying to estimate depth from an RGB input image. This is an ill-posed problem, since the same input image can correspond to multiple plausible depth values. You would need additional techniques to estimate accurate depth from RGB images, instead of simply taking an L1 or L2 loss between the predicted depth and its corresponding ground-truth depth image.
I would suggest going through some papers on estimating depth from single images, such as Depth Map Prediction from a Single Image using a Multi-Scale Deep Network, in which one network first estimates the global structure of the given image and a second network then refines the local scene information. Instead of a simple RMSE loss, as you used, they use a scale-invariant error function that measures the relationship between points.
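A minimal NumPy sketch of a scale-invariant log error of the kind used in that paper, combined with the same "ground truth != 0" masking as in the question. It assumes strictly positive predictions, and the function name and the default lam are illustrative: with lam=1 the error ignores a global scale factor, and with lam=0 it reduces to a mean squared log error.

```python
import numpy as np

def scale_invariant_error(pred, target, lam=1.0):
    """Scale-invariant log error over pixels with valid (non-zero) ground truth.

    With d_i = log(pred_i) - log(target_i) over n valid pixels:
    error = mean(d_i^2) - lam * (sum(d_i))^2 / n^2
    """
    valid = target > 0                        # mask out missing depth values
    d = np.log(pred[valid]) - np.log(target[valid])
    n = d.size
    return float(np.mean(d ** 2) - lam * (d.sum() ** 2) / n ** 2)

target = np.array([[0.0, 2.0, 4.0],
                   [8.0, 0.0, 16.0]])         # zeros mark invalid pixels
pred = 2.0 * target + (target == 0)           # every valid pixel is off by a factor of 2

print(scale_invariant_error(pred, target, lam=1.0))  # close to zero
print(scale_invariant_error(pred, target, lam=0.0))  # penalizes the scale error
```

For training, the same expression would be written with the framework's tensor operations so that gradients flow through it.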
I have a small dataset of 500 plant images, and for each image I have to predict a number in the range [1, 10]. There is an order relation between the numbers (10 > 9 > ... > 1). This problem is similar to age estimation from a single photo.
I tried regression using Resnet18, Resnet34 and VGG16. None of them gave a very good result.
The interesting point is that when I plotted the heatmap for a few images, it showed that the model was picking the wrong spots to predict the answer. It is as if, when predicting age from a facial photo, the CNN gave more weight to the background than to the actual face.
I tried other approaches as well, such as classification and learning to rank, but the same thing happens when I plot the heatmap. With these approaches, the best accuracy I get is 30% using classification and 35% using learning to rank.
For the regression and classification approaches I used the Fastai implementation with pretrained models. For the learning-to-rank approach I used https://github.com/Raschka-research-group/coral-cnn, which I modified slightly so that it can also use a pretrained model.
Another important point is that the dataset is unbalanced: 80% of the dataset corresponds to classes 6 to 10.
Does anyone have any tips to improve it, or another approach I could try?
EDIT:
My data augmentation looks like this:
transforms.Compose([
    transforms.Resize(256), transforms.CenterCrop(224),
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.ColorJitter(brightness=0.15),
    transforms.ToTensor(),
    # ImageNet channel means and standard deviations
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
])
You can try augmenting your dataset to obtain more data (e.g. random cropping, rotating, etc), and make sure you normalise your data. For the class imbalance problem, you can try using PyTorch's WeightedRandomSampler:
import torch
from torch.utils.data import WeightedRandomSampler

# Let there be 9 samples in class 0 and 1 sample in class 1
class_counts = [9.0, 1.0]
num_samples = int(sum(class_counts))
labels = [0] * 9 + [1]  # corresponding labels of the samples
# weight each class inversely to its frequency
class_weights = [num_samples / count for count in class_counts]
weights = [class_weights[label] for label in labels]  # one weight per sample
sampler = WeightedRandomSampler(torch.DoubleTensor(weights), num_samples)
You should be able to apply this to your case with 10 classes easily. I hope this solves your problem!
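For the 10-class case, the per-sample weights can be computed directly from the label array. This is a sketch assuming NumPy and labels in the range 1 to 10 as in the question; sample_weights is an illustrative helper name and the label counts are made up.

```python
import numpy as np

def sample_weights(labels, num_classes=10):
    """Return one weight per sample, inversely proportional to its class count."""
    labels = np.asarray(labels)
    counts = np.bincount(labels - 1, minlength=num_classes)  # labels start at 1
    # Classes with no samples get weight 0 to avoid dividing by zero.
    class_weights = np.where(counts > 0, len(labels) / np.maximum(counts, 1), 0.0)
    return class_weights[labels - 1]

labels = [1] * 2 + [10] * 8          # made-up, heavily imbalanced labels
weights = sample_weights(labels)
print(weights)
```

The resulting array could then be passed to WeightedRandomSampler in place of the hand-built weights list, so that each class is drawn with roughly equal total probability.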