I am working on this implementation of a GAN:
https://github.com/rishikksh20/hifigan-denoiser/blob/master/train.py
You can see there are a total of three neural networks, plus the optimizes defined using torch.optim.AdamW.
It seems that, when loading both the networks and the optimizers using load_state_dict, the loss functions start from a much higher value.
Judging from some similar issues here on stackoverflow, this may be due to the learning rates or other parameters not being stored. However, the learning rate here is fixed. The loading operations seem all be correctly executed here. Can you spot the bug in the code?
Related
I'm not an ML expert and know just a little background about it.
I know that there are techniques to reduce the size of neural networks like distillation and pruning. But I don't know how to efficiently perform those techniques.
Now I need to solve a quite practical problem. I'd like to ship FaceNet, the face recognition model, to mobile devices. There might be trade-offs between recognition accuracy and performance + size. I don't know which size of model would fit best my demands. I think I need to test models in many sizes and figure out which one is the best empirically. To do this, I should acquire models in many sizes that are obtained by compressing the pre-trained FaceNet on its website. For example, 30Mb version of FaceNet, 40Mb version of FaceNet, etc.
However, model compression is not free and costs money. I'm worried that I'd do something very stupid and expensive. What is the recommended way to do this? Does this problem have any common solution for not ML expert?
Network "compression" is not like a slider in which you can move anywhere from slow/accurate to fast/not-accurate.
From your starting model you can apply some techniques which may or may not reduce the size of the network and may or may not reduce also the accuracy. For example converting weights from float32 to float16 will for sure reduce the memory requirement of the network by half, but can also introduce a small reduction in accuracy and it's not supported on all devices.
My suggestion is: start with the base model and perform some tests with it. Understand how far you are from the target FPS and size of your application and then decide which approach makes more sense to reach the objective you have in mind.
Since you said you're not an ML expert I think this kind of task (at least to my knowledge) requires some amount of studying of the subject and experimentation and I would start with an open source solution like for example https://tvm.apache.org/.
Call for experts in deep learning.
Hey, I am recently working on training images using tensorflow in python for tone mapping. To get the better result, I focused on using perceptual loss introduced from this paper by Justin Johnson.
In my implementation, I made the use of all 3 parts of loss: a feature loss that extracted from vgg16; a L2 pixel-level loss from the transferred image and the ground true image; and the total variation loss. I summed them up as the loss for back propagation.
From the function
yˆ=argminλcloss_content(y,yc)+λsloss_style(y,ys)+λTVloss_TV(y)
in the paper, we can see that there are 3 weights of the losses, the λ's, to balance them. The value of three λs are probably fixed throughout the training.
My question is that does it make sense if I dynamically change the λ's in every epoch(or several epochs) to adjust the importance of these losses?
For instance, the perceptual loss converges drastically in the first several epochs yet the pixel-level l2 loss converges fairly slow. So maybe the weight λs should be higher for the content loss, let's say 0.9, but lower for others. As the time passes, the pixel-level loss will be increasingly important to smooth up the image and to minimize the artifacts. So it might be better to adjust it higher a bit. Just like changing the learning rate according to the different epochs.
The postdoc supervises me straightly opposes my idea. He thought it is dynamically changing the training model and could cause the inconsistency of the training.
So, pro and cons, I need some ideas...
Thanks!
It's hard to answer this without knowing more about the data you're using, but in short, dynamic loss should not really have that much effect and may have opposite effect altogether.
If you are using Keras, you could simply run a hyperparameter tuner similar to the following in order to see if there is any effect (change the loss accordingly):
https://towardsdatascience.com/hyperparameter-optimization-with-keras-b82e6364ca53
I've only done this on smaller models (way too time consuming) but in essence, it's best to keep it constant and also avoid angering off your supervisor too :D
If you are running a different ML or DL library, there are optimizer for each, just Google them. It may be best to run these on a cluster and overnight, but they usually give you a good enough optimized version of your model.
Hope that helps and good luck!
I have found the keras-rl/examples/cem_cartpole.py example and I would like to understand, but I don't find documentation.
What does the line
memory = EpisodeParameterMemory(limit=1000, window_length=1)
do? What is the limit and what is the window_length? Which effect does increasing either / both parameters have?
EpisodeParameterMemory is a special class that is used for CEM. In essence it stores the parameters of a policy network that were used for an entire episode (hence the name).
Regarding your questions: The limit parameter simply specifies how many entries the memory can hold. After exceeding this limit, older entries will be replaced by newer ones.
The second parameter is not used in this specific type of memory (CEM is somewhat of an edge case in Keras-RL and mostly there as a simple baseline). Typically, however, the window_length parameter controls how many observations are concatenated to form a "state". This may be necessary if the environment is not fully observable (think of it as transforming a POMDP into an MDP, or at least approximately). DQN on Atari uses this since a single frame is clearly not enough to infer the velocity of a ball with a FF network, for example.
Generally, I recommend reading the relevant paper (again, CEM is somewhat of an exception). It should then become relatively clear what each parameter means. I agree that Keras-RL desperately needs documentation but I don't have time to work on it right now, unfortunately. Contributions to improve the situation are of course always welcome ;).
A little late to the party, but I feel like the answer doesn't really answer the question.
I found this description online (https://pytorch.org/tutorials/intermediate/reinforcement_q_learning.html#replay-memory):
We’ll be using experience replay
memory for training our DQN. It stores the transitions that the agent
observes, allowing us to reuse this data later. By sampling from it
randomly, the transitions that build up a batch are decorrelated. It
has been shown that this greatly stabilizes and improves the DQN
training procedure.
Basically you observe and save all of your state transitions so that you can train your network on them later on (instead of having to make observations from the environment all the time).
I have a dataset of around 6K chemical formulas which I am preprocessing via Keras' tokenization to perform binary classification. I am currently using a 1D convolutional neural network with dropouts and am obtaining an accuracy of 82% and validation accuracy of 80% after only two epochs. No matter what I try, the model just plateaus there and doesn't seem to be improving at all. Those same exact accuracies are reached with a vanilla LSTM too. What else can I try to improve my accuracies? Losses only have a difference of 0.04... Anyone have any ideas? Both models use an embedding layer and changing the output dimension isn't having an effect either.
According to your answer, I believe your model has a high bias and low variance (see this link for further details). Thus, your model is not fitting your data very well and it is causing underfitting. So, I suggest you 3 things:
Train your model a little longer: I believe two epoch are too few to give a chance to your model understand the patterns in the data. Try to minimize learning rate and increase the number of epochs.
Try a different architecture: you may change the amount of convolutions, filters and layers, You can also use different activation functions and other layers like max pooling.
Make an error analysis: once you finished your training, apply your model to test set and take a look into the errors. How much false positives and false negatives do you have? Is your model better to classify one class than the other? You can see a pattern in the errors that may be related to your data?
Finally, if none of these suggestions helped you, you may also try to increase the number of features, if possible.
So I have been making a simple HTML5 tuner using the Web Audio API. I have it all set up to respond to the correct frequencies, the problem seems to be with getting the actual frequencies. Using the input, I create an array of the spectrum where I look for the highest value and use that frequency as the one to feed into the tuner. The problem is that when creating an analyser in Web Audio it can not become more specific than an FFT value of 2048. When using this if i play a 440hz note, the closest note in the array is something like 430hz and the next value seems to be higher than 440. Therefor the tuner will think I am playing these notes when infact the loudest frequency should be 440hz and not 430hz. Since this frequency does not exist in the analyser array I am trying to figure out a way around this or if I am missing something very obvious.
I am very new at this so any help would be very appreciated.
Thanks
There are a number of approaches to implementing pitch detection. This paper provides a review of them. Their conclusion is that using FFTs may not be the best way to go - however, it's unclear quite what their FFT-based algorithm actually did.
If you're simply tuning guitar strings to fixed frequencies, much simpler approaches exist. Building a fully chromatic tuner that does not know a-priori the frequency to expect is hard.
The FFT approach you're using is entirely possible (I've built a robust musical instrument tuner using this approach that is being used white-label by a number of 3rd parties). However you need a significant amount of post-processing of the FFT data.
To start, you solve the resolution problem using the Short Timer FFT (STFT) - or more precisely - a succession of them. The process is described nicely in this article.
If you intend building a tuner for guitar and bass guitar (and let's face it, everyone who asks the question here is), you'll need t least a 4092-point DFT with overlapping windows in order not to violate the nyquist rate on the bottom E1 string at ~41Hz.
You have a bunch of other algorithmic and usability hurdles to overcome. Not least, perceived pitch and the spectral peak aren't always the same. Taking the spectral peak from the STFT doesn't work reliably (this is also why the basic auto-correlation approach is also broken).