L1 regularization in CatBoost

I want to use L1 regularization with CatBoost to remove some irrelevant features and avoid overfitting, but I cannot find out how to do it. It seems to me there is no L1 option. Please give me a hint on how I can use it. Thank you.

I'm reasonably sure that CatBoost does not support L1 regularization. This page describes parameter tuning and lists L2 regularization (l2_leaf_reg), but L1 is not listed.
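For reference, the closest built-in knob is the L2 penalty; here is a minimal sketch (the other hyperparameter values below are purely illustrative):

```python
# Minimal sketch: CatBoost exposes an L2 penalty on leaf values via l2_leaf_reg,
# but (to my knowledge) no L1 penalty. Hyperparameter values are illustrative only.
from catboost import CatBoostClassifier

model = CatBoostClassifier(
    iterations=500,
    learning_rate=0.05,
    l2_leaf_reg=3.0,   # L2 leaf regularization (CatBoost's default is 3)
    verbose=100,
)
# model.fit(X_train, y_train)   # assumes X_train and y_train are already defined
```

If the goal is removing irrelevant features rather than shrinking weights, a workflow-level substitute for L1 is to train once and drop the low-importance features reported by model.get_feature_importance().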

Related

Why is my generator and discriminator loss converging at higher values in WGAN-GP?

This is the loss plot of WGAN-GP after training for 14000 iterations. My image size is 128 by 128. Though the loss plot seems to be converging, the generator loss at iteration 14000 is -26646 and the critic loss is -249909.
[Loss plot]
Batch Normalization in the discriminator breaks Wasserstein GANs with gradient penalty. The authors themselves advocate the use of layer normalization instead, and this is clearly written in bold in their paper (https://papers.nips.cc/paper/7159-improved-training-of-wasserstein-gans.pdf). It is hard to say whether there are other bugs in your code, but I urge you to thoroughly read the DCGAN and Wasserstein GAN papers and really take notes on the hyperparameters. Getting them wrong really destroys the performance of the GAN, and a hyperparameter search gets expensive quite quickly.
By the way, transposed convolutions produce stairway (checkerboard) artifacts in your output images. Use image resizing followed by a convolution instead. For an in-depth explanation of that phenomenon I can recommend the following resource (https://distill.pub/2016/deconv-checkerboard/).
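To make both points concrete, here is a minimal Keras sketch (filter counts and kernel sizes are my own illustrative choices, not taken from your code): a critic block that uses LayerNormalization instead of BatchNormalization, and a generator block that resizes and then convolves instead of using a transposed convolution.

```python
# Minimal Keras sketch (illustrative sizes): LayerNormalization in the critic,
# and resize-then-convolve in the generator instead of Conv2DTranspose.
from tensorflow.keras import layers

def critic_block(x, filters):
    x = layers.Conv2D(filters, kernel_size=4, strides=2, padding="same")(x)
    x = layers.LayerNormalization()(x)   # not BatchNormalization: BN breaks WGAN-GP
    return layers.LeakyReLU(0.2)(x)

def generator_block(x, filters):
    x = layers.UpSampling2D(interpolation="nearest")(x)            # resize first...
    x = layers.Conv2D(filters, kernel_size=3, padding="same")(x)   # ...then convolve
    return layers.ReLU()(x)
```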
This is an interesting find as well, which may help you: Accelerated WGAN update strategy with loss change rate balancing.

In deep learning, can I change the weight of loss dynamically?

Call for experts in deep learning.
Hey, I have recently been working on training images using TensorFlow in Python for tone mapping. To get a better result, I focused on using the perceptual loss introduced in this paper by Justin Johnson.
In my implementation, I made use of all three parts of the loss: a feature loss extracted from VGG16, an L2 pixel-level loss between the transferred image and the ground-truth image, and the total variation loss. I summed them up as the loss for backpropagation.
From the objective in the paper,
$$\hat{y} = \arg\min_{y} \; \lambda_c \, \mathrm{loss}_{\mathrm{content}}(y, y_c) + \lambda_s \, \mathrm{loss}_{\mathrm{style}}(y, y_s) + \lambda_{TV} \, \mathrm{loss}_{TV}(y),$$
we can see that there are three weights on the losses, the λ's, to balance them. The values of the three λ's are presumably fixed throughout training.
My question is: does it make sense to dynamically change the λ's every epoch (or every few epochs) to adjust the relative importance of these losses?
For instance, the perceptual loss converges drastically in the first several epochs, yet the pixel-level L2 loss converges fairly slowly. So maybe the weight should initially be higher for the content loss, say 0.9, and lower for the others. As time passes, the pixel-level loss becomes increasingly important for smoothing the image and minimizing artifacts, so it might be better to raise its weight a bit, much like changing the learning rate across epochs.
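For reference, the mechanics of what I have in mind look roughly like this (a minimal sketch only: the VGG feature loss is omitted and the schedule values are made up).

```python
# Minimal sketch of per-epoch loss-weight scheduling in Keras.
# Assumptions: y_true / y_pred are image batches; the VGG feature loss is omitted;
# the weights and schedule below are illustrative, not tuned values.
import tensorflow as tf

# non-trainable variables so the weights can change between epochs without recompiling
lam_pixel = tf.Variable(0.1, trainable=False, dtype=tf.float32)
lam_tv = tf.Variable(0.01, trainable=False, dtype=tf.float32)

def combined_loss(y_true, y_pred):
    pixel_l2 = tf.reduce_mean(tf.square(y_true - y_pred))    # pixel-level L2 loss
    tv = tf.reduce_mean(tf.image.total_variation(y_pred))    # total variation loss
    return lam_pixel * pixel_l2 + lam_tv * tv

class LossWeightScheduler(tf.keras.callbacks.Callback):
    """Adjusts the loss weights at the start of each epoch (illustrative schedule)."""
    def on_epoch_begin(self, epoch, logs=None):
        lam_pixel.assign(min(1.0, 0.1 + 0.05 * epoch))  # let the pixel loss matter more over time

# hypothetical usage:
# model.compile(optimizer="adam", loss=combined_loss)
# model.fit(x, y, epochs=100, callbacks=[LossWeightScheduler()])
```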
The postdoc who supervises me strongly opposes my idea. He thinks it amounts to dynamically changing the training objective and could make the training inconsistent.
So, pros and cons, I need some ideas...
Thanks!
It's hard to answer this without knowing more about the data you're using, but in short, dynamic loss weighting should not really have that much effect and may even have the opposite effect altogether.
If you are using Keras, you could simply run a hyperparameter tuner similar to the following in order to see if there is any effect (change the loss accordingly):
https://towardsdatascience.com/hyperparameter-optimization-with-keras-b82e6364ca53
I've only done this on smaller models (it gets way too time consuming otherwise), but in essence it's best to keep the weights constant, and that also avoids angering your supervisor :D
If you are using a different ML or DL library, there are hyperparameter optimizers for each; just Google them. It may be best to run these on a cluster overnight, but they usually give you a good enough optimized version of your model.
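As a rough illustration of what such a tuning run could look like (a sketch only; the linked article may use a different tool, and the network here is a placeholder), the loss weights can simply be treated as hyperparameters:

```python
# Rough sketch with keras-tuner: treat the loss weights as hyperparameters instead
# of hand-scheduling them. The one-layer network is a placeholder; search ranges
# are illustrative assumptions.
import tensorflow as tf
import keras_tuner as kt

def build_model(hp):
    lam_pixel = hp.Float("lambda_pixel", 1e-2, 1.0, sampling="log")
    lam_tv = hp.Float("lambda_tv", 1e-4, 1e-1, sampling="log")

    inputs = tf.keras.Input(shape=(128, 128, 3))
    outputs = tf.keras.layers.Conv2D(3, 3, padding="same")(inputs)  # placeholder network

    def loss(y_true, y_pred):
        pixel = tf.reduce_mean(tf.square(y_true - y_pred))
        tv = tf.reduce_mean(tf.image.total_variation(y_pred))
        return lam_pixel * pixel + lam_tv * tv

    model = tf.keras.Model(inputs, outputs)
    model.compile(optimizer="adam", loss=loss)
    return model

tuner = kt.RandomSearch(build_model, objective="val_loss", max_trials=10)
# tuner.search(x_train, y_train, validation_data=(x_val, y_val), epochs=5)
```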
Hope that helps and good luck!

Why is there a decrease in the performance of pre-trained Deep Learning models?

I am doing binary classification of infected/uninfected RBCs (something the pretrained DL models have never seen before) using models and weights from Keras. I find that the performance of the models (VGG16, VGG19, Xception) decreases as the number of training and validation instances increases. Why?
Maybe it is related to resource management on the machine where you are doing inference: the model expands in memory, which can hurt performance. This situation creates a lot of main-memory accesses to perform the forward-pass computations, and the resulting page faults can slow things down.
Hope this helps.

Why does a neural network tend to output the 'mean value'?

I am using Keras to build a simple neural network for a regression task.
But the output always tends toward the 'mean value' of the ground-truth y data.
See the first figure: blue is the ground truth, red is the predicted value (very close to the constant mean of the ground truth).
Also, the model stops learning very early even though I set epochs=100.
Does anyone have ideas about the conditions under which a neural network stops learning early, and why the regression output tends toward 'the mean' of the ground truth?
Thanks!
Possibly because the data are unpredictable...? Do you know for certain that the data set has Nth-order predictability of some kind?
Just eyeballing your data set, it lacks periodicity, lacks homoscedasticity, it lacks any slope or skew or trend or pattern... I can't really tell if there is anything wrong with your 'net. In the absence of any pattern, the mean is always the best prediction... and it is entirely possible (although not certain) that the neural net is doing its job.
I suggest you find an easier data set, and see if you can tackle that first.
The model is not learning from the data. Think of a basic linear regression - the 'null' prediction, the prediction if you didn't have any predictors at all, is just the expected value; i.e. the mean. It could be caused by many different issues, but initialization comes to mind - bad initialization leads to no learning. This blog post has good practical advice that may help.
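One quick sanity check is to compare the network's error against the trivial "always predict the mean" baseline; here is a minimal sketch on synthetic data (not your data set, and the architecture is arbitrary):

```python
# Minimal sketch: compare a small regression net against the predict-the-mean baseline.
# Synthetic data and architecture are illustrative only.
import numpy as np
import tensorflow as tf

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=(1000, 1)).astype("float32")
y = np.sin(3 * x) + 0.1 * rng.normal(size=(1000, 1)).astype("float32")

model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu", input_shape=(1,)),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")
model.fit(x, y, epochs=50, validation_split=0.2, verbose=0)

baseline_mse = float(np.mean((y - y.mean()) ** 2))   # error of always predicting the mean
model_mse = model.evaluate(x, y, verbose=0)
print(f"mean-baseline MSE: {baseline_mse:.4f}, model MSE: {model_mse:.4f}")
```

If the model's MSE is not meaningfully below the baseline even on easy synthetic data like this, the problem is likely in the setup (initialization, learning rate, data scaling) rather than in the data itself.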

How to change the learning rate of specific layer from the solver prototxt (CAFFE)

Does anybody know how to change the learning rate lr_mult of a specific layer in Caffe from the solver prototxt? I know there's base_lr; however, I would like to target the rate of a specific layer, and do it from the solver instead of the network prototxt.
Thanks!
Every layer that requires learning (e.g. convolutional, fully-connected, etc.) has a specific lr_mult parameter that can be controlled for that layer individually. lr_mult is a "multiplier on the global learning rate for this parameter."
Simply define or change the lr_mult for your layer in train_val.prototxt.
This is useful for fine-tuning, where you might want to have increased learning rate only for the new layer.
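For example, a fine-tuned layer in train_val.prototxt could look like the following snippet (the layer name and num_output are hypothetical; the 10x/20x multipliers are just the common fine-tuning convention):

```
layer {
  name: "fc8_new"              # hypothetical new layer being fine-tuned
  type: "InnerProduct"
  bottom: "fc7"
  top: "fc8_new"
  param { lr_mult: 10 }        # learning-rate multiplier for the weights
  param { lr_mult: 20 }        # learning-rate multiplier for the biases
  inner_product_param { num_output: 2 }
}
```

Setting lr_mult: 0 on a layer's params freezes that layer entirely.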
For more info, check the Caffe fine-tuning tutorial. (Note: it is a bit outdated and uses the deprecated term blobs_lr instead of lr_mult.)
EDIT: To the best of my knowledge, it is not possible to define a layer-specific learning rate from solver.prototxt. Hence, assuming the solver.prototxt requirement is not strict, I suggest the method above to achieve the same result.