Bias mitigation for a regression model

For one of my projects, I am planning to try bias mitigation techniques. However, I have only been able to find MLFairness models for classification problems, and no leads for a regression problem. Could you suggest some leads on fairness techniques for regression problems?

Related

Keras Applications (Transfer Learning)

I am a student currently studying deep learning by myself. I would like to ask for clarification regarding transfer learning.
For example, with MobileNetV2 (https://keras.io/api/applications/mobilenet/#mobilenetv2-function), if the weights parameter is set to None, then I am not doing transfer learning, because the weights are randomly initialized. If I would like to do transfer learning, then I should set the weights parameter to imagenet. Is this understanding correct?
Yes, when you initialize the weights with random values, you are just using the architecture and training the model from scratch. The goal of transfer learning is to reuse the knowledge gained by a previously trained model to get better results or to use fewer computational resources.
There are different ways to use transfer learning:
You can freeze the learned weights of the base model, replace the last layer according to your problem, and train only that last layer (see the sketch after this list).
You can start from the learned weights and fine-tune them (let them change during training). Many people do this because it often makes training faster and gives better results, since the weights already contain a lot of information.
You can use the first layers to extract basic features like colors, edges, and circles, and add your desired layers after them. This way, you can spend your resources on learning high-level features.
There are more cases, but I hope this gives you an idea.
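For concreteness, here is a minimal sketch of the first option with the MobileNetV2 from the question: the ImageNet-pretrained base is frozen and only a new head is trained. The input size and the 10-class head are placeholder assumptions, not something from the question.

from tensorflow import keras

# Load MobileNetV2 pre-trained on ImageNet, without its classification head
base_model = keras.applications.MobileNetV2(
    input_shape=(224, 224, 3),
    include_top=False,
    weights="imagenet",
)
base_model.trainable = False  # freeze the learned weights

# Add a new head for the target task (10 classes is just an example)
model = keras.Sequential([
    base_model,
    keras.layers.GlobalAveragePooling2D(),
    keras.layers.Dense(10, activation="softmax"),
])

model.compile(
    optimizer=keras.optimizers.Adam(),
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"],
)

With weights=None instead of "imagenet", the same code would simply train the whole architecture from scratch.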

Fine-tuning with a very low learning rate. Any sign that something is wrong?

I have been working with deep reinforcement learning, and in the literature the learning rates are usually lower than those I have found in other settings.
My model is the following:
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense
from tensorflow.keras.optimizers import Adam
from tensorflow.keras import regularizers

def create_model(self):
    model = Sequential()
    model.add(LSTM(HIDDEN_NODES, input_shape=(STATE_SIZE, STATE_SPACE), return_sequences=False))
    model.add(Dense(HIDDEN_NODES, activation='relu', kernel_regularizer=regularizers.l2(0.000001)))
    model.add(Dense(HIDDEN_NODES, activation='relu', kernel_regularizer=regularizers.l2(0.000001)))
    model.add(Dense(ACTION_SPACE, activation='linear'))
    # Compile the model with Huber loss and gradient clipping
    model.compile(loss=tf.keras.losses.Huber(delta=1.0), optimizer=Adam(learning_rate=LEARNING_RATE, clipnorm=1))
    return model
Where the initial learning rate (LEARNING_RATE) is 3e-5. For fine-tuning, I freeze the first two layers (this step is essential in my setting) and decrease the learning rate to 3e-9. During fine-tuning, the model might suffer from a distributional shift, since the source of samples is perturbed data. Besides this, is there any other source of problems, given such a low learning rate, that could keep my model from improving?
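For reference, a minimal sketch of how that freezing step could look, assuming the model returned by create_model above and the same imports; note that the model has to be recompiled for the new trainable flags and learning rate to take effect:

FINE_TUNE_LR = 3e-9

# Freeze the first two layers (the LSTM and the first Dense layer)
for layer in model.layers[:2]:
    layer.trainable = False

# Recompile so that the frozen layers and the lower learning rate take effect
model.compile(loss=tf.keras.losses.Huber(delta=1.0),
              optimizer=Adam(learning_rate=FINE_TUNE_LR, clipnorm=1))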
First, please show a sample of your data.
Theoretical Answer:
We have seen how perturbation helps solve various issues related to training a neural network or to an already trained model. Perturbation appears in three components of training and inference (gradients, weights, inputs): perturbing gradients helps tackle the vanishing-gradient problem, perturbing weights helps escape saddle points, and perturbing inputs helps defend against malicious attacks. Overall, perturbation in its different forms strengthens the model against various instabilities; for example, it can keep the model from settling at a correctness wreckage point, since such a position will be tested with perturbations (of inputs, weights, or gradients) that push the model towards a correctness attraction point.
As of now, perturbation is mainly driven by empirical experimentation, designed from intuition to solve the problems one encounters. One needs to check whether perturbing a component of the training process makes sense intuitively, and then verify empirically whether it helps mitigate the problem. Nevertheless, in the future we will likely see more perturbation theory in deep learning, or machine learning in general, possibly backed by theoretical guarantees.
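As a small, concrete illustration of the input-perturbation case: Keras provides a GaussianNoise layer that adds noise to the inputs during training only. This is just a generic sketch, not the setup from the question; the input size and layer widths are made up.

from tensorflow import keras

model = keras.Sequential([
    keras.layers.Input(shape=(32,)),
    keras.layers.GaussianNoise(0.1),              # perturb inputs (active only during training)
    keras.layers.Dense(64, activation='relu'),
    keras.layers.Dense(1),
])
model.compile(optimizer='adam', loss='mse')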

PyTorch Model's gradients are converging to zero

I'm currently working on a personal implementation of the Transformer architecture. The code I've written is here.
The problem I'm facing is that I believe my model isn't training properly, and I'm not sure what measures I should take to fix it. I've come to this conclusion after using Weights & Biases to visualize the model's gradient histograms, which look something like this:
The gradients seem to converge to zero quickly. There is a portion of the code containing a feedforward neural network that uses ReLU activations, and I changed this to Leaky ReLU on the suspicion that dying ReLUs might be the problem. However, using Leaky ReLU doesn't help; it just delays the convergence to zero.
Any feedback on what else I may try is appreciated.
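One cheap diagnostic is to print per-parameter gradient norms right after the backward pass, to see which layers vanish first. The snippet below is a generic sketch with a tiny stand-in model, not the Transformer from the question:

import torch
import torch.nn as nn

# Tiny stand-in model purely so the example runs end to end
model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 4))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

x = torch.randn(8, 16)
target = torch.randn(8, 4)

optimizer.zero_grad()
loss = nn.functional.mse_loss(model(x), target)
loss.backward()

# Print the gradient norm of every parameter to see where gradients vanish
for name, param in model.named_parameters():
    if param.grad is not None:
        print(f"{name}: grad norm = {param.grad.norm().item():.3e}")

optimizer.step()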

Overview for Deep Learning Networks

I am fairly new to deep learning and get quite overwhelmed by the many different nets and their fields of application. So I want to know if there is some kind of overview of the different kinds of networks that exist, what their key features are, and what purposes they serve.
For example, I know about LeNet, ConvNet, and AlexNet - somehow they are the same, but they still differ?
There are basically two types of learning for neural networks: supervised and unsupervised. Both need a training set to "learn". Imagine the training set as a massive book from which you can learn specific information. In supervised learning, the book comes with an answer key but without a solution manual; in contrast, unsupervised learning comes with neither the answer key nor the solution manual. But the goal is the same: to find patterns between the questions and the answers (supervised learning), or among the questions alone (unsupervised learning).
Now that we have differentiated between those two, we can go into the models. Let's discuss supervised learning, which basically has three main models:
artificial neural network (ANN)
convolutional neural network (CNN)
recurrent neural network (RNN)
ANN is the simplest of all three. I believe you already understand it, so we can move on to CNN.
Basically, in a CNN all you have to do is convolve your input with feature detectors. Feature detectors are matrices with dimensions (rows, columns, depth), where the depth is the number of feature detectors. The goal of convolving the input is to extract information about spatial structure. Say you want to distinguish between cats and dogs: cats have whiskers but dogs do not, cats also have different eyes than dogs, and so on. The downside is that more convolution layers result in slower computation. To mitigate that, we apply a processing step called pooling, or downsampling; basically, this reduces the size of the feature maps while minimizing the loss of features or information. The next step is flattening, i.e. squashing those 3D matrices into an (n, 1) vector so you can feed it into an ANN. The final step is then self-explanatory: a normal ANN. Because a CNN is inherently able to detect certain features, it is mostly (maybe always) used for classification, for example image classification, time-series classification, or maybe even video classification. For a crash course in CNNs, check out this video by Siraj Raval. He's my favourite YouTuber of all time!
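A minimal Keras sketch of that convolve -> pool -> flatten -> dense pipeline (the input size, layer widths, and two-class output are placeholder assumptions):

from tensorflow import keras

model = keras.Sequential([
    keras.layers.Input(shape=(64, 64, 3)),
    keras.layers.Conv2D(32, (3, 3), activation='relu'),   # convolve the input with feature detectors
    keras.layers.MaxPooling2D((2, 2)),                     # pooling / downsampling
    keras.layers.Flatten(),                                 # squash the feature maps into a vector
    keras.layers.Dense(64, activation='relu'),              # the "normal ANN" part
    keras.layers.Dense(2, activation='softmax'),            # e.g. cat vs dog
])
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])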
Arguably the most sophisticated of all three, an RNN is best described as a neural network with "memory", introduced by "loops" within it that allow information to persist. Why is this important? As you are reading this, your brain uses previous memory to comprehend the information; you don't rethink everything from scratch. That, however, is what traditional neural networks do: forget everything and re-learn it again. But native RNNs aren't very effective, so when people talk about RNNs they mostly mean LSTMs, which stands for Long Short-Term Memory. If that seems confusing, Christopher Olah gives an in-depth explanation in a very simple way; I advise you to check out his link for a complete understanding of how RNNs, and especially the LSTM variant, work.
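A similarly minimal LSTM sketch, e.g. for binary sequence classification (the sequence length and feature count are made up):

from tensorflow import keras

model = keras.Sequential([
    keras.layers.Input(shape=(50, 8)),             # 50 timesteps, 8 features per step
    keras.layers.LSTM(32),                          # the "memory" lives in the recurrent state
    keras.layers.Dense(1, activation='sigmoid'),
])
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])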
As for unsupervised learning, I'm sorry that I haven't had the time to learn it yet, so this is the best I can do. Good luck and have fun!
They are the same type of network: convolutional neural networks. The problem with such an overview is that as soon as you post it, it is already outdated. Most of the networks you mention are already old, even though they are only a few years old.
Nevertheless, you can take a look at the networks supplied by Caffe (https://github.com/BVLC/caffe/tree/master/models).
In my personal view, the most important concepts in deep learning are recurrent networks (https://keras.io/layers/recurrent/), residual connections, and inception blocks (see https://arxiv.org/abs/1602.07261). The rest are largely theoretical concepts, which would not fit in a Stack Overflow answer.
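As a small illustration of one of these concepts: a residual connection is just an element-wise addition of a block's input and output. A toy sketch in the Keras functional API (the shapes are arbitrary):

from tensorflow import keras

inputs = keras.layers.Input(shape=(32, 32, 64))
x = keras.layers.Conv2D(64, (3, 3), padding='same', activation='relu')(inputs)
x = keras.layers.Conv2D(64, (3, 3), padding='same')(x)
x = keras.layers.Add()([inputs, x])               # the residual (skip) connection
outputs = keras.layers.Activation('relu')(x)

residual_block = keras.Model(inputs, outputs)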

Calculating ROC and AUC in Caffe?

I have trained ImageNet in Caffe. Now I am trying to calculate ROC/AUC for my model and for the trained model provided by Caffe. I have two questions:
1) ROC/AUC is mainly used for binary classification, but I also found that in some cases people use it for multi-class problems. Is it feasible for 1000 classes, and what would its impact be? In the reviews I found, people didn't give a clear answer about ROC/AUC in multi-class problems.
2) If it is possible, and comparing two models based on ROC/AUC is a good idea, can anybody tell me how to do it for these 1000 classes in Caffe? Do I have to retrain the models from scratch, or can I compute it with only the final trained models?
Regards
This discussion addresses multi-class ROC/AUC analysis nicely. Answering your questions:
You can do multiple one-vs-all classifications for each class, thus building multiple ROC curves.
Having computed 1000 AUC values, you can take the mean AUC over all classes and use this metric to compare the goodness of your models; a small sketch follows below. No, you don't need to retrain your models.
Also, note that ROC/AUC metrics are quite specific and are mostly used in detection/biometrics tasks like voice identification.
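For the computation itself, here is a minimal scikit-learn sketch of the one-vs-all averaging, using dummy stand-in data in place of the real labels and the softmax outputs exported from the trained Caffe model:

import numpy as np
from sklearn.metrics import roc_auc_score
from sklearn.preprocessing import label_binarize

# Dummy stand-ins: replace with your real labels and the model's softmax outputs
n_samples, n_classes = 500, 10                    # use 1000 classes in the real case
y_true = np.random.randint(0, n_classes, size=n_samples)
y_score = np.random.dirichlet(np.ones(n_classes), size=n_samples)

# One-vs-all: binarize the labels, compute one AUC per class, then average
y_true_bin = label_binarize(y_true, classes=np.arange(n_classes))
per_class_auc = [roc_auc_score(y_true_bin[:, c], y_score[:, c]) for c in range(n_classes)]
mean_auc = float(np.mean(per_class_auc))
print(mean_auc)

# scikit-learn also provides the same thing directly:
# roc_auc_score(y_true, y_score, multi_class='ovr', average='macro')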