Is it possible to do transfer learning on different observation and action space for Actor-Critic? - reinforcement-learning

I have been experimenting with actor-critic networks such as SAC and TD3 on continuous control tasks, and I am trying to do transfer learning by reusing the trained network on another task with smaller observation and action spaces.
Would it be possible to do so if I were to save the weights in a dictionary and then load them in the new environment? The Actor-Critic network there takes a state with different dimensions as input and also outputs an action with different dimensions.
I have some experience fine-tuning transformer models by adding another classifier head and fine-tuning it, but how would I do this with Actor-Critic networks if the first and last layers do not match those of the learned agent?

Related

Backbone network in Object detection

I am trying to understand the training process of an object detection deep learning algorithm, and I am having some problems understanding how the backbone network (the network that performs feature extraction) is trained.
I understand that it is common to use CNNs like AlexNet, VGGNet, and ResNet but I don't understand if these networks are pre-trained or not. If they are not trained what does the training consist of?
We directly use a pre-trained VGGNet or ResNet backbone. Although the backbone is pre-trained for a classification task, the hidden layers learn features that can also be used for object detection. The initial layers learn low-level features such as lines, dots, and curves. The next layers learn high-level features that are built on top of the low-level features to detect objects and larger shapes in the image.
Then the last layers are modified to output the object detection coordinates rather than class.
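As a rough illustration of "reuse the backbone, replace the last layers", here is a minimal sketch assuming PyTorch. The tiny `backbone` is a hypothetical stand-in for a pre-trained VGG or ResNet, and `box_head` is a made-up name for a head that regresses a single box per image (real detectors predict many boxes, so this is a simplification):

```python
import torch
from torch import nn

# Hypothetical stand-in for a pre-trained classification backbone
# (in practice this would be a VGG or ResNet with loaded weights).
backbone = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, stride=2, padding=1),
    nn.ReLU(),
    nn.Conv2d(16, 32, kernel_size=3, stride=2, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),  # -> (batch, 32) feature vector
)

# Freeze the backbone so that only the new head is updated at first.
for param in backbone.parameters():
    param.requires_grad = False

# New head: 4 box coordinates instead of class scores.
box_head = nn.Linear(32, 4)
detector = nn.Sequential(backbone, box_head)

images = torch.randn(2, 3, 64, 64)  # random placeholder images
boxes = detector(images)            # shape: (2, 4)

# Only the head's parameters remain trainable.
trainable = [name for name, p in detector.named_parameters() if p.requires_grad]
print(boxes.shape, trainable)
```

Unfreezing the backbone later and training everything with a small learning rate is a common follow-up step.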
There are object detection specific backbones too. Check these papers:
DetNet: A Backbone network for Object Detection
CBNet: A Novel Composite Backbone Network Architecture for Object Detection
DetNAS: Backbone Search for Object Detection
High-Resolution Network: A universal neural architecture for visual recognition
Lastly, the pretrained weights will be useful only if you are using them for similar images. E.g., weights trained on ImageNet will be useless on ultrasound medical image data. In this case we would rather train from scratch.

Pretrained model or training from scratch for object detection?

I have a dataset of 10k-15k pictures for supervised object detection, which is very different from ImageNet or COCO (the pictures are much darker and show completely different, industrial-related things).
The model currently used is a FasterRCNN which extracts features with a ResNet used as a backbone.
Could training the backbone of the model from scratch in one stage and then training the whole network in another stage be beneficial for the task, compared with loading the network pretrained on COCO and then retraining all the layers of the whole network in a single stage?
From my experience, here are some important points:
your training set is not big enough to train the detector from scratch (though this depends on the network configuration; fasterrcnn+resnet18 can work). It is better to use a network pre-trained on ImageNet;
the domain the network was pre-trained on is not really that important. The network, especially a big one, needs to learn all those arches, circles, and other primitive shapes in order to use that knowledge for detecting more complex objects;
the brightness of your training images can be important, but it is not something that should stop you from using a pre-trained network;
training from scratch requires many more epochs and much more data. The longer the training, the more sophisticated your LR control algorithm should be. At a minimum, the LR should not be constant and should change based on the cumulative loss. The initial settings depend on multiple factors, such as network size, augmentations, and the number of epochs;
I played a lot with fasterrcnn+resnet (with various numbers of layers) and other networks. I recommend using maskrcnn instead of fasterrcnn. Just tell it not to use the masks and not to do the segmentation. I don't know why, but it gives much better results;
don't spend your time on mobilenet; with your training set size you will not be able to train it to a reasonable AP and AR. Start with a maskrcnn+resnet18 backbone.
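To make the point about a non-constant LR concrete, here is a minimal pure-Python sketch of a reduce-on-plateau rule: the LR is cut when the epoch loss stops improving. This is one common choice and not necessarily the scheme used in the answer above. The class name `PlateauLR`, its default values, and the loss numbers are all made up for illustration:

```python
class PlateauLR:
    """Cut the learning rate when the epoch loss stops improving."""

    def __init__(self, lr, factor=0.5, patience=2, min_lr=1e-6):
        self.lr = lr
        self.factor = factor      # multiplier applied on each cut
        self.patience = patience  # non-improving epochs tolerated
        self.min_lr = min_lr
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, epoch_loss):
        if epoch_loss < self.best:
            # Loss improved: remember it and reset the counter.
            self.best = epoch_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
            if self.bad_epochs > self.patience:
                # Too many epochs without improvement: shrink the LR.
                self.lr = max(self.lr * self.factor, self.min_lr)
                self.bad_epochs = 0
        return self.lr


scheduler = PlateauLR(lr=0.01)
losses = [1.0, 0.8, 0.8, 0.8, 0.8, 0.7]  # made-up per-epoch losses
history = [scheduler.step(loss) for loss in losses]
print(history)
```

With these numbers the LR stays at 0.01 until the third epoch in a row without improvement, then drops to 0.005.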

What are backend weights in deep learning models (yolo)?

I'm pretty new to deep learning and couldn't find or figure out what backend weights are, such as
full_yolo_backend.h5
squeezenet_backend.h5
From what I have found and experimented with, these backend weights seem to have fundamentally different model architectures, for example:
yolov2 model has 40+ layers but the backend only 20+ layers (?)
you can build on top of the backend model with your own networks (?)
using backend models tend to yield poorer results (?)
I was hoping to seek some explanation on backend weights vs actual models for learning purposes. Thank you so much!
I'm not sure which implementation you are using, but in many applications you can consider a deep model as a feature extractor whose output is more or less task-agnostic, followed by a number of task-specific heads.
The choice of backend depends on your specific constraints in terms of tradeoff between accuracy and computational complexity. Examples of classical but time-consuming choices for backends are resnet-101, resnet-50 or VGG that can be coupled with FPN (feature pyramid networks) to yield multiscale features. However, if speed is your main concern then you can use smaller backends such as different MobileNet architectures or even the vanilla networks such as the ones used in the original Yolov1/v2 papers (tinyYolo is an extreme case).
Once you have chosen your backend (you can use a pretrained one), you can load its weights (that is what your *.h5 files are). On top of that, you add a small head that carries out the tasks you need: this can be classification, bbox regression, or, as in MaskRCNN, foreground/background segmentation. For Yolov2, you can add just a few layers, for example 3 convolutional layers (with non-linearities of course), that will output a tensor of size
BxC1xC2xAxP
#B==batch size
#C1==number of vertical cells
#C2==number of horizontal cells
#A==number of anchors
#P==number of parameters (i.e. bbx parameters, class prediction, confidence)
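As a small worked example of that output size, the sketch below computes the head's output shape from the grid size, the number of anchors, and the number of classes. The function name is made up, and the numbers (13x13 grid, 5 anchors, 20 classes) are just an illustrative Yolov2-style configuration:

```python
def yolo_head_output_shape(batch_size, grid_rows, grid_cols, num_anchors, num_classes):
    # Per anchor: 4 box parameters + 1 confidence score + one score per class.
    params_per_anchor = 4 + 1 + num_classes
    return (batch_size, grid_rows, grid_cols, num_anchors, params_per_anchor)


shape = yolo_head_output_shape(
    batch_size=8, grid_rows=13, grid_cols=13, num_anchors=5, num_classes=20
)
# The last conv layer of the head needs anchors * parameters output channels.
out_channels = shape[3] * shape[4]
print(shape, out_channels)  # (8, 13, 13, 5, 25) 125
```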
Then, you can just save/load the weights of this head separately. When you are happy with your results though, training jointly (end-to-end) will usually give you a small boost in accuracy.
Finally, to come back to your last question, I assume that you are getting poor results with the backends because you are only loading the backend weights but not the weights of the heads. Another possibility is that you are using a head trained with a backend X but are switching the backend to Y. In that case, since the head expects different features, it's natural to see a drop in performance.
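One way to check whether the head was left untrained is to look at which weights the checkpoint did not cover. The sketch below assumes PyTorch rather than the Keras .h5 files from the question, and the `Detector` class with its layer sizes is hypothetical. It loads a backend-only checkpoint with `strict=False` and prints the keys that stayed at their random initial values:

```python
import torch
from torch import nn


class Detector(nn.Module):
    """Toy model: a backend (feature extractor) plus a task-specific head."""

    def __init__(self):
        super().__init__()
        self.backend = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU())
        self.head = nn.Conv2d(8, 25, 1)

    def forward(self, x):
        return self.head(self.backend(x))


# Pretend checkpoint that contains only the backend weights.
source = Detector()
backend_only = {
    name: tensor
    for name, tensor in source.state_dict().items()
    if name.startswith("backend.")
}

model = Detector()
# strict=False tolerates missing keys and reports them instead of raising.
result = model.load_state_dict(backend_only, strict=False)
print("not loaded (still random):", result.missing_keys)
```

If the head's keys show up as missing, the model's predictions are essentially random until the head is trained or its own weights are loaded.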

Sagemaker model evaluation

The Amazon documentation lists several approaches to evaluating a model (e.g. cross-validation); however, these methods do not seem to be available in the SageMaker Java SDK.
Currently, if we want to do 5-fold cross-validation, it seems the only option is to create 5 models (and also deploy 5 endpoints), one model for each subset of data, and manually compute the performance metrics (recall, precision, etc.).
This approach is not very efficient and can also be expensive, since we need to deploy k endpoints, one for each fold of the k-fold validation.
Is there another way to test the performance of a model?
Amazon SageMaker is a set of multiple components, and you can choose which ones to use.
The built-in algorithms are designed for (infinite) scale, which means that you can have huge datasets and be able to build a model with them quickly and with low cost. Once you have large datasets you usually don't need to use techniques such as cross-validation, and the recommendation is to have a clear split between training data and validation data. Each of these parts will be defined with an input channel when you are submitting a training job.
If you have a small amount of data and you want to train on all of it, using cross-validation to make that possible, you can use a different part of the service (an interactive notebook instance). You can bring your own algorithm or even a container image to be used in development, training or hosting. You can have any Python code based on any machine learning library or framework, including scikit-learn, R, TensorFlow, MXNet etc. In your code, you can define cross-validation based on the training data that you copy from S3 to the worker instances.
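As a sketch of what "define cross-validation in your own code" could look like, here is a plain-Python k-fold loop that trains and scores inside one script, with no endpoints involved. It does not call any SageMaker API. The helper names, the toy majority-class model, and the data are all invented for illustration:

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_indices, validation_indices) for each of the k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, folds[i]


def cross_validate(features, labels, k, fit, score):
    """Train on k-1 folds, score on the held-out fold, return the mean score."""
    scores = []
    for train_idx, val_idx in k_fold_indices(len(features), k):
        model = fit([features[i] for i in train_idx], [labels[i] for i in train_idx])
        scores.append(
            score(model, [features[i] for i in val_idx], [labels[i] for i in val_idx])
        )
    return sum(scores) / len(scores)


# Toy "model": always predict the most frequent training label.
def fit_majority(xs, ys):
    return max(set(ys), key=ys.count)


def accuracy(model, xs, ys):
    return sum(1 for y in ys if y == model) / len(ys)


features = list(range(20))
labels = [1] * 15 + [0] * 5
mean_acc = cross_validate(features, labels, k=5, fit=fit_majority, score=accuracy)
print(mean_acc)
```

In a real training script, `fit` and `score` would wrap whatever library you bring (scikit-learn, TensorFlow, MXNet, etc.).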

How to train a pre-trained CNN on a new dataset which is not organised in classes (unsupervised)

I have a pretrained CNN (ResNet-18) trained on ImageNet, and now I want to extend it to my own dataset of video frames. The problem is that all the tutorials I found on fine-tuning require the dataset to be organised in classes, like
class1/train/
class1/test/
class2/train/
class2/test/
but I only have frames from many videos. How can I train my CNN on them?
Can anyone point me in the right direction, e.g. a tutorial or paper?
PS: My final task is to get deep features for all the frames that I provide at test time.
To train a network, you need some 'label' (sometimes called y) for your input data. From it, the network calculates the loss between the logits (the network's output) and the given label.
The network then revises itself using that loss value through backpropagation. That process is what we call 'training'.
Because you only have input data and no labels, you can only get the logits, which means a loss cannot be calculated.
Fine-tuning means almost the same thing as 'additional training', so you cannot fine-tune your pre-trained network without labeled data.
As for the train set and test set, that is not the problem right now.
If you have enough labeled input data, you can divide it by some ratio
(e.g. 80% of the data for training, 20% of the data for testing).
The reason for dividing the data into these two sets is that we want to check the performance of our trained network in a more general, unseen situation.
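For the 80/20 split mentioned above, a minimal pure-Python sketch could look like this. The function name and the file names are made up:

```python
import random


def train_test_split(samples, test_ratio=0.2, seed=0):
    """Shuffle the samples and hold out a fraction of them for testing."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_ratio))
    return shuffled[n_test:], shuffled[:n_test]


frames = [f"frame_{i:04d}.jpg" for i in range(100)]  # hypothetical file names
train, test = train_test_split(frames)
print(len(train), len(test))  # 80 20
```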
However, if you just feed your data into the pre-trained network (the encoder part), it will give you deep features. They are not exactly fitted to your task, but they are still deep features.
Added:
Unsupervised pre-training for convolutional neural network in theano
Here is the method you need: a deep feature encoder for the unsupervised setting. I hope it helps.