Import new characters into a pretrained deep learning based OCR model - deep-learning

Now I have a pre-trained deep learning based OCR model but in my real applications there exist several characters not included in the dictionary of my pre-trained model. A small dataset mainly consisting of those new characters is also available. I want the model to learn to recognize new characters without weakening its ability to identify characters in the original dictionary.
However, when I append those new characters onto the model's classification layer and finetune the weights on my dataset, the model performs poorly due to the imbalance in the class distribution of my dataset. Is there any way to effectively import these new characters into my pre-trained model?

Related

Is there a way to visualize the embeddings obtained from Wav2Vec 2.0?

I'm looking to train a Wav2Vec 2.0 model from scratch, but I am a bit new to the field. Crucially, I would like to train it using a large dataset of non-human speech (i.e. cetacean sounds) in order to capture the underlying structure.
Once the pre-training is performed, is it possible to visualize the embeddings the model creates, in a similar way to how latent features are visualized in image processing when using e.g. CNNs? Or are the representations too abstract to be mapped to a spectrogram?
What I would like to do is to see what features the network is learning as the units of speech.
Thanks in advance for the help!

How to choose which pre-trained weights to use for my model?

I am a beginner, and I am very confused about how we can choose a pre-trained model that will improve my model.
I am trying to create a cat breed classifier using the pre-trained weights of a model, let's say VGG16 trained on a digits dataset. Will that improve the performance of the model? Or would it be better to train my model on my own dataset alone, without any other weights? Or will both end up the same, since the pre-trained weights are just a starting point?
Also, if I use the weights of a VGG16 trained on cat vs. dog data as a starting point for my cat breed classification model, will that help improve the model?
Since you've mentioned that you are a beginner, I'll try to be a bit more verbose than normal, so please bear with me.
How neural models recognise images
The layers in a pre-trained model store multiple aspects of the images they were trained on, such as patterns (lines, curves) and colours, which the model uses to decide whether an image belongs to a specific class.
With each layer, the complexity of what can be stored increases. The first layers capture lines, dots or simple curves, but the representation power grows with depth, and deeper layers start capturing features like cat ears, dog faces, the curves in a number, etc.
An image from the Keras blog shows how the initial layers learn to represent simple things like dots and lines and how, as we go deeper, the layers learn to represent more complex patterns.
Read more about convnet filters on the Keras blog.
How does using a pretrained model give better results?
When we train a model from scratch, we spend a lot of compute and time at the start just creating these representations. Getting there also requires quite a lot of data; otherwise we might not capture all the relevant features, and the model might not be as accurate.
So when we say we want to use a pre-trained model, we mean we want to reuse these representations. If we use a model trained on ImageNet, which has lots of cat pictures, the model very likely already has representations for the features needed to identify a cat, and it will usually converge to a better point than if we started from random weights.
How to use pre-trained weights
So when we say to use pre-trained weights, we mean keeping the layers that hold the representations needed to identify cats, discarding the last layers (dense and output), and adding fresh dense and output layers with random weights instead. That way our predictions can make use of the representations already learned.
In practice we freeze the pretrained weights during the initial training, because we do not want the randomly initialised new layers to ruin the learned representations. We unfreeze the pretrained layers only at the end, once classification accuracy is good, to fine-tune them, and even then with a very small learning rate.
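The freeze-then-fine-tune procedure described above might look roughly like this in Keras. This is a minimal sketch, not a definitive recipe: it assumes `tf.keras` and VGG16, and the function names, the 64x64 input size, and both learning rates are illustrative choices. `weights=None` is used as the default only so the sketch runs without a download; in practice you would pass `weights="imagenet"`.

```python
import tensorflow as tf


def build_frozen_model(num_classes, weights=None, input_shape=(64, 64, 3)):
    """Frozen convolutional base plus a fresh, randomly initialised head."""
    # include_top=False drops VGG16's original dense and output layers.
    base = tf.keras.applications.VGG16(
        include_top=False, weights=weights, input_shape=input_shape
    )
    base.trainable = False  # keep the learned representations fixed for now

    inputs = tf.keras.Input(shape=input_shape)
    x = base(inputs, training=False)
    x = tf.keras.layers.GlobalAveragePooling2D()(x)
    outputs = tf.keras.layers.Dense(num_classes, activation="softmax")(x)
    model = tf.keras.Model(inputs, outputs)

    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),
        loss="sparse_categorical_crossentropy",
        metrics=["accuracy"],
    )
    return model, base


def unfreeze_for_fine_tuning(model, base, learning_rate=1e-5):
    """Make the base trainable again and recompile with a small learning rate."""
    base.trainable = True
    # Recompiling is needed for the change in `trainable` to take effect.
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=learning_rate),
        loss="sparse_categorical_crossentropy",
        metrics=["accuracy"],
    )
    return model


# Typical order of operations (train_ds is a hypothetical dataset):
#   model, base = build_frozen_model(num_classes=12, weights="imagenet")
#   model.fit(train_ds, epochs=5)            # train only the new head
#   unfreeze_for_fine_tuning(model, base)
#   model.fit(train_ds, epochs=5)            # fine-tune everything, slowly
```

While the base is frozen, only the new dense layer's kernel and bias are updated, which is what protects the pretrained representations from the large early gradients of the random head.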
Which kind of pre-trained model to use
Always choose the pretrained weights that you know hold the most representations that can help identify the classes you are interested in.
So will weights trained on MNIST digits give relatively bad results compared with weights trained on ImageNet?
Yes, but given that the initial layers have already learned simple patterns like lines and curves from the digits, using these weights will, in most cases, still put you at an advantage compared to starting from scratch.
Sane weight initialization
The pre-trained weights to choose depend on the type of classes you wish to classify. Since you wish to classify cat breeds, use pre-trained weights from a classifier that was trained on a similar task. As mentioned in the answers above, the initial layers learn things like edges, horizontal or vertical lines, blobs, etc. As you go deeper, the model starts learning problem-specific features. So for generic tasks you can use, say, ImageNet weights and then fine-tune them for the problem at hand.
However, having a pre-trained model whose training data closely resembles yours helps immensely. A while ago, I participated in a Scene Classification Challenge where we initialized our model with ResNet50 weights trained on the Places365 dataset. Since the classes in that challenge were all present in the Places365 dataset, we used the weights available here and fine-tuned our model. This gave us a great boost in accuracy, and we ended up at top positions on the leaderboard.
You can find some more details about it in this blog.
Also, understand that one of the advantages of transfer learning is saving computation. Using a model with randomly initialized weights means training a neural net from scratch. If you use VGG16 weights trained on a digits dataset, the model may already have learned something useful, so it will likely save some training time. A model trained from scratch will eventually learn the patterns that the pre-trained digit classifier weights would have provided.
On the other hand, using weights from a dog-vs-cat classifier should give you better performance, as it has already learned features to detect, say, paws, ears, noses or whiskers.
Could you provide more information? What exactly do you want to classify? I see you wish to classify images, but which type of images (containing what?) and into which classes?
As a general remark: if you use a trained model, it must of course fit your need. Keep in mind that a model trained on a given dataset has learned only the information contained in that dataset, and can classify or identify only information analogous to what was in its training data.
1. If you want to classify an image containing an animal with a yes/no (binary) classifier (cat or not cat), you should use a model trained on different animals, cats among them.
2. If you want to classify an image of a cat into classes corresponding to cat breeds, you should use a model trained only on cat images.
I would suggest using a pipeline consisting of step 1 followed by step 2.
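The two-stage pipeline suggested above can be sketched in plain Python. Everything here is hypothetical: `is_cat` and `predict_breed` stand in for the two trained models, and the toy "images" are dictionaries used only to make the control flow visible.

```python
from typing import Any, Callable, Optional


def classify_cat_breed(
    image: Any,
    is_cat: Callable[[Any], bool],
    predict_breed: Callable[[Any], str],
) -> Optional[str]:
    """Stage 1: cat / not cat. Stage 2: breed, only for images that pass stage 1."""
    if not is_cat(image):
        return None  # not a cat, so the breed model is never consulted
    return predict_breed(image)


# Toy stand-ins for the two trained models (hypothetical).
def toy_is_cat(image):
    return image["animal"] == "cat"


def toy_predict_breed(image):
    return "siamese" if image["points"] else "persian"


cat_result = classify_cat_breed(
    {"animal": "cat", "points": True}, toy_is_cat, toy_predict_breed
)
dog_result = classify_cat_breed(
    {"animal": "dog", "points": False}, toy_is_cat, toy_predict_breed
)
```

Keeping the stages separate means the breed model only ever sees the kind of input it was trained on.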
It really depends on the size of the dataset you have at hand and on how closely the task and data the model was pretrained on relate to your task and data. Read more about Transfer Learning (http://cs231n.github.io/transfer-learning/), or about Domain Adaptation if your task is the same.
I am trying to create a cat breed classifier using the pre-trained weights of a model, let's say VGG16 trained on a digits dataset. Will that improve the performance of the model?
General characteristics learned from digits, such as edge detection, could still be useful for your target task, so the answer here is maybe. You can try training just the top layers, which is common in computer vision applications.
Also, if I use the weights of a VGG16 trained on cat vs. dog data as a starting point for my cat breed classification model, will that help improve the model?
Your chances should be better if the tasks and data are more closely related.

Usefulness of Pretrained NN's for performing binary segmentation in images

I am trying to perform binary segmentation on a custom dataset (the DAGM dataset, in my case).
I was just curious to know whether networks pretrained on the ImageNet dataset, like VGG or ResNet, will be of any particular use, as I am not trying to segment objects like cats or dogs but anomalies in the images.
Normally you would want to fine-tune, on your new dataset, a model that was previously trained and tuned on a similar problem. Neural networks extract features from samples and use those features to classify. If you have previously trained your network on a biomedical dataset, then it has learned how to extract features from that kind of data. So try to find a model that was trained on a similar domain.
Also you can check the below link for more insight about the issue.
https://en.wikipedia.org/wiki/Catastrophic_interference

Training a neural network with two completely different datasets.

I am working with a neural network for object classification right now, and I am creating datasets for training and validation. I want to know if it is possible to create two training datasets comprising two completely different objects and labels (e.g. dataset 1 has cars and dataset 2 has cats). Does that still work, or should I create datasets where each file mixes both object types and labels? Does such mixing or separation matter if I am training the network in one cycle with different datasets?
Depending on what you are using to train, many APIs (such as the TensorFlow object detection API) read the TFRecord files (datasets) in order, since the examples are shuffled beforehand when the files are made. Shuffling is quite important for training: without it, the model starts training on one class and then trains for a while on another single class. It should reach the same standard eventually, but it is a lot better for the model to see an even distribution of classes across the training steps.
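As a rough illustration of the shuffling described above, two separately built datasets can be concatenated and shuffled into one training set before any files are written. This is only a sketch with the standard library: `mix_datasets` and the toy file names are made up for the example, and a real pipeline would do the equivalent when creating its record files.

```python
import random


def mix_datasets(*datasets, seed=0):
    """Concatenate lists of (sample, label) pairs and shuffle them together."""
    combined = [example for dataset in datasets for example in dataset]
    # A seeded generator keeps the shuffle reproducible between runs.
    random.Random(seed).shuffle(combined)
    return combined


# Hypothetical toy datasets: each holds a single class.
cars = [(f"car_{i}.jpg", "car") for i in range(4)]
cats = [(f"cat_{i}.jpg", "cat") for i in range(4)]

# After mixing, consecutive training examples are no longer grouped by class.
mixed = mix_datasets(cars, cats, seed=42)
```

The original lists are left untouched, so the same sources can be reshuffled with a different seed for another epoch or split.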

To create different embedding layers in keras

I recently read the paper End-To-End Memory Networks, which uses three different embedding layers for sentence embeddings. Now I am trying to reproduce this architecture in Keras.
But I am not sure how to create three different embeddings. They have exactly the same dimensions and are based on the same corpus, but should have different embedding values. So to implement these layers, should I just use Embedding layers with kernel_initializer =random_uniform?
I know about pre-trained embeddings like Word2Vec, but a pre-trained model is not important for now, is it?