Cutting down the size of a fasttext bin file - deep-learning

Currently the bin files for fastText wiki.en.bin is about 8GB. Is there a version about half the size of this? The bin files consists of the model and pretrained vectors that were generated from a large wiki corpus. Is there a smaller en. version that would make it easier for lower range machines? Loading this up is taking too much memory.
Or to get a smaller size bin file for use with fasttext, should i train my own set of fasttext vectors with a smaller set of parallel corpus?

You can use the quantize function
$ ./fasttext quantize -output wiki.en
This will drastically reduce the size of your model without losing too much accuracy.

Currently, the native Facebook fastText library supports quantization only for the supervised models used for classification, and cannot compress unsupervised models for embedding lookup trained e.g. on wiki.
However, I have created a package compress-fasttext that is able to significantly reduce the size of unsupervised fastText models. You can read more about it in this Medium post.
There are a few models of different sizes (10MB to 200MB) compressed with this package for English and Russian, and a set of tiny models for 101 other languages.

Related

Will it be okay if I use my image datasets with 50x 50 dimensions into googLeNet CNN architecture which is recommending for 224 x 224?

I am new to DL and am trying to train my first CNN models with googLeNet architecture. I've prepared my custom image data dimensions with 50x50 but the architecture is recommending to use 224x224. Will it be okay to use the architecture? I don't want to remake my datasets to change the size of the images. So, if there are some other architectures that I can look into it, please kindly recommend them for me.
If you're looking for the best CNN model for image classification, take a look at EfficientNet architecture (Pytorch implementation, Paper). IIRC, Googlenet is kinda old.
If your model requires some specific shape of the input image, you can just resize them (for example, you can use torchvision or OpenCV)

Is there an actual minimum input image size for popular computer vision models? (E.g., vgg, resnet, etc.)

According to the documentation on pre-trained computer vision models for transfer learning (e.g., here), input images should come in "mini-batches of 3-channel RGB images of shape (3 x H x W), where H and W are expected to be at least 224".
However, when running transfer learning experiments on 3-channel images with height and width smaller than expected (e.g., smaller than 224), the networks generally run smoothly and often get decent performances.
Hence, it seems to me that the "minimum height and width" is somehow a convention and not a critical parameter. Am I missing something here?
There is a limitation on your input size which corresponds to the receptive field of the last convolution layer of your network. Intuitively, you can observe the spatial dimensionality decreasing as you progress through the network. At least this is the case for feature extractor CNNs which aim at extracting feature embeddings from the input image. That is most pre-trained models such as vanilla VGG, and ResNets networks do not retain spatial dimensionality. If the input of a convolutional layer is smaller than the kernel size (even if/when padded), then you simply won't be able to perform the operation.
TLDR: adaptive pooling layer
For example, the standard resnet50 model accepts input only in ranges 193-225, and this is due to the architecture and downscaling layers (see below).
The only reason why the default pytorch model works is that it is using adaptive pooling layer which allows to not restrict input size. So it's gonna work but you should be ready for performance decay and other fun things :)
Hope you will find it useful:
https://discuss.pytorch.org/t/how-can-torchvison-models-deal-with-image-whose-size-is-not-224-224/51077/3
What is Adaptive average pooling and How does it work?
https://pytorch.org/docs/stable/generated/torch.nn.AdaptiveAvgPool2d.html
https://github.com/pytorch/vision/blob/c187c2b12d86c3909e59a40dbe49555d85b98703/torchvision/models/resnet.py#L118
https://github.com/pytorch/vision/blob/c187c2b12d86c3909e59a40dbe49555d85b98703/torchvision/models/resnet.py#L151
https://developpaper.com/pytorch-implementation-examples-of-resnet50-resnet101-and-resnet152/

What are backend weights in deep learning models (yolo)?

pretty new to deep learning, but couldn't seem to find/figure out what are backend weights such as
full_yolo_backend.h5
squeezenet_backend.h5
From what I have found and experimented, these backend weights have fundamentally different model architectures such as
yolov2 model has 40+ layers but the backend only 20+ layers (?)
you can build on top of the backend model with your own networks (?)
using backend models tend to yield poorer results (?)
I was hoping to seek some explanation on backend weights vs actual models for learning purposes. Thank you so much!
I'm note sure which implementation you are using but in many applications, you can consider a deep model as a feature extractor whose output is more or less task-agnostic, followed by a number of task-specific heads.
The choice of backend depends on your specific constraints in terms of tradeoff between accuracy and computational complexity. Examples of classical but time-consuming choices for backends are resnet-101, resnet-50 or VGG that can be coupled with FPN (feature pyramid networks) to yield multiscale features. However, if speed is your main concern then you can use smaller backends such as different MobileNet architectures or even the vanilla networks such as the ones used in the original Yolov1/v2 papers (tinyYolo is an extreme case).
Once you have chosen your backend (you can use a pretrained one), you can load its weights (that is what your *h5 files are). On top of that, you will add a small head that will carry the tasks that you need: this can be classification, bbox regression, or like in MaskRCNN forground/background segmentation. For Yolov2, you can just add very few, for example 3 convolutional layers (with non-linearities of course) that will output a tensor of size
BxC1xC2xAxP
#B==batch size
#C1==number vertical of cells
#C2==number of horizontal cells
#C3==number of anchors
#C4==number of parameters (i.e. bbx parameters, class prediction, confidence)
Then, you can just save/load the weights of this head separately. When you are happy with your results though, training jointly (end-to-end) will usually give you a small boost in accuracy.
Finally, to come back to your last questions, I assume that you are getting poor results with the backends because you are only loading backend weights but not the weights of the heads. Another possibility is that you are using a head trained with a backends X but that you are switching the backend to Y. In that case since the head expects different features, it's natural to see a drop in performance.

What kind of on-the-fly artificial data augmentation can be applied for semantic segmentation?

I have almost 5223 and 577 images for training and validation sets, respectively. I am applying CNNs for image segmentation and would like to do artificial on-the-fly data augmentation.
I have installed Caffe latest version. I am wondering whether this version of Caffe is supporting data augmentation or not?
If yes, could you please share some resources for me?
The other question is whenever we are doing artificial data augmentation, should we change the epoch according to the augmentation size? for example, if I only apply mirroring, should I change the size of epoch by mutiplying 2?

User defined transformation in CNTK

Problem setting
I have a dataset with N images.
A certain network (e.g - Alexnet) has to be trained from scratch over this dataset.
For each image, 10 augmented versions are to be produced. These augmentations involve resizing, cropping and flipping. For example - an image has to be resized with minimum dimension of 256 pixels and then a random crop of 224 x 224 of it is to be taken. Then it has to be flipped. 5 such random crops have to be taken and their flipped versions also have to be prepared.
Those augmented versions have to go inside the network for training instead of the original image
What would be additionally very beneficial is that, multiple images in the dataset are augmented in parallel and put in a queue or any container from where abatchsize number of samples are pushed into the GPU for training.
The reason is that we would not ideally like multiple augmented versions of the same image going into the network for training simultaneously.
Context
It is not a random feature requirement. There are some papers such as OverFeat which involve such augmentations. Moreover such a random training can be a very good idea to improve the training of the network.
My understanding
To the best of my search, I could not find any framework inside CNTK that can do this.
Questions
Is it possible to achieve in CNTK ?
Please take a look at the CNTK 201 tutorial:
https://github.com/Microsoft/CNTK/blob/penhe/reasonet_tutorial/Tutorials/CNTK_201B_CIFAR-10_ImageHandsOn.ipynb
The image reader has built in transforms that addresses many of your requirements. Unfortunately, it is not in the GPU.