I have a set of 60000 training and 10000 test images (227x227). The images are either completely black (label 1) or black with a white patch in the middle (label 255). What is the simplest Caffe network I can train on this data to reach an accuracy of 95% or higher? I need to deploy it on an embedded device, so I want the simplest network possible.
I tried training the BVLC reference CaffeNet and got an accuracy of 99.6%. I converted this model to CMSIS-NN to deploy it on an ARM device, but that generated a 150 MB weights file, which is not feasible for an embedded device.
You can try SqueezeNet (4.7 MB, >= 80.3% top-5 on ImageNet), or MobileNet (v2: 13.5 MB, 90.49% top-5 on ImageNet) as baselines. Since you have a much simpler problem, you might want to try a super simple network with your own topology.
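To get a rough feel for how small a "super simple network" could be, here is a minimal sketch that counts the weights of a hypothetical one-convolution, one-pooling, one-fully-connected topology. The layer sizes (4 filters of 5x5 with stride 4, 2x2 pooling, a single-channel input) are illustrative assumptions, not a tested design, and whether such a network reaches 95% on this data would still have to be checked by training it.

```python
def conv_out(size, kernel, stride=1, pad=0):
    """Spatial output size of a convolution or pooling window (floor rounding)."""
    return (size + 2 * pad - kernel) // stride + 1


def tiny_net_params(in_size=227, in_ch=1, filters=4, kernel=5, stride=4,
                    pool=2, pool_stride=2, classes=2):
    """Count weights and biases of a hypothetical conv -> pool -> fc network.

    All layer sizes are assumptions chosen for illustration only.
    """
    conv_side = conv_out(in_size, kernel, stride)        # side after the convolution
    pool_side = conv_out(conv_side, pool, pool_stride)   # side after the pooling
    conv_params = filters * (in_ch * kernel * kernel + 1)        # +1 for each bias
    fc_params = classes * (filters * pool_side * pool_side + 1)  # +1 for each bias
    return conv_params + fc_params


if __name__ == "__main__":
    n = tiny_net_params()
    print(n, "parameters,", n * 4, "bytes as float32")
```

With these assumed sizes the count comes to 6378 parameters, roughly 25 KB as 32-bit floats, which is several orders of magnitude below the 150 MB mentioned in the question.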
I am facing difficulties in training a custom image classification model (EfficientNetB0) on a dataset of 36,000 images of size (100, 100, 3). The images consist of alphanumeric characters (0-9 and A-Z) and my goal is to classify them. My attempts at training the model have been unsuccessful due to either high memory usage on Google Colab or overheating on my MacBook Air M1. I am seeking suggestions for free alternative training methods and models that would be suitable for this classification task.
Below I have also attached a screenshot of the Google Colab issue and a sample image from my dataset.
[image: colab_crash_image]
[image: sample_dataset_image]
Here is what I have tried so far:
Training on Google Colab, which failed because of high memory usage.
Training on the built-in processor of my MacBook Air M1, which failed because of overheating.
Transfer learning on EfficientNetB0 with ImageNet weights, which gave a training accuracy of 87% but a test accuracy of only 46%.
I am new to deep learning. I am doing a school project in which I am trying to train a YOLOv5 model on the VisDrone dataset. My training set has 4911 images and my validation set has between 3000 and 4000 images. I am using Google Colab Pro. As far as I know, it has 32 GB of RAM and 15-16 GB of GPU VRAM. If I let the model load the data automatically, it fails with "cuda out of memory". What strategy can I use to solve this problem? Should I customize the data loader, which is in the dataloaders.py file? If so, how do I do that?
CUDA out-of-memory errors are usually caused by the batch size, so it is better to let the trainer choose the batch size itself. To do this, change line 442 of train.py from:
parser.add_argument('--batch-size', type=int, default=16, help='total batch size for all GPUs, -1 for autobatch')
to this:
parser.add_argument('--batch-size', type=int, default=-1, help='total batch size for all GPUs, -1 for autobatch')
This should solve the out of memory issue.
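Since `--batch-size` is an ordinary argparse option, the same value can presumably also be supplied on the command line (`--batch-size -1`) without editing train.py. The sketch below rebuilds only that one argument, copied from the line quoted above, to show how the default and an explicit `-1` are parsed. It is not the YOLOv5 script itself, and `build_parser` and `parse_batch_size` are hypothetical helper names.

```python
import argparse


def build_parser():
    """Parser holding only the --batch-size argument quoted from train.py."""
    parser = argparse.ArgumentParser()
    parser.add_argument('--batch-size', type=int, default=16,
                        help='total batch size for all GPUs, -1 for autobatch')
    return parser


def parse_batch_size(argv):
    """Return the batch size that results from the given argument list."""
    return build_parser().parse_args(argv).batch_size


if __name__ == "__main__":
    print(parse_batch_size([]))                      # default from the parser
    print(parse_batch_size(['--batch-size', '-1']))  # explicit autobatch request
```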
Problem setting
I have a dataset with N images.
A certain network (e.g. AlexNet) has to be trained from scratch on this dataset.
For each image, 10 augmented versions are to be produced. These augmentations involve resizing, cropping and flipping. For example, an image has to be resized so that its minimum dimension is 256 pixels, and then a random 224 x 224 crop of it is taken and flipped. 5 such random crops have to be taken, and their flipped versions also have to be prepared.
Those augmented versions, instead of the original image, have to go into the network for training.
It would additionally be very beneficial if multiple images in the dataset were augmented in parallel and put in a queue or other container, from which a batch-size number of samples is pushed to the GPU for training.
The reason is that we would not ideally like multiple augmented versions of the same image going into the network for training simultaneously.
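The augmentation recipe described above can be sketched independently of any framework. The following is a minimal pure-Python illustration, assuming an image is a list of pixel rows and using nearest-neighbour resizing for brevity; the function names are hypothetical, and this is not a CNTK API.

```python
import random


def resize_min_side(img, target=256):
    """Nearest-neighbour resize so that the shorter side equals `target`."""
    h, w = len(img), len(img[0])
    scale = target / min(h, w)
    new_h = max(target, round(h * scale))
    new_w = max(target, round(w * scale))
    return [[img[min(h - 1, int(y / scale))][min(w - 1, int(x / scale))]
             for x in range(new_w)]
            for y in range(new_h)]


def ten_augmentations(img, crop=224, n_crops=5, rng=random):
    """Return n_crops random crops, each followed by its horizontal flip."""
    resized = resize_min_side(img)
    h, w = len(resized), len(resized[0])
    out = []
    for _ in range(n_crops):
        top = rng.randint(0, h - crop)    # randint is inclusive at both ends
        left = rng.randint(0, w - crop)
        patch = [row[left:left + crop] for row in resized[top:top + crop]]
        out.append(patch)
        out.append([row[::-1] for row in patch])  # mirrored left to right
    return out


if __name__ == "__main__":
    image = [[(x + y) % 256 for x in range(300)] for y in range(400)]
    augmented = ten_augmentations(image, rng=random.Random(0))
    print(len(augmented), len(augmented[0]), len(augmented[0][0]))
```

To keep several versions of the same image out of one minibatch, the augmented samples of many images could be collected in a buffer and shuffled before batches are drawn from it.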
Context
This is not an arbitrary requirement. Some papers, such as OverFeat, use such augmentations. Moreover, this kind of randomized training can be a very good way to improve the training of the network.
My understanding
As far as I could find, there is no facility in CNTK that can do this.
Questions
Is this possible to achieve in CNTK?
Please take a look at the CNTK 201 tutorial:
https://github.com/Microsoft/CNTK/blob/penhe/reasonet_tutorial/Tutorials/CNTK_201B_CIFAR-10_ImageHandsOn.ipynb
The image reader has built-in transforms that address many of your requirements. Unfortunately, it does not run on the GPU.
I have a pre-trained network that I would like to test on my data. I defined the network architecture in a .prototxt file, and my data layer is a custom Python layer that receives a .txt file with the paths of my data and their labels, preprocesses the data and then feeds it to the network.
At the end of the network, I have a custom Python layer that takes the class prediction made by the net and the label (from the first layer) and prints, for example, the accuracy over all batches.
I would like to run the network until all examples have passed through the net.
However, while searching for the command to test a network, I've found:
caffe test -model architecture.prototxt -weights model.caffemodel -gpu 0 -iterations 100
If I don't set -iterations, it uses the default value (50).
Does anyone know a way to run caffe test without setting the number of iterations?
Thank you very much for your help!
No, Caffe does not have a facility to detect that it has run exactly one epoch (use each input vector exactly once). You could write a validation input routine to do that, but Caffe expects you to supply the quantity. This way, you can generate easily comparable results for a variety of validation data sets. However, I agree that it would be a convenient feature.
The lack of this feature might be related to the same lack in training and in the interstitial testing.
In training, we tune the hyper-parameters to get the most accurate model for a given application. As it turns out, this is more closely dependent on TOTAL_NUM than on the number of epochs (given a sufficiently large training set).
With a fixed training set, we often graph accuracy (y-axis) against epochs (x-axis), because that gives tractable results as we adjust batch size. However, if we cut the size of the training set in half, the most comparable graph would scale on TOTAL_NUM rather than the epoch number.
Also, by restricting the size of the test set, we avoid long waits for that feedback during training. For instance, in training against the ImageNet data set (1.2M images), I generally test with around 1000 images, typically no more than 5 times per epoch.
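Since Caffe expects the iteration count to be supplied, one workaround is to compute it from the number of test examples and the batch size of the data layer. The sketch below does only that arithmetic and fills the result into the command from the question; `iterations_for_one_pass` and `caffe_test_command` are hypothetical helper names, and the file names are the ones used in the question.

```python
import math


def iterations_for_one_pass(num_examples, batch_size):
    """Smallest number of batches that covers every example at least once."""
    if num_examples <= 0 or batch_size <= 0:
        raise ValueError("num_examples and batch_size must be positive")
    return math.ceil(num_examples / batch_size)


def caffe_test_command(num_examples, batch_size,
                       model="architecture.prototxt",
                       weights="model.caffemodel"):
    """Build the `caffe test` command line with the computed iteration count."""
    n = iterations_for_one_pass(num_examples, batch_size)
    return "caffe test -model {} -weights {} -gpu 0 -iterations {}".format(
        model, weights, n)


if __name__ == "__main__":
    print(caffe_test_command(10000, 64))
```

If the number of examples is not a multiple of the batch size, the last batch is only partly filled with new examples. What goes into the rest of it depends on how the data layer handles the end of the list, so the reported accuracy may count a few examples twice.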
I am thinking of building a convolutional neural network for a tracking application. I get the feeling that all deep network applications require the use of GPUs. Is it necessary to use a GPU for a task like mine? What are the minimum requirements my laptop should meet?
It all depends on the size and depth of your CNN. If your CNN has one convolution layer and one fully connected layer, and the input images are 64x64, you will be able to train your network on your laptop in a reasonable time. If you use GoogLeNet, which has on the order of a hundred layers when every building block is counted, and train on the entire ImageNet set, then even with a video card it will take you about a week, so on a CPU training would be impractically slow.
For most practical applications, however, it is desirable to have a GPU to train a convolution network. Note that on AWS you can get GPU-enabled instances for a rather reasonable price, especially if you get spot instances, so you don't necessarily need to have a GPU locally.
Last note: most frameworks (Theano, Torch, Caffe, MXNet, TensorFlow) allow you to execute the same model on the CPU and on the GPU with minor or no modifications to the code, so you can prototype locally on the CPU with a small set of images and then, once your model works, train it on a GPU instance on AWS.
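To make "it depends on the size and depth" a little more concrete, here is a rough sketch that counts the multiply-accumulate operations in one forward pass of the small example network mentioned above (one convolution layer and one fully connected layer on a 64x64 input). The channel count, filter count, kernel size and number of classes are assumptions chosen for illustration.

```python
def conv_output_side(size, kernel, stride=1, pad=0):
    """Spatial output size of a convolution (floor rounding)."""
    return (size + 2 * pad - kernel) // stride + 1


def forward_macs(in_side=64, in_ch=3, filters=16, kernel=3, classes=10):
    """Multiply-accumulates for one image through conv -> fully connected.

    All layer sizes are assumed values, not taken from a real model.
    """
    out_side = conv_output_side(in_side, kernel)
    # Each output value of the convolution needs in_ch * kernel * kernel MACs.
    conv_macs = out_side * out_side * filters * in_ch * kernel * kernel
    # The fully connected layer sees the flattened convolution output.
    fc_macs = out_side * out_side * filters * classes
    return conv_macs + fc_macs


if __name__ == "__main__":
    print(forward_macs(), "multiply-accumulates per image")
```

With these assumed sizes the total is about 2.3 million multiply-accumulates per image, a workload that a laptop CPU can handle comfortably.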