I am working with images of text that have diagrams in them. My images are basically black and white, so I do not see why I would want color in them. I got some decent results with the default settings, but I want to test on grayscale images too. I am using this tutorial as the base, which by default uses AlexeyAB's repo for darknet. I think I have to change the config file as:
channels=3 # I think I have to change it to 0
momentum=0.9
decay=0.0005
angle=0 # A link says that I have to comment all of these out
saturation = 1.5 # This one
exposure = 1.5 # and this one too
hue=.1 # Should I change this to 0 too?
But there is this link which says that I have to comment out hue, saturation, angle, exposure, etc. I want to know:
Do I have to save the images as grayscale in the directory, or will the code do it by itself?
Does some other configuration have to be changed apart from setting channels=1? Setting hue to 0 is also suggested in this link.
Do I need to modify some function that deals with loading the images, such as the load_data_detection function given in this link?
Just changing channels=1 in the config file should work. If not, comment out the other parameters like angle, hue, exposure, and saturation and try again.
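For reference, a hedged sketch of what the top of the [net] section could look like after those changes (only channels and the augmentation lines are touched; keep the rest of your working RGB config as-is):
[net]
# single-channel grayscale input
channels=1
momentum=0.9
decay=0.0005
# angle=0
# saturation=1.5
# exposure=1.5
# hue set to 0 (or comment it out as well)
hue=0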
There are posts online and on the official repository suggesting editing the channels, the hue value, etc.
Here is what works for sure (I tried it):
For the training set, either take grayscale images or convert RGB to grayscale using OpenCV (a conversion sketch follows these notes):
cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
Train YOLOv4 AS YOU WOULD FOR RGB, i.e. without messing with channels, hue, etc.
Note: Don't forget to set the steps, batch size, number of channels, etc. as you normally would for RGB images; just train the model on grayscale images instead of RGB.
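As a hedged sketch of that conversion step (the folder names are placeholders; it simply applies the cv2.cvtColor call above to every image in a directory):
import glob
import os
import cv2

src_dir = "images_rgb"     # hypothetical input folder
dst_dir = "images_gray"    # hypothetical output folder
os.makedirs(dst_dir, exist_ok=True)
for path in glob.glob(os.path.join(src_dir, "*.jpg")):
    image = cv2.imread(path)                        # loaded as 3-channel BGR
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)  # single-channel grayscale
    cv2.imwrite(os.path.join(dst_dir, os.path.basename(path)), gray)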
A few personal observations, for which I couldn't find a theoretical explanation:
YOLOv4 trained on RGB images, won't work on B&W/Grayscale images.
YOLOv4 trained on B&W/Grayscale images, won't work on RGB.
Hope this helps.
Edit: I have yet to verify whether this is computationally more expensive than a model trained with reduced channels, although it has shown no reduction or improvement in inference times.
Related
I'm new to deep learning and trying cell segmentation with Detectron2 Mask R-CNN.
I use the images and mask images from http://celltrackingchallenge.net/2d-datasets/ (Simulated nuclei of HL60 cells, the training dataset). The folder I am using is here.
I tried to create and register a new dataset following the balloon dataset format in the Detectron2 Colab tutorial.
I have 1 class, "cell".
My problem is, after I train the model, there are no masks visible when visualizing predictions. There are also no bounding boxes or prediction scores.
A visualized annotated image is like this but the predicted mask image is just a black background like this.
What could I be doing wrong? The colab I made is here
I had a problem similar to yours: the network predicted the box and the class but not the mask. The first thing to note is that the DefaultTrainer automatically resizes your images, so you need to create a custom mapper to avoid this. The second thing is that you should add data augmentation, which significantly improves convergence and generalization.
First, avoid the resize:
cfg.INPUT.MIN_SIZE_TRAIN = (608,)
cfg.INPUT.MAX_SIZE_TRAIN = 608
cfg.INPUT.MIN_SIZE_TRAIN_SAMPLING = "choice"
cfg.INPUT.MIN_SIZE_TEST = 608
cfg.INPUT.MAX_SIZE_TEST = 608
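For the custom mapper and augmentation, here is a minimal sketch, assuming a recent Detectron2 version where DatasetMapper accepts an augmentations argument (the trainer name and the particular augmentations are just illustrative choices):
import detectron2.data.transforms as T
from detectron2.data import DatasetMapper, build_detection_train_loader
from detectron2.engine import DefaultTrainer

class CellTrainer(DefaultTrainer):
    @classmethod
    def build_train_loader(cls, cfg):
        # Custom mapper: keep a fixed 608x608 size (matching the cfg above) and add augmentation.
        mapper = DatasetMapper(
            cfg,
            is_train=True,
            augmentations=[
                T.Resize((608, 608)),
                T.RandomFlip(horizontal=True),
                T.RandomBrightness(0.8, 1.2),
            ],
        )
        return build_detection_train_loader(cfg, mapper=mapper)

# trainer = CellTrainer(cfg)
# trainer.resume_or_load(resume=False)
# trainer.train()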
See also:
https://gilberttanner.com/blog/detectron-2-object-detection-with-pytorch/
How to use detectron2's augmentation with datasets loaded using register_coco_instances
https://eidos-ai.medium.com/training-on-detectron2-with-a-validation-set-and-plot-loss-on-it-to-avoid-overfitting-6449418fbf4e
I found a current workaround by using Matterport Mask R-CNN and the sample nuclei dataset instead: https://github.com/matterport/Mask_RCNN/tree/master/samples/nucleus
I'm trying to get a multiply patch's output value to change only the hue of a color. I want to keep saturation and luminance fixed.
With my current configuration it is only changing the luminance; it looks like it is changing all RGB channels equally. What would be the correct way to manipulate the HSL channels individually?
After some research I found the solution: I was missing the 'pack' patch, a very useful patch I wasn't aware of. This is how my workflow ended up:
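For reference, the same idea expressed outside the patch editor: a purely illustrative Python sketch using the standard colorsys module, rotating only the hue while keeping lightness and saturation fixed.
import colorsys

def shift_hue(r, g, b, hue_shift):
    # Rotate only the hue of an RGB color (components in 0..1); lightness and saturation are untouched.
    h, l, s = colorsys.rgb_to_hls(r, g, b)
    return colorsys.hls_to_rgb((h + hue_shift) % 1.0, l, s)

print(shift_hue(1.0, 0.0, 0.0, 0.5))  # pure red rotated half-way around the hue wheel -> cyan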
I am working with DCGAN (the PyTorch implementation), and the output is always a grid of 64 artificial images per epoch. I would like to increase that number, but I do not know which parameter to change; I tried to check the code without success.
Does anyone have some idea of how to do that?
The entire PyTorch implementation of DCGAN can be found at the following link:
https://pytorch.org/tutorials/beginner/dcgan_faces_tutorial.html
Change the values of the parameters below to whatever output size you want:
# Size of feature maps in generator
ngf = 64
# Size of feature maps in discriminator
ndf = 64
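If the goal is specifically the number of images per grid rather than their size, here is a hedged sketch based on the tutorial's own visualization step (nz, netG and device are assumed to be defined as in the tutorial):
import torch
import torchvision.utils as vutils

num_images = 128  # e.g. a 128-image grid instead of the default 64
fixed_noise = torch.randn(num_images, nz, 1, 1, device=device)
with torch.no_grad():
    fake = netG(fixed_noise).detach().cpu()
grid = vutils.make_grid(fake, padding=2, normalize=True)  # 8 images per row by default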
I've noticed that on some sites, a very low resolution version of an image gets displayed underneath the final version before it's done loading, to give the impression that the page is loading faster. How is this done?
This is called progressive JPEG. When you save a picture using a tool like Photoshop, you need to specify that you want this JPEG flavor.
I've found this Photoshop "Save for Web" dialog sample, where you can see the Progressive option enabled:
What you are asking for depends upon the decoder and display software used. As noted, it occurs in progressive JPEG images. In that type of JPEG, the coefficients are broken down into separate scans.
The decoder then needs to update the image between decoding scans rather than just at the end of the image.
There was more need for this in the days of dial-up modems. Unless the image is really large, it is usually faster just to wait and display the whole image.
If you are programming, the display software you use may have an option to update after scans.
Most libraries now use a model where you decode an image file stream into a generic image buffer and then display that buffer. In this model, there is generally no place to display the image on the fly.
In short, you enable this by creating progressive JPEG images. Whether the image fades in as it loads depends entirely on what is used to display it.
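If you are producing the files from code rather than Photoshop, most JPEG encoders expose the same switch; for example, a small sketch with the Pillow library (file names and the quality value are placeholders):
from PIL import Image

img = Image.open("input.jpg")
img.save("output.jpg", "JPEG", progressive=True, quality=85)  # re-encode as a progressive JPEG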
As an alternative, you can batch optimize all your images using ImageMagick's convert command like this:
convert -strip -interlace plane input.jpg output.jpg
You can use these other options instead of plane.
Or just prefix the output filename with PJPEG:
convert -strip input.jpg PJPEG:output.jpg
Along with a proper file search or filename expansion (e.g.):
for i in images/*; do
  # e.g. convert each file in place, using the command from above
  convert -strip -interlace plane "$i" "$i"
done
The -strip option strips any profiles or comments, to make the conversion "cleaner". You may also want to set the -quality option to reduce the quality loss.
Does GIF specify some form of grayscale format that would not require a palette? Normally, when you have a palette, you can emulate grayscale by setting all palette entries to gray levels. But with other formats (e.g. TIFF) the grayscale palette is implicit and doesn't need to be saved in the file at all; it is assumed that a pixel value of 0 is black and 255 is white.
So is it possible to create such a GIF? I'm using the giflib C library (5.0.5), if that matters.
I forgot about this question. Meanwhile, I found out the answer: the GIF format requires a palette. No way around that.
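To illustrate the point (using the Pillow library as a stand-in rather than giflib; file names are placeholders): a "grayscale" GIF is just a paletted image whose 256 entries form a gray ramp, and that palette is always written into the file.
from PIL import Image

img = Image.open("input.png").convert("L")                   # 8-bit grayscale
pal = Image.new("P", img.size)
pal.putpalette([v for i in range(256) for v in (i, i, i)])   # 256-entry gray ramp palette
pal.putdata(list(img.getdata()))                             # pixel value == gray level
pal.save("output.gif")                                       # the palette is stored in the GIF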