Predict bounding box only - deep-learning

I have two folders, training and testing. Each folder has 14k images, each showing a single random object (chair, box, fan, can, etc.).
In addition, I have 4 columns [x1,x2,y1,y2] for each image of the training set, giving the bounding box in which that object is enclosed.
With this information I want to predict the bounding boxes for the test set.
I am very new to computer vision, so it would be very helpful if anyone could tell me how to start training this kind of model.
I found YOLOv3, but it includes classification as well.

I recommend you look at the GitHub code here.
In detect.py there is a do_detect() function.
This function returns both the class and the bounding box that you want to get from the image.
boxes = do_detect(model, image, confidence_threshold, nms_threshold)
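Since you only want the boxes, you can simply drop the class information from each detection. A rough sketch, assuming each returned detection is a sequence whose first four values are the box coordinates (check detect.py in that repo for the exact layout):
bounding_boxes = [det[:4] for det in boxes]  # keep only the coordinates, drop the confidence and class fields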

Related to anchor boxes' positions

I'm new to deep neural networks.
Is assuming anchor box positions when training an object detection model similar to initializing kernels with default weights in a CNN?
Yes, the two ideas are similar: initializing anchor boxes and initializing kernels with default weights both give the model a starting point to learn from, rather than a completely random set of values.
If you are using a custom dataset, you can create the anchor boxes with one of several websites, such as the ones below:
https://www.makesense.ai/
https://roboflow.com/
If you use a dataset from Kaggle, the boxes are declared in the training file and you have to calculate the anchor boxes from there.
Just a reminder: each object has 5 values:
class, the x and y of the center, the width, and the height
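If you need concrete anchor sizes, a common approach is to cluster the width/height pairs of the training boxes. A rough sketch of my own (the function name and the anchor count are arbitrary), assuming labels in the 5-value format above with normalized coordinates:
import numpy as np
from sklearn.cluster import KMeans

def compute_anchors(label_rows, num_anchors=9):
    # label_rows: iterable of (class_id, x_center, y_center, width, height) tuples
    wh = np.array([[w, h] for _, _, _, w, h in label_rows])
    kmeans = KMeans(n_clusters=num_anchors, n_init=10, random_state=0).fit(wh)
    return kmeans.cluster_centers_  # each row is a (width, height) anchor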

Detectron2 Mask R-CNN cell segmentation - nothing visible

I'm new to deep learning and trying cell segmentation with Detectron2 Mask R-CNN.
I use the images and mask images from http://celltrackingchallenge.net/2d-datasets/ - Simulated nuclei of HL60 cells - the training dataset. The folder I am using is here
I tried to create and register a new dataset following the balloon dataset format in the detectron2 colab tutorial.
I have 1 class, "cell".
My problem is, after I train the model, there are no masks visible when visualizing predictions. There are also no bounding boxes or prediction scores.
A visualized annotated image looks like this, but the predicted mask image is just a black background, like this.
What could I be doing wrong? The colab I made is here
I have a problem similar to yours: the network predicts the box and the class but not the mask. The first thing to note is that the DefaultTrainer automatically resizes your images, so you need to create a custom mapper to avoid this. The second thing is to add data augmentation, which significantly improves convergence and generalization.
First, avoid the resize:
cfg.INPUT.MIN_SIZE_TRAIN = (608,)
cfg.INPUT.MAX_SIZE_TRAIN = 608
cfg.INPUT.MIN_SIZE_TRAIN_SAMPLING = "choice"
cfg.INPUT.MIN_SIZE_TEST = 608
cfg.INPUT.MAX_SIZE_TEST = 608
See also:
https://gilberttanner.com/blog/detectron-2-object-detection-with-pytorch/
How to use detectron2's augmentation with datasets loaded using register_coco_instances
https://eidos-ai.medium.com/training-on-detectron2-with-a-validation-set-and-plot-loss-on-it-to-avoid-overfitting-6449418fbf4e
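For the custom mapper and augmentation part, here is a minimal sketch following the pattern from the detectron2 data-loading tutorial; the specific augmentations are only placeholders to adapt to your cell images:
from detectron2.data import DatasetMapper, build_detection_train_loader
from detectron2.data import transforms as T
from detectron2.engine import DefaultTrainer

class CellTrainer(DefaultTrainer):
    @classmethod
    def build_train_loader(cls, cfg):
        # Custom mapper: you control the transforms instead of the default resize.
        mapper = DatasetMapper(cfg, is_train=True, augmentations=[
            T.RandomFlip(horizontal=True),
            T.RandomBrightness(0.9, 1.1),
        ])
        return build_detection_train_loader(cfg, mapper=mapper)
You then train with CellTrainer(cfg) instead of DefaultTrainer(cfg).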
I found a current workaround by using Matterport Mask R-CNN and the sample nuclei dataset instead: https://github.com/matterport/Mask_RCNN/tree/master/samples/nucleus

Autodesk Forge - Revit New Dimension added but not visible

I wrote a tool which draws model curves on a view and adds dimensions to them. When run locally on my computer, the tool works fine: lines are drawn and dimensions are added and visible.
But when I upload the code to Forge Design Automation, the lines are drawn and the dimensions are added, yet the dimensions are not visible. After downloading the rvt file I can see the dimension through Revit Lookup, but not directly on the view.
Any suggestions where I might be going wrong?
Here is my code...
mCurve.LineStyle = buildingLineStyle;
//Adding dimension
ReferenceArray references = new ReferenceArray();
references.Append(mCurve.GeometryCurve.GetEndPointReference(0));
references.Append(mCurve.GeometryCurve.GetEndPointReference(1));
Dimension dim = doc.Create.NewDimension(groundFloor, line, references);
//Moving dimension to a suitable position
ElementTransformUtils.MoveElement(doc, dim.Id, 2 * XYZ.BasisY);
Thanks for your time in looking into this issue.
Thank you for your query and sorry to hear you are encountering this obscure problem.
I have no complete and guaranteed solution for you, but similar issues have been discussed in the past in the pure desktop Revit API, and two workarounds were suggested that might help in your case as well:
Newly created dimensioning not displayed
Dimension leader remains visible after removal
One workaround consists of creating a new dimension using the Reference objects obtained from the non-visible one.
The other consists of moving the dimension up and down within the same transaction to regenerate it.

Autodesk Forge Reality Capture: Not reconstructing complete meshes

I have lately been working on Forge Reality Capture API and using simple curl commands to reconstruct some scenes from images.
The process goes through smoothly but I never obtain a complete mesh.
1. I have tried increasing the number of images about 5 times (from 20 to 100).
2. I tried both the obj and rcm formats (my scenetype=object).
3. I investigated the camera positions after exporting the rcm mesh to Recap Photo, and only about 15 positions are shown, while I used about 100 frames in several positions. Only the images from these camera positions are stitched, so I get an incomplete mesh.
Is this an algorithm issue in the reconstruction?
Do I have to capture more pictures? The area is relatively small, a corridor of 50m*20m.
Can I re-process the same scene by adding additional photos?
Is a certain amount of texture necessary?
I am grateful for the answers.
Cheers!
I suggest having a look at my blog post on Reality Capture API https://forge.autodesk.com/blog/hitchhikers-guide-reality-capture-api that might help you to debug and identify the source of the problems.
The source of the problem could range from the object having transparent or reflective surfaces to some (or all) of your images not being properly uploaded.
In general, if you don't get a complete mesh, the best solution is to take more pictures of the missing spots instead of more pictures of the entire object. If there are missing spots, it means the engine could not figure out from your images how to stitch them; more images of those areas should help.

convert .caffemodel to yolo/object detection

I have a .caffemodel file that I converted to a .coreml file, which I'm using in an app to recognize special types of bottles. It works, but it only shows whether a bottle is in the picture or not.
Now I would like to know WHERE the bottle is, and I stumbled upon https://github.com/hollance/YOLO-CoreML-MPSNNGraph and https://github.com/r4ghu/iOS-CoreML-Yolo, but I don't know how I can convert my .caffemodel to such files. Is such a conversion even possible, or does the training have to be completely different?
If your current model is a classifier then you cannot use it to detect where the objects are in the picture, since it was not trained to do this.
You will have to train a model that does not just do classification but also object detection. The detection part will tell you where the objects are in the image (it gives you zero or more bounding boxes), while the classification part will tell you what the objects are in those bounding boxes.
A very simple way to do this is to add a "regression" layer to the model that outputs 4 numbers in addition to the classification (so the model now has two outputs instead of just one). Then you train it to make these 4 numbers the coordinates of the bounding box for the thing in the image. (This model can only detect a single object in the image since it only returns the coordinates for a single bounding box.)
To train this model you need not just images but also the coordinates of the bounding box for the thing inside the image. In other words, you'll need to annotate your training images with bounding box info.
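As a rough illustration of the two-output idea (not code for your particular Caffe/Core ML model; the backbone and layer sizes are arbitrary), in PyTorch it could look like this:
import torch.nn as nn
import torchvision

class ClassifierWithBox(nn.Module):
    def __init__(self, num_classes):
        super().__init__()
        backbone = torchvision.models.resnet18(weights=None)
        num_features = backbone.fc.in_features
        backbone.fc = nn.Identity()  # reuse the CNN trunk as a feature extractor
        self.backbone = backbone
        self.class_head = nn.Linear(num_features, num_classes)  # "what": class scores
        self.box_head = nn.Linear(num_features, 4)  # "where": x1, y1, x2, y2

    def forward(self, x):
        features = self.backbone(x)
        return self.class_head(features), self.box_head(features)
Training then minimizes a combined loss, for example cross-entropy on the class output plus a smooth L1 loss between the predicted and annotated box coordinates.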
YOLO, SSD, R-CNN, and similar models build on this idea and allow for multiple detections per image.