I have a .caffemodel file, which I converted to a Core ML model that I'm using in an app to recognize special types of bottles. It works, but it only tells me whether a bottle is in the picture or not.
Now I would like to know WHERE the bottle is. I stumbled upon https://github.com/hollance/YOLO-CoreML-MPSNNGraph and https://github.com/r4ghu/iOS-CoreML-Yolo, but I don't know how I can convert my .caffemodel to such models. Is such a conversion even possible, or does the training have to be completely different?
If your current model is a classifier then you cannot use it to detect where the objects are in the picture, since it was not trained to do this.
You will have to train a model that does not just do classification but also object detection. The detection part will tell you where the objects are in the image (it gives you zero or more bounding boxes), while the classification part will tell you what the objects are in those bounding boxes.
A very simple way to do this is to add a "regression" layer to the model that outputs 4 numbers in addition to the classification (so the model now has two outputs instead of just one). Then you train it to make these 4 numbers the coordinates of the bounding box for the thing in the image. (This model can only detect a single object in the image since it only returns the coordinates for a single bounding box.)
To train this model you need not just images but also the coordinates of the bounding box for the thing inside the image. In other words, you'll need to annotate your training images with bounding box info.
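As a rough illustration of that two-output idea, here is a minimal sketch in PyTorch (the backbone, layer sizes, and the SingleObjectDetector name are made up for illustration; this is not something you can graft onto an existing Caffe model):

import torch
import torch.nn as nn

class SingleObjectDetector(nn.Module):
    def __init__(self, num_classes):
        super().__init__()
        # Any convolutional backbone works; this tiny one is only for illustration.
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.classifier = nn.Linear(32, num_classes)  # "what is it?"
        self.box_head = nn.Linear(32, 4)              # "where is it?" (one box: x, y, w, h)

    def forward(self, x):
        features = self.backbone(x)
        return self.classifier(features), self.box_head(features)

# Training combines both objectives, e.g.
# loss = cross_entropy(class_logits, labels) + l1_loss(predicted_box, annotated_box)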
YOLO, SSD, R-CNN, and similar models build on this idea and allow for multiple detections per image.
I'm new to deep neural networks. Is assuming anchor boxes' positions when training an object detection model similar to initializing kernels with default weights in a CNN?
Yes, the ideas behind initializing anchor boxes and initializing kernels with default weights in a DNN are similar. In both cases, the goal is to give the model a starting point to learn from, rather than starting from a completely random set of values.
If you are using a custom dataset, you can create the bounding boxes with one of several websites; two of them are listed below:
https://www.makesense.ai/
https://roboflow.com/
If you use a dataset from Kaggle, the boxes are declared in the training file, and you have to calculate the anchor boxes from those.
Just a reminder: each object has 5 values:
class, x and y of the center, width, and height
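If you need to derive anchor boxes yourself from YOLO-format label files (class, x center, y center, width, height, all normalized), a common approach is k-means on the width/height pairs. Here is a rough sketch; the labels/ folder, the 9 clusters, and the plain Euclidean distance (instead of the IoU-based distance used in the YOLO papers) are all assumptions:

import glob
import numpy as np
from sklearn.cluster import KMeans

sizes = []
for label_file in glob.glob("labels/*.txt"):      # hypothetical label folder
    with open(label_file) as f:
        for line in f:
            _cls, _xc, _yc, w, h = map(float, line.split())
            sizes.append([w, h])

# Cluster the (width, height) pairs; the cluster centers become the anchors.
kmeans = KMeans(n_clusters=9, n_init=10).fit(np.array(sizes))
anchors = sorted(kmeans.cluster_centers_.tolist(), key=lambda wh: wh[0] * wh[1])
print(anchors)  # normalized widths/heights; scale by the network input size if required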
I am using the Edit2D extension on an SVF created from a 2D DWG file and have a question about transforms. The Autodesk.Edit2D.Polygon objects that are created have a getArea() method, which is great. However, it's not in the correct unit scale. I tested one, and something that should be roughly 230 sq ft in size is coming back as about 2.8.
I notice that the method takes an argument of type Autodesk.Edit2D.MeasureTransform, which I'm sure is what I need; however, I don't know how to get that transform. I see that I can get viewer.model.getData().viewports[1].transform. However, that is just an array of 16 numbers and not a transform object, so it creates an error when I try to pass it in.
I have not been able to find any documentation on this. Can someone tell me what units this is coming back in and/or how to convert to the same units as the underlying dwg file?
Related question, how do I tell what units the underlying DWG is in?
EDIT
To add to this, I tried to get all polylines in the drawing which have an area property. In this case I was able to figure out that the polyline in the underlying dwg was reporting its area in square inches (not sure if that's always the case). I generated Edit2D polygons based on the polylines so it basically just drew over them.
I then compared the area property from the polyline to the result of getArea() on the polygon to find the ratio. In this case it was always about 83 or 84 times smaller than the square foot value of the polyline it came from (there is some degree of error in my tracing system so I don't expect they would be exact at this point). However, that doesn't fit any unit value that I know of. So remaining questions:
What unit is this?
Is this consistent or do I need to look somewhere else for this scale?
Maybe you missed section 3.2, Units for Areas and Lengths, of https://forge.autodesk.com/en/docs/viewer/v7/developers_guide/advanced_options/edit2d-use/
If you use Edit2D without the MeasureExtension, it will display all coordinates in model units. You can customize units by modifying or replacing DefaultUnitHandler. More information is available in the Customize Edit2D tutorial.
and https://forge.autodesk.com/en/docs/viewer/v7/developers_guide/advanced_options/edit2d-customize/
BTW, you can get the DefaultUnitHandler via edit2dExt.defaultContext.unitHandler
OK, after a great deal of experimentation and frustration, I think I have it working. I ended up looking directly into the JS for the getArea() method in dev tools. Searching through the script, I found a class called DefaultMeasureTransform that inherits from MeasureTransform and takes a viewer argument. I was able to construct that and then pass it in as an argument to getArea():
const transform = new Autodesk.Edit2D.DefaultMeasureTransform(viewer);
const area = polygon.getArea(transform);
Now the area variable matches the units in the original CAD file (within acceptable rounding error anyway; it's about .05 square inches off).
It would be nice to have better documentation on the coordinate systems; am I missing it somewhere? Either way, this is working, so hopefully it helps someone else.
I'm new to deep learning and trying cell segmentation with Detectron2 Mask R-CNN.
I use the images and mask images from http://celltrackingchallenge.net/2d-datasets/ - Simulated nuclei of HL60 cells - the training dataset. The folder I am using is here
I tried to create and register a new dataset following the balloon dataset format in the Detectron2 Colab tutorial.
I have 1 class, "cell".
My problem is, after I train the model, there are no masks visible when visualizing predictions. There are also no bounding boxes or prediction scores.
A visualized annotated image is like this but the predicted mask image is just a black background like this.
What could I be doing wrong? The colab I made is here
I have a problem similar to yours: the network predicts the box and the class but not the mask. The first thing to note is that the algorithm (DefaultTrainer) automatically resizes your images, so you need to create a custom mapper to avoid this (a rough sketch of such a mapper follows the config below). The second thing is that you should add data augmentation, which significantly improves convergence and generalization.
First, avoid the resize:
cfg.INPUT.MIN_SIZE_TRAIN = (608,)
cfg.INPUT.MAX_SIZE_TRAIN = 608
cfg.INPUT.MIN_SIZE_TRAIN_SAMPLING = "choice"
cfg.INPUT.MIN_SIZE_TEST = 608
cfg.INPUT.MAX_SIZE_TEST = 608
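For the custom mapper and the augmentation, here is a minimal sketch following the usual Detectron2 custom-mapper pattern (the specific augmentations and the TrainerWithAugmentation name are my own choices; adapt them to your data):

import copy
import torch
import detectron2.data.transforms as T
from detectron2.data import build_detection_train_loader, detection_utils as utils
from detectron2.engine import DefaultTrainer

def custom_mapper(dataset_dict):
    # Work on a copy; Detectron2 may reuse the dict between epochs.
    dataset_dict = copy.deepcopy(dataset_dict)
    image = utils.read_image(dataset_dict["file_name"], format="BGR")

    # Augmentations only; no ResizeShortestEdge, so the images keep their size.
    augs = [
        T.RandomFlip(prob=0.5, horizontal=True, vertical=False),
        T.RandomBrightness(0.8, 1.2),
        T.RandomContrast(0.8, 1.2),
    ]
    image, transforms = T.apply_transform_gens(augs, image)

    dataset_dict["image"] = torch.as_tensor(image.transpose(2, 0, 1).astype("float32"))
    annos = [
        utils.transform_instance_annotations(obj, transforms, image.shape[:2])
        for obj in dataset_dict.pop("annotations")
    ]
    instances = utils.annotations_to_instances(annos, image.shape[:2])
    dataset_dict["instances"] = utils.filter_empty_instances(instances)
    return dataset_dict

class TrainerWithAugmentation(DefaultTrainer):
    @classmethod
    def build_train_loader(cls, cfg):
        return build_detection_train_loader(cfg, mapper=custom_mapper)

Then train with TrainerWithAugmentation(cfg) instead of DefaultTrainer(cfg).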
See also:
https://gilberttanner.com/blog/detectron-2-object-detection-with-pytorch/
How to use detectron2's augmentation with datasets loaded using register_coco_instances
https://eidos-ai.medium.com/training-on-detectron2-with-a-validation-set-and-plot-loss-on-it-to-avoid-overfitting-6449418fbf4e
I found a current workaround by using Matterport Mask R-CNN and the sample nuclei dataset instead: https://github.com/matterport/Mask_RCNN/tree/master/samples/nucleus
I have two folders, training and testing. Each folder has 14k images of a random single object, like a chair, box, fan, can, etc.
In addition to this, I have 4 columns [x1, x2, y1, y2] for each image of the training set, in which that random object is enclosed (the bounding box).
With this information I want to predict the bounding boxes for the test set.
I am very new to computer vision. It would be very helpful if anyone could tell me how to start training these kinds of models.
I found YOLOv3, but it includes classification as well.
I recommend you look at the GitHub code here.
In detect.py, there is a do_detect() function.
This function returns both the class and the bounding box that you want to get from the image.
boxes = do_detect(model, image, confidence_threshold, nms_threshold)
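Since you only want the bounding boxes, you can simply ignore the class part of each returned box. A hedged sketch (the exact layout of each box depends on that repo's detect.py, so verify it there):

for box in boxes:
    x1, y1, x2, y2 = box[:4]   # the first four values are the box corners
    print(x1, y1, x2, y2)      # ignore the remaining confidence/class entries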
I have a simple Flex paint application which lets the user draw anything they want. My problem is how I can save the drawing into a MySQL database without converting it to an image format. Moreover, I want to be able to save it and later retrieve it in case there is an unfinished drawing.
Thank you.
Define what objects can be drawn, e.g. straight lines, points, polygons with controllable corners, etc. For each object, create serialization methods. It may be a binary format (I guess you won't need to search drawings in the database by the features they use): object type first, then its attributes. For a line, that would be the end points, color, and maybe width and drawing style (solid, striped, dotted).
The entire drawing will have some properties too, like width/height and format version. Write those in a header, followed by all the drawing objects. If you need layers, you can add a special tag for them, which will act as a separator between drawing objects:
header - layer 1 tag - line - line - line - layer 2 tag - square - circle
A binary format also gives you the ability to save the drawing into a file (or in the database as a BLOB). Alternatively, you can go with XML; it will just use many more bytes (but will be easier to debug).
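To make the idea concrete, here is a language-agnostic sketch (written in Python rather than ActionScript) of that kind of binary layout: a small header, then one tagged record per drawing object. The magic string, tags, and field choices are illustrative, not a fixed spec:

import struct

TAG_LAYER, TAG_LINE = 0x01, 0x02

def serialize_line(x1, y1, x2, y2, color, width):
    # tag byte, four float coordinates, RGB color packed into an int, stroke width
    return struct.pack("<B4fIB", TAG_LINE, x1, y1, x2, y2, color, width)

def serialize_drawing(canvas_w, canvas_h, objects):
    # magic string, format version, canvas width/height
    header = struct.pack("<4sB2H", b"DRW1", 1, canvas_w, canvas_h)
    return header + b"".join(objects)

blob = serialize_drawing(800, 600, [serialize_line(10, 10, 200, 150, 0xFF0000, 2)])
# `blob` can be stored in a MySQL BLOB column and parsed back with struct.unpack.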