I have created an image classification model using Microsoft Model Builder. Now I need to use that model to detect objects in a video stream and draw bounding boxes once an object is detected. I cannot find a C# sample that uses the model generated by Model Builder. All of the object detection samples use ONNX models, and I have not found a tool to convert the model.zip generated by Model Builder to model.onnx.
Any help would be appreciated.
The image classification model in Model Builder cannot detect objects in images; object detection is a different problem.
What you can do is combine the ONNX object-detection sample with your own custom model.
Basically, you run the ONNX sample up to the point where the bounding boxes are parsed, then pass each detected region of the image through your image classifier and use that label instead.
It is somewhat of a hack, and you will have a hard time getting anywhere near real-time performance.
ONNX sample for object detection:
https://github.com/dotnet/machinelearning-samples/tree/master/samples/csharp/getting-started/DeepLearning_ObjectDetection_Onnx
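For illustration only, here is a rough Python/onnxruntime sketch of that two-stage idea (the linked sample itself is C#/ML.NET; the model files, input handling, and 224x224 input size below are assumptions, not from the original post):

import cv2
import numpy as np
import onnxruntime as ort

# Placeholder model files - substitute your own detector and classifier.
detector = ort.InferenceSession("tiny_yolo.onnx")        # generic object detector
classifier = ort.InferenceSession("my_classifier.onnx")  # your custom image classifier

def relabel_detections(frame, boxes):
    """Replace the detector's labels with labels from the custom classifier.
    boxes: list of (x, y, w, h) pixel rectangles parsed from the detector output."""
    labels = []
    input_name = classifier.get_inputs()[0].name
    for (x, y, w, h) in boxes:
        crop = cv2.resize(frame[y:y + h, x:x + w], (224, 224))        # assumed classifier input size
        blob = crop.astype(np.float32).transpose(2, 0, 1)[None] / 255.0
        scores = classifier.run(None, {input_name: blob})[0]
        labels.append(int(np.argmax(scores)))                         # predicted class index
    return labels

Running a second model on every detected crop is exactly what makes this approach slow, which is why real-time performance is hard to reach.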
I am using PyTorch Lightning to fine-tune a T5 transformer on a specific task. However, I was not able to understand how the fine-tuning works. I always see this code:
tokenizer = AutoTokenizer.from_pretrained(hparams.model_name_or_path)
model = AutoModelForSeq2SeqLM.from_pretrained(hparams.model_name_or_path)
I don't get how the fine-tuning is done: are they freezing the whole model and training only the head (if so, how can I change the head), or are they using the pre-trained model only as weight initialization? I have been looking for an answer for a couple of days already. Any links or help are appreciated.
If you are using PyTorch Lightning, it won't freeze anything unless you tell it to. Lightning has a callback you can use to freeze your backbone and train only the head module; see BackboneFinetuning.
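As a rough sketch (not from the original answer) of how that callback might be wired up: the pretrained T5 encoder is registered as self.backbone and a new linear layer acts as the head; the model name, batch keys, and hyperparameters below are illustrative assumptions.

import torch
import pytorch_lightning as pl
from pytorch_lightning.callbacks import BackboneFinetuning
from transformers import AutoModel

class TextClassifier(pl.LightningModule):
    def __init__(self, model_name="t5-small", num_classes=2):
        super().__init__()
        base = AutoModel.from_pretrained(model_name)
        self.backbone = base.encoder                                    # pretrained part, frozen at first
        self.head = torch.nn.Linear(base.config.d_model, num_classes)  # new, randomly initialized head

    def training_step(self, batch, batch_idx):
        hidden = self.backbone(input_ids=batch["input_ids"],
                               attention_mask=batch["attention_mask"]).last_hidden_state
        logits = self.head(hidden.mean(dim=1))                          # mean-pool tokens, then classify
        return torch.nn.functional.cross_entropy(logits, batch["labels"])

    def configure_optimizers(self):
        return torch.optim.AdamW(self.parameters(), lr=3e-4)

# Keep the backbone frozen for the first 5 epochs, then unfreeze it.
trainer = pl.Trainer(max_epochs=10,
                     callbacks=[BackboneFinetuning(unfreeze_backbone_at_epoch=5)])
# trainer.fit(TextClassifier(), train_dataloader) would then train the head first
# and add the backbone to the optimizer at the chosen epoch with a reduced learning rate.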
Also check out Lightning-Flash; it allows you to quickly build models for various text tasks and uses the Transformers library for the backbone. You can use its Trainer to specify which kind of fine-tuning you want to apply during training.
Thanks
I have found ways to do CAM/saliency maps for multi-class classification, but not for multi-label multi-class. Do you know of any resources I can use so I don't reinvent the wheel, or do you have advice for implementing it?
My specific use case is that I have a transfer-learned ResNet that outputs a binary 1x11 vector. Each entry corresponds to the presence of a certain feature in the input image. I want to be able to get a saliency map for each feature, so I can know what the network was looking at when deciding whether each image has each of those features.
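One way to get a per-feature map is to compute a vanilla gradient saliency map separately for each output unit by backpropagating from that unit alone. A minimal sketch, assuming a standard PyTorch ResNet whose final layer has 11 outputs:

import torch

def per_label_saliency(model, image, num_labels=11):
    """One vanilla-gradient saliency map per output unit of a multi-label model.
    image: tensor of shape (1, 3, H, W)."""
    model.eval()
    maps = []
    for i in range(num_labels):
        x = image.clone().requires_grad_(True)
        logits = model(x)                                  # shape (1, num_labels), pre-sigmoid
        logits[0, i].backward()                            # gradient of this label only
        maps.append(x.grad.detach().abs().max(dim=1)[0])   # collapse RGB channels
    return maps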
I am trying to build a CV model for detecting objects in videos. I have about 6 videos that have the content I need to train my model. These are things like lanes, other vehicles, etc. that I’m trying to detect.
I’m curious about the format of the dataset I need to train my model with. I can turn each frame of each video into an image and build a large repository of images to train with, or I can use the videos directly. Which way do you think is better?
I apologize if this isn't directly a programming question. I'm trying to assemble my data and I couldn't make up my mind about this.
YOLO version 3 is a good starting point. The trained model will have a .weights file and a .cfg file, which can be used to detect objects from a webcam or video on a computer, or on Android with OpenCV.
In OpenCV Python, cv.dnn.readNetFromDarknet("yolov3_tiny.cfg", "CarDetector.weights") can be used to load the trained model.
In Android, the equivalent code is:
String tinyYoloCfg = getPath("yolov3_tiny.cfg", this);
String tinyYoloWeights = getPath("CarDetector.weights", this);
Net tinyYolo = Dnn.readNetFromDarknet(tinyYoloCfg, tinyYoloWeights);
The function reference can be found here:
https://docs.opencv.org/4.2.0/d6/d0f/group__dnn.html
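As a rough illustration (file names, thresholds, and the 416x416 input size are placeholders, and non-maximum suppression is omitted), a minimal Python loop that runs the loaded network frame by frame and draws the boxes might look like this:

import cv2 as cv
import numpy as np

net = cv.dnn.readNetFromDarknet("yolov3_tiny.cfg", "CarDetector.weights")
out_names = net.getUnconnectedOutLayersNames()

cap = cv.VideoCapture("road.mp4")           # or 0 for a webcam
while True:
    ok, frame = cap.read()
    if not ok:
        break
    blob = cv.dnn.blobFromImage(frame, 1 / 255.0, (416, 416), swapRB=True, crop=False)
    net.setInput(blob)
    h, w = frame.shape[:2]
    for output in net.forward(out_names):
        for det in output:                   # det = [cx, cy, bw, bh, objectness, class scores...]
            scores = det[5:]
            class_id = int(np.argmax(scores))
            if scores[class_id] > 0.5:
                cx, cy, bw, bh = det[0] * w, det[1] * h, det[2] * w, det[3] * h
                x, y = int(cx - bw / 2), int(cy - bh / 2)
                cv.rectangle(frame, (x, y), (x + int(bw), y + int(bh)), (0, 255, 0), 2)
    cv.imshow("detections", frame)
    if cv.waitKey(1) == 27:                  # Esc to quit
        break
cap.release()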
Your video frames need to be annotated with a tool that generates bounding boxes in YOLO format; there are quite a few available. To train a custom model, this repository contains all the necessary information:
https://github.com/AlexeyAB/darknet
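For reference, YOLO-format annotations are one .txt file per image, with one line per object of the form

class_id x_center y_center width height

where all coordinates are normalized to the 0-1 range, for example: 0 0.512 0.430 0.210 0.180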
I am trying to build a keypoint detection model for humans. There are many pretrained networks available to generate keypoints, but I want to practice creating a keypoint detection model with a custom dataset myself. I can't find anything on the web; if someone has some information, please share.
I want more points on the human body, but to do that I need to create a custom model that generates those keypoints. I checked some annotation tools, but they only help adjust the points already specified in datasets like COCO; I don't think we can add more points to the image. I just want to build a new model with custom keypoints.
Please share your views on my approach to the problem and suggest some links if you have any ideas about it.
I have created a detailed GitHub repo, Custom Keypoint Detection, covering dataset preparation, model training, and inference with the CenterNet-HourGlass104 keypoint detection model, based on the TensorFlow Object Detection API, with examples.
This could help you train your keypoint detection model on a custom dataset.
Any issues related to the project can be raised on GitHub itself, and doubts can be cleared here.
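As a reference for the annotation side, custom keypoints are usually described with COCO-style annotations in which you define your own keypoint names in the category entry. A hand-written, illustrative example (all values made up) is shown below as Python dicts; pipelines like the one in the linked repo typically convert such annotations into TFRecords for training.

# Illustrative COCO-style keypoint annotation with custom keypoint names.
category = {
    "id": 1,
    "name": "person",
    "keypoints": ["head", "left_shoulder", "right_shoulder", "left_hip", "right_hip"],
    "skeleton": [[1, 2], [1, 3], [2, 4], [3, 5]],
}
annotation = {
    "image_id": 42,
    "category_id": 1,
    "bbox": [100, 80, 220, 400],   # x, y, width, height in pixels
    # keypoints are flat (x, y, visibility) triples: 0 = not labeled, 1 = hidden, 2 = visible
    "keypoints": [210, 95, 2, 160, 150, 2, 260, 150, 2, 180, 320, 1, 240, 320, 2],
    "num_keypoints": 5,
}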
Is there a way to capture 2D orthographic views from a 3D model and export them to 2D DXF using the Autodesk Forge API?
The workflow I want to achieve is:
Import a 3D file, for example a STEP file.
Capture the standard orthographic views (top, front, right, left, rear, and bottom). Ideally I want to capture all the views in a grid layout.
Export these views to a 2D vector format, for example DXF.
Thanks!
No such functionality is built into the Forge system.
What you can do yourself is retrieve the 3D coordinates of the faces, edges and vertices of the solids defined in the Forge model and flatten them into the different 2D planes you mention.
A wireframe view is easy, of course.
There may well be some open-source JavaScript libraries out there that support you in doing this, including the more complex hidden-line and ray-tracing operations.
I hope this helps.
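To make the flattening idea concrete, here is a toy Python sketch (the sign conventions are arbitrary, and hidden-line removal and actual DXF export are separate, harder problems):

# Toy illustration of flattening 3D vertices into the standard orthographic views.
# Each view simply drops one axis; signs depend on the convention you choose.
def project(vertices, view):
    views = {
        "top":    lambda x, y, z: ( x,  y),
        "bottom": lambda x, y, z: ( x, -y),
        "front":  lambda x, y, z: ( x,  z),
        "rear":   lambda x, y, z: (-x,  z),
        "right":  lambda x, y, z: ( y,  z),
        "left":   lambda x, y, z: (-y,  z),
    }
    return [views[view](*v) for v in vertices]

print(project([(1.0, 2.0, 3.0)], "front"))   # [(1.0, 3.0)]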
The workflow you describe could be achieved with no user interaction. You may want to take a look at our Design Automation API, i.e., AutoCAD in the Cloud. You could import the .step file in AutoCAD and use a custom package that performs the projection and exports to .dxf. That's the only Cloud product that would allow you to produce .dxf. But achieving the projection would be a fair piece of work!