I have a dataset of only cat images, and I have trained a Faster R-CNN on it for object detection. Now I have to detect objects that are not cats, let's say dogs. How do I do it?
I trained on cats using Faster R-CNN with a confidence threshold of 0.95, so in whichever region there isn't a bounding box I assume there is a non-cat. But I am not getting good results.
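For reference, a minimal sketch of the thresholding I describe, assuming a torchvision Faster R-CNN whose only foreground class is "cat" (the model setup here is just illustrative, not my actual training code):

```python
import torch
import torchvision

# Assumption: a Faster R-CNN whose only foreground class (label 1) is "cat";
# in practice the fine-tuned weights would be loaded here.
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(num_classes=2)
model.eval()

image = torch.rand(3, 480, 640)            # stand-in for a real image tensor in [0, 1]
with torch.no_grad():
    prediction = model([image])[0]         # dict with 'boxes', 'labels', 'scores'

# Keep only detections the model scores at 0.95 or higher;
# regions without such a box are what I am treating as "non-cat".
keep = prediction["scores"] >= 0.95
cat_boxes = prediction["boxes"][keep]
```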
I trained YOLO to detect an object. Now I want to train it to distinguish between 2 kinds of that object.
How can I do it?
For example:
I trained the weights to detect a dog, and now I want to distinguish between a big dog and a small dog (both of them are still dogs).
I'm currently working on my bachelor project and I'm using the PointNet deep neural network.
My project group and I have created a dataset of point clouds (each an unordered list of x 3D coordinates) and segmentation files, but we can't train PointNet to predict segmentation with the dataset.
Each segmentation file is a list containing the same number of rows as there are points in the corresponding point cloud, and each row is either a 1 or a 2, depending on whether the corresponding point belongs to segment 1 or segment 2.
When PointNet predicts, it outputs a list of x elements, where each element is the segment that PointNet predicts the corresponding point belongs to.
When we run the benchmark dataset from the original PointNet implementation, the system runs and can predict segmentation, so we know that the error is somewhere in our dataset, even though we have tried our best to make it look like the original benchmark dataset.
The implemented PointNet uses PyTorch's Conv2d, MaxPool2d and Linear layers. For calculating the loss, both the nn.functional.nll_loss and the nn.NLLLoss functions have been used. When using nn.NLLLoss, the weight parameter was set to a tensor of [1, 100] to combat potential imbalance in the data.
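A minimal sketch of that weighted loss setup; the tensor shapes are assumptions for per-point segmentation into 2 classes:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

num_points = 1024                                # assumed number of points per cloud
logits = torch.randn(1, 2, num_points)           # network output: (batch, classes, points)
target = torch.randint(0, 2, (1, num_points))    # per-point labels, segments 1/2 shifted to 0/1

log_probs = F.log_softmax(logits, dim=1)         # NLLLoss expects log-probabilities

# Weight class 1 a hundred times more than class 0 to counter class imbalance.
criterion = nn.NLLLoss(weight=torch.tensor([1.0, 100.0]))
loss = criterion(log_probs, target)
```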
These are the things we have tried:
We have tried downsampling the point clouds, i.e. removing points using voxel downsampling
We have tried downscaling and normalizing all values so they are between 0 and 1, using the formula (data - np.min(data)) / (np.max(data) - np.min(data)) (see the sketch after this list)
We have tried running a Euclidean clustering function on the data, so that each scanned object is separated out by itself
We have tried replicating another dataset, which was created from the same raw data and which we know has worked before
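A minimal sketch of the min-max normalization mentioned in the list above, assuming the point cloud is an (N, 3) NumPy array:

```python
import numpy as np

def min_max_normalize(data: np.ndarray) -> np.ndarray:
    """Scale every value into [0, 1] with the formula from the list above."""
    return (data - np.min(data)) / (np.max(data) - np.min(data))

points = np.random.rand(1024, 3) * 50.0    # stand-in for raw point cloud coordinates
points_scaled = min_max_normalize(points)
```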
In the attached link, images of the datafiles with a description can be found.
Cheers everyone
I have 3 different datasets; all of them are blood smear images stained with the same chemical substance. Blood smear images are images that capture your blood, including the red and white blood cells inside.
The first dataset contains 2 classes: normal vs. blood cancer
The second dataset contains 2 classes: normal vs. blood infection
The third dataset contains 2 classes: normal vs. sickle cell disease
So, what I want to do is: when I input a blood smear image, the AI system will tell me whether it is normal, blood cancer, blood infection or sickle cell disease (a 4-class classification task).
What should I do?
Should I mix these 3 datasets and train only 1 model to detect the 4 classes?
Or should I train 3 different models and then combine them? If so, what method should I use to combine them?
Update: I searched for a while. Could this task be called "learning without forgetting"?
I think it depends on the data.
You may use three different models and make three binary predictions on each image. So you get a vote (probability) for each x vs. normal. If the binary classifications are accurate, this should deliver okay results. But you kind of get an accumulated misclassification error this way.
If you can afford it, you can train a four-class model and compare its test error to that of the series of binary classifications. I understand that you already have three models, so training another one may not be too expensive.
If ONLY one of the classes can occur, a four-class model might be the way to go. If in fact two (or more) classes can occur jointly, a series of binary classifications would make sense.
As #Peter said, it is totally data dependent. If the images of the 4 classes, namely normal, blood cancer, blood infection and sickle cell disease, are easily distinguishable with the naked eye and there is no scope for confusion among the classes, then you should simply go for 1 model which gives out probabilities for all 4 classes (as mentioned by #maxi marufo). If there is confusion between classes and the images are NOT distinguishable with the naked eye, then you should use 3 different models, but then you'll need a way to combine their outputs. You simply get the predicted probabilities from all 3 models, say p1(normal) and p1(c1), p2(normal) and p2(c2), p3(normal) and p3(c3). Now you can average p1(normal), p2(normal), p3(normal) and then use a softmax over p(normal), p1(c1), p2(c2), p3(c3). Out of the multiple ways you could try, the above is one.
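A minimal sketch of that averaging-then-softmax combination; the probability values below are made up for illustration:

```python
import numpy as np

# Made-up predicted probabilities from the three binary models for one image.
p1_normal, p1_cancer    = 0.80, 0.20   # model 1: normal vs. blood cancer
p2_normal, p2_infection = 0.70, 0.30   # model 2: normal vs. blood infection
p3_normal, p3_sickle    = 0.90, 0.10   # model 3: normal vs. sickle cell disease

p_normal = np.mean([p1_normal, p2_normal, p3_normal])
scores = np.array([p_normal, p1_cancer, p2_infection, p3_sickle])

# Softmax turns the four scores into one probability distribution over the classes.
probs = np.exp(scores) / np.exp(scores).sum()
classes = ["normal", "blood cancer", "blood infection", "sickle cell disease"]
print(classes[int(np.argmax(probs))], probs)
```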
This is a multiclass classification problem. You can train just one model, with the final layer being a fully connected (dense) layer of 4 units (i.e. the output dimension) and a softmax activation function.
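A minimal sketch of such a final layer in PyTorch, assuming the backbone's features have already been flattened to some feature_dim (backbone and training loop omitted):

```python
import torch
import torch.nn as nn

feature_dim = 512                        # assumed size of the backbone's flattened features
num_classes = 4                          # normal, blood cancer, blood infection, sickle cell

head = nn.Linear(feature_dim, num_classes)

features = torch.randn(8, feature_dim)   # dummy batch of 8 feature vectors
logits = head(features)
probs = torch.softmax(logits, dim=1)     # one probability per class, each row sums to 1

# For training, nn.CrossEntropyLoss applies the softmax internally,
# so it takes the raw logits rather than probs.
criterion = nn.CrossEntropyLoss()
labels = torch.randint(0, num_classes, (8,))
loss = criterion(logits, labels)
```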
As I learn more and more about ML (I am a mobile dev), I'm starting to form an analogy in my head. I would like the community's opinion / validation.
As a front-end dev you have a backend and an API that you can make requests to. The standard format for the inputs and outputs of the API is JSON.
I'm running into a problem with ML models that I am trying to use, where I don't know how to read the expected input (the "API") and I don't know how to decode the expected output.
So far my experience has been fragmented, because some models say "Give me an image of [1,2,120,120]" or something like that.
To analogize: is there a unified way to define inputs and outputs for an ML model, like JSON unifies the inputs and outputs for a backend API?
If so, what are some rules one must follow to encode and decode data into this format?
Assuming this "ML Model" is in the context of running an input through say a trained pytorch model's forward pass to get an output, the unified way to define inputs and outputs for an ML model are through Tensors. Tensors are essentially a multi-dimensional matrix containing elements of a single data type. Think multi-dimensional lists with a single data type.
Tensors:MLModels::JSON:WebAPI
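As a minimal illustration (the values are arbitrary):

```python
import torch

# A 2x3 tensor: a multi-dimensional matrix holding a single data type.
t = torch.tensor([[1.0, 2.0, 3.0],
                  [4.0, 5.0, 6.0]])
print(t.shape)   # torch.Size([2, 3])
print(t.dtype)   # torch.float32
```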
An Example using an Object Detector
Model
Let's say your model example with the image is an object detector model that takes in an image as input and outputs either dog or cat.
The input would usually be:
A tensor representation of an image with the shape [1, 3, 120, 120], where 1 represents the batch size, 3 is the number of RGB channels, and 120x120 is the width and height of the image.
The output would usually be:
A normalized tensor with two elements like [0.7, 0.3], where index 0 represents the probability of the image depicting a dog and index 1 represents the probability that it's a cat.
Encoding and Decoding
Decoding the output to a string like "dog" or "cat" is obvious.
Encoding an image is slightly less obvious. At its heart, the format of an image is that of a tensor: a multi-dimensional matrix containing a single data type. So it is still intuitive to encode an image in the form of a JPEG or PNG to a tensor representation through the RGB channel dimensions and the pixel values for each channel. Typically, image files are loaded using libraries and methods like the Python Imaging Library and PyTorch's torchvision.transforms.ToTensor().
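A minimal sketch of that encoding step, assuming a hypothetical image path and the usual PIL + torchvision pipeline:

```python
from PIL import Image
from torchvision import transforms

# "dog.jpg" is a hypothetical path to an RGB image file on disk.
image = Image.open("dog.jpg").convert("RGB")

to_tensor = transforms.ToTensor()   # scales pixel values into [0, 1]
tensor = to_tensor(image)           # shape: [3, height, width]
batch = tensor.unsqueeze(0)         # shape: [1, 3, height, width], ready for a model
```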
This example is very specific to an object-detector-style model, but most supervised ML models will output a tensor like the one above or a one-hot label. ML models in general will have data inputs and outputs that can be represented as tensors.
I am trying to build a multilabel classifier so that I am able to identify and count objects in a picture. I have my own dataset that I have collected, and I have labeled regions.
For the sake of simplicity, here is an example. Let's say that I am building a classifier to classify dogs and cats that appear in the same image, and I have prepared the dataset in a way that defines regions.
If the image's size is 1200*1200:
First region: [[0,600],[0,1200]] will be labeled as 0 (cat),
Second region: [[600,1200],[0,1200]] will be labeled as 1 (dog).
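To make the layout concrete, here is a minimal sketch of those two regions written as labeled [x, y, w, h] annotations; the conversion assumes each region above is given as [[x_min, x_max], [y_min, y_max]]:

```python
# Assumed interpretation: each region is [[x_min, x_max], [y_min, y_max]]
# on the 1200x1200 image, converted to [x, y, w, h] style annotations.
regions = [
    {"label": 0, "name": "cat", "xy_range": [[0, 600], [0, 1200]]},
    {"label": 1, "name": "dog", "xy_range": [[600, 1200], [0, 1200]]},
]

annotations = []
for region in regions:
    (x_min, x_max), (y_min, y_max) = region["xy_range"]
    box = [x_min, y_min, x_max - x_min, y_max - y_min]   # [x, y, w, h]
    annotations.append({"label": region["label"], "box": box})

print(annotations)
```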
By reading the example here, I couldn't figure out how to define regions so that I can train the classifier to know that an object X is in region [x, y, w, h].
Preparing the script this way would help me categorize the image, but it wouldn't help with counting the number of objects in the image.
Any help would be appreciated.