Is there any deep learning network that can identify if a change from an initial state to another belongs to a class?
For example, I want to classify the shift from this image to this image.
I took a drone image of a village and tried to classify it using a trained model. The process ran successfully, but it returned a black mask without any contour of bunds traced on it. What could be the reason, and what is the solution?
I want to retrain the object detector Yolov4 to recognize figures of the board game Ticket to Ride.
While gathering pictures, I was looking for a way to reduce the number of pictures needed.
I was wondering whether more instances of an object/class in a picture mean more "training per picture", which would mean I need fewer pictures.
Is this correct? If not, could you explain why in simple terms?
On the Roboflow page, they say that YOLOv4 breaks object detection into two pieces:
regression to identify object positioning via bounding boxes;
classification to classify the objects into classes.
Regression, in this context, predicts continuous values: it estimates where each object sits in the image by regressing the coordinates and size of its bounding box. Classification, on the other hand, assigns each of those boxes a class (such as 'train piece', 'tracks', 'station', or anything else worth separating from the rest).
Now, to answer your question: no, you still need more pictures. Multiple instances per picture do give YOLOv4 more box samples, but it is the additional, varied pictures that let it fit and test a more accurate classifier. Also, be careful about what you want to classify: you want the algorithm to extract a 'train' class from an image, but not an 'ocean' class, for example. To achieve this, take more (different) pictures of the classes you want to detect!
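To make that concrete, here is a minimal sketch of a Darknet-style YOLO label file (the class ids and coordinates below are invented): every instance in a photo adds one line, i.e. one more box sample, but all of those lines still share the same background, lighting and viewpoint, which is exactly the variety that extra pictures provide.

```python
# Hypothetical Darknet-style YOLO labels for ONE photo of the board:
# each line is "class_id x_center y_center width height", normalised to [0, 1].
example_label_file = """\
0 0.31 0.42 0.05 0.04
0 0.55 0.40 0.05 0.04
1 0.48 0.63 0.12 0.03
"""

# Each line is an extra regression/classification sample for the detector,
# but all lines come from the same scene, so they add little visual variety.
for line in example_label_file.strip().splitlines():
    class_id, x, y, w, h = line.split()
    print(f"class {class_id}: centre=({x}, {y}), size=({w}, {h})")
```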
We have to make a custom dataset for object detection with a CNN, so we're going to annotate the objects to be detected with bounding boxes. I referred to several object detection labeling guides, such as PASCAL. However, we ran into a labeling question.
If we want to label people in the dataset images, do we need to label all visible objects (= people) in a picture? If we skip some objects (= people) in a picture, does that affect object detection? I added some labeling examples. Image (1) is a case of labeling all visible people in an image, and in Image (2) we labeled only some of the people in the image.
Does Image (2) have a bad effect on object detection? If it does, we're going to label as many visible objects as possible in each image.
(Image 1) Labeling all visible objects in a picture
(Image 2) Labeling some visible objects in a picture
Object detection models usually consist of 2 basic building blocks:
Region Proposal Generator
Classifier
The first block generates various region proposals. As its name suggests, a region proposal is a candidate region that might contain an object.
The second block receives every region proposal and classifies it.
If you neglect a true positive object within the image, you force the object detection model to label that object as background. This heavily affects what the model learns: you are asking it to produce different classifications for the same sort of object.
In conclusion, you have to label every true positive object for the model.
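As a rough illustration of that point, here is a sketch of how proposals are typically matched against the labelled boxes (the boxes are made up and this is not any specific detector's code): a proposal that overlaps no label is trained as background, even when it actually sits on an unlabeled person.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

labeled_boxes = [(30, 40, 80, 200)]                    # only one of two people was annotated
proposals = [(32, 42, 78, 198), (150, 45, 200, 205)]   # both proposals actually contain a person

for p in proposals:
    best = max(iou(p, gt) for gt in labeled_boxes)
    target = "person" if best > 0.5 else "background"  # the second one becomes "background"
    print(p, "->", target)
```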
Yes, it is important: if you skip some people, the network will only partially learn how to detect people and regress their locations. The network may be resilient to a few labelling errors, but not to as many as in your second example image.
To train an accurate network you need to label every visible object instance, and if you want your network to be robust to occlusion you should label partially masked objects too.
You can easily verify this behaviour by training two networks: one with all labels and the other one with half of them.
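If you want to run that experiment, here is a minimal sketch of deriving the half-labelled annotation set from the full one (the file names, boxes and annotation structure are made up for illustration):

```python
import random

random.seed(0)

# Full annotations: image name -> list of person boxes (x1, y1, x2, y2).
full_annotations = {
    "street_01.jpg": [(34, 50, 80, 200), (120, 60, 170, 210), (300, 55, 350, 205)],
    "street_02.jpg": [(10, 40, 60, 190), (200, 45, 250, 195)],
}

# Randomly keep roughly half of the boxes to build the second training set.
half_annotations = {
    image: [box for box in boxes if random.random() < 0.5]
    for image, boxes in full_annotations.items()
}

# Every dropped box is an unlabeled person that the second network will be
# trained to treat as background; comparing both networks shows the damage.
print(half_annotations)
```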
I have an image (e.g. 60x60) with multiple items inside it. The items are square boxes, say 4x4 in size, randomly placed within the image. The boxes (items) themselves are created with random patterns, with some pixels switched on and others switched off. So it could be the same box repeated twice (or more, if there are more than two items) in the image, or the boxes could be entirely different.
I'm looking to create a deep learning model that takes in the original image (60x60) and outputs all the patches in the image.
This is all I have for now, but I can definitely share more details as the discussion starts. I'd be interested to weigh in different options that can help me achieve this objective. Thanks.
I would solve this using object detection. First, I would train a network to detect those box-like objects by cutting out patches of them; then I would run a Faster R-CNN or something similar on the image.
You might want to take a look at the Stanford lecture on detection (slides here: http://cs231n.stanford.edu/slides/2017/cs231n_2017_lecture11.pdf).
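As a rough starting point for the Faster R-CNN route, here is a sketch using torchvision (PyTorch with torchvision >= 0.13 assumed; the single "patch" class, the example box and the 60x60 tensor are placeholders):

```python
import torch
import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor

num_classes = 2  # background + "patch"

# Load a COCO-pretrained Faster R-CNN and swap its box predictor for our class count.
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
in_features = model.roi_heads.box_predictor.cls_score.in_features
model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes)

# Training expects a list of image tensors and a list of target dicts with
# "boxes" (x1, y1, x2, y2) and "labels" for every patch in the 60x60 image.
model.train()
images = [torch.rand(3, 60, 60)]
targets = [{"boxes": torch.tensor([[10.0, 10.0, 14.0, 14.0]]),
            "labels": torch.tensor([1])}]
loss_dict = model(images, targets)  # returns the losses to backpropagate
print({k: float(v) for k, v in loss_dict.items()})
```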
I am trying to train a deep neural network that uses information from two separate images in order to get a final image output, similar to this. The difference is that my two input images don't have any spatial relation, as they are completely different images with different amounts of information. How can I use a two-stream CNN, or any other architecture, with these kinds of inputs?
For reference: one image has size (5184x3456) and the other has size (640x240).
First of all: it doesn't matter that you have two images. You have exactly the same problem with a single input image, since that single image can come in different sizes.
There are multiple strategies to solve this problem:
Cropping and scaling: Just force the input into the size you need. The cropping is done to keep the aspect ratio correct. Sometimes different parts of the same image are fed into the network and the results are combined (e.g. averaged).
Convolutions + global pooling: Convolutional layers don't care about the input size. At the point where you do care about it, you can apply global pooling: a pooling region that always covers the complete feature map, regardless of its size (see the sketch after this list).
Special layers: I don't remember the concept or name, but there are some layers which allow differently sized inputs... maybe it was one of the attention-based approaches?
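A minimal Keras sketch of the convolutions + global pooling idea (TensorFlow 2.x assumed; the layer sizes and the 10-class head are placeholders): the spatial dimensions are left as None, so the same model accepts images of any height and width.

```python
from tensorflow import keras
from tensorflow.keras import layers

inputs = keras.Input(shape=(None, None, 3))             # height and width left unspecified
x = layers.Conv2D(32, 3, activation="relu", padding="same")(inputs)
x = layers.Conv2D(64, 3, activation="relu", padding="same")(x)
x = layers.GlobalAveragePooling2D()(x)                   # collapses any spatial size to a fixed-length vector
outputs = layers.Dense(10, activation="softmax")(x)

model = keras.Model(inputs, outputs)
model.summary()
```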
Combining two inputs
Look for "merge layer" or "concatenation layer" in the framework of your choice:
Keras
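For instance, a two-stream sketch in Keras (TensorFlow 2.x assumed; the branch sizes and the single sigmoid output are placeholders): each image gets its own convolutional branch, global pooling removes the size mismatch, and a Concatenate (merge) layer fuses the two fixed-length feature vectors.

```python
from tensorflow import keras
from tensorflow.keras import layers

def branch(name):
    """One convolutional stream that maps an arbitrarily sized image to a fixed-length vector."""
    inp = keras.Input(shape=(None, None, 3), name=name)
    x = layers.Conv2D(32, 3, activation="relu", padding="same")(inp)
    x = layers.GlobalAveragePooling2D()(x)
    return inp, x

input_a, features_a = branch("large_image")   # e.g. the 5184x3456 photo
input_b, features_b = branch("small_image")   # e.g. the 640x240 image

merged = layers.Concatenate()([features_a, features_b])  # the "merge" / concatenation layer
output = layers.Dense(1, activation="sigmoid")(merged)

model = keras.Model([input_a, input_b], output)
model.summary()
```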
See also
Keras: Variable-size image to convolutional layer
Caffe: Allow images of different sizes as inputs