Does the number of instances of an object in a picture affect the training of a deep-learning object detector?

I want to retrain the object detector YOLOv4 to recognize figures of the board game Ticket to Ride.
While gathering pictures, I was looking for a way to reduce the number of pictures needed.
I was wondering whether more instances of an object/class in a picture mean more "training per picture", which would mean I need fewer pictures.
Is this correct? If not, could you explain in simple terms?

On the Roboflow page, they say that YOLOv4 breaks object detection into two pieces:
regression to identify object positioning via bounding boxes;
classification to classify the objects into classes.
Regression here means predicting continuous values: the coordinates of the bounding box around each object. Classification, on the other hand, assigns each detected box a class ('train piece', 'tracks', 'station', or whatever else is worth separating from the rest).
Now, to answer your question: no, you still need more pictures. Every labelled instance in a picture does give the network one more box-and-class example, but all instances in one picture share the same background, lighting and viewpoint. When you take more (different) pictures, YOLOv4 gets more varied samples to build and test an accurate detector. You also have to be careful about what it ends up learning: you want the algorithm to extract a 'train piece' class from an image, but not an 'ocean' class, for example. To prevent this, take more (different) pictures of the classes you want to have!
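To make the "instances per picture" point concrete, here is a minimal sketch (the directory name and file layout are hypothetical) that counts how many labelled instances each image contributes in the darknet/YOLO annotation format, where every non-empty line of an image's .txt label file is one object instance:

from pathlib import Path
from collections import Counter

# Hypothetical label directory; in darknet/YOLO format each image has a .txt
# label file and every line ("class_id x_center y_center width height") is one
# labelled instance, i.e. one box-and-class training example.
label_dir = Path("labels/train")

instances_per_image = Counter()
for label_file in label_dir.glob("*.txt"):
    lines = label_file.read_text().splitlines()
    instances_per_image[label_file.stem] = sum(1 for line in lines if line.strip())

total_boxes = sum(instances_per_image.values())
print(f"{len(instances_per_image)} images, {total_boxes} labelled instances")
# Many instances in one image mean many box examples, but they all share the
# same background, lighting and viewpoint, which is why image variety still matters.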

Related

Keypoint detection when target appears multiple times

I am implementing a keypoint detection algorithm to recognize biomedical landmarks on images. I only have one type of landmark to detect. But in a single image, 1-10 of these landmarks can be present. I'm wondering what's the best way to organize the ground truth to maximize learning.
I considered creating 10 landmark coordinates per image and associating them with flags that are either 0 (not present) or 1 (present). But this doesn't seem ideal: since the multiple landmarks in a single picture are actually the same type of biomedical element, the neural network shouldn't be trying to learn them as separate entities.
Any suggestions?
One landmark that can appear anywhere sounds like a typical CNN problem. Your CNN filters should learn which features make up the landmark, but they don't care where it appears; that is the responsibility of the next layers. Hence, for training the CNN layers you can use a monochrome image as the target: 1 where there is a landmark at that pixel, 0 where there is not.
The next layers basically process the CNN-detected features. To train those, your ground truth should basically be the desired outcome. Do you just need a binary output (count > 0)? A somewhat accurate estimate of the count? Coordinates? Orientation? NNs don't care much what they learn, so just give them in training what they should produce at inference.
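A minimal sketch of such a single-channel target, assuming the landmarks are annotated as (x, y) pixel coordinates (a Gaussian blob around each point is a common softening of the hard 0/1 target):

import numpy as np

def make_heatmap(landmarks, height, width, sigma=3.0):
    # Build a single-channel target: ~1 near each landmark, 0 elsewhere.
    # All landmarks are the same type, so they share one channel.
    ys, xs = np.mgrid[0:height, 0:width]
    heatmap = np.zeros((height, width), dtype=np.float32)
    for x, y in landmarks:
        blob = np.exp(-((xs - x) ** 2 + (ys - y) ** 2) / (2 * sigma ** 2))
        heatmap = np.maximum(heatmap, blob)  # overlapping landmarks keep the max
    return heatmap

# Example: three landmarks of the same type in one 256x256 image.
target = make_heatmap([(40, 50), (120, 200), (130, 210)], 256, 256)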

What is a class score map?

I was going through the VGGNet paper and came across the testing phase of VGGNet.
During the testing phase, the test image goes through VGGNet and a class score map is obtained. This class score map is spatially averaged to produce a fixed-size vector.
I have googled 'class score map', but couldn't find any relevant results. I wish to know what the role of the class score map is.
Any hint would be greatly appreciated. Thanks
When you train an image recognition model, you train it for a specific image size (and resolution), say n_dims = [256, 256]. In the prediction phase, however, you have images of different sizes (in pixels), e.g. [1024, 1024]. You extract patches (you can resize the image first by lowering the resolution) and slide the model over the image; for each patch you obtain a prediction for all classes (more than one of the objects might be present in a patch), which you then have to average somehow over the whole image at the end.
See OverFeat: Integrated Recognition, Localization and Detection using Convolutional Networks.
Instead, we explore the entire image by densely running the network at each location and at multiple scales. While the sliding window approach may be computationally prohibitive for certain types of model, it is inherently efficient in the case of ConvNets (see section 3.5). This approach yields significantly more views for voting, which increases robustness while remaining efficient. The result of convolving a ConvNet on an image of arbitrary size is a spatial map of C-dimensional vectors at each scale.
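A minimal sketch of the idea with a toy fully convolutional network (not the actual VGG weights): because the head is a 1x1 convolution rather than a fully connected layer, a larger test image produces a spatial map of class scores, which is then averaged over the spatial dimensions into one fixed-size vector:

import torch
import torch.nn as nn

num_classes = 1000

# Toy fully convolutional "classifier": conv layers followed by a 1x1
# convolution with one output channel per class. With no fully connected
# layers, it accepts images of any spatial size.
net = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=3, padding=1), nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Conv2d(64, 128, kernel_size=3, padding=1), nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Conv2d(128, num_classes, kernel_size=1),   # produces the class score map
)

image = torch.randn(1, 3, 1024, 1024)          # test image larger than the training size
score_map = net(image)                         # (1, num_classes, H', W') class score map
class_scores = score_map.mean(dim=(2, 3))      # spatial average -> (1, num_classes)
print(score_map.shape, class_scores.shape)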

Question about training an object detection model when the training images have many missing annotations

In the training dataset, some objects that should have been labeled were not, for various reasons.
As in the picture below, some objects are missing annotations (the red rectangle is the labeled one).
image with missing annotations
What should I do with this incompletely labeled dataset, and what effect does it have on the model (perhaps the missing labels act as false negatives during training and hurt performance on the test data)?
Most detection algorithms use the portions of images without bounding boxes as "negative" examples, i.e. background regions in which nothing should be detected.
If you have many objects in your training set which should have been labeled but aren't, this is a problem because it confuses the training algorithm.
You should definitely think about manually adding the missing labels to the dataset.
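If relabeling everything by hand is too expensive, a first pass can at least flag the suspicious images. A minimal sketch, assuming a darknet-style layout (one .txt label file per image, one box per line; all paths are hypothetical):

from pathlib import Path

image_dir = Path("images/train")
label_dir = Path("labels/train")

to_review = []
for image_path in sorted(image_dir.glob("*.jpg")):
    label_path = label_dir / (image_path.stem + ".txt")
    n_boxes = 0
    if label_path.exists():
        n_boxes = sum(1 for line in label_path.read_text().splitlines() if line.strip())
    if n_boxes == 0:                      # no labels at all: review or drop this image
        to_review.append(image_path.name)

print(f"{len(to_review)} images with no labels, e.g. {to_review[:5]}")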

Locate/Extract Patches from an Image

I have an image (e.g. 60x60) with multiple items inside it. The items are square boxes, say 4x4 in size, randomly placed within the image. The boxes (items) themselves are created with random patterns: some random pixels switched on and others switched off. So it could be the same box repeated twice (or more, if there are more than two items) in the image, or the boxes could be entirely different.
I'm looking to create a deep learning model that could take in the original image(60x60) and output all the patches in the image.
This is all I have for now, but I can definitely share more details as the discussion starts. I'd be interested to weigh in different options that can help me achieve this objective. Thanks.
I would solve this using object detection. First I would train a network to detect those box-like objects by cutting out patches of those objects. Then I would run a Faster R-CNN or something similar on it.
You might want to take a look at the stanford lecture on detection (slides here: http://cs231n.stanford.edu/slides/2017/cs231n_2017_lecture11.pdf).
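A minimal sketch of that approach with torchvision's Faster R-CNN (the dataset and training loop are omitted; the two classes assumed here are background and "box item"):

import torch
import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor

# Load a detector pre-trained on COCO (older torchvision versions use
# pretrained=True instead of weights="DEFAULT") and replace its head so it
# predicts two classes: background and the square box items.
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
in_features = model.roi_heads.box_predictor.cls_score.in_features
model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes=2)

# After fine-tuning on labelled 60x60 images, inference returns the bounding
# boxes of the detected items, which can then be cropped out as patches.
model.eval()
image = torch.rand(3, 60, 60)                 # toy 60x60 input, as in the question
with torch.no_grad():
    prediction = model([image])[0]
patches = [image[:, int(y1):int(y2), int(x1):int(x2)]
           for x1, y1, x2, y2 in prediction["boxes"]]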

Store a "routine" which, given some input, generates a 3d model

Well, it's the time of the year when I get busy on my next-generation, cutting-edge R&D project (just for the fun of it... and maybe some profit eventually).
This time, I've had a great idea for a service, which unfortunately I can't detail much.
However, a major part of this project is the ability to generate a 3d model out of certain input criteria. The generated model must be different on each generation.
As such, this is much different than the static models used in games - I think I will have to store actual code more than just model coords.
To give an example of some output:
$apple = new AppleGenerator();
$apple->set_size_between(30, 50); // these two numbers are just samples...
$apple->set_seeds_between(3, 8); // apple must have at least 3 seeds*
$apple_model = $apple->generate();
// * I realize seeds may not be exactly part of the model, but I can't think of anything else
So I need to tackle some points here:
How do I store these models as data?
Do you know of any tools that may help?
I need to incorporate a randomness factor (for example, the apples would have slightly different shapes each time)
I suppose math will play a good part here, but since these are complex shapes, it's going to be infeasible to cook up the necessary formulae for each model, right?
Also, textures must be relevant to each part of the model, as well as making the model look random (e.g. I could specify 40 to 60 percent red, and the rest green, for the generated apple).
This is in fact not a simple task. The solution varies a LOT depending on the complexity and variety of the objects you are trying to create.
Let's consider a few cases though:
Object is more or less known:
The simplest case is to have a 3d model in the conventional way, and then randomize it a bit. Take the apple for example: the randomization can vary from the size of the apple to its texture colors to fruit damage.
All your objects can be described using NURBS surfaces:
In this case, you need to store enough data to be able to generate the surface, and of course this data can be randomized a bit.
Your objects have rotational symmetry:
In this case, generating a single curve and rotating it around an axis can give you a shape; an apple is an example. You would need to store only the curve data, and randomizing the shape could be done either on the curve (keeping symmetry) or on the final mesh.
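A minimal sketch of that lathe-style idea in Python (the profile curve and noise level are arbitrary placeholders): perturb a 2D profile slightly for variety, then sweep it around the vertical axis to get the vertices of a surface of revolution:

import numpy as np

rng = np.random.default_rng()

heights = np.linspace(0.0, 1.0, 20)                      # parameter along the axis
profile = np.sin(np.pi * heights) * 0.5                  # rough apple-like silhouette
profile *= 1.0 + rng.normal(0.0, 0.05, profile.shape)    # randomize the curve a bit

angles = np.linspace(0.0, 2.0 * np.pi, 36, endpoint=False)
# Vertices of the surface of revolution, shape (len(heights), len(angles), 3).
x = profile[:, None] * np.cos(angles)[None, :]
y = profile[:, None] * np.sin(angles)[None, :]
z = np.broadcast_to(heights[:, None], x.shape)
vertices = np.stack([x, y, z], axis=-1)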
On textures
This is way more complicated than the mesh generation, mainly because textures carry much more information than meshes (they are more detailed). You can have many texture generation strategies. In the case of your apple, you could select a few vertices, give them colors (one red, one green, another red, etc.) and interpolate the other vertex colors. This creates a smooth transition of colors which may look nice on an apple; if you are generating a knife, however, it just looks terrible.
In most cases, you need to be aware of which part of your mesh represents what, and generate the texture part by part. In the knife example above, you could generate the mesh in two steps, blade and handle, with each part's texture generated separately.
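A minimal sketch of the vertex-colour idea from the apple example (the mesh and seed choices are placeholders): give a few seed vertices red or green and blend every other vertex colour by inverse-distance weighting of the seeds, which gives the smooth transitions described above:

import numpy as np

rng = np.random.default_rng()

vertices = rng.random((500, 3))                          # placeholder mesh vertices
seed_idx = rng.choice(len(vertices), size=4, replace=False)
seed_colors = np.array([[1.0, 0, 0], [0, 0.8, 0], [1.0, 0, 0], [0, 0.8, 0]])

# Distance from every vertex to every seed, then inverse-distance weights.
d = np.linalg.norm(vertices[:, None, :] - vertices[seed_idx][None, :, :], axis=-1)
w = 1.0 / (d + 1e-6)
w /= w.sum(axis=1, keepdims=True)
vertex_colors = w @ seed_colors                          # one RGB colour per vertex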
Conclusion
You can have a mixture of these, of course. A meshGenerator class can take the data and, depending on which type it is, generate a mesh accordingly. Perhaps the first solution for object creation is the most suitable, as any complicated object is more easily defined by its triangles than by NURBS.
Take a look at some of the basic architectural principles used to code Spore, the video game about evolving living creatures: http://chrishecker.com/My_liner_notes_for_spore
Here's an example of how to XML-serialize a mesh, along with some random morph behavior: http://www.ogre3d.org/tikiwiki/Morph+animation#The_XML_format_of_meshes_with_morph_animation
To make your apples all a bit different, you can apply a random transformation (or deformation). See for example: http://wiki.blender.org/index.php/Doc:2.4/Manual/Modifiers/Deform/MeshDeform
You want to use an established file format to avoid strange problems. This is more geometry than pure math: your generate function would build the polygons, and your save method would write them out in one of those formats.
https://stackoverflow.com/questions/441388/most-common-3d-model-format
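As a small illustration of the "established file format" point (a minimal sketch, not tied to any engine): the Wavefront OBJ text format needs only "v x y z" lines for vertices and "f i j k" lines for triangle indices, so a generated mesh can be saved in a few lines:

def save_obj(path, vertices, faces):
    # vertices: list of (x, y, z) tuples; faces: list of (i, j, k) vertex
    # indices (0-based here, converted to OBJ's 1-based indexing on write).
    with open(path, "w") as f:
        for x, y, z in vertices:
            f.write(f"v {x} {y} {z}\n")
        for i, j, k in faces:
            f.write(f"f {i + 1} {j + 1} {k + 1}\n")

# Example: a single triangle.
save_obj("generated.obj", [(0, 0, 0), (1, 0, 0), (0, 1, 0)], [(0, 1, 2)])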