Training Faster R-CNN with multiple objects in an image - deep-learning

I want to train Faster R-CNN network with my own images to detect faces. I have checked quite a few Github libraries, but this is the example of the training file I always find:
/data/imgs/img_001.jpg,837,346,981,456,cow
/data/imgs/img_002.jpg,215,312,279,391,cat
But I can't find an example how to train with images containing couple objects. Should it be:
1) /data/imgs/img_001.jpg,837,346,981,456,cow,215,312,279,391,cow
or
2) /data/imgs/img_001.jpg,837,346,981,456,cow
/data/imgs/img_001.jpg,215,312,279,391,cow
?

I just could not help myself but quote FarCry3 here: "The definition of insanity is doing the same thing over and over and expecting different results."
(Note that this is purely in an entertaining context, and not meant to insult you in any way; I would not take the time to answer your question if I didn't think it worthwile)
In your second example, you would feed the exact same input data, but require the network to learn two different outcomes. But, as you already noted, it is not very common for many of the libraries to support multiple labels per image.
Oftentimes, this is purely done for the sake of simplicity, as it requires you to change your metrics, to accomodate for multiple outputs: Instead of having one-hot encoded targets, you now could have multiple "targets".
This is even more challenging in the task of object detection (and not object classification, as described before), since you now have to decide how you represent your targets.
If it is possible at all, I would personally restrict myself to labeling one class per image, or have a look at another image library that does support that, since the effort of rewriting that much code is probably not worth the minute improvement in the results.

Related

Semantic segmentation without labels in a single class

I am kind of new to semantic segmentation. I am trying to perform segmentation of images having defects.
I have the defect images annotated using a annotation tool and I created the mask for each image. I wanted to predict If an image has defect and where exactly it is located. But my problem is my defects does not look same in all the images. Example: Defects on steel- Steel breakage, erroded surface etc. I am just trying to classify if the image has defect or not and where it is located. So is it wrong to train the neural network with these all types considered as defects even though not everything lookalike?
I thought to do a binary segmentation of defect to no defect. If I am not correct how can I perform segmentation for defect and non defect images?
You first have to well define your problem and your objectives:
If you only want to detect if your image has a defect or not, it's a binary classification problem and you affect a label (0 or 1) to each image.
If you want to localise the defect approximatively (like a bounding box), it's an object detection problem and it can be realised with one or more classes.
If you want to localise precisely the defect (in order to performe measures for instance) the best is semantic segmentation or instance segmentation.
If you want to classify the defect, you will need to create classes for each defect you want to classify.
There is no magical solution because it depends of the objectives of your project. I can give you the following advices because I made an internship on a similar project :
Look carefully at your data, if you have thousands of images it will take a long to create your semantic segmentation dataset. Be smarter by using data augmentation techniques.
If you want to classify the defects, be sure to have enough defects of each type to train your network. If your network only sees one defect type per epoch, it can't learn to detect it.
Be sure that your network can detect the defects you're providing (not a scratch of two pixels for instance or alignement defects).
Performing semantic segmentation to only knows if there is a defect or not seems overkill because it's a long and complex process (rebuilding the image, memory of intermediaries images in Unet, lot of computations). If you really want to apply this method, you may create a threshold to detect if the number of detected pixels as defect allows to classify the image as 'presenting a defect' or not.
One class should be enough for your use-case. If you want to be able to distinguish between different types of defects though, you could try creating attributes for that class. So the class would be if a pixel has a defect or not, and the attribute would be breakage, eroded pixel, etc. Then you could train a model to detect a crack on the semantic class and another one to identify which type of defect it is.
Make sure to use an annotation tool that supports creating attributes. Personally, I use hasty.ai as their automation assistants are great! But I guess most tools should be able to do so.

Camera image recognition with small sample set

I need to visually recognise some flat pictures showed to camera. There are not many of them (maybe 30) but discrimination may depend on details. The input may be partly obscured or shadowed and is suspect to lighting changes.
The samples need to be updatable.
There are many existing frameworks for object detection, with the most reliable ones depending on deep learning methods (mostly convolutional networks). However, the pretrained models are not well optimised to discern flat imagery of course, and even if I start training from scratch, updating the system for new samples would take a cumbersome training process, if I am right about how this works.
Is it possible to use deep learning while still keeping the sample pool flexible?
Is there any other well known reliable method to detect images from a small sample set?
One can use well trained networks for visual classification like Inception or SqueezeNet, slice of the last layer(s) and add a simple statistical algorithm (for example k-nearest neighbour) that can be directly teached by the samples in a non-iterative fashion.
Most classification-related calculations like lighting and orientation insensitivity are already handled by the pre-trained network then, while the network's output keep enough information to allow statistical algorithms decide the image class.
An implementation using k-nearest neighbour is shown here: https://teachablemachine.withgoogle.com/ , the source is hosted here: https://github.com/googlecreativelab/teachable-machine .
Use transfer learning; you’ll still need to build a training set, but you’ll get better results than starting with random weights. Try to find a model trained on images similar to yours. You might also do some black box testing of the selected model with your curated images to baseline it’s response curve to your images.

Combine two views of an image to strengthen the output of a DL model

I have been trying this problem for weeks but to no avail.
My problem is:
Deep Learning Model has the following information:
INPUT: Sequence of Images
OUTPUT: What is happening in the image, i.e. categorise what is the activity happening from a sequence of 10 activities.
I have two cameras recording the same activity from two views, how could I combine those two views to improve the accuracy?
I think you should use DELF features, extract features of both similar images and combine them.
How to combine the two views is fully dependent on your understanding of the problem. Let me give you two different examples,
CASE I: when you review your training data, you can easily tell which camera is better for some data. e.g. one camera may capture everything useful, while the other camera may not due to possible occlusions (note: I am not saying one camera is always better than the other). In this case, you may use a later fusion technique to only fuse the two resulting features representing the sequences from two cameras.
CASE II: it is difficult for you to tell which camera is better. This basically indicates that you may not see performance boost after you consider both cameras, but maybe some small improvement.
Finally, when you say two cameras, is it possible for you do something like the binocular stereo vision? In this case, you may obtain the extra depth information that is not included in any single camera, and maybe helpful for the recognition task.

Drawing two-dimensional point-graphs

I've got a list of objects (probably not more than 100), where each object has a distance to all the other objects. This distance is merely the added absolute difference between all the fields these objects share. There might be few (one) or many (dozens) of fields, thus the dimensionality of the distance is not important.
I'd like to display these points in a 2D graph such that objects which have a small distance appear close together. I'm hoping this will convey clearly how many sub-groups there are in the entire list. Obviously the axes of this graph are meaningless (I'm not even sure "graph" is the correct word to use).
What would be a good algorithm to convert a network of distances into a 2D point distribution? Ideally, I'd like a small change to the distance network to result in a small change in the graphic, so that incremental progress can be viewed as a smooth change over time.
I've made a small example of the sort of result I'm looking for:
Example Graphic http://en.wiki.mcneel.com/content/upload/images/GraphExample.png
Any ideas greatly appreciated,
David
Edit:
It actually seems to have worked. I treat the entire set of values as a 2D particle cloud, construct inverse square repulsion forces between all particles and linear attraction forces based on inverse distance. It's not a stable algorithm, the result tends to spin violently whenever an additional iteration is performed, but it does always seem to generate a good separation into visual clusters:
alt text http://en.wiki.mcneel.com/content/upload/images/ParticleCloudSolution.png
I can post the C# code if anyone is interested (there's quite a lot of it sadly)
Graphviz contains implementations of several different approaches to solving this problem; consider using its spring model graph layout tools as a basis for your solution. Alternatively, its site contains a good collection of source material on the related theory.
The previous answers are probably helpful, but unfortunately given your description of the problem, it isn't guaranteed to have a solution, and in fact most of the time it won't.
I think you need to read in to cluster analysis quite a bit, because there are algorithms to sort your points into clusters based on a relatedness metric, and then you can use graphviz or something like that to draw the results. http://en.wikipedia.org/wiki/Cluster_analysis
One I quite like is a 'minimum-cut partitioning algorithm', see here: http://en.wikipedia.org/wiki/Cut_(graph_theory)
You might want to Google around for terms such as:
automatic graph layout; and
force-based algorithms.
GraphViz does implement some of these algorithms, not sure if it includes any that are useful to you.
One cautionary note -- for some algorithms small changes to your graph content can result in very large changes to the graph.

Can a Sequence Diagram realistically capture your logic in the same depth as code?

I use UML Sequence Diagrams all the time, and am familiar with the UML2 notation.
But I only ever use them to capture the essence of what I intend to do. In other words the diagram always exists at a level of abstraction above the actual code. Every time I use them to try and describe exactly what I intend to do I end up using so much horizontal space and so many alt/loop frames that its not worth the effort.
So it may be possible in theory but has anyone every really used the diagram in this level of detail? If so can you provide an example please?
I have the same problem but when I realize that I am going low-level I re-read this:
You should use sequence diagrams
when you want to look at the behavior
of several objects within a single use
case. Sequence diagrams are good at
showing collaborations among the
objects; they are not so good at
precise definition of the behavior.
If you want to look at the behavior of
a single object across many use cases,
use a state diagram. If you want
to look at behavior across many use
cases or many threads, consider an
activity diagram.
If you want to explore multiple
alternative interactions quickly, you
may be better off with CRC cards,
as that avoids a lot of drawing and
erasing. It’s often handy to have a
CRC card session to explore design
alternatives and then use sequence
diagrams to capture any interactions
that you want to refer to later.
[excerpt from Martin Fowler's UML Distilled book]
It's all relative. The law of diminishing returns always applies when making a diagram. I think it's good to show the interaction between objects (objectA initializes objectB and calls method foo on it). But it's not practical to show the internals of a function. In that regard, a sequence diagram is not practical to capture the logic at the same depth as code. I would argue for intricate logic, you'd want to use a flowchart.
I think there are two issues to consider.
Be concrete
Sequence diagrams are at their best when they are used to convey to a single concrete scenario (of a use case for example).
When you use them to depict more than one scenario, usually to show what happens in every possible path through a use case, they get complicated very quickly.
Since source code is just like a use case in this regard (i.e. a general description instead of a specific one), sequence diagrams aren't a good fit. Imagine expanding x levels of the call graph of some method and showing all that information on a single diagram, including all if & loop conditions..
That's why 'capturing the essence' as you put it, is so important.
Ideally a sequence diagram fits on a single A4/Letter page, anything larger makes the diagram unwieldy. Perhaps as a rule of thumb, limit the number of objects to 6-10 and the number of calls to 10-25.
Focus on communication
Sequence diagrams are meant to highlight communication, not internal processing.
They're very expressive when it comes to specifying the communication that happens (involved parties, asynchronous, synchronous, immediate, delayed, signal, call, etc.) but not when it comes to internal processing (only actions really)
Also, although you can use variables it's far from perfect. The objects at the top are, well, objects. You could consider them as variables (i.e. use their names as variables) but it just isn't very convenient.
For example, try depicting the traversal of a linked list where you need to keep tabs on an element and its predecessor with a sequence diagram. You could use two 'variable' objects called 'current' and 'previous' and add the necessary actions to make current=current.next and previous=current but the result is just awkward.
Personally I have used sequence diagrams only as a description of general interaction between different objects, i.e. as a quick "temporal interaction sketch". When I tried to get more in depth, all quickly started to be confused...
I've found that the best compromise is a "simplified" sequence diagram followed by a clear but in depth description of the logic underneath.
The answer is no - it does capture it better then your source code!
At least in some aspects. Let me elaborate.
You - like the majority of the programmers, including me - think in source code lines. But the software end product - let's call it the System - is much more than that. It only exists in the mind of your team members. In better cases it also exists on paper or in other documented forms.
There are plenty of standard 'views' to describe the System. Like UML Class diagrams, UML activity diagrams etc. Each diagram shows the System from another point of view. You have static views, dynamic views, but in an architectural/software document you don't have to stop there. You can present nonstandard views in your own words, e.g. deployment view, performance view, usability view, company-values view, boss's favourite things view etc.
Each view captures and documents certain properties of the System.
It's very important to realize that the source code is just one view. The most important though because it's needed to generate a computer program. But it doesn't contain every piece of information of your System, nor explicitly nor implicitly. (E.g. the shared data between program modules, what are only connected via offline user activity. No trace in the source). It's just a static view which helps very little to understand your processes, the runtime dynamics of your living-breathing program.
A classic example of the Observer pattern. Especially if it used heavily, you'll hardly understand the System mechanis from the source code. That's why you use Sequence diagrams in that case. It captures the 'dynamic logic' of your system a lot better than your source code.
But if you meant some kind of business logic in great detail, you are better off with plain text/source code/pseudocode etc. You don't have to use UML diagrams just because they are the standard. You can use usecase modeling without drawing usecase diagrams. Always choose the view what's the best for you and for your purpose.
U.M.L. diagrams are guidelines, not strictly rules.
You don't have to make them exactly & detailed as the source code, but, you may try it, if you want it.
Sometimes, its possible to do it, sometimes, its not possible, because of the detail or complexity of systems, or don't have the time or details to do it.
Cheers.
P.D.
Any cheese-burguer or tuna-fish-burguer for the cat ?