Semantic segmentation without labels in a single class - deep-learning

I am kind of new to semantic segmentation. I am trying to perform segmentation of images having defects.
I have the defect images annotated using a annotation tool and I created the mask for each image. I wanted to predict If an image has defect and where exactly it is located. But my problem is my defects does not look same in all the images. Example: Defects on steel- Steel breakage, erroded surface etc. I am just trying to classify if the image has defect or not and where it is located. So is it wrong to train the neural network with these all types considered as defects even though not everything lookalike?
I thought to do a binary segmentation of defect to no defect. If I am not correct how can I perform segmentation for defect and non defect images?

You first have to well define your problem and your objectives:
If you only want to detect if your image has a defect or not, it's a binary classification problem and you affect a label (0 or 1) to each image.
If you want to localise the defect approximatively (like a bounding box), it's an object detection problem and it can be realised with one or more classes.
If you want to localise precisely the defect (in order to performe measures for instance) the best is semantic segmentation or instance segmentation.
If you want to classify the defect, you will need to create classes for each defect you want to classify.
There is no magical solution because it depends of the objectives of your project. I can give you the following advices because I made an internship on a similar project :
Look carefully at your data, if you have thousands of images it will take a long to create your semantic segmentation dataset. Be smarter by using data augmentation techniques.
If you want to classify the defects, be sure to have enough defects of each type to train your network. If your network only sees one defect type per epoch, it can't learn to detect it.
Be sure that your network can detect the defects you're providing (not a scratch of two pixels for instance or alignement defects).
Performing semantic segmentation to only knows if there is a defect or not seems overkill because it's a long and complex process (rebuilding the image, memory of intermediaries images in Unet, lot of computations). If you really want to apply this method, you may create a threshold to detect if the number of detected pixels as defect allows to classify the image as 'presenting a defect' or not.

One class should be enough for your use-case. If you want to be able to distinguish between different types of defects though, you could try creating attributes for that class. So the class would be if a pixel has a defect or not, and the attribute would be breakage, eroded pixel, etc. Then you could train a model to detect a crack on the semantic class and another one to identify which type of defect it is.
Make sure to use an annotation tool that supports creating attributes. Personally, I use hasty.ai as their automation assistants are great! But I guess most tools should be able to do so.

Related

How to make CNN learn positional constraints?

I am working on image segmentation problem in medical domain using fully connected CNN.
The problem is that for particular image, it could have a lot of similar structures. Our task is to find the correct one. One thing that I'd like to make the CNN learn is that there should not be a structure below another structure which is found first on the top. In the ground truth images, it is implicitly shown because there is only one structure in each image. Is it possible to achieve it with CNN? If not, what could be done to achieve it?
With a traditional CNN, positional constraints cannot be learned, because the learning all takes place in convolutional layers which are spatially invariant. One caveat to this is that a CNN will learn relative arrangements of features to a certain extent (if feature A is always above feature B, successful classification of pixels belonging to A will implicitly decrease the likelihood of pixels above being classified as B, at least for pixels that are "sufficiently close", because the boundary region will be the opposite of what the CNN has been trained on). If you do not consider that sufficient, you would need to either design a custom layer that somehow considers position (although if there is only one structure in each ground truth image I'm not sure your data is sufficient to teach anything about relative locations of multiple objects as-is beyond the aforementioned caveat) or just post-process the CNN output with a non-learning algorithm that is designed based on your expert knowledge of these positional constraints. As a fellow medical computer vision engineer, I would recommend the latter, especially since it sounds like you are dealing with a hard no-exceptions rule (why bother trying to learn a rule that is already simple?).

Training Faster R-CNN with multiple objects in an image

I want to train Faster R-CNN network with my own images to detect faces. I have checked quite a few Github libraries, but this is the example of the training file I always find:
/data/imgs/img_001.jpg,837,346,981,456,cow
/data/imgs/img_002.jpg,215,312,279,391,cat
But I can't find an example how to train with images containing couple objects. Should it be:
1) /data/imgs/img_001.jpg,837,346,981,456,cow,215,312,279,391,cow
or
2) /data/imgs/img_001.jpg,837,346,981,456,cow
/data/imgs/img_001.jpg,215,312,279,391,cow
?
I just could not help myself but quote FarCry3 here: "The definition of insanity is doing the same thing over and over and expecting different results."
(Note that this is purely in an entertaining context, and not meant to insult you in any way; I would not take the time to answer your question if I didn't think it worthwile)
In your second example, you would feed the exact same input data, but require the network to learn two different outcomes. But, as you already noted, it is not very common for many of the libraries to support multiple labels per image.
Oftentimes, this is purely done for the sake of simplicity, as it requires you to change your metrics, to accomodate for multiple outputs: Instead of having one-hot encoded targets, you now could have multiple "targets".
This is even more challenging in the task of object detection (and not object classification, as described before), since you now have to decide how you represent your targets.
If it is possible at all, I would personally restrict myself to labeling one class per image, or have a look at another image library that does support that, since the effort of rewriting that much code is probably not worth the minute improvement in the results.

Camera image recognition with small sample set

I need to visually recognise some flat pictures showed to camera. There are not many of them (maybe 30) but discrimination may depend on details. The input may be partly obscured or shadowed and is suspect to lighting changes.
The samples need to be updatable.
There are many existing frameworks for object detection, with the most reliable ones depending on deep learning methods (mostly convolutional networks). However, the pretrained models are not well optimised to discern flat imagery of course, and even if I start training from scratch, updating the system for new samples would take a cumbersome training process, if I am right about how this works.
Is it possible to use deep learning while still keeping the sample pool flexible?
Is there any other well known reliable method to detect images from a small sample set?
One can use well trained networks for visual classification like Inception or SqueezeNet, slice of the last layer(s) and add a simple statistical algorithm (for example k-nearest neighbour) that can be directly teached by the samples in a non-iterative fashion.
Most classification-related calculations like lighting and orientation insensitivity are already handled by the pre-trained network then, while the network's output keep enough information to allow statistical algorithms decide the image class.
An implementation using k-nearest neighbour is shown here: https://teachablemachine.withgoogle.com/ , the source is hosted here: https://github.com/googlecreativelab/teachable-machine .
Use transfer learning; you’ll still need to build a training set, but you’ll get better results than starting with random weights. Try to find a model trained on images similar to yours. You might also do some black box testing of the selected model with your curated images to baseline it’s response curve to your images.

Generate photos based on over 1M protos processed by ourselves before

We are running a huge team that process child photos for our customers, the team processes over 1M photos per year.
The process includes basic tuning of light, resize, apply some filters to make the skin looks better.
We want to use deep learning to complete the jobs as much as possible. Which means I want to choose one model and train that model using our existing data. And then use the trained model to generate photos by inputing the new unprocessed photos.
Is there existing model that I can make use of, or any papers have covered this scenario?
Any help would be appreciated, thanks!
You could try something like this: https://arxiv.org/pdf/1412.7725.pdf. But with deep learning and your amount of training data you can problem get any big enough model to work well.
Image generation is not what you should search for. Image generation means that an image is generated (almost) completely from nothing. You want to enhance an existing image.
Although I haven't read any papers about this scenario so far, searching for "image enhancement neural network" reveald several promising results:
A Survey on Image Enhancement Techniques: Classical Spatial Filter, Neural Network, Cellular Neural Network, and Fuzzy Filter: http://ieeexplore.ieee.org/document/4237993/
A new class of nonlinear filters for image enhancement: http://ieeexplore.ieee.org/document/150915/
An image enhancement technique combining sharpening and noise reduction: http://ieeexplore.ieee.org/document/1044761/
I guess you could do the following:
Create a CNN model. The only "special" thing of this model is that it does not have a fully connected layer as target, but another (3 channel) image. You have to adjust the error function to this. (Similar to semantic segmentation).

Order-issuing neural network?

I'm interested in writing certain software that uses machine learning, and performs certain actions based on external data.
However I've run into problem (that was always interesting to me) -
how is it possible to write machine learning software that issues orders or sequences of orders?
The problem is that as I understand it, neural network gets bunch on inputs, and "recalls" output based on results of previous trainings. Instantly (well, more or less). So I'm not sure how "issuing orders" could fit into that system, especially when actions performed by system affect the system with certain delay. I'm also a bit unsure how is it possible to train this thing.
Examples of such system:
1. First person shooter enemy controller. As I understand it, it is possible to implement neural network controller for the bot that will switch bot behavior strategies(well, assign priorities to them) based on some inputs (probably something like health, ammo, etc). But I don't see a way to make higher-order controller, that could issue sequence of commands like "go there, then turn left". Also, bot's actions will affect variables that control bot's behavior. I.e. shooting reduces ammo, falling from heights reduces health, etc.
2. Automated market trader. It is certainly possible to make system that will try to predict the next market price of something. However, I don't see how is it possible to make system that would issue order to buy something, watch the trend, then sell it back to gain profit/cover up losses.
3. Car driver. Again, (as I understand it) it is possible to make system that will maintain desired movement vector based on position/velocity/torque data and results of previous training. However I don't see a way to make such system (learn to) perform sequence of actions.
I.e. as I understood it, neural net is technically a matrix - you give it input, it produces output. But what about generating sequences of actions that could change environment program operates in?
If such tasks are not entirely suitable for neural networks, what else could be used?
P.S. I understand that the question isn't exactly clear, and I suspect that I'm missing some knowledge. So I'll appreciate some pointers (i.e. books/resources to read, etc).
You could try to connect the output neurons to controllers directly, e.g. moving forward, turning, or shooting in the ego shooter, or buying orders for the trader. However, I think that the best results are gained nowadays when you let the neural net solve one rather specific subproblem, and then let a "normal" program interpret its answer. For example, you could let the neural net construct a map overlay of "where do I want to be", which the bot then translates into movements. The neural network for the trader could produce a "how much do I want which paper", which the bot then translates into buying or selling orders.
The decision which subproblem should be solved by a neural network is a very central one for its design. The important thing is that good solutions can be taught to the neural network.
Edit: Expanding this in the examples: When the ego shooter bot gets shot, it should not have wanted to be there; when it gets to shoot someone else, it should have wanted to be there more. When the trader loses money from a paper, it should have wanted it less before; if it gains, it should have wanted it more. These things can be taught.
The problem you are describing is known as Reinforcement Learning. Reinforcement learning is essentially a machine learning algorithm (such as a neural network) coupled with a controller. It has been used for all of the applications you mention, even to drive real cars.