How to make CNN learn positional constraints? - deep-learning

I am working on an image segmentation problem in the medical domain using a fully convolutional CNN.
The problem is that a particular image can contain many similar structures, and our task is to find the correct one. One thing I would like the CNN to learn is that there should not be a structure below another structure that has already been found higher up in the image. In the ground-truth images this is only shown implicitly, because each image contains exactly one structure. Is it possible to achieve this with a CNN? If not, what could be done to achieve it?

With a traditional CNN, positional constraints cannot be learned directly, because all of the learning takes place in convolutional layers, which are spatially invariant. One caveat is that a CNN will learn the relative arrangement of features to a certain extent: if feature A is always above feature B, successful classification of pixels belonging to A will implicitly decrease the likelihood of the pixels above them being classified as B, at least for pixels that are "sufficiently close", because that boundary region would be the opposite of what the CNN has been trained on.
If you do not consider that sufficient, you would need to either design a custom layer that somehow takes position into account (although if there is only one structure in each ground-truth image, I am not sure your data can teach anything about the relative locations of multiple objects as-is, beyond the aforementioned caveat), or simply post-process the CNN output with a non-learning algorithm designed around your expert knowledge of these positional constraints. As a fellow medical computer vision engineer, I would recommend the latter, especially since it sounds like you are dealing with a hard, no-exceptions rule: why bother trying to learn a rule that is already simple?
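To illustrate what such a post-processing rule could look like, here is a minimal sketch. It assumes a setup of my own choosing (a 2D binary mask as a NumPy array and scipy.ndimage for connected components, not anything stated by the asker) and simply keeps the topmost predicted structure while suppressing anything below it:

```python
import numpy as np
from scipy import ndimage

def keep_topmost_structure(mask: np.ndarray) -> np.ndarray:
    """Keep only the connected component whose highest pixel is nearest the top.

    mask: 2D binary array (1 = structure, 0 = background) output by the CNN.
    """
    labeled, num = ndimage.label(mask)
    if num <= 1:
        return mask  # zero or one structure: nothing to suppress
    # Row index of the topmost pixel of each component (row 0 = top of the image).
    top_rows = [np.min(np.where(labeled == i)[0]) for i in range(1, num + 1)]
    keep_label = int(np.argmin(top_rows)) + 1
    return (labeled == keep_label).astype(mask.dtype)
```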

Related

Creating a dataset of images for object detection for extremely specific task

Even though I am quite familiar with the concepts of machine learning and deep learning, I have never needed to create my own dataset before.
Now, for my thesis, I have to create my own dataset with images of an object for which no datasets are available on the internet (just assume that this is the case).
I have limited computational power, so I want to use YOLO, SSD, or EfficientDet.
Do I need to go over every single image in my dataset by eye and create bounding-box center coordinates and dimensions to log them with their labels?
Thanks
Yes, you will need to do that.
At the same time, even though the task is niche, you could benefit from the concept of transfer learning. That is, you can use a pre-trained backbone to help your model learn faster, achieve better results, and require fewer annotated examples, but you will still need to annotate the new dataset on your own (see the sketch below).
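As a rough illustration of the transfer-learning part, here is a sketch of my own (assuming PyTorch/torchvision and a Faster R-CNN detector rather than the YOLO/SSD/EfficientDet family mentioned in the question; the same idea applies to those detectors in their respective frameworks, and older torchvision versions use `pretrained=True` instead of `weights="DEFAULT"`):

```python
import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor

# Start from a detector pre-trained on COCO (the "pre-trained backbone").
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")

# Replace the box-prediction head so it outputs your own classes
# (here 2 = background + one custom object class).
num_classes = 2
in_features = model.roi_heads.box_predictor.cls_score.in_features
model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes)

# The backbone weights are reused, so the model typically needs far fewer
# annotated images than training from scratch; the new head (and optionally
# the rest of the network) is then fine-tuned on your annotated dataset.
```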
As a starting point for annotation, you can use software such as Labelbox; it is very good since it can export annotations in Pascal VOC, YOLO, and COCO formats, so it is a matter of choice / what is most suitable for you.

Semantic segmentation without labels in a single class

I am kind of new to semantic segmentation. I am trying to perform segmentation of images that contain defects.
I have annotated the defect images using an annotation tool and created a mask for each image. I want to predict whether an image has a defect and where exactly it is located. My problem is that the defects do not look the same across images. For example, defects on steel include breakage, eroded surfaces, etc. I am only trying to classify whether an image has a defect and where it is located. Is it wrong to train the neural network with all of these types treated as defects, even though they do not all look alike?
I thought of doing binary segmentation (defect vs. no defect). If that is not the right approach, how can I perform segmentation for defect and non-defect images?
You first have to define your problem and your objectives clearly:
If you only want to detect whether your image has a defect or not, it is a binary classification problem and you assign a label (0 or 1) to each image.
If you want to localise the defect approximately (like a bounding box), it is an object detection problem and it can be done with one or more classes.
If you want to localise the defect precisely (in order to perform measurements, for instance), the best approach is semantic segmentation or instance segmentation.
If you want to classify the defects, you will need to create a class for each defect type you want to distinguish.
There is no magical solution because it depends on the objectives of your project. I can give you the following advice, since I did an internship on a similar project:
Look carefully at your data; if you have thousands of images, it will take a long time to create your semantic segmentation dataset. Be smarter by using data augmentation techniques.
If you want to classify the defects, be sure to have enough defects of each type to train your network. If your network only sees one defect type per epoch, it can't learn to detect it.
Be sure that your network can actually detect the defects you are providing (not a two-pixel scratch, for instance, or alignment defects).
Performing semantic segmentation only to know whether there is a defect or not seems overkill, because it is a long and complex process (rebuilding the image, keeping intermediate feature maps in memory in a U-Net, lots of computation). If you really want to apply this method, you can set a threshold on the number of pixels detected as defective to decide whether the image is classified as 'presenting a defect' or not (see the sketch below).
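A minimal sketch of that thresholding idea (the function name and threshold values are hypothetical choices of mine, not part of the answer):

```python
import numpy as np

def image_has_defect(prob_map: np.ndarray,
                     pixel_threshold: float = 0.5,
                     min_defect_pixels: int = 50) -> bool:
    """Turn a per-pixel segmentation output into an image-level defect decision.

    prob_map: 2D array of per-pixel defect probabilities from the network.
    pixel_threshold: probability above which a pixel counts as defective.
    min_defect_pixels: number of defective pixels needed to flag the image.
    """
    defect_pixels = int(np.sum(prob_map > pixel_threshold))
    return defect_pixels >= min_defect_pixels
```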
One class should be enough for your use case. If you want to be able to distinguish between different types of defects, though, you could try creating attributes for that class. So the class would be whether a pixel belongs to a defect or not, and the attribute would be breakage, eroded surface, etc. Then you could train one model to detect defects using the semantic class and another one to identify which type of defect it is.
Make sure to use an annotation tool that supports creating attributes. Personally, I use hasty.ai as their automation assistants are great! But I guess most tools should be able to do so.

Camera image recognition with small sample set

I need to visually recognise some flat pictures shown to a camera. There are not many of them (maybe 30), but discrimination may depend on details. The input may be partly obscured or shadowed and is subject to lighting changes.
The samples need to be updatable.
There are many existing frameworks for object detection, with the most reliable ones depending on deep learning methods (mostly convolutional networks). However, the pre-trained models are of course not well optimised to discern flat imagery, and even if I start training from scratch, updating the system for new samples would require a cumbersome training process, if I am right about how this works.
Is it possible to use deep learning while still keeping the sample pool flexible?
Is there any other well known reliable method to detect images from a small sample set?
One can use well-trained networks for visual classification, such as Inception or SqueezeNet, slice off the last layer(s), and add a simple statistical algorithm (for example k-nearest neighbours) that can be taught directly from the samples in a non-iterative fashion.
Most classification-related difficulties, like insensitivity to lighting and orientation, are then already handled by the pre-trained network, while the network's output keeps enough information to let the statistical algorithm decide the image class.
An implementation using k-nearest neighbour is shown here: https://teachablemachine.withgoogle.com/ , the source is hosted here: https://github.com/googlecreativelab/teachable-machine .
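The same idea in a conventional Python stack might look roughly like this (a sketch of my own, using a ResNet-18 from torchvision as the fixed feature extractor and scikit-learn's k-nearest-neighbours classifier; the linked demo instead runs SqueezeNet in the browser):

```python
import torch
import torchvision
from torchvision import transforms
from sklearn.neighbors import KNeighborsClassifier

# Pre-trained network with the classification head removed: it becomes a
# fixed feature extractor, so adding or replacing samples needs no retraining.
backbone = torchvision.models.resnet18(weights="DEFAULT")
backbone.fc = torch.nn.Identity()
backbone.eval()

preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

def embed(pil_images):
    """Map a list of PIL images to fixed-length feature vectors."""
    batch = torch.stack([preprocess(img) for img in pil_images])
    with torch.no_grad():
        return backbone(batch).numpy()

# "Teaching" is just fitting k-NN on the embedded reference pictures;
# updating the sample pool means re-fitting on the new list (fast, non-iterative).
knn = KNeighborsClassifier(n_neighbors=3)
# knn.fit(embed(reference_images), reference_labels)
# predicted = knn.predict(embed([query_image]))
```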
Use transfer learning; you'll still need to build a training set, but you'll get better results than starting from random weights. Try to find a model trained on images similar to yours. You might also do some black-box testing of the selected model with your curated images to baseline its response curve to your images.

Generate photos based on over 1M photos processed by ourselves before

We run a large team that processes children's photos for our customers; the team processes over 1M photos per year.
The process includes basic light adjustment, resizing, and applying filters to make the skin look better.
We want to use deep learning to do as much of this work as possible. That means I want to choose one model, train it on our existing data, and then use the trained model to produce processed photos from new, unprocessed input photos.
Is there an existing model that I can make use of, or are there any papers that cover this scenario?
Any help would be appreciated, thanks!
You could try something like this: https://arxiv.org/pdf/1412.7725.pdf. But with deep learning and your amount of training data, you can probably get any sufficiently large model to work well.
Image generation is not what you should search for. Image generation means that an image is generated (almost) completely from nothing. You want to enhance an existing image.
Although I haven't read any papers about this exact scenario so far, searching for "image enhancement neural network" reveals several promising results:
A Survey on Image Enhancement Techniques: Classical Spatial Filter, Neural Network, Cellular Neural Network, and Fuzzy Filter: http://ieeexplore.ieee.org/document/4237993/
A new class of nonlinear filters for image enhancement: http://ieeexplore.ieee.org/document/150915/
An image enhancement technique combining sharpening and noise reduction: http://ieeexplore.ieee.org/document/1044761/
I guess you could do the following:
Create a CNN model. The only "special" thing about this model is that its target is not a fully connected layer but another (3-channel) image, so you have to adjust the loss function accordingly (similar to semantic segmentation). A rough sketch follows.
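A minimal sketch of such an image-to-image CNN (my own illustrative PyTorch code, not taken from any of the papers above; a real system for this task would use something much deeper, e.g. a U-Net-style encoder-decoder):

```python
import torch
import torch.nn as nn

class PhotoEnhancer(nn.Module):
    """Tiny image-to-image CNN: both input and output are 3-channel images."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 3, kernel_size=3, padding=1),  # back to a 3-channel image
        )

    def forward(self, x):
        return self.net(x)

model = PhotoEnhancer()
loss_fn = nn.L1Loss()  # pixel-wise loss between the prediction and the hand-edited photo
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

# One hypothetical training step, where `raw` is a batch of unprocessed photos
# and `edited` is the corresponding batch of manually processed photos:
# prediction = model(raw)
# loss = loss_fn(prediction, edited)
# loss.backward(); optimizer.step(); optimizer.zero_grad()
```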

Drawing two-dimensional point-graphs

I've got a list of objects (probably not more than 100), where each object has a distance to all the other objects. This distance is merely the added absolute difference between all the fields these objects share. There might be few (one) or many (dozens) of fields, thus the dimensionality of the distance is not important.
I'd like to display these points in a 2D graph such that objects which have a small distance appear close together. I'm hoping this will convey clearly how many sub-groups there are in the entire list. Obviously the axes of this graph are meaningless (I'm not even sure "graph" is the correct word to use).
What would be a good algorithm to convert a network of distances into a 2D point distribution? Ideally, I'd like a small change to the distance network to result in a small change in the graphic, so that incremental progress can be viewed as a smooth change over time.
I've made a small example of the sort of result I'm looking for:
Example graphic: http://en.wiki.mcneel.com/content/upload/images/GraphExample.png
Any ideas greatly appreciated,
David
Edit:
It actually seems to have worked. I treat the entire set of values as a 2D particle cloud, constructing inverse-square repulsion forces between all particles and linear attraction forces based on inverse distance. It is not a stable algorithm; the result tends to spin violently whenever an additional iteration is performed, but it does always seem to generate a good separation into visual clusters:
Result: http://en.wiki.mcneel.com/content/upload/images/ParticleCloudSolution.png
I can post the C# code if anyone is interested (there's quite a lot of it sadly)
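For reference, here is a rough Python sketch of the same idea (my own simplification, not David's C# code): each point is iteratively nudged toward or away from every other point depending on whether it is currently too far from or too close to its target distance. scikit-learn's `sklearn.manifold.MDS` with `dissimilarity='precomputed'` performs essentially the same stress minimisation in a more robust way.

```python
import numpy as np

def layout_2d(dist, iterations=500, step=0.01, seed=0):
    """Crude force-directed 2D layout from a symmetric pairwise distance matrix."""
    rng = np.random.default_rng(seed)
    n = dist.shape[0]
    pos = rng.normal(size=(n, 2))
    for _ in range(iterations):
        delta = pos[:, None, :] - pos[None, :, :]        # pairwise offsets, shape (n, n, 2)
        cur = np.linalg.norm(delta, axis=-1) + 1e-9      # current pairwise distances
        # Spring-like force: repel pairs that are too close, attract pairs too far apart.
        force = (dist - cur)[..., None] * delta / cur[..., None]
        pos += step * force.sum(axis=1)                  # net force acting on each point
    return pos
```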
Graphviz contains implementations of several different approaches to solving this problem; consider using its spring model graph layout tools as a basis for your solution. Alternatively, its site contains a good collection of source material on the related theory.
The previous answers are probably helpful, but unfortunately, given your description of the problem, a 2D embedding that exactly preserves all of the pairwise distances isn't guaranteed to exist, and in fact most of the time it won't.
I think you need to read up on cluster analysis quite a bit, because there are algorithms that sort your points into clusters based on a relatedness metric, and then you can use graphviz or something like that to draw the results: http://en.wikipedia.org/wiki/Cluster_analysis
One I quite like is a 'minimum-cut partitioning algorithm'; see here: http://en.wikipedia.org/wiki/Cut_(graph_theory) (a small clustering sketch follows below).
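As a concrete example of that approach (my own sketch, using SciPy's hierarchical clustering rather than a min-cut implementation), you can cluster directly from the pairwise distance matrix and then colour the laid-out points by cluster label:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

def cluster_from_distances(dist: np.ndarray, num_clusters: int = 3) -> np.ndarray:
    """Assign each object a cluster label from a symmetric pairwise distance matrix."""
    condensed = squareform(dist, checks=False)     # condensed form expected by linkage
    tree = linkage(condensed, method="average")    # agglomerative (average-linkage) clustering
    return fcluster(tree, t=num_clusters, criterion="maxclust")
```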
You might want to Google around for terms such as:
automatic graph layout; and
force-based algorithms.
GraphViz does implement some of these algorithms; I'm not sure whether it includes any that are useful to you.
One cautionary note -- for some algorithms small changes to your graph content can result in very large changes to the graph.