Slicing up large heterogeneous images with binary annotations - deep-learning

I'm working on a deep learning project and have encountered a problem. The images I'm using are very large and extremely detailed, and they contain a huge amount of necessary visual information, so it's hard to simply downsample them. I've gotten around this by slicing each image into 512 x 512 'tiles'; there are several thousand tiles per image.
Here's the problem: the annotations are binary and the images are heterogeneous, so an annotation can end up applied to a tile that has no bearing on the actual classification. How can I lessen the impact of tiles that are 'improperly' labeled?
One thought is to cluster the tiles with something like a t-SNE plot and compare the ratio of binary annotations across the different regions (or 'classes'). I could then assign weights to tiles based on where each one is located and use that as an extra layer in my training. I'm very new to all of this, so I wouldn't be surprised if that's an awful idea! Just thought I'd take a stab.
For background, I'm using transfer learning on Inception v3.
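Since Inception v3 is already in the mix, here is a rough sketch of one way that weighting idea could look (hedged: KMeans stands in for the t-SNE regions, and the cluster count, the tiles/labels arrays, and the weighting rule are all illustrative):

import numpy as np
from sklearn.cluster import KMeans
from tensorflow.keras.applications.inception_v3 import InceptionV3, preprocess_input

# Hypothetical inputs: `tiles` is an array of shape (n_tiles, 512, 512, 3)
# and `labels` holds each tile's (possibly noisy) binary annotation.
feature_extractor = InceptionV3(include_top=False, pooling='avg',
                                input_shape=(512, 512, 3))   # 2048-d features per tile
features = feature_extractor.predict(preprocess_input(tiles.astype('float32')))

# Cluster tiles in feature space (t-SNE is handy for visualising the same
# structure, but a hard clustering is easier to turn into weights).
n_clusters = 20
clusters = KMeans(n_clusters=n_clusters, random_state=0).fit_predict(features)

# Ratio of positive annotations within each cluster.
pos_ratio = np.array([labels[clusters == c].mean() for c in range(n_clusters)])

# Down-weight positively labeled tiles that fall into clusters where positives
# are rare, i.e. tiles whose label probably came from elsewhere in the image.
sample_weight = np.where(labels == 1, pos_ratio[clusters], 1.0)

# model.fit(tiles, labels, sample_weight=sample_weight, ...)

Whether this helps depends on how well the feature clusters line up with the visually meaningful regions, so it's worth eyeballing the t-SNE plot first.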

Related

CNN - Proper way to adjust existing network (e.g. UNet) to fit the input/output size

I've implemented the original U-Net architecture from its paper. It works with 572x572 inputs and predicts 388x388 outputs (the original paper uses no padding). I then used this network for another task with 2048x1024 images as input, producing a same-sized (2048x1024) target. This fails because the image size doesn't agree with the network architecture. Then I found code on GitHub that sets padding = 1 for all convolutions, and everything works. Fine.
My question: is that a common thing - "randomly" (maybe "experimentally" is a better word) tweaking padding or stride parameters until the dimensions fit? But then it isn't the original U-Net anymore, right?
I'd be glad for any advice, because I want to learn a good way of adapting existing networks to different challenges.
Best
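(For reference, the padding = 1 fix isn't arbitrary: with 3x3 kernels and stride 1 it exactly preserves the spatial size. A quick, framework-agnostic sketch of the arithmetic, purely illustrative:)

def conv2d_out_size(size, kernel=3, stride=1, padding=0):
    # Standard convolution output-size formula: floor((n + 2p - k) / s) + 1
    return (size + 2 * padding - kernel) // stride + 1

# Original U-Net: 'valid' 3x3 convolutions shave 2 pixels each time,
# which is why 572x572 inputs end up as 388x388 outputs.
print(conv2d_out_size(572))                      # 570

# With padding=1 the same 3x3 convolution keeps the size, so inputs like
# 2048x1024 survive (as long as the poolings still divide the size evenly).
print(conv2d_out_size(2048, padding=1))          # 2048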

Dealing with false positives in binary image segmentation

I'm working on a model to identify bodies of water in satellite imagery. I've modified this example a bit to use a set of ~600 images I've labeled, and it works quite well for true positives: it produces an accurate mask for imagery tiles with water in them. However, it also produces some false positives, generating masks for tiles that have no water in them whatsoever - tiles containing fields, buildings or parking lots, for instance. I'm not sure how to provide this sort of negative feedback to the model: adding false-positive images to the training set with an empty mask has no effect, and a training set made up of only false positives just produces random noise, which makes me think that empty masks have no effect on this particular network.
I also tried training a binary classification network, based on a couple of examples I found, to classify tiles as water/not-water. It doesn't seem to achieve good enough accuracy to use as a first-pass filter, with about 5k images per class. I used OSM label-maker for this, and the image sets aren't perfect - there are some water tiles in the non-water set and vice versa - but even on the training set the accuracy isn't good (~0.85 at best).
Is there a way to provide negative feedback to the binary image segmentation model? Should I use a larger training set? I'm kind of stuck here without the ability to provide negative feedback, and would appreciate any pointers on how to handle this.
Thanks!
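One hedged way to push a segmentation model away from false positives is to weight the pixel-wise loss so that predicting water on background pixels costs more than missing water, and to make sure the empty-mask tiles actually appear in every batch. A minimal Keras-style sketch (assuming a sigmoid-output model; the weight value is illustrative):

import tensorflow as tf

def weighted_bce(false_positive_weight=5.0):
    # Pixel-wise binary cross-entropy that penalises hallucinated water
    # (high prediction where y_true == 0) more heavily than missed water.
    def loss(y_true, y_pred):
        bce = tf.keras.backend.binary_crossentropy(y_true, y_pred)
        weights = y_true + (1.0 - y_true) * false_positive_weight
        return tf.reduce_mean(weights * bce)
    return loss

# model.compile(optimizer='adam', loss=weighted_bce(5.0), metrics=['accuracy'])

This is only one option; whether it beats a larger training set or a better first-pass classifier is an empirical question.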

3ds Max BlendedBoxMap support for the Forge Viewer

We are really interested in adding BlendedBoxMaps to certain objects in our model (such as terrain and larger geometry) to avoid obvious repetition in the texture.
However, all our tests have failed: objects containing a BlendedBoxMap (see image below) turn black after being translated to SVF. Any guidance would be highly appreciated.
Update:
If the above doesn't work: is there any alternative to BlendedBoxMapping for achieving good-looking textures on larger terrain? We are aware that baking the texture onto a large mesh gives very blurry results, since the SVF translation reduces all larger texture resolutions to 1024x1024 (which seems impossible to avoid) and stretches the 1024x1024 texture as much as needed to fit the large object.
If materials using BlendedBoxMap fail to appear correctly in the viewer, as a workaround I would suggest baking your material into a single bitmap.
Here is an example of how to do so using bake to texture:
https://knowledge.autodesk.com/support/3ds-max/learn-explore/caas/CloudHelp/cloudhelp/2016/ENU/3DSMax/files/GUID-37414F9F-5E33-4B1C-A77F-547D0B6F511A-htm.html

Size of image for prediction with SageMaker object detection?

I'm using the AWS SageMaker "built in" object detection algorithm (SSD) and we've trained it on a series of annotated 512x512 images (image_shape=512). We've deployed an endpoint and when using it for prediction we're getting mixed results.
If the image we use for prediction is around that 512x512 size, we get great accuracy and good results. If the image is significantly larger (e.g. 8000x10000), we get either wildly inaccurate results or no results at all. If I manually resize those large images to 512x512 pixels, the features we're looking for are no longer discernible to the eye, which suggests that if my endpoint is resizing images, that would explain why the model is struggling.
Note: Although the size in pixels is large, my images are basically line drawings on a white background. They have very little color and large patches of solid white, so they compress very well. I'm not running into the 6MB request size limit.
So, my questions are:
Does training the model at image_shape=512 mean my prediction images should also be that same size?
Is there a generally accepted method for doing object detection on very large images? I can envisage how I might chop the image into smaller tiles then feed each tile to my model, but if there's something "out of the box" that will do it for me, then that'd save some effort.
Your understanding is correct. The endpoint resizes images based on the parameter image_shape. To answer your questions:
As long as the scale of objects (i.e., their extent in pixels) in the resized images is similar between training and prediction data, the trained model should work.
Cropping is one option. Another method is to train separate models for large and small images as David suggested.
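A rough sketch of the cropping/tiling route, assuming the built-in SSD's documented JSON response (a 'prediction' list of [class_id, score, xmin, ymin, xmax, ymax] with coordinates normalised to the submitted image); the endpoint name and tiling details are illustrative:

import io
import json
import boto3
from PIL import Image

runtime = boto3.client('sagemaker-runtime')
TILE = 512  # same scale the model was trained at (image_shape=512)

def detect_tiled(image_path, endpoint_name):
    # Slide a 512x512 window over the large image, call the endpoint per
    # tile, and shift the normalised boxes back into full-image pixels.
    image = Image.open(image_path).convert('RGB')
    width, height = image.size
    detections = []
    for top in range(0, height, TILE):
        for left in range(0, width, TILE):
            tile = image.crop((left, top, left + TILE, top + TILE))
            buf = io.BytesIO()
            tile.save(buf, format='JPEG')
            response = runtime.invoke_endpoint(
                EndpointName=endpoint_name,
                ContentType='image/jpeg',
                Body=buf.getvalue())
            result = json.loads(response['Body'].read())
            for cls, score, xmin, ymin, xmax, ymax in result['prediction']:
                detections.append((cls, score,
                                   left + xmin * TILE, top + ymin * TILE,
                                   left + xmax * TILE, top + ymax * TILE))
    return detections

In practice you would also want overlapping tiles and non-maximum suppression across tile borders, so objects cut by a tile edge are neither missed nor double-counted.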

How to format the image data for training/prediction when images are different in size?

I am trying to train my model, which classifies images.
The problem I have is that the images have different sizes. How should I format my images and/or model architecture?
You didn't say what architecture you're talking about. Since you said you want to classify images, I'm assuming it's a partly convolutional, partly fully connected network like AlexNet, GoogLeNet, etc. In general, the answer to your question depends on the network type you are working with.
If, for example, your network only contains convolutional units - that is to say, does not contain fully connected layers - it can be invariant to the input image's size. Such a network could process the input images and in turn return another image ("convolutional all the way"); you would have to make sure that the output matches what you expect, since you have to determine the loss in some way, of course.
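As a small illustration of the "convolutional all the way" case, here is a hedged Keras sketch (layer sizes and the 10-class output are arbitrary): the None spatial dimensions accept any input size, and global pooling collapses whatever feature-map size comes out into a fixed-length vector.

import tensorflow as tf
from tensorflow.keras import layers

inputs = tf.keras.Input(shape=(None, None, 3))            # any H x W
x = layers.Conv2D(32, 3, padding='same', activation='relu')(inputs)
x = layers.MaxPooling2D()(x)
x = layers.Conv2D(64, 3, padding='same', activation='relu')(x)
x = layers.Conv2D(10, 1)(x)                                # 1x1 conv: one map per class
outputs = layers.GlobalAveragePooling2D()(x)               # (batch, 10) regardless of H, W
model = tf.keras.Model(inputs, outputs)

model.predict(tf.zeros([1, 224, 224, 3]))                  # works
model.predict(tf.zeros([1, 640, 480, 3]))                  # also works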
If you are using fully connected units though, you're in for trouble: here you have a fixed number of learned weights your network has to work with, so varying inputs would require a varying number of weights - and that's not possible.
If that is your problem, here are some things you can do:
Don't care about squashing the images. A network might learn to make sense of the content anyway; do scale and perspective mean anything to the content anyway?
Center-crop the images to a specific size. If you fear you're losing data, do multiple crops and use these to augment your input data, so that the original image will be split into N different images of correct size.
Pad the images with a solid color to a squared size, then resize.
Or do a combination of the above.
The padding option might introduce an additional error source to the network's prediction, as the network might (read: likely will) be biased to images that contain such a padded border.
If you need some ideas, have a look at the Images section of the TensorFlow documentation; there are pieces like resize_image_with_crop_or_pad that take care of most of the work.
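For example (TF 1.x style, to match the snippet below; assuming image is an already-decoded 3-D image tensor, and 299 is just Inception's input size):

import tensorflow as tf

# Center-crop larger images / zero-pad smaller ones to a fixed 299x299,
# so every example ends up the same size without being squashed.
fixed_size_image = tf.image.resize_image_with_crop_or_pad(image, 299, 299)

Note that the padding this adds is plain black, which is exactly the border-bias caveat mentioned above.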
As for just not caring about squashing, here's a piece of the preprocessing pipeline of the famous Inception network:
# This resizing operation may distort the images because the aspect
# ratio is not respected. We select a resize method in a round robin
# fashion based on the thread number.
# Note that ResizeMethod contains 4 enumerated resizing methods.
# We select only 1 case for fast_mode bilinear.
num_resize_cases = 1 if fast_mode else 4
distorted_image = apply_with_random_selector(
    distorted_image,
    lambda x, method: tf.image.resize_images(x, [height, width], method=method),
    num_cases=num_resize_cases)
They're totally aware of it and do it anyway.
Depending on how far you want or need to go, there is actually a paper called Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition that handles inputs of arbitrary sizes by processing them in a very special way.
Try adding a spatial pyramid pooling layer and placing it after your last convolution layer, so that the FC layers always get constant-dimensional vectors as input. During training, train on the entire dataset at a particular image size for one epoch, then switch to a different image size for the next epoch and continue training.
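If it helps to see the mechanics, here is a small NumPy sketch of the pooling step (illustrative only, not a trainable layer): each pyramid level splits the feature map into an n x n grid and max-pools each cell, so the concatenated vector's length depends only on the channel count and the levels, never on H and W.

import numpy as np

def spatial_pyramid_pool(feature_map, levels=(1, 2, 4)):
    # Max-pool an (H, W, C) feature map over an n x n grid per level and
    # concatenate: output length is C * sum(n*n for n in levels), independent
    # of H and W (assumes H and W are at least as large as the finest level).
    h, w, c = feature_map.shape
    pooled = []
    for n in levels:
        h_edges = np.linspace(0, h, n + 1).astype(int)
        w_edges = np.linspace(0, w, n + 1).astype(int)
        for i in range(n):
            for j in range(n):
                cell = feature_map[h_edges[i]:h_edges[i + 1],
                                   w_edges[j]:w_edges[j + 1], :]
                pooled.append(cell.max(axis=(0, 1)))
    return np.concatenate(pooled)

# Different spatial sizes, same output length: 256 * (1 + 4 + 16) = 5376.
print(spatial_pyramid_pool(np.random.rand(13, 17, 256)).shape)   # (5376,)
print(spatial_pyramid_pool(np.random.rand(40, 64, 256)).shape)   # (5376,)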