How to do object detection on high resolution images? - deep-learning

I have images of around 2000 X 2000 pixels. The objects that I am trying to identify are of smaller sizes (typically around 100 X 100 pixels), but there are lot of them.
I don't want to resize the input images, apply object detection and rescale the output back to the original size. The reason for this is I have very few images to work with and I would prefer cropping (which would lead to multiple training instances per image) over resizing to smaller size (this would give me 1 input image per original image).
Is there a sophisticated way or cropping and reassembling images for object detection, especially at the time of inference on test images?
For training, I suppose I would just take out the random crops, and use those for training. But for testing, I want to know if there is a specific way of cropping the test image, applying object detection and combining the results back to get the output for the original large image.

I guess using several (I've never tried) networks simultaneously is a choice, for you, using 4*4 (500+50 * 500+50) with respect to each 1*1), then reassembling at the output stage, (probably with NMS at the border since you mentioned the target is dense).
But it's weird.
You know one insight in detection with high resolution images is altering the backbone with "U" shape shortcut, which solves some problems without resize the images. Refer U-Net.

Related

Size of image for prediction with SageMaker object detection?

I'm using the AWS SageMaker "built in" object detection algorithm (SSD) and we've trained it on a series of annotated 512x512 images (image_shape=512). We've deployed an endpoint and when using it for prediction we're getting mixed results.
If the image we use for prediciton is around that 512x512 size we're getting great accuracy and good results. If the image is significantly larger (e.g. 8000x10000) we get either wildly inaccurate, or no results. If I manually resize those large images to 512x512pixels the features we're looking for are no longer discernable to the eye. Which suggests that if my endpoint is resizing images, then that would explain why the model is struggling.
Note: Although the size in pexels is large, my images are basically line drawings on a white background. They have very little color and large patches of solid white, so they compress very well. I'm mot running into the 6Mb request size limit.
So, my questions are:
Does training the model at image_shape=512 mean my prediction images should also be that same size?
Is there a generally accepted method for doing object detection on very large images? I can envisage how I might chop the image into smaller tiles then feed each tile to my model, but if there's something "out of the box" that will do it for me, then that'd save some effort.
Your understanding is correct. The endpoint resizes images based on the parameter image_shape. To answer your questions:
As long as the scale of objects (i.e., expansion of pixels) in the resized images are similar between training and prediction data, the trained model should work.
Cropping is one option. Another method is to train separate models for large and small images as David suggested.

how to format the image data for training/prediction when images are different in size?

I am trying to train my model which classifies images.
The problem I have is, they have different sizes. how should i format my images/or model architecture ?
You didn't say what architecture you're talking about. Since you said you want to classify images, I'm assuming it's a partly convolutional, partly fully connected network like AlexNet, GoogLeNet, etc. In general, the answer to your question depends on the network type you are working with.
If, for example, your network only contains convolutional units - that is to say, does not contain fully connected layers - it can be invariant to the input image's size. Such a network could process the input images and in turn return another image ("convolutional all the way"); you would have to make sure that the output matches what you expect, since you have to determine the loss in some way, of course.
If you are using fully connected units though, you're up for trouble: Here you have a fixed number of learned weights your network has to work with, so varying inputs would require a varying number of weights - and that's not possible.
If that is your problem, here's some things you can do:
Don't care about squashing the images. A network might learn to make sense of the content anyway; does scale and perspective mean anything to the content anyway?
Center-crop the images to a specific size. If you fear you're losing data, do multiple crops and use these to augment your input data, so that the original image will be split into N different images of correct size.
Pad the images with a solid color to a squared size, then resize.
Do a combination of that.
The padding option might introduce an additional error source to the network's prediction, as the network might (read: likely will) be biased to images that contain such a padded border.
If you need some ideas, have a look at the Images section of the TensorFlow documentation, there's pieces like resize_image_with_crop_or_pad that take away the bigger work.
As for just don't caring about squashing, here's a piece of the preprocessing pipeline of the famous Inception network:
# This resizing operation may distort the images because the aspect
# ratio is not respected. We select a resize method in a round robin
# fashion based on the thread number.
# Note that ResizeMethod contains 4 enumerated resizing methods.
# We select only 1 case for fast_mode bilinear.
num_resize_cases = 1 if fast_mode else 4
distorted_image = apply_with_random_selector(
distorted_image,
lambda x, method: tf.image.resize_images(x, [height, width], method=method),
num_cases=num_resize_cases)
They're totally aware of it and do it anyway.
Depending on how far you want or need to go, there actually is a paper here called Spatial Pyramid Pooling in Deep Convolution Networks for Visual Recognition that handles inputs of arbitrary sizes by processing them in a very special way.
Try making a spatial pyramid pooling layer. Then put it after your last convolution layer so that the FC layers always get constant dimensional vectors as input . During training , train the images from the entire dataset using a particular image size for one epoch . Then for the next epoch , switch to a different image size and continue training .

Gap Between Sprites that are Tiled and Scaled

I am working on a map application and I have come across an issue with how my tiles are laying while scaling.
Here is a basic look at my structure:
There is obviously a lot more going on, but you get the idea. Now, I scale the Map App Sprite to zoom in. When that scaling occurs, there is a gap between each tile.
You can see the gap where 4 tiles meet here:
I am caching everything as a bitmap. For each Layer (which all extend Bitmap), I have smoothing set to true and pixelSnapping set to PixelSnapping.ALWAYS (pixel snapping shouldn't help here, but it shouldn't hurt either).
Does anyone have any suggestions on how to fix this issue?
(For the sake of completeness, the Map app is built entirely using AS3 and it is embedded in a Flex app)
Using integers for tile x,y locations and calculating those locations correctly is most likely the fix here, unless the images have seams in them!
The code that calculates and sets the x,y locations would be needed to properly pinpoint the issue in the code.
But, also if you are scaling that container sprite, you would want to ensure that you scale so that the width/height of a tile is an integer value.
For example, if you scale your sprite that contains these tiles, the widths/heights of the individual tiles might not always be integers, therefore creating those seams you see.
What you could do in that case is do your scaling by adjusting your width/height values by integer values, taking into account proportions, as opposed to using scaleX and scaleY on your container sprite.
Without seeing your code it's difficult to be sure, but it is possibly just a visual artifact due to scaling - eg: a 250px wide bitmap scaled to 155% should be rendered at 387.5px wide but thats impossible so its rendered at 388px wide - with the 0.5px part rendered as 1px at 50% alpha to give 'appearance' of 0.5px.
Ensuring scaled bitmaps widths/heights are always integers may solve it?
This looks like a rounding error.
Without code it's hard to know: it would be a great asset to you and us if posted a barebones example of your tiling class. In the process of subtraction you may very well discover your solution.
I'd offer that you should test what happens when you scale and algin four 100x100 bitmap images at various fine grain steps, to detect if it's a Flash rendering issue or a defect in your class.

programmatically create Background Images in Flex 3

I'm developing a visualization for certain parts of a Warehouse with Flex 3. In this visualization there are lot of blocks where 1 to x pallets can be placed where x is between 9 and 15. I need to represent each pallet with a black square, each place which is already assigned to a pallet but not physically taken with a grey square and each free place with a white square. I first thought to just use a canvas for each place on a block and change their color if the state changes. But the hundreds of canvases which are there as a result of this approach are not updated quickly enough for my purposes (screen freezes for a few seconds).
I don't want to use embedded images because of the great amount of images I had to embed in the application (those Images appear in 4 orientations).
My idea was to create background images which reflect the state of the whole block only when needed for that certain state and cache them, so that the computation time is spread over the whole runtime.
My problem now is I don't know how to create them in a way that I can use them as "backgroundImages". As far as I understand I would need them as a class object but I don't know how to achieve that, when not embedding the images.
I'm of course open to better approaches to solve my problem. Thanks for your support.
I would suggest using Graphics property of a Sprite for example. It provides basic drawing API, like drawing lines, circles and rectangles.
Besides, you can draw bitmap images on the Graphics to produce more advances results.

Repeating website background image - size vs speed

I was wondering if anyone has done any tests with background images. We normally create a background that repeats at least in one direction (x or y or both).
Example
Let's say we have a gradient background that repeats in X direction. Gradient height is 400px. We have several possibilities. We can create as small image as possible (1 pixel width and 400 pixels high) or we can create a larger image with 400 pixels height.
Observation
Since gradient is 400 pixels high we probably won't choose GIF format, because it can only store 256 adaptive colours. Maybe that's enaough if our gradient is subtle, since it doesn't have that many, but otherwise we'll probably rather store image as a 24-bit PNG image to preserve complete gradient detail.
Dilemma
Should we create an image of 1×400 px size that will be repeated n times horizontally or should we create an image of 100×400 px size to speed up rendering in the browser and have a larger image file size.
So. Image size vs. rendering speed? Which one wins? Anyone cares to test this? With regards to browser rendering speed and possible small image redraw flickering...
The rendering speed is the bottleneck here, since bigger tiles can be put into the browser's cache.
I've actually tried this for the major browsers, and at least some of them rendered noticeably slow on very small tiles.
So if increasing the bitmap size does not result in ridiculously big file sizes, I would definately go with that. Test it yourself and see. (Remember to include IE6, as still many people are stuck with it).
You might be able to strike a good balance between bitmap size and file size, but in general I'd try 50x400, 100x400, 200x400 and even 400x400 pixels.
I found out that there may be a huge difference in the rendering performance of the browser, if you have a background-image with width of 1px and repeating it. It's better to have a background-image with slightly larger dimensions. So a image with a width of 100px performs much better in the browser. This especially comes into play when you use a repeated background-image in a draggable layer on your website. The drag-performance is pretty bad with an often-repeated background-image.
I'd like to point out that for the cost of sending down an extra few rows (1-2 only example here) .8k - 1.6kb (if you can get away with 8-bit) more like 2.4kb - 4.0kb for 24bit
2 pixel columns more means the required iterations required to blit the background in is cut down to 1/3 for up to 1.6kb (8-bit) or 4kb (24bit)
even 1 extra column halves the blitting required down to half the element width.
If the background's done in less than a second for a 56.6k modem I reckon it's lean enough.
If small dimensions of an image have a negative impact on rendering, I'm sure any decent browser would blit the image internally a few times before tiling.
That said, I tend not to use 1 pixel image dimensions, so I can see the image clearly without resizing it. PNG compression is good enough to handle this at very little cost to file size, in most situations.
I'd put money on the bottleneck being the image download rather than the rendering engine doing the tiling, so go for the 1 pixel wide option.
Also the 24-bit PNG is redundant since you're still only getting 8 bits per channel (red, green and blue).
I generally prefer to go in between, 1pixel wide will probably make your gradient seem a bit unclear but you can do something like 5pixel width which gives enough room to the gradient to maintain consistency and clarity across the page.. but I would suggest you can add more patterns and images to a single image and then use background positioning(css sprites) to position them because download a single image of say 50kb would take less time comapared to 5 40kb images since the browser makes fewer requests to the server...
I have not benchmarked this but I'd bet that on most modern computers the rendering won't be an issue whereas you always want to save on on the image download time. I often go for the 1px type of graphics.