I once read the following statement about using 1x1 convolutions, which can help connect inputs and outputs with different dimensions:
For example, to reduce the activation dimensions (HxW) by a factor of 2, you can use a 1x1 convolution with a stride of 2.
How should I understand this example?
You can use a stride of 2. However, I wouldn't call this a trick or a magic solution for retaining information: you will simply lose information, since half of the positions along each dimension are skipped. I wouldn't qualify this method as a pooling method either.
The kernel is one pixel high and one pixel wide, and it moves (strides) two pixels at a time. As a consequence, along each row the kernel outputs a single value every two pixels, i.e. it outputs half the number of pixels on that row. Equivalently for the height, the kernel completely discards half of the rows.
Here is an example of a 2D convolution with a 1x1 kernel and stride 2 over a 6x6 input. On the left, the 1x1 patches in dark yellow are the successive positions of the kernel. On the right is the resulting 3x3 output.
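If it helps to see the arithmetic, here is a minimal sketch of the usual output-size formula for an unpadded convolution (the helper name convOutSize is just for illustration):
// Output size of an unpadded convolution: floor((in - kernel) / stride) + 1
static int convOutSize(int in, int kernel, int stride) {
    return (in - kernel) / stride + 1; // integer division floors for positive values
}
// For the 6x6 input above: convOutSize(6, 1, 2) == 3, hence the 3x3 output.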
I have a dataset that provides bounding box coordinates in the following format.
height - 84, width - 81, x - 343, y - 510. Now I want to normalize these values (0-1) to train with the YOLOv5 model. I have looked online and found that I can normalize these values in two ways. Way 1:
Normalized(Xmin) = (Xmin+w/2)/Image_Width
Normalized(Ymin) = (Ymin+h/2)/Image_Height
Normalized(w) = w/Image_Width
Normalized(h) = h/Image_Height
Way 2: divide x_center and width by image width, and y_center and height by image height.
Now I am not sure which way I should follow to normalize the values in the given dataset. Can anyone suggest a solution? Also, the size of the images in my dataset is 1024 x 1024. If I convert the images to 512 x 512, how do I figure out the new bounding box coordinates, i.e. what will be the values of height, width, x, and y?
First, Yolov5 will resize your images and bounding boxes for you, so you don't have to worry about that. By default, it will resize the longest side to 640px and the shortest side will be resized to a length that preserves the proportion of the original image.
About the normalization [0-1]: YOLOv5 expects the center point of the bbox, not the minimum point, so if your box dimensions are height = 84px and width = 81px, and those x and y are the minimum points of the bbox (I'm not sure from your post), your formula works, because you're computing the center points:
Normalized(**x_center**) = (Xmin+w/2)/Image_Width
Normalized(**y_center**) = (Ymin+h/2)/Image_Height
...
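As a quick sanity check with the numbers from the question (a small sketch, assuming x and y are the top-left corner of the box and the image is 1024 x 1024):
double imageW = 1024, imageH = 1024;       // image size given in the question
double x = 343, y = 510, w = 81, h = 84;   // box from the question, x/y assumed top-left

double xCenterNorm = (x + w / 2.0) / imageW; // ~0.3745
double yCenterNorm = (y + h / 2.0) / imageH; // ~0.5391
double wNorm = w / imageW;                   // ~0.0791
double hNorm = h / imageH;                   // ~0.0820
// These four values (plus the class id) are what a YOLOv5 label line expects, all in [0, 1].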
About the resizing:
https://github.com/ultralytics/yolov5/discussions/7126#discussioncomment-2429260
I made an interface for a game using an extended viewport, and when I resize the screen the aspect ratio changes and every element in the scene is scaled, but when this happens this is what I get:
This is the most annoying issue I have dealt with, any advice? I tried making the tower n times bigger and then just setting a bigger world size for the viewport, but the same thing happens. I don't know where these extra pixels on the images come from.
I'm loading the image from an atlas:
new TextureRegion(skin.getAtlas().findRegion("tower0"));
the atlas looks like this:
skin.png
size: 1024,1024
format: RGBA8888
filter: Nearest,Nearest
repeat: none
tower0
rotate: false
xy: 657, 855
size: 43, 45
orig: 43, 45
offset: 0, 0
index: -1
In the third picture, you are drawing your source image just slightly bigger than its actual size in screen pixels, so there are some boundaries where extra pixels have to be filled in to make it fill its full on-screen size. Here are some ways to fix this:
1. Use linear filtering. For the best appearance, use MipMapLinearLinear for the min filter (see the sketch after this list). This is a quick and dirty fix; the results might look slightly blurry.
2. Draw your game to a FrameBuffer that is sized to the same aspect ratio as your screen, but shrunk down to a size where your sprites will be drawn pixel perfect at their original scale. Then draw that FrameBuffer to the screen using an upscaling shader. There are some good ones you can find by searching for pixel upscale shaders.
3. The best looking option is to write a custom Viewport class that sizes your world width and height such that you will always be drawing the sprites pixel perfect or at a whole-number multiple. The downside here is that your world size will be inconsistent across devices: some devices will see more of the scene at once. I've used this method in a game where the player is always traveling in the same direction, so I position the camera to show the same amount of space in front of the character regardless of world size, which keeps it fair.
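For option 1, the filter change might look roughly like this (just a sketch, assuming you load the texture yourself; if it comes from a packed atlas as in your case, the min/mag filters and mipmaps have to be set in the TexturePacker settings instead):
Texture tex = new Texture(Gdx.files.internal("tower0.png"), true); // true = generate mipmaps
tex.setFilter(Texture.TextureFilter.MipMapLinearLinear, Texture.TextureFilter.Linear);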
Edit:
I looked up my code where I did option 3. As a shortcut, rather than writing a custom Viewport class, I used a StretchViewport, and simply changed its world width and height right before updating it in the game's resize() method. Like this:
int pixelScale = Math.min(
        height / MIN_WORLD_HEIGHT,
        width / MIN_WORLD_WIDTH);
int worldWidth = width / pixelScale;
int worldHeight = height / pixelScale;
stretchViewport.setWorldWidth(worldWidth);
stretchViewport.setWorldHeight(worldHeight);
stretchViewport.update(width, height, true);
Now you may still have rounding artifacts if your pixel scale ends up being something that doesn't divide both the screen width and height evenly. You might want to do a bit more in your calculations, like rounding pixelScale to the nearest common integer factor of the screen width and height (see the sketch below). The tricky part is picking a value that won't result in a huge variation in the amount of "zoom" between different phone dimensions, but you can quickly test this by experimenting with resizing a desktop window.
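For example, one way to do that rounding might be a small helper like this (a rough sketch, not the exact code from my project):
static int snapToCommonFactor(int pixelScale, int width, int height) {
    // Walk down from the computed scale to the nearest integer that divides
    // both screen dimensions evenly, so no partial source pixels are left over.
    for (int s = pixelScale; s > 1; s--) {
        if (width % s == 0 && height % s == 0) return s;
    }
    return 1;
}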
In my case, I merged options 2 and 3. I rounded worldWidth and worldHeight up to the nearest even number and used that size for my FrameBuffer. Then I draw the FrameBuffer to the screen at just the right size to crop off any extra from the rounding. This eliminates the possibility of variations in common factors. Quite a bit more complicated, though. Maybe someday I'll clean up that code and publish it.
The input image size of U-Net is 572x572, but the output mask size is 388x388. How can the image be masked with a smaller mask?
You are probably referring to the paper by Ronneberger et al. in which the U-Net architecture was published; the architecture diagram there shows these numbers.
The explanation is a bit hidden in section "3. Training" of the paper:
Due to the unpadded convolutions, the output image is smaller than the input by a constant border width.
This means that during each convolution, part of the image is "cropped", since the convolution only starts at coordinates where the kernel fully overlaps the input image / input blob of the layer. In the case of 3x3 convolutions, this is always one pixel on each side. For a more visual explanation of kernels/convolutions, see e.g. here.
The output is smaller because, due to the cropping that occurs during unpadded convolutions, only the (inner) part of the image gets a result.
It is not a general characteristic of the architecture but something inherent to (unpadded) convolutions, and it can be avoided with padding. Probably the most common strategy is mirroring at the image borders, so that each convolution can start at the very edge of the image (and sees mirrored pixels where its kernel overlaps the border). Then the input size can be preserved and the full image will be segmented.
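To make the arithmetic concrete: each unpadded k x k convolution shrinks every spatial dimension by k - 1 pixels (a small sketch, just to illustrate the numbers in the paper):
static int afterUnpaddedConv(int size, int kernel) {
    return size - (kernel - 1); // a 3x3 kernel removes one pixel on each side
}
// The first two 3x3 convolutions in the paper: 572 -> 570 -> 568. Applied along the
// whole contracting and expanding path, this is what ends at the 388x388 output map.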
I have a sprite that acts as a wall in a game that I am building. I would like to have the brick texture repeat itself instead of stretching to fit the height of the screen. I tried:
Texture lSideTexture = new Texture(Gdx.files.internal("wall.png"));
lSideTexture.setWrap(Texture.TextureWrap.Repeat, Texture.TextureWrap.Repeat);
lSideSprite = new Sprite(lSideTexture);
lSideSprite.setPosition(-50, -100 * (height/width) / 2);
lSideSprite.setSize(5,100 * (height/width));
But I am still getting a texture that has been stretched to fit the dimensions rather than repeated.
Any Ideas?
You also have to change the texture region of the Sprite to be bigger than the texture. So if you are drawing the sprite 5 times bigger than normal, you'd do this:
lSideSprite.setRegion(0f, 0f, 5f, 5f);
One "gotcha" here is that this method is overloaded to take all ints or all floats. If you use ints, then you specify the size in pixel dimensions. If you use floats, then you're specifying how many times you want it to repeat.
I am trying to 3D transform a floor tile pattern in Flash, but when I do so the tile lines become dotted (dashed). Here is the screenshot:
The best solution is, as LDMS said, to thicken your lines (even if it is an image), or, if you can, to enable anti-aliasing (which I think is what smoothing does).
As for why this happens, this is due to texture sampling. You will probably see that if you move your camera around, the gaps/dots in the lines move. Now, without going into too many details, these are the basics:
Close to your camera, less than one pixel of your image fits into a pixel on your screen, meaning that one pixel from your image is bigger than an actual pixel on your screen, so the screen pixel simply displays that color from the image. But what happens if your image is so far away that multiple pixels from your image are so small that, combined, they fit into one pixel on your screen? With smoothing and anti-aliasing, an algorithm combines the colors and produces an estimated result. But if you do not do this, the renderer has to pick one color: say we have 2 pixels of black (your line) and 2 of the red background mapping to the same pixel on screen, it will (randomly or based on some variable) pick a color and display it without regard for the other colors.
This is why you sometimes see your line and sometimes the background.