I need to perform object detection using deep learning on "huge" images, say 10000x10000 pixels.
At some point in the workflow, I need to resize the images down to something more manageable, say 640x640. At the moment, I am achieving this with OpenCV:
import cv2
img = cv2.imread("some/path/to/my/img")
h, w = 640, 640
img = cv2.resize(img, (w, h))
Now, when I try to look at some of these pictures with my own eyes (e.g. to check that my bounding boxes are well defined), I "can't see anything", in the sense that the resize is so aggressive that the image is heavily pixelated.
Does this cause an issue for the training of the algorithm? In the end I can map the bounding boxes output by the model back to the original image (10000x10000 px) using some transform; that is not an issue. But I can't tell whether working on such pixelated images during training causes something to go wrong.
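For clarity, the transform I mean is just a rescaling of the box coordinates; a minimal sketch (the (x1, y1, x2, y2) pixel format is only an illustrative assumption):
def boxes_to_original(boxes, resized=(640, 640), original=(10000, 10000)):
    """Map (x1, y1, x2, y2) boxes predicted on the resized image
    back to the coordinate system of the original image."""
    sx = original[0] / resized[0]   # horizontal scale factor
    sy = original[1] / resized[1]   # vertical scale factor
    return [(x1 * sx, y1 * sy, x2 * sx, y2 * sy) for x1, y1, x2, y2 in boxes]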
It really depends on what information is lost during the resizing. Going from 10000x10000 down to 640x640 keeps only about 0.4% of the pixels, so I would assume almost everything relevant is lost, making the problem a lot harder, if solvable at all.
If you yourself can't solve the problem (i.e. see the objects in the resized image), that is a very bad starting point for solving it with a neural network. I would still try it and see whether the network does anything.
It probably won't work well. An easy approach is to split the initial image into patches, run detection on each patch, and combine the results. This can work, but depending on the problem it might not be sufficient.
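A minimal sketch of that patch-based idea (the tile size, overlap, and the detect callback are placeholders, not any particular library):
def detect_on_patches(img, detect, patch=640, overlap=64):
    """Run a detector on overlapping tiles of a large image and
    return all boxes in full-image coordinates."""
    h, w = img.shape[:2]
    step = patch - overlap
    boxes = []
    for y in range(0, h, step):
        for x in range(0, w, step):
            tile = img[y:y + patch, x:x + patch]
            # Shift each tile-local box back into full-image coordinates.
            for (x1, y1, x2, y2) in detect(tile):
                boxes.append((x1 + x, y1 + y, x2 + x, y2 + y))
    # Overlapping tiles can find the same object twice, so some form of
    # non-maximum suppression across tiles is still needed afterwards.
    return boxes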
If this is not sufficient for your problem, you might want to survey the state of the art and look for people with similar problems. Medical images can also be quite big, and people working with satellite imagery face the same issue of very large inputs and may have come up with ways to solve it.
I've implemented the original U-Net architecture from its paper. It works with 572x572 inputs and predicts 388x388 outputs (the original paper uses no padding). I used this network for another task which has 2048x1024 images as input and a target of the same size (2048x1024). This fails because the image size doesn't agree with the network architecture. Then I found code on GitHub that sets padding = 1 for all convolutions, and everything works. Fine.
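To make the size arithmetic concrete, here is a minimal check (using PyTorch just as an example; the framework isn't the point):
import torch
import torch.nn as nn

x = torch.zeros(1, 1, 2048, 1024)

# Unpadded 3x3 convolution, as in the original U-Net paper:
# each such conv shrinks every spatial dimension by 2.
print(nn.Conv2d(1, 1, kernel_size=3, padding=0)(x).shape)  # torch.Size([1, 1, 2046, 1022])

# With padding=1 ("same" padding for a 3x3 kernel) the spatial size is
# preserved, which is why the GitHub variant fits 2048x1024 exactly.
print(nn.Conv2d(1, 1, kernel_size=3, padding=1)(x).shape)  # torch.Size([1, 1, 2048, 1024])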
My question: is that a common thing? "Randomly" (maybe "experimentally" is the better word) tweaking padding or stride parameters until the dimensions fit? But then it isn't the original U-Net anymore, right?
I am grateful for any advice, because I want to learn a good way to reuse existing networks for different challenges.
Best
Ok, I am developing a game using Libgdx, and everything was fine until I noticed that in the Android version, when my character moves, its texture becomes slightly blurred at the edges, as if it is blending into the background. There is also a slight graininess to the image which becomes more noticeable when it moves. I think these problems might be related to texture filtering, but I wanted to ask here to see if anyone has experienced this before. Any insight appreciated.
EDIT: Here is the image; you can see it's quite grainy on the whole. When the character moves, the graininess seems to move too. It is hard to describe, but it looks as if the pixels on the eater are moving relative to each other slightly.
The way I draw it is pretty straightforward; the following is called in my render method:
idleTexture = new Texture(Gdx.files.internal("images/GreenGuy.png"));
spriteBatch.draw(idleTexture, x, y*ppuY,width*ppuX,height*ppuY);
All the other work I have looked at does what I am doing, but I am not totally happy with the results. I have tried switching between linear and nearest filtering, but this did not change anything. I have also tried using a smaller image instead of shrinking a large one, but I still get the same problem of the pixels not remaining relatively static while the character is moving.
I'm researching the following problem:
Let's say I have a glass of some fluid (water for example). The fluid is completely transparent and I don't have to render it at all.
However, an ink drop is dropped into the glass and it spreads through the water.
The whole thing should be 3D and user should be able to rotate the camera and see the spreading in real time.
I have researched a couple of ways to approach this problem, but it turned out that most of them are dead ends.
The only approach that had some success was to use an enormous number of particles to form the skeleton of the "ink spread". The physics simulation of the spreading process is far from perfect, but let's say that is not a problem.
The problem is the rendering part.
As far as I know, I won't be able to speed up the z-sorting much by using Flash's GPU acceleration, because uploading those particles to GPU memory every frame is quite slow?
Can somebody confirm that please?
The other thing I'm struggling with is the final render. I have tried a whole bunch of filters in combination with "post-process" techniques to create smooth lines and gradients between the dots, but the result is terrible. If somebody knows of an article that could help me with that, I'd be very grateful.
Overall, if there is another viable approach to the problem, please let me know.
Thanks in advance.
Cheers.
You should probably look at computational fluid dynamics in general to get a basic understanding. That should make it easy to play with ActionScript implementations like Eugene's Fluid Solver, in either 2D or 3D, tweaking the fluid properties to get the look and feel you're after.
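If it helps to see the grid-based idea behind such solvers, here is a minimal dye-diffusion sketch (Python/NumPy rather than ActionScript, purely to illustrate using a density field instead of sorted particles):
import numpy as np

N = 128
density = np.zeros((N, N))      # ink density per grid cell
density[60:68, 60:68] = 1.0     # the initial "ink drop"

diff, dt = 0.1, 0.1             # made-up diffusion rate and time step

def diffuse(d, diff, dt):
    # Explicit diffusion step: each cell relaxes toward the average of
    # its four neighbours (a crude stand-in for a full fluid solver).
    lap = (np.roll(d, 1, 0) + np.roll(d, -1, 0) +
           np.roll(d, 1, 1) + np.roll(d, -1, 1) - 4 * d)
    return d + dt * diff * lap

for _ in range(200):
    density = diffuse(density, diff, dt)
# The density field can be drawn as a texture (or a stack of slices in 3D),
# which avoids uploading and z-sorting millions of particles every frame.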
Sorry for the maybe-trivial post, but I really cannot figure it out...
Let's suppose you have some 3D glasses, or something else that gives you stereo 3D vision.
What happens if you swap the left and right images? Thinking about it, I cannot really figure it out. Would you see the reverse of the image? Or just some axis shift?
Unfortunately I cannot try it out in any way, but even if I could, I'd love to reason the thing out in my head before trying it.
So, please, any help, any idea, any hint that can help me understand or discuss this in depth is welcome.
For the human brain it's next to impossible to give a formal answer, because frankly, neurologists still don't fully understand how it works in detail. But this much we know:
Our brain does no absolute "measurement" of the parallax in stereo images. The whole depth perception works on parallax differences. You could say the brain takes the derivative of the parallax to build its mental representation of depth, and the derivative of parallax and depth are taken to be (nearly) proportional. By swapping the pictures the derivative becomes negative, so at every point the brain sums up depth in the wrong direction.
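To put a rough formula behind that (assuming a standard pinhole stereo model, which the answer does not spell out): with focal length f, eye separation B, and depth Z, a point's parallax is roughly d = f·B / Z. Swapping the two views turns every d into -d, so all the parallax differences the brain works with change sign, and the relative depth ordering is reversed.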
However, parallax is not the only source of depth perception. Of similar importance is learned knowledge about typical objects in the world. For example, faces are "known" never to be inside out, so even with negative parallax that knowledge wins and the face is perceived in the right form (though it will clash hilariously with its surroundings).
You would see it as "inside-out" (it's a little more complex than that, but that's the basic truth).
You can experience 3D without any special hardware, thanks to stereoscopic images (two views placed side by side; you then cross your eyes to fuse them into a single image).
You can then swap right and left by editing the image.
Here is an example with an image I've found on the web: https://imgur.com/a/ov7U7N5
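If you want to script the swap rather than do it by hand, here is a minimal sketch with Pillow (the filename is hypothetical; any image editor works just as well):
from PIL import Image

# Swap the left and right halves of a side-by-side stereo image.
img = Image.open("stereo_pair.jpg")       # hypothetical filename
w, h = img.size
left = img.crop((0, 0, w // 2, h))
right = img.crop((w // 2, 0, w, h))

swapped = Image.new(img.mode, (w, h))
swapped.paste(right, (0, 0))
swapped.paste(left, (w - w // 2, 0))      # offset handles odd widths
swapped.save("stereo_pair_swapped.jpg")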
Do you feel any difference in the depth? Do you see things inside out?
I believe the sense of depth in this case is preserved. But maybe it's just me.
I am developing a PV3D application that imports DAE models exported by Blender's Collada Exporter plugin (1.4). When I build them in Blender, I use exact dimensions (the end-game is to have scale models in PV3D).
Using the same scale of dimensions, some models appear in PV3D extremely tiny, while others are the appropriate size. Many appear with rotations bearing no resemblance to how they were constructed in Blender. Also, I have to flip the normals in Blender in order to get them to display properly in PV3D, and even then, occasional triangles will appear in PV3D with normals still reversed. I can't seem to discern a pattern among which models appear tiny. The same goes for the randomly flipped normals - there doesn't seem to be a pattern to it.
Has anyone had any experience with a problem like this? I can't even think of how to tackle it - the symptoms seem to point to something with the way PV3D handles the import, or how Blender handles the export, and the 3D math is way beyond me.
I had a similar problem with the normals. I found that after applying scale/rotation to ObData (I had to make it single-user first), the normals faced in the direction that corresponded to what I was seeing in Papervision.
This should fix your scaling issues too.
I finally found the source of the problem a while back, and just remembered I should update this post.
Turns out, the normals weren't being flipped. My models contained relatively acute angles and sharp, flat projections (think of a low-grade ramp). When viewed from certain angles, the z-sorting (which by default sorts faces by their centers) put the faces in the wrong order, because those acute angles and flat, sharp projections could push a polygon's center farther away than the center of another polygon that was actually behind it.
The effect was consistent from all my view angles because the camera was restricted to a single, fixed orbit around the models, so the same thing happened in reverse from the other side of the model, making it appear like the normals were flipped.
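A tiny Python illustration of that sorting failure (made-up coordinates, not PV3D code):
# Two triangles, each vertex as (x, y, z); larger z means farther away.
ramp = [(0, 0, 1), (10, 0, 9), (10, 1, 9)]   # long, shallow "ramp"
small = [(1, 0, 4), (2, 0, 4), (1, 1, 4)]    # small face behind the ramp's near edge

def centre_depth(tri):
    return sum(z for _, _, z in tri) / len(tri)

# Painter's algorithm: draw back-to-front, sorted by each face's centre depth.
order = sorted([("ramp", ramp), ("small", small)],
               key=lambda item: -centre_depth(item[1]))
print([name for name, _ in order])
# ['ramp', 'small'] - the ramp is drawn first because its centre (z ~ 6.3) is
# farther than the small face's (z = 4), so the small face overdraws the
# ramp's near end even though that part of the ramp (z ~ 2) is actually in front.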
As for the scale issues - I never figured that out. I moved to Sketchup for my model creation, and that seemed to solve it.