Stage3D, AGAL - vertices' and textures' coordinate systems - actionscript-3

I've been trying to work with more complicated shaders, and have run into issues with the coordinate systems used by the vertex shader and texture sampler. In short: they don't seem to make any sense, and when trying to test them I end up getting inconsistent results. To make matters worse, the internet has little in the way of documentation, and most of the information I've found seems to expect me to know how this works already. I was hoping someone could clarify the following:
The vertex shaders pass an (x, y, z) representing a location on the render target. What are acceptable values for x, y, and z?
How do x and y correspond to the width and height of the back buffer (assuming that it's the render target)?
How do x and y correspond to the width and height on an output texture (assuming that it's the render target)?
When x=0 and y=0, where does the vertex sit, location-wise?
The texture samplers sample a texture at a (u, v) coordinate. What are acceptable values for u and v?
How do u and v correspond with the width and height of the texture being sampled?
How do AGAL's wrap, clamp, and repeat flags alter sampling, and what is the default behavior when one isn't given?
When sampling at u=0 and v=0, which pixel is returned, location-wise?
EDIT:
From my tests, I believe the answers are:
Unsure
-1 is left/bottom, 1 is right/top
Unsure
At the center of the output
Unsure
0 is left/bottom, 1 is right/top
Unsure
The far bottom-left of the texture

You normally use your own coordinate system and then multiply the position of each vertex by the MVP (model-view-projection) matrix to get the NDC (normalized device coordinates) that the vertex shader outputs to the GPU. There is a nice article explaining all of that for Stage3D.
Correct, and z is in the range [0, 1].
Rendering to a render target is the same as rendering to the back buffer - you output NDC from your vertex shader, so the real size of the texture is irrelevant.
Yup, center of the screen.
Normally it's [0, 1], but you can use values outside that range, and then the output depends on the texture wrap mode (such as repeat or clamp) set on the sampler.
(0, 0) is left/top, (1, 1) is right/bottom.
The default is repeat. These modes decide what you get when you sample with a coordinate outside the [0, 1] range: with repeat, [1.5, 1.5] will result in [0.5, 0.5], while [1.0, 1.0] will be the result if the mode is set to clamp (see the small sketch after these answers).
Top-left pixel of the texture.
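To make the wrap modes concrete, here is a tiny sketch (in Python rather than AGAL, just the arithmetic) of how repeat and clamp map an out-of-range coordinate back into [0, 1]:
def repeat(u):
    return u % 1.0                  # keep only the fractional part of the coordinate
def clamp(u):
    return min(max(u, 0.0), 1.0)    # pin the coordinate to the [0, 1] range
print(repeat(1.5), clamp(1.5))      # 0.5 1.0, matching the [1.5, 1.5] example above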

Related

How to use spatial transformer to crop the image in pytorch?

The spatial transformer network paper claims that it can be used to crop the image.
Given the crop region (top_left, bottom_right) = (x1, y1, x2, y2), how do I interpret the region as a transformation matrix and crop the image in PyTorch?
Here is an introduction to the spatial transformer network in Torch (http://torch.ch/blog/2015/09/07/spatial_transformers.html). The introduction visualizes the bounding box that the transformer looks at; how can we determine the bounding box given the transformation matrix?
[Edit]
I just found the answer to the first question [given the crop region, find the transformation matrix].
The image in the original post already provides a good answer, but it might be useful to provide some code.
Importantly, this method should retain gradients correctly. In my case I have a batch of (y, x) values that represent the center of the crop position (in the range [-1, 1]). As for the values a and b, which are the x and y scales of the transformation, I used 0.5 for each in combination with a smaller output size (half the width and height) to retain the original scale, i.e. to crop. You can use 1 to have no scale change, but then there would be no cropping.
import torch
import torch.nn.functional as F

def crop_to_affine_matrix(t, a=0.5, b=0.5):
    'Turns (N,2) translate values into (N,2,3) affine transformation matrices'
    t = t.reshape(-1, 1, 2, 1).flip(2)     # flip the (y, x) input order to the (x, y) order affine_grid expects
    t = F.pad(t, (2, 0, 0, 0)).squeeze(1)  # zero-pad to (N,2,3); the translations land in the last column
    t[:, 0, 0] = a                         # x scale (0.5 here, as described above)
    t[:, 1, 1] = b                         # y scale
    return t

t = torch.zeros(5, 2)                      # center crop positions for batch size 5
outsize = (5, 3, 100, 100)                 # e.g. (N, C, H_out, W_out): half-size output so the result is a crop
grid = F.affine_grid(crop_to_affine_matrix(t), outsize)
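Continuing from the snippet above, the grid can then be fed to F.grid_sample to actually produce the crops (the whole pipeline stays differentiable); images here is a hypothetical (N, C, H, W) input batch:
images = torch.rand(5, 3, 200, 200)   # hypothetical input batch (N, C, H, W)
crops = F.grid_sample(images, grid)   # (5, 3, 100, 100): half-size crops centered at t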

How to crop features outside an image region using pytorch?

We can use ROI-Pool/ROI-Align to crop the sub-features inside an image region (which is a rectangle).
I was wondering how to crop features outside this region.
In other words, how do I set values (of a feature map) inside a rectangular region to zero, while values outside the region remain unchanged?
I'm not sure that this idea of ROI align is quite correct. ROI pool and align are used to take a number of differently sized regions of interest identified in the original input space (i.e. pixel-space) and output a set of same-sized feature crops from the features calculated by (nominally) the convolutional network.
As a simple answer to your question, though, you just need to create a mask tensor of ones with the same dimensions as your feature maps, set the values within the ROIs to zero in this mask, then multiply the mask by the feature maps. This will suppress all values within the ROIs. Creating this mask should be fairly simple. I did it with a for-loop to avoid thinking, but there are likely more efficient ways as well.
import torch

# feature_maps: batch_size x num_feature_maps x width x height
batch_size, num_feature_maps = feature_maps.shape[:2]
mask = torch.ones(feature_maps.shape[2:])
for ROI in ROIs:  # assuming each ROI is [xmin, ymin, xmax, ymax]
    mask[ROI[0]:ROI[2], ROI[1]:ROI[3]] = 0              # zero out this region of interest
mask = mask.unsqueeze(0).unsqueeze(0)                   # 1 x 1 x width x height
mask = mask.repeat(batch_size, num_feature_maps, 1, 1)  # batch_size x num_feature_maps x width x height
output = torch.mul(mask, feature_maps)                  # values inside the ROIs are suppressed

Using floor() function in GLSL when sampling a texture leaves glitch

Here's a shadertoy example of the issue I'm seeing:
https://www.shadertoy.com/view/4dVGzW
I'm sampling a texture by sampling from floor-ed texture coordinates:
#define GRID_SIZE 20.0

void mainImage( out vec4 fragColor, in vec2 fragCoord )
{
    vec2 uv = fragCoord.xy / iResolution.xy;
    // Sample texture at integral positions
    vec2 texUV = floor(uv * GRID_SIZE) / GRID_SIZE;
    fragColor = texture2D(iChannel0, texUV).rgba;
}
The bug I'm seeing is that there are 1-2 pixel lines sometimes drawn between the grid squares.
Note that we've seen this issue not only using GLSL ES on the web, but also HLSL in Unity.
Is there something to do with floor() and floating-point arithmetic that I don't understand?! I'd love to know not just how to fix it, but also why exactly it's happening.
It turned out that a low-detail mipmap of the texture was being used for those grid lines. Changing the filter setting on the texture to "nearest" or "linear" filtering rather than "mipmap" fixed it.
It is also possible to call texture2D with a third "bias" parameter that modifies which mipmap level will be used.
Thanks to the commenter on shadertoy for answering the question!
(I'm not sure how the mipmap level actually gets chosen for custom shaders without the bias; I'd be interested if someone knows!)

HTML5 Canvas - Zooming into a Point

So I know there are threads about it already here, like that one.
I followed the idea that was proposed in the thread above, and it works. However, I don't understand WHY it works.
Here is an example:
Let's say that I have a square centered at (100, 100), and its width/height is 100. So its top-left corner will be at (50, 50).
Now let's say that I want to zoom X2 into the square's center, that is, to zoom into (100, 100). So I will write the following transformation sequence:
translate(100, 100);
scale(2, 2);
translate(-100, -100);
So because the canvas applies the transformations in reverse order, my transformed square's top-left corner will now be at (0, 0), and its height/width will be 200.
OK, let's say that now I want to zoom X2 into the bottom-right corner of the already transformed square. So intuitively, I would like to perform the following transformation sequence:
translate(200, 200);
scale(2, 2);
translate(-200, -200);
But it won't work because, again, the canvas applies transformations in reverse order. That is to say, if I sum up my two transformation sequences, I'll get:
// First Sequence
translate(100, 100);
scale(2, 2);
translate(-100, -100);
// Second Sequence
translate(200, 200);
scale(2, 2);
translate(-200, -200);
This means that the second sequence will be applied to each point before the first sequence (because the canvas will apply the transformations from bottom to top), and this is wrong. So the thread in the link above suggests the following:
Because sequence 2 will be applied first, I should transform the point (200, 200) to its original coordinates by applying to it the inverse of the first sequence. That is to say, if T1 is the matrix that represents the first sequence, it will look like this:
// First Sequence
translate(100, 100);
scale(2, 2);
translate(-100, -100);
// Second Sequence
var point = SVGPoint(200, 200);
var transformedPoint = point.matrixTransform(T1.inverse());
translate(-transformedPoint.x, -transformedPoint.y);
scale(2, 2);
translate(transformedPoint.x, transformedPoint.y);
But why does it work? I really don't understand why it should work like that... can anyone elaborate on it?
Thanks!
The HTML5 canvas transformations happen top-down, not bottom-up as you believe. The reason for the distinction is because the transformations applied to the canvas affect the coordinate system, not your logical coordinates.
Translating by translate(100, 100) will move your coordinate system right and down, which appears hauntingly similar to moving your logical coordinate up and left.
Let's take the first sequence (I have changed your use of transform to translate):
translate(100, 100);
scale(2, 2);
translate(-100, -100);
Naturally, when we think about scaling an object from its center, we translate the object to (0,0), scale the object, then move the object back. The above code, when read in reverse, would appear to do that. However, that's not the case.
When we read the above code from top-down, it says (assume we start with an identity transform):
Move the context's (0,0) right 100 units and down 100 units. This takes it to the canvas's (100,100) location.
Make the coordinate system 2x bigger.
Move the context's (0,0) left 100 units and up 100 units, essentially returning it to its original location (in context coordinate space, not canvas space).
The scaling happens relative to the context's (0,0) point, which is at (100,100) on the canvas.
If we were to now add your second sequence:
translate(200, 200);
scale(2, 2);
translate(-200, -200);
This will:
Move the context's (0,0) to the coordinate system's (200,200) location.
Make the coordinate system 2x bigger than it already was.
Return the context's (0,0) back to where it was previously (in context coordinate space, not canvas space).
As you've found out, that does not give you what you are expecting because (200,200) is not the point about which you want to scale. Remember, all units are relative to the context coordinate system. So we need to convert the canvas location of (200,200) to the context coordinate location of (150,150) which is the original bottom-right corner of our rectangle.
So we change sequence #2 to be:
translate(150, 150);
scale(2, 2);
translate(-150, -150);
This gives us what we are expecting (to zoom in on the bottom-right corner of the rectangle). That's why we do the inverse-transform.
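To make the arithmetic concrete, here is a small sketch (in Python/numpy rather than canvas code) that checks both sequences; the translate and scale helpers are hypothetical stand-ins for the corresponding canvas calls:
import numpy as np
def translate(tx, ty):
    return np.array([[1, 0, tx], [0, 1, ty], [0, 0, 1]], dtype=float)
def scale(sx, sy):
    return np.array([[sx, 0, 0], [0, sy, 0], [0, 0, 1]], dtype=float)
T1 = translate(100, 100) @ scale(2, 2) @ translate(-100, -100)       # sequence 1: zoom X2 about (100, 100)
print(T1 @ [50, 50, 1])                     # [0, 0, 1]: the top-left corner lands at (0, 0)
print(T1 @ [150, 150, 1])                   # [200, 200, 1]: the bottom-right corner lands at (200, 200)
print(np.linalg.inv(T1) @ [200, 200, 1])    # [150, 150, 1]: canvas (200, 200) is context (150, 150)
T2 = T1 @ translate(150, 150) @ scale(2, 2) @ translate(-150, -150)  # sequence 2 about context (150, 150)
print(T2 @ [150, 150, 1])                   # [200, 200, 1]: the zoom point stays fixed on the canvas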
In the demo application, when the app zooms in, it takes the coordinate in canvas units where the user's mouse was and inverse-transforms it using the context transformation thus far, to get the location in context coordinate space that was clicked on. The context origin is moved to that location, zoomed, then returned to its previous location.
References:
Safari HTML5 Canvas Guide: Translation, Rotation, and Scaling
You seem to be way overthinking transforms!
Here’s the simple rule:
If you apply any set of transforms, then you must undo all of them if you want to get back to your untransformed state.
Period !!!!
So let's say you do these 4 transforms:
Do #1. context.translate(100,100);
Do #2. context.scale(2,2);
Do #3. context.translate(-20,50);
Do #4. context.scale(10,10);
To get back to your original untransformed state, you must undo in exactly reverse order:
Undo #4: context.scale( 0.10, 0.10 ); // since we scaled 10x, we must unscale by 0.10
Undo #3: context.translate(20,-50);
Undo #2: context.scale( 0.50, 0.50 ); // since we scaled 2x, we must unscale by 0.50
Undo #1: context.translate(-100,-100);
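A quick check of that, again in Python/numpy with hypothetical translate and scale helpers standing in for the canvas calls, showing that the reversed undo steps cancel the four do steps:
import numpy as np
def translate(tx, ty):
    return np.array([[1, 0, tx], [0, 1, ty], [0, 0, 1]], dtype=float)
def scale(sx, sy):
    return np.array([[sx, 0, 0], [0, sy, 0], [0, 0, 1]], dtype=float)
do = translate(100, 100) @ scale(2, 2) @ translate(-20, 50) @ scale(10, 10)
undo = scale(0.10, 0.10) @ translate(20, -50) @ scale(0.50, 0.50) @ translate(-100, -100)
print(np.allclose(do @ undo, np.eye(3)))    # True: back to the untransformed state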
Think of it like walking to your friend's house.
You turn Right + Left + Right.
Then to go home you must reverse that: Left + Right + Left
You must undo your walking path in exactly the reverse of your original walk.
That’s how transforms work too.
That’s Why !!

Camera Calibration Matrix how to?

With this toolbox I was performing calibration of my camera.
However, the toolbox outputs results in matrix form, and being a noob I don't really understand mathy stuff.
The matrix is in the following form.
Where R is a rotation matrix and T is a translation vector.
And these are the results I got from the toolbox. It outputs values in pixels.
-0.980755  -0.136184  -0.139905  217.653207
 0.148552  -0.055504  -0.987346  995.948880
 0.126695  -0.989128   0.074666  371.963957
 0.000000   0.000000   0.000000    1.000000
Using this data can I know how much my camera is rotated and distance of it from the calibration object?
The distance part is easy. The translation from the origin is given by the first three numbers in the rightmost column; these represent the translation in the x, y, and z directions respectively. In your example, the camera's position is p = (px, py, pz) = (217.653207, 995.948880, 371.963957). You can take the Euclidean distance between the camera's location and the location of the calibration object (cx, cy, cz); that is, it would just be sqrt((px-cx)^2 + (py-cy)^2 + (pz-cz)^2).
The more difficult part regards the rotation, which is captured in the upper-left 3x3 elements of the matrix. Without knowing exactly how they arrived at this, you're somewhat out of luck; that is, it's not easy to convert it back to Euler angles, if that's what you want. However, you can transform those elements into a quaternion rotation, which will give you the unique unit vector and angle needed to rotate the camera to that orientation. The specifics of the computation are provided here. Once you have the quaternion rotation, you can easily apply it to the vectors n = (0, 0, 1), up = (0, 1, 0), and right = (1, 0, 0) to get the normal (the direction the camera is pointed), up, and right vectors. The right vector is only useful if you are interested in slewing the camera left or right from its current position.
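For illustration, here is a small Python/numpy sketch of both steps; the calibration object's position c is a made-up placeholder, and the axis-angle extraction uses the standard rotation-matrix formula (equivalent to going through the quaternion):
import numpy as np
M = np.array([[-0.980755, -0.136184, -0.139905, 217.653207],
              [ 0.148552, -0.055504, -0.987346, 995.948880],
              [ 0.126695, -0.989128,  0.074666, 371.963957],
              [ 0.000000,  0.000000,  0.000000,   1.000000]])
R, p = M[:3, :3], M[:3, 3]
c = np.array([0.0, 0.0, 0.0])                 # placeholder position of the calibration object
distance = np.linalg.norm(p - c)              # sqrt((px-cx)^2 + (py-cy)^2 + (pz-cz)^2)
angle = np.arccos((np.trace(R) - 1.0) / 2.0)  # rotation angle in radians
axis = np.array([R[2, 1] - R[1, 2],
                 R[0, 2] - R[2, 0],
                 R[1, 0] - R[0, 1]]) / (2.0 * np.sin(angle))
normal = R @ np.array([0.0, 0.0, 1.0])        # direction the camera is pointed (n rotated)
up = R @ np.array([0.0, 1.0, 0.0])
right = R @ np.array([1.0, 0.0, 0.0])
print(distance, np.degrees(angle), axis)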
I'm guessing the code uses the 'standard' formulation; if so, you will find more details in the OpenCV library docs or their book.