How does opengl project a 3d point to screenspace? - language-agnostic

I am trying to emulate a subset of opengl with my own software rasterizer.
I'm taking a wild guess that the process looks like this:
Multiply the 3d point by the modelview matrix -> multiply that result by the projection matrix
Is this correct?
Also what size is the projection matrix and how does it work?

The point is multiplied by the modelview matrix and then with projection matrix. The resultant is normalized and then multiplied with viewport matrix to get the screen coordinates. All matrices are 4X4 matrix. You can view this link for further details.
http://www.songho.ca/opengl/gl_transform.html#example2

(shameless self-promotion, sorry) I wrote a tutorial on the subject :
http://www.opengl-tutorial.org/beginners-tutorials/tutorial-3-matrices/
There is a slight caveat that I don't explain, though. At the end of the tutorial, you're in Normalized Device Coordinates, i.e. -1 to +1. A simple linear mapping transorms this to [0-screensize].

You might also benefit from looking at the gluProject() code. This takes an x, y, z point in object coordinates as well as pointers to modelView, projection, and viewport matrices and tells you what the x, y, (z) coordinates are in screenspace (the z is a value between 0 and 1 that can be used in the depth buffer). All three matrix multiplications are shown there in the code, along with the divisions necessary for perspective.

Related

Why W_q matrix in torch.nn.MultiheadAttention is quadratic

I am trying to implement nn.MultiheadAttention in my network. According to the docs,
embed_dim  – total dimension of the model.
However, according to the source file,
embed_dim must be divisible by num_heads
and
self.q_proj_weight = Parameter(torch.Tensor(embed_dim, embed_dim))
If I understand properly, this means each head takes only a part of features of each query, as the matrix is quadratic. Is it a bug of realization or is my understanding wrong?
Each head uses a different part of the projected query vector. You can imagine it as if the query gets split into num_heads vectors that are independently used to compute the scaled dot-product attention. So, each head operates on a different linear combination of the features in queries (and keys and values, too). This linear projection is done using the self.q_proj_weight matrix and the projected queries are passed to F.multi_head_attention_forward function.
In F.multi_head_attention_forward, it is implemented by reshaping and transposing the query vector, so that the independent attentions for individual heads can be computed efficiently by matrix multiplication.
The attention head sizes are a design decision of PyTorch. In theory, you could have a different head size, so the projection matrix would have a shape of embedding_dim × num_heads * head_dims. Some implementations of transformers (such as C++-based Marian for machine translation, or Huggingface's Transformers) allow that.

How to calculate the gradient for a custom deep learning layer that takes a 1X1 neuron as input and multiplies it by a constant NXN matrix?

I want to create a custom deep learning layer that takes as input a 1X1 neuron and uses it to scale a constant, predefined NXN matrix. I do not understand how to calculate the gradient for this layer.
I understand that in this case dLdZ is NXN and dLdX should be 1X1, and I don't understand what dZdX should be to satisfy that, it's obviously not a simple chained derivative where dLdX = dLdZ*dZdX since the dimensions don't match.
The question is not really language depenedent, I write here in Matlab.
%M is the constant NXN matrix
%X is 1X1X1Xb
Z = zeros(N,N,1,b);
for i = 1:b
Z(:,:,:,i) = squeeze(X(:,:,1,i))*M;
end
==============================
edit: the answer I got was very helpful. I now perform the calculation as follows:
dLdX = zeros(1,1,1,b);
for i = 1:b
dLdX(:,:,:,i) =sum(sum(dLdZ(:,:,:,i).*M)));
end
This works perfectly. Thanks!!
I think ur question is a little unclear. I will assume ur goal is to propagate the gradients through ur above defined layer to the batch of scalar values. Let me answer according to how I understand it.
U have parameter X, which is a scalar and of dimension b (b: batch_size). This is used to scale a constant matrix Z, which is of dimension NxN. Lets assume u calculate some scalar loss L immediately from the scaled matrix Z' = Z*X, where Z' is of dimension bxNxN.
Then u can calculate the gradients in X according to:
dL/dX = dL/dZ * dZ/dX --> Note that the dimensions of this product indeed match (unlike ur initial impression) since dL/dZ' is bxNxN and dZ'/dX is bxNxN. Summing over the correct indeces yields dL/dX which is of dimension b.
Did I understand u correct?
Cheers

DM Script, why does the fourier transform of gaussian-kenel needs modulus

Recently I learn DM_Script for TEM image processing
I needed Gaussian blur process and I found one whose name is 'Gaussian Blur' in http://www.dmscripting.com/recent_updates.html
This code implements Gaussian blur algorithm by multiplying the fast fourier transform(FFT) of source image by the FFT of Gaussian-kernel image and finally doing inverse fourier transform of it.
Here is the part of the code,
// Carry out the convolution in Fourier space
compleximage fftkernelimg:=realFFT(kernelimg) (-> FFT of Gaussian-kernel image)
compleximage FFTSource:=realfft(warpimg) (-> FFT of source image)
compleximage FFTProduct:=FFTSource*fftkernelimg.modulus().sqrt()
realimage invFFT:=realIFFT(FFTProduct)
The point I want to ask is this
compleximage FFTProduct:=FFTSource*fftkernelimg.modulus().sqrt()
Why does the FFT of Gaussian-kernel need '.modulus().sqrt()' for the convolution?
It is related to the fact that the fourier transform of a Gaussian function becomes another Gaussian function?
Or It is related to a sort of limitation of discrete fourier transform?
Please answer me
Thanks
This is related to the general precision limitation of any floating point numeric computing. (see f.e. here, or more in depth here)
A rotational (real-valued) Gaussian of stand.dev. sigma should be transformed into a 100% real-values rotational Gaussioan of 1/sigma. However, doing this numerically will show you deviations: Just try the following:
number sigma = 30
number A0 = 1
realimage first := RealImage( "First", 8, 256, 256 )
first = A0 * exp( - (iradius**2/(2*sigma*sigma) ))
first.showimage()
complexImage second := FFT(first)
second.Showimage()
image nonZeroImaginaryMask = ( 0 != second.Imaginary() )
nonZeroImaginaryMask.Showimage()
nonZeroImaginaryMask.SetLimits(0,1)
When you then multiply these complex images (before back-transferring) you are introducing even more errors. By using modulus, one ensures that the forward transformed kernel is purely real and hence a better "damping" curve.
A better implementation of a FFT filtering code would actually create the FFT(Gaussian) directly with a std.dev of 1/sigma, as this is the analytically correct result. Doing a FFT of the kernel only makes sense if the kernel (or its FFT) is not analytically known.
In general: When implementing any "maths" into a program code, it can pay hugely to think it through with numerical computation limits in the back of your head. Reduce actual computation whenever possible (i.e. compute analytically and use the result instead of relying on brute force numerical computation) and try to "reshape" equations when possible, f.e. avoid large sums over many small numbers, be careful about checks against exact numeric values, try to avoid expressions which are very sensitive on small numerica errors etc.

Determination of formula for a 3 independent variable issue

I have 3 arrays of X, Y and Z. Each have 8 elements. Now for each possible combination of (X,Y,Z) I have a V value.
I am looking to find a formula e.g. V=f(X,Y,Z). Any idea about how that can be done?
Thank you in advance,
Astry
You have a function sampled on a (possibly nonuniform) 3D grid, and want to evaluate the function at any arbitrary point within the volume. One way to approach this (some say the best) is as a multivariate spline evaluation. https://en.wikipedia.org/wiki/Multivariate_interpolation
First, you need to find which rectangular parallelepiped contains the (x,y,z) query point, then you need to interpolate the value from the nearest points. The easiest thing is to use trilinear interpolation from the nearest 8 points. If you want a smoother surface, you can use quadratic interpolation from 27 points or cubic interpolation from 64 points.
For repeated queries of a tricubic spline, your life would be a bit easier by preprocessing the spline to generate Hermite patches/volumes, where your sample points not only have the function value, but also its derivatives (∂/∂x, ∂/∂y, ∂/∂z). That way you don't need messy code for the boundaries at evaluation time.

Invert 4x4 matrix - Numerical most stable solution needed

I want to invert a 4x4 matrix. My numbers are stored in fixed-point format (1.15.16 to be exact).
With floating-point arithmetic I usually just build the adjoint matrix and divide by the determinant (e.g. brute force the solution). That worked for me so far, but when dealing with fixed point numbers I get an unacceptable precision loss due to all of the multiplications used.
Note: In fixed point arithmetic I always throw away some of the least significant bits of immediate results.
So - What's the most numerical stable way to invert a matrix? I don't mind much about the performance, but simply going to floating-point would be to slow on my target architecture.
Meta-answer: Is it really a general 4x4 matrix? If your matrix has a special form, then there are direct formulas for inverting that would be fast and keep your operation count down.
For example, if it's a standard homogenous coordinate transform from graphics, like:
[ux vx wx tx]
[uy vy wy ty]
[uz vz wz tz]
[ 0 0 0 1]
(assuming a composition of rotation, scale, translation matrices)
then there's an easily-derivable direct formula, which is
[ux uy uz -dot(u,t)]
[vx vy vz -dot(v,t)]
[wx wy wz -dot(w,t)]
[ 0 0 0 1 ]
(ASCII matrices stolen from the linked page.)
You probably can't beat that for loss of precision in fixed point.
If your matrix comes from some domain where you know it has more structure, then there's likely to be an easy answer.
I think the answer to this depends on the exact form of the matrix. A standard decomposition method (LU, QR, Cholesky etc.) with pivoting (an essential) is fairly good on fixed point, especially for a small 4x4 matrix. See the book 'Numerical Recipes' by Press et al. for a description of these methods.
This paper gives some useful algorithms, but is behind a paywall unfortunately. They recommend a (pivoted) Cholesky decomposition with some additional features too complicated to list here.
I'd like to second the question Jason S raised: are you certain that you need to invert your matrix? This is almost never necessary. Not only that, it is often a bad idea. If you need to solve Ax = b, it is more numerically stable to solve the system directly than to multiply b by A inverse.
Even if you have to solve Ax = b over and over for many values of b, it's still not a good idea to invert A. You can factor A (say LU factorization or Cholesky factorization) and save the factors so you're not redoing that work every time, but you'd still solve the system each time using the factorization.
You might consider doubling to 1.31 before doing your normal algorithm. It'll double the number of multiplications, but you're doing a matrix invert and anything you do is going to be pretty tied to the multiplier in your processor.
For anyone interested in finding the equations for a 4x4 invert, you can use a symbolic math package to resolve them for you. The TI-89 will do it even, although it'll take several minutes.
If you give us an idea of what the matrix invert does for you, and how it fits in with the rest of your processing we might be able to suggest alternatives.
-Adam
Let me ask a different question: do you definitely need to invert the matrix (call it M), or do you need to use the matrix inverse to solve other equations? (e.g. Mx = b for known M, b) Often there are other ways to do this w/o explicitly needing to calculate the inverse. Or if the matrix M is a function of time & it changes slowly then you could calculate the full inverse once, & there are iterative ways to update it.
If the matrix represents an affine transformation (many times this is the case with 4x4 matrices so long as you don't introduce a scaling component) the inverse is simply the transpose of the upper 3x3 rotation part with the last column negated. Obviously if you require a generalized solution then looking into Gaussian elimination is probably the easiest.