In my CUDA code I am using the cusparse<t>gtsv() functions (specifically cusparseZgtsv and cusparseZgtsvStridedBatch).
The documentation says that this function solves the equation A*x = alpha*B. My question is: what is alpha? I didn't find it among the input parameters and have no idea how to specify it. Is it always equal to 1?
I performed some testing: I solved random systems of equations whose tridiagonal matrices were always diagonally dominant, and checked each solution by direct matrix-vector multiplication.
It looks like alpha is always 1 in the current version, so one can simply ignore it. I suspect it will be added as an input parameter in a future release.
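For reference, here is roughly what the call looks like (a sketch of the legacy cuSPARSE API from memory; check your toolkit's headers, since these routines were later superseded by the gtsv2 variants). Note that there is no alpha argument anywhere in the signature:

#include <cusparse_v2.h>

/* Sketch: solve a tridiagonal system with the legacy cusparseZgtsv.
   dl, d, du: device arrays holding the sub-, main- and super-diagonals;
   B: device array holding the right-hand sides, overwritten with the solution. */
cusparseStatus_t solve_tridiag(cusparseHandle_t handle, int m, int n,
                               const cuDoubleComplex *dl,
                               const cuDoubleComplex *d,
                               const cuDoubleComplex *du,
                               cuDoubleComplex *B, int ldb)
{
    /* on success B contains X, i.e. the solve behaves as if alpha == 1 */
    return cusparseZgtsv(handle, m, n, dl, d, du, B, ldb);
}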
When working with PyTorch, I see two functions in torch.nn.utils: clip_grad_norm and clip_grad_norm_.
I wanted to know the difference, so I checked the documentation, but my search only turned up clip_grad_norm_, not clip_grad_norm.
So I'm here to ask if anyone knows the difference.
PyTorch uses the trailing underscore convention for in-place operations. So the difference is that the one with an underscore modifies the tensor in place, while the other one leaves the original tensor unmodified and returns a new tensor.
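A minimal illustration of the in-place version (the model and max_norm value here are just placeholders):

import torch
import torch.nn as nn

model = nn.Linear(10, 1)
loss = model(torch.randn(4, 10)).sum()
loss.backward()

# clip_grad_norm_ rescales the parameters' gradients in place so that their
# combined norm is at most max_norm; it returns the norm it measured.
total_norm = nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
print(total_norm)  # the total gradient norm before clipping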
I have a regression that I can run for example as
reghdfe y, a(x1_est=x1 x2_est=x2)
which will store the estimated coefficients in x1_est and x2_est. Now, the issue is that using absorb() does not allow me to get the standard errors for these coefficients. If I understand it correctly, no postestimation method of reghdfe allows me to retrieve those.
Luckily, I only care about the standard errors of x1. So, I could instead run
reg y i.x1, a(x2)
and inspect _se[x1]. Unfortunately, x1 has so many different levels that it cannot be stored as an integer; it has to be a double. The previous regression therefore fails with x1: factor variables may not contain noninteger values.
What could be another approach to get standard errors for x1?
With a large number of fixed effects, Stata's default approaches won't work. One angle is to bootstrap the fixed effects and compute standard errors from the replications. Again, the issue is that there are so many fixed effects that standard bootstrapping methods won't work (they cannot return such a large matrix in each replication).
Essentially, to bootstrap the FE, one would (for a large number of iterations)
preserve
bsample
run the regression: reghdfe y, a(x1_est=x1 x2_est=x2)
Store x1_est in a .dta file
restore
After the loop is done, append all the .dta files and compute the standard errors, as sketched below.
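A hedged sketch of that loop (the number of replications and the fe_boot_ file names are arbitrary choices; adapt them to your data):

* Sketch: bootstrap the estimated fixed effects for x1
forvalues i = 1/500 {
    preserve
    bsample
    reghdfe y, a(x1_est=x1 x2_est=x2)
    keep x1 x1_est
    duplicates drop                     // one row per level of x1
    save "fe_boot_`i'.dta", replace
    restore
}

* Append the replications and compute a standard error per level of x1
use "fe_boot_1.dta", clear
forvalues i = 2/500 {
    append using "fe_boot_`i'.dta"
}
collapse (sd) se_x1 = x1_est, by(x1)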
I am attempting to fit a circle to some data. This requires numerically solving a set of three non-linear simultaneous equations (see the Full Least Squares Method of this document).
To me it seems that the NEWTON function provided by IDL is suited to this problem. NEWTON requires the name of a function that will compute the values of the equation system for particular values of the independent variables:
FUNCTION newtfunction,X
RETURN, [Some function of X, Some other function of X]
END
While this works fine, it requires that all parameters of the equation system (in this case the set of data points) are hard-coded in newtfunction. This is fine if there is only one data set to solve for; however, I have many thousands of data sets, and defining a new function for each by hand is not an option.
Is there a way around this? Is it possible to define functions programmatically in IDL, or even just pass in the data set in some other manner?
I am not an expert on this matter, but if I were to solve this problem I would do the following. Instead of solving a system of 3 non-linear equations to find the three unknowns (i.e. xc, yc and r), I would use an optimization routine to converge to a solution from an initial guess. For this, steepest descent, conjugate gradient, or any other multivariate optimization method can be used.
I just quickly derived the least-squares objective for your problem as (please check before use):
F = sum_{i=1}^{N} ( (x_i - xc)^2 + (y_i - yc)^2 - r^2 )^2
Calculating the gradient of this function is fairly easy, since it is just a summation, so writing a steepest descent code to compute xc, yc and r would be trivial.
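A rough IDL sketch of that descent (the step size eta is ad hoc, the gradient terms follow from differentiating F above, and circle_fit_sd is just an illustrative name):

PRO circle_fit_sd, x, y, xc, yc, r
  ; initial guess: centroid of the points, mean distance as radius
  xc = MEAN(x) & yc = MEAN(y)
  r = MEAN(SQRT((x - xc)^2 + (y - yc)^2))
  eta = 1e-6                                     ; ad hoc step size
  FOR k = 0, 9999 DO BEGIN
    g = (x - xc)^2 + (y - yc)^2 - r^2            ; per-point residual
    xc = xc - eta * (-4.0 * TOTAL(g * (x - xc))) ; dF/dxc
    yc = yc - eta * (-4.0 * TOTAL(g * (y - yc))) ; dF/dyc
    r  = r  - eta * (-4.0 * r * TOTAL(g))        ; dF/dr
  ENDFOR
END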
I hope it helps.
It's usual to use a COMMON block in these types of functions to pass in other parameters, cached values, etc. that are not part of the calling signature of the numeric routine.
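For example (the block and variable names are illustrative; the placeholder body mirrors the snippet above):

FUNCTION newtfunction, X
  COMMON newtdata, xdata, ydata      ; data set shared with the caller
  ; X = [xc, yc, r]; build the three equations from xdata, ydata and X
  RETURN, [Some function of X and the data, ...]
END

PRO fit_one_dataset, xpts, ypts, guess, result
  COMMON newtdata, xdata, ydata
  xdata = xpts                       ; stash this data set where newtfunction can see it
  ydata = ypts
  result = NEWTON(guess, 'newtfunction')
END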
How do I compute the maximum of a smooth function defined on [a,b] in Fortran?
For simplicity, say a polynomial function.
The background is that almost all numerical fluxes (a concept in numerical PDEs) involve computing the maximum of a certain function over an interval [a,b].
For a 1-D problem with smooth and readily computed derivatives, use Newton-Raphson to find the zeros of the first derivative, then compare the values at those critical points with the values at the endpoints a and b, since the maximum need not be interior.
For multiple dimensions with readily computed derivatives, you're better off using a method that approximates the Hessian. There are several methods of this type, but I've found the L-BFGS method to be reliable and efficient. There's a convenient, BSD-licensed package provided by a group at Northwestern University. There's also quite a bit of well-tested code at http://www.netlib.org/
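Going back to the 1-D case, here is a minimal Fortran sketch (the cubic, the interval and the iteration count are arbitrary examples):

program max_on_interval
  implicit none
  real(8) :: a, b, x
  integer :: k
  a = 0.0d0
  b = 2.0d0
  x = 0.5d0 * (a + b)                ! initial guess inside [a,b]
  do k = 1, 50                       ! Newton-Raphson on f'(x)
     if (fpp(x) == 0.0d0) exit
     x = x - fp(x) / fpp(x)
     x = max(a, min(b, x))           ! keep the iterate inside [a,b]
  end do
  ! the maximum is at the critical point or at an endpoint
  print *, 'max of f on [a,b]:', max(f(a), f(b), f(x))
contains
  real(8) function f(t)              ! example: f(t) = t^3 - 2 t^2 + t
    real(8), intent(in) :: t
    f = t**3 - 2.0d0*t**2 + t
  end function f
  real(8) function fp(t)             ! f'(t)
    real(8), intent(in) :: t
    fp = 3.0d0*t**2 - 4.0d0*t + 1.0d0
  end function fp
  real(8) function fpp(t)            ! f''(t)
    real(8), intent(in) :: t
    fpp = 6.0d0*t - 4.0d0
  end function fpp
end program max_on_interval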
I want to invert a 4x4 matrix. My numbers are stored in fixed-point format (1.15.16 to be exact).
With floating-point arithmetic I usually just build the adjoint matrix and divide by the determinant (i.e. brute-force the solution). That has worked for me so far, but when dealing with fixed-point numbers I get an unacceptable precision loss due to all of the multiplications used.
Note: in fixed-point arithmetic I always throw away some of the least significant bits of intermediate results.
So, what's the most numerically stable way to invert a matrix? I don't care much about performance, but simply switching to floating point would be too slow on my target architecture.
Meta-answer: Is it really a general 4x4 matrix? If your matrix has a special form, then there are direct formulas for inverting that would be fast and keep your operation count down.
For example, if it's a standard homogeneous coordinate transform from graphics, like:
[ux vx wx tx]
[uy vy wy ty]
[uz vz wz tz]
[ 0 0 0 1]
(assuming a composition of rotation and translation matrices; if a scale is folded into u, v, w, each row of the inverse, including its -dot term, must also be divided by the squared length of the corresponding vector)
then there's an easily-derivable direct formula, which is
[ux uy uz -dot(u,t)]
[vx vy vz -dot(v,t)]
[wx wy wz -dot(w,t)]
[ 0 0 0 1 ]
(ASCII matrices stolen from the linked page.)
You probably can't beat that for loss of precision in fixed point.
If your matrix comes from some domain where you know it has more structure, then there's likely to be an easy answer.
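A small C sketch of that direct formula for the rotation-plus-translation case (plain float for readability, but the structure carries over to 1.15.16 fixed point; invert_rigid is my name, not from the linked page):

/* Invert [u v w t; 0 0 0 1] with orthonormal u, v, w:
   the inverse is [R^T | -R^T t], exactly the matrix shown above. */
void invert_rigid(const float m[4][4], float out[4][4])
{
    for (int i = 0; i < 3; ++i) {
        for (int j = 0; j < 3; ++j)
            out[i][j] = m[j][i];              /* transpose the 3x3 part */
        out[i][3] = -(m[0][i] * m[0][3] +     /* -dot(basis vector i, t) */
                      m[1][i] * m[1][3] +
                      m[2][i] * m[2][3]);
    }
    out[3][0] = out[3][1] = out[3][2] = 0.0f;
    out[3][3] = 1.0f;
}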
I think the answer to this depends on the exact form of the matrix. A standard decomposition method (LU, QR, Cholesky, etc.) with pivoting (essential) is fairly good in fixed point, especially for a small 4x4 matrix. See the book 'Numerical Recipes' by Press et al. for a description of these methods.
This paper gives some useful algorithms, but is behind a paywall unfortunately. They recommend a (pivoted) Cholesky decomposition with some additional features too complicated to list here.
I'd like to second the question Jason S raised: are you certain that you need to invert your matrix? Inversion is almost never necessary. Not only that, it is often a bad idea: if you need to solve Ax = b, it is more numerically stable to solve the system directly than to multiply b by the inverse of A.
Even if you have to solve Ax = b over and over for many values of b, it's still not a good idea to invert A. You can factor A (say LU factorization or Cholesky factorization) and save the factors so you're not redoing that work every time, but you'd still solve the system each time using the factorization.
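To make that concrete, here is a hedged C sketch of a direct solve of Ax = b for a 4x4 system via Gaussian elimination with partial pivoting (float for readability; A and b are overwritten, and solve4 is just an illustrative name):

#include <math.h>

/* Solve A x = b in place; returns 0 on success, -1 if A is singular. */
int solve4(float A[4][4], float b[4], float x[4])
{
    for (int k = 0; k < 4; ++k) {
        int p = k;                            /* partial pivoting: pick the */
        for (int i = k + 1; i < 4; ++i)       /* largest |A[i][k]| below k  */
            if (fabsf(A[i][k]) > fabsf(A[p][k])) p = i;
        if (A[p][k] == 0.0f) return -1;       /* singular (or nearly so)    */
        for (int j = 0; j < 4; ++j) {         /* swap rows k and p          */
            float t = A[k][j]; A[k][j] = A[p][j]; A[p][j] = t;
        }
        float t = b[k]; b[k] = b[p]; b[p] = t;
        for (int i = k + 1; i < 4; ++i) {     /* eliminate below the pivot  */
            float f = A[i][k] / A[k][k];
            for (int j = k; j < 4; ++j) A[i][j] -= f * A[k][j];
            b[i] -= f * b[k];
        }
    }
    for (int i = 3; i >= 0; --i) {            /* back substitution          */
        x[i] = b[i];
        for (int j = i + 1; j < 4; ++j) x[i] -= A[i][j] * x[j];
        x[i] /= A[i][i];
    }
    return 0;
}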
You might consider doubling to 1.31 before doing your normal algorithm. It'll double the number of multiplications, but you're doing a matrix invert and anything you do is going to be pretty tied to the multiplier in your processor.
For anyone interested in the closed-form equations for a 4x4 inverse, you can use a symbolic math package to derive them for you. Even a TI-89 will do it, although it takes several minutes.
If you give us an idea of what the matrix invert does for you, and how it fits in with the rest of your processing we might be able to suggest alternatives.
-Adam
Let me ask a different question: do you definitely need to invert the matrix (call it M), or do you need to use the matrix inverse to solve other equations? (e.g. Mx = b for known M, b) Often there are other ways to do this w/o explicitly needing to calculate the inverse. Or if the matrix M is a function of time & it changes slowly then you could calculate the full inverse once, & there are iterative ways to update it.
If the matrix represents a rigid transformation (often the case with 4x4 matrices, so long as you don't introduce a scaling component), the inverse is simply the transpose of the upper 3x3 rotation part, with the translation column replaced by minus that transpose applied to the original translation (the formula shown above). Obviously, if you require a generalized solution, then looking into Gaussian elimination is probably the easiest.