Suppose I have two 5000 x 1000 matrices, A and B. Will octave compute trace(A*B') efficiently, i.e. in a way that only requires 5000 inner products as opposed to 5000*5000 inner products most of which will not be used?
And, what if the argument to trace is more complicated, i.e.: trace(A*B' + C*D')? Does that change anything?
Octave will compute the complete matrix product A*B' before applying trace().
A more efficient approach is sum(sum(A.*conj(B),2)). The inner sum computes the diagonal of A*B', since (A*B')(i,i) = sum_j A(i,j)*conj(B(i,j)); the outer sum then adds up that diagonal.
A probably even more efficient approach is to do both sums in one step via sum((A.*conj(B))(:)).
trace(A*B' + C*D') can likewise be computed efficiently as sum((A.*conj(B) + C.*conj(D))(:)).
No, the product will be evaluated before the call to trace(). One efficient implementation is to compute only the diagonal entries of that matrix product, i.e. the row-wise inner products of A and B, and then sum them: sum(sum(A .* conj(B), 2)); for your second example: sum(sum(A .* conj(B) + C .* conj(D), 2))
Note that you can shorten both expressions slightly, and possibly gain a bit of speed at the loss of readability and Matlab compatibility, by indexing the intermediate result directly: sum((A .* conj(B))(:));
So, I have a vector of values that corresponds to a given feature vector (same dimensionality). Is there a package in Julia that provides a mathematical function fitting these data points as a function of the original feature? In other words, I have x and y (both vectors) and need to find a decent mapping between the two, even if it's a highly complex one. The output of this process should be a symbolic formula that connects x and y, e.g. (:x)^3 + log(:x) - 4.2454. It's fine if it's just a polynomial approximation.
I imagine this is a walk in the park if you employ Genetic Programming, but I'd rather opt for a simpler (and faster) approach, if it's available. Thanks
Turns out the Polynomials.jl package includes the polyfit function, which does Lagrange interpolation. A usage example:
using Polynomials # install with Pkg.add("Polynomials")
x = [1,2,3] # demo x
y = [10,12,4] # demo y
polyfit(x,y)
The last line returns:
Poly(-2.0 + 17.0x - 5.0x^2)
which evaluates to the correct values.
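As a quick check, substituting the sample points into the returned polynomial reproduces y exactly:
p(1) = -2 + 17 - 5 = 10
p(2) = -2 + 34 - 20 = 12
p(3) = -2 + 51 - 45 = 4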
The polyfit function accepts an optional maximum degree for the output polynomial, but defaults to one less than the length of the input vectors x and y. That is exactly the degree of the Lagrange interpolating polynomial, and since two polynomials of degree at most n-1 that agree on n points must be identical (a basic theorem), the result is the Lagrange polynomial itself, and the only polynomial of that degree passing through the given points.
Thanks to the developers of Polynomials.jl, who made it so that all I had to do was google my way to an answer.
Take a look at MARS regression (multivariate adaptive regression splines).
I'd like to know how to plot power series (whose variable is x), but I don't even know where to start with.
I know it might not be possible to plot an infinite series, but plotting the sum of the first n terms would do just as well.
Gnuplot has a sum function which can be used inside the using statement to sum up several columns or terms. Together with the special filename '+', you can implement power series.
Consider the exponential function, which has the power series
exp(x) = \sum_{n=0}^\infty x^n/n!
So, we define a term as
term(x, n) = x**n/n!
Now we can plot the power series up to the n=5 term with
set xrange [0:4]
term(x, n) = x**n/n!
set samples 20
plot '+' using 1:(sum [n=0:5] term($1, n))
To plot the results when using 2 to 7 terms and compare it with the actual exp function, use
term(x, n) = x**n/n!
set xrange [-2:2]
set samples 41
set key left
plot exp(x), for [i=1:6] '+' using 1:(sum[t=0:i] term($1, t)) title sprintf('%d terms', i)
The easiest way that I can think of is to generate a file that has a column of x-values and a column of f(x) values, then just plot the table like you would any other data. A power series is continuous, so you can just connect the dots and have a fairly accurate representation (provided your dots are close enough together). Also, when evaluating f(x), you just sum up the first N terms (where N is big enough). Big enough means that the sum of the rest of the terms is smaller than whatever error you allow. (*If you want 3 good digits, then N needs to be large enough that the remaining sum is smaller than .001.)
You can pull out a Calc II textbook to learn how to bound the error on the tail of the sum. A lot of calculus classes cover it briefly, but students tend to feel like the error estimates are pointless (I know because I've taught the course a few times). As an example, if you have an alternating series whose terms decrease in absolute value, then the absolute value of the first term you omit (don't sum) is an upper bound on your error; for exp(-1) = \sum (-1)^n/n!, truncating after the n = 5 term gives an error of at most 1/6! ≈ 0.0014.
*This statement is not 100% true; it is slightly oversimplified, but it is correct for most practical purposes.
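As a rough sketch of that approach (the term count N, the x range, and the step size are placeholders to adjust), a small C program can write x and the N-term partial sum of the exponential series as two columns; redirect its output to a file and plot it as ordinary data:

#include <stdio.h>

/* Print x and the N-term partial sum of exp(x) = sum_{n>=0} x^n/n!
   as two whitespace-separated columns. */
int main(void)
{
    const int N = 8;                        /* number of terms to keep */
    for (int i = -20; i <= 20; ++i) {
        double x = 0.1 * i;
        double term = 1.0;                  /* the n = 0 term, x^0/0! */
        double sum = term;
        for (int n = 1; n < N; ++n) {
            term *= x / n;                  /* builds x^n/n! incrementally */
            sum += term;
        }
        printf("%g %g\n", x, sum);
    }
    return 0;
}

Redirect the output to a file of your choosing (say series.dat), then plot 'series.dat' with lines and optionally overlay exp(x) to see how quickly the truncated series converges on this range.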
I have a working detection and tracking process (pixel image in rows and columns) that does not give perfectly repeatable results, because its use of atomicAdd means that data points can be accumulated in different orders, leading to round-off errors in the calculation of centroids and other track statistics.
In the main there are few clashes for the atomicAdd, so most results are identical. However, for verification and validation I need to be able to make the atomicAdd add these clashing data points in a consistent order, such that, say, thread 3 will beat thread 10 when both want to use the atomicAdd to add a pixel on the row N they are processing.
Is there a mechanism that allows the atomicAdd to be deterministic in its thread order, or have I missed something?
Check out "Fast Reproducible Atomic Summations" paper from Berkeley.
http://www.eecs.berkeley.edu/~hdnguyen/public/papers/ARITH21_Fast_Sum.pdf
But basically, you could try something like this: accumulate a sum of absolute values alongside your original sum, scale it by a factor on the order of N^2, and then subtract it from and add it back to your original sum (sum = (sum - sumAbs * N^2) + sumAbs * N^2) to cancel out the lowest-order bits (which are the non-deterministic ones). As you can see, the error bound grows in proportion to N^2, so the smaller N (the number of elements in the sum), the better your error bound.
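A minimal sketch of that pre-rounding idea in C; the function name, the exact scale factor, and the use of volatile to keep the compiler from folding the add/subtract away are my own assumptions, not details from the paper:

/* Discard the low-order (order-dependent) bits of `sum` once accumulation is
   finished. `sumAbs` is the sum of absolute values accumulated alongside
   `sum`, and n is the number of elements added. Compile without -ffast-math
   so the subtract/add pair is not optimised away. */
double truncate_low_bits(double sum, double sumAbs, int n)
{
    volatile double k = sumAbs * (double)n * (double)n;  /* dominates the accumulated error */
    volatile double shifted = sum - k;                    /* rounds off the indeterministic low bits */
    return shifted + k;
}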
You could also try Kahan summation to reduce the error bound in conjunction with the above.
I have two codes that theoretically should return the exact same output. However, this does not happen. The issue is that the two codes handle very small numbers (doubles), on the order of 1e-100 or so. I suspect that there could be some numerical issues related to that, which lead to the two outputs being different even though they should theoretically be the same.
Does it indeed make sense that handling numbers on the order of 1e-100 cause such problems? I don't mind the difference in output, if I could safely assume that the source is numerical issues. Does anyone have a good source/reference that talks about issues that come up with stability of algorithms when they handle numbers in such order?
Thanks.
Does anyone have a good source/reference that talks about issues that come up with stability of algorithms when they handle numbers in such order?
The first reference that comes to mind is What Every Computer Scientist Should Know About Floating-Point Arithmetic. It covers floating-point maths in general.
As far as numerical stability is concerned, the best references probably depend on the numerical algorithm in question. Two wide-ranging works that come to mind are:
Numerical Recipes by Press et al.;
Matrix Computations by Golub and Van Loan.
It is not necessarily the small numbers that are causing the problem.
How do you check whether the outputs are the "exact same"?
I would check equality with tolerance. You may consider the floating point numbers x and y equal if either fabs(x-y) < 1.0e-6 or fabs(x-y) < fabs(x)*1.0e-6 holds.
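For instance, a minimal sketch of that check in C (the 1.0e-6 thresholds are the illustrative values above and should be tuned to your data):

#include <math.h>
#include <stdbool.h>

/* Accept x and y as "equal" if they agree to an absolute or a relative
   tolerance of 1.0e-6. */
bool nearly_equal(double x, double y)
{
    double diff = fabs(x - y);
    return diff < 1.0e-6 || diff < fabs(x) * 1.0e-6;
}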
Usually, there is a HUGE difference between the two algorithms if there are numerical issues. Often, a small change in the input may result in extreme changes in the output, if the algorithm suffers from numerical issues.
What makes you think that there are "numerical issues"?
If possible, change your algorithm to use Kahan Summation (aka compensated summation). From Wikipedia:
function KahanSum(input)
    var sum = 0.0
    var c = 0.0                // A running compensation for lost low-order bits.
    for i = 1 to input.length do
        y = input[i] - c       // So far, so good: c is zero.
        t = sum + y            // Alas, sum is big, y small, so low-order digits of y are lost.
        c = (t - sum) - y      // (t - sum) recovers the high-order part of y; subtracting y recovers -(low part of y)
        sum = t                // Algebraically, c should always be zero. Beware eagerly optimising compilers!
                               // Next time around, the lost low part will be added to y in a fresh attempt.
    return sum
This works by keeping a second running total of the cumulative error, similar to the Bresenham line drawing algorithm. The end result is that you get precision that is nearly double the data type's advertised precision.
Another technique I use is to sort my numbers from smallest to largest magnitude (ignoring sign) and add or subtract the small numbers first, then the larger ones. This has the virtue that if you add and subtract the same value multiple times, such numbers may cancel exactly and can be removed from the list.
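A minimal sketch of that sort-then-sum idea in C; the function names and the choice to sort the caller's array in place are mine, purely for illustration:

#include <math.h>
#include <stdlib.h>

/* Ascending order of magnitude, ignoring sign. */
int by_magnitude(const void *pa, const void *pb)
{
    double a = fabs(*(const double *)pa);
    double b = fabs(*(const double *)pb);
    return (a > b) - (a < b);
}

/* Sort the terms by magnitude, then accumulate smallest first so that small
   contributions are not swamped by an already-large running sum. */
double sum_sorted(double *terms, size_t n)
{
    qsort(terms, n, sizeof *terms, by_magnitude);
    double sum = 0.0;
    for (size_t i = 0; i < n; ++i)
        sum += terms[i];
    return sum;
}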
I want to invert a 4x4 matrix. My numbers are stored in fixed-point format (1.15.16 to be exact).
With floating-point arithmetic I usually just build the adjoint matrix and divide by the determinant (e.g. brute force the solution). That worked for me so far, but when dealing with fixed point numbers I get an unacceptable precision loss due to all of the multiplications used.
Note: in fixed-point arithmetic I always throw away some of the least significant bits of intermediate results.
So - what's the most numerically stable way to invert a matrix? I don't care much about performance, but simply switching to floating point would be too slow on my target architecture.
Meta-answer: Is it really a general 4x4 matrix? If your matrix has a special form, then there are direct formulas for inverting that would be fast and keep your operation count down.
For example, if it's a standard homogeneous coordinate transform from graphics, like:
[ux vx wx tx]
[uy vy wy ty]
[uz vz wz tz]
[ 0 0 0 1]
(assuming a composition of rotation and translation only, with no scaling, so that the columns u, v and w are orthonormal)
then there's an easily-derivable direct formula, which is
[ux uy uz -dot(u,t)]
[vx vy vz -dot(v,t)]
[wx wy wz -dot(w,t)]
[ 0 0 0 1 ]
(ASCII matrices stolen from the linked page.)
You probably can't beat that for loss of precision in fixed point.
If your matrix comes from some domain where you know it has more structure, then there's likely to be an easy answer.
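A minimal sketch of that direct formula in C, using plain float for clarity (a fixed-point port would substitute the asker's own multiply); it assumes row-major storage with the orthonormal columns u, v, w in the first three columns and the translation t in the fourth, exactly as laid out above:

/* Invert [u v w t; 0 0 0 1] where u, v, w are orthonormal columns:
   the 3x3 block is transposed and the new translation is -R^T * t,
   i.e. the -dot(u,t), -dot(v,t), -dot(w,t) entries from the formula. */
void invert_rigid(const float m[4][4], float out[4][4])
{
    for (int i = 0; i < 3; ++i)
        for (int j = 0; j < 3; ++j)
            out[i][j] = m[j][i];               /* transpose the rotation block */

    for (int i = 0; i < 3; ++i)                /* column i of m dotted with t, negated */
        out[i][3] = -(m[0][i] * m[0][3] + m[1][i] * m[1][3] + m[2][i] * m[2][3]);

    out[3][0] = 0.0f; out[3][1] = 0.0f; out[3][2] = 0.0f;
    out[3][3] = 1.0f;
}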
I think the answer to this depends on the exact form of the matrix. A standard decomposition method (LU, QR, Cholesky, etc.) with pivoting (which is essential) is fairly good in fixed point, especially for a small 4x4 matrix. See the book 'Numerical Recipes' by Press et al. for a description of these methods.
This paper gives some useful algorithms, but is behind a paywall unfortunately. They recommend a (pivoted) Cholesky decomposition with some additional features too complicated to list here.
I'd like to second the question Jason S raised: are you certain that you need to invert your matrix? This is almost never necessary. Not only that, it is often a bad idea. If you need to solve Ax = b, it is more numerically stable to solve the system directly than to multiply b by A inverse.
Even if you have to solve Ax = b over and over for many values of b, it's still not a good idea to invert A. You can factor A (say LU factorization or Cholesky factorization) and save the factors so you're not redoing that work every time, but you'd still solve the system each time using the factorization.
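To make that concrete, here is a minimal sketch of the factor-once, solve-many approach for a 4x4 system, written in double precision for readability (a fixed-point port would still be needed); it is a plain LU factorization with partial pivoting followed by forward and back substitution, my own illustration rather than code from any of the references above:

#include <math.h>

#define N 4

/* Factor A in place into L (unit lower, stored below the diagonal) and U
   (upper), recording the row swaps in piv. Returns 0 on success, -1 if singular. */
int lu_factor(double a[N][N], int piv[N])
{
    for (int k = 0; k < N; ++k) {
        int p = k;                              /* pick the largest pivot in column k */
        for (int i = k + 1; i < N; ++i)
            if (fabs(a[i][k]) > fabs(a[p][k]))
                p = i;
        piv[k] = p;
        if (a[p][k] == 0.0)
            return -1;
        if (p != k)                             /* swap rows k and p */
            for (int j = 0; j < N; ++j) {
                double t = a[k][j]; a[k][j] = a[p][j]; a[p][j] = t;
            }
        for (int i = k + 1; i < N; ++i) {
            a[i][k] /= a[k][k];                 /* multiplier, stored in the L part */
            for (int j = k + 1; j < N; ++j)
                a[i][j] -= a[i][k] * a[k][j];
        }
    }
    return 0;
}

/* Solve A*x = b using the stored factors; b is overwritten with x. */
void lu_solve(const double a[N][N], const int piv[N], double b[N])
{
    for (int k = 0; k < N; ++k) {               /* apply swaps and forward-substitute */
        double t = b[k]; b[k] = b[piv[k]]; b[piv[k]] = t;
        for (int i = k + 1; i < N; ++i)
            b[i] -= a[i][k] * b[k];
    }
    for (int i = N - 1; i >= 0; --i) {          /* back-substitute with U */
        for (int j = i + 1; j < N; ++j)
            b[i] -= a[i][j] * b[j];
        b[i] /= a[i][i];
    }
}

You would call lu_factor once for A and then lu_solve once per right-hand side b.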
You might consider doubling to 1.31 before doing your normal algorithm. It'll double the number of multiplications, but you're doing a matrix invert and anything you do is going to be pretty tied to the multiplier in your processor.
For anyone interested in finding the equations for a 4x4 inverse, you can use a symbolic math package to derive them for you. Even the TI-89 will do it, although it'll take several minutes.
If you give us an idea of what the matrix inverse does for you, and how it fits in with the rest of your processing, we might be able to suggest alternatives.
-Adam
Let me ask a different question: do you definitely need to invert the matrix (call it M), or do you need to use the matrix inverse to solve other equations? (e.g. Mx = b for known M and b) Often there are other ways to do this without explicitly calculating the inverse. Or, if the matrix M is a function of time and changes slowly, you could calculate the full inverse once, and there are iterative ways to update it.
If the matrix represents a rigid transformation (often the case with 4x4 matrices, as long as you don't introduce a scaling component), the inverse is simply the transpose of the upper 3x3 rotation part, with the translation column replaced by that transpose applied to the negated original translation (the -dot(u,t) terms in the formula above). Obviously, if you require a generalized solution, then looking into Gaussian elimination is probably the easiest.