Solving Kepler's Equation computationally - language-agnostic

I'm trying to solve Kepler's Equation as a step towards finding the true anomaly of an orbiting body given time. It turns out though, that Kepler's equation is difficult to solve, and the wikipedia page describes the process using calculus. Well, I don't know calculus, but I understand that solving the equation involves an infinite number of sets which produce closer and closer approximations to the correct answer.
I can't see from looking at the math how to do this computationally, so I was hoping someone with a better maths background could help me out. How can I solve this beast computationally?
FWIW, I'm using F# -- and I can calculate the other elements necessary for this equation, it's just this part I'm having trouble with.
I'm also open to methods which approximate the true anomaly given time, periapsis distance, and eccentricity

This paper:
A Practical Method for Solving the Kepler Equation http://murison.alpheratz.net/dynamics/twobody/KeplerIterations_summary.pdf
shows how to solve Kepler's equation using an iterative computing method. It should be fairly straightforward to translate it to the language of your choice.
You might also find this interesting. It's an ocaml program, part of which claims to contain a Kepler Equation solver. Since F# is in the ML family of languages (as is ocaml), this might provide a good starting point.

wanted to drop a reply in here in case this page gets found by anyone else looking for similar materials.
The following was written as an "expression" in Adobe's After Effects software, so it's javascriptish, although I have a Python version for a different app (cinema 4d). The idea is the same: execute Newton's method iteratively until some arbitrary precision is reached.
Please note that I'm not posting this code as exemplary or meaningfully efficient in any way, just posting code we produced on a deadline to accomplish a specific task (namely, move a planet around a focus according to Kepler's laws, and do so accurately). We don't write code for a living, and so we're also not posting this for critique. Quick & dirty is what meets deadlines.
In After Effects, any "expression" code is executed once - for every single frame in the animation. This restricts what one can do when implementing many algorithms, due to the inability to address global data easily (other algorithms for Keplerian motion use interatively updated velocity vectors, an approach we couldn't use). The result the code leaves behind is the [x,y] position of the object at that instant in time (internally, this is the frame number), and the code is intended to be attached to the position element of an object layer on the timeline.
This code evolved out of material found at http://www.jgiesen.de/kepler/kepler.html, and is offered here for the next guy.
pi = Math.PI;
function EccAnom(ec,am,dp,_maxiter) {
// ec=eccentricity, am=mean anomaly,
// dp=number of decimal places
pi=Math.PI;
i=0;
delta=Math.pow(10,-dp);
var E, F;
// some attempt to optimize prediction
if (ec<0.8) {
E=am;
} else {
E= am + Math.sin(am);
}
F = E - ec*Math.sin(E) - am;
while ((Math.abs(F)>delta) && (i<_maxiter)) {
E = E - F/(1.0-(ec* Math.cos(E) ));
F = E - ec * Math.sin(E) - am;
i = i + 1;
}
return Math.round(E*Math.pow(10,dp))/Math.pow(10,dp);
}
function TrueAnom(ec,E,dp) {
S=Math.sin(E);
C=Math.cos(E);
fak=Math.sqrt(1.0-ec^2);
phi = 2.0 * Math.atan(Math.sqrt((1.0+ec)/(1.0-ec))*Math.tan(E/2.0));
return Math.round(phi*Math.pow(10,dp))/Math.pow(10,dp);
}
function MeanAnom(time,_period) {
curr_frame = timeToFrames(time);
if (curr_frame <= _period) {
frames_done = curr_frame;
if (frames_done < 1) frames_done = 1;
} else {
frames_done = curr_frame % _period;
}
_fractime = (frames_done * 1.0 ) / _period;
mean_temp = (2.0*Math.PI) * (-1.0 * _fractime);
return mean_temp;
}
//==============================
// a=semimajor axis, ec=eccentricity, E=eccentric anomaly
// delta = delta digits to exit, period = per., in frames
//----------------------------------------------------------
_eccen = 0.9;
_delta = 14;
_maxiter = 1000;
_period = 300;
_semi_a = 70.0;
_semi_b = _semi_a * Math.sqrt(1.0-_eccen^2);
_meananom = MeanAnom(time,_period);
_eccentricanomaly = EccAnom(_eccen,_meananom,_delta,_maxiter);
_trueanomaly = TrueAnom(_eccen,_eccentricanomaly,_delta);
r = _semi_a * (1.0 - _eccen^2) / (1.0 + (_eccen*Math.cos(_trueanomaly)));
x = r * Math.cos(_trueanomaly);
y = r * Math.sin(_trueanomaly);
_foc=_semi_a*_eccen;
[1460+x+_foc,540+y];

You could check this out, implemented in C# by Carl Johansen
Represents a body in elliptical orbit about a massive central body
Here is a comment from the code
True Anomaly in this context is the
angle between the body and the sun.
For elliptical orbits, it's a bit
tricky. The percentage of the period
completed is still a key input, but we
also need to apply Kepler's
equation (based on the eccentricity)
to ensure that we sweep out equal
areas in equal times. This
equation is transcendental (ie can't
be solved algebraically) so we
either have to use an approximating
equation or solve by a numeric method.
My implementation uses
Newton-Raphson iteration to get an
excellent approximate answer (usually
in 2 or 3 iterations).

Related

How to solve a second order differential equation on Scilab?

I need to solve this differential equation using Runge-Kytta 4(5) on Scilab:
The initial conditions are above. The interval and the h-step are:
I don't need to implement Runge-Kutta. I just need to solve this and plot the result on the plane:
I tried to follow these instructions on the official "Scilab Help":
https://x-engineer.org/graduate-engineering/programming-languages/scilab/solve-second-order-ordinary-differential-equation-ode-scilab/
The suggested code is:
// Import the diagram and set the ending time
loadScicos();
loadXcosLibs();
importXcosDiagram("SCI/modules/xcos/examples/solvers/ODE_Example.zcos");
scs_m.props.tf = 5000;
// Select the solver Runge-Kutta and set the precision
scs_m.props.tol(6) = 6;
scs_m.props.tol(7) = 10^-2;
// Start the timer, launch the simulation and display time
tic();
try xcos_simulate(scs_m, 4); catch disp(lasterror()); end
t = toc();
disp(t, "Time for Runge-Kutta:");
However, it is not clear for me how I can change this for the specific differential equation that I showed above. I have a very basic knowledge of Scilab.
The final plot should be something like the picture bellow, an ellipse:
Just to provide some mathematical context, this is the differential equation that describes the pendulum problem.
Could someone help me, please?
=========
UPDATE
Based on #luizpauloml comments, I am updating this post.
I need to convert the second-order ODE into a system of first-order ODEs and then I need to write a function to represent such system.
So, I know how to do this on pen and paper. Hence, using z as a variable:
OK, but how do I write a normal script?
The Xcos is quite disposable. I only kept it because I was trying to mimic the example on the official Scilab page.
To solve this, you need to use ode(), which can employ many methods, Runge-Kutta included. First, you need to define a function to represent the system of ODEs, and Step 1 in the link you provided shows you what to do:
function z = f(t,y)
//f(t,z) represents the sysmte of ODEs:
// -the first argument should always be the independe variable
// -the second argument should always be the dependent variables
// -it may have more than two arguments
// -y is a vector 2x1: y(1) = theta, y(2) = theta'
// -z is a vector 2x1: z(1) = z , z(2) = z'
z(1) = y(2) //first equation: z = theta'
z(2) = 10*sin(y(1)) //second equation: z' = 10*sin(theta)
endfunction
Notice that even if t (the independent variable) does not explicitly appear in your system of ODEs, it still needs to be an argument of f(). Now you just use ode(), setting the flag 'rk' or 'rkf' to use either one of the available Runge-Kutta methods:
ts = linspace(0,3,200);
theta0 = %pi/4;
dtheta0 = 0;
y0 = [theta0; dtheta0];
t0 = 0;
thetas = ode('rk',y0, t0, ts, f); //the output have the same order
//as the argument `y` of f()
scf(1); clf();
plot2d(thetas(2,:),thetas(1,:),-5);
xtitle('Phase portrait', 'theta''(t)','theta(t)');
xgrid();
The output:

Why that I have written for Andrew Ng's course not accepted?

Andrew Ng's course in Coursera, which Stanford's Machine Learning course, features programming assignments that deal with implementing the algorithms taught in class. The goal of this assignment is to implement linear regression through gradient descent with an input set of X, y, theta, alpha (learning rate), and number of iterations.
I implemented this solution in Octave, the prescribed language in the course.
function [theta, J_history] = gradientDescent(X, y, theta, alpha, num_iters)
m = length(y);
J_history = zeros(num_iters, 1);
numJ = size(theta, 1);
for iter = 1:num_iters
for i = 1:m
for j = 1:numJ
temp = theta(j) - alpha /m * X(i, j) * (((X * theta)(i, 1)) - y(i, 1));
theta(j) = temp
end
prediction = X * theta;
J_history(iter, 1) = computeCost(X,y,theta)
end
end
On the other hand, here is the cost function:
function J = computeCost(X, y, theta)
m = length(y);
J = 0;
prediction = X * theta;
error = (prediction - y).^2;
J = 1/(2 * m) .* sum(error);
end
This does not pass the submit() function. The submit() function simply validates the data through passing an unknown test case.
I have checked other questions on StackOverflow but I really don't get it. :)
Thank you very much!
Your gradient seems to be correct and as already pointed out in the answer given by #Kasinath P, it is likely that the problem is that the code is too slow. You just need to vectorize it. In Matlab/Octave, you usually need to avoid for loops (note that although you have parfor in Matlab, it is not yet available in octave). So it is always better, performance-wise, to write something like A*x instead of iterating over each row of A with a for loop. You can read about vectorization here.
If I understand correctly, X is a matrix of size m*numJ where m is the number of examples, and numJ is the number of features (or the dimension of the space where each point lies. In that case, you can rewrite your cost function as
(1/(2*m)) * (X*theta-y)'*(X*theta-y);%since ||v||_2^2=v'*v for any vector v in Euclidean space
Now, we know from basic matrix calculus that for any two vectors s and v that are functions from R^{num_J} to R^m, the Jacobian of s^{t}v is given by
s^{t}Jacobian(v)+v^{t}*Jacobian(s) %this Jacobian will have size 1*num_J.
Applying that to your cost function, we obtain
jacobian=(1/m)*(theta'*X'-y')*X;
So if you just replace
for i = 1:m
for j = 1:numJ
%%% theta(j) updates
end
end
with
%note that the gradient is the transpose of the Jacobian we've computed
theta-=alpha*(1/m)*X'*(X*theta-y)
you should see a great increase in performance.
your computecost code is correct
and Better follow the vectorized implementation of Gradient Descent.
You are just iterating and it is slow and may have error.
That course aims you to do vectorized implementation as it is simple and handy at the same time.
I knew this because I did that after sweating a lot.
Vectorization is good:)

how to see/extract model flows evolution when solve a model in a Scilab function?

I wrote the Lotka Voterra model (prey-predator) as a Scilab function and solved it with ODE. My problem is that I want to see the flows evolution. I found a solution by including flows in the resolution (in the script below only for one) but it’s really “heavy”. Anyone has a better solution?
//parameters
x=[0.04,0.005,0.3,0.2]
n = x(1);//birth rate of prey
c = x(2);//capture rate
e = x(3);//energy from capture
m = x(4);//death rate or predator
Tmax=100; // maximum time of simulation
dt=0.01; // step of time
t=0:dt:Tmax; //simulation time definition
Ci = [20,30,0]'; //initial conditions (including for 1 flow)
//Lotka Volterra model
function [dy]=LV(t, y, n, c, e, m)
GrowthP = n * y(1)
IngestC = c * y(1) * y(2)
MortC = m * y(2)
dy(1) = GrowthP - IngestC
dy(2) = IngestC * e - MortC
dy(3) = IngestC //here one flow in ode
endfunction
//resolution
sol=ode(Ci,0,t,LV)
//dataframe creation to stock data
soldef=zeros(3,10001);
//for the line 3 and form 2 to the last data it take the value,
//remove the one of the previous time step and store it in soldef
for i = 2:length(sol(3,:))
soldef(3,i)=sol(3,i)-sol(3,i-1);
end
//complete the dataframe with the stock
soldef(1:2,:)=sol(1:2,:)
Thanks for the interest and the time you give to my problem (and sorry for my English)
If you want animated display of your results, you can use the code below to plot more and more data on a graph:
[row,col]=size(soldef);
scf(0);
clf(0);
plot2d(soldef(:,1:2)',rect=[0,0,col,max(soldef)]); //plot first 2 points
en=gce(); //handle of the graphic entities
for i=3:col //plot more and more points
en.children(1).data=[1:i;soldef(3,1:i)]'; //update data matrices
en.children(2).data=[1:i;soldef(2,1:i)]';
en.children(3).data=[1:i;soldef(1,1:i)]';
//you can save the frames here...
xpause(1e4); //wait some time if amination is too fast
end
If you save all the graphs (with e.g.xs2gif()), with an external application (google for "gif maker") you can even make a .gif animation of your results to embedd in a webpage or presentation.

steering behavior and passed time

I'm trying to implement steering behaviors but I have a problem with "including" the passed time in the calculation and then allowing me to control the speed of the game. I have seen various sources of steering behaviors and I've come up with this (Arrive behavior):
var toTarget:Vector2D = new Vector2D( mEntity.targetPosition.x, mEntity.targetPosition.y );
toTarget.subtract( mEntity.position );
var dist:Number = toTarget.length;
toTarget.normalize().scale( mEntity.maxSpeed );
if( dist < slowDownDist ) {
toTarget.scale( dist / slowDownDist );
}
return toTarget.subtract( mEntity.velocity );
And here's the advanceTime method of MovingEntity:
var steeringForce:Vector2D = mSteering.calculate();
steeringForce.x /= mMass;
steeringForce.y /= mMass;
steeringForce.scale( time );
mVelocity.x += steeringForce.x;
mVelocity.y += steeringForce.y;
x += mVelocity.x * time;
y += mVelocity.y * time;
The steering force should be (at some point) directly opposite to the entity's velocity and thus making it stop. The problem I see and do not understand is that in order to simulate acceleration the force needs to be divided by mass and also to consider the passed time it needs to be multiplied by that time - but this scales the steering force down a lot and causes the entity to overshoot the target spot instead of stopping there so then it returns back which effectively causes oscillation. If I do not multiply the force with the time then the moving entity behaves slightly differently depending on the game speed.
By second (?) Newton's law F = m * a where F is force, m is mass and a is acceleration. Force is measured in newtons, mass in kilograms and acceleration in meters per second per second. So, you just need to apply bigger force, and lave the calculation as is.

Translation from Complex-FFT to Finite-Field-FFT

Good afternoon!
I am trying to develop an NTT algorithm based on the naive recursive FFT implementation I already have.
Consider the following code (coefficients' length, let it be m, is an exact power of two):
/// <summary>
/// Calculates the result of the recursive Number Theoretic Transform.
/// </summary>
/// <param name="coefficients"></param>
/// <returns></returns>
private static BigInteger[] Recursive_NTT_Skeleton(
IList<BigInteger> coefficients,
IList<BigInteger> rootsOfUnity,
int step,
int offset)
{
// Calculate the length of vectors at the current step of recursion.
// -
int n = coefficients.Count / step - offset / step;
if (n == 1)
{
return new BigInteger[] { coefficients[offset] };
}
BigInteger[] results = new BigInteger[n];
IList<BigInteger> resultEvens =
Recursive_NTT_Skeleton(coefficients, rootsOfUnity, step * 2, offset);
IList<BigInteger> resultOdds =
Recursive_NTT_Skeleton(coefficients, rootsOfUnity, step * 2, offset + step);
for (int k = 0; k < n / 2; k++)
{
BigInteger bfly = (rootsOfUnity[k * step] * resultOdds[k]) % NTT_MODULUS;
results[k] = (resultEvens[k] + bfly) % NTT_MODULUS;
results[k + n / 2] = (resultEvens[k] - bfly) % NTT_MODULUS;
}
return results;
}
It worked for complex FFT (replace BigInteger with a complex numeric type (I had my own)). It doesn't work here even though I changed the procedure of finding the primitive roots of unity appropriately.
Supposedly, the problem is this: rootsOfUnity parameter passed originally contained only the first half of m-th complex roots of unity in this order:
omega^0 = 1, omega^1, omega^2, ..., omega^(n/2)
It was enough, because on these three lines of code:
BigInteger bfly = (rootsOfUnity[k * step] * resultOdds[k]) % NTT_MODULUS;
results[k] = (resultEvens[k] + bfly) % NTT_MODULUS;
results[k + n / 2] = (resultEvens[k] - bfly) % NTT_MODULUS;
I originally made use of the fact, that at any level of recursion (for any n and i), the complex root of unity -omega^(i) = omega^(i + n/2).
However, that property obviously doesn't hold in finite fields. But is there any analogue of it which would allow me to still compute only the first half of the roots?
Or should I extend the cycle from n/2 to n and pre-compute all the m-th roots of unity?
Maybe there are other problems with this code?..
Thank you very much in advance!
I recently wanted to implement NTT for fast multiplication instead of DFFT too. Read a lot of confusing things, different letters everywhere and no simple solution, and also my finite fields knowledge is rusty , but today i finally got it right (after 2 days of trying and analog-ing with DFT coefficients) so here are my insights for NTT:
Computation
X(i) = sum(j=0..n-1) of ( Wn^(i*j)*x(i) );
where X[] is NTT transformed x[] of size n where Wn is the NTT basis. All computations are on integer modulo arithmetics mod p no complex numbers anywhere.
Important values
Wn = r ^ L mod p is basis for NTT
Wn = r ^ (p-1-L) mod p is basis for INTT
Rn = n ^ (p-2) mod p is scaling multiplicative constant for INTT ~(1/n)
p is prime that p mod n == 1 and p>max'
max is max value of x[i] for NTT or X[i] for INTT
r = <1,p)
L = <1,p) and also divides p-1
r,L must be combined so r^(L*i) mod p == 1 if i=0 or i=n
r,L must be combined so r^(L*i) mod p != 1 if 0 < i < n
max' is the sub-result max value and depends on n and type of computation. For single (I)NTT it is max' = n*max but for convolution of two n sized vectors it is max' = n*max*max etc. See Implementing FFT over finite fields for more info about it.
functional combination of r,L,p is different for different n
this is important, you have to recompute or select parameters from table before each NTT layer (n is always half of the previous recursion).
Here is my C++ code that finds the r,L,p parameters (needs modular arithmetics which is not included, you can replace it with (a+b)%c,(a-b)%c,(a*b)%c,... but in that case beware of overflows especial for modpow and modmul) The code is not optimized yet there are ways to speed it up considerably. Also prime table is fairly limited so either use SoE or any other algo to obtain primes up to max' in order to work safely.
DWORD _arithmetics_primes[]=
{
2,3,5,7,11,13,17,19,23,29,31,37,41,43,47,53,59,61,67,71,73,79,83,89,97,101,103,107,109,113,127,131,137,139,149,151,157,163,167,173,
179,181,191,193,197,199,211,223,227,229,233,239,241,251,257,263,269,271,277,281,283,293,307,311,313,317,331,337,347,349,353,359,367,373,379,383,389,397,401,409,
419,421,431,433,439,443,449,457,461,463,467,479,487,491,499,503,509,521,523,541,547,557,563,569,571,577,587,593,599,601,607,613,617,619,631,641,643,647,653,659,
661,673,677,683,691,701,709,719,727,733,739,743,751,757,761,769,773,787,797,809,811,821,823,827,829,839,853,857,859,863,877,881,883,887,907,911,919,929,937,941,
947,953,967,971,977,983,991,997,1009,1013,1019,1021,1031,1033,1039,1049,1051,1061,1063,1069,1087,1091,1093,1097,1103,1109,1117,1123,1129,1151,
0}; // end of table is 0, the more primes are there the bigger numbers and n can be used
// compute NTT consts W=r^L%p for n
int i,j,k,n=16;
long w,W,iW,p,r,L,l,e;
long max=81*n; // edit1: max num for NTT for my multiplication purposses
for (e=1,j=0;e;j++) // find prime p that p%n=1 AND p>max ... 9*9=81
{
p=_arithmetics_primes[j];
if (!p) break;
if ((p>max)&&(p%n==1))
for (r=2;r<p;r++) // check all r
{
for (l=1;l<p;l++)// all l that divide p-1
{
L=(p-1);
if (L%l!=0) continue;
L/=l;
W=modpow(r,L,p);
e=0;
for (w=1,i=0;i<=n;i++,w=modmul(w,W,p))
{
if ((i==0) &&(w!=1)) { e=1; break; }
if ((i==n) &&(w!=1)) { e=1; break; }
if ((i>0)&&(i<n)&&(w==1)) { e=1; break; }
}
if (!e) break;
}
if (!e) break;
}
}
if (e) { error; } // error no combination r,l,p for n found
W=modpow(r, L,p); // Wn for NTT
iW=modpow(r,p-1-L,p); // Wn for INTT
and here is my slow NTT and INTT implementations (i havent got to fast NTT,INTT yet) they are both tested with Schönhage–Strassen multiplication successfully.
//---------------------------------------------------------------------------
void NTT(long *dst,long *src,long n,long m,long w)
{
long i,j,wj,wi,a,n2=n>>1;
for (wj=1,j=0;j<n;j++)
{
a=0;
for (wi=1,i=0;i<n;i++)
{
a=modadd(a,modmul(wi,src[i],m),m);
wi=modmul(wi,wj,m);
}
dst[j]=a;
wj=modmul(wj,w,m);
}
}
//---------------------------------------------------------------------------
void INTT(long *dst,long *src,long n,long m,long w)
{
long i,j,wi=1,wj=1,rN,a,n2=n>>1;
rN=modpow(n,m-2,m);
for (wj=1,j=0;j<n;j++)
{
a=0;
for (wi=1,i=0;i<n;i++)
{
a=modadd(a,modmul(wi,src[i],m),m);
wi=modmul(wi,wj,m);
}
dst[j]=modmul(a,rN,m);
wj=modmul(wj,w,m);
}
}
//---------------------------------------------------------------------------
dst is destination array
src is source array
n is array size
m is modulus (p)
w is basis (Wn)
hope this helps to someone. If i forgot something please write ...
[edit1: fast NTT/INTT]
Finally I manage to get fast NTT/INTT to work. Was little bit more tricky than normal FFT:
//---------------------------------------------------------------------------
void _NFTT(long *dst,long *src,long n,long m,long w)
{
if (n<=1) { if (n==1) dst[0]=src[0]; return; }
long i,j,a0,a1,n2=n>>1,w2=modmul(w,w,m);
// reorder even,odd
for (i=0,j=0;i<n2;i++,j+=2) dst[i]=src[j];
for ( j=1;i<n ;i++,j+=2) dst[i]=src[j];
// recursion
_NFTT(src ,dst ,n2,m,w2); // even
_NFTT(src+n2,dst+n2,n2,m,w2); // odd
// restore results
for (w2=1,i=0,j=n2;i<n2;i++,j++,w2=modmul(w2,w,m))
{
a0=src[i];
a1=modmul(src[j],w2,m);
dst[i]=modadd(a0,a1,m);
dst[j]=modsub(a0,a1,m);
}
}
//---------------------------------------------------------------------------
void _INFTT(long *dst,long *src,long n,long m,long w)
{
long i,rN;
rN=modpow(n,m-2,m);
_NFTT(dst,src,n,m,w);
for (i=0;i<n;i++) dst[i]=modmul(dst[i],rN,m);
}
//---------------------------------------------------------------------------
[edit3]
I have optimized my code (3x times faster than code above),but still i am not satisfied with it so i started new question with it. There I have optimized my code even further (about 40x times faster than code above) so its almost the same speed as FFT on floating point of the same bit size. Link to it is here:
Modular arithmetics and NTT (finite field DFT) optimizations
To turn Cooley-Tukey (complex) FFT into modular arithmetic approach, i.e. NTT, you must replace complex definition for omega. For the approach to be purely recursive, you also need to recalculate omega for each level based on current signal size. This is possible because min. suitable modulus decreases as we move down in the call tree, so modulus used for root is suitable for lower layers. Additionally, as we are using same modulus, the same generator may be used as we move down the call tree. Also, for inverse transform, you should take additional step to take recalculated omega a and instead use as omega: b = a ^ -1 (via using inverse modulo operation). Specifically, b = invMod(a, N) s.t. b * a == 1 (mod N), where N is the chosen prime modulus.
Rewriting an expression involving omega by exploiting periodicity still works in modular arithmetic realm. You also need to find a way to determine the modulus (a prime) for the problem and a valid generator.
We note that your code works, though it is not a MWE. We extended it using common sense, and got correct result for a polynomial multiplication application. You just have to provide correct values of omega raised to certain powers.
While your code works, though, like from many other sources, you double spacing for each level. This does not lead to recursion that is as clean, though; this turns out to be identical to recalculating omega based on current signal size because the power for omega definition is inversely proportional to signal size. To reiterate: halving signal size is like squaring omega, which is like giving doubled powers for omega (which is what one would do for doubling of spacing). The nice thing about the approach that deals with recalculating of omega is that each subproblem is more cleanly complete in its own right.
There is a paper that shows some of the math for modular approach; it is a paper by Baktir and Sunar from 2006. See the paper at the end of this post.
You do not need to extend the cycle from n / 2 to n.
So, yes, some sources which say to just drop in a different omega definition for modular arithmetic approach are sweeping under the rug many details.
Another issue is that it is important to acknowledge that the signal size must be large enough if we are to not have overflow for result time-domain signal if we are performing convolution. Additionally, it may be useful to find certain implementations for exponentiation subject to modulus exist that are fast, even if the power is quite large.
References
Baktir and Sunar - Achieving efficient polynomial multiplication in Fermat fields using the fast Fourier transform (2006)
You must make sure that roots of unity actually exist. In R there are only 2 roots of unity: 1 and -1, since only for them x^n=1 can be true.
In C you have infinitely many roots of unity: w=exp(2*pi*i/N) is a primitive N-th roots of unity and all w^k for 0<=k
Now to your problem: you have to make sure the ring you're working in offers the same property: enough roots of unity.
Schönhage and Strassen (http://en.wikipedia.org/wiki/Sch%C3%B6nhage%E2%80%93Strassen_algorithm) use integers modulo 2^N+1. This ring has enough roots of unity. 2^N == -1 is a 2nd root of unity, 2^(N/2) is a 4th root of unity and so on. Furthermore, these roots of unity have the advantage that they are powers of two and can be implemented as binary shifts (with a modulo operation afterwards, which comes down to a add/subtract).
I think QuickMul (http://www.cs.nyu.edu/exact/doc/qmul.ps) works modulo 2^N-1.