Converting Arduino compass signal to useable heading - function
Background
I bought an Arduino magnetometer/compass with a QMC5883 chip from Amazon, however the bearing output I'm getting doesn't tally with the calculations I've found online. The serial output seems to be plausible (sinusoids with a phase difference of 90°), but the numbers I'm getting for the calculated bearing don't match what they are supposed to. I stored the serial output as a .csv file to graph the magnetometer response when turned through 360° in Excel:
The response was roughly as expected: Z remained roughly steady (apart from a few wobbles caused by the cable!), while X and Y varied sinusoidally through 360°. (Bear in mind I couldn't turn the magnetometer at a constant speed by hand, which is why the curves are so unsteady.)
However below is the graph of what heading was calculated; results were supposed to be between -180° and +180°:
As you can see it only varies around -60° to -160°, and each bearing reading is not unique as it is given by two different rotations of the magnetometer. The specific calculation in the code used (in full at the bottom) is:
bearing =180*atan2(y,x)/3.141592654; //values will range from +180 to -180°
bearing +=0-(19/60); //Adjust for local magnetic declination
Question
I can't figure out what's wrong with the calculation as it is used in a few different sources, and I would like to know how to convert the readings I'm getting to a usable range which is one to one instead of many to one, such as -180° to +180° or 0° to 360°.
Here is the code:
//There are several differences between the QMC5883L and the HMC5883L chips
//Differences in address: 0x0D for QMC5883L; 0x1E for HMC5883L
//Differences in register map (compare datasheets)
//Output data register differences include location of x,y,z and MSB and LSB for these parameters
//Control registers are also different (so location and values for settings change)
#include <Wire.h> //I2C Arduino Library
#define addr 0x0D //I2C Address for The QMC5883L (0x1E for HMC5883)
double scale=1.0;
void setup() {
// double scaleValues[9]={0.00,0.73,0.92,1.22,1.52,2.27,2.56,3.03,4.35};
// scale=scaleValues[2];
//initialize serial and I2C communications
Serial.begin(9600);
Wire.begin();
Wire.beginTransmission(addr); //start talking to slave
Wire.write(0x0B);
Wire.write(0x01);
Wire.endTransmission();
Wire.beginTransmission(addr); //start talking to slave
Wire.write(0x09);
Wire.write(0x1D);
Wire.endTransmission();
}
void loop() {
int x, y, z; //triple axis data
//Tell the QMC which register to begin reading data from
Wire.beginTransmission(addr);
Wire.write(0x00); //start with register 00H for QMC5883L
Wire.endTransmission();
double bearing=0.00;
//Read the data.. 2, 8 bit bytes for each axis.. 6 total bytes
Wire.requestFrom(addr, 6);
//read 6 registers in order; register location (i.e.00H)indexes by one once read
if (6 <= Wire.available()) {
//note the order of following statements matters
//as each register will be read in sequence starting from data register 00H to 05H
//where order is xLSB,xMSB,yLSB,yMSB,zLSB,zMSB
//this is different from HMC5883L!
//data registers are 03 to 08
//where order is xMSB,xLSB,zMSB,zLSB,yMSB,yLSB
x = Wire.read(); //LSB x;
x |= Wire.read()<<8; //MSB x; bitshift left 8, then bitwise OR to make "x"
// x*=scale;
y = Wire.read(); //LSB y
y |= Wire.read()<<8; //MSB y;
// y*=scale;
z = Wire.read(); //LSB z; irrelevant for compass
z |= Wire.read()<<8; //MSB z;
// z*=scale;
bearing =180*atan2(y,x)/3.141592654;//values will range from +180 to -180 degrees
bearing +=0-(19/60);//Adjust for local magnetic declination
}
// Show Values
//Serial.print("X:");
Serial.print(x);
//Serial.print(" Y: ");
Serial.print(",");
Serial.print(y);
//Serial.print(" Z: ");
Serial.print(",");
Serial.print(z);
//Serial.print(" B: ");
Serial.print(",");
Serial.println(bearing);
delay(500);
}
For others reading this question:
The OP forgot to implement x,y,z smoothing and removal of out-of-range values. To see how this can be achieved, look at the source code of this QMC5883 compass library:
QMC5883L Compass is an Arduino library for using QMC5883L series chip boards as a compass.
It supports:
Getting values of XYZ axis.
Calculating Azimuth.
Getting 16 point Azimuth bearing direction (0 - 15).
Getting 16 point Azimuth bearing Names (N, NNE, NE, ENE, E, ESE, SE, SSE, S, SSW, SW, WSW, W, WNW, NW, NNW)
Smoothing of XYZ readings via rolling averaging and min / max removal.
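For readers who want a one-to-one heading from the raw readings, here is a minimal sketch. It is not the library's code. It assumes the X and Y sinusoids are not centred on zero (a hard-iron offset), which would be consistent with a bearing that only covers part of the circle, but the question does not confirm this. The helper name headingDegrees and the parameters x_offset, y_offset and declination_deg are hypothetical; the offsets would have to be measured, for example as (max + min) / 2 of each axis over one full turn. Note also that in C and C++ 19/60 is integer division and evaluates to 0, so the declination line in the question adds nothing; 19.0/60.0 gives the intended value.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper (not part of any library): converts raw X/Y readings
// to a compass heading in [0, 360) degrees.
// x_offset / y_offset are ASSUMED hard-iron offsets, e.g. (max + min) / 2
// of each axis recorded over one full rotation.
double headingDegrees(int x, int y, double x_offset, double y_offset,
                      double declination_deg) {
  const double kPi = 3.14159265358979323846;
  double xc = x - x_offset;  // re-centre the X sinusoid on zero
  double yc = y - y_offset;  // re-centre the Y sinusoid on zero
  assert(xc != 0.0 || yc != 0.0);  // heading is undefined at the origin
  double heading = std::atan2(yc, xc) * 180.0 / kPi;  // range [-180, 180]
  heading += declination_deg;           // adjust for local declination
  heading = std::fmod(heading, 360.0);  // range (-360, 360)
  if (heading < 0.0) heading += 360.0;  // shift negatives up
  if (heading >= 360.0) heading -= 360.0;  // guard against rounding to 360
  return heading;
}
```

Whether 0° corresponds to magnetic north depends on how the board is mounted, so the axis convention should be checked against a known direction.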
Related
step doubling Runge Kutta implementation stuck shrinking stepsize to machine precision
I need to integrate a system of ODEs using an adaptive RK4 method with stepsize control via step doubling. The problem is that the program keeps shrinking the stepsize down to machine precision forever while not advancing time.
The idea is to step the solution once by a single step and also by two successive half steps, compare the results and store their difference in eps, so eps is a measure of the error. Now I want to determine the next stepsize according to whether eps is greater than a specified accuracy eps0 (as described in the book "Numerical Recipes").
RK4Step(double t, double* Y, double *Yout, void (*RHSFunc)(double, double *, double *), double h) steps the solution vector Y by h and puts the result into Yout using the function RHSFunc.
#define NEQ 4 //problem dimension
int main(int argc, char* argv[]) {
  ofstream frames("./frames.dat");
  ofstream graphs("./graphs.dat");
  double Y[4] = {2.0, 2.0, 1.0, 0.0}; //initial conditions for solution vector
  double finaltime = 100; //end of integration
  double eps0 = 10e-5; //error to compare with eps
  double t = 0.0;
  double step = 0.01;
  while(t < finaltime) {
    double eps = 0.0;
    double Y1[4], Y2[4]; //Y1 will store half step solution
                         //Y2 will store double step solution
    double dt = step; //cache current stepsize
    for(;;) {
      //make a step starting from state stored in Y and
      //put solution into Y1. Then from Y1 make another half step
      //and store into Y1.
      RK4Step(t, Y, Y1, RHS, step); //two half steps
      RK4Step(t+step,Y1, Y1, RHS, step);
      RK4Step(t, Y, Y2, RHS, 2*step); //one long step
      //compute eps as maximum of differences between Y1 and Y2
      //(an alternative would be quadrature sums)
      for(int i=0; i<NEQ; i++)
        eps=max(eps, fabs( (Y1[i]-Y2[i])/15.0 ) );
      //if error is within tolerance we grow stepsize
      //and advance time
      if(eps < eps0) {
        //stepsize is accepted, grow stepsize
        //save solution from Y1 into Y,
        //advance time by the previous (cached) stepsize
        Y[0] = Y1[0]; Y[1] = Y1[1];
        Y[2] = Y1[2]; Y[3] = Y1[3];
        step = 0.9*step*pow(eps0/eps, 0.20); //(0.9 is the safety factor)
        t+=dt;
        break;
      }
      //if the error is too big we shrink stepsize
      step = 0.9*step*pow(eps0/eps, 0.25);
    }
  }
  frames.close();
  graphs.close();
  return 0;
}
You never reset eps in the inner loop. This could be the direct cause of your problem. While the actual error reduces with ever decreasing step sizes, the maximum in eps stays constant, and above eps0. This results in a constant reducing factor in the step size update, without any chance to break the loop.
Another "wrong thing" is that the error estimate and tolerance are incompatible. The error tolerance eps0 is an error density or unit-step error. To bring your error estimate eps into that format you need to divide eps by step. Or put another way, currently you are forcing the actual step error to be close to 0.5*eps0, so that the global error is 0.5*eps0 times the number of steps taken, with the number of steps loosely proportional to eps0^(-0.2). In the version using the unit-step error, the local error is forced to be "dynamically" close to 0.5*eps0*step, so that the global error is about 0.5*eps0 times the length of the integration interval. I'd say that the second variant is more in line with intuition about the expected behavior. This is not a critical error, but may lead to sub-optimal step sizes and an actual global error that deviates non-trivially from the desired error tolerance.
You also have a coding inconsistency: in the propagation of the state and the declaration of the state vectors you have hard-coded 4 components, while in the error computation you loop over a variable number NEQ of equations and components. As you are using C++, you could use a state vector class that handles all dimension-dependent loops internally. (If taken too far, frequent allocation of instances with a short life span could be an efficiency issue.)
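The two points above (recompute eps on every attempt, and compare an error per unit step against the tolerance) can be sketched as follows. This is a minimal illustration for a scalar ODE, not a drop-in replacement for the question's 4-component system; the names rk4Step, integrateAdaptive and decay, the clamp limits 0.1 and 4.0, and the choice of one step h versus two steps of h/2 are all assumptions made for the example.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// One classical RK4 step for a scalar ODE y' = f(t, y).
double rk4Step(double t, double y, double (*f)(double, double), double h) {
  double k1 = f(t, y);
  double k2 = f(t + 0.5 * h, y + 0.5 * h * k1);
  double k3 = f(t + 0.5 * h, y + 0.5 * h * k2);
  double k4 = f(t + h, y + h * k3);
  return y + h * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0;
}

// Integrates from t0 to t1 with step doubling and returns y(t1).
// tol is interpreted as an error per unit step (error density).
double integrateAdaptive(double (*f)(double, double), double t0, double y0,
                         double t1, double tol, double h, int* accepted) {
  assert(t1 > t0 && tol > 0.0 && h > 0.0);
  double t = t0, y = y0;
  *accepted = 0;
  while (t < t1) {
    bool last = h >= t1 - t;  // would this step reach the end point?
    if (last) h = t1 - t;
    double half = rk4Step(t, y, f, 0.5 * h);
    half = rk4Step(t + 0.5 * h, half, f, 0.5 * h);  // two half steps
    double full = rk4Step(t, y, f, h);              // one full step
    // eps is computed from scratch on every attempt (no stale maximum)
    // and divided by h so it is comparable to a per-unit-step tolerance.
    double eps = std::fabs(half - full) / 15.0 / h;
    if (eps <= tol) {  // accept: keep the more accurate two-half-step result
      t = last ? t1 : t + h;
      y = half;
      ++*accepted;
    }
    // Same update after accept and reject; the factor is clamped so the
    // step size can neither collapse nor explode in a single update.
    double factor = 0.9 * std::pow(tol / std::max(eps, 1e-300), 0.25);
    h *= std::min(4.0, std::max(0.1, factor));
  }
  return y;
}

// Test problem y' = -y with exact solution exp(-t).
double decay(double, double y) { return -y; }
```

For a vector system the only change to the error estimate is that eps becomes the maximum (or a norm) over the components, still reset before each attempt.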
CMSIS real-FFT on 8192 samples in Q15
I need to perform an FFT on a block of 8192 samples on an STM32F446 microcontroller. For that I wanted to use the CMSIS DSP library, as it's easily available and optimised for the STM32F4.
My 8192 input samples will ultimately be values from the internal 12-bit ADC (left aligned and converted to q15 by flipping the sign bit), but for testing purposes I'm feeding the FFT with test buffers.
With CMSIS's FFT functions, only the Q15 version supports lengths of 8192. Thus I am using arm_rfft_q15().
Because the FFT functions of the CMSIS library include about 32k of LUTs by default (to adapt to many FFT lengths), I have "rewritten" them to remove all the tables corresponding to lengths other than the one I'm interested in. I haven't touched anything except removing the unused code.
My samples are stored in an external SDRAM that I access via DMA.
When using the FFT, I have several problems:
Both my source buffer and my destination buffer get modified.
The result is not at all as expected.
To make sure the results were wrong I did an IFFT right after the FFT, but it just confirmed that the code wasn't working.
Here is my code:
status_codes FSM::fft_state(void)
{
  // Flush the SDRAM section
  si_ovf_buf_clr_u16((uint16_t *)0xC0000000, 8192);
  q15_t* buf = (q15_t*)(0xC0000000);
  for(int i = 0; i<50; i++)
    buf[i] = 0x0FFF; // Fill the buffer with test vector (50 sp gate)
  // initialise FFT
  // ---> Forward, 8192 samples, bitReversed
  arm_rfft_instance_q15 S;
  if(arm_rfft_init_q15(&S, 8192, 0, 1) != ARM_MATH_SUCCESS)
    return state_error;
  // perform FFT
  arm_rfft_q15(&S, (q15_t*)0xC0000000, (q15_t*)0xC0400000);
  // Post-shift by 12, in place (see doc)
  arm_shift_q15((q15_t*)0xC0400000, 12, (q15_t*)0xC0400000, 16384);
  // Init inverse FFT
  if(arm_rfft_init_q15(&S, 8192, 1, 1) != ARM_MATH_SUCCESS)
    return state_error;
  // Perform iFFT
  arm_rfft_q15(&S, (q15_t*)0xC0400000, (q15_t*)0xC0800000);
  // Post shift
  arm_shift_q15((q15_t*)0xC0800000, 12, (q15_t*)0xC0800000, 8192);
  return state_success;
}
And here is the result (from GDB)
PS: I'm using ChibiOS - not sure if it is relevant.
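The ADC-to-Q15 conversion mentioned in the question (left align the 12-bit sample, then flip the sign bit) can be sketched like this. It is only an illustration of that one step and says nothing about the FFT problem itself; the helper name adcToQ15 is hypothetical, and it assumes the raw sample arrives right aligned in the range 0..4095.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: converts a right-aligned 12-bit unsigned ADC sample
// (0..4095) to a Q15 value. Left-aligning moves the 12 bits to the top of
// the 16-bit word; flipping the top bit then turns offset binary into
// two's complement, so mid-scale (2048) maps to 0.
int16_t adcToQ15(uint16_t raw12) {
  assert(raw12 <= 0x0FFF);
  uint16_t left_aligned = static_cast<uint16_t>(raw12 << 4);
  return static_cast<int16_t>(left_aligned ^ 0x8000u);
}
```

For comparison, the test value 0x0FFF written directly into the buffer in the question is 4095/32768, roughly 0.125 in Q15, so the test gate is a fairly small amplitude.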
How to determine width of peaks and make FFT for every peak (and plot it in separate graph)
I have acceleration data for the X-axis and a time vector for it. I found the peaks above a threshold and now I should find the FFT for every peak. As a result I have this:
Peak Value 1 = 458, index 1988
Peak Value 2 = 456, index 1990
Peak Value 3 = 450, index 12081
....
Peak Value 9 = 432, index 12151
To find these peaks I used the peakfinder script. The command [peakLoc, peakMag] = peakfinder(x0,...) gives me the location and magnitude of the peaks. I also have the time (from the time data vector) for each peak.
So I suppose that I should take every peak, find its width (or some data points around the peak) and compute the FFT. Am I right? Could you help me with that?
I'm working in Octave and I'm new here :)
Code:
load ("C:\\..patch..\\peakfinder.m");
d =dlmread("C:\\..patch..\\acc2.csv", ";");
T=d(:,1);
Ax=d(:,2);
[peakInd peakVal]=peakfinder(Ax,10,430,1);
peakTime=T(peakInd);
[sortVal sortInd] = sort(peakVal, 'descend');
originInd = peakInd(sortInd);
for k = 1 : length(sortVal)
  fprintf(1, 'Peak #%d = %d, index%d\n', k, sortVal(k), originInd (k));
end
plot(T,Ax,'b-',T(peakInd),Ax(peakInd),'rv');
and here you can download the data http://www.filedropper.com/acc2
FFT
d =dlmread("C:\\..path..\\acc2.csv", ";");
T=d(:,1);
Ax=d(:,2);
% sampling frequency
Fs_a=2000;
% length of FFT
Length_Ax=numel(Ax);
% number of lines of Fourier spectrum
fft_L= Fs_a*2;
% an array of time samples
T_Ax=0:1/Fs_a: Length_Ax;
fft_Ax=abs(fft(Ax,fft_L));
fft_Ax=2*fft_Ax./fft_L;
F=0:Fs_a/fft_L:Fs_a/2-1/fft_L;
subplot(3,1,1);
plot(T,Ax);
title('Ax axis');
xlabel('time (s)');
ylabel('amplitude)');
grid on;
subplot(3,1,2);
plot(F,fft_Ax(1:length(F)));
title('spectrum max Ax axis');
xlabel('frequency (Hz)');
ylabel('amplitude');
grid on;
It looks like you have two clusters of peaks, so I would plot the data over three plots: one of the whole time series, one zoomed in on the first cluster, and the last one zoomed in on the second cluster (note that I have divided all your time values by 1e6, otherwise the tick labels get ugly):
figure
subplot(3,1,1)
plot(T/1e6,Ax,'b-',peakTime/1e6,peakVal,'rv');
subplot(3,1,2)
plot(T/1e6,Ax,'b-',peakTime(1:4)/1e6,peakVal(1:4),'rv');
axis([0.99*peakTime(1)/1e6 1.01*peakTime(4)/1e6 0.9*peakVal(1) 1.1*peakVal(4)])
subplot(3,1,3)
plot(T/1e6,Ax,'b-',peakTime(5:end)/1e6,peakVal(5:end),'rv');
axis([0.995*peakTime(5)/1e6 1.005*peakTime(end)/1e6 0.9*peakVal(5) 1.1*peakVal(end)])
I have set the axes around the extreme time and acceleration values, using some coefficients to have some "padding" around them (the values of these coefficients were obtained through trial and error).
This gives me the following plot; hopefully this is the sort of thing you are after. You can add x and y labels if you wish.
EDIT
Here's how I would do the FFT:
Fs = 2000;
L = length(Ax);
NFFT = 2^nextpow2(L); % Next power of 2 from length of Ax
Ax_FFT = fft(Ax,NFFT)/L;
f = Fs/2*linspace(0,1,NFFT/2+1);
% Plot single-sided amplitude spectrum.
figure
semilogx(f,2*abs(Ax_FFT(1:NFFT/2+1))) % using semilogx as huge DC component
title('Single-Sided Amplitude Spectrum of Ax')
xlabel('Frequency (Hz)')
ylabel('|Ax(f)|')
ylim([0 300])
giving the following result:
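The per-peak idea from the question (take some data points around each peak and transform only those) could look roughly like the sketch below. It is written in C++ rather than Octave and uses a naive DFT to stay self-contained; the helper names windowAroundPeak and dftMagnitudes and the half_width parameter are hypothetical, and how wide the window should be is a judgment call the thread does not settle. Peaks that sit only a few samples apart (such as indices 1988 and 1990) will produce almost identical windows, so one window per cluster may be enough.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Copies up to 2*half_width+1 samples centred on `peak`, clamped to the
// ends of the signal. `peak` is a zero-based index.
std::vector<double> windowAroundPeak(const std::vector<double>& signal,
                                     std::size_t peak, std::size_t half_width) {
  assert(peak < signal.size());
  std::size_t first = peak > half_width ? peak - half_width : 0;
  std::size_t last = std::min(signal.size() - 1, peak + half_width);
  return std::vector<double>(signal.begin() + first,
                             signal.begin() + last + 1);
}

// Naive O(N^2) DFT magnitudes for bins 0..N/2.
// Bin k corresponds to the frequency k * Fs / N.
std::vector<double> dftMagnitudes(const std::vector<double>& x) {
  const double kPi = 3.14159265358979323846;
  const std::size_t n = x.size();
  std::vector<double> mag(n / 2 + 1);
  for (std::size_t k = 0; k <= n / 2; ++k) {
    double re = 0.0, im = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
      double angle = 2.0 * kPi * static_cast<double>(k * i) /
                     static_cast<double>(n);
      re += x[i] * std::cos(angle);
      im -= x[i] * std::sin(angle);
    }
    mag[k] = std::sqrt(re * re + im * im);
  }
  return mag;
}
```

A short window gives coarse frequency resolution (Fs divided by the window length), so the window width trades time localisation against spectral detail.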
Relationship between FFT sampling rate, Bandwidth, and Resolution
I'm trying to write code that omits one sideband of an FFT and shifts the other to the center. I know the sampling rate (2 GHz) and the number of samples (10000); the sidebands are located in (-55,-355) and (55,355). I want to know the frequency resolution of each spectral line. This is the code I've written:
void compfft (double *source, double *destination, int length)
{
  double *realPart = malloc(length*sizeof(double));
  double *ImgPart = malloc(length*sizeof(double));
  int index,i,j;
  for (index= 0;index< length; index++) {
    realPart[index] = source[index]; //data to a local array
  }
  memset(ImgPart, 0, sizeof(ImgPart));
  FFT(realPart, ImgPart, length); //Take fft
  //shifting the destination array
  for(i=0; i<(length/4) ; i++){
    *destination[i]=* realPart[i+749];
  }
  //filling the destination array with source array values from 55 Hz to 355 Hz
  for(j=99; j<(length/5); j++){
    destination[j] = realPart[j+750];
  }
  free(realPart);
  free(ImgPart);
}
but my supervisor told me it's wrong and that I need to read more about the basics. I'm really confused, please help.
The bandwidth and resolution depend not only on sample rate, but on the FFT window length and shape, as well as how you measure or define bandwidth and resolution, and potentially the signal to noise ratio.
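For the plain bin spacing, leaving window effects aside, an N-point DFT of data sampled at Fs has bins Fs/N apart. With the numbers from the question that is 2 GHz / 10000 = 200 kHz per bin. The sketch below shows the arithmetic; the function names are hypothetical, and reading the sideband edges 55 and 355 as MHz is an assumption (the question text gives no unit and the code comment says Hz).

```cpp
#include <cassert>
#include <cmath>

// Spacing between adjacent DFT bins in hertz: sample rate / number of samples.
double binSpacingHz(double sample_rate_hz, int n_samples) {
  assert(n_samples > 0);
  return sample_rate_hz / n_samples;
}

// Index of the bin nearest to freq_hz (meaningful for 0 <= freq_hz <= Fs/2).
int nearestBin(double freq_hz, double sample_rate_hz, int n_samples) {
  double spacing = binSpacingHz(sample_rate_hz, n_samples);
  return static_cast<int>(std::lround(freq_hz / spacing));
}
```

Under the MHz assumption the positive sideband would span bins 275 to 1775, which does not match the hard-coded indices 749 and 750 in the question's code. Separately, memset(ImgPart, 0, sizeof(ImgPart)) clears only as many bytes as a pointer occupies, not the whole array.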
Translation from Complex-FFT to Finite-Field-FFT
Good afternoon!
I am trying to develop an NTT algorithm based on the naive recursive FFT implementation I already have.
Consider the following code (coefficients' length, let it be m, is an exact power of two):
/// <summary>
/// Calculates the result of the recursive Number Theoretic Transform.
/// </summary>
/// <param name="coefficients"></param>
/// <returns></returns>
private static BigInteger[] Recursive_NTT_Skeleton(
    IList<BigInteger> coefficients,
    IList<BigInteger> rootsOfUnity,
    int step,
    int offset)
{
    // Calculate the length of vectors at the current step of recursion.
    // -
    int n = coefficients.Count / step - offset / step;
    if (n == 1)
    {
        return new BigInteger[] { coefficients[offset] };
    }
    BigInteger[] results = new BigInteger[n];
    IList<BigInteger> resultEvens = Recursive_NTT_Skeleton(coefficients, rootsOfUnity, step * 2, offset);
    IList<BigInteger> resultOdds = Recursive_NTT_Skeleton(coefficients, rootsOfUnity, step * 2, offset + step);
    for (int k = 0; k < n / 2; k++)
    {
        BigInteger bfly = (rootsOfUnity[k * step] * resultOdds[k]) % NTT_MODULUS;
        results[k] = (resultEvens[k] + bfly) % NTT_MODULUS;
        results[k + n / 2] = (resultEvens[k] - bfly) % NTT_MODULUS;
    }
    return results;
}
It worked for complex FFT (replace BigInteger with a complex numeric type (I had my own)). It doesn't work here even though I changed the procedure of finding the primitive roots of unity appropriately.
Supposedly, the problem is this: the rootsOfUnity parameter passed originally contained only the first half of the m-th complex roots of unity, in this order:
omega^0 = 1, omega^1, omega^2, ..., omega^(n/2)
It was enough, because on these three lines of code:
BigInteger bfly = (rootsOfUnity[k * step] * resultOdds[k]) % NTT_MODULUS;
results[k] = (resultEvens[k] + bfly) % NTT_MODULUS;
results[k + n / 2] = (resultEvens[k] - bfly) % NTT_MODULUS;
I originally made use of the fact that at any level of recursion (for any n and i), the complex root of unity -omega^(i) = omega^(i + n/2).
However, that property obviously doesn't hold in finite fields. But is there any analogue of it which would allow me to still compute only the first half of the roots? Or should I extend the cycle from n/2 to n and pre-compute all the m-th roots of unity? Maybe there are other problems with this code?.. Thank you very much in advance!
I recently wanted to implement NTT for fast multiplication instead of DFFT too. I read a lot of confusing things, with different letters everywhere and no simple solution, and my finite-field knowledge is rusty, but today I finally got it right (after 2 days of trying and drawing analogies with DFT coefficients), so here are my insights for NTT:
Computation
X(i) = sum(j=0..n-1) of ( Wn^(i*j)*x(j) );
where X[] is the NTT-transformed x[] of size n
where Wn is the NTT basis. All computations are in integer modular arithmetic mod p, with no complex numbers anywhere.
Important values
Wn = r ^ L mod p is the basis for NTT
Wn = r ^ (p-1-L) mod p is the basis for INTT
Rn = n ^ (p-2) mod p is the scaling multiplicative constant for INTT ~(1/n)
p is a prime such that p mod n == 1 and p>max'
max is the max value of x[i] for NTT or X[i] for INTT
r = <1,p)
L = <1,p) and also divides p-1
r,L must be combined so r^(L*i) mod p == 1 if i=0 or i=n
r,L must be combined so r^(L*i) mod p != 1 if 0 < i < n
max' is the sub-result max value and depends on n and the type of computation. For a single (I)NTT it is max' = n*max, but for convolution of two n-sized vectors it is max' = n*max*max, etc. See Implementing FFT over finite fields for more info about it.
The functional combination of r,L,p is different for different n. This is important: you have to recompute or select parameters from a table before each NTT layer (n is always half of the previous recursion).
Here is my C++ code that finds the r,L,p parameters (it needs modular arithmetic which is not included; you can replace it with (a+b)%c,(a-b)%c,(a*b)%c,... but in that case beware of overflows, especially for modpow and modmul). The code is not optimized yet; there are ways to speed it up considerably. Also the prime table is fairly limited, so either use SoE or any other algorithm to obtain primes up to max' in order to work safely.
DWORD _arithmetics_primes[]=
{
2,3,5,7,11,13,17,19,23,29,31,37,41,43,47,53,59,61,67,71,73,79,83,89,97,101,103,107,109,113,127,131,137,139,149,151,157,163,167,173,
179,181,191,193,197,199,211,223,227,229,233,239,241,251,257,263,269,271,277,281,283,293,307,311,313,317,331,337,347,349,353,359,367,373,379,383,389,397,401,409,
419,421,431,433,439,443,449,457,461,463,467,479,487,491,499,503,509,521,523,541,547,557,563,569,571,577,587,593,599,601,607,613,617,619,631,641,643,647,653,659,
661,673,677,683,691,701,709,719,727,733,739,743,751,757,761,769,773,787,797,809,811,821,823,827,829,839,853,857,859,863,877,881,883,887,907,911,919,929,937,941,
947,953,967,971,977,983,991,997,1009,1013,1019,1021,1031,1033,1039,1049,1051,1061,1063,1069,1087,1091,1093,1097,1103,1109,1117,1123,1129,1151,
0}; // end of table is 0, the more primes are there the bigger numbers and n can be used
// compute NTT consts W=r^L%p for n
int i,j,k,n=16;
long w,W,iW,p,r,L,l,e;
long max=81*n; // edit1: max num for NTT for my multiplication purposes
for (e=1,j=0;e;j++) // find prime p that p%n=1 AND p>max ... 9*9=81
{
  p=_arithmetics_primes[j];
  if (!p) break;
  if ((p>max)&&(p%n==1))
    for (r=2;r<p;r++) // check all r
    {
      for (l=1;l<p;l++)// all l that divide p-1
      {
        L=(p-1);
        if (L%l!=0) continue;
        L/=l;
        W=modpow(r,L,p);
        e=0;
        for (w=1,i=0;i<=n;i++,w=modmul(w,W,p))
        {
          if ((i==0) &&(w!=1)) { e=1; break; }
          if ((i==n) &&(w!=1)) { e=1; break; }
          if ((i>0)&&(i<n)&&(w==1)) { e=1; break; }
        }
        if (!e) break;
      }
      if (!e) break;
    }
}
if (e) { error; } // error no combination r,l,p for n found
W=modpow(r, L,p); // Wn for NTT
iW=modpow(r,p-1-L,p); // Wn for INTT
and here are my slow NTT and INTT implementations (I haven't got to fast NTT/INTT yet); they are both tested successfully with Schönhage–Strassen multiplication.
//---------------------------------------------------------------------------
void NTT(long *dst,long *src,long n,long m,long w)
{
  long i,j,wj,wi,a,n2=n>>1;
  for (wj=1,j=0;j<n;j++)
  {
    a=0;
    for (wi=1,i=0;i<n;i++)
    {
      a=modadd(a,modmul(wi,src[i],m),m);
      wi=modmul(wi,wj,m);
    }
    dst[j]=a;
    wj=modmul(wj,w,m);
  }
}
//---------------------------------------------------------------------------
void INTT(long *dst,long *src,long n,long m,long w)
{
  long i,j,wi=1,wj=1,rN,a,n2=n>>1;
  rN=modpow(n,m-2,m);
  for (wj=1,j=0;j<n;j++)
  {
    a=0;
    for (wi=1,i=0;i<n;i++)
    {
      a=modadd(a,modmul(wi,src[i],m),m);
      wi=modmul(wi,wj,m);
    }
    dst[j]=modmul(a,rN,m);
    wj=modmul(wj,w,m);
  }
}
//---------------------------------------------------------------------------
dst is destination array
src is source array
n is array size
m is modulus (p)
w is basis (Wn)
Hope this helps someone. If I forgot something please write ...
[edit1: fast NTT/INTT]
Finally I managed to get fast NTT/INTT to work. It was a little bit more tricky than normal FFT:
//---------------------------------------------------------------------------
void _NFTT(long *dst,long *src,long n,long m,long w)
{
  if (n<=1) { if (n==1) dst[0]=src[0]; return; }
  long i,j,a0,a1,n2=n>>1,w2=modmul(w,w,m);
  // reorder even,odd
  for (i=0,j=0;i<n2;i++,j+=2) dst[i]=src[j];
  for ( j=1;i<n ;i++,j+=2) dst[i]=src[j];
  // recursion
  _NFTT(src ,dst ,n2,m,w2); // even
  _NFTT(src+n2,dst+n2,n2,m,w2); // odd
  // restore results
  for (w2=1,i=0,j=n2;i<n2;i++,j++,w2=modmul(w2,w,m))
  {
    a0=src[i];
    a1=modmul(src[j],w2,m);
    dst[i]=modadd(a0,a1,m);
    dst[j]=modsub(a0,a1,m);
  }
}
//---------------------------------------------------------------------------
void _INFTT(long *dst,long *src,long n,long m,long w)
{
  long i,rN;
  rN=modpow(n,m-2,m);
  _NFTT(dst,src,n,m,w);
  for (i=0;i<n;i++) dst[i]=modmul(dst[i],rN,m);
}
//---------------------------------------------------------------------------
[edit3]
I have optimized my code (3x faster than the code above), but I am still not satisfied with it, so I started a new question with it. There I have optimized my code even further (about 40x faster than the code above), so it's almost the same speed as FFT on floating point of the same bit size. Link to it is here: Modular arithmetics and NTT (finite field DFT) optimizations
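The answer relies on modadd, modsub, modmul and modpow helpers that it does not list. The following is a minimal sketch of what such helpers could look like; these are hypothetical stand-ins, not the author's implementation. They use uint64_t instead of long, expect operands already reduced to [0, m), and assume the modulus is below 2^32 so that products fit in 64 bits.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins for the helpers the answer uses but does not show.
// ASSUMPTIONS: operands are already in [0, m) and m < 2^32, so a * b cannot
// overflow a 64-bit unsigned integer.
uint64_t modadd(uint64_t a, uint64_t b, uint64_t m) { return (a + b) % m; }
// Adding m before subtracting keeps the intermediate value non-negative.
uint64_t modsub(uint64_t a, uint64_t b, uint64_t m) { return (a + m - b) % m; }
uint64_t modmul(uint64_t a, uint64_t b, uint64_t m) { return (a * b) % m; }

// Square-and-multiply exponentiation: a^e mod m in O(log e) multiplications.
uint64_t modpow(uint64_t a, uint64_t e, uint64_t m) {
  assert(m > 0 && m < (1ull << 32));
  uint64_t result = 1 % m;
  a %= m;
  while (e > 0) {
    if (e & 1) result = modmul(result, a, m);
    a = modmul(a, a, m);
    e >>= 1;
  }
  return result;
}
```

For example, with p = 17 the value 2 has multiplicative order 8, so it can serve as the basis Wn for n = 8, and modpow(8, p-2, p) gives the scaling constant Rn for the inverse transform.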
To turn the Cooley-Tukey (complex) FFT into a modular arithmetic approach, i.e. an NTT, you must replace the complex definition of omega. For the approach to be purely recursive, you also need to recalculate omega for each level based on the current signal size. This is possible because the minimum suitable modulus decreases as we move down the call tree, so the modulus used for the root is suitable for lower layers. Additionally, as we are using the same modulus, the same generator may be used as we move down the call tree.
Also, for the inverse transform, you should take the additional step of taking the recalculated omega a and instead using as omega b = a ^ -1 (via a modular inverse operation). Specifically, b = invMod(a, N) s.t. b * a == 1 (mod N), where N is the chosen prime modulus.
Rewriting an expression involving omega by exploiting periodicity still works in the modular arithmetic realm. You also need to find a way to determine the modulus (a prime) for the problem and a valid generator.
We note that your code works, though it is not a MWE. We extended it using common sense and got the correct result for a polynomial multiplication application. You just have to provide correct values of omega raised to certain powers.
While your code works, though, like many other sources, you double the spacing for each level. This does not lead to a recursion that is as clean, though it turns out to be identical to recalculating omega based on the current signal size, because the power in the definition of omega is inversely proportional to the signal size. To reiterate: halving the signal size is like squaring omega, which is like giving doubled powers for omega (which is what one would do when doubling the spacing). The nice thing about the approach that recalculates omega is that each subproblem is more cleanly complete in its own right.
There is a paper that shows some of the math for the modular approach; it is a paper by Baktir and Sunar from 2006. See the paper at the end of this post.
You do not need to extend the cycle from n / 2 to n.
So, yes, some sources which say to just drop in a different omega definition for the modular arithmetic approach are sweeping many details under the rug.
Another issue is that the signal size must be large enough to avoid overflow in the resulting time-domain signal when performing convolution. Additionally, it may be useful to know that fast implementations of modular exponentiation exist, even when the power is quite large.
References
Baktir and Sunar - Achieving efficient polynomial multiplication in Fermat fields using the fast Fourier transform (2006)
You must make sure that roots of unity actually exist. In R there are only 2 roots of unity: 1 and -1, since only for them x^n=1 can be true. In C you have infinitely many roots of unity: w=exp(2*pi*i/N) is a primitive N-th root of unity and all w^k for 0<=k<N are N-th roots of unity.
Now to your problem: you have to make sure the ring you're working in offers the same property: enough roots of unity. Schönhage and Strassen (http://en.wikipedia.org/wiki/Sch%C3%B6nhage%E2%80%93Strassen_algorithm) use integers modulo 2^N+1. This ring has enough roots of unity. 2^N == -1 is a 2nd root of unity, 2^(N/2) is a 4th root of unity and so on. Furthermore, these roots of unity have the advantage that they are powers of two and can be implemented as binary shifts (with a modulo operation afterwards, which comes down to an add/subtract).
I think QuickMul (http://www.cs.nyu.edu/exact/doc/qmul.ps) works modulo 2^N-1.
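To make the points from the answers concrete, here is a small recursive NTT sketch over the integers modulo the prime 17 = 2^4+1, where 2 is a primitive 8th root of unity (2^4 == -1 and 2^8 == 1 mod 17). It is an illustration under those assumptions, not a rewrite of the question's C# code; the names powmod, ntt and intt are made up for the example. It only ever uses the first n/2 powers of omega, because omega^(n/2) == -1 also holds in a prime field when omega is a primitive n-th root of unity with n even. One thing that may be worth checking in the original code is the subtraction: in C# the % operator can return a negative value when the left operand is negative, which is why the sketch adds p before subtracting.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using u64 = uint64_t;

// a^e mod p by square-and-multiply (p small enough that p*p fits in 64 bits).
u64 powmod(u64 a, u64 e, u64 p) {
  u64 r = 1;
  for (a %= p; e > 0; e >>= 1, a = a * a % p)
    if (e & 1) r = r * a % p;
  return r;
}

// Recursive radix-2 NTT over Z/pZ. `w` must be a primitive n-th root of
// unity mod p, where n = x.size() is a power of two.
std::vector<u64> ntt(const std::vector<u64>& x, u64 w, u64 p) {
  const std::size_t n = x.size();
  assert(n > 0 && (n & (n - 1)) == 0);
  if (n == 1) return x;
  std::vector<u64> even(n / 2), odd(n / 2);
  for (std::size_t i = 0; i < n / 2; ++i) {
    even[i] = x[2 * i];
    odd[i] = x[2 * i + 1];
  }
  const u64 w2 = w * w % p;  // halving the size squares the root
  const std::vector<u64> e = ntt(even, w2, p), o = ntt(odd, w2, p);
  std::vector<u64> out(n);
  u64 wk = 1;  // runs through w^0 .. w^(n/2 - 1) only
  for (std::size_t k = 0; k < n / 2; ++k, wk = wk * w % p) {
    const u64 t = wk * o[k] % p;
    out[k] = (e[k] + t) % p;
    out[k + n / 2] = (e[k] + p - t) % p;  // add p first: never negative
  }
  return out;
}

// Inverse transform: same routine with w^-1, then scale by n^-1.
// Both inverses come from Fermat's little theorem, so p must be prime.
std::vector<u64> intt(const std::vector<u64>& X, u64 w, u64 p) {
  std::vector<u64> x = ntt(X, powmod(w, p - 2, p), p);
  const u64 n_inv = powmod(x.size() % p, p - 2, p);
  for (u64& v : x) v = v * n_inv % p;
  return x;
}
```

With p = 17 the inputs must stay below 17, so this is only a toy size; a real multiplication needs a prime large enough for the convolution results, as the answers above point out.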