Measured GFLOPS is greater than theoretical GFLOPS - octave

I have written a script to measure the GFLOPS that I can expect for an element-wise matrix multiplication in Octave. My CPU is an i7-2670QM @ 2.2GHz. Looking at the spec, the theoretical GFLOPS is 70.4. Running the script below, which uses just one of the four cores of my system, I measured 185 GFLOPS.
1;
n = 4096;
x = rand(n, n);
tic, x = x .* x;
y = toc
printf('GFLOPS: %f\n', n * n / y / 1e6);
Starting Octave and running the script (mult.m):
octave:1> mult
y = 0.090080
GFLOPS: 186.247910
The script performs 4096 * 4096 double precision multiplications (FLOPS) in 0.09 seconds, i.e. 186 GFLOPS. This is much greater than the theoretical 70.4 GFLOPS. What is wrong?
The operator .* is an element-wise multiplication as you can see:
octave:1> a = [1, 2; 3, 4];
octave:2> b = [2, 3; 4, 5];
octave:3> a .* b
ans =
2 6
12 20
Thus, I expect n² multiplications.

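You are measuring M(ega)FLOPS (1e6), not G(iga)FLOPS (1e9). The script divides by 1e6, so the printed 186 is MFLOPS, i.e. about 0.186 GFLOPS, which is well below the theoretical peak.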
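To make the unit mix-up concrete, here is a small sketch in Python that redoes the arithmetic with the matrix size and the timing quoted in the question. The variable names are only illustrative.

```python
# Values taken from the question: matrix size and measured time.
n = 4096
elapsed = 0.090080  # seconds, as printed by the Octave run

flops = n * n / elapsed      # one multiplication per matrix element
mflops = flops / 1e6         # what the script actually prints
gflops = flops / 1e9         # what the label claims

print(f"MFLOPS: {mflops:.2f}")
print(f"GFLOPS: {gflops:.4f}")
```

Dividing by 1e9 instead of 1e6 in the original printf call would give the figure in GFLOPS.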

Why can't I see the print statements even though the code runs without showing any errors?

Consider this code:
% DFT vs FFT
%% Example 1 N = 64
close all
clear
clc
eval_dft_vs_fft(64);
%% Example 2 N = 512
close all
clear
eval_dft_vs_fft(512);
%% Example 3 N = 4096
close all
clear
eval_dft_vs_fft(4096);
function [t_DFT,t_FFT, RMSE_FFT, RMSE_DFT] = eval_dft_vs_fft(N)
% generate an array of random, complex numbers
x = complex(rand(1, N), rand(1,N));
tic % begin time measurement for the DFT calculation
x_DFT = IDFT(DFT(x)); % Determine the DFT and IDFT result
t_DFT = toc; % end time measurement
tic % begin time measurement for the FFT calculation
x_FFT = ifft(fft(x)); % Determine the FFT and IFFT result
t_FFT = toc; % end time measurement
% calculate the RMS Error of the DFT
mean = sum(abs(x - x_DFT).^2)/N;
RMSE_DFT = sqrt(mean);
% calculate the RMS Error of the FFT
mean = sum(abs(x - x_FFT).^2)/N;
RMSE_FFT = sqrt(mean);
disp("Number of elements N = " + N)
disp(" ")
disp("Calculation Time DFT = " + t_DFT)
disp("Calculation Time FFT = " + t_FFT)
disp(" ")
disp("RMS Error DFT = " + RMSE_DFT)
disp("RMS Error FFT = " + RMSE_FFT)
fprintf('\n---------------\n\n')
end
function x = IDFT(X)
N = length(X);
x = zeros(1, N);
for n=0:N-1
x_1 = 0;
for k = 0:N-1
x_1 = x_1 + X(k+1) .* exp((1j*2*pi*k*n)/N);
end
x(n+1) = x_1;
end
x = x ./ N;
end
function X = DFT(x)
N = numel(x);
X = zeros(1, N);
for k=0:N-1
X_1 = 0;
for n = 0:N-1
X_1 = X_1 + x(n+1) .* exp(-(1j*2*pi*k*n)/N);
end
X(k+1) = X_1;
end
end
Its purpose is to compare the calculation time of the DFT and the FFT as well as their RMS error. I am getting no errors in the command window, but the disp statements aren't showing up anywhere.
Rather, what I get in the command window is this:
Columns 1 through 22:
142 181 173 162 165 178 96 175 166 96 165 172 165 173 165 174 180 179 96 142 96 125
61 32
I am very new to Octave so any help is appreciated.
The function disp does not accept mixed text and values this way. In Octave a double-quoted string is a char array, so "Number of elements N = " + N adds N to the code of every character. That is why the output is a list of numbers (the character codes, each increased by N). You could separate the calls:
disp("Number of elements N = "), disp(N)
which prints two lines, or use another function such as printf:
printf("Number of elements N = %d\n", N)
to keep the text and the value on the same line.
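As a cross-check of that explanation, the sketch below (Python, used here only to illustrate the arithmetic) adds N = 64 to each character code of the string. The first 22 values match the first row of numbers printed in the question.

```python
text = "Number of elements N = "
N = 64

# Octave's  "..." + N  behaves like adding N to every character code.
codes = [ord(ch) + N for ch in text]
print(codes)

# Formatting the number into the string instead, as printf does in Octave:
print("Number of elements N = %d" % N)
```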

How to deduce left-hand side matrix from vector?

Suppose I have the following script, which constructs a symbolic array, A_known, and a symbolic vector x, and performs a matrix multiplication.
clc; clearvars
try
pkg load symbolic
catch
error('Symbolic package not available!');
end
syms V_l k s0 s_mean
N = 3;
% Generate left-hand-side square matrix
A_known = sym(zeros(N));
for hI = 1:N
A_known(hI, 1:hI) = exp(-(hI:-1:1)*k);
end
A_known = A_known./V_l;
% Generate x vector
x = sym('x', [N 1]);
x(1) = x(1) + s0*V_l;
% Matrix multiplication to give b vector
b = A_known*x
Suppose A_known was actually unknown. Is there a way to deduce it from b and x? If so, how?
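Until now, I have only had the case where x was unknown, which can normally be solved via x = A \ b.
Mathematically, it is possible to get a solution, but the problem actually has infinitely many solutions: b = A*x gives only N equations for the N^2 unknown entries of A. The example below picks one particular solution, b*pinv(x).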
Example
A = magic(5);
x = (1:5)';
b = A*x;
A_sol = b*pinv(x);
where the original matrix is
>> A
A =
17 24 1 8 15
23 5 7 14 16
4 6 13 20 22
10 12 19 21 3
11 18 25 2 9
but the recovered A_sol, which also satisfies A_sol*x = b, is
>> A_sol
A_sol =
3.1818 6.3636 9.5455 12.7273 15.9091
3.4545 6.9091 10.3636 13.8182 17.2727
4.4545 8.9091 13.3636 17.8182 22.2727
3.4545 6.9091 10.3636 13.8182 17.2727
3.1818 6.3636 9.5455 12.7273 15.9091
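The non-uniqueness can be checked with exact arithmetic. The following Python sketch rebuilds the same example using the fact that, for a nonzero column vector, pinv(x) = x' / (x'*x). The helper matvec is a name introduced here for illustration.

```python
from fractions import Fraction

def matvec(M, v):
    """Multiply matrix M (list of rows) by vector v."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in M]

# magic(5) as printed in the answer above
A = [[17, 24, 1, 8, 15],
     [23, 5, 7, 14, 16],
     [4, 6, 13, 20, 22],
     [10, 12, 19, 21, 3],
     [11, 18, 25, 2, 9]]
x = [1, 2, 3, 4, 5]
b = matvec(A, x)

# b * pinv(x) is the outer product b * x' divided by x' * x.
xx = sum(v * v for v in x)
A_sol = [[Fraction(b_i * x_j, xx) for x_j in x] for b_i in b]

print(matvec(A_sol, x) == b)   # A_sol reproduces b exactly
print(A_sol == A)              # yet it is not the original matrix
```

Any matrix of the form A_sol + C with C*x = 0 is a solution as well, which is why A cannot be recovered from a single pair (x, b).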

Find integer solution to formula

Given two vectors of candidates:
x = [1 3 5];
y = [1 2 3 4];
I want to find which candidates satisfy an equation or formula. This is what I want to do:
f = x + y - 6;
solve f;
And then, it spits out the solutions:
5 1
3 3
If it matters, I am actually using Octave, not MatLab because I don't have a Windows machine. I know that I can do this with a for loop:
for i=x
for j=y
if i+j-6==0
disp([i j]);
end
end
end
This is a trivial example. I am looking for a solution that will handle much larger examples.
Solving such equations by brute force is generally a bad idea, but here you go:
x = [1 3 5];
y = [1 2 3 4];
## build grid (also works for n vars)
[xx, yy] = ndgrid (x, y);
## anonymous function
f = @(x,y) abs(x + y - 6) < 16*eps
## true?
t = f (xx, yy);
## build result
[xx(t) yy(t)]
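For comparison, the same brute-force idea can be sketched in Python with itertools.product, which enumerates the candidate grid lazily instead of materialising it. The predicate f is just the equation from the question.

```python
from itertools import product

x = [1, 3, 5]
y = [1, 2, 3, 4]

def f(a, b):
    """True when the candidate pair satisfies a + b - 6 = 0."""
    return a + b - 6 == 0

# Test every combination of candidates and keep the ones that satisfy f.
solutions = [(a, b) for a, b in product(x, y) if f(a, b)]
print(solutions)  # [(3, 3), (5, 1)]
```

The cost still grows with the product of the candidate list lengths, so this only postpones the problem for much larger inputs.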

Expected number of bank conflicts in shared memory at random access

Let A be a properly aligned array of 32-bit integers in shared memory.
If a single warp tries to fetch elements of A at random, what is the expected number of bank conflicts?
In other words:
__shared__ int A[N]; //N is some big constant integer
...
int v = A[ random(0..N-1) ]; // <-- expected number of bank conflicts here?
Please assume Tesla or Fermi architecture. I don't want to delve into the 32-bit vs 64-bit bank configurations of Kepler. Also, for simplicity, let us assume that all the random numbers are different (thus no broadcast mechanism).
My gut feeling suggests a number somewhere between 4 and 6, but I would like to find some mathematical evaluation of it.
I believe the problem can be abstracted out from CUDA and presented as a math problem. I searched it as an extension to Birthday Paradox, but I found really scary formulas there and didn't find a final formula. I hope there is a simpler way...
In math, this is thought of as a "balls in bins" problem: 32 balls are randomly dropped into 32 bins. You can enumerate the possible patterns and calculate their probabilities to determine the distribution. A naive approach will not work, though, as the number of patterns is huge: 63!/(32! * 31!) is "almost" a quintillion.
It is possible to tackle though if you build up the solution recursively and use conditional probabilities.
Look for a paper called "The exact distribution of the maximum, minimum and the range of Multinomial/Dirichlet and Multivariate Hypergeometric frequencies" by Charles J. Corrado.
In the following, we start at leftmost bucket and calculate the probabilities for each number of balls that could have fallen into it. Then we move one to the right and determine the conditional probabilities of each number of balls that could be in that bucket given the number of balls and buckets already used.
Apologies for the VBA code, but VBA was all I had available when motivated to answer :).
Function nCr#(ByVal n#, ByVal r#)
Static combin#()
Static size#
Dim i#, j#
If n = r Then
nCr = 1
Exit Function
End If
If n > size Then
ReDim combin(0 To n, 0 To n)
combin(0, 0) = 1
For i = 1 To n
combin(i, 0) = 1
For j = 1 To i
combin(i, j) = combin(i - 1, j - 1) + combin(i - 1, j)
Next
Next
size = n
End If
nCr = combin(n, r)
End Function
Function p_binom#(n#, r#, p#)
p_binom = nCr(n, r) * p ^ r * (1 - p) ^ (n - r)
End Function
Function p_next_bucket_balls#(balls#, balls_used#, total_balls#, _
bucket#, total_buckets#, bucket_capacity#)
If balls > bucket_capacity Then
p_next_bucket_balls = 0
Else
p_next_bucket_balls = p_binom(total_balls - balls_used, balls, 1 / (total_buckets - bucket + 1))
End If
End Function
Function p_capped_buckets#(n#, cap#)
Dim p_prior, p_update
Dim bucket#, balls#, prior_balls#
ReDim p_prior(0 To n)
ReDim p_update(0 To n)
p_prior(0) = 1
For bucket = 1 To n
For balls = 0 To n
p_update(balls) = 0
For prior_balls = 0 To balls
p_update(balls) = p_update(balls) + p_prior(prior_balls) * _
p_next_bucket_balls(balls - prior_balls, prior_balls, n, bucket, n, cap)
Next
Next
p_prior = p_update
Next
p_capped_buckets = p_update(n)
End Function
Function expected_max_buckets#(n#)
Dim cap#
For cap = 0 To n
expected_max_buckets = expected_max_buckets + (1 - p_capped_buckets(n, cap))
Next
End Function
Sub test32()
Dim p_cumm#(0 To 32)
Dim cap#
For cap# = 0 To 32
p_cumm(cap) = p_capped_buckets(32, cap)
Next
For cap = 1 To 32
Debug.Print " ", cap, Format(p_cumm(cap) - p_cumm(cap - 1), "0.000000")
Next
End Sub
For 32 balls and buckets, I get an expected maximum number of balls in the buckets of about 3.532941.
Output to compare to ahmad's:
1 0.000000
2 0.029273
3 0.516311
4 0.361736
5 0.079307
6 0.011800
7 0.001417
8 0.000143
9 0.000012
10 0.000001
11 0.000000
12 0.000000
13 0.000000
14 0.000000
15 0.000000
16 0.000000
17 0.000000
18 0.000000
19 0.000000
20 0.000000
21 0.000000
22 0.000000
23 0.000000
24 0.000000
25 0.000000
26 0.000000
27 0.000000
28 0.000000
29 0.000000
30 0.000000
31 0.000000
32 0.000000
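As an independent cross-check of the figure above, here is a Python sketch that counts arrangements with exact integers rather than multiplying conditional probabilities. It is a different formulation from the VBA code (a counting recursion over bins), and the function names are made up for this example.

```python
from math import comb

def count_capped(n_balls, n_bins, cap):
    """Ways to drop n_balls labelled balls into n_bins bins, none holding more than cap."""
    ways = [1] + [0] * n_balls            # with zero bins, only zero balls can be placed
    for _ in range(n_bins):
        # Add one bin: choose k of the m balls for it, place the rest in the earlier bins.
        ways = [
            sum(comb(m, k) * ways[m - k] for k in range(min(cap, m) + 1))
            for m in range(n_balls + 1)
        ]
    return ways[n_balls]

def expected_max(n):
    """Expected maximum bin load for n balls in n bins: sum over cap of P(max > cap)."""
    total = n ** n
    return sum(1 - count_capped(n, n, cap) / total for cap in range(n))

print(expected_max(2))               # 1.5
print(round(expected_max(32), 4))
```

For two balls in two bins the maximum is 1 or 2 with equal probability, which gives the 1.5 printed on the first line.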
I'll try a math answer, although I don't have it quite right yet.
You basically want to know, given random 32-bit word indexing within a warp into an aligned __shared__ array, "what is the expected value of the maximum number of addresses within a warp that map to a single bank?"
If I consider the problem similar to hashing, then it relates to the expected maximum number of items that will hash to a single location, and this document shows an upper bound on that number of O(log n / log log n) for hashing n items into n buckets. (The math is pretty hairy!).
For n = 32, that works out to about 2.788 (using natural log). That’s fine, but here I modified ahmad's program a bit to empirically calculate the expected maximum (also simplified the code and modified names and such for clarity and fixed some bugs).
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <algorithm>
#define NBANK 32
#define WARPSIZE 32
#define NSAMPLE 100000
int main(){
int i=0,j=0;
int *bank=(int*)malloc(sizeof(int)*NBANK);
int *randomNumber=(int*)malloc(sizeof(int)*WARPSIZE);
int *maxCount=(int*)malloc(sizeof(int)*(NBANK+1));
memset(maxCount, 0, sizeof(int)*(NBANK+1));
for (int i=0; i<NSAMPLE; ++i) {
// generate a sample warp shared memory access
for(j=0; j<WARPSIZE; j++){
randomNumber[j]=rand()%NBANK;
}
// check the bank conflict
memset(bank, 0, sizeof(int)*NBANK);
int max_bank_conflict=0;
for(j=0; j<WARPSIZE; j++){
bank[randomNumber[j]]++;
}
for(j=0; j<NBANK; j++) // scan every bank for the largest count
max_bank_conflict = std::max<int>(max_bank_conflict, bank[j]);
// store statistic
maxCount[max_bank_conflict]++;
}
// report statistic
printf("Max conflict degree %% (%d random samples)\n", NSAMPLE);
float expected = 0;
for(i=1; i<NBANK+1; i++) {
float prob = maxCount[i]/(float)NSAMPLE;
printf("%02d -> %6.4f\n", i, prob);
expected += prob * i;
}
printf("Expected maximum bank conflict degree = %6.4f\n", expected);
return 0;
}
Using the percentages found in the program as probabilities, the expected maximum value is the sum of products sum(i * probability(i)), for i from 1 to 32. I compute the expected value to be 3.529 (matches ahmad's data). It’s not super far off, but the 2.788 is supposed to be an upper bound. Since the upper bound is given in big-O notation, I guess there’s a constant factor left out. But that's currently as far as I've gotten.
Open questions: Is that constant factor enough to explain it? Is it possible to compute the constant factor for n = 32? It would be interesting to reconcile these, and/or to find a closed form solution for the expected maximum bank conflict degree with 32 banks and 32 parallel threads.
This is a very useful topic, since it can help in modeling and predicting performance when shared memory addressing is effectively random.
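The 2.788 figure quoted above is easy to reproduce. The short Python sketch below evaluates ln n / ln ln n for n = 32 and compares it with the simulated expectation of 3.529 reported in this answer. The ratio only shows how large the hidden constant would have to be at n = 32 and does not settle the open question.

```python
import math

n = 32
bound_term = math.log(n) / math.log(math.log(n))   # ln n / ln ln n
simulated = 3.529                                  # value reported in the answer above

print(round(bound_term, 3))   # 2.788
print(simulated / bound_term)
```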
I assume Fermi's 32-bank shared memory, where each 4 consecutive bytes are stored in consecutive banks. Using the following code:
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <string.h> // for memset
#define NBANK 32
#define N 7823
#define WARPSIZE 32
#define NSAMPLE 10000
int main(){
srand ( time(NULL) );
int i=0,j=0;
int *conflictCheck=NULL;
int *randomNumber=NULL;
int *statisticCheck=NULL;
conflictCheck=(int*)malloc(sizeof(int)*NBANK);
randomNumber=(int*)malloc(sizeof(int)*WARPSIZE);
statisticCheck=(int*)calloc(NBANK+1, sizeof(int)); // counters must start at zero
while(i<NSAMPLE){
// generate a sample warp shared memory access
for(j=0; j<WARPSIZE; j++){
randomNumber[j]=rand()%NBANK;
}
// check the bank conflict
memset(conflictCheck, 0, sizeof(int)*NBANK);
int max_bank_conflict=0;
for(j=0; j<WARPSIZE; j++){
conflictCheck[randomNumber[j]]++;
max_bank_conflict = max_bank_conflict<conflictCheck[randomNumber[j]]? conflictCheck[randomNumber[j]]: max_bank_conflict;
}
// store statistic
statisticCheck[max_bank_conflict]++;
// next iter
i++;
}
// report statistic
printf("Over %d random shared memory accesses, the following percentages of bank conflicts were found\n", NSAMPLE);
for(i=0; i<NBANK+1; i++){
//
printf("%d -> %6.4f\n",i,statisticCheck[i]/(float)NSAMPLE);
}
return 0;
}
I got the following output:
Over 10000 random shared memory accesses, the following percentages of bank conflicts were found
0 -> 0.0000
1 -> 0.0000
2 -> 0.0281
3 -> 0.5205
4 -> 0.3605
5 -> 0.0780
6 -> 0.0106
7 -> 0.0022
8 -> 0.0001
9 -> 0.0000
10 -> 0.0000
11 -> 0.0000
12 -> 0.0000
13 -> 0.0000
14 -> 0.0000
15 -> 0.0000
16 -> 0.0000
17 -> 0.0000
18 -> 0.0000
19 -> 0.0000
20 -> 0.0000
21 -> 0.0000
22 -> 0.0000
23 -> 0.0000
24 -> 0.0000
25 -> 0.0000
26 -> 0.0000
27 -> 0.0000
28 -> 0.0000
29 -> 0.0000
30 -> 0.0000
31 -> 0.0000
32 -> 0.0000
We can conclude that a 3- to 4-way conflict is the most likely outcome with random access. You can tune the run with different NBANK (number of banks in shared memory), WARPSIZE (warp size of machine), and NSAMPLE (number of random shared memory accesses generated to evaluate the model). N (number of elements in array) is defined but is not used by the code as shown, because each access is drawn directly as a bank index with rand()%NBANK.

Getting a specific digit from a ratio expansion in any base (nth digit of x/y)

Is there an algorithm that can calculate the digits of a repeating-decimal ratio without starting at the beginning?
I'm looking for a solution that doesn't use arbitrarily sized integers, since this should work for cases where the decimal expansion may be arbitrarily long.
For example, 33/59 expands to a repeating decimal with 58 digits. If I wanted to verify that, how could I calculate the digits starting at the 58th place?
Edited - with the ratio 2124679 / 2147483647, how to get the hundred digits in the 2147484600th through 2147484700th places.
OK, 3rd try's a charm :)
I can't believe I forgot about modular exponentiation.
So to steal/summarize from my 2nd answer, the nth digit of x/y is the 1st digit of (10^(n-1) * x mod y)/y = floor(10 * (10^(n-1) * x mod y) / y) mod 10.
The part that takes all the time is the 10^(n-1) mod y, but we can do that with fast (O(log n)) modular exponentiation. With this in place, it's not worth trying to do the cycle-finding algorithm.
However, you do need the ability to do (a * b mod y) where a and b are numbers that may be as large as y. (if y requires 32 bits, then you need to do 32x32 multiply and then 64-bit % 32-bit modulus, or you need an algorithm that circumvents this limitation. See my listing that follows, since I ran into this limitation with Javascript.)
So here's a new version.
function abmody(a,b,y)
{
var x = 0;
// binary fun here
while (a > 0)
{
if (a & 1)
x = (x + b) % y;
b = (2 * b) % y;
a >>>= 1;
}
return x;
}
function digits2(x,y,n1,n2)
{
// the nth digit of x/y = floor(10 * (10^(n-1)*x mod y) / y) mod 10.
var m = n1-1;
var A = 1, B = 10;
while (m > 0)
{
// loop invariant: 10^(n1-1) = A*(B^m) mod y
if (m & 1)
{
// A = (A * B) % y but javascript doesn't have enough sig. digits
A = abmody(A,B,y);
}
// B = (B * B) % y but javascript doesn't have enough sig. digits
B = abmody(B,B,y);
m >>>= 1;
}
x = x % y;
// A = (A * x) % y;
A = abmody(A,x,y);
var answer = "";
for (var i = n1; i <= n2; ++i)
{
var digit = Math.floor(10*A/y)%10;
answer += digit;
A = (A * 10) % y;
}
return answer;
}
(You'll note that the structures of abmody() and the modular exponentiation are the same; both are based on Russian peasant multiplication.)
And results:
js>digits2(2124679,214748367,214748300,214748400)
20513882650385881630475914166090026658968726872786883636698387559799232373208220950057329190307649696
js>digits2(122222,990000,100,110)
65656565656
js>digits2(1,7,1,7)
1428571
js>digits2(1,7,601,607)
1428571
js>digits2(2124679,2147483647,2147484600,2147484700)
04837181235122113132440537741612893408915444001981729642479554583541841517920532039329657349423345806
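For readers who want to cross-check these results, here is a compact Python sketch of the same idea. It relies on Python's built-in three-argument pow for the modular exponentiation, so no helper like abmody() is needed. The function name digits_of_ratio is invented for this example, and the intermediate values stay below 10*y.

```python
def digits_of_ratio(x, y, n1, n2):
    """Return digits n1..n2 (1-based, after the decimal point) of x/y as a string."""
    # Remainder just before digit n1: (10^(n1-1) * x) mod y, by modular exponentiation.
    rem = (pow(10, n1 - 1, y) * (x % y)) % y
    out = []
    for _ in range(n1, n2 + 1):
        rem *= 10
        out.append(str(rem // y))   # one step of long division yields the next digit
        rem %= y
    return "".join(out)

print(digits_of_ratio(1, 7, 1, 7))               # 1428571
print(digits_of_ratio(1, 7, 601, 607))           # 1428571
print(digits_of_ratio(122222, 990000, 100, 110)) # 65656565656
```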
edit: (I'm leaving post here for posterity. But please don't upvote it anymore: it may be theoretically useful but it's not really practical. I have posted another answer which is much more useful from a practical point of view, doesn't require any factoring, and doesn't require the use of bignums.)
@Daniel Bruckner has the right approach, I think. (with a few additional twists required)
Maybe there's a simpler method, but the following will always work:
Let's use the examples q = x/y = 33/57820 and 44/65 in addition to 33/59, for reasons that may become clear shortly.
Step 1: Factor the denominator (specifically factor out 2's and 5's)
Write q = x/y = x/(2^a2 * 5^a5 * z). Factors of 2 and 5 in the denominator do not cause repeated decimals, and the remaining factor z is coprime to 10. In fact, the next step requires factoring z, so you might as well factor the whole thing.
Calculate a10 = max(a2, a5) which is the smallest exponent of 10 that is a multiple of the factors of 2 and 5 in y.
In our example 57820 = 2 * 2 * 5 * 7 * 7 * 59, so a2 = 2, a5 = 1, a10 = 2, z = 7 * 7 * 59 = 2891.
In our example 33/59, 59 is a prime and contains no factors of 2 or 5, so a2 = a5 = a10 = 0.
In our example 44/65, 65 = 5*13, and a2 = 0, a5 = a10 = 1.
Just for reference I found a good online factoring calculator here. (even does totients which is important for the next step)
Step 2: Use Euler's Theorem or Carmichael's Theorem.
What we want is a number n such that 10^n - 1 is divisible by z, or in other words, 10^n ≡ 1 (mod z). Euler's function φ(z) and Carmichael's function λ(z) will both give you valid values for n, with λ(z) giving you the smaller number and φ(z) being perhaps a little easier to calculate. This isn't too hard, it just means factoring z and doing a little math.
φ(2891) = 7 * 6 * 58 = 2436
λ(2891) = lcm(7*6, 58) = 1218
This means that 10^2436 ≡ 10^1218 ≡ 1 (mod 2891).
For the simpler fraction 33/59, φ(59) = λ(59) = 58, so 10^58 ≡ 1 (mod 59).
For 44/65 = 44/(5*13), φ(13) = λ(13) = 12.
So what? Well, the period of the repeating decimal must divide both φ(z) and λ(z), so they effectively give you upper bounds on the period of the repeating decimal.
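The actual period is the multiplicative order of 10 modulo z, i.e. the smallest n with 10^n ≡ 1 (mod z). A minimal Python sketch that finds it by direct search (so without any factoring, at O(period) cost) and checks that it divides the bounds computed above:

```python
from math import gcd

def multiplicative_order(base, z):
    """Smallest n >= 1 with base**n % z == 1; requires gcd(base, z) == 1."""
    if gcd(base, z) != 1:
        raise ValueError("base and z must be coprime")
    if z == 1:
        return 1
    n, value = 1, base % z
    while value != 1:
        value = value * base % z
        n += 1
    return n

print(multiplicative_order(10, 59))   # 58
print(multiplicative_order(10, 13))   # 6

order_2891 = multiplicative_order(10, 2891)
print(1218 % order_2891 == 0, 2436 % order_2891 == 0)   # True True
```

Note that for 13 the order is 6 even though λ(13) = 12, which illustrates that φ and λ only give multiples of the period.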
Step 3: More number crunching
Let's use n = λ(z). If we subtract Q' = 10^a10 * x/y from Q'' = 10^(a10 + n) * x/y, we get:
m = 10^a10 * (10^n - 1) * x/y
which is an integer because 10^a10 is a multiple of the factors of 2 and 5 of y, and 10^n - 1 is a multiple of the remaining factors of y.
What we've done here is to shift left the original number q by a10 places to get Q', and shift left q by a10 + n places to get Q'', which are repeating decimals, but the difference between them is an integer we can calculate.
Then we can rewrite x/y as m / 10^a10 / (10^n - 1).
Consider the example q = 44/65 = 44/(5*13)
a10 = 1, and λ(13) = 12, so Q' = 10^1 * q and Q'' = 10^(12+1) * q.
m = Q'' - Q' = (10^12 - 1) * 10^1 * (44/65) = 153846153846*44 = 6769230769224
so q = 6769230769224 / 10 / (10^12 - 1).
The other fractions 33/57820 and 33/59 lead to larger fractions.
Step 4: Find the nonrepeating and repeating decimal parts.
Notice that for k between 1 and 9, k/9 = 0.kkkkkkkkkkkkk...
Similarly, for a 2-digit number kl between 1 and 99, kl/99 = 0.klklklklklkl...
This generalizes: for k-digit patterns abc...ij, the number abc...ij/(10^k - 1) = 0.abc...ijabc...ijabc...ij...
If you follow the pattern, you'll see that what we have to do is to take this (potentially) huge integer m we got in the previous step, and write it as m = s*(10^n - 1) + r, where 0 ≤ r < 10^n - 1.
This leads to the final answer:
s is the non-repeating part
r is the repeating part (zero-padded on the left if necessary to ensure that it is n digits)
with a10 = 0, the decimal point is between the nonrepeating and repeating part; if a10 > 0 then it is located a10 places to the left of the junction between s and r.
For 44/65, we get 6769230769224 = 6 * (10^12 - 1) + 769230769230
s = 6, r = 769230769230, and 44/65 = 0.6(769230769230), where the parentheses here designate the repeated part.
You can make the numbers smaller by finding the smallest value of n in step 2, by starting with the Carmichael function λ(z) and seeing if any of its factors lead to values of n such that 10^n ≡ 1 (mod z).
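The four steps can be sketched end to end in Python. This version is an illustration under one stated substitution: instead of factoring z and using λ(z), it finds the smallest n by direct search, which is simpler but takes O(n) multiplications. The function name split_decimal is hypothetical.

```python
def split_decimal(x, y):
    """Return (a10, n, s, r): s is the nonrepeating integer, r the n-digit repeating block."""
    # Step 1: strip the factors of 2 and 5 from the denominator.
    z, a2, a5 = y, 0, 0
    while z % 2 == 0:
        z //= 2
        a2 += 1
    while z % 5 == 0:
        z //= 5
        a5 += 1
    a10 = max(a2, a5)
    # Step 2: smallest n with 10^n ≡ 1 (mod z), by direct search (n = 1 when z == 1).
    n, power = 1, 10 % z
    while z != 1 and power != 1:
        power = power * 10 % z
        n += 1
    # Step 3: m = 10^a10 * (10^n - 1) * x / y is an exact integer.
    m = 10**a10 * (10**n - 1) * x // y
    # Step 4: split m into nonrepeating part s and repeating part r.
    s, r = divmod(m, 10**n - 1)
    return a10, n, s, r

a10, n, s, r = split_decimal(44, 65)
print(a10, n, s, f"{r:0{n}d}")   # 1 6 6 769230
```

With the minimal period 6 instead of λ(13) = 12, the repeating block for 44/65 shrinks from 769230769230 to 769230, as the remark about smaller n suggests.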
update: For the curious, the Python interpreter seems to be the easiest way to calculate with bignums. (pow(x,y) calculates x^y, and // and % are integer division and remainder, respectively.) Here's an example:
>>> N = pow(10,12)-1
>>> m = N*pow(10,1)*44//65
>>> m
6769230769224
>>> r=m%N
>>> r
769230769230
>>> s=m//N
>>> s
6
>>> 44/65
0.67692307692307696
>>> N = pow(10,58)-1
>>> m=N*33//59
>>> m
5593220338983050847457627118644067796610169491525423728813
>>> r=m%N
>>> r
5593220338983050847457627118644067796610169491525423728813
>>> s=m//N
>>> s
0
>>> 33/59
0.55932203389830504
>>> N = pow(10,1218)-1
>>> m = N*pow(10,2)*33//57820
>>> m
57073676928398478035281909373919059149083362158422691110342442061570390868211691
45624351435489450017295053614666205465236942234520927014873746108612936700103770
32168799723279142165340712556208924247665167762020062262193012798339674852992044
27533725354548599100657212037357315807679003804911795226565202352127291594603943
27222414389484607402282947077135939121411276374956762365963334486336907644413697
68246281563472846765824974057419578000691802144586648218609477689380837080594949
84434451746800415081286751988931165686613628502248356969906606710480802490487720
51193358699411968177101349014181943964026288481494292632307160152196471809062608
09408509166378415773088896575579384296091317883085437564856451054998270494638533
37945347630577654790729851262538913870632998962296783120027672085783465928744379
10757523348322379799377378069872016603251470079557246627464545140089934278796264
26841923209961950882047734347976478727084053960567277758561051539259771705292286
40608785887236250432376340366655136630923555863023175371843652715323417502594258
04219993081978554133517813905223106191629194050501556554825319958491871324801106
88343133863714977516430300933932895191975095122794880664130058803182289865098581
80560359737115185
>>> r=m%N
>>> r
57073676928398478035281909373919059149083362158422691110342442061570390868211691
45624351435489450017295053614666205465236942234520927014873746108612936700103770
32168799723279142165340712556208924247665167762020062262193012798339674852992044
27533725354548599100657212037357315807679003804911795226565202352127291594603943
27222414389484607402282947077135939121411276374956762365963334486336907644413697
68246281563472846765824974057419578000691802144586648218609477689380837080594949
84434451746800415081286751988931165686613628502248356969906606710480802490487720
51193358699411968177101349014181943964026288481494292632307160152196471809062608
09408509166378415773088896575579384296091317883085437564856451054998270494638533
37945347630577654790729851262538913870632998962296783120027672085783465928744379
10757523348322379799377378069872016603251470079557246627464545140089934278796264
26841923209961950882047734347976478727084053960567277758561051539259771705292286
40608785887236250432376340366655136630923555863023175371843652715323417502594258
04219993081978554133517813905223106191629194050501556554825319958491871324801106
88343133863714977516430300933932895191975095122794880664130058803182289865098581
80560359737115185
>>> s=m//N
>>> s
0
>>> 33/57820
0.00057073676928398479
with the overloaded Python % string operator usable for zero-padding, to see the full set of repeated digits:
>>> "%01218d" % r
'0570736769283984780352819093739190591490833621584226911103424420615703908682116
91456243514354894500172950536146662054652369422345209270148737461086129367001037
70321687997232791421653407125562089242476651677620200622621930127983396748529920
44275337253545485991006572120373573158076790038049117952265652023521272915946039
43272224143894846074022829470771359391214112763749567623659633344863369076444136
97682462815634728467658249740574195780006918021445866482186094776893808370805949
49844344517468004150812867519889311656866136285022483569699066067104808024904877
20511933586994119681771013490141819439640262884814942926323071601521964718090626
08094085091663784157730888965755793842960913178830854375648564510549982704946385
33379453476305776547907298512625389138706329989622967831200276720857834659287443
79107575233483223797993773780698720166032514700795572466274645451400899342787962
64268419232099619508820477343479764787270840539605672777585610515392597717052922
86406087858872362504323763403666551366309235558630231753718436527153234175025942
58042199930819785541335178139052231061916291940505015565548253199584918713248011
06883431338637149775164303009339328951919750951227948806641300588031822898650985
8180560359737115185'
As a general technique, rational fractions have a non-repeating part followed by a repeating part, like this:
nnn.xxxxxxxxrrrrrr
xxxxxxxx is the nonrepeating part and rrrrrr is the repeating part.
Determine the length of the nonrepeating part.
If the digit in question is in the nonrepeating part, then calculate it directly using division.
If the digit in question is in the repeating part, calculate its position within the repeating sequence (you now know the lengths of everything), and pick out the correct digit.
The above is a rough outline and would need more precision to implement in an actual algorithm, but it should get you started.
AHA! caffiend: your comment to my other (longer) answer (specifically "duplicate remainders") leads me to a very simple solution that is O(n) where n = the sum of the lengths of the nonrepeating + repeating parts, and requires only integer math with numbers between 0 and 10*y where y is the denominator.
Here's a Javascript function to get the nth digit to the right of the decimal point for the rational number x/y:
function digit(x,y,n)
{
if (n == 0)
return Math.floor(x/y)%10;
return digit(10*(x%y),y,n-1);
}
It's recursive rather than iterative, and is not smart enough to detect cycles (the 10000th digit of 1/3 is obviously 3, but this keeps on going until it reaches the 10000th iteration), but it works at least until the stack runs out of memory.
Basically this works because of two facts:
the nth digit of x/y is the (n-1)th digit of 10x/y (example: the 6th digit of 1/7 is the 5th digit of 10/7 is the 4th digit of 100/7 etc.)
the nth digit of x/y is the nth digit of (x%y)/y (example: the 5th digit of 10/7 is also the 5th digit of 3/7)
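Those two facts translate directly into a loop. Here is an iterative Python sketch of the same routine, which avoids the recursion depth limit. Like the recursive version it performs n steps and does not detect cycles.

```python
def nth_digit(x, y, n):
    """n-th digit of x/y to the right of the decimal point (n = 0 gives the units digit)."""
    for _ in range(n):
        # Drop the integer part (second fact), then shift one place left (first fact).
        x = 10 * (x % y)
    return (x // y) % 10

print(nth_digit(1, 7, 6))       # 7
print(nth_digit(1, 3, 10000))   # 3
print(nth_digit(33, 59, 3))     # 9
```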
We can tweak this to be an iterative routine and combine it with Floyd's cycle-finding algorithm (which I learned as the "rho" method from a Martin Gardner column) to get something that shortcuts this approach.
Here's a javascript function that computes a solution with this approach:
function digit(x,y,n,returnstruct)
{
function kernel(x,y) { return 10*(x%y); }
var period = 0;
var x1 = x;
var x2 = x;
var i = 0;
while (n > 0)
{
n--;
i++;
x1 = kernel(x1,y); // iterate once
x2 = kernel(x2,y);
x2 = kernel(x2,y); // iterate twice
// have both 1x and 2x iterations reached the same state?
if (x1 == x2)
{
period = i;
n = n % period;
i = 0;
// start again in case the nonrepeating part gave us a
// multiple of the period rather than the period itself
}
}
var answer=Math.floor(x1/y);
if (returnstruct)
return {period: period, digit: answer,
toString: function()
{
return 'period='+this.period+',digit='+this.digit;
}};
else
return answer;
}
And an example of running the nth digit of 1/700:
js>1/700
0.0014285714285714286
js>n=10000000
10000000
js>rs=digit(1,700,n,true)
period=6,digit=4
js>n%6
4
js>rs=digit(1,700,4,true)
period=0,digit=4
Same thing for 33/59:
js>33/59
0.559322033898305
js>rs=digit(33,59,3,true)
period=0,digit=9
js>rs=digit(33,59,61,true)
period=58,digit=9
js>rs=digit(33,59,61+58,true)
period=58,digit=9
And 122222/990000 (long nonrepeating part):
js>122222/990000
0.12345656565656565
js>digit(122222,990000,5,true)
period=0,digit=5
js>digit(122222,990000,7,true)
period=6,digit=5
js>digit(122222,990000,9,true)
period=2,digit=5
js>digit(122222,990000,9999,true)
period=2,digit=5
js>digit(122222,990000,10000,true)
period=2,digit=6
Here's another function that finds a stretch of digits:
// find digits n1 through n2 of x/y
function digits(x,y,n1,n2,returnstruct)
{
function kernel(x,y) { return 10*(x%y); }
var period = 0;
var x1 = x;
var x2 = x;
var i = 0;
var answer='';
while (n2 >= 0)
{
// time to print out digits?
if (n1 <= 0)
answer = answer + Math.floor(x1/y);
n1--,n2--;
i++;
x1 = kernel(x1,y); // iterate once
x2 = kernel(x2,y);
x2 = kernel(x2,y); // iterate twice
// have both 1x and 2x iterations reached the same state?
if (x1 == x2)
{
period = i;
if (n1 > period)
{
var jumpahead = n1 - (n1 % period);
n1 -= jumpahead, n2 -= jumpahead;
}
i = 0;
// start again in case the nonrepeating part gave us a
// multiple of the period rather than the period itself
}
}
if (returnstruct)
return {period: period, digits: answer,
toString: function()
{
return 'period='+this.period+',digits='+this.digits;
}};
else
return answer;
}
I've included the results for your answer (assuming that Javascript #'s didn't overflow):
js>digits(1,7,1,7,true)
period=6,digits=1428571
js>digits(1,7,601,607,true)
period=6,digits=1428571
js>1/7
0.14285714285714285
js>digits(2124679,214748367,214748300,214748400,true)
period=1759780,digits=20513882650385881630475914166090026658968726872786883636698387559799232373208220950057329190307649696
js>digits(122222,990000,100,110,true)
period=2,digits=65656565656
Ad hoc I have no good idea. Maybe continued fractions can help. I am going to think a bit about it ...
UPDATE
From Fermat's little theorem and because 39 is prime the following holds. (= indicates congruence)
10^39 = 10 (39)
Because 10 is coprime to 39.
10^(39 - 1) = 1 (39)
10^38 - 1 = 0 (39)
[to be continued tomorrow]
I was too tired to recognize that 39 is not prime ... ^^ I am going to update the answer in the next days and present the whole idea. Thanks for noting that 39 is not prime.
The short answer for a/b with a < b and an assumed period length p ...
calculate k = (10^p - 1) / b and verify that it is an integer, else a/b does not have a period of p
calculate c = k * a
convert c to its decimal representation and left pad it with zeros to a total length of p
the i-th digit after the decimal point is the (i mod p)-th digit of the padded decimal representation (i = 0 is the first digit after the decimal point - we are developers)
Example
a = 3
b = 7
p = 6
k = (10^6 - 1) / 7
= 142,857
c = 142,857 * 3
= 428,571
Padding is not required and we conclude.
3/7 = 0.428571 (with all six digits 428571 repeating)
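The short recipe above can be sketched in Python as follows. Note that it builds the integer 10^p - 1, so it relies on arbitrary-precision integers for long periods. The function name is invented for this example.

```python
def digit_from_period(a, b, p, i):
    """i-th digit (0-based) after the decimal point of a/b, assuming a < b and period p."""
    k, rem = divmod(10**p - 1, b)
    if rem != 0:
        raise ValueError("a/b does not have a period of p")
    c = k * a
    padded = str(c).zfill(p)        # left pad with zeros to length p
    return int(padded[i % p])

print([digit_from_period(3, 7, 6, i) for i in range(8)])
# [4, 2, 8, 5, 7, 1, 4, 2]
```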