GSL integration accuracy - integration

I am integrating the function f(x) = a(n-1)/(1+ax)^n from x = 0 to x = infinity, where 1 < n < 2. For large a, the gradient near x = 0 becomes large and gsl_integration starts to lose accuracy (even using adaptive integration). Is there any way I can maintain the same accuracy as a increases? Would splitting the integral up to two parts with one near x = 0 help?

The integral with those limits should be independent of the value of a.
If you make the substitution u = ax you will find that
So there will be no need to worry about the loss of accuracy for very large a.

Related

feature scaling and intercept

it is a beginner question but I have a dataset of two features house sizes and number of bedrooms, So I'm working on Octave; So basically I try to do a feature scaling but as in the Design Matrix I added A column of ones (for tetha0) So I try to do a mean normalisation : (x- mean(x)/std(x)
but in the columns of one obviously the mean is 1 since there is only 1 in every line So when I do this the intercept colum is set to 0 :
mu = mean(X)
mu =
1.0000 2000.6809 3.1702
votric = X - mu
votric =
0.00000 103.31915 -0.17021
0.00000 -400.68085 -0.17021
0.00000 399.31915 -0.17021
0.00000 -584.68085 -1.17021
0.00000 999.31915 0.82979
So the first column shouldn't be left out of the mean normalization ?
Yes, you're supposed to normalise the original dataset over all observations first, and only then add the bias term (i.e. the 'column of ones').
The point of normalisation is to allow the various features to be compared on an equal basis, which speeds up optimization algorithms significantly.
The bias (i.e. column of ones) is technically not part of the features. It is just a mathematical convenience to allow us to use a single matrix multiplication to obtain our result in a computationally and notationally efficient manner.
In other words, instead of saying Y = bias + weight1 * X1 + weight2 * X2 etc, you create an imaginary X0 = 1, and denote the bias as weight0, which then allows you to express it in a vectorised fashion as follows: Y = weights * X
"Normalising" the bias term does not make sense, because clearly that would make X0 = 0, the effect of which would be that you would then discard the effect of the bias term altogether. So yes, normalise first, and only then add 'ones' to the normalised features.
PS. I'm going on a limb here and guessing that you're coming from Andrew Ng's machine learning course on coursera. You will see in ex1_multi.m that this is indeed what he's doing in his code (line 52).
% Scale features and set them to zero mean
fprintf('Normalizing Features ...\n');
[X mu sigma] = featureNormalize(X);
% Add intercept term to X
X = [ones(m, 1) X];

Is second order method worse than first order method?

I was thinking about an elementary question in numerical analysis.
When discretizing an ordinary differential equation, it is well known that a second order method is more accurate than a first order method, since the truncation error for second order method is O(dx^2) and O(dx) for the first order method. This is true when 0 < dx < 1.
what if dx > 1? For example, the domain is 0 to 10000 and the mesh size is 1000, then dx = 10. In this case, is the second order method not accurate as first order method, since dx^2 = 100 and dx = 10? We can encounter this when dealing with large scale problem, such as climate modeling (the cloud size could be several kilometers).
A second order method is not more accurate than a first order method because dx^2 < dx, for some value of dx. It's a statement about the asymptotic rate of convergence for small dx.
Additionally, comparing dx^2 to dx directly doesn't make sense, because dx isn't a unitless quantity, it's a length. So you're trying to compare an area to a length, which is meaningless.
In big-O notation, if a quantity converges with O(dx^2), then that typically means that the error is of the form e = a2 dx^2 + a3 dx^3 + ... The leading coefficient a2 is in the units of X/meters^2, where X is whatever units your error is in, and maybe you use some other length instead of meters. Similarly, for a first order solution, the error is in the form b1 dx + b2 dx^2 + ..., where b1 is in units of X/meters.
So if you decide you can neglect the non-leading terms (which you probably can't for large values of dx), the comparison isn't between dx^2 and dx, it's between a2 dx^2 and b1 dx. There is obviously a cross over between these two error terms, but it's not at dx=1, it's at dx = b1/a2. If your discretization is that coarse, you're probably not in the asymptotic regime in which you can ignore higher-order terms, and your solution is probably very inaccurate anyway.

Pollard’s p−1 algorithm: understanding of Berkeley paper

This paper explains about Pollard's p-1 factorization algorithm. I am having trouble understanding the case when factor found is equal to the input we go back and change 'a' (basically page 2 point 2 in the aforementioned paper).
Why we go back and increment 'a'?
Why we not go ahead and keep incrementing the factorial? It it because we keep going into the same cycle we have already seen?
Can I get all the factors using this same algorithm? Such as 49000 = 2^3 * 5^3 * 7^2. Currently I only get 7 and 7000. Perhaps I can use this get_factor() function recursively but I am wondering about the base cases.
def gcd(a, b):
if not b:
return a
return gcd(b, a%b)
def get_factor(input):
a = 2
for factorial in range(2, input-1):
'''we are not calculating factorial as anyway we need to find
out the gcd with n so we do mod n and we also use previously
calculate factorial'''
a = a**factorial % input
factor = gcd(a - 1, input)
if factor == 1:
continue
elif factor == input:
a += 1
elif factor > 1:
return factor
n = 10001077
p = get_factor(n)
q = n/p
print("factors of", n, "are", p, "and", q)
The linked paper is not a particularly good description of Pollard's p − 1 algorithm; most descriptions discuss smoothness bounds that make the algorithm much more practical. You might like to read this page at Prime Wiki. To answer your specific questions:
Why increment a? Because the original a doesn't work. In practice, most implementations don't bother; instead, a different factoring method, such as the elliptic curve method, is tried instead.
Why not increment the factorial? This is where the smoothness bound comes into play. Read the page at Mersenne Wiki for more details.
Can I get all factors? This question doesn't apply to the paper you linked, which assumes that the number being factored is a semi-prime with exactly two factors. The more general answer is "maybe." This is what happens at Step 3a of the linked paper, and choosing a new a may work (or may not). Or you may want to move to a different factoring algorithm.
Here is my simple version of the p − 1 algorithm, using x instead of a. The while loop computes the magical L of the linked paper (it's the least common multiple of the integers less than the smoothness bound b), which is the same calculation as the factorial of the linked paper, but done in a different way.
def pminus1(n, b, x=2):
q = 0; pgen = primegen(); p = next(pgen)
while p < b:
x = pow(x, p**ilog(p,b), n)
q, p = p, next(pgen)
g = gcd(x-1, n)
if 1 < g < n: return g
return False
You can see it in action at http://ideone.com/eMPHtQ, where it factors 10001 as in the linked paper as well as finding a rather spectacular 36-digit factor of fibonacci(522). Once you master that algorithm, you might like to move on to the two-stage version of the algorithm.

any rules of thumb how to smooth FFT spectrum to prevent artifacts when hand-tweaking?

I've got a FFT magnitude spectrum and I want to create a filter from it that selectively passes periodic noise sources (e.g. sinewave spurs) and zero's out the frequency bins associated with the random background noise. I understand sharp transitions in the freq domain will create ringing artifacts once this filter is IFFT back to the time domain... and so I'm wondering if there are any rules of thumb how to smooth the transitions in such a filter to avoid such ringing.
For example, if the FFT has 1M frequency bins, and there are five spurs poking out of the background noise floor, I'd like to zero all bins except the peak bin associated with each of the five spurs. The question is how to handle the neighboring spur bins to prevent artifacts in the time domain. For example, should the the bin on each side of a spur bin be set to 50% amplitude? Should two bins on either side of a spur bin be used (the closest one at 50%, and the next closest at 25%, etc.)? Any thoughts greatly appreciated. Thanks!
I like the following method:
Create the ideal magnitude spectrum (remembering to make it symmetrical about DC)
Inverse transform to the time domain
Rotate the block by half the blocksize
Apply a Hann window
I find it creates reasonably smooth frequency domain results, although I've never tried it on something as sharp as you're suggesting. You can probably make a sharper filter by using a Kaiser-Bessel window, but you have to pick the parameters appropriately. By sharper, I'm guessing maybe you can reduce the sidelobes by 6 dB or so.
Here's some sample Matlab/Octave code. To test the results, I used freqz(h, 1, length(h)*10);.
function [ht, htrot, htwin] = ArbBandPass(N, freqs)
%# N = desired filter length
%# freqs = array of frequencies, normalized by pi, to turn into passbands
%# returns raw, rotated, and rotated+windowed coeffs in time domain
if any(freqs >= 1) || any(freqs <= 0)
error('0 < passband frequency < 1.0 required to fit within (DC,pi)')
end
hf = zeros(N,1); %# magnitude spectrum from DC to 2*pi is intialized to 0
%# In Matlabs FFT, idx 1 -> DC, idx 2 -> bin 1, idx N/2 -> Fs/2 - 1, idx N/2 + 1 -> Fs/2, idx N -> bin -1
idxs = round(freqs * N/2)+1; %# indeces of passband freqs between DC and pi
hf(idxs) = 1; %# set desired positive frequencies to 1
hf(N - (idxs-2)) = 1; %# make sure 2-sided spectrum is symmetric, guarantees real filter coeffs in time domain
ht = ifft(hf); %# this will have a small imaginary part due to numerical error
if any(abs(imag(ht)) > 2*eps(max(abs(real(ht)))))
warning('Imaginary part of time domain signal surprisingly large - is the spectrum symmetric?')
end
ht = real(ht); %# discard tiny imag part from numerical error
htrot = [ht((N/2 + 1):end) ; ht(1:(N/2))]; %# circularly rotate time domain block by N/2 points
win = hann(N, 'periodic'); %# might want to use a window with a flatter mainlobe
htwin = htrot .* win;
htwin = htwin .* (N/sum(win)); %# normalize peak amplitude by compensating for width of window lineshape

How would you write this algorithm for large combinations in the most compact way?

The number of combinations of k items which can be retrieved from N items is described by the following formula.
N!
c = ___________________
(k! * (N - k)!)
An example would be how many combinations of 6 Balls can be drawn from a drum of 48 Balls in a lottery draw.
Optimize this formula to run with the smallest O time complexity
This question was inspired by the new WolframAlpha math engine and the fact that it can calculate extremely large combinations very quickly. e.g. and a subsequent discussion on the topic on another forum.
http://www97.wolframalpha.com/input/?i=20000000+Choose+15000000
I'll post some info/links from that discussion after some people take a stab at the solution.
Any language is acceptable.
Python: O(min[k,n-k]2)
def choose(n,k):
k = min(k,n-k)
p = q = 1
for i in xrange(k):
p *= n - i
q *= 1 + i
return p/q
Analysis:
The size of p and q will increase linearly inside the loop, if n-i and 1+i can be considered to have constant size.
The cost of each multiplication will then also increase linearly.
This sum of all iterations becomes an arithmetic series over k.
My conclusion: O(k2)
If rewritten to use floating point numbers, the multiplications will be atomic operations, but we will lose a lot of precision. It even overflows for choose(20000000, 15000000). (Not a big surprise, since the result would be around 0.2119620413×104884378.)
def choose(n,k):
k = min(k,n-k)
result = 1.0
for i in xrange(k):
result *= 1.0 * (n - i) / (1 + i)
return result
Notice that WolframAlpha returns a "Decimal Approximation". If you don't need absolute precision, you could do the same thing by calculating the factorials with Stirling's Approximation.
Now, Stirling's approximation requires the evaluation of (n/e)^n, where e is the base of the natural logarithm, which will be by far the slowest operation. But this can be done using the techniques outlined in another stackoverflow post.
If you use double precision and repeated squaring to accomplish the exponentiation, the operations will be:
3 evaluations of a Stirling approximation, each requiring O(log n) multiplications and one square root evaluation.
2 multiplications
1 divisions
The number of operations could probably be reduced with a bit of cleverness, but the total time complexity is going to be O(log n) with this approach. Pretty manageable.
EDIT: There's also bound to be a lot of academic literature on this topic, given how common this calculation is. A good university library could help you track it down.
EDIT2: Also, as pointed out in another response, the values will easily overflow a double, so a floating point type with very extended precision will need to be used for even moderately large values of k and n.
I'd solve it in Mathematica:
Binomial[n, k]
Man, that was easy...
Python: approximation in O(1) ?
Using python decimal implementation to calculate an approximation. Since it does not use any external loop, and the numbers are limited in size, I think it will execute in O(1).
from decimal import Decimal
ln = lambda z: z.ln()
exp = lambda z: z.exp()
sinh = lambda z: (exp(z) - exp(-z))/2
sqrt = lambda z: z.sqrt()
pi = Decimal('3.1415926535897932384626433832795')
e = Decimal('2.7182818284590452353602874713527')
# Stirling's approximation of the gamma-funciton.
# Simplification by Robert H. Windschitl.
# Source: http://en.wikipedia.org/wiki/Stirling%27s_approximation
gamma = lambda z: sqrt(2*pi/z) * (z/e*sqrt(z*sinh(1/z)+1/(810*z**6)))**z
def choose(n, k):
n = Decimal(str(n))
k = Decimal(str(k))
return gamma(n+1)/gamma(k+1)/gamma(n-k+1)
Example:
>>> choose(20000000,15000000)
Decimal('2.087655025913799812289651991E+4884377')
>>> choose(130202807,65101404)
Decimal('1.867575060806365854276707374E+39194946')
Any higher, and it will overflow. The exponent seems to be limited to 40000000.
Given a reasonable number of values for n and K, calculate them in advance and use a lookup table.
It's dodging the issue in some fashion (you're offloading the calculation), but it's a useful technique if you're having to determine large numbers of values.
MATLAB:
The cheater's way (using the built-in function NCHOOSEK): 13 characters, O(?)
nchoosek(N,k)
My solution: 36 characters, O(min(k,N-k))
a=min(k,N-k);
prod(N-a+1:N)/prod(1:a)
I know this is a really old question but I struggled with a solution to this problem for a long while until I found a really simple one written in VB 6 and after porting it to C#, here is the result:
public int NChooseK(int n, int k)
{
var result = 1;
for (var i = 1; i <= k; i++)
{
result *= n - (k - i);
result /= i;
}
return result;
}
The final code is so simple you won't believe it will work until you run it.
Also, the original article gives some nice explanation on how he reached the final algorithm.