How to get k prime factors? - language-agnostic

Is there an algorithm that can take a number k, and return a number j such that j has k prime factors? Note: the algorithm should run in polynomial time.
Assume you don't have a table of prime numbers.

Obvious answer: starting from a table of prime numbers, given a number k, multiply k of those prime numbers together and return the result. Assuming k is small enough that the multiplication time remains constant, that should run in linear time.
If you need to count the time to find the prime numbers, it should still be polynomial time, using a sieve of Eratosthenes to build the table of prime numbers.
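A minimal sketch of that approach (the function name and the sieve bound are my own illustrative choices, not from the question): sieve for primes, take the first k, and multiply them together.

def number_with_k_prime_factors(k, sieve_limit=1000):
    # Sieve of Eratosthenes up to an assumed limit.
    is_prime = [True] * (sieve_limit + 1)
    is_prime[0] = is_prime[1] = False
    for i in range(2, int(sieve_limit ** 0.5) + 1):
        if is_prime[i]:
            for j in range(i * i, sieve_limit + 1, i):
                is_prime[j] = False
    primes = [i for i, flag in enumerate(is_prime) if flag]
    if len(primes) < k:
        raise ValueError("sieve_limit is too small to yield k primes")
    # Multiply the first k primes; the product has exactly k prime factors.
    result = 1
    for p in primes[:k]:
        result *= p
    return result

For example, number_with_k_prime_factors(4) returns 2*3*5*7 = 210.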

Related

Explanation on determining if a decimal number has a finite representation in a base

When trying to find the answer I came across this and was wondering if this is true and why it is.
https://stackoverflow.com/a/489870/5712298
If anyone can explain it to me or link me to a page explaining it that would be great.
Stackoverflow markup does not support mathematical notation well, and most readers of this will be programmers, so I am going to use common programming expression syntax:
* multiplication
^ exponentiation
/ division
x[i] Element i of an array x
== equality
PROD product
This deals with the question of whether, given a radix r terminating fraction a/(r^n), there is a terminating radix s fraction b/(s^m) with exactly the same value, a, b integers, r and s positive integers, n and m non-negative integers.
a/(r^n)==b/(s^m) is equivalent to b==a*(s^m)/(r^n). a/(r^n) is exactly equal to some radix s terminating fraction if, and only if, there exists a positive integer m such that a*(s^m)/(r^n) is an integer.
Consider the prime factorization of r, PROD(p[i]^k[i]). For each term p[i]^k[i] in that factorization, p[i]^(n*k[i]) is the corresponding term in the prime factorization of r^n.
a*(s^m)/(r^n) is an integer if, and only if, every p[i]^(n*k[i]) in the prime factorization of r^n is also a factor of a*(s^m).
First suppose p[i] is also a factor of s. Then for sufficiently large m, p[i]^(n*k[i]) is a factor of s^m.
Now suppose p[i] is not a factor of s. p[i]^(n*k[i]) is a factor of a*(s^m) if, and only if, it is a factor of a.
The necessary and sufficient condition for the existence of a non-negative integer m such that b==a*(s^m)/(r^n) is an integer is that, for each p[i]^k[i] in the prime factorization of r, either p[i] is a factor of s or p[i]^(n*k[i]) is a factor of a.
Applying this to the common case of r=10 and s=2, the prime factorization of r is (2^1)*(5^1). 2 is a factor of 2, so we can ignore it. 5 is not, so we need 5^n to be a factor of a.
Consider some specific cases:
Decimal 0.1 is 1/10, 5 is not a factor of 1, so there is no exact binary fraction equivalent.
Decimal 0.625, 625/(10^3). 5^3 is 125, which is a factor of 625, so there is an exact binary fraction equivalent. (It is binary 0.101).
The method in the referenced answer https://stackoverflow.com/a/489870/5712298 is equivalent to this for decimal to binary. It would need some work to extend to the general case, to allow for prime factors whose exponent is not 1.
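For what it's worth, here is a small sketch of the test described above (the names are mine): factor r, and for each prime p with exponent k in that factorization, check that either p divides s or p^(n*k) divides a.

def prime_factorization(x):
    # {prime: exponent} by trial division; fine for small radices.
    factors = {}
    d = 2
    while d * d <= x:
        while x % d == 0:
            factors[d] = factors.get(d, 0) + 1
            x //= d
        d += 1
    if x > 1:
        factors[x] = factors.get(x, 0) + 1
    return factors

def terminates_in_base(a, r, n, s):
    # Does a/(r^n) have a terminating radix-s expansion?
    for p, k in prime_factorization(r).items():
        if s % p != 0 and a % (p ** (n * k)) != 0:
            return False
    return True

terminates_in_base(1, 10, 1, 2)    # decimal 0.1 -> False, no exact binary fraction
terminates_in_base(625, 10, 3, 2)  # decimal 0.625 -> True (binary 0.101)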

How to map number in a range to another in the same range with no collisions?

Effectively what I'm looking for is a function f(x) that outputs into a range that is pre-defined. Calling f(f(x)) should be valid as well. The function should be cyclical, so calling f(f(...(x))) where the number of calls is equal to the size of the range should give you the original number, and f(x) should not be time dependent and will always give the same output.
While I can see that taking a list of all possible values and shuffling it would give me something close to what I want, I'd much prefer it if I could simply plug values into the function one at a time so that I do not have to compute the entire range all at once.
I've looked into Minimal Perfect Hash Functions but haven't been able to find one that doesn't use external libraries. I'm okay with using them, but would prefer to not do so.
If an actual range is necessary to help answer my question, I don't think it would need to be bigger than [0, 2^24-1], but the starting and ending values don't matter too much.
You might want to take a look at a Linear Congruential Generator (LCG). You would be looking at a full-period generator (say, m = 2^24), which means the parameters must satisfy the Hull-Dobell Theorem.
Calling f(f(x)) should be valid as well.
should work
the number of calls is equal to the size of the range should give you the original number
Yes, for an LCG with parameters satisfying the Hull-Dobell Theorem you'll get the full period covered exactly once, and the (m+1)-th call will put you back where you started.
The period of such an LCG is exactly equal to m.
should not be time dependent and will always give the same output
An LCG is an O(1) algorithm and it is 100% reproducible.
An LCG is reversible as well, via the extended Euclidean algorithm; check Reversible pseudo-random sequence generator for details.
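A minimal sketch of such a generator for m = 2^24 (the multiplier and increment are illustrative choices, not the only valid ones). Hull-Dobell requires that c is coprime to m, that a-1 is divisible by every prime factor of m, and that a-1 is divisible by 4 if m is; for m = 2^24 any odd c and any a with a ≡ 1 (mod 4) will do.

M = 1 << 24            # range size m = 2^24
A = 1664525 % M        # multiplier: A - 1 is divisible by 4
C = 1013904223 % M     # increment: odd, hence coprime to M

def f(x):
    # One LCG step: a bijection on [0, M-1] with full period M.
    return (A * x + C) % M

Iterating f from any seed visits every value in [0, M-1] exactly once before returning to the seed.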
Minimal perfect hash functions are overkill; all you've asked for is a function f that is:
bijective, and
"cyclical" (i.e. f^N = identity)
For a permutation to be cyclical in that way, its order must divide N (having order exactly N is just a special case of dividing N). Which in turn means the LCM of the orders of the sub-cycles must divide N. One way to do that is to just have one "sub"-cycle of order N. For a power-of-two N, it's also really easy to have lots of small cycles of some other power-of-two order. General permutations do not necessarily satisfy the cycle requirement: they are of course bijective, but the LCM of the orders of their sub-cycles may exceed N.
In the following I will leave all reduction modulo N implicit. Without loss of generality I will assume the range starts at 0 and goes up to N-1, where N is the size of the range.
The only thing I can immediately think of for general N is f(x) = x + c where gcd(c, N) == 1. The GCD condition ensures there is only one cycle, which necessarily has order N.
For power-of-two N I have more inspiration:
f(x) = cx where c is odd. Bijective because gcd(c, N) == 1 so c has a modular multiplicative inverse. Also c^N == 1, because φ(N) == N/2 (since N is a power of two) so c^φ(N) == 1 (Euler's theorem).
f(x) = x XOR c where c < N. Trivially bijective and trivially cycles with a period of 2, which divides N.
f(x) = clmul(x, c) where c is odd and clmul is carry-less multiplication. Bijective because any odd c has a carry-less multiplicative inverse. Has some power-of-two cycle length (less than N) so it divides N. I don't know why though. This is a weird one, but it has decent special cases such as x ^ (x << k). By symmetry, the "mirrored" version also works.
Eg x ^ (x >> k).
f(x) = x >>> k where >>> is bit-rotation over the w-bit word, w = log2(N). Obviously bijective, and f^N(x) = x >>> Nk, which is back at the unrotated position whenever w divides Nk. For a word width that is itself a power of two this holds regardless of k; for w = 24 (N = 2^24) it needs k to be a multiple of 3, since the cycle length is w/gcd(k, w) and that must be a power of two to divide N.
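For concreteness, here are minimal sketches of the constructions above for N = 2^24 (the constants are arbitrary examples, not recommendations):

N = 1 << 24
MASK = N - 1

def f_add(x, c=12345):          # x + c mod N; one cycle of length N since gcd(c, N) == 1
    return (x + c) & MASK

def f_mul(x, c=0xABCDEF):       # c*x mod N with odd c; bijective, order divides N
    return (c * x) & MASK

def f_xor(x, c=0x5A5A5A):       # x XOR c; an involution, period 2
    return x ^ c

def f_xorshift(x, k=7):         # x ^ (x << k), a special case of carry-less multiplication by an odd constant
    return (x ^ (x << k)) & MASK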

Compute gradients for an unknown number of factors

Is there a better way to write this? More particularly, is there a way to remove the loop and calculate grad directly without iterating?
for j = 1:size(theta)
  grad(j) = 1 / m * sum((h - y) .* X(:, j));
endfor
h and y are both vectors, X is a matrix with an arbitrary number of rows and the same number of columns as theta
Your code is already reasonably efficient.
The only other way this code could be written is to initialize grad to a vector of zeros and then use the vectorized computation of the gradient for the gradient descent optimization algorithm, as sketched below.
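The vectorized form of that loop is grad = (1/m) * X' * (h - y) in Octave; a rough NumPy equivalent, for comparison (assuming h, y and theta are 1-D arrays and X has one column per element of theta):

import numpy as np

def gradient(X, h, y, m):
    # Each component j is (1/m) * sum((h - y) * X[:, j]),
    # i.e. a single matrix-vector product with X transposed.
    return (X.T @ (h - y)) / m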

How would you write this algorithm for large combinations in the most compact way?

The number of combinations of k items which can be retrieved from N items is described by the following formula.
c = N! / (k! * (N - k)!)
An example would be how many combinations of 6 Balls can be drawn from a drum of 48 Balls in a lottery draw.
Optimize this formula to run with the smallest O time complexity
This question was inspired by the new WolframAlpha math engine and the fact that it can calculate extremely large combinations very quickly, e.g. http://www97.wolframalpha.com/input/?i=20000000+Choose+15000000, and by a subsequent discussion on the topic on another forum.
I'll post some info/links from that discussion after some people take a stab at the solution.
Any language is acceptable.
Python: O(min(k, n-k)^2)
def choose(n,k):
    k = min(k,n-k)
    p = q = 1
    for i in xrange(k):
        p *= n - i
        q *= 1 + i
    return p/q
Analysis:
The size of p and q will increase linearly inside the loop, if n-i and 1+i can be considered to have constant size.
The cost of each multiplication will then also increase linearly.
This sum of all iterations becomes an arithmetic series over k.
My conclusion: O(k^2)
If rewritten to use floating point numbers, the multiplications will be atomic operations, but we will lose a lot of precision. It even overflows for choose(20000000, 15000000). (Not a big surprise, since the result would be around 0.2119620413×10^4884378.)
def choose(n,k):
    k = min(k,n-k)
    result = 1.0
    for i in xrange(k):
        result *= 1.0 * (n - i) / (1 + i)
    return result
Notice that WolframAlpha returns a "Decimal Approximation". If you don't need absolute precision, you could do the same thing by calculating the factorials with Stirling's Approximation.
Now, Stirling's approximation requires the evaluation of (n/e)^n, where e is the base of the natural logarithm, which will be by far the slowest operation. But this can be done using the techniques outlined in another stackoverflow post.
If you use double precision and repeated squaring to accomplish the exponentiation, the operations will be:
3 evaluations of a Stirling approximation, each requiring O(log n) multiplications and one square root evaluation.
2 multiplications
1 division
The number of operations could probably be reduced with a bit of cleverness, but the total time complexity is going to be O(log n) with this approach. Pretty manageable.
EDIT: There's also bound to be a lot of academic literature on this topic, given how common this calculation is. A good university library could help you track it down.
EDIT2: Also, as pointed out in another response, the values will easily overflow a double, so a floating point type with very extended precision will need to be used for even moderately large values of k and n.
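As a rough illustration of the Stirling approach (the helper names and the mantissa/exponent split are my own), work in log space so nothing overflows a double, then report a decimal mantissa and exponent:

import math

def ln_gamma_stirling(z):
    # Leading-order Stirling approximation: ln Gamma(z) ~ (z - 1/2) ln z - z + (1/2) ln(2 pi)
    return (z - 0.5) * math.log(z) - z + 0.5 * math.log(2 * math.pi)

def choose_approx(n, k):
    # log C(n, k) = ln Gamma(n+1) - ln Gamma(k+1) - ln Gamma(n-k+1)
    log_c = (ln_gamma_stirling(n + 1)
             - ln_gamma_stirling(k + 1)
             - ln_gamma_stirling(n - k + 1))
    log10_c = log_c / math.log(10)
    exponent = int(math.floor(log10_c))
    mantissa = 10 ** (log10_c - exponent)
    return mantissa, exponent

choose_approx(20000000, 15000000) should give roughly (2.0876..., 4884377), matching WolframAlpha's decimal approximation.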
I'd solve it in Mathematica:
Binomial[n, k]
Man, that was easy...
Python: approximation in O(1) ?
Using the Python decimal implementation to calculate an approximation. Since it does not use any external loop, and the numbers are limited in size, I think it will execute in O(1).
from decimal import Decimal
ln = lambda z: z.ln()
exp = lambda z: z.exp()
sinh = lambda z: (exp(z) - exp(-z))/2
sqrt = lambda z: z.sqrt()
pi = Decimal('3.1415926535897932384626433832795')
e = Decimal('2.7182818284590452353602874713527')
# Stirling's approximation of the gamma function.
# Simplification by Robert H. Windschitl.
# Source: http://en.wikipedia.org/wiki/Stirling%27s_approximation
gamma = lambda z: sqrt(2*pi/z) * (z/e*sqrt(z*sinh(1/z)+1/(810*z**6)))**z
def choose(n, k):
    n = Decimal(str(n))
    k = Decimal(str(k))
    return gamma(n+1)/gamma(k+1)/gamma(n-k+1)
Example:
>>> choose(20000000,15000000)
Decimal('2.087655025913799812289651991E+4884377')
>>> choose(130202807,65101404)
Decimal('1.867575060806365854276707374E+39194946')
Any higher, and it will overflow. The exponent seems to be limited to 40000000.
Given a reasonable number of values for n and k, calculate them in advance and use a lookup table.
It's dodging the issue in some fashion (you're offloading the calculation), but it's a useful technique if you're having to determine large numbers of values.
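If the bound on n is modest, a minimal sketch of the lookup-table idea using Pascal's rule (N_MAX and the names are placeholders):

N_MAX = 1000  # assumed upper bound on n

table = [[0] * (N_MAX + 1) for _ in range(N_MAX + 1)]
for n in range(N_MAX + 1):
    table[n][0] = 1
    for k in range(1, n + 1):
        table[n][k] = table[n - 1][k - 1] + table[n - 1][k]

def choose_lookup(n, k):
    return table[n][k] if 0 <= k <= n else 0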
MATLAB:
The cheater's way (using the built-in function NCHOOSEK): 13 characters, O(?)
nchoosek(N,k)
My solution: 36 characters, O(min(k,N-k))
a=min(k,N-k);
prod(N-a+1:N)/prod(1:a)
I know this is a really old question, but I struggled with a solution to this problem for a long while until I found a really simple one written in VB 6. After porting it to C#, here is the result:
public int NChooseK(int n, int k)
{
    var result = 1;
    for (var i = 1; i <= k; i++)
    {
        result *= n - (k - i);
        result /= i;
    }
    return result;
}
The final code is so simple you won't believe it will work until you run it.
Also, the original article gives some nice explanation on how he reached the final algorithm.

Computing a 95% confidence interval on the sum of n i.i.d. exponential random variables

Let's in fact generalize to a c-confidence interval. Let the common rate parameter be a. (Note that the mean of an exponential distribution with rate parameter a is 1/a.)
First find the cdf of the sum of n such i.i.d. random variables. Use that to compute a c-confidence interval on the sum. Note that the max likelihood estimate (MLE) of the sum is n/a, ie, n times the mean of a single draw.
Background: This comes up in a program I'm writing to make time estimates via random samples. If I take samples according to a Poisson process (ie, the gaps between samples have an exponential distribution) and n of them happen during Activity X, what's a good estimate for the duration of Activity X? I'm pretty sure the answer is the answer to this question.
As John D. Cook hinted, the sum of i.i.d. exponential random variables has a gamma distribution.
Here's the cdf of the sum of n exponential random variables with rate parameter a (expressed in Mathematica):
F[x_] := 1 - GammaRegularized[n, a*x];
http://mathworld.wolfram.com/RegularizedGammaFunction.html
The inverse cdf is:
Fi[a_, n_, p_] := InverseGammaRegularized[n, 1 - p]/a;
The c-confidence interval is then
ci[c_, a_, n_] := {Fi[a, n, (1-c)/2], Fi[a, n, c+(1-c)/2]}
Here is some code to empirically verify that the above is correct:
(* Random draw from an exponential distribution given rate param. *)
getGap[a_] := -1/a*Log[RandomReal[]]
betw[x_, {a_, b_}] := Boole[a <= x <= b]
c = .95;
a = 1/.75;
n = 40;
ci0 = ci[c, a, n];
N@Mean@Table[betw[Sum[getGap[a], {n}], ci0], {100000}]
----> 0.94995
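For anyone not using Mathematica, a rough cross-check of the same interval in Python (assumes NumPy and SciPy are available): the sum of n i.i.d. Exponential(rate a) draws is Gamma(shape n, scale 1/a), so the c-confidence interval is a pair of gamma quantiles.

import numpy as np
from scipy.stats import gamma

def ci(c, a, n):
    # Lower and upper gamma quantiles bracketing probability c.
    return gamma.ppf([(1 - c) / 2, (1 + c) / 2], a=n, scale=1.0 / a)

c, a, n = 0.95, 1 / 0.75, 40
lo, hi = ci(c, a, n)
sums = np.random.exponential(scale=1.0 / a, size=(100000, n)).sum(axis=1)
print(np.mean((lo <= sums) & (sums <= hi)))   # should print something close to 0.95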
Hint: the sum of independent exponential random variables is a gamma random variable.
I would use a Chernoff bound, from which you can improvise an interval: the expression is quite general, and you can solve it so that the bounded range is wrong less than 0.05 of the time.
A Chernoff bound is just about the strongest bound you can get on iid variables without knowing too many moment generating functions.