Getting the prime factorisation of a number less than 100^100 rapidly - prime-factoring

I want to make a program that can compare the prime factorisations of numbers around the size of 100^100. I want the program to tell me if there is a 2, for example, in the prime factorisation, but I don't want to know how many there are... And then I want to compare the prime factorisations of two numbers... Could someone help? The difficulty is really the size of the numbers... And the efficiency of the program... I would like comparing two numbers to take at most about 10 seconds...
I have this:
import java.util.ArrayList;
import java.util.List;

public class PrimeFactorisation {

    public static List<Long> primeFactors(long number) {
        long n = number;
        List<Long> factors = new ArrayList<Long>();
        for (long i = 2; i <= n / i; i++) {
            while (n % i == 0) {
                factors.add(i);
                n /= i;
            }
        }
        if (n > 1) {
            factors.add(n);
        }
        return factors;
    }

    public static void main(String[] args) {
        System.out.println("Prime factors of 12345678901");
        for (Long factor : primeFactors(12345678901L)) {
            System.out.println(factor);
        }
    }
}
This code is in Java, but I am primarily searching for something efficient, so I am willing to change the language...

In general, you won't be able to find factors of numbers that large. But if you only want to know whether two numbers have the same prime factors, ignoring multiplicity, it's fairly easy. Use Euclid's algorithm to find the greatest common divisor of the two numbers. If the result is 1, the numbers are coprime and do not have the same factors. Otherwise, factor the greatest common divisor into primes; that should be much easier than factoring the two big numbers, since it will probably be much smaller. Then divide each of the big numbers by each of the common prime factors until it no longer divides evenly. If, after dividing out all of the common prime factors, the two remainders are the same, the two original numbers have the same prime factors (possibly in different multiplicities); otherwise, they have prime factors that aren't shared. Ask if you need to know more.
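A minimal sketch of the building blocks of this approach in Java, using BigInteger (which comfortably represents numbers of the size 100^100); the class and method names here are only illustrative:
import java.math.BigInteger;

public class CommonFactors {

    // True if a and b share at least one prime factor.
    static boolean shareFactor(BigInteger a, BigInteger b) {
        return a.gcd(b).compareTo(BigInteger.ONE) > 0;
    }

    // Divides every occurrence of the prime p out of n.
    static BigInteger stripFactor(BigInteger n, BigInteger p) {
        while (n.mod(p).signum() == 0) {
            n = n.divide(p);
        }
        return n;
    }
}
Stripping every prime factor of the GCD out of both numbers and comparing what is left then decides the question, as described above.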
That's a rather odd request. What is your use case? Perhaps there is some other way to solve the underlying problem.

These are huge integers. Using a big-integer library, you will get most of them down to pretty low values quite quickly by taking out the low factors, but not all of them - primes aren't uncommon.
But you don't need the factors, you need to know whether two numbers share a common factor. There is, I believe, a test for that. It's fairly complicated and a bit beyond my mathematical expertise, but it's a component of randomised primality testing.


What is an effective way to filter numbers according to certain number-ranges using CUDA?

I have a lot of random floating point numbers residing in global GPU memory. I also have "buckets" that specify ranges of numbers they will accept and a capacity of numbers they will accept.
i.e.:
numbers: -2 0 2 4
buckets(size=1): [-2, 0], [1, 5]
I want to run a filtration process that yields me
filtered_nums: -2 2
(where filtered_nums can be a new block of memory)
But every approach I take runs into a huge overhead of trying to synchronize threads across bucket counters. If I try to use a single-thread, the algorithm completes successfully, but takes frighteningly long (over 100 times slower than generating the numbers in the first place).
What I am asking for is a general, high-level, efficient, as-simple-as-possible algorithm that you would use to filter these numbers.
edit
I will be dealing with 10 buckets and half a million numbers, where all the numbers fall into exactly one of the 10 bucket ranges. Each bucket will hold 43,000 elements. (There are excess elements, since the objective is to fill every bucket, and many numbers will be discarded.)
2nd edit
It's important to point out that the buckets do not have to be stored individually. The objective is just to discard elements that would not fit into a bucket.
You can use thrust::remove_copy_if
struct outside_limit
{
    __host__ __device__
    bool operator()(const float x)
    {
        // remove_copy_if drops elements for which this returns true,
        // i.e. everything outside the bucket's range
        return (x < lo || x >= hi);
    }
};
thrust::remove_copy_if(input, input + N, result, outside_limit());
You will have to replace lo and hi with constants for each bin.
I think you can templatize the kernel, but then again you will have to instantiate the template with actual constants. I can't see an easy way around it, but I may be missing something.
If you are willing to look at third party libraries, arrayfire may offer an easier solution.
array I = array(N, input, afDevice);
float **Res = (float **)malloc(sizeof(float *) * nbins);
for (int i = 0; i < nbins; i++) {
    array res = where(I >= lo[i] && I < hi[i]);
    Res[i] = res.device<float>();
}

Ideal method of multiplying mixed fractional number to prevent overflow?

I have a simple class that contains three unsigned integer fields: a whole value, a numerator, and a denominator, representing a mixed number of the form:
<Whole> <Num>/<Den> // e.g. 3 1/2
I'd like to be able to multiply instances of these classes with each other, but since my main app uses relatively large numbers, I'm concerned about overflow. Is there an algorithm for performing this kind of multiplication that minimizes the potential for multiplication overflow?
I'm OK with having overflow if it's unavoidable, what I'm looking for is for a way to "intelligently" multiply to avoid having overflow if it's possible.
I'm not sure if you actually needed info on multiplying mixed numbers... but this site explains how to do it fairly simply: Multiplying Mixed Numbers.
At any rate... the data structure you've created has inherited the limitations of its parts. That is to say, even if you were just working with plain unsigned ints, you would still end up with the potential for overflow. If you're worried about blowing out your unsigned int then you should consider bumping the type you're using up to something that can handle larger numbers.
Wikipedia has a pretty good summary on Arithmetic Overflow and some ideas for handling it: Arithmetic Overflow
Calculating the least-common-multiple (LCM) of the two denominators can help to keep the numbers small. There is a lot of info on wikipedia, have a look at the "Reduction by greatest common divisor" section of http://en.wikipedia.org/wiki/Least_common_multiple and the "Implementations" section of http://en.wikipedia.org/wiki/Euclidean_algorithm.
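Along the same lines, cross-reducing by the GCD before multiplying keeps the intermediate products as small as possible. A minimal sketch in Java, using plain longs rather than the asker's class, and assuming the mixed numbers have already been converted to improper form (whole * den + num over den):
public class Fractions {

    static long gcd(long a, long b) {
        while (b != 0) { long t = a % b; a = b; b = t; }
        return a;
    }

    // (a/b) * (c/d): divide each numerator by its GCD with the opposite
    // denominator before multiplying, so the intermediates stay small.
    static long[] multiply(long a, long b, long c, long d) {
        long g1 = gcd(a, d);
        long g2 = gcd(c, b);
        return new long[] { (a / g1) * (c / g2), (b / g2) * (d / g1) };
    }
}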
There is a way of doing it without resorting to arbitrary-precision arithmetic as well. Unless you are coding in assembly, it's more of a curiosity than a useful algorithm, but it may be worth mentioning.
int result = 0;
int remainder = 0;
// Long division over the bits of num, most significant bit first,
// so the full product whole * num is never formed.
for (int i = 31; i >= 0; i--) {
    boolean bit = ((num >>> i) & 1) != 0;
    int dividend = (remainder << 1) + (bit ? whole : 0);
    result = (result << 1) + dividend / div; // fold in this step's quotient
    remainder = dividend % div;
}
The general idea is to perform the multiplication bit by bit while you're doing the division: the working dividend stays below 2 * div + whole, so the full product is never materialised.

How would you write this algorithm for large combinations in the most compact way?

The number of combinations of k items which can be retrieved from N items is described by the following formula.
c = N! / (k! * (N - k)!)
An example would be how many combinations of 6 Balls can be drawn from a drum of 48 Balls in a lottery draw.
Optimize this formula to run with the smallest O time complexity
This question was inspired by the new WolframAlpha math engine, the fact that it can calculate extremely large combinations very quickly (see the link below), and a subsequent discussion on the topic on another forum.
http://www97.wolframalpha.com/input/?i=20000000+Choose+15000000
I'll post some info/links from that discussion after some people take a stab at the solution.
Any language is acceptable.
Python: O(min(k, n-k)^2)
def choose(n, k):
    k = min(k, n - k)
    p = q = 1
    for i in xrange(k):
        p *= n - i
        q *= 1 + i
    return p / q
Analysis:
The size of p and q will increase linearly inside the loop, if n-i and 1+i can be considered to have constant size.
The cost of each multiplication will then also increase linearly.
This sum of all iterations becomes an arithmetic series over k.
My conclusion: O(k^2)
If rewritten to use floating point numbers, the multiplications will be atomic operations, but we will lose a lot of precision. It even overflows for choose(20000000, 15000000). (Not a big surprise, since the result would be around 0.2119620413 × 10^4884378.)
def choose(n, k):
    k = min(k, n - k)
    result = 1.0
    for i in xrange(k):
        result *= 1.0 * (n - i) / (1 + i)
    return result
Notice that WolframAlpha returns a "Decimal Approximation". If you don't need absolute precision, you could do the same thing by calculating the factorials with Stirling's Approximation.
Now, Stirling's approximation requires the evaluation of (n/e)^n, where e is the base of the natural logarithm, which will be by far the slowest operation. But this can be done using the techniques outlined in another stackoverflow post.
If you use double precision and repeated squaring to accomplish the exponentiation, the operations will be:
3 evaluations of a Stirling approximation, each requiring O(log n) multiplications and one square root evaluation.
2 multiplications
1 division
The number of operations could probably be reduced with a bit of cleverness, but the total time complexity is going to be O(log n) with this approach. Pretty manageable.
EDIT: There's also bound to be a lot of academic literature on this topic, given how common this calculation is. A good university library could help you track it down.
EDIT2: Also, as pointed out in another response, the values will easily overflow a double, so a floating point type with very extended precision will need to be used for even moderately large values of k and n.
I'd solve it in Mathematica:
Binomial[n, k]
Man, that was easy...
Python: approximation in O(1) ?
Using python decimal implementation to calculate an approximation. Since it does not use any external loop, and the numbers are limited in size, I think it will execute in O(1).
from decimal import Decimal

ln = lambda z: z.ln()
exp = lambda z: z.exp()
sinh = lambda z: (exp(z) - exp(-z)) / 2
sqrt = lambda z: z.sqrt()

pi = Decimal('3.1415926535897932384626433832795')
e = Decimal('2.7182818284590452353602874713527')

# Stirling's approximation of the gamma function.
# Simplification by Robert H. Windschitl.
# Source: http://en.wikipedia.org/wiki/Stirling%27s_approximation
gamma = lambda z: sqrt(2*pi/z) * (z/e*sqrt(z*sinh(1/z)+1/(810*z**6)))**z

def choose(n, k):
    n = Decimal(str(n))
    k = Decimal(str(k))
    return gamma(n+1)/gamma(k+1)/gamma(n-k+1)
Example:
>>> choose(20000000,15000000)
Decimal('2.087655025913799812289651991E+4884377')
>>> choose(130202807,65101404)
Decimal('1.867575060806365854276707374E+39194946')
Any higher, and it will overflow. The exponent seems to be limited to 40000000.
Given a reasonable number of values for n and K, calculate them in advance and use a lookup table.
It's dodging the issue in some fashion (you're offloading the calculation), but it's a useful technique if you're having to determine large numbers of values.
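A minimal sketch of such a table in Java, filled once using Pascal's rule; MAX = 64 is an illustrative bound chosen so every entry fits in a long:
public class BinomialTable {

    // C[n][k] holds C(n, k), computed by C(n, k) = C(n-1, k-1) + C(n-1, k).
    static final int MAX = 64;
    static final long[][] C = new long[MAX + 1][MAX + 1];

    static {
        for (int n = 0; n <= MAX; n++) {
            C[n][0] = 1;
            for (int k = 1; k <= n; k++) {
                C[n][k] = C[n - 1][k - 1] + C[n - 1][k];
            }
        }
    }
}
After the one-time fill, each lookup is O(1); for instance C[48][6] gives 12271512, the lottery example above.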
MATLAB:
The cheater's way (using the built-in function NCHOOSEK): 13 characters, O(?)
nchoosek(N,k)
My solution: 36 characters, O(min(k,N-k))
a=min(k,N-k);
prod(N-a+1:N)/prod(1:a)
I know this is a really old question but I struggled with a solution to this problem for a long while until I found a really simple one written in VB 6 and after porting it to C#, here is the result:
public long NChooseK(int n, int k)
{
    // long instead of int: the intermediate values overflow 32 bits quickly
    var result = 1L;
    for (var i = 1; i <= k; i++)
    {
        result *= n - (k - i);
        result /= i; // always exact: result holds C(n - k + i, i) at this point
    }
    return result;
}
The final code is so simple you won't believe it will work until you run it.
Also, the original article gives some nice explanation on how he reached the final algorithm.

What is the proper method of constraining a pseudo-random number to a smaller range?

What is the best way to constrain the values of a PRNG to a smaller range? If you use modulus and the old max number is not evenly divisible by the new max number, you bias toward 0 through (old_max mod new_max - 1). I assume the best way would be something like this (this is floating point, not integer math)
random_num = PRNG() / max_original_range * max_smaller_range
But something in my gut makes me question that method (maybe floating point implementation and representation differences?).
The random number generator will produce consistent results across hardware and software platforms, and the constraint needs to as well.
I was right to doubt the pseudocode above (but not for the reasons I was thinking). MichaelGG's answer got me thinking about the problem in a different way. I can model it using smaller numbers and test every outcome. So, let's assume we have a PRNG that produces a random number between 0 and 31 and you want the smaller range to be 0 to 9. If you use modulus you bias toward 0 and 1. If you use the pseudocode above you bias toward 0 and 5. I don't think there can be a good way to map one set into the other. The best that I have come up with so far is to regenerate the random numbers that fall into the biased tail (those at or above the largest multiple of new_max that fits in the old range), but that has deep problems as well (reducing the period, time to generate new numbers until one is in the right range, etc.).
I think I may have naively approached this problem. It may be time to start some serious research into the literature (someone has to have tackled this before).
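For reference, the regeneration idea mentioned above is the standard rejection approach. A minimal sketch in Java, with java.util.Random standing in for the raw PRNG:
import java.util.Random;

// Draws a uniform value in [0, n) from a generator that produces
// uniform ints in [0, bound), by rejecting the biased tail.
static int nextUniform(Random prng, int bound, int n) {
    int limit = bound - (bound % n); // largest multiple of n <= bound
    int x;
    do {
        x = prng.nextInt(bound);     // raw PRNG output
    } while (x >= limit);            // redraw values in the biased tail
    return x % n;
}
With bound = 32 and n = 10 this rejects only 30 and 31, and the remaining values map onto 0..9 without bias.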
I know this might not be a particularly helpful answer, but I think the best way would be to conceive of a few different methods, try each of them out a few million times, and check the result sets.
When in doubt, try it yourself.
EDIT
It should be noted that many languages (like C#) have built-in range limiting in their random functions:
int maximumvalue = 20;
Random rand = new Random();
rand.Next(maximumvalue);
And whenever possible, you should use those rather than any code you would write yourself. Don't Reinvent The Wheel.
This problem is akin to rolling a k-sided die given only a p-sided die, without wasting randomness.
In this sense, by Lemma 3 in "Simulating a dice with a dice" by B. Kloeckner, this waste is inevitable unless "every prime number dividing k also divides p". Thus, for example, if p is a power of 2 (and any block of random bits is the same as rolling a die with a power of 2 number of faces) and k has prime factors other than 2, the best you can do is get arbitrarily close to no waste of randomness, such as by batching multiple rolls of the p-sided die until p^n is "close enough" to a power of k.
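A minimal sketch of the batching step in Java (java.util.Random stands in for the p-sided die):
// Combines n rolls of a p-sided die into one uniform draw in [0, p^n);
// appending independent uniform base-p digits keeps the result uniform,
// and the rejected fraction shrinks as p^n grows relative to k.
static int batchedRoll(java.util.Random prng, int p, int n) {
    int x = 0;
    for (int i = 0; i < n; i++) {
        x = x * p + prng.nextInt(p); // append one base-p digit
    }
    return x;
}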
Let me also go over some of your concerns about regenerating random numbers:
"Reducing the period": Besides batching of bits, this concern can be dealt with in several ways:
Use a PRNG with a bigger "period" (maximum cycle length).
Add a Bays–Durham shuffle to the PRNG's implementation.
Use a "true" random number generator; this is not trivial.
Employ randomness extraction, which is discussed in Devroye and Gravel 2015-2020 and in my Note on Randomness Extraction. However, randomness extraction is pretty involved.
Ignore the problem, especially if it isn't a security application or serious simulation.
"Time to generate new numbers until one is in the right range": If you want unbiased random numbers, then any algorithm that does so will generally have to run forever in the worst case. Again, by Lemma 3, the algorithm will run forever in the worst case unless "every prime number dividing k also divides p", which is not the case if, say, k is 10 and p is 32.
See also the question: How to generate a random integer in the range [0,n] from a stream of random bits without wasting bits?, especially my answer there.
If PRNG() is generating uniformly distributed random numbers then the above looks good. In fact (if you want to scale the mean etc.) the above should be fine for all purposes. I guess you need to ask what the error associated with the original PRNG() is, and whether further manipulation will add to that substantially.
If in doubt, generate an appropriately sized sample set, and look at the results in Excel or similar (to check your mean / std.dev etc. for what you'd expect)
If you have access to a PRNG function (say, random()) that'll generate numbers in the range 0 <= x < 1, can you not just do:
random_num = (int) (random() * max_range);
to give you numbers in the range 0 to max_range?
Here's how the CLR's Random class works when limited (as per Reflector):
long num = maxValue - minValue;
if (num <= 0x7fffffffL) {
    return (((int) (this.Sample() * num)) + minValue);
}
return (((int) ((long) (this.GetSampleForLargeRange() * num))) + minValue);
Even if you're given a positive int, it's not hard to get it to a double. Just multiply the random int by (1/maxint). Going from a 32-bit int to a double should provide adequate precision. (I haven't actually tested a PRNG like this, so I might be missing something with floats.)
Pseudo-random number generators are essentially producing a random series of 1s and 0s, which, when appended to each other, form an infinitely large number in base two. Each time you consume a bit from your PRNG, you are dividing that number by two and keeping the modulus. You can do this forever without wasting a single bit.
If you need a number in the range [0, N), then you need the same, but instead of base two, you need base N. It's basically trivial to convert the bases. Consume the number of bits you need, return the remainder of those bits back to your PRNG to be used next time a number is needed.

How do I explain what a "naive implementation" is? [closed]

What is the clearest explanation of what computer scientists mean by "the naive implementation"? I need a good clear example which will illustrate — ideally, even to non-technical people — that the naive implementation may technically be a functioning solution to the problem, but practically be utterly unusable.
I'd try to keep it away from computers altogether. Ask your audience how they find an entry in a dictionary. (A normal dictionary of word definitions.)
The naive implementation is to start at the very beginning, and look at the first word. Oh, that's not the word we're looking for - look at the next one, etc. It's worth pointing out to the audience that they probably didn't even think of that way of doing things - we're smart enough to discount it immediately! It is, however, about the simplest way you could think of. (It might be interesting to ask them whether they can think of anything simpler, and check that they do really understand why it's simpler than the way we actually do it.)
The next implementation (and a pretty good one) is to start in the middle of the dictionary. Does the word we're looking for come before or after that? If it's before, turn to the page halfway between the start and where we are now - otherwise, turn to the page halfway between where we are now and the end, and so on - binary chop.
The actual human implementation is to use our knowledge of letters to get very rapidly to "nearly the right place" - if we see "elephant" then we'll know it'll be "somewhere near the start" maybe about 1/5th of the way through. Once we've got to E (which we can do with very, very simple comparisons) we find EL etc.
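For comparison, the binary chop above maps directly onto binary search. A minimal sketch in Java, with sortedWords as an illustrative stand-in for the dictionary:
// Binary search: repeatedly halve the range that can contain the target.
static int indexOf(String[] sortedWords, String target) {
    int lo = 0, hi = sortedWords.length - 1;
    while (lo <= hi) {
        int mid = (lo + hi) >>> 1;            // open to the middle page
        int cmp = target.compareTo(sortedWords[mid]);
        if (cmp == 0) return mid;             // found the word
        if (cmp < 0) hi = mid - 1;            // it comes before: front half
        else lo = mid + 1;                    // it comes after: back half
    }
    return -1;                                // not in the dictionary
}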
StackOverflow's Jeff Atwood had a great example of a naive algorithm related to shuffling an array.
Doing it the most straightforward, least tricky way available. One example is selection sort.
In this case naive does not mean bad or unusable. It just means not particularly good.
Taking Jon Skeet's advice to heart you can describe selection sort as:
Find the highest value in the list and put it first
Find the next highest value and add it to the list
Repeat step 2 until you run out of list
It is easy to do and easy to understand, but not necessarily the best.
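A minimal sketch of those three steps in Java (sorting into descending order, to match the description):
// Selection sort: repeatedly select the highest remaining value and
// swap it into the front of the unsorted region.
static void selectionSort(int[] a) {
    for (int i = 0; i < a.length - 1; i++) {
        int best = i;
        for (int j = i + 1; j < a.length; j++) {
            if (a[j] > a[best]) best = j;     // find the highest remaining value
        }
        int tmp = a[i]; a[i] = a[best]; a[best] = tmp; // put it next in the list
    }
}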
Another naive implementation would be the use of recursion to compute an integer's factorial in an imperative language. A more efficient solution in that case is to just use a loop.
What's the most obvious, naive algorithm for exponentiation that you could think of?
base ** exp is base * base * ... * base, exp times:
double pow(double base, int exp) {
    double result = 1;
    for (int i = 0; i < exp; i++)
        result *= base;
    return result;
}
It doesn't handle negative exponents, though. Remembering that base ** exp == 1 / base ** (-exp) == (1 / base) ** (-exp):
double pow(double base, int exp) {
    double result = 1;
    if (exp < 0) {
        base = 1 / base;
        exp = -exp;
    }
    for (int i = 0; i < exp; i++)
        result *= base;
    return result;
}
It's actually possible to compute base ** exp with less than exp multiplications, though!
double pow(double base, int exp) {
    double result = 1;
    if (exp < 0) {
        base = 1 / base;
        exp = -exp;
    }
    while (exp) {
        if (exp % 2) {
            result *= base;
            exp--;
        }
        else {
            base *= base;
            exp /= 2;
        }
    }
    return result;
}
This takes advantage of the fact that base ** exp == (base * base) ** (exp / 2) if exp is even, and will only require about log2(exp) multiplications.
I took the time to read your question a little closer, and I have the perfect example.
a good clear example which will illustrate -- ideally, even to non-technical people -- that the naive implementation may technically be a functioning solution to the problem, but practically be utterly unusable.
Try Bogosort!
If bogosort were used to sort a deck of cards, it would consist of checking if the deck were in order, and if it were not, one would throw the deck into the air, pick the cards up at random, and repeat the process until the deck is sorted.
"Naive implementation" is almost always synonymous with "brute-force implementation". Naive implementations are often intuitive and the first to come to mind, but are also often O(n^2) or worse, thus taking too long too be practical for large inputs.
Programming competitions are full of problems where the naive implementation will fail to run in an acceptable amount of time, and the heart of the problem is coming up with an improved algorithm that is generally much less obvious but runs much more quickly.
Naive implementation is:
intuitive;
first to come to mind;
often ineffective and/or buggy in corner cases.
Let's say that someone figures out how to extract a single field from a database and then proceeds to write a web page in PHP or any language that makes a separate query on the database for each field on the page. It works, but will be incredibly slow, inefficient, and difficult to maintain.
Naive doesn't mean bad or unusable - it means having certain qualities which pose a problem in a specific context and for a specific purpose.
The classic example of course is sorting. In the context of sorting a list of ten numbers, any old algorithm (except bogosort) would work pretty well. However, when we get to the scale of thousands of numbers or more, typically we say that selection sort is the naive algorithm because it has the quality of O(n^2) time, which would be too slow for our purposes, and that the non-naive algorithm is quicksort, because it has the quality of O(n lg n) time, which is fast enough for our purposes.
In fact, the case could be made that in the context of sorting a list of ten numbers, quicksort is the naive algorithm, since it will take longer than selection sort.
Determining if a number is prime or not (primality test) is an excellent example.
The naive method just checks whether n mod x is zero for at least one x in 2..sqrt(n). This method gets really slow for very large prime numbers and is not feasible to use in cryptography.
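A minimal sketch of that naive test in Java:
// Naive primality test: trial division by every x in 2..sqrt(n).
static boolean isPrime(long n) {
    if (n < 2) return false;
    for (long x = 2; x * x <= n; x++) {
        if (n % x == 0) return false; // found a divisor
    }
    return true;
}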
On the other hand there are a couple of probability or fast deterministic tests. These are too complicated to explain here but you might want to check the relevant Wikipedia article on the subject for more information: http://en.wikipedia.org/wiki/Primality_test
Bubble sort over 100,000 entries.
The intuitive algorithms you normally use to sort a deck of cards (insertion sort or selection sort, both O(n^2)) can be considered naive, because they are easy to learn and implement, but would not scale well to a deck of, say, 100,000 cards :D. In a general setting, there are faster (O(n log n)) ways to sort a list.
Note, however, that naive does not necessarily mean bad. There are situations where insertion sort is a good choice (say, when you have an already sorted big deck and few unsorted cards to add).
(Haven't seen a truly naive implementation posted yet so...)
The following implementation is "naive", because it does not cover the edge cases, and will break in other cases. It is very simple to understand, and can convey a programming message.
def naive_inverse(x):
    return 1/x
It will:
Break on x=0
Do a bad job when passed an integer
You could make it more "mature" by adding these features.
An O(n^2) algorithm:
foreach (object o in set1)
{
    foreach (object p in set1)
    {
        // codez
    }
}
This will perform fine with small sets and then dramatically worse with larger ones, since the work grows with the square of the set size.
Another might be a naive Singleton that doesn't account for threading.
public static SomeObject Instance
{
    get
    {
        if (obj == null)
        {
            obj = new SomeObject();
        }
        return obj;
    }
}
If two threads access that at the same time it's possible for them to get two different instances, leading to seriously weird bugs.