How do you calculate the total number of all possible unique subsets from a set with repeats?

Given a set** S containing duplicate elements, how can one determine the total number of all possible subsets of S, where each subset is unique?
For example, say S = {A, B, B} and let K be the set of all subsets, then K = {{}, {A}, {B}, {A, B}, {B, B}, {A, B, B}} and therefore |K| = 6.
Another example would be if S = {A, A, B, B}, then K = {{}, {A}, {B}, {A, B}, {A, A}, {B, B}, {A, B, B}, {A, A, B}, {A, A, B, B}} and therefore |K| = 9.
It is easy to see that if S is a real set, having only unique elements, then |K| = 2^|S|.
What is a formula to calculate this value |K| given a "set" S (with duplicates), without generating all the subsets?
** Not technically a set.

Take the product of all the (frequencies + 1).
For example, in {A,B,B}, the answer is (1+1) [the number of As] * (2+1) [the number of Bs] = 6.
In the second example, count(A) = 2 and count(B) = 2. Thus the answer is (2+1) * (2+1) = 9.
The reason this works is that you can define any subset as a vector of counts - for {A,B,B}, the subsets can be described as {A=0,B=0}, {A=0,B=1}, {A=0,B=2}, {A=1,B=0}, {A=1,B=1}, {A=1,B=2}.
For each number in counts[] there are (frequency of that object + 1) possible values (0..frequency).
Therefore, the total number of possibilities is the product of all (frequency + 1) terms.
The "all unique" case can also be explained this way - there is one occurence of each object, so the answer is (1+1)^|S| = 2^|S|.

I'll argue that this problem is simple to solve, when viewed in the proper way. You don't care about order of the elements, only whether they appear in a subset or not.
Count the number of times each element appears in the set. For the one element set {A}, how many subsets are there? Clearly there are only two sets. Now suppose we added another element, B, that is distinct from A, to form the set {A,B}. We can form the list of all sets very easily. Take all the sets that we formed using only A, and add in zero or one copy of B. In effect, we double the number of sets. Clearly we can use induction to show that for N distinct elements, the total number of sets is just 2^N.
What happens when some elements appear multiple times? Consider the set with three copies of A, i.e. {A,A,A}. How many subsets can you form? Again, this is simple. We can have 0, 1, 2, or 3 copies of A, so the total number of subsets is 4 since order does not matter.
In general, for N copies of the element A, we will end up with N+1 possible subsets. Now, expand this by adding in some number, M, of copies of B. So we have N copies of A and M copies of B. How many total subsets are there? Yes, this seems clear too. To every possible subset with only A in it (there were N+1 of them) we can add between 0 and M copies of B.
So the total number of subsets when we have N copies of A and M copies of B is simple. It must be (N+1)*(M+1). Again, we can use an inductive argument to show that the total number of subsets is the product of such terms. Merely count up the total number of replicates for each distinct element, add 1, and take the product.
See what happens with the set {A,B,B}. We get 2*3 = 6.
For the set {A,A,B,B}, we get 3*3 = 9.
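If you want to convince yourself of the product rule, here is a brute-force cross-check in Python that generates every subset, deduplicates, and counts (only sensible for small sets, of course):

from itertools import combinations

def unique_subsets(s):
    return {tuple(sorted(c))
            for r in range(len(s) + 1)
            for c in combinations(s, r)}

print(len(unique_subsets(['A', 'B', 'B'])))       # 6
print(len(unique_subsets(['A', 'A', 'B', 'B'])))  # 9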

I don't understand the answers of this question with natural join and projection

I am following a MOOC, but I understand neither the correct answer nor the other answers.
The MOOC closed and I cannot ask any questions on the forum.
This is the question:
Consider the following relation R:
A B C D
1 0 2 2
4 1 2 2
6 0 6 3
7 1 2 3
1 0 6 1
1 1 2 1
Among all these queries, which one returns the same relation R?
ΠA,B,C,D(R⋈δA→D,D→F(R))
R⋈δA→D,D→A(R)
R⋈δB→C,C→B(R)
ΠA,B,C,D(R⋈δB→G,C→F(R)) (note: this is the correct answer)
The only given explanation is:
The first 3 answers lose the tuple (4,1,2,2). In the last join, no tuple is lost.
Could you please detail what the answers do?
Thank you very much for your attention!
This is a question about the Relational Algebra's Natural Join, and attribute naming. I presume the squiggly thing in your formulas is for Rename, usually denoted by Greek letter rho ρ (see the wikipedia link).
For Natural Join, see the wikipedia example and note:
The result of the natural join is the set of all combinations of tuples in R and S that are equal on their common attribute names.
Because of the renaming in the four formulas, in general, the result from renamed R will not have the same attribute names as the original R, or will not be equal on the values in the resulting same-named attributes.
I suggest you go through each of the four renamings, and work out what is the 'heading' of each result -- that is, what are the resulting attribute names.
You'll find that in queries 1., 2., 3. there's at least one resulting attribute same-named as in the original R but with values that are not the same.
In query 4., although attributes B, C are renamed, their new names do not clash with any existing attribute in R. So the Natural Join to the original R will use attributes A, D. This'll produce an interesting intermediate result: consider the tuples <1, 0, 6, 1>, <1, 1, 2, 1>, which contain equal values in their A attribute and their D attribute.
But then in query 4., the projection will throw away the newly-named attributes G, F and collapse back to the original A, B, C, D. So query 4. always returns exactly the original R.
Queries 1., 2., 3. might sometimes return the original R, depending on the content of R. But with the content you show, there are clashes of newly-same-named attributes with non-equal values, so they do 'lose' tuples.
BTW, although tuple <4, 1, 2, 2> does indeed get 'lost' in those three queries, it's not the only tuple that gets 'lost'. In particular in query 3., note that for the sample data, there are no values in common between B and C, so swapping them round in the rename returns an empty result from the Join.
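To make this concrete, here is a small Python sketch (the helper names are mine, not from the MOOC) that models relations as sets of attribute/value pairs and replays the four queries on the sample R:

from itertools import product

def rel(attrs, rows):
    return {frozenset(zip(attrs, row)) for row in rows}

R = rel("ABCD", [(1, 0, 2, 2), (4, 1, 2, 2), (6, 0, 6, 3),
                 (7, 1, 2, 3), (1, 0, 6, 1), (1, 1, 2, 1)])

def rename(r, mapping):
    # the rename operator: replace attribute names per the mapping
    return {frozenset((mapping.get(a, a), v) for a, v in t) for t in r}

def njoin(r, s):
    # natural join: combine tuples that agree on all common attributes
    out = set()
    for t, u in product(r, s):
        dt, du = dict(t), dict(u)
        if all(dt[a] == du[a] for a in dt.keys() & du.keys()):
            out.add(frozenset({**dt, **du}.items()))
    return out

def project(r, attrs):
    return {frozenset((a, v) for a, v in t if a in attrs) for t in r}

q1 = project(njoin(R, rename(R, {"A": "D", "D": "F"})), "ABCD")
q2 = njoin(R, rename(R, {"A": "D", "D": "A"}))
q3 = njoin(R, rename(R, {"B": "C", "C": "B"}))
q4 = project(njoin(R, rename(R, {"B": "G", "C": "F"})), "ABCD")

print(q1 == R, q2 == R, q3 == R, q4 == R)   # False False False True

Only the fourth query reproduces R; the third even comes back empty, as noted above.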

PROLOG code to find cheapest combination of all possible combinations

Complete newbie in PROLOG here. I am provided with a list of parts to implement a hypothetical computational system:
%partName(name, type, number, required_types, size, price)
part(a-1, a, 1, 2*b, 0.5, 60).
part(a-2, a, 2, 3*b, 1.0, 25).
part(b-1, b, 1, 2*c, 0.5, 25).
part(b-2, b, 2, -, 1, 45).
part(c-1, c, 1, -, 0.5, 5).
part(c-2, c, 2, -, 0.5, 10).
part(d-1, d, 1, 1*b && 2*c, 0.5, 45).
part(d-2, d, 2, 1*c-1, 0.5, 25).
For example: part a-1, of type a and number 1, needs two (2) b-type parts to function (either two b-1s, two b-2s, or one of each), and then b-1 requires another two (2) c-type parts to function, etc. Parts also include a size (half size is 0.5, full size is 1) and a price.
(NOTE: parts with "-" as needed type work on their own, part d-1 requires one b-type AND two c types, and d-2 requires one c-1 in particular rather than generally a c type.)
The input is to be a setup of part types, like {a, b}, {c, d}, {a, b, d} etc.
The output is to be the cheapest possible configuration/combination of them in terms of space and money.
Remember: for example, in {a, b}, "a" can either be a-1 or a-2 (same with b), and each part needs the parts listed above in order to function and so on, so I presume it all works in a tree-like, somewhat "recursive" manner.
I guess I first have to find all possible combinations/configurations and then somehow find the cheapest/most compact.
Can anyone provide a working solution or basic insight on how to set up the problem?
Please be a lifesaver.
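For what it's worth, here is a rough sketch of the recursive costing idea in Python rather than Prolog; the parts encoding and the cost-then-size tie-break are my own assumptions, and the real exercise would express the same recursion as Prolog clauses:

# name: (type, requirements, size, price); a requirement is (count, key),
# where key is a type like "b" or a specific part like "c-1"
PARTS = {
    "a-1": ("a", [(2, "b")], 0.5, 60),
    "a-2": ("a", [(3, "b")], 1.0, 25),
    "b-1": ("b", [(2, "c")], 0.5, 25),
    "b-2": ("b", [], 1.0, 45),
    "c-1": ("c", [], 0.5, 5),
    "c-2": ("c", [], 0.5, 10),
    "d-1": ("d", [(1, "b"), (2, "c")], 0.5, 45),
    "d-2": ("d", [(1, "c-1")], 0.5, 25),
}

def candidates(key):
    # a key names either a concrete part or a part type
    if key in PARTS:
        return [key]
    return [n for n, (t, _, _, _) in PARTS.items() if t == key]

def cheapest(key):
    # (cost, size) of the cheapest way to provide one unit of key,
    # including everything it transitively requires
    best = None
    for name in candidates(key):
        _, reqs, size, price = PARTS[name]
        cost, space = price, size
        for count, sub in reqs:
            c, s = cheapest(sub)
            cost += count * c
            space += count * s
        best = min(best, (cost, space)) if best else (cost, space)
    return best

def cheapest_setup(types):
    costs = [cheapest(t) for t in types]
    return sum(c for c, _ in costs), sum(s for _, s in costs)

print(cheapest_setup({"a", "b"}))   # (165, 5.0) with this encoding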

Pollard’s p−1 algorithm: understanding of Berkeley paper

This paper explains Pollard's p-1 factorization algorithm. I am having trouble understanding the case when the factor found is equal to the input, where we go back and change 'a' (basically page 2, point 2 in the aforementioned paper).
Why do we go back and increment 'a'?
Why do we not go ahead and keep incrementing the factorial? Is it because we would keep going into the same cycle we have already seen?
Can I get all the factors using this same algorithm? For example, 49000 = 2^3 * 5^3 * 7^2, but currently I only get 7 and 7000. Perhaps I can use this get_factor() function recursively, but I am wondering about the base cases.
def gcd(a, b):
    if not b:
        return a
    return gcd(b, a % b)

def get_factor(input):
    a = 2
    for factorial in range(2, input - 1):
        # we are not computing the factorial itself: since we only need
        # the gcd with n we reduce mod n, and we reuse the previously
        # calculated power at each step
        a = a**factorial % input
        factor = gcd(a - 1, input)
        if factor == 1:
            continue
        elif factor == input:
            a += 1
        elif factor > 1:
            return factor
n = 10001077
p = get_factor(n)
q = n // p   # integer division so q prints as an integer
print("factors of", n, "are", p, "and", q)
The linked paper is not a particularly good description of Pollard's p − 1 algorithm; most descriptions discuss smoothness bounds that make the algorithm much more practical. You might like to read this page at Prime Wiki. To answer your specific questions:
Why increment a? Because the original a doesn't work. In practice, most implementations don't bother; a different factoring method, such as the elliptic curve method, is tried instead.
Why not increment the factorial? This is where the smoothness bound comes into play. Read the page at Mersenne Wiki for more details.
Can I get all factors? This question doesn't apply to the paper you linked, which assumes that the number being factored is a semi-prime with exactly two factors. The more general answer is "maybe." This is what happens at Step 3a of the linked paper, and choosing a new a may work (or may not). Or you may want to move to a different factoring algorithm.
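On the recursive idea, a hedged sketch of what "maybe" looks like in practice: the base cases are 1 and primes, and composites are split by whatever single-factor routine you pass in (get_factor above, though you would want a fallback for the inputs where the p − 1 method fails):

def is_prime(n):
    # trial division; fine for a sketch, far too slow for large n
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True

def all_factors(n, find_factor):
    # base cases: 1 has no prime factors, a prime is its own factorization
    if n == 1:
        return []
    if is_prime(n):
        return [n]
    d = find_factor(n)   # any nontrivial factor of the composite n
    return sorted(all_factors(d, find_factor) +
                  all_factors(n // d, find_factor))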
Here is my simple version of the p − 1 algorithm, using x instead of a. The while loop computes the magical L of the linked paper (it's the least common multiple of the integers less than the smoothness bound b), which is the same calculation as the factorial of the linked paper, but done in a different way.
def pminus1(n, b, x=2):
    # primegen() and ilog() are helpers from the linked ideone code:
    # primegen yields the primes in order, and p**ilog(p, b) is the
    # largest power of p that does not exceed the smoothness bound b
    q = 0; pgen = primegen(); p = next(pgen)
    while p < b:
        x = pow(x, p**ilog(p, b), n)
        q, p = p, next(pgen)
    g = gcd(x - 1, n)
    if 1 < g < n: return g
    return False
You can see it in action at http://ideone.com/eMPHtQ, where it factors 10001 as in the linked paper as well as finding a rather spectacular 36-digit factor of fibonacci(522). Once you master that algorithm, you might like to move on to the two-stage version of the algorithm.

Checking bitmask: x & b != 0 VS x & b == b

Suppose x is a bitmask value, and b is one flag, e.g.
x = 0b10101101
b = 0b00000100
There seem to be two ways to check whether the bit indicated by b is turned on in x:
if (x & b != 0) // (1)
if (x & b == b) // (2)
In most circumstances it seems these two checks always yield the same result, given that b always has exactly one bit turned on.
However, I wonder: is there any case that makes one method better than the other?
In general, if we interpret both values as bit sets, the first condition checks if the intersection of x and b is not empty (or, to put it differently: if b and x have elements in common), while the second one checks if b is a subset of x.
Clearly, if b is a singleton, b is a subset of x if and only if the intersection is not empty.
So, whenever you cannot guarantee to 100% that b is a singleton, choose your condition wisely. Ask yourself if you want to express that all elements of b must also be elements of x, or that there are elements of b that are also elements of x. It's a huge difference except for the single bit case.
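A quick illustration of that divergence once b has more than one bit set (Python here, but the same holds in any language with these operators):

x = 0b10101101
b = 0b00000110        # two flags: bit 2 is set in x, bit 1 is not

print((x & b) != 0)   # True: x and b share at least one element (bit 2)
print((x & b) == b)   # False: b is not a subset of x (bit 1 is missing)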

Iteratively declaring distinct variables

This question is strongly related to this and this question.
The distinct function of Z3
(declare-const a S)
(declare-const b S)
(assert (distinct a b))
allows constraining sets of variables (here a and b) such that all variables in the set must take different values.
My question is: is it also possible to force a variable to take a unique value without explicitly referring to the set of variables from which it should be distinct? Something like
(declare-unique-const a S)
(declare-unique-const b S)
(declare-unique-const c S)
This would be nice in situations where you declare new variables in an iterative process, for example, during program verification.
If it is not possible, I guess one has to keep track of all distinct variables and use that set to emit appropriate (distinct newvar oldvar1 ... oldvarn) constraints.
We can define an auxiliary fresh function f from S to Int, and assert
f(a_1) = 1
f(a_2) = 2
f(a_3) = 3
...
f(a_n) = n
Then, a_1, ..., a_n must be different from each other.
If we want to say that b is also different from all the a_i's, we just assert
f(b) = n+1
In this approach, we only have to track the counter.
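A minimal sketch of this counter trick using Z3's Python API; the fresh_distinct helper is my own name, not a Z3 builtin:

from z3 import Const, DeclareSort, Function, IntSort, Solver

S = DeclareSort('S')
f = Function('f', S, IntSort())   # the auxiliary fresh function
solver = Solver()
counter = 0

def fresh_distinct(name):
    # declare a new constant of sort S and pin f of it to a fresh
    # integer, forcing it apart from every earlier such constant
    global counter
    counter += 1
    x = Const(name, S)
    solver.add(f(x) == counter)
    return x

a = fresh_distinct('a')
b = fresh_distinct('b')
solver.add(a == b)       # contradicts f(a) = 1, f(b) = 2
print(solver.check())    # unsat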