I don't understand the answers of this question with natural join and projection - relational-database

I am following a MOOC, but I don't understand the correct answer nor the other answers.
The MOOC closed and I cannot ask any questions on the forum.
This is the question:
Considering the following relation R:
A B C D
1 0 2 2
4 1 2 2
6 0 6 3
7 1 2 3
1 0 6 1
1 1 2 1
Among all these requests, which one returns the same relation R?
1. ΠA,B,C,D(R ⋈ δA→D,D→F(R))
2. R ⋈ δA→D,D→A(R)
3. R ⋈ δB→C,C→B(R)
4. ΠA,B,C,D(R ⋈ δB→G,C→F(R)) (note: this is the correct answer)
The only given explanation is :
The first 3 answers lose the tuple (4,1,2,2). In the last join, no tuple is lost.
Could you please explain in detail what each of the answers does?
Thank you very much for your attention!

This is a question about Relational Algebra's Natural Join, and attribute naming. I presume the squiggly thing in your formulas is for Rename, usually denoted by the Greek letter rho, ρ (see the Wikipedia link).
For Natural Join, see the Wikipedia example and note:
The result of the natural join is the set of all combinations of tuples in R and S that are equal on their common attribute names.
Because of the renaming in the four formulas, in general, the result from renamed R will not have the same attribute names as the original R, or will not be equal on the values in the resulting same-named attributes.
I suggest you go through each of the four renamings, and work out what the 'heading' of each result is -- that is, what the resulting attribute names are.
You'll find in requests 1., 2., 3. there's at least one resulting attribute same-named as the original R but the values for that attribute are not the same.
In request 4., although attributes B, C are renamed, their new names do not clash with any existing attribute in R. So the Natural Join to original R will use attributes A, D. This'll produce an interesting intermediate result: consider the tuples <1, 0, 6, 1>, <1, 1, 2, 1> which each contain equal values in their A attribute and their D attribute.
But then in request 4., the projection will throw away the newly-named attributes G, F and collapse back to the original A, B, C, D. So in general, request 4. always returns exactly the original R.
Requests 1., 2., 3. might sometimes return the original R, depending on the content of R. But with the content you show, there are clashes of newly-same-named attributes with non-equal values, so they do 'lose' tuples.
BTW, although tuple <4, 1, 2, 2> does indeed get 'lost' in those three requests, it's not the only tuple that gets 'lost'. In particular in request 3., note that for the sample data, there are no values in common between B, C, so swapping them round in the rename has the effect of returning an empty result from the Join.
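The four requests can be checked mechanically. Below is a minimal Python sketch, assuming each tuple is modelled as a dict from attribute name to value; the helper names rename, natural_join and project are made up for this illustration and do not come from any library.

```python
# Relation R from the question, one dict per tuple.
ATTRS = ("A", "B", "C", "D")
ROWS = [(1, 0, 2, 2), (4, 1, 2, 2), (6, 0, 6, 3),
        (7, 1, 2, 3), (1, 0, 6, 1), (1, 1, 2, 1)]
R = [dict(zip(ATTRS, row)) for row in ROWS]

def rename(rel, mapping):
    """Rename attributes simultaneously; unmapped names are kept."""
    return [{mapping.get(k, k): v for k, v in t.items()} for t in rel]

def natural_join(r, s):
    """Combine tuples that agree on every attribute name they share."""
    out = []
    for t in r:
        for u in s:
            if all(t[k] == u[k] for k in t.keys() & u.keys()):
                out.append({**t, **u})
    return out

def project(rel, attrs):
    """Keep only attrs and drop duplicates; returns a set of value tuples."""
    return {tuple(t[a] for a in attrs) for t in rel}

original = project(R, ATTRS)
q1 = project(natural_join(R, rename(R, {"A": "D", "D": "F"})), ATTRS)
q2 = project(natural_join(R, rename(R, {"A": "D", "D": "A"})), ATTRS)
q3 = project(natural_join(R, rename(R, {"B": "C", "C": "B"})), ATTRS)
q4_joined = natural_join(R, rename(R, {"B": "G", "C": "F"}))
q4 = project(q4_joined, ATTRS)

for name, result in [("1", q1), ("2", q2), ("3", q3), ("4", q4)]:
    print("request", name, "returns R:", result == original, sorted(result))
```

With the sample data, only request 4 compares equal to the original R, request 3 comes out empty, and the intermediate join in request 4 holds more tuples than R because the two tuples that share A and D pair up with each other.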

Naming of lower_bound, upper_bound c++

Does anyone know why they were given these names? Coming from a maths background, they have always left my mind in tangles, since they are both mathematical lower bounds, i.e. minimums in the finite world. Also, the natural language definition given in the STL is a bad mental model, imo.
Does anyone use mental synonyms to be able to work with them, or do they just remember the naïve implementations?
lower_bound(rng, x) = get_iter_to(mathematical_lower_bound(rng | filter([&](auto y) { return x <= y; })))
upper_bound(rng, x) = get_iter_to(mathematical_lower_bound(rng | filter([&](auto y) { return x < y; })))
Igor Tandetnik answered this in the comments.
The set in question is the set of elements that the given value can be inserted before while preserving the order.
For example, if we want to insert 2 into the range [0,1,2,2,3,4], then we could insert it at index 2, 3 or 4. lower_bound gives the iterator to the start of this range of insertion positions (index 2). upper_bound gives the last position in this range (index 4).
I suppose this is a name for library implementers writing pivots, rather than me trying to look up keys/indices of a vector of numerics.
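The insertion-position reading can be tried out quickly. This is a small sketch using Python's standard bisect module as a stand-in for the C++ functions, on the assumption that bisect_left and bisect_right behave like lower_bound and upper_bound on a sorted list; keeps_sorted is a helper invented for this example.

```python
from bisect import bisect_left, bisect_right

values = [0, 1, 2, 2, 3, 4]
x = 2

lo = bisect_left(values, x)   # analogue of lower_bound: first valid insertion index
hi = bisect_right(values, x)  # analogue of upper_bound: last valid insertion index

def keeps_sorted(seq, index, item):
    """True if inserting item before seq[index] leaves the list sorted."""
    candidate = seq[:index] + [item] + seq[index:]
    return candidate == sorted(candidate)

# Every index where x could be inserted without breaking the order.
valid_positions = [i for i in range(len(values) + 1) if keeps_sorted(values, i, x)]

print(lo, hi, valid_positions)
```

The two bounds are the smallest and largest entries of valid_positions, which is where the names come from.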

What sort of loss function should I use for this multi-class multi-label(?) problem?

In my experiment I am trying to train a neural network to detect if patients exhibit symptom A, B, C, D. My data consists of different angled photos of each patient along with whether or not they have symptom A, B, C, D.
Right now, in PyTorch, I am using MSELoss and calculating my test error as the total number of correct classifications out of the total number of classifications. I'm guessing this is too naive and even inappropriate.
An example of a test error computation would be like this:
Suppose we have 2 patients with two images each. Then there would be 16 total classifications (1 for each of symptoms A, B, C, D for patient 1 in photo 1, etc.). If the model correctly predicted that patient 1 exhibited symptom A in photo 1, that would add 1 to the total number of correct classifications.
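I suggest using binary cross-entropy for multi-class multi-label classification. This may seem counterintuitive, but keep in mind that the goal here is to treat each output label as an independent distribution (or class).
In PyTorch you can use torch.nn.BCELoss(weight=None, size_average=None, reduce=None, reduction='mean'). This creates a criterion that measures the Binary Cross Entropy between the target and the output. Note that BCELoss expects outputs that are already probabilities (for example, after a sigmoid), and that size_average and reduce are deprecated in favour of reduction. If the network outputs raw scores, torch.nn.BCEWithLogitsLoss applies the sigmoid internally.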
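To make the "independent label" idea concrete, here is a plain-Python sketch of what a mean binary cross-entropy computes over four symptom labels. It does not use PyTorch; the function names, the eps clamp, the 0.5 threshold and the numbers are all assumptions made for this illustration.

```python
import math

def binary_cross_entropy(probs, targets, eps=1e-12):
    """Mean of -[t*log(p) + (1-t)*log(1-p)] over all labels."""
    total = 0.0
    for p, t in zip(probs, targets):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -(t * math.log(p) + (1 - t) * math.log(1 - p))
    return total / len(probs)

def label_accuracy(probs, targets, threshold=0.5):
    """Fraction of labels predicted correctly after thresholding."""
    correct = sum((p >= threshold) == bool(t) for p, t in zip(probs, targets))
    return correct / len(targets)

# One photo, four symptom labels A, B, C, D (made-up numbers).
predicted = [0.9, 0.2, 0.7, 0.4]
actual = [1, 0, 1, 1]

loss = binary_cross_entropy(predicted, actual)
acc = label_accuracy(predicted, actual)
print(f"loss={loss:.4f} accuracy={acc:.2f}")
```

Each label contributes its own term to the loss, so a confident wrong prediction on one symptom is penalised regardless of how the other three are predicted.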

Finding the Candidate Keys of a Relation Using the FD's

I have definitely checked out many different related posts, as suggested when creating this question. I have also done different sample problems from online sources as well from a similar problem. However, I am stuck on the problem below specifically.
Given the following relation R and the set of functional dependencies S that hold on R, find all candidate keys for R. Show your work.
R(A, B, C, D, E, F)
S:
AB → C
AC → B
AD → E
BC → A
E → F
Initially, I broke the attributes into groups: attributes found only on the left, only on the right, and on both sides (they are D, F, and ABCE respectively). I also know that I should try to compute the closure of D. This is where I get stuck. At first glance, it seems like I am unable to solve this problem, which can't be true. I also tried computing the closures of (AD), (BD), (CD), and (ED) because I thought that the closure of D = D. Any thoughts?
The keys here are ABD, ACD and BCD.
You were on the right track. After dividing the attributes into three groups, the attributes in the "only on the left" list are always part of every candidate key. Here that attribute is D.
"I also tried computing the closures of (AD), (BD), (CD), and (ED)"
Since none of the groups of 2 attributes turned out to be a key, the next step is to form groups of 3 attributes (each containing D) and check their closures.
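The search described above (grow the group size and check closures) can be sketched in Python. This is a brute-force illustration under the assumption that each functional dependency is written as a pair of attribute strings; the names closure and candidate_keys are made up for the example.

```python
from itertools import combinations

ATTRIBUTES = "ABCDEF"
# Each pair is (left-hand side, right-hand side) of a functional dependency.
FDS = [("AB", "C"), ("AC", "B"), ("AD", "E"), ("BC", "A"), ("E", "F")]

def closure(attrs, fds=FDS):
    """Attribute closure: keep applying FDs until nothing new is added."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def candidate_keys(attributes=ATTRIBUTES, fds=FDS):
    """Try attribute sets from smallest to largest, keeping only minimal ones."""
    keys = []
    for size in range(1, len(attributes) + 1):
        for combo in combinations(attributes, size):
            candidate = set(combo)
            if any(key <= candidate for key in keys):
                continue  # superset of a smaller key, so not minimal
            if closure(candidate, fds) == set(attributes):
                keys.append(candidate)
    return ["".join(sorted(k)) for k in keys]

print("closure of D :", "".join(sorted(closure("D"))))
print("closure of AD:", "".join(sorted(closure("AD"))))
print("candidate keys:", candidate_keys())
```

Trying every subset is exponential in the number of attributes, which is fine for six attributes but would need pruning (for instance, always including the left-only attribute D) on larger schemas.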

distributive property for product of maxterms

I am unsure how to use the Distributive property on the following function:
F = B'D + A'D + BD
I understand that F = xy + x'z would become (xy + x')(xy + z) but I'm not sure how to do this with three terms with two variables.
Also another small question:
I was wondering how to know what number a minterm is without having to consult (or memorise) the table of minterms.
For example how can I tell that xy'z' is m4?
When you're trying to use the distributive property there, what you're doing is converting minterms to maxterms. This is actually very related to your second question.
To tell that xy'z' is m4, read the term as a binary number where a primed (false) variable is 0 and an unprimed (true) variable is 1. xy'z' then becomes 100, binary for the decimal 4. That's really what a k-map/minterm table is doing for you to give a number.
Now an important extension of this: the number of possible combinations is 2^(number of different variables). If you have 3 variables, there are 2^3 or 8 different combinations. That means the possible min/maxterm numbers run from 0 to 7. Here's the cool part: every index that isn't one of the function's minterms is one of its maxterms, and vice versa.
So, if you have variables x and y, and you have the expression xy', you can see that as 10, or m2. Because the numbers go from 0-3 with 2 variables, m2 implies M0, M1, and M3. Therefore, xy'=(x+y)(x+y')(x'+y').
In other words, the easiest way to do the distributive property in either direction is to note what minterm or maxterm you're dealing with, and just switch it to the other.
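Both ideas can be checked with a short truth-table sketch in Python. The function names are invented for this example, and the first listed variable is assumed to be the most significant bit, matching the xy'z' = 100 reading above.

```python
from itertools import product

def minterm_index(literals):
    """Read a product term as a binary number: plain literal = 1, primed = 0."""
    bits = "".join("0" if lit.endswith("'") else "1" for lit in literals)
    return int(bits, 2)

def minterms_and_maxterms(func, num_vars):
    """Split the truth-table row numbers into minterm and maxterm indices."""
    minterms, maxterms = [], []
    for index, bits in enumerate(product([0, 1], repeat=num_vars)):
        (minterms if func(*bits) else maxterms).append(index)
    return minterms, maxterms

idx = minterm_index(["x", "y'", "z'"])  # the term xy'z'

# xy' over the two variables (x, y)
mins_xy, maxs_xy = minterms_and_maxterms(lambda x, y: x and not y, 2)

# F = B'D + A'D + BD over the three variables (A, B, D)
def F(a, b, d):
    return (not b and d) or (not a and d) or (b and d)

mins_f, maxs_f = minterms_and_maxterms(F, 3)

print(idx, mins_xy, maxs_xy, mins_f, maxs_f)
```

The maxterm indices reported for F are the factors of its product-of-maxterms form, with each 0 bit read as a plain variable and each 1 bit as a primed one inside the sum.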

How do you calculate the total number of all possible unique subsets from a set with repeats?

Given a set** S containing duplicate elements, how can one determine the total number of all possible subsets of S, where each subset is unique?
For example, say S = {A, B, B} and let K be the set of all subsets, then K = {{}, {A}, {B}, {A, B}, {B, B}, {A, B, B}} and therefore |K| = 6.
Another example would be if S = {A, A, B, B}, then K = {{}, {A}, {B}, {A, B}, {A, A}, {B, B}, {A, B, B}, {A, A, B}, {A, A, B, B}} and therefore |K| = 9.
It is easy to see that if S is a real set, having only unique elements, then |K| = 2^|S|.
What is a formula to calculate this value |K| given a "set" S (with duplicates), without generating all the subsets?
** Not technically a set.
Take the product of all the (frequencies + 1).
For example, in {A,B,B}, the answer is (1+1) [the number of As, plus one] * (2+1) [the number of Bs, plus one] = 6.
In the second example, count(A) = 2 and count(B) = 2. Thus the answer is (2+1) * (2+1) = 9.
The reason this works is that you can define any subset as a vector of counts - for {A,B,B}, the subsets can be described as {A=0,B=0}, {A=0,B=1}, {A=0,B=2}, {A=1,B=0}, {A=1,B=1}, {A=1,B=2}.
Each entry in the vector of counts can take (frequency of that object + 1) possible values, namely 0 through the frequency.
Therefore, the total number of possibilities is the product of all (frequencies + 1).
The "all unique" case can also be explained this way - there is one occurrence of each object, so the answer is (1+1)^|S| = 2^|S|.
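As a quick check of the product rule, here is a Python sketch that computes the count from the frequencies and compares it with an explicit enumeration. The function names are made up for this example, and math.prod assumes Python 3.8 or later.

```python
from collections import Counter
from itertools import combinations
from math import prod

def count_unique_subsets(items):
    """Product of (frequency + 1) over the distinct elements."""
    return prod(count + 1 for count in Counter(items).values())

def brute_force_count(items):
    """Enumerate every selection and count the distinct ones (for checking only)."""
    ordered = sorted(items)  # sorting makes equal selections yield equal tuples
    seen = set()
    for size in range(len(ordered) + 1):
        for combo in combinations(ordered, size):
            seen.add(combo)
    return len(seen)

for sample in ("ABB", "AABB", "ABCD"):
    print(sample, count_unique_subsets(sample), brute_force_count(sample))
```

The brute-force version generates all the subsets, which is exactly what the question wants to avoid; it is included only to confirm the formula on small inputs.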
I'll argue that this problem is simple to solve, when viewed in the proper way. You don't care about the order of the elements, only whether they appear in a subset or not.
Count the number of times each element appears in the set. For the one element set {A}, how many subsets are there? Clearly there are only two: {} and {A}. Now suppose we added another element, B, that is distinct from A, to form the set {A,B}. We can form the list of all subsets very easily. Take all the subsets that we formed using only A, and add in zero or one copy of B. In effect, we double the number of subsets. Clearly we can use induction to show that for N distinct elements, the total number of subsets is just 2^N.
What if some elements appear multiple times? Consider the set with three copies of A, thus {A,A,A}. How many subsets can you form? Again, this is simple. We can have 0, 1, 2, or 3 copies of A, so the total number of subsets is 4, since order does not matter.
In general, for N copies of the element A, we will end up with N+1 possible subsets. Now, expand this by adding in some number, M, of copies of B. So we have N copies of A and M copies of B. How many total subsets are there? Yes, this seems clear too. To every possible subset with only A in it (there were N+1 of them) we can add between 0 and M copies of B.
So the total number of subsets when we have N copies of A and M copies of B is simple. It must be (N+1)*(M+1). Again, we can use an inductive argument to show that the total number of subsets is the product of such terms. Merely count up the total number of replicates for each distinct element, add 1, and take the product.
See what happens with the set {A,B,B}. We get 2*3 = 6.
For the set {A,A,B,B}, we get 3*3 = 9.