Proving intuitive statements about THE in Isabelle - unique

I would like to prove something like this lemma in Isabelle
lemma assumes "y = (THE x. P x)" shows "P (THE x. P x)"
I imagine that the assumption implies that THE x. P x exists and is well-defined. So this lemma ought to be true too
lemma assumes "y = (THE x. P x)" shows "∃! x. P x"
I'm not sure how to prove this because I've looked through all the theorems that turn up when I type "name: the" into the query box in Isabelle and they don't seem useful. I can't find the definition of THE either and I am not sure how to define it although I have an intuitive idea of what it means. I tried something like this although I am sure this is wrong
"(∃!x. P x) ⟹ THE x. P x = (SOME x. P x)"
and maybe even useless because I don't know how to define SOME either!

Unfortunately, the assumption does not imply that THE x. P x ‘exists’, at least not in a sense that you would find satisfying. As HOL is a total logic, there is no notion of ‘well-definedness’ in the logic.
If you write THE x. P x when there is no unique x that satisfies P, then THE x. P x is still a value that ‘exists’ in HOL, but one that you cannot prove anything meaningful about (much like the undefined constant); in particular, you cannot show that P holds for it. The same is true for SOME, which is basically the same as THE, with the difference that THE requires a unique witness for the property whereas SOME does not require uniqueness.
The typical approach for showing something about SOME x. P x is that you first show that a witness exists (i.e. ∃x. P x) and then you plug that into a rule like someI_ex which then tells you that P (SOME x. P x) indeed holds.
It's the same for THE, except that there you have to show that there is exactly one witness – which is what the ∃! means (cf. the theorem Ex1_def). Showing this unique existence can be done e.g. with the rules ex_ex1I or ex1I. Then you can plug that fact into theI' and the1_equality to get the results you want.
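As a rough finite-domain analogy in Python (not Isabelle, and not how HOL defines these constants), the pattern "first establish unique existence, then conclude the property" can be sketched as follows. The names `the`, `exists_unique`, `witnesses` and `ARBITRARY` are hypothetical helpers introduced only for this illustration.

```python
def witnesses(p, domain):
    """All elements of a finite domain that satisfy the predicate p."""
    return [x for x in domain if p(x)]

# Stand-in for "some value we know nothing about" (an assumption of this sketch).
ARBITRARY = object()

def exists_unique(p, domain):
    """Finite analogue of the unique-existence quantifier."""
    return len(witnesses(p, domain)) == 1

def the(p, domain):
    """Return the unique witness if there is one, otherwise an unspecified value."""
    ws = witnesses(p, domain)
    return ws[0] if len(ws) == 1 else ARBITRARY

domain = range(10)
is_root_of_nine = lambda x: x * x == 9   # exactly one witness in the domain
is_large = lambda x: x > 5               # several witnesses in the domain

# With unique existence established, the property holds for the chosen value.
if exists_unique(is_root_of_nine, domain):
    assert is_root_of_nine(the(is_root_of_nine, domain))

# Without uniqueness the term still denotes something, but nothing useful is known about it.
unspecified = the(is_large, domain)
```

The point of the sketch is only that `the` always returns something, while the useful fact about its result needs the uniqueness check as a premise.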
By the way, the constant for SOME is called Eps (as in ‘Hilbert's ε operator’) and the others are The and Ex1. If you type e.g. term Eps, you can ctrl-click on the Eps and it takes you to its definition (or, in case of Eps and The rather their axiomatisations).
There's also a LEAST combinator for natural numbers that is very similar to SOME and can be quite useful sometimes (it's called ‘Least’ and the lemmas are LeastI_ex and Least_le).
Another side note: This idea that just because you can write down a term, it is not necessarily ‘well-defined’ in an intuitive sense is very common in Isabelle: you can divide by zero, you can write down the derivative of a non-differentiable function, the measure of a non-measurable set, the integral of a non-integrable function etc. You then get some kind of dummy value (e.g. 0 for division by zero or something completely absurd like THE x. False), but most of the theorems that talk about actual properties of derivatives, integrals, etc. do explicitly require that the thing is actually well-defined.

Do we input only 1s for minterms and 0s for maxterms?

This has been bugging me for a long time.
Suppose I have a boolean function F defined as follows:
Now, it can be expressed in its SOP form as:
F = bar(X) Y bar(Z) + X Y Z
But I fail to understand why we always complement the 0s to express them as 1. Is it assumed that the inputs X, Y and Z will always be 1?
What is the practical application of that? All the YouTube videos I watched on this topic show how to express a function in SOP form or as a sum of minterms, but none of them explained why we need this. Why do we need minterms in the first place?
As of now, I believe that we design circuits to yield and take only 1 and that's where minterms come in handy. But I couldn't get any confirmation of this thing anywhere so I am not sure I am right.
Maxterms are even more confusing. Do we design circuits that would yield and take only 0s? Is that the purpose of maxterms?
Why do we need minterms in the first place?
We do not need minterms, we need a way to solve a logic design problem, i.e. given a truth table, find a logic circuit able to reproduce this truth table.
Obviously, this requires a methodology. Minterms and sum-of-products are one means of achieving that. Maxterms and product-of-sums are another. In either case, you get an algebraic representation of your truth table and you can either implement it directly or try to apply standard theorems of boolean algebra to find an equivalent, but simpler, representation.
But these are not the only tools. For instance, with Karnaugh maps, you rewrite your truth table with some rules and you can simultaneously find an algebraic representation and reduce its complexity, and it does not consider minterms. Its main drawback is that it becomes unworkable if the number of inputs rises and it cannot be considered as a general way to solve the problem of logic design.
It happens that minterms (or maxterms) do not have this drawback, and can be used to solve any problem. We get a truth table and we can directly convert it into an equation with ands, ors and nots. Indeed, minterms are somewhat simpler for human beings than maxterms, but that is just a matter of taste or of a reduced number of parentheses; they are actually equivalent.
But I fail to understand why we always complement the 0s to express them as 1. Is it assumed that the inputs X, Y and Z will always be 1?
Assume that we have a truth table with only one output at 1, for instance line 3 of your table. It means that when x=0, y=1 and z=0, the output will be one. So, can I express that in boolean logic? With the SOP methodology, we say that we want a solution for this problem that is an "and" of entries or of their complements. And obviously the solution is "x must be false and y must be true and z must be false" or "(not x) must be true and y must be true and (not z) must be true", hence the minterm /x.y./z. So complementing when we have a 0 and leaving unchanged when we have a 1 is a way to find the equation that will be true when xyz=010.
If I have another table with only one output at 1 (for instance line 8 of your table), we similarly find that this truth table (TT) can be implemented with x.y.z.
Now if I have a TT with 2 lines at 1, one can use the property of OR gates and do the OR of the previous circuits. When the output of the first one is 1, it will force the overall output to 1, and ditto for the second. And we directly get the solution for your table: /x.y./z + x.y.z
This can be extended to any number of ones in the TT and gives a systematic way to find an equation equivalent to a truth table.
So just think of minterms and maxterms as a tool to translate a TT into equations. What is important is the truth table (that describes the behaviour of what you want to do) and the equations (that give you a way to realize it).
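The translation from truth table to equation described above can be sketched in Python. This is a minimal illustration, assuming the table from the question has output 1 only on rows xyz=010 and xyz=111 (as the answer implies); the helper names `minterm`, `sum_of_products` and `evaluate_sop` are made up for this example.

```python
from itertools import product

def minterm(row, names):
    """AND of literals: a variable is complemented (written /v) where the row has a 0."""
    return ".".join(n if bit else "/" + n for n, bit in zip(names, row))

def sum_of_products(table, names):
    """OR together one minterm per truth-table row whose output is 1."""
    return " + ".join(minterm(row, names)
                      for row, out in sorted(table.items()) if out)

def evaluate_sop(table, inputs):
    """Evaluate the SOP form directly: OR over minterms, each an AND of literals."""
    return int(any(
        all((v if bit else 1 - v) for v, bit in zip(inputs, row))
        for row, out in table.items() if out
    ))

names = ("x", "y", "z")
table = {row: 0 for row in product((0, 1), repeat=3)}
table[(0, 1, 0)] = 1   # line 3 of the table
table[(1, 1, 1)] = 1   # line 8 of the table

print(sum_of_products(table, names))
# The SOP form reproduces the truth table on every input combination.
assert all(evaluate_sop(table, row) == table[row] for row in table)
```

Each minterm is 1 on exactly one input combination, which is why OR-ing them reproduces the table row by row.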

Universal Quantification in Isabelle/HOL

It has come to my attention that there are several ways to deal with universal quantification when working with Isabelle/HOL Isar. I am trying to write some proofs in a style that is suitable for undergraduate students to understand and reproduce (that's why I'm using Isar!) and I am confused about how to express universal quantification in a nice way.
In Coq for example, I can write forall x, P(x) and then I may say "induction x" and that will automatically generate goals according to the corresponding induction principle. However, in Isabelle/HOL Isar, if I want to directly apply an induction principle I must state the theorem without any quantification, like this:
lemma foo: P(x)
proof (induct x)
And this works fine as x is then treated as a schematic variable, as if it was universally quantified. However, it lacks the universal quantification in the statement, which is not very educational. Another way I have found is by using \<And> and \<forall>. However, I cannot directly apply the induction principle if I state the lemma in this way; I have to first fix the universally quantified variables, which again seems inconvenient from an educational point of view:
lemma foo: \<And>x. P(x)
proof -
fix x
show "P(x)"
proof (induct x)
What is a nice proof pattern for expressing universal quantification that does not require me to explicitly fix variables before induction?
You can use induct_tac, case_tac, etc. These are the legacy variant of the induct/induction and cases methods used in proper Isar. They can operate on bound meta-universally-quantified variables in the goal state, like the x in your second example:
lemma foo: "⋀x. P(x :: nat)"
proof (induct_tac x)
One disadvantage of induct_tac over induction is that it does not provide cases, so you cannot just write case (Suc x) and then from Suc.IH and show ?case in your proof. Another disadvantage is that addressing bound variables is, in general, rather fragile, since their names are often generated automatically by Isabelle and may change when Isabelle changes. (not in the case you have shown above, of course)
This is one of the reasons why Isar proofs are preferred these days. I would strongly advise against showing your students ‘bad’ Isabelle with the intention that it is easier for them to understand.
The facts are these: free variables in a theorem statement in Isabelle are logically equivalent to universally-quantified variables and Isabelle automatically converts them to schematic variables after you have proven it. This convention is not unique to Isabelle; it is common in mathematics and logic, and it helps to reduce clutter. Isar in particular tries to avoid explicit use of the ⋀ operator in goal statements (i.e. have/show; they still appear in assume).
Or, in short: free variables in theorems are universally quantified by default. I doubt that students will find this hard to understand; I certainly did not when I started with Isabelle as a BSc student. In fact, I found it much more natural to state a theorem as xs @ (ys @ zs) = (xs @ ys) @ zs instead of ∀xs ys zs. xs @ (ys @ zs) = (xs @ ys) @ zs.

how to find the highest normal form for a given relation

I've gone through internet and books and still have some difficulties on how to determine the normal form of this relation
R(a, b, c, d, e, f, g, h, i)
FDs =
B→G
BI→CD
EH→AG
G→DE
So far I've got that the only candidate key is BHI (If I should count with F, then BFHI).
Since the attribute F is not in use at all. Totally independent from the given FDs.
What am I supposed to do with the attribute F then?
How to determine the highest normal form for the relation R?
What am I supposed to do with the attribute F then?
You could observe the fact that the only FD in which F gets mentioned is the trivial one F->F. It's not explicitly mentioned precisely because it is trivial. Nonetheless, all of Armstrong's axioms apply to trivial ones equally well. So, you can use this trivial one, e.g. applying augmentation, to go from B->G to BF->GF.
How to determine the highest normal form for the relation R?
First, test the condition of first normal form. If satisfied, NF is at least 1. Check the condition of second normal form. If satisfied, NF is at least 2. Check the condition of third normal form. If satisfied, NF is at least 3.
Note :
"checking the condition of first normal form", is a bit of a weird thing to do in a formal process, because there exists no such thing as a formal definition of that condition, unless you go by Date's, but I have little doubt that your course does not follow that definition.
Hint :
Given that the sole key is BFHI, which is the first clause of "the key, the whole key, and nothing but the key" that gets violated by, say, B->G ?
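The key claimed in the question can be checked mechanically with the standard attribute-closure computation. The following Python sketch assumes the attributes are written in upper case (A to I) to match the FDs; the function name `closure` is introduced here for illustration.

```python
def closure(attrs, fds):
    """Attribute closure: repeatedly apply every FD whose left side is already covered."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

# FDs from the question, as (left side, right side) pairs.
FDS = [("B", "G"), ("BI", "CD"), ("EH", "AG"), ("G", "DE")]
ALL = set("ABCDEFGHI")

# BFHI determines every attribute, so it is a superkey.
assert closure("BFHI", FDS) == ALL
# Dropping any single attribute loses something, so BFHI is minimal.
assert all(closure(set("BFHI") - {a}, FDS) != ALL for a in "BFHI")
# Without F, the attribute F itself is never reached.
assert closure("BHI", FDS) == ALL - {"F"}
# A proper subset of the key already determines non-key attributes.
print(sorted(closure("B", FDS)))
```

The last line is the one relevant to the hint: it shows what a part of the key determines on its own.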

Higher Order Function

I am having trouble understanding what my lecturer wants me to do from this question. Can anyone help explain to me what he wants me to do?
Define a higher order version of the insertion sort algorithm. That is define
functions
insertBy :: Ord b => (a->b) -> a -> [a] -> [a]
inssortBy :: Ord b => (a->b) -> [a] -> [a]
and this bit is where it got me confused:
such that inssort f l sorts the list l such that an element x comes before an element y if f x < f y.
If you were sorting numbers, then it's clear what x < y means. But what if you were sorting letters? Or customers? Or anything else without a clear (to the computer) ordering?
So the sorting procedure is given a function f that defines that ordering. That f will take the letters or customers or whatever and return, for each one, a value of an ordered type (for example an integer) that the computer can actually sort on.
At least, that's how the problem is described. I personally would have designed a predicate that accepted two items, x and y, and returned a boolean indicating whether x < y. But either is fine.
The code wants you to rewrite the insertion sort algorithm, but using a function as a parameter - thus a higher order function.
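To make the idea of "a function as a parameter" concrete, here is a rough Python analogue (not the Haskell the assignment asks for). The names `insert_by` and `inssort_by` mirror the signatures in the question; the details are an illustrative sketch rather than the intended solution.

```python
def insert_by(key, x, xs):
    """Insert x into the already sorted list xs, comparing elements by key(element)."""
    for i, y in enumerate(xs):
        if key(x) < key(y):
            # x belongs just before the first element with a larger key.
            return xs[:i] + [x] + xs[i:]
    return xs + [x]

def inssort_by(key, xs):
    """Insertion sort: insert the elements one by one into a growing sorted list."""
    result = []
    for x in xs:
        result = insert_by(key, x, result)
    return result

# Sort words by length and numbers by absolute value, using the same sorting code.
print(inssort_by(len, ["ccc", "a", "bb"]))
print(inssort_by(abs, [3, -1, 2]))
```

The sorting code never compares the elements themselves, only the values that the supplied function returns for them.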
I would like to point out that this code, typo included, seems to stem from a piece of work currently due at a certain university - I found this page while searching for "insertion sort algortihm", as I copy pasted the term out of the document as well, typo included.
Seeking code from the internet is a risky business. Might I recommend the insertion sort algorithm Wikipedia entry, or the Haskell code provided in your lecture slides (you are looking for the 'insertion sort algorithm' and for 'higher-order functions'), as opposed to the several queries you have placed on Stack Overflow?

Repeated application of functions

Reading this question got me thinking: For a given function f, how can we know that a loop of this form:
while (x > 2)
x = f(x)
will stop for any value x? Is there some simple criterion?
(The fact that f(x) < x for x > 2 doesn't seem to help since the series may converge).
Specifically, can we prove this for sqrt and for log?
For these functions, a proof that ceil(f(x))<x for x > 2 would suffice. You could do one iteration -- to arrive at an integer number, and then proceed by simple induction.
For the general case, probably the best idea is to use well-founded induction to prove this property. However, as Moron pointed out in the comments, this could be impossible in the general case and the right ordering is, in many cases, quite hard to find.
Edit, in reply to Amnon's comment:
If you wanted to use well-founded induction, you would have to define another strict order that is well-founded. In the case of the functions you mentioned this is not hard: you can take x << y if and only if ceil(x) < ceil(y), where << is a symbol for this new order. This order is of course well-founded on numbers greater than 2, and both sqrt and log are decreasing with respect to it -- so you can apply well-founded induction.
Of course, in the general case such an order is much more difficult to find. This is also related, in some way, to total correctness assertions in Hoare logic, where you need to guarantee similar obligations on each loop construct.
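The role of ceil(x) as a decreasing measure can be observed numerically. The following Python sketch runs the loop from the question and records ceil(x) before every step; `run_loop` is a helper name made up for this illustration, and the sample starting values are arbitrary.

```python
import math

def run_loop(f, x):
    """Run `while x > 2: x = f(x)` and record ceil(x) before each application of f."""
    measures = []
    while x > 2:
        measures.append(math.ceil(x))
        x = f(x)
    return x, measures

for f in (math.sqrt, math.log):
    for start in (2.5, 10.0, 1e6, 1e300):
        final, measures = run_loop(f, start)
        # The integer measure strictly decreases, so the loop cannot run forever.
        assert all(a > b for a, b in zip(measures, measures[1:]))
        assert final <= 2
```

A strictly decreasing sequence of integers greater than 2 must be finite, which is the content of the well-founded argument. The sketch only samples a few values and is no substitute for the proof.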
There's a general theorem for when the sequence of iterations will converge. (A convergent sequence may not stop in a finite number of steps, but it is getting closer to a target. You can get as close to the target as you like by going far enough out in the sequence.)
The sequence x, f(x), f(f(x)), ... will converge if f is a contraction mapping. That is, there exists a positive constant k < 1 such that for all x and y, |f(x) - f(y)| <= k |x-y|.
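To illustrate why convergence alone does not make the loop from the question stop, consider the hypothetical function f(x) = x/2 + 1 (chosen only for this example). It is a contraction with k = 1/2 and satisfies f(x) < x for x > 2, yet in exact arithmetic its iterates approach 2 without ever reaching it. The Python sketch below uses exact rationals to show this.

```python
from fractions import Fraction

def f(x):
    """A contraction with constant k = 1/2 and fixed point 2 (illustrative choice)."""
    return x / 2 + 1

def iterate_exact(x0, n):
    """Apply f n times using exact rational arithmetic."""
    x = Fraction(x0)
    for _ in range(n):
        x = f(x)
    return x

x = iterate_exact(3, 50)
# After n steps the distance to the fixed point is (x0 - 2) / 2**n: tiny, but never zero.
assert x == 2 + Fraction(1, 2**50)
assert x > 2   # the loop condition `x > 2` would still hold
```

So for this f the mathematical loop never terminates, even though the sequence converges and is strictly decreasing.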
(The fact that f(x) < x for x > 2 doesn't seem to help since the series may converge).
If we're talking about floats here, that's not true. If for all x > n f(x) is strictly less than x, it will reach n at some point (because there's only a limited number of floating point values between any two numbers).
Of course this means you need to prove that f(x) is actually less than x using floating point arithmetic (i.e. proving it is less than x mathematically does not suffice, because then f(x) = x may still be true with floats when the difference is not enough).
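A small Python sketch of that caveat: the hypothetical function `shrink` below is strictly smaller than x mathematically, but the difference is far below floating-point resolution near 3.0, so the computed value equals x and the loop makes no progress. The `max_steps` cap exists only so that the demonstration itself terminates.

```python
import math

def shrink(x):
    """Mathematically less than x, but the subtraction is lost to rounding near 3.0."""
    return x - 1e-20

def iterate(f, x, max_steps):
    """Run `while x > 2: x = f(x)`, giving up after max_steps iterations."""
    steps = 0
    while x > 2 and steps < max_steps:
        x = f(x)
        steps += 1
    return x, steps

# In floating point, shrink(3.0) is exactly 3.0, so the loop is stuck.
assert shrink(3.0) == 3.0
assert iterate(shrink, 3.0, 1000) == (3.0, 1000)

# sqrt, by contrast, makes real progress in floating point.
value, steps = iterate(math.sqrt, 3.0, 1000)
assert steps == 1 and value <= 2
```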
There is no general algorithm to determine whether a function f and a variable x will end or not in that loop. The Halting problem is reducible to that problem.
For sqrt and log, we can safely say that the loop terminates because we happen to know the mathematical properties of those functions: repeated sqrt approaches 1, and repeated log eventually goes negative. So the condition x > 2 has to become false at some point.
Hope that helps.
In the general case, all that can be said is that the loop will terminate when it encounters x_i ≤ 2. That doesn't mean that the sequence will converge, nor does it even mean that it is bounded below 2. It only means that the sequence contains a value that is not greater than 2.
That said, any sequence containing a subsequence that converges to a value strictly less than 2 will (eventually) halt. That is the case for the sequence x_{i+1} = sqrt(x_i), since x_i converges to 1. In the case of y_{i+1} = log(y_i), the sequence will contain a value less than 2 before becoming undefined over the reals R. (It remains well defined on the extended complex plane, C*, but I don't think it will converge in general, except at any stable points that may exist, i.e. where z = log(z).) Ultimately what this means is that you need to perform some upfront analysis on the sequence to better understand its behavior.
The standard test for convergence of a sequence x_i to a point z is that, given any ε > 0, there is an n such that for all i > n, |x_i - z| < ε.
As an aside, consider the Mandelbrot set, M. The test for whether a particular point c in C is an element of M is whether the sequence z_{i+1} = z_i^2 + c (starting from z_0 = 0) stays bounded; it is unbounded exactly when there is some |z_i| > 2. For some elements of M the sequence converges (such as c = 0), but for many it does not (such as c = -1).
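The escape test for the Mandelbrot iteration can be sketched in Python as follows. The iteration starts from z = 0 as in the usual definition; the function name `stays_bounded` and the cutoff `max_iter=200` are choices made for this sketch, and a finite cutoff can only confirm escape, not prove boundedness.

```python
def stays_bounded(c, max_iter=200):
    """Iterate z -> z*z + c from z = 0; report False as soon as |z| exceeds 2."""
    z = 0j
    for _ in range(max_iter):
        z = z * z + c
        if abs(z) > 2:
            return False   # the sequence is guaranteed to diverge from here
    return True            # no escape seen within max_iter steps

assert stays_bounded(0)        # sequence is constantly 0, so it converges
assert stays_bounded(-1)       # sequence alternates -1, 0, -1, 0, ...: bounded, not convergent
assert not stays_bounded(1)    # sequence 1, 2, 5, ... escapes
```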
Sure. For all positive numbers x, the following inequality holds:
log(x) <= x - 1
(this is a pretty basic result from real analysis; it suffices to observe that the second derivative of log is always negative for all positive x, so the function is concave down, and that x-1 is tangent to the function at x = 1). From this it follows essentially immediately that your while loop must terminate within the first ceil(x) - 2 steps -- though in actuality it terminates much, much faster than that.
A similar argument will establish your result for f(x) = sqrt(x); specifically, you can use the fact that:
sqrt(x) <= x/(2 sqrt(2)) + 1/sqrt(2)
for all positive x.
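Both inequalities, and the resulting step bound for log, can be spot-checked numerically. The Python sketch below tests a handful of arbitrarily chosen sample points; it is a sanity check, not a proof, and `steps_until_done` is a helper name introduced for this example.

```python
import math

def steps_until_done(f, x):
    """Count iterations of `while x > 2: x = f(x)`."""
    steps = 0
    while x > 2:
        x = f(x)
        steps += 1
    return steps

samples = [2.001, 2.5, 3.0, 10.0, 1234.5, 1e9]
for x in samples:
    # Tangent-line bounds: log at x = 1, sqrt at x = 2.
    assert math.log(x) <= x - 1
    assert math.sqrt(x) <= x / (2 * math.sqrt(2)) + 1 / math.sqrt(2)
    # Each log step lowers x by at least 1, hence at most ceil(x) - 2 steps.
    assert steps_until_done(math.log, x) <= math.ceil(x) - 2

print([steps_until_done(math.log, x) for x in samples])
```

The printed step counts are far below the bound for large x, in line with the remark that the loop terminates much faster in practice.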
If you're asking whether this result holds for actual programs, instead of mathematically, the answer is a little bit more nuanced, but not much. Basically, many languages don't actually have hard accuracy requirements for the log function, so if your particular language implementation had an absolutely terrible math library this property might fail to hold. That said, it would need to be a really, really terrible library; this property will hold for any reasonable implementation of log.
I suggest reading this wikipedia entry which provides useful pointers. Without additional knowledge about f, nothing can be said.