How does Unison compute the hashes of recursive functions? - unison-lang

In Unison, functions are identified by the hashes of their ASTs instead of by their names.
Their documentation and their FAQs have given some explanations of the mechanism.
However, the example they present does not make clear to me how the hashing actually works.
They use this example:
f x = g (x - 1)
g x = f (x / 2)
which in the first step of their hashing is converted to the following:
$0 =
f x = $0 (x - 1)
g x = $0 (x / 2)
Doesn't this lose information about the definitions?
For the two following recursively-defined functions, how can the hashing distinguish them:
# definition 1
f x = g (x / 2)
g x = h (x + 1)
h x = f (x * 2 - 7)
# definition 2
f x = h (x / 2)
g x = f (x + 1)
h x = g (x * 2 - 7)
In my understanding, blindly converting all calls to f, g and h to $0 would make the two definitions indistinguishable from each other. What am I missing?

The answer is that the form in the example (with $0) is not quite accurate. But in short, there's a special kind of hash (a "cycle hash") which has the form #h.n, where h is the hash of all the mutually recursive definitions taken together, and n is an index from 0 to one less than the number of terms in the cycle. Each definition in the cycle gets the same hash, plus an index.
The long answer:
Upon seeing cyclical definitions, Unison captures them in a binding form called Cycle. It's a bit like a lambda, but introduces one bound variable for each definition in the cycle. References within the cycle are then replaced with those variables. So:
f x = g (x - 1)
g x = f (x / 2)
Internally becomes more like (this is not valid Unison syntax):
$0 = Cycle f g ->
  letrec
    [ x -> g (x - 1)
    , x -> f (x / 2) ]
It then hashes each of the lambdas inside the letrec and sorts them by that hash to get a canonical order. Then the whole cycle is hashed. Then these "cycle hashes" of the form #h.n get introduced at the top level for each lambda (where h is the hash of the whole cycle and n is the canonical index of each term), and the bound variables get replaced with the cycle hashes:
#h.0 = x -> #h.1 (x - 1)
#h.1 = x -> #h.0 (x / 2)
f = #h.0
g = #h.1
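The steps above can be imitated with a toy sketch in Python. This is not Unison's actual implementation: the function names, the use of truncated SHA-256, the lambdas stored as plain text, and the regex-based substitution are all assumptions made for illustration. It only shows why the two three-function definitions from the question end up with different cycle hashes, while a consistent renaming does not change them.

```python
import hashlib
import re


def short_hash(text):
    """Stand-in hash (assumption): first 8 hex digits of SHA-256."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()[:8]


def cycle_hashes(defs):
    """Map each name in a mutually recursive group to a toy '#h.n' reference.

    defs maps a name to the text of its lambda, e.g. {"f": "x -> g (x - 1)"}.
    """
    names = re.compile(r"\b(" + "|".join(re.escape(n) for n in defs) + r")\b")

    # 1. Order the lambdas by a hash that ignores which cycle member is called.
    #    (Ties between identical masked bodies are not handled in this sketch.)
    order = sorted(defs, key=lambda n: short_hash(names.sub("$", defs[n])))
    index = {name: i for i, name in enumerate(order)}

    # 2. Replace each in-cycle reference with its canonical position, so the
    #    result no longer depends on the names the programmer chose.
    bodies = [names.sub(lambda m: "$" + str(index[m.group(1)]), defs[n])
              for n in order]

    # 3. Hash the whole cycle once; every member shares h and differs in n.
    h = short_hash("\n".join(bodies))
    return {name: "#%s.%d" % (h, index[name]) for name in defs}


definition_1 = {"f": "x -> g (x / 2)", "g": "x -> h (x + 1)", "h": "x -> f (x * 2 - 7)"}
definition_2 = {"f": "x -> h (x / 2)", "g": "x -> f (x + 1)", "h": "x -> g (x * 2 - 7)"}
renamed_1 = {"p": "x -> q (x / 2)", "q": "x -> r (x + 1)", "r": "x -> p (x * 2 - 7)"}

print(cycle_hashes(definition_1))
print(cycle_hashes(definition_2))
print(cycle_hashes(renamed_1))
```

The placeholder "$" is used only to pick a canonical order; the hash of the whole cycle is taken after the references have been replaced by positions, which is where the information about who calls whom is kept.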

Solve for the coefficients of (functions of) the independent variable in a symbolic equation

Using Octave's symbolic package, I define a symbolic function of t like this:
>> syms a b c d t real;
>> f = poly2sym([a b c], t) + d * exp(t)
f = (sym)
     2                t
  a⋅t  + b⋅t + c + d⋅ℯ
I also have another function with known coefficients:
>> g = poly2sym([2 3 5], t) + 7 * exp(t)
g = (sym)
     2            t
  2⋅t  + 3⋅t + 7⋅ℯ  + 5
I would like to solve f == g for the coefficients a, b, c, d such that the equation holds for all values of t. That is, I simply want to equate the coefficients of t^2 in both equations, and the coefficients of exp(t), etc. I am looking for this solution:
a = 2
b = 3
c = 5
d = 7
When I try to solve the equation using solve, this is what I get:
>> solve(f == g, a, b, c, d)
ans = (sym)
                t      2            t
  -b⋅t - c - d⋅ℯ  + 2⋅t  + 3⋅t + 7⋅ℯ  + 5
  ───────────────────────────────────────
                     2
                    t
It solves for a in terms of b, c, d, t. This is understandable, since in essence there is no difference between the variables b, c and t. But I was wondering whether there is a method to somehow separate the terms (using their symbolic form with respect to the variable t) and solve the resulting system of linear equations in a, b, c, d.
Note: The function I wrote here is a minimal example. What I am really trying to do is to solve a linear ordinary differential equation using the method of undetermined coefficients. For example, I define something like y = a*exp(-t) + b*t*exp(-t), and solve for diff(y, t, t) + diff(y,t) + y == t*exp(-t). But I believe solving the problem with simpler functions will point me in the right direction.
I have found a terribly slow and dirty method to get the job done. The coefficients have to be linear in a, b, ... though.
The idea is to follow these steps:
Write the equation in f - g form (which equals zero)
Use expand() to separate the terms
Use children() to get the terms in the equation as a symbolic vector
Now that we have the terms in a vector, we can find those that are the same function of t and add their coefficients together. The way I checked this was by checking if the division of two terms had t as a symbolic variable
For each term, find other terms with the same function of t, add all these coefficients together, save the obtained equation in a vector
Pass the vector of created equations to solve()
This code solves the equation I wrote in the note at the end of my question:
pkg load symbolic
syms t a b real;
y = a * exp(-t) + b * t * exp(-t);
lhs = diff(y, t, t) + diff(y, t) + y;
rhs = t * exp(-t);
expr = expand(lhs - rhs);
chd = children(expr);
used = false(size(chd));
equations = [];
for z = 1:length(chd)
  if used(z)
    continue
  endif
  coefficients = 0;
  for zz = z + 1:length(chd)
    if used(zz)
      continue
    endif
    division = chd(zz) / chd(z);
    vars = findsymbols(division);
    if sum(has(vars, t)) == 0 # division result has no t
      used(zz) = true;
      coefficients += division;
    endif
  endfor
  coefficients += 1; # for chd(z)
  vars = findsymbols(chd(z));
  nott = vars(!has(vars, t));
  if length(nott)
    coefficients *= nott;
  endif
  equations = [equations, expand(coefficients)];
endfor
solution = solve(equations == 0);
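The grouping idea itself does not depend on a symbolic engine. Below is a minimal sketch in plain Python, under the assumption that the expanded terms have already been written out by hand as (function of t, unknown, coefficient) triples; the names group_by_basis and solve_linear and the string labels are made up for this illustration. It sums the coefficients that multiply the same function of t and then solves the resulting linear system exactly with fractions.

```python
from collections import defaultdict
from fractions import Fraction


def group_by_basis(terms):
    """Sum coefficients of terms that share the same function of t.

    Each term is (basis, unknown, coefficient); unknown is None for a
    constant term. Returns {basis: {unknown_or_None: summed coefficient}}.
    """
    grouped = defaultdict(lambda: defaultdict(Fraction))
    for basis, unknown, coefficient in terms:
        grouped[basis][unknown] += Fraction(coefficient)
    return grouped


def solve_linear(grouped, unknowns):
    """Solve 'every grouped coefficient == 0' by Gauss-Jordan elimination.

    Sketch only: leftover rows are not checked for consistency.
    """
    n = len(unknowns)
    # One row per function of t: coefficients of the unknowns, then the
    # right-hand side (the constant part moved across the equals sign).
    rows = [[eq[u] for u in unknowns] + [-eq[None]] for eq in grouped.values()]
    for col in range(n):
        pivot = next((r for r in range(col, len(rows)) if rows[r][col] != 0), None)
        if pivot is None:
            raise ValueError("no unique solution for " + unknowns[col])
        rows[col], rows[pivot] = rows[pivot], rows[col]
        scale = rows[col][col]
        rows[col] = [v / scale for v in rows[col]]
        for r in range(len(rows)):
            if r != col:
                factor = rows[r][col]
                rows[r] = [v - factor * p for v, p in zip(rows[r], rows[col])]
    return {u: rows[i][n] for i, u in enumerate(unknowns)}


# f - g from the question: (a - 2) t^2 + (b - 3) t + (c - 5) + (d - 7) exp(t)
fg_terms = [("t^2", "a", 1), ("t", "b", 1), ("1", "c", 1), ("exp(t)", "d", 1),
            ("t^2", None, -2), ("t", None, -3), ("1", None, -5), ("exp(t)", None, -7)]
print(solve_linear(group_by_basis(fg_terms), ["a", "b", "c", "d"]))

# y'' + y' + y - t exp(-t) for y = a exp(-t) + b t exp(-t), expanded by hand:
# (a - b) exp(-t) + (b - 1) t exp(-t)
ode_terms = [("exp(-t)", "a", 1), ("exp(-t)", "b", -1),
             ("t*exp(-t)", "b", 1), ("t*exp(-t)", None, -1)]
print(solve_linear(group_by_basis(ode_terms), ["a", "b"]))
```

For the first system this recovers a = 2, b = 3, c = 5, d = 7, and for the differential equation it gives a = 1, b = 1. The hand-written term lists stand in for what expand() and children() produce in the Octave code.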

MIPS Programming instruction count issue

I wrote this MIPS code to find the GCF, but I am confused about how to get the number of instructions executed by it. I need to express the instruction count as a linear function of the number of times the remainder must be calculated before an answer is reached. I tried running the code in single-step mode with QtSpim, but I am not sure how to proceed.
gcf:
addiu $sp,$sp,-4 # adjust the stack for an item
sw $ra,0($sp) # save return address
rem $t4,$a0,$a1 # r = a % b
beq $t4,$zero,L1 # if(r==0) go to L1
add $a0,$zero,$a1 # a = b
add $a1,$zero,$t4 # b = r
jr gcf
L1:
add $v0,$zero,$a1 # return b
addiu $sp,$sp,4 # pop 2 items
jr $ra # return to caller
There is absolutely nothing new to show here: the algorithm you just implemented is the Euclidean algorithm, and it is well known in the literature [1].
I will nonetheless write an informal analysis here, as link-only answers are evil.
First, let's rewrite the code in a high-level formulation:
unsigned int gcd(unsigned int a, unsigned int b)
{
    if (a % b == 0)
        return b;
    return gcd(b, a % b);
}
The choice of unsigned int over int was dictated by the MIPS ISA, which makes rem undefined for negative operands.
Our goal is to find a function T(a, b) that gives the number of steps the algorithm requires to compute the GCD of a and b.
Since a direct approach leads nowhere, we try inverting the problem.
Which pairs (a, b) make T(a, b) = 1? In other words, which pairs make gcd(a, b) terminate in one step?
We clearly must have that a % b = 0, which means that a must be a multiple of b.
There is actually a (countably) infinite number of such pairs; we can limit ourselves to the pairs with the smallest a and b [2].
To recap, to have T(a, b) = 1 we need a = nb and we pick the pair (a, b) = (1, 1).
Now, given a pair (c, d) that requires N steps, how do we find a new pair (a, b) such that T(a, b) = T(c, d) + 1?
Since gcd(a, b) must take one more step than gcd(c, d), and since starting from gcd(a, b) the next step is gcd(b, a % b), we must have:
c = b => b = c
d = a % b => d = a % c => a = c + d
The step d = a % c => a = c + d comes from the minimality of a: we need the smallest a that leaves remainder d when divided by c, so we can take a = c + d, since (c + d) % c = c % c + d % c = 0 + d = d.
For d % c = d to be true we need that d < c.
Our base pair was (1, 1), which doesn't satisfy this hypothesis; luckily, we can take (2, 1) as the base pair (convince yourself that T(2, 1) = 1).
Then we have:
gcd(3, 2) = gcd(2, 1) = 1
T(3, 2) = 1 + T(2, 1) = 1 + 1 = 2
gcd(5, 3) = gcd(3, 2) = 1
T(5, 3) = 1 + T(3, 2) = 1 + 2 = 3
gcd(8, 5) = gcd(5, 3) = 1
T(8, 5) = 1 + T(5, 3) = 1 + 3 = 4
...
If we look at the pairs (2, 1), (3, 2), (5, 3), (8, 5), ... we see that the n-th pair (starting from 1) is (F_{n+1}, F_n),
where F_n is the n-th Fibonacci number, indexed here so that F_1 = 1 and F_2 = 2.
We then have:
T(F_{n+1}, F_n) = n
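The pattern can be checked mechanically. Here is a small Python sketch, where steps and fibonacci_pairs are names chosen for this illustration: steps counts how many times the remainder is computed, mirroring the C version above, and fibonacci_pairs builds the pairs with the rule a = c + d, b = c starting from (2, 1).

```python
def steps(a, b):
    """Number of remainder computations gcd(a, b) performs (b must be > 0)."""
    count = 1
    while a % b != 0:
        a, b = b, a % b
        count += 1
    return count


def fibonacci_pairs(k):
    """First k pairs built from (2, 1) with the rule (c, d) -> (c + d, c)."""
    c, d = 2, 1
    pairs = []
    for _ in range(k):
        pairs.append((c, d))
        c, d = c + d, c
    return pairs


for n, (a, b) in enumerate(fibonacci_pairs(10), start=1):
    print(n, (a, b), steps(a, b))
```

For each of the first ten pairs the printed step count equals the position n of the pair.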
Regarding Fibonacci numbers, we know that F_n ∝ φ^n.
We are now going to use all the trickery of asymptotic analysis; in particular, in the limit of big-O notation, considering φ^n or φ^{n+1} is the same.
Also, we won't use the big-O symbol explicitly; we rather assume that each equality is true in the limit. This is an abuse, but it makes the analysis more compact.
We can assume without loss of generality that N is an upper bound for both numbers in the pair and that it is proportional to φ^n.
We have N ∝ φ^n, which gives log_φ N = n; this can be rewritten as log(N)/log(φ) = n (where logs are in base 10 and log(φ) ≈ 0.209 can be taken to be 1/5).
Thus we finally have 5 log N = n or, written in reverse order,
n = 5 log N
where n is the number of steps taken by gcd(a, b) when 0 < b < a < N.
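As a rough sanity check of n = 5 log N, the following Python sketch (worst_case is a name made up here) searches every pair with 0 < b < a < N by brute force and compares the largest step count with 5 log10(N). It is an illustration for small N, not a proof.

```python
import math


def steps(a, b):
    """Number of remainder computations gcd(a, b) performs (b must be > 0)."""
    count = 1
    while a % b != 0:
        a, b = b, a % b
        count += 1
    return count


def worst_case(limit):
    """Largest step count over all pairs with 0 < b < a < limit."""
    return max(steps(a, b) for a in range(2, limit) for b in range(1, a))


for limit in (10, 100, 1000):
    print(limit, worst_case(limit), 5 * math.log10(limit))
```

For N = 100 the worst case is 9 steps, reached by the Fibonacci pair (89, 55), against an estimate of 5 log10(100) = 10.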
We can further show that if a = ng and b = mg with n, m coprime, then T(a, b) = T(n, m); thus the restriction to the minimal pairs is not limiting.
[1] In the event that you rediscovered this algorithm yourself, I strongly advise against continuing to read this answer. You surely have a sharp mind that would benefit more from a challenge than from an answer.
[2] We'll later see that this won't give rise to a loss of generality.

Pattern matching on Ints

I am a beginner in learning Haskell, and I wanted to know if you could pattern match on Ints like so:
add x 0 = x
add x (1 + y) = 1 + x + add x y,
Or maybe in this way:
add x 0 = x
add x (successor y) = 1 + x + add x y
There is an extension that lets you do that, but instead you should simply pattern match on y, and subtract 1 manually:
add x y = 1 + x + add x (y - 1)
The extension is called NPlusKPatterns. If you really want to use it (keep in mind that n+k patterns were removed from the language in Haskell 2010), it can be enabled by either passing the -XNPlusKPatterns flag to GHC, or putting {-# LANGUAGE NPlusKPatterns #-} at the top of your file.
Pattern matching isn't arbitrary case analysis. It's a disciplined, but limited form of case analysis, where the cases are the constructors of a data type.
In the specific case of pattern matching integers, the constructors are taken to be the integer values. So you can use integer values as the cases for pattern-matching:
foo 0 = ...
foo 2 = ...
foo x = ...
But you can't use arbitrary expressions. The following code is illegal:
foo (2 * x) = ...
foo (2 * x + 1) = ...
You may know that every integer is either of the form 2 * x or 2 * x + 1. But the type system doesn't know.
The formatting of your code is a bit off, so it is difficult to know what you're asking, but you can use pattern matching for input of type Int. An example would be
add x 0 = x
add x y = x + y

Benefits of where notation in Haskell

What are the pros and cons of explicit function definition as opposed to where notation in Haskell?
Explicit function definition:
foo :: Integer -> Integer
foo a = bar a
  where
    bar :: Integer -> Integer
    bar a = Some code here
as opposed to:
foo :: Integer -> Integer
foo a = bar a
bar :: Integer -> Integer
bar a = Some code here
Why would I use one over the other? Is there anything to be aware of with regards to efficiency? Security? Code reusability? Code readability?
If your auxiliary function is not going to be used anywhere else, it's better not to pollute the namespace and use a local definition.
When your outer function has only one top-level "pattern", the where clause can simplify the definition of the auxiliary function because the parameters of the outer function will be in scope.
outer x v z f = undefined
  where
    inner i = i + x + v + z + f
versus
outer x v z f = undefined
inner x v z f i = i + x + v + z + f
If your function has more than one top-level "pattern", then you can't share bindings across patterns using where. You have to define a top-level binding.
Certain ways of using where can incur non-obvious performance penalties. This definition (taken from the HaskellWiki article on let vs where)
fib x = map fib' [0 ..] !! x
  where
    fib' 0 = 0
    fib' 1 = 1
    fib' n = fib (n - 1) + fib (n - 2)
is slower than this one:
fib = (map fib' [0 ..] !!)
  where
    fib' 0 = 0
    fib' 1 = 1
    fib' n = fib (n - 1) + fib (n - 2)
and also slower than defining fib' at the top-level.
The reason is that, in the first definition, a new fib' (and with it a new list map fib' [0 ..]) is created for each invocation of fib, so previously computed values are not shared between calls. Explained here.
If you need bar only in the scope of foo, it is more readable, and better information hiding, to declare it in a where clause. If bar should be reusable outside of foo, then it needs to be declared at the top level, alongside foo.

Haskell function about even and odd numbers

I'm new to Haskell, started learning a couple of days ago and I have a question on a function I'm trying to make.
I want to make a function that verifies if x is a factor of n (ex: 375 has these factors: 1, 3, 5, 15, 25, 75, 125 and 375), then removes the 1 and then the number itself and finally verifies if the number of odd numbers in that list is equal to the number of even numbers!
I thought of making a functions like so to calculate the first part:
factor n = [x | x <- [1..n], n `mod`x == 0]
But if I put this on the prompt it will say Not in scope 'n'. The idea was to input a number like 375 so it would calculate the list. What am I doing wrong? I've seen functions being put in the prompt like this, in books.
Then to take the elements I spoke of I was thinking of doing tail and then init to the list. You think it's a good idea?
And finally I thought of making an if statement to verify the last part. For example, in Java, we'd make something like:
(x % 2 == 0)? even++ : odd++; // (I'm a beginner to Java as well)
and then if even = odd then it would say that all conditions were verified (we had a quantity of even numbers equal to the odd numbers)
But in Haskell, as variables are immutable, how would I do the something++ thing?
Thanks for any help you can give :)
This small function does everything that you are trying to achieve:
f n = length evenFactors == length oddFactors
  where evenFactors = [x | x <- [2, 4..(n-1)], n `mod` x == 0]
        oddFactors  = [x | x <- [3, 5..(n-1)], n `mod` x == 0]
If the "command line" is ghci, then you need to
let factor n = [x | x <- [2..(n-1)], n `mod` x == 0]
In this particular case you don't need to range over [1..n] only to drop 1 and n; range from 2 to (n-1) instead.
Then you can simply use partition to split the list of divisors using a boolean predicate:
import Data.List
partition odd $ factor 10
In order to learn how to write a function like partition, study recursion.
For example:
partition p = foldr f ([],[]) where
  f x ~(ys,ns) | p x = (x:ys,ns)
  f x ~(ys,ns) = (ys, x:ns)
(Here we need to pattern-match the tuples lazily using "~", to ensure the pattern is not evaluated before the tuple on the right is constructed).
Simple counting can be achieved even more simply:
let y = factor 375
(length $ filter odd y) == (length y - (length $ filter odd y))
Create a file source.hs, then from ghci command line call :l source to load the functions defined in source.hs.
To solve your problem this may be a solution following your steps:
-- computes the factors of n, gets the tail (strips 1)
-- the filter functions removes n from the list
factor n = filter (/= n) (tail [x | x <- [1..n], n `mod` x == 0])
-- checks if the number of odd and even factors is equal
oe n = let factors = factor n in
  length (filter odd factors) == length (filter even factors)
Calling oe 10 returns True, oe 15 returns False
(x % 2 == 0)? even++ : odd++;
Data.List provides a partition :: (a -> Bool) -> [a] -> ([a], [a]) function.
So we can split off the odd numbers like this:
> let (odds,evens) = partition odd [1..]
> take 10 odds
[1,3,5,7,9,11,13,15,17,19]
> take 10 evens
[2,4,6,8,10,12,14,16,18,20]
Here is a minimal fix for your factor attempt using comprehensions:
factor nn = [x | x <- [1..nn], nn `mod` x == 0]