Why do these folds stop at the head/tail?

I'm reading learnyouahaskell.com and currently investigating folds. In the book there are these examples:
maximum' :: (Ord a) => [a] -> a
maximum' = foldr1 (\x acc -> if x > acc then x else acc)
reverse' :: [a] -> [a]
reverse' = foldl (\acc x -> x : acc) []
product' :: (Num a) => [a] -> a
product' = foldr1 (*)
filter' :: (a -> Bool) -> [a] -> [a]
filter' p = foldr (\x acc -> if p x then x : acc else acc) []
head' :: [a] -> a
head' = foldr1 (\x _ -> x)
last' :: [a] -> a
last' = foldl1 (\_ x -> x)
I understand all of them except head' and last'.
It is my understanding that the binary function should be applied to the accumulator and each element in the list in turn, and thus go through the whole list. Why does this stop at the head (or at the last element, respectively)?
I understand _ (underscore) means "whatever" or "I don't care" but how does that stop going through all the list?

A foldr combines two items: the current element, and the "running total" sort of item that stands for the fold over the rest of the list.
(\x _ -> x) keeps the current element and discards its second argument, so all of the remaining items are ignored.
Let's expand it:
foldr1 (\x _ -> x) [1..100000]
= (\x _ -> x) 1 (foldr1 (\x _ -> x) [2..100000])
= 1
Since the (foldr1 (\x _ -> x) [2..100000]) term isn't needed, it isn't evaluated (that's lazy evaluation in action, or rather inaction), so this runs fast.
With (\_ x -> x), the new item is taken and the old one is ignored - this keeps happening until the end of the list, so you get the last element. It doesn't avoid the other ones, it just forgets them all except the last.
A more human-readable name of (\_ x -> x) would refer to the fact that it ignores its first argument and returns its second one. Let's call it secondArg.
foldl1 (\_ x -> x) [1..4]
= let secondArg = (\_ x -> x) in foldl secondArg 1 [2..4]
= foldl secondArg (1 `secondArg` 2) [3..4]
= foldl secondArg ((1 `secondArg` 2) `secondArg` 3) [4]
= foldl secondArg (((1 `secondArg` 2) `secondArg` 3) `secondArg` 4) []
= (((1 `secondArg` 2) `secondArg` 3) `secondArg` 4)
= 4
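A small sketch of the difference, reusing head' and last' from the question (the test inputs are illustrative assumptions): head' returns without demanding the rest of the list, so it even works on an infinite list, while last' has to walk the whole spine but never evaluates the elements it throws away.

```haskell
head' :: [a] -> a
head' = foldr1 (\x _ -> x)

last' :: [a] -> a
last' = foldl1 (\_ x -> x)

main :: IO ()
main = do
  -- The fold over the rest of the infinite list is never demanded.
  print (head' [1 ..] :: Integer)                 -- prints 1
  -- The spine is walked to the end, but the discarded elements are never
  -- evaluated, so the undefined entries are harmless.
  print (last' [undefined, undefined, 3 :: Int])  -- prints 3
```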

Let's have a look at the definition of foldr1 first:
foldr1 :: (a -> a -> a) -> [a] -> a
foldr1 f [x] = x
foldr1 f (x : xs) = f x (foldr1 f xs)
Then, consider a call of your function head',
head' :: [a] -> a
head' = foldr1 (\x _ -> x)
to a list, say, [2, 3, 5]:
head' [2, 3, 5]
Now, filling in the right-hand side of head' gives
foldr1 (\x _ -> x) [2, 3, 5]
Recall that [2, 3, 5] is syntactic sugar for (2 : 3 : 5 : []). So, the second case of the definition of foldr1 applies and we get
(\x _ -> x) 2 (foldr1 (\x _ -> x) (3 : 5 : []))
Now, reducing the applications results in 2 getting bound to x and foldr1 (\x _ -> x) (3 : 5 : []) getting bound to the ignored parameter _. What is left is the right-hand side of the lambda-abstraction with x replaced by 2:
2
Note that, because of lazy evaluation, the ignored argument foldr1 (\x _ -> x) (3 : 5 : []) is left unevaluated, and so (this hopefully answers your question) the recursion stops before we have processed the remainder of the list.
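To see this concretely, here is a sketch that uses the same two equations as the definition above. The name foldr1' and the extra empty-list equation are assumptions of this sketch, added only to avoid a clash with the Prelude and to keep the pattern match exhaustive.

```haskell
-- Same equations as the definition above, renamed to avoid clashing with
-- the Prelude's foldr1. The last equation is an addition of this sketch.
foldr1' :: (a -> a -> a) -> [a] -> a
foldr1' _ [x]      = x
foldr1' f (x : xs) = f x (foldr1' f xs)
foldr1' _ []       = error "foldr1': empty list"

main :: IO ()
main = do
  -- Only the first two cons cells are inspected (the second one just to
  -- tell [x] apart from x : xs); the undefined tail after 3 is never forced.
  print (foldr1' (\x _ -> x) (2 : 3 : undefined) :: Int)  -- prints 2
```

Note that a list such as 2 : undefined would fail here, because deciding between the first two equations requires looking at the tail once.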

Related

How to fix Haskell "error: [-Wincomplete-patterns, -Werror=incomplete-patterns]"

Can someone tell me why I am getting the following error:
error: [-Wincomplete-patterns, -Werror=incomplete-patterns]
Pattern match(es) are non-exhaustive
In a case alternative: Patterns not matched: []
|
54 | case list of
| ^^^^^^^^^^^^...
That's my test:
testMinBy :: Test
testMinBy = TestCase $ do
  assertEqual "test1" (minBy (\x -> -x) [1,2,3,4,5]) 5
  assertEqual "test2" (minBy length ["a", "abcd", "xx"]) "a"
minBy :: Ord b => (a -> b) -> [a] -> a
minBy measure list =
  case list of
    (x:y:xs) -> minBy measure (if measure x > measure y then y:xs else x:xs)
    [x] -> x
Your patterns do not match the empty list; that is exactly what the error is saying. You can match the empty list, for example with:
minBy :: Ord b => (a -> b) -> [a] -> a
minBy measure list =
  case list of
    (x:y:xs) -> minBy measure (if measure x > measure y then y:xs else x:xs)
    [x] -> x
    [] -> error "Empty list"
Your function however is not very efficient: it will recalculate measure multiple times if an item is the current minimum, and will also pack and unpack lists. You can work with an accumulator here, like:
minBy :: Ord b => (a -> b) -> [a] -> a
minBy _ [] = error "Empty list"
minBy f (x:xs) = go xs x (f x)
  where go [] y _ = y
        go (y₁:ys) y₀ fy₀
          | fy₁ < fy₀ = go ys y₁ fy₁
          | otherwise = go ys y₀ fy₀
          where fy₁ = f y₁
This means it has to check for an empty list only once; after that, it knows for sure that the list it walks over is non-empty. It also computes f for each item exactly once, and uses accumulators to avoid packing and unpacking a "cons".
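The same accumulator idea can also be sketched with a strict left fold. This is only one possible variant: the name minByFold and the choice to return Maybe instead of calling error are assumptions of this sketch, not part of the answer above.

```haskell
import Data.List (foldl')

-- Hypothetical variant: returns Nothing for the empty list instead of
-- raising an error, and carries (current minimum, its measure) as the
-- accumulator so that f is applied to each element exactly once.
minByFold :: Ord b => (a -> b) -> [a] -> Maybe a
minByFold _ [] = Nothing
minByFold f (x:xs) = Just (fst (foldl' step (x, f x) xs))
  where
    step (best, fBest) y
      | fy < fBest = (y, fy)
      | otherwise  = (best, fBest)
      where fy = f y

main :: IO ()
main = do
  print (minByFold negate [1, 2, 3, 4, 5 :: Int])  -- Just 5
  print (minByFold length ["a", "abcd", "xx"])     -- Just "a"
```

On ties the earlier element is kept, because the comparison is strict (<).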

Couldn't match expected type ‘Bool’ with actual type ‘a -> Bool’

I want to write a function that returns the longest prefix of a list, where applying a function to every item in that prefix produces a strictly ascending list.
For example:
longestAscendingPrefix (`mod` 5) [1..10] == [1,2,3,4]
longestAscendingPrefix odd [1,4,2,6,8,9,3,2,1] == [1]
longestAscendingPrefix :: Ord b => (a -> b) -> [a] -> [a]
longestAscendingPrefix _ [] = []
longestAscendingPrefix f (x:xs) = takeWhile (\y z -> f y <= f z) (x:xs)
This code snippet produces the error message in the title. It seems the problem lies within that lambda function.
takeWhile has type takeWhile :: (a -> Bool) -> [a] -> [a]. The first parameter is thus a function that maps a single element of the list to a Bool. Your lambda expression has type Ord b => a -> a -> Bool, which does not fit: takeWhile looks at one element at a time and cannot hand two neighbouring elements to the predicate.
You can work with explicit recursion with:
longestAscendingPrefix :: Ord b => (a -> b) -> [a] -> [a]
longestAscendingPrefix f = go
  where go [] = []
        go [x] = …
        go (x1:x2:xs) = …
where you need to fill in the … parts; the last one makes a recursive call to go.
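A different technique from the explicit recursion above, shown only as a sketch: since takeWhile sees one element at a time, each element can be paired with its successor via zip, so that a one-argument predicate on pairs does the comparison. The name longestAscendingPrefixZip is an assumption of this sketch.

```haskell
-- Hypothetical zip-based variant: pair every element with its successor,
-- keep pairs while the measure strictly increases, then rebuild the prefix.
longestAscendingPrefixZip :: Ord b => (a -> b) -> [a] -> [a]
longestAscendingPrefixZip _ [] = []
longestAscendingPrefixZip f (x:xs) =
  x : map snd (takeWhile (\(prev, next) -> f prev < f next) (zip (x:xs) xs))

main :: IO ()
main = do
  print (longestAscendingPrefixZip (`mod` 5) [1 .. 10 :: Int])
  -- [1,2,3,4]
  print (longestAscendingPrefixZip odd [1, 4, 2, 6, 8, 9, 3, 2, 1 :: Int])
  -- [1]
```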

Haskell: arrow precedence with function arguments

I'm a relatively inexperienced Haskell programmer with a few hours of experience, so the answer might be obvious.
After watching A taste of Haskell, I got lost when Simon explained how the append (++) function really works with its arguments.
So, here's the part where he talks about this.
First, he says that (++) :: [a] -> [a] -> [a] can be understood as a function which gets two lists as arguments and returns a list (the type after the last arrow). However, he adds that actually something like this happens: (++) :: [a] -> ([a] -> [a]), so the function takes only one argument and returns a function.
I'm not sure I understand how the returned function (a closure) gets the first list, as it expects one argument as well.
On the next slide of the presentation, we have the following implementation:
(++) :: [a] -> [a] -> [a]
[] ++ ys = ys
(x:xs) ++ ys = x : (xs ++ ys)
If I think that (++) receives two arguments and returns a list, this piece of code along with the recursion is clear enough.
If we consider that (++) receives only one argument and returns a function, where does ys come from? Where is the returned function?
The trick to understanding this is that every Haskell function takes exactly one argument; the implicit parentheses in the type signature and the syntactic sugar just make it appear as if there are more. To use ++ as an example, the following definitions are all equivalent:
xs ++ ys = ...
(++) xs ys = ...
(++) xs = \ys -> ...
(++) = \xs -> (\ys -> ...)
(++) = \xs ys -> ...
Another quick example:
doubleList :: [Int] -> [Int]
doubleList = map (*2)
Here we have a function of one argument doubleList without any explicit arguments. It would have been equivalent to write
doubleList x = map (*2) x
Or any of the following
doubleList = \x -> map (*2) x
doubleList = \x -> map (\y -> y * 2) x
doubleList x = map (\y -> y * 2) x
doubleList = map (\y -> y * 2)
The first definition of doubleList is written in what is commonly called point-free notation, so called because in the mathematical theory backing it the arguments are referred to as "points", so point-free is "without arguments".
A more complex example:
func = \x y z -> x * y + z
func = \x -> \y z -> x * y + z
func x = \y z -> x * y + z
func x = \y -> \z -> x * y + z
func x y = \z -> x * y + z
func x y z = x * y + z
Now if we wanted to completely remove all references to the arguments we can make use of the . operator which performs function composition:
func x y z = (+) (x * y) z -- Make the + prefix
func x y = (+) (x * y) -- Now z becomes implicit
func x y = (+) ((*) x y) -- Make the * prefix
func x y = ((+) . ((*) x)) y -- Rewrite using composition
func x = (+) . ((*) x) -- Now y becomes implicit
func x = (.) (+) ((*) x) -- Make the . prefix
func x = ((.) (+)) ((*) x) -- Make implicit parens explicit
func x = (((.) (+)) . (*)) x -- Rewrite using composition
func = ((.) (+)) . (*) -- Now x becomes implicit
func = (.) ((.) (+)) (*) -- Make the . prefix
So as you can see there are lots of different ways to write a particular function with a varying number of explicit "arguments", some of which are very readable (e.g. func x y z = x * y + z) and some of which are just a jumble of symbols with little meaning (e.g. func = (.) ((.) (+)) (*)).
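As a quick check that the first and last forms really are the same function, here is a sketch that gives both a concrete type (Int is chosen here only for illustration, and the names funcExplicit and funcPointFree are assumptions) and applies them fully and partially.

```haskell
-- The most explicit and the fully point-free form from the derivation above.
funcExplicit :: Int -> Int -> Int -> Int
funcExplicit x y z = x * y + z

funcPointFree :: Int -> Int -> Int -> Int
funcPointFree = (.) ((.) (+)) (*)

main :: IO ()
main = do
  print (funcExplicit 2 3 4)   -- 10
  print (funcPointFree 2 3 4)  -- 10
  -- Supplying one argument at a time works for either form.
  let timesTwo = funcPointFree 2   -- Int -> Int -> Int
      sixPlus  = timesTwo 3        -- Int -> Int
  print (sixPlus 4)            -- 10
```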
Maybe this will help. First let's write it without operator notation which might be confusing.
append :: [a] -> [a] -> [a]
append [] ys = ys
append (x:xs) ys = x : append xs ys
We can apply one argument at a time:
appendEmpty :: [a] -> [a]
appendEmpty = append []
We could equivalently have written that as
appendEmpty ys = ys
from the first equation.
If we apply a non-empty first argument:
-- Since 1 is an Int, the type gets specialized.
appendOne :: [Int] -> [Int]
appendOne = append (1:[])
We could equivalently have written that as
appendOne ys = 1 : append [] ys
from the second equation.
You are confused about how Function Currying works.
Consider the following function definitions of (++).
Takes two arguments, produces one list:
(++) :: [a] -> [a] -> [a]
[] ++ ys = ys
(x:xs) ++ ys = x : (xs ++ ys)
Takes one argument, produces a function taking one list and producing a list:
(++) :: [a] -> ([a] -> [a])
(++) [] = id
(++) (x:xs) = (x :) . (xs ++)
If you look closely, these functions will always produce the same output. By removing the second parameter, we have changed the return type from [a] to [a] -> [a].
If we supply two parameters to (++) we get a result of type [a]
If we supply only one parameter we get a result of type [a] -> [a]
This works because Haskell functions are curried; supplying only some of the arguments is called partial application. We don't need to provide all the arguments to a function with multiple arguments. If we supply fewer than the total number of arguments, instead of getting a "concrete" result ([a]) we get a function as a result which can take the remaining parameters ([a] -> [a]).
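A short sketch of supplying only the first argument of (++); the name greet and the sample values are assumptions chosen for illustration.

```haskell
-- Supplying only the first list yields a function awaiting the second one.
greet :: String -> String
greet = (++) "Hello, "

main :: IO ()
main = do
  putStrLn (greet "world")                     -- Hello, world
  -- The partially applied function can be passed around like any value.
  print (map ((++) [0]) [[1], [2, 3 :: Int]])  -- [[0,1],[0,2,3]]
```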

Act on a `case` clause in Haskell

I'm attempting problem 11 of "99 Haskell Problems." The problem description is pretty much:
Write a function encodeModified that groups consecutive equal elements, then counts each group, and separates singles from runs.
For example:
Prelude> encodeModified "aaaabccaadeeee"
[Multiple 4 'a',Single 'b',Multiple 2 'c',
Multiple 2 'a',Single 'd',Multiple 4 'e']
Here's my working code:
module Batch2 where
import Data.List -- for `group`
data MultiElement a = Single a | Multiple Int a deriving (Show)
encodeModified :: (Eq a) => [a] -> [MultiElement a]
encodeModified = map f . group
  where f xs = case length xs of 1 -> Single (head xs)
                                 _ -> Multiple (length xs) (head xs)
I'd like to take out that pesky repeated (head xs) in the final two lines. I figured I could do so by treating the result of the case clause as a partially applied data constructor, as follows, but no luck:
encodeModified :: (Eq a) => [a] -> [MultiElement a]
encodeModified = map f . group
where f xs = case length xs of 1 -> Single
_ -> Multiple length xs
(head xs)
I also tried putting parentheses around the case clause itself, but to no avail. In that case, the case clause itself failed to compile (throwing an error upon hitting the _ symbol on the second line of the clause).
EDIT: this error was because I added a parenthesis but didn't add an extra space to the next line to make the indentation match. Thanks, raymonad.
I can also solve it like this, but it seems a little messy:
encodeModified :: (Eq a) => [a] -> [MultiElement a]
encodeModified = map (\x -> f x (head x)) . group
  where f xs = case length xs of 1 -> Single
                                 _ -> Multiple (length xs)
How can I do this?
The function application operator $ can be used to make this work:
encodeModified = map f . group
  where f xs = case length xs of 1 -> Single
                                 _ -> Multiple (length xs)
               $ head xs
You could match on xs itself instead:
encodeModified :: (Eq a) => [a] -> [MultiElement a]
encodeModified = map f . group
  where f xs = case xs of (x:[]) -> Single x
                          (x:_) -> Multiple (length xs) x
or more tersely as
encodeModified :: (Eq a) => [a] -> [MultiElement a]
encodeModified = map f . group
  where f (x:[]) = Single x
        f xs@(x:_) = Multiple (length xs) x
or even
encodeModified :: (Eq a) => [a] -> [MultiElement a]
encodeModified = map f . group
  where f as@(x:xs) = case xs of [] -> Single x
                                 _ -> Multiple (length as) x
Admittedly most of these have some repetition, but not of function application.
You could also go with let:
encodeModified :: (Eq a) => [a] -> [MultiElement a]
encodeModified = map f . group
  where f xs = let x = head xs
                   len = length xs in
               case len of 1 -> Single x
                           _ -> Multiple len x
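One more option, sketched under a few assumptions: the group function from Data.List.NonEmpty returns non-empty groups, so the pattern match is total and head is not needed at all. The name encodeModifiedNE and the extra Eq deriving (used only for comparing results) are not part of the original code.

```haskell
import Data.List.NonEmpty (NonEmpty (..))
import qualified Data.List.NonEmpty as NE

-- Eq is derived here only so that results can be compared in examples.
data MultiElement a = Single a | Multiple Int a deriving (Show, Eq)

-- Hypothetical variant: NE.group yields NonEmpty groups, so both
-- equations of toElement together cover every possible input.
encodeModifiedNE :: Eq a => [a] -> [MultiElement a]
encodeModifiedNE = map toElement . NE.group
  where
    toElement (x :| [])  = Single x
    toElement g@(x :| _) = Multiple (NE.length g) x

main :: IO ()
main = print (encodeModifiedNE "aaaabccaadeeee")
```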

foldr and foldl further explanations and examples

I've looked at different folds and folding in general as well as a few others and they explain it fairly well.
I'm still having trouble on how a lambda would work in this case.
foldr (\y ys -> ys ++ [y]) [] [1,2,3]
Could someone go through that step-by-step and try to explain that to me?
And also how would foldl work?
foldr is an easy thing:
foldr :: (a->b->b) -> b -> [a] -> b
It takes a function which is somewhat similar to (:),
(:) :: a -> [a] -> [a]
and a value which is similar to the empty list [],
[] :: [a]
and replaces each (:) in some list with that function and the final [] with that value.
It looks like this:
foldr f e (1:2:3:[]) = 1 `f` (2 `f` (3 `f` e))
You can imagine foldr as some state-machine-evaluator, too:
f is the transition,
f :: input -> state -> state
and e is the start state.
e :: state
foldr (foldRIGHT) runs the state-machine with the transition f and the start state e over the list of inputs, starting at the right end. Imagine f in infix notation as the pacman coming from-RIGHT.
foldl (foldLEFT) does the same from-LEFT, but the transition function, written in infix notation, takes its input argument from right. So the machine consumes the list starting at the left end. Pacman consumes the list from-LEFT with an open mouth to the right, because of the mouth (b->a->b) instead of (a->b->b).
foldl :: (b->a->b) -> b -> [a] -> b
To make this clear, imagine the function (-) as transition:
foldl (-) 100 [1] = 99 = ((100)-1)
foldl (-) 100 [1,2] = 97 = (( 99)-2) = (((100)-1)-2)
foldl (-) 100 [1,2,3] = 94 = (( 97)-3)
foldl (-) 100 [1,2,3,4] = 90 = (( 94)-4)
foldl (-) 100 [1,2,3,4,5] = 85 = (( 90)-5)
foldr (-) 100 [1] = -99 = (1-(100))
foldr (-) 100 [2,1] = 101 = (2-(-99)) = (2-(1-(100)))
foldr (-) 100 [3,2,1] = -98 = (3-(101))
foldr (-) 100 [4,3,2,1] = 102 = (4-(-98))
foldr (-) 100 [5,4,3,2,1] = -97 = (5-(102))
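The intermediate states in the two tables can be reproduced with scanl and scanr, which return every accumulator value instead of only the final one. This is just a sketch for checking the numbers above.

```haskell
main :: IO ()
main = do
  -- Accumulator after each step, starting from the left end.
  print (scanl (-) 100 [1, 2, 3, 4, 5 :: Int])
  -- [100,99,97,94,90,85]
  -- Result of folding each suffix; the first entry is the full foldr.
  print (scanr (-) 100 [5, 4, 3, 2, 1 :: Int])
  -- [-97,102,-98,101,-99,100]
```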
You probably want to use foldr in situations where the list can be infinite, and where the evaluation should be lazy:
foldr (either (\l ~(ls,rs)->(l:ls,rs))
(\r ~(ls,rs)->(ls,r:rs))
) ([],[]) :: [Either l r]->([l],[r])
And you probably want to use the strict version of foldl, which is foldl', when you consume the whole list to produce its output. It might perform better and might prevent stack-overflow or out-of-memory exceptions (depending on the compiler) caused by extremely long lists in combination with lazy evaluation:
foldl' (+) 0 [1..100000000] = 5000000050000000
foldl (+) 0 [1..100000000] = error "stack overflow or out of memory" -- don't try in ghci
foldr (+) 0 [1..100000000] = error "stack overflow or out of memory" -- don't try in ghci
The first one –step by step– creates one entry of the list, evaluates it, and consumes it.
The second one creates a very long formula first, wasting memory with ((...((0+1)+2)+3)+...), and evaluates all of it afterwards.
The third one is like the second, but it builds the right-nested formula (1+(2+(3+...))) instead.
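A minimal sketch of both recommendations. The helper anyViaFoldr is a hypothetical name introduced here, and the list length is kept small on purpose; it is not a benchmark.

```haskell
import Data.List (foldl')

-- Lazy right fold: (||) does not look at its second argument once the
-- first is True, so the fold can stop early, even on an infinite list.
anyViaFoldr :: (a -> Bool) -> [a] -> Bool
anyViaFoldr p = foldr (\x rest -> p x || rest) False

main :: IO ()
main = do
  print (anyViaFoldr (> 10) [1 :: Integer ..])  -- True
  -- Strict left fold: the accumulator is evaluated at every step, so no
  -- long chain of pending additions builds up.
  print (foldl' (+) 0 [1 .. 1000000 :: Int])    -- 500000500000
```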
Using
foldr f z [] = z
foldr f z (x:xs) = x `f` foldr f z xs
And
k y ys = ys ++ [y]
Let's unpack:
foldr k [] [1,2,3]
= k 1 (foldr k [] [2,3])
= k 1 (k 2 (foldr k [] [3]))
= k 1 (k 2 (k 3 (foldr k [] [])))
= (k 2 (k 3 (foldr k [] []))) ++ [1]
= ((k 3 (foldr k [] [])) ++ [2]) ++ [1]
= (((foldr k [] []) ++ [3]) ++ [2]) ++ [1]
= ((([]) ++ [3]) ++ [2]) ++ [1]
= (([3]) ++ [2]) ++ [1]
= ([3,2]) ++ [1]
= [3,2,1]
The definition of foldr is:
foldr f z [] = z
foldr f z (x:xs) = f x (foldr f z xs)
So here's a step by step reduction of your example:
foldr (\y ys -> ys ++ [y]) [] [1,2,3]
= (\y ys -> ys ++ [y]) 1 (foldr (\y ys -> ys ++ [y]) [] [2,3])
= (foldr (\y ys -> ys ++ [y]) [] [2,3]) ++ [1]
= (\y ys -> ys ++ [y]) 2 (foldr (\y ys -> ys ++ [y]) [] [3]) ++ [1]
= (foldr (\y ys -> ys ++ [y]) [] [3]) ++ [2] ++ [1]
= (\y ys -> ys ++ [y]) 3 (foldr (\y ys -> ys ++ [y]) [] []) ++ [2] ++ [1]
= (foldr (\y ys -> ys ++ [y]) [] []) ++ [3] ++ [2] ++ [1]
= [] ++ [3] ++ [2] ++ [1]
= [3,2,1]
Infix notation will probably be clearer here.
Let's start with the definition:
foldr f z [] = z
foldr f z (x:xs) = x `f` (foldr f z xs)
For the sake of brevity, let's write g instead of (\y ys -> ys ++ [y]). The following lines are equivalent:
foldr g [] [1,2,3]
1 `g` (foldr g [] [2,3])
1 `g` (2 `g` (foldr g [] [3]))
1 `g` (2 `g` (3 `g` (foldr g [] [])))
1 `g` (2 `g` (3 `g` []))
(2 `g` (3 `g` [])) ++ [1]
(3 `g` []) ++ [2] ++ [1]
[3] ++ [2] ++ [1]
[3,2,1]
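The question also asked how foldl would work. As a sketch (the names reverseViaFoldr and reverseViaFoldl are assumptions), the same reversal with foldl takes the accumulator as the first lambda argument and can simply cons onto it:

```haskell
-- The fold from the question: appends each element behind the folded rest.
reverseViaFoldr :: [a] -> [a]
reverseViaFoldr = foldr (\y ys -> ys ++ [y]) []

-- With foldl the accumulator comes first. Writing h for (\ys y -> y : ys):
--   foldl h [] [1,2,3]
--   = ((([] `h` 1) `h` 2) `h` 3)
--   = 3 : (2 : (1 : []))
reverseViaFoldl :: [a] -> [a]
reverseViaFoldl = foldl (\ys y -> y : ys) []

main :: IO ()
main = do
  print (reverseViaFoldr [1, 2, 3 :: Int])  -- [3,2,1]
  print (reverseViaFoldl [1, 2, 3 :: Int])  -- [3,2,1]
```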
My way of remembering this is, firstly, through an operation that is sensitive to associativity, namely subtraction:
foldl (\a b -> a - b) 1 [2] = -1
foldr (\a b -> a - b) 1 [2] = 1
Then, secondly, foldl starts at the leftmost or first element of the list, whereas foldr starts at the rightmost or last element of the list. This is not obvious above since the list has only one element.
My mnemonic is this: The left or right describes two things:
the placement of the minus (-) symbol
the starting element of the list
I tend to remember things with movement, so I imagine and visualize values flying around. This is my internal representation of foldl and foldr.
The diagram below does a few things:
names the arguments of the fold functions in a way that is intuitive (for me),
shows which end each particular fold works from, (foldl from the left, foldr from the right),
color codes the accumulator and current values,
traces the values through the lambda function, mapping them onto the next iteration of the fold.
Mnemonically, I remember the arguments of foldl as being in alphabetical order (\a c ->), and the arguments of foldr to be in reverse alphabetical order (\c a ->). The l means take from the left, the r means take from the right.
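The argument order and the nesting direction can be made visible by folding with a function that builds a string instead of computing a number. This is only an illustrative sketch; the names showFoldl and showFoldr and the placeholder strings are assumptions.

```haskell
-- foldl: accumulator first, current element second; nests to the left.
showFoldl :: String -> [String] -> String
showFoldl = foldl (\acc cur -> "(" ++ acc ++ "-" ++ cur ++ ")")

-- foldr: current element first, accumulator second; nests to the right.
showFoldr :: String -> [String] -> String
showFoldr = foldr (\cur acc -> "(" ++ cur ++ "-" ++ acc ++ ")")

main :: IO ()
main = do
  putStrLn (showFoldl "z" ["a", "b", "c"])  -- (((z-a)-b)-c)
  putStrLn (showFoldr "z" ["a", "b", "c"])  -- (a-(b-(c-z)))
```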