Haskell for the Impatient: F-Algebras

June 14, 2024 10-minute read

haskell

F-Algebras

This is the first blog post in the Haskell for the Impatient series. The required knowledge is:

Haskell Data Types
Folds
Functors (It’s what the F stands for!)

Motivations

When I talk about folds, I am not referring to Foldable, I am referring to natural folds. This is when you take a data structure, e.g. a list, and you replace the constructors of that list with something else. An example of a natural fold is foldr. I’ll walk you through an example.

Take the following list, [1,2,3]. In haskell, this is syntactic sugar for 1 : (2 : (3 : [])). Here, we see that we are using 2 constructors, (:) and []. If we wanted to sum this list, we could replace (:) with (+) and we could replace [] with 0. Lets see this in action:

   [1,2,3]
== 1 : (2 : (3 : []))  -- De-sugar
=> 1 + (2 + (3 + 0))   -- replace : and [] with + and 0
== 6                   -- evaluate

This pattern is really useful, if I wanted to multiply all of the elements together, I’d replace : with * and [] with 1. You can probably think of other uses, there are loads!

To return to our sum example, let’s look at the sum function:

sum = foldr (+) 0

You should now be able to link my explanation of a natural fold with the example above. The first argument of foldr is the function that we will replace : with, and the second argument is the value we will replace [] with.

Let’s expand this insight by looking at the type of foldr:

foldr :: (a -> b -> b) -> b -> [a] -> b

The first thing to say is, if we look at the end of this type, we get the intention of the function. foldr takes a list of as and turns it into a b. It collapses it. This means that the output has less structure than the input¹. If we look at the types of the constructors of a list we get (:) :: a -> [a] -> [a] and [] :: [a]. If you squint so hard that these definitions, that [a] starts to look like a b, you get the first two inputs of foldr and we have enough to write a definition:

foldr :: (a -> b -> b) -> b -> [a] -> b
foldr cons empty []     = empty
foldr cons empty (x:xs) = x `cons` foldr cons empty xs

Folding Sets

Now folds are not just a fun thing to do with lists, every single data type as a fold! Let’s look at a set:

data Set a = Empty | Singleton a | Union (Set a) (Set a)

Here I have described a set as either the empty set, a singleton set or a union of two sets. In a way, we represent sets as trees.

Lets write a fold! We start by looking at the types of our constructors:

Empty :: Set a
Singleton :: a -> Set a
Union :: Set a -> Set a -> Set a

And from this we get the type of our folding function by replacing Set a with b:

foldSet :: b -> (a -> b) -> (b -> b -> b) -> Set a -> b

Make sure you understand why this is the type for the fold!

Next we write the body:

foldSet :: b -> (a -> b) -> (b -> b -> b) -> Set a -> b
foldSet empty sing union (Empty)       = empty
foldSet empty sing union (Singleton x) = sing x
foldSet empty sing union (Union s1 s2) = union (foldSet empty sing union s1) 
                                               (foldSet empty sing union s2)

And to demonstrate the power of this function:

isIn :: (Eq a) => a -> Set a -> Bool
isIn x = foldSet False (== x) (||)

isSubsetOf :: (Eq a) => Set a -> Set a -> Bool
isSubsetOf s1 s2 = foldSet True (`isIn` s2) (&&) s1

instance (Eq a) => Eq (Set a) where
   s1 == s2 = (s1 `isSubsetOf` s2) && (s2 `isSubsetOf` s1)

You can see that foldSet is doing a lot of heavy lifting!

Can we Generalise?

A natural fold is just one example of a recursion scheme. There are loads! And they bring a lot to haskell. These schemes make code easier to read and understand, much easier to do program proof with and make you feel warm and fuzzy inside. The only issue is that you have to write an instance for them for every single type you define! So the question is, can we generalise? can we define a version of fold (maybe even unfold) which works for every type? No. Not really. But we can do something equivalent that is arguably just as good!

Removing Recursion

Now we get to the fun part. Let’s define a different version of Set:

data SetF a b = Empty | Singleton a | Union b b

You may be thinking (quite rightly) “what the hell does that achieve!” And fair enough! I have taken our type for Set and defined it in a way that is, quite simply, no longer a set. But it could be! Consider the set: $s = \{1,2,3\}$. This can be expressed as:

s = (Singleton 1 `Union` (Singleton 2 `Union` Singleton 3))

But the type of this is: s :: SetF Int (SetF Int (SetF Int b)). But if we wanted an infinite set (in much the same way we have infinite lists in haskell), how would we go about it? Wouldn’t that require an infinite type? We have to define the type level fixed point combinator:

newtype Fix f = Fix { unfix :: {f (Fix f)} }

If you squint, you realise that this essentially, given a type f, gives us something like f (f (f (f ...)) which is perfect! We can now take SetF and define Set out of it and define functions that act like the original constructors:

type Set a = Fix (SetF a)

empty :: Set a
empty = Fix Empty

singleton :: a -> Set a
singleton x = Fix $ Singleton x

union :: Set a -> Set a -> Set a
union s1 s2 = Fix $ Union s1 s2

So Why did we do any of that!?

What we have defined is an F-algebra. This is a concept that comes from category theory (that’s where it get’s it’s name) and I’m going to try to break it down and avoid reference to category theory.

Firstly, an F-Algebra is a generalisation of an Algebraic structure, so it would probably help to know what those are first. An algebraic structure consists of 3 things:

A set, $A$, called the carrier set. These are the things that our algebra works with. So in discrete maths, it could be $\mathbb{N}$
A set of operations on $A$. So in our discrete maths example, $\{+,-,\times : \mathbb{N}^2 \to \mathbb{N}\}$
A set of identities, e.g. $\{n + 0 = n, 0 + n = n, 1 \times n = n,\dots\}$

You can, if you like understand normal haskell types this way (if you so wished). $A$ could be the set of all lists, you could have ++ as your operations, and identities with [].

If you have made it this far, your next question should be, “how can we generalise this?” Likely followed by: “Why would we generalise it? What is the point of any of this? Am I just distracting myself with mathematics to avoid a looming existential crisis?” To which I say, “yes”.

So an F-algebra generalises this idea by saying that an algebra is in fact a pair, $(A,\alpha)$, where $A$ is a type, and $\alpha$ is a function² $\alpha : F(A) \to A$. $F$, here, is a functor³. So lets come up with a super simple example, then expand it for our set example⁴.

data T x = Tip | Branch x x

instance Functor T where
    fmap _ Tip          = Tip
    fmap f (Branch x y) = Branch (f x) (f y)

We will call this a T algebra. This means we have a type, $A$, and a function, $\alpha : T A \to A$. Now notice that a function of type $T A \to A$ is actually the same as having two functions, $f : () \to A$ and $g : A \to A \to A$! Just because we need to deal with both cases of T! SCHWEET! So in a way, we have got ourselves a set, $A$ and a set of operations without having to specify more than a functor and a function!

Another way of thinking about it is that the functor, $F$, encodes the shape of our data, and $\alpha$ describes it’s deconstruction.

Back to Set

So we have to describe how SetF is a functor:

instance Functor (SetF a) where
    fmap _ (Empty)       = Empty
    fmap f (Singleton x) = Singleton $ f x
    fmap f (Union s1 s2) = Union (fmap f s1) (fmap f s2)

Now we can define fold!

fold :: Functor f => (f a -> a) -> Fix f -> a
fold alpha = alpha . fmap (fold alpha) . unfix

Let’s walk though this because it’s a pain to get one’s head around!

Something of type Fix f enters the function
We apply unfix to it. So we get something of type f (Fix f)
We fmap (fold alpha) over it. Looking at the type of fold alpha, it is Fix f -> a, so the result is f a
We then apply alpha to that, and we get something of type a

So lets define isIn with our new fold function:

isIn :: (Eq a) => a -> Set a -> Bool
isIn x = fold go
    where go :: (Eq a) => SetF a Bool -> Bool -- You don't need the type here. It's just to help you see how it works
          go Empty         = False
          go (Singleton y) = x == y
          go (Union s1 s2) = s1 || s2

Ta daaaa! A version of fold that works with F-algebras!

Why would I care!

So much to even my surprise, this has proven to be useful. I am working on a project where I need a type that can represent a file system. I use F-algebras!


data FileTreeF a = File FileData
                 | Branch DirData [a]

Here, FileData and DirData are records with information about each file system object. They both have an entry of type Word64 which is the file system object ID, unique for each file and folder, and the ID of their parent. Now I store each file and directory in a Map. The type is Map Word64 (FileTreeF Word64), But if I wanted this as a tree, the type would be type FileTree = Fix FileTreeF. 99% of the time, I’m looking up objects from the map by id, so thats much nicer than searching through the tree! But sometimes I want the data as a tree.

I can turn this into a tree with unfold:

unfold :: Functor f => (a -> f a) -> a -> Fix f
unfold alpha = Fix . fmap (unfold alpha) . alpha

buildTree :: Map Word64 (FileTreeF Word64) -> FileTree
buildTree fileMap = unfold go 0
    where go :: Word64 -> FileTreeF Word64 -- Again, to aid with understanding
          go = fileMap ! -- ! looks up a key in the map

I could just as easily build a reversed tree to traverse up instead of down!

Now if I don’t want to completely build a tree and then recurse down it again, we can use this scary function function and it will achieve the same result without building the entire tree in ram!

countNodes :: Map Word64 (FileTreeF Word64) -> Int
countNodes fileMap = hylo break build 0
    where build = fileMap !
          break (File _) = 1
          break (Branch _ children) = 1 + sum children

By ’less’, we mean $\leq$ ↩︎
It’s a morphism, but in our case, that makes it a function ↩︎
Endofunctor, but at this point who gives a crap. I am specialising the category theory for haskell. I’ll stop making footnotes ↩︎
Taken from here ↩︎