About this book

Note: This book is still fairly new. If you find errors, please let me know.

In this book we will implement a simple static blog generator in Haskell, converting documents written in our own custom markup language to HTML.

We will:

  1. Implement a tiny HTML printer library
  2. Define and parse our custom markup language
  3. Read files and glue things together
  4. Add command line arguments parsing

In each chapter of the book we will focus on a particular task we wish to achieve, and through the chapter we'll learn just enough Haskell to complete the task.

Why should you read this book?

There are many Haskell tutorials, guides and books out there. Why read this one?

Pros

There are probably more, but here are a few possible pros:

  • It's relatively short - most Haskell books out there are hundreds of pages long. This book (when exported to PDF) is roughly 150 pages long.
  • It's project oriented. Many Haskell books teach Haskell by teaching the underlying concepts and features in a neat progression. In this book we try to build a program, and learn Haskell on the way. This will be a pro to some, and a con to others. There are other tutorials like this. The most notable ones are Beginning Haskell and Haskell via Sokoban.
  • It's online, which means corrections are easy to make.
  • It's free.

Cons

There are probably more, but here are a few possible cons:

  • It may lack depth - many, much longer Haskell tutorials are long because they go deeper into the nuts and bolts of each feature.
  • It may not cover as many features or techniques as other tutorials - we try to cover features as they pop up in our implementation, but we will probably miss features that aren't as important for our tasks, while other resources may try to cover many different use cases.
  • It is very new and not "battle-tested". Who knows if this is a good approach to learning Haskell? Maybe you could help with that!
  • It doesn't have a technical editor, making the book not as good as it could've been.

Other learning resources

The haskell.org/documentation page lists many tutorials, books, guides and courses. You can find a few alternatives that I can recommend in this list.

Hello, world!

In this chapter we will create a simple HTML "hello world" program and use the Haskell toolchain to compile and run it.

If you haven't installed a Haskell toolchain yet, visit haskell.org/downloads for instructions on how to download and install a Haskell toolchain.

A Haskell source file

A Haskell source file is composed of definitions.

The most common type of definitions have the following form:

<name> = <expression>

Note that:

  1. We cannot write expressions without binding them to a name
  2. Names must start with a lowercase letter
  3. We cannot use the same name more than once in a file

A source file containing a definition of the name main can be treated as an executable, and the expression main is bound to is the entry point to the program.

Let's create a new Haskell source file called hello.hs, and write the following line there:

main = putStrLn "<html><body>Hello, world!</body></html>"

We've defined a new name, main, and bound it to the expression putStrLn "<html><body>Hello, world!</body></html>".

the body of main means calling the function putStrLn with the string "<html><body>Hello, world!</body></html>" as input. putStrLn takes a single string as input and prints that string to the standard output.

Note: we don't need parenthesis to pass arguments to functions in Haskell.

Running this program will result in the following text printed on the screen:

<html><body>Hello, world!</body></html>

Compiling programs

To run this little program, we can compile it using the command line program ghc:

> ghc hello.hs
[1 of 1] Compiling Main             ( hello.hs, hello.o )
Linking hello ...

Invoking ghc with will create the following artifact files:

  1. hello.o - Object file
  2. hello.hi - Haskell interface file
  3. hello - A native executable file

And after the compilation, we can run the hello executable:

> ./hello
<html><body>Hello, world!</body></html>

Interpreting programs

Alternatively, we can skip the compilation and creation of artifact files phase, and run the source file directly using the command line program runghc:

> runghc hello.hs
<html><body>Hello, world!</body></html>

We can also redirect the output of the program to a file and then open it in Firefox.

> runghc hello.hs > hello.html
> firefox hello.html

This command should open Firefox and display a web page with Hello, world! written in it.

I recommend using runghc with this tutorial. While compiling produces significantly faster programs, intepreting programs provides us with faster feedback while we are developing and making frequent changes.

If you want to learn more about the core Haskell tools, you can read this article, but what's described above is enough for our usage at the moment.

More bindings

We can define the HTML string passed to putStrLn in a new name instead of passing it directly to putStrLn. Change the content of file hello.hs we defined above to:

main = putStrLn myhtml

myhtml = "<html><body>Hello, world!</body></html>"

Note: the order in which we declare the bindings does not matter.

Building an HTML printer library

In this part we'll explore a few basic building blocks in Haskell, including functions, types and modules, while building a small HTML printer library with which we will later construct HTML pages from our markup blog posts.

If you're not familiar with HTML and would like a quick tutorial before diving in, MDN's Getting started with HTML is a good overview on the subject.

Flexible HTML content (functions)

We'd like to be able to write different HTML pages without having to write the whole structure of HTML and body tags over and over again. We can do that with functions.

To define a function, we create a definition like we saw previously and add the argument names after the name and before the equals sign (=). So a function definition has the following form:

<name> <arg1> <arg2> ... <argN> = <expression>

The argument names will be available in scope on the right side of the equals sign (in the <expression>), and the function name will be <name>.

We'll define a function that takes a string, which is the content of the page, and wraps it in the relevant html and body tags by concatenating them before and after the content. We use the operator <> to concatenate two strings.

wrapHtml content = "<html><body>" <> content <> "</body></html>"

This function, wrapHtml, takes one argument named content and returns a string that prefixes <html><body> before the content and appends </body></html> after it. Note that it is common to use camelCase in Haskell for names.

Now we can adjust our myhtml definition from the previous chapter:

myhtml = wrapHtml "Hello, world!"

Again, notice that we don't need parenthesis when calling functions. Function calls have the form:

<name> <arg1> <arg2> ... <argN>

However, if we wanted to substitute myhtml with the expression myhtml is bound to in main = putStrLn myhtml, we would have to wrap the expression in parenthesis:

main = putStrLn (wrapHtml "Hello, world!")

If we accidentally write this instead:

main = putStrLn wrapHtml "Hello, world!"

we'll get an error from GHC stating that putStrLn is applied to two arguments, but it only takes one. This is because the above is of the form <name> <arg1> <arg2> in which, as we defined earlier, <arg1> and <arg2> are arguments to <name>.

By using parenthesis we can group together the expressions in the right order.

An aside about operator precedence and fixity

operators (like <>) are infix functions which take two arguments - one from each side.

When there are multiple operators in the same expression without parenthesis, the operator fixity (left or right) and precedence (a number between 0 and 10) determines which operator binds more tightly.

In our case <> has right fixity, so Haskell adds invisible parenthesis on the right side of <>. So for example:

"<html><body>" <> content <> "</body></html>"

is viewed by Haskell as:

"<html><body>" <> (content <> "</body></html>")

For an example of precedence, in the expression 1 + 2 * 3, the operator + has precedence 6, and the operator * has precedence 7, so we give precedence to * over +. Haskell will view this expression as:

1 + (2 * 3)

You might run into errors when mixing different operators with the same precedence but different fixity, because Haskell won't understand how to group these expressions. In that case we can solve the problem by adding parenthesis explicitly.


Exercises:

  1. Separate the functionality of wrapHtml to two functions:

    1. One that wraps content in html tag
    2. one that wraps content in a body tag

    Name the new functions html_ and body_.

  2. Change myhtml to use these two functions.

  3. Add another two similar functions for the tags <head> and <title> and name them head_ and title_.

  4. Create a new function, makeHtml, which takes two strings as input:

    1. One string for the title
    2. One string for the body content

    And construct an HTML string using the functions implemented in the previous exercises.

    The output for:

    makeHtml "My page title" "My page content"
    

    Should be:

    <html><head><title>My page title</title></head><body>My page content</body></html>
    
  5. Use makeHtml in myhtml instead of using html_ and body_ directly


Solutions:

Solution for exercise #1
html_ content = "<html>" <> content <> "</html>"
   
body_ content = "<body>" <> content <> "</body>"
Solution for exercise #2
myhtml = html_ (body_ "Hello, world!")
Solution for exercise #3
head_ content = "<head>" <> content <> "</head>"

title_ content = "<title>" <> content <> "</title>"
Solution for exercise #4
makeHtml title content = html_ (head_ (title_ title) <> body_ content)
Solution for exercise #5
myhtml = makeHtml "Hello title" "Hello, world!"
Our final program
-- hello.hs

main = putStrLn myhtml

myhtml = makeHtml "Hello title" "Hello, world!"

makeHtml title content = html_ (head_ (title_ title) <> body_ content)

html_ content = "<html>" <> content <> "</html>"
   
body_ content = "<body>" <> content <> "</body>"

head_ content = "<head>" <> content <> "</head>"

title_ content = "<title>" <> content <> "</title>"

We can now run our hello.hs program, pipeline the output into a file, and open it in our browser:

runghc hello.hs > hello.html
firefox hello.html

It should display Hello, world! on the page and Hello title on the page's title.


Indentation

You might ask how does Haskell know a definition is complete? The answer is: Haskell uses indentation to know when things should be grouped together.

Indentation in Haskell can be a bit tricky, but in general: code which is supposed to be part of another expression should be indented further than the beginning of that expression.

We know two definitions are separate because the second one is not indented further than the first one.

Indentation tips

  1. Choose a specific amount of spaces for indentation (2 spaces, 4 spaces, etc) and stick to it. Always use spaces over tabs.
  2. Do not indent more than once in any given time.
  3. When in doubt, drop line as needed and indent once.

Here are a few examples:

main =
    putStrLn "Hello, world!"

or:

main =
    putStrLn
        (wrapHtml "Hello, world!")

Avoid the following styles, which use more than one indentation steps, or completely disregard indentation steps:

main = putStrLn
        (wrapHtml "Hello, world!")
main = putStrLn
                (wrapHtml "Hello, world!")

Adding type signatures

Haskell is a statically typed programming language. That means that every expression has a type, and we check that the types are valid with regards to each other before running the program. If we discover that they are not valid, an error message will be printed and the program will not run.

An example of a type error would be if we'd pass 3 arguments to a function that takes only 2, or pass a number instead of a string.

Haskell is also type inferred, so we don't need to specify the type of expressions - Haskell can infer from the context of the expression what its type should be, and that's what we did up until now. However, specifying types is useful - it adds a layer of documentation for you or others that will look at the code later, and it helps verify to some degree that what was intended (with the type signature) is what was written (with the expression). It is generally recommended to annotate all top-level definitions with type signatures.

We use double-colon (::) to specify the type of names. We usually write it right above the definition of the name itself.

Here are a few examples of types we can write:

  • Int - The type of integer numbers
  • String - The type of strings
  • Bool - The type of booleans
  • () - The type of the expression (), also called unit
  • a -> b - The type of a function from an expression of type a to an expression of type b
  • IO () - The type of an expression that represents an IO subroutine that returns ()

Let's specify the type of title_:

title_ :: String -> String

We can see in the code that the type of title_ is a function that takes a String and returns a String.

Let's also specify the type of makeHtml:

makeHtml :: String -> String -> String

Previously, we thought about makeHtml as a function that takes two strings and returns a string.

But actually, all functions in Haskell take exactly one argument as input and return exactly one value as output. It's just convenient to refer to functions like makeHtml as functions with multiple inputs.

In our case, makeHtml is a function that takes one string argument, and returns a function. The function it returns takes a string argument as well and finally returns a string.

The magic here is that -> is right associative. Which means that when we write:

makeHtml :: String -> String -> String

Haskell parses it as:

makeHtml :: String -> (String -> String)

Consequently, the expression makeHtml "My title" is also a function! One that will take a string (the content, the second argument of makeHtml) and will return the expected HTML string with "My title" in the title.

This is called partial application.

To illustrate, let's define html_ and body_ in a different way by defining a new function, el.

el :: String -> String -> String
el tag content =
  "<" <> tag <> ">" <> content <> "</" <> tag <> ">"

el is a function that takes a tag and content, and wraps the content with the tag.

We can now implement html_ and body_ by partially applying el and only provide the tag.

html_ :: String -> String
html_ = el "html"

body_ :: String -> String
body_ = el "body"

Note that we didn't need to add the argument on the left side of equals sign because Haskell functions are "first class" - they behave exactly like normal expressions. You can define names to them like regular values, put them in data structures, pass them to functions, everything you can do with regular values like Int or String.

The way Haskell treats names is very similar to copy paste. anywhere you see html_ in the code, you can replace it with el "html". They are the same (this is what the equals signs say, right? That the two sides are the same). This property, of being able to substitute the two sides of the equals sign with one another, is called referential transparency. And it is pretty unique to Haskell (and a few languages that are very similar to it like PureScript and Elm)! We'll talk more about this in a later chapter.

Anonymous/lambda functions

To further drive the point that Haskell functions are first class and all functions take exactly one argument, I'll mention that the syntax we've been using up until now to define function is just syntactic sugar! We can also define anonymous functions - functions without a name, anywhere we'd like. Anonymous functions are also known as lambda functions. This is a tribute to the original, most primitive functional programming language - the lambda calculus.

We can create an anonymous function anywhere we'd expect an expression such as "hello" with the following syntax:

\<argument> -> <expression>

This little \ (which bears some resemblance to the lowercase Greek letter lambda 'λ') marks the head of the lambda function, and the arrow (->) marks the beginning of the body of the function. We can even chain lambda functions, making them "multiple argument functions" by defining another lambda in the body of another, like this:

three = (\num1 -> \num2 -> num1 + num2) 1 2

Just as before, we evaluate functions by substituting the function argument with the applied value. In the example above we substitute num1 with 1, and get (\num2 -> 1 + num2) 2. Then substitute num2 with 2 and get 1 + 2. We'll talk more about substitution later.

So, when we write:

el :: String -> String -> String
el tag content =
  "<" <> tag <> ">" <> content <> "</" <> tag <> ">"

Haskell actually translates this under the hood to:

el :: String -> (String -> String)
el = \tag -> \content ->
  "<" <> tag <> ">" <> content <> "</" <> tag <> ">"

Hopefully this form makes it a bit clearer why Haskell functions always take one argument, even when we have syntactic sugar that might suggest otherwise.

I'll mention one more syntactic sugar for anonymous functions: We don't actually have to write multiple argument anonymous functions this way, we can just write:

\<arg1> <arg2> ... <argN> -> <expression>

to save us some trouble. For example:

three = (\num1 num2 -> num1 + num2) 1 2

But it's worth remembering what they are under the hood.

We won't be needing anonymous/lambda functions at this point, but we'll talk more about them later and see where they can be useful.


Exercises:

  1. Add types for all of the functions we created until now

  2. Change the implementation of the HTML functions we built to use el instead

  3. Add a couple more functions for defining paragraphs and headers:

    1. p_ which uses the tag <p> for paragraphs
    2. h1_ which uses the tag <h1> for headers
  4. Replace our Hello, world! string with richer content, use h1_ and p_. We can append HTML strings created by h1_ and p_ using the append operator <>.


Solutions:

Solution for exercise #1
myhtml :: String
myhtml = makeHtml "Hello title" "Hello, world!"

makeHtml :: String -> String -> String
makeHtml title content = html_ (head_ (title_ title) <> body_ content)

html_ :: String -> String
html_ content = "<html>" <> content <> "</html>"
   
body_ :: String -> String
body_ content = "<body>" <> content <> "</body>"

head_ :: String -> String
head_ content = "<head>" <> content <> "</head>"

title_ :: String -> String
title_ content = "<title>" <> content <> "</title>"
Solution for exercise #2
html_ :: String -> String
html_ = el "html"
   
body_ :: String -> String
body_ = el "body"

head_ :: String -> String
head_ = el "head"

title_ :: String -> String
title_ = el "title"
Solution for exercise #3
p_ :: String -> String
p_ = el "p"

h1_ :: String -> String
h1_ = el "h1"
Solution for exercise #4
myhtml :: String
myhtml =
  makeHtml
    "Hello title"
    (h1_ "Hello, world!" <> p_ "Let's learn about Haskell!")

Our final program
-- hello.hs

main :: IO ()
main = putStrLn myhtml

myhtml :: String
myhtml =
  makeHtml
    "Hello title"
    (h1_ "Hello, world!" <> p_ "Let's learn about Haskell!")


makeHtml :: String -> String -> String
makeHtml title content = html_ (head_ (title_ title) <> body_ content)

html_ :: String -> String
html_ = el "html"
   
body_ :: String -> String
body_ = el "body"

head_ :: String -> String
head_ = el "head"

title_ :: String -> String
title_ = el "title"

p_ :: String -> String
p_ = el "p"

h1_ :: String -> String
h1_ = el "h1"

el :: String -> String -> String
el tag content =
  "<" <> tag <> ">" <> content <> "</" <> tag <> ">"

Embedded Domain Specific Languages

Right off the bat we run into a common pattern in Haskell: creating Embedded Domain Specific Languages (EDSLs for short).

Domain specific languages (DSLs) are specialized programming languages that are tailored to specific domains. In contrast of general purpose languages, which try to work well in many domains.

A few examples of DSLs are:

  • make - for defining build systems
  • DOT - for defining graphs
  • Sed - for defining text transformations
  • CSS - for defining styling
  • HTML - for defining web pages

An embedded domain specific language is a little language which is embedded inside another programming language, making a program written in the EDSL a valid program in the language it was written in.

The little HTML library we've been writing can be considered an EDSL. It is used specifically for building web pages (by returning HTML strings), and is valid Haskell code!

In Haskell we frequently create and use EDSLs to express domain specific logic. We have EDSLs for concurrency, command-line options parsing, JSON and HTML, Creating build systems, writing tests, and many more.

Specialized languages are useful because they can solve specific problems in a concise (and often safe) way, and embedding them in general purpose programming languages provides us to use the full power of the language for our domain logic, including syntax highlighting and various tools available for the language.

The drawback of embedding domain specific languages is that we have to adhere the rules of the programming language we embed in, such as syntactic and semantic rules.

Some languages alleviate this drawback by providing meta-programming capabilities in the form of macros or other features to extend the language. And while Haskell does provide such capabilities as well, it is also expressive and concise enough that many EDSLs do not need them.

Instead, many Haskell EDSLs use a pattern called the combinator pattern: They define primitives and combinators - primitives are basic building blocks of the language, and combinators are functions that combine primitives to more complex structures.

In our HTML EDSL, our primitives are functions such as html_ and title_ that can be used to create a single HTML node, and we pass other constructed nodes as input to these functions, and combine them to a more complex structure with the append function <>.

There are still a few tricks we can use to make our HTML EDSL better:

  1. We can use Haskell's type system to make sure we only construct valid HTML (so for example we don't create a <title> node without a <head> node, or have user content that can include unescaped special characters), and throw a type error when the user tries to do something invalid.

  2. Our HTML EDSL can move to its own module so it can be reused in multiple modules

In the next few sections we'll take a look at how to define our own types and how to work with modules to make it harder to make errors, and a little bit about linked lists in Haskell.

Safer HTML construction with types

In this section we'll learn how to create our own distinguished types for HTML, and how they can help us avoid invalid construction of HTML strings.

There are a few ways of defining new types in Haskell, in this section we are going to meet two ways: newtype and type.

newtype

newtype lets us give a new name to an already existing type in a way that the two cannot mix together.

A newtype declaration looks like this:

newtype <type-name> = <constructor> <existing-type>

For example in our case we can define a distinct type for Html like this:

newtype Html = Html String

The first Html, to the left of the equals sign, lives in the types name space, meaning that you will only see that name to the right of a double-colon sign (::).

The second Html lives in the expressions (or terms/values) name space, meaning that you will see it where you expect expressions (we'll touch where exactly that can be in a moment).

The two names, <type-name> and <constructor>, do not have to be the same, but they often are. And note that both have to start with a capital letter.

The right-hand side of the newtype declaration describes the shape of a value of that type. In our case, we expect a value of type Html to have the constructor Html and then an expression of type string, for example: Html "hello" or Html ("hello " <> "world").

You can think of the constructor as a function that takes the argument and returns something of our new type:

Html :: String -> Html

Note: We cannot use an expression of type Html the same way we'd use a String. so "hello " <> Html "world" would fail at type checking.

This is useful when we want encapsulation. We can define and use existing representation and functions for our underlying type, but not mix them with other, unrelated (to our domain) types. Similar as meters and feet can both be numbers, but we don't want to accidentally add feet to meters without any conversion.


For now, let's create a couple of types for our use case. We want two separate types to represent:

  1. A complete Html document
  2. A type for html structures such as headers and paragraphs that can go inside the tag

We want them to be distinct because we don't want to mix them together.

Solution
newtype Html = Html String

newtype Structure = Structure String

Using newtypes

In order to use the underlying type that the newtype wraps, we first need to extract it out of the type. We do this using pattern matching.

Pattern matching can be used in two ways, in case expressions and in function definitions.

  1. case expressions are kind of beefed up switch expressions and look like this:

    case <expression> of
      <pattern> -> <expression>
      ...
      <pattern> -> <expression>
    

    The <expression> is the thing we want to unpack, and the pattern is its concrete shape. For example, if we wanted to extract the String out of the type Structure we defined in the exercise above, we do:

    getStructureString :: Structure -> String
    getStructureString struct =
      case struct of
        Structure str -> str
    

    This way we can extract the String out of Structure and return it.

    In later chapters we'll introduce data declarations (which are kind of a struct + enum chimera), where we can define multiple constructors to a type. Then the multiple patterns of a case expression will make more sense.

  2. Alternatively, when declaring a function, we can also use pattern matching on the arguments:

    func <pattern> = <expression>
    

    For example:

    getStructureString :: Structure -> String
    getStructureString (Structure str) = str
    

    Using the types we created, we can change the HTML functions we've defined before, namely html_, body_, p_, etc, to operate on these types instead of Strings.

    But first let's meet another operator that will make our code more concise.

One very cool thing about newtype is that wrapping and extracting expressions doesn't actually have a performance cost! The compiler knows to remove any wrapping and extraction of the newtype constructor and use the underlying type.

The new type and the constructor we defined are only there to help us distinguish between the type we created and the underlying type when we write our code, they are not needed when the code is running.

newtypes provide us with type safety with no performance penalty!

Chaining functions

Another interesting and extremely common operator (which is a regular library function in Haskell) is . (pronounced compose). This operator was made to look like the composition operator you may know from math ().

Let's look at its type and implementation:

(.) :: (b -> c) -> (a -> b) -> a -> c
(.) f g x = f (g x)

Compose takes 3 arguments: two functions (named f and g here) and a third argument named x. It then passes the argument x to the second function g, and calls the first function f with the result of g x.

Note that g takes as input something of the type a and returns something of the type b, and f takes something of the type b, and returns something of the type c.

Another important thing to note is that types which start with a lowercase letter are type variables. Think of them as similar to regular variables. Just like content could be any string, like "hello" or "world", a type variable can be any type: Bool, String, String -> String, etc. This abilitiy is called parametric polymorphism (other languages often call this generics).

The catch is that type variables must match in a signature, so if for example we write a function with the type signature a -> a, the input type and the return type must match, but it could be any type - we cannot know what it is. So the only way to implement a function with that signature is:

id :: a -> a
id x = x

id, short for the identity function, returns the exact value it received. If we tried any other way, for example returning some made up value like "hello", or try to use x like a value of a type we know like writing x + x, the type checker will complain.

Also, remember that -> is right associative? This signature is equivalent to:

(.) :: (b -> c) -> (a -> b) -> (a -> c)

Doesn't it look like a function that takes two functions and returns a third function that is the composition of the two?

We can now use this operator to change our HTML functions. Let's start with one example: p_.

Before, we had:

p_ :: String -> String
p_ = el "p"

And now, we can write:

p_ :: String -> Structure
p_ = Structure . el "p"

The function p_ will take an arbitrary String which is the content of the paragraph we wish to create, will wrap it in <p> and </p> tags, and then wrap it in the Structure constructor to produce the output type Structure (remember: newtype constructors can be used as functions!).

Let's take a deeper look at the types:

  • Structure :: String -> Structure
  • el "p" :: String -> String
  • (.) :: (b -> c) -> (a -> b) -> (a -> c)
  • Structure . el "p" :: String -> Structure

Let's see why the expression Structure . el "p" type checks, and why its type is String -> Structure.

Type checking with pen and paper

If we want to figure out if and how exactly an expression type-checks, we can do that rather systematically. Let's look at an example where we try and type-check this expression:

p_ = Structure . el "p"

First, we write down the type of the outer-most function. In our case this is the operator . which has the type:

(.) :: (b -> c) -> (a -> b) -> (a -> c)

After that, we can try to match the type of the arguments we apply to this function with the type of the arguments from the type signature.

In this case, we try to apply two arguments to .:

  1. Structure :: String -> Structure
  2. el "p" :: String -> String

And luckily . expects two arguments with the types:

  1. b -> c
  2. a -> b

Note: Applying a function with more arguments than it expects is a type error.

Since the . operator takes at least the number of arguments we supply, we continue to the next phase of type-checking: matching the types of the inputs with the types of the expected inputs (from the type signature of the operator).

When we match two types, we are checking for equivalence between them. There are a few possible scenarios here:

  1. When the two types are concrete (as opposed to type variables) and simple, like Int and Bool, we check if they are the same. If they are, they type check and we continue. If they aren't, they don't type check and we throw an error.
  2. When the two types we match are more complex (for example both are functions), we try to match their inputs and outputs (in case of functions). If the inputs and outputs match, then the two types match.
  3. There is a special case when one of the types is a type variable - in this case we treat the matching process like an equation and we write it down somewhere. The next time we see this type variable, we replace it with its match in the equation. Think about this like assigning a type variable with a value.

In our case, we want to match (or check the equivalence of) these types:

  1. String -> Structure with b -> c
  2. String -> String with a -> b

Let's do this one by one, starting with (1) - matching String -> Structure and b -> c:

  1. Because the two types are complex, we check that they are both functions, and match their inputs and outputs: String with b, and Structure with c.
  2. Because b is a type variable, we mark down somewhere that b should be equivalent to String. We write b ~ String (we use ~ to denote equivalence).
  3. We match Structure and c, same as before, we write down that c ~ Structure.

No problem so far, let's try matching String -> String with a -> b:

  1. The two types are complex, we see that both are functions so we match their inputs and outputs.
  2. Matching String with a - we write down that a ~ String.
  3. Matching String with b - we remember that we have already written about b - looking back we see that we already noted that b ~ String. We need to replace b with the type that we wrote down before and check it against this type, so we match String with String which, fortunately, type-check because they are the same.

So far so good. We've type-checked the expression and discovered the following things about the type variables in it:

  1. a ~ String
  2. b ~ String
  3. c ~ Structure

Now, when asking what is the type of the expression:

p_ = Structure . el "p"

We say that it is the type of . after replacing the type variables using the equations we found and removing the inputs we applied to it, so we started with:

(.) :: (b -> c) -> (a -> b) -> (a -> c)

Then we replaced the type variables:

(.) :: (String -> Structure) -> (String -> String) -> (String -> Structure)

And removed the two arguments when we apply the function:

Structure . el "p" :: String -> Structure

And we got the type of the expression!

Fortunately, Haskell is able to do this process for us. But when Haskell complains that our types fail to type-check and we don't understand exactly why, going through this process can help us understand where the types do not match, and then we can figure out how to solve it.

Note: If we use a parametrically polymorphic function more than once, or use different functions that have similar type variable names, the type variables don't have to match in all instances simply because they share a name. Each instance has its own unique set of type variables. For example:

id :: a -> a
ord :: Char -> Int
chr :: Int -> Char

incrementChar :: Char -> Char
incrementChar c = chr (ord (id c) + id 1)

In the snippet above, we use id twice (for no good reason other than for demonstration purposes). The first id takes a Char as an argument, and its a is equivalent to Char. The second id takes an Int as an argument, and its distinct a is equivalent to Int.

This unfortunately only applies to functions defined at the top-level. If we'd define a local function to be passed as an argument to incrementChar with the same type signature as id, the types must match in all uses. So this code:

incrementChar :: (a -> a) -> Char -> Char
incrementChar func c = chr (ord (func c) + func 1)

Will not type check. Try it!

Appending Structure

Before when we wanted to create richer HTML content and appended nodes to one another, we used the append (<>) operator. Since we are now not using String anymore, we need another way to do it.

While it is possible to overload <> using a feature in Haskell called type classes, we will instead create a new function and call it append_, and cover type classes later.

append_ should take two Structures, and return a third Structure, appending the inner String in the first Structure to the second and wrapping the result back in Structure.


Try implementing append_.

Solution
append_ :: Structure -> Structure -> Structure
append_ (Structure a) (Structure b) =
  Structure (a <> b)

Converting back an Html to String

After constructing a valid Html value, we want to be able to print it to the output so we can display it in our browser. For that, we need to write a function that takes an Html and converts it to a String, which we can then pass to putStrLn.


Implement the render function.

Solution
render :: Html -> String
render html =
  case html of
    Html str -> str

type

Let's look at one more way to give new names to types.

A type definition looks really similar to a newtype definition - the only difference is that we reference the type name directly without a constructor:

type <type-name> = <existing-type>

For example in our case we can write:

type Title = String

type, in contrast with newtype, is just a type name alias. When we declare that Title is a type alias of String We mean that Title and String are interchangeable, and we can use one or the other whenever we want:

"hello" :: Title

"hello" :: String

Both are valid in this case.

We can sometimes use types to give a bit more clarity to our code, but they are much less useful than newtypes which allow us to distinguish between two types that have the same type representation.

The rest of the owl


Try changing the code we wrote in previous chapters to use the new types we created.

Tips

We can combine makeHtml and html_, and remove body_ head_ and title_ by calling el directly in html_, which can now have the type Title -> Structure -> Html. This will make our HTML EDSL less flexible but more compact.

Alternatively, we could create newtypes for HtmlHead and HtmlBody and pass those to html_, and we might do that at later chapters, but I've chose to keep the API a bit simple for now, we can always refactor later!

Solution
-- hello.hs

main :: IO ()
main = putStrLn (render myhtml)

myhtml :: Html
myhtml =
  html_
    "My title"
    ( append_
      (h1_ "Header")
      ( append_
        (p_ "Paragraph #1")
        (p_ "Paragraph #2")
      )
    )

newtype Html
  = Html String

newtype Structure
  = Structure String

type Title
  = String

html_ :: Title -> Structure -> Html
html_ title content =
  Html
    ( el "html"
      ( el "head" (el "title" title)
        <> el "body" (getStructureString content)
      )
    )

p_ :: String -> Structure
p_ = Structure . el "p"

h1_ :: String -> Structure
h1_ = Structure . el "h1"

el :: String -> String -> String
el tag content =
  "<" <> tag <> ">" <> content <> "</" <> tag <> ">"

append_ :: Structure -> Structure -> Structure
append_ c1 c2 =
  Structure (getStructureString c1 <> getStructureString c2)

getStructureString :: Structure -> String
getStructureString content =
  case content of
    Structure str -> str

render :: Html -> String
render html =
  case html of
    Html str -> str

Are we safe yet?

We have made some progress - now we can't write "Hello" where we'd expect either a paragraph or a header, but we can still write Structure "hello" and get something that isn't a paragraph or a header. So while we made it harder for the user to make mistakes by accident, we haven't really been able to enforce the invariants we wanted to enforce in our library.

Next we'll see how we can make expressions such as Structure "hello" illegal as well using modules and smart constructors.

Preventing incorrect use with modules

In this section we will move the HTML generation library to its own module.

Modules

Each Haskell source file is a module. The module name should have the same name as the source file and should start with a capital letter. Sub-directories should also be part of the name and we use . do denote a sub-directory. We'll see that in the next section.

The only exception to the rule are entry points to the program - modules with the name 'Main' that define main in them. Their source file names could have any name they want.

A module declaration looks like this:

module <module-name>
  ( <export-list>
  )
  where

The export list can be omitted if you want to export everything defined in the module, but we don't. We will list exactly the functions and type we want to export. This will give us control on how people can use our tiny library.

We will create a new source file named Html.hs and add the following module declaration code at the top of the file:

module Html
  ( Html
  , Title
  , Structure
  , html_
  , p_
  , h1_
  , append_
  , render
  )
  where

Note that we do not export:

  1. The constructors for our new types, only the types themselves. If we wanted to export the constructors as well we would've written Html(Html) or Html(..). This way the user cannot create their own Structure simply by writing Structure "Hello".

  2. Internal functions used by the library, such as el and getStructureString.

And we will also move the HTML related functions from our hello.hs file to this new Html.hs file:

newtype Html
  = Html String

newtype Structure
  = Structure String

type Title
  = String

html_ :: Title -> Structure -> Html
html_ title content =
  Html
    ( el "html"
      ( el "head" (el "title" title)
        <> el "body" (getStructureString content)
      )
    )

p_ :: String -> Structure
p_ = Structure . el "p"

h1_ :: String -> Structure
h1_ = Structure . el "h1"

el :: String -> String -> String
el tag content =
  "<" <> tag <> ">" <> content <> "</" <> tag <> ">"

append_ :: Structure -> Structure -> Structure
append_ c1 c2 =
  Structure (getStructureString c1 <> getStructureString c2)

getStructureString :: Structure -> String
getStructureString content =
  case content of
    Structure str -> str

render :: Html -> String
render html =
  case html of
    Html str -> str

Now, anyone importing our module (using the import statement which can be used below module declarations but above any other declaration), will only be able to import what we export.

Add the following code at the top of the hello.hs file:

import Html

The hello.hs file should now look like this:

-- hello.hs

import Html

main :: IO ()
main = putStrLn (render myhtml)

myhtml :: Html
myhtml =
  html_
    "My title"
    ( append_
      (h1_ "Header")
      ( append_
        (p_ "Paragraph #1")
        (p_ "Paragraph #2")
      )
    )

And the Html.hs file should look like this:

-- Html.hs

module Html
  ( Html
  , Title
  , Structure
  , html_
  , p_
  , h1_
  , append_
  , render
  )
  where

newtype Html
  = Html String

newtype Structure
  = Structure String

type Title
  = String

html_ :: Title -> Structure -> Html
html_ title content =
  Html
    ( el "html"
      ( el "head" (el "title" title)
        <> el "body" (getStructureString content)
      )
    )

p_ :: String -> Structure
p_ = Structure . el "p"

h1_ :: String -> Structure
h1_ = Structure . el "h1"

el :: String -> String -> String
el tag content =
  "<" <> tag <> ">" <> content <> "</" <> tag <> ">"

append_ :: Structure -> Structure -> Structure
append_ c1 c2 =
  Structure (getStructureString c1 <> getStructureString c2)

getStructureString :: Structure -> String
getStructureString content =
  case content of
    Structure str -> str

render :: Html -> String
render html =
  case html of
    Html str -> str

Escaping characters

Now that Html has its own source file and module, and creating HTML code can be done only via the functions we exported, we can also handle user input that may contain characters that may conflict with our meta language, HTML, such as < and > which are used for creating HTML tags.

We can convert these characters into different strings that HTML can handle.

See Stack overflow question for a list of characters we need to escape.

Let's create a new function called escape:

escape :: String -> String
escape =
  let
    escapeChar c =
      case c of
        '<' -> "&lt;"
        '>' -> "&gt;"
        '&' -> "&amp;"
        '"' -> "&quot;"
        '\'' -> "&#39;"
        _ -> [c]
  in
    concat . map escapeChar

In escape we see a few new things:

  1. Let expressions: we can define local names using this syntax:

    let
      <name> = <expression>
    in
      <expression>
    

    This will make <name> available as a variable in the second <expression>.

  2. Pattern matching with multiple patterns: we match on different characters and convert them to a string. Note that _ is a "catch all" pattern that will always succeed.

  3. Two new functions: map and concat, we'll talk about these more in depth

  4. That the syntax highlighting broke a bit for this snippet for some reason. Don't worry about it.

Linked lists briefly

Linked lists are a very common data structure in Haskell, so common that they have their own special syntax:

  1. The type for lists are denoted with brackets and inside them is the type of the element. For example:
    • [Int] - a list of integers
    • [Char] - a list of characters
    • [String] - a list of strings
    • [[String]] - a list of a list of strings
    • [a] - a list of any single type (all elements must be of the same type)
  2. A value representing a empty list is written like this: []
  3. Prepending an element to a list is done with the operator : (pronounced cons) which is right-associative (like ->). For example: 1 : [], or 1 : 2 : 3 : [].
  4. The above lists can also be written like this: [1] and [1, 2, 3].

Also, Strings are linked lists of characters - String is defined as: type String = [Char], so we can use them the same way we use lists.

Do note, however, that linked lists, despite their convenience, are often not the right tool for the job. They are not particularity space efficient and are slow for appending, random access and more. That also makes String a lot less efficient than it could be. And I generally recommend using a different string type, Text, instead, which is available in an external package. We will talk about lists, Text, and other data structures in the future!

We can implement our own operations on lists by using pattern matching and recursion. And we'll touch on this subject later when talking about ADTs.

For now, we will use the various functions found in the Data.List module. Specifically, map and concat.

map

Using map we can apply a function to each of the elements in a list. Its type signature is:

map :: (a -> b) -> [a] -> [b]

For example:

map not [False, True, False] == [True, False, True]

Or as can be seen in our escape function, this can help us escape each character:

map escapeChar ['<','h','1','>'] == ["&lt;","h","1","&gt;"]

However, note that the escapeChar has the type Char -> String, so the result type of map escapeChar ['<','h','1','>'] is [String], and what we really want is a String and not [String].

This is where concat enters the picture to help us flatten the list.

concat

concat has the type:

concat :: [[a]] -> [a]

It flattens a list of list of something into a list of something. In our case in will flatten [String] into String, remember that this works because String is a type alias for [Char], so we actually have [[Char]] -> [Char].

GHCi

One way we can quickly see our code in action is using the interactive development environment GHCi. Running ghci will open an interactive prompt where Haskell expressions can be written and evaluated. This is called a "Read-Evaluate-Print Loop" (for short - REPL).

For example:

ghci> 1 + 1
2
ghci> putStrLn "Hello, world!"
Hello, world!

We can define new names:

ghci> double x = x + x
ghci> double 2
4

We can write multi-line code by surrounding it with :{ and :}:

ghci> :{
| escape :: String -> String
| escape =
|   let
|     escapeChar c =
|       case c of
|         '<' -> "&lt;"
|         '>' -> "&gt;"
|         '&' -> "&amp;"
|         '"' -> "&quot;"
|         '\'' -> "&#39;"
|         _ -> [c]
|   in
|     concat . map escapeChar
| :}

ghci> escape "<html>"
"&lt;html&gt;"

We can import Haskell source files using the :load command (:l for short):

ghci> :load Html.hs
[1 of 1] Compiling Html    ( Html.hs, interpreted )
Ok, one module loaded.
ghci> render (html_ "<title>" (p_ "<body>"))
"<html><head><title>&lt;title&gt;</title></head><body><p>&lt;body&gt;</p></body></html>"

As well as import library modules:

ghci> import Data.Bits
ghci> shiftL 32 1
64
ghci> clearBit 33 0
32

We can even ask what the type of an expression is using the :type command (:t for short):

λ> :type escape
escape :: String -> String

To exit ghci, use the :quit command (or :q for short)

ghci> :quit
Leaving GHCi.

GHCi is a very useful tool for quick experiments and exploration. And it can be used to quickly test what our code does.

We've seen a couple of examples of that above - passing the string "<html>" to our escape function returns the string "&lt;html&gt;", which can be rendered by a browser as <html> instead of the browser considering it as an HTML tag.

If you are having a hard time figuring out what a particular function does, consider testing it in GHCi - pass it different inputs, and see if it matches your expectations. Concrete examples of running code can aid a lot in understanding it!

If you'd like to learn more about GHCi, you can find a more thorough introduction in the GHC user guide.

Escaping


The user of our library can currently only supply strings in a few places:

  1. Page title
  2. Paragraphs
  3. Headers

We can apply our escape function at these places before doing anything else with it. That way all HTML constructions are safe.

Try adding the escaping function in those places.

Solution
html_ :: Title -> Structure -> Html
html_ title content =
  Html
    ( el "html"
      ( el "head" (el "title" (escape title))
        <> el "body" (getStructureString content)
      )
    )

p_ :: String -> Structure
p_ = Structure . el "p" . escape

h1_ :: String -> Structure
h1_ = Structure . el "h1" . escape

Our revised Html.hs
-- Html.hs

module Html
  ( Html
  , Title
  , Structure
  , html_
  , p_
  , h1_
  , append_
  , render
  )
  where

-- * Types

newtype Html
  = Html String

newtype Structure
  = Structure String

type Title
  = String

-- * EDSL

html_ :: Title -> Structure -> Html
html_ title content =
  Html
    ( el "html"
      ( el "head" (el "title" (escape title))
        <> el "body" (getStructureString content)
      )
    )

p_ :: String -> Structure
p_ = Structure . el "p" . escape

h1_ :: String -> Structure
h1_ = Structure . el "h1" . escape

append_ :: Structure -> Structure -> Structure
append_ c1 c2 =
  Structure (getStructureString c1 <> getStructureString c2)

-- * Render

render :: Html -> String
render html =
  case html of
    Html str -> str

-- * Utilities

el :: String -> String -> String
el tag content =
  "<" <> tag <> ">" <> content <> "</" <> tag <> ">"

getStructureString :: Structure -> String
getStructureString content =
  case content of
    Structure str -> str

escape :: String -> String
escape =
  let
    escapeChar c =
      case c of
        '<' -> "&lt;"
        '>' -> "&gt;"
        '&' -> "&amp;"
        '"' -> "&quot;"
        '\'' -> "&#39;"
        _ -> [c]
  in
    concat . map escapeChar

Try constructing an invalid HTML in hello.hs to see if this works or not!

Now we can use our tiny HTML library safely. But what if the user wants to use our library with a valid use case we didn't think about, for example adding unordered lists? We are completely blocking them from extending our library. We'll talk about this next.

Exposing internal functionality (Internal modules)

We have now built a very small but convenient and safe way to write HTML code in Haskell. This is something that we could (potentially) publish as a library and share with the world by uploading it to a package repository such as Hackage. Users who are interested in our library could use a package manager to include our library in their project and build their own HTML pages with it.

It is important to note that users are building their project against the API that we expose to them, and the package manager don't generally provide access to the source code, so they can't, for example, modify the Html module (that we expose) in their project directly without jumping through some hoops.

Because we wanted our Html EDSL to be safe, we hid the internal implementation from the user, and the only way to interact with the library is via the API we provide.

This provides the safety we wanted to provide, but in this case it also blocks the user from extending our library in their own project with things we haven't implemented yet, such as lists or code blocks.

When a user runs into trouble with using a library (such as missing features) the best course of action usually is to open an issue in the repository or submit a pull request, but sometimes the user needs things to work now.

We admit that we are not perfect and can't think of all use cases for our library. Sometimes the restrictions we add are too great and may limit the usage of advanced users that know how things work under the hood and need certain functionality in order to use our library.

Internal modules

For that we can expose internal modules to provide some flexibility for advanced users. Internal modules are not a language concept but rather a (fairly common) design pattern (or idiom) in Haskell.

Internal modules are simply modules named <something>.Internal, which export all of the functionality and implementation details in that module.

Instead of writing the implementation in (for example) the Html module, we write it in the Html.Internal module, which will export everything. Then we will import that module in the Html module, and write an explicit export list to only export the API we'd like to export (as before).

Internal modules are considered unstable and risky to use by convention. If you end up using one yourself when using an external Haskell library, make sure to open a ticket in the library's repository after the storm has passed!

Lets make the changes

We will create a new directory named Html and inside it a new file named Internal.hs. The name of this module should be Html.Internal.

This module will contain all of the code we previously had in the Html module, but we will change the module declaration in Html.Internal and omit the export list:

-- Html/Internal.hs

module Html.Internal where

...

And now in Html.hs, we will remove the code that we moved to Html/Internal.hs and in its stead we'll import the internal module:

-- Html.hs

module Html
  ( Html
  , Title
  , Structure
  , html_
  , p_
  , h1_
  , append_
  , render
  )
  where

import Html.Internal

Now, users of our library can still import Html and safely use our library, but if they run into trouble and have a dire need to implement unordered lists to work with our library, they could always work with Html.Internal instead.

Our revised Html.hs and Html/Internal.hs
-- Html.hs

module Html
  ( Html
  , Title
  , Structure
  , html_
  , p_
  , h1_
  , append_
  , render
  )
  where

import Html.Internal
-- Html/Internal.hs

module Html.Internal where

-- * Types

newtype Html
  = Html String

newtype Structure
  = Structure String

type Title
  = String

-- * EDSL

html_ :: Title -> Structure -> Html
html_ title content =
  Html
    ( el "html"
      ( el "head" (el "title" (escape title))
        <> el "body" (getStructureString content)
      )
    )

p_ :: String -> Structure
p_ = Structure . el "p" . escape

h1_ :: String -> Structure
h1_ = Structure . el "h1" . escape

append_ :: Structure -> Structure -> Structure
append_ c1 c2 =
  Structure (getStructureString c1 <> getStructureString c2)

-- * Render

render :: Html -> String
render html =
  case html of
    Html str -> str

-- * Utilities

el :: String -> String -> String
el tag content =
  "<" <> tag <> ">" <> content <> "</" <> tag <> ">"

getStructureString :: Structure -> String
getStructureString content =
  case content of
    Structure str -> str

escape :: String -> String
escape =
  let
    escapeChar c =
      case c of
        '<' -> "&lt;"
        '>' -> "&gt;"
        '&' -> "&amp;"
        '"' -> "&quot;"
        '\'' -> "&#39;"
        _ -> [c]
  in
    concat . map escapeChar

Summary

For our particular project, Internal modules aren't necessary. Because our project and the source code for the HTML EDSL are part of the same project, and we have access the the Html module directly, we can always go and edit it if we want (and we are going to do that throughout the book).

However, if we were planning to release our HTML EDSL as a library for other developers to use, it would be very nice of us to also expose the internal implementation as an Internal module. Just so we can save some trouble for potential users!

We will see how to create a package from our source code in a later chapter.

Exercises

We need a few more features for our HTML library to be useful for our blog software. Add the following features to our Html.Internal module and expose them from Html.

1. Unordered lists

These lists have the form:

<ul>
  <li>item 1</li>
  <li>item 2</li>
  <li>...</li>
</ul>

We want in our library a new function:

ul_ :: [Structure] -> Structure

So that users can write this:

ul_
  [ p_ "item 1"
  , p_ "item 2"
  , p_ "item 3"
  ]

and get this:

<ul>
  <li><p>item 1</p></li>
  <li><p>item 2</p></li>
  <li><p>item 3</p></li>
</ul>

2. Ordered lists

Very similar to unordered lists, but instead of <ul> we use <ol>

3. Code blocks

Very similar to <p>, but use the <pre> tag. Call this function code_.

Solutions

Unordered lists
ul_ :: [Structure] -> Structure
ul_ =
  Structure . el "ul" . concat . map (el "li" . getStructureString)
Ordered lists
ol_ :: [Structure] -> Structure
ol_ =
  Structure . el "ol" . concat . map (el "li" . getStructureString)

Note: the two functions above could be unified.

Code blocks
code_ :: String -> Structure
code_ = Structure . el "pre"

Summary

In this chapter we built a very minimal HTML EDSL. We will later use this library to convert our custom markup formatted text to HTML.

We've also learned about:

  • Defining and using functions
  • Types and type signatures
  • Embedded domain specific languages
  • Chaining functions using the . operator
  • Preventing incorrect use with newtypes
  • Defining modules and the Internal module pattern
  • Encapsulation using newtypes and modules

Here's our complete program up to this point:

-- hello.hs

import Html

main :: IO ()
main = putStrLn (render myhtml)

myhtml :: Html
myhtml =
  html_
    "My title"
    ( append_
      (h1_ "Header")
      ( append_
        (p_ "Paragraph #1")
        (p_ "Paragraph #2")
      )
    )
-- Html.hs

module Html
  ( Html
  , Title
  , Structure
  , html_
  , h1_
  , p_
  , ul_
  , ol_
  , code_
  , append_
  , render
  )
  where

import Html.Internal
-- Html/Internal.hs

module Html.Internal where

-- * Types

newtype Html
  = Html String

newtype Structure
  = Structure String

type Title
  = String

-- * EDSL

html_ :: Title -> Structure -> Html
html_ title content =
  Html
    ( el "html"
      ( el "head" (el "title" (escape title))
        <> el "body" (getStructureString content)
      )
    )

p_ :: String -> Structure
p_ = Structure . el "p" . escape

h1_ :: String -> Structure
h1_ = Structure . el "h1" . escape

ul_ :: [Structure] -> Structure
ul_ =
  Structure . el "ul" . concat . map (el "li" . getStructureString)

ol_ :: [Structure] -> Structure
ol_ =
  Structure . el "ol" . concat . map (el "li" . getStructureString)

code_ :: String -> Structure
code_ = Structure . el "pre"

append_ :: Structure -> Structure -> Structure
append_ c1 c2 =
  Structure (getStructureString c1 <> getStructureString c2)

-- * Render

render :: Html -> String
render html =
  case html of
    Html str -> str

-- * Utilities

el :: String -> String -> String
el tag content =
  "<" <> tag <> ">" <> content <> "</" <> tag <> ">"

getStructureString :: Structure -> String
getStructureString content =
  case content of
    Structure str -> str

escape :: String -> String
escape =
  let
    escapeChar c =
      case c of
        '<' -> "&lt;"
        '>' -> "&gt;"
        '&' -> "&amp;"
        '"' -> "&quot;"
        '\'' -> "&#39;"
        _ -> [c]
  in
    concat . map escapeChar

Custom markup language

In this chapter we will define our own simple markup language and parse documents written in this language into Haskell data structures.

Our markup language will contain the following features:

  • Headers: prefix by a number of * characters
  • Paragraphs: a group of lines without empty lines in between
  • Unordered lists: a group of lines each prefixed with -
  • Ordered lists: a group of lines each prefixed with #
  • Code blocks: a group of lines each prefixed with >

Here's a sample document which we will use as an example:

* Compiling programs with ghc

Running ghc invokes the Glasgow Haskell Compiler (GHC),
and can be used to compile Haskell modules and programs into native
executables and libraries.

Create a new Haskell source file named hello.hs, and write
the following code in it:

> main = putStrLn "Hello, Haskell!"

Now, we can compile the program by invoking ghc with the file name:

> ➜ ghc hello.hs
> [1 of 1] Compiling Main             ( hello.hs, hello.o )
> Linking hello ...

GHC created the following files:

- hello.hi - Haskell interface file
- hello.o - Object file, the output of the compiler before linking
- hello (or hello.exe on Microsoft Windows) - A native runnable executable.

GHC will produce an executable when the source file satisfies both conditions:

# Defines the main function in the source file
# Defines the module name to be Main, or does not have a module declaration

Otherwise, it will only produce the .o and .hi files.

Which we will, eventually, want to convert into this (modulo formatting) HTML:

<h1>Compiling programs with ghc</h1>

<p>Running ghc invokes the Glasgow Haskell Compiler (GHC),
and can be used to compile Haskell modules and programs into native
executables and libraries.
</p>

<p>Create a new Haskell source file named hello.hs, and write
the following code in it:
</p>

<pre>main = putStrLn "Hello, Haskell!"
</pre>

<p>Now, we can compile the program by invoking ghc with the file name:</p>

<pre>
➜ ghc hello.hs
[1 of 1] Compiling Main             ( hello.hs, hello.o )
Linking hello ...
</pre>

<p>GHC created the following files:
</p>

<ul>
  <li>hello.hi - Haskell interface file</li>
  <li>hello.o - Object file, the output of the compiler before linking</li>
  <li>hello (or hello.exe on Microsoft Windows) - A native runnable executable.</li>
</ul>

<p>GHC will produce an executable when the source file satisfies both conditions:
</p>

<ol>
  <li>Defines the main function in the source file</li>
  <li>Defines the module name to be Main, or does not have a module declaration</li>
</ol>

<p>Otherwise, it will only produce the .o and .hi files.
</p>

Representing the markup language as a Haskell data type

One of the clear differentiators between Haskell and other ML-family of languages from most mainstream languages is the ability to represent data precisely and succinctly.

So how do we represent our markup language using Haskell?

Previously, in our HTML builder library, we used newtypes to differentiate between HTML documents, structures and titles, but we didn't really need to differentiate between different kinds of structures such as paragraphs and headers, not without parsing the data at least.

In this case, we have a list of structures, and each structure could be one of a few specific options (a paragraph, a header, a list, etc), and we want to be able to know which structure is which so we can easily convert it into the equivalent HTML representation.

For that, we have data definitions. data gives us the ability to create custom types by grouping multiple types together and having alternative structures. Think of them as combination of both structs and enums.

data declarations look like this:

data <Type-name> <type-args>
  = <Data-constructor1> <types>
  | <Data-constructor2> <types>
  | ...

It looks really similar to newtype, but there are two important differences:

  1. In the <types> part we can write many types (Like Int, String, or Bool). For newtypes we can only write one.
  2. We can have alternative structures using |, newtypes have no alternatives.

This is because newtypes is used to provide a type safe alias, and data is used to build a new composite type that can potentially have alternatives.

Let's see a few of examples of data types:

  1. Bool

    data Bool
      = True
      | False
    

    We created a new data type named Bool with the possible values True or False. In this case we only have constructor alternatives and none of the constructors carry additional values, this is similar to enums in other languages.

  2. Person

    data Person
      = Person String Int -- where the first is the name and the second is
                          -- the age
    

    We created a new data type named Person. Values of the type Person look like this:

    Person <some-string> <some-int>
    

    For example:

    Person "Gil" 32
    

    In this case we create a composite of multiple types, without alternatives. This is similar to structs in other language, but structs give each field a name, and here we distinguish them by position.

    Alternatively, Haskell has syntactic sugar for naming fields called records. The above definition can also be written like this:

    data Person
      = Person
        { name :: String
        , age :: Int
        }
    

    Values of this type can be written exactly as before,

    Person "Gil" 32
    

    Or with this syntax:

    Person { name = "Gil", age = 32 }
    

    Haskell will also generate functions that can be used to extract the fields from the composite type:

    name :: Person -> String
    age :: Person -> Int
    

    Which can be used like this:

    ghci> age (Person { name = "Gil", age = 32 })
    32
    

    We even have special syntax for updating specific fields in a record. Of course, we do not update records in place - we generate a new value instead.

    ghci> gil = Person { name = "Gil", age = 32 }
    ghci> age (gil { age = 33 })
    33
    ghci> age gil
    32
    

    Unfortunately, having specialized functions for each field also means that if we defined a different data type with the field age, the functions which GHC needs to generate will clash.

    The easiest way to solve this is to give fields unique names, for example by adding a prefix:

    data Person
      = Person
        { pName :: String
        , pAge :: Int
        }
    

    Another way is by using extensions to the Haskell language, which we will cover in later chapters.

  3. Tuple

    data Tuple a b
      = Tuple a b
    

    This is pretty similar to Person, but we can plug any type we want for this definition. For example:

    Tuple "Clicked" True :: Tuple String Bool
    
    Tuple 'a' 'z' :: Tuple Char Char
    

    This type has special syntax in Haskell:

    ("Clicked", True) :: (String, Bool)
    
    ('a', 'z') :: (Char, Char)
    

    This Tuple definition is polymorphic, we define the structure but are able to plug different types into the structure to get concrete types. You can think of Tuple as a template for a data type waiting to be filled, or as a function waiting for types as input in order to return a data type. We can even take a look at the "type" signature of Tuple in ghci using the :kind command.

    ghci> data Tuple a b = Tuple a b
    ghci> :kind Tuple
    Tuple :: * -> * -> *
    

    Quick detour: Kinds

    The :kind command is called as such because the "type" of a type is called a kind. Kinds can be one of two things, either a * which means a saturated (or concrete) type, such as Int or Person, or an -> of two kinds, which is, as you might have guessed, a type function, taking kind and returning a kind.

    Note that only types that have the kind * can have values. So for example while Tuple Int is a valid Haskell concept that has the kind * -> *, and we can write code that will work "generically" for all types that have a certain kind (e.g. * -> *), we cannot construct a value that will have the kind * -> *. All values have types, and all types that have values have the kind *.

    We will talk more about kinds later, for now let's focus on types!

  4. Either

    data Either a b
      = Left a
      | Right b
    

    Similar to Tuple but instead of having only one constructor, we have two. This means that we can choose which side we want. Here are a couple of values of type Either String Int:

    Left "Hello"
    
    Right 17
    

    This type is useful for modeling errors. Either we succeeded and got what we wanted (The Right constructor with the value), or we didn't and got an error instead (The Left constructor with a string or a custom error type).

In our program we use data types to model the different kinds of content types we have in our markup language. We tag each structure using the data constructor and provide the rest of the information (the paragraph text, the list items, etc) in the <types> section of the data declaration for each constructor:

type Document
  = [Structure]

data Structure
  = Header Natural String
  | Paragraph String
  | UnorderedList [String]
  | OrderedList [String]
  | CodeBlock [String]

Note: Natural is defined in the base package but not exported from Prelude. Find out which module to import Natural by using Hoogle.


Exercises

Represent the following markup documents as values of Document:

  1. Hello, world!
    
  2. * Welcome
    
    To this tutorial about Haskell.
    
  3. Remember that multiple lines with no separation
    are grouped together to a single paragraph
    but list items remain separate.
    
    # Item 1 of a list
    # Item 2 of the same list
    
  4. * Compiling programs with ghc
    
    Running ghc invokes the Glasgow Haskell Compiler (GHC),
    and can be used to compile Haskell modules and programs into native
    executables and libraries.
    
    Create a new Haskell source file named hello.hs, and write
    the following code in it:
    
    > main = putStrLn "Hello, Haskell!"
    
    Now, we can compile the program by invoking ghc with the file name:
    
    > ➜ ghc hello.hs
    > [1 of 1] Compiling Main             ( hello.hs, hello.o )
    > Linking hello ...
    
    GHC created the following files:
    
    - hello.hi - Haskell interface file
    - hello.o - Object file, the output of the compiler before linking
    - hello (or hello.exe on Microsoft Windows) - A native runnable executable.
    
    GHC will produce an executable when the source file satisfies both conditions:
    
    # Defines the main function in the source file
    # Defines the module name to be Main, or does not have a module declaration
    
    Otherwise, it will only produce the .o and .hi files.
    

Solutions:

Solution 1
example1 :: Document
example1 =
  [ Paragraph "Hello, world!"
  ]
Solution 2
example2 :: Document
example2 =
  [ Header 1 "Welcome"
  , Paragraph "To this tutorial about Haskell."
  ]
Solution 3
example3 :: Document
example3 =
  [ Paragraph "Remember that multiple lines with no separation are grouped together to a single paragraph but list items remain separate."
  , OrderedList
    [ "Item 1 of a list"
    , "Item 2 of the same list"
    ]
  ]
Solution 4
example4 :: Document
example4 =
  [ Header 1 "Compiling programs with ghc"
  , Paragraph "Running ghc invokes the Glasgow Haskell Compiler (GHC), and can be used to compile Haskell modules and programs into native executables and libraries."
  , Paragraph "Create a new Haskell source file named hello.hs, and write the following code in it:"
  , CodeBlock
    [ "main = putStrLn \"Hello, Haskell!\""
    ]
  , Paragraph "Now, we can compile the program by invoking ghc with the file name:"
  , CodeBlock
    [ "➜ ghc hello.hs"
    , "[1 of 1] Compiling Main             ( hello.hs, hello.o )"
    , "Linking hello ..."
    ]
  , Paragraph "GHC created the following files:"
  , UnorderedList
    [ "hello.hi - Haskell interface file"
    , "hello.o - Object file, the output of the compiler before linking"
    , "hello (or hello.exe on Microsoft Windows) - A native runnable executable."
    ]
  , Paragraph "GHC will produce an executable when the source file satisfies both conditions:"
  , OrderedList
    [ "Defines the main function in the source file"
    , "Defines the module name to be Main, or does not have a module declaration"
    ]
  , Paragraph "Otherwise, it will only produce the .o and .hi files."
  ]

Add a new module named Markup and add the data type definition to it. Note that in this case we do want to export the constructors of Structure.

Solution
-- Markup.hs

module Markup
  ( Document
  , Structure(..)
  )
where

import Numeric.Natural

type Document
  = [Structure]

data Structure
  = Header Natural String
  | Paragraph String
  | UnorderedList [String]
  | OrderedList [String]
  | CodeBlock [String]

Translating directly?

You might ask "Why do we even need to represent the markup as a type? Why don't we convert it into HTML as soon as we parse it instead?". That's a good question and a valid strategy. The reason we first represent it as a Haskell type is for flexibility and modularity.

If the parsing code is coupled with HTML generation, we lose the ability to pre-process the markup document. For example we might want to take only a small part of the document (for summary) and present it, or create a table of content from headers. Or maybe we'd like to add other targets and not just HTML - maybe markdown format or a GUI reader?

Parsing to an "abstract data type" (ADT) representation (one that does not contain the details of the language, that for example that we use '#' for ordered lists) gives us the freedom to do so much more than just conversion to HTML that it's usually worth it in my opinion unless you really need to optimize the process.

Parsing markup part 01 (Recursion)

Let's have a look at how to parse a multi-lined string of markup text written by a user and convert it to the Document type we defined in the previous chapter.

Our strategy is to take the string of markup text, and:

  1. Split it to a list where each element represents a separate line, and
  2. Go over the list line by line and process it, remembering information from previous lines if necessary

So the first thing we want to do is to process the string line by line. We can do that by converting the string to a list of string. Fortunately the Haskell Prelude module from the Haskell standard library base exposes the function lines that does exactly what we want. The Prelude module is exposed in every Haskell file by default so we don't need to import it.

For the line processing part, let's start by ignoring all of the markup syntax and just group lines together into paragraphs (paragraphs are separated by an empty line), and iteratively add new features later in the chapter:

A common solution in imperative programs would be to iterate over the lines using some loop construct and accumulate lines that should be grouped together into some intermediate mutable variable. When we reach an empty line, we insert the content of that variable into another mutable variable that accumulates the results.

Our approach in Haskell isn't so different, except that we do not use loops or mutable variables. Instead, we use recursion.

Recursion and accumulating information

Instead of loops, in Haskell we use recursion to model iteration.

Consider the following contrived example: let's say that we want to write an algorithm for adding two natural numbers together, and we don't have a standard operation to do that (+), but we do have two operations we could use on each number: increment and decrement.

A solution we could come up with is to slowly "pass" one number to the other number iteratively, by incrementing one, and decrementing the other. And we do that until the number we decrement reaches 0.

For example for 3 and 2:

  • We start with 3 and 2, and we increment 3 and decrement 2
  • On the next step we now have 4 and 1, we increment 4 and decrement 1
  • On the next step we now have 5 and 0, since the second number is 0 we declare 5 as the result.

This can be written imperatively using a loop:

function sum(n, m) {
  while (m /= 0) {
    n = increment(n);
    m = decrement(m);
  }
  return n;
}

We can write the same algorithm in Haskell without mutation using recursion:

sum n m =
  if m /= 0
    then sum (increment n) (decrement m)
    else n

In Haskell, in order to emulate iteration with mutable state, we call the function again with the values we want the variables to have in the next iteration.

Evaluation of recursion

Recursion commonly has a bad reputation for being slow and possibly unsafe compared to loops. This is because in imperative languages, calling a function often requires creating a new call stack.

However, functional languages (and Haskell in particular) play by different rules and implement a feature called tail call elimination - when the result of a function call is the result of the function (this is called tail position), we can just drop the current stack frame and then allocate one for the function we call, so we don't require N stack frames for N iterations.

This is of course only one way to do tail call elimination and other strategies exists, such as translating code like our recursive sum above to the iteration version.

Laziness

Haskell plays by slightly different rules because it uses a lazy evaluation strategy instead of the much more common strict evaluation strategy. An evaluation strategy refers to "when do we evaluate a computation". In a strict language the answer is simple: we evaluate the arguments of a function before entering a function.

So for example the evaluation of sum (increment 3) (decrement 2) using strict evaluation will look like this:

  1. Evaluate increment 3 to 4
  2. Evaluate decrement 2 to 1
  3. Evaluate sum 4 1

Or, alternatively (depending on the language) we reverse (1) and (2) and evaluate the arguments from right-to-left instead of left-to-right.

On the other hand, with lazy evaluation, we only evaluate computation when we need it, where 'when do we need it?' is when it is part of a computation that will have some effect on the outside world, for example when writing a computation to standard output or sending it over the network.

So unless this computation is required, it won't be evaluated. For example:

main =
  if sum (increment 2) (decrement 3) == 5
    then putStrLn "Yes."
    else putStrLn "No."

In the case above, we need the result of sum (increment 2) (decrement 3) in order to know which message to write, so it will be evaluated. But:

main =
  let
    five = sum (increment 2) (decrement 3)
  in
    putStrLn "Not required"

In the case above we don't actually need five, so we don't evaluate it!

But then if we know we need sum (increment 2) (decrement 3), do we use strict evaluation now? The answer is no - because we might not need to evaluate the arguments to complete the computation. For example in this case:

const a b = a

main =
  if const (increment 2) (decrement 3) == 3
    then putStrLn "Yes."
    else putStrLn "No."

const ignores the second argument and returns the first, so we don't actually need to calculate decrement 3 in order to provide an answer to the computation and in turn output an answer to the screen.

With the lazy evaluation strategy we will evaluate expressions when we need to (when they are required in order to do something for the user), and we evaluate from the outside in - first we enter functions, and then we evaluate the arguments when we need to (usually when the thing we want to evaluate appears in some control flow such as the condition of an if expression or a pattern in pattern matching).


I've written a more in-depth blog post about how this works in Haskell: Substitution and Equational Reasoning.

Please read it and try to evaluate the following program by hand:

import Prelude hiding (const, sum) -- feel free to ignore this line

increment n = n + 1

decrement n = n - 1

const a b = a

sum n m =
  if m /= 0
    then sum (increment n) (decrement m)
    else n

main =
  if const (sum 3 2) (decrement 3) == 5
    then putStrLn "Yes."
    else putStrLn "No."

Remember that evaluation always begins from main.

Solution

evaluating main

if const (sum 3 2) (decrement 3) == 5
  then putStrLn "Yes."
  else putStrLn "No."

expanding const

if sum 3 2 == 5
  then putStrLn "Yes."
  else putStrLn "No."

expanding sum

if (if 2 /= 0 then sum (increment 3) (decrement 2) else 3) == 5
  then putStrLn "Yes."
  else putStrLn "No."

evaluating the control flow 2 /= 0

if (if True then sum (increment 3) (decrement 2) else 3) == 5
  then putStrLn "Yes."
  else putStrLn "No."

Choosing the then branch

if (sum (increment 3) (decrement 2)) == 5
  then putStrLn "Yes."
  else putStrLn "No."

expanding sum

if
  ( if decrement 2 /= 0
    then sum
      (increment (increment 3))
      (decrement (decrement 2))
    else (increment 3)
  ) == 5
  then putStrLn "Yes."
  else putStrLn "No."

Evaluating decrement 2 in the control flow (notice how both places change!)

if
  ( if 1 /= 0
    then sum
      (increment (increment 3))
      (decrement 1)
    else (increment 3)
  ) == 5
  then putStrLn "Yes."
  else putStrLn "No."

Evaluating the control flow 1 /= 0

if
  ( if True
    then sum
      (increment (increment 3))
      (decrement 1)
    else (increment 3)
  ) == 5
  then putStrLn "Yes."
  else putStrLn "No."

Choosing the then branch

if
  ( sum
    (increment (increment 3))
    (decrement 1)
  ) == 5
  then putStrLn "Yes."
  else putStrLn "No."

Expanding sum

if
  ( if decrement 1 /= 0
    then sum
      (increment (increment (increment 3)))
      (decrement (decrement 1))
    else increment (increment 3)
  ) == 5
  then putStrLn "Yes."
  else putStrLn "No."

Evaluating control flow decrement 1

if
  ( if 0 /= 0
    then sum
      (increment (increment (increment 3)))
      (decrement 0)
    else increment (increment 3)
  ) == 5
  then putStrLn "Yes."
  else putStrLn "No."

Evaluating control flow 0 /= 0

if
  ( if False
    then sum
      (increment (increment (increment 3)))
      (decrement 0)
    else increment (increment 3)
  ) == 5
  then putStrLn "Yes."
  else putStrLn "No."

Choosing the else branch

if
  (increment (increment 3)) == 5
  then putStrLn "Yes."
  else putStrLn "No."

Evaluate control flow increment (increment 3)

if
  (increment 3 + 1) == 5
  then putStrLn "Yes."
  else putStrLn "No."

Evaluate in control flow increment 3

if
  (3 + 1 + 1) == 5
  then putStrLn "Yes."
  else putStrLn "No."

Evaluate in control flow 3 + 1

if
  (4 + 1) == 5
  then putStrLn "Yes."
  else putStrLn "No."

Evaluate in control flow 4 + 1

if
  5 == 5
  then putStrLn "Yes."
  else putStrLn "No."

Evaluate in control flow 5 == 5

if
  True
  then putStrLn "Yes."
  else putStrLn "No."

Choosing the then branch

putStrLn "Yes."

Which when run will print Yes. to the screen.


General recursion

In general, when trying to solve problems recursively, it is useful to think about the problem in three parts:

  1. Finding the base case (the most simple cases - the ones we already know how to answer)
  2. Figuring out how to reduce the problem to something simpler (so it gets closer to the base case)
  3. Mitigating the difference between the reduced version and the solution we need to provide

The reduce and mitigate steps together are usually called the recursive step.

Let's take a look at another example problem: generating a list of a particular size with a specific value in place of every element.

In Haskell, this function would have the following signature:

replicate :: Int -> a -> [a]

Here are a few usage examples of replicate:

ghci> replicate 4 True
[True,True,True,True]
ghci> replicate 0 True
[]
ghci> replicate (-13) True
[]

How would we implement this function recursively? How would describe it in three steps above?

  1. Base case: the cases we already know how to generate are the cases where the length of the list is zero (or less) - we just return an empty list.
  2. Reduce: while we might not know how to generate a list of size N (where N is positive), if we knew the solution for N-1 we could:
  3. Mitigate: Add another element to the solution for N-1 using the : (cons) operator.

Try to write this in Haskell!

Solution
replicate :: Int -> a -> [a]
replicate n x =
  if n <= 0    -- recognizing the base case
    then
      []       -- the solution for the base case
    else
        x : replicate (n - 1) x
  --   ---  -------------------
  --    ^           ^
  --    |           |
  --    |           +-------- reduction
  --    |
  --    +--- mitigation

Mutual recursion

When solving functions recursively we usually call the same function again, but that doesn't have to be the case. It is possible to reduce our problem to something simpler that requires an answer from a different function. If, in turn, that function will (or another function in that call chain) call our function again, we have a mutual recursive solution.

For example, let's write two functions, one that checks whether a natural number is even or not, and one that checks whether a number is odd or not only by decrementing it.

even :: Int -> Bool

odd :: Int -> Bool

Let's start with even, how should we solve this recursively?

  1. Base case: We know the answer for 0 - it is True.
  2. Reduction: We might not know the answer for a general N, but we could check whether N - 1 is odd,
  3. Mitigation: if N - 1 is odd, then N is even! if it isn't odd, then N isn't even.

What about odd?

  1. Base case: We know the answer for 0 - it is False.
  2. Reduction: We might not know the answer for a general N, but we could check whether N - 1 is even,
  3. Mitigation: if N - 1 is even, then N is odd! if it isn't even, then N isn't odd.

Try writing this in Haskell!

Solution
even :: Int -> Bool
even n =
  if n == 0
    then
      True
    else
      odd (n - 1)

odd :: Int -> Bool
odd n =
  if n == 0
    then
      False
    else
      even (n - 1)


Partial functions

because we didn't handle negative cases in the example above, our functions will loop forever when a negative value is passed as input. A function that does not return a result for some value (either by not terminating or by throwing an error) is called a partial function (because it only returns a result of a part of the possible inputs).

Partial functions are generally considered bad practice because they can have unexpected behaviour at runtime, so we want to avoid using partial functions as well as avoid writing partial functions.

The best way to avoid writing partial functions is by covering all inputs! In the situation above, it is definitely possible to handle negative numbers as well, so we should do that! Or, instead, we could require that our functions accept a Natural instead of an Int, and then the type system would've stopped us from using these functions with values that we did not handle.

There are cases where we can't possibly cover all inputs, in these cases it is important to re-examine the code and see if we could further restrict the inputs using types to mitigate these issues.

For example, the head :: [a] -> a function from Prelude promises to return the first element (the head) of a list, but we know that lists could possibly be empty, so how can this function deliver on its promise?

Unfortunately, it can't. But there exists a different function that can: head :: NonEmpty a -> a from the Data.List.NonEmpty module! The trick here is that this other head does not take a general list as input, it takes a different type entirely, one that promises to have at least one element, and therefore can deliver on its promise!

We could also potentially use smart constructors with newtype and enforce some sort of a restriction in the type system, as we saw in earlier chapters. But this solution can sometimes be less ergonomic to use.

An alternative approach is to use data types to encode the absence of a proper result, for example, using Maybe, as we'll see in a future chapter.

Parsing markup?

Let's get back to the task at hand.

As stated previously, our strategy for parsing the markup text is:

  1. Split the string to a list where each element is a separate line (which we can do with lines), and
  2. Go over the list line by line and process it, remembering information from previous lines if necessary

Remember that we want to start by ignoring all of the markup syntax and just group lines together into paragraphs (paragraphs are separated by an empty line), and iteratively add new features later in the chapter:

parse :: String -> Document
parse = parseLines [] . lines -- (1)

parseLines :: [String] -> [String] -> Document
parseLines currentParagraph txts =
  let
    paragraph = Paragraph (unlines (reverse currentParagraph)) -- (2), (3)
  in
    case txts of
      [] -> [paragraph]
      currentLine : rest ->
        if trim currentLine == ""
          then
            paragraph : parseLines [] rest -- (4)
          else
            parseLines (currentLine : currentParagraph) rest -- (5)

trim :: String -> String
trim = unwords . words

Things to note:

  1. We pass a list that contains the currently grouped paragraph (paragraphs are separated by an empty line)
  2. Because of laziness, paragraph is not computed until it's needed, so we don't have to worry about the performance implications in the case the we are still grouping lines
  3. Why do we reverse currentParagraph? (See point (5))
  4. When we run into an empty line we add the accumulated paragraph to the resulting list (A Document is a list of structures) and start the function again with the rest of the input.
  5. We pass the new lines to be grouped in a paragraph in reverse order because of performance characteristics - because of the nature of singly-linked lists, prepending an element is fast, and appending is slow. Prepending only requires us to create a new cons (:) cell to hold a pointer to the value and a pointer to the list, but appending requires us to traverse the list to its end and rebuild the cons cells - the last one will contain the last value of the list and a pointer to the list to append, the next will contain the value before the last value of the list and a pointer to the list which contains the last element and the appended list, and so on.

This code above will group together paragraphs in a structure, but how do we view our result? In the next chapter will take a short detour and talk a bit about type classes, and how they can help us in this scenario.

Displaying the parsing results (type classes)

We want to be able to print a textual representation of values of our Document type. There are a few ways to do that:

  1. Write our own function of type Document -> String which we could then print, or
  2. Have Haskell write one for us

Haskell provides us with a mechanism that can automatically generate the implementation of a type class function called show, that will convert our type to String.

The type of the function show looks like this:

show :: Show a => a -> String

This is something new we haven't seen before. Between :: and => you see what is called a type class constraint on the type a. What we say in this signature, is that the function show can work on any type that is a member of the type class Show.

Type classes is a feature in Haskell that allows us to declare a common interface for different types. In our case, Haskell's standard library defines the type class Show in the following way (this is a simplified version but good enough for our purposes):

class Show a where
  show :: a -> String

A type class declaration describes a common interface for Haskell types. show is an overloaded function that will work for any type that is an instance of the type class Show. We can define an instance of a type class manually like this:

instance Show Bool where
  show x =
    case x of
      True -> "True"
      False -> "False"

Defining an instance means providing an implementation for the interface for a specific type. When we call the function show on a data type, the compiler will search the instance of the type it inferred, and use the implementation provided in the instance declaration.

ghci> show True
"True"
ghci> show 187
"187"
ghci> show "Hello"
"\"Hello\""

As can be seen above, the show function converts a value to its textual representation. Which is why "Hello" includes the quotes as well. The Show type class is usually used for debugging purposes.

Deriving instances

It is also possible to automatically generate implementations of a few selected type classes. Fortunately, Show is one of them.

If all the types we use in the definition of our data type already implement an instance of Show, we can automatically derive it by adding deriving Show at the end of the data definition.

data Structure
  = Header Natural String
  | Paragraph String
  | UnorderedList [String]
  | OrderedList [String]
  | CodeBlock [String]
  deriving Show

Now we can use the function show :: Show a => a -> String for any type that implements an instance of the Show type class. For example, with print:

print :: Show a => a -> IO ()
print = putStrLn . show

We can first convert our type to String and then write it to the standard output.

And because lists also implement Show for any element type that has a Show instance, we can now print Documents, because they are just aliases for [Structure]. Try it!

There are many type classes Haskellers use everyday. A couple more are Eq for equality and Ord for ordering. These are also special type classes that can be derived automatically.

Laws

Type classes often come with "rules" or "laws" that instances should satisfy, the purpose of these laws is to provide predictable behaviour across instances, so that when we run into a new instance we can be confident that it will behave in a certain expected way, and we can write code that works generically for all instances of a type class while expecting them to adhere to these rules.

As an example, let's look at the Semigroup type class:

class Semigroup a where
  (<>) :: a -> a -> a

This type class provides a common interface for types with an operation <> that can combine two values into one in some way.

This type class also mentions that this <> operation should be associative, meaning that these two sides should evaluate to the same result:

x <> (y <> z) = (x <> y) <> z

An example of a lawful instance of Semigroup is lists with the append operation (++):

instance Semigroup [a] where
  (<>) = (++)

Unfortunately the Haskell type system cannot "prove" that instances satisfy these laws, but as a community we often shun unlawful instances.

Many data types (together with their respective operations) can form a Semigroup, and instances don't even have to look similar or have a common analogy/metaphor (and this is true for many other type classes as well).

Type classes are often just interfaces with laws (or expected behaviour if you will). Approaching them with this mindset can be very liberating!

To put it a differently, type classes can be used to create abstractions - interfaces with laws/expected behaviour where we don't actually care about the concrete details of the underlying type, just that it implements a certain API and behaves in a certain way.

Regarding Semigroup, we have previously created a function that looks like <> for our Html EDSL! We can add a Semigroup instance for our Structure data type and have a nicer to use API!


Exercise: Please do this and remove the append_ function from the API.

Solution

Replace this:

append_ :: Structure -> Structure -> Structure
append_ c1 c2 =
  Structure (getStructureString c1 <> getStructureString c2)

With this:

instance Semigroup Structure where
  (<>) c1 c2 =
    Structure (getStructureString c1 <> getStructureString c2)

And remove the export of append_ in Html.hs. You won't need to further export anything as type class instances are exported automatically.

You will also need replace the usage of append_ with <> in hello.hs.


Parsing markup part 02 (Pattern matching)

Maybe

Previously, when we talked about partial functions, we mentioned that one way to avoid writing partial functions is to encode the absence of a result using Maybe:

data Maybe a
    = Nothing
    | Just a

Maybe is a data type from the standard library (named base) that is used to add an additional value to a type: the absence of a value. For example, Maybe Bool has three values, two with the Just constructor to represent regular boolean values (Just True and Just False) and another value, Nothing to represent the absence of a boolean value.

We can use this to encode the result of head, a function that promises to return the first element of a list, without creating a partial function:

head :: [a] -> Maybe a

This way, when the list is empty, we can return Nothing, and when it has at least one element, we can return Just <first element>. This function can be found in the Data.Maybe module under the name listToMaybe.

In order to consume values of type Maybe <something>, and other types created with data, we can use pattern matching.

Pattern Matching

We've already seen pattern matching a few times. It is an incredibly versatile feature of Haskell, we can use it to do two main things:

  1. Deconstruct complex values
  2. Control flow

As we've seen when discussing newtypes, we can use case expressions and function definitions to deconstruct a newtype, but we can do that for data types as well:

-- | A data type representing colors
data Color
  = RGB Word8 Word8 Word8

getBluePart :: Color -> Word8
getBluePart color =
  case color of
    RGB _ _ blue -> blue

In getBluePart we deconstruct a composite value into its part and extract the third component representing the blue value in a color represented by red, green and blue components (RGB).

Note that blue is the name we give to the third component so it will be bound to the right of the arrow that comes after the pattern. This is similar to a function argument. Also note that _ matches any value without binding it to a name.

We can also try to match a value with more than one pattern:

data Brightness
  = Dark
  | Bright

data EightColor
  = Black
  | Red
  | Green
  | Yellow
  | Blue
  | Magenta
  | Cyan
  | White

data AnsiColor
  = AnsiColor Brightness EightColor

ansiColorToVGA :: AnsiColor -> Color
ansiColorToVGA ansicolor =
  case ansicolor of
    AnsiColor Dark Black ->
      RGB 0 0 0
    AnsiColor Bright Black ->
      RGB 85 85 85
    AnsiColor Dark Red ->
      RGB 170 0 0
    AnsiColor Bright Red ->
      RGB 255 85 85
    -- and so on

It's important to notice a few things here:

  1. Patterns can be nested, notice how we deconstructed ansicolor on multiple levels
  2. We try to match patterns from the top down, it is possible for patterns to overlap with one another and the top one will win
  3. If the value we try to match does not match any of the patterns listed, an error will be thrown at runtime

We can ask GHC to notify us when we accidentally write overlapping patterns, or when we haven't listed enough patterns to match all possible values, by passing the flag -Wall to ghc or runghc.

My recommendation is to always use -Wall!

As an aside, while it is possible to use pattern matching in function definitions by defining a function multiple times, I personally don't like that feature very much and I would encourage you to avoid it, but if you want to use it instead of case expressions, it is possible.


Exercises:

  1. Create a function isBright :: AnsiColor -> Bool that checks whether a color is bright or not
  2. Use this table to write ansiToUbuntu.
  3. Create a function isEmpty :: [a] -> Bool that uses listToMaybe to check whether a list is empty or not
  4. Create a function isEmpty :: [a] -> Bool that doesn't use listToMaybe to check whether a list is empty or not

Parsing with rich context

Previously we wrote a parser that separates documents into different paragraphs. With new features under our belt we can now remember the exact context we are in (whether it is a text paragraph, a list, or a code block) and act accordingly!

Let's look again at the parsing code we wrote previously:

parse :: String -> Document
parse = parseLines [] . lines -- (1)

parseLines :: [String] -> [String] -> Document
parseLines currentParagraph txts =
  let
    paragraph = Paragraph (unlines (reverse currentParagraph)) -- (2), (3)
  in
    case txts of
      [] -> [paragraph]
      currentLine : rest ->
        if trim currentLine == ""
          then
            paragraph : parseLines [] rest -- (4)
          else
            parseLines (currentLine : currentParagraph) rest -- (5)

trim :: String -> String
trim = unwords . words

Previously our context, currentParagraph, was used to group adjacent lines in an accumulative list.

Next, instead of using a [String] type to denote adjacent lines, we can instead use a Structure to denote the context.

One issue we might have though with representing context with the Structure type, is that when we start parsing we don't have any context. But we learned of a way to represent the possibility of an absence of a value with Maybe! So our new context type can be Maybe Structure instead.

Let's rewrite our code above to use our new context type:

parse :: String -> Document
parse = parseLines Nothing . lines -- (1)

parseLines :: Maybe Structure -> [String] -> Document
parseLines context txts =
  case txts of
    [] -> maybeToList context   -- (2)
    -- Paragraph case
    currentLine : rest ->
      let
        line = trim currentLine
      in
        if line == ""
          then
            maybe id (:) context (parseLines Nothing rest) -- (3)
          else
            case context of
              Just (Paragraph paragraph) ->
                parseLines (Just (Paragraph (unwords [paragraph, line]))) rest -- (4), (5)
              _ ->
                maybe id (:) context (parseLines (Just (Paragraph line)) rest)

trim :: String -> String
trim = unwords . words
  1. We can now pass Nothing when we don't have a context

  2. Unsure what maybeToList does? Hoogle it!

  3. maybe is a function that works similarly to pattern matching on a Maybe. Check out the types of id, (:) and maybe id (:) in GHCi!

  4. Hey! Didn't we say that appending Strings/lists is slow (which is what unwords does)? Yes, it is. Because in our Structure data type, a paragraph is defined as Paragraph String and not Paragraph [String], we can't use our trick of building a list of lines and the reverse it at the end.

    So what do we do? There are many ways to handle that, one simple way is to create a different type with the right shape:

    data Context
      = CtxHeader Natural String
      | CtxParagraph [String]
      | CtxUnorderedList [String]
      | CtxOrderedList [String]
      | CtxCodeBlock [String]
    

    Since creating new types in Haskell is cheap, this is a very viable solution.

    In this case I'm going with the approach of not worrying about it too much, because it's a very local piece of code that can easily be fixed later if we see that it's an issue.

  5. Anyway, if you've used -Wall like I've suggested, you'd get a warning from GHC saying that the "pattern matches are non-exhaustive". This is because we did not cover all cases. So let's cover more cases:

parse :: String -> Document
parse = parseLines Nothing . lines

parseLines :: Maybe Structure -> [String] -> Document
parseLines context txts =
  case txts of
    -- done case
    [] -> maybeToList context
   
    -- Header 1 case
    ('*' : ' ' : line) : rest ->
      maybe id (:) context (Header 1 (trim line) : parseLines Nothing rest)
   
    -- Unordered list case
    ('-' : ' ' : line) : rest ->
      case context of
        Just (UnorderedList list) ->
          parseLines (Just (UnorderedList (list <> [trim line]))) rest

        _ ->
          maybe id (:) context (parseLines (Just (UnorderedList [trim line])) rest)
   
    -- Paragraph case
    currentLine : rest ->
      let
        line = trim currentLine
      in
        if line == ""
          then
            maybe id (:) context (parseLines Nothing rest)
          else
            case context of
              Just (Paragraph paragraph) ->
                parseLines (Just (Paragraph (unwords [paragraph, line]))) rest
              _ ->
                maybe id (:) context (parseLines (Just (Paragraph line)) rest)

trim :: String -> String
trim = unwords . words

Exercise: Add the CodeBlock and OrderedList cases.

Final module
-- Markup.hs

module Markup
  ( Document
  , Structure(..)
  , parse
  )
where

import Numeric.Natural
import Data.Maybe (maybeToList)

type Document
  = [Structure]

data Structure
  = Header Natural String
  | Paragraph String
  | UnorderedList [String]
  | OrderedList [String]
  | CodeBlock [String]
  deriving (Eq, Show)    -- (1)


parse :: String -> Document
parse = parseLines Nothing . lines

parseLines :: Maybe Structure -> [String] -> Document
parseLines context txts =
  case txts of
    -- done case
    [] -> maybeToList context
   
    -- Header 1 case
    ('*' : ' ' : line) : rest ->
      maybe id (:) context (Header 1 (trim line) : parseLines Nothing rest)
   
    -- Unordered list case
    ('-' : ' ' : line) : rest ->
      case context of
        Just (UnorderedList list) ->
          parseLines (Just (UnorderedList (list <> [trim line]))) rest

        _ ->
          maybe id (:) context (parseLines (Just (UnorderedList [trim line])) rest)

    -- Ordered list case
    ('#' : ' ' : line) : rest ->
      case context of
        Just (OrderedList list) ->
          parseLines (Just (OrderedList (list <> [trim line]))) rest

        _ ->
          maybe id (:) context (parseLines (Just (OrderedList [trim line])) rest)

    -- Code block case
    ('>' : ' ' : line) : rest ->
      case context of
        Just (CodeBlock code) ->
          parseLines (Just (CodeBlock (code <> [line]))) rest

        _ ->
          maybe id (:) context (parseLines (Just (CodeBlock [line])) rest)

    -- Paragraph case
    currentLine : rest ->
      let
        line = trim currentLine
      in
        if line == ""
          then
            maybe id (:) context (parseLines Nothing rest)
          else
            case context of
              Just (Paragraph paragraph) ->
                parseLines (Just (Paragraph (unwords [paragraph, line]))) rest
              _ ->
                maybe id (:) context (parseLines (Just (Paragraph line)) rest)

trim :: String -> String
trim = unwords . words

How do we know our parser works correctly?

In an earlier chapter, we parsed a few examples of our markup language by hand. Now, we can try to test our parser by comparing our solutions to our parser. By adding Eq to the list of type class instances to derive for our Structure data type (as shown on the line marked with (1) in "final module" above), we can add these to our module and use the == (equals) operator to compare our solutions to the result our parser gives.

Try it in GHCi! You can read a text file in GHCi using the following syntax:

ghci> txt <- readFile "/tmp/sample.txt"

And then compare with the hand written example values from the solutions (after adding them to the module and loading them in GHCi):

ghci> parse txt == example4

In a later chapter, we'll discuss how to use a testing framework and write automated tests for our parser. But first, I'd like to glue things together so we'll be able to:

  1. Read markup text from a file
  2. Parse the text
  3. Convert the result to our HTML EDSL
  4. Generate HTML code

And also discuss how to work with IO in Haskell while we're at it.

Gluing things together

In this chapter we are going to glue the pieces that we built together and build an actual blog generator. We will:

  1. Read markup text from a file
  2. Parse the text to a Document
  3. Convert the result to our Html EDSL
  4. Generate HTML code
  5. Write it to file

While doing so, we will learn:

  • How to work with IO
  • How to import external libraries to process whole directories and create a simple command-line interface

Converting Markup to HTML

One key part is missing before we can do glue everything together, and that is to convert our Markup data types to Html.

We'll start by creating a new module and import both the Markup and the Html modules.

module Convert where

import qualified Markup
import qualified Html

Qualified Imports

This time, we've imported the modules qualified. Qualified imports means that instead of exposing the names that we've defined in the imported module to the general module name space, they now have to be prefixed with the module name.

For example, parse becomes Markup.parse. If we would've imported Html.Internal qualified, we'd have to write Html.Internal.el which is a bit long.

We can also give a new name to the module to be used instead with the as keyword:

import qualified Html.Internal as HI

And now write HI.el instead.

I like using qualified imports because readers do not have to guess where a name came from. Some modules are even designed to be imported qualified. For example, many container APIs such as maps, sets and vectors have very similar API. If we want to multiple containers in a single module we pretty much have to use qualified imports so that when we write a function such as singleton, which creates a container with a single value, GHC will know to which singleton function from which module we are referring.

Some people prefer to use import lists instead of qualified imports as well, because qualified names can be a bit verbose and noisy. I usually prefer them, but up to you. For more information about imports, see this wiki article.

Converting Markup.Structure to Html.Structure

Converting a markup structure to an HTML structure is mostly straightforward at this point, we need to pattern match on the markup structure and use the relevant HTML API.

convertStructure :: Markup.Structure -> Html.Structure
convertStructure structure =
  case structure of
    Markup.Header 1 txt ->
      Html.h1_ txt

    Markup.Paragraph p ->
      Html.p_ p

    Markup.UnorderedList list ->
      Html.ul_ $ map Html.p_ list

    Markup.OrderedList list ->
      Html.ol_ $ map Html.p_ list

    Markup.CodeBlock list ->
      Html.code_ (unlines list)

Notice that running this code with -Wall will reveal that the pattern matching is non-exhaustive. This is because we don't currently have a way to build headers that are not h1. There are a few ways to handle this:

  • Ignore the warning - this will like fail at runtime one day and the user will be sad
  • Pattern match other cases and add a nice error with the error function, has the same disadvantage above, but will also not notify that there's a possible issue here at compile time.
  • Pattern match and do the wrong thing - user is still sad
  • Encode errors in the type system using Either, we'll see how to do this in later chapters
  • Restrict the input - change Markup.Header to not include a number but rather specific supported headers. This is a reasonable approach.
  • Implement an HTML function supporting arbitrary headers. Should be straightforward to do.

Exercise: Implement h_ :: Natural -> String -> Structure.

Solution
import Numeric.Natural

h_ :: Natural -> String -> Structure
h_ n = Structure . el ("h" <> show n) . escape

Don't forget to export it from Html.hs!

Exercise: Fix convertStructure using h_.

Solution
convertStructure :: Markup.Structure -> Html.Structure
convertStructure structure =
  case structure of
    Markup.Header n txt ->
      Html.h_ n txt

    Markup.Paragraph p ->
      Html.p_ p

    Markup.UnorderedList list ->
      Html.ul_ $ map Html.p_ list

    Markup.OrderedList list ->
      Html.ol_ $ map Html.p_ list

    Markup.CodeBlock list ->
      Html.code_ (unlines list)

Document -> Html

In order to create an Html document, we need to use the html_ function. This function expects two things: a Title, and a Structure.

For a title we just could supply it from outside using the file name.

In order to convert our markup Document (which is a list of markup Structure) to an HTML Structure, we need to convert each markup Structure and then concatenate them together.

We already know how to convert each markup Structure, we can use the convertStructure function we wrote and map. This will provide us with the following function:

map convertStructure :: Markup.Document -> [Html.Structure]

To concatenate all of the Html.Structure, we could write a recursive function that tries to do that. However we will quickly run to an issue with the base case, what to do when the list is empty?

We could just provide dummy Html.Structure that represents an empty HTML structure.

Let's add this to Html.Internal:

empty_ :: Structure
empty_ = Structure ""

Now we can write our recursive function. Try it!

Solution
concatStructure :: [Structure] -> Structure
concatStructure list =
  case list of
    [] -> empty_
    x : xs -> x <> concatStructure list

Remember the <> function we implemented as an instance of the Semigroup type class? We said that Semigroup is an abstraction for things that implements (<>) :: a -> a -> a, where <> is associative (a <> (b <> c) = (a <> b) <> c).

It turns out that things that have an "empty" value and are also an instance of Semigroup is quite a common thing. For example a string. And this is actually a well known abstraction. This abstraction is called a monoid.

Monoids

Actually, maybe "empty" isn't a very good description of what we want, and isn't very useful as an abstraction. Instead, we can describe it as what is often called an "identity" element. An identity element is one that satisfy the following laws:

  • x <> <identity> = x
  • <identity> <> x = x

In other words, if we try to use this "empty" - this identity value, as one argument to <>, we will always get the other argument back.

For String, the empty string, "", satisfies this:

"" <> "world" = "world"
"hello" <> "" = "hello"

This is of course true for any value we'd write and not just "world" and "hello".

Actually, if we move out of the Haskell world for a second, even integers with + as the associative binary operations + (in place of <>) and 0 in place of the identity member form a monoid:

17 + 0 = 17
0 + 99 = 99

So integers together with the + operation form a semigroup, and together with 0 form a monoid.

We learn new things from this:

  1. A monoid is a more specific abstraction over semigroup, it builds on it by adding a new condition (the existence of an identity member)
  2. This abstraction can be useful! We can write a general concatStructure that could work for any monoid

And indeed, there exists a type class in base called Monoid which has Semigroup as a super class.

class Semigroup a => Monoid a where
  mempty :: a

Note: this is actually a simplified version. The actual is a bit more complicated because of backwards compatibility and performance reasons. Semigroup was actually introduced in Haskell after Monoid!

We could add an instance of Monoid for our markup Structure data type:

instance Monoid Structure where
  mempty = empty_

And now, instead of using our own concatStructure, we can use the library function

mconcat :: Monoid a => [a] -> a

Which could theoretically be implemented as:

mconcat :: Monoid a => [a] -> a
mconcat list =
  case list of
    [] -> mempty
    x : xs -> x <> mconcat list

(Notice that because Semigroup is a super class of Monoid, we can still use the <> function from the Semigroup class without adding the Semigroup a constraint to the left side of =>. By adding the Monoid a constraint we implicitly add a Semigroup a constraint as well!)

This mconcat function we wrote is very similar to the concatStructure function we wrote, but this one works for any Monoid, including Structure! Abstractions are useful and help us reuse code!

Side note: integers with + and 0 aren't actually an instance of Monoid in Haskell. This is because integers can also form a monoid with * and 1! But there can only be one instance per type. Instead, two other newtypes exist that provide that functionality, Sum and Product. See how they can be used in ghci:

ghci> import Data.Monoid
ghci> Product 2 <> Product 3 -- note, Product is a data constructor
Product {getProduct = 6}
ghci> getProduct (Product 2 <> Product 3)
6
ghci> getProduct $ mconcat $ map Product [1..5]
120

Another abstraction?

We've used map and then mconcat twice now. Surely there has to be a function that unifies this pattern. And indeed, it is called foldMap, and it works not only for lists, it will work for any data structure that can be "folded", or "reduced", into a summary value. This abstraction and type class is called Foldable.

For a simpler understanding of Foldable, we can look at fold:

fold :: (Foldable t, Monoid m) => t m -> m

-- compare with
mconcat :: Monoid m            => [m] -> m

mconcat is just a specialized version of fold for lists. And fold can be a used for any pair of a data structure that implements Foldable and a payload type that implements Monoid. This could be [] with Structure, or Maybe with Product Int, or your new shiny binary tree with String as the payload type. But note that the Foldable type must be of kind * -> *. So for example Html cannot be a Foldable.

foldMap is a function that allows us to apply a function to the payload type of the Foldable type right before combining them with the <> function.

foldMap :: (Foldable t, Monoid m) -> (a -> m) -> t a -> m

-- compare to a specialized version with:
-- - t ~ []
-- - m ~ Html.Structure
-- - a ~ Markup.Structure
foldMap
  :: (Markup.Structure -> Html.Structure)
  -> [Html.Structure]
  -> Html.Structure

True to its name, it really "maps" before it "folds". You might pause here and think "this 'map' we are talking about isn't specific for lists, maybe that's another abstraction?", yes. It is actually a very important and fundamental abstraction called Functor. But I think we had enough abstractions for this chapter. We'll cover it in a later chapter!

Finishing our conversion module

Let's finish our code by writing convert:

convert :: Html.Title -> Markup.Document -> Html.Html
convert title = Html.html_ title . foldMap convertStructure

Now we have a full implementation and are able to convert markup documents to HTML:

-- Convert.hs
module Convert where

import qualified Markup
import qualified Html

convert :: Html.Title -> Markup.Document -> Html.Html
convert title = Html.html_ title . foldMap convertStructure

convertStructure :: Markup.Structure -> Html.Structure
convertStructure structure =
  case structure of
    Markup.Header n txt ->
      Html.h_ n txt

    Markup.Paragraph p ->
      Html.p_ p

    Markup.UnorderedList list ->
      Html.ul_ $ map Html.p_ list

    Markup.OrderedList list ->
      Html.ol_ $ map Html.p_ list

    Markup.CodeBlock list ->
      Html.code_ (unlines list)

Summary

We learned about:

  • Qualified imports
  • Ways to handle errors
  • The Monoid type class and abstraction
  • The Foldable type class and abstraction

Next, we are going to glue our functionality together and learn about I/O in Haskell!

Working with IO

In previous chapters we were able to build a parser from a text string to a Haskell representation of our markup language, and we built an EDSL for easy writing of HTML code. However, our program is still not useful to other users because we did not make this functionality accessible to a user via some sort of user interface.

In our program, we'd like to learn from the user what they'd like our program to convert to HTML, and then convert that for them. There are many ways to design this kind of interface, for example:

  • Get text as input via the standard input and output HTML via the standard output
  • Receive two file names as command-line arguments, read the contents of the first one, and write the output to the second one
  • Ask for some fancier command-line arguments parsing and prefix the file with a flag saying what they are
  • Some fancy GUI interface
  • Combination of all of the above

To make this interesting, we will start with the following interface:

  1. If the user calls the program without arguments, we will read from the standard input, and write to the standard output
  2. If the user calls the program with two arguments, the first one will be the input file name which we will read, and the second one will be the output file name to which we will write
  3. If the output file already exists, we'll ask the user if they want to override the file
  4. On any other kind of input, we'll output a generic message explaining the usage as we described above

In a later chapter, we will add a little fancier command-line interface using a library, and also read whole directories and not single files.

But first, we need to learn a bit about I/O in Haskell, what makes it special, and why it's a bit different from other programming languages.

Purely functional

Originally, Haskell was designed in order to become an open standard functional programming language with non-strict semantics, to serve as a unifying language for future research in functional language design.

In GHC Haskell, we use a lazy evaluation strategy to implement non-strict semantics (We've talked about laziness before).

The requirement for non-strict semantics raises an interesting challenge: How do we design a language that can do more than just evaluate expressions, how do we model interaction with the outside world, how do we do I/O?

The challenge with doing I/O operations in a language with a lazy evaluation strategy is that as programs grow larger, the order of evaluation becomes less trivial to figure out. Consider this hypothetical code example (which won't actually type-check in Haskell, we'll see why soon):

addWithInput :: Int -> Int
addWithInput n = readIntFromStdin + n

main =
  let
    result1 = addWithInput 1
    result2 = addWithInput 2
  in
    print (result2 - result1)

This hypothetical program will read 2 integers from the standard input, and then will subtract the second one (+2) from the first one (+1), or so we would expect if this was a strict language. In a strict language we expect the order of operations to to happen from the top-down.

But in a lazy language we don't actually evaluate an expression until it is needed, and so neither result1 nor result2 are evaluated until we wish to print the result of subtracting one from the other, and then when we try to evaluate -, it requires to evaluate the two arguments in order from left to right, so we first evaluate result2.

Evaluating result2, with substitution, means to replace occurrences of n with the input 2, and then evaluate the top level function (+), which is a primitive function. We then evaluate its arguments, readIntFromStdin and then n; at this point we are reading the first integer from the stdin.

After calculating the result, we can move to evaluate result1, in which, during evaluation, will read the second integer from stdin. This is the complete opposite of what we wanted!

Issues like these make lazy evaluation hard to work with in the presence of side effects - when the evaluation of an expression can affect or be affected by the outside world, this includes reading/writing from mutable memory or performing I/O operations.

We call functions that has side-effects such as addWithInput impure functions. And an unfortunate consequence of impure functions is that they can return different results even when they takes the same input.

The presence of impure functions makes it harder for us to reason about lazy evaluation, and also messes up our ability to use equational reasoning to understand programs.

Therefore, in Haskell, it was decided to only allow pure functions and expressions - ones that have no side effects. Pure functions will always return the same output (given the same input) and evaluating pure expressions is deterministic.

But now, how can we do input/output operations? There are many possible solutions for the design space - for Haskell it was decided to design a first class interface with an accompanied type called IO. IO's interface will force a distinction from non-I/O expressions, and will also require that in order to combine multiple IO operations, we will have to determine the order between them.

IO

IO is an opaque type, like our Html type in which we hid its internal representation from the user behind an interface. But in this case IO is a built-in type that is hidden by the Haskell language rather than a module.

IO has a payload type like Maybe has which represents the result of an IO operation/action/computation. When there isn't a meaningful result, we use the unit type, () (which only has one value: ()) to represent that.

Here are a few IO operations and functions that return IO operations:

putStrLn :: String -> IO ()

getLine :: IO String

getArgs :: IO [String]

lookupEnv :: String -> IO (Maybe String)

writeFile :: FilePath -> String -> IO ()

Notice that each function returns an IO <something>, but what does that mean?

The meaning behind IO a is that it is a description of a program (or subroutine) that when executed will produce some value of type a, and may do some I/O effects during execution.

Executing an IO a is different from evaluating it. Evaluating an IO a expression is pure - the evaluation will always reduce to the same description of a program. This helps us keep purity and equational reasoning!

The Haskell runtime will execute the entry point to the program (the expression main, that must have the type IO ()) in order for our IO operation to also run it has to be combined into the main expression - let's see what that means.

Combining IO expressions

Just like our Html.Structure type, the IO interface provides combinators for composing small IO operations into bigger ones. This interface also makes sure that the order of operations is well defined!

Note that, just like with <> we've defined for Html.Structure, the combinators for IO are implemented as type-class instances rather than specialized variants (for example our append_ function was a specialized version of <> tailored only for Structure).

In this section I will introduce specialized type signatures rather than generalized ones, because I think it'll be easier to digest, but we'll talk about the generalized versions later.

>>=

Our first combinator is >>= (pronounced bind), and is the most useful of the bunch:

(>>=) :: IO a -> (a -> IO b) -> IO b

This combinator takes two arguments, the first is an IO operation, and the second is a function that takes as input the result of the first IO operation and returns a new IO b which is the final result.

Here are a few examples using the functions we described above:

  1. Echo

    getLine >>= (\line -> putStrLn line)
    

    We are reading a line from the standard input on the left of >>=, and receive the input to the left of >>=, and then write it to the standard output. >>='s role here is to pass the result of the IO operation on the left to the function returning an IO operation on the right.

    Notice how >>= defines an order of operations - from left to right.

    The type of each sub expression here is:

    getLine :: IO String
    
    putStrLn :: String -> IO ()
    
    (>>=) :: IO String -> (String -> IO ()) -> IO ()
    
    line :: String
    
    • Question: what is the type of the whole expression?
      AnswerIO ()

    Also note that this example above can be written in a more concise manner in point free style getLine >>= putStrLn.

  2. Appending two inputs

    getLine >>= (\honorific -> getLine >>= (\name -> putStrLn ("Hello " ++ honorific ++ " " ++ name)))
    

    This subroutine combines multiple operations together, it reads two lines from the standard input and prints a greeting. Note that:

    • Using >>= defines the order of operation from left to right
    • Because of the scoping rules in Haskell, honorific will be in scope in the body of the function for which it is its input, including the most inner function

    This is a bit hard to read, but we can remove the parenthesis and add indentation to make it a bit easier to read:

    getLine >>= \honorific ->
      getLine >>= \name ->
        putStrLn ("Hello " ++ honorific ++ " " ++ name)
    

Let's see a few more combinators!

*> and >>

(*>) :: IO a -> IO b -> IO b
(>>) :: IO a -> IO b -> IO b

*> and >> have the same type signature for IO and mean the same thing, in fact, *> is a slightly more generalized version of >> and can always be used instead of >>, which only still exists to avoid breaking backward compatibility.

*> for IO means run the first IO operation, discard the result then run the second operation. It can be implemented using >>=:

a *> b = a >>= \_ -> b

This combinator is useful when we want to run several IO operations one after the other that might not return anything meaningful, such as putStrLn:

putStrLn "hello" *> putStrLn "world"

pure and return

pure :: a -> IO a

like *> and >>, pure is a more general version of return. pure also has the advantage of not having a resemblance to an unrelated keyword in other languages.

Remember that we said IO a is description of a program that when executed will produce some value of type a and may do some I/O effects during execution?

With pure, we can build an IO a that does no I/O, and will produce a specific value of type a, the one we supply to pure!

This function is useful when we want to do some uneffectful computation that depends on IO.

For example:

confirm :: IO Bool
confirm =
  putStrLn "Are you sure? (y/n)" *>
    getLine >>= \answer ->
      case answer of
        "y" -> pure True
        "n" -> pure False
        _ ->
          putStrLn "Invalid response. use y or n" *>
            confirm

Trying to return just True or False here wouldn't work because of the type of >>=:

(>>=) :: IO a -> (a -> IO b) -> IO b

The right side of >>= in our code example (\answer -> case ...) must be of type String -> IO Bool. This is because:

  1. getLine :: IO String, so the a in the type signature of >>= should be the same as String in this instance, and
  2. confirm :: IO Bool, so b should be Bool

fmap and <$>

fmap :: (a -> b) -> IO a -> IO b

<$> is the infix version of fmap. Use it at your discretion.

What if we wanted to write a function that reads a line from stdin and returns it with ! at the end? We could write it using a combination of >>= and pure:

getLine >>= \line -> pure (line ++ "!")

The pattern is unified to the fmap function:

fmap (\line -> line ++ "!") getLine

What fmap does is apply a function to the value to be returning from the IO operation, also known as "mapping" over it.

(By the way, Have you noticed the similarities between fmap and map :: (a -> b) -> [a] -> [b]?)

Summary

Here's a list of IO combinators we ran into:

-- chaining IO operations: passing the *result* of the left IO operation
-- as an argument to the function on the right.
-- Pronounced "bind".
(>>=) :: IO a -> (a -> IO b) -> IO b

-- sequence two IO operations, discarding the payload of the first.
(*>) :: IO a -> IO b -> IO b

-- "lift" a value into IO context, does not add any I/O effects.
pure :: a -> IO a

-- "map" (or apply a function) over the payload value of an IO operation.
fmap :: (a -> b) -> IO a -> IO b

IO is first class

The beauty of IO is that it's a completely first-class construct in the language, and is not really different from Maybe, Either or Structure. We can pass it to functions, put it in a container, etc. Remember that it represents a description of a program, and without combining it into main in some way won't actually do anything. It is just a value!

Here's an example of a function that takes IO actions as input:

whenIO :: IO Bool -> IO () -> IO ()
whenIO cond action =
  cond >>= \result ->
    if result
      then action
      else pure ()

And how it can be used:

main :: IO ()
main =
  putStrLn "This program will tell you a secret" *>
    whenIO confirm (putStrLn "IO is actually pretty awesome") *>
      putStrLn "Bye"

Notice how putStrLn "IO is actually pretty awesome" isn't executed right away, but only if it is what whenIO returns, and in turn is combined with *> as part of the main expression.

Getting out of IO?

What we've seen above has great consequences to the Haskell language. In our Html type, we had a function render :: Html -> String that could turn an Html to a string value.

In Haskell, there is no way to implement a function such as execute :: IO a -> a in a way that preserves purity and equational reasoning!

Also, IO is opaque, it does not let us examine it. So we are really bound to what the Haskell API for IO allows us to do.

This means that we need to think about using IO differently!

In Haskell, once we get into IO, there is no getting out. The only thing we can do is build bigger IO computations by combining it with more IO computations.

We also can't use IO a in place of an a, so for example, we can't write getLine ++ "!" because ++ expects both sides to be Strings, and getLine's type is IO String. The types do not match! We have to use fmap and the return type must be IO String, like we've seen before.

In Haskell we like to keep IO usage minimal, and we like to push it to the edges of the program. This pattern is often called Functional core, imperative shell.

Functional core, imperative shell

In our blog generator program, we want to read a file, parse it, and convert it to HTML, and then print the result to the console.

In many programming languages, we might interleave reading from the file with parsing, and writing to the file with the HTML conversion. But we don't mix these here. Parsing operates on a String value rather than some file handle, and Html is being converted to a String rather than being written to the screen directly.

This approach of separating IO and pushing it to the edge of the program gives us a lot of flexibility. These functions without IO are easier to test and examine (because they are guaranteed to have deterministic evaluation!), and they are more modular and can work in many contexts (reading from stdin, reading from network socket, writing to an HTTP connection, and more).

This pattern is often a good approach for building Haskell programs, especially batch programs.

Building a blog generator

We'd like to start building a blog generator, and we want to have the following interface:

  1. If the user calls the program without arguments, we will read from the standard input, and write to the standard output
  2. If the user calls the program with two arguments, the first one will be the input file name which we will read, and the second one will be the output file name to which we will write
  3. If the output file already exists, we'll ask the user if they want to override the file
  4. On any other kind of input, we'll output a generic message explaining the usage as we described above

We are going to need a few functions:

getArgs :: IO [String] -- Get the program arguments

getContents :: IO String -- Read all of the content from stdin

readFile :: FilePath -> IO String -- Read all of the content from a file

writeFile :: FilePath -> String -> IO () -- Write a string into a file

doesFileExist :: FilePath -> IO Bool -- Checks whether a file exists

And the following imports:

import System.Directory (doesFileExist)
import System.Environment (getArgs)

We don't need to add the following import because Prelude already imports these functions for us:

-- imported by Prelude
import System.IO (getContents, readFile, writeFile)

  1. Implement a function process :: Title -> String -> String which will parse a document to markup, convert it to HTML, and then render the HTML to a string.

    Answer
    process :: Html.Title -> String -> String
    process title = Html.render . convert title . Markup.parse
    
  2. Try implementing the "imperative shell" for our blog generator program. Start with main, pattern match on the result of getArgs, and decide what to do. Look back at the examples above for inspiration.

    Answer
    -- Main.hs
    module Main where
    
    import qualified Markup
    import qualified Html
    import Convert (convert)
    
    import System.Directory (doesFileExist)
    import System.Environment (getArgs)
    
    main :: IO ()
    main =
      getArgs >>= \args ->
        case args of
          -- No program arguments: reading from stdin and writing to stdout
          [] ->
            getContents >>= \content ->
              putStrLn (process "Empty title" content)
    
          -- With input and output file paths as program arguments
          [input, output] ->
            readFile input >>= \content ->
              doesFileExist output >>= \exists ->
                let
                  writeResult = writeFile output (process input content)
                in
                  if exists
                    then whenIO confirm writeResult
                    else writeResult
    
          -- Any other kind of program arguments
          _ ->
            putStrLn "Usage: runghc Main.hs [-- <input-file> <output-file>]"
    
    process :: Html.Title -> String -> String
    process title = Html.render . convert title . Markup.parse
    
    confirm :: IO Bool
    confirm =
      putStrLn "Are you sure? (y/n)" *>
        getLine >>= \answer ->
          case answer of
            "y" -> pure True
            "n" -> pure False
            _ -> putStrLn "Invalid response. use y or n" *>
              confirm
    
    whenIO :: IO Bool -> IO () -> IO ()
    whenIO cond action =
      cond >>= \result ->
        if result
          then action
          else pure ()
    

Do notation

While using >>= to chain IO actions is manageable, Haskell provides an even more convenient syntactic sugar called do notation which emulates imperative programming.

A do block starts with the do keyword, and continues with one or more "statements" which can be one of the following:

  1. An expression of type IO (), such as:
    • putStrLn "Hello"
    • if True then putStrLn "Yes" else putStrLn "No"
  2. A let block, such as
    • let x = 1
    • or multiple let declarations:
      let
        x = 1
        y = 2
      
      Note that we do not write the in here.
  3. A binding <variable> <- <expresion>, such as
    line <- getLine
    

And the last "statement" must be an expression of type IO <something> - this will be the result type of the do block.

These constructs are desugared (translated) by the Haskell compiler to:

  1. <expression> *>,
  2. let ... in and
  3. <expression> >>= \<variable>

respectively.

For example:

greeting :: IO ()
greeting = do
  putStrLn "Tell me your name."
  let greet name = "Hello, " ++ name ++ "!"
  name <- getLine
  putStrLn (greet name)

Is just syntactic sugar for:

greeting :: IO ()
greeting =
  putStrLn "Tell me your name." *>
    let
      greet name = "Hello, " ++ name ++ "!"
    in
      getLine >>= \name ->
        putStrLn (greet name)

It's important to note the difference between let and <- (bind). let is used to give a new name to an expression which will be in scope for subsequent lines, and <- is used to bind the result a in an IO a action to a new name which will be in scope for subsequent lines.

code operator type of the left side type of the right side comment
let gretting = "hello"
=
String
String
Both sides are interchangeable
let mygetline = getLine
=
IO String
IO String
We just create a new name for getLine
name <- getLine
<-
String
IO String
Technically <- is not an operator, but just a syntactic sugar for >>= + lambda, where we bind the result of the computation to a variable

Do notation is very very common and is often preferable to using >>= directly.


  1. Exercise: Translate the examples in this chapter to do notation.

  2. Exercise: Translate our glue code for the blog generator to do notation.

    Solution
    -- Main.hs
    module Main where
    
    import qualified Markup
    import qualified Html
    import Convert (convert)
    
    import System.Directory (doesFileExist)
    import System.Environment (getArgs)
    
    main :: IO ()
    main = do
      args <- getArgs
      case args of
        -- No program arguments: reading from stdin and writing to stdout
        [] -> do
          content <- getContents
          putStrLn (process "Empty title" content)
    
        -- With input and output file paths as program arguments
        [input, output] -> do
          content <- readFile input
          exists <- doesFileExist output
          let
            writeResult = writeFile output (process input content)
          if exists
            then whenIO confirm writeResult
            else writeResult
    
        -- Any other kind of program arguments
        _ ->
          putStrLn "Usage: runghc Main.hs [-- <input-file> <output-file>]"
    
    process :: Html.Title -> String -> String
    process title = Html.render . convert title . Markup.parse
    
    confirm :: IO Bool
    confirm = do
      putStrLn "Are you sure? (y/n)"
      answer <- getLine
      case answer of
        "y" -> pure True
        "n" -> pure False
        _ -> do
          putStrLn "Invalid response. use y or n"
          confirm
    
    whenIO :: IO Bool -> IO () -> IO ()
    whenIO cond action = do
      result <- cond
      if result
        then action
        else pure ()
    

Summary

In this chapter we discussed what "purely functional" means, where the initial motivation for being purely functional came from, and how Haskell's I/O interface allows us to create descriptions of programs without breaking purity.

We have also achieved a major milestone. With this chapter, we implemented enough pieces that we can finally run our program on a single document and get an HTML rendered result!

You can view the code we have until now here.

However, our command-line interface is still sub-par. We want to render a blog with multiple articles, create an index page, and more. We still have more to do to be able to call our program a blog generator.

Let's keep going!

Defining a project description

Up until now we've only used base and the libraries included with GHC. Because of that we didn't really need to do anything more fancy than runghc to run our program. However, we want to start using external libraries which are not included with GHC in our programs.

External packages can be downloaded from Hackage - Haskell's central package archive, Stackage - a subset of Hackage packages that are known to work together, or even from remote git repositories. Usually Haskellers use a package manager to download and manage packages for different projects. The most popular package managers for Haskell are cabal and stack.

A major difference between the two tools is their philosophy. cabal tries to be a more minimalist tool that handles building Haskell projects, doing package management using the whole of Hackage, and uses complicated algorithms to make sure packages work together. stack tries to be a more maximalistic tool that handles installing the right GHC for each project, provides integration with external tools like hoogle, and lets the user choose which 'set' of packages (including their versions) they want to use.

If you've installed Haskell using GHCup, you most likely have cabal installed. If you've installed Haskell using stack, well, you have stack installed. Check the haskell.org downloads page if that's not the case.

Creating a project

Using external packages can be done in multiple ways. For quick experimentation, we can just ask stack or cabal to build or even run our program with external packages. But as programs get larger, use more dependencies, and require more functionality, it is better to just create a project description for our programs and even libraries.

Describing packages is done in a cabal file. We can ask cabal or stack to generate one for use using cabal init --libandexe or stack new, along with many other files, but we will likely need to edit the file by hand later so let's just paste some sort of an initial example and then edit it.

cabal-version:       2.4

name:                name should match with <name>.cabal
version:             version should use PvP
synopsis:            Synopsis will appear in the hackage package listing and search
description:         The description will appear at the top of a library
homepage:            Homepage url
bug-reports:         issue-tracker url
license:             License name
license-file:        License file
author:              Author name
maintainer:          Maintainer email
category:            Hackage categories, separated by commas
extra-doc-files:
  README.md

common common-settings
  default-language: Haskell2010
  ghc-options:
    -Wall

library
  import: common-settings
  hs-source-dirs: src
  build-depends:
      base
    , directory
  exposed-modules:
    HsBlog
      HsBlog.Convert
      HsBlog.Html
        HsBlog.Html.Internal
      HsBlog.Markup
  -- other-modules:

executable hs-blog-gen
  import: common-settings
  hs-source-dirs: app
  main-is: Main.hs
  build-depends:
      base
    , <package-name>
  ghc-options:
    -O

Let's break it down to a few parts, the package metadata, common settings, library and executable.

Package metadata

The first part should be fairly straightforward from the comments, maybe except for:

Let's fill this with the metadata of our project:

cabal-version:       2.4

name:                hs-blog
version:             0.1.0.0
synopsis:            A custom blog generator from markup files
description:         This package provides a static blog generator
                     from a custom markup format to HTML.
                     It defines a parser for this custom markup format
                     as well as an html pretty printer EDSL.

                     It is used as the example project in the online book
                     'Learn Haskell Blog Generator'. See the README for
                     more details.
homepage:            https://github.com/soupi/learn-haskell-blog-generator
bug-reports:         https://github.com/soupi/learn-haskell-blog-generator/issues
license:             BSD-3-Clause
license-file:        LICENSE.txt
author:              Gil Mizrahi
maintainer:          gilmi@posteo.net
category:            Learning, Web
extra-doc-files:
  README.md

Common settings

Cabal package descriptions can include multiple "targets": libraries, executables, and test suites. Since Cabal 2.2, we can use common stanzas to group settings to be shared between different targets, so we don't have to repeat them for each target.

In our case we've created a new common stanza (or block) called common-settings and defined the default language (Haskell has two standards, 98 and 2010), and instructed GHC to compile with -Wall.

common common-settings
  default-language: Haskell2010
  ghc-options:
    -Wall

Later, in our targets' descriptions, we can add import: common-settings , and all of these settings will be automatically added.

Library

In a library target, we define:

  • The settings with which to build the library (in this case we just import common-settings)
  • The directory in which the source files can be found
  • The packages we require to build the library
  • The modules exposed from the library and can be used by others
  • The modules not exposed from the library and which cannot be used by others; these could be any module you don't wish to export, such as an internal utility functions module. In our case we don't have anything like this, so we commented out the other-modules label.

Note that it is common to specify version bounds for packages. Version bounds specify which package versions this library works with. These can also be generated using cabal with the cabal gen-bounds command.

library
  import: common-settings
  hs-source-dirs: src
  build-depends:
      base
    , directory
  exposed-modules:
    HsBlog
      HsBlog.Convert
      HsBlog.Html
        HsBlog.Html.Internal
      HsBlog.Markup
  -- other-modules:

Also note that we've added an additional hierarchy for our modules and defined a different source directory. This means we will need to move the files around a bit and change the module name in each file and the import statements. This is to avoid conflict with other packages that a user might import.


Do this now.

Solution
  1. Main.hs -> src/HsBlog.hs

    module HsBlog
      ( main
      , process
      )
      where
    
    import qualified HsBlog.Markup as Markup
    import qualified HsBlog.Html as Html
    import HsBlog.Convert (convert)
    
  2. Convert.hs -> src/HsBlog/Convert.hs

    module HsBlog.Convert where
    
    import qualified HsBlog.Markup as Markup
    import qualified HsBlog.Html as Html
    
  3. Html.hs -> src/HsBlog/Html.hs

    module HsBlog.Html
    ...
    
    import HsBlog.Html.Internal
    
  4. Html/Internal.hs -> src/HsBlog/Html/Internal.hs

    module HsBlog.Html.Internal where
    
  5. Markup.hs -> src/HsBlog/Markup.hs

    module HsBlog.Markup
    

Executable

We have separated our code into two sections: a library and an executable, why?

First, libraries can be used by others. If we publish our code and someone wants to use it and build upon it, they can. Executables can't be imported by other projects. Second, we can write unit tests for libraries. It is usually benefitical to write most, if not all, of our logic as a library, and provide a thin executable over it.

Executables' descriptions are very similar to libraries, here we define:

  • The name of the executable
  • Where the source directory for this application is
  • Which file is the 'Main' file
  • Import our library, which is named hs-blog
  • Add additional flag for GHC: -O to compile with optimizations
executable hs-blog-gen
  import: common-settings
  hs-source-dirs: app
  main-is: Main.hs
  build-depends:
      base
    , hs-blog
  ghc-options:
    -O

We can write many executables descriptions. In this case we only need one.


Exercise: Add a new file: app/Main.hs which imports HsBlog and runs main.

Solution
-- app/Main.hs

module Main where

import qualified HsBlog

main :: IO ()
main = HsBlog.main

Test-suites

test-suite defines a target for running package tests. We will get back to it in a later chapter.

Our complete .cabal file

cabal-version:       2.4

name:                hs-blog
version:             0.1.0.0
synopsis:            A custom blog generator from markup files
description:         This package provides a static blog generator
                     from a custom markup format to HTML.
                     It defines a parser for this custom markup format
                     as well as an html pretty printer EDSL.

                     It is used as the example project in the online book
                     'Learn Haskell Blog Generator'. See the README for
                     more details.
homepage:            https://github.com/soupi/learn-haskell-blog-generator
bug-reports:         https://github.com/soupi/learn-haskell-blog-generator/issues
license:             BSD-3-Clause
license-file:        LICENSE.txt
author:              Gil Mizrahi
maintainer:          gilmi@posteo.net
category:            Learning, Web
extra-doc-files:
  README.md

common common-settings
  default-language: Haskell2010
  ghc-options:
    -Wall

library
  import: common-settings
  hs-source-dirs: src
  build-depends:
      base
    , directory
  exposed-modules:
    HsBlog
      HsBlog.Convert
      HsBlog.Html
        HsBlog.Html.Internal
      HsBlog.Markup
  -- other-modules:

executable hs-blog-gen
  import: common-settings
  hs-source-dirs: app
  main-is: Main.hs
  build-depends:
      base
    , hs-blog
  ghc-options:
    -O

We'll also add a README.md file and a LICENSE.txt file:

README.md

Just write whatever you want here:

# hs-blog

One day it will be a static blog generator.

[Read the book](https://soupi.github.io/learn-haskell-blog-generator).
LICENSE.txt

This is BSD-3-Clause with me as the author. Please write your own name for your projects :)

BSD 3-Clause License

Copyright (c) 2021, Gil Mizrahi
All rights reserved.

Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:

1. Redistributions of source code must retain the above copyright notice, this
   list of conditions and the following disclaimer.

2. Redistributions in binary form must reproduce the above copyright notice,
   this list of conditions and the following disclaimer in the documentation
   and/or other materials provided with the distribution.

3. Neither the name of the copyright holder nor the names of its
   contributors may be used to endorse or promote products derived from
   this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

cabal.project and stack.yaml

The cabal.project and stack.yaml files are used by cabal and stack respectively to add additional information on how to build the package. While cabal.project isn't necessary to use cabal, stack.yaml is necessary in order to use stack, so we will cover it briefly.

There are two important fields a stack.yaml file must have:

  • resolver: Describes which snapshot to use for packages and ghc version. We will choose the latest (at time of writing) on the lts branch: lts-18.9. Visit this link to find out which packages this snapshot includes, what their versions are, and which GHC version is used with this snapshot.
  • packages: Describes the location of packages we plan to build. In our case we have only one and it can be found in the current directory.

We'll add stack.yaml to our project directory:

resolver: lts-18.9

packages:
- .

For additional options and configurations, please consult the relevant user guides.

Usage

Now, instead of manually running runghc Main.hs, we will use either stack or cabal to build and run our program and package (I mostly use stack, but it's up to you).

For cabal:

Building the project - on the first run, cabal will download the package dependencies and use the GHC on PATH to build the project.

Cabal caches packages between projects, so if a new project uses the same packages with the same versions (and the same flag settings) they will not need to be reinstalled. cabal commands are usually prefixed with v2- to note that we want to use the new build system implementation.

In older version of cabal, packages could be installed either globally, or in sandboxes. In each sandbox (and globally) there could only be one version of a package installed, and users would usually create different sandboxes for different projects, without caching packages between projects.

With the new build system implementation, multiple versions of the same package can be installed globally, and for each project cabal will (try to) choose a specific version for each package dependency such that they all work together, without needing sandboxing. This change helps us increase sharing of built packages while avoiding conflicts and manual handling of sandboxes.

A few important commands we should be familiar with:

cabal v2-update

v2-update fetches information from remote package repositories (specifically Hackage unless specified otherwise) and updates the local package index which includes various information about available packages such as their names, versions and dependencies.

Usually the first command to run before fetching package dependencies.

cabal v2-build

v2-build compiles the various targets (such as library and executables). It will also fetch and install the package dependencies when they're not already installed.

cabal v2-run hs-blog-gen -- <program arguments>

v2-run Can be used to compile and then run a target (in our case our executable which we named hs-blog-gen). We separate arguments passed to cabal and arguments passed to our target program with --.

cabal v2-repl hs-blog

v2-repl runs ghci in the context of the target (in our case our library which we named hs-blog) - it will load the target's package dependencies and modules to be available in ghci.

cabal v2-clean

v2-clean Deletes the build artifacts that we built.

There are more interesting commands we could use, such as cabal v2-freeze to generate a file which records the packages versions and flags we used to build this project, and cabal v2-sdist to bundle the project source to a package tarball which can be uploaded to Hackage. If you'd like to learn more visit the Cabal user guide.

For stack:

Building the project - on the first run, stack will install the right GHC for this project which is specified by the resolver field in the stack.yaml file, download the package dependencies, and compile the project.

Stack caches these installations between projects that use the same resolver, so future projects with the same resolver and future runs of this project won't require reinstallation. This approach is kind of a middle ground between full packages sharing and sandboxes.

Let's look at the (somewhat) equivalent commands for Stack:

stack build

build will compile the project as described above - installing GHC and package dependencies if they are not installed.

stack exec hs-blog-gen -- <program arguments>

exec will run the executable passing the program arguments to our executable.

stack ghci hs-blog

ghci runs ghci in the context of our library hs-blog - loading the library modules and packages.

stack clean

clean cleans up build artifacts.

The Stack user guide contains more information about how stack works and how to use it effectively.

Build artifacts

Both stack and cabal create build artifacts that we will not want to track using our version control. These build artifacts are found in the dist, dist-newstyle and .stack-work directories. We can add these to a .gitignore file (or similar for other version control programs) to ignore them:

dist
dist-newstyle
.stack-work

Finding packages

Finding packages isn't a very straightforward process at the moment. People have written on how they choose packages, recommendation lists, and more.

My suggestion is:

  • Search for a tutorial on something you'd like to do, and see which packages come up
  • Use the download amount on Hackage as an indication of package popularity
  • Use Stackage package synopses to locate a relevant package
  • Check social network channels for recommendations, but know that sometimes people tend to recommend inappropriate solutions and packages that might be too complicated or still experimental

It's also important to note the amount of dependencies a package has. Adding many dependencies will affect compilation time and code size. And it can sometimes be a good thing to consider when comparing packages, or considering whether a package is needed at all.

Summary

We've created a package description for our library and used stack or/and cabal to build our program. In future chapters we'll start adding external packages, we'll only have to add them to the build-depends section in the cabal file and our package manager will download and install the required package for us!

We've made some change to our project directory, and it should now look like this:

.
├── app
│   └── Main.hs
├── hs-blog.cabal
├── LICENSE.txt
├── README.md
├── src
│   ├── HsBlog
│   │   ├── Convert.hs
│   │   ├── Html
│   │   │   └── Internal.hs
│   │   ├── Html.hs
│   │   └── Markup.hs
│   └── HsBlog.hs
└── stack.yaml

4 directories, 10 files

Note that this package format is something we could release on Hackage for other Haskell developers to use!

You can view the git commit of the changes we've made and the code up until now.

Fancy options parsing

We'd like to define a nicer interface for our program. And while we could manage something ourselves with getArgs and pattern matching, using a library for this case is easier. We are going to use a package called optparse-applicative.

optparse-applicative provides us with an EDSL (yes, another one) to build command arguments parsers. Things like commands, switches, and flags can be built and composed together to make a parser for command-line arguments without actually writing operations on strings as we did when we wrote our Markup parser, and will provide other benefits such as automatic generation of usage lines, help screens, error reporting, and more.

While optparse-applicative's dependency footprint isn't very large, it is likely that a user of our library wouldn't need command-line parsing in this particular case, so it makes sense to add this dependency on the executable rather than the library in the cabal file:

executable hs-blog-gen
  import: common-settings
  hs-source-dirs: app
  main-is: Main.hs
  build-depends:
      base
    , optparse-applicative
    , hs-blog
  ghc-options:
    -O

Building a command-line parser

The optparse-applicative package has pretty decent documentation, but we will cover a few important things to pay attention to in this chapter.

In general, there are four important things we need to do:

  1. Define our model - we want to define an ADT that describes the various options and commands for our program.

  2. Define a parser that will produce our value of our model type when run

  3. Run the parser on our program arguments input

  4. Pattern match on the model and call the right operations according to the options

Define a model

Let's envision our command-line interface for a second, what would we like it to look like?

We want to be able to convert a single file or input stream and produce either a file or an output stream, or we want to process a whole directory and create a new directory. We can model it in an ADT like this:

data Options
  = ConvertSingle SingleInput SingleOutput
  | ConvertDir FilePath FilePath
  deriving Show

data SingleInput
  = Stdin
  | InputFile FilePath
  deriving Show

data SingleOutput
  = Stdout
  | OutputFile FilePath
  deriving Show

Note that we could technically also use Maybe FilePath to encode both SingleInput and SingleOutput, but then we would have to remember what Nothing meant in each context. By creating a new type with properly named constructors for each option we make it easier for readers of the code to understand the meaning of our code.

In terms of interface, we could decide that when a user would like to convert a single input source, they would use the convert command, and supply the optional flags --input FILEPATH and --output FILEPATH to read or write from a file. When the user does not supply one or both flag, we will read or write from the standard input/output accordingly instead.

If the user would like to convert a directory, they can use the convert-dir command and supply the two mandatory flags --input FILEPATH and --output FILEPATH.

Build a parser

This is the most interesting part of the process. How do we build a parser that fits our model?

The optparse-applicative library introduces a new type called Parser. Parser, similar to Maybe and IO, has the kind * -> * - when it is supplied with a saturated (or concrete) type such as Int, Bool or Options, it can become a saturated type (one that has values).

A Parser a represents a specification of a parser for a set of options that will produce a value of type a when the command-line arguments are successfully parsed. This is a bit similar to how IO a represents a description of a program that can produce a value of type a. The main difference between these two types is that while we can't really convert an IO a to an a (we just chain IO operations and have the Haskell runtime execute them), we can convert a Parser a to a function that takes a list of strings representing the program arguments and produces an a if it manages to parse the arguments.

As we've seen with previous EDSLs, this library uses the combinator pattern as well. We need to consider what are the basic primitives for building a parser, and what are the methods of composing small parsers into bigger parsers.

Let's see an example for a small parser:

inp :: Parser FilePath
inp =
  strOption
    ( long "input"
      <> short 'i'
      <> metavar "FILE"
      <> help "Input file"
    )

out :: Parser FilePath
out =
  strOption
    ( long "output"
      <> short 'o'
      <> metavar "FILE"
      <> help "Output file"
    )

strOption is a parser builder. It is a function that takes option modifiers as an argument, and returns a parser that will parse a string. We can specify the type to be FilePath because FilePath is an alias to String. The parser builder describes how to parse the value, and the modifiers describe its properties, such as the flag name, the shorthand of the flag name, and how it would be described in the usage and help messages.

Actually strOption can return any string type that implements the interface IsString. There are a few such types, such as Text, a much more efficient Unicode text type from the text package. It is more efficient than String because while String is implemented as a linked list of Char, Text is implemented as an array of bytes. Text is usually what we should use for text values. We haven't been using it up until now because it is slightly less ergonomic to use than String. But it is often the preferred type to use for text!

As you can see, modifiers can be composed using the <> function, which means they are an instances of Semigroup!

With such interface it means that we don't have to supply all of the modifier options, we can just use the ones that are relevant. So if we don't want to have a shortened flag name, we don't have to add it.

Functor

For the data type we've defined, having Parser FilePath takes us a good step in the right direction, but it is not exactly what we need for a ConvertSingle. We need a Parser SingleInput and a Parser SingleOutput. If we had a FilePath, we could convert it into SingleInput by using the InputFile constructor. Remember, InputFile is also a function:

InputFile :: FilePath -> SingleInput
OutputFile :: FilePath -> SingleOutput

However, to convert a parser, we need functions with these types:

f :: Parser FilePath -> Parser SingleInput
g :: Parser FilePath -> Parser SingleOutput

Fortunately, the Parser interface provides us with a function to "lift" a function like FilePath -> SingleInput to work on parsers, making it a function with the type Parser FilePath -> Parser SingleInput. Of course, this function will work for any input and output, so if we have a function with the type a -> b, we can pass it to that function and get a new function of the type Parser a -> Parser b.

This function is called fmap:

fmap :: (a -> b) -> Parser a -> Parser b

-- Or with its infix version
(<$>)  :: (a -> b) -> Parser a -> Parser b

We've seen fmap before in the interface of other types:

fmap :: (a -> b) -> [a] -> [b]

fmap :: (a -> b) -> IO a -> IO b

fmap is a type class function like <> and show. It belongs to the type class Functor:

class Functor f where
  fmap :: (a -> b) -> f a -> f b

And it has the following laws:

-- 1. Identity law:
--    if we don't change the values, nothing should change
fmap id = id

-- 2. Composition law:
--    Composing the lifted functions is the same a composing
--    them after fmap
fmap (f . g) == fmap f . fmap g

Any type f that can implement fmap and follow these laws can be an instance of functor.

Notice how f has a kind * -> *, we can infer the kind of f by looking at the other types in the type signature of fmap:

  1. a and b have the kind * because they are used as arguments/return types of functions
  2. f a has the kind * because it is used as an argument to a function, therefore
  3. f has the kind * -> *

Let's choose a data type and see if we can implement a Functor instance. We need to choose a data type that has the kind * -> *. Maybe fits the bill. We need to implement a function fmap :: (a -> b) -> Maybe a -> Maybe b. Here's one very simple (and wrong) implementation:

mapMaybe :: (a -> b) -> Maybe a -> Maybe b
mapMaybe func maybeX = Nothing

check it yourself! It compiles and everything! But unfortunately it does not satisfy the first law. fmap id = id means that mapMaybe id (Just x) == Just x, however from the definition we can clearly see that mapMaybe id (Just x) == Nothing.

This is a good example of how Haskell doesn't help us make sure the laws are satisfied, and why they are important. Unlawful Functor instances will behave differently than we'd expect a Functor to behave. Let's try again!

mapMaybe :: (a -> b) -> Maybe a -> Maybe b
mapMaybe func maybeX =
  case maybeX of
    Nothing -> Nothing
    Just x -> Just (func x)

This mapMaybe will satisfy the functor laws. This can be proved by doing algebra - if we can do substitution and reach the other side of the equation in each law, then the law holds.

Functor is a very important type class, and many types implement this interface. As we know, IO, Maybe, [] and Parser all have the kind * -> *, and all allows us to map over their "payload" type.

Often people try to look for analogies and metaphors to what a type class mean, but type classes with funny names like Functor don't usually have an analogy or a metaphor that fits them in all cases. It is easier to give up on the metaphor and think about it as it is - an interface with laws.

We can use fmap on Parser to make a parser that returns FilePath return a SingleInput or SingleOutput instead:

pInputFile :: Parser SingleInput
pInputFile = fmap InputFile parser
  where
    parser =
      strOption
        ( long "input"
          <> short 'i'
          <> metavar "FILE"
          <> help "Input file"
        )

pOutputFile :: Parser SingleOutput
pOutputFile = OutputFile <$> parser -- fmap and <$> are the same
  where
    parser =
      strOption
        ( long "output"
          <> short 'o'
          <> metavar "FILE"
          <> help "Output file"
        )

Applicative

Now that we have two parsers, pInputFile :: Parser SingleInput and pOutputFile :: Parser SingleOutput, we want to combine them as Options. Again, if we only had SingleInput and SingleOutput, we could use the constructor ConvertSingle:

ConvertSingle :: SingleInput -> SingleOutput -> Options

Can we do a similar trick to the one we saw before with fmap? Does a function exist that can lift a binary function to work on Parsers instead? One with this type signature:

???
  :: (SingleInput -> SingleOutput -> Options)
  -> (Parser SingleInput -> Parser SingleOutput -> Parser Options)

Yes. This function is called liftA2 and it is from the Applicative type class. Applicative (also known as applicative functor) has three primary functions:

class Functor f => Applicative f where
  pure :: a -> f a
  liftA2 :: (a -> b -> c) -> f a -> f b -> f c 
  (<*>) :: f (a -> b) -> f a -> f b

Applicative is another very popular type class with many instances.

Just like any Monoid is a Semigroup, any Applicative is a Functor. This means that any type that wants to implement the Applicative interface should also implement the Functor interface.

Beyond what a regular functor can do, which is to lift a function over a certain f, applicative functors allow us to apply a function to multiple instances of a certain f, as well as "lift" any value of type a into an f a.

You should already be familiar with pure, we've seen it when we talked about IO. For IO, pure lets us create an IO action that would return a specific value without doing IO. With pure for Parser, we can create a Parser that when run will return a specific value as output.

liftA2 and <*> are two functions that can be implemented in terms of one another. <*> is actually the more useful one between the two. Because when combined with fmap (or rather the infix version <$>), it can be used to apply a function with many arguments over many values of the same type which is an instance of an applicative functor.

To combine our two parsers to one, we can use either liftA2 or a combination of <$> and <*>:

-- with liftA2
pConvertSingle :: Parser Options
pConvertSingle =
  liftA2 ConvertSingle pInputFile pOutputFile

-- with <$> and <*>
pConvertSingle :: Parser Options
pConvertSingle =
  ConvertSingle <$> pInputFile <*> pOutputFile

Note that both <$> and <*> associate to the left, so we have invisible parenthesis that look like this:

pConvertSingle :: Parser Options
pConvertSingle =
  (ConvertSingle <$> pInputFile) <*> pOutputFile

Let's take a deeper look at the types of the sub-expressions we have here, to prove that this type-checks:

pConvertSingle :: Parser Options

pInputFile :: Parser SingleInput
pOutputFile :: Parser SingleOutput

ConvertSingle :: SingleInput -> SingleOutput -> Options

(<$>) :: (a -> b) -> Parser a -> Parser b
  -- Specifically, here `a` is `SingleInput`
  -- and `b` is `SingleOutput -> Options`,

ConvertSingle <$> pInputFile :: Parser (SingleOutput -> Options)

(<*>) :: Parser (a -> b) -> Parser a -> Parser b
  -- Specifically, here `a -> b` is `SingleOutput -> Options`
  -- so `a` is `SingleOutput` and `b` is `Options`

-- So we get:
(ConvertSingle <$> pInputFile) <*> pOutputFile :: Parser Options

With <$> and <*> we can chain as many parsers (or any applicative really) as we want. This is because of two things: currying and parametric polymorphism: Because functions in Haskell take exactly one argument and return exactly one, any multiple argument function can be represented as a -> b.

You can find the laws for the applicative functors in this article called Typeclassopedia, which talks about various useful type classes and their laws.

Applicative functors are a very important concept and will appear in various parser interfaces (not just for command-line arguments, but also JSON parsers and general parsers), I/O, concurrency, non-determinism, and more. The reason this library is called optparse-applicative is because it uses the Applicative interface as the main API for constructing parsers.


Exercise: create a similar interface for the ConvertDir constructor of Options.

Solution
pInputDir :: Parser FilePath
pInputDir =
  strOption
    ( long "input"
      <> short 'i'
      <> metavar "DIRECTORY"
      <> help "Input directory"
    )

pOutputDir :: Parser FilePath
pOutputDir =
  strOption
    ( long "output"
      <> short 'o'
      <> metavar "DIRECTORY"
      <> help "Output directory"
    )

pConvertDir :: Parser Options
pConvertDir =
  ConvertDir <$> pInputDir <*> pOutputDir

Alternative

One thing we forgot about is that each input and output for ConvertSingle could also potentially use the standard input and output instead. Up until now we only offered one option: reading from or writing to a file by specifying the flags --input and --output. However, we'd like to make these flags optional, and when they are not specified, use the alternative standard i/o. We can do that by using the function optional from Control.Applicative:

optional :: Alternative f => f a -> f (Maybe a)

optional works on types which implement instances of the Alternative type class:

class Applicative f => Alternative f where 
  (<|>) :: f a -> f a -> f a
  empty :: f a 

Alternative looks very similar to the Monoid type class, but it works on applicative functors. This type class isn't very common and is mostly used for parsing libraries as far as I know. It provides us with an interface to combine two Parsers - if the first one fails to parse, we try the other. It also provides other useful functions such as optional, which will help us with our case:

pSingleInput :: Parser SingleInput
pSingleInput =
  fromMaybe Stdin <$> optional pInputFile

pSingleOutput :: Parser SingleOutput
pSingleOutput =
  fromMaybe Stdout <$> optional pOutputFile

Note that with fromMaybe :: a -> Maybe a -> a we can extract the a out of the Maybe by supplying a value for the Nothing case.

Now we can use these more appropriate functions in pConvertSingle instead:

pConvertSingle :: Parser Options
pConvertSingle =
  ConvertSingle <$> pSingleInput <*> pSingleOutput

Commands and subparsers

We currently have two possible operations in our interface, convert a single source, or convert a directory. A nice interface for selecting the right operation would be via commands. If the user would like to convert a single source, they can use convert, for a directory, convert-dir.

We can create a a parser with commands with the subparser and command functions:

subparser :: Mod CommandFields a -> Parser a

command :: String -> ParserInfo a -> Mod CommandFields a

subparser takes command modifiers (which can be constructed with the command function) as input, and produces a Parser. command takes the command name (in our case "convert" or "convert-dir") and a ParserInfo a, and produces a command modifier. As we've seen before these modifiers have a Monoid instance and they can be composed, meaning that we can append multiple commands to serve as alternatives.

A ParserInfo a can be constructed with the info function:

info :: Parser a -> InfoMod a -> ParserInfo a

This function wraps a Parser with some additional information such as a helper message, description, and more, so that the program itself and each sub command can print some additional information.

Let's see how to construct a ParserInfo:

pConvertSingleInfo :: ParserInfo Options
pConvertSingleInfo =
  info
    (helper <*> pConvertSingle)
    (progDesc "Convert a single markup source to html")

Note that helper adds a helper output screen in case the parser fails.

Let's also build a command:

pConvertSingleCommand :: Mod CommandFields Options
pConvertSingleCommand =
  command "convert" pConvertSingleInfo

Try creating a Parser Options combining the two options with subparser.

Solution
pOptions :: Parser Options
pOptions =
  subparser
    ( command
      "convert"
      ( info
        (helper <*> pConvertSingle)
        (progDesc "Convert a single markup source to html")
      )
      <> command
      "convert-dir"
      ( info
        (helper <*> pConvertDir)
        (progDesc "Convert a directory of markup files to html")
      )
    )

ParserInfo

Since we finished building a parser, we should wrap it up in a ParserInfo and add some information to it to make it ready to run.

opts :: ParserInfo Options
opts =
  info (helper <*> pOptions)
    ( fullDesc
      <> header "hs-blog-gen - a static blog generator"
      <> progDesc "Convert markup files or directories to html"
    )

Running a parser

optparse-applicative provides a non-IO interface to parse arguments, but the most convenient way to use it is to let it take care of fetching program arguments, try to parse them, and throw errors and help messages in case it fails. This can be done with the function execParser :: ParserInfo a -> IO a.

We can place all this options parsing stuff in a new module and then import it from app/Main.hs. Let's do that. Here's what we have up until now:

app/OptParse.hs
-- | Command-line options parsing

module OptParse
  ( Options(..)
  , SingleInput(..)
  , SingleOutput(..)
  , parse
  )
  where

import Data.Maybe (fromMaybe)
import Options.Applicative

------------------------------------------------
-- * Our command-line options model

-- | Model
data Options
  = ConvertSingle SingleInput SingleOutput
  | ConvertDir FilePath FilePath
  deriving Show

-- | A single input source
data SingleInput
  = Stdin
  | InputFile FilePath
  deriving Show

-- | A single output sink
data SingleOutput
  = Stdout
  | OutputFile FilePath
  deriving Show

------------------------------------------------
-- * Parser

-- | Parse command-line options
parse :: IO Options
parse = execParser opts

opts :: ParserInfo Options
opts =
  info (pOptions <**> helper)
    ( fullDesc
      <> header "hs-blog-gen - a static blog generator"
      <> progDesc "Convert markup files or directories to html"
    )

-- | Parser for all options
pOptions :: Parser Options
pOptions =
  subparser
    ( command
      "convert"
      ( info
        (helper <*> pConvertSingle)
        (progDesc "Convert a single markup source to html")
      )
      <> command
      "convert-dir"
      ( info
        (helper <*> pConvertDir)
        (progDesc "Convert a directory of markup files to html")
      )
    )

------------------------------------------------
-- * Single source to sink conversion parser

-- | Parser for single source to sink option
pConvertSingle :: Parser Options
pConvertSingle =
  ConvertSingle <$> pSingleInput <*> pSingleOutput

-- | Parser for single input source
pSingleInput :: Parser SingleInput
pSingleInput =
  fromMaybe Stdin <$> optional pInputFile

-- | Parser for single output sink
pSingleOutput :: Parser SingleOutput
pSingleOutput =
  fromMaybe Stdout <$> optional pOutputFile

-- | Input file parser
pInputFile :: Parser SingleInput
pInputFile = fmap InputFile parser
  where
    parser =
      strOption
        ( long "input"
          <> short 'i'
          <> metavar "FILE"
          <> help "Input file"
        )

-- | Output file parser
pOutputFile :: Parser SingleOutput
pOutputFile = OutputFile <$> parser
  where
    parser =
      strOption
        ( long "output"
          <> short 'o'
          <> metavar "FILE"
          <> help "Output file"
        )

------------------------------------------------
-- * Directory conversion parser

pConvertDir :: Parser Options
pConvertDir =
  ConvertDir <$> pInputDir <*> pOutputDir

-- | Parser for input directory
pInputDir :: Parser FilePath
pInputDir =
  strOption
    ( long "input"
      <> short 'i'
      <> metavar "DIRECTORY"
      <> help "Input directory"
    )

-- | Parser for output directory
pOutputDir :: Parser FilePath
pOutputDir =
  strOption
    ( long "output"
      <> short 'o'
      <> metavar "DIRECTORY"
      <> help "Output directory"
    )

Pattern matching on Options

After running the command-line arguments parser, we can pattern match on our model and call the right functions. Currently, our program does not expose this kind of API. So let's go to our src/HsBlog.hs module and change the API. We can delete main from that file and add two new functions instead:

convertSingle :: Html.Title -> Handle -> Handle -> IO ()

convertDirectory :: FilePath -> FilePath -> IO ()

Handle is an I/O abstraction over file system objects, including stdin and stdout. Before, we used writeFile and getContents - these functions either get a FilePath to open and work on, or they assume the Handle is the standard I/O. We can use the explicit versions that take a Handle from System.IO instead:

convertSingle :: Html.Title -> Handle -> Handle -> IO ()
convertSingle title input output = do
  content <- hGetContents input
  hPutStrLn output (process title content)

We will leave convertDirectory unimplemented for now and implement it in the next chapter.

In app/Main.hs, we will need to pattern match on the Options and prepare to call the right functions from HsBlog.

Let's look at our full app/Main.hs and src/HsBlog.hs:

app/Main.hs
-- | Entry point for the hs-blog-gen program

module Main where

import OptParse
import qualified HsBlog

import System.Exit (exitFailure)
import System.Directory (doesFileExist)
import System.IO

main :: IO ()
main = do
  options <- parse
  case options of
    ConvertDir input output ->
      HsBlog.convertDirectory input output

    ConvertSingle input output -> do
      (title, inputHandle) <-
        case input of
          Stdin ->
            pure ("", stdin)
          InputFile file ->
            (,) file <$> openFile file ReadMode

      outputHandle <-
        case output of
          Stdout -> pure stdout
          OutputFile file -> do
            exists <- doesFileExist file
            shouldOpenFile <-
              if exists
                then confirm
                else pure True
            if shouldOpenFile
              then
                openFile file WriteMode
              else
                exitFailure

      HsBlog.convertSingle title inputHandle outputHandle
      hClose inputHandle
      hClose outputHandle

------------------------------------------------
-- * Utilities

-- | Confirm user action
confirm :: IO Bool
confirm =
  putStrLn "Are you sure? (y/n)" *>
    getLine >>= \answer ->
      case answer of
        "y" -> pure True
        "n" -> pure False
        _ -> putStrLn "Invalid response. use y or n" *>
          confirm
src/HsBlog.hs
-- HsBlog.hs
module HsBlog
  ( convertSingle
  , convertDirectory
  , process
  )
  where

import qualified HsBlog.Markup as Markup
import qualified HsBlog.Html as Html
import HsBlog.Convert (convert)

import System.IO

convertSingle :: Html.Title -> Handle -> Handle -> IO ()
convertSingle title input output = do
  content <- hGetContents input
  hPutStrLn output (process title content)

convertDirectory :: FilePath -> FilePath -> IO ()
convertDirectory = error "Not implemented"

process :: Html.Title -> String -> String
process title = Html.render . convert title . Markup.parse

We need to make a few small changes to the cabal file.

First, we need to add the dependency directory to the executable, because we use the library System.Directory in Main.

Second, we need to list OptParse in the list of modules in the executable. Below main-is: Main.hs, add:

  other-modules:
    OptParse

Summary

We've learned about a new fancy library called optparse-applicative and used it to create a fancier command-line interface in a declarative way. See the result of running hs-blog-gen --help:

hs-blog-gen - a static blog generator

Usage: hs-blog-gen COMMAND
  Convert markup files or directories to html

Available options:
  -h,--help                Show this help text

Available commands:
  convert                  Convert a single markup source to html
  convert-dir              Convert a directory of markup files to html

Along the way we've learned two powerful new abstractions, Functor and Applicative. As well as revisited an abstraction we were familiar with called Monoid. With this library we've seen (another) example of the usefulness of these abstractions for constructing APIs and EDSLs.

We will continue to meet these abstractions in the rest of the book.


Bonus exercise: Add another flag named --replace to indicate that if the output file or directory already exists, it's okay to replace them.


You can view the git commit of the changes we've made and the code up until now.

Handling errors and multiple files

We have left an unimplemented function last chapter, and there are a few more things left for us to do to actually call our program a static blog generator. We still need to process multiple files in a directory and create an index landing page with links to other pages.

Our HTML EDSL currently does not support links or other content modifiers such as bold and italics. We should add these so we can use them when creating an index.

Up until now we've passed String to Structure creating functions such as p_ and h_. Instead, we could create and pass them a new type, Content, which can be regular text, links, images, and so on.


Exercise: implement what we've just discussed. Follow the compiler errors and refactor what needs refactoring.

Solution
src/Html/Internal.hs
-- Html/Internal.hs
module HsBlog.Html.Internal where

import Numeric.Natural

-- * Types

newtype Html
  = Html String

newtype Structure
  = Structure String

newtype Content
  = Content String

type Title
  = String

-- * EDSL

html_ :: Title -> Structure -> Html
html_ title content =
  Html
    ( el "html"
      ( el "head" (el "title" (escape title))
        <> el "body" (getStructureString content)
      )
    )

-- * Structure

p_ :: Content -> Structure
p_ = Structure . el "p" . getContentString

h1_ :: Content -> Structure
h1_ = Structure . el "h1" . getContentString

h_ :: Natural -> Content -> Structure
h_ n = Structure . el ("h" <> show n) . getContentString

ul_ :: [Structure] -> Structure
ul_ =
  Structure . el "ul" . concat . map (el "li" . getStructureString)

ol_ :: [Structure] -> Structure
ol_ =
  Structure . el "ol" . concat . map (el "li" . getStructureString)

code_ :: String -> Structure
code_ = Structure . el "pre" . escape

instance Semigroup Structure where
  (<>) c1 c2 =
    Structure (getStructureString c1 <> getStructureString c2)

instance Monoid Structure where
  mempty = Structure ""

-- * Content

txt_ :: String -> Content
txt_ = Content . escape

link_ :: FilePath -> Content -> Content
link_ path content =
  Content $
    elAttr
      "a"
      ("href=\"" <> escape path <> "\"")
      (getContentString content)

img_ :: FilePath -> Content
img_ path =
  Content $ "<img src=\"" <> escape path <> "\">"

b_ :: Content -> Content
b_ content =
  Content $ el "b" (getContentString content)

i_ :: Content -> Content
i_ content =
  Content $ el "i" (getContentString content)

instance Semigroup Content where
  (<>) c1 c2 =
    Content (getContentString c1 <> getContentString c2)

instance Monoid Content where
  mempty = Content ""

-- * Render

render :: Html -> String
render html =
  case html of
    Html str -> str

-- * Utilities

el :: String -> String -> String
el tag content =
  "<" <> tag <> ">" <> content <> "</" <> tag <> ">"

elAttr :: String -> String -> String -> String
elAttr tag attrs content =
  "<" <> tag <> " " <> attrs <> ">" <> content <> "</" <> tag <> ">"

getStructureString :: Structure -> String
getStructureString structure =
  case structure of
    Structure str -> str

getContentString :: Content -> String
getContentString content =
  case content of
    Content str -> str

escape :: String -> String
escape =
  let
    escapeChar c =
      case c of
        '<' -> "&lt;"
        '>' -> "&gt;"
        '&' -> "&amp;"
        '"' -> "&quot;"
        '\'' -> "&#39;"
        _ -> [c]
  in
    concat . map escapeChar


src/Html.hs
-- Html.hs

module HsBlog.Html
  ( Html
  , Title
  , Structure
  , html_
  , p_
  , h_
  , h1_
  , ul_
  , ol_
  , code_
  , Content
  , txt_
  , img_
  , link_
  , b_
  , i_
  , render
  )
  where

import HsBlog.Html.Internal
src/Convert.hs
-- Convert.hs
module HsBlog.Convert where

import qualified HsBlog.Markup as Markup
import qualified HsBlog.Html as Html

convert :: Html.Title -> Markup.Document -> Html.Html
convert title = Html.html_ title . foldMap convertStructure

convertStructure :: Markup.Structure -> Html.Structure
convertStructure structure =
  case structure of
    Markup.Header n txt ->
      Html.h_ n $ Html.txt_ txt

    Markup.Paragraph p ->
      Html.p_ $ Html.txt_ p

    Markup.UnorderedList list ->
      Html.ul_ $ map (Html.p_ . Html.txt_) list

    Markup.OrderedList list ->
      Html.ol_ $ map (Html.p_ . Html.txt_) list

    Markup.CodeBlock list ->
      Html.code_ (unlines list)

Creating an index page

With our extended HTML EDSL, we can now create an index page with links to the other pages.

To create an index page, we need a list of files with their target destination, as well as their Markup (so we can extract information to include in our index page, such as the first header and paragraph). Our output should be an Html page.


We need to implement the following function:

buildIndex :: [(FilePath, Markup)] -> Html
Solution
buildIndex :: [(FilePath, Markup.Document)] -> Html.Html
buildIndex files =
  let
    previews =
      map
        ( \(file, doc) ->
          case doc of
            Markup.Header 1 header : article ->
              Html.h_ 3 (Html.link_ file (Html.txt_ header))
                <> foldMap convertStructure (take 3 article)
                <> Html.p_ (Html.link_ file (Html.txt_ "..."))
            _ ->
              Html.h_ 3 (Html.link_ file (Html.txt_ file))
        )
        files
  in
    Html.html_
      "Blog"
      ( Html.h1_ (Html.link_ "index.html" (Html.txt_ "Blog"))
        <> Html.h_ 2 (Html.txt_ "Posts")
        <> mconcat previews
      )

Processing directories

Our general strategy for processing whole directories is going to be:

  • Create the output directory
  • Grab all file names in a directory
  • Filter them according to their extension, we want to process txt file and copy other files without modification
  • We want to parse each text file, build an index of the result, convert the files to HTML, and write everything to the target directory

While our parsing function can't really fail, trying to read or write a file to the file-system can fail in several ways. It would be nice if our static blog generator was robust enough that it wouldn't fail completely if one single file gave it some trouble. This is a good opportunity to learn about error handling in Haskell, both in uneffectful code and for I/O code.

In the next few chapters we'll survey the landscape of error handling in Haskell before figuring out what is the right approach for our use case.

Handling errors with Either

There are quite a few ways to indicate and handle errors in Haskell. We are going to look at one solution: using the type Either. Either is defined like this:

data Either a b
  = Left a
  | Right b

Simply put, a value of type Either a b can contain either a value of type a, or a value of type b. We can tell them apart from the constructor used.

Left True :: Either Bool b
Right 'a' :: Either a Char

Using this type, we can represent computations that may fail by using the Left constructor to indicate failure with some error value attached, and the Right constructor with one type to represent success with the expected result.

Since Either is polymorphic, we can use any two types to represent failure and success. It is often useful to describe the failure modes using an ADT.

For example, let's say that we want to parse a Char as a decimal digit to an Int. This operation could fail if the Character is not a digit. We can represent this error as a data type:

data ParseDigitError
  = NotADigit Char
  deriving Show

And our parsing function can have the type:

parseDigit :: Char -> Either ParseDigitError Int

Now when we implement our parsing function we can return Left on an error describing the problem, and Right with the parsed value on successful parsing:

parseDigit :: Char -> Either ParseDigitError Int
parseDigit c =
  case c of
    '0' -> Right 0
    '1' -> Right 1
    '2' -> Right 2
    '3' -> Right 3
    '4' -> Right 4
    '5' -> Right 5
    '6' -> Right 6
    '7' -> Right 7
    '8' -> Right 8
    '9' -> Right 9
    _ -> Left (NotADigit c)

Either a is also an instance of Functor and Applicative, so we have some combinators to work with if we want to combine these kind of computations.

For example, if we had three characters and we wanted to try and parse each of them and then find the maximum between them, we could use the applicative interface:

max3chars :: Char -> Char -> Char -> Either ParseDigitError Int
max3chars x y z =
  (\a b c -> max a (max b c))
    <$> parseDigit x
    <*> parseDigit y
    <*> parseDigit z

The Functor and Applicative interfaces of Either a allow us to apply functions to the payload values and delay the error handling to a later phase. Semantically, the first Either in order that returns a Left will be the return value. We can see how this works in the implementation of the applicative instance:

instance Applicative (Either e) where
    pure          = Right
    Left  e <*> _ = Left e
    Right f <*> r = fmap f r

At some point, someone will actually want to inspect the result and see if we got an error (with the Left constructor) or the expected value (with the Right constructor) and they can do that by pattern matching on the result.

Applicative + Traversable

The Applicative interface of Either is very powerful, and can be combined with another abstraction called Traversable - for data structures that can be traversed from left to right, like a linked list or a binary tree. With these, we can combine an unspecified amount of values such as Either ParseDigitError Int, as long as they are all in a data structure that implements Traversable.

Let's see an example:

ghci> :t "1234567"
"1234567" :: String
-- remember, a String is an alias for a list of Char
ghci> :info String
type String :: *
type String = [Char]
      -- Defined in ‘GHC.Base’

ghci> :t map parseDigit "1234567"
map parseDigit mystring :: [Either ParseDigitError Int]
ghci> map parseDigit "1234567"
[Right 1,Right 2,Right 3,Right 4,Right 5,Right 6,Right 7]

ghci> :t sequenceA
sequenceA :: (Traversable t, Applicative f) => t (f a) -> f (t a)
-- Substitute `t` with `[]`, and `f` with `Either Error` for a specialized version

ghci> sequenceA (map parseDigit mystring)
Right [1,2,3,4,5,6,7]

ghci> map parseDigit "1a2"
[Right 1,Left (NotADigit 'a'),Right 2]
ghci> sequenceA (map parseDigit "1a2")
Left (NotADigit 'a')

The pattern of doing map and then sequenceA is another function called traverse:

ghci> :t traverse
traverse
  :: (Traversable t, Applicative f) => (a -> f b) -> t a -> f (t b)
ghci> traverse parseDigit "1234567"
Right [1,2,3,4,5,6,7]
ghci> traverse parseDigit "1a2"
Left (NotADigit 'a')

We can use traverse on any two types where one implements the Applicative interface, like Either a or IO, and the other implements the Traversable interface, like [] (linked lists) and Map k (also known as a dictionary in other languages - a mapping from keys to values). For example using IO and Map. Note that we can construct a Map data structure from a list of tuples using the fromList function - the first value in the tuple is the key, and the second is the type.

ghci> import qualified Data.Map as M -- from the containers package

ghci> file1 = ("output/file1.html", "input/file1.txt")
ghci> file2 = ("output/file2.html", "input/file2.txt")
ghci> file3 = ("output/file3.html", "input/file3.txt")
ghci> files = M.fromList [file1, file2, file3]
ghci> :t files :: M.Map FilePath FilePath -- FilePath is an alias of String
files :: M.Map FilePath FilePath :: M.Map FilePath FilePath

ghci> readFiles = traverse readFile
ghci> :t readFiles
readFiles :: Traversable t => t FilePath -> IO (t String)

ghci> readFiles files
fromList [("output/file1.html","I'm the content of file1.txt\n"),("output/file2.html","I'm the content of file2.txt\n"),("output/file3.html","I'm the content of file3.txt\n")]
ghci> :t readFiles files
readFiles files :: IO (Map String String)

Above, we created a function readFiles that will take a mapping from output file path to input file path and returns an IO operation that when run will read the input files and replace their contents right there in the map! Surely this will be useful later.

Multiple errors

Note, since Either has the kind * -> * -> * (it takes two type parameters) Either cannot be an instance of Functor and Applicative, instances for these type classes can only be implemented for types that have the kind * -> *. Remember that when we look at a type class function signature like:

fmap :: Functor f => (a -> b) -> f a -> f b

And we want to implement it for a specific type (in place of the f), we need to be able to substitute the f with the target type. If we'd try to do it with Either we'll get:

fmap :: (a -> b) -> Either a -> Either b

And neither Either a or Either b are saturated, so this won't type check. For the same reason if we'll try to substitute f with, say, Int, we'll get:

fmap :: (a -> b) -> Int a -> Int b

Which also doesn't make sense.

So while we can't use Either, we can use Either e, which has the kind * -> *. Now let's try substituting f with Either e in this signature:

liftA2 :: Applicative => (a -> b -> c) -> f a -> f b -> f c

And we'll get:

liftA2 :: (a -> b -> c) -> Either e a -> Either e b -> Either e c

What this teaches us is that we can only use the applicative interface to combine two Eithers with the same type for the Left constructor.

So what can we do if we have two functions that can return different errors? There are a few approaches, the most prominent ones are:

  1. Make them return the same error type. Write an ADT that holds all possible error descriptions. This can work in some cases but isn't always ideal because for example a user calling parseDigit shouldn't be forced to handle a possible case that the input might be an empty string.
  2. Use a specialized error type for each type, and when they are composed together, map the error type of each function to a more general error type. This can be done with the function first from the Bifunctor type class.

Monadic interface

The applicative interface allows us to lift a function on to work on multiple Either values (or other applicative functor instances such as IO and Parser). But more often than not, we'd like to be able to use a value from one computation that might return an error in another computation that might return an error.

For example, a compiler such has GHC operates in stages, such as lexical analysis, parsing, type-checking, and so on. Each stage depends on the output of the stage before it, and each stage might fail. We can write the types for these functions:

tokenize :: String -> Either Error [Token]

parse :: [Token] -> Either Error AST

typcheck :: AST -> Either Error TypedAST

We want to compose these functions so that they work in a chain. The output of tokenize goes to parse, the output of parse goes into to typecheck.

We know that we can lift a function over an Either (and other functors), we could also lift a function that returns an Either:

-- reminder the type of fmap
fmap :: Functor f => (a -> b) -> f a -> f b
-- specialized for `Either Error`
fmap :: (a -> b) -> Either Error a -> Either Error b

-- here, `a` is [Token] and `b` is `Either Error AST`:

> fmap parse (tokenize string) :: Either Error (Either Error AST)

While this code compiles, it isn't great, because we are building layers of Either Error and we can't use this trick again with typecheck! typecheck expects an AST, but if we try to fmap it on fmap parse (tokenize string), the a will be Either Error AST instead.

What we would really like is to flatten this structure instead of building it. If we look at the kind of values Either Error (Either Error AST) could have, it looks something like this:

  • Left <error>
  • Right (Left error)
  • Right (Right <ast>)

Exercise: What if we just used pattern matching for this instead? How would this look like?

Solution
case tokenize string of
  Left err ->
    Left err
  Right tokens ->
    case parse tokens of
      Left err ->
        Left err
      Right ast ->
        typecheck ast

If we run into an error in a stage, we return that error and stop. If we succeed, we use the value on the next stage.


Flattening this structure for Either is very similar to that last part - the body of the Right tokens case:

flatten :: Either e (Either e a) -> Either e a
flatten e =
  case e of
    Left l -> Left l
    Right x -> x

Because we have this function, we can now use it on the output of fmap parse (tokenize string) :: Either Error (Either Error AST) from before:

> flatten (fmap parse (tokenize string)) :: Either Error AST

And now we can use this function again to compose with typecheck:

> flatten (fmap typecheck (flatten (fmap parse (tokenize string)))) :: Either Error TypedAST

This flatten + fmap combination looks like a recurring pattern which we can combine into a function:

flatMap :: (a -> Either e b) -> Either a -> Either b
flatMap func val = flatten (fmap func val)

And now we can write the code this way:

> flatMap typecheck (flatMap parse (tokenize string)) :: Either Error TypedAST

-- Or using backticks syntax to convert the function to infix form:
> typecheck `flatMap` parse `flatMap` tokenize string

-- Or create a custom infix operator: (=<<) = flatMap
> typeCheck =<< parse =<< tokenize string

This function, flatten (and flatMap as well), have different names in Haskell. They are called join and =<< (pronounced "reverse bind"), and they are the essence of another incredibly useful abstraction in Haskell.

If we have a type that can implement:

  1. The Functor interface, specifically the fmap function
  2. The Applicative interface, most importantly the pure function
  3. This join function

They can implement an instance of the Monad type class.

With functors, we were able to "lift" a function to work over the type implementing the functor type class:

fmap :: (a -> b) -> f a -> f b

With applicative functors we were able to "lift" a function of multiple arguments over multiple values of a type implementing the applicative functor type class, and also lift a value into that type:

pure :: a -> f a

liftA2 :: (a -> b -> c) -> f a -> f b -> f c

With monads we can now flatten (or, "join" in Haskell terminology) types that implement the Monad interface:

join :: m (m a) -> m a

-- this is =<< with the arguments reversed, pronounced "bind"
(>>=) :: m a -> (a -> m b) -> m b

With >>= we can write our compilation pipeline from before in a left-to-right manner, which seems to be more popular for monads:

> tokenize string >>= parse >>= typecheck

We have already met this function before when we talked about IO. Yes, IO also implements the Monad interface. The monadic interface for IO helped us with creating a proper ordering of effects.

The essence of the Monad interface is the join/>>= functions, and as we've seen we can implement >>= in terms of join, we can also implement join in terms of >>= (try it!).

The monadic interface can mean very different things for different types. For IO this is ordering of effects, for Either it is early cutoff, for Logic this means backtracking computation, etc.

Again, don't worry about analogies and metaphors, focus on the API and the laws.

Hey, did you check the monad laws? left identity, right identity and associativity? We've already discussed a type class with exactly these laws - the Monoid type class. Maybe this is related to the famous quote about monads beings just monoids in something something...

Do notation?

Remember do notation? Turns out it works for any type that is an instance of Monad. How cool is that? Instead of writing:

pipeline :: String -> Either Error TypedAST
pipeline string =
  tokenize string >>= \tokens ->
    parse tokens >>= \ast ->
      typecheck ast

We can write:

pipeline :: String -> Either Error TypedAST
pipeline string = do
  tokens <- tokenize string
  ast <- parse tokens
  typecheck ast

And it will work! Still, in this particular case tokenize string >>= parse >>= typecheck is so concise it can only be beaten by using >=>

>=> :: Monad m => (a -> m b) -> (b -> m c) -> a -> m c

-- compare with function composition:
(.) ::            (a ->   b) -> (b ->   c) -> a ->   c
pipeline = tokenize >=> parse >=> typecheck

This ability of Haskell's to create very concise code using great abstractions makes it great once one is familiar with the abstractions. Knowing the monad abstraction, we are now already familiar with the core composition API of many libraries - for example:

Summary

Using Either for error handling is useful for two reasons:

  1. We encode possible errors using types, and we force users to acknowledge and handle them, thus making our code more resilient to crashes and bad behaviour.
  2. The Functor, Applicative and Monad interfaces provide us with mechanisms for composing functions that might fail (almost) effortlessly - reducing boilerplate while maintaining strong guarantees about our code, and delaying the need to handle errors until it is appropriate.

Either with IO?

When we create IO actions that may require I/O we risk running into all kinds of errors. For example, when we use writeFile, we could run out of disk space in the middle of writing, or the file might be write protected. While these scenarios aren't super common, they are definitely possible.

We could've potentially encoded Haskell functions like readFile and writeFile as IO operations that return Either, for example:

readFile :: FilePath -> IO (Either ReadFileError String)
writeFile :: FilePath -> String -> IO (Either WriteFileError ())

However there are a couple of issues here, the first is that now composing IO actions became more difficult. Previously we could write:

readFile "input.txt" >>= writeFile "output.html"

But now the types no longer match - readFile will return an Either ReadFileError String, but writeFile wants to take a String as input. We are forced to handle the error before calling writeFile.

Composing IO + Either using ExceptT

One way to handle this is by using monad transformers. Monad transformers are a method to stack monad capabilities on top of one another. They are called transformers because they take a type that has an instance of monad as input, and return a new type that implements the monad interface, stacking a new capability on top of it.

For example, if we want to create a value with a type similar to IO (Either Error a) that we can compose using the monadic interface (the function >>=) with other values of a type similar to IO (Either Error a), we can using a monad transformer called ExceptT. Let's see how ExceptT is defined:

newtype ExceptT e m a = ExceptT (m (Either e a))

Remember, a newtype is a new name for an existing type. And if we substitute e with Error and m with IO we'll get exactly IO (Either Error a) as we wanted. And we can convert an ExceptT Error IO a into IO (Either Error a) using the function runExceptT:

runExceptT :: ExceptT e m a -> m (Either e a)

ExceptT implements the monadic interface in a way that combines the capabilities of Either, and whatever m it takes. Because ExceptT e m has a Monad instance, a specialized version of >>= would look like this:

-- Generalized version
(>>=) :: Monad m => m a -> (a -> m b) -> m b

-- Specialized version, replace `m` with `ExceptT e m`
(>>=) :: ExceptT e m a -> (a -> ExceptT e m b) -> ExceptT e m b

Unsure how this works? Try to implement >>= for IO (Either Error a):

bindExceptT :: IO (Either Error a) -> (a -> IO (Either Error b)) -> IO (Either Error b)
Solution
bindExceptT :: IO (Either Error a) -> (a -> IO (Either Error b)) -> IO (Either Error b)
bindExceptT mx f = do
  x <- mx -- `x` has the type `Either Error a`
  case x of
    Left err -> pure (Left err)
    Right y -> f y

Note that we didn't actually use the implementation details of Error or IO, Error isn't mentioned at all, and for IO we only used the monadic interface with the do notation. We could write the same function with a more generalized type signature:

bindExceptT :: Monad m => m (Either e a) -> (a -> m (Either e b)) -> m (Either e b)
bindExceptT mx f = do
  x <- mx -- `x` has the type `Either e a`
  case x of
    Left err -> pure (Left err)
    Right y -> f y

And because newtype ExceptT e m a = ExceptT (m (Either e a)) we can just pack and unpack that ExceptT constructor and get:

bindExceptT :: Monad m => ExceptT e m a -> (a -> ExceptT e m b) -> ExceptT e m b
bindExceptT mx f = ExceptT $ do
  -- `runExceptT mx` has the type `m (Either e a)`
  -- `x` has the type `Either e a`
  x <- runExceptT mx
  case x of
    Left err -> pure (Left err)
    Right y -> f y

Note that when stacking monad transformers, the order in which we stack them matters. With ExceptT Error IO a, we have an IO operation that when run will return Either an error or a value.

ExceptT can enjoy both worlds - we can return error values using the function throwError:

throwError :: e -> ExceptT e m a

and we can "lift" functions that return a value of the underlying monadic type m to return a value of ExceptT e m a instead:

lift :: m a -> ExceptT e m a

for example:

getLine :: IO String

lift getLine :: ExceptT e IO String

(Actually, lift is also a type class function from MonadTrans, the type class of monad transformers. So technically lift getLine :: MonadTrans t => t IO String, but we are specializing for concreteness)

Now, if we had:

readFile :: FilePath -> ExceptT IOError IO String

writeFile :: FilePath -> String -> ExceptT IOError IO ()

We could compose them again without issue:

readFile "input.txt" >>= writeFile "ouptut.html"

But remember - the error type e (in both the case Either and Except) must be the same between composed functions! This means that the type representing errors for both readFile and writeFile must be the same - that would also force anyone using these functions to handle these errors - should a user who called writeFile be required to handle a "file not found" error? Should a user who called readFile be required to handle an "out of disk space" error? There are many many more possible IO errors! "network unreachable", "out of memory", "cancelled thread", we cannot require a user to handle all these errors, or even cover them all in a data type.

So what do we do?

We give up on this approach for IO code, and use a different one: Exceptions. As we'll see in the next chapter.

Note - when we stack ExceptT on top of a different type called Identity that also implements the Monad interface, we get a type that is exactly like Either called Except (without the T at the end). You might sometimes want to use Except instead of Either because it has a more appropriate name and better API for error handling than Either.

Exceptions

The Control.Exception module provides us with the ability to throw exceptions from IO code, catch Haskell exceptions in IO code, and even convert them to IO (Either ...) with the function try:

throwIO :: Exception e => e -> IO a

catch
  :: Exception e
  => IO a         -- The computation to run
  -> (e -> IO a)  -- Handler to invoke if an exception is raised
  -> IO a

try :: Exception e => IO a -> IO (Either e a)

The important part of these type signatures is the Exception type class. By making a type an instance of the Exception type class, we can throw it and catch it in IO code:

{-# language LambdaCase #-}

import Control.Exception
import System.IO

data MyException
  = ErrZero
  | ErrOdd Int
  deriving Show

instance Exception MyException

sayDiv2 :: Int -> IO ()
sayDiv2 n
  | n == 0 = throwIO ErrZero
  | n `mod` 2 /= 0 = throwIO (ErrOdd n)
  | otherwise = print (n `div` 2)

main :: IO ()
main =
  catch
    ( do
      putStrLn "Going to print a number now."
      sayDiv2 7
      putStrLn "Did you like it?"
    )
    ( \case
      ErrZero ->
        hPutStrLn stderr "Error: we don't support dividing zeroes for some reason"
      ErrOdd n ->
        hPutStrLn stderr ("Error: " <> show n <> " is odd and cannot be divided by 2")
    )

Note: we are using two new things here: guards, and the LambdaCase language extension.

  1. Guards as seen in sayDiv2 are just a nicer syntax around if-then-else expressions. Using guards we can have multiple if branches and finally use the else branch by using otherwise. After each guard (|) there's a condition, after the condition there's a = and then the expression (the part after then in an if expression).

  2. LambdaCase as seen in catch, is just a syntactic sugar to save a few characters, instead of writing \e -> case e of, we can write \case. It requires enabling the LambdaCase extension.

    Language extensions

    Haskell is a standardized language. However, GHC provides extensions to the language - additional features that aren't covered in the 98 or 2010 standards of Haskell. Features such as syntactic extensions (like LambdaCase above), extensions to the type checker, and more.

    These extensions can be added by adding {-# language <extension-name> #-} (the language part is case insensitive) to the top of a Haskell source file, or they can be set globally for an entire project by specifying them in the default-extensions section in the .cabal file.

    The list of language extensions can be found in the GHC manual, feel free to browse it, but don't worry about trying to memorize all the extensions.

This example, of course, is an example that would work much better using Either and separating the division and printing à la 'functional core, imperative shell'. But as an example it works. We have created a custom exception and handled it specifically outside an IO block. However, we have not handled exceptions that might be raised by putStrLn. What if, for example, for some reason we close the stdout handle before this block:

main :: IO ()
main = do
  hClose stdout
  catch
    ( do
      putStrLn "Going to print a number now."
      sayDiv2 7
      putStrLn "Did you like it?"
    )
    ( \case
      ErrZero ->
        hPutStrLn stderr "Error: we don't support dividing zeroes for some reason"
      ErrOdd n ->
        hPutStrLn stderr ("Error: " <> show n <> " is odd and cannot be divided by 2")
    )

Our program will crash with an error:

ghc: <stdout>: hFlush: illegal operation (handle is closed)

First, how do we know which exception we should handle? Some functions' documentation will include this, but unfortunately putStrLn does not. We could guess from the list of instances the Exception type class has; I think IOException fits. Now, how can we handle this case as well? We can chain catches:

-- need to add these at the top

{-# language ScopedTypeVariables #-}

import GHC.IO.Exception (IOException(..))

main :: IO ()
main = do
  hClose stdout
  catch
    ( catch
      ( do
        putStrLn "Going to print a number now."
        sayDiv2 7
        putStrLn "Did you like it?"
      )
      ( \case
        ErrZero ->
          hPutStrLn stderr "Error: we don't support dividing zeroes for some reason"
        ErrOdd n ->
          hPutStrLn stderr ("Error: " <> show n <> " is odd and cannot be divided by 2")
      )
    )
    ( \(e :: IOException) ->
      -- we can check if the error was an illegal operation on the stderr handle
      if ioe_handle e /= Just stderr && ioe_type e /= IllegalOperation
        then pure () -- we can't write to stderr because it is closed
        else hPutStrLn stderr (displayException e)
    )

We use the ScopedTypeVariables to be able to specify types inside let expressions, lambdas, pattern matching and more.

Or we could use the convenient function catches to pass a list of exception handlers:

main :: IO ()
main = do
  hClose stdout
  catches
    ( do
      putStrLn "Going to print a number now."
      sayDiv2 7
      putStrLn "Did you like it?"
    )
    [ Handler $ \case
      ErrZero ->
        hPutStrLn stderr "Error: we don't support dividing zeroes for some reason"
      ErrOdd n ->
        hPutStrLn stderr ("Error: " <> show n <> " is odd and cannot be divided by 2")
    
    , Handler $ \(e :: IOException) ->
      -- we can check if the error was an illegal operation on the stderr handle
      if ioe_handle e /= Just stderr && ioe_type e /= IllegalOperation
        then pure () -- we can't write to stderr because it is closed
        else hPutStrLn stderr (displayException e)
    ]

As an aside, Handler uses a concept called existentially quantified types to hide inside it a function that takes an arbitrary type that implements Exception. This is why we can encode a seemingly heterogeneous list of functions that handle exceptions for catches to take as input. This pattern is rarely useful, but I've included it here to avoid confusion.

And if we wanted to catch any exception, we'd catch SomeException:

main :: IO ()
main = do
  hClose stdout
  catch
    ( do
      putStrLn "Going to print a number now."
      sayDiv2 7
      putStrLn "Did you like it?"
    )
    ( \(SomeException e) ->
      hPutStrLn stderr (show e)
    )

This could also go in catches as the last element in the list if we wanted specialized handling for other scenarios.

A couple more functions worth knowing are bracket and finally. These functions can help us handle resource acquisition more safely when errors are present.

Summary

Exceptions are useful and often necessary when we work with IO and want to make sure our program is handling errors gracefully. They have an advantage over Either in that we can easily compose functions that may throw errors of different types, but also have a disadvantage of not encoding types as return values, and therefore does not force us to handle them.

For Haskell, the language designers have made a choice for us by designing IO to use exceptions instead of Either. And this is what I would recommend to use for handling your own effectful computations. However, I think that Either is more fitting for uneffectful code, because it forces us to acknowledge and handle errors (eventually) thus making our programs more robust. And also because we can only catch exceptions in IO code.

Lets code already!

This was a long info dump. Let's practice what we've learned. We want to:

  • Create the output directory
  • Grab all file names in a directory
  • Filter them according to their extension
  • Process .txt files
  • Copy other files without modification
  • Parse each text file, build an index of the result, convert the files to HTML, and write everything to the target directory

Note: I did not write this code immediately in the final form it was presented. It was an iterative process of writing code, refactoring, splitting functions, changing type signatures, and more. When solving a coding problem, start small and simple, do the thing that works, and refactor it when it makes sense and makes the code clearer and more modular. In Haskell we pride ourselves in our ability to refactor code and improve it over time, and that principle holds when writing new software as well!

New module

Let's create a new module, HsBlog.Directory, which will be responsible for handling directories and multiple files. From this module we will export the convertDirectory and buildIndex functions we've defined before.

-- | Process multiple files and convert directories

module HsBlog.Directory
  ( convertDirectory
  , buildIndex
  )
  where

In this module we are going to use the directory and filepath libraries to manipulate directories, files and filepaths. We'll use the new abstractions we've learned, Traversable and Monad, and the concepts and types we've learned about: Either, IO and exceptions.

For all of that, we need quite a few imports:

import qualified HsBlog.Markup as Markup
import qualified HsBlog.Html as Html
import HsBlog.Convert (convert, convertStructure)

import Data.List (partition)
import Data.Traversable (for)
import Control.Monad (void, when)

import System.IO (hPutStrLn, stderr)
import Control.Exception (catch, displayException, SomeException(..))
import System.Exit (exitFailure)
import System.FilePath
  ( takeExtension
  , takeBaseName
  , (<.>)
  , (</>)
  , takeFileName
  )
import System.Directory
  ( createDirectory
  , removeDirectoryRecursive
  , listDirectory
  , doesDirectoryExist
  , copyFile
  )

If you are unsure what a specific function we're using does, look it up at Hoogle, read the type signature and the documentation, and play around with it in ghci.

Converting a directory

We can start by describing the high-level function convertDirectory which encapsulates many smaller functions, each responsible for doing a specific thing. convertDirectory is quite imperative looking, and looks like a different way to describe the steps of completing our task:

-- | Copy files from one directory to another, converting '.txt' files to
--   '.html' files in the process. Recording unsuccessful reads and writes to stderr.
--
-- May throw an exception on output directory creation.
convertDirectory :: FilePath -> FilePath -> IO ()
convertDirectory inputDir outputDir = do
  DirContents filesToProcess filesToCopy <- getDirFilesAndContent inputDir
  createOutputDirectoryOrExit outputDir
  let
    outputHtmls = txtsToRenderedHtml filesToProcess
  copyFiles outputDir filesToCopy
  writeFiles outputDir outputHtmls
  putStrLn "Done."

Here we trust that each IO function handles errors responsibly, and terminates the project when necessary.

Let's examine the steps in order.

getDirFilesAndContent

-- | The relevant directory content for our application
data DirContents
  = DirContents
    { dcFilesToProcess :: [(FilePath, String)]
      -- ^ File paths and their content
    , dcFilesToCopy :: [FilePath]
      -- ^ Other file paths, to be copied directly
    }

-- | Returns the directory content
getDirFilesAndContent :: FilePath -> IO DirContents

getDirFilesAndContent is responsible for providing the relevant files for processing -- both the ones we need to convert to markup (and their textual content) and other files we might want to copy as-is (such as images and style-sheets).

-- | Returns the directory content
getDirFilesAndContent :: FilePath -> IO DirContents
getDirFilesAndContent inputDir = do
  files <- map (inputDir </>) <$> listDirectory inputDir
  let
    (txtFiles, otherFiles) =
      partition ((== ".txt") . takeExtension) files
  txtFilesAndContent <-
    applyIoOnList readFile txtFiles >>= filterAndReportFailures
  pure $ DirContents
    { dcFilesToProcess = txtFilesAndContent
    , dcFilesToCopy = otherFiles
    }

This function does 4 important things:

  1. Lists all the files in the directory
  2. Splits the files into 2 groups according to their file extension
  3. Reads the contents of the .txt files and report when files failed to be read
  4. Returns the results. We've defined a data type to make what each result is more obvious

Part (3) is a little bit more involved than the rest, lets explore it.

applyIoOnList

-- | Try to apply an IO function on a list of values, document successes and failures
applyIoOnList :: (a -> IO b) -> [a] -> IO [(a, Either String b)]
applyIoOnList action files = do
  for files $ \file -> do
    maybeContent <-
      catch
        (Right <$> action file)
        ( \(SomeException e) -> do
          pure $ Left (displayException e)
        )
    pure (file, maybeContent)

applyIoOnList is a higher order function that applies a particular IO function (in our case readFile) on a list of things (in our case FilePaths), and for each thing, it returns the thing itself along with the result of applying the IO function as an Either, where the Left side is a String representation of an error if one occurred.

Notice how much the type of this function tells us about what it might do. Because the types are polymorphic, there is nothing else to do with with the as other than apply them to the function, and nowhere to generate b from other than the result of the function.

Note: when I first wrote this function, it was specialized to work only on readFile, take specifically [FilePath] and return IO [(FilePath, Either String String)]. But after running into other use cases where I could use it (writeFiles and copyFiles) I refactored out the action, the input type and the return type.

This function uses exceptions to catch any error that might be thrown, and encodes both the failure and success cases in the type system using Either, delaying the handling of exceptions to the caller of the function while making sure it won't be forgotten!

Next, let's look at the function that handles the errors by reporting and then filtering out all the cases that failed.

filterAndReportFailures

-- | Filter out unsuccessful operations on files and report errors to stderr.
filterAndReportFailures :: [(a, Either String b)] -> IO [(a, b)]
filterAndReportFailures =
  foldMap $ \ (file, contentOrErr) ->
    case contentOrErr of
      Left err -> do
        hPutStrLn stderr err
        pure []
      Right content ->
        pure [(file, content)]

This code may seem a bit surprising - how come we can use foldMap here? Reminder, the type of foldMap is:

foldMap :: (Foldable t, Monoid m) => (a -> m) -> t a -> m

If we specialize this function for our use case, substituting the general type with the types we are using, we learn that IO [(a, b)] is a monoid. And indeed - [a] is a monoid for any a with [] (the empty list) as mempty and ++ as <>, but also IO a is a monoid for any a that is itself a monoid with pure mempty as mempty and liftA2 (<>) as <>!

Using these instances, we can map over the content, handling errors and returning an empty list to filter a failed case out, or a singleton list to keep the result. and the fold in foldMap will concatenate the resulting list where we return all of the successful cases!

These functions were responsible for fetching the right information. Next, let's look at the code for creating a new directory.

createOutputDirectoryOrExit

-- | Creates an output directory or terminates the program
createOutputDirectoryOrExit :: FilePath -> IO ()
createOutputDirectoryOrExit outputDir =
  whenIO
    (not <$> createOutputDirectory outputDir)
    (hPutStrLn stderr "Cancelled." *> exitFailure)

-- | Creates the output directory.
--   Returns whether the directory was created or not.
createOutputDirectory :: FilePath -> IO Bool
createOutputDirectory dir = do
  dirExists <- doesDirectoryExist dir
  create <-
    if dirExists
      then do
        override <- confirm "Output directory exists. Override?"
        when override (removeDirectoryRecursive dir)
        pure override
      else
        pure True
  when create (createDirectory dir)
  pure create

createOutputDirectoryOrExit itself is not terribly exciting, it does what it is named -- it tries to create the output directory, and exits the program in case it didn't succeed.

createOutputDirectory is the function that actually does the heavy lifting. It checks if the directory already exists, and checks if the user would like to override it. If they do, we remove it and create the new directory; if they don't, we do nothing and report their decision.

txtsToRenderedHtml

let
  outputHtmls = txtsToRenderedHtml filesToProcess

In this part of the code we convert files to markup and change the input file paths to their respective output file paths (.txt -> .html). We then build the index page, and convert everything to HTML.

-- | Convert text files to Markup, build an index, and render as html.
txtsToRenderedHtml :: [(FilePath, String)] -> [(FilePath, String)]
txtsToRenderedHtml txtFiles =
  let
    txtOutputFiles = map toOutputMarkupFile txtFiles
    index = ("index.html", buildIndex txtOutputFiles)
  in
    map (fmap Html.render) (index : map convertFile txtOutputFiles)

toOutputMarkupFile :: (FilePath, String) -> (FilePath, Markup.Document)
toOutputMarkupFile (file, content) =
  (takeBaseName file <.> "html", Markup.parse content)

convertFile :: (FilePath, Markup.Document) -> (FilePath, Html.Html)
convertFile (file, doc) = (file, convert file doc)

One possibly surprising thing about this code could be the map (fmap Html.render) part. We can use fmap on the tuple because it is a Functor on the second argument, just like Either!

copyFiles and writeFiles

The only thing left to do is to write the directory content, after the processing is completed, to the newly created directory:

-- | Copy files to a directory, recording errors to stderr.
copyFiles :: FilePath -> [FilePath] -> IO ()
copyFiles outputDir files = do
  let
    copyFromTo file = copyFile file (outputDir </> takeFileName file)
  void $ applyIoOnList copyFromTo files >>= filterAndReportFailures

Here we use applyIoOnList again to do something a bit more complicated, instead of reading from a file, it copies from the input path to a newly generated output path. Then we pass the result (which has the type [(FilePath, Either String ())]) to filterAndReportFailures to print the errors and filter out the unsuccessful copies. Because we are not really interested in the output of filterAndReportFailures, we discard it with void, returning () as a result instead.

-- | Write files to a directory, recording errors to stderr.
writeFiles :: FilePath -> [(FilePath, String)] -> IO ()
writeFiles outputDir files = do
  let
    writeFileContent (file, content) =
      writeFile (outputDir </> file) content
  void $ applyIoOnList writeFileContent files >>= filterAndReportFailures

Once again, this code looks almost exactly like copyFiles, but the types are different. Haskell's combination of parametric polymorphism + type class for abstractions is really powerful, and has helped us reduce quite a bit of code.


This pattern of using applyIoOnList and then filterAndReportFailures happens more than once. It might be a good candidate for refactoring. Try it! What do you think about the resulting code? Is it easier or more difficult to understand? Is it more modular or less? What are the pros and cons?


Summary

With that, we have completed our HsBlog.Directory module that handles converting a directory safely. Note that the code could probably be simplified quite a bit if we were fine with errors crashing the entire program altogether, but sometimes this is the price we pay for robustness. It is up to you to choose what you can live with and what not, but I hope this saga has taught you how to approach handling errors in Haskell in case you need to.


View the full module:

HsBlog.Directory
-- | Process multiple files and convert directories

module HsBlog.Directory
  ( convertDirectory
  , buildIndex
  )
  where

import qualified HsBlog.Markup as Markup
import qualified HsBlog.Html as Html
import HsBlog.Convert (convert, convertStructure)

import Data.List (partition)
import Data.Traversable (for)
import Control.Monad (void, when)

import System.IO (hPutStrLn, stderr)
import Control.Exception (catch, displayException, SomeException(..))
import System.Exit (exitFailure)
import System.FilePath
  ( takeExtension
  , takeBaseName
  , (<.>)
  , (</>)
  , takeFileName
  )
import System.Directory
  ( createDirectory
  , removeDirectoryRecursive
  , listDirectory
  , doesDirectoryExist
  , copyFile
  )

-- | Copy files from one directory to another, converting '.txt' files to
--   '.html' files in the process. Recording unsuccessful reads and writes to stderr.
--
-- May throw an exception on output directory creation.
convertDirectory :: FilePath -> FilePath -> IO ()
convertDirectory inputDir outputDir = do
  DirContents filesToProcess filesToCopy <- getDirFilesAndContent inputDir
  createOutputDirectoryOrExit outputDir
  let
    outputHtmls = txtsToRenderedHtml filesToProcess
  copyFiles outputDir filesToCopy
  writeFiles outputDir outputHtmls
  putStrLn "Done."

------------------------------------
-- * Read directory content

-- | Returns the directory content
getDirFilesAndContent :: FilePath -> IO DirContents
getDirFilesAndContent inputDir = do
  files <- map (inputDir </>) <$> listDirectory inputDir
  let
    (txtFiles, otherFiles) =
      partition ((== ".txt") . takeExtension) files
  txtFilesAndContent <-
    applyIoOnList readFile txtFiles >>= filterAndReportFailures
  pure $ DirContents
    { dcFilesToProcess = txtFilesAndContent
    , dcFilesToCopy = otherFiles
    }

-- | The relevant directory content for our application
data DirContents
  = DirContents
    { dcFilesToProcess :: [(FilePath, String)]
      -- ^ File paths and their content
    , dcFilesToCopy :: [FilePath]
      -- ^ Other file paths, to be copied directly
    }

------------------------------------
-- * Build index page

buildIndex :: [(FilePath, Markup.Document)] -> Html.Html
buildIndex files =
  let
    previews =
      map
        ( \(file, doc) ->
          case doc of
            Markup.Header 1 header : article ->
              Html.h_ 3 (Html.link_ file (Html.txt_ header))
                <> foldMap convertStructure (take 2 article)
                <> Html.p_ (Html.link_ file (Html.txt_ "..."))
            _ ->
              Html.h_ 3 (Html.link_ file (Html.txt_ file))
        )
        files
  in
    Html.html_
      "Blog"
      ( Html.h1_ (Html.link_ "index.html" (Html.txt_ "Blog"))
        <> Html.h_ 2 (Html.txt_ "Posts")
        <> mconcat previews
      )

------------------------------------
-- * Conversion

-- | Convert text files to Markup, build an index, and render as html.
txtsToRenderedHtml :: [(FilePath, String)] -> [(FilePath, String)]
txtsToRenderedHtml txtFiles =
  let
    txtOutputFiles = map toOutputMarkupFile txtFiles
    index = ("index.html", buildIndex txtOutputFiles)
  in
    map (fmap Html.render) (index : map convertFile txtOutputFiles)

toOutputMarkupFile :: (FilePath, String) -> (FilePath, Markup.Document)
toOutputMarkupFile (file, content) =
  (takeBaseName file <.> "html", Markup.parse content)

convertFile :: (FilePath, Markup.Document) -> (FilePath, Html.Html)
convertFile (file, doc) = (file, convert file doc)

------------------------------------
-- * Output to directory

-- | Creates an output directory or terminates the program
createOutputDirectoryOrExit :: FilePath -> IO ()
createOutputDirectoryOrExit outputDir =
  whenIO
    (not <$> createOutputDirectory outputDir)
    (hPutStrLn stderr "Cancelled." *> exitFailure)

-- | Creates the output directory.
--   Returns whether the directory was created or not.
createOutputDirectory :: FilePath -> IO Bool
createOutputDirectory dir = do
  dirExists <- doesDirectoryExist dir
  create <-
    if dirExists
      then do
        override <- confirm "Output directory exists. Override?"
        when override (removeDirectoryRecursive dir)
        pure override
      else
        pure True
  when create (createDirectory dir)
  pure create

-- | Copy files to a directory, recording errors to stderr.
copyFiles :: FilePath -> [FilePath] -> IO ()
copyFiles outputDir files = do
  let
    copyFromTo file = copyFile file (outputDir </> takeFileName file)
  void $ applyIoOnList copyFromTo files >>= filterAndReportFailures

-- | Write files to a directory, recording errors to stderr.
writeFiles :: FilePath -> [(FilePath, String)] -> IO ()
writeFiles outputDir files = do
  let
    writeFileContent (file, content) =
      writeFile (outputDir </> file) content
  void $ applyIoOnList writeFileContent files >>= filterAndReportFailures

------------------------------------
-- * IO work and handling errors

-- | Try to apply an IO function on a list of values, document successes and failures
applyIoOnList :: (a -> IO b) -> [a] -> IO [(a, Either String b)]
applyIoOnList action files = do
  for files $ \file -> do
    maybeContent <-
      catch
        (Right <$> action file)
        ( \(SomeException e) -> do
          pure $ Left (displayException e)
        )
    pure (file, maybeContent)

-- | Filter out unsuccessful operations on files and report errors to stderr.
filterAndReportFailures :: [(a, Either String b)] -> IO [(a, b)]
filterAndReportFailures =
  foldMap $ \ (file, contentOrErr) ->
    case contentOrErr of
      Left err -> do
        hPutStrLn stderr err
        pure []
      Right content ->
        pure [(file, content)]

------------------------------------
-- * Utilities

confirm :: String -> IO Bool
confirm question = do
  putStrLn (question <> " (y/n)")
  answer <- getLine
  case answer of
    "y" -> pure True
    "n" -> pure False
    _ -> do
      putStrLn "Invalid response. Use y or n."
      confirm question

whenIO :: IO Bool -> IO () -> IO ()
whenIO cond action = do
  result <- cond
  if result
    then action
    else pure ()

Summary

This was quite a section. Let's recount the things we've learned.

We discussed several ways to handle errors in Haskell:

  1. Encoding errors as a data type and using the Either type to encode "a value or an error". Useful approach for uneffectful code.
  2. Using ExceptT when we want to combine the approach in (1) on top on an existing type with monadic capabilities.
  3. Using exceptions for IO code.

We've also learned a few new abstractions and techniques:

  1. The Traversable type class, for data structures that can be traversed from left to right such as linked lists, binary trees and Maps. Pretty useful when combined with another applicative functor type like Either or IO.
  2. The Monad type class extends the Applicative type class with the join :: m (m a) -> m a function. We learned that Either implements this type class interface and so does IO.
  3. The MonadTrans type class for monad transformers for types that take other monads as inputs and provide a monadic interface (>>=, do notation, etc.) while combining both their capabilities. We saw how to stack an Either-like monad transformer, ExceptT, on top of IO.

We are almost done. Only a couple more things left to do with this project. Let's go!

You can view the git commit of the changes we've made and the code up until now.

Passing environment variables

We'd like to add some sort of an environment to keep general information on the blog for various processing. Things like the blog name, stylesheet location, and so on.

Environment

We can represent our environment as a record data type and build it from user input. The user input can be from command-line arguments, a configuration file, or something else.

module HsBlog.Env where

data Env
  = Env
    { eBlogName :: String
    , eStylesheetPath :: FilePath
    }
  deriving Show

defaultEnv :: Env
defaultEnv = Env "My Blog" "style.css"

After filling this record with the requested information, we can pass it as input to any function that might need it. This is a simple approach that can definitely work for small projects. But sometimes when the project gets bigger and many nested functions need the same information, threading the environment can get tedious.

There is an alternative solution to threading the environment as input to functions, and that is using the ReaderT type from the mtl (or transformers) package.

ReaderT

newType ReaderT r m a = ReaderT (r -> m a)

ReaderT is another monad transformer like ExceptT, which means that it also has an instance of Functor, Applicative, Monad and MonadTrans.

As we can see in the definition, ReaderT is a newtype over a function that takes some value of type r, and returns a value of type m a. The r usually represents the environment we want to share between different functions that we want to compose together, and the m a represents the underlying result that we return. The m could be any type that implements Monad that we are familiar with. Usually goes well with IO or Identity, depending if we want to share an environment between effectful or uneffectful computations.

What ReaderT does is carry a value of type r and passes it around to other functions when we use the Applicative and Monad interfaces so that we don't have to pass the value around manually. And when we want to grab the r and use it, all we have to do is ask.

For our case, this means that instead of Passing around Env, we can instead convert our functions to use ReaderT - those that are uneffectful and don't use IO, can return ReaderT Env Identity a instead of a (or the simplified version, Reader Env a), and those that are effectful can return ReaderT Env IO a instead of IO a.

Note, as we've said before, Functor, Applicative and Monad all expect the type that implements their interface to have the kind * -> *. This means that it is ReaderT r m which implements these interfaces, and when we compose functions with <*> or >>= we replace the f or m in their type signature with ReaderT r m.

This means that, as with Either e when we had composed functions with the same error type, so it is with ReaderT r m - we have to compose functions with the same r type and same m type, so we can't mix different environment types or different underlying m types.

We're going to use a specialized version of ReaderT that uses a specific m = Identity called Reader. The Control.Monad.Reader provides an alias: Reader r a = ReaderT r Identity a.

If the idea behind ReaderT is still a bit fuzzy to you and you want to get a better understanding of how ReaderT works, try doing the following exercise:

  1. Choose an Applicative or Monad interface function, I recommend liftA2, and specialize its type signature by replacing f (or m) with a concrete ReaderT type such as ReaderT Int IO.
  2. Unpack the ReaderT newtype, replacing ReaderT Int IO t with Int -> IO t.
  3. Implement this specialized version of the function you've chosen
Solution for liftA2
liftA2 :: Applicative f => (a -> b -> c) -> f a -> f b -> f c
Solution for (1)
-- Specialize: replace `f` with `ReaderT Env IO`
liftA2 :: (a -> b -> c) -> ReaderT Env IO a -> ReaderT Env IO b -> ReaderT Env IO c
Solution for (2)
-- Unpack the newtype, replacing `ReaderT Env IO a` with `Env -> IO a`
liftA2 :: (a -> b -> c) -> (Env -> IO a) -> (Env -> IO b) -> (Env -> IO c)
Solution for (3)
specialLiftA2 :: (a -> b -> c) -> (Env -> IO a) -> (Env -> IO b) -> (Env -> IO c)
specialLiftA2 combine funcA funcB env =
  liftA2 combine (funcA env) (funcB env)

Notice how the job of our special liftA2 for ReaderT is to supply the two functions with env, and then use the liftA2 implementation of the underlying m type (in our case IO) to do the rest of the work. Does it look like we're adding a capability on top of a different m? That's the idea behind monad transformers.

How to use Reader

Defining a function

Instead of defining a function like this:

txtsToRenderedHtml :: Env -> [(FilePath, String)] -> [(FilePath, String)]

We define it like this:

txtsToRenderedHtml :: [(FilePath, String)] -> Reader Env [(FilePath, String)]

Now that our code uses Reader, we have to accommodate that in the way we write our function:

Before:

txtsToRenderedHtml :: Env -> [(FilePath, String)] -> [(FilePath, String)]
txtsToRenderedHtml env txtFiles =
 let
   txtOutputFiles = map toOutputMarkupFile txtFiles
   index = ("index.html", buildIndex env txtOutputFiles)
   htmlPages = map (convertFile env) txtOutputFiles
 in
   map (fmap Html.render) (index : htmlPages)

Note how we needed to thread the env to the other functions that use it.

After:

txtsToRenderedHtml :: [(FilePath, String)] -> Reader Env [(FilePath, String)]
txtsToRenderedHtml txtFiles = do
  let
    txtOutputFiles = map toOutputMarkupFile txtFiles
  index <- (,) "index.html" <$> buildIndex txtOutputFiles
  htmlPages <- traverse convertFile txtOutputFiles
  pure $ map (fmap Html.render) (index : htmlPages)

Note how we use do notation now, and instead of threading env around we compose the relevant functions, buildIndex and convertFile, we use the type classes interfaces to compose the functions. Note how we needed to fmap over buildIndex to add the output file we needed with the tuple, and how we needed to use traverse instead of map to compose the various Reader values convertFile will produce.

Extracting Env

When we want to use our Env, we need to extract it from the Reader. We can do it with:

ask :: ReaderT r m r

Which yanks the r from the Reader - we can extract with >>= or <- in do notation. See the comparison:

Before:

convertFile :: Env -> (FilePath, Markup.Document) -> (FilePath, Html.Html)
convertFile env (file, doc) =
  (file, convert env (takeBaseName file) doc)

After:

convertFile :: (FilePath, Markup.Document) -> Reader Env (FilePath, Html.Html)
convertFile (file, doc) = do
  env <- ask
  pure (file, convert env (takeBaseName file) doc)

Note: we didn't change convert to use Reader because it is a user facing API for our library. By providing a simpler interface we allow more users to use our library - even those that aren't yet familiar with monad transformers.

Providing a simple function argument passing interface is preferred in this case.

Run a Reader

Similar to handling the errors with Either, at some point we need to supply the environment to a computation that uses Reader, and extract the result from the computation. We can do that with the functions runReader and runReaderT:

runReader :: Reader r a -> (r -> a)

runReaderT :: ReaderT r m a -> (r -> m a)

These functions convert a Reader or ReaderT to a function that takes and r. Then we can pass the initial environment to that function:

convertDirectory :: Env -> FilePath -> FilePath -> IO ()
convertDirectory env inputDir outputDir = do
  DirContents filesToProcess filesToCopy <- getDirFilesAndContent inputDir
  createOutputDirectoryOrExit outputDir
  let
    outputHtmls = runReader (txtsToRenderedHtml filesToProcess) env
  copyFiles outputDir filesToCopy
  writeFiles outputDir outputHtmls
  putStrLn "Done."

See the let outputHtmlspart.

Extra: Transforming Env for a particular call

Sometimes we may want to modify the Env we pass to a particular function call. For example, we may have a general Env type that contains a lot of information, and functions that only need a part of that information.

If the functions we are calling are like convert and take the environment as an argument instead of using a Reader, we can just extract the environment with ask, apply a function to the extracted environment, and pass the result to the function, like this:

outer :: Reader BigEnv MyResult
outer = do
  env <- ask
  pure (inner (extractSmallEnv env))

inner :: SmallEnv -> MyResult
inner = ...

extractSmallEnv :: BigEnv -> SmallEnv
extractSmallEnv = ...

But if inner uses a Reader SamllEnv instead of argument passing, we can use runReader to convert inner to a normal function, and use the same idea as above!

outer :: Reader BigEnv MyResult
outer = do
  env <- ask
  -- Here the type of `runReader inner` is `SmallEnv -> MyResult`
  pure (runReader inner (extractSmallEnv env))

inner :: Reader SmallEnv MyResult
inner = ...

extractSmallEnv :: BigEnv -> SmallEnv
extractSmallEnv = ...

This pattern is generalized and captured by a function called withReaderT, and works even for ReaderT:

withReaderT :: (env2 -> env1) -> ReaderT env1 m a -> ReaderT env2 m a

withReaderT takes a function that modifies the environment, and converts a ReaderT env1 m a computation to a ReaderT env2 m a computation using this function.

Let's see it concretely with our example:

outer :: Reader BigEnv MyResult
outer = withReaderT extractSmallEnv inner

Question: what is the type of withReaderT when specialized in our case?

Answer
withReaderT
  :: (BigEnv -> SmallEnv)     -- This is the type of `extractSmallEnv`
  -> Reader SmallEnv MyResult -- This is the type of `inner`
  -> Reader BigEnv   MyResult -- This is the type of `outer`

Note the order of the environments! We use a function from a BigEnv to a SmallEnv, to convert a Reader of SmallEnv to a Reader of BigEnv!

This is because we are mapping over the input of a function rather than the output, and is related to topics like variance and covariance, but isn't terribly important for us at the moment.

Using Env in our logic code

One thing we haven't talked about yet is actually using our environment in the convert function to generate the pages we want. And actually, we don't even have the ability to add stylesheets to our HTML EDSL at the moment! We need to go back and extend it. Let's do all that now:


Since stylesheets go in the head element, perhaps it's a good idea to create an additional newtype like Structure for Header information? Things like title, stylesheet, and even meta elements can be composed together just like we did for Structure to build the Header!

  1. Do it now: extend our HTML library to include headers and add 3 functions: title_ for titles, stylesheet_ for stylesheets, and meta_ for meta elements like twitter cards.

    Solution
    src/HsBlog/Html.hs
    -- Html.hs
    
    module HsBlog.Html
      ( Html
      , Header
      , title_
      , stylesheet_
      , meta_
      , Structure
      , html_
      , p_
      , h_
      , h1_
      , ul_
      , ol_
      , code_
      , Content
      , txt_
      , img_
      , link_
      , b_
      , i_
      , render
      )
      where
    
    import HsBlog.Html.Internal
    
    src/HsBlog/Html/Internal.hs
    newtype Header
      = Header String
    
    -- * EDSL
    
    html_ :: Header -> Structure -> Html
    html_ (Header header) content =
      Html
        ( el "html"
          ( el "head" header
            <> el "body" (getStructureString content)
          )
        )
    
    -- * Header
    
    title_ :: String -> Header
    title_ = Header . el "title" . escape
    
    stylesheet_ :: FilePath -> Header
    stylesheet_ path =
      Header $ "<link rel=\"stylesheet\" type=\"text/css\" href=\"" <> escape path <> "\">"
    
    meta_ :: String -> String -> Header
    meta_ name content =
      Header $ "<meta name=\"" <> escape name <> "\" content=\"" <> escape content <> "\">"
    
    instance Semigroup Header where
      (<>) (Header h1) (Header h2) =
        Header (h1 <> h2)
    
    instance Monoid Header where
      mempty = Header ""
    
  2. Fix convert and buildIndex to use the new API. Note: buildIndex should return Reader!

    Solution
    src/HsBlog/Convert.hs
    import HsBlog.Env (Env(..))
    
    convert :: Env -> String -> Markup.Document -> Html.Html
    convert env title doc =
      let
        header =
          Html.title_ (eBlogName env <> " - " <> title)
            <> Html.stylesheet_ (eStylesheetPath env)
        article =
          foldMap convertStructure doc
        websiteTitle =
          Html.h1_ (Html.link_ "index.html" $ Html.txt_ $ eBlogName env)
        body =
          websiteTitle <> article
      in
        Html.html_ header body
    
    src/HsBlog/Directory.hs
    buildIndex :: [(FilePath, Markup.Document)] -> Reader Env Html.Html
    buildIndex files = do
      env <- ask
      let
        previews =
          map
            ( \(file, doc) ->
              case doc of
                Markup.Header 1 header : article ->
                  Html.h_ 3 (Html.link_ file (Html.txt_ header))
                    <> foldMap convertStructure (take 2 article)
                    <> Html.p_ (Html.link_ file (Html.txt_ "..."))
                _ ->
                  Html.h_ 3 (Html.link_ file (Html.txt_ file))
            )
            files
      pure $ Html.html_
          ( Html.title_ (eBlogName env)
            <> Html.stylesheet_ (eStylesheetPath env)
          )
          ( Html.h1_ (Html.link_ "index.html" (Html.txt_ "Blog"))
            <> Html.h_ 2 (Html.txt_ "Posts")
            <> mconcat previews
          )
    
  3. Create a command-line parser for Env, attach it to the convert-dir command, and pass the result it to the convertDirectory function.

    Solution
    src/HsBlog.hs
    import HsBlog.Env (defaultEnv)
    
    convertSingle :: String -> Handle -> Handle -> IO ()
    
    process :: String -> String -> String
    process title = Html.render . convert defaultEnv title . Markup.parse
    
    app/OptParse.hs
    import HsBlog.Env
    
    ------------------------------------------------
    -- * Our command-line options model
    
    -- | Model
    data Options
      = ConvertSingle SingleInput SingleOutput
      | ConvertDir FilePath FilePath Env
      deriving Show
    
    ------------------------------------------------
    -- * Directory conversion parser
    
    pConvertDir :: Parser Options
    pConvertDir =
      ConvertDir <$> pInputDir <*> pOutputDir <*> pEnv
    
    -- | Parser for blog environment
    pEnv :: Parser Env
    pEnv =
      Env <$> pBlogName <*> pStylesheet
    
    -- | Blog name parser
    pBlogName :: Parser String
    pBlogName =
      strOption
        ( long "name"
          <> short 'N'
          <> metavar "STRING"
          <> help "Blog name"
          <> value (eBlogName defaultEnv)
          <> showDefault
        )
    
    -- | Stylesheet parser
    pStylesheet :: Parser String
    pStylesheet =
      strOption
        ( long "style"
          <> short 'S'
          <> metavar "FILE"
          <> help "Stylesheet filename"
          <> value (eStylesheetPath defaultEnv)
          <> showDefault
        )
    
    
    app/Main.hs
    main :: IO ()
    main = do
      options <- parse
      case options of
        ConvertDir input output env ->
          HsBlog.convertDirectory env input output
    
        ...
    

Summary

Which version do you like better? Manually passing arguments, or using Reader?

To me, it is not clear that the second version with Reader is better than the first with explicit argument passing in our particular case.

Using Reader and ReaderT makes our code a little less friendly toward beginners that are not yet familiar with these concepts and techniques, and we don't see (in this case) much benefit.

As programs grow larger, techniques like using Reader become more attractive to use. For our relatively small example, using Reader might not be appropriate. I've included it in this book because it is an important technique to have in our arsenal and I wanted to demonstrate it.

It is important to weigh the benefits and costs of using advanced techniques, and it's often better to try and get away with simpler techniques if we can.

You can view the git commit of the changes we've made and the code up until now.

Testing

We want to add some tests to our blog generator. At the very least a few regression tests to make sure that if we extend or change our markup parsing code, HTML generation code, or translation from markup to HTML code, and make a mistake, we'll have a safety net alerting us of issues.

We will use the hspec testing framework to write our tests. There are other testing frameworks in Haskell, for example tasty, but I like hspec's documentation, so we'll use that.

Initial setup

Cabal file additions

We're going to define a new section in our hs-blog-gen.cabal file for our new test suite. This section is called test-suite and it is fairly similar to the library and executable sections.

The interfaces for how to define a test suite are described in the Cabal documentation. We are going to use the exitcode-stdio-1.0 interface. Let's go over the different settings and options:

test-suite hs-blog-gen-test
  import: common-settings
  type: exitcode-stdio-1.0
  hs-source-dirs: test
  main-is: Spec.hs

  -- other-modules:
  build-depends:
      base
    , hspec
    , hspec-discover
    , raw-strings-qq
    , hs-blog
  ghc-options:
    -O -threaded -rtsopts -with-rtsopts=-N
  build-tool-depends:
    hspec-discover:hspec-discover
  • hs-source-dirs: test - The directory of the source file for the test suite
  • main-is: Spec.hs - The entry point to the test suite source file
  • other-modules - The modules in our test suite. Currently commented out because we haven't added any yet.
  • build-depends - The packages we are going to use:
    • base - The standard library for Haskell, as we've used before
    • hspec - The test framework we are going to use
    • hspec-discover - Automatic discovery of hspec tests
    • raw-strings-qq - Additional syntax for writing raw string literals
    • hs-blog - Our library
  • ghc-options - Extra options and flags for GHC
    • -O - Compile with optimizations
    • -threaded - Use the multi-core runtime instead of single-core runtime. The multi-core runtime is generally a bit slower in my experience, but when writing code that actually uses multiple cores (such as a test framework that runs tests in parallel) it can gives a good performance boost.
    • -rtsopts - lets us manipulate the Haskell runtime system by passing command-line arguments to our application
    • -with-rtsopts=-N - Set specific default options for the program at link-time. Specifically, -N Sets the number of cores to use in our program.
  • build-tool-depends - Uses a specific executable from a package dependency in aid of building the package. In this case, we are using the hspec-discover executable from the hspec-discover package, which goes over the source directory for the tests, finds all of the Spec files, and creates an entry point for the program that will run all the tests it discovered.

Hspec discovery

In order for hspec-discover to work, we need to add the following to the "main" file of the test suite, for us this is test/Spec.hs:

{-# OPTIONS_GHC -F -pgmF hspec-discover #-}

That's it! hspec-discover will automatically define a main for us. Now we can run the tests using stack test or cabal v2-test (your choice). Because we haven't defined any tests, our output is:

Finished in 0.0000 seconds
0 examples, 0 failures

When we add new hspec tests, hspec-discover will find and run them automatically (though we will still need add them to the other-modules section in the cabal file).

For hspec-discover to identify modules as test modules, the modules must follow a convention:

  1. Their module names must end with Spec
  2. They must define a value spec :: Spec (which describes the test) and export it outside of the module (by adding it to the export list of the module, for example).

Writing tests

Let's write our first test. We'll create a new module to test markup parsing. We'll call it MarkupParsingSpec.hs. We'll need the following imports as well:

module MarkupParsingSpec where

import Test.Hspec
import HsBlog.Markup

hspec provides us with a monadic interface for describing, composing and nesting test specifications (Specs).

Using the describe function we can describe a group of tests, using the it function we can add a new test, and using a function like shouldBe we can compare two values and make sure they are equal by using their Eq instance. If they are, the test will pass, and if not, it will fail with a descriptive error.

Let's try it and write a test that obviously fails!

spec :: Spec
spec = do
  describe "Markup parsing tests" $ do
    it "empty" $
      shouldBe
        (parse "")
        [Header 1 "bug"]

After adding the module to the other-modules list in the cabal file:

  other-modules:
    MarkupParsingSpec

And running the tests, we get this output:

MarkupParsing
  Markup parsing tests
    empty FAILED [1]

Failures:

  test/MarkupParsingSpec.hs:10:7: 
  1) MarkupParsing, Markup parsing tests, empty
       expected: [Header 1 "bug"]
        but got: []

  To rerun use: --match "/MarkupParsing/Markup parsing tests/empty/"

Randomized with seed 763489823

Finished in 0.0004 seconds
1 example, 1 failure

The output describes which tests are running in a hierarchy tree (module, group and test), whether the tests passed or failed, and if they failed, what was the output and what the expected output was.

We can fix our test by matching the expected output:

      shouldBe
        (parse "")
        []

Now, running the tests will produce:

MarkupParsing
  Markup parsing tests
    empty

Finished in 0.0001 seconds
1 example, 0 failures

We can add a few more tests:

    it "paragraph" $
      shouldBe
        (parse "hello world")
        [Paragraph "hello world"]

    it "header 1" $
      shouldBe
        (parse "* Header 1")
        [Header 1 "Header 1"]

    it "code" $
      shouldBe
        (parse "> main = putStrLn \"hello world!\"")
        [CodeBlock ["main = putStrLn \"hello world!\""]]

And run the tests again:

MarkupParsing
  Markup parsing tests
    Test empty
    paragraph
    header 1
    code

Finished in 0.0003 seconds
4 examples, 0 failures

This is the gist of writing unit tests with hspec. It's important to note that we can nest Specs that are declared with describe to create trees, and of course refactor and move things to different functions and modules to make our test suite better organized.

For example, we can write our tests like this:

spec :: Spec
spec = do
  describe "Markup parsing tests" $ do
    simple

simple :: Spec
simple = do
  describe "simple" $ do
    it "empty" $
      shouldBe
        (parse "")
        []

    it "paragraph" $
      shouldBe
        (parse "hello world")
        [Paragraph "hello world"]

    it "header 1" $
      shouldBe
        (parse "* Header 1")
        [Header 1 "Header 1"]

    it "code" $
      shouldBe
        (parse "> main = putStrLn \"hello world!\"")
        [CodeBlock ["main = putStrLn \"hello world!\""]]

Also, there are other "expectations" like shouldBe that we can use when writing tests. They are described in the hspec tutorial and can be found in the haddock documentation as well.

Raw strings

If we want to write multi-line strings, or avoid escaping strings like we did in the "code" test, we can use a library called raw-strings-qq which uses a language extension called QuasiQuotes. QuasiQuotes is a meta-programming extension that provides a mechanism for extending the syntax of Haskell.

A quasi-quote has the form [quoter| string |], where the quoter is the name of the function providing the syntax we wish to use, and the string is our input.

In our case, we use the quoter r, which is defined in raw-strings-qq, and write any string we want, with multi-lines and unescaped strings! We could use this to write the tests we previously wrote:

{-# language QuasiQuotes #-}

...

import Text.RawString.QQ

...

example3 :: String
example3 = [r|
Remember that multiple lines with no separation
are grouped together to a single paragraph
but list items remain separate.

# Item 1 of a list
# Item 2 of the same list
|]

And add multi-line tests:

spec :: Spec
spec = do
  describe "Markup parsing tests" $ do
    simple
    multiline


multiline :: Spec
multiline = do
  describe "Multi-line tests" $ do
    it "example3" $
      shouldBe
        (parse example3)
        example3Result
        

example3 :: String
example3 = [r|
Remember that multiple lines with no separation
are grouped together to a single paragraph
but list items remain separate.

# Item 1 of a list
# Item 2 of the same list
|]

example3Result :: Document
example3Result =
  [ Paragraph "Remember that multiple lines with no separation are grouped together to a single paragraph but list items remain separate."
  , OrderedList
    [ "Item 1 of a list"
    , "Item 2 of the same list"
    ]
  ]

Running the tests:

MarkupParsing
  Markup parsing tests
    simple
      Test empty
      paragraph
      header 1
      code
    Multi-line tests
      example3

Finished in 0.0004 seconds
5 examples, 0 failures

Exercise: Add a test for the fourth example described in the previous exercises.

Solution
multiline :: Spec
multiline = do
  describe "Multi-line tests" $ do
    it "example3" $
      shouldBe
        (parse example3)
        example3Result

    it "example4" $
      shouldBe
        (parse example4)
        example4Result
        

example4 :: String
example4 = [r|
* Compiling programs with ghc

Running ghc invokes the Glasgow Haskell Compiler (GHC),
and can be used to compile Haskell modules and programs into native
executables and libraries.

Create a new Haskell source file named hello.hs, and write
the following code in it:

> main = putStrLn "Hello, Haskell!"

Now, we can compile the program by invoking ghc with the file name:

> ➜ ghc hello.hs
> [1 of 1] Compiling Main             ( hello.hs, hello.o )
> Linking hello ...

GHC created the following files:

- hello.hi - Haskell interface file
- hello.o - Object file, the output of the compiler before linking
- hello (or hello.exe on Microsoft Windows) - A native runnable executable.

GHC will produce an executable when the source file satisfies both conditions:

# Defines the main function in the source file
# Defines the module name to be Main, or does not have a module declaration

Otherwise, it will only produce the .o and .hi files.
|]

example4Result :: Document
example4Result =
  [ Header 1 "Compiling programs with ghc"
  , Paragraph "Running ghc invokes the Glasgow Haskell Compiler (GHC), and can be used to compile Haskell modules and programs into native executables and libraries."
  , Paragraph "Create a new Haskell source file named hello.hs, and write the following code in it:"
  , CodeBlock
    [ "main = putStrLn \"Hello, Haskell!\""
    ]
  , Paragraph "Now, we can compile the program by invoking ghc with the file name:"
  , CodeBlock
    [ "➜ ghc hello.hs"
    , "[1 of 1] Compiling Main             ( hello.hs, hello.o )"
    , "Linking hello ..."
    ]
  , Paragraph "GHC created the following files:"
  , UnorderedList
    [ "hello.hi - Haskell interface file"
    , "hello.o - Object file, the output of the compiler before linking"
    , "hello (or hello.exe on Microsoft Windows) - A native runnable executable."
    ]
  , Paragraph "GHC will produce an executable when the source file satisfies both conditions:"
  , OrderedList
    [ "Defines the main function in the source file"
    , "Defines the module name to be Main, or does not have a module declaration"
    ]
  , Paragraph "Otherwise, it will only produce the .o and .hi files."
  ]

Summary

This chapter has been just the tip of the iceberg of the Haskell testing landscape. We haven't talked about property testing or golden testing, testing expected failures, testing IO code, inspection testing, benchmarking, and more. There's just too much to cover!

My hope is that this chapter provided you with the basics of how to start writing tests for your own projects. Please consult the tutorial for your chosen testing framework, and read more about this very important subject on your own.

You can view the git commit of the changes we've made and the final version of the code.

Generating documentation

There are many ways to make it easier for others to get started with our projects and libraries. For example, we can write tutorials, provide runnable examples, describe the internals of the system, and create an API reference.

In this chapter we will focus on generating API reference pages (the kind that can be seen on Hackage) from annotated Haskell source code using Haddock.

Running Haddock

We can generate API reference pages (a.k.a. haddocks in the Haskell world) for our project using our favorite package manager:

Cabal

We can run cabal v2-haddock to generate haddocks:

➜ cabal v2-haddock
Resolving dependencies...
Build profile: -w ghc-9.0.1 -O1
In order, the following will be built (use -v for more details):
 - hs-blog-0.1.0.0 (lib) (first run)
Configuring library for hs-blog-0.1.0.0..
Preprocessing library for hs-blog-0.1.0.0..
Running Haddock on library for hs-blog-0.1.0.0..
Haddock coverage:
   0% (  0 /  3) in 'HsBlog.Env'
  Missing documentation for:
    Module header
    Env (src/HsBlog/Env.hs:3)
    defaultEnv (src/HsBlog/Env.hs:10)
  21% (  7 / 33) in 'HsBlog.Html.Internal'
  Missing documentation for:
    Module header
    Html (src/HsBlog/Html/Internal.hs:8)
...
Documentation created:
/tmp/learn-haskell-blog-generator/dist-newstyle/build/x86_64-linux/ghc-9.0.1/hs-blog-0.1.0.0/doc/html/hs-blog/index.html

Cabal and Haddock will build our project and generate HTML pages for us at:

./dist-newstyle/build/<platform>/<compiler>/<package>-<version>/doc/html/<package>/

We can then open the index.html file from that directory in a web browser and view our package documentation.

Stack

We can run stack haddock to generate haddocks:

➜ stack haddock
...
hs-blog> build (lib + exe)
Preprocessing library for hs-blog-0.1.0.0..
Building library for hs-blog-0.1.0.0..
[1 of 7] Compiling HsBlog.Env
[2 of 7] Compiling HsBlog.Html.Internal
...
hs-blog> haddock
Preprocessing library for hs-blog-0.1.0.0..
Running Haddock on library for hs-blog-0.1.0.0..
Haddock coverage:
   0% (  0 /  3) in 'HsBlog.Env'
  Missing documentation for:
    Module header
    Env (src/HsBlog/Env.hs:3)
    defaultEnv (src/HsBlog/Env.hs:10)
  21% (  7 / 33) in 'HsBlog.Html.Internal'
  Missing documentation for:
    Module header
    Html (src/HsBlog/Html/Internal.hs:8)
...
Documentation created:
.stack-work/dist/x86_64-linux-tinfo6/Cabal-3.2.1.0/doc/html/hs-blog/index.html,
.stack-work/dist/x86_64-linux-tinfo6/Cabal-3.2.1.0/doc/html/hs-blog/hs-blog.txt
Preprocessing executable 'hs-blog-gen' for hs-blog-0.1.0.0..
...

Stack and Haddock will build our project and generate HTML pages for us at:

./.stack-work/dist/<platform>/Cabal-<version>/doc/html/<package>/

We can then open the index.html file from that directory in a web browser and view our package documentation.

Haddock coverage

Haddock will also output a coverage report when run, and will mention user-exposed constructs which are missing documentation. These constructs could be module headers, types, data constructors, type classes, functions, values, etc.

For example:

Haddock coverage:
...
   0% (  0 /  3) in 'HsBlog.Convert'
  Missing documentation for:
    Module header
    convert (src/HsBlog/Convert.hs:8)
    convertStructure (src/HsBlog/Convert.hs:23)
  67% (  2 /  3) in 'HsBlog.Directory'
  Missing documentation for:
    buildIndex (src/HsBlog/Directory.hs:80)
...

We can see that we did not document the HsBlog.Convert at all, and we are missing documentation for the module header, the convert function and the convertStructure function.

On the other hand, it seems that we do currently have some documentation written for the HsBlog.Directory module! We'll see why, but first - try to generate haddocks, see the module hierarchy, browse around the different modules, follow the links of the types, imagine what this API reference could look like, and let's see how we can improve it.

Haddock markup

Haddock builds the API reference pages by building our project, examining the exported modules and their exported definitions, and grabbing source code comments written in special markup format.

Let's take a quick look at this markup format. We will go over a few important bits, but if you'd like to learn more, a complete guide for Haddock markup can be found in the Haddock documentation.

Documenting definitions

All haddock annotations appear as part of regular Haskell comments. They can be used with both single line form (--) and multi-line form ({- and -}). The placement of a comment block and the haddock marker determine to which Haskell definition the haddock string is attached.

We can annotate a Haskell definition by writing a comment block prefixed with | before the definition, or by writing a comment block prefixed with ^ after the definition.

For example:

-- | Construct an HTML page from a `Header`
--   and a `Structure`.
html_
  :: Header -- ^ Represents the @\<head\>@ section in an HTML file
  -> Structure -- ^ Represents the @\<body\>@ section in an HTML file
  -> Html
html_ = ...
...

Here's another example:

{- | Represents a single markup structure. Such as:

- A paragraph
- An unordered list
- A code block
-}
data Structure
  = Header Natural String
  -- ^ A section heading with a level
  | Paragraph String
  -- ^ A paragraph
  | UnorderedList [String]
  -- ^ An unordered list of strings
  | OrderedList [String]
  -- ^ An ordered list of strings
  | CodeBlock [String]
  -- ^ A code block

And another:

{- | Markup to HTML conversion module.

This module handles converting documents written in our custom
Markup language into HTML pages.
-}
module HsBlog.Convert where

As you can see, | and ^ can be used to document functions, function arguments, types, data constructors, modules, and more. They are probably the most important Haddock annotations to remember (and even then, | alone will suffice).

Tip: Annotate the modules, types, and the top-level definitions which are exported from your project with some high-level description of what they are used for (at the very least).

Your users and collaborators will thank you!

Section headings

We can separate our module into sections by adding headings. Headings are comments which are prefixed with a number of * (just like in our markup language).

For example:

-- * HTML EDSL

html_ :: Header -> Structure -> Html
html_ = ...

-- ** Structure

p_ :: Content -> Structure
p_ = ..

h1_ :: Content -> Structure
h1_ = ..

...

-- ** Content

txt_ :: String -> Content
txt_ = ...

link_ :: FilePath -> Content -> Content
link_ = ...

It is also possible to add headings to the export list instead:

module HsBlog.Html
  ( -- * HTML EDSL
    Html
  , html_

    -- ** Combinators used to construct the @\<head\>@ section
  , Header
  , title_
  , stylesheet_
  , meta_

    -- ** Combinators used to construct the @\<body\>@ section
  , Structure
  , p_
  , h_
  , h1_
  , ul_
  , ol_
  , code_

    -- ** Combinators used to construct content inside structures
  , Content
  , txt_
  , img_
  , link_
  , b_
  , i_

    -- ** Render HTML to String
  , render
  )
  where

Separating parts of the module into sections helps keeping the important things together and Haddock will create a table-of-contents at the top of a module page for us as well.

Sometimes it's also easier to figure out whether a module should be split into multiple modules or not after splitting it into sections using headings.


Exercise: Try to re-arrange the modules in our project to your liking and add headings to sections.


Formatting

As we saw earlier, we can also add formatting in the content of our comments. For example, we can:

  • Hyperlink identifiers by surrounding them with `.

    For example: `Header`

  • Write monospaced text by surrounding it with @.

    For example: @Paragraph "Hello"@

  • Add emphasis to text by surrounding it with /.

    For example: /this is emphasised/

  • Add bold to text by surrounding it with __.

    For example: __this is bold__

More

In this chapter we've covered the basics of the Haddock markup language. If you'd like to know more, the Haddock markup guide contains information on how to create even more interesting documentation structures, such as code blocks, grid tables, images and examples.

Summary

We've briefly covered one aspect of documenting Haskell programs: using Haddock to generate informative API reference pages created from source code comments which are annotated with Haddock markup.

While API references are incredibly valuable, remember that there are other forms of documentation that can help your users get started quickly, such as examples and tutorials.


Exercise: Add haddock annotation to the top-level definitions in our project and test your understanding of the program and the various parts - sometimes the best way to learn something is to try explaining it!


Recap

In this book we've implemented a very simple static blog generator while learning Haskell as we go.

  • We've learned about basic Haskell building blocks, such as definitions, functions, types, modules, recursion, pattern matching, type classes, IO and exceptions.
  • We've learned about EDSLs and used the combinator pattern to implement a composable html generation library.
  • We've learned how to leverage types, modules and smart constructors to make invalid states unrepresentable.
  • We've learned how to represent complex data using ADTs.
  • We've learned how to use pattern matching to transform ADTs, and how to use recursion to solve problems.
  • We've used the functional core, imperative shell approach to build a program that handles IO and applies our domain logic to user inputs.
  • We've learned about abstractions such as monoids, functors and monads, and how they can help us reuse code and convey information about shared interfaces.
  • We've learned how to create fancy command-line interfaces, write tests, and generate documentation.

While Haskell is a very big and complex language, and there's always more to be learned, I think we've reached an important milestone where you can start building your own Haskell projects and be productive with Haskell!

This is a good time to celebrate and pat yourself on the back for getting this far! Great job, you!

If you'd like to learn even more about Haskell and continue your Haskell journey beyond this book, check out the appendix sections Where to go next and the FAQ.

Thank you!

Thank you for reading this book. I hope you enjoyed it and found Haskell interesting.

I would very much like to hear your feedback. If you'd like, you could leave your feedback at this book's GitHub issue board, or you could reach me directly on Twitter or via email. You can find my contact information on my website.

If you liked this book, do let me know - your kind words mean a lot.

Finally, if you really liked this book and would like to support future passion projects like it, you can support me directly via Ko-fi.

Thank you and good luck with your next Haskell project!

Where to go next

Haskell is an incredibly rich and deep programming language. New cutting-edge techniques, concepts and features are still being discovered and sometimes integrated into GHC. This sometimes makes it seem impossible to catch up to.

This phenomena is sometimes dubbed The Haskell pyramid. My hope is that by reading this book and following the exercises, you readers have reached the bar of productivity, and you can now go and start working on your own projects with Haskell. I highly encourage you to do so. In my opinion, writing useful Haskell projects is the best method to solidify what you currently know, and identify what you still need to learn.

Extending this project

If you'd like to extend this project, here are a few ideas for you:

  1. Serve over HTTP - You can use a web library such as wai or scotty to serve this blog over HTTP instead of generating it statically
  2. Rewrite it with libraries - you could rewrite it and use a real-world HTML package and markdown parser
  3. Add features
    1. You could add a metadata block at the top of each article which would include the title, publish date and tags of a blog post, and use them when generating HTML, index page and even tags pages
    2. You could add HTML pages templating using mustache or similar, and use that to generate a nice and customizable structure to the page
    3. You could add support for links and images in our markup language parser
    4. Add support for configuration via files which would include things like the blog title, description, or other meta information for things like twitter cards

Or anything else you can think about, consider this project your playground and do whatever you like with it!

Other resources

At some point you are likely to run into new concepts, techniques, or even just a feeling of "I feel like this could be done better". I'd like to point you in the right direction so you can find additional information and learn new Haskell things when you need to or want to.

I've compiled a list of resources for learning Haskell called Haskell study plan, it should include links to very useful articles, community hubs and news aggregators, project suggestions, and even cool open-source Haskell projects. You will also find alternative explanations to thing we've covered and even links to other Haskell tutorials, guides and books in case you need a different view on things.

Also, the GHC User Guide is a fantastic resource with loads of articles and information about the language and GHC tooling around it. It is often the best place to learn about Haskell the language.

However, don't feel pressured to learn everything Haskell has to offer right away, mastering Haskell is a journey that can take a lot of time. Most of us are definitely not there yet, but we can still be very productive with Haskell, build real-world projects, and even discover new techniques and concepts ourselves.

Remember that in a lazy language we evaluate things only when we need them. Maybe we can do that too with Haskell concepts!

Frequently asked questions

Got a question? You can ask in the book's issue tracker!

General questions

How to install editor tools

As far as I know, the most recommended setup today for Haskell development is using VSCode or VSCodium together with the marketplace Haskell extension.

The Haskell extension uses haskell-language-server which can be install via GHCup or even via the Haskell extension itself.

There are other options of course if this setup isn't your jam.

How to learn new things

The Haskell community keeps marching forward, developing new libraries, tools and techniques as well as creating new material for older concepts. The Haskell planetarium aggregates feeds from several communities into one page, as well as a Haskell Weekly newsletter. You might also find the quite a bit of Haskell presence on Twitter!

Debugging

How to debug Haskell code

Most imperative languages provide a step debugger. While the GHCi debugger, exists it is not particularly easy to use, especially because of Haskell's lazy evaluation where things might not evaluated at the order we might intuitively expect. Because of that, Haskellers tend to use trace debugging and equational reasoning. With trace debugging, we try to verify our assumptions about the code - we use the various trace functions as a "hack" to print variables, functions inputs, functions output or even just say "got here", from anywhere at the code.

After finding something that does not match our assumptions, such as unexpected input or output of a function, we try to think what piece of code could be responsible for the discrepancy, or even use trace debugging again to pinpoint the exact location, and try to use "equational reasoning" to evaluate the offending code that betrayed our expectations. If it's easy to do, we try running the function in ghci with different inputs to check our assumptions as well.

Because Haskell focuses on immutability, composibility and using types to eliminate many classes of possible errors, "local reasoning" becomes possible, and trace debugging becomes a viable strategy for debugging Haskell programs.

How to understand type errors

GHC type errors are often not the most friendly errors messages, but they mean well! They are just trying to help us find inconsistencies in our code - often with regards to type usage, they help us avoid making errors.

When you run into error messages, start by reading the messages themselves carefully until you get used to them, and then the offending code hinted by the error message. As you gain experience, it is likely that the most important part of an error will be the location of the offending code, and by reading the code we can find the error without the actual error message.

Adding type signatures and annotations to test your understanding of the types also helps greatly. We can even ask GHC for the expected type in a certain place by using typed holes.

My program is slow. Why?

There could be various reasons. From using inefficient algorithms or using an unsuited data structure for the task in terms of time complexity of the common operations, to less efficient memory representations (this is another reminder to use Text over String in most cases), and laziness issues (again, the evaluation strategy!).

The performance section in my Haskell study plan links to various resources on Haskell evaluation, profiling and case studies.

Design

How to structure programs

Start with the imperative shell functional core approach, define EDSLs with the combinator pattern for logic if needed, use monadic capabilities such as State locally if needed, maybe add an environment configuration with ReaderT, see how it goes.

If that approach fails you, look at why it failed and examine other solutions according to your needs.

How to model data

Modeling data using ADTs are usually the way to go. Often programmers coming from object oriented background tend to look at type classes as a way to define methods similar to inheritance, but this often isn't the right approach and ADTs with different constructors for different alternatives go a long way. Remember that even OOP people often preach for composition over inheritance.

Use functions to define behavior on data rather than trying to couple the two together.