# Parsing markup part 01 (Recursion)

Let's have a look at how to parse a multi-lined string of markup text
written by a user and convert it to the `Document`

type we defined
in the previous chapter.

Our strategy is to take the string of markup text, and:

- Split it to a list where each element represents a separate line, and
- Go over the list line by line and process it, remembering information from previous lines if necessary

So the first thing we want to do is to process the string line by line.
We can do that by converting the string to a list of string.
Fortunately the Haskell
`Prelude`

module from the Haskell standard library
`base`

exposes the function
`lines`

that does exactly what we want. The `Prelude`

module is exposed in every
Haskell file by default so we don't need to import it.

For the line processing part, let's start by ignoring all of the markup syntax and just group lines together into paragraphs (paragraphs are separated by an empty line), and iteratively add new features later in the chapter:

A common solution in imperative programs would be to iterate over the lines
using some *loop* construct and accumulate lines that should be grouped together
into some intermediate mutable variable. When we reach an empty line, we insert
the content of that variable into another mutable variable that accumulates the
results.

Our approach in Haskell isn't so different, except that we do not use loops
or mutable variables. Instead, we use **recursion**.

## Recursion and accumulating information

Instead of loops, in Haskell we use recursion to model iteration.

Consider the following contrived example: let's say that
we want to write an algorithm for adding two natural numbers together,
and we don't have a standard operation to do that (+), but we do
have two operations we could use on each number: `increment`

and `decrement`

.

A solution we could come up with is to slowly "pass" one number to the other number iteratively, by incrementing one, and decrementing the other. And we do that until the number we decrement reaches 0.

For example for `3`

and `2`

:

- We start with
`3`

and`2`

, and we increment`3`

and decrement`2`

- On the next step we now have
`4`

and`1`

, we increment`4`

and decrement`1`

- On the next step we now have
`5`

and`0`

, since the second number is`0`

we declare`5`

as the result.

This can be written imperatively using a loop:

```
function sum(n, m) {
while (m /= 0) {
n = increment(n);
m = decrement(m);
}
return n;
}
```

We can write the same algorithm in Haskell without mutation using recursion:

```
sum n m =
if m /= 0
then sum (increment n) (decrement m)
else n
```

**In Haskell, in order to emulate iteration with mutable state, we call the function again
with the values we want the variables to have in the next iteration.**

### Evaluation of recursion

Recursion commonly has a bad reputation for being slow and possibly unsafe compared to loops. This is because in imperative languages, calling a function often requires creating a new call stack.

However, functional languages (and Haskell in particular) play by different
rules and implement a feature called tail call elimination - when the result of a function call
is the result of the function (this is called tail position), we can just drop the current
stack frame and then allocate one for the function we call, so we don't require `N`

stack frames
for `N`

iterations.

This is of course only one way to do tail call elimination and other
strategies exists, such as translating code like our recursive `sum`

above to the iteration version.

#### Laziness

Haskell plays by slightly different rules because it uses a *lazy evaluation strategy*
instead of the much more common strict evaluation strategy. An *evaluation strategy*
refers to "when do we evaluate a computation". In a strict language the answer is simple:
*we evaluate the arguments of a function before entering a function*.

So for example the evaluation of `sum (increment 3) (decrement 2)`

using strict evaluation
will look like this:

- Evaluate
`increment 3`

to`4`

- Evaluate
`decrement 2`

to`1`

- Evaluate
`sum 4 1`

Or, alternatively (depending on the language) we reverse (1) and (2) and evaluate the arguments from right-to-left instead of left-to-right.

On the other hand, with lazy evaluation, we *only evaluate computation when we need it*, where
'*when do we need it?*' is when it is part of a computation that will have some effect on the
outside world, for example when writing a computation to standard output or sending it over the network.

So unless this computation is required, it won't be evaluated. For example:

```
main =
if sum (increment 2) (decrement 3) == 5
then putStrLn "Yes."
else putStrLn "No."
```

In the case above, we need the result of `sum (increment 2) (decrement 3)`

in order to know which message to write,
so it will be evaluated. But:

```
main =
let
five = sum (increment 2) (decrement 3)
in
putStrLn "Not required"
```

In the case above we don't actually need `five`

, so we don't evaluate it!

But then if we know we need `sum (increment 2) (decrement 3)`

,
do we use strict evaluation now? The answer is no - because we might not need
to evaluate the arguments to complete the computation. For example in this case:

```
const a b = a
main =
if const (increment 2) (decrement 3) == 3
then putStrLn "Yes."
else putStrLn "No."
```

`const`

ignores the second argument and returns the first, so we don't actually need
to calculate `decrement 3`

in order to provide an answer to the computation and in
turn output an answer to the screen.

With the lazy evaluation strategy we will evaluate expressions when we need to (when they are required
in order to do something for the user), and we evaluate from the outside in - first
we enter functions, and then we evaluate the arguments when we need to (usually when the thing
we want to evaluate appears in some control flow such as the condition of an `if`

expression
or a pattern in pattern matching).

I've written a more in-depth blog post about how this works in Haskell: Substitution and Equational Reasoning.

Please read it and try to evaluate the following program by hand:

```
import Prelude hiding (const, sum) -- feel free to ignore this line
increment n = n + 1
decrement n = n - 1
const a b = a
sum n m =
if m /= 0
then sum (increment n) (decrement m)
else n
main =
if const (sum 3 2) (decrement 3) == 5
then putStrLn "Yes."
else putStrLn "No."
```

Remember that evaluation always begins from `main`

.

## Solution

evaluating `main`

```
if const (sum 3 2) (decrement 3) == 5
then putStrLn "Yes."
else putStrLn "No."
```

expanding `const`

```
if sum 3 2 == 5
then putStrLn "Yes."
else putStrLn "No."
```

expanding `sum`

```
if (if 2 /= 0 then sum (increment 3) (decrement 2) else 3) == 5
then putStrLn "Yes."
else putStrLn "No."
```

evaluating the control flow `2 /= 0`

```
if (if True then sum (increment 3) (decrement 2) else 3) == 5
then putStrLn "Yes."
else putStrLn "No."
```

Choosing the `then`

branch

```
if (sum (increment 3) (decrement 2)) == 5
then putStrLn "Yes."
else putStrLn "No."
```

expanding `sum`

```
if
( if decrement 2 /= 0
then sum
(increment (increment 3))
(decrement (decrement 2))
else (increment 3)
) == 5
then putStrLn "Yes."
else putStrLn "No."
```

Evaluating `decrement 2`

in the control flow (notice how both places change!)

```
if
( if 1 /= 0
then sum
(increment (increment 3))
(decrement 1)
else (increment 3)
) == 5
then putStrLn "Yes."
else putStrLn "No."
```

Evaluating the control flow `1 /= 0`

```
if
( if True
then sum
(increment (increment 3))
(decrement 1)
else (increment 3)
) == 5
then putStrLn "Yes."
else putStrLn "No."
```

Choosing the `then`

branch

```
if
( sum
(increment (increment 3))
(decrement 1)
) == 5
then putStrLn "Yes."
else putStrLn "No."
```

Expanding `sum`

```
if
( if decrement 1 /= 0
then sum
(increment (increment (increment 3)))
(decrement (decrement 1))
else increment (increment 3)
) == 5
then putStrLn "Yes."
else putStrLn "No."
```

Evaluating control flow `decrement 1`

```
if
( if 0 /= 0
then sum
(increment (increment (increment 3)))
(decrement 0)
else increment (increment 3)
) == 5
then putStrLn "Yes."
else putStrLn "No."
```

Evaluating control flow `0 /= 0`

```
if
( if False
then sum
(increment (increment (increment 3)))
(decrement 0)
else increment (increment 3)
) == 5
then putStrLn "Yes."
else putStrLn "No."
```

Choosing the `else`

branch

```
if
(increment (increment 3)) == 5
then putStrLn "Yes."
else putStrLn "No."
```

Evaluate control flow `increment (increment 3)`

```
if
(increment 3 + 1) == 5
then putStrLn "Yes."
else putStrLn "No."
```

Evaluate in control flow `increment 3`

```
if
(3 + 1 + 1) == 5
then putStrLn "Yes."
else putStrLn "No."
```

Evaluate in control flow `3 + 1`

```
if
(4 + 1) == 5
then putStrLn "Yes."
else putStrLn "No."
```

Evaluate in control flow `4 + 1`

```
if
5 == 5
then putStrLn "Yes."
else putStrLn "No."
```

Evaluate in control flow `5 == 5`

```
if
True
then putStrLn "Yes."
else putStrLn "No."
```

Choosing the `then`

branch

```
putStrLn "Yes."
```

Which when run will print `Yes.`

to the screen.

### General recursion

In general, when trying to solve problems recursively, it is useful to think about the problem in three parts:

- Finding the
**base case**(the most simple cases - the ones we already know how to answer) - Figuring out how to
**reduce**the problem to something simpler (so it gets closer to the base case) **Mitigating the difference**between the reduced version and the solution we need to provide

The reduce and mitigate steps together are usually called the *recursive step*.

Let's take a look at another example problem: generating a list of a particular size with a specific value in place of every element.

In Haskell, this function would have the following signature:

```
replicate :: Int -> a -> [a]
```

Here are a few usage examples of `replicate`

:

```
ghci> replicate 4 True
[True,True,True,True]
ghci> replicate 0 True
[]
ghci> replicate (-13) True
[]
```

How would we implement this function recursively? How would describe it in three steps above?

**Base case**: the cases we already know how to generate are the cases where the length of the list is zero (or less) - we just return an empty list.**Reduce**: while we might not know how to generate a list of size`N`

(where`N`

is positive), if we knew the solution for`N-1`

we could:**Mitigate**: Add another element to the solution for`N-1`

using the`:`

(cons) operator.

Try to write this in Haskell!

## Solution

```
replicate :: Int -> a -> [a]
replicate n x =
if n <= 0 -- recognizing the base case
then
[] -- the solution for the base case
else
x : replicate (n - 1) x
-- --- -------------------
-- ^ ^
-- | |
-- | +-------- reduction
-- |
-- +--- mitigation
```

### Mutual recursion

When solving functions recursively we usually call the same function again,
but that doesn't have to be the case. It is possible to reduce our problem
to something simpler that requires an answer from a different function.
If, in turn, that function will (or another function in that call chain)
call our function again, we have a **mutual recursive** solution.

For example, let's write two functions, one that checks whether a natural number is even or not, and one that checks whether a number is odd or not only by decrementing it.

```
even :: Int -> Bool
odd :: Int -> Bool
```

Let's start with `even`

, how should we solve this recursively?

**Base case**: We know the answer for`0`

- it is`True`

.**Reduction**: We might not know the answer for a general`N`

, but we could check whether`N - 1`

is odd,**Mitigation**: if`N - 1`

is odd, then`N`

is even! if it isn't odd, then`N`

isn't even.

What about `odd`

?

**Base case**: We know the answer for`0`

- it is`False`

.**Reduction**: We might not know the answer for a general`N`

, but we could check whether`N - 1`

is even,**Mitigation**: if`N - 1`

is even, then`N`

is odd! if it isn't even, then`N`

isn't odd.

Try writing this in Haskell!

## Solution

```
even :: Int -> Bool
even n =
if n == 0
then
True
else
odd (n - 1)
odd :: Int -> Bool
odd n =
if n == 0
then
False
else
even (n - 1)
```

## Partial functions

because we didn't handle negative cases in the example above, our functions will loop forever
when a negative value is passed as input. A function that does not return a result for some value
(either by not terminating or by throwing an error) is called **a partial function**
(because it only returns a result of a part of the possible inputs).

Partial functions are generally considered **bad practice** because they can have
unexpected behaviour at runtime, so we want to **avoid using** partial functions
as well as **avoid writing** partial functions.

The best way to avoid writing partial functions is by covering all inputs!
In the situation above, it is definitely possible to handle negative numbers
as well, so we should do that! Or, instead, we could require that our functions
accept a `Natural`

instead of an `Int`

, and then the type system would've stopped
us from using these functions with values that we did not handle.

There are cases where we can't possibly cover all inputs, in these cases it is important to re-examine the code and see if we could further restrict the inputs using types to mitigate these issues.

For example, the `head :: [a] -> a`

function from `Prelude`

promises
to return the first element (the head) of a list, but we know that lists
could possibly be empty, so how can this function deliver on its promise?

Unfortunately, it can't. But there exists a different function that can:
`head :: NonEmpty a -> a`

from the
`Data.List.NonEmpty`

module! The trick here is that this other `head`

does not take a general list
as input, it takes a different type entirely, one that promises to have
at least one element, and therefore can deliver on its promise!

We could also potentially use smart constructors with `newtype`

and enforce some sort
of a restriction in the type system, as we saw in earlier chapters.
But this solution can sometimes be less ergonomic to use.

An alternative approach is to use `data`

types to encode the absence of a proper result,
for example, using `Maybe`

, as we'll see in a future chapter.

## Parsing markup?

Let's get back to the task at hand.

As stated previously, our strategy for parsing the markup text is:

- Split the string to a list where each element is a separate line
(which we can do with
`lines`

), and - Go over the list line by line and process it, remembering information from previous lines if necessary

Remember that we want to start by ignoring all of the markup syntax and just group lines together into paragraphs (paragraphs are separated by an empty line), and iteratively add new features later in the chapter:

```
parse :: String -> Document
parse = parseLines [] . lines -- (1)
parseLines :: [String] -> [String] -> Document
parseLines currentParagraph txts =
let
paragraph = Paragraph (unlines (reverse currentParagraph)) -- (2), (3)
in
case txts of
[] -> [paragraph]
currentLine : rest ->
if trim currentLine == ""
then
paragraph : parseLines [] rest -- (4)
else
parseLines (currentLine : currentParagraph) rest -- (5)
trim :: String -> String
trim = unwords . words
```

Things to note:

- We pass a list that contains the currently grouped paragraph (paragraphs are separated by an empty line)
- Because of laziness,
`paragraph`

is not computed until it's needed, so we don't have to worry about the performance implications in the case the we are still grouping lines - Why do we reverse
`currentParagraph`

? (See point (5)) - When we run into an empty line we add the accumulated paragraph to the resulting list (A
`Document`

is a list of structures) and start the function again with the rest of the input. - We pass the new lines to be grouped in a paragraph
**in reverse order**because of performance characteristics - because of the nature of singly-linked lists, prepending an element is fast, and appending is slow. Prepending only requires us to create a new cons (`:`

) cell to hold a pointer to the value and a pointer to the list, but appending requires us to traverse the list to its end and rebuild the cons cells - the last one will contain the last value of the list and a pointer to the list to append, the next will contain the value before the last value of the list and a pointer to the list which contains the last element and the appended list, and so on.

This code above will group together paragraphs in a structure, but how do we view our result? In the next chapter will take a short detour and talk a bit about type classes, and how they can help us in this scenario.