Lets code already!

This was a long info dump. Let's practice what we've learned. We want to:

  • Create the output directory
  • Grab all file names in a directory
  • Filter them according to their extension, we want to process txt file and
  • copy other files without modification
  • We want to parse each text file, build an index of the result, convert the files to HTML, and write everything to the target directory

Note: I did not write this code immediately in the final form it was presented. It was an iterative process of writing code, refactoring, splitting functions, changing type signatures, and more. When solving a coding problem, start small and simple, do the thing that works, and refactor it when it makes sense and makes the code clearer and more modular. In Haskell we pride ourselves in our ability to refactor code and improve it over time, and that principle holds when writing new software as well!

New module

Let's create a new module, HsBlog.Directory, which will be responsible for handling directories and multiple files. From this module we will export the convertDirectory and buildIndex functions we've defined before.

-- | Process multiple files and convert directories

module HsBlog.Directory
  ( convertDirectory
  , buildIndex
  )
  where

In this module we are going to use the directory and filepath libraries to manipulate directories, files and filepaths. We'll use the new abstractions we've learned, Traversable and Monad, and the concepts and types we've learned about: Either, IO and exceptions.

For all of that, we need quite a few imports:

import qualified HsBlog.Markup as Markup
import qualified HsBlog.Html as Html
import HsBlog.Convert (convert, convertStructure)

import Data.List (partition)
import Data.Traversable (for)
import Control.Monad (void, when)

import System.IO (hPutStrLn, stderr)
import Control.Exception (catch, displayException, SomeException(..))
import System.Exit (exitFailure)
import System.FilePath
  ( takeExtension
  , takeBaseName
  , (<.>)
  , (</>)
  , takeFileName
  )
import System.Directory
  ( createDirectory
  , removeDirectoryRecursive
  , listDirectory
  , doesDirectoryExist
  , copyFile
  )

If you are unsure what a specific function we're using does, look it up at Hoogle, read the type signature and the documentation, and play around with it in ghci.

Converting a directory

We can start by describing the high-level function convertDirectory which encapsulates many smaller functions, each responsible for doing a specific thing. convertDirectory is quite imperative looking, and looks like a different way to describe the steps of completing our task:

-- | Copy files from one directory to another, converting '.txt' files to
--   '.html' files in the process. Recording unsuccessful reads and writes to stderr.
--
-- May throw an exception on output directory creation.
convertDirectory :: FilePath -> FilePath -> IO ()
convertDirectory inputDir outputDir = do
  DirContents filesToProcess filesToCopy <- getDirFilesAndContent inputDir
  createOutputDirectoryOrExit outputDir
  let
    outputHtmls = txtsToRenderedHtml filesToProcess
  copyFiles outputDir filesToCopy
  writeFiles outputDir outputHtmls
  putStrLn "Done."

Here we trust that each IO function handles errors responsibly, and terminates the project when necessary.

Let's examine the steps in order.

getDirFilesAndContent

-- | The relevant directory content for our application
data DirContents
  = DirContents
    { dcFilesToProcess :: [(FilePath, String)]
      -- ^ File paths and their content
    , dcFilesToCopy :: [FilePath]
      -- ^ Other file paths, to be copied directly
    }

-- | Returns the directory content
getDirFilesAndContent :: FilePath -> IO DirContents

getDirFilesAndContent is responsible for providing the relevant files for processing. Both the ones we need to convert to markup (and their textual content) and other files we might want to copy as-is (such as images and style-sheets).

-- | Returns the directory content
getDirFilesAndContent :: FilePath -> IO DirContents
getDirFilesAndContent inputDir = do
  files <- map (inputDir </>) <$> listDirectory inputDir
  let
    (txtFiles, otherFiles) =
      partition ((== ".txt") . takeExtension) files
  txtFilesAndContent <-
    applyIoOnList readFile txtFiles >>= filterAndReportFailures
  pure $ DirContents
    { dcFilesToProcess = txtFilesAndContent
    , dcFilesToCopy = otherFiles
    }

This functions does 4 important things:

  1. List all the files in the directory
  2. Split the files to 2 groups according to their file extension
  3. Read the contents of the .txt files and report when files failed to read
  4. Return the results. We've defined a data type to make what each result more obvious

Part (3) is a little bit more involved than the rest, lets explore it.

applyIoOnList

-- | Try to apply an IO function on a list of values, document successes and failures
applyIoOnList :: (a -> IO b) -> [a] -> IO [(a, Either String b)]
applyIoOnList action files = do
  for files $ \file -> do
    maybeContent <-
      catch
        (Right <$> action file)
        ( \(SomeException e) -> do
          pure $ Left (displayException e)
        )
    pure (file, maybeContent)

applyIoOnList is a higher order function that applies a particular IO function (in our case readFile) on a list of things (In our case FilePaths), and for each thing, it returns the thing itself along with the result of applying the IO function as an Either, where the Left side is a String representation of an error if one occurred.

Notice how much the type of this function tells us about what it might do. Because the types are polymorphic, there is nothing else to do with with the as other than apply them to the function, and nowhere to generate b from other than the result of the function.

Note: when I first wrote this function, it was specialized to work only on readFile, take specifically [FilePath] and return IO [(FilePath, Either String String)]. But after running into other use cases where I could use it (writeFiles and copyFiles) I refactored out the action, the input type and the return type.

This function uses exceptions to catch any error that might be thrown, and encodes both the failure and success cases in the type system using Either, delaying the handling of exceptions to the caller of the function while making sure it won't be forgotten!

Next, let's look at the function that handles the errors by reporting and then filtering out all the cases that failed.

filterAndReportFailures

-- | Filter out unsuccessful operations on files and report errors to stderr.
filterAndReportFailures :: [(a, Either String b)] -> IO [(a, b)]
filterAndReportFailures =
  foldMap $ \ (file, contentOrErr) ->
    case contentOrErr of
      Left err -> do
        hPutStrLn stderr err
        pure []
      Right content ->
        pure [(file, content)]

This code may seem a bit surprising - how come we can use foldMap here? Reminder, the type of foldMap is:

foldMap :: (Foldable t, Monoid m) => (a -> m) -> t a -> m

If we specialize this function for our use case, substituting the general type with the types we are using, we learn that IO [(a, b)] is a monoid. And indeed - [a] is a monoid for any a with [] (the empty list) as mempty and ++ as <>, but also IO a is a monoid for any a that is itself a monoid with pure mempty as mempty and liftA2 (<>) as <>!

Using these instances, we can map over the content, handling errors and returning an empty list to filter a failed case out, or a singleton list to keep the result. and the fold in foldMap will concatenate the resulting list where we return all of the successful cases!

These functions were responsible for fetching the right information. Next, let's look at the code for creating a new directory.

createOutputDirectoryOrExit

-- | Creates an output directory or terminates the program
createOutputDirectoryOrExit :: FilePath -> IO ()
createOutputDirectoryOrExit outputDir =
  whenIO
    (not <$> createOutputDirectory outputDir)
    (hPutStrLn stderr "Cancelled." *> exitFailure)

-- | Creates the output directory.
--   Returns whether the directory was created or not.
createOutputDirectory :: FilePath -> IO Bool
createOutputDirectory dir = do
  dirExists <- doesDirectoryExist dir
  create <-
    if dirExists
      then do
        override <- confirm "Output directory exists. Override?"
        when override (removeDirectoryRecursive dir)
        pure override
      else
        pure True
  when create (createDirectory dir)
  pure create

createOutputDirectoryOrExit itself is not terribly exciting, it does what it is named after, it tries to create the output directory, and exits the program in case it didn't succeed.

createOutputDirectory is the function that actually does the heavy lifting. It checks if the directory already exists, and checks if the user would like to override it. If they do, we remove it and create the new directory, if they don't, we do nothing and report their decision.

txtsToRenderedHtml

let
  outputHtmls = txtsToRenderedHtml filesToProcess

In this part of the code we convert files to markup and change the input file paths into their respective output file paths (.txt -> .html), we then build the index page, and convert everything to HTML.

-- | Convert text files to Markup, build an index, and render as html.
txtsToRenderedHtml :: [(FilePath, String)] -> [(FilePath, String)]
txtsToRenderedHtml txtFiles =
  let
    txtOutputFiles = map toOutputMarkupFile txtFiles
    index = ("index.html", buildIndex txtOutputFiles)
  in
    map (fmap Html.render) (index : map convertFile txtOutputFiles)

toOutputMarkupFile :: (FilePath, String) -> (FilePath, Markup.Document)
toOutputMarkupFile (file, content) =
  (takeBaseName file <.> "html", Markup.parse content)

convertFile :: (FilePath, Markup.Document) -> (FilePath, Html.Html)
convertFile (file, doc) = (file, convert file doc)

One possibly surprising thing about this code could be the map (fmap Html.render) part. We can use fmap on the tuple because it is a Functor on the second argument, just like Either!

copyFiles and writeFiles

The only thing left to do after the processing is completed is to write the directory content after processing to the newly created directory:

-- | Copy files to a directory, recording errors to stderr.
copyFiles :: FilePath -> [FilePath] -> IO ()
copyFiles outputDir files = do
  let
    copyFromTo file = copyFile file (outputDir </> takeFileName file)
  void $ applyIoOnList copyFromTo files >>= filterAndReportFailures

Here we use applyIoOnList again to do something a bit more complicated, instead of reading from a file, it copies from the input path to a newly generated output path. Then we pass the result (which has the type [(FilePath, Either String ())]) to filterAndReportFailures to print the errors and filter out the unsuccessful copies. Because we are not really interested in the output of filterAndReportFailures, we discard it with void, returning () as a result instead.

-- | Write files to a directory, recording errors to stderr.
writeFiles :: FilePath -> [(FilePath, String)] -> IO ()
writeFiles outputDir files = do
  let
    writeFileContent (file, content) =
      writeFile (outputDir </> file) content
  void $ applyIoOnList writeFileContent files >>= filterAndReportFailures

Once again, this code looks almost exactly like copyFiles, but the types are different. Haskell's combination of parametric polymorphism + type class for abstractions is really powerful, and has helped us reduce quite a bit of code.


This pattern of using applyIoOnList and then filterAndReportFailures happens more than once. It might be a good candidate for refactoring. Try it! What do you think about the resulting code? Is it easier or more difficult to understand? Is it more modular or less? What are the pros and cons?


Summary

With that, we have completed our HsBlog.Directory module that handles converting a directory safely. Note that code could probably be simplified quite a bit if we were fine with errors crashing the entire program altogether, but sometimes this is the price we pay for robustness. It is up to you to choose what you can live with and what not, but I hope this saga have taught you how to approach handling errors in Haskell in case you need to.


View the full module:

HsBlog.Directory
-- | Process multiple files and convert directories

module HsBlog.Directory
  ( convertDirectory
  , buildIndex
  )
  where

import qualified HsBlog.Markup as Markup
import qualified HsBlog.Html as Html
import HsBlog.Convert (convert, convertStructure)

import Data.List (partition)
import Data.Traversable (for)
import Control.Monad (void, when)

import System.IO (hPutStrLn, stderr)
import Control.Exception (catch, displayException, SomeException(..))
import System.Exit (exitFailure)
import System.FilePath
  ( takeExtension
  , takeBaseName
  , (<.>)
  , (</>)
  , takeFileName
  )
import System.Directory
  ( createDirectory
  , removeDirectoryRecursive
  , listDirectory
  , doesDirectoryExist
  , copyFile
  )

-- | Copy files from one directory to another, converting '.txt' files to
--   '.html' files in the process. Recording unsuccessful reads and writes to stderr.
--
-- May throw an exception on output directory creation.
convertDirectory :: FilePath -> FilePath -> IO ()
convertDirectory inputDir outputDir = do
  DirContents filesToProcess filesToCopy <- getDirFilesAndContent inputDir
  createOutputDirectoryOrExit outputDir
  let
    outputHtmls = txtsToRenderedHtml filesToProcess
  copyFiles outputDir filesToCopy
  writeFiles outputDir outputHtmls
  putStrLn "Done."

------------------------------------
-- * Read directory content

-- | Returns the directory content
getDirFilesAndContent :: FilePath -> IO DirContents
getDirFilesAndContent inputDir = do
  files <- map (inputDir </>) <$> listDirectory inputDir
  let
    (txtFiles, otherFiles) =
      partition ((== ".txt") . takeExtension) files
  txtFilesAndContent <-
    applyIoOnList readFile txtFiles >>= filterAndReportFailures
  pure $ DirContents
    { dcFilesToProcess = txtFilesAndContent
    , dcFilesToCopy = otherFiles
    }

-- | The relevant directory content for our application
data DirContents
  = DirContents
    { dcFilesToProcess :: [(FilePath, String)]
      -- ^ File paths and their content
    , dcFilesToCopy :: [FilePath]
      -- ^ Other file paths, to be copied directly
    }

------------------------------------
-- * Build index page

buildIndex :: [(FilePath, Markup.Document)] -> Html.Html
buildIndex files =
  let
    previews =
      map
        ( \(file, doc) ->
          case doc of
            Markup.Header 1 header : article ->
              Html.h_ 3 (Html.link_ file (Html.txt_ header))
                <> foldMap convertStructure (take 2 article)
                <> Html.p_ (Html.link_ file (Html.txt_ "..."))
            _ ->
              Html.h_ 3 (Html.link_ file (Html.txt_ file))
        )
        files
  in
    Html.html_
      "Blog"
      ( Html.h1_ (Html.link_ "index.html" (Html.txt_ "Blog"))
        <> Html.h_ 2 (Html.txt_ "Posts")
        <> mconcat previews
      )

------------------------------------
-- * Conversion

-- | Convert text files to Markup, build an index, and render as html.
txtsToRenderedHtml :: [(FilePath, String)] -> [(FilePath, String)]
txtsToRenderedHtml txtFiles =
  let
    txtOutputFiles = map toOutputMarkupFile txtFiles
    index = ("index.html", buildIndex txtOutputFiles)
  in
    map (fmap Html.render) (index : map convertFile txtOutputFiles)

toOutputMarkupFile :: (FilePath, String) -> (FilePath, Markup.Document)
toOutputMarkupFile (file, content) =
  (takeBaseName file <.> "html", Markup.parse content)

convertFile :: (FilePath, Markup.Document) -> (FilePath, Html.Html)
convertFile (file, doc) = (file, convert file doc)

------------------------------------
-- * Output to directory

-- | Creates an output directory or terminates the program
createOutputDirectoryOrExit :: FilePath -> IO ()
createOutputDirectoryOrExit outputDir =
  whenIO
    (not <$> createOutputDirectory outputDir)
    (hPutStrLn stderr "Cancelled." *> exitFailure)

-- | Creates the output directory.
--   Returns whether the directory was created or not.
createOutputDirectory :: FilePath -> IO Bool
createOutputDirectory dir = do
  dirExists <- doesDirectoryExist dir
  create <-
    if dirExists
      then do
        override <- confirm "Output directory exists. Override?"
        when override (removeDirectoryRecursive dir)
        pure override
      else
        pure True
  when create (createDirectory dir)
  pure create

-- | Copy files to a directory, recording errors to stderr.
copyFiles :: FilePath -> [FilePath] -> IO ()
copyFiles outputDir files = do
  let
    copyFromTo file = copyFile file (outputDir </> takeFileName file)
  void $ applyIoOnList copyFromTo files >>= filterAndReportFailures

-- | Write files to a directory, recording errors to stderr.
writeFiles :: FilePath -> [(FilePath, String)] -> IO ()
writeFiles outputDir files = do
  let
    writeFileContent (file, content) =
      writeFile (outputDir </> file) content
  void $ applyIoOnList writeFileContent files >>= filterAndReportFailures

------------------------------------
-- * IO work and handling errors

-- | Try to apply an IO function on a list of values, document successes and failures
applyIoOnList :: (a -> IO b) -> [a] -> IO [(a, Either String b)]
applyIoOnList action files = do
  for files $ \file -> do
    maybeContent <-
      catch
        (Right <$> action file)
        ( \(SomeException e) -> do
          pure $ Left (displayException e)
        )
    pure (file, maybeContent)

-- | Filter out unsuccessful operations on files and report errors to stderr.
filterAndReportFailures :: [(a, Either String b)] -> IO [(a, b)]
filterAndReportFailures =
  foldMap $ \ (file, contentOrErr) ->
    case contentOrErr of
      Left err -> do
        hPutStrLn stderr err
        pure []
      Right content ->
        pure [(file, content)]

------------------------------------
-- * Utilities

confirm :: String -> IO Bool
confirm question = do
  putStrLn (question <> " (y/n)")
  answer <- getLine
  case answer of
    "y" -> pure True
    "n" -> pure False
    _ -> do
      putStrLn "Invalid response. Use y or n."
      confirm question

whenIO :: IO Bool -> IO () -> IO ()
whenIO cond action = do
  result <- cond
  if result
    then action
    else pure ()