## Contents:

- Why I write and who you are
- Haskell - The language of terms
- Haskell - The language of types
- Haskell - Typeclasses
- Parser combinators: Fast streaming of huge and complex files
- Multiple sequence alignment: Algorithms and performance
- Command line scripting: IO, system calls and parallelism
- Rogue: Graphics and concurrency

** In development **

One of the main concepts I want to convey is that Haskell is fundamentally simpler than Python and most other mainstream languages.

Don’t believe me? If Haskell is so simple, why is it hard to learn? The answer is that complex things have been built from these simple components. You could spend a lifetime exploring the idealized forms and puzzles the community creates. But you don’t have to. Most of the complexity of Haskell is optional.

An additional reason why Haskell is harder for some people is that it is very different from the familiar Algol languages (C, Java, Python, etc). Your first thought when starting with a new language may be, how do I write a for loop? Or, how do I print “hello world”? If so, the responses, “there is no for loop” and “let me tell you about monads”, will not seem very friendly.

I will introduce six main concepts: functions, type constructors, type signatures, type classes, pattern matching and IO.

This order may seem strange to you. Why walk through three sections on types before writing a single line of “normal” code? A functional language is all about functions, functions are maps between things, and things are described by types. So I start by explaining how data is described with types. Then how function types are defined. Then how properties of generics are refined with type classes. With these tools, we can specify an entire Haskell program and type check it. I will introduce you to the design process. Then I will show how to actually write a function (this is easy since Haskell is so simple) and how to take apart data with pattern matching. Finally, I will show you how IO is handled in Haskell.

## Type constructors

What is a boolean? In most languages, this is a complex question. In C, there is
no built in boolean type. Relational operators (`==`

, `<`

, etc) return `int`

values of 0 or 1. The conditional clause in an `if`

statement evaluates to true
if the condition has a binary value of 0. This “0” could be a NULL pointer, a 0
integer, an empty string (just a NULL), or a double that is 0. For floats, there
is the odd case of -0, which does not have a binary value of 0 but still
evaluates to false.

In Python, there is a built-in boolean type with just two possible values,
`False`

and `True`

. This simplicity is deceptive though. As with C, the bool
type is also an integer. The expressions `1 + True`

is 2 and `"ab"[True]`

is
“b”. All objects may be used in a conditional clause and Python will coerce them
to a boolean. “True” objects include non-empty sets, non-empty lists, non-zero
numbers, and empty strings. Classes in Python3 can implement a `__bool__`

method
that allows them to be converted to boolean values. Some types may have multiple
zeroes. For instance, if you have a value that may be either None or an integer,
and you use it in a boolean, you cannot be sure why the `else`

branch is
entered.

In short, the boolean is heavily blessed in most languages. Special boolean behavior is cooked into the compiler. Learning all the quirks of the boolean is just one of many chores on the path language mastery.

In Haskell, though, there is nothing to learn. The `Bool`

type is defined as
follows:

`data Bool = True | False `

There is no boolean primitive in Haskell. A boolean is simply a set of two
values, `True`

and `False`

. It is not an integer. It has no properties. It is
exactly equivalent to the logical notion of a boolean that every child
understands.

Haskell programmers often speak of how Haskell removes whole categories of errors. This is one such case. It is also the first of many examples I will show of why Haskell is simpler than languages such as Python.

Now let’s switch from booleans and talk about things that may fail. If you look up a key in a map, and the key does not exist, what is returned? Of course, there are many things one might do. Maybe you could raise an exception, as is done for dictionaries in Python. This halts further evaluation and unwinds the stack until caught. Or you could return a NULL value and hope the calling function handles it correctly. However, if a NULL is returned, how can you know whether a NULL is returned because the data structure stored a NULL or because the key did not exist?

In Haskell, the broad notion of values that may or may not exist, and
computations that may or may not succeed, is captured by the type `Maybe`

,
defined as follows:

`data Maybe a = Nothing | Just a`

`Maybe`

is a parameterized type with one type variable `a`

. Lowercase type terms
in type declarations always represent generics. The type of `a`

depends on use.
`Nothing`

and `Just`

are type constructors. `Just`

takes an argument. As with
`Bool`

, everything there is to know about `Maybe`

is contained in the type
signature. There is no special compiler handling, no coupled dictionary of
methods. It is an minimal concept. The featurelessness of `Maybe`

allow its
universal use.

But Maybe is not very expressive. What if we want to know more about why on
value is returned? What if we are writing a text parser and want to know where
and why the parse failed. In this case, we want to return one of two possible
values, either the result of a successful parse, or a description of a failing
parse. This concept is captured by the data type `Either`

:

`data Either a b = Left a | Right b`

There is no try-catch in Haskell. There is nothing special about code that may “fail”. What “failure” even is is subjective in Haskell. In most languages, the two paths a computation can take, the “exception” path were something goes wrong, and the successful path where nothing goes wrong, use different semantics. In Haskell, both paths are normal computation. The pass and fail branches are equal parts of the model in Haskell. Both are, in general, handled with equal care. A Haskell programmer can usually say with confidence that all corner cases are modeled within their program. If there are any cases that are not handled (i.e., partial functions), the programmer will know (if they pay any attention to compiler warnings). This leads us to another category of bugs that Haskell eliminates in practice: edge cases.

In practice, you always know if a function you are using can fail because its failure is modeled in the return type as Maybe or Either.

So with three lines we define everything needed for handling conditionals, null values and exceptions. All this without adding any special handling in the compiler.

```
data Bool = True | False
data Maybe a = Nothing | Just a
data Either a b = Left a | Right b
```

We can define data structures in a similar fashion. Perhaps the simplest such data structure is the list:

`data List a = Head a | List a `

Square brackets are used in Haskell to represent this type, for example, `[Int]`

is a list of integers with values such as `[1,2,3]`

.

More complex types may also be easily defined, such as binary trees:

`data BinaryTree a = Leaf a | Node a (BinaryTree a) (BinaryTree a)`

Haskell also has special syntax for tuples, which are lists of types of set
length and variable type. For example, a tuple of type `(Int, String, Bool)`

may
contain values such as `(1, "yolo", True)`

.

Long tuples, though, are confusing to work with. Usually “Records” are used instead. Records are just tuples with names for easier access. Here is how you could define a tuple and a record that store the same information:

```
-- a tuple containing 3d coordinates
type Point = (Double, Double, Double)
-- a record that assigns names to each dimension
data PointRecord = PointRecord
pointX :: Double
{ pointY :: Double
, pointZ :: Double
, }
```

The `type`

keyword defines a type synonym. This is just a name that exists
purely for convenience.

The fields of `PointRecord`

are specified in curly brackets. The compiler will
generate accessor functions named `pointX`

, `pointY`

and `pointZ`

for accessing
each field in the record. Since these go into the global namespace, they must be
unique. Which is why we likely cannot list the labels as just `x`

, `y`

and `z`

.
Annoying? Yes, yes it is. Why can’t we just define `Point`

as:

`data Point = Point { x :: Double, y :: Double, z :: Double}`

And access with `foo.x`

syntax? Well, actually we can with a new Haskell
extension. In truth, records are one of the warts in Haskell. Ask any Haskeller
what they dislike about Haskell, and awkward records is likely to be one of the
items they list.

One last thing, why do we repeat `Point`

? Why not just write:

`data Point = {pointX :: Double, pointY :: Double, pointZ :: Double}`

The reason is that the first `Point`

is the name of the type. It is the name
that goes in type signatures. The second `Point`

is the name of the data
constructor. It is the name used to build or match against the record. While
records usually have only one data constructor, they may have more, for example:

```
data Point
= Point1D {x1D :: Double}
| Point2D {x2D :: Double, y2D :: Double}
| Point3D {x3D :: Double, y3D :: Double, z3D :: Double}
```

By conventions, when there is a single data constructor for a type, the type constructor and data constructor will have the same name.

Before I go on, I would like to introduce the concepts of “sum types” and “product types”. These terms are often used in practice and understanding the intuition behind them may be useful. And if it is not useful, it is at least beautiful.

Types describe sets and sets have sizes. The size of the boolean type is 2. The
size of the set of days is 7. What about `Maybe`

and `Either`

? The type `Maybe`

can be `Nothing`

or `Just a`

. `Nothing`

is a atom with just one value. `Just a`

has a size that is equal to the size of `a`

. So the size of `Maybe a`

is $1 +
||a||`, where the double bars represent size (or cardinality). What about`

Either a b`? It can be`

Left a`or`

Right b`, so we can just add the sizes of`

a`and`

b`: ||*E**i**t**h**e**r**a**b*|| = ||*a*|| + ||*b*||. Since the sizes of these types are
a summation over the sizes of their different options, we call them “sum types”.

The elements in a tuple or record, though, are free to be anything. The tuple
`(a,b)`

will have will have `||a|| * ||b||`

different possible values. So we
call this a “product type”.

This may all seem mathy for the sake of being mathy. But it leads to an interesting idea. If two types have the same size, then each element in one type can be uniquely mapped to an element in the other. This means that are “isomorphic”. One can be translated into the other without loss of information.

Haskell adds more syntactic sugar than is really necessary. Implementing `sum`

and `product`

types really only requires two constructors (operators). From a
mathematical perspective, it might be more elegant to use the type syntax `a + b`

rather than `a | b`

and `a * b`

rather than `(a, b)`

. Phrased like this, what
are the algebraic laws governing this system? Thinking along these lines will
turn you into a computer scientist. You might ask wonder, “what is subtraction,
exponentiation, derivation and integration at the type level”? This is why
people hate Haskellers.

OK, so that about covers everything you need to know about data declarations. The next layer of abstraction is to combine the data types with functions.

## Type signatures

To review, we declare data with the `data`

keyword. For example:

`data Weather = Sunny | Rainy | Cloudy`

Functions of data may be defined as follows:

`goOutsideToPlay :: Weather -> Bool`

This type signature describes functions that map values from the `Weather`

type to values in the `Bool`

type.

A function signature generally maps to many possible functions. Just for fun,
what is the size of the the `goOutsideToPlay`

type? That is, how many possible
functions are there between `Weather`

and `Bool`

. We can make a table of all
possible functions. Each function is a particular list of input/output pairs
(with True/False reduced to 0/1 for convenience):

```
Sunny 0 0 0 0 1 1 1 1
Rainy 0 1 0 1 0 1 0 1
Cloudy 0 0 1 1 0 0 1 1
```

These functions range from never going outside to always going outside. We see
that there are 8 different functions. Or 2^3. Or to phrase it differently,
||Bool||^||Weather||. In the previous section, we discussed “sum types” and
“product types”. Funtions can be described as “exponential types”. The size of
the function `a -> b`

is ||*b*||^{||}*a*||. Cool, huh? Useful? Well, probably not to
you. But these mathematical rules can actually be used within the compiler to
optimize code. The extreme simplicity of the Haskell language allows the
compiler to reason powerfully about the code.

Some functions signatures actually are specific enough to map to just one function or reasonable function. In these cases, implementations may be generated automatically. This principle will be applied in the following section to generate many typeclasses instances for a given data type.

What if we want to make functions of multiple arguments? Well. You can’t. But you can make a function that returns a function. Whether the playing outside is dependent on more than just the weather. Maybe it also depends on the child’s age. Now for a given child, you might want a function that tells you when they specifically will go outside. This can be written as follows:

`goOutsideToPlayAge :: Int -> (Weather -> Bool)`

Then you can write `goOutsideToPlayAge 14`

to get a function for children of age
14. Or you can serially apply arguments: `(goOutsideToPlayAge 14) Sunny`

. That
is a lot of unnecessary parentheses. They can be skipped since the `->`

operator
is right associative and function application is left associative. So we can
write the function signature as:

`goOutsideToPlayAge :: Int -> Weather -> Bool`

This signature is similar to the C prototype:

`bool goOutsideToPlayAge(int, Weather);`

But it is more flexible, since it allows convenient partial application, which is very useful when coupled with higher order functions. I will introduce these soon.

Being comfortable with type signatures is an essential skill for the Haskell programmer. This skill is best learned by seeing many examples. So I will show a few here.

The simplest function is the identity function:

`id :: a -> a`

As seen before, generic types are written in lowercase. It is conventional to
use `a`

, `b`

, etc as generic type variables. Single letter variables are frowned
upon in many communities, but they are common in Haskell. This is especially
true for generics. The `a`

in the identity function an be *anything*. It has no
distinguishing features. If we really wanted to call it something, we would have
to use something vague, like `anything`

or `whatever`

or `thing`

. By
conventions, we stick with `a`

unless we have a good reason to do otherwise.

Here are a few more functions, you can probably guess what all of them do:

```
fst :: (a, b) -> a
snd :: (a, b) -> b
onFst :: (a -> c) -> (a, b) -> c
```

In fact, all of these functions have a single non-pathological implementation.
The only non-pathological way to get a generic `a`

from `(a, b)`

is to take the
first value from the tuple.

Actually, there is also exactly one *pathological* implementation for each of
these functions. They could be undefined. That is, they could map *all* possible
input to “bottom”, the empty/undefined member that exists in all types. Since we
know nothing about the input, we cannot do any specific branching. So either we
do the one correct operation or we are undefined on everything.

Of course, most functions are not so simple. Here are few examples of functions that return the first element in a list:

```
head :: [a] -> a
headMay :: [a] -> Maybe a
headEither :: [a] -> Either String a
```

Since a list may be of 0 length, the first function `head`

is a *partial*
function. That is, one possible input maps (domain) to no output (codomain). So
`head`

could crash the program at runtime. The compiler knows this and will
raise warnings. Now notice what I did here, even without reading the
implementation of `head`

, I already know that it is partial. I know exactly how
to use (and when to avoid) `head`

.

`headMay`

is a total function. It handles the empty case by returning `Nothing`

and `Just a`

otherwise.

`headEither`

returns some sort of string describing the error. This is probably
a boring string since it can’t contain any information that `Nothing`

would not
capture equally well.

Now, notice all through here I assumed that `head`

was returning the first
element. What if the impementation actually returns the second element? This
would mean that the function is *mis-named*, which the type-system obviously
won’t catch. The `head`

function actually has a name that is narrow than the
signature. You could write the following functions with the same signature:

```
lastElement :: [a] -> a
middleElement :: [a] -> a
secondElement :: [a] -> a
```

A type system more expressive than vanilla Haskell might allow further refinement of the signature, for example:

`head :: xs:[a]{ len xs > 0 } -> x:a{ x == xs[0] }`

With these type annotations, the signature will uniquely map to one function for
each type `a`

. In languages like Idris, this kind of signature is possible.
Thinking along these lines is another reason why people hate Haskellers. For
most practical purposes, the type-level ambiguity of what `head`

means is
acceptable. The signatures narrow the space of possible implementations and our
mental model, using hints from the function name and documentation, does the
rest.

## Type classes (a foreshadowing)

Lets talk about a few more complex functions:

```
sum :: [a] -> a
generalSum :: col a -> a
sort :: [a] -> [a]
```

All of these signatures are broken.

In `sum`

we want to add together all values in a list. We want this function to
work for anything numeric. But addition does not apply to all things. So `a`

is,
again, *too* generic. There is not, for instance, an obvious way to add two
`Person`

objects.

In `generalSum`

, we want to generalize `sum`

to work on any collection that can
be iterated over. This might include a set of numbers, a binary tree of numbers,
or values in a hashmap of numbers. But `col`

is too generic. Not all types are
collections and certainly not all types take a single parameter.

In `sort`

, we want to take a list and return a sorted list. But any sort
function will require comparison between elements, and how do we compare things
we know nothing about? We need to specify that for any type in `a`

a comparator
function exists.

The problem is that generics are *too* generic. We need to be able to limit
generics to types that have specific required traits.

One solution to this problem would be to always pass in higher order functions that provide the required functionality.

Here is one such signature for `sum`

, written as `sum'`

.:

`sum' :: a -> (a -> a -> a) -> [a] -> a`

A reasonable sum of an empty list is 0. So whatever type `a`

is, it needs a zero
element. We also need a function that taks two members from set `a`

and returns
(adds) a new member also in set a. This function has type `(a -> a -> a)`

.
Taking the summation is then a simple process of iteratively applying the
addition operator to elements in the list and the accumulated value. You may
know this as a “reduce” operation. More commmonly known as “fold” in the Haskell
world.

The name `sum`

is actually a bit too general for the function we have developed.
The operation here needn’t be addition. It could also be multiplication. Or it
could be string concatenation with an empty string as the “zero” element. Or it
could be list appending, with the empty list as the “zero” element.

Now let’s look at `generalSum`

. It needs to be passed the addition functions as
well as a reduce/fold function for operating on the generic collection `col`

:

```
generalSum' :: ((b -> c -> c) -> c -> col b -> c)
-> a
-> (a -> a -> a)
-> col a
-> a
```

Wow. That is a long signature. The first element in the signature is a general
fold function. It takes a function `(b -> c -> c)`

which combines a value from
the initial collection with the accumulator and yields a new accumulator. The
second argument `c`

in the fold signature, is the initial accumulator. `col b`

is the original collection.

The second, third, and fourth arguments of `generalSum'`

will all be applied to
the fold function to yield the final summation.

The signatures are really awkward, though. Maybe we could role all the helper functions into a named record and pass that in instead? While would could put all three helper functions in one record, it would be more modular to separate out the summing functions and the folding functions. So we may get the following general data types:

```
data Foldable col = Foldable {fold :: ((a -> b -> b) -> b -> col a -> b)}
data Summable a = Summable {zero :: a, add :: a -> a -> a}
```

Now we can write:

`generalSum'' :: Foldable col -> Summable a -> col a -> a`

Now we can define a `Foldable`

record for each type of collection we use and a
`Summbable`

record for each type we want to sum.

Again, “Summbable” is a bit to specific a word. We could say “Multiplicatable”
if the operator were `\*`

rather than `+`

. Or “Appendable” if the operator were
list or string appending and the zero element `[]`

or `""`

. What word is there
for the class of things with an empty/zero element and a closed binary operator?
Well, there isn’t one in English. So we must either choose a deceptively
specific word like “Summable” or choose a precise but possibly unfamiliar (and
doubtlessly pretentious) mathematical term. The Haskell community has opted for
the second path and borrowed the term “Monoid” from abstract algebra.

Passing in records of required functions “specializes” the generics without any
builtin shenanigans. No type-level magic. No special methods baked into the
objects. No over-riding or inheritance or (in this case) multiple inheritance
patterns. The functions (folding, adding) and the zero element must by provided
for `generalSum''`

to be called.

Haskell does, however, elaborate on this idea by adding “typeclasses”.