## Contents:

- Why I write and who you are
- Haskell - The language of terms
- Haskell - The language of types
- Haskell - Typeclasses
- Interlude: how typing simplifies development
- Parser combinators: Fast streaming of huge and complex files
- Big data: Algorithms and performance

In this series, I will compare code from Python and Haskell. While I expect you to be familiar with Python, I assume Haskell is new. The goal of the next three posts is to give you a foundation in the language. You may be tempted to jump ahead to the practical chapters, and I’m not one to dissuade anyone from a good temptation, but when you are done being lost, I hope you’ll come back here.

Haskell can be divided into two languages: one of terms and one of types. In this section, I will introduce the language of terms. In the next two sections I will introduce the language of types.

## Haskell blitz - all the syntax

I will start by very quickly presenting all the core term-level syntax in Haskell. At the end of this chapter I will address a few big ideas.

In this chapter, I will only deal with single file programs. These should always be named “Main.hs”. Here is a simple Hello World example:

`= print "Hello World" main `

This can be compiled and run:

```
$ ghc Main.hs
$ ./Main
"Hello World"
```

`ghc`

is the most common Haskell compiler.

The `print`

function used above can print most things. Here are a few
primitives:

```
-- The `do` keyword allows us to chain together many IO operations
= do
main print "I am a string"
print 42
print (42 + 1)
print [1,2,3]
print (1, "hello", True)
```

Say no to heterogeneous lists. Elements in Haskell lists must have the same type.

Lists, such as `[1,2,3]`

, are homogeneous lists containing elements of only one
type. The are *linked lists*, so are not very memory efficient and do not allow
efficient random access. Elements may, however, be added and removed efficiently
from the beginning of the list.

```
= do
main print [] -- empty list
print 1:[2,3] -- ':' adds the left element to the beginning of the list
```

Haskell supports list comprehensions. These may be familiar from Python. They differ only in that Haskell follows the more succinct mathematical set builder notation:

```
= [x * x | x <- [1..10], mod x 2 == 0]
squares = [[x * y | x <- [1..10], mod x 2 == 0] | y <- [1..10], mod y 2 == 0]
timesTable
= do
main print squares
print timesTable
```

I’m going to stop repeating the `main`

term now. You get the idea. There is also a
Haskell repl shell you can use, just call `ghci`

. Then you can enter expressions
and see the results immediately. However, its evaluation can be misleading, so
I’d recommend sticking with compiled code for the moment.

Functions may be defined and called like so:

`= length (concat (map words (lines text))) wc text `

This function counts the number of words in a text. First `lines`

splits the
text into lines, then `map words`

splits each line into words, then `concat`

merges all the lists of words into one list, then `length`

finds the total
number of words.

Functions may be composed with the dot operator:

`= (length . concat . map words . lines) text wc text `

Or equivalently:

`= length . concat . map words . lines wc `

Here we drop the explicit argument `text`

.

The function `map`

applies a function to every element in a list. Above we
apply the function `words`

to split lines into words. Here are a few other
examples of `map`

and *partial application* of arguments:

```
= map (+ 1)
incrementAll = map (take 5) -- take the first 5 from each list in a list of lists firstFive
```

There are no for-loops in Haskell. Functions such as `map`

are used
instead.

Declarations may be given their own scope with `where`

statements:

```
= map square xsInc where
foo xs = map (+ 1) xs
xsInc = x * x square x
```

Haskell functions always contain a single expression. Code that would be placed
in the function body in other languages is typically placed in the `where`

block in Haskell.

Pattern matching is central in Haskell and is used in most functions. Here is the exponential Fibonacci implementation:

```
0 = 1
fib 1 = 1
fib = fib (n - 1) + fib (n - 2) fib n
```

Pattern matching can decompose structures:

```
fst (x, _) = x -- take the first element in a tuple
snd (_, x) = x -- take the second element in a tuple
head (x:_) = x -- take the first element a list
tail (_:xs) = xs -- take all but the first element in a list
```

What if we pass a three-element tuple to `fst`

? This would raise an error at
compile time. For `fst`

, we *know* the type is a two element tuple. Nothing else
can *ever* be passed to `fst`

. Thus the definition above handles all possible cases
and cannot *ever* fail.

Guards provide a succinct alternative to if-elif-else chains:

```
fibonacci n| n < 0 = 0 -- are there negative Fibonacci numbers?
| n < 2 = 1
| otherwise = fibonacci (n - 1) + fibonacci (n - 2) -- otherwise is an alias for True
```

If you want to avoid exponential blow-up due to repeated calculations, you can implement Fibonacci iteratively as follows.

```
-- n is the index of the requested Fibonacci number
fastFibonacci n| n < 0 = 0
| n < 2 = 1
| otherwise = f 2 1 1 where
-- define a new function where
-- ni is the current index
-- x is (ni - 1)th Fibonacci number
-- y is (ni - 2)th Fibonacci number
fib ni x y| n == ni = x + y
| otherwise = fib (ni + 1) y (x + y)
```

Guards and pattern matching play nicely together, for example:

```
-- the merge step from the merge-sort algorithm
:xs) (y:ys)
merge (x| x > y = y : x : merge xs ys
| otherwise = x : y : merge xs ys
-- we can use apostrophes in names, here I use it to avoid a name conflict
= []
filter' f [] :xs)
filter' f (x| f x = x : filter' f xs
| otherwise = filter' f xs
```

That is just about all you need to know about the basic Haskell syntax. Next we will dive a little deeper into a few Haskell concepts.

## A deeper look into functions

In mathematics, a function is a one-to-one mapping from one set (the domain) to another set (the codomain). Every element in the domain must map to exactly one element in the codomain.

Functions in Haskell follow the mathematical definition. That is all you need to know to reason about them.

The function

`= g x + h x f x y `

is equivalent to the mathematical statement *f*(*x*, *y*) = *g*(*x*) + *h*(*y*). The equals
sign indicates true equality. This means that the right and left sides of the
equals signs can be substituted for each other anywhere in the code.

Substitutability has deep consequences. Consider the following Haskell code:

```
= 5
x = f x
y = (y, y) z
```

In Haskell, equals means equals

Since `y = f x`

, then `(y, y)`

must equal `(f x, f x)`

. This entails `f x`

cannot have side effects such as mutating `x`

. To see why this is true, let’s
say that `f`

*could* mutate `x`

. Say `f`

mutates `x`

to `x+1`

and then returns
the new `x`

. In this case, `(f x, f x)`

would equal `(6,7)`

whereas `(y,y)`

,
given `y = f x`

, would equal `(6,6)`

. So we see that allowing mutation breaks
our function definition, thus it cannot be allowed. The same is true for other
side effects. Suppose `f x`

prints the value of `x`

to the terminal every time
it is run. Then `(f x, f x)`

would print twice whereas `(y,y)`

would print only
once (when `y`

is defined). So `f`

cannot have side effects.

The equational reasoning of Haskell differs from Python and most other languages. It is familiar, though, from elementary algebra and from our usual intuition about the world. If Python was your first language, then you had to unlearn your childhood intuition.

In languages that use assignment, it is common to write expressions such as:

`x = x + 1`

Here the value `x`

is reassigned to the old value of `x`

plus 1. In Haskell,
this same expression may be executed without raising an error, but something
stranger happens instead. Since `x`

is equal to `x+1`

, then `x+1`

is equal to
`(x+1)+1`

. `x`

may be substituted again, so `x=((x+1)+1)+1`

. And so on
forever. The term is initiates an infinite summation.

## What about keyword arguments?

Python function definition and call syntax is quite complex. Arguments may be
positional or may be given defaults and passed as keyword arguments. Arguments
may be expanded from `*args`

or `**kwargs`

. Function definitions may be set to
positional only or keyword only with `def foo(x, /)`

and `def foo(*, ...)`

syntax, respectively. Arguments to a function may use names `foo(x=1)`

or be
positional `foo(1)`

. And of course functionality may be expressed as independent
functions or class methods. There is more than one way to do it, and the right
way isn’t always obvious, even if you are Dutch.

Haskell is much simpler. All arguments are positional. If a function needs to take many parameters, for example a CSV parser, then the parameters would be defined in a record and passed as a positional argument. This is often a good practice in Python as well. For an opinion other than mine, see Eden Au’s post here.

## Where are all my loops?

Haskell has no loop constructs. Instead we use recursion.

Rather than having special syntax for looping, in Haskell we define our own looping functions. Some of the basic looping patterns may be implemented (naively) as below:

```
map _ [] = []
map f (x:xs) = f x : map' f xs
filter _ [] = []
filter f (x:xs)
| f x = x : filter f xs
| otherwise = filter f xs
foldl _ b [] = b
foldl f b (x:xs) = foldl (f b x) xs
foldr _ b [] = b
foldr f b (x:xs) = f x (foldr f b xs)
```

These functions may be used as so:

```
map f xs
filter f xs
foldl f 0 xs
```

The same operations could be done with loops in Python as below:

```
# map pattern
= []
ys for x in xs:
ys.append(f(x))
# filter pattern
= []
ys for x in xs:
if f(x):
ys.append(x)
# fold pattern
= 0 # or some other initial value
b for x in xs:
= f b x b
```

There are a few key differences between the Haskell approach of the looping approach.

First, the Haskell functions are far more general. The implementations I wrote above are just for lists, but the functions are usually implemented within typeclasses. We’ll cover this later on. But the main idea is that they work for more than just lists. For example, you might map or fold a tree, not just a list.

Second, the Haskell functions are more structured. `map`

calls a function on
each element in a structure with the guarantee that the topology of the
structure is not changed. The Python loops deconstruct a structure, piece by
piece, but leave reconstruction up to the programmer.

Third, the Haskell functions are always the same. A for-loop may represent many different patterns over unbranched sequences. It may mix many concerns (e.g., filtering and mapping). It may terminate early. Understanding what a for-loop is doing requires carefully reading the body of the loop. For complex cases, this can be a source of errors.

For example, consider this Python generator:

```
for x in xs:
# some odd edge case
# lots of code
... if bar(x):
continue
# lots of code
... yield z
```

For each element in the iterator `xs`

, the generator yields some value `z`

, except in some
edge case. A programmer might casually add the filtering constraint `bar(x)`

and
break downstream code that requires one `z`

for each element in `xs`

. The
intention in Haskell is more explicit:

`map foo . filter bar`

The `map`

function cannot change topology and the `filter`

function can *only*
change topology (by dropping elements). The large Python for-loop body is
cleanly separated into two logical concerns.

## What if I want side effects?

Previously, I wrote the old Hello World with explanation as:

`= print "Hello World" main `

`print`

certainly appears to be a function that has side effects. It writes to
the console. What gives?

I will explain in more detail later, but for now you can think print as a one-way portal out of the magic Haskell wonderland.

We’ll come back to this in chapter 3 when we introduce the IO monad.

## Where are all the types?

Haskell is supposed to be a statically-typed language, but where are all the types?

If you come from Java or C++, you may think of statically typed languages as being languages where you have to write repetitive type annotations every time you define a variable. From this angle, a selling point of Python is the freedom to not worry about types. However, the difference between static and dynamic languages is not that one has type annotations and the other doesn’t. Rather it is that in static languages all types are known at compile time, whereas in dynamic code types are resolved at runtime. But no good compiler would ever trust the annotations of mortal men doomed to lie. So compilers infer types.

Haskell requires type annotations only when types are ambiguous. But that does not mean that Haskell programmers avoid writing types. To the Haskeller, far from being syntactic noise, type annotations are a conceptual tool for reasoning about code and building complex programs. This will be the topic of the chapter.