How to TDD a compiler: the story so far

In this series, I will be documenting my experiences and progress in writing a compiler for a yet-to-be-named programming language.

A while ago now, sometime in 2015, I started to toy with the idea of writing my own programming language. I like to code (a lot!) in my spare time; I find it a great way to meditate and I get a lot of satisfaction out of expressing my thoughts in code.

I think it is fair to say that our industry is still very much in its infancy. We are slowly converging upon some good concepts, such as Test Driven Development (TDD), but I think we can do a lot better. When writing code, the language itself has a huge impact on how the code is written, simply because the language determines how easy or hard it is to express our thoughts.

At one extreme, machine code is a set of instructions that literally tell a processor what to do. This is great if what you are trying to do is tell a processor what to do, but in reality, what we are really trying to do is express our thoughts in code to achieve some objective; the fact that the code will end up as a set of instructions is just an implementation detail.

At the other extreme, wouldn’t it be great if you could just code up what your intention is, rather than what the implementation of your intention is?

I dream of a world where code is intention, not implementation.

There have been some steps towards this in industry, with the mainstream acceptance of functional programming paradigms, such as the Java 8 Streams API.

Unfortunately, Murphy’s law still applies. It is not uncommon in functional programming (and perhaps even encouraged) to see problems solved using types like Tuple with ridiculous constructs such as the wonderful fst and snd in Haskell. These kinds of constructs are back down the path of implementation rather than intention.

I am a huge fan of Tiny Types. They are one step closer to the goal of expressing what our code is supposed to do. Unfortunately, in most languages, the boilerplate code required to introduce a Tiny Type is so much so that most people won’t bother to use them at all!

This got me to thinking, what if the cost of introducing a Tiny Type was zero? This led me to the following realisation: in code that expresses intent well using types, variable names are redundant.

This is a powerful idea. How many times have you seen code like FirstName firstName = new FirstName("name");?

Wouldn’t it be so much nicer to see code like FirstName = "name";?

I want to make this kind of thing a reality. Maybe I’ll find out why it hasn’t been done this way before, or maybe I’ll start a new programming paradigm, who knows. For me at least, it is all about the journey.

This seemingly small change to the way we code has some far reaching implications. Let’s explore some of them.

Types, like variable names in conventional programming, must have a unique name within the same scope. This means that, in a particular scope, when you see a type, you know that it refers to precisely one concept in that scope. To some extent, you can think of it as just swapping the responsibility of variable names to be type introduction instead. You can imagine a syntax like FirstName: String being used to declare a new type in a method signature. This doesn’t introduce any problems in method or class scope, but things get interesting with global scope.

For example, any type that is part of a public API, say:

interface Farm {
    Integer countChickens();
    Integer countEggs();
}

is no longer possible because there is a name clash; does Integer refer to the thing returned by Integer countChickens(); the thing returned by Integer countEggs();?

To resolve this, a new type must be introduced in each case:

interface Foo {
    ChickenCount: Integer countChickens();
    EggCount: Integer countEggs();
}

It is also worth mentioning at this point, the idea with these kinds of type aliases is that it is possible for an instance to act as any of the aliases in its alias chain.

So in our running example, if we have a method Sum: Integer sum(First: Integer, Second: Integer) then it is perfectly possible to invoke it like so: Sum = sum(ChickenCount, EggCount)

That’s all for now, I’ll explore more of the ideas I have in future posts.

The prime factor module

Every integer can be written as a product of its prime factors, for example: 20 = 2 ^2 5^1.

This product of prime powers is a unique representation of that number, up to the order of the factors. This is known as the fundamental theorem of arithmetic.

Note that this also holds for rational numbers, for example: \frac{100}{15} = \frac{2^2 5^2}{3^1 5^1} = 2^2  3^{-1}  5^1.

The interesting thing is that you can consider the prime factorization of a rational number as a vector, where the elements of the vector represent how many times a particular prime number appears. For example, 20 = (2, 0, 1) and \frac{100}{15} = (2, -1, 1). These

It turns out that these prime factor vectors, with vector addition and scalar multiplication defined in the usual way, form a module over the integers. I’m not the first to have spotted this idea either.

I had a bit of fun formalising these ideas in this document.

I’m curious to know whether thinking of numbers in this way can give us some more insights than the usual number line. What do quantities like an angle or a volume in this module represent?