How to TDD a compiler: learning to read

One of the first things I decided to dive deep into was how to get the textual representation of a program into a sensible representation in memory. This is known as parsing.

I got side tracked a lot along the way and ended up writing my own general purpose parser implementation based on the Marpa parsing algorithm.


During this experience, I implemented a scanner that was built around a Nondeterministic Finite Automaton (NFA) encoded in a Binary Decision Diagram (BDD). Using a BDD, I was able to transform an entire set of NFA frontiers to a new set given a newly scanned character in constant time.

Finite Automata

Usually, a Deterministic Finite Automaton (DFA) is used to perform transitions given a newly scanned character. Typically this is generated from an NFA and minimised for performance reasons.

NFAs have a smaller memory footprint than their corresponding DFAs. This is because the DFA must uniquely encode each possible word as a single path to an accepting state, whereas an NFA allows multiple paths up until a word is fully recognised. This smaller memory footprint of an NFA can make it more likely to result in CPU cache hits, drastically speeding up processing since main memory accesses are avoided more often.

However, because of the multiple paths in an NFA, this means it needs to keep track of sets of states rather than just one state as in a DFA. This is usually considered a problem since if the NFA has a large number of states, it could take a quadratic amount of time to process a single character in the worst case, in terms of the number of nodes in the NFA.

At least they are finite

BDD encoding

By using a BDD to encode the NFA, operations are performed on sets of nodes in constant time, and so the time to process a single character is constant.

You can have a look at the scanner implementation here, which I recently tidied up a bit. I made use of JDD, a Java BDD library.

The steps to build up the scanner went something like this:

  1. For each possible symbol type, define a regular expression.
  2. Convert the regular expressions to an NFA with nondeterministic “epsilon transitions”. This is a well-understood process called Thompson’s algorithm.
  3. Remove all the epsilon transitions from the NFA by allowing the NFA to encode sets of states rather than just individual states.
  4. Remove unreachable states.
  5. Relabel states so that the most frequent ones have the lowest identifier.
  6. Relabel transitions so that the most frequent ones have the lowest identifier.
  7. Encode the NFA using a BDD. The BDD variables are chosen to correspond with bit indexes of the binary representation of the state and transition identifiers.
  8. Order BDD variables so that the ones that provide the most entropy reduction have the lowest identifier.

Phew. Easy, right?

The relabelling steps (5, 6, 8) were an attempt to get the BDD variable representation of the transition table to be as compact as possible. This can be achieved by trying to get the representation to share as many nodes as possible.

Representing a row in the transition table, for example, requires a conjunction of every single BDD variable as either being present or not present. If the BDD variables corresponded directly to the states and transitions then this would be a very sparse representation, since each row represents only one transition and there would be very little node reuse.

This is why I encoded the BDD variables as bit indexes instead; a small number of bits can represent a lot of states. The representation of transition table rows will be less sparse since the binary representation of integers spans multiple bits. The lower order bits will be the most commonly populated, which is why the most frequent states were relabelled to have the lowest identifier in steps 5 and 6. The downside of this is that the BDD variables don’t have any intuitive meaning; this is just a hack to keep the number of BDD variables small.

The entropy reduction ordering of the BDD variables in step 8 is designed to place BDD variables that are infrequently used at the top of the BDD. These will typically be the higher order bits of the states, which we constructed to be the least frequently used. This means that the bulk of the BDD representations will say “not present” for the least frequently used variables, eliminating those variables from the graph in a small number of steps. The more frequently used states will then cluster after this, using a small number of variables to represent a large number of similar states.

Executing code by hand is not very fun

ZDD encoding

Digging deeper into how I could improve on the encoding, I discovered a slightly different data structure, called a Zero-suppressed Decision Diagram (ZDD). There is an excellent talk by Donald Knuth about these called “Fun with Zero-Suppressed Binary Decision Diagrams”. Unfortunately, I can’t seem to find an active link to it at the moment; it used to be here. They turn out to be excellent at efficiently representing sparse boolean functions, such as those that represent families of sets. Some good introductory notes on ZDDs can be found here.

Although ZDDs are extremely interesting in their own right, they turned out to not be suitable for the encoding we are using. They could work well if we kept one ZDD variable per state; then the sparseness would be well represented. We would also have some intuitive meaning back; the transition table would be a family of sets that represent all the pieces needed for a transition (from, via, to) as sets of states.

Unfortunately, I wasn’t able to figure out how to implement the frontier transition using the ZDD operations that JDD provides. In response to that, I started to have a go at implementing my own ZDD base with the operations I needed, which you can find here. I took the naive approach and coded in an object-oriented way rather than the low-level approach JDD takes.

Everything was going well until I went down the rabbit hole of caching. Now everything is a big mess and I haven’t looked at it in a long time. If I get back to it, I will probably contribute the operations I needed back to JDD rather than rolling my own, since an object oriented approach is very unlikely to be able to compete with the performance of a low-level implementation.


Moving on from the scanning side of things, when I started looking into parsing, the Marpa algorithm caught my attention. Marpa, envisaged by Jeffrey Kegler, builds on the earlier work of Jay Earley, Joop Leo, John Aycock and R. Nigel Horspool. It parses a wide variety of grammars in linear time, generated from a Backus Normal Form (BNF). The algorithm is designed to maintain lots of contextual information about the parse. For example, rules and symbols that have been recognised or are expected next, with their locations. This will come in handy for error reporting, which historically is very poor in many compilers.

At the time of writing, Marpa is only officially implemented in C and Perl. So you know what comes next… I had a go at writing an implementation in Java! You can have a look at the parser implementation here.

It took me lots of trial and error to wrap my head around the algorithm, but it was worth it in the end. The Marpa paper was a good guide; I was able to lift the core algorithm straight out of the pseudo code in the paper. Also, Early Parsing Explained by Loup Vaillant was very useful in understanding Early Parsers in general.

I’m not going to go into too much detail on this one, the implementation
was relatively straightforward and the resources above are very detailed. I did have some fun running a profiling tool over some performance tests at this point. My takeaways from that experience were to avoid unnecessary object allocations and to precompute as much as possible when constructing the parser.

I got a bit carried away micro optimising, for example, I found that Java 8 method references had a tendency to create a new anonymous object for each usage of a method reference. I found that I could beat the performance of the Stream API by handcrafting the equivalent imperative code, so I did that for a handful of operations that were repeated most frequently. I also found that the Marpa algorithm naturally contained opportunities to use the flyweight pattern.

It all made sense at the time

That’s all for now. I guess you might have noticed, so far there hasn’t been much TDD going on! We will get there I promise, for now, I’m just catching you up with the initial spikes and research that I was doing. On the timeline, we are up to about November 2015, which is around the time I stopped working on the scanning and parsing experiments.

Ron Garret’s “The Quantum Conspiracy: What Popularizers of QM Don’t Want You to Know” Google Tech Talk

Way back in January 2011, Ron Garret gave a Google Tech Talk titled “The Quantum Conspiracy: What Popularizers of QM Don’t Want You to Know”.

I’m just watching it again now, for the third time, hoping that if I write down my thoughts as I go along I will actually be able to make sense of it this time!

The best introduction to quantum mechanics I have found so far is a series of recordings of the great Richard Feynman talking about “The Quantum Mechanical View of Reality” (which you can find here: [1] [2] [3] [4]).

Ron splits his talk up into four steps:

  1. Review the usual QM story
  2. Show how it leads to a contradiction
  3. Do some math and show how that resolves the contradiction
  4. Tell a new story based on the math

I’ll split this post up to follow those steps so you can watch along with me.

Step 1: Review the usual QM story

Quantum mystery #1: The two-slit experiment

Ron starts by explaining the standard “two-slit” experiment, which I’m going to assume you are familiar with. If not, then this video explains it well.

The root of the apparent weirdness lies in the measurement problem, which gives rise to the so-called wave-particle duality.

Any modification that we make to this experiment that allows us to determine even in principal which of these slits this particle went through destroys the interference.Ron Garret

It is worth noting that this holds for any particle, any “measurement” and any equivalent “split/combine” experiment. This is not some obscure effect, it is possible that similar effects are happening around us all the time in ways we do not usually notice (or perhaps take for granted?) at the macro scale.

Ron goes on to attack the Copenhagen interpretation, in which the wave function is assumed to “collapse” at some point and “become” a particle.

By asking the question “how and when does this collapse happen?”, he starts with some intuition that any form of “collapse” is by nature irreversible and discontinuous. This contradicts the mathematics of quantum mechanics, in which quantum effects are continuous and reversible in time.

Quantum mystery #2: The “Quantum Eraser”

He then moves on to introduce an example of a “quantum eraser” experiment with one split and one combine:


In his eraser, he places one detector at a place that would represent a “dark fringe” and another at a place that would represent a “bright fringe” if an interference pattern was produced.

Taking a measurement after the split but before the combine destroys the interference pattern and each detector detects the same amount of particles. The interesting bit is that this measurement can be “erased” by destroying the information that the measurement determined before the particle hits the combining step. This restores the interference pattern that the detectors witness. The mind boggles…

There’s a really interesting article that appeared in Scientific American in 2007 that explains how to make a DIY quantum eraser. The corresponding web article can still be found here. This is what Ron is demonstrating in his talk with the polarised light filters.

The claim Ron makes is that the implication of this erasure is that it does not make sense to say that the wave function “collapses at the time of measurement”, since erasing that measurement in some sense restores the wave function so that wave information must have been preserved somewhere in the system.

Quantum mystery #3: Entanglement

He then moves on to an example of quantum entanglement:


Here, a UV laser and “down converter” (some kind of crystal structure) is used to produce photons of varying wavelengths, sending them off in opposite directions, in a 1:1 ratio.

By splitting at each end, using, for example, one of the polarisation measurements from earlier, we find that the LU/RD and LD/RU detectors are perfectly correlated, because of conservation laws.

This is what Einstein famously called “Spooky action at a distance”Ron Garret

What this seems to show is that, independent of the distance of separation between two entangled particles, a measurement of one instantaneously changes a perfectly correlated aspect of the particle’s entangled twin.

At first glance, this would seem to imply that faster than light communication is possible, however, it is widely believed that this is not the case since we have no control over what the result of a measurement will actually be.

Step 2: Show how it leads to a contradiction

Now, Ron says that he is going to show us how the story presented so far leads to a contradiction.

So far we have that:

  • A split/combine experiment produces interference
  • Any which-way measurement destroys interference
  • Some which-way measurements can be erased, restoring interference
  • Measurements on entangled particles are perfectly correlated

What they don’t want you to know:

All of these things cannot possibly be true!Ron Garret

He presents a thought experiment paradox which he dubs the Einstein-Podolsky-Rosen-Garret Paradox:


In this experiment, two “two-slit” experiments are fed by quantum entangled photons produced by the entanglement process he described earlier.

The question is, what happens if we take a measurement on the left side. Will this destroy interference on the right?

  • If the answer is “yes”, then we have faster than light communications, by using measurement on the left to produce a signal on the right, which is assumed to be impossible
  • But if the answer is “no”, then we know the position of the particle but we still have interference, which violates a fundamental principle of quantum mechanics

There is one more possibility, however. Perhaps there was no interference, to begin with! Maybe entanglement counts as a measurement that destroys interference?

Unfortunately, Ron claims that even in this case, we are not out of the woods yet. We can put in an eraser on the right and destroy the entanglement, leading us back to producing interference but still knowing the position of the particle and so still leading to a contradiction as before.

Step 3: Do some math and show how that resolves the contradiction

After some preliminaries about the wave function \Psi and the “hack” of extracting the probability of the wave function by “squaring” the magnitude |\Psi|^2, we move on to some two-slit math, making use of Dirac notation.

Note that all the non scalar quantities in an amplitude expression represent complex numbers. Also, “squaring” the magnitude of a complex number \Psi is like taking the inner product of that number with itself, where the inner product is defined as the complex conjugate operation and so \langle \Psi|\Psi \rangle = |\Psi|^2 .

Also, remember the properties of the inner product, in particular: \langle A + B|A + B \rangle = |A|^2 + |B|^2 + \langle A|B \rangle + \langle B|A\rangle.

Without detectors

The amplitude of the particle without measurement is (\Psi_U + \Psi_L)/\sqrt{2}, where \Psi_U is the amplitude of the upper detector and \Psi_L is the amplitude of the lower detector. \sqrt{2} is a normalisation constant used to ensure probabilities sum to 1.

The resulting probability is (|\Psi_U|^2 + |\Psi_L|^2 + \Psi_U{}^*\Psi_L + \Psi_L{}^*\Psi_U)/2 .

The term \Psi_U{}^*\Psi_L + \Psi_L{}^*\Psi_U is an interference term made up of the sum of two complex products involving the complex conjugates {}^*\Psi_L and {}^*\Psi_U of the original detector amplitudes. This is the only part of the probability expression that can contribute negatively.

With detectors

The amplitude of the particle with measurement is (\Psi_U|D_U\rangle + \Psi_L|D_L\rangle)/\sqrt{2}, where the new term |D_U\rangle is the amplitude of the detector indicating a particle at the upper slit and |D_L\rangle is the amplitude of the detector indicating a particle at the lower slit.

The resulting probability is (|\Psi_U|^2 + |\Psi_L|^2  + \Psi_U{}^*\Psi_L \langle D_U|D_L\rangle + \Psi_L{}^*\Psi_U \langle D_L|D_U\rangle)/2 .

Again, we have an interference term \Psi_U{}^*\Psi_L \langle D_U|D_L\rangle + \Psi_L{}^*\Psi_U \langle D_L|D_U\rangle that involves two new quantities. \langle D_U|D_L\rangle is the amplitude of the detector switching spontaneously from the U state to the L state. Likewise, \langle D_L|D_U\rangle is the amplitude of the detector switching spontaneously from the L state to the U state.

If the detector is working properly, then both the \langle D_U|D_L\rangle and \langle D_L|D_U\rangle terms will be 0 . This means that the probability would be (|\Psi_U|^2 + |\Psi_L|^2)/2 , with no interference term at all.

In some sense, this demonstrates that “measurement” is a continuum! With imperfect knowledge of the actual state (because of imperfect detectors), the interference term is slowly introduced.

Entangled particles

Ron states that the amplitude of the entangled particle experiment in the EPRG paradox is given by (\lvert\uparrow\downarrow\rangle + \lvert\downarrow\uparrow\rangle)/\sqrt{2}, which is the amplitude for the left particle to be in the up state and the right particle to be in the down state \lvert\uparrow\downarrow\rangle superimposed with the amplitude for the left particle to be in the down state and the right particle to be in the up state \lvert\downarrow\uparrow\rangle .

Changing the notation, this is equivalent to (\Psi_{LU}|RD\rangle + |\Psi_{LD}|RU\rangle)/\sqrt{2} . This is similar to our earlier two-slit with detectors amplitude.

Recall that the LU/RD and LD/RU detectors are perfectly correlated. This means that an observation of \Psi_{LD} , for example, gives us all the information about \Psi_{RU} . Likewise, \Psi_{LU} gives us all the information about \Psi_{RD} .

Entanglement and measurement are the same phenomenon!Ron Garret

It is as if entanglement and measurement of LU and LD is the same as measuring RU and RD directly without entanglement: (\Psi_{RD}|RD\rangle + |\Psi_{RU}|RU\rangle)/\sqrt{2} .

Quantum eraser

After measurement, but before erasure, the amplitude is (|U\rangle|H\rangle + |L\rangle|V\rangle)/\sqrt{2}) , where |U\rangle|H\rangle is an upper photon that is horizontally polarised and |L\rangle|V\rangle is a lower photon that is vertically polarised.

The corresponding probability is:
\displaystyle (\langle U|U\rangle \langle H|H\rangle + \langle L|L\rangle \langle V|V\rangle + \langle U|L\rangle \langle H|V\rangle + \langle L|U\rangle \langle V|H\rangle)/2 .

However, this reduces to ( \langle U|U\rangle + \langle L|L\rangle )/2, with no inteference, since the inteference terms are zero because the polarisation is assumed to be stable and so \langle H|V\rangle = 0, \langle V|H\rangle = 0, \langle H|H\rangle = 1, \langle V|V\rangle = 1 .

After erasure, by filtering in at 45^\circ , the amplitude is given by (|U\rangle + |L\rangle)(|H\rangle + |V\rangle)/2\sqrt{2}, which is a photon that is either in the upper |U\rangle or lower |L\rangle slit and is either horizontally |H\rangle or vertically |V\rangle polarised. |H\rangle + |V\rangle means polarised at 45^\circ .

Note that the probability normalisation constant is 2\sqrt{2} not \sqrt{2} as before. It turns out that the total probability of the amplitude is not one, but a half!

It turns out this is because we have not accounted for half of the photons, those that were filtered by the eraser.

The filtered out photons have a different amplitude (|U\rangle + |L\rangle)(|H\rangle - |V\rangle)/2\sqrt{2} . The |H\rangle - |V\rangle term means “filter out” polarisation at 45^\circ , but along a different axis than the |H\rangle + |V\rangle “filter in” polarisation.


Both sets of photons (both the filtered in and the filtered out) interfere with themselves. The filtered in photons display interference fringes. The filtered out photons display interference “anti-fringes”. These fringes sum together to produce “non-interference”.

You can see this with a little bit of algebra. The overall amplitude after erasure is \displaystyle \begin{array}{rcl} && ((|U\rangle + |L\rangle)(|H\rangle + |V\rangle) +  (|U\rangle + |L\rangle)(|H\rangle - |V\rangle))/2\sqrt{2} \\ &=& (|U|\rangle |H\rangle + |U\rangle |V\rangle + |L\rangle |H\rangle + |L\rangle |V\rangle + |U\rangle |H\rangle - |L\rangle |V\rangle + |L\rangle |H\rangle - |U\rangle |V\rangle )/2\sqrt{2} \\ &=& (|U\rangle |H\rangle + |L\rangle |H\rangle)/\sqrt{2} \\ \end{array}

The corresponding probability is (\langle U|U\rangle + \langle L|L\rangle + \langle U|L\rangle + \langle L|U\rangle)/2 .

Which always has an interference term \langle U|L\rangle + \langle L|U\rangle .

So quantum erasers don’t “erase” anything, and they don’t produce interference either, they just “filter out” interference that was already there.Ron Garret

I think what Ron is trying to highlight in the quote above is that the math shows us that all the eraser does is filter out the |U\rangle |V\rangle + |L\rangle |V\rangle - |L\rangle |V\rangle - |U\rangle |V\rangle interference, leaving only the self interference of \langle U|L\rangle + \langle L|U\rangle. It does not “add in” any interference of it’s own, rather it just highlights interference that is already in the system through this “filtering out” process.

Next, he points out that you can observe this “cancelling out” phenomenon in the laboratory in an Einstein-Podolsky-Rosen experiment, by recording the U and D detector states, then sending this information classically (over a wire for example) to the right-hand side of the experiment. You can then look at the U and D photons separately and notice that there is one “interference pattern” and another “anti-interference pattern” that you usually do not see because, in the combined effect, these patterns cancel out.


Coming back to the original point of this section, we have resolved the contradiction, since:

  • Entanglement does “count” as measurement
  • There is no interference in the EPRG experiment
  • Interference is not “produced” by the use of an eraser

Step 4: Tell a new story based on the math

Here, Ron begins to introduce his “quantum information theory”, or “zero-worlds” interpretation of quantum mechanics. This is an extension of classical information theory that uses complex numbers.

He introduces the entropy measure of a system, H(A) = -\Sigma_a \text{P}(a) \text{log} \text{P}(a), which is 0 when a system is definitely in a single state and \text{log}(N) when a system has an equal probability of being in N states.

Some entropy measures of two systems are then introduced.

The joint entropy is H(AB) = -\Sigma_{a,b} \text{P}(ab) \text{log} \text{P}(ab).

The conditional entropy is H(A|B) = -\Sigma_{a,b} \text{P}(a|b) \text{log} \text{P}(a|b).

The information entropy (more commonly known as the mutual information) is I(A:B) = I(B:A) = H(AB) - H(A|B) - H(B|A), which is the information about A contained in B and is always in the range [0, 1].

This is extended into the complex plane by using the Von Neuman entropy of a system, S(A) = -\text{Tr}(\rho_A \text{log} \rho_A) where \rho_A is the quantum density matrix, \text{Tr} is the matrix trace operator and \text{log} is the natural matrix logarithm.

Ron makes a key point here that the information entropy is no longer restricted to the range [0, 1].

He then illustrates what happens in terms of the mutual information when we consider a system with three mutually entangled particles, where A and B are the “measurement apparatus” and C is the particle we are interested in measuring:quantum-entanglement-3-mutual-ron-garret

By ignoring the particle C , the resulting system looks exactly like a “coin with a sensor”. The two particles A and B are perfectly correlated.

This can be extended to any macroscopic system with a very large number of entangled particles:quantum-entanglement-many-mutual-ron-garret

This is exactly what Feynman was talking about in one of his workshops!

The only reason the macroscopic world seems so deterministic and classical is that many entanglements are being made all of the time in a messy web that is almost impossible to “undo” in practice, but in principle, there is nothing that says this is not possible.

Closing thoughts

If you made it this far then well done, I didn’t realise there was so much content packed into this talk.

My takeaways can be summarised as follows:

  • Measurement is the same phenomenon as entanglement
  • Quantum effects only present themselves when there is a high degree of uncertainty (e.g. not much entanglement)
  • In our daily lives, we have been habituated into becoming very familiar with the consequences of a very high degree of entanglement
  • On the flip side, we are very unfamiliar with the consequences of a very low degree of entanglement, but that should not stop us from accepting what really does happen

I’ll finish with another quote from Ron that made a lot of sense to me after learning all of this:

“Spooky action at a distance” is no more (and no less) mysterious than “spooky action across time.” Both are produced by the same physical mechanism.Ron Garret

How to TDD a compiler: the story so far

In this series, I will be documenting my experiences and progress in writing a compiler for a yet-to-be-named programming language.

A while ago now, sometime in 2015, I started to toy with the idea of writing my own programming language. I like to code (a lot!) in my spare time; I find it a great way to meditate and I get a lot of satisfaction out of expressing my thoughts in code.

I think it is fair to say that our industry is still very much in its infancy. We are slowly converging upon some good concepts, such as Test Driven Development (TDD), but I think we can do a lot better. When writing code, the language itself has a huge impact on how the code is written, simply because the language determines how easy or hard it is to express our thoughts.

At one extreme, machine code is a set of instructions that literally tell a processor what to do. This is great if what you are trying to do is tell a processor what to do, but in reality, what we are really trying to do is express our thoughts in code to achieve some objective; the fact that the code will end up as a set of instructions is just an implementation detail.

At the other extreme, wouldn’t it be great if you could just code up what your intention is, rather than what the implementation of your intention is?

I dream of a world where code is intention, not implementation.

There have been some steps towards this in industry, with the mainstream acceptance of functional programming paradigms, such as the Java 8 Streams API.

Unfortunately, Murphy’s law still applies. It is not uncommon in functional programming (and perhaps even encouraged) to see problems solved using types like Tuple with ridiculous constructs such as the wonderful fst and snd in Haskell. These kinds of constructs are back down the path of implementation rather than intention.

I am a huge fan of Tiny Types. They are one step closer to the goal of expressing what our code is supposed to do. Unfortunately, in most languages, the boilerplate code required to introduce a Tiny Type is so much so that most people won’t bother to use them at all!

This got me to thinking, what if the cost of introducing a Tiny Type was zero? This led me to the following realisation: in code that expresses intent well using types, variable names are redundant.

This is a powerful idea. How many times have you seen code like FirstName firstName = new FirstName("name");?

Wouldn’t it be so much nicer to see code like FirstName = "name";?

I want to make this kind of thing a reality. Maybe I’ll find out why it hasn’t been done this way before, or maybe I’ll start a new programming paradigm, who knows. For me at least, it is all about the journey.

This seemingly small change to the way we code has some far reaching implications. Let’s explore some of them.

Types, like variable names in conventional programming, must have a unique name within the same scope. This means that, in a particular scope, when you see a type, you know that it refers to precisely one concept in that scope. To some extent, you can think of it as just swapping the responsibility of variable names to be type introduction instead. You can imagine a syntax like FirstName: String being used to declare a new type in a method signature. This doesn’t introduce any problems in method or class scope, but things get interesting with global scope.

For example, any type that is part of a public API, say:

interface Farm {
    Integer countChickens();
    Integer countEggs();

is no longer possible because there is a name clash; does Integer refer to the thing returned by Integer countChickens(); the thing returned by Integer countEggs();?

To resolve this, a new type must be introduced in each case:

interface Foo {
    ChickenCount: Integer countChickens();
    EggCount: Integer countEggs();

It is also worth mentioning at this point, the idea with these kinds of type aliases is that it is possible for an instance to act as any of the aliases in its alias chain.

So in our running example, if we have a method Sum: Integer sum(First: Integer, Second: Integer) then it is perfectly possible to invoke it like so: Sum = sum(ChickenCount, EggCount)

That’s all for now, I’ll explore more of the ideas I have in future posts.

The prime factor module

Every integer can be written as a product of its prime factors, for example: 20 = 2 ^2 5^1.

This product of prime powers is a unique representation of that number, up to the order of the factors. This is known as the fundamental theorem of arithmetic.

Note that this also holds for rational numbers, for example: \frac{100}{15} = \frac{2^2 5^2}{3^1 5^1} = 2^2  3^{-1}  5^1.

The interesting thing is that you can consider the prime factorization of a rational number as a vector, where the elements of the vector represent how many times a particular prime number appears. For example, 20 = (2, 0, 1) and \frac{100}{15} = (2, -1, 1). These

It turns out that these prime factor vectors, with vector addition and scalar multiplication defined in the usual way, form a module over the integers. I’m not the first to have spotted this idea either.

I had a bit of fun formalising these ideas in this document.

I’m curious to know whether thinking of numbers in this way can give us some more insights than the usual number line. What do quantities like an angle or a volume in this module represent?