From unaware unknowns to aware knowns

In 1955, two American psychologists, Joseph Luft (1916–2014) and Harrington Ingham (1916–1995), developed the Johari Window. Their goal was to model the relationship between self and others in order to improve self-awareness and personal development among individuals in a group.

The Johari Window Model

This idea of distinguishing between what is known and unknown to self and others has been built on over the years, moving away from its original use in psychology towards a more general view based on awareness and knowledge. You may have heard expressions such as “known knowns” (aware knowledge) and “unknown unknowns” (unaware lack of knowledge), which are sometimes represented as in the matrix below.

Matrix of awareness vs knowledge

What this awareness vs knowledge matrix fails to represent is the unavoidable relationship between self and others, as in the Johari Window. In this article, I present my own view that combines the concepts of awareness, knowledge, self, and others.

To begin with, I consider the “self vs others” concept more generally as “local vs global”, as illustrated in the table below:

Examples of local vs global perspectives

Local      Global
Me         Humankind
Team       Company
Company    Industry

Then, I aim to distinguish between awareness and knowledge. Awareness comes first: a question must be asked before you can be locally aware of your knowledge of its answer. If something is unknown, it has not been identified; the question has not been asked. Awareness of the question must arise before knowledge of the relevant solution can arise. Solutions may well exist for unasked questions, but we cannot match a solution to a question until the question itself has been asked.

From “unaware” to “aware” is a process of realization or discovery of a problem or question.

From “unknown” to “known” is a process of finding a solution to a problem that you are aware of.

Bringing this all together results in the following diagram (note that the horizontal axis has been swapped compared to the other two diagrams, so that top right is the “positive” direction rather than top left):

From unaware unknowns to aware knowns

Ways to aware knowledge given awareness:

  1. Reinvent the wheel
  2. Use existing knowledge
  3. Discover new knowledge

Ways to aware knowledge given knowledge:

  1. Have existing insight
  2. Use existing insight
  3. Have new insight

Ways to awareness without knowledge:

  1. Ask existing questions
  2. Use existing questions
  3. Ask new questions

By looking at things from this angle, we can see that:

  1. Using only local awareness and local knowledge is a poor use of resources
  2. Efficient use of existing resources comes from “standing on the shoulders of giants”: expanding local awareness and knowledge by leaning on the success of those who have come before us
  3. The other edge of the sword is exploring new frontiers: discovering new questions, new solutions, or new ways of applying existing solutions to existing questions. In this way, the global set of awareness and knowledge increases, leaving future generations to find new ways to iterate in this great game we call life.

Closing thoughts on the terminology used and how it might fit together:

  • Awareness is the fundamental mechanism a conscious agent has for acquiring data
  • Knowledge is awareness of information by a conscious agent
  • Information is data that has been interpreted by a conscious agent for their own purpose e.g. prediction or pattern recognition
  • Data is the smallest unit of uninterpreted information in the context in use by a conscious agent e.g. when reading a book the smallest unit might be the words, but for someone analyzing words the smallest unit might be the letters; data has subjective units
  • Information can be encoded back as data e.g. text, speech, images
  • Information is self-referential; data leads to information which can act as the data for further information.
  • A conscious agent is something like me or (possibly) you that appears to “think therefore I am”

A common inter-subjective data basis is the binary system: encoding information using bits (0s and 1s) to represent discriminatory power. This is by no means a universal basis, since it only carries the meaning we associate with the discriminatory ability of the bits. It may be possible to have a universal data basis defined in terms of the Planck units (which appear to be the smallest discrete units the universe has on offer).
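As a toy illustration of discriminatory power (my own example, nothing more): n bits can distinguish between 2^n alternatives, so the number of bits needed to tell N alternatives apart grows only logarithmically.

```java
public class Bits {
    // Number of bits needed to distinguish between n alternatives (n >= 1).
    static int bitsNeeded(int n) {
        int bits = 0;
        // Keep doubling the number of distinguishable states until it covers n.
        while ((1L << bits) < n) {
            bits++;
        }
        return bits;
    }

    public static void main(String[] args) {
        // 26 letters need 5 bits (2^5 = 32 >= 26); 2 alternatives need only 1 bit.
        System.out.println(bitsNeeded(26)); // 5
        System.out.println(bitsNeeded(2));  // 1
    }
}
```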

System of perception

What are your thoughts? I found this topic especially useful as a reminder that we do not typically operate in closed bounded systems. More often, we are part of something larger than our immediate surroundings and it is important to remember that feedback loops exist between self and others.

Chasing your tail

Warning: thought dump ahead!

Attachment to the outcome of an event leads to a subjective spectrum of measurements of the (usually implicit) metrics that you are using to monitor apparent change. Your preferences about which direction you want these metrics to move in seem to be how the attachment presents itself.

Thinking of ways to game this, let’s explore a few strategies and see what happens.

At one extreme, you could take the position “I have no preferences”. This is kind of self-contradictory, since it is a preference in itself. A step further could be to have no preference for having preferences. Huh?

Is it possible to have no preferences? Let’s look at the word “preference” first, starting with a dictionary definition:

a greater liking for one alternative over another or others

We need to go deeper down the rabbit hole. What does “liking” mean? Back to the dictionary…

a feeling of regard or fondness

This is not really helping; we are just arriving at more words that are not objectively defined. Hmm.

Let’s use the facts. To have a preference, we must have at least two possibilities. We could say that a preference is a kind of directionality or scoring system over the possible outcomes.

Alright, so one strategy is to weight the scores equally for all outcomes. I can see a few problems with that. As before, this strategy (and perhaps any strategy?) is a kind of preference. By choosing a scoring system, you have weighted that system above all others.

Is it possible to not choose a scoring system? Perhaps…

Certainly, it is possible to not explicitly choose a system. But it seems as if there is always going to be an implicit system in place. Without preference, it would not take long before you got run over by a truck because you did not have a preference for the outcome of crossing the road!

Alright, so it looks like there are going to be at least a handful of fundamental preferences that anyone reading this is going to have. First and foremost would be the self-preservation instinct.

Another way we can look at this is to inspect the “at least two possibilities” bit. What if there were only one possibility? In that case, there would be no preference because there is no choice about what to have a preference for.

So, how could there be only one possibility? Well, I suppose the only way to do that would be to not distinguish between outcomes. The phrase “whatever happens, happens” comes to mind.

Of course, you could argue that the model of not distinguishing between outcomes is a kind of scoring system in itself. In a way, it is isomorphic to the perspective of equally weighting all possible outcomes. Nonetheless, it certainly sounds like a simpler way of framing things.

To not distinguish between outcomes would be something like the absence of chopping things up and categorising them. A lack of labelling. A willingness to not know and not understand.

How far could you take this? I think this approach leads to the experience of a lack of agency. By agency, I mean the impression that you have control over the outcome of events. If you have stopped distinguishing between outcomes, it seems that you would also have stopped getting the impression that you were influencing those outcomes.

The suggestion that lack of agency is even possible can be a scary thought for some people. I think this is perhaps tied to the fear of losing control, identity, self, ego, whatever you want to call it.

So, then what? I think the end game is that, with this approach, you retain a fresh sense of wonder about each and every moment as it unfolds. Unsure what to expect and untied to what happens. The most appropriate word that comes to mind is “freedom”.

You are not your beliefs

This one can be quite an eye opener if you have not heard of it before.

When interacting with others, sometimes we can end up defending a particular belief, idea or position. This can be a useful tool, but sometimes it is possible to take it a bit too far.

You are not your beliefs. By attaching the sense of self to particular concepts, you open up a duality of being “right” or “wrong” about something. In turn, this polarization can lead to sticky situations, where you feel under attack when you encounter someone who is pushing an agenda that conflicts with your own.

A great way to work with this is pretty simple: lose. See what that is like. Next time you sense someone is getting defensive about something, try “losing” and see what happens as one of the two fists stops fighting.

Don’t get me wrong, I am not saying everyone should become a tree hugging pacifist. Fight when appropriate, but at least be aware of the mechanisms that lead you there.

7 day (mostly) water fast

This one has been on the backlog for a while. In September 2017 I did a 7 day (mostly) water fast; this post talks a bit about my experiences and the data I collected during this period.

Please do not try this at home. Do your own research, consult with a healthcare professional and make informed decisions.


For those of you that do not know, a water fast is exactly what it sounds like: you stop eating and drink only water. Why would you do that? For me, out of pure curiosity.

There are a bunch of hand-wavy potential health benefits that some people peddle, as well as some more recent, more realistic research interest. Clinical fasting is starting to become a more mainstream method of promoting an environment in which the body can use the survival mechanisms it has evolved, to the potential benefit of overall health. I look forward to new research over the coming years.

Fasting has also been used as a spiritual tool for millennia. It is a great way to wean yourself off of a lifestyle with core values driven by comfort and complacency.


This is very rough; I did not keep a proper diary, so it is from memory only. Day 1 was the 31st August 2017.

Day 1

No idea what my last meal was, but the fast started towards the end of my last day at work before the block of time I had booked off.

Day 2-3

The first few days are the tricky ones in my experience. I did a 5-day water fast a couple of years ago and a number of slightly shorter ones over the past 5 years or so. The trickiness is probably down to the switchover as glycogen in the liver is depleted and the body turns to fat as an energy source, converting it into ketone bodies that can fuel normal bodily functions.

This can be a bit of a shock to the system. You can expect to feel quite weak, potentially irritable, with some minor headaches. Sleeping patterns might be interrupted too. Having been through this a few times, it was not as daunting this time and just part of the ride.

I started a rough meditation schedule, sitting for periods of between 1 and 3 hours at a time throughout the day.

Day 4-6

I felt like taking in the liquid portion of some vegetable broth to make sure my body had the resources necessary to keep electrolytes in balance. No idea if this actually did anything, but it seemed like a sensible thing to do at the time.

At this point, the tricky part of the fast was over and I felt very clear headed. There is nothing quite like the mental experience of an extended fast. It might be a placebo or might even be something to do with the brain switching the primary fuel source from glucose to ketones. Either way, the experience is akin to walking outside on a chilly day in nature; everything seems very still and spacious. There is nothing to do and that feels just right.

It was around this time that I had some really great conversations with my brother, which led to breakthroughs in some concepts I was working with at the time. I was on a real high at this point, feeling great for no reason at all, all of the time.

Day 7-8

I broke my fast, I think with some melon juice, on the evening of the 7th day, followed by some carrot juice and a few strawberries. The following day I ate mostly grapefruit; I did not have much of an appetite, and it seemed sensible to stick to easy-to-digest fruits.

Day 9+

At this point, I started resuming my normal eating patterns as my appetite returned. Slightly deflated to have ended the fast. I like the challenge and seeing how far I can push myself. Next time I will likely try a slightly longer duration, perhaps 10 to 14 days. It always feels as if, just as I am getting to the good bit, I have to wind down and go back to normal life again!

Graphs and Stuff

Blood Glucose/Ketone Levels


This one played out as expected. There are some missing data points at the beginning where I did not have the correct strips for my glucose meter.

Ketone levels go up as the fast begins and peak around the 7mmol/L mark, hovering between 5.8 and 7.7 from Day 3 to Day 8 and sharply dropping back to less than 0.5 from Day 10 onwards. Glucose levels exhibit the opposite behaviour, hovering around the 3.5mmol/L mark from Day 4 to Day 8 and rising to around 5mmol/L as the fast ended.

This pattern occurs predictably as the body uses up the glucose it has stored as glycogen in the liver. At this point, fat begins to be broken down and ketones become the primary energy source. Interestingly, the brain has a preference for these ketone bodies during the fast, preserving the remaining glucose production for the cells that are not able to use ketones, e.g. red blood cells. Upon refeeding, the body switches back to using glucose again.

The glucose ketone index, or GKI, is a simple ratio of blood glucose levels to blood ketone levels, used by Dr Thomas Seyfried in his research on managing brain cancer using a diet-controlled metabolic approach. The takeaway is that maintaining a low GKI can reduce tumour metabolism, essentially starving tumours of the glucose they need. Low means something along the lines of 0.7 to 1.1. During the fasting period, my GKI stayed between 0.4 and 0.6, sharply returning to the more typical huge ratio as I switched back to glucose as my primary energy source.
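The arithmetic is as simple as it sounds. A minimal sketch, using rounded readings from my graphs (the exact values in the code are illustrative):

```java
public class Gki {
    // Glucose Ketone Index: blood glucose (mmol/L) divided by blood ketones (mmol/L).
    static double gki(double glucoseMmol, double ketonesMmol) {
        return glucoseMmol / ketonesMmol;
    }

    public static void main(String[] args) {
        // Mid-fast readings: ~3.5 mmol/L glucose, ~7 mmol/L ketones.
        System.out.println(gki(3.5, 7.0)); // 0.5, inside the 0.4-0.6 range above
        // After refeeding: ~5 mmol/L glucose, ketones back near 0.5 mmol/L.
        System.out.println(gki(5.0, 0.5)); // 10.0, a much larger ratio
    }
}
```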

Heart Measurements


My resting heart rate stayed between 58 and 73 beats per minute. It seems mostly uncorrelated with the fasting period. The minimum was reached on Day 6 of the fast. This kind of range is pretty decent for a 27-year-old male with a mostly sedentary lifestyle.

My blood pressure also seems mostly uncorrelated with the fast. It stayed between roughly 94 and 117 mmHg systolic and between 57 and 75 mmHg diastolic. This is in the normal range, perhaps slightly low on the diastolic side.

I also recorded some heart rate variability (HRV) measures. These are a bit more difficult to interpret. Using the data from the study of 260 healthy subjects by Umetani et al. (1998), here is my summary:

  • The inter-beat RR interval was around the 900ms mark during the bulk of the fast. It was closer to 800ms before and after the fast. This is within the normal 939±129ms for my age group.
  • The RMSSD (Root Mean Square of the Successive Differences) varied a lot from day to day, between 21.5 and 48.0. Again, pretty close to the normal 39.7±19.9 for my age group.
  • The SDNN (Standard Deviation of the NN[RR] intervals) also had a pretty huge variance, ranging between 42.6 and 86.25. This is slightly above the normal 50.0±20.9 for my age group. Higher numbers for this measurement are typically found in younger individuals.
  • The PNN50 (the proportion of successive NN[RR] intervals that differ by more than 50ms, divided by the total number of NN[RR] intervals) was again all over the place, between 5.4% and 32.4%. This compares with a typical 20±17% for my age group, so nothing too interesting there.
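For the curious, these time-domain measures can all be computed from a list of RR intervals. A minimal sketch with made-up intervals (I take SDNN as the population standard deviation; conventions vary):

```java
public class HrvMetrics {
    // Root Mean Square of the Successive Differences between adjacent RR intervals.
    static double rmssd(double[] rr) {
        double sum = 0;
        for (int i = 1; i < rr.length; i++) {
            double d = rr[i] - rr[i - 1];
            sum += d * d;
        }
        return Math.sqrt(sum / (rr.length - 1));
    }

    // Standard deviation of the RR intervals (population form).
    static double sdnn(double[] rr) {
        double mean = 0;
        for (double r : rr) mean += r;
        mean /= rr.length;
        double sum = 0;
        for (double r : rr) sum += (r - mean) * (r - mean);
        return Math.sqrt(sum / rr.length);
    }

    // Percentage of successive differences larger than 50 ms.
    static double pnn50(double[] rr) {
        int count = 0;
        for (int i = 1; i < rr.length; i++) {
            if (Math.abs(rr[i] - rr[i - 1]) > 50) count++;
        }
        return 100.0 * count / (rr.length - 1);
    }

    public static void main(String[] args) {
        double[] rr = {800, 850, 780, 820}; // made-up RR intervals in ms
        System.out.printf("RMSSD=%.2f SDNN=%.2f pNN50=%.1f%%%n",
                rmssd(rr), sdnn(rr), pnn50(rr));
    }
}
```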

Here is the raw heart rate variability data I collected:

Heart Rate Variability

Date        Session  HRV  Mean RR (ms)  RMSSD  LnRMSSD  SDNN   NN50  PNN50 (%)
01/09/2017  AM       60   803           47.88  3.87     86.25  52    27.957
02/09/2017  AM       50   807           25.82  3.25     42.6   10    5.405
04/09/2017  AM       60   913           47.97  3.87     83.68  49    31.613
05/09/2017  AM       59   995           45.71  3.82     59.37  49    32.45
06/09/2017  AM       54   952           32.73  3.49     53.69  12    7.692
07/09/2017  AM       56   933           39.29  3.67     55.6   30    18.634
08/09/2017  AM       53   960           30.36  3.41     46.3   14    9.032
09/09/2017  PM       47   797           21.47  3.07     37.03  5     2.674

Weight Measurements

As you might expect, weight loss is pretty much inevitable with an extended fast. Over a very short period of just over a week, my weight went from around 134lbs down to around 127lbs towards the end of the fast. Soon after, it went back up to around the 131lbs mark. My BMI stayed within a healthy range, between just under 20 and 21, throughout.



Interestingly, my body fat percentage dropped from around 19.6% to 16.4% and remained there despite putting most of the weight back on. This is a good indicator that I was indeed literally burning fat for fuel during the fast.


Water Measurements

I took these measurements as a precaution, just to make sure nothing unusual was going on during the fast. Nothing particularly interesting here. You can see that the urine ketone measurements do correlate with the blood ketone measurements. The accuracy of all of these measures is pretty bad, which results in a noisy graph.


There was a bit of a trend in the specific gravity, which did seem to increase a little during the fast but stayed within the typical 1.0 to 1.03 range. The slight elevation could indicate slight dehydration during the water fast, which is quite amusing.



Here are the gadgets I used to collect the data. I messed up a bit by ordering incompatible blood glucose strips, which explains some of the gaps in the data. Unfortunately, the product links are all Amazon. You can find similar products elsewhere too.

Why meditate?

This question came up recently and I did not give a great answer. The following post aims to provide an answer that hopefully clarifies rather than confuses.

First, let’s set some context. I have explored a number of different meditation approaches and philosophies, with my current focus predominantly on a technique used in the Sōtō lineage of Zen Buddhism. This post will talk mainly about that technique, which is called Shikantaza.

The reason this question is hard is that in truth I do not know the answer. I fell into becoming interested in this stuff some time ago and that interest has not dwindled much since.

Today, lots of traditional Eastern philosophy has started to creep into the Western world. For example, this has manifested in the popularization of techniques such as mindfulness. The way these ideas are typically sold to a Western audience is to suggest possible benefits such as “do this and you will be calmer” or “do that and you will have fewer negative thoughts”.

If you go into a meditation practice with the intention to achieve these kinds of goals, then you are enacting a form of spiritual materialism. This means using the spiritual path to achieve material gains. You may even succeed! It is true that people will likely see these kinds of material benefits.

In comparison, the typical intention behind a practice such as Shikantaza is to see clearly what is already there, rather than trying to create something different.

So, what is Shikantaza? Quite simply, it is the practice of “just sitting”. Hold the body still and watch what continues to move. What this results in for most people initially is a flurry of thoughts they did not even know they had. The idea is to just observe. Don’t chase. Don’t reject. Don’t ignore. Mmm doughnuts…

Right, but why would you want to do that? This is the funny bit, you probably do not want to do that at all! All the typical markers that the ego uses to measure “progress” are missing. You could get bored. Your joints might hurt. You will lose time that could have been spent elsewhere. Your friends might say you will turn into a vegetable!

The will to do this goes beyond the ego mind, from a more fundamental urge to see what this is, to see what you are and how things function.

Upon investigation, the realisation that typically follows is along the lines of “this is all there is”. This could be spontaneous or it could take decades or it may never happen at all. Or you might say “screw this, I’m gonna go watch TV!”.

Pinning down a description of what I am trying to point to here is super hard. Many have tried over the years and in my experience, it does not help much. Any words I use might make sense to me and mean nothing to you. If you are interested, I would encourage you to find out for yourself. Don’t accept, reject or ignore anything I am talking about. Consider it, and see how it fits into how things work for you.

So, why meditate? I suppose it depends on what you are looking for. In my case, curiosity is the main driver.

It is clear to me that there are pieces missing in the way we are typically educated. In our society, from a young age, we are taught the philosophy of reductionist materialism, which is typically accepted without question. I am exploring those questions now.

PS: I’m always happy to speak more about this kind of stuff. Send any questions you have my way!

Everything is dependently arisen

Everything. That’s a lot of stuff.

What am I talking about? Let me try to explain.

Notice the way that you make sense of the world. In order to function, you have to interpret your sense data. All that you see, hear, taste, touch, smell and think is relative. Relative to what? Everything else. This is the nature of reality.

Put another way, in the “subject-object” concept, for example in the phrase “I see the phone”, the subject “I” and the object “the phone” are described relative to one another. Most typical language seems to function in this way, relative descriptions of things that only seem to exist in relation to other things.

This is both completely ordinary and utterly mindblowing depending on how you look at it. From here at least, it looks to me that there is Just This, nothing more and nothing less.

Don’t believe a word I say. Find out for yourself!

Seeing the ego: compassion with and without expectation

This post will consider the following statement:

All compassion is selfish

Let’s also begin with a dictionary definition of the noun compassion:

Sympathetic pity and concern for the suffering or misfortunes of others

Think of the last time you were compassionate. Perhaps a friend was ill and you brought them flowers. Or maybe you donated some money to charity. Or maybe you were compassionate and took no action at all.

Now, ask yourself the question: why was I compassionate? There are no right or wrong answers.

In my experience, it is likely that at least part of the perceived reason is something along the lines of “it felt like the right thing to do” or “if I were in that situation, I would be suffering”.

I am going to suggest that these types of reason are subtle forms of selfishness.

Imagine that we lived in a world where compassion would not give us that warm, feel good glow inside. Replace it with whatever strong negativity that makes most sense to you. Would you still be compassionate?

This is not an easy question. To begin with, it is hypothetical. It is very hard for us to imagine this situation, since compassion is baked into our evolutionary psychology. We can’t help but feel good when we help others.

Despite this, to me at least, it seems that the core motivation for compassion is either the desire to feel good through lessening the suffering of others, or a fear that if we were in that situation others would not be compassionate to us. These are both ego driven concepts that are inherently selfish.

Here’s the good bit: once you can see this for yourself, once you can see the self preserving ego at work, you also have the option to see past it.

Compassion without expectation can be achieved by seeing the ego, letting it have its desires and fears, but also seeing past that.

Compassion that comes from this place can manifest in unexpected ways. Being spacious for people in your life is an often overlooked expression of compassion. It is rare in today’s busy world to give someone your undivided attention for an extended period of time.

What opportunities can you think of for compassion without expectation in your life?

Android AER Calculator

This post is about an Android application I wrote to calculate the Annual Equivalent Rate (AER) of a portfolio.

The AER of a portfolio is the annualised interest rate that, when applied to the portfolio contributions, results in the current value of the portfolio.

Simple example

If I invested £100 exactly one year ago and the value now is £105, the AER would be 5%.

This is because £100 x (1 + 5%) = £105.

Not so simple example

However, things get significantly more complicated when there are multiple contributions on different dates.

What is the AER of a portfolio with contributions of £100 one year ago and £50 ten months ago that is worth £160 today?

It turns out to be approximately 7.04%. This is a typical example that is simply stated but has a not-so-obvious solution.

The Algorithm

To compute the AER, I used a numerical method called the Newton-Raphson method.

The method starts with an initial guess to the root of a function. From here, we follow the derivative of the function in order to converge on the true root of the function.

In this case, the function we are interested in is:
f(r) = \sum_i C_i (r + 1)^{\frac{D_t - D_i}{365}} - P

Where C_i is the ith contribution, D_i is the day of the ith contribution, D_t is the current day and P is the present value of the portfolio.

The derivative is:
f'(r) = \sum_i C_i \left[ \frac{D_t - D_i}{365} \right] (r + 1)^{\frac{D_t - D_i}{365} - 1}

The algorithm starts with an initial estimate r_0 and obtains a better approximation r_1 by using the relation:
\displaystyle r_1 = r_0 - \frac{f(r_0)}{f'(r_0)}

This process is repeated until the difference between successive estimates is less than some threshold value or the number of iterations hits some predefined limit.
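To make the iteration concrete, here is a minimal sketch in Java. This is not the app's actual code; the names are my own, and I approximate “ten months” as 304 days, so the two-contribution example lands near, rather than exactly on, the 7.04% quoted above.

```java
public class AerSketch {
    // f(r): value of the contributions grown at annual rate r, minus the present value.
    // daysAgo[i] is how many days before today contribution[i] was made.
    static double f(double r, double[] contributions, double[] daysAgo, double presentValue) {
        double sum = 0;
        for (int i = 0; i < contributions.length; i++) {
            sum += contributions[i] * Math.pow(1 + r, daysAgo[i] / 365.0);
        }
        return sum - presentValue;
    }

    // Derivative of f with respect to r.
    static double fPrime(double r, double[] contributions, double[] daysAgo) {
        double sum = 0;
        for (int i = 0; i < contributions.length; i++) {
            double t = daysAgo[i] / 365.0;
            sum += contributions[i] * t * Math.pow(1 + r, t - 1);
        }
        return sum;
    }

    // Newton-Raphson: refine r until successive estimates agree or we hit a limit.
    static double solve(double[] contributions, double[] daysAgo, double presentValue) {
        double r = 0.05; // initial guess
        for (int i = 0; i < 100; i++) {
            double next = r - f(r, contributions, daysAgo, presentValue)
                            / fPrime(r, contributions, daysAgo);
            if (Math.abs(next - r) < 1e-10) return next;
            r = next;
        }
        return r;
    }

    public static void main(String[] args) {
        // Simple example: £100 one year ago, worth £105 today -> roughly 5%.
        System.out.println(solve(new double[]{100}, new double[]{365}, 105));
        // Two contributions: £100 a year ago, £50 ten months (~304 days) ago, worth £160.
        System.out.println(solve(new double[]{100, 50}, new double[]{365, 304}, 160));
    }
}
```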

AER Calculator

This led me to write the Android application AER Calculator, which can be found on the Google Play Store.

aer-calculator-single aer-calculator-multiple

The (open source!) project can be found on GitHub.

How to TDD a compiler: learning to read

One of the first things I decided to dive deep into was how to get the textual representation of a program into a sensible representation in memory. This is known as parsing.

I got side tracked a lot along the way and ended up writing my own general purpose parser implementation based on the Marpa parsing algorithm.


During this experience, I implemented a scanner built around a Nondeterministic Finite Automaton (NFA) encoded in a Binary Decision Diagram (BDD). Using a BDD, I was able to transform an entire NFA frontier (a set of states) into a new frontier, given a newly scanned character, in constant time.

Finite Automata

Usually, a Deterministic Finite Automaton (DFA) is used to perform transitions given a newly scanned character. Typically this is generated from an NFA and minimised for performance reasons.

NFAs have a smaller memory footprint than their corresponding DFAs. This is because the DFA must uniquely encode each possible word as a single path to an accepting state, whereas an NFA allows multiple paths up until a word is fully recognised. This smaller memory footprint of an NFA can make it more likely to result in CPU cache hits, drastically speeding up processing since main memory accesses are avoided more often.

However, because of the multiple paths, an NFA needs to keep track of sets of states rather than the single state of a DFA. This is usually considered a problem: if the NFA has a large number of states, processing a single character could take a quadratic amount of time in the worst case, in terms of the number of nodes in the NFA.
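To make the set-of-states idea concrete, here is a toy frontier simulation of my own (nothing to do with the BDD encoding itself): each state is one bit in a long, and a character advances the whole frontier by OR-ing the successor masks of every live state. This per-state loop is exactly the work the BDD approach avoids.

```java
public class NfaFrontier {
    // NEXT[c][s] is a bitmask of the states reachable from state s on character c.
    // Toy 3-state NFA over {a, b} recognising strings containing "ab":
    //   0 --a--> {0,1}, 0 --b--> {0}, 1 --b--> {2}, 2 --a--> {2}, 2 --b--> {2}
    static final long[][] NEXT = {
        // index 0 = 'a': state 0 -> {0,1}; state 1 -> {}; state 2 -> {2}
        {0b011L, 0b000L, 0b100L},
        // index 1 = 'b': state 0 -> {0}; state 1 -> {2}; state 2 -> {2}
        {0b001L, 0b100L, 0b100L},
    };

    // Advance the whole frontier by one character: union of successors of live states.
    static long step(long frontier, int c) {
        long next = 0;
        for (int s = 0; s < 3; s++) {
            if ((frontier & (1L << s)) != 0) {
                next |= NEXT[c][s];
            }
        }
        return next;
    }

    static boolean matches(String input) {
        long frontier = 0b001L; // start in state 0
        for (char ch : input.toCharArray()) {
            frontier = step(frontier, ch == 'a' ? 0 : 1);
        }
        return (frontier & 0b100L) != 0; // accept if state 2 is live
    }

    public static void main(String[] args) {
        System.out.println(matches("aab")); // true: contains "ab"
        System.out.println(matches("ba"));  // false
    }
}
```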

At least they are finite

BDD encoding

By using a BDD to encode the NFA, operations are performed on sets of nodes in constant time, and so the time to process a single character is constant.

You can have a look at the scanner implementation here, which I recently tidied up a bit. I made use of JDD, a Java BDD library.

The steps to build up the scanner went something like this:

  1. For each possible symbol type, define a regular expression.
  2. Convert the regular expressions to an NFA with nondeterministic “epsilon transitions”. This is a well-understood process called Thompson’s algorithm.
  3. Remove all the epsilon transitions from the NFA by allowing the NFA to encode sets of states rather than just individual states.
  4. Remove unreachable states.
  5. Relabel states so that the most frequent ones have the lowest identifier.
  6. Relabel transitions so that the most frequent ones have the lowest identifier.
  7. Encode the NFA using a BDD. The BDD variables are chosen to correspond with bit indexes of the binary representation of the state and transition identifiers.
  8. Order BDD variables so that the ones that provide the most entropy reduction have the lowest identifier.

Phew. Easy, right?

The relabelling steps (5, 6, 8) were an attempt to get the BDD variable representation of the transition table to be as compact as possible. This can be achieved by trying to get the representation to share as many nodes as possible.

Representing a row in the transition table, for example, requires a conjunction of every single BDD variable as either being present or not present. If the BDD variables corresponded directly to the states and transitions then this would be a very sparse representation, since each row represents only one transition and there would be very little node reuse.

This is why I encoded the BDD variables as bit indexes instead; a small number of bits can represent a lot of states. The representation of transition table rows will be less sparse since the binary representation of integers spans multiple bits. The lower order bits will be the most commonly populated, which is why the most frequent states were relabelled to have the lowest identifier in steps 5 and 6. The downside of this is that the BDD variables don’t have any intuitive meaning; this is just a hack to keep the number of BDD variables small.

The entropy reduction ordering of the BDD variables in step 8 is designed to place BDD variables that are infrequently used at the top of the BDD. These will typically be the higher order bits of the states, which we constructed to be the least frequently used. This means that the bulk of the BDD representations will say “not present” for the least frequently used variables, eliminating those variables from the graph in a small number of steps. The more frequently used states will then cluster after this, using a small number of variables to represent a large number of similar states.

Executing code by hand is not very fun

ZDD encoding

Digging deeper into how I could improve on the encoding, I discovered a slightly different data structure, called a Zero-suppressed Decision Diagram (ZDD). There is an excellent talk by Donald Knuth about these called “Fun with Zero-Suppressed Binary Decision Diagrams”. Unfortunately, I can’t seem to find an active link to it at the moment; it used to be here. They turn out to be excellent at efficiently representing sparse boolean functions, such as those that represent families of sets. Some good introductory notes on ZDDs can be found here.

Although ZDDs are extremely interesting in their own right, they turned out to not be suitable for the encoding we are using. They could work well if we kept one ZDD variable per state; then the sparseness would be well represented. We would also have some intuitive meaning back; the transition table would be a family of sets that represent all the pieces needed for a transition (from, via, to) as sets of states.

Unfortunately, I wasn’t able to figure out how to implement the frontier transition using the ZDD operations that JDD provides. In response to that, I started to have a go at implementing my own ZDD base with the operations I needed, which you can find here. I took the naive approach and coded in an object-oriented way rather than the low-level approach JDD takes.

Everything was going well until I went down the rabbit hole of caching. Now everything is a big mess and I haven’t looked at it in a long time. If I get back to it, I will probably contribute the operations I needed back to JDD rather than rolling my own, since an object-oriented approach is very unlikely to be able to compete with the performance of a low-level implementation.


Moving on from the scanning side of things, when I started looking into parsing, the Marpa algorithm caught my attention. Marpa, envisaged by Jeffrey Kegler, builds on the earlier work of Jay Earley, Joop Leo, John Aycock and R. Nigel Horspool. It parses a wide variety of grammars in linear time, working from a grammar written in Backus–Naur Form (BNF). The algorithm is designed to maintain lots of contextual information about the parse, such as which rules and symbols have been recognised or are expected next, along with their locations. This will come in handy for error reporting, which has historically been very poor in many compilers.

At the time of writing, Marpa is only officially implemented in C and Perl. So you know what comes next… I had a go at writing an implementation in Java! You can have a look at the parser implementation here.

It took me lots of trial and error to wrap my head around the algorithm, but it was worth it in the end. The Marpa paper was a good guide; I was able to lift the core algorithm straight out of the pseudo code in the paper. Also, Earley Parsing Explained by Loup Vaillant was very useful in understanding Earley parsers in general.

I’m not going to go into too much detail on this one; the implementation was relatively straightforward and the resources above are very detailed. I did have some fun running a profiling tool over some performance tests at this point. My takeaways from that experience were to avoid unnecessary object allocations and to precompute as much as possible when constructing the parser.

I got a bit carried away micro-optimising. For example, I found that Java 8 method references had a tendency to allocate a new anonymous object on each use. I also found that I could beat the performance of the Stream API by handcrafting the equivalent imperative code, so I did that for a handful of the most frequently repeated operations. Finally, the Marpa algorithm naturally contained opportunities to use the flyweight pattern.
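The flyweight idea can be sketched as follows (an illustrative example with hypothetical names, not the actual parser code): identical symbols are shared through a cache, so hot paths avoid repeated allocation and can compare by reference instead of by value.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative flyweight sketch (hypothetical names, not the actual
// parser): identical symbol instances are shared via a cache, so hot
// paths compare by reference and allocate at most once per distinct name.
public class SymbolPool {
    private final Map<String, Symbol> cache = new HashMap<>();

    // computeIfAbsent only allocates on the first request for a name.
    public Symbol intern(String name) {
        return cache.computeIfAbsent(name, Symbol::new);
    }

    public static final class Symbol {
        final String name;
        Symbol(String name) { this.name = name; }
    }

    public static void main(String[] args) {
        SymbolPool pool = new SymbolPool();
        Symbol a = pool.intern("expr");
        Symbol b = pool.intern("expr");
        System.out.println(a == b); // prints true: same shared instance
    }
}
```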

It all made sense at the time

That’s all for now. You might have noticed that so far there hasn’t been much TDD going on! We will get there, I promise; for now, I’m just catching you up with the initial spikes and research that I was doing. On the timeline, we are up to about November 2015, which is around the time I stopped working on the scanning and parsing experiments.

Ron Garret’s “The Quantum Conspiracy: What Popularizers of QM Don’t Want You to Know” Google Tech Talk

Way back in January 2011, Ron Garret gave a Google Tech Talk titled “The Quantum Conspiracy: What Popularizers of QM Don’t Want You to Know”.

I’m just watching it again now, for the third time, hoping that if I write down my thoughts as I go along I will actually be able to make sense of it this time!

The best introduction to quantum mechanics I have found so far is a series of recordings of the great Richard Feynman talking about “The Quantum Mechanical View of Reality” (which you can find here: [1] [2] [3] [4]).

Ron splits his talk up into four steps:

  1. Review the usual QM story
  2. Show how it leads to a contradiction
  3. Do some math and show how that resolves the contradiction
  4. Tell a new story based on the math

I’ll split this post up to follow those steps so you can watch along with me.

Step 1: Review the usual QM story

Quantum mystery #1: The two-slit experiment

Ron starts by explaining the standard “two-slit” experiment, which I’m going to assume you are familiar with. If not, then this video explains it well.

The root of the apparent weirdness lies in the measurement problem, which gives rise to the so-called wave-particle duality.

Any modification that we make to this experiment that allows us to determine even in principle which of these slits this particle went through destroys the interference.

– Ron Garret

It is worth noting that this holds for any particle, any “measurement” and any equivalent “split/combine” experiment. This is not some obscure effect; similar effects may be happening around us all the time in ways we do not usually notice (or perhaps take for granted?) at the macro scale.

Ron goes on to attack the Copenhagen interpretation, in which the wave function is assumed to “collapse” at some point and “become” a particle.

By asking the question “how and when does this collapse happen?”, he starts with some intuition that any form of “collapse” is by nature irreversible and discontinuous. This contradicts the mathematics of quantum mechanics, in which quantum effects are continuous and reversible in time.

Quantum mystery #2: The “Quantum Eraser”

He then moves on to introduce an example of a “quantum eraser” experiment with one split and one combine:


In his eraser, he places one detector at a place that would represent a “dark fringe” and another at a place that would represent a “bright fringe” if an interference pattern was produced.

Taking a measurement after the split but before the combine destroys the interference pattern, and each detector detects the same number of particles. The interesting bit is that this measurement can be “erased” by destroying the information that the measurement determined before the particle hits the combining step. This restores the interference pattern that the detectors witness. The mind boggles…

There’s a really interesting article that appeared in Scientific American in 2007 that explains how to make a DIY quantum eraser. The corresponding web article can still be found here. This is what Ron is demonstrating in his talk with the polarised light filters.

Ron’s claim is that this erasure implies it does not make sense to say that the wave function “collapses at the time of measurement”: erasing the measurement in some sense restores the wave function, so the wave information must have been preserved somewhere in the system.

Quantum mystery #3: Entanglement

He then moves on to an example of quantum entanglement:


Here, a UV laser and a “down converter” (some kind of crystal structure) are used to produce photons of varying wavelengths, sending them off in opposite directions in a 1:1 ratio.

By splitting at each end, using, for example, one of the polarisation measurements from earlier, we find that the LU/RD and LD/RU detectors are perfectly correlated, because of conservation laws.

This is what Einstein famously called “Spooky action at a distance”.

– Ron Garret

What this seems to show is that, independent of the distance of separation between two entangled particles, a measurement of one instantaneously changes a perfectly correlated aspect of the particle’s entangled twin.

At first glance, this would seem to imply that faster-than-light communication is possible; however, it is widely believed that this is not the case, since we have no control over what the result of a measurement will actually be.

Step 2: Show how it leads to a contradiction

Now, Ron says that he is going to show us how the story presented so far leads to a contradiction.

So far we have that:

  • A split/combine experiment produces interference
  • Any which-way measurement destroys interference
  • Some which-way measurements can be erased, restoring interference
  • Measurements on entangled particles are perfectly correlated

What they don’t want you to know:

All of these things cannot possibly be true!

– Ron Garret

He presents a thought experiment paradox which he dubs the Einstein-Podolsky-Rosen-Garret Paradox:


In this experiment, two “two-slit” experiments are fed by quantum entangled photons produced by the entanglement process he described earlier.

The question is: what happens if we take a measurement on the left side? Will this destroy the interference on the right?

  • If the answer is “yes”, then we have faster than light communications, by using measurement on the left to produce a signal on the right, which is assumed to be impossible
  • But if the answer is “no”, then we know the position of the particle but we still have interference, which violates a fundamental principle of quantum mechanics

There is one more possibility, however. Perhaps there was no interference to begin with! Maybe entanglement counts as a measurement that destroys interference?

Unfortunately, Ron claims that even in this case, we are not out of the woods yet. We can put in an eraser on the right and destroy the entanglement, leading us back to producing interference but still knowing the position of the particle and so still leading to a contradiction as before.

Step 3: Do some math and show how that resolves the contradiction

After some preliminaries about the wave function \Psi and the “hack” of extracting the probability of the wave function by “squaring” the magnitude |\Psi|^2, we move on to some two-slit math, making use of Dirac notation.

Note that all the non-scalar quantities in an amplitude expression represent complex numbers. Also, “squaring” the magnitude of a complex number \Psi is the same as taking the inner product of that number with itself, where the inner product involves complex conjugation, so \langle \Psi|\Psi \rangle = |\Psi|^2 .

Also, remember the properties of the inner product, in particular: \langle A + B|A + B \rangle = |A|^2 + |B|^2 + \langle A|B \rangle + \langle B|A\rangle.

Without detectors

The amplitude of the particle without measurement is (\Psi_U + \Psi_L)/\sqrt{2}, where \Psi_U is the amplitude for the particle to pass through the upper slit and \Psi_L is the amplitude for it to pass through the lower slit. \sqrt{2} is a normalisation constant used to ensure probabilities sum to 1.

The resulting probability is (|\Psi_U|^2 + |\Psi_L|^2 + \Psi_U{}^*\Psi_L + \Psi_L{}^*\Psi_U)/2 .

The term \Psi_U{}^*\Psi_L + \Psi_L{}^*\Psi_U is an interference term made up of the sum of two complex products involving the complex conjugates \Psi_U{}^* and \Psi_L{}^* of the original slit amplitudes. This is the only part of the probability expression that can contribute negatively.
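To see why this term can be negative, note that a complex number plus its own conjugate is twice its real part, so the interference term can be rewritten as

\displaystyle \Psi_U{}^*\Psi_L + \Psi_L{}^*\Psi_U = 2\,\mathrm{Re}(\Psi_U{}^*\Psi_L) ,

which ranges between -2|\Psi_U||\Psi_L| and +2|\Psi_U||\Psi_L| depending on the relative phase of the two amplitudes; the negative values are what produce the dark fringes.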

With detectors

The amplitude of the particle with measurement is (\Psi_U|D_U\rangle + \Psi_L|D_L\rangle)/\sqrt{2}, where the new term |D_U\rangle is the amplitude of the detector indicating a particle at the upper slit and |D_L\rangle is the amplitude of the detector indicating a particle at the lower slit.

The resulting probability is (|\Psi_U|^2 + |\Psi_L|^2  + \Psi_U{}^*\Psi_L \langle D_U|D_L\rangle + \Psi_L{}^*\Psi_U \langle D_L|D_U\rangle)/2 .

Again, we have an interference term \Psi_U{}^*\Psi_L \langle D_U|D_L\rangle + \Psi_L{}^*\Psi_U \langle D_L|D_U\rangle that involves two new quantities. \langle D_U|D_L\rangle is the amplitude of the detector switching spontaneously from the U state to the L state. Likewise, \langle D_L|D_U\rangle is the amplitude of the detector switching spontaneously from the L state to the U state.

If the detector is working properly, then both the \langle D_U|D_L\rangle and \langle D_L|D_U\rangle terms will be 0 . This means that the probability would be (|\Psi_U|^2 + |\Psi_L|^2)/2 , with no interference term at all.

In some sense, this demonstrates that “measurement” is a continuum! With imperfect knowledge of the actual state (because of imperfect detectors), a partial interference term is reintroduced.
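One way to make the continuum concrete (my own illustration, not a slide from the talk): suppose the detector states overlap by some real amount \epsilon = \langle D_U|D_L\rangle = \langle D_L|D_U\rangle , with 0 \le \epsilon \le 1 . The probability then becomes

\displaystyle (|\Psi_U|^2 + |\Psi_L|^2 + \epsilon(\Psi_U{}^*\Psi_L + \Psi_L{}^*\Psi_U))/2 ,

which interpolates continuously between full interference at \epsilon = 1 (a useless detector whose state reveals nothing) and no interference at \epsilon = 0 (a perfect detector).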

Entangled particles

Ron states that the amplitude of the entangled particle experiment in the EPRG paradox is given by (\lvert\uparrow\downarrow\rangle + \lvert\downarrow\uparrow\rangle)/\sqrt{2}, which is the amplitude for the left particle to be in the up state and the right particle to be in the down state \lvert\uparrow\downarrow\rangle superimposed with the amplitude for the left particle to be in the down state and the right particle to be in the up state \lvert\downarrow\uparrow\rangle .

Changing the notation, this is equivalent to (\Psi_{LU}|RD\rangle + \Psi_{LD}|RU\rangle)/\sqrt{2} . This is similar to our earlier two-slit with detectors amplitude.

Recall that the LU/RD and LD/RU detectors are perfectly correlated. This means that an observation of \Psi_{LD} , for example, gives us all the information about \Psi_{RU} . Likewise, \Psi_{LU} gives us all the information about \Psi_{RD} .

Entanglement and measurement are the same phenomenon!

– Ron Garret

It is as if entanglement and measurement of LU and LD is the same as measuring RU and RD directly without entanglement: (\Psi_{RD}|RD\rangle + \Psi_{RU}|RU\rangle)/\sqrt{2} .

Quantum eraser

After measurement, but before erasure, the amplitude is (|U\rangle|H\rangle + |L\rangle|V\rangle)/\sqrt{2} , where |U\rangle|H\rangle is an upper photon that is horizontally polarised and |L\rangle|V\rangle is a lower photon that is vertically polarised.

The corresponding probability is:
\displaystyle (\langle U|U\rangle \langle H|H\rangle + \langle L|L\rangle \langle V|V\rangle + \langle U|L\rangle \langle H|V\rangle + \langle L|U\rangle \langle V|H\rangle)/2 .

However, this reduces to ( \langle U|U\rangle + \langle L|L\rangle )/2 , with no interference, since the interference terms are zero: the polarisation is assumed to be stable, so \langle H|V\rangle = 0, \langle V|H\rangle = 0, \langle H|H\rangle = 1, \langle V|V\rangle = 1 .

After erasure, by filtering in at 45^\circ , the amplitude is given by (|U\rangle + |L\rangle)(|H\rangle + |V\rangle)/2\sqrt{2}, which is a photon that is either in the upper |U\rangle or lower |L\rangle slit and is either horizontally |H\rangle or vertically |V\rangle polarised. |H\rangle + |V\rangle means polarised at 45^\circ .

Note that the normalisation constant is 2\sqrt{2} , not \sqrt{2} as before. It turns out that the total probability of this amplitude is not one, but a half!

This is because we have not accounted for half of the photons: those that were filtered out by the eraser.
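As a quick check (my own arithmetic, assuming the slit and polarisation states are orthonormal when summed over the whole screen, so \langle U|L\rangle = 0 and \langle H|V\rangle = 0 ), the total probability of the filtered-in amplitude is

\displaystyle (\langle U|U\rangle + \langle L|L\rangle)(\langle H|H\rangle + \langle V|V\rangle)/8 = (2 \times 2)/8 = 1/2 ,

which is exactly the half of the photons that pass the eraser.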

The filtered out photons have a different amplitude (|U\rangle + |L\rangle)(|H\rangle - |V\rangle)/2\sqrt{2} . The |H\rangle - |V\rangle term means “filter out” polarisation at 45^\circ , but along a different axis than the |H\rangle + |V\rangle “filter in” polarisation.


Both sets of photons (both the filtered in and the filtered out) interfere with themselves. The filtered in photons display interference fringes. The filtered out photons display interference “anti-fringes”. These fringes sum together to produce “non-interference”.

You can see this with a little bit of algebra. The overall amplitude after erasure is \displaystyle \begin{array}{rcl} && ((|U\rangle + |L\rangle)(|H\rangle + |V\rangle) +  (|U\rangle + |L\rangle)(|H\rangle - |V\rangle))/2\sqrt{2} \\ &=& (|U\rangle |H\rangle + |U\rangle |V\rangle + |L\rangle |H\rangle + |L\rangle |V\rangle + |U\rangle |H\rangle - |U\rangle |V\rangle + |L\rangle |H\rangle - |L\rangle |V\rangle )/2\sqrt{2} \\ &=& (|U\rangle |H\rangle + |L\rangle |H\rangle)/\sqrt{2} \\ \end{array}

The corresponding probability is (\langle U|U\rangle + \langle L|L\rangle + \langle U|L\rangle + \langle L|U\rangle)/2 , which always has an interference term \langle U|L\rangle + \langle L|U\rangle .

So quantum erasers don’t “erase” anything, and they don’t produce interference either, they just “filter out” interference that was already there.

– Ron Garret

I think what Ron is trying to highlight in the quote above is that the math shows us that all the eraser does is filter out the |U\rangle |V\rangle + |L\rangle |V\rangle - |U\rangle |V\rangle - |L\rangle |V\rangle terms (which cancel), leaving only the self-interference \langle U|L\rangle + \langle L|U\rangle . It does not “add in” any interference of its own; rather, it highlights interference that is already in the system through this “filtering out” process.

Next, he points out that you can observe this “cancelling out” phenomenon in the laboratory in an Einstein-Podolsky-Rosen experiment, by recording the U and D detector states, then sending this information classically (over a wire for example) to the right-hand side of the experiment. You can then look at the U and D photons separately and notice that there is one “interference pattern” and another “anti-interference pattern” that you usually do not see because, in the combined effect, these patterns cancel out.


Coming back to the original point of this section, we have resolved the contradiction, since:

  • Entanglement does “count” as measurement
  • There is no interference in the EPRG experiment
  • Interference is not “produced” by the use of an eraser

Step 4: Tell a new story based on the math

Here, Ron begins to introduce his “quantum information theory”, or “zero-worlds” interpretation of quantum mechanics. This is an extension of classical information theory that uses complex numbers.

He introduces the entropy measure of a system, H(A) = -\Sigma_a \text{P}(a) \text{log} \text{P}(a), which is 0 when a system is definitely in a single state and \text{log}(N) when a system has an equal probability of being in N states.
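These two limits are easy to check numerically. Here is a tiny illustration of my own, in Java, using log base 2 so that entropy is measured in bits:

```java
// Numerical check of the entropy formula H(A) = -sum_a P(a) log P(a)
// (my own illustration). Uses log base 2, so entropy is in bits.
public class Entropy {
    static double log2(double x) { return Math.log(x) / Math.log(2); }

    // Terms with P(a) == 0 contribute nothing, by the usual convention
    // that 0 log 0 = 0.
    static double entropy(double[] p) {
        double h = 0;
        for (double pa : p) {
            if (pa > 0) h -= pa * log2(pa);
        }
        return h;
    }

    public static void main(String[] args) {
        // A system definitely in a single state has zero entropy
        System.out.println(entropy(new double[]{1.0, 0.0})); // prints 0.0
        // A system equally likely to be in N = 4 states has entropy log(4)
        System.out.println(entropy(new double[]{0.25, 0.25, 0.25, 0.25})); // ≈ 2.0 bits
    }
}
```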

Some entropy measures of two systems are then introduced.

The joint entropy is H(AB) = -\Sigma_{a,b} \text{P}(ab) \text{log} \text{P}(ab).

The conditional entropy is H(A|B) = -\Sigma_{a,b} \text{P}(ab) \text{log} \text{P}(a|b) (note that each term is weighted by the joint probability \text{P}(ab) , not the conditional one).

The information entropy (more commonly known as the mutual information) is I(A:B) = I(B:A) = H(AB) - H(A|B) - H(B|A), which is the information about A contained in B . Classically it is always non-negative and can never exceed the entropy of either system alone.
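To make the formula concrete, here is a small numerical illustration of my own (not from the talk), applying I(A:B) = H(AB) - H(A|B) - H(B|A) to a pair of coins, via the identities H(A|B) = H(AB) - H(B) and H(B|A) = H(AB) - H(A) :

```java
// Mutual information of a pair of coins from their joint distribution
// (my own illustration), using H(A|B) = H(AB) - H(B) and
// H(B|A) = H(AB) - H(A). Entropies are in bits.
public class MutualInfo {
    static double log2(double x) { return Math.log(x) / Math.log(2); }

    static double entropy(double[] p) {
        double h = 0;
        for (double x : p) if (x > 0) h -= x * log2(x);
        return h;
    }

    // I(A:B) = H(AB) - H(A|B) - H(B|A) for a 2x2 joint distribution P(a,b)
    static double mutualInfo(double[][] joint) {
        double[] flat = new double[4];
        double[] pa = new double[2], pb = new double[2]; // marginals
        int k = 0;
        for (int a = 0; a < 2; a++) {
            for (int b = 0; b < 2; b++) {
                flat[k++] = joint[a][b];
                pa[a] += joint[a][b];
                pb[b] += joint[a][b];
            }
        }
        double hab = entropy(flat);
        double haGivenB = hab - entropy(pb); // H(A|B) = H(AB) - H(B)
        double hbGivenA = hab - entropy(pa); // H(B|A) = H(AB) - H(A)
        return hab - haGivenB - hbGivenA;
    }

    public static void main(String[] args) {
        // Perfectly correlated: both heads or both tails, 50/50
        double[][] correlated = {{0.5, 0.0}, {0.0, 0.5}};
        System.out.println(mutualInfo(correlated)); // ≈ 1.0 bit

        // Independent fair coins share no information
        double[][] independent = {{0.25, 0.25}, {0.25, 0.25}};
        System.out.println(mutualInfo(independent)); // ≈ 0.0 bits
    }
}
```

For the perfectly correlated pair the mutual information is a full bit: each coin tells you everything about the other. For independent coins it is zero.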

This is extended into the complex plane by using the von Neumann entropy of a system, S(A) = -\text{Tr}(\rho_A \text{log} \rho_A) where \rho_A is the quantum density matrix, \text{Tr} is the matrix trace operator and \text{log} is the natural matrix logarithm.

Ron makes a key point here that the quantum mutual information is no longer subject to this classical bound: for entangled states it can be as much as twice the entropy of either system alone.

He then illustrates what happens in terms of the mutual information when we consider a system with three mutually entangled particles, where A and B are the “measurement apparatus” and C is the particle we are interested in measuring:

By ignoring the particle C , the resulting system looks exactly like a “coin with a sensor”. The two particles A and B are perfectly correlated.

This can be extended to any macroscopic system with a very large number of entangled particles:

This is exactly what Feynman was talking about in one of his workshops!

The only reason the macroscopic world seems so deterministic and classical is that many entanglements are being made all of the time, in a messy web that is almost impossible to “undo” in practice, though in principle nothing forbids it.

Closing thoughts

If you made it this far then well done, I didn’t realise there was so much content packed into this talk.

My takeaways can be summarised as follows:

  • Measurement is the same phenomenon as entanglement
  • Quantum effects only present themselves when there is a high degree of uncertainty (e.g. not much entanglement)
  • In our daily lives, we have been habituated into becoming very familiar with the consequences of a very high degree of entanglement
  • On the flip side, we are very unfamiliar with the consequences of a very low degree of entanglement, but that should not stop us from accepting what really does happen

I’ll finish with another quote from Ron that made a lot of sense to me after learning all of this:

“Spooky action at a distance” is no more (and no less) mysterious than “spooky action across time.” Both are produced by the same physical mechanism.

– Ron Garret