

I’ve previously written about a neurological view of willpower and the design of an artificial neural network with short-term memory. Both of those articles came from researching the structure and functioning of the human brain. In this article I describe some of the many ways our brains store and retrieve memories, by analogy to the study of distributed systems in computer science and through a high-level overview of how the brain works. I also discuss the implications of this information for the construction of Artificial Intelligence.

In Part 1 of this article I discuss the idea of “consensus” and how different parts of the brain try to agree on what we’ve experienced. I also introduce the CAP Theorem from the study of distributed systems. Part 2 explains the mechanics of consensus in the form of “neuronal assemblies”. Part 3 explains different types of memory and how they work to trigger the activation of neuronal assemblies. Finally, Part 4 discusses the lessons this knowledge can provide us in constructing Artificial Intelligence capable of remembering things.


An analogy for consensus

Suppose an election is coming up. Conservative John Johnson and the liberal Jack Jackson are competing head to head. You want to predict who the winner will be and head out to conduct a simple poll.

You walk down the street, find 10 people and ask them who they will vote for. For John you tally 7 votes and for Jack 3, and then proclaim that John will win 70% of the vote. What is the probability that your poll is accurate? In this case there are too many variables to say – for two primary reasons.

Random variance – suppose you walked down the street of a particularly conservative part of town, so the voters there were more likely to vote for John. Or perhaps the opposite happens: a bus has just unloaded passengers coming from a workers’ union rally. Then your results would be skewed in the opposite direction.

Methodology – if you’ve ever done any sort of cold sales or spruiking you’ll know that you need to approach far more than 10 people before 10 of them will stop. Suppose your success rate is 25%. The reason your success rate is so low could be to do with your approach or your appearance. These things alter the kinds of people that are likely to stop for you, and those types of people might also correlate with certain voting patterns.

A grid of figures with different colours. They are ordered left-to-right by their colour. One subset is selected randomly with a red outline while another is chosen by everyone that fits within a particular square.
There are many ways to select a random sample – and some are more random than others

The solution to the random variance is to increase the sample size. The more people you ask the less likely that random noise is going to have an impact on your outcome. The solution to the methodology is to profile the people that you do ask. Find out their demographics – age, gender, occupation, address, etc. Compare these values to the actual population distribution and then adjust the weight of those responses accordingly.
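The re-weighting step described above (often called post-stratification) can be sketched in a few lines. All names, group labels and numbers below are invented purely for illustration:

```python
# Hypothetical sketch of post-stratification: re-weight raw poll
# responses so the sample's demographics match the population's.

# Raw responses: (candidate, demographic group) - made-up data
responses = [
    ("John", "retiree"), ("John", "retiree"), ("John", "retiree"),
    ("John", "worker"), ("John", "worker"), ("John", "worker"), ("John", "worker"),
    ("Jack", "worker"), ("Jack", "worker"), ("Jack", "retiree"),
]

# Share of each group in our sample vs. the actual population
sample_share = {"retiree": 0.4, "worker": 0.6}
population_share = {"retiree": 0.2, "worker": 0.8}

# Each response is weighted so the sample mirrors the population:
# over-represented groups count for less, under-represented for more.
weights = {g: population_share[g] / sample_share[g] for g in sample_share}

totals = {}
for candidate, group in responses:
    totals[candidate] = totals.get(candidate, 0.0) + weights[group]

total_weight = sum(totals.values())
for candidate, w in sorted(totals.items()):
    print(f"{candidate}: {w / total_weight:.0%}")
```

With these made-up numbers, the raw 70/30 split shifts slightly towards Jack once the over-sampled retirees are down-weighted.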

Of course the simplest solution to both problems would be to ask every single voter, but by that point you will have conducted the election and defeated the purpose of predicting the winner. A good poll is capable of predicting the consensus of the entire population from a sample.

Distributed consensus

In the modern age many pieces of software run on distributed networks: multiple computers that process and store information simultaneously. Often these computers are spread across vast distances – sometimes on different continents.

Consider a social network in which people can interact with each other’s profiles. Any interaction immediately appears on that person’s profile visible to everyone. Now if Jane makes a comment about Joan on her smartphone, it connects to a server in Hong Kong and updates Joan’s profile. Joan checks her profile on her laptop which connects to a server in New York. In order to see the comment the Hong Kong server needs to update the New York server.

A server in Paris, New York and Hong Kong, with Jim, Jane and Joan connected to each respectively.

So far this sounds pretty simple. And the first people who designed distributed systems probably thought the same thing. But in the tech world we get used to hearing the phrases “edge case” and “corner case”. And distributed systems are rife with these situations.

Suppose Jane makes the comment, the Hong Kong server then tells the New York server to update Joan’s profile. Now Joan sees the update and writes a reply. The New York server sends the reply but there is a problem with the connection to Hong Kong and the packets are lost. Now the New York server is aware of a comment reply the Hong Kong server is oblivious to.

Jim sees the reply on his laptop via the Paris server. Paris updates Hong Kong saying “add a reply to this message” but the Hong Kong server responds that it doesn’t exist. Paris updates Hong Kong with the information it’s missing but another problem happens. Joan deletes her reply on the New York server and those packets also get lost on the connection to Hong Kong. Paris receives the update that the reply was deleted but it had already told Hong Kong about it.

Joan’s laptop sends the New York server a 'delete message' request. The Paris server tells Hong Kong about a reply to a message, to which Hong Kong responds 'what message?'. A connection problem happens between New York and Hong Kong.
Servers can receive conflicting information about the same data.

Now there is a problem – the Hong Kong server thinks this reply exists but the New York server does not. If Jane replies to Joan’s reply, it will go out to both Paris and New York, which will respond that the comment doesn’t exist. When Hong Kong was given a reply to a comment that didn’t exist, it updated its database. Should Paris and New York now update their databases the same way?
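One common way real systems detect this kind of disagreement – not described in this article, but worth a sketch – is a “version vector”: each server keeps a counter of the updates it has seen from every peer. The server names and counters below are invented for illustration:

```python
# Toy version-vector comparison. Each vector maps a server name to
# the number of updates seen from that server.

def compare(a, b):
    """Return 'equal', 'a_newer', 'b_newer' or 'conflict'."""
    a_ge = all(a.get(k, 0) >= v for k, v in b.items())
    b_ge = all(b.get(k, 0) >= v for k, v in a.items())
    if a_ge and b_ge:
        return "equal"
    if a_ge:
        return "a_newer"
    if b_ge:
        return "b_newer"
    return "conflict"  # concurrent updates: needs reconciliation

# Hong Kong saw one update from New York; New York then applied a
# delete that never reached Hong Kong.
hong_kong = {"NY": 1}
new_york = {"NY": 2}
print(compare(hong_kong, new_york))  # b_newer: New York's state wins

# If both sides had made independent updates, neither dominates:
hong_kong = {"NY": 1, "HK": 1}
print(compare(hong_kong, new_york))  # conflict
```

The “conflict” case is exactly the situation in the story: neither server’s history contains the other’s, so some extra rule is needed to decide whose data survives.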

This is called a distributed consensus problem. In the world of distributed systems the ability to resolve these problems is constrained by the CAP Theorem, which states that a distributed system can have at most two of the following three properties:

  • Consistency – is the data consistent between the different nodes in the network?
  • Availability – is it always possible to access the data (e.g. no down-time)?
  • Partition tolerance – is it possible to have a “split brain”, where two halves of the network operate independently of each other because the connection between them has been severed?

Partition tolerance is generally considered a given. In the social network example, it would be like saying that because New York and Paris can no longer communicate with Hong Kong, they continue to function with their own data set, while Hong Kong also functions with a completely different one. Reconciling these two halves once the connection is restored is far more difficult than the smaller partition realising it has lost connectivity and shutting down.

Additionally, having chosen partition tolerance, you need to balance availability against consistency. It’s not a one-or-the-other choice: you need data that is both consistent and available, you just need to choose which property is valued more.

There are a number of protocols that allow distributed servers to achieve strong consistency while maintaining partition tolerance. Many of them involve the election of a “leader” that acts as the arbiter in any consistency disputes, with complex voting rules if communication with the leader is lost. There are also leaderless consensus protocols, which involve confirmation requests that bounce back and forth between parties.
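The core idea behind the voting rules in many of these protocols is the majority quorum: a write only counts once more than half the nodes acknowledge it, so any two majorities must overlap. A minimal sketch, with the server names from the story standing in for nodes:

```python
# Toy quorum write: a value is only "committed" once a majority of
# nodes acknowledge it. Any two majorities share at least one node,
# so at least one node always knows the latest committed value.

NODES = ["paris", "new_york", "hong_kong"]
MAJORITY = len(NODES) // 2 + 1  # 2 of 3

def quorum_write(value, reachable):
    """Attempt to replicate `value` to the currently reachable nodes."""
    acks = [n for n in NODES if n in reachable]
    if len(acks) >= MAJORITY:
        return f"committed on {sorted(acks)}"
    return "rejected: no majority (consistency chosen over availability)"

# All nodes reachable: the write commits.
print(quorum_write("reply", {"paris", "new_york", "hong_kong"}))

# Hong Kong is partitioned away: two nodes still form a majority.
print(quorum_write("reply", {"paris", "new_york"}))

# Only Hong Kong is reachable: the minority side refuses the write -
# the "smaller partition shutting down" behaviour described above.
print(quorum_write("reply", {"hong_kong"}))
```

Real protocols like Raft or Paxos add leader election and log replication on top of this basic overlap guarantee; this sketch only shows the quorum arithmetic.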

Then there is the so-called Byzantine Generals Problem, which concerns maintaining consistency in the face of adversarial parties that try to corrupt information.

Similarly, there are availability-focused protocols that sacrifice consistency. For example, instant messaging protocols usually care more about up-time and “instant” responses, and are more than happy to drop or even duplicate messages so long as the conversation continues to flow without delay. Too much focus on consistency would create performance bottlenecks and prevent the messaging from being “instant”.

One such approach is to use a Gossip Protocol. Nodes in the network randomly tell other nodes about new information they find, which is then passed to other random nodes. Like office gossip the information spreads randomly but quickly – and eventually everyone finds out. Some people find out twice, some never find out. Some hear conflicting information and simply spread both.
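The spread of office gossip is easy to simulate. In this toy sketch (node counts and the random seed are arbitrary), each informed node tells one randomly chosen peer per round:

```python
import random

# Toy gossip simulation: each round, every node that knows the news
# tells one random peer. Spread is non-deterministic but fast, and
# some nodes inevitably hear the news more than once.
random.seed(1)

nodes = list(range(20))
informed = {0}  # node 0 learns a new piece of information
rounds = 0

while len(informed) < len(nodes) and rounds < 100:
    rounds += 1
    for node in list(informed):
        peer = random.choice(nodes)  # may re-tell an already-informed node
        informed.add(peer)

print(f"all {len(nodes)} nodes informed after {rounds} rounds")
```

Because each round roughly doubles the number of informed nodes early on, full coverage typically takes on the order of log(n) rounds rather than n.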

CAP Theorem in the human brain

The human brain favours availability over partition tolerance. It’s entirely possible for someone to develop Split-brain Syndrome. Perhaps the reason the brain has no defence against it is that it is a condition never encountered naturally, only through surgery. Instead, it’s important that parts of the brain are able to continue communicating unimpeded.

Corpus callosum with Anatomography
The Corpus Callosum joins the two hemispheres of the brain. It can be severed to cause “Split-brain Syndrome”. Image source: Corpus Callosum on Wikipedia

In fact even consistency isn’t that big a deal. We forget information all the time and déjà vu is an example of memories being stored twice – failing the consistency test. These sorts of problems lead to a lot of cognitive biases and make us susceptible to logical fallacies.

Much like predicting elections via polls, it would be more accurate if our brain simply stored all the information we absorb in high detail. But then accessing and corroborating that information would require far more processing power. And just like the availability focus of an instant messaging client, the brain favours speed and uptime over perfect recall.

Combating randomness with redundancy

The next part of this article will discuss the stochastic nature of the human brain – how the behaviour of individual neurons is largely random. This randomness extends not only to the frequency of neurons firing electrical signals, but also to the death and decay of neurons and the fluctuations of neurotransmitter availability. Additionally, there is the randomness of our day-to-day experiences, which makes it difficult to predict what neuronal configuration will be optimal for our existence.

Just like random factors could influence the poll in the first analogy, they can influence decision-making processes in our brain. This includes our ability to store and recall memories. In order to combat this, the brain takes the approach of large-scale redundancy.

Far more neurons than necessary are involved in the vast majority of our neuronal processes. Suppose you want to recall what an elephant looks like. Vast networks in the brain are called up to produce an image (or images) that you associate with the word elephant. Your brain stores an “archetype” elephant which you can reference and are likely picturing right now. But it also stores multiple subtle variations, which get picked fairly randomly – one of them will dominate by pure chance. And for each detail you are able to recall about an elephant there are redundant neurons in case any of the others fail to activate.

Additionally, the brain doesn’t necessarily have a collection of neurons that are the “elephant neurons”. Instead you have fragments stored across your brain and when you need to recall an “elephant archetype” your brain will construct a new one from scratch each time. The more recently you’ve seen an elephant or the more frequently you encounter them, the easier it will be to construct such an image.

Exposure to certain stimulation strengthens connections between neurons whose activity is synchronous in response to those stimuli. The more exposure you have to certain stimuli, the more neurons will have been active together. As that memory gets further away in time, or if it was only a fleeting experience, the connections between those neurons lose their strength.
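This strengthen-with-co-activity, fade-without-use behaviour resembles a Hebbian learning rule. A minimal sketch – the learning and decay rates here are arbitrary, chosen only to make the effect visible:

```python
# Toy Hebbian rule: a connection strengthens when two neurons fire
# together, and decays a little on every time step regardless.
LEARN = 0.1   # strengthening per co-activation (arbitrary rate)
DECAY = 0.02  # passive decay per time step (arbitrary rate)

def update(weight, pre_active, post_active):
    if pre_active and post_active:
        weight += LEARN * (1.0 - weight)  # saturates toward 1.0
    return max(0.0, weight - DECAY)       # all connections slowly fade

# Frequent shared stimulation builds a strong connection...
w = 0.0
for _ in range(30):
    w = update(w, True, True)
w_strong = w
print(f"after repeated co-activation: {w_strong:.2f}")

# ...which then weakens once the experience is no longer revisited.
for _ in range(30):
    w = update(w, False, False)
w_weak = w
print(f"after 30 steps without activity: {w_weak:.2f}")
```

With these rates the weight climbs towards an equilibrium where learning and decay balance, then drains away once the co-activation stops – mirroring fleeting experiences leaving only weak traces.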

This redundancy is like the aggregation of multiple polls prior to an election. The more samples and methods used, the more accurate the prediction. The closer the poll to election day, the more accurate the prediction. So the more you experience something, the easier it is to remember, because there are more redundant experiences to refer to.

In an attempt to combat both internal and external randomness, our brain stores information sporadically, sparsely, widely and repeatedly. The extra redundancy ensures that when we need to recall information there are plenty of places from which to access it. By decaying old memories and focusing on things we’ve seen often and/or recently, the brain also attempts to combat the randomness of the external world.

We remember things that are common and/or recent because that’s the most pragmatic use of our limited brain capacity. In order to make it more available, multiple traces are stored in different parts of the brain. Then when we need to recall it, this redundancy is relied on to piece together a recollection of the information.

An alternative strategy could have been to avoid the redundancy and store detailed recollections of our experiences. But then recall would take longer, as there would be more data to search through, and the death and decay of a neuron could at best cause us to forget something, at worst split the brain into parts that remember certain things but not others.

In part 2 we look at the mechanics of distributed consensus in the brain by looking at the microscopic world of the neuron and contrast it with the elementary building blocks of the digital world.

Have you got a comment, criticism or suggestion? Contact Rick.