Alright, OpenAI o1 is out. If you are anything like me, you first chuckled at the description that it was “designed to spend more time thinking before they respond.” But once I delved deeper, it quickly became mind-blowing. (By the way, Ethan Mollick offers an excellent explanation of the power of dedicating more computational resources to “thinking.”)
Developments like this deepen my admiration for the foundational concepts behind AI. At its core, the real breakthroughs seem to lie in math and biology, where simple concepts from each field have laid the groundwork for this widespread wave of change. Over a year ago, I scribbled some rough notes about neurons after watching Harvard’s CS50 Lecture 5 on Intro to AI. Below is a polished version of those thoughts.
Biological Neuron
The biological neuron is the basic unit of the nervous system, including the brain. The interactions between neurons are what ultimately enable us humans to ‘learn’ something. Here are four characteristics of a neuron worth noting:
- Takes input and gives output: Neurons receive incoming signals (for example, sensory input) and generate outgoing signals (for example, motor commands).
- Strength of connections: Neurons receive electrical signals from one another, and one neuron can propagate those signals to another neuron. But not all connections are equal: some are stronger, others weaker. A famous saying goes “Neurons that fire together, wire together,” meaning that when two neurons activate simultaneously, their connection strengthens.
- Activation threshold: The input signals received by a neuron are not simply transmitted as-is. As these signals accumulate, they contribute to the overall electric potential of the receiving neuron. Once a certain threshold is reached, the neuron becomes ‘activated’ and propagates the signal to the next neuron.
- Layers: Neurons are connected to one another, arranged in complex networks, often forming distinct layers. For example, the brain’s cortex contains six layers, which have been reported to show similar patterns of electrical activity across the mammalian species studied.
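To make the threshold idea concrete, here is a deliberately crude toy in Python. It is a sketch of the "accumulate, then fire" behaviour described above, not a biophysical model: the function name `toy_neuron`, the signal values, and the threshold of 1.0 are all made up for illustration.

```python
def toy_neuron(incoming_signals, threshold=1.0):
    """Accumulate incoming signals and report when the neuron 'fires'.

    Returns the index of the signal that pushed the running potential
    to or past the threshold, or None if the threshold was never reached.
    All numbers here are illustrative, not physiological.
    """
    potential = 0.0
    for step, signal in enumerate(incoming_signals):
        potential += signal  # signals add up instead of passing through as-is
        if potential >= threshold:
            return step  # threshold reached: the neuron activates
    return None


fires_at = toy_neuron([0.3, 0.4, 0.5])  # potential climbs 0.3 -> 0.7 -> 1.2
stays_quiet = toy_neuron([0.1, 0.2])    # never gets close to 1.0
```

In this toy, the first neuron fires on its third signal, while the second never fires at all.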
All of this inspired the development of Artificial Neural Network (ANN) architecture by AI researchers as far back as the 1940s and 1950s. Like biological neurons, ANNs are also composed of fundamental units—artificial neurons*.
Artificial Neuron
An artificial neuron is a conceptual unit in a neural network designed to mimic the behavior of biological neurons. It shares several analogous properties:
- Takes input and gives output: Each artificial neuron performs a mathematical transformation that takes input variables and manipulates them to produce an output. For example, an artificial neuron predicting house prices (p) might take three input parameters: zipcode (z), lot size (s), and number of bedrooms (b).
- Weights: Each input variable is multiplied by a weight, which determines how much it influences the output. In the house price example, the zipcode might be weighted 60%, with the other two inputs receiving 20% each. So, p = 0.6z + 0.2s + 0.2b. Determining the correct weights is the key challenge in ANN training (or ‘learning’), and the process often starts with random weights. (I wrote about that before. See the last section of this post.)
- Activation function: The activation function is a mathematical function applied to the result of the input and weight computations. There are various activation functions (such as a binary step function or ReLU), which act somewhat like gates. A step function outputs either 0 or 1 depending on whether a threshold is crossed, while ReLU passes positive values through unchanged and turns negative values into zero. Either way, the function decides how much of the signal is passed on to the next unit or layer.
- Layers: Artificial neurons can be combined, with each calculating a sub-output and passing it to others, creating complex networks. In many machine learning problems, we are often interested in more than one output. For example, other outputs might include the likelihood of a house being sold in 6 months or the likelihood of it being rented for a specific price.
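The four properties above fit in a few lines of Python. This is a minimal sketch, not how any particular library does it. It assumes the three house inputs have already been converted into numeric scores on a comparable scale (a raw zipcode is not a meaningful number to multiply). The helper names, the input values, and the second row of weights are all hypothetical.

```python
def weighted_sum(inputs, weights):
    """Multiply each input by its weight and add the results."""
    return sum(x * w for x, w in zip(inputs, weights))


def step(value, threshold=0.0):
    """Binary step activation: 1 if the threshold is reached, else 0."""
    return 1 if value >= threshold else 0


def relu(value):
    """ReLU activation: keep positive values, turn negatives into 0."""
    return max(0.0, value)


def neuron(inputs, weights, activation):
    """One artificial neuron: weighted sum followed by an activation."""
    return activation(weighted_sum(inputs, weights))


def layer(inputs, weight_rows, activation):
    """Several neurons reading the same inputs, one output per weight row."""
    return [neuron(inputs, row, activation) for row in weight_rows]


# Hypothetical, pre-scaled scores for zipcode (z), lot size (s), bedrooms (b).
house = [0.9, 0.5, 0.4]

# p = 0.6z + 0.2s + 0.2b, passed through ReLU (roughly 0.72 here).
price_score = neuron(house, [0.6, 0.2, 0.2], relu)

# Two outputs from the same inputs; the second weight row is made up
# and produces a negative sum, which ReLU clips to 0.0.
outputs = layer(house, [[0.6, 0.2, 0.2], [-0.5, 0.1, 0.1]], relu)
```

Swapping `relu` for `step` in the calls above turns the same weighted sum into an on/off signal, which is the closest analogue to the biological activation threshold.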
This conceptual symmetry resonates with me, especially since my formative years were in medicine before transitioning to technology. I’ve intentionally simplified some of the peripheral details to keep this at a readable length. For example, it’s not just the input parameters that are multiplied by weights; a constant term (called a “bias”) is also added to the weighted sum, which makes it possible to shift the final output up or down.
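The bias is easiest to see as a constant added to the weighted sum. Here is a small sketch in Python; the function name `neuron_with_bias` and all the numbers are made up for illustration.

```python
def neuron_with_bias(inputs, weights, bias):
    """Weighted sum of the inputs plus a constant bias term."""
    return sum(x * w for x, w in zip(inputs, weights)) + bias


# Hypothetical, pre-scaled inputs and the 60/20/20 weights from the example.
inputs = [0.9, 0.5, 0.4]
weights = [0.6, 0.2, 0.2]

no_bias = neuron_with_bias(inputs, weights, bias=0.0)
shifted = neuron_with_bias(inputs, weights, bias=0.25)

# The bias moves the output by the same amount regardless of the inputs.
shift = shifted - no_bias
```

Like the weights, the bias is a value that training adjusts; it is not chosen by hand.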
P.S.: *The quest to digitally recreate the human brain and its functions often leads to parallels being drawn between the brain and computers. Part of me cringes at this anthropomorphization: it’s not a healthy habit and can lead to unintended consequences. Needless to say, there is still a great deal we don’t know about the brain and how it works. The conceptual beauty of leveraging the neuron framework successfully does not mean we have unlocked the inner workings of human intelligence.