The human brain is perhaps the most sophisticated information processing system in the known universe. With approximately 86 billion neurons connected through an estimated 100 trillion synapses, it performs feats of pattern recognition, reasoning, and learning that still surpass the most advanced artificial systems in many domains.
Understanding how the brain computes is not merely an academic curiosity—it is the foundational inspiration that launched the entire field of neural networks and deep learning. To truly understand artificial neural networks, we must first appreciate the biological machinery they attempt to emulate.
By the end of this page, you will understand the fundamental architecture and operation of biological neurons—from their anatomical structure to their electrochemical signaling mechanisms. This knowledge provides essential context for understanding why artificial neurons are designed the way they are, and what biological features they capture or ignore.
The modern understanding of neural computation begins with the Neuron Doctrine, established in the late 19th century primarily through the work of Santiago Ramón y Cajal and Camillo Golgi. Before their discoveries, scientists debated whether the nervous system was a continuous mesh (the reticular theory) or composed of discrete units.
Ramón y Cajal, using Golgi's staining technique, demonstrated definitively that the nervous system consists of individual, structurally distinct cells—neurons—that communicate with each other at specialized junctions. This insight was revolutionary: it meant that neural computation could be understood as the collective behavior of discrete computational units, each processing and transmitting information.
Ramón y Cajal and Golgi shared the 1906 Nobel Prize in Physiology or Medicine for their work on the structure of the nervous system—despite Golgi never fully accepting the neuron doctrine. This foundation underlies every neural network we build today: the idea that intelligence emerges from networks of simple, discrete computational units.
The key principles of the Neuron Doctrine:
Structural Independence: Neurons are discrete anatomical units with distinct boundaries, not fused into a continuous network
Functional Independence: Each neuron operates as an independent information-processing unit
Connectivity Through Synapses: Neurons communicate through specialized junctions called synapses, where information passes from one neuron to another
Directional Signal Flow: In most cases, information flows in one direction—from dendrites to axon terminals (though we now know there are exceptions)
These principles directly inform the design of artificial neural networks, where we model neurons as discrete units connected through weighted edges that transmit signals in a specified direction.
A typical neuron consists of three main anatomical regions, each playing a distinct role in neural computation:
The soma is the metabolic center of the neuron, containing the nucleus and the molecular machinery required for the cell's survival. But it also serves a critical computational function: it integrates incoming signals from all connected neurons.
The soma is typically 10-100 micrometers in diameter. Its membrane maintains a resting potential of approximately -70 millivolts (mV) relative to the extracellular fluid. This voltage difference—created by ion pumps that maintain unequal concentrations of sodium (Na⁺), potassium (K⁺), and other ions across the membrane—is the foundation of neural signaling.
Dendrites are tree-like branching structures that extend from the soma and serve as the primary input structures of the neuron. The word 'dendrite' comes from the Greek word for 'tree,' reflecting their branching morphology.
Key properties of dendrites:
| Structure | Primary Function | Signal Type | Typical Size |
|---|---|---|---|
| Dendrites | Receive input signals | Graded potentials (passive) | Up to 2 mm total length |
| Soma (Cell Body) | Integrate signals, cell maintenance | Integration zone | 10-100 μm diameter |
| Axon Hillock | Action potential initiation | Threshold detection | ~1 μm |
| Axon | Transmit output signal | Action potentials (digital) | ~1 μm diameter; up to 1+ m in length |
| Axon Terminals | Release neurotransmitters | Chemical transmission | 1-5 μm |
The axon is a single, long projection that carries the neuron's output signal away from the soma to other neurons. While each neuron has only one axon, that axon may branch extensively near its target region.
Critical axon properties:
At its target, the axon branches into many axon terminals (also called synaptic boutons). These specialized structures contain vesicles filled with neurotransmitters—chemical messengers that carry the signal across the synaptic cleft to the next neuron.
When an action potential reaches an axon terminal, it triggers the release of these neurotransmitters into the synaptic cleft, passing the signal to the receiving neuron; the step-by-step sequence is detailed in the synapse section below.
Notice how the biological neuron's architecture suggests a computational model: multiple inputs (dendrites) are integrated (soma), and if the combined input exceeds a threshold (axon hillock), an output is generated (axon). This forms the conceptual basis for the artificial neuron model we'll explore in the next page.
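To make this mapping concrete, here is a minimal sketch in Python of the integrate-and-threshold idea (the numbers are invented for illustration; this is the conceptual model, not a biophysical one): inputs stand in for dendritic signals, signed weights for excitatory and inhibitory synapses, and a threshold for the axon hillock's firing decision.

```python
def threshold_neuron(inputs, weights, threshold):
    """Toy integrate-and-threshold unit mirroring the dendrite/soma/axon-hillock story."""
    # Soma: integrate all weighted inputs (the analog of summing EPSPs and IPSPs)
    integrated = sum(w * x for w, x in zip(weights, inputs))
    # Axon hillock: all-or-nothing decision; the axon carries the result away
    return 1 if integrated > threshold else 0

# Illustrative values: two excitatory synapses (positive weights), one inhibitory (negative)
print(threshold_neuron(inputs=[1.0, 0.5, 1.0],
                       weights=[0.8, 0.6, -0.7],
                       threshold=0.3))  # -> 1, the unit "fires"
```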
Neurons are fundamentally electrochemical devices. They use both electrical signaling (within the neuron) and chemical signaling (between neurons) to process and transmit information. Understanding this dual nature is crucial for appreciating what artificial neurons simplify or abstract away.
At rest, a neuron maintains a voltage difference of approximately -70 mV across its membrane (inside negative relative to outside). This resting potential is established and maintained by:
The sodium-potassium pump (Na⁺/K⁺-ATPase): This active transport protein uses ATP to pump 3 Na⁺ ions out and 2 K⁺ ions in, creating concentration gradients
Ion channel selectivity: The membrane at rest is more permeable to K⁺ than Na⁺, so potassium tends to leak out, making the inside more negative
Electrostatic forces: The resulting charge separation creates an electrical gradient that eventually balances the concentration gradient
The resting potential represents a state of dynamic equilibrium—ions are constantly moving, but the net voltage remains stable. This potential energy, like a cocked spring, is what allows rapid neural signaling.
When neurotransmitters bind to receptors on dendrites, they cause graded potentials—changes in membrane voltage that vary in amplitude based on the strength of the input. These are the neuron's analog input signals.
Properties of graded potentials:
Excitatory postsynaptic potentials (EPSPs): Depolarize the membrane toward the threshold, increasing the probability of firing
Inhibitory postsynaptic potentials (IPSPs): Hyperpolarize the membrane away from threshold, decreasing firing probability
The soma integrates all incoming EPSPs and IPSPs—essentially performing a weighted sum of all inputs, where the weights depend on synapse strength, location, and timing. This is precisely what artificial neurons model with their weighted sum operation.
The biological neuron's integration of EPSPs and IPSPs directly inspired the weighted sum operation in artificial neurons: Σᵢ wᵢxᵢ. Excitatory inputs correspond to positive weights, inhibitory inputs to negative weights. The strength of a synapse maps to the weight magnitude.
If the integrated graded potentials at the axon hillock (the junction between soma and axon) exceed approximately -55 mV (the threshold potential), an action potential is triggered. The action potential is the neuron's digital output signal—it either fires completely or not at all.
The action potential sequence (a simplified simulation sketch follows this list):
Threshold reached: Membrane potential at axon hillock reaches ~-55 mV
Rapid depolarization: Voltage-gated Na⁺ channels open → Na⁺ rushes in → membrane potential shoots to ~+30 mV (within 1 ms)
Repolarization: Na⁺ channels inactivate; voltage-gated K⁺ channels open → K⁺ rushes out → membrane potential returns toward rest
Hyperpolarization: K⁺ channels close slowly → membrane briefly overshoots to ~-80 mV (refractory period)
Return to rest: Na⁺/K⁺ pumps restore resting potential; neuron ready to fire again
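Below is a deliberately crude simulation of this sequence, in the spirit of a leaky integrate-and-fire model (a standard simplification, not the real ion-channel dynamics). The voltage landmarks come from the text; the input drive, membrane time constant, and refractory length are invented for illustration.

```python
V_REST, V_THRESHOLD, V_PEAK, V_HYPER = -70.0, -55.0, 30.0, -80.0  # mV, from the text
TAU_MS, DT_MS = 10.0, 1.0   # membrane time constant and time step (assumed values)

def simulate(drive_mv_per_ms, n_steps=100):
    """Return a membrane-potential trace (mV) and spike times (ms) for a constant drive."""
    v, trace, spikes, refractory = V_REST, [], [], 0
    for t in range(n_steps):
        if refractory > 0:
            # Repolarization and brief hyperpolarization, then return to rest
            refractory -= 1
            v = V_HYPER if refractory > 0 else V_REST
        else:
            # Graded integration: leak toward rest plus the input drive
            v += (-(v - V_REST) / TAU_MS + drive_mv_per_ms) * DT_MS
            if v >= V_THRESHOLD:      # threshold reached at the "axon hillock"
                spikes.append(t)
                v = V_PEAK            # all-or-nothing depolarization
                refractory = 3        # enforced recovery before the next spike
        trace.append(v)
    return trace, spikes

_, spike_times = simulate(drive_mv_per_ms=2.0)
print("spike times (ms):", spike_times)
```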
Key properties of action potentials:
| Property | Graded Potentials | Action Potentials |
|---|---|---|
| Amplitude | Variable (proportional to input) | Fixed (~100 mV total swing) |
| Propagation | Passive, decremental | Active, regenerative |
| Distance | Short (millimeters) | Long (up to meters) |
| Summation | Yes (temporal and spatial) | No (all-or-nothing) |
| Direction | Bidirectional | Unidirectional (axon → terminals) |
| Computational role | Weighted sum of inputs | Thresholded output |
| Artificial analog | Σ wᵢxᵢ | Activation function output |
The synapse is where the computation really happens. It's the interface between neurons—the point where one neuron's output becomes another neuron's input. Understanding synapses is crucial because synaptic weights are what artificial neural networks adjust during learning.
Most synapses in the brain are chemical synapses, where information is transmitted via neurotransmitter molecules. The synapse consists of three parts:
Presynaptic terminal: The axon terminal of the sending neuron, containing vesicles filled with neurotransmitters
Synaptic cleft: A 20-40 nanometer gap between neurons filled with extracellular fluid
Postsynaptic membrane: The receiving neuron's membrane (usually a dendritic spine), containing neurotransmitter receptors
The synaptic transmission process:
Arrival: An action potential reaches the presynaptic terminal
Calcium influx: Voltage-gated Ca²⁺ channels open and Ca²⁺ flows into the terminal
Vesicle fusion: Ca²⁺ triggers synaptic vesicles to fuse with the membrane and release neurotransmitter into the cleft
Diffusion and binding: Neurotransmitter crosses the synaptic cleft and binds to postsynaptic receptors
Postsynaptic response: Receptor activation produces an EPSP or IPSP in the receiving neuron
Major neurotransmitters in the brain:
Glutamate: The primary excitatory neurotransmitter. Binds to AMPA and NMDA receptors. Responsible for most fast excitatory transmission.
GABA (γ-aminobutyric acid): The primary inhibitory neurotransmitter. Binds to GABA receptors. Critical for preventing runaway excitation.
Dopamine: Involved in reward, motivation, and learning. Central to reinforcement learning circuits.
Acetylcholine: Important for attention and memory. Used at neuromuscular junctions.
Serotonin: Modulates mood, sleep, and various cognitive functions.
Receptor types:
Ionotropic receptors: Fast-acting. Neurotransmitter binding directly opens an ion channel. Response in milliseconds.
Metabotropic receptors: Slower but longer-lasting. Neurotransmitter binding triggers intracellular signaling cascades. Response over seconds to minutes.
A minority of synapses are electrical synapses, where neurons are connected by gap junctions—protein channels that directly link the cytoplasm of adjacent neurons and allow direct electrical coupling between them.
The strength of a synapse—determined by factors like the number of neurotransmitter receptors, vesicle release probability, and receptor sensitivity—is what we model as 'weights' in artificial neural networks. When we train a neural network by adjusting weights, we're mimicking the biological process of synaptic strengthening and weakening.
How does the brain learn? The answer lies in synaptic plasticity—the ability of synapses to change their strength based on activity patterns. This is the biological foundation for all learning in neural networks, both biological and artificial.
In 1949, psychologist Donald Hebb proposed a theory of learning that remains central to both neuroscience and machine learning:
"When an axon of cell A is near enough to excite a cell B and repeatedly or persistently takes part in firing it, some growth process or metabolic change takes place in one or both cells such that A's efficiency, as one of the cells firing B, is increased."
More succinctly: "Neurons that fire together, wire together."
Hebb's rule suggests that if a presynaptic neuron repeatedly contributes to firing a postsynaptic neuron, the connection between them should strengthen. This provides a mechanism for associative learning—for example, why repeatedly seeing a face and hearing a name together causes you to associate them.
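In code, a bare-bones Hebbian update looks like the sketch below, using the common textbook form Δw = η · x_pre · y_post (made-up activity values; this abstracts away the biological growth process Hebb described).

```python
def hebbian_update(w, pre_activity, post_activity, learning_rate=0.1):
    """One Hebbian step: strengthen the connection when pre and post are active together."""
    return w + learning_rate * pre_activity * post_activity

# Illustrative values: the weight only grows when both neurons are active
w = 0.5
print(hebbian_update(w, pre_activity=1.0, post_activity=1.0))  # 0.6  ("fire together, wire together")
print(hebbian_update(w, pre_activity=1.0, post_activity=0.0))  # 0.5  (no postsynaptic firing, no change)
```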
LTP is the primary experimental paradigm for studying synaptic strengthening. Discovered in 1973 by Bliss and Lømo in the hippocampus, LTP demonstrates that synapses can undergo long-lasting increases in transmission efficacy.
Key properties of LTP:
Molecular mechanism:
The NMDA receptor acts as a coincidence detector: it opens only when glutamate is bound (signaling presynaptic activity) AND the postsynaptic membrane is depolarized (which expels the Mg²⁺ ion that otherwise blocks the channel). This implements Hebb's rule at the molecular level.
While backpropagation in artificial neural networks differs mechanistically from LTP, both implement the same core principle: connection strengths change based on the correlation between connected neurons' activities. Backpropagation uses gradient information to determine which direction to change weights; LTP uses local coincidence detection.
LTD is the opposite of LTP—a long-lasting decrease in synaptic strength. It's equally important for learning, as it allows the brain to:
Induction of LTD:
A more refined view of Hebbian learning emerged in the 1990s from experiments on spike-timing-dependent plasticity (STDP): if a presynaptic spike arrives shortly before the postsynaptic neuron fires, the synapse is potentiated; if it arrives shortly after, the synapse is depressed.
This timing dependence implements a form of causality detection—the synapse asks "Did the presynaptic neuron help cause the postsynaptic neuron to fire?" If yes, strengthen; if no, weaken.
The STDP learning rule can be approximated as:
Δw = η × A₊ × e^(-Δt/τ₊)   if Δt > 0 (pre before post)
Δw = -η × A₋ × e^(Δt/τ₋)   if Δt < 0 (pre after post)
Where Δt = t_post - t_pre, and A₊, A₋, τ₊, τ₋ are parameters controlling the magnitude and time constants of potentiation and depression.
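The rule above translates directly into code. The sketch below uses the formula as written; η, A₊, A₋, τ₊, and τ₋ are not specified in the text, so the numeric values are arbitrary placeholders.

```python
import math

def stdp_delta_w(t_pre, t_post, eta=0.01, a_plus=1.0, a_minus=1.0,
                 tau_plus=20.0, tau_minus=20.0):
    """Weight change for one pre/post spike pair under the STDP rule above.

    Parameter values are illustrative; real fits differ by synapse type.
    """
    dt = t_post - t_pre                      # Δt = t_post - t_pre (ms)
    if dt > 0:                               # pre before post -> potentiation
        return eta * a_plus * math.exp(-dt / tau_plus)
    elif dt < 0:                             # pre after post -> depression
        return -eta * a_minus * math.exp(dt / tau_minus)
    return 0.0                               # simultaneous spikes: no change in this sketch

print(stdp_delta_w(t_pre=10.0, t_post=15.0))   # small positive change (causal pairing)
print(stdp_delta_w(t_pre=15.0, t_post=10.0))   # small negative change (anti-causal pairing)
```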
Given that action potentials are all-or-nothing events with stereotyped waveforms, how does the brain encode information? This question of neural coding is fundamental to understanding biological computation and has implications for how we design artificial networks.
The most straightforward coding scheme is rate coding, where information is encoded in the firing rate of neurons—the number of action potentials per unit time.
Evidence for rate coding:
Mathematical representation:
r = f(I)
Where r is the firing rate and f(I) is some function of the input I. This directly corresponds to an artificial neuron's output activation.
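As a tiny worked example (with an invented spike train), a rate-coded output is just a spike count per unit time, which is the quantity an artificial neuron's continuous activation is taken to represent:

```python
def firing_rate(spike_times_ms, window_ms):
    """Rate code: spikes per second within an observation window."""
    return 1000.0 * len(spike_times_ms) / window_ms

# Invented spike train: 8 spikes observed over 200 ms -> 40 spikes/s
spikes = [5, 31, 58, 84, 110, 139, 161, 190]
print(firing_rate(spikes, window_ms=200.0))  # 40.0
```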
Limitations of rate coding:
Temporal coding hypothesizes that precise spike timing carries information beyond just the firing rate.
Forms of temporal coding:
Latency coding: Information in the time to first spike after stimulus onset (observed in visual cortex, olfactory system)
Phase coding: Spike timing relative to ongoing oscillations (observed in hippocampus during navigation)
Synchrony coding: Information in which neurons fire together (observed in sensory binding)
Temporal patterns: Specific sequences of interspike intervals (observed in songbird communication)
Evidence for temporal coding:
Population coding recognizes that single neurons are noisy and limited; information is more reliably represented by populations of neurons.
Examples:
Place cells in hippocampus: Each neuron has a preferred location; the animal's position is encoded by which neurons are active and how strongly
Motor cortex population vectors: Direction of arm movement is encoded by a weighted sum of many neurons' preferred directions (see the sketch below)
Distributed representations: Concepts, objects, and categories are represented by patterns of activity across many neurons, not single 'grandmother cells'
This perspective directly informs artificial neural network design, where we use layers of many neurons to create distributed representations.
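The motor cortex example above can be sketched numerically: estimate the movement direction as the firing-rate-weighted vector sum of each neuron's preferred direction. The neurons, tunings, and rates below are invented for illustration.

```python
import math

def population_vector(preferred_dirs_deg, firing_rates):
    """Decode a direction as the rate-weighted sum of preferred-direction unit vectors."""
    x = sum(r * math.cos(math.radians(d)) for d, r in zip(preferred_dirs_deg, firing_rates))
    y = sum(r * math.sin(math.radians(d)) for d, r in zip(preferred_dirs_deg, firing_rates))
    return math.degrees(math.atan2(y, x)) % 360.0

# Four hypothetical neurons tuned to 0°, 90°, 180°, 270°; firing rates in spikes/s
rates = [20.0, 45.0, 5.0, 10.0]          # strongest response from the 90°-preferring neuron
print(population_vector([0, 90, 180, 270], rates))  # ≈ 66.8°, between 0° and 90°
```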
Standard artificial neurons abstract away temporal dynamics entirely—they compute continuous-valued activations rather than spike trains. Rate coding provides the primary justification: if information is encoded in firing rates, we can model a neuron's output as proportional to its rate, avoiding the complexity of spiking dynamics. Spiking neural networks (SNNs), which model spike timing explicitly, are an active research area that may offer computational and efficiency advantages.
The brain contains a remarkable diversity of neuron types, each specialized for particular computational roles. Understanding this diversity helps us appreciate what artificial neural networks simplify and what future architectures might incorporate.
Sensory neurons (afferent neurons):
Motor neurons (efferent neurons):
Interneurons:
| Type | Structure | Location | Function |
|---|---|---|---|
| Pyramidal cells | Large, pyramid-shaped soma; long apical dendrite | Cerebral cortex, hippocampus | Primary excitatory neurons; cortical computation |
| Purkinje cells | Very large; elaborate dendritic tree in single plane | Cerebellar cortex | Motor learning and coordination |
| Granule cells | Very small; few dendrites | Cerebellum, hippocampus | Pattern separation; most numerous neuron type |
| Stellate cells | Star-shaped dendritic tree | Cortex, cerebellum | Local inhibition |
| Basket cells | Axons form 'baskets' around other cell bodies | Cortex, cerebellum, hippocampus | Powerful inhibition of nearby neurons |
| Chandelier cells | Axon terminals resemble candelabra | Cerebral cortex | Inhibition at axon initial segment |
Excitatory neurons:
Inhibitory neurons:
The balance between excitation and inhibition (E/I balance) is crucial for proper brain function. Disrupted E/I balance is implicated in disorders from epilepsy to autism.
Modulatory neurons release neuromodulators (dopamine, serotonin, norepinephrine, acetylcholine) that don't directly trigger action potentials but alter how circuits respond to other inputs.
These modulatory systems provide global signals that adjust learning rates, attention, and arousal—suggesting biological precedents for concepts like learning rate schedules and attention mechanisms in artificial networks.
With this understanding of biological neurons, we can now appreciate what artificial neural networks capture and what they abstract away. This mapping is crucial for understanding both the power and the limitations of current deep learning approaches.
Preserved biological features:
Weighted summation of inputs: The graded potential integration across dendrites and soma → Σ wᵢxᵢ
Threshold-based activation: The action potential threshold → activation function
Modifiable connection strengths: Synaptic plasticity → weight updates during training
Distributed representations: Population coding of information → hidden layer activations
Hierarchical processing: Multi-level neural pathways → deep architectures with multiple layers
Specialization through learning: Neurons develop selectivity for particular inputs → learned feature detectors
Simplified or ignored biological features:
Temporal dynamics: Real neurons are continuous-time dynamical systems with rich temporal structure; artificial neurons are typically computed instantaneously
Spiking behavior: Action potentials encode information in spike timing; artificial neurons produce continuous activations (rate-code assumption)
Dendritic computation: Biological dendrites perform local nonlinear computations; artificial neurons have a single integration point
Diverse neuron types: The brain has hundreds of distinct cell types; artificial networks typically use uniform units
Neuromodulation: Global signals like dopamine and serotonin modulate computation; artificial networks lack equivalent mechanisms (though attention approximates some functions)
Energy constraints: Biological neurons are energy-efficient, sparse, and event-driven; most artificial networks are dense and energy-intensive
Local learning rules: Synaptic plasticity uses only locally available information; backpropagation requires non-local gradient information
Despite these simplifications, artificial neural networks have achieved remarkable success. This suggests that the core computational principles—weighted summation, nonlinear activation, learned representations, and hierarchical processing—may be more important than the biological details.
However, the biological features we've abstracted away may yet hold keys to further advances, and researchers continue to draw inspiration from neuroscience to improve artificial networks.
You now have a thorough understanding of biological neurons—their structure, electrochemical signaling, synaptic transmission, plasticity, and information coding. This foundation is essential for understanding why artificial neurons are designed the way they are. In the next page, we'll see how these biological insights were distilled into mathematical models of artificial neurons.