This is the opening chapter of the Neural Networks & Deep Learning series. Before we touch a single line of framework code, you need a precise mental model of what a neural network actually is: not the brain metaphor everyone repeats, but the real machinery of numbers flowing through layers, getting multiplied and added, and an error signal slowly bending those numbers into something useful. Get this chapter right and every later chapter (activation functions, backpropagation, CNNs, Transformers) is just a variation on the same core idea.
An Artificial Neural Network (ANN) is a function approximator. You give it inputs, it produces outputs, and during training it adjusts its internal numbers so its outputs get closer to the answers you want. That is the whole job. The "neural" name comes from a loose analogy to biological neurons, but you should think of an ANN as a stack of simple mathematical operations arranged so that, together, they can model very complex relationships in data.
Traditional programming is rules plus data giving answers, where you write the logic yourself. A neural network flips this around: data plus answers give the rules. You show it thousands of examples and it figures out the internal rules, the weights, on its own. This is why neural networks shine on problems where the rules are impossible to write by hand, like recognising a cat in a photo, transcribing speech, or translating a sentence.
A biological neuron receives signals through dendrites, combines them in the cell body, and fires an output down its axon if the combined signal is strong enough. An artificial neuron mirrors this at a very high level: it receives several inputs, combines them, and produces a single output. That is where the resemblance stops. Real neurons are vastly more complex, and modern networks are not trying to simulate a brain; they are doing optimisation with calculus. Keep the analogy as intuition, not as fact.
Everything in deep learning is built from one tiny unit: the neuron, also called a node or unit. A single neuron does exactly three things. First it multiplies each input by a weight, a number that says how important that input is, and adds the results together. Then it adds a constant called the bias, which lets the neuron shift its output up or down independently of the inputs. Finally it passes the result through an activation function, which introduces non-linearity and decides the neuron's final output.
Written out, for inputs x₁, x₂, …, xₙ with weights w₁, w₂, …, wₙ and bias b:
z = (w₁·x₁ + w₂·x₂ + … + wₙ·xₙ) + b
output = activation(z)
That pair of lines is the atom of every neural network ever built. A network with a billion parameters is just this operation, repeated.
Why the bias matters: without a bias, the weighted sum is forced through the origin, so when all inputs are zero, z is always zero no matter what the weights are. The bias frees the neuron to activate even when inputs are small, the same way the intercept c in y = mx + c lets a line sit anywhere on the graph.
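To see the effect of the bias in isolation, here is a minimal sketch in plain Python. The helper name neuron and the bias value of 2.0 are illustrative assumptions, not part of the series' code.

```python
import math

def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def neuron(inputs, weights, bias):
    """Weighted sum plus bias, passed through a sigmoid (hypothetical helper)."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return z, sigmoid(z)

zero_inputs = [0.0, 0.0]
weights = [0.6, 0.4]

# With no bias, zero inputs always give z = 0, whatever the weights are.
z_no_bias, out_no_bias = neuron(zero_inputs, weights, bias=0.0)

# With a bias, the neuron can sit somewhere else when the inputs are zero.
z_bias, out_bias = neuron(zero_inputs, weights, bias=2.0)

print(z_no_bias, out_no_bias)      # 0.0 0.5
print(z_bias, round(out_bias, 2))  # 2.0 0.88
```

With a sigmoid activation, a weighted sum of zero maps to 0.5, so the bias is the only thing that can move the neuron's resting output away from that midpoint.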
Neurons become powerful when you stack them into layers. A typical network has three kinds. The input layer has one neuron per input feature; if you feed a 28 by 28 pixel image, that is 784 input neurons, and this layer does no computation, it just holds the data. The hidden layers sit between input and output and are where the actual learning happens, with each neuron combining signals from the previous layer to extract patterns. The word "deep" simply means a network with many hidden layers. The output layer produces the final answer, and the number of neurons there depends on the task, from one neuron for a yes or no prediction to ten neurons to classify digits zero through nine.
When every neuron in one layer connects to every neuron in the next, the layer is called fully connected, or dense. This is the default building block you will use most often.
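Because every neuron in a dense layer sees every input, the whole layer can be computed as one matrix multiplication. The sketch below assumes NumPy, a flattened 28 by 28 input, a layer of 16 neurons, random untrained weights, and a sigmoid activation; all of those choices are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is repeatable

n_inputs, n_neurons = 784, 16   # illustrative sizes (28 x 28 image, 16 neurons)

x = rng.random(n_inputs)                                # one flattened input
W = rng.normal(scale=0.01, size=(n_neurons, n_inputs))  # one row of weights per neuron
b = np.zeros(n_neurons)                                 # one bias per neuron

z = W @ x + b                   # all 16 weighted sums at once
a = 1 / (1 + np.exp(-z))        # sigmoid applied element-wise

print(z.shape, a.shape)         # (16,) (16,)
print(W.size + b.size)          # 12560 parameters in this layer
```

Each row of W holds the weights of one neuron, so the matrix product is just the single-neuron weighted sum computed sixteen times in parallel.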
Forward propagation is the process of pushing input data through the network, layer by layer, until you get an output. Each layer computes its weighted sums and activations, then hands its results to the next layer as input. There is no learning here, just calculation, and this is exactly what happens every time a trained model makes a prediction.
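The layer-by-layer hand-off described above can be sketched as a short loop. This assumes NumPy, a hypothetical 2 to 3 to 1 network with random untrained parameters, and sigmoid activations throughout; the function name forward is also an assumption.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def forward(x, layers):
    """Push x through each (W, b) pair in turn; each output feeds the next layer."""
    a = x
    for W, b in layers:
        a = sigmoid(W @ a + b)
    return a

rng = np.random.default_rng(1)

# Hypothetical 2 -> 3 -> 1 network with random, untrained parameters.
layers = [
    (rng.normal(size=(3, 2)), np.zeros(3)),  # hidden layer: 3 neurons, 2 inputs each
    (rng.normal(size=(1, 3)), np.zeros(1)),  # output layer: 1 neuron, 3 inputs
]

prediction = forward(np.array([5.0, 7.0]), layers)
print(prediction.shape)  # (1,)
```

Nothing in this loop changes the weights, which is the point: forward propagation only calculates, and the prediction from an untrained network like this one is meaningless.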
Let's predict whether a student passes based on two inputs, study hours and sleep hours. Take one neuron with weights w₁ = 0.6 for study, w₂ = 0.4 for sleep, and bias b = -4. For a student with study of 5 and sleep of 7:
z = (0.6 × 5) + (0.4 × 7) + (-4)
z = 3.0 + 2.8 - 4
z = 1.8
output = sigmoid(1.8) ≈ 0.86
An output of 0.86 on a scale of 0 to 1 means the neuron is fairly confident the student passes. The sigmoid here is an activation function that squashes any number into the range 0 to 1, and we cover it and its alternatives in the activation functions chapter.
At the start, the weights are typically set to small random values (biases are often started at zero), so the network's predictions are useless. Learning is the loop that fixes this. The network runs a forward pass to get a prediction, then a loss function compares that prediction to the correct answer and outputs a single number describing how wrong it was. Backpropagation then works out how much each weight contributed to the error, and an optimiser like gradient descent nudges every weight in the direction that reduces the error.
Repeat this over many passes through the data, where each full pass is called an epoch, and the loss generally drops as the network gets better. We unpack this properly in the Backpropagation and Gradient Descent chapter and the optimisers chapter. For now, just hold the shape of the loop in your head.
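To make the shape of that loop concrete, here is a minimal sketch that trains a single sigmoid neuron with plain gradient descent. It assumes NumPy, binary cross-entropy as the loss, and a tiny made-up dataset invented purely for illustration; the learning rate and epoch count are arbitrary choices. With only one neuron there are no hidden layers to propagate through, so the gradient step here stands in for full backpropagation.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# Made-up toy data for illustration: [study_hours, sleep_hours] -> passed (1) or not (0).
X = np.array([[1.0, 4.0], [2.0, 8.0], [3.0, 5.0], [6.0, 6.0], [7.0, 7.0], [8.0, 5.0]])
y = np.array([0.0, 0.0, 0.0, 1.0, 1.0, 1.0])

rng = np.random.default_rng(0)
w = rng.normal(scale=0.1, size=2)  # small random starting weights
b = 0.0
learning_rate = 0.05

losses = []
for epoch in range(2000):
    # 1. Forward pass: predictions for every example.
    p = sigmoid(X @ w + b)
    # 2. Loss: binary cross-entropy, one number for how wrong we are.
    loss = -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))
    losses.append(loss)
    # 3. Gradients: for sigmoid + cross-entropy, dLoss/dz is (p - y) per example.
    grad_z = (p - y) / len(y)
    grad_w = X.T @ grad_z
    grad_b = grad_z.sum()
    # 4. Update: step each parameter against its gradient.
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

print(f"first loss {losses[0]:.3f}, last loss {losses[-1]:.3f}")
```

The four numbered comments are the four stages of the loop: forward pass, loss, gradients, update. Frameworks run the same cycle, only with more layers and smarter optimisers.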
Nothing demystifies a neural network faster than coding the core operation yourself, with no framework hiding the math:
import numpy as np
def sigmoid(z):
    return 1 / (1 + np.exp(-z))
# A single neuron: inputs = [study_hours, sleep_hours]
inputs = np.array([5.0, 7.0])
weights = np.array([0.6, 0.4])
bias = -4.0
# weighted sum + bias
z = np.dot(inputs, weights) + bias
# activation
output = sigmoid(z)
print(f"z = {z:.2f}") # z = 1.80
print(f"output = {output:.2f}") # output = 0.86
The line np.dot(inputs, weights) + bias, followed by the sigmoid call, is the entire forward computation of a neuron. A full network just does this many times across many layers. If you are rusty on NumPy arrays or functions, the Python Programming series covers the fundamentals you will lean on throughout this course.
In practice you will not hand-roll neurons. Frameworks handle the weighted sums, activations, and learning loop for you. Here is a tiny network set up for the same pass/fail task; it is defined and compiled but not yet trained:
import tensorflow as tf
from tensorflow.keras import layers, models
model = models.Sequential([
    layers.Input(shape=(2,)),              # 2 inputs: study, sleep
    layers.Dense(4, activation='relu'),    # 1 hidden layer, 4 neurons
    layers.Dense(1, activation='sigmoid')  # 1 output: pass probability
])
model.compile(optimizer='adam',
              loss='binary_crossentropy',
              metrics=['accuracy'])
model.summary() # shows layers and parameter counts
The three-step neuron, the layer structure, and the learning loop from above map directly onto Dense layers, activations, and compile, which sets the loss and optimiser; the training loop itself would be run with model.fit. The framework is convenience, not magic; it is running the same operations you just coded by hand.
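The parameter counts for this network can be worked out by hand from the neuron definition: one weight per input per neuron, plus one bias per neuron. The helper name dense_params below is a hypothetical one used only for this sketch.

```python
def dense_params(n_inputs, n_neurons):
    """Weights (one per input per neuron) plus one bias per neuron."""
    return n_inputs * n_neurons + n_neurons

hidden = dense_params(2, 4)   # 2 inputs -> 4 hidden neurons
output = dense_params(4, 1)   # 4 hidden outputs -> 1 output neuron

print(hidden, output, hidden + output)  # 12 5 17
```

These figures should line up with the per-layer and total counts that model.summary() reports for the model above.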
Neural networks sit behind a huge range of everyday technology. In computer vision they handle face recognition, medical image diagnosis, and self-driving perception, powered by the CNNs we cover later in this series. In language they drive translation, sentiment analysis, chatbots, and the large language models behind modern AI assistants, built on the Transformer architecture. In finance they do fraud detection, credit scoring, and forecasting, and in consumer products they power the recommendation systems that decide what you see next. If you have worked through classical Machine Learning algorithms, neural networks are the natural next step, because they automate much of the feature engineering you used to do by hand and tend to keep improving as you feed them more data.
A few ideas trip up newcomers. A neural network does not think like a brain; it performs numerical optimisation, and the brain analogy is a teaching aid rather than a description of the mechanism. More layers is not automatically better; deeper networks can model more, but they also overfit more easily, train more slowly, and need more data, so depth is a tool rather than a goal. And "it is just matrix multiplication" is not a reason to dismiss them: the underlying operations are indeed mostly matrix multiplication, and that simplicity is exactly what makes the emergent behaviour from billions of them, tuned on huge datasets, so striking.
You now understand the neuron, layers, forward propagation, and the shape of the learning loop. The one thing we glossed over, the activation function, turns out to be what gives neural networks their real power. Without it, even a hundred-layer network collapses into a single linear function, no more expressive than a straight line. That is where we go next.
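As a preview, the claim that a network without activation functions is no more powerful than a single layer can be checked numerically. The sketch below assumes NumPy and uses random illustrative parameters for two stacked layers with no activation between them.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two stacked layers with NO activation function (random illustrative parameters).
W1, b1 = rng.normal(size=(4, 2)), rng.normal(size=4)
W2, b2 = rng.normal(size=(1, 4)), rng.normal(size=1)

x = np.array([5.0, 7.0])

two_layers = W2 @ (W1 @ x + b1) + b2

# The same computation folded into a single linear layer.
W = W2 @ W1
b = W2 @ b1 + b2
one_layer = W @ x + b

print(np.allclose(two_layers, one_layer))  # True
```

The folding works for any number of layers, which is why a non-linear activation between layers is what lets depth add expressive power.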