Introduction to Artificial Neural Networks

This is the opening chapter of the Neural Networks & Deep Learning series. Before we touch a single line of framework code, you need a precise mental model of what a neural network actually is. Not the brain metaphor everyone repeats, but the real machinery: numbers flowing through layers, getting multiplied and added, and an error signal slowly bending those numbers into something useful. Get this chapter right and every later chapter (activation functions, backpropagation, CNNs, Transformers) becomes a variation on the same core idea.

What Is an Artificial Neural Network?

An Artificial Neural Network (ANN) is a function approximator. You give it inputs, it produces outputs, and during training it adjusts its internal numbers so its outputs get closer to the answers you want. That is the whole job. The "neural" name comes from a loose analogy to biological neurons, but you should think of an ANN as a stack of simple mathematical operations arranged so that, together, they can model very complex relationships in data.

Traditional programming is rules plus data giving answers, where you write the logic yourself. A neural network flips this around: data plus answers give the rules. You show it thousands of examples and it figures out the internal rules, the weights, on its own. This is why neural networks shine on problems where the rules are impossible to write by hand, like recognising a cat in a photo, transcribing speech, or translating a sentence.

The Biological Inspiration, and Where It Ends

A biological neuron receives signals through dendrites, combines them in the cell body, and fires an output down its axon if the combined signal is strong enough. An artificial neuron mirrors this at a very high level: it receives several inputs, combines them, and produces a single output. That is where the resemblance stops. Real neurons are vastly more complex, and modern networks are not trying to simulate a brain; they are doing optimisation with calculus. Keep the analogy as intuition, not as fact.

The Artificial Neuron: The Smallest Building Block

Everything in deep learning is built from one tiny unit: the neuron, also called a node or unit. A single neuron does exactly three things. First it multiplies each input by a weight, a number that says how important that input is, and adds the results together. Then it adds a constant called the bias, which lets the neuron shift its output up or down independently of the inputs. Finally it passes the result through an activation function, which introduces non-linearity and decides the neuron's final output.

Written out, for inputs x₁, x₂, …, xₙ with weights w₁, w₂, …, wₙ and bias b:

z = (w₁·x₁ + w₂·x₂ + … + wₙ·xₙ) + b
output = activation(z)

Those two lines are the atom of every neural network ever built. A network with a billion parameters is essentially this operation, repeated at scale.

Why the bias matters: Without a bias, the weighted sum z is forced to pass through the origin, so when all inputs are zero, z is zero and the output is stuck at activation(0) no matter what the weights are. The bias frees the neuron to shift that threshold, so it can activate even when inputs are small or stay quiet until they are large, the same way the intercept c in y = mx + c lets a line sit anywhere on the graph.
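
To see the effect of the bias in numbers, here is a minimal sketch that assumes a sigmoid activation and made-up illustrative weights. The helper name neuron is hypothetical, introduced only for this demonstration. With all inputs at zero, the weights have no influence, and only the bias can move the output away from sigmoid(0) = 0.5:

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def neuron(inputs, weights, bias=0.0):
    """Hypothetical helper: weighted sum plus bias, passed through a sigmoid."""
    return sigmoid(np.dot(inputs, weights) + bias)

zero_inputs = np.zeros(2)
weights = np.array([0.6, 0.4])   # illustrative values, not learned

no_bias = neuron(zero_inputs, weights)                # z = 0
with_bias = neuron(zero_inputs, weights, bias=-4.0)   # z = -4

print(f"{no_bias:.2f}")    # 0.50
print(f"{with_bias:.2f}")  # 0.02
```

Changing the weights in this sketch leaves both outputs unchanged, because every weight is multiplied by zero. The bias is the only parameter that can move the output here.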

The Structure of a Neural Network

Neurons become powerful when you stack them into layers. A typical network has three kinds of layer. The input layer has one neuron per input feature; if you feed it a 28 by 28 pixel image, that is 784 input neurons. This layer does no computation; it just holds the data. The hidden layers sit between input and output and are where most of the pattern extraction happens, with each neuron combining signals from the previous layer. The word "deep" simply means a network with many hidden layers. The output layer produces the final answer, and the number of neurons there depends on the task: one neuron for a yes or no prediction, for example, or ten neurons to classify the digits zero through nine.

When every neuron in one layer connects to every neuron in the next, the layer is called fully connected, or dense. This is the default building block you will use most often.
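
A dense layer can be sketched as one matrix of weights plus one vector of biases. The example below is an illustration under assumptions, not a recipe: the hidden size of 128 is an arbitrary choice, the "image" is random numbers standing in for real pixels, and the weights are random rather than trained.

```python
import numpy as np

rng = np.random.default_rng(0)

n_inputs, n_hidden, n_outputs = 784, 128, 10   # 128 is an arbitrary choice

# One weight per (input, neuron) pair, plus one bias per neuron.
W1 = rng.normal(scale=0.01, size=(n_inputs, n_hidden))
b1 = np.zeros(n_hidden)
W2 = rng.normal(scale=0.01, size=(n_hidden, n_outputs))
b2 = np.zeros(n_outputs)

image = rng.random(n_inputs)       # stand-in for a flattened 28x28 image

hidden_sums = image @ W1 + b1      # all 128 weighted sums in one matrix product
print(hidden_sums.shape)           # (128,)

n_params = W1.size + b1.size + W2.size + b2.size
print(n_params)                    # 101770
```

Because every input connects to every neuron, the parameter count grows with the product of the layer sizes, which is why even this small network already has over a hundred thousand learnable numbers.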

Forward Propagation: How a Prediction Is Made

Forward propagation is the process of pushing input data through the network, layer by layer, until you get an output. Each layer computes its weighted sums and activations, then hands its results to the next layer as input. There is no learning here, just calculation, and this is exactly what happens every time a trained model makes a prediction.
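
As a rough sketch of that layer-by-layer hand-off, the loop below pushes one input through two small dense layers. The function name forward, the layer sizes, and the random weights are all assumptions made for illustration, and sigmoid is used in every layer purely to keep the example short.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def forward(x, layers):
    """Push x through each (weights, bias) pair in turn."""
    activation = x
    for weights, bias in layers:
        z = activation @ weights + bias   # weighted sums for this layer
        activation = sigmoid(z)           # becomes the next layer's input
    return activation

rng = np.random.default_rng(42)
layers = [
    (rng.normal(size=(2, 3)), np.zeros(3)),  # 2 inputs -> 3 hidden neurons
    (rng.normal(size=(3, 1)), np.zeros(1)),  # 3 hidden neurons -> 1 output
]

prediction = forward(np.array([5.0, 7.0]), layers)
print(prediction.shape)   # (1,)
```

Nothing in forward changes the weights. The same function would serve both during training and when a finished model makes predictions.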

A Worked Numeric Example

Let's predict whether a student passes based on two inputs, study hours and sleep hours. Take one neuron with weights w₁ = 0.6 for study, w₂ = 0.4 for sleep, and bias b = -4. For a student with study of 5 and sleep of 7:

z = (0.6 × 5) + (0.4 × 7) + (-4)
z = 3.0 + 2.8 - 4
z = 1.8

output = sigmoid(1.8) ≈ 0.86

An output of 0.86 on a scale of 0 to 1 means the neuron is fairly confident the student passes. The sigmoid here is an activation function that squashes any number into the range 0 to 1, and we cover it and its alternatives in the activation functions chapter.
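
For readers who want the last step spelled out, the sigmoid calculation works as follows. This uses the standard definition of the sigmoid; the symbol σ is introduced here only for brevity.

```latex
\sigma(z) = \frac{1}{1 + e^{-z}}, \qquad
\sigma(1.8) = \frac{1}{1 + e^{-1.8}} \approx \frac{1}{1 + 0.165} \approx 0.86
```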

How Does a Neural Network Learn?

At the start, the weights are set to small random values (biases are often set to zero), so the network's predictions are useless. Learning is the loop that fixes this. The network runs a forward pass to get a prediction, then a loss function compares that prediction to the correct answer and outputs a single number describing how wrong it was. Backpropagation then works out how much each weight and bias contributed to the error, and an optimiser like gradient descent nudges each of them in the direction that reduces the error.

Repeat this over many passes through the data, where each full pass is called an epoch, and the loss typically drops as the network gets better. We unpack this properly in Backpropagation and Gradient Descent and the optimisers chapter. For now, just hold the shape of the loop in your head.
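
To make the shape of the loop concrete, here is a minimal sketch that trains a single sigmoid neuron with plain gradient descent. Everything specific in it is an assumption chosen for illustration: the four data points are made up, the loss is binary cross-entropy, and the learning rate and epoch count are arbitrary. For one neuron the "backpropagation" step reduces to a single gradient formula. Real networks apply the same idea layer by layer.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# Made-up toy data: [study_hours, sleep_hours] -> 1 = pass, 0 = fail
X = np.array([[1.0, 4.0], [2.0, 8.0], [6.0, 7.0], [8.0, 5.0]])
y = np.array([0.0, 0.0, 1.0, 1.0])

rng = np.random.default_rng(0)
weights = rng.normal(scale=0.1, size=2)   # small random starting weights
bias = 0.0
learning_rate = 0.05                      # arbitrary illustrative value
eps = 1e-12                               # avoids log(0)

losses = []
for epoch in range(500):
    # 1. Forward pass
    predictions = sigmoid(X @ weights + bias)

    # 2. Loss: binary cross-entropy, averaged over the examples
    loss = -np.mean(y * np.log(predictions + eps)
                    + (1 - y) * np.log(1 - predictions + eps))
    losses.append(loss)

    # 3. Gradients of the loss with respect to weights and bias
    error = predictions - y
    grad_w = X.T @ error / len(y)
    grad_b = error.mean()

    # 4. Gradient descent step: move against the gradient
    weights -= learning_rate * grad_w
    bias -= learning_rate * grad_b

print(f"loss: {losses[0]:.3f} -> {losses[-1]:.3f}")
```

Each trip through the for loop is one epoch here, because the whole (tiny) dataset is used in every step.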

Build a Neuron From Scratch (Python + NumPy)

Nothing demystifies a neural network faster than coding the core operation yourself, with no framework hiding the math:

import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# A single neuron: inputs = [study_hours, sleep_hours]
inputs  = np.array([5.0, 7.0])
weights = np.array([0.6, 0.4])
bias    = -4.0

# weighted sum + bias
z = np.dot(inputs, weights) + bias

# activation
output = sigmoid(z)

print(f"z = {z:.2f}")           # z = 1.80
print(f"output = {output:.2f}")  # output = 0.86

The line np.dot(inputs, weights) + bias, followed by the sigmoid call, is the entire forward computation of a neuron. A full network just does this many times across many layers. If you are rusty on NumPy arrays or functions, the Python Programming series covers the fundamentals you will lean on throughout this course.

The Same Thing With a Framework (Keras / TensorFlow)

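In practice you will not hand-roll neurons. Frameworks handle the weighted sums, activations, and learning loop for you. Here is a tiny network defined for the same pass/fail task; the code builds and compiles the model, and training it would take a further call to model.fit on labelled data: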

import tensorflow as tf
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Input(shape=(2,)),               # 2 inputs: study, sleep
    layers.Dense(4, activation='relu'),     # 1 hidden layer, 4 neurons
    layers.Dense(1, activation='sigmoid')   # 1 output: pass probability
])

model.compile(optimizer='adam',
              loss='binary_crossentropy',
              metrics=['accuracy'])

model.summary()  # shows layers and parameter counts

The three-step neuron and the layer structure from above map directly onto Dense layers and their activations, while compile chooses the loss function and optimiser that the learning loop will use (the loop itself runs when you call model.fit). The framework is convenience, not magic; it is running the same operations you just coded by hand.
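
The parameter counts reported by model.summary() can be checked by hand, since each Dense layer holds one weight per input per neuron plus one bias per neuron. The helper dense_params below is a hypothetical name used only for this check, not part of Keras:

```python
def dense_params(n_inputs, n_neurons):
    """Weights (one per input per neuron) plus one bias per neuron."""
    return n_inputs * n_neurons + n_neurons

hidden = dense_params(2, 4)   # Dense(4) fed by the 2 inputs
output = dense_params(4, 1)   # Dense(1) fed by the 4 hidden neurons

print(hidden, output, hidden + output)   # 12 5 17
```

So this model has 17 learnable numbers in total, every one of them a weight or bias of the kind described earlier.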

Where Neural Networks Are Used

Neural networks sit behind a huge range of everyday technology. In computer vision they handle face recognition, medical image diagnosis, and self-driving perception, powered by the CNNs we cover later in this series. In language they drive translation, sentiment analysis, chatbots, and the large language models behind modern AI assistants, built on the Transformer architecture. In finance they do fraud detection, credit scoring, and forecasting, and in consumer products they power the recommendation systems that decide what you see next. If you have worked through classical Machine Learning algorithms, neural networks are the natural next step, because they automate much of the feature engineering you used to do by hand and tend to keep improving as you feed them more data.

Common Misconceptions to Avoid

A few ideas trip up newcomers. First, a neural network does not think like a brain; it performs numerical optimisation, and the brain analogy is a teaching aid rather than a description of the mechanism. Second, more layers are not automatically better; deeper networks can model more, but they also overfit more easily, train more slowly, and need more data, so depth is a tool rather than a goal. Third, "it is just matrix multiplication" is not a reason to dismiss them: the underlying operations are indeed mostly matrix multiplication, and that simplicity is exactly what makes the emergent behaviour from billions of them, tuned on huge datasets, so striking.

Key Terms Recap

Neuron / Node: the basic unit; computes a weighted sum, adds bias, applies activation.
Weight: a learnable number controlling an input's influence.
Bias: a learnable constant that shifts the neuron's output.
Activation function: introduces non-linearity, covered next.
Forward propagation: computing an output from inputs.
Loss function: measures prediction error.
Backpropagation: assigns blame for the error to each weight.
Epoch: one full pass over the training data.

What's Next

You now understand the neuron, layers, forward propagation, and the shape of the learning loop. The one thing we glossed over, the activation function, turns out to be what gives neural networks their real power. Without it, even a hundred-layer network collapses into a single linear function, which is just a straight line when there is one input. That is where we go next.
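
The claim that a network without activation functions is no more powerful than a single linear layer can be checked numerically. The sketch below uses random illustrative weights and arbitrary layer sizes. It stacks two layers with no activation, then builds one layer that computes exactly the same thing:

```python
import numpy as np

rng = np.random.default_rng(0)

# Two layers with NO activation function (random illustrative weights)
W1, b1 = rng.normal(size=(2, 3)), rng.normal(size=3)
W2, b2 = rng.normal(size=(3, 1)), rng.normal(size=1)

x = np.array([5.0, 7.0])
two_layers = (x @ W1 + b1) @ W2 + b2

# The same mapping expressed as ONE linear layer
W = W1 @ W2
b = b1 @ W2 + b2
one_layer = x @ W + b

print(np.allclose(two_layers, one_layer))   # True
```

The same algebra applies however many layers you stack, which is why the non-linearity is what makes depth worthwhile.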
