Project 1: The Manual Neuron

Learn how machines “learn” by building a single neuron that teaches itself logic gates - no libraries, no shortcuts, just raw math becoming intelligence


Project Overview

Attribute              Value
Difficulty             Beginner
Time Estimate          Weekend (8-16 hours)
Language               Python (Pure, NO NumPy)
Alternative Languages  C, Rust
Prerequisites          Basic Python, high school algebra
Main Book              Grokking Deep Learning by Andrew Trask
Knowledge Area         Artificial Neurons / Logic Gates

Learning Objectives

After completing this project, you will be able to:

  1. Explain the perceptron algorithm - Describe how a single neuron computes its output from inputs, weights, and bias
  2. Implement forward propagation manually - Write output = (input1 * weight1) + (input2 * weight2) + bias without any library help
  3. Derive and apply the Delta Rule - Calculate weight updates based on error and learning rate
  4. Understand linear separability - Explain why single neurons can solve AND/OR but not XOR
  5. Train a model to convergence - Iterate until the neuron correctly predicts all truth table entries
  6. Connect math to AI intuition - See exactly how numbers changing leads to “learning”

The Core Question You’re Answering

“How can multiplying numbers lead to ‘decisions’?”

Before you write a single line of code, internalize this truth: a neural network making a decision is just drawing a line.

Think of the input space as a 2D plane where the x-axis is input1 and the y-axis is input2. The four possible inputs for a logic gate are the corners of a unit square:

    input2
      ^
    1 |   (0,1)-----(1,1)
      |     |         |
      |     |         |
    0 |   (0,0)-----(1,0)
      +----------------------> input1
          0         1

A single neuron draws a line (or in higher dimensions, a hyperplane) that separates “positive” examples from “negative” examples. The weights and bias define where that line sits.

When you train a perceptron, you’re adjusting the line until it correctly separates all the positive examples from the negative ones.

Your task: Build the machine that finds that line automatically.


Concepts You Must Understand First

Stop and research these before coding:

1. The Dot Product and Weighted Sum

The fundamental operation of a neuron is the weighted sum: multiply each input by its corresponding weight, then add everything together (including the bias).

z = (x1 * w1) + (x2 * w2) + ... + (xn * wn) + b

This is a dot product plus a bias term. The dot product measures “how aligned” two vectors are.
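As a minimal sketch of the formula above in pure Python (the function name `weighted_sum` and the sample numbers are illustrative assumptions, not part of the project spec):

```python
def weighted_sum(inputs, weights, bias):
    """Return z = sum(x_i * w_i) + b using only built-in Python."""
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")
    total = bias
    for x, w in zip(inputs, weights):
        total += x * w  # one term of the dot product
    return total


# Hypothetical example values: two inputs, two weights, one bias.
z = weighted_sum([1, 0], [0.5, -0.3], 0.1)
print(z)
```

Writing the loop by hand, rather than calling a library, makes it easy to see that an input of 0 contributes nothing to z.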

Why it matters: The dot product is the building block of nearly all neural networks. Hidden layers, attention mechanisms, and embedding lookups (viewed as multiplying a one-hot vector by a matrix) all reduce to dot products.

Book Reference: “Grokking Deep Learning” by Andrew Trask - Chapter 3: “Introduction to Neural Prediction”

2. The Step Activation Function

After computing the weighted sum, we need to make a decision: is this input “positive” or “negative”? The step function does exactly this:

         1  if z >= threshold
step(z) =
         0  if z < threshold

Often, we set the threshold to 0 and absorb it into the bias:

         1  if z >= 0
step(z) =
         0  if z < 0

Visualization:

  output
    ^
  1 |         +------------
    |         |
    |         |
  0 |---------+
    +-------------------> z
              0

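The step function jumps at z=0 and is flat everywhere else, so its gradient is zero wherever it exists. Gradient-based training gets no signal from it, which is why modern networks use activations such as sigmoid or ReLU. But for perceptrons learning logic gates, step works perfectly.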
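A short sketch of the step function, including the threshold form shown first (the `threshold` parameter is an illustrative assumption; the rest of this project uses the threshold-0 version):

```python
def step(z, threshold=0.0):
    """Return 1 when z reaches the threshold, otherwise 0."""
    return 1 if z >= threshold else 0


# Absorbing the threshold into the bias: comparing z against a threshold
# gives the same answer as comparing (z - threshold) against 0.
for z in (-1.0, 0.0, 0.4, 1.0, 2.5):
    assert step(z, threshold=1.0) == step(z - 1.0)

print(step(-0.2), step(0.0), step(0.7))
```

Note that z = 0 maps to 1 here, matching the `z >= 0` convention used throughout this guide.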

Book Reference: “Neural Networks and Deep Learning” by Michael Nielsen - Chapter 1, Section on “Perceptrons”

3. Error Calculation

Error is the difference between what you wanted and what you got:

error = target - prediction

For binary outputs (0 or 1):

  • If target=1 and prediction=0: error = 1 (we need to increase the output)
  • If target=0 and prediction=1: error = -1 (we need to decrease the output)
  • If target=prediction: error = 0 (no change needed)

Why it matters: Error is the signal that drives learning. Without knowing how wrong you are, you can’t improve.

Book Reference: “Grokking Deep Learning” by Andrew Trask - Chapter 4: “Introduction to Neural Learning”

4. The Perceptron Learning Algorithm (Delta Rule)

The Perceptron Learning Rule states:

w_new = w_old + (learning_rate * error * input)
b_new = b_old + (learning_rate * error)

Intuition:

  • If error > 0 (predicted too low), increase weights for inputs that were “on” (input=1)
  • If error < 0 (predicted too high), decrease weights for inputs that were “on”
  • Inputs that were “off” (input=0) don’t change their weights (multiplying by 0)

Why this works: When an input contributed to a wrong prediction:

  • If the input was 1 and we predicted 0 (should be 1), increase that weight so next time the weighted sum is higher
  • If the input was 1 and we predicted 1 (should be 0), decrease that weight so next time the weighted sum is lower
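The update rule can be sketched as a single function. This is one possible shape, assuming weights are kept in a list; the name `apply_delta_rule` and the sample numbers are hypothetical:

```python
def apply_delta_rule(weights, bias, inputs, target, prediction, learning_rate=0.1):
    """Return (new_weights, new_bias) after one perceptron update."""
    error = target - prediction  # -1, 0, or +1 for binary outputs
    new_weights = [w + learning_rate * error * x for w, x in zip(weights, inputs)]
    new_bias = bias + learning_rate * error
    return new_weights, new_bias


# Predicted 0 but the target was 1, with only the second input "on".
weights, bias = apply_delta_rule([0.2, -0.4], 0.0, [0, 1], target=1, prediction=0)
print(weights, bias)
```

Only the second weight and the bias move, because the first input was 0 and its term in the update is multiplied away.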

Book Reference: “Neural Networks and Deep Learning” by Michael Nielsen - Chapter 1: “The Perceptron Learning Algorithm”

5. Linear Separability

A problem is linearly separable if you can draw a straight line (or hyperplane in higher dimensions) to separate the positive and negative examples.

AND Gate (linearly separable):

    x2
    ^
  1 |  O (0,1)     X (1,1)   <- One output is 1
    |
    |
  0 |  O (0,0)     O (1,0)   <- All these outputs are 0
    +----------------------> x1
       0           1

O = output 0
X = output 1

A line can separate the X from the Os:
    x2
    ^
  1 |  O         \ X
    |            \
    |           \
  0 |  O         \ O
    +-----------\-------> x1
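To check a candidate line by hand, you can test it against the whole truth table. The weights below (w1 = 1, w2 = 1, b = -1.5) are a hand-picked assumption, not values from the text:

```python
AND_TABLE = {(0, 0): 0, (0, 1): 0, (1, 0): 0, (1, 1): 1}


def separates(w1, w2, b, table):
    """True if the line w1*x1 + w2*x2 + b = 0 classifies every row correctly."""
    for (x1, x2), target in table.items():
        prediction = 1 if x1 * w1 + x2 * w2 + b >= 0 else 0
        if prediction != target:
            return False
    return True


print(separates(1.0, 1.0, -1.5, AND_TABLE))  # this line isolates (1,1)
print(separates(1.0, 1.0, -0.5, AND_TABLE))  # this one behaves like OR instead
```

Many different lines pass this check for AND; training only needs to find one of them.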

XOR Gate (NOT linearly separable):

    x2
    ^
  1 |  X (0,1)     O (1,1)
    |
    |
  0 |  O (0,0)     X (1,0)
    +----------------------> x1

No single straight line can separate the Xs from the Os!
They are diagonally opposite.

This is the limitation that Minsky and Papert highlighted, and it contributed to the first “AI Winter” in the 1970s.

Book Reference: “Grokking Deep Learning” by Andrew Trask - Chapter 3: “Linear Separability”


Deep Theoretical Foundation

History of the Perceptron (Rosenblatt 1958)

In 1958, Frank Rosenblatt at Cornell Aeronautical Laboratory created the Perceptron - one of the first algorithms that could learn from data. It was inspired by how neurons in the brain work.

   Historical Timeline of Neural Networks
   ┌─────────────────────────────────────────────────────────────────┐
   │                                                                 │
   │  1943: McCulloch-Pitts neuron (theoretical model)               │
   │    │                                                            │
   │    ▼                                                            │
   │  1958: Rosenblatt's Perceptron (first learning algorithm)       │
   │    │                                                            │
   │    ▼                                                            │
   │  1969: Minsky & Papert "Perceptrons" book (XOR problem)         │
   │    │                                                            │
   │    ▼                                                            │
   │  1969-1986: "AI Winter" (research funding dried up)             │
   │    │                                                            │
   │    ▼                                                            │
   │  1986: Rumelhart, Hinton, Williams (Backpropagation)            │
   │    │                                                            │
   │    ▼                                                            │
   │  2012: AlexNet (Deep Learning Renaissance)                      │
   │    │                                                            │
   │    ▼                                                            │
   │  Today: Transformers, LLMs, etc.                                │
   │                                                                 │
   └─────────────────────────────────────────────────────────────────┘

Rosenblatt’s perceptron was also built as physical hardware - the Mark I Perceptron had 400 photocells as inputs, with the adjustable weights implemented as potentiometers (variable resistors). It could learn to recognize letters.

The perceptron was overhyped. The New York Times declared it the “embryo of an electronic computer that [the Navy] expects will be able to walk, talk, see, write, reproduce itself and be conscious of its existence.”

Then came the crash.

The Minsky-Papert Book and the First AI Winter

In 1969, Marvin Minsky and Seymour Papert published “Perceptrons,” a mathematical analysis showing the fundamental limitations of single-layer perceptrons.

Their key result: A single perceptron cannot learn XOR because XOR is not linearly separable.

This badly damaged funding for neural network research. If neural networks couldn’t even learn XOR, how could they learn anything useful?

What Minsky and Papert proved was technically correct but practically misleading. They acknowledged that multi-layer perceptrons (what we now call neural networks) could represent XOR, but doubted that a workable learning algorithm for multi-layer perceptrons would be found.

They were wrong. The backpropagation algorithm was discovered (and forgotten, and rediscovered) multiple times before being popularized in 1986.

The lesson: Understanding the perceptron deeply - including its limitations - is essential for understanding why we need multiple layers and more sophisticated architectures.

Mathematical Formulation

A perceptron with n inputs computes:

                   n
           z = b + Σ (xi * wi)
                  i=1

           y = step(z)

Where:

  • xi = input i (binary: 0 or 1 for logic gates)
  • wi = weight for input i (real number, learned)
  • b = bias (real number, learned)
  • z = weighted sum (real number)
  • y = output (binary: 0 or 1 after step function)

ASCII Diagram of a 2-Input Perceptron:

                    ┌─────────────────────────────────────────────┐
                    │                                             │
    Input x1 ──────►│  x1 * w1 ──┐                                │
                    │            │                                │
                    │            ▼                                │
                    │         ┌─────┐    ┌──────────┐   ┌───────┐ │
    Input x2 ──────►│  x2*w2─►│  Σ  │───►│ step(z)  │──►│ Output│─┼──► y
                    │            │   ▲   └──────────┘   └───────┘ │
                    │            ▼   │                            │
    Bias 1 ────────►│    b ──────┘                                │
                    │                                             │
                    └─────────────────────────────────────────────┘

    z = (x1 * w1) + (x2 * w2) + b
    y = step(z) = 1 if z >= 0 else 0

The Decision Boundary

The perceptron decides y = 1 when:

z >= 0
(x1 * w1) + (x2 * w2) + b >= 0

Rearranging to see the line equation (assuming w2 > 0; dividing by a negative w2 flips the inequality):

x2 >= (-w1/w2)*x1 + (-b/w2)

This is a line with:

  • Slope: -w1/w2
  • Intercept: -b/w2

Example: Trained OR Gate

After training, let’s say: w1 = 1.5, w2 = 1.5, b = -1.0

Decision boundary: 1.5*x1 + 1.5*x2 - 1.0 = 0

Rearranging: x2 = -x1 + 0.67

    x2
    ^
  1 |  X (0,1)  \   X (1,1)    <- Both have output 1
    |            \
    |             \
0.67|              \           <- Decision boundary
    |               \
  0 |  O (0,0)       \ X (1,0) <- (0,0) is 0, (1,0) is 1
    +------------------\-----> x1
       0       0.67    1

Points above/right of line → output 1
Points below/left of line → output 0
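The slope and intercept can be computed directly from the weights. This is a small sketch using the OR example values above; the helper name `boundary_line` is an assumption:

```python
def boundary_line(w1, w2, b):
    """Return (slope, intercept) of the line w1*x1 + w2*x2 + b = 0, solved for x2."""
    if w2 == 0:
        # With w2 = 0 the boundary is vertical and cannot be written as x2 = m*x1 + c.
        raise ValueError("w2 must be non-zero to solve for x2")
    return -w1 / w2, -b / w2


slope, intercept = boundary_line(1.5, 1.5, -1.0)
print(f"x2 = {slope:.2f}*x1 + {intercept:.2f}")
```

For these weights the slope is -1 and the intercept is about 0.67, matching the rearranged equation in the example.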

Why XOR Fails

For XOR:

  • (0,0) → 0
  • (0,1) → 1
  • (1,0) → 1
  • (1,1) → 0

    x2
    ^
  1 |  X (0,1)     O (1,1)
    |     ┌─────────────┐
    |     │ No single   │
    |     │ line works! │
    |     └─────────────┘
  0 |  O (0,0)     X (1,0)
    +----------------------> x1

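The X points are on opposite corners, and so are the O points.
To see why no line works, write out what the weights would have to satisfy:
  (0,0) → 0 requires b < 0
  (0,1) → 1 requires w2 + b >= 0
  (1,0) → 1 requires w1 + b >= 0
  (1,1) → 0 requires w1 + w2 + b < 0
Adding the two middle conditions gives w1 + w2 + 2b >= 0, so w1 + w2 + b >= -b > 0,
which contradicts the last condition.

This is why XOR requires a multi-layer perceptron (hidden layers): each hidden neuron draws its own line, and the output neuron combines them into a non-linear decision boundary.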
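You can also see the difference empirically with a brute-force search over a coarse grid of weights. This is only an illustration, not a proof, and the grid range and resolution below are arbitrary assumptions:

```python
AND = {(0, 0): 0, (0, 1): 0, (1, 0): 0, (1, 1): 1}
OR = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 1}
XOR = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}


def find_separating_line(table, low=-2.0, high=2.0, steps=21):
    """Try every (w1, w2, b) on a grid; return the first that fits, else None."""
    grid = [low + i * (high - low) / (steps - 1) for i in range(steps)]
    for w1 in grid:
        for w2 in grid:
            for b in grid:
                if all(
                    (1 if x1 * w1 + x2 * w2 + b >= 0 else 0) == target
                    for (x1, x2), target in table.items()
                ):
                    return (w1, w2, b)
    return None


for name, table in (("AND", AND), ("OR", OR), ("XOR", XOR)):
    print(name, find_separating_line(table))
```

The search finds weights for AND and OR but comes back empty for XOR. A finer grid would not change that outcome, since no real-valued weights can classify all four XOR rows.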

The Delta Rule Derivation

The perceptron learning rule looks like a gradient descent step on the error (though Rosenblatt didn’t frame it that way).

Strictly speaking, the step function has no useful gradient: it is flat everywhere except for the jump at z = 0. So instead of a true gradient, the rule uses the raw error as its update signal:

Update Rule:

Δwi = η * (t - y) * xi
wi(new) = wi(old) + Δwi
b(new) = b(old) + η * (t - y)

Where:

  • η (eta) = learning rate (typically 0.1 to 1.0)
  • t = target (expected output)
  • y = predicted output
  • xi = input

Intuition:

  • If t = 1 and y = 0: error = 1, so we add η * xi to each weight. This makes z larger next time for this input pattern.
  • If t = 0 and y = 1: error = -1, so we subtract η * xi from each weight. This makes z smaller next time.
  • If t = y: error = 0, no change.

Convergence Theorem: The perceptron convergence theorem (Novikoff, 1962) proves that if the training data is linearly separable, the perceptron learning algorithm converges to a solution after a finite number of weight updates.
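The sketch below shows the theorem in action on a separable and a non-separable gate. It assumes zero initial weights (rather than random ones) so the run is repeatable, and it uses an epoch cap as the stopping rule for the non-separable case:

```python
def train(samples, learning_rate=0.1, max_epochs=1000):
    """Perceptron rule on 2-input samples.

    Returns ((w1, w2, b), epoch) on convergence, or ((w1, w2, b), None)
    if max_epochs passes without an error-free epoch.
    """
    w1 = w2 = b = 0.0  # assumption: zero start instead of random weights
    for epoch in range(1, max_epochs + 1):
        errors = 0
        for (x1, x2), target in samples:
            z = x1 * w1 + x2 * w2 + b
            prediction = 1 if z >= 0 else 0
            error = target - prediction
            if error != 0:
                w1 += learning_rate * error * x1
                w2 += learning_rate * error * x2
                b += learning_rate * error
                errors += 1
        if errors == 0:
            return (w1, w2, b), epoch
    return (w1, w2, b), None


OR_SAMPLES = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
XOR_SAMPLES = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

_, or_epoch = train(OR_SAMPLES)
_, xor_epoch = train(XOR_SAMPLES)
print("OR converged at epoch:", or_epoch)
print("XOR converged at epoch:", xor_epoch)  # None: the epoch cap was reached
```

The theorem guarantees the OR run stops on its own. The XOR run can only be stopped by the cap, which is why `max_epochs` matters in practice.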


Real World Outcome

You’ll run a script that starts with random weights (so its first predictions are arbitrary) and prints its “learning process” until it perfectly mimics a logic gate.

Example Output (OR Gate):

$ python manual_neuron.py --gate OR

========================================
        PERCEPTRON TRAINING: OR GATE
========================================

Truth Table for OR:
  [0, 0] -> 0
  [0, 1] -> 1
  [1, 0] -> 1
  [1, 1] -> 1

Initial Weights (random):
  w1 = 0.23
  w2 = -0.47
  b  = 0.15
  Learning Rate: 0.1

----------------------------------------
Epoch 1:
  Input=[0, 0] z=0.15 Predicted=1 Target=0 Error=-1
    -> UPDATING: w1=0.23->0.23, w2=-0.47->-0.47, b=0.15->0.05
  Input=[0, 1] z=-0.42 Predicted=0 Target=1 Error=1
    -> UPDATING: w1=0.23->0.23, w2=-0.47->-0.37, b=0.05->0.15
  Input=[1, 0] z=0.38 Predicted=1 Target=1 Error=0 (Correct!)
  Input=[1, 1] z=0.01 Predicted=1 Target=1 Error=0 (Correct!)
  Epoch 1 Errors: 2/4

Epoch 2:
  Input=[0, 0] z=0.15 Predicted=1 Target=0 Error=-1
    -> UPDATING: w1=0.23->0.23, w2=-0.37->-0.37, b=0.15->0.05
  Input=[0, 1] z=-0.32 Predicted=0 Target=1 Error=1
    -> UPDATING: w1=0.23->0.23, w2=-0.37->-0.27, b=0.05->0.15
  Input=[1, 0] z=0.38 Predicted=1 Target=1 Error=0 (Correct!)
  Input=[1, 1] z=0.11 Predicted=1 Target=1 Error=0 (Correct!)
  Epoch 2 Errors: 2/4

... (epochs 3-8 omitted) ...

Epoch 9:
  Input=[0, 0] z=-0.05 Predicted=0 Target=0 Error=0 (Correct!)
  Input=[0, 1] z=0.08 Predicted=1 Target=1 Error=0 (Correct!)
  Input=[1, 0] z=0.18 Predicted=1 Target=1 Error=0 (Correct!)
  Input=[1, 1] z=0.31 Predicted=1 Target=1 Error=0 (Correct!)
  Epoch 9 Errors: 0/4

========================================
           TRAINING COMPLETE!
========================================

Final Weights:
  w1 = 0.23
  w2 = 0.13
  b  = -0.05

Decision Boundary Equation:
  0.23*x1 + 0.13*x2 - 0.05 = 0

----------------------------------------
            TESTING MODEL
----------------------------------------

[0, 0] -> z=-0.05 -> step -> 0 (Expected: 0) ✓
[0, 1] -> z=0.08  -> step -> 1 (Expected: 1) ✓
[1, 0] -> z=0.18  -> step -> 1 (Expected: 1) ✓
[1, 1] -> z=0.31  -> step -> 1 (Expected: 1) ✓

ALL TESTS PASSED!
The perceptron has learned the OR function.

Example Output (AND Gate):

$ python manual_neuron.py --gate AND

========================================
        PERCEPTRON TRAINING: AND GATE
========================================

Truth Table for AND:
  [0, 0] -> 0
  [0, 1] -> 0
  [1, 0] -> 0
  [1, 1] -> 1

Initial Weights (random):
  w1 = -0.15
  w2 = 0.32
  b  = 0.05
  Learning Rate: 0.1

... (training epochs) ...

Epoch 5:
  Input=[0, 0] z=-0.15 Predicted=0 Target=0 Error=0 (Correct!)
  Input=[0, 1] z=0.07 Predicted=1 Target=0 Error=-1
    -> UPDATING...
  ...

Epoch 6: SOLVED!

Final Weights:
  w1 = 0.15
  w2 = 0.12
  b  = -0.25

Testing:
[0, 0] -> 0 (Expected: 0) ✓
[0, 1] -> 0 (Expected: 0) ✓
[1, 0] -> 0 (Expected: 0) ✓
[1, 1] -> 1 (Expected: 1) ✓

ALL TESTS PASSED!

Example Output (XOR - Expected Failure):

$ python manual_neuron.py --gate XOR

========================================
        PERCEPTRON TRAINING: XOR GATE
========================================

Truth Table for XOR:
  [0, 0] -> 0
  [0, 1] -> 1
  [1, 0] -> 1
  [1, 1] -> 0

Initial Weights (random):
  w1 = 0.12
  w2 = 0.45
  b  = -0.08

... (training) ...

Epoch 100: Errors: 1/4
Epoch 200: Errors: 2/4
Epoch 500: Errors: 1/4
Epoch 1000: Still not converged!

========================================
       TRAINING FAILED (as expected)
========================================

XOR is not linearly separable.
A single perceptron cannot learn XOR.
You need hidden layers (multi-layer perceptron).

This is the Minsky-Papert limitation!
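The runs above select the gate with a `--gate` flag. One way to parse it with the standard library is sketched below; the `--lr` and `--max-epochs` options and their defaults are hypothetical additions, not something the sample output requires:

```python
import argparse


def parse_args(argv=None):
    """Parse command-line options for the (hypothetical) manual_neuron.py script."""
    parser = argparse.ArgumentParser(
        description="Train a single perceptron on a logic gate."
    )
    parser.add_argument(
        "--gate",
        choices=["AND", "OR", "XOR"],
        default="OR",
        help="which truth table to learn",
    )
    # Assumed extras: not shown in the sample runs above.
    parser.add_argument("--lr", type=float, default=0.1, help="learning rate")
    parser.add_argument("--max-epochs", type=int, default=1000, help="epoch cap")
    return parser.parse_args(argv)


args = parse_args(["--gate", "XOR"])
print(args.gate, args.lr, args.max_epochs)
```

Passing `argv` explicitly keeps the function easy to try from a Python shell; calling `parse_args()` with no argument reads the real command line.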

Solution Architecture

High-Level Design Approach

This section describes what your solution should look like, not how to implement it.

Architecture Diagram:

┌─────────────────────────────────────────────────────────────────────────┐
│                         PERCEPTRON TRAINING SYSTEM                      │
├─────────────────────────────────────────────────────────────────────────┤
│                                                                         │
│   ┌─────────────────┐                                                   │
│   │  Training Data  │                                                   │
│   │  ┌───────────┐  │                                                   │
│   │  │ Inputs    │  │         ┌──────────────────────────────┐          │
│   │  │ [0,0]     │  │         │         PERCEPTRON           │          │
│   │  │ [0,1]     │──┼────────►│  ┌────┐    ┌────┐   ┌────┐   │          │
│   │  │ [1,0]     │  │         │  │ w1 │    │ w2 │   │ b  │   │          │
│   │  │ [1,1]     │  │         │  └──┬─┘    └──┬─┘   └──┬─┘   │          │
│   │  └───────────┘  │         │     │         │        │     │          │
│   │  ┌───────────┐  │         │     ▼         ▼        ▼     │          │
│   │  │ Targets   │  │         │    ┌──────────────────────┐  │          │
│   │  │ 0,1,1,1   │  │         │    │ z = x1*w1 + x2*w2 + b│  │          │
│   │  └───────────┘  │         │    └──────────┬───────────┘  │          │
│   └────────┬────────┘         │               │              │          │
│            │                  │               ▼              │          │
│            │                  │    ┌───────────────────┐     │          │
│            │                  │    │ y = step(z)       │     │          │
│            │                  │    │   1 if z >= 0     │     │          │
│            │                  │    │   0 if z < 0      │     │          │
│            │                  │    └────────┬──────────┘     │          │
│            │                  │             │                │          │
│            │                  └─────────────┼────────────────┘          │
│            │                                │                           │
│            │                                ▼                           │
│            │                    ┌────────────────────┐                  │
│            │                    │    Prediction y    │                  │
│            │                    └─────────┬──────────┘                  │
│            │                              │                             │
│            ▼                              ▼                             │
│   ┌─────────────────────────────────────────────────┐                   │
│   │           ERROR CALCULATION                     │                   │
│   │                                                 │                   │
│   │   error = target - prediction                   │                   │
│   │                                                 │                   │
│   └───────────────────────┬─────────────────────────┘                   │
│                           │                                             │
│                           │ if error != 0                               │
│                           ▼                                             │
│   ┌─────────────────────────────────────────────────┐                   │
│   │           WEIGHT UPDATE (Delta Rule)            │                   │
│   │                                                 │                   │
│   │   w1 = w1 + (learning_rate * error * x1)        │                   │
│   │   w2 = w2 + (learning_rate * error * x2)        │                   │
│   │   b  = b  + (learning_rate * error * 1)         │                   │
│   │                                                 │                   │
│   └─────────────────────────────────────────────────┘                   │
│                           │                                             │
│                           │ Loop until all predictions correct          │
│                           ▼                                             │
│   ┌─────────────────────────────────────────────────┐                   │
│   │           CONVERGENCE CHECK                     │                   │
│   │                                                 │                   │
│   │   If all 4 inputs predict correctly:            │                   │
│   │       STOP - Model is trained                   │                   │
│   │   Else:                                         │                   │
│   │       Continue to next epoch                    │                   │
│   │                                                 │                   │
│   └─────────────────────────────────────────────────┘                   │
│                                                                         │
└─────────────────────────────────────────────────────────────────────────┘

Data Structures Needed

┌───────────────────────────────────────────────────────────┐
│                    DATA STRUCTURES                        │
├───────────────────────────────────────────────────────────┤
│                                                           │
│  1. TRAINING DATA                                         │
│     ┌───────────┬───────────┐                             │
│     │  inputs   │  targets  │                             │
│     ├───────────┼───────────┤                             │
│     │  [0, 0]   │     0     │                             │
│     │  [0, 1]   │   0 or 1  │  <- depends on gate         │
│     │  [1, 0]   │   0 or 1  │                             │
│     │  [1, 1]   │   0 or 1  │                             │
│     └───────────┴───────────┘                             │
│                                                           │
│  2. MODEL PARAMETERS (floats, updated during training)    │
│     • w1: weight for input 1                              │
│     • w2: weight for input 2                              │
│     • b:  bias                                            │
│                                                           │
│  3. HYPERPARAMETERS (constants, set before training)      │
│     • learning_rate: typically 0.1 to 1.0                 │
│     • max_epochs: limit iterations (e.g., 1000)           │
│                                                           │
└───────────────────────────────────────────────────────────┘
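One possible way to lay these structures out in pure Python is sketched below. The names `INPUTS`, `TARGETS`, `training_data`, and `init_params`, and the initial weight range of -0.5 to 0.5, are assumptions chosen for illustration:

```python
import random

# 1. Training data: the four input pairs are shared; targets depend on the gate.
INPUTS = [(0, 0), (0, 1), (1, 0), (1, 1)]
TARGETS = {
    "AND": [0, 0, 0, 1],
    "OR": [0, 1, 1, 1],
    "XOR": [0, 1, 1, 0],
}


def training_data(gate):
    """Pair each input with its target for the chosen gate."""
    return list(zip(INPUTS, TARGETS[gate]))


# 2. Model parameters: three floats, randomly initialized.
def init_params(seed=None):
    rng = random.Random(seed)  # a seed makes runs repeatable
    return {name: rng.uniform(-0.5, 0.5) for name in ("w1", "w2", "b")}


# 3. Hyperparameters: fixed before training starts.
HYPERPARAMS = {"learning_rate": 0.1, "max_epochs": 1000}

print(training_data("OR"))
print(init_params(seed=42))
```

Plain tuples, lists, and dicts are enough here; nothing in the project needs a class unless you prefer one.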

Function Breakdown

┌───────────────────────────────────────────────────────────┐
│                     FUNCTION DESIGN                       │
├───────────────────────────────────────────────────────────┤
│                                                           │
│  step(z) -> int                                           │
│    Input:  z (weighted sum, float)                        │
│    Output: 0 or 1                                         │
│    Logic:  return 1 if z >= 0 else 0                      │
│                                                           │
│  ─────────────────────────────────────────────────────    │
│                                                           │
│  forward(x1, x2, w1, w2, b) -> (z, y)                     │
│    Input:  inputs x1, x2; weights w1, w2; bias b          │
│    Output: weighted sum z, prediction y                   │
│    Logic:  z = x1*w1 + x2*w2 + b                          │
│            y = step(z)                                    │
│                                                           │
│  ─────────────────────────────────────────────────────    │
│                                                           │
│  update_weights(w1, w2, b, x1, x2, error, lr) -> tuple    │
│    Input:  current weights, inputs, error, learning rate  │
│    Output: new (w1, w2, b)                                │
│    Logic:  Apply Delta Rule                               │
│                                                           │
│  ─────────────────────────────────────────────────────    │
│                                                           │
│  train(data, targets, lr, max_epochs) -> (w1, w2, b)      │
│    Input:  training data, targets, hyperparameters        │
│    Output: trained weights and bias                       │
│    Logic:  Loop epochs, update on errors, check converge  │
│                                                           │
│  ─────────────────────────────────────────────────────    │
│                                                           │
│  test(data, targets, w1, w2, b) -> bool                   │
│    Input:  test data, expected outputs, trained params    │
│    Output: True if all correct, False otherwise           │
│    Logic:  Run forward pass on each input, compare        │
│                                                           │
└───────────────────────────────────────────────────────────┘
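As a sketch of how the first two and the last of these signatures could fit together (the checking function is named `evaluate` here, an assumed rename of `test` so it does not collide with test runners; the weights are the OR example values from earlier in this guide):

```python
def step(z):
    return 1 if z >= 0 else 0


def forward(x1, x2, w1, w2, b):
    """Return (z, y): the weighted sum and the thresholded prediction."""
    z = x1 * w1 + x2 * w2 + b
    return z, step(z)


def evaluate(data, targets, w1, w2, b):
    """Run the forward pass on every input; True only if all predictions match."""
    all_correct = True
    for (x1, x2), target in zip(data, targets):
        z, y = forward(x1, x2, w1, w2, b)
        mark = "ok" if y == target else "WRONG"
        print(f"[{x1}, {x2}] -> z={z:.2f} -> step -> {y} (Expected: {target}) {mark}")
        if y != target:
            all_correct = False
    return all_correct


data = [(0, 0), (0, 1), (1, 0), (1, 1)]
print(evaluate(data, [0, 1, 1, 1], 1.5, 1.5, -1.0))  # OR targets
```

The same weights fail against the AND targets, which is a quick way to confirm the checker really compares against the table it is given.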

Data Flow Diagram

┌───────────────────────────────────────────────────────────────────────┐
│                         TRAINING FLOW                                 │
└───────────────────────────────────────────────────────────────────────┘

  Initialize           For each epoch          For each sample
  ┌────────┐           ┌────────────┐          ┌────────────────┐
  │ Random │           │  Reset     │          │ Get (x1, x2),  │
  │ w1,w2,b│──────────►│  epoch_err │─────────►│ target         │
  └────────┘           │  counter   │          └───────┬────────┘
                       └────────────┘                  │
                                                       ▼
                                            ┌────────────────────┐
                                            │ Forward Pass       │
                                            │ z = x1*w1+x2*w2+b  │
                                            │ y = step(z)        │
                                            └───────┬────────────┘
                                                    │
                                                    ▼
                                            ┌────────────────────┐
                                            │ Calculate Error    │
                                            │ err = target - y   │
                                            └───────┬────────────┘
                                                    │
                                          ┌─────────┴─────────┐
                                          │                   │
                                    err != 0?           err == 0
                                          │                   │
                                          ▼                   ▼
                                   ┌─────────────┐    ┌─────────────┐
                                   │ Update      │    │ No change   │
                                   │ weights     │    │ continue    │
                                   └──────┬──────┘    └──────┬──────┘
                                          │                  │
                                          └────────┬─────────┘
                                                   │
                                                   ▼
                                          ┌────────────────┐
                                          │ Next sample or │
                                          │ next epoch     │
                                          └────────┬───────┘
                                                   │
                                         ┌─────────┴─────────┐
                                         │                   │
                                   All correct?        Still errors
                                         │                   │
                                         ▼                   ▼
                                  ┌─────────────┐    ┌─────────────┐
                                  │ STOP        │    │ Continue    │
                                  │ Return      │    │ training    │
                                  │ weights     │    └─────────────┘
                                  └─────────────┘

Phased Implementation Guide

Phase 1: Forward Pass (1-2 hours)

Goal: Implement the core computation of a neuron.

  1. Write the step function:
    • Takes a single float z
    • Returns 1 if z >= 0, else 0
  2. Write the forward function:
    • Takes inputs x1, x2 and parameters w1, w2, b
    • Computes z = x1*w1 + x2*w2 + b
    • Returns both z and step(z), because the later snippets unpack z, prediction = forward(...)
  3. Test manually:
    # With w1=0.5, w2=0.5, b=-0.75
    # forward(0, 0, 0.5, 0.5, -0.75) should return (-0.75, 0)
    # forward(1, 1, 0.5, 0.5, -0.75) should return (0.25, 1)
    

Checkpoint: You should be able to manually set weights that make the forward function behave like AND or OR.
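
A minimal sketch of Phase 1, assuming forward returns both the raw sum z and the thresholded output (the test snippets later in this guide unpack two values). The AND_PARAMS and OR_PARAMS names and their values are hand-picked illustrations; many other weight settings work equally well.

```python
def step(z):
    """Threshold activation: output 1 when z >= 0, otherwise 0."""
    return 1 if z >= 0 else 0


def forward(x1, x2, w1, w2, b):
    """Weighted sum of the inputs plus bias, followed by the step activation."""
    z = x1 * w1 + x2 * w2 + b
    return z, step(z)


# Hand-picked parameters (illustrative assumptions, not the only valid choice).
AND_PARAMS = (0.5, 0.5, -0.75)  # fires only when both inputs are 1
OR_PARAMS = (0.5, 0.5, -0.25)   # fires when at least one input is 1

for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    _, and_out = forward(x1, x2, *AND_PARAMS)
    _, or_out = forward(x1, x2, *OR_PARAMS)
    print(f"({x1},{x2}) AND={and_out} OR={or_out}")
```

Only the bias differs between the two gates: moving it from -0.75 to -0.25 slides the same line far enough that a single active input crosses the threshold.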

Phase 2: Error Calculation (30 minutes)

Goal: Compute how wrong the prediction is.

  1. Write an error function or just compute inline:
    • error = target - prediction
  2. Verify all cases:
    target=0, pred=0 -> error=0  (correct, no update)
    target=1, pred=1 -> error=0  (correct, no update)
    target=1, pred=0 -> error=1  (increase output)
    target=0, pred=1 -> error=-1 (decrease output)
    

Checkpoint: Given a prediction and target, you should know whether and how to update.

Phase 3: Weight Updates (1 hour)

Goal: Implement the Delta Rule.

  1. Write the update_weights function:
    def update_weights(w1, w2, b, x1, x2, error, learning_rate):
        w1_new = w1 + learning_rate * error * x1
        w2_new = w2 + learning_rate * error * x2
        b_new = b + learning_rate * error * 1  # bias input is always 1
        return w1_new, w2_new, b_new
    
  2. Test the update logic:
    • If error=1, x1=1, lr=0.1: w1 should increase by 0.1
    • If error=-1, x1=1, lr=0.1: w1 should decrease by 0.1
    • If x1=0: w1 should not change (0 * anything = 0)

Checkpoint: Weights change in the right direction based on error.
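
To see why this update always moves the output in the right direction, substitute the new parameters back into the weighted sum. The symbols below are shorthand introduced for this derivation only: η stands for learning_rate, e for error, and z' for the weighted sum recomputed on the same sample after the update.

```latex
\begin{aligned}
z' &= x_1 (w_1 + \eta e x_1) + x_2 (w_2 + \eta e x_2) + (b + \eta e) \\
   &= z + \eta \, e \, (x_1^2 + x_2^2 + 1)
\end{aligned}
```

Since the factor in parentheses is at least 1 and η is positive, z' - z has the same sign as e: a positive error pushes z up toward firing, a negative error pushes it down. One update may not be enough to flip the output, which is why the loop in Phase 4 repeats.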

Phase 4: Training Loop (1-2 hours)

Goal: Repeat forward → error → update until convergence.

  1. Define training data for AND, OR, NAND, NOR gates
  2. Initialize weights randomly (small values, e.g., -1 to 1)
  3. Implement the epoch loop:
    for epoch in range(max_epochs):
        errors_this_epoch = 0
        for (x1, x2), target in zip(inputs, targets):
            z, prediction = forward(...)
            error = target - prediction
            if error != 0:
                # update_weights returns new values; assign them or nothing is learned
                w1, w2, b = update_weights(...)
                errors_this_epoch += 1
        if errors_this_epoch == 0:
            print("Converged!")
            break
    
  4. Add verbose logging to see learning progress

Checkpoint: Running the training loop on OR should converge within ~100 epochs.
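
One way the pieces could fit together is sketched below. The train signature follows the integration test later in this guide; the seed parameter is an added convenience for repeatable runs and is not part of the original design.

```python
import random


def step(z):
    return 1 if z >= 0 else 0


def forward(x1, x2, w1, w2, b):
    z = x1 * w1 + x2 * w2 + b
    return z, step(z)


def train(inputs, targets, learning_rate=0.1, max_epochs=1000, seed=None):
    """Perceptron training loop; returns (w1, w2, b) after a clean pass or max_epochs."""
    rng = random.Random(seed)  # seed is an assumption added for reproducibility
    w1, w2, b = (rng.uniform(-1, 1) for _ in range(3))
    for _ in range(max_epochs):
        errors_this_epoch = 0
        for (x1, x2), target in zip(inputs, targets):
            _, prediction = forward(x1, x2, w1, w2, b)
            error = target - prediction
            if error != 0:
                # Delta Rule, written inline so the new values are kept.
                w1 += learning_rate * error * x1
                w2 += learning_rate * error * x2
                b += learning_rate * error
                errors_this_epoch += 1
        if errors_this_epoch == 0:
            break  # a full pass with no mistakes: converged
    return w1, w2, b


INPUTS = [(0, 0), (0, 1), (1, 0), (1, 1)]
w1, w2, b = train(INPUTS, [0, 1, 1, 1], seed=0)  # OR gate
print([forward(x1, x2, w1, w2, b)[1] for x1, x2 in INPUTS])
```

Different seeds give different final weights; any of them is acceptable as long as the four predictions match the truth table.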

Phase 5: Testing and Validation (1 hour)

Goal: Verify the trained perceptron works correctly.

  1. After training, run all 4 inputs through forward pass
  2. Compare to expected truth table
  3. Print pass/fail for each

Checkpoint: All 4 tests pass for AND, OR, NAND, NOR. XOR should fail to converge.


Questions to Guide Your Design

Before implementing, think through these:

Understanding the Algorithm

  1. Why random initialization?
    • What happens if you start with all zeros?
    • Why not start with “good” weights?
  2. What does the learning rate control?
    • What happens if learning_rate = 0?
    • What happens if learning_rate = 100?
    • Why is 0.1 a common choice?
  3. Why iterate through all samples before checking convergence?
    • Could you check after each sample?
    • What’s the difference between “epoch” and “iteration”?

Understanding the Math

  1. Why multiply error by input in the update rule?
    • What happens to w1 when x1=0?
    • Why is this mathematically correct?
  2. How does the bias differ from weights?
    • What does the bias “shift”?
    • Why don’t we multiply bias update by an input?
  3. What does the decision boundary look like geometrically?
    • Draw the boundary for a trained AND gate
    • How do the weights define its slope?

Understanding the Limits

  1. Why can’t a perceptron learn XOR?
    • Draw the 4 XOR points and try to separate them with a line
    • What would you need to separate them?
  2. What’s the minimum number of weights to learn a 2-input gate?
    • Could you do it with just w1 and w2 (no bias)?
    • When is bias essential?

Thinking Exercise

Before coding, trace this by hand:

Starting with:

  • w1 = 0.5
  • w2 = 0.5
  • b = -0.75
  • learning_rate = 0.1
  • Training for AND gate: (0,0)→0, (0,1)→0, (1,0)→0, (1,1)→1

Epoch 1 Trace:

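Input | z = x1*w1 + x2*w2 + b        | y = step(z) | Target | Error | New w1 | New w2 | New b
------+------------------------------+-------------+--------+-------+--------+--------+------
(0,0) | 0*0.5 + 0*0.5 - 0.75 = -0.75 | 0           | 0      | 0     | 0.5    | 0.5    | -0.75
(0,1) | 0*0.5 + 1*0.5 - 0.75 = -0.25 | 0           | 0      | 0     | 0.5    | 0.5    | -0.75
(1,0) | 1*0.5 + 0*0.5 - 0.75 = -0.25 | 0           | 0      | 0     | 0.5    | 0.5    | -0.75
(1,1) | 1*0.5 + 1*0.5 - 0.75 = 0.25  | 1           | 1      | 0     | 0.5    | 0.5    | -0.75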

Result: All correct on epoch 1! The initial weights happened to be good.

Now try with different starting weights:

  • w1 = -0.2
  • w2 = 0.3
  • b = 0.1

Trace Epoch 1:

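Input | z                            | y | Target | Error | Update | New w1 | New w2 | New b
------+------------------------------+---+--------+-------+--------+--------+--------+------
(0,0) | 0*(-0.2) + 0*0.3 + 0.1 = 0.1 | 1 | 0      | -1    | Yes    | ?      | ?      | ?
…     |                              |   |        |       |        |        |        |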

Your task: Complete this trace for all 4 inputs of epoch 1. Then continue to epoch 2.

Questions while tracing:

  • Which weight changed the most after the first error?
  • Why didn’t w1 change when processing (0,0)?
  • How many epochs until all 4 are correct?
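
Once you have finished the trace on paper, a small helper like the one below can check it. This is a sketch: trace_epoch and AND_SAMPLES are names invented here, and the function records the parameters as they stand after each sample, matching the "New w1 / New w2 / New b" columns above.

```python
def step(z):
    return 1 if z >= 0 else 0


def trace_epoch(w1, w2, b, samples, learning_rate=0.1):
    """Run one epoch; return one record per sample plus the final parameters."""
    rows = []
    for (x1, x2), target in samples:
        z = x1 * w1 + x2 * w2 + b
        y = step(z)
        error = target - y
        if error != 0:
            w1 += learning_rate * error * x1
            w2 += learning_rate * error * x2
            b += learning_rate * error
        rows.append(((x1, x2), z, y, target, error, w1, w2, b))
    return rows, (w1, w2, b)


AND_SAMPLES = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]

rows, params = trace_epoch(-0.2, 0.3, 0.1, AND_SAMPLES)
for inp, z, y, target, error, w1, w2, b in rows:
    # Two-decimal formatting hides float noise such as 0.19999999999999998.
    print(f"{inp} z={z:+.2f} y={y} target={target} error={error:+d} "
          f"w1={w1:+.2f} w2={w2:+.2f} b={b:+.2f}")
```

To continue into epoch 2, call trace_epoch again with the returned params in place of the starting weights.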

Testing Strategy

Unit Tests for Each Function

# Test step function
assert step(-1) == 0
assert step(0) == 1  # boundary case: z >= 0
assert step(0.001) == 1
assert step(-0.001) == 0

# Test forward pass
z, y = forward(0, 0, 1, 1, -1.5)  # Mimics AND
assert y == 0
z, y = forward(1, 1, 1, 1, -1.5)
assert y == 1

# Test update rule
w1, w2, b = 0.5, 0.5, 0
w1, w2, b = update_weights(w1, w2, b, 1, 0, 1, 0.1)
assert w1 == 0.6  # increased because x1=1, error=1
assert w2 == 0.5  # unchanged because x2=0
assert b == 0.1   # increased because error=1

Integration Test: Train and Verify

# Train on OR gate
inputs = [(0,0), (0,1), (1,0), (1,1)]
targets = [0, 1, 1, 1]
w1, w2, b = train(inputs, targets, learning_rate=0.1, max_epochs=1000)

# Verify all predictions
for (x1, x2), target in zip(inputs, targets):
    _, prediction = forward(x1, x2, w1, w2, b)
    assert prediction == target, f"Failed on {(x1, x2)}"

Convergence Test

# AND, OR, NAND, NOR should all converge
for gate_name, gate_targets in [("AND", [0,0,0,1]), ("OR", [0,1,1,1]), ...]:
    w1, w2, b, epochs = train_with_count(inputs, gate_targets, ...)
    assert epochs < 1000, f"{gate_name} didn't converge"

# XOR should NOT converge
w1, w2, b, epochs = train_with_count(inputs, [0,1,1,0], max_epochs=1000)
assert epochs == 1000, "XOR unexpectedly converged!"
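
The convergence test relies on a train_with_count helper that this guide never defines. The sketch below is one possible version, written so that the two assertions above make sense: it returns the number of epochs that still contained a mistake, which equals max_epochs only when no clean pass was ever reached. The return convention and the seed parameter are assumptions.

```python
import random


def train_with_count(inputs, targets, learning_rate=0.1, max_epochs=1000, seed=None):
    """Train a 2-input perceptron; return (w1, w2, b, epochs).

    epochs counts the passes that still contained a mistake, so it equals
    max_epochs exactly when training never reached a clean pass.
    """
    rng = random.Random(seed)
    w1, w2, b = (rng.uniform(-1, 1) for _ in range(3))
    for epoch in range(max_epochs):
        mistakes = 0
        for (x1, x2), target in zip(inputs, targets):
            prediction = 1 if x1 * w1 + x2 * w2 + b >= 0 else 0
            error = target - prediction
            if error != 0:
                w1 += learning_rate * error * x1
                w2 += learning_rate * error * x2
                b += learning_rate * error
                mistakes += 1
        if mistakes == 0:
            return w1, w2, b, epoch  # `epoch` earlier passes had mistakes
    return w1, w2, b, max_epochs


INPUTS = [(0, 0), (0, 1), (1, 0), (1, 1)]
GATES = {"AND": [0, 0, 0, 1], "OR": [0, 1, 1, 1],
         "NAND": [1, 1, 1, 0], "NOR": [1, 0, 0, 0]}
for name, gate_targets in GATES.items():
    *_, epochs = train_with_count(INPUTS, gate_targets, seed=1)
    print(f"{name}: clean pass after {epochs} epoch(s) with mistakes")
```

Counting this way avoids an ambiguity: if the function returned the index of the last epoch run, a gate that converged on the final allowed epoch would be indistinguishable from one that never converged.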

Common Pitfalls and Debugging Tips

Pitfall 1: Off-by-One in Step Function

Symptom: Inconsistent results at z=0
Cause: Using > instead of >= or vice versa
Fix: Decide on convention (usually z >= 0 → 1) and stick to it

Pitfall 2: Forgetting to Update Bias

Symptom: Model doesn’t converge or converges slowly
Cause: Only updating w1 and w2, not b
Fix: Remember: b = b + lr * error * 1

Pitfall 3: Wrong Sign in Update Rule

Symptom: Error gets worse instead of better
Cause: Using prediction - target instead of target - prediction
Fix: Error should be positive when prediction is too low

Pitfall 4: Not Iterating Until Convergence

Symptom: Model seems random
Cause: Only running one epoch
Fix: Loop until zero errors in an epoch (or max epochs)

Pitfall 5: Learning Rate Too High

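Symptom: Weights swing by large amounts on every update
Cause: learning_rate far above 1. On linearly separable gates the perceptron rule still converges for any positive rate, but the swings make logs hard to read, and the same habit causes genuine divergence once you switch to gradient descent
Fix: Use lr in range 0.01 to 1.0 (start with 0.1)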

Pitfall 6: Learning Rate Too Low

Symptom: Takes thousands of epochs to converge
Cause: learning_rate too small (e.g., 0.001)
Fix: For simple logic gates, 0.1 to 1.0 works well

Debugging Technique: Print Everything

When stuck, print at each step:

print(f"Input: ({x1}, {x2})")
print(f"Weights before: w1={w1:.3f}, w2={w2:.3f}, b={b:.3f}")
print(f"z = {x1}*{w1} + {x2}*{w2} + {b} = {z:.3f}")
print(f"y = step({z:.3f}) = {y}")
print(f"Target: {target}, Error: {error}")
if error != 0:
    print(f"Updating: w1 += {lr}*{error}*{x1} = {lr*error*x1:.3f}")

The Interview Questions They’ll Ask

Prepare to answer these:

1. “Explain how a perceptron learns. Walk me through one update step.”

Key points to cover:

  • Forward pass: weighted sum + step function
  • Error calculation: target - prediction
  • Weight update: Delta Rule (w += lr * error * input)
  • Why inputs of 0 don’t change their weights

2. “What is the decision boundary of a perceptron?”

Key insight:

  • It’s a hyperplane (line in 2D) defined by w1*x1 + w2*x2 + b = 0
  • Weights define the orientation (slope)
  • Bias shifts the line
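
The link between parameters and geometry can be made concrete by solving the boundary equation for x2. The sketch below uses a helper name, boundary_line, that is invented for this example.

```python
def boundary_line(w1, w2, b):
    """Rewrite w1*x1 + w2*x2 + b = 0 as x2 = slope * x1 + intercept.

    Returns (slope, intercept), or None when w2 == 0, because the
    slope-intercept form does not exist for a vertical (or missing) line.
    """
    if w2 == 0:
        return None
    return -w1 / w2, -b / w2


slope, intercept = boundary_line(0.5, 0.5, -0.75)
print(f"x2 = {slope:.2f} * x1 + {intercept:.2f}")  # prints: x2 = -1.00 * x1 + 1.50
```

The slope depends only on the ratio of the weights, while the bias appears only in the intercept, which is the precise sense in which weights set the orientation and bias shifts the line.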

3. “Why can’t a single perceptron learn XOR?”

Key insight:

  • XOR is not linearly separable
  • Positive examples are on opposite corners
  • No single line can separate them
  • Need hidden layers (MLP) to create non-linear boundaries
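
An empirical illustration (not a proof) is to search a grid of candidate parameters and count how many reproduce each gate. The grid range and step size below are arbitrary choices, and classifies and count_solutions are names invented for this sketch.

```python
from itertools import product


def classifies(w1, w2, b, targets):
    """True if the step-activated neuron reproduces every row of the truth table."""
    inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]
    return all((1 if x1 * w1 + x2 * w2 + b >= 0 else 0) == t
               for (x1, x2), t in zip(inputs, targets))


def count_solutions(targets, grid):
    """Count the (w1, w2, b) triples on the grid that solve the gate."""
    return sum(classifies(w1, w2, b, targets)
               for w1, w2, b in product(grid, repeat=3))


GRID = [i / 4 for i in range(-8, 9)]  # -2.0 to 2.0 in steps of 0.25
print("AND solutions:", count_solutions([0, 0, 0, 1], GRID))
print("XOR solutions:", count_solutions([0, 1, 1, 0], GRID))
```

The XOR count is zero on any grid, and the reason is a short contradiction: XOR requires b < 0, w2 + b >= 0, w1 + b >= 0 and w1 + w2 + b < 0. Adding the two middle inequalities gives w1 + w2 + 2b >= 0, so w1 + w2 + b >= -b > 0, which contradicts the last requirement.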

4. “What’s the difference between a perceptron and a modern neural network neuron?”

Key insight:

  • Perceptron: step function (non-differentiable)
  • Modern: sigmoid/ReLU (differentiable for gradient descent)
  • Perceptron: single layer
  • Modern: multiple layers with backpropagation
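
The differentiability point can be checked numerically. The sketch below compares the slope of the step function with the slope of a sigmoid at a few sample points; numeric_slope is a helper invented for this example, and z = 0 is avoided because the step function jumps there.

```python
import math


def step(z):
    return 1 if z >= 0 else 0


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def sigmoid_slope(z):
    """Derivative of the sigmoid: s * (1 - s)."""
    s = sigmoid(z)
    return s * (1.0 - s)


def numeric_slope(f, z, h=1e-6):
    """Central-difference estimate of df/dz."""
    return (f(z + h) - f(z - h)) / (2 * h)


for z in (-2.0, 0.5, 2.0):
    print(f"z={z:+.1f} step slope={numeric_slope(step, z):.3f} "
          f"sigmoid slope={sigmoid_slope(z):.3f}")
```

A slope of zero gives gradient descent no signal about which way to move the weights, which is why the perceptron needs its own update rule and multi-layer networks switch to smooth activations.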

5. “What is the role of the bias term?”

Key insight:

  • Bias shifts the decision boundary away from the origin
  • Without bias, the hyperplane must pass through origin
  • Example: an AND gate needs a negative bias so the neuron fires only when both inputs are high

6. “How does the learning rate affect training?”

Key insight:

  • Too high: overshoots, oscillates, may not converge
  • Too low: converges slowly, may get stuck
  • Just right: smooth convergence to solution

7. “What guarantees that a perceptron will converge?”

Key insight:

  • The Perceptron Convergence Theorem (Novikoff, 1962)
  • IF data is linearly separable
  • THEN algorithm will converge in finite steps
  • If not separable, it will loop forever (hence XOR failure)
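
For reference, the bound usually quoted with the theorem is given below. It is stated under the standard textbook assumptions, none of which appear in this guide: labels are recoded as y_i in {-1, +1}, each input vector x_i includes the constant 1 for the bias and satisfies ||x_i|| <= R, weights start at zero, and a unit-length separator w* exists with margin γ.

```latex
y_i \,(\mathbf{w}^{*} \cdot \mathbf{x}_i) \ge \gamma \quad \text{for all } i
\qquad \Longrightarrow \qquad
\text{number of updates} \;\le\; \left(\frac{R}{\gamma}\right)^{2}
```

The bound counts weight updates, not epochs, and it depends on the geometry of the data (how wide the gap between the classes is) rather than on the learning rate.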

Hints in Layers

Use these hints only when stuck. Try for at least 15 minutes before reading each hint.

Hint 1: Structure

Your main file should have:

  1. A function for the step activation
  2. A function for forward pass
  3. A function for weight updates
  4. A training loop that calls these
  5. A test function that verifies correctness

Hint 2: Initialization

Random initialization should be small values:

import random
w1 = random.uniform(-1, 1)
w2 = random.uniform(-1, 1)
b = random.uniform(-1, 1)

Hint 3: Training Data

Define your gates in a dictionary:

GATES = {
    'AND': [0, 0, 0, 1],
    'OR':  [0, 1, 1, 1],
    'NAND': [1, 1, 1, 0],
    'NOR': [1, 0, 0, 0],
    'XOR': [0, 1, 1, 0],  # Will not converge!
}
INPUTS = [(0, 0), (0, 1), (1, 0), (1, 1)]

Hint 4: The Training Loop Pattern

for epoch in range(max_epochs):
    total_error = 0
    for (x1, x2), target in zip(inputs, targets):
        # forward pass
        # calculate error
        # if error != 0: update weights
        # accumulate error count
    if total_error == 0:
        break  # Converged!

Hint 5: Edge Case - No Error

When prediction equals target, error is 0. The update equation:

w = w + lr * 0 * x = w + 0 = w

Weights don’t change when you’re already correct. This is important!


Extensions and Challenges

After completing the basic perceptron, try these:

Extension 1: 3-Input Gates

Implement AND3, OR3, MAJORITY (output 1 if 2+ inputs are 1).

  • Now you have z = x1*w1 + x2*w2 + x3*w3 + b
  • Visualize in 3D (the decision boundary is a plane!)
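
One way to generalize beyond two inputs is to hold the weights in a list. The sketch below trains on MAJORITY; predict and train_n are names invented here, and the fixed default seed is an assumption made for repeatability.

```python
import random
from itertools import product


def predict(weights, bias, xs):
    """Step-activated weighted sum for any number of inputs."""
    z = sum(w * x for w, x in zip(weights, xs)) + bias
    return 1 if z >= 0 else 0


def train_n(samples, n_inputs, learning_rate=0.1, max_epochs=1000, seed=0):
    """Same Delta Rule as the 2-input version, with the weights held in a list."""
    rng = random.Random(seed)
    weights = [rng.uniform(-1, 1) for _ in range(n_inputs)]
    bias = rng.uniform(-1, 1)
    for _ in range(max_epochs):
        mistakes = 0
        for xs, target in samples:
            error = target - predict(weights, bias, xs)
            if error != 0:
                weights = [w + learning_rate * error * x
                           for w, x in zip(weights, xs)]
                bias += learning_rate * error
                mistakes += 1
        if mistakes == 0:
            break
    return weights, bias


# MAJORITY: output 1 when at least two of the three inputs are 1.
MAJORITY = [(xs, 1 if sum(xs) >= 2 else 0) for xs in product([0, 1], repeat=3)]
weights, bias = train_n(MAJORITY, n_inputs=3)
print(all(predict(weights, bias, xs) == t for xs, t in MAJORITY))
```

MAJORITY is linearly separable (equal weights with a bias between -2 and -1 times the weight would do), so the same training rule is enough.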

Extension 2: NAND as Universal Gate

NAND is a universal gate - you can build any other gate from NANDs.

  • Train a NAND perceptron
  • Show how to compose them (manually) to make AND, OR, NOT
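
The composition step might look like the sketch below. To keep it self-contained, the NAND neuron uses hand-set weights (-1, -1, 1.5) as an assumption; weights from your own trained NAND perceptron would slot in the same way.

```python
def nand(a, b):
    """A NAND perceptron with hand-set weights (illustrative values)."""
    z = a * -1.0 + b * -1.0 + 1.5
    return 1 if z >= 0 else 0


def not_(a):
    # NAND of a signal with itself inverts it.
    return nand(a, a)


def and_(a, b):
    # AND is NAND followed by NOT.
    return not_(nand(a, b))


def or_(a, b):
    # De Morgan: a OR b == NOT(NOT a AND NOT b).
    return nand(not_(a), not_(b))


for a in (0, 1):
    for b in (0, 1):
        print(f"a={a} b={b} NAND={nand(a, b)} AND={and_(a, b)} OR={or_(a, b)}")
print(f"NOT 0={not_(0)} NOT 1={not_(1)}")
```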

Extension 3: Visualization

Plot the decision boundary as training progresses:

  • Use matplotlib to show the 2D input space
  • Draw the line w1*x1 + w2*x2 + b = 0
  • Update the plot each epoch to see the line move

Extension 4: Multi-class (One-vs-All)

Instead of binary output, classify into 4 categories:

  • Train 4 perceptrons, one for each class
  • Output the class with highest weighted sum (before step)

Extension 5: Implement in C or Rust

Rewrite the perceptron in a low-level language:

  • No garbage collection, manual memory
  • Appreciate how simple the actual computation is
  • Time the training - it should be microseconds

Extension 6: Two-Layer Perceptron

Build a simple 2-layer network to solve XOR:

  • Hidden layer with 2 neurons
  • Output layer with 1 neuron
  • You’ll need to implement backpropagation (preview of Project 5)
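
Before tackling the learning part, it can help to confirm that a 2-2-1 network is able to represent XOR at all. The sketch below wires one by hand: the weights are picked manually rather than learned, and the neuron and xor_network names are invented for this example.

```python
def neuron(x1, x2, w1, w2, b):
    """One step-activated neuron."""
    return 1 if x1 * w1 + x2 * w2 + b >= 0 else 0


def xor_network(x1, x2):
    """2-2-1 network with hand-picked weights (not learned)."""
    h1 = neuron(x1, x2, 1.0, 1.0, -0.5)    # hidden unit 1 acts as OR
    h2 = neuron(x1, x2, -1.0, -1.0, 1.5)   # hidden unit 2 acts as NAND
    return neuron(h1, h2, 1.0, 1.0, -1.5)  # output unit acts as AND


for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(f"({x1}, {x2}) -> {xor_network(x1, x2)}")
```

Each hidden unit draws one line, and the output unit keeps only the region between the two lines. Finding such weights automatically is the hard part: the perceptron rule gives no error signal for the hidden units, which is the gap backpropagation fills.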

Real-World Connections

Where Perceptrons Appear Today

  1. Spam Filters (Early Versions)
    • Before deep learning, spam filters used linear classifiers
    • Features: word counts, sender reputation
    • Perceptron-style updates on misclassifications
  2. Credit Scoring (Logistic Regression)
    • Banks use linear models for interpretability
    • Similar to perceptron but with sigmoid activation
    • Weights show which factors matter (income, debt ratio)
  3. Sentiment Analysis (Baseline)
    • Count positive/negative words → weighted sum → decision
    • Perceptron is the simplest baseline to beat
  4. Medical Triage
    • Simple rule-based systems are essentially perceptrons
    • “If blood pressure > X AND temperature > Y, alert doctor”

Why This Foundation Matters

Understanding the perceptron is essential because:

  1. Every deep learning layer IS a perceptron (plus non-linearity)
    • A dense layer: each output neuron is z = w1*x1 + w2*x2 + … + b
    • You just learned the atom of neural networks
  2. Debugging deep networks requires this intuition
    • When gradients vanish, the activation is usually the culprit: the step function you used here has zero slope almost everywhere, which is why modern networks use smooth activations
    • When weights explode, the learning rate is the first thing to check
  3. Interpretable AI often means simpler models
    • Regulators want to know WHY a loan was denied
    • Perceptrons are explainable: โ€œthese factors with these weightsโ€
  4. Edge/embedded AI needs efficient models
    • IoT devices can’t run transformers
    • Simple perceptron-style models fit in kilobytes

Books That Will Help

Topic                    | Book                                                           | Chapter/Section
-------------------------+----------------------------------------------------------------+------------------------------------------------------------
Perceptron fundamentals  | Grokking Deep Learning by Andrew Trask                         | Ch. 3: “Introduction to Neural Prediction”
Mathematical foundations | Neural Networks and Deep Learning by Michael Nielsen           | Ch. 1: “Using neural nets to recognize handwritten digits”
The Perceptron algorithm | Grokking Deep Learning by Andrew Trask                         | Ch. 4: “Introduction to Neural Learning”
Linear separability      | Pattern Recognition and Machine Learning by Christopher Bishop | Ch. 4: “Linear Models for Classification”
History and context      | Perceptrons by Minsky & Papert                                 | Introduction and Ch. 1-3 (historical document)
Optimization theory      | Deep Learning by Goodfellow, Bengio, Courville                 | Ch. 4.3: “Gradient-Based Optimization”
Python implementation    | Data Science from Scratch by Joel Grus                         | Ch. 18: “Neural Networks”

Online Resources

  • 3Blue1Brown: “But what is a neural network?” (YouTube) - Excellent visualization
  • Andrej Karpathy: “Neural Networks: Zero to Hero” - Modern perspective
  • Michael Nielsen: neuralnetworksanddeeplearning.com - Free online book

Self-Assessment Checklist

Before moving to Project 2, verify you can:

Implementation Skills

  • Write the step function without looking at notes
  • Implement forward pass from scratch
  • Apply the Delta Rule correctly
  • Train to convergence on AND, OR, NAND, NOR
  • Explain why XOR doesn’t converge

Conceptual Understanding

  • Draw the decision boundary for a trained perceptron
  • Explain what each weight controls geometrically
  • Describe what the bias shifts
  • Define linear separability with an example

Mathematical Foundations

  • Derive the Delta Rule update from error minimization intuition
  • Calculate z by hand for given weights and inputs
  • Predict whether a point is above or below the decision boundary

Conceptual Questions (Answer Without Looking)

  1. What’s the output of step(-0.001)?
  2. If error=1 and x1=0, how much does w1 change?
  3. Why doesn’t XOR work with a single perceptron?
  4. What happens if learning_rate = 0?
  5. How many parameters does a 2-input perceptron have?
  6. What’s the role of bias in the decision boundary?
  7. Can a perceptron with 3 inputs learn the MAJORITY function?

Code Challenges (Try Without Hints)

  1. Modify your code to work with 3 inputs
  2. Add a function that plots the decision boundary
  3. Count how many epochs each gate needs on average (run 100 trials)
  4. Find the minimum learning rate that still converges in < 1000 epochs

What’s Next

You’ve built the atom of neural networks. But real learning happens when atoms combine into molecules.

Project 2: Gradient Descent Visualizer will show you:

  • How optimization works in continuous (not binary) spaces
  • Why we need derivatives
  • What a “loss landscape” looks like
  • How learning rate affects convergence

The perceptron used a simple error and discrete step function. Modern networks use continuous loss functions and smooth activations - that’s where calculus enters the picture.


Next: P02: Gradient Descent Visualizer - See optimization in action


Appendix: Logic Gate Truth Tables

For reference:

AND Gate:            OR Gate:             NAND Gate:           NOR Gate:
x1 x2 | y            x1 x2 | y            x1 x2 | y            x1 x2 | y
------+--            ------+--            ------+--            ------+--
0  0  | 0            0  0  | 0            0  0  | 1            0  0  | 1
0  1  | 0            0  1  | 1            0  1  | 1            0  1  | 0
1  0  | 0            1  0  | 1            1  0  | 1            1  0  | 0
1  1  | 1            1  1  | 1            1  1  | 0            1  1  | 0

XOR Gate (NOT linearly separable):
x1 x2 | y
------+--
0  0  | 0
0  1  | 1
1  0  | 1
1  1  | 0

XNOR Gate (NOT linearly separable):
x1 x2 | y
------+--
0  0  | 1
0  1  | 0
1  0  | 0
1  1  | 1

This project is part of the “AI Prediction & Neural Networks: From Math to Machine” learning path.