← Back to all projects

LEARN JUPYTER NOTEBOOKS DEEP DIVE

Learn Jupyter Notebooks: From Zero to Interactive Computing Master

Goal: Deeply understand Jupyter Notebooks—from basic usage and why they exist, to building interactive data science workflows, visualization dashboards, reproducible research documents, and understanding the underlying kernel architecture.


Why Learn Jupyter Notebooks?

Jupyter Notebooks represent a paradigm shift in how we write, test, and share code. Unlike traditional code files (.py, .js, .c), Jupyter Notebooks are interactive documents that blend code, output, visualizations, and narrative text in a single shareable artifact.

The Problem with Pure Code Files

Traditional programming workflow:

Write code → Run entire file → See output → Modify code → Run again

This creates friction for:

  • Exploration: You want to test one idea quickly, but must run the whole file
  • Visualization: Plots appear in separate windows, not alongside your code
  • Documentation: Comments are separate from rendered explanations
  • Sharing: Colleagues see code, but not the results unless they run it themselves
  • Teaching: Students can’t see the thought process, only the final result

The Notebook Solution

Jupyter Notebooks workflow:

Write cell → Run cell → See output immediately → Continue experimenting → Share document with code AND results

Why People Choose Notebooks Over Pure Code

Aspect           Pure Code Files            Jupyter Notebooks
Execution        Run entire file            Run cells individually
Feedback         Output in terminal         Output inline with code
Visualization    Separate windows           Embedded in document
Documentation    Comments only              Markdown, LaTeX, images
Sharing          Code only                  Code + outputs + narrative
Exploration      Restart for each change    Iterate on state
Reproducibility  Re-run everything          Clear + Run All

Who Uses Jupyter Notebooks?

  • Data Scientists: The large majority use notebooks for exploratory data analysis
  • Machine Learning Engineers: Model prototyping and experimentation
  • Scientists/Researchers: Reproducible research documents
  • Educators: Interactive teaching materials
  • Analysts: Reports that combine code, data, and insights
  • Engineers: API testing, prototyping, documentation

After completing these projects, you will:

  • Understand why notebooks exist and when to use them
  • Master interactive data exploration and visualization
  • Build reproducible research documents
  • Create interactive dashboards and widgets
  • Understand the kernel architecture (how code actually runs)
  • Know when notebooks are NOT the right tool
  • Share your work professionally

Core Concept Analysis

The Notebook Architecture

┌─────────────────────────────────────────────────────────────────────────┐
│                         BROWSER / JUPYTER LAB                            │
│                                                                          │
│  ┌──────────────────────────────────────────────────────────────────┐   │
│  │                      Notebook Interface                           │   │
│  │                                                                   │   │
│  │  ┌─────────────────────────────────────────────────────────────┐ │   │
│  │  │ [Markdown Cell]                                             │ │   │
│  │  │ # My Analysis                                               │ │   │
│  │  │ This notebook explores...                                   │ │   │
│  │  └─────────────────────────────────────────────────────────────┘ │   │
│  │  ┌─────────────────────────────────────────────────────────────┐ │   │
│  │  │ [Code Cell]                                       [Run ▶]   │ │   │
│  │  │ import pandas as pd                                         │ │   │
│  │  │ df = pd.read_csv('data.csv')                               │ │   │
│  │  │ df.head()                                                   │ │   │
│  │  └─────────────────────────────────────────────────────────────┘ │   │
│  │  ┌─────────────────────────────────────────────────────────────┐ │   │
│  │  │ [Output]                                                    │ │   │
│  │  │    name    age    salary                                    │ │   │
│  │  │ 0  Alice   28     50000                                     │ │   │
│  │  │ 1  Bob     35     65000                                     │ │   │
│  │  │ 2  Carol   42     75000                                     │ │   │
│  │  └─────────────────────────────────────────────────────────────┘ │   │
│  └──────────────────────────────────────────────────────────────────┘   │
└─────────────────────────────────────────────────────────────────────────┘
                                    │
                                    │ WebSocket (browser ↔ server) / ZMQ (server ↔ kernel)
                                    ▼
┌─────────────────────────────────────────────────────────────────────────┐
│                           JUPYTER SERVER                                 │
│                                                                          │
│   ┌────────────────┐    ┌────────────────┐    ┌────────────────┐       │
│   │  Python Kernel │    │   R Kernel     │    │  Julia Kernel  │       │
│   │                │    │                │    │                │       │
│   │  Executes code │    │  Executes code │    │  Executes code │       │
│   │  Maintains     │    │  Maintains     │    │  Maintains     │       │
│   │  state (vars)  │    │  state (vars)  │    │  state (vars)  │       │
│   └────────────────┘    └────────────────┘    └────────────────┘       │
└─────────────────────────────────────────────────────────────────────────┘

Key Concepts Explained

1. Cells: The Building Blocks

┌─────────────────────────────────────────────────────────────────────┐
│                           CELL TYPES                                 │
├─────────────────────────────────────────────────────────────────────┤
│                                                                      │
│  CODE CELL                    │  MARKDOWN CELL                      │
│  ─────────────────           │  ───────────────────                 │
│  Contains executable code     │  Contains formatted text            │
│  Runs in the kernel          │  Rendered as HTML                   │
│  Shows output below          │  Supports LaTeX math                │
│                              │  Supports images, links             │
│  Example:                    │  Example:                           │
│  x = 10                      │  # Section Title                    │
│  print(x * 2)                │  This is **bold** text             │
│  → 20                        │  Formula: $E = mc^2$               │
│                              │                                      │
└─────────────────────────────────────────────────────────────────────┘

2. Kernel: The Execution Engine

The kernel is a separate process that executes your code. Key properties:

┌─────────────────────────────────────────────────────────────────────┐
│                           KERNEL STATE                               │
├─────────────────────────────────────────────────────────────────────┤
│                                                                      │
│  # Cell 1: Executed first                                           │
│  x = 10                                                             │
│                                                                      │
│  # Cell 2: Executed second                                          │
│  y = x + 5  # x is still in memory!                                │
│                                                                      │
│  # Cell 3: Executed third                                           │
│  print(y)  → 15                                                     │
│                                                                      │
│  ═══════════════════════════════════════════════════════════════   │
│                                                                      │
│  KERNEL MEMORY:                                                      │
│  ┌──────────────────────────────────────────────────────────────┐   │
│  │  x → 10                                                       │   │
│  │  y → 15                                                       │   │
│  │  (All variables persist until kernel restart)                │   │
│  └──────────────────────────────────────────────────────────────┘   │
│                                                                      │
└─────────────────────────────────────────────────────────────────────┘

Critical Understanding: Execution order matters, not cell position!

Cell [1]: x = 5      # Executed 1st
Cell [3]: z = x + y  # Executed 3rd (uses values from 1st and 2nd)
Cell [2]: y = 10     # Executed 2nd

The numbers in brackets show execution order, not position.

3. The .ipynb File Format

Notebooks are stored as JSON files:

{
  "cells": [
    {
      "cell_type": "markdown",
      "source": ["# My Notebook\n", "This is text"]
    },
    {
      "cell_type": "code",
      "source": ["x = 10\n", "print(x)"],
      "outputs": [
        {
          "output_type": "stream",
          "text": ["10\n"]
        }
      ],
      "execution_count": 1
    }
  ],
  "metadata": {
    "kernelspec": {
      "name": "python3",
      "display_name": "Python 3"
    }
  }
}

This format enables:

  • Git diff: See what changed (though it’s noisy)
  • Output storage: Results saved with code
  • Metadata: Kernel info, widgets state
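
Because a notebook is plain JSON, it can also be inspected or edited programmatically. A minimal sketch using the nbformat library (the file name is a placeholder):

import nbformat

# Read a notebook and summarize its cells
nb = nbformat.read('analysis.ipynb', as_version=4)
for i, cell in enumerate(nb.cells):
    first_line = cell.source.splitlines()[0] if cell.source else ''
    print(i, cell.cell_type, repr(first_line))

# The kernelspec metadata shown in the JSON above is available as well
print(nb.metadata.get('kernelspec', {}))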

4. Jupyter Ecosystem

┌─────────────────────────────────────────────────────────────────────┐
│                       JUPYTER ECOSYSTEM                              │
├─────────────────────────────────────────────────────────────────────┤
│                                                                      │
│  INTERFACES:                                                         │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐                  │
│  │  Jupyter    │  │  JupyterLab │  │   VSCode    │                  │
│  │  Notebook   │  │  (Modern)   │  │  Extension  │                  │
│  │  (Classic)  │  │             │  │             │                  │
│  └─────────────┘  └─────────────┘  └─────────────┘                  │
│                                                                      │
│  KERNELS:                                                            │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐                  │
│  │   Python    │  │      R      │  │    Julia    │                  │
│  │  (ipykernel)│  │ (IRkernel)  │  │  (IJulia)   │                  │
│  └─────────────┘  └─────────────┘  └─────────────┘                  │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐                  │
│  │   Scala     │  │   JavaScript│  │   C++       │                  │
│  │ (Almond)    │  │   (IJavaSc) │  │  (xeus-cling│                  │
│  └─────────────┘  └─────────────┘  └─────────────┘                  │
│                                                                      │
│  EXTENSIONS:                                                         │
│  • nbextensions (Notebook)    • jupyterlab-git                      │
│  • widgets (ipywidgets)       • jupyterlab-code-formatter          │
│  • voila (dashboards)         • variable-inspector                  │
│                                                                      │
│  CLOUD PLATFORMS:                                                    │
│  • Google Colab               • Kaggle Notebooks                    │
│  • AWS SageMaker              • Azure Notebooks                     │
│  • Binder                     • Deepnote                            │
│                                                                      │
└─────────────────────────────────────────────────────────────────────┘

5. When NOT to Use Notebooks

Notebooks are powerful but not always appropriate:

Use Notebooks For      Use Pure Code For
Exploration            Production systems
Prototyping            Libraries/packages
Teaching               CLI tools
Reports                Large applications
Visualization          Version-controlled code
Quick experiments      Team collaboration

Project List

The following 14 projects will teach you Jupyter Notebooks from basics to advanced interactive computing.


Project 1: Interactive Data Explorer

  • File: LEARN_JUPYTER_NOTEBOOKS_DEEP_DIVE.md
  • Main Programming Language: Python
  • Alternative Programming Languages: R, Julia
  • Coolness Level: Level 2: Practical but Forgettable
  • Business Potential: 2. The “Micro-SaaS / Pro Tool”
  • Difficulty: Level 1: Beginner
  • Knowledge Area: Data Exploration / Pandas
  • Software or Tool: Jupyter Notebook, Pandas, Matplotlib
  • Main Book: “Python for Data Analysis” by Wes McKinney

What you’ll build: An interactive data exploration notebook that loads a dataset (CSV, JSON, or Excel), performs summary statistics, handles missing values, creates visualizations, and documents findings—all in one shareable document.

Why it teaches Jupyter: This is the core use case for notebooks. You’ll immediately understand why data scientists prefer notebooks over plain Python scripts: see your data transformations instantly, iterate on visualizations, and create a document that tells a story.

Core challenges you’ll face:

  • Cell execution order → maps to understanding kernel state and variable persistence
  • Inline visualization → maps to matplotlib integration with %matplotlib inline
  • DataFrame display → maps to rich output representation
  • Mixing code and narrative → maps to markdown cells for documentation

Resources for key challenges:

Key Concepts:

  • Cell Execution: Jupyter Documentation - Running Code
  • Magic Commands: IPython Documentation - Built-in Magics
  • Inline Plotting: Matplotlib Documentation - Interactive Mode
  • DataFrame Display: Pandas Documentation - Options and Settings

Difficulty: Beginner Time estimate: Weekend Prerequisites: Basic Python, understanding of CSV files

Real world outcome:

Your notebook will display:

# Sales Data Analysis - Q4 2024

## 1. Data Loading
[Code] df = pd.read_csv('sales.csv')
       df.head()

[Output] Beautiful formatted table showing first 5 rows

## 2. Summary Statistics
[Code] df.describe()

[Output] Count, mean, std, min, max for all numeric columns

## 3. Missing Values Analysis
[Code] df.isnull().sum()

[Output] Column-by-column missing value counts

## 4. Visualization
[Code] plt.figure(figsize=(12, 6))
       df.plot(kind='bar')

[Output] Bar chart embedded directly in the notebook

## 5. Key Findings
[Markdown] Based on the analysis:
- Sales peaked in November
- Region X underperformed by 15%
- 3 outliers identified for investigation

Implementation Hints:

Start by understanding the notebook interface:

  1. Create a new notebook (New → Python 3)
  2. First cell: Import libraries with %matplotlib inline magic
  3. Second cell: Load data with pandas
  4. Use Shift+Enter to run cells and move to next
  5. Use markdown cells (change cell type) for documentation

Key questions to explore:

  • What happens if you run cell 3 before cell 2?
  • What is the [*] that appears while a cell is running?
  • How do you restart the kernel and clear all outputs?
  • What’s the difference between df (displays nicely) and print(df) (plain text)?

Magic commands to learn:

%matplotlib inline      # Plots appear in notebook
%timeit expression      # Time how long code takes
%who                    # List all variables
%reset                  # Clear all variables
%%time                  # Time the entire cell
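
Putting the first steps and a couple of these magics together, the opening cells of the explorer notebook might look like this sketch (the file and column names are placeholders):

# Cell 1: imports and inline plotting
%matplotlib inline
import pandas as pd
import matplotlib.pyplot as plt

# Cell 2: load the data
df = pd.read_csv('sales.csv')
df.head()

# Cell 3: quick look at types and missing values
df.info()
df.isnull().sum()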

Learning milestones:

  1. Run your first cell → Understand cell execution
  2. Create inline plot → See why notebooks beat scripts
  3. Mix code and markdown → Build a narrative document
  4. Share the .ipynb file → Others see your code AND results

Project 2: Literate Programming Tutorial

  • File: LEARN_JUPYTER_NOTEBOOKS_DEEP_DIVE.md
  • Main Programming Language: Python (with Markdown)
  • Alternative Programming Languages: Any with Jupyter kernel
  • Coolness Level: Level 3: Genuinely Clever
  • Business Potential: 2. The “Micro-SaaS / Pro Tool”
  • Difficulty: Level 1: Beginner
  • Knowledge Area: Documentation / Technical Writing
  • Software or Tool: Jupyter Notebook, nbconvert
  • Main Book: “The Pragmatic Programmer” by Hunt and Thomas

What you’ll build: A complete programming tutorial that teaches a concept (like sorting algorithms or regex) using a notebook—combining explanations, code examples, exercises, and solutions.

Why it teaches Jupyter: Notebooks originated from the idea of “literate programming” (Donald Knuth): code should be embedded in documentation, not documentation embedded in code. Building a tutorial forces you to use all of Jupyter’s documentation features.

Core challenges you’ll face:

  • Markdown mastery → maps to headers, lists, emphasis, code blocks
  • LaTeX for math → maps to inline and block equations
  • Exercise design → maps to code cells for practice
  • Export to HTML/PDF → maps to nbconvert for sharing

Resources for key challenges:

Key Concepts:

  • Literate Programming: “Literate Programming” by Donald Knuth
  • Markdown Syntax: GitHub Markdown Guide
  • LaTeX Math: Overleaf LaTeX Documentation
  • nbconvert: Jupyter Documentation - Converting Notebooks

Difficulty: Beginner Time estimate: 3-5 days Prerequisites: Project 1 (Interactive Data Explorer)

Real world outcome:

Your tutorial notebook will contain:

# Understanding Sorting Algorithms

## Introduction
Sorting is fundamental to computer science. We'll explore three algorithms:
- Bubble Sort: O(n²) - Simple but slow
- Merge Sort: O(n log n) - Divide and conquer
- Quick Sort: O(n log n) average - Fast in practice

## Big O Notation
The time complexity is expressed as:

$$T(n) = O(f(n))$$

Where $f(n)$ describes how time grows with input size $n$.

## Bubble Sort
### Theory
Compare adjacent elements and swap if out of order...

### Implementation
[Code Cell - Students fill in]

### Visualization
[Code Cell - Animated sorting visualization]

## Exercises
1. Implement bubble sort
2. Count comparisons
3. Compare with built-in `sorted()`

## Solutions (Hidden by default)
[Code Cell - Solutions]

Implementation Hints:

Markdown essentials:

# Heading 1
## Heading 2

**bold** and *italic*

- Bullet list
1. Numbered list

`inline code`

```python
code block
```

[Link text](url)
![Alt text](image.png)

> Blockquote

---  (horizontal rule)

LaTeX math:

Inline: $E = mc^2$

Block:
$$
\sum_{i=1}^{n} i = \frac{n(n+1)}{2}
$$

Export your tutorial:

# Convert to HTML
jupyter nbconvert --to html tutorial.ipynb

# Convert to PDF (requires LaTeX)
jupyter nbconvert --to pdf tutorial.ipynb

# Convert to slides
jupyter nbconvert --to slides tutorial.ipynb --post serve
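
To publish a student version with the solutions stripped out, nbconvert's TagRemovePreprocessor can drop cells by tag. A sketch, assuming the solution cells are tagged "solution" in their cell metadata:

import nbformat
from traitlets.config import Config
from nbconvert import HTMLExporter

c = Config()
c.TagRemovePreprocessor.enabled = True
c.TagRemovePreprocessor.remove_cell_tags = ("solution",)  # drop cells tagged "solution"
c.HTMLExporter.preprocessors = ["nbconvert.preprocessors.TagRemovePreprocessor"]

nb = nbformat.read("tutorial.ipynb", as_version=4)
body, _ = HTMLExporter(config=c).from_notebook_node(nb)

with open("tutorial_student.html", "w") as f:
    f.write(body)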

Learning milestones:

  1. Master markdown → Headers, lists, emphasis
  2. Add math equations → LaTeX integration
  3. Include images → Visual explanations
  4. Export to HTML → Share with non-Jupyter users

Project 3: Reproducible Research Document

  • File: LEARN_JUPYTER_NOTEBOOKS_DEEP_DIVE.md
  • Main Programming Language: Python
  • Alternative Programming Languages: R, Julia
  • Coolness Level: Level 3: Genuinely Clever
  • Business Potential: 1. The “Resume Gold”
  • Difficulty: Level 2: Intermediate
  • Knowledge Area: Scientific Computing / Reproducibility
  • Software or Tool: Jupyter, Papermill, requirements.txt
  • Main Book: “Reproducible Research with R and RStudio” by Christopher Gandrud

What you’ll build: A complete reproducible analysis that downloads data from an API, performs statistical analysis, generates publication-quality figures, and can be re-run by anyone to verify results.

Why it teaches Jupyter: The “reproducibility crisis” in science stems from analyses that can’t be replicated. This project teaches you to build notebooks that anyone can run and get the same results—essential for research and professional analysis.

Core challenges you’ll face:

  • Environment management → maps to conda/pip requirements
  • Random seed control → maps to reproducible random numbers
  • API data fetching → maps to handling external data
  • Figure export → maps to publication-ready graphics

Resources for key challenges:

Key Concepts:

  • Virtual Environments: Python Packaging Guide
  • Random Seeds: NumPy random documentation
  • Data Caching: Requests-cache library
  • Figure Export: Matplotlib savefig documentation

Difficulty: Intermediate Time estimate: 1 week Prerequisites: Projects 1-2, understanding of statistics basics

Real world outcome:

Repository structure:

project/
├── README.md
│   "How to reproduce this analysis"
├── requirements.txt
│   pandas==2.0.0
│   matplotlib==3.7.0
│   requests==2.28.0
├── environment.yml
│   (conda environment)
├── data/
│   └── .gitkeep
│   (data downloaded on run)
├── figures/
│   └── figure1.png
│   └── figure2.pdf
├── analysis.ipynb
│   (your reproducible notebook)
└── run_analysis.sh
    "pip install -r requirements.txt"
    "jupyter nbconvert --execute analysis.ipynb"

When others run your notebook:
$ git clone your-repo
$ pip install -r requirements.txt
$ jupyter nbconvert --execute analysis.ipynb
# → Same results, same figures, verified!

Implementation Hints:

Environment reproducibility:

# Cell 1: Version check
import sys
print(f"Python: {sys.version}")

import pandas as pd
print(f"Pandas: {pd.__version__}")

# Set random seed for reproducibility
import numpy as np
np.random.seed(42)

Generate requirements:

pip freeze > requirements.txt
# Or better, use pip-compile (pip-tools)
pip-compile requirements.in

Data caching:

import os
import requests

DATA_FILE = 'data/dataset.csv'

if not os.path.exists(DATA_FILE):
    response = requests.get('https://api.example.com/data')
    with open(DATA_FILE, 'w') as f:
        f.write(response.text)

df = pd.read_csv(DATA_FILE)
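
If the API is called from several cells, the requests-cache library listed above is a more transparent option: it caches every requests call in a local store. A minimal sketch, assuming requests-cache is added to requirements.txt:

import requests_cache

# All later requests.get() calls hit data/http_cache.sqlite first and
# expire after one day, so re-running the notebook does not re-download.
requests_cache.install_cache('data/http_cache', expire_after=86400)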

Publication-quality figures:

import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(10, 6), dpi=300)
# ... plotting code ...
fig.savefig('figures/figure1.pdf', bbox_inches='tight')
fig.savefig('figures/figure1.png', dpi=300, bbox_inches='tight')

Learning milestones:

  1. Pin dependencies → Same package versions everywhere
  2. Control randomness → Reproducible random numbers
  3. Cache data → Don’t re-download every run
  4. Export figures → Publication-ready graphics

Project 4: Interactive Visualization Dashboard

  • File: LEARN_JUPYTER_NOTEBOOKS_DEEP_DIVE.md
  • Main Programming Language: Python
  • Alternative Programming Languages: R (with Shiny in notebooks)
  • Coolness Level: Level 4: Hardcore Tech Flex
  • Business Potential: 3. The “Service & Support” Model
  • Difficulty: Level 2: Intermediate
  • Knowledge Area: Data Visualization / Interactive Computing
  • Software or Tool: ipywidgets, Plotly, Voilà
  • Main Book: “Interactive Data Visualization for the Web” by Scott Murray

What you’ll build: An interactive dashboard with sliders, dropdowns, and buttons that update visualizations in real-time—all within a Jupyter notebook, deployable as a standalone web app with Voilà.

Why it teaches Jupyter: Jupyter’s widget system transforms notebooks from static documents into interactive applications. This project teaches the full power of ipywidgets and how to deploy notebooks as dashboards.

Core challenges you’ll face:

  • Widget types → maps to sliders, dropdowns, text inputs, buttons
  • Reactive updates → maps to observe and link mechanisms
  • Layout design → maps to HBox, VBox, GridBox
  • Voilà deployment → maps to notebooks as web apps

Resources for key challenges:

Key Concepts:

  • Widget Basics: ipywidgets User Guide
  • Linking Widgets: ipywidgets - Widget Events
  • Layout Widgets: ipywidgets - Widget Layout
  • Voilà Deployment: Voilà Documentation

Difficulty: Intermediate Time estimate: 1-2 weeks Prerequisites: Projects 1-3, matplotlib/plotly basics

Real world outcome:

Your dashboard displays:

┌─────────────────────────────────────────────────────────────────────┐
│                    📊 Sales Analytics Dashboard                      │
├─────────────────────────────────────────────────────────────────────┤
│                                                                      │
│  Date Range: [===========|=======] Jan 2024 - Dec 2024              │
│                                                                      │
│  Region: [Dropdown: All ▼]    Product: [Dropdown: All ▼]            │
│                                                                      │
│  ┌─────────────────────────────────┐ ┌────────────────────────────┐ │
│  │                                 │ │                            │ │
│  │     Monthly Sales Trend         │ │    Sales by Category      │ │
│  │     (Line Chart - Updates       │ │    (Pie Chart - Updates   │ │
│  │      when filters change)       │ │     when filters change)  │ │
│  │                                 │ │                            │ │
│  └─────────────────────────────────┘ └────────────────────────────┘ │
│                                                                      │
│  Total Sales: $1,234,567    YoY Growth: +15.3%    Top Region: West  │
│                                                                      │
│  [📥 Download Report]  [🔄 Refresh Data]                            │
│                                                                      │
└─────────────────────────────────────────────────────────────────────┘

Deploy with Voilà:
$ voila dashboard.ipynb
→ Opens as standalone web app (code hidden, only widgets visible)

Implementation Hints:

Basic widget pattern:

import ipywidgets as widgets
import matplotlib.pyplot as plt
from IPython.display import display

# Assumes df is a DataFrame (loaded earlier) with a numeric 'value' column

# Create widgets
slider = widgets.IntSlider(
    value=50,
    min=0,
    max=100,
    description='Filter:'
)

output = widgets.Output()

# Update function: redraw the chart whenever the slider value changes
def update(change):
    with output:
        output.clear_output(wait=True)
        filtered = df[df['value'] <= change['new']]
        filtered.plot(kind='bar')
        plt.show()

slider.observe(update, names='value')

# Display
display(slider, output)
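
For simple cases, ipywidgets also offers interact, which builds the widget and wires the callback in one step. A sketch assuming the same df with a numeric 'value' column:

from ipywidgets import interact
from IPython.display import display

# interact turns the (min, max) tuple into an IntSlider and re-runs
# the function every time the slider moves.
@interact(threshold=(0, 100))
def show_filtered(threshold=50):
    display(df[df['value'] <= threshold].head())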

Layout widgets:

from ipywidgets import HBox, VBox, GridBox, Layout

dashboard = VBox([
    HBox([date_slider, region_dropdown]),
    HBox([
        widgets.Output(layout=Layout(width='50%')),
        widgets.Output(layout=Layout(width='50%'))
    ]),
    HBox([download_button, refresh_button])
])

display(dashboard)

Deploy with Voilà:

# Install
pip install voila

# Run
voila dashboard.ipynb

# With custom template
voila dashboard.ipynb --template=material

Learning milestones:

  1. Use basic widgets → Sliders, dropdowns, buttons
  2. Link widgets to outputs → Reactive updates
  3. Design layouts → Professional-looking dashboard
  4. Deploy with Voilà → Share as web app

Project 5: Kernel Architecture Deep Dive

  • File: LEARN_JUPYTER_NOTEBOOKS_DEEP_DIVE.md
  • Main Programming Language: Python
  • Alternative Programming Languages: Any (for multi-kernel exploration)
  • Coolness Level: Level 4: Hardcore Tech Flex
  • Business Potential: 1. The “Resume Gold”
  • Difficulty: Level 3: Advanced
  • Knowledge Area: Systems Programming / IPC
  • Software or Tool: Jupyter, ZeroMQ, ipykernel
  • Main Book: “The Linux Programming Interface” by Michael Kerrisk

What you’ll build: A minimal Jupyter kernel from scratch that executes code and returns output, teaching you exactly how the notebook-to-kernel communication works.

Why it teaches Jupyter: Most users treat the kernel as magic. Building one reveals the ZeroMQ messaging protocol, the separation between frontend and backend, and why notebooks can support any programming language.

Core challenges you’ll face:

  • ZeroMQ messaging → maps to request-reply, publish-subscribe patterns
  • Message protocol → maps to Jupyter wire protocol
  • State management → maps to execution count, variable storage
  • Output handling → maps to stdout, stderr, display data

Resources for key challenges:

Key Concepts:

  • ZeroMQ Basics: “ZeroMQ Guide” by Pieter Hintjens
  • Jupyter Wire Protocol: Jupyter Messaging Documentation
  • IPython Kernel: ipykernel source code
  • Process Communication: “The Linux Programming Interface” Ch. 43

Difficulty: Advanced Time estimate: 2-3 weeks Prerequisites: Projects 1-4, understanding of sockets/IPC

Real world outcome:

Your minimal kernel:

# my_kernel.py
from ipykernel.kernelbase import Kernel

class MyKernel(Kernel):
    implementation = 'My Kernel'
    implementation_version = '1.0'
    language = 'python'
    language_version = '3.10'
    banner = "My Custom Kernel"
    
    def do_execute(self, code, silent, store_history=True,
                   user_expressions=None, allow_stdin=False):
        # Execute code
        try:
            result = eval(code)
            if not silent:
                stream_content = {'name': 'stdout', 'text': str(result)}
                self.send_response(self.iopub_socket, 'stream', stream_content)
        except Exception as e:
            # Report the error on stderr
            if not silent:
                self.send_response(self.iopub_socket, 'stream',
                                   {'name': 'stderr', 'text': str(e)})

        return {'status': 'ok', 'execution_count': self.execution_count}

# Install the kernel
$ python -m my_kernel install

# Use in Jupyter
$ jupyter notebook
# Select "My Kernel" from kernel menu

# See the messages flying
$ jupyter console --existing
Kernel responds to:
- execute_request
- kernel_info_request
- complete_request (autocomplete)
- inspect_request (documentation)

Implementation Hints:

The Jupyter architecture:

Frontend (Notebook)
       │
       │ ZeroMQ (tcp://...)
       │
       ▼
┌──────────────────────────────────────────────────────┐
│                   KERNEL PROCESS                      │
│                                                       │
│  Shell Channel (execute, complete, inspect)          │
│  IOPub Channel (stdout, stderr, display_data)        │
│  Stdin Channel (input() requests)                    │
│  Control Channel (shutdown, interrupt)               │
│  Heartbeat Channel (alive check)                     │
│                                                       │
└──────────────────────────────────────────────────────┘

Message format:

{
    'header': {
        'msg_id': 'uuid',
        'msg_type': 'execute_request',
        'session': 'uuid',
    },
    'parent_header': {...},
    'metadata': {...},
    'content': {
        'code': 'print("hello")',
        'silent': False
    }
}
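
You can watch these messages without writing any frontend code by driving a running kernel from jupyter_client. A rough sketch (the connection file name is a placeholder; the real one lives under jupyter --runtime-dir):

from jupyter_client import BlockingKernelClient

client = BlockingKernelClient(connection_file='kernel-12345.json')  # placeholder path
client.load_connection_file()
client.start_channels()

msg_id = client.execute("print('hello')")  # sends execute_request on the shell channel
reply = client.get_shell_msg(timeout=5)    # execute_reply arrives on shell
print(reply['content']['status'])

# stdout, results, and busy/idle status arrive on the IOPub channel
while True:
    msg = client.get_iopub_msg(timeout=5)
    print(msg['msg_type'], msg['content'])
    if msg['msg_type'] == 'status' and msg['content']['execution_state'] == 'idle':
        break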

Questions to explore:

  • What happens when you press Shift+Enter?
  • How does tab-completion work?
  • How are plots displayed inline?
  • What happens when you restart the kernel?

Learning milestones:

  1. Understand ZeroMQ sockets → Communication channels
  2. Parse Jupyter messages → Wire protocol
  3. Handle execute_request → Run code and return output
  4. Install custom kernel → Use in real notebooks

Project 6: Multi-Language Notebook

  • File: LEARN_JUPYTER_NOTEBOOKS_DEEP_DIVE.md
  • Main Programming Language: Python, R, Julia
  • Alternative Programming Languages: SQL, Bash, JavaScript
  • Coolness Level: Level 4: Hardcore Tech Flex
  • Business Potential: 2. The “Micro-SaaS / Pro Tool”
  • Difficulty: Level 2: Intermediate
  • Knowledge Area: Polyglot Programming / Data Pipelines
  • Software or Tool: Jupyter, IRkernel, IJulia, BeakerX
  • Main Book: “Data Science at the Command Line” by Jeroen Janssens

What you’ll build: A single notebook that uses multiple programming languages: Python for data processing, R for statistical analysis, SQL for database queries, and JavaScript for visualization—passing data between them.

Why it teaches Jupyter: The name Jupyter itself is a nod to Julia, Python, and R, the three languages it originally targeted. This project teaches you the kernel architecture by using multiple kernels and passing data between languages.

Core challenges you’ll face:

  • Multiple kernels → maps to installing and switching kernels
  • Data passing → maps to rpy2, julia, or file-based
  • Magic commands → maps to %%R, %%javascript, %%sql
  • Environment setup → maps to managing multiple language environments

Resources for key challenges:

Key Concepts:

  • Kernel Installation: Jupyter Documentation - Installing Kernels
  • rpy2 Python-R Bridge: rpy2 Documentation
  • SQL Magic: ipython-sql Documentation
  • Polyglot Notebooks: BeakerX Documentation

Difficulty: Intermediate Time estimate: 1-2 weeks Prerequisites: Project 5, basic R/SQL knowledge

Real world outcome:

# Cell 1: Python - Load data
import pandas as pd
df = pd.read_csv('sales.csv')
print(f"Loaded {len(df)} rows")

# Cell 2: SQL Magic - Query data
%load_ext sql
%sql sqlite:///sales.db

%%sql
SELECT region, SUM(sales) as total
FROM sales_table
GROUP BY region
ORDER BY total DESC
LIMIT 5

# Cell 3: R Magic - Statistical analysis
%load_ext rpy2.ipython

%%R -i df
# df is passed from Python!
model <- lm(sales ~ advertising + price, data=df)
summary(model)

# Cell 4: JavaScript - Interactive visualization
%%javascript
// Access data passed from Python
require(['d3'], function(d3) {
    // Create D3 visualization
    d3.select('#viz').append('svg')...
});

# Cell 5: Bash - System commands
%%bash
wc -l sales.csv
head -5 sales.csv

Implementation Hints:

Install additional kernels:

# R kernel
$ Rscript -e "install.packages('IRkernel')"
$ Rscript -e "IRkernel::installspec()"

# Julia kernel
$ julia -e 'using Pkg; Pkg.add("IJulia")'

# SQL magic
$ pip install ipython-sql

# R magic
$ pip install rpy2

Data passing between languages:

# Python to R with rpy2
%load_ext rpy2.ipython
df = pd.DataFrame({'x': [1,2,3], 'y': [4,5,6]})

%%R -i df -o result
# df comes from Python
result <- summary(df)
# result goes back to Python

# Alternative: Use files
df.to_csv('temp.csv')
# Then read in R, Julia, etc.

Learning milestones:

  1. Install multiple kernels → R, Julia, SQL
  2. Use magic commands → %%R, %%sql, %%bash
  3. Pass data between languages → Python ↔ R
  4. Build polyglot pipeline → Best tool for each job

Project 7: Machine Learning Experiment Tracker

  • File: LEARN_JUPYTER_NOTEBOOKS_DEEP_DIVE.md
  • Main Programming Language: Python
  • Alternative Programming Languages: R, Julia
  • Coolness Level: Level 3: Genuinely Clever
  • Business Potential: 3. The “Service & Support” Model
  • Difficulty: Level 2: Intermediate
  • Knowledge Area: Machine Learning / MLOps
  • Software or Tool: Jupyter, MLflow, Weights & Biases
  • Main Book: “Hands-On Machine Learning” by Aurélien Géron

What you’ll build: A notebook-based ML experiment tracking system that logs hyperparameters, metrics, model artifacts, and visualizations—making ML experiments reproducible and comparable.

Why it teaches Jupyter: ML development is inherently iterative and exploratory—perfect for notebooks. This project teaches best practices for ML in notebooks: experiment tracking, model versioning, and result comparison.

Core challenges you’ll face:

  • Experiment logging → maps to MLflow, W&B integration
  • Hyperparameter tracking → maps to parameter logging
  • Artifact storage → maps to models, plots, data versions
  • Comparison dashboards → maps to comparing runs

Resources for key challenges:

Key Concepts:

  • Experiment Tracking: “MLOps” by Mark Treveil
  • Model Registry: MLflow Model Registry docs
  • Hyperparameter Logging: W&B Experiment Tracking
  • Artifact Management: MLflow Artifacts docs

Difficulty: Intermediate Time estimate: 1-2 weeks Prerequisites: Projects 1-3, basic ML knowledge

Real world outcome:

# In your notebook:

import mlflow
import mlflow.sklearn

# Start experiment
mlflow.set_experiment("sales-prediction")

with mlflow.start_run():
    # Log parameters
    mlflow.log_param("model", "RandomForest")
    mlflow.log_param("n_estimators", 100)
    mlflow.log_param("max_depth", 10)
    
    # Train model
    model = RandomForestRegressor(n_estimators=100, max_depth=10)
    model.fit(X_train, y_train)
    
    # Log metrics
    train_score = model.score(X_train, y_train)
    test_score = model.score(X_test, y_test)
    mlflow.log_metric("train_r2", train_score)
    mlflow.log_metric("test_r2", test_score)
    
    # Log model
    mlflow.sklearn.log_model(model, "model")
    
    # Log figure
    fig, ax = plt.subplots()
    ax.scatter(y_test, model.predict(X_test))
    mlflow.log_figure(fig, "predictions.png")

# View all experiments:
$ mlflow ui
# Opens dashboard comparing all runs!

MLflow Experiment Dashboard

Run       model           n_estimators   test_r2   Duration
run_001   RandomForest    100            0.85      2m 34s
run_002   GradientBoost   200            0.87      4m 12s
run_003   RandomForest    500            0.86      8m 45s

Implementation Hints:

MLflow setup:

# Install
pip install mlflow

# Start tracking server
mlflow server --backend-store-uri sqlite:///mlflow.db --default-artifact-root ./artifacts

# Or use W&B (cloud-based)
pip install wandb
wandb login
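
For scikit-learn work, recent MLflow versions can also log parameters, metrics, and the fitted model automatically, which cuts boilerplate in exploratory notebooks; a one-line sketch:

import mlflow

# Call once near the top of the notebook; model .fit() calls inside an
# active run are then logged (params, metrics, model artifact) automatically.
mlflow.autolog()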

Best practices for ML notebooks:

  1. One experiment per run → Clear boundaries
  2. Log everything → Parameters, metrics, artifacts
  3. Version data → DVC or artifact logging
  4. Save notebooks with outputs → Reproducibility
  5. Use tags → Easy filtering and search

Questions to explore:

  • How do you compare two experiment runs?
  • How do you load a model from a previous run? (see the sketch after this list)
  • How do you version your training data?
  • What should go in version control vs. artifact storage?
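
The model-loading question has a short answer worth sketching: a model logged with mlflow.sklearn.log_model can be reloaded from its run (the run ID below is a placeholder):

import mlflow.sklearn

# Reload the artifact named "model" from an earlier run
model = mlflow.sklearn.load_model("runs:/<run_id>/model")
print(model.predict(X_test[:5]))  # X_test as defined earlier in the notebook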

Learning milestones:

  1. Set up MLflow → Local tracking server
  2. Log first experiment → Parameters and metrics
  3. Compare runs → Find best hyperparameters
  4. Load previous model → Reproducibility

Project 8: Automated Report Generator

  • File: LEARN_JUPYTER_NOTEBOOKS_DEEP_DIVE.md
  • Main Programming Language: Python
  • Alternative Programming Languages: R (with knitr)
  • Coolness Level: Level 3: Genuinely Clever
  • Business Potential: 3. The “Service & Support” Model
  • Difficulty: Level 2: Intermediate
  • Knowledge Area: Automation / Reporting
  • Software or Tool: Papermill, nbconvert, scheduling
  • Main Book: “Automate the Boring Stuff with Python” by Al Sweigart

What you’ll build: A parameterized notebook that generates automated reports—daily sales reports, weekly metrics, monthly analyses—run via cron or scheduled tasks, outputting PDF/HTML reports.

Why it teaches Jupyter: Notebooks aren’t just for interactive work—they can be automated. Papermill lets you parameterize notebooks and run them programmatically, turning notebooks into report-generation pipelines.

Core challenges you’ll face:

  • Parameterization → maps to Papermill parameters
  • Programmatic execution → maps to nbconvert --execute
  • Output formats → maps to PDF, HTML, slides
  • Scheduling → maps to cron, Task Scheduler, Airflow

Resources for key challenges:

Key Concepts:

  • Parameterized Notebooks: Papermill User Guide
  • Headless Execution: nbconvert execute preprocessor
  • Template Customization: nbconvert templates
  • Scheduling: cron, Airflow documentation

Difficulty: Intermediate Time estimate: 1 week Prerequisites: Projects 1-3, command-line familiarity

Real world outcome:

# Template notebook: sales_report_template.ipynb

# Parameters cell (tagged with "parameters")
report_date = "2024-01-01"  # Will be overwritten
region = "all"              # Will be overwritten

# Analysis cells
df = load_data(start=report_date)
df = df[df['region'] == region] if region != "all" else df

# Visualizations...
# Summary statistics...
# Markdown conclusions...

# Run with Papermill:
$ papermill sales_report_template.ipynb \
    reports/sales_2024-01-15_west.ipynb \
    -p report_date "2024-01-15" \
    -p region "west"

# Convert to PDF:
$ jupyter nbconvert --to pdf reports/sales_2024-01-15_west.ipynb

# Automate with cron (daily at 6 AM):
0 6 * * * cd /project && ./generate_reports.sh

# Result: Every morning, stakeholders receive:
Daily Sales Report
January 15, 2024 - West Region

Executive Summary
─────────────────
Total Sales: $234,567
YoY Change: +12.3%

[Chart: Daily Sales Trend]

[Chart: Top Products]

Key Insights:
• Product A exceeded targets by 15%
• Marketing campaign drove 20% traffic increase

Implementation Hints:

Papermill workflow:

# 1. Create template with parameters cell
# Tag a cell with "parameters" in cell metadata

# 2. Run with papermill
import papermill as pm

pm.execute_notebook(
    'template.ipynb',
    'output.ipynb',
    parameters={
        'report_date': '2024-01-15',
        'region': 'west'
    }
)

# 3. Convert to PDF
from nbconvert import PDFExporter
import nbformat

nb = nbformat.read('output.ipynb', as_version=4)
pdf_exporter = PDFExporter()
pdf_data, resources = pdf_exporter.from_notebook_node(nb)

with open('report.pdf', 'wb') as f:
    f.write(pdf_data)
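
Recent Papermill versions can also report which parameters a template expects, which is handy before wiring it into a scheduler. A small sketch (assumes the parameters cell is tagged as described above):

import papermill as pm

# Returns a dict describing each parameter found in the tagged cell
params = pm.inspect_notebook('template.ipynb')
for name, info in params.items():
    print(name, info)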

Scheduling script:

#!/bin/bash
# generate_reports.sh

DATE=$(date +%Y-%m-%d)

for region in north south east west; do
    papermill template.ipynb \
        "reports/${DATE}_${region}.ipynb" \
        -p report_date "$DATE" \
        -p region "$region"
    
    jupyter nbconvert --to pdf "reports/${DATE}_${region}.ipynb"
done

# Email reports
mutt -a reports/*.pdf -- stakeholders@company.com < email_body.txt

Learning milestones:

  1. Parameterize a notebook → Tag parameters cell
  2. Run with Papermill → Programmatic execution
  3. Convert to PDF → Professional output
  4. Schedule with cron → Fully automated

Project 9: JupyterLab Extension

  • File: LEARN_JUPYTER_NOTEBOOKS_DEEP_DIVE.md
  • Main Programming Language: TypeScript/JavaScript
  • Alternative Programming Languages: Python (for backend)
  • Coolness Level: Level 5: Pure Magic (Super Cool)
  • Business Potential: 4. The “Open Core” Infrastructure
  • Difficulty: Level 4: Expert
  • Knowledge Area: Frontend Development / Plugin Architecture
  • Software or Tool: JupyterLab, Node.js, TypeScript
  • Main Book: “Programming TypeScript” by Boris Cherny

What you’ll build: A JupyterLab extension that adds new functionality—a custom sidebar, new toolbar buttons, or a completely new view—teaching you the JupyterLab extension architecture.

Why it teaches Jupyter: JupyterLab is built as a collection of extensions. Understanding how to extend it teaches you the modular architecture, PhosphorJS widgets, and the JupyterLab plugin system.

Core challenges you’ll face:

  • JupyterLab architecture → maps to plugins, services, widgets
  • TypeScript/React → maps to modern frontend development
  • Extension API → maps to commands, menus, sidebars
  • Building and publishing → maps to npm, conda-forge

Resources for key challenges:

Key Concepts:

  • Plugin System: JupyterLab Extension Guide
  • Lumino Widgets: Lumino Documentation
  • JupyterLab Services: @jupyterlab/services API
  • Extension Publishing: jupyter-packaging docs

Difficulty: Expert Time estimate: 3-4 weeks Prerequisites: Projects 1-5, TypeScript/React experience

Real world outcome:

Your extension adds a "Code Snippets" sidebar:

┌─────────────────────────────────────────────────────────────────────┐
│  JupyterLab                                                          │
├──────────────┬──────────────────────────────────────────────────────┤
│              │                                                       │
│  📁 Files    │   [Notebook]                                         │
│              │                                                       │
│  🔧 Running  │   [1]: import pandas as pd                           │
│              │                                                       │
│  📝 Snippets │   [2]: df = pd.read_csv('data.csv')                  │
│  ───────────│                                                       │
│  > Data Load │   [3]: df.head()                                     │
│    - CSV     │                                                       │
│    - JSON    │                                                       │
│    - SQL     │                                                       │
│  > Plotting  │                                                       │
│    - Line    │                                                       │
│    - Bar     │                                                       │
│    - Scatter │                                                       │
│  > ML        │                                                       │
│    - Train   │                                                       │
│    - Eval    │                                                       │
│              │                                                       │
│  [+ Add]     │                                                       │
│              │                                                       │
└──────────────┴──────────────────────────────────────────────────────┘

Clicking a snippet inserts it into the current cell!

Implementation Hints:

Extension structure:

my-extension/
├── package.json          # Dependencies, scripts
├── tsconfig.json         # TypeScript config
├── src/
│   └── index.ts          # Plugin entry point
├── style/
│   └── index.css         # Styles
└── schema/
    └── plugin.json       # Settings schema

Basic plugin structure:

// src/index.ts
import { JupyterFrontEnd, JupyterFrontEndPlugin } from '@jupyterlab/application';
import { ICommandPalette } from '@jupyterlab/apputils';

const plugin: JupyterFrontEndPlugin<void> = {
  id: 'my-extension:plugin',
  autoStart: true,
  requires: [ICommandPalette],
  activate: (app: JupyterFrontEnd, palette: ICommandPalette) => {
    console.log('Extension activated!');
    
    // Add a command
    app.commands.addCommand('my-extension:hello', {
      label: 'Say Hello',
      execute: () => {
        alert('Hello from my extension!');
      }
    });
    
    // Add to command palette
    palette.addItem({
      command: 'my-extension:hello',
      category: 'My Extension'
    });
  }
};

export default plugin;

Build and install:

# Create from cookiecutter template
pip install cookiecutter
cookiecutter https://github.com/jupyterlab/extension-cookiecutter-ts

# Install dependencies
cd my-extension
npm install

# Build
npm run build

# Install in JupyterLab
pip install -e .
jupyter labextension develop . --overwrite

# Watch for changes during development
npm run watch

Learning milestones:

  1. Create from template → Cookiecutter setup
  2. Add simple command → Command palette integration
  3. Create sidebar widget → Lumino widgets
  4. Publish to npm → Share with community

Project 10: Real-Time Collaborative Notebook

  • File: LEARN_JUPYTER_NOTEBOOKS_DEEP_DIVE.md
  • Main Programming Language: Python (backend), TypeScript (frontend)
  • Alternative Programming Languages: None
  • Coolness Level: Level 5: Pure Magic (Super Cool)
  • Business Potential: 4. The “Open Core” Infrastructure
  • Difficulty: Level 4: Expert
  • Knowledge Area: Distributed Systems / Real-Time Collaboration
  • Software or Tool: JupyterHub, jupyter-collaboration
  • Main Book: “Designing Data-Intensive Applications” by Martin Kleppmann

What you’ll build: A real-time collaborative notebook system where multiple users can edit the same notebook simultaneously—like Google Docs for code. Understand the CRDT algorithms that make this possible.

Why it teaches Jupyter: Real-time collaboration is the frontier of notebook technology. JupyterLab now supports this via CRDTs (Conflict-free Replicated Data Types). Understanding this teaches distributed systems concepts through a practical lens.

Core challenges you’ll face:

  • JupyterHub setup → maps to multi-user deployment
  • Real-time sync → maps to Y.js, CRDTs
  • Conflict resolution → maps to operational transformation
  • User presence → maps to cursors, selections

Resources for key challenges:

Key Concepts:

  • CRDTs: “Designing Data-Intensive Applications” Ch. 5
  • Operational Transformation: Google Docs engineering blog
  • WebSocket Communication: MDN WebSocket documentation
  • JupyterHub Architecture: JupyterHub Technical Overview

Difficulty: Expert Time estimate: 4-6 weeks Prerequisites: Projects 5 and 9, distributed systems basics

Real world outcome:

Two users editing the same notebook simultaneously:

┌─────────────────────────────┐     ┌─────────────────────────────┐
│  Alice's Browser            │     │  Bob's Browser              │
├─────────────────────────────┤     ├─────────────────────────────┤
│                             │     │                             │
│  [1]: x = 10    🟢Alice     │     │  [1]: x = 10                │
│       y = 20    🔵Bob←      │     │       y = 20    🔵Bob←      │
│                             │     │                             │
│  [2]: # Analysis  🟢Alice←  │     │  [2]: # Analysis  🟢Alice←  │
│       Exploring...          │     │       Exploring...          │
│                             │     │                             │
│  ───────────────────────── │     │  ───────────────────────── │
│  🟢 Alice (you)             │     │  🟢 Alice                   │
│  🔵 Bob                     │     │  🔵 Bob (you)               │
│                             │     │                             │
└─────────────────────────────┘     └─────────────────────────────┘
                ↑                                 ↑
                └────────────┬────────────────────┘
                             │
                     ┌───────┴───────┐
                     │ Y.js WebSocket │
                     │    Server      │
                     │   (CRDTs)      │
                     └───────────────┘

Changes sync instantly:
- Alice types → Bob sees immediately
- Cursors show where each user is
- Conflicts resolved automatically

Implementation Hints:

JupyterHub setup:

# Install JupyterHub
pip install jupyterhub
npm install -g configurable-http-proxy

# Generate config
jupyterhub --generate-config

# Install collaboration extension
pip install jupyter-collaboration

# Start hub
jupyterhub

Understanding CRDTs:

Traditional sync: Client → Server → Conflict → Manual resolution
CRDT sync: 
  Client A → [State A]
  Client B → [State B]
  Merge → [State A ∪ B] (automatically consistent!)

Y.js implements:
- Y.Text: Collaborative text editing
- Y.Array: Collaborative lists
- Y.Map: Collaborative key-value

Notebook cells become Y.Array of Y.Map objects!

Key architecture:

┌──────────┐    ┌──────────┐
│ Client A │    │ Client B │
└────┬─────┘    └────┬─────┘
     │               │
     │  WebSocket    │
     ▼               ▼
┌──────────────────────────┐
│    Y.js Provider         │
│    (Awareness + Sync)    │
├──────────────────────────┤
│    JupyterHub Server     │
│    (Auth + Routing)      │
└──────────────────────────┘

Questions to explore:

  • What happens if two users edit the same cell?
  • How are cursors transmitted between clients?
  • What happens when a user goes offline and comes back?
  • How is the notebook persisted to disk?

Learning milestones:

  1. Deploy JupyterHub → Multi-user setup
  2. Enable collaboration → RTC extension
  3. Understand CRDTs → How conflicts are resolved
  4. Observe sync → Debug WebSocket messages

Project 11: Notebook Testing Framework

  • File: LEARN_JUPYTER_NOTEBOOKS_DEEP_DIVE.md
  • Main Programming Language: Python
  • Alternative Programming Languages: None
  • Coolness Level: Level 3: Genuinely Clever
  • Business Potential: 3. The “Service & Support” Model
  • Difficulty: Level 2: Intermediate
  • Knowledge Area: Testing / Quality Assurance
  • Software or Tool: nbval, pytest, testbook
  • Main Book: “Python Testing with pytest” by Brian Okken

What you’ll build: A testing framework for notebooks that validates outputs, checks for errors, and integrates with CI/CD pipelines—treating notebooks as testable artifacts.

Why it teaches Jupyter: Notebooks are often criticized for being “untestable.” This project teaches you to treat notebooks as first-class tested artifacts, essential for production notebook workflows.

Core challenges you’ll face:

  • Output validation → maps to nbval expected outputs
  • Cell execution testing → maps to testbook fixtures
  • CI integration → maps to GitHub Actions, GitLab CI
  • Regression testing → maps to detecting output changes

Resources for key challenges:

Key Concepts:

  • Notebook Validation: nbval documentation
  • Unit Testing Cells: testbook user guide
  • CI/CD for Notebooks: GitHub Actions documentation
  • Property-Based Testing: Hypothesis documentation

Difficulty: Intermediate Time estimate: 1-2 weeks Prerequisites: Projects 1-3, pytest experience

Real world outcome:

# Test file: test_analysis.py

from testbook import testbook

@testbook('analysis.ipynb', execute=True)
def test_data_loading(tb):
    """Test that data loads correctly."""
    df = tb.ref('df')  # Get variable from notebook
    assert len(df) > 0
    assert 'sales' in df.columns

@testbook('analysis.ipynb')
def test_specific_cell(tb):
    """Test a specific cell's output."""
    tb.execute_cell('data_cleaning')  # Execute cell by tag
    result = tb.cell_output_text('data_cleaning')
    assert 'cleaned' in result.lower()

@testbook('analysis.ipynb')
def test_visualization(tb):
    """Test that visualization produces output."""
    tb.execute_cell('plot')
    assert tb.cell_output_type('plot') == 'display_data'

# Run with pytest:
$ pytest test_analysis.py -v

test_analysis.py::test_data_loading PASSED
test_analysis.py::test_specific_cell PASSED  
test_analysis.py::test_visualization PASSED

# Or use nbval for output comparison:
$ pytest --nbval analysis.ipynb

analysis.ipynb::Cell 0 PASSED
analysis.ipynb::Cell 1 PASSED
analysis.ipynb::Cell 2 PASSED
...

# GitHub Actions workflow:
name: Test Notebooks
on: [push]
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - uses: actions/setup-python@v4
      - run: pip install pytest testbook nbval
      - run: pytest --nbval notebooks/

Implementation Hints:

nbval (output validation):

# Install
pip install nbval

# Run - checks that saved outputs match re-execution
pytest --nbval my_notebook.ipynb

# Sanitize outputs (ignore variable parts like timestamps)
pytest --nbval --sanitize-with sanitize.cfg my_notebook.ipynb

# sanitize.cfg:
# [regex]
# regex: \d{4}-\d{2}-\d{2}
# replace: DATE

testbook (unit testing):

from testbook import testbook

@testbook('notebook.ipynb')
def test_function(tb):
    # Execute all cells
    tb.execute()
    
    # Or execute specific cells by index
    tb.execute_cell([0, 1, 2])
    
    # Or by tag
    tb.execute_cell('setup')
    
    # Get variables
    my_var = tb.ref('my_var')
    
    # Inject code into the notebook's kernel
    tb.inject("""
        test_input = [1, 2, 3]
    """)
    
    # Call functions defined in the notebook
    # (arguments are serialized to the kernel, so pass plain Python values)
    func = tb.ref('my_function')
    result = func([1, 2, 3])
    assert result == expected  # 'expected' is whatever you precomputed for this input

CI/CD best practices:

  1. Execute notebooks from scratch → No stale outputs
  2. Test with different data → Parameterized tests
  3. Check execution time → Performance regression
  4. Validate markdown → No broken links
  5. Lint notebooks → nbqa, pre-commit

Learning milestones:

  1. Validate outputs → nbval basics
  2. Unit test cells → testbook framework
  3. Set up CI → GitHub Actions
  4. Test data variations → Parameterized testing

Project 12: GPU-Accelerated Data Science

  • File: LEARN_JUPYTER_NOTEBOOKS_DEEP_DIVE.md
  • Main Programming Language: Python
  • Alternative Programming Languages: None (CUDA-specific)
  • Coolness Level: Level 4: Hardcore Tech Flex
  • Business Potential: 3. The “Service & Support” Model
  • Difficulty: Level 3: Advanced
  • Knowledge Area: High Performance Computing / GPU Programming
  • Software or Tool: RAPIDS (cuDF, cuML), Google Colab
  • Main Book: “Hands-On GPU Computing with Python” by Avimanyu Bandyopadhyay

What you’ll build: A notebook workflow that processes large datasets on GPUs using RAPIDS (cuDF, cuML), demonstrating 10-100x speedups over CPU-based Pandas/Scikit-learn.

Why it teaches Jupyter: Data science workloads are increasingly GPU-accelerated. Notebooks are the primary interface for GPU data science via RAPIDS and Google Colab. This project teaches you to leverage GPUs interactively.

Core challenges you’ll face:

  • GPU memory management → maps to understanding VRAM limits
  • cuDF vs Pandas → maps to API differences
  • cuML vs Scikit-learn → maps to GPU-accelerated ML
  • Colab/cloud GPUs → maps to accessing GPU resources

Resources for key challenges:

Key Concepts:

  • cuDF Basics: RAPIDS cuDF User Guide
  • GPU Memory: CUDA Programming Guide
  • cuML Algorithms: RAPIDS cuML documentation
  • Dask Integration: Dask-cuDF for larger-than-GPU data

Difficulty: Advanced Time estimate: 2-3 weeks Prerequisites: Projects 1-3, understanding of GPU concepts

Real world outcome:

# Google Colab or local GPU environment

# Cell 1: Install RAPIDS
!pip install cudf-cu12 cuml-cu12 --extra-index-url=https://pypi.nvidia.com

# Cell 2: Compare Pandas vs cuDF
import pandas as pd
import cudf
import time

# Load 10 million rows
pandas_df = pd.read_csv('large_data.csv')  # ~30 seconds
cudf_df = cudf.read_csv('large_data.csv')  # ~2 seconds

# Benchmark operations
# Pandas
start = time.time()
pandas_df.groupby('category').agg({'value': 'mean'})
print(f"Pandas: {time.time() - start:.2f}s")  # ~5 seconds

# cuDF (GPU)
start = time.time()
cudf_df.groupby('category').agg({'value': 'mean'})
print(f"cuDF: {time.time() - start:.2f}s")    # ~0.05 seconds

# 100x speedup!

# Cell 3: GPU Machine Learning
from cuml import RandomForestClassifier as cuRF
from sklearn.ensemble import RandomForestClassifier as skRF

# Scikit-learn (CPU)
sk_model = skRF(n_estimators=100)
%time sk_model.fit(X_train, y_train)  # ~2 minutes

# cuML (GPU)
cu_model = cuRF(n_estimators=100)
%time cu_model.fit(X_train, y_train)  # ~5 seconds

# 24x speedup!

Implementation Hints:

Setting up GPU environment:

# Check GPU availability
!nvidia-smi

# Google Colab: Runtime → Change runtime type → GPU

# Local: Install RAPIDS (Linux only, or WSL2)
conda install -c rapidsai -c conda-forge -c nvidia \
    rapids=24.02 python=3.10 cuda-version=12.0

cuDF API (mostly Pandas-compatible):

import cudf

# Read data
gdf = cudf.read_csv('data.csv')
gdf = cudf.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]})

# Operations (same as Pandas)
gdf['c'] = gdf['a'] + gdf['b']
grouped = gdf.groupby('a').mean()
filtered = gdf[gdf['a'] > 1]

# Convert to/from Pandas
pandas_df = gdf.to_pandas()
gdf = cudf.from_pandas(pandas_df)

GPU memory management:

import rmm  # RAPIDS Memory Manager

# Check memory
!nvidia-smi --query-gpu=memory.used --format=csv

# Clear GPU memory
import gc
gc.collect()
rmm.reinitialize(pool_allocator=True)
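
For data that doesn't fit in VRAM, the Key Concepts above point to Dask-cuDF, which splits a DataFrame into GPU-backed partitions and schedules work across them. A minimal sketch, assuming the data is spread over several CSV files with the same 'category'/'value' columns as the benchmark above:

import dask_cudf

# Each partition is a cuDF DataFrame living on the GPU
ddf = dask_cudf.read_csv('large_data_part_*.csv')

# Same groupby as before, but lazy: .compute() triggers execution
result = ddf.groupby('category')['value'].mean().compute()
print(result)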

Learning milestones:

  1. Access GPU → Colab or local setup
  2. Use cuDF → GPU-accelerated DataFrames
  3. Compare performance → Benchmark CPU vs GPU
  4. Train ML on GPU → cuML algorithms

Project 13: Notebook-to-Production Pipeline

  • File: LEARN_JUPYTER_NOTEBOOKS_DEEP_DIVE.md
  • Main Programming Language: Python
  • Alternative Programming Languages: None
  • Coolness Level: Level 3: Genuinely Clever
  • Business Potential: 3. The “Service & Support” Model
  • Difficulty: Level 3: Advanced
  • Knowledge Area: MLOps / Software Engineering
  • Software or Tool: nbdev, Jupyter, pytest
  • Main Book: “Building Machine Learning Pipelines” by Hannes Hapke

What you’ll build: A workflow that converts exploratory notebooks into production Python packages—extracting functions, adding tests, generating documentation—using nbdev or manual refactoring patterns.

Why it teaches Jupyter: The biggest criticism of notebooks is “notebook spaghetti”—code that can’t be productionized. This project teaches the discipline of writing production-ready code in notebooks using nbdev’s literate programming approach.

Core challenges you’ll face:

  • Code extraction → maps to nbdev export, manual refactoring
  • Test generation → maps to cells as tests
  • Documentation → maps to docstrings, quarto
  • Packaging → maps to setup.py, pyproject.toml

Resources for key challenges:

Key Concepts:

  • Literate Programming: nbdev philosophy
  • Python Packaging: Python Packaging User Guide
  • Documentation Generation: Quarto documentation
  • CI/CD for Packages: GitHub Actions

Difficulty: Advanced Time estimate: 2-3 weeks Prerequisites: Projects 1-8, Python packaging knowledge

Real world outcome:

Your notebook-based development:

notebooks/
├── 00_core.ipynb       # Core library code
├── 01_data.ipynb       # Data loading utilities
├── 02_models.ipynb     # Model definitions
├── 03_training.ipynb   # Training pipeline
└── index.ipynb         # Documentation homepage

↓ nbdev_export ↓

my_package/
├── __init__.py
├── core.py             # Extracted from 00_core.ipynb
├── data.py             # Extracted from 01_data.ipynb
├── models.py           # Extracted from 02_models.ipynb
└── training.py         # Extracted from 03_training.ipynb

tests/
├── test_core.py        # From test cells in 00_core.ipynb
└── test_data.py        # From test cells in 01_data.ipynb

docs/
├── index.html          # Generated from notebooks
├── core.html
└── ...

# Install as package
pip install -e .

# Use in production
from my_package import train_model
train_model(config)

Implementation Hints:

nbdev workflow:

# In notebook cell, mark for export:
#| export
def my_function(x):
    """This function will be exported."""
    return x * 2

# Test cells: any code cell without #| export is run by nbdev_test,
# so plain assertions double as tests:
assert my_function(2) == 4

# Hide a cell's code in the rendered docs (the cell still executes):
#| echo: false
# This is explanation...
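
One directive the examples above leave implicit: each notebook declares which module it exports to with #| default_exp, usually in its first code cell (module and package names follow the layout shown earlier):

# First cell of 00_core.ipynb
#| default_exp core

# With lib_name = my_package in settings.ini, every #| export cell in this
# notebook is written to my_package/core.py by nbdev_export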

nbdev commands:

# Initialize nbdev
nbdev_new

# Export to Python modules
nbdev_export

# Run tests
nbdev_test

# Build documentation
nbdev_docs

# Prepare for release
nbdev_prepare

Manual refactoring pattern:

# 1. Identify reusable code in notebook
# 2. Extract to functions with docstrings
# 3. Move to .py files
# 4. Import in notebook for testing
# 5. Add unit tests
# 6. Package with setup.py

# Notebook becomes integration test:
from my_package import process_data, train_model

# Interactive development with imported code
df = process_data('data.csv')
model = train_model(df)

Learning milestones:

  1. Set up nbdev → Initialize project
  2. Export code → Cells to modules
  3. Generate tests → Test cells
  4. Build documentation → Quarto output
  5. Publish package → PyPI release

Project 14: Complete Data Science Platform

  • File: LEARN_JUPYTER_NOTEBOOKS_DEEP_DIVE.md
  • Main Programming Language: Python
  • Alternative Programming Languages: R, SQL
  • Coolness Level: Level 5: Pure Magic (Super Cool)
  • Business Potential: 4. The “Open Core” Infrastructure
  • Difficulty: Level 5: Master
  • Knowledge Area: Platform Engineering / Full Stack
  • Software or Tool: JupyterHub, Kubernetes, All previous projects
  • Main Book: “Kubernetes in Action” by Marko Lukša

What you’ll build: A complete data science platform combining JupyterHub (multi-user), MLflow (experiments), Voilà (dashboards), and orchestration (Kubernetes/Docker)—a mini enterprise data science platform.

Why it teaches Jupyter: This capstone project integrates everything: notebooks for development, dashboards for stakeholders, experiment tracking for ML, and scalable infrastructure for teams.

Core challenges you’ll face:

  • JupyterHub on Kubernetes → maps to Zero to JupyterHub
  • Shared storage → maps to NFS, S3
  • User management → maps to OAuth, LDAP
  • Service integration → maps to MLflow, databases

Difficulty: Master Time estimate: 2-3 months Prerequisites: All previous projects, Kubernetes basics

Real world outcome:

Your platform architecture:

┌─────────────────────────────────────────────────────────────────────┐
│                        DATA SCIENCE PLATFORM                         │
├─────────────────────────────────────────────────────────────────────┤
│                                                                      │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐              │
│  │   User A     │  │   User B     │  │   User C     │              │
│  │  (Notebook)  │  │  (Notebook)  │  │  (Notebook)  │              │
│  └──────┬───────┘  └──────┬───────┘  └──────┬───────┘              │
│         │                 │                 │                        │
│         └─────────────────┼─────────────────┘                        │
│                           │                                          │
│                    ┌──────┴──────┐                                   │
│                    │ JupyterHub  │                                   │
│                    │   (Auth)    │                                   │
│                    └──────┬──────┘                                   │
│                           │                                          │
│         ┌─────────────────┼─────────────────┐                        │
│         ▼                 ▼                 ▼                        │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐              │
│  │    MLflow    │  │    Voilà     │  │   Shared     │              │
│  │ (Experiments)│  │ (Dashboards) │  │   Storage    │              │
│  └──────────────┘  └──────────────┘  └──────────────┘              │
│                                                                      │
│  Infrastructure: Kubernetes / Docker Compose                        │
│                                                                      │
└─────────────────────────────────────────────────────────────────────┘

Users experience:
1. Login with company credentials (OAuth)
2. Spawn personal Jupyter environment
3. Access shared data on NFS/S3
4. Track experiments in MLflow
5. Deploy dashboards with Voilà
6. Collaborate in real-time

Implementation Hints:

Docker Compose (development):

# docker-compose.yml
version: '3'
services:
  jupyterhub:
    image: jupyterhub/jupyterhub
    ports:
      - "8000:8000"
    volumes:
      - ./jupyterhub_config.py:/srv/jupyterhub/jupyterhub_config.py
      - ./data:/data
    environment:
      - DOCKER_NETWORK_NAME=ds-platform
  
  mlflow:
    image: ghcr.io/mlflow/mlflow
    ports:
      - "5000:5000"
    command: mlflow server --host 0.0.0.0
    volumes:
      - ./mlruns:/mlruns
  
  voila:
    image: voila/voila
    ports:
      - "8866:8866"
    volumes:
      - ./dashboards:/dashboards
    command: voila --no-browser /dashboards
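
The Compose file mounts a jupyterhub_config.py; a minimal sketch of what it might contain, assuming dockerspawner and oauthenticator are installed and the OAuth values come from your identity provider:

# jupyterhub_config.py
c = get_config()  # injected by JupyterHub at startup

# Spawn each user's server as a container on the shared network
c.JupyterHub.spawner_class = "dockerspawner.DockerSpawner"
c.DockerSpawner.image = "jupyter/scipy-notebook"
c.DockerSpawner.network_name = "ds-platform"

# Log users in with the company OAuth provider
c.JupyterHub.authenticator_class = "oauthenticator.generic.GenericOAuthenticator"
c.GenericOAuthenticator.client_id = "..."      # placeholder
c.GenericOAuthenticator.client_secret = "..."  # placeholder
c.GenericOAuthenticator.oauth_callback_url = "http://localhost:8000/hub/oauth_callback"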

Kubernetes (production):

# Use Zero to JupyterHub
helm repo add jupyterhub https://jupyterhub.github.io/helm-chart/
helm install jhub jupyterhub/jupyterhub --values config.yaml
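
The --values file is where most customization lives; a minimal sketch of config.yaml using the Zero to JupyterHub chart's value names (image, memory limits, and admin user are placeholders to adapt):

# config.yaml for the Zero to JupyterHub Helm chart
singleuser:
  image:
    name: jupyter/scipy-notebook
    tag: latest
  memory:
    limit: 2G
    guarantee: 1G
hub:
  config:
    Authenticator:
      admin_users:
        - alice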

Integration:

# In notebook, connect to platform services

# MLflow
import mlflow
mlflow.set_tracking_uri("http://mlflow:5000")

# Shared storage (S3 via s3fs)
import pandas as pd
import s3fs

fs = s3fs.S3FileSystem(anon=False)
df = pd.read_csv(fs.open('s3://shared-data/dataset.csv'))

# Database (add credentials as needed, e.g. postgresql://user:password@db:5432/analytics)
from sqlalchemy import create_engine
engine = create_engine('postgresql://db:5432/analytics')

Learning milestones:

  1. Deploy JupyterHub → Multi-user Jupyter
  2. Add MLflow → Experiment tracking
  3. Configure storage → Shared data access
  4. Set up auth → OAuth/LDAP
  5. Add Voilà → Dashboard deployment

Project Comparison Table

| # | Project | Difficulty | Time | Key Skill | Fun |
|---|---------|------------|------|-----------|-----|
| 1 | Interactive Data Explorer | ⭐ | Weekend | Cell Execution | ⭐⭐⭐ |
| 2 | Literate Programming Tutorial | ⭐ | 3-5 days | Markdown/LaTeX | ⭐⭐⭐ |
| 3 | Reproducible Research | ⭐⭐ | 1 week | Environment Management | ⭐⭐⭐ |
| 4 | Interactive Dashboard | ⭐⭐ | 1-2 weeks | ipywidgets | ⭐⭐⭐⭐ |
| 5 | Kernel Architecture | ⭐⭐⭐ | 2-3 weeks | ZeroMQ/IPC | ⭐⭐⭐⭐ |
| 6 | Multi-Language Notebook | ⭐⭐ | 1-2 weeks | Polyglot Programming | ⭐⭐⭐⭐ |
| 7 | ML Experiment Tracker | ⭐⭐ | 1-2 weeks | MLflow/W&B | ⭐⭐⭐⭐ |
| 8 | Automated Report Generator | ⭐⭐ | 1 week | Papermill | ⭐⭐⭐ |
| 9 | JupyterLab Extension | ⭐⭐⭐⭐ | 3-4 weeks | TypeScript/React | ⭐⭐⭐⭐ |
| 10 | Real-Time Collaboration | ⭐⭐⭐⭐ | 4-6 weeks | CRDTs/WebSocket | ⭐⭐⭐⭐⭐ |
| 11 | Notebook Testing | ⭐⭐ | 1-2 weeks | pytest/nbval | ⭐⭐⭐ |
| 12 | GPU-Accelerated DS | ⭐⭐⭐ | 2-3 weeks | RAPIDS/cuDF | ⭐⭐⭐⭐ |
| 13 | Notebook-to-Production | ⭐⭐⭐ | 2-3 weeks | nbdev | ⭐⭐⭐⭐ |
| 14 | Complete DS Platform | ⭐⭐⭐⭐⭐ | 2-3 months | Full Stack | ⭐⭐⭐⭐⭐ |

Phase 1: Fundamentals (1-2 weeks)

Understand why notebooks exist and basic usage:

  1. Project 1: Interactive Data Explorer - Core notebook skills
  2. Project 2: Literate Programming Tutorial - Documentation features

Phase 2: Professional Usage (2-3 weeks)

Learn to use notebooks professionally:

  1. Project 3: Reproducible Research - Environment management
  2. Project 4: Interactive Dashboard - Widgets and interactivity
  3. Project 8: Automated Report Generator - Parameterized notebooks

Phase 3: Data Science Workflows (2-3 weeks)

Apply notebooks to data science:

  1. Project 6: Multi-Language Notebook - Polyglot programming
  2. Project 7: ML Experiment Tracker - MLOps integration
  3. Project 11: Notebook Testing - Quality assurance

Phase 4: Advanced Architecture (3-4 weeks)

Understand how notebooks work:

  1. Project 5: Kernel Architecture - Under the hood
  2. Project 9: JupyterLab Extension - Extending Jupyter

Phase 5: Production & Scale (4-8 weeks)

Enterprise-grade notebook workflows:

  1. Project 12: GPU-Accelerated DS - High-performance computing
  2. Project 13: Notebook-to-Production - Code extraction
  3. Project 10: Real-Time Collaboration - Multi-user editing
  4. Project 14: Complete DS Platform - Full infrastructure

Final Project: End-to-End Data Science Workflow

Following the same pattern as above, this capstone applies everything:

What you’ll build: A complete end-to-end data science workflow in notebooks:

  1. Data ingestion from multiple sources (APIs, databases, files)
  2. Exploratory data analysis with rich visualizations
  3. Feature engineering with documentation
  4. Model development with experiment tracking
  5. Interactive dashboard for stakeholders
  6. Automated reporting pipeline
  7. Production package extraction
  8. CI/CD for the entire workflow

This final project demonstrates:

  • When to use notebooks vs. pure code
  • Professional notebook organization
  • Reproducibility at every step
  • From exploration to production

Summary

| # | Project | Main Language |
|---|---------|---------------|
| 1 | Interactive Data Explorer | Python |
| 2 | Literate Programming Tutorial | Python/Markdown |
| 3 | Reproducible Research Document | Python |
| 4 | Interactive Visualization Dashboard | Python |
| 5 | Kernel Architecture Deep Dive | Python |
| 6 | Multi-Language Notebook | Python/R/Julia |
| 7 | Machine Learning Experiment Tracker | Python |
| 8 | Automated Report Generator | Python |
| 9 | JupyterLab Extension | TypeScript |
| 10 | Real-Time Collaborative Notebook | Python/TypeScript |
| 11 | Notebook Testing Framework | Python |
| 12 | GPU-Accelerated Data Science | Python |
| 13 | Notebook-to-Production Pipeline | Python |
| 14 | Complete Data Science Platform | Python |

Resources

Essential Books

  • “Python for Data Analysis” by Wes McKinney - Pandas creator, covers notebooks
  • “Hands-On Machine Learning” by Aurélien Géron - Uses notebooks throughout
  • “Data Science at the Command Line” by Jeroen Janssens - Alternative perspective

Tools

  • Jupyter Notebook: https://jupyter.org/ - Classic interface
  • JupyterLab: https://jupyterlab.readthedocs.io/ - Modern interface
  • Google Colab: https://colab.research.google.com/ - Free GPU notebooks
  • VSCode Jupyter: https://code.visualstudio.com/docs/datascience/jupyter-notebooks
  • nbdev: https://nbdev.fast.ai/ - Notebooks to packages
  • Voilà: https://voila.readthedocs.io/ - Notebooks to dashboards

Documentation

Practice Platforms

  • Kaggle Notebooks: https://www.kaggle.com/code - Data science competitions
  • Observable: https://observablehq.com/ - JavaScript notebooks
  • Deepnote: https://deepnote.com/ - Collaborative notebooks

Total Estimated Time: 4-6 months of dedicated study

After completion: You’ll understand exactly why notebooks exist, when to use them (and when not to), how to build professional data science workflows, and how the entire Jupyter architecture works from browser to kernel. You’ll be able to build interactive dashboards, automate reports, track experiments, and even extend JupyterLab with custom functionality.