Project 10: Custom Keyboard Firmware from Scratch

Build a full keyboard firmware without QMK: matrix scanning, debounce, keymap processing, and USB HID.

Quick Reference

Attribute                          Value
Difficulty                         Level 5: Expert
Time Estimate                      2-4 months
Main Programming Language          C or Rust
Alternative Programming Languages  Zig
Coolness Level                     Level 5: Pure Magic
Business Potential                 Level 5: Industry Disruptor
Prerequisites                      Projects 1-6, USB basics
Key Topics                         Bare-metal firmware, HID stack, timing

1. Learning Objectives

By completing this project, you will:

  1. Design a firmware architecture for matrix scanning and key processing.
  2. Implement per-key debounce and layer handling from scratch.
  3. Integrate a USB HID stack and send valid reports.
  4. Measure and optimize scan latency and memory usage.
  5. Produce a stable firmware that enumerates and types reliably.

2. All Theory Needed (Per-Concept Breakdown)

2.1 Real-Time Scan Loop, Debounce, and Event Pipeline

Fundamentals

A keyboard firmware is a real-time system. It must scan the matrix at a consistent rate, detect changes, debounce them, and translate them into key events. This pipeline must be deterministic and low-latency; otherwise, keypresses feel sluggish or are missed. The scan loop is the heart of the firmware and must be designed to avoid long blocking operations.

Deep Dive into the Concept

The scan loop is a periodic task that cycles through all columns (or rows), reads the inputs, and produces a raw matrix snapshot. The interval between the starts of consecutive scans is the scan period. Many firmware designs aim for 1 kHz scanning, which yields 1 ms scan periods. With a debounce that waits for the signal to settle, the latency for a keypress is up to one scan period plus the debounce time, so a 1 ms scan with a 5 ms debounce reports a press roughly 5-6 ms after contact, before any USB polling delay. Your firmware must balance low latency with stability.

A clean architecture separates stages: raw scan, debounce, event generation, key processing, and report generation. Each stage has clear inputs and outputs. For example, raw scan produces a boolean matrix. Debounce produces a stable matrix. Event generation compares the stable matrix to the previous stable matrix and emits press/release events. Key processing applies layers, macros, and modifiers. Report generation builds the HID report. This pipeline makes debugging easier because you can inspect each stage independently.

Debounce is a per-key filter. A common algorithm uses a counter per key that increments on each scan in which the raw state differs from the stable state and resets when they agree. When the counter reaches a threshold, the stable state changes. This requires per-key state (stable state, counter), typically one or two bytes of RAM per key, which adds up on large matrices and small MCUs but gives reliable results. An alternative is a time-based debounce using timestamps, which can be more precise but requires a timer and more RAM per key. Either way, you must ensure debounce is deterministic and does not stall the scan loop.
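The counter algorithm described above can be sketched as follows. This is a minimal illustration, not a reference implementation: the names key_state_t, debounce_key, and DEBOUNCE_SCANS are hypothetical, and the threshold of 5 scans assumes a 1 kHz scan rate.

```c
#include <stdbool.h>
#include <stdint.h>

#define DEBOUNCE_SCANS 5      /* assumption: 5 scans, about 5 ms at 1 kHz */

typedef struct {
    bool stable;      /* last accepted (debounced) state */
    uint8_t counter;  /* consecutive scans where raw != stable */
} key_state_t;

/* Feed one raw sample for one key. Returns true when the stable
 * state changed on this call (a press or release event). */
bool debounce_key(key_state_t *k, bool raw)
{
    if (raw == k->stable) {
        k->counter = 0;           /* bounce ended or nothing changed */
        return false;
    }
    if (++k->counter >= DEBOUNCE_SCANS) {
        k->stable = raw;          /* accept the new state */
        k->counter = 0;
        return true;
    }
    return false;
}
```

Calling debounce_key once per key on every scan keeps the work per scan constant, which helps the loop stay deterministic.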

Real-time constraints demand that the scan loop be predictable. Long operations (like USB control transfers or complex macros) should be done outside the critical scan path. If you implement a cooperative scheduler, ensure that the scan loop runs at a fixed interval. You can use a hardware timer interrupt to trigger scans or a software loop with time checks. If you choose a timer interrupt, keep the ISR short and delegate heavy work to the main loop.
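One way to get a fixed interval from a software loop with time checks is sketched below. It assumes a free-running 32-bit microsecond counter is available on the target; scan_timer_t, scan_due, and SCAN_PERIOD_US are hypothetical names introduced for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

#define SCAN_PERIOD_US 1000u   /* assumption: 1 kHz scan rate */

typedef struct {
    uint32_t next_due_us;      /* absolute deadline of the next scan */
} scan_timer_t;

/* Returns true when a scan is due. Advancing the deadline by a fixed
 * step (rather than "now + period") keeps the average rate exact even
 * if one call arrives late. The signed difference stays correct when
 * the 32-bit counter wraps, assuming the usual two's-complement
 * conversion that GCC and Clang provide. */
bool scan_due(scan_timer_t *t, uint32_t now_us)
{
    if ((int32_t)(now_us - t->next_due_us) < 0) {
        return false;          /* deadline not reached yet */
    }
    t->next_due_us += SCAN_PERIOD_US;
    return true;
}
```

In a main loop this would be polled on every pass, with the matrix scan run only when scan_due returns true and the USB task run on every pass. How the current time is read depends on the MCU and is left out here.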

The event pipeline is also where you implement key rollover. A boot keyboard report carries at most six simultaneous non-modifier keys, plus eight modifier bits. If you want NKRO, you must implement a different report format (typically a bitmap with one bit per key) and manage state accordingly. Even if you stick with 6KRO, you must handle the case where more than six keys are pressed. Common strategies are to drop the extra keys, to prioritize the latest ones, or to fill all six slots with the HID ErrorRollOver usage (0x01) until enough keys are released. This is a design decision that should be documented.
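As an illustration of handling the six-key limit, the sketch below builds a boot-format report and, when more than six keys are held, fills every slot with the HID ErrorRollOver usage (0x01). This is one possible policy, different from dropping extra keys or prioritizing the latest ones; boot_report_t and build_boot_report are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define REPORT_KEYS 6
#define KEY_ERR_ROLLOVER 0x01  /* HID usage "Keyboard ErrorRollOver" */

typedef struct {
    uint8_t mods;               /* one bit per modifier key */
    uint8_t reserved;           /* always 0 in the boot format */
    uint8_t keys[REPORT_KEYS];  /* up to six keycodes, 0 = empty slot */
} boot_report_t;

/* Build a report from the currently pressed non-modifier keycodes.
 * If more than six are held, every slot reports ErrorRollOver. */
void build_boot_report(boot_report_t *r, uint8_t mods,
                       const uint8_t *pressed, size_t count)
{
    r->mods = mods;
    r->reserved = 0;
    if (count > REPORT_KEYS) {
        memset(r->keys, KEY_ERR_ROLLOVER, REPORT_KEYS);
        return;
    }
    memset(r->keys, 0, REPORT_KEYS);
    if (count > 0) {
        memcpy(r->keys, pressed, count);
    }
}
```

Because the function is pure, it can be exercised on the host before any USB code exists.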

Finally, deterministic logging is your friend. When building from scratch, you will have bugs in the scan loop, debounce, or event generation. Implement a debug output (UART or RTT) and log state changes with timestamps. This is how you will verify scan timing and spot stuck keys.

Consider adding a small cooperative scheduler that runs scanning, USB tasks, and background features in turn, each with a bounded run time. This prevents any single task from starving the scan loop. If you use timer interrupts for scanning, ensure the interrupt priority is appropriate and that shared data is protected (e.g., with atomic access or double-buffering). In a bare-metal system, even simple logging can distort timing; include a compile-time flag to disable logging in performance tests. Also, document the decision on key rollover and ghost handling; these are user-visible behaviors.

Consider packing the matrix into bitfields to reduce RAM usage. A 10x10 matrix needs 10 bits per row, so it fits into 20 bytes if each row is a 16-bit bitmask (or 13 bytes if all 100 bits are packed tightly), compared with 100 bytes for one bool per key. This makes comparisons and event generation faster as well. When implementing ghost prevention, assume diodes and document that your firmware expects them; otherwise, you need extra logic to detect ghosting rectangles, which adds complexity and latency.
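A row-bitmask layout for a 10x10 matrix might look like the sketch below. Since 10 columns need 10 bits, each row uses a uint16_t, for 20 bytes in total. The names matrix_t, matrix_set, matrix_get, and matrix_differs are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define ROWS 10
#define COLS 10   /* needs 10 bits per row, so uint16_t, not uint8_t */

typedef struct {
    uint16_t row[ROWS];   /* bit c of row[r] is set when key (r, c) is down */
} matrix_t;

static inline void matrix_set(matrix_t *m, uint8_t r, uint8_t c, bool down)
{
    uint16_t mask = (uint16_t)(1u << c);
    if (down) {
        m->row[r] |= mask;
    } else {
        m->row[r] &= (uint16_t)~mask;
    }
}

static inline bool matrix_get(const matrix_t *m, uint8_t r, uint8_t c)
{
    return ((m->row[r] >> c) & 1u) != 0;
}

/* True when any key differs between two snapshots; one compare per row. */
static inline bool matrix_differs(const matrix_t *a, const matrix_t *b)
{
    for (uint8_t r = 0; r < ROWS; r++) {
        if (a->row[r] != b->row[r]) {
            return true;
        }
    }
    return false;
}
```

Comparing whole rows means an idle matrix costs ten integer compares per scan instead of one hundred.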

How this fits into the projects

This concept builds on Project 1 and Project 3 and is the core of Project 10. It also feeds into Project 11 when designing production firmware.

Definitions & key terms

  • Scan period: Time between the starts of consecutive full matrix scans.
  • Debounce: Filtering mechanical bounce to produce stable events.
  • Event pipeline: Sequence from raw scan to HID report.
  • Rollover: Number of simultaneous keys supported.

Mental model diagram (ASCII)

Raw Scan -> Debounce -> Events -> Key Processing -> HID Report

How it works (step-by-step, with invariants and failure modes)

  1. Drive one column active and read rows.
  2. Repeat for all columns to form raw matrix.
  3. Apply debounce to each key.
  4. Compare to previous stable state to generate events.
  5. Process events to update modifiers and keycodes.

Invariant: scan loop runs at fixed interval. Failure modes include jitter, blocking operations, and missing key releases.

Minimal concrete example

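for (uint8_t col = 0; col < COLS; col++) {
  drive_col(col, LOW);           // select this column (active low)
  delay_us(SETTLE_US);           // hypothetical helper: let the lines settle
  raw[col] = read_rows();        // bitmask of rows pressed in this column
  drive_col(col, HIGH);          // deselect before moving on
}
debounce_update(raw, stable);    // run every scan so counters advance and reset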

Common misconceptions

  • “Debounce is optional”: It is mandatory for mechanical switches.
  • “Scan loop can be slow”: Slow scan increases latency.
  • “Events can be generated directly from raw”: Raw includes bounce noise.

Check-your-understanding questions

  1. Why is a fixed scan period important?
  2. What happens if you process USB tasks inside the scan loop?
  3. How do you detect key release events?

Check-your-understanding answers

  1. It provides consistent latency and makes debounce predictable.
  2. It can block scanning and cause missed keys.
  3. Compare current stable matrix with previous stable matrix.

Real-world applications

  • Keyboard firmware projects such as QMK, TMK, and ZMK use variations of this pipeline.

Where you’ll apply it

  • In this project: §3.2 Functional Requirements and §5.10 Phase 1.
  • Also used in: Project 1.

References

  • “Making Embedded Systems” Ch. 4-5.
  • QMK scanning and debounce docs.

Key insights

A reliable keyboard is a real-time pipeline; timing discipline is the core requirement.

Summary

Design a deterministic scan pipeline and keep it separate from heavy processing to achieve low-latency, reliable key detection.

Homework/Exercises to practice the concept

  1. Implement a debounce counter per key in a small 2x2 matrix.
  2. Measure scan period jitter with a GPIO toggle.

Solutions to the homework/exercises

  1. Use a counter array and change stable state after threshold.
  2. Use a logic analyzer to observe scan loop timing.

2.2 USB HID Stack Integration and Memory Constraints

Fundamentals

To communicate with the host, your firmware must implement USB HID. This includes descriptors, endpoint configuration, and report generation. At the same time, you must manage limited flash and RAM. A small MCU can easily run out of memory if you allocate large buffers or enable too many features. A minimal, well-structured HID stack is essential.

Deep Dive into the Concept

USB HID integration is both a protocol problem and a resource problem. You need to define device, configuration, interface, HID, and endpoint descriptors. The HID report descriptor defines the report format, and your report generation must match it exactly. If you choose a boot keyboard format, the report is 8 bytes; if you choose NKRO, the report can be larger and requires bitfields. The USB stack must respond to standard requests (GET_DESCRIPTOR, SET_CONFIGURATION) and HID-specific requests (SET_PROTOCOL, SET_IDLE).
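To make the link between descriptor and report size concrete, the sketch below lists a boot-compatible keyboard report descriptor, following the well-known example in Appendix E.6 of the USB HID 1.11 specification, and a small host-side checker that sums the bits declared by Input items. The checker (hid_input_bits) is a hypothetical helper that handles short items only; it is not a general HID parser.

```c
#include <stddef.h>
#include <stdint.h>

static const uint8_t kbd_report_desc[] = {
    0x05, 0x01,  /* Usage Page (Generic Desktop) */
    0x09, 0x06,  /* Usage (Keyboard) */
    0xA1, 0x01,  /* Collection (Application) */
    0x05, 0x07,  /*   Usage Page (Keyboard/Keypad) */
    0x19, 0xE0,  /*   Usage Minimum (224, Left Control) */
    0x29, 0xE7,  /*   Usage Maximum (231, Right GUI) */
    0x15, 0x00,  /*   Logical Minimum (0) */
    0x25, 0x01,  /*   Logical Maximum (1) */
    0x75, 0x01,  /*   Report Size (1) */
    0x95, 0x08,  /*   Report Count (8) */
    0x81, 0x02,  /*   Input (Data, Variable, Absolute): modifier byte */
    0x95, 0x01,  /*   Report Count (1) */
    0x75, 0x08,  /*   Report Size (8) */
    0x81, 0x01,  /*   Input (Constant): reserved byte */
    0x95, 0x05,  /*   Report Count (5) */
    0x75, 0x01,  /*   Report Size (1) */
    0x05, 0x08,  /*   Usage Page (LEDs) */
    0x19, 0x01,  /*   Usage Minimum (1) */
    0x29, 0x05,  /*   Usage Maximum (5) */
    0x91, 0x02,  /*   Output (Data, Variable, Absolute): LED bits */
    0x95, 0x01,  /*   Report Count (1) */
    0x75, 0x03,  /*   Report Size (3) */
    0x91, 0x01,  /*   Output (Constant): LED padding */
    0x95, 0x06,  /*   Report Count (6) */
    0x75, 0x08,  /*   Report Size (8) */
    0x15, 0x00,  /*   Logical Minimum (0) */
    0x25, 0x65,  /*   Logical Maximum (101) */
    0x05, 0x07,  /*   Usage Page (Keyboard/Keypad) */
    0x19, 0x00,  /*   Usage Minimum (0) */
    0x29, 0x65,  /*   Usage Maximum (101) */
    0x81, 0x00,  /*   Input (Data, Array): six keycode slots */
    0xC0         /* End Collection */
};

/* Sum the bits declared by Input main items (short items only). */
size_t hid_input_bits(const uint8_t *d, size_t len)
{
    size_t bits = 0;
    uint32_t report_size = 0, report_count = 0;
    size_t i = 0;
    while (i < len) {
        uint8_t prefix = d[i];
        size_t n = prefix & 0x03u;         /* data length code */
        if (n == 3) n = 4;                 /* code 3 means 4 data bytes */
        if (i + 1 + n > len) break;        /* truncated item */
        uint32_t value = 0;
        for (size_t b = 0; b < n; b++) {
            value |= (uint32_t)d[i + 1 + b] << (8 * b);
        }
        switch (prefix & 0xFCu) {          /* tag and type bits */
        case 0x74: report_size = value; break;                 /* Report Size */
        case 0x94: report_count = value; break;                /* Report Count */
        case 0x80: bits += report_size * report_count; break;  /* Input */
        default: break;
        }
        i += 1 + n;
    }
    return bits;
}
```

For this descriptor the Input items add up to 8 + 8 + 48 = 64 bits, which is the 8-byte boot report. A check like this in a host-side test catches a descriptor edit that silently changes the report length.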

You can build on a USB stack like TinyUSB or implement your own minimal stack. If you implement your own, you must handle control transfers on endpoint 0, manage USB interrupts, and schedule IN transfers on the HID interrupt endpoint. This is complex but instructive. TinyUSB simplifies the transport layer so you can focus on report generation and application logic. Regardless of stack choice, you must ensure the USB task does not block your scan loop. Typically, you call a usb_task() function in the main loop to handle control requests and send reports.

Memory constraints shape architecture. The matrix state, debounce counters, keymap arrays, and USB buffers all consume RAM. For example, a 10x10 matrix with per-key debounce state at one or two bytes per key needs 100-200 bytes. The HID report buffer is small (8-16 bytes), but if you add NKRO or macro queues, memory grows. On small MCUs, you must carefully size arrays and place constant data in flash. In C, mark such data const, and on AVR also use PROGMEM, because const alone does not keep data out of RAM there. In Rust, use static arrays and avoid heap allocation.
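A RAM budget can be checked at compile time rather than discovered at run time. The sketch below assumes a 10x10 matrix, a 2-byte per-key state struct, and an arbitrary 512-byte budget; all names and the budget value are illustrative assumptions.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define ROWS 10
#define COLS 10
#define RAM_BUDGET_BYTES 512   /* assumption: budget chosen for this sketch */

typedef struct { bool stable; uint8_t counter; } key_state_t;

/* Dynamic state: lives in RAM. */
static key_state_t key_state[ROWS * COLS];
static uint16_t raw_rows[ROWS];
static uint8_t hid_report[8];

/* Constant data: const lets most Cortex-M toolchains keep it in flash.
 * Only the first entry is filled in here (0x04 is the HID usage for 'a'). */
static const uint8_t keymap[ROWS][COLS] = { { 0x04 } };

/* The build fails if the static state outgrows the budget. */
_Static_assert(sizeof key_state + sizeof raw_rows + sizeof hid_report
               <= RAM_BUDGET_BYTES, "static RAM state exceeds budget");

size_t static_ram_bytes(void)
{
    return sizeof key_state + sizeof raw_rows + sizeof hid_report;
}

size_t flash_table_bytes(void)
{
    return sizeof keymap;
}
```

With these sizes the static state is 200 + 20 + 8 = 228 bytes. The stack and any USB stack buffers come on top of that and need their own accounting.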

Flash constraints matter too. Each feature (layers, macros, RGB) adds code. For a from-scratch firmware, you should define a minimal set of features and add optional modules. This is similar to QMK’s feature flags but implemented by you. Build-time configuration (e.g., C preprocessor flags) lets you compile out unused features. This is how you keep the firmware within flash limits and maintain fast builds.

The final challenge is testing and validation. A USB device can appear to enumerate but still fail to deliver correct reports. Use host tools like lsusb -v and USB sniffers to validate descriptor correctness. On the device side, add debug logging for report bytes and state changes. This is the only reliable way to debug a bare-metal USB stack.

Place constant lookup tables (keymaps, ASCII maps) in flash and keep RAM for dynamic state only. Enable compiler flags like -ffunction-sections and -fdata-sections together with linker garbage collection (-Wl,--gc-sections) to reduce flash usage. Monitor stack usage; a deep call chain in USB handling can overflow small stacks. If your MCU supports DMA for USB or UART, consider using it to offload transfers, but keep the driver minimal. Always include a build-time memory map report and check it in CI to avoid regressions.

Build a small unit test harness for the report generator that runs on your host machine. This lets you validate report bytes without flashing hardware every time. For memory, generate a link map and inspect large sections. If your firmware grows unexpectedly, use nm or size to identify bloated symbols and refactor. Consider a small ring buffer for key events so that you can decouple scanning from USB report generation under heavy load.
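A key-event ring buffer of the kind mentioned above could be sketched as follows. It assumes a single producer and a single consumer both running in the main loop; if the producer were an ISR, the indices would need volatile or atomic handling that is not shown. The names and the queue size are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define EVQ_SIZE 16u   /* assumption: power of two so the index mask works */

typedef struct {
    uint8_t row, col;
    bool pressed;
} key_event_t;

typedef struct {
    key_event_t buf[EVQ_SIZE];
    uint8_t head;   /* next slot to write */
    uint8_t tail;   /* next slot to read */
} event_queue_t;

/* Returns false when the queue is full so the caller can count drops.
 * One slot is kept empty to tell "full" from "empty". */
bool evq_push(event_queue_t *q, key_event_t ev)
{
    uint8_t next = (uint8_t)((q->head + 1u) & (EVQ_SIZE - 1u));
    if (next == q->tail) {
        return false;
    }
    q->buf[q->head] = ev;
    q->head = next;
    return true;
}

/* Returns false when the queue is empty. */
bool evq_pop(event_queue_t *q, key_event_t *out)
{
    if (q->tail == q->head) {
        return false;
    }
    *out = q->buf[q->tail];
    q->tail = (uint8_t)((q->tail + 1u) & (EVQ_SIZE - 1u));
    return true;
}
```

The scan side pushes events as they are generated, and the USB side pops them when the endpoint is ready, so a slow host does not stall scanning.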

How this fits into the projects

This concept is built on Project 2 and is essential for Project 10. It also informs Project 11 when you design production firmware variants.

Definitions & key terms

  • Endpoint 0: Control endpoint used for enumeration.
  • Interrupt IN endpoint: Polled endpoint for HID reports.
  • NKRO: N-key rollover, allowing many simultaneous keys.
  • Flash/RAM budget: Limits on program and data memory.

Mental model diagram (ASCII)

Descriptors -> Enumeration -> HID Endpoint -> Reports
Memory: matrix + debounce + report buffers + keymap

How it works (step-by-step, with invariants and failure modes)

  1. Provide descriptors and handle enumeration requests.
  2. Configure HID interrupt IN endpoint.
  3. Generate report from key state.
  4. Send reports when polled.

Invariant: report size must match descriptor. Failure modes include enumeration failures, stuck keys, and memory overflows.

Minimal concrete example

// Boot keyboard report: modifier bits, reserved byte, six keycode slots.
uint8_t report[8] = {mods, 0, k1, k2, k3, k4, k5, k6};
usb_send(report, sizeof report);

Common misconceptions

  • “USB stack doesn’t affect timing”: It can block if implemented poorly.
  • “Memory is abundant”: Small MCUs have tight RAM constraints.
  • “Enumeration success means correct reports”: Reports can still be wrong.

Check-your-understanding questions

  1. Why must the report size match the descriptor?
  2. What are the key RAM consumers in a keyboard firmware?
  3. How do you keep USB tasks from blocking the scan loop?

Check-your-understanding answers

  1. The host parses each report using the field layout and length declared in the report descriptor; a mismatch yields garbage or ignored input.
  2. Matrix state, debounce counters, keymap data, and USB buffers.
  3. Run USB tasks in the main loop and keep ISR minimal.

Real-world applications

  • Custom firmware for embedded USB devices.
  • Specialized input devices with proprietary features.

Where you’ll apply it

  • In this project: §3.2 Functional Requirements and §5.10 Phase 3.
  • Also used in: Project 2.

References

  • “USB Complete” Ch. 3-6.
  • “The Definitive Guide to ARM Cortex-M” Ch. 1-3.

Key insights

USB integration and memory constraints define the architecture of a from-scratch firmware.

Summary

A minimal USB HID stack plus a disciplined memory strategy is the foundation of a reliable custom firmware.

Homework/Exercises to practice the concept

  1. Implement a minimal descriptor and verify with lsusb.
  2. Estimate RAM usage for a 6x15 matrix with debounce counters.

Solutions to the homework/exercises

  1. Use boot keyboard descriptor and check descriptor length in host output.
  2. 90 keys; if each uses 2 bytes, about 180 bytes for counters plus state arrays.

3. Project Specification

3.1 What You Will Build

A complete keyboard firmware that:

  • Scans a matrix and debounces key presses.
  • Implements a layer-aware keymap.
  • Enumerates as a USB HID keyboard.
  • Sends correct press/release reports.

3.2 Functional Requirements

  1. Matrix scan: stable scan loop at 500-1000 Hz.
  2. Debounce: per-key debounce with configurable window.
  3. Keymap: at least 3 layers with transparent keys.
  4. USB HID: descriptors and interrupt IN endpoint.

3.3 Non-Functional Requirements

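  • Latency: < 5 ms from debounced state change to queued USB report; the debounce window and the host polling interval add to what the user experiences.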
  • Memory: fits within target MCU RAM/flash.
  • Reliability: no stuck keys or missed presses.

3.4 Example Usage / Output

$ arm-none-eabi-size firmware.elf
   text    data     bss     dec     hex filename
  15824     256    4096   20176    4ed0 firmware.elf

3.5 Data Formats / Schemas / Protocols

  • USB HID descriptors and reports.

3.6 Edge Cases

  • More than 6 keys pressed (boot protocol limit).
  • Scan loop jitter causing missed debounces.
  • USB enumeration failures.

3.7 Real World Outcome

Your firmware enumerates and types reliably on a host.

3.7.1 How to Run (Copy/Paste)

make
make flash

3.7.2 Golden Path Demo (Deterministic)

  • Scan rate at 1000 Hz.
  • Debounce window 5 ms.
  • Typing “HELLO” once with correct press/release reports.

3.7.3 Terminal Transcript

$ make
Build successful
exit_code=0

$ make flash
Flashing complete
exit_code=0

$ make flash
error: device not found
exit_code=3

4. Solution Architecture

4.1 High-Level Design

Matrix Scan -> Debounce -> Keymap -> USB HID

4.2 Key Components

Component   Responsibility                  Key Decisions
Scanner     Read matrix state               Scan rate, settle time
Debouncer   Filter bounce                   Counter vs time-based
Keymap      Resolve layers and keycodes     Data structure choice
USB Stack   Enumeration and report sending  TinyUSB vs custom

4.3 Data Structures (No Full Code)

typedef struct { bool stable; uint8_t counter; } key_state_t;  /* the name key_t would clash with POSIX <sys/types.h> */

4.4 Algorithm Overview

Key Algorithm: Event Generation

  1. Compare stable matrix with previous state.
  2. For each changed key, emit press or release.
  3. Update active key list for report.

Complexity Analysis:

  • Time: O(R * C) per scan
  • Space: O(R * C)
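The event generation steps above can be sketched with row bitmasks and XOR. This is a minimal illustration assuming a 4x8 matrix with one uint8_t per row; key_event_t and generate_events are hypothetical names. If the output array fills up, the remaining changes are left in prev so they are reported on the next call.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define ROWS 4   /* assumption: small matrix for illustration */
#define COLS 8   /* one uint8_t bitmask per row */

typedef struct {
    uint8_t row, col;
    bool pressed;
} key_event_t;

/* Compare the new stable matrix with the previous one, write one event
 * per changed key into out, and update prev for each reported change.
 * Returns the number of events written (at most max_out). */
size_t generate_events(uint8_t prev[ROWS], const uint8_t now[ROWS],
                       key_event_t *out, size_t max_out)
{
    size_t n = 0;
    for (uint8_t r = 0; r < ROWS; r++) {
        uint8_t changed = (uint8_t)(prev[r] ^ now[r]);  /* 1 bits = changed keys */
        for (uint8_t c = 0; c < COLS && changed != 0; c++) {
            uint8_t bit = (uint8_t)(1u << c);
            if (!(changed & bit)) {
                continue;
            }
            if (n == max_out) {
                return n;                 /* keep the rest for the next call */
            }
            out[n].row = r;
            out[n].col = c;
            out[n].pressed = (now[r] & bit) != 0;
            n++;
            prev[r] ^= bit;               /* mark this change as reported */
            changed ^= bit;
        }
    }
    return n;
}
```

The worst case still visits every key, matching the O(R * C) bound, but rows with no changes cost a single XOR and compare.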

5. Implementation Guide

5.1 Development Environment Setup

arm-none-eabi-gcc --version
make

5.2 Project Structure

firmware/
├── src/
│   ├── main.c
│   ├── matrix.c
│   ├── debounce.c
│   ├── keymap.c
│   └── usb_hid.c
├── include/
│   └── config.h
└── Makefile

5.3 The Core Question You’re Answering

“Can I design a reliable keyboard firmware architecture without relying on QMK?”

5.4 Concepts You Must Understand First

  1. Scan loop timing and debounce.
  2. USB HID descriptors and reports.
  3. Memory budgeting and feature trade-offs.

5.5 Questions to Guide Your Design

  1. What scan rate yields good latency without excessive CPU load?
  2. Which debounce algorithm is simplest and reliable?
  3. Will you support boot protocol or NKRO?

5.6 Thinking Exercise

Sketch the full firmware pipeline and label which parts are time-critical.

5.7 The Interview Questions They’ll Ask

  1. How does your firmware ensure deterministic scan timing?
  2. How do you handle more than six simultaneous keys?
  3. How do you validate USB descriptors?

5.8 Hints in Layers

Hint 1: Start with scan + debug output. Prove raw matrix readings before debounce.

Hint 2: Add debounce next. Verify with a bouncing key simulation.

Hint 3: Integrate USB last. Use a known-good HID descriptor.

Hint 4: Measure latency. Use GPIO toggles and a logic analyzer.

5.9 Books That Will Help

Topic             Book                                    Chapter
ARM fundamentals  “The Definitive Guide to ARM Cortex-M”  Ch. 1-3
USB               “USB Complete”                          Ch. 3-6
C safety          “Effective C”                           Ch. 2-4

5.10 Implementation Phases

Phase 1: Scan + Debounce (3-4 weeks)

Goals: stable scan loop and debounced matrix

Tasks:

  1. Implement matrix scan.
  2. Add per-key debounce and event generation.

Checkpoint: Press/release events match expected behavior.

Phase 2: Keymap + Layers (3-4 weeks)

Goals: layer-aware key processing

Tasks:

  1. Implement keymap arrays and layer stack.
  2. Add transparent key handling.

Checkpoint: Layer switching works correctly.
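One possible shape for layer resolution with transparent keys is sketched below. The sentinel value KC_TRNS, the tiny 3-layer keymap, and the function resolve_keycode are assumptions made for illustration; the keycodes are standard HID usage IDs.

```c
#include <stdint.h>

#define NUM_LAYERS 3
#define NUM_KEYS   4      /* assumption: tiny keymap for illustration */
#define KC_NO      0x00   /* no key */
#define KC_TRNS    0x01   /* assumption: sentinel meaning "fall through" */

/* HID usage IDs: 0x04-0x07 = a-d, 0x1E = '1', 0x20 = '3', 0x3A = F1. */
static const uint8_t keymap[NUM_LAYERS][NUM_KEYS] = {
    { 0x04,    0x05,    0x06, 0x07    },   /* layer 0: a b c d */
    { 0x1E,    KC_TRNS, 0x20, KC_TRNS },   /* layer 1: 1 _ 3 _ */
    { KC_TRNS, KC_TRNS, 0x3A, KC_NO   },   /* layer 2: _ _ F1, blocked */
};

/* active_layers has bit n set when layer n is on. Search from the
 * highest active layer down, skipping transparent entries. */
uint8_t resolve_keycode(uint8_t active_layers, uint8_t key)
{
    for (int layer = NUM_LAYERS - 1; layer >= 0; layer--) {
        if (!(active_layers & (1u << layer))) {
            continue;                      /* layer is off */
        }
        uint8_t kc = keymap[layer][key];
        if (kc != KC_TRNS) {
            return kc;                     /* first non-transparent entry wins */
        }
    }
    return KC_NO;
}
```

A detail worth deciding early: if layers can change while a key is held, the keycode resolved at press time probably needs to be remembered so that the release clears the same keycode.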

Phase 3: USB HID (4-6 weeks)

Goals: enumeration and reports

Tasks:

  1. Implement descriptors and HID endpoint.
  2. Send reports and verify on host.

Checkpoint: Keyboard types correctly on two OSes.

5.11 Key Implementation Decisions

Decision   Options            Recommendation  Rationale
USB stack  TinyUSB vs custom  TinyUSB         Reduces complexity
Debounce   Counter vs time    Counter         Simple and deterministic
Rollover   6KRO vs NKRO       6KRO            Simplest for first build

6. Testing Strategy

6.1 Test Categories

Category          Purpose                    Examples
Timing Tests      Measure scan period        GPIO toggle timing
Functional Tests  Press/release correctness  Key event logs
USB Tests         Enumeration and reports    lsusb, usbmon

6.2 Critical Test Cases

  1. Key press + release produces exactly one event each.
  2. Scan period stays within 1-2 ms.
  3. USB enumeration succeeds on two OSes.

6.3 Test Data

Scan rate target: 1000 Hz
Debounce: 5 ms

7. Common Pitfalls & Debugging

7.1 Frequent Mistakes

Pitfall                 Symptom                   Solution
Long ISR blocking scan  Missed keys               Keep ISRs minimal
Descriptor mismatch     Enumerates but no typing  Compare with known descriptor
Memory overflow         Random crashes            Reduce buffers or features

7.2 Debugging Strategies

  • Log every event in early stages to verify timing.
  • Use a USB analyzer for descriptor validation.

7.3 Performance Traps

Logging too much over UART can slow the scan loop. Use conditional logging.


8. Extensions & Challenges

8.1 Beginner Extensions

  • Add media keys.
  • Add a simple macro engine.

8.2 Intermediate Extensions

  • Add NKRO support.
  • Add per-key RGB with a simple driver.

8.3 Advanced Extensions

  • Implement a custom bootloader.
  • Add dynamic keymap storage in flash.

9. Real-World Connections

9.1 Industry Applications

  • Custom firmware for keyboard products.
  • Specialized USB input devices.

9.2 Related Open Source Projects

  • TinyUSB.
  • QMK core (for reference).

9.3 Interview Relevance

  • Demonstrates full-stack embedded system design.

10. Resources

10.1 Essential Reading

  • “USB Complete” Ch. 3-6.
  • “The Definitive Guide to ARM Cortex-M” Ch. 1-3.

10.2 Video Resources

  • USB HID implementation talks.

10.3 Tools & Documentation

  • TinyUSB docs, lsusb, usbmon.

11. Self-Assessment Checklist

11.1 Understanding

  • I can explain the full scan pipeline.
  • I can describe USB HID enumeration.

11.2 Implementation

  • Firmware types correctly on host.
  • Scan rate is stable and measured.

11.3 Growth

  • I can evaluate memory usage and optimize.

12. Submission / Completion Criteria

Minimum Viable Completion:

  • Matrix scan + debounce works.
  • USB HID enumerates and types.

Full Completion:

  • Layer system and macros implemented.
  • Latency < 5 ms.

Excellence (Going Above & Beyond):

  • NKRO support and dynamic keymaps.
  • Comprehensive timing and memory report.