Project 1: NUMA Topology Explorer
Build a CLI tool that discovers and explains the real NUMA topology of a Linux machine, including nodes, CPUs, memory, caches, and inter-node distances.
Quick Reference
| Attribute | Value |
|---|---|
| Difficulty | Level 1: Beginner |
| Time Estimate | 1-2 weeks |
| Main Programming Language | C |
| Alternative Programming Languages | Rust, Python |
| Coolness Level | Level 2: Practical but Forgettable |
| Business Potential | 1. The “Resume Gold” |
| Prerequisites | Basic C, Linux CLI usage, filesystems, and process basics |
| Key Topics | NUMA topology, distance matrices, Linux sysfs, libnuma, cpulist parsing |
1. Learning Objectives
By completing this project, you will:
- Explain how NUMA nodes map to sockets, memory controllers, and CPU sets.
- Read NUMA topology data from Linux sysfs and validate it against libnuma.
- Parse cpulist/cpumask encodings and convert them into explicit CPU lists.
- Build a distance matrix and interpret relative memory access costs.
- Handle non-NUMA systems and virtualized environments gracefully.
- Provide deterministic topology reports using captured sysfs fixtures.
- Communicate topology findings clearly in both human-readable and JSON formats.
2. All Theory Needed (Per-Concept Breakdown)
2.1 NUMA Topology and Distance Matrices
Fundamentals
A NUMA system is organized into nodes, each containing a subset of CPUs and a portion of physical memory that is “local” to those CPUs. Accessing local memory is faster than accessing memory attached to another node because the request must traverse an interconnect (UPI, Infinity Fabric, CCIX, etc.). Operating systems model this topology with a distance matrix: a table of relative costs to access memory on each node. The numbers are not absolute latency; they are normalized costs where the smallest value is the local node. Understanding that these distances are relative is crucial for placement decisions, because a difference of a few points can still represent a measurable latency penalty. NUMA topology is the base map for every scheduling and memory policy decision you will make later.
Deep Dive into the Concept
NUMA topology emerges from how a physical machine is built. A multi-socket server typically has one or more memory controllers per socket. Each controller is wired to a bank of DRAM channels and a set of cores. When a core issues a load or store, its memory controller can service the request locally or forward it across a coherent interconnect to another socket. This extra hop introduces latency, adds congestion, and consumes bandwidth on the interconnect.
Operating systems do not model the machine at a wire level; instead they build an abstracted graph. Firmware (ACPI tables such as SRAT and SLIT) describes which CPUs and memory ranges belong to each NUMA node and a relative distance between nodes. The kernel uses that data to build a NUMA node map and a distance matrix. Distances are normalized so that local access is the smallest value (often 10). Remote distances are larger (often 20-30) but the exact numbers vary by platform. A dual-socket system might show symmetric distances, while a multi-socket system in a mesh could show asymmetric or non-uniform distances due to the topology of the interconnect.
There are two implications for developers. First, any performance analysis that ignores node distances will misattribute slow memory access to “random” causes. Second, there are more than two levels of locality: local, near-remote, and far-remote. On systems with more than two sockets, the best remote node may not be the one you expect. A distance matrix is the simplest way to capture this nuance. This is why your tool must parse and present it clearly.
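To make the "best remote node" point concrete, here is a minimal C sketch. The function name and the sample values are illustrative, not from any library: it picks the cheapest remote node from one row of a distance matrix.

```c
#include <limits.h>

/* Given one row of a NUMA distance matrix, return the index of the
 * cheapest remote node (smallest cost excluding the node itself),
 * or -1 when there is no remote node at all (single-node system). */
int nearest_remote_node(const int *row, int n, int self) {
    int best = -1, best_cost = INT_MAX;
    for (int i = 0; i < n; i++) {
        if (i == self) continue;
        if (row[i] < best_cost) { best_cost = row[i]; best = i; }
    }
    return best;
}
```

On a hypothetical 4-node mesh where node 1's row is `{30, 10, 21, 17}`, the cheapest remote node is node 3, which is exactly the kind of non-obvious answer the matrix captures.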
NUMA topology is also affected by virtualization and firmware settings. Some hypervisors expose a simplified topology or hide NUMA entirely. BIOS options can disable NUMA or create “cluster on die” modes that split a socket into multiple nodes. As a result, your discovery tool must not assume a specific number of nodes or symmetrical distances. It should treat the data as authoritative and handle inconsistent or missing entries by surfacing a clear warning.
Finally, NUMA interacts with cache hierarchy. LLCs can be shared across subsets of cores, and NUMA nodes often align with LLC groups. While the distance matrix captures DRAM access cost, cache-sharing topology affects cross-core sharing and coherence traffic. A thorough topology report should therefore show both node membership and cache sharing to give the user a full mental model of the machine.
How This Fits the Project
This concept is the backbone of the topology explorer: every report line about nodes, distances, and locality depends on building and interpreting the matrix correctly. You will also use it to validate the output from libnuma and to choose how you display “near” vs “far” nodes.
Definitions & Key Terms
- NUMA node -> A grouping of CPUs and memory that are local to each other.
- Distance matrix -> A table of relative memory access costs between nodes.
- Local memory -> Memory attached to the same node as the requesting CPU.
- Remote memory -> Memory attached to a different node, accessed over an interconnect.
- Interconnect -> The hardware fabric that connects sockets and enables coherent memory access.
- SRAT/SLIT -> ACPI tables describing NUMA topology and relative distances.
Mental Model Diagram (ASCII)
[Node 0]                      [Node 1]
CPUs: 0-7, LLC0               CPUs: 8-15, LLC1
Memory: 32 GB                 Memory: 32 GB
    |  local=10                   |  local=10
    +-------- interconnect -------+
           remote=21 (relative cost)
Distance Matrix:
N0 N1
N0: 10 21
N1: 21 10
How It Works (Step-by-Step)
- Firmware exposes CPU-to-node and memory-to-node mappings (SRAT).
- Firmware exposes a distance matrix (SLIT) with relative costs.
- Linux kernel builds internal node tables and exposes them via sysfs.
- Tools like libnuma read those tables and provide API access.
- Your tool reads both sources and compares for consistency.
Invariants: A node's distance to itself is the smallest value in its row; on most systems the matrix is symmetric.
Failure modes: NUMA disabled in BIOS, SLIT missing, virtualization hiding topology.
Minimal Concrete Example
#include <numa.h>
#include <stdio.h>

/* Print the NUMA distance matrix as seen by libnuma (link with -lnuma). */
int main(void) {
    if (numa_available() < 0) {
        printf("NUMA not available\n");
        return 1;
    }
    int max = numa_max_node();
    for (int i = 0; i <= max; i++) {
        for (int j = 0; j <= max; j++)
            printf("%2d ", numa_distance(i, j));
        printf("\n");
    }
    return 0;
}
Common Misconceptions
- “Distance equals nanoseconds” -> It is a relative cost, not a measured latency.
- “Two-node systems are always symmetric” -> Some systems report asymmetric values.
- “NUMA is only about memory” -> It also shapes cache sharing and scheduling.
Check-Your-Understanding Questions
- Why is the distance matrix a relative cost rather than an absolute latency?
- What would it mean if Node 0 to Node 1 is 21 but Node 1 to Node 0 is 30?
- How can BIOS settings change the number of NUMA nodes visible to Linux?
- If a machine reports only one node, what does that imply about UMA vs NUMA?
Check-Your-Understanding Answers
- Because the kernel exposes topology costs normalized to a baseline, not measured timing.
- It suggests an asymmetric topology or firmware anomaly; you should report it clearly.
- Options like node interleaving, cluster-on-die, or NUMA disable can alter topology.
- It behaves like UMA from the OS perspective, even if hardware is physically NUMA.
Real-World Applications
- Database engines choosing buffer pool placement per socket.
- HPC codes pinning MPI ranks to minimize remote memory access.
- Virtualization platforms sizing and pinning VMs to NUMA nodes.
Where You’ll Apply It
- In this project: Sec. 3.1, Sec. 3.5, Sec. 4.1, and Sec. 5.10.
- Also used in: P02-memory-latency-microbenchmark, P03-memory-bandwidth-benchmark, P09-numa-aware-thread-pool.
References
- “Computer Architecture, 5th Edition” (Hennessy & Patterson) – Ch. 2
- “Inside the Machine” (Stokes) – Ch. 3-4
- “The Linux Programming Interface” (Kerrisk) – Ch. 6
Key Insights
NUMA distance is a topology signal, not a stopwatch, but it is the map every performance decision follows.
Summary
NUMA topology is the OS-level representation of how CPUs and memory are wired together. Distance matrices expose relative access costs between nodes and drive placement strategies. Your tool must treat this information as authoritative, detect anomalies, and present it in a way users can reason about quickly.
Homework/Exercises to Practice the Concept
- Sketch the distance matrix for a 4-socket machine arranged in a ring.
- Read the SRAT/SLIT description online and summarize what each table provides.
- Explain how node interleaving might change reported distances.
Solutions to the Homework/Exercises
- The ring has low cost to neighbors and higher cost to the opposite node; the matrix is symmetric with two remote tiers.
- SRAT maps CPUs and memory ranges to nodes; SLIT provides relative distance costs between nodes.
- Interleaving may collapse nodes into a single logical domain or flatten distances.
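The ring exercise above can be checked mechanically. This sketch (the concrete values 10/21/31 are illustrative; real firmware picks its own) fills the 4x4 matrix from hop counts on the ring:

```c
#include <stdlib.h>

/* Fill a 4x4 distance matrix for four sockets in a ring:
 * local = 10, one-hop neighbour = 21, opposite corner (two hops) = 31.
 * Only the symmetric two-tier shape matters, not the exact numbers. */
void ring4_distance(int m[4][4]) {
    for (int i = 0; i < 4; i++) {
        for (int j = 0; j < 4; j++) {
            int hops = abs(i - j);
            if (hops == 3) hops = 1;  /* the ring wraps around */
            m[i][j] = (hops == 0) ? 10 : (hops == 1 ? 21 : 31);
        }
    }
}
```

Printing the result shows each node with two distance-21 neighbours and one distance-31 opposite node, matching the homework solution.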
2.2 Linux NUMA Introspection Interfaces (sysfs + libnuma)
Fundamentals
Linux exposes NUMA topology via sysfs under /sys/devices/system/node/. Each nodeN directory contains files that describe CPUs, memory, and distances. CPU lists are encoded as compact ranges (for example 0-3,8-11). Memory sizes are exposed through meminfo, and the distance matrix lives in distance. The libnuma library provides a higher-level API that reads the same data but hides many parsing details. Understanding both layers is essential: sysfs is a stable, inspectable source of truth, while libnuma provides a convenient and portable interface. A robust topology explorer should read both, compare them, and surface mismatches to help users trust the output.
Deep Dive into the Concept
Sysfs is a virtual filesystem that mirrors kernel objects. Under /sys/devices/system/node/, each NUMA node has a directory such as node0, node1, etc. Important files include:
- `cpulist`: a human-readable range list of CPUs belonging to the node.
- `cpumap`: a hex bitmask of CPUs.
- `meminfo`: per-node memory statistics.
- `distance`: a whitespace-separated list of distance values.
Parsing cpulist correctly is deceptively tricky. The format supports single CPU IDs, ranges, and comma-separated lists. The same encoding appears in many places: cpusets, cgroups, and task affinity masks. You must handle malformed inputs, whitespace variations, and empty lists. While cpumap may seem easier, you must understand endianness and bit ordering, which can be error-prone. Many tools therefore parse cpulist and use cpumap only as a validation step.
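To see why the bit ordering is error-prone, here is a hedged sketch of a `cpumap` decoder. It assumes the common layout of comma-separated 32-bit hex words with the most significant word first; as the text suggests, verify the result against `cpulist` before trusting it on your kernel:

```c
#include <string.h>
#include <stdlib.h>

/* Decode a cpumap-style hex mask such as "000000ff" or
 * "00000001,0000ff00" into explicit CPU IDs. Returns the count,
 * or -1 on a malformed word. Sketch only; validate against cpulist. */
int parse_cpumap(const char *s, int *out, int max) {
    int words = 1;                           /* count comma-separated words */
    for (const char *p = s; *p; p++) if (*p == ',') words++;
    int count = 0, w = 0;
    char buf[16];
    const char *p = s;
    while (*p && w < words) {
        size_t len = strcspn(p, ",");
        if (len >= sizeof buf) return -1;    /* word too long: malformed */
        memcpy(buf, p, len); buf[len] = '\0';
        unsigned long word = strtoul(buf, NULL, 16);
        int base = (words - 1 - w) * 32;     /* bit offset of this word */
        for (int b = 0; b < 32; b++)
            if ((word >> b) & 1 && count < max) out[count++] = base + b;
        p += len;
        if (*p == ',') p++;
        w++;
    }
    return count;
}
```

Note how `"00000001,0000ff00"` yields CPU 32 before CPUs 8-15: the leftmost word covers the highest-numbered CPUs, which is exactly the ordering mistake that makes many tools prefer `cpulist`.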
libnuma provides functions such as numa_available(), numa_max_node(), numa_node_to_cpus(), and numa_distance(). Internally, libnuma uses sysfs and kernel interfaces; it does not magically discover new topology. It can also respect cpuset restrictions, meaning its output might differ from raw sysfs on systems with containers or cpuset constraints. This is a feature, not a bug: libnuma shows the topology visible to the calling process. Your tool should therefore document whether it is operating on the global system view or the process-restricted view.
A professional-grade topology explorer provides both: a “global” mode reading sysfs and a “process” mode reading libnuma and cpuset state. It should also show cache information. Cache topology is exposed under /sys/devices/system/cpu/cpu*/cache/ and can be cross-referenced with nodes. By combining these views, your tool can answer practical questions like: “Which CPUs share L3 on Node 0?” or “Which node does CPU 12 belong to under the current cpuset?”
The takeaway: sysfs is detailed and transparent, libnuma is convenient and context-aware. Comparing them builds trust and exposes hidden constraints.
How This Fits the Project
Your tool reads sysfs to build the base topology, then uses libnuma to validate and interpret it. Parsing cpulist correctly is mandatory to match numactl --hardware output.
Definitions & Key Terms
- sysfs -> A virtual filesystem exposing kernel objects.
- cpulist -> A range-encoded list of CPU IDs (e.g., `0-3,8-11`).
- cpumap -> A hex bitmask representing CPUs.
- libnuma -> A user-space library providing NUMA APIs.
- cpuset -> A kernel mechanism to restrict which CPUs and nodes a process can use.
Mental Model Diagram (ASCII)
/sys/devices/system/node/
  node0/
    cpulist  -> "0-7"
    meminfo  -> "MemTotal: 32768 MB"
    distance -> "10 21"
  node1/
    cpulist  -> "8-15"
    meminfo  -> "MemTotal: 32768 MB"
    distance -> "21 10"
libnuma API
  numa_available()
  numa_node_to_cpus(node, mask)
  numa_distance(a, b)
How It Works (Step-by-Step)
- Enumerate `/sys/devices/system/node/nodeN` directories.
- Read and parse `cpulist` into explicit CPU IDs.
- Read `meminfo` to extract total and free memory.
- Read `distance` and build a matrix.
- Call libnuma to confirm node count and CPU mapping.
- Compare the sysfs-derived mapping with the libnuma-visible mapping.
Invariants: cpulist and cpumap represent the same CPUs.
Failure modes: missing sysfs files, permission issues, cpuset restrictions.
Minimal Concrete Example
// Parse a cpulist like "0-3,8-11,16" into an array of CPU IDs.
// Returns the number of CPUs written to out (at most max).
int parse_cpulist(const char *s, int *out, int max) {
    int count = 0;
    int start = -1, end = -1;
    while (*s) {
        if (*s >= '0' && *s <= '9') {
            int val = 0;
            while (*s >= '0' && *s <= '9') { val = val * 10 + (*s - '0'); s++; }
            if (start < 0) start = val; else end = val;
        }
        if (*s == '-' || *s == ',') {
            if (*s == ',') {
                if (start >= 0) {        /* skip empty entries like ",," */
                    if (end < 0) end = start;
                    for (int i = start; i <= end && count < max; i++) out[count++] = i;
                }
                start = end = -1;
            }
            s++;
        } else if (*s == '\0') {
            break;
        } else {
            s++;                         /* tolerate whitespace and newlines */
        }
    }
    if (start >= 0) {                    /* flush the trailing range */
        if (end < 0) end = start;
        for (int i = start; i <= end && count < max; i++) out[count++] = i;
    }
    return count;
}
Common Misconceptions
- “libnuma is always correct” -> It reflects cpusets and process restrictions.
- “cpumap is easier” -> It is easy to misread bit ordering.
- “sysfs is always complete” -> Some virtualized systems expose partial data.
Check-Your-Understanding Questions
- Why might libnuma show fewer CPUs than sysfs?
- What is the difference between `cpulist` and `cpumap`?
- Why should your tool support a `--sysfs-root` option?
Check-Your-Understanding Answers
- Because cpusets or containers restrict the process view of the topology.
- `cpulist` is human-readable ranges; `cpumap` is a bitmask.
- It enables deterministic testing with captured fixtures.
Real-World Applications
- Infrastructure teams auditing NUMA topology in data centers.
- Performance engineers validating topology before tuning databases.
- Container platforms verifying cpuset isolation.
Where You’ll Apply It
- In this project: Sec. 3.2, Sec. 3.5, Sec. 4.2, and Sec. 5.2.
- Also used in: P05-numa-aware-memory-allocator, P07-numa-memory-migration-tool.
References
- “The Linux Programming Interface” (Kerrisk) – Ch. 6
- “Linux System Programming” (Love) – Ch. 6
- “Operating Systems: Three Easy Pieces” (Arpaci-Dusseau) – Ch. 13
Key Insights
Sysfs is the ground truth; libnuma is the process-specific lens. Comparing both builds trust.
Summary
Linux exposes NUMA topology through sysfs and libnuma. Parsing cpulists correctly and recognizing cpuset restrictions are essential for accurate reporting. A robust explorer uses both sources and provides a deterministic fixture mode for testing.
Homework/Exercises to Practice the Concept
- Parse the cpulist `0-3,8-11,16` and count how many CPUs it contains.
- Compare `cpulist` vs `cpumap` for a node on your machine.
- Capture a sysfs snapshot into a directory and test your parser offline.
Solutions to the Homework/Exercises
- It contains 9 CPUs: 0,1,2,3,8,9,10,11,16.
- Both should represent the same CPU set, but `cpumap` is bitmask-based.
- Copy `/sys/devices/system/node` into `fixtures/sysfs` and point your tool at it.
3. Project Specification
3.1 What You Will Build
A CLI tool named numa-topology that:
- Enumerates NUMA nodes on Linux.
- Lists CPUs and memory per node.
- Prints a distance matrix.
- Shows cache sharing information per node.
- Validates output against libnuma and `numactl --hardware`.
- Supports deterministic reporting using a captured sysfs root.
Included: human-readable table output, JSON output, and validation mode. Excluded: kernel modifications, live performance tuning, or GUI visualization.
3.2 Functional Requirements
- Node Enumeration: Discover `nodeN` directories in sysfs.
- CPU Parsing: Parse `cpulist` into explicit CPU lists.
- Memory Extraction: Read per-node memory totals and free values.
- Distance Matrix: Parse and render the distance matrix.
- Cache Summary: Show L3 cache sharing groups per node.
- Libnuma Validation: Compare sysfs results with libnuma APIs.
- numactl Comparison: Optionally validate against `numactl --hardware`.
- JSON Output: Provide a `--json` flag with a stable schema.
- Fixture Mode: Accept `--sysfs-root <path>` for deterministic testing.
- Graceful Degradation: Detect UMA or missing topology and explain it.
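The human-readable report also needs the inverse of cpulist parsing: collapsing explicit CPU IDs back into range notation for display. A sketch of such a renderer (`render_cpulist` is a hypothetical helper, not a library call):

```c
#include <stdio.h>
#include <string.h>

/* Render a sorted list of CPU IDs back into cpulist notation
 * ("0-3,8-11,16"). Returns the string length, or -1 on truncation. */
int render_cpulist(const int *cpus, int n, char *buf, size_t bufsz) {
    if (bufsz) buf[0] = '\0';
    size_t used = 0;
    int i = 0;
    while (i < n) {
        int start = cpus[i], end = start;
        /* extend the range while the next ID is consecutive */
        while (i + 1 < n && cpus[i + 1] == end + 1) end = cpus[++i];
        int w;
        if (start == end)
            w = snprintf(buf + used, bufsz - used, "%s%d",
                         used ? "," : "", start);
        else
            w = snprintf(buf + used, bufsz - used, "%s%d-%d",
                         used ? "," : "", start, end);
        if (w < 0 || (size_t)w >= bufsz - used) return -1;  /* truncated */
        used += (size_t)w;
        i++;
    }
    return (int)used;
}
```

Keeping both directions (parse and render) round-trippable makes a good property test: parse a cpulist, render it, and compare with the original string.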
3.3 Non-Functional Requirements
- Performance: Runs in under 100 ms on typical servers.
- Reliability: Handles missing files, empty nodes, and invalid cpulists.
- Usability: Output is readable, consistent, and well-labeled.
3.4 Example Usage / Output
$ ./numa-topology --format=table
=== NUMA Topology Report ===
Nodes: 2
CPUs: 16
Node 0:
CPUs: 0-7
Memory: 32768 MB (28912 MB free)
L3 Cache: 16 MB shared by CPUs 0-7
Distance: [10, 21]
Node 1:
CPUs: 8-15
Memory: 32768 MB (30120 MB free)
L3 Cache: 16 MB shared by CPUs 8-15
Distance: [21, 10]
Distance Matrix:
N0 N1
N0: 10 21
N1: 21 10
Validation: libnuma=OK, numactl=OK
3.5 Data Formats / Schemas / Protocols
Human-readable table format:
Node <id>:
CPUs: <ranges>
Memory: <total> MB (<free> MB free)
L3 Cache: <size> MB shared by CPUs <ranges>
Distance: [<d0>, <d1>, ...]
JSON schema (--json):
{
"nodes": [
{
"id": 0,
"cpus": [0,1,2,3,4,5,6,7],
"memory_mb": {"total": 32768, "free": 28912},
"l3_cache": {"size_mb": 16, "shared_cpus": [0,1,2,3,4,5,6,7]},
"distance": [10,21]
}
],
"matrix": [[10,21],[21,10]],
"validation": {"libnuma": "OK", "numactl": "OK"}
}
3.6 Edge Cases
- System exposes only one NUMA node (UMA).
- cpulist is empty due to cpuset restrictions.
- Missing `distance` file (older kernels or limited virtualization).
- Node present but memory is zero (memoryless CPU node).
- Non-contiguous CPU ranges with gaps.
3.7 Real World Outcome
Your tool produces a trusted, repeatable topology report, and it flags mismatches between sysfs and libnuma. You can run it on production servers, inside containers, or against captured fixtures to confirm NUMA layout before performance tuning.
3.7.1 How to Run (Copy/Paste)
# Build
cc -O2 -Wall -o numa-topology src/*.c -lnuma
# Run on live system
./numa-topology --format=table
# Deterministic run using a captured sysfs tree
./numa-topology --sysfs-root fixtures/sysfs --format=json
3.7.2 Golden Path Demo (Deterministic)
$ ./numa-topology --sysfs-root fixtures/sysfs --format=table
Nodes: 2
Node 0: CPUs 0-7, Memory 32768 MB (28912 MB free)
Node 1: CPUs 8-15, Memory 32768 MB (30120 MB free)
Distance Matrix: [[10,21],[21,10]]
Validation: libnuma=OK
3.7.3 If CLI: Exact Terminal Transcript
$ ./numa-topology --format=table
=== NUMA Topology Report ===
Nodes: 2
CPUs: 16
Node 0:
CPUs: 0-7
Memory: 32768 MB (28912 MB free)
Distance: [10, 21]
Node 1:
CPUs: 8-15
Memory: 32768 MB (30120 MB free)
Distance: [21, 10]
Distance Matrix:
N0 N1
N0: 10 21
N1: 21 10
Validation: libnuma=OK, numactl=OK
3.7.4 Failure Demo (Deterministic)
$ ./numa-topology --sysfs-root fixtures/bad_sysfs
ERROR: missing distance file for node0
EXIT: 2
Exit Codes:
- 0: success
- 1: usage error (bad flags)
- 2: data error (missing or malformed sysfs)
- 3: libnuma not available
4. Solution Architecture
4.1 High-Level Design
+--------------+ +---------------+ +------------------+
| Sysfs Reader |-->| Topology Model|-->| Report Renderer |
+--------------+ +---------------+ +------------------+
^ ^ ^
| | |
+--------------+ +---------------+ +---------------+
| Libnuma API |----->| Validation | | JSON Encoder |
+--------------+ +---------------+ +---------------+
4.2 Key Components
| Component | Responsibility | Key Decisions |
|---|---|---|
| Sysfs Reader | Parse node directories, cpulists, meminfo, distance | Use --sysfs-root indirection for fixtures |
| Topology Model | Store nodes, CPU lists, memory, distance matrix | Keep explicit CPU arrays and raw ranges |
| Validation Layer | Compare sysfs vs libnuma results | Allow differences due to cpusets |
| Renderer | Produce table or JSON output | Stable schema for tests |
4.3 Data Structures (No Full Code)
typedef struct {
    int id;
    int *cpus;          /* explicit CPU IDs belonging to this node */
    int cpu_count;
    long mem_total_mb;
    long mem_free_mb;
    int *distance;      /* array length = node_count */
} NodeInfo;

typedef struct {
    NodeInfo *nodes;
    int node_count;
} Topology;
4.4 Algorithm Overview
Key Algorithm: Build Topology
- Enumerate `nodeN` directories and sort by node ID.
- For each node, parse `cpulist`, `meminfo`, and `distance`.
- Build the distance matrix and validate matrix dimensions.
- Query libnuma and compare node counts and CPU lists.
- Render output in chosen format.
Complexity Analysis:
- Time: O(N * C) where N is nodes and C is CPUs per node.
- Space: O(N * C + N^2) for CPU lists and distance matrix.
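The per-node `distance` file feeding the matrix is a whitespace-separated list such as "10 21"; a strict parser can double as the dimension check mentioned above. A sketch (function name illustrative):

```c
#include <stdlib.h>

/* Parse one node's distance line into an integer row. Returns the
 * number of values read, or -1 if the line does not contain exactly
 * `expect` entries, which catches mismatched matrix dimensions. */
int parse_distance_row(const char *line, int *row, int expect) {
    int n = 0;
    const char *p = line;
    char *end;
    while (n < expect) {
        long v = strtol(p, &end, 10);
        if (end == p) break;          /* no more numbers */
        row[n++] = (int)v;
        p = end;
    }
    /* Reject short rows and trailing extra values alike. */
    strtol(p, &end, 10);
    if (end != p || n != expect) return -1;
    return n;
}
```

Calling it with `expect` set to the discovered node count means a truncated or oversized SLIT shows up as an explicit error rather than a silently wrong matrix.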
5. Implementation Guide
5.1 Development Environment Setup
# Ubuntu/Debian
sudo apt-get install -y build-essential libnuma-dev
# Fedora/RHEL
sudo dnf install -y gcc make numactl-devel
5.2 Project Structure
numa-topology/
|-- src/
| |-- main.c
| |-- sysfs_reader.c
| |-- cpulist_parser.c
| |-- libnuma_validate.c
| +-- render.c
|-- include/
| +-- topology.h
|-- tests/
| |-- test_cpulist.c
| +-- fixtures/
| +-- sysfs/
|-- Makefile
+-- README.md
5.3 The Core Question You’re Answering
“How do I discover and trust the real NUMA topology of a Linux machine?”
You are building a tool that turns opaque kernel data into an understandable map. The correctness of every performance decision depends on this map.
5.4 Concepts You Must Understand First
Stop and research these before coding:
- NUMA nodes and distance matrix
  - How are distances derived from firmware tables?
  - Why are values relative instead of absolute?
  - Book Reference: “Computer Architecture” (Hennessy & Patterson) Ch. 2
- Linux sysfs layout
  - What does `/sys/devices/system/node/` represent?
  - Which files are stable vs version-dependent?
  - Book Reference: “The Linux Programming Interface” Ch. 6
- libnuma behavior
  - How does `numa_node_to_cpus()` interpret cpusets?
  - What does `numa_available()` actually check?
  - Book Reference: “Linux System Programming” Ch. 6
5.5 Questions to Guide Your Design
- Will you treat sysfs as authoritative and libnuma as validation, or vice versa?
- How will you parse cpulists and preserve the original ranges for display?
- How will you present the distance matrix so it is readable at a glance?
5.6 Thinking Exercise
The cpulist puzzle
0-3,8-11,16,18-19
- How many CPUs is that?
- Which are contiguous ranges?
- How would you encode this as a bitmask?
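One way to check the bitmask part of the exercise is a tiny encoder (a sketch; it assumes CPU IDs below 64, and the function name is illustrative):

```c
/* Encode a list of CPU IDs into a 64-bit affinity-style mask.
 * IDs outside 0..63 are ignored in this simplified sketch. */
unsigned long long cpus_to_mask(const int *cpus, int n) {
    unsigned long long mask = 0;
    for (int i = 0; i < n; i++)
        if (cpus[i] >= 0 && cpus[i] < 64)
            mask |= 1ULL << cpus[i];
    return mask;
}
```

Encoding `0-3,8-11,16,18-19` this way produces the hex mask 0xD0F0F, which you can cross-check by hand against the four sub-ranges.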
5.7 The Interview Questions They’ll Ask
- “How does Linux expose NUMA topology to user space?”
- “What is a NUMA distance matrix and how is it used?”
- “Why might node distance be different across platforms?”
- “How would you handle systems without NUMA support?”
- “How do cpusets affect what your tool reports?”
5.8 Hints in Layers
Hint 1: Start with sysfs
Read node*/cpulist, node*/meminfo, and node*/distance before using libnuma.
Hint 2: Add libnuma validation
Use numa_node_to_cpus() to verify CPU lists.
Hint 3: Build a fixture mode
Add a --sysfs-root option to run against captured data.
Hint 4: Compare with numactl
Use numactl --hardware as a sanity check on real systems.
5.9 Books That Will Help
| Topic | Book | Chapter |
|---|---|---|
| NUMA basics | “Computer Architecture” (Hennessy & Patterson) | Ch. 2 |
| Linux sysfs | “The Linux Programming Interface” (Kerrisk) | Ch. 6 |
| System interfaces | “Linux System Programming” (Love) | Ch. 6 |
5.10 Implementation Phases
Phase 1: Foundation (2-3 days)
Goals:
- Parse cpulists and read `meminfo`.
- Build an in-memory topology model.
Tasks:
- Implement `cpulist_parser.c` with unit tests.
- Read `nodeN` directories from sysfs and populate `NodeInfo`.
Checkpoint: Unit tests pass and a single-node system prints a valid report.
Phase 2: Distance Matrix + Validation (3-4 days)
Goals:
- Parse `distance` files.
- Compare with libnuma output.
Tasks:
- Parse distance into an `N x N` matrix.
- Add libnuma validation and report mismatches.
Checkpoint: Distances match numactl --hardware on a NUMA machine.
Phase 3: Output + Fixtures (3-4 days)
Goals:
- Add JSON output and fixture mode.
- Improve readability and error handling.
Tasks:
- Implement JSON renderer with a stable schema.
- Add `--sysfs-root` for deterministic testing.
- Add clear exit codes and error messages.
Checkpoint: Golden fixture demo is deterministic and matches expected output.
5.11 Key Implementation Decisions
| Decision | Options | Recommendation | Rationale |
|---|---|---|---|
| Source of truth | sysfs vs libnuma | sysfs | sysfs is explicit and testable; libnuma is validation |
| CPU list parsing | cpulist vs cpumap | cpulist | human-readable and used in many tools |
| Output format | table vs JSON | both | table for humans, JSON for automation |
6. Testing Strategy
6.1 Test Categories
| Category | Purpose | Examples |
|---|---|---|
| Unit Tests | Validate parsers and helpers | cpulist parsing, meminfo parsing |
| Integration Tests | Validate sysfs reading | fixture-based sysfs trees |
| Edge Case Tests | Handle odd topology | missing distance, empty node |
6.2 Critical Test Cases
- cpulist parsing: `0-3,8-11,16` yields 9 CPUs.
- single-node UMA: tool reports one node and omits matrix warnings.
- missing distance file: tool exits with code 2 and error message.
6.3 Test Data
fixtures/sysfs/node0/cpulist: "0-3"
fixtures/sysfs/node0/distance: "10"
fixtures/sysfs/node0/meminfo: "MemTotal: 4096 MB"
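A sketch for extracting memory totals from such fixtures. It handles both the simplified `MemTotal: <n> MB` form shown above and the `Node 0 MemTotal: <n> kB` form that real per-node meminfo files use (the helper name is illustrative):

```c
#include <string.h>
#include <stdlib.h>

/* Find `key` (e.g. "MemTotal:") in a meminfo-style buffer and return
 * its value in MB, converting from kB when that unit follows the
 * number. Returns -1 if the key or a number is missing. */
long meminfo_value_mb(const char *buf, const char *key) {
    const char *p = strstr(buf, key);
    if (!p) return -1;
    p += strlen(key);
    char *end;
    long v = strtol(p, &end, 10);
    if (end == p) return -1;          /* key present but no number */
    while (*end == ' ') end++;
    if (strncmp(end, "kB", 2) == 0)   /* real sysfs reports kB */
        v /= 1024;
    return v;
}
```

Using one helper for both formats keeps the fixture parser and the live-system parser on the same code path.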
7. Common Pitfalls & Debugging
7.1 Frequent Mistakes
| Pitfall | Symptom | Solution |
|---|---|---|
| Mis-parsed ranges | CPU counts incorrect | Add unit tests for cpulist parsing |
| Ignoring cpusets | Missing CPUs in output | Compare sysfs vs libnuma and explain |
| Treating distance as latency | Confusing report | Label distances as relative costs |
7.2 Debugging Strategies
- Compare with `numactl --hardware` to validate node CPU lists.
- Use fixture mode to debug parsing without system variation.
- Log raw sysfs files when validation fails.
7.3 Performance Traps
- Excessive sysfs reads inside loops can slow the tool. Cache file contents in memory.
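One way to follow that advice is to read each sysfs file exactly once into a heap buffer and parse from memory afterwards. A sketch with minimal error handling:

```c
#include <stdio.h>
#include <stdlib.h>

/* Read a small file into a malloc'd, NUL-terminated buffer in one
 * pass so later parsing never re-touches the filesystem.
 * Returns NULL on any failure; the caller frees the buffer. */
char *read_small_file(const char *path) {
    FILE *f = fopen(path, "r");
    if (!f) return NULL;
    size_t cap = 4096, len = 0;
    char *buf = malloc(cap);
    if (!buf) { fclose(f); return NULL; }
    size_t n;
    while ((n = fread(buf + len, 1, cap - len - 1, f)) > 0) {
        len += n;
        if (len + 1 == cap) {                 /* buffer full: grow it */
            char *tmp = realloc(buf, cap *= 2);
            if (!tmp) { free(buf); fclose(f); return NULL; }
            buf = tmp;
        }
    }
    buf[len] = '\0';
    fclose(f);
    return buf;
}
```

Sysfs files are tiny, so caching their contents this way costs almost nothing and keeps the tool well under the 100 ms budget.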
8. Extensions & Challenges
8.1 Beginner Extensions
- Add a `--summary` flag that prints only node count and total memory.
- Add a `--list-cpus` flag that prints CPU lists without distances.
8.2 Intermediate Extensions
- Add cache topology output using `/sys/devices/system/cpu/*/cache`.
- Add a `--compare-numactl` mode that highlights differences.
8.3 Advanced Extensions
- Add topology graph export in Graphviz DOT format.
- Add support for cgroup v2 cpuset restrictions.
9. Real-World Connections
9.1 Industry Applications
- Databases use topology reports to pin buffer pools and worker threads.
- HPC clusters use them to allocate MPI ranks to local memory.
9.2 Related Open Source Projects
- numactl – canonical CLI for NUMA discovery and policy setting.
- hwloc – hardware locality library for full topology graphs.
9.3 Interview Relevance
- Explain how to detect and use NUMA topology in performance-sensitive systems.
- Discuss why ignoring NUMA can halve performance in memory-heavy services.
10. Resources
10.1 Essential Reading
- “Computer Architecture, 5th Edition” (Hennessy & Patterson) – Ch. 2
- “The Linux Programming Interface” (Kerrisk) – Ch. 6
- “Linux System Programming” (Love) – Ch. 6
10.2 Video Resources
- NUMA topology fundamentals (conference talk or lecture).
- Linux performance tuning overview covering NUMA awareness.
10.3 Tools & Documentation
- numactl – Reference output for validation.
- lscpu – Quick sanity check for node counts.
10.4 Related Projects in This Series
- P02: Memory Latency Microbenchmark – Use topology to pick local/remote pairs.
- P03: Memory Bandwidth Benchmark – Interpret bandwidth results per node.
11. Self-Assessment Checklist
11.1 Understanding
- I can explain what a NUMA node is and why distances are relative.
- I can read a cpulist and convert it into explicit CPU IDs.
- I can explain why sysfs and libnuma might disagree.
11.2 Implementation
- All functional requirements are met.
- JSON output matches the schema.
- Fixture mode provides deterministic output.
11.3 Growth
- I documented at least one topology anomaly I observed.
- I can explain this tool in a performance tuning interview.
12. Submission / Completion Criteria
Minimum Viable Completion:
- Tool reads sysfs and prints nodes, CPUs, memory, and distances.
- Distance matrix renders correctly for at least one system.
- Handles single-node UMA systems gracefully.
Full Completion:
- Adds JSON output and libnuma validation.
- Provides deterministic fixture mode with tests.
- Clear error messages and exit codes.
Excellence (Going Above & Beyond):
- Cache topology integration.
- Graphviz export and cpuset-aware reporting.