Project 2: Simple Wayland Compositor with wlroots

Build a minimal but usable Wayland compositor using wlroots that can display client windows, route input, and manage outputs.

Quick Reference

Attribute	Value
Difficulty	Level 5: Master
Time Estimate	1 month+
Main Programming Language	C
Alternative Programming Languages	C++, Rust, Zig
Coolness Level	Level 5: Pure Magic (Super Cool)
Business Potential	Level 4: The “Open Core” Infrastructure
Prerequisites	Project 1, event loops, C, basic graphics pipeline knowledge
Key Topics	Wayland server objects, DRM/KMS outputs, libinput, scene graph, damage tracking

1. Learning Objectives

By completing this project, you will:

Implement the server side of Wayland: globals, resources, and surface handling.
Configure outputs using wlroots and understand the DRM/KMS scanout pipeline.
Route input to the focused surface with correct seat semantics.
Build a scene graph and apply damage tracking to render only what changes.
Launch and use your compositor from a TTY as a real desktop session.

2. All Theory Needed (Per-Concept Breakdown)

2.1 Compositor as Wayland Server (Globals, Resources, and State)

Fundamentals

A Wayland compositor is the server side of the protocol. It owns the display socket, advertises globals, and creates server-side resources when clients bind. Each client has its own objects, and the compositor tracks them as wl_resource structures. When a client creates a surface, the compositor stores its state: buffer, position, role, and damage regions. The compositor is also the policy engine: it decides where windows go, which surface has focus, and which input events are delivered. Unlike a client, the compositor must be correct for every client simultaneously, so its state management needs to be explicit and predictable.

Deep Dive into the concept

On the server side, the compositor starts by creating a wl_display and adding a socket. It then creates globals (wl_compositor, wl_shm, wl_seat, wl_output, xdg_wm_base, etc.). When a client connects and binds to a global, the compositor creates a wl_resource for that interface and stores it in per-client state. Each resource has an implementation (function pointers) that handle requests. For example, when a client calls wl_compositor.create_surface, the compositor creates a wlr_surface (a wlroots abstraction) and attaches it to a server-side resource. That surface will emit events whenever the client commits a buffer or changes state.

The compositor’s job is to translate protocol activity into internal state changes. A wl_surface commit does not immediately draw; it updates the server-side representation: buffer content, damage, and role-specific metadata. The compositor decides when to repaint (usually on a frame timer synchronized with outputs). This separation is why the compositor can enforce policy: it is not forced to render immediately, and it can ignore or delay updates to maintain stability.
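The split between "what the client has requested" and "what the compositor renders from" can be modeled in a few lines. This is a simplified sketch, not wlroots code: `toy_surface`, `toy_state`, and the `toy_*` functions are hypothetical names, and a real surface tracks far more state than a buffer id and a damage size.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified model of double-buffered surface state. */
struct toy_state {
    int buffer_id;          /* 0 means "no buffer attached" */
    int damage_w, damage_h; /* size of the damaged area for this commit */
};

struct toy_surface {
    struct toy_state pending; /* written by client requests */
    struct toy_state current; /* what the compositor renders from */
    bool needs_repaint;       /* consumed by the next frame event */
};

/* Requests such as attach and damage only touch pending state. */
static void toy_attach(struct toy_surface *s, int buffer_id) {
    s->pending.buffer_id = buffer_id;
}

static void toy_damage(struct toy_surface *s, int w, int h) {
    s->pending.damage_w = w;
    s->pending.damage_h = h;
}

/* commit promotes pending to current in one step; it does not draw. */
static void toy_commit(struct toy_surface *s) {
    s->current = s->pending;
    s->pending.damage_w = 0; /* damage applies to one commit only */
    s->pending.damage_h = 0;
    s->needs_repaint = true; /* the frame handler decides when to paint */
}
```

Nothing in `toy_commit` renders. It only records that a repaint is wanted, which is what lets the compositor schedule drawing on its own terms.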

Focus and seat handling are also server-side responsibilities. The compositor decides which surface gets keyboard focus and which surface receives pointer events. It must track the active seat, the currently focused surface, and any grabs (temporary overrides during move/resize). This is a security boundary: clients cannot read input unless the compositor explicitly forwards it. That is a fundamental design goal of Wayland.

Another critical aspect is object lifetime. When a client disconnects, the compositor must destroy all resources owned by that client. In wlroots, this is mostly automated, but you still need to understand the lifetime relationships: a toplevel owns a surface role, a surface owns a buffer, and surfaces can have subsurfaces. Destroying a surface should clean up its scene node and input focus. If you forget to clean up, you will have dangling pointers or orphaned nodes in the scene graph. That is why wlroots uses listeners and signals; you must register for destroy events and detach any associated data structures.
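The listener pattern behind destroy handling can be illustrated without wlroots. The sketch below is a miniature stand-in for `wl_signal` and `wl_listener`, not the real API: every `sig_*` and `view_*` name is hypothetical. It shows the two habits that matter: embed the listener in your own struct so the callback can find its owner, and drop every pointer to the dying object inside the callback.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical miniature of the listener pattern; not the wl_signal API. */
struct sig_listener {
    void (*notify)(struct sig_listener *listener, void *data);
    struct sig_listener *next;
};

struct sig_signal { struct sig_listener *head; };

static void sig_add(struct sig_signal *sig, struct sig_listener *l) {
    l->next = sig->head;
    sig->head = l;
}

static void sig_emit(struct sig_signal *sig, void *data) {
    struct sig_listener *l = sig->head;
    while (l != NULL) {
        struct sig_listener *next = l->next; /* read before the callback runs */
        l->notify(l, data);
        l = next;
    }
}

struct sig_surface { struct sig_signal destroy; };

/* Compositor-side record for one window. */
struct view {
    struct sig_surface *surface; /* set to NULL once the surface is gone */
    int has_scene_node;
    struct sig_listener on_destroy;
};

static void view_handle_destroy(struct sig_listener *listener, void *data) {
    /* Same idea as wl_container_of: recover the owner from its member. */
    struct view *v = (struct view *)
        ((char *)listener - offsetof(struct view, on_destroy));
    (void)data;
    v->has_scene_node = 0; /* stands in for destroying the scene node */
    v->surface = NULL;     /* never keep a pointer to a destroyed surface */
}
```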

The compositor also needs to advertise supported protocols. wlroots provides a set of globals you can create: xdg-shell for windows, layer-shell for panels, and output management protocols. Each global has versioning rules. The compositor must choose which versions to expose, and clients will bind to the highest version they support. This is a compatibility contract. If you expose a protocol version that you do not fully implement, clients may crash or misbehave. So you should pick conservative versions and document them.

A practical mental model: The compositor is a server that receives a stream of state updates from clients, stores those updates in a scene graph, and presents a coherent view to the user. It is not a simple renderer. It is a protocol server, a scheduler, and a policy engine all at once. This is why building a compositor is one of the most demanding projects in the Wayland ecosystem.

A final nuance is version negotiation on the server side. When you expose a global at version N, you are promising that every request up to N is correctly implemented. If you later upgrade and expose version N+1 without full support, clients will break in hard-to-debug ways. Good compositors keep protocol versions conservative and document them, which is why wlroots-based compositors often clamp versions instead of exposing the maximum supported by the library.
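Version clamping is simple arithmetic, shown here as a sketch. `MY_XDG_WM_BASE_VERSION` and both helper functions are hypothetical; the value 3 is only an example of "the highest version this compositor fully implements".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical: highest xdg_wm_base version this compositor fully implements. */
#define MY_XDG_WM_BASE_VERSION 3u

/* Advertise no more than we implement and no more than the library offers. */
static uint32_t advertised_version(uint32_t library_max) {
    return library_max < MY_XDG_WM_BASE_VERSION
        ? library_max : MY_XDG_WM_BASE_VERSION;
}

/* A request or event introduced in version `since` may only be used on a
 * resource that the client bound at that version or later. */
static bool feature_available(uint32_t bound_version, uint32_t since) {
    return bound_version >= since;
}
```

For example, if the library could expose version 6, `advertised_version(6)` still yields 3, so no client can bind to behavior the compositor has not implemented.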

How this fits into the project

This concept is used everywhere: Section 3 specification, Section 4 architecture, Section 5 implementation phases, and Section 7 debugging. You cannot build a compositor without understanding server-side objects and lifetimes.

Definitions & key terms

wl_display -> server-side Wayland display object
global -> advertised interface clients can bind to
wl_resource -> server-side representation of a client object
surface role -> protocol-defined role (xdg_toplevel, popup, layer)
seat -> group of input devices (keyboard, pointer, touch)
focus -> which surface receives input

Mental Model Diagram (ASCII)

Client                                     Compositor
------                                     ----------
wl_surface.commit() -------------------->  wlr_surface (server)
                                            | updates state
                                            v
                                      scene graph node
                                            |
                                            v
                                      renderer -> output

How It Works (Step-by-Step)

Create wl_display and add socket.
Create globals (wl_compositor, wl_shm, wl_seat, wl_output, xdg_wm_base).
Accept client connections; create wl_resource objects.
On surface commit, update surface state and scene graph.
On input events, determine focused surface and forward events.
Render scene graph to outputs on frame ticks.

Invariants:

Each surface has at most one role, and once assigned the role cannot change.
Destroy events must clean up associated scene nodes.
Input events must only go to the focused surface.

Failure modes:

Missing globals -> clients fail to start.
Dangling surface nodes -> crashes or visual artifacts.
Incorrect focus -> input leaks to wrong client.

Minimal Concrete Example

/* Sketch against the wlroots 0.17 API; several signatures differ in other releases. */
struct wl_display *display = wl_display_create();
const char *socket = wl_display_add_socket_auto(display);
struct wlr_backend *backend = wlr_backend_autocreate(display, NULL);
struct wlr_renderer *renderer = wlr_renderer_autocreate(backend);
wlr_renderer_init_wl_display(renderer, display); /* sets up wl_shm and buffer support */
struct wlr_allocator *alloc = wlr_allocator_autocreate(backend, renderer);

struct wlr_compositor *compositor =
    wlr_compositor_create(display, 5, renderer); /* 5 = wl_compositor version */
struct wlr_xdg_shell *xdg_shell = wlr_xdg_shell_create(display, 3);

Common Misconceptions

“The compositor just draws windows.” -> It also enforces policy and security.
“wlroots hides everything.” -> You still manage state and focus explicitly.
“If a client disconnects, wlroots cleans everything.” -> You must handle destroy signals.

Check-Your-Understanding Questions

Why does the compositor own the display socket?
What is the difference between wl_resource and wlr_surface?
Why must a surface have only one role?

Check-Your-Understanding Answers

Because it is the server; clients connect to it.
wl_resource is protocol-level; wlr_surface is wlroots’ abstraction.
Roles define behavior; multiple roles would create conflicting semantics.

Real-World Applications

Sway, Weston, and KWin are all Wayland compositors, whether they ship standalone or as part of a desktop environment.
Embedded systems often run custom compositors for kiosks.

Where You’ll Apply It

In this project: Section 3.1-3.3, Section 4.1-4.2, Section 5.10, Section 7.1.
Also used in: P03 Custom Protocol for server-side protocol handlers.

References

wlroots documentation: tinywl example
Wayland server protocol documentation

Key Insights

A compositor is a protocol server and policy engine, not just a renderer.

Summary

You must manage globals, resources, and surface lifetimes while enforcing input and window policy.

Homework/Exercises to Practice the Concept

Read tinywl and identify where globals are created.
Add logging for surface create/destroy events.
Trace how an xdg_toplevel maps to a scene node.

Solutions to the Homework/Exercises

Look for wlr_compositor_create and wlr_xdg_shell_create in tinywl.
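Add listeners for the wlr_compositor new_surface event and each wlr_surface destroy event, and print pointer values.
Follow the xdg-shell new-surface handler to the wlr_scene_xdg_surface_create call, which creates the wlr_scene_tree for the toplevel; the map event then makes it visible and focusable.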

2.2 Output Pipeline: DRM/KMS, Render Loop, and Presentation

Fundamentals

Outputs are monitors connected to the GPU. On Linux, the compositor talks to the kernel through DRM/KMS. wlroots hides much of the boilerplate, but you still need to understand that each output has a mode (resolution and refresh), a framebuffer, and a presentation loop. The compositor renders the scene to a buffer and performs an atomic commit, which swaps the buffer at the next vblank. This prevents tearing. Understanding this pipeline is necessary to reason about performance and correctness. You should also be aware that outputs can be hotplugged, can have different scale factors, and can require mode negotiation before any frame is visible, so your compositor must treat outputs as dynamic resources rather than fixed globals.

Deep Dive into the concept

DRM/KMS is the kernel subsystem for display. A physical connector (HDMI, DP, etc.) is driven by a CRTC (scanout engine), and each CRTC reads its pixels from one or more planes. The compositor picks a mode (e.g., 1920x1080@60Hz) and attaches a buffer to a plane for scanout. Modern compositors use atomic modesetting: they build a property set for connectors, CRTCs, and planes, and then commit it as a single atomic transaction. Either the whole update is applied or none of it is, which keeps updates consistent and tear-free.

In wlroots, outputs are represented by wlr_output. When a display is connected, wlroots emits a new_output event. You register a listener, enable the output, set its mode, and commit it. You then register a frame listener so you know when to render. On each frame, you begin a render pass, draw the scene graph, and end the render pass. wlroots handles buffer allocation through a wlr_allocator, and it provides a renderer abstraction (usually GLES2). Even though wlroots handles GPU details, you still need to manage output timing and damage. If you draw every frame regardless of changes, you waste GPU cycles. If you skip frames incorrectly, you may miss updates.

Presentation timing is driven by vblank. The kernel signals when a frame is presented; wlroots passes this to you through frame events. You should render in response to these events. If you are using wlr_scene, you can use wlr_scene_output_commit to handle damage tracking and output commits automatically. But you should still understand that under the hood, the compositor is creating GPU textures from client buffers (DMA-BUF or wl_shm) and compositing them into a single output buffer. If a client buffer already lives in GPU-accessible memory (DMA-BUF), the compositor can import it without copying. If it is wl_shm, the compositor must upload the pixels to a texture. That difference matters for performance but not for correctness.

Multi-output handling is another layer. Each output has its own mode and scale factor. The compositor must track output scale and transform (rotation). When a window moves between outputs or spans outputs, you must render it appropriately on each output. wlroots provides output layout utilities to compute global positions. In a minimal compositor, you can use a single output, but you should still design your data structures to handle multiple outputs so you can extend later.
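The bookkeeping behind a multi-output layout is mostly rectangle arithmetic. The following sketch is a stand-in for what an output layout helper computes, not the wlr_output_layout API: `box`, `toy_output`, and the three functions are hypothetical names, and scale and rotation are ignored.

```c
#include <assert.h>
#include <stdbool.h>

struct box { int x, y, width, height; };

/* Hypothetical output record: position and size in layout coordinates. */
struct toy_output { const char *name; struct box layout_box; };

static bool boxes_intersect(const struct box *a, const struct box *b) {
    return a->x < b->x + b->width && b->x < a->x + a->width &&
           a->y < b->y + b->height && b->y < a->y + a->height;
}

/* Convert a layout-space point to coordinates local to one output. */
static void layout_to_output(const struct toy_output *o, int lx, int ly,
                             int *ox, int *oy) {
    *ox = lx - o->layout_box.x;
    *oy = ly - o->layout_box.y;
}

/* Count the outputs a window touches; each one must render its part. */
static int outputs_showing(const struct toy_output *outs, int n,
                           const struct box *window) {
    int count = 0;
    for (int i = 0; i < n; i++) {
        if (boxes_intersect(&outs[i].layout_box, window)) {
            count++;
        }
    }
    return count;
}
```

With two 1920x1080 outputs side by side, a 400-pixel-wide window placed at x=1800 intersects both, so both outputs need to draw it at their own local offset.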

Another subtlety is output hotplug. When a monitor is unplugged, you receive a destroy event for that output. You must remove it from your layout and avoid rendering to it. If you keep stale pointers, your compositor will crash. wlroots provides wlr_output_layout to manage these details, but you still have to connect and disconnect outputs in response to events.

The output pipeline also explains why compositors are careful about synchronization. When you commit a frame, you are implicitly promising that the buffer will remain valid until the kernel scans it out. If you reuse or destroy that buffer too early, the result can be flicker or garbage on screen. wlroots shields you from many of these pitfalls, but a good mental model helps when debugging black screens, slow frame rates, or odd timing glitches.

Understanding the output pipeline is crucial for the compositor project because it is the only part that touches display hardware directly. It also explains why a compositor must run on a TTY with access to /dev/dri when driving real hardware. If you launch it inside another compositor (nested), wlroots uses a different backend, but the same abstract output model applies. The backend abstraction is useful for testing: you can run a nested backend to debug your compositor without taking over the GPU, then switch to DRM/KMS for real hardware. Knowing this helps you separate hardware-specific assumptions from compositor logic.

How this fits into the project

You will use this in Section 3.2 (functional requirements), Section 4 architecture, and Section 5.10 implementation phases. The frame loop drives your compositor’s main rendering flow.

Definitions & key terms

DRM/KMS -> kernel subsystem for display
CRTC -> scanout engine
plane -> hardware layer that can scan out a buffer
vblank -> vertical blank interval used for tear-free swaps
atomic commit -> single transaction update of display state
wlr_output -> wlroots output abstraction

Mental Model Diagram (ASCII)

Client buffers -> textures -> compositor render pass -> output framebuffer
                                          |
                                          v
                                   DRM atomic commit
                                          |
                                          v
                                   Physical display

How It Works (Step-by-Step)

wlroots detects a new output and emits a new_output event.
Compositor enables the output and sets a mode.
On each frame event, compositor begins a render pass.
Scene graph is drawn into an output buffer.
Buffer is committed via DRM/KMS at vblank.

Invariants:

Outputs must be enabled and have a mode before rendering.
Render only when output is ready for a frame.

Failure modes:

Not setting a mode -> blank screen.
Rendering without output enabled -> crash or no output.
Ignoring hotplug -> stale pointers and crashes.

Minimal Concrete Example

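static void handle_new_output(struct wl_listener *listener, void *data) {
    struct wlr_output *output = data;
    /* wlroots 0.17-era API; 0.18 and later configure outputs through wlr_output_state. */
    /* alloc and renderer are the objects created in the 2.1 example. */
    wlr_output_init_render(output, alloc, renderer);
    struct wlr_output_mode *mode = wlr_output_preferred_mode(output);
    if (mode != NULL) { /* nested and headless outputs may report no modes */
        wlr_output_set_mode(output, mode);
    }
    wlr_output_enable(output, true);
    wlr_output_commit(output);
}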

Common Misconceptions

“The compositor draws whenever it wants.” -> It should align with output frame events.
“Atomic commit is optional.” -> The legacy KMS API still works, but atomic commits are what keep multi-property updates consistent.

Check-Your-Understanding Questions

Why does rendering need to align with vblank?
What happens if you forget to enable an output?
How does a wl_shm client buffer become a GPU texture?

Check-Your-Understanding Answers

To avoid tearing and synchronize presentation timing.
The output stays blank and clients appear not to render.
The compositor uploads the shared memory to a texture during render.

Real-World Applications

Desktop compositors must handle hotplug, scaling, and multiple outputs.
Embedded systems often lock to a single output but still use KMS.

Where You’ll Apply It

In this project: Section 4.1 High-Level Design, Section 5.10 Phase 2, Section 7.3 Performance Traps.
Also used in: P04 Layer Shell Panel for output scale handling.

References

DRM/KMS kernel documentation
wlroots output and backend docs

Key Insights

The output pipeline is the compositor’s bridge to hardware; correctness here determines whether anything appears on screen.

Summary

Outputs are managed through DRM/KMS; render on frame events and commit atomically to avoid tearing.

Homework/Exercises to Practice the Concept

Print the list of available output modes on startup.
Add logging on output hotplug events.
Measure frame time by logging timestamps on frame events.

Solutions to the Homework/Exercises

Iterate the output->modes list with wl_list_for_each and print width/height/refresh.
Log in your new_output handler and add a listener for each output's destroy event.
Store last frame time and print the delta each frame.
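The frame-delta solution can be sketched with plain `struct timespec` arithmetic. `frame_timer` and both functions are hypothetical helpers; in a real frame handler you would obtain `now` from `clock_gettime(CLOCK_MONOTONIC, &now)` and log the returned value.

```c
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Difference b - a in microseconds. */
static int64_t timespec_delta_us(const struct timespec *a,
                                 const struct timespec *b) {
    int64_t ns = ((int64_t)b->tv_sec - (int64_t)a->tv_sec) * 1000000000LL
               + ((int64_t)b->tv_nsec - (int64_t)a->tv_nsec);
    return ns / 1000;
}

struct frame_timer {
    struct timespec last;
    int have_last; /* 0 until the first frame has been seen */
};

/* Returns microseconds since the previous frame, or -1 on the first frame. */
static int64_t frame_timer_tick(struct frame_timer *t,
                                const struct timespec *now) {
    int64_t delta = -1;
    if (t->have_last) {
        delta = timespec_delta_us(&t->last, now);
    }
    t->last = *now;
    t->have_last = 1;
    return delta;
}
```

On a 60 Hz output the delta should hover around 16667 microseconds; consistently larger values suggest missed frames.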

2.3 Input Routing: libinput, xkbcommon, and Focus

Fundamentals

The compositor is responsible for input routing. It reads events from devices via libinput and decides which surface should receive them. Keyboard input goes to the focused surface; pointer motion determines which surface is under the cursor. The compositor must track focus, handle keybindings (like Alt+Tab or window move), and manage grabs for dragging and resizing. This is a security boundary: clients never see input unless the compositor sends it. A compositor also decides how multiple devices are combined into a single seat, which affects how laptops, external keyboards, and touchpads behave as a unified input source. This unified seat model simplifies multi-device setups while keeping policy centralized.

Deep Dive into the concept

libinput abstracts raw evdev devices and provides high-level events: pointer motion, button presses, scroll events, key presses, and touch events. In wlroots, these events are delivered through wlr_input_device and wlr_seat. The seat represents a logical group of devices. A compositor typically creates one seat and attaches keyboards and pointers to it. Each input event is routed through the seat to a focused surface. For keyboards, focus is explicit: wlr_seat_set_keyboard and wlr_seat_keyboard_notify_enter determine which surface receives key events. For pointers, focus is implicit: when the cursor moves, the compositor finds the surface under the cursor and sends enter/motion events.
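Finding the surface under the cursor is a hit test over the window stack. The sketch below uses a plain array as the stack; `toy_window`, `window_contains`, and `window_at` are hypothetical names, and windows are treated as simple rectangles with no input regions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_window { int x, y, width, height; int id; };

static bool window_contains(const struct toy_window *w, double px, double py) {
    return px >= w->x && px < w->x + w->width &&
           py >= w->y && py < w->y + w->height;
}

/* windows[] is ordered bottom-to-top, so scan from the end to find the
 * topmost hit. On success, sx/sy receive surface-local coordinates. */
static const struct toy_window *window_at(const struct toy_window *windows,
                                          size_t n, double px, double py,
                                          double *sx, double *sy) {
    for (size_t i = n; i-- > 0;) {
        if (window_contains(&windows[i], px, py)) {
            *sx = px - windows[i].x;
            *sy = py - windows[i].y;
            return &windows[i];
        }
    }
    return NULL; /* cursor is over the background */
}
```

The surface-local coordinates are what pointer enter and motion events carry to the client. When you use wlr_scene, `wlr_scene_node_at` plays the same role against the scene graph.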

Focus policy is your job. Many compositors use “focus follows mouse”; others require click-to-focus. In this project, choose one policy and implement it explicitly. If you do not manage focus, clients will not receive input. You must also consider grabs. When the user holds Alt and drags a window, the compositor should keep sending pointer motion to the grabbed surface even if the cursor leaves it. This requires an explicit grab state. wlroots has helpers for this, but you need to structure your state machine to track active grabs.

Keyboard input also involves xkbcommon. You need to load a keymap and interpret keycodes into keysyms. wlroots provides wlr_keyboard and xkb helpers. Without this, you will see raw keycodes. For a minimal compositor, you can use the default keymap and handle a few shortcuts. For example, Alt+Escape to exit, or Alt+Tab to change focus. The keymap is essential for consistent behavior across layouts.
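Shortcut handling reduces to checking modifier bits and a keysym before deciding whether to forward the key. In this sketch the `TOY_*` constants are stand-ins chosen to mirror `WLR_MODIFIER_ALT`, `XKB_KEY_Escape`, and `XKB_KEY_Tab`; a real compositor would take those from the wlroots and xkbcommon headers. The action names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-in constants; use the real header definitions in a compositor. */
#define TOY_MOD_ALT    (1u << 3)
#define TOY_KEY_ESCAPE 0xff1bu
#define TOY_KEY_TAB    0xff09u

enum toy_action { ACTION_NONE, ACTION_QUIT, ACTION_NEXT_WINDOW };

/* Map a modifier mask plus keysym to a compositor action. */
static enum toy_action match_shortcut(uint32_t modifiers, uint32_t keysym) {
    if (!(modifiers & TOY_MOD_ALT)) {
        return ACTION_NONE;
    }
    switch (keysym) {
    case TOY_KEY_ESCAPE: return ACTION_QUIT;
    case TOY_KEY_TAB:    return ACTION_NEXT_WINDOW;
    default:             return ACTION_NONE;
    }
}

/* True if the key event should still be forwarded to the focused client.
 * Only presses that match a shortcut are swallowed. */
static bool should_forward(uint32_t modifiers, uint32_t keysym, bool pressed) {
    return !(pressed && match_shortcut(modifiers, keysym) != ACTION_NONE);
}
```

Matching on keysyms rather than raw keycodes is what keeps Alt+Tab working across keyboard layouts. Forwarding the release of a swallowed press is a simplification that a stricter compositor might avoid.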

Another subtlety is client-specific cursor surfaces. Clients can set their own cursor image by providing a surface. In wlroots, the seat emits a request_set_cursor event; the compositor should check that the request comes from the client that currently has pointer focus and then apply it with wlr_cursor_set_surface. If you do not support it, clients will still work but cannot show custom cursors. This is not required for a minimal compositor, but it is a good extension.

Input routing is also tied to surface lifetimes. If the focused surface is destroyed, you must clear focus and choose a new surface. If you keep a pointer to a destroyed surface, you will crash or send events to invalid resources. wlroots uses signals to notify you when surfaces are destroyed; you must listen and update focus accordingly. Touch input adds another layer: touches can be associated with specific surfaces and must be tracked across motion and release events. Even if you do not implement touch now, the mental model of per-contact focus will help you later.

Input systems also have to cope with key repeat, modifier state, and lock state (Caps Lock, Num Lock). wlroots exposes helpers for this, but you should understand that the compositor is the authority for the keyboard state seen by clients. If you fail to forward modifier state correctly, shortcuts inside apps will behave strangely. This is another reason to keep input handling code small and explicit in a minimal compositor. It also affects accessibility features like sticky keys and repeat delays.

How this fits into the project

Input routing drives Section 3.2 (functional requirements), Section 4.2 (components), Section 5.10 (implementation phases), and Section 7 (common pitfalls). It is also central to user experience.

Definitions & key terms

libinput -> library for handling input devices
seat -> logical group of input devices
focus -> which surface receives keyboard input
grab -> temporary capture of input events
xkbcommon -> library for keyboard mapping

Mental Model Diagram (ASCII)

Input device -> libinput -> wlr_seat -> focus policy -> client surface
                   |                     |
                   |                     v
                   |               key events
                   v
              pointer motion

How It Works (Step-by-Step)

Create a wlr_seat.
On new keyboard device, attach to seat and set keymap.
On pointer motion, find surface under cursor.
If click-to-focus, set focus on click; otherwise on hover.
Send pointer enter/motion events to that surface.
Send key events to focused surface.

Invariants:

Only one surface has keyboard focus at a time.
Pointer focus changes based on cursor position or clicks.

Failure modes:

No keymap -> broken keyboard input.
Focus not set -> clients do not receive input.
Grab state not cleared -> stuck move/resize.

Minimal Concrete Example

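struct wlr_seat *seat = wlr_seat_create(display, "seat0");

/* Per-keyboard state; the struct name and its fields are illustrative. */
struct my_keyboard {
    struct wlr_keyboard *wlr_keyboard;
    struct wl_listener key;
};

static void handle_key(struct wl_listener *listener, void *data) {
    struct my_keyboard *kb = wl_container_of(listener, kb, key);
    struct wlr_keyboard_key_event *event = data;
    // translate keycode with xkbcommon, handle compositor shortcuts first
    wlr_seat_set_keyboard(seat, kb->wlr_keyboard);
    wlr_seat_keyboard_notify_key(seat, event->time_msec,
        event->keycode, event->state);
}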

Common Misconceptions

“Clients can read input directly.” -> Only the compositor routes input.
“Focus is automatic.” -> You must decide and set it.

Check-Your-Understanding Questions

Why is focus a security boundary in Wayland?
What is a grab and when is it used?
How does xkbcommon help with key events?

Check-Your-Understanding Answers

Because only focused clients receive input, preventing snooping.
A grab captures input during operations like drag or resize.
It maps hardware keycodes to layout-aware keysyms.

Real-World Applications

Window managers implement custom focus and keybinding policies.
Gaming compositors use grabs for relative pointer input.

Where You’ll Apply It

In this project: Section 3.2, Section 4.2, Section 5.10 Phase 2, Section 7.1.
Also used in: P04 Layer Shell Panel for input focus on panels.

References

libinput documentation
xkbcommon documentation
wlroots seat and input docs

Key Insights

Input is not just events; it is policy, and the compositor owns that policy.

Summary

You must explicitly manage focus, handle keymaps, and route events through the seat.

Homework/Exercises to Practice the Concept

Implement click-to-focus and compare to focus-follows-mouse.
Add a keybinding to quit the compositor.
Log pointer enter/leave events for surfaces.

Solutions to the Homework/Exercises

Only call wlr_seat_keyboard_notify_enter on button press.
Check for Alt+Escape and set a quit flag.
Log each time you call wlr_seat_pointer_notify_enter or wlr_seat_pointer_clear_focus, since the compositor itself generates enter/leave.

2.4 Scene Graph, Damage Tracking, and Compositing

Fundamentals

The compositor maintains a scene graph representing windows, layers, and their positions. When a surface updates, only the damaged regions need re-rendering. Damage tracking is a performance feature: it reduces the amount of pixels you draw. wlroots provides a scene API that manages nodes for surfaces and handles damage calculation. You still need to build the graph, order nodes, and decide which layers are on top. A clear layering model also makes it easy to add panels and overlays later without rewriting your window management logic. Damage is tracked per output and per surface, so correct coordinate transforms matter.

Deep Dive into the concept

A scene graph is a tree of nodes representing visual elements. Each node has a position, size, and possibly a surface. When you create a window, you create a scene node and attach the surface. When you move the window, you update the node’s position. The graph order determines z-order. wlroots’ wlr_scene organizes nodes and computes damage regions automatically when surfaces update. This is a major simplification compared to older compositors that had to compute damage rectangles manually.

Damage tracking works by accumulating the regions that changed since the last frame. When a client commits a new buffer, it can attach a damage region describing which pixels differ. The compositor transforms that region into output space and determines which parts of the screen need to be redrawn. If you ignore damage, you must redraw the entire screen on every frame, which wastes GPU and CPU time. wlroots can optimize this: wlr_scene_output_commit renders only damaged areas.

However, you must still understand how damage interacts with movement and stacking. When a window moves, you must damage both the old and new regions, because the old region must be repainted with whatever is behind it. wlroots handles this when you move the scene node, but you must ensure you do move nodes through the API (not by hacking coordinates in your own data). Similarly, when the order changes (e.g., focus brings a window to the front), the newly exposed area needs redraw. Scene graphs handle this by re-evaluating z-order and re-damaging relevant nodes.
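The "damage old and new regions" rule is easy to check with rectangles. This sketch uses a bounding box as the damage for brevity; a real compositor keeps a region made of several rectangles so the gap between old and new positions is not repainted needlessly. `rect`, `rect_union`, and `move_window` are hypothetical names.

```c
#include <assert.h>

struct rect { int x, y, w, h; };

/* Smallest rectangle covering both a and b. */
static struct rect rect_union(struct rect a, struct rect b) {
    int x1 = a.x < b.x ? a.x : b.x;
    int y1 = a.y < b.y ? a.y : b.y;
    int ax2 = a.x + a.w, bx2 = b.x + b.w;
    int ay2 = a.y + a.h, by2 = b.y + b.h;
    int x2 = ax2 > bx2 ? ax2 : bx2;
    int y2 = ay2 > by2 ? ay2 : by2;
    return (struct rect){ x1, y1, x2 - x1, y2 - y1 };
}

/* Moves the window and returns the area that must be repainted:
 * the old position (to reveal what was behind) plus the new one. */
static struct rect move_window(struct rect *win, int new_x, int new_y) {
    struct rect old = *win;
    win->x = new_x;
    win->y = new_y;
    return rect_union(old, *win);
}
```

Moving a 100x50 window from (10,10) to (60,30) yields damage of 150x70 at (10,10), which covers both positions.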

Layer management is another key aspect. Compositors typically have layers: background, normal windows, panels, and overlays. With wlroots, you can create separate scene trees for each layer. When you implement layer-shell in later projects, your panel will be in a top layer. For now, your compositor can implement just a background and a window layer, but you should structure it so that more layers can be added.

Finally, the rendering backend. wlroots uses a renderer abstraction that can be backed by OpenGL ES or Pixman. You typically do not call OpenGL directly; you use wlr_renderer to draw surfaces. wlr_scene handles much of this. But understanding that the scene graph eventually becomes a sequence of draw calls is important. It explains why damage tracking improves performance and why incorrect damage can lead to visual artifacts. In practice, you should also be prepared to debug rendering with simple visual aids, such as drawing rectangles around damage regions or coloring the background differently when a frame is redrawn.

Another practical concept is occlusion. If one window fully covers another, the covered window does not need to be drawn. Some compositors implement occlusion culling to skip hidden surfaces. wlroots scene does not do full occlusion culling by default, but understanding the idea helps you reason about performance. When you later add panels or overlays, you will see how layer ordering affects what is visible and therefore what needs to be drawn. Even a simple debug overlay helps validate damage regions.

How this fits into the project

This concept is used in Section 3.2, Section 4.1-4.2, and Section 5.10. It will also carry forward to Project 4 (panels) where layer-shell surfaces must be placed into specific layers.

Definitions & key terms

scene graph -> tree of visual nodes representing surfaces and layers
damage -> region that needs redraw
wlr_scene -> wlroots scene API
z-order -> stacking order of windows
layer -> grouping of surfaces (background, normal, overlay)

Mental Model Diagram (ASCII)

Root scene
  |
  +-- Background layer
  +-- Window layer
  |     +-- Surface A (x=10,y=10)
  |     +-- Surface B (x=400,y=200)
  +-- Overlay layer

How It Works (Step-by-Step)

Create a root wlr_scene.
Create scene trees for background and windows.
When a surface appears, create a scene node under the window tree.
On surface commit, wlr_scene marks damage.
On output frame, render only damaged regions.

Invariants:

Scene node positions define where surfaces appear.
Damage must include both old and new regions when moving.

Failure modes:

Not updating scene node positions -> windows do not move.
Disabling damage -> high CPU/GPU usage.

Minimal Concrete Example

/* wlroots 0.16+: the scene root is a wlr_scene_tree at scene->tree. */
struct wlr_scene *scene = wlr_scene_create();
struct wlr_scene_tree *layer = wlr_scene_tree_create(&scene->tree);
struct wlr_scene_surface *node =
    wlr_scene_surface_create(layer, xdg_surface->surface);

Common Misconceptions

“I must redraw everything each frame.” -> Damage tracking avoids this.
“Scene graphs are only for 3D.” -> They are useful for 2D window compositing.

Check-Your-Understanding Questions

Why must the old window region be damaged when a window moves?
How does wlr_scene know which region changed?
What happens if you ignore client-provided damage?

Check-Your-Understanding Answers

Because the old region must be repainted with whatever is behind it.
It tracks surface commits and node transforms.
You redraw more than necessary, reducing performance.

Real-World Applications

Modern compositors use scene graphs to manage complex layering.
Damage tracking is essential for battery life on laptops.

Where You’ll Apply It

In this project: Section 4.1 High-Level Design, Section 5.10 Phase 2, Section 7.3 Performance Traps.
Also used in: P04 Layer Shell Panel and P03 Custom Protocol for custom surfaces.

References

wlroots scene API documentation
Weston compositor architecture notes

Key Insights

A scene graph is the compositor’s internal model of the desktop; damage tracking is how it stays fast.

Summary

Use wlr_scene to manage windows, and rely on damage tracking to minimize redraw work.

Homework/Exercises to Practice the Concept

Implement a background scene node and toggle its color.
Move a window and log the damage rectangles.
Add a simple overlay rectangle in a higher layer.

Solutions to the Homework/Exercises

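Create a wlr_scene_rect under the background tree and change its color with wlr_scene_rect_set_color on a timer.
Set WLR_SCENE_DEBUG_DAMAGE=highlight to visualize damage, or log the old and new window boxes whenever you move a node.
Create a new scene tree above the window tree and add a wlr_scene_rect to it.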

3. Project Specification

3.1 What You Will Build

A minimal Wayland compositor that:

Starts a Wayland display server
Handles outputs and renders a background
Accepts xdg-shell clients and displays their windows
Routes keyboard and pointer input to the focused window
Supports moving windows with a modifier + mouse drag

Excluded:

Advanced effects (blur, shadows)
Full configuration system
Tiling layout (floating windows only)

3.2 Functional Requirements

Wayland server: Create a wl_display, socket, and required globals.
Output management: Enable at least one output and render a background.
Client handling: Accept xdg-shell clients and display their surfaces.
Input routing: Handle pointer motion, clicks, and keyboard input; implement focus policy.
Window movement: Allow moving windows with Alt+drag.
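The Alt+drag requirement implies a small cursor state machine. The sketch below follows the shape used by tinywl (a cursor mode plus a grabbed view), but every name here is hypothetical and no wlroots types are involved.

```c
#include <assert.h>
#include <stddef.h>

enum cursor_mode { CURSOR_PASSTHROUGH, CURSOR_MOVE };

struct toy_win { int x, y; };

struct toy_server {
    enum cursor_mode mode;
    struct toy_win *grabbed; /* valid only while mode == CURSOR_MOVE */
    double grab_dx, grab_dy; /* cursor offset inside the window at grab time */
};

/* Called on Alt + button press over a window. */
static void begin_move(struct toy_server *s, struct toy_win *w,
                       double cx, double cy) {
    s->mode = CURSOR_MOVE;
    s->grabbed = w;
    s->grab_dx = cx - w->x;
    s->grab_dy = cy - w->y;
}

/* Returns 1 if the grab consumed the motion, 0 if the event should be
 * routed to whichever client is under the cursor. */
static int handle_motion(struct toy_server *s, double cx, double cy) {
    if (s->mode != CURSOR_MOVE) {
        return 0;
    }
    s->grabbed->x = (int)(cx - s->grab_dx);
    s->grabbed->y = (int)(cy - s->grab_dy);
    return 1;
}

/* Called on button release, and also when the grabbed window is destroyed. */
static void end_grab(struct toy_server *s) {
    s->mode = CURSOR_PASSTHROUGH;
    s->grabbed = NULL;
}
```

Storing the offset at grab time keeps the window from jumping so that its corner sits under the cursor. Forgetting to call `end_grab` is the "stuck move" failure mode listed in Section 2.3.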

3.3 Non-Functional Requirements

Performance: Idle CPU under 5 percent with a static desktop.
Reliability: No crashes when clients connect/disconnect.
Usability: Basic focus and window movement works predictably.

3.4 Example Usage / Output

$ ./my-compositor
[Compositor] socket=wayland-1
[Output] HDMI-A-1 1920x1080@60 enabled
[Input] keyboard=AT Translated Set 2
[Input] pointer=Logitech USB Mouse
[Client] xdg_toplevel mapped: foot

3.5 Data Formats / Schemas / Protocols

Wayland core protocol (wl_compositor, wl_shm, wl_seat, wl_output)
xdg-shell protocol (xdg_wm_base, xdg_toplevel)

3.6 Edge Cases

Client disconnects while focused -> focus should move to another client.
Output unplugged at runtime -> remove the output from the layout safely.
Multiple configures in quick succession -> only the latest size is used.
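
The configure edge case can be modeled without any Wayland code. The sketch below assumes a hypothetical `toplevel_state` that queues the configures the compositor has sent; when the client acks a serial, that size takes effect and every older configure is discarded. In the real protocol the size applies on the commit that follows the ack, and wlroots does this bookkeeping for you, so this is only a simplified illustration of the rule.

```c
#include <stdbool.h>
#include <stdint.h>

#define MAX_PENDING 8

struct configure { uint32_t serial; int width, height; };

/* Hypothetical per-toplevel bookkeeping. */
struct toplevel_state {
    struct configure pending[MAX_PENDING]; /* sent but not acked, oldest first */
    int n_pending;
    uint32_t next_serial;
    int width, height;                     /* size currently in effect */
};

/* Record a configure being sent; returns its serial.
 * If the queue is full, the oldest entry is dropped. */
static uint32_t send_configure(struct toplevel_state *t, int w, int h) {
    if (t->n_pending == MAX_PENDING) {
        for (int i = 1; i < t->n_pending; i++)
            t->pending[i - 1] = t->pending[i];
        t->n_pending--;
    }
    uint32_t serial = ++t->next_serial;
    t->pending[t->n_pending++] = (struct configure){ serial, w, h };
    return serial;
}

/* The client acked `serial`: apply that size and discard it together with
 * every older configure. Returns false if the serial is unknown. */
static bool ack_configure(struct toplevel_state *t, uint32_t serial) {
    for (int i = 0; i < t->n_pending; i++) {
        if (t->pending[i].serial != serial)
            continue;
        t->width = t->pending[i].width;
        t->height = t->pending[i].height;
        int remaining = t->n_pending - (i + 1);
        for (int j = 0; j < remaining; j++)
            t->pending[j] = t->pending[i + 1 + j];
        t->n_pending = remaining;
        return true;
    }
    return false;
}
```

If three configures are sent and the client acks only the last one, the first two sizes are never applied, which is the behavior the edge case asks for.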

3.7 Real World Outcome

3.7.1 How to Run (Copy/Paste)

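# wlroots headers require WLR_USE_UNSTABLE to be defined.
# Newer wlroots releases install a versioned pkg-config module
# (for example wlroots-0.18); adjust the name to match your system.
cc -O2 -Wall -DWLR_USE_UNSTABLE -o my-compositor main.c \
  $(pkg-config --cflags --libs wlroots wayland-server xkbcommon)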

# Run from TTY
./my-compositor

3.7.2 Golden Path Demo (Deterministic)

Single output: 1920x1080
Background color: #202020
Keybinding: Alt+Esc to exit

3.7.3 If CLI: Exact Terminal Transcript

$ ./my-compositor
[Compositor] Starting on wayland-1
[Output] HDMI-A-1 1920x1080@60 enabled
[Input] seat0 keyboard
[Input] seat0 pointer
[Compositor] Ready

Exit codes:

0 on clean exit
1 if backend fails to initialize

3.7.4 If GUI / Desktop

What you will see:

A fullscreen background
A movable cursor
Windows appearing when you launch clients

ASCII wireframe:

+--------------------------------------------------------------+
|  background (#202020)                                       |
|                                                              |
|   +------------------+    +------------------+               |
|   | terminal         |    | browser          |               |
|   | user@host:~$     |    | [web content]    |               |
|   +------------------+    +------------------+               |
|                                                              |
+--------------------------------------------------------------+

3.7.5 Failure Demo (Deterministic)

$ ./my-compositor
[Error] Could not open DRM device /dev/dri/card0
[Hint] add user to video group
[Exit] code=1

4. Solution Architecture

4.1 High-Level Design

+----------------------------+
| Wayland Server (wl_display)|
+-------------+--------------+
              |
              v
+----------------------------+
| wlroots Backend (DRM)      |
+-------------+--------------+
              |
              v
+----------------------------+
| Scene Graph + Renderer     |
+-------------+--------------+
              |
              v
+----------------------------+
| Input Routing (seat)       |
+----------------------------+

4.2 Key Components

4.3 Data Structures (No Full Code)

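struct view {
    struct wlr_xdg_surface *xdg_surface;  /* the client's xdg-shell surface */
    struct wlr_scene_tree *scene_node;    /* this view's subtree in the scene graph */
    int x, y;                             /* position in layout coordinates */
};

struct server {
    struct wl_display *display;           /* Wayland display, owns the event loop */
    struct wlr_backend *backend;          /* DRM/libinput (or nested) backend */
    struct wlr_renderer *renderer;        /* draws buffers to the output */
    struct wlr_allocator *allocator;      /* allocates buffers for the renderer */
    struct wlr_scene *scene;              /* root of the scene graph */
    struct wlr_output_layout *layout;     /* arrangement of outputs */
    struct wlr_seat *seat;                /* input devices and focus state */
    struct view *focused;                 /* keyboard-focused view, or NULL */
};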

4.4 Algorithm Overview

Key Algorithm: Focus and Move

On pointer button press, pick surface under cursor.
Set it as focused and bring to front.
If modifier is held, enter move grab.
On motion, update view position.

Complexity Analysis:

Hit testing: O(n) over visible surfaces
Rendering: O(k) over damaged surfaces
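
The four steps of the focus-and-move algorithm can be sketched in plain C with no wlroots dependency. Everything here (`sketch_view`, `sketch_server`, the handler names) is hypothetical and only mirrors the steps listed above; a real compositor would do the hit test through the scene graph and notify the seat about the focus change.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_VIEWS 16

struct sketch_view { int x, y, width, height; };

struct sketch_server {
    struct sketch_view *stack[MAX_VIEWS]; /* z-order, last entry is topmost */
    size_t count;
    struct sketch_view *focused;
    struct sketch_view *grabbed;          /* view being moved, or NULL */
    int grab_dx, grab_dy;                 /* cursor offset inside the grabbed view */
};

/* Step 1: topmost view containing (px, py), scanning front to back. O(n). */
static struct sketch_view *view_at(struct sketch_server *s, int px, int py) {
    for (size_t i = s->count; i-- > 0;) {
        struct sketch_view *v = s->stack[i];
        if (px >= v->x && px < v->x + v->width &&
            py >= v->y && py < v->y + v->height)
            return v;
    }
    return NULL;
}

/* Move v to the top of the z-order, keeping the order of the others. */
static void raise_view(struct sketch_server *s, struct sketch_view *v) {
    size_t i = 0;
    while (i < s->count && s->stack[i] != v)
        i++;
    if (i == s->count)
        return;
    for (; i + 1 < s->count; i++)
        s->stack[i] = s->stack[i + 1];
    s->stack[s->count - 1] = v;
}

/* Steps 2 and 3: focus, raise, and optionally start a move grab. */
static void on_button_press(struct sketch_server *s, int px, int py, bool mod_held) {
    struct sketch_view *v = view_at(s, px, py);
    if (!v)
        return;
    s->focused = v;
    raise_view(s, v);
    if (mod_held) {
        s->grabbed = v;
        s->grab_dx = px - v->x;
        s->grab_dy = py - v->y;
    }
}

/* Step 4: while a grab is active, the view follows the cursor. */
static void on_motion(struct sketch_server *s, int px, int py) {
    if (!s->grabbed)
        return;
    s->grabbed->x = px - s->grab_dx;
    s->grabbed->y = py - s->grab_dy;
}

static void on_button_release(struct sketch_server *s) {
    s->grabbed = NULL;
}
```

Storing the cursor offset at grab time, rather than snapping the window origin to the cursor, is what keeps the window from jumping when the drag starts.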

5. Implementation Guide

5.1 Development Environment Setup

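# Debian/Ubuntu package names; some releases ship a versioned wlroots package instead.
sudo apt install build-essential pkg-config libwlroots-dev libwayland-dev wayland-protocols libxkbcommon-dev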

5.2 Project Structure

my-compositor/
|-- src/
|   |-- main.c
|   |-- server.c
|   |-- input.c
|   |-- output.c
|   `-- view.c
|-- README.md
`-- Makefile

5.3 The Core Question You’re Answering

“How does a compositor turn client buffers into a secure, interactive desktop?”

5.4 Concepts You Must Understand First

Wayland server object lifecycles
Output pipeline and frame timing
Input focus and seat semantics
Scene graph and damage tracking

5.5 Questions to Guide Your Design

What focus policy will you implement?
How will you represent windows in your scene graph?
How will you handle output hotplug and scaling?

5.6 Thinking Exercise

Trace a complete input event: from mouse move to pointer focus to client event delivery.
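
One way to work through this exercise is to model the pointer-focus part as a tiny state machine that records which events a client would receive. The sketch below is hypothetical: surfaces are plain integer ids, and `pointer_moved` takes the result of a hit test as input. In a wlroots compositor the equivalent calls would go through the seat (tinywl uses wlr_seat_pointer_notify_enter, wlr_seat_pointer_notify_motion, and wlr_seat_pointer_clear_focus).

```c
#include <stddef.h>

enum ptr_event_type { PTR_ENTER, PTR_LEAVE, PTR_MOTION };

struct ptr_event { enum ptr_event_type type; int surface_id; int sx, sy; };

#define MAX_EVENTS 32

struct ptr_state {
    int focus_id;                     /* surface with pointer focus, -1 for none */
    struct ptr_event log[MAX_EVENTS]; /* events that would be sent to clients */
    size_t n_events;
};

static void ptr_emit(struct ptr_state *p, enum ptr_event_type type,
                     int id, int sx, int sy) {
    if (p->n_events < MAX_EVENTS)
        p->log[p->n_events++] = (struct ptr_event){ type, id, sx, sy };
}

/* target_id: surface under the cursor after hit testing (-1 if none).
 * sx, sy: cursor position in that surface's local coordinates. */
static void pointer_moved(struct ptr_state *p, int target_id, int sx, int sy) {
    if (target_id != p->focus_id) {
        /* Focus changes: the old surface gets leave, the new one gets enter. */
        if (p->focus_id >= 0)
            ptr_emit(p, PTR_LEAVE, p->focus_id, 0, 0);
        if (target_id >= 0)
            ptr_emit(p, PTR_ENTER, target_id, sx, sy);
        p->focus_id = target_id;
        return; /* enter already carries the position */
    }
    if (target_id >= 0)
        ptr_emit(p, PTR_MOTION, target_id, sx, sy);
}
```

Only the surface under the cursor ever appears in the log, which is the input-isolation property the trace should make visible: other clients never learn where the pointer is.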

5.7 The Interview Questions They’ll Ask

Why is the compositor also the window manager in Wayland?
How does a compositor enforce input isolation?
What is damage tracking and why does it matter?

5.8 Hints in Layers

Hint 1: Start from tinywl. Use tinywl as a reference, but rebuild the structure yourself.

Hint 2: Use wlr_scene. It simplifies damage and rendering.

Hint 3: Focus policy. Set focus on click to avoid surprising behavior.

5.9 Books That Will Help

5.10 Implementation Phases

Phase 1: Foundation (1 week)

Goals:

Start wlroots backend
Enable one output

Tasks:

Create the wl_display and the backend.
Enable the output, set its mode, and commit the output state.

Checkpoint: Background visible on screen.

Phase 2: Core Functionality (2-3 weeks)

Goals:

Handle xdg-shell clients
Render windows
Route input

Tasks:

Create the xdg_shell global and add mapped surfaces to the scene.
Implement pointer focus and keyboard focus.
Add move grab with modifier key.

Checkpoint: You can launch a terminal and move it.

Phase 3: Polish and Edge Cases (1 week)

Goals:

Handle client disconnects
Handle output hotplug

Tasks:

Remove views on destroy.
Add output destroy listeners.

Checkpoint: No crashes with connect/disconnect.
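
The cleanup that Phase 3 asks for can be sketched independently of wlroots. The types and the handler name below are hypothetical; in a real compositor this logic would typically run from the view's destroy listener, which would also remove the view's other listeners and its scene node. The point of the sketch is the order of operations: unlink the view, then clear every pointer that could still reference it.

```c
#include <stddef.h>

#define P3_MAX_VIEWS 16

struct p3_view { int id; };

struct p3_server {
    struct p3_view *views[P3_MAX_VIEWS]; /* z-order, last entry is topmost */
    size_t count;
    struct p3_view *focused;
    struct p3_view *grabbed;
};

/* Remove v from the server and make sure nothing still points at it. */
static void handle_view_destroy(struct p3_server *s, struct p3_view *v) {
    size_t i = 0;
    while (i < s->count && s->views[i] != v)
        i++;
    if (i == s->count)
        return; /* unknown or already removed */

    /* Close the gap in the z-order list. */
    for (; i + 1 < s->count; i++)
        s->views[i] = s->views[i + 1];
    s->count--;
    s->views[s->count] = NULL;

    /* An interactive move must not outlive its view. */
    if (s->grabbed == v)
        s->grabbed = NULL;

    /* Hand focus to the topmost remaining view, if there is one. */
    if (s->focused == v)
        s->focused = s->count > 0 ? s->views[s->count - 1] : NULL;
}
```

After the handler returns, the caller can free the view. Forgetting either of the last two checks is a common source of use-after-free crashes when the focused client disconnects.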

5.11 Key Implementation Decisions

6. Testing Strategy

6.1 Test Categories

6.2 Critical Test Cases

Client disconnect: view removed, no dangling pointers.
Alt-drag move: window follows cursor without jitter.
Focus change: key events go to active window only.

6.3 Test Data

Output: 1920x1080
Window positions: (10,10), (400,200)

7. Common Pitfalls & Debugging

7.1 Frequent Mistakes

7.2 Debugging Strategies

Use verbose logging for surface map/unmap events.
Run clients with WAYLAND_DEBUG=1 to inspect protocol traffic.

7.3 Performance Traps

Redrawing every frame without damage -> high GPU usage.
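
To see why this trap matters, consider a minimal model of a frame handler that only renders when something was damaged since the last frame. The `frame_state` type and function names are hypothetical, and wlr_scene already tracks damage for you; the sketch only illustrates the idea behind the idle-CPU requirement in Section 3.3.

```c
#include <stdbool.h>

/* Hypothetical per-output bookkeeping. */
struct frame_state {
    bool damaged;             /* something changed since the last render */
    unsigned frames_rendered;
    unsigned frames_skipped;
};

/* Call whenever a surface commits or a node moves. */
static void mark_damaged(struct frame_state *f) {
    f->damaged = true;
}

/* Call once per output refresh. Returns true if a render happened. */
static bool on_frame(struct frame_state *f) {
    if (!f->damaged) {
        f->frames_skipped++;  /* static desktop: nothing to do */
        return false;
    }
    f->frames_rendered++;     /* a real compositor would repaint here */
    f->damaged = false;
    return true;
}
```

With a single change during one second at 60 Hz, this model renders one frame and skips the other 59, whereas an unconditional redraw would render all 60.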

8. Extensions & Challenges

8.1 Beginner Extensions

Add a solid-color wallpaper node.
Add a keybinding to cycle focus.

8.2 Intermediate Extensions

Implement window resize with mouse grab.
Add basic window decorations.

8.3 Advanced Extensions

Add multi-output support with output layout.
Implement layer-shell support for panels.

9. Real-World Connections

9.1 Industry Applications

Sway and Wayfire are wlroots-based compositors; Hyprland was also built on wlroots before it moved to its own backend.

9.2 Related Open Source Projects

tinywl: minimal wlroots compositor example
Sway: production tiling compositor built on wlroots

9.3 Interview Relevance

Explain the responsibilities of a display server.
Discuss input routing and security in Wayland.

10. Resources

10.1 Essential Reading

“The Wayland Book” by Drew DeVault - compositor sections
wlroots documentation and examples

10.2 Video Resources

Wayland and wlroots talks (for example, from FOSDEM and XDC)

10.3 Tools & Documentation

weston-terminal: simple client for testing
wayland-info: inspect globals

11. Self-Assessment Checklist

11.1 Understanding

I can describe how a compositor handles globals and resources.
I understand the output pipeline and frame timing.
I can explain how focus and input routing work.

11.2 Implementation

Clients appear and render correctly.
Input works and focus changes properly.
No crashes on client disconnect.

11.3 Growth

I can describe at least one policy decision in my compositor.
I can compare wlroots abstractions to raw Wayland server APIs.

12. Submission / Completion Criteria

Minimum Viable Completion:

Compositor launches and shows background.
Clients connect and display windows.
Keyboard and pointer input are functional.

Full Completion:

All minimum criteria plus:
Window move implemented.
Output hotplug and client disconnect are handled cleanly, without crashes.

Excellence (Going Above & Beyond):

Multi-monitor support with output layout.
Basic window decorations and resize handling.