Project 3: Custom Wayland Protocol Extension
Project Overview
| Attribute | Value |
|---|---|
| Difficulty | Advanced (Level 3) |
| Time Estimate | 1-2 weeks |
| Programming Language | C (also: Rust, C++, Zig) |
| Knowledge Area | Graphics, Protocol Design |
| Main Book | "The Wayland Book" by Drew DeVault |
| Coolness Level | Level 4: Hardcore Tech Flex |
| Business Potential | Resume Gold |
What you'll build: Define and implement a custom Wayland protocol (XML) that extends functionality, for example a "screenshot" protocol or a "system tray" protocol, with both client and compositor sides.
Why it teaches protocols: Wayland is extensible through XML protocol definitions that generate C code. Understanding this mechanism demystifies how xdg-shell, wlr-layer-shell, and other protocols work.
Learning Objectives
By completing this project, you will be able to:
- Design Wayland protocols - Write XML protocol definitions with interfaces, requests, events, and arguments
- Use wayland-scanner - Generate client and server code from protocol definitions
- Implement protocol servers - Create the compositor-side implementation of your protocol
- Implement protocol clients - Write applications that use your custom protocol
- Handle versioning - Design protocols with forward/backward compatibility
- Understand existing protocols - Read and understand xdg-shell, layer-shell, and other extensions
The Core Question You're Answering
"How does Wayland's extensibility work, and how can I add new capabilities to the display server without modifying the core protocol?"
Wayland was designed to be minimal and extensible. The core protocol (wl_display, wl_surface, wl_buffer) does the minimum needed for display. Everything else (windows, popups, panels, screenshots) is an extension.
By building a custom protocol, you'll discover:
- Protocol-first design: You define the interface before implementing it
- Code generation: wayland-scanner turns XML into C code
- Object lifecycle: How objects are created, used, and destroyed
- Error handling: How to handle invalid client requests
- Versioning: How to evolve protocols without breaking clients
Deep Theoretical Foundation
1. Wayland Protocol Structure
Wayland protocols are defined in XML and have a specific structure:
<protocol name="my_protocol">
  <copyright>...</copyright>
  <interface name="my_interface" version="1">
    <description summary="short description">
      Longer description of what this interface does.
    </description>
    <!-- Requests: Client → Server -->
    <request name="do_something">
      <description summary="perform an action"/>
      <arg name="param" type="uint"/>
    </request>
    <!-- Events: Server → Client -->
    <event name="something_happened">
      <description summary="notification of result"/>
      <arg name="result" type="int"/>
    </event>
    <!-- Enums: Named constants -->
    <enum name="status">
      <entry name="ok" value="0"/>
      <entry name="error" value="1"/>
    </enum>
  </interface>
</protocol>
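To make the mapping from XML to bytes concrete, here is a minimal sketch of how a call to do_something could be laid out on the wire. It assumes the standard Wayland message header (a 32-bit object ID, then a 32-bit word holding the message size in the upper 16 bits and the opcode in the lower 16 bits) and that do_something, as the first request in the interface, gets opcode 0. `encode_uint_request` is a hypothetical helper written for illustration, not a libwayland function; in practice the generated code and libwayland do this for you.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: encode a request that carries one uint argument.
 * Layout: word 0 = object ID, word 1 = (size << 16) | opcode, word 2 = arg.
 * Returns the number of bytes written, or 0 if the buffer is too small. */
size_t encode_uint_request(uint8_t *out, size_t cap, uint32_t object_id,
                           uint16_t opcode, uint32_t param)
{
    const uint32_t size = 3 * sizeof(uint32_t); /* 8-byte header + 4-byte uint */
    uint32_t words[3];

    assert(out != NULL);
    if (cap < size)
        return 0;

    words[0] = object_id;
    words[1] = (size << 16) | opcode;   /* size counts the header too */
    words[2] = param;
    memcpy(out, words, size);           /* messages use host byte order */
    return size;
}
```

For example, calling `encode_uint_request(buf, sizeof buf, 3, 0, 42)` would produce a 12-byte message addressed to object 3.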
2. Argument Types
Wayland supports these argument types:
WAYLAND ARGUMENT TYPES
| Type | Wire Format | C Type | Purpose |
|---|---|---|---|
| int | 32-bit signed | int32_t | Signed integers |
| uint | 32-bit unsigned | uint32_t | Unsigned integers, enums |
| fixed | 24.8 fixed point | wl_fixed_t | Fractional values (coordinates) |
| string | length + UTF-8 | const char * | Text strings |
| object | 32-bit ID | struct wl_* | Reference to existing object |
| new_id | 32-bit ID | struct wl_* | Create new object |
| array | length + data | struct wl_array | Binary data |
| fd | ancillary data (SCM_RIGHTS) | int | File descriptor |
Special modifiers:
- interface="wl_surface" - Specifies expected interface for object/new_id
- allow-null="true" - Argument can be null
- enum="status" - Use enum values
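The fixed type is the least obvious entry above, so a small sketch may help. It treats a 24.8 value as a signed 32-bit integer whose low 8 bits are the fraction, which means converting is just scaling by 256. The `sketch_fixed_*` names are hypothetical stand-ins; libwayland ships its own `wl_fixed_from_double`, `wl_fixed_to_double`, `wl_fixed_from_int`, and `wl_fixed_to_int`, whose rounding may differ from this simplified version.

```c
#include <assert.h>
#include <stdint.h>

typedef int32_t sketch_fixed_t;   /* stand-in for wl_fixed_t */

/* Scale by 2^8 and truncate; the value must fit in 24 signed integer bits. */
sketch_fixed_t sketch_fixed_from_double(double d)
{
    assert(d > -8388608.0 && d < 8388608.0);
    return (sketch_fixed_t)(d * 256.0);
}

double sketch_fixed_to_double(sketch_fixed_t f)
{
    return f / 256.0;
}

sketch_fixed_t sketch_fixed_from_int(int i)
{
    return i * 256;
}

/* Integer division drops the fractional 8 bits (truncating toward zero). */
int sketch_fixed_to_int(sketch_fixed_t f)
{
    return f / 256;
}
```

With 8 fractional bits the smallest representable step is 1/256, which is why the type suits sub-pixel coordinates but not general floating-point data.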
3. Protocol Flow Example
Hereโs how a screenshot protocol might work:
SCREENSHOT PROTOCOL FLOW

Client                                      Compositor
  |                                             |
  | 1. Bind to screenshot_manager               |
  |    (wl_registry.bind)                       |
  | ------------------------------------------> |
  |                                             |
  | 2. screenshot_manager.capture_output        |
  |    (request with new_id, output)            |
  | ------------------------------------------> |
  |                                             | Creates screenshot_buffer
  |                                             | Captures output to buffer
  |                                             |
  | 3. screenshot_buffer.ready                  |
  |    (event with fd, dimensions)              |
  | <------------------------------------------ |
  |                                             |
  | 4. mmap(fd) and read pixels                 |
  |                                             |
  | 5. screenshot_buffer.destroy                |
  | ------------------------------------------> |
  |                                             |
  | OR on error:                                |
  |                                             |
  | 3. screenshot_buffer.failed                 |
  |    (event with reason string)               |
  | <------------------------------------------ |
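Step 4 of the flow above (mapping the received fd and reading pixels) can be sketched with plain POSIX calls. This is a minimal illustration, assuming 4 bytes per pixel and that the fd, width, height, and stride come from the ready event; `read_pixel` is a hypothetical helper name, not part of any Wayland library.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>

/* Hypothetical client helper: map `fd` read-only and copy out pixel (x, y).
 * Assumes 4 bytes per pixel. Returns 0 on success, -1 on failure. */
int read_pixel(int fd, uint32_t width, uint32_t height, uint32_t stride,
               uint32_t x, uint32_t y, uint32_t *out)
{
    size_t size;
    void *map;
    const uint8_t *row;

    assert(out != NULL);
    /* Never trust geometry blindly: reject out-of-range reads. */
    if (x >= width || y >= height || stride < (uint64_t)width * 4)
        return -1;

    size = (size_t)stride * height;
    map = mmap(NULL, size, PROT_READ, MAP_SHARED, fd, 0);
    if (map == MAP_FAILED) {
        perror("mmap");
        return -1;
    }

    row = (const uint8_t *)map + (size_t)y * stride;  /* rows are stride apart */
    memcpy(out, row + (size_t)x * 4, sizeof *out);
    munmap(map, size);
    return 0;
}
```

A real client would usually keep the mapping alive while it encodes the whole image, then munmap and close the fd before sending screenshot_buffer.destroy.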
4. wayland-scanner Code Generation
The scanner generates two sets of code:
wayland-scanner OUTPUTS
Input: my-protocol.xml

Client-side code (wayland-scanner client-header, private-code)

    // Proxy structure (client's view of the object)
    struct my_interface;

    // Request functions (client calls these)
    void my_interface_do_something(struct my_interface *obj, ...);

    // Listener structure (client implements these)
    struct my_interface_listener {
        void (*something_happened)(void *data, ...);
    };

    // Add listener (returns 0 on success, -1 if a listener is already set)
    int my_interface_add_listener(struct my_interface *obj,
                                  const struct my_interface_listener *listener,
                                  void *data);

Server-side code (wayland-scanner server-header, private-code)

    // Resource structure (server's view of the object)
    struct wl_resource;

    // Implementation structure (server implements these)
    struct my_interface_interface {
        void (*do_something)(struct wl_client *client,
                             struct wl_resource *resource, ...);
    };

    // Send event functions (server calls these)
    void my_interface_send_something_happened(struct wl_resource *resource, ...);
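The listener pattern in the generated client code is easier to follow with a toy model. The sketch below is a self-contained mock, not generated code: `mock_proxy`, `mock_listener`, `mock_add_listener`, and `mock_dispatch` are hypothetical names that imitate how a proxy stores a table of callbacks plus user data, and how an incoming event's opcode selects which callback runs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Mock of a generated listener: one function pointer per event, in XML order. */
struct mock_listener {
    void (*something_happened)(void *data, int32_t result);
};

/* Mock proxy: remembers the listener table and the user data pointer. */
struct mock_proxy {
    const struct mock_listener *listener;
    void *data;
};

/* Like the generated add_listener: fails if a listener is already set. */
int mock_add_listener(struct mock_proxy *proxy,
                      const struct mock_listener *listener, void *data)
{
    assert(proxy != NULL);
    if (proxy->listener != NULL)
        return -1;
    proxy->listener = listener;
    proxy->data = data;
    return 0;
}

/* Mock dispatch: the event opcode picks the callback to invoke. */
int mock_dispatch(struct mock_proxy *proxy, uint32_t opcode, int32_t arg)
{
    assert(proxy != NULL);
    if (proxy->listener == NULL)
        return -1;
    switch (opcode) {
    case 0: /* something_happened is the first event, so opcode 0 */
        if (proxy->listener->something_happened != NULL)
            proxy->listener->something_happened(proxy->data, arg);
        return 0;
    default:
        return -1; /* unknown opcode */
    }
}

/* Example callback: stores the last result in the user data. */
void record_result(void *data, int32_t result)
{
    *(int32_t *)data = result;
}
```

The server side mirrors this: the implementation struct is a table of request handlers, and the request opcode selects the handler.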
Complete Project Specification
Example Protocol: Screenshot Manager
Weโll implement a screenshot protocol as our example:
<?xml version="1.0" encoding="UTF-8"?>
<protocol name="screenshot">
  <copyright>
    Copyright © 2025 Your Name
    Permission is hereby granted, free of charge, to any person obtaining a
    copy of this software and associated documentation files (the "Software"),
    to deal in the Software without restriction...
  </copyright>
  <interface name="screenshot_manager" version="1">
    <description summary="screenshot capture interface">
      This protocol allows clients to request screenshots of outputs.
      The compositor captures the current framebuffer content and shares
      it with the client via a file descriptor.
    </description>
    <request name="capture_output">
      <description summary="capture an output's content">
        Request a screenshot of the specified output. The compositor will
        create a screenshot_buffer object and send its content via the
        ready event, or send a failed event if capture is not possible.
      </description>
      <arg name="id" type="new_id" interface="screenshot_buffer"/>
      <arg name="output" type="object" interface="wl_output"/>
      <arg name="include_cursor" type="int"
           summary="1 to include cursor, 0 to exclude"/>
    </request>
    <request name="destroy" type="destructor">
      <description summary="destroy the screenshot manager">
        Destroy this screenshot manager. This does not affect any
        in-progress screenshot captures.
      </description>
    </request>
  </interface>
  <interface name="screenshot_buffer" version="1">
    <description summary="screenshot buffer object">
      Represents a screenshot capture in progress. The compositor will
      send either a ready event with the captured data, or a failed
      event if the capture could not be completed.
    </description>
    <event name="ready">
      <description summary="screenshot is ready">
        Sent when the screenshot has been captured. The fd contains
        the raw pixel data in the specified format.
      </description>
      <arg name="fd" type="fd" summary="file descriptor with image data"/>
      <arg name="width" type="uint" summary="image width in pixels"/>
      <arg name="height" type="uint" summary="image height in pixels"/>
      <arg name="stride" type="uint" summary="bytes per row"/>
      <arg name="format" type="uint" summary="pixel format (wl_shm format)"/>
    </event>
    <event name="failed">
      <description summary="screenshot capture failed">
        Sent if the screenshot could not be captured.
      </description>
      <arg name="reason" type="string" summary="human-readable error"/>
    </event>
    <request name="destroy" type="destructor">
      <description summary="destroy the screenshot buffer">
        Destroy this screenshot buffer object. If the screenshot is
        still in progress, it will be canceled.
      </description>
    </request>
  </interface>
</protocol>
Functional Requirements
- Protocol Definition: XML file defining interfaces, requests, events
- Code Generation: Use wayland-scanner for both client and server
- Server Implementation: Compositor advertises and implements protocol
- Client Implementation: Tool that captures screenshots using protocol
- Error Handling: Proper handling of failures and edge cases
Solution Architecture
Project Structure
screenshot-protocol/
├── protocol/
│   └── screenshot.xml              # Protocol definition
├── server/
│   ├── screenshot-server.h         # Server API
│   ├── screenshot-server.c         # Implementation
│   └── Makefile
├── client/
│   ├── screenshot-client.c         # CLI tool
│   └── Makefile
├── generated/                      # wayland-scanner output
│   ├── screenshot-client-protocol.h
│   ├── screenshot-client-protocol.c
│   ├── screenshot-server-protocol.h
│   └── screenshot-server-protocol.c
└── Makefile
Server-Side Architecture
COMPOSITOR INTEGRATION
Your Compositor (from P02)

    struct screenshot_manager {
        struct wl_global *global;     // Advertised in registry
        struct wl_list resources;     // Connected clients
        struct my_server *server;     // Compositor reference
    };

    // Request handlers (declared before the bind callback that uses them)
    static const struct screenshot_manager_interface
    screenshot_manager_impl = {
        .capture_output = handle_capture_output,
        .destroy = handle_manager_destroy,
    };

    // On client bind:
    static void screenshot_manager_bind(struct wl_client *client,
                                        void *data,
                                        uint32_t version,
                                        uint32_t id) {
        // data is the pointer that was passed to wl_global_create()
        struct screenshot_manager *manager = data;
        struct wl_resource *resource =
            wl_resource_create(client, &screenshot_manager_interface,
                               version, id);
        if (!resource) {
            wl_client_post_no_memory(client);
            return;
        }
        wl_resource_set_implementation(resource,
                                       &screenshot_manager_impl,
                                       manager, NULL);
    }
Capture Implementation Flow
handle_capture_output()

1. Create screenshot_buffer resource
   (it must exist before validation, because failed is sent on it;
   it inherits the version the client bound on the manager)
     struct wl_resource *buffer_resource =
         wl_resource_create(client, &screenshot_buffer_interface,
                            wl_resource_get_version(resource), id);
2. Validate arguments
   - Is output valid?
   - Is client authorized?
3. Allocate shared memory for screenshot
     int fd = memfd_create("screenshot", MFD_CLOEXEC);
     size_t size = (size_t)stride * height;   // stride >= width * 4
     ftruncate(fd, size);
     void *data = mmap(NULL, size, PROT_READ | PROT_WRITE,
                       MAP_SHARED, fd, 0);
4. Capture output pixels
   Option A: Read from GPU framebuffer
   Option B: Use wlroots screencopy
     memcpy(data, output_pixels, size);
     munmap(data, size);
5. Send ready event
     screenshot_buffer_send_ready(buffer_resource,
                                  fd, width, height, stride, format);
     close(fd);  // Compositor's copy closed, client has it
OR on error:
     screenshot_buffer_send_failed(buffer_resource, "reason");
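The allocation step depends on getting the stride and total size right before calling ftruncate and mmap. The helper below is a minimal sketch of that arithmetic, assuming a 4-bytes-per-pixel format with no row padding. `compute_buffer_layout` is a hypothetical name, and the INT32_MAX cap is an assumption borrowed from wl_shm, whose pool sizes are signed 32-bit integers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: derive stride and byte size for a width x height
 * image at 4 bytes per pixel. Returns 0 on success, -1 if the geometry is
 * empty or the buffer would be unreasonably large. */
int compute_buffer_layout(uint32_t width, uint32_t height,
                          uint32_t *stride, size_t *size)
{
    uint64_t row, total;

    assert(stride != NULL && size != NULL);
    if (width == 0 || height == 0)
        return -1;

    row = (uint64_t)width * 4;      /* 64-bit math avoids silent wraparound */
    if (row > UINT32_MAX)
        return -1;                  /* stride travels as a 32-bit uint */

    total = row * height;
    if (total > INT32_MAX)
        return -1;                  /* assumed cap, mirroring wl_shm pools */

    *stride = (uint32_t)row;
    *size = (size_t)total;
    return 0;
}
```

For a 1920x1080 output this gives a stride of 7680 bytes and a buffer of 8294400 bytes. If the helper fails, sending the failed event is the natural response.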
Phased Implementation Guide
Phase 1: Protocol Definition (Day 1-2)
Goal: Create a valid XML protocol definition
Steps:
- Study existing protocols (xdg-shell.xml, wlr-screencopy-unstable-v1.xml)
- Design your interface hierarchy
- Define requests (client → server)
- Define events (server → client)
- Add proper descriptions and summaries
- Validate with wayland-scanner
Verification:
$ wayland-scanner client-header protocol/screenshot.xml /dev/stdout
# Should output valid C header without errors
Phase 2: Code Generation (Day 2)
Goal: Generate client and server code
Steps:
- Create Makefile with scanner rules
- Generate client header and code
- Generate server header and code
- Verify generated code compiles
Makefile Rules:
SCANNER = wayland-scanner

generated/screenshot-client-protocol.h: protocol/screenshot.xml
	$(SCANNER) client-header $< $@

generated/screenshot-client-protocol.c: protocol/screenshot.xml
	$(SCANNER) private-code $< $@

generated/screenshot-server-protocol.h: protocol/screenshot.xml
	$(SCANNER) server-header $< $@

# private-code output does not depend on the side; it is generated twice
# only so that client and server each have their own copy.
generated/screenshot-server-protocol.c: protocol/screenshot.xml
	$(SCANNER) private-code $< $@
Phase 3: Server Implementation (Day 3-5)
Goal: Implement compositor-side protocol
Steps:
- Create wl_global for screenshot_manager
- Implement bind callback
- Implement capture_output request handler
- Capture actual pixels (use wlroots renderer if available)
- Send ready/failed events
- Handle resource destruction
Key Implementation:
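static void handle_buffer_destroy(struct wl_client *client,
                                  struct wl_resource *resource) {
    wl_resource_destroy(resource);
}
static const struct screenshot_buffer_interface screenshot_buffer_impl = {
    .destroy = handle_buffer_destroy,
};
static void handle_capture_output(struct wl_client *client,
                                  struct wl_resource *resource,
                                  uint32_t id,
                                  struct wl_resource *output_resource,
                                  int32_t include_cursor) {
    // Create the buffer object first: ready and failed are both sent on it
    struct wl_resource *buffer =
        wl_resource_create(client, &screenshot_buffer_interface,
                           wl_resource_get_version(resource), id);
    if (!buffer) {
        wl_client_post_no_memory(client);
        return;
    }
    // Needed so the client's destroy request has a handler to run
    wl_resource_set_implementation(buffer, &screenshot_buffer_impl, NULL, NULL);
    struct wlr_output *output = wlr_output_from_resource(output_resource);
    if (!output) {
        screenshot_buffer_send_failed(buffer, "invalid output");
        return;
    }
    // Capture logic here...
    // Create fd, copy pixels, send ready event
}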
Phase 4: Client Implementation (Day 5-7)
Goal: Create command-line screenshot tool
Steps:
- Connect to Wayland display
- Get registry, bind to screenshot_manager
- Enumerate outputs
- Request screenshot
- Handle ready event, save to file
- Handle failed event
Client Usage:
$ ./screenshot-client --output HDMI-A-1 --file screenshot.png
Capturing output HDMI-A-1...
Saved 1920x1080 screenshot to screenshot.png
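Saving the captured pixels is the last client step. Writing a real PNG needs an encoder library, so the sketch below swaps in the much simpler binary PPM format to show the pixel handling on its own. It assumes the buffer is in XRGB8888 layout (bytes in memory ordered B, G, R, X); `write_ppm` is a hypothetical helper, and a finished tool would replace it with a PNG encoder.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: write XRGB8888 pixels as a binary PPM (P6) image.
 * Each source pixel is 4 bytes in memory: B, G, R, X. The padding byte is
 * dropped and the channels are reordered to R, G, B.
 * Returns 0 on success, -1 on bad geometry or write failure. */
int write_ppm(FILE *out, const uint8_t *pixels,
              uint32_t width, uint32_t height, uint32_t stride)
{
    uint32_t x, y;

    assert(out != NULL && pixels != NULL);
    if (stride < (uint64_t)width * 4)
        return -1;
    if (fprintf(out, "P6\n%u %u\n255\n", (unsigned)width, (unsigned)height) < 0)
        return -1;

    for (y = 0; y < height; y++) {
        const uint8_t *row = pixels + (size_t)y * stride;
        for (x = 0; x < width; x++) {
            const uint8_t *px = row + (size_t)x * 4;
            const uint8_t rgb[3] = { px[2], px[1], px[0] };
            if (fwrite(rgb, 1, 3, out) != 3)
                return -1;
        }
    }
    return 0;
}
```

The ready handler would mmap the fd, pass the mapping to a function like this, then munmap and close the fd.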
Phase 5: Integration and Polish (Day 7+)
Goal: Complete working system
Steps:
- Integrate server into your compositor
- Test with multiple clients
- Handle edge cases (output disconnect, client crash)
- Add cursor inclusion option
- Write documentation
Testing Strategy
Protocol Validation
# Check XML syntax
xmllint --noout protocol/screenshot.xml
# Verify scanner accepts it
wayland-scanner --version # Check version
wayland-scanner client-header protocol/screenshot.xml /dev/null
Integration Testing
| Test Case | Expected Result |
|---|---|
| Bind to manager | Resource created, no error |
| Capture valid output | ready event with valid fd |
| Capture invalid output | failed event with reason |
| Destroy manager during capture | Capture continues |
| Client disconnect during capture | Compositor handles gracefully |
| Multiple concurrent captures | All complete correctly |
Memory Testing
# Check for leaks
valgrind --leak-check=full ./screenshot-client --output HDMI-A-1
# Check compositor doesn't leak on client disconnect
# Run client multiple times, check compositor memory
Common Pitfalls and Debugging
Problem: Scanner rejects XML
Common causes:
- Missing version attribute on interface
- Invalid type in arg
- A destroy request that lacks type="destructor"
Fix: Check against working protocols like xdg-shell.xml
Problem: Client can't bind to protocol
Cause: Server not advertising global correctly
Debug:
WAYLAND_DEBUG=1 ./client 2>&1 | grep screenshot
# Should see: registry.global(..., "screenshot_manager", ...)
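Problem: fd not valid on client side
Cause: File descriptor passing failed, or the descriptor was sent as a plain integer
Fix: Declare the argument as type="fd" and pass the descriptor through the generated send function. libwayland then transfers it over the socket automatically. The descriptor number the client receives will usually differ from the compositor's, so never send the number itself as a uint.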
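To see what "passed over the socket" means underneath libwayland, here is a minimal sketch of sending a descriptor as SCM_RIGHTS ancillary data on a Unix socket. `send_fd` and `recv_fd` are hypothetical helpers for illustration; libwayland has its own internal implementation, and you would not call anything like this in a real Wayland client or compositor.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Hypothetical helper: send `fd` over a connected Unix socket. */
int send_fd(int sock, int fd)
{
    char byte = 0;                        /* carry at least one data byte */
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union { struct cmsghdr align; char buf[CMSG_SPACE(sizeof(int))]; } ctrl;
    struct msghdr msg = {0};
    struct cmsghdr *cmsg;

    assert(sock >= 0 && fd >= 0);
    memset(&ctrl, 0, sizeof ctrl);
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl.buf;
    msg.msg_controllen = sizeof ctrl.buf;

    cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;         /* "this payload is descriptors" */
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Hypothetical helper: receive one fd; returns it, or -1 on failure. */
int recv_fd(int sock)
{
    char byte;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union { struct cmsghdr align; char buf[CMSG_SPACE(sizeof(int))]; } ctrl;
    struct msghdr msg = {0};
    struct cmsghdr *cmsg;
    int fd;

    assert(sock >= 0);
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl.buf;
    msg.msg_controllen = sizeof ctrl.buf;

    if (recvmsg(sock, &msg, 0) != 1)
        return -1;
    cmsg = CMSG_FIRSTHDR(&msg);
    if (cmsg == NULL || cmsg->cmsg_level != SOL_SOCKET ||
        cmsg->cmsg_type != SCM_RIGHTS)
        return -1;
    memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
    return fd;                            /* a new number in the receiver */
}
```

The kernel installs a fresh descriptor in the receiving process that refers to the same open file, which is why the integer value is meaningless across processes.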
Problem: Protocol error on request
Cause: Mismatched interface versions or wrong argument types
Debug: Check that generated code matches your XML exactly
Extensions and Challenges
Challenge 1: Region Capture
Add ability to capture a specific rectangle, not just full output.
Challenge 2: Delayed Capture
Add a request to capture after a delay (for menus, etc.).
Challenge 3: Format Selection
Let the client request a specific pixel format or encoding (for example PNG or JPEG compression).
Challenge 4: Permission Model
Add authorization (only allow screenshot for trusted clients).
Challenge 5: Live Preview
Implement continuous capture for screen sharing.
Real-World Connections
Production Wayland Protocols
| Protocol | Purpose | Stability |
|---|---|---|
| xdg-shell | Window management | Stable |
| wlr-layer-shell | Panels, overlays | Unstable |
| wlr-screencopy | Screenshots | Unstable |
| xdg-output | Output metadata | Unstable |
| zwp-linux-dmabuf | GPU buffer sharing | Stable |
Protocol Design Principles
From real-world experience:
- Asynchronous by default: Events, not return values
- Object-oriented: Create objects for stateful operations
- Version carefully: New features = new version number
- Document thoroughly: Comments in XML are the spec
- Fail explicitly: Send error events, don't silently fail
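The "version carefully" principle comes down to two small checks, sketched below with hypothetical names. A client binds the lower of the version the compositor advertises and the highest version the client understands, and a compositor sends a newer event only to resources bound at a version that includes it. In real code the second check would compare `wl_resource_get_version()` against a scanner-generated `*_SINCE_VERSION` macro; the "progress" event and its version 2 here are purely illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical: version in which a later revision might add a "progress"
 * event to screenshot_buffer. */
#define SKETCH_PROGRESS_SINCE_VERSION 2

/* Client side: bind the highest version both sides understand. */
uint32_t sketch_bind_version(uint32_t advertised, uint32_t client_max)
{
    assert(advertised >= 1 && client_max >= 1);
    return advertised < client_max ? advertised : client_max;
}

/* Server side: an event may be sent only if the bound version includes it. */
int sketch_may_send(uint32_t resource_version, uint32_t since)
{
    assert(since >= 1);
    return resource_version >= since;
}
```

Objects created through a request, such as screenshot_buffer, take their version from the parent object, so the same check applies to them.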
Resources
Essential Reading
| Resource | Purpose |
|---|---|
| "The Wayland Book" Ch. 5 | Protocol design philosophy |
| xdg-shell.xml | Reference protocol |
| wayland.xml | Core protocol patterns |
| wlr-protocols repo | Extension examples |
Code References
- wayland-protocols/ - Official extensions
- wlr-protocols/ - wlroots extensions
- wayland-scanner source - Understanding generation
Self-Assessment Checklist
Before considering this project complete, verify you can:
- Write a new Wayland interface from scratch (XML)
- Explain the difference between requests and events
- Generate code with wayland-scanner (client and server)
- Implement a protocol server in a compositor
- Write a client that uses your protocol
- Handle object lifecycle (creation, destruction)
- Debug protocol errors using WAYLAND_DEBUG
- Explain how file descriptors are passed between processes
- Add a new request to your protocol (version bump)
- Read and understand xdg-shell.xml
Completing this project gives you the ability to extend Wayland in any direction. You can now implement custom protocols for your specific needs: IPC, configuration, and features not covered by existing protocols. This is the key to building unique compositors and desktop environments.