Project 2: Key-Value Store Client Library

Build a production-style C client library that hides protocol details and enforces clear ownership contracts.

Quick Reference

Attribute | Value
Difficulty | Intermediate
Time Estimate | 4-6 days
Main Programming Language | C
Alternative Programming Languages | Rust, Go
Coolness Level | Level 2: Practical but Useful
Business Potential | Level 3: Service and Support
Prerequisites | Project 1, basic socket I/O, memory ownership rules
Key Topics | Opaque handles, framing protocols, retriable errors

1. Learning Objectives

  1. Define a stable public C API with explicit ownership and lifecycle rules.
  2. Implement robust request/response framing over TCP that handles partial reads and writes.
  3. Normalize network and protocol errors into reusable client error codes.
  4. Build deterministic connectivity and retry tests.

2. All Theory Needed (Per-Concept Breakdown)

API Boundaries and Ownership in C

Fundamentals

In C, APIs are contracts, not hints. Every pointer in a function signature implies ownership, mutability, and lifetime expectations. If those expectations are implicit, users eventually trigger leaks, double frees, and dangling references. Opaque handles protect invariants by hiding internal structure from callers and forcing access through controlled functions. Good client libraries document exactly who allocates, who frees, and when resources become invalid.

Deep Dive into the concept

Opaque-handle APIs prevent structural coupling and preserve upgrade flexibility. If clients can embed your struct layout, internal changes become breaking changes. By exposing only forward declarations in headers, you preserve implementation freedom and discourage misuse. Ownership is equally critical. Consider a get function returning textual data: returning an internal pointer is fast but fragile, because the pointer may be invalidated by the next call on the handle or by closing it; returning caller-owned heap memory is safer but requires an explicit free function. Either can work if the contract is precise.
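A minimal sketch of the opaque-handle pattern follows. The names kv_client, kv_client_open, kv_client_close, and kv_client_timeout_ms are hypothetical. The header part and the private part are shown in one file only so the example compiles standalone; in the project layout the declarations would live in include/kvclient.h and the struct definition in src/client.c.

```c
#include <assert.h>
#include <stdlib.h>

/* ---- Public header part (include/kvclient.h): all callers ever see. ---- */
typedef struct kv_client kv_client;                 /* incomplete (opaque) type */

kv_client *kv_client_open(int timeout_ms);          /* NULL on failure; release with kv_client_close */
void       kv_client_close(kv_client *c);           /* accepts NULL; handle is invalid afterwards */
int        kv_client_timeout_ms(const kv_client *c);/* read-only accessor */

/* ---- Private part (src/client.c): layout may change between releases. ---- */
struct kv_client {
    int conn_fd;     /* -1 until connected */
    int timeout_ms;
};

kv_client *kv_client_open(int timeout_ms) {
    if (timeout_ms <= 0) return NULL;               /* reject invalid config up front */
    kv_client *c = malloc(sizeof *c);
    if (c == NULL) return NULL;
    c->conn_fd = -1;
    c->timeout_ms = timeout_ms;
    return c;
}

void kv_client_close(kv_client *c) {
    if (c == NULL) return;                          /* no-op on NULL, like free(NULL) */
    /* A real implementation would close(c->conn_fd) here when it is >= 0. */
    free(c);
}

int kv_client_timeout_ms(const kv_client *c) {
    assert(c != NULL);                              /* NULL is a caller contract violation */
    return c->timeout_ms;
}
```

Because callers only ever hold a kv_client *, fields such as conn_fd can be added, removed, or reordered in a later release without recompiling callers.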

Error modeling is another boundary dimension. Network errors, protocol parse errors, timeout events, and server-side semantic errors should map to stable error categories. A strong API avoids leaking implementation-specific errno details unless explicitly requested for diagnostics. This allows consumers to implement meaningful recovery behavior.
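One possible error taxonomy is sketched below, assuming the library exposes a single kv_status enum (all names are hypothetical). The mapping collapses many errno values into a few stable categories; a real library might also keep the raw errno in an optional diagnostic field.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Stable, library-owned status codes: published values never change. */
typedef enum {
    KV_OK = 0,
    KV_ERR_CONNECT,    /* could not establish a connection */
    KV_ERR_TIMEOUT,    /* deadline exceeded */
    KV_ERR_CONN_LOST,  /* peer reset or closed the connection */
    KV_ERR_PROTOCOL,   /* malformed or unexpected frame */
    KV_ERR_NOT_FOUND,  /* server-side semantic error: key absent */
    KV_ERR_NOMEM,      /* allocation failed */
    KV_ERR_INTERNAL    /* anything unmapped */
} kv_status;

/* Callers may rely on `if (status != KV_OK)`, so pin the success value. */
static_assert(KV_OK == 0, "KV_OK must remain 0");

/* Collapse errno values from socket calls into stable categories. */
kv_status kv_status_from_errno(int err) {
    switch (err) {
    case 0:            return KV_OK;
    case ETIMEDOUT:
    case EAGAIN:       return KV_ERR_TIMEOUT;    /* includes SO_RCVTIMEO/SO_SNDTIMEO expiry */
    case ECONNRESET:
    case EPIPE:        return KV_ERR_CONN_LOST;
    case ECONNREFUSED:
    case EHOSTUNREACH:
    case ENETUNREACH:  return KV_ERR_CONNECT;
    case ENOMEM:       return KV_ERR_NOMEM;
    default:           return KV_ERR_INTERNAL;
    }
}

/* "A retry might succeed." Whether a retry is safe also depends on idempotency. */
bool kv_status_is_retriable(kv_status s) {
    return s == KV_ERR_CONNECT || s == KV_ERR_TIMEOUT || s == KV_ERR_CONN_LOST;
}
```

EWOULDBLOCK is not listed separately because on Linux it has the same value as EAGAIN, and a duplicate case label would not compile; a portable version would check it with an if on platforms where the two differ.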

Protocol framing over stream transports demands exact byte accounting. TCP does not preserve message boundaries: one read may return part of a frame, or the end of one frame plus the start of the next. If the protocol uses length-prefix framing, your client must first read the complete fixed-size header, then read exactly the number of payload bytes it announces, looping until every byte has arrived. Failure to handle partial reads and writes causes state desynchronization that appears as random parse failures under load.
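The looped reads and writes described above can be sketched as two helpers. read_exact and write_all are hypothetical names, and reporting EOF mid-frame as ECONNRESET is an assumed convention that keeps a partial frame from ever being mistaken for a complete one.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Read exactly len bytes. Returns 0 on success, -1 on error (errno set).
 * A peer close before len bytes arrive is reported as ECONNRESET. */
int read_exact(int fd, void *buf, size_t len) {
    assert(buf != NULL || len == 0);
    unsigned char *p = buf;
    size_t got = 0;
    while (got < len) {
        ssize_t n = read(fd, p + got, len - got);
        if (n > 0) {
            got += (size_t)n;              /* short read: keep looping */
        } else if (n == 0) {
            errno = ECONNRESET;            /* EOF mid-frame */
            return -1;
        } else if (errno != EINTR) {       /* EINTR: interrupted by a signal, retry */
            return -1;
        }
    }
    return 0;
}

/* Write exactly len bytes, looping over short writes. */
int write_all(int fd, const void *buf, size_t len) {
    assert(buf != NULL || len == 0);
    const unsigned char *p = buf;
    size_t sent = 0;
    while (sent < len) {
        ssize_t n = write(fd, p + sent, len - sent);
        if (n > 0) {
            sent += (size_t)n;
        } else if (n == 0) {
            errno = EIO;                   /* no progress: treat as an I/O error */
            return -1;
        } else if (errno != EINTR) {
            return -1;
        }
    }
    return 0;
}
```

On sockets, writing to a peer that has closed can raise SIGPIPE; Linux callers often use send() with MSG_NOSIGNAL, or ignore SIGPIPE, so the failure surfaces as EPIPE instead.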

Finally, boundary observability matters. Provide debug hooks (trace IDs, request timing, raw frame dumps in debug mode) so users can diagnose behavior without forking library internals.
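As one example of a debug hook, a raw-frame dump can be rendered into a caller-supplied buffer, leaving the choice of log sink to the user. kv_hexdump is a hypothetical helper; it allocates nothing, so it is safe to call from error paths, and it truncates cleanly when the output buffer is small.

```c
#include <assert.h>
#include <stddef.h>

/* Render frame bytes as space-separated hex ("01 00 ab") into out, always
 * NUL-terminated. Returns how many frame bytes were rendered before out filled.
 * Intended for debug builds: frames may contain user data. */
size_t kv_hexdump(const unsigned char *frame, size_t len, char *out, size_t out_size) {
    static const char digits[] = "0123456789abcdef";
    assert(out != NULL && out_size > 0);
    size_t pos = 0, i;
    for (i = 0; i < len; i++) {
        size_t need = (i > 0) ? 3 : 2;          /* optional space + two hex digits */
        if (pos + need + 1 > out_size) break;   /* +1 keeps room for the NUL */
        if (i > 0) out[pos++] = ' ';
        out[pos++] = digits[frame[i] >> 4];
        out[pos++] = digits[frame[i] & 0x0f];
    }
    out[pos] = '\0';
    return i;
}
```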

How this fits into the projects

  • Primary in this project.
  • Reinforces API discipline used later in SQL and transaction client surfaces.

Definitions & key terms

  • Opaque handle: Incomplete type that hides internal representation.
  • Ownership transfer: Rule defining allocator and deallocator responsibilities.
  • Framing: Method to delimit messages over byte streams.

Mental model diagram

Caller App
   |
   v
Public Header API (stable contract)
   |
   v
Client Library Internals (sockets, framing, retries)
   |
   v
KV Server Protocol

How it works (step-by-step, with invariants and failure modes)

  1. Caller creates handle via connect/open API.
  2. Request encoded with framed protocol.
  3. Client sends frame, receives response frame fully.
  4. Response decoded and returned with ownership contract.
  5. Caller frees resources via dedicated release APIs.

Invariants:

  • No internal pointers leaked to caller unless explicitly borrowed.
  • Each successful allocation has one valid release path.
  • Framed read/write loops consume exact lengths.

Failure modes:

  • Partial frame read interpreted as full frame.
  • Ambiguous ownership of returned buffers.
  • Reconnect logic duplicating non-idempotent writes.
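The third failure mode can be guarded with an explicit retry decision. This sketch assumes a hypothetical classification of where an attempt failed; KV_OP_APPEND is a hypothetical non-idempotent operation, not part of this project's API, included only to show the case that a blind reconnect-and-resend gets wrong.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum { KV_OP_GET, KV_OP_SET, KV_OP_DELETE, KV_OP_APPEND /* hypothetical */ } kv_op;

typedef enum {
    KV_FAIL_NONE,         /* attempt succeeded */
    KV_FAIL_BEFORE_SEND,  /* no request bytes were written: server never saw it */
    KV_FAIL_AFTER_SEND,   /* request may have reached the server */
    KV_FAIL_PROTOCOL      /* malformed response: retrying will not help */
} kv_fail_stage;

static bool kv_op_is_idempotent(kv_op op) {
    switch (op) {
    case KV_OP_GET:
    case KV_OP_SET:       /* writing the same value twice leaves the same state */
    case KV_OP_DELETE:    /* deleting twice leaves the key absent either way */
        return true;
    default:
        return false;     /* APPEND: a replay would append twice */
    }
}

/* May attempt number `attempt` (1-based) be followed by another one? */
bool kv_should_retry(kv_op op, kv_fail_stage stage, int attempt, int max_attempts) {
    assert(max_attempts >= 1);
    if (attempt >= max_attempts) return false;
    switch (stage) {
    case KV_FAIL_BEFORE_SEND: return true;
    case KV_FAIL_AFTER_SEND:  return kv_op_is_idempotent(op);
    default:                  return false;
    }
}
```

SET counts as idempotent only if no other client writes the same key between the original attempt and the replay; under concurrent writers, a replayed SET can overwrite a newer value.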

Minimal concrete example

Pseudo-signature contract:
kv_client_get(handle, key, out_value, out_len)
- returns status code
- allocates out_value on success
- caller must call kv_client_free(out_value)
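The copy-out step behind this contract might look like the sketch below, assuming kv_client_get calls it after a response payload has been received and decoded. kv_copy_out is hypothetical; kv_client_free matches the pseudo-signature above. A dedicated free function, rather than telling callers to use free(), lets the library change allocators later and avoids mismatched-runtime problems when the library ships as a Windows DLL.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef enum { KV_OK = 0, KV_ERR_INVALID, KV_ERR_NOMEM } kv_status;

/* On success: *out_value is heap-allocated and NUL-terminated (convenient for
 * text, though binary values may contain NULs); *out_len excludes the NUL.
 * On any failure: *out_value == NULL and *out_len == 0, so nothing leaks. */
kv_status kv_copy_out(const unsigned char *payload, size_t len,
                      char **out_value, size_t *out_len) {
    if (out_value == NULL || out_len == NULL) return KV_ERR_INVALID;
    *out_value = NULL;
    *out_len = 0;
    if ((payload == NULL && len > 0) || len == SIZE_MAX) return KV_ERR_INVALID;
    char *copy = malloc(len + 1);
    if (copy == NULL) return KV_ERR_NOMEM;
    if (len > 0) memcpy(copy, payload, len);
    copy[len] = '\0';
    *out_value = copy;
    *out_len = len;
    return KV_OK;
}

/* The one authoritative release path for values returned by the library. */
void kv_client_free(void *p) {
    free(p);  /* free(NULL) is a no-op, so failure paths need no special case */
}
```

Because every failure path leaves *out_value NULL, callers can call kv_client_free on it unconditionally without tracking which branch ran.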

Common misconceptions

  • “const alone defines ownership.” -> False; const only restricts mutation through that pointer, so ownership needs explicit docs.
  • “TCP read returns one full message.” -> False; a stream read can return part of a message or several messages at once.

Check-your-understanding questions

  1. Why hide internal structs in headers?
  2. Why must protocols define frame boundaries explicitly?
  3. What makes an API error model stable?

Check-your-understanding answers

  1. Prevent misuse and preserve implementation freedom.
  2. Streams do not carry message boundaries.
  3. Consistent categories independent of internal details.

Real-world applications

  • Redis C clients
  • Database driver SDKs

Where you’ll apply it

  • This project and later SQL client/server boundaries in P06/P07.

References

Key insights

A clean C API is a reliability feature, not a cosmetic preference.

Summary

Boundary clarity prevents entire classes of long-tail production bugs.

Homework/Exercises to practice the concept

  1. Write an ownership table for every API function.
  2. Design an error-code taxonomy for connect/send/receive/decode failures.
  3. Create a malformed-frame corpus with expected error outcomes.

Solutions to the homework/exercises

  1. Each pointer path should have one authoritative free function.
  2. Keep categories stable; include optional diagnostic detail field.
  3. Reject gracefully and keep connection state consistent.

3. Project Specification

3.1 What You Will Build

A C client library, buildable as a static or shared library, exposing connect, set, get, delete, and close operations with framing and retries.

3.2 Functional Requirements

  1. Connect to server endpoint with timeout.
  2. Encode and send framed requests.
  3. Parse and validate framed responses.
  4. Return stable error codes.
  5. Expose cleanup APIs for all owned resources.

3.3 Non-Functional Requirements

  • Performance: Single-request overhead remains predictable.
  • Reliability: No memory leaks over reconnect loops.
  • Usability: Header docs fully describe ownership.

3.4 Example Usage / Output

connect -> set -> get -> free-returned-buffer -> close

3.5 Data Formats / Schemas / Protocols

Length-prefixed request and response frames with operation code and payload size.

3.6 Edge Cases

Timeout, connection reset mid-frame, malformed length prefix, zero-length value.
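Header validation is where several of these edge cases are decided. This sketch assumes hypothetical opcodes 1 to 3 and a 1 MiB payload cap: zero-length values pass, while unknown opcodes and oversized lengths are rejected before any allocation happens.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical opcodes and size cap, for illustration only. */
enum { KV_OP_SET = 1, KV_OP_GET = 2, KV_OP_DEL = 3, KV_OP_MAX = KV_OP_DEL };
#define KV_MAX_PAYLOAD (1u << 20)  /* bounds allocation driven by an untrusted length */

typedef enum { KV_FRAME_OK, KV_FRAME_BAD_OP, KV_FRAME_TOO_LARGE } kv_frame_check;

kv_frame_check kv_check_header(uint8_t op_code, uint32_t payload_len) {
    if (op_code < KV_OP_SET || op_code > KV_OP_MAX) return KV_FRAME_BAD_OP;
    if (payload_len > KV_MAX_PAYLOAD) return KV_FRAME_TOO_LARGE;
    return KV_FRAME_OK;  /* payload_len == 0 is valid: an empty value */
}
```

After a malformed header, the safest recovery is to close the connection: with length-prefix framing, once a length cannot be trusted there is no reliable way to find where the next frame starts.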

3.7 Real World Outcome

3.7.1 How to Run (Copy/Paste)

make libkvclient
./kvclient_demo --host 127.0.0.1 --port 7390

3.7.2 Golden Path Demo (Deterministic)

Use a fixed command sequence against a fixed server fixture so every run produces the same output.

3.7.3 If CLI: exact terminal transcript

$ ./kvclient_demo
connect -> OK
set user:7:name "Lin" -> OK
get user:7:name -> VALUE "Lin"
free value buffer -> OK
close -> OK

4. Solution Architecture

4.1 High-Level Design

Public API -> Request Builder -> Socket Transport -> Response Decoder
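The Request Builder stage could encode a SET as sketched below. The payload layout (a 4-byte big-endian key length, then the key bytes, then the value bytes, with the value length implied by the frame's payload_len) is an assumption for illustration, as is the function name. The returned buffer is library-internal and released with free().

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Assumed SET payload: key_len (4 bytes, BE) | key | value.
 * Returns a heap buffer and its size in *out_len, or NULL on invalid input/OOM. */
unsigned char *kv_build_set_payload(const char *key, size_t key_len,
                                    const void *value, size_t value_len,
                                    size_t *out_len) {
    assert(key != NULL && out_len != NULL && (value != NULL || value_len == 0));
    *out_len = 0;
    if (key_len == 0 || key_len > UINT32_MAX || key_len > SIZE_MAX - 4) return NULL;
    if (value_len > SIZE_MAX - 4 - key_len) return NULL;   /* overflow guard */
    size_t total = 4 + key_len + value_len;
    unsigned char *p = malloc(total);
    if (p == NULL) return NULL;
    uint32_t k = (uint32_t)key_len;
    p[0] = (unsigned char)(k >> 24);
    p[1] = (unsigned char)(k >> 16);
    p[2] = (unsigned char)(k >> 8);
    p[3] = (unsigned char)k;
    memcpy(p + 4, key, key_len);
    if (value_len > 0) memcpy(p + 4 + key_len, value, value_len);
    *out_len = total;
    return p;
}
```

Carrying an explicit key length rather than a separator means keys and values may contain any byte, including NUL and newline.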

4.2 Key Components

Component | Responsibility | Key Decisions
Public API | Stable caller contract | Opaque handle + explicit free
Transport | Reliable framed I/O | Exact-byte read/write loops
Codec | Encode/decode protocol | Length and opcode validation
Error model | Unified status mapping | Stable enum categories

4.3 Data Structures (No Full Code)

ClientHandle { conn_fd, timeouts, retry_policy, trace_id }
FrameHeader  { op_code, payload_len, req_id }

4.4 Algorithm Overview

  1. Build frame from request.
  2. Send full frame with retry on interrupt.
  3. Read full header then payload.
  4. Decode and return typed result.
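The four steps can be sketched end to end over blocking file descriptors. The header layout (op 1 byte, length and req_id 4 bytes big-endian), the 1 MiB cap, and all function names are assumptions; the exact-length helpers are repeated so the block compiles standalone.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define KV_HDR 9                   /* assumed: op(1) | len(4, BE) | req_id(4, BE) */
#define KV_MAX_PAYLOAD (1u << 20)  /* hypothetical 1 MiB cap */

static int read_exact(int fd, void *buf, size_t len) {
    unsigned char *p = buf;
    while (len > 0) {
        ssize_t n = read(fd, p, len);
        if (n < 0 && errno == EINTR) continue;          /* interrupted: retry */
        if (n < 0) return -1;
        if (n == 0) { errno = ECONNRESET; return -1; }  /* EOF mid-frame */
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

static int write_all(int fd, const void *buf, size_t len) {
    const unsigned char *p = buf;
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0 && errno == EINTR) continue;
        if (n < 0) return -1;
        if (n == 0) { errno = EIO; return -1; }
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

static uint32_t be32(const unsigned char *p) {
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) | ((uint32_t)p[2] << 8) | p[3];
}

/* Steps 1-2: build the header, then send header and payload in full. */
int kv_send_frame(int fd, uint8_t op, uint32_t req_id, const void *payload, uint32_t len) {
    assert(payload != NULL || len == 0);
    unsigned char hdr[KV_HDR];
    hdr[0] = op;
    for (int i = 0; i < 4; i++) {
        hdr[1 + i] = (unsigned char)(len >> (24 - 8 * i));
        hdr[5 + i] = (unsigned char)(req_id >> (24 - 8 * i));
    }
    if (write_all(fd, hdr, sizeof hdr) < 0) return -1;
    return write_all(fd, payload, len);
}

/* Steps 3-4: read the full header, validate the length, then read exactly that
 * many bytes. On success *payload is heap-allocated and owned by the caller. */
int kv_recv_frame(int fd, uint8_t *op, uint32_t *req_id, unsigned char **payload, uint32_t *len) {
    unsigned char hdr[KV_HDR];
    *payload = NULL;
    if (read_exact(fd, hdr, sizeof hdr) < 0) return -1;
    uint32_t n = be32(hdr + 1);
    if (n > KV_MAX_PAYLOAD) { errno = EPROTO; return -1; }  /* never trust an unchecked length */
    unsigned char *buf = malloc(n > 0 ? n : 1);
    if (buf == NULL) return -1;
    if (read_exact(fd, buf, n) < 0) { free(buf); return -1; }
    *op = hdr[0];
    *req_id = be32(hdr + 5);
    *payload = buf;
    *len = n;
    return 0;
}
```

Sending the header and payload in two write() calls keeps the sketch simple, but on TCP it can interact badly with Nagle's algorithm and delayed ACKs; assembling one buffer or using writev() avoids that.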

5. Implementation Guide

5.1 Development Environment Setup

make libkvclient kvclient_demo

5.2 Project Structure

project/
├── include/kvclient.h
├── src/client.c
├── src/codec.c
├── src/transport.c
└── tests/

5.3 The Core Question You’re Answering

How do I make a C client API safe and predictable for unknown downstream callers?

5.4 Concepts You Must Understand First

  • Opaque-handle pattern
  • Framed protocol over stream transport
  • Ownership and cleanup guarantees

5.5 Questions to Guide Your Design

  • Which calls are idempotent under retry?
  • How is timeout configured and surfaced?
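One common way to configure and surface a connect timeout is a non-blocking connect() followed by poll(), as sketched here. kv_connect_timeout is a hypothetical name, and the sketch accepts only numeric IPv4 addresses to stay short; a fuller version would resolve host names with getaddrinfo().

```c
#define _POSIX_C_SOURCE 200809L
#include <arpa/inet.h>
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <poll.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Returns a connected, blocking fd, or -1 with errno set (ETIMEDOUT on deadline). */
int kv_connect_timeout(const char *ipv4, uint16_t port, int timeout_ms) {
    struct sockaddr_in addr;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    if (inet_pton(AF_INET, ipv4, &addr.sin_addr) != 1) { errno = EINVAL; return -1; }

    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0) return -1;
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0) goto fail;

    if (connect(fd, (struct sockaddr *)&addr, sizeof addr) < 0) {
        if (errno != EINPROGRESS) goto fail;
        struct pollfd pfd = { .fd = fd, .events = POLLOUT };
        int rc;
        /* Simplification: an EINTR restarts the wait with the full timeout. */
        do { rc = poll(&pfd, 1, timeout_ms); } while (rc < 0 && errno == EINTR);
        if (rc == 0) { errno = ETIMEDOUT; goto fail; }
        if (rc < 0) goto fail;
        int err = 0;
        socklen_t len = sizeof err;
        if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &err, &len) < 0) goto fail;
        if (err != 0) { errno = err; goto fail; }   /* e.g. ECONNREFUSED */
    }
    if (fcntl(fd, F_SETFL, flags) < 0) goto fail;  /* back to blocking for simple I/O */
    return fd;

fail: {
        int saved = errno;   /* close() must not clobber the reported error */
        close(fd);
        errno = saved;
        return -1;
    }
}
```

Once connected, per-operation deadlines could be set with setsockopt() using SO_RCVTIMEO and SO_SNDTIMEO, which makes a timed-out read() or write() fail with EAGAIN or EWOULDBLOCK so the timeout surfaces through the normal error path.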

5.6 Thinking Exercise

Create an ownership matrix covering the success and failure paths of each API.

5.7 The Interview Questions They’ll Ask

  1. Why use opaque handles in C APIs?
  2. How do you parse partial network reads safely?
  3. How do you design retriable vs non-retriable error codes?
  4. How do you prevent memory ownership ambiguity?

5.8 Hints in Layers

  • Implement transport read/write helpers first.
  • Lock header contract before implementation.
  • Add frame-fuzz tests before performance tuning.
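A frame-fuzz test does not need a framework to start with. The sketch below pairs a hypothetical incremental parser (returning complete, need-more, or malformed) with a deterministic xorshift32 generator, so any failure reproduces from its seed; building with -fsanitize=address,undefined turns silent out-of-bounds reads into hard failures. Opcodes 1 to 3 and the tiny 16-byte payload cap are assumptions chosen so random input reaches every branch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define HDR 9            /* assumed: op(1) | len(4, BE) | req_id(4, BE) */
#define MAX_PAYLOAD 16u  /* tiny hypothetical cap so fuzzing hits every branch */

/* Returns 1 and sets *consumed when a whole frame is buffered,
 * 0 when more bytes are needed, -1 when the header is malformed. */
int kv_parse_frame(const unsigned char *buf, size_t len, size_t *consumed) {
    *consumed = 0;
    if (len < HDR) return 0;
    if (buf[0] < 1 || buf[0] > 3) return -1;            /* unknown opcode */
    uint32_t n = ((uint32_t)buf[1] << 24) | ((uint32_t)buf[2] << 16) |
                 ((uint32_t)buf[3] << 8) | buf[4];
    if (n > MAX_PAYLOAD) return -1;                     /* reject before allocating */
    if (len - HDR < n) return 0;                        /* payload incomplete */
    *consumed = HDR + n;
    return 1;
}

/* Deterministic PRNG; the seed must be non-zero. */
static uint32_t xorshift32(uint32_t *s) {
    uint32_t x = *s;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    return *s = x;
}

/* Feeds random buffers to the parser and returns how many complete frames it saw. */
int kv_fuzz_parser(uint32_t seed, int iterations) {
    unsigned char buf[HDR + MAX_PAYLOAD + 4];
    int complete = 0;
    for (int i = 0; i < iterations; i++) {
        size_t len = xorshift32(&seed) % (sizeof buf + 1);
        for (size_t j = 0; j < len; j++) buf[j] = (unsigned char)xorshift32(&seed);
        if (len > 0) buf[0] = (unsigned char)(xorshift32(&seed) % 4);  /* 0 is invalid */
        if (len >= 5) {  /* bias lengths small so some frames complete */
            buf[1] = buf[2] = buf[3] = 0;
            buf[4] = (unsigned char)(xorshift32(&seed) % (MAX_PAYLOAD + 4));
        }
        size_t used = 0;
        int r = kv_parse_frame(buf, len, &used);
        assert(r >= -1 && r <= 1);
        if (r == 1) {
            assert(used >= HDR && used <= len);         /* never claims bytes it lacks */
            complete++;
        }
    }
    return complete;
}
```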

5.9 Books That Will Help

Topic | Book | Chapter
Interface design in C | C Interfaces and Implementations | Ch. 1-3
Memory safety | Effective C, 2nd Edition | Memory chapters
Socket I/O | The Linux Programming Interface | Networking chapters

5.10 Implementation Phases

  • Phase 1: API header + transport helper
  • Phase 2: codec + operations
  • Phase 3: retries/timeouts + fuzz tests

5.11 Key Implementation Decisions

Decision | Options | Recommendation | Rationale
Returned value ownership | borrowed / caller-owned | caller-owned | safest boundary
Frame protocol | delimiter / length-prefix | length-prefix | robust for binary payloads

6. Testing Strategy

6.1 Test Categories

  • Unit: codec framing
  • Integration: live server roundtrip
  • Fault injection: partial read/write and disconnects
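Partial reads and disconnects are easiest to inject through a transport seam: the framing code reads through a function pointer, and tests substitute a fake. In this sketch (all names hypothetical) the fake delivers one byte per call, injects EINTR on every third call, and can stop early to simulate a mid-frame disconnect.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>

/* Transport seam: production passes a read() wrapper, tests pass a fake. */
typedef ssize_t (*kv_read_fn)(void *ctx, void *buf, size_t len);

int read_exact_via(kv_read_fn rd, void *ctx, void *buf, size_t len) {
    unsigned char *p = buf;
    while (len > 0) {
        ssize_t n = rd(ctx, p, len);
        if (n < 0 && errno == EINTR) continue;
        if (n < 0) return -1;
        if (n == 0) { errno = ECONNRESET; return -1; }  /* disconnect mid-frame */
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Fault-injecting fake source: data/len is the stream, cut is where it "disconnects". */
typedef struct {
    const unsigned char *data;
    size_t len, pos, cut;
    int calls;
} trickle_src;

ssize_t trickle_read(void *ctx, void *buf, size_t len) {
    trickle_src *s = ctx;
    if (++s->calls % 3 == 0) { errno = EINTR; return -1; }  /* simulated signal */
    if (len == 0 || s->pos >= s->cut || s->pos >= s->len) return 0;  /* EOF */
    memcpy(buf, s->data + s->pos, 1);                       /* one byte at a time */
    s->pos++;
    return 1;
}
```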

6.2 Critical Test Cases

  1. Value retrieval with caller-owned memory.
  2. Mid-frame disconnect returns recoverable error.
  3. Malformed frame length rejected.

6.3 Test Data

Fixed request corpus with deterministic expected responses.


7. Common Pitfalls & Debugging

7.1 Frequent Mistakes

Pitfall | Symptom | Solution
Ambiguous ownership | Intermittent double frees/leaks | Explicit release API
Incomplete framed reads | Parse desync | Exact-byte read loops

7.2 Debugging Strategies

  • Add frame hex dump mode in debug builds.
  • Log request IDs for request/response correlation.
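Request-ID logging can double as a desync check. This sketch assumes a hypothetical per-connection 32-bit counter that skips 0, and a check that logs both IDs whenever a log stream is supplied.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Next ID for this connection; wraps past UINT32_MAX to 1 so 0 can mean "none". */
uint32_t kv_next_req_id(uint32_t *counter) {
    assert(counter != NULL);
    if (++*counter == 0) *counter = 1;
    return *counter;
}

/* Returns 0 if the response echoes the request ID, -1 on mismatch. */
int kv_check_correlation(uint32_t sent_id, uint32_t got_id, FILE *log) {
    if (log != NULL)
        fprintf(log, "[kvclient] req_id=%" PRIu32 " resp_id=%" PRIu32 "%s\n",
                sent_id, got_id, sent_id == got_id ? "" : " MISMATCH");
    return sent_id == got_id ? 0 : -1;
}
```

In a synchronous client, responses arrive in request order, so a mismatch means the stream is out of step; closing the connection is safer than trying to resynchronize it.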

7.3 Performance Traps

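Small per-call allocations can dominate latency. Reusable internal buffers help, but use them carefully: the caller must never receive a pointer into a buffer the library will reuse, or the ownership contract is violated.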
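A per-handle scratch buffer that grows geometrically is one way to cut per-call allocations without breaking the contract. In this hypothetical sketch the receive path reuses kv_scratch, while values returned to the caller are always fresh copies.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Library-internal receive buffer, reused across calls on one handle. */
typedef struct { unsigned char *data; size_t cap; } kv_scratch;

/* Ensure cap >= need, doubling from 256. On OOM returns -1 and the old buffer
 * stays valid and owned by the scratch. */
int kv_scratch_reserve(kv_scratch *s, size_t need) {
    if (need <= s->cap) return 0;
    size_t cap = s->cap ? s->cap : 256;
    while (cap < need) {
        if (cap > SIZE_MAX / 2) { cap = need; break; }  /* avoid overflow */
        cap *= 2;
    }
    unsigned char *p = realloc(s->data, cap);
    if (p == NULL) return -1;
    s->data = p;
    s->cap = cap;
    return 0;
}

/* Give the caller its own NUL-terminated copy; the scratch stays internal. */
char *kv_scratch_copy_out(const kv_scratch *s, size_t len) {
    assert(len <= s->cap && len < SIZE_MAX);
    char *out = malloc(len + 1);
    if (out == NULL) return NULL;
    if (len > 0) memcpy(out, s->data, len);
    out[len] = '\0';
    return out;
}

void kv_scratch_release(kv_scratch *s) {
    free(s->data);
    s->data = NULL;
    s->cap = 0;
}
```

The copy-out still costs one allocation per returned value; that is the price of a caller-owned contract, and the scratch buffer only removes the per-frame receive allocation.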


8. Extensions & Challenges

8.1 Beginner Extensions

  • Add kv_ping and a server capability query.

8.2 Intermediate Extensions

  • Add connection pooling wrapper.

8.3 Advanced Extensions

  • Add async pipelining mode with request IDs.

9. Real-World Connections

9.1 Industry Applications

Client SDKs for Redis, Memcached, and SQL systems follow similar boundary patterns.

  • Hiredis design references and framing behavior.

9.3 Interview Relevance

Systems interviews commonly probe ownership, framing, and retry semantics.


10. Resources

10.1 Essential Reading

  • C Interfaces and Implementations
  • Linux read/write man pages

10.2 Video Resources

  • Talks on robust API design and network framing.

10.3 Tools & Documentation

  • ASAN/UBSAN, strace, packet captures for frame debugging.

11. Self-Assessment Checklist

11.1 Understanding

  • I can explain ownership rules for each public function.
  • I can explain framing over stream sockets.

11.2 Implementation

  • No leaks in reconnect stress tests.
  • Protocol parser rejects malformed inputs safely.

11.3 Growth

  • I can justify API tradeoffs under interview pressure.

12. Submission / Completion Criteria

Minimum Viable Completion:

  • Stable API + connect/set/get/delete/close

Full Completion:

  • Plus deterministic tests and fault injection coverage

Excellence:

  • Plus async/pipelined mode and deep observability hooks