Project 5: Build a Language Server (LSP)

Build a minimal LSP server that speaks JSON-RPC, tracks open documents, and provides diagnostics and go-to-definition.

Quick Reference

Attribute                         | Value
--------------------------------- | ---------------------------------------------------
Difficulty                        | Expert
Time Estimate                     | 1-2 weeks
Main Programming Language         | Python or Rust
Alternative Programming Languages | Go, TypeScript, C#
Coolness Level                    | Level 9 - Your own language tooling backend
Business Potential                | High (language tools are in demand)
Prerequisites                     | JSON, RPC concepts, basic parsing
Key Topics                        | JSON-RPC, LSP lifecycle, document sync, diagnostics

1. Learning Objectives

By completing this project, you will:

  1. Implement JSON-RPC framing over stdin/stdout.
  2. Follow the LSP lifecycle from initialize to shutdown.
  3. Maintain a synchronized document store with versions.
  4. Generate diagnostics and send them to the client.
  5. Implement go-to-definition for a simple language.
  6. Handle errors and malformed input safely.

2. All Theory Needed (Per-Concept Breakdown)

2.1 JSON-RPC Framing and Message Parsing

Description

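LSP sends JSON-RPC 2.0 messages over a byte stream (typically stdin/stdout). Each message is a header block, terminated by an empty line, followed by a JSON body whose size is given by the Content-Length header.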

Definitions & Key Terms
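  • JSON-RPC: A remote procedure call protocol encoded in JSON, with requests (carrying an id), responses, and notifications (no id, no response).
  • Content-Length: Header giving the size of the JSON body in bytes (UTF-8 encoded), not in characters.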
  • Message boundary: Where one JSON message ends.
Mental Model Diagram (ASCII)
Content-Length: 86\r\n\r\n{json...}
How It Works (Step-by-Step)
  1. Read headers until \r\n\r\n.
  2. Parse Content-Length.
  3. Read exactly that many bytes.
  4. Parse JSON and dispatch by method.
Minimal Concrete Example
Content-Length: 46\r\n\r\n{"jsonrpc":"2.0","id":1,"method":"initialize"}
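The four steps above can be sketched in Python as shown below. This is a minimal illustration under stated assumptions, not a complete transport. `read_message` and `write_message` are hypothetical names. Header lines are ASCII, so they are read line by line. The body is read strictly by byte count, looping until every byte has arrived so that partial reads are handled.

```python
import io
import json
from typing import BinaryIO, Optional


def read_message(stream: BinaryIO) -> Optional[dict]:
    """Read one Content-Length framed message; return None if the stream ends first."""
    content_length = None
    while True:
        line = stream.readline()
        if not line:
            return None  # stream closed before a complete header block
        if line in (b"\r\n", b"\n"):
            break  # empty line: end of headers (a bare \n is tolerated)
        name, _, value = line.decode("ascii").partition(":")
        if name.strip().lower() == "content-length":
            content_length = int(value.strip())
    if content_length is None:
        raise ValueError("missing Content-Length header")
    body = b""
    while len(body) < content_length:  # a single read() may return fewer bytes
        chunk = stream.read(content_length - len(body))
        if not chunk:
            raise EOFError("stream closed in the middle of a message body")
        body += chunk
    return json.loads(body.decode("utf-8"))


def write_message(stream: BinaryIO, message: dict) -> None:
    """Serialize a message and prefix it with its length in bytes."""
    body = json.dumps(message, separators=(",", ":")).encode("utf-8")
    stream.write(b"Content-Length: %d\r\n\r\n" % len(body) + body)
    stream.flush()


if __name__ == "__main__":
    raw = b'Content-Length: 46\r\n\r\n{"jsonrpc":"2.0","id":1,"method":"initialize"}'
    print(read_message(io.BytesIO(raw)))
```

On a real server you would pass the binary streams `sys.stdin.buffer` and `sys.stdout.buffer`, because Content-Length counts bytes rather than decoded characters.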
Common Misconceptions
  • “You can read line by line” -> JSON bodies may contain newlines and need not end with one; only Content-Length marks the boundary.
  • “Content-Length is optional” -> It is required for LSP.
Check-Your-Understanding Questions
  1. Explain why framing is required on stdio.
  2. Predict what happens if Content-Length is wrong.
  3. Explain how to handle partial reads.
Where You’ll Apply It
  • See §3.2 requirements and §5.10 Phase 1.
  • Also used in P03-neovim-gui-client for MessagePack contrast.

2.2 LSP Lifecycle and Capabilities

Description

LSP defines a strict sequence of messages from initialization to shutdown.

Definitions & Key Terms
  • initialize: Client -> server handshake with capabilities.
  • initialized: Client notifies server that initialization finished.
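  • shutdown/exit: Graceful termination. shutdown is a request: the server stops work and replies with a null result. exit is a notification telling the process to terminate.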
Mental Model Diagram (ASCII)
initialize -> initialized -> requests/notifications -> shutdown -> exit
How It Works (Step-by-Step)
  1. Client sends initialize request with capabilities.
  2. Server replies with supported features.
  3. Client sends initialized notification.
  4. Server handles document events.
  5. On shutdown, stop work; on exit, terminate.
Minimal Concrete Example
{"jsonrpc":"2.0","id":1,"result":{"capabilities":{"textDocumentSync":2}}}
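One way to enforce this ordering is a small state holder that is consulted before dispatching. The sketch below is hypothetical: `Lifecycle` is an illustrative name, and its capability set is an assumption (full sync, 1, plus `definitionProvider`). The ordering rules it follows come from the LSP specification:

- Requests that arrive before initialize get error -32002 (ServerNotInitialized).
- Requests that arrive after shutdown get -32600 (InvalidRequest).
- exit ends the process with code 0 if shutdown came first, and 1 otherwise.

```python
from typing import Any, Optional

SERVER_NOT_INITIALIZED = -32002  # LSP ErrorCodes.ServerNotInitialized
INVALID_REQUEST = -32600         # JSON-RPC 2.0 "Invalid Request"
METHOD_NOT_FOUND = -32601        # JSON-RPC 2.0 "Method not found"


class Lifecycle:
    """Hypothetical gatekeeper that enforces LSP message ordering."""

    def __init__(self) -> None:
        self.initialized = False              # set once 'initialize' is answered
        self.shutdown_requested = False       # set by the 'shutdown' request
        self.exit_code: Optional[int] = None  # set by the 'exit' notification

    def handle(self, msg: dict) -> Optional[dict]:
        """Return the response for a request, or None for a notification."""
        method = msg.get("method")
        is_request = "id" in msg

        if method == "exit":
            # Spec: exit with 0 if shutdown was received first, otherwise 1.
            self.exit_code = 0 if self.shutdown_requested else 1
            return None
        if not self.initialized and method != "initialize":
            # Requests get -32002; other notifications are dropped.
            return self._error(msg, SERVER_NOT_INITIALIZED, "Server not initialized") if is_request else None
        if self.shutdown_requested:
            return self._error(msg, INVALID_REQUEST, "Server is shutting down") if is_request else None
        if method == "initialize":
            self.initialized = True
            # Assumed capabilities: full sync (1) and go-to-definition.
            return self._result(msg, {"capabilities": {"textDocumentSync": 1, "definitionProvider": True}})
        if method == "shutdown":
            self.shutdown_requested = True
            return self._result(msg, None)  # shutdown's result is null
        if not is_request:
            return None  # e.g. 'initialized' needs no reply
        # A real server would dispatch other requests to handlers here.
        return self._error(msg, METHOD_NOT_FOUND, f"Unhandled method: {method}")

    @staticmethod
    def _result(msg: dict, result: Any) -> dict:
        return {"jsonrpc": "2.0", "id": msg["id"], "result": result}

    @staticmethod
    def _error(msg: dict, code: int, text: str) -> dict:
        return {"jsonrpc": "2.0", "id": msg["id"], "error": {"code": code, "message": text}}
```

The main loop would keep reading messages until `exit_code` is set and then call `sys.exit(exit_code)`.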
Common Misconceptions
  • “Server can send diagnostics before initialized” -> It should not.
  • “Exit without shutdown is fine” -> It is abnormal; the spec asks the server to exit with code 1 in that case, and with 0 after a proper shutdown.
Check-Your-Understanding Questions
  1. Explain why initialize must happen first.
  2. Predict what happens if the server ignores shutdown.
  3. Explain how to advertise go-to-definition capability.
Where You’ll Apply It
  • See §3.2 requirement 1 (Initialize) and §5.10 Phase 1.

2.3 Document Synchronization and Versioning

Description

The server must keep a consistent view of open documents while edits happen.

Definitions & Key Terms
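  • textDocumentSync: Server capability declaring how the client sends document updates (0 = None, 1 = Full, 2 = Incremental).
  • Version: Client-provided integer that increases with every change, including undo/redo; it need not be consecutive.
  • Incremental changes: Range-based edits (a range plus replacement text) applied to the server’s copy of the document.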
Mental Model Diagram (ASCII)
Client doc v3 -> didChange -> server applies -> v4
How It Works (Step-by-Step)
  1. On didOpen, store full document and version.
  2. On didChange, apply edits in order.
  3. Reject or log out-of-order versions.
  4. Keep a clean in-memory representation.
Minimal Concrete Example
{"textDocument":{"uri":"file:///xyl.xyl","version":4},"contentChanges":[{"range":...,"text":"foo"}]}
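A minimal store that follows these steps might look like the sketch below. `Document` and `DocumentStore` are hypothetical names. The sketch makes two simplifying assumptions:

- It only accepts full-text changes (entries without a range). Range-based edits need the position-conversion step described in §4.4.
- It ignores any update whose version is not greater than the stored one.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Document:
    uri: str
    version: int
    text: str


class DocumentStore:
    """In-memory view of open documents, keyed by URI."""

    def __init__(self) -> None:
        self.docs: Dict[str, Document] = {}

    def did_open(self, params: dict) -> None:
        item = params["textDocument"]  # TextDocumentItem: uri, languageId, version, text
        self.docs[item["uri"]] = Document(item["uri"], item["version"], item["text"])

    def did_change(self, params: dict) -> bool:
        """Apply a didChange; return False if it was ignored as unknown or stale."""
        ident = params["textDocument"]
        doc = self.docs.get(ident["uri"])
        if doc is None or ident["version"] <= doc.version:
            return False  # not open, or an old/out-of-order version
        changes = params["contentChanges"]
        if any("range" in change for change in changes):
            raise NotImplementedError("range-based edits are not handled in this sketch")
        for change in changes:  # applied in order; with full sync the last one wins
            doc.text = change["text"]
        doc.version = ident["version"]
        return True

    def did_close(self, params: dict) -> None:
        self.docs.pop(params["textDocument"]["uri"], None)
```

When `did_change` returns False, the caller can log the rejected version, which covers step 3.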
Common Misconceptions
  • “You can ignore versions” -> It causes desync.
  • “Full sync is always good enough” -> It resends the entire document on every change, which gets slow for large files.
Check-Your-Understanding Questions
  1. Explain why versioning prevents stale edits.
  2. Predict what happens if changes arrive out of order.
  3. Explain full vs incremental sync tradeoffs.
Where You’ll Apply It
  • See §3.2 requirement 2 (Document sync), §4.4, and §5.10 Phase 2.

2.4 Parsing, Symbol Tables, and Definitions

Description

To implement go-to-definition, you need a symbol table built from parsing the document.

Definitions & Key Terms
  • Symbol table: Map from identifier to definition location.
  • AST: Abstract syntax tree used for semantic analysis.
  • Definition: The declaration of a symbol.
Mental Model Diagram (ASCII)
source -> parse -> AST -> symbol table
identifier "foo" -> (file, line, col)
How It Works (Step-by-Step)
  1. Parse the document into an AST.
  2. Walk the AST to collect definitions.
  3. Store symbol table per document.
  4. On definition request, lookup symbol and return location.
Minimal Concrete Example
let x = 1
print(x)
-> symbol table: x -> line 0, character 4 (zero-based, as LSP positions are)
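The steps above can be approximated with the sketch below, under these assumptions:

- It replaces the AST with a line-oriented regular expression over a hypothetical toy syntax, where `let NAME = ...` declares a name. That makes it closer to the "regex" option in §5.11 than to a real parser.
- `build_symbol_table` and `find_definition` are illustrative names.
- Every definition of a name is recorded, and a reference resolves to the nearest definition at or above it. This handles simple redefinition in straight-line code.
- Columns are Python string indices, which equal LSP characters only for ASCII source.

```python
import re
from typing import Dict, List, Optional, Tuple

Position = Tuple[int, int]  # (line, character), zero-based like LSP

IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")
LET = re.compile(r"^\s*let\s+([A-Za-z_][A-Za-z0-9_]*)\s*=")


def build_symbol_table(text: str) -> Dict[str, List[Position]]:
    """Map each name to every position where a `let` statement defines it."""
    table: Dict[str, List[Position]] = {}
    for line_no, line in enumerate(text.splitlines()):
        m = LET.match(line)
        if m:
            table.setdefault(m.group(1), []).append((line_no, m.start(1)))
    return table


def find_definition(text: str, pos: Position) -> Optional[Position]:
    """Return the definition of the identifier under `pos`, if any."""
    line_no, char = pos
    lines = text.splitlines()
    if line_no >= len(lines):
        return None
    for m in IDENT.finditer(lines[line_no]):
        if m.start() <= char <= m.end():  # cursor inside or just after the word
            defs = build_symbol_table(text).get(m.group(0), [])
            earlier = [d for d in defs if d[0] <= line_no]
            return earlier[-1] if earlier else None  # nearest preceding definition
    return None
```

For the example above, `find_definition("let x = 1\nprint(x)", (1, 6))` should return `(0, 4)`, which matches the symbol table entry for `x`.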
Common Misconceptions
  • “String search is enough” -> It fails with shadowing.
  • “Parsing must be perfect” -> A partial AST can still work.
Check-Your-Understanding Questions
  1. Explain why symbol tables are needed for go-to-definition.
  2. Predict what happens when a symbol is redefined.
  3. Explain how to handle references in incomplete code.
Where You’ll Apply It
  • See §3.2 requirement 4 (Go-to-definition) and §5.10 Phase 3.

2.5 Diagnostics and Ranges

Description

Diagnostics are structured errors and warnings sent to the editor.

Definitions & Key Terms
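  • Diagnostic: LSP object describing a problem (range, severity, message), delivered through textDocument/publishDiagnostics.
  • Range: Start and end positions (zero-based line and character); the end is exclusive.
  • Severity: Error (1), Warning (2), Information (3), Hint (4).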
Mental Model Diagram (ASCII)
{ range: (line 2, col 5 -> line 2, col 8), message: "Unknown symbol" }
How It Works (Step-by-Step)
  1. Analyze document for errors.
  2. Create diagnostics with ranges and messages.
  3. Send textDocument/publishDiagnostics.
  4. Clear diagnostics when errors are fixed by publishing an empty diagnostics array for that URI.
Minimal Concrete Example
{"range":{"start":{"line":1,"character":3},"end":{"line":1,"character":6}},"message":"Undefined symbol"}
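Steps 1-3 can be sketched for the "undefined symbol" rule as shown below. The assumptions are:

- The grammar is a hypothetical toy one: `let NAME = ...` defines a name that is visible from the next line on, and `let` and `print` are treated as built-ins.
- Regex scanning stands in for a real analyzer.
- `undefined_symbol_diagnostics` is an illustrative name.

The function returns PublishDiagnosticsParams (uri, version, diagnostics). An empty diagnostics list is exactly what clears stale results on the client.

```python
import re
from typing import List, Set

IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")
LET = re.compile(r"^\s*let\s+([A-Za-z_][A-Za-z0-9_]*)\s*=")
BUILTINS = {"let", "print"}  # hypothetical keywords/built-ins of the toy language
ERROR = 1  # DiagnosticSeverity.Error


def undefined_symbol_diagnostics(uri: str, version: int, text: str) -> dict:
    """Build publishDiagnostics params; an empty list clears old diagnostics."""
    defined: Set[str] = set()
    diagnostics: List[dict] = []
    for line_no, line in enumerate(text.splitlines()):
        let = LET.match(line)
        for m in IDENT.finditer(line):
            name = m.group(0)
            if name in BUILTINS or name in defined:
                continue
            if let and m.start() == let.start(1):
                continue  # the name being defined on this line
            diagnostics.append({
                "range": {"start": {"line": line_no, "character": m.start()},
                          "end": {"line": line_no, "character": m.end()}},
                "severity": ERROR,
                "source": "my-lsp",
                "message": "Undefined symbol",
            })
        if let:
            defined.add(let.group(1))  # visible from the next line on
    return {"uri": uri, "version": version, "diagnostics": diagnostics}
```

The server would wrap the result as `{"jsonrpc": "2.0", "method": "textDocument/publishDiagnostics", "params": ...}`. Because it is a notification, it carries no id.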
Common Misconceptions
  • “Diagnostics are only errors” -> Warnings and info are valid.
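  • “Ranges are byte offsets” -> LSP positions are (line, character) pairs, and character counts UTF-16 code units by default; LSP 3.17 lets the client and server negotiate UTF-8 or UTF-32 instead.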
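Python strings are indexed by code point, so a Python server has to convert positions at the protocol boundary. The two hypothetical helpers below assume the default UTF-16 position encoding. Characters outside the Basic Multilingual Plane (such as most emoji) take two UTF-16 units but only one Python index.

```python
def utf16_column(line: str, index: int) -> int:
    """Python str index (code points) -> LSP character offset (UTF-16 code units)."""
    return len(line[:index].encode("utf-16-le")) // 2


def str_index(line: str, column: int) -> int:
    """LSP character offset (UTF-16 code units) -> Python str index."""
    units = 0
    for i, ch in enumerate(line):
        if units >= column:
            return i
        units += 2 if ord(ch) > 0xFFFF else 1  # astral chars use a surrogate pair
    return len(line)  # clamp offsets past the end of the line
```

For example, in `"let \U0001F600 = x"` the `x` sits at Python index 8 but at LSP character 9. Outgoing diagnostic ranges go through `utf16_column`, and incoming request positions go through `str_index`.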
Check-Your-Understanding Questions
  1. Explain why UTF-16 positions matter in LSP.
  2. Predict what happens if you send stale diagnostics.
  3. Explain how to clear diagnostics.
Where You’ll Apply It
  • See §3.2 requirement 3 (Diagnostics), §3.7.2, and §5.10 Phase 2.

2.6 Concurrency, Cancellation, and Robustness

Description

LSP servers must handle concurrent requests and cancellations.

Definitions & Key Terms
  • $/cancelRequest: Client cancels an in-flight request.
  • Work queue: Serializes document processing.
  • Debounce: Delay diagnostics until edits settle.
Mental Model Diagram (ASCII)
request -> queue -> process -> respond
cancel -> drop if not started
How It Works (Step-by-Step)
  1. Maintain a queue of requests per document.
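  2. Handle $/cancelRequest by dropping work that has not started; the cancelled request still gets a response, conventionally the RequestCancelled error (-32800).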
  3. Debounce diagnostics to avoid floods.
  4. Ensure thread-safe access to document store.
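The steps above can be sketched as a single-threaded queue, in line with the recommendation in §5.11. `RequestQueue` is a hypothetical name. The sketch assumes every queued message is a request whose method has a registered handler. A `$/cancelRequest` only affects work still in the queue, and the cancelled request is still answered, with error -32800.

```python
from collections import deque
from typing import Callable, Deque, Dict, List, Set

REQUEST_CANCELLED = -32800  # LSP ErrorCodes.RequestCancelled


class RequestQueue:
    """Hypothetical FIFO of pending requests that honours $/cancelRequest."""

    def __init__(self, handlers: Dict[str, Callable[[dict], object]]) -> None:
        self.handlers = handlers
        self.pending: Deque[dict] = deque()
        self.cancelled: Set[object] = set()

    def submit(self, msg: dict) -> None:
        if msg.get("method") == "$/cancelRequest":
            self.cancelled.add(msg["params"]["id"])  # only affects work not yet started
        else:
            self.pending.append(msg)

    def drain(self) -> List[dict]:
        """Process queued requests in order and return their responses."""
        responses: List[dict] = []
        while self.pending:
            msg = self.pending.popleft()
            if msg["id"] in self.cancelled:
                self.cancelled.discard(msg["id"])
                responses.append({"jsonrpc": "2.0", "id": msg["id"],
                                  "error": {"code": REQUEST_CANCELLED, "message": "Request cancelled"}})
                continue
            result = self.handlers[msg["method"]](msg.get("params", {}))
            responses.append({"jsonrpc": "2.0", "id": msg["id"], "result": result})
        return responses
```

Draining only after the reader has consumed every message already available on stdin gives cancellations a chance to arrive before the work they target starts.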
Minimal Concrete Example
on didChange(uri):
  cancel the pending diagnostics timer for uri, if any
  schedule diagnostics for uri in 200ms
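The debounce pseudocode maps onto `threading.Timer` from the Python standard library. `Debouncer` is a hypothetical helper keyed by document URI. The delay is a parameter so it can be tuned against the latency target in §3.3. Callbacks run on timer threads, so anything they share with the main loop, such as the document store or writes to stdout, needs its own lock.

```python
import threading
from typing import Callable, Dict


class Debouncer:
    """Run a callback only after `delay` seconds pass with no newer call for the same key."""

    def __init__(self, delay: float) -> None:
        self.delay = delay
        self._timers: Dict[str, threading.Timer] = {}
        self._lock = threading.Lock()  # schedule() may race with itself across threads

    def schedule(self, key: str, callback: Callable[[], None]) -> None:
        with self._lock:
            old = self._timers.get(key)
            if old is not None:
                old.cancel()  # a newer edit supersedes the pending run
            timer = threading.Timer(self.delay, callback)
            timer.daemon = True  # do not keep the process alive after 'exit'
            self._timers[key] = timer
            timer.start()
```

A didChange handler would call `debouncer.schedule(uri, lambda: publish_diagnostics(uri))`, where `publish_diagnostics` is a hypothetical function of your server.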
Common Misconceptions
  • “LSP is single-threaded” -> Many servers use worker threads.
  • “Cancellation is optional” -> Clients rely on it for UX.
Check-Your-Understanding Questions
  1. Explain why diagnostics should be debounced.
  2. Predict what happens if you ignore cancel requests.
  3. Explain a safe concurrency model for the document store.
Where You’ll Apply It
  • See §3.3 (performance), §5.11 (concurrency decision), and §7.3.

3. Project Specification

3.1 What You Will Build

A minimal LSP server for a toy language that:

  • Supports initialize/shutdown
  • Tracks open documents
  • Publishes diagnostics
  • Implements go-to-definition

Included: JSON-RPC framing, basic parsing, symbol table. Excluded: advanced refactors, completion, code actions.

3.2 Functional Requirements

  1. Initialize: Respond with server capabilities.
  2. Document sync: Handle didOpen, didChange, didClose.
  3. Diagnostics: Publish at least one rule (e.g., unknown symbol).
  4. Go-to-definition: Return location for identifiers.
  5. Error handling: Graceful handling of invalid JSON.

3.3 Non-Functional Requirements

  • Performance: diagnostics within 200ms for small files.
  • Reliability: no crashes on malformed input.
  • Usability: clear diagnostic messages.

3.4 Example Usage / Output

$ ./my-lsp --stdio
Content-Length: 58

{"jsonrpc":"2.0","id":1,"method":"initialize","params":{}}

3.5 Data Formats / Schemas / Protocols

  • JSON-RPC: Content-Length framed JSON.
  • LSP: initialize, textDocument/didOpen, textDocument/definition.
  • Error JSON shape (for a parse error the id is null, because the request id could not be read):
    {
      "jsonrpc": "2.0",
      "id": null,
      "error": { "code": -32700, "message": "Parse error", "data": "details" }
    }
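The sketch below turns a framed body into either a message or a ready-to-send error response. `decode_body` is a hypothetical helper. The codes -32700 (Parse error) and -32600 (Invalid Request) come from the JSON-RPC 2.0 specification. LSP does not use JSON-RPC batches, so any non-object body is treated as invalid here. The caller decides what happens next: it can exit with code 2, as in the failure demo in §3.7.3, or keep serving.

```python
import json
from typing import Optional, Tuple

PARSE_ERROR = -32700
INVALID_REQUEST = -32600


def decode_body(body: bytes) -> Tuple[Optional[dict], Optional[dict]]:
    """Return (message, None) on success or (None, error_response) on failure."""
    try:
        msg = json.loads(body.decode("utf-8"))
    except (UnicodeDecodeError, json.JSONDecodeError) as exc:
        # The id cannot be recovered from unparseable input, so it must be null.
        return None, {"jsonrpc": "2.0", "id": None,
                      "error": {"code": PARSE_ERROR, "message": "Parse error", "data": str(exc)}}
    if not isinstance(msg, dict) or msg.get("jsonrpc") != "2.0":
        msg_id = msg.get("id") if isinstance(msg, dict) else None
        return None, {"jsonrpc": "2.0", "id": msg_id,
                      "error": {"code": INVALID_REQUEST, "message": "Invalid Request"}}
    return msg, None
```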
    

3.6 Edge Cases

  • Malformed JSON.
  • Out-of-order document versions.
  • Large file with many symbols.

3.7 Real World Outcome

You can connect the server to Neovim and see diagnostics.

3.7.1 How to Run (Copy/Paste)

./my-lsp --stdio

3.7.2 Golden Path Demo (Deterministic)

  • Use a fixed test file fixtures/sample.xyl.
  • On open, server returns one diagnostic at a known range.

Expected:

  • Diagnostic appears at line 2, col 5 with message “Undefined symbol”.

3.7.3 Exact Terminal Transcript

$ ./my-lsp --stdio
Content-Length: 58

{"jsonrpc":"2.0","id":1,"method":"initialize","params":{}}
Content-Length: 123

{"jsonrpc":"2.0","method":"textDocument/publishDiagnostics","params":{...}}

Failure demo:

$ ./my-lsp --stdio
Content-Length: 10

{bad json}
Error: Parse error
Exit code: 2

Exit codes:

  • 0 clean shutdown
  • 1 init failure
  • 2 parse error

4. Solution Architecture

4.1 High-Level Design

[stdio] -> [JSON-RPC parser] -> [Dispatcher]
                            -> [Document store]
                            -> [Parser + symbol table]
                            -> [Diagnostics]

4.2 Key Components

Component      | Responsibility      | Key Decisions
-------------- | ------------------- | -------------------
Framer         | Read Content-Length | streaming parser
Dispatcher     | Route methods       | table of handlers
Document Store | Track uri->text     | versioned updates
Analyzer       | Parse + symbols     | incremental or full
Diagnostics    | Produce errors      | debounce policy

4.3 Data Structures (No Full Code)

Document { uri, version, text, ast, symbols }
Diagnostic { range, message, severity }

4.4 Algorithm Overview

Key Algorithm: Apply Incremental Changes

  1. Validate version order.
  2. Apply each change range to document text.
  3. Re-parse or update AST.
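Step 2 can be sketched as follows. The sketch assumes `\n` line endings and ranges whose character offsets are UTF-16 code units (the LSP default). `offset_at` and `apply_changes` are hypothetical names. Step 1 (version validation) is left to the caller, and step 3 (re-parsing) would run on the returned text. Changes are applied in array order, because the spec defines each change relative to the document as left by the previous one. An entry without a range replaces the whole text.

```python
from typing import List


def offset_at(text: str, line: int, character: int) -> int:
    """Map an LSP position (zero-based line, UTF-16 character) to a str index."""
    start = 0
    for _ in range(line):
        nl = text.find("\n", start)
        if nl == -1:
            return len(text)  # line past the end of the document: clamp
        start = nl + 1
    i, units = start, 0
    while i < len(text) and text[i] != "\n" and units < character:
        units += 2 if ord(text[i]) > 0xFFFF else 1  # surrogate pairs count twice
        i += 1
    return i


def apply_changes(text: str, changes: List[dict]) -> str:
    """Apply TextDocumentContentChangeEvent entries in order."""
    for change in changes:
        if "range" not in change:
            text = change["text"]  # full-document replacement
            continue
        r = change["range"]
        start = offset_at(text, r["start"]["line"], r["start"]["character"])
        end = offset_at(text, r["end"]["line"], r["end"]["character"])
        text = text[:start] + change["text"] + text[end:]
    return text
```

Each change costs O(n) string copying, which matches the complexity estimate below.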

Complexity Analysis:

  • Time: O(n) for full reparse (simple implementation)
  • Space: O(n) for document text and AST

5. Implementation Guide

5.1 Development Environment Setup

python3 --version
# or
rustc --version

5.2 Project Structure

my-lsp/
├── src/
│   ├── main.py (or main.rs)
│   ├── rpc.py
│   ├── lsp.py
│   ├── documents.py
│   └── analysis.py
├── fixtures/
│   └── sample.xyl
└── README.md

5.3 The Core Question You’re Answering

“How do editors and language tools stay in sync so navigation and diagnostics feel instant?”

5.4 Concepts You Must Understand First

Stop and research these before coding:

  1. JSON-RPC framing
  2. LSP initialize lifecycle
  3. Versioned document updates

5.5 Questions to Guide Your Design

  1. How will you parse messages safely from a stream?
  2. How will you store documents and versions?
  3. How will you avoid re-parsing too frequently?

5.6 Thinking Exercise

Write out a manual didOpen + didChange sequence and track the document text by hand.

5.7 The Interview Questions They’ll Ask

  1. Why does LSP use JSON-RPC instead of REST?
  2. What is the difference between full and incremental sync?
  3. How do you avoid stale diagnostics?

5.8 Hints in Layers

Hint 1: Start with framing. Get Content-Length parsing correct before anything else.

Hint 2: Implement initialize. Return minimal capabilities and log everything.

Hint 3: Add diagnostics only. Hardcode a simple rule first.

Hint 4: Add go-to-definition last. Build a symbol table before responding.

5.9 Books That Will Help

Topic             | Book                        | Chapter
----------------- | --------------------------- | -------
Networking basics | Computer Networks           | Ch. 2
Parsing           | Engineering a Compiler      | Ch. 2-5
Protocols         | Network Programming with Go | Ch. 3

5.10 Implementation Phases

Phase 1: RPC Skeleton (2-3 days)

Goals:

  • JSON-RPC framing
  • Initialize lifecycle

Tasks:

  1. Implement Content-Length parser.
  2. Respond to initialize and shutdown.

Checkpoint: Neovim connects without errors.

Phase 2: Document Sync and Diagnostics (4-5 days)

Goals:

  • Store documents
  • Publish diagnostics

Tasks:

  1. Handle didOpen and didChange.
  2. Implement a basic parser or rule.
  3. Send publishDiagnostics.

Checkpoint: Diagnostics appear in Neovim.

Phase 3: Go-to-definition (3-4 days)

Goals:

  • Build symbol table
  • Support textDocument/definition

Tasks:

  1. Parse identifiers and definitions.
  2. Implement symbol lookup.
  3. Return locations.

Checkpoint: gd in Neovim jumps correctly.

5.11 Key Implementation Decisions

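Decision      | Options               | Recommendation           | Rationale
------------- | --------------------- | ------------------------ | ---------
Sync strategy | full, incremental     | full first               | simpler
Parsing       | regex, AST            | simple AST               | accuracy
Concurrency   | single-thread, worker | single-thread + debounce | stable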

6. Testing Strategy

6.1 Test Categories

Category          | Purpose          | Examples
----------------- | ---------------- | ----------------------
Unit Tests        | Framing, parsing | Content-Length parsing
Integration Tests | LSP flow         | initialize, didOpen
Edge Case Tests   | Bad JSON         | malformed payloads

6.2 Critical Test Cases

  1. Malformed JSON returns error and exits with code 2.
  2. Out-of-order version is rejected.
  3. Diagnostics clear when errors are fixed.

6.3 Test Data

fixtures/
  sample.xyl
  errors.xyl

7. Common Pitfalls & Debugging

7.1 Frequent Mistakes

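Pitfall              | Symptom           | Solution
-------------------- | ----------------- | ---------------------------------------------------
Wrong Content-Length | Server hangs      | Validate lengths; count UTF-8 bytes, not characters
Ignoring versions    | Diagnostics drift | Track versions
Blocking reads       | No responsiveness | Use buffered read loop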

7.2 Debugging Strategies

  • Log all incoming messages to a file (never to stdout, which carries the protocol stream).
  • Add a debug command to dump document state.
  • Simulate clients with small test scripts.
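Below is a small driver script along these lines. It assumes the `./my-lsp --stdio` command from §3.7.1 and a server that answers `shutdown` and then exits on `exit`. `frame` and `run_session` are hypothetical helper names. Piping every message at once keeps runs deterministic, and the server's stderr is forwarded so its logs stay visible.

```python
import json
import subprocess
import sys
from typing import List, Sequence, Tuple


def frame(msg: dict) -> bytes:
    """Encode one JSON-RPC message with its Content-Length header."""
    body = json.dumps(msg).encode("utf-8")
    return b"Content-Length: %d\r\n\r\n" % len(body) + body


def run_session(cmd: Sequence[str], messages: List[dict], timeout: float = 10.0) -> Tuple[int, bytes]:
    """Pipe all messages into the server at once; return (exit code, raw stdout)."""
    proc = subprocess.run(
        list(cmd),
        input=b"".join(frame(m) for m in messages),
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        timeout=timeout,
    )
    if proc.stderr:
        sys.stderr.write(proc.stderr.decode("utf-8", errors="replace"))  # surface server logs
    return proc.returncode, proc.stdout
```

A lifecycle smoke test would call `run_session(["./my-lsp", "--stdio"], [...])` with initialize, initialized, shutdown, and exit messages. It should then expect exit code 0 and two framed responses on stdout.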

7.3 Performance Traps

  • Re-parsing large files on every keystroke.
  • Sending diagnostics on every change without debounce.

8. Extensions & Challenges

8.1 Beginner Extensions

  • Add completion for keywords.
  • Add hover info for symbols.

8.2 Intermediate Extensions

  • Incremental parsing for faster diagnostics.
  • Multi-file workspace support.

8.3 Advanced Extensions

  • Implement rename and references.
  • Add semantic tokens.

9. Real-World Connections

9.1 Industry Applications

  • Language tooling: LSP powers VSCode, Neovim, and many IDEs.
  • Code intelligence: Diagnostics and go-to-definition are standard features.

9.2 Related Open Source Projects

  • pyright: Python language server.
  • rust-analyzer: Rust language server.

9.3 Interview Relevance

  • Protocols: JSON-RPC, streaming parsing.
  • Systems: concurrency and state synchronization.

10. Resources

10.1 Essential Reading

  • LSP specification (v3.17)
  • JSON-RPC 2.0 specification

10.2 Video Resources

  • LSP architecture talks (search: “language server protocol overview”)

10.3 Tools & Documentation

  • Neovim :help lsp
  • nvim-lspconfig examples

11. Self-Assessment Checklist

11.1 Understanding

  • I can explain JSON-RPC framing.
  • I can describe the LSP lifecycle.
  • I can explain versioned document sync.

11.2 Implementation

  • Diagnostics appear correctly in Neovim.
  • Go-to-definition works for simple symbols.
  • Server handles malformed input.

11.3 Growth

  • I can extend the server with a new feature.
  • I can explain this project in an interview.

12. Submission / Completion Criteria

Minimum Viable Completion:

  • Initialize/shutdown works.
  • Diagnostics are published for a basic rule.
  • Document sync is stable.

Full Completion:

  • All minimum criteria plus:
  • Go-to-definition works.
  • Out-of-order versions are handled correctly.

Excellence (Going Above & Beyond):

  • Incremental parsing.
  • Multi-file workspace support.

13. Determinism and Reproducibility Notes

  • Use fixtures/sample.xyl for golden demo diagnostics.
  • Do not include timestamps in logs.
  • Failure demo uses malformed JSON to produce a stable error.