Project 5: Build a Language Server (LSP)
Build a minimal LSP server that speaks JSON-RPC, tracks open documents, and provides diagnostics and go-to-definition.
Quick Reference
| Attribute | Value |
|---|---|
| Difficulty | Expert |
| Time Estimate | 1-2 weeks |
| Main Programming Language | Python or Rust |
| Alternative Programming Languages | Go, TypeScript, C# |
| Coolness Level | Level 9 - Your own language tooling backend |
| Business Potential | High (language tools are in demand) |
| Prerequisites | JSON, RPC concepts, basic parsing |
| Key Topics | JSON-RPC, LSP lifecycle, document sync, diagnostics |
1. Learning Objectives
By completing this project, you will:
- Implement JSON-RPC framing over stdin/stdout.
- Follow the LSP lifecycle from initialize to shutdown.
- Maintain a synchronized document store with versions.
- Generate diagnostics and send them to the client.
- Implement go-to-definition for a simple language.
- Handle errors and malformed input safely.
2. All Theory Needed (Per-Concept Breakdown)
2.1 JSON-RPC Framing and Message Parsing
Description
LSP uses JSON-RPC over streams with Content-Length headers.
Definitions & Key Terms
- JSON-RPC: Lightweight RPC protocol over JSON with requests, responses, and notifications (messages without an id).
- Content-Length: Header giving the length of the JSON body in bytes, not characters.
- Message boundary: Where one JSON message ends.
Mental Model Diagram (ASCII)
Content-Length: 86\r\n\r\n{json...}
How It Works (Step-by-Step)
- Read headers until \r\n\r\n.
- Parse Content-Length.
- Read exactly that many bytes.
- Parse the JSON and dispatch by method.
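The steps above can be sketched in Python roughly as follows. This is a minimal sketch, not a reference implementation: read_message and write_message are hypothetical helper names, and the code assumes a blocking binary stream such as sys.stdin.buffer, where read(n) keeps reading until n bytes arrive or the stream ends, which is how partial reads get reassembled.

```python
import json
from typing import BinaryIO, Optional


def read_message(stream: BinaryIO) -> Optional[dict]:
    """Read one Content-Length framed message; return None if the stream ends first."""
    content_length = None
    while True:
        line = stream.readline()
        if not line:
            return None  # stream closed before a complete header block
        if not line.strip():
            break  # the empty line after the headers ends the header block
        name, _, value = line.decode("ascii").partition(":")
        if name.strip().lower() == "content-length":
            content_length = int(value.strip())
    if content_length is None:
        raise ValueError("missing Content-Length header")
    # read(n) on a blocking buffered stream waits until n bytes or EOF,
    # so a body split across several pipe reads is reassembled here.
    body = stream.read(content_length)
    if len(body) != content_length:
        raise ValueError("stream ended in the middle of a message body")
    return json.loads(body.decode("utf-8"))  # raises on malformed JSON


def write_message(stream: BinaryIO, message: dict) -> None:
    """Serialize a message and prefix it with its length in bytes."""
    body = json.dumps(message, separators=(",", ":")).encode("utf-8")
    stream.write(b"Content-Length: %d\r\n\r\n" % len(body))
    stream.write(body)
    stream.flush()
```

In a server loop you would call read_message(sys.stdin.buffer) repeatedly and answer with write_message(sys.stdout.buffer, ...). Counting bytes after UTF-8 encoding, rather than characters, is what keeps non-ASCII text from breaking the framing.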
Minimal Concrete Example
Content-Length: 46\r\n\r\n{"jsonrpc":"2.0","id":1,"method":"initialize"}
Common Misconceptions
- “You can read line by line” -> JSON can contain newlines.
- “Content-Length is optional” -> It is required for LSP.
Check-Your-Understanding Questions
- Explain why framing is required on stdio.
- Predict what happens if Content-Length is wrong.
- Explain how to handle partial reads.
Where You’ll Apply It
- See §3.2 requirements and §5.10 Phase 1.
- Also used in P03-neovim-gui-client for MessagePack contrast.
2.2 LSP Lifecycle and Capabilities
Description
LSP defines a strict sequence of messages from initialization to shutdown.
Definitions & Key Terms
- initialize: Client -> server handshake with capabilities.
- initialized: Client notifies server that initialization finished.
- shutdown/exit: Graceful termination.
Mental Model Diagram (ASCII)
initialize -> initialized -> requests/notifications -> shutdown -> exit
How It Works (Step-by-Step)
- Client sends the initialize request with its capabilities.
- Server replies with the features it supports.
- Client sends the initialized notification.
- Server handles document events.
- On shutdown, stop work; on exit, terminate.
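One way to enforce this ordering is a small gate in front of the handler table. The sketch below uses hypothetical names (Lifecycle, dispatch, error) and follows the spec's rules: before initialize, requests get error -32002 (ServerNotInitialized) and notifications are dropped except exit; after shutdown, requests get -32600 (InvalidRequest); exit records exit code 0 only if shutdown came first. It advertises full sync (1) and definitionProvider.

```python
from typing import Any, Callable, Dict, Optional

# Error codes from the JSON-RPC 2.0 and LSP specifications.
INVALID_REQUEST = -32600
METHOD_NOT_FOUND = -32601
SERVER_NOT_INITIALIZED = -32002


def error(msg_id: Any, code: int, message: str) -> dict:
    return {"jsonrpc": "2.0", "id": msg_id, "error": {"code": code, "message": message}}


class Lifecycle:
    """Hypothetical lifecycle gate in front of a table of method handlers."""

    def __init__(self) -> None:
        self.replied_to_initialize = False
        self.shutdown_requested = False
        self.exit_code: Optional[int] = None  # set when "exit" arrives
        self.handlers: Dict[str, Callable[[Any], Any]] = {
            "initialize": self.on_initialize,
            "initialized": lambda params: None,  # notification: nothing to send back
            "shutdown": self.on_shutdown,
            "exit": self.on_exit,
        }

    def on_initialize(self, params: Any) -> dict:
        self.replied_to_initialize = True
        # Advertise full document sync (1) and go-to-definition support.
        return {"capabilities": {"textDocumentSync": 1, "definitionProvider": True}}

    def on_shutdown(self, params: Any) -> None:
        self.shutdown_requested = True
        return None  # the shutdown result is null

    def on_exit(self, params: Any) -> None:
        # Exit code 0 after a shutdown request, otherwise 1.
        self.exit_code = 0 if self.shutdown_requested else 1

    def dispatch(self, msg: dict) -> Optional[dict]:
        """Return a response for requests, or None for notifications."""
        method = msg.get("method")
        is_request = "id" in msg
        msg_id = msg.get("id")
        if method != "exit":
            if not self.replied_to_initialize and method != "initialize":
                # Requests get -32002; notifications are silently dropped.
                return error(msg_id, SERVER_NOT_INITIALIZED, "not initialized") if is_request else None
            if self.shutdown_requested:
                return error(msg_id, INVALID_REQUEST, "server is shutting down") if is_request else None
        handler = self.handlers.get(method)
        if handler is None:
            return error(msg_id, METHOD_NOT_FOUND, f"unknown method: {method}") if is_request else None
        result = handler(msg.get("params"))
        return {"jsonrpc": "2.0", "id": msg_id, "result": result} if is_request else None
```

Document handlers such as textDocument/didOpen would be added to the same handlers table, so every method passes through the lifecycle checks first.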
Minimal Concrete Example
{"jsonrpc":"2.0","id":1,"result":{"capabilities":{"textDocumentSync":2}}}
Common Misconceptions
- “Server can send diagnostics before initialization completes” -> It must not send them until it has replied to initialize.
- “Exit without shutdown is fine” -> It is abnormal; the server should then exit with code 1 instead of 0.
Check-Your-Understanding Questions
- Explain why initialize must happen first.
- Predict what happens if the server ignores shutdown.
- Explain how to advertise go-to-definition capability.
Where You’ll Apply It
- See §3.2 requirements and §5.10 Phase 1.
- Also used in P06-neovim-lite-capstone.
2.3 Document Synchronization and Versioning
Description
The server must keep a consistent view of open documents while edits happen.
Definitions & Key Terms
- TextDocumentSync: LSP capability for document updates.
- Version: Client-provided incrementing number.
- Incremental changes: Diffs applied to the server’s document.
Mental Model Diagram (ASCII)
Client doc v3 -> didChange -> server applies -> v4
How It Works (Step-by-Step)
- On didOpen, store the full document and its version.
- On didChange, apply edits in order.
- Reject or log out-of-order versions.
- Keep a clean in-memory representation.
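A versioned store for the didOpen/didChange/didClose flow might look like the following sketch. It assumes full document sync, so each didChange carries the complete new text, and it treats any version that is not strictly greater than the stored one as stale. DocumentStore and OpenDocument are illustrative names.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class OpenDocument:
    uri: str
    version: int
    text: str


class DocumentStore:
    """Tracks open documents by URI, assuming full-text didChange events."""

    def __init__(self) -> None:
        self.docs: Dict[str, OpenDocument] = {}

    def did_open(self, uri: str, version: int, text: str) -> None:
        self.docs[uri] = OpenDocument(uri, version, text)

    def did_change(self, uri: str, version: int, full_text: str) -> bool:
        """Apply a change; return False if the document is unknown or the version is stale."""
        doc = self.docs.get(uri)
        if doc is None or version <= doc.version:
            return False  # out-of-order or duplicate version: keep the newer text
        doc.version = version
        doc.text = full_text
        return True

    def did_close(self, uri: str) -> None:
        self.docs.pop(uri, None)
```

Returning a boolean lets the caller decide whether to log the rejected change or skip re-running diagnostics.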
Minimal Concrete Example
{"textDocument":{"uri":"file:///xyl.xyl","version":4},"contentChanges":[{"range":...,"text":"foo"}]}
Common Misconceptions
- “You can ignore versions” -> It causes desync.
- “Full sync is always the better choice” -> It is simpler, but resending the whole file on every edit is slow for large files.
Check-Your-Understanding Questions
- Explain why versioning prevents stale edits.
- Predict what happens if changes arrive out of order.
- Explain full vs incremental sync tradeoffs.
Where You’ll Apply It
- See §4.3 data structures and §5.10 Phase 2.
- Also used in P04-tree-sitter-grammar.
2.4 Parsing, Symbol Tables, and Definitions
Description
To implement go-to-definition, you need a symbol table built from parsing the document.
Definitions & Key Terms
- Symbol table: Map from identifier to definition location.
- AST: Abstract syntax tree used for semantic analysis.
- Definition: The declaration of a symbol.
Mental Model Diagram (ASCII)
source -> parse -> AST -> symbol table
identifier "foo" -> (file, line, col)
How It Works (Step-by-Step)
- Parse the document into an AST.
- Walk the AST to collect definitions.
- Store symbol table per document.
- On a definition request, look up the symbol and return its location.
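For a toy language, the parse-then-lookup flow can be approximated as below. This sketch assumes a hypothetical syntax where `let NAME = expr` is the only kind of definition, and it uses a line-by-line regex scan as a stand-in for a real parser and AST. To cope with simple shadowing, a lookup resolves to the nearest definition at or before the cursor. Columns are Python string indices, which equal LSP's UTF-16 characters only for text inside the Basic Multilingual Plane.

```python
import re
from typing import Dict, List, Optional, Tuple

Position = Tuple[int, int]  # (line, character), zero-based as in LSP

LET_RE = re.compile(r"^\s*let\s+([A-Za-z_]\w*)\s*=")  # hypothetical definition syntax
IDENT_RE = re.compile(r"[A-Za-z_]\w*")


def build_symbol_table(text: str) -> Dict[str, List[Position]]:
    """Map each name to every position where `let NAME =` defines it, in source order."""
    table: Dict[str, List[Position]] = {}
    for line_no, line in enumerate(text.splitlines()):
        m = LET_RE.match(line)
        if m:
            table.setdefault(m.group(1), []).append((line_no, m.start(1)))
    return table


def word_at(text: str, pos: Position) -> Optional[str]:
    """Return the identifier under the cursor, if any."""
    lines = text.splitlines()
    line_no, col = pos
    if line_no >= len(lines):
        return None
    for m in IDENT_RE.finditer(lines[line_no]):
        if m.start() <= col < m.end():
            return m.group()
    return None


def find_definition(text: str, pos: Position) -> Optional[Position]:
    """Resolve to the nearest definition at or before the cursor."""
    name = word_at(text, pos)
    defs = build_symbol_table(text).get(name or "", [])
    earlier = [d for d in defs if d <= pos]  # tuples compare by line, then column
    if earlier:
        return earlier[-1]
    return defs[0] if defs else None
```

A real server would wrap the returned position in an LSP Location (uri plus range) and cache the table per document version instead of rebuilding it on every request.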
Minimal Concrete Example
let x = 1
print(x)
-> symbol table: x -> line 0, character 4 (zero-based, as in LSP positions)
Common Misconceptions
- “String search is enough” -> It fails with shadowing.
- “Parsing must be perfect” -> A partial AST can still work.
Check-Your-Understanding Questions
- Explain why symbol tables are needed for go-to-definition.
- Predict what happens when a symbol is redefined.
- Explain how to handle references in incomplete code.
Where You’ll Apply It
- See §3.2 requirements and §5.10 Phase 2.
- Also used in P04-tree-sitter-grammar.
2.5 Diagnostics and Ranges
Description
Diagnostics are structured errors and warnings sent to the editor.
Definitions & Key Terms
- Diagnostic: LSP message describing a problem.
- Range: Start and end position in a document.
- Severity: Error, Warning, Information, Hint (LSP values 1-4).
Mental Model Diagram (ASCII)
{ range: (line 2, col 5 -> line 2, col 8), message: "Unknown symbol" }
How It Works (Step-by-Step)
- Analyze the document for errors.
- Create diagnostics with ranges and messages.
- Send textDocument/publishDiagnostics.
- Clear diagnostics when errors are fixed.
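A single "undefined symbol" rule for the same hypothetical `let NAME = expr` syntax could be written as follows. The keyword and builtin set (let, print) is an assumption about the toy language. The second function wraps the result in a textDocument/publishDiagnostics notification; publishing an empty list is how previously reported problems get cleared.

```python
import re
from typing import List

LET_RE = re.compile(r"^\s*let\s+([A-Za-z_]\w*)\s*=")
IDENT_RE = re.compile(r"[A-Za-z_]\w*")
BUILTINS = {"let", "print"}  # hypothetical keywords/builtins of the toy language
SEVERITY_ERROR = 1  # LSP DiagnosticSeverity.Error


def undefined_symbol_diagnostics(text: str) -> List[dict]:
    """Flag identifiers used before any `let` defines them."""
    defined = set()
    diagnostics = []
    for line_no, line in enumerate(text.splitlines()):
        m = LET_RE.match(line)
        rhs_start = m.end() if m else 0  # skip `let NAME =` itself
        for ident in IDENT_RE.finditer(line, rhs_start):
            name = ident.group()
            if name not in defined and name not in BUILTINS:
                diagnostics.append({
                    "range": {"start": {"line": line_no, "character": ident.start()},
                              "end": {"line": line_no, "character": ident.end()}},
                    "severity": SEVERITY_ERROR,
                    "message": "Undefined symbol",
                })
        if m:
            defined.add(m.group(1))  # a name is visible only after its own definition line
    return diagnostics


def publish_diagnostics(uri: str, version: int, diagnostics: List[dict]) -> dict:
    """Build the notification; an empty list clears earlier diagnostics for this uri."""
    return {"jsonrpc": "2.0", "method": "textDocument/publishDiagnostics",
            "params": {"uri": uri, "version": version, "diagnostics": diagnostics}}
```

Including the document version lets the client discard diagnostics computed for text it has already replaced.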
Minimal Concrete Example
{"range":{"start":{"line":1,"character":3},"end":{"line":1,"character":6}},"message":"Undefined symbol"}
Common Misconceptions
- “Diagnostics are only errors” -> Warnings and info are valid.
- “Ranges are byte offsets” -> LSP positions are zero-based line and character offsets, with characters counted in UTF-16 code units by default.
Check-Your-Understanding Questions
- Explain why UTF-16 positions matter in LSP.
- Predict what happens if you send stale diagnostics.
- Explain how to clear diagnostics.
Where You’ll Apply It
- See §3.2 requirements and §5.10 Phase 2.
- Also used in P06-neovim-lite-capstone.
2.6 Concurrency, Cancellation, and Robustness
Description
LSP servers must handle concurrent requests and cancellations.
Definitions & Key Terms
- $/cancelRequest: Client cancels an in-flight request.
- Work queue: Serializes document processing.
- Debounce: Delay diagnostics until edits settle.
Mental Model Diagram (ASCII)
request -> queue -> process -> respond
cancel -> drop if not started
How It Works (Step-by-Step)
- Maintain a queue of requests per document.
- Handle $/cancelRequest by dropping work if possible; a cancelled request still needs a response, typically the RequestCancelled error (-32800).
- Debounce diagnostics to avoid floods.
- Ensure thread-safe access to document store.
Minimal Concrete Example
on didChange:
schedule diagnostics in 200ms
on new change:
cancel previous timer
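The timer pseudocode above translates to a small per-document debouncer. This sketch uses threading.Timer; Debouncer is a hypothetical name, and the callback runs on a timer thread, so whatever it touches (the document store, the stdout writer) needs its own locking.

```python
import threading
from typing import Callable, Dict


class Debouncer:
    """Run action(uri) only after `delay` seconds pass with no newer schedule(uri)."""

    def __init__(self, delay: float, action: Callable[[str], None]) -> None:
        self.delay = delay
        self.action = action
        self._timers: Dict[str, threading.Timer] = {}
        self._lock = threading.Lock()

    def schedule(self, uri: str) -> None:
        with self._lock:
            pending = self._timers.get(uri)
            if pending is not None:
                pending.cancel()  # a newer edit supersedes the waiting run
            timer = threading.Timer(self.delay, self.action, args=(uri,))
            timer.daemon = True  # do not keep the process alive on exit
            self._timers[uri] = timer
            timer.start()
```

For the 200 ms policy above you would construct Debouncer(0.2, run_diagnostics), where run_diagnostics is your own function, and call schedule(uri) from the didChange handler.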
Common Misconceptions
- “LSP is single-threaded” -> Many servers use worker threads.
- “Cancellation is optional” -> Clients rely on it for UX.
Check-Your-Understanding Questions
- Explain why diagnostics should be debounced.
- Predict what happens if you ignore cancel requests.
- Explain a safe concurrency model for the document store.
Where You’ll Apply It
- See §7.3 performance traps and §5.10 Phase 3.
- Also used in P03-neovim-gui-client.
3. Project Specification
3.1 What You Will Build
A minimal LSP server for a toy language that:
- Supports initialize/shutdown
- Tracks open documents
- Publishes diagnostics
- Implements go-to-definition
Included: JSON-RPC framing, basic parsing, symbol table. Excluded: advanced refactors, completion, code actions.
3.2 Functional Requirements
- Initialize: Respond with server capabilities.
- Document sync: Handle didOpen, didChange, didClose.
- Diagnostics: Publish at least one rule (e.g., unknown symbol).
- Go-to-definition: Return location for identifiers.
- Error handling: Graceful handling of invalid JSON.
3.3 Non-Functional Requirements
- Performance: diagnostics within 200ms for small files.
- Reliability: no crashes on malformed input.
- Usability: clear diagnostic messages.
3.4 Example Usage / Output
$ ./my-lsp --stdio
Content-Length: 58

{"jsonrpc":"2.0","id":1,"method":"initialize","params":{}}
3.5 Data Formats / Schemas / Protocols
- JSON-RPC: Content-Length framed JSON.
- LSP: initialize, textDocument/didOpen, textDocument/definition.
- Error JSON shape (for a parse error the id is null, because the request id could not be read):
{ "jsonrpc": "2.0", "id": null, "error": { "code": -32700, "message": "Parse error", "data": "details" } }
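Turning a raw message body into either a parsed message or one of these error objects might look like this sketch. parse_body and error_response are hypothetical helpers, and the checks are deliberately minimal: valid UTF-8 JSON, an object, and jsonrpc set to "2.0".

```python
import json
from typing import Any, Optional, Tuple

PARSE_ERROR = -32700
INVALID_REQUEST = -32600


def error_response(msg_id: Any, code: int, message: str, data: Optional[str] = None) -> dict:
    err: dict = {"code": code, "message": message}
    if data is not None:
        err["data"] = data
    return {"jsonrpc": "2.0", "id": msg_id, "error": err}


def parse_body(body: bytes) -> Tuple[Optional[dict], Optional[dict]]:
    """Return (message, None) on success or (None, error_response) on failure."""
    try:
        msg = json.loads(body.decode("utf-8"))
    except (UnicodeDecodeError, json.JSONDecodeError) as exc:
        # The id cannot be recovered from unparseable input, so it must be null.
        return None, error_response(None, PARSE_ERROR, "Parse error", str(exc))
    if not isinstance(msg, dict) or msg.get("jsonrpc") != "2.0":
        msg_id = msg.get("id") if isinstance(msg, dict) else None
        return None, error_response(msg_id, INVALID_REQUEST, "Invalid Request")
    return msg, None
```

Because Content-Length framing still marks where the next message begins, a server could send this error and keep reading instead of exiting; the exit-code-2 behavior in the failure demo is a design choice of this project.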
3.6 Edge Cases
- Malformed JSON.
- Out-of-order document versions.
- Large file with many symbols.
3.7 Real World Outcome
You can connect the server to Neovim and see diagnostics.
3.7.1 How to Run (Copy/Paste)
./my-lsp --stdio
3.7.2 Golden Path Demo (Deterministic)
- Use a fixed test file, fixtures/sample.xyl.
- On open, the server publishes one diagnostic at a known range.
Expected:
- Diagnostic appears at line 2, col 5 with message “Undefined symbol”.
3.7.3 If CLI: exact terminal transcript
$ ./my-lsp --stdio
Content-Length: 58

{"jsonrpc":"2.0","id":1,"method":"initialize","params":{}}
Content-Length: 123

{"jsonrpc":"2.0","method":"textDocument/publishDiagnostics","params":{...}}
Failure demo:
$ ./my-lsp --stdio
Content-Length: 10
{bad json}
Error: Parse error
Exit code: 2
Exit codes:
- 0: clean shutdown
- 1: init failure
- 2: parse error
4. Solution Architecture
4.1 High-Level Design
[stdio] -> [JSON-RPC parser] -> [Dispatcher]
-> [Document store]
-> [Parser + symbol table]
-> [Diagnostics]
4.2 Key Components
| Component | Responsibility | Key Decisions |
|---|---|---|
| Framer | Read Content-Length | streaming parser |
| Dispatcher | Route methods | table of handlers |
| Document Store | Track uri->text | versioned updates |
| Analyzer | Parse + symbols | incremental or full |
| Diagnostics | Produce errors | debounce policy |
4.3 Data Structures (No Full Code)
Document { uri, version, text, ast, symbols }
Diagnostic { range, message, severity }
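In Python, these records map naturally onto dataclasses. The sketch below is one possible shape: the field names follow the outline above, the severity numbers match LSP's DiagnosticSeverity values, and Range plus the to_lsp helper are illustrative additions.

```python
from dataclasses import dataclass, field
from enum import IntEnum
from typing import Any, Dict, List, Optional, Tuple


class Severity(IntEnum):
    # Values match LSP's DiagnosticSeverity.
    ERROR = 1
    WARNING = 2
    INFORMATION = 3
    HINT = 4


@dataclass
class Range:
    start: Tuple[int, int]  # (line, character), zero-based
    end: Tuple[int, int]


@dataclass
class Diagnostic:
    range: Range
    message: str
    severity: Severity = Severity.ERROR

    def to_lsp(self) -> dict:
        """Convert to the JSON shape sent in publishDiagnostics."""
        (sl, sc), (el, ec) = self.range.start, self.range.end
        return {"range": {"start": {"line": sl, "character": sc},
                          "end": {"line": el, "character": ec}},
                "message": self.message,
                "severity": int(self.severity)}


@dataclass
class Document:
    uri: str
    version: int
    text: str
    ast: Optional[Any] = None  # whatever your parser produces
    symbols: Dict[str, List[Tuple[int, int]]] = field(default_factory=dict)
```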
4.4 Algorithm Overview
Key Algorithm: Apply Incremental Changes
- Validate version order.
- Apply each change range to document text.
- Re-parse or update AST.
Complexity Analysis:
- Time: O(n) for full reparse (simple implementation)
- Space: O(n) for document text and AST
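Applying each change range requires converting LSP positions into string offsets. The sketch below assumes '\n' line endings and Python str text; it counts characters in UTF-16 code units, LSP's default position encoding, and treats a change without a range as a full replacement, as TextDocumentContentChangeEvent allows. The function names are hypothetical, and version validation is left to the caller.

```python
from typing import List


def utf16_to_index(line: str, character: int) -> int:
    """Convert an LSP character offset (UTF-16 code units) to a str index."""
    units = 0
    for i, ch in enumerate(line):
        if units >= character:
            return i
        units += 2 if ord(ch) > 0xFFFF else 1  # astral chars take two UTF-16 units
    return len(line)  # offsets past the end clamp to the line length


def position_to_offset(text: str, line: int, character: int) -> int:
    """Offset into `text` for an LSP position (assumes '\\n' line endings)."""
    lines = text.split("\n")
    if line >= len(lines):
        return len(text)
    start = sum(len(ln) + 1 for ln in lines[:line])  # +1 for each '\n'
    return start + utf16_to_index(lines[line], character)


def apply_changes(text: str, changes: List[dict]) -> str:
    """Apply didChange contentChanges in order; each edit sees the previous result."""
    for change in changes:
        if "range" not in change:
            text = change["text"]  # full-document replacement
            continue
        rng = change["range"]
        start = position_to_offset(text, rng["start"]["line"], rng["start"]["character"])
        end = position_to_offset(text, rng["end"]["line"], rng["end"]["character"])
        text = text[:start] + change["text"] + text[end:]
    return text
```

Each call rescans the text, matching the O(n) estimate above; a production server would keep a line-offset index instead.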
5. Implementation Guide
5.1 Development Environment Setup
python3 --version
# or
rustc --version
5.2 Project Structure
my-lsp/
├── src/
│ ├── main.py (or main.rs)
│ ├── rpc.py
│ ├── lsp.py
│ ├── documents.py
│ └── analysis.py
├── fixtures/
│ └── sample.xyl
└── README.md
5.3 The Core Question You’re Answering
“How do editors and language tools stay in sync so navigation and diagnostics feel instant?”
5.4 Concepts You Must Understand First
Stop and research these before coding:
- JSON-RPC framing
- LSP initialize lifecycle
- Versioned document updates
5.5 Questions to Guide Your Design
- How will you parse messages safely from a stream?
- How will you store documents and versions?
- How will you avoid re-parsing too frequently?
5.6 Thinking Exercise
Write out a manual didOpen + didChange sequence and track the document text by hand.
5.7 The Interview Questions They’ll Ask
- Why does LSP use JSON-RPC instead of REST?
- What is the difference between full and incremental sync?
- How do you avoid stale diagnostics?
5.8 Hints in Layers
Hint 1: Start with framing. Get Content-Length parsing correct before anything else.
Hint 2: Implement initialize. Return minimal capabilities and log everything to stderr or a file, never stdout, which carries the protocol.
Hint 3: Add diagnostics only. Hardcode a simple rule first.
Hint 4: Add go-to-definition last. Build a symbol table before responding.
5.9 Books That Will Help
| Topic | Book | Chapter |
|---|---|---|
| Networking basics | Computer Networks | Ch. 2 |
| Parsing | Engineering a Compiler | Ch. 2-5 |
| Protocols | Network Programming with Go | Ch. 3 |
5.10 Implementation Phases
Phase 1: RPC Skeleton (2-3 days)
Goals:
- JSON-RPC framing
- Initialize lifecycle
Tasks:
- Implement Content-Length parser.
- Respond to initialize and shutdown.
Checkpoint: Neovim connects without errors.
Phase 2: Document Sync and Diagnostics (4-5 days)
Goals:
- Store documents
- Publish diagnostics
Tasks:
- Handle didOpen and didChange.
- Implement a basic parser or rule.
- Send publishDiagnostics.
Checkpoint: Diagnostics appear in Neovim.
Phase 3: Go-to-definition (3-4 days)
Goals:
- Build symbol table
- Support textDocument/definition
Tasks:
- Parse identifiers and definitions.
- Implement symbol lookup.
- Return locations.
Checkpoint: go-to-definition in Neovim (e.g. gd mapped to vim.lsp.buf.definition()) jumps correctly.
5.11 Key Implementation Decisions
| Decision | Options | Recommendation | Rationale |
|---|---|---|---|
| Sync strategy | full, incremental | full first | simpler |
| Parsing | regex, AST | simple AST | accuracy |
| Concurrency | single-thread, worker | single-thread + debounce | stable |
6. Testing Strategy
6.1 Test Categories
| Category | Purpose | Examples |
|---|---|---|
| Unit Tests | Framing, parsing | Content-Length parsing |
| Integration Tests | LSP flow | initialize, didOpen |
| Edge Case Tests | Bad JSON | malformed payloads |
6.2 Critical Test Cases
- Malformed JSON returns error and exits with code 2.
- Out-of-order version is rejected.
- Diagnostics clear when errors are fixed.
6.3 Test Data
fixtures/
sample.xyl
errors.xyl
7. Common Pitfalls & Debugging
7.1 Frequent Mistakes
| Pitfall | Symptom | Solution |
|---|---|---|
| Wrong Content-Length | Server hangs | Validate lengths |
| Ignoring versions | Diagnostics drift | Track versions |
| Blocking reads | No responsiveness | Use buffered read loop |
7.2 Debugging Strategies
- Log all incoming messages to a file.
- Add a debug command to dump document state.
- Simulate clients with small test scripts.
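A test script can play the client role by framing a fixed list of messages and piping them into the server. This sketch uses subprocess.run; frame and run_session are hypothetical names, and it assumes the server exits after receiving exit, since run waits for the process to finish.

```python
import json
import subprocess
from typing import List


def frame(message: dict) -> bytes:
    """Encode one message with its Content-Length header."""
    body = json.dumps(message, separators=(",", ":")).encode("utf-8")
    return b"Content-Length: %d\r\n\r\n" % len(body) + body


def run_session(cmd: List[str], messages: List[dict], timeout: float = 5.0) -> bytes:
    """Send a scripted session to a server process and return its raw stdout."""
    stdin = b"".join(frame(m) for m in messages)
    proc = subprocess.run(cmd, input=stdin, capture_output=True, timeout=timeout)
    return proc.stdout
```

For example, run_session(["./my-lsp", "--stdio"], [initialize, initialized, did_open, shutdown, exit_msg]) returns the framed responses, where each name in the list is a message dict you build yourself; comparing that output with a stored transcript gives a deterministic regression test.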
7.3 Performance Traps
- Re-parsing large files on every keystroke.
- Sending diagnostics on every change without debounce.
8. Extensions & Challenges
8.1 Beginner Extensions
- Add completion for keywords.
- Add hover info for symbols.
8.2 Intermediate Extensions
- Incremental parsing for faster diagnostics.
- Multi-file workspace support.
8.3 Advanced Extensions
- Implement rename and references.
- Add semantic tokens.
9. Real-World Connections
9.1 Industry Applications
- Language tooling: LSP powers VS Code, Neovim, and many IDEs.
- Code intelligence: Diagnostics and go-to-definition are standard features.
9.2 Related Open Source Projects
- pyright: Python language server.
- rust-analyzer: Rust language server.
9.3 Interview Relevance
- Protocols: JSON-RPC, streaming parsing.
- Systems: concurrency and state synchronization.
10. Resources
10.1 Essential Reading
- LSP specification (v3.17)
- JSON-RPC 2.0 specification
10.2 Video Resources
- LSP architecture talks (search: “language server protocol overview”)
10.3 Tools & Documentation
- Neovim :help lsp
- nvim-lspconfig examples
10.4 Related Projects in This Series
- P04 - Tree-sitter Grammar: parsing foundation
- P06 - Neovim Lite Capstone: integration
11. Self-Assessment Checklist
11.1 Understanding
- I can explain JSON-RPC framing.
- I can describe the LSP lifecycle.
- I can explain versioned document sync.
11.2 Implementation
- Diagnostics appear correctly in Neovim.
- Go-to-definition works for simple symbols.
- Server handles malformed input.
11.3 Growth
- I can extend the server with a new feature.
- I can explain this project in an interview.
12. Submission / Completion Criteria
Minimum Viable Completion:
- Initialize/shutdown works.
- Diagnostics are published for a basic rule.
- Document sync is stable.
Full Completion:
- All minimum criteria plus:
- Go-to-definition works.
- Out-of-order versions are handled correctly.
Excellence (Going Above & Beyond):
- Incremental parsing.
- Multi-file workspace support.
13. Determinism and Reproducibility Notes
- Use fixtures/sample.xyl for golden demo diagnostics.
- Do not include timestamps in logs.
- Failure demo uses malformed JSON to produce a stable error.