Project 7: Environment Variable Manager

Build an environment variable system with export, unset, and temporary assignments.

Quick Reference

Attribute Value
Difficulty Level 2: Intermediate (The Developer)
Time Estimate 1 week
Main Programming Language C
Alternative Programming Languages Rust, Go, Zig
Coolness Level Level 2: Practical but Useful
Business Potential 1. The “Resume Gold” (Educational/Personal Brand)
Prerequisites strings/maps, process environment, execve envp
Key Topics envp construction, export flags, scoping

1. Learning Objectives

By completing this project, you will:

  1. Explain and implement envp construction in the context of a shell.
  2. Build a working environment variable manager that matches the project specification.
  3. Design tests that validate correctness and edge cases.
  4. Document design decisions, trade-offs, and limitations.

2. All Theory Needed (Per-Concept Breakdown)

Environment Variables, Export, and Scoping

Fundamentals Shells maintain two related variable spaces: shell variables (internal to the shell) and environment variables (exported to child processes). The export built-in marks a variable so it appears in the envp array passed to execve(). Shell variables can exist without being exported, and assignments can be local to a single command invocation (VAR=1 cmd). Understanding how variables are stored, expanded, and inherited is essential for scripting and predictable behavior.

Deep Dive into the concept A shell typically stores variables in a dictionary-like structure mapping names to values, plus metadata indicating whether a variable is exported. When the shell executes an external command, it must produce an environment array (list of KEY=VALUE strings) that includes only exported variables. This array is passed to execve() and becomes the child’s environment. Internal-only variables are not visible to children but still participate in expansions like $VAR within the shell. This separation enables fine-grained control: you can have variables for scripting that do not leak into subcommands.

Assignments have special behavior. An assignment preceding a command (FOO=bar cmd) should not permanently change the shell’s variables; it should only affect the environment of that single command. Shells implement this by creating a temporary environment overlay for the child: in the parent, the variable table is not mutated permanently, but in the child, the environment array includes the temporary assignment. This is subtle and often mishandled in toy shells, so it’s worth getting right. For purely shell-level assignment (FOO=bar), the variable should be set in the shell table and optionally exported if the export built-in is used.

Scoping becomes more complex with functions and scripts. Many shells allow “local” variables inside functions that override globals and disappear on function return. This requires a scope stack: when entering a function, push a new scope; on return, pop it. Exported variables usually remain global, but some shells allow local exported variables as well. If you are not implementing full scoping, you should at least provide a consistent model and document it.

Expansion and quoting interact with variables. In double quotes, $VAR expands but does not undergo field splitting; in unquoted contexts, the expanded value is subject to field splitting and globbing. This means the variable system must interact with the expansion engine, not just store strings. When a variable is unset, some shells expand it to an empty string, others treat it as an error if set -u is enabled. These option-dependent behaviors are part of shell state management.

Finally, the environment is not the only inherited state. The shell may propagate other properties, like the working directory and open file descriptors. But variables are the most visible interface for configuring child processes, so correctness here is crucial. If you drop variables or export them incorrectly, anything that depends on PATH or HOME will not work, making the shell feel broken.

How this fits into other projects Variable handling appears in assignment parsing, expansion, built-ins, and execution-environment construction.

Definitions & key terms

  • Shell variable: Variable stored in the shell, not exported by default.
  • Environment variable: Exported variable passed to child processes.
  • Export: Marking a variable for inheritance.
  • Scope stack: Data structure for nested variable scopes.

Mental model diagram

shell vars (internal) + exported vars -> envp[] -> execve

How it works (step-by-step)

  1. Parse assignments and decide if they are temporary or permanent.
  2. Update shell variable table and export flags.
  3. On exec, build envp from exported variables.
  4. Apply temporary assignments only in the child environment.
  5. On function entry/exit, push/pop variable scopes if supported.

Minimal concrete example

FOO=1
export BAR=2
FOO=3 cmd   # child sees FOO=3, BAR=2; parent keeps FOO=1

Common misconceptions

  • “All variables are exported” -> only exported ones reach children.
  • “Temporary assignment changes shell” -> it should not persist.
  • “Unset equals empty string” -> depends on shell options.

Check-your-understanding questions

  1. Why does FOO=1 cmd not permanently set FOO?
  2. How do exported variables differ from shell variables?
  3. What happens to local variables after a function returns?

Check-your-understanding answers

  1. The assignment is applied only to the child environment.
  2. Exported variables are included in envp for execve().
  3. They are discarded when the scope stack is popped.

Real-world applications

  • Configuring tools via PATH, HOME, EDITOR.
  • Build systems that set environment flags.
  • Scripts that pass secrets via environment variables.

References

  • POSIX Shell Command Language (environment and variable rules).
  • “The Linux Programming Interface” (environment handling).

Key insights The environment is a filtered view of shell variables, not a separate universe.

Summary Variable management is about scoping, export rules, and correct inheritance into child processes.

Homework/Exercises to practice the concept

  1. Implement a variable table with export flags.
  2. Support temporary assignments for single commands.
  3. Add export and unset built-ins with correct behavior.

Solutions to the homework/exercises

  1. Store name/value pairs with a boolean exported flag.
  2. Build a temporary envp array for the child process.
  3. Remove or mark variables and rebuild envp on exec.

Argument Vector Construction and PATH Lookup

Fundamentals Before a shell can execute a command, it must convert a line of text into an argument vector (argv) and locate the program on disk. This seems simple, but it is the glue between parsing and execution. The first word becomes the command name, and the remaining words become arguments passed to the program. For external commands without a slash, the shell must search each directory in the PATH environment variable, build candidate paths, and test whether they are executable. A correct implementation handles empty path elements, . in PATH, and permission errors. Even minimal shells depend on correct argv construction and path resolution.

Deep Dive into the concept The execve() system call expects two critical inputs: argv, an array of strings where argv[0] is the program name, and envp, an array of environment variables. Building argv is easy only if you ignore quoting, escaping, and expansions; for a minimal executor, you may split on whitespace, but the code should still produce a valid, NULL-terminated array. In a full shell, argv construction happens after expansions, quote removal, and field splitting. The order matters: for example, quoted strings should not be split by whitespace, and globbing should expand to multiple argv entries. Even in a minimal shell, you should treat consecutive spaces as a single separator and preserve the token order exactly.

PATH lookup is equally subtle. If the command contains a /, the shell must treat it as a path and attempt to execute directly. If it does not, the shell searches the colon-separated list in PATH. Each element can be empty; an empty element means “current directory.” When you iterate over PATH, you must build dir + "/" + cmd carefully, handle trailing slashes, and check execute permissions with access(path, X_OK) or by attempting execve and checking errno. The shell should distinguish between “not found” and “not executable”: POSIX uses exit status 127 for missing commands and 126 for found but non-executable commands. For errors like ENOEXEC (text file without shebang), the shell may choose to run it with /bin/sh or return an error, depending on your design scope.

Another detail is the difference between execvp() and manual PATH search. execvp() searches PATH for you, but you still need to handle error mapping and messaging. If you implement the search manually, you can produce more informative diagnostics, record the resolved path, and implement hashing to cache results. Shells like bash maintain a hash table of command lookups to avoid repeated directory scans, invalidating the cache when PATH changes. While your minimal executor may skip hashing, you should structure the code so it can be added later.

Finally, argument vector construction is not just string splitting; it is a data-structure problem. You must allocate storage for each token, store them in a contiguous char* array, and ensure the array is NULL-terminated. You also need to decide ownership and lifetimes: who frees the strings after exec or built-in handling? A consistent memory strategy will prevent leaks and double frees. Even a small shell benefits from a “command struct” that owns argv, the original line, and any metadata such as redirections.

How this fits into other projects This concept connects tokenization to execution and appears in every project that launches commands or resolves filenames.

Definitions & key terms

  • argv: Argument vector passed to execve().
  • PATH: Colon-separated list of directories used for command lookup.
  • Shebang: #! line that selects an interpreter for a script.
  • X_OK: Permission check for executability.

Mental model diagram

"ls -l /tmp" -> tokens -> argv[] -> PATH search -> /bin/ls -> execve

How it works (step-by-step)

  1. Tokenize line into words.
  2. Build argv array and append NULL terminator.
  3. If command contains /, attempt direct execve.
  4. Else iterate PATH entries, build candidate paths, test execute.
  5. On success, execve the resolved path; on failure, map errno.

Minimal concrete example

char *argv[] = {"ls", "-l", NULL};
execvp(argv[0], argv); // uses PATH automatically

Common misconceptions

  • “argv[0] doesn’t matter” -> many programs read argv[0] for mode.
  • “PATH search is simple” -> empty elements mean current directory.
  • “execvp handles errors” -> you must map errno to shell status.

Check-your-understanding questions

  1. When should the shell skip PATH search?
  2. What is the meaning of an empty PATH element?
  3. Why is argv required to be NULL-terminated?

Check-your-understanding answers

  1. When the command contains a / path separator.
  2. It refers to the current directory.
  3. execve uses NULL to know where the argument list ends.

Real-world applications

  • Any CLI launcher or shell.
  • Build systems and process managers.
  • Scripting environments that run external tools.

References

  • POSIX Shell Command Language (command search).
  • “Advanced Programming in the UNIX Environment” (exec family).

Key insights Correct argv construction and PATH lookup prevent confusing “command not found” failures.

Summary Tokenization and PATH search are the bridge from text input to executable code.

Homework/Exercises to practice the concept

  1. Implement a manual PATH search and print the resolved path.
  2. Test how empty PATH entries behave with . directories.
  3. Create a command with a slash and confirm PATH is ignored.

Solutions to the homework/exercises

  1. Split PATH on : and join with / and the command name.
  2. Insert empty elements and confirm lookup uses current directory.
  3. Use ./program to verify direct execution is attempted.

3. Project Specification

3.1 What You Will Build

A variable table with export flags and helpers to build envp for child processes.

Included:

  • Core feature set described above
  • Deterministic CLI behavior and exit codes

Excluded:

  • No advanced parameter expansion; focus on storage and inheritance.

3.2 Functional Requirements

  1. Requirement 1: Store variables in a map with export flags.
  2. Requirement 2: Implement export, unset, and assignment parsing.
  3. Requirement 3: Support temporary assignments for a single command.
  4. Requirement 4: Build envp for exec from exported variables.
  5. Requirement 5: Display variables in a deterministic order.

3.3 Non-Functional Requirements

  • Performance: Interactive latency under 50 ms for typical inputs; envp construction should scale linearly with the number of variables.
  • Reliability: No crashes on malformed input; errors reported clearly with non-zero status.
  • Usability: Clear prompts, deterministic behavior, and predictable error messages.

3.4 Example Usage / Output

$ ./mysh
mysh> FOO=bar
mysh> echo $FOO
bar
mysh> export FOO
mysh> /bin/sh -c 'echo $FOO'
bar
mysh> unset FOO
mysh> /bin/sh -c 'echo $FOO'

3.5 Data Formats / Schemas / Protocols

  • Variable table: name, value, exported.

3.6 Edge Cases

  • Invalid variable names
  • Empty values
  • Unsetting exported variables

3.7 Real World Outcome

This is the exact behavior you should be able to demonstrate.

3.7.1 How to Run (Copy/Paste)

  • make
  • ./mysh

3.7.2 Golden Path Demo (Deterministic)

$ ./mysh
mysh> FOO=bar
mysh> echo $FOO
bar
mysh> export FOO
mysh> /bin/sh -c 'echo $FOO'
bar
mysh> unset FOO
mysh> /bin/sh -c 'echo $FOO'

3.7.3 Failure Demo (Deterministic)

$ ./mysh
mysh> not_a_command
mysh> echo $?
127

4. Solution Architecture

4.1 High-Level Design

[Input] -> [Parser/Lexer] -> [Core Engine] -> [Executor/Output]

4.2 Key Components

Component Responsibility Key Decisions
Var Store Map of variables and export flags Simple hash table or ordered list.
Env Builder Construct envp array Rebuild on exec.
Assignment Parser Parse NAME=VALUE and scoping Handles temporary assignments.

4.3 Data Structures (No Full Code)

struct Var { char *name; char *value; int exported; };

4.4 Algorithm Overview

Key Algorithm: Env Build

  1. Iterate over the variable table.
  2. Include only exported variables.
  3. Build each KEY=VALUE string and NULL-terminate the array.

Complexity Analysis:

  • Time: O(n) over the number of variables
  • Space: O(n) for the envp array

5. Implementation Guide

5.1 Development Environment Setup

# install dependencies (if any)
# build
make

5.2 Project Structure

project-root/
├── src/
│   ├── main.c
│   ├── lexer.c
│   └── executor.c
├── tests/
│   └── test_basic.sh
├── Makefile
└── README.md

5.3 The Core Question You’re Answering

How does a shell manage variables differently from the OS environment?

5.4 Concepts You Must Understand First

Stop and research these before coding:

  1. Environment inheritance
  2. Shell parameters
  3. Assignment semantics

5.5 Questions to Guide Your Design

5.6 Thinking Exercise

The “Temporary Assignment” Problem

Explain what happens here:

FOO=1 echo $FOO

5.7 The Interview Questions They’ll Ask

5.8 Hints in Layers

Hint 1: Use a struct with flags Store name, value, and exported flag.

Hint 2: Rebuild env array When execing, build char** envp from exported variables.

Hint 3: Parse assignment tokens Detect NAME=value before command execution.

Hint 4: Preserve insertion order Optional but helpful for predictable env output.

5.9 Books That Will Help

Topic Book Chapter
Environment “The Linux Programming Interface” Ch. 6
Shell variables “Shell Programming in Unix, Linux and OS X” Ch. 3
POSIX parameters POSIX Shell Command Language Parameters

5.10 Implementation Phases

Phase 1: Foundation (2-3 days)

Goals:

  • Define data structures and interfaces
  • Build a minimal end-to-end demo

Tasks:

  1. Implement the core data structures
  2. Build a tiny CLI or harness for manual tests

Checkpoint: A demo command runs end-to-end with clear logging.

Phase 2: Core Functionality (1 week)

Goals:

  • Implement full feature set
  • Validate with unit tests

Tasks:

  1. Implement core requirements
  2. Add error handling and edge cases

Checkpoint: All functional requirements pass basic tests.

Phase 3: Polish & Edge Cases (2-4 days)

Goals:

  • Harden for weird inputs
  • Improve UX and documentation

Tasks:

  1. Add edge-case tests
  2. Document design decisions

Checkpoint: Deterministic golden demo and clean error output.

5.11 Key Implementation Decisions

Decision Options Recommendation Rationale
Parsing depth Minimal vs full Incremental Start small, expand safely
Error policy Silent vs verbose Verbose Debuggability for learners

6. Testing Strategy

6.1 Test Categories

Category Purpose Examples
Unit Tests Test individual components Tokenizer, matcher, env builder
Integration Tests Test component interactions Full command lines
Edge Case Tests Handle boundary conditions Empty input, bad args

6.2 Critical Test Cases

  1. Golden Path: Run the canonical demo and verify output.
  2. Failure Path: Provide invalid input and confirm error status.
  3. Stress Path: Run repeated commands to detect leaks or state corruption.

6.3 Test Data

input: echo hello
output: hello

7. Common Pitfalls & Debugging

7.1 Frequent Mistakes

Pitfall Symptom Solution
Misordered redirection Output goes to wrong place Apply redirections left-to-right
Leaked file descriptors Commands hang waiting for EOF Close unused fds in parent/child
Incorrect exit status &&/|| behave wrong Use waitpid macros correctly

7.2 Debugging Strategies

  • Trace syscalls: Use strace/dtruss to verify fork/exec/dup2 order.
  • Log state transitions: Print parser states and job table changes in debug mode.
  • Compare with dash: Run the same input in a reference shell.

7.3 Performance Traps

  • Avoid O(n^2) behavior in hot paths like line editing.
  • Minimize allocations inside the REPL loop.

8. Extensions & Challenges

8.1 Beginner Extensions

  • Add a help built-in with usage docs.
  • Add colored prompt themes.

8.2 Intermediate Extensions

  • Add a simple profiling mode for command timing.
  • Implement a which built-in using PATH lookup.

8.3 Advanced Extensions

  • Add programmable completion or plugin system.
  • Add a scriptable test harness with golden outputs.

9. Real-World Connections

9.1 Industry Applications

  • Build systems: shells orchestrate compilation and test pipelines.
  • DevOps automation: scripts manage deployments and infrastructure.

9.2 Reference Shells
  • bash: The most common interactive shell.
  • dash: Minimal POSIX shell often used as /bin/sh.
  • zsh: Feature-rich interactive shell.

9.3 Interview Relevance

  • Process creation and lifecycle questions.
  • Parsing and system programming design trade-offs.

10. Resources

10.1 Essential Reading

  • “The Linux Programming Interface” by Michael Kerrisk - focus on the chapters relevant to this project.
  • “Advanced Programming in the UNIX Environment” - process control and pipes.

10.2 Video Resources

  • Unix process model lectures (any OS course).
  • Compiler front-end videos for lexing/parsing projects.

10.3 Tools & Documentation

  • strace/dtruss: inspect syscalls.
  • man pages: fork, execve, waitpid, pipe, dup2.

11. Self-Assessment Checklist

11.1 Understanding

  • I can explain the core concept without notes.
  • I can trace a command through my subsystem.
  • I understand at least one key design trade-off.

11.2 Implementation

  • All functional requirements are met.
  • All critical tests pass.
  • Edge cases are handled cleanly.

11.3 Growth

  • I documented lessons learned.
  • I can explain this project in an interview.

12. Submission / Completion Criteria

Minimum Viable Completion:

  • Core feature works for the golden demo.
  • Errors are handled with non-zero status.
  • Code is readable and buildable.

Full Completion:

  • All functional requirements met.
  • Tests cover edge cases and failures.

Excellence (Going Above & Beyond):

  • Performance benchmarks and clear documentation.
  • Behavior compared against a reference shell.