Learn The `sed` Command: From Stream Editor to Text Manipulation Wizard

Goal: Build deep, operational mastery of sed as a stream editor: how it reads input, how the execution cycle works, how addresses select lines, how regular expressions and substitutions transform text, and how the hold space enables multi-line logic. By the end, you will be able to design safe, portable sed scripts that automate config edits, log cleanup, and structured text transformations with confidence. You will also learn how to debug sed programs, reason about their side effects, and choose when sed is the right tool versus awk, perl, or a full scripting language.

Introduction

sed is a stream editor: it reads text from a file or pipeline, applies a script of editing commands to each line (or group of lines), and writes the transformed output to standard output. It is designed for one-pass, streaming transformations, which makes it fast and composable in Unix pipelines. It is not an interactive editor; it is a programmable text transformation engine.

What you will build (by the end of this guide):

One-liner and script-based sed tools that safely edit configuration files
Log-cleaning pipelines that extract and reformat structured messages
A mini Markdown-to-HTML converter built entirely with sed scripts
Multi-line parsing tools that use the hold space for block-level edits
Advanced stream transformations like reversing files line-by-line

Scope (what is included):

The sed execution model (pattern space, hold space, execution cycle)
Addressing, ranges, and selection logic
Regular expressions (BRE and ERE) as used by sed
Substitution mechanics and backreferences
Script structure, control flow, and multi-line techniques
In-place editing and portability across GNU/BSD implementations

Out of scope (for this guide):

Building full parsers for complex languages (use a parser generator)
Large-scale ETL pipelines (use awk, python, or jq for JSON)
Binary file manipulation (use od, xxd, or dedicated tools)

The Big Picture (Mental Model)

Input Stream     sed Script                      Output Stream
(file/stdin)     (addresses + commands)          (stdout/file)
    |                    |                            |
    v                    v                            v
+---------+   +---------------------------+     +--------------+
| line 1  |-->| addr -> cmd -> cmd -> cmd |---->| transformed  |
| line 2  |-->| addr -> cmd -> cmd -> cmd |---->| transformed  |
| line 3  |-->| addr -> cmd -> cmd -> cmd |---->| transformed  |
+---------+   +---------------------------+     +--------------+
      ^                   ^
      |                   |
      |             +-----------+
      +-------------| hold space|
                    +-----------+

Key Terms You Will See Everywhere

Pattern space: The current working buffer sed edits each cycle.
Hold space: A secondary buffer that persists across cycles.
Address: A selector that chooses which lines a command applies to.
Script: A sequence of editing commands (s, d, p, n, etc.).
Cycle: The read -> apply commands -> print -> reset loop.
BRE/ERE: Basic vs Extended Regular Expressions.
Backreference: A reference to captured text in a replacement.

How to Use This Guide

Read the Theory Primer first. Treat it like a mini-book. It builds the mental models you need to reason about sed correctly.
Then complete the projects in order. Each project adds a new capability and reuses the previous concepts.
Always validate by inspection. Use -n with explicit p to avoid accidental changes.
Practice portability. Test your commands on both GNU and BSD sed if you can (Linux vs macOS).
Keep a scratch file. You will learn faster by experimenting and observing pattern/hold space behavior.

Prerequisites & Background

Before starting these projects, you should have foundational understanding in these areas:

Essential Prerequisites (Must Have)

Command-Line Skills:

Navigating directories, redirecting input/output, and piping commands
Editing files with cat, printf, and a text editor
Basic shell scripting (variables, quoting, and subshells)

Text and Regex Fundamentals:

What a regular expression is and how it matches text
The meaning of . * ^ $ and character classes like [0-9]
The idea of capturing groups and backreferences

Unix Tools Fundamentals:

When to use grep, awk, and sed
The difference between stdout, stderr, and files

Helpful But Not Required

Advanced Regex:

Lookarounds and non-greedy matching (not supported in sed BRE/ERE, but helpful context)
Named groups (not supported in sed, but helps mental mapping)

Scripting Discipline:

Writing small test fixtures
Running scripts against sample data and checking diffs

Self-Assessment Questions

Before starting, ask yourself:

Can I explain what a pipeline does in a Unix shell?
Do I understand the difference between stdout and an in-place edit?
Can I read and write a simple regex like ^ERROR or [0-9]+?
Am I comfortable editing a file with a script rather than a GUI editor?
Do I know how to make backups before destructive edits?

If you answered “no” to questions 1-3: Spend a few days with basic CLI and regex resources before starting.

If you answered “yes” to all 5: You are ready to begin.

Development Environment Setup

Required Tools:

A Unix-like shell (Linux, macOS, WSL)
sed (GNU or BSD)
A text editor (vim, nano, VS Code)

Recommended Tools:

rg (ripgrep) for fast pattern searches
diff or git diff for validating file edits
perl or awk for comparison exercises

Testing Your Setup:

# Check sed availability (GNU sed prints a version; BSD sed has no --version)
$ sed --version >/dev/null 2>&1 && sed --version | head -n 1 || echo 'non-GNU sed (likely BSD)'

# Verify ERE support (GNU: -E or -r, BSD: -E)
$ printf 'a1\n' | sed -E 's/[0-9]+/X/'
X

# Confirm you can do a dry-run substitution
$ printf 'DEBUG=true\n' | sed 's/true/false/'
DEBUG=false

Time Investment

Simple projects (1, 2): Weekend (4-8 hours each)
Moderate projects (3): 1 week (10-20 hours)
Complex projects (4, 5): 1-2 weeks each
Total sprint: 5-8 weeks if done sequentially

Important Reality Check

sed mastery is a compounding skill. The learning happens in layers:

First pass: Make it work (copy-paste is fine to start)
Second pass: Understand what each command does
Third pass: Understand why the command order matters
Fourth pass: Understand portability and failure modes

This is normal. Stream editing is subtle. Take your time.

Big Picture / Mental Model

sed is a tiny virtual machine with two buffers and a loop.

                +------------------+
Input stream -->|  Pattern space   |--> output (unless -n)
                +------------------+
                       ^    |
                       |    v
                 +------------------+
                 |   Hold space     |
                 +------------------+

Cycle:
1) Read next line into pattern space
2) Run script commands in order
3) If not suppressed, print pattern space
4) Clear pattern space and repeat

Once you internalize this loop, every sed script becomes predictable.

Theory Primer

This section is the mini-book. Each chapter is a concept cluster you will use in the projects.

Chapter 1: Stream Editing Execution Model

Fundamentals

sed is built around a repeatable execution cycle. Think of it as a deterministic loop that reads one line, applies a fixed script, and then outputs. This is the core reason sed is fast and composable: it does not need to load an entire file into memory, and it can start producing output as soon as the first line is processed. Most sed programs are therefore streaming transformations rather than whole-file transformations.

The cycle has a working buffer called the pattern space. Each cycle starts with sed reading the next line of input into the pattern space. It then runs your script (a list of commands like s, d, p, n, q) in order. After those commands run, the pattern space is printed automatically unless printing is suppressed with -n or the line has been deleted with d. The cycle then repeats.

This mental model explains most of sed's behavior. If you understand how and when the pattern space changes, you can predict the output. It also explains why command order matters: each command sees the result of the previous command in the same pattern space.

There is also a second buffer, the hold space. Unlike the pattern space, the hold space persists across cycles. It exists specifically to solve the problem of multi-line logic in a tool designed for line-at-a-time processing. Commands like h, H, g, G, and x let you move data between the pattern space and hold space. You will use this later for advanced projects, but it is still part of the execution model from the beginning.
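A minimal sketch of hold-space persistence, assuming a three-line sample input made up for illustration. The x command swaps the two buffers, so each line comes out one cycle late, and the first line printed is the initially empty hold space.

```shell
# Hypothetical input: three lines a, b, c.
# Cycle 1: pattern=a, hold=""  -> x -> pattern="", hold=a -> prints an empty line
# Cycle 2: pattern=b, hold=a   -> x -> pattern=a,  hold=b -> prints a
# Cycle 3: pattern=c, hold=b   -> x -> pattern=b,  hold=c -> prints b
delayed=$(printf 'a\nb\nc\n' | sed 'x')
printf '%s\n' "$delayed"
```

The last line (c) is never printed because it is still sitting in the hold space when input ends.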

In short: sed is a small virtual machine with a loop and two buffers. If you understand the loop and the buffers, you understand sed.

Deep Dive into the Concept

The stream-editing model is not just an implementation detail; it is a programming style. When you write a sed script, you are writing instructions for a deterministic pipeline. That pipeline has invariants:

Invariant 1: The input is processed in order. sed reads input line by line, from the first line to the last. A cycle cannot reach back to an earlier line unless you stored that line in the hold space.
Invariant 2: Commands run in script order. If you run s/old/new/ and then s/new/old/, you will likely undo your own work. Order is a feature, not an accident.
Invariant 3: The pattern space is the working buffer. Editing commands act on it, and its contents are replaced by the next line at the start of each cycle. Unless you explicitly copy something into the hold space, all state is transient.
Invariant 4: Output is a side effect of the end of a cycle. sed prints automatically unless you explicitly suppress printing or delete the pattern space.

The last invariant is why -n is so powerful. With -n, you disable automatic printing and take full control with p. This is the safest way to develop sed scripts because it prevents accidental output when a command fails to match. It also makes complex scripts easier to reason about: output becomes an explicit decision rather than a default side effect.
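The difference between automatic and explicit printing can be sketched as follows. The two-line input and the variable names are assumptions made up for illustration.

```shell
# Hypothetical two-line input used for all three runs.
input=$(printf 'alpha\nbeta\n')

# Default: every line is auto-printed at the end of its cycle.
auto=$(printf '%s\n' "$input" | sed '')

# p without -n: the matching line appears twice (explicit p plus auto-print).
doubled=$(printf '%s\n' "$input" | sed '/beta/p')

# -n with p: only the explicitly printed line appears.
explicit=$(printf '%s\n' "$input" | sed -n '/beta/p')

printf 'auto:\n%s\ndoubled:\n%s\nexplicit:\n%s\n' "$auto" "$doubled" "$explicit"
```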

The execution cycle also interacts with commands that explicitly advance input. The n command prints the pattern space (unless -n is in effect), replaces it with the next input line, and continues with the remaining commands; it does not start a new cycle. If there is no next line, sed exits without running the remaining commands. The N command appends the next line to the current pattern space with a newline, creating a multi-line pattern space. The moment you use N, your script changes from line-at-a-time processing to multi-line processing. This is a powerful technique but also a common source of bugs. When the pattern space contains embedded newlines, regex anchors like ^ and $ still match only the beginning and end of the entire pattern space, not each line. This matters when you attempt to apply line-based logic to multi-line buffers.
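A small sketch contrasting n and N, assuming a four-line numbered input invented for the example. An even number of lines is used on purpose, because implementations differ on what N does when no next line exists.

```shell
# n: auto-prints the current line, loads the next one, and continues the
#    script, so the d that follows deletes every second line.
odd_only=$(printf '1\n2\n3\n4\n' | sed 'n;d')

# N: appends the next line to the pattern space (joined by a newline);
#    the substitution then turns that embedded newline into a comma.
paired=$(printf '1\n2\n3\n4\n' | sed 'N;s/\n/,/')

printf '%s\n---\n%s\n' "$odd_only" "$paired"
```

The first command prints lines 1 and 3; the second prints 1,2 and 3,4.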

Another practical constraint: POSIX requires the pattern space and hold space to hold at least 8192 bytes. Many implementations allow larger buffers, but multi-line accumulation can still hit limits on constrained systems. Design multi-line scripts that keep buffers small and stream whenever possible.

Another subtle point: sed is defined by POSIX and implemented by GNU, BSD, BusyBox, and other variants. The core cycle is the same, but the limits and extensions differ. For example, POSIX specifies the pattern space, the hold space, and the read-edit-print cycle, while GNU sed adds a --debug option and extra flags. Understanding the model (and the limits) allows you to write scripts that are portable and correct across these variants.

Failure modes are often execution-cycle failures:

Double printing: forgetting -n and using p results in duplicate output.
Unexpected deletion: using d mid-script discards the pattern space and immediately starts the next cycle, skipping commands that follow.
Silent no-op: a command that is not addressed to the correct lines does nothing, so your output remains unchanged.
Misordered commands: applying transformations in the wrong order makes later patterns fail or match unintended text.
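The "unexpected deletion" failure mode above can be reproduced with a short sketch. The input lines are hypothetical.

```shell
# The d on "drop me" ends that cycle at once, so the s command after it
# never runs for that line and nothing is printed for it.
result=$(printf 'keep 1\ndrop me\nkeep 2\n' | sed -e '/drop/d' -e 's/keep/kept/')
printf '%s\n' "$result"
```

Only the two surviving lines reach the substitution, so the output is kept 1 and kept 2.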

A good debugging practice is to start with -n, then add p for the lines you want to inspect. When you are confident, remove -n or keep it and explicitly print only what you want. This approach aligns with the execution model and reduces mistakes.

How This Fits on Projects

The execution model powers every project in this guide. Projects 1 and 2 rely on understanding that each line is processed independently unless you explicitly combine lines. Projects 3 and 4 require careful ordering of multiple commands in a single script. Project 5 depends on mastering the cycle and using the hold space to accumulate results across cycles.

Definitions and Key Terms

Stream editor: A tool that processes input sequentially and produces output as it goes, without loading the entire file.
Pattern space: The current line or multi-line buffer being edited.
Hold space: A persistent secondary buffer for cross-line state.
Cycle: The read -> edit -> print -> reset loop.
Automatic printing: Default behavior that prints pattern space at the end of each cycle.
Script: The ordered list of sed commands.

Mental Model Diagram

Cycle N
--------
1) Read input line into pattern space
2) Run commands in order
3) (Optional) Print pattern space
4) Clear pattern space
5) Next line -> next cycle

Hold Space
----------
Persists across cycles; used only when you copy or exchange data

How It Works (Step-by-Step)

sed reads a line from stdin or a file into the pattern space.
It runs each command in your script from top to bottom.
If a command deletes the pattern space (d), it immediately starts the next cycle.
If printing is not suppressed, the pattern space is printed at the end.
The pattern space is cleared, and the next line is read.
The hold space is unchanged unless you explicitly modify it.

Minimal Concrete Example

# Print only the first three lines (explicit output control)
$ sed -n '1,3p' file.txt

Common Misconceptions

Misconception: sed loads the entire file into memory.
- Correction: sed is a stream editor; it works line by line by default.
Misconception: -n is optional and only for performance.
- Correction: -n is for correctness and control of output.
Misconception: The hold space is always used automatically.
- Correction: The hold space is only used when you explicitly move data.

Check-Your-Understanding Questions

What is the difference between the pattern space and the hold space?
Why does sed -n prevent accidental duplicate output?
What happens to the pattern space when the d command executes?
When you use N, how does the pattern space change?

Check-Your-Understanding Answers

The pattern space is the current working buffer for the line being processed; the hold space is a persistent buffer across cycles.
-n disables automatic printing so only explicit p commands output text.
The current pattern space is deleted and sed immediately starts the next cycle.
N appends the next input line to the current pattern space with a newline, creating a multi-line buffer.

Real-World Applications

Filtering and transforming logs in real time
Applying safe configuration edits in deployment scripts
Converting structured text formats without loading large files
Building data cleanup stages in shell pipelines

Where You Will Apply It

Project 1: Config File Updater
Project 2: Log File Cleaner
Project 3: Markdown to HTML Converter
Project 4: Multi-Line Address Parser
Project 5: Reversing a File

References

GNU sed Manual: overview, command behavior, and GNU extensions
- https://www.gnu.org/software/sed/manual/sed.html
POSIX sed specification (execution cycle, pattern and hold space, minimum buffer size)
- https://man7.org/linux/man-pages/man1/sed.1p.html
OpenBSD sed manual (portable behavior reference)
- https://man.openbsd.org/OpenBSD-current/sed.1

Key Insight

If you can predict the state of the pattern space at every point in the script, you can predict the output.

Summary

The execution cycle is the backbone of sed. Learn it once and every command becomes an understandable transformation rather than a magic incantation.

Homework/Exercises to Practice the Concept

Write a sed command that prints only odd-numbered lines from a file.
Write a sed command that prints lines 5 through 10 and nothing else.
Build a one-liner that deletes any blank line and prints the rest.

Solutions to the Homework/Exercises

sed -n '1~2p' file.txt (GNU extension) or sed -n 'p;n' file.txt (POSIX: print the line, then consume the next one).
sed -n '5,10p' file.txt
sed '/^$/d' file.txt

Chapter 2: Addressing and Line Selection

Fundamentals

Addresses are how you tell sed where a command should apply. Without an address, a command applies to every line. With an address, it applies only to lines that match a specific line number, range, or regular expression. This is the core of sed precision: the same substitution command can be either a blunt hammer or a surgical scalpel depending on the address you attach to it.

The three most common address types are:

Line numbers: 3s/foo/bar/ applies only to line 3.
Ranges: 5,10d deletes lines 5 through 10.
Regex patterns: /ERROR/s/foo/bar/ applies only to lines matching ERROR.

A context address wraps the regex in delimiters, normally /regex/. To use a different delimiter, start the address with a backslash followed by the new delimiter, as in \|regex| (any character except backslash or newline works). If your pattern contains many slashes, switching delimiters keeps the address readable.

Addresses can be combined with the ! negation operator, which goes after the address and before the command: /ERROR/!d deletes every line that does not match ERROR. Inverting the logic this way is essential for filters.
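A minimal sketch of a negated address used as a filter, assuming three made-up log lines:

```shell
# /ERROR/!d deletes every line that does NOT match ERROR,
# so only the error line survives to be auto-printed.
errors=$(printf 'INFO start\nERROR disk full\nINFO done\n' | sed '/ERROR/!d')
printf '%s\n' "$errors"
```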

The range address is especially powerful. /START/,/END/ selects a block from the first line matching START through the next line matching END, inclusive. This turns sed into a block processor that can isolate sections of a file without loading the whole file into memory.

Deep Dive into the Concept

Addresses are evaluated for each cycle and determine whether a command runs. This seems simple until you combine multiple addresses and commands. Here are key details and edge cases:

Range addresses have state. When you specify /START/,/END/, the range is inactive until a line matches START. After that, all lines are selected until a line matches END. Then the range deactivates and waits for the next START. This statefulness is why ranges can be used to process repeated blocks in a file.
Ranges are inclusive. Both the START and END lines are included. If you want to exclude the START or END line from output, you need an extra command to delete or print conditionally.
Regex addresses are evaluated against the current pattern space. If you have already modified the line with s, the address for subsequent commands sees the modified text, not the original line. This matters when you chain commands. If you want to address on the original line, put the address-based command before any substitution that changes the line.
The $ address selects the last line. This is commonly used for end-of-file logic or for printing a summary once.
Negation applies to ranges too. For example, 1,10!d means “delete every line that is not in the range 1 through 10.” This is a standard idiom for extracting ranges.
Order matters across commands. If you use one command to delete lines and another to substitute, the deleted lines are gone for subsequent commands in the same cycle. If you need to transform then delete, you must reorder accordingly.
GNU extensions exist. GNU sed provides additional addressing conveniences (like 0,/pattern/ for “up to first match”), but these are not portable. If you care about portability, stick to POSIX address forms.
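Two of the points above, range statefulness and addresses seeing the modified pattern space, can be checked with a short sketch. The marker lines and sample text are assumptions for illustration.

```shell
# Ranges re-arm: the same /START/,/END/ address selects both blocks.
blocks=$(printf 'x\nSTART\na\nEND\ny\nSTART\nb\nEND\n' | sed -n '/^START$/,/^END$/p')
printf '%s\n' "$blocks"

# Addresses see the CURRENT pattern space: by the time /foo/ is tested,
# the first command has already changed foo to bar, so the d never fires.
survived=$(printf 'foo\n' | sed -e 's/foo/bar/' -e '/foo/d')
printf '%s\n' "$survived"
```

The first command prints both START...END blocks and drops x and y; the second prints bar.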

Failures in sed scripts often come from addressing mistakes:

Off-by-one ranges: Forgetting that ranges are inclusive leads to deleting or printing one extra line.
Loose regex addresses: An unanchored regex like /END/ might match unintended lines (for example, a line containing PENDING) and prematurely close a range.
Modified text confusion: An address evaluated after substitutions might not match as expected.

A disciplined approach is to build your address first with -n and p. For example: sed -n '/START/,/END/p' file to confirm the selection, and then add your transformation command.
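The select-first, transform-second workflow might look like the sketch below. The config fragment and section names are hypothetical.

```shell
# Hypothetical config fragment with the same key in two sections.
config='[server]
port=80
[client]
port=80'

# Step 1: confirm the selection only. Nothing is changed yet.
selected=$(printf '%s\n' "$config" | sed -n '/^\[server]$/,/^\[client]$/p')
printf '%s\n' "$selected"

# Step 2: reuse the same address and attach the substitution to it.
edited=$(printf '%s\n' "$config" | sed '/^\[server]$/,/^\[client]$/s/^port=80$/port=8080/')
printf '%s\n' "$edited"
```

Because the range ends on the [client] line, the second port=80 lies outside the selection and is left alone.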

How This Fits on Projects

Addressing is essential in Projects 1 and 2 where you target specific config lines or log levels. Project 3 depends on addresses to identify headings and inline formatting. Project 4 uses range addresses to process multi-line blocks separated by blank lines. Project 5 uses addresses for “all but first” and “last line” behavior.

Definitions and Key Terms

Address: A selector that decides whether a command runs on a line.
Range: A stateful address that selects a start-to-end block.
Negation (!): Inverts an address selection.
Last line ($): An address for end-of-file logic.

Mental Model Diagram

Line stream
-----------
[no match]
[START]  -> range activates
[in range]
[in range]
[END]    -> range ends
[no match]
[START]  -> range activates again
[END]    -> range ends

How It Works (Step-by-Step)

sed reads a line into the pattern space.
It evaluates the address for the current command.
If the address matches, the command executes.
For range addresses, sed maintains internal state to know whether it is currently in-range.
The next command runs, possibly with a different address.

Minimal Concrete Example

# Print only lines between START and END markers
$ sed -n '/^START/,/^END/p' file.txt

Common Misconceptions

Misconception: Addresses are evaluated before any command runs.
- Correction: Addresses are evaluated at the moment the command is reached, against the current pattern space.
Misconception: Ranges only occur once.
- Correction: Ranges can activate multiple times as the file is processed.
Misconception: /pattern/ always matches the original file line.
- Correction: It matches the current pattern space, which might already be modified.

Check-Your-Understanding Questions

What is the difference between 3d and /3/d?
How does /START/,/END/ behave if END appears before START?
What does 1,10!d do?
Why might /ERROR/ fail to match after a substitution?

Check-Your-Understanding Answers

3d deletes line number 3. /3/d deletes any line containing the character 3.
The range only activates when START appears, so an END line before the first START has no effect. If no END follows a START, the range runs to the end of the input.
It deletes every line not in the range 1 through 10 (so it keeps only lines 1-10).
Because the substitution may have removed or changed the substring ERROR in the pattern space.

Real-World Applications

Extracting specific sections of config files
Deleting blocks of text between markers
Filtering logs to only a severity range
Applying transformations to only lines matching a pattern

Where You Will Apply It

Project 1: Targeting a specific config line with a regex address
Project 2: Filtering ERROR lines and deleting everything else
Project 3: Targeting headings and emphasis markers
Project 4: Block selection with range addresses
Project 5: Line 1 and last-line logic

References

GNU sed Manual: address forms and command application
- https://www.gnu.org/software/sed/manual/sed.html
POSIX sed specification: address syntax, range behavior, and execution model
- https://man7.org/linux/man-pages/man1/sed.1p.html
OpenBSD sed manual: address forms and context address rules
- https://man.openbsd.org/OpenBSD-current/sed.1

Key Insight

Addresses are the “if” statement of sed. They are your control flow for where transformations happen.

Summary

Mastering addresses turns sed from a blunt text replacer into a targeted text surgeon.

Homework/Exercises to Practice the Concept

Print only the last line of a file.
Delete all lines between two markers, inclusive.
Print all lines except those containing the word DEBUG.

Solutions to the Homework/Exercises

sed -n '$p' file.txt
sed '/^START/,/^END/d' file.txt
sed '/DEBUG/d' file.txt

Chapter 3: Regular Expressions and Substitution

Fundamentals

The s (substitute) command is the heart of sed. Its syntax is simple: s/regex/replacement/flags. The power comes from the regular expression and the replacement string. sed uses Basic Regular Expressions (BRE) by default. In BRE, +, ?, |, (, ), {, and } are literal characters; grouping and intervals are written \( \) and \{ \}, and GNU sed additionally accepts \+, \?, and \| as extensions. With Extended Regular Expressions (ERE) (enabled by -E), those operators are special without escaping. Some implementations (GNU and OpenBSD) also accept -r as an alias for extended regex, but -E is the portable POSIX-friendly choice.

Substitution is not just find-and-replace. With capturing groups and backreferences, you can reformat text, reorder fields, and normalize syntax. For example, you can convert YYYY/MM/DD to DD-MM-YYYY or extract only the meaningful portion of a log line.

The replacement string has its own mini-language. & expands to the entire matched text. \1, \2, etc. expand to captured groups. To include a literal delimiter, a literal &, or a backslash, escape it with a backslash.
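A minimal sketch of the replacement mini-language in default (BRE) mode. The sample strings are made up for illustration.

```shell
# & stands for the whole match: wrap every run of digits in brackets.
wrapped=$(printf 'id 42 and 7\n' | sed 's/[0-9][0-9]*/[&]/g')

# \1 and \2 stand for captured groups: swap key and value.
swapped=$(printf 'key=value\n' | sed 's/^\([^=]*\)=\(.*\)$/\2=\1/')

# A literal & in the replacement must be escaped, as must the delimiter
# when it appears in the regex.
literal=$(printf 'a/b\n' | sed 's/\//\&/')

printf '%s\n%s\n%s\n' "$wrapped" "$swapped" "$literal"
```

This prints id [42] and [7], then value=key, then a&b.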

Deep Dive into the Concept

Regex and substitution are the most common source of subtle sed bugs. The key is to distinguish between matching and replacing:

Matching (left side): This is a regex. It can include anchors (^, $), character classes ([[:digit:]]), and grouping (\( ... \) in BRE or ( ... ) in ERE).
Replacement (right side): This is not a regex; it is a literal string with backreference expansions. A dot . in the replacement is just a dot. Parentheses are not groups in the replacement. Only & and \1, \2, etc. have special meaning.

A typical mistake is to forget that BRE requires escaping of grouping parentheses. Another is to forget that the default substitution replaces only the first match on each line. The g flag is required to replace all matches on a line. The p flag prints the pattern space when a substitution occurred (useful with -n). Case-insensitive matching is an extension rather than POSIX: GNU sed accepts the I flag (or i), and some BSD versions accept I.
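The effect of the g and p flags can be sketched on a tiny input (the sample text is hypothetical):

```shell
# No flag: only the first match on the line is replaced.
first_only=$(printf 'a-a-a\n' | sed 's/a/X/')

# g: every match on the line is replaced.
all=$(printf 'a-a-a\n' | sed 's/a/X/g')

# -n plus the p flag: print only lines where a substitution happened.
changed=$(printf 'a\nb\n' | sed -n 's/a/X/p')

printf '%s\n%s\n%s\n' "$first_only" "$all" "$changed"
```

This prints X-a-a, then X-X-X, then X (the unchanged line b is suppressed).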

Delimiters are another common source of errors. The slash / is conventional but not required. You can use s|a/b|c/d| to avoid escaping every slash when manipulating paths. Any single character other than a backslash or newline can serve as the delimiter; if that character also appears in the regex or replacement, it must be escaped there.
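A short sketch of alternate delimiters, assuming made-up paths. The last command also shows the backslash-prefixed form that a context address needs when it uses a delimiter other than /.

```shell
# Default delimiter: every slash in the path must be escaped.
path_default=$(printf '/usr/local/bin\n' | sed 's/\/usr\/local/\/opt/')

# Alternate delimiter: the same edit without any escaping.
path_pipe=$(printf '/usr/local/bin\n' | sed 's|/usr/local|/opt|')

# Context address with a custom delimiter: note the leading backslash.
usr_only=$(printf '/usr/bin\n/etc\n' | sed -n '\|^/usr|p')

printf '%s\n%s\n%s\n' "$path_default" "$path_pipe" "$usr_only"
```

Both substitutions yield /opt/bin; the address form prints only /usr/bin.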

Substitution interacts deeply with the execution cycle. If you do multiple substitutions in the same script, they operate on the already modified pattern space. This is both a feature and a trap. For example:

# Order matters
sed -e 's/foo/bar/' -e 's/bar/baz/'

This will convert foo to baz, even though you never matched baz in the original input. That is correct behavior, but if you did not expect it, your output will be surprising.

Regex in sed follows POSIX ERE/BRE rules, which means:

No non-greedy quantifiers
No lookahead or lookbehind
No named capture groups
Anchors and classes behave predictably and portably

These limits are features for portability. They force you to write explicit patterns. If you need advanced regex features, perl is often a better tool.

Failure modes in substitution include:

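Under-escaping: unescaped parentheses or braces in BRE are literal, so no group is captured; the pattern fails to match, or sed rejects \1 as an invalid reference.
Over-escaping: a backslash can change a character's meaning rather than make it literal. In ERE, \( is a literal parenthesis; in GNU BRE, \+ is a quantifier.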
Missing g: only the first occurrence is replaced.
Greedy patterns: .* can swallow too much and leave capture groups empty.
Unintended matches: patterns that are too loose (like .*) match lines you did not intend.
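The greedy-pattern failure above, and the usual fix of replacing .* with a negated character class, can be sketched like this. The attribute string is a made-up example.

```shell
# Hypothetical input with two quoted values.
line='name="a" id="b"'

# Greedy: ".*" runs from the first quote to the LAST quote on the line.
greedy=$(printf '%s\n' "$line" | sed 's/".*"/"X"/')

# Explicit: [^"]* cannot cross a quote, so only the first value matches.
explicit=$(printf '%s\n' "$line" | sed 's/"[^"]*"/"X"/')

printf '%s\n%s\n' "$greedy" "$explicit"
```

The greedy version prints name="X" (the id attribute is swallowed); the explicit version prints name="X" id="b".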

A disciplined approach is to start by matching only and printing with -n and p. Once the regex matches the correct part, add a substitution. Then introduce backreferences one at a time.

How This Fits on Projects

Projects 1 and 2 are direct substitutions with anchors and capturing groups. Project 3 requires multiple ordered substitutions to implement a small Markdown parser. Projects 4 and 5 depend on regex working across multi-line pattern spaces, where embedded newlines are matched with \n (and, in GNU sed, by . as well).

Definitions and Key Terms

BRE: Basic Regular Expressions (default in sed).
ERE: Extended Regular Expressions (enabled with -E).
Backreference: \1, \2, etc., referencing captured groups.
Delimiter: The character that separates the s command parts.
Flag: A modifier like g, p, or i.

Mental Model Diagram

Input line:  [2025-01-01] [ERROR] Disk full
Regex:       ^\[[^]]+\] \[([A-Z]+)\] (.*)$
Groups:           \1=ERROR         \2=Disk full
Replacement: \1: \2
Output:      ERROR: Disk full
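The diagram above can be run as-is. This sketch assumes the same sample log line and uses -E because the regex is written in ERE syntax (unescaped parentheses and +).

```shell
log_line='[2025-01-01] [ERROR] Disk full'

# \1 captures the level, \2 captures the message body.
reformatted=$(printf '%s\n' "$log_line" |
  sed -E 's/^\[[^]]+\] \[([A-Z]+)\] (.*)$/\1: \2/')

printf '%s\n' "$reformatted"
```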

How It Works (Step-by-Step)

sed evaluates the regex against the pattern space.
If the regex matches, capture groups are recorded.
The replacement string is constructed using literals, &, and backreferences.
The matched portion of the pattern space is replaced with the constructed string; text outside the match is left unchanged.
Flags decide whether to replace globally or print on substitution.

Minimal Concrete Example

# Convert a date from MM/DD/YYYY to YYYY-MM-DD
$ printf '12/31/2025\n' | sed -E 's#([0-9]{2})/([0-9]{2})/([0-9]{4})#\3-\1-\2#'
2025-12-31

Common Misconceptions

Misconception: () works for groups without -E.
- Correction: In BRE, you must escape parentheses: \( ... \).
Misconception: The replacement is also a regex.
- Correction: The replacement is a literal string with backreferences.
Misconception: s/foo/bar/ replaces all occurrences.
- Correction: It replaces only the first match on each line unless you add g.

Check-Your-Understanding Questions

What is the difference between BRE and ERE in sed?
Why does s/foo/bar/ not replace multiple occurrences on the same line?
What does & mean in the replacement string?
When should you change the delimiter in s///?

Check-Your-Understanding Answers

BRE treats (), {}, +, and ? as literal characters (groups and intervals are written \( \) and \{ \}), while ERE treats them as special without escaping.
The s command replaces only the first match by default; use the g flag for all matches.
& expands to the entire matched text.
When the pattern or replacement contains many / characters, using another delimiter reduces escaping.

Real-World Applications

Normalizing date formats in logs
Reformatting CSV fields
Redacting secrets or tokens in configuration files
Converting inline Markdown emphasis to HTML tags

Where You Will Apply It

Project 1: Config value substitution
Project 2: Capturing log message bodies
Project 3: Markdown conversion

References

GNU sed Manual: s command, replacement semantics, and regex options
- https://www.gnu.org/software/sed/manual/sed.html
OpenBSD sed manual (BRE by default, -E for ERE, -r alias)
- https://man.openbsd.org/OpenBSD-current/sed.1
POSIX sed specification (BRE/ERE definitions and portability)
- https://man7.org/linux/man-pages/man1/sed.1p.html

Key Insight

Substitution is not just replacement; it is structured text extraction and recomposition.

Summary

Regex and substitution are the core of sed power. Learn them precisely and your scripts will be short, reliable, and expressive.

Homework/Exercises to Practice the Concept

Replace all tabs with a single space in a file.
Convert Last, First into First Last.
Remove trailing whitespace from every line.

Solutions to the Homework/Exercises

sed 's/\t/ /g' file.txt (\t is a GNU extension; with other seds, put a literal tab character in the pattern)
sed -E 's/^([^,]+), ([^ ]+)$/\2 \1/' file.txt
sed 's/[[:space:]]*$//' file.txt

Chapter 4: Script Composition and Control Flow

Fundamentals

A sed script is just a sequence of commands. But as soon as you have more than one command, ordering and control flow matter. You can write scripts inline with -e or in a separate file with -f, and you can group commands with { ... } to apply them under a shared address. These features turn sed from a one-liner tool into a tiny programming language.
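A sketch of the three composition styles mentioned above: multiple -e expressions, a script file loaded with -f, and a { ... } block under one address. The input, the temporary file, and the variable names are assumptions for illustration.

```shell
# Hypothetical input.
input='# comment
debug = true
name = app'

# Inline: each -e adds one line to the script, including the block braces.
inline=$(printf '%s\n' "$input" |
  sed -e '/^#/d' -e '/^debug/{' -e 's/true/false/' -e 's/ = /=/' -e '}')

# Script file: the same commands, with comments and indentation.
script=$(mktemp)
cat > "$script" <<'EOF'
# Drop comment lines.
/^#/d
# Only on the debug line: flip the value and tighten the separator.
/^debug/{
    s/true/false/
    s/ = /=/
}
EOF
from_file=$(printf '%s\n' "$input" | sed -f "$script")
rm -f "$script"

printf '%s\n' "$from_file"
```

Both forms produce debug=false followed by the untouched name = app line.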

The basic control flow tools are:

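b (branch): jump to a label unconditionally (with no label, jump to the end of the script)
t (test): jump to a label only if a substitution has succeeded since the last input line was read or the last t was taken
q (quit): print the pattern space (unless -n), then stop processing and exit
d (delete): delete pattern space and start next cycle
n (next line): print the pattern space (unless -n), then read the next line into pattern space and continue the script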

You do not need these for simple substitutions, but you do for complex multi-step scripts, especially when you want to avoid repeated work or create custom output logic.

Deep Dive into the Concept

Script composition is where sed starts to feel like a language rather than a command. The rules are strict but consistent:

Commands run in order. Each command sees the modified pattern space from earlier commands. This means you can build pipelines within a single line transformation. It also means you can easily create conflicts if you reorder commands incorrectly.
Addressing applies to command blocks. You can attach an address to a { ... } block, which makes a group of commands conditional. This is how you create “if” statements without a real if.
Branching changes flow but not state. The b and t commands jump to labels in the script. They do not reset the pattern space. This allows you to implement loops and conditional execution, especially when combined with substitutions and the t command that checks whether a substitution succeeded.
d and q are cycle-breaking commands. d ends the current cycle immediately and starts the next one with a new line. q exits the entire program. These are powerful but dangerous if placed in the wrong spot. For example, a d in the middle of a script means later commands will never execute on that line.
Script files improve clarity and portability. As scripts grow, inline -e becomes unreadable. A .sed file allows comments, indentation, and easier maintenance. This is especially useful for multi-step transformations like Project 3 (Markdown conversion).

Control flow is also how you optimize scripts. The t command can skip later commands once a substitution has succeeded. Example: attempt a substitution; if it succeeds, branch to a label near the end of the script so the remaining checks are bypassed. This reduces redundant regex matching on large files.
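A minimal sketch of this short-circuit pattern, assuming a hypothetical log format in which lines may start with WARN: or ERR: (the prefixes, the replacement tags, and the label name done are illustrative only):

```shell
# Hypothetical input: three lines with different prefixes.
input='WARN: disk almost full
ERR: write failed
service started'

# Each "t done" skips the remaining commands once a substitution has matched.
tagged=$(printf '%s\n' "$input" | sed '
s/^WARN: /[warning] /
t done
s/^ERR: /[error] /
t done
s/^/[info] /
:done
')

printf '%s\n' "$tagged"
```

Without the two t commands, the final s/^/[info] / would also run on lines that were already tagged.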

Failure modes are common in control flow:

Hidden early exit: A q or d placed too early stops later commands.
Unreachable code: A branch that always jumps past a section makes those commands dead.
Unexpected fall-through: Forgetting to branch or quit when necessary can apply transformations multiple times.
Address-block confusion: A {} block with a range address might apply to more lines than intended.

The best way to debug is to add temporary p commands under specific addresses to observe intermediate states. GNU sed also provides --debug to trace execution, which is extremely useful when available.
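As a rough illustration of the temporary-p technique, the sketch below runs with -n and prints the pattern space after each step, so every output line is an intermediate state. The log line and the two substitutions are made up for the example.

```shell
# Hypothetical log line used only for illustration.
line='2024-01-01 ERROR disk full'

# -n silences automatic printing, so every line of output comes from an explicit p.
trace=$(printf '%s\n' "$line" | sed -n '
/ERROR/ {
  s/^[^ ]* //
  p
  s/ERROR/E:/
  p
}')

printf '%s\n' "$trace"
```

Once the script behaves as intended, remove the extra p commands (or drop -n) to return to normal output.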

How This Fits on Projects

Project 3 relies on multiple ordered substitutions and benefits from a .sed script file. Project 4 uses grouped commands to process blocks separated by blank lines. Project 5 uses conditional printing with $ and negation to control when output appears.

Definitions and Key Terms

Script file: A file passed with -f containing sed commands.
Command block: { ... } grouped commands under one address.
Label: A named target for branching (e.g., :loop).
Branch (b): Unconditional jump to a label.
Test (t): Conditional jump if last substitution succeeded.

Mental Model Diagram

Script flow
-----------
/start/ {
  s/foo/bar/
  t changed
  b end
}
:changed
  s/bar/BAZ/
:end
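
To see how this flow behaves, the sketch below saves the diagram as a script file and runs it on three sample lines. The file name flow.sed and the input lines are assumptions chosen for the demonstration. The third line never enters the /start/ block, so it falls through to the command after :changed, which is the "unexpected fall-through" case described earlier.

```shell
workdir=$(mktemp -d)

# The script from the diagram, one command per line so each label ends at a newline.
cat > "$workdir/flow.sed" <<'S'
/start/ {
  s/foo/bar/
  t changed
  b end
}
:changed
s/bar/BAZ/
:end
S

flow_out=$(printf '%s\n' 'start foo' 'start qux' 'other bar' | sed -f "$workdir/flow.sed")
printf '%s\n' "$flow_out"

rm -rf "$workdir"
```

If the fall-through is not wanted, one option is to add a plain b end just before :changed so that only the t branch can reach it.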

How It Works (Step-by-Step)

sed reads a line into the pattern space.
It runs the first command. If the command is inside a block, it only runs if the block address matches.
If a substitution succeeds, t label can jump to a label.
If d runs, the cycle ends immediately.
If q runs, the program exits.
If no early exit occurs, the cycle completes and output prints.
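
The cycle-breaking steps above can be sketched with a small made-up input (the comment line, the key=value lines, and the STOP marker are all hypothetical):

```shell
# d ends the cycle for comment lines, so the s command never sees them.
# q then stops the whole run at the first line containing STOP.
cut_out=$(printf '%s\n' '# comment' 'a=1' 'STOP' 'b=2' | sed '
/^#/d
s/=/ = /
/STOP/q
')

printf '%s\n' "$cut_out"
```

The comment line produces no output because d skips the end-of-cycle print, while q prints the current pattern space once more before exiting, so b=2 is never read.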

Minimal Concrete Example

# Use a script file to convert headings and bold text
$ cat > converter.sed <<'S'
/^[#][ ]/ { s/^# (.*)/<h1>\1<\/h1>/; }
/^[#][#][ ]/ { s/^## (.*)/<h2>\1<\/h2>/; }
S

$ sed -E -f converter.sed notes.md

Common Misconceptions

Misconception: sed scripts cannot branch.
- Correction: b and t provide branching and conditional control flow.
Misconception: d just deletes output but keeps running commands.
- Correction: d ends the cycle immediately.
Misconception: -e and -f are equivalent in readability.
- Correction: Script files are far more maintainable for multi-command logic.

Check-Your-Understanding Questions

What does t label do and when does it branch?
Why might d prevent later commands from running?
When should you move from -e to -f?
How can you conditionally apply multiple commands to a range?

Check-Your-Understanding Answers

t label branches to the label only if the last substitution succeeded.
d deletes the pattern space and immediately starts the next cycle.
When you have multiple commands and need clarity or reuse.
Use a range address with a { ... } block.

Real-World Applications

Multi-step log normalization pipelines
Safe refactoring of config files across many hosts
Format conversion scripts in CI/CD pipelines

Where You Will Apply It

Project 3: Multi-command Markdown conversion
Project 4: Block processing with command blocks
Project 5: Conditional printing and branching logic

References

GNU sed Manual: invocation (-e, -f), command summary, and GNU extensions
- https://www.gnu.org/software/sed/manual/sed.html
OpenBSD sed manual: option behavior and script parsing rules
- https://man.openbsd.org/OpenBSD-current/sed.1
POSIX sed specification: standard options and command language
- https://man7.org/linux/man-pages/man1/sed.1p.html

Key Insight

sed becomes truly powerful when you treat it as a small programming language with explicit control flow.

Summary

Script structure and control flow are what separate quick one-liners from production-ready transformations.

Homework/Exercises to Practice the Concept

Write a script file that replaces foo with bar, but only in lines between START and END.
Use t to perform a second substitution only if the first one matched.
Write a script that quits after printing the first matching line.

Solutions to the Homework/Exercises

sed -f script.sed file.txt with: /START/,/END/ { s/foo/bar/; }
sed -e 's/foo/bar/' -e 't second' -e 'b' -e ':second' -e 's/baz/qux/' file.txt
sed -n '/pattern/{p;q;}' file.txt

Chapter 5: Multi-Line Processing and the Hold Space

Fundamentals

By default, sed processes one line at a time. Multi-line logic requires you to deliberately expand the pattern space or use the hold space as persistent storage. The N command appends the next input line to the pattern space separated by a newline. Once you do this, your script is operating on multiple lines at once. The P and D commands are the line-oriented counterparts for printing or deleting only the first line of a multi-line pattern space.

The hold space is a second buffer that persists across cycles. Commands like h, H, g, G, and x let you copy or exchange data between the pattern space and hold space. This is how you implement operations that need memory, like reversing a file, joining lines, or processing blocks.

Deep Dive into the Concept

The hold space is often described as a “scratch pad,” but it is more than that. It is the only way to carry state across cycles. This makes it the secret weapon for advanced sed scripts.

Key mechanics:

h and H: h copies the pattern space into the hold space (overwriting). H appends the pattern space to the hold space (with a newline). This lets you accumulate data across multiple cycles.
g and G: g copies the hold space into the pattern space (overwriting). G appends the hold space to the pattern space (with a newline). This lets you inject stored data into the current working line.
x: Exchanges the pattern space and hold space. This is a quick swap and is often used in stateful transformations.

Multi-line pattern spaces introduce subtle regex behavior. Anchors ^ and $ still match the beginning and end of the entire pattern space, not each line. The dot . does match an embedded newline in sed, so .* can run across line boundaries and capture more than you intended; write \n explicitly where the line break must fall, or use a construct such as [^\n]* (supported by GNU sed) to keep a match within one line. This is why multi-line sed scripts are both powerful and tricky.
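
A small sketch of the anchor behavior, using two made-up input lines:

```shell
# After N the pattern space holds "first\nsecond".
# ^ and $ match only at the very start and very end of that buffer,
# while \n matches the embedded newline between the two lines.
joined=$(printf '%s\n' first second | sed 'N; s/^/[/; s/$/]/; s/\n/ | /')

printf '%s\n' "$joined"
```

Only one opening and one closing bracket are added, which shows that the anchors apply to the whole buffer and not to each line inside it.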

Common multi-line patterns:

Paragraph mode: Use /^$/ to detect blank lines, store a paragraph in the hold space, and then process it as a whole.
Two-line window: Use N to combine two lines, perform a substitution that uses both, then use P;D to step through the data without losing alignment.
File reversal: Use 1!G;h to build the reversed text in the hold space by placing each new line in front of what is already stored, then print it on the last line. This is a classic sed one-liner.
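
As one concrete instance of the two-line window, the sketch below drops consecutive duplicate lines, a well-known sed idiom. The input letters are placeholders.

```shell
# Sliding two-line window: drop a line when the next line is identical.
# $!N  append the next line unless we are already on the last one
# !P   print the first line of the window only when the two lines differ
# D    delete the first line and rerun the script on what is left
deduped=$(printf '%s\n' a a b b b c | sed '$!N; /^\(.*\)\n\1$/!P; D')

printf '%s\n' "$deduped"
```

Because D restarts the script without reading new input while text remains, the second line of each window becomes the first line of the next one, which keeps the comparison aligned.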

Failure modes:

Unexpected newlines: Using N without adjusting your regex leads to partial matches.
Hold space corruption: Overwriting the hold space when you meant to append leads to lost data.
Infinite loops: Poor use of D and N can create loops that never advance the input.

The safest way to develop multi-line scripts is to start with very small input files, add -n and explicit p, and trace the pattern/hold space at each step. GNU sed offers --debug to help with this on systems where available.
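
One way to trace the pattern space on a tiny fixture is the l command, which prints the buffer in an unambiguous form. A minimal sketch with two placeholder lines:

```shell
# l prints the pattern space unambiguously: the embedded newline shows up as \n
# and the end of the buffer is marked with $.
visible=$(printf '%s\n' one two | sed -n 'N; l')

printf '%s\n' "$visible"
```

To inspect the hold space the same way, swap it in first (x; l; x) so the buffers end up back where they started.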

How This Fits on Projects

Project 4 is explicitly about multi-line block processing and hold space usage. Project 5 relies on hold space accumulation and careful control of output on the last line. These are the capstone skills for sed mastery.

Definitions and Key Terms

N: Append next line to pattern space with a newline.
P: Print first line of pattern space only.
D: Delete first line of pattern space and restart cycle on remainder.
Hold space: Persistent buffer across cycles.
Swap (x): Exchange pattern space and hold space.

Mental Model Diagram

Pattern space:  line1\nline2
Hold space:     (saved lines)

Commands:
  H  -> hold += pattern
  G  -> pattern += hold
  x  -> swap pattern <-> hold

How It Works (Step-by-Step)

Start with a normal line in pattern space.
N appends the next line, creating a multi-line pattern space.
Run substitutions that can see both lines.
Use P to print the first line only, or D to drop the first line and continue.
Use h, H, g, G, or x to persist state across cycles.
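
The last step can be sketched as follows: the hold space remembers the previous line, so a match can be printed together with the line before it. The log lines and the ERROR marker are hypothetical.

```shell
# Keep the previous line in the hold space; when a line matches ERROR,
# print the stored line first, then the matching line.
context=$(printf '%s\n' 'step 1 ok' 'step 2 ok' 'ERROR step 3' 'step 4 ok' |
  sed -n -e '/ERROR/{x;p;x;p;}' -e 'h')

printf '%s\n' "$context"
```

The first x brings the saved line into the pattern space for printing, the second x restores the current line, and the trailing h updates the saved line for the next cycle.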

Minimal Concrete Example

# Join every pair of lines with a comma ($!N avoids running N on an unpaired last line)
$ sed '$!N; s/\n/, /' file.txt

Common Misconceptions

Misconception: N starts a new cycle.
- Correction: N appends the next line to the current pattern space.
Misconception: P prints the entire pattern space.
- Correction: P prints only the first line of a multi-line pattern space.
Misconception: The hold space resets every line.
- Correction: It persists across cycles until you change it.

Check-Your-Understanding Questions

What is the difference between H and h?
Why does N change how ^ and $ behave in regex?
How do P and D help you slide through multi-line pattern space?
Why is x useful in multi-line scripts?

Check-Your-Understanding Answers

h overwrites the hold space; H appends with a newline.
^ and $ still match the beginning/end of the entire pattern space, not each line.
P prints only the first line; D deletes the first line and restarts the cycle with the remainder.
x swaps the pattern and hold spaces, letting you alternate between stored and current data.

Real-World Applications

Collapsing multi-line stack traces into single lines
Extracting blocks of text between markers
Reformatting multi-line configuration stanzas
Reversing files in environments without tac

Where You Will Apply It

Project 4: Parsing blocks separated by blank lines
Project 5: Reversing a file line by line

References

GNU sed Manual: command summary (N, P, D, h, H, g, G, x)
- https://www.gnu.org/software/sed/manual/sed.html
POSIX sed specification: pattern/hold space behavior and minimum buffer size
- https://man7.org/linux/man-pages/man1/sed.1p.html
OpenBSD sed manual: detailed command semantics for multi-line operations
- https://man.openbsd.org/OpenBSD-current/sed.1

Key Insight

Multi-line sed is not magic; it is deliberate manipulation of two buffers.

Summary

Hold space and multi-line commands are how you extend sed beyond single-line edits into real text transformations.

Homework/Exercises to Practice the Concept

Join every two lines in a file with a space.
Reverse a file with a sed one-liner.
Write a script that collects a paragraph (blank-line separated) and prints it in uppercase (use tr for uppercase if needed).

Solutions to the Homework/Exercises

sed '$!N; s/\n/ /' file.txt
sed -n '1!G;h;$p' file.txt
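Use sed -e '/./{H;$!d;}' -e 'x;s/^\n//;s/\n/ /g' file.txt | tr a-z A-Z (H collects non-blank lines; at a blank line or at the last line, x retrieves the collected paragraph and leaves the hold space ready for the next one)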

Chapter 6: In-Place Editing and Portability

Fundamentals

sed is often used to edit files in place. This is convenient but dangerous if you do not understand how the -i flag works. GNU sed supports -i[SUFFIX], which edits files in place and optionally creates a backup. BSD/macOS sed expects an explicit extension argument (use -i '' for no backup or -i .bak for a backup file). On those systems, writing -i directly before the script makes sed take the script as the backup extension and then try to parse the file name as the script, which typically fails with a confusing error. This is a common portability trap.

Portability also affects regular expression syntax and option flags. GNU sed supports -E (extended regex) and -r (legacy extended regex), while BSD sed uses -E. POSIX long defined only a small option set (-n, -e, -f); -E was added in the 2024 edition of the standard, and -i remains an extension. Scripts that rely on -i, on GNU-only extensions, or on -E with an older sed may therefore not be portable.

The correct mindset is: develop in a safe, non-destructive mode, then add in-place editing only when you are confident in your script. Use backups for safety, especially in automation.

Deep Dive into the Concept

In-place editing sounds simple but hides tricky behavior. GNU sed implements -i by creating a temporary file, writing the transformed output there, and then renaming it to replace the original file. If a suffix is provided, the original file is kept as a backup with that suffix; if no suffix is provided, no backup is kept. This means:

File permissions, ownership, and hard links may be affected if you are not careful.
Errors during processing can result in incomplete files if your script fails mid-way.
Running sed -i without a backup can destroy data permanently if your regex is wrong.
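
The hard-link consequence can be observed directly. The sketch below assumes a sed that accepts the attached -i.bak form and replaces the file by renaming a temporary copy into place, as described above for GNU sed. The file names are hypothetical.

```shell
workdir=$(mktemp -d)
printf 'DEBUG=true\n' > "$workdir/app.conf"
ln "$workdir/app.conf" "$workdir/linked.conf"   # second name for the same inode

# In-place edit with a backup suffix attached to -i.
sed -i.bak 's/^DEBUG=true/DEBUG=false/' "$workdir/app.conf"

edited=$(cat "$workdir/app.conf")
via_link=$(cat "$workdir/linked.conf")
backup=$(cat "$workdir/app.conf.bak")
printf 'app.conf: %s\nlinked.conf: %s\napp.conf.bak: %s\n' "$edited" "$via_link" "$backup"

rm -rf "$workdir"
```

The hard link keeps the old content because app.conf is now a different file on disk. If other names must see the edit, write to a temporary file yourself and copy the result back over the original instead of using -i.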

BSD sed is stricter about the -i argument: you pass an explicit extension. -i '' means “no backup,” while -i .bak creates a backup with a .bak suffix. Some BSDs treat a zero-length extension as “no backup” explicitly, which makes destructive edits less likely but introduces a portability difference that scripts must handle.

Portability strategy:

Prefer dry runs: Use sed -n and explicit p to verify output first.
Use backup suffixes: -i.bak (GNU) or -i .bak (BSD) so you can restore.
Avoid GNU-only extensions in scripts meant for macOS/BSD: stick to POSIX basics where possible.
Document your target platform: If your script assumes GNU sed, say so explicitly.

Another aspect of portability is regex mode. GNU sed notes that -E was added to POSIX as a portable way to request extended regex, while -r is GNU-specific legacy. Prefer -E for cross-platform scripts. Avoid advanced regex features that are not in POSIX ERE.

Failure modes here are destructive:

Silent file truncation: If your script produces no output, the temporary file replaces the original with an empty file.
Wrong syntax on macOS: Using sed -i without an argument fails on BSD/macOS.
Broken automation: A script that works on Linux fails in CI on macOS because of -i semantics.

The safe workflow is to always test with a non-destructive pipeline, and only then run -i with backups.
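
A minimal sketch of that workflow, assuming a hypothetical app.conf and the attached -i.bak form of the backup option:

```shell
workdir=$(mktemp -d)
printf 'SERVER_PORT=8080\nDEBUG=true\n' > "$workdir/app.conf"

# 1. Dry run: write the result to a candidate file and review the difference.
sed 's/^DEBUG=true/DEBUG=false/' "$workdir/app.conf" > "$workdir/app.conf.new"
# diff exits non-zero when the files differ, so do not treat that as a failure.
changes=$(diff "$workdir/app.conf" "$workdir/app.conf.new" || true)
printf '%s\n' "$changes"

# 2. Apply for real only when the dry run changed something, keeping a backup.
if [ -n "$changes" ]; then
  sed -i.bak 's/^DEBUG=true/DEBUG=false/' "$workdir/app.conf"
fi
final=$(cat "$workdir/app.conf")
printf '%s\n' "$final"

rm -rf "$workdir"
```

In automation, the diff output can be logged or reviewed before the second step runs.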

How This Fits on Projects

Project 1 is your first in-place edit. You will learn to apply safe substitution in a config file. Later projects reinforce the practice of dry-run testing before applying edits.

Definitions and Key Terms

In-place editing (-i): Modify files directly instead of writing to stdout.
Backup suffix: The extension added to saved backups when using -i.
Portability: Running the same script across GNU and BSD sed variants.

Mental Model Diagram

Input file -> sed transforms -> temp file -> rename -> output file
     |                                       |
     +------------------- backup ------------+

How It Works (Step-by-Step)

sed reads the input file and applies the script.
Output is written to a temporary file (GNU sed behavior).
The original file is replaced by renaming the temporary file.
If -i was given a suffix, the original file is kept as a backup.

Minimal Concrete Example

# GNU sed (Linux)
$ sed -i.bak 's/^DEBUG=true/DEBUG=false/' app.conf

# BSD sed (macOS)
$ sed -i '' 's/^DEBUG=true/DEBUG=false/' app.conf

Common Misconceptions

Misconception: -i is standardized everywhere.
- Correction: -i is a common extension but not part of the POSIX option set.
Misconception: In-place editing is always safe.
- Correction: It can overwrite files; always use backups during development.
Misconception: -r is portable.
- Correction: Prefer -E for extended regex portability.

Check-Your-Understanding Questions

Why does sed -i behave differently on macOS and Linux?
What is the safest way to apply an in-place edit?
Which option set is portable across POSIX sed?
Why should you prefer -E over -r?

Check-Your-Understanding Answers

BSD sed requires an explicit extension argument, while GNU sed allows -i with or without a suffix.
Run a dry-run with -n and explicit printing, then use -i with a backup suffix.
All POSIX editions include -n, -e, and -f; -E was added in the 2024 edition. Other options, including -i, are extensions.
-E is the POSIX-recognized extended regex flag, while -r is GNU-specific.

Real-World Applications

Updating config files in automation scripts
Rewriting text in build pipelines
Applying safe patches to large sets of files

Where You Will Apply It

Project 1: Config file updates with backups
Project 3: Script-based transformations (portability considerations)

References

GNU sed Manual: -i, -E, portability notes, and GNU-only options
- https://www.gnu.org/software/sed/manual/sed.html
OpenBSD sed manual: -i extension behavior, -E/-r, and BSD semantics
- https://man.openbsd.org/OpenBSD-current/sed.1
POSIX sed specification: standard options
- https://man7.org/linux/man-pages/man1/sed.1p.html

Key Insight

In-place editing is power with risk; portable scripts are power with discipline.

Summary

Portable sed means careful option choices, explicit backups, and an understanding of GNU vs BSD behavior.

Homework/Exercises to Practice the Concept

Write a script that toggles DEBUG=true to DEBUG=false with a backup file.
Test the script on both GNU and BSD sed and note the differences.
Create a wrapper function that selects the correct -i syntax based on the OS.

Solutions to the Homework/Exercises

sed -i.bak 's/^DEBUG=true/DEBUG=false/' app.conf
On macOS: sed -i '' 's/^DEBUG=true/DEBUG=false/' app.conf
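Example shell snippet: sed_inplace() { if sed --version >/dev/null 2>&1; then sed -i "$@"; else sed -i '' "$@"; fi; } (a function is used because storing -i '' in a variable and expanding it unquoted would pass two literal quote characters as the backup suffix)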

Glossary

Address: A selector that determines which lines a command applies to.
Branch: A control flow jump (b, t) to a labeled location.
Cycle: The read -> apply commands -> print -> reset loop.
Delimiter: The separator character in s/regex/repl/.
Hold space: A persistent buffer across cycles for multi-line logic.
Pattern space: The current line or multi-line buffer being edited.
Range address: A stateful selector from a start line to an end line.
Script: An ordered list of sed commands.
Stream editor: A tool that transforms input sequentially without loading the full file.

Why sed Matters

The Modern Problem It Solves

Every production system emits text: logs, configs, manifests, reports, CI output, and build artifacts. You often need surgical edits that are fast, repeatable, and safe to run in pipelines. sed solves this by applying a deterministic edit script to a streaming input without loading the whole file into memory. That is exactly the shape of real-world operational work: transform large or continuous text streams in a single pass.

Real-World Impact (Recent Data)

sed matters because Unix-like environments dominate production infrastructure and POSIX tools are universally present there:

Unix on web servers: W3Techs reports that Unix is used by 90.7% of websites whose OS is known (January 2026).
Linux share within that universe: Linux accounts for 57.2% of websites with known OS (January 2026).

This means stream-editing tools like sed are available in the overwhelming majority of production environments, making them a dependable baseline skill for ops, data cleanup, and incident response.

Context & Evolution (Short)

sed became part of the POSIX toolset so scripts could rely on a consistent stream editor across Unix systems. The core model (pattern space + hold space + cycle) is standardized, while GNU/BSD variants add convenience flags and debugging features. The combination of standardized core + ubiquitous availability is why sed remains relevant even as higher-level tools exist.

Manual Editing                      Stream Editing
---------------                     ------------------------
Open file -> search -> edit         cat file | sed 's/old/new/'
Repeat per file                     Apply to 1 or 10,000 files
Risk of manual mistakes             Deterministic, scriptable

References

W3Techs OS usage statistics (January 2026)
- https://w3techs.com/technologies/details/os-unix
POSIX sed specification overview
- https://man7.org/linux/man-pages/man1/sed.1p.html
OpenBSD sed manual (portable behavior reference)
- https://man.openbsd.org/OpenBSD-current/sed.1

Concept Summary Table

Concept Cluster	What You Need to Internalize
Stream Editing Execution Model	The read -> edit -> print cycle, pattern space behavior, and how `-n`, `d`, and `p` control output.
Addressing and Line Selection	Line numbers, regex addresses, ranges, and negation with `!` as the core targeting logic.
Regular Expressions and Substitution	BRE vs ERE, capturing groups, backreferences, delimiters, and flags like `g` and `p`.
Script Composition and Control Flow	Multiple commands, `-e`/`-f`, `{}` blocks, branching with `b`/`t`, and early exit with `d`/`q`.
Multi-Line Processing and Hold Space	`N`, `P`, `D`, and hold space commands (`h`, `H`, `g`, `G`, `x`) for stateful edits.
In-Place Editing and Portability	Safe use of `-i`, GNU vs BSD differences, and POSIX-compatible option sets.

Project-to-Concept Map

Project	What It Builds	Primer Chapters It Uses
Project 1: Config File Updater	Safe in-place substitution in configs	1, 2, 3, 6
Project 2: Log File Cleaner	Regex capture and reformatting	1, 2, 3
Project 3: Markdown to HTML Converter	Multi-command script with ordering	1, 2, 3, 4
Project 4: Multi-Line Address Parser	Block processing with hold space	1, 2, 4, 5
Project 5: Reversing a File	Hold space mastery and control flow	1, 4, 5

Deep Dive Reading by Concept

Fundamentals and Unix Context

Concept	Book & Chapter	Why This Matters
Text pipelines	The Linux Command Line by William Shotts - Ch. 20	Solid foundation for text-processing tools.
Shell text processing	Effective Shell by Dave Kerr - Ch. 21	Practical pipelines and tool composition.
Stream editing philosophy	Shell Programming in Unix, Linux and OS X by Kochan/Wood - Ch. 9	Shows idiomatic stream editing patterns.

Regex and Substitution

Concept	Book & Chapter	Why This Matters
Regex fundamentals	Mastering Regular Expressions by Jeffrey Friedl - Ch. 3-5	Deep understanding of regex mechanics.
sed substitution	sed & awk by Dougherty/Robbins - Ch. 2-5	Classic coverage of the `s` command and syntax.

Scripting and Control Flow

Concept	Book & Chapter	Why This Matters
Script structure	Classic Shell Scripting by Robbins/Beebe - Ch. 3-5	Practical multi-command scripts.
Real-world recipes	Wicked Cool Shell Scripts by Taylor/Perry - Ch. 2	Applied, production-style patterns.

Multi-line and Hold Space

Concept	Book & Chapter	Why This Matters
Hold space patterns	sed & awk by Dougherty/Robbins - Ch. 6	Canonical advanced sed techniques.

Portability and Tooling

Concept	Book & Chapter	Why This Matters
Portability	Effective Shell by Dave Kerr - Ch. 22	Cross-platform scripting discipline.

Quick Start

Day 1 (4 hours):

Read Chapter 1 and Chapter 2 in the Theory Primer.
Run the minimal examples in each chapter on your machine.
Start Project 1 and complete the dry-run substitutions (no -i yet).
Verify output with diff or git diff.

Day 2 (4 hours):

Finish Project 1 with safe backups (-i.bak or -i .bak).
Read Chapter 3 and practice 5 regex substitutions.
Start Project 2 and build your regex slowly using -n and p.

End of Weekend: You can confidently explain the sed execution cycle and apply targeted substitutions in real files.

Recommended Learning Paths

Path 1: The Sysadmin/DevOps Track (Recommended)

Best for: People who manage configs and logs.

Project 1 -> Project 2 -> Project 4 -> Project 5
Use Chapter 6 early for in-place safety

Path 2: The Text Transformation Track

Best for: People working on data cleaning.

Project 2 -> Project 3 -> Project 4
Focus on Chapter 3 (regex) and Chapter 5 (multi-line)

Path 3: The CLI Tool Builder

Best for: People building reusable scripts and pipelines.

Project 1 -> Project 3 -> Project 4
Emphasize Chapter 4 (script structure)

Path 4: The Completionist

Best for: Deep mastery.

Project 1 -> 2 -> 3 -> 4 -> 5 in order
Repeat Project 5 with a different solution and document trade-offs

Success Metrics

You can explain the sed execution cycle without notes.
You can write a correct s/regex/repl/ substitution with backreferences.
You can target lines with both numeric and regex addresses.
You can write a multi-command script in a .sed file.
You can safely apply in-place edits with backups on your OS.
You can debug a failing script using -n and explicit p.

Optional Appendices

Appendix A: Sed Portability Cheat Sheet

Feature	GNU sed (Linux)	BSD/macOS sed	Portability Note
In-place edit	`-i[SUFFIX]` (suffix optional)	`-i ''` or `-i .bak` (explicit arg)	GNU accepts optional suffix; BSD expects an argument, even if empty. Use backups during development.
Extended regex	`-E` or `-r`	`-E` (often `-r` alias)	Prefer `-E` for portability.
POSIX mode	`--posix`	N/A	GNU-only strict mode.
Debug tracing	`--debug`	N/A	GNU-only.

Appendix B: Debugging Workflow

Start with -n and explicit p.
Add commands one at a time.
Use sed -n 'l' to display hidden characters (the l command is part of POSIX, so it works in both GNU and BSD sed).
For GNU sed, use --debug when available.
Always test on small fixtures before running -i.

Appendix C: Regex Quick Reference (sed)

^ start of line
$ end of line
. any single character
* zero or more of previous
[abc] one of a, b, c
[^abc] not a, b, c
[[:digit:]] any digit (POSIX class)
\( ... \) capture group in BRE
( ... ) capture group in ERE (-E)
\1 backreference
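
To make the BRE/ERE difference concrete, here is the same capture-and-swap written both ways. The sample name is made up.

```shell
name='Doe, Jane'

# BRE (default): groups are written \( ... \)
bre=$(printf '%s\n' "$name" | sed 's/^\([^,]*\), \(.*\)$/\2 \1/')

# ERE (-E): groups are written ( ... )
ere=$(printf '%s\n' "$name" | sed -E 's/^([^,]*), (.*)$/\2 \1/')

printf '%s\n%s\n' "$bre" "$ere"
```

Both commands produce the same result; only the grouping syntax changes, while the backreferences \1 and \2 are written identically.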

Appendix D: Common Commands Quick Reference

s substitute
d delete pattern space and start next cycle
p print pattern space
n print pattern space (unless -n), then replace it with the next line and continue the script
N append next line to pattern space
h copy pattern space to hold space
H append pattern space to hold space
g copy hold space to pattern space
G append hold space to pattern space
x exchange pattern and hold space
b branch
t test and branch on successful substitution
q quit
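
Because n and N are easy to confuse, a short side-by-side sketch may help. The input is just the numbers 1 to 4.

```shell
# n: replace the pattern space with the next line, then keep running the script.
# With -n, the p that follows therefore prints lines 2, 4, ...
even=$(printf '%s\n' 1 2 3 4 | sed -n 'n; p')

# N: append the next line instead, so both lines are edited together.
pairs=$(printf '%s\n' 1 2 3 4 | sed 'N; s/\n/+/')

printf '%s\n' "$even" "$pairs"
```

In the first command the pattern space never holds more than one line; in the second it holds two lines joined by a newline until the substitution replaces that newline.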

Appendix E: Cross-Platform In-Place Wrapper (Shell)

# Use: sed_inplace 's/old/new/' file1 [file2 ...]
# Creates backups with .bak on both GNU and BSD sed.
sed_inplace() {
  if sed --version >/dev/null 2>&1; then
    # GNU sed
    sed -i.bak "$@"
  else
    # BSD/macOS sed
    sed -i .bak "$@"
  fi
}

Project Overview Table

#	Project	Difficulty	Core Skills	Primary Deliverable
1	Config File Updater	Beginner	`s` command, addresses, safe `-i`	Automated config toggle script with backups
2	Log File Cleaner	Beginner	regex capture, formatting, selective printing	Log normalizer pipeline with validated output
3	Markdown to HTML Converter	Intermediate	multi-command scripts, ordering, ERE	`converter.sed` script that emits HTML
4	Multi-Line Address Parser	Advanced	hold space, multi-line pattern space	Block extractor for paragraph-level transformations
5	Reversing a File	Advanced	hold space accumulation, control flow	File reverser one-liner with explanation

Project List

These projects build sed skills from basic substitutions to advanced multi-line transformations.

Project 1: The Config File Updater - Safe in-place edits with precise addresses.
Project 2: The Log File Cleaner - Regex capture groups and reformatting.
Project 3: Basic Markdown to HTML Converter - Multi-command scripts and ordering.
Project 4: The Multi-Line Address Parser - Hold space and block processing.
Project 5: Reversing a File (Line by Line) - Full hold space mastery.

Project 1: The Config File Updater

File: LEARN_SED_COMMAND.md
Main Programming Language: sed (Bash/Shell)
Alternative Programming Languages: Python, Perl, Awk
Coolness Level: Level 2: Practical but Forgettable
Business Potential: 3. The “Service & Support” Model
Difficulty: Level 1: Beginner
Knowledge Area: Text Substitution / In-place Editing
Software or Tool: sed
Main Book: sed & awk, 2nd Edition by Dale Dougherty & Arnold Robbins

What you’ll build: A sed command that finds and replaces a specific setting in a configuration file. For example, changing DEBUG=true to DEBUG=false.

Why it teaches sed: This is the most common use case for sed. It teaches you the fundamentals of the s command, using regex to anchor your search (^DEBUG=), and how to edit files in-place safely with the -i option.

Core challenges you’ll face:

Constructing the s command → maps to understanding the s/find/replace/ syntax
Matching the whole line vs. just the value → maps to using ^ and $ to make your regex more specific
Handling different file permissions and backups → maps to understanding the -i (in-place) and -i.bak flags
Applying the change only to specific lines → maps to using an address pattern like /^DEBUG=/

Key Concepts:

Substitution: “sed & awk” Ch. 3 - Dougherty & Robbins
In-place Editing: man sed (look for the -i option)
Regular Expressions: “sed & awk” Ch. 2

Difficulty: Beginner
Time estimate: Weekend
Prerequisites: Basic command-line navigation.

Real World Outcome

You will have a config file, app.conf:

# Application Settings
SERVER_HOST=127.0.0.1
SERVER_PORT=8080
DEBUG=true

You will run a single sed command, and the file will be instantly modified:

$ sed -i.bak 's/^DEBUG=true/DEBUG=false/' app.conf

The file app.conf now contains DEBUG=false, and a backup of the original file, app.conf.bak, has been created.

Command Line Outcome Example:

$ cat app.conf
# Application Settings
SERVER_HOST=127.0.0.1
SERVER_PORT=8080
DEBUG=true

# Dry run (preview only)
$ sed 's/^DEBUG=true/DEBUG=false/' app.conf
# Application Settings
SERVER_HOST=127.0.0.1
SERVER_PORT=8080
DEBUG=false

# In-place edit with backup (GNU sed)
$ sed -i.bak 's/^DEBUG=true/DEBUG=false/' app.conf

$ ls -1 app.conf app.conf.bak
app.conf
app.conf.bak

$ cat app.conf
# Application Settings
SERVER_HOST=127.0.0.1
SERVER_PORT=8080
DEBUG=false

$ cat app.conf.bak
# Application Settings
SERVER_HOST=127.0.0.1
SERVER_PORT=8080
DEBUG=true

Implementation Hints:

Create a sample app.conf file to work with.
Start by just printing the output to the terminal (don’t use -i yet). sed 's/true/false/' app.conf
Notice that this might change other lines if they contain “true”. How can you make it more specific?
- Anchor the search to the beginning of the line. What character does that?
- Only apply the s command to lines that match a certain pattern. The syntax is /pattern/s/find/replace/.
Once your command correctly isolates and changes only the DEBUG=true line, you can add the -i flag to perform the edit on the file itself. It’s good practice to use -i.bak to create a backup, especially when learning.

Learning milestones:

You can replace a simple word in a file → You understand the basic s command.
You can replace a word on a specific line number → You understand numeric addressing.
You can replace a word only on lines that match a pattern → You understand regex addressing.
You confidently use sed -i to modify a file → You’ve unlocked sed for scripting and automation.

The Core Question You’re Answering

“How do I surgically modify a single line in a file without opening an editor?”

This is the fundamental question every sysadmin, DevOps engineer, and shell scripter eventually faces. Configuration files are everywhere - .env files, nginx configs, application settings, systemd units - and you need to change them programmatically. Opening vim or nano is fine for one file, but what about 50 servers? What about a CI/CD pipeline? What about a script that needs to toggle a setting based on the environment?

The answer is sed. Specifically, the s (substitute) command with precise addressing. This project teaches you to be a text surgeon: make exactly the change you need, on exactly the line you need, without touching anything else.

Concepts You Must Understand First

Before writing your first sed command, you need to internalize these foundational concepts:

1. The s Command Structure: s/pattern/replacement/flags

This is the anatomy of every substitution:

s: The substitute command (there are others like d for delete, but s is king)
/pattern/: A regular expression describing what to find
/replacement/: What to put in its place
/flags: Optional modifiers (like g for global; I or i for case-insensitive matching is a common extension, not part of POSIX)

The delimiter doesn’t have to be /. You can use any character: s#pattern#replacement# or s|pattern|replacement|. This is useful when your pattern contains slashes (like file paths).
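As a small illustration of the delimiter rule, the sketch below performs the same path rewrite twice: once with / as the delimiter (every slash escaped) and once with |. The LOG_DIR setting and both paths are made up for the example.

```shell
# Hypothetical config line; the variable name and paths are illustrative only.
line='LOG_DIR=/var/log/app'

# With / as the delimiter, each slash in the pattern and replacement needs a backslash.
with_slash=$(printf '%s\n' "$line" | sed 's/\/var\/log\/app/\/srv\/logs/')

# With | as the delimiter, the same edit needs no escaping at all.
with_pipe=$(printf '%s\n' "$line" | sed 's|/var/log/app|/srv/logs|')

printf '%s\n%s\n' "$with_slash" "$with_pipe"
```

Both commands print LOG_DIR=/srv/logs; the second is simply easier to read. Whatever delimiter you pick must not appear unescaped inside the pattern or the replacement.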

Reference: “sed & awk” by Dougherty & Robbins, Chapter 3: “The Basic s Command”

2. Regular Expression Anchors: ^ and $

^ matches the beginning of a line (not a character, but a position)
$ matches the end of a line (again, a position)

Why do these matter? Consider:

sed 's/true/false/' config.txt

This will change the FIRST occurrence of “true” on ANY line. If your file has:

DEBUG=true
ENABLE_LOGGING=true

Both lines get modified. But if you use:

sed 's/^DEBUG=true/DEBUG=false/' config.txt

Only the line that STARTS with DEBUG=true is changed.

Reference: “sed & awk” Chapter 2: “Regular Expressions”, specifically the section on “Positional Metacharacters”

3. The -i Flag for In-Place Editing

Without -i, sed writes to stdout (the terminal). The original file is untouched. This is actually a feature - it lets you preview changes before committing them.

With -i, sed modifies the file directly:

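-i (GNU sed): Edits in place, no backup
-i.bak (GNU and BSD sed): Edits in place, saves original as filename.bak. GNU sed requires the suffix to be attached to -i; only BSD sed also accepts it as a separate argument (-i '.bak')
-i '' (macOS/BSD sed): Edits in place, no backup (note the empty string is required)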

Warning: macOS and Linux have different sed implementations! On macOS:

sed -i '' 's/old/new/' file.txt  # macOS (BSD sed)
sed -i 's/old/new/' file.txt     # Linux (GNU sed)
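If a script has to run on both platforms, one way to sidestep the difference is to use only forms that both implementations accept. The sketch below shows two such forms; the scratch directory and file names are assumptions for the demo, not part of the project.

```shell
# Work in a scratch directory so nothing real is modified.
workdir=$(mktemp -d)
conf="$workdir/app.conf"
printf 'SERVER_PORT=8080\nDEBUG=true\n' > "$conf"

# Form 1: suffix attached to -i, accepted by both GNU and BSD sed.
sed -i.bak 's/^DEBUG=true/DEBUG=false/' "$conf"

# Form 2: no -i at all; write to a temporary file, then replace the original.
other="$workdir/other.conf"
printf 'SERVER_PORT=8080\nDEBUG=true\n' > "$other"
sed 's/^DEBUG=true/DEBUG=false/' "$other" > "$other.tmp" && mv "$other.tmp" "$other"

cat "$conf" "$other"
```

Form 2 is the most conservative choice because it relies only on plain sed and mv. The && ensures the original is replaced only if sed succeeded.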

Reference: man sed on your specific system, look for -i or --in-place

4. How Sed Addresses Work: Line Numbers vs. Patterns

Addresses tell sed which lines to operate on:

Address Type	Syntax	Example	Meaning
No address	`s/a/b/`	`sed 's/a/b/'`	Apply to ALL lines
Line number	`3s/a/b/`	`sed '3s/a/b/'`	Apply only to line 3
Line range	`3,5s/a/b/`	`sed '3,5s/a/b/'`	Apply to lines 3-5
Pattern	`/regex/s/a/b/`	`sed '/DEBUG/s/a/b/'`	Apply to lines matching regex
Last line	`$s/a/b/`	`sed '$s/a/b/'`	Apply only to the last line

The pattern address is your most powerful tool. /^DEBUG=/s/true/false/ means: “On lines that start with DEBUG=, substitute true with false.”
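To see the address types from the table side by side, here is a minimal sketch that runs the same substitution with four different addresses. The four-line input is arbitrary sample data.

```shell
# Four sample lines; the content is arbitrary.
input=$(printf 'a1\na2\na3\na4')

by_number=$(printf '%s\n' "$input" | sed '2s/a/b/')     # line 2 only
by_range=$(printf '%s\n' "$input" | sed '2,3s/a/b/')    # lines 2 through 3
by_last=$(printf '%s\n' "$input" | sed '$s/a/b/')       # last line only
by_pattern=$(printf '%s\n' "$input" | sed '/3/s/a/b/')  # lines containing "3"

# Print each result on one line for easy comparison.
for result in "$by_number" "$by_range" "$by_last" "$by_pattern"; do
    printf '%s\n' "$result" | tr '\n' ' '
    echo
done
```

Only the lines selected by each address change from a to b; every other line passes through untouched.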

Reference: “sed & awk” Chapter 3: “Addressing”

Questions to Guide Your Design

Before you write any code, sit with these questions. They’ll save you from common mistakes:

1. What if “true” appears elsewhere in the file?

Your file might contain:

ENABLE_SSL=true
DEBUG=true
DESCRIPTION="Setting this to true enables debugging"

A naive sed 's/true/false/' would change ALL of them, including corrupting the DESCRIPTION field. How do you prevent this?

2. How do you ensure only the DEBUG line is modified?

Think about what makes the DEBUG line unique:

Does it start with a specific prefix?
Is it on a specific line number? (Fragile - line numbers can change)
Does it have a unique pattern?

The most robust approach combines pattern addressing with anchored substitution:

sed '/^DEBUG=/s/true/false/'

3. What happens if the file doesn’t have the expected line?

If you run sed '/^DEBUG=/s/true/false/' app.conf and there’s no DEBUG= line, nothing happens. sed exits with code 0 (success). This is a silent failure: the command succeeds but changes nothing.

How would you detect this? Consider:

grep -q '^DEBUG=' app.conf && sed -i '/^DEBUG=/s/true/false/' app.conf

Or check if sed actually made a change.
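One possible way to check whether sed actually changed anything is to compare the edited file with its backup. This is a sketch under the assumption that you edit with -i.bak; the sample file deliberately has no DEBUG= line.

```shell
workdir=$(mktemp -d)
conf="$workdir/app.conf"
printf 'SERVER_PORT=8080\nTIMEOUT=30\n' > "$conf"   # note: no DEBUG= line

sed -i.bak '/^DEBUG=/s/true/false/' "$conf"

# cmp -s prints nothing and exits 0 when the two files are byte-identical.
if cmp -s "$conf" "$conf.bak"; then
    status="unchanged"
else
    status="changed"
fi
echo "app.conf is $status"
```

Here the script reports that the file is unchanged, which turns the silent failure into something a calling script can act on.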

4. How do you safely test before modifying?

Golden rule: Never use -i on your first attempt.

Your workflow should be:

Run without -i to see output: sed 's/old/new/' file.txt
Verify the output is what you expect
Then add -i.bak for safety: sed -i.bak 's/old/new/' file.txt
Verify the change, then remove the backup if satisfied
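The "verify the output" step is easier when you look only at the differences. A minimal sketch, assuming diff is available and using a scratch copy of the sample config:

```shell
workdir=$(mktemp -d)
conf="$workdir/app.conf"
printf '# Application Config\nSERVER=localhost\nPORT=8080\nDEBUG=true\nTIMEOUT=30\n' > "$conf"

# Dry run: pipe sed's output into diff instead of editing the file.
# diff exits 1 when the inputs differ, so "|| true" keeps "set -e" scripts alive.
preview=$(sed '/^DEBUG=/s/true/false/' "$conf" | diff "$conf" - || true)
printf '%s\n' "$preview"
```

Lines starting with < come from the original file and lines starting with > from sed's output. An empty preview means the command would change nothing.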

Thinking Exercise

Before touching the keyboard, do this paper exercise. It will cement your understanding of how sed processes files.

Exercise: Trace Through a 5-Line Config File

Given this file (app.conf):

# Application Config
SERVER=localhost
PORT=8080
DEBUG=true
TIMEOUT=30

Trace what happens when you run: sed '/^DEBUG=/s/true/false/' app.conf

For each line, answer:

What is in the pattern space?
Does the address /^DEBUG=/ match?
If yes, what substitution occurs?
What is printed to output?

Line #	Pattern Space	Address Match?	Substitution	Output
1	`# Application Config`	No (doesn’t start with DEBUG=)	None	`# Application Config`
2	`SERVER=localhost`	No	None	`SERVER=localhost`
3	`PORT=8080`	No	None	`PORT=8080`
4	`DEBUG=true`	Yes	`true` -> `false`	`DEBUG=false`
5	`TIMEOUT=30`	No	None	`TIMEOUT=30`

Follow-up questions:

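What if line 4 was DEBUG=TRUE (uppercase)? Would it match? (The address /^DEBUG=/ still matches, but s/true/false/ finds nothing to replace unless you make the match case-insensitive, for example with bracket expressions like [Tt][Rr][Uu][Ee])
What if line 4 had leading spaces before DEBUG=true? Would ^DEBUG= match? (No, because ^ means “start of line”)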
How would you handle both cases?
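One possible answer to the last question, as a sketch: allow optional leading whitespace after ^ and spell out both letter cases. The three input lines are hypothetical variants, including a DEBUG_LEVEL line that must stay untouched.

```shell
# Three hypothetical variants of the setting.
input=$(printf 'DEBUG=true\n  DEBUG=TRUE\nDEBUG_LEVEL=true')

# ^[[:space:]]* tolerates leading whitespace; the bracket pairs match either case.
# The captured prefix \1 is written back, so the original indentation survives.
result=$(printf '%s\n' "$input" |
    sed -E 's/^([[:space:]]*DEBUG=)[Tt][Rr][Uu][Ee]$/\1false/')

printf '%s\n' "$result"
```

The first two lines become DEBUG=false (the second keeps its two leading spaces), while DEBUG_LEVEL=true is left alone because the pattern requires = directly after DEBUG.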

The Interview Questions They’ll Ask

These are real questions from DevOps, SRE, and backend engineering interviews. If you can answer them confidently, you’ve mastered this project’s learning goals.

Question 1: “How would you change a config value in a file from the command line?”

Expected answer: “I’d use sed with the substitute command. For example, to change DEBUG=true to DEBUG=false in a config file, I’d run:

sed -i.bak 's/^DEBUG=true/DEBUG=false/' config.txt

The -i.bak creates a backup, ^ anchors to line start so I don’t accidentally modify other lines, and the s command does the substitution.”

Bonus points: Mention that on macOS you’d need sed -i '' 's/...' or use gsed (GNU sed from Homebrew).

Question 2: “What’s the difference between sed 's/a/b/' and sed 's/a/b/g'?”

Expected answer: “Without the g flag, sed only replaces the FIRST occurrence of the pattern on each line. With g (global), it replaces ALL occurrences on each line.

For example, with input banana:

sed 's/a/X/' produces bXnana (only first a changed)
sed 's/a/X/g' produces bXnXnX (every a changed)”

Question 3: “How do you make sed only modify lines matching a pattern?”

Expected answer: “You use an address before the command. The address can be a line number, range, or regex pattern. For example:

sed '5s/a/b/' - only modify line 5
sed '/ERROR/s/a/b/' - only modify lines containing ‘ERROR’
sed '/^#/d' - delete lines starting with # (comments)

The pattern goes between slashes before the command.”

Question 4 (Follow-up): “What if you want to modify lines that DON’T match a pattern?”

Expected answer: “You use the ! operator to negate the address. For example:

sed '/^#/!s/foo/bar/' - substitute on all lines EXCEPT comments
sed '/DEBUG/!d' - delete all lines that DON’T contain DEBUG (i.e., keep only DEBUG lines)”

Hints in Layers

If you’re stuck, use these hints progressively. Try to solve the problem with the minimum number of hints.

Hint 1: Start Without `-i`

Never edit in place on your first try. Run sed and let it print to the terminal:

sed 's/true/false/' app.conf

Look at the output. Is it what you expected? Only add -i after you’re confident.

Hint 2: Use `^` to Anchor to Line Start

The caret ^ matches the beginning of a line. This makes your pattern more specific:

sed 's/^DEBUG=true/DEBUG=false/' app.conf

Now only lines that START with DEBUG=true will be modified.

Hint 3: Use Pattern Addressing

Even better than anchoring in the substitution, use an address to select which lines the command applies to:

sed '/^DEBUG=/s/true/false/' app.conf

This says: “On lines matching ^DEBUG=, substitute true with false.”

Why is this better? It’s more flexible. What if the value could be true, True, or TRUE?

sed '/^DEBUG=/s/[Tt][Rr][Uu][Ee]/false/' app.conf

Hint 4: Use `-i.bak` for Safety

When you’re ready to modify the file, always create a backup:

sed -i.bak '/^DEBUG=/s/true/false/' app.conf

This creates app.conf.bak with the original content. If something goes wrong, you can restore it:

mv app.conf.bak app.conf

Hint 5 (macOS users): Handle BSD sed

On macOS, the -i flag requires an argument (even if empty):

sed -i '' '/^DEBUG=/s/true/false/' app.conf  # macOS

Or install GNU sed via Homebrew and use gsed:

brew install gnu-sed
gsed -i '/^DEBUG=/s/true/false/' app.conf

Books That Will Help

Topic	Book	Chapter/Section	Why It Helps
Basic `s` command	“sed & awk” by Dougherty & Robbins	Ch. 3: “Basic sed Commands”	The definitive explanation of substitution syntax
Regular expressions	“sed & awk” by Dougherty & Robbins	Ch. 2: “Understanding Basic Regular Expressions”	Master the patterns that power `sed`
In-place editing	`man sed` (system manual)	`-i` option section	Platform-specific behavior (GNU vs BSD)
Regex anchors	“Mastering Regular Expressions” by Friedl	Ch. 3: “Overview of Regular Expression Features”	Deep dive into `^`, `$`, and word boundaries
Shell scripting context	“Classic Shell Scripting” by Robbins & Beebe	Ch. 7: “Power Tools for Text Editing”	How `sed` fits into larger scripts
Quick reference	The Grymoire (online)	www.grymoire.com/Unix/Sed.html	Excellent examples and explanations

Reading order for this project:

Start with “sed & awk” Ch. 2 (regex basics) - 30 minutes
Read “sed & awk” Ch. 3 through the s command section - 45 minutes
Skim man sed for -i flag specifics on your system - 10 minutes
Keep The Grymoire bookmarked for quick reference

Common Pitfalls & Debugging

Problem 1: “Nothing changed after running the command”

Why: Your address or regex did not match the target line.
Fix: Test the address first with -n and p.
Quick test: sed -n '/^DEBUG=/p' app.conf

Problem 2: “My file became empty after I used -i”

Why: You used -n without an explicit p, so no output was written.
Fix: Remove -n or add an explicit p before using -i.
Quick test: sed -n 's/^DEBUG=true/DEBUG=false/p' app.conf

Problem 3: “macOS says sed: 1: … extra characters after command”

Why: BSD/macOS sed requires an argument for -i.
Fix: Use -i '' for no backup or -i .bak for a backup.
Quick test: sed -i '' 's/^DEBUG=true/DEBUG=false/' app.conf

Problem 4: “Other lines changed unexpectedly”

Why: The regex is too broad (e.g., s/true/false/ without an address).
Fix: Anchor the match or add an address: /^DEBUG=/s/true/false/.
Quick test: sed -n '/^DEBUG=/p' app.conf

Definition of Done

A dry run shows exactly one line change and nothing else.
The command only updates DEBUG=true and leaves other keys untouched.
A backup file is created when using in-place edits.
The script works on both GNU and BSD sed with documented syntax.
You can explain why the address and anchors are required.

Project 2: The Log File Cleaner

File: LEARN_SED_COMMAND.md
Main Programming Language: sed (Bash/Shell)
Alternative Programming Languages: Awk, Python
Coolness Level: Level 2: Practical but Forgettable
Business Potential: 1. The “Resume Gold”
Difficulty: Level 1: Beginner
Knowledge Area: Regex / Capturing Groups
Software or Tool: sed
Main Book: Mastering Regular Expressions, 3rd Edition by Jeffrey E.F. Friedl

What you’ll build: A sed script that processes a messy log file, removing unnecessary information (like log level and timestamps) and reformatting it into a cleaner, more readable format.

Why it teaches sed: This project forces you to learn about capturing groups. You’ll match parts of a line and then reference those parts in your replacement string, which is the key to reformatting text instead of just replacing it.

Core challenges you’ll face:

Matching a complex line structure → maps to writing a regular expression that describes the entire log line format
Capturing parts of the line → maps to using \( and \) (or ( and ) with -E) to create groups
Referencing captured groups → maps to using \1, \2, \3, etc. in the replacement part of the s command
Deleting non-matching lines → maps to using the d command with pattern addressing

Key Concepts:

Capturing Groups: “sed & awk” Ch. 3
Extended Regular Expressions: man sed (the -E or -r option)
Combining Commands: Using multiple -e expressions or semicolons.

Difficulty: Beginner
Time estimate: Weekend
Prerequisites: Project 1.

Real World Outcome

You’ll start with a log file app.log like this:

[2025-12-20 10:00:15] [INFO] User 'admin' logged in from 192.168.1.100.
[2025-12-20 10:01:02] [DEBUG] Caching mechanism triggered for key 'user:123'.
[2025-12-20 10:01:30] [ERROR] Failed to connect to database: Connection refused.

Your sed script will transform it into this:

ERROR: Failed to connect to database: Connection refused.

It extracts only the message from ERROR lines.

Command Line Outcome Example:

$ cat app.log
[2025-12-20 10:00:15] [INFO] User 'admin' logged in from 192.168.1.100.
[2025-12-20 10:01:02] [DEBUG] Caching mechanism triggered for key 'user:123'.
[2025-12-20 10:01:30] [ERROR] Failed to connect to database: Connection refused.

# Step 1: Verify filtering
$ sed -n '/ERROR/p' app.log
[2025-12-20 10:01:30] [ERROR] Failed to connect to database: Connection refused.

# Step 2: Extract and reformat
$ sed -E -e '/ERROR/!d' -e 's/^\[[^]]+\] \[ERROR\] (.*)/ERROR: \1/' app.log
ERROR: Failed to connect to database: Connection refused.

Implementation Hints:

Create your sample app.log.
Your goal is to match an entire ERROR line and extract only the part after the log level.
Think about the structure: [timestamp] [ERROR] message.
Write a regex to match this. With -E, it might look something like ^\[.*\] \[ERROR\] (.*)$.
- ^\[.*\]: Matches the timestamp part.
- \[ERROR\] (with a space on each side): Matches the log level part.
- (.*)$: This is the key! It’s a capturing group that matches the rest of the line (the message) to the end.
Now, construct your s command. You want to replace the entire line with just the part you captured. How do you reference the first captured group? (\1). s/^\[.*\] \[ERROR\] (.*)$/\1/
This command will only change the ERROR lines. What about the INFO and DEBUG lines? You want to delete them. You can use a separate d command. How can you apply a command only to lines that don’t match a pattern? (Hint: !). '/ERROR/!d'
You can combine these two commands using the -e flag: sed -E -e '/ERROR/!d' -e 's/.../.../' app.log.

Learning milestones:

You can write a regex that matches an entire structured line → You understand how to model text patterns.
You can extract a substring from a line using a capturing group → You’ve learned the key to reformatting.
You can re-order parts of a line → e.g., s/(part1)(part2)/\2 \1/.
You can chain multiple commands to perform a multi-step transformation → You are starting to think like a sed scripter.

The Core Question You’re Answering

“How can I extract and reformat specific parts of a structured line, not just replace text?”

This project moves you beyond simple find-and-replace. You’re learning to dissect a line, identify its components, and reassemble them in a new format. This is the difference between knowing sed and truly understanding it.

Concepts You Must Understand First

Before diving into the implementation, ensure you grasp these foundational concepts:

Concept	Description	Example
Capturing groups (BRE)	In Basic Regular Expressions, parentheses must be escaped: `\(` and `\)`	`sed 's/\(hello\)/\1 world/'`
Capturing groups (ERE)	In Extended Regular Expressions (`-E` flag), use unescaped parentheses	`sed -E 's/(hello)/\1 world/'`
Backreferences	Reference captured groups with `\1`, `\2`, `\3`, etc. in the replacement	`s/(a)(b)/\2\1/` swaps a and b
Extended regex flag	Use `-E` (POSIX) or `-r` (GNU) to enable ERE syntax	`sed -E 's/(pattern)/\1/'`
Pattern negation	The `!` modifier inverts the address match	`/ERROR/!d` deletes non-ERROR lines
The `d` command	Deletes the current line from the pattern space (no output)	`sed '/DEBUG/d'` removes DEBUG lines

Critical distinction: In BRE (default), you write \(group\) and reference with \1. In ERE (-E), you write (group) and still reference with \1. The backreferences always use the backslash.
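The distinction is easy to check at the prompt. This short sketch runs the same capture in both syntaxes and then shows what unescaped parentheses mean in BRE; the sample strings are arbitrary.

```shell
# Same capture, two syntaxes.
bre=$(printf '%s\n' 'hello' | sed 's/\(hello\)/\1 world/')
ere=$(printf '%s\n' 'hello' | sed -E 's/(hello)/\1 world/')

# In BRE, unescaped parentheses are ordinary characters, so this matches a literal "(x)".
literal=$(printf '%s\n' 'f(x)' | sed 's/(x)/[x]/')

printf '%s\n%s\n%s\n' "$bre" "$ere" "$literal"
```

The first two commands both print hello world, and the third prints f[x], because without -E the parentheses are matched literally instead of forming a group.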

Questions to Guide Your Design

Work through these questions before writing any code:

How do you match the timestamp format [YYYY-MM-DD HH:MM:SS]?
- What character class matches digits? ([0-9] or [[:digit:]])
- How do you match the literal brackets [ and ]?
- Do you need to escape them in your regex?
How do you capture only the message part?
- Where does the message start in the log line?
- Should you capture everything after [ERROR] or just the meaningful text?
- What about leading/trailing whitespace?
How do you delete lines that DON’T match ERROR?
- What does /pattern/!command mean?
- Should you delete first, then substitute, or vice versa?
- What happens to the pattern space when you use d?
What if a log message contains brackets?
- Example: [2025-12-20 10:01:30] [ERROR] Array index [5] out of bounds.
- How do you ensure your regex doesn’t get confused by extra brackets?
- Should you use greedy or non-greedy matching?
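To explore the bracket question, it helps to compare a greedy timestamp pattern with a bounded one. In the Array index example above both happen to give the same result, because "] [ERROR] " occurs only once in that line. The sketch below uses a made-up line where the message itself repeats that text, which is where the two patterns diverge.

```shell
# Hypothetical log line whose message repeats the "] [ERROR] " sequence.
line='[2025-12-20 10:01:30] [ERROR] upstream said: [db] [ERROR] timeout'

# Greedy: .* runs to the LAST place where "] [ERROR] " still matches.
greedy=$(printf '%s\n' "$line" | sed -E 's/^\[.*\] \[ERROR\] (.*)/ERROR: \1/')

# Bounded: [^]]+ cannot cross the first closing bracket.
bounded=$(printf '%s\n' "$line" | sed -E 's/^\[[^]]+\] \[ERROR\] (.*)/ERROR: \1/')

printf '%s\n%s\n' "$greedy" "$bounded"
```

The greedy version prints ERROR: timeout and loses most of the message. The bounded version prints ERROR: upstream said: [db] [ERROR] timeout, which is the full message.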

Thinking Exercise

Trace through what each capture group captures for this sample log line:

[2025-12-20 10:01:30] [ERROR] Failed to connect to database: Connection refused.

Given this sed command with -E:

sed -E 's/^\[([^]]+)\] \[([A-Z]+)\] (.*)$/\2: \3/'

Fill in the blanks:

Group	Pattern	Captured Value
`\1`	`([^]]+)`	______
`\2`	`([A-Z]+)`	______
`\3`	`(.*)`	______

Answers:

Group	Pattern	Captured Value
`\1`	`([^]]+)`	`2025-12-20 10:01:30`
`\2`	`([A-Z]+)`	`ERROR`
`\3`	`(.*)`	`Failed to connect to database: Connection refused.`

The output would be: ERROR: Failed to connect to database: Connection refused.
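Once the three groups are captured, the replacement can use them in any order. The sketch below runs the exercise's pattern on two sample lines and moves the timestamp to the end; the "(at ...)" output format is an invented variation, not part of the project's target output.

```shell
log='[2025-12-20 10:00:15] [INFO] User logged in.
[2025-12-20 10:01:30] [ERROR] Failed to connect to database: Connection refused.'

# \1 = timestamp, \2 = level, \3 = message; here they are emitted as level, message, timestamp.
reordered=$(printf '%s\n' "$log" |
    sed -E 's/^\[([^]]+)\] \[([A-Z]+)\] (.*)$/\2: \3 (at \1)/')

printf '%s\n' "$reordered"
```

Parentheses in the replacement side are plain text, so (at \1) prints literal parentheses around the captured timestamp.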

The Interview Questions They’ll Ask

Prepare for these common interview questions related to this project:

“How would you extract just IP addresses from a log file?”
- Expected answer: Use a regex pattern for IP addresses with capturing groups
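- Example: sed -nE 's/(^|.*[^0-9])([0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}).*/\2/p' (the [^0-9] guard stops the greedy .* from swallowing the leading digits of the address; -n with the p flag prints only lines that contain one, and if a line holds several addresses the last one is printed)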
- Bonus: Discuss validating IP address ranges (0-255)
“Explain what a capturing group is and how you’d use it in sed”
- Definition: A portion of a regex enclosed in parentheses that “remembers” the matched text
- Usage: Reference with \1, \2, etc. in the replacement string
- Real example: Reformatting dates from MM/DD/YYYY to YYYY-MM-DD
“How do you invert a match in sed?”
- Use the ! modifier after the address: /pattern/!command
- Example: /ERROR/!d means “on lines NOT matching ERROR, delete”
- Contrast with: /ERROR/d which deletes lines that DO match ERROR
“What’s the difference between BRE and ERE in sed?”
- BRE (Basic): Default mode, must escape metacharacters like (), {}, +, ?
- ERE (Extended): Enabled with -E, metacharacters work without escaping
- Trade-off: ERE is more readable but slightly less portable
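Two of the answers above can be sketched in a few lines: reordering a date with three capture groups, and pulling an IPv4-looking address out of a log message. The sample date and log text are illustrative, and the IP pattern only checks the shape of the address, not the 0-255 range.

```shell
# Date reordering: capture month, day, year, then emit them as year-month-day.
iso=$(printf '%s\n' '12/20/2025' |
    sed -E 's#^([0-9]{2})/([0-9]{2})/([0-9]{4})$#\3-\1-\2#')

# IP extraction: (^|.*[^0-9]) keeps the greedy .* from eating the first digits.
# -n plus the p flag prints only lines where the substitution succeeded.
ip=$(printf '%s\n' "User 'admin' logged in from 192.168.1.100." |
    sed -nE 's/(^|.*[^0-9])([0-9]{1,3}(\.[0-9]{1,3}){3}).*/\2/p')

printf '%s\n%s\n' "$iso" "$ip"
```

This prints 2025-12-20 and 192.168.1.100. Without the [^0-9] guard, a leading .* would consume as much as it can and leave only the final digit of the first octet in the capture.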

Hints in Layers

Progress through these hints only as needed:

Hint 1: Getting started

First, just try to match ERROR lines and print them:

sed -n '/ERROR/p' app.log

The -n suppresses automatic printing, and p prints only matching lines.

Hint 2: Deleting non-ERROR lines

Instead of printing ERROR lines, delete everything else:

sed '/ERROR/!d' app.log

This keeps only ERROR lines in the output.

Hint 3: Matching the log line structure

Break down the log line format:

[timestamp] - starts with [, ends with ], contains date/time
` ` - a space
[ERROR] - the log level
` ` - another space
message - everything else

Pattern with ERE: ^\[.*\] \[ERROR\] (.*)

Hint 4: Capturing the message

Use parentheses around the part you want to keep:

sed -E 's/^\[.*\] \[ERROR\] (.*)/\1/' app.log

This replaces the entire line with just the captured message.

Hint 5: Adding the ERROR prefix

Include literal text in your replacement:

sed -E 's/^\[.*\] \[ERROR\] (.*)/ERROR: \1/' app.log

Now you have the desired output format!

Hint 6: Combining both operations

Use multiple -e expressions to chain commands:

sed -E -e '/ERROR/!d' -e 's/^\[.*\] \[ERROR\] (.*)/ERROR: \1/' app.log

Order matters: first filter to ERROR lines, then reformat them.

Hint 7: Complete solution

#!/bin/bash
# log_cleaner.sh - Extract and reformat ERROR lines from log files

sed -E \
    -e '/ERROR/!d' \
    -e 's/^\[[^]]+\] \[ERROR\] (.*)/ERROR: \1/' \
    "$1"

Usage: ./log_cleaner.sh app.log

Note: [^]]+ is safer than .* for matching the timestamp, because it cannot run past the first closing bracket, whereas the greedy .* extends to the last place where the rest of the pattern still matches.
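One refinement worth considering: the /ERROR/ filter matches the word anywhere on the line, including inside an INFO message. The sketch below contrasts it with a single anchored substitution that prints only when the log level field itself is ERROR. The INFO line is a made-up example chosen to trigger the problem.

```shell
log='[2025-12-20 10:00:15] [INFO] No ERROR entries found during scan.
[2025-12-20 10:01:30] [ERROR] Failed to connect to database: Connection refused.'

# Loose filter: the INFO line survives because its message mentions ERROR.
loose=$(printf '%s\n' "$log" |
    sed -E -e '/ERROR/!d' -e 's/^\[[^]]+\] \[ERROR\] (.*)/ERROR: \1/')

# Strict variant: -n plus the p flag prints a line only if the anchored match succeeded.
strict=$(printf '%s\n' "$log" |
    sed -nE 's/^\[[^]]+\] \[ERROR\] (.*)/ERROR: \1/p')

printf 'loose:\n%s\nstrict:\n%s\n' "$loose" "$strict"
```

The loose version prints two lines (the untouched INFO line and the reformatted ERROR line), while the strict version prints only the reformatted ERROR line.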

Books That Will Help

Topic	Book	Chapter/Section
Capturing groups deep dive	“Mastering Regular Expressions” by Jeffrey Friedl	Ch. 3: Overview of Regular Expression Features
sed substitution mechanics	“sed & awk” by Dale Dougherty & Arnold Robbins	Ch. 3: Understanding Regular Expression Syntax
BRE vs ERE differences	“sed & awk” by Dale Dougherty & Arnold Robbins	Ch. 2: Understanding Basic Operations
Pattern negation	“Classic Shell Scripting” by Arnold Robbins	Ch. 3: Searching and Substitution
Real-world log processing	“The Linux Command Line” by William Shotts	Ch. 19: Regular Expressions
Online reference	The Grymoire - SED	Backreferences section

Common Pitfalls & Debugging

Problem 1: “No output at all”

Why: You deleted all lines with /ERROR/!d but the regex never matched ERROR.
Fix: Verify the log level string and case.
Quick test: sed -n '/ERROR/p' app.log

Problem 2: “The regex eats too much and output is blank”

Why: .* is greedy, so it can run past the first ] and leave your capture group with less than you expected, or nothing at all.
Fix: Use a safer class like [^]]+ for bracketed fields.
Quick test: sed -E 's/^\[[^]]+\] \[ERROR\] (.*)/ERROR: \1/' app.log

Problem 3: “I still see INFO and DEBUG lines”

Why: You forgot the /ERROR/!d filter or the commands are in the wrong order.
Fix: Filter first, then substitute.
Quick test: sed -E -e '/ERROR/!d' -e 's/^\[[^]]+\] \[ERROR\] (.*)/ERROR: \1/' app.log

Problem 4: “macOS says ‘invalid command code’“

Why: You used GNU-only flags or quoting mismatches.
Fix: Use -E for extended regex on macOS and single quotes around the script.
Quick test: sed -E 's/^\[[^]]+\] \[ERROR\] (.*)/ERROR: \1/' app.log

Definition of Done

Your script outputs only ERROR lines with the new format.
INFO and DEBUG lines are fully removed.
The regex uses explicit classes and avoids fragile .* where possible.
The script works with -E on both GNU and BSD sed.
You can explain each capture group and its replacement.

Project 3: Basic Markdown to HTML Converter

File: LEARN_SED_COMMAND.md
Main Programming Language: sed (Bash/Shell)
Alternative Programming Languages: N/A
Coolness Level: Level 3: Genuinely Clever
Business Potential: 1. The “Resume Gold”
Difficulty: Level 2: Intermediate
Knowledge Area: Scripting / Multiple Transformations
Software or Tool: sed
Main Book: Classic Shell Scripting by Arnold Robbins & Nelson H.F. Beebe

What you’ll build: A sed script that reads a simple Markdown file and converts its syntax (headings, bold, italics) into basic HTML tags.

Why it teaches sed: This project teaches you how to structure a sed script with multiple, ordered commands. You’ll learn that the order of substitutions matters and how to handle patterns that occur at the beginning, middle, or end of a line.

Core challenges you’ll face:

Handling multiple patterns in one script → maps to writing a .sed script file or using multiple -e flags
Order of operations → maps to realizing you should probably handle bold/italics before headings to avoid conflicts
Matching patterns at the beginning of a line → maps to using ^ for headings like ## Title
Handling greediness in regex → maps to understanding how .* can sometimes match more than you want

Key Concepts:

Sed Scripts: “sed & awk” Ch. 4
Command Order: Logical thinking about how transformations affect subsequent commands.
Regex Greediness: Regular-Expressions.info - Greed and Laziness

Difficulty: Intermediate
Time estimate: 1-2 weeks
Prerequisites: Project 2.

Real World Outcome

You’ll have a Markdown file notes.md with mixed syntax:

# My Document

This is a paragraph with some *italic* and **bold** text.

## A Subheading

More text with **bold and *nested italic*** here.

### A Sub-subheading

Final paragraph with multiple **bold** items and *italics*.

You will build a sed script file converter.sed step-by-step:

Step 1: Create the basic script file

Start with the heading and emphasis rules:

cat > converter.sed << 'EOF'
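# Convert headings (| is the delimiter because the patterns contain literal # characters;
# each pattern requires a space after the #s, so the three rules cannot overlap)
s|^### (.*)$|<h3>\1</h3>|
s|^## (.*)$|<h2>\1</h2>|
s|^# (.*)$|<h1>\1</h1>|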

# Convert emphasis (bold before italic to handle ** before *)
s#\*\*([^*]+)\*\*#<strong>\1</strong>#g
s#\*([^*]+)\*#<em>\1</em>#g
EOF

Step 2: Test individual transformations

First, test just the h2 conversion:

$ echo "## A Subheading" | sed -E 's|^## (.*)$|<h2>\1</h2>|'
<h2>A Subheading</h2>

Then test the h1:

$ echo "# My Document" | sed -E 's|^# (.*)$|<h1>\1</h1>|'
<h1>My Document</h1>

Test a line with bold (using the greedy problem first to see the bug):

$ echo "Text with **bold** and **more bold** here" | sed -E 's#\*\*(.*)\*\*#<strong>\1</strong>#g'
Text with <strong>bold** and **more bold</strong> here

This is WRONG: the .* matches from the first ** to the last **. sed has no lazy quantifier, so fix it with a negated character class:

$ echo "Text with **bold** and **more bold** here" | sed -E 's#\*\*([^*]+)\*\*#<strong>\1</strong>#g'
Text with <strong>bold</strong> and <strong>more bold</strong> here

Step 3: Test order dependency

Show why order matters. Process italic first (WRONG):

$ echo "**bold** and *italic*" | sed -E -e 's#\*([^*]+)\*#<em>\1</em>#g' -e 's#\*\*([^*]+)\*\*#<strong>\1</strong>#g'
*<em>bold</em><em> and </em>italic*

Now process bold first (CORRECT):

$ echo "**bold** and *italic*" | sed -E -e 's#\*\*([^*]+)\*\*#<strong>\1</strong>#g' -e 's#\*([^*]+)\*#<em>\1</em>#g'
<strong>bold</strong> and <em>italic</em>

Step 4: Run the complete script on the input file

Execute the full transformation:

$ sed -E -f converter.sed notes.md
<h1>My Document</h1>

This is a paragraph with some <em>italic</em> and <strong>bold</strong> text.

<h2>A Subheading</h2>

More text with *<em>bold and </em>nested italic*** here.

<h3>A Sub-subheading</h3>

Final paragraph with multiple <strong>bold</strong> items and <em>italics</em>.

Step 5: Compare before and after

Original Markdown	Converted HTML
`# My Document`	`<h1>My Document</h1>`
`## A Subheading`	`<h2>A Subheading</h2>`
`### Sub-subheading`	`<h3>Sub-subheading</h3>`
`**bold text**`	`<strong>bold text</strong>`
`*italic text*`	`<em>italic text</em>`
`**bold** and *italic*`	`<strong>bold</strong> and <em>italic</em>`
`**one** and **two**`	`<strong>one</strong> and <strong>two</strong>`

Key Insights from Building This:

Pattern ordering matters: Bold must be processed before italic. Headings can be in any order because their patterns don’t overlap.
Greedy matching is the enemy: .* between delimiters will consume everything up to the LAST delimiter on the line. Use [^char]+ instead to match “anything but this character.”
The g flag is essential: Without g, only the first match on each line is replaced.
Alternative delimiters save sanity: Using # or | instead of / means the slash in HTML’s </h1> needs no escaping. Pick a delimiter that does not occur in the pattern or replacement, or escape it there.
Extended regex (-E) cleans up syntax: Without -E, you’d need to escape every grouping parenthesis as \( and \). With -E, plain ( and ) create groups.

(Note: wrapping paragraphs in <p> tags and list items in <ul>/<li> tags needs a more advanced script that uses the hold space or multi-line commands such as N, and nested emphasis like **bold and *nested italic*** is beyond these single-pass rules. Headings and simple emphasis are fully achievable with basic sed.)
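A quick way to keep a converter like this honest is a tiny regression check: write the rules to a file, run them on a known sample, and compare the result with the HTML you expect. The sketch below is one way to do that. The scratch directory and sample.md content are assumptions for the demo, and the heading rules use | as the delimiter because their patterns contain literal # characters.

```shell
workdir=$(mktemp -d)

# Rules: headings first (| delimiter), then bold before italic (# delimiter).
cat > "$workdir/converter.sed" << 'EOF'
s|^### (.*)$|<h3>\1</h3>|
s|^## (.*)$|<h2>\1</h2>|
s|^# (.*)$|<h1>\1</h1>|
s#\*\*([^*]+)\*\*#<strong>\1</strong>#g
s#\*([^*]+)\*#<em>\1</em>#g
EOF

# Small sample without nested emphasis.
printf '# Title\n\nSome *italic* and **bold** text.\n\n### Deep\n' > "$workdir/sample.md"

html=$(sed -E -f "$workdir/converter.sed" "$workdir/sample.md")
expected=$(printf '<h1>Title</h1>\n\nSome <em>italic</em> and <strong>bold</strong> text.\n\n<h3>Deep</h3>')

if [ "$html" = "$expected" ]; then
    verdict="PASS"
else
    verdict="FAIL"
fi
echo "$verdict"
```

Rerunning this after every change to converter.sed catches ordering and greediness regressions early. Add one sample line per rule you introduce.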

Implementation Hints:

Create a file converter.sed to hold your script. You will run it with sed -E -f converter.sed notes.md.
Start with the simplest transformation. How do you convert ## A Subheading to <h2>A Subheading</h2>?
- Your address should match lines starting with ## .
- Your s command needs to capture the text after the ## .
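- s/^## (.*)$/<h2>\1<\/h2>/. Note the escaped / in the closing tag. sed lets you use other delimiters to avoid this, e.g., s|^## (.*)$|<h2>\1</h2>|. (Avoid # as the delimiter here: the pattern itself contains # characters, which would end it early.)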
Now, add a rule for <h1> headings. Does the order of the h1 and h2 rules in your script file matter? (No, because their patterns are distinct).
Next, tackle bold text: **bold**. The command will look like s/\*\*(.*)\*\*/<strong>\1<\/strong>/g.
What happens if you have **bold** and **more bold** on one line? The (.*) is “greedy” and will match from the first ** to the last **. You need to match characters that are not asterisks. A pattern like [^*] can help. s/\*\*([^*]+)\*\*/<strong>\1<\/strong>/g.
Add a rule for italics (*italic*). Does the order of the bold and italic rules matter? (Yes! If you do italics first, **bold** becomes *<em>bold</em>*, which is wrong).

Learning milestones:

Your script can convert one type of Markdown syntax → You can write a self-contained rule.
Your script handles multiple heading levels correctly → You understand how to use multiple rules.
You can convert bold and italic text on the same line → You understand the importance of command order and non-greedy matching.
You use an alternate delimiter like # or | in your s command to handle file paths or HTML → You’ve learned a key trick for sed readability.

The Core Question You’re Answering

“How do I chain multiple transformations together in the correct order, and how does regex greediness affect my patterns?”

This project forces you to confront a fundamental truth about text processing: order matters. When you have overlapping syntax patterns (like *italic* and **bold**), the sequence of your transformations determines whether your output is correct or completely broken. You’ll also learn that regex engines are “greedy” by default—they match as much as possible—which can cause unexpected behavior when you have multiple instances of a pattern on the same line.

Concepts You Must Understand First

Before diving into implementation, make sure you’re comfortable with:

Concept	Description	Why It Matters
sed script files (`-f script.sed`)	Instead of cramming all your commands into one long command line, you can put them in a file (one command per line) and run `sed -E -f script.sed input.md`.	This makes complex transformations readable, maintainable, and version-controllable.
Multiple `-e` expressions	For simpler cases, you can chain commands on the command line: `sed -E -e 's/pattern1/replacement1/' -e 's/pattern2/replacement2/' file`. Each `-e` adds another command to the script.	Useful for quick one-liners when you don’t need a full script file.
Regex greediness	The quantifiers `*` and `+` are “greedy”: they match the longest possible string. In `**bold** and **more**`, the pattern `\*\*.*\*\*` would match from the first `**` all the way to the last `**`, swallowing “and” in between.	Understanding greediness prevents bugs where patterns match more than intended.
Non-greedy workarounds	To fix greediness, use negated character classes like `[^*]+` (“one or more characters that are NOT asterisks”).	sed doesn’t support lazy quantifiers such as `*?` and `+?` from Perl or Python, so this is your primary tool.
Alternative delimiters	When your patterns or replacements contain `/` characters (like HTML’s `</h1>`), you can use a different delimiter to avoid escaping. The first character after `s` becomes the delimiter: `s#pattern#replacement#` or `s|pattern|replacement|`. Pick a delimiter that does not occur in the pattern or replacement.	Makes patterns involving slashes much more readable.
Character classes `[^...]` for negation	The `[^...]` syntax means “match any character EXCEPT these.”	Crucial for non-greedy matching in sed.
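To see the greediness row in action, the sketch below runs the same bold rule twice on one made-up line, once with `.*` and once with `[^*]+`. The sample text is an assumption chosen for illustration.

```shell
line='**bold** and **more**'

# Greedy: .* runs to the LAST ** on the line, so only one pair of tags appears.
printf '%s\n' "$line" | sed -E 's#\*\*(.*)\*\*#<strong>\1</strong>#g'
# <strong>bold** and **more</strong>

# Negated class: [^*]+ cannot cross an asterisk, so each span matches on its own.
printf '%s\n' "$line" | sed -E 's#\*\*([^*]+)\*\*#<strong>\1</strong>#g'
# <strong>bold</strong> and <strong>more</strong>
```

The `g` flag does not help the greedy version: the first match already consumed the whole line.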

Questions to Guide Your Design

Ask yourself these questions as you work through the implementation:

Why should bold patterns be processed before italic?
- Consider what happens to **bold** if you process *...* first. The italic rule consumes asterisks that belong to the bold markers: with [^*]+ you get *<em>bold</em>*, and with a greedy .* you get <em>*bold*</em>. Neither is <strong>bold</strong>.
How do you handle multiple bold sections on one line?
- If you have **one** and **two**, does your pattern correctly identify both? Or does it greedily match from the first ** to the last **?
What’s the difference between (.*) and ([^*]+)?
- (.*) matches any characters (including asterisks) as many times as possible. ([^*]+) matches one or more characters that are specifically NOT asterisks. The second is what you need for correct bold/italic parsing.
How do you escape special characters in the replacement?
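- The / in </h1> needs escaping as \/ when using / as your delimiter. If you use # as your delimiter (s#...#...#), you don’t need to escape slashes at all. The delimiter must not appear unescaped in the pattern, though: the heading rules match a literal #, so either write it as \# or pick a third delimiter such as |.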

Thinking Exercise

Before writing any code, trace through what happens when you process this line with different rule orderings:

**bold** and *italic* and **more bold**

Scenario A: Process bold first, then italic

Start: **bold** and *italic* and **more bold**
After bold rule: <strong>bold</strong> and *italic* and <strong>more bold</strong>
After italic rule: <strong>bold</strong> and <em>italic</em> and <strong>more bold</strong>
Result: Correct!

Scenario B: Process italic first, then bold

Start: **bold** and *italic* and **more bold**
After italic rule: The italic pattern \*([^*]+)\* pairs up the wrong asterisks, starting with the inner * of **bold**.
Actual output: *<em>bold</em><em> and </em>italic<em> and </em><em>more bold</em>*
Result: Completely broken! The bold rule that runs afterwards finds no ** left to match.

Scenario C: Using greedy .* for bold
Pattern: s/\*\*(.*)\*\*/<strong>\1<\/strong>/g

Start: **bold** and *italic* and **more bold**
The .* matches “bold** and *italic* and **more bold” (everything between the FIRST and LAST **)
Result: <strong>bold** and *italic* and **more bold</strong> — Wrong!

Scenario D: Using non-greedy [^*]+ for bold
Pattern: s/\*\*([^*]+)\*\*/<strong>\1<\/strong>/g

Start: **bold** and *italic* and **more bold**
First match: **bold** → <strong>bold</strong>
Second match: **more bold** → <strong>more bold</strong>
Result: <strong>bold</strong> and *italic* and <strong>more bold</strong> — Correct!

This exercise demonstrates why understanding greediness is essential for this project.
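Scenarios A and B can be checked directly. The sketch below stores the two rules in shell variables (the variable names are illustrative) and applies them in both orders to the sample line.

```shell
line='**bold** and *italic* and **more bold**'
bold='s#\*\*([^*]+)\*\*#<strong>\1</strong>#g'
italic='s#\*([^*]+)\*#<em>\1</em>#g'

# Scenario A: bold first, then italic.
printf '%s\n' "$line" | sed -E -e "$bold" -e "$italic"
# <strong>bold</strong> and <em>italic</em> and <strong>more bold</strong>

# Scenario B: italic first. The italic rule pairs asterisks that belong
# to the ** markers, so the bold rule has nothing left to match.
printf '%s\n' "$line" | sed -E -e "$italic" -e "$bold"
# *<em>bold</em><em> and </em>italic<em> and </em><em>more bold</em>*
```

Each `-e` adds one command to the script, so the order of the `-e` options is the order in which the rules run.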

The Interview Questions They’ll Ask

Master this project, and you’ll be prepared for these common interview questions:

“What is regex greediness and how do you handle it?”
- Greedy quantifiers (*, +, {n,m}) match as much text as possible while still allowing the overall pattern to match. In tools like sed that don’t have lazy quantifiers (*?, +?), you work around greediness by using negated character classes. For example, instead of ".*" to match quoted strings, use "[^"]*" to match a quote, then any non-quote characters, then another quote.
“How would you structure a sed script with multiple transformations?”
- Put each transformation on its own line in a .sed file, run with sed -E -f script.sed. Order commands so that more specific patterns (like **bold**) are processed before more general ones (like *italic*). Test each transformation individually before combining them.
“Why might you use # instead of / as a delimiter?”
- When your pattern or replacement contains forward slashes (like file paths /usr/bin/ or HTML tags </div>), using / as the delimiter requires escaping every slash: s/\/usr\/bin\//\/opt\/bin\//. Using an alternate delimiter is much cleaner: s#/usr/bin/#/opt/bin/#.
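The two answers above are easy to demonstrate at the prompt. This is a small sketch with made-up input strings; the `<q>` replacement is only a visible marker.

```shell
# Same substitution, two delimiters: both print PATH=/opt/bin/tool
printf '%s\n' 'PATH=/usr/bin/tool' | sed 's/\/usr\/bin\//\/opt\/bin\//'
printf '%s\n' 'PATH=/usr/bin/tool' | sed 's#/usr/bin/#/opt/bin/#'

# Negated class instead of a lazy quantifier: each quoted string matches alone.
printf '%s\n' 'say "one" then "two"' | sed 's/"[^"]*"/<q>/g'
# say <q> then <q>

# Greedy version: one match from the first quote to the last.
printf '%s\n' 'say "one" then "two"' | sed 's/".*"/<q>/g'
# say <q>
```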

Hints in Layers

Start with Layer 1. Only move to the next layer if you’re truly stuck.

Layer 1 - Conceptual Hints

Start with a single transformation (headings are easiest) before attempting multiple rules.
Remember that ^ anchors to the start of the line—perfect for matching # or ## at the beginning.
The -E flag enables extended regex, making your patterns cleaner (no escaping parentheses).

Layer 2 - Structural Hints

Your script file should have the transformations in this order: h3, h2, h1, then bold, then italic. (From most specific to least specific in terms of character overlap.)
For headings, you need to capture everything after the # prefix. The pattern ^# (.*)$ captures the title.
Use the g flag on bold and italic substitutions to handle multiple occurrences per line.

Layer 3 - Specific Patterns

Heading 1: s|^# (.*)$|<h1>\1</h1>| (the pattern contains a literal #, so # cannot also be the delimiter here)
Heading 2: s|^## (.*)$|<h2>\1</h2>|
Bold: s#\*\*([^*]+)\*\*#<strong>\1</strong>#g
Italic: s#\*([^*]+)\*#<em>\1</em>#g

Layer 4 - Complete Script Structure

# converter.sed - Markdown to HTML converter
# Run with: sed -E -f converter.sed notes.md

# Convert headings (longest prefix first). The delimiter is | here because
# the patterns contain a literal #, which would end an s#...# pattern early.
s|^### (.*)$|<h3>\1</h3>|
s|^## (.*)$|<h2>\1</h2>|
s|^# (.*)$|<h1>\1</h1>|

# Convert emphasis (bold before italic to handle ** before *)
s#\*\*([^*]+)\*\*#<strong>\1</strong>#g
s#\*([^*]+)\*#<em>\1</em>#g
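As a usage sketch, the commands below write a small sample `notes.md` and a copy of the converter into a temporary directory and run it. The sample content and the `workdir` variable are assumptions made for the demonstration. The heading rules use `|` as the delimiter because their patterns contain a literal `#`.

```shell
workdir=$(mktemp -d)

# Made-up sample input covering every rule once.
cat > "$workdir/notes.md" <<'EOF'
# Title
## Section
### Detail
Some **bold** and *italic* text, then **more bold**.
EOF

cat > "$workdir/converter.sed" <<'EOF'
# Headings: | as delimiter, since # appears in the pattern.
s|^### (.*)$|<h3>\1</h3>|
s|^## (.*)$|<h2>\1</h2>|
s|^# (.*)$|<h1>\1</h1>|
# Emphasis: bold before italic.
s#\*\*([^*]+)\*\*#<strong>\1</strong>#g
s#\*([^*]+)\*#<em>\1</em>#g
EOF

sed -E -f "$workdir/converter.sed" "$workdir/notes.md"
# <h1>Title</h1>
# <h2>Section</h2>
# <h3>Detail</h3>
# Some <strong>bold</strong> and <em>italic</em> text, then <strong>more bold</strong>.
```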

Books That Will Help

Book	Author(s)	Relevant Chapters	Why It Helps
sed & awk, 2nd Edition	Dale Dougherty & Arnold Robbins	Ch. 3: Understanding Regular Expression Syntax, Ch. 4: Writing sed Scripts	The definitive guide to sed scripting. Chapter 4 specifically covers how to structure multi-command scripts.
Classic Shell Scripting	Arnold Robbins & Nelson H.F. Beebe	Ch. 3: Searching and Substitutions, Ch. 5: Pipelines Can Do Amazing Things	Provides context for how sed fits into larger shell scripts and pipelines.
Mastering Regular Expressions, 3rd Edition	Jeffrey E.F. Friedl	Ch. 4: The Mechanics of Expression Processing, Ch. 5: Practical Regex Techniques	Deep dive into how regex engines work, including greediness. Essential for understanding why your patterns behave the way they do.
The Linux Command Line, 2nd Edition	William Shotts	Ch. 19: Regular Expressions, Ch. 20: Text Processing	Beginner-friendly introduction to regex and text processing tools including sed.

Common Pitfalls & Debugging

Problem 1: “Bold and italic output is mangled”

Why: Italic rules ran before bold rules, causing **bold** to be treated as *italic*.
Fix: Always process bold (**) before italic (*).
Quick test: echo '**bold** and *italic*' | sed -E -e 's#\*\*([^*]+)\*\*#<strong>\1</strong>#g' -e 's#\*([^*]+)\*#<em>\1</em>#g'

Problem 2: “Multiple bold sections collapse into one”

Why: Greedy (.*) matched from the first ** to the last **.
Fix: Use a negated class like [^*]+ for bold and italic.
Quick test: echo '**one** and **two**' | sed -E 's#\*\*([^*]+)\*\*#<strong>\1</strong>#g'

Problem 3: “Parentheses are treated literally”

Why: You forgot -E and used unescaped () in BRE mode.
Fix: Use -E or escape parentheses as \( and \).
Quick test: echo '# Title' | sed -E 's|^# (.*)$|<h1>\1</h1>|'

Problem 4: “Escaping slashes makes the script unreadable”

Why: Using / as the delimiter in HTML replacements.
Fix: Switch to # or | as the delimiter, whichever does not appear in the pattern.
Quick test: sed -E 's|^## (.*)$|<h2>\1</h2>|' notes.md

Definition of Done

Headings #, ##, and ### convert to <h1>, <h2>, <h3> correctly.
Bold and italic are converted correctly and in the right order.
Multiple bold/italic segments on one line are all converted (use g).
The script runs via sed -E -f converter.sed notes.md and matches expected output.
You documented limitations (no lists or paragraph tags in this version).

Project 4: The Multi-Line Address Parser

File: LEARN_SED_COMMAND.md
Main Programming Language: sed (Bash/Shell)
Alternative Programming Languages: Awk, Perl
Coolness Level: Level 4: Hardcore Tech Flex
Business Potential: 1. The “Resume Gold”
Difficulty: Level 3: Advanced
Knowledge Area: Advanced sed / Hold Space / Multi-line processing
Software or Tool: sed
Main Book: The Grymoire - SED (An excellent online tutorial)

What you’ll build: A sed script that transforms a multi-line address block into a single, comma-separated line.

Why it teaches sed: This is your first “real” multi-line problem. It cannot be solved one line at a time: you need either the hold space or the multi-line commands (N, P, D). This project forces you to leave the line-by-line assembly line model and start thinking about how to store and combine information across multiple lines.

Core challenges you’ll face:

“Remembering” previous lines → maps to using H to append lines to the hold space
Knowing when you’re at the end of a block → maps to using patterns (like a blank line) to trigger an action
Processing the combined block → maps to using g or x to bring the collected lines back into the pattern space for a final substitution
Handling newlines in the pattern space → maps to recognizing that the pattern space will now contain embedded \n characters

Key Concepts:

Hold Space: The Grymoire - SED (Advanced section)
Multi-line Commands: N, P, D commands.
Advanced Flow Control: “sed & awk” Ch. 6

Difficulty: Advanced
Time estimate: 1-2 weeks
Prerequisites: Project 3. Be prepared to be confused; this is a big conceptual leap.

Real World Outcome

You will have a file addresses.txt:

123 Fake St.
Anytown, ST 12345
USA

456 Main Ave.
Otherville, CA 67890
USA

Your sed script will transform it into:

123 Fake St., Anytown, ST 12345, USA
456 Main Ave., Otherville, CA 67890, USA

Command Line Outcome Example:

$ cat addresses.txt
123 Fake St.
Anytown, ST 12345
USA

456 Main Ave.
Otherville, CA 67890
USA

$ cat > script.sed <<'S'
/^$/ {
  x
  s/^\n//
  s/\n/, /g
  p
  d
}
H
# Last line: flush the final block (the file has no trailing blank line)
$ {
  x
  s/^\n//
  s/\n/, /g
  p
}
S

$ sed -n -f script.sed addresses.txt
123 Fake St., Anytown, ST 12345, USA
456 Main Ave., Otherville, CA 67890, USA

Implementation Hints:

This is a classic sed pattern. Here’s the logic broken down:

The Goal: Read lines and append them together, replacing newlines with “, ”. When we see a blank line, we print the result and start over.
The sed Script Logic (in English):
- For every line…
- If this is the last line of the file ($), jump to a special block of code to handle it.
- Read the next line from the input and append it to the pattern space. The two lines are now separated by a \n. This is the N command.
- If the pattern space now ends with an empty line (\n$), it means we’ve read the line after an address block.
  - We need to process the block (which is everything before the final \n).
  - Print the processed part, then delete it, leaving the blank line to be handled in the next cycle. This is what the P and D commands do.
- If it’s not the end of a block, just branch back to the beginning to append another line.
- This creates a loop that “slurps” lines into the pattern space. Once you have the whole block, you can do substitutions.
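The slurp loop can be sketched as follows. This is a variation on the outline above, not the only way to write it: it uses N with a branch loop and a plain p instead of P and D. The file names, the `slurp` label and the `workdir` variable are assumptions.

```shell
workdir=$(mktemp -d)
printf '%s\n' '123 Fake St.' 'Anytown, ST 12345' 'USA' '' \
    '456 Main Ave.' 'Otherville, CA 67890' 'USA' > "$workdir/addresses.txt"

cat > "$workdir/slurp.sed" <<'EOF'
# A blank line at the start of a cycle is a repeated separator: skip it.
/^$/d
:slurp
# While more input exists, append the next line. Keep looping until the
# appended line is blank, i.e. the pattern space ends in a newline.
$!{
    N
    /\n$/!b slurp
}
# Drop the separator newline (if any), join the rest, and print.
s/\n$//
s/\n/, /g
p
EOF

sed -n -f "$workdir/slurp.sed" "$workdir/addresses.txt"
# 123 Fake St., Anytown, ST 12345, USA
# 456 Main Ave., Otherville, CA 67890, USA
```

The `$!` guard keeps N from running on the last line, where sed implementations differ on whether the pattern space is still printed.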

A simpler Hold Space approach:

Create a script.sed file. You will run it with sed -f script.sed addresses.txt.
For lines that are NOT blank:
- Append the line to the hold space. Use the H command.
- Delete the line from the pattern space so it’s not printed. Use d.
For lines that ARE blank (this is our trigger):
- First, use x to swap the hold space (which contains \nline1\nline2\nline3) and the pattern space.
- Now the pattern space has your collected address block.
- Perform substitutions to replace the newlines with “, ”. The first \n will be at the beginning: s/^\n// removes it, then s/\n/, /g replaces the rest.
- The line is now formatted and will be printed automatically.
- The hold space now contains the (empty) blank line, which is fine: the next block’s first H appends to an empty buffer, so nothing from the previous block leaks through.

Learning milestones:

You can use H and g to append a line and print the entire buffer → You understand the basics of storing state.
You can trigger an action on a blank line → You know how to use patterns to control script flow.
You successfully replace \n characters in the pattern space → You’ve mastered multi-line substitution.
You can explain the difference between h and H, and g and G → You have a solid mental model of the hold space.

The Core Question You’re Answering

“How can sed ‘remember’ previous lines when it normally only sees one line at a time?”

This is a critical conceptual leap. By default, sed processes text like water flowing through a pipe - each line passes through, gets transformed (or not), and flows out. The line is then forgotten. But what if you need to combine lines? What if your transformation depends on what came before?

The answer is the hold space - sed’s hidden second buffer that persists across lines. Understanding this unlocks an entirely new category of text manipulation.

Concepts You Must Understand First

Before attempting this project, you need a solid grasp of these concepts:

1. The Pattern Space vs Hold Space

+---------------------------------------------------------------------------+
|                          sed's Two Buffers                                |
+---------------------------------------------------------------------------+
|                                                                           |
|   Pattern Space                    Hold Space                             |
|   +--------------------+          +--------------------+                  |
|   |                    |    h     |                    |                  |
|   |  "current line"    | -------> |  "saved data"      |                  |
|   |                    |  (copy)  |                    |                  |
|   |                    |          |                    |                  |
|   |                    |    H     |                    |                  |
|   |  "current line"    | -------> |  "saved\nmore"     |                  |
|   |                    | (append) |                    |                  |
|   |                    |          |                    |                  |
|   |                    |    g     |                    |                  |
|   |  "from hold"       | <------- |  "saved data"      |                  |
|   |                    |  (copy)  |                    |                  |
|   |                    |          |                    |                  |
|   |                    |    G     |                    |                  |
|   |  "current\nsaved"  | <------- |  "saved data"      |                  |
|   |                    | (append) |                    |                  |
|   |                    |          |                    |                  |
|   |                    |    x     |                    |                  |
|   |  <---------------- | <------> | ---------------->  |                  |
|   |                    |(exchange)|                    |                  |
|   +--------------------+          +--------------------+                  |
|                                                                           |
|   ^ Input comes here              Persists across lines!                  |
|   v Output goes from here         (Initially empty)                       |
|                                                                           |
+---------------------------------------------------------------------------+

2. The Hold Space Commands

Command	Name	Action
`h`	hold	Copy pattern space to hold space (overwrites)
`H`	Hold (append)	Append pattern space to hold space (with newline separator)
`g`	get	Copy hold space to pattern space (overwrites)
`G`	Get (append)	Append hold space to pattern space (with newline separator)
`x`	exchange	Swap pattern space and hold space
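A quick way to see what each command in the table does is to print the buffer with `l`, which shows embedded newlines as `\n` and marks the end with `$`. The two-line input below is made up for the demonstration.

```shell
# Line 1 is saved with h. On line 2, each variant runs one command and
# then l to show the resulting pattern space.
printf 'one\ntwo\n' | sed -n '1h;2{g;l;}'     # g: hold replaces pattern     -> one$
printf 'one\ntwo\n' | sed -n '1h;2{G;l;}'     # G: hold appended to pattern  -> two\none$
printf 'one\ntwo\n' | sed -n '1h;2{x;l;}'     # x: buffers swapped           -> one$

# For h and H, swap afterwards so l shows what ended up in the hold space.
printf 'one\ntwo\n' | sed -n '1h;2{h;x;l;}'   # h: hold overwritten          -> two$
printf 'one\ntwo\n' | sed -n '1h;2{H;x;l;}'   # H: pattern appended to hold  -> one\ntwo$
```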

3. The Multi-Line Commands

Command	Name	Action
`N`	Next	Append next input line to pattern space (with `\n` separator)
`P`	Print	Print up to the first `\n` in pattern space
`D`	Delete	Delete up to the first `\n` in pattern space, then restart cycle

4. How Newlines Appear in the Pattern Space

When you use H, G, or N, newlines (\n) become actual characters in the buffer:

After H command appends "line2" to hold space containing "line1":
+---------------------------+
| l i n e 1 \n l i n e 2    |
+---------------------------+
           ^
    This is a literal \n character you can match and replace!

Questions to Guide Your Design

Before writing any code, answer these questions:

How do you detect the end of an address block?
- What marks the boundary between one address and the next?
- Is it a blank line? A specific pattern? End of file?
What does the hold space look like after multiple H commands?
- If you append “123 Fake St.”, then “Anytown, ST 12345”, then “USA”…
- What exact string is in the hold space? (Hint: there are \n characters)
How do you replace embedded newlines with “, ”?
- Once you have the multi-line block in the pattern space, what substitution converts it to a single line?
- Is there a leading newline you need to handle differently?
What happens to the hold space between blocks?
- After you print one formatted address, is the hold space empty?
- How do you “reset” for the next address block?

Thinking Exercise: Trace the Buffers

Before coding, manually trace what happens for this input:

123 Fake St.
Anytown, ST 12345
USA

For each line, fill in this table:

Input Line	Action	Pattern Space After	Hold Space After
`123 Fake St.`	?	?	?
`Anytown, ST 12345`	?	?	?
`USA`	?	?	?
`` (blank)	?	?	?

Hint for the trace: Think about these steps:

Line arrives in pattern space
If not blank: append to hold space (H), then delete (d) to suppress printing
If blank: exchange (x) to get the collected lines, perform substitutions, print

What should the hold space contain after “USA”?

\n123 Fake St.\nAnytown, ST 12345\nUSA

Note the leading \n - the H command always adds a newline before appending.
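Once you have done the trace by hand, you can check it with a throwaway one-liner. The sketch below appends each line with H, swaps the buffers so that `l` can display the hold space, and swaps back. It is a debugging aid, not part of the solution.

```shell
# H: append line to hold space; x: bring hold space into view;
# l: print it with \n made visible; x: restore both buffers.
printf '%s\n' '123 Fake St.' 'Anytown, ST 12345' 'USA' | sed -n 'H;x;l;x'
# \n123 Fake St.$
# \n123 Fake St.\nAnytown, ST 12345$
# \n123 Fake St.\nAnytown, ST 12345\nUSA$
```

Each output line is the hold space after one input line. The trailing `$` is added by `l` to mark the end of the buffer.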

The Interview Questions They’ll Ask

This project prepares you for classic interview questions about sed:

“Explain the difference between the pattern space and hold space in sed.”
- Pattern space: The active workspace where sed loads each line and performs commands
- Hold space: A secondary buffer that persists across lines, used for “remembering” data
- Key insight: Pattern space is automatically printed (unless -n), hold space is never printed directly
“How would you join multiple lines into one with sed?”
- Use N to append lines (with newline separator)
- Or use H to collect lines in hold space, then g or x to retrieve
- Replace \n with desired separator using s/\n/, /g
“What does H;g;$p do?”
- H: Append current line to hold space
- g: Copy hold space to pattern space
- $p: On the last line only, print the pattern space
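- Result (with -n): Collects all lines and prints them together at the end, preceded by one empty line because the first H prepends \n to the empty hold space. Without -n, the growing buffer is also auto-printed on every cycle.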
“Write a sed command to join every two lines with a comma.”
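- sed 'N;s/\n/, /' - Read next line, replace the newline with “, ”. Use $!N instead of N if the input may have an odd number of lines, so that an unpaired last line is still printed on non-GNU seds.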
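The last two answers are short enough to verify at the prompt. The inputs are made up for the demonstration.

```shell
# Join every two lines with a comma (even number of input lines).
printf 'a\nb\nc\nd\n' | sed 'N;s/\n/, /'
# a, b
# c, d

# H;g;$p with -n: everything is printed once, at the end, after one
# empty line that comes from the first H on an empty hold space.
printf 'a\nb\nc\n' | sed -n 'H;g;$p'
# (empty line)
# a
# b
# c
```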

Hints in Layers

If you’re stuck, reveal hints one at a time:

Hint 1: The Basic Structure

Your script needs two main sections:

# For non-blank lines: store and suppress
/./{ ... }

# For blank lines: process and output
/^$/{ ... }

The /./ matches any line with at least one character (non-blank). The /^$/ matches lines that are empty (start immediately followed by end).

Hint 2: Storing Lines

For non-blank lines, you want to:

Add this line to your “collection” in the hold space
NOT print it yet

The commands you need:

H    # Append pattern space to hold space (with \n)
d    # Delete pattern space (prevents printing, starts next cycle)

Hint 3: Processing the Block

When you hit a blank line, you need to:

Get the collected lines from hold space
Fix the formatting (remove leading newline, replace others with “, ”)
Print the result

x           # Exchange: now pattern space has the collected lines
s/^\n//     # Remove the leading newline (from first H)
s/\n/, /g   # Replace remaining newlines with ", "
p           # Print the formatted line

Hint 4: The Complete Script

# For non-blank lines: accumulate in hold space
/./{
    H
    d
}

# For blank lines: format and print the block
/^$/{
    x
    s/^\n//
    s/\n/, /g
    p
}

Run with: sed -n -f script.sed addresses.txt

The -n suppresses automatic printing (we control output with p).

Hint 5: Handling the Last Block (Edge Case)

What if the file doesn’t end with a blank line? The last address block won’t be printed!

Add a handler for the last line:

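# Not the last line: collect non-blank lines and start the next cycle
$!{
    /./{
        H
        d
    }
}
# Last line: collect it if non-blank, then flush the final block
${
    /./H
    x
    s/^\n//
    s/\n/, /g
    /./p
}
# Blank separator line: flush the collected block
/^$/{
    x
    s/^\n//
    s/\n/, /g
    /./p
}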

Or use a simpler approach with $ address to always process at end of file.
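One way to read “a simpler approach” is to let blank lines and the last line share a single flush path. The sketch below is one possible arrangement under that assumption. The script name and the `workdir` variable are hypothetical.

```shell
workdir=$(mktemp -d)

cat > "$workdir/join_blocks.sed" <<'EOF'
# Collect every non-blank line in the hold space.
/./H
# A non-blank line that is not the last one: nothing to print yet.
$!{
    /./d
}
# Reached only on a blank line or on the last line: flush the block.
x
s/^\n//
s/\n/, /g
# Print only if something was collected (skips repeated blank lines).
/./p
EOF

# Sample input with a doubled separator and no trailing blank line.
printf '%s\n' '123 Fake St.' 'Anytown, ST 12345' 'USA' '' '' \
    '456 Main Ave.' 'Otherville, CA 67890' 'USA' |
    sed -n -f "$workdir/join_blocks.sed"
# 123 Fake St., Anytown, ST 12345, USA
# 456 Main Ave., Otherville, CA 67890, USA
```

The x on a blank line also resets the hold space, because it swaps the empty pattern space in.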

Books That Will Help

Book	Relevant Chapter	What You’ll Learn
sed & awk, 2nd Edition (Dougherty & Robbins)	Chapter 6: Advanced sed Commands	The definitive explanation of hold space, `h/H/g/G/x`
sed & awk, 2nd Edition	Chapter 6: Advanced sed Commands	Multi-line patterns, `N/P/D` commands
The Grymoire - SED (online)	“The Hold Buffer” section	Excellent visual explanations and examples
Classic Shell Scripting (Robbins & Beebe)	Chapter 3: Searching and Substitutions	Practical context for when to use sed vs awk
Mastering Regular Expressions (Friedl)	Chapter 3: Overview of Regex Features	Understanding the regex that powers sed

Online Resources:

The Grymoire - SED - The best free online tutorial
sed FAQ - Common questions and patterns
Useful one-line scripts for sed - A goldmine of examples

Common Pitfalls & Debugging

Problem 1: “Output starts with a comma or blank space”

Why: H always prepends a newline, so the hold space starts with \n.
Fix: Remove the leading newline before replacing the rest.
Quick test: sed -n -e '/^$/{ x; s/^\n//; s/\n/, /g; p; d; }' -e 'H' addresses.txt

Problem 2: “The last block never prints”

Why: Your script only prints on blank lines, and the file ends without one.
Fix: Add a $ handler to process the final block.
Quick test: Add a $ block that swaps and prints when EOF is reached.

Problem 3: “Blocks from different addresses get merged”

Why: You did not reset the hold space after printing.
Fix: Use x carefully and ensure the hold space is cleared between blocks.
Quick test: After printing, run x; s/.*//; x to empty the hold space, or structure the logic so that the x on a blank line swaps the empty line into the hold space.

Problem 4: “s/\n/, /g does nothing”

Why: The pattern space never contained newlines because you forgot to x or g first.
Fix: Swap or copy hold space into pattern space before replacing.
Quick test: Insert x before the substitution.

Definition of Done

Each address block is printed as a single comma-separated line.
The script handles the final block even without a trailing blank line.
No extra commas or blank lines appear in output.
You can explain how H, x, and s/\n/, /g interact.
The solution works on at least three sample blocks.

Project 5: Reversing a File (Line by Line)

File: LEARN_SED_COMMAND.md
Main Programming Language: sed (Bash/Shell)
Alternative Programming Languages: tac (the real tool for this), Python, Perl
Coolness Level: Level 4: Hardcore Tech Flex
Business Potential: 1. The “Resume Gold”
Difficulty: Level 3: Advanced
Knowledge Area: Advanced sed / Hold Space Mastery
Software or Tool: sed
Main Book: N/A, this is a classic puzzle found in online forums.

What you’ll build: A sed script that reverses the order of lines in a file, printing the last line first and the first line last.

Why it teaches sed: This is the canonical “expert sed” problem. It is impossible without a complete understanding of how the pattern space, hold space, and command flow interact. It forces you to think about how to accumulate the entire file in a buffer and only print it at the very end.

Core challenges you’ll face:

How to avoid printing each line as it’s read → maps to using -n and controlling all output with p
How to accumulate the entire file → maps to repeatedly appending to the hold space
How to reverse the order → maps to a clever trick of prepending, not appending
When to print the final result → maps to using the $ address to trigger a final action on the last line

Key Concepts:

Suppressing Output: The -n flag.
Hold Space Manipulation: G, h, g.
End-of-file Address: The $ address.

Difficulty: Advanced
Time estimate: Weekend
Prerequisites: Project 4. You should be comfortable with the hold space.

Real World Outcome

Given a file file.txt:

A
B
C

Your script sed -n -f reverse.sed file.txt will output:

C
B
A

Command Line Outcome Example:

$ cat file.txt
A
B
C

$ sed -n '1!G;h;$p' file.txt
C
B
A

Implementation Hints:

This is a puzzle. Think about the state at each step.

The Goal: At the end of the script ($ line), we want the pattern space to contain C\nB\nA so that p can print it.
Line 1 (“A”):
- The pattern space contains “A”.
- We need to store it. h will copy “A” to the hold space.
- Hold space: “A”
Line 2 (“B”):
- The pattern space contains “B”.
- We need to add this to the hold space, but before “A”.
- G appends the hold space to the pattern space. Pattern space: “B\nA”.
- h then copies this combined buffer back to the hold space. Hold space: “B\nA”.
Line 3 (“C”, the last line $):
- The pattern space contains “C”.
- G appends the hold space. Pattern space: “C\nB\nA”.
- Now we have our final reversed buffer, and it is already in the pattern space.
- We need to print it. The p command will do this.
Putting it together: The script looks surprisingly simple.
- For every line except the last ($!): Append to hold space in reverse order (G;h).
- For the last line ($): Append the buffer (G) and then print (p).
- Remember to use -n to prevent printing on every line.
- A more elegant solution folds both cases into three commands: 1!G;h;$p. Can you figure out why that works? (Hint: what happens on line 1?) An equivalent form that does not need -n is 1!G;h;$!d.
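Both well-known forms of the one-liner can be compared side by side. The function names below are hypothetical helpers defined only for this comparison.

```shell
# Variant 1: suppress auto-print with -n and print explicitly on the last line.
reverse_with_p() { sed -n '1!G;h;$p' "$@"; }

# Variant 2: keep auto-print, but delete the pattern space on every line
# except the last, so only the final buffer is ever printed.
reverse_with_d() { sed '1!G;h;$!d' "$@"; }

printf 'A\nB\nC\n' | reverse_with_p
printf 'A\nB\nC\n' | reverse_with_d
# each prints:
# C
# B
# A
```

Both read the whole input into memory, which is why `tac` remains the practical tool for large files.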

Learning milestones:

You can write a script to collect the whole file into the hold space → You understand accumulation.
You figure out the G;h trick to prepend lines → You’ve had the “aha!” moment of advanced sed.
You can control printing to only happen on the very last line → You master the -n option and the p command combined with the $ address.
You can write the classic sed -n '1!G;h;$p' one-liner from memory → You are now a sed wizard.

The Core Question You’re Answering

“How can I accumulate and reorder an entire file using only the hold space?”

This is the ultimate hold space mastery challenge. Unlike Project 4 where you processed blocks delimited by blank lines, here you must hold the entire file in memory while simultaneously reversing its order. There’s no external trigger—you must track everything yourself and know exactly when to release the accumulated content.

Concepts You Must Understand First

Before attempting this project, ensure you have mastered:

Concept	Why It’s Critical
Complete hold space mastery (from Project 4)	You’ll use `G` and `h` in a very specific sequence to achieve reversal
The `-n` flag	Suppresses automatic printing—without this, every line prints immediately, defeating the purpose
The `p` (print) command	When `-n` is active, you control ALL output explicitly with `p`
The `$` (last line) address	Triggers your final action—printing the accumulated reversed content
The `!` (negation) address modifier	Lets you say “all lines EXCEPT line 1” with `1!`
The `1!G;h;$p` pattern	The canonical solution—three commands that accomplish everything

The 1!G;h;$p Pattern Explained:

This cryptic one-liner consists of three commands executed in sequence:

1!G — “If NOT line 1, append hold space to pattern space”
- On line 1, G is skipped (the hold space is empty, so G would only append a stray newline, which would end up as a trailing blank line in the output)
- On all other lines, G appends the previously accumulated content AFTER the current line
h — “Copy pattern space to hold space”
- Always executed—saves the current accumulated state for the next iteration
$p — “If this is the last line, print”
- Only triggers on the final line of input
- Prints the fully reversed content

Visual State Diagram:

┌─────────────────────────────────────────────────────────────────────────────┐
│ Processing a 3-line file: A, B, C                                           │
│ Command: sed -n '1!G;h;$p'                                                  │
└─────────────────────────────────────────────────────────────────────────────┘

═══════════════════════════════════════════════════════════════════════════════
PROCESSING LINE 1 (content: "A")
═══════════════════════════════════════════════════════════════════════════════

  Step 0: sed reads line into pattern space
  ┌─────────────────┐    ┌─────────────────┐
  │ Pattern Space   │    │ Hold Space      │
  │ ┌─────────────┐ │    │ ┌─────────────┐ │
  │ │     A       │ │    │ │   (empty)   │ │
  │ └─────────────┘ │    │ └─────────────┘ │
  └─────────────────┘    └─────────────────┘

  Step 1: "1!G" — This IS line 1, so G is SKIPPED (the ! negates)
  ┌─────────────────┐    ┌─────────────────┐
  │ Pattern Space   │    │ Hold Space      │
  │ ┌─────────────┐ │    │ ┌─────────────┐ │
  │ │     A       │ │    │ │   (empty)   │ │  ← No change
  │ └─────────────┘ │    │ └─────────────┘ │
  └─────────────────┘    └─────────────────┘

  Step 2: "h" — Copy pattern space to hold space
  ┌─────────────────┐    ┌─────────────────┐
  │ Pattern Space   │    │ Hold Space      │
  │ ┌─────────────┐ │    │ ┌─────────────┐ │
  │ │     A       │ │    │ │     A       │ │  ← Now holds "A"
  │ └─────────────┘ │    │ └─────────────┘ │
  └─────────────────┘    └─────────────────┘

  Step 3: "$p" — This is NOT the last line, so p is SKIPPED

  Output: (nothing — suppressed by -n)

═══════════════════════════════════════════════════════════════════════════════
PROCESSING LINE 2 (content: "B")
═══════════════════════════════════════════════════════════════════════════════

  Step 0: sed reads line into pattern space
  ┌─────────────────┐    ┌─────────────────┐
  │ Pattern Space   │    │ Hold Space      │
  │ ┌─────────────┐ │    │ ┌─────────────┐ │
  │ │     B       │ │    │ │     A       │ │
  │ └─────────────┘ │    │ └─────────────┘ │
  └─────────────────┘    └─────────────────┘

  Step 1: "1!G" — This is NOT line 1, so G EXECUTES
          G appends hold space to pattern space with \n between
  ┌─────────────────┐    ┌─────────────────┐
  │ Pattern Space   │    │ Hold Space      │
  │ ┌─────────────┐ │    │ ┌─────────────┐ │
  │ │     B       │ │    │ │     A       │ │
  │ │     ↓       │ │    │ └─────────────┘ │
  │ │     A       │ │    └─────────────────┘
  │ └─────────────┘ │
  └─────────────────┘
  Pattern space now: "B\nA" (B is BEFORE A — reversal in progress!)

  Step 2: "h" — Copy pattern space to hold space
  ┌─────────────────┐    ┌─────────────────┐
  │ Pattern Space   │    │ Hold Space      │
  │ ┌─────────────┐ │    │ ┌─────────────┐ │
  │ │     B       │ │    │ │     B       │ │
  │ │     ↓       │ │    │ │     ↓       │ │
  │ │     A       │ │    │ │     A       │ │
  │ └─────────────┘ │    │ └─────────────┘ │
  └─────────────────┘    └─────────────────┘

  Step 3: "$p" — This is NOT the last line, so p is SKIPPED

  Output: (nothing)

═══════════════════════════════════════════════════════════════════════════════
PROCESSING LINE 3 (content: "C") — THE LAST LINE
═══════════════════════════════════════════════════════════════════════════════

  Step 0: sed reads line into pattern space
  ┌─────────────────┐    ┌─────────────────┐
  │ Pattern Space   │    │ Hold Space      │
  │ ┌─────────────┐ │    │ ┌─────────────┐ │
  │ │     C       │ │    │ │     B       │ │
  │ └─────────────┘ │    │ │     ↓       │ │
  └─────────────────┘    │ │     A       │ │
                         │ └─────────────┘ │
                         └─────────────────┘

  Step 1: "1!G" — This is NOT line 1, so G EXECUTES
  ┌─────────────────┐    ┌─────────────────┐
  │ Pattern Space   │    │ Hold Space      │
  │ ┌─────────────┐ │    │ ┌─────────────┐ │
  │ │     C       │ │    │ │     B       │ │
  │ │     ↓       │ │    │ │     ↓       │ │
  │ │     B       │ │    │ │     A       │ │
  │ │     ↓       │ │    │ └─────────────┘ │
  │ │     A       │ │    └─────────────────┘
  │ └─────────────┘ │
  └─────────────────┘
  Pattern space now: "C\nB\nA" (FULLY REVERSED!)

  Step 2: "h" — Copy pattern space to hold space (doesn't matter now)

  Step 3: "$p" — This IS the last line, so p EXECUTES!

  ╔═══════════════════════════════════════╗
  ║  OUTPUT:                              ║
  ║    C                                  ║
  ║    B                                  ║
  ║    A                                  ║
  ╚═══════════════════════════════════════╝

The Key Insight: By using G (append hold to pattern) instead of H (append pattern to hold), we naturally build the content in reverse order. Each new line becomes the TOP of our accumulated buffer.
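To make the contrast concrete, the sketch below runs the G-based one-liner next to an H-based accumulator on the same three-line input. The H-based script is an illustration written for this comparison (it is not part of the project solution), and the variable names are arbitrary.

```shell
# G-based: each new line lands on top of the accumulated buffer.
reversed=$(printf 'A\nB\nC\n' | sed -n '1!G;h;$p')

# H-based: each new line is appended below the accumulated buffer.
# 1h seeds the hold space with line 1; 1!H appends every later line.
same_order=$(printf 'A\nB\nC\n' | sed -n '1h;1!H;${g;p;}')

printf 'G-based:\n%s\n\nH-based:\n%s\n' "$reversed" "$same_order"
```

The G-based run prints C, B, A, while the H-based run reproduces the input order A, B, C.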

Questions to Guide Your Design

Before writing any code, answer these questions:

Why use -n with explicit p?
- Without -n, sed prints every line automatically after processing
- We want to accumulate ALL lines and print only at the end
- Think: What would happen if lines printed as they were processed?
Why G before h (not after)?
- G appends hold space AFTER pattern space: pattern\nhold
- If we did h first, we’d overwrite the hold space before using it
- The sequence matters: use the old value, THEN update it
Why skip G on line 1?
- On line 1, the hold space is empty
- G would still append a newline (the separator) even with empty content
- That stray newline stays at the end of the buffer, so the output would end with an extra blank line
When should we print?
- Only on the last line ($)
- At that point, the pattern space contains the complete reversed file
- If we printed earlier, we’d get partial results
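One way to test your answers is to break the command on purpose and look at what changes. The sketch below is a small experiment, assuming a three-line input of A, B, C; the helper name `input` and the variable names are made up for this demo.

```shell
# Hypothetical helper that emits the sample input.
input() { printf 'A\nB\nC\n'; }

# Reference: the intended command.
good=$(input | sed -n '1!G;h;$p')

# Variant 1: no -n. Every cycle auto-prints the growing buffer,
# and the last cycle prints it twice (once for p, once for auto-print).
no_n_lines=$(input | sed '1!G;h;$p' | wc -l | tr -d ' ')

# Variant 2: h before G. The hold space is overwritten with the
# current line before G reads it, so earlier lines are lost.
h_first=$(input | sed -n 'h;1!G;$p')

printf 'correct output:\n%s\n' "$good"
printf 'lines printed without -n: %s\n' "$no_n_lines"
printf 'output with h before G:\n%s\n' "$h_first"
```

Without -n the three-line input produces nine output lines, and with h first the output is just the last line twice.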

Thinking Exercise

Trace the exact state of pattern space and hold space for a 4-line file:

Input file (numbers.txt):
1
2
3
4

Complete this table by hand BEFORE running the command:

Line #	After Read	After 1!G	After h	After $p	Hold Space Contents
1	Pattern: “1”	Pattern: “1” (G skipped)	Pattern: “1”	(p skipped)	“1”
2	Pattern: “2”	Pattern: ?	Pattern: ?	?	?
3	Pattern: “3”	Pattern: ?	Pattern: ?	?	?
4	Pattern: “4”	Pattern: ?	Pattern: ?	?	?

Then verify your answers:

# Create test file
printf '1\n2\n3\n4\n' > numbers.txt

# Run the reversal and compare the output with the last row of your table
sed -n '1!G;h;$p' numbers.txt

Expected final output: 4, 3, 2, 1 (each on its own line)
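If you want to check the intermediate rows of the table as well, one option is the `l` command, which prints the pattern space with embedded newlines shown as `\n` and a `$` marking the end of the buffer. The sketch below drops `$p` and adds `l` after `h`, so each trace line is the buffer after that input line has been processed. At that point the hold space holds the same content.

```shell
# Trace the pattern space after "1!G;h" for every input line.
# l shows embedded newlines as \n and ends each buffer with $.
trace=$(printf '1\n2\n3\n4\n' | sed -n '1!G;h;l')
printf '%s\n' "$trace"
```

The first trace line is `1$` and each later line grows by one entry at the front.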

The Interview Questions They’ll Ask

This is a classic interview question for Unix/Linux positions. Be prepared for:

Question 1: “Write a sed one-liner to reverse a file”

# Your answer should be:
sed -n '1!G;h;$p' filename

# Or the equivalent with curly braces (the ; before } is required by BSD sed):
sed -n '{1!G;h;$p;}' filename

Question 2: “Explain what sed -n '1!G;h;$p' does step by step”

Your explanation should cover:

Why -n is needed (suppress auto-print)
What 1! means (NOT line 1)
What G does (append hold to pattern with newline)
What h does (copy pattern to hold)
What $p does (print on last line only)
WHY this results in reversal (new lines go on TOP of accumulated content)

Question 3: “Why is there a tac command when you can use sed?”

Good answer points:

tac is purpose-built and more readable for this specific task
tac is likely more memory-efficient for large files
sed solution demonstrates understanding of the tool’s internals
The sed approach is valuable when you need to add additional transformations
In environments without tac (some minimal Unix systems), sed is available

Bonus Question: “What happens if the file is empty?”

With an empty file, there are no lines to process
The $ condition never triggers because there’s no “last line”
Output is empty (correct behavior)
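The edge cases are quick to confirm. The sketch below feeds the one-liner an empty stream and a single-line stream; the variable names are arbitrary.

```shell
# Empty input: no cycle ever runs, so nothing is printed.
empty_bytes=$(printf '' | sed -n '1!G;h;$p' | wc -c | tr -d ' ')

# Single line: line 1 is also the last line, so G is skipped
# and $p prints the line unchanged.
single_out=$(printf 'only\n' | sed -n '1!G;h;$p')

printf 'bytes from empty input: %s\n' "$empty_bytes"
printf 'single-line output: %s\n' "$single_out"
```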

Bonus Question: “What happens with a very large file?”

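The entire file accumulates in the hold space (and in the pattern space on the last line)
G and h copy the whole buffer on every line, so the work grows roughly quadratically with file size
For files larger than available memory, this will fail
GNU tac handles this better because it can read a seekable file backward from the end in blocks
For production use on large files, prefer tac (GNU coreutils) or tail -r (BSD/macOS)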
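A script that has to run on both GNU and BSD systems can pick the best available tool at run time. The function below is a minimal sketch: the name `reverse_lines` is hypothetical, it only handles standard input, and it assumes that `tail -r /dev/null` succeeds only where `tail` supports `-r`.

```shell
# reverse_lines: hypothetical helper, reads stdin and writes it reversed.
reverse_lines() {
  if command -v tac >/dev/null 2>&1; then
    tac                       # GNU coreutils
  elif tail -r /dev/null >/dev/null 2>&1; then
    tail -r                   # BSD/macOS
  else
    sed -n '1!G;h;$p'         # fallback: holds the whole input in memory
  fi
}

printf 'A\nB\nC\n' | reverse_lines
```

The sed branch stays as a last resort, so the memory cost is only paid on systems that have neither dedicated tool.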

Hints in Layers

If you’re stuck, reveal these hints one at a time:

Hint 1: The Big Picture

You need to:

Suppress all automatic output (-n)
Build up content in reverse order
Print everything at the very end

The hold space is your accumulator. But HOW you add to it matters.

Hint 2: Append vs. Prepend

There’s no “prepend to hold space” command in sed. But think about what G does:

G appends the HOLD space to the PATTERN space
This puts the NEW line BEFORE the accumulated content
Then h saves this new arrangement

So the pattern is: G to combine (new on top), h to save.

Hint 3: The Line 1 Problem

On line 1, the hold space is empty. If you run G, you get:

current_line + \n + (empty) = current_line\n

That trailing newline is saved by h and stays at the end of the buffer, so the output ends with an extra blank line. Solution: Skip G on line 1. Use 1!G (execute G on all lines EXCEPT line 1).

Hint 4: When to Print

You only want to print when you have the complete reversed file. That’s when you’ve processed the last line. The $ address matches the last line. So $p means “if last line, print”.

Hint 5: The Complete Solution

sed -n '1!G;h;$p' filename

Breaking it down:

-n: Don’t print automatically
1!G: On lines 2+, append hold space to pattern space
h: Always copy pattern space to hold space
$p: On the last line, print the pattern space

Run on line 1: skip G, copy “line1” to hold
Run on line 2: G makes “line2\nline1”, h saves it
Run on line 3: G makes “line3\nline2\nline1”, h saves it
…on last line: G combines, h saves, p prints the reversed file
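The same breakdown can live in a commented script file, which is easier to extend than a one-liner. The sketch below writes the three commands to a temporary file (the path comes from `mktemp`, and the sample input is made up) and runs it with `-n -f`.

```shell
# Write the commented sed script to a temporary file.
script=$(mktemp)
cat > "$script" <<'EOF'
# Lines 2 and later: append the hold space below the current line.
1!G
# Every line: save the combined buffer back to the hold space.
h
# Last line only: print the fully reversed buffer.
$p
EOF

# -n suppresses auto-print; -f reads the commands from the file.
from_file=$(printf 'line1\nline2\nline3\n' | sed -n -f "$script")
printf '%s\n' "$from_file"
rm -f "$script"
```

Keeping `-n` on the command line rather than inside the script avoids relying on the special `#n` first-line form.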

Books That Will Help

Book	Author(s)	Why It’s Relevant
sed & awk, 2nd Edition	Dale Dougherty & Arnold Robbins	Chapter 6 covers advanced sed techniques including hold space patterns
The UNIX Programming Environment	Kernighan & Pike	Classic text with stream editing philosophy and real-world examples
Classic Shell Scripting	Arnold Robbins & Nelson H.F. Beebe	Practical sed recipes including file reversal and multi-line patterns
Linux Command Line and Shell Scripting Bible	Richard Blum & Christine Bresnahan	Accessible coverage of sed with many working examples
The Grymoire - SED Tutorial	Bruce Barnett	Free online resource with excellent hold space explanations

Online Resources:

GNU sed Manual — The authoritative reference
sed FAQ — Common questions and advanced patterns
Sed One-Liners Explained — Peteris Krumins’ excellent breakdown

Common Pitfalls & Debugging

Problem 1: “Output has an extra blank line”

Why: You ran G on line 1, which appends a newline plus the empty hold space. That newline stays at the end of the buffer, so the blank line appears at the bottom of the output.
Fix: Skip G on the first line using 1!G.
Quick test: sed -n '1!G;h;$p' file.txt
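To see exactly where the stray newline sits, you can dump the final pattern space with `l` instead of printing it. The sketch below compares the buggy and fixed scripts on a three-line input; `l` shows embedded newlines as `\n` and ends the buffer with `$`.

```shell
# G on every line: the newline added on line 1 is carried along.
buggy=$(printf 'A\nB\nC\n' | sed -n 'G;h;$l')

# G skipped on line 1: no stray newline.
fixed=$(printf 'A\nB\nC\n' | sed -n '1!G;h;$l')

printf 'G on every line: %s\n' "$buggy"
printf 'with 1!G:        %s\n' "$fixed"
```

The buggy buffer ends in `\n$`, which means the extra empty line comes after the last real line when the buffer is printed.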

Problem 2: “Output is in the same order”

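Why: You accumulated with H (append pattern to hold), which adds each new line below the earlier ones.
Fix: Use G followed by h so each new line lands above the accumulated lines.
Quick test: On a 3-line file, compare sed -n 'H;${g;p;}' (original order, with a leading blank line) with sed -n '1!G;h;$p' (reversed).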

Problem 3: “Every line prints immediately”

Why: You forgot -n or used p without $.
Fix: Add -n and print only on the last line with $p.
Quick test: sed -n '1!G;h;$p' file.txt

Problem 4: “Script fails on huge files”

Why: This method stores the entire file in memory.
Fix: Use tac (GNU coreutils) or tail -r (BSD/macOS) for large inputs in production.
Quick test: Measure memory usage on large files and compare with tac.

Definition of Done

[ ] sed -n '1!G;h;$p' file.txt produces the exact reversed order.
[ ] No extra blank line appears in the output.
[ ] The script works for empty files and single-line files.
[ ] You can explain why G is skipped on line 1.
[ ] You documented the memory trade-off vs tac.

Summary

Project	Difficulty	Key Learning
1. Config File Updater	Beginner	Basic substitution (`s`), in-place editing (`-i`).
2. Log File Cleaner	Beginner	Regex capturing groups (`\(...\)`, `\1`) for reformatting.
3. Markdown to HTML	Intermediate	Writing multi-command scripts, command order.
4. Multi-Line Address Parser	Advanced	The Hold Space (`H`, `g`, `x`) for multi-line logic.
5. Reversing a File	Advanced	Mastery of the hold space and advanced flow control.

Concept	Description	Example
Capturing groups (BRE)	In Basic Regular Expressions, parentheses must be escaped: `\(` and `\)`	`sed 's/\(hello\)/\1 world/'`
Capturing groups (ERE)	In Extended Regular Expressions (`-E` flag), use unescaped parentheses	`sed -E 's/(hello)/\1 world/'`
Backreferences	Reference captured groups with `\1`, `\2`, `\3`, etc. in the replacement	`sed -E 's/(a)(b)/\2\1/'` swaps a and b
Extended regex flag	Use `-E` (POSIX) or `-r` (GNU) to enable ERE syntax	`sed -E 's/(pattern)/\1/'`
Pattern negation	The `!` modifier inverts the address match	`/ERROR/!d` deletes non-ERROR lines
The `d` command	Deletes the pattern space and starts the next cycle without printing	`sed '/DEBUG/d'` removes DEBUG lines
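The rows above can be tried directly. The sketch below runs one command per concept; the sample log lines and the helper name `sample_log` are made up for this demo.

```shell
# BRE: capture groups need escaped parentheses.
bre=$(printf 'hello\n' | sed 's/\(hello\)/\1 world/')

# ERE (-E): plain parentheses capture.
ere=$(printf 'hello\n' | sed -E 's/(hello)/\1 world/')

# Two groups swapped through backreferences.
swapped=$(printf 'ab\n' | sed -E 's/(a)(b)/\2\1/')

# Hypothetical sample log for the address examples.
sample_log() { printf 'INFO start\nDEBUG cache miss\nERROR disk full\n'; }

# ! inverts the address: delete every line that does NOT match ERROR.
only_errors=$(sample_log | sed '/ERROR/!d')

# Plain address: delete every line that matches DEBUG.
no_debug=$(sample_log | sed '/DEBUG/d')

printf '%s\n' "$bre" "$ere" "$swapped" "$only_errors" "$no_debug"
```

Both substitution forms produce `hello world`; the only difference is how the parentheses are written.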
