Learn The sed Command: From Stream Editor to Text Manipulation Wizard
Goal: Build deep, operational mastery of sed as a stream editor: how it reads input, how the execution cycle works, how addresses select lines, how regular expressions and substitutions transform text, and how the hold space enables multi-line logic. By the end, you will be able to design safe, portable sed scripts that automate config edits, log cleanup, and structured text transformations with confidence. You will also learn how to debug sed programs, reason about their side effects, and choose when sed is the right tool versus awk, perl, or a full scripting language.
Introduction
sed is a stream editor: it reads text from a file or pipeline, applies a script of editing commands to each line (or group of lines), and writes the transformed output to standard output. It is designed for one-pass, streaming transformations, which makes it fast and composable in Unix pipelines. It is not an interactive editor; it is a programmable text transformation engine.
What you will build (by the end of this guide):
- One-liner and script-based sed tools that safely edit configuration files
- Log-cleaning pipelines that extract and reformat structured messages
- A mini Markdown-to-HTML converter built entirely with sed scripts
- Multi-line parsing tools that use the hold space for block-level edits
- Advanced stream transformations like reversing files line-by-line
Scope (what is included):
- The sed execution model (pattern space, hold space, execution cycle)
- Addressing, ranges, and selection logic
- Regular expressions (BRE and ERE) as used by sed
- Substitution mechanics and backreferences
- Script structure, control flow, and multi-line techniques
- In-place editing and portability across GNU/BSD implementations
Out of scope (for this guide):
- Building full parsers for complex languages (use a parser generator)
- Large-scale ETL pipelines (use awk, python, or jq for JSON)
- Binary file manipulation (use od, xxd, or dedicated tools)
The Big Picture (Mental Model)
Input Stream           sed Script                Output Stream
(file/stdin)     (addresses + commands)          (stdout/file)
     |                      |                          |
     v                      v                          v
+---------+   +---------------------------+     +--------------+
| line 1  |-->| addr -> cmd -> cmd -> cmd |---->| transformed  |
| line 2  |-->| addr -> cmd -> cmd -> cmd |---->| transformed  |
| line 3  |-->| addr -> cmd -> cmd -> cmd |---->| transformed  |
+---------+   +---------------------------+     +--------------+
                    ^         ^
                    |         |
                    |    +------------+
                    +----| hold space |
                         +------------+
Key Terms You Will See Everywhere
- Pattern space: The current working buffer sed edits each cycle.
- Hold space: A secondary buffer that persists across cycles.
- Address: A selector that chooses which lines a command applies to.
- Script: A sequence of editing commands (s, d, p, n, etc.).
- Cycle: The read -> apply commands -> print -> reset loop.
- BRE/ERE: Basic vs Extended Regular Expressions.
- Backreference: A reference to captured text in a replacement.
How to Use This Guide
- Read the Theory Primer first. Treat it like a mini-book. It builds the mental models you need to reason about sed correctly.
- Then complete the projects in order. Each project adds a new capability and reuses the previous concepts.
- Always validate by inspection. Use -n with explicit p to avoid accidental output.
- Practice portability. Test your commands on both GNU and BSD sed if you can (Linux vs macOS).
- Keep a scratch file. You will learn faster by experimenting and observing pattern/hold space behavior.
Prerequisites & Background
Before starting these projects, you should have foundational understanding in these areas:
Essential Prerequisites (Must Have)
Command-Line Skills:
- Navigating directories, redirecting input/output, and piping commands
- Creating, viewing, and editing files with cat, printf, and a text editor
- Basic shell scripting (variables, quoting, and subshells)
Text and Regex Fundamentals:
- What a regular expression is and how it matches text
- The meaning of ., *, ^, $ and character classes like [0-9]
- The idea of capturing groups and backreferences
Unix Tools Fundamentals:
- When to use grep, awk, and sed
- The difference between stdout, stderr, and files
Helpful But Not Required
Advanced Regex:
- Lookarounds and non-greedy matching (not supported in sed BRE/ERE, but helpful context)
- Named groups (not supported in sed, but helps mental mapping)
Scripting Discipline:
- Writing small test fixtures
- Running scripts against sample data and checking diffs
Self-Assessment Questions
Before starting, ask yourself:
- Can I explain what a pipeline does in a Unix shell?
- Do I understand the difference between stdout and an in-place edit?
- Can I read and write a simple regex like ^ERROR or [0-9]+?
- Am I comfortable editing a file with a script rather than a GUI editor?
- Do I know how to make backups before destructive edits?
If you answered “no” to any of the first three questions: Spend a few days with basic CLI and regex resources before starting.
If you answered “yes” to all five: You are ready to begin.
Development Environment Setup
Required Tools:
- A Unix-like shell (Linux, macOS, WSL)
- sed (GNU or BSD)
- A text editor (vim, nano, VS Code)
Recommended Tools:
- rg (ripgrep) for fast pattern searches
- diff or git diff for validating file edits
- perl or awk for comparison exercises
Testing Your Setup:
# Check which sed you have (GNU sed supports --version; BSD sed does not)
$ if sed --version >/dev/null 2>&1; then sed --version | head -n 1; else echo 'non-GNU sed (BSD or other)'; fi
# Verify ERE support (GNU: -E or -r, BSD: -E)
$ printf 'a1\n' | sed -E 's/[0-9]+/X/'
aX
# Confirm you can do a dry-run substitution
$ printf 'DEBUG=true\n' | sed 's/true/false/'
DEBUG=false
Time Investment
- Simple projects (1, 2): Weekend (4-8 hours each)
- Moderate projects (3): 1 week (10-20 hours)
- Complex projects (4, 5): 1-2 weeks each
- Total sprint: 5-8 weeks if done sequentially
Important Reality Check
sed mastery is a compounding skill. The learning happens in layers:
- First pass: Make it work (copy-paste is fine to start)
- Second pass: Understand what each command does
- Third pass: Understand why the command order matters
- Fourth pass: Understand portability and failure modes
This is normal. Stream editing is subtle. Take your time.
Big Picture / Mental Model
sed is a tiny virtual machine with two buffers and a loop.
                +------------------+
Input stream -->|  Pattern space   |--> output (unless -n)
                +------------------+
                    ^          |
                    |          v
                +------------------+
                |    Hold space    |
                +------------------+
Cycle:
1) Read next line into pattern space
2) Run script commands in order
3) If not suppressed, print pattern space
4) Clear pattern space and repeat
Once you internalize this loop, every sed script becomes predictable.
Theory Primer
This section is the mini-book. Each chapter is a concept cluster you will use in the projects.
Chapter 1: Stream Editing Execution Model
Fundamentals
sed is built around a repeatable execution cycle. Think of it as a deterministic loop that reads one line, applies a fixed script, and then outputs. This is the core reason sed is fast and composable: it does not need to load an entire file into memory, and it can start producing output as soon as the first line is processed. Most sed programs are therefore streaming transformations rather than whole-file transformations.
The cycle has a working buffer called the pattern space. Each cycle starts with sed reading the next line of input into the pattern space. It then runs your script (a list of commands like s, d, p, n, q) in order. After those commands run, the pattern space is printed automatically unless printing is suppressed with -n or the line has been deleted with d. The cycle then repeats.
This mental model explains most of sed's behavior. If you understand how and when the pattern space changes, you will always be able to predict the output. It also explains why command order matters: each command sees the result of the previous command in the same pattern space.
There is also a second buffer, the hold space. Unlike the pattern space, the hold space persists across cycles. It exists specifically to solve the problem of multi-line logic in a tool designed for line-at-a-time processing. Commands like h, H, g, G, and x let you move data between the pattern space and hold space. You will use this later for advanced projects, but it is still part of the execution model from the beginning.
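The following sketch uses only POSIX commands to show that the hold space persists across cycles. `1h` copies line 1 into the hold space, and `$G` appends it to the last line several cycles later. The function name `repeat_first_at_end` and the input are illustrative only.

```sh
#!/bin/sh
# Hypothetical helper: repeat the first input line after the last one.
#   1h  -> on line 1, copy the pattern space into the hold space
#   $G  -> on the last line, append a newline plus the hold space
repeat_first_at_end() {
  sed '1h;$G'
}

printf 'alpha\nbeta\ngamma\n' | repeat_first_at_end
# alpha
# beta
# gamma
# alpha
```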
In short: sed is a small virtual machine with a loop and two buffers. If you understand the loop and the buffers, you understand sed.
Deep Dive into the Concept
The stream-editing model is not just an implementation detail; it is a programming style. When you write a sed script, you are writing instructions for a deterministic pipeline. That pipeline has invariants:
- Invariant 1: The input is processed in order. sed reads input sequentially, line by line. A single cycle cannot reach backward unless you stored something in the hold space earlier.
- Invariant 2: Commands run in script order. If you run s/old/new/ and then s/new/old/, you will likely undo your own work. Order is a feature, not an accident.
- Invariant 3: The pattern space is the only mutable working buffer. Unless you explicitly copy something into the hold space, all state is transient and will be replaced by the next line.
- Invariant 4: Automatic output is a side effect of the end of a cycle. sed prints automatically unless you explicitly suppress printing or delete the pattern space.
The last invariant is why -n is so powerful. With -n, you disable automatic printing and take full control with p. This is the safest way to develop sed scripts because it prevents accidental output when a command fails to match. It also makes complex scripts easier to reason about: output becomes an explicit decision rather than a default side effect.
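The difference is easy to see on a two-line input. Without `-n`, the `p` command prints each line once and the end-of-cycle auto-print prints it again. With `-n`, only explicit `p` produces output. The sample input is arbitrary.

```sh
#!/bin/sh
# Without -n: p prints the line, then the cycle auto-prints it again.
printf 'one\ntwo\n' | sed 'p'
# one
# one
# two
# two

# With -n: only the explicit p produces output.
printf 'one\ntwo\n' | sed -n 'p'
# one
# two

# Typical use: print only the lines an address selects.
printf 'one\ntwo\n' | sed -n '/two/p'
# two
```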
The execution cycle also interacts with commands that explicitly advance input. The n command prints the pattern space (unless -n is in effect), replaces it with the next input line, and continues with the next command in the script; it does not start a new cycle. The N command appends a newline and the next input line to the current pattern space, creating a multi-line pattern space. The moment you use N, your script changes from line-at-a-time processing to multi-line processing. This is a powerful technique but also a common source of bugs. When the pattern space contains embedded newlines, the regex anchors ^ and $ still match only the beginning and end of the entire pattern space, not each line (GNU sed offers an M flag to change this, but it is not portable). This matters when you attempt to apply line-based logic to multi-line buffers.
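Here is a short sketch contrasting n and N. The inputs have an even number of lines on purpose: what N does when no next line exists differs between implementations (GNU sed prints the pattern space, while POSIX says to quit without printing), and this example avoids that case.

```sh
#!/bin/sh
# n: print the pattern space (suppressed here by -n), then load the next line.
# 'n;p' therefore prints only the even-numbered lines.
printf '1\n2\n3\n4\n' | sed -n 'n;p'
# 2
# 4

# N: append "\n" plus the next line; replacing that newline joins each pair.
printf 'a\nb\nc\nd\n' | sed 'N;s/\n/ /'
# a b
# c d

# ^ matches only at the start of the whole multi-line pattern space,
# so even with the g flag only one prefix is inserted.
printf 'a\nb\n' | sed 'N;s/^/> /g'
# > a
# b
```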
Another practical constraint: POSIX requires the pattern space and hold space to hold at least 8192 bytes. Many implementations allow larger buffers, but multi-line accumulation can still hit limits on constrained systems. Design multi-line scripts that keep buffers small and stream whenever possible.
Another subtle point: sed is defined by POSIX and implemented by GNU, BSD, BusyBox, and other variants. The core cycle is the same, but the limits and extensions can differ. For example, the POSIX specification describes a pattern space and hold space and explains the read-edit-print cycle, while GNU sed adds debugging support and extra flags. Understanding the model (and the buffer limits above) allows you to write scripts that are portable and correct across these variants.
Failure modes are often execution-cycle failures:
- Double printing: forgetting -n and using p results in duplicate output.
- Unexpected deletion: using d mid-script discards the pattern space and immediately starts the next cycle, skipping commands that follow.
- Silent no-op: a command that is not addressed to the correct lines does nothing, so your output remains unchanged.
- Misordered commands: applying transformations in the wrong order makes later patterns fail or match unintended text.
A good debugging practice is to start with -n, then add p for the lines you want to inspect. When you are confident, remove -n or keep it and explicitly print only what you want. This approach aligns with the execution model and reduces mistakes.
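The workflow above might look like this in practice. This is a sketch that assumes a made-up config with a `port=` key. It verifies the selection first, then previews only the changed lines with the `p` flag of `s`, and finally emits the full stream.

```sh
#!/bin/sh
# Hypothetical sample input standing in for a config file.
sample='host=example.local
port=80'

# Step 1: confirm the address selects exactly the expected line.
printf '%s\n' "$sample" | sed -n '/^port=/p'
# port=80

# Step 2: substitute under -n; the s///p flag prints only lines that changed.
printf '%s\n' "$sample" | sed -n 's/^port=.*/port=8080/p'
# port=8080

# Step 3: drop -n and the p flag to emit the whole transformed stream.
printf '%s\n' "$sample" | sed 's/^port=.*/port=8080/'
# host=example.local
# port=8080
```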
How This Fits on Projects
The execution model powers every project in this guide. Projects 1 and 2 rely on understanding that each line is processed independently unless you explicitly combine lines. Projects 3 and 4 require careful ordering of multiple commands in a single script. Project 5 depends on mastering the cycle and using the hold space to accumulate results across cycles.
Definitions and Key Terms
- Stream editor: A tool that processes input sequentially and produces output as it goes, without loading the entire file.
- Pattern space: The current line or multi-line buffer being edited.
- Hold space: A persistent secondary buffer for cross-line state.
- Cycle: The read -> edit -> print -> reset loop.
- Automatic printing: Default behavior that prints pattern space at the end of each cycle.
- Script: The ordered list of sed commands.
Mental Model Diagram
Cycle N
--------
1) Read input line into pattern space
2) Run commands in order
3) (Optional) Print pattern space
4) Clear pattern space
5) Next line -> next cycle
Hold Space
----------
Persists across cycles; used only when you copy or exchange data
How It Works (Step-by-Step)
- sed reads a line from stdin or a file into the pattern space.
- It runs each command in your script from top to bottom.
- If a command deletes the pattern space (d), it immediately starts the next cycle.
- If printing is not suppressed, the pattern space is printed at the end.
- The pattern space is cleared, and the next line is read.
- The hold space is unchanged unless you explicitly modify it.
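The steps above predict that a `d` prevents later commands from running on that line. Here is a minimal check with arbitrary input: the second expression never sees the `drop` line, and nothing is printed for it.

```sh
#!/bin/sh
# d discards the pattern space and starts the next cycle immediately,
# so the appended marker is added only to lines that survive.
printf 'keep\ndrop\nkeep too\n' | sed -e '/drop/d' -e 's/$/ [seen]/'
# keep [seen]
# keep too [seen]
```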
Minimal Concrete Example
# Print only the first three lines (explicit output control)
$ sed -n '1,3p' file.txt
Common Misconceptions
- Misconception: sed loads the entire file into memory.
- Correction: sed is a stream editor; it works line by line by default.
- Misconception: -n is optional and only for performance.
- Correction: -n is for correctness and control of output.
- Misconception: The hold space is always used automatically.
- Correction: The hold space is only used when you explicitly move data.
Check-Your-Understanding Questions
- What is the difference between the pattern space and the hold space?
- Why does sed -n prevent accidental duplicate output?
- What happens to the pattern space when the d command executes?
- When you use N, how does the pattern space change?
Check-Your-Understanding Answers
- The pattern space is the current working buffer for the line being processed; the hold space is a persistent buffer across cycles.
- The -n option disables automatic printing, so only explicit p commands output text.
- The current pattern space is deleted and sed immediately starts the next cycle.
- N appends the next input line to the current pattern space with a newline, creating a multi-line buffer.
Real-World Applications
- Filtering and transforming logs in real time
- Applying safe configuration edits in deployment scripts
- Converting structured text formats without loading large files
- Building data cleanup stages in shell pipelines
Where You Will Apply It
- Project 1: Config File Updater
- Project 2: Log File Cleaner
- Project 3: Markdown to HTML Converter
- Project 4: Multi-Line Address Parser
- Project 5: Reversing a File
References
- GNU sed Manual: overview, command behavior, and GNU extensions
- https://www.gnu.org/software/sed/manual/sed.html
- POSIX sed specification (execution cycle, pattern and hold space, minimum buffer size)
- https://man7.org/linux/man-pages/man1/sed.1p.html
- OpenBSD sed manual (portable behavior reference)
- https://man.openbsd.org/OpenBSD-current/sed.1
Key Insight
If you can predict the state of the pattern space at every point in the script, you can predict the output.
Summary
The execution cycle is the backbone of sed. Learn it once and every command becomes an understandable transformation rather than a magic incantation.
Homework/Exercises to Practice the Concept
- Write a sed command that prints only odd-numbered lines from a file.
- Write a sed command that prints lines 5 through 10 and nothing else.
- Build a one-liner that deletes any blank line and prints the rest.
Solutions to the Homework/Exercises
- sed -n 'p;n' file.txt (portable: print a line, then n silently consumes the next one); GNU sed also accepts sed -n '1~2p' file.txt.
- sed -n '5,10p' file.txt
- sed '/^$/d' file.txt
Chapter 2: Addressing and Line Selection
Fundamentals
Addresses are how you tell sed where a command should apply. Without an address, a command applies to every line. With an address, it applies only to lines that match a specific line number, range, or regular expression. This is the core of sed precision: the same substitution command can be either a blunt hammer or a surgical scalpel depending on the address you attach to it.
The three most common address types are:
- Line numbers: 3s/foo/bar/ applies only to line 3.
- Ranges: 5,10d deletes lines 5 through 10.
- Regex patterns: /ERROR/s/foo/bar/ applies only to lines matching ERROR.
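The three address types can be compared on the same made-up four-line input. This sketch runs each one on its own, so the effect of each address is isolated.

```sh
#!/bin/sh
input='a foo
b foo
c foo
ERROR foo'

# Line-number address: substitute only on line 3.
printf '%s\n' "$input" | sed '3s/foo/bar/'
# a foo
# b foo
# c bar
# ERROR foo

# Range address: delete lines 2 through 3.
printf '%s\n' "$input" | sed '2,3d'
# a foo
# ERROR foo

# Regex address: substitute only on lines containing ERROR.
printf '%s\n' "$input" | sed '/ERROR/s/foo/bar/'
# a foo
# b foo
# c foo
# ERROR bar
```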
A context address wraps the regex in delimiters, traditionally /. To use another delimiter character c (anything except backslash or newline), precede the opening one with a backslash, as in \%regex% or \|regex|. If your pattern contains many slashes, switching delimiters keeps the address readable.
Addresses can be combined with the ! negation operator, which is written after the address: /ERROR/!d deletes every line that does not match ERROR, leaving only the matching lines. Inverting logic this way is essential for filters.
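Here is a sketch combining both ideas on a made-up list of paths. The address uses `%` as its delimiter, introduced by a backslash, so the slashes in the pattern need no escaping. The `!` after the address inverts the selection.

```sh
#!/bin/sh
paths='/usr/local/bin
/usr/bin
/opt/tool/bin'

# Custom delimiter: \%...% instead of /.../, so / needs no escaping.
printf '%s\n' "$paths" | sed -n '\%^/usr/%p'
# /usr/local/bin
# /usr/bin

# Negation: delete every line that does NOT match, i.e. keep the matches.
printf '%s\n' "$paths" | sed '\%^/usr/%!d'
# /usr/local/bin
# /usr/bin

# Print only the lines that do not match.
printf '%s\n' "$paths" | sed -n '\%^/usr/%!p'
# /opt/tool/bin
```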
The range address is especially powerful. /START/,/END/ selects a block from the first line matching START through the next line matching END, inclusive. This turns sed into a block processor that can isolate sections of a file without loading the whole file into memory.
Deep Dive into the Concept
Addresses are evaluated for each cycle and determine whether a command runs. This seems simple until you combine multiple addresses and commands. Here are key details and edge cases:
- Range addresses have state. When you specify /START/,/END/, the range is inactive until a line matches START. After that, all lines are selected until a line matches END (the END regex is first tested on the line after START). Then the range deactivates and waits for the next START. If no END line appears, the range extends to the end of input. This statefulness is why ranges can be used to process repeated blocks in a file.
- Ranges are inclusive. Both the START and END lines are included. If you want to exclude the START or END line from output, you need an extra command to delete or print conditionally.
- Regex addresses are evaluated against the current pattern space. If you have already modified the line with s, the address for subsequent commands sees the modified text, not the original line. This matters when you chain commands. If you want to address on the original line, put the address-based command before any substitution that changes the line.
- The $ address selects the last line. This is commonly used for end-of-file logic or for printing a summary once.
- A range can be negated. For example, 1,10!d means “delete every line that is not in the range 1 through 10.” This is a standard idiom for extracting ranges.
- Order matters across commands. If one command deletes a line, later commands in the script never see it, because d ends the cycle. If you need to transform and then delete, order the commands accordingly.
- GNU extensions exist. GNU sed provides additional addressing conveniences (like 0,/pattern/ for “up to the first match”), but these are not portable. If you care about portability, stick to POSIX address forms.
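The range behaviors listed above can be checked on a made-up document with two marked blocks. This sketch shows a range reactivating at the second START, both markers being included, markers being excluded with a nested block, and the 1,10!d idiom (here 1,3!d).

```sh
#!/bin/sh
doc='intro
START
alpha
END
middle
START
beta
END
outro'

# The range reactivates at each START; both markers are included.
printf '%s\n' "$doc" | sed -n '/^START/,/^END/p'
# START
# alpha
# END
# START
# beta
# END

# Exclude the markers by deleting them inside a block bound to the range.
printf '%s\n' "$doc" | sed -n '/^START/,/^END/{/^START/d;/^END/d;p;}'
# alpha
# beta

# Keep only lines 1-3 by deleting everything outside that range.
printf '%s\n' "$doc" | sed '1,3!d'
# intro
# START
# alpha
```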
Failures in sed scripts often come from addressing mistakes:
- Off-by-one ranges: Forgetting that ranges are inclusive leads to deleting or printing one extra line.
- Overly broad regex addresses: A loose regex like /END/ might match unintended lines and prematurely close a range.
- Modified text confusion: An address evaluated after substitutions might not match as expected.
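The "modified text confusion" case can be reproduced with two orderings of the same commands, using an arbitrary log line. In the first run, the substitution removes ERROR before the address is tested, so the flag is never added.

```sh
#!/bin/sh
# The address is tested against the CURRENT pattern space:
# "ERROR" is already gone, so the second command never fires.
printf 'ERROR disk full\n' | sed -e 's/ERROR/E/' -e '/ERROR/s/$/ [flagged]/'
# E disk full

# Putting the addressed command first lets it see the original text.
printf 'ERROR disk full\n' | sed -e '/ERROR/s/$/ [flagged]/' -e 's/ERROR/E/'
# E disk full [flagged]
```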
A disciplined approach is to build your address first with -n and p. For example: sed -n '/START/,/END/p' file to confirm the selection, and then add your transformation command.
How This Fits on Projects
Addressing is essential in Projects 1 and 2 where you target specific config lines or log levels. Project 3 depends on addresses to identify headings and inline formatting. Project 4 uses range addresses to process multi-line blocks separated by blank lines. Project 5 uses addresses for “all but first” and “last line” behavior.
Definitions and Key Terms
- Address: A selector that decides whether a command runs on a line.
- Range: A stateful address that selects a start-to-end block.
- Negation (!): Inverts an address selection.
- Last line ($): An address for end-of-file logic.
Mental Model Diagram
Line stream
-----------
1 [no match]
2 [START] -> range activates
3 [in range]
4 [in range]
5 [END] -> range ends
6 [no match]
7 [START] -> range activates again
8 [END] -> range ends
How It Works (Step-by-Step)
- sed reads a line into the pattern space.
- It evaluates the address for the current command.
- If the address matches, the command executes.
- For range addresses, sed maintains internal state to know whether it is currently in-range.
- The next command runs, possibly with a different address.
Minimal Concrete Example
# Print only lines between START and END markers
$ sed -n '/^START/,/^END/p' file.txt
Common Misconceptions
- Misconception: Addresses are evaluated before any command runs.
- Correction: Addresses are evaluated at the moment the command is reached, against the current pattern space.
- Misconception: Ranges only occur once.
- Correction: Ranges can activate multiple times as the file is processed.
- Misconception: /pattern/ always matches the original file line.
- Correction: It matches the current pattern space, which might already be modified.
Check-Your-Understanding Questions
- What is the difference between 3d and /3/d?
- How does /START/,/END/ behave if END appears before START?
- What does 1,10!d do?
- Why might /ERROR/ fail to match after a substitution?
Check-Your-Understanding Answers
- 3d deletes line number 3. /3/d deletes any line containing the character 3.
- The range only activates when START appears; END before START does nothing.
- It deletes every line not in the range 1 through 10 (so it keeps only lines 1-10).
- Because the substitution may have removed or changed the substring ERROR in the pattern space.
Real-World Applications
- Extracting specific sections of config files
- Deleting blocks of text between markers
- Filtering logs to only a severity range
- Applying transformations to only lines matching a pattern
Where You Will Apply It
- Project 1: Targeting a specific config line with a regex address
- Project 2: Filtering ERROR lines and deleting everything else
- Project 3: Targeting headings and emphasis markers
- Project 4: Block selection with range addresses
- Project 5: Line 1 and last-line logic
References
- GNU sed Manual: address forms and command application
- https://www.gnu.org/software/sed/manual/sed.html
- POSIX sed specification: address syntax, range behavior, and execution model
- https://man7.org/linux/man-pages/man1/sed.1p.html
- OpenBSD sed manual: address forms and context address rules
- https://man.openbsd.org/OpenBSD-current/sed.1
Key Insight
Addresses are the “if” statement of sed. They are your control flow for where transformations happen.
Summary
Mastering addresses turns sed from a blunt text replacer into a targeted text surgeon.
Homework/Exercises to Practice the Concept
- Print only the last line of a file.
- Delete all lines between two markers, inclusive.
- Print all lines except those containing the word DEBUG.
Solutions to the Homework/Exercises
- sed -n '$p' file.txt
- sed '/^START/,/^END/d' file.txt
- sed '/DEBUG/d' file.txt
Chapter 3: Regular Expressions and Substitution
Fundamentals
The s (substitute) command is the heart of sed. Its syntax is simple: s/regex/replacement/flags. The power comes from the regular expression and the replacement string. sed uses Basic Regular Expressions (BRE) by default. In BRE, grouping parentheses and interval braces must be escaped to act as operators (\( \) and \{ \}), while +, ?, and | are ordinary characters (GNU sed accepts \+, \?, and \| as extensions). With Extended Regular Expressions (ERE), enabled by -E, the characters (, ), {, }, +, ?, and | are operators without escaping. Some implementations (GNU and OpenBSD) also accept -r as an alias for extended regex, but -E is the portable, POSIX-friendly choice.
Substitution is not just find-and-replace. With capturing groups and backreferences, you can reformat text, reorder fields, and normalize syntax. For example, you can convert YYYY/MM/DD to DD-MM-YYYY or extract only the meaningful portion of a log line.
The replacement string has its own mini-language. & expands to the entire matched text. \1, \2, etc. expand to captured groups. To include a literal &, the delimiter, or a backslash, escape it with a backslash (\&, \/, \\).
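Here is a sketch of the replacement mini-language, including the YYYY/MM/DD to DD-MM-YYYY reordering mentioned above, written in portable BRE. The sample strings are made up. Note the escaped \( \) and \{ \}, and `#` used as the s delimiter so the slashes in the date need no escaping.

```sh
#!/bin/sh
# BRE groups and intervals are escaped; \3-\2-\1 reorders the captures.
printf '2025/01/31\n' | sed 's#\([0-9]\{4\}\)/\([0-9]\{2\}\)/\([0-9]\{2\}\)#\3-\2-\1#'
# 31-01-2025

# & inserts the entire match.
printf 'id 42\n' | sed 's/[0-9][0-9]*/<&>/'
# id <42>

# \& inserts a literal ampersand instead.
printf 'salt pepper\n' | sed 's/ / \& /'
# salt & pepper
```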
Deep Dive into the Concept
Regex and substitution are the most common source of subtle sed bugs. The key is to distinguish between matching and replacing:
- Matching (left side): This is a regex. It can include anchors (^, $), character classes ([[:digit:]]), and grouping (\(...\) in BRE or (...) in ERE).
- Replacement (right side): This is not a regex; it is a literal string with backreference expansions. A dot . in the replacement is just a dot. Parentheses are not groups in the replacement. Only & and \1, \2, etc. have special meaning.
A typical mistake is to forget that BRE requires escaping of grouping parentheses. Another is to forget that the default substitution replaces only the first match on each line. The g flag is required to replace all matches on a line. The p flag prints lines where a substitution occurred (useful with -n). The i (or I) flag, a non-POSIX extension supported by GNU sed and FreeBSD/macOS sed, makes matching case-insensitive.
Delimiters are another common source of errors. The slash / is conventional but not required. You can use s|a/b|c/d| to avoid escaping every slash when manipulating paths. The delimiter is any single character that does not appear unescaped in the regex or replacement.
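The first-match default, the g flag, and an alternate delimiter can be compared directly. This is a sketch on arbitrary strings. The last two commands make the same edit, once with `|` as the delimiter and once with escaped slashes.

```sh
#!/bin/sh
# Default: only the first match on the line is replaced.
printf 'a-b-c\n' | sed 's/-/+/'
# a+b-c

# g: every match on the line is replaced.
printf 'a-b-c\n' | sed 's/-/+/g'
# a+b+c

# Alternate delimiter: slashes in the path need no escaping.
printf '/usr/local/bin\n' | sed 's|/usr/local|/opt|'
# /opt/bin

# Same edit with the default delimiter: every slash must be escaped.
printf '/usr/local/bin\n' | sed 's/\/usr\/local/\/opt/'
# /opt/bin
```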
Substitution interacts deeply with the execution cycle. If you do multiple substitutions in the same script, they operate on the already modified pattern space. This is both a feature and a trap. For example:
# Order matters
$ printf 'foo\n' | sed -e 's/foo/bar/' -e 's/bar/baz/'
baz
This will convert foo to baz, even though you never matched baz in the original input. That is correct behavior, but if you did not expect it, your output will be surprising.
Regex in sed follows POSIX ERE/BRE rules, which means:
- No non-greedy quantifiers
- No lookahead or lookbehind
- No named capture groups
- Anchors and classes behave predictably and portably
These limits are features for portability. They force you to write explicit patterns. If you need advanced regex features, perl is often a better tool.
Failure modes in substitution include:
- Under-escaping: in BRE, unescaped parentheses or braces are matched literally, so no groups are created and a \1 in the replacement is rejected as an invalid reference.
- Over-escaping: escaping an ordinary character has undefined or implementation-specific meaning (in GNU BRE, \+ and \? become operators), which can lead to mismatches.
- Missing g: only the first occurrence is replaced.
- Greedy patterns: .* can swallow too much and leave capture groups empty.
- Unintended matches: patterns that are too loose (like .*) match lines you did not intend.
A disciplined approach is to start by matching only and printing with -n and p. Once the regex matches the correct part, add a substitution. Then introduce backreferences one at a time.
How This Fits on Projects
Projects 1 and 2 are direct substitutions with anchors and capturing groups. Project 3 requires multiple ordered substitutions to implement a small Markdown parser. Projects 4 and 5 depend on regex working across multi-line pattern spaces, where the newlines embedded by N can be matched explicitly with \n (in GNU sed, . also matches them).
Definitions and Key Terms
- BRE: Basic Regular Expressions (default in sed).
- ERE: Extended Regular Expressions (enabled with -E).
- Backreference: \1, \2, etc., referencing captured groups.
- Delimiter: The character that separates the s command parts.
- Flag: A modifier like g, p, or i.
Mental Model Diagram
Input line: [2025-01-01] [ERROR] Disk full
Regex: ^\[[^]]+\] \[([A-Z]+)\] (.*)$
Groups: \1=ERROR \2=Disk full
Replacement: \1: \2
Output: ERROR: Disk full
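The diagram above corresponds to this runnable command, assuming a sed that supports -E (GNU and the BSDs do). With ERE, the unescaped parentheses form groups 1 and 2, \[ and \] match literal brackets, and [^]]+ means "one or more characters that are not ]".

```sh
#!/bin/sh
printf '[2025-01-01] [ERROR] Disk full\n' |
  sed -E 's/^\[[^]]+\] \[([A-Z]+)\] (.*)$/\1: \2/'
# ERROR: Disk full
```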
How It Works (Step-by-Step)
- sed evaluates the regex against the pattern space.
- If the regex matches, capture groups are recorded.
- The replacement string is constructed using literals, &, and backreferences.
- The matched portion of the pattern space is replaced with the constructed string.
- Flags decide whether to replace globally or print on substitution.
Minimal Concrete Example
# Convert a date from MM/DD/YYYY to YYYY-MM-DD
$ printf '12/31/2025\n' | sed -E 's#([0-9]{2})/([0-9]{2})/([0-9]{4})#\3-\1-\2#'
2025-12-31
Common Misconceptions
- Misconception: () works for groups without -E.
- Correction: In BRE, you must escape parentheses: \( ... \).
- Misconception: The replacement is also a regex.
- Correction: The replacement is a literal string with backreferences.
- Misconception: s/foo/bar/ replaces all occurrences.
- Correction: It replaces only the first match on each line unless you add g.
Check-Your-Understanding Questions
- What is the difference between BRE and ERE in sed?
- Why does s/foo/bar/ not replace multiple occurrences on the same line?
- What does & mean in the replacement string?
- When should you change the delimiter in s///?
Check-Your-Understanding Answers
- In BRE, () and {} must be escaped to act as operators, and + and ? are literal (GNU sed accepts \+ and \? as extensions); ERE treats all of them as operators without escaping.
- The s command replaces only the first match by default; use the g flag for all matches.
- & expands to the entire matched text.
- When the pattern or replacement contains many / characters, using another delimiter reduces escaping.
Real-World Applications
- Normalizing date formats in logs
- Reformatting CSV fields
- Redacting secrets or tokens in configuration files
- Converting inline Markdown emphasis to HTML tags
Where You Will Apply It
- Project 1: Config value substitution
- Project 2: Capturing log message bodies
- Project 3: Markdown conversion
References
- GNU sed Manual: s command, replacement semantics, and regex options
- https://www.gnu.org/software/sed/manual/sed.html
- OpenBSD sed manual (BRE by default, -E for ERE, -r alias)
- https://man.openbsd.org/OpenBSD-current/sed.1
- POSIX sed specification (BRE/ERE definitions and portability)
- https://man7.org/linux/man-pages/man1/sed.1p.html
Key Insight
Substitution is not just replacement; it is structured text extraction and recomposition.
Summary
Regex and substitution are the core of sed power. Learn them precisely and your scripts will be short, reliable, and expressive.
Homework/Exercises to Practice the Concept
- Replace all tabs with a single space in a file.
- Convert Last, First into First Last.
- Remove trailing whitespace from every line.
Solutions to the Homework/Exercises
- sed 's/\t/ /g' file.txt (\t is a GNU extension; portably, use sed "s/$(printf '\t')/ /g" file.txt)
- sed -E 's/^([^,]+), ([^ ]+)$/\2 \1/' file.txt
- sed 's/[[:space:]]*$//' file.txt
Chapter 4: Script Composition and Control Flow
Fundamentals
A sed script is just a sequence of commands. But as soon as you have more than one command, ordering and control flow matter. You can write scripts inline with -e or in a separate file with -f, and you can group commands with { ... } to apply them under a shared address. These features turn sed from a one-liner tool into a tiny programming language.
The basic control flow tools are:
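- b (branch): jump to a label unconditionally (or to the end of the script if no label is given)
- t (test): jump to a label only if a substitution has succeeded since the last input line was read or the last t branch was taken
- q (quit): stop processing and exit, printing the pattern space first unless -n is in effect
- d (delete): delete the pattern space and start the next cycle
- n (next line): print the pattern space unless -n is in effect, then read the next line into the pattern space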
You do not need these for simple substitutions, but you do for complex multi-step scripts, especially when you want to avoid repeated work or create custom output logic.
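Here is a sketch of a small script file run with -f. It shows comments, an address attached to a { } block, and command ordering. The temporary file, the `tidy_config` helper name, and the sample input are all hypothetical. The script is written to a file from `mktemp` and removed when the shell exits.

```sh
#!/bin/sh
# Write a small sed program to a temporary file (the name is arbitrary).
script=$(mktemp)
trap 'rm -f "$script"' EXIT

cat > "$script" <<'EOF'
# 1) Strip trailing whitespace from every line.
s/[[:space:]]*$//
# 2) On comment lines only, normalize "#   text" to "# text".
/^#/{
s/^#[[:space:]]*/# /
}
# 3) Drop blank lines.
/^$/d
EOF

# Hypothetical wrapper so the script can be reused in a pipeline.
tidy_config() {
  sed -f "$script"
}

printf '#   settings  \n\nmode=fast \n' | tidy_config
# # settings
# mode=fast
```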
Deep Dive into the Concept
Script composition is where sed starts to feel like a language rather than a command. The rules are strict but consistent:
- Commands run in order. Each command sees the pattern space as modified by earlier commands. This means you can build pipelines within a single line transformation. It also means you can easily create conflicts if you reorder commands incorrectly.
- Addressing applies to command blocks. You can attach an address to a { ... } block, which makes a group of commands conditional. This is how you create “if” statements without a real if.
- Branching changes flow but not state. The b and t commands jump to labels in the script. They do not reset the pattern space. This allows you to implement loops and conditional execution, especially when combined with substitutions and the t command, which checks whether a substitution succeeded.
- d and q are cycle-breaking commands. d ends the current cycle immediately and starts the next one with a new line. q exits the entire program. These are powerful but dangerous if placed in the wrong spot. For example, a d in the middle of a script means later commands will never execute on that line.
- Script files improve clarity and portability. As scripts grow, inline -e becomes unreadable. A .sed file allows comments, indentation, and easier maintenance. This is especially useful for multi-step transformations like Project 3 (Markdown conversion).
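To see why order matters, here is a sketch with two substitutions that feed into each other. Only the order changes between the two runs.

```shell
# cat -> dog, then dog -> bird: the second command sees the first one's output.
printf 'cat\n' | sed -e 's/cat/dog/' -e 's/dog/bird/'   # bird

# Reversed: dog -> bird finds nothing, then cat -> dog.
printf 'cat\n' | sed -e 's/dog/bird/' -e 's/cat/dog/'   # dog
```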
Control flow is also how you optimize scripts. The t command can skip the remaining commands once a substitution has succeeded. Example: attempt a substitution; if it succeeds, branch to a label that prints and skips extra checks. This reduces redundant regex matching on large files. (GNU sed also offers T, which branches when no substitution has succeeded.)
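Here is a minimal sketch of using t to stop after the first successful rule, using a hypothetical yes/no swap. A bare t (no label) branches to the end of the script, so the remaining commands are skipped for that line. This also prevents the "unexpected fall-through" problem, where a later rule undoes an earlier one.

```shell
# Without t the second rule undoes the first: "yes" -> "no" -> "yes".
printf 'yes\nno\n' | sed -e 's/yes/no/' -e 's/no/yes/'
# yes
# yes

# With t, a line that matched the first rule skips the second.
printf 'yes\nno\n' | sed -e 's/yes/no/' -e 't' -e 's/no/yes/'
# no
# yes
```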
Failure modes are common in control flow:
- Hidden early exit: A q or d placed too early stops later commands.
- Unreachable code: A branch that always jumps past a section makes those commands dead.
- Unexpected fall-through: Forgetting to branch or quit when necessary can apply transformations multiple times.
- Address-block confusion: A {} block with a range address might apply to more lines than intended.
The best way to debug is to add temporary p commands under specific addresses to observe intermediate states. GNU sed also provides --debug to trace execution, which is extremely useful when available.
How This Fits on Projects
Project 3 relies on multiple ordered substitutions and benefits from a .sed script file. Project 4 uses grouped commands to process blocks separated by blank lines. Project 5 uses conditional printing with $ and negation to control when output appears.
Definitions and Key Terms
- Script file: A file passed with -f containing sed commands.
- Command block: { ... } grouped commands under one address.
- Label: A named target for branching (e.g., :loop).
- Branch (b): Unconditional jump to a label.
- Test (t): Conditional jump if a substitution has succeeded since the current input line was read (or since the last t jump).
Mental Model Diagram
Script flow
-----------
/start/ {
s/foo/bar/
t changed
b end
}
:changed
s/bar/BAZ/
:end
How It Works (Step-by-Step)
- sed reads a line into the pattern space.
- It runs the commands in order. A command inside a block runs only if the block's address matches.
- If a substitution succeeds, t label can jump to a label.
- If d runs, the cycle ends immediately.
- If q runs, the pattern space is printed (unless -n is in effect) and the program exits.
- If no early exit occurs, the cycle completes and the pattern space is printed (unless -n is in effect).
Minimal Concrete Example
# Use a script file to convert headings and bold text
$ cat > converter.sed <<'S'
/^[#][ ]/ { s/^# (.*)/<h1>\1<\/h1>/; }
/^[#][#][ ]/ { s/^## (.*)/<h2>\1<\/h2>/; }
s/\*\*([^*]+)\*\*/<strong>\1<\/strong>/g
S
$ sed -E -f converter.sed notes.md
Common Misconceptions
- Misconception: sed scripts cannot branch.
  - Correction: b and t provide branching and conditional control flow.
- Misconception: d just deletes output but keeps running commands.
  - Correction: d ends the cycle immediately.
- Misconception: -e and -f are equivalent in readability.
  - Correction: Script files are far more maintainable for multi-command logic.
Check-Your-Understanding Questions
- What does t label do and when does it branch?
- Why might d prevent later commands from running?
- When should you move from -e to -f?
- How can you conditionally apply multiple commands to a range?
Check-Your-Understanding Answers
- t label branches to the label only if a substitution has succeeded since the current input line was read (or since the last t jump).
- d deletes the pattern space and immediately starts the next cycle, so later commands never see that line.
- When you have multiple commands and need clarity or reuse.
- Use a range address with a { ... } block.
Real-World Applications
- Multi-step log normalization pipelines
- Safe refactoring of config files across many hosts
- Format conversion scripts in CI/CD pipelines
Where You Will Apply It
- Project 3: Multi-command Markdown conversion
- Project 4: Block processing with command blocks
- Project 5: Conditional printing and branching logic
References
- GNU sed Manual: invocation (-e, -f), command summary, and GNU extensions - https://www.gnu.org/software/sed/manual/sed.html
- OpenBSD sed manual: option behavior and script parsing rules - https://man.openbsd.org/OpenBSD-current/sed.1
- POSIX sed specification: standard options and command language - https://man7.org/linux/man-pages/man1/sed.1p.html
Key Insight
sed becomes truly powerful when you treat it as a small programming language with explicit control flow.
Summary
Script structure and control flow are what separate quick one-liners from production-ready transformations.
Homework/Exercises to Practice the Concept
- Write a script file that replaces foo with bar, but only in lines between START and END.
- Use t to perform a second substitution only if the first one matched.
- Write a script that quits after printing the first matching line.
Solutions to the Homework/Exercises
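- sed -f script.sed file.txt, where script.sed contains: /START/,/END/ { s/foo/bar/; }
- sed -e 's/foo/bar/' -e 't second' -e 'b' -e ':second' -e 's/baz/qux/' file.txt (t jumps to second only when the first substitution matched; otherwise the bare b skips to the end of the script)
- sed -n '/pattern/{p;q;}' file.txt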
Chapter 5: Multi-Line Processing and the Hold Space
Fundamentals
By default, sed processes one line at a time. Multi-line logic requires you to deliberately expand the pattern space or use the hold space as persistent storage. The N command appends the next input line to the pattern space separated by a newline. Once you do this, your script is operating on multiple lines at once. The P and D commands are the line-oriented counterparts for printing or deleting only the first line of a multi-line pattern space.
The hold space is a second buffer that persists across cycles. Commands like h, H, g, G, and x let you copy or exchange data between the pattern space and hold space. This is how you implement operations that need memory, like reversing a file, joining lines, or processing blocks.
Deep Dive into the Concept
The hold space is often described as a “scratch pad,” but it is more than that. It is the only way to carry state across cycles. This makes it the secret weapon for advanced sed scripts.
Key mechanics:
- h and H: h copies the pattern space into the hold space (overwriting). H appends a newline followed by the pattern space to the hold space. This lets you accumulate data across multiple cycles.
- g and G: g copies the hold space into the pattern space (overwriting). G appends a newline followed by the hold space to the pattern space. This lets you inject stored data into the current working line.
- x: Exchanges the pattern space and hold space. This is a quick swap and is often used in stateful transformations.
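As a sketch of h and x working together, the command below prints each ERROR line along with the line before it. The log lines are invented sample data. h saves every line; on a match, x swaps the saved line in, p prints it, and a second x swaps back.

```shell
printf 'start\nload config\nERROR disk full\ncontinue\n' |
  sed -n -e '/ERROR/{x;p;x;p;}' -e h
# load config
# ERROR disk full
```

If the very first line matches, the hold space is still empty, so an empty line is printed before it.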
Multi-line pattern spaces introduce subtle regex behavior. Anchors ^ and $ still match the beginning and end of the entire pattern space, not each line. The dot . matches any character, including the embedded newline, so a greedy .* can run across line boundaries. Match the newline explicitly with \n, or (in GNU sed) use [^\n]* to stay within one line. This is why multi-line sed scripts are both powerful and tricky.
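The three one-liners below sketch this behavior on a two-line pattern space built with N. The outputs in the comments reflect GNU sed; other implementations may differ in details.

```shell
# After N, the pattern space is "a\nb".
printf 'a\nb\n' | sed 'N; s/a.b/X/'      # X    (. matched the newline)
printf 'a\nb\n' | sed 'N; s/^b/X/'       # unchanged: ^ is the start of the whole buffer, so a and b print as-is
printf 'a\nb\n' | sed 'N; s/a\nb/a+b/'   # a+b  (\n matches the embedded newline)
```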
Common multi-line patterns:
- Paragraph mode: Use /^$/ to detect blank lines, store a paragraph in the hold space, and then process it as a whole.
- Two-line window: Use N to combine two lines, perform a substitution that uses both, then use P;D to step through the data without losing alignment.
- File reversal: Use G;h to prepend lines into the hold space, then print at the end. This is a classic sed one-liner.
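Two of these patterns fit in one-liners. The first sketch uses a sliding two-line window to drop consecutive duplicate lines (similar to uniq). The second is the classic file reversal (similar to tac). The inputs are sample data.

```shell
# $!N adds the next line; P prints the first line only if it differs from
# the second; D drops the first line and reruns the script on the rest.
printf 'a\na\nb\nb\nb\nc\n' | sed '$!N; /^\(.*\)\n\1$/!P; D'
# a
# b
# c

# 1!G appends the lines collected so far, h saves the result,
# and $p prints the fully reversed buffer once, on the last line.
printf '1\n2\n3\n' | sed -n '1!G;h;$p'
# 3
# 2
# 1
```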
Failure modes:
- Unexpected newlines: Using N without adjusting your regex leads to partial matches.
- Hold space corruption: Overwriting the hold space (h) when you meant to append (H) leads to lost data.
- Infinite loops: Poor use of D and N can create loops that never advance the input.
The safest way to develop multi-line scripts is to start with very small input files, add -n and explicit p, and trace the pattern/hold space at each step. GNU sed offers --debug to help with this on systems where available.
How This Fits on Projects
Project 4 is explicitly about multi-line block processing and hold space usage. Project 5 relies on hold space accumulation and careful control of output on the last line. These are the capstone skills for sed mastery.
Definitions and Key Terms
- N: Append a newline and the next input line to the pattern space.
- P: Print the first line of the pattern space only (up to the first newline).
- D: Delete the first line of the pattern space and restart the cycle on the remainder (without reading new input if text remains).
- Hold space: Persistent buffer across cycles.
- Swap (x): Exchange pattern space and hold space.
Mental Model Diagram
Pattern space: line1\nline2
Hold space:    (saved lines)
Commands:
h -> hold = pattern
H -> hold += "\n" + pattern
g -> pattern = hold
G -> pattern += "\n" + hold
x -> swap pattern <-> hold
How It Works (Step-by-Step)
- Start with a normal line in pattern space.
- N appends the next line, creating a multi-line pattern space.
- Run substitutions that can see both lines.
- Use P to print the first line only, or D to drop the first line and continue.
- Use h, H, g, G, or x to persist state across cycles.
Minimal Concrete Example
# Join every pair of lines with a comma
# ($!N keeps an odd last line on seds where N at end of input exits without printing)
$ sed '$!N; s/\n/, /' file.txt
Common Misconceptions
- Misconception: N starts a new cycle.
  - Correction: N appends the next line to the current pattern space.
- Misconception: P prints the entire pattern space.
  - Correction: P prints only the first line of a multi-line pattern space.
- Misconception: The hold space resets every line.
  - Correction: It persists across cycles until you change it.
Check-Your-Understanding Questions
- What is the difference between H and h?
- Why does N change how ^ and $ behave in regex?
- How do P and D help you slide through multi-line pattern space?
- Why is x useful in multi-line scripts?
Check-Your-Understanding Answers
- h overwrites the hold space; H appends a newline and then the pattern space.
- ^ and $ still match the beginning/end of the entire pattern space, not each line.
- P prints only the first line; D deletes the first line and restarts the cycle with the remainder.
- x swaps the pattern and hold spaces, letting you alternate between stored and current data.
Real-World Applications
- Collapsing multi-line stack traces into single lines
- Extracting blocks of text between markers
- Reformatting multi-line configuration stanzas
- Reversing files in environments without tac
Where You Will Apply It
- Project 4: Parsing blocks separated by blank lines
- Project 5: Reversing a file line by line
References
- GNU sed Manual: command summary (N, P, D, h, H, g, G, x) - https://www.gnu.org/software/sed/manual/sed.html
- POSIX sed specification: pattern/hold space behavior and minimum buffer size - https://man7.org/linux/man-pages/man1/sed.1p.html
- OpenBSD sed manual: detailed command semantics for multi-line operations - https://man.openbsd.org/OpenBSD-current/sed.1
Key Insight
Multi-line sed is not magic; it is deliberate manipulation of two buffers.
Summary
Hold space and multi-line commands are how you extend sed beyond single-line edits into real text transformations.
Homework/Exercises to Practice the Concept
- Join every two lines in a file with a space.
- Reverse a file with a sed one-liner.
- Write a script that collects a paragraph (blank-line separated) and prints it in uppercase (use tr for uppercase if needed).
Solutions to the Homework/Exercises
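- sed '$!N; s/\n/ /' file.txt
- sed -n '1!G;h;$p' file.txt
- sed -n '/^$/!{H;$!d;};x;s/^\n//;s/\n/ /g;/./p' file.txt | tr a-z A-Z (non-blank lines are appended to the hold space; a blank line or the last line swaps the collected paragraph out, joins it with spaces, and leaves the hold space reset for the next paragraph)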
Chapter 6: In-Place Editing and Portability
Fundamentals
sed is often used to edit files in place. This is convenient but dangerous if you do not understand how the -i flag works. GNU sed supports -i[SUFFIX], which edits files in place and creates a backup only when a suffix is attached. macOS (and FreeBSD) sed expects an explicit extension argument (use -i '' for no backup or -i .bak for a backup file). There, a bare GNU-style -i either fails or silently consumes the next argument as the extension, which is a common portability trap.
Portability also affects regular expression syntax and option flags. GNU sed supports -E (extended regex) and -r (legacy extended regex), while BSD sed uses -E. POSIX historically defined a smaller option set (-n, -e, -f), with -E added more recently. Scripts that rely on -i or other GNU-only extensions may not be portable.
The correct mindset is: develop in a safe, non-destructive mode, then add in-place editing only when you are confident in your script. Use backups for safety, especially in automation.
Deep Dive into the Concept
In-place editing sounds simple but hides tricky behavior. GNU sed implements -i by creating a temporary file, writing the transformed output there, and then renaming it to replace the original file. If a suffix is provided, the original file is kept as a backup with that suffix; if no suffix is provided, no backup is kept. This means:
- File permissions, ownership, and hard links may be affected if you are not careful.
- Errors during processing can result in incomplete files if your script fails mid-way.
- Running sed -i without a backup can destroy data permanently if your regex is wrong.
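The hard-link effect is easy to observe. This sketch assumes GNU sed, which writes a temporary file and renames it over the original. It works in a scratch directory created with mktemp -d, and the file names are made up.

```shell
dir=$(mktemp -d)
cd "$dir"
printf 'DEBUG=true\n' > app.conf
ln app.conf app-link.conf      # a second name for the same file (inode)

sed -i 's/^DEBUG=true/DEBUG=false/' app.conf

cat app.conf        # DEBUG=false  (the renamed temp file)
cat app-link.conf   # DEBUG=true   (still the old inode; the link is now split)
```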
BSD sed is stricter about the -i argument: on macOS and FreeBSD you always pass an extension. -i '' means “no backup,” while -i .bak creates a backup with a .bak suffix. Because the argument is mandatory, and a zero-length one means no backup at all, the same -i command line does not mean the same thing on GNU and BSD. Scripts must handle this portability difference.
Portability strategy:
- Prefer dry runs: Run the script without -i first (adding -n and explicit p to focus on the changed lines) to verify the output.
- Use backup suffixes: -i.bak (GNU) or -i .bak (BSD) so you can restore. The attached form -i.bak is also accepted by macOS and FreeBSD sed.
- Avoid GNU-only extensions in scripts meant for macOS/BSD: stick to POSIX basics where possible.
- Document your target platform: If your script assumes GNU sed, say so explicitly.
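A dry run can be reviewed as a diff before anything is written. This sketch builds a throwaway app.conf with sample content and pipes sed's output into diff -u, where - stands for standard input.

```shell
printf '# settings\nDEBUG=true\n' > app.conf

# Preview only: app.conf on disk is not modified.
if sed 's/^DEBUG=true/DEBUG=false/' app.conf | diff -u app.conf -; then
  echo "no changes: the edit would be a no-op"
fi
```

diff exits with status 0 when the two inputs match and 1 when they differ, so the same check can gate an automated -i run.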
Another aspect of portability is regex mode. GNU sed notes that -E was added to POSIX as a portable way to request extended regex, while -r is GNU-specific legacy. Prefer -E for cross-platform scripts. Avoid advanced regex features that are not in POSIX ERE.
Failure modes here are destructive:
- Silent file truncation: If your script produces no output (for example, sed -n -i with no p), the temporary file replaces the original with an empty file.
- Wrong syntax on macOS: On BSD/macOS, sed -i 's/a/b/' file consumes the script as the backup extension and then fails with a confusing error.
- Broken automation: A script that works on Linux fails in CI on macOS because of -i semantics.
The safe workflow is to always test with a non-destructive pipeline, and only then run -i with backups.
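One way to guard against silent truncation is to never let sed write the file directly. The sketch below is a hypothetical helper, safe_sed_edit, not a sed feature. It writes to a temporary file and refuses to continue if the result is empty while the original was not. It then keeps a .bak copy and copies the new contents back over the original, so the file's inode, permissions, and hard links are preserved.

```shell
# safe_sed_edit SCRIPT FILE  (hypothetical helper; POSIX sh, uses globals)
safe_sed_edit() {
  _script=$1 _file=$2
  _tmp=$(mktemp) || return 1
  if ! sed "$_script" "$_file" > "$_tmp"; then
    rm -f "$_tmp"; return 1                 # sed itself reported an error
  fi
  if [ ! -s "$_tmp" ] && [ -s "$_file" ]; then
    echo "refusing: empty output for $_file" >&2
    rm -f "$_tmp"; return 1                 # likely a wrong -n or d script
  fi
  cp "$_file" "$_file.bak" &&               # backup first
    cat "$_tmp" > "$_file"                  # rewrite contents, same inode
  _rc=$?
  rm -f "$_tmp"
  return "$_rc"
}

printf 'DEBUG=true\n' > app.conf
safe_sed_edit 's/^DEBUG=true/DEBUG=false/' app.conf
safe_sed_edit '/DEBUG/d' app.conf || echo "edit rejected"
```

The trade-off is that rewriting the contents in place is not atomic: an interruption mid-write can leave a partial file, and the .bak copy is the safety net.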
How This Fits on Projects
Project 1 is your first in-place edit. You will learn to apply safe substitution in a config file. Later projects reinforce the practice of dry-run testing before applying edits.
Definitions and Key Terms
- In-place editing (-i): Modify files directly instead of writing to stdout.
- Backup suffix: The extension added to saved backups when using -i.
- Portability: Running the same script across GNU and BSD sed variants.
Mental Model Diagram
input file --(sed script)--> temp file --(rename)--> input file (replaced)
     |
     +--(only if a suffix was given)--> original kept as input file + suffix (backup)
How It Works (Step-by-Step)
- sed reads the input file and applies the script.
- Output is written to a temporary file (GNU sed behavior).
- The original file is replaced by renaming the temporary file.
- If -i was given a suffix, the original file is kept as a backup.
Minimal Concrete Example
# GNU sed (Linux)
$ sed -i.bak 's/^DEBUG=true/DEBUG=false/' app.conf
# BSD sed (macOS)
$ sed -i '' 's/^DEBUG=true/DEBUG=false/' app.conf
Common Misconceptions
- Misconception: -i is standardized everywhere.
  - Correction: -i is a common extension but not part of the POSIX option set.
- Misconception: In-place editing is always safe.
  - Correction: It can overwrite files; always use backups during development.
- Misconception: -r is portable.
  - Correction: Prefer -E for extended regex portability.
Check-Your-Understanding Questions
- Why does sed -i behave differently on macOS and Linux?
- What is the safest way to apply an in-place edit?
- Which option set is portable across POSIX sed?
- Why should you prefer -E over -r?
Check-Your-Understanding Answers
- BSD/macOS sed requires an explicit extension argument, while GNU sed allows -i with or without an attached suffix.
- Do a dry run without -i (optionally with -n and explicit printing), then use -i with a backup suffix.
- The POSIX set includes -n, -e, and -f (recent editions also add -E). Other options are extensions.
- -E is the POSIX-recognized extended regex flag, while -r originated as a GNU option and is only an alias on some BSDs.
Real-World Applications
- Updating config files in automation scripts
- Rewriting text in build pipelines
- Applying safe patches to large sets of files
Where You Will Apply It
- Project 1: Config file updates with backups
- Project 3: Script-based transformations (portability considerations)
References
- GNU sed Manual: -i, -E, portability notes, and GNU-only options - https://www.gnu.org/software/sed/manual/sed.html
- OpenBSD sed manual: -i extension behavior, -E/-r, and BSD semantics - https://man.openbsd.org/OpenBSD-current/sed.1
- POSIX sed specification: standard options - https://man7.org/linux/man-pages/man1/sed.1p.html
Key Insight
In-place editing is power with risk; portable scripts are power with discipline.
Summary
Portable sed means careful option choices, explicit backups, and an understanding of GNU vs BSD behavior.
Homework/Exercises to Practice the Concept
- Write a script that toggles DEBUG=true to DEBUG=false with a backup file.
- Test the script on both GNU and BSD sed and note the differences.
- Create a wrapper function that selects the correct -i syntax based on the OS.
Solutions to the Homework/Exercises
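- sed -i.bak 's/^DEBUG=true/DEBUG=false/' app.conf
- On macOS: sed -i '' 's/^DEBUG=true/DEBUG=false/' app.conf
- Example shell snippet. A function avoids storing -i '' in a string, which would pass the two quote characters literally as the extension:
sed_inplace() { if sed --version >/dev/null 2>&1; then sed -i "$@"; else sed -i '' "$@"; fi; }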
Glossary
- Address: A selector that determines which lines a command applies to.
- Branch: A control flow jump (b, t) to a labeled location.
- Cycle: The read -> apply commands -> print -> reset loop.
- Delimiter: The separator character in s/regex/repl/.
- Hold space: A persistent buffer across cycles for multi-line logic.
- Pattern space: The current line or multi-line buffer being edited.
- Range address: A stateful selector from a start line to an end line.
- Script: An ordered list of sed commands.
- Stream editor: A tool that transforms input sequentially without loading the full file.
Why sed Matters
The Modern Problem It Solves
Every production system emits text: logs, configs, manifests, reports, CI output, and build artifacts. You often need surgical edits that are fast, repeatable, and safe to run in pipelines. sed solves this by applying a deterministic edit script to a streaming input without loading the whole file into memory. That is exactly the shape of real-world operational work: transform large or continuous text streams in a single pass.
Real-World Impact (Recent Data)
sed matters because Unix-like environments dominate production infrastructure and POSIX tools are nearly always present there:
- Unix on web servers: W3Techs reports that Unix is used by 90.7% of websites whose OS is known (January 2026).
- Linux share within that universe: Linux accounts for 57.2% of websites with known OS (January 2026).
This means stream-editing tools like sed are available in the overwhelming majority of production environments, making them a dependable baseline skill for ops, data cleanup, and incident response.
Context & Evolution (Short)
sed became part of the POSIX toolset so scripts could rely on a consistent stream editor across Unix systems. The core model (pattern space + hold space + cycle) is standardized, while GNU/BSD variants add convenience flags and debugging features. The combination of standardized core + ubiquitous availability is why sed remains relevant even as higher-level tools exist.
Manual Editing                   Stream Editing
-------------------------------  --------------------------------
Open file -> search -> edit      cat file | sed 's/old/new/'
Repeat per file                  Apply to 1 or 10,000 files
Risk of manual mistakes          Deterministic, scriptable
References
- W3Techs OS usage statistics (January 2026) - https://w3techs.com/technologies/details/os-unix
- POSIX sed specification overview - https://man7.org/linux/man-pages/man1/sed.1p.html
- OpenBSD sed manual (portable behavior reference) - https://man.openbsd.org/OpenBSD-current/sed.1
Concept Summary Table
| Concept Cluster | What You Need to Internalize |
|---|---|
| Stream Editing Execution Model | The read -> edit -> print cycle, pattern space behavior, and how -n, d, and p control output. |
| Addressing and Line Selection | Line numbers, regex addresses, ranges, and negation with ! as the core targeting logic. |
| Regular Expressions and Substitution | BRE vs ERE, capturing groups, backreferences, delimiters, and flags like g and p. |
| Script Composition and Control Flow | Multiple commands, -e/-f, {} blocks, branching with b/t, and early exit with d/q. |
| Multi-Line Processing and Hold Space | N, P, D, and hold space commands (h, H, g, G, x) for stateful edits. |
| In-Place Editing and Portability | Safe use of -i, GNU vs BSD differences, and POSIX-compatible option sets. |
Project-to-Concept Map
| Project | What It Builds | Primer Chapters It Uses |
|---|---|---|
| Project 1: Config File Updater | Safe in-place substitution in configs | 1, 2, 3, 6 |
| Project 2: Log File Cleaner | Regex capture and reformatting | 1, 2, 3 |
| Project 3: Markdown to HTML Converter | Multi-command script with ordering | 1, 2, 3, 4 |
| Project 4: Multi-Line Address Parser | Block processing with hold space | 1, 2, 4, 5 |
| Project 5: Reversing a File | Hold space mastery and control flow | 1, 4, 5 |
Deep Dive Reading by Concept
Fundamentals and Unix Context
| Concept | Book & Chapter | Why This Matters |
|---|---|---|
| Text pipelines | The Linux Command Line by William Shotts - Ch. 20 | Solid foundation for text-processing tools. |
| Shell text processing | Effective Shell by Dave Kerr - Ch. 21 | Practical pipelines and tool composition. |
| Stream editing philosophy | Shell Programming in Unix, Linux and OS X by Kochan/Wood - Ch. 9 | Shows idiomatic stream editing patterns. |
Regex and Substitution
| Concept | Book & Chapter | Why This Matters |
|---|---|---|
| Regex fundamentals | Mastering Regular Expressions by Jeffrey Friedl - Ch. 3-5 | Deep understanding of regex mechanics. |
| sed substitution | sed & awk by Dougherty/Robbins - Ch. 2-5 | Classic coverage of the s command and syntax. |
Scripting and Control Flow
| Concept | Book & Chapter | Why This Matters |
|---|---|---|
| Script structure | Classic Shell Scripting by Robbins/Beebe - Ch. 3-5 | Practical multi-command scripts. |
| Real-world recipes | Wicked Cool Shell Scripts by Taylor/Perry - Ch. 2 | Applied, production-style patterns. |
Multi-line and Hold Space
| Concept | Book & Chapter | Why This Matters |
|---|---|---|
| Hold space patterns | sed & awk by Dougherty/Robbins - Ch. 6 | Canonical advanced sed techniques. |
Portability and Tooling
| Concept | Book & Chapter | Why This Matters |
|---|---|---|
| Portability | Effective Shell by Dave Kerr - Ch. 22 | Cross-platform scripting discipline. |
Quick Start
Day 1 (4 hours):
- Read Chapter 1 and Chapter 2 in the Theory Primer.
- Run the minimal examples in each chapter on your machine.
- Start Project 1 and complete the dry-run substitutions (no -i yet).
- Verify output with diff or git diff.
Day 2 (4 hours):
- Finish Project 1 with safe backups (-i.bak or -i .bak).
- Read Chapter 3 and practice 5 regex substitutions.
- Start Project 2 and build your regex slowly using -n and p.
End of Weekend:
You can confidently explain the sed execution cycle and apply targeted substitutions in real files.
Recommended Learning Paths
Path 1: The Sysadmin/DevOps Track (Recommended)
Best for: People who manage configs and logs.
- Project 1 -> Project 2 -> Project 4 -> Project 5
- Use Chapter 6 early for in-place safety
Path 2: The Text Transformation Track
Best for: People working on data cleaning.
- Project 2 -> Project 3 -> Project 4
- Focus on Chapter 3 (regex) and Chapter 5 (multi-line)
Path 3: The CLI Tool Builder
Best for: People building reusable scripts and pipelines.
- Project 1 -> Project 3 -> Project 4
- Emphasize Chapter 4 (script structure)
Path 4: The Completionist
Best for: Deep mastery.
- Project 1 -> 2 -> 3 -> 4 -> 5 in order
- Repeat Project 5 with a different solution and document trade-offs
Success Metrics
- You can explain the sed execution cycle without notes.
- You can write a correct s/regex/repl/ substitution with backreferences.
- You can target lines with both numeric and regex addresses.
- You can write a multi-command script in a .sed file.
- You can safely apply in-place edits with backups on your OS.
- You can debug a failing script using -n and explicit p.
Optional Appendices
Appendix A: Sed Portability Cheat Sheet
| Feature | GNU sed (Linux) | BSD/macOS sed | Portability Note |
|---|---|---|---|
| In-place edit | -i[SUFFIX] (suffix optional) | -i '' or -i .bak (explicit arg) | GNU accepts an optional attached suffix; BSD expects an argument, even if empty. The attached form -i.bak works on both. Use backups during development. |
| Extended regex | -E or -r | -E (often -r alias) | Prefer -E for portability. |
| POSIX mode | --posix | N/A | GNU-only strict mode. |
| Debug tracing | --debug | N/A | GNU-only. |
Appendix B: Debugging Workflow
- Start with -n and explicit p.
- Add commands one at a time.
- Use sed -n 'l' to display hidden characters (the l command exists in both GNU and BSD sed).
- For GNU sed, use --debug when available.
- Always test on small fixtures before running -i.
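Here is a quick sketch of the l step. l prints the pattern space with escapes, so a tab shows up as \t and the end of the line is marked with $, which makes the trailing space visible. The sample input is made up.

```shell
printf 'key\tvalue \n' | sed -n l
# key\tvalue $
```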
Appendix C: Regex Quick Reference (sed)
- ^ start of line
- $ end of line
- . any single character
- * zero or more of the previous item
- [abc] one of a, b, c
- [^abc] any character except a, b, c
- [[:digit:]] any digit (POSIX class)
- \(...\) capture group in BRE
- ( ... ) capture group in ERE (-E)
- \1 backreference
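To make the capture-group entries concrete, here is the same Last, First swap written both ways (the name is sample data only):

```shell
# BRE: groups are written \( ... \)
echo 'Doe, Jane' | sed 's/^\([^,]*\), \(.*\)$/\2 \1/'    # Jane Doe

# ERE with -E: groups are bare ( ... )
echo 'Doe, Jane' | sed -E 's/^([^,]*), (.*)$/\2 \1/'     # Jane Doe
```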
Appendix D: Common Commands Quick Reference
- s substitute
- d delete pattern space and start next cycle
- p print pattern space
- n print pattern space (unless -n), then replace it with the next input line
- N append next line to pattern space
- h copy pattern space to hold space
- H append pattern space to hold space
- g copy hold space to pattern space
- G append hold space to pattern space
- x exchange pattern and hold space
- b branch
- t test and branch on successful substitution
- q quit
Appendix E: Cross-Platform In-Place Wrapper (Shell)
# Use: sed_inplace 's/old/new/' file1 [file2 ...]
# Creates backups with .bak on both GNU and BSD sed.
sed_inplace() {
if sed --version >/dev/null 2>&1; then
# GNU sed
sed -i.bak "$@"
else
# BSD/macOS sed
sed -i .bak "$@"
fi
}
Project Overview Table
| # | Project | Difficulty | Core Skills | Primary Deliverable |
|---|---|---|---|---|
| 1 | Config File Updater | Beginner | s command, addresses, safe -i | Automated config toggle script with backups |
| 2 | Log File Cleaner | Beginner | regex capture, formatting, selective printing | Log normalizer pipeline with validated output |
| 3 | Markdown to HTML Converter | Intermediate | multi-command scripts, ordering, ERE | converter.sed script that emits HTML |
| 4 | Multi-Line Address Parser | Advanced | hold space, multi-line pattern space | Block extractor for paragraph-level transformations |
| 5 | Reversing a File | Advanced | hold space accumulation, control flow | File reverser one-liner with explanation |
Project List
These projects build sed skills from basic substitutions to advanced multi-line transformations.
- Project 1: The Config File Updater - Safe in-place edits with precise addresses.
- Project 2: The Log File Cleaner - Regex capture groups and reformatting.
- Project 3: Basic Markdown to HTML Converter - Multi-command scripts and ordering.
- Project 4: The Multi-Line Address Parser - Hold space and block processing.
- Project 5: Reversing a File (Line by Line) - Full hold space mastery.
Project 1: The Config File Updater
- File: LEARN_SED_COMMAND.md
- Main Programming Language: sed (Bash/Shell)
- Alternative Programming Languages: Python, Perl, Awk
- Coolness Level: Level 2: Practical but Forgettable
- Business Potential: 3. The “Service & Support” Model
- Difficulty: Level 1: Beginner
- Knowledge Area: Text Substitution / In-place Editing
- Software or Tool: sed
- Main Book: sed & awk, 2nd Edition by Dale Dougherty & Arnold Robbins
What you’ll build: A sed command that finds and replaces a specific setting in a configuration file. For example, changing DEBUG=true to DEBUG=false.
Why it teaches sed: This is the most common use case for sed. It teaches you the fundamentals of the s command, using regex to anchor your search (^DEBUG=), and how to edit files in-place safely with the -i option.
Core challenges you’ll face:
- Constructing the s command → maps to understanding the s/find/replace/ syntax
- Matching the whole line vs. just the value → maps to using ^ and $ to make your regex more specific
- Handling different file permissions and backups → maps to understanding the -i (in-place) and -i.bak flags
- Applying the change only to specific lines → maps to using an address pattern like /^DEBUG=/
Key Concepts:
- Substitution: “sed & awk” Ch. 3 - Dougherty & Robbins
- In-place Editing: man sed (look for the -i option)
- Regular Expressions: “sed & awk” Ch. 2
Difficulty: Beginner
Time estimate: Weekend
Prerequisites: Basic command-line navigation.
Real World Outcome
You will have a config file, app.conf:
# Application Settings
SERVER_HOST=127.0.0.1
SERVER_PORT=8080
DEBUG=true
You will run a single sed command, and the file will be instantly modified:
$ sed -i.bak 's/^DEBUG=true/DEBUG=false/' app.conf
The file app.conf now contains DEBUG=false, and a backup of the original file, app.conf.bak, has been created.
Command Line Outcome Example:
$ cat app.conf
# Application Settings
SERVER_HOST=127.0.0.1
SERVER_PORT=8080
DEBUG=true
# Dry run (preview only)
$ sed 's/^DEBUG=true/DEBUG=false/' app.conf
# Application Settings
SERVER_HOST=127.0.0.1
SERVER_PORT=8080
DEBUG=false
# In-place edit with backup (GNU sed)
$ sed -i.bak 's/^DEBUG=true/DEBUG=false/' app.conf
$ ls -1 app.conf app.conf.bak
app.conf
app.conf.bak
$ cat app.conf
# Application Settings
SERVER_HOST=127.0.0.1
SERVER_PORT=8080
DEBUG=false
$ cat app.conf.bak
# Application Settings
SERVER_HOST=127.0.0.1
SERVER_PORT=8080
DEBUG=true
Implementation Hints:
- Create a sample app.conf file to work with.
- Start by just printing the output to the terminal (don’t use -i yet): sed 's/true/false/' app.conf
- Notice that this might change other lines if they contain “true”. How can you make it more specific?
- Anchor the search to the beginning of the line. What character does that?
- Only apply the s command to lines that match a certain pattern. The syntax is /pattern/s/find/replace/.
- Once your command correctly isolates and changes only the DEBUG=true line, you can add the -i flag to perform the edit on the file itself. It’s good practice to use -i.bak to create a backup, especially when learning.
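Here is one possible progression through these hints, as a sketch. The sample file is the project's app.conf plus an extra ENABLE_LOGGING=true line, so the over-broad first attempt has something to break. The attached -i.bak form used in the last step is accepted by both GNU and BSD/macOS sed.

```shell
printf '%s\n' '# Application Settings' 'SERVER_HOST=127.0.0.1' \
  'SERVER_PORT=8080' 'ENABLE_LOGGING=true' 'DEBUG=true' > app.conf

# 1. Too broad: flips ENABLE_LOGGING as well.
sed 's/true/false/' app.conf

# 2. Address + substitution: only lines starting with DEBUG= are edited.
sed '/^DEBUG=/s/true/false/' app.conf

# 3. Anchored replacement of the whole value, whatever it was.
sed 's/^DEBUG=.*/DEBUG=false/' app.conf

# 4. Only after the preview looks right: edit in place with a backup.
sed -i.bak '/^DEBUG=/s/true/false/' app.conf
```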
Learning milestones:
- You can replace a simple word in a file → You understand the basic s command.
- You can replace a word on a specific line number → You understand numeric addressing.
- You can replace a word only on lines that match a pattern → You understand regex addressing.
- You confidently use sed -i to modify a file → You’ve unlocked sed for scripting and automation.
The Core Question You’re Answering
“How do I surgically modify a single line in a file without opening an editor?”
This is the fundamental question every sysadmin, DevOps engineer, and shell scripter eventually faces. Configuration files are everywhere - .env files, nginx configs, application settings, systemd units - and you need to change them programmatically. Opening vim or nano is fine for one file, but what about 50 servers? What about a CI/CD pipeline? What about a script that needs to toggle a setting based on the environment?
The answer is sed. Specifically, the s (substitute) command with precise addressing. This project teaches you to be a text surgeon: make exactly the change you need, on exactly the line you need, without touching anything else.
Concepts You Must Understand First
Before writing your first sed command, you need to internalize these foundational concepts:
1. The s Command Structure: s/pattern/replacement/flags
This is the anatomy of every substitution:
- s: The substitute command (there are others, like d for delete, but s is king)
- /pattern/: A regular expression describing what to find
- /replacement/: What to put in its place
- /flags: Optional modifiers (like g for global, or i for case-insensitive matching in GNU sed)
The delimiter doesn’t have to be /. You can use any other character except backslash or newline, as long as it does not also appear unescaped in the pattern or replacement: s#pattern#replacement# or s|pattern|replacement|. This is useful when your pattern contains slashes (like file paths).
Reference: “sed & awk” by Dougherty & Robbins, Chapter 3: “The Basic s Command”
2. Regular Expression Anchors: ^ and $
- ^ matches the beginning of a line (not a character, but a position)
- $ matches the end of a line (again, a position)
Why do these matter? Consider:
sed 's/true/false/' config.txt
This will change the FIRST occurrence of “true” on ANY line. If your file has:
DEBUG=true
ENABLE_LOGGING=true
Both lines get modified. But if you use:
sed 's/^DEBUG=true/DEBUG=false/' config.txt
Only the line that STARTS with DEBUG=true is changed.
Reference: “sed & awk” Chapter 2: “Regular Expressions”, specifically the section on “Positional Metacharacters”
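To watch the anchor do its job, the short sketch below feeds the two sample lines through both commands. It pipes the lines in with printf rather than reading config.txt, an assumption made only so the example runs on its own.

```shell
#!/bin/sh
# The two sample lines from the example above, one per line.
sample() { printf '%s\n' 'DEBUG=true' 'ENABLE_LOGGING=true'; }

echo "unanchored s/true/false/:"
sample | sed 's/true/false/'                 # both lines change

echo "anchored s/^DEBUG=true/DEBUG=false/:"
sample | sed 's/^DEBUG=true/DEBUG=false/'    # only the DEBUG line changes
```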
3. The -i Flag for In-Place Editing
Without -i, sed writes to stdout (the terminal). The original file is untouched. This is actually a feature - it lets you preview changes before committing them.
With -i, sed modifies the file directly:
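- -i (GNU sed): Edits in place, no backup
- -i.bak: Edits in place, saves the original as filename.bak. Keep the suffix attached to -i: GNU sed only accepts it that way, while BSD sed also accepts it as a separate argument (-i '.bak').
- -i '' (macOS/BSD sed): Edits in place, no backup (note the empty string is required)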
Warning: macOS and Linux have different sed implementations! On macOS:
sed -i '' 's/old/new/' file.txt # macOS (BSD sed)
sed -i 's/old/new/' file.txt # Linux (GNU sed)
Reference: man sed on your specific system, look for -i or --in-place
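If one script must run on both Linux and macOS, a common workaround is to always attach a suffix (-i.bak), a form that both GNU and BSD sed accept, and delete the backup afterwards. This is a sketch on a throwaway file in a temporary directory; the file contents are made up for the demo.

```shell
#!/bin/sh
set -eu
workdir=$(mktemp -d)                 # throwaway directory for the demo
conf="$workdir/app.conf"
printf '%s\n' 'SERVER_PORT=8080' 'DEBUG=true' > "$conf"

# -i.bak (suffix attached, no space) is accepted by GNU and BSD sed alike.
sed -i.bak 's/^DEBUG=true/DEBUG=false/' "$conf"

edited=$(cat "$conf")                # the rewritten file
original=$(cat "$conf.bak")          # the untouched backup
printf 'edited:\n%s\nbackup:\n%s\n' "$edited" "$original"

# Remove the backup (here, the whole demo directory) once the result looks right.
rm -r "$workdir"
```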
4. How Sed Addresses Work: Line Numbers vs. Patterns
Addresses tell sed which lines to operate on:
| Address Type | Syntax | Example | Meaning |
|---|---|---|---|
| No address | s/a/b/ | sed 's/a/b/' | Apply to ALL lines |
| Line number | 3s/a/b/ | sed '3s/a/b/' | Apply only to line 3 |
| Line range | 3,5s/a/b/ | sed '3,5s/a/b/' | Apply to lines 3-5 |
| Pattern | /regex/s/a/b/ | sed '/DEBUG/s/a/b/' | Apply to lines matching regex |
| Last line | $s/a/b/ | sed '$s/a/b/' | Apply only to the last line |
The pattern address is your most powerful tool. /^DEBUG=/s/true/false/ means: “On lines that start with DEBUG=, substitute true with false.”
Reference: “sed & awk” Chapter 3: “Addressing”
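The five address types from the table can be compared side by side with a small loop. This sketch assumes five made-up lines, a1 through a5, so the selected line numbers are easy to read.

```shell
#!/bin/sh
# Five sample lines; the digit in each line is its line number.
sample() { printf '%s\n' a1 a2 a3 a4 a5; }

for script in 's/a/b/' '3s/a/b/' '3,5s/a/b/' '/4/s/a/b/' '$s/a/b/'; do
  # paste -sd' ' - joins the output lines into one row for display
  printf '%-10s -> %s\n' "$script" "$(sample | sed "$script" | paste -sd' ' -)"
done
```

Each row shows which of a1 to a5 became b: all of them, only line 3, lines 3 through 5, only the line containing 4, and only the last line.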
Questions to Guide Your Design
Before you write any code, sit with these questions. They’ll save you from common mistakes:
1. What if “true” appears elsewhere in the file?
Your file might contain:
ENABLE_SSL=true
DEBUG=true
DESCRIPTION="Setting this to true enables debugging"
A naive sed 's/true/false/' would change ALL of them, including corrupting the DESCRIPTION field. How do you prevent this?
2. How do you ensure only the DEBUG line is modified?
Think about what makes the DEBUG line unique:
- Does it start with a specific prefix?
- Is it on a specific line number? (Fragile - line numbers can change)
- Does it have a unique pattern?
The most robust approach combines pattern addressing with anchored substitution:
sed '/^DEBUG=/s/true/false/'
3. What happens if the file doesn’t have the expected line?
If you run sed '/^DEBUG=/s/true/false/' app.conf and there’s no DEBUG= line, nothing happens. sed still exits with code 0 (success), so the missing line goes unnoticed: a silent failure.
How would you detect this? Consider:
grep -q '^DEBUG=' app.conf && sed -i '/^DEBUG=/s/true/false/' app.conf
Or check if sed actually made a change.
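One way to check whether sed actually made a change is to write the result to a temporary file and compare it with cmp. The helper below is a hypothetical sketch (the name sed_or_fail is made up, not a standard command): it returns 1 and leaves the file alone when the script changes nothing.

```shell
#!/bin/sh
# sed_or_fail SCRIPT FILE  (hypothetical helper)
# Applies SCRIPT to FILE only if the output differs from the input.
# Returns 0 if FILE was changed, 1 if nothing changed, 2 on error.
sed_or_fail() {
  script=$1
  file=$2
  tmp=$(mktemp) || return 2
  if ! sed "$script" "$file" > "$tmp"; then
    rm -f "$tmp"
    return 2
  fi
  if cmp -s "$file" "$tmp"; then        # identical output: the script was a no-op
    echo "sed_or_fail: no change made to $file" >&2
    rm -f "$tmp"
    return 1
  fi
  cat "$tmp" > "$file"                  # rewrite contents, keeping FILE's permissions
  rm -f "$tmp"
}
```

For example, sed_or_fail '/^DEBUG=/s/true/false/' app.conf || echo "DEBUG line missing or already false" turns the silent case into a visible one.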
4. How do you safely test before modifying?
Golden rule: Never use -i on your first attempt.
Your workflow should be:
- Run without -i to see output: sed 's/old/new/' file.txt
- Verify the output is what you expect
- Then add -i.bak for safety: sed -i.bak 's/old/new/' file.txt
- Verify the change, then remove the backup if satisfied
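A diff makes the “verify the output” step less error-prone than eyeballing the whole file. This sketch wraps sed and diff in a hypothetical helper named preview; the sample file is invented for the demo.

```shell
#!/bin/sh
# preview SCRIPT FILE  (hypothetical helper name)
# Shows what SCRIPT would change in FILE as a diff; FILE itself is never written.
# The "-" argument tells diff to read the edited version from standard input.
preview() {
  sed "$1" "$2" | diff "$2" -
}

workdir=$(mktemp -d)
printf '%s\n' '# Application Config' 'DEBUG=true' 'TIMEOUT=30' > "$workdir/app.conf"
preview '/^DEBUG=/s/true/false/' "$workdir/app.conf"   # diff exits 1 when lines differ
```

A diff with exactly one < line and one > line is the single-line change you want to see before reaching for -i.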
Thinking Exercise
Before touching the keyboard, do this paper exercise. It will cement your understanding of how sed processes files.
Exercise: Trace Through a 5-Line Config File
Given this file (app.conf):
# Application Config
SERVER=localhost
PORT=8080
DEBUG=true
TIMEOUT=30
Trace what happens when you run: sed '/^DEBUG=/s/true/false/' app.conf
For each line, answer:
- What is in the pattern space?
- Does the address /^DEBUG=/ match?
- If yes, what substitution occurs?
- What is printed to output?
| Line # | Pattern Space | Address Match? | Substitution | Output |
|---|---|---|---|---|
| 1 | # Application Config | No (doesn’t start with DEBUG=) | None | # Application Config |
| 2 | SERVER=localhost | No | None | SERVER=localhost |
| 3 | PORT=8080 | No | None | PORT=8080 |
| 4 | DEBUG=true | Yes | true -> false | DEBUG=false |
| 5 | TIMEOUT=30 | No | None | TIMEOUT=30 |
Follow-up questions:
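- What if line 4 was DEBUG=TRUE (uppercase)? The address /^DEBUG=/ still matches, but s/true/false/ does not, because matching is case-sensitive. (GNU sed’s I flag, or a class like [Tt][Rr][Uu][Ee], fixes this.)
- What if line 4 was "  DEBUG=true" (leading spaces)? Would ^DEBUG= match? (No, because ^ means “start of line” and the spaces come first.)
- How would you handle both cases?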
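One possible answer to the last question, sketched here: widen the address with [[:space:]]* to allow leading blanks, and spell the value as a bracket class so any letter case matches. Anchoring the value with =...$ is an extra assumption that the value is exactly “true” and nothing else.

```shell
#!/bin/sh
# ^[[:space:]]*DEBUG=  : DEBUG= at line start, optionally after spaces/tabs
# =[Tt][Rr][Uu][Ee]$   : the value true in any letter case, with nothing after it
# (GNU sed could use s/=true$/=false/I instead; the bracket form is portable.)
printf '%s\n' 'DEBUG=TRUE' '  DEBUG=true' 'NODEBUG=true' |
  sed '/^[[:space:]]*DEBUG=/s/=[Tt][Rr][Uu][Ee]$/=false/'
```

The NODEBUG line is left alone because the address requires DEBUG= immediately after the optional whitespace.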
The Interview Questions They’ll Ask
These are real questions from DevOps, SRE, and backend engineering interviews. If you can answer them confidently, you’ve mastered this project’s learning goals.
Question 1: “How would you change a config value in a file from the command line?”
Expected answer: “I’d use sed with the substitute command. For example, to change DEBUG=true to DEBUG=false in a config file, I’d run:
sed -i.bak 's/^DEBUG=true/DEBUG=false/' config.txt
The -i.bak creates a backup, ^ anchors to line start so I don’t accidentally modify other lines, and the s command does the substitution.”
Bonus points: Mention that on macOS you’d need sed -i '' 's/...' or use gsed (GNU sed from Homebrew).
Question 2: “What’s the difference between sed 's/a/b/' and sed 's/a/b/g'?”
Expected answer: “Without the g flag, sed only replaces the FIRST occurrence of the pattern on each line. With g (global), it replaces ALL occurrences on each line.
For example, with input banana:
- sed 's/a/X/' produces bXnana (only the first a changed)
- sed 's/a/X/g' produces bXnXnX (every a changed)”
Question 3: “How do you make sed only modify lines matching a pattern?”
Expected answer: “You use an address before the command. The address can be a line number, range, or regex pattern. For example:
- sed '5s/a/b/' - only modify line 5
- sed '/ERROR/s/a/b/' - only modify lines containing ‘ERROR’
- sed '/^#/d' - delete lines starting with # (comments)
The pattern goes between slashes before the command.”
Question 4 (Follow-up): “What if you want to modify lines that DON’T match a pattern?”
Expected answer: “You use the ! operator to negate the address. For example:
- sed '/^#/!s/foo/bar/' - substitute on all lines EXCEPT comments
- sed '/DEBUG/!d' - delete all lines that DON’T contain DEBUG (i.e., keep only DEBUG lines)”
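Both negated addresses from the answer can be tried on a three-line sample; the lines below are invented for illustration.

```shell
#!/bin/sh
# Sample input: a comment, a normal line, and a DEBUG line.
sample() { printf '%s\n' '# foo in a comment' 'foo=1' 'DEBUG foo'; }

sample | sed '/^#/!s/foo/bar/'   # comment untouched; the other two lines get bar
printf '%s\n' '---'
sample | sed '/DEBUG/!d'         # only "DEBUG foo" survives
```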
Hints in Layers
If you’re stuck, use these hints progressively. Try to solve the problem with the minimum number of hints.
Hint 1: Start Without `-i`
Never edit in place on your first try. Run sed and let it print to the terminal:
sed 's/true/false/' app.conf
Look at the output. Is it what you expected? Only add -i after you’re confident.
Hint 2: Use `^` to Anchor to Line Start
The caret ^ matches the beginning of a line. This makes your pattern more specific:
sed 's/^DEBUG=true/DEBUG=false/' app.conf
Now only lines that START with DEBUG=true will be modified.
Hint 3: Use Pattern Addressing
Even better than anchoring in the substitution, use an address to select which lines the command applies to:
sed '/^DEBUG=/s/true/false/' app.conf
This says: “On lines matching ^DEBUG=, substitute true with false.”
Why is this better? It’s more flexible. What if the value could be true, True, or TRUE?
sed '/^DEBUG=/s/[Tt][Rr][Uu][Ee]/false/' app.conf
Hint 4: Use `-i.bak` for Safety
When you’re ready to modify the file, always create a backup:
sed -i.bak '/^DEBUG=/s/true/false/' app.conf
This creates app.conf.bak with the original content. If something goes wrong, you can restore it:
mv app.conf.bak app.conf
Hint 5 (macOS users): Handle BSD sed
On macOS, the -i flag requires an argument (even if empty):
sed -i '' '/^DEBUG=/s/true/false/' app.conf # macOS
Or install GNU sed via Homebrew and use gsed:
brew install gnu-sed
gsed -i '/^DEBUG=/s/true/false/' app.conf
Books That Will Help
| Topic | Book | Chapter/Section | Why It Helps |
|---|---|---|---|
| Basic s command | “sed & awk” by Dougherty & Robbins | Ch. 3: “Basic sed Commands” | The definitive explanation of substitution syntax |
| Regular expressions | “sed & awk” by Dougherty & Robbins | Ch. 2: “Understanding Basic Regular Expressions” | Master the patterns that power sed |
| In-place editing | man sed (system manual) | -i option section | Platform-specific behavior (GNU vs BSD) |
| Regex anchors | “Mastering Regular Expressions” by Friedl | Ch. 3: “Overview of Regular Expression Features” | Deep dive into ^, $, and word boundaries |
| Shell scripting context | “Classic Shell Scripting” by Robbins & Beebe | Ch. 7: “Power Tools for Text Editing” | How sed fits into larger scripts |
| Quick reference | The Grymoire (online) | www.grymoire.com/Unix/Sed.html | Excellent examples and explanations |
Reading order for this project:
- Start with “sed & awk” Ch. 2 (regex basics) - 30 minutes
- Read “sed & awk” Ch. 3 through the s command section - 45 minutes
- Skim man sed for -i flag specifics on your system - 10 minutes
- Keep The Grymoire bookmarked for quick reference
Common Pitfalls & Debugging
Problem 1: “Nothing changed after running the command”
- Why: Your address or regex did not match the target line.
- Fix: Test the address first with -n and p.
- Quick test:
sed -n '/^DEBUG=/p' app.conf
Problem 2: “My file became empty after I used -i”
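- Why: You combined -n with -i. With -n, sed writes back only the lines you explicitly print with p, so the file is cut down to those lines, or to nothing if there is no p.
- Fix: Drop -n when editing in place; without it, every line is written back.
- Quick test (run without -i; this output is exactly what -n -i would leave in the file):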
sed -n 's/^DEBUG=true/DEBUG=false/p' app.conf
Problem 3: “macOS says sed: 1: … extra characters after command”
- Why: BSD/macOS sed requires an argument for -i.
- Fix: Use -i '' for no backup or -i .bak for a backup.
- Quick test:
sed -i '' 's/^DEBUG=true/DEBUG=false/' app.conf
Problem 4: “Other lines changed unexpectedly”
- Why: The regex is too broad (e.g., s/true/false/ without an address).
- Fix: Anchor the match or add an address: /^DEBUG=/s/true/false/.
- Quick test:
sed -n '/^DEBUG=/p' app.conf
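Problem 2 is easy to reproduce safely. This sketch runs the mistaken -n -i combination on a throwaway file in a temporary directory (the contents are made up) and shows that only the printed line survives, while the .bak copy still holds the original.

```shell
#!/bin/sh
set -eu
workdir=$(mktemp -d)                        # throwaway copy, never a real config
printf '%s\n' 'SERVER_PORT=8080' 'DEBUG=true' > "$workdir/app.conf"

# The mistake from Problem 2: -n plus -i writes back only the printed lines.
sed -n -i.bak 's/^DEBUG=true/DEBUG=false/p' "$workdir/app.conf"

after=$(cat "$workdir/app.conf")            # just the one printed line
backup=$(cat "$workdir/app.conf.bak")       # the backup still has both lines
printf 'after:\n%s\nbackup:\n%s\n' "$after" "$backup"
rm -r "$workdir"
```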
Definition of Done
- A dry run shows exactly one line change and nothing else.
- The command only updates DEBUG=true and leaves other keys untouched.
- A backup file is created when using in-place edits.
- The script works on both GNU and BSD sed with documented syntax.
- You can explain why the address and anchors are required.
Project 2: The Log File Cleaner
- File: LEARN_SED_COMMAND.md
- Main Programming Language: sed (Bash/Shell)
- Alternative Programming Languages: Awk, Python
- Coolness Level: Level 2: Practical but Forgettable
- Business Potential: 1. The “Resume Gold”
- Difficulty: Level 1: Beginner
- Knowledge Area: Regex / Capturing Groups
- Software or Tool: sed
- Main Book: Mastering Regular Expressions, 3rd Edition by Jeffrey E.F. Friedl
What you’ll build: A sed script that processes a messy log file, removing unnecessary information (like log level and timestamps) and reformatting it into a cleaner, more readable format.
Why it teaches sed: This project forces you to learn about capturing groups. You’ll match parts of a line and then reference those parts in your replacement string, which is the key to reformatting text instead of just replacing it.
Core challenges you’ll face:
- Matching a complex line structure → maps to writing a regular expression that describes the entire log line format
- Capturing parts of the line → maps to using \( and \) (or ( and ) with -E) to create groups
- Referencing captured groups → maps to using \1, \2, \3, etc. in the replacement part of the s command
- Deleting non-matching lines → maps to using the d command with pattern addressing
Key Concepts:
- Capturing Groups: “sed & awk” Ch. 3
- Extended Regular Expressions: man sed (the -E or -r option)
- Combining Commands: Using multiple -e expressions or semicolons.
Difficulty: Beginner. Time estimate: Weekend. Prerequisites: Project 1.
Real World Outcome
You’ll start with a log file app.log like this:
[2025-12-20 10:00:15] [INFO] User 'admin' logged in from 192.168.1.100.
[2025-12-20 10:01:02] [DEBUG] Caching mechanism triggered for key 'user:123'.
[2025-12-20 10:01:30] [ERROR] Failed to connect to database: Connection refused.
Your sed script will transform it into this:
ERROR: Failed to connect to database: Connection refused.
It extracts only the message from ERROR lines.
Command Line Outcome Example:
$ cat app.log
[2025-12-20 10:00:15] [INFO] User 'admin' logged in from 192.168.1.100.
[2025-12-20 10:01:02] [DEBUG] Caching mechanism triggered for key 'user:123'.
[2025-12-20 10:01:30] [ERROR] Failed to connect to database: Connection refused.
# Step 1: Verify filtering
$ sed -n '/ERROR/p' app.log
[2025-12-20 10:01:30] [ERROR] Failed to connect to database: Connection refused.
# Step 2: Extract and reformat
$ sed -E -e '/ERROR/!d' -e 's/^\[[^]]+\] \[ERROR\] (.*)/ERROR: \1/' app.log
ERROR: Failed to connect to database: Connection refused.
Implementation Hints:
- Create your sample app.log.
- Your goal is to match an entire ERROR line and extract only the part after the log level.
- Think about the structure: [timestamp] [ERROR] message.
- Write a regex to match this. With -E, it might look something like ^\[.*\] \[ERROR\] (.*)$.
  - ^\[.*\]: Matches the timestamp part.
  - \[ERROR\] (with a space on each side): Matches the log level part.
  - (.*)$: This is the key! It’s a capturing group that matches the rest of the line (the message) to the end.
- Now, construct your s command. You want to replace the entire line with just the part you captured. How do you reference the first captured group? (\1). s/^\[.*\] \[ERROR\] (.*)$/\1/
- This command will only change the ERROR lines. What about the INFO and DEBUG lines? You want to delete them. You can use a separate d command. How can you apply a command only to lines that don’t match a pattern? (Hint: !). '/ERROR/!d'
- You can combine these two commands using the -e flag: sed -E -e '/ERROR/!d' -e 's/.../.../' app.log.
Learning milestones:
- You can write a regex that matches an entire structured line → You understand how to model text patterns.
- You can extract a substring from a line using a capturing group → You’ve learned the key to reformatting.
- You can re-order parts of a line → e.g., s/(part1)(part2)/\2 \1/.
- You can chain multiple commands to perform a multi-step transformation → You are starting to think like a sed scripter.
The Core Question You’re Answering
“How can I extract and reformat specific parts of a structured line, not just replace text?”
This project moves you beyond simple find-and-replace. You’re learning to dissect a line, identify its components, and reassemble them in a new format. This is the difference between knowing sed and truly understanding it.
Concepts You Must Understand First
Before diving into the implementation, ensure you grasp these foundational concepts:
| Concept | Description | Example |
|---|---|---|
| Capturing groups (BRE) | In Basic Regular Expressions, parentheses must be escaped: \( and \) | sed 's/\(hello\)/\1 world/' |
| Capturing groups (ERE) | In Extended Regular Expressions (-E flag), use unescaped parentheses | sed -E 's/(hello)/\1 world/' |
| Backreferences | Reference captured groups with \1, \2, \3, etc. in the replacement | s/(a)(b)/\2\1/ swaps a and b (with -E) |
| Extended regex flag | Use -E (POSIX) or -r (GNU) to enable ERE syntax | sed -E 's/(pattern)/\1/' |
| Pattern negation | The ! modifier inverts the address match | /ERROR/!d deletes non-ERROR lines |
| The d command | Deletes the pattern space and starts the next cycle, so the line is never printed | sed '/DEBUG/d' removes DEBUG lines |
Critical distinction: In BRE (default), you write \(group\) and reference with \1. In ERE (-E), you write (group) and still reference with \1. The backreferences always use the backslash.
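The distinction is easiest to see with both forms side by side. This sketch runs the two examples from the table, plus the group swap from the Backreferences row, which needs -E because it uses unescaped parentheses.

```shell
#!/bin/sh
# Same capture, two syntaxes: only the grouping parentheses differ.
echo 'hello' | sed 's/\(hello\)/\1 world/'      # BRE: escaped \( \)
echo 'hello' | sed -E 's/(hello)/\1 world/'     # ERE: plain ( )

# Swapping two groups, as in the Backreferences row:
echo 'ab' | sed -E 's/(a)(b)/\2\1/'             # prints ba
```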
Questions to Guide Your Design
Work through these questions before writing any code:
- How do you match the timestamp format [YYYY-MM-DD HH:MM:SS]?
  - What character class matches digits? ([0-9] or [[:digit:]])
  - How do you match the literal brackets [ and ]?
  - Do you need to escape them in your regex?
- How do you capture only the message part?
  - Where does the message start in the log line?
  - Should you capture everything after [ERROR] or just the meaningful text?
  - What about leading/trailing whitespace?
- How do you delete lines that DON’T match ERROR?
  - What does /pattern/!command mean?
  - Should you delete first, then substitute, or vice versa?
  - What happens to the pattern space when you use d?
- What if a log message contains brackets?
  - Example: [2025-12-20 10:01:30] [ERROR] Array index [5] out of bounds.
  - How do you ensure your regex doesn’t get confused by extra brackets?
  - sed has no non-greedy quantifier, so what could you use instead of .* to stop at the first ]?
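To explore the bracket question, this sketch strips everything up to a “] ” using two different patterns on the example line. The generic s/^\[...\] (.*)$/\1/ form is an illustration chosen for this comparison, not part of the final solution.

```shell
#!/bin/sh
line='[2025-12-20 10:01:30] [ERROR] Array index [5] out of bounds.'

# Greedy: .* runs to the LAST "] " on the line, so the capture starts too late.
printf '%s\n' "$line" | sed -E 's/^\[.*\] (.*)$/\1/'
# -> out of bounds.

# Bracket class: [^]]+ cannot cross a ], so it stops at the end of the timestamp.
printf '%s\n' "$line" | sed -E 's/^\[[^]]+\] (.*)$/\1/'
# -> [ERROR] Array index [5] out of bounds.
```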
Thinking Exercise
Trace through what each capture group captures for this sample log line:
[2025-12-20 10:01:30] [ERROR] Failed to connect to database: Connection refused.
Given this sed command with -E:
sed -E 's/^\[([^]]+)\] \[([A-Z]+)\] (.*)$/\2: \3/'
Fill in the blanks:
| Group | Pattern | Captured Value |
|---|---|---|
| \1 | ([^]]+) | ______ |
| \2 | ([A-Z]+) | ______ |
| \3 | (.*) | ______ |
Answers:
| Group | Pattern | Captured Value |
|---|---|---|
| \1 | ([^]]+) | 2025-12-20 10:01:30 |
| \2 | ([A-Z]+) | ERROR |
| \3 | (.*) | Failed to connect to database: Connection refused. |
The output would be: ERROR: Failed to connect to database: Connection refused.
The Interview Questions They’ll Ask
Prepare for these common interview questions related to this project:
- “How would you extract just IP addresses from a log file?”
  - Expected answer: Use a regex pattern for IP addresses with capturing groups, and print only the lines where it matched
  - Example: sed -nE 's/^(.*[^0-9])?([0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}).*/\2/p' (a bare leading .* would be greedy and swallow the first digits of the address, so the prefix must end in a non-digit or be empty)
  - Bonus: Discuss validating IP address ranges (0-255)
- “Explain what a capturing group is and how you’d use it in sed”
  - Definition: A portion of a regex enclosed in parentheses that “remembers” the matched text
  - Usage: Reference with \1, \2, etc. in the replacement string
  - Real example: Reformatting dates from MM/DD/YYYY to YYYY-MM-DD
- “How do you invert a match in sed?”
  - Use the ! modifier after the address: /pattern/!command
  - Example: /ERROR/!d means “on lines NOT matching ERROR, delete”
  - Contrast with: /ERROR/d which deletes lines that DO match ERROR
- “What’s the difference between BRE and ERE in sed?”
  - BRE (Basic): Default mode; groups and intervals are written \( \) and \{ \}, while + and ? are ordinary characters (GNU sed accepts \+ and \? as extensions)
  - ERE (Extended): Enabled with -E; (), {}, +, ?, and | work without escaping
  - Trade-off: ERE is more readable; -E is supported by GNU, BSD, and BusyBox sed, but it only entered POSIX in the 2024 edition, so very old systems may lack it
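The date-reformatting example from the capturing-group answer can be sketched in one substitution. It assumes dates always use two-digit months and days, an assumption made for this illustration only.

```shell
#!/bin/sh
# Reorder three captured groups: MM/DD/YYYY -> YYYY-MM-DD.
# '#' is the delimiter, so the slashes in the date need no escaping.
echo 'Released on 12/20/2025.' |
  sed -E 's#([0-9]{2})/([0-9]{2})/([0-9]{4})#\3-\1-\2#g'
# -> Released on 2025-12-20.
```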
Hints in Layers
Progress through these hints only as needed:
Hint 1: Getting started
First, just try to match ERROR lines and print them:
sed -n '/ERROR/p' app.log
The -n suppresses automatic printing, and p prints only matching lines.
Hint 2: Deleting non-ERROR lines
Instead of printing ERROR lines, delete everything else:
sed '/ERROR/!d' app.log
This keeps only ERROR lines in the output.
Hint 3: Matching the log line structure
Break down the log line format:
- [timestamp] - starts with [, ends with ], contains date/time
- (a single space)
- [ERROR] - the log level
- (another space)
- message - everything else
Pattern with ERE: ^\[.*\] \[ERROR\] (.*)
Hint 4: Capturing the message
Use parentheses around the part you want to keep:
sed -E 's/^\[.*\] \[ERROR\] (.*)/\1/' app.log
This replaces the entire line with just the captured message.
Hint 5: Adding the ERROR prefix
Include literal text in your replacement:
sed -E 's/^\[.*\] \[ERROR\] (.*)/ERROR: \1/' app.log
Now you have the desired output format!
Hint 6: Combining both operations
Use multiple -e expressions to chain commands:
sed -E -e '/ERROR/!d' -e 's/^\[.*\] \[ERROR\] (.*)/ERROR: \1/' app.log
Order matters: first filter to ERROR lines, then reformat them.
Hint 7: Complete solution
#!/bin/bash
# log_cleaner.sh - Extract and reformat ERROR lines from log files
sed -E \
-e '/ERROR/!d' \
-e 's/^\[[^]]+\] \[ERROR\] (.*)/ERROR: \1/' \
"$1"
Usage: ./log_cleaner.sh app.log
Note: [^]]+ is safer than .* for matching the timestamp. It is still greedy, but it cannot run past the first ], so brackets later in the message cannot pull the match along.
Books That Will Help
| Topic | Book | Chapter/Section |
|---|---|---|
| Capturing groups deep dive | “Mastering Regular Expressions” by Jeffrey Friedl | Ch. 3: Overview of Regular Expression Features |
| sed substitution mechanics | “sed & awk” by Dale Dougherty & Arnold Robbins | Ch. 3: Understanding Regular Expression Syntax |
| BRE vs ERE differences | “sed & awk” by Dale Dougherty & Arnold Robbins | Ch. 2: Understanding Basic Operations |
| Pattern negation | “Classic Shell Scripting” by Arnold Robbins | Ch. 3: Searching and Substitution |
| Real-world log processing | “The Linux Command Line” by William Shotts | Ch. 19: Regular Expressions |
| Online reference | The Grymoire - SED | Backreferences section |
Common Pitfalls & Debugging
Problem 1: “No output at all”
- Why: You deleted all lines with /ERROR/!d but the regex never matched ERROR.
- Fix: Verify the log level string and case.
- Quick test:
sed -n '/ERROR/p' app.log
Problem 2: “The regex eats too much and captures the wrong text”
- Why: .* is greedy: in ^\[.*\] it runs to the last ] it can, so brackets inside the message shift what your group captures.
- Fix: Use a safer class like [^]]+ for bracketed fields.
- Quick test:
sed -E 's/^\[[^]]+\] \[ERROR\] (.*)/ERROR: \1/' app.log
Problem 3: “I still see INFO and DEBUG lines”
- Why: You forgot the /ERROR/!d filter or the commands are in the wrong order.
- Fix: Filter first, then substitute.
- Quick test:
sed -E -e '/ERROR/!d' -e 's/^\[[^]]+\] \[ERROR\] (.*)/ERROR: \1/' app.log
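Problem 4: “macOS says ‘invalid command code’”
- Why: Usually a GNU-style -i with no suffix argument (BSD sed then takes your script as the suffix and the file name as the script), another GNU-only flag, or mismatched quotes.
- Fix: Use -E for extended regex on macOS, single quotes around the script, and -i '' if you edit in place.
- Quick test:
sed -E 's/^\[[^]]+\] \[ERROR\] (.*)/ERROR: \1/' app.log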
Definition of Done
- Your script outputs only ERROR lines with the new format.
- INFO and DEBUG lines are fully removed.
- The regex uses explicit classes and avoids fragile .* where possible.
- The script works with -E on both GNU and BSD sed.
- You can explain each capture group and its replacement.
Project 3: Basic Markdown to HTML Converter
- File: LEARN_SED_COMMAND.md
- Main Programming Language: sed (Bash/Shell)
- Alternative Programming Languages: N/A
- Coolness Level: Level 3: Genuinely Clever
- Business Potential: 1. The “Resume Gold”
- Difficulty: Level 2: Intermediate
- Knowledge Area: Scripting / Multiple Transformations
- Software or Tool: sed
- Main Book: Classic Shell Scripting by Arnold Robbins & Nelson H.F. Beebe
What you’ll build: A sed script that reads a simple Markdown file and converts its syntax (headings, bold, italics) into basic HTML tags.
Why it teaches sed: This project teaches you how to structure a sed script with multiple, ordered commands. You’ll learn that the order of substitutions matters and how to handle patterns that occur at the beginning, middle, or end of a line.
Core challenges you’ll face:
- Handling multiple patterns in one script → maps to writing a .sed script file or using multiple -e flags
- Order of operations → maps to realizing you must handle bold before italics to avoid conflicts
- Matching patterns at the beginning of a line → maps to using ^ for headings like ## Title
- Handling greediness in regex → maps to understanding how .* can sometimes match more than you want
Key Concepts:
- Sed Scripts: “sed & awk” Ch. 4
- Command Order: Logical thinking about how transformations affect subsequent commands.
- Regex Greediness: Regular-Expressions.info - Greed and Laziness
Difficulty: Intermediate. Time estimate: 1-2 weeks. Prerequisites: Project 2.
Real World Outcome
You’ll have a Markdown file notes.md with mixed syntax:
# My Document
This is a paragraph with some *italic* and **bold** text.
## A Subheading
More text with **bold and *nested italic*** here.
### A Sub-subheading
Final paragraph with multiple **bold** items and *italics*.
You will build a sed script file converter.sed step-by-step:
Step 1: Create the basic script file
Start with just heading conversions:
cat > converter.sed << 'EOF'
# Convert headings. The space after the hashes keeps the three rules distinct,
# and | is the delimiter because # appears in the patterns themselves.
s|^### (.*)$|<h3>\1</h3>|
s|^## (.*)$|<h2>\1</h2>|
s|^# (.*)$|<h1>\1</h1>|
# Convert emphasis (bold before italic to handle ** before *)
s#\*\*([^*]+)\*\*#<strong>\1</strong>#g
s#\*([^*]+)\*#<em>\1</em>#g
EOF
Step 2: Test individual transformations
First, test just the h2 conversion:
$ echo "## A Subheading" | sed -E 's|^## (.*)$|<h2>\1</h2>|'
<h2>A Subheading</h2>
Then test the h1:
$ echo "# My Document" | sed -E 's|^# (.*)$|<h1>\1</h1>|'
<h1>My Document</h1>
Test a line with bold (using the greedy pattern first to see the bug):
$ echo "Text with **bold** and **more bold** here" | sed -E 's#\*\*(.*)\*\*#<strong>\1</strong>#g'
Text with <strong>bold** and **more bold</strong> here
This is WRONG: the .* matches from the first ** to the last **. Now fix it with a negated character class, which cannot run past an asterisk:
$ echo "Text with **bold** and **more bold** here" | sed -E 's#\*\*([^*]+)\*\*#<strong>\1</strong>#g'
Text with <strong>bold</strong> and <strong>more bold</strong> here
Step 3: Test order dependency
Show why order matters. Process italic first (WRONG):
$ echo "**bold** and *italic*" | sed -E -e 's#\*([^*]+)\*#<em>\1</em>#g' -e 's#\*\*([^*]+)\*\*#<strong>\1</strong>#g'
*<em>bold</em><em> and </em>italic*
Now process bold first (CORRECT):
$ echo "**bold** and *italic*" | sed -E -e 's#\*\*([^*]+)\*\*#<strong>\1</strong>#g' -e 's#\*([^*]+)\*#<em>\1</em>#g'
<strong>bold</strong> and <em>italic</em>
Step 4: Run the complete script on the input file
Execute the full transformation:
$ sed -E -f converter.sed notes.md
<h1>My Document</h1>
This is a paragraph with some <em>italic</em> and <strong>bold</strong> text.
<h2>A Subheading</h2>
More text with *<em>bold and </em>nested italic*** here.
<h3>A Sub-subheading</h3>
Final paragraph with multiple <strong>bold</strong> items and <em>italics</em>.
Step 5: Compare before and after
| Original Markdown | Converted HTML |
|---|---|
| # My Document | <h1>My Document</h1> |
| ## A Subheading | <h2>A Subheading</h2> |
| ### Sub-subheading | <h3>Sub-subheading</h3> |
| **bold text** | <strong>bold text</strong> |
| *italic text* | <em>italic text</em> |
| **bold** and *italic* | <strong>bold</strong> and <em>italic</em> |
| **one** and **two** | <strong>one</strong> and <strong>two</strong> |
Key Insights from Building This:
- Pattern ordering matters: Bold must be processed before italic; headings can be in any order because their patterns don’t overlap.
- Greedy matching is the enemy: .* between delimiters will consume everything up to the LAST delimiter on the line. Use [^char]+ instead to match “anything but this character.”
- The g flag is essential: Without it, s#pattern#replacement# replaces only the first match on each line.
- Alternative delimiters save sanity: Using # or | instead of / means the / in HTML’s </h1> needs no escaping. Pick a delimiter that does not occur in the pattern: the heading rules contain #, so they use | instead.
- Extended regex (-E) cleans up syntax: Without -E, you’d need to escape every grouping parenthesis as \( and \). With -E, plain ( and ) create groups.
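(Note: wrapping paragraphs in <p> tags and list items in <ul>/<li> tags needs a more advanced script using the hold space or multi-line commands such as N, since sed has no lookahead. Nested emphasis such as **bold and *nested italic*** is also beyond these simple rules, because [^*]+ cannot span the inner asterisks. Headings and simple emphasis are fully achievable with basic sed.)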
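A quick way to see why the heading rules need a delimiter other than #: with | the pattern can contain # freely, and escaping each literal # as \# is the other option POSIX allows. This sketch shows both forms producing the same heading.

```shell
#!/bin/sh
# '|' as the delimiter: '#' in the pattern is just a literal character.
echo '## A Subheading' | sed -E 's|^## (.*)$|<h2>\1</h2>|'

# '#' as the delimiter: each literal '#' in the pattern must be written as \#.
echo '## A Subheading' | sed -E 's#^\#\# (.*)$#<h2>\1</h2>#'
```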
Implementation Hints:
- Create a file converter.sed to hold your script. You will run it with sed -E -f converter.sed notes.md.
- Start with the simplest transformation. How do you convert ## A Subheading to <h2>A Subheading</h2>?
  - Your address should match lines starting with ##.
  - Your s command needs to capture the text after the ##.
  - s/^## (.*)$/<h2>\1<\/h2>/. Note the escaped / in the closing tag. sed lets you use other delimiters to avoid this, e.g., s|^## (.*)$|<h2>\1</h2>|. (Avoid # here: it appears in the pattern and would end the regex early.)
- Now, add a rule for <h1> headings. Does the order of the h1 and h2 rules in your script file matter? (No, because their patterns are distinct.)
- Next, tackle bold text: **bold**. The command will look like s/\*\*(.*)\*\*/<strong>\1<\/strong>/g.
- What happens if you have **bold** and **more bold** on one line? The (.*) is “greedy” and will match from the first ** to the last **. You need to match characters that are not asterisks. A pattern like [^*] can help: s/\*\*([^*]+)\*\*/<strong>\1<\/strong>/g.
- Add a rule for italics (*italic*). Does the order of the bold and italic rules matter? (Yes! If you do italics first, **bold** becomes *<em>bold</em>*, which is wrong.)
Learning milestones:
- Your script can convert one type of Markdown syntax → You can write a self-contained rule.
- Your script handles multiple heading levels correctly → You understand how to use multiple rules.
- You can convert bold and italic text on the same line → You understand the importance of command order and non-greedy matching.
- You use an alternate delimiter like # or | in your s command to handle file paths or HTML → You’ve learned a key trick for sed readability.
The Core Question You’re Answering
“How do I chain multiple transformations together in the correct order, and how does regex greediness affect my patterns?”
This project forces you to confront a fundamental truth about text processing: order matters. When you have overlapping syntax patterns (like *italic* and **bold**), the sequence of your transformations determines whether your output is correct or completely broken. You’ll also learn that regex engines are “greedy” by default—they match as much as possible—which can cause unexpected behavior when you have multiple instances of a pattern on the same line.
Concepts You Must Understand First
Before diving into implementation, make sure you’re comfortable with:
| Concept | Description | Why It Matters |
|---|---|---|
| sed script files (`-f script.sed`) | Instead of cramming all your commands into one long command line, you can put them in a file (one command per line) and run `sed -E -f script.sed input.md`. | This makes complex transformations readable, maintainable, and version-controllable. |
| Multiple `-e` expressions | For simpler cases, you can chain commands on the command line: `sed -E -e 's/pattern1/replacement1/' -e 's/pattern2/replacement2/' file`. Each `-e` adds another command to the script. | Useful for quick one-liners when you don’t need a full script file. |
| Regex greediness | The quantifiers `*` and `+` are “greedy”: they match the longest possible string. In `**bold** and **more**`, the pattern `\*\*.*\*\*` matches from the first `**` all the way to the last `**`, swallowing “ and ” in between. | Understanding greediness prevents bugs where patterns match more than intended. |
| Non-greedy workarounds | To fix greediness, use negated character classes like `[^*]+` (“one or more characters that are NOT asterisks”). | sed doesn’t support lazy quantifiers such as `*?` the way Perl or Python do, so this is your primary tool. |
| Alternative delimiters | When your patterns or replacements contain `/` characters (like HTML’s `</h1>`), you can use a different delimiter to avoid escaping. The first character after `s` becomes the delimiter: `s#pattern#replacement#` or `s\|pattern\|replacement\|`. Pick a delimiter that does not appear in the pattern itself. | Makes patterns involving slashes much more readable. |
| Character classes (`[^*]` for negation) | The `[^...]` syntax means “match any character EXCEPT these.” | Crucial for non-greedy matching in sed. |
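Running both forms of the bold rule on the same input is a quick way to see the difference between greedy `.*` and the negated class `[^*]+`. The following is a minimal sketch. The function names `greedy_bold` and `negated_bold` are hypothetical and exist only for this comparison:

```sh
# Hypothetical helpers: the same bold rule written two ways.
# '|' is the delimiter, so '</strong>' needs no escaping.
greedy_bold()  { sed -E 's|\*\*(.*)\*\*|<strong>\1</strong>|g'; }
negated_bold() { sed -E 's|\*\*([^*]+)\*\*|<strong>\1</strong>|g'; }

line='**one** and **two**'
printf '%s\n' "$line" | greedy_bold    # <strong>one** and **two</strong>
printf '%s\n' "$line" | negated_bold   # <strong>one</strong> and <strong>two</strong>
```

The greedy version produces one oversized match that runs from the first `**` to the last. The negated class cannot cross an asterisk, so each bold span gets its own match.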
Questions to Guide Your Design
Ask yourself these questions as you work through the implementation:
- Why should bold patterns be processed before italic?
  - Consider what happens to `**bold**` if you process `*...*` first. A greedy `\*(.*)\*` turns the outer asterisks into `<em>` tags, leaving you with `<em>*bold*</em>`. A non-greedy `\*([^*]+)\*` gives `*<em>bold</em>*`. In neither case do you get `<strong>bold</strong>`.
- How do you handle multiple bold sections on one line?
  - If you have `**one** and **two**`, does your pattern correctly identify both? Or does it greedily match from the first `**` to the last `**`?
- What’s the difference between `(.*)` and `([^*]+)`?
  - `(.*)` matches any characters (including asterisks) as many times as possible. `([^*]+)` matches one or more characters that are specifically NOT asterisks. The second is what you need for correct bold/italic parsing.
- How do you escape special characters in the replacement?
  - The `/` in `</h1>` needs escaping as `\/` when you use `/` as your delimiter. If you use `|` as your delimiter (`s|...|...|`), you don’t need to escape slashes at all. `#` works too, but only when the pattern itself contains no `#`, which rules it out for Markdown headings.
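Here is the same heading substitution written with two delimiters. This is a small sketch. The `/` version must escape the slash in `</h1>`, and the `|` version needs no escaping. A `#` delimiter is not an option here, because the regex itself begins with `#`.

```sh
# Same substitution, two delimiters.
echo '# Title' | sed -E 's/^# (.*)$/<h1>\1<\/h1>/'   # '/' delimiter: slash escaped as \/
echo '# Title' | sed -E 's|^# (.*)$|<h1>\1</h1>|'    # '|' delimiter: no escaping needed
# Both print: <h1>Title</h1>
```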
Thinking Exercise
Before writing any code, trace through what happens when you process this line with different rule orderings:
`**bold** and *italic* and **more bold**`
Scenario A: Process bold first, then italic
- Start: `**bold** and *italic* and **more bold**`
- After bold rule: `<strong>bold</strong> and *italic* and <strong>more bold</strong>`
- After italic rule: `<strong>bold</strong> and <em>italic</em> and <strong>more bold</strong>`
- Result: Correct!
Scenario B: Process italic first, then bold
- Start: `**bold** and *italic* and **more bold**`
- After a greedy italic rule (`s/\*(.*)\*/<em>\1<\/em>/g`): the `*...*` pattern matches from the first `*` of `**bold**` all the way to the last `*` on the line.
- Disaster: `<em>*bold** and *italic* and **more bold*</em>`
- Even the non-greedy italic rule `\*([^*]+)\*` fails, because it pairs up asterisks that belong to different markers: `*<em>bold</em><em> and </em>italic<em> and </em><em>more bold</em>*`
- Result: Completely broken!
Scenario C: Using greedy `.*` for bold
Pattern: `s/\*\*(.*)\*\*/<strong>\1<\/strong>/g`
- Start: `**bold** and *italic* and **more bold**`
- The `.*` matches `bold** and *italic* and **more bold` (everything between the FIRST and LAST `**`)
- Result: `<strong>bold** and *italic* and **more bold</strong>` (wrong!)
Scenario D: Using non-greedy `[^*]+` for bold
Pattern: `s/\*\*([^*]+)\*\*/<strong>\1<\/strong>/g`
- Start: `**bold** and *italic* and **more bold**`
- First match: `**bold**` → `<strong>bold</strong>`
- Second match: `**more bold**` → `<strong>more bold</strong>`
- Result: `<strong>bold</strong> and *italic* and <strong>more bold</strong>` (correct; the italic rule handles `*italic*` next)
This exercise demonstrates why understanding greediness is essential for this project.
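You can check Scenarios A and B by running the Layer 3 bold and italic rules in both orders. This is a sketch. The variable names `line`, `bold`, and `italic` are illustrative, and the rules use `|` as the delimiter:

```sh
line='**bold** and *italic* and **more bold**'
bold='s|\*\*([^*]+)\*\*|<strong>\1</strong>|g'
italic='s|\*([^*]+)\*|<em>\1</em>|g'

# Scenario A: bold first, then italic
printf '%s\n' "$line" | sed -E -e "$bold" -e "$italic"
# <strong>bold</strong> and <em>italic</em> and <strong>more bold</strong>

# Scenario B: italic first (non-greedy), then bold
printf '%s\n' "$line" | sed -E -e "$italic" -e "$bold"
# *<em>bold</em><em> and </em>italic<em> and </em><em>more bold</em>*
```

In Scenario B the italic rule has already consumed every asterisk pair, so the bold rule finds nothing left to match.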
The Interview Questions They’ll Ask
Master this project, and you’ll be prepared for these common interview questions:
- “What is regex greediness and how do you handle it?”
  - Greedy quantifiers (`*`, `+`, `{n,m}`) match as much text as possible while still allowing the overall pattern to match. In tools like sed that don’t have lazy quantifiers (`*?`, `+?`), you work around greediness by using negated character classes. For example, instead of `".*"` to match quoted strings, use `"[^"]*"` to match a quote, then any non-quote characters, then another quote.
- “How would you structure a sed script with multiple transformations?”
  - Put each transformation on its own line in a `.sed` file and run it with `sed -E -f script.sed`. Order commands so that more specific patterns (like `**bold**`) are processed before more general ones (like `*italic*`). Test each transformation individually before combining them.
- “Why might you use `#` instead of `/` as a delimiter?”
  - When your pattern or replacement contains forward slashes (like file paths `/usr/bin/` or HTML tags `</div>`), using `/` as the delimiter requires escaping every slash: `s/\/usr\/bin\//\/opt\/bin\//`. Using an alternate delimiter is much cleaner: `s#/usr/bin/#/opt/bin/#`.
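The quoted-string answer is easy to demonstrate. Here is a minimal sketch with an illustrative input string, where every quoted string is replaced by the placeholder `STR`:

```sh
s='say "hi" and "bye"'
printf '%s\n' "$s" | sed -E 's/".*"/STR/g'      # say STR            (one greedy match)
printf '%s\n' "$s" | sed -E 's/"[^"]*"/STR/g'   # say STR and STR    (one match per string)
```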
Hints in Layers
Start with Layer 1. Only move to the next layer if you’re truly stuck.
Layer 1 - Conceptual Hints
- Start with a single transformation (headings are easiest) before attempting multiple rules.
- Remember that `^` anchors to the start of the line, which makes it perfect for matching `#` or `##` at the beginning.
- The `-E` flag enables extended regex, making your patterns cleaner (grouping parentheses need no backslashes).
Layer 2 - Structural Hints
- Your script file should have the transformations in this order: `h3`, `h2`, `h1`, then bold, then italic. (From most specific to least specific in terms of character overlap.)
- For headings, you need to capture everything after the `#` prefix. The pattern `^# (.*)$` captures the title.
- Use the `g` flag on bold and italic substitutions to handle multiple occurrences per line.
Layer 3 - Specific Patterns
- Heading 1: `s|^# (.*)$|<h1>\1</h1>|`
- Heading 2: `s|^## (.*)$|<h2>\1</h2>|`
- Bold: `s#\*\*([^*]+)\*\*#<strong>\1</strong>#g`
- Italic: `s#\*([^*]+)\*#<em>\1</em>#g`
The heading rules use `|` as the delimiter because their patterns contain `#`. With `s#^# ...#`, sed would end the regex at the second `#` and reject the command.
Layer 4 - Complete Script Structure
```sed
# converter.sed - Markdown to HTML converter
# Run with: sed -E -f converter.sed notes.md
# Convert headings (h3 before h2 before h1; the required space after the
# hashes already prevents partial matches, so this order is defensive).
# Heading rules use | as the delimiter because the patterns contain #.
s|^### (.*)$|<h3>\1</h3>|
s|^## (.*)$|<h2>\1</h2>|
s|^# (.*)$|<h1>\1</h1>|
# Convert emphasis (bold before italic to handle ** before *)
s#\*\*([^*]+)\*\*#<strong>\1</strong>#g
s#\*([^*]+)\*#<em>\1</em>#g
```
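You can test the converter end to end without a `notes.md` file. This sketch writes the rules above to a temporary file and wraps them in a shell function. The wrapper name `md2html` and the sample lines are hypothetical:

```sh
# Write the converter rules to a temporary file (removed when the shell exits).
script=$(mktemp)
trap 'rm -f "$script"' EXIT
cat > "$script" <<'EOF'
s|^### (.*)$|<h3>\1</h3>|
s|^## (.*)$|<h2>\1</h2>|
s|^# (.*)$|<h1>\1</h1>|
s#\*\*([^*]+)\*\*#<strong>\1</strong>#g
s#\*([^*]+)\*#<em>\1</em>#g
EOF

# md2html is a hypothetical wrapper: reads Markdown from stdin or from file arguments.
md2html() { sed -E -f "$script" "$@"; }

printf '%s\n' '# Notes' '## Part **one**' 'Some *emphasis* and **strong** text' | md2html
# <h1>Notes</h1>
# <h2>Part <strong>one</strong></h2>
# Some <em>emphasis</em> and <strong>strong</strong> text
```

The second line shows that the rules compose: the heading rule fires first, and the bold rule then rewrites the text inside the new `<h2>` tag.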
Books That Will Help
| Book | Author(s) | Relevant Chapters | Why It Helps |
|---|---|---|---|
| sed & awk, 2nd Edition | Dale Dougherty & Arnold Robbins | Ch. 3: Understanding Regular Expression Syntax, Ch. 4: Writing sed Scripts | The definitive guide to sed scripting. Chapter 4 specifically covers how to structure multi-command scripts. |
| Classic Shell Scripting | Arnold Robbins & Nelson H.F. Beebe | Ch. 3: Searching and Substitutions, Ch. 5: Pipelines Can Do Amazing Things | Provides context for how sed fits into larger shell scripts and pipelines. |
| Mastering Regular Expressions, 3rd Edition | Jeffrey E.F. Friedl | Ch. 4: The Mechanics of Expression Processing, Ch. 5: Practical Regex Techniques | Deep dive into how regex engines work, including greediness. Essential for understanding why your patterns behave the way they do. |
| The Linux Command Line, 2nd Edition | William Shotts | Ch. 19: Regular Expressions, Ch. 20: Text Processing | Beginner-friendly introduction to regex and text processing tools including sed. |
Common Pitfalls & Debugging
Problem 1: “Bold and italic output is mangled”
- Why: Italic rules ran before bold rules, so the single-asterisk rule consumed the asterisks of `**bold**`.
- Fix: Always process bold (`**`) before italic (`*`).
- Quick test: `echo '**bold** and *italic*' | sed -E -e 's#\*\*([^*]+)\*\*#<strong>\1</strong>#g' -e 's#\*([^*]+)\*#<em>\1</em>#g'`
Problem 2: “Multiple bold sections collapse into one”
- Why: Greedy `(.*)` matched from the first `**` to the last `**`.
- Fix: Use a negated class like `[^*]+` for bold and italic.
- Quick test: `echo '**one** and **two**' | sed -E 's#\*\*([^*]+)\*\*#<strong>\1</strong>#g'`
Problem 3: “Parentheses are treated literally”
- Why: You forgot `-E` and used unescaped `()` in BRE mode.
- Fix: Use `-E`, or escape the parentheses as `\(` and `\)`.
- Quick test: `echo '# Title' | sed -E 's|^# (.*)$|<h1>\1</h1>|'`
Problem 4: “Escaping slashes makes the script unreadable”
- Why: Using `/` as the delimiter in HTML replacements.
- Fix: Switch to `|` or `#` as the delimiter, choosing one that does not appear in the pattern (headings contain `#`, so use `|` there).
- Quick test: `sed -E 's|^## (.*)$|<h2>\1</h2>|' notes.md`
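For Problem 3, here is the same heading rule in both regex dialects, shown as a small sketch. In BRE (no `-E`) the capture group is written `\(...\)`, and in ERE (`-E`) the parentheses are bare:

```sh
echo '# Title' | sed    's|^# \(.*\)$|<h1>\1</h1>|'   # BRE: escaped parentheses
echo '# Title' | sed -E 's|^# (.*)$|<h1>\1</h1>|'     # ERE: plain parentheses
# Both print: <h1>Title</h1>
```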
Definition of Done
- Headings `#`, `##`, and `###` convert to `<h1>`, `<h2>`, `<h3>` correctly.
- Bold and italic are converted correctly and in the right order.
- Multiple bold/italic segments on one line are all converted (use `g`).
- The script runs via `sed -E -f converter.sed notes.md` and matches expected output.
- You documented limitations (no lists or paragraph tags in this version).
Project 4: The Multi-Line Address Parser
- File: LEARN_SED_COMMAND.md
- Main Programming Language: sed (Bash/Shell)
- Alternative Programming Languages: Awk, Perl
- Coolness Level: Level 4: Hardcore Tech Flex
- Business Potential: 1. The “Resume Gold”
- Difficulty: Level 3: Advanced
- Knowledge Area: Advanced `sed` / Hold Space / Multi-line processing
- Software or Tool: sed
- Main Book: The Grymoire - SED (An excellent online tutorial)
What you’ll build: A sed script that transforms a multi-line address block into a single, comma-separated line.
Why it teaches sed: This is your first “real” multi-line problem. You cannot solve it one line at a time: you need the hold space (or an `N` loop, described in the hints below) to carry text from one line to the next. This project forces you to leave the line-by-line assembly line model and start thinking about how to store and combine information across multiple lines.
Core challenges you’ll face:
- “Remembering” previous lines → maps to using `H` to append lines to the hold space
- Knowing when you’re at the end of a block → maps to using patterns (like a blank line) to trigger an action
- Processing the combined block → maps to using `g` or `x` to bring the collected lines back into the pattern space for a final substitution
- Handling newlines in the pattern space → maps to recognizing that the pattern space will now contain `\n` characters
Key Concepts:
- Hold Space: The Grymoire - SED (Advanced section)
- Multi-line Commands: `N`, `P`, `D` commands.
- Advanced Flow Control: “sed & awk” Ch. 6
Difficulty: Advanced
Time estimate: 1-2 weeks
Prerequisites: Project 3. Be prepared to be confused; this is a big conceptual leap.
Real World Outcome
You will have a file addresses.txt, with a blank line separating the address blocks:
```
123 Fake St.
Anytown, ST 12345
USA

456 Main Ave.
Otherville, CA 67890
USA
```
Your sed script will transform it into:
```
123 Fake St., Anytown, ST 12345, USA
456 Main Ave., Otherville, CA 67890, USA
```
Command Line Outcome Example:
```
$ cat addresses.txt
123 Fake St.
Anytown, ST 12345
USA

456 Main Ave.
Otherville, CA 67890
USA
$ cat > script.sed <<'S'
/^$/ {
  x
  s/^\n//
  s/\n/, /g
  p
  d
}
H
${
  x
  s/^\n//
  s/\n/, /g
  p
}
S
$ sed -n -f script.sed addresses.txt
123 Fake St., Anytown, ST 12345, USA
456 Main Ave., Otherville, CA 67890, USA
```
The final `${ ... }` block flushes the last address, which is not followed by a blank line.
Implementation Hints:
This is a classic sed pattern. Here’s the logic broken down:
- The Goal: Read lines and append them together, replacing newlines with “, ”. When we see a blank line, we print the result and start over.
- The `sed` Script Logic (in English):
  - For every line…
  - If this is the last line of the file (`$`), jump to a special block of code to handle it.
  - Read the next line from the input and append it to the pattern space. The two lines are now separated by a `\n`. This is the `N` command.
  - If the pattern space now ends with a blank line (`\n$`), it means we’ve read the line after an address block.
    - We need to process the block (which is everything before the final `\n`).
    - Print the processed part, then delete it, leaving the blank line to be handled in the next cycle. This is what the `P` and `D` commands do.
  - If it’s not the end of a block, just branch back to the beginning to append another line.
- This creates a loop that “slurps” lines into the pattern space. Once you have the whole block, you can do substitutions.
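The `N` loop described above can be sketched as follows. This is a simplified variant, one possible approach and not the only way to write it. Instead of handing the blank line off with `P`/`D`, it strips the trailing blank line and lets sed auto-print the joined block. The function name `join_blocks_n` is hypothetical:

```sh
# join_blocks_n is a hypothetical name for this N-loop sketch.
join_blocks_n() {
  sed '
# Blank separator lines that start a cycle are simply dropped.
/^$/d
:a
# Append the next line, unless this is already the last line
# (guarding N with $! avoids GNU/POSIX differences at end of input).
$!N
# No blank separator at the end yet? Keep slurping while input remains.
/\n$/!{
  $!ba
}
# Drop the blank separator, then join the block.
s/\n$//
s/\n/, /g
' "$@"
}

printf '123 Fake St.\nAnytown, ST 12345\nUSA\n\n456 Main Ave.\nOtherville, CA 67890\nUSA\n' | join_blocks_n
# 123 Fake St., Anytown, ST 12345, USA
# 456 Main Ave., Otherville, CA 67890, USA
```

Because the loop also stops when it reaches the last line, the final block is printed even without a trailing blank line.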
A simpler Hold Space approach:
- Create a `script.sed` file. You will run it with `sed -f script.sed addresses.txt`.
- For lines that are NOT blank:
  - Append the line to the hold space. Use the `H` command.
  - Delete the line from the pattern space so it’s not printed. Use `d`.
- For lines that ARE blank (this is our trigger):
  - First, use `x` to swap the hold space (which contains `\nline1\nline2\nline3`) and the pattern space.
  - Now the pattern space has your collected address block.
  - Perform substitutions to replace the newlines with “, ”. The first `\n` will be at the beginning: `s/^\n//` removes it, and `s/\n/, /g` replaces the rest.
  - The line is now formatted and will be printed automatically.
  - The hold space now contains the blank line. That is fine: the next `H` appends to it, so the next block again starts with exactly one leading `\n`.
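Here are the hold-space steps above turned directly into a script. This is a sketch that runs without `-n`, exactly as described. The function name `join_blocks_h` is hypothetical. This version only flushes a block when it reaches a blank line, so the sample input ends with one:

```sh
# join_blocks_h is a hypothetical name; it follows the steps above literally.
join_blocks_h() {
  sed '
# Non-blank line: stash it in the hold space and print nothing.
/./{
  H
  d
}
# Blank line: swap the collected block in and format it (auto-printed).
x
s/^\n//
s/\n/, /g
' "$@"
}

printf '123 Fake St.\nAnytown, ST 12345\nUSA\n\n456 Main Ave.\nOtherville, CA 67890\nUSA\n\n' | join_blocks_h
# 123 Fake St., Anytown, ST 12345, USA
# 456 Main Ave., Otherville, CA 67890, USA
```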
Learning milestones:
- You can use `H` and `g` to append a line and print the entire buffer → You understand the basics of storing state.
- You can trigger an action on a blank line → You know how to use patterns to control script flow.
- You successfully replace `\n` characters in the pattern space → You’ve mastered multi-line substitution.
- You can explain the difference between `h` and `H`, and `g` and `G` → You have a solid mental model of the hold space.
The Core Question You’re Answering
“How can sed ‘remember’ previous lines when it normally only sees one line at a time?”
This is a critical conceptual leap. By default, sed processes text like water flowing through a pipe - each line passes through, gets transformed (or not), and flows out. The line is then forgotten. But what if you need to combine lines? What if your transformation depends on what came before?
The answer is the hold space - sed’s hidden second buffer that persists across lines. Understanding this unlocks an entirely new category of text manipulation.
Concepts You Must Understand First
Before attempting this project, you need a solid grasp of these concepts:
1. The Pattern Space vs Hold Space
+---------------------------------------------------------------------------+
| sed's Two Buffers |
+---------------------------------------------------------------------------+
| |
| Pattern Space Hold Space |
| +--------------------+ +--------------------+ |
| | | h | | |
| | "current line" | -------> | "saved data" | |
| | | (copy) | | |
| | | | | |
| | | H | | |
| | "current line" | -------> | "saved\nmore" | |
| | | (append) | | |
| | | | | |
| | | g | | |
| | "from hold" | <------- | "saved data" | |
| | | (copy) | | |
| | | | | |
| | | G | | |
| | "current\nsaved" | <------- | "saved data" | |
| | | (append) | | |
| | | | | |
| | | x | | |
| | <---------------- | <------> | ----------------> | |
| | |(exchange)| | |
| +--------------------+ +--------------------+ |
| |
| ^ Input comes here Persists across lines! |
| v Output goes from here (Initially empty) |
| |
+---------------------------------------------------------------------------+
2. The Hold Space Commands
| Command | Name | Action |
|---|---|---|
| `h` | hold | Copy pattern space to hold space (overwrites) |
| `H` | Hold (append) | Append pattern space to hold space (with newline separator) |
| `g` | get | Copy hold space to pattern space (overwrites) |
| `G` | Get (append) | Append hold space to pattern space (with newline separator) |
| `x` | exchange | Swap pattern space and hold space |
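The `l` command makes these differences visible. It prints the pattern space unambiguously, showing embedded newlines as `\n` and marking the end with `$`. Here is a minimal sketch on a two-line input:

```sh
printf 'a\nb\n' | sed -n 'h;${x;l;}'   # b$        (h overwrote "a" with "b")
printf 'a\nb\n' | sed -n 'H;${x;l;}'   # \na\nb$   (H appended, with a leading \n)
printf 'a\nb\n' | sed G                # a, empty line, b, empty line
```

The last line is the well-known double-spacing trick: the hold space stays empty, so `G` appends only a newline to each line.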
3. The Multi-Line Commands
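| Command | Name | Action |
|---|---|---|
| `N` | Next | Append the next input line to the pattern space (with `\n` separator) |
| `P` | Print (first line) | Print the pattern space up to the first `\n` |
| `D` | Delete (first line) | Delete up to and including the first `\n` in the pattern space, then restart the cycle with what remains, without reading a new line (with no `\n`, it acts like `d`) |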
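`N`, `P`, and `D` work together to form a sliding two-line window. A well-known example uses this window to emulate `uniq` by squeezing runs of identical adjacent lines. The sketch below uses a BRE back-reference to compare the two lines:

```sh
# $!N     : extend the window with the next line (skipped on the last line)
# /.../!P : print the first line only if the two lines differ
# D       : drop the first line and rerun the script on what is left
printf 'a\na\nb\nb\nb\nc\n' | sed '$!N;/^\(.*\)\n\1$/!P;D'
# a
# b
# c
```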
4. How Newlines Appear in the Pattern Space
When you use H, G, or N, newlines (\n) become actual characters in the buffer:
After H command appends "line2" to hold space containing "line1":
+---------------------------+
| l i n e 1 \n l i n e 2 |
+---------------------------+
^
This is an actual newline character (written `\n` in a sed regex) that you can match and replace!
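Here is a minimal sketch of matching that embedded newline: collect two lines with `H`, swap them into the pattern space, and replace every `\n`:

```sh
printf 'line1\nline2\n' | sed -n 'H;${x;s/\n/ + /g;p;}'
# " + line1 + line2"  (the first " + " comes from the leading \n that H added)
```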
Questions to Guide Your Design
Before writing any code, answer these questions:
- How do you detect the end of an address block?
  - What marks the boundary between one address and the next?
  - Is it a blank line? A specific pattern? End of file?
- What does the hold space look like after multiple `H` commands?
  - If you append “123 Fake St.”, then “Anytown, ST 12345”, then “USA”…
  - What exact string is in the hold space? (Hint: there are `\n` characters)
- How do you replace embedded newlines with “, ”?
  - Once you have the multi-line block in the pattern space, what substitution converts it to a single line?
  - Is there a leading newline you need to handle differently?
- What happens to the hold space between blocks?
  - After you print one formatted address, is the hold space empty?
  - How do you “reset” for the next address block?
Thinking Exercise: Trace the Buffers
Before coding, manually trace what happens for this input (three lines followed by one blank line):
```
123 Fake St.
Anytown, ST 12345
USA
```
For each line, fill in this table:
| Input Line | Action | Pattern Space After | Hold Space After |
|---|---|---|---|
| `123 Fake St.` | ? | ? | ? |
| `Anytown, ST 12345` | ? | ? | ? |
| `USA` | ? | ? | ? |
| (blank) | ? | ? | ? |
Hint for the trace: Think about these steps:
- Line arrives in pattern space
- If not blank: append to hold space (`H`), then delete (`d`) to suppress printing
- If blank: exchange (`x`) to get the collected lines, perform substitutions, print
What should the hold space contain after “USA”?
`\n123 Fake St.\nAnytown, ST 12345\nUSA`
Note the leading `\n`: the `H` command always adds a newline before appending, even when the hold space is empty.
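To check your answer, collect the three lines with `H` and print the hold space with `l` at the end. This sketch assumes the default `l` line-wrap width (70 characters in GNU sed), which is comfortably wider than this string:

```sh
printf '123 Fake St.\nAnytown, ST 12345\nUSA\n' | sed -n 'H;${x;l;}'
# \n123 Fake St.\nAnytown, ST 12345\nUSA$
```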
The Interview Questions They’ll Ask
This project prepares you for classic interview questions about sed:
- “Explain the difference between the pattern space and hold space in sed.”
  - Pattern space: The active workspace where `sed` loads each line and performs commands
  - Hold space: A secondary buffer that persists across lines, used for “remembering” data
  - Key insight: The pattern space is automatically printed at the end of each cycle (unless `-n`); the hold space is never printed directly
- “How would you join multiple lines into one with sed?”
  - Use `N` to append lines (with newline separator)
  - Or use `H` to collect lines in the hold space, then `g` or `x` to retrieve them
  - Replace `\n` with the desired separator using `s/\n/, /g`
- “What does `H;g;$p` do?”
  - `H`: Append current line to hold space
  - `g`: Copy hold space to pattern space
  - `$p`: On the last line only, print the pattern space
  - Result (with `sed -n`): Collects all lines and prints them together at the end, preceded by an empty line that comes from the leading `\n` added by the first `H`. Without `-n`, the growing buffer is also auto-printed after every line.
- “Write a sed command to join every two lines with a comma.”
  - `sed 'N;s/\n/, /'`: read the next line, then replace the newline with “, ”
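Here is a sketch that checks the last two answers. For an odd number of lines, plain `N` on the last line behaves differently across implementations: GNU sed prints the pattern space before exiting, while POSIX says to exit without printing. The `$!N` form avoids the difference:

```sh
printf 'a\nb\nc\nd\n' | sed 'N;s/\n/, /'     # a, b  then  c, d
printf 'a\nb\nc\n'    | sed '$!N;s/\n/, /'   # a, b  then  c
printf 'a\nb\n'       | sed -n 'H;g;$p'      # empty line, a, b
```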
Hints in Layers
If you’re stuck, reveal hints one at a time:
Hint 1: The Basic Structure
Your script needs two main sections:
# For non-blank lines: store and suppress
/./{ ... }
# For blank lines: process and output
/^$/{ ... }
The /./ matches any line with at least one character (non-blank).
The /^$/ matches lines that are empty (start immediately followed by end).
Hint 2: Storing Lines
For non-blank lines, you want to:
- Add this line to your “collection” in the hold space
- NOT print it yet
The commands you need:
H # Append pattern space to hold space (with \n)
d # Delete pattern space (prevents printing, starts next cycle)
Hint 3: Processing the Block
When you hit a blank line, you need to:
- Get the collected lines from hold space
- Fix the formatting (remove leading newline, replace others with “, “)
- Print the result
x # Exchange: now pattern space has the collected lines
s/^\n// # Remove the leading newline (from first H)
s/\n/, /g # Replace remaining newlines with ", "
p # Print the formatted line
Hint 4: The Complete Script
```sed
# For non-blank lines: accumulate in hold space
/./{
  H
  d
}
# For blank lines: format and print the block
/^$/{
  x
  s/^\n//
  s/\n/, /g
  p
}
```
Run with: `sed -n -f script.sed addresses.txt`
The `-n` suppresses automatic printing (we control output with `p`).
Hint 5: Handling the Last Block (Edge Case)
What if the file doesn’t end with a blank line? The last address block won’t be printed!
Add a handler for the last line:
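```sed
# Every line except the last: accumulate non-blank lines
$!{
  /./{
    H
    d
  }
}
# Last line: accumulate it (if non-blank), then flush the final block
${
  /./H
  x
  s/^\n//
  s/\n/, /g
  /./p
}
# Blank lines before the end: flush the block collected so far
/^$/{
  x
  s/^\n//
  s/\n/, /g
  /./p
}
```
Writing each command on its own line (instead of `{H;d}`) also keeps BSD sed happy, since it expects a `;` or newline before `}`.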
Or use a simpler approach with $ address to always process at end of file.
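Here is a sketch that checks the edge case. It wraps the Hint 5 script in a function and feeds it input with and without a trailing blank line. The function name `flush_blocks` is hypothetical. Both runs should print the same two lines:

```sh
# flush_blocks is a hypothetical wrapper around the Hint 5 script.
flush_blocks() {
  sed -n '
$!{
  /./{
    H
    d
  }
}
${
  /./H
  x
  s/^\n//
  s/\n/, /g
  /./p
}
/^$/{
  x
  s/^\n//
  s/\n/, /g
  /./p
}
' "$@"
}

printf 'A1\nA2\n\nB1\nB2\n'   | flush_blocks   # A1, A2  then  B1, B2
printf 'A1\nA2\n\nB1\nB2\n\n' | flush_blocks   # same output
```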
Books That Will Help
| Book | Relevant Chapter | What You’ll Learn |
|---|---|---|
| sed & awk, 2nd Edition (Dougherty & Robbins) | Chapter 6: Advanced sed Commands | The definitive explanation of the hold space (h/H/g/G/x) and multi-line patterns (N/P/D) |
| The Grymoire - SED (online) | “The Hold Buffer” section | Excellent visual explanations and examples |
| Classic Shell Scripting (Robbins & Beebe) | Chapter 3: Searching and Substitutions | Practical context for when to use sed vs awk |
| Mastering Regular Expressions (Friedl) | Chapter 3: Overview of Regex Features | Understanding the regex that powers sed |
Online Resources:
- The Grymoire - SED - The best free online tutorial
- sed FAQ - Common questions and patterns
- Useful one-line scripts for sed - A goldmine of examples
Common Pitfalls & Debugging
Problem 1: “Output starts with a comma or blank space”
- Why: `H` always prepends a newline, so the hold space starts with `\n`.
- Fix: Remove the leading newline before replacing the rest.
- Quick test: `sed -n '/^$/ { x; s/^\n//; s/\n/, /g; p; d; }; H' addresses.txt`
Problem 2: “The last block never prints”
- Why: Your script only prints on blank lines, and the file ends without one.
- Fix: Add a `$` handler to process the final block.
- Quick test: Add a `$` block that swaps and prints when EOF is reached.
Problem 3: “Blocks from different addresses get merged”
- Why: You did not reset the hold space after printing.
- Fix: Use `x` carefully and ensure the hold space is cleared between blocks.
- Quick test: After printing, run `x; s/.*//; x` to empty the hold space, or structure the logic so the next block overwrites it.
Problem 4: “`s/\n/, /g` does nothing”
- Why: The pattern space never contained newlines because you forgot to `x` or `g` first.
- Fix: Swap or copy hold space into pattern space before replacing.
- Quick test: Insert `x` before the substitution.
Definition of Done
- Each address block is printed as a single comma-separated line.
- The script handles the final block even without a trailing blank line.
- No extra commas or blank lines appear in output.
- You can explain how `H`, `x`, and `s/\n/` interact.
- The solution works on at least three sample blocks.
Project 5: Reversing a File (Line by Line)
- File: LEARN_SED_COMMAND.md
- Main Programming Language: sed (Bash/Shell)
- Alternative Programming Languages: `tac` (the real tool for this), Python, Perl
- Coolness Level: Level 4: Hardcore Tech Flex
- Business Potential: 1. The “Resume Gold”
- Difficulty: Level 3: Advanced
- Knowledge Area: Advanced `sed` / Hold Space Mastery
- Software or Tool: sed
- Main Book: N/A, this is a classic puzzle found in online forums.
What you’ll build: A sed script that reverses the order of lines in a file, printing the last line first and the first line last.
Why it teaches sed: This is the canonical “expert sed” problem. It is impossible without a complete understanding of how the pattern space, hold space, and command flow interact. It forces you to think about how to accumulate the entire file in a buffer and only print it at the very end.
Core challenges you’ll face:
- How to avoid printing each line as it’s read → maps to using `-n` and controlling all output with `p`
- How to accumulate the entire file → maps to repeatedly appending to the hold space
- How to reverse the order → maps to a clever trick of prepending, not appending
- When to print the final result → maps to using the `$` address to trigger a final action on the last line
Key Concepts:
- Suppressing Output: The `-n` flag.
- Hold Space Manipulation: `G`, `h`, `g`.
- End-of-file Address: The `$` address.
Difficulty: Advanced
Time estimate: Weekend
Prerequisites: Project 4. You should be comfortable with the hold space.
Real World Outcome
Given a file file.txt:
```
A
B
C
```
Your script `sed -n -f reverse.sed file.txt` will output:
```
C
B
A
```
Command Line Outcome Example:
```
$ cat file.txt
A
B
C
$ sed -n '1!G;h;$p' file.txt
C
B
A
```
Implementation Hints:
This is a puzzle. Think about the state at each step.
- The Goal: At the end of the script (`$` line), we want a buffer containing `C\nB\nA`.
- Line 1 (“A”):
  - The pattern space contains “A”.
  - We need to store it. `h` will copy “A” to the hold space.
  - Hold space: “A”
- Line 2 (“B”):
  - The pattern space contains “B”.
  - We need to add this to the hold space, but before “A”.
  - `G` appends the hold space to the pattern space. Pattern space: `B\nA`.
  - `h` then copies this combined buffer back to the hold space. Hold space: `B\nA`.
- Line 3 (“C”, the last line `$`):
  - The pattern space contains “C”.
  - `G` appends the hold space. Pattern space: `C\nB\nA`.
  - Now we have our final reversed buffer, but it’s in the pattern space.
  - We need to print it. The `p` command will do this.
- Putting it together: The script looks surprisingly simple.
  - For line 1: save it with `h`.
  - For every later line: put the saved lines underneath with `G`, then save the result with `h`.
  - For the last line (`$`): after `G`, print the buffer with `p`.
  - Remember to use `-n` to prevent printing on every line.
  - These rules collapse into three commands: `1!G;h;$p`. Can you figure out why that works? (Hint: what happens on line 1?) Wrapping them in braces, `{1!G;h;$p;}`, is equivalent.
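The rules above can be written out one per line and compared with the one-liner. This is a sketch. The function name `reverse_long` is hypothetical, and the script is deliberately more verbose than `1!G;h;$p` so each rule is visible:

```sh
# reverse_long is a hypothetical name: the hints above, one rule per line.
reverse_long() {
  sed -n '
# Line 1: just save it.
1h
# Every later line: put the saved lines underneath it, then save the result.
1!{
  G
  h
}
# Last line: the buffer now holds the whole file, reversed.
$p
' "$@"
}

printf 'A\nB\nC\n' | reverse_long           # C, B, A (one per line)
printf 'A\nB\nC\n' | sed -n '1!G;h;$p'      # same output
```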
Learning milestones:
- You can write a script to collect the whole file into the hold space → You understand accumulation.
- You figure out the `G;h` trick to prepend lines → You’ve had the “aha!” moment of advanced `sed`.
- You can control printing to only happen on the very last line → You’ve mastered the `-n` flag and the `p` command combined with the `$` address.
- You can write the classic `sed -n '1!G;h;$p'` one-liner from memory → You are now a `sed` wizard.
The Core Question You’re Answering
“How can I accumulate and reorder an entire file using only the hold space?”
This is the ultimate hold space mastery challenge. Unlike Project 4 where you processed blocks delimited by blank lines, here you must hold the entire file in memory while simultaneously reversing its order. There’s no external trigger—you must track everything yourself and know exactly when to release the accumulated content.
Concepts You Must Understand First
Before attempting this project, ensure you have mastered:
| Concept | Why It’s Critical |
|---|---|
| Complete hold space mastery (from Project 4) | You’ll use `G` and `h` in a very specific sequence to achieve reversal |
| The `-n` flag | Suppresses automatic printing; without it, every line prints immediately, defeating the purpose |
| The `p` (print) command | When `-n` is active, you control ALL output explicitly with `p` |
| The `$` (last line) address | Triggers your final action: printing the accumulated reversed content |
| The `!` (negation) address modifier | Lets you say “all lines EXCEPT line 1” with `1!` |
| The `1!G;h;$p` pattern | The canonical solution: three commands that accomplish everything |
The 1!G;h;$p Pattern Explained:
This cryptic one-liner consists of three commands executed in sequence:
- `1!G`: “If NOT line 1, append hold space to pattern space”
  - On line 1, `G` is skipped: the hold space is still empty, so `G` would only append a bare newline, which stays at the bottom of the buffer and becomes an extra blank line at the end of the output
  - On all other lines, `G` appends the previously accumulated content AFTER the current line
- `h`: “Copy pattern space to hold space”
  - Always executed; saves the current accumulated state for the next iteration
- `$p`: “If this is the last line, print”
  - Only triggers on the final line of input
  - Prints the fully reversed content
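You can see the effect of the `1!` guard by printing the final buffer with `l` instead of `p`, with and without the guard. Here is a minimal sketch. Note the extra `\n` just before the end marker in the unguarded version:

```sh
printf 'A\nB\nC\n' | sed -n 'G;h;${l;}'     # C\nB\nA\n$   (stray newline from line 1)
printf 'A\nB\nC\n' | sed -n '1!G;h;${l;}'   # C\nB\nA$
```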
Visual State Diagram:
┌─────────────────────────────────────────────────────────────────────────────┐
│ Processing a 3-line file: A, B, C │
│ Command: sed -n '1!G;h;$p' │
└─────────────────────────────────────────────────────────────────────────────┘
═══════════════════════════════════════════════════════════════════════════════
PROCESSING LINE 1 (content: "A")
═══════════════════════════════════════════════════════════════════════════════
Step 0: sed reads line into pattern space
┌─────────────────┐ ┌─────────────────┐
│ Pattern Space │ │ Hold Space │
│ ┌─────────────┐ │ │ ┌─────────────┐ │
│ │ A │ │ │ │ (empty) │ │
│ └─────────────┘ │ │ └─────────────┘ │
└─────────────────┘ └─────────────────┘
Step 1: "1!G" — This IS line 1, so G is SKIPPED (the ! negates)
┌─────────────────┐ ┌─────────────────┐
│ Pattern Space │ │ Hold Space │
│ ┌─────────────┐ │ │ ┌─────────────┐ │
│ │ A │ │ │ │ (empty) │ │ ← No change
│ └─────────────┘ │ │ └─────────────┘ │
└─────────────────┘ └─────────────────┘
Step 2: "h" — Copy pattern space to hold space
┌─────────────────┐ ┌─────────────────┐
│ Pattern Space │ │ Hold Space │
│ ┌─────────────┐ │ │ ┌─────────────┐ │
│ │ A │ │ │ │ A │ │ ← Now holds "A"
│ └─────────────┘ │ │ └─────────────┘ │
└─────────────────┘ └─────────────────┘
Step 3: "$p" — This is NOT the last line, so p is SKIPPED
Output: (nothing — suppressed by -n)
═══════════════════════════════════════════════════════════════════════════════
PROCESSING LINE 2 (content: "B")
═══════════════════════════════════════════════════════════════════════════════
Step 0: sed reads line into pattern space
┌─────────────────┐ ┌─────────────────┐
│ Pattern Space │ │ Hold Space │
│ ┌─────────────┐ │ │ ┌─────────────┐ │
│ │ B │ │ │ │ A │ │
│ └─────────────┘ │ │ └─────────────┘ │
└─────────────────┘ └─────────────────┘
Step 1: "1!G" — This is NOT line 1, so G EXECUTES
G appends hold space to pattern space with \n between
┌─────────────────┐ ┌─────────────────┐
│ Pattern Space │ │ Hold Space │
│ ┌─────────────┐ │ │ ┌─────────────┐ │
│ │ B │ │ │ │ A │ │
│ │ ↓ │ │ │ └─────────────┘ │
│ │ A │ │ └─────────────────┘
│ └─────────────┘ │
└─────────────────┘
Pattern space now: "B\nA" (B is BEFORE A — reversal in progress!)
Step 2: "h" — Copy pattern space to hold space
┌─────────────────┐ ┌─────────────────┐
│ Pattern Space │ │ Hold Space │
│ ┌─────────────┐ │ │ ┌─────────────┐ │
│ │ B │ │ │ │ B │ │
│ │ ↓ │ │ │ │ ↓ │ │
│ │ A │ │ │ │ A │ │
│ └─────────────┘ │ │ └─────────────┘ │
└─────────────────┘ └─────────────────┘
Step 3: "$p" — This is NOT the last line, so p is SKIPPED
Output: (nothing)
═══════════════════════════════════════════════════════════════════════════════
PROCESSING LINE 3 (content: "C") — THE LAST LINE
═══════════════════════════════════════════════════════════════════════════════
Step 0: sed reads line into pattern space
┌─────────────────┐ ┌─────────────────┐
│ Pattern Space │ │ Hold Space │
│ ┌─────────────┐ │ │ ┌─────────────┐ │
│ │ C │ │ │ │ B │ │
│ └─────────────┘ │ │ │ ↓ │ │
└─────────────────┘ │ │ A │ │
│ └─────────────┘ │
└─────────────────┘
Step 1: "1!G" — This is NOT line 1, so G EXECUTES
┌─────────────────┐ ┌─────────────────┐
│ Pattern Space │ │ Hold Space │
│ ┌─────────────┐ │ │ ┌─────────────┐ │
│ │ C │ │ │ │ B │ │
│ │ ↓ │ │ │ │ ↓ │ │
│ │ B │ │ │ │ A │ │
│ │ ↓ │ │ │ └─────────────┘ │
│ │ A │ │ └─────────────────┘
│ └─────────────┘ │
└─────────────────┘
Pattern space now: "C\nB\nA" (FULLY REVERSED!)
Step 2: "h" — Copy pattern space to hold space (doesn't matter now)
Step 3: "$p" — This IS the last line, so p EXECUTES!
╔═══════════════════════════════════════╗
║ OUTPUT: ║
║ C ║
║ B ║
║ A ║
╚═══════════════════════════════════════╝
The Key Insight: By using G (append hold to pattern) instead of H (append pattern to hold), we naturally build the content in reverse order. Each new line becomes the TOP of our accumulated buffer.
Questions to Guide Your Design
Before writing any code, answer these questions:
- Why use `-n` with explicit `p`?
  - Without `-n`, sed prints every line automatically after processing
  - We want to accumulate ALL lines and print only at the end
  - Think: What would happen if lines printed as they were processed?
- Why `G` before `h` (not after)?
  - `G` appends hold space AFTER pattern space: `pattern\nhold`
  - If we did `h` first, we’d overwrite the hold space before using it
  - The sequence matters: use the old value, THEN update it
- Why skip `G` on line 1?
  - On line 1, the hold space is empty
  - `G` would still append a newline (the separator) even with empty content
  - That newline stays at the bottom of the buffer, creating an extra blank line at the end of the output
- When should we print?
  - Only on the last line (`$`)
  - At that point, the pattern space contains the complete reversed file
  - If we printed earlier, we’d get partial results
Thinking Exercise
Trace the exact state of pattern space and hold space for a 4-line file:
Input file (numbers.txt):
1
2
3
4
Complete this table by hand BEFORE running the command:
| Line # | After Read | After 1!G | After h | After $p | Hold Space Contents |
|---|---|---|---|---|---|
| 1 | Pattern: “1” | Pattern: “1” (G skipped) | Pattern: “1” | (p skipped) | “1” |
| 2 | Pattern: “2” | Pattern: ? | Pattern: ? | ? | ? |
| 3 | Pattern: “3” | Pattern: ? | Pattern: ? | ? | ? |
| 4 | Pattern: “4” | Pattern: ? | Pattern: ? | ? | ? |
Then verify your answers:
# Create test file
printf '1\n2\n3\n4\n' > numbers.txt
# Run the reversal; with -n, output appears only when $p fires on the last line
sed -n '1!G;h;$p' numbers.txt
Expected final output: 4, 3, 2, 1 (each on its own line)
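To check the table row by row, one option is to replace `$p` with `l`, which prints the pattern space on every cycle in an unambiguous form: embedded newlines appear as `\n` and the end of the buffer is marked with `$`. Because `l` runs right after `h`, each output line also shows the hold space contents at that point. The expected output below is how GNU sed formats `l`; other implementations may differ slightly.

```shell
printf '1\n2\n3\n4\n' > numbers.txt

# 'l' shows the pattern space after 1!G and h have run on each line.
# Since h just copied it, this is also what the hold space contains.
sed -n '1!G;h;l' numbers.txt
# GNU sed output:
# 1$
# 2\n1$
# 3\n2\n1$
# 4\n3\n2\n1$
```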
The Interview Questions They’ll Ask
This is a classic interview question for Unix/Linux positions. Be prepared for:
Question 1: “Write a sed one-liner to reverse a file”
# Your answer should be:
sed -n '1!G;h;$p' filename
# Or the equivalent with curly braces (the ';' before '}' is required by BSD/macOS sed):
sed -n '{1!G;h;$p;}' filename
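If you are asked for something more maintainable than a one-liner, the same program can live in a commented script file run with `-f`. A minimal sketch; the file name `reverse.sed` is illustrative.

```shell
# Write the sed program to its own file, one command per line.
cat > reverse.sed <<'EOF'
# Reverse the input line order (like tac). Run with sed -n.
# On every line except the first, append the hold space (older lines).
1!G
# Save the growing, reversed buffer back into the hold space.
h
# On the last line, print the whole reversed buffer.
$p
EOF

printf 'one\ntwo\nthree\n' | sed -n -f reverse.sed   # prints three, two, one
```

Comments are kept on their own lines because trailing comments after a command are not portable. POSIX also lets the script suppress auto-print itself: if its first line is exactly `#n`, it behaves as if `-n` had been given.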
Question 2: “Explain what sed '1!G;h;$p' does step by step”
Your explanation should cover:
- Why `-n` is needed (suppress auto-print)
- What `1!` means (NOT line 1)
- What `G` does (append a newline plus the hold space to the pattern space)
- What `h` does (copy pattern to hold, replacing its old contents)
- What `$p` does (print on last line only)
- WHY this results in reversal (new lines go on TOP of accumulated content)
Question 3: “Why is there a tac command when you can use sed?”
Good answer points:
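- `tac` is purpose-built and more readable for this specific task
- `tac` is usually more memory-efficient for large files: GNU `tac` reads seekable files backward in blocks instead of holding the whole file in memory
- The `sed` solution demonstrates understanding of the tool's internals
- The `sed` approach is valuable when you need to add additional transformations in the same pass
- In environments without `tac` (macOS and other BSDs, which ship `tail -r` instead, and some minimal Unix systems), `sed` is available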
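One concrete case for the `sed` version is folding another edit into the same pass. As a sketch (the trimming rule and the file name `padded.txt` are just examples), this strips trailing whitespace from each line and reverses the file in a single `sed` process, where the `tac` route needs a second command:

```shell
# Three lines with trailing spaces and a tab.
printf 'alpha  \nbeta\t\ngamma \n' > padded.txt

# Trim trailing blanks first (before G adds any embedded newline),
# then accumulate in reverse exactly as before.
sed -n 's/[[:space:]]*$//;1!G;h;$p' padded.txt   # prints gamma, beta, alpha

# Equivalent two-process pipeline where tac exists:
# tac padded.txt | sed 's/[[:space:]]*$//'
```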
Bonus Question: “What happens if the file is empty?”
- With an empty file, there are no lines to process
- The `$` condition never triggers because there's no "last line"
- Output is empty (correct behavior)
Bonus Question: “What happens with a very large file?”
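- The entire file accumulates in the hold space
- For files larger than available memory, this will fail
- Every cycle also copies the whole accumulated buffer (`G` appends it, `h` copies it back), so total work grows roughly with the square of the file size
- GNU `tac` avoids both problems by reading a seekable file backward in blocks
- For production use on large files, prefer `tac` (GNU) or `tail -r` (BSD/macOS; GNU `tail` has no `-r`)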
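For production scripts, one hedge is to prefer a dedicated tool when it exists and keep the `sed` version only as a fallback. The function name `reverse_lines` is hypothetical; the sketch assumes a POSIX shell, GNU `tac` on Linux, and `tail -r` on BSD-derived systems such as macOS, and it is meant for a single file or standard input.

```shell
# Hypothetical helper: reverse the lines of one file (or stdin if no argument).
reverse_lines() {
  if command -v tac >/dev/null 2>&1; then
    # GNU coreutils: reads regular files backward instead of
    # buffering the whole file in memory.
    tac "$@"
  elif tail -r /dev/null >/dev/null 2>&1; then
    # BSD/macOS tail supports -r (reverse); GNU tail rejects it.
    tail -r "$@"
  else
    # Last resort: keeps the whole input in sed's hold space.
    sed -n '1!G;h;$p' "$@"
  fi
}

printf 'x\ny\nz\n' | reverse_lines   # prints z, y, x
```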
Hints in Layers
If you’re stuck, reveal these hints one at a time:
Hint 1: The Big Picture
You need to:
- Suppress all automatic output (`-n`)
- Build up content in reverse order
- Print everything at the very end
The hold space is your accumulator. But HOW you add to it matters.
Hint 2: Append vs. Prepend
There’s no “prepend to hold space” command in sed. But think about what G does:
- `G` appends the HOLD space to the PATTERN space
- This puts the NEW line BEFORE the accumulated content
- Then `h` saves this new arrangement
So the pattern is: G to combine (new on top), h to save.
Hint 3: The Line 1 Problem
On line 1, the hold space is empty. If you run G, you get:
current_line + \n + (empty) = current_line\n
That trailing newline stays attached to line 1, which ends up at the bottom of the reversed buffer, so your output ends with an extra blank line.
Solution: Skip G on line 1. Use 1!G (execute G on all lines EXCEPT line 1).
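You can watch the stray newline appear by swapping `$p` for `$l`, which shows the final pattern space with embedded newlines as `\n` and its end marked by `$`. The sample output in the comments is GNU sed's formatting; other implementations may differ slightly.

```shell
# Unguarded G: line 1 picks up a newline plus the empty hold space.
printf 'a\nb\nc\n' | sed -n 'G;h;$l'
# GNU sed shows: c\nb\na\n$   (note the extra \n after a)

# Guarded with 1!G: no stray newline.
printf 'a\nb\nc\n' | sed -n '1!G;h;$l'
# GNU sed shows: c\nb\na$
```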
Hint 4: When to Print
You only want to print when you have the complete reversed file.
That’s when you’ve processed the last line.
The $ address matches the last line.
So $p means “if last line, print”.
Hint 5: The Complete Solution
sed -n '1!G;h;$p' filename
Breaking it down:
- `-n`: Don’t print automatically
- `1!G`: On lines 2+, append hold space to pattern space
- `h`: Always copy pattern space to hold space
- `$p`: On the last line, print the pattern space

- Run on line 1: skip `G`, copy “line1” to hold
- Run on line 2: `G` makes “line2\nline1”, `h` saves it
- Run on line 3: `G` makes “line3\nline2\nline1”, `h` saves it
- …on last line: `G` combines, `h` saves, `p` prints the reversed file
Books That Will Help
| Book | Author(s) | Why It’s Relevant |
|---|---|---|
| sed & awk, 2nd Edition | Dale Dougherty & Arnold Robbins | Chapter 6 covers advanced sed techniques including hold space patterns |
| The UNIX Programming Environment | Kernighan & Pike | Classic text with stream editing philosophy and real-world examples |
| Classic Shell Scripting | Arnold Robbins & Nelson H.F. Beebe | Practical sed recipes including file reversal and multi-line patterns |
| Linux Command Line and Shell Scripting Bible | Richard Blum & Christine Bresnahan | Accessible coverage of sed with many working examples |
| The Grymoire - SED Tutorial | Bruce Barnett | Free online resource with excellent hold space explanations |
Online Resources:
- GNU sed Manual — The authoritative reference
- sed FAQ — Common questions and advanced patterns
- Sed One-Liners Explained — Peteris Krumins’ excellent breakdown
Common Pitfalls & Debugging
Problem 1: “Output has an extra blank line at the end”
- Why: You ran `G` on line 1, which appends a newline plus the (empty) hold space; that newline travels with line 1 to the bottom of the output.
- Fix: Skip `G` on the first line using `1!G`.
- Quick test: `sed -n '1!G;h;$p' file.txt`
Problem 2: “Output is in the same order”
- Why: You used `H` (append pattern to hold) instead of `G` (append hold to pattern).
- Fix: Use `G;h` so the current line is placed in front of the accumulated hold space.
- Quick test: On a 3-line file, compare `sed -n '1h;1!H;${x;p;}'` (original order) with `sed -n '1!G;h;$p'` (reversed).
Problem 3: “Every line prints immediately”
- Why: You forgot `-n` or used `p` without `$`.
- Fix: Add `-n` and print only on the last line with `$p`.
- Quick test: `sed -n '1!G;h;$p' file.txt`
Problem 4: “Script fails on huge files”
- Why: This method stores the entire file in memory and recopies the growing buffer on every line.
- Fix: Use `tac` or `tail -r` for large inputs in production.
- Quick test: Measure memory usage on large files and compare with `tac`.
Definition of Done
- [ ] `sed -n '1!G;h;$p' file.txt` produces the exact reversed order.
- [ ] No extra blank line appears at the end of the output.
- [ ] The script works for empty files and single-line files.
- [ ] You can explain why `G` is skipped on line 1.
- [ ] You documented the memory trade-off vs `tac`.
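The first three checklist items can be automated. A minimal self-check, assuming a POSIX shell; the `check` function and the temporary file names are illustrative, not part of any tool. `cmp` compares bytes exactly, so a stray trailing blank line counts as a failure.

```shell
# Hypothetical helper: compare sed's reversal with the expected text.
# $1 = input text, $2 = expected output (both as printf formats).
check() {
  printf "$1" > in.txt
  printf "$2" > want.txt
  sed -n '1!G;h;$p' in.txt > got.txt
  if cmp -s want.txt got.txt; then
    echo "ok:   $(wc -l < in.txt) input line(s)"
  else
    echo "FAIL: $(wc -l < in.txt) input line(s)"
    return 1
  fi
}

check '1\n2\n3\n4\n' '4\n3\n2\n1\n'   # exact reversed order, no extra blank line
check 'only\n' 'only\n'               # single-line file
check '' ''                           # empty file: no output at all
```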
Summary
| Project | Difficulty | Key Learning |
|---|---|---|
| 1. Config File Updater | Beginner | Basic substitution (s), in-place editing (-i). |
| 2. Log File Cleaner | Beginner | Regex capturing groups (`\(...\)`) and backreferences (`\1`) for reformatting. |
| 3. Markdown to HTML | Intermediate | Writing multi-command scripts, command order. |
| 4. Multi-Line Address Parser | Advanced | The Hold Space (H, g, x) for multi-line logic. |
| 5. Reversing a File | Advanced | Mastery of the hold space (G, h) and address logic (1!, $). |