Project 1: Field Extractor CLI Tool
Build a `fieldex` command that extracts, reorders, and formats columns from delimited text with predictable, testable output.
Quick Reference
| Attribute | Value |
|---|---|
| Difficulty | Level 1: Beginner |
| Time Estimate | 6-10 hours |
| Main Programming Language | AWK (with a Bash wrapper) |
| Alternative Programming Languages | Python, Perl, Go |
| Coolness Level | Level 2: Practical |
| Business Potential | 1: Internal utility |
| Prerequisites | Basic shell usage, piping, simple regex |
| Key Topics | FS/OFS, NF, FPAT, CLI arg parsing, field ranges |
1. Learning Objectives
By completing this project, you will:
- Explain how AWK splits records into fields and rebuilds output records.
- Implement a CLI that accepts field lists and ranges like `1,3,5-7` and `2-`.
- Support delimiter configuration and consistent output formatting.
- Handle quoted CSV reliably using GNU awk `FPAT`.
- Build deterministic tests for CLI text tools.
2. All Theory Needed (Per-Concept Breakdown)
2.1 Records, Fields, and Separators (FS/OFS/RS/ORS)
Fundamentals
AWK processes text as a stream of records (usually lines) and splits each record into fields. The full line is stored in $0 and the fields are $1 through $NF. FS controls how input is split, while OFS controls how output fields are joined. RS and ORS do the same for records. The default FS is a single space, which collapses runs of whitespace, so tabular data “just works” without extra configuration. Understanding these variables is essential for a field extraction tool because your output is literally a new record assembled from specific fields. Changing a field or NF can cause AWK to rebuild $0 with OFS, which is a powerful but sometimes surprising behavior.
Deep Dive into the concept
At runtime, AWK executes a consistent pipeline: read a record using RS, set $0, split into fields using FS (or FPAT/FIELDWIDTHS), then run pattern-action rules. For most CLI tools, you keep RS="\n" and operate line by line. The key nuance is how FS behaves. If FS is a single space, AWK treats any run of whitespace as a separator and ignores leading/trailing whitespace. This is why awk '{print $1}' works on irregular spacing. But if FS is a literal comma or colon, AWK does not collapse adjacent delimiters; empty fields are preserved, which matters for CSV and /etc/passwd style data.
OFS is equally important. If you print multiple fields with commas in the print statement, AWK inserts OFS between them, not the original separator. This means print $1, $3 is “format output,” not “reproduce input.” For a CLI tool, this is a feature: you can standardize output regardless of input delimiters. However, it means you must set OFS deliberately and document it. If you print using concatenation (print $1 $3), AWK does not insert OFS, which yields different output. This distinction is the difference between a stable tool and a confusing one.
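A minimal sketch contrasting both behaviors, runnable with any POSIX awk:

# Default FS collapses runs of whitespace; a literal delimiter does not:
#   echo 'a   b  c' | awk '{ print NF }'          -> 3
#   echo 'a,,c'     | awk -F',' '{ print NF }'    -> 3  (middle field is empty)
BEGIN { OFS = "-" }
{
    print $1, $2     # comma inserts OFS:      "a-b"
    print $1 $2      # concatenation, no OFS:  "ab"
}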
RS can be configured to treat paragraphs or other blocks as records. You probably keep RS as newline in this project, but understanding it matters because a “field extractor” that claims to be general must explicitly state its assumptions. If you later extend to multi-line records, you will need to reason about how NF changes and how you rebuild $0 after modifications.
The rebuild behavior is the critical invariant: when you assign to a field ($3 = "x") or to NF, AWK marks the record as “dirty.” The next time $0 is referenced, AWK rebuilds it by joining the fields with OFS. This is how a field extraction tool can normalize spacing or output. But it also means that scripts with multiple rules can behave differently depending on order. If an earlier rule modifies fields, later rules see the modified record. This is why a CLI tool should prefer a single rule with a deterministic BEGIN for setup and a single action for output.
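A tiny sketch of the rebuild idiom; the no-op assignment `$1 = $1` is the usual way to force it:

# Re-join a colon-delimited record with a new output separator.
BEGIN { FS = ":"; OFS = "|" }
{
    $1 = $1      # marks the record dirty; $0 is rebuilt with OFS
    print $0     # "root:x:0" comes out as "root|x|0"
}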
Finally, consider portability. POSIX AWK guarantees FS, OFS, RS, ORS, and the record/field model. GNU awk adds FPAT for content-based splitting, which is extremely useful for CSV with quoted commas. If you require FPAT, the tool should document that it expects GNU awk. This is a design decision that affects users, tests, and error handling. For the fieldex tool, it is reasonable to detect gawk and enable --csv mode with FPAT, while leaving default FS parsing for other cases.
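One way to enforce this at startup is to probe a gawk-only feature. The sketch below assumes the `PROCINFO` array, which only gawk populates:

BEGIN {
    if (!("version" in PROCINFO)) {
        print "fieldex: --csv mode requires GNU awk (gawk)" > "/dev/stderr"
        exit 2
    }
    FPAT = "([^,]*)|(\"([^\"]|\"\")*\")"
}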
How this fits into the project
This concept is the foundation of the extractor: you are explicitly selecting fields and rebuilding output records in a controlled format.
Definitions & key terms
- Record -> The current input unit (`$0`), typically one line.
- Field -> A token within a record (`$1`..`$NF`).
- FS -> Input field separator (string or regex).
- OFS -> Output field separator inserted by `print`.
- RS -> Input record separator, usually newline.
- ORS -> Output record separator, usually newline.
- NF -> Number of fields in the current record.
Mental model diagram
raw line -> split by FS -> [$1][$2][$3]... -> select -> join with OFS -> output line
How it works (step-by-step)
- Read a line into `$0` using `RS`.
- Split `$0` into `$1`..`$NF` using `FS`.
- Select requested fields and build a list.
- `print` the list so AWK inserts `OFS` between fields.
- Emit `ORS` at the end of the record.
Invariants and failure modes:
- Invariant: `NF` equals the number of fields after splitting.
- Failure: If `FS` does not match the input format, fields shift or become empty.
Minimal concrete example
BEGIN { FS=":"; OFS="|" }
{ print $1, $7 }
Common misconceptions
- “`FS=" "` means a single space.” (It collapses runs of whitespace.)
- “`print $1 $2` uses OFS.” (It does not; it concatenates.)
- “Changing `$1` doesn’t change `$0`.” (It does once `$0` is rebuilt.)
Check-your-understanding questions
- Why does `FS=" "` behave differently than `FS=","`?
- What is the difference between `print $1, $2` and `print $1 $2`?
- What happens if you set `NF=2` in the middle of a script?
Check-your-understanding answers
- A single space collapses runs of whitespace; a literal comma does not.
- The comma inserts `OFS`; concatenation does not insert anything.
- AWK truncates the record to two fields and rebuilds `$0` on demand.
Real-world applications
- Extracting specific columns from CSV exports
- Normalizing log formats for downstream tools
- Selecting columns from `/etc/passwd` or `ps` output
Where you’ll apply it
- See §5.4 “Concepts You Must Understand First” and §5.10 Phase 1.
- Also used in: P04 CSV to JSON/SQL, P06 Multi-File Join
References
- The AWK Programming Language, Ch. 1-2
- Effective awk Programming, Ch. 4
- POSIX awk specification (`awk(1p)`)
Key insights
Field extraction is just controlled splitting and rejoining—get the separators right and everything else becomes predictable.
Summary
Records and fields are AWK’s core model. If you can predict how FS and OFS shape input and output, you can build reliable text tools with very little code.
Homework/Exercises to practice the concept
- Extract fields 1 and 3 from a space-delimited file and join with `|`.
- Switch `FS` from space to colon and observe how `NF` changes.
- Modify `$2` and print `$0` to see record rebuilding.
Solutions to the homework/exercises
# 1
BEGIN { OFS="|" }
{ print $1, $3 }
# 2
BEGIN { FS=":" }
{ print NF }
# 3
{ $2 = "X"; print $0 }
2.2 Field Selection Syntax and Range Expansion
Fundamentals
A field extractor needs a compact way to describe which columns to print. Humans use lists and ranges: 1,3,5-7,10-. The tool’s job is to parse that description, expand it into a concrete ordered list, and then print fields in that order. The tricky part is validating field indices, handling open-ended ranges, and deciding what to do when fields are missing. This is not an AWK language feature; it is a small parsing problem you solve in the wrapper or in AWK itself.
Deep Dive into the concept
Range expansion is a tiny domain-specific language. The input string is a comma-separated list of tokens. Each token is either a single positive integer (field index) or a range with a hyphen. Ranges can be start-end or start- (open ended). Your parser must detect invalid tokens (non-numeric, zero, negative) and decide on failure behavior. For example, -3 should be rejected, and 3-2 is either an error or treated as a descending range. Define this explicitly and test for it.
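As a sketch, the whole token grammar fits in one validation function (the name is illustrative):

# Accepts "3", "3-", and "3-5"; rejects "", "-3", "0", "a", and "3--4".
# Note: "3-2" passes; treating descending ranges as errors is a separate check.
function valid_token(t) {
    return t ~ /^[1-9][0-9]*(-([1-9][0-9]*)?)?$/
}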
Once you parse the list, you need to expand it into a deterministic sequence. If a user writes 1,3-5,3, do you allow duplicates? Many CLI tools preserve order and duplicates, which can be useful for reordering or duplication. That behavior should be documented because it affects output. A safe default is to preserve order but allow a --unique flag that removes duplicates. For this project, you can preserve order and avoid de-duplication to keep behavior simple and explainable.
Open-ended ranges require a decision on what - means. A common interpretation is start- meaning “from start to NF.” In AWK, NF is only known at runtime for each record. That means you cannot fully expand such ranges ahead of time unless you treat them as a special token resolved per record. A clean design is to parse ranges into a list of selectors that can be evaluated per record. For example, store range_start and a flag range_to_end. At print time, you loop from range_start to NF and print each field. This keeps your parser simple and your runtime logic clear.
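A sketch of that per-record evaluation, assuming the parser filled start[i], end[i], and to_end[i] for each of the nsel selectors:

# Resolve open-ended ranges against NF at print time.
{
    for (i = 1; i <= nsel; i++) {
        hi = to_end[i] ? NF : end[i]
        for (j = start[i]; j <= hi; j++)
            printf "%s%s", $j, (i == nsel && j == hi ? ORS : OFS)
    }
}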
The wrapper vs. pure AWK design choice matters. A Bash wrapper can parse command-line options, then pass a generated AWK program or variable values to AWK. Alternatively, you can implement parsing inside AWK using ARGV and split() on the -f option. The wrapper approach is easier to read and test for beginners, while the pure-AWK approach is more self-contained and portable. In this project, you can use a small shell wrapper and keep the AWK script focused on field selection.
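A sketch of that hand-off, assuming the wrapper exports user input through `-v` variables (the names `fields` and `ofs` are illustrative):

# Wrapper side, conceptually:
#   awk -v fields="$FIELDS" -v ofs="$OFS" -f fieldex.awk "$@"
BEGIN {
    if (ofs != "") OFS = ofs
    nsel = split(fields, sel, ",")
}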
There is also a subtle formatting consideration: if a requested field index is greater than NF, AWK returns an empty string. If you print a list of fields, you will still get separators. This can produce trailing delimiters. Decide whether to omit missing fields or output empty fields. The more predictable behavior is to output empty fields, because it preserves column positions. But for a CLI tool, many users prefer to drop missing fields. You can offer a --strict or --skip-missing flag. For this project, define a clear default and document it in the usage.
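The two behaviors are easy to contrast; a sketch assuming numeric selectors in sel[1..nsel] and a hypothetical skip_missing flag:

# Default: out-of-range indices print as empty fields, preserving positions.
#   printf 'a b c\n' | awk 'BEGIN{OFS="|"} {print $1, $5}'   ->  "a|"
# --skip-missing variant: drop indices beyond NF instead.
{
    line = ""; k = 0
    for (i = 1; i <= nsel; i++)
        if (sel[i] <= NF || !skip_missing)
            line = (k++ ? line OFS : "") $(sel[i])
    print line
}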
Finally, consider error handling. If the user passes -f without a value, or includes a malformed token like 1,,3, your tool should return a non-zero exit code and print a helpful message. This is part of being a production-quality CLI, even for a small tool.
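A sketch of fail-fast validation in BEGIN, reusing the token grammar above and the exit codes from §3.7.4:

BEGIN {
    if (fields == "") {
        print "fieldex: missing or empty -f FIELDS" > "/dev/stderr"
        exit 2
    }
    n = split(fields, s, ",")
    for (i = 1; i <= n; i++)
        if (s[i] !~ /^[1-9][0-9]*(-([1-9][0-9]*)?)?$/) {
            printf "fieldex: invalid field selector: %s\n", s[i] > "/dev/stderr"
            exit 2
        }
}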
How this fits into the project
Field selection and range expansion is the “business logic” of the tool. Everything else is setup and I/O.
Definitions & key terms
- Field list -> A comma-separated list of field selectors.
- Range -> A selector of the form `start-end` or `start-`.
- Open-ended range -> A range that ends at `NF`.
- Selector -> A single field index or a range definition.
Mental model diagram
"1,3-5,7-" -> tokens -> [1] [3-5] [7-] -> runtime expansion -> [1,3,4,5,7..NF]
How it works (step-by-step)
- Parse the `-f` argument into comma-separated tokens.
- For each token, detect a single index or a range.
- Validate indices (positive integers).
- Store selectors in order.
- For each input record, expand ranges using `NF` and print fields.
Invariants and failure modes:
- Invariant: all selectors are validated integers.
- Failure: invalid tokens should abort with exit code 2.
Minimal concrete example
# Given selectors in an array sel[1..n]; k and line are reset per record
k = 0; line = ""
for (i = 1; i <= n; i++) {
    if (sel[i] ~ /-/) {                      # range selector like "3-5" or "7-"
        split(sel[i], r, "-")
        start = r[1]
        end = (r[2] == "" ? NF : r[2])
        for (j = start; j <= end; j++)
            line = (k++ ? line OFS : "") $j
    } else {                                 # single field index
        line = (k++ ? line OFS : "") $(sel[i])
    }
}
print line    # POSIX awk has no built-in join(), so build the string directly
Common misconceptions
- “Ranges can be expanded before reading input.” (Open-ended ranges depend on `NF`.)
- “Missing fields are errors.” (In AWK, missing fields are empty strings.)
- “You should always de-duplicate fields.” (Order and duplicates can be meaningful.)
Check-your-understanding questions
- Why can’t `3-` be expanded without reading the record?
- What happens when you print a field index larger than `NF`?
- Should `1,1,2` print field 1 twice?
Check-your-understanding answers
- Because the end depends on `NF`, which is record-specific.
- AWK prints an empty string for that field.
- It depends on design; by default preserving order means yes.
Real-world applications
- Command-line replacements for `cut` with range support
- Data extraction in ETL pipelines
- Reordering columns for data import/export
Where you’ll apply it
- See §5.5 “Questions to Guide Your Design” and §5.10 Phase 2.
- Also used in: P07 Report Generator, P16 Data Pipeline
References
- The AWK Programming Language, Ch. 2
- Effective awk Programming, Ch. 5 (programming with arrays)
Key insights
Field selection is a tiny parsing language; treat it with the same rigor as any CLI interface.
Summary
Parsing and expanding field ranges is what turns a simple one-liner into a reliable CLI tool that people can reuse.
Homework/Exercises to practice the concept
- Write a parser that expands `2-4,1,6-` into explicit indices.
- Decide how to handle invalid tokens like `3--4` or `a`.
- Implement a `--unique` mode that removes duplicates.
Solutions to the homework/exercises
# 1 (sketch)
function expand(sel, out,    n, s, i, r, start, end, j, k) {
    n = split(sel, s, ",")
    for (i = 1; i <= n; i++) {
        if (s[i] ~ /-/) {
            split(s[i], r, "-")
            start = r[1]
            end = (r[2] == "" ? NF : r[2])
            for (j = start; j <= end; j++) out[++k] = j
        } else out[++k] = s[i]
    }
    return k
}
2.3 Robust CSV Parsing with FPAT
Fundamentals
CSV looks simple but gets tricky when fields contain commas or quotes. If you split on commas with FS=",", a value like "Smith, John" will be split incorrectly. GNU awk’s FPAT solves this by defining what a field is rather than what separates fields. A typical CSV FPAT matches either a quoted field or an unquoted field. This allows you to preserve commas inside quotes and then unescape quotes for output.
Deep Dive into the concept
CSV has a small but strict grammar: fields are separated by commas, fields may be quoted with double quotes, and quotes inside quoted fields are escaped by doubling them (""). If you parse with FS=",", you cannot distinguish commas inside quoted fields from separators. FPAT reverses the logic. Instead of describing separators, you define a regex that matches valid field content. For CSV, a common pattern is:
FPAT = "([^,]*)|(
\"([^\"]|\"\")*\"
)"
A stricter variant that requires non-empty unquoted fields is:
FPAT = "([^,]+)|(\"([^\"]|\"\")*\")"
This matches either unquoted text without commas or a quoted string that may contain escaped quotes. Once fields are matched, AWK treats each match as a field and sets $1..$NF accordingly.
However, FPAT is a GNU awk extension. That means portability is limited. Your CLI tool should detect GNU awk (gawk --version) or provide a --csv mode that explicitly requires it. For users on macOS, the default /usr/bin/awk is not GNU awk; this is a common portability problem. You can either instruct users to install gawk and run gawk -f fieldex.awk, or bundle a wrapper that checks and fails with a clear error message.
After parsing, you still need to unquote fields. CSV quoted fields preserve quotes, so "Alice" becomes a field value including quotes. If you output raw fields, your output may include quotes in unexpected places. For a field extractor tool, you typically want to output the value, so you should strip surrounding quotes and unescape doubled quotes. This can be done with a simple helper: if a field starts and ends with ", remove them, then replace "" with ".
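That helper is only a few lines; a sketch:

# Strip surrounding quotes, then unescape doubled quotes ("" -> ").
function unquote(s) {
    if (s ~ /^".*"$/) {
        s = substr(s, 2, length(s) - 2)
        gsub(/""/, "\"", s)
    }
    return s
}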
Edge cases: empty fields, trailing commas, and line endings. FPAT patterns should allow empty fields (e.g., "" or nothing between commas). This is easy to miss. Also, Windows CSV files may end with \r\n; you should normalize line endings or set RS appropriately. If you ignore this, your output may carry stray \r characters.
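A one-line guard covers the CRLF case; because sub() on $0 triggers a re-split, it should run before any field is read:

{ sub(/\r$/, "") }   # normalize Windows line endings; awk re-splits $0 afterward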
Finally, note that CSV can include newlines inside quoted fields. Handling that would require multi-line record splitting by setting RS to a regex that matches a full CSV record. This is advanced and not required for this project, but your documentation should state that multi-line CSV is not supported. Clear boundaries are part of robust CLI tool design.
How this fits into the project
If you want a “real” field extractor, it must not break on quoted CSV. FPAT is the simplest practical approach in GNU awk.
Definitions & key terms
- CSV -> Comma-separated values with quoted field rules.
- FPAT -> GNU awk variable defining field patterns.
- Quoted field -> A field enclosed in double quotes.
- Escaped quote -> A doubled quote inside a quoted field (`""`).
Mental model diagram
line: Bob,"Smith, John",42
FPAT -> [Bob] ["Smith, John"] [42]
How it works (step-by-step)
- Set `FPAT` to a regex that matches CSV fields.
- AWK scans the line and extracts matches as fields.
- Optionally unquote and unescape quoted fields.
- Print selected fields with `OFS`.
Invariants and failure modes:
- Invariant: quoted commas do not split fields.
- Failure: non-GNU awk ignores `FPAT` and breaks parsing.
Minimal concrete example
BEGIN { FPAT = "([^,]+)|(\"([^\"]|\"\")*\")"; OFS="|" }
{ f = $2; gsub(/^"|"$/, "", f); gsub(/""/, "\"", f); print $1, f }
Common misconceptions
- “CSV is just `FS=","`.” (Quoted commas break this.)
- “FPAT is portable.” (It is GNU awk only.)
- “Quoted fields should keep quotes.” (Depends on output requirements.)
Check-your-understanding questions
- Why does `FS=","` fail for `"Doe, Jane"`?
- What does `FPAT` change about field splitting?
- How do you handle doubled quotes inside a quoted field?
Check-your-understanding answers
- The comma inside quotes is treated as a separator.
- It matches field content instead of separators.
- Replace `""` with `"` after stripping the outer quotes.
Real-world applications
- Parsing CSV exports from spreadsheets
- Cleaning CRM or sales data
- Extracting columns from log CSVs with quoted strings
Where you’ll apply it
- See §5.4 “Concepts You Must Understand First” and §5.10 Phase 2.
- Also used in: P04 CSV to JSON/SQL, P10 Text Spreadsheet
References
- GNU awk User’s Guide: `FPAT`
- RFC 4180 (CSV format)
Key insights
CSV parsing is about defining field content, not just separators; FPAT is the practical AWK solution.
Summary
A reliable field extractor must handle quoted CSV. FPAT makes that possible with a clear regex and post-processing.
Homework/Exercises to practice the concept
- Write an `FPAT` that matches empty fields and quoted fields.
- Parse a CSV row with embedded commas and output the second column only.
- Add a flag to keep quotes rather than stripping them.
Solutions to the homework/exercises
BEGIN { FPAT = "([^,]*)|(\"([^\"]|\"\")*\")" }
{ f = $2; gsub(/^"|"$/, "", f); gsub(/""/, "\"", f); print f }
3. Project Specification
3.1 What You Will Build
A CLI tool called fieldex that extracts one or more fields from each input record. It supports delimiter configuration, field lists and ranges, and predictable output formatting. You will deliver a small Bash wrapper plus an AWK script, with tests and usage documentation.
Included:
- `-f` for field list/range selection
- `-d` for the input delimiter (`FS`) and `--ofs` for the output separator
- `--csv` mode using GNU awk `FPAT`
- Deterministic test fixtures
Excluded:
- Multi-line CSV parsing
- Binary formats
- GUI output
3.2 Functional Requirements
- Field selection: Accept `-f "1,3,5-7"` and `-f "2-"`.
- Delimiter control: Accept `-d ","` to set `FS`.
- Output separator: Accept `--ofs "|"` to set `OFS`.
- CSV mode: `--csv` uses `FPAT` and unquotes fields.
- Error handling: Invalid field specs return exit code 2.
3.3 Non-Functional Requirements
- Performance: Stream input without loading entire files.
- Reliability: Deterministic output for same inputs.
- Usability: Clear usage and helpful error messages.
3.4 Example Usage / Output
$ printf 'a b c\n1 2 3\n' | ./fieldex -f 1,3
# Output:
a c
1 3
3.5 Data Formats / Schemas / Protocols
- Input: delimited text, one record per line
- Output: delimited text, one record per line
- CSV mode: RFC 4180 style quoted fields (single-line only)
3.6 Edge Cases
- Field index exceeds `NF` -> output empty field
- Empty lines -> output empty line
- `-f` missing or invalid -> error with exit code 2
- `-d` empty string -> error with exit code 2
3.7 Real World Outcome
A user can run a single command to extract and reformat columns from mixed-delimiter files. They can see exact, stable outputs and understand errors when inputs or flags are invalid.
3.7.1 How to Run (Copy/Paste)
cd /path/to/fieldex
chmod +x fieldex
./fieldex -f 1,3 -d ',' --ofs '|' data.csv
3.7.2 Golden Path Demo (Deterministic)
Input file data.csv:
name,age,role
"Smith, John",32,engineer
Alice,29,manager
Command:
./fieldex --csv -f 1,2 --ofs '|' data.csv
Output:
name|age
Smith, John|32
Alice|29
3.7.3 Failure Demo (Deterministic)
$ ./fieldex -f 3--4 data.csv
fieldex: invalid field selector: 3--4
exit=2
3.7.4 Exit Codes
- `0` success
- `2` invalid arguments or field specs
- `3` input file not found/unreadable
4. Solution Architecture
4.1 High-Level Design
+----------+      +-----------------+      +--------------+
| CLI args | ---> | selector parser | ---> | AWK executor |
+----------+      +-----------------+      +--------------+
                                                  |
                                                  v
                                          formatted output
4.2 Key Components
| Component | Responsibility | Key Decisions |
|---|---|---|
| Bash wrapper | Parse flags, validate, build args | Keep parsing simple, fail fast |
| Selector parser | Expand ranges and store selectors | Preserve order and duplicates |
| AWK script | Apply selectors and print fields | Single action for deterministic run |
4.3 Data Structures (No Full Code)
# selectors[1..n] holds either "N" or "N-" or "N-M"
# output buffer holds selected fields per record
4.4 Algorithm Overview
Key Algorithm: Selector Expansion
- Split the `-f` string on commas.
- Validate each token.
- Store tokens in an array.
- For each record, expand ranges and print fields.
Complexity Analysis:
- Time: O(R * (F + S)) where R records, F fields, S selectors
- Space: O(S) selectors per record
5. Implementation Guide
5.1 Development Environment Setup
# Ensure GNU awk is available (it may be installed as gawk)
gawk --version | head -1
5.2 Project Structure
fieldex/
├── fieldex # wrapper
├── fieldex.awk # core logic
├── tests/
│ ├── sample.csv
│ └── expected.txt
└── README.md
5.3 The Core Question You’re Answering
“How do I treat text like a table and reshape it with almost no code?”
5.4 Concepts You Must Understand First
Stop and research these before coding:
- Field splitting (`FS`, `FPAT`)
- Output formatting (`OFS`, `print` vs concatenation)
- Array iteration and range expansion
5.5 Questions to Guide Your Design
- Should invalid selectors abort the program or be ignored?
- Should duplicates be preserved or removed?
- Should missing fields be printed as empty or skipped?
5.6 Thinking Exercise
Manually expand 2-4,1,6- for a record with 7 fields.
5.7 The Interview Questions They’ll Ask
- What is the difference between `FS` and `FPAT`?
- Why does `print $1, $2` insert `OFS`?
- How do you handle field ranges that depend on `NF`?
5.8 Hints in Layers
Hint 1: Start with a fixed list of fields and print them.
Hint 2: Parse -f into an array with split().
Hint 3: Support ranges by detecting - and looping.
Hint 4: Add --csv only after the basic tool works.
5.9 Books That Will Help
| Topic | Book | Chapter |
|---|---|---|
| Field variables | The AWK Programming Language | Ch. 1 |
| Separators | Effective awk Programming | Ch. 4 |
| Arrays | Effective awk Programming | Ch. 5 |
5.10 Implementation Phases
Phase 1: Foundation (2-3 hours)
Goals:
- Parse `-f` and `-d`
- Print fixed fields
Tasks:
- Write a minimal wrapper that passes `FS` to AWK.
- Hardcode `print $1, $3` and validate output.
Checkpoint: Correct output for a small sample file.
Phase 2: Core Functionality (3-4 hours)
Goals:
- Parse field lists and ranges
- Set `OFS`
Tasks:
- Implement selector parsing.
- Expand ranges per record.
- Add the `--ofs` option.
Checkpoint: `-f 1,3,5-` works on files with variable `NF`.
Phase 3: Polish & Edge Cases (2-3 hours)
Goals:
- Add `--csv` mode
- Add tests and error handling
Tasks:
- Implement `FPAT` and the unquote logic.
- Write tests for invalid selectors.
- Document usage.
Checkpoint: Tests pass; CLI returns correct exit codes.
5.11 Key Implementation Decisions
| Decision | Options | Recommendation | Rationale |
|---|---|---|---|
| Range duplicates | Keep vs remove | Keep | Preserves explicit user intent |
| Missing fields | Empty vs skip | Empty | Predictable column count |
| CSV support | FS vs FPAT | FPAT | Correct handling of quoted commas |
6. Testing Strategy
6.1 Test Categories
| Category | Purpose | Examples |
|---|---|---|
| Unit Tests | Selector parsing | 1,3-5 expansion |
| Integration Tests | Full CLI invocation | ./fieldex -f 1,3 file |
| Edge Case Tests | Missing fields, invalid arg | -f 0, -f 3--4 |
6.2 Critical Test Cases
- CSV quoted field: Ensure `"Smith, John"` stays intact.
- Open-ended range: `-f 2-` expands to `2..NF`.
- Missing field: `-f 9` on a 3-field line prints an empty field.
6.3 Test Data
# input
A,B,"C,D"
1,2,3
Expected output for `./fieldex --csv -f 1,3 --ofs '|'`:
A|C,D
1|3
7. Common Pitfalls & Debugging
7.1 Frequent Mistakes
| Pitfall | Symptom | Solution |
|---|---|---|
| Wrong FS | Fields misaligned | Verify delimiter and whitespace |
| Missing OFS | Output glued together | Set OFS in BEGIN |
| Range parsing errors | Incorrect columns printed | Add validation + tests |
7.2 Debugging Strategies
- Print `NF` and the fields to confirm splitting before selection (sketch below).
- Echo parsed selectors to verify range expansion.
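A sketch of such a debug rule, writing to stderr so the data stream stays clean (see §7.3):

# Dump NF and every field to stderr ahead of the selection logic.
{
    printf "NF=%d", NF > "/dev/stderr"
    for (i = 1; i <= NF; i++) printf " [%d]=%s", i, $i > "/dev/stderr"
    print "" > "/dev/stderr"
}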
7.3 Performance Traps
- Printing extra debug output to stdout can corrupt data streams; use stderr.
8. Extensions & Challenges
8.1 Beginner Extensions
- Add a `--header` mode that preserves a header line.
- Add a `--unique` selector mode.
8.2 Intermediate Extensions
- Support multiple delimiters via a regex `FS`.
- Add `--null` output (NUL-separated) for pipeline safety.
8.3 Advanced Extensions
- Multi-line CSV support with a custom `RS`.
- Add schema validation for numeric columns.
9. Real-World Connections
9.1 Industry Applications
- ETL pipelines: Extract columns before loading into databases.
- Log processing: Select fields for analytics or alerting.
9.2 Related Open Source Projects
- GNU coreutils `cut`: Similar tool with fewer CSV features.
- xsv: Rust CSV toolkit that inspires robust CSV handling.
9.3 Interview Relevance
- Parsing CLI args, text processing, and basic streaming concepts.
10. Resources
10.1 Essential Reading
- The AWK Programming Language by Aho, Kernighan, Weinberger - Ch. 1-2
- Effective awk Programming by Arnold Robbins - Ch. 4-5
10.2 Video Resources
- AWK basics and field splitting demos (YouTube, GNU awk talks)
10.3 Tools & Documentation
- GNU awk User’s Guide: `FPAT`, `FIELDWIDTHS`, `getline`
- POSIX awk spec: portability notes
10.4 Related Projects in This Series
- P02 Line Stats Calculator for aggregation
- P04 CSV to JSON/SQL for structured output
11. Self-Assessment Checklist
11.1 Understanding
- I can explain `FS`, `OFS`, and how they affect output
- I can parse and expand field ranges
- I can explain when `FPAT` is required
11.2 Implementation
- All functional requirements are met
- All test cases pass
- Errors return correct exit codes
- Output is deterministic
11.3 Growth
- I can explain this tool in a job interview
- I documented key design decisions
12. Submission / Completion Criteria
Minimum Viable Completion:
- Implements `-f`, `-d`, and `--ofs`
- Handles invalid selectors with exit code 2
- Passes at least 5 tests
Full Completion:
- CSV mode implemented with `FPAT`
- Deterministic golden demo documented
Excellence (Going Above & Beyond):
- Multi-delimiter and NUL output modes
- Robust CLI help and man-page style docs
13. Additional Content Rules (Hard Requirements)
13.1 Determinism
- Use fixed sample input files under `tests/fixtures/`.
- In golden demos, always show the exact input files and expected output.
- Avoid locale-dependent sorting or formatting.
13.2 Outcome Completeness
- Include a success demo (§3.7.2) and a failure demo (§3.7.3).
- Exit codes are specified in §3.7.4.
13.3 Cross-Linking
- Concepts reference §5.4 and §5.10.
- Cross-links provided to P04, P06, P07, and P16.
13.4 No Placeholder Text
All sections are complete and concrete.