Project 1: Universal Base Converter

Convert values across decimal, binary, and hex with strict validation.

Quick Reference

Attribute                          Value
Difficulty                         Level 1: Beginner
Time Estimate                      4-8 hours
Main Programming Language          C or Python
Alternative Programming Languages  Rust, Go, JavaScript
Coolness Level                     Level 2
Business Potential                 Level 1
Prerequisites                      Positional notation, basic loops, integer arithmetic
Key Topics                         Positional notation, base conversion, input validation

1. Learning Objectives

By completing this project, you will:

  1. Translate between representations with explicit rules.
  2. Validate and normalize input at the byte level.
  3. Produce outputs that are deterministic and testable.

2. All Theory Needed (Per-Concept Breakdown)

Positional Number Systems

Fundamentals

Positional number systems encode value by place: the same digit carries a different weight depending on its position. The base (radix) defines how many digit symbols are available and what each position is worth. In base 10, positions represent powers of 10; in base 2, powers of 2; in base 16, powers of 16. A number is therefore a weighted sum of digit values. This is why 123 means 1*10^2 + 2*10^1 + 3*10^0, and why 0x7B means 7*16^1 + 11*16^0, which is the same value, 123. The value is abstract; the notation is a choice. Conversion is the act of rewriting the same value in a different notation without changing its meaning.

Deep Dive into the concept

The essence of positional notation is the place-value expansion. Every digit d_i contributes d_i * base^i to the total, where i counts positions from 0 at the rightmost digit. This is both a definition and a computation algorithm. If you can decompose a number into place values, you can reconstruct it in any base. This is why conversions are deterministic and reversible.

Converting from any base to decimal is a fold operation: start with 0, multiply by the base, add the next digit. Each step preserves the invariant that the accumulator equals the value of the digits seen so far. This method is robust, avoids large exponent tables, and works for arbitrarily long inputs as long as you track overflow or use big integers.
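The fold described above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the name parse_digits and the DIGITS alphabet (digits then lowercase letters, up to base 36) are assumptions made for the example.

```python
# Hypothetical digit alphabet: position in the string is the digit value.
DIGITS = "0123456789abcdefghijklmnopqrstuvwxyz"


def parse_digits(text: str, base: int) -> int:
    """Fold a digit string into an integer, most significant digit first."""
    if not 2 <= base <= len(DIGITS):
        raise ValueError(f"unsupported base: {base}")
    if not text:
        raise ValueError("empty input")
    value = 0
    for ch in text.lower():
        digit = DIGITS.find(ch)
        if digit < 0 or digit >= base:
            raise ValueError(f"invalid digit {ch!r} for base {base}")
        # Invariant: value equals the number formed by the digits seen so far.
        value = value * base + digit
    return value


print(parse_digits("7B", 16))      # 123
print(parse_digits("1111011", 2))  # 123
```

Python integers grow as needed, so this sketch never overflows; a C version would have to check the bound before each multiply-and-add step.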

Converting from decimal to another base uses repeated division. Each division by the base yields a remainder that becomes the next digit in the target base, least significant digit first. The sequence of remainders, read in reverse, forms the representation. This is a direct consequence of the division algorithm in arithmetic and works for any base greater than 1. The invariant is that each dividend equals quotient * base + remainder, so after k steps the original number equals the current quotient times base^k plus the value of the k digits produced so far. When the quotient becomes 0, the remainders contain the full representation.
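A small Python sketch of repeated division follows. The function name to_base and the lowercase digit alphabet are assumptions for illustration, and negative values are deliberately left out of scope.

```python
# Hypothetical digit alphabet: index in the string is the digit value.
DIGITS = "0123456789abcdefghijklmnopqrstuvwxyz"


def to_base(value: int, base: int) -> str:
    """Render a non-negative integer in the given base by repeated division."""
    if not 2 <= base <= len(DIGITS):
        raise ValueError(f"unsupported base: {base}")
    if value < 0:
        raise ValueError("negative values are out of scope for this sketch")
    if value == 0:
        return "0"
    remainders = []
    while value > 0:
        # Each step peels off the least significant digit.
        value, remainder = divmod(value, base)
        remainders.append(DIGITS[remainder])
    # Remainders arrive least significant first, so reverse them.
    return "".join(reversed(remainders))


print(to_base(123, 2))   # 1111011
print(to_base(123, 16))  # 7b
```

The zero case needs its own branch because the loop body never runs when the starting quotient is already 0.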

The most important practical optimization is direct base-to-base conversion when bases are powers of two. Base 2 and base 16 are tightly related: 16 is 2^4, so each hex digit maps to exactly four binary digits. This creates a perfect grouping: split a binary string into 4-bit chunks, and you have hex digits; expand each hex digit back into 4 bits and you have binary. This is why hex is the standard human-facing representation of bytes, addresses, and machine code: it compresses by a factor of four while preserving byte boundaries.
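The 4-bit grouping can be shown without any arithmetic on the whole number, only table lookups per chunk. The sketch below is illustrative; the function and table names are assumptions, and the choice to emit uppercase hex is arbitrary.

```python
HEX = "0123456789ABCDEF"
# Lookup tables between 4-bit strings and hex digits, e.g. "1011" <-> "B".
NIBBLE_TO_HEX = {format(i, "04b"): HEX[i] for i in range(16)}
HEX_TO_NIBBLE = {digit: bits for bits, digit in NIBBLE_TO_HEX.items()}


def bin_to_hex(bits: str) -> str:
    """Convert a binary string to hex by grouping bits into nibbles."""
    if not bits or any(b not in "01" for b in bits):
        raise ValueError("expected a non-empty string of 0s and 1s")
    # Left-pad to a multiple of four so every chunk is a full nibble.
    padded = bits.zfill(-(-len(bits) // 4) * 4)
    chunks = (padded[i:i + 4] for i in range(0, len(padded), 4))
    return "".join(NIBBLE_TO_HEX[chunk] for chunk in chunks)


def hex_to_bin(digits: str) -> str:
    """Expand each hex digit into exactly four bits."""
    upper = digits.upper()
    if not upper or any(d not in HEX_TO_NIBBLE for d in upper):
        raise ValueError("expected a non-empty string of hex digits")
    return "".join(HEX_TO_NIBBLE[d] for d in upper)


print(bin_to_hex("1111011"))  # 7B
print(hex_to_bin("7B"))       # 01111011
```

Note that the round trip adds a leading zero: the hex form always represents whole nibbles, which is exactly why it preserves byte boundaries.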

Another key idea is normalization. Leading zeros do not change value, but they do change representation. A conversion system should define whether it keeps or drops leading zeros, because that affects display width and alignment. For example, a memory address might be shown as 16 hex digits on a 64-bit system even if the leading digits are zero. The choice is not about value, but about context and readability.
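Both normalization policies are short to express. The helpers below are a sketch under assumed names (normalize, pad_hex); the 64-bit width mirrors the address example above and is otherwise arbitrary.

```python
def normalize(digits: str) -> str:
    """Drop leading zeros but keep a single zero for the value zero."""
    return digits.lstrip("0") or "0"


def pad_hex(value: int, width_bits: int) -> str:
    """Render value as fixed-width uppercase hex, one digit per four bits."""
    if width_bits <= 0 or width_bits % 4 != 0:
        raise ValueError("width must be a positive multiple of 4 bits")
    if value < 0 or value >= 1 << width_bits:
        raise ValueError(f"value does not fit in {width_bits} bits")
    return format(value, f"0{width_bits // 4}X")


print(normalize("0007B"))  # 7B
print(normalize("0000"))   # 0
print(pad_hex(0x7B, 64))   # 000000000000007B
```

Whichever policy you choose, apply it in one place (the formatter) so that comparisons against test vectors stay stable.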

Failure modes in conversion systems are almost always about validation and overflow. Each base has a finite set of valid digits, and any digit outside that set must be rejected. If you parse long inputs into fixed-width integers, you must detect overflow and either signal it or switch to a larger representation. These are the same risks you face in real system parsers, which is why the conversion project is not trivial: it forces you to define and enforce the rules of your numeric system.
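To make the overflow rule concrete, here is a sketch of a parser that enforces a 32-bit unsigned limit. Python would happily keep growing the integer, so the check is written the way a fixed-width C implementation would need it: before the multiply-and-add. The name parse_u32 and the 32-bit limit are assumptions for the example.

```python
U32_MAX = 0xFFFFFFFF


def parse_u32(text: str, base: int) -> int:
    """Parse an unsigned value, rejecting bad digits and 32-bit overflow."""
    if not 2 <= base <= 16:
        raise ValueError(f"unsupported base: {base}")
    alphabet = "0123456789abcdef"[:base]  # valid digits for this base
    if not text:
        raise ValueError("empty input")
    value = 0
    for ch in text.lower():
        digit = alphabet.find(ch)
        if digit < 0:
            raise ValueError(f"invalid digit {ch!r} for base {base}")
        # value * base + digit <= U32_MAX, rearranged so that no
        # intermediate result can exceed the limit.
        if value > (U32_MAX - digit) // base:
            raise OverflowError(f"{text!r} does not fit in 32 bits")
        value = value * base + digit
    return value


print(parse_u32("ffffffff", 16))  # 4294967295
```

Signalling overflow with a distinct error type keeps "bad digit" and "too large" separate, which makes error messages and tests more precise.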

Finally, remember that a numeral system is just a notation for values. A file, a memory dump, or a protocol does not store numbers in decimal or hex; it stores bytes. Hex and binary are lenses you apply to the same data. The skill you are learning is the ability to move between these lenses intentionally and correctly.

How this fits in projects

This concept is a primary pillar for this project and appears again in other projects in this folder.

Definitions & key terms

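  • Positional number system: a notation in which each digit's weight depends on its position.
  • Base (radix): the number of available digit symbols, and the factor between adjacent positions.
  • Place value: the weight base^i carried by the digit at position i.
  • Normalization: a fixed policy for whether leading zeros are kept or dropped in output.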

Mental model diagram

[Input] -> [Rule/Conversion] -> [Value] -> [Representation]

How it works (step-by-step, with invariants and failure modes)

  1. Identify the input representation and its constraints.
  2. Apply the conversion or interpretation rules.
  3. Validate bounds and emit a canonical output.
  4. Invariant: the underlying value is preserved across representations.
  5. Failure modes: invalid digits, width overflow, or digit-order mismatch (for example, emitting remainders without reversing them).

Minimal concrete example

INPUT: 0x7B (hex)
PROCESS: 7*16^1 + 11*16^0 = 112 + 11 = 123
OUTPUT: 123 (decimal), 1111011 (binary)

Common misconceptions

  • Confusing representation with value.
  • Skipping validation because “inputs look right”.

Check-your-understanding questions

  1. Explain positional notation in your own words.
  2. Predict the binary output for the hex input 2F.
  3. Why does this concept matter for correct parsing?

Check-your-understanding answers

  1. Each digit contributes its value times a power of the base, so a numeral is a rule set that maps representation to meaning.
  2. Each hex digit expands to four bits: 2 is 0010 and F is 1111, so the output is 00101111 (101111 if leading zeros are dropped).
  3. Without it, you will misinterpret bytes or bit fields.

Real-world applications

  • Binary file parsing and validation
  • Protocol field extraction
  • Debugging with hexdumps

Where you’ll apply it

  • In this project, during the core parsing and output steps.
  • Also used in: P03-bitwise-logic-calculator, P09-hexdump-clone.

References

  • “Computer Systems: A Programmer’s Perspective” - Ch. 2
  • “Code” by Charles Petzold - Ch. 7-8

Key insights

This concept is a repeatable rule that transforms raw bits into reliable meaning.

Summary

You can only trust your output when you apply this concept deliberately and consistently.

Homework/Exercises to practice the concept

  1. Do a manual conversion or extraction by hand.
  2. Build a tiny test case and predict the output.

Solutions to the homework/exercises

  1. The manual process should match your tool output.
  2. If the output differs, revisit your assumptions about representation.

3. Project Specification

3.1 What You Will Build

Build a focused command-line tool that accepts a value in decimal, binary, or hex, validates it strictly, and emits the same value in the other bases as precise, verifiable output. Include input validation, clear error messages, and deterministic formatting. Exclude any optional UI features until the core logic is correct.

3.2 Functional Requirements

  1. Validated Input: Reject malformed or out-of-range values.
  2. Deterministic Output: Same input always yields the same output.
  3. Human-Readable Display: Show results in both hex and binary where relevant.

3.3 Non-Functional Requirements

  • Performance: Must handle small files or values instantly.
  • Reliability: Must not crash on invalid inputs.
  • Usability: Outputs must be unambiguous and aligned.

3.4 Example Usage / Output

$ run-tool --example
[expected output goes here]

3.5 Data Formats / Schemas / Protocols

  • Input: simple CLI arguments or a small config file.
  • Output: fixed-width hex, optional binary, and labeled fields.

3.6 Edge Cases

  • Empty input
  • Invalid digits
  • Maximum-width values
  • Unexpected file length

3.7 Real World Outcome

The learner should be able to run the tool and compare output against a known reference with no ambiguity.

3.7.1 How to Run (Copy/Paste)

  • Build commands: make or equivalent
  • Run commands: ./tool --args
  • Working directory: project root

3.7.2 Golden Path Demo (Deterministic)

A known input produces a known output that matches a prewritten test vector.

3.7.3 If CLI: exact terminal transcript

$ ./tool --demo
[result line 1]
[result line 2]

4. Solution Architecture

4.1 High-Level Design

[Input] -> [Parser] -> [Core Logic] -> [Formatter] -> [Output]
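One way to read this pipeline is as three small functions composed in order. The sketch below is an assumption about how the stages might be split, not a prescribed layout: the names parse, render, format_report, and run are hypothetical, and the output labels and prefixes are one possible formatting choice.

```python
ALPHABETS = {2: "01", 10: "0123456789", 16: "0123456789abcdef"}


def parse(text: str, base: int) -> int:
    """Parser: strict digit validation, then conversion to an integer."""
    if base not in ALPHABETS:
        raise ValueError(f"unsupported base: {base}")
    lowered = text.lower()
    if not lowered or any(ch not in ALPHABETS[base] for ch in lowered):
        raise ValueError(f"invalid input {text!r} for base {base}")
    # Every character was checked above, so int() cannot see signs,
    # whitespace, underscores, or prefixes here.
    return int(lowered, base)


def render(value: int) -> dict:
    """Core logic: produce each representation of the same value."""
    return {
        "dec": format(value, "d"),
        "hex": "0x" + format(value, "X"),
        "bin": "0b" + format(value, "b"),
    }


def format_report(views: dict) -> str:
    """Formatter: one labeled, aligned line per representation."""
    return "\n".join(f"{label:<4}{text}" for label, text in views.items())


def run(text: str, base: int) -> str:
    return format_report(render(parse(text, base)))


print(run("7b", 16))
# dec 123
# hex 0x7B
# bin 0b1111011
```

Keeping the stages separate means each one can be tested in isolation, and the formatter can change without touching validation.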

4.2 Key Components

Component    Responsibility                         Key Decisions
Parser       Validate and normalize input           Strict digit validation
Core Logic   Apply conversion or extraction rules   Keep math explicit
Formatter    Render hex/binary/text views           Fixed-width alignment

4.3 Data Structures (No Full Code)

  • Fixed-width integer values
  • Byte buffers for file I/O
  • Simple structs for labeled fields

4.4 Algorithm Overview

Key Algorithm: Core Transformation

  1. Parse input into a canonical internal value.
  2. Apply project-specific conversion or extraction rules.
  3. Format the result for display.

Complexity Analysis:

  • Time: O(n) in input size
  • Space: O(1) to O(n) depending on buffering

5. Implementation Guide

5.1 Development Environment Setup

# Use a standard compiler and a minimal build script

5.2 Project Structure

project-root/
├── src/
│   ├── main.ext
│   ├── parser.ext
│   └── formatter.ext
├── tests/
│   └── test_vectors.txt
└── README.md

5.3 The Core Question You’re Answering

“How do I transform a raw representation into a reliable value and show it clearly?”

5.4 Concepts You Must Understand First

  • See the Theory section above and confirm you can explain each concept without notes.

5.5 Questions to Guide Your Design

  1. How will you validate inputs?
  2. How will you normalize outputs for comparison?
  3. How will you handle errors without hiding failures?
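For the third question, one common approach is to let validation raise, catch the error only at the program's edge, and report it on stderr with a non-zero exit status. The sketch below assumes a hypothetical decimal-to-hex entry point; the names convert_arg and main and the specific exit codes are illustrative choices.

```python
import sys


def convert_arg(text: str) -> str:
    """Convert a strict decimal string to uppercase hex."""
    if not text or any(ch not in "0123456789" for ch in text):
        raise ValueError(f"invalid decimal input: {text!r}")
    return format(int(text), "X")


def main(argv) -> int:
    """Return a process exit status instead of swallowing errors."""
    if len(argv) != 1:
        print("usage: convert DECIMAL", file=sys.stderr)
        return 2
    try:
        print(convert_arg(argv[0]))
    except ValueError as exc:
        # Report the failure and signal it to the caller; never print
        # a partial or guessed result.
        print(f"error: {exc}", file=sys.stderr)
        return 1
    return 0


status = main(["255"])  # prints FF
print("exit status:", status)
```

In a real script the last lines would typically become sys.exit(main(sys.argv[1:])) so the shell can see the status.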

5.6 Thinking Exercise

Before coding, draw the data flow from input to output and label every transformation step.

5.7 The Interview Questions They’ll Ask

  1. “How do you validate binary or hex input?”
  2. “How do you detect overflow or width mismatch?”
  3. “Why is deterministic output important?”
  4. “How would you test your tool with known vectors?”

5.8 Hints in Layers

Hint 1: Start by parsing and validating a single fixed-size input.

Hint 2: Implement the core transformation in isolation and test it.

Hint 3: Add formatting after correctness is proven.

Hint 4: Compare outputs against a trusted reference tool.

5.9 Books That Will Help

Topic                 Book                                              Chapter
Data representation   “Computer Systems: A Programmer’s Perspective”    Ch. 2
Number systems        “Code” by Charles Petzold                         Ch. 7-8

5.10 Implementation Phases

Phase 1: Foundation (2-4 hours)

Goals:

  • Input parsing
  • Basic validation

Tasks:

  1. Implement digit validation.
  2. Parse into internal value.

Checkpoint: Parse test vectors correctly.

Phase 2: Core Functionality (4-8 hours)

Goals:

  • Core transformation logic
  • Primary output format

Tasks:

  1. Implement core math rules.
  2. Render hex and binary outputs.

Checkpoint: Output matches known results.

Phase 3: Polish & Edge Cases (2-4 hours)

Goals:

  • Error handling
  • Edge cases

Tasks:

  1. Add invalid input tests.
  2. Add max-width tests.

Checkpoint: No crashes on invalid input.

5.11 Key Implementation Decisions

Decision       Options          Recommendation   Rationale
Input format   hex/dec/bin      support all      flexibility
Output width   fixed/variable   fixed            easy to compare

6. Testing Strategy

6.1 Test Categories

Category            Purpose                Examples
Unit Tests          Validate conversions   known vectors
Integration Tests   CLI parsing            sample files
Edge Case Tests     Boundaries             max/min values

6.2 Critical Test Cases

  1. Zero input: output should be zero in all bases.
  2. Max width: output should not overflow.
  3. Invalid digit: error message, no crash.

6.3 Test Data

inputs: 0, 1, 255, 256
expected: 0x0, 0x1, 0xFF, 0x100
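These vectors translate directly into a table-driven test. The sketch below pairs them with a hypothetical to_hex function standing in for your own converter; only the input and expected values come from the test data above.

```python
def to_hex(value: int) -> str:
    """Stand-in converter under test: non-negative int to 0x-prefixed hex."""
    if value < 0:
        raise ValueError("negative values are out of scope for this sketch")
    if value == 0:
        return "0x0"
    digits = ""
    while value > 0:
        value, remainder = divmod(value, 16)
        digits = "0123456789ABCDEF"[remainder] + digits
    return "0x" + digits


# (input, expected) pairs taken from the test data above.
VECTORS = [(0, "0x0"), (1, "0x1"), (255, "0xFF"), (256, "0x100")]

for value, expected in VECTORS:
    actual = to_hex(value)
    assert actual == expected, f"{value}: expected {expected}, got {actual}"
print("all vectors passed")
```

The pairs 255/256 are worth keeping together: they straddle the point where the output gains a digit, which is where off-by-one mistakes tend to show up.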

7. Common Pitfalls & Debugging

7.1 Frequent Mistakes

Pitfall        Symptom            Solution
Wrong base     incorrect output   re-check digit map
Overflow       wrapped values     add bounds checks
Misalignment   messy output       pad columns

7.2 Debugging Strategies

  • Compare against a trusted tool for random inputs.
  • Print intermediate values in binary.
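As a sketch of the first strategy, the loop below checks a hand-written converter against Python's built-in format(), used here as the trusted reference. The function to_binary is a stand-in for your own code, and the seed, iteration count, and 64-bit range are arbitrary assumptions.

```python
import random


def to_binary(value: int) -> str:
    """Converter under test: repeated division by 2."""
    if value == 0:
        return "0"
    bits = ""
    while value > 0:
        value, remainder = divmod(value, 2)
        bits = str(remainder) + bits
    return bits


# A fixed seed keeps any failure reproducible.
rng = random.Random(1234)
for _ in range(1000):
    n = rng.randrange(0, 1 << 64)
    expected = format(n, "b")  # trusted reference
    actual = to_binary(n)
    assert actual == expected, f"mismatch for {n}: {actual} != {expected}"
print("1000 random inputs matched the reference")
```

Random inputs rarely hit boundaries such as 0 or the maximum width, so keep the explicit edge-case tests alongside this loop.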

7.3 Performance Traps

  • Avoid reading entire files when streaming is enough.

8. Extensions & Challenges

8.1 Beginner Extensions

  • Add binary output padding.
  • Add uppercase/lowercase hex toggles.

8.2 Intermediate Extensions

  • Add batch conversion from a file.
  • Add JSON output mode.
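A JSON mode can reuse the same rendered values and change only the final serialization. The field names and the helper to_json below are hypothetical; sorting the keys is one way to keep the output deterministic.

```python
import json


def to_json(value: int) -> str:
    """Serialize all three views of a non-negative value as one JSON object."""
    if value < 0:
        raise ValueError("negative values are out of scope for this sketch")
    record = {
        "dec": format(value, "d"),
        "hex": format(value, "X"),
        "bin": format(value, "b"),
    }
    # sort_keys gives a stable field order regardless of construction order.
    return json.dumps(record, sort_keys=True)


print(to_json(123))  # {"bin": "1111011", "dec": "123", "hex": "7B"}
```

Values are emitted as strings so that very large numbers survive consumers that parse JSON numbers as floating point.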

8.3 Advanced Extensions

  • Add big-integer support.
  • Add a reversible binary patch feature.

9. Real-World Connections

9.1 Industry Applications

  • Binary file parsing and validation tools
  • Protocol debugging utilities
  • xxd-like hex tools
  • file-type identification utilities

9.3 Interview Relevance

  • Bit manipulation and data representation questions

10. Resources

10.1 Essential Reading

  • “Computer Systems: A Programmer’s Perspective” - Ch. 2
  • “Code” by Charles Petzold - Ch. 7-8

10.2 Video Resources

  • University lecture on data representation (search by course name)