P01: Binary & Hex Visualization Tool

The Core Question: “How does a CPU see numbers, and why must we become fluent in reading hex dumps and binary patterns before understanding machine instructions?”


Learning Objectives

By completing this project, you will:

  1. Master two’s complement representation - Understand how CPUs encode signed integers and why -1 looks like all 1s
  2. Internalize the binary-hex relationship - Instantly recognize that 4 binary bits equal 1 hex digit (the nibble)
  3. Understand bit widths and their significance - Know why 8-bit, 16-bit, 32-bit, and 64-bit representations matter
  4. Visualize byte order (endianness) - See how the same value appears differently in little-endian vs big-endian memory layouts
  5. Build professional CLI tools in C - Create robust command-line utilities with proper input parsing and error handling
  6. Think like a CPU - See numbers not as decimal abstractions but as raw bit patterns that CPUs manipulate
  7. Read hex dumps fluently - After this project, debugger output and memory inspectors become readable

Project Overview

Attribute              Value
─────────              ─────
Main Language          C
Alternative Languages  Python, Rust, Go
Difficulty             Beginner
Time Estimate          Weekend (6-12 hours)
Prerequisites          Basic C programming, understanding of decimal numbers
Knowledge Area         Number Systems / Data Representation
Main Book              “Code: The Hidden Language of Computer Hardware and Software” by Charles Petzold

Theoretical Foundation

Core Concepts

Before building this tool, you must internalize these fundamental concepts that underpin all of computing:

1. Binary: The Language CPUs Speak

CPUs don’t understand decimal. At the hardware level, everything is electrical signals that are either ON or OFF, HIGH or LOW, 1 or 0. This is why binary exists - it’s the only number system that maps directly to transistor states.

Hardware Reality:

    Transistor States → Binary Digits → Numbers
    ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

    HIGH voltage (>2.5V)  →  1  ─┐
    LOW voltage  (<0.8V)  →  0  ─┼─→ Combine into patterns
                                 │
    8 transistors together:      │
    [ON][ON][ON][ON][ON][ON][ON][ON] = 11111111 = 255
    [ON][OFF][OFF][OFF][OFF][OFF][OFF][OFF] = 10000000 = 128

2. Positional Notation: The Universal Pattern

Every number system works the same way. Each digit’s position represents a power of the base:

DECIMAL: 1,234
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Position:    3       2       1       0
Power:      10³     10²     10¹     10⁰
Value:     1000     100      10       1
           ─────   ─────   ─────   ─────
Digit:       1       2       3       4
           ─────   ─────   ─────   ─────
Total:   1×1000 + 2×100 + 3×10 + 4×1 = 1234

BINARY: 1011
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Position:    3       2       1       0
Power:       2³      2²      2¹      2⁰
Value:        8       4       2       1
            ─────   ─────   ─────   ─────
Digit:        1       0       1       1
            ─────   ─────   ─────   ─────
Total:      1×8  +  0×4  +  1×2  +  1×1 = 11

HEXADECIMAL: 2F
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Position:    1       0
Power:      16¹     16⁰
Value:       16       1
            ─────   ─────
Digit:        2      F(=15)
            ─────   ─────
Total:      2×16 + 15×1 = 47
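
To see that this is one rule rather than three, here is a tiny C sketch (the eval_digits helper is purely illustrative) that evaluates a digit sequence in any base using the same running-total pattern:

#include <stdint.h>
#include <stdio.h>

// Positional notation as code: total = total * base + next_digit,
// processed from the most significant digit to the least.
static uint64_t eval_digits(const int *digits, int count, int base) {
    uint64_t total = 0;
    for (int i = 0; i < count; i++) {
        total = total * base + (uint64_t)digits[i];
    }
    return total;
}

int main(void) {
    int dec[] = {1, 2, 3, 4};   // 1234 in base 10
    int bin[] = {1, 0, 1, 1};   // 1011 in base 2
    int hex[] = {2, 15};        // 2F   in base 16
    printf("%llu %llu %llu\n",
           (unsigned long long)eval_digits(dec, 4, 10),
           (unsigned long long)eval_digits(bin, 4, 2),
           (unsigned long long)eval_digits(hex, 2, 16));   // prints: 1234 11 47
    return 0;
}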

3. The Magic of Hexadecimal

Hexadecimal exists because of a beautiful mathematical relationship: 16 = 2^4. This means exactly 4 binary bits map to exactly 1 hex digit. This isn’t coincidence - it’s why hex became the standard for representing binary data:

THE NIBBLE RELATIONSHIP (Memorize This!)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Binary    Hex  │  Binary    Hex
──────────────┼────────────────
0000   =   0  │  1000   =   8
0001   =   1  │  1001   =   9
0010   =   2  │  1010   =   A
0011   =   3  │  1011   =   B
0100   =   4  │  1100   =   C
0101   =   5  │  1101   =   D
0110   =   6  │  1110   =   E
0111   =   7  │  1111   =   F
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

Example: Convert 0xDEADBEEF to binary
D    E    A    D    B    E    E    F
↓    ↓    ↓    ↓    ↓    ↓    ↓    ↓
1101 1110 1010 1101 1011 1110 1110 1111

Result: 11011110101011011011111011101111

Notice: No calculation needed! Just table lookup.
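
The same table lookup works in code. A minimal sketch (the function name is illustrative) that prints a 32-bit value in hex by peeling off one nibble at a time:

#include <stdint.h>
#include <stdio.h>

// One hex digit per 4-bit nibble: shift the nibble down, mask it,
// and look it up in a 16-entry table - no arithmetic conversion needed.
static void print_hex_nibbles(uint32_t value) {
    const char digits[] = "0123456789ABCDEF";
    for (int shift = 28; shift >= 0; shift -= 4) {
        putchar(digits[(value >> shift) & 0xF]);
    }
    putchar('\n');
}

int main(void) {
    print_hex_nibbles(0xDEADBEEF);   // prints DEADBEEF
    return 0;
}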

4. Two’s Complement: How CPUs Handle Negative Numbers

CPUs can’t store minus signs. Instead, they use a brilliant encoding called two’s complement where the most significant bit (MSB) indicates sign:

TWO'S COMPLEMENT (8-bit examples)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

Positive numbers: Same as unsigned
   00000001 =  1
   00000010 =  2
   01111111 = 127  (largest positive 8-bit signed)

The sign bit:
   0xxxxxxx = positive (MSB = 0)
   1xxxxxxx = negative (MSB = 1)

Negative numbers: Invert all bits, add 1
   To get -1:
   Step 1: Start with +1:      00000001
   Step 2: Invert all bits:    11111110
   Step 3: Add 1:              11111111
   Result: -1 = 11111111 (0xFF)

   To get -128:
   10000000 = -128 (most negative 8-bit signed)

VERIFICATION: Adding -1 + 1 should equal 0
   11111111  (-1)
 + 00000001  (+1)
 ──────────
  100000000  (9 bits!)
  ↑
  This bit "overflows" and is discarded in 8-bit math

  Result: 00000000 = 0 ✓

WHY THIS WORKS:
The same addition hardware works for both signed and unsigned!
The CPU doesn't care about interpretation - just bit patterns.
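
You can watch this difference in interpretation directly in C. A small sketch (the signed cast is implementation-defined before C23, but every mainstream two's-complement machine gives the results shown):

#include <stdint.h>
#include <stdio.h>

int main(void) {
    uint8_t raw = 0xFF;                 // bit pattern 11111111
    int8_t  as_signed = (int8_t)raw;    // same bits, reinterpreted as signed

    printf("unsigned view: %u\n", (unsigned)raw);                   // 255
    printf("signed view:   %d\n", as_signed);                       // -1
    printf("raw + 1:       %u\n", (unsigned)(uint8_t)(raw + 1));    // 0: wraps, matching -1 + 1 = 0
    return 0;
}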

5. Bit Widths: Why Size Matters

Different data types use different numbers of bits. Understanding widths is essential for reading memory dumps:

COMMON BIT WIDTHS AND THEIR RANGES
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

Width    Bytes  Unsigned Range           Signed Range
─────    ─────  ──────────────           ────────────
8-bit      1    0 to 255                 -128 to 127
16-bit     2    0 to 65,535              -32,768 to 32,767
32-bit     4    0 to 4,294,967,295       -2,147,483,648 to 2,147,483,647
64-bit     8    0 to 18,446,744,073,709,551,615   -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807

VISUAL REPRESENTATION:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

8-bit:  ████████
        └──────┘
          1 byte

16-bit: ████████ ████████
        └──────┘ └──────┘
         byte 0   byte 1

32-bit: ████████ ████████ ████████ ████████
        └──────┘ └──────┘ └──────┘ └──────┘
         byte 0   byte 1   byte 2   byte 3

64-bit: ████████ ████████ ████████ ████████ ████████ ████████ ████████ ████████
        └───────────────────────────────────────────────────────────────────────┘
                                        8 bytes
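
These ranges come straight from the standard limits in <stdint.h>. A quick sketch that prints them, so you can cross-check the table on your own machine:

#include <stdint.h>
#include <stdio.h>

int main(void) {
    // Unsigned max is 2^w - 1; signed range is -2^(w-1) to 2^(w-1) - 1.
    printf("8-bit : 0 to %llu, signed %lld to %lld\n",
           (unsigned long long)UINT8_MAX,  (long long)INT8_MIN,  (long long)INT8_MAX);
    printf("16-bit: 0 to %llu, signed %lld to %lld\n",
           (unsigned long long)UINT16_MAX, (long long)INT16_MIN, (long long)INT16_MAX);
    printf("32-bit: 0 to %llu, signed %lld to %lld\n",
           (unsigned long long)UINT32_MAX, (long long)INT32_MIN, (long long)INT32_MAX);
    printf("64-bit: 0 to %llu, signed %lld to %lld\n",
           (unsigned long long)UINT64_MAX, (long long)INT64_MIN, (long long)INT64_MAX);
    return 0;
}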

6. Endianness: Byte Order in Memory

Different CPU architectures store multi-byte values in different orders. This is called “endianness”:

THE ENDIANNESS PROBLEM
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

The 32-bit value 0x12345678: How is it stored in memory?

LITTLE-ENDIAN (x86, x86-64, ARM default):
Least significant byte at lowest address

Memory Address:  100    101    102    103
Memory Content:  0x78   0x56   0x34   0x12
                 └─────────────────────────┘
                 Reversed! LSB first.

BIG-ENDIAN (Network byte order, older PowerPC):
Most significant byte at lowest address

Memory Address:  100    101    102    103
Memory Content:  0x12   0x34   0x56   0x78
                 └─────────────────────────┘
                 "Natural" reading order.

REAL EXAMPLE - 0xDEADBEEF:
━━━━━━━━━━━━━━━━━━━━━━━━━━━
Little-endian memory dump:  EF BE AD DE
Big-endian memory dump:     DE AD BE EF

When you see a hex dump, you MUST know the endianness!
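
You can check your own machine's byte order with a few lines of C (a minimal sketch; memcpy is used so the bytes are viewed exactly as stored, without aliasing tricks):

#include <stdint.h>
#include <stdio.h>
#include <string.h>

int main(void) {
    uint32_t value = 0x12345678;
    unsigned char bytes[4];
    memcpy(bytes, &value, sizeof value);   // copy the bytes as laid out in memory

    printf("bytes at increasing addresses: %02X %02X %02X %02X\n",
           bytes[0], bytes[1], bytes[2], bytes[3]);
    // Little-endian machines (x86, x86-64) print: 78 56 34 12
    // Big-endian machines print:                  12 34 56 78
    printf("this machine is %s-endian\n", bytes[0] == 0x78 ? "little" : "big");
    return 0;
}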

Why This Matters

Understanding binary and hex representation is the foundation for:

  1. Reading debugger output - GDB, LLDB, and other debuggers show memory in hex
  2. Understanding assembly language - Instructions and addresses are in hex
  3. Analyzing network packets - Protocol data is binary
  4. Writing embedded systems code - Direct hardware register manipulation
  5. Security analysis - Exploit development requires bit-level understanding
  6. Understanding CPU architecture - Instruction encoding is pure bit patterns

Historical Context

Binary computing dates back to the late 1930s, when Claude Shannon showed that Boolean algebra could be implemented with relay switching circuits. The shift from decimal computers (like ENIAC) to binary was driven by hardware simplicity - a switch with two states is easier and cheaper to build than one with ten states.

Hexadecimal emerged in the 1960s as programmers needed a way to represent bytes (8 bits) more compactly than binary. The IBM System/360 (1964) popularized hex notation, and it became the universal standard for representing binary data.

Common Misconceptions

Misconception 1: “Negative numbers have a minus sign stored somewhere” Reality: There’s no minus sign. Two’s complement uses the MSB pattern to represent negativity. The same bit pattern can be positive or negative depending on interpretation.

Misconception 2: “Hex is harder to understand than decimal” Reality: Hex is actually simpler for binary data because of the exact 4-bit mapping. With practice, you’ll read hex faster than decimal for byte values.

Misconception 3: “Little-endian is backwards and wrong” Reality: Little-endian has practical advantages - adding numbers can start from the lowest address, and casting between sizes is trivial. Both are valid design choices.

Misconception 4: “My computer stores numbers as text like ‘255’” Reality: Numbers are stored as raw binary patterns. The text ‘255’ and the integer 255 are completely different in memory.


Project Specification

What You Will Build

A command-line tool called bitview that takes a number (decimal, hex, or binary) and displays its representation in all formats, with visual highlighting of important patterns like sign bits, byte boundaries, and endianness.

Functional Requirements

  1. Multi-format Input (-d, -x, -b flags):
    • Accept decimal numbers (default): ./bitview 255
    • Accept hexadecimal with 0x prefix or -x flag: ./bitview 0xDEADBEEF
    • Accept binary with 0b prefix or -b flag: ./bitview 0b11111111
  2. Complete Output Display:
    • Show decimal representation
    • Show binary representation with byte grouping
    • Show hexadecimal representation
    • Show signed interpretation (two’s complement)
    • Show bit width used
  3. Bit Width Selection (-w flag):
    • Support 8-bit, 16-bit, 32-bit, and 64-bit representations
    • Default to smallest width that fits the number
    • Show overflow warning if number exceeds selected width
  4. Endianness Display (-e flag):
    • Show both little-endian and big-endian memory layouts
    • Highlight byte order differences visually
  5. Byte Highlighting (--bytes):
    • Separate binary output into byte groups
    • Show hex digit alignment with binary nibbles
  6. Sign Bit Visualization:
    • Clearly mark the sign bit in signed interpretations
    • Show two’s complement breakdown for negative numbers

Non-Functional Requirements

  • Performance: Handle all inputs up to 64-bit instantly
  • Portability: Compile and run on any POSIX system with standard C library
  • Robustness: Handle all invalid inputs gracefully with meaningful error messages
  • Usability: Self-documenting with --help flag

Example Usage/Output

$ ./bitview 255
Decimal:     255
Binary:      00000000 00000000 00000000 11111111
Hex:         0x000000FF
Signed:      255 (positive)
Bit width:   32-bit

$ ./bitview -1
Decimal:     -1
Binary:      11111111 11111111 11111111 11111111
             ^
             └─ Sign bit (1 = negative)
Hex:         0xFFFFFFFF
Signed:      -1 (two's complement)
Bit width:   32-bit

$ ./bitview 0xDEADBEEF
Decimal:     3735928559
Binary:      11011110 10101101 10111110 11101111
             ^^^^^^^^ ^^^^^^^^ ^^^^^^^^ ^^^^^^^^
                DE       AD       BE       EF
Hex:         0xDEADBEEF
Signed:      -559038737 (two's complement)
Bit width:   32-bit

$ ./bitview -w 8 127
Decimal:     127
Binary:      01111111
             ^
             └─ Sign bit (0 = positive)
Hex:         0x7F
Signed:      127 (positive, max for signed 8-bit)
Bit width:   8-bit

$ ./bitview -w 8 128
Decimal:     128
Binary:      10000000
             ^
             └─ Sign bit (1 = negative when interpreted as signed)
Hex:         0x80
Unsigned:    128
Signed:      -128 (two's complement, min for signed 8-bit)
Bit width:   8-bit

$ ./bitview -e 0x12345678
Decimal:     305419896
Binary:      00010010 00110100 01010110 01111000
Hex:         0x12345678

Memory Layout (32-bit):
  Big-endian:    [0x12] [0x34] [0x56] [0x78]  (address 0 → 3)
  Little-endian: [0x78] [0x56] [0x34] [0x12]  (address 0 → 3)
                 ^ x86/x64 systems are little-endian

$ ./bitview --help
Usage: bitview [OPTIONS] NUMBER

Display number in binary, hexadecimal, and decimal formats.

Input Formats:
  123           Decimal (default)
  0xFF          Hexadecimal (0x prefix)
  0b1010        Binary (0b prefix)

Options:
  -w, --width N   Force bit width (8, 16, 32, or 64)
  -e, --endian    Show endianness memory layouts
  -s, --signed    Interpret as signed (default for negative input)
  -u, --unsigned  Interpret as unsigned
  -h, --help      Show this help message

Examples:
  bitview 255           Show 255 in all formats
  bitview -w 8 -1       Show -1 as 8-bit signed
  bitview 0xCAFE        Show hex input in all formats
  bitview -e 0x12345678 Show byte order layouts

Real World Outcome

After completing this tool, you’ll use it constantly for:

  • Debugging sessions: Quickly verify what a value looks like in binary/hex
  • Understanding error codes: System error codes are often hex
  • Analyzing bit flags: See which bits are set in configuration values
  • Learning assembly: Verify instruction encoding by checking opcodes
  • Network debugging: Convert between address formats

Solution Architecture

High-Level Design

┌──────────────────────────────────────────────────────────────────────────┐
│                             bitview                                       │
├──────────────────────────────────────────────────────────────────────────┤
│                                                                          │
│  ┌─────────────┐    ┌─────────────┐    ┌─────────────┐    ┌───────────┐ │
│  │   Argument  │───▶│   Input     │───▶│   Display   │───▶│  Output   │ │
│  │   Parser    │    │   Parser    │    │  Formatter  │    │  Printer  │ │
│  └─────────────┘    └─────────────┘    └─────────────┘    └───────────┘ │
│         │                  │                  │                 │        │
│         ▼                  ▼                  ▼                 ▼        │
│  ┌────────────────────────────────────────────────────────────────────┐ │
│  │                        Data Model                                   │ │
│  │  ┌─────────────────────────────────────────────────────────────┐   │ │
│  │  │  struct number_display {                                     │   │ │
│  │  │      uint64_t value;         // The raw value                │   │ │
│  │  │      int64_t  signed_value;  // Signed interpretation        │   │ │
│  │  │      int      bit_width;     // 8, 16, 32, or 64             │   │ │
│  │  │      bool     is_negative;   // Original input was negative  │   │ │
│  │  │      enum     input_base;    // DEC, HEX, BIN                │   │ │
│  │  │  }                                                           │   │ │
│  │  └─────────────────────────────────────────────────────────────┘   │ │
│  └────────────────────────────────────────────────────────────────────┘ │
│                                                                          │
└──────────────────────────────────────────────────────────────────────────┘

Key Components

Component          Responsibility                               Key Functions
─────────          ──────────────                               ─────────────
Argument Parser    Parse command-line flags and extract input   parse_args(), validate options
Input Parser       Convert input string to numeric value        parse_decimal(), parse_hex(), parse_binary()
Display Formatter  Format value for output in all bases         format_binary(), format_hex(), format_decimal()
Output Printer     Render formatted output to stdout            print_display(), handle alignment

Data Structures

// Input parsing result
typedef enum {
    BASE_DECIMAL,
    BASE_HEXADECIMAL,
    BASE_BINARY
} input_base_t;

// Command-line options
typedef struct {
    int bit_width;           // 0 = auto-detect, or 8/16/32/64
    bool show_endian;        // Display byte order layouts
    bool force_signed;       // Force signed interpretation
    bool force_unsigned;     // Force unsigned interpretation
    input_base_t input_base; // Which base the input is in
} options_t;

// Parsed and processed number
typedef struct {
    uint64_t raw_value;      // The unsigned representation
    int64_t signed_value;    // Signed interpretation (if applicable)
    int bit_width;           // Determined or specified width
    bool input_was_negative; // True if input started with '-'
    input_base_t input_base; // How the input was specified
} number_data_t;

// Output strings ready for printing
typedef struct {
    char decimal[32];        // Decimal representation
    char hex[20];            // Hex with prefix
    char binary[80];         // Binary with spaces
    char signed_info[64];    // "positive" or "negative" with value
    char endian_big[64];     // Big-endian layout
    char endian_little[64];  // Little-endian layout
} display_strings_t;

Algorithm Overview

Main Program Flow:

1. Parse command-line arguments
   ├── Extract flags (-w, -e, -s, -u)
   ├── Extract input number string
   └── Validate options (e.g., width must be 8/16/32/64)

2. Detect input format
   ├── Starts with "0x" or "0X" → hexadecimal
   ├── Starts with "0b" or "0B" → binary
   ├── Starts with "-" → negative decimal
   └── Otherwise → positive decimal

3. Parse input to numeric value
   ├── For hex: iterate chars, multiply by 16, add digit value
   ├── For binary: iterate chars, shift left, add bit
   └── For decimal: standard strtol/strtoll

4. Determine bit width (if not specified)
   ├── Value fits in 8 bits → 8
   ├── Value fits in 16 bits → 16
   ├── Value fits in 32 bits → 32
   └── Otherwise → 64

5. Calculate signed interpretation
   ├── Check if MSB is set for given width
   ├── If set: calculate two's complement value
   └── Store both unsigned and signed values

6. Format output strings
   ├── Binary: convert each bit, group by 8s
   ├── Hex: convert each nibble, pad to width
   └── Decimal: sprintf the values

7. Print formatted output
   ├── Print each representation with labels
   ├── If -e flag: print endian layouts
   └── Align output for readability

Binary Conversion Algorithm (Decimal to Binary String):

Input: value (uint64_t), width (int)
Output: binary string with spaces between bytes

1. Create output buffer of appropriate size
2. For i from (width-1) down to 0:
   a. Extract bit i: (value >> i) & 1
   b. Append '0' or '1' to output
   c. If (i % 8 == 0) and (i != 0): append space
3. Return output string
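
A minimal C sketch of this algorithm (the function name and buffer contract are illustrative, not dictated by the spec):

#include <stdint.h>
#include <stdio.h>

// Write `value` as a `width`-bit binary string into `out`, inserting a
// space after every 8 bits. `out` must hold at least width + width/8 bytes
// (bits + separators + terminator); 80 bytes covers the 64-bit case.
static void format_binary(uint64_t value, int width, char *out) {
    int pos = 0;
    for (int i = width - 1; i >= 0; i--) {
        out[pos++] = ((value >> i) & 1) ? '1' : '0';
        if (i % 8 == 0 && i != 0) {
            out[pos++] = ' ';              // byte boundary
        }
    }
    out[pos] = '\0';
}

int main(void) {
    char buf[80];
    format_binary(255, 32, buf);
    printf("%s\n", buf);   // 00000000 00000000 00000000 11111111
    return 0;
}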

Two’s Complement Interpretation:

Input: value (uint64_t), width (int)
Output: signed_value (int64_t)

1. If width == 64: signed_value = value reinterpreted as int64_t (already full width)
2. Create mask for sign bit: 1 << (width - 1)
3. If (value & mask) is non-zero:
   a. Number is negative
   b. Extend sign bits: signed_value = value | ~((1 << width) - 1)
4. Else:
   a. Number is positive
   b. signed_value = value
5. Return signed_value
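
The same steps in C (a sketch; the explicit width-64 branch avoids shifting a 64-bit value by 64, which is undefined behaviour, and the final unsigned-to-signed conversion is implementation-defined but yields the expected result on two's-complement machines):

#include <stdint.h>
#include <stdio.h>

static int64_t signed_interpretation(uint64_t value, int width) {
    if (width == 64) {
        return (int64_t)value;             // already full width, nothing to extend
    }
    uint64_t sign_bit = 1ULL << (width - 1);
    if (value & sign_bit) {
        // Negative: fill all bits above `width` with 1s (sign extension)
        return (int64_t)(value | ~((1ULL << width) - 1));
    }
    return (int64_t)value;                 // positive: value unchanged
}

int main(void) {
    printf("%lld\n", (long long)signed_interpretation(0xFF, 8));          // -1
    printf("%lld\n", (long long)signed_interpretation(0x80, 8));          // -128
    printf("%lld\n", (long long)signed_interpretation(0xDEADBEEF, 32));   // -559038737
    return 0;
}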

Implementation Guide

Development Environment Setup

# Required tools
# On macOS:
xcode-select --install

# On Linux:
sudo apt-get install build-essential

# Create project structure
mkdir -p bitview/{src,include,tests}
cd bitview

# Verify compiler
gcc --version
# or
clang --version

Project Structure

bitview/
├── src/
│   ├── main.c              # Entry point, main loop
│   ├── parser.c            # Input parsing functions
│   ├── formatter.c         # Output formatting functions
│   └── display.c           # Printing and layout functions
├── include/
│   ├── bitview.h           # Shared data structures
│   ├── parser.h            # Parser function declarations
│   ├── formatter.h         # Formatter function declarations
│   └── display.h           # Display function declarations
├── tests/
│   ├── test_parser.c       # Unit tests for parser
│   ├── test_formatter.c    # Unit tests for formatter
│   └── run_tests.sh        # Integration test script
├── Makefile
└── README.md

The Core Question You’re Answering

“How can I instantly visualize any number as the CPU sees it - in binary, with clear byte boundaries, sign bits, and endianness?”

This question drives every design decision:

  • Why binary output? Because that’s what the CPU actually stores
  • Why byte grouping? Because memory is addressed by bytes
  • Why sign bit marking? Because signed vs unsigned interpretation changes meaning
  • Why endianness display? Because memory layout affects debugging

Concepts You Must Understand First

Before writing code, verify you can answer these questions:

Concept                       Self-Test Question                             Where to Learn
───────                       ──────────────────                             ──────────────
Positional notation           How does binary 1011 work out to decimal 11?   “Code” Ch. 7-9
Division-remainder algorithm  How do you convert 25 to binary by hand?       CS:APP Ch. 2.1
Two’s complement              Why is -1 represented as 0xFF in 8 bits?       CS:APP Ch. 2.2
Bit widths                    What’s the range of a signed 8-bit integer?    CS:APP Ch. 2.2
Endianness                    How is 0x12345678 stored in little-endian?     CS:APP Ch. 2.1.3
C bit manipulation            What does (x >> 4) & 0xF extract?              K&R Ch. 2.9

Questions to Guide Your Design

Work through these before writing code:

  1. How will you detect the input format?
    • Check for “0x”/”0X” prefix → hex
    • Check for “0b”/”0B” prefix → binary
    • Check for leading “-“ → negative decimal
    • What about invalid prefixes like “0z”?
  2. How will you handle negative numbers?
    • Parse the absolute value, then apply two’s complement
    • Or parse as signed directly?
    • What if someone inputs “-0xFF”?
  3. How will you determine bit width if not specified?
    • Find minimum width that contains the value
    • Special case for negative numbers (need sign bit)
    • What about 0? (Could be any width)
  4. How will you format binary output for readability?
    • Space every 8 bits (byte boundary)
    • Align hex digits under corresponding nibbles?
    • Leading zeros to fill the width?
  5. How will you structure your code for testability?
    • Pure functions that take input, return output
    • No global state
    • Separate parsing from formatting from printing

Thinking Exercise

Before coding, trace through these by hand:

Exercise 1: Manual Conversion Convert each of these without a calculator:

  • 173 decimal to binary
  • 0xBEEF to decimal
  • 10110011 binary to hex
  • -5 to 8-bit two’s complement

Exercise 2: Width Determination For each value, what’s the minimum bit width needed?

  • 255: ____ bits (unsigned)
  • 256: ____ bits (unsigned)
  • -1: ____ bits (signed)
  • -129: ____ bits (signed)

Exercise 3: Endianness Layout Write out the memory layout for 0x12345678:

  • Big-endian:    [__] [__] [__] [__]
  • Little-endian: [__] [__] [__] [__]

Exercise 4: Design Sketch On paper, write pseudocode for:

  • parse_hex_string(const char* str) → returns uint64_t
  • format_as_binary(uint64_t value, int width) → returns char*
  • get_signed_interpretation(uint64_t value, int width) → returns int64_t

Hints in Layers

Layer 1: Getting Started

If you’re stuck on where to begin:

  • Start with just decimal to binary conversion for unsigned numbers
  • Hardcode 32-bit width initially
  • Ignore command-line arguments - just use a hardcoded test value
  • Get the core algorithm working before adding features

Layer 2: Core Algorithm Hints

For decimal to binary:

// Extract each bit from MSB to LSB
for (int i = width - 1; i >= 0; i--) {
    int bit = (value >> i) & 1;
    // Append '0' or '1' to output
}

For hex to decimal:

// Each hex digit adds to the running total (most significant digit first)
uint64_t result = 0;
for (const char *p = input; *p != '\0'; p++) {
    result = result * 16 + hex_char_to_int(*p);   // hex_char_to_int is shown in Layer 3
}

For two’s complement:

// Check if sign bit is set
uint64_t sign_bit = 1ULL << (width - 1);
if (value & sign_bit) {
    // Negative: extend the sign bits into the upper positions.
    // (1ULL << width) is undefined when width == 64, so guard that case.
    uint64_t mask = (width == 64) ? 0 : ~((1ULL << width) - 1);
    return (int64_t)(value | mask);
}

Layer 3: Implementation Details

For hex character to value:

int hex_char_to_int(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1; // Invalid
}

For value to hex character:

char int_to_hex_char(int value) {
    if (value < 10) return '0' + value;
    return 'A' + value - 10;
}
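
For detecting the input base from the prefix (a sketch that mirrors the input_base_t enum from the Data Structures section; how you combine it with the -d/-x/-b flags is up to your argument parser):

#include <stdbool.h>
#include <string.h>

typedef enum { BASE_DECIMAL, BASE_HEXADECIMAL, BASE_BINARY } input_base_t;

// Decide the base from the start of the string. A leading '-' is recorded
// separately so inputs like "-0xFF" can be handled (or rejected) explicitly.
static input_base_t detect_base(const char *s, bool *negative) {
    *negative = false;
    if (s[0] == '-') {
        *negative = true;
        s++;
    }
    if (strncmp(s, "0x", 2) == 0 || strncmp(s, "0X", 2) == 0) return BASE_HEXADECIMAL;
    if (strncmp(s, "0b", 2) == 0 || strncmp(s, "0B", 2) == 0) return BASE_BINARY;
    return BASE_DECIMAL;
}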

Layer 4: Debugging Hints

Common bugs to watch for:

  • Integer overflow when parsing (use unsigned long long)
  • Off-by-one in bit positions
  • Forgetting that C strings need null terminators
  • Sign extension when casting between sizes
  • Using %d for values larger than int

Test with these values:

0       → Should work (edge case: zero)
1       → Simplest positive
255     → Max 8-bit unsigned
256     → First value needing 9+ bits
-1      → All bits set
0xFFFFFFFF → 32-bit max
0x8000000000000000 → 64-bit MSB only

Interview Questions

After completing this project, you should be able to answer:

  1. “How would you convert a decimal number to binary without built-in functions?”
    • Explain division-remainder algorithm
    • Mention that you reverse the remainders (or build string backwards)
    • Complexity is O(log n)
  2. “What is two’s complement and why do we use it?”
    • Same addition hardware works for signed and unsigned
    • Only one representation of zero
    • Negation is simple: invert bits, add 1
  3. “Why do we use hexadecimal instead of decimal for memory addresses?”
    • Perfect mapping: 4 bits = 1 hex digit
    • Much more compact than binary
    • Easy to see byte boundaries
  4. “What is endianness and when does it matter?”
    • Order of bytes in multi-byte values
    • Matters for: network protocols, file formats, cross-platform code
    • x86 is little-endian, network order is big-endian
  5. “How can the same bit pattern represent different values?”
    • Interpretation depends on context (signed vs unsigned)
    • 0xFF is 255 unsigned but -1 signed in 8 bits
    • CPU doesn’t care - uses same hardware for both

Books That Will Help

Topic                  Book                Chapter      Why
─────                  ────                ───────      ───
Binary fundamentals    “Code” by Petzold   Ch. 7-9      Builds intuition from first principles
Two’s complement       CS:APP              Ch. 2.2      Rigorous explanation with examples
Endianness             CS:APP              Ch. 2.1.3    Shows exactly how bytes are laid out
Bit manipulation in C  K&R                 Ch. 2.9      Classic reference for C operators
Number representation  CS:APP              Ch. 2.1-2.2  Complete coverage of data formats

Implementation Phases

Phase 1: Core Parsing (Day 1, 2-3 hours)

Goals:

  • Parse a decimal string to uint64_t
  • Parse a hex string (with 0x prefix) to uint64_t
  • Handle basic error cases (invalid characters)

Checkpoint: You can run ./bitview 255 and ./bitview 0xFF and get the same internal value.

Phase 2: Binary Formatting (Day 1, 2-3 hours)

Goals:

  • Convert uint64_t to binary string
  • Add byte-boundary spacing
  • Handle specified bit widths

Checkpoint: You can see 255 displayed as 11111111 and 256 as 00000001 00000000.

Phase 3: Complete Output (Day 2, 2-3 hours)

Goals:

  • Display all three representations (dec, hex, bin)
  • Show signed interpretation
  • Proper output formatting and alignment

Checkpoint: Your output matches the example specification.

Phase 4: Polish and Features (Day 2, 2-3 hours)

Goals:

  • Add command-line argument parsing
  • Add endianness display
  • Add help message
  • Handle all edge cases

Checkpoint: Tool is complete and handles all test cases.

Key Implementation Decisions

Decision          Options                               Recommendation         Rationale
────────          ───────                               ──────────────         ─────────
Integer size      int, long, int64_t                    uint64_t/int64_t       Consistent 64-bit support across platforms
String handling   Dynamic allocation vs static buffers  Static buffers         Simpler, no memory leaks, fixed max sizes
Argument parsing  getopt vs manual                      getopt or getopt_long  Standard, robust, handles edge cases
Error handling    errno + return codes vs fprintf       fprintf + exit         Simple for a CLI tool, immediate feedback

Testing Strategy

Test Categories

Category         Purpose                        Examples
────────         ───────                        ────────
Boundary values  Test limits of each bit width  0, 127, 128, 255, 256, 32767, etc.
Sign handling    Verify two’s complement works  -1, -128, INT_MIN, etc.
Input formats    All prefix styles work         0x, 0X, 0b, 0B, no prefix
Error cases      Invalid input rejected         “hello”, “0xGG”, “0b123”
Edge cases       Special situations             Leading zeros, very large numbers

Critical Test Cases

# Boundary values
./bitview 0           # Zero
./bitview 1           # Smallest positive
./bitview 127         # Max 7-bit signed
./bitview 128         # First value needing 8th bit
./bitview 255         # Max 8-bit unsigned
./bitview 256         # First 9-bit value
./bitview 65535       # Max 16-bit unsigned
./bitview 2147483647  # Max 32-bit signed
./bitview 4294967295  # Max 32-bit unsigned

# Negative numbers
./bitview -1          # Should show all 1s
./bitview -128        # Min 8-bit signed
./bitview -129        # Needs 16 bits

# Hexadecimal input
./bitview 0xFF        # 255
./bitview 0xDEADBEEF  # Famous test value
./bitview 0xffffffff  # 32-bit max (lowercase)
./bitview 0XABCD      # Uppercase prefix

# Binary input
./bitview 0b11111111  # 255
./bitview 0b10000000  # 128

# With width flag
./bitview -w 8 255    # Show as 8-bit
./bitview -w 16 255   # Show as 16-bit
./bitview -w 8 256    # Should warn about overflow

# Error cases (should fail gracefully)
./bitview hello       # Not a number
./bitview 0xGHIJ      # Invalid hex
./bitview 0b123       # Invalid binary
./bitview ""          # Empty input

Test Data File

#!/bin/bash
# test_bitview.sh - Run all tests

PASS=0
FAIL=0

test_case() {
    local input="$1"
    local expected_contains="$2"
    local result=$(./bitview $input 2>&1)

    if echo "$result" | grep -q "$expected_contains"; then
        echo "PASS: $input"
        ((PASS++))
    else
        echo "FAIL: $input - expected '$expected_contains'"
        echo "  Got: $result"
        ((FAIL++))
    fi
}

# Run tests
test_case "255" "0xFF"
test_case "255" "11111111"
test_case "-1" "0xFFFFFFFF"
test_case "0xDEADBEEF" "3735928559"
test_case "0b10101010" "170"

echo ""
echo "Results: $PASS passed, $FAIL failed"

Common Pitfalls & Debugging

Frequent Mistakes

Pitfall                        Symptom                           Solution
───────                        ───────                           ────────
Using int instead of uint64_t  Overflow on large values          Always use fixed-width types
Forgetting null terminator     Garbage characters in output      Ensure all strings are terminated
Wrong shift direction          Bits in wrong position            Draw out the operation on paper
Sign extension on cast         Unexpected large positive values  Use explicit masking
Not handling zero              Empty output or crash             Special case: if value == 0, output “0”

Debugging Strategies

Print intermediate values:

// Add during development, remove later
// (cast uint64_t values so the format specifiers match the argument types)
printf("DEBUG: parsed value = %llu (0x%llX)\n",
       (unsigned long long)value, (unsigned long long)value);
printf("DEBUG: sign bit position = %d\n", width - 1);
printf("DEBUG: sign bit value = %d\n", (int)((value >> (width - 1)) & 1));

Test each function in isolation:

// Test parser alone
assert(parse_hex("FF") == 255);
assert(parse_hex("0xFF") == 255);
assert(parse_binary("11111111") == 255);

// Test formatter alone
assert(strcmp(format_hex(255), "0xFF") == 0);
assert(strcmp(format_binary(255, 8), "11111111") == 0);

Use a debugger:

# Compile with debug symbols
gcc -g -O0 main.c -o bitview

# Run in GDB
gdb ./bitview
(gdb) break main
(gdb) run 255
(gdb) print value
(gdb) step

Performance Traps

For this project, performance isn’t critical (everything is O(log n) at worst), but watch for:

  • Unnecessary string copies
  • Reallocating buffers in loops
  • Computing the same value multiple times

Extensions & Challenges

Beginner Extensions

  • Color output: Use ANSI codes to highlight sign bits in red
  • ASCII display: For byte-sized values, show the ASCII character if printable
  • Octal output: Add octal (base 8) representation

Intermediate Extensions

  • Interactive mode: REPL that accepts continuous input
  • Bit field highlighting: Highlight specific bit ranges (e.g., bits 4-7)
  • IEEE 754 floating point: Show float/double bit layout (sign, exponent, mantissa)
  • Arbitrary bases: Support base 3, base 7, etc.

Advanced Extensions

  • Instruction decoder: For x86, decode common instruction patterns
  • Memory dump parsing: Accept hex dump format and decode
  • GUI version: Create a graphical version with bit toggles
  • Network byte order: Convert between host and network byte order

Real-World Connections

Industry Applications

  • Debugging: Every debugger shows memory as hex dumps
  • Embedded systems: Direct register manipulation requires bit-level understanding
  • Network protocols: Packet headers are parsed bit by bit
  • Cryptography: Hash functions and encryption work at bit level
  • Graphics: Color values, pixel formats all use binary/hex

Related tools worth studying:

  • xxd: Hex dump utility (compare your output to this)
  • od: Octal dump (the original Unix tool)
  • hexdump: Another standard hex viewer
  • Python struct module: Binary packing/unpacking

Interview Relevance

This project demonstrates:

  • Understanding of fundamental computer science concepts
  • Ability to build useful CLI tools
  • Knowledge of how CPUs represent data
  • C programming competence

Resources

Essential Reading

  • “Code” by Charles Petzold - Chapters 7-9 on binary and counting
  • CS:APP - Chapter 2 on information representation
  • K&R - Chapter 2.9 on bitwise operators

Online References

  • Wikipedia: “Two’s complement” - Clear explanation with examples
  • Stanford CS107: Binary and Data lab exercises
  • Computerphile YouTube: Binary and number systems videos

Tools

  • Calculator with programmer mode: Windows Calculator, the macOS Calculator, online tools
  • GDB/LLDB: Practice reading hex in a debugger
  • Python: Quick verification with bin(), hex(), int(x, base)

Self-Assessment Checklist

Before considering this project complete, verify:

Conceptual Understanding

  • I can convert between decimal, binary, and hex without tools
  • I understand why -1 is 0xFFFFFFFF in 32 bits
  • I can explain two’s complement to someone else
  • I know the nibble table (4 bits to hex) by heart
  • I understand little-endian vs big-endian memory layout

Implementation Skills

  • My tool correctly handles all positive values up to 64 bits
  • My tool correctly shows negative number representations
  • Invalid input produces meaningful error messages
  • All command-line options work as specified
  • Output is properly formatted and aligned

Interview Readiness

  • I can explain the division-remainder algorithm clearly
  • I can describe why hex is used for memory addresses
  • I can discuss signed vs unsigned representation tradeoffs
  • I can explain endianness and when it matters

Submission/Completion Criteria

Minimum Viable Completion:

  • Accepts decimal input
  • Outputs binary and hex representations
  • Handles values 0 through 2^32-1
  • Basic error handling for invalid input

Full Completion:

  • All input formats (dec, hex, bin) work
  • Negative numbers with two’s complement
  • Bit width selection (-w flag)
  • Proper sign bit visualization
  • Help message with usage examples

Excellence:

  • Endianness display mode
  • Color-coded output
  • Byte and nibble alignment in output
  • Comprehensive test suite
  • Clean, well-documented code

This project is the foundation for understanding CPU architecture. The ability to read and interpret binary and hex values fluently is essential for every project that follows. Take your time, do the paper exercises, and ensure you truly understand before moving on.