P01: Binary & Hex Visualization Tool
The Core Question: “How does a CPU see numbers, and why must we become fluent in reading hex dumps and binary patterns before understanding machine instructions?”
Learning Objectives
By completing this project, you will:
- Master two’s complement representation - Understand how CPUs encode signed integers and why -1 looks like all 1s
- Internalize the binary-hex relationship - Instantly recognize that 4 binary bits equal 1 hex digit (the nibble)
- Understand bit widths and their significance - Know why 8-bit, 16-bit, 32-bit, and 64-bit representations matter
- Visualize byte order (endianness) - See how the same value appears differently in little-endian vs big-endian memory layouts
- Build professional CLI tools in C - Create robust command-line utilities with proper input parsing and error handling
- Think like a CPU - See numbers not as decimal abstractions but as raw bit patterns that CPUs manipulate
- Read hex dumps fluently - After this project, debugger output and memory inspectors become readable
Project Overview
| Attribute | Value |
|---|---|
| Main Language | C |
| Alternative Languages | Python, Rust, Go |
| Difficulty | Beginner |
| Time Estimate | Weekend (6-12 hours) |
| Prerequisites | Basic C programming, understanding of decimal numbers |
| Knowledge Area | Number Systems / Data Representation |
| Main Book | “Code: The Hidden Language of Computer Hardware and Software” by Charles Petzold |
Theoretical Foundation
Core Concepts
Before building this tool, you must internalize these fundamental concepts that underpin all of computing:
1. Binary: The Language CPUs Speak
CPUs don’t understand decimal. At the hardware level, everything is electrical signals that are either ON or OFF, HIGH or LOW, 1 or 0. This is why binary exists - it’s the only number system that maps directly to transistor states.
Hardware Reality:
Transistor States → Binary Digits → Numbers
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
HIGH voltage (>2.5V) → 1 ─┐
LOW voltage (<0.8V) → 0 ─┼─→ Combine into patterns
│
8 transistors together: │
[ON][ON][ON][ON][ON][ON][ON][ON] = 11111111 = 255
[ON][OFF][OFF][OFF][OFF][OFF][OFF][OFF] = 10000000 = 128
2. Positional Notation: The Universal Pattern
Every number system works the same way. Each digit’s position represents a power of the base:
DECIMAL: 1,234
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Position: 3 2 1 0
Power: 10³ 10² 10¹ 10⁰
Value: 1000 100 10 1
───── ───── ───── ─────
Digit: 1 2 3 4
───── ───── ───── ─────
Total: 1×1000 + 2×100 + 3×10 + 4×1 = 1234
BINARY: 1011
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Position: 3 2 1 0
Power: 2³ 2² 2¹ 2⁰
Value: 8 4 2 1
───── ───── ───── ─────
Digit: 1 0 1 1
───── ───── ───── ─────
Total: 1×8 + 0×4 + 1×2 + 1×1 = 11
HEXADECIMAL: 2F
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Position: 1 0
Power: 16¹ 16⁰
Value: 16 1
───── ─────
Digit: 2 F(=15)
───── ─────
Total: 2×16 + 15×1 = 47
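This positional rule is also the core of every string-to-number parser: multiply the running total by the base, then add the next digit. A minimal sketch in C (the function name `parse_in_base` and its minimal error handling are illustrative, not part of the project spec):

```c
#include <stdint.h>

// Convert a digit string to a value using positional notation.
// Handles any base up to 16; returns 0 on an invalid digit for brevity.
uint64_t parse_in_base(const char *s, int base) {
    uint64_t total = 0;
    for (; *s != '\0'; s++) {
        int digit;
        if (*s >= '0' && *s <= '9')      digit = *s - '0';
        else if (*s >= 'a' && *s <= 'f') digit = *s - 'a' + 10;
        else if (*s >= 'A' && *s <= 'F') digit = *s - 'A' + 10;
        else return 0;                    // invalid character
        if (digit >= base) return 0;      // digit not legal in this base
        total = total * base + digit;     // shift one position, add digit
    }
    return total;
}
// parse_in_base("1011", 2) == 11; parse_in_base("2F", 16) == 47
```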
3. The Magic of Hexadecimal
Hexadecimal exists because of a beautiful mathematical relationship: 16 = 2^4. This means exactly 4 binary bits map to exactly 1 hex digit. This isn’t coincidence - it’s why hex became the standard for representing binary data:
THE NIBBLE RELATIONSHIP (Memorize This!)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Binary Hex │ Binary Hex
──────────────┼────────────────
0000 = 0 │ 1000 = 8
0001 = 1 │ 1001 = 9
0010 = 2 │ 1010 = A
0011 = 3 │ 1011 = B
0100 = 4 │ 1100 = C
0101 = 5 │ 1101 = D
0110 = 6 │ 1110 = E
0111 = 7 │ 1111 = F
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Example: Convert 0xDEADBEEF to binary
D E A D B E E F
↓ ↓ ↓ ↓ ↓ ↓ ↓ ↓
1101 1110 1010 1101 1011 1110 1110 1111
Result: 11011110101011011011111011101111
Notice: No calculation needed! Just table lookup.
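In code, that table lookup is literally an array indexed by the nibble's value. A short demonstration (`hex_to_binary_string` is a hypothetical helper; input is assumed to be valid hex with no 0x prefix):

```c
#include <stdio.h>
#include <string.h>

// One entry per hex digit: the 4-bit pattern it stands for.
static const char *NIBBLES[16] = {
    "0000", "0001", "0010", "0011", "0100", "0101", "0110", "0111",
    "1000", "1001", "1010", "1011", "1100", "1101", "1110", "1111"
};

// Expand a hex string to binary purely by table lookup - no arithmetic.
void hex_to_binary_string(const char *hex, char *out) {
    out[0] = '\0';
    for (; *hex != '\0'; hex++) {
        int v = (*hex <= '9') ? *hex - '0'
              : (*hex >= 'a') ? *hex - 'a' + 10
              :                 *hex - 'A' + 10;
        strcat(out, NIBBLES[v]);
    }
}

int main(void) {
    char buf[64];
    hex_to_binary_string("DEADBEEF", buf);
    printf("%s\n", buf);  // 11011110101011011011111011101111
    return 0;
}
```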
4. Two’s Complement: How CPUs Handle Negative Numbers
CPUs can’t store minus signs. Instead, they use a brilliant encoding called two’s complement where the most significant bit (MSB) indicates sign:
TWO'S COMPLEMENT (8-bit examples)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Positive numbers: Same as unsigned
00000001 = 1
00000010 = 2
01111111 = 127 (largest positive 8-bit signed)
The sign bit:
0xxxxxxx = positive (MSB = 0)
1xxxxxxx = negative (MSB = 1)
Negative numbers: Invert all bits, add 1
To get -1:
Step 1: Start with +1: 00000001
Step 2: Invert all bits: 11111110
Step 3: Add 1: 11111111
Result: -1 = 11111111 (0xFF)
To get -128:
10000000 = -128 (most negative 8-bit signed)
VERIFICATION: Adding -1 + 1 should equal 0
11111111 (-1)
+ 00000001 (+1)
──────────
100000000 (9 bits!)
↑
This bit "overflows" and is discarded in 8-bit math
Result: 00000000 = 0 ✓
WHY THIS WORKS:
The same addition hardware works for both signed and unsigned!
The CPU doesn't care about interpretation - just bit patterns.
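You can watch this "interpretation, not storage" principle directly in C by reading one byte through both an unsigned and a signed type:

```c
#include <stdint.h>
#include <stdio.h>

int main(void) {
    uint8_t bits = 0xFF;               // the pattern 11111111
    int8_t  as_signed = (int8_t)bits;  // same byte, viewed as signed

    printf("unsigned: %u\n", (unsigned)bits);  // 255
    printf("signed:   %d\n", (int)as_signed);  // -1 on two's complement CPUs
    return 0;
}
```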
5. Bit Widths: Why Size Matters
Different data types use different numbers of bits. Understanding widths is essential for reading memory dumps:
COMMON BIT WIDTHS AND THEIR RANGES
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Width Bytes Unsigned Range Signed Range
───── ───── ────────────── ────────────
8-bit 1 0 to 255 -128 to 127
16-bit 2 0 to 65,535 -32,768 to 32,767
32-bit 4 0 to 4,294,967,295 -2,147,483,648 to 2,147,483,647
64-bit   8      0 to 18,446,744,073,709,551,615    -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807
VISUAL REPRESENTATION:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
8-bit: ████████
└──────┘
1 byte
16-bit: ████████ ████████
└──────┘ └──────┘
byte 0 byte 1
32-bit: ████████ ████████ ████████ ████████
└──────┘ └──────┘ └──────┘ └──────┘
byte 0 byte 1 byte 2 byte 3
64-bit: ████████ ████████ ████████ ████████ ████████ ████████ ████████ ████████
└───────────────────────────────────────────────────────────────────────┘
8 bytes
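The tool's "smallest width that fits" default (see the specification below) falls straight out of these ranges. A possible helper for the unsigned case (negative inputs additionally need room for a sign bit; the name `min_unsigned_width` is illustrative):

```c
#include <stdint.h>

// Smallest of 8/16/32/64 bits that can hold the unsigned value.
int min_unsigned_width(uint64_t v) {
    if (v <= 0xFFull)       return 8;   // 0 .. 255
    if (v <= 0xFFFFull)     return 16;  // 0 .. 65,535
    if (v <= 0xFFFFFFFFull) return 32;  // 0 .. 4,294,967,295
    return 64;
}
```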
6. Endianness: Byte Order in Memory
Different CPU architectures store multi-byte values in different orders. This is called “endianness”:
THE ENDIANNESS PROBLEM
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
The 32-bit value 0x12345678: How is it stored in memory?
LITTLE-ENDIAN (x86, x86-64, ARM default):
Least significant byte at lowest address
Memory Address: 100 101 102 103
Memory Content: 0x78 0x56 0x34 0x12
└─────────────────────────┘
Reversed! LSB first.
BIG-ENDIAN (Network byte order, older PowerPC):
Most significant byte at lowest address
Memory Address: 100 101 102 103
Memory Content: 0x12 0x34 0x56 0x78
└─────────────────────────┘
"Natural" reading order.
REAL EXAMPLE - 0xDEADBEEF:
━━━━━━━━━━━━━━━━━━━━━━━━━━━
Little-endian memory dump: EF BE AD DE
Big-endian memory dump: DE AD BE EF
When you see a hex dump, you MUST know the endianness!
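You can ask your own machine which order it uses by storing a known 32-bit value and inspecting its bytes. A small check, assuming nothing beyond the standard library:

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

int main(void) {
    uint32_t value = 0x12345678;
    uint8_t bytes[4];
    memcpy(bytes, &value, sizeof bytes);  // copy the in-memory representation

    printf("bytes in memory: %02X %02X %02X %02X\n",
           bytes[0], bytes[1], bytes[2], bytes[3]);
    // x86/x86-64 prints: 78 56 34 12  (little-endian)
    // A big-endian machine prints: 12 34 56 78
    return 0;
}
```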
Why This Matters
Understanding binary and hex representation is the foundation for:
- Reading debugger output - GDB, LLDB, and other debuggers show memory in hex
- Understanding assembly language - Instructions and addresses are in hex
- Analyzing network packets - Protocol data is binary
- Writing embedded systems code - Direct hardware register manipulation
- Security analysis - Exploit development requires bit-level understanding
- Understanding CPU architecture - Instruction encoding is pure bit patterns
Historical Context
Binary computing took hold in the late 1930s and 1940s, after Claude Shannon's 1937 master's thesis showed that Boolean algebra could be implemented with switching circuits. The shift from decimal computers (like ENIAC) to binary was driven by hardware simplicity - a switch with two states is easier and cheaper to build than one with ten states.
Hexadecimal emerged in the 1960s as programmers needed a way to represent bytes (8 bits) more compactly than binary. The IBM System/360 (1964) popularized hex notation, and it became the universal standard for representing binary data.
Common Misconceptions
Misconception 1: “Negative numbers have a minus sign stored somewhere” Reality: There’s no minus sign. Two’s complement uses the MSB pattern to represent negativity. The same bit pattern can be positive or negative depending on interpretation.
Misconception 2: “Hex is harder to understand than decimal” Reality: Hex is actually simpler for binary data because of the exact 4-bit mapping. With practice, you’ll read hex faster than decimal for byte values.
Misconception 3: “Little-endian is backwards and wrong” Reality: Little-endian has practical advantages - adding numbers can start from the lowest address, and casting between sizes is trivial. Both are valid design choices.
Misconception 4: “My computer stores numbers as text like ‘255’” Reality: Numbers are stored as raw binary patterns. The text ‘255’ and the integer 255 are completely different in memory.
Project Specification
What You Will Build
A command-line tool called bitview that takes a number (decimal, hex, or binary) and displays its representation in all formats, with visual highlighting of important patterns like sign bits, byte boundaries, and endianness.
Functional Requirements
- Multi-format Input (`-d`, `-x`, `-b` flags):
  - Accept decimal numbers (default): `./bitview 255`
  - Accept hexadecimal with `0x` prefix or `-x` flag: `./bitview 0xDEADBEEF`
  - Accept binary with `0b` prefix or `-b` flag: `./bitview 0b11111111`
- Complete Output Display:
  - Show decimal representation
  - Show binary representation with byte grouping
  - Show hexadecimal representation
  - Show signed interpretation (two's complement)
  - Show bit width used
- Bit Width Selection (`-w` flag):
  - Support 8-bit, 16-bit, 32-bit, and 64-bit representations
  - Default to the smallest width that fits the number
  - Show an overflow warning if the number exceeds the selected width
- Endianness Display (`-e` flag):
  - Show both little-endian and big-endian memory layouts
  - Highlight byte order differences visually
- Byte Highlighting (`--bytes`):
  - Separate binary output into byte groups
  - Show hex digit alignment with binary nibbles
- Sign Bit Visualization:
  - Clearly mark the sign bit in signed interpretations
  - Show two's complement breakdown for negative numbers
Non-Functional Requirements
- Performance: Handle all inputs up to 64-bit instantly
- Portability: Compile and run on any POSIX system with standard C library
- Robustness: Handle all invalid inputs gracefully with meaningful error messages
- Usability: Self-documenting with `--help` flag
Example Usage/Output
$ ./bitview 255
Decimal: 255
Binary: 00000000 00000000 00000000 11111111
Hex: 0x000000FF
Signed: 255 (positive)
Bit width: 32-bit
$ ./bitview -1
Decimal: -1
Binary: 11111111 11111111 11111111 11111111
^
└─ Sign bit (1 = negative)
Hex: 0xFFFFFFFF
Signed: -1 (two's complement)
Bit width: 32-bit
$ ./bitview 0xDEADBEEF
Decimal: 3735928559
Binary: 11011110 10101101 10111110 11101111
^^^^^^^^ ^^^^^^^^ ^^^^^^^^ ^^^^^^^^
DE AD BE EF
Hex: 0xDEADBEEF
Signed: -559038737 (two's complement)
Bit width: 32-bit
$ ./bitview -w 8 127
Decimal: 127
Binary: 01111111
^
└─ Sign bit (0 = positive)
Hex: 0x7F
Signed: 127 (positive, max for signed 8-bit)
Bit width: 8-bit
$ ./bitview -w 8 128
Decimal: 128
Binary: 10000000
^
└─ Sign bit (1 = negative when interpreted as signed)
Hex: 0x80
Unsigned: 128
Signed: -128 (two's complement, min for signed 8-bit)
Bit width: 8-bit
$ ./bitview -e 0x12345678
Decimal: 305419896
Binary: 00010010 00110100 01010110 01111000
Hex: 0x12345678
Memory Layout (32-bit):
Big-endian: [0x12] [0x34] [0x56] [0x78] (address 0 → 3)
Little-endian: [0x78] [0x56] [0x34] [0x12] (address 0 → 3)
^ x86/x86-64 systems are little-endian
$ ./bitview --help
Usage: bitview [OPTIONS] NUMBER
Display number in binary, hexadecimal, and decimal formats.
Input Formats:
123 Decimal (default)
0xFF Hexadecimal (0x prefix)
0b1010 Binary (0b prefix)
Options:
-w, --width N Force bit width (8, 16, 32, or 64)
-e, --endian Show endianness memory layouts
-s, --signed Interpret as signed (default for negative input)
-u, --unsigned Interpret as unsigned
-h, --help Show this help message
Examples:
bitview 255 Show 255 in all formats
bitview -w 8 -1 Show -1 as 8-bit signed
bitview 0xCAFE Show hex input in all formats
bitview -e 0x12345678 Show byte order layouts
Real World Outcome
After completing this tool, you’ll use it constantly for:
- Debugging sessions: Quickly verify what a value looks like in binary/hex
- Understanding error codes: System error codes are often hex
- Analyzing bit flags: See which bits are set in configuration values
- Learning assembly: Verify instruction encoding by checking opcodes
- Network debugging: Convert between address formats
Solution Architecture
High-Level Design
┌──────────────────────────────────────────────────────────────────────────┐
│ bitview │
├──────────────────────────────────────────────────────────────────────────┤
│ │
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ ┌───────────┐ │
│ │ Argument │───▶│ Input │───▶│ Display │───▶│ Output │ │
│ │ Parser │ │ Parser │ │ Formatter │ │ Printer │ │
│ └─────────────┘ └─────────────┘ └─────────────┘ └───────────┘ │
│ │ │ │ │ │
│ ▼ ▼ ▼ ▼ │
│ ┌────────────────────────────────────────────────────────────────────┐ │
│ │ Data Model │ │
│ │ ┌─────────────────────────────────────────────────────────────┐ │ │
│ │ │ struct number_display { │ │ │
│ │ │ uint64_t value; // The raw value │ │ │
│ │ │ int64_t signed_value; // Signed interpretation │ │ │
│ │ │ int bit_width; // 8, 16, 32, or 64 │ │ │
│ │ │ bool is_negative; // Original input was negative │ │ │
│ │ │ enum input_base; // DEC, HEX, BIN │ │ │
│ │ │ } │ │ │
│ │ └─────────────────────────────────────────────────────────────┘ │ │
│ └────────────────────────────────────────────────────────────────────┘ │
│ │
└──────────────────────────────────────────────────────────────────────────┘
Key Components
| Component | Responsibility | Key Functions |
|---|---|---|
| Argument Parser | Parse command-line flags and extract input | parse_args(), validate options |
| Input Parser | Convert input string to numeric value | parse_decimal(), parse_hex(), parse_binary() |
| Display Formatter | Format value for output in all bases | format_binary(), format_hex(), format_decimal() |
| Output Printer | Render formatted output to stdout | print_display(), handle alignment |
Data Structures
// Input parsing result
typedef enum {
BASE_DECIMAL,
BASE_HEXADECIMAL,
BASE_BINARY
} input_base_t;
// Command-line options
typedef struct {
int bit_width; // 0 = auto-detect, or 8/16/32/64
bool show_endian; // Display byte order layouts
bool force_signed; // Force signed interpretation
bool force_unsigned; // Force unsigned interpretation
input_base_t input_base; // Which base the input is in
} options_t;
// Parsed and processed number
typedef struct {
uint64_t raw_value; // The unsigned representation
int64_t signed_value; // Signed interpretation (if applicable)
int bit_width; // Determined or specified width
bool input_was_negative; // True if input started with '-'
input_base_t input_base; // How the input was specified
} number_data_t;
// Output strings ready for printing
typedef struct {
char decimal[32]; // Decimal representation
char hex[20]; // Hex with prefix
char binary[80]; // Binary with spaces
char signed_info[64]; // "positive" or "negative" with value
char endian_big[64]; // Big-endian layout
char endian_little[64]; // Little-endian layout
} display_strings_t;
Algorithm Overview
Main Program Flow:
1. Parse command-line arguments
├── Extract flags (-w, -e, -s, -u)
├── Extract input number string
└── Validate options (e.g., width must be 8/16/32/64)
2. Detect input format
├── Starts with "0x" or "0X" → hexadecimal
├── Starts with "0b" or "0B" → binary
├── Starts with "-" → negative decimal
└── Otherwise → positive decimal
3. Parse input to numeric value
├── For hex: iterate chars, multiply by 16, add digit value
├── For binary: iterate chars, shift left, add bit
└── For decimal: standard strtol/strtoll
4. Determine bit width (if not specified)
├── Value fits in 8 bits → 8
├── Value fits in 16 bits → 16
├── Value fits in 32 bits → 32
└── Otherwise → 64
5. Calculate signed interpretation
├── Check if MSB is set for given width
├── If set: calculate two's complement value
└── Store both unsigned and signed values
6. Format output strings
├── Binary: convert each bit, group by 8s
├── Hex: convert each nibble, pad to width
└── Decimal: sprintf the values
7. Print formatted output
├── Print each representation with labels
├── If -e flag: print endian layouts
└── Align output for readability
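Step 2's format detection is just a few prefix checks. One way to sketch it, reusing the `input_base_t` enum from the data-structures section (handling of a leading '-' is left to the caller here):

```c
#include <string.h>

typedef enum { BASE_DECIMAL, BASE_HEXADECIMAL, BASE_BINARY } input_base_t;

// Detect the base from the prefix; *digits is set to the first
// character after the prefix so the caller can parse the rest.
input_base_t detect_base(const char *s, const char **digits) {
    if (strncmp(s, "0x", 2) == 0 || strncmp(s, "0X", 2) == 0) {
        *digits = s + 2;
        return BASE_HEXADECIMAL;
    }
    if (strncmp(s, "0b", 2) == 0 || strncmp(s, "0B", 2) == 0) {
        *digits = s + 2;
        return BASE_BINARY;
    }
    *digits = s;  // decimal: may still start with '-' for negative input
    return BASE_DECIMAL;
}
```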
Binary Conversion Algorithm (Decimal to Binary String):
Input: value (uint64_t), width (int)
Output: binary string with spaces between bytes
1. Create output buffer of appropriate size
2. For i from (width-1) down to 0:
a. Extract bit i: (value >> i) & 1
b. Append '0' or '1' to output
c. If (i % 8 == 0) and (i != 0): append space
3. Return output string
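Translated directly into C, the formatter might look like the following (the caller supplies the buffer; 72 bytes covers 64 bits plus 7 spaces and the terminator):

```c
#include <stdint.h>

// Write `width` bits of `value` into `out`, MSB first, with a space
// after every full byte.
void format_binary(uint64_t value, int width, char *out) {
    int pos = 0;
    for (int i = width - 1; i >= 0; i--) {
        out[pos++] = ((value >> i) & 1) ? '1' : '0';
        if (i % 8 == 0 && i != 0)
            out[pos++] = ' ';  // byte boundary
    }
    out[pos] = '\0';
}
// format_binary(256, 16, buf) yields "00000001 00000000"
```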
Two’s Complement Interpretation:
Input: value (uint64_t), width (int)
Output: signed_value (int64_t)
1. Create mask for sign bit: 1 << (width - 1)
2. If (value & mask) is non-zero:
a. Number is negative
b. Extend sign bits: signed_value = value | ~((1 << width) - 1)
3. Else:
a. Number is positive
b. signed_value = value
4. Return signed_value
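In C there is one trap the pseudocode hides: shifting a 64-bit value by 64 positions is undefined behavior, so width 64 must be special-cased. A sketch:

```c
#include <stdint.h>

// Interpret the low `width` bits of `value` as two's complement.
int64_t get_signed_interpretation(uint64_t value, int width) {
    if (width == 64)
        return (int64_t)value;       // full width already, nothing to extend

    uint64_t sign_bit = 1ULL << (width - 1);
    if (value & sign_bit) {
        // Negative: set every bit above `width` (sign extension)
        uint64_t mask = ~((1ULL << width) - 1);
        return (int64_t)(value | mask);
    }
    return (int64_t)value;           // positive: unchanged
}
// get_signed_interpretation(0xFF, 8) == -1
```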
Implementation Guide
Development Environment Setup
# Required tools
# On macOS:
xcode-select --install
# On Linux:
sudo apt-get install build-essential
# Create project structure
mkdir -p bitview/{src,include,tests}
cd bitview
# Verify compiler
gcc --version
# or
clang --version
Project Structure
bitview/
├── src/
│ ├── main.c # Entry point, main loop
│ ├── parser.c # Input parsing functions
│ ├── formatter.c # Output formatting functions
│ └── display.c # Printing and layout functions
├── include/
│ ├── bitview.h # Shared data structures
│ ├── parser.h # Parser function declarations
│ ├── formatter.h # Formatter function declarations
│ └── display.h # Display function declarations
├── tests/
│ ├── test_parser.c # Unit tests for parser
│ ├── test_formatter.c # Unit tests for formatter
│ └── run_tests.sh # Integration test script
├── Makefile
└── README.md
The Core Question You’re Answering
“How can I instantly visualize any number as the CPU sees it - in binary, with clear byte boundaries, sign bits, and endianness?”
This question drives every design decision:
- Why binary output? Because that’s what the CPU actually stores
- Why byte grouping? Because memory is addressed by bytes
- Why sign bit marking? Because signed vs unsigned interpretation changes meaning
- Why endianness display? Because memory layout affects debugging
Concepts You Must Understand First
Before writing code, verify you can answer these questions:
| Concept | Self-Test Question | Where to Learn |
|---|---|---|
| Positional notation | How is 1011 binary calculated to 11 decimal? | “Code” Ch. 7-9 |
| Division-remainder algorithm | How do you convert 25 to binary by hand? | CS:APP Ch. 2.1 |
| Two’s complement | Why is -1 represented as 0xFF in 8 bits? | CS:APP Ch. 2.2 |
| Bit widths | What’s the range of a signed 8-bit integer? | CS:APP Ch. 2.2 |
| Endianness | How is 0x12345678 stored in little-endian? | CS:APP Ch. 2.1.3 |
| C bit manipulation | What does `(x >> 4) & 0xF` extract? | K&R Ch. 2.9 |
Questions to Guide Your Design
Work through these before writing code:
- How will you detect the input format?
- Check for “0x”/“0X” prefix → hex
- Check for “0b”/“0B” prefix → binary
- Check for leading “-” → negative decimal
- What about invalid prefixes like “0z”?
- How will you handle negative numbers?
- Parse the absolute value, then apply two’s complement
- Or parse as signed directly?
- What if someone inputs “-0xFF”?
- How will you determine bit width if not specified?
- Find minimum width that contains the value
- Special case for negative numbers (need sign bit)
- What about 0? (Could be any width)
- How will you format binary output for readability?
- Space every 8 bits (byte boundary)
- Align hex digits under corresponding nibbles?
- Leading zeros to fill the width?
- How will you structure your code for testability?
- Pure functions that take input, return output
- No global state
- Separate parsing from formatting from printing
Thinking Exercise
Before coding, trace through these by hand:
Exercise 1: Manual Conversion Convert each of these without a calculator:
- 173 decimal to binary
- 0xBEEF to decimal
- 10110011 binary to hex
- -5 to 8-bit two’s complement
Exercise 2: Width Determination For each value, what’s the minimum bit width needed?
- 255: ____ bits (unsigned)
- 256: ____ bits (unsigned)
- -1: ____ bits (signed)
- -129: ____ bits (signed)
Exercise 3: Endianness Layout Write out the memory layout for 0x12345678:
- Big-endian: [__] [__] [__] [__]
- Little-endian: [__] [__] [__] [__]
Exercise 4: Design Sketch On paper, write pseudocode for:
- `parse_hex_string(const char* str)` → returns `uint64_t`
- `format_as_binary(uint64_t value, int width)` → returns `char*`
- `get_signed_interpretation(uint64_t value, int width)` → returns `int64_t`
Hints in Layers
Layer 1: Getting Started
If you’re stuck on where to begin:
- Start with just decimal to binary conversion for unsigned numbers
- Hardcode 32-bit width initially
- Ignore command-line arguments - just use a hardcoded test value
- Get the core algorithm working before adding features
Layer 2: Core Algorithm Hints
For decimal to binary:
// Extract each bit from MSB to LSB
for (int i = width - 1; i >= 0; i--) {
int bit = (value >> i) & 1;
// Append '0' or '1' to output
}
For hex to decimal:
// Each hex digit adds to the running total
uint64_t result = 0;
for (const char *p = input; *p != '\0'; p++) {
    result = result * 16 + hex_char_to_int(*p);  // helper shown in Layer 3
}
For two's complement (special-case width == 64 first - shifting a 64-bit value by 64 is undefined behavior in C):
// Check if sign bit is set
uint64_t sign_bit = 1ULL << (width - 1);
if (value & sign_bit) {
    // Negative: extend sign bits
    uint64_t mask = ~((1ULL << width) - 1);
    return (int64_t)(value | mask);
}
Layer 3: Implementation Details
For hex character to value:
int hex_char_to_int(char c) {
if (c >= '0' && c <= '9') return c - '0';
if (c >= 'a' && c <= 'f') return c - 'a' + 10;
if (c >= 'A' && c <= 'F') return c - 'A' + 10;
return -1; // Invalid
}
For value to hex character:
char int_to_hex_char(int value) {
if (value < 10) return '0' + value;
return 'A' + value - 10;
}
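For binary input parsing, the same loop shape works with base 2. A possible sketch (matching the single-argument `parse_binary` used in the test snippets later; real code should also report invalid characters):

```c
#include <stdint.h>

// Parse a string of '0'/'1' characters (no 0b prefix) to a value.
// In this minimal sketch, scanning simply stops at the first
// non-binary character; overflow past 64 bits is not checked.
uint64_t parse_binary(const char *s) {
    uint64_t result = 0;
    for (; *s == '0' || *s == '1'; s++) {
        result = (result << 1) | (uint64_t)(*s - '0');  // shift left, add bit
    }
    return result;
}
// parse_binary("11111111") == 255; parse_binary("10000000") == 128
```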
Layer 4: Debugging Hints
Common bugs to watch for:
- Integer overflow when parsing (use
unsigned long long) - Off-by-one in bit positions
- Forgetting that C strings need null terminators
- Sign extension when casting between sizes
- Using
%dfor values larger thanint
Test with these values:
0 → Should work (edge case: zero)
1 → Simplest positive
255 → Max 8-bit unsigned
256 → First value needing 9+ bits
-1 → All bits set
0xFFFFFFFF → 32-bit max
0x8000000000000000 → 64-bit MSB only
Interview Questions
After completing this project, you should be able to answer:
- “How would you convert a decimal number to binary without built-in functions?”
- Explain division-remainder algorithm
- Mention that you reverse the remainders (or build string backwards)
- Complexity is O(log n)
- “What is two’s complement and why do we use it?”
- Same addition hardware works for signed and unsigned
- Only one representation of zero
- Negation is simple: invert bits, add 1
- “Why do we use hexadecimal instead of decimal for memory addresses?”
- Perfect mapping: 4 bits = 1 hex digit
- Much more compact than binary
- Easy to see byte boundaries
- “What is endianness and when does it matter?”
- Order of bytes in multi-byte values
- Matters for: network protocols, file formats, cross-platform code
- x86 is little-endian, network order is big-endian
- “How can the same bit pattern represent different values?”
- Interpretation depends on context (signed vs unsigned)
- 0xFF is 255 unsigned but -1 signed in 8 bits
- CPU doesn’t care - uses same hardware for both
Books That Will Help
| Topic | Book | Chapter | Why |
|---|---|---|---|
| Binary fundamentals | “Code” by Petzold | Ch. 7-9 | Builds intuition from first principles |
| Two’s complement | CS:APP | Ch. 2.2 | Rigorous explanation with examples |
| Endianness | CS:APP | Ch. 2.1.3 | Shows exactly how bytes are laid out |
| Bit manipulation in C | K&R | Ch. 2.9 | Classic reference for C operators |
| Number representation | CS:APP | Ch. 2.1-2.2 | Complete coverage of data formats |
Implementation Phases
Phase 1: Core Parsing (Day 1, 2-3 hours)
Goals:
- Parse a decimal string to uint64_t
- Parse a hex string (with 0x prefix) to uint64_t
- Handle basic error cases (invalid characters)
Checkpoint: You can run ./bitview 255 and ./bitview 0xFF and get the same internal value.
Phase 2: Binary Formatting (Day 1, 2-3 hours)
Goals:
- Convert uint64_t to binary string
- Add byte-boundary spacing
- Handle specified bit widths
Checkpoint: You can see 255 displayed as 11111111 and 256 as 00000001 00000000.
Phase 3: Complete Output (Day 2, 2-3 hours)
Goals:
- Display all three representations (dec, hex, bin)
- Show signed interpretation
- Proper output formatting and alignment
Checkpoint: Your output matches the example specification.
Phase 4: Polish and Features (Day 2, 2-3 hours)
Goals:
- Add command-line argument parsing
- Add endianness display
- Add help message
- Handle all edge cases
Checkpoint: Tool is complete and handles all test cases.
Key Implementation Decisions
| Decision | Options | Recommendation | Rationale |
|---|---|---|---|
| Integer size | int, long, int64_t | uint64_t/int64_t | Consistent 64-bit support across platforms |
| String handling | Dynamic allocation vs static buffers | Static buffers | Simpler, no memory leaks, fixed max sizes |
| Argument parsing | getopt vs manual | getopt or getopt_long | Standard, robust, handles edge cases |
| Error handling | errno + return codes vs fprintf | fprintf + exit | Simple for CLI tool, immediate feedback |
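If you follow the getopt_long recommendation, the options table from the help text maps onto it almost one-to-one. A skeleton under that assumption (flag handling only; parsing and display live in the other modules):

```c
#include <getopt.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[]) {
    int width = 0;  // 0 = auto-detect
    int show_endian = 0, force_signed = 0, force_unsigned = 0;

    static const struct option long_opts[] = {
        {"width",    required_argument, 0, 'w'},
        {"endian",   no_argument,       0, 'e'},
        {"signed",   no_argument,       0, 's'},
        {"unsigned", no_argument,       0, 'u'},
        {"help",     no_argument,       0, 'h'},
        {0, 0, 0, 0}
    };

    int c;
    while ((c = getopt_long(argc, argv, "w:esuh", long_opts, NULL)) != -1) {
        switch (c) {
        case 'w': width = atoi(optarg); break;
        case 'e': show_endian = 1;      break;
        case 's': force_signed = 1;     break;
        case 'u': force_unsigned = 1;   break;
        case 'h': /* print usage */     return 0;
        default:  return 1;  // getopt_long already printed an error
        }
    }
    if (optind >= argc) {
        fprintf(stderr, "bitview: missing NUMBER argument\n");
        return 1;
    }
    const char *number = argv[optind];  // hand off to the input parser
    (void)number; (void)width; (void)show_endian;
    (void)force_signed; (void)force_unsigned;
    return 0;
}
```

One wrinkle the spec's examples expose: getopt will try to treat an input like `-1` as an option, so negative decimal inputs need special handling (for example, accepting `--` before the number, or pre-scanning argv yourself).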
Testing Strategy
Test Categories
| Category | Purpose | Examples |
|---|---|---|
| Boundary values | Test limits of each bit width | 0, 127, 128, 255, 256, 32767, etc. |
| Sign handling | Verify two’s complement works | -1, -128, INT_MIN, etc. |
| Input formats | All prefix styles work | 0x, 0X, 0b, 0B, no prefix |
| Error cases | Invalid input rejected | “hello”, “0xGG”, “0b123” |
| Edge cases | Special situations | Leading zeros, very large numbers |
Critical Test Cases
# Boundary values
./bitview 0 # Zero
./bitview 1 # Smallest positive
./bitview 127 # Max 7-bit signed
./bitview 128 # First value needing 8th bit
./bitview 255 # Max 8-bit unsigned
./bitview 256 # First 9-bit value
./bitview 65535 # Max 16-bit unsigned
./bitview 2147483647 # Max 32-bit signed
./bitview 4294967295 # Max 32-bit unsigned
# Negative numbers
./bitview -1 # Should show all 1s
./bitview -128 # Min 8-bit signed
./bitview -129 # Needs 16 bits
# Hexadecimal input
./bitview 0xFF # 255
./bitview 0xDEADBEEF # Famous test value
./bitview 0xffffffff # 32-bit max (lowercase)
./bitview 0XABCD # Uppercase prefix
# Binary input
./bitview 0b11111111 # 255
./bitview 0b10000000 # 128
# With width flag
./bitview -w 8 255 # Show as 8-bit
./bitview -w 16 255 # Show as 16-bit
./bitview -w 8 256 # Should warn about overflow
# Error cases (should fail gracefully)
./bitview hello # Not a number
./bitview 0xGHIJ # Invalid hex
./bitview 0b123 # Invalid binary
./bitview "" # Empty input
Test Script
#!/bin/bash
# test_bitview.sh - Run all tests
PASS=0
FAIL=0
test_case() {
local input="$1"
local expected_contains="$2"
local result=$(./bitview $input 2>&1)
if echo "$result" | grep -q "$expected_contains"; then
echo "PASS: $input"
((PASS++))
else
echo "FAIL: $input - expected '$expected_contains'"
echo " Got: $result"
((FAIL++))
fi
}
# Run tests
test_case "255" "0xFF"
test_case "255" "11111111"
test_case "-1" "0xFFFFFFFF"
test_case "0xDEADBEEF" "3735928559"
test_case "0b10101010" "170"
echo ""
echo "Results: $PASS passed, $FAIL failed"
Common Pitfalls & Debugging
Frequent Mistakes
| Pitfall | Symptom | Solution |
|---|---|---|
Using int instead of uint64_t |
Overflow on large values | Always use fixed-width types |
| Forgetting null terminator | Garbage characters in output | Ensure all strings are terminated |
| Wrong shift direction | Bits in wrong position | Draw out the operation on paper |
| Sign extension on cast | Unexpected large positive values | Use explicit masking |
| Not handling zero | Empty output or crash | Special case: if value == 0, output “0” |
Debugging Strategies
Print intermediate values:
// Add during development, remove later
printf("DEBUG: parsed value = %llu (0x%llX)\n", value, value);
printf("DEBUG: sign bit position = %d\n", width - 1);
printf("DEBUG: sign bit value = %d\n", (value >> (width - 1)) & 1);
Test each function in isolation:
// Test parser alone
assert(parse_hex("FF") == 255);
assert(parse_hex("0xFF") == 255);
assert(parse_binary("11111111") == 255);
// Test formatter alone
assert(strcmp(format_hex(255), "0xFF") == 0);
assert(strcmp(format_binary(255, 8), "11111111") == 0);
Use a debugger:
# Compile with debug symbols
gcc -g -O0 main.c -o bitview
# Run in GDB
gdb ./bitview
(gdb) break main
(gdb) run 255
(gdb) print value
(gdb) step
Performance Traps
For this project, performance isn’t critical (everything is O(log n) at worst), but watch for:
- Unnecessary string copies
- Reallocating buffers in loops
- Computing the same value multiple times
Extensions & Challenges
Beginner Extensions
- Color output: Use ANSI codes to highlight sign bits in red
- ASCII display: For byte-sized values, show the ASCII character if printable
- Octal output: Add octal (base 8) representation
Intermediate Extensions
- Interactive mode: REPL that accepts continuous input
- Bit field highlighting: Highlight specific bit ranges (e.g., bits 4-7)
- IEEE 754 floating point: Show float/double bit layout (sign, exponent, mantissa) - a starting sketch follows this list
- Arbitrary bases: Support base 3, base 7, etc.
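For the IEEE 754 extension flagged above, the portable way to reach a float's bits is `memcpy` into a same-sized integer. A starting point for single precision:

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

int main(void) {
    float f = -1.5f;
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);  // reinterpret without aliasing issues

    uint32_t sign     = bits >> 31;           // 1 bit
    uint32_t exponent = (bits >> 23) & 0xFF;  // 8 bits, biased by 127
    uint32_t mantissa = bits & 0x7FFFFF;      // 23 bits of fraction

    printf("sign=%u exponent=%u (unbiased %d) mantissa=0x%06X\n",
           (unsigned)sign, (unsigned)exponent,
           (int)exponent - 127, (unsigned)mantissa);
    // -1.5f prints: sign=1 exponent=127 (unbiased 0) mantissa=0x400000
    return 0;
}
```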
Advanced Extensions
- Instruction decoder: For x86, decode common instruction patterns
- Memory dump parsing: Accept hex dump format and decode
- GUI version: Create a graphical version with bit toggles
- Network byte order: Convert between host and network byte order
Real-World Connections
Industry Applications
- Debugging: Every debugger shows memory as hex dumps
- Embedded systems: Direct register manipulation requires bit-level understanding
- Network protocols: Packet headers are parsed bit by bit
- Cryptography: Hash functions and encryption work at bit level
- Graphics: Color values, pixel formats all use binary/hex
Related Tools
- xxd: Hex dump utility (compare your output to this)
- od: Octal dump (original Unix tool)
- hexdump: Another standard hex viewer
- Python struct module: Binary packing/unpacking
Interview Relevance
This project demonstrates:
- Understanding of fundamental computer science concepts
- Ability to build useful CLI tools
- Knowledge of how CPUs represent data
- C programming competence
Resources
Essential Reading
- “Code” by Charles Petzold - Chapters 7-9 on binary and counting
- CS:APP - Chapter 2 on information representation
- K&R - Chapter 2.9 on bitwise operators
Online References
- Wikipedia: “Two’s complement” - Clear explanation with examples
- Stanford CS107: Binary and Data lab exercises
- Computerphile YouTube: Binary and number systems videos
Tools
- Calculator with programmer mode: Windows Calculator, macOS, online tools
- GDB/LLDB: Practice reading hex in a debugger
- Python: Quick verification with `bin()`, `hex()`, `int(x, base)`
Self-Assessment Checklist
Before considering this project complete, verify:
Conceptual Understanding
- I can convert between decimal, binary, and hex without tools
- I understand why -1 is 0xFFFFFFFF in 32 bits
- I can explain two’s complement to someone else
- I know the nibble table (4 bits to hex) by heart
- I understand little-endian vs big-endian memory layout
Implementation Skills
- My tool correctly handles all positive values up to 64 bits
- My tool correctly shows negative number representations
- Invalid input produces meaningful error messages
- All command-line options work as specified
- Output is properly formatted and aligned
Interview Readiness
- I can explain the division-remainder algorithm clearly
- I can describe why hex is used for memory addresses
- I can discuss signed vs unsigned representation tradeoffs
- I can explain endianness and when it matters
Submission/Completion Criteria
Minimum Viable Completion:
- Accepts decimal input
- Outputs binary and hex representations
- Handles values 0 through 2^32-1
- Basic error handling for invalid input
Full Completion:
- All input formats (dec, hex, bin) work
- Negative numbers with two’s complement
- Bit width selection (-w flag)
- Proper sign bit visualization
- Help message with usage examples
Excellence:
- Endianness display mode
- Color-coded output
- Byte and nibble alignment in output
- Comprehensive test suite
- Clean, well-documented code
This project is the foundation for understanding CPU architecture. The ability to read and interpret binary and hex values fluently is essential for every project that follows. Take your time, do the paper exercises, and ensure you truly understand before moving on.