Project 3: Numeric Representation Deep Dive

A laboratory that reveals how integers and floating-point numbers are actually stored and manipulated at the bit level.

Quick Reference

| Attribute | Value |
|-----------|-------|
| Difficulty | Level 3 - Advanced |
| Time Estimate | 1-2 weeks |
| Main Programming Language | C |
| Alternative Programming Languages | Python (for verification) |
| Coolness Level | Level 4 - Hardcore Tech Flex |
| Business Potential | Level 1 - Resume Gold |
| Prerequisites | Binary/hex, pointers, basic C I/O |
| Key Topics | Two’s complement, IEEE-754, endianness, rounding |

1. Learning Objectives

By completing this project, you will:

  1. Explain two’s complement representation and detect integer overflow safely.
  2. Visualize the bit patterns of integers and floats, including sign/exponent/mantissa.
  3. Demonstrate how endianness affects byte order and serialization.
  4. Measure floating-point rounding error and understand ULPs.
  5. Build a report that maps bit patterns to numeric meaning and back.

2. All Theory Needed (Per-Concept Breakdown)

Concept 1: Two’s Complement Integers, Overflow, and Endianness

Fundamentals

Most modern CPUs represent signed integers using two’s complement. In two’s complement, negative numbers are represented by inverting all bits and adding one. This makes addition and subtraction work uniformly in hardware, but it also introduces edge cases: the representable range is asymmetric, and overflow is undefined behavior in C for signed types. Unsigned arithmetic is defined to wrap modulo 2^N, but that does not mean it is safe in all contexts. Endianness determines the byte order of multi-byte integers in memory, which is critical when you inspect raw bytes or serialize data.

Deep Dive into the concept

Two’s complement is elegant because it uses the same circuitry for signed and unsigned addition. If you have an N-bit word, the unsigned range is [0, 2^N - 1]. The signed range is [-2^(N-1), 2^(N-1)-1]. The most negative value has no positive counterpart because of the extra negative value (e.g., -128 for 8-bit). When you add two signed integers, the CPU performs the same binary addition, but the interpretation of the result depends on signedness. In C, however, signed overflow is UB. This is a language-level rule, not a hardware rule. The CPU may wrap, but the compiler can assume overflow never occurs and optimize accordingly. That means if you write overflow checks with signed arithmetic, the compiler might remove them. The safe approach is to use unsigned arithmetic with explicit range checks, or use C23 checked arithmetic functions like ckd_add.

Endianness refers to the order in which bytes of a multi-byte value are stored. On little-endian systems, the least significant byte comes first; on big-endian systems, the most significant byte comes first. Endianness doesn’t change the logical value of an integer, but it changes its byte layout in memory. This matters when you dump memory, read from disk, or communicate over a network. If you serialize an integer by writing its raw bytes, you must specify endianness and convert appropriately. In this project, you will use byte dumps to reveal endianness and then verify that reconstructing the value from bytes yields the original integer.

Two’s complement also interacts with bitwise operations. Right shifts on signed integers may be arithmetic (sign-extended) or logical depending on the implementation; the C standard makes this implementation-defined. That means -1 >> 1 could yield -1 (arithmetic) or a large positive number (logical). A professional C programmer must know which behavior their compiler uses or avoid relying on it. Your lab will include tests demonstrating this and document the compiler-specific result.

Finally, consider overflow detection. For unsigned, you can detect overflow by comparing the result to operands. For signed, you must avoid UB by performing checks before the operation or by using wider types. This is a core skill for safe systems code. Your project will produce a set of test cases and helper functions that show both safe and unsafe approaches, and it will measure how optimizations affect the unsafe ones.

To deepen your understanding, connect arithmetic rules to real-world protocols. Network protocols almost always define byte order (big-endian), so you must convert host values to network order explicitly. This is why functions like htons and htonl exist. Endianness also affects how you inspect memory in debuggers: the byte order is reversed compared to the numeric representation you are used to seeing in hex. Another subtlety is sign extension: when you cast a smaller signed type to a larger one, the sign bit is extended. This can surprise you when you treat a signed byte as a mask. The safe approach is to use unsigned types for bitwise operations and only convert to signed when you need arithmetic. Finally, overflow checks should be designed with the compiler’s rules in mind. If you want wraparound, use unsigned arithmetic. If you want detection, use checked operations or explicit range checks before the computation. Mixing these approaches without clarity is a recipe for bugs that appear only under optimization.

To operationalize this concept in a real codebase, create a short checklist of invariants and a set of micro-experiments. Start with a minimal, deterministic test that isolates one rule or behavior, then vary a single parameter at a time (inputs, flags, platform, or data layout) and record the outcome. Keep a table of assumptions and validate them with assertions or static checks so violations are caught early. Whenever the concept touches the compiler or OS, capture tool output such as assembly, warnings, or system call traces and attach it to your lab notes. Finally, define explicit failure modes: what does a violation look like at runtime, and how would you detect it in logs or tests? This turns abstract theory into repeatable engineering practice and makes results comparable across machines and compiler versions.

Definitions & key terms

  • Two’s complement: Signed integer representation where negative values are inverted plus one.
  • Overflow (signed): UB if the mathematical result does not fit in the type.
  • Modulo arithmetic: Unsigned arithmetic wraps around modulo 2^N.
  • Endianness: The byte order for multi-byte values in memory.
  • ULP: Unit in the last place (smallest representable step).

Mental model diagram (ASCII)

Value: 0x12345678
Little-endian memory: 78 56 34 12
Big-endian memory:    12 34 56 78

How it works (step-by-step, with invariants and failure modes)

  1. Represent integers in N-bit two’s complement.
  2. Perform arithmetic; hardware wraps, but C rules may declare UB.
  3. Store the result in memory as bytes based on endianness.
  4. Reinterpret bytes to reconstruct the value.

Invariant: Unsigned operations wrap modulo 2^N. Failure mode: Signed overflow is UB and can invalidate checks.

Minimal concrete example

#include <stdint.h>
#include <stdio.h>

int main(void) {
    int8_t x = -5; // 0b11111011
    printf("x = %d\n", x);
    return 0;
}

Common misconceptions

  • “Signed overflow just wraps.” → It is UB in C.
  • “Endianness only matters for networking.” → It matters for any binary I/O.
  • “Right shift of negative is always arithmetic.” → It is implementation-defined.

Check-your-understanding questions

  1. Why is the signed range asymmetric in two’s complement?
  2. How can you safely detect overflow for signed addition in C?
  3. What byte order would you see for 0x01020304 on little-endian?
  4. Why is -1 >> 1 not portable?
  5. What does modulo arithmetic guarantee for unsigned values?

Check-your-understanding answers

  1. Because one bit pattern is used for zero, leaving an extra negative value.
  2. Check bounds before the operation or use checked arithmetic.
  3. 04 03 02 01.
  4. The C standard allows either arithmetic or logical shift.
  5. Results wrap modulo 2^N for the type width.

Real-world applications

  • Serializing integers in file formats and network protocols.
  • Writing cryptographic code that relies on defined wrapping behavior.
  • Debugging overflow bugs in embedded systems.

References

  • “CS:APP” — Bryant & O’Hallaron, Ch. 2
  • “Hacker’s Delight” — Henry S. Warren (bit tricks)

Key insights

Two’s complement simplifies hardware but makes signed overflow a language-level hazard.

Summary

Two’s complement is the dominant integer representation, but C’s rules about overflow and shifting are subtle. Endianness controls how bytes appear in memory and must be handled explicitly when you cross system boundaries.

Homework/Exercises to practice the concept

  1. Write a function that prints the binary form of a signed integer.
  2. Detect signed addition overflow without invoking UB.
  3. Implement endian conversion for 32-bit integers.

Solutions to the homework/exercises

  1. Iterate bits and print from MSB to LSB.
  2. Check bounds before addition or use a wider type.
  3. Swap bytes with shifts and masks.

Concept 2: IEEE-754 Floating Point, Rounding, and Error

Fundamentals

Floating-point numbers are stored as sign, exponent, and mantissa (fraction) following the IEEE-754 standard on most platforms. This representation trades exactness for range: many decimal values cannot be represented exactly, so they are rounded to the nearest representable value. Understanding rounding modes, ULPs, NaNs, and infinities is essential for writing reliable numerical code. Small errors can accumulate, comparisons can be misleading, and seemingly simple arithmetic can yield surprising results.

Deep Dive into the concept

IEEE-754 defines a binary floating-point format. For a 32-bit float, there is 1 sign bit, 8 exponent bits, and 23 fraction bits (with an implicit leading 1 for normal numbers). The value is interpreted as (-1)^sign * 1.fraction * 2^(exponent-bias). This representation provides wide range but limited precision. For example, 0.1 cannot be represented exactly in binary, so the stored value is an approximation. When you add or multiply floats, the result is rounded to the nearest representable value, using a specified rounding mode (usually round-to-nearest-even).

Special values complicate matters: NaNs represent invalid results and propagate through computations; infinities represent overflow; subnormals represent values too small to be normalized. Each has unique bit patterns and behaviors. Comparisons involving NaNs are always false (except !=), which can break naive algorithms. Subnormals trade precision for the ability to represent tiny values, but they can be slower on some hardware. Understanding these cases is critical for numerical robustness.

Floating-point error is not random; it is structured. Each operation introduces a rounding error on the order of half an ULP. Over many operations, these errors can accumulate or cancel depending on the algorithm. The order of operations matters: (a + b) + c may not equal a + (b + c) because of rounding. This is why numerical libraries use techniques like Kahan summation to reduce error. In this project, you will build experiments that measure these errors, compare different summation orders, and visualize the difference between mathematical and representable values. You will also show how conversions between float and integer can overflow or truncate, emphasizing the need for explicit checks.

Finally, IEEE-754 is pervasive but not guaranteed by the C standard. The standard allows other representations, though in practice most systems use IEEE-754. The key is to detect and document the implementation: you can use FLT_RADIX, FLT_MANT_DIG, and FLT_MAX macros to verify assumptions. Your lab should print these values and explain their meaning, reinforcing the idea that numerical code is only portable if you know the platform’s floating-point model.

Another way to deepen understanding is to map the concept to a small decision table: list inputs, expected outcomes, and the assumptions that must hold. Create at least one negative test that violates an assumption and observe the failure mode, then document how you would detect it in production. Add a short trade-off note: what you gain by following the rule and what you pay in complexity or performance. Where possible, instrument the implementation with debug-only checks so violations are caught early without affecting release builds. If the concept admits multiple approaches, implement two and compare them; the act of measuring and documenting the difference is part of professional practice. This habit turns theoretical understanding into an engineering decision framework you can reuse across projects.

Definitions & key terms

  • IEEE-754: Standard for floating-point representation and behavior.
  • Mantissa (significand): Fractional part of a floating-point number.
  • Exponent bias: Offset used to store signed exponents as unsigned values.
  • NaN: Not-a-Number, represents invalid results.
  • ULP: Unit in the last place, the spacing between representable numbers.

Mental model diagram (ASCII)

Float (32-bit):
[sign][ exponent 8 ][ mantissa 23 ]
 value = (-1)^s * 1.m * 2^(e-bias)

How it works (step-by-step, with invariants and failure modes)

  1. Convert decimal to binary fraction (approximate if necessary).
  2. Normalize the binary representation and extract sign/exponent/mantissa.
  3. Round to the nearest representable mantissa.
  4. Perform operations with rounding after each step.

Invariant: Each float represents a discrete set of values. Failure mode: Comparisons and equality checks can fail due to rounding.

Minimal concrete example

#include <stdio.h>
int main(void) {
    double x = 0.1 + 0.2;
    printf("%.17f\n", x);
}

Common misconceptions

  • “Floating point is precise enough for equality checks.” → Often false.
  • “NaN compares equal to itself.” → It does not.
  • “Order of operations doesn’t matter.” → It can change results.

Check-your-understanding questions

  1. Why can’t 0.1 be represented exactly in binary?
  2. What is an ULP and why does it matter?
  3. What happens when a float overflows?
  4. Why does summation order affect results?
  5. How can you check if your platform uses IEEE-754?

Check-your-understanding answers

  1. It is a repeating fraction in base-2.
  2. It is the spacing between representable numbers.
  3. It rounds to infinity and raises the overflow exception flag (by default a flag, not a trap).
  4. Rounding occurs at each step, so reordering changes error.
  5. Inspect FLT_RADIX, FLT_MANT_DIG, and related macros.

Real-world applications

  • Numerical simulations and scientific computing.
  • Financial systems where rounding errors matter.
  • Graphics pipelines and signal processing.

References

  • “What Every Computer Scientist Should Know About Floating-Point Arithmetic” — Goldberg
  • IEEE-754 standard summaries
  • “CS:APP” — floating-point chapter

Key insights

Floating point is a finite approximation system; correctness depends on respecting its limits.

Summary

IEEE-754 defines how floats are stored and computed, but rounding and special values make arithmetic non-intuitive. A professional C programmer must measure and reason about floating-point error rather than assume ideal math.

Homework/Exercises to practice the concept

  1. Print the bit pattern of several floats and decode them.
  2. Implement Kahan summation and compare with naive summation.
  3. Detect NaNs and infinities in a stream of values.

Solutions to the homework/exercises

  1. Use memcpy to a uint32_t and print bits.
  2. Kahan should reduce error for long sums.
  3. Use isnan and isinf from math.h.

3. Project Specification

3.1 What You Will Build

A numeric representation lab that can display integer and floating-point values as raw bits, explain their meaning, and demonstrate overflow, underflow, rounding, and endianness. It outputs a report and provides a set of reproducible experiments.

3.2 Functional Requirements

  1. Integer Bit Viewer: Convert integers to binary/hex representation.
  2. Float Decoder: Show sign/exponent/mantissa for float and double.
  3. Endianness Detector: Print byte order and demonstrate conversions.
  4. Overflow Experiments: Show safe vs unsafe overflow detection.
  5. Rounding Experiments: Compare operation orders and show error.

3.3 Non-Functional Requirements

  • Performance: Must run within seconds.
  • Reliability: Deterministic outputs for defined behavior experiments.
  • Usability: Provide clear explanations for each experiment.

3.4 Example Usage / Output

$ ./numeric_lab --float 0.1
Value: 0.1
Bits: 0x3fb999999999999a
Sign: 0 Exponent: 0x3fb (1019; bias 1023, actual -4) Mantissa: 0x999999999999a

3.5 Data Formats / Schemas / Protocols

Report format (text):

Experiment | Input | Bits | Explanation
-----------|-------|------|------------
...

3.6 Edge Cases

  • Values near integer overflow boundaries.
  • Subnormal floats and NaNs.
  • Endianness-sensitive byte dumps.

3.7 Real World Outcome

What you will see:

  1. A report explaining each integer and float experiment.
  2. Bit-level outputs you can compare to published references.
  3. A deterministic demo of floating-point rounding error.

3.7.1 How to Run (Copy/Paste)

make
./numeric_lab --report > numeric_report.txt

3.7.2 Golden Path Demo (Deterministic)

Run with fixed inputs (e.g., 1, -1, 0.1) and verify bit patterns.

3.7.3 If CLI: exact terminal transcript

$ ./numeric_lab --int 255
Value: 255
Binary: 11111111
Hex: 0xff
Exit: 0

Failure demo (deterministic):

$ ./numeric_lab --float not_a_number
ERROR: invalid float literal
Exit: 2

4. Solution Architecture

4.1 High-Level Design

+------------------+
|      parser      |
+--------+---------+
         |
         v
+------------------+      +------------------+
| integer analyzer | ---> |  integer report  |
+------------------+      +------------------+
         |
         v
+------------------+      +------------------+
|  float analyzer  | ---> |   float report   |
+------------------+      +------------------+

4.2 Key Components

| Component | Responsibility | Key Decisions |
|-----------|----------------|---------------|
| Parser | Parse CLI inputs | Strict validation |
| Integer analyzer | Convert to binary/hex | Use unsigned ops for safety |
| Float analyzer | Decode IEEE-754 | Use memcpy for bit reinterpretation |

4.3 Data Structures (No Full Code)

typedef struct {
    const char *label;
    uint64_t bits;
} bit_view_t;

4.4 Algorithm Overview

  1. Parse input value.
  2. For integers, format bits and compute range info.
  3. For floats, extract sign/exponent/mantissa.
  4. Emit report entries.

Complexity Analysis:

  • Time: O(N) per value for bit formatting
  • Space: O(1)

5. Implementation Guide

5.1 Development Environment Setup

clang -std=c23 -Wall -Wextra -Werror -g

5.2 Project Structure

numeric-lab/
├── src/
│   ├── main.c
│   ├── int_bits.c
│   └── float_bits.c
├── include/
├── tests/
└── Makefile

5.3 The Core Question You’re Answering

“What exact bit patterns represent my numbers, and why do they behave the way they do?”

5.4 Concepts You Must Understand First

  1. Two’s complement and signed overflow rules.
  2. Endianness and byte order.
  3. IEEE-754 float structure and rounding.

5.5 Questions to Guide Your Design

  1. How will you represent bits for both 32-bit and 64-bit types?
  2. How will you detect and label special float values?
  3. How will you avoid UB in type conversions?

5.6 Thinking Exercise

Predict the bit pattern of -5 in 8-bit two’s complement.

5.7 The Interview Questions They’ll Ask

  1. Why is signed overflow UB in C?
  2. What is an ULP and why is it important?
  3. How do you detect endianness in C?

5.8 Hints in Layers

  • Hint 1: Start with unsigned values and add signed later.
  • Hint 2: Use memcpy to reinterpret float bits.
  • Hint 3: Validate with Python’s struct module.

5.9 Books That Will Help

| Topic | Book | Chapter |
|-------|------|---------|
| Data representation | “CS:APP” — Bryant | Ch. 2 |
| Bit-level hacks | “Hacker’s Delight” — Warren | Ch. 1-3 |

5.10 Implementation Phases

Phase 1: Foundation (3-4 days)

  • Integer bit formatting.
  • Checkpoint: Binary/hex outputs correct.

Phase 2: Core Functionality (4-5 days)

  • Float decoding and special values.
  • Checkpoint: Sign/exponent/mantissa printed.

Phase 3: Polish & Edge Cases (2-3 days)

  • Error handling and report formatting.
  • Checkpoint: Report covers fixed demo set.

5.11 Key Implementation Decisions

| Decision | Options | Recommendation | Rationale |
|----------|---------|----------------|-----------|
| Bit display | string, array | string | Simple output |
| Float decode | bitfields, shifts | shifts | Portability |


6. Testing Strategy

6.1 Test Categories

| Category | Purpose | Examples |
|----------|---------|----------|
| Unit tests | Validate bit conversions | Known values |
| Integration tests | Full report | Fixed demo set |
| Edge case tests | NaN, subnormal | Special cases |

6.2 Critical Test Cases

  1. Integer boundary values (INT_MAX, INT_MIN).
  2. Float values 0.1, 1.0, NaN, inf.
  3. Endianness check against known pattern.

6.3 Test Data

Input: 0.1
Expected bits: 0x3fb999999999999a (double)

7. Common Pitfalls & Debugging

7.1 Frequent Mistakes

| Pitfall | Symptom | Solution |
|---------|---------|----------|
| Signed overflow tests optimized away | Inconsistent results | Use checked arithmetic or wider types |
| Aliasing UB in float decoding | Random bits | Use memcpy |
| Endianness confusion | Reversed output | Print bytes and label order |

7.2 Debugging Strategies

  • Compare results with Python struct.pack.
  • Use -fno-strict-aliasing to validate assumptions.

7.3 Performance Traps

Formatting bits as strings in tight loops can be slow; limit to demo inputs.


8. Extensions & Challenges

8.1 Beginner Extensions

  • Add a table of power-of-two boundaries.
  • Print signed/unsigned interpretation side by side.

8.2 Intermediate Extensions

  • Implement half-precision (16-bit) float decoding.
  • Add rounding mode toggles with fesetround.

8.3 Advanced Extensions

  • Compare results across different compilers/architectures.
  • Add fixed-point representation experiments.

9. Real-World Connections

9.1 Industry Applications

  • Protocol serialization and endian conversion.
  • Debugging floating-point drift in simulations.
  • SoftFloat — software floating-point reference.
  • Compiler runtime libraries for FP support.

9.3 Interview Relevance

  • Common questions on overflow, IEEE-754, and endianness.

10. Resources

10.1 Essential Reading

  • Goldberg’s floating-point paper
  • “CS:APP” — data representation chapter

10.2 Video Resources

  • IEEE-754 explainers and numerics talks

10.3 Tools & Documentation

  • bc for arithmetic checks
  • Python struct for validation

11. Self-Assessment Checklist

11.1 Understanding

  • I can explain two’s complement and overflow.
  • I can decode a float’s exponent and mantissa.
  • I can describe endianness without looking it up.

11.2 Implementation

  • Integer and float bit dumps are correct.
  • The report is deterministic.
  • Error handling is clear and consistent.

11.3 Growth

  • I can reason about numerical errors in real code.
  • I can explain why a result differs across platforms.

12. Submission / Completion Criteria

Minimum Viable Completion:

  • Integer and float bit visualizers.
  • Endianness detection demo.
  • Report with fixed demo inputs.

Full Completion:

  • All minimum criteria plus:
  • Rounding error experiments with explanations.

Excellence (Going Above & Beyond):

  • Support for multiple float formats and automated cross-platform comparisons.