LEARN PYTHON2 VS PYTHON3 DEEP DIVE
Learn Python 2 vs Python 3: Understanding the Evolution
Goal: Deeply understand the differences between Python 2.7 and Python 3, why Python 3 was designed the way it was, how to work with legacy codebases, and how to leverage Python 3’s modern features.
Why This Matters
Python 2.7 reached end of life on January 1, 2020, but millions of lines of legacy code still exist. Understanding the differences between Python 2 and 3 is essential for:
- Migrating legacy codebases - Many companies still have Python 2 code that needs updating
- Understanding Python 3’s design decisions - Know why things changed, not just what changed
- Reading older tutorials and Stack Overflow answers - Pre-2020 content often uses Python 2 syntax
- Appreciating Python 3’s improvements - You’ll understand why modern Python is better
- Historical perspective - Understanding the “Python 2 vs 3” war that almost split the community
The Historical Context
Timeline:
2000 ─────── Python 2.0 released (list comprehensions, garbage collection)
│
2008 ─────── Python 3.0 released (breaking changes, "the great divide")
│ Community splits: many refuse to migrate
│
2010 ─────── Python 2.7 released (final 2.x version)
│ "The last Python 2 ever"
│
2014-2018 ── Slow migration, libraries adopt Python 3
│
2020 ─────── Python 2.7 End of Life
│ No more security patches
│
Today ────── Python 3.12+ with type hints, async/await, pattern matching
The migration from Python 2 to 3 was one of the most challenging transitions in programming language history. Guido van Rossum (Python’s creator) made deliberate breaking changes to fix fundamental design mistakes—most notably around strings and Unicode.
Core Differences Overview
The Big Categories
| Category | Python 2 | Python 3 | Impact |
|---|---|---|---|
| Strings | ASCII by default, separate unicode type | Unicode by default, separate bytes type | HUGE - biggest migration pain |
| Print | Statement: print "hello" | Function: print("hello") | Syntax change |
| Division | Integer: 5/2 = 2 | True: 5/2 = 2.5 | Logic changes |
| Iterators | Lists: range(), dict.keys() | Iterators: memory-efficient | Performance |
| Exceptions | except E, e: | except E as e: | Syntax change |
| Standard Library | urllib2, ConfigParser | urllib.request, configparser | Import changes |
Core Concept Analysis
1. The String/Unicode Revolution (The Biggest Change)
This is the most important difference and the source of most migration pain.
Python 2 String Model (Problematic)
┌─────────────────────────────────────────────────────────────┐
│ Python 2 │
├─────────────────────────────────────────────────────────────┤
│ str ──────► Bytes (ASCII) "hello" is bytes │
│ unicode ──► Text (Unicode) u"hello" is text │
│ │
│ PROBLEM: Python 2 IMPLICITLY converts between them! │
│ "hello" + u"world" → Works... until it doesn't │
│ Fails at runtime (UnicodeDecodeError) with non-ASCII characters │
└─────────────────────────────────────────────────────────────┘
Python 3 String Model (Fixed)
┌─────────────────────────────────────────────────────────────┐
│ Python 3 │
├─────────────────────────────────────────────────────────────┤
│ str ──────► Text (Unicode) "hello" is text (always!) │
│ bytes ────► Bytes (Binary) b"hello" is bytes │
│ │
│ SOLUTION: NO implicit conversion. TypeError if you mix! │
│ "hello" + b"world" → TypeError (explicit error) │
│ Must encode/decode explicitly │
└─────────────────────────────────────────────────────────────┘
Why This Matters
# Python 2: The "Mojibake" problem
>>> "café" # Bytes, not text!
'caf\xc3\xa9'
>>> print "café"
café # Works... by accident
# This works fine:
>>> "hello" + u"world"
u'helloworld'
# Until you hit non-ASCII:
>>> "café" + u"latte"
Traceback: UnicodeDecodeError # BOOM! At runtime!
# Python 3: Explicit and safe
>>> "café" # This IS Unicode
'café'
>>> "café".encode() # Explicit conversion to bytes
b'caf\xc3\xa9'
>>> "hello" + b"world"
TypeError: can only concatenate str (not "bytes") to str
# Error is IMMEDIATE and CLEAR
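To make the explicit model concrete, here is a minimal Python 3 sketch of the encode/decode round trip and of the TypeError raised when the two types are mixed. The helper name try_concat is hypothetical and exists only for this illustration.

```python
# Python 3: text (str) and binary data (bytes) are distinct types.
text = "café"
data = text.encode("utf-8")        # str -> bytes, explicit
restored = data.decode("utf-8")    # bytes -> str, explicit

def try_concat(left, right):
    """Hypothetical helper: report what happens when two values are joined."""
    try:
        return left + right
    except TypeError as exc:
        return f"TypeError: {exc}"

print(len(text), len(data))                      # characters vs bytes
print(try_concat("hello", b"world"))             # mixing types is rejected
print(try_concat("hello", b"world".decode("ascii")))  # decode first, then join
```

The character count and the byte count differ because "é" occupies two bytes in UTF-8.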
2. Print: Statement vs Function
# Python 2
print "Hello" # Statement (no parentheses)
print "Hello", # Trailing comma = no newline
print >> sys.stderr, "Error" # Redirect to stderr
# Python 3
print("Hello") # Function (parentheses required)
print("Hello", end=" ") # Keyword argument replaces the trailing comma
print("Error", file=sys.stderr) # Named argument for file
Why it changed: Functions are more powerful:
- Can be replaced (mocking for tests)
- Can be passed as arguments
- Consistent with everything else in Python
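As a rough illustration of the first two points, the sketch below passes print around like any other callable and swaps in a version that writes to an in-memory buffer, much as a test might. The function emit_report is a hypothetical name, not part of any library.

```python
import io
from functools import partial

def emit_report(lines, out=print):
    """Hypothetical helper: write each line using whatever callable is passed in."""
    for line in lines:
        out(line)

# Replace the output target without touching emit_report itself.
buffer = io.StringIO()
emit_report(["alpha", "beta"], out=partial(print, file=buffer))
captured = buffer.getvalue()

emit_report(["to stdout"])  # default: the built-in print
```

None of this is possible with a statement, because a statement is not an object that can be passed, wrapped, or replaced.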
3. Division: Integer vs True
# Python 2
>>> 5 / 2
2 # Integer division (floor)
>>> 5.0 / 2
2.5 # Float division (only if one operand is float)
# Python 3
>>> 5 / 2
2.5 # True division (always)
>>> 5 // 2
2 # Integer division (explicit)
Why it changed: The Python 2 behavior was a common source of bugs. Mathematical division should give mathematical results.
4. Iterators vs Lists
# Python 2
>>> range(5)
[0, 1, 2, 3, 4] # Returns a LIST (uses memory)
>>> xrange(5)
xrange(5) # Returns a lazy sequence (memory-efficient)
>>> dict.keys()
['a', 'b', 'c'] # Returns a LIST
# Python 3
>>> range(5)
range(0, 5) # Returns a lazy sequence, not a list (like xrange)
>>> list(range(5))
[0, 1, 2, 3, 4] # Explicit conversion to list
>>> dict.keys()
dict_keys(['a', 'b', 'c']) # Returns a VIEW (live, memory-efficient)
Why it changed: Memory efficiency. range(1000000000) in Python 2 creates a billion-element list. In Python 3, it’s instant and uses almost no memory.
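A short Python 3 sketch of both behaviors: a range object answers length, indexing, and membership questions without building a list, and a dict view tracks later changes to its dictionary. The variable names are illustrative only.

```python
big = range(1_000_000_000_000)      # created immediately; no list is built
print(len(big), big[500], 10**6 in big)

inventory = {"a": 1, "b": 2}
keys = inventory.keys()             # a view onto the dict, not a copy
snapshot = list(inventory.keys())   # explicit copy, like Python 2's keys()

inventory["c"] = 3                  # mutate after taking both
print(list(keys))                   # the view sees the new key
print(snapshot)                     # the copy does not
```

When code written for Python 2 relies on keys() being a stable snapshot, wrapping the call in list() restores that behavior.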
5. Exception Handling Syntax
# Python 2
try:
    risky_operation()
except IOError, e: # Comma syntax
    print e
# Python 3
try:
    risky_operation()
except IOError as e: # 'as' keyword
    print(e)
6. Standard Library Reorganization
Python 2 Python 3
──────── ────────
ConfigParser → configparser (PEP 8 naming)
Queue → queue
SocketServer → socketserver
repr → reprlib
Tkinter → tkinter
urllib + urllib2 → urllib.request
urllib.parse
urllib.error
urllib.robotparser
cPickle → (merged into pickle, auto-uses C)
7. Python 3 Exclusive Features
Python 3 didn’t just fix problems—it added powerful new features:
| Feature | Version | Description |
|---|---|---|
| f-strings | 3.6 | f"Hello {name}" - concise, and usually the fastest formatting style |
| Type hints | 3.5+ | def greet(name: str) -> str: |
| async/await | 3.5 | Native coroutines for async programming |
| dataclasses | 3.7 | @dataclass decorator for data containers |
| pathlib | 3.4 | Object-oriented filesystem paths |
| walrus operator | 3.8 | := assignment expressions |
| pattern matching | 3.10 | match/case structural pattern matching |
Project List
These projects are designed to teach you the differences through hands-on experience. You’ll build tools that work with both Python versions, migrate code, and leverage Python 3’s modern features.
Project 1: The Unicode/Bytes Torture Chamber
- File: LEARN_PYTHON2_VS_PYTHON3_DEEP_DIVE.md
- Main Programming Language: Python
- Alternative Programming Languages: N/A (Python-specific)
- Coolness Level: Level 3: Genuinely Clever
- Business Potential: 1. The “Resume Gold”
- Difficulty: Level 2: Intermediate
- Knowledge Area: Text Encoding / Unicode
- Software or Tool: Python 2.7, Python 3.x
- Main Book: “Fluent Python, 2nd Edition” by Luciano Ramalho (Chapter 4: Unicode Text Versus Bytes)
What you’ll build: A text processing library that handles files in various encodings (UTF-8, Latin-1, UTF-16, etc.), demonstrating the difference between Python 2’s implicit coercion and Python 3’s explicit model.
Why it teaches Python 2 vs 3: The string/bytes distinction is THE biggest difference. By building a tool that reads/writes encoded files, you’ll experience firsthand why Python 2’s model was broken and why Python 3’s explicitness is better.
Core challenges you’ll face:
- Reading files with unknown encodings → maps to understanding encoding detection
- Mixing bytes and text → maps to Python 3’s TypeError vs Python 2’s silent corruption
- The “Mojibake” problem → maps to encoding/decoding failures
- File mode differences → maps to text mode vs binary mode
Key Concepts:
- Unicode fundamentals: “Fluent Python” Chapter 4 - Luciano Ramalho
- Encoding detection: chardet library documentation
- The Absolute Minimum About Unicode: Joel Spolsky’s famous article
- Python 3 Unicode HOWTO: Python official documentation
Difficulty: Intermediate Time estimate: Weekend Prerequisites: Basic Python, understanding that text is not the same as bytes
Real world outcome:
$ python3 unicode_torture.py
Testing encoding handling...
File: data/utf8_sample.txt
Detected: UTF-8
Content: "Héllo Wörld 日本語 🎉"
Bytes: 28
Characters: 17
File: data/latin1_sample.txt
Detected: ISO-8859-1
Content: "Café résumé naïve"
File: data/mixed_encoding.txt
WARNING: Mixed encodings detected!
Line 1: UTF-8
Line 5: Windows-1252
Python 2 vs 3 comparison:
Python 2: Would silently corrupt at line 5
Python 3: Raises UnicodeDecodeError (explicit failure)
Conversion results:
✓ All files normalized to UTF-8
✓ 3 files converted
✓ 0 data loss
Implementation Hints:
Think about what encoding really means:
- A character is an abstract concept (the letter “A”, the emoji “🎉”)
- A code point is a number assigned to that character (U+0041 for “A”)
- An encoding is how to represent code points as bytes (UTF-8, UTF-16, Latin-1)
The key insight:
"café" as a concept
↓ encode('utf-8')
b'caf\xc3\xa9' as bytes
↓ decode('utf-8')
"café" as text again
Python 2’s mistake was treating str as both bytes AND text. Python 3 forces you to be explicit.
Build your tool to:
- Read a file in binary mode ('rb')
- Detect the encoding (using chardet or similar)
- Decode to Unicode text
- Process the text
- Encode back to the desired output encoding
- Write in binary mode
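The steps above can be sketched with the standard library alone. Note that this is not chardet's statistical detection: it swaps in a simpler technique, trying a fixed list of candidate encodings in order and keeping the first that decodes cleanly. The candidate order and the function names are assumptions made for this sketch.

```python
# Assumed candidate order: UTF-8 (with or without BOM), then Windows-1252,
# then Latin-1, which accepts every byte and so acts as a last resort.
CANDIDATE_ENCODINGS = ("utf-8-sig", "cp1252", "latin-1")

def decode_best_effort(raw: bytes):
    """Return (text, encoding) for the first candidate that decodes cleanly."""
    for encoding in CANDIDATE_ENCODINGS:
        try:
            return raw.decode(encoding), encoding
        except UnicodeDecodeError:
            continue
    raise ValueError("no candidate encoding matched")

def normalize_to_utf8(raw: bytes) -> bytes:
    """Decode with the best-effort guess, then re-encode as UTF-8."""
    text, _encoding = decode_best_effort(raw)
    return text.encode("utf-8")

def convert_file(src_path, dst_path):
    """Read and write in binary mode so all decoding stays explicit."""
    with open(src_path, "rb") as src:
        raw = src.read()
    with open(dst_path, "wb") as dst:
        dst.write(normalize_to_utf8(raw))
```

A guess made this way can be wrong: many byte sequences are valid in several encodings, which is exactly the ambiguity the project asks you to explore.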
Test with:
- Pure ASCII files
- UTF-8 with emoji
- Latin-1 (ISO-8859-1) files
- Windows-1252 “smart quotes”
- Files with BOM (Byte Order Mark)
Learning milestones:
- You understand encode/decode → You know text is not bytes
- You can detect encodings → You understand real-world file handling
- You’ve seen Mojibake → You understand why Python 2’s model failed
- Your tool handles any encoding → You’ve mastered the Unicode model
Project 2: The Division Calculator Time Machine
- File: LEARN_PYTHON2_VS_PYTHON3_DEEP_DIVE.md
- Main Programming Language: Python
- Alternative Programming Languages: N/A
- Coolness Level: Level 2: Practical but Forgettable
- Business Potential: 1. The “Resume Gold”
- Difficulty: Level 1: Beginner
- Knowledge Area: Numeric Types / Division Semantics
- Software or Tool: Python 2.7, Python 3.x, Calculator
- Main Book: “Python Cookbook, 3rd Edition” by David Beazley
What you’ll build: A mathematical expression evaluator that demonstrates the division behavior differences between Python 2 and Python 3, with a mode to simulate Python 2 behavior in Python 3.
Why it teaches Python 2 vs 3: The integer division change broke countless programs. By building a calculator that can switch between behaviors, you’ll understand the implications of 5/2 = 2 vs 5/2 = 2.5.
Core challenges you’ll face:
- Simulating Python 2 division in Python 3 → maps to understanding the // and / operators
- The from __future__ import division → maps to backporting Python 3 behavior to Python 2
- Type coercion rules → maps to when does division become float division?
Key Concepts:
- True division vs floor division: PEP 238
- Operator precedence: “Python Cookbook” Chapter 1 - David Beazley
- Numeric tower: Python documentation on numeric types
Difficulty: Beginner Time estimate: One evening Prerequisites: Basic arithmetic, basic Python
Real world outcome:
$ python3 division_calculator.py
Division Calculator - Python 2 vs 3 Comparison
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Enter expression: 5 / 2
┌────────────────┬────────────┬────────────┐
│ Expression │ Python 2 │ Python 3 │
├────────────────┼────────────┼────────────┤
│ 5 / 2 │ 2 │ 2.5 │
│ 5 // 2 │ 2 │ 2 │
│ 5.0 / 2 │ 2.5 │ 2.5 │
│ -5 / 2 │ -3 │ -2.5 │
│ -5 // 2 │ -3 │ -3 │
└────────────────┴────────────┴────────────┘
⚠️ Note: Python 2's -5/2 = -3 (floor), Python 3's -5//2 = -3 too
But Python 3's -5/2 = -2.5 (true division)
Enter expression: 7 / 3
┌────────────────┬────────────┬────────────┐
│ Expression │ Python 2 │ Python 3 │
├────────────────┼────────────┼────────────┤
│ 7 / 3 │ 2 │ 2.333... │
│ 7 // 3 │ 2 │ 2 │
│ 7 % 3 │ 1 │ 1 │
└────────────────┴────────────┴────────────┘
Implementation Hints:
The fundamental difference:
- Python 2 /: Floor division for integers, true division if either operand is float
- Python 3 /: Always true division
- Both //: Always floor division
To simulate Python 2 behavior in Python 3:
# Python 3 simulating Python 2
def python2_div(a, b):
    if isinstance(a, int) and isinstance(b, int):
        return a // b # Floor division
    return a / b # True division
The tricky case is negative numbers:
- Floor division rounds toward negative infinity
- -7 // 2 = -4 (not -3!)
Build a REPL that:
- Parses mathematical expressions
- Evaluates them under both Python 2 and Python 3 rules
- Highlights differences
- Explains why the results differ
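One possible shape for the evaluation core, assuming expressions are limited to numbers and the operators +, -, *, /, // and %. It parses with the ast module instead of calling eval(), and only the handling of / changes between the two modes. The names evaluate and python2 are hypothetical.

```python
import ast
import operator

# Operators that behave the same under both sets of rules.
_OPS = {ast.Add: operator.add, ast.Sub: operator.sub, ast.Mult: operator.mul,
        ast.FloorDiv: operator.floordiv, ast.Mod: operator.mod}

def evaluate(expression, python2=False):
    """Evaluate simple arithmetic under Python 3 rules or simulated Python 2 rules."""
    def divide(a, b):
        if python2 and isinstance(a, int) and isinstance(b, int):
            return a // b               # Python 2: int / int floors
        return a / b                    # otherwise true division

    def walk(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -walk(node.operand)
        if isinstance(node, ast.BinOp):
            left, right = walk(node.left), walk(node.right)
            if isinstance(node.op, ast.Div):
                return divide(left, right)
            if type(node.op) in _OPS:
                return _OPS[type(node.op)](left, right)
        raise ValueError("unsupported expression")

    return walk(ast.parse(expression, mode="eval").body)

for expr in ("5 / 2", "-5 / 2", "5.0 / 2", "7 % 3"):
    print(expr, "| Python 2:", evaluate(expr, python2=True), "| Python 3:", evaluate(expr))
```

Walking the tree by hand keeps the tool safe to run on arbitrary input, since anything other than plain arithmetic is rejected.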
Learning milestones:
- You understand / vs // → You won’t accidentally use the wrong operator
- You know about negative floor division → You understand the edge cases
- You can use from __future__ import division → You can write Python 2 code with Python 3 behavior
- You’ve built a working calculator → You’ve internalized the difference
Project 3: The Print Statement Converter
- File: LEARN_PYTHON2_VS_PYTHON3_DEEP_DIVE.md
- Main Programming Language: Python
- Alternative Programming Languages: N/A
- Coolness Level: Level 3: Genuinely Clever
- Business Potential: 2. The “Micro-SaaS / Pro Tool”
- Difficulty: Level 2: Intermediate
- Knowledge Area: Parsing / AST Manipulation
- Software or Tool: Python AST module, 2to3
- Main Book: “CPython Internals” by Anthony Shaw
What you’ll build: A tool that converts Python 2 print statements to Python 3 print() functions, handling all edge cases (trailing comma, stderr redirection, print >> file).
Why it teaches Python 2 vs 3: The print change is the most visible syntax difference. By building a converter, you’ll understand both syntaxes deeply and learn about Python’s AST (Abstract Syntax Tree).
Core challenges you’ll face:
- Parsing Python 2 print statements → maps to understanding Python 2’s grammar
- Handling print >> file, "msg" → maps to the file= keyword argument
- Trailing comma behavior → maps to the end= keyword argument
- Preserving formatting → maps to AST to source code conversion
Key Concepts:
- Python AST: Python documentation ast module
- 2to3 internals: lib2to3 library (removed from the standard library in Python 3.13)
- Print function semantics: PEP 3105 - Make print a function
Difficulty: Intermediate Time estimate: Weekend Prerequisites: Understanding of parsing concepts, basic Python
Real world outcome:
$ python3 print_converter.py legacy_code.py
Converting Python 2 print statements to Python 3...
Line 5: print "Hello"
→ print("Hello")
Line 12: print "Value:", x
→ print("Value:", x)
Line 23: print "No newline",
→ print("No newline", end=" ")
Line 31: print >> sys.stderr, "Error!"
→ print("Error!", file=sys.stderr)
Line 45: print
→ print()
Line 52: print ("Already", "function", "call")
→ print("Already", "function", "call") # Syntax unchanged, but Python 2 printed a tuple here
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Summary:
✓ 5 print statements converted
✓ 1 already valid Python 3
✓ Output written to: legacy_code_py3.py
Implementation Hints:
The print statement had several forms:
# Python 2 print statement variations
print # Empty line
print "hello" # Simple string
print "a", "b", "c" # Multiple items (space-separated)
print "no newline", # Trailing comma = no newline
print >> sys.stderr, "error" # Redirect to file
print >> logfile, "message", # Redirect + no newline
Each converts to:
# Python 3 equivalents
print() # Empty line
print("hello") # Simple string
print("a", "b", "c") # Multiple items
print("no newline", end=" ") # end parameter
print("error", file=sys.stderr) # file parameter
print("message", file=logfile, end=" ") # Both parameters
Approach:
- Use regex or tokenizer to find print statements
- Parse the arguments
- Identify special cases (trailing comma, >> redirection)
- Generate the equivalent print() call
- Handle edge cases (print is a valid variable name in Python 3!)
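A line-based sketch of this approach, under several loud assumptions: one statement per line, no trailing comment on the line, at least one space after the print keyword, and a >> target that contains no comma. The function name convert_print is hypothetical, and a real converter would need a tokenizer to lift these restrictions.

```python
import re

# "print" followed by whitespace (statement form) or by nothing at all.
# "print(" with no space does not match and is left alone.
_PRINT_RE = re.compile(r"^(?P<indent>\s*)print(?:\s+(?P<rest>.*))?$")

def convert_print(line):
    """Convert one Python 2 print statement to a Python 3 print() call."""
    line = line.rstrip("\n")
    match = _PRINT_RE.match(line)
    if match is None:
        return line
    rest = (match.group("rest") or "").strip()
    kwargs = []
    if rest.startswith(">>"):                    # print >> target, args
        target, _, rest = rest[2:].partition(",")
        kwargs.append(f"file={target.strip()}")
        rest = rest.strip()
    if rest.endswith(","):                       # trailing comma: no newline
        rest = rest[:-1].rstrip()
        kwargs.append('end=" "')
    parts = ([rest] if rest else []) + kwargs
    return f"{match.group('indent')}print({', '.join(parts)})"

for sample in ('print "Hello"', 'print', 'print "No newline",',
               'print >> sys.stderr, "Error!"', 'print ("a", "b")'):
    print(f"{sample!r:35} -> {convert_print(sample)!r}")
```

One design choice worth noting: a parenthesized statement such as print ("a", "b") is wrapped as print(("a", "b")), which keeps Python 2's tuple output instead of silently changing what the program prints.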
Tricky edge case:
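# This is valid Python 2:
print ("hello") # Looks like a function call, but it's a statement with a parenthesized expression. Prints: hello
# With two or more items, Python 2 sees a tuple:
print ("hello", "world") # Prints: ('hello', 'world'), whereas Python 3 prints: hello world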
Learning milestones:
- You understand all print statement forms → You can read any Python 2 code
- You’ve built a working converter → You understand parsing and AST
- You handle edge cases → You’ve mastered the subtleties
- You appreciate why print became a function → You understand Python 3’s philosophy
Project 4: Iterator vs List Profiler
- File: LEARN_PYTHON2_VS_PYTHON3_DEEP_DIVE.md
- Main Programming Language: Python
- Alternative Programming Languages: N/A
- Coolness Level: Level 3: Genuinely Clever
- Business Potential: 1. The “Resume Gold”
- Difficulty: Level 2: Intermediate
- Knowledge Area: Memory Management / Iterators
- Software or Tool: memory_profiler, tracemalloc
- Main Book: “High Performance Python, 2nd Edition” by Micha Gorelick & Ian Ozsvald
What you’ll build: A memory profiling tool that demonstrates the difference between Python 2’s list-returning functions (range, map, filter, dict.keys()) and Python 3’s iterator-returning versions.
Why it teaches Python 2 vs 3: Python 3’s shift to iterators was a major performance improvement. By profiling memory usage, you’ll see why range(1000000000) works in Python 3 but crashes Python 2.
Core challenges you’ll face:
- Measuring memory usage → maps to understanding Python’s memory model
- Understanding lazy evaluation → maps to iterators vs lists
- dict views vs dict lists → maps to the new dict_keys, dict_values, dict_items types
Key Concepts:
- Iterator protocol: “Fluent Python” Chapter 17 - Luciano Ramalho
- Memory profiling: memory_profiler documentation
- Generators: “Python Cookbook” Chapter 4 - David Beazley
Difficulty: Intermediate Time estimate: Weekend Prerequisites: Basic Python, understanding of memory concepts
Real world outcome:
$ python3 iterator_profiler.py
Iterator vs List Memory Comparison
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Test: range(10,000,000)
┌──────────────┬─────────────┬───────────────┐
│ Version │ Memory Used │ Time │
├──────────────┼─────────────┼───────────────┤
│ Python 2 │ 400 MB │ 0.8s │
│ (list) │ │ │
├──────────────┼─────────────┼───────────────┤
│ Python 3 │ 48 bytes │ <0.001s │
│ (iterator) │ │ │
└──────────────┴─────────────┴───────────────┘
Improvement: 8,333,333x less memory!
Test: dict.keys() on 1,000,000 items
┌──────────────┬─────────────┬───────────────┐
│ Version │ Memory Used │ Live updates? │
├──────────────┼─────────────┼───────────────┤
│ Python 2 │ 40 MB │ No (snapshot) │
├──────────────┼─────────────┼───────────────┤
│ Python 3 │ 48 bytes │ Yes (view) │
└──────────────┴─────────────┴───────────────┘
Test: map(str, range(1,000,000))
┌──────────────┬─────────────┬───────────────┐
│ Version │ Behavior │ Memory │
├──────────────┼─────────────┼───────────────┤
│ Python 2 │ List │ 50 MB │
├──────────────┼─────────────┼───────────────┤
│ Python 3 │ Iterator │ 48 bytes │
│ │ (lazy) │ + on-demand │
└──────────────┴─────────────┴───────────────┘
Implementation Hints:
The key insight is lazy evaluation:
- Python 2’s range(n) creates ALL n numbers immediately
- Python 3’s range(n) is an object that generates numbers on-demand
# Python 3 - This is instant and uses almost no memory:
r = range(1_000_000_000_000) # One trillion!
# You can still use it like a list:
print(r[500]) # 500
print(1000 in r) # True
print(len(r)) # 1000000000000
# But it never creates all trillion numbers!
Functions that changed:
| Python 2 | Python 3 | Returns |
|---|---|---|
| range() | range() | lazy sequence (was list) |
| xrange() | (removed) | - |
| map() | map() | iterator (was list) |
| filter() | filter() | iterator (was list) |
| zip() | zip() | iterator (was list) |
| dict.keys() | dict.keys() | view (was list) |
| dict.values() | dict.values() | view (was list) |
| dict.items() | dict.items() | view (was list) |
Build a profiler that:
- Uses tracemalloc to measure memory before/after operations
- Compares list creation vs iterator creation
- Shows the “live view” behavior of dict views
- Demonstrates when you NEED a list vs when iterator is fine
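A minimal measurement sketch using tracemalloc from the standard library. It compares an eagerly built list (standing in for Python 2's range()) with Python 3's lazy range object. The helper name peak_bytes and the size n are assumptions, and the exact byte counts depend on the interpreter and platform.

```python
import tracemalloc

def peak_bytes(build):
    """Run build() under tracemalloc and return the peak traced allocation in bytes."""
    tracemalloc.start()
    try:
        result = build()    # hold a reference so the object is alive when measured
        _current, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak

n = 100_000
list_peak = peak_bytes(lambda: list(range(n)))   # eager, like Python 2's range()
lazy_peak = peak_bytes(lambda: range(n))         # lazy, Python 3's default
print(f"list(range(n)): {list_peak:,} bytes")
print(f"range(n):       {lazy_peak:,} bytes")
```

tracemalloc only sees allocations made through Python's memory allocator, so treat the numbers as a comparison between the two cases, not as total process memory.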
Learning milestones:
- You see the memory difference → You understand why iterators matter
- You understand lazy evaluation → You can write memory-efficient code
- You know when to use list() → You understand the tradeoffs
- You appreciate dict views → You understand the new dict methods
Project 5: The Exception Syntax Migrator
- File: LEARN_PYTHON2_VS_PYTHON3_DEEP_DIVE.md
- Main Programming Language: Python
- Alternative Programming Languages: N/A
- Coolness Level: Level 2: Practical but Forgettable
- Business Potential: 2. The “Micro-SaaS / Pro Tool”
- Difficulty: Level 1: Beginner
- Knowledge Area: Exception Handling / Syntax
- Software or Tool: Python regex, AST
- Main Book: “Effective Python, 3rd Edition” by Brett Slatkin
What you’ll build: A tool that finds and converts Python 2 exception syntax (except E, e:) to Python 3 syntax (except E as e:), plus identifies other exception-related changes.
Why it teaches Python 2 vs 3: Exception handling has several subtle differences beyond just the syntax. By building a migrator, you’ll learn about exception chaining, bare raise, and the __traceback__ attribute.
Core challenges you’ll face:
- Finding the comma syntax → maps to regex/AST parsing
- Exception chaining (raise from) → maps to Python 3’s enhanced tracebacks
- The __traceback__ attribute → maps to accessing traceback objects
Key Concepts:
- Exception handling: “Effective Python” Item 65-66 - Brett Slatkin
- Exception chaining: PEP 3134
- Traceback module: Python documentation
Difficulty: Beginner Time estimate: One evening Prerequisites: Basic Python exception handling
Real world outcome:
$ python3 exception_migrator.py legacy_code.py
Exception Syntax Analysis
━━━━━━━━━━━━━━━━━━━━━━━━━
Found 12 exception handlers to update:
Line 23: except ValueError, e:
→ except ValueError as e:
Line 45: except (TypeError, KeyError), e:
→ except (TypeError, KeyError) as e:
Line 67: except:
⚠️ Bare except - catches KeyboardInterrupt/SystemExit!
→ Consider: except Exception as e:
Line 89: raise IOError, "File not found"
→ raise IOError("File not found")
Line 102: raise IOError, "File not found", tb
→ raise IOError("File not found").with_traceback(tb)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
New Python 3 Features Available:
• Exception chaining: raise NewError() from original_error
• Accessing traceback: error.__traceback__
• Suppressing context: raise NewError() from None
Implementation Hints:
Exception handling changes:
# Python 2 exception catching
except TypeError, ValueError: # WRONG! Catches TypeError, binds to ValueError
except (TypeError, ValueError), e: # Correct for multiple types
# Python 3
except TypeError as e: # Clear syntax
except (TypeError, ValueError) as e: # Multiple types
The raise statement changed too:
# Python 2
raise ValueError, "message" # Two-argument form
raise ValueError, "message", traceback # Three-argument form
# Python 3
raise ValueError("message") # Constructor call
raise ValueError("message").with_traceback(tb) # Explicit traceback
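For the except clause, a regex-based rewrite might look like the sketch below. It assumes each handler sits on one line and has no trailing comment that itself contains a comma followed by a name and a colon. The function name migrate_except is hypothetical.

```python
import re

# Greedy (.+) backtracks to the LAST comma that is followed by "name:",
# so the commas inside a tuple of exception types are left alone.
_EXCEPT_COMMA = re.compile(r"^(\s*except\s+)(.+),\s*([A-Za-z_]\w*)\s*:")

def migrate_except(line):
    """Rewrite 'except E, e:' as 'except E as e:'; leave other lines unchanged."""
    return _EXCEPT_COMMA.sub(r"\1\2 as \3:", line, count=1)

for sample in ("except ValueError, e:",
               "    except (TypeError, KeyError), e:",
               "except (TypeError, KeyError):",
               "except TypeError, ValueError:"):
    print(f"{sample!r:40} -> {migrate_except(sample)!r}")
```

The last sample is translated faithfully to except TypeError as ValueError:, which preserves the original (probably unintended) meaning, so a real tool would do well to flag it for review.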
Python 3 added exception chaining:
# Python 3 only
try:
    process_file(name)
except FileNotFoundError as e:
    raise RuntimeError("Processing failed") from e
# Shows BOTH tracebacks: the original and the new one
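A self-contained variant that can be run as-is shows where the chained information lives on the exception object. The function load_setting and its data are invented for this illustration.

```python
def load_setting(name):
    """Hypothetical lookup that converts a low-level error into a domain error."""
    settings = {"debug": True}
    try:
        return settings[name]
    except KeyError as exc:
        raise RuntimeError(f"unknown setting {name!r}") from exc

try:
    load_setting("missing")
except RuntimeError as err:
    caught = err    # copy the reference: 'err' is unbound when the block ends

print(repr(caught.__cause__))              # the original KeyError
print(caught.__traceback__ is not None)    # traceback travels with the exception
```

Python 3 deletes the name bound by as when the except block finishes, which is why the exception is copied to caught before being inspected.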
Learning milestones:
- You can spot Python 2 syntax → You can read legacy code
- You understand exception chaining → You can write better error handling
- You know about bare except dangers → You avoid catching KeyboardInterrupt
- You use from in raise → You write Python 3-style exceptions
Project 6: The Standard Library Import Fixer
- File: LEARN_PYTHON2_VS_PYTHON3_DEEP_DIVE.md
- Main Programming Language: Python
- Alternative Programming Languages: N/A
- Coolness Level: Level 2: Practical but Forgettable
- Business Potential: 2. The “Micro-SaaS / Pro Tool”
- Difficulty: Level 2: Intermediate
- Knowledge Area: Standard Library / Imports
- Software or Tool: Python AST, importlib
- Main Book: “The Python Standard Library by Example” by Doug Hellmann
What you’ll build: A tool that scans Python 2 code for standard library imports and generates the correct Python 3 equivalents, handling all the renamed and reorganized modules.
Why it teaches Python 2 vs 3: The standard library reorganization is a major source of “ImportError” when running Python 2 code on Python 3. This project forces you to learn the mapping.
Core challenges you’ll face:
- urllib/urllib2 → urllib.request/urllib.parse → maps to the most confusing reorganization
- ConfigParser → configparser → maps to PEP 8 naming conventions
- Merged modules (cPickle → pickle) → maps to automatic C acceleration
Key Concepts:
- Standard library reorganization: PEP 3108
- The six library: six.moves documentation
- Import system: Python documentation on imports
Difficulty: Intermediate Time estimate: Weekend Prerequisites: Familiarity with Python imports, basic AST knowledge
Real world outcome:
$ python3 import_fixer.py legacy_project/
Scanning for Python 2 imports...
┌─────────────────────────────────────────────────────────────────┐
│ File: legacy_project/network.py │
├─────────────────────────────────────────────────────────────────┤
│ Line 3: import urllib2 │
│ → from urllib.request import urlopen, Request │
│ → from urllib.error import HTTPError, URLError │
│ │
│ Line 4: from urlparse import urlparse, urljoin │
│ → from urllib.parse import urlparse, urljoin │
│ │
│ Line 7: import cPickle │
│ → import pickle # C version is automatic in Python 3 │
└─────────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────┐
│ File: legacy_project/config.py │
├─────────────────────────────────────────────────────────────────┤
│ Line 2: import ConfigParser │
│ → import configparser │
│ │
│ Line 45: cfg = ConfigParser.SafeConfigParser() │
│ → cfg = configparser.ConfigParser() # SafeConfigParser │
│ is now just ConfigParser │
└─────────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────┐
│ File: legacy_project/server.py │
├─────────────────────────────────────────────────────────────────┤
│ Line 1: import SocketServer │
│ → import socketserver │
│ │
│ Line 2: import Queue │
│ → import queue │
└─────────────────────────────────────────────────────────────────┘
Summary:
Scanned: 15 files
Imports to fix: 23
Automated fixes available: 20
Manual review needed: 3 (complex urllib2 usage)
Implementation Hints:
Here’s a partial mapping of commonly renamed modules:
RENAMED_MODULES = {
    # PEP 8 naming (lowercase)
    'ConfigParser': 'configparser',
    'Queue': 'queue',
    'SocketServer': 'socketserver',
    'Tkinter': 'tkinter',
    'repr': 'reprlib',
    # Merged into one
    'cPickle': 'pickle',
    'cStringIO': 'io',
    # Completely reorganized
    'urllib': 'urllib.request', # Partial
    'urllib2': 'urllib.request', # Partial
    'urlparse': 'urllib.parse',
    'robotparser': 'urllib.robotparser',
}
# urllib is the trickiest:
URLLIB_MAPPING = {
    # From urllib
    'urllib.urlopen': 'urllib.request.urlopen',
    'urllib.urlencode': 'urllib.parse.urlencode',
    # From urllib2
    'urllib2.urlopen': 'urllib.request.urlopen',
    'urllib2.Request': 'urllib.request.Request',
    'urllib2.HTTPError': 'urllib.error.HTTPError',
    'urllib2.URLError': 'urllib.error.URLError',
    # From urlparse
    'urlparse.urlparse': 'urllib.parse.urlparse',
    'urlparse.urljoin': 'urllib.parse.urljoin',
    'urlparse.parse_qs': 'urllib.parse.parse_qs',
}
Build a tool that:
- Parses Python files for import statements
- Identifies Python 2-only modules
- Generates the correct Python 3 imports
- Handles from X import Y style imports
- Warns about behavioral changes (not just renames)
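The scanning step could start from something like the sketch below, which walks the syntax tree and reports imports of renamed modules. It assumes the source can be parsed by Python 3's ast module, which is not true of files that still contain print statements or except E, e: clauses; those need a tokenizer or line-based fallback. The RENAMES table here is a small illustrative subset and find_renamed_imports is a hypothetical name.

```python
import ast

# Illustrative subset of the rename table, not a complete mapping.
RENAMES = {
    "ConfigParser": "configparser",
    "Queue": "queue",
    "SocketServer": "socketserver",
    "cPickle": "pickle",
    "urlparse": "urllib.parse",
}

def find_renamed_imports(source):
    """Return sorted (line, old_module, new_module) tuples for renamed imports."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.level == 0:
            names = [node.module]           # absolute "from X import Y"
        else:
            continue
        for name in names:
            if name in RENAMES:
                findings.append((node.lineno, name, RENAMES[name]))
    return sorted(findings)

legacy = "import ConfigParser\nfrom urlparse import urljoin\nimport os\n"
for lineno, old, new in find_renamed_imports(legacy):
    print(f"Line {lineno}: {old} -> {new}")
```

Reporting findings instead of rewriting the file keeps the sketch small; modules that were split across several targets, such as urllib2, still need a per-name mapping before any automatic rewrite.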
Learning milestones:
- You know the module renames → You can fix ImportError quickly
- You understand urllib’s reorganization → You’ve tackled the hardest mapping
- You know about automatic C acceleration → You understand pickle vs cPickle
- You can migrate any imports → You’re ready for real migrations
Project 7: The input() vs raw_input() Sandbox
- File: LEARN_PYTHON2_VS_PYTHON3_DEEP_DIVE.md
- Main Programming Language: Python
- Alternative Programming Languages: N/A
- Coolness Level: Level 2: Practical but Forgettable
- Business Potential: 1. The “Resume Gold”
- Difficulty: Level 1: Beginner
- Knowledge Area: User Input / Security
- Software or Tool: Python REPL
- Main Book: “Effective Python, 3rd Edition” by Brett Slatkin
What you’ll build: An interactive demonstration of the dangerous difference between Python 2’s input() (which evaluates code!) and raw_input() (which returns a string), showing why Python 3 removed the dangerous behavior.
Why it teaches Python 2 vs 3: Python 2’s input() was a security vulnerability that evaluated arbitrary code. This project shows why it was removed and replaced with Python 3’s safe input().
Core challenges you’ll face:
- Demonstrating the security issue safely → maps to understanding code evaluation
- Showing eval() dangers → maps to arbitrary code execution
- Creating a safe sandbox → maps to security best practices
Key Concepts:
- Security implications of eval(): OWASP documentation on injection
- Input handling: “Effective Python” Item 65 - Brett Slatkin
- Sandboxing: Python subprocess module
Difficulty: Beginner Time estimate: One evening Prerequisites: Basic Python, understanding of security concepts
Real world outcome:
$ python3 input_security_demo.py
Python 2 vs 3: The input() Security Disaster
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
In Python 2, there were TWO input functions:
• raw_input() - Returns user input as a string (SAFE)
• input() - EVALUATES user input as Python code (DANGEROUS!)
In Python 3:
• input() - Returns user input as a string (like raw_input())
• (raw_input was removed)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
🔴 DANGER DEMO (Python 2 behavior simulation)
Simulating Python 2's dangerous input():
Enter your age: __import__('os').system('echo PWNED')
Python 2 would execute that as code!
Result: The string "PWNED" is printed, but imagine if it was:
__import__('os').system('rm -rf /')
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
🟢 SAFE DEMO (Python 3 behavior)
Python 3's safe input():
Enter your age: __import__('os').system('echo PWNED')
Result: Just the string "__import__('os').system('echo PWNED')"
No code execution! This is treated as text.
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
If you NEED to evaluate user input (rarely!), use:
ast.literal_eval() - Only evaluates literals, no function calls
>>> ast.literal_eval("[1, 2, 3]")
[1, 2, 3]
>>> ast.literal_eval("__import__('os')") # Raises ValueError
Implementation Hints:
The fundamental problem with Python 2:
# Python 2 - DANGEROUS
age = input("Enter age: ") # Equivalent to eval(raw_input())!
# User types: 42
# Result: age = 42 (an integer!)
# User types: __import__('os').system('rm -rf /')
# Result: YOUR FILES ARE DELETED
# Python 2 - Safe
age = raw_input("Enter age: ")
# User types anything, result is always a string
Python 3 fixed this by:
- Removing the dangerous input() behavior
- Renaming raw_input() to input()
- If you NEED to evaluate, use eval() explicitly (so it’s obvious)
For your demo, simulate Python 2 behavior:
def python2_input(prompt):
    """Simulate Python 2's dangerous input()"""
    return eval(input(prompt)) # DON'T DO THIS IN REAL CODE!
Show safe alternatives:
import ast
# Safe way to evaluate literals
user_data = ast.literal_eval(input("Enter a list: "))
# Works for: [1,2,3], {'a':1}, (1,2), "string", 42, True
# Fails for: function calls, imports, anything dangerous
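For a demo that must not crash on hostile input, the call can be wrapped so that rejected text becomes a normal return value. This is a sketch only; the name parse_literal and the choice to return None on failure are assumptions, not a recommended API.

```python
import ast

def parse_literal(text):
    """Return the literal value in text, or None if it is not a plain literal."""
    try:
        return ast.literal_eval(text)
    except (ValueError, SyntaxError):
        # ValueError: valid syntax but not a literal (calls, names, imports).
        # SyntaxError: not valid Python at all.
        return None

for sample in ("[1, 2, 3]", "42", "__import__('os').system('echo PWNED')", "1 +"):
    print(f"{sample!r:45} -> {parse_literal(sample)!r}")
```

Because the function call in the third sample is never executed, the demo can show the attack string safely.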
Learning milestones:
- You understand the security issue → You’ll never use eval() carelessly
- You know raw_input → input → You understand the naming change
- You know about ast.literal_eval → You can safely parse user data
- You’ve seen code injection → You understand why input matters
Project 8: The f-string Converter (Python 3 Upgrade)
- File: LEARN_PYTHON2_VS_PYTHON3_DEEP_DIVE.md
- Main Programming Language: Python
- Alternative Programming Languages: N/A
- Coolness Level: Level 3: Genuinely Clever
- Business Potential: 2. The “Micro-SaaS / Pro Tool”
- Difficulty: Level 2: Intermediate
- Knowledge Area: String Formatting / Parsing
- Software or Tool: Python AST, regex
- Main Book: “Fluent Python, 2nd Edition” by Luciano Ramalho
What you’ll build: A tool that converts old-style string formatting (% operator and .format()) to Python 3.6+ f-strings where appropriate, demonstrating the evolution of string formatting in Python.
Why it teaches Python 2 vs 3: String formatting evolved significantly. By building a converter, you’ll master all three styles and understand when each is appropriate (f-strings aren’t always best!).
Core challenges you’ll face:
- Parsing % formatting → maps to printf-style format specifiers
- Parsing .format() → maps to positional and keyword arguments
- Knowing when NOT to convert → maps to dynamic format strings, i18n
Key Concepts:
- f-string internals: PEP 498
- String formatting comparison: “Fluent Python” Chapter 4 - Luciano Ramalho
- When to use each style: Real Python’s f-string guide
Difficulty: Intermediate Time estimate: Weekend Prerequisites: Understanding of Python string formatting
Real world outcome:
$ python3 fstring_converter.py legacy_code.py
F-String Conversion Analysis
━━━━━━━━━━━━━━━━━━━━━━━━━━━
Scanning for old-style formatting...
Line 12: "Hello, %s!" % name
→ f"Hello, {name}!"
Line 23: "Value: %.2f" % price
→ f"Value: {price:.2f}"
Line 34: "{} + {} = {}".format(a, b, a+b)
→ f"{a} + {b} = {a+b}"
Line 45: "Name: {name}, Age: {age}".format(name=name, age=age)
→ f"Name: {name}, Age: {age}"
Line 56: "{0} vs {1} vs {0}".format(x, y)
→ f"{x} vs {y} vs {x}" # Note: reuses x
⚠️ Cannot convert (manual review needed):
Line 67: LOG_FORMAT = "%(levelname)s: %(message)s"
Reason: This is a logging format string, not a .format() call
Line 78: template = "Hello, {name}!"
result = template.format(**user_data)
Reason: Dynamic formatting with dictionary unpacking
Line 89: _("Hello, {name}!").format(name=name)
Reason: i18n string (translation function)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Summary:
Convertible to f-strings: 15
Should stay as .format(): 5
Should stay as % formatting: 3
f-strings are usually faster than .format() (measure with timeit on your own code)
Implementation Hints:
The three formatting styles:
name = "World"
age = 42
# 1. % formatting (Python 2 style, still works)
"Hello, %s! You are %d years old." % (name, age)
# 2. .format() method (Python 2.6+/3.0+)
"Hello, {}! You are {} years old.".format(name, age)
"Hello, {0}! You are {1} years old.".format(name, age)
"Hello, {name}! You are {age} years old.".format(name=name, age=age)
# 3. f-strings (Python 3.6+)
f"Hello, {name}! You are {age} years old."
When NOT to use f-strings:
- Logging format strings: logging uses %-formatting internally
- Dynamic templates: When the format string comes from a variable
- Internationalization: Translation systems need static strings
- Regex patterns: Literal braces in quantifiers such as \d{3} must be doubled ({{3}}) inside an f-string
Format specifiers comparison:
value = 3.14159
# All equivalent:
"%.2f" % value # % style
"{:.2f}".format(value) # .format style
f"{value:.2f}" # f-string style
Build a converter that:
- Parses % format strings using regex
- Parses .format() calls using AST
- Identifies the variables being formatted
- Generates equivalent f-strings
- Detects cases where conversion is inappropriate
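The steps above can be sketched for the .format() case. This is an illustrative sketch under stated assumptions, not a complete converter: the function name format_call_to_fstring is hypothetical, it only handles a string literal followed directly by .format(...), and it gives up (returns None) whenever quoting, nesting, or dynamic arguments would need more care. It uses ast.unparse, which requires Python 3.9+.

```python
import ast
from string import Formatter
from typing import Optional

def format_call_to_fstring(source: str) -> Optional[str]:
    """Convert a simple '"...".format(...)' expression to f-string source.

    Returns None when the call should be left for manual review.
    """
    node = ast.parse(source, mode="eval").body
    # Only handle <string literal>.format(...); a variable template is dynamic.
    if not (
        isinstance(node, ast.Call)
        and isinstance(node.func, ast.Attribute)
        and node.func.attr == "format"
        and isinstance(node.func.value, ast.Constant)
        and isinstance(node.func.value.value, str)
    ):
        return None
    # *args / **kwargs hide the real arguments, so skip them.
    if any(isinstance(a, ast.Starred) for a in node.args):
        return None
    if any(k.arg is None for k in node.keywords):
        return None

    positional = [ast.unparse(a) for a in node.args]
    keyword = {k.arg: ast.unparse(k.value) for k in node.keywords}
    parts = []
    auto_index = 0
    for literal, field, spec, conversion in Formatter().parse(node.func.value.value):
        if '"' in literal or "\\" in literal or "\n" in literal:
            return None  # quoting/escaping is out of scope for this sketch
        parts.append(literal.replace("{", "{{").replace("}", "}}"))
        if field is None:
            continue  # trailing literal text with no replacement field
        if field == "" and auto_index < len(positional):
            expr = positional[auto_index]
            auto_index += 1
        elif field.isdigit() and int(field) < len(positional):
            expr = positional[int(field)]
        elif field in keyword:
            expr = keyword[field]
        else:
            return None  # attribute/index lookups such as {user.name}
        if any(ch in expr for ch in "\"'{}:!\\") or "{" in (spec or ""):
            return None  # expressions that need careful quoting or nesting
        suffix = (f"!{conversion}" if conversion else "") + (f":{spec}" if spec else "")
        parts.append("{" + expr + suffix + "}")
    return 'f"' + "".join(parts) + '"'
```

Returning None for anything uncertain mirrors the "manual review needed" bucket in the sample report: a converter that refuses is safer than one that silently changes behavior.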
Learning milestones:
- You master all three styles → You can read any Python code
- You know when f-strings are best → You write modern Python
- You know when to avoid f-strings → You understand the edge cases
- You understand the performance difference → You make informed choices
Project 9: The Type Hints Migrator
- File: LEARN_PYTHON2_VS_PYTHON3_DEEP_DIVE.md
- Main Programming Language: Python
- Alternative Programming Languages: TypeScript (for comparison)
- Coolness Level: Level 3: Genuinely Clever
- Business Potential: 3. The “Service & Support” Model
- Difficulty: Level 2: Intermediate
- Knowledge Area: Type Systems / Static Analysis
- Software or Tool: mypy, pyright
- Main Book: “Robust Python” by Patrick Viafore
What you’ll build: A tool that analyzes Python 2 code and suggests type hints based on variable usage, docstrings, and runtime analysis, introducing you to Python 3’s type system.
Why it teaches Python 2 vs 3: Type hints are one of Python 3’s most significant additions. By building an inference tool, you’ll deeply understand how types flow through Python code.
Core challenges you’ll face:
- Inferring types from usage → maps to understanding duck typing
- Parsing docstring type annotations → maps to legacy documentation formats
- Understanding generics → maps to List[str], Dict[str, int], etc.
Key Concepts:
- Type hints introduction: PEP 484
- Type checking with mypy: mypy documentation
- Generics and protocols: “Robust Python” Chapter 4-6 - Patrick Viafore
Difficulty: Intermediate Time estimate: 1-2 weeks Prerequisites: Solid Python knowledge, understanding of static typing concepts
Real world outcome:
$ python3 type_inferrer.py legacy_module.py
Type Hint Suggestions for legacy_module.py
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Original (Python 2 style):
def calculate_total(items, tax_rate):
    """Calculate total with tax.
    Args:
        items: List of prices
        tax_rate: Tax rate as decimal
    Returns:
        Total price with tax
    """
    subtotal = sum(items)
    return subtotal * (1 + tax_rate)
Suggested (Python 3 with type hints):
def calculate_total(items: list[float], tax_rate: float) -> float:
    """Calculate total with tax."""
    subtotal = sum(items)
    return subtotal * (1 + tax_rate)
Analysis:
• items: Inferred as list[float] from sum() usage
• tax_rate: Inferred as float from arithmetic
• return: Inferred as float from calculation
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Additional suggestions:
• Line 45: user_data: dict[str, Any] (complex nested structure)
• Line 67: callback: Callable[[int, str], bool]
• Line 89: Consider using TypedDict for config parameter
Would you like to apply these suggestions? [y/N]
Implementation Hints:
Type hints evolution:
# Python 3.5+
from typing import List, Dict, Optional
def greet(name: str) -> str:
    return f"Hello, {name}!"
def process(items: List[int]) -> Dict[str, int]:
    return {"count": len(items), "sum": sum(items)}
# Python 3.9+ (no imports needed for built-in generics)
def process(items: list[int]) -> dict[str, int]:
    return {"count": len(items), "sum": sum(items)}
# Python 3.10+ (union syntax)
def maybe_int(value: str) -> int | None:
    try:
        return int(value)
    except ValueError:
        return None
Common type hint patterns:
from typing import Optional, Union, Callable, TypeVar
class User:  # Placeholder class so the annotations below resolve
    ...
# Optional (can be None)
def find(name: str) -> Optional[User]:
    ...
# Union (multiple types)
def process(value: Union[int, str]) -> None:
    ...
# Callable (function type)
def apply(func: Callable[[int], int], value: int) -> int:
    return func(value)
# TypeVar (generics)
T = TypeVar('T')
def first(items: list[T]) -> T:
    return items[0]
Build a tool that:
- Parses function signatures
- Extracts type info from docstrings (Google, NumPy, Sphinx formats)
- Infers types from variable usage (isinstance checks, arithmetic, etc.)
- Generates appropriate type hints
- Validates with mypy
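One of the simplest inference signals is a parameter's default value. The sketch below covers only that signal and is not the full tool: the name suggest_hints_from_defaults is hypothetical, and a real inferrer would also combine docstrings and usage analysis. It uses only the standard ast module.

```python
import ast

def suggest_hints_from_defaults(source: str) -> dict:
    """Map function name -> {parameter: suggested type name} using constant defaults."""
    suggestions = {}
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            continue
        params = node.args.posonlyargs + node.args.args
        defaults = node.args.defaults
        hints = {}
        # Defaults line up with the LAST len(defaults) positional parameters.
        for param, default in zip(params[len(params) - len(defaults):], defaults):
            # A None default says nothing about the intended type, so skip it.
            if isinstance(default, ast.Constant) and default.value is not None:
                hints[param.arg] = type(default.value).__name__
        suggestions[node.name] = hints
    return suggestions

example = "def connect(host, port=8080, secure=True, timeout=1.5, extra=None):\n    pass\n"
print(suggest_hints_from_defaults(example))
```

Parameters without a default (host here) are left out rather than guessed, which keeps the suggestions conservative.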
Learning milestones:
- You understand basic type hints → You can type simple functions
- You know about generics → You can type complex data structures
- You understand Callable and TypeVar → You can type higher-order functions
- You can run mypy → You can validate types in real projects
Project 10: The async/await Converter
- File: LEARN_PYTHON2_VS_PYTHON3_DEEP_DIVE.md
- Main Programming Language: Python
- Alternative Programming Languages: JavaScript (for comparison)
- Coolness Level: Level 4: Hardcore Tech Flex
- Business Potential: 3. The “Service & Support” Model
- Difficulty: Level 3: Advanced
- Knowledge Area: Asynchronous Programming / Concurrency
- Software or Tool: asyncio, aiohttp
- Main Book: “Using Asyncio in Python” by Caleb Hattingh
What you’ll build: A tool that converts synchronous Python code to asynchronous code using async/await, and vice versa, demonstrating Python 3’s native coroutine support.
Why it teaches Python 2 vs 3: Python 2 had no native async support (only threads and callbacks). Python 3.5+ introduced async/await, which is fundamentally different. Understanding this shift is essential for modern Python.
Core challenges you’ll face:
- Understanding the event loop → maps to cooperative multitasking
- Converting blocking calls to async → maps to aiohttp, aiofiles, etc.
- Managing concurrent tasks → maps to asyncio.gather, asyncio.create_task
Key Concepts:
- Coroutines and event loops: “Using Asyncio in Python” Chapter 1-2 - Caleb Hattingh
- async/await syntax: PEP 492
- Asyncio patterns: asyncio documentation
Difficulty: Advanced Time estimate: 1-2 weeks Prerequisites: Understanding of concurrency concepts, experience with callbacks or threads
Real world outcome:
$ python3 async_converter.py sync_downloader.py
Async Conversion Analysis
━━━━━━━━━━━━━━━━━━━━━━━━
Original synchronous code:
import requests
def download_all(urls):
    results = []
    for url in urls:
        response = requests.get(url)
        results.append(response.text)
    return results
# Downloads sequentially: 10 URLs × 1 second = 10 seconds
Converted to async:
import asyncio
import aiohttp
async def download_all(urls):
    async with aiohttp.ClientSession() as session:
        tasks = [fetch(session, url) for url in urls]
        results = await asyncio.gather(*tasks)
    return results
async def fetch(session, url):
    async with session.get(url) as response:
        return await response.text()
# Downloads concurrently: 10 URLs = ~1 second!
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Required changes:
• requests → aiohttp (async HTTP library)
• def → async def
• requests.get(url) → async with session.get(url)
• for loop → asyncio.gather() for concurrency
Dependencies to add: pip install aiohttp
Performance comparison:
Sync: 10.2 seconds (10 URLs)
Async: 1.1 seconds (10 URLs)
Speedup: 9.3x
Implementation Hints:
The evolution of async in Python:
# Python 2: Callbacks (Twisted style)
def fetch(url, callback):
    # Make request, then hand the result to the callback
    callback(result)
fetch(url1, lambda r1: fetch(url2, lambda r2: process(r1, r2)))
# "Callback hell"
# Python 3.4: Generators with yield from
# (@asyncio.coroutine was removed in Python 3.11; shown for history only)
@asyncio.coroutine
def fetch(url):
    response = yield from aiohttp.get(url)
    return (yield from response.text())
# Python 3.5+: Native async/await (current aiohttp requires a ClientSession)
async def fetch(session, url):
    async with session.get(url) as response:
        return await response.text()
Key async concepts:
import asyncio
# 1. Coroutine function
async def my_coroutine():
    await asyncio.sleep(1)
    return "done"
# 2. Running coroutines
asyncio.run(my_coroutine())  # Python 3.7+
# 3. Concurrent execution
async def main():
    # Run multiple coroutines concurrently
    results = await asyncio.gather(
        fetch(url1),
        fetch(url2),
        fetch(url3),
    )
    return results
# 4. Creating tasks
async def main():
    task1 = asyncio.create_task(fetch(url1))
    task2 = asyncio.create_task(fetch(url2))
    # Tasks are scheduled as soon as they are created and run concurrently
    result1 = await task1
    result2 = await task2
Build a converter that:
- Identifies I/O-bound operations (file, network, database)
- Suggests async alternatives for common libraries
- Adds async to function definitions
- Wraps calls with await
- Converts loops to asyncio.gather
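The loop-to-gather conversion can be tried without any network access. This is a minimal sketch in which asyncio.sleep stands in for real I/O; fake_fetch, fetch_sequentially, fetch_concurrently, and timed are hypothetical names chosen for illustration.

```python
import asyncio
import time

async def fake_fetch(name, delay):
    """Stand-in for a network call: waits without blocking the event loop."""
    await asyncio.sleep(delay)
    return f"{name}: done"

async def fetch_sequentially(names, delay):
    # Each await finishes before the next call starts.
    return [await fake_fetch(name, delay) for name in names]

async def fetch_concurrently(names, delay):
    # gather() schedules every coroutine at once and preserves input order.
    return await asyncio.gather(*(fake_fetch(name, delay) for name in names))

def timed(coroutine):
    """Run a coroutine to completion and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = asyncio.run(coroutine)
    return result, time.perf_counter() - start

if __name__ == "__main__":
    names = ["a", "b", "c", "d"]
    _, sequential = timed(fetch_sequentially(names, 0.1))
    _, concurrent = timed(fetch_concurrently(names, 0.1))
    print(f"sequential: {sequential:.2f}s, concurrent: {concurrent:.2f}s")
```

The speedup only appears because the work is waiting, not computing; CPU-bound code gains nothing from this conversion.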
Learning milestones:
- You understand the event loop → You know how async works
- You can convert sync to async → You can modernize codebases
- You know when NOT to use async → You understand CPU-bound vs I/O-bound
- You can debug async code → You’ve mastered concurrent programming
Project 11: The dataclass Generator
- File: LEARN_PYTHON2_VS_PYTHON3_DEEP_DIVE.md
- Main Programming Language: Python
- Alternative Programming Languages: Kotlin (for comparison)
- Coolness Level: Level 3: Genuinely Clever
- Business Potential: 2. The “Micro-SaaS / Pro Tool”
- Difficulty: Level 2: Intermediate
- Knowledge Area: OOP / Metaprogramming
- Software or Tool: dataclasses module, attrs
- Main Book: “Fluent Python, 2nd Edition” by Luciano Ramalho
What you’ll build: A tool that converts traditional Python 2-style classes (with manual __init__, __repr__, __eq__) to Python 3.7+ dataclasses, showing the dramatic reduction in boilerplate.
Why it teaches Python 2 vs 3: dataclasses eliminate massive amounts of boilerplate. By building a converter, you’ll understand what @dataclass actually generates and when to use it.
Core challenges you’ll face:
- Parsing class definitions → maps to AST analysis
- Detecting data-holding patterns → maps to identifying when dataclass is appropriate
- Understanding frozen and slots → maps to immutability and memory optimization
Key Concepts:
- dataclasses module: PEP 557
- Comparison with namedtuple and attrs: “Fluent Python” Chapter 5 - Luciano Ramalho
- Slots and memory: Python documentation
Difficulty: Intermediate Time estimate: Weekend Prerequisites: Understanding of Python classes, OOP basics
Real world outcome:
$ python3 dataclass_converter.py models.py
Dataclass Conversion Analysis
━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Original Python 2-style class (32 lines):
class User:
    def __init__(self, name, email, age=None, active=True):
        self.name = name
        self.email = email
        self.age = age
        self.active = active
    def __repr__(self):
        return f"User(name={self.name!r}, email={self.email!r}, age={self.age!r}, active={self.active!r})"
    def __eq__(self, other):
        if not isinstance(other, User):
            return NotImplemented
        return (self.name == other.name and
                self.email == other.email and
                self.age == other.age and
                self.active == other.active)
    def __hash__(self):
        return hash((self.name, self.email, self.age, self.active))
Converted to dataclass (6 lines):
from dataclasses import dataclass
from typing import Optional
@dataclass(frozen=True)  # frozen=True for hashability
class User:
    name: str
    email: str
    age: Optional[int] = None
    active: bool = True
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Code reduction: 81%! (32 lines → 6 lines)
Auto-generated methods:
✓ __init__
✓ __repr__
✓ __eq__
✓ __hash__ (when frozen=True)
Additional options available:
• @dataclass(slots=True) # Python 3.10+, memory efficient
• @dataclass(kw_only=True) # Python 3.10+, keyword-only args
• field(default_factory=list) # For mutable defaults
Implementation Hints:
The evolution of data-holding classes:
# Python 2: Manual everything
class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y
    def __repr__(self):
        return "Point(%r, %r)" % (self.x, self.y)  # no f-strings in Python 2
    def __eq__(self, other):
        return self.x == other.x and self.y == other.y
# Python 2.6+: namedtuple (immutable)
from collections import namedtuple
Point = namedtuple('Point', ['x', 'y'])
# Python 3.7+: dataclass
from dataclasses import dataclass
@dataclass
class Point:
    x: float
    y: float
Dataclass features:
from dataclasses import dataclass, field
from typing import List
@dataclass
class User:
    name: str  # Required
    email: str  # Required
    age: int = 0  # Optional with default
    tags: List[str] = field(default_factory=list)  # Mutable default
    # Excluded from repr; needs a default because it follows defaulted fields
    _internal: str = field(default="", repr=False)
# What gets generated:
# __init__(name, email, age=0, tags=<new list per instance>, _internal="")
# __repr__() -> "User(name='...', email='...', age=0, tags=[...])"
# __eq__() -> compares all fields
# (optionally) __hash__(), __lt__(), __le__(), __gt__(), __ge__()
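The field options above are easiest to trust once you have watched them behave. The following is a small runnable sketch; the class names Tagged and Point are illustrative only.

```python
from dataclasses import dataclass, field, FrozenInstanceError

@dataclass
class Tagged:
    name: str
    tags: list = field(default_factory=list)     # fresh list per instance
    secret: str = field(default="", repr=False)  # hidden from repr

@dataclass(frozen=True)
class Point:
    x: float
    y: float

first, second = Tagged("a"), Tagged("a")
first.tags.append("admin")  # does not touch second.tags

print(first)                                     # generated __repr__
print(first == second)                           # generated __eq__ compares fields
print(len({Point(1.0, 2.0), Point(1.0, 2.0)}))   # frozen instances are hashable
```

Because default_factory builds a new list for each instance, appending to first.tags leaves second.tags empty, which is exactly the bug a shared mutable default would cause in a hand-written __init__.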
Build a converter that:
- Parses class definitions with AST
- Identifies __init__ assignments to self.x
- Detects if __repr__, __eq__, __hash__ are standard implementations
- Generates equivalent @dataclass code
- Handles mutable default values correctly
Learning milestones:
- You understand dataclass basics → You eliminate boilerplate
- You know about field() → You handle complex defaults
- You understand frozen and slots → You optimize for specific use cases
- You can compare with attrs → You know the ecosystem
Project 12: The pathlib Migration Tool
- File: LEARN_PYTHON2_VS_PYTHON3_DEEP_DIVE.md
- Main Programming Language: Python
- Alternative Programming Languages: N/A
- Coolness Level: Level 2: Practical but Forgettable
- Business Potential: 2. The “Micro-SaaS / Pro Tool”
- Difficulty: Level 1: Beginner
- Knowledge Area: Filesystem / Path Handling
- Software or Tool: pathlib, os.path
- Main Book: “Python Cookbook, 3rd Edition” by David Beazley
What you’ll build: A tool that converts old os.path code to modern pathlib usage, demonstrating the object-oriented path handling introduced in Python 3.4.
Why it teaches Python 2 vs 3: pathlib is one of Python 3’s best additions—it makes path manipulation intuitive and cross-platform. Converting legacy code shows the improvement.
Core challenges you’ll face:
- Mapping os.path functions to pathlib → maps to understanding both APIs
- Handling string paths vs Path objects → maps to gradual migration
- Cross-platform path handling → maps to PurePath vs Path
Key Concepts:
- pathlib module: PEP 428
- Path object methods: “Python Cookbook” Chapter 5 - David Beazley
- Cross-platform paths: Python documentation on pathlib
Difficulty: Beginner Time estimate: One evening Prerequisites: Basic file operations knowledge
Real world outcome:
$ python3 pathlib_converter.py file_utils.py
Pathlib Migration Analysis
━━━━━━━━━━━━━━━━━━━━━━━━━
Original os.path style:
import os
def get_config():
    home = os.path.expanduser("~")
    config_dir = os.path.join(home, ".config", "myapp")
    if not os.path.exists(config_dir):
        os.makedirs(config_dir)
    config_file = os.path.join(config_dir, "config.json")
    if os.path.isfile(config_file):
        with open(config_file) as f:
            return f.read()
    return None
Converted to pathlib:
from pathlib import Path
def get_config():
    config_dir = Path.home() / ".config" / "myapp"
    config_dir.mkdir(parents=True, exist_ok=True)
    config_file = config_dir / "config.json"
    if config_file.is_file():
        return config_file.read_text()
    return None
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Improvements:
• Path joining: os.path.join(a, b, c) → a / b / c
• Existence checks: os.path.exists() → path.exists()
• File reading: open(path).read() → path.read_text()
• Directory creation: os.makedirs() → path.mkdir(parents=True)
• Home directory: os.path.expanduser("~") → Path.home()
Code reduction: 30%
Readability: Much improved
Implementation Hints:
The os.path to pathlib mapping:
# Creating paths
os.path.join(a, b, c) → Path(a) / b / c
os.path.expanduser("~") → Path.home()
os.path.abspath(p) → Path(p).resolve()  # resolve() also follows symlinks
os.path.dirname(p) → Path(p).parent
os.path.basename(p) → Path(p).name
# Checking paths
os.path.exists(p) → Path(p).exists()
os.path.isfile(p) → Path(p).is_file()
os.path.isdir(p) → Path(p).is_dir()
os.path.isabs(p) → Path(p).is_absolute()
# Path components
os.path.splitext(p) → Path(p).stem, Path(p).suffix  # stem drops the directory part
os.path.split(p) → Path(p).parent, Path(p).name
# File operations
open(path, 'r').read() → Path(path).read_text()
open(path, 'rb').read() → Path(path).read_bytes()
open(path, 'w').write(data) → Path(path).write_text(data)
# Directory operations
os.listdir(p) → list(Path(p).iterdir())  # iterdir() yields Path objects, not names
os.makedirs(p) → Path(p).mkdir(parents=True)
glob.glob("*.txt") → Path(".").glob("*.txt")  # returns a generator
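A few rows of this mapping can be checked directly. The sketch below assumes an already-normalized base directory (a temporary directory here) and uses a hypothetical function name, pathlib_matches_os_path; it compares each os.path result with its pathlib counterpart.

```python
import os
import tempfile
from pathlib import Path

def pathlib_matches_os_path(base_dir: str) -> dict:
    """Compare a few os.path calls with their pathlib counterparts."""
    legacy = os.path.join(base_dir, "data", "report.tar.gz")
    modern = Path(base_dir) / "data" / "report.tar.gz"
    modern.parent.mkdir(parents=True, exist_ok=True)
    modern.write_text("hello")
    return {
        "join": str(modern) == legacy,
        "basename": modern.name == os.path.basename(legacy),
        "dirname": str(modern.parent) == os.path.dirname(legacy),
        # Both APIs treat only the final ".gz" as the extension.
        "splitext": modern.suffix == os.path.splitext(legacy)[1],
        "isfile": modern.is_file() == os.path.isfile(legacy),
        "read": modern.read_text() == "hello",
    }

with tempfile.TemporaryDirectory() as tmp:
    print(pathlib_matches_os_path(tmp))
```

A migration tool can run checks like these against its own rewrites to confirm that a converted expression still produces the same path.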
Why pathlib is better:
# os.path: String concatenation, easy to mess up
config = os.path.join(
    os.path.expanduser("~"),
    ".config",
    "myapp",
    "config.json"
)
# pathlib: Object-oriented, intuitive
config = Path.home() / ".config" / "myapp" / "config.json"
# Even better - pathlib handles this:
config.parent.mkdir(parents=True, exist_ok=True) # Create dirs if needed
data = config.read_text() # Read file
config.write_text(new_data) # Write file
Learning milestones:
- You know the / operator for paths → You write readable path code
- You understand Path methods → You use modern file operations
- You handle cross-platform paths → You write portable code
- You prefer pathlib → You’ve internalized Python 3 style
Project 13: The 2to3/futurize Wrapper
- File: LEARN_PYTHON2_VS_PYTHON3_DEEP_DIVE.md
- Main Programming Language: Python
- Alternative Programming Languages: N/A
- Coolness Level: Level 3: Genuinely Clever
- Business Potential: 3. The “Service & Support” Model
- Difficulty: Level 3: Advanced
- Knowledge Area: Code Migration / AST Transformation
- Software or Tool: lib2to3, futurize, modernize
- Main Book: “Porting to Python 3” by Lennart Regebro
What you’ll build: An intelligent wrapper around 2to3/futurize that provides a better UX: interactive mode, confidence scores, explanations for each change, and the ability to preview changes before applying.
Why it teaches Python 2 vs 3: By wrapping the official migration tools, you’ll understand all the transformations they make and why each is necessary.
Core challenges you’ll face:
- Understanding lib2to3 fixers → maps to how each syntax change is detected and fixed
- Handling edge cases the tools miss → maps to manual migration requirements
- Two-stage migration → maps to incremental modernization
Key Concepts:
- 2to3 tool: Python documentation
- Futurize stages: python-future documentation
- Common migration problems: “Porting to Python 3” - Lennart Regebro
Difficulty: Advanced Time estimate: 1-2 weeks Prerequisites: Solid Python 2 and 3 knowledge, AST experience
Real world outcome:
$ python3 smart_migrator.py --analyze legacy_project/
Smart Python Migration Analysis
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Scanning 47 Python files...
Migration Complexity Score: 7/10 (Moderate)
Breakdown by category:
┌──────────────────────┬───────┬────────────┬───────────────┐
│ Category │ Count │ Confidence │ Auto-fixable? │
├──────────────────────┼───────┼────────────┼───────────────┤
│ Print statements │ 234 │ 100% │ Yes │
│ Division operators │ 45 │ 85% │ Review │
│ Exception syntax │ 23 │ 100% │ Yes │
│ Unicode strings │ 156 │ 60% │ Review │
│ urllib imports │ 12 │ 95% │ Yes │
│ Dictionary methods │ 34 │ 90% │ Yes │
│ xrange calls │ 28 │ 100% │ Yes │
│ raw_input calls │ 8 │ 100% │ Yes │
└──────────────────────┴───────┴────────────┴───────────────┘
⚠️ Items requiring manual review:
1. models/user.py:45 - Unicode string handling
Current: str_value = str(unicode_input)
Issue: str/unicode coercion may have different behavior
Suggestion: Use explicit .encode()/.decode()
2. utils/math.py:23 - Division operator
Current: result = count / total
Issue: Integer division behavior changes
Question: Should this be floor division (//) or true division (/)?
3. network/client.py:89 - pickle usage
Current: pickle.loads(binary_data)
Issue: Python 2 pickles may not load in Python 3
Suggestion: Load with pickle.loads(binary_data, encoding="latin1"), or re-pickle the data from Python 2 with protocol=2
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Recommended migration strategy:
1. Stage 1: Safe fixes (print, exceptions, imports) - 1 day
2. Stage 2: Test suite updates - 2 days
3. Stage 3: String/unicode fixes - 3 days (manual review)
4. Stage 4: Final testing and cleanup - 1 day
Estimated effort: 1 week for 47 files
[Preview changes] [Apply safe fixes] [Generate report]
Implementation Hints:
The lib2to3 fixers:
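# 2to3 has "fixers" for each transformation
# Located in lib2to3/fixes/ (lib2to3 was deprecated in Python 3.11 and removed in 3.13)
# Examples:
fix_print # print x → print(x)
fix_except # except E, e: → except E as e:
fix_has_key # d.has_key(k) → k in d
fix_dict # d.keys() → list(d.keys()) when needed
fix_xrange # xrange() → range()
fix_raw_input # raw_input() → input()
fix_input # input() → eval(input())
fix_imports # Standard library module renames
fix_import # Implicit relative imports → explicit relative imports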
Two-stage migration with futurize:
# Stage 1: Safe, non-breaking changes
futurize --stage1 -w legacy_code.py
# Stage 2: Changes that require 'future' library
futurize --stage2 -w legacy_code.py
Edge cases the tools miss:
- Pickle compatibility: Python 2 pickles may not load in Python 3
- Byte string operations: b"hello"[0] returns 'h' in Python 2 but 104 in Python 3
- Dictionary ordering: Python 3.7+ preserves insertion order
- Implicit relative imports: Removed in Python 3
- reload(): Moved to importlib
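Two of these edge cases are easy to observe on Python 3. This is a short illustrative snippet; the json module is used for the reload demonstration only because it is a harmless standard-library module.

```python
import importlib
import json  # any already-imported module works for the reload demo

data = b"hello"
first_item = data[0]          # an int in Python 3 (a 1-character str in Python 2)
first_slice = data[0:1]       # slicing keeps the bytes type in both versions
text = data.decode("ascii")   # explicit bytes -> str conversion

reloaded = importlib.reload(json)  # Python 3 home of the old builtin reload()

print(first_item, first_slice, text, reloaded is json)
```

Code that indexes byte strings is a common source of silent behavior changes, so slicing (data[0:1]) is the portable way to get a one-byte value.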
Build a wrapper that:
- Runs 2to3 in dry-run mode
- Parses the output to identify changes
- Categorizes by type and confidence
- Allows interactive review
- Applies changes incrementally
Learning milestones:
- You understand all 2to3 fixers → You know every syntax change
- You identify edge cases → You can handle complex migrations
- You use futurize stages → You can do incremental migration
- You’ve migrated real code → You’re ready for production migrations
Project 14: The Walrus Operator Introducer
- File: LEARN_PYTHON2_VS_PYTHON3_DEEP_DIVE.md
- Main Programming Language: Python
- Alternative Programming Languages: N/A
- Coolness Level: Level 3: Genuinely Clever
- Business Potential: 2. The “Micro-SaaS / Pro Tool”
- Difficulty: Level 2: Intermediate
- Knowledge Area: Syntax / Code Patterns
- Software or Tool: Python 3.8+
- Main Book: “Effective Python, 3rd Edition” by Brett Slatkin
What you’ll build: A tool that identifies opportunities to use Python 3.8’s walrus operator (:=) and suggests refactorings, demonstrating assignment expressions.
Why it teaches Python 2 vs 3: The walrus operator is one of Python 3.8’s most controversial additions (Guido van Rossum stepped down as BDFL after the heated PEP 572 debate!). Understanding when to use it teaches you about Python’s expression/statement distinction.
Core challenges you’ll face:
- Identifying walrus opportunities → maps to pattern matching for assignment
- Knowing when NOT to use it → maps to readability concerns
- Understanding expression vs statement → maps to Python’s fundamental design
Key Concepts:
- Assignment expressions: PEP 572
- When to use walrus operator: “Effective Python” Item 10 - Brett Slatkin
- The controversy: Python governance history
Difficulty: Intermediate Time estimate: Weekend Prerequisites: Solid Python understanding
Real world outcome:
$ python3 walrus_suggester.py code.py
Walrus Operator Opportunities
━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Line 23 - Loop with initial assignment:
# Before
line = file.readline()
while line:
    process(line)
    line = file.readline()
# After (with walrus)
while (line := file.readline()):
    process(line)
Benefit: Eliminates duplicate readline() call
Line 45 - Conditional with assignment:
# Before
match = pattern.search(text)
if match:
    process(match.group(1))
# After (with walrus)
if (match := pattern.search(text)):
    process(match.group(1))
Benefit: Keeps the assignment next to the test that uses it (match is still visible after the if block)
Line 67 - List comprehension with filter:
# Before
results = []
for x in data:
    y = expensive_compute(x)
    if y > threshold:
        results.append(y)
# After (with walrus)
results = [y for x in data if (y := expensive_compute(x)) > threshold]
Benefit: Collapses the loop into a comprehension while still calling expensive_compute once per item
⚠️ Not recommended (readability concerns):
Line 89:
# Don't do this - too complex
if (a := foo()) and (b := bar(a)) and (c := baz(b)):
    process(a, b, c)
Suggestion: Keep as separate statements for readability
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Summary:
Good candidates: 5
Maybe (review readability): 3
Not recommended: 2
Implementation Hints:
Common walrus operator patterns:
# 1. While loops
# Before
chunk = file.read(8192)
while chunk:
    process(chunk)
    chunk = file.read(8192)
# After
while (chunk := file.read(8192)):
    process(chunk)
# 2. If statements with setup
# Before
result = expensive_operation()
if result:
    use(result)
# After
if (result := expensive_operation()):
    use(result)
# 3. List comprehensions avoiding double computation
# Before (computes twice!)
[f(x) for x in data if f(x) > 0]
# After (computes once)
[y for x in data if (y := f(x)) > 0]
# 4. Regex matching
# Before
match = re.search(pattern, text)
if match:
    print(match.group(1))
# After
if (match := re.search(pattern, text)):
    print(match.group(1))
When NOT to use walrus:
# Too complex - hard to read
if (a := f()) and (b := g(a)) and (c := h(b)):
    ...
# Side effects in expressions - confusing
data = [(y := x * 2) for x in range(10)]
print(y)  # y is 18 - leaked from comprehension!
# Simple assignments - no benefit
x := 5  # SyntaxError! An unparenthesized walrus can't be a statement; use x = 5
Build a tool that:
- Parses Python code with AST
- Identifies patterns where walrus helps:
  - While loops with pre-loop assignment
  - If statements with unused-after assignment
  - List comprehensions with expensive duplicate calls
- Generates the refactored version
- Warns about readability concerns
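The detection step can be sketched for the two simplest patterns. This is a minimal sketch with a hypothetical function name, find_walrus_candidates: it only flags a single-name assignment immediately followed by an if or while whose test is that bare name, and it does not check whether the name is used later.

```python
import ast

def find_walrus_candidates(source: str) -> list:
    """Return (line number, name) pairs for `name = ...` directly followed by
    `if name:` or `while name:` in the same block."""
    found = []
    for node in ast.walk(ast.parse(source)):
        body = getattr(node, "body", None)
        if not isinstance(body, list):
            continue  # e.g. lambda bodies are single expressions, not lists
        for first, second in zip(body, body[1:]):
            if (
                isinstance(first, ast.Assign)
                and len(first.targets) == 1
                and isinstance(first.targets[0], ast.Name)
                and isinstance(second, (ast.If, ast.While))
                and isinstance(second.test, ast.Name)
                and second.test.id == first.targets[0].id
            ):
                found.append((first.lineno, first.targets[0].id))
    return found
```

A fuller tool would also look at else branches and confirm that the refactoring keeps the code readable before suggesting it.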
Learning milestones:
- You understand assignment expressions → You know what := does
- You identify good use cases → You can improve existing code
- You know when to avoid it → You prioritize readability
- You understand the controversy → You appreciate language design tradeoffs
Project 15: The Complete Python 2→3 Migration Simulator
- File: LEARN_PYTHON2_VS_PYTHON3_DEEP_DIVE.md
- Main Programming Language: Python
- Alternative Programming Languages: N/A
- Coolness Level: Level 4: Hardcore Tech Flex
- Business Potential: 4. The “Open Core” Infrastructure
- Difficulty: Level 4: Expert
- Knowledge Area: Code Migration / CI/CD
- Software or Tool: Docker, tox, pytest
- Main Book: “Porting to Python 3” by Lennart Regebro + all previous resources
What you’ll build: A complete migration pipeline that takes a Python 2 project, runs it through all conversion tools, tests under both Python versions (using Docker), and produces a fully migrated Python 3 codebase with a migration report.
Why it teaches Python 2 vs 3: This capstone project combines everything: you’ll use all the individual tools, handle real-world edge cases, and understand the full migration workflow that companies use.
Core challenges you’ll face:
- Setting up dual-version testing → maps to tox, Docker, CI/CD
- Handling all migration categories → maps to combining all previous projects
- Maintaining compatibility → maps to six library, future imports
Key Concepts:
- All concepts from previous projects combined
- CI/CD for Python: GitHub Actions, tox
- Containerized testing: Docker multi-stage builds
- Compatibility libraries: six, future
Difficulty: Expert Time estimate: 1 month+ Prerequisites: Completion of most previous projects, DevOps knowledge
Real world outcome:
$ python3 migration_pipeline.py --repo github.com/example/legacy-project
Python 2 to 3 Migration Pipeline
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Step 1: Clone and analyze repository
✓ Cloned 156 Python files
✓ Detected Python 2.7 (setup.py classifiers)
✓ Found test suite: pytest (89 tests)
Step 2: Run tests on Python 2.7 (baseline)
[Docker: python:2.7]
✓ 89/89 tests passed
Step 3: Static analysis
✓ Print statements: 234
✓ Exception syntax: 45
✓ Unicode issues: 78
✓ Import changes: 23
✓ Dictionary methods: 34
Step 4: Stage 1 migration (safe fixes)
✓ Applied futurize --stage1
✓ 312 changes made
Step 5: Test on both versions
[Docker: python:2.7]
✓ 89/89 tests passed
[Docker: python:3.11]
✗ 67/89 tests passed
Step 6: Stage 2 migration (with future library)
✓ Applied futurize --stage2
✓ 89 changes made
Step 7: Test again
[Docker: python:2.7]
✓ 89/89 tests passed
[Docker: python:3.11]
✗ 82/89 tests passed
Step 8: Manual fixes applied
✓ Fixed pickle protocol issues
✓ Fixed binary file handling
✓ Updated requirements.txt
Step 9: Final test
[Docker: python:3.11]
✓ 89/89 tests passed
Step 10: Generate migration report
✓ Report saved to: migration_report.html
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Migration Complete!
Summary:
Total changes: 423
Automated: 401 (95%)
Manual: 22 (5%)
Time: 2 hours automated, ~4 hours manual review
Output:
✓ Migrated code: ./output/legacy-project-py3/
✓ Report: ./output/migration_report.html
✓ Diff: ./output/migration.patch
Implementation Hints:
Pipeline architecture:
┌──────────────────────────────────────────────────────────────┐
│ Migration Pipeline │
├──────────────────────────────────────────────────────────────┤
│ │
│ 1. Clone Repository │
│ ↓ │
│ 2. Baseline Testing (Python 2.7 Docker) │
│ ↓ │
│ 3. Static Analysis (all our previous tools) │
│ ↓ │
│ 4. Stage 1: Safe Fixes (futurize --stage1) │
│ ↓ │
│ 5. Test Both Versions (Python 2.7 + 3.x Docker) │
│ ↓ │
│ 6. Stage 2: Future Library (futurize --stage2) │
│ ↓ │
│ 7. Test Both Versions │
│ ↓ │
│ 8. Manual Review Queue │
│ ↓ │
│ 9. Final Testing │
│ ↓ │
│ 10. Generate Report │
│ │
└──────────────────────────────────────────────────────────────┘
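One way to turn the diagram into code is a list of named stages run in order, sketched below. The names `StageResult`, `run_pipeline`, and `futurize_command` are hypothetical, and each stage is assumed to be a callable that takes a shared context dictionary and returns an `(ok, detail)` pair. Stages are marked as required or not because the Python 3 test runs in steps 5 and 7 are expected to fail part-way through a migration, while a failing Python 2.7 baseline should stop everything.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class StageResult:
    name: str
    ok: bool
    detail: str = ""


# Hypothetical stage signature: takes the shared context, returns (ok, detail).
Stage = Callable[[Dict], Tuple[bool, str]]


def futurize_command(stage: int, target: str) -> List[str]:
    """Build the argv for one futurize pass; -w writes changes back to the files."""
    if stage not in (1, 2):
        raise ValueError("futurize only has stage 1 and stage 2")
    return ["futurize", f"--stage{stage}", "-w", target]


def run_pipeline(
    stages: List[Tuple[str, Stage, bool]], context: Dict
) -> List[StageResult]:
    """Run (name, stage, required) entries in order.

    A failing required stage halts the run; a failing optional stage
    is recorded and the pipeline continues.
    """
    results = []
    for name, stage, required in stages:
        try:
            ok, detail = stage(context)
        except Exception as exc:  # a crashing stage counts as a failure
            ok, detail = False, f"{type(exc).__name__}: {exc}"
        results.append(StageResult(name, ok, detail))
        if required and not ok:
            break
    return results
```

A real stage would typically call `subprocess.run(futurize_command(1, "src"), check=True)` or start a Docker test run; the returned list of `StageResult` objects is then the raw material for the final report.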
Docker setup for testing:
# Dockerfile.py27
FROM python:2.7
WORKDIR /app
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY . .
CMD ["pytest"]
# Dockerfile.py311
FROM python:3.11
WORKDIR /app
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY . .
CMD ["pytest"]
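The pipeline can drive these images from Python with `subprocess`. The sketch below assumes the Dockerfiles are named `Dockerfile.py27` and `Dockerfile.py311` as above; the image name `migration-test` and both function names are made up for illustration.

```python
import subprocess
from pathlib import Path


def docker_test_commands(version_tag, context_dir="."):
    """Return (build_argv, run_argv) for one test image.

    version_tag is the Dockerfile suffix, e.g. "py27" or "py311".
    """
    image = f"migration-test:{version_tag}"  # hypothetical image name
    dockerfile = str(Path(context_dir) / f"Dockerfile.{version_tag}")
    build = ["docker", "build", "-f", dockerfile, "-t", image, context_dir]
    run = ["docker", "run", "--rm", image]  # --rm removes the container afterwards
    return build, run


def run_tests_in_docker(version_tag, context_dir="."):
    """Build the image, run its default command (pytest), return (exit_code, output)."""
    build, run = docker_test_commands(version_tag, context_dir)
    subprocess.run(build, check=True)  # a failed build is a pipeline error
    proc = subprocess.run(run, capture_output=True, text=True)
    return proc.returncode, proc.stdout
```

Test failures are reported through the return code instead of an exception, since pytest exits with a non-zero status when any test fails and the pipeline expects that on Python 3 during the early stages.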
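tox.ini for local testing (virtualenv 20.22 and later can no longer create Python 2.7 environments, so the py27 environment needs an older virtualenv):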
[tox]
envlist = py27,py311
[testenv]
deps =
    pytest
    six
    future
commands = pytest
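Whether the tests run through Docker or tox, both end in a pytest run, so pass counts like the 67/89 in the sample output can be recovered from pytest's final summary line. The sketch below is one hedged way to do that; the function names are illustrative, and the summary format is assumed to look like `67 passed, 22 failed in 3.21s`, which can vary between pytest versions.

```python
import re

# Matches "67 passed", "22 failed", "1 error", "2 errors", "3 skipped".
# "xpassed"/"xfailed" are not matched because a space must precede the label.
_COUNT = re.compile(r"(\d+) (passed|failed|errors?|skipped)")


def parse_pytest_summary(output):
    """Extract test counts from the last non-empty line of pytest output."""
    lines = [line for line in output.splitlines() if line.strip()]
    last = lines[-1] if lines else ""
    counts = {"passed": 0, "failed": 0, "error": 0, "skipped": 0}
    for number, label in _COUNT.findall(last):
        key = "error" if label.startswith("error") else label
        counts[key] = int(number)
    return counts


def format_result(counts):
    """Render counts in the 'N/M tests passed' style of the sample output."""
    total = counts["passed"] + counts["failed"] + counts["error"]
    return f"{counts['passed']}/{total} tests passed"


sample = "....F..\n===== 67 passed, 22 failed in 3.21s ====="
print(format_result(parse_pytest_summary(sample)))
```

Scraping console output is fragile; pytest's `--junitxml` option writes a machine-readable report that would be a sturdier source for the same numbers.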
The manual review queue should flag:
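- Pickle files - Pickles written by Python 2 may need encoding="latin1" or encoding="bytes" when loaded in Python 3, or re-pickling
- Binary file operations - Files holding binary data must be opened in "rb"/"wb" mode, because Python 3 text mode decodes to str
- C extensions - Must be ported to the Python 3 C API and rebuilt, not just recompiled
- Database migrations - Text encoding changes
- External APIs - String/bytes handling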
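A first version of the review queue can be a line-by-line scan for patterns that hint at each category. The rules below are illustrative guesses, not a complete or authoritative list, and the names `REVIEW_RULES` and `flag_for_review` are hypothetical. Regular expressions are used because Python 2 source often cannot be parsed by Python 3's `ast` module.

```python
import re

# (category, pattern, note) triples; the patterns are deliberately broad,
# because a false positive only costs a reviewer a quick look.
REVIEW_RULES = [
    ("pickle", re.compile(r"\bc?[Pp]ickle\.(?:loads?|dumps?)\b"),
     "pickled data may not load unchanged on Python 3"),
    ("binary-io", re.compile(r"\b(?:open|file)\s*\("),
     "check whether the file needs text or binary mode"),
    ("c-extension", re.compile(r"\bext_modules\b"),
     "C extension must be ported to the Python 3 C API"),
    ("database", re.compile(r"\b(?:sqlite3|MySQLdb|psycopg2)\b"),
     "check text encoding of stored data"),
    ("external-api", re.compile(r"\b(?:urllib2|httplib|socket)\b"),
     "check str/bytes handling at the network boundary"),
]


def flag_for_review(source, filename="<source>"):
    """Return (filename, line_number, category, note) for every rule hit."""
    findings = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for category, pattern, note in REVIEW_RULES:
            if pattern.search(line):
                findings.append((filename, lineno, category, note))
    return findings


example = "import cPickle\ndata = cPickle.load(open('cache.pkl'))\nx = 1\n"
for finding in flag_for_review(example, "cache.py"):
    print(finding)
```

The scan also matches text inside comments and string literals, which is acceptable for a queue that a person reviews anyway.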
Learning milestones:
- You can run dual-version tests → You understand the migration workflow
- You handle all migration categories → You’ve mastered the differences
- You know what requires manual review → You understand edge cases
- You’ve migrated a real project → You’re ready for production migrations
Project Comparison Table
| Project | Difficulty | Time | Key Learning | Fun Factor |
|---|---|---|---|---|
| 1. Unicode/Bytes Torture Chamber | Intermediate | Weekend | String encoding | ⭐⭐⭐ |
| 2. Division Calculator | Beginner | Evening | Integer/true division | ⭐⭐ |
| 3. Print Statement Converter | Intermediate | Weekend | AST manipulation | ⭐⭐⭐ |
| 4. Iterator vs List Profiler | Intermediate | Weekend | Memory efficiency | ⭐⭐⭐ |
| 5. Exception Syntax Migrator | Beginner | Evening | Exception handling | ⭐⭐ |
| 6. Standard Library Import Fixer | Intermediate | Weekend | Library reorganization | ⭐⭐ |
| 7. input() Security Sandbox | Beginner | Evening | Security, eval dangers | ⭐⭐⭐⭐ |
| 8. f-string Converter | Intermediate | Weekend | String formatting | ⭐⭐⭐ |
| 9. Type Hints Migrator | Intermediate | 1-2 weeks | Static typing | ⭐⭐⭐ |
| 10. async/await Converter | Advanced | 1-2 weeks | Asynchronous programming | ⭐⭐⭐⭐ |
| 11. dataclass Generator | Intermediate | Weekend | OOP, metaprogramming | ⭐⭐⭐ |
| 12. pathlib Migration Tool | Beginner | Evening | Filesystem operations | ⭐⭐ |
| 13. 2to3/futurize Wrapper | Advanced | 1-2 weeks | Complete migration | ⭐⭐⭐ |
| 14. Walrus Operator Introducer | Intermediate | Weekend | Modern syntax | ⭐⭐⭐ |
| 15. Complete Migration Pipeline | Expert | 1 month+ | Everything combined | ⭐⭐⭐⭐⭐ |
Recommended Learning Path
Path 1: Quick Overview (Weekend)
- Division Calculator (2) - Understand numeric changes
- input() Security Sandbox (7) - See the security fix
- Iterator vs List Profiler (4) - Understand performance improvements
- pathlib Migration Tool (12) - Use modern file handling
Path 2: Migration Focus (2-3 weeks)
- Unicode/Bytes Torture Chamber (1) - Master THE biggest difference
- Print Statement Converter (3) - Learn AST manipulation
- Standard Library Import Fixer (6) - Handle reorganization
- Exception Syntax Migrator (5) - Complete syntax migration
- 2to3/futurize Wrapper (13) - Build complete migration tooling
Path 3: Modern Python (3-4 weeks)
- f-string Converter (8) - Modern string formatting
- Type Hints Migrator (9) - Static typing
- async/await Converter (10) - Asynchronous programming
- dataclass Generator (11) - Modern OOP
- Walrus Operator Introducer (14) - Assignment expressions (Python 3.8+)
Path 4: Complete Mastery (2+ months)
Complete all 15 projects in order, culminating in the Complete Migration Pipeline.
Final Capstone: Production Migration
What you’ll build: Take a real open-source Python 2 project (many still exist on GitHub) and fully migrate it to Python 3, contributing the migration back to the project.
Why this matters: Real-world migration experience is invaluable. You’ll encounter edge cases no tutorial covers and learn to make judgment calls.
Suggested projects to migrate:
- Look for GitHub repos with “python2” in issues
- Check if older libraries you use still have Python 2 code
- Find unmaintained but useful tools
Your contribution:
- Fork the repository
- Run your migration pipeline
- Fix all failing tests
- Update documentation
- Submit a pull request
This is resume gold - you’ve not only learned the differences, you’ve proven you can handle real migrations.
Summary
| # | Project | Language |
|---|---|---|
| 1 | Unicode/Bytes Torture Chamber | Python |
| 2 | Division Calculator Time Machine | Python |
| 3 | Print Statement Converter | Python |
| 4 | Iterator vs List Profiler | Python |
| 5 | Exception Syntax Migrator | Python |
| 6 | Standard Library Import Fixer | Python |
| 7 | input() vs raw_input() Security Sandbox | Python |
| 8 | f-string Converter | Python |
| 9 | Type Hints Migrator | Python |
| 10 | async/await Converter | Python |
| 11 | dataclass Generator | Python |
| 12 | pathlib Migration Tool | Python |
| 13 | 2to3/futurize Wrapper | Python |
| 14 | Walrus Operator Introducer | Python |
| 15 | Complete Migration Pipeline | Python |
Resources
Books
- “Fluent Python, 2nd Edition” by Luciano Ramalho - Deep understanding of Python 3
- “Effective Python, 3rd Edition” by Brett Slatkin - Modern Python best practices
- “Porting to Python 3” by Lennart Regebro - The classic migration guide
- “Python Cookbook, 3rd Edition” by David Beazley & Brian K. Jones - Practical Python 3 recipes
- “Robust Python” by Patrick Viafore - Type hints and modern patterns
- “Using Asyncio in Python” by Caleb Hattingh - Async/await deep dive
- “High Performance Python, 2nd Edition” by Gorelick & Ozsvald - Performance optimization
Online Resources
- Python 3 Porting Guide - Comprehensive migration documentation
- Python-Future Documentation - The futurize tool
- What’s New in Python 3.x - Official changelog for each version
- The Conservative Python 3 Porting Guide - Practical migration advice
- PEP Index - Official Python Enhancement Proposals
Key PEPs
- PEP 3000 - Python 3000 overview
- PEP 3100 - Miscellaneous Python 3.0 plans
- PEP 3105 - Make print a function
- PEP 3107 - Function annotations (type hints precursor)
- PEP 3108 - Standard library reorganization
- PEP 238 - Division operator changes
- PEP 484 - Type hints
- PEP 492 - Async/await syntax
- PEP 498 - f-strings
- PEP 557 - Data classes
- PEP 572 - Assignment expressions (walrus operator)
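As a quick reminder of what several of these PEPs look like in practice, the short sketch below touches a handful of them in one file. It is only an illustration with made-up names (`Point`, `halves`, `first_long_word`, `double`), and it needs Python 3.8 or later because of the walrus operator.

```python
import asyncio
from dataclasses import dataclass


@dataclass  # PEP 557: generates __init__, __repr__ and __eq__
class Point:
    x: int
    y: int


def halves(n: int) -> str:  # PEP 3107 syntax, given meaning by PEP 484
    # PEP 238: "/" is true division, "//" is floor division.
    # PEP 498: f-strings evaluate the expressions inside the braces.
    return f"{n} / 2 = {n / 2}, {n} // 2 = {n // 2}"


def first_long_word(words: list, minimum: int = 5) -> str:
    for word in words:
        # PEP 572: assign and test in one expression.
        if (length := len(word)) >= minimum:
            return f"{word} ({length})"
    return "none"


async def double(value: int) -> int:  # PEP 492: native coroutine
    await asyncio.sleep(0)  # yield control to the event loop once
    return 2 * value


# PEP 3105: print is a function, so keyword arguments such as sep work.
print(halves(7), first_long_word(["py", "python"]), sep=" | ")
```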