P06: File Synchronization Tool
Project Overview
What you’ll build: A PowerShell-based file sync tool that compares two directories and synchronizes them—showing what would change, then applying changes with confirmation.
| Attribute | Value |
|---|---|
| Difficulty | Intermediate |
| Time Estimate | 1 week |
| Programming Language | PowerShell |
| Knowledge Area | Filesystem, Scripting, Algorithms |
| Prerequisites | Basic PowerShell, understanding of filesystems |
Learning Objectives
After completing this project, you will be able to:
- Traverse directory trees recursively - Enumerate files across nested folder structures efficiently
- Compare files using cryptographic hashes - Use SHA256/MD5 for reliable change detection beyond timestamps
- Build production-quality PowerShell cmdlets - Implement CmdletBinding with -WhatIf, -Confirm, and -Verbose
- Handle errors at scale - Gracefully recover from individual file failures without stopping the entire sync
- Implement efficient comparison algorithms - Use hashtables for O(1) lookups instead of nested loops
- Design robust logging systems - Track all operations for audit trails and debugging
- Understand sync strategies - Mirror vs. bidirectional sync with conflict resolution
Deep Theoretical Foundation
Filesystem Traversal and Recursion
Understanding directory structure:
A filesystem is a tree data structure where directories (folders) are nodes and files are leaves:
C:\Source\
├── Documents\
│ ├── report.docx
│ ├── data.xlsx
│ └── Archive\
│ ├── 2024_report.docx
│ └── 2023_report.docx
├── Images\
│ ├── photo1.jpg
│ └── photo2.png
└── config.json
Recursive traversal algorithm:
FUNCTION TraverseDirectory(path):
FOR each item in path:
IF item is a file:
PROCESS file
ELSE IF item is a directory:
TraverseDirectory(item) // Recursive call
In PowerShell, Get-ChildItem -Recurse handles this automatically:
# Non-recursive - only immediate children
Get-ChildItem -Path "C:\Source" -File
# Recursive - all descendants
Get-ChildItem -Path "C:\Source" -File -Recurse
Why recursion matters for sync:
When syncing directories, you need to:
- Visit every file in the source
- Determine its relative path from root
- Check if corresponding file exists in destination
- Compare content if it exists
- Handle nested subdirectories that may or may not exist
Calculating relative paths:
# Given: C:\Source\Documents\report.docx
# Root: C:\Source\
# Relative path: Documents\report.docx
$rootPath = "C:\Source\"
$fullPath = "C:\Source\Documents\report.docx"
# Method 1: String manipulation
$relativePath = $fullPath.Substring($rootPath.Length)
# Method 2: .NET Path methods (available in PowerShell 7+ / .NET Core)
$relativePath = [System.IO.Path]::GetRelativePath($rootPath, $fullPath)
Relative paths are crucial because they let you map source files to destination files regardless of the actual root paths.
File Hashing for Change Detection
The fundamental problem:
How do you know if two files are the same? Consider:
File A: C:\Source\report.docx (1,234,567 bytes, modified 2024-12-25 10:30)
File B: D:\Backup\report.docx (1,234,567 bytes, modified 2024-12-25 10:30)
Are they identical? Maybe. The timestamp and size match, but:
Why timestamp comparison fails:
| Scenario | Problem |
|---|---|
| Timezone differences | File copied across timezones shows different time |
| Daylight saving | Timestamps shift by 1 hour twice yearly |
| Clock drift | Different machines have different clocks |
| Copy tools | Some preserve original, others use current time |
| Deliberate changes | User may edit an older file |
| Filesystem limitations | FAT32 has 2-second resolution, NTFS has 100-nanosecond |
The reliable solution: cryptographic hashes
A hash function transforms any input into a fixed-size output (digest):
"Hello World" → SHA256 → a591a6d40bf420404a011733cfb7b190d62c65bf0bcda32b57b277d9ad9f146e
"Hello World!" → SHA256 → 7f83b1657ff1fc53b92dc18148a1d65dfc2d4b1fa3d677284addd200126d9069
Key properties:
- Deterministic: Same input always produces same output
- Fixed size: Output is always 256 bits (for SHA256) regardless of input size
- Avalanche effect: Small input change causes massive output change
- One-way: Cannot reverse-engineer input from output
- Collision-resistant: Extremely unlikely for two different inputs to produce same output
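These properties are easy to observe interactively. Get-FileHash only accepts files and streams, so the sketch below hashes strings through .NET directly to show determinism and the avalanche effect:

```powershell
# Sketch: observing hash properties on strings via .NET
$sha256 = [System.Security.Cryptography.SHA256]::Create()

function Get-StringHash([string]$Text) {
    $bytes = [System.Text.Encoding]::UTF8.GetBytes($Text)
    # Convert the 32-byte digest to the familiar hex string
    ([System.BitConverter]::ToString($sha256.ComputeHash($bytes))) -replace '-', ''
}

Get-StringHash "Hello World"    # Same input -> same 64-hex-char output, every run
Get-StringHash "Hello World!"   # One added character -> completely different digest
$sha256.Dispose()
```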
Hash algorithm comparison:
| Algorithm | Output Size | Speed | Security | Use Case |
|---|---|---|---|---|
| MD5 | 128 bits | Fastest | Broken (collisions found) | Quick checks, non-security |
| SHA1 | 160 bits | Fast | Weakened | Legacy compatibility |
| SHA256 | 256 bits | Medium | Strong | Recommended for file sync |
| SHA512 | 512 bits | Slower | Strongest | High-security environments |
| xxHash | 64/128 bits | Very fast | Non-cryptographic | When speed matters more than security |
PowerShell hash calculation:
# Get hash of a file
$hash = Get-FileHash -Path "C:\file.txt" -Algorithm SHA256
$hash.Hash # Returns: ABC123DEF456...
# Comparing two files
$sourceHash = (Get-FileHash -Path $sourcePath -Algorithm SHA256).Hash
$destHash = (Get-FileHash -Path $destPath -Algorithm SHA256).Hash
if ($sourceHash -eq $destHash) {
Write-Host "Files are identical"
} else {
Write-Host "Files differ - need to sync"
}
Performance consideration:
Hashing reads the entire file, so for large files it can be slow:
| File Size | Approximate Hash Time (SSD) |
|---|---|
| 1 MB | ~5 ms |
| 100 MB | ~200 ms |
| 1 GB | ~2 seconds |
| 10 GB | ~20 seconds |
Optimization strategy: Multi-tier comparison
# Tier 1: Size check (instant)
if ($source.Length -ne $dest.Length) {
return "DIFFERENT"
}
# Tier 2: Quick timestamp check (instant, but only a heuristic - see table above)
if ($source.LastWriteTime -gt $dest.LastWriteTime) {
return "SOURCE_NEWER"
}
# Tier 3: Hash comparison (slow but definitive)
$sourceHash = (Get-FileHash -Path $source.FullName -Algorithm SHA256).Hash
$destHash = (Get-FileHash -Path $dest.FullName -Algorithm SHA256).Hash
if ($sourceHash -ne $destHash) {
return "DIFFERENT"
}
return "IDENTICAL"
Advanced Functions and CmdletBinding
What makes a PowerShell function “advanced”?
Basic functions:
function Copy-MyFile {
param($Source, $Destination)
Copy-Item -Path $Source -Destination $Destination
}
Advanced functions add professional-grade features:
function Copy-MyFile {
[CmdletBinding(SupportsShouldProcess=$true, ConfirmImpact='High')]
param(
[Parameter(Mandatory=$true, Position=0, ValueFromPipeline=$true)]
[ValidateScript({Test-Path $_ -PathType Leaf})]
[string]$Source,
[Parameter(Mandatory=$true, Position=1)]
[string]$Destination
)
process {
if ($PSCmdlet.ShouldProcess($Source, "Copy file")) {
Copy-Item -Path $Source -Destination $Destination
Write-Verbose "Copied $Source to $Destination"
}
}
}
CmdletBinding parameters explained:
| Parameter | Purpose |
|---|---|
| SupportsShouldProcess | Enables -WhatIf and -Confirm parameters |
| ConfirmImpact | Controls automatic confirmation prompting |
| DefaultParameterSetName | Sets the default when multiple parameter sets exist |
| PositionalBinding | Allows positional parameter usage |
SupportsShouldProcess in depth:
When you add SupportsShouldProcess=$true, PowerShell automatically adds -WhatIf and -Confirm parameters:
# User can run:
Sync-Directories -Source C:\A -Destination D:\B -WhatIf # Preview only
Sync-Directories -Source C:\A -Destination D:\B -Confirm # Confirm each action
Sync-Directories -Source C:\A -Destination D:\B # Execute normally
Using $PSCmdlet.ShouldProcess():
# Single parameter: operation name defaults to the name of the calling command
if ($PSCmdlet.ShouldProcess($targetFile)) {
    Remove-Item $targetFile
}
# Output: What if: Performing the operation "Sync-Directories" on target "C:\file.txt".
# Two parameters: "Performing operation on target"
if ($PSCmdlet.ShouldProcess($targetFile, "Delete file")) {
Remove-Item $targetFile
}
# Output: What if: Performing the operation "Delete file" on target "C:\file.txt".
# Three parameters: Complete control
if ($PSCmdlet.ShouldProcess($target, $action, $verboseWarning)) {
# Action
}
ConfirmImpact levels:
| Level | Behavior |
|---|---|
| None | Never auto-prompts |
| Low | Prompts if $ConfirmPreference is Low or lower |
| Medium | Prompts if $ConfirmPreference is Medium or lower |
| High | Always prompts unless -Confirm:$false specified |
For a file sync tool that deletes files, use ConfirmImpact='High'.
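Because a High-impact function prompts by default, unattended runs (such as a scheduled task) must opt out explicitly. A hedged example, assuming Sync-Directories is declared with ConfirmImpact='High' as above:

```powershell
# Interactive: prompts before each destructive operation (ConfirmImpact='High')
Sync-Directories -Source 'C:\Work' -Destination 'D:\Backup' -Mirror

# Unattended (e.g., a scheduled task): explicitly suppress the confirmation prompt
Sync-Directories -Source 'C:\Work' -Destination 'D:\Backup' -Mirror -Confirm:$false
```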
Write-Verbose for debugging:
function Sync-Directories {
[CmdletBinding()]
param(...)
# These only appear when user specifies -Verbose
Write-Verbose "Starting directory comparison..."
Write-Verbose "Found $($files.Count) files in source"
Write-Verbose "Hash comparison for: $($file.Name)"
}
# User runs:
Sync-Directories -Source C:\A -Destination D:\B -Verbose
Error Handling at Scale
When syncing thousands of files, you need strategies beyond simple try/catch:
The problem with stopping on first error:
# Bad: One locked file stops everything
foreach ($file in $files) {
Copy-Item -Path $file.FullName -Destination $dest # Throws on locked file
}
# Result: 5000 files to sync, stops after 100 due to one locked file
Per-file error handling:
$results = @{
Successful = [System.Collections.ArrayList]::new()
Failed = [System.Collections.ArrayList]::new()
}
foreach ($file in $files) {
try {
Copy-Item -Path $file.FullName -Destination $dest -ErrorAction Stop
[void]$results.Successful.Add($file.Name)
}
catch {
[void]$results.Failed.Add([PSCustomObject]@{
File = $file.Name
Error = $_.Exception.Message
})
Write-Warning "Failed to copy $($file.Name): $($_.Exception.Message)"
# Continue with next file
}
}
Write-Host "Completed: $($results.Successful.Count) succeeded, $($results.Failed.Count) failed"
Error categories and handling strategies:
| Error Type | Cause | Strategy |
|---|---|---|
| File locked | Another process has exclusive access | Log and skip, retry later |
| Access denied | Insufficient permissions | Log and skip, flag for attention |
| Path too long | Path exceeds 260 characters | Use the \\?\ prefix or PowerShell 7 |
| Disk full | Destination has no space | Stop sync, alert user |
| Network error | Transient network issue | Retry with exponential backoff |
| File changed during copy | Source modified mid-copy | Re-read and retry |
ErrorAction preference values:
# Per-command
Copy-Item -Path $src -Destination $dest -ErrorAction Stop # Throw exception
Copy-Item -Path $src -Destination $dest -ErrorAction Continue # Show error, continue
Copy-Item -Path $src -Destination $dest -ErrorAction SilentlyContinue # Suppress
Copy-Item -Path $src -Destination $dest -ErrorAction Inquire # Ask user
# Script-wide
$ErrorActionPreference = 'Stop' # All errors throw exceptions
Retry logic with exponential backoff:
function Invoke-WithRetry {
param(
[ScriptBlock]$ScriptBlock,
[int]$MaxRetries = 3,
[int]$InitialDelayMs = 100
)
$attempt = 0
$delay = $InitialDelayMs
while ($true) {
try {
return & $ScriptBlock
}
catch {
$attempt++
if ($attempt -ge $MaxRetries) {
throw $_
}
Write-Verbose "Attempt $attempt failed, retrying in ${delay}ms..."
Start-Sleep -Milliseconds $delay
$delay *= 2 # Exponential backoff: 100, 200, 400, 800...
}
}
}
# Usage
Invoke-WithRetry {
Copy-Item -Path $src -Destination $dest -ErrorAction Stop
}
Comparison Algorithms: Naive vs. Optimized
The naive approach (O(n^2)):
# For each source file, scan all destination files
foreach ($sourceFile in $sourceFiles) { # n iterations
foreach ($destFile in $destFiles) { # n iterations per source
if ($sourceFile.Name -eq $destFile.Name) {
# Compare
break
}
}
}
# Total: n * n = n^2 comparisons
Performance impact:
- 100 files: 10,000 comparisons
- 1,000 files: 1,000,000 comparisons
- 10,000 files: 100,000,000 comparisons (VERY slow)
The optimized approach (O(n)):
# Step 1: Build hashtable from destination (O(n))
$destTable = @{}
foreach ($file in $destFiles) { # n iterations
$relativePath = GetRelativePath $file $destRoot
$destTable[$relativePath] = $file # O(1) insertion
}
# Step 2: Check each source file (O(n))
foreach ($file in $sourceFiles) { # n iterations
$relativePath = GetRelativePath $file $sourceRoot
if ($destTable.ContainsKey($relativePath)) { # O(1) lookup
# File exists in both - compare hashes
} else {
# File only in source - needs copy
}
}
# Total: n + n = 2n = O(n)
Visual comparison:
NAIVE (Nested Loops):
Source: [A, B, C, D, E]
Dest: [B, D, E, F, G]
A vs B? No A vs D? No A vs E? No A vs F? No A vs G? No (5 comparisons)
B vs B? Yes! (1 comparison)
C vs B? No C vs D? No C vs E? No C vs F? No C vs G? No (5 comparisons)
...
Total: Many comparisons
HASHTABLE (Single Pass):
Build table from Dest: {B: file, D: file, E: file, F: file, G: file}
A in table? No -> Copy
B in table? Yes -> Compare
C in table? No -> Copy
D in table? Yes -> Compare
E in table? Yes -> Compare
Total: 5 lookups (plus 5 insertions)
Hashtable structure for file comparison:
# Each entry in the hashtable
$fileEntry = [PSCustomObject]@{
RelativePath = "Documents\report.docx" # Key
FullPath = "C:\Source\Documents\report.docx"
Hash = "ABC123..." # Calculated on demand
Size = 1234567
LastWriteTime = [DateTime]"2024-12-25 10:30:00"
}
# The hashtable itself
$fileTable = @{
"Documents\report.docx" = $fileEntry1
"Documents\data.xlsx" = $fileEntry2
"Images\photo.jpg" = $fileEntry3
}
# O(1) lookup
$file = $fileTable["Documents\report.docx"]
Sync Direction Strategies
One-way mirror (Source -> Destination):
The destination becomes an exact copy of source:
Source Destination (before) Destination (after)
├── file1.txt (new) ├── file2.txt ├── file1.txt (copied)
├── file2.txt (unchanged) ├── file3.txt (obsolete) ├── file2.txt (kept)
└── file4.txt (modified) └── file4.txt (old) └── file4.txt (updated)
# file3.txt DELETED
Operations:
- New in source: Copy to destination
- Modified in source: Update in destination
- Missing from source: Delete from destination (mirror mode)
- Unchanged: Skip
Two-way sync (bidirectional):
Both directories can have changes, merged together:
Source (before) Destination (before)
├── file1.txt (new here) ├── file2.txt (new here)
├── shared.txt (modified) ├── shared.txt (also modified!) <- CONFLICT
└── old.txt └── old.txt
After two-way sync:
├── file1.txt ├── file1.txt (copied from source)
├── file2.txt (copied from dest) ├── file2.txt
├── shared.txt (???) ├── shared.txt (???) <- How to resolve?
└── old.txt └── old.txt
Conflict resolution strategies:
| Strategy | Description | Use Case |
|---|---|---|
| Source wins | Always take source version | Authoritative source |
| Destination wins | Always take destination version | Destination is primary |
| Newer wins | Take file with later timestamp | General sync |
| Larger wins | Take larger file (no data loss) | Append-only logs |
| Keep both | Rename conflicting files | User decides later |
| Manual | Stop and ask user | Critical data |
| Merge | Attempt to merge changes | Text files with merge tools |
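As an illustration of the table, a minimal "newer wins" resolver might look like the sketch below. This is a hypothetical helper for the bidirectional extension, not part of this project's one-way implementation:

```powershell
# Sketch: "newer wins" conflict resolution for bidirectional sync
# $SourceFile / $DestFile are the per-file entry objects built during enumeration
function Resolve-Conflict {
    param($SourceFile, $DestFile)

    if ($SourceFile.LastWriteTime -gt $DestFile.LastWriteTime) {
        return 'CopySourceToDest'   # source is newer - push it
    }
    elseif ($DestFile.LastWriteTime -gt $SourceFile.LastWriteTime) {
        return 'CopyDestToSource'   # destination is newer - pull it
    }
    else {
        return 'NoAction'           # identical timestamps - treat as in sync
    }
}
```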
This project implements one-way mirror - the simpler and more common use case. Two-way sync requires tracking modification times, detecting conflicts, and implementing resolution strategies, which adds significant complexity.
Complete Project Specification
Functional Requirements
| ID | Requirement | Priority | Implementation Phase |
|---|---|---|---|
| F1 | Compare two directories recursively | Must Have | Phase 1 |
| F2 | Identify new files in source | Must Have | Phase 1 |
| F3 | Identify modified files (by hash) | Must Have | Phase 3 |
| F4 | Identify files to delete (mirror mode) | Must Have | Phase 1 |
| F5 | Copy new/modified files to destination | Must Have | Phase 4 |
| F6 | Delete orphaned files (mirror mode) | Must Have | Phase 4 |
| F7 | Support -WhatIf parameter | Must Have | Phase 2 |
| F8 | Support -Confirm parameter | Must Have | Phase 2 |
| F9 | Support -Verbose parameter | Must Have | Phase 2 |
| F10 | Use file hashes (SHA256) for comparison | Should Have | Phase 3 |
| F11 | Show progress bar during sync | Should Have | Phase 4 |
| F12 | Log all operations to file | Should Have | Phase 5 |
| F13 | Handle locked files gracefully | Should Have | Phase 4 |
| F14 | Support exclude patterns (wildcards) | Nice to Have | Phase 5 |
| F15 | Retry failed operations | Nice to Have | Phase 5 |
| F16 | Generate summary report | Nice to Have | Phase 5 |
Non-Functional Requirements
| ID | Requirement | Target |
|---|---|---|
| NF1 | Compare 10,000 files in < 30 seconds | Performance |
| NF2 | Memory usage < 500MB for large directories | Resource efficiency |
| NF3 | Continue after individual file errors | Reliability |
| NF4 | Provide accurate progress indication | User experience |
| NF5 | Work with paths up to 32,000 characters | Compatibility |
| NF6 | Support PowerShell 5.1 and 7+ | Compatibility |
Real World Outcome
When complete, you’ll have a production-ready sync tool:
Example -WhatIf Output
PS> Sync-Directories -Source C:\Work -Destination D:\Backup -Mirror -WhatIf
================================================================================
FILE SYNC PREVIEW (DRY RUN)
================================================================================
Source: C:\Work
Destination: D:\Backup
Mode: Mirror (destination will match source exactly)
================================================================================
ANALYZING DIRECTORIES...
Source files found: 1,247
Destination files found: 1,189
Comparison complete in: 3.2 seconds
--------------------------------------------------------------------------------
FILES TO BE COPIED (23 new files, 47.3 MB total):
--------------------------------------------------------------------------------
[NEW] Documents\Q4_Report.docx (2.1 MB)
[NEW] Documents\Budget_2025.xlsx (856 KB)
[NEW] Projects\ClientA\proposal.pdf (4.2 MB)
[NEW] Projects\ClientA\mockups\design_v3.psd (12.8 MB)
[NEW] Scripts\deploy.ps1 (3 KB)
... and 18 more files
--------------------------------------------------------------------------------
FILES TO BE UPDATED (12 modified files, 8.7 MB total):
--------------------------------------------------------------------------------
[MOD] Documents\Contacts.csv (45 KB -> 52 KB)
Reason: Content hash differs
[MOD] Projects\ClientB\timeline.xlsx (1.2 MB -> 1.4 MB)
Reason: Content hash differs
[MOD] config.json (2 KB -> 2 KB)
Reason: Content hash differs (same size)
... and 9 more files
--------------------------------------------------------------------------------
FILES TO BE DELETED (34 orphaned files, 156.2 MB total):
--------------------------------------------------------------------------------
[DEL] Archive\old_project.zip (89.4 MB)
Reason: Not present in source
[DEL] temp\cache.dat (45.2 MB)
Reason: Not present in source
[DEL] Documents\draft_v1.docx (1.2 MB)
Reason: Not present in source
... and 31 more files
================================================================================
SUMMARY
================================================================================
Files to copy: 23 files (47.3 MB)
Files to update: 12 files (8.7 MB)
Files to delete: 34 files (156.2 MB)
Files unchanged: 1,178 files
--------------------------------------------------------------------------------
Total operations: 69
Estimated time: ~45 seconds
================================================================================
This was a preview only - no changes were made. Re-run without -WhatIf to apply these changes.
Progress Bar Example
Syncing files [=========================> ] 65% (812/1,247)
Currently: Copying Projects\ClientA\mockups\design_v3.psd (12.8 MB)
Speed: 45.2 MB/s | Elapsed: 00:01:23 | Remaining: ~00:00:44
Log File Format
# sync_log_20241227_143052.txt
================================================================================
SYNC LOG: 2024-12-27 14:30:52
Source: C:\Work
Destination: D:\Backup
Mode: Mirror
================================================================================
[14:30:52] START - Beginning file synchronization
[14:30:52] INFO - Enumerating source directory: C:\Work
[14:30:53] INFO - Found 1,247 files in source (2.3 GB total)
[14:30:53] INFO - Enumerating destination directory: D:\Backup
[14:30:54] INFO - Found 1,189 files in destination (2.1 GB total)
[14:30:55] INFO - Comparison complete: 23 to copy, 12 to update, 34 to delete
[14:30:55] COPY - Documents\Q4_Report.docx (2.1 MB) -> SUCCESS
[14:30:55] COPY - Documents\Budget_2025.xlsx (856 KB) -> SUCCESS
[14:30:56] COPY - Projects\ClientA\proposal.pdf (4.2 MB) -> SUCCESS
[14:30:58] COPY - Projects\ClientA\mockups\design_v3.psd (12.8 MB) -> SUCCESS
[14:30:58] WARN - Scripts\deploy.ps1 - Access denied, file in use by another process
[14:30:58] RETRY - Scripts\deploy.ps1 - Attempt 1 of 3
[14:30:59] RETRY - Scripts\deploy.ps1 - Attempt 2 of 3
[14:31:00] COPY - Scripts\deploy.ps1 (3 KB) -> SUCCESS (after retry)
[14:31:15] UPDATE - Documents\Contacts.csv (45 KB -> 52 KB) -> SUCCESS
[14:31:15] UPDATE - Projects\ClientB\timeline.xlsx (1.2 MB -> 1.4 MB) -> SUCCESS
[14:31:16] UPDATE - config.json (2 KB) -> SUCCESS
[14:31:25] DELETE - Archive\old_project.zip (89.4 MB) -> SUCCESS
[14:31:25] DELETE - temp\cache.dat (45.2 MB) -> SUCCESS
[14:31:26] ERROR - Documents\draft_v1.docx - Access denied (file locked)
[14:31:26] DELETE - remaining 31 files -> SUCCESS
[14:31:45] COMPLETE - Synchronization finished
================================================================================
SUMMARY
================================================================================
Copied: 23 files (47.3 MB)
Updated: 12 files (8.7 MB)
Deleted: 33 files (155.0 MB) - 1 failed
Unchanged: 1,178 files
Errors: 2 (1 recovered via retry, 1 permanent)
Duration: 53 seconds
================================================================================
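Log lines in the shape shown above could be produced by a small helper like the following. This is a sketch; the function name, log path, and format string are assumptions, not part of the specification:

```powershell
# Sketch: append a timestamped entry like "[14:30:55] COPY - file -> SUCCESS"
function Write-SyncLog {
    param(
        [string]$LogPath,
        [ValidateSet('START','INFO','COPY','UPDATE','DELETE','WARN','RETRY','ERROR','COMPLETE')]
        [string]$Level,
        [string]$Message
    )
    $line = "[{0:HH:mm:ss}] {1} - {2}" -f (Get-Date), $Level, $Message
    Add-Content -Path $LogPath -Value $line
}

# Usage (hypothetical log path)
Write-SyncLog -LogPath 'C:\Logs\sync_log.txt' -Level COPY `
              -Message 'Documents\Q4_Report.docx (2.1 MB) -> SUCCESS'
```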
Conflict Resolution Examples
When running in Mirror mode, conflicts are resolved automatically (source wins):
PS> Sync-Directories -Source C:\Work -Destination D:\Backup -Mirror -Verbose
VERBOSE: Conflict detected: Documents\shared.docx
Source: Modified 2024-12-27 14:30 (Size: 45,678 bytes, Hash: ABC123...)
Destination: Modified 2024-12-27 13:15 (Size: 44,892 bytes, Hash: DEF456...)
Resolution: Source wins (Mirror mode) - destination will be overwritten
VERBOSE: Copied Documents\shared.docx -> D:\Backup\Documents\shared.docx
For future bidirectional sync (extension challenge):
PS> Sync-Directories -Source C:\Work -Destination D:\Backup -TwoWay -ConflictResolution KeepBoth
WARNING: Conflict detected for Documents\shared.docx
Source modified: 2024-12-27 14:30:22
Destination modified: 2024-12-27 14:28:45
Resolution: Keeping both versions
-> Documents\shared.docx (destination version)
-> Documents\shared_CONFLICT_20241227_143022.docx (source version)
Solution Architecture
Component Diagram
┌─────────────────────────────────────────────────────────────────────────────┐
│ SYNC-DIRECTORIES │
│ (Main Entry Point) │
│ ┌───────────────────────────────────────────────────────────────────────┐ │
│ │ Parameters: │ │
│ │ - Source (mandatory) - HashAlgorithm (SHA256/MD5) │ │
│ │ - Destination (mandatory) - Exclude (patterns) │ │
│ │ - Mirror (switch) - LogPath (optional) │ │
│ │ - WhatIf, Confirm, Verbose (automatic via CmdletBinding) │ │
│ └───────────────────────────────────────────────────────────────────────┘ │
│ │ │
│ v │
│ ┌───────────────────────────────────────────────────────────────────────┐ │
│ │ DIRECTORY ENUMERATOR │ │
│ │ ┌─────────────────────────┐ ┌─────────────────────────┐ │ │
│ │ │ Get-FileHashTable │ │ Get-FileHashTable │ │ │
│ │ │ (Source) │ │ (Destination) │ │ │
│ │ │ │ │ │ │ │
│ │ │ Returns: Hashtable │ │ Returns: Hashtable │ │ │
│ │ │ Key: RelativePath │ │ Key: RelativePath │ │ │
│ │ │ Value: FileInfo │ │ Value: FileInfo │ │ │
│ │ └────────────┬────────────┘ └─────────────┬───────────┘ │ │
│ │ │ │ │ │
│ │ v v │ │
│ │ ┌─────────────────────────────────────────────────────────────────┐ │ │
│ │ │ COMPARE-DIRECTORIES │ │ │
│ │ │ │ │ │
│ │ │ Input: SourceTable, DestTable, Mirror flag │ │ │
│ │ │ │ │ │
│ │ │ Output: ComparisonResult │ │ │
│ │ │ ├── ToCopy[] (files only in source) │ │ │
│ │ │ ├── ToUpdate[] (files in both, different hash) │ │ │
│ │ │ ├── ToDelete[] (files only in dest, if Mirror) │ │ │
│ │ │ └── Unchanged[] (files match exactly) │ │ │
│ │ └──────────────────────────────┬──────────────────────────────────┘ │ │
│ └───────────────────────────────────────────────────────────────────────┘ │
│ │ │
│ v │
│ ┌───────────────────────────────────────────────────────────────────────┐ │
│ │ OPERATION EXECUTOR │ │
│ │ │ │
│ │ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │ │
│ │ │ Copy Files │ │Update Files │ │Delete Files │ │ Logging │ │ │
│ │ │ │ │ │ │ │ │ │ │ │
│ │ │ ShouldProc │ │ ShouldProc │ │ ShouldProc │ │ Write-Log │ │ │
│ │ │ Copy-Item │ │ Copy-Item │ │ Remove-Item │ │ to file │ │ │
│ │ └─────────────┘ └─────────────┘ └─────────────┘ └─────────────┘ │ │
│ │ │ │
│ │ ┌───────────────────────────────────────────────────────────────────┐│ │
│ │ │ PROGRESS REPORTER ││ │
│ │ │ Write-Progress with current file, percentage, ETA ││ │
│ │ └───────────────────────────────────────────────────────────────────┘│ │
│ └───────────────────────────────────────────────────────────────────────┘ │
│ │ │
│ v │
│ ┌───────────────────────────────────────────────────────────────────────┐ │
│ │ RESULTS REPORTER │ │
│ │ │ │
│ │ Output: Summary object with counts, errors, duration │ │
│ └───────────────────────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────────────────────┘
Hashtable Structure for File Comparison
# The hashtable key is the relative path (case-insensitive on Windows)
# This enables O(1) lookups when comparing directories
$sourceTable = @{
"Documents\report.docx" = [PSCustomObject]@{
RelativePath = "Documents\report.docx"
FullPath = "C:\Source\Documents\report.docx"
Hash = "E3B0C44298FC1C149AFBF4C8996FB92427AE41E4649B934CA495991B7852B855"
Size = 1048576 # 1 MB
LastWriteTime = [DateTime]"2024-12-27 14:30:00"
}
"Documents\data.csv" = [PSCustomObject]@{
RelativePath = "Documents\data.csv"
FullPath = "C:\Source\Documents\data.csv"
Hash = "ABC123..."
Size = 524288 # 512 KB
LastWriteTime = [DateTime]"2024-12-26 09:15:00"
}
# ... more files
}
$destTable = @{
"Documents\report.docx" = [PSCustomObject]@{
RelativePath = "Documents\report.docx"
FullPath = "D:\Backup\Documents\report.docx"
Hash = "DIFFERENT_HASH_123..." # <-- Different! Needs update
Size = 1024000
LastWriteTime = [DateTime]"2024-12-20 10:00:00"
}
"Documents\old_file.txt" = [PSCustomObject]@{
# This file doesn't exist in source - delete if Mirror mode
...
}
}
# Comparison result structure
$comparisonResult = [PSCustomObject]@{
ToCopy = [System.Collections.ArrayList]@(
# Files in source but not in destination
)
ToUpdate = [System.Collections.ArrayList]@(
[PSCustomObject]@{
Source = $sourceTable["Documents\report.docx"]
Destination = $destTable["Documents\report.docx"]
Reason = "Content hash differs"
}
)
ToDelete = [System.Collections.ArrayList]@(
# Files in destination but not in source (Mirror mode)
)
Unchanged = [System.Collections.ArrayList]@(
# Files that match exactly
)
Errors = [System.Collections.ArrayList]@(
# Files that couldn't be processed
)
}
Sync Direction Strategies (Visual)
ONE-WAY MIRROR (Source -> Destination):
=========================================
Source Directory Operations Destination Directory
(After Sync)
├── new_file.txt ---> COPY --------> ├── new_file.txt
├── modified.docx ---> UPDATE -----> ├── modified.docx
├── unchanged.pdf ---> SKIP -------> ├── unchanged.pdf
DELETE <---- X orphan.tmp (deleted)
TWO-WAY SYNC (Bidirectional):
=========================================
Source Directory Operations Destination Directory
(After Sync)
├── source_new.txt ---> COPY --------> ├── source_new.txt
├── conflict.docx ---> RESOLVE ----> ├── conflict.docx (resolution)
├── unchanged.pdf <--- SKIP -------> ├── unchanged.pdf
<--- COPY -------- ├── dest_new.txt
├── dest_new.txt └── dest_new.txt
Phased Implementation Guide
Phase 1: Directory Comparison (2-3 hours)
Goal: List files in source and destination, identify new and missing files.
Learning focus: Recursive enumeration, relative paths, basic comparison
Steps:
- Create the main function skeleton with CmdletBinding
- Validate that source exists
- Create destination if it doesn’t exist
- Enumerate source files with Get-ChildItem -Recurse -File
- Calculate relative paths for each file
- Build a hashtable keyed by relative path
- Repeat for destination
- Compare the two hashtables to find:
- Files only in source (ToCopy)
- Files in both (need hash comparison later)
- Files only in destination (ToDelete if mirror mode)
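The comparison in the last step can be sketched as a single pass over the two hashtables (assuming $sourceTable and $destTable are keyed by relative path, as built in the earlier steps):

```powershell
# Sketch: classify files by comparing source/destination hashtables
$result = [PSCustomObject]@{
    ToCopy   = [System.Collections.ArrayList]::new()
    Both     = [System.Collections.ArrayList]::new()  # hash-compared in Phase 3
    ToDelete = [System.Collections.ArrayList]::new()
}

foreach ($key in $sourceTable.Keys) {
    if ($destTable.ContainsKey($key)) {
        [void]$result.Both.Add($sourceTable[$key])       # exists in both
    } else {
        [void]$result.ToCopy.Add($sourceTable[$key])     # only in source
    }
}

if ($Mirror) {
    foreach ($key in $destTable.Keys) {
        if (-not $sourceTable.ContainsKey($key)) {
            [void]$result.ToDelete.Add($destTable[$key]) # only in destination
        }
    }
}
```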
Verification:
# Create test directories
New-Item -ItemType Directory -Path "C:\TestSync\Source" -Force
New-Item -ItemType Directory -Path "C:\TestSync\Dest" -Force
"Content A" | Out-File "C:\TestSync\Source\fileA.txt"
"Content B" | Out-File "C:\TestSync\Source\fileB.txt"
"Content C" | Out-File "C:\TestSync\Dest\fileC.txt"
# Run comparison
$result = Compare-Directories -Source "C:\TestSync\Source" -Destination "C:\TestSync\Dest"
# Verify
$result.ToCopy.Count # Should be 2 (fileA, fileB)
$result.ToDelete.Count # Should be 1 (fileC) if mirror mode
Key code patterns:
# Calculating relative path
$rootLength = $RootPath.TrimEnd('\').Length + 1
$relativePath = $file.FullName.Substring($rootLength)
# Building the hashtable
$fileTable = @{}
Get-ChildItem -Path $RootPath -Recurse -File | ForEach-Object {
$relativePath = $_.FullName.Substring($rootLength)
$fileTable[$relativePath] = [PSCustomObject]@{
RelativePath = $relativePath
FullPath = $_.FullName
Size = $_.Length
LastWriteTime = $_.LastWriteTime
}
}
Phase 2: CmdletBinding and Parameters (2-3 hours)
Goal: Add professional parameter handling with -WhatIf, -Confirm, -Verbose.
Learning focus: Advanced functions, parameter validation, ShouldProcess
Steps:
- Add [CmdletBinding(SupportsShouldProcess=$true, ConfirmImpact='High')]
- Define parameters with validation:
  - [ValidateScript({Test-Path $_ -PathType Container})] for Source
  - [ValidateSet('SHA256', 'SHA1', 'MD5')] for HashAlgorithm
- Add Write-Verbose statements throughout
- Wrap file operations in $PSCmdlet.ShouldProcess() calls
- Test all three modes: -WhatIf, -Confirm, and normal execution
Verification:
# Test -WhatIf (should show what would happen, no actual changes)
Sync-Directories -Source C:\A -Destination D:\B -WhatIf
# Verify: D:\B is unchanged
# Test -Verbose (should show detailed progress)
Sync-Directories -Source C:\A -Destination D:\B -Verbose -WhatIf
# Verify: See VERBOSE: messages
# Test -Confirm (should prompt for each operation)
Sync-Directories -Source C:\A -Destination D:\B -Confirm
# Verify: Prompts "Are you sure?" for each file
Key code patterns:
function Sync-Directories {
[CmdletBinding(SupportsShouldProcess=$true, ConfirmImpact='High')]
param(
[Parameter(Mandatory=$true, Position=0)]
[ValidateScript({Test-Path $_ -PathType Container})]
[string]$Source,
[Parameter(Mandatory=$true, Position=1)]
[string]$Destination,
[switch]$Mirror
)
process {
Write-Verbose "Comparing $Source to $Destination..."
foreach ($file in $filesToCopy) {
if ($PSCmdlet.ShouldProcess($file.RelativePath, "Copy new file")) {
Copy-Item -Path $file.FullPath -Destination $destPath
}
}
}
}
Phase 3: File Hashing (2-3 hours)
Goal: Use SHA256 hashes to accurately detect modified files.
Learning focus: Cryptographic hashes, performance optimization
Steps:
- Add hash calculation to file enumeration
- Handle errors during hash calculation (locked files)
- Compare hashes for files that exist in both locations
- Only mark as “ToUpdate” if hashes differ
- Implement hash algorithm parameter (SHA256/SHA1/MD5)
- Add hash caching to avoid recalculating
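The caching in the last step can be done lazily: store the hash on the file entry the first time it is requested, so files ruled out by the size check are never hashed at all. A sketch, assuming each entry object carries a Hash property initialized to $null:

```powershell
# Sketch: compute a file's hash once, then reuse the cached value
function Get-CachedHash {
    param($FileEntry, [string]$Algorithm = 'SHA256')

    if (-not $FileEntry.Hash) {
        # First request for this file: read it and remember the result
        $FileEntry.Hash = (Get-FileHash -Path $FileEntry.FullPath `
                           -Algorithm $Algorithm).Hash
    }
    return $FileEntry.Hash
}
```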
Verification:
# Create files with same name but different content
"Original content" | Out-File "C:\TestSync\Source\shared.txt"
"Modified content" | Out-File "C:\TestSync\Dest\shared.txt"
# Run comparison
$result = Compare-Directories -Source "C:\TestSync\Source" -Destination "C:\TestSync\Dest"
# Verify
$result.ToUpdate.Count # Should be 1 (shared.txt has different hash)
Performance test:
```powershell
# Create 1,000 files for performance testing
1..1000 | ForEach-Object {
    "Content $_" | Out-File "C:\TestSync\Source\file$_.txt"
}

# Measure comparison time
Measure-Command {
    Compare-Directories -Source "C:\TestSync\Source" -Destination "C:\TestSync\Dest"
}
# Target: < 15 seconds for 1,000 files (see PT02); 10,000 files should stay under 30 seconds
```
Key code patterns:
```powershell
function Get-FileHashSafe {
    param(
        [string]$Path,
        [string]$Algorithm = 'SHA256'
    )
    try {
        return (Get-FileHash -Path $Path -Algorithm $Algorithm -ErrorAction Stop).Hash
    }
    catch {
        Write-Warning "Could not hash $Path : $_"
        return $null
    }
}
```
Phase 4: Copy/Delete with Progress (2-3 hours)
Goal: Execute sync operations with progress indication.
Learning focus: Write-Progress, error handling, directory creation
Steps:
- Create destination directories as needed
- Copy new files with Copy-Item
- Update modified files with Copy-Item -Force
- Delete orphaned files (if Mirror mode)
- Add Write-Progress showing current file and percentage
- Handle per-file errors without stopping entire sync
- Track success/failure counts
Verification:
```powershell
# Run actual sync
Sync-Directories -Source C:\TestSync\Source -Destination C:\TestSync\Dest -Verbose

# Verify files were copied
Test-Path "C:\TestSync\Dest\fileA.txt"  # Should be True
Test-Path "C:\TestSync\Dest\fileB.txt"  # Should be True

# Verify content matches
(Get-FileHash "C:\TestSync\Source\fileA.txt").Hash -eq `
    (Get-FileHash "C:\TestSync\Dest\fileA.txt").Hash  # Should be True
```
Key code patterns:
```powershell
$totalOperations = $toCopy.Count + $toUpdate.Count + $toDelete.Count
$completed = 0

foreach ($file in $toCopy) {
    $completed++
    $percent = [int](($completed / $totalOperations) * 100)
    Write-Progress -Activity "Syncing files" `
        -Status "Copying $($file.RelativePath)" `
        -PercentComplete $percent

    $destPath = Join-Path $Destination $file.RelativePath
    $destDir = Split-Path $destPath -Parent
    if (-not (Test-Path $destDir)) {
        New-Item -ItemType Directory -Path $destDir -Force | Out-Null
    }

    try {
        if ($PSCmdlet.ShouldProcess($file.RelativePath, "Copy file")) {
            Copy-Item -Path $file.FullPath -Destination $destPath -Force
            $results.Copied++
        }
    }
    catch {
        Write-Warning "Failed to copy $($file.RelativePath): $_"
        $results.Failed++
    }
}
Write-Progress -Activity "Syncing files" -Completed
```
Phase 5: Logging and Polish (2-3 hours)
Goal: Add comprehensive logging and handle edge cases.
Learning focus: Logging patterns, exclusion filters, final polish
Steps:
- Create logging function that writes to file
- Log all operations (copy, update, delete, errors)
- Add timestamps to log entries
- Implement exclude patterns (e.g., `*.tmp`, `node_modules\*`)
- Add retry logic for transient errors
- Generate final summary report
- Handle edge cases:
- Empty directories
- Read-only files
- Very long paths
- Symbolic links
Verification:
```powershell
# Run sync with logging
Sync-Directories -Source C:\TestSync\Source -Destination C:\TestSync\Dest `
    -LogPath C:\TestSync\sync.log -Verbose

# Check log file
Get-Content C:\TestSync\sync.log

# Test exclude patterns
Sync-Directories -Source C:\TestSync\Source -Destination C:\TestSync\Dest `
    -Exclude "*.tmp", "temp\*" -WhatIf
# Verify excluded files are not in the sync list
```
Key code patterns:
```powershell
function Write-SyncLog {
    param(
        [string]$LogPath,
        [ValidateSet('INFO', 'WARN', 'ERROR', 'COPY', 'UPDATE', 'DELETE')]
        [string]$Level,
        [string]$Message
    )
    $timestamp = Get-Date -Format "yyyy-MM-dd HH:mm:ss"
    $logLine = "[$timestamp] $($Level.PadRight(6)) - $Message"
    Add-Content -Path $LogPath -Value $logLine
    Write-Verbose $Message
}

# Exclusion filtering
$excludePatterns = @("*.tmp", "temp\*", "*.bak")
$files = Get-ChildItem -Path $Source -Recurse -File | Where-Object {
    $relativePath = $_.FullName.Substring($rootLength)
    $excluded = $false
    foreach ($pattern in $excludePatterns) {
        if ($relativePath -like $pattern) {
            Write-Verbose "Excluding: $relativePath (matches $pattern)"
            $excluded = $true
            break
        }
    }
    -not $excluded
}
```
Testing Strategy
Unit Tests
| Test ID | Test Name | Input | Expected Result |
|---|---|---|---|
| UT01 | Compare identical directories | Same files both sides | Empty ToCopy, ToUpdate, ToDelete |
| UT02 | Detect new file | File only in source | File appears in ToCopy list |
| UT03 | Detect modified file | Same file, different content | File appears in ToUpdate list |
| UT04 | Detect deleted file | File only in destination | File appears in ToDelete list (mirror mode) |
| UT05 | Hash calculation | Known content | Known hash value |
| UT06 | Relative path calculation | Full path + root | Correct relative path |
| UT07 | Exclude pattern matching | File matching pattern | File excluded from results |
Integration Tests
| Test ID | Test Name | Steps | Expected Result |
|---|---|---|---|
| IT01 | Full sync - empty dest | Source with 100 files, empty dest | All files copied to dest |
| IT02 | Full sync - partial dest | Source with 100 files, dest with 50 | Missing files copied |
| IT03 | WhatIf mode | Run with -WhatIf | No files actually modified |
| IT04 | Confirm mode | Run with -Confirm | Prompts for each operation |
| IT05 | Mirror delete | Extra files in dest, -Mirror | Extra files deleted |
| IT06 | Error handling | Locked file in source | Skip file, continue with others |
| IT07 | Logging | Run with -LogPath | Log file created with entries |
| IT08 | Progress display | Large file set | Progress bar updates correctly |
Performance Tests
| Test ID | Test Name | Input | Target |
|---|---|---|---|
| PT01 | Small directory | 100 files | < 5 seconds |
| PT02 | Medium directory | 1,000 files | < 15 seconds |
| PT03 | Large directory | 10,000 files | < 30 seconds |
| PT04 | Memory usage | 10,000 files | < 500 MB RAM |
Test Script
```powershell
# test-sync.ps1 - Automated testing script
$testRoot = "C:\TestSync"

# Setup test environment
function Initialize-TestEnvironment {
    Remove-Item -Path $testRoot -Recurse -Force -ErrorAction SilentlyContinue
    New-Item -ItemType Directory -Path "$testRoot\Source" -Force
    New-Item -ItemType Directory -Path "$testRoot\Dest" -Force
}

# Test: New files are detected
function Test-NewFileDetection {
    Initialize-TestEnvironment
    # Create file only in source
    "Content A" | Out-File "$testRoot\Source\newfile.txt"

    $result = Compare-Directories -Source "$testRoot\Source" -Destination "$testRoot\Dest"

    if ($result.ToCopy.Count -eq 1 -and $result.ToCopy[0].RelativePath -eq "newfile.txt") {
        Write-Host "PASS: Test-NewFileDetection" -ForegroundColor Green
    } else {
        Write-Host "FAIL: Test-NewFileDetection" -ForegroundColor Red
    }
}

# Test: Modified files are detected
function Test-ModifiedFileDetection {
    Initialize-TestEnvironment
    # Create file in both with different content
    "Original" | Out-File "$testRoot\Source\shared.txt"
    "Modified" | Out-File "$testRoot\Dest\shared.txt"

    $result = Compare-Directories -Source "$testRoot\Source" -Destination "$testRoot\Dest"

    if ($result.ToUpdate.Count -eq 1) {
        Write-Host "PASS: Test-ModifiedFileDetection" -ForegroundColor Green
    } else {
        Write-Host "FAIL: Test-ModifiedFileDetection" -ForegroundColor Red
    }
}

# Test: WhatIf doesn't modify files
function Test-WhatIfMode {
    Initialize-TestEnvironment
    "Source content" | Out-File "$testRoot\Source\test.txt"
    $originalCount = (Get-ChildItem "$testRoot\Dest" -File).Count

    Sync-Directories -Source "$testRoot\Source" -Destination "$testRoot\Dest" -WhatIf

    $afterCount = (Get-ChildItem "$testRoot\Dest" -File).Count
    if ($afterCount -eq $originalCount) {
        Write-Host "PASS: Test-WhatIfMode" -ForegroundColor Green
    } else {
        Write-Host "FAIL: Test-WhatIfMode" -ForegroundColor Red
    }
}

# Run all tests
Test-NewFileDetection
Test-ModifiedFileDetection
Test-WhatIfMode
```
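The hand-rolled PASS/FAIL checks above work fine, but the same tests could also be phrased with Pester, PowerShell's standard test framework, which handles setup hooks and pass/fail reporting for you. A sketch of one test, assuming the project's `Compare-Directories` function and `Initialize-TestEnvironment` helper are already defined:

```powershell
# Requires the Pester module: Install-Module Pester
Describe "Compare-Directories" {
    BeforeEach {
        # Reset the test directories before every test case
        Initialize-TestEnvironment
    }

    It "detects a file that exists only in the source" {
        "Content A" | Out-File "$testRoot\Source\newfile.txt"

        $result = Compare-Directories -Source "$testRoot\Source" -Destination "$testRoot\Dest"

        $result.ToCopy.Count | Should -Be 1
        $result.ToCopy[0].RelativePath | Should -Be "newfile.txt"
    }
}
```

Run it with `Invoke-Pester .\test-sync.Tests.ps1`; Pester discovers every `It` block and reports the results.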
Common Pitfalls and Debugging Tips
Problem: Hash calculation is extremely slow
Symptoms: Sync takes minutes for small directories
Causes:
- Hashing every file, even when not necessary
- Using SHA512 when SHA256 would suffice
- Files on slow network drives
Solutions:
- Size-first comparison: Skip hash if sizes differ
```powershell
if ($sourceFile.Size -ne $destFile.Size) {
    # Different sizes = definitely different content; no hash needed
    return "DIFFERENT"
}
# Only hash if sizes match
```
- Lazy hash calculation: Only calculate a hash when it is actually needed
```powershell
# Don't pre-calculate all hashes
$fileInfo = [PSCustomObject]@{
    RelativePath = $relativePath
    FullPath     = $fullPath
    Size         = $size
    _hash        = $null  # Calculated on demand
}

# Add a method that computes the hash on first access
Add-Member -InputObject $fileInfo -MemberType ScriptMethod -Name GetHash -Value {
    if ($null -eq $this._hash) {
        $this._hash = (Get-FileHash -Path $this.FullPath -Algorithm SHA256).Hash
    }
    return $this._hash
}
```
- Use a faster algorithm for initial checks:
```powershell
# Use MD5 for a quick first pass, SHA256 for verification
$quickHash = Get-FileHash -Path $path -Algorithm MD5
```
Problem: Long paths fail (path exceeds 260 characters)
Symptoms: Error “The specified path, file name, or both are too long”
Cause: Windows MAX_PATH limit of 260 characters
Solutions:
- Use the `\\?\` prefix (supports paths up to 32,767 characters):
```powershell
$longPath = "\\?\" + $originalPath
Get-ChildItem -LiteralPath $longPath
```
- Use PowerShell 7+, which has native long path support
- Enable long paths in Windows 10+:
```powershell
# Registry setting (requires admin + reboot)
Set-ItemProperty -Path "HKLM:\SYSTEM\CurrentControlSet\Control\FileSystem" `
    -Name "LongPathsEnabled" -Value 1
```
Problem: Permission denied errors
Symptoms: “Access to the path is denied” for some files
Causes:
- File owned by different user
- File in use by another process
- System/protected file
Solutions:
- Per-file error handling:
```powershell
try {
    Copy-Item -Path $source -Destination $dest -ErrorAction Stop
    $results.Copied++
}
catch [System.UnauthorizedAccessException] {
    Write-Warning "Permission denied: $source - skipping"
    $results.PermissionDenied++
}
catch [System.IO.IOException] {
    Write-Warning "File in use: $source - skipping"
    $results.InUse++
}
```
- Retry with delay for locked files:
```powershell
$retries = 3
$delay = 1000

for ($i = 0; $i -lt $retries; $i++) {
    try {
        Copy-Item -Path $source -Destination $dest -ErrorAction Stop
        break
    }
    catch [System.IO.IOException] {
        if ($i -eq $retries - 1) { throw }
        Start-Sleep -Milliseconds $delay
        $delay *= 2  # Exponential backoff
    }
}
```
Problem: Progress bar flickers or updates too fast
Symptoms: Console scrolling rapidly, hard to read
Cause: Updating progress for every small file
Solution: Throttle progress updates:
```powershell
$lastProgressUpdate = [DateTime]::MinValue
$progressInterval = [TimeSpan]::FromMilliseconds(100)

foreach ($file in $files) {
    # ... do work ...
    if (([DateTime]::Now - $lastProgressUpdate) -gt $progressInterval) {
        Write-Progress -Activity "Syncing" -Status $file.Name -PercentComplete $percent
        $lastProgressUpdate = [DateTime]::Now
    }
}
```
Problem: Memory usage grows unbounded
Symptoms: PowerShell process consumes gigabytes of RAM
Causes:
- Storing all file contents in memory
- Building massive arrays instead of streaming
Solutions:
- Use ArrayList instead of array `+=`:
```powershell
# BAD: += creates a new array on every add (O(n^2) memory operations)
$files = @()
$files += $newFile  # Slow!

# GOOD: ArrayList mutates in place
$files = [System.Collections.ArrayList]::new()
[void]$files.Add($newFile)  # Fast!
```
- Stream results instead of collecting:
```powershell
# BAD: Collect all then process
$allFiles = Get-ChildItem -Recurse
foreach ($file in $allFiles) { ... }

# GOOD: Stream items as they are enumerated
Get-ChildItem -Recurse | ForEach-Object { ... }
```
Problem: Sync runs but nothing happens
Symptoms: Script completes, no files copied, no errors
Cause: Often a path issue or incorrect comparison
Debugging:
```powershell
# Add verbose output everywhere
Write-Verbose "Source path: $Source"
Write-Verbose "Files found in source: $($sourceFiles.Count)"
Write-Verbose "Files found in dest: $($destFiles.Count)"
Write-Verbose "To copy: $($toCopy.Count)"
Write-Verbose "To update: $($toUpdate.Count)"

# Check if relative paths are being calculated correctly
$sourceFiles | ForEach-Object {
    Write-Verbose "Source file: $($_.RelativePath)"
}

# Verify ShouldProcess is being called
if ($PSCmdlet.ShouldProcess($file, "Copy")) {
    Write-Verbose "ShouldProcess returned True for $file"
    # ... actual copy
} else {
    Write-Verbose "ShouldProcess returned False (WhatIf mode)"
}
```
Extensions and Challenges
Easy Extensions
- JSON output mode - Output comparison results as JSON for automation
```powershell
$result | ConvertTo-Json -Depth 5 | Out-File "sync_result.json"
```
- Email notification - Send a summary email after the sync completes
- Scheduled execution - Create a Windows Task Scheduler job
```powershell
$action = New-ScheduledTaskAction -Execute "pwsh.exe" `
    -Argument "-File C:\Scripts\Sync-Directories.ps1 -Source C:\A -Destination D:\B"
$trigger = New-ScheduledTaskTrigger -Daily -At "2:00AM"
Register-ScheduledTask -TaskName "NightlySync" -Action $action -Trigger $trigger
```
- Checksum file generation - Generate `.sha256` files alongside synced files
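The checksum-file extension could be sketched like this, writing one `.sha256` file next to each synced file; the exact layout (one sidecar file per file, in the `<hash>  <filename>` format used by sha256sum-style tools) is an assumption, not part of the project spec:

```powershell
# Write a <name>.sha256 sidecar next to each synced file.
Get-ChildItem -Path $Destination -Recurse -File |
    Where-Object { $_.Extension -ne '.sha256' } |
    ForEach-Object {
        $hash = (Get-FileHash -LiteralPath $_.FullName -Algorithm SHA256).Hash
        # "<hash>  <filename>" matches the common sha256sum output format
        "$hash  $($_.Name)" | Set-Content -Path "$($_.FullName).sha256"
    }
```

A later run (or a user on another machine) can then verify integrity by recomputing each hash and comparing it against the sidecar file.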
Medium Extensions
- Bandwidth throttling - Limit copy speed to avoid saturating the network
```powershell
function Copy-ItemThrottled {
    param($Source, $Destination, [int]$BytesPerSecond)
    # Read in chunks, sleep between chunks
}
```
- Incremental sync with journal - Track the last sync time, only check files modified since
- Parallel hashing - Use PowerShell 7's `ForEach-Object -Parallel` for hash calculation
- Compression during transfer - Compress files before copying, decompress at destination
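A minimal sketch of the parallel-hashing idea, assuming PowerShell 7+ (where `ForEach-Object -Parallel` exists) and a `-ThrottleLimit` tuned to your disk and CPU:

```powershell
# PowerShell 7+: hash files on multiple threads.
# $using: passes outer variables into the parallel runspaces.
$algorithm = 'SHA256'

$hashes = Get-ChildItem -Path $Source -Recurse -File |
    ForEach-Object -Parallel {
        [PSCustomObject]@{
            Path = $_.FullName
            Hash = (Get-FileHash -LiteralPath $_.FullName -Algorithm $using:algorithm).Hash
        }
    } -ThrottleLimit 8
```

Note that parallel hashing mainly helps when the bottleneck is CPU or many small files; on a single spinning disk, concurrent reads can actually be slower than sequential ones.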
Advanced Extensions
- Two-way sync with conflict resolution - Implement bidirectional sync with a `-ConflictResolution` parameter (NewerWins, SourceWins, KeepBoth)
- Delta sync - Only transfer changed portions of files (like rsync)
- Encryption - Encrypt files during transfer, decrypt at destination
- Cloud integration - Sync to Azure Blob Storage or AWS S3
```powershell
Sync-Directories -Source C:\Local -Destination "az://container/path"
```
- Real-time sync - Use FileSystemWatcher to sync on change
```powershell
$watcher = New-Object System.IO.FileSystemWatcher
$watcher.Path = $Source
$watcher.IncludeSubdirectories = $true
$watcher.EnableRaisingEvents = $true
Register-ObjectEvent $watcher "Changed" -Action {
    Sync-File $Event.SourceEventArgs.FullPath
}
```
Books That Will Help
| Topic | Book | Relevant Chapter(s) |
|---|---|---|
| PowerShell fundamentals | Learn PowerShell in a Month of Lunches (4th ed.) by Travis Plunk et al. | Ch. 3: Using the Help System; Ch. 7: The Pipeline; Ch. 18: Variables |
| Advanced PowerShell | Windows PowerShell in Action (3rd ed.) by Bruce Payette | Ch. 7: Advanced Functions; Ch. 8: Scripts, Functions, and Filters; Ch. 11: Error Handling |
| Hash functions and data integrity | Designing Data-Intensive Applications by Martin Kleppmann | Ch. 5: Replication (checksums for data validation); Ch. 7: Transactions |
| Algorithm efficiency | Introduction to Algorithms by Cormen et al. | Ch. 11: Hash Tables (for O(1) lookup understanding) |
| Error handling patterns | Release It! by Michael Nygard | Ch. 4: Stability Patterns (timeouts, retries, circuit breakers) |
| File systems deep dive | Operating Systems: Three Easy Pieces by Arpaci-Dusseau | Ch. 39-41: Files and Directories |
Quick reference recommendations:
- For PowerShell syntax: Microsoft PowerShell Documentation
- For hash algorithms: NIST Cryptographic Standards
Self-Assessment Checklist
Before considering this project complete, verify you can answer “yes” to all:
Core Functionality
- Script compares two directories correctly
- New files (only in source) are identified
- Modified files are detected by comparing file hashes
- -WhatIf shows what would happen without making changes
- -Confirm prompts before each destructive action
- -Verbose provides detailed progress information
- Progress bar displays during sync operations
- Individual file errors are caught and logged (sync continues)
- Mirror mode correctly deletes files not in source
- Log file captures all operations with timestamps
Performance
- 10,000 files compared in under 30 seconds
- Memory usage stays under 500MB for large directories
- Hashtable-based comparison achieves O(n) complexity
Edge Cases
- Works with empty source or destination directories
- Handles files with special characters in names
- Handles very long file paths (260+ characters)
- Handles read-only files appropriately
- Handles locked files without crashing
- Exclude patterns work correctly
Code Quality
- Functions are well-documented with comment-based help
- Parameter validation prevents invalid input
- No global variables - all state passed via parameters
- Consistent error handling throughout
Understanding
- Can explain why hash comparison is better than timestamp comparison
- Can explain why hashtables provide O(1) lookup
- Can explain how SupportsShouldProcess enables -WhatIf and -Confirm
- Can explain the difference between one-way mirror and two-way sync
- Can describe when to use ErrorAction Stop vs Continue
Resources
Official Documentation
Similar Tools for Reference
- robocopy - Windows robust file copy (study its flags for feature ideas)
- rsync - Unix sync tool (delta transfer algorithm)
- FreeFileSync - Open source GUI sync tool
Interview Preparation
Questions You Should Be Able to Answer
- “Why use file hashes instead of timestamps for comparison?”
- Timestamps are unreliable across timezones, DST changes, and different clocks
- Copy operations may preserve or modify timestamps inconsistently
- Hashes compare actual content, providing definitive equality
- Even if file times differ, identical hashes mean identical content
- “How would you optimize this for 1 million files?”
- Use hashtables for O(1) lookups instead of nested loops
- Stream files instead of loading all into memory
- Implement parallel hashing with PowerShell 7 ForEach-Object -Parallel
- Use size comparison as first-tier filter before hashing
- Consider delta sync with change journals
- “Explain how -WhatIf works in your implementation.”
- CmdletBinding with SupportsShouldProcess adds -WhatIf automatically
- Each operation wrapped in $PSCmdlet.ShouldProcess() check
- When -WhatIf specified, ShouldProcess returns false
- Script shows what would happen without executing
- ConfirmImpact='High' ensures prompting for destructive operations
- “How do you handle errors during sync?”
- Per-file try/catch blocks prevent one failure from stopping sync
- Different exception types handled differently (permission vs locked vs network)
- Retry logic with exponential backoff for transient errors
- All errors logged with details for later review
- Summary report shows success/failure counts
- “What’s the time complexity of your comparison algorithm?”
- Building source hashtable: O(n) where n = source file count
- Building destination hashtable: O(m) where m = dest file count
- Comparing: O(n) lookups, each O(1) with hashtable
- Total: O(n + m) = O(n) linear time
- vs naive nested loop: O(n * m) = O(n^2) quadratic time
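The O(n + m) comparison described above can be sketched as follows. Treat this as an outline rather than the finished implementation: the `RelativePath` and `Hash` property names follow the patterns used earlier in this document, and ArrayLists are used for the result sets per the memory advice in the pitfalls section:

```powershell
# Build a hashtable of destination files keyed by relative path: O(m)
$destIndex = @{}
foreach ($file in $destFiles) {
    $destIndex[$file.RelativePath] = $file
}

$toCopy   = [System.Collections.ArrayList]::new()
$toUpdate = [System.Collections.ArrayList]::new()

# One O(1) hashtable lookup per source file: O(n) overall
foreach ($file in $sourceFiles) {
    if (-not $destIndex.ContainsKey($file.RelativePath)) {
        [void]$toCopy.Add($file)       # Only in source -> copy
    }
    elseif ($file.Hash -ne $destIndex[$file.RelativePath].Hash) {
        [void]$toUpdate.Add($file)     # Different content -> update
    }
    $destIndex.Remove($file.RelativePath)  # Mark as seen
}

# Whatever remains in the index exists only in the destination (mirror-mode deletes)
$toDelete = @($destIndex.Values)
```

Removing each matched key as it is processed means no second pass is needed to find orphaned destination files, keeping the whole comparison a single linear sweep.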