P06: File Synchronization Tool

Project Overview

What you'll build: A PowerShell-based file sync tool that compares two directories and synchronizes them, showing what would change before applying changes with confirmation.

Attribute             Value
Difficulty            Intermediate
Time Estimate         1 week
Programming Language  PowerShell
Knowledge Area        Filesystem, Scripting, Algorithms
Prerequisites         Basic PowerShell, understanding of filesystems

Learning Objectives

After completing this project, you will be able to:

  1. Traverse directory trees recursively - Enumerate files across nested folder structures efficiently
  2. Compare files using cryptographic hashes - Use SHA256/MD5 for reliable change detection beyond timestamps
  3. Build production-quality PowerShell cmdlets - Implement CmdletBinding with -WhatIf, -Confirm, and -Verbose
  4. Handle errors at scale - Gracefully recover from individual file failures without stopping the entire sync
  5. Implement efficient comparison algorithms - Use hashtables for O(1) lookups instead of nested loops
  6. Design robust logging systems - Track all operations for audit trails and debugging
  7. Understand sync strategies - Mirror vs. bidirectional sync with conflict resolution

Deep Theoretical Foundation

Filesystem Traversal and Recursion

Understanding directory structure:

A filesystem is a tree data structure where directories (folders) are nodes and files are leaves:

C:\Source\
├── Documents\
│   ├── report.docx
│   ├── data.xlsx
│   └── Archive\
│       ├── 2024_report.docx
│       └── 2023_report.docx
├── Images\
│   ├── photo1.jpg
│   └── photo2.png
└── config.json

Recursive traversal algorithm:

FUNCTION TraverseDirectory(path):
    FOR each item in path:
        IF item is a file:
            PROCESS file
        ELSE IF item is a directory:
            TraverseDirectory(item)  // Recursive call

In PowerShell, Get-ChildItem -Recurse handles this automatically:

# Non-recursive - only immediate children
Get-ChildItem -Path "C:\Source" -File

# Recursive - all descendants
Get-ChildItem -Path "C:\Source" -File -Recurse

Why recursion matters for sync:

When syncing directories, you need to:

  1. Visit every file in the source
  2. Determine its relative path from root
  3. Check if corresponding file exists in destination
  4. Compare content if it exists
  5. Handle nested subdirectories that may or may not exist
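
Put together, these steps amount to a single pass over the source tree. A minimal sketch, using a throwaway directory pair under the system temp folder so it runs anywhere (the demo paths and file names are invented):

```powershell
# Create a tiny source/destination pair for the demo
$tempRoot   = [System.IO.Path]::GetTempPath()
$sourceRoot = Join-Path $tempRoot 'syncdemo_src'
$destRoot   = Join-Path $tempRoot 'syncdemo_dst'
New-Item -ItemType Directory -Force -Path (Join-Path $sourceRoot 'Documents'), $destRoot | Out-Null
Set-Content -Path (Join-Path (Join-Path $sourceRoot 'Documents') 'report.txt') -Value 'hello'

# Steps 1-4: visit every source file, derive its relative path, and check
# whether the corresponding destination file exists
$plan = Get-ChildItem -Path $sourceRoot -File -Recurse | ForEach-Object {
    $relative = $_.FullName.Substring($sourceRoot.Length).TrimStart('\', '/')
    $target   = Join-Path $destRoot $relative
    if (Test-Path -LiteralPath $target) { "COMPARE $relative" }
    else                                { "COPY    $relative" }
}
$plan   # one COPY entry for the report.txt file, since the destination is empty
```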

Calculating relative paths:

# Given: C:\Source\Documents\report.docx
# Root:  C:\Source\
# Relative path: Documents\report.docx

$rootPath = "C:\Source\"
$fullPath = "C:\Source\Documents\report.docx"

# Method 1: String manipulation
$relativePath = $fullPath.Substring($rootPath.Length)

# Method 2: .NET Path methods (PowerShell 7+ / .NET Core only)
$relativePath = [System.IO.Path]::GetRelativePath($rootPath, $fullPath)

Relative paths are crucial because they let you map source files to destination files regardless of the actual root paths.

File Hashing for Change Detection

The fundamental problem:

How do you know if two files are the same? Consider:

File A: C:\Source\report.docx      (1,234,567 bytes, modified 2024-12-25 10:30)
File B: D:\Backup\report.docx      (1,234,567 bytes, modified 2024-12-25 10:30)

Are they identical? Maybe. The timestamp and size match, but:

Why timestamp comparison fails:

Scenario                Problem
Timezone differences    File copied across timezones shows different time
Daylight saving         Timestamps shift by 1 hour twice yearly
Clock drift             Different machines have different clocks
Copy tools              Some preserve original, others use current time
Deliberate changes      User may edit an older file
Filesystem limitations  FAT32 has 2-second resolution, NTFS has 100-nanosecond

The reliable solution: cryptographic hashes

A hash function transforms any input into a fixed-size output (digest):

"Hello World"  -> SHA256 -> a591a6d40bf420404a011733cfb7b190d62c65bf0bcda32b57b277d9ad9f146e
"Hello World!" -> SHA256 -> 7f83b1657ff1fc53b92dc18148a1d65dfc2d4b1fa3d677284addd200126d9069

Key properties:

  1. Deterministic: Same input always produces same output
  2. Fixed size: Output is always 256 bits (for SHA256) regardless of input size
  3. Avalanche effect: Small input change causes massive output change
  4. One-way: Cannot reverse-engineer input from output
  5. Collision-resistant: Extremely unlikely for two different inputs to produce same output
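
Determinism and the avalanche effect are easy to see by hashing two in-memory strings with Get-FileHash -InputStream (available in both Windows PowerShell 5.1 and PowerShell 7; the helper function name is our own):

```powershell
# Hash an arbitrary string by wrapping its bytes in a MemoryStream
function Get-StringHash {
    param([string]$Text)
    $bytes  = [System.Text.Encoding]::UTF8.GetBytes($Text)
    $stream = [System.IO.MemoryStream]::new($bytes)
    try     { (Get-FileHash -InputStream $stream -Algorithm SHA256).Hash }
    finally { $stream.Dispose() }
}

Get-StringHash 'Hello World'    # deterministic: same input, same digest, every time
Get-StringHash 'Hello World!'   # one extra character, completely different digest
```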

Hash algorithm comparison:

Algorithm  Output Size  Speed      Security                   Use Case
MD5        128 bits     Fastest    Broken (collisions found)  Quick checks, non-security
SHA1       160 bits     Fast       Weakened                   Legacy compatibility
SHA256     256 bits     Medium     Strong                     Recommended for file sync
SHA512     512 bits     Slower     Strongest                  High-security environments
xxHash     64/128 bits  Very fast  Non-cryptographic          When speed matters more than security

PowerShell hash calculation:

# Get hash of a file
$hash = Get-FileHash -Path "C:\file.txt" -Algorithm SHA256
$hash.Hash  # Returns: ABC123DEF456...

# Comparing two files
$sourceHash = (Get-FileHash -Path $sourcePath -Algorithm SHA256).Hash
$destHash = (Get-FileHash -Path $destPath -Algorithm SHA256).Hash

if ($sourceHash -eq $destHash) {
    Write-Host "Files are identical"
} else {
    Write-Host "Files differ - need to sync"
}

Performance consideration:

Hashing reads the entire file, so for large files it can be slow:

File Size       | Approximate Hash Time (SSD)
1 MB            | ~5 ms
100 MB          | ~200 ms
1 GB            | ~2 seconds
10 GB           | ~20 seconds

Optimization strategy: Multi-tier comparison

# Wrapped as a function so the early returns are valid PowerShell
function Compare-FileContent {
    param(
        [System.IO.FileInfo]$source,
        [System.IO.FileInfo]$dest
    )

    # Tier 1: Size check (instant)
    if ($source.Length -ne $dest.Length) {
        return "DIFFERENT"
    }

    # Tier 2: Quick timestamp check (instant)
    if ($source.LastWriteTime -gt $dest.LastWriteTime) {
        return "SOURCE_NEWER"
    }

    # Tier 3: Hash comparison (slow but definitive)
    $sourceHash = (Get-FileHash -Path $source.FullName -Algorithm SHA256).Hash
    $destHash   = (Get-FileHash -Path $dest.FullName -Algorithm SHA256).Hash

    if ($sourceHash -ne $destHash) {
        return "DIFFERENT"
    }

    return "IDENTICAL"
}

Advanced Functions and CmdletBinding

What makes a PowerShell function "advanced"?

Basic functions:

function Copy-MyFile {
    param($Source, $Destination)
    Copy-Item -Path $Source -Destination $Destination
}

Advanced functions add professional-grade features:

function Copy-MyFile {
    [CmdletBinding(SupportsShouldProcess=$true, ConfirmImpact='High')]
    param(
        [Parameter(Mandatory=$true, Position=0, ValueFromPipeline=$true)]
        [ValidateScript({Test-Path $_ -PathType Leaf})]
        [string]$Source,

        [Parameter(Mandatory=$true, Position=1)]
        [string]$Destination
    )

    process {
        if ($PSCmdlet.ShouldProcess($Source, "Copy file")) {
            Copy-Item -Path $Source -Destination $Destination
            Write-Verbose "Copied $Source to $Destination"
        }
    }
}

CmdletBinding parameters explained:

Parameter                Purpose
SupportsShouldProcess    Enables -WhatIf and -Confirm parameters
ConfirmImpact            Controls automatic confirmation prompting
DefaultParameterSetName  Sets default when multiple parameter sets exist
PositionalBinding        Allows positional parameter usage

SupportsShouldProcess in depth:

When you add SupportsShouldProcess=$true, PowerShell automatically adds -WhatIf and -Confirm parameters:

# User can run:
Sync-Directories -Source C:\A -Destination D:\B -WhatIf      # Preview only
Sync-Directories -Source C:\A -Destination D:\B -Confirm    # Confirm each action
Sync-Directories -Source C:\A -Destination D:\B             # Execute normally

Using $PSCmdlet.ShouldProcess():

# Single parameter: "Performing operation on target"
if ($PSCmdlet.ShouldProcess($targetFile)) {
    Remove-Item $targetFile
}
# Output: What if: Performing the operation "Remove-Item" on target "C:\file.txt".

# Two parameters: "Performing operation on target"
if ($PSCmdlet.ShouldProcess($targetFile, "Delete file")) {
    Remove-Item $targetFile
}
# Output: What if: Performing the operation "Delete file" on target "C:\file.txt".

# Three parameters: ShouldProcess(verboseDescription, verboseWarning, caption)
if ($PSCmdlet.ShouldProcess($verboseDescription, $verboseWarning, $caption)) {
    # Action
}

ConfirmImpact levels:

Level   Behavior
None    Never auto-prompts
Low     Prompts if $ConfirmPreference is Low or lower
Medium  Prompts if $ConfirmPreference is Medium or lower
High    Always prompts unless -Confirm:$false is specified

For a file sync tool that deletes files, use ConfirmImpact='High'.

Write-Verbose for debugging:

function Sync-Directories {
    [CmdletBinding()]
    param(...)

    # These only appear when user specifies -Verbose
    Write-Verbose "Starting directory comparison..."
    Write-Verbose "Found $($files.Count) files in source"
    Write-Verbose "Hash comparison for: $($file.Name)"
}

# User runs:
Sync-Directories -Source C:\A -Destination D:\B -Verbose

Error Handling at Scale

When syncing thousands of files, you need strategies beyond simple try/catch:

The problem with stopping on first error:

# Bad: One locked file stops everything
foreach ($file in $files) {
    Copy-Item -Path $file.FullName -Destination $dest  # Throws on locked file
}
# Result: 5000 files to sync, stops after 100 due to one locked file

Per-file error handling:

$results = @{
    Successful = [System.Collections.ArrayList]::new()
    Failed = [System.Collections.ArrayList]::new()
}

foreach ($file in $files) {
    try {
        Copy-Item -Path $file.FullName -Destination $dest -ErrorAction Stop
        [void]$results.Successful.Add($file.Name)
    }
    catch {
        [void]$results.Failed.Add([PSCustomObject]@{
            File = $file.Name
            Error = $_.Exception.Message
        })
        Write-Warning "Failed to copy $($file.Name): $($_.Exception.Message)"
        # Continue with next file
    }
}

Write-Host "Completed: $($results.Successful.Count) succeeded, $($results.Failed.Count) failed"

Error categories and handling strategies:

Error Type                Cause                                 Strategy
File locked               Another process has exclusive access  Log and skip, retry later
Access denied             Insufficient permissions              Log and skip, flag for attention
Path too long             Path exceeds 260 characters           Use the \\?\ prefix or PowerShell 7
Disk full                 Destination has no space              Stop sync, alert user
Network error             Transient network issue               Retry with exponential backoff
File changed during copy  Source modified mid-copy              Re-read and retry

ErrorAction preference values:

# Per-command
Copy-Item -Path $src -Destination $dest -ErrorAction Stop     # Throw exception
Copy-Item -Path $src -Destination $dest -ErrorAction Continue # Show error, continue
Copy-Item -Path $src -Destination $dest -ErrorAction SilentlyContinue  # Suppress
Copy-Item -Path $src -Destination $dest -ErrorAction Inquire  # Ask user

# Script-wide
$ErrorActionPreference = 'Stop'  # All errors throw exceptions

Retry logic with exponential backoff:

function Invoke-WithRetry {
    param(
        [ScriptBlock]$ScriptBlock,
        [int]$MaxRetries = 3,
        [int]$InitialDelayMs = 100
    )

    $attempt = 0
    $delay = $InitialDelayMs

    while ($true) {
        try {
            return & $ScriptBlock
        }
        catch {
            $attempt++
            if ($attempt -ge $MaxRetries) {
                throw $_
            }
            Write-Verbose "Attempt $attempt failed, retrying in ${delay}ms..."
            Start-Sleep -Milliseconds $delay
            $delay *= 2  # Exponential backoff: 100, 200, 400, 800...
        }
    }
}

# Usage
Invoke-WithRetry {
    Copy-Item -Path $src -Destination $dest -ErrorAction Stop
}

Comparison Algorithms: Naive vs. Optimized

The naive approach (O(n^2)):

# For each source file, scan all destination files
foreach ($sourceFile in $sourceFiles) {        # n iterations
    foreach ($destFile in $destFiles) {         # n iterations per source
        if ($sourceFile.Name -eq $destFile.Name) {
            # Compare
            break
        }
    }
}
# Total: n * n = n^2 comparisons

Performance impact:

  • 100 files: 10,000 comparisons
  • 1,000 files: 1,000,000 comparisons
  • 10,000 files: 100,000,000 comparisons (VERY slow)

The optimized approach (O(n)):

# Step 1: Build hashtable from destination (O(n))
$destTable = @{}
foreach ($file in $destFiles) {                              # n iterations
    $relativePath = $file.FullName.Substring($destRoot.Length)
    $destTable[$relativePath] = $file                        # O(1) insertion
}

# Step 2: Check each source file (O(n))
foreach ($file in $sourceFiles) {                            # n iterations
    $relativePath = $file.FullName.Substring($sourceRoot.Length)
    if ($destTable.ContainsKey($relativePath)) {             # O(1) lookup
        # File exists in both - compare hashes
    } else {
        # File only in source - needs copy
    }
}
# Total: n + n = 2n = O(n)
# Total: n + n = 2n = O(n)

Visual comparison:

NAIVE (Nested Loops):
Source: [A, B, C, D, E]
Dest:   [B, D, E, F, G]

A vs B? No    A vs D? No    A vs E? No    A vs F? No    A vs G? No   (5 comparisons)
B vs B? Yes!  (1 comparison)
C vs B? No    C vs D? No    C vs E? No    C vs F? No    C vs G? No   (5 comparisons)
...
Total: up to 25 comparisons (5 x 5) in the worst case

HASHTABLE (Single Pass):
Build table from Dest: {B: file, D: file, E: file, F: file, G: file}

A in table? No  -> Copy
B in table? Yes -> Compare
C in table? No  -> Copy
D in table? Yes -> Compare
E in table? Yes -> Compare
Total: 5 lookups (plus 5 insertions)

Hashtable structure for file comparison:

# Each entry in the hashtable
$fileEntry = [PSCustomObject]@{
    RelativePath  = "Documents\report.docx"     # Key
    FullPath      = "C:\Source\Documents\report.docx"
    Hash          = "ABC123..."                  # Calculated on demand
    Size          = 1234567
    LastWriteTime = [DateTime]"2024-12-25 10:30:00"
}

# The hashtable itself
$fileTable = @{
    "Documents\report.docx" = $fileEntry1
    "Documents\data.xlsx"   = $fileEntry2
    "Images\photo.jpg"      = $fileEntry3
}

# O(1) lookup
$file = $fileTable["Documents\report.docx"]

Sync Direction Strategies

One-way mirror (Source -> Destination):

The destination becomes an exact copy of source:

Source                       Destination (before)      Destination (after)
├── file1.txt (new)          ├── file2.txt             ├── file1.txt (copied)
├── file2.txt (unchanged)    ├── file3.txt (obsolete)  ├── file2.txt (kept)
└── file4.txt (modified)     └── file4.txt (old)       └── file4.txt (updated)
                                                       # file3.txt DELETED

Operations:

  • New in source: Copy to destination
  • Modified in source: Update in destination
  • Missing from source: Delete from destination (mirror mode)
  • Unchanged: Skip
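
The four decisions above can be sketched as one classification function over the source and destination lookup tables (function name and the .Hash object shape are illustrative):

```powershell
# Classify a relative path that appears in at least one of the two tables.
# Table values are assumed to expose a .Hash property.
function Get-SyncAction {
    param(
        [string]$RelativePath,
        [hashtable]$SourceTable,
        [hashtable]$DestTable
    )

    $inSource = $SourceTable.ContainsKey($RelativePath)
    $inDest   = $DestTable.ContainsKey($RelativePath)

    if ($inSource -and -not $inDest) { return 'Copy' }    # new in source
    if ($inDest -and -not $inSource) { return 'Delete' }  # orphan (mirror mode)
    if ($SourceTable[$RelativePath].Hash -ne $DestTable[$RelativePath].Hash) {
        return 'Update'                                   # content differs
    }
    return 'Skip'                                         # unchanged
}

# Example: a file present only in the source is classified as a copy
$src = @{ 'a.txt' = [PSCustomObject]@{ Hash = 'AAA' } }
Get-SyncAction -RelativePath 'a.txt' -SourceTable $src -DestTable @{}   # -> Copy
```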

Two-way sync (bidirectional):

Both directories can have changes, merged together:

Source (before)                  Destination (before)
├── file1.txt (new here)         ├── file2.txt (new here)
├── shared.txt (modified)        ├── shared.txt (also modified!) <- CONFLICT
└── old.txt                      └── old.txt

After two-way sync:
├── file1.txt                    ├── file1.txt (copied from source)
├── file2.txt (copied from dest) ├── file2.txt
├── shared.txt (???)             ├── shared.txt (???)  <- How to resolve?
└── old.txt                      └── old.txt

Conflict resolution strategies:

Strategy          Description                      Use Case
Source wins       Always take source version       Authoritative source
Destination wins  Always take destination version  Destination is primary
Newer wins        Take file with later timestamp   General sync
Larger wins       Take larger file (no data loss)  Append-only logs
Keep both         Rename conflicting files         User decides later
Manual            Stop and ask user                Critical data
Merge             Attempt to merge changes         Text files with merge tools

This project implements one-way mirror - the simpler and more common use case. Two-way sync requires tracking modification times, detecting conflicts, and implementing resolution strategies, which adds significant complexity.
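
For the extension challenge, the simpler strategies from the table above might be sketched as a single resolver (function and parameter names here are invented for illustration):

```powershell
# Decide which side wins a two-way conflict. Both arguments are assumed to
# be file-info objects exposing a LastWriteTime property.
function Resolve-Conflict {
    param(
        $SourceFile,
        $DestFile,
        [ValidateSet('SourceWins', 'DestinationWins', 'NewerWins')]
        [string]$Strategy = 'NewerWins'
    )
    switch ($Strategy) {
        'SourceWins'      { 'Source' }
        'DestinationWins' { 'Destination' }
        'NewerWins' {
            # Ties go to the source, matching mirror semantics
            if ($SourceFile.LastWriteTime -ge $DestFile.LastWriteTime) { 'Source' }
            else { 'Destination' }
        }
    }
}
```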


Complete Project Specification

Functional Requirements

ID   Requirement                               Priority      Implementation Phase
F1   Compare two directories recursively       Must Have     Phase 1
F2   Identify new files in source              Must Have     Phase 1
F3   Identify modified files (by hash)         Must Have     Phase 3
F4   Identify files to delete (mirror mode)    Must Have     Phase 1
F5   Copy new/modified files to destination    Must Have     Phase 4
F6   Delete orphaned files (mirror mode)       Must Have     Phase 4
F7   Support -WhatIf parameter                 Must Have     Phase 2
F8   Support -Confirm parameter                Must Have     Phase 2
F9   Support -Verbose parameter                Must Have     Phase 2
F10  Use file hashes (SHA256) for comparison   Should Have   Phase 3
F11  Show progress bar during sync             Should Have   Phase 4
F12  Log all operations to file                Should Have   Phase 5
F13  Handle locked files gracefully            Should Have   Phase 4
F14  Support exclude patterns (wildcards)      Nice to Have  Phase 5
F15  Retry failed operations                   Nice to Have  Phase 5
F16  Generate summary report                   Nice to Have  Phase 5

Non-Functional Requirements

ID   Requirement                                 Target
NF1  Compare 10,000 files in < 30 seconds        Performance
NF2  Memory usage < 500MB for large directories  Resource efficiency
NF3  Continue after individual file errors       Reliability
NF4  Provide accurate progress indication        User experience
NF5  Work with paths up to 32,000 characters     Compatibility
NF6  Support PowerShell 5.1 and 7+               Compatibility

Real World Outcome

When complete, youโ€™ll have a production-ready sync tool:

Example -WhatIf Output

PS> Sync-Directories -Source C:\Work -Destination D:\Backup -Mirror -WhatIf

================================================================================
                         FILE SYNC PREVIEW (DRY RUN)
================================================================================
Source:      C:\Work
Destination: D:\Backup
Mode:        Mirror (destination will match source exactly)
================================================================================

ANALYZING DIRECTORIES...
  Source files found:      1,247
  Destination files found: 1,189
  Comparison complete in:  3.2 seconds

--------------------------------------------------------------------------------
FILES TO BE COPIED (23 new files, 47.3 MB total):
--------------------------------------------------------------------------------
  [NEW]  Documents\Q4_Report.docx                   (2.1 MB)
  [NEW]  Documents\Budget_2025.xlsx                 (856 KB)
  [NEW]  Projects\ClientA\proposal.pdf              (4.2 MB)
  [NEW]  Projects\ClientA\mockups\design_v3.psd     (12.8 MB)
  [NEW]  Scripts\deploy.ps1                         (3 KB)
  ... and 18 more files

--------------------------------------------------------------------------------
FILES TO BE UPDATED (12 modified files, 8.7 MB total):
--------------------------------------------------------------------------------
  [MOD]  Documents\Contacts.csv                     (45 KB -> 52 KB)
         Reason: Content hash differs
  [MOD]  Projects\ClientB\timeline.xlsx             (1.2 MB -> 1.4 MB)
         Reason: Content hash differs
  [MOD]  config.json                                (2 KB -> 2 KB)
         Reason: Content hash differs (same size)
  ... and 9 more files

--------------------------------------------------------------------------------
FILES TO BE DELETED (34 orphaned files, 156.2 MB total):
--------------------------------------------------------------------------------
  [DEL]  Archive\old_project.zip                    (89.4 MB)
         Reason: Not present in source
  [DEL]  temp\cache.dat                             (45.2 MB)
         Reason: Not present in source
  [DEL]  Documents\draft_v1.docx                    (1.2 MB)
         Reason: Not present in source
  ... and 31 more files

================================================================================
                              SUMMARY
================================================================================
  Files to copy:    23 files  (47.3 MB)
  Files to update:  12 files  (8.7 MB)
  Files to delete:  34 files  (156.2 MB)
  Files unchanged:  1,178 files
--------------------------------------------------------------------------------
  Total operations: 69
  Estimated time:   ~45 seconds
================================================================================

What if: Use -WhatIf:$false or omit -WhatIf to execute these changes.

Progress Bar Example

Syncing files [=========================>                ] 65% (812/1,247)
  Currently: Copying Projects\ClientA\mockups\design_v3.psd (12.8 MB)
  Speed: 45.2 MB/s | Elapsed: 00:01:23 | Remaining: ~00:00:44
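
A bar like this can be driven with the built-in Write-Progress cmdlet. A minimal sketch, with a fake operation list standing in for the real copy/update/delete plan:

```powershell
# $operations stands in for the real list of planned file operations
$operations = 1..50

for ($i = 0; $i -lt $operations.Count; $i++) {
    $percent = [int](($i + 1) / $operations.Count * 100)
    Write-Progress -Activity 'Syncing files' `
                   -Status ('Item {0} of {1}' -f ($i + 1), $operations.Count) `
                   -PercentComplete $percent
    # ... perform the actual file operation here ...
}

# Clear the bar when done
Write-Progress -Activity 'Syncing files' -Completed
```

Speed and ETA figures like those shown above would come from a System.Diagnostics.Stopwatch plus the byte counts of completed operations.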

Log File Format

# sync_log_20241227_143052.txt

================================================================================
SYNC LOG: 2024-12-27 14:30:52
Source: C:\Work
Destination: D:\Backup
Mode: Mirror
================================================================================

[14:30:52] START - Beginning file synchronization
[14:30:52] INFO  - Enumerating source directory: C:\Work
[14:30:53] INFO  - Found 1,247 files in source (2.3 GB total)
[14:30:53] INFO  - Enumerating destination directory: D:\Backup
[14:30:54] INFO  - Found 1,189 files in destination (2.1 GB total)
[14:30:55] INFO  - Comparison complete: 23 to copy, 12 to update, 34 to delete

[14:30:55] COPY  - Documents\Q4_Report.docx (2.1 MB) -> SUCCESS
[14:30:55] COPY  - Documents\Budget_2025.xlsx (856 KB) -> SUCCESS
[14:30:56] COPY  - Projects\ClientA\proposal.pdf (4.2 MB) -> SUCCESS
[14:30:58] COPY  - Projects\ClientA\mockups\design_v3.psd (12.8 MB) -> SUCCESS
[14:30:58] WARN  - Scripts\deploy.ps1 - Access denied, file in use by another process
[14:30:58] RETRY - Scripts\deploy.ps1 - Attempt 1 of 3
[14:30:59] RETRY - Scripts\deploy.ps1 - Attempt 2 of 3
[14:31:00] COPY  - Scripts\deploy.ps1 (3 KB) -> SUCCESS (after retry)

[14:31:15] UPDATE - Documents\Contacts.csv (45 KB -> 52 KB) -> SUCCESS
[14:31:15] UPDATE - Projects\ClientB\timeline.xlsx (1.2 MB -> 1.4 MB) -> SUCCESS
[14:31:16] UPDATE - config.json (2 KB) -> SUCCESS

[14:31:25] DELETE - Archive\old_project.zip (89.4 MB) -> SUCCESS
[14:31:25] DELETE - temp\cache.dat (45.2 MB) -> SUCCESS
[14:31:26] ERROR - Documents\draft_v1.docx - Access denied (file locked)
[14:31:26] DELETE - remaining 31 files -> SUCCESS

[14:31:45] COMPLETE - Synchronization finished

================================================================================
SUMMARY
================================================================================
  Copied:    23 files (47.3 MB)
  Updated:   12 files (8.7 MB)
  Deleted:   33 files (155.0 MB) - 1 failed
  Unchanged: 1,178 files
  Errors:    2 (1 recovered via retry, 1 permanent)
  Duration:  53 seconds
================================================================================
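
The timestamped lines above can come from a small helper like this (a sketch; the real tool would take the log path as a parameter rather than a script-scoped variable):

```powershell
# Minimal logging helper producing lines like "[14:30:52] INFO  - message"
$script:LogPath = Join-Path ([System.IO.Path]::GetTempPath()) ('sync_log_{0:yyyyMMdd_HHmmss}.txt' -f (Get-Date))

function Write-Log {
    param(
        [ValidateSet('START','INFO','COPY','UPDATE','DELETE','WARN','RETRY','ERROR','COMPLETE')]
        [string]$Level,
        [string]$Message
    )
    # {1,-5} left-pads the level so the dashes line up across entries
    $line = '[{0:HH:mm:ss}] {1,-5} - {2}' -f (Get-Date), $Level, $Message
    Add-Content -Path $script:LogPath -Value $line
}

Write-Log -Level INFO -Message 'Enumerating source directory: C:\Work'
Get-Content -Path $script:LogPath
```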

Conflict Resolution Examples

When running in Mirror mode, conflicts are resolved automatically (source wins):

PS> Sync-Directories -Source C:\Work -Destination D:\Backup -Mirror -Verbose

VERBOSE: Conflict detected: Documents\shared.docx
  Source:      Modified 2024-12-27 14:30 (Size: 45,678 bytes, Hash: ABC123...)
  Destination: Modified 2024-12-27 13:15 (Size: 44,892 bytes, Hash: DEF456...)
  Resolution:  Source wins (Mirror mode) - destination will be overwritten
VERBOSE: Copied Documents\shared.docx -> D:\Backup\Documents\shared.docx

For future bidirectional sync (extension challenge):

PS> Sync-Directories -Source C:\Work -Destination D:\Backup -TwoWay -ConflictResolution KeepBoth

WARNING: Conflict detected for Documents\shared.docx
  Source modified:      2024-12-27 14:30:22
  Destination modified: 2024-12-27 14:28:45
  Resolution: Keeping both versions
    -> Documents\shared.docx (destination version)
    -> Documents\shared_CONFLICT_20241227_143022.docx (source version)

Solution Architecture

Component Diagram

┌──────────────────────────────────────────────────────────────────────────────┐
│                               SYNC-DIRECTORIES                               │
│                              (Main Entry Point)                              │
│  ┌────────────────────────────────────────────────────────────────────────┐  │
│  │  Parameters:                                                           │  │
│  │  - Source (mandatory)      - HashAlgorithm (SHA256/MD5)                │  │
│  │  - Destination (mandatory) - Exclude (patterns)                        │  │
│  │  - Mirror (switch)         - LogPath (optional)                        │  │
│  │  - WhatIf, Confirm, Verbose (automatic via CmdletBinding)              │  │
│  └────────────────────────────────────────────────────────────────────────┘  │
│                                     │                                        │
│                                     v                                        │
│  ┌────────────────────────────────────────────────────────────────────────┐  │
│  │                          DIRECTORY ENUMERATOR                          │  │
│  │  ┌─────────────────────────┐    ┌─────────────────────────┐            │  │
│  │  │   Get-FileHashTable     │    │   Get-FileHashTable     │            │  │
│  │  │   (Source)              │    │   (Destination)         │            │  │
│  │  │                         │    │                         │            │  │
│  │  │   Returns: Hashtable    │    │   Returns: Hashtable    │            │  │
│  │  │   Key: RelativePath     │    │   Key: RelativePath     │            │  │
│  │  │   Value: FileInfo       │    │   Value: FileInfo       │            │  │
│  │  └────────────┬────────────┘    └────────────┬────────────┘            │  │
│  │               │                              │                         │  │
│  │               v                              v                         │  │
│  │  ┌──────────────────────────────────────────────────────────────────┐  │  │
│  │  │                       COMPARE-DIRECTORIES                        │  │  │
│  │  │                                                                  │  │  │
│  │  │  Input: SourceTable, DestTable, Mirror flag                      │  │  │
│  │  │                                                                  │  │  │
│  │  │  Output: ComparisonResult                                        │  │  │
│  │  │    ├── ToCopy[]    (files only in source)                        │  │  │
│  │  │    ├── ToUpdate[]  (files in both, different hash)               │  │  │
│  │  │    ├── ToDelete[]  (files only in dest, if Mirror)               │  │  │
│  │  │    └── Unchanged[] (files match exactly)                         │  │  │
│  │  └──────────────────────────────────────────────────────────────────┘  │  │
│  └────────────────────────────────────────────────────────────────────────┘  │
│                                     │                                        │
│                                     v                                        │
│  ┌────────────────────────────────────────────────────────────────────────┐  │
│  │                           OPERATION EXECUTOR                           │  │
│  │                                                                        │  │
│  │  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐    │  │
│  │  │ Copy Files  │  │Update Files │  │Delete Files │  │   Logging   │    │  │
│  │  │             │  │             │  │             │  │             │    │  │
│  │  │ ShouldProc  │  │ ShouldProc  │  │ ShouldProc  │  │ Write-Log   │    │  │
│  │  │ Copy-Item   │  │ Copy-Item   │  │ Remove-Item │  │ to file     │    │  │
│  │  └─────────────┘  └─────────────┘  └─────────────┘  └─────────────┘    │  │
│  │                                                                        │  │
│  │  ┌──────────────────────────────────────────────────────────────────┐  │  │
│  │  │                         PROGRESS REPORTER                        │  │  │
│  │  │  Write-Progress with current file, percentage, ETA               │  │  │
│  │  └──────────────────────────────────────────────────────────────────┘  │  │
│  └────────────────────────────────────────────────────────────────────────┘  │
│                                     │                                        │
│                                     v                                        │
│  ┌────────────────────────────────────────────────────────────────────────┐  │
│  │                            RESULTS REPORTER                            │  │
│  │                                                                        │  │
│  │  Output: Summary object with counts, errors, duration                  │  │
│  └────────────────────────────────────────────────────────────────────────┘  │
└──────────────────────────────────────────────────────────────────────────────┘

Hashtable Structure for File Comparison

# The hashtable key is the relative path (case-insensitive on Windows)
# This enables O(1) lookups when comparing directories

$sourceTable = @{
    "Documents\report.docx" = [PSCustomObject]@{
        RelativePath  = "Documents\report.docx"
        FullPath      = "C:\Source\Documents\report.docx"
        Hash          = "E3B0C44298FC1C149AFBF4C8996FB92427AE41E4649B934CA495991B7852B855"
        Size          = 1048576      # 1 MB
        LastWriteTime = [DateTime]"2024-12-27 14:30:00"
    }
    "Documents\data.csv" = [PSCustomObject]@{
        RelativePath  = "Documents\data.csv"
        FullPath      = "C:\Source\Documents\data.csv"
        Hash          = "ABC123..."
        Size          = 524288       # 512 KB
        LastWriteTime = [DateTime]"2024-12-26 09:15:00"
    }
    # ... more files
}

$destTable = @{
    "Documents\report.docx" = [PSCustomObject]@{
        RelativePath  = "Documents\report.docx"
        FullPath      = "D:\Backup\Documents\report.docx"
        Hash          = "DIFFERENT_HASH_123..."  # <-- Different! Needs update
        Size          = 1024000
        LastWriteTime = [DateTime]"2024-12-20 10:00:00"
    }
    "Documents\old_file.txt" = [PSCustomObject]@{
        # This file doesn't exist in source - delete if Mirror mode
        ...
    }
}

# Comparison result structure
$comparisonResult = [PSCustomObject]@{
    ToCopy = [System.Collections.ArrayList]@(
        # Files in source but not in destination
    )
    ToUpdate = [System.Collections.ArrayList]@(
        [PSCustomObject]@{
            Source = $sourceTable["Documents\report.docx"]
            Destination = $destTable["Documents\report.docx"]
            Reason = "Content hash differs"
        }
    )
    ToDelete = [System.Collections.ArrayList]@(
        # Files in destination but not in source (Mirror mode)
    )
    Unchanged = [System.Collections.ArrayList]@(
        # Files that match exactly
    )
    Errors = [System.Collections.ArrayList]@(
        # Files that couldn't be processed
    )
}
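Given these two tables, the comparison itself is just two linear passes with O(1) lookups. A sketch against the structures above (assuming the `$sourceTable`, `$destTable`, and `$comparisonResult` objects shown):

```powershell
# Sketch: classify every file using O(1) hashtable lookups (O(n + m) total).
foreach ($key in $sourceTable.Keys) {
    if (-not $destTable.ContainsKey($key)) {
        # In source only -> copy to destination
        [void]$comparisonResult.ToCopy.Add($sourceTable[$key])
    }
    elseif ($sourceTable[$key].Hash -ne $destTable[$key].Hash) {
        # In both, but content differs -> update
        [void]$comparisonResult.ToUpdate.Add([PSCustomObject]@{
            Source      = $sourceTable[$key]
            Destination = $destTable[$key]
            Reason      = "Content hash differs"
        })
    }
    else {
        [void]$comparisonResult.Unchanged.Add($sourceTable[$key])
    }
}

# Second pass: anything in the destination with no source counterpart is an orphan
foreach ($key in $destTable.Keys) {
    if (-not $sourceTable.ContainsKey($key)) {
        [void]$comparisonResult.ToDelete.Add($destTable[$key])
    }
}
```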

Sync Direction Strategies (Visual)

ONE-WAY MIRROR (Source -> Destination):
=========================================

Source Directory              Operations              Destination Directory
                                                      (After Sync)
โ”œโ”€โ”€ new_file.txt        ---> COPY -------->          โ”œโ”€โ”€ new_file.txt
โ”œโ”€โ”€ modified.docx       ---> UPDATE ----->           โ”œโ”€โ”€ modified.docx
โ”œโ”€โ”€ unchanged.pdf       ---> SKIP ------->           โ”œโ”€โ”€ unchanged.pdf
                             DELETE <----            X  orphan.tmp (deleted)


TWO-WAY SYNC (Bidirectional):
=========================================

Source Directory              Operations              Destination Directory
                                                      (After Sync)
โ”œโ”€โ”€ source_new.txt      ---> COPY -------->          โ”œโ”€โ”€ source_new.txt
โ”œโ”€โ”€ conflict.docx       ---> RESOLVE ---->           โ”œโ”€โ”€ conflict.docx (resolution)
โ”œโ”€โ”€ unchanged.pdf       <--- SKIP ------->           โ”œโ”€โ”€ unchanged.pdf
                        <--- COPY --------           โ”œโ”€โ”€ dest_new.txt
โ”œโ”€โ”€ dest_new.txt                                     โ””โ”€โ”€ dest_new.txt

Phased Implementation Guide

Phase 1: Directory Comparison (2-3 hours)

Goal: List files in source and destination, identify new and missing files.

Learning focus: Recursive enumeration, relative paths, basic comparison

Steps:

  1. Create the main function skeleton with CmdletBinding
  2. Validate that source exists
  3. Create destination if it doesnโ€™t exist
  4. Enumerate source files with Get-ChildItem -Recurse -File
  5. Calculate relative paths for each file
  6. Build a hashtable keyed by relative path
  7. Repeat for destination
  8. Compare the two hashtables to find:
    • Files only in source (ToCopy)
    • Files in both (need hash comparison later)
    • Files only in destination (ToDelete if mirror mode)

Verification:

# Create test directories
New-Item -ItemType Directory -Path "C:\TestSync\Source" -Force
New-Item -ItemType Directory -Path "C:\TestSync\Dest" -Force
"Content A" | Out-File "C:\TestSync\Source\fileA.txt"
"Content B" | Out-File "C:\TestSync\Source\fileB.txt"
"Content C" | Out-File "C:\TestSync\Dest\fileC.txt"

# Run comparison
$result = Compare-Directories -Source "C:\TestSync\Source" -Destination "C:\TestSync\Dest"

# Verify
$result.ToCopy.Count    # Should be 2 (fileA, fileB)
$result.ToDelete.Count  # Should be 1 (fileC) if mirror mode

Key code patterns:

# Calculating relative path
$rootLength = $RootPath.TrimEnd('\').Length + 1
$relativePath = $file.FullName.Substring($rootLength)

# Building the hashtable
$fileTable = @{}
Get-ChildItem -Path $RootPath -Recurse -File | ForEach-Object {
    $relativePath = $_.FullName.Substring($rootLength)
    $fileTable[$relativePath] = [PSCustomObject]@{
        RelativePath = $relativePath
        FullPath = $_.FullName
        Size = $_.Length
        LastWriteTime = $_.LastWriteTime
    }
}

Phase 2: CmdletBinding and Parameters (2-3 hours)

Goal: Add professional parameter handling with -WhatIf, -Confirm, -Verbose.

Learning focus: Advanced functions, parameter validation, ShouldProcess

Steps:

  1. Add [CmdletBinding(SupportsShouldProcess=$true, ConfirmImpact='High')]
  2. Define parameters with validation:
    • [ValidateScript({Test-Path $_ -PathType Container})] for Source
    • [ValidateSet('SHA256', 'SHA1', 'MD5')] for HashAlgorithm
  3. Add Write-Verbose statements throughout
  4. Wrap file operations in $PSCmdlet.ShouldProcess() calls
  5. Test all three modes: -WhatIf, -Confirm, and normal execution

Verification:

# Test -WhatIf (should show what would happen, no actual changes)
Sync-Directories -Source C:\A -Destination D:\B -WhatIf
# Verify: D:\B is unchanged

# Test -Verbose (should show detailed progress)
Sync-Directories -Source C:\A -Destination D:\B -Verbose -WhatIf
# Verify: See VERBOSE: messages

# Test -Confirm (should prompt for each operation)
Sync-Directories -Source C:\A -Destination D:\B -Confirm
# Verify: Prompts "Are you sure?" for each file

Key code patterns:

function Sync-Directories {
    [CmdletBinding(SupportsShouldProcess=$true, ConfirmImpact='High')]
    param(
        [Parameter(Mandatory=$true, Position=0)]
        [ValidateScript({Test-Path $_ -PathType Container})]
        [string]$Source,

        [Parameter(Mandatory=$true, Position=1)]
        [string]$Destination,

        [switch]$Mirror
    )

    process {
        Write-Verbose "Comparing $Source to $Destination..."

        foreach ($file in $filesToCopy) {
            if ($PSCmdlet.ShouldProcess($file.RelativePath, "Copy new file")) {
                Copy-Item -Path $file.FullPath -Destination $destPath
            }
        }
    }
}

Phase 3: File Hashing (2-3 hours)

Goal: Use SHA256 hashes to accurately detect modified files.

Learning focus: Cryptographic hashes, performance optimization

Steps:

  1. Add hash calculation to file enumeration
  2. Handle errors during hash calculation (locked files)
  3. Compare hashes for files that exist in both locations
  4. Only mark as โ€œToUpdateโ€ if hashes differ
  5. Implement hash algorithm parameter (SHA256/SHA1/MD5)
  6. Add hash caching to avoid recalculating

Verification:

# Create files with same name but different content
"Original content" | Out-File "C:\TestSync\Source\shared.txt"
"Modified content" | Out-File "C:\TestSync\Dest\shared.txt"

# Run comparison
$result = Compare-Directories -Source "C:\TestSync\Source" -Destination "C:\TestSync\Dest"

# Verify
$result.ToUpdate.Count  # Should be 1 (shared.txt has different hash)

Performance test:

# Create 1000 files for performance testing
1..1000 | ForEach-Object {
    "Content $_" | Out-File "C:\TestSync\Source\file$_.txt"
}

# Measure comparison time
Measure-Command {
    Compare-Directories -Source "C:\TestSync\Source" -Destination "C:\TestSync\Dest"
}
# Target: < 15 seconds for 1,000 files (the 10,000-file goal is < 30 seconds)

Key code patterns:

function Get-FileHashSafe {
    param(
        [string]$Path,
        [string]$Algorithm = 'SHA256'
    )

    try {
        return (Get-FileHash -Path $Path -Algorithm $Algorithm -ErrorAction Stop).Hash
    }
    catch {
        Write-Warning "Could not hash $Path : $_"
        return $null
    }
}
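Step 6's hash cache can be a plain hashtable keyed by full path. A minimal sketch wrapping Get-FileHashSafe above (the cache variable name is illustrative):

```powershell
# Sketch: memoize hashes so repeated comparisons don't rehash the same file
$script:HashCache = @{}

function Get-FileHashCached {
    param(
        [string]$Path,
        [string]$Algorithm = 'SHA256'
    )

    if (-not $script:HashCache.ContainsKey($Path)) {
        # First request for this path: compute and remember the hash
        $script:HashCache[$Path] = Get-FileHashSafe -Path $Path -Algorithm $Algorithm
    }
    return $script:HashCache[$Path]
}
```

Note that a cache entry becomes stale if the file changes; invalidate it (or key the cache on path plus LastWriteTime) when the file's timestamp moves.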

Phase 4: Copy/Delete with Progress (2-3 hours)

Goal: Execute sync operations with progress indication.

Learning focus: Write-Progress, error handling, directory creation

Steps:

  1. Create destination directories as needed
  2. Copy new files with Copy-Item
  3. Update modified files with Copy-Item -Force
  4. Delete orphaned files (if Mirror mode)
  5. Add Write-Progress showing current file and percentage
  6. Handle per-file errors without stopping entire sync
  7. Track success/failure counts

Verification:

# Run actual sync
Sync-Directories -Source C:\TestSync\Source -Destination C:\TestSync\Dest -Verbose

# Verify files were copied
Test-Path "C:\TestSync\Dest\fileA.txt"  # Should be True
Test-Path "C:\TestSync\Dest\fileB.txt"  # Should be True

# Verify content matches
(Get-FileHash "C:\TestSync\Source\fileA.txt").Hash -eq `
(Get-FileHash "C:\TestSync\Dest\fileA.txt").Hash  # Should be True

Key code patterns:

$totalOperations = $toCopy.Count + $toUpdate.Count + $toDelete.Count
$completed = 0

foreach ($file in $toCopy) {
    $completed++
    $percent = [int](($completed / $totalOperations) * 100)

    Write-Progress -Activity "Syncing files" `
                   -Status "Copying $($file.RelativePath)" `
                   -PercentComplete $percent

    $destPath = Join-Path $Destination $file.RelativePath
    $destDir = Split-Path $destPath -Parent

    if (-not (Test-Path $destDir)) {
        New-Item -ItemType Directory -Path $destDir -Force | Out-Null
    }

    try {
        if ($PSCmdlet.ShouldProcess($file.RelativePath, "Copy file")) {
            Copy-Item -Path $file.FullPath -Destination $destPath -Force
            $results.Copied++
        }
    }
    catch {
        Write-Warning "Failed to copy $($file.RelativePath): $_"
        $results.Failed++
    }
}

Write-Progress -Activity "Syncing files" -Completed

Phase 5: Logging and Polish (2-3 hours)

Goal: Add comprehensive logging and handle edge cases.

Learning focus: Logging patterns, exclusion filters, final polish

Steps:

  1. Create logging function that writes to file
  2. Log all operations (copy, update, delete, errors)
  3. Add timestamps to log entries
  4. Implement exclude patterns (e.g., *.tmp, node_modules\*)
  5. Add retry logic for transient errors
  6. Generate final summary report
  7. Handle edge cases:
    • Empty directories
    • Read-only files
    • Very long paths
    • Symbolic links

Verification:

# Run sync with logging
Sync-Directories -Source C:\TestSync\Source -Destination C:\TestSync\Dest `
                 -LogPath C:\TestSync\sync.log -Verbose

# Check log file
Get-Content C:\TestSync\sync.log

# Test exclude patterns
Sync-Directories -Source C:\TestSync\Source -Destination C:\TestSync\Dest `
                 -Exclude "*.tmp", "temp\*" -WhatIf

# Verify excluded files are not in the sync list

Key code patterns:

function Write-SyncLog {
    param(
        [string]$LogPath,
        [ValidateSet('INFO', 'WARN', 'ERROR', 'COPY', 'UPDATE', 'DELETE')]
        [string]$Level,
        [string]$Message
    )

    $timestamp = Get-Date -Format "yyyy-MM-dd HH:mm:ss"
    $logLine = "[$timestamp] $($Level.PadRight(6)) - $Message"

    Add-Content -Path $LogPath -Value $logLine
    Write-Verbose $Message
}

# Exclusion filtering
$excludePatterns = @("*.tmp", "temp\*", "*.bak")

$files = Get-ChildItem -Path $Source -Recurse -File | Where-Object {
    $relativePath = $_.FullName.Substring($rootLength)
    $excluded = $false

    foreach ($pattern in $excludePatterns) {
        if ($relativePath -like $pattern) {
            Write-Verbose "Excluding: $relativePath (matches $pattern)"
            $excluded = $true
            break
        }
    }

    -not $excluded
}

Testing Strategy

Unit Tests

Test ID Test Name Input Expected Result
UT01 Compare identical directories Same files both sides Empty ToCopy, ToUpdate, ToDelete
UT02 Detect new file File only in source File appears in ToCopy list
UT03 Detect modified file Same file, different content File appears in ToUpdate list
UT04 Detect deleted file File only in destination File appears in ToDelete list (mirror mode)
UT05 Hash calculation Known content Known hash value
UT06 Relative path calculation Full path + root Correct relative path
UT07 Exclude pattern matching File matching pattern File excluded from results

Integration Tests

Test ID Test Name Steps Expected Result
IT01 Full sync - empty dest Source with 100 files, empty dest All files copied to dest
IT02 Full sync - partial dest Source with 100 files, dest with 50 Missing files copied
IT03 WhatIf mode Run with -WhatIf No files actually modified
IT04 Confirm mode Run with -Confirm Prompts for each operation
IT05 Mirror delete Extra files in dest, -Mirror Extra files deleted
IT06 Error handling Locked file in source Skip file, continue with others
IT07 Logging Run with -LogPath Log file created with entries
IT08 Progress display Large file set Progress bar updates correctly

Performance Tests

Test ID Test Name Input Target
PT01 Small directory 100 files < 5 seconds
PT02 Medium directory 1,000 files < 15 seconds
PT03 Large directory 10,000 files < 30 seconds
PT04 Memory usage 10,000 files < 500 MB RAM

Test Script

# test-sync.ps1 - Automated testing script

$testRoot = "C:\TestSync"

# Setup test environment
function Initialize-TestEnvironment {
    Remove-Item -Path $testRoot -Recurse -Force -ErrorAction SilentlyContinue
    New-Item -ItemType Directory -Path "$testRoot\Source" -Force
    New-Item -ItemType Directory -Path "$testRoot\Dest" -Force
}

# Test: New files are detected
function Test-NewFileDetection {
    Initialize-TestEnvironment

    # Create file only in source
    "Content A" | Out-File "$testRoot\Source\newfile.txt"

    $result = Compare-Directories -Source "$testRoot\Source" -Destination "$testRoot\Dest"

    if ($result.ToCopy.Count -eq 1 -and $result.ToCopy[0].RelativePath -eq "newfile.txt") {
        Write-Host "PASS: Test-NewFileDetection" -ForegroundColor Green
    } else {
        Write-Host "FAIL: Test-NewFileDetection" -ForegroundColor Red
    }
}

# Test: Modified files are detected
function Test-ModifiedFileDetection {
    Initialize-TestEnvironment

    # Create file in both with different content
    "Original" | Out-File "$testRoot\Source\shared.txt"
    "Modified" | Out-File "$testRoot\Dest\shared.txt"

    $result = Compare-Directories -Source "$testRoot\Source" -Destination "$testRoot\Dest"

    if ($result.ToUpdate.Count -eq 1) {
        Write-Host "PASS: Test-ModifiedFileDetection" -ForegroundColor Green
    } else {
        Write-Host "FAIL: Test-ModifiedFileDetection" -ForegroundColor Red
    }
}

# Test: WhatIf doesn't modify files
function Test-WhatIfMode {
    Initialize-TestEnvironment

    "Source content" | Out-File "$testRoot\Source\test.txt"
    $originalCount = (Get-ChildItem "$testRoot\Dest" -File).Count

    Sync-Directories -Source "$testRoot\Source" -Destination "$testRoot\Dest" -WhatIf

    $afterCount = (Get-ChildItem "$testRoot\Dest" -File).Count

    if ($afterCount -eq $originalCount) {
        Write-Host "PASS: Test-WhatIfMode" -ForegroundColor Green
    } else {
        Write-Host "FAIL: Test-WhatIfMode" -ForegroundColor Red
    }
}

# Run all tests
Test-NewFileDetection
Test-ModifiedFileDetection
Test-WhatIfMode

Common Pitfalls and Debugging Tips

Problem: Hash calculation is extremely slow

Symptoms: Sync takes minutes for small directories

Causes:

  • Hashing every file, even when not necessary
  • Using SHA512 when SHA256 would suffice
  • Files on slow network drives

Solutions:

  1. Size-first comparison: Skip hash if sizes differ
    if ($sourceFile.Size -ne $destFile.Size) {
        # Different sizes = definitely different content
        return "DIFFERENT"
    }
    # Only hash if sizes match
  2. Lazy hash calculation: Only calculate the hash when it is needed
    # Don't pre-calculate all hashes
    $fileInfo = [PSCustomObject]@{
        RelativePath = $relativePath
        FullPath     = $fullPath
        Size         = $size
        _hash        = $null   # Calculated on demand
    }

    # Add a method that computes the hash on first access
    Add-Member -InputObject $fileInfo -MemberType ScriptMethod -Name GetHash -Value {
        if ($null -eq $this._hash) {
            $this._hash = (Get-FileHash -Path $this.FullPath -Algorithm SHA256).Hash
        }
        return $this._hash
    }

  3. Use a faster algorithm for initial checks:
    # Use MD5 for a quick check, SHA256 for verification
    $quickHash = Get-FileHash -Path $path -Algorithm MD5

Problem: Long paths fail (path exceeds 260 characters)

Symptoms: Error โ€œThe specified path, file name, or both are too longโ€

Cause: Windows MAX_PATH limit of 260 characters

Solutions:

  1. Use the \\?\ prefix (supports paths up to 32,767 characters):
    $longPath = "\\?\" + $originalPath
    Get-ChildItem -LiteralPath $longPath
    
  2. Use PowerShell 7+ which has native long path support

  3. Enable long paths in Windows 10+:
    # Registry setting (requires admin + reboot)
    Set-ItemProperty -Path "HKLM:\SYSTEM\CurrentControlSet\Control\FileSystem" `
                  -Name "LongPathsEnabled" -Value 1
    

Problem: Permission denied errors

Symptoms: โ€œAccess to the path is deniedโ€ for some files

Causes:

  • File owned by different user
  • File in use by another process
  • System/protected file

Solutions:

  1. Per-file error handling:
    try {
     Copy-Item -Path $source -Destination $dest -ErrorAction Stop
     $results.Copied++
    }
    catch [System.UnauthorizedAccessException] {
     Write-Warning "Permission denied: $($source) - skipping"
     $results.PermissionDenied++
    }
    catch [System.IO.IOException] {
     Write-Warning "File in use: $($source) - skipping"
     $results.InUse++
    }
    
  2. Retry with delay for locked files:
    $retries = 3
    $delay = 1000

    for ($i = 0; $i -lt $retries; $i++) {
        try {
            Copy-Item -Path $source -Destination $dest -ErrorAction Stop
            break
        }
        catch [System.IO.IOException] {
            if ($i -eq $retries - 1) { throw }
            Start-Sleep -Milliseconds $delay
            $delay *= 2   # Exponential backoff
        }
    }

Problem: Progress bar flickers or updates too fast

Symptoms: Console scrolling rapidly, hard to read

Cause: Updating progress for every small file

Solution: Throttle progress updates:

$lastProgressUpdate = [DateTime]::MinValue
$progressInterval = [TimeSpan]::FromMilliseconds(100)

foreach ($file in $files) {
    # ... do work ...

    if (([DateTime]::Now - $lastProgressUpdate) -gt $progressInterval) {
        Write-Progress -Activity "Syncing" -Status $file.Name -PercentComplete $percent
        $lastProgressUpdate = [DateTime]::Now
    }
}

Problem: Memory usage grows unbounded

Symptoms: PowerShell process consumes gigabytes of RAM

Causes:

  • Storing all file contents in memory
  • Building massive arrays instead of streaming

Solutions:

  1. Use ArrayList instead of array +=:
    # BAD: Creates a new array on every append (O(n^2) total work)
    $files = @()
    $files += $newFile   # Slow!

    # GOOD: ArrayList mutates in place
    $files = [System.Collections.ArrayList]::new()
    [void]$files.Add($newFile)   # Fast!

  2. Stream results instead of collecting:
# BAD: Collect all then process
$allFiles = Get-ChildItem -Recurse
foreach ($file in $allFiles) { ... }

# GOOD: Stream as enumerated
Get-ChildItem -Recurse | ForEach-Object { ... }

Problem: Sync runs but nothing happens

Symptoms: Script completes, no files copied, no errors

Cause: Often a path issue or incorrect comparison

Debugging:

# Add verbose output everywhere
Write-Verbose "Source path: $Source"
Write-Verbose "Files found in source: $($sourceFiles.Count)"
Write-Verbose "Files found in dest: $($destFiles.Count)"
Write-Verbose "To copy: $($toCopy.Count)"
Write-Verbose "To update: $($toUpdate.Count)"

# Check if paths are being calculated correctly
$sourceFiles | ForEach-Object {
    Write-Verbose "Source file: $($_.RelativePath)"
}

# Verify ShouldProcess is being called
if ($PSCmdlet.ShouldProcess($file, "Copy")) {
    Write-Verbose "ShouldProcess returned True for $file"
    # ... actual copy
} else {
    Write-Verbose "ShouldProcess returned False (WhatIf mode)"
}

Extensions and Challenges

Easy Extensions

  1. JSON output mode - Output comparison results as JSON for automation
    $result | ConvertTo-Json -Depth 5 | Out-File "sync_result.json"
    
  2. Email notification - Send summary email after sync completes

  3. Scheduled execution - Create a Windows Task Scheduler job
    $action = New-ScheduledTaskAction -Execute "pwsh.exe" `
     -Argument "-File C:\Scripts\Sync-Directories.ps1 -Source C:\A -Destination D:\B"
    $trigger = New-ScheduledTaskTrigger -Daily -At "2:00AM"
    Register-ScheduledTask -TaskName "NightlySync" -Action $action -Trigger $trigger
    
  4. Checksum file generation - Generate .sha256 files alongside synced files
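Extension 4 can be sketched as a post-sync pass. This minimal version writes a `.sha256` sidecar next to each synced file, using the conventional `sha256sum` format (hash, two spaces, filename); the paths are illustrative:

```powershell
# Sketch: generate a .sha256 sidecar for every synced file
Get-ChildItem -Path $Destination -Recurse -File |
    Where-Object { $_.Extension -ne '.sha256' } |
    ForEach-Object {
        $hash = (Get-FileHash -Path $_.FullName -Algorithm SHA256).Hash
        "$hash  $($_.Name)" | Out-File -FilePath "$($_.FullName).sha256" -Encoding ascii
    }
```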

Medium Extensions

  1. Bandwidth throttling - Limit copy speed to avoid saturating network
    function Copy-ItemThrottled {
     param($Source, $Destination, [int]$BytesPerSecond)
     # Read in chunks, sleep between chunks
    }
    
  2. Incremental sync with journal - Track last sync time, only check modified files

  3. Parallel hashing - Use PowerShell 7 ForEach-Object -Parallel for hash calculation

  4. Compression during transfer - Compress files before copying, decompress at destination
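Extension 3 (parallel hashing) might be sketched like this with PowerShell 7's ForEach-Object -Parallel; the -ThrottleLimit value is a tuning assumption, not a recommendation:

```powershell
# Sketch: hash files in parallel (requires PowerShell 7+)
$hashed = Get-ChildItem -Path $Source -Recurse -File |
    ForEach-Object -Parallel {
        # Each pipeline object ($_) is hashed on its own runspace
        [PSCustomObject]@{
            FullPath = $_.FullName
            Hash     = (Get-FileHash -Path $_.FullName -Algorithm SHA256).Hash
        }
    } -ThrottleLimit 8
```

Parallelism helps most when hashing is I/O-bound across many files; measure before assuming a speedup, since disk contention can make it slower on spinning media.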

Advanced Extensions

  1. Two-way sync with conflict resolution - Implement bidirectional sync
    -ConflictResolution NewerWins
    -ConflictResolution SourceWins
    -ConflictResolution KeepBoth
    
  2. Delta sync - Only transfer changed portions of files (like rsync)

  3. Encryption - Encrypt files during transfer, decrypt at destination

  4. Cloud integration - Sync to Azure Blob Storage or AWS S3
    Sync-Directories -Source C:\Local -Destination "az://container/path"
    
  5. Real-time sync - Use FileSystemWatcher to sync on change
    $watcher = New-Object System.IO.FileSystemWatcher
    $watcher.Path = $Source
    $watcher.IncludeSubdirectories = $true
    $watcher.EnableRaisingEvents = $true
    Register-ObjectEvent $watcher "Changed" -Action { Sync-File $Event.SourceEventArgs.FullPath }
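For extension 1, the conflict-resolution strategies could be centralized in a small resolver. This is a hypothetical helper, assuming the file-info objects from the comparison phase (which carry LastWriteTime):

```powershell
# Sketch: decide the action for a two-way sync conflict
function Resolve-SyncConflict {
    param(
        [PSCustomObject]$SourceFile,
        [PSCustomObject]$DestFile,
        [ValidateSet('NewerWins', 'SourceWins', 'KeepBoth')]
        [string]$Strategy = 'NewerWins'
    )

    switch ($Strategy) {
        'SourceWins' { return 'CopySourceToDest' }
        'NewerWins'  {
            if ($SourceFile.LastWriteTime -ge $DestFile.LastWriteTime) {
                return 'CopySourceToDest'
            }
            return 'CopyDestToSource'
        }
        'KeepBoth'   { return 'RenameAndKeepBoth' }  # e.g. append a timestamp suffix
    }
}
```

The caller maps each returned action onto the actual Copy-Item/Rename-Item operations, keeping the decision logic testable in isolation.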
    

Books That Will Help

Topic Book Relevant Chapter(s)
PowerShell fundamentals Learn PowerShell in a Month of Lunches (4th ed.) by Travis Plunk et al. Ch. 3: Using the Help System; Ch. 7: The Pipeline; Ch. 18: Variables
Advanced PowerShell Windows PowerShell in Action (3rd ed.) by Bruce Payette Ch. 7: Advanced Functions; Ch. 8: Scripts, Functions, and Filters; Ch. 11: Error Handling
Hash functions and data integrity Designing Data-Intensive Applications by Martin Kleppmann Ch. 5: Replication (checksums for data validation); Ch. 7: Transactions
Algorithm efficiency Introduction to Algorithms by Cormen et al. Ch. 11: Hash Tables (for O(1) lookup understanding)
Error handling patterns Release It! by Michael Nygard Ch. 4: Stability Patterns (timeouts, retries, circuit breakers)
File systems deep dive Operating Systems: Three Easy Pieces by Arpaci-Dusseau Ch. 39-41: Files and Directories


Self-Assessment Checklist

Before considering this project complete, verify you can answer โ€œyesโ€ to all:

Core Functionality

  • Script compares two directories correctly
  • New files (only in source) are identified
  • Modified files are detected by comparing file hashes
  • -WhatIf shows what would happen without making changes
  • -Confirm prompts before each destructive action
  • -Verbose provides detailed progress information
  • Progress bar displays during sync operations
  • Individual file errors are caught and logged (sync continues)
  • Mirror mode correctly deletes files not in source
  • Log file captures all operations with timestamps

Performance

  • 10,000 files compared in under 30 seconds
  • Memory usage stays under 500MB for large directories
  • Hashtable-based comparison achieves O(n) complexity

Edge Cases

  • Works with empty source or destination directories
  • Handles files with special characters in names
  • Handles very long file paths (260+ characters)
  • Handles read-only files appropriately
  • Handles locked files without crashing
  • Exclude patterns work correctly

Code Quality

  • Functions are well-documented with comment-based help
  • Parameter validation prevents invalid input
  • No global variables - all state passed via parameters
  • Consistent error handling throughout

Understanding

  • Can explain why hash comparison is better than timestamp comparison
  • Can explain why hashtables provide O(1) lookup
  • Can explain how SupportsShouldProcess enables -WhatIf and -Confirm
  • Can explain the difference between one-way mirror and two-way sync
  • Can describe when to use ErrorAction Stop vs Continue

Resources

Similar Tools for Reference

  • robocopy - Windows robust file copy (study its flags for feature ideas)
  • rsync - Unix sync tool (delta transfer algorithm)
  • FreeFileSync - Open source GUI sync tool

Interview Preparation

Questions You Should Be Able to Answer

  1. โ€œWhy use file hashes instead of timestamps for comparison?โ€
    • Timestamps are unreliable across timezones, DST changes, and different clocks
    • Copy operations may preserve or modify timestamps inconsistently
    • Hashes compare actual content, providing definitive equality
    • Even if file times differ, identical hashes mean identical content
  2. โ€œHow would you optimize this for 1 million files?โ€
    • Use hashtables for O(1) lookups instead of nested loops
    • Stream files instead of loading all into memory
    • Implement parallel hashing with PowerShell 7 ForEach-Object -Parallel
    • Use size comparison as first-tier filter before hashing
    • Consider delta sync with change journals
  3. โ€œExplain how -WhatIf works in your implementation.โ€
    • CmdletBinding with SupportsShouldProcess adds -WhatIf automatically
    • Each operation wrapped in $PSCmdlet.ShouldProcess() check
    • When -WhatIf is specified, ShouldProcess returns $false
    • Script shows what would happen without executing anything
    • ConfirmImpact='High' ensures prompting for destructive operations
  4. โ€œHow do you handle errors during sync?โ€
    • Per-file try/catch blocks prevent one failure from stopping sync
    • Different exception types handled differently (permission vs locked vs network)
    • Retry logic with exponential backoff for transient errors
    • All errors logged with details for later review
    • Summary report shows success/failure counts
  5. โ€œWhatโ€™s the time complexity of your comparison algorithm?โ€
    • Building source hashtable: O(n) where n = source file count
    • Building destination hashtable: O(m) where m = dest file count
    • Comparing: O(n) lookups, each O(1) with hashtable
    • Total: O(n + m) = O(n) linear time
    • vs naive nested loop: O(n * m) = O(n^2) quadratic time