P06: File Synchronization Tool
Project Overview
What you'll build: A PowerShell-based file sync tool that compares two directories and synchronizes them, showing what would change and then applying changes with confirmation.
| Attribute | Value |
|---|---|
| Difficulty | Intermediate |
| Time Estimate | 1 week |
| Programming Language | PowerShell |
| Knowledge Area | Filesystem, Scripting, Algorithms |
| Prerequisites | Basic PowerShell, understanding of filesystems |
Learning Objectives
After completing this project, you will be able to:
- Traverse directory trees recursively - Enumerate files across nested folder structures efficiently
- Compare files using cryptographic hashes - Use SHA256/MD5 for reliable change detection beyond timestamps
- Build production-quality PowerShell cmdlets - Implement CmdletBinding with -WhatIf, -Confirm, and -Verbose
- Handle errors at scale - Gracefully recover from individual file failures without stopping the entire sync
- Implement efficient comparison algorithms - Use hashtables for O(1) lookups instead of nested loops
- Design robust logging systems - Track all operations for audit trails and debugging
- Understand sync strategies - Mirror vs. bidirectional sync with conflict resolution
Deep Theoretical Foundation
Filesystem Traversal and Recursion
Understanding directory structure:
A filesystem is a tree data structure where directories (folders) are nodes and files are leaves:
C:\Source\
├── Documents\
│   ├── report.docx
│   ├── data.xlsx
│   └── Archive\
│       ├── 2024_report.docx
│       └── 2023_report.docx
├── Images\
│   ├── photo1.jpg
│   └── photo2.png
└── config.json
Recursive traversal algorithm:
FUNCTION TraverseDirectory(path):
FOR each item in path:
IF item is a file:
PROCESS file
ELSE IF item is a directory:
TraverseDirectory(item) // Recursive call
In PowerShell, Get-ChildItem -Recurse handles this automatically:
# Non-recursive - only immediate children
Get-ChildItem -Path "C:\Source" -File
# Recursive - all descendants
Get-ChildItem -Path "C:\Source" -File -Recurse
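If you ever need custom per-directory logic, the pseudocode above translates almost line-for-line into PowerShell. A minimal hand-rolled sketch (Invoke-DirectoryTraversal is a name chosen here for illustration, not a built-in):
```powershell
function Invoke-DirectoryTraversal {
    param([string]$Path)
    foreach ($item in Get-ChildItem -Path $Path) {
        if ($item.PSIsContainer) {
            Invoke-DirectoryTraversal -Path $item.FullName   # Recursive call
        } else {
            $item.FullName   # "Process" the file by emitting its path
        }
    }
}
```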
Why recursion matters for sync:
When syncing directories, you need to:
- Visit every file in the source
- Determine its relative path from root
- Check if corresponding file exists in destination
- Compare content if it exists
- Handle nested subdirectories that may or may not exist
Calculating relative paths:
# Given: C:\Source\Documents\report.docx
# Root: C:\Source\
# Relative path: Documents\report.docx
$rootPath = "C:\Source\"
$fullPath = "C:\Source\Documents\report.docx"
# Method 1: String manipulation
$relativePath = $fullPath.Substring($rootPath.Length)
# Method 2: .NET Path method (PowerShell 7+ / .NET Core; not available in Windows PowerShell 5.1)
$relativePath = [System.IO.Path]::GetRelativePath($rootPath, $fullPath)
Relative paths are crucial because they let you map source files to destination files regardless of the actual root paths.
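For example, the same relative path recombines with any destination root:
```powershell
# Rebuilding the destination path from the relative path
$relativePath = "Documents\report.docx"
$destPath = Join-Path -Path "D:\Backup" -ChildPath $relativePath
# Result: D:\Backup\Documents\report.docx
```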
File Hashing for Change Detection
The fundamental problem:
How do you know if two files are the same? Consider:
File A: C:\Source\report.docx (1,234,567 bytes, modified 2024-12-25 10:30)
File B: D:\Backup\report.docx (1,234,567 bytes, modified 2024-12-25 10:30)
Are they identical? Maybe. The timestamp and size match, but matching metadata does not guarantee matching content.
Why timestamp comparison fails:
| Scenario | Problem |
|---|---|
| Timezone differences | File copied across timezones shows different time |
| Daylight saving | Timestamps shift by 1 hour twice yearly |
| Clock drift | Different machines have different clocks |
| Copy tools | Some preserve original, others use current time |
| Deliberate changes | User may edit an older file |
| Filesystem limitations | FAT32 has 2-second resolution, NTFS has 100-nanosecond |
The reliable solution: cryptographic hashes
A hash function transforms any input into a fixed-size output (digest):
"Hello World" โ SHA256 โ a591a6d40bf420404a011733cfb7b190d62c65bf0bcda32b57b277d9ad9f146e
"Hello World!" โ SHA256 โ 7f83b1657ff1fc53b92dc18148a1d65dfc2d4b1fa3d677284addd200126d9069
Key properties:
- Deterministic: Same input always produces same output
- Fixed size: Output is always 256 bits (for SHA256) regardless of input size
- Avalanche effect: Small input change causes massive output change
- One-way: Cannot reverse-engineer input from output
- Collision-resistant: Extremely unlikely for two different inputs to produce same output
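You can reproduce the two digests above yourself. Since Get-FileHash targets files and streams, this sketch hashes the raw string bytes with the underlying .NET API:
```powershell
# Demonstrates determinism and the avalanche effect on the two example strings
$sha256 = [System.Security.Cryptography.SHA256]::Create()
foreach ($text in 'Hello World', 'Hello World!') {
    $bytes  = [System.Text.Encoding]::UTF8.GetBytes($text)
    $digest = -join ($sha256.ComputeHash($bytes) | ForEach-Object { $_.ToString('x2') })
    "{0,-14} -> {1}" -f $text, $digest
}
$sha256.Dispose()
```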
Hash algorithm comparison:
| Algorithm | Output Size | Speed | Security | Use Case |
|---|---|---|---|---|
| MD5 | 128 bits | Fastest | Broken (collisions found) | Quick checks, non-security |
| SHA1 | 160 bits | Fast | Weakened | Legacy compatibility |
| SHA256 | 256 bits | Medium | Strong | Recommended for file sync |
| SHA512 | 512 bits | Slower | Strongest | High-security environments |
| xxHash | 64/128 bits | Very fast | Non-cryptographic | When speed matters more than security |
PowerShell hash calculation:
# Get hash of a file
$hash = Get-FileHash -Path "C:\file.txt" -Algorithm SHA256
$hash.Hash # Returns: ABC123DEF456...
# Comparing two files
$sourceHash = (Get-FileHash -Path $sourcePath -Algorithm SHA256).Hash
$destHash = (Get-FileHash -Path $destPath -Algorithm SHA256).Hash
if ($sourceHash -eq $destHash) {
Write-Host "Files are identical"
} else {
Write-Host "Files differ - need to sync"
}
Performance consideration:
Hashing reads the entire file, so for large files it can be slow:
| File Size | Approximate Hash Time (SSD) |
|---|---|
| 1 MB | ~5 ms |
| 100 MB | ~200 ms |
| 1 GB | ~2 seconds |
| 10 GB | ~20 seconds |
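These timings vary with disk and CPU; a quick way to measure your own is Measure-Command (the path here is one of the test files created in Phase 1):
```powershell
# Measure hash time for a specific file on your own hardware
Measure-Command {
    Get-FileHash -Path "C:\TestSync\Source\fileA.txt" -Algorithm SHA256
} | Select-Object TotalMilliseconds
```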
Optimization strategy: Multi-tier comparison
function Compare-FilePair {
    param([System.IO.FileInfo]$source, [System.IO.FileInfo]$dest)

    # Tier 1: Size check (instant) - different sizes mean different content
    if ($source.Length -ne $dest.Length) {
        return "DIFFERENT"
    }
    # Tier 2: Quick timestamp check (instant) - a cheap heuristic, not definitive
    if ($source.LastWriteTime -gt $dest.LastWriteTime) {
        return "SOURCE_NEWER"
    }
    # Tier 3: Hash comparison (slow but definitive)
    $sourceHash = (Get-FileHash -Path $source.FullName -Algorithm SHA256).Hash
    $destHash = (Get-FileHash -Path $dest.FullName -Algorithm SHA256).Hash
    if ($sourceHash -ne $destHash) {
        return "DIFFERENT"
    }
    return "IDENTICAL"
}
Advanced Functions and CmdletBinding
What makes a PowerShell function "advanced"?
Basic functions:
function Copy-MyFile {
param($Source, $Destination)
Copy-Item -Path $Source -Destination $Destination
}
Advanced functions add professional-grade features:
function Copy-MyFile {
[CmdletBinding(SupportsShouldProcess=$true, ConfirmImpact='High')]
param(
[Parameter(Mandatory=$true, Position=0, ValueFromPipeline=$true)]
[ValidateScript({Test-Path $_ -PathType Leaf})]
[string]$Source,
[Parameter(Mandatory=$true, Position=1)]
[string]$Destination
)
process {
if ($PSCmdlet.ShouldProcess($Source, "Copy file")) {
Copy-Item -Path $Source -Destination $Destination
Write-Verbose "Copied $Source to $Destination"
}
}
}
CmdletBinding parameters explained:
| Parameter | Purpose |
|---|---|
| SupportsShouldProcess | Enables -WhatIf and -Confirm parameters |
| ConfirmImpact | Controls automatic confirmation prompting |
| DefaultParameterSetName | Sets default when multiple parameter sets exist |
| PositionalBinding | Allows positional parameter usage |
SupportsShouldProcess in depth:
When you add SupportsShouldProcess=$true, PowerShell automatically adds -WhatIf and -Confirm parameters:
# User can run:
Sync-Directories -Source C:\A -Destination D:\B -WhatIf # Preview only
Sync-Directories -Source C:\A -Destination D:\B -Confirm # Confirm each action
Sync-Directories -Source C:\A -Destination D:\B # Execute normally
Using $PSCmdlet.ShouldProcess():
# Single parameter: the operation defaults to the name of the calling command
if ($PSCmdlet.ShouldProcess($targetFile)) {
    Remove-Item $targetFile
}
# Output: What if: Performing the operation "<your function's name>" on target "C:\file.txt".
# Two parameters: "Performing operation on target"
if ($PSCmdlet.ShouldProcess($targetFile, "Delete file")) {
Remove-Item $targetFile
}
# Output: What if: Performing the operation "Delete file" on target "C:\file.txt".
# Three parameters: Complete control
if ($PSCmdlet.ShouldProcess($target, $action, $verboseWarning)) {
# Action
}
ConfirmImpact levels:
| Level | Behavior |
|---|---|
| None | Never auto-prompts |
| Low | Prompts only when $ConfirmPreference is set to Low |
| Medium | Prompts when $ConfirmPreference is Medium or Low |
| High | Prompts at the default $ConfirmPreference (High) or lower, unless -Confirm:$false is specified |
For a file sync tool that deletes files, use ConfirmImpact='High'.
Write-Verbose for debugging:
function Sync-Directories {
[CmdletBinding()]
param(...)
# These only appear when user specifies -Verbose
Write-Verbose "Starting directory comparison..."
Write-Verbose "Found $($files.Count) files in source"
Write-Verbose "Hash comparison for: $($file.Name)"
}
# User runs:
Sync-Directories -Source C:\A -Destination D:\B -Verbose
Error Handling at Scale
When syncing thousands of files, you need strategies beyond simple try/catch:
The problem with stopping on first error:
# Bad: One locked file stops everything
foreach ($file in $files) {
Copy-Item -Path $file.FullName -Destination $dest # Throws on locked file
}
# Result: 5000 files to sync, stops after 100 due to one locked file
Per-file error handling:
$results = @{
Successful = [System.Collections.ArrayList]::new()
Failed = [System.Collections.ArrayList]::new()
}
foreach ($file in $files) {
try {
Copy-Item -Path $file.FullName -Destination $dest -ErrorAction Stop
[void]$results.Successful.Add($file.Name)
}
catch {
[void]$results.Failed.Add([PSCustomObject]@{
File = $file.Name
Error = $_.Exception.Message
})
Write-Warning "Failed to copy $($file.Name): $($_.Exception.Message)"
# Continue with next file
}
}
Write-Host "Completed: $($results.Successful.Count) succeeded, $($results.Failed.Count) failed"
Error categories and handling strategies:
| Error Type | Cause | Strategy |
|---|---|---|
| File locked | Another process has exclusive access | Log and skip, retry later |
| Access denied | Insufficient permissions | Log and skip, flag for attention |
| Path too long | Path exceeds 260 characters | Use the \\?\ prefix or PowerShell 7 |
| Disk full | Destination has no space | Stop sync, alert user |
| Network error | Transient network issue | Retry with exponential backoff |
| File changed during copy | Source modified mid-copy | Re-read and retry |
ErrorAction preference values:
# Per-command
Copy-Item -Path $src -Destination $dest -ErrorAction Stop # Throw exception
Copy-Item -Path $src -Destination $dest -ErrorAction Continue # Show error, continue
Copy-Item -Path $src -Destination $dest -ErrorAction SilentlyContinue # Suppress
Copy-Item -Path $src -Destination $dest -ErrorAction Inquire # Ask user
# Script-wide
$ErrorActionPreference = 'Stop' # All errors throw exceptions
Retry logic with exponential backoff:
function Invoke-WithRetry {
param(
[ScriptBlock]$ScriptBlock,
[int]$MaxRetries = 3,
[int]$InitialDelayMs = 100
)
$attempt = 0
$delay = $InitialDelayMs
while ($true) {
try {
return & $ScriptBlock
}
catch {
$attempt++
if ($attempt -ge $MaxRetries) {
throw $_
}
Write-Verbose "Attempt $attempt failed, retrying in ${delay}ms..."
Start-Sleep -Milliseconds $delay
$delay *= 2 # Exponential backoff: 100, 200, 400, 800...
}
}
}
# Usage
Invoke-WithRetry {
Copy-Item -Path $src -Destination $dest -ErrorAction Stop
}
Comparison Algorithms: Naive vs. Optimized
The naive approach (O(n^2)):
# For each source file, scan all destination files
foreach ($sourceFile in $sourceFiles) { # n iterations
foreach ($destFile in $destFiles) { # n iterations per source
if ($sourceFile.Name -eq $destFile.Name) {
# Compare
break
}
}
}
# Total: n * n = n^2 comparisons
Performance impact:
- 100 files: 10,000 comparisons
- 1,000 files: 1,000,000 comparisons
- 10,000 files: 100,000,000 comparisons (VERY slow)
The optimized approach (O(n)):
# Step 1: Build hashtable from destination (O(n))
$destTable = @{}
foreach ($file in $destFiles) { # n iterations
$relativePath = GetRelativePath $file $destRoot
$destTable[$relativePath] = $file # O(1) insertion
}
# Step 2: Check each source file (O(n))
foreach ($file in $sourceFiles) { # n iterations
$relativePath = GetRelativePath $file $sourceRoot
if ($destTable.ContainsKey($relativePath)) { # O(1) lookup
# File exists in both - compare hashes
} else {
# File only in source - needs copy
}
}
# Total: n + n = 2n = O(n)
Visual comparison:
NAIVE (Nested Loops):
Source: [A, B, C, D, E]
Dest: [B, D, E, F, G]
A vs B? No A vs D? No A vs E? No A vs F? No A vs G? No (5 comparisons)
B vs B? Yes! (1 comparison)
C vs B? No C vs D? No C vs E? No C vs F? No C vs G? No (5 comparisons)
...
Total: Many comparisons
HASHTABLE (Single Pass):
Build table from Dest: {B: file, D: file, E: file, F: file, G: file}
A in table? No -> Copy
B in table? Yes -> Compare
C in table? No -> Copy
D in table? Yes -> Compare
E in table? Yes -> Compare
Total: 5 lookups (plus 5 insertions)
Hashtable structure for file comparison:
# Each entry in the hashtable
$fileEntry = [PSCustomObject]@{
RelativePath = "Documents\report.docx" # Key
FullPath = "C:\Source\Documents\report.docx"
Hash = "ABC123..." # Calculated on demand
Size = 1234567
LastWriteTime = [DateTime]"2024-12-25 10:30:00"
}
# The hashtable itself
$fileTable = @{
"Documents\report.docx" = $fileEntry1
"Documents\data.xlsx" = $fileEntry2
"Images\photo.jpg" = $fileEntry3
}
# O(1) lookup
$file = $fileTable["Documents\report.docx"]
Sync Direction Strategies
One-way mirror (Source -> Destination):
The destination becomes an exact copy of source:
Source                     Destination (before)      Destination (after)
├── file1.txt (new)        ├── file2.txt             ├── file1.txt (copied)
├── file2.txt (unchanged)  ├── file3.txt (obsolete)  ├── file2.txt (kept)
└── file4.txt (modified)   └── file4.txt (old)       └── file4.txt (updated)
                                                     # file3.txt DELETED
Operations (sketched in code after this list):
- New in source: Copy to destination
- Modified in source: Update in destination
- Missing from source: Delete from destination (mirror mode)
- Unchanged: Skip
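These operations fall out of two passes over the hashtables described earlier; a minimal sketch, assuming $sourceTable and $destTable are keyed by relative path:
```powershell
# Deriving the mirror operations from the two hashtables
$toCopy   = [System.Collections.ArrayList]::new()
$toCheck  = [System.Collections.ArrayList]::new()   # hash-compare these later
$toDelete = [System.Collections.ArrayList]::new()

foreach ($key in $sourceTable.Keys) {
    if ($destTable.ContainsKey($key)) { [void]$toCheck.Add($key) }   # in both
    else                              { [void]$toCopy.Add($key) }    # new in source
}
foreach ($key in $destTable.Keys) {
    if (-not $sourceTable.ContainsKey($key)) {
        [void]$toDelete.Add($key)   # missing from source: delete in mirror mode
    }
}
```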
Two-way sync (bidirectional):
Both directories can have changes, merged together:
Source (before)             Destination (before)
├── file1.txt (new here)    ├── file2.txt (new here)
├── shared.txt (modified)   ├── shared.txt (also modified!)  <- CONFLICT
└── old.txt                 └── old.txt

After two-way sync:
├── file1.txt                      ├── file1.txt (copied from source)
├── file2.txt (copied from dest)   ├── file2.txt
├── shared.txt (???)               ├── shared.txt (???)  <- How to resolve?
└── old.txt                        └── old.txt
Conflict resolution strategies:
| Strategy | Description | Use Case |
|---|---|---|
| Source wins | Always take source version | Authoritative source |
| Destination wins | Always take destination version | Destination is primary |
| Newer wins | Take file with later timestamp | General sync |
| Larger wins | Take larger file (no data loss) | Append-only logs |
| Keep both | Rename conflicting files | User decides later |
| Manual | Stop and ask user | Critical data |
| Merge | Attempt to merge changes | Text files with merge tools |
This project implements one-way mirror - the simpler and more common use case. Two-way sync requires tracking modification times, detecting conflicts, and implementing resolution strategies, which adds significant complexity.
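For the extension, one of the simpler strategies from the table is newer wins; an illustrative fragment only (not part of this project's mirror implementation):
```powershell
# "Newer wins" resolution for a two-way sync
# ($src and $dst are the entries for the same relative path, as in the tables above)
if ($src.LastWriteTime -gt $dst.LastWriteTime) {
    Copy-Item -Path $src.FullPath -Destination $dst.FullPath -Force   # source wins
}
elseif ($dst.LastWriteTime -gt $src.LastWriteTime) {
    Copy-Item -Path $dst.FullPath -Destination $src.FullPath -Force   # destination wins
}
# Equal timestamps with different hashes would still need a tie-breaker
```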
Complete Project Specification
Functional Requirements
| ID | Requirement | Priority | Implementation Phase |
|---|---|---|---|
| F1 | Compare two directories recursively | Must Have | Phase 1 |
| F2 | Identify new files in source | Must Have | Phase 1 |
| F3 | Identify modified files (by hash) | Must Have | Phase 3 |
| F4 | Identify files to delete (mirror mode) | Must Have | Phase 1 |
| F5 | Copy new/modified files to destination | Must Have | Phase 4 |
| F6 | Delete orphaned files (mirror mode) | Must Have | Phase 4 |
| F7 | Support -WhatIf parameter | Must Have | Phase 2 |
| F8 | Support -Confirm parameter | Must Have | Phase 2 |
| F9 | Support -Verbose parameter | Must Have | Phase 2 |
| F10 | Use file hashes (SHA256) for comparison | Should Have | Phase 3 |
| F11 | Show progress bar during sync | Should Have | Phase 4 |
| F12 | Log all operations to file | Should Have | Phase 5 |
| F13 | Handle locked files gracefully | Should Have | Phase 4 |
| F14 | Support exclude patterns (wildcards) | Nice to Have | Phase 5 |
| F15 | Retry failed operations | Nice to Have | Phase 5 |
| F16 | Generate summary report | Nice to Have | Phase 5 |
Non-Functional Requirements
| ID | Requirement | Target |
|---|---|---|
| NF1 | Compare 10,000 files in < 30 seconds | Performance |
| NF2 | Memory usage < 500MB for large directories | Resource efficiency |
| NF3 | Continue after individual file errors | Reliability |
| NF4 | Provide accurate progress indication | User experience |
| NF5 | Work with paths up to 32,000 characters | Compatibility |
| NF6 | Support PowerShell 5.1 and 7+ | Compatibility |
Real World Outcome
When complete, you'll have a production-ready sync tool:
Example -WhatIf Output
PS> Sync-Directories -Source C:\Work -Destination D:\Backup -Mirror -WhatIf
================================================================================
FILE SYNC PREVIEW (DRY RUN)
================================================================================
Source: C:\Work
Destination: D:\Backup
Mode: Mirror (destination will match source exactly)
================================================================================
ANALYZING DIRECTORIES...
Source files found: 1,247
Destination files found: 1,189
Comparison complete in: 3.2 seconds
--------------------------------------------------------------------------------
FILES TO BE COPIED (23 new files, 47.3 MB total):
--------------------------------------------------------------------------------
[NEW] Documents\Q4_Report.docx (2.1 MB)
[NEW] Documents\Budget_2025.xlsx (856 KB)
[NEW] Projects\ClientA\proposal.pdf (4.2 MB)
[NEW] Projects\ClientA\mockups\design_v3.psd (12.8 MB)
[NEW] Scripts\deploy.ps1 (3 KB)
... and 18 more files
--------------------------------------------------------------------------------
FILES TO BE UPDATED (12 modified files, 8.7 MB total):
--------------------------------------------------------------------------------
[MOD] Documents\Contacts.csv (45 KB -> 52 KB)
Reason: Content hash differs
[MOD] Projects\ClientB\timeline.xlsx (1.2 MB -> 1.4 MB)
Reason: Content hash differs
[MOD] config.json (2 KB -> 2 KB)
Reason: Content hash differs (same size)
... and 9 more files
--------------------------------------------------------------------------------
FILES TO BE DELETED (34 orphaned files, 156.2 MB total):
--------------------------------------------------------------------------------
[DEL] Archive\old_project.zip (89.4 MB)
Reason: Not present in source
[DEL] temp\cache.dat (45.2 MB)
Reason: Not present in source
[DEL] Documents\draft_v1.docx (1.2 MB)
Reason: Not present in source
... and 31 more files
================================================================================
SUMMARY
================================================================================
Files to copy: 23 files (47.3 MB)
Files to update: 12 files (8.7 MB)
Files to delete: 34 files (156.2 MB)
Files unchanged: 1,178 files
--------------------------------------------------------------------------------
Total operations: 69
Estimated time: ~45 seconds
================================================================================
What if: Use -WhatIf:$false or omit -WhatIf to execute these changes.
Progress Bar Example
Syncing files [=========================> ] 65% (812/1,247)
Currently: Copying Projects\ClientA\mockups\design_v3.psd (12.8 MB)
Speed: 45.2 MB/s | Elapsed: 00:01:23 | Remaining: ~00:00:44
Log File Format
# sync_log_20241227_143052.txt
================================================================================
SYNC LOG: 2024-12-27 14:30:52
Source: C:\Work
Destination: D:\Backup
Mode: Mirror
================================================================================
[14:30:52] START - Beginning file synchronization
[14:30:52] INFO - Enumerating source directory: C:\Work
[14:30:53] INFO - Found 1,247 files in source (2.3 GB total)
[14:30:53] INFO - Enumerating destination directory: D:\Backup
[14:30:54] INFO - Found 1,189 files in destination (2.1 GB total)
[14:30:55] INFO - Comparison complete: 23 to copy, 12 to update, 34 to delete
[14:30:55] COPY - Documents\Q4_Report.docx (2.1 MB) -> SUCCESS
[14:30:55] COPY - Documents\Budget_2025.xlsx (856 KB) -> SUCCESS
[14:30:56] COPY - Projects\ClientA\proposal.pdf (4.2 MB) -> SUCCESS
[14:30:58] COPY - Projects\ClientA\mockups\design_v3.psd (12.8 MB) -> SUCCESS
[14:30:58] WARN - Scripts\deploy.ps1 - Access denied, file in use by another process
[14:30:58] RETRY - Scripts\deploy.ps1 - Attempt 1 of 3
[14:30:59] RETRY - Scripts\deploy.ps1 - Attempt 2 of 3
[14:31:00] COPY - Scripts\deploy.ps1 (3 KB) -> SUCCESS (after retry)
[14:31:15] UPDATE - Documents\Contacts.csv (45 KB -> 52 KB) -> SUCCESS
[14:31:15] UPDATE - Projects\ClientB\timeline.xlsx (1.2 MB -> 1.4 MB) -> SUCCESS
[14:31:16] UPDATE - config.json (2 KB) -> SUCCESS
[14:31:25] DELETE - Archive\old_project.zip (89.4 MB) -> SUCCESS
[14:31:25] DELETE - temp\cache.dat (45.2 MB) -> SUCCESS
[14:31:26] ERROR - Documents\draft_v1.docx - Access denied (file locked)
[14:31:26] DELETE - remaining 31 files -> SUCCESS
[14:31:45] COMPLETE - Synchronization finished
================================================================================
SUMMARY
================================================================================
Copied: 23 files (47.3 MB)
Updated: 12 files (8.7 MB)
Deleted: 33 files (155.0 MB) - 1 failed
Unchanged: 1,178 files
Errors: 2 (1 recovered via retry, 1 permanent)
Duration: 53 seconds
================================================================================
Conflict Resolution Examples
When running in Mirror mode, conflicts are resolved automatically (source wins):
PS> Sync-Directories -Source C:\Work -Destination D:\Backup -Mirror -Verbose
VERBOSE: Conflict detected: Documents\shared.docx
Source: Modified 2024-12-27 14:30 (Size: 45,678 bytes, Hash: ABC123...)
Destination: Modified 2024-12-27 13:15 (Size: 44,892 bytes, Hash: DEF456...)
Resolution: Source wins (Mirror mode) - destination will be overwritten
VERBOSE: Copied Documents\shared.docx -> D:\Backup\Documents\shared.docx
For future bidirectional sync (extension challenge):
PS> Sync-Directories -Source C:\Work -Destination D:\Backup -TwoWay -ConflictResolution KeepBoth
WARNING: Conflict detected for Documents\shared.docx
Source modified: 2024-12-27 14:30:22
Destination modified: 2024-12-27 14:28:45
Resolution: Keeping both versions
-> Documents\shared.docx (destination version)
-> Documents\shared_CONFLICT_20241227_143022.docx (source version)
Solution Architecture
Component Diagram
+------------------------------------------------------------------+
|                         SYNC-DIRECTORIES                         |
|                        (Main Entry Point)                        |
|  Parameters:                                                     |
|    - Source (mandatory)       - HashAlgorithm (SHA256/MD5)       |
|    - Destination (mandatory)  - Exclude (patterns)               |
|    - Mirror (switch)          - LogPath (optional)               |
|    - WhatIf, Confirm, Verbose (automatic via CmdletBinding)      |
+--------------------------------+---------------------------------+
                                 |
                                 v
+------------------------------------------------------------------+
|                       DIRECTORY ENUMERATOR                       |
|   +------------------------+    +------------------------+       |
|   | Get-FileHashTable      |    | Get-FileHashTable      |       |
|   | (Source)               |    | (Destination)          |       |
|   | Returns: Hashtable     |    | Returns: Hashtable     |       |
|   |   Key:   RelativePath  |    |   Key:   RelativePath  |       |
|   |   Value: FileInfo      |    |   Value: FileInfo      |       |
|   +-----------+------------+    +-----------+------------+       |
|               +-----------------------------+                    |
|                                v                                 |
|   +----------------------------------------------------------+   |
|   |                   COMPARE-DIRECTORIES                    |   |
|   |  Input:  SourceTable, DestTable, Mirror flag             |   |
|   |  Output: ComparisonResult                                |   |
|   |    +-- ToCopy[]    (files only in source)                |   |
|   |    +-- ToUpdate[]  (files in both, different hash)       |   |
|   |    +-- ToDelete[]  (files only in dest, if Mirror)       |   |
|   |    +-- Unchanged[] (files match exactly)                 |   |
|   +----------------------------------------------------------+   |
+--------------------------------+---------------------------------+
                                 |
                                 v
+------------------------------------------------------------------+
|                        OPERATION EXECUTOR                        |
|  +------------+ +------------+ +-------------+ +------------+    |
|  | Copy Files | | Update     | | Delete      | | Logging    |    |
|  | ShouldProc | | ShouldProc | | ShouldProc  | | Write-Log  |    |
|  | Copy-Item  | | Copy-Item  | | Remove-Item | | to file    |    |
|  +------------+ +------------+ +-------------+ +------------+    |
|  +----------------------------------------------------------+    |
|  | PROGRESS REPORTER:                                       |    |
|  | Write-Progress with current file, percentage, ETA        |    |
|  +----------------------------------------------------------+    |
+--------------------------------+---------------------------------+
                                 |
                                 v
+------------------------------------------------------------------+
|                         RESULTS REPORTER                         |
|      Output: Summary object with counts, errors, duration        |
+------------------------------------------------------------------+
Hashtable Structure for File Comparison
# The hashtable key is the relative path (case-insensitive on Windows)
# This enables O(1) lookups when comparing directories
$sourceTable = @{
"Documents\report.docx" = [PSCustomObject]@{
RelativePath = "Documents\report.docx"
FullPath = "C:\Source\Documents\report.docx"
Hash = "E3B0C44298FC1C149AFBF4C8996FB92427AE41E4649B934CA495991B7852B855"
Size = 1048576 # 1 MB
LastWriteTime = [DateTime]"2024-12-27 14:30:00"
}
"Documents\data.csv" = [PSCustomObject]@{
RelativePath = "Documents\data.csv"
FullPath = "C:\Source\Documents\data.csv"
Hash = "ABC123..."
Size = 524288 # 512 KB
LastWriteTime = [DateTime]"2024-12-26 09:15:00"
}
# ... more files
}
$destTable = @{
"Documents\report.docx" = [PSCustomObject]@{
RelativePath = "Documents\report.docx"
FullPath = "D:\Backup\Documents\report.docx"
Hash = "DIFFERENT_HASH_123..." # <-- Different! Needs update
Size = 1024000
LastWriteTime = [DateTime]"2024-12-20 10:00:00"
}
"Documents\old_file.txt" = [PSCustomObject]@{
# This file doesn't exist in source - delete if Mirror mode
...
}
}
# Comparison result structure
$comparisonResult = [PSCustomObject]@{
ToCopy = [System.Collections.ArrayList]@(
# Files in source but not in destination
)
ToUpdate = [System.Collections.ArrayList]@(
[PSCustomObject]@{
Source = $sourceTable["Documents\report.docx"]
Destination = $destTable["Documents\report.docx"]
Reason = "Content hash differs"
}
)
ToDelete = [System.Collections.ArrayList]@(
# Files in destination but not in source (Mirror mode)
)
Unchanged = [System.Collections.ArrayList]@(
# Files that match exactly
)
Errors = [System.Collections.ArrayList]@(
# Files that couldn't be processed
)
}
Sync Direction Strategies (Visual)
ONE-WAY MIRROR (Source -> Destination):
=========================================
Source Directory         Operations          Destination Directory
                                             (After Sync)
├── new_file.txt    ---> COPY   --------->   ├── new_file.txt
├── modified.docx   ---> UPDATE --------->   ├── modified.docx
└── unchanged.pdf   ---> SKIP   --------->   ├── unchanged.pdf
                         DELETE <---------   X   orphan.tmp (deleted)

TWO-WAY SYNC (Bidirectional):
=========================================
Source Directory         Operations          Destination Directory
                                             (After Sync)
├── source_new.txt  ---> COPY    -------->   ├── source_new.txt
├── conflict.docx   ---> RESOLVE -------->   ├── conflict.docx (resolution)
├── unchanged.pdf   <--- SKIP    -------->   ├── unchanged.pdf
└── dest_new.txt    <--- COPY    <--------   └── dest_new.txt
Phased Implementation Guide
Phase 1: Directory Comparison (2-3 hours)
Goal: List files in source and destination, identify new and missing files.
Learning focus: Recursive enumeration, relative paths, basic comparison
Steps:
- Create the main function skeleton with CmdletBinding
- Validate that source exists
- Create destination if it doesn't exist
- Enumerate source files with Get-ChildItem -Recurse -File
- Calculate relative paths for each file
- Build a hashtable keyed by relative path
- Repeat for destination
- Compare the two hashtables to find:
  - Files only in source (ToCopy)
  - Files in both (need hash comparison later)
  - Files only in destination (ToDelete if mirror mode)
Verification:
# Create test directories
New-Item -ItemType Directory -Path "C:\TestSync\Source" -Force
New-Item -ItemType Directory -Path "C:\TestSync\Dest" -Force
"Content A" | Out-File "C:\TestSync\Source\fileA.txt"
"Content B" | Out-File "C:\TestSync\Source\fileB.txt"
"Content C" | Out-File "C:\TestSync\Dest\fileC.txt"
# Run comparison
$result = Compare-Directories -Source "C:\TestSync\Source" -Destination "C:\TestSync\Dest"
# Verify
$result.ToCopy.Count # Should be 2 (fileA, fileB)
$result.ToDelete.Count # Should be 1 (fileC) if mirror mode
Key code patterns:
# Calculating relative path
$rootLength = $RootPath.TrimEnd('\').Length + 1
$relativePath = $file.FullName.Substring($rootLength)
# Building the hashtable
$fileTable = @{}
Get-ChildItem -Path $RootPath -Recurse -File | ForEach-Object {
$relativePath = $_.FullName.Substring($rootLength)
$fileTable[$relativePath] = [PSCustomObject]@{
RelativePath = $relativePath
FullPath = $_.FullName
Size = $_.Length
LastWriteTime = $_.LastWriteTime
}
}
Phase 2: CmdletBinding and Parameters (2-3 hours)
Goal: Add professional parameter handling with -WhatIf, -Confirm, -Verbose.
Learning focus: Advanced functions, parameter validation, ShouldProcess
Steps:
- Add [CmdletBinding(SupportsShouldProcess=$true, ConfirmImpact='High')]
- Define parameters with validation:
  - [ValidateScript({Test-Path $_ -PathType Container})] for Source
  - [ValidateSet('SHA256', 'SHA1', 'MD5')] for HashAlgorithm
- Add Write-Verbose statements throughout
- Wrap file operations in $PSCmdlet.ShouldProcess() calls
- Test all three modes: -WhatIf, -Confirm, and normal execution
Verification:
# Test -WhatIf (should show what would happen, no actual changes)
Sync-Directories -Source C:\A -Destination D:\B -WhatIf
# Verify: D:\B is unchanged
# Test -Verbose (should show detailed progress)
Sync-Directories -Source C:\A -Destination D:\B -Verbose -WhatIf
# Verify: See VERBOSE: messages
# Test -Confirm (should prompt for each operation)
Sync-Directories -Source C:\A -Destination D:\B -Confirm
# Verify: Prompts "Are you sure?" for each file
Key code patterns:
function Sync-Directories {
[CmdletBinding(SupportsShouldProcess=$true, ConfirmImpact='High')]
param(
[Parameter(Mandatory=$true, Position=0)]
[ValidateScript({Test-Path $_ -PathType Container})]
[string]$Source,
[Parameter(Mandatory=$true, Position=1)]
[string]$Destination,
[switch]$Mirror
)
process {
Write-Verbose "Comparing $Source to $Destination..."
foreach ($file in $filesToCopy) {
if ($PSCmdlet.ShouldProcess($file.RelativePath, "Copy new file")) {
Copy-Item -Path $file.FullPath -Destination $destPath
}
}
}
}
Phase 3: File Hashing (2-3 hours)
Goal: Use SHA256 hashes to accurately detect modified files.
Learning focus: Cryptographic hashes, performance optimization
Steps:
- Add hash calculation to file enumeration
- Handle errors during hash calculation (locked files)
- Compare hashes for files that exist in both locations
- Only mark a file as ToUpdate if its hash differs
- Implement hash algorithm parameter (SHA256/SHA1/MD5)
- Add hash caching to avoid recalculating (see the sketch after this list)
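One possible shape for the caching step, sketched with a hypothetical Get-CachedFileHash helper:
```powershell
# Remember each hash by full path so repeated comparisons within
# one run don't re-read the file
$script:HashCache = @{}
function Get-CachedFileHash {
    param([string]$Path, [string]$Algorithm = 'SHA256')
    if (-not $script:HashCache.ContainsKey($Path)) {
        $script:HashCache[$Path] = (Get-FileHash -Path $Path -Algorithm $Algorithm).Hash
    }
    return $script:HashCache[$Path]
}
```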
Verification:
# Create files with same name but different content
"Original content" | Out-File "C:\TestSync\Source\shared.txt"
"Modified content" | Out-File "C:\TestSync\Dest\shared.txt"
# Run comparison
$result = Compare-Directories -Source "C:\TestSync\Source" -Destination "C:\TestSync\Dest"
# Verify
$result.ToUpdate.Count # Should be 1 (shared.txt has different hash)
Performance test:
# Create 1000 files for performance testing
1..1000 | ForEach-Object {
"Content $_" | Out-File "C:\TestSync\Source\file$_.txt"
}
# Measure comparison time
Measure-Command {
Compare-Directories -Source "C:\TestSync\Source" -Destination "C:\TestSync\Dest"
}
# Target: < 15 seconds for 1,000 files (NF1 targets 10,000 files in < 30 seconds)
Key code patterns:
function Get-FileHashSafe {
param(
[string]$Path,
[string]$Algorithm = 'SHA256'
)
try {
return (Get-FileHash -Path $Path -Algorithm $Algorithm -ErrorAction Stop).Hash
}
catch {
Write-Warning "Could not hash $Path : $_"
return $null
}
}
Phase 4: Copy/Delete with Progress (2-3 hours)
Goal: Execute sync operations with progress indication.
Learning focus: Write-Progress, error handling, directory creation
Steps:
- Create destination directories as needed
- Copy new files with Copy-Item
- Update modified files with Copy-Item -Force
- Delete orphaned files (if Mirror mode)
- Add Write-Progress showing current file and percentage
- Handle per-file errors without stopping entire sync
- Track success/failure counts
Verification:
# Run actual sync
Sync-Directories -Source C:\TestSync\Source -Destination C:\TestSync\Dest -Verbose
# Verify files were copied
Test-Path "C:\TestSync\Dest\fileA.txt" # Should be True
Test-Path "C:\TestSync\Dest\fileB.txt" # Should be True
# Verify content matches
(Get-FileHash "C:\TestSync\Source\fileA.txt").Hash -eq `
(Get-FileHash "C:\TestSync\Dest\fileA.txt").Hash # Should be True
Key code patterns:
$totalOperations = $toCopy.Count + $toUpdate.Count + $toDelete.Count
$completed = 0
foreach ($file in $toCopy) {
$completed++
$percent = [int](($completed / $totalOperations) * 100)
Write-Progress -Activity "Syncing files" `
-Status "Copying $($file.RelativePath)" `
-PercentComplete $percent
$destPath = Join-Path $Destination $file.RelativePath
$destDir = Split-Path $destPath -Parent
if (-not (Test-Path $destDir)) {
New-Item -ItemType Directory -Path $destDir -Force | Out-Null
}
try {
if ($PSCmdlet.ShouldProcess($file.RelativePath, "Copy file")) {
Copy-Item -Path $file.FullPath -Destination $destPath -Force
$results.Copied++
}
}
catch {
Write-Warning "Failed to copy $($file.RelativePath): $_"
$results.Failed++
}
}
Write-Progress -Activity "Syncing files" -Completed
Phase 5: Logging and Polish (2-3 hours)
Goal: Add comprehensive logging and handle edge cases.
Learning focus: Logging patterns, exclusion filters, final polish
Steps:
- Create logging function that writes to file
- Log all operations (copy, update, delete, errors)
- Add timestamps to log entries
- Implement exclude patterns (e.g., *.tmp, node_modules\*)
- Add retry logic for transient errors
- Generate final summary report
- Handle edge cases (sketches for two of these follow the list):
  - Empty directories
  - Read-only files
  - Very long paths
  - Symbolic links
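Hedged sketches for two of the edge cases above (one reasonable approach, not the only one):
```powershell
# Skip reparse points (symbolic links/junctions) during enumeration
$files = Get-ChildItem -Path $Source -Recurse -File |
    Where-Object { -not ($_.Attributes -band [System.IO.FileAttributes]::ReparsePoint) }

# Clear the read-only attribute before overwriting a destination file
$destItem = Get-Item -LiteralPath $destPath -ErrorAction SilentlyContinue
if ($destItem -and $destItem.IsReadOnly) {
    $destItem.IsReadOnly = $false
}
```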
Verification:
# Run sync with logging
Sync-Directories -Source C:\TestSync\Source -Destination C:\TestSync\Dest `
-LogPath C:\TestSync\sync.log -Verbose
# Check log file
Get-Content C:\TestSync\sync.log
# Test exclude patterns
Sync-Directories -Source C:\TestSync\Source -Destination C:\TestSync\Dest `
-Exclude "*.tmp", "temp\*" -WhatIf
# Verify excluded files are not in the sync list
Key code patterns:
function Write-SyncLog {
param(
[string]$LogPath,
[ValidateSet('INFO', 'WARN', 'ERROR', 'COPY', 'UPDATE', 'DELETE')]
[string]$Level,
[string]$Message
)
$timestamp = Get-Date -Format "yyyy-MM-dd HH:mm:ss"
$logLine = "[$timestamp] $($Level.PadRight(6)) - $Message"
Add-Content -Path $LogPath -Value $logLine
Write-Verbose $Message
}
# Exclusion filtering
$excludePatterns = @("*.tmp", "temp\*", "*.bak")
$files = Get-ChildItem -Path $Source -Recurse -File | Where-Object {
$relativePath = $_.FullName.Substring($rootLength)
$excluded = $false
foreach ($pattern in $excludePatterns) {
if ($relativePath -like $pattern) {
Write-Verbose "Excluding: $relativePath (matches $pattern)"
$excluded = $true
break
}
}
-not $excluded
}
Testing Strategy
Unit Tests
| Test ID | Test Name | Input | Expected Result |
|---|---|---|---|
| UT01 | Compare identical directories | Same files both sides | Empty ToCopy, ToUpdate, ToDelete |
| UT02 | Detect new file | File only in source | File appears in ToCopy list |
| UT03 | Detect modified file | Same file, different content | File appears in ToUpdate list |
| UT04 | Detect deleted file | File only in destination | File appears in ToDelete list (mirror mode) |
| UT05 | Hash calculation | Known content | Known hash value |
| UT06 | Relative path calculation | Full path + root | Correct relative path |
| UT07 | Exclude pattern matching | File matching pattern | File excluded from results |
Integration Tests
| Test ID | Test Name | Steps | Expected Result |
|---|---|---|---|
| IT01 | Full sync - empty dest | Source with 100 files, empty dest | All files copied to dest |
| IT02 | Full sync - partial dest | Source with 100 files, dest with 50 | Missing files copied |
| IT03 | WhatIf mode | Run with -WhatIf | No files actually modified |
| IT04 | Confirm mode | Run with -Confirm | Prompts for each operation |
| IT05 | Mirror delete | Extra files in dest, -Mirror | Extra files deleted |
| IT06 | Error handling | Locked file in source | Skip file, continue with others |
| IT07 | Logging | Run with -LogPath | Log file created with entries |
| IT08 | Progress display | Large file set | Progress bar updates correctly |
Performance Tests
| Test ID | Test Name | Input | Target |
|---|---|---|---|
| PT01 | Small directory | 100 files | < 5 seconds |
| PT02 | Medium directory | 1,000 files | < 15 seconds |
| PT03 | Large directory | 10,000 files | < 30 seconds |
| PT04 | Memory usage | 10,000 files | < 500 MB RAM |
Test Script
# test-sync.ps1 - Automated testing script
$testRoot = "C:\TestSync"
# Setup test environment
function Initialize-TestEnvironment {
Remove-Item -Path $testRoot -Recurse -Force -ErrorAction SilentlyContinue
New-Item -ItemType Directory -Path "$testRoot\Source" -Force
New-Item -ItemType Directory -Path "$testRoot\Dest" -Force
}
# Test: New files are detected
function Test-NewFileDetection {
Initialize-TestEnvironment
# Create file only in source
"Content A" | Out-File "$testRoot\Source\newfile.txt"
$result = Compare-Directories -Source "$testRoot\Source" -Destination "$testRoot\Dest"
if ($result.ToCopy.Count -eq 1 -and $result.ToCopy[0].RelativePath -eq "newfile.txt") {
Write-Host "PASS: Test-NewFileDetection" -ForegroundColor Green
} else {
Write-Host "FAIL: Test-NewFileDetection" -ForegroundColor Red
}
}
# Test: Modified files are detected
function Test-ModifiedFileDetection {
Initialize-TestEnvironment
# Create file in both with different content
"Original" | Out-File "$testRoot\Source\shared.txt"
"Modified" | Out-File "$testRoot\Dest\shared.txt"
$result = Compare-Directories -Source "$testRoot\Source" -Destination "$testRoot\Dest"
if ($result.ToUpdate.Count -eq 1) {
Write-Host "PASS: Test-ModifiedFileDetection" -ForegroundColor Green
} else {
Write-Host "FAIL: Test-ModifiedFileDetection" -ForegroundColor Red
}
}
# Test: WhatIf doesn't modify files
function Test-WhatIfMode {
Initialize-TestEnvironment
"Source content" | Out-File "$testRoot\Source\test.txt"
$originalCount = (Get-ChildItem "$testRoot\Dest" -File).Count
Sync-Directories -Source "$testRoot\Source" -Destination "$testRoot\Dest" -WhatIf
$afterCount = (Get-ChildItem "$testRoot\Dest" -File).Count
if ($afterCount -eq $originalCount) {
Write-Host "PASS: Test-WhatIfMode" -ForegroundColor Green
} else {
Write-Host "FAIL: Test-WhatIfMode" -ForegroundColor Red
}
}
# Run all tests
Test-NewFileDetection
Test-ModifiedFileDetection
Test-WhatIfMode
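If you already use Pester, the same checks translate naturally; a sketch in Pester 5 syntax (the module is not otherwise required by this project):
```powershell
# Requires the Pester module (Install-Module Pester)
Describe 'Compare-Directories' {
    BeforeEach { Initialize-TestEnvironment }

    It 'detects a file that exists only in the source' {
        "Content A" | Out-File "$testRoot\Source\newfile.txt"
        $result = Compare-Directories -Source "$testRoot\Source" -Destination "$testRoot\Dest"
        $result.ToCopy.Count | Should -Be 1
    }
}
```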
Common Pitfalls and Debugging Tips
Problem: Hash calculation is extremely slow
Symptoms: Sync takes minutes for small directories
Causes:
- Hashing every file, even when not necessary
- Using SHA512 when SHA256 would suffice
- Files on slow network drives
Solutions:
1. Size-first comparison: skip the hash when sizes differ:
```powershell
if ($sourceFile.Size -ne $destFile.Size) {
    # Different sizes = definitely different
    return "DIFFERENT"
}
# Only hash if sizes match
```
2. Lazy hash calculation: only calculate a hash when it is needed:
```powershell
# Don't pre-calculate all hashes
$fileInfo = [PSCustomObject]@{
    RelativePath = $relativePath
    FullPath     = $fullPath
    Size         = $size
    _hash        = $null   # Calculate on demand
}
# Add a method that computes the hash on first access
Add-Member -InputObject $fileInfo -MemberType ScriptMethod -Name GetHash -Value {
    if ($null -eq $this._hash) {
        $this._hash = (Get-FileHash -Path $this.FullPath -Algorithm SHA256).Hash
    }
    return $this._hash
}
```
3. Use a faster algorithm for initial checks:
```powershell
# Use MD5 for a quick check, SHA256 for verification
$quickHash = Get-FileHash -Path $path -Algorithm MD5
```
Problem: Long paths fail (path exceeds 260 characters)
Symptoms: Error "The specified path, file name, or both are too long"
Cause: Windows MAX_PATH limit of 260 characters
Solutions:
- Use the \\?\ prefix (works up to 32,767 characters):
```powershell
$longPath = "\\?\" + $originalPath
Get-ChildItem -LiteralPath $longPath
```
- Use PowerShell 7+, which has native long path support
- Enable long paths in Windows 10+:
```powershell
# Registry setting (requires admin + reboot)
Set-ItemProperty -Path "HKLM:\SYSTEM\CurrentControlSet\Control\FileSystem" `
    -Name "LongPathsEnabled" -Value 1
```
Problem: Permission denied errors
Symptoms: "Access to the path is denied" for some files
Causes:
- File owned by different user
- File in use by another process
- System/protected file
Solutions:
- Per-file error handling:
```powershell
try {
    Copy-Item -Path $source -Destination $dest -ErrorAction Stop
    $results.Copied++
}
catch [System.UnauthorizedAccessException] {
    Write-Warning "Permission denied: $source - skipping"
    $results.PermissionDenied++
}
catch [System.IO.IOException] {
    Write-Warning "File in use: $source - skipping"
    $results.InUse++
}
```
- Retry with delay for locked files:
```powershell
$retries = 3
$delay = 1000
for ($i = 0; $i -lt $retries; $i++) {
    try {
        Copy-Item -Path $source -Destination $dest -ErrorAction Stop
        break
    }
    catch [System.IO.IOException] {
        if ($i -eq $retries - 1) { throw }
        Start-Sleep -Milliseconds $delay
        $delay *= 2   # Exponential backoff
    }
}
```
Problem: Progress bar flickers or updates too fast
Symptoms: Console scrolling rapidly, hard to read
Cause: Updating progress for every small file
Solution: Throttle progress updates:
```powershell
$lastProgressUpdate = [DateTime]::MinValue
$progressInterval = [TimeSpan]::FromMilliseconds(100)
foreach ($file in $files) {
# ... do work ...
if (([DateTime]::Now - $lastProgressUpdate) -gt $progressInterval) {
Write-Progress -Activity "Syncing" -Status $file.Name -PercentComplete $percent
$lastProgressUpdate = [DateTime]::Now
}
}
```
Problem: Memory usage grows unbounded
Symptoms: PowerShell process consumes gigabytes of RAM
Causes:
- Storing all file contents in memory
- Building massive arrays instead of streaming
Solutions:
1. Use ArrayList instead of array +=:
```powershell
# BAD: Creates a new array each time (O(n^2) memory operations)
$files = @()
$files += $newFile   # Slow!

# GOOD: ArrayList mutates in place
$files = [System.Collections.ArrayList]::new()
[void]$files.Add($newFile)   # Fast!
```
2. Stream results instead of collecting:
```powershell
# BAD: Collect all then process
$allFiles = Get-ChildItem -Recurse
foreach ($file in $allFiles) { ... }

# GOOD: Stream as enumerated
Get-ChildItem -Recurse | ForEach-Object { ... }
```
Problem: Sync runs but nothing happens
Symptoms: Script completes, no files copied, no errors
Cause: Often a path issue or incorrect comparison
Debugging:
# Add verbose output everywhere
Write-Verbose "Source path: $Source"
Write-Verbose "Files found in source: $($sourceFiles.Count)"
Write-Verbose "Files found in dest: $($destFiles.Count)"
Write-Verbose "To copy: $($toCopy.Count)"
Write-Verbose "To update: $($toUpdate.Count)"
# Check if paths are being calculated correctly
$sourceFiles | ForEach-Object {
Write-Verbose "Source file: $($_.RelativePath)"
}
# Verify ShouldProcess is being called
if ($PSCmdlet.ShouldProcess($file, "Copy")) {
Write-Verbose "ShouldProcess returned True for $file"
# ... actual copy
} else {
Write-Verbose "ShouldProcess returned False (WhatIf mode)"
}
Extensions and Challenges
Easy Extensions
- JSON output mode - Output comparison results as JSON for automation
```powershell
$result | ConvertTo-Json -Depth 5 | Out-File "sync_result.json"
```
- Email notification - Send summary email after sync completes
- Scheduled execution - Create a Windows Task Scheduler job
```powershell
$action = New-ScheduledTaskAction -Execute "pwsh.exe" `
    -Argument "-File C:\Scripts\Sync-Directories.ps1 -Source C:\A -Destination D:\B"
$trigger = New-ScheduledTaskTrigger -Daily -At "2:00AM"
Register-ScheduledTask -TaskName "NightlySync" -Action $action -Trigger $trigger
```
- Checksum file generation - Generate .sha256 files alongside synced files
Medium Extensions
- Bandwidth throttling - Limit copy speed to avoid saturating the network (a fuller sketch follows below)
```powershell
function Copy-ItemThrottled {
    param($Source, $Destination, [int]$BytesPerSecond)
    # Read in chunks, sleep between chunks
}
```
- Incremental sync with journal - Track last sync time, only check modified files
- Parallel hashing - Use PowerShell 7 ForEach-Object -Parallel for hash calculation
- Compression during transfer - Compress files before copying, decompress at destination
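A fuller sketch of the throttling idea: copy in fixed-size chunks and sleep whenever the measured rate runs ahead of the target. Copy-ItemThrottled is a hypothetical helper, not a built-in cmdlet:
```powershell
function Copy-ItemThrottled {
    param(
        [string]$Source,
        [string]$Destination,
        [int]$BytesPerSecond = 10MB
    )
    $bufferSize = 1MB
    $buffer = New-Object byte[] $bufferSize
    $in  = [System.IO.File]::OpenRead($Source)
    $out = [System.IO.File]::Create($Destination)
    try {
        $stopwatch = [System.Diagnostics.Stopwatch]::StartNew()
        $written = 0L
        while (($read = $in.Read($buffer, 0, $bufferSize)) -gt 0) {
            $out.Write($buffer, 0, $read)
            $written += $read
            # If we're ahead of the allowed rate, sleep until we're back on pace
            $expectedSeconds = $written / $BytesPerSecond
            $aheadMs = ($expectedSeconds - $stopwatch.Elapsed.TotalSeconds) * 1000
            if ($aheadMs -gt 0) { Start-Sleep -Milliseconds ([int]$aheadMs) }
        }
    }
    finally {
        $in.Dispose()
        $out.Dispose()
    }
}
```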
Advanced Extensions
- Two-way sync with conflict resolution - Implement bidirectional sync
```powershell
-ConflictResolution NewerWins
-ConflictResolution SourceWins
-ConflictResolution KeepBoth
```
- Delta sync - Only transfer changed portions of files (like rsync)
- Encryption - Encrypt files during transfer, decrypt at destination
- Cloud integration - Sync to Azure Blob Storage or AWS S3
```powershell
Sync-Directories -Source C:\Local -Destination "az://container/path"
```
- Real-time sync - Use FileSystemWatcher to sync on change
```powershell
$watcher = New-Object System.IO.FileSystemWatcher
$watcher.Path = $Source
$watcher.IncludeSubdirectories = $true
$watcher.EnableRaisingEvents = $true
Register-ObjectEvent $watcher "Changed" -Action {
    Sync-File $Event.SourceEventArgs.FullPath
}
```
Books That Will Help
| Topic | Book | Relevant Chapter(s) |
|---|---|---|
| PowerShell fundamentals | Learn PowerShell in a Month of Lunches (4th ed.) by Travis Plunk et al. | Ch. 3: Using the Help System; Ch. 7: The Pipeline; Ch. 18: Variables |
| Advanced PowerShell | Windows PowerShell in Action (3rd ed.) by Bruce Payette | Ch. 7: Advanced Functions; Ch. 8: Scripts, Functions, and Filters; Ch. 11: Error Handling |
| Hash functions and data integrity | Designing Data-Intensive Applications by Martin Kleppmann | Ch. 5: Replication (checksums for data validation); Ch. 7: Transactions |
| Algorithm efficiency | Introduction to Algorithms by Cormen et al. | Ch. 11: Hash Tables (for O(1) lookup understanding) |
| Error handling patterns | Release It! by Michael Nygard | Ch. 4: Stability Patterns (timeouts, retries, circuit breakers) |
| File systems deep dive | Operating Systems: Three Easy Pieces by Arpaci-Dusseau | Ch. 39-41: Files and Directories |
Quick reference recommendations:
- For PowerShell syntax: Microsoft PowerShell Documentation
- For hash algorithms: NIST Cryptographic Standards
Self-Assessment Checklist
Before considering this project complete, verify you can answer "yes" to all:
Core Functionality
- Script compares two directories correctly
- New files (only in source) are identified
- Modified files are detected by comparing file hashes
- -WhatIf shows what would happen without making changes
- -Confirm prompts before each destructive action
- -Verbose provides detailed progress information
- Progress bar displays during sync operations
- Individual file errors are caught and logged (sync continues)
- Mirror mode correctly deletes files not in source
- Log file captures all operations with timestamps
Performance
- 10,000 files compared in under 30 seconds
- Memory usage stays under 500MB for large directories
- Hashtable-based comparison achieves O(n) complexity
Edge Cases
- Works with empty source or destination directories
- Handles files with special characters in names
- Handles very long file paths (260+ characters)
- Handles read-only files appropriately
- Handles locked files without crashing
- Exclude patterns work correctly
Code Quality
- Functions are well-documented with comment-based help
- Parameter validation prevents invalid input
- No global variables - all state passed via parameters
- Consistent error handling throughout
Understanding
- Can explain why hash comparison is better than timestamp comparison
- Can explain why hashtables provide O(1) lookup
- Can explain how SupportsShouldProcess enables -WhatIf and -Confirm
- Can explain the difference between one-way mirror and two-way sync
- Can describe when to use ErrorAction Stop vs Continue
Resources
Official Documentation
- Microsoft PowerShell documentation (help topics: about_Functions_Advanced_Parameters, about_CommonParameters, Get-FileHash, Write-Progress)
Similar Tools for Reference
- robocopy - Windows robust file copy (study its flags for feature ideas)
- rsync - Unix sync tool (delta transfer algorithm)
- FreeFileSync - Open source GUI sync tool
Interview Preparation
Questions You Should Be Able to Answer
- "Why use file hashes instead of timestamps for comparison?"
- Timestamps are unreliable across timezones, DST changes, and different clocks
- Copy operations may preserve or modify timestamps inconsistently
- Hashes compare actual content, providing definitive equality
- Even if file times differ, identical hashes mean identical content
- "How would you optimize this for 1 million files?"
- Use hashtables for O(1) lookups instead of nested loops
- Stream files instead of loading all into memory
- Implement parallel hashing with PowerShell 7 ForEach-Object -Parallel (see the sketch after this answer)
- Use size comparison as first-tier filter before hashing
- Consider delta sync with change journals
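A minimal PowerShell 7 sketch of the parallel-hashing idea referenced above:
```powershell
# Hash files in parallel; each result pairs a path with its SHA256 digest
$hashes = Get-ChildItem -Path $Source -Recurse -File |
    ForEach-Object -Parallel {
        [PSCustomObject]@{
            Path = $_.FullName
            Hash = (Get-FileHash -LiteralPath $_.FullName -Algorithm SHA256).Hash
        }
    } -ThrottleLimit 8
```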
- "Explain how -WhatIf works in your implementation."
- CmdletBinding with SupportsShouldProcess adds -WhatIf automatically
- Each operation wrapped in $PSCmdlet.ShouldProcess() check
- When -WhatIf specified, ShouldProcess returns false
- Script shows what would happen without executing
- ConfirmImpact='High' ensures prompting for destructive operations
- "How do you handle errors during sync?"
- Per-file try/catch blocks prevent one failure from stopping sync
- Different exception types handled differently (permission vs locked vs network)
- Retry logic with exponential backoff for transient errors
- All errors logged with details for later review
- Summary report shows success/failure counts
- "What's the time complexity of your comparison algorithm?"
- Building source hashtable: O(n) where n = source file count
- Building destination hashtable: O(m) where m = dest file count
- Comparing: O(n) lookups, each O(1) with hashtable
- Total: O(n + m) = O(n) linear time
- vs naive nested loop: O(n * m) = O(n^2) quadratic time