Project 2: Budget Variance Analyzer with Conditional Formatting

Project 2: Budget Variance Analyzer with Conditional Formatting

Build a tool that compares budgeted vs actual expenses and highlights variances automatically.


Project Overview

Every department submits a budget at the start of the year. Throughout the year, actual expenses are tracked. Finance needs to quickly identify where departments are over or under budget. Your task: create an automated variance analysis tool.

What Youโ€™ll Build

  • Read budget data (planned expenses by category)
  • Read actual expenses (real transactions throughout the year)
  • Calculate variances (actual - budget) and percentage differences
  • Apply conditional formatting (red for over budget, green for under budget)
  • Add data bars to visualize variance magnitude
  • Generate an executive summary sheet

Technologies

  • pandas for data comparison
  • openpyxl for advanced formatting (conditional rules, data bars)
  • Excel formulas embedded via Python

Learning Objectives

By completing this project, you will:

  1. Master conditional formatting rules - Color scales, data bars, icon sets
  2. Understand variance analysis - A fundamental business metric
  3. Apply formula-based formatting - Rules that evaluate cell relationships
  4. Build executive summaries - Aggregate insights for decision-makers
  5. Handle data merging challenges - When datasets donโ€™t perfectly align
  6. Create visual dashboards - Turn numbers into insights

Deep Theoretical Foundation

Understanding Conditional Formatting

Conditional formatting is Excelโ€™s way of making data self-interpreting. Instead of requiring users to mentally process numbers, formatting automatically highlights what matters.

The Conditional Formatting Hierarchy

โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
โ”‚                 CONDITIONAL FORMATTING TYPES                     โ”‚
โ”œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ค
โ”‚                                                                   โ”‚
โ”‚  1. CELL VALUE RULES                                             โ”‚
โ”‚     โ”œโ”€โ”€ Greater than X โ†’ Format                                  โ”‚
โ”‚     โ”œโ”€โ”€ Less than X โ†’ Format                                     โ”‚
โ”‚     โ”œโ”€โ”€ Between X and Y โ†’ Format                                 โ”‚
โ”‚     โ””โ”€โ”€ Equal to X โ†’ Format                                      โ”‚
โ”‚                                                                   โ”‚
โ”‚  2. COLOR SCALES (Gradients)                                     โ”‚
โ”‚     โ”œโ”€โ”€ 2-Color: Low (green) โ†’ High (red)                       โ”‚
โ”‚     โ””โ”€โ”€ 3-Color: Low โ†’ Mid โ†’ High                               โ”‚
โ”‚                                                                   โ”‚
โ”‚  3. DATA BARS                                                    โ”‚
โ”‚     โ””โ”€โ”€ Bar length proportional to value                        โ”‚
โ”‚                                                                   โ”‚
โ”‚  4. ICON SETS                                                    โ”‚
โ”‚     โ”œโ”€โ”€ Traffic lights (โ—โ—โ—)                                    โ”‚
โ”‚     โ”œโ”€โ”€ Arrows (โ†‘โ†’โ†“)                                            โ”‚
โ”‚     โ””โ”€โ”€ Flags, checkmarks, etc.                                  โ”‚
โ”‚                                                                   โ”‚
โ”‚  5. FORMULA-BASED RULES                                          โ”‚
โ”‚     โ””โ”€โ”€ Custom formulas return TRUE/FALSE                        โ”‚
โ”‚                                                                   โ”‚
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜

Rule Precedence

When multiple rules apply to the same cell, Excel evaluates them in order. The first matching rule wins (unless โ€œStop If Trueโ€ is not set).

Priority 1: Formula =AND($D2>0, $E2>10%)  โ†’ Dark Red
Priority 2: Formula =$D2>0                 โ†’ Light Red
Priority 3: Default                        โ†’ No format

If D2=500 and E2=15%:
  โ†’ Priority 1 matches (positive variance AND >10%)
  โ†’ Cell turns Dark Red
  โ†’ Lower priorities ignored

Variance Analysis: The Business Foundation

Understanding variance is crucial before you can build a variance analyzer.

Types of Variance

Budget Variance = Actual - Budget

โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
โ”‚  Scenario               โ”‚ Variance โ”‚ Meaning                    โ”‚
โ”œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ค
โ”‚  Budget: $10,000        โ”‚          โ”‚                            โ”‚
โ”‚  Actual: $12,000        โ”‚  +$2,000 โ”‚ OVER budget (bad for cost) โ”‚
โ”‚                         โ”‚          โ”‚ ABOVE target (good for rev)โ”‚
โ”œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ค
โ”‚  Budget: $10,000        โ”‚          โ”‚                            โ”‚
โ”‚  Actual: $8,000         โ”‚  -$2,000 โ”‚ UNDER budget (good for costโ”‚
โ”‚                         โ”‚          โ”‚ BELOW target (bad for rev) โ”‚
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜

Key insight: Whether variance is โ€œgoodโ€ or โ€œbadโ€ depends on context:

  • For expenses: Under budget is good (you saved money)
  • For revenue: Over budget is good (you earned more)

Percentage Variance

Raw dollar variance doesnโ€™t tell the whole story:

  • $1,000 over budget on a $10,000 budget = 10% variance (significant)
  • $1,000 over budget on a $1,000,000 budget = 0.1% variance (trivial)

Formula: Variance % = ((Actual - Budget) / Budget) ร— 100

Edge case: What if budget is $0? Division by zero! Handle this:

variance_pct = (actual - budget) / budget if budget != 0 else None

Data Merging Deep Dive

Real-world budget and actual data often come from different sources and donโ€™t align perfectly.

Join Types

Budget Data:              Actual Data:
โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”    โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
โ”‚ Category   โ”‚ Budgetโ”‚    โ”‚ Category   โ”‚ Actual โ”‚
โ”œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ผโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ค    โ”œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ผโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ค
โ”‚ Marketing  โ”‚ 50000 โ”‚    โ”‚ Marketing  โ”‚ 47500  โ”‚
โ”‚ IT         โ”‚ 120000โ”‚    โ”‚ IT         โ”‚ 134500 โ”‚
โ”‚ HR         โ”‚ 75000 โ”‚    โ”‚ HR         โ”‚ 78200  โ”‚
โ”‚ Legal      โ”‚ 25000 โ”‚    โ”‚ UNBUDGETED โ”‚ 5000   โ”‚  โ† Not in budget!
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜    โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜

Inner Join: Only matching categories (Marketing, IT, HR)
Left Join: All budget categories + matching actuals (Legal shows, UNBUDGETED dropped)
Right Join: All actual categories + matching budgets (UNBUDGETED shows, Legal dropped)
Outer Join: All categories from both (nothing dropped, some have nulls)

Best practice for variance analysis: Use left join on budget, then investigate unbudgeted actuals separately.

openpyxl Conditional Formatting API

openpyxl provides several rule types for conditional formatting:

ColorScaleRule

Creates a gradient based on cell values:

from openpyxl.formatting.rule import ColorScaleRule

rule = ColorScaleRule(
    start_type='num', start_value=-20, start_color='00FF00',  # Green at -20%
    mid_type='num', mid_value=0, mid_color='FFFFFF',          # White at 0%
    end_type='num', end_value=20, end_color='FF0000'          # Red at +20%
)

DataBarRule

Creates in-cell bars proportional to values:

from openpyxl.formatting.rule import DataBarRule

rule = DataBarRule(
    start_type='num', start_value=-50000,
    end_type='num', end_value=50000,
    color='638EC6',        # Bar color
    showValue=True,        # Show number AND bar
    minLength=5,           # Minimum bar length %
    maxLength=95           # Maximum bar length %
)

IconSetRule

Adds icons based on thresholds:

from openpyxl.formatting.rule import IconSetRule

rule = IconSetRule(
    icon_style='3TrafficLights1',  # Red/Yellow/Green lights
    type='num',
    values=[5, 10],                # Thresholds
    showValue=True,                # Show number AND icon
    reverse=False                  # Normal order
)

The Core Question Youโ€™re Answering

โ€œHow can I make Excel โ€˜smartโ€™ so it automatically highlights what matters?โ€

Youโ€™re learning that Excel isnโ€™t just a data containerโ€”itโ€™s a decision support tool. Conditional formatting, formulas, and visual cues turn raw numbers into insights. Python lets you apply these features programmatically.


Project Specification

Input Requirements

  1. Budget file: annual_budget_2024.xlsx
    • Columns: Category, Department, Budget_Amount
  2. Actual expenses: One or more files containing:
    • Columns: Category, Department, Date, Amount, Description

Sample budget structure:

Category,Department,Budget_Amount
Marketing - Digital,Marketing,50000
Marketing - Events,Marketing,25000
IT - Cloud Services,IT,120000
IT - Hardware,IT,45000
HR - Recruitment,HR,75000

Output Requirements

Generate Budget_Variance_Analysis_{Year}.xlsx with:

  1. Variance Analysis sheet
    • Columns: Category, Department, Budget, Actual, Variance ($), Variance (%), Status
    • Conditional formatting:
      • Color scale on Variance % column
      • Data bars on Variance $ column
      • Icon set for Status column
    • Sorted by variance magnitude (worst first)
  2. Executive Summary sheet
    • Total budget vs total actual
    • Top 5 over-budget categories
    • Top 5 under-budget categories
    • Overall budget utilization percentage
    • Visual indicators of budget health
  3. Department Breakdown sheet
    • Aggregated by department
    • Sparklines or mini-charts showing category breakdown
    • Department ranking by variance

Functional Requirements

  • Handle categories in budget with no actuals (not yet spent)
  • Handle actuals in categories not in budget (flag for review)
  • Calculate both dollar and percentage variance
  • Support multiple threshold levels (e.g., >5%, >10%, >20%)
  • Generate warning flags for significantly over-budget items

Solution Architecture

Component Design

โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
โ”‚                   BUDGET VARIANCE ANALYZER                       โ”‚
โ”œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ค
โ”‚                                                                   โ”‚
โ”‚  โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”     โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”     โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”     โ”‚
โ”‚  โ”‚ BudgetLoader โ”‚     โ”‚ActualsLoader โ”‚     โ”‚   Merger     โ”‚     โ”‚
โ”‚  โ”‚              โ”‚     โ”‚              โ”‚     โ”‚              โ”‚     โ”‚
โ”‚  โ”‚ โ€ข read()     โ”‚     โ”‚ โ€ข read()     โ”‚     โ”‚ โ€ข join()     โ”‚     โ”‚
โ”‚  โ”‚ โ€ข validate() โ”‚     โ”‚ โ€ข aggregate()โ”‚     โ”‚ โ€ข calc_var() โ”‚     โ”‚
โ”‚  โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜     โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜     โ”‚ โ€ข flag()     โ”‚     โ”‚
โ”‚         โ”‚                    โ”‚             โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜     โ”‚
โ”‚         โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜             โ”‚
โ”‚                              โ”‚                                   โ”‚
โ”‚                              โ–ผ                                   โ”‚
โ”‚                     โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”                            โ”‚
โ”‚                     โ”‚ ReportBuilderโ”‚                            โ”‚
โ”‚                     โ”‚              โ”‚                            โ”‚
โ”‚                     โ”‚ โ€ข variance() โ”‚                            โ”‚
โ”‚                     โ”‚ โ€ข summary()  โ”‚                            โ”‚
โ”‚                     โ”‚ โ€ข dept_brk() โ”‚                            โ”‚
โ”‚                     โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜                            โ”‚
โ”‚                            โ”‚                                     โ”‚
โ”‚                            โ–ผ                                     โ”‚
โ”‚                     โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”                            โ”‚
โ”‚                     โ”‚  Formatter   โ”‚                            โ”‚
โ”‚                     โ”‚              โ”‚                            โ”‚
โ”‚                     โ”‚ โ€ข colors()   โ”‚                            โ”‚
โ”‚                     โ”‚ โ€ข databars() โ”‚                            โ”‚
โ”‚                     โ”‚ โ€ข icons()    โ”‚                            โ”‚
โ”‚                     โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜                            โ”‚
โ”‚                                                                   โ”‚
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜

Conditional Formatting Logic

โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
โ”‚                FORMATTING DECISION TREE                          โ”‚
โ”œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ค
โ”‚                                                                   โ”‚
โ”‚  Variance % Column:                                              โ”‚
โ”‚                                                                   โ”‚
โ”‚     IF variance_pct > 20%                                        โ”‚
โ”‚        โ†’ DARK RED background, white text, red X icon             โ”‚
โ”‚     ELSE IF variance_pct > 10%                                   โ”‚
โ”‚        โ†’ RED background, black text, yellow ! icon               โ”‚
โ”‚     ELSE IF variance_pct > 5%                                    โ”‚
โ”‚        โ†’ LIGHT RED background, exclamation icon                  โ”‚
โ”‚     ELSE IF variance_pct >= -5%                                  โ”‚
โ”‚        โ†’ WHITE background, no icon (within tolerance)            โ”‚
โ”‚     ELSE IF variance_pct >= -10%                                 โ”‚
โ”‚        โ†’ LIGHT GREEN background, checkmark icon                  โ”‚
โ”‚     ELSE (variance_pct < -10%)                                   โ”‚
โ”‚        โ†’ DARK GREEN background, star icon                        โ”‚
โ”‚                                                                   โ”‚
โ”‚  Variance $ Column:                                              โ”‚
โ”‚     โ†’ Data bars: proportional to value                          โ”‚
โ”‚     โ†’ Negative values: red bars extending left                  โ”‚
โ”‚     โ†’ Positive values: blue bars extending right                โ”‚
โ”‚                                                                   โ”‚
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜

Data Flow

Phase 1: LOAD
โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
โ”‚ Inputs:                                                          โ”‚
โ”‚   - annual_budget_2024.xlsx                                     โ”‚
โ”‚   - expenses_Q1_2024.xlsx, expenses_Q2_2024.xlsx, etc.         โ”‚
โ”‚                                                                   โ”‚
โ”‚ Process:                                                         โ”‚
โ”‚   1. Load budget file โ†’ budget_df                               โ”‚
โ”‚   2. Load all expense files                                     โ”‚
โ”‚   3. Concatenate expense files                                  โ”‚
โ”‚   4. Aggregate expenses by category: actual_df                  โ”‚
โ”‚                                                                   โ”‚
โ”‚ Output:                                                          โ”‚
โ”‚   - budget_df: Category | Department | Budget                   โ”‚
โ”‚   - actual_df: Category | Actual                                โ”‚
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜
                              โ”‚
                              โ–ผ
Phase 2: MERGE & CALCULATE
โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
โ”‚ Process:                                                         โ”‚
โ”‚   1. Left join budget_df with actual_df on Category             โ”‚
โ”‚   2. Fill missing actuals with 0                                โ”‚
โ”‚   3. Calculate: Variance = Actual - Budget                      โ”‚
โ”‚   4. Calculate: Variance_Pct = Variance / Budget * 100          โ”‚
โ”‚   5. Handle division by zero                                    โ”‚
โ”‚   6. Determine status flags                                     โ”‚
โ”‚                                                                   โ”‚
โ”‚ Output:                                                          โ”‚
โ”‚   variance_df: Category | Dept | Budget | Actual | Var | Var% | Status
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜
                              โ”‚
                              โ–ผ
Phase 3: GENERATE SHEETS
โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
โ”‚ Sheet 1: Variance Analysis                                       โ”‚
โ”‚   - All rows from variance_df                                   โ”‚
โ”‚   - Sorted by |Variance| descending                             โ”‚
โ”‚                                                                   โ”‚
โ”‚ Sheet 2: Executive Summary                                       โ”‚
โ”‚   - Total budget: SUM(Budget)                                   โ”‚
โ”‚   - Total actual: SUM(Actual)                                   โ”‚
โ”‚   - Overall variance                                            โ”‚
โ”‚   - Top 5 over-budget (nlargest by Variance)                    โ”‚
โ”‚   - Top 5 under-budget (nsmallest by Variance)                  โ”‚
โ”‚                                                                   โ”‚
โ”‚ Sheet 3: Department Breakdown                                    โ”‚
โ”‚   - Aggregate variance_df by Department                         โ”‚
โ”‚   - Department-level totals and variances                       โ”‚
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜
                              โ”‚
                              โ–ผ
Phase 4: APPLY FORMATTING
โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
โ”‚ For Variance Analysis sheet:                                     โ”‚
โ”‚   1. Header styling                                              โ”‚
โ”‚   2. ColorScaleRule on Variance % column                        โ”‚
โ”‚   3. DataBarRule on Variance $ column                           โ”‚
โ”‚   4. IconSetRule on Status column                               โ”‚
โ”‚   5. Number formatting ($, %)                                   โ”‚
โ”‚   6. Enable filters                                              โ”‚
โ”‚                                                                   โ”‚
โ”‚ For Executive Summary:                                           โ”‚
โ”‚   1. Dashboard styling                                           โ”‚
โ”‚   2. Large fonts for key metrics                                โ”‚
โ”‚   3. Colored boxes for over/under summaries                     โ”‚
โ”‚                                                                   โ”‚
โ”‚ For Department Breakdown:                                        โ”‚
โ”‚   1. Table formatting                                            โ”‚
โ”‚   2. Conditional colors per department                          โ”‚
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜

Phased Implementation Guide

Phase 1: Data Loading and Merging

Goal: Load budget and actual data, merge them correctly.

Tasks:

  1. Load budget Excel file with pandas
  2. Load and concatenate all expense files
  3. Aggregate actual expenses by category
  4. Merge budget with actuals using left join
  5. Handle missing values (categories with no actuals)

Checkpoint: Merged DataFrame has all budget categories with corresponding actual amounts.

Common issues:

  • Category names donโ€™t match exactly (whitespace, case)
  • Missing categories filled with NaN instead of 0

Phase 2: Variance Calculations

Goal: Calculate all variance metrics accurately.

Tasks:

  1. Calculate dollar variance: actual - budget
  2. Calculate percentage variance with zero handling
  3. Determine status based on thresholds
  4. Identify outliers (extreme variances)
  5. Sort by absolute variance magnitude

Checkpoint: Every row has accurate variance metrics.

Common issues:

  • Percentage variance incorrect due to wrong formula
  • Division by zero not handled
  • Status logic inverted (over=under, under=over)

Phase 3: Basic Report Generation

Goal: Create multi-sheet workbook with all data.

Tasks:

  1. Write variance analysis to Sheet 1
  2. Calculate and write executive summary to Sheet 2
  3. Aggregate by department and write to Sheet 3
  4. Save workbook

Checkpoint: Excel file opens with correct data on all sheets.

Phase 4: Conditional Formatting

Goal: Apply visual formatting rules.

Tasks:

  1. Apply ColorScaleRule to variance percentage column
  2. Apply DataBarRule to variance dollar column
  3. Apply IconSetRule to status column
  4. Add number formatting for currency and percentages
  5. Style headers and add filters

Checkpoint: Opening Excel shows color-coded, visually meaningful data.

Common issues:

  • Rules applied to wrong range
  • Colors not showing (check color codes are valid hex)
  • Data bars extending wrong direction

Phase 5: Executive Summary Polish

Goal: Create a decision-ready summary view.

Tasks:

  1. Create summary metrics with large fonts
  2. Add colored boxes for over/under budget sections
  3. Include overall budget utilization gauge
  4. Add timestamp for when analysis was generated

Checkpoint: Executive summary provides at-a-glance budget health.


Testing Strategy

Unit Tests

def test_variance_calculation():
    budget = 10000
    actual = 12000
    variance, variance_pct = calculate_variance(budget, actual)
    assert variance == 2000
    assert variance_pct == 20.0

def test_variance_zero_budget():
    budget = 0
    actual = 5000
    variance, variance_pct = calculate_variance(budget, actual)
    assert variance == 5000
    assert variance_pct is None  # or inf, depending on design

def test_merge_with_missing_actuals():
    budget_df = pd.DataFrame({'Category': ['A', 'B', 'C'], 'Budget': [100, 200, 300]})
    actual_df = pd.DataFrame({'Category': ['A', 'C'], 'Actual': [110, 280]})
    merged = merge_budget_actual(budget_df, actual_df)
    assert merged.loc[merged['Category'] == 'B', 'Actual'].values[0] == 0

Integration Tests

def test_full_variance_report():
    # Create test data
    create_test_budget('test_budget.xlsx')
    create_test_expenses('test_expenses.xlsx')

    # Run analyzer
    generate_variance_report('test_budget.xlsx', ['test_expenses.xlsx'], 'test_output.xlsx')

    # Verify output
    wb = load_workbook('test_output.xlsx')
    assert 'Variance Analysis' in wb.sheetnames
    assert 'Executive Summary' in wb.sheetnames

    # Verify conditional formatting exists
    ws = wb['Variance Analysis']
    assert len(ws.conditional_formatting) > 0

Visual Verification Checklist

  • Over-budget items appear in red shades
  • Under-budget items appear in green shades
  • Data bars show correct direction
  • Icons display correctly
  • Percentage formatting shows % sign
  • Currency formatting shows $ and commas
  • Filters are enabled on data

Common Pitfalls and Debugging

Pitfall 1: Conditional Formatting Not Showing

Symptom: Code runs without error, but no colors appear.

Possible causes:

  1. Range reference is wrong (applied to empty cells)
  2. Color code format incorrect (missing โ€˜#โ€™ or wrong length)
  3. Rule values outside data range

Debug steps:

# Print the range you're formatting
print(f"Applying formatting to: {range_string}")

# Verify data exists in that range
print(f"Data in column: {ws[range_string].value}")

Pitfall 2: Merge Losing Data

Symptom: Some budget categories donโ€™t appear after merge.

Cause: Using inner join instead of left join.

Solution:

merged = budget_df.merge(actual_df, on='Category', how='left')  # NOT 'inner'
merged['Actual'].fillna(0, inplace=True)

Pitfall 3: Division by Zero Crashes

Symptom: Script crashes when calculating percentage variance.

Cause: Some budget items have $0 budget.

Solution:

def safe_pct(actual, budget):
    if budget == 0:
        return float('inf') if actual > 0 else 0
    return (actual - budget) / budget * 100

Pitfall 4: Color Scale Shows Wrong Colors

Symptom: High variance shows green, low shows red (inverted).

Cause: Start and end colors swapped, or values in wrong order.

Solution: Verify that:

  • start_value < mid_value < end_value
  • start_color is green (for low/good), end_color is red (for high/bad)

Pitfall 5: Data Bars Not Visible

Symptom: Data bars donโ€™t appear or are too small.

Cause: Value range is much larger than expected.

Solution:

# Set reasonable bounds for data bars
rule = DataBarRule(
    start_type='percentile', start_value=5,   # Bottom 5%
    end_type='percentile', end_value=95,      # Top 95%
    ...
)

Extensions and Challenges

Extension 1: Quarterly Comparison

Show how variance evolved over Q1, Q2, Q3, Q4.

Skills practiced: Time series analysis, trend formatting

Extension 2: Drill-Down Details

Clicking a category reveals individual transactions that make up the total.

Skills practiced: Excel hyperlinks, multi-level aggregation

Extension 3: Budget Reforecasting

Based on current actuals, predict year-end totals and show projected variance.

Skills practiced: Linear projection, confidence intervals

Extension 4: Multi-Year Analysis

Compare this yearโ€™s variance to previous years.

Skills practiced: Historical data management, year-over-year calculations

Extension 5: Alert System

Email stakeholders when variance exceeds threshold.

Skills practiced: Conditional triggers, email automation


Real-World Connections

Industry Applications

  • Corporate Finance: Monthly budget reviews
  • Project Management: Project cost tracking
  • Manufacturing: Production cost variance
  • Retail: Store performance vs plan

Career Relevance

This project demonstrates skills for:

  • Financial Analyst roles
  • FP&A (Financial Planning & Analysis)
  • Business Controller positions
  • Management Accounting

Portfolio Value

When presenting:

  1. Show the visual impact of conditional formatting
  2. Explain the business logic behind threshold choices
  3. Discuss how this enables faster decision-making
  4. Quantify reduction in report preparation time

Interview Questions

Prepare to answer:

  1. โ€œWhatโ€™s the difference between applying formatting to individual cells vs using conditional formatting rules?โ€

  2. โ€œHow would you handle a budget file thatโ€™s updated mid-year?โ€

  3. โ€œWhatโ€™s the performance impact of formula-based cells vs pre-calculated values?โ€

  4. โ€œHow do you ensure your script doesnโ€™t overwrite userโ€™s manual edits to the Excel file?โ€

  5. โ€œWhat Python data structure would you use to efficiently match budget to actual expenses?โ€

  6. โ€œHow would you extend this to handle multiple years of historical data?โ€


Hints by Layer

Layer 1: Merge Data

budget_df = pd.read_excel('annual_budget_2024.xlsx')
actual_df = pd.read_excel('expenses_2024.xlsx')

# Group actual expenses by category
actual_summary = actual_df.groupby('Category')['Amount'].sum().reset_index()
actual_summary.columns = ['Category', 'Actual']

# Merge with budget
variance_df = budget_df.merge(actual_summary, on='Category', how='left')
variance_df['Actual'].fillna(0, inplace=True)
variance_df['Variance'] = variance_df['Actual'] - variance_df['Budget']
variance_df['Variance_Pct'] = (variance_df['Variance'] / variance_df['Budget']) * 100

Layer 2: Apply Color Scale

from openpyxl.formatting.rule import ColorScaleRule

ws.conditional_formatting.add('E2:E157',
    ColorScaleRule(
        start_type='num', start_value=-20, start_color='00FF00',
        mid_type='num', mid_value=0, mid_color='FFFF00',
        end_type='num', end_value=20, end_color='FF0000'
    )
)

Layer 3: Add Data Bars

from openpyxl.formatting.rule import DataBarRule

ws.conditional_formatting.add('D2:D157',
    DataBarRule(
        start_type='num', start_value=-50000,
        end_type='num', end_value=50000,
        color='638EC6'
    )
)

Layer 4: Add Icon Sets

from openpyxl.formatting.rule import IconSetRule

rule = IconSetRule(
    icon_style='3Symbols',
    type='num',
    values=[5, 10],
    reverse=False
)
ws.conditional_formatting.add('E2:E157', rule)

Layer 5: Executive Summary Formulas

summary_ws = wb.create_sheet('Executive Summary')
summary_ws['A1'] = 'Total Budget'
summary_ws['B1'] = '=SUM(\'Variance Analysis\'!B:B)'
summary_ws['A2'] = 'Total Actual'
summary_ws['B2'] = '=SUM(\'Variance Analysis\'!C:C)'
summary_ws['A3'] = 'Overall Variance'
summary_ws['B3'] = '=B2-B1'

Self-Assessment Checklist

Before considering this project complete:

  • Budget and actual data merge correctly
  • All categories from budget appear in output
  • Dollar variance calculated accurately
  • Percentage variance handles zero budgets
  • Color scale shows intuitive colors (red=over, green=under)
  • Data bars visualize magnitude correctly
  • Icon sets display appropriate symbols
  • Executive summary shows correct totals
  • Top over/under budget lists are accurate
  • Department breakdown aggregates correctly
  • Filters enabled on data tables
  • File naming includes year
  • Code handles missing data gracefully

Resources

Topic Resource
pandas merging and joining โ€œPython for Data Analysisโ€ by Wes McKinney - Ch. 8
Conditional formatting in Excel โ€œExcel 2024 Bibleโ€ by Michael Alexander - Ch. 21
openpyxl formatting features openpyxl official documentation
Data aggregation โ€œPython for Data Analysisโ€ by Wes McKinney - Ch. 10
Handling missing data โ€œPython for Data Analysisโ€ by Wes McKinney - Ch. 7
Professional reporting โ€œClean Codeโ€ by Robert C. Martin - Ch. 10