Connect a BI Tool and Build a Dashboard
Goal: Deliver a stakeholder-ready analytics dashboard connected to your warehouse. You will learn how BI tools query data, how to design metrics that are stable, and how to validate dashboard numbers against raw SQL. By the end, you can build a dashboard that is both useful and trustworthy.
Context and Problem
- Real-world scenario: Leadership wants a weekly sales dashboard with clear trends and top performers.
- Stakeholders and constraints: Executives need clarity, Analysts need drill-down, Engineering needs reproducibility. Constraints include limited time and evolving business questions.
- What happens if this system fails? Decisions are made on incorrect or stale data.
Real World Outcome
- You open a dashboard with live metrics and filters.
- Expected dashboard widgets:
- Total revenue (last 30 days)
- Revenue by month (line chart)
- Top 10 products by revenue (bar chart)
- Top 10 customers by revenue (table)
- Order count by status (if available)
- Example metric snapshot:
Total revenue (last 30 days): $152,430.75
Top product: Wireless Mouse ($12,840.50)
Top customer: Aline Ortiz ($12,842.50)
Core Concepts
- BI semantic layer: How metrics are defined and reused.
- SQL validation: Proving dashboard values match warehouse queries.
- Caching and refresh: Ensuring data freshness.
- Data quality dimensions: Accuracy, completeness, consistency, timeliness, traceability.
Architecture
+--------------------+ +------------------+ +--------------------+
| Data Warehouse | --> | BI Tool (Metabase)| --> | Dashboards & Alerts|
+--------------------+ +------------------+ +--------------------+
| |
+--> SQL validations +--> cached queries
Data Model
- Uses star schema from Project 3:
- dim_customer, dim_product, dim_date, fact_sales
- Example fact record:
sales_key=5012 date_key=20240512 customer_key=92 product_key=18 quantity=3 revenue=76.50
- Example metric query:
SELECT SUM(f.revenue) AS total_revenue
FROM fact_sales f
JOIN dim_date d ON f.date_key = d.date_key
WHERE d.date >= CURRENT_DATE - INTERVAL 30 DAY;
Implementation Plan
- Deploy BI tool (Metabase or Superset) locally via Docker.
- Connect to your warehouse with read-only credentials.
- Define key metrics as saved questions or models.
- Build dashboard with clear layout and filters.
- Validate each chart with a raw SQL query.
- Configure refresh schedule or cache settings.
Validation and Data Quality
- Accuracy: Dashboard totals match SQL queries in the warehouse.
- Completeness: No empty charts for expected data ranges.
- Consistency: Same metric definition reused across charts.
- Timeliness: Data refresh lag <= 24 hours.
- Traceability: Each chart links to underlying SQL.
Failure Modes and Debugging
- Stale data: BI cache not refreshed.
- Symptom: numbers do not change after new data load. Fix by reducing cache TTL or triggering refresh.
- Mismatched metrics: Different definitions used in separate charts.
- Symptom: totals differ between charts. Fix by centralizing metric definitions.
- Timezone drift: Daily metrics shift by one day.
- Symptom: yesterday’s revenue appears under today. Fix by aligning timezone settings in BI tool and warehouse.
Definition of Done
- Dashboard displays all required widgets with clear labels.
- All dashboard metrics are validated against SQL.
- Refresh schedule is documented and tested.
- Dashboard is shareable with stakeholders.
- Data quality checks for accuracy and timeliness are in place.