P04: GUI Automation Testing Framework
Project Overview
What you'll build: A mini framework for automating Windows GUI applications: record mouse/keyboard actions, play them back, and add verification steps (check if a button exists, verify text in a field).
| Attribute | Value |
|---|---|
| Difficulty | Level 3: Advanced |
| Time Estimate | 2-3 weeks |
| Programming Language | AutoHotkey v2 |
| Knowledge Area | QA, GUI Automation, Testing |
| Prerequisites | Intermediate AutoHotkey, understanding of Windows controls |
Learning Objectives
After completing this project, you will be able to:
- Record user interactions - Capture mouse clicks, keystrokes, and timing using low-level Windows hooks
- Play back recorded actions - Reproduce user interactions reliably with proper synchronization
- Find controls programmatically - Locate buttons, text fields using multiple identification strategies (class, text, HWND, UIA)
- Implement the Command Pattern - Structure recorded actions as replayable command objects
- Handle timing and synchronization - Wait for windows, controls, and states to prevent flaky tests
- Work with the Windows control hierarchy - Navigate parent/child control relationships using Win32 and COM
- Use image recognition - Find controls by visual appearance when accessibility APIs fail
- Design robust assertions - Verify application state during test execution with meaningful error messages
- Understand Windows accessibility APIs - Leverage UI Automation (UIA) and COM for reliable control interaction
- Build testable automation infrastructure - Create a framework that can be extended and maintained
Deep Theoretical Foundation
Windows GUI Controls & the Accessibility Stack
Every Windows application is built from controls, the atomic building blocks of graphical interfaces. Understanding how Windows exposes these controls is fundamental to reliable automation.
The Three Layers of Control Access
┌────────────────────────────────────────────────────────────┐
│                     APPLICATION LAYER                      │
│     (What the user sees: buttons, menus, text fields)      │
└────────────────────────────────────────────────────────────┘
                               │
                               v
┌────────────────────────────────────────────────────────────┐
│                    ACCESSIBILITY LAYER                     │
│  UI Automation (UIA) - Modern, recommended                 │
│  MSAA (Active Accessibility) - Legacy, still works         │
│  COM Interfaces - Direct access to application internals   │
└────────────────────────────────────────────────────────────┘
                               │
                               v
┌────────────────────────────────────────────────────────────┐
│                        WIN32 LAYER                         │
│  Window Handles (HWND), Window Messages (WM_*),            │
│  Control Classes, DWM composition                          │
└────────────────────────────────────────────────────────────┘
UI Automation (UIA): The Modern Approach
UI Automation is Microsoft's accessibility framework introduced in Windows Vista. It provides:
Control Types: UIA defines standard control types (Button, Edit, ComboBox, Tree, DataGrid, etc.) with expected patterns.
Automation Patterns: Each control supports patterns that define its capabilities:
- Invoke Pattern: Click-like action (buttons)
- Value Pattern: Get/set text value (text boxes)
- Selection Pattern: Select items (lists, combos)
- Scroll Pattern: Scroll content
- Toggle Pattern: Check/uncheck (checkboxes)
- Expand/Collapse Pattern: Expand/collapse (tree nodes)
Why UIA matters for automation:
Traditional approach (fragile):
Click at coordinates (500, 300)
→ Fails if window moves, DPI changes, or resolution differs
UIA approach (robust):
Find element with AutomationId="submitButton"
Invoke the Invoke pattern
→ Works regardless of position, size, or DPI
UIA in AutoHotkey v2:
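; Access UIA through COM (conceptual sketch)
; Note: the UIA interfaces are vtable-only COM interfaces (no IDispatch), so
; AutoHotkey v2 generally cannot call these methods by name on a plain ComObject.
; In practice, create the object by CLSID and use ComCall, or use a UIA wrapper
; library that exposes calls shaped like the ones below.
UIA := ComObject("UIAutomationClient.CUIAutomation")
root := UIA.GetRootElement()
; Find window by name
condition := UIA.CreatePropertyCondition(
30005, ; UIA_NamePropertyId
"Notepad"
)
element := root.FindFirst(4, condition) ; TreeScope_Descendants = 4 (2 would be TreeScope_Children)
; Invoke a button: fetch the pattern once, then call it if the element supports it
invokePattern := element.GetCurrentPattern(10000) ; UIA_InvokePatternId
if (invokePattern) {
invokePattern.Invoke()
}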
COM Interfaces: Deep Integration
COM (Component Object Model) allows direct interaction with application internals. Many Windows applications expose COM interfaces for automation:
Common COM automation targets:
- Microsoft Office (Word.Application, Excel.Application)
- Internet Explorer (InternetExplorer.Application)
- Windows Shell (Shell.Application)
- Many enterprise applications
COM in AutoHotkey v2:
; Create Excel instance
excel := ComObject("Excel.Application")
excel.Visible := true
workbook := excel.Workbooks.Add()
sheet := workbook.Sheets(1)
sheet.Range("A1").Value := "Hello from AutoHotkey!"
; Automate Internet Explorer (legacy: Internet Explorer is retired, so treat this as an example for older systems)
ie := ComObject("InternetExplorer.Application")
ie.Visible := true
ie.Navigate("https://example.com")
while ie.Busy
Sleep(100)
document := ie.Document
The Control Hierarchy Deep Dive
Windows organizes controls in a parent-child hierarchy. Understanding this is crucial for reliable automation.
Application Window (hwnd: 0x00010234, class: "Notepad")
│
├── Menu Bar (hwnd: 0x00010236, class: "#32768")
│   ├── File Menu
│   ├── Edit Menu
│   └── Help Menu
│
├── Toolbar (hwnd: 0x00010238, class: "ToolbarWindow32")
│   ├── Button "New" (hwnd: 0x0001023A, id: 101)
│   ├── Button "Open" (hwnd: 0x0001023C, id: 102)
│   └── Button "Save" (hwnd: 0x0001023E, id: 103)
│
├── Text Area (hwnd: 0x00010240, class: "Edit")
│   └── [Contains the document text]
│
├── Status Bar (hwnd: 0x00010242, class: "msctls_statusbar32")
│   ├── Panel 1: "Line 1, Column 1"
│   └── Panel 2: "UTF-8"
│
└── Scroll Bars (vertical, horizontal)
Control identification strategies (in order of reliability):
| Strategy | Pros | Cons | Example |
|---|---|---|---|
| AutomationId (UIA) | Unique, stable | Not all apps set it | AutomationId="btnSubmit" |
| Control ID / HWND | Unique within window | ID can change between versions; HWND changes on every launch | ControlClick(ctrlHwnd) |
| Class + Instance | Usually stable | Multiple matches possible | ControlClick("Edit2") |
| Text/Caption | Human-readable | Changes with localization | ControlClick("OK") |
| Coordinates | Always works | Fragile, resolution-dependent | Click(500, 300) |
| Image match | Visual verification | Slow, theme-dependent | ImageSearch(...) |
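One way to apply this ordering is to try each strategy in turn and stop at the first one that resolves. The sketch below is only an illustration: `ResolveFirst` and the shape of the strategy objects are hypothetical names, not part of AutoHotkey. Each strategy is a function that returns a control handle, returns 0, or throws.

```autohotkey
#Requires AutoHotkey v2.0

; Try each strategy in order and return the first truthy result.
; A strategy is an object {name: "...", fn: () => hwndOrZero}.
; A thrown error is treated the same as "not found".
ResolveFirst(strategies) {
    for strategy in strategies {
        try {
            value := strategy.fn.Call()
            if value
                return {name: strategy.name, value: value}
        } catch Error {
            ; This strategy failed; fall through to the next one.
        }
    }
    return {name: "", value: 0}
}

; Usage (assumes a window titled "My App" that contains an OK button):
; hit := ResolveFirst([
;     {name: "classNN", fn: () => ControlGetHwnd("Button1", "My App")},
;     {name: "text",    fn: () => ControlGetHwnd("OK", "My App")}
; ])
; if hit.value
;     ControlClick(hit.value)
```

Returning the strategy name alongside the handle makes it easy to log which locator actually worked, which helps when a test starts relying on a fragile fallback.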
Recording & Playback: The Command Pattern
The Command Pattern is a behavioral design pattern that encapsulates a request as an object. This is perfect for recording and playback.
The Command Pattern Explained
┌─────────────────┐     ┌─────────────────┐     ┌─────────────────┐
│     Invoker     │     │     Command     │     │    Receiver     │
│    (Playback    │────▶│    (Recorded    │────▶│    (Windows     │
│     Engine)     │     │     Action)     │     │    Control)     │
└─────────────────┘     └─────────────────┘     └─────────────────┘
         │                       │
         │                       │
         │           ┌───────────┴───────────┐
         │           │ Interface:            │
         │           │   Execute()           │
         │           │   Undo()              │
         │           └───────────────────────┘
         │                       △
         │            ┌──────────┼──────────┐
         │            │          │          │
         │            │          │          │
         │      ┌─────┴────┐ ┌───┴────┐ ┌───┴────┐
         │      │ClickCmd  │ │ TypeCmd│ │ WaitCmd│
         │      │Execute():│ │Execute │ │Execute │
         │      │  Click() │ │ Send() │ │WinWait │
         └─────▶└──────────┘ └────────┘ └────────┘
Why Command Pattern is perfect for recording:
- Encapsulation: Each action is a self-contained object with all data needed to execute
- Undo capability: Each command can implement an undo method for rollback
- Serialization: Commands can be saved to files and loaded later
- Composition: Commands can be grouped into macro commands
- Logging: Every execution can be logged for debugging
Command Pattern implementation:
; Base command class
class Command {
timestamp := A_Now
delay := 0
Execute() {
throw Error("Execute must be overridden")
}
Undo() {
; Optional: override for undo support
}
ToString() {
return Type(this) . " at " . this.timestamp
}
ToJSON() {
return '{"type":"' . Type(this) . '","timestamp":"' . this.timestamp . '"}'
}
}
; Concrete command: Click
class ClickCommand extends Command {
x := 0
y := 0
button := "left"
window := ""
control := ""
__New(x, y, button := "left", window := "", control := "") {
this.x := x
this.y := y
this.button := button
this.window := window
this.control := control
}
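Execute() {
; Prefer a control-level click, which does not depend on where the window is
if (this.control && this.window) {
if WinExist(this.window) {
try {
ControlClick(this.control, this.window, , this.button)
return true
}
}
}
; Fallback to coordinate click (interpreted according to the current CoordMode)
Click(this.x, this.y, this.button)
return true
}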
ToString() {
return "Click at (" . this.x . ", " . this.y . ") on " . this.window
}
}
; Concrete command: Type
class TypeCommand extends Command {
text := ""
raw := false
__New(text, raw := false) {
this.text := text
this.raw := raw
}
Execute() {
if (this.raw)
SendText(this.text) ; AutoHotkey v2 has no SendRaw; SendText sends the text literally
else
Send(this.text)
return true
}
}
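; Concrete command: Wait
class WaitCommand extends Command {
waitType := "window"
target := ""
timeout := 5000
; The two fields below are assumptions added so the control and text waits have a target
window := "" ; window that owns the control (blank = last found window)
expected := "" ; text to wait for when waitType is "text"
__New(waitType, target, timeout := 5000, window := "", expected := "") {
this.waitType := waitType
this.target := target
this.timeout := timeout
this.window := window
this.expected := expected
}
Execute() {
switch this.waitType {
case "window":
return WinWait(this.target, , this.timeout / 1000) != 0 ; WinWait takes seconds
case "control":
return this.WaitForControl()
case "text":
return this.WaitForText()
default:
throw Error("Unknown wait type: " . this.waitType)
}
}
; Poll until the control can be resolved
WaitForControl() {
startTime := A_TickCount
while (A_TickCount - startTime < this.timeout) {
try {
ControlGetPos(, , , , this.target, this.window)
return true
}
Sleep(100)
}
return false
}
; Poll until the control's text equals the expected value
WaitForText() {
startTime := A_TickCount
while (A_TickCount - startTime < this.timeout) {
try {
if (ControlGetText(this.target, this.window) = this.expected)
return true
}
Sleep(100)
}
return false
}
}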
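The "Composition" benefit listed earlier can be illustrated with a composite command. This is a minimal sketch under stated assumptions: `MacroCommand` and `LogCommand` are hypothetical names introduced here, and `LogCommand` is only a stand-in that records calls instead of touching a real window. Because AutoHotkey is duck-typed, the children only need `Execute()` and `Undo()` methods.

```autohotkey
#Requires AutoHotkey v2.0

; Composite command: runs its children in order and undoes them in reverse.
class MacroCommand {
    children := []

    __New(children*) {
        this.children := children
    }

    Execute() {
        for child in this.children
            child.Execute()
        return true
    }

    Undo() {
        i := this.children.Length
        while (i >= 1) {
            this.children[i].Undo()
            i -= 1
        }
    }
}

; Stand-in command that appends to a shared journal (illustration only).
class LogCommand {
    __New(name, journal) {
        this.name := name
        this.journal := journal
    }
    Execute() => this.journal.Push("do " . this.name)
    Undo() => this.journal.Push("undo " . this.name)
}

journal := []
macro := MacroCommand(LogCommand("click", journal), LogCommand("type", journal))
macro.Execute()
macro.Undo()
; journal now holds: "do click", "do type", "undo type", "undo click"
```

Undoing in reverse order matters because later actions often depend on the state produced by earlier ones.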
Command History for Recording
The recording process builds a history of commands:
class CommandHistory {
commands := []
currentIndex := 0
Add(command) {
this.commands.Push(command)
this.currentIndex := this.commands.Length
}
Undo() {
if (this.currentIndex > 0) {
this.commands[this.currentIndex].Undo()
this.currentIndex--
}
}
Redo() {
if (this.currentIndex < this.commands.Length) {
this.currentIndex++
this.commands[this.currentIndex].Execute()
}
}
PlayAll(speed := 1.0) {
for cmd in this.commands {
if (cmd.delay > 0)
Sleep(cmd.delay / speed)
cmd.Execute()
}
}
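Save(path) {
; Note: the base ToJSON() writes only type and timestamp; subclasses must
; override it if their own fields (coordinates, text, ...) should be persisted.
content := ""
for cmd in this.commands {
content .= cmd.ToJSON() . "`n"
}
; Overwrite any previous recording; guard the delete so a missing file is not a problem
if FileExist(path)
FileDelete(path)
FileAppend(content, path, "UTF-8")
}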
}
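Reading a saved recording back could look like the sketch below. It assumes the one-object-per-line format that `Save()` writes and only recovers the two fields emitted by the base `ToJSON()`. `ParseCommandLine` and `LoadCommandRecords` are hypothetical helper names, and the regex approach is a shortcut for this fixed format, not a general JSON parser.

```autohotkey
#Requires AutoHotkey v2.0

; Extract the "type" and "timestamp" fields from one saved line.
ParseCommandLine(line) {
    if !RegExMatch(line, '"type":"([^"]+)"', &typeMatch)
        throw ValueError("No type field in: " . line)
    timestamp := ""
    if RegExMatch(line, '"timestamp":"([^"]*)"', &tsMatch)
        timestamp := tsMatch[1]
    return {type: typeMatch[1], timestamp: timestamp}
}

; Read a file written by CommandHistory.Save() into an array of records.
LoadCommandRecords(path) {
    records := []
    content := FileRead(path, "UTF-8")
    Loop Parse, content, "`n", "`r"
    {
        if (Trim(A_LoopField) != "")
            records.Push(ParseCommandLine(A_LoopField))
    }
    return records
}
```

Turning each record back into a `ClickCommand` or `TypeCommand` would additionally need a lookup from the type name to a constructor, plus the subclass fields in the saved line.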
Image Recognition: When Accessibility Fails
Some applications don't expose their controls through standard APIs:
- Games and DirectX/OpenGL applications
- Electron/CEF apps with custom rendering
- Remote desktop sessions (RDP, VNC)
- Flash/Silverlight applications (legacy)
- Custom-drawn UI elements
For these, image recognition is the fallback.
How ImageSearch Works
ImageSearch performs pixel-by-pixel comparison:
Screen Region (1920x1080)
┌──────────────────────────────────────────────────────────────┐
│                                                              │
│  ┌────────────────────────────────────────────────────────┐  │
│  │ Application Window                                     │  │
│  │                                                        │  │
│  │   ┌─────────────┐                                      │  │
│  │   │  [Search]   │ ← Target image (100x30 px)           │  │
│  │   └─────────────┘                                      │  │
│  │                                                        │  │
│  │ For each pixel (x, y) in screen:                       │  │
│  │   Does region (x, y, x+100, y+30)                      │  │
│  │   match target image?                                  │  │
│  │     If yes → return (x, y)                             │  │
│  │     If no → continue                                   │  │
│  │                                                        │  │
│  └────────────────────────────────────────────────────────┘  │
│                                                              │
└──────────────────────────────────────────────────────────────┘
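The scan in the diagram can be sketched on small in-memory grids. This illustrates the idea only and makes no claim about how ImageSearch is implemented internally. `FindTemplate` and `RegionMatches` are hypothetical names, and single grayscale numbers stand in for RGB pixels. The `variation` parameter plays the role of a per-value tolerance.

```autohotkey
#Requires AutoHotkey v2.0

; Naive template search over grids of grayscale values (arrays of row arrays).
; Returns the 1-based {x, y} of the first match, or 0 if the target is absent.
FindTemplate(screen, target, variation := 0) {
    rows := screen.Length - target.Length + 1
    cols := screen[1].Length - target[1].Length + 1
    Loop rows
    {
        y := A_Index
        Loop cols
        {
            x := A_Index
            if RegionMatches(screen, target, x, y, variation)
                return {x: x, y: y}
        }
    }
    return 0
}

; True if every target value is within `variation` of the screen value
; at the same offset from (x, y).
RegionMatches(screen, target, x, y, variation) {
    for row, line in target {
        for col, expected in line {
            actual := screen[y + row - 1][x + col - 1]
            if (Abs(actual - expected) > variation)
                return false
        }
    }
    return true
}

screen := [[0, 0, 0, 0], [0, 5, 6, 0], [0, 7, 8, 0]]
hit := FindTemplate(screen, [[5, 6], [7, 8]])
; hit.x = 2, hit.y = 2
```

The nested loops show why narrowing the search region pays off: the work grows with the number of candidate positions times the size of the target.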
Image Search Strategies
1. Exact match (default):
ImageSearch(&foundX, &foundY, 0, 0, A_ScreenWidth, A_ScreenHeight, "button.png")
- Requires pixel-perfect match
- Fails with different themes, DPI, or anti-aliasing
2. Variation tolerance:
ImageSearch(&foundX, &foundY, 0, 0, A_ScreenWidth, A_ScreenHeight, "*50 button.png")
- *n allows n shades of variation per color channel
- More forgiving but can match wrong elements
3. Transparent color:
ImageSearch(&foundX, &foundY, 0, 0, A_ScreenWidth, A_ScreenHeight, "*TransBlack button.png")
- Ignores pixels of specified color
- Useful for buttons with varying backgrounds
4. Region-specific search:
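; Only search within the application window
; WinGetPos returns screen coordinates, while ImageSearch defaults to client coordinates in v2
CoordMode("Pixel", "Screen")
WinGetPos(&winX, &winY, &winW, &winH, "My App")
ImageSearch(&foundX, &foundY, winX, winY, winX + winW, winY + winH, "button.png")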
Image Recognition Best Practices
- Capture minimal regions: Include only the distinctive part of the control
- Use unique visual features: Avoid common icons or patterns
- Handle multiple DPI: Capture images at each target DPI
- Implement retry logic: Visual state may change momentarily
- Use region constraints: Narrow the search area when possible
- Version your images: Keep images organized by application version
class ImageFinder {
static imageDir := A_ScriptDir . "\images\"
static defaultVariation := 30
; Find image with retry logic
static Find(imageName, region := "", timeout := 5000, variation := "") {
if (variation = "")
variation := this.defaultVariation
imagePath := this.imageDir . imageName
if (!FileExist(imagePath))
throw Error("Image not found: " . imagePath)
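; Work in screen coordinates for both searching and clicking
; (AutoHotkey v2 defaults to coordinates relative to the active window's client area)
CoordMode("Pixel", "Screen")
CoordMode("Mouse", "Screen")
; Use the given region ({x, y, w, h}) or fall back to the full screen
if !IsObject(region) {
x1 := 0, y1 := 0
x2 := A_ScreenWidth, y2 := A_ScreenHeight
} else {
x1 := region.x, y1 := region.y
x2 := region.x + region.w, y2 := region.y + region.h
}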
searchOptions := "*" . variation . " " . imagePath
startTime := A_TickCount
loop {
if ImageSearch(&foundX, &foundY, x1, y1, x2, y2, searchOptions) {
return {x: foundX, y: foundY, found: true}
}
if (A_TickCount - startTime > timeout)
return {x: 0, y: 0, found: false}
Sleep(200)
}
}
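; Find and click
static FindAndClick(imageName, region := "", timeout := 5000) {
result := this.Find(imageName, region, timeout)
if (result.found) {
; result.x/result.y is the top-left corner of the match, so the +10 offset
; lands just inside the image. For a true center click, add half the
; image's width and height instead.
Click(result.x + 10, result.y + 10)
return true
}
return false
}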
; Wait for image to appear
static WaitFor(imageName, timeout := 10000) {
return this.Find(imageName, "", timeout)
}
; Wait for image to disappear
static WaitForGone(imageName, timeout := 10000) {
startTime := A_TickCount
loop {
result := this.Find(imageName, "", 500)
if (!result.found)
return true
if (A_TickCount - startTime > timeout)
return false
Sleep(200)
}
}
; Capture region to file
static CaptureRegion(x, y, w, h, savePath) {
; Requires GDI+ library (Gdip_All.ahk)
; Simplified pseudocode:
; pToken := Gdip_Startup()
; pBitmap := Gdip_BitmapFromScreen(x "|" y "|" w "|" h)
; Gdip_SaveBitmapToFile(pBitmap, savePath)
; Gdip_DisposeImage(pBitmap)
; Gdip_Shutdown(pToken)
}
}
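The "Handle multiple DPI" practice could be supported by selecting reference images per DPI. The folder layout below (`dpi96`, `dpi144`, ...) is a hypothetical convention chosen for this sketch, and `DpiImagePath` and `DpiScale` are illustrative helper names. `A_ScreenDPI` is the built-in variable used when no DPI is passed in.

```autohotkey
#Requires AutoHotkey v2.0

; Build the path of a reference image captured at a given DPI.
; Assumed layout: <baseDir>\dpi96\button.png, <baseDir>\dpi144\button.png, ...
; Pass dpi explicitly, or leave it at 0 to use the current A_ScreenDPI.
DpiImagePath(baseDir, imageName, dpi := 0) {
    if !dpi
        dpi := A_ScreenDPI
    return RTrim(baseDir, "\") . "\dpi" . dpi . "\" . imageName
}

; Scale factor relative to the 96 DPI baseline (1.0 at 100%, 1.5 at 150%).
DpiScale(dpi := 0) {
    if !dpi
        dpi := A_ScreenDPI
    return dpi / 96
}

; Usage:
; path := DpiImagePath(A_ScriptDir . "\images", "button.png")
```

Keeping one capture per DPI avoids relying on a high variation value to absorb scaling differences, which tends to produce false matches.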
Synchronization & Timing: The Key to Reliable Tests
The #1 cause of flaky GUI tests is timing. Applications are inherently asynchronous: windows open at their own pace, controls render when ready, network calls complete unpredictably.
The Flaky Test Problem
Test Script                        Application
    │                                   │
    ├── Run("notepad.exe") ────────────▶│
    │                                   │ [Starting...]
    ├── Send("Hello") ─────────────────▶│ [Window not ready!]
    │                                   │ → Input lost!
    │                                   │
    │                                   │ [Window appears]
    │                                   │
Why fixed delays are dangerous:
; Bad: Works on your machine, fails on slow machines
Run("notepad.exe")
Sleep(1000)
Send("Hello")
; Also bad: Wastes time on fast machines
Run("notepad.exe")
Sleep(5000) ; 5 seconds every time, even if it opens in 0.5s
Send("Hello")
Explicit Waits: The Solution
Principle: Wait for conditions, not time.
; Good: Wait for the condition to be true
Run("notepad.exe")
if !WinWait("Notepad", , 10) {
throw Error("Notepad failed to start within 10 seconds")
}
WinActivate()
Send("Hello")
Wait Strategy Catalog
| Scenario | Wait Method | AutoHotkey Implementation |
|---|---|---|
| Window exists | WinWait | WinWait(title, , timeout) (timeout in seconds) |
| Window active | WinWaitActive | WinWaitActive(title, , timeout) |
| Window closed | WinWaitClose | WinWaitClose(title, , timeout) |
| Control exists | Custom polling | Poll ControlGetHwnd(ctrl, win) inside try until it succeeds |
| Control enabled | Custom polling | while !ControlGetEnabled(ctrl, win) |
| Control has text | Custom polling | while ControlGetText(ctrl, win) != expected |
| Image visible | ImageSearch loop | while !ImageSearch(...) |
| Element via UIA | UIA polling | while !element.FindFirst(...) |
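AutoHotkey v2 has no built-in function that simply reports whether a control exists, so the "Control exists" row needs a small helper. The sketch below wraps `ControlGetHwnd`, which throws a `TargetError` when the window or control cannot be found. `ControlExists` and `WaitForControl` are hypothetical names for this illustration.

```autohotkey
#Requires AutoHotkey v2.0

; True if the control can be resolved in the given window, false otherwise.
ControlExists(ctrl, win) {
    try {
        return ControlGetHwnd(ctrl, win) != 0
    } catch TargetError {
        return false  ; window or control not found
    }
}

; Poll until the control appears or the timeout (in milliseconds) elapses.
WaitForControl(ctrl, win, timeoutMs := 5000, pollMs := 100) {
    deadline := A_TickCount + timeoutMs
    while !ControlExists(ctrl, win) {
        if (A_TickCount >= deadline)
            return false
        Sleep(pollMs)
    }
    return true
}

; Usage:
; if !WaitForControl("Edit1", "Save As", 3000)
;     throw Error("Filename field did not appear")
```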
Implementing Smart Waits
class Waiter {
static defaultTimeout := 10000
static pollInterval := 100
; Wait for window
static ForWindow(title, timeout := "") {
if (timeout = "")
timeout := this.defaultTimeout
if WinWait(title, , timeout / 1000) {
return true
}
throw Error("Window '" . title . "' not found within " . timeout . "ms")
}
; Wait for control to exist
static ForControl(control, windowTitle, timeout := "") {
if (timeout = "")
timeout := this.defaultTimeout
startTime := A_TickCount
loop {
try {
ControlGetPos(, , , , control, windowTitle)
return true
}
if (A_TickCount - startTime > timeout)
throw Error("Control '" . control . "' not found in '" . windowTitle . "'")
Sleep(this.pollInterval)
}
}
; Wait for control to be enabled
static ForEnabled(control, windowTitle, timeout := "") {
if (timeout = "")
timeout := this.defaultTimeout
; First wait for control to exist
this.ForControl(control, windowTitle, timeout)
startTime := A_TickCount
loop {
try {
if ControlGetEnabled(control, windowTitle)
return true
}
if (A_TickCount - startTime > timeout)
throw Error("Control '" . control . "' not enabled within timeout")
Sleep(this.pollInterval)
}
}
; Wait for control to have expected text
static ForText(control, windowTitle, expected, timeout := "") {
if (timeout = "")
timeout := this.defaultTimeout
this.ForControl(control, windowTitle, timeout)
startTime := A_TickCount
loop {
try {
actual := ControlGetText(control, windowTitle)
if (actual = expected)
return true
}
if (A_TickCount - startTime > timeout) {
actualText := ""
try actualText := ControlGetText(control, windowTitle)
throw Error("Control text mismatch. Expected: '" . expected . "', Got: '" . actualText . "'")
}
Sleep(this.pollInterval)
}
}
; Wait for custom condition (lambda)
static Until(conditionFn, timeout := "", message := "Condition not met") {
if (timeout = "")
timeout := this.defaultTimeout
startTime := A_TickCount
loop {
try {
if conditionFn()
return true
}
if (A_TickCount - startTime > timeout)
throw Error(message . " within " . timeout . "ms")
Sleep(this.pollInterval)
}
}
}
; Usage examples:
; Waiter.ForWindow("Save As")
; Waiter.ForControl("Edit1", "Notepad")
; Waiter.ForEnabled("Button1", "My App")
; Waiter.ForText("Static1", "My App", "Ready")
; Waiter.Until(() => FileExist("output.txt"), 5000, "Output file not created")
Control Manipulation APIs
AutoHotkey provides multiple levels of control interaction:
Input Simulation vs. Control Messages
┌──────────────────────────────────────────────────────────────────┐
│ METHOD 1: Input Simulation (Send, Click)                         │
│                                                                  │
│   User types "hello"                                             │
│        │                                                         │
│        v                                                         │
│   OS captures key events                                         │
│        │                                                         │
│        v                                                         │
│   Sends WM_KEYDOWN/WM_KEYUP to focused window                    │
│        │                                                         │
│        v                                                         │
│   Application processes as if user typed                         │
│                                                                  │
│ Pros: Works universally, exactly like real input                 │
│ Cons: Requires focus, timing dependent                           │
└──────────────────────────────────────────────────────────────────┘

┌──────────────────────────────────────────────────────────────────┐
│ METHOD 2: Control Messages (ControlSend, ControlClick)           │
│                                                                  │
│   Script sends BM_CLICK directly                                 │
│        │                                                         │
│        v                                                         │
│   Message goes directly to control handle (HWND)                 │
│        │                                                         │
│        v                                                         │
│   Control processes as if clicked                                │
│                                                                  │
│ Pros: Doesn't require focus, works in background, reliable       │
│ Cons: Some apps ignore direct messages,                          │
│       security software may block them                           │
└──────────────────────────────────────────────────────────────────┘
Choosing the Right Method
| Scenario | Best Method | Why |
|---|---|---|
| Standard Win32 app (Notepad, Paint) | ControlSend/ControlClick | Direct control access |
| Game or DirectX app | Send/Click + SetKeyDelay | Games ignore control messages |
| Web browser (Chrome, Firefox) | Send/Click or UIA | Complex rendering, multiple processes |
| Background automation | ControlSend/ControlClick | Works without focus |
| Remote desktop session | Send/Click | Controls are on remote machine |
| Electron app | Depends on app | Try ControlSend first, fall back to Send |
Control Functions Reference
Reading control state:
; Get control text
text := ControlGetText("Edit1", "Notepad")
; Get control position and size
ControlGetPos(&x, &y, &w, &h, "Edit1", "Notepad")
; Check if control is enabled
enabled := ControlGetEnabled("Button1", "My App")
; Check if control is visible
visible := ControlGetVisible("Button1", "My App")
; Check if checkbox is checked
checked := ControlGetChecked("Button1", "My App")
; Get selected item in listbox
selected := ControlGetChoice("ListBox1", "My App")
; Get all items in listbox
items := ControlGetItems("ListBox1", "My App")
Modifying controls:
; Set text
ControlSetText("New text", "Edit1", "Notepad")
; Focus a control
ControlFocus("Edit1", "Notepad")
; Click a control (various forms)
ControlClick("Button1", "My App") ; By ClassNN
ControlClick("OK", "My App") ; By text
ControlClick("x100 y50", "My App") ; By position
ControlClick("Button1", "My App", , "Right") ; Right-click
ControlClick("Button1", "My App", , "Left", 2) ; Double-click
; Send keystrokes to a control
ControlSend("Hello world", "Edit1", "Notepad")
ControlSendText("Hello world", "Edit1", "Notepad") ; Raw text, no special keys
; Set checkbox state
ControlSetChecked(1, "Button1", "My App") ; Check
ControlSetChecked(0, "Button1", "My App") ; Uncheck
; Select item in listbox/combobox
ControlChooseIndex(3, "ComboBox1", "My App") ; By index
ControlChooseString("Option B", "ComboBox1", "My App") ; By text
Complete Project Specification
Functional Requirements
| ID | Requirement | Priority | Acceptance Criteria |
|---|---|---|---|
| F1 | Record mouse clicks with coordinates | Must Have | Clicks logged with x, y, button, window, control |
| F2 | Record keyboard input | Must Have | Keys logged with keycode, modifiers, timing |
| F3 | Record timing between actions | Must Have | Delay calculated in milliseconds |
| F4 | Save recording to file | Must Have | Recordings persist as JSON/INI files |
| F5 | Load and playback recording | Must Have | Recordings execute reliably |
| F6 | Wait for windows during playback | Must Have | WinWait before window-specific actions |
| F7 | Support ControlClick for reliable clicks | Must Have | Use control ID when available |
| F8 | Provide assertion functions | Must Have | Assert, AssertEquals, AssertTextEquals |
| F9 | Report test pass/fail results | Must Have | Clear pass/fail summary with details |
| F10 | Find controls by class and text | Should Have | Multiple identification strategies |
| F11 | Support image-based control finding | Should Have | ImageSearch fallback with retry |
| F12 | Variable playback speed | Should Have | 0.5x, 1x, 2x, 4x speed control |
| F13 | Step-through debugging mode | Nice to Have | Pause between actions, inspect state |
| F14 | Generate readable test scripts | Nice to Have | Export as AHK code from recording |
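Requirements F3 and F12 come down to two small calculations: the delay between consecutive recorded actions, and that delay scaled by the playback speed. The sketch below is one possible shape. `DelayTracker` and `ScaledDelay` are hypothetical names, and the `now` parameter exists only so the logic can be exercised without real waiting.

```autohotkey
#Requires AutoHotkey v2.0

; Assigns each recorded action the delay (ms) since the previous one (F3).
class DelayTracker {
    lastTick := 0

    ; Pass `now` explicitly for testing; 0 means "use A_TickCount".
    Stamp(action, now := 0) {
        if !now
            now := A_TickCount
        action.delay := this.lastTick ? now - this.lastTick : 0
        this.lastTick := now
        return action
    }
}

; Delay to sleep during playback at a given speed multiplier (F12).
ScaledDelay(delay, speed := 1.0) {
    if (speed <= 0)
        throw ValueError("speed must be positive")
    return Round(delay / speed)
}

tracker := DelayTracker()
first := tracker.Stamp({}, 1000)
second := tracker.Stamp({}, 1250)
; first.delay = 0, second.delay = 250, ScaledDelay(250, 2) = 125
```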
Non-Functional Requirements
| ID | Requirement | Target | Measurement |
|---|---|---|---|
| NF1 | Playback reliability | 99% success rate | Same test passes 99/100 runs |
| NF2 | Recording overhead | < 5ms per event | Timing measurement |
| NF3 | Test determinism | 100% reproducible | Same result each run |
| NF4 | Error recovery | Graceful degradation | No crashes on app failure |
| NF5 | Memory usage | < 100MB | Task Manager measurement |
| NF6 | Script load time | < 500ms | Startup timing |
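The NF1 target (the same test passing 99 of 100 runs) can be measured by running a test function repeatedly and counting passes. This is a minimal sketch: `MeasureReliability` is a hypothetical helper, and it assumes a test function returns true on success, as the example tests in this document do. A thrown error counts as a failed run.

```autohotkey
#Requires AutoHotkey v2.0

; Run a test function repeatedly and report its pass rate.
MeasureReliability(testFn, runs := 100) {
    passed := 0
    Loop runs
    {
        try {
            if testFn()
                passed += 1
        } catch Error {
            ; An exception counts as a failed run.
        }
    }
    rate := runs ? passed / runs : 0
    return {runs: runs, passed: passed, rate: rate}
}

; Usage (Test_NotepadFindReplace is any function that returns true or false):
; stats := MeasureReliability(Test_NotepadFindReplace, 100)
; MsgBox("Pass rate: " . Round(stats.rate * 100, 1) . "%")
```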
Test Target Application Matrix
Your framework should work with these application types:
| Application Type | Example | Expected Support Level |
|---|---|---|
| Native Win32 | Notepad, Paint | Full support |
| .NET WinForms | Many enterprise apps | Full support |
| WPF | Visual Studio | Good support via UIA |
| Qt | VLC, Wireshark | Partial (varies by app) |
| Electron | VS Code, Slack | Partial (complex DOM) |
| Java Swing | IntelliJ IDEA | Limited (accessibility API) |
| Game/DirectX | Any game | Image-based only |
Real World Outcome
When complete, you'll have a testing framework that:
Recording Workflow
- Start recording - Press hotkey (e.g., Ctrl+F9)
- Use the application normally - Click buttons, type text, navigate menus
- Recording captures everything:
- Mouse clicks with coordinates and target control
- Keystrokes with timing
- Window context for each action
- Stop recording - Press hotkey (e.g., Ctrl+F10)
- Save to file - Recording saved as JSON with all action details
Recording Session Example:
────────────────────────────────────────────────────────────
[Ctrl+F9 pressed - Recording started]
Action 1: Click at (150, 200) on "Untitled - Notepad", control: "Edit1"
Action 2: Type: "Hello, this is a test"
Action 3: Wait: 500ms (user pause)
Action 4: Key: Ctrl+S
Action 5: Wait for window: "Save As"
Action 6: Type: "test_document.txt"
Action 7: Click at (400, 350) on "Save As", control: "Button1" (Save)
Action 8: Wait for window close: "Save As"
[Ctrl+F10 pressed - Recording stopped]
────────────────────────────────────────────────────────────
Saved: test_create_document.json (8 actions)
Playback with Verification
; Example test script structure
class Test_NotepadCreateFile extends TestCase {
static Setup() {
; Clean state before test
if FileExist("C:\temp\test_document.txt")
FileDelete("C:\temp\test_document.txt")
}
static Run() {
; Launch application
Run("notepad.exe")
Waiter.ForWindow("Untitled - Notepad")
; Perform actions
Send("Hello, this is a test{Enter}Line 2 of the test")
; Save file
Send("^s")
Waiter.ForWindow("Save As", 5000)
; Enter filename
Send("C:\temp\test_document.txt")
ControlClick("Button1", "Save As") ; Click Save
; Wait for save to complete
Waiter.ForWindow("test_document.txt - Notepad", 5000)
; Verify: File was created
Assert.True(FileExist("C:\temp\test_document.txt"), "File should exist")
; Verify: Content is correct
content := FileRead("C:\temp\test_document.txt")
Assert.Contains(content, "Hello, this is a test", "File should contain our text")
; Cleanup
WinClose("test_document.txt - Notepad")
}
static Teardown() {
; Clean up after test
if WinExist("Notepad")
WinClose("Notepad")
if FileExist("C:\temp\test_document.txt")
FileDelete("C:\temp\test_document.txt")
}
}
Complete Example Test Script
#Requires AutoHotkey v2.0
; Test: Calculator Addition
Test_CalculatorAddition() {
testName := "Calculator Addition Test"
passed := true
try {
; Setup: Close any existing calculator
if WinExist("Calculator")
WinClose("Calculator")
; Step 1: Launch Calculator
Run("calc.exe")
if !WinWait("Calculator", , 5) {
throw Error("Calculator failed to start")
}
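; Step 2: Click "5"
; Note: ControlClick by button text only works on the classic Win32 Calculator.
; The modern Windows Calculator does not expose its buttons as Win32 controls,
; so a real test against it needs UIA instead.
ControlClick("5", "Calculator")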
Sleep(100)
; Step 3: Click "+"
ControlClick("+", "Calculator")
Sleep(100)
; Step 4: Click "3"
ControlClick("3", "Calculator")
Sleep(100)
; Step 5: Click "="
ControlClick("=", "Calculator")
Sleep(200)
; Step 6: Verify result
; Note: Reading Calculator result requires UIA
; This is simplified - real implementation needs UIA
; result := GetCalculatorResult()
; Assert.Equals(result, "8", "5 + 3 should equal 8")
; Alternative: Use image recognition to verify "8" is displayed
if !ImageFinder.Find("calc_result_8.png", , 2000).found {
throw Error("Expected result '8' not found on screen")
}
LogPass(testName)
} catch Error as e {
passed := false
LogFail(testName, e.Message)
} finally {
; Cleanup
if WinExist("Calculator")
WinClose("Calculator")
}
return passed
}
; Test: Notepad Find and Replace
Test_NotepadFindReplace() {
testName := "Notepad Find and Replace Test"
passed := true
try {
; Setup
Run("notepad.exe")
Waiter.ForWindow("Untitled - Notepad")
; Type initial text
Send("Hello World{Enter}Hello Universe{Enter}Hello Galaxy")
; Open Find and Replace (Ctrl+H)
Send("^h")
Waiter.ForWindow("Replace")
; Enter find text
ControlSetText("Hello", "Edit1", "Replace")
; Enter replace text
ControlSetText("Goodbye", "Edit2", "Replace")
; Click "Replace All"
ControlClick("Replace &All", "Replace")
Sleep(500)
; Close dialog
ControlClick("Cancel", "Replace")
; Verify: Get the text content
text := ControlGetText("Edit1", "Notepad")
Assert.NotContains(text, "Hello", "Text should not contain 'Hello' after replace")
Assert.Contains(text, "Goodbye", "Text should contain 'Goodbye' after replace")
LogPass(testName)
} catch Error as e {
passed := false
LogFail(testName, e.Message)
} finally {
; Cleanup without saving
if WinExist("Notepad") {
WinClose("Notepad")
if WinExist("Notepad") {
Send("n") ; Don't save
}
}
}
return passed
}
; Run all tests
RunTestSuite() {
tests := [
Test_CalculatorAddition,
Test_NotepadFindReplace
]
passed := 0
failed := 0
for testFn in tests {
if testFn() {
passed++
} else {
failed++
}
}
MsgBox("Test Results:`nPassed: " . passed . "`nFailed: " . failed)
}
Assertion Examples
class Assert {
static results := []
; Basic assertion
static True(condition, message := "Assertion failed") {
if (!condition) {
this.results.Push({passed: false, message: message})
throw AssertionError(message)
}
this.results.Push({passed: true, message: message})
}
static False(condition, message := "Expected false") {
this.True(!condition, message)
}
; Equality assertions
static Equals(actual, expected, message := "") {
if (message = "")
message := "Expected '" . expected . "' but got '" . actual . "'"
this.True(actual = expected, message)
}
static NotEquals(actual, unexpected, message := "") {
if (message = "")
message := "Expected value to not equal '" . unexpected . "'"
this.True(actual != unexpected, message)
}
; String assertions
static Contains(haystack, needle, message := "") {
if (message = "")
message := "String should contain '" . needle . "'"
this.True(InStr(haystack, needle), message)
}
static NotContains(haystack, needle, message := "") {
if (message = "")
message := "String should not contain '" . needle . "'"
this.True(!InStr(haystack, needle), message)
}
static StartsWith(str, prefix, message := "") {
if (message = "")
message := "String should start with '" . prefix . "'"
this.True(SubStr(str, 1, StrLen(prefix)) = prefix, message)
}
static EndsWith(str, suffix, message := "") {
if (message = "")
message := "String should end with '" . suffix . "'"
this.True(SubStr(str, -StrLen(suffix)) = suffix, message)
}
static Matches(str, pattern, message := "") {
if (message = "")
message := "String should match pattern '" . pattern . "'"
this.True(RegExMatch(str, pattern), message)
}
; Numeric assertions
static Greater(actual, expected, message := "") {
if (message = "")
message := actual . " should be greater than " . expected
this.True(actual > expected, message)
}
static Less(actual, expected, message := "") {
if (message = "")
message := actual . " should be less than " . expected
this.True(actual < expected, message)
}
static Between(actual, min, max, message := "") {
if (message = "")
message := actual . " should be between " . min . " and " . max
this.True(actual >= min && actual <= max, message)
}
; GUI assertions
static WindowExists(title, message := "") {
if (message = "")
message := "Window '" . title . "' should exist"
this.True(WinExist(title), message)
}
static WindowNotExists(title, message := "") {
if (message = "")
message := "Window '" . title . "' should not exist"
this.True(!WinExist(title), message)
}
static ControlText(control, windowTitle, expected, message := "") {
actual := ControlGetText(control, windowTitle)
if (message = "")
message := "Control text should be '" . expected . "', got '" . actual . "'"
this.Equals(actual, expected, message)
}
static ControlEnabled(control, windowTitle, message := "") {
if (message = "")
message := "Control '" . control . "' should be enabled"
this.True(ControlGetEnabled(control, windowTitle), message)
}
static ControlDisabled(control, windowTitle, message := "") {
if (message = "")
message := "Control '" . control . "' should be disabled"
this.True(!ControlGetEnabled(control, windowTitle), message)
}
static ControlVisible(control, windowTitle, message := "") {
if (message = "")
message := "Control '" . control . "' should be visible"
this.True(ControlGetVisible(control, windowTitle), message)
}
static ControlChecked(control, windowTitle, message := "") {
if (message = "")
message := "Checkbox '" . control . "' should be checked"
this.True(ControlGetChecked(control, windowTitle), message)
}
; File assertions
static FileExists(path, message := "") {
if (message = "")
message := "File '" . path . "' should exist"
this.True(FileExist(path), message)
}
static FileContains(path, expected, message := "") {
content := FileRead(path)
if (message = "")
message := "File should contain '" . expected . "'"
this.Contains(content, expected, message)
}
; Reset for new test run
static Reset() {
this.results := []
}
; Get summary
static GetSummary() {
passed := 0
failed := 0
for result in this.results {
if result.passed
passed++
else
failed++
}
return {
total: this.results.Length,
passed: passed,
failed: failed,
details: this.results
}
}
}
; Custom error type for assertions
class AssertionError extends Error {
__New(message) {
super.__New(message)
}
}
Image-Based Finding Example
; Capture a button for later recognition
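CaptureButtonImage() {
; Let the user mark two opposite corners of the target
MsgBox("After closing this box, move the mouse to the top-left corner of the target and press Enter")
KeyWait("Enter") ; wait for release in case Enter dismissed the dialog
KeyWait("Enter", "D")
MouseGetPos(&x1, &y1)
KeyWait("Enter")
MsgBox("After closing this box, move the mouse to the bottom-right corner and press Enter")
KeyWait("Enter")
KeyWait("Enter", "D")
MouseGetPos(&x2, &y2)
; Normalize so the corners can be marked in either order
left := Min(x1, x2)
top := Min(y1, y2)
w := Abs(x2 - x1)
h := Abs(y2 - y1)
; Capture and save (requires GDI+ library)
savePath := A_ScriptDir . "\captured_button.png"
ImageFinder.CaptureRegion(left, top, w, h, savePath)
MsgBox("Image saved to: " . savePath)
}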
; Use image to click button in test
Test_ImageBasedClick() {
; Find the button by image
result := ImageFinder.Find("submit_button.png", , 5000)
if (!result.found) {
throw Error("Submit button not found on screen")
}
; Click the center of the found image
; Assuming image is 100x30 pixels
Click(result.x + 50, result.y + 15)
; Wait for expected response
Waiter.ForWindow("Success", 3000)
}
Solution Architecture
Component Diagram
┌─────────────────────────────────────────────────────────────────────────────┐
│                      GUI AUTOMATION TESTING FRAMEWORK                       │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                             │
│  ┌───────────────────────────────────────────────────────────────────────┐  │
│  │                              TEST RUNNER                              │  │
│  │   - RunAll()        - RunSingle(test)        - GenerateReport()       │  │
│  └───────────────────────────────────┬───────────────────────────────────┘  │
│                                      │                                      │
│                        ┌─────────────┴─────────────┐                        │
│                        v                           v                        │
│           ┌─────────────────────────┐ ┌─────────────────────────┐           │
│           │        RECORDER         │ │         PLAYER          │           │
│           │ - StartRecording()      │ │ - Load(file)            │           │
│           │ - StopRecording()       │ │ - Play(speed)           │           │
│           │ - OnKeyDown(hook)       │ │ - Pause()               │           │
│           │ - OnMouseClick(x,y)     │ │ - Step()                │           │
│           │ - Save(file)            │ │ - Stop()                │           │
│           └────────────┬────────────┘ └────────────┬────────────┘           │
│                        │                           │                        │
│                        └─────────────┬─────────────┘                        │
│                                      v                                      │
│  ┌───────────────────────────────────────────────────────────────────────┐  │
│  │                             ACTION ENGINE                             │  │
│  │   - ExecuteAction(action)         - CreateAction(type, params)        │  │
│  │   - ValidateAction(action)        - ActionToString(action)            │  │
│  └───────────────────────────────────┬───────────────────────────────────┘  │
│                                      │                                      │
│                ┌─────────────────────┼─────────────────────┐                │
│                v                     v                     v                │
│       ┌─────────────────┐   ┌─────────────────┐   ┌─────────────────┐       │
│       │     WAITER      │   │   ASSERTIONS    │   │ CONTROL FINDER  │       │
│       │ - ForWindow()   │   │ - Assert()      │   │ - ByClass()     │       │
│       │ - ForControl()  │   │ - Equals()      │   │ - ByText()      │       │
│       │ - ForText()     │   │ - Contains()    │   │ - ByImage()     │       │
│       │ - Until()       │   │ - WindowExists  │   │ - ByUIA()       │       │
│       └─────────────────┘   └─────────────────┘   └────────┬────────┘       │
│                                                            │                │
│                                        ┌───────────────────┘                │
│                                        v                                    │
│                               ┌─────────────────┐                           │
│                               │  IMAGE FINDER   │                           │
│                               │ - Find()        │                           │
│                               │ - FindAndClick  │                           │
│                               │ - WaitFor()     │                           │
│                               │ - Capture()     │                           │
│                               └─────────────────┘                           │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘
External Dependencies:
┌─────────────────┐  ┌─────────────────┐  ┌─────────────────┐
│  Windows API    │  │  UI Automation  │  │  GDI+           │
│  (User32.dll)   │  │  (UIAutomation  │  │  (Screenshot)   │
│                 │  │   Client)       │  │                 │
└─────────────────┘  └─────────────────┘  └─────────────────┘
Recording Data Structure
; A complete test recording
Recording := {
metadata: {
name: "Test_CreateNotepadFile",
description: "Creates a new file in Notepad and verifies it",
createdAt: "20251226140000",
author: "automation_user",
version: "1.0",
targetApp: "Notepad",
targetAppVersion: "11.2309.33.0"
},
settings: {
defaultTimeout: 10000,
playbackSpeed: 1.0,
captureScreenshots: true,
stopOnFirstFailure: true
},
actions: [
{
id: 1,
type: "run",
timestamp: "20251226140001",
delay: 0,
params: {
command: "notepad.exe",
args: "",
workDir: ""
}
},
{
id: 2,
type: "wait",
timestamp: "20251226140002",
delay: 0,
params: {
waitType: "window",
target: "Untitled - Notepad",
timeout: 10000
}
},
{
id: 3,
type: "click",
timestamp: "20251226140003",
delay: 500,
params: {
x: 150,
y: 200,
button: "left",
clicks: 1,
window: "Untitled - Notepad",
windowClass: "Notepad",
control: "Edit1",
controlClass: "Edit"
}
},
{
id: 4,
type: "type",
timestamp: "20251226140004",
delay: 100,
params: {
text: "Hello, this is a test",
raw: false
}
},
{
id: 5,
type: "key",
timestamp: "20251226140005",
delay: 200,
params: {
key: "s",
modifiers: ["ctrl"],
vk: 83,
sc: 31
}
},
{
id: 6,
type: "assertion",
timestamp: "20251226140010",
delay: 500,
params: {
assertType: "windowExists",
target: "Save As",
message: "Save dialog should appear"
}
}
],
results: {
lastRun: "20251226150000",
status: "passed",
duration: 5234,
actionsExecuted: 6,
assertionsPassed: 1,
assertionsFailed: 0,
errors: []
}
}
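The recording above is a plain AutoHotkey object, so it has to be serialized before it can be saved to disk. AutoHotkey v2 has no built-in JSON support; if no JSON library is available, a minimal serializer along the following lines may be enough for this structure. `ToJson` and `JsonEscape` are hypothetical helper names, and this is a sketch rather than a complete JSON implementation: booleans are written as 1/0 (AutoHotkey has no separate boolean type) and only the most common escape sequences are handled.

```autohotkey
#Requires AutoHotkey v2.0

; Hypothetical helper: serialize Arrays, Maps, plain objects, numbers and strings.
ToJson(value) {
    if (value is Array) {
        out := ""
        for item in value
            out .= (A_Index > 1 ? "," : "") . ToJson(item)
        return "[" . out . "]"
    }
    if (value is Map) {
        out := ""
        for key, item in value
            out .= (out = "" ? "" : ",") . '"' . JsonEscape(String(key)) . '":' . ToJson(item)
        return "{" . out . "}"
    }
    if IsObject(value) {
        out := ""
        for name, item in value.OwnProps()
            out .= (out = "" ? "" : ",") . '"' . JsonEscape(name) . '":' . ToJson(item)
        return "{" . out . "}"
    }
    ; Pure numbers are written unquoted; numeric strings such as the
    ; "20251226140000" timestamps stay quoted because their type is String.
    if (value is Number)
        return String(value)
    return '"' . JsonEscape(String(value)) . '"'
}

; Hypothetical helper: escape the characters that would break a JSON string.
JsonEscape(text) {
    text := StrReplace(text, "\", "\\")
    text := StrReplace(text, '"', '\"')
    text := StrReplace(text, "`n", "\n")
    text := StrReplace(text, "`r", "\r")
    return StrReplace(text, "`t", "\t")
}

sample := ToJson([1, "a", {x: 1.5}])
```

Saving a recording could then be as simple as `FileAppend(ToJson(Recording), path, "UTF-8")` on a path that does not exist yet.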
Playback Engine Design
class PlaybackEngine {
recording := {}
currentIndex := 0
state := "stopped" ; stopped, playing, paused, stepping
speed := 1.0
results := []
__New() {
this.Reset()
}
Reset() {
this.recording := {}
this.currentIndex := 0
this.state := "stopped"
this.results := []
}
Load(path) {
if (!FileExist(path))
throw Error("Recording file not found: " . path)
content := FileRead(path, "UTF-8")
this.recording := this.ParseRecording(content)
this.currentIndex := 0
return true
}
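ParseRecording(content) {
; Parse JSON format. Jxon_Load comes from a third-party JXON library
; (not built into AutoHotkey) and returns Map/Array values. The rest of
; this class uses property access (recording.actions, action.type),
; so Maps are converted to plain objects here.
return this.ToObject(Jxon_Load(&content))
}
ToObject(value) {
if (value is Map) {
obj := {}
for key, item in value
obj.%key% := this.ToObject(item)
return obj
}
if (value is Array) {
arr := []
for item in value
arr.Push(this.ToObject(item))
return arr
}
return value
}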
Play(speed := 1.0) {
this.speed := speed
this.state := "playing"
this.currentIndex := 0
this.results := []
; Execute setup if exists
if this.recording.HasOwnProp("setup") {
this.ExecuteSetup()
}
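; Play all actions. Pause(), Resume(), and Stop() may be called from a
; hotkey while this loop runs, so a paused engine idles instead of exiting.
while (this.currentIndex < this.recording.actions.Length && this.state != "stopped") {
if (this.state = "paused") {
Sleep(50)
continue
}
action := this.recording.actions[++this.currentIndex]
this.ExecuteAction(action)
}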
; Execute teardown if exists
if this.recording.HasOwnProp("teardown") {
this.ExecuteTeardown()
}
this.state := "stopped"
return this.GenerateReport()
}
Pause() {
if (this.state = "playing")
this.state := "paused"
}
Resume() {
if (this.state = "paused")
this.state := "playing"
}
Step() {
if (this.currentIndex < this.recording.actions.Length) {
action := this.recording.actions[++this.currentIndex]
return this.ExecuteAction(action)
}
return false
}
Stop() {
this.state := "stopped"
}
ExecuteAction(action) {
startTime := A_TickCount
result := {
actionId: action.id,
type: action.type,
success: false,
duration: 0,
error: ""
}
try {
; Wait for delay (adjusted by speed)
if (action.delay > 0 && this.speed > 0)
Sleep(Round(action.delay / this.speed)) ; Sleep takes whole milliseconds
; Execute based on type
switch action.type {
case "run":
this.ExecuteRun(action.params)
case "click":
this.ExecuteClick(action.params)
case "type":
this.ExecuteType(action.params)
case "key":
this.ExecuteKey(action.params)
case "wait":
this.ExecuteWait(action.params)
case "assertion":
this.ExecuteAssertion(action.params)
default:
throw Error("Unknown action type: " . action.type)
}
result.success := true
} catch Error as e {
result.success := false
result.error := e.Message
; Stop on first failure if configured
if (this.recording.settings.stopOnFirstFailure)
this.state := "stopped"
}
result.duration := A_TickCount - startTime
this.results.Push(result)
return result
}
ExecuteRun(params) {
Run(params.command . " " . params.args, params.workDir)
}
ExecuteClick(params) {
; Try control-level click first
if (params.control && params.window) {
if WinExist(params.window) {
try {
ControlClick(params.control, params.window, , params.button, params.clicks)
return
}
}
}
; Fallback to coordinate click
Click(params.x, params.y, params.button, params.clicks)
}
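ExecuteType(params) {
; AutoHotkey v2 has no SendRaw; SendText sends the characters literally
if (params.raw)
SendText(params.text)
else
Send(params.text) ; interprets Send syntax such as {Enter} or ^s
}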
ExecuteKey(params) {
keyStr := ""
for mod in params.modifiers {
keyStr .= "{" . mod . " down}"
}
keyStr .= "{" . params.key . "}"
for mod in params.modifiers {
keyStr .= "{" . mod . " up}"
}
Send(keyStr)
}
ExecuteWait(params) {
timeout := params.HasOwnProp("timeout") ? params.timeout : this.recording.settings.defaultTimeout
switch params.waitType {
case "window":
if !WinWait(params.target, , timeout / 1000)
throw Error("Window '" . params.target . "' not found")
case "windowClose":
if !WinWaitClose(params.target, , timeout / 1000)
throw Error("Window '" . params.target . "' did not close")
case "control":
Waiter.ForControl(params.target, params.window, timeout)
case "text":
Waiter.ForText(params.control, params.window, params.expected, timeout)
case "image":
if !ImageFinder.Find(params.imagePath, , timeout).found
throw Error("Image '" . params.imagePath . "' not found")
default:
throw Error("Unknown wait type: " . params.waitType)
}
}
ExecuteAssertion(params) {
switch params.assertType {
case "windowExists":
Assert.WindowExists(params.target, params.message)
case "windowNotExists":
Assert.WindowNotExists(params.target, params.message)
case "controlText":
Assert.ControlText(params.control, params.window, params.expected, params.message)
case "controlEnabled":
Assert.ControlEnabled(params.control, params.window, params.message)
case "fileExists":
Assert.FileExists(params.path, params.message)
case "fileContains":
Assert.FileContains(params.path, params.expected, params.message)
default:
throw Error("Unknown assertion type: " . params.assertType)
}
}
GenerateReport() {
passed := 0
failed := 0
totalDuration := 0
for result in this.results {
if result.success
passed++
else
failed++
totalDuration += result.duration
}
return {
testName: this.recording.metadata.name,
status: (failed = 0) ? "PASSED" : "FAILED",
totalActions: this.results.Length,
passed: passed,
failed: failed,
duration: totalDuration,
details: this.results
}
}
}
Phased Implementation Guide
Phase 1: Basic Action Logging (4-6 hours)
Goal: Capture mouse clicks and keystrokes with timing.
Steps:
1. **Create the basic structure**:

```autohotkey
#Requires AutoHotkey v2.0
#SingleInstance Force

; Work in screen coordinates; the v2 default for mouse functions is
; relative to the active window's client area.
CoordMode("Mouse", "Screen")

; Global state
global isRecording := false
global recordedActions := []
global lastActionTime := 0
global keyHook := ""  ; InputHook object, created in StartRecording()

; Recording control hotkeys
^F9::StartRecording()
^F10::StopRecording()
```

2. **Implement keyboard hook**:

```autohotkey
StartRecording() {
    global isRecording, recordedActions, lastActionTime, keyHook
    recordedActions := []
    isRecording := true
    lastActionTime := A_TickCount
    ; Create input hook for keyboard. The variable is named keyHook because
    ; names are case-insensitive and inputHook would clash with the
    ; built-in InputHook class.
    keyHook := InputHook("L0 V I1")
    keyHook.KeyOpt("{All}", "N")
    keyHook.OnKeyDown := OnKeyDown
    keyHook.Start()
    ; Start mouse monitoring timer
    SetTimer(CheckMouse, 50)
    ToolTip("Recording started... (Ctrl+F10 to stop)")
    SetTimer(() => ToolTip(), -2000)
}
```

3. **Implement key capture**:

```autohotkey
OnKeyDown(hook, vk, sc) {
    global isRecording, recordedActions, lastActionTime
    if (!isRecording)
        return
    ; Get key name
    keyName := GetKeyName(Format("vk{:X}sc{:X}", vk, sc))
    ; Skip bare modifier presses; they are captured in the modifiers array
    if (keyName ~= "i)^[LR]?(Control|Alt|Shift|Win)$")
        return
    delay := A_TickCount - lastActionTime
    lastActionTime := A_TickCount
    ; Get modifier state
    modifiers := []
    if GetKeyState("Ctrl")
        modifiers.Push("ctrl")
    if GetKeyState("Alt")
        modifiers.Push("alt")
    if GetKeyState("Shift")
        modifiers.Push("shift")
    action := {
        type: "key",
        timestamp: A_Now,
        delay: delay,
        key: keyName,
        vk: vk,
        sc: sc,
        modifiers: modifiers
    }
    recordedActions.Push(action)
    ; Debug output
    OutputDebug("Key: " . keyName . " (delay: " . delay . "ms)")
}
```

4. **Implement mouse monitoring**:

```autohotkey
global lastMouseState := 0

CheckMouse() {
    global isRecording, recordedActions, lastActionTime, lastMouseState
    if (!isRecording)
        return
    leftDown := GetKeyState("LButton", "P")
    ; Detect button press (not held)
    if (leftDown && !lastMouseState) {
        MouseGetPos(&x, &y, &hwnd, &control)
        delay := A_TickCount - lastActionTime
        lastActionTime := A_TickCount
        windowTitle := ""
        windowClass := ""
        try {
            windowTitle := WinGetTitle("ahk_id " . hwnd)
            windowClass := WinGetClass("ahk_id " . hwnd)
        }
        action := {
            type: "click",
            timestamp: A_Now,
            delay: delay,
            x: x,
            y: y,
            button: "left",
            clicks: 1,
            window: windowTitle,
            windowClass: windowClass,
            control: control
        }
        recordedActions.Push(action)
        OutputDebug("Click: (" . x . ", " . y . ") on " . windowTitle)
    }
    lastMouseState := leftDown
}
```

5. **Implement stop recording**:

```autohotkey
StopRecording() {
    global isRecording, recordedActions, keyHook
    if (!isRecording)
        return
    isRecording := false
    keyHook.Stop()
    SetTimer(CheckMouse, 0)
    actionCount := recordedActions.Length
    ToolTip("Recording stopped. " . actionCount . " actions captured.")
    SetTimer(() => ToolTip(), -3000)
    ; Display summary
    ShowRecordingSummary()
}

ShowRecordingSummary() {
    global recordedActions
    summary := "Recorded Actions:`n`n"
    for idx, action in recordedActions {
        summary .= idx . ". " . action.type
        if (action.type = "click")
            summary .= " at (" . action.x . ", " . action.y . ")"
        else if (action.type = "key")
            summary .= ": " . action.key
        summary .= " (delay: " . action.delay . "ms)`n"
    }
    MsgBox(summary)
}
```
Verification: Click around and type text. Verify all actions are logged with correct timing.
Phase 2: Playback Engine (4-6 hours)
Goal: Replay recorded actions reliably.
Steps:
1. **Create basic playback function**:

```autohotkey
PlayRecording(actions, speed := 1.0) {
    results := {
        total: actions.Length,
        passed: 0,
        failed: 0,
        errors: []
    }
    for idx, action in actions {
        ; Wait for the recorded delay, scaled by playback speed
        if (action.delay > 0 && speed > 0)
            Sleep(Round(action.delay / speed))
        try {
            ExecuteAction(action)
            results.passed++
        } catch Error as e {
            results.failed++
            results.errors.Push({
                index: idx,
                action: action,
                error: e.Message
            })
        }
    }
    return results
}
```

2. **Implement action execution**:

```autohotkey
ExecuteAction(action) {
    switch action.type {
        case "click":
            ExecuteClick(action)
        case "key":
            ExecuteKey(action)
        case "type":
            ExecuteType(action)
        default:
            throw Error("Unknown action type: " . action.type)
    }
}

ExecuteClick(action) {
    ; Try control click first for reliability
    if (action.control && action.window) {
        if WinExist(action.window) {
            try {
                ControlClick(action.control, action.window)
                return
            }
        }
    }
    ; Fallback to coordinate click
    Click(action.x, action.y, action.button)
}

ExecuteKey(action) {
    ; Reconstruct key with modifiers
    keyStr := ""
    ; Add modifier key downs
    for mod in action.modifiers {
        keyStr .= "{" . mod . " down}"
    }
    ; Add the key
    if (StrLen(action.key) > 1)
        keyStr .= "{" . action.key . "}"
    else
        keyStr .= action.key
    ; Add modifier key ups
    for mod in action.modifiers {
        keyStr .= "{" . mod . " up}"
    }
    Send(keyStr)
}

ExecuteType(action) {
    ; SendText types the text literally; Send would treat characters
    ; such as ! ^ + # { } as hotkey syntax
    SendText(action.text)
}
```

3. **Add playback hotkey**:

```autohotkey
^F11::PlayRecording(recordedActions)
```
Verification: Record opening Notepad and typing. Playback should recreate it exactly.
Phase 3: Control-Level Interaction (4-6 hours)
Goal: Use ControlClick/ControlSend for reliability.
Steps:
1. **Enhance click recording with control info**:

```autohotkey
; Already capturing control in Phase 1
; Now enhance to get more control details

GetControlInfo(hwnd, control) {
    info := {
        classNN: control,
        class: "",
        id: 0,
        text: "",
        pos: {x: 0, y: 0, w: 0, h: 0}
    }
    try {
        ; Get control class (ClassNN without its trailing instance number)
        if (control)
            info.class := RegExReplace(control, "\d+$")
        ; Get control text
        info.text := ControlGetText(control, "ahk_id " . hwnd)
        ; Get control position
        ControlGetPos(&x, &y, &w, &h, control, "ahk_id " . hwnd)
        info.pos := {x: x, y: y, w: w, h: h}
    }
    return info
}
```

2. **Implement smart click execution**:

```autohotkey
ExecuteClickSmart(action) {
    ; Strategy 1: Try control click by ClassNN
    if (action.control && action.window) {
        try {
            if WinExist(action.window) {
                ControlClick(action.control, action.window)
                return true
            }
        }
    }
    ; Strategy 2: Try click by control text (for buttons)
    if (action.HasOwnProp("controlText") && action.controlText && action.window) {
        try {
            if WinExist(action.window) {
                ControlClick(action.controlText, action.window)
                return true
            }
        }
    }
    ; Strategy 3: Click at window-relative coordinates.
    ; Assumes the recorder also stored the click position relative to the
    ; window's client area as action.relX / action.relY (hypothetical fields).
    ; Offsets computed from the playback-time window position would be wrong
    ; as soon as the window has moved.
    if (action.window && action.HasOwnProp("relX") && action.HasOwnProp("relY")) {
        try {
            if WinExist(action.window) {
                WinActivate(action.window)
                ControlClick("x" . action.relX . " y" . action.relY, action.window)
                return true
            }
        }
    }
    ; Strategy 4: Absolute coordinate click (least reliable)
    Click(action.x, action.y)
    return true
}
```
Verification: Move windows between recording and playback. Control clicks should still work.
Phase 4: Synchronization and Assertions (4-6 hours)
Goal: Add waits and verification steps.
Steps:
1. Implement the Waiter class (as shown in Theoretical Foundation).

2. Implement the Assert class (as shown in Real World Outcome).

3. **Add wait actions to recording**:

```autohotkey
; Detect window changes during recording
global lastActiveWindow := ""

CheckWindowChange() {
    global isRecording, recordedActions, lastActiveWindow, lastActionTime
    if (!isRecording)
        return
    ; WinGetTitle throws when no window is active, so guard the call
    currentWindow := ""
    try currentWindow := WinGetTitle("A")
    if (currentWindow != lastActiveWindow && currentWindow != "") {
        delay := A_TickCount - lastActionTime
        lastActionTime := A_TickCount
        action := {
            type: "wait",
            timestamp: A_Now,
            delay: delay,
            waitType: "window",
            target: currentWindow,
            timeout: 10000
        }
        recordedActions.Push(action)
        lastActiveWindow := currentWindow
        OutputDebug("Window changed to: " . currentWindow)
    }
}

; Add to recording start:
; SetTimer(CheckWindowChange, 100)
```

4. **Implement assertion execution**:

```autohotkey
ExecuteAssertion(action) {
    switch action.assertType {
        case "windowExists":
            if !WinExist(action.target)
                throw AssertionError("Window '" . action.target . "' not found")
        case "controlText":
            actual := ControlGetText(action.control, action.window)
            if (actual != action.expected)
                throw AssertionError("Expected '" . action.expected . "', got '" . actual . "'")
        case "fileExists":
            if !FileExist(action.path)
                throw AssertionError("File '" . action.path . "' not found")
        default:
            throw Error("Unknown assertion type: " . action.assertType)
    }
}
```
Verification: Add assertions to tests. Verify they pass when conditions are met, fail otherwise.
Phase 5: Image-Based Finding (4-6 hours)
Goal: Fall back to image recognition when controls aren't accessible.
Steps:
1. Implement ImageFinder class (as shown in Theoretical Foundation).

2. **Add image capture utility**:

```autohotkey
; Press Ctrl+Shift+C to capture a screen region
^+c::CaptureImageForTest()

CaptureImageForTest() {
    MsgBox("Draw a rectangle around the target element.`nPress Enter at each corner.")
    ; Wait for release in case Enter was used to dismiss the dialog
    KeyWait("Enter")
    ; Get first corner
    KeyWait("Enter", "D")
    MouseGetPos(&x1, &y1)
    KeyWait("Enter")
    ; Get second corner
    KeyWait("Enter", "D")
    MouseGetPos(&x2, &y2)
    ; Normalize coordinates
    left := Min(x1, x2)
    top := Min(y1, y2)
    width := Abs(x2 - x1)
    height := Abs(y2 - y1)
    ; Generate filename
    timestamp := FormatTime(A_Now, "yyyyMMdd_HHmmss")
    filename := "capture_" . timestamp . ".png"
    savePath := A_ScriptDir . "\images\" . filename
    ; Ensure directory exists
    if !DirExist(A_ScriptDir . "\images")
        DirCreate(A_ScriptDir . "\images")
    ; Capture (CaptureScreen is a helper you provide; requires a GDI+ library)
    CaptureScreen(left, top, width, height, savePath)
    MsgBox("Image saved to: " . savePath)
}
```

3. **Integrate image finding into playback**:

```autohotkey
ExecuteImageClick(action) {
    result := ImageFinder.Find(action.imagePath, , action.timeout)
    if (!result.found) {
        throw Error("Image '" . action.imagePath . "' not found on screen")
    }
    ; Click center of found image
    centerX := result.x + (action.imageWidth / 2)
    centerY := result.y + (action.imageHeight / 2)
    Click(centerX, centerY)
}
```
Verification: Capture an image of a button. Move the window, verify image search finds it.
Testing Strategy
Unit Tests
| Test ID | Test Name | Steps | Expected Result |
|---|---|---|---|
| UT-01 | Record single click | Click at known position | Action recorded with correct x, y |
| UT-02 | Record key with modifiers | Press Ctrl+S | Action has key="s", modifiers=["ctrl"] |
| UT-03 | Calculate delay | Wait 500ms between actions | delay property approximately 500 |
| UT-04 | Save recording | Save to file | File exists, content valid |
| UT-05 | Load recording | Load saved file | Data matches original |
| UT-06 | Assert pass | Assert.True(true) | No exception thrown |
| UT-07 | Assert fail | Assert.True(false) | AssertionError thrown |
| UT-08 | Image search found | Search for known image | Returns correct coordinates |
| UT-09 | Image search timeout | Search for missing image | Returns found=false after timeout |
Integration Tests
| Test ID | Test Name | Steps | Expected Result |
|---|---|---|---|
| IT-01 | Full recording cycle | Record → Save → Load → Play | Playback recreates original |
| IT-02 | Control click reliability | Record button click, move window, play | Click hits button at new position |
| IT-03 | Wait synchronization | Record with app launch, replay | Wait completes before interaction |
| IT-04 | Assertion integration | Add assertion after typing | Assertion verifies typed text |
| IT-05 | Image click fallback | Use image for non-standard control | Image found and clicked |
Reliability Tests
| Test ID | Test Name | Steps | Expected Result |
|---|---|---|---|
| RT-01 | Consistency | Run same test 20 times | All 20 runs pass |
| RT-02 | Speed variation | Run at 0.5x, 1x, 2x speed | All speeds pass |
| RT-03 | Different resolution | Change resolution, run | Adapts gracefully |
| RT-04 | DPI variation | Test on 100%, 125%, 150% DPI | Image search adapts |
| RT-05 | Background execution | Run with other windows active | ControlClick works |
Test Execution Template
class TestRunner {
tests := []
results := []
AddTest(name, testFn) {
this.tests.Push({name: name, fn: testFn})
}
RunAll() {
this.results := []
for test in this.tests {
result := this.RunSingle(test)
this.results.Push(result)
}
return this.GenerateReport()
}
RunSingle(test) {
result := {
name: test.name,
status: "unknown",
duration: 0,
error: ""
}
startTime := A_TickCount
try {
; Use .Call(): test.fn() would invoke fn as a method and pass test as its first argument
test.fn.Call()
result.status := "PASSED"
} catch Error as e {
result.status := "FAILED"
result.error := e.Message
}
result.duration := A_TickCount - startTime
return result
}
GenerateReport() {
passed := 0
failed := 0
report := "═══════════════════════════════════════════`n"
report .= " TEST EXECUTION REPORT `n"
report .= "═══════════════════════════════════════════`n`n"
for result in this.results {
statusIcon := (result.status = "PASSED") ? "[PASS]" : "[FAIL]"
report .= statusIcon . " " . result.name
report .= " (" . result.duration . "ms)`n"
if (result.error)
report .= " Error: " . result.error . "`n"
if (result.status = "PASSED")
passed++
else
failed++
}
report .= "`n═══════════════════════════════════════════`n"
report .= "Total: " . this.results.Length
report .= " | Passed: " . passed
report .= " | Failed: " . failed . "`n"
report .= "═══════════════════════════════════════════`n"
return report
}
}
Common Pitfalls & Debugging Tips
Problem 1: Playback clicks wrong location
Symptoms: Button clicks miss their targets, clicking empty space.
Causes:
- Window moved since recording
- DPI/scaling changed
- Using absolute coordinates
Solutions:
; Solution 1: Always use ControlClick when possible
ControlClick("Button1", "My App") ; Reliable
; Solution 2: Calculate relative coordinates
WinGetPos(&winX, &winY, , , "My App")
relativeX := originalClickX - originalWinX
relativeY := originalClickY - originalWinY
Click(winX + relativeX, winY + relativeY)
; Solution 3: Re-identify control at playback time
control := FindControlByText("OK", "My App")
ControlClick(control, "My App")
Problem 2: Keys not registering
Symptoms: Typed text doesn't appear, shortcuts don't work.
Causes:
- Wrong window has focus
- Application uses DirectInput
- Security software blocking
Solutions:
; Solution 1: Ensure correct focus
WinActivate("Target App")
WinWaitActive("Target App", , 5)
Send("text to type")
; Solution 2: Use SendInput mode (already the default in AutoHotkey v2)
SendMode("Input")
Send("text")
; Solution 3: Use ControlSend for background
ControlSend("text", "Edit1", "Target App")
; Solution 4: Add key delays for problematic apps
; SetKeyDelay only affects SendEvent (and ControlSend), not SendInput
SetKeyDelay(50, 50)
SendEvent("text")
Problem 3: Recording misses actions
Symptoms: Not all clicks or keys appear in recording.
Causes:
- Hook not installed correctly
- Events too fast to capture
- Filtering too aggressive
Solutions:
; Solution 1: Verify hook is active
OnKeyDown(hook, vk, sc) {
OutputDebug("Hook received: vk=" . vk) ; Debug output
; ... rest of handler
}
; Solution 2: Use timer-based polling as backup
SetTimer(CheckInputs, 10) ; Faster polling
; Solution 3: Check for single-instance issues
#SingleInstance Force ; Ensure only one instance
Problem 4: Tests are flaky (intermittent failures)
Symptoms: Test passes sometimes, fails others. No code changes.
Causes:
- Timing issues (app slower than expected)
- State not reset between tests
- External interference (popups, notifications)
Solutions:
; Solution 1: Add explicit waits for all conditions
WinWait("My App", , 10)
Waiter.ForControl("Button1", "My App", 5000)
Waiter.ForEnabled("Button1", "My App", 5000)
ControlClick("Button1", "My App")
; Solution 2: Reset state before each test
class TestCase {
static Setup() {
; Kill any existing instances
while WinExist("My App")
WinClose("My App")
; Clear temp files
if FileExist("C:\temp\test_*")
FileDelete("C:\temp\test_*")
}
}
; Solution 3: Disable notifications during tests
; Use Focus Assist or DND mode
; Solution 4: Increase timeouts for CI environments
if (A_ComputerName = "CI-SERVER")
Waiter.defaultTimeout := 30000 ; 30 seconds
Problem 5: Image search fails
Symptoms: ImageSearch returns not found, even when image is visible.
Causes:
- Theme/color scheme changed
- DPI scaling different
- Anti-aliasing differences
Solutions:
; These examples pass screen coordinates; the v2 default for ImageSearch
; is relative to the active window's client area
CoordMode("Pixel", "Screen")
; Solution 1: Use variation tolerance
ImageSearch(&x, &y, 0, 0, A_ScreenWidth, A_ScreenHeight, "*50 button.png")
; Solution 2: Capture at multiple DPI levels
images := ["button_100dpi.png", "button_125dpi.png", "button_150dpi.png"]
for img in images {
if ImageSearch(&x, &y, 0, 0, A_ScreenWidth, A_ScreenHeight, img)
break
}
; Solution 3: Narrow search region to the target window
WinGetPos(&wx, &wy, &ww, &wh, "My App")
ImageSearch(&x, &y, wx, wy, wx + ww, wy + wh, "button.png")
; Solution 4: Use transparent background
ImageSearch(&x, &y, 0, 0, A_ScreenWidth, A_ScreenHeight, "*TransWhite button.png")
Debugging Toolkit
class Debug {
static logFile := A_ScriptDir . "\debug.log"
static enabled := true
static Log(message) {
if (!this.enabled)
return
timestamp := FormatTime(A_Now, "yyyy-MM-dd HH:mm:ss.") . A_MSec
line := timestamp . " | " . message . "`n"
FileAppend(line, this.logFile)
OutputDebug(message)
}
static LogAction(action) {
; JSON.stringify is not built into AutoHotkey; this assumes a third-party JSON class
this.Log("ACTION: " . action.type . " | " . JSON.stringify(action))
}
static LogError(context, error) {
this.Log("ERROR in " . context . ": " . error.Message)
this.Log("Stack: " . error.Stack)
}
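static Screenshot(filename := "") {
if (filename = "")
filename := "debug_" . A_Now . ".png"
dir := A_ScriptDir . "\screenshots"
; Create the folder on first use so the capture does not fail
if !DirExist(dir)
DirCreate(dir)
path := dir . "\" . filename
; Capture full screen (CaptureScreen is the GDI+ helper from Phase 5)
CaptureScreen(0, 0, A_ScreenWidth, A_ScreenHeight, path)
this.Log("Screenshot saved: " . path)
}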
static DumpWindowInfo(title := "A") {
hwnd := WinExist(title)
info := "Window Info for '" . title . "':`n"
info .= " HWND: " . hwnd . "`n"
info .= " Title: " . WinGetTitle(title) . "`n"
info .= " Class: " . WinGetClass(title) . "`n"
info .= " PID: " . WinGetPID(title) . "`n"
WinGetPos(&x, &y, &w, &h, title)
info .= " Position: " . x . ", " . y . "`n"
info .= " Size: " . w . " x " . h . "`n"
controls := WinGetControls(title)
info .= " Controls: " . controls.Length . "`n"
for ctrl in controls {
info .= " - " . ctrl . "`n"
}
this.Log(info)
return info
}
}
Extensions & Challenges
Easy Extensions
- Recording indicator
- Show a red dot in the corner when recording
- Display action count in real-time
- Speed control slider
- GUI with speed options: 0.25x, 0.5x, 1x, 2x, 4x
- Real-time speed adjustment during playback
- Step-through mode
- Pause between each action
- Show current action and next action
- "Step" and "Continue" buttons
- Recording viewer/editor
- List all recorded actions
- Delete unwanted actions
- Reorder actions
- Edit action parameters
Medium Extensions
- Auto-wait insertion
- Detect window changes during recording
- Automatically insert WinWait actions
- Detect control state changes
- Control highlighting
- During recording, highlight the control under cursor
- During playback, highlight target before clicking
- Error recovery
- Retry failed actions up to N times
- Alternative action paths on failure
- Screenshot on error for debugging
- Parallel test execution
- Run multiple independent tests simultaneously
- Aggregate results from parallel runs
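For the error recovery extension, retrying a failed action up to N times could be sketched as a small wrapper around any callable. `RetryAction` and `FlakyStep` are hypothetical names, and the fixed delay between attempts is an assumption; a real framework might prefer a growing delay or a per-action retry policy.

```autohotkey
#Requires AutoHotkey v2.0

; Hypothetical helper: call fn until it succeeds or maxAttempts is reached.
; Returns fn's result, or throws an Error carrying the last failure message.
RetryAction(fn, maxAttempts := 3, delayMs := 500) {
    lastError := ""
    Loop maxAttempts {
        try {
            return fn.Call()
        } catch Error as e {
            lastError := e
            ; Pause before the next attempt, but not after the final one
            if (A_Index < maxAttempts)
                Sleep(delayMs)
        }
    }
    detail := lastError ? lastError.Message : "no attempts were made"
    throw Error("Action failed after " . maxAttempts . " attempts: " . detail)
}

; Usage: a step that fails twice before succeeding
attempts := 0
FlakyStep() {
    global attempts
    attempts += 1
    if (attempts < 3)
        throw Error("not ready yet")
    return "ok"
}
result := RetryAction(FlakyStep, 5, 10)
```

In the playback engine, the same wrapper could surround a single action, for example `RetryAction(() => ControlClick("Button1", "My App"))`, with a screenshot taken in the final `catch` of the caller.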
Advanced Extensions
- Visual test builder
- Drag-and-drop interface for creating tests
- Visual representation of test flow
- No coding required for basic tests
- Data-driven testing
- Read test data from CSV/Excel/JSON
- Run same test with different input values
- Parameterized test scripts
- CI/CD integration
- Command-line execution with exit codes
- JUnit XML output format
- Integration with Jenkins/GitHub Actions
- Report generation
- HTML test reports with screenshots
- Historical trend analysis
- Email notifications on failure
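For the CI/CD integration extension, a report in the JUnit XML style plus a process exit code might look like the sketch below. JUnit XML is a de facto format rather than a formal standard, so the attributes used here (`name`, `tests`, `failures`, `time`, `message`) are the commonly accepted subset and some CI tools may expect more (for example `classname`). `XmlEscape`, `ToJUnitXml`, and `WriteJUnitReport` are hypothetical names; the result objects are assumed to have the `name`, `status`, `duration`, and `error` fields produced by `TestRunner.RunSingle`.

```autohotkey
#Requires AutoHotkey v2.0

; Hypothetical helper: escape characters that are special in XML attributes.
XmlEscape(text) {
    text := StrReplace(text, "&", "&amp;")
    text := StrReplace(text, "<", "&lt;")
    text := StrReplace(text, ">", "&gt;")
    return StrReplace(text, '"', "&quot;")
}

; results: Array of {name, status, duration (ms), error}
ToJUnitXml(suiteName, results) {
    failures := 0
    totalMs := 0
    cases := ""
    for result in results {
        totalMs += result.duration
        cases .= '  <testcase name="' . XmlEscape(result.name) . '" time="' . Format("{:.3f}", result.duration / 1000) . '"'
        if (result.status = "PASSED") {
            cases .= "/>`n"
        } else {
            failures += 1
            cases .= '>`n    <failure message="' . XmlEscape(result.error) . '"/>`n  </testcase>`n'
        }
    }
    xml := '<?xml version="1.0" encoding="UTF-8"?>`n'
    xml .= '<testsuite name="' . XmlEscape(suiteName) . '" tests="' . results.Length . '" failures="' . failures . '" time="' . Format("{:.3f}", totalMs / 1000) . '">`n'
    return xml . cases . "</testsuite>`n"
}

; Writes the report and returns an exit code: 0 if every test passed, else 1.
WriteJUnitReport(path, suiteName, results) {
    file := FileOpen(path, "w", "UTF-8-RAW")
    file.Write(ToJUnitXml(suiteName, results))
    file.Close()
    failed := 0
    for result in results
        failed += (result.status != "PASSED")
    return failed ? 1 : 0
}

; Example call at the end of a CI run (file name is illustrative):
; ExitApp(WriteJUnitReport(A_ScriptDir . "\junit-results.xml", "NotepadSuite", results))
```

Returning the exit code instead of calling `ExitApp` inside the helper keeps the function usable from both interactive runs and command-line runs.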
Challenge Project: Self-Testing Framework
Build a test suite that tests the framework itself:
; Test the recorder
Test_RecorderCapturesClicks() {
; Start recording programmatically
GUIAutomation.StartRecording()
; Simulate a click
Click(100, 100)
Sleep(100)
; Stop recording
recording := GUIAutomation.StopRecording()
; Verify
Assert.Equals(recording.Length, 1, "Should record one action")
Assert.Equals(recording[1].type, "click", "Action should be click")
Assert.Equals(recording[1].x, 100, "X should be 100")
}
Books That Will Help
| Topic | Book | Specific Chapter/Section |
|---|---|---|
| Command Pattern for actions | Game Programming Patterns by Robert Nystrom | Chapter "Command" - Shows exactly how to encapsulate actions as objects with execute/undo |
| Timeout handling | Release It! by Michael Nygard | Chapter 5: "Stability Patterns" - Timeouts section explains why and how to use timeouts correctly |
| Test design principles | Test Driven Development by Kent Beck | Part I: "The Money Example" - Shows how to think about testable code |
| Windows internals | Windows Internals, 7th Edition by Russinovich | Part 1, Chapter 2: "System Architecture" - Understanding window handles and messages |
| Accessibility APIs | UI Automation Fundamentals (Microsoft Docs) | The entire UIA documentation - essential for control identification |
| Design patterns | Design Patterns by Gang of Four | Chapter 5: "Behavioral Patterns" - Command, Observer, State patterns |
| Robust software | The Pragmatic Programmer by Hunt & Thomas | Chapter 4: "Pragmatic Paranoia" - Defensive coding practices |
| Automation architecture | Selenium WebDriver by Arun Motoori | Chapters on Page Object Model - principles apply to any GUI automation |
Self-Assessment Checklist
Before considering this project complete, verify you can answer YES to all:
Core Recording
- Can record mouse clicks with correct x, y coordinates
- Can record which window and control received the click
- Can record keyboard input with correct key names
- Can record modifier keys (Ctrl, Alt, Shift) with key presses
- Can calculate timing delays between actions
- Can display recording summary when stopped
Persistence
- Recordings save to file in readable format (JSON/INI)
- Recordings load correctly with all data preserved
- Save/load round-trip produces identical recording
Playback
- Playback reproduces recorded actions
- Timing delays are respected during playback
- Variable speed playback works (0.5x, 1x, 2x)
- Playback can be paused and resumed
Control-Level Interaction
- ControlClick works for standard Windows controls
- ControlSend types text into controls without focus
- Controls identified by ClassNN, text, or other attributes
- Control clicks work even when window position changes
Synchronization
- WinWait is used before window-specific actions
- Control waits implemented (ForControl, ForEnabled)
- Text waits can verify control content
- Custom condition waits supported (Until)
Assertions
- Assert.True/False work correctly
- Assert.Equals verifies values
- Assert.Contains checks for substrings
- GUI assertions verify window/control state
- File assertions verify file existence/content
- Assertion failures provide clear error messages
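A clear failure message states what was checked, what was expected, and what was found. The sketch below shows one possible shape for the `Assert` class. The `AssertionError` type and the message wording are assumptions. AutoHotkey v2 permits reserved words such as `True` and `Contains` as method names, which is what lets the checklist's naming work.

```autohotkey
#Requires AutoHotkey v2.0

class AssertionError extends Error {
}

class Assert {
    static True(condition, message := "Expected condition to be true") {
        if !condition
            throw AssertionError(message, -1)
    }

    static Equals(expected, actual, message := "") {
        if expected !== actual {          ; numeric if both are numbers, else case-sensitive
            prefix := message ? message ": " : ""
            throw AssertionError(prefix "expected '" String(expected) "' but got '" String(actual) "'", -1)
        }
    }

    static Contains(haystack, needle, message := "") {
        if !InStr(haystack, needle) {
            prefix := message ? message ": " : ""
            throw AssertionError(prefix "'" needle "' not found in '" haystack "'", -1)
        }
    }

    ; GUI assertion: compare a control's current text with the expected value.
    static ControlText(expected, control, winTitle) {
        Assert.Equals(expected, ControlGetText(control, winTitle), "Text of " control " in " winTitle)
    }
}
```

A test runner can catch `AssertionError` separately from other errors, so that a failed check is reported as a test failure and a missing window is reported as an execution error.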
Image Recognition
- ImageSearch finds controls by appearance
- Variation tolerance handles minor differences
- Retry logic handles timing issues
- Region constraints speed up searches
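The four image-recognition items combine naturally into one helper. This sketch wraps the built-in `ImageSearch` with a variation option, a bounded search region, and a retry loop. `FindImage`, its parameters, and the `region` Map layout are hypothetical, and the default variation of 20 is only a starting point to tune per image.

```autohotkey
#Requires AutoHotkey v2.0

; Search a screen region for an image, retrying until the timeout expires.
; region: Map with x1, y1, x2, y2 in screen coordinates. variation: 0-255.
; Returns an object with x and y on success, or false on timeout.
FindImage(imageFile, region, variation := 20, timeoutMs := 3000, intervalMs := 200) {
    if !FileExist(imageFile)
        throw Error("Image file not found: " imageFile)
    CoordMode("Pixel", "Screen")
    deadline := A_TickCount + timeoutMs
    Loop {
        ; "*n" before the file name sets the allowed shade variation per color channel.
        if ImageSearch(&foundX, &foundY, region["x1"], region["y1"], region["x2"], region["y2"], "*" variation " " imageFile)
            return {x: foundX, y: foundY}
        if A_TickCount >= deadline
            return false
        Sleep(intervalMs)
    }
}
```

A caller might search only the top 200 pixels of the screen for a toolbar icon, then click a few pixels inside the match after setting `CoordMode("Mouse", "Screen")`. Narrowing the region is usually the cheapest way to speed up a search and to avoid false matches elsewhere on screen.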
Reliability
- Same test passes 10+ consecutive runs
- Tests work on different screen resolutions
- Tests work with different DPI settings
- Error handling prevents crashes
Reporting
- Pass/fail results clearly displayed
- Error details included for failures
- Test duration tracked
- Summary report generated
Interview Questions You Should Be Able to Answer
- "What's the difference between Send and ControlSend?"
- Send simulates keystrokes at the OS level, so they go to whichever window has focus
- ControlSend posts keystrokes to a specific control via window messages, so it works without focus
- ControlSend is more reliable for automation, but some apps only accept real input
- "How do you make GUI tests reliable?"
- Use explicit waits for conditions instead of fixed Sleep calls
- Use control-level interaction over coordinates when possible
- Reset application state before each test
- Handle errors gracefully with retry logic
- Use multiple control identification strategies
- "What makes GUI tests flaky and how do you fix it?"
- Timing issues: Add explicit waits, increase timeouts
- State not reset: Add setup/teardown routines
- External interference: Disable notifications, run in clean environment
- Coordinate-based clicking: Use ControlClick or relative positioning
- "When would you use image recognition over control-based automation?"
- Custom-drawn controls not exposed to accessibility APIs
- Games and DirectX applications
- Remote desktop sessions, where the client sees only pixels and not the remote controls
- When traditional methods fail consistently
- "Explain the Command Pattern and why it's useful for test automation."
- Encapsulates each action as an object with an execute() method and, where reversal makes sense, an undo() method
- Enables: recording (store commands), playback (execute stored commands), undo, serialization
- Makes it easy to add new action types without changing playback logic
- "How do you handle applications that don't expose their controls?"
- Try UI Automation (UIA) for modern accessibility
- Fall back to image recognition
- Use coordinate-based clicking relative to window position
- Consider OCR for text verification
- "What is UI Automation and how does it differ from traditional control access?"
- UIA is Microsoft's modern accessibility framework
- Provides standardized control types and patterns
- Works across application technologies (Win32, WPF, UWP)
- More reliable than HWND-based access for modern applications, since WPF and UWP controls usually have no window handle of their own
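For the Command Pattern question above, a short sketch helps make the answer concrete. Each class below wraps one action behind the same `Execute()` method, so the playback loop never needs to know which kind of action it is running. The class names and the pipe-separated `ToString()` format are hypothetical, and `undo()` is left out to keep the example short.

```autohotkey
#Requires AutoHotkey v2.0

class ClickCommand {
    __New(control, winTitle) {
        this.Control := control
        this.WinTitle := winTitle
    }
    Execute() => ControlClick(this.Control, this.WinTitle)
    ToString() => "click|" this.Control "|" this.WinTitle
}

class TypeCommand {
    __New(text, control, winTitle) {
        this.Text := text
        this.Control := control
        this.WinTitle := winTitle
    }
    ; ControlSendText sends the text literally, without interpreting ^ + ! # { }.
    Execute() => ControlSendText(this.Text, this.Control, this.WinTitle)
    ToString() => "type|" this.Text "|" this.Control "|" this.WinTitle
}

; A command with no GUI dependency, handy for testing the player itself.
class LogCommand {
    __New(log, message) {
        this.Log := log
        this.Message := message
    }
    Execute() => this.Log.Push(this.Message)
    ToString() => "log|" this.Message
}

; Playback: the loop depends only on the shared Execute() method.
RunAll(commands) {
    for command in commands
        command.Execute()
}
```

Adding a new action type means adding one class. `RunAll` and any serialization code that relies on `ToString()` stay unchanged, which is the extensibility benefit the answer describes.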