Learn Augmented Reality: From Zero to AR Developer
Goal: Deeply understand the core technologies behind Augmented Reality—from computer vision fundamentals to 3D rendering and interaction—by building a series of increasingly sophisticated AR applications.
Why Learn Augmented Reality?
Augmented Reality is poised to become the next major computing platform. It overlays digital information and graphics onto the real world, creating experiences that are more intuitive, immersive, and contextual than anything possible on a flat screen. Understanding AR is not just about learning a new API; it’s about learning how to build for the 3D world.
After completing these projects, you will:
- Understand how a device tracks its position in the real world (SLAM).
- Know how to detect surfaces and understand scene geometry.
- Be able to anchor and render 3D objects that appear to exist in reality.
- Have practical experience with the ARKit framework, from basic placement to advanced image and face tracking.
- Think spatially and design user interactions for a 3D environment.
Core Concept Analysis
AR is built on three pillars: Tracking, Scene Understanding, and Rendering.
```
┌───────────────────────────────────────────────────────────────────┐
│                   Real World (Camera & Sensors)                   │
└───────────────────────────────────────────────────────────────────┘
                                   │
          ┌────────────────────────┼──────────────────────┐
          │                        │                      │
          ▼                        ▼                      ▼
┌────────────────────┐ ┌───────────────────────┐ ┌──────────────────┐
│      Tracking      │ │  Scene Understanding  │ │    Rendering     │
│   (Where am I?)    │ │  (What's around me?)  │ │ (How do I draw?) │
│                    │ │                       │ │                  │
│ • World Tracking   │ │ • Plane Detection     │ │ • 3D Graphics    │
│   (SLAM)           │ │ • Image Recognition   │ │ • Virtual Camera │
│ • Image Tracking   │ │ • 3D Meshing (LiDAR)  │ │ • Lighting       │
│ • Face/Body        │ │ • Lighting Estimation │ │ • Occlusion      │
│   Tracking         │ │                       │ │                  │
└────────────────────┘ └───────────────────────┘ └──────────────────┘
          │                        │                      │
          └─────┬──────────────────┴────────────────┬─────┘
                │                                   │
                ▼                                   ▼
┌────────────────────────────────┐ ┌──────────────────────────────────┐
│           AR Session           │ │         Virtual Content          │
│ (The brain: tracks pose,       │ │ (The imagination: 3D models,     │
│  understands the environment)  │ │  videos, UI)                     │
└────────────────────────────────┘ └──────────────────────────────────┘
                                   │
                                   ▼
┌─────────────────────────────────────────────────────────────────────┐
│                         Fused AR Experience                         │
│             (Virtual objects locked to the real world)              │
└─────────────────────────────────────────────────────────────────────┘
```
Modern frameworks like ARKit (Apple) and ARCore (Google) handle most of the heavy lifting for Tracking and Scene Understanding, allowing you to focus on the application logic and rendering. We will primarily use Apple’s ARKit for these projects as it offers a well-integrated and beginner-friendly ecosystem with Swift and Xcode.
Project List
The projects are ordered to build your skills progressively. We start with computer vision fundamentals, then move into native AR development with ARKit.
Project 1: Fiducial Marker Detector
- File: LEARN_AUGMENTED_REALITY.md
- Main Programming Language: Python
- Alternative Programming Languages: C++
- Coolness Level: Level 2: Practical but Forgettable
- Business Potential: 1. The “Resume Gold”
- Difficulty: Level 2: Intermediate
- Knowledge Area: Computer Vision
- Software or Tool: OpenCV, ArUco Markers
- Main Book: “Learning OpenCV 4” by Adrian Kaehler and Gary Bradski
What you’ll build: A desktop application that uses your webcam to find special “fiducial markers” (like QR codes, but simpler) in the real world and draw a green box around them in the video feed.
Why it teaches AR: This is the foundation of tracking. Before you can place an object, you must first find a reference point. This project teaches you the absolute basics of how a computer “sees” and recognizes a known pattern in a noisy video stream.
Core challenges you’ll face:
- Accessing a camera feed → maps to `cv2.VideoCapture`
- Converting images to grayscale → maps to basic image processing
- Detecting ArUco markers → maps to using a pre-built library for robust pattern recognition
- Drawing on an image → maps to overlaying graphics on the video feed
Key Concepts:
- Computer Vision Pipeline: “Learning OpenCV 4”, Chapter 1
- Fiducial Markers (ArUco): OpenCV ArUco Documentation
Difficulty: Intermediate. Time estimate: Weekend. Prerequisites: Basic Python, familiarity with installing packages.
Real world outcome: An application window showing your live webcam feed. When you hold up a printed ArUco marker, a green square appears perfectly aligned over it, tracking it as you move it around.
Implementation Hints:
- Install OpenCV and its contrib package (`pip install opencv-python opencv-contrib-python`).
- Generate some ArUco markers to print out using the OpenCV library.
- Create a `cv2.VideoCapture` object to get frames from the camera.
- Inside a loop, read a frame.
- Load the ArUco dictionary (`cv2.aruco.getPredefinedDictionary`).
- Call `cv2.aruco.detectMarkers` on the frame.
- If markers are found, the function returns their corner coordinates.
- Use `cv2.polylines` to draw the outline on the original frame before displaying it (see the sketch below).
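A minimal sketch of this loop, assuming OpenCV 4.7 or newer (which added the `ArucoDetector` class; older releases expose `cv2.aruco.detectMarkers` as a free function) and using the built-in `drawDetectedMarkers` helper in place of manual `cv2.polylines` calls:

```python
# Minimal ArUco detection loop. Assumes OpenCV >= 4.7 with the contrib package
# installed; adjust the dictionary to match the markers you printed.
import cv2

dictionary = cv2.aruco.getPredefinedDictionary(cv2.aruco.DICT_4X4_50)
detector = cv2.aruco.ArucoDetector(dictionary, cv2.aruco.DetectorParameters())

cap = cv2.VideoCapture(0)  # default webcam
while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    corners, ids, _rejected = detector.detectMarkers(gray)
    if ids is not None:
        # Draw green outlines and IDs over every detected marker.
        cv2.aruco.drawDetectedMarkers(frame, corners, ids)
        print("Detected marker IDs:", ids.flatten())
    cv2.imshow("ArUco Detector", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break

cap.release()
cv2.destroyAllWindows()
```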
Learning milestones:
- Your webcam feed displays in a window.
- The program prints the ID of any detected marker to the console.
- A bounding box is drawn around the detected marker.
- The bounding box tracks the marker smoothly in real-time.
Project 2: “Hello, ARKit” - World Tracking
- File: LEARN_AUGMENTED_REALITY.md
- Main Programming Language: Swift
- Alternative Programming Languages: N/A for ARKit
- Coolness Level: Level 3: Genuinely Clever
- Business Potential: 2. The “Micro-SaaS / Pro Tool”
- Difficulty: Level 1: Beginner
- Knowledge Area: AR Frameworks
- Software or Tool: Xcode, ARKit, SceneKit
- Main Book: “ARKit by Tutorials” by the raywenderlich.com team
What you’ll build: Your first true AR app for an iPhone or iPad. The app will open, use the camera to scan the environment, find the floor, and allow you to tap the screen to place a 3D cube on the floor.
Why it teaches AR: This project jumps you straight into the modern AR development loop. It teaches you how to initialize an AR session, how ARKit understands the world through plane detection, and how to translate a 2D screen tap into a 3D world coordinate.
Core challenges you’ll face:
- Setting up an AR project in Xcode → maps to using the Augmented Reality App template
- Starting an AR Session → maps to configuring and running `ARSession`
- Visualizing feature points and planes → maps to using debug options to “see” what ARKit sees
- Handling user taps → maps to using `UITapGestureRecognizer` and performing a raycast
- Adding 3D content → maps to creating an `SCNNode` with an `SCNBox` geometry
Key Concepts:
- ARKit Session (`ARSession`): The central object that manages the AR experience.
- World Tracking (`ARWorldTrackingConfiguration`): The configuration that enables SLAM.
- Anchors (`ARAnchor`): Objects that represent a fixed position and orientation in the real world (e.g., a detected plane).
- Raycasting: Projecting a line from the camera into the 3D world to find real-world surfaces.
Difficulty: Beginner. Time estimate: Weekend. Prerequisites: An Apple Developer account, a recent iPhone/iPad, basic Swift knowledge.
Real world outcome: You can walk around your room, see a yellow grid appear on the floor as ARKit detects it, and tap to place a colorful cube that stays locked to that position as you walk around it.
Implementation Hints:
- In Xcode, create a new project using the “Augmented Reality App” template. Choose SceneKit as the content technology.
- The template provides a lot of boilerplate. The key file is `ViewController.swift`.
- In `viewDidLoad`, add the debug option `ARSCNDebugOptions.showFeaturePoints` to see the points ARKit uses for tracking.
- Add a `UITapGestureRecognizer` to the `ARSCNView` (the view that renders the scene).
- In the gesture’s handler function, get the tap location.
- Use the view’s `raycastQuery(from:allowing:alignment:)` method to find a real-world plane at that location.
- If a plane is found, create an `ARAnchor` at the raycast result’s world transform.
- Implement the `renderer(_:didAdd:for:)` delegate method. It is called when ARKit adds an anchor to the scene. Inside, check whether the anchor is the one you created, and if so, add your 3D cube as a child of the anchor’s node (see the sketch below).
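A minimal sketch of the whole flow, assuming the template’s `sceneView` outlet; the class name and the `"cube"` anchor name are illustrative, not part of the template:

```swift
import ARKit
import SceneKit
import UIKit

// Illustrative view controller; the template's own ViewController will differ slightly.
class PlacementViewController: UIViewController, ARSCNViewDelegate {
    @IBOutlet var sceneView: ARSCNView!

    override func viewDidLoad() {
        super.viewDidLoad()
        sceneView.delegate = self
        sceneView.debugOptions = [ARSCNDebugOptions.showFeaturePoints]
        sceneView.addGestureRecognizer(
            UITapGestureRecognizer(target: self, action: #selector(handleTap(_:))))
    }

    override func viewWillAppear(_ animated: Bool) {
        super.viewWillAppear(animated)
        let configuration = ARWorldTrackingConfiguration()
        configuration.planeDetection = [.horizontal]   // look for the floor
        sceneView.session.run(configuration)
    }

    @objc func handleTap(_ gesture: UITapGestureRecognizer) {
        let location = gesture.location(in: sceneView)
        guard let query = sceneView.raycastQuery(from: location,
                                                 allowing: .existingPlaneGeometry,
                                                 alignment: .horizontal),
              let result = sceneView.session.raycast(query).first else { return }
        // Anchor the cube where the ray hit the detected floor plane.
        sceneView.session.add(anchor: ARAnchor(name: "cube", transform: result.worldTransform))
    }

    // Called when ARKit adds a node for any anchor, including the one above.
    func renderer(_ renderer: SCNSceneRenderer, didAdd node: SCNNode, for anchor: ARAnchor) {
        guard anchor.name == "cube" else { return }
        let box = SCNBox(width: 0.1, height: 0.1, length: 0.1, chamferRadius: 0.005)
        box.firstMaterial?.diffuse.contents = UIColor.systemOrange
        node.addChildNode(SCNNode(geometry: box))
    }
}
```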
Learning milestones:
- The app runs and shows the camera feed.
- You can see yellow feature points dancing on surfaces.
- Tapping the screen prints the 3D coordinates of the tap location to the console.
- A 3D cube appears on the floor where you tapped and stays in place.
Project 3: AR Measuring Tape
- File: LEARN_AUGMENTED_REALITY.md
- Main Programming Language: Swift
- Alternative Programming Languages: N/A
- Coolness Level: Level 3: Genuinely Clever
- Business Potential: 2. The “Micro-SaaS / Pro Tool”
- Difficulty: Level 2: Intermediate
- Knowledge Area: 3D Math / User Interaction
- Software or Tool: Xcode, ARKit
- Main Book: “3D Math Primer for Graphics and Game Development” by Fletcher Dunn
What you’ll build: An AR utility that allows the user to tap two points in their environment to measure the real-world distance between them.
Why it teaches AR: This project goes beyond just placing objects. It forces you to manage state (the start and end points), perform 3D vector math, and dynamically draw content (the measurement line and text) in the 3D scene.
Core challenges you’ll face:
- Managing application state → maps to tracking whether the user is placing the first or second point
- Calculating distance → maps to finding the magnitude of the vector between two 3D points
- Drawing a line in 3D → maps to creating a custom `SCNNode` with a line geometry
- Displaying text in 3D → maps to using `SCNText` to create 3D text that always faces the camera (billboarding)
Key Concepts:
- Vector Math: Subtracting two `simd_float3` vectors and calculating the `length()`.
- Dynamic Geometry: Programmatically creating 3D shapes instead of loading them from a file.
- Billboarding: A technique to make a 2D plane (with text on it) always face the camera.
Difficulty: Intermediate. Time estimate: 1 week. Prerequisites: Project 2.
Real world outcome: You can tap on one corner of a real-world table, then tap on another corner, and see a virtual line drawn between them with a label floating above it that reads “2.5 ft” (or the metric equivalent).
Implementation Hints:
- Build on Project 2. You’ll need an array or a pair of optional variables to store the `simd_float3` positions of the two user taps.
- When the user taps, if the first point is not set, store the result.
- If the first point is set, store the second point’s position.
- Once you have two points, `let distance = length(point2 - point1)`.
- To draw the line, create a custom `SCNNode` subclass. You can use a simple thin `SCNBox` or `SCNCylinder` and scale/orient it to stretch between the two points.
- Create an `SCNText` node to display the formatted distance string. To make it billboard, you can add an `SCNBillboardConstraint` to its `constraints` array.
- Add a “reset” button to clear the points and start a new measurement (a sketch of the measurement flow follows this list).
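A minimal sketch of the measurement flow, assuming the tap handling from Project 2 already hands you a `worldPosition` for each tap; `firstPoint`, the node setup, and the thin-box “line” are illustrative choices:

```swift
import ARKit
import Foundation
import SceneKit
import simd

// Stores the first tapped point until the second tap completes a measurement.
var firstPoint: simd_float3?

func handleMeasurementTap(at worldPosition: simd_float3, in sceneView: ARSCNView) {
    guard let start = firstPoint else {
        firstPoint = worldPosition            // first tap: remember the start point
        return
    }
    firstPoint = nil                          // second tap: complete the measurement

    let distance = length(worldPosition - start)   // metres
    let midpoint = (start + worldPosition) / 2

    // A thin box stretched between the two points stands in for a "line".
    let line = SCNBox(width: 0.002, height: 0.002,
                      length: CGFloat(distance), chamferRadius: 0)
    let lineNode = SCNNode(geometry: line)
    lineNode.simdPosition = midpoint
    lineNode.simdLook(at: worldPosition)      // point the box's -Z axis at the end point

    // Billboarded 3D label floating slightly above the midpoint.
    let text = SCNText(string: String(format: "%.2f m", distance), extrusionDepth: 0.5)
    let textNode = SCNNode(geometry: text)
    textNode.scale = SCNVector3(0.002, 0.002, 0.002)
    textNode.simdPosition = midpoint + simd_float3(0, 0.03, 0)
    textNode.constraints = [SCNBillboardConstraint()]

    sceneView.scene.rootNode.addChildNode(lineNode)
    sceneView.scene.rootNode.addChildNode(textNode)
}
```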
Learning milestones:
- The app can store and visualize two separate tapped points.
- The correct distance is calculated and printed to the console.
- A virtual line is drawn between the two points in the AR scene.
- 3D text showing the distance is displayed and is always readable.
Project 4: AR Furniture Placer
- File: LEARN_AUGMENTED_REALITY.md
- Main Programming Language: Swift
- Alternative Programming Languages: N/A
- Coolness Level: Level 3: Genuinely Clever
- Business Potential: 3. The “Service & Support” Model
- Difficulty: Level 2: Intermediate
- Knowledge Area: 3D Assets / UI
- Software or Tool: Xcode, ARKit, Reality Composer
- Main Book: N/A
What you’ll build: An app that lets users browse a catalog of 3D furniture models and place them in their room. Users can then move, rotate, and scale the virtual furniture.
Why it teaches AR: This is a classic, practical use case for AR. It teaches you how to load and manage pre-made 3D assets, handle more complex user gestures for manipulation, and build a simple UI to control the AR experience.
Core challenges you’ll face:
- Loading 3D models → maps to working with `.usdz` or `.scn` files
- Implementing manipulation gestures → maps to using `UIPanGestureRecognizer` (for moving) and `UIRotationGestureRecognizer`
- Translating 2D gestures to 3D movement → maps to projecting a 2D screen drag onto a 3D plane
- Building a simple item picker UI → maps to using `UICollectionView` or SwiftUI’s `ScrollView` to present a catalog
Key Concepts:
- USDZ file format: Apple’s standard format for sharing and loading AR assets.
- Entity Component System (ECS): A common pattern in game/3D development, used by RealityKit (an alternative to SceneKit).
- Gesture Recognizers: The standard UIKit way of handling complex user input.
Difficulty: Intermediate. Time estimate: 2-3 weeks. Prerequisites: Project 2.
Real world outcome: You can “preview” a 3D model of a chair in your living room. You can slide it across the floor with your finger, use two fingers to rotate it, and see how it looks from different angles before you buy it.
Implementation Hints:
- Find or create some `.usdz` models. Sketchfab is a great resource, and Reality Composer (part of Xcode tools) can convert other formats.
- Add a simple UI (e.g., a horizontal `ScrollView` at the bottom of the screen) with thumbnail images for each piece of furniture.
- When the user selects an item, load that model and hold it in a variable. When they tap the screen, place that model.
- For movement, add a `UIPanGestureRecognizer`. When a drag gesture occurs on an object, perform a new raycast to the floor plane and update the object’s position to the new world coordinate.
- For rotation, use a `UIRotationGestureRecognizer`. Use the gesture’s `rotation` property to update the `eulerAngles` of the 3D node (see the sketch below).
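A minimal sketch of the two gesture handlers, assuming `sceneView` is your `ARSCNView` and `selectedNode` is the furniture node being manipulated (both names are illustrative):

```swift
import ARKit
import SceneKit
import UIKit

// Pan: re-raycast against the floor on every update and slide the node there.
func handlePan(_ gesture: UIPanGestureRecognizer, sceneView: ARSCNView, selectedNode: SCNNode) {
    let location = gesture.location(in: sceneView)
    guard let query = sceneView.raycastQuery(from: location,
                                             allowing: .existingPlaneGeometry,
                                             alignment: .horizontal),
          let result = sceneView.session.raycast(query).first else { return }
    let translation = result.worldTransform.columns.3
    selectedNode.simdWorldPosition = simd_float3(translation.x, translation.y, translation.z)
}

// Rotation: apply only the delta since the last callback so the object doesn't jump.
var lastRotation: Float = 0

func handleRotation(_ gesture: UIRotationGestureRecognizer, selectedNode: SCNNode) {
    if gesture.state == .began { lastRotation = 0 }
    let rotation = Float(gesture.rotation)
    selectedNode.eulerAngles.y -= rotation - lastRotation
    lastRotation = rotation
}
```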
Learning milestones:
- A 3D model downloaded from the internet is successfully loaded and placed in the scene.
- The user can select different models from a simple UI.
- Dragging a finger on the screen moves the virtual object along the floor.
- A two-finger twist gesture rotates the object.
Project 5: Image Tracking AR Postcard
- File: LEARN_AUGMENTED_REALITY.md
- Main Programming Language: Swift
- Alternative Programming Languages: N/A
- Coolness Level: Level 4: Hardcore Tech Flex
- Business Potential: 3. The “Service & Support” Model
- Difficulty: Level 2: Intermediate
- Knowledge Area: Image Recognition
- Software or Tool: Xcode, ARKit
- Main Book: N/A
What you’ll build: An AR app that recognizes a specific, real-world image (like a postcard, a book cover, or a movie poster). When the app’s camera is pointed at the image, a 3D model or a video pops out of it.
Why it teaches AR: This teaches you a different kind of tracking. Instead of tracking the whole world, you’re tracking the position of a known 2D image. This is powerful for creating interactive print media, museum exhibits, or packaging.
Core challenges you’ll face:
- Setting up an image tracking configuration → maps to using `ARImageTrackingConfiguration`
- Creating a reference image catalog → maps to adding your target images to the project’s asset catalog
- Responding to image detection → maps to using the `renderer(_:didAdd:for:)` delegate method for `ARImageAnchor`
- Attaching content to the image → maps to parenting your 3D model to the detected image anchor’s node
Key Concepts:
- `ARImageTrackingConfiguration`: A session configuration that focuses all the device’s power on finding known 2D images.
- `ARImageAnchor`: A special type of anchor that ARKit creates when it finds one of your reference images. It provides the position, orientation, and physical size of the detected image.
- Reference Images: The images you provide to ARKit to look for. Good reference images have high contrast and lots of detail.
Difficulty: Intermediate. Time estimate: 1 week. Prerequisites: Project 2.
Real world outcome: You point your phone at a movie poster. A 3D model of the main character appears to stand on top of the poster, or the movie’s trailer starts playing in a virtual video player floating in front of it.
Implementation Hints:
- In your Xcode project, open the `Assets.xcassets` folder.
- Create a new “AR Resource Group”. Drag your target images (e.g., `poster.jpg`) into this group. You must specify the real-world width of the image in meters.
- Instead of using `ARWorldTrackingConfiguration`, create and run your session with an `ARImageTrackingConfiguration`.
- Set the `trackingImages` property of the configuration to the reference images you just created.
- Implement the `renderer(_:didAdd:for:)` delegate method.
- Inside, check if the anchor is an `ARImageAnchor`.
- If it is, you can read its reference image’s name to see which image was detected. Create your 3D content and add it as a child of the provided node. The content will now be “stuck” to the real-world image (see the sketch below).
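A minimal sketch of the configuration and the delegate callback, assuming the resource group is named "AR Resources" and `sceneView` is your `ARSCNView`; the placed content here is just a flat box matching the poster’s size:

```swift
import ARKit
import SceneKit
import UIKit

// Run the session against the reference images in the asset catalog.
func runImageTracking(on sceneView: ARSCNView) {
    guard let referenceImages = ARReferenceImage.referenceImages(
            inGroupNamed: "AR Resources", bundle: nil) else {
        fatalError("Missing the AR Resource Group in Assets.xcassets")
    }
    let configuration = ARImageTrackingConfiguration()
    configuration.trackingImages = referenceImages
    configuration.maximumNumberOfTrackedImages = 1
    sceneView.session.run(configuration)
}

// ARSCNViewDelegate: attach content when one of the reference images is recognised.
func renderer(_ renderer: SCNSceneRenderer, didAdd node: SCNNode, for anchor: ARAnchor) {
    guard let imageAnchor = anchor as? ARImageAnchor else { return }
    print("Detected image:", imageAnchor.referenceImage.name ?? "unnamed")

    // A flat, semi-transparent box matching the image's physical size, lying on it.
    let size = imageAnchor.referenceImage.physicalSize
    let box = SCNBox(width: size.width, height: 0.01, length: size.height, chamferRadius: 0)
    box.firstMaterial?.diffuse.contents = UIColor.systemBlue.withAlphaComponent(0.7)
    node.addChildNode(SCNNode(geometry: box))
}
```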
Learning milestones:
- The app successfully loads the reference images.
- A log message appears when the target image is brought into view.
- A simple 3D cube appears on the detected image.
- The 3D content tracks the image perfectly as you move the image or the phone.
Project 6: AR Portal to Another World
- File: LEARN_AUGMENTED_REALITY.md
- Main Programming Language: Swift
- Alternative Programming Languages: N/A
- Coolness Level: Level 5: Pure Magic (Super Cool)
- Business Potential: 1. The “Resume Gold”
- Difficulty: Level 4: Expert
- Knowledge Area: 3D Rendering / Shaders
- Software or Tool: Xcode, SceneKit, a 360-degree image
- Main Book: “OpenGL ES 3.0 Programming Guide” (for rendering concepts)
What you’ll build: An app that places a virtual doorway on a wall in your room. When you walk through the doorway, the world inside is a completely different 3D environment (e.g., a beach or outer space).
Why it teaches AR: This is an advanced rendering project that teaches you how to manipulate the graphics pipeline to create magical effects. You’ll learn about render order, depth testing, and stencils to create the illusion of a window into another world.
Core challenges you’ll face:
- Creating the portal illusion → maps to selectively rendering the virtual world only “inside” the portal
- Masking and depth testing → maps to making sure objects inside the portal don’t render outside of it
- Handling occlusions → maps to making the user feel like they are “walking through” the door
- Setting up a 360 environment → maps to creating a large sphere with a texture mapped on the inside
Key Concepts:
- Render Order: Controlling the order in which objects are drawn.
- Stencil Buffer: A graphics buffer used to mask pixels.
- Depth Buffer (Z-buffer): Determines which pixels are in front of others.
- Shader Modifiers: Modifying the rendering code (shaders) that SceneKit uses.
Difficulty: Expert. Time estimate: 3-4 weeks. Prerequisites: Project 2, a good grasp of 3D graphics concepts.
Real world outcome: You place a door on your wall. Looking through it, you see a beach. As you physically walk closer, the beach scene gets bigger. When you walk through the doorframe, your entire screen is filled with the beach scene. When you turn around, you see the back of the portal and your real room through it.
Implementation Hints: This is a classic AR effect. The general approach is:
- Create the “inner world,” for example, a giant sphere with a 360-degree photo mapped to its interior surface. Place this sphere in your scene.
- Create the doorframe geometry.
- The trick is to control what gets rendered and where.
- Pass 1: Render the portal frame, but write only to the stencil buffer, not the color or depth buffers. This “marks” the pixels where the portal is.
- Pass 2: Render the inner world (the sphere), but configure the rendering pipeline to only draw where the stencil buffer was marked in Pass 1.
- Pass 3: Render the portal frame again, this time normally to the color and depth buffers.
- You also need to disable depth testing when rendering the inner world so it doesn’t get hidden by real-world walls that are closer to the camera.
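SceneKit does not expose the stencil buffer directly through `SCNMaterial`, so a commonly used substitute builds the same illusion from `renderingOrder` plus an invisible occluder material that writes depth but no color. A minimal sketch of those building blocks follows; the sizes, the single occluder panel, and the function name are illustrative, and a real portal surrounds the inner world with occluders on every side except the doorway opening:

```swift
import SceneKit
import UIKit

func makePortal(innerWorldImage: UIImage) -> SCNNode {
    let portal = SCNNode()

    // Inner world: a sphere textured on its inside faces with a 360° photo.
    let sphere = SCNSphere(radius: 8)
    sphere.firstMaterial?.diffuse.contents = innerWorldImage
    sphere.firstMaterial?.isDoubleSided = true
    sphere.firstMaterial?.cullMode = .front            // show the interior surface
    let innerWorld = SCNNode(geometry: sphere)
    innerWorld.renderingOrder = 200                    // drawn after the occluder
    portal.addChildNode(innerWorld)

    // Occluder material: invisible, but it still writes depth, so anything drawn
    // after it at a greater depth (the sphere) is hidden behind it.
    let maskMaterial = SCNMaterial()
    maskMaterial.colorBufferWriteMask = []             // write no color at all
    maskMaterial.writesToDepthBuffer = true

    // One occluder panel for brevity; wrap panels around the inner world and
    // leave a door-sized hole to complete the effect.
    let wall = SCNNode(geometry: SCNPlane(width: 10, height: 10))
    wall.geometry?.materials = [maskMaterial]
    wall.renderingOrder = 100                          // drawn before the sphere
    portal.addChildNode(wall)

    return portal
}
```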
Learning milestones:
- A 360-degree environment can be placed and viewed in AR.
- A doorframe object can be placed on a real-world wall.
- The 360-degree view is only visible through the doorframe.
- Walking through the door correctly transitions the view.
Project 7: Simple Face Filter
- File: LEARN_AUGMENTED_REALITY.md
- Main Programming Language: Swift
- Alternative Programming Languages: N/A
- Coolness Level: Level 4: Hardcore Tech Flex
- Business Potential: 3. The “Service & Support” Model
- Difficulty: Level 2: Intermediate
- Knowledge Area: Face Tracking
- Software or Tool: Xcode, ARKit
- Main Book: N/A
What you’ll build: An app that uses the front-facing TrueDepth camera to track your face and overlay a virtual object, like a pair of glasses, a funny nose, or a hat.
Why it teaches AR: It introduces another major tracking capability: faces. You’ll learn how ARKit provides a detailed 3D mesh of a face and specific anchor points for features like eyes, nose, and mouth, allowing you to attach content that follows facial movements in real-time.
Core challenges you’ll face:
- Using the front-facing camera for AR → maps to `ARFaceTrackingConfiguration`
- Working with `ARFaceAnchor` → maps to getting the face geometry and transform
- Attaching content to a moving target → maps to parenting a 3D model to the face anchor’s node
- Responding to facial expressions → maps to reading `blendShapes` to detect a smile or a raised eyebrow
Key Concepts:
- `ARFaceTrackingConfiguration`: The configuration for tracking faces with the TrueDepth camera.
- `ARFaceAnchor`: An anchor that provides a 3D mesh of the detected face and its transform.
- Blend Shapes: A set of coefficients (0.0 to 1.0) that describe how much the user is making a specific expression, like “mouthSmile” or “eyeBlinkLeft”.
Difficulty: Intermediate. Time estimate: 1 week. Prerequisites: Project 2, a device with a TrueDepth front camera (iPhone X or later).
Real world outcome: You open the app, and it shows your face with a virtual pair of sunglasses perfectly attached. When you nod your head, the glasses move with you. When you raise your eyebrows, the glasses fly up off your face.
Implementation Hints:
- Create a new ARKit project. In your `viewDidLoad`, check if `ARFaceTrackingConfiguration.isSupported` is true.
- If it is, create and run the session with that configuration.
- Implement the `renderer(_:didAdd:for:)` and `renderer(_:didUpdate:for:)` delegate methods.
- When an `ARFaceAnchor` is added or updated, ARKit gives you a node that represents its position.
- Load your 3D model (e.g., a pair of glasses) and add it as a child of the face anchor’s node. You will need to manually adjust the model’s position and scale so it sits correctly on the face.
- In the `didUpdate` method, you can inspect the `ARFaceAnchor`’s `blendShapes` dictionary. For example, if `anchor.blendShapes[.jawOpen]` exceeds 0.5, you could trigger an animation (see the sketch below).
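A minimal sketch of the face-filter flow, assuming a `sceneView` outlet and an illustrative `glasses.usdz` asset; the two `renderer` functions belong in your `ARSCNViewDelegate` (typically the view controller):

```swift
import ARKit
import SceneKit

// Start face tracking only on devices with a TrueDepth camera.
func runFaceTracking(on sceneView: ARSCNView) {
    guard ARFaceTrackingConfiguration.isSupported else {
        print("This device has no TrueDepth camera")
        return
    }
    sceneView.session.run(ARFaceTrackingConfiguration())
}

// Attach the glasses once when the face anchor first appears.
func renderer(_ renderer: SCNSceneRenderer, didAdd node: SCNNode, for anchor: ARAnchor) {
    guard anchor is ARFaceAnchor else { return }
    if let glassesScene = SCNScene(named: "glasses.usdz"),
       let glasses = glassesScene.rootNode.childNodes.first {
        glasses.position = SCNVector3(0, 0.02, 0.06)   // nudge onto the nose bridge
        node.addChildNode(glasses)
    }
}

// Called every frame the face is tracked; read blend shapes for expressions.
func renderer(_ renderer: SCNSceneRenderer, didUpdate node: SCNNode, for anchor: ARAnchor) {
    guard let faceAnchor = anchor as? ARFaceAnchor,
          let jawOpen = faceAnchor.blendShapes[.jawOpen]?.floatValue else { return }
    if jawOpen > 0.5, node.childNodes.first?.action(forKey: "jump") == nil {
        // Make the glasses hop once per open-mouth gesture (keyed to avoid stacking).
        let jump = SCNAction.sequence([
            .moveBy(x: 0, y: 0.05, z: 0, duration: 0.15),
            .moveBy(x: 0, y: -0.05, z: 0, duration: 0.15),
        ])
        node.childNodes.first?.runAction(jump, forKey: "jump")
    }
}
```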
Learning milestones:
- The app starts and successfully tracks a face.
- A 3D cube is attached to the center of the user’s face.
- A glasses model is correctly positioned over the user’s eyes.
- An animation is triggered when the user opens their mouth.
Summary
| Project | Main Language | Difficulty | Core Concept |
|---|---|---|---|
| 1. Fiducial Marker Detector | Python | Intermediate | Computer Vision Basics |
| 2. “Hello, ARKit” - World Tracking | Swift | Beginner | World Tracking, Plane Detection |
| 3. AR Measuring Tape | Swift | Intermediate | 3D Math, State Management |
| 4. AR Furniture Placer | Swift | Intermediate | 3D Assets, Gestures |
| 5. Image Tracking AR Postcard | Swift | Intermediate | Image Recognition |
| 6. AR Portal to Another World | Swift | Expert | Advanced Rendering, Shaders |
| 7. Simple Face Filter | Swift | Intermediate | Face Tracking |