LEARN MALWARE INTERNALS

Learn Malware Analysis & Development (Red vs. Blue)

Goal: Demystify the “black magic” of malware. You will learn how malicious software is structured, how it hides, how it persists, and—crucially—how to detect and defeat it.

⚠️ WARNING: You are entering the domain of Dual-Use Technology. The techniques below are used by criminals to attack systems, but they are also used by Security Researchers to defend them. MANDATORY RULE: All projects below must be built and executed inside an Isolated Virtual Machine (VM) without internet access. Never run these on your host machine. Never release this code into the wild.


Core Concept Analysis

To understand malware and antivirus (AV), you need to understand the Cat and Mouse Game:

  1. The Structure (The Virus): How code lives inside other files (PE/ELF formats).
  2. The Hook (The Rootkit): How code intercepts OS calls to hide itself.
  3. The Mask (The Packer): How code obfuscates itself to look like random data.
  4. The Fingerprint (The Signature): How AV identifies known threats.
  5. The Behavior (The Heuristics): How AV detects unknown threats based on actions.

Project List

Project 1: The “Parasite” (File Infector)

  • File: LEARN_MALWARE_INTERNALS.md
  • Main Programming Language: C
  • Alternative Programming Languages: Python (with pefile/pyelftools), Assembly
  • Coolness Level: Level 4: Hardcore Tech Flex
  • Business Potential: 1. The “Resume Gold”
  • Difficulty: Level 3: Advanced
  • Knowledge Area: Binary Formats / System Programming
  • Software or Tool: Hex Editor (HxD), VM
  • Main Book: “Practical Malware Analysis” by Michael Sikorski and Andrew Honig

What you’ll build: A program (the “virus”) that finds other executable files in a folder, injects its own code into them, and ensures that when the victim file is run, the virus runs first, followed by the original program.

Why it teaches Malware: This is the definition of a computer virus. You will learn the anatomy of an executable file (PE on Windows or ELF on Linux)—headers, sections, and entry points. You will understand how code can be added to a binary without breaking it.

Core challenges you’ll face:

  • The Entry Point: Finding where the computer starts reading the file and changing it to point to your code.
  • Section Padding: Finding empty space (code caves) in a file to hide your code, or adding a new section.
  • Preserving State: Saving the CPU registers before your code runs and restoring them so the original program doesn’t crash.

Key Concepts:

  • PE/ELF Headers: “Practical Malware Analysis” Chapter 1
  • Code Caves: The art of finding unused bytes in binaries.
  • Relative Virtual Addresses (RVA): How memory addresses change when loaded.
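
To make the RVA concept concrete, the translation from an RVA to a position in the file on disk could be sketched as follows. This is a minimal illustration, not a full PE parser: the `Section` tuple and the sample values in the usage note are hypothetical, and a real tool would read them from the section headers.

```python
from typing import NamedTuple, Optional, Sequence


class Section(NamedTuple):
    name: str
    virtual_address: int  # RVA at which the section is mapped in memory
    virtual_size: int     # size of the section once loaded
    raw_offset: int       # where the section's bytes start in the file
    raw_size: int         # how many bytes the section occupies in the file


def rva_to_file_offset(rva: int, sections: Sequence[Section]) -> Optional[int]:
    """Return the file offset backing `rva`, or None if no file bytes back it."""
    for s in sections:
        if s.virtual_address <= rva < s.virtual_address + s.virtual_size:
            delta = rva - s.virtual_address
            if delta >= s.raw_size:
                return None  # address exists only in memory (e.g. zero-filled data)
            return s.raw_offset + delta
    return None
```

With a hypothetical `.text` section mapped at RVA 0x1000 and stored at file offset 0x400, RVA 0x1010 lands 0x10 bytes into that section, so the function returns 0x410.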

Difficulty: Advanced
Time estimate: 2 weeks
Prerequisites: C pointers, hexadecimal, basic Assembly understanding.

Real world outcome:

  1. You have a clean program hello_world.exe.
  2. You run ./parasite.
  3. hello_world.exe increases in size by a few KB.
  4. When you run hello_world.exe, it prints: “I AM INFECTED” and then prints “Hello World”.

Implementation Hints: Start with a “Prepend” method (simplest, though not stealthy):

  1. Read the target file into a buffer.
  2. Open a new file.
  3. Write your “Virus Payload” binary first.
  4. Write the Target file binary after it.
  5. In a real scenario, you’d extract the original to a temp file and run it, but for learning, just seeing the file grow and the dual execution is enough.

Advanced: Modify the Entry Point in the PE Header to point to a new code section you added, then jump back to the original entry point.

Learning milestones:

  1. Format Reader - You can parse a header and print the “Entry Point” address.
  2. The Append - You can add bytes to the end of a file without corrupting it.
  3. The Hijack - You successfully redirect execution to your code.
  4. The Restoration - The original program still works after your code runs.
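
The first milestone (Format Reader) is pure analysis and can be sketched in a few lines of Python using only the standard library. The function name is hypothetical; the offsets follow the published PE layout (the `e_lfanew` field at 0x3C, then the 4-byte signature, the 20-byte COFF header, and `AddressOfEntryPoint` at offset 16 of the optional header).

```python
import struct


def read_pe_entry_point(data: bytes) -> int:
    """Return the AddressOfEntryPoint RVA from the raw bytes of a PE file."""
    if data[:2] != b"MZ":
        raise ValueError("not a PE file: missing MZ signature")
    # e_lfanew: file offset of the PE signature, stored at 0x3C in the DOS header
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    if data[pe_offset:pe_offset + 4] != b"PE\x00\x00":
        raise ValueError("not a PE file: missing PE signature")
    # Skip the 4-byte signature and the 20-byte COFF header to reach the
    # optional header; AddressOfEntryPoint sits 16 bytes into it.
    (entry_rva,) = struct.unpack_from("<I", data, pe_offset + 4 + 20 + 16)
    return entry_rva
```

Reading a file with `open(path, "rb").read()` and passing the bytes in is enough for small test binaries. The value returned is an RVA, not a file offset.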

Project 2: The “Signature Hunter” (Basic Antivirus)

  • File: LEARN_MALWARE_INTERNALS.md
  • Main Programming Language: Python
  • Alternative Programming Languages: C++, Go
  • Coolness Level: Level 2: Practical but Forgettable
  • Business Potential: 3. The “Service & Support” Model
  • Difficulty: Level 2: Intermediate
  • Knowledge Area: Static Analysis / Pattern Matching
  • Software or Tool: YARA (Industry standard tool you will mimic)
  • Main Book: “The Art of Computer Virus Research and Defense” by Peter Szor

What you’ll build: A scanning engine that takes a database of “known bad” byte sequences (signatures) and scans a directory of files. If it finds those bytes, it flags the file as malware.

Why it teaches Antivirus: This is how traditional AV works. You will learn why “changing one byte” often breaks AV detection (because the signature no longer matches) and why AV updates are so frequent.

Core challenges you’ll face:

  • Performance: Scanning 100,000 files once per signature is slow. A multi-pattern search algorithm such as Aho-Corasick matches every signature in a single pass over each file.
  • False Positives: Ensuring your signature is unique enough not to flag Windows system files as viruses.
  • Wildcards: Implementing signatures that allow for variable bytes (e.g., E8 ?? ?? ?? ?? for a call instruction, where the four bytes after the E8 opcode are a relative offset that changes between builds).

Key Concepts:

  • Static Analysis: Analyzing code without running it.
  • Hashing: MD5/SHA256 for exact file matching.
  • Byte Signatures: Unique hex strings that identify code.
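
Exact-match hashing is the simplest of these concepts to try out. The sketch below hashes a file in chunks with the standard `hashlib` module; the function names and the `known_hashes` set are hypothetical placeholders for whatever database format you choose.

```python
import hashlib
from typing import Set


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in fixed-size chunks so large files never sit fully in RAM."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_known_bad(path: str, known_hashes: Set[str]) -> bool:
    """True if the file's SHA256 appears in the set of known-bad hex digests."""
    return sha256_of_file(path) in known_hashes
```

Changing a single byte of the file produces a completely different digest, which is why hash matching only catches exact copies.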

Difficulty: Intermediate
Time estimate: 1 week
Prerequisites: File I/O, basic algorithms.

Real world outcome:

  1. You create a signature file db.txt containing the hex bytes of your Project 1 virus.
  2. You run ./scanner /home/user/downloads.
  3. Output: [ALERT] Found 'Parasite.A' in file 'hello_world.exe' at offset 0x400.

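Implementation Hints: Start with simple hash matching (SHA256 of the whole file), then move to “substring” matching. Signature format: NAME: BYTES. Example: VIRUS_X: 4D 5A 90 00. (This only illustrates the format: 4D 5A is the “MZ” magic that begins every PE file, so a real signature needs bytes unique to the sample.) Read the target file in chunks (don’t load 10GB into RAM) and check whether each chunk contains the byte sequence, overlapping consecutive chunks by the signature length minus one byte so that a match straddling a chunk boundary is not missed.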
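
The substring step of these hints might look like the following sketch. The function names are hypothetical, and the only subtle part is carrying the last few bytes of each chunk forward so a signature split across two reads is still found.

```python
from typing import BinaryIO, Tuple


def parse_signature(line: str) -> Tuple[str, bytes]:
    """Parse a 'NAME: BYTES' line such as 'VIRUS_X: 4D 5A 90 00'."""
    name, _, hex_bytes = line.partition(":")
    return name.strip(), bytes.fromhex(hex_bytes.strip())


def find_in_stream(stream: BinaryIO, pattern: bytes, chunk_size: int = 65536) -> int:
    """Return the absolute offset of the first match, or -1 if there is none."""
    if not pattern:
        raise ValueError("empty pattern")
    overlap = len(pattern) - 1
    tail = b""   # bytes carried over from the previous window
    base = 0     # absolute file offset of the first byte of the current window
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            return -1
        window = tail + chunk
        idx = window.find(pattern)
        if idx != -1:
            return base + idx
        # Keep just enough bytes to catch a match that straddles the boundary.
        tail = window[-overlap:] if overlap else b""
        base += len(window) - len(tail)
```

Open the target with `open(path, "rb")` and pass the file object in; the returned offset is what the scanner would report in its alert line.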

Learning milestones:

  1. Hash Matcher - Detects exact file copies.
  2. String Matcher - Detects specific text inside binaries.
  3. Hex Matcher - Detects specific code instructions.
  4. Wildcard Matcher - Detects patterns even with changing addresses.
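
For the final milestone, one possible approach (an assumption of this sketch, not the only design) is to translate a hex signature with `??` wildcards into a byte-level regular expression. The function names are hypothetical.

```python
import re


def compile_signature(hex_pattern: str) -> re.Pattern:
    """Turn 'E8 ?? ?? ?? ?? 5D' into a compiled bytes regex."""
    parts = []
    for token in hex_pattern.split():
        if token == "??":
            parts.append(b".")  # wildcard: any single byte
        else:
            # Escape the literal byte in case it is a regex metacharacter.
            parts.append(re.escape(bytes([int(token, 16)])))
    # DOTALL makes '.' match every byte value, including 0x0A.
    return re.compile(b"".join(parts), re.DOTALL)


def find_signature(data: bytes, hex_pattern: str) -> int:
    """Return the offset of the first match in `data`, or -1."""
    match = compile_signature(hex_pattern).search(data)
    return match.start() if match else -1
```

A regex per signature is easy to reason about but scales poorly; with many signatures, a multi-pattern matcher becomes worthwhile.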

Project 3: The “Polymorphic” Engine (The Evasion)

  • File: LEARN_MALWARE_INTERNALS.md
  • Main Programming Language: Python (Script based is easier for logic) or C
  • Alternative Programming Languages: Rust
  • Coolness Level: Level 5: Pure Magic
  • Business Potential: 1. The “Resume Gold”
  • Difficulty: Level 4: Expert
  • Knowledge Area: Obfuscation / Compilers
  • Software or Tool: Disassemblers
  • Main Book: “Rootkits and Bootkits” by Matrosov, Rodionov, and Bratus

What you’ll build: A program that rewrites its own code (or a payload’s code) every time it runs. It changes the instructions but keeps the logic the same (e.g., add eax, 1 becomes inc eax), defeating the signature scanner from Project 2.

Why it teaches Malware: This explains why simple AV signatures fail. If the virus looks different every time, how do you catch it? This teaches you about instruction equivalence and the “arms race” between coders and scanners.

Core challenges you’ll face:

  • Instruction Equivalence: Knowing that x = x + 0 is the same as doing nothing (NOP), but changes the file hash.
  • Junk Code Insertion: Adding random calculations that don’t affect the result but change the file structure.
  • Self-Modification: Writing to your own memory space (requires changing memory permissions).

Key Concepts:

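  • Polymorphism vs Metamorphism: A polymorphic virus keeps an encrypted body and mutates the key and decoder stub; a metamorphic virus rewrites its entire code body, leaving no constant core to sign.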
  • Control Flow Graph (CFG): Understanding execution paths.

Difficulty: Expert
Time estimate: 2-3 weeks
Prerequisites: Project 1, Project 2, Assembly.

Real world outcome:

  1. Take your “Virus” from Project 1.
  2. Scan it with Project 2 -> “Detected”.
  3. Run it through your Polymorphic Engine.
  4. Scan the new file -> “Clean”.
  5. Run the new file -> It still works.

Implementation Hints: Don’t try to recompile C code. Work with a simple “payload” shellcode.

  1. Create a “decoder stub” that decrypts the main body.
  2. The engine only changes the decoder stub each time.
  3. Swap register usage (use EBX instead of EAX for the counter).
  4. Insert NOPs (0x90) randomly.

Learning milestones:

  1. Junk Code - You can insert random bytes without crashing.
  2. Register Swapping - You can dynamically change which registers are used.
  3. Signature Evasion - You successfully bypass your own Project 2 scanner.

Project 4: The “Ghost” (Userland Rootkit/Hooking)

  • File: LEARN_MALWARE_INTERNALS.md
  • Main Programming Language: C (Linux) or C/C++ (Windows)
  • Alternative Programming Languages: Rust
  • Coolness Level: Level 4: Hardcore Tech Flex
  • Business Potential: 1. The “Resume Gold”
  • Difficulty: Level 3: Advanced
  • Knowledge Area: OS Internals / API Hooking
  • Software or Tool: LD_PRELOAD (Linux) or MinHook (Windows)
  • Main Book: “Rootkits and Bootkits”

What you’ll build: A dynamic library (.so or .dll) that intercepts calls to the Operating System. When a user tries to list files (using ls or dir), your library intercepts the result and removes specific files from the list, making them “invisible.”

Why it teaches Malware: Malware uses rootkits to hide. If you check Task Manager, you won’t see the virus because it has hooked the function that lists processes. This teaches you how the OS API really works and how fragile “reality” is on a computer.

Core challenges you’ll face:

  • The Hook: Intercepting a function like readdir() or NtQuerySystemInformation.
  • The Filter: Calling the real function, getting the data, removing the entry you want to hide, and passing the fake data back to the user.
  • Injection: Forcing a running program to load your library.

Key Concepts:

  • API Hooking: Trampolines and Detours.
  • DLL Injection / LD_PRELOAD: Methods to force code loading.
  • Kernel vs User Mode: Where the hooking happens.

Difficulty: Advanced
Time estimate: 1-2 weeks
Prerequisites: C function pointers, OS API knowledge.

Real world outcome:

  1. Create a file named secret_virus.txt.
  2. Compile your rootkit ghost.so.
  3. Run export LD_PRELOAD=./ghost.so.
  4. Run ls.
  5. Outcome: secret_virus.txt is NOT listed, even though it exists.
  6. Run cat secret_virus.txt.
  7. Outcome: You can read it (because you hooked only readdir(), which ls uses to list directories, not open/read).

Implementation Hints (Linux): Use dlsym(RTLD_NEXT, "readdir") to get the address of the real readdir function. Your custom readdir calls the real one, loops through the results, and if strcmp(entry->d_name, "secret_virus.txt") == 0, you skip that entry (C strings cannot be compared with ==).

Learning milestones:

  1. The Interceptor - You can print “I see you!” every time ls runs.
  2. The Filter - You can successfully hide a file from the list.
  3. The Process Hider - You apply the same logic to /proc to hide a running process.

Project 5: The “Packer” (Obfuscator)

  • File: LEARN_MALWARE_INTERNALS.md
  • Main Programming Language: C or Go
  • Alternative Programming Languages: Python (for the builder)
  • Coolness Level: Level 3: Genuinely Clever
  • Business Potential: 2. The “Micro-SaaS / Pro Tool” (Legit software uses packers too)
  • Difficulty: Level 3: Advanced
  • Knowledge Area: Compression / Cryptography / Memory Loading
  • Software or Tool: PE Headers
  • Main Book: “Practical Malware Analysis” Chapter 18

What you’ll build: A tool that takes a compiled .exe, compresses and encrypts it, and wraps it in a new executable (the “stub”). When run, the stub decrypts the original program directly into memory and runs it, never writing the clean file to disk.

Why it teaches Malware: Much real-world malware is packed. Packing prevents AV from reading the code on disk; the code exists in its true form only in RAM, after the stub has unpacked it. Understanding this is key to “unpacking” malware for analysis.

Core challenges you’ll face:

  • In-Memory Execution: Running code that isn’t a file on disk (RunPE / Process Hollowing).
  • The Stub: Writing a tiny piece of Assembly/C that handles the decryption without dependencies.
  • Import Address Table (IAT) Repair: When you move code manually, all its links to Windows functions break. You have to fix them.

Key Concepts:

  • Entropy: High entropy (randomness) usually indicates packing/encryption.
  • Stub: The code responsible for unpacking.
  • Process Hollowing: Replacing a legitimate process’s memory with malicious code.
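
The entropy concept is easy to measure from the analyst's side. The following sketch computes Shannon entropy in bits per byte; the function name is hypothetical, and any cutoff you pick for "probably packed" is an assumption to tune against your own samples.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Return the Shannon entropy of `data` in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)  # occurrences of each byte value
    return -sum((n / total) * math.log2(n / total) for n in counts.values())
```

A buffer of identical bytes scores 0, and data in which all 256 byte values are equally likely scores 8. Compressed or encrypted sections sit close to the top of that range, which is why scanners compute this per section rather than per file.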

Difficulty: Advanced
Time estimate: 2-3 weeks
Prerequisites: Project 1, strong understanding of memory layout.

Real world outcome:

  1. Input: malware.exe (Detects as malicious).
  2. Run ./packer malware.exe.
  3. Output: packed_malware.exe (Detects as “Unknown/Suspicious” but not the specific malware signature).
  4. Run packed_malware.exe -> It behaves exactly like the original.

Implementation Hints: Simple version: The stub extracts the payload to a temporary file, runs it, waits for it to close, then deletes it (easier). Complex version (Real Packer): The stub allocates memory (VirtualAlloc), decrypts the payload into that memory, resolves imports, and jumps to the Entry Point.

Learning milestones:

  1. The Wrapper - You can hide a file inside another.
  2. The Decryptor - The stub correctly decrypts the data.
  3. The Loader - The stub successfully executes the decrypted code.

Project 6: The “Behavioral Analyst” (EDR Simulator)

  • File: LEARN_MALWARE_INTERNALS.md
  • Main Programming Language: Python (using libraries) or C#
  • Alternative Programming Languages: C++
  • Coolness Level: Level 4: Hardcore Tech Flex
  • Business Potential: 3. The “Service & Support” Model
  • Difficulty: Level 3: Advanced
  • Knowledge Area: Blue Team / Forensics
  • Software or Tool: Sysmon / ETW (Event Tracing for Windows)
  • Main Book: “Practical Malware Analysis”

What you’ll build: A tool that doesn’t look at files, but looks at actions. It monitors the system for “suspicious sequences,” such as a Word document launching PowerShell, or a process trying to access the memory of lsass.exe.

Why it teaches Antivirus: Modern EDR (Endpoint Detection and Response) relies less on signatures and more on behavior. This project teaches you how to detect the intent of a program, regardless of how it is packed or obfuscated.

Core challenges you’ll face:

  • Event Noise: The OS does thousands of things per second. Filtering the noise to find the signal.
  • Parent-Child Relationships: Tracking who started whom (Process Trees).
  • Real-time Processing: Analyzing events before the damage is done.

Key Concepts:

  • Heuristics: Rule of thumb analysis (e.g., “If it encrypts 100 files in 1 second, it’s ransomware”).
  • ETW (Event Tracing for Windows): The built-in telemetry of Windows.
  • Process Injection Detection: Identifying when one process touches another’s memory.

Difficulty: Advanced
Time estimate: 2 weeks
Prerequisites: Understanding of OS processes/threads.

Real world outcome:

  1. Start ./edr_monitor.
  2. Run your “Packer” or “Virus” from previous projects.
  3. EDR Output: [ALERT] Suspicious behavior detected: Process 'packed.exe' (PID: 1234) modified executable memory in dynamic allocation (Unpacking detected).
  4. Run a script that deletes shadow copies.
  5. EDR Output: [ALERT] Ransomware activity: Attempt to delete backup files.

Implementation Hints: Use a library to tap into ETW (such as pywintrace for Python or the Microsoft.Diagnostics.Tracing.TraceEvent package for C#). Define rules:

  • Process == winword.exe AND ChildProcess == powershell.exe -> ALERT.
  • FileWriteCount > 50 AND FileExtension == .encrypted -> ALERT.
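
The two rules above can be prototyped before any ETW plumbing exists. In this sketch the `Event` record and its fields are hypothetical stand-ins for whatever your event source delivers, and the threshold of 50 comes from the rule text.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Event:
    kind: str               # "process_start" or "file_write"
    pid: int
    image: str              # executable name, lowercase
    parent_image: str = ""  # set for process_start events
    path: str = ""          # set for file_write events


def evaluate(events: Iterable[Event], write_threshold: int = 50) -> List[str]:
    """Apply both rules to a stream of events and return the alert messages."""
    alerts: List[str] = []
    encrypted_writes: Counter = Counter()  # per-PID count of *.encrypted writes
    flagged = set()                        # PIDs already reported for rule 2
    for ev in events:
        if ev.kind == "process_start":
            if ev.parent_image == "winword.exe" and ev.image == "powershell.exe":
                alerts.append(f"[ALERT] winword.exe spawned powershell.exe (PID {ev.pid})")
        elif ev.kind == "file_write" and ev.path.endswith(".encrypted"):
            encrypted_writes[ev.pid] += 1
            if encrypted_writes[ev.pid] > write_threshold and ev.pid not in flagged:
                flagged.add(ev.pid)
                alerts.append(f"[ALERT] PID {ev.pid} wrote more than {write_threshold} .encrypted files")
    return alerts
```

Keeping the rules as plain functions over an event stream means the same logic can later be fed from a live source or replayed from a recorded log.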

Learning milestones:

  1. Event Stream - You can see every process start/stop in real time.
  2. Tree Builder - You can visualize the process tree (Parent -> Child).
  3. Rule Engine - You can trigger alerts on specific logic.
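
For the Tree Builder milestone, a minimal sketch is to keep a table of PID to (parent PID, name) and walk it upward. The table layout and function name are assumptions made for illustration.

```python
from typing import Dict, List, Tuple


def build_lineage(processes: Dict[int, Tuple[int, str]], pid: int) -> List[str]:
    """Return process names from the oldest known ancestor down to `pid`."""
    chain: List[str] = []
    seen = set()  # guards against loops caused by PID reuse
    while pid in processes and pid not in seen:
        seen.add(pid)
        parent_pid, name = processes[pid]
        chain.append(name)
        pid = parent_pid
    return list(reversed(chain))
```

Joining the result with `" -> "` gives the Parent -> Child view described above, and the same walk lets a rule ask whether any ancestor of a process is an Office application.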

Project 7: The “Sandbox” (Automated Analysis Lab)

  • File: LEARN_MALWARE_INTERNALS.md
  • Main Programming Language: Python
  • Alternative Programming Languages: Bash/PowerShell
  • Coolness Level: Level 3: Genuinely Clever
  • Business Potential: 2. The “Micro-SaaS / Pro Tool”
  • Difficulty: Level 2: Intermediate
  • Knowledge Area: Automation / Virtualization
  • Software or Tool: VirtualBox / Cuckoo Sandbox (You are building a mini-Cuckoo)
  • Main Book: “Practical Malware Analysis”

What you’ll build: An automation script that spins up a clean Virtual Machine, copies a suspicious file into it, runs the file, records network traffic and screenshots for 2 minutes, shuts down the VM, and generates a report.

Why it teaches Malware: Manually analyzing malware is dangerous and slow. Sandboxes allow researchers to analyze thousands of samples a day. This project teaches you about safe detonation environments and automation.

Core challenges you’ll face:

  • VM Control: Using the VirtualBox/VMware API to revert snapshots and start machines programmatically.
  • Guest Control: Executing commands inside the VM from the host.
  • Anti-Evasion: Malware tries to detect if it’s in a VM (checking mouse movement, checking CPU ID). Your sandbox needs to look “real.”

Key Concepts:

  • Snapshotting: Reverting to a clean state.
  • Instrumentation: Recording what happens inside the black box.
  • IOC (Indicator of Compromise): Extracting IPs and filenames from the run.
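
IOC extraction from the collected logs can start as plain pattern matching. The sketch below pulls IPv4 addresses and executable-style file names out of free text; the regular expressions and the extension list are illustrative assumptions, not a complete IOC grammar.

```python
import re
from typing import Dict, List

# Dotted-quad IPv4 address with each octet limited to 0-255.
IPV4_RE = re.compile(
    r"\b(?:(?:25[0-5]|2[0-4]\d|1?\d?\d)\.){3}(?:25[0-5]|2[0-4]\d|1?\d?\d)\b"
)
# File names ending in a few executable or script extensions (assumed list).
FILE_RE = re.compile(r"[\w.-]+\.(?:exe|dll|bat|ps1)\b", re.IGNORECASE)


def extract_iocs(log_text: str) -> Dict[str, List[str]]:
    """Return de-duplicated, sorted IP addresses and file names found in the text."""
    return {
        "ips": sorted(set(IPV4_RE.findall(log_text))),
        "files": sorted(set(FILE_RE.findall(log_text))),
    }
```

Feeding it the agent's log and the text form of a packet capture summary yields the raw material for the "Network connections made" and "Files created" parts of the report.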

Difficulty: Intermediate
Time estimate: 1-2 weeks
Prerequisites: Basic scripting, experience with VMs.

Real world outcome:

  1. Drop a file mystery.exe into your incoming folder.
  2. The script wakes up, restores the VM snapshot.
  3. The file is executed in the VM.
  4. After 2 minutes, you get a PDF report: “Screenshots of what happened,” “Network connections made,” “Files created.”

Implementation Hints: Use vboxmanage (CLI for VirtualBox) to control the VM. Use a simple agent (Python script running inside the VM) to listen for the file and execute it, then send logs back to the host via a shared folder or network socket.
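
The host-side controller could be sketched as below. The VM name, snapshot name, and function names are hypothetical; the VBoxManage subcommands used (`snapshot ... restore`, `startvm --type headless`, `controlvm ... screenshotpng`, `controlvm ... poweroff`) are standard ones. Delivering and running the sample is left to the in-guest agent and is not shown.

```python
import subprocess
from typing import List


def detonation_commands(vm: str, snapshot: str, screenshot_path: str) -> List[List[str]]:
    """Build the VBoxManage command lines for one analysis cycle."""
    return [
        ["VBoxManage", "snapshot", vm, "restore", snapshot],             # revert to clean state
        ["VBoxManage", "startvm", vm, "--type", "headless"],             # boot without a GUI window
        ["VBoxManage", "controlvm", vm, "screenshotpng", screenshot_path],  # capture the guest display
        ["VBoxManage", "controlvm", vm, "poweroff"],                     # hard stop after the run
    ]


def run_all(commands: List[List[str]]) -> None:
    """Execute each command in order, stopping at the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)
```

Building the command lists separately from running them makes the controller testable without VirtualBox installed. In a real cycle you would wait for the analysis window to elapse between starting the VM and taking the screenshot.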

Learning milestones:

  1. The Controller - Script can start/stop/revert VMs.
  2. The Detonator - Script can execute files inside the guest.
  3. The Reporter - Script collects logs and presents them.

Project Comparison Table

| Project | Difficulty | Time | Depth | Fun Factor |
|---|---|---|---|---|
| File Infector | Advanced | 2 wks | ⭐⭐⭐⭐⭐ (Core Concept) | ⭐⭐⭐ |
| Signature Scanner | Intermed. | 1 wk | ⭐⭐⭐ (Defense) | ⭐⭐ |
| Polymorphic Engine | Expert | 3 wks | ⭐⭐⭐⭐⭐ (Evasion) | ⭐⭐⭐⭐⭐ |
| Rootkit (Ghost) | Advanced | 2 wks | ⭐⭐⭐⭐ (Stealth) | ⭐⭐⭐⭐⭐ |
| Packer | Advanced | 2 wks | ⭐⭐⭐⭐ (Obfuscation) | ⭐⭐⭐ |
| Behavioral EDR | Advanced | 2 wks | ⭐⭐⭐⭐ (Modern Defense) | ⭐⭐⭐ |
| The Sandbox | Intermed. | 2 wks | ⭐⭐⭐ (Tooling) | ⭐⭐⭐⭐ |

Recommendation

Start with Project 1 (The Parasite) and Project 2 (The Scanner) together.

This is the perfect feedback loop:

  1. Build the virus (Project 1).
  2. Build the tool to detect it (Project 2).
  3. Update the virus to bypass the tool.
  4. Update the tool to catch the new virus.

This cycle is exactly what happens in the real world between malware authors and antivirus companies. It will give you the deepest understanding of “how it works.”

Crucial Advice: Don’t just copy code. Read the PE File Format Specification (for Windows) or ELF Specification (for Linux). If you understand the file header, you understand the virus.

Final Overall Project: The “APT” (Advanced Persistent Threat) Simulation

  • File: LEARN_MALWARE_INTERNALS.md
  • Main Programming Language: Multiple
  • Difficulty: Level 5: Master

What you’ll build: A full chain attack simulation.

  1. Dropper: A harmless-looking PDF or Installer that drops the payload.
  2. Payload: A Packed (Project 5) binary that achieves Persistence (Registry run keys).
  3. Stealth: It uses Rootkit techniques (Project 4) to hide its process.
  4. C2: It beacons out to a command-and-control server to receive commands.
  5. Defense: You will then use your EDR Monitor (Project 6) and Sandbox (Project 7) to analyze and catch your own creation.

This closes the loop. You become the attacker to understand the attack, and the defender to understand the protection.


Summary

| Project | Main Language |
|---|---|
| The Parasite (Infector) | C / Python |
| The Signature Hunter (AV) | Python |
| The Polymorphic Engine | Python / Assembly |
| The Ghost (Rootkit) | C |
| The Packer | C / Go |
| The Behavioral Analyst | Python / C# |
| The Sandbox | Python |
| The APT Simulation | Mixed |