Project 1: SELinux Context Explorer & Visualizer

Build a CLI that maps SELinux contexts across processes, files, and ports, detects label drift, and renders a human-readable graph of who can touch what.

Quick Reference

Attribute Value
Difficulty Level 1: Beginner
Time Estimate 6-10 hours
Main Programming Language Python
Alternative Programming Languages Go, Rust
Coolness Level Level 2
Business Potential 1
Prerequisites Linux shell basics, SELinux enabled VM, basic scripting
Key Topics Security contexts, file labeling, domain types, label drift detection

1. Learning Objectives

By completing this project, you will:

  1. Read and interpret SELinux contexts for processes, files, and ports.
  2. Detect label drift using policy-defined defaults and live labels.
  3. Build a reliable pipeline that normalizes SELinux output into structured data.
  4. Render an ASCII relationship map that links process domains to object types.
  5. Diagnose common SELinux label errors without disabling enforcement.

2. All Theory Needed (Per-Concept Breakdown)

This section includes every concept required to implement the explorer correctly.

Security Contexts and Type Enforcement

Fundamentals

A security context is the SELinux identity attached to kernel objects. It appears as user:role:type:level and is the core input to every SELinux decision. The type field is the most important piece for type enforcement (TE), which is the dominant policy model in targeted SELinux configurations. Processes run in domains (process types), and files or sockets have object types. When a process tries to access an object, SELinux compares the process type, object type, class, and permission against policy rules. Understanding this context format is essential because all CLI outputs (ls -Z, ps -eZ, semanage, audit logs) present this data in different ways. If you cannot reliably split and interpret contexts, you cannot reason about why access is allowed or denied. This project requires reading contexts, grouping them by type, and using them as keys in a data model that drives your visualization.

Deep Dive into the concept

Type Enforcement is the rule system that maps source_type -> target_type:class { perms } into allow or deny decisions. The context format is not decorative; it is how SELinux names security identities. The SELinux user and role fields are largely constant in targeted policy, but they matter for MLS and RBAC. The level field (s0 or s0:cX,cY) indicates MLS/MCS classification and is a critical dimension in container isolation. However, even in simple systems, the type is what differentiates httpd_t from sshd_t or init_t. This distinction makes it possible for two processes with the same Unix UID to have different access profiles. When your tool prints a context, you are effectively showing the input to the security server for any access request.

Parsing contexts correctly means understanding that a colon-separated string can include trailing categories or even multiple category ranges. Your parser must not assume a fixed length; instead, it should split into four fields with the last field potentially containing commas. For files, contexts are stored in extended attributes (security.selinux). For processes, the context lives in the kernel task structure and is exposed through /proc/<pid>/attr/current. When you read contexts from ps -eZ, you are reading a cached snapshot of that kernel value. For ports, semanage port -l shows port type mappings, which represent label assignments for port numbers. The key idea is that SELinux decisions are made against object classes. A file is class file, a directory is class dir, a TCP socket is tcp_socket, and so on. The same type can behave differently across classes (for example, the write permission on a file is different from write on a dir). Your tool does not need to evaluate policy, but it must preserve class and type information so that later analysis can map to allow rules if needed.
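The four-field split can be sketched in a few lines of Python. The trick is to split on at most three colons so the level field keeps any embedded colons and commas (MCS ranges like s0:c1,c2). The function name is illustrative, not part of the project spec:

```python
def parse_context(ctx: str) -> dict:
    """Split a SELinux context into its four fields.

    The level field may itself contain colons and commas
    (e.g. MCS categories like s0:c1,c2), so split at most 3 times
    rather than assuming a fixed number of separators.
    """
    parts = ctx.split(":", 3)
    if len(parts) != 4 or not parts[2]:
        # type must never be empty; treat anything else as malformed
        raise ValueError(f"malformed context: {ctx!r}")
    user, role, type_, level = parts
    return {"user": user, "role": role, "type": type_, "level": level}
```

Note how a context with categories still parses cleanly: `parse_context("system_u:system_r:httpd_t:s0:c1,c2")` keeps `s0:c1,c2` intact as the level.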

In terms of tooling, ls -Z displays file contexts, ps -eZ shows process contexts, and semanage fcontext -l lists path regex mappings to default labels. The difference between a file’s current label and its expected label is label drift. The only reliable way to determine expected labels is to consult policy-defined file context rules, not to guess based on location. The correct workflow is: compute expected label via matchpathcon, compare to current label via getfilecon, then decide whether to relabel. This is an important invariant for your tool: do not assume that /var/www implies httpd_sys_content_t without checking policy.

Finally, contexts drive transitions. An executable labeled httpd_exec_t triggers a transition into httpd_t when executed by an allowed domain. This means that if a binary is mislabeled, the resulting process domain may be incorrect. Your visualization can show surprising relationships, such as a systemd service running in init_t because the entrypoint is mislabeled. That is not just a labeling detail; it changes the entire enforcement surface. Building a context explorer makes these invisible mismatches visible and teaches you the primacy of labels in SELinux reasoning.

Additional operational notes on Security Contexts and Type Enforcement: In real systems, this concept interacts with policy versions, distribution defaults, and local overrides. Always record the exact policy version and runtime toggles when diagnosing behavior, because the same action can be allowed on one host and denied on another. When you change configuration related to this concept, capture before/after evidence (labels, logs, and outcomes) so you can justify the change, detect regressions, and roll it back if needed. Treat every tweak as a hypothesis: change one variable, re-run the same action, and compare results against a known baseline. This makes debugging repeatable and keeps your fixes defensible.

From a design perspective, treat Security Contexts and Type Enforcement as an invariant: define what success means, which data proves it, and what failure looks like. Build tooling that supports dry-run mode and deterministic fixtures so you can validate behavior without risking production. This also makes the concept teachable to others. Finally, connect the concept to security and performance trade-offs: overly broad changes reduce security signal, while overly strict changes create operational friction. Good designs surface these trade-offs explicitly so operators can make safe decisions.

How this fits into the projects

You apply context parsing in §3.2 Functional Requirements, display it in §3.7 Real World Outcome, and validate it in §6.2 Critical Test Cases. The same context model is reused in P02-avc-denial-analyzer-auto-fixer.md to interpret AVC fields.

Definitions & key terms

  • context -> SELinux security label user:role:type:level
  • type -> primary TE decision token for processes and objects
  • domain -> process type (e.g., httpd_t)
  • object type -> label on files, ports, sockets
  • class -> kernel object class (file, dir, tcp_socket)

Mental model diagram

process:system_u:system_r:httpd_t:s0
                |
                v
object:system_u:object_r:httpd_sys_content_t:s0
                |
                v
allow rule: httpd_t httpd_sys_content_t:file { read getattr }

How it works (step-by-step, with invariants and failure modes)

  1. Read process contexts from ps -eZ or /proc/<pid>/attr/current.
  2. Read file contexts via getfilecon or ls -Z and normalize paths.
  3. Split each context into user, role, type, and level (preserve level text).
  4. Group by process type and object type to form a relationship graph.
  5. When labels are malformed, mark them as parse errors and continue.

Invariants: context has four fields; type is never empty; all nodes in the graph have a type. Failure modes: SELinux disabled yields empty contexts; truncated output from tools; permissions preventing context reads.
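Steps 1 and 5 above (read process contexts, tolerate failures, continue) can be sketched by reading /proc directly instead of parsing ps -eZ. This is an illustrative sketch: it assumes a Linux procfs layout and simply skips processes that exit mid-scan or whose context cannot be read:

```python
import os

def read_process_contexts(proc_root: str = "/proc"):
    """Yield (pid, raw_context) pairs from procfs.

    Reading /proc/<pid>/attr/current avoids parsing ps -eZ output.
    Processes may exit during the scan and some contexts may be
    unreadable, so every per-PID failure is skipped, not fatal.
    """
    try:
        entries = os.listdir(proc_root)
    except OSError:
        return  # no procfs available on this system
    for entry in entries:
        if not entry.isdigit():
            continue  # not a PID directory
        try:
            with open(f"{proc_root}/{entry}/attr/current", "rb") as f:
                # the kernel value may carry a trailing NUL or newline
                raw = f.read().rstrip(b"\x00\n").decode()
        except OSError:
            continue  # process exited or context unreadable
        if raw:
            yield int(entry), raw
```

On a host where SELinux is disabled this yields few or no usable contexts, which is exactly the "empty contexts" failure mode noted above.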

Minimal concrete example

$ ls -Z /var/www/html/index.html
system_u:object_r:httpd_sys_content_t:s0 /var/www/html/index.html

Common misconceptions

  • “SELinux context is just another permission string.” -> It is a label that drives policy, not a permission set.
  • “Type is optional.” -> Type is mandatory for TE decisions.
  • “Role matters for files.” -> Roles are primarily relevant to process domains.

Check-your-understanding questions

  1. Which field of the context is the key input to Type Enforcement?
  2. Why can two processes with the same UID behave differently under SELinux?
  3. What is label drift and how do you detect it?
  4. Predict the effect of labeling a daemon binary as unconfined_exec_t.

Check-your-understanding answers

  1. The type field.
  2. Because the process type (domain) can differ even for the same UID.
  3. Drift is when a file’s current label differs from the policy’s expected label; detect with matchpathcon vs getfilecon.
  4. The process may run in an unconfined domain, bypassing intended restrictions.

Real-world applications

  • Security audits that validate service confinement.
  • Compliance checks that detect mislabeled content.
  • Incident response to trace why a service can access sensitive files.

References

  • “SELinux System Administration” (Vermeulen), labeling and contexts chapters
  • “SELinux by Example” (Mayer et al.), TE basics
  • Red Hat SELinux Guide, context format section

Key insights

The type field of a context is the currency of SELinux policy; everything else is interpretation.

Summary

Security contexts uniquely label processes and objects. Parsing them correctly is the foundation for any SELinux tooling.

Homework/Exercises to practice the concept

  1. Use ps -eZ to list three different process domains and explain their roles.
  2. Find a file with default_t and explain why it is likely mislabeled.
  3. Use matchpathcon -V on a path and interpret the output.

Solutions to the homework/exercises

  1. Example: init_t for PID 1, sshd_t for SSH, httpd_t for Apache. Each domain is policy-isolated.
  2. default_t indicates no specific policy label matched the path; typically a mislabeled or unlabeled file.
  3. matchpathcon -V compares actual vs expected label and reports a mismatch if drift exists.

File Context Rules and Labeling Workflow

Fundamentals

SELinux labels for files are determined by file context rules, not by file ownership or permissions. These rules are path-based patterns stored in the policy, and tools like restorecon or setfiles apply them to actual files. When a file is created, it typically inherits the label of its parent directory, but this can be overridden by a more specific file context rule. Understanding this workflow is essential for detecting label drift and proposing safe fixes. Your tool must compute expected labels using policy rules and compare them to actual labels. This is the only safe way to decide whether relabeling is required. A simple directory move or tar extract can break labels, so the tool should help expose those differences and recommend semanage fcontext when non-standard paths are used.

Deep Dive into the concept

File labeling is a two-layer system: policy-defined defaults and runtime assignment. Policy stores regex-based mappings from path patterns to types. For example, /var/www(/.*)? might map to httpd_sys_content_t. The most specific matching rule wins, which means path precedence matters. When you run restorecon, it does not guess; it applies these mappings to the filesystem. If you use chcon for a quick fix, you set the label directly on the file, but that change is not persistent because a future relabel operation will revert it to the policy default. This is why semanage fcontext exists: it lets you define custom path mappings that become part of policy. The correct workflow is: identify drift, define a persistent fcontext rule, then relabel with restorecon.

A file’s label lives in extended attributes (security.selinux). This means the label is on the inode, not the pathname. When you move a file within the same filesystem, the label stays attached. When you copy a file, the label may or may not be preserved depending on the tool and flags. cp -a preserves xattrs; plain cp may not. When you mount a filesystem without SELinux xattr support or with context= mount options, labels can be lost or overridden. In those cases, everything can collapse to default_t, triggering widespread denials. Tools like fixfiles onboot and setfiles are used to relabel entire filesystems after such events.

In practical terms, label drift detection is a consistency check between the policy expectations and actual filesystem labels. The algorithm is: for each file of interest, run matchpathcon to compute expected label, then compare to getfilecon. If they differ, classify as drift. If the label is default_t or unlabeled_t, it is likely a policy or mount issue. Your tool can also detect when the expected label is itself undesirable because the path is outside standard directories. For example, an application installed in /opt/myapp should not be forced to use default_t; the better fix is to create a custom file context rule for /opt/myapp(/.*)? and label it with an app-specific type. This distinction is critical: relabeling to the default can sometimes make things worse if the default is not appropriate for the service.

Labeling is also tied to process transitions. The type of an executable determines which domain the process transitions into. If a binary is labeled bin_t instead of httpd_exec_t, the transition may not happen and the process runs in a less confined domain. Therefore, a label checker should include key executable paths and highlight mismatched labels. The explorer can also support a “safe fix” mode that prints commands to restore default labels without applying them. This respects the operational reality: relabeling production services can be disruptive if the policy expects different paths.

The stability of this workflow is an invariant: policy defines the expected label; restorecon applies it; drift is fixed by adjusting policy or relabeling. Your tool should never recommend chcon as the final answer. That is a temporary fix and violates the policy-driven nature of SELinux. This is also why your report should include the recommended semanage fcontext command when the desired label is not a default path, and then suggest restorecon -Rv to apply it.

How this fits into the projects

Label drift detection is central to §3.2 and §3.7. The workflow is reinforced in P06-file-context-integrity-checker.md where drift detection is expanded into a compliance tool.

Definitions & key terms

  • file context rule -> policy mapping from path pattern to type
  • matchpathcon -> tool to compute expected label for a path
  • restorecon -> tool to apply default labels to files
  • xattr -> extended attribute storing security.selinux
  • relabel -> reapply policy-defined labels to files

Mental model diagram

path -> policy regex -> expected label
path -> inode xattr  -> actual label
expected == actual ? ok : drift

How it works (step-by-step, with invariants and failure modes)

  1. For each file path, call matchpathcon to get expected label.
  2. Read actual label via getfilecon or ls -Z.
  3. Compare labels; if mismatch, record drift with both labels.
  4. If expected label is wrong for the app, recommend semanage fcontext rule.
  5. Use restorecon -v to apply policy label after rules are set.

Invariants: expected label is derived from policy; restorecon never invents labels. Failure modes: missing policy rules, xattr unsupported filesystem, permissions preventing label reads.
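One way to sketch this pipeline in Python is to shell out to matchpathcon for the expected label (its -n option prints only the context, without echoing the path) and read the security.selinux xattr directly for the actual label. The helper names are hypothetical, and the comparison itself is kept as a pure function so it can be tested without an SELinux host:

```python
import os
import subprocess

def expected_label(path: str) -> str:
    """Ask policy for the default label (assumes matchpathcon is installed)."""
    out = subprocess.run(
        ["matchpathcon", "-n", path],  # -n: print only the context
        capture_output=True, text=True, check=True,
    )
    return out.stdout.strip()

def current_label(path: str) -> str:
    """Read the on-disk label straight from the security.selinux xattr."""
    return os.getxattr(path, "security.selinux").rstrip(b"\x00").decode()

def classify_drift(current: str, expected: str):
    """Pure comparison step: None means the labels agree."""
    if current == expected:
        return None
    return {"current": current, "expected": expected}
```

A caller would combine them as `classify_drift(current_label(p), expected_label(p))` and record any non-None result as drift.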

Minimal concrete example

$ matchpathcon /srv/www/index.html
/srv/www/index.html system_u:object_r:default_t:s0

Common misconceptions

  • “chcon is the right fix.” -> It is not persistent.
  • “Moving a file changes its label.” -> It does not; labels are on the inode.
  • “restorecon is dangerous.” -> It only applies policy defaults.

Check-your-understanding questions

  1. Why is semanage fcontext preferable to chcon for production fixes?
  2. What happens to labels when you copy files without preserving xattrs?
  3. How does a mislabeled binary affect process domains?

Check-your-understanding answers

  1. semanage fcontext makes the mapping persistent across relabels; chcon does not.
  2. The label may be lost, resulting in default_t or incorrect types.
  3. The process may fail to transition into the intended domain, breaking confinement.

Real-world applications

  • Service deployment pipelines that enforce correct labels.
  • Compliance audits verifying that sensitive directories are labeled.
  • Automated remediation tools that fix drift safely.

References

  • “SELinux System Administration” (Vermeulen), file contexts chapter
  • Red Hat SELinux Guide, labeling tools

Key insights

Policy defines the expected label; your job is to align reality with that policy, not to bypass it.

Summary

File context rules map paths to labels. Drift detection is the comparison between policy expectations and actual xattrs.

Homework/Exercises to practice the concept

  1. Create a test directory under /srv/test and label it using semanage fcontext.
  2. Copy a file with and without -a and observe the label differences.
  3. Run restorecon -Rv on a directory and explain the changes.

Solutions to the homework/exercises

  1. Use semanage fcontext -a -t httpd_sys_content_t "/srv/test(/.*)?" then restorecon -Rv /srv/test.
  2. Without -a, labels may become default_t; with -a, labels are preserved.
  3. restorecon applies policy defaults and fixes drift.

Process Domains, Transitions, and Access Checks

Fundamentals

A process domain is the SELinux type assigned to a running process. Domains are the “subjects” in access decisions. When a process executes a binary, SELinux can transition it into a new domain if a transition rule exists and the binary is labeled as an entrypoint type. This matters because the same Unix user can execute different binaries that end up in different SELinux domains, each with different privileges. Your explorer needs to show domain-to-resource relationships so that users can see which domains touch which files. This helps diagnose why a service is denied when it is running under an unexpected domain, such as init_t or unconfined_t.

Deep Dive into the concept

Domain transitions are the heart of SELinux confinement. The policy specifies that when a process of type init_t executes a binary labeled httpd_exec_t, the resulting process transitions to httpd_t. This is controlled by type_transition rules and the entrypoint permission on the executable. If a binary is mislabeled, the transition may not occur, and the process remains in its caller domain. This is a common source of “why is my service unconfined” issues. Another subtlety is that domain transitions can be conditional on booleans, on roles, or on MLS levels, which means that the same binary can behave differently under different system configurations.

Access checks are performed by LSM hooks on each relevant syscall. For file access, SELinux checks a combination of class and permission: file class operations include read, write, append, execute, and getattr. For directories, the permissions include search, add_name, remove_name, and write. This is why a process might be able to read a file but not traverse a directory. Your explorer does not evaluate these checks, but it should be aware that access is per-class. When you render a mapping like “httpd_t reads httpd_sys_content_t”, you are collapsing a complex set of per-class permissions. In a more advanced version, you could categorize object types by class or permission, but for this beginner project the relationship map is enough.

Another key idea is that domains can be in permissive mode individually. SELinux supports per-domain permissive settings, which means httpd_t can be permissive while the system is enforcing. This can mask problems if you assume system-wide enforcement is the whole story. Your tool can optionally check for permissive domains via semanage permissive -l and report them in the summary. That is a real-world reliability practice.

Processes can also transition via exec from one domain to another inside a service. For example, a CGI script executed by httpd_t can transition into httpd_sys_script_t under some policies. This is why mapping process domains to files is meaningful: the domain influences which file types the process can access, and transitions are the mechanism that changes domains during program execution. Your explorer should capture the domain of each PID and link it to the resources it touches, not just the service name.

Finally, consider that processes can be labeled incorrectly due to path labeling errors, incorrect service files, or policy misconfiguration. A systemd unit that uses a custom binary path without proper labeling might run in init_t, which is often far more permissive than intended. Your tool should highlight “unexpected domains” by comparing the process domain to known service domain types. Even a simple rule set like “httpd should be httpd_t” helps catch these misconfigurations early.
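A rule set like "httpd should be httpd_t" can be as simple as a lookup table. In this sketch, EXPECTED_DOMAINS is a hypothetical site-specific baseline maintained by the operator, not something the policy provides:

```python
# Hypothetical baseline of service name -> expected domain; extend per site.
EXPECTED_DOMAINS = {
    "httpd": "httpd_t",
    "sshd": "sshd_t",
}

def unexpected_domain(process_name: str, domain: str) -> bool:
    """Flag a process whose live domain disagrees with the baseline.

    Returns False for processes we have no expectation for, so the
    check stays quiet instead of producing noise for unknown services.
    """
    expected = EXPECTED_DOMAINS.get(process_name)
    return expected is not None and domain != expected
```

For example, an httpd process observed in init_t would be flagged, while a process the table does not know about is ignored.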

Operational expansion for Process Domains, Transitions, and Access Checks: In real systems, the behavior you observe is the product of policy, labels, and runtime state. That means your investigation workflow must be repeatable. Start by documenting the exact inputs (contexts, paths, users, domains, ports, and the action attempted) and the exact outputs (audit events, error codes, and any policy query results). Then, replay the same action after each change so you can attribute cause and effect. When the concept touches multiple subsystems, isolate variables: change one label, one boolean, or one rule at a time. This reduces confusion and prevents accidental privilege creep. Use staging environments or fixtures to test fixes before deploying them widely, and always keep a rollback path ready.

To deepen understanding, connect Process Domains, Transitions, and Access Checks to adjacent concepts: how it affects policy decisions, how it appears in logs, and how it changes operational risk. Build small verification scripts that assert the expected outcome and fail loudly if the outcome diverges. Over time, these scripts become a regression suite for your SELinux posture. Finally, treat the concept as documentation-worthy: write down the invariants it guarantees, the constraints it imposes, and the exact evidence that proves it works. This makes future debugging faster and creates a shared mental model for teams.

How this fits into the projects

Domain collection and the domain-to-object map are implemented in §3.2 Functional Requirements and displayed in §3.7 Real World Outcome. The same domain model is reused in P02-avc-denial-analyzer-auto-fixer.md to interpret the source context of AVC events.

Further depth on Process Domains, Transitions, and Access Checks: In production environments, this concept is shaped by policy versions, automation layers, and distro-specific defaults. To keep reasoning consistent, capture a minimal evidence bundle every time you analyze behavior: the policy name/version, the exact labels or contexts involved, the command that triggered the action, and the resulting audit event. If the same action yields different decisions on two hosts, treat that as a signal that a hidden variable changed (boolean state, module priority, label drift, or category range). This disciplined approach prevents trial-and-error debugging and makes your conclusions defensible.

Operationally, build a short checklist for Process Domains, Transitions, and Access Checks: verify prerequisites, verify labels or mappings, verify policy query results, then run the action and confirm the expected audit outcome. Track metrics that reflect stability, such as the count of denials per hour, the number of unique denial keys, or the fraction of hosts in compliance. When you must change behavior, apply the smallest change that can be verified (label fix before boolean, boolean before policy). Document the rollback path and include a post-change validation step so the system returns to a known-good state.

Definitions & key terms

  • domain -> SELinux type of a running process (the subject in access checks)
  • domain transition -> policy-driven change of domain when a process executes a binary
  • entrypoint -> permission marking an executable type as a valid entry into a domain
  • type_transition -> policy rule that selects the new domain on exec
  • permissive domain -> a single domain exempted from enforcement while the system stays enforcing

Mental model diagram

caller domain: init_t
        | exec binary labeled httpd_exec_t
        v
type_transition rule + entrypoint permission
        |
        v
new domain: httpd_t

How it works (step-by-step, with invariants and failure modes)

  1. Read each PID's domain from ps -eZ or /proc/<pid>/attr/current.
  2. Compare the observed domain against the expected domain for known services.
  3. Flag processes running in init_t or unconfined_t that should be confined.
  4. Optionally list permissive domains via semanage permissive -l and include them in the summary.
  5. Link each domain to the object types it touches to build the relationship map.

Invariants: a transition requires both a type_transition rule and the entrypoint permission on the executable's type; without them the process stays in the caller's domain. Failure modes: mislabeled entrypoint binaries, custom binary paths in service units, per-domain permissive settings that mask denials.

Minimal concrete example

$ ps -eZ | grep httpd
system_u:system_r:httpd_t:s0    2134 ?        00:00:01 httpd

Common misconceptions

  • “The Unix UID determines what a process can do.” -> The SELinux domain does; two processes with the same UID can have different access profiles.
  • “Enforcing mode means every domain is enforced.” -> Individual domains can be set permissive while the system enforces.
  • “A service always runs in its service domain.” -> A mislabeled entrypoint leaves it in the caller’s domain, often init_t.

Check-your-understanding questions

  1. Which two policy elements are required for a domain transition on exec?
  2. Why can a systemd service end up running in init_t?
  3. How can a denial be masked on a system that is globally enforcing?

Check-your-understanding answers

  1. A type_transition rule and the entrypoint permission on the executable’s type.
  2. Its binary is mislabeled (for example bin_t instead of httpd_exec_t), so no transition occurs and the process keeps the caller’s domain.
  3. The relevant domain may be individually permissive; semanage permissive -l reveals these.

Real-world applications

  • Verifying that services run in their intended confined domains.
  • Detecting unconfined or permissive domains during security audits.
  • Tracing how a mislabeled entrypoint broke confinement during incident response.

References

  • “SELinux by Example” (Mayer et al.), domain transitions
  • Red Hat SELinux Guide, process domains and transitions

Key insights

A transition is a contract between a caller domain, an executable label, and a target domain; mislabel the executable and the contract silently fails.

Summary

Process domains are the subjects of SELinux decisions. Transitions on exec give each service its own confinement, and your explorer makes unexpected domains visible.

Homework/Exercises to practice the concept

  1. Use ps -eZ to find a process running in unconfined_t and explain why.
  2. Check the label of a daemon binary (for example /usr/sbin/sshd) with ls -Z and name its entrypoint type.
  3. List permissive domains with semanage permissive -l and interpret the output.

Solutions to the homework/exercises

  1. Under targeted policy, interactive user shells and their children typically run in unconfined_t by design.
  2. It should carry an exec type such as sshd_exec_t; that label is what triggers the transition into sshd_t.
  3. Any listed domain logs denials but is not enforced, even though the system is in enforcing mode.

3. Project Specification

3.1 What You Will Build

A CLI tool named selctx that scans processes, files, and ports, normalizes SELinux context data, detects label drift, and renders a readable ASCII map of domain-to-object relationships.

Included features:

  • Process context scan and grouping by domain
  • File context scan for configured paths (recursively)
  • Port type listing and basic checks
  • Label drift detection using matchpathcon
  • ASCII relationship map and JSON output

Excluded features:

  • Full policy evaluation or allow-rule derivation
  • Automated policy module generation
  • GUI interface (CLI only for this project)

3.2 Functional Requirements

  1. Context Collection: Collect process contexts from ps -eZ or /proc and file contexts from getfilecon.
  2. Normalization: Parse contexts into structured fields: user, role, type, level.
  3. Mapping: Build a mapping from process type -> object types accessed or present.
  4. Drift Detection: Compare current vs expected labels for a given path using matchpathcon.
  5. Reporting: Render an ASCII summary and output JSON for scripting.
  6. Filtering: Support filters by process name, domain, or path prefix.
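Requirement 6 (filtering) can be sketched against the schema shape from §3.5. The function and parameter names here are illustrative; the point is that filtering is a pure transformation of the domain map, which keeps it easy to test:

```python
def filter_domain_map(domain_map: dict, domain=None, path_prefix=None) -> dict:
    """Narrow a domain -> objects map by domain name and/or path prefix.

    domain_map shape (assumption, matching the §3.5 schema):
      {"httpd_t": {"objects": {"httpd_log_t": ["/var/log/httpd/access_log"]}}}
    """
    out = {}
    for dom, info in domain_map.items():
        if domain is not None and dom != domain:
            continue  # domain filter excludes this entry entirely
        kept = {}
        for obj_type, paths in info.get("objects", {}).items():
            matched = [p for p in paths
                       if path_prefix is None or p.startswith(path_prefix)]
            if matched:
                kept[obj_type] = matched
        out[dom] = {**info, "objects": kept}
    return out
```

Because the filter never mutates its input, the same scan result can be rendered several times with different filters.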

3.3 Non-Functional Requirements

  • Performance: Handle 10k files in under 5 seconds on a VM.
  • Reliability: Continue scanning even if individual paths are unreadable.
  • Usability: Provide clear remediation suggestions and exit codes.

3.4 Example Usage / Output

$ selctx map --process httpd --paths /var/www,/var/log/httpd

Process Domain: httpd_t
  Reads:
    httpd_sys_content_t  -> /var/www/html/index.html
  Writes:
    httpd_log_t          -> /var/log/httpd/access_log
  Denied (from audit cache):
    shadow_t             -> /etc/shadow

Drift Summary:
  /var/www/app.conf
    current:  system_u:object_r:default_t:s0
    expected: system_u:object_r:httpd_sys_content_t:s0

3.5 Data Formats / Schemas / Protocols

JSON output schema (v1):

{
  "version": "1.0",
  "scanned_at": "2026-01-01T00:00:00Z",
  "domains": {
    "httpd_t": {
      "processes": [2134, 2135],
      "objects": {
        "httpd_sys_content_t": ["/var/www/html/index.html"],
        "httpd_log_t": ["/var/log/httpd/access_log"]
      },
      "drift": [
        {
          "path": "/var/www/app.conf",
          "current": "system_u:object_r:default_t:s0",
          "expected": "system_u:object_r:httpd_sys_content_t:s0"
        }
      ]
    }
  }
}
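Serializing this schema deterministically is worth doing from day one, since stable output makes test diffs and operator comparisons trivial. A minimal sketch (function name illustrative):

```python
import json

def render_json(domains: dict, scanned_at: str) -> str:
    """Emit the v1 schema with stable key order.

    sort_keys guarantees the same scan always produces byte-identical
    output, so golden-file tests can compare strings directly.
    """
    doc = {"version": "1.0", "scanned_at": scanned_at, "domains": domains}
    return json.dumps(doc, sort_keys=True, indent=2)
```

The caller supplies scanned_at explicitly rather than reading the clock inside the renderer, which keeps the function deterministic for fixtures.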

3.6 Edge Cases

  • SELinux disabled or in permissive mode
  • Filesystems without xattr support (labels missing)
  • Huge directories with millions of files
  • Processes exiting during scan
  • Paths containing spaces or unusual characters

3.7 Real World Outcome

A reliable, deterministic CLI report that operators can compare to expected policy behavior.

3.7.1 How to Run (Copy/Paste)

# From the project root
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

# Example scan
./selctx map --process httpd --paths /var/www,/var/log/httpd --format text

3.7.2 Golden Path Demo (Deterministic)

Use a fixed input fixture:

  • Sample ps -eZ output stored in fixtures/ps_eZ.txt
  • Sample ls -Z output stored in fixtures/ls_Z.txt

Run:

./selctx map --fixture fixtures/ps_eZ.txt --fixture-files fixtures/ls_Z.txt --format text

Expected summary: 1 domain, 2 object types, 0 drift entries.
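Loading the ps -eZ fixture can be sketched as below. The column layout (LABEL PID TTY TIME CMD, with a header row) is an assumption about how the fixture was captured; short or malformed lines are silently skipped so the demo stays deterministic:

```python
def parse_ps_eZ_fixture(text: str) -> list:
    """Parse fixture lines shaped like ps -eZ output.

    Assumed shape: a header row, then one row per process with
    columns LABEL PID TTY TIME CMD. Splitting at most 4 times keeps
    any spaces in the command column intact.
    """
    rows = []
    for line in text.splitlines()[1:]:  # skip the header row
        parts = line.split(None, 4)
        if len(parts) == 5:
            label, pid, _tty, _time, cmd = parts
            if pid.isdigit():
                rows.append({"context": label, "pid": int(pid), "cmd": cmd})
    return rows
```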

3.7.3 CLI Transcript (Success and Failure)

$ ./selctx drift /srv/www
MISMATCH: /srv/www/app.conf
  current:  system_u:object_r:default_t:s0
  expected: system_u:object_r:httpd_sys_content_t:s0

Exit code: 0

$ ./selctx drift /does/not/exist
ERROR: path not found: /does/not/exist
Exit code: 2

3.7.4 If CLI: exit codes

  • 0 success
  • 1 partial success (scan completed with unreadable paths)
  • 2 invalid input/path
  • 3 SELinux disabled or labels unavailable

4. Solution Architecture

4.1 High-Level Design

            +-------------------+
            |   CLI Frontend    |
            +---------+---------+
                      |
                      v
+---------------------+---------------------+
| Context Collector & Normalizer            |
|  - process scan   - file scan   - ports   |
+---------------------+---------------------+
                      |
                      v
+---------------------+---------------------+
| Drift Analyzer & Policy Matcher           |
+---------------------+---------------------+
                      |
                      v
+---------------------+---------------------+
| Report Renderer (ASCII + JSON)            |
+-------------------------------------------+

4.2 Key Components

Component     | Responsibility                     | Key Decisions
Collector     | Gather contexts from system tools  | Use ps -eZ and getfilecon for portability
Normalizer    | Parse contexts into fields         | Strict parser with fallback for malformed entries
Drift Checker | Compare expected vs actual labels  | Use matchpathcon for policy truth
Renderer      | Produce ASCII map and JSON         | Keep output deterministic for tests

4.3 Data Structures (No Full Code)

# Core data model
DomainMap = {
  "domain_type": {
    "pids": [123, 456],
    "objects": {"object_type": ["/path", "/path2"]},
    "drift": [
      {"path": "/path", "current": "...", "expected": "..."}
    ]
  }
}
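
The same shape can be filled from fixture data. A sketch that groups PIDs by domain from ps -eZ-style rows (header `LABEL PID TTY TIME CMD`); the function name is illustrative:

```python
from collections import defaultdict

def build_domain_map(ps_lines):
    """Group PIDs by process domain (the type field of each ps -eZ row)."""
    domain_map = defaultdict(lambda: {"pids": [], "objects": {}, "drift": []})
    for line in ps_lines:
        fields = line.split()
        if len(fields) < 5 or fields[0] == "LABEL":      # skip header and short rows
            continue
        parts = fields[0].split(":", 3)                  # user:role:type:level
        if len(parts) != 4 or not fields[1].isdigit():   # tolerate malformed rows
            continue
        domain_map[parts[2]]["pids"].append(int(fields[1]))  # parts[2] is the type
    return dict(domain_map)
```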

4.4 Algorithm Overview

Key Algorithm: Drift Detection

  1. Normalize path list and skip duplicates.
  2. For each path, compute expected label via matchpathcon.
  3. Read actual label via getfilecon.
  4. Compare and record mismatches.
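
The four steps above can be sketched with the label lookups injectable, so fixtures can stand in for a live system. This assumes matchpathcon prints `<path>  <context>` per line and that the kernel stores the live label in the NUL-terminated security.selinux xattr; the function names are illustrative:

```python
import os
import subprocess

def expected_label(path):
    # matchpathcon prints "<path>  <context>"; the context is the last field
    out = subprocess.run(["matchpathcon", path], capture_output=True,
                         text=True, check=True).stdout
    return out.split()[-1]

def current_label(path):
    # the live label lives in the security.selinux xattr, usually NUL-terminated
    return os.getxattr(path, "security.selinux").decode().rstrip("\x00")

def detect_drift(paths, expected=expected_label, current=current_label):
    seen, drift = set(), []
    for path in paths:
        path = os.path.normpath(path)
        if path in seen:                     # step 1: skip duplicates
            continue
        seen.add(path)
        cur, exp = current(path), expected(path)   # steps 2-3
        if cur != exp:                       # step 4: full-string compare keeps MLS diffs
            drift.append({"path": path, "current": cur, "expected": exp})
    return drift
```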

Complexity Analysis:

  • Time: O(n) in number of scanned files
  • Space: O(n) for stored results

5. Implementation Guide

5.1 Development Environment Setup

# Fedora/RHEL-based
sudo dnf install -y policycoreutils-python-utils libselinux-utils

# Debian/Ubuntu with SELinux
sudo apt-get install -y selinux-utils selinux-basics policycoreutils

5.2 Project Structure

selctx/
├── selctx/
│   ├── __init__.py
│   ├── cli.py
│   ├── collect.py
│   ├── parse.py
│   ├── drift.py
│   └── render.py
├── fixtures/
├── tests/
└── README.md

5.3 The Core Question You’re Answering

“What does SELinux actually see when it makes an allow or deny decision?”

5.4 Concepts You Must Understand First

Stop and research these before coding:

  1. Security context fields and how to parse them.
  2. File context rules vs actual labels (matchpathcon vs ls -Z).
  3. Process domains and domain transitions.

5.5 Questions to Guide Your Design

  1. Which system tools are stable enough to parse in production?
  2. How will you keep output deterministic for tests?
  3. What is your minimal JSON schema for scripting?

5.6 Thinking Exercise

Draw a small graph with three domains and three object types. Label which accesses are expected and which are suspicious. Compare that graph to the output you want selctx to produce.

5.7 The Interview Questions They’ll Ask

  1. “What field in a context determines TE decisions?”
  2. “How do you detect a mislabeled file?”
  3. “Why is restorecon safer than chcon?”
  4. “What is a domain transition?”

5.8 Hints in Layers

Hint 1: Start with ps -eZ parsing

Hint 2: Add matchpathcon -V for drift

Hint 3: Render a simple ASCII tree before adding colors or JSON

5.9 Books That Will Help

Topic Book Chapter
Contexts and labeling “SELinux System Administration” Contexts, labeling chapters
TE basics “SELinux by Example” Ch. 2-3
Linux audit “The Linux Programming Interface” Security/audit sections

5.10 Implementation Phases

Phase 1: Foundation (2-3 hours)

Goals:

  • Parse contexts from ps -eZ and ls -Z.
  • Normalize into structured records.

Tasks:

  1. Implement context parser with strict validation.
  2. Build a small sample dataset from fixtures.

Checkpoint: Parser correctly splits 50 sample contexts without errors.
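
A strict parser for this checkpoint might look like the sketch below. The `maxsplit` is the important detail: MLS levels such as `s0-s0:c0.c1023` contain colons themselves, so a naive `split(":")` would shred them. The name `parse_context` is illustrative:

```python
def parse_context(ctx):
    """Split user:role:type:level strictly; reject short or empty fields."""
    parts = ctx.strip().split(":", 3)   # level may itself contain ':' (MLS ranges)
    if len(parts) != 4 or not all(parts):
        raise ValueError(f"malformed context: {ctx!r}")
    user, role, type_, level = parts
    return {"user": user, "role": role, "type": type_, "level": level}
```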

Phase 2: Core Functionality (3-5 hours)

Goals:

  • Implement drift detection and mapping.
  • Render ASCII output.

Tasks:

  1. Add matchpathcon integration.
  2. Build domain->object graph and renderer.

Checkpoint: Output matches the golden path example.
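
Sorting every collection before printing is what makes this checkpoint achievable. A minimal renderer sketch over the DomainMap shape from §4.3 (the function name is illustrative):

```python
def render_map(domain_map):
    """Deterministic ASCII tree: sort domains, types, and paths for stable diffs."""
    lines = []
    for domain in sorted(domain_map):
        entry = domain_map[domain]
        pids = ", ".join(str(p) for p in sorted(entry["pids"]))
        lines.append(f"{domain} (pids: {pids})")
        for obj_type in sorted(entry["objects"]):
            lines.append(f"  └─ {obj_type}")
            for path in sorted(entry["objects"][obj_type]):
                lines.append(f"       {path}")
    return "\n".join(lines)
```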

Phase 3: Polish & Edge Cases (1-2 hours)

Goals:

  • Handle missing labels and errors.
  • Add JSON output and exit codes.

Tasks:

  1. Implement structured errors and exit codes.
  2. Add unit tests for malformed contexts.

Checkpoint: All tests pass, and error output is consistent.

5.11 Key Implementation Decisions

Decision Options Recommendation Rationale
Data source for processes ps -eZ vs /proc ps -eZ Stable output and easy parsing
Drift comparison direct string compare vs normalized type compare direct string compare Captures MLS differences
Output format text only vs text + JSON text + JSON Enables automation

6. Testing Strategy

6.1 Test Categories

Category Purpose Examples
Unit Tests Validate parsing context parser, label parser
Integration Tests Run against fixtures drift detection on known data
Edge Case Tests Missing labels, invalid paths ensure graceful errors

6.2 Critical Test Cases

  1. Malformed context: missing fields should produce parse error and not crash.
  2. Label drift: mismatch should be reported with both current and expected labels.
  3. SELinux disabled: tool should exit with code 3 and a clear message.

6.3 Test Data

fixtures/ps_eZ.txt
fixtures/ls_Z.txt
fixtures/matchpathcon.txt

7. Common Pitfalls & Debugging

7.1 Frequent Mistakes

Pitfall Symptom Solution
Parsing contexts by naive split Incorrect levels or types Split into 4 fields, preserve level string
Using chcon for fixes Fixes revert after relabel Use semanage fcontext + restorecon
Assuming SELinux enabled Empty contexts Check sestatus and exit early
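
The third pitfall is worth guarding in code at startup. A hedged sketch that prefers the libselinux Python binding when installed and falls back to checking for the selinuxfs mount, which exists only when SELinux is enabled (the function name is illustrative):

```python
import os

def selinux_available():
    """Return True when SELinux labels can be queried on this host."""
    try:
        import selinux                        # libselinux Python binding, if installed
        return bool(selinux.is_selinux_enabled())
    except ImportError:
        # fallback: the selinuxfs mount is the usual signal on modern kernels
        return os.path.isdir("/sys/fs/selinux")
```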

7.2 Debugging Strategies

  • Use fixtures first: avoid live system variability when testing.
  • Compare matchpathcon vs ls -Z: isolate drift vs policy.

7.3 Performance Traps

  • Recursing huge directories without pruning can make the tool unusable.
  • Repeated calls to matchpathcon can be slow; cache results when possible.
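
One way to apply the caching advice, with the subprocess runner injectable so the cache can be exercised in tests without matchpathcon installed (the names are illustrative):

```python
import functools
import subprocess

def make_expected_label(run=subprocess.run):
    """Build a per-path cached lookup; repeat paths never re-invoke matchpathcon."""
    @functools.lru_cache(maxsize=4096)
    def expected_label(path):
        out = run(["matchpathcon", path], capture_output=True,
                  text=True, check=True).stdout
        return out.split()[-1]   # matchpathcon prints "<path>  <context>"
    return expected_label
```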

8. Extensions & Challenges

8.1 Beginner Extensions

  • Add CSV output for spreadsheet analysis.
  • Add a filter for specific domains.

8.2 Intermediate Extensions

  • Integrate with ausearch to show recent denials per domain.
  • Add a port label report via semanage port -l.

8.3 Advanced Extensions

  • Visualize relationships as a DOT graph.
  • Add cross-host diffing with SSH.

9. Real-World Connections

9.1 Industry Applications

  • Security audits: validate that web services are confined correctly.
  • Compliance: detect label drift in regulated environments.

9.2 Related Tools

  • setroubleshoot: illustrates how SELinux diagnostics can be automated.
  • selinux-python: provides bindings used for context queries.

9.3 Interview Relevance

  • SELinux troubleshooting workflows
  • Understanding of TE and labeling

10. Resources

10.1 Essential Reading

  • “SELinux System Administration” by Sven Vermeulen (labeling and contexts)
  • “SELinux by Example” by Mayer et al. (TE fundamentals)

10.2 Video Resources

  • Red Hat SELinux basics (conference talk)
  • Linux Security Modules overview (kernel summit session)

10.3 Tools & Documentation

  • matchpathcon, restorecon, semanage
  • ps -eZ, ls -Z

11. Self-Assessment Checklist

11.1 Understanding

  • I can explain the context format and which field drives TE.
  • I can explain label drift and how to fix it correctly.
  • I can describe why a mislabeled binary changes a domain.

11.2 Implementation

  • All functional requirements are met.
  • JSON output matches the schema.
  • Edge cases are handled with correct exit codes.

11.3 Growth

  • I can explain this tool to a teammate and defend design choices.
  • I documented lessons learned about labeling.

12. Submission / Completion Criteria

Minimum Viable Completion:

  • Parses and prints contexts for processes and files.
  • Detects label drift for a given path.
  • Produces deterministic output from fixtures.

Full Completion:

  • All minimum criteria plus JSON output and filtering.
  • Handles SELinux-disabled and xattr-missing cases gracefully.

Excellence (Going Above & Beyond):

  • Adds DOT graph output and denials integration.
  • Includes cross-host diffing and a drift report summary.

13. Additional Content Rules (Hard Requirements)

13.1 Determinism

  • Use fixtures for tests.
  • Freeze timestamps in output when --fixture is used.

13.2 Outcome Completeness

  • Provide at least one success and one failure demo.
  • Define CLI exit codes and error messages.

13.3 Cross-Linking

  • Concepts link to §3.2, §3.7, §5.10, and related projects.

13.4 No Placeholder Text

  • All sections contain concrete, actionable content.