⚡ This product was generated with Kupkaike in under 4 minutes
A complete, chapter-by-chapter guide to codifying your home lab with Terraform — built exclusively for Proxmox, pfSense, TrueNAS, and local infrastructure, not AWS. Go from a fragile, manually-configured lab to a version-controlled stack you can fully rebuild in under 30 minutes.

Generated by Claude Opus 4.6. Real content, unedited.
You already know enough Terraform to be frustrated. You've worked through the official docs, watched the courses, and understand the syntax — but every single example provisions an EC2 instance or an Azure resource group. When you try to apply those patterns to your Proxmox cluster, your pfSense VLANs, or your TrueNAS shares, you hit a wall. The providers are different, the state backend options change, the networking model doesn't map cleanly, and there's no Stack Overflow thread that covers your exact setup. So your lab stays what it's always been: a collection of manually configured machines that you secretly dread having to rebuild.
This guide was written from the ground up for home lab infrastructure — not adapted from enterprise cloud patterns. Every chapter addresses the specific gaps that local infrastructure creates: managing Terraform state without S3 or Terraform Cloud, working with self-signed TLS and split-DNS, provisioning VLANs through pfSense and OPNsense providers, templating VMs in Proxmox without cloud-init shortcuts, and structuring a modular codebase that grows with your lab without becoming a maintenance burden. Every code pattern in the book has been tested against real consumer hardware, not hypothetical cloud resources billed by the minute.
The blueprint covers eight chapters that take you from auditing your existing 'snowflake' lab through full CI/CD-driven rebuilds: compute codification for VMs and containers, networking as code for VLANs and firewall rules, storage and service automation for TrueNAS and reverse proxies, and a final chapter dedicated to the 30-minute rebuild workflow. You also get three practical bonuses — a ready-to-clone Proxmox + pfSense starter repository with annotated working configs, a single-page cheat sheet mapping common lab tasks to exact Terraform resource blocks, and a step-by-step import guide for bringing 20 existing manually-created resources into state without downtime. The outcome is a git-committed, reproducible lab you can tear down and rebuild confidently — not someday, but the next time you need to.
---
You know exactly what your lab does. You built it piece by piece, and every weird hostname, hand-edited `/etc/network/interfaces`, and VLAN tag lives in your head. That's precisely the problem — and this chapter is how you get it out of your head and into a format Terraform can actually consume.
Before you write a single `.tf` file, you need a complete picture of what exists. Most people skip this and start coding their most familiar VM first. Six hours later, they've got a Terraform resource for their Nginx reverse proxy that silently depends on a DNS entry, a static IP assignment in their router's DHCP reservation table, and a Proxmox tag that controls firewall rules — none of which are in code yet. The whole thing collapses the moment they run `terraform destroy`.
The Lab Topology Capture Method (LTCM) is a five-phase audit process that produces a single inventory document: your Terraform-ready resource map. Think of it as the bill of materials for your lab.
---
Phase 1: Enumerate Every Compute Resource
Log into every hypervisor and list every running and stopped VM and container. Don't filter — a stopped VM is still state. For each resource, capture:

- VMID and name
- Type (VM or LXC container) and host node
- vCPU, RAM, and disk allocations
- The storage pool backing each disk
- Network bridge, VLAN tag, and IP address
- Guest OS
On Proxmox, run `qm list` and `pct list` on each node. Pipe the output into a file. For ESXi, use `vim-cmd vmsvc/getallvms`. Don't trust the GUI alone — it lies by omission.
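The capture-and-parse step can be sketched in shell. The node names and the SSH loop are illustrative; the parsing is shown against an embedded sample of `qm list` output so you can see the shape of the resulting CSV:

```shell
# Illustrative capture loop (node names are examples):
# for node in pve-node01 pve-node02; do
#   ssh root@$node 'qm list; pct list' > "inventory-$node.txt"
# done

# Turn a captured `qm list` dump into CSV rows for Tab 1.
# A sample dump is embedded here so the parsing is reproducible:
cat > qm-list-sample.txt <<'EOF'
      VMID NAME                 STATUS     MEM(MB)    BOOTDISK(GB) PID
       101 dns-01               running    1024              20.00 1234
       105 gitea                stopped    2048              32.00 0
EOF

# Skip the header, emit VMID,name,status — the seed of your compute inventory
awk 'NR>1 {printf "%s,%s,%s\n", $1, $2, $3}' qm-list-sample.txt > compute-inventory.csv
cat compute-inventory.csv
```

Extend the `printf` with more fields as you fill out the remaining Tab 1 columns.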
---
Phase 2: Map Your Network Topology
Draw (or document) every network segment that exists. This means:

- Every VLAN ID and its purpose
- The subnet and gateway for each segment
- DHCP ranges and static reservations
- The DNS server each segment uses
- Firewall zones and the rules between them
If you're running pfSense or OPNsense, export your configuration XML right now. That file contains your entire network state. You'll reference it constantly.
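As a sketch of what you can pull straight out of that export, the XML below mirrors the `<vlans>` section of a pfSense `config.xml` with example values, and a one-liner extracts every VLAN tag:

```shell
# Sample of the <vlans> section from a pfSense config.xml export
# (structure mirrors pfSense's format; values are examples)
cat > config-sample.xml <<'EOF'
<vlans>
  <vlan>
    <if>igb1</if>
    <tag>10</tag>
    <descr>Trusted</descr>
  </vlan>
  <vlan>
    <if>igb1</if>
    <tag>20</tag>
    <descr>IoT</descr>
  </vlan>
</vlans>
EOF

# List every VLAN tag defined in the export
grep -o '<tag>[0-9]*</tag>' config-sample.xml | sed 's/<[^>]*>//g'
```

The same pattern works for `<descr>`, DHCP ranges, and firewall aliases — the export is your ground truth for Tab 2.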
---
Phase 3: Inventory Storage Targets
For each storage pool, document:

- Pool name and type (ZFS, LVM, NFS, iSCSI, directory)
- Which nodes can see it
- Total and free capacity
- NFS/iSCSI export paths, where applicable
- Which VMs and containers use it
This matters because Terraform's Proxmox provider (`bpg/proxmox`) requires you to specify `datastore_id` per disk. If you have VMs split across `local-zfs`, `nas-nfs`, and `local-lvm`, you need to know which is which before you write a single resource block.
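To see why, here is a hedged sketch of a `bpg/proxmox` VM with disks split across two pools — names and sizes are illustrative, and each disk must name its pool explicitly:

```hcl
resource "proxmox_virtual_environment_vm" "gitea" {
  name      = "gitea"
  node_name = "pve-node01"

  disk {
    datastore_id = "local-zfs" # OS disk on node-local ZFS
    interface    = "scsi0"
    size         = 32
  }

  disk {
    datastore_id = "nas-nfs"   # data disk on the NFS-backed pool
    interface    = "scsi1"
    size         = 100
  }
}
```

If your audit misses the second pool, your first `terraform plan` after import will try to move or recreate that disk.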
---
Phase 4: Catalog Services and Their Dependencies
List every service your lab runs — not VMs, services. Nginx, Gitea, Home Assistant, Vaultwarden, Grafana, whatever you've got. For each service:

- The VM or LXC that hosts it
- The IP and port it listens on
- What it depends on (DNS, databases, reverse proxies, other services)
- Its DNS entry, if any
- Where its data lives
This phase reveals your implicit state — the configuration that exists only in a web GUI or a sysadmin's memory. That OPNsense HAProxy rule pointing to `192.168.10.45`? That's implicit state. The Proxmox firewall alias called `trusted_hosts` that you built in the UI two years ago? Implicit state. Every piece of implicit state is a rebuild failure waiting to happen, and every one of them needs to become a Terraform resource or a documented manual step.
---
Phase 5: Document Provider Access and Credentials
For every platform you'll manage with Terraform, you need:

- The API endpoint URL
- The auth method it supports (API token, key pair, user/password)
- The token or dedicated user you'll create for Terraform
- The minimum permissions that token needs
- Whether its TLS certificate is valid or self-signed
Write these down in a `providers.md` file in your future repo. You'll reference this when configuring your `provider` blocks and when setting up your `terraform.tfvars` or environment variables for secrets.
---
Marcus runs a three-node Proxmox cluster with 14 VMs, 6 LXC containers, and a TrueNAS box on a dedicated storage VLAN. He starts the LTCM audit and immediately discovers three problems he didn't know he had.
First, his Gitea VM (`vmid 105`) has its disk split across two storage pools — the OS disk is on `local-zfs` on `pve-node01`, but a secondary data disk was manually migrated to `truenas-nfs` six months ago and never documented. His mental model of that VM was wrong.
Second, his monitoring stack (Prometheus on `vmid 108`, Grafana on `vmid 109`) depends on a static DNS entry in Pi-hole that points to `192.168.20.15` — but that IP is a DHCP reservation in OPNsense, not a static assignment in the VM itself. If he rebuilds the VM and the MAC address changes (which it will, unless he specifies it in Terraform), the reservation breaks, Prometheus loses its scrape target, and Grafana shows nothing.
Third, his Proxmox firewall has an IP set called `mgmt_hosts` that controls SSH access to all VMs. It was created in the GUI. It has no corresponding Terraform resource. If he runs `terraform apply` on a fresh cluster, SSH to every VM will be open to his entire LAN until he manually recreates that IP set.
The LTCM audit surfaces all three issues before Marcus writes a single `.tf` file. He adds the split-disk VM to his Tier 2 codification list (complex, needs careful handling), documents the DNS/DHCP dependency chain, and flags the firewall IP set as Tier 1 implicit state that must be codified before anything else touches the firewall.
---
Use this as a multi-tab document (Google Sheets, Excel, or a Markdown table set). Fill in every row before moving to Chapter 2.
---
Tab 1 — Compute Inventory
| VMID | Name | Type | Node | vCPU | RAM (GB) | Disk (GB) | Storage Pool | Bridge | VLAN | IP Address | OS | Tier |
|------|------|------|------|------|----------|-----------|--------------|--------|------|------------|-----|------|
| `___` | `___` | VM/LXC | `___` | `___` | `___` | `___` | `___` | `___` | `___` | `___` | `___` | 1/2/3 |
Tier 1 = Codify first (foundational services, DNS, firewall). Tier 2 = Codify second (application VMs). Tier 3 = Codify last or document as manual (one-off experiments, rarely rebuilt).
---
Tab 2 — Network Segments
| VLAN ID | Name/Purpose | Subnet | Gateway | DHCP Range | DNS Server | Firewall Zone | Notes |
|---------|-------------|--------|---------|------------|------------|---------------|-------|
| `___` | `___` | `___` | `___` | `___` | `___` | `___` | `___` |
---
Tab 3 — Storage Pools
| Pool Name | Type | Nodes | Total (TB) | Free (TB) | NFS/iSCSI Path | VMs Using It |
|-----------|------|-------|------------|-----------|----------------|--------------|
| `___` | `___` | `___` | `___` | `___` | `___` | `___` |
---
Tab 4 — Services and Dependencies
| Service Name | Host VM/LXC | IP:Port | Depends On | DNS Entry | Data Location | Implicit State? |
|-------------|-------------|---------|------------|-----------|---------------|-----------------|
| `___` | `___` | `___` | `___` | `___` | `___` | Yes/No |
---
Tab 5 — Provider Access
| Platform | API Endpoint | Auth Method | Token/User | Permissions Needed | TLS Verified? | Notes |
|----------|-------------|-------------|------------|-------------------|---------------|-------|
| Proxmox | `https://___:8006` | API Token | `___` | `___` | Yes/No | `___` |
| OPNsense | `https://___` | API Key | `___` | `___` | Yes/No | `___` |
| TrueNAS | `https://___/api/v2.0` | API Key | `___` | `___` | Yes/No | `___` |
---
Tab 6 — Implicit State Register
| Item Description | Where It Lives Now | Terraform Resource Type | Tier | Owner |
|-----------------|-------------------|------------------------|------|-------|
| `___` | GUI / config file / memory | `___` | 1/2/3 | `___` |
---
You've mapped your lab topology in Chapter 1 — you know what's running, what API it exposes, and what credentials you'll need. Now the frustrating part begins: every Terraform getting-started guide points you at `hashicorp/aws` and calls it a day. Your Proxmox node doesn't have an AWS access key, and your pfSense box has never heard of an IAM role.
This chapter gets your actual infrastructure wired into Terraform with the right providers, the right directory structure, and a working `terraform plan` that talks to real hardware.
---
The Local Provider Stack Blueprint is a five-stage process for bootstrapping a multi-platform home lab Terraform project from scratch — covering provider selection, credential isolation, directory layout, version locking, and connectivity verification. Unlike cloud-centric setups where one provider handles everything, your home lab is a federation of independent APIs. The blueprint treats them that way.
Stage 1: Inventory Your API Surface
Pull out your Lab Topology Capture Inventory from Chapter 1. For each platform, identify the API type it exposes:

- Proxmox: REST API on port 8006
- pfSense: REST API via the `pfSense-api` package (not built in)
- OPNsense: REST API, built in
- TrueNAS SCALE: REST API at `/api/v2.0`
- Docker: Engine API over a Unix socket or TCP+TLS
Write these down. You're not installing every provider on day one — only the ones your topology inventory confirmed are in your lab.
Stage 2: Create API Tokens (Not Passwords)
Never store your admin password in Terraform. Every platform in your stack supports token-based auth:
Proxmox: Navigate to Datacenter → Permissions → API Tokens. Create a token under a dedicated user (e.g., `terraform@pve`). Assign the minimum required role — `PVEVMAdmin` for VM management, `PVEDatastoreUser` for storage. Copy the token secret immediately; Proxmox won't show it again. The token ID format is `user@realm!tokenname`.
pfSense: Install the `pfSense-api` package (System → Package Manager). Navigate to System → API → Keys, generate a client ID and client token. Restrict the key to the management VLAN interface only.
TrueNAS: Go to Credentials → API Keys, create a key scoped to your use case. TrueNAS SCALE supports key descriptions — use `terraform-homelab` so you know what to rotate when something breaks.
Docker: If you're running Docker locally on the same machine as Terraform, the Unix socket (`unix:///var/run/docker.sock`) is sufficient. For remote Docker hosts, generate a TLS client certificate pair and store them in your secrets directory.
Stage 3: Secure Credential Storage
Create a `secrets/` directory at the project root. Add it to `.gitignore` immediately — before you type a single credential. Use a `terraform.tfvars` file inside `secrets/` for sensitive values, and reference them via variables in your provider blocks.
```bash
echo "secrets/" >> .gitignore
echo "*.tfvars" >> .gitignore
```
For teams or if you want to go further, HashiCorp Vault or `sops` with age encryption are solid options. For a solo home lab, a `.gitignore`-protected `secrets/terraform.tfvars` file is acceptable — just don't skip the gitignore step.
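Wiring those pieces together looks like this — a minimal sketch in which the variable names are illustrative and the provider arguments follow the `bpg/proxmox` provider:

```hcl
# variables.tf
variable "proxmox_api_token" {
  type      = string
  sensitive = true # keeps the value out of plan output
}

# providers.tf
provider "proxmox" {
  endpoint  = "https://pve01.lab.internal:8006"
  api_token = var.proxmox_api_token
}
```

Because the tfvars file lives in `secrets/` rather than the project root, Terraform won't auto-load it — pass it explicitly: `terraform plan -var-file=secrets/terraform.tfvars`.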
Stage 4: Project Directory Layout
Cloud tutorials push a flat mono-repo. Your multi-platform lab needs separation by platform boundary, not by resource type. Here's the structure that works:
```
homelab-iac/
├── .terraform.lock.hcl
├── .gitignore
├── versions.tf            # Terraform and provider version constraints
├── providers.tf           # All provider configurations
├── variables.tf           # Input variable declarations
├── secrets/
│   └── terraform.tfvars   # Actual credential values (gitignored)
├── modules/
│   ├── proxmox-vm/
│   ├── docker-container/
│   ├── pfsense-firewall-rule/
│   └── truenas-dataset/
└── environments/
    ├── core-infra/        # DNS, VLAN configs, base VMs
    └── services/          # Application-layer containers and VMs
```
The `environments/` split is critical. Your core infrastructure (Proxmox VMs, pfSense rules, TrueNAS datasets) changes infrequently. Your services layer (Docker containers, app configs) changes constantly. Separating them means a botched `terraform apply` on a new container doesn't risk your firewall rules.
Stage 5: Version Pin Everything
In `versions.tf`, pin Terraform itself and every provider to a minimum version with a pessimistic constraint operator:
```hcl
terraform {
  required_version = "~> 1.7.0"

  required_providers {
    proxmox = {
      source  = "bpg/proxmox"
      version = "~> 0.46" # resource names in this book follow the bpg provider
    }
    libvirt = {
      source  = "dmacvicar/libvirt"
      version = "~> 0.7.6"
    }
    docker = {
      source  = "kreuzwerker/docker"
      version = "~> 3.0.2"
    }
  }
}
```
Run `terraform init` once, then commit `.terraform.lock.hcl` to git. This file records the exact provider checksums. When you rebuild your lab on a new machine, `terraform init` will pull the identical provider binaries — not whatever the latest breaking version happens to be that week.
---
Scenario: Marcus runs a Proxmox cluster (two nodes) with pfSense as his gateway and a TrueNAS VM for NAS storage. He completed the Lab Topology Capture Inventory in Chapter 1 and confirmed three API surfaces: Proxmox REST, pfSense REST (he installed pfSense-api last month), and TrueNAS SCALE REST.
Marcus creates his project directory, adds `secrets/` to `.gitignore`, and creates three API tokens — one per platform. In Proxmox, he creates `terraform@pve!homelab-token` with `PVEVMAdmin` on the root path. In pfSense, he generates a client ID/token pair and notes the management interface is `192.168.1.1`. In TrueNAS, he creates a key named `terraform-homelab`.
He drops all three credential pairs into `secrets/terraform.tfvars`:
```hcl
proxmox_api_token_id     = "terraform@pve!homelab-token"
proxmox_api_token_secret = "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
pfsense_client_id        = "abc123"
pfsense_client_token     = "def456"
truenas_api_key          = "1-xxxxxxxxxxxxxxxxxxx"
```
He writes his `providers.tf`, runs `terraform init`, and executes `terraform plan`. The plan returns zero changes and no errors — confirming all three providers authenticated successfully. He commits `.terraform.lock.hcl` and `providers.tf` to git, but not `secrets/`. That's the entire foundation. Everything built in later chapters sits on top of this verified base.
---
Use this template to document each provider as you configure it. Fill in every row before moving to Chapter 3.
```
PROJECT ROOT: ___________________________________
TERRAFORM VERSION: ______________________________
DATE INITIALIZED: _______________________________
┌─────────────────────────────────────────────────────────────────┐
│ PROXMOX PROVIDER │
├─────────────────────────────────────────────────────────────────┤
│ Proxmox Host URL: https://_______________:8006/api2/json │
│ API Token User: _______________@pve │
│ API Token Name: !_______________ │
│ Token stored in: secrets/terraform.tfvars [ ] │
│ Role assigned: _______________ │
│ Scope (node/datacenter): _______________ │
│ `terraform plan` result: PASS [ ] FAIL [ ] │
│ Working provider block committed to git: YES [ ] NO [ ] │
└─────────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────┐
│ PFSENSE PROVIDER │
├─────────────────────────────────────────────────────────────────┤
│ pfSense Management URL: https://_______________ │
│ API Client ID: _______________ │
│ API Client Token: stored in secrets/ [ ] │
│ Interface restriction: _______________ │
│ pfSense-api version: _______________ │
│ `terraform plan` result: PASS [ ] FAIL [ ] │
│ Working provider block committed to git: YES [ ] NO [ ] │
└─────────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────┐
│ DOCKER PROVIDER │
├─────────────────────────────────────────────────────────────────┤
│ Connection type: Unix socket [ ] TCP+TLS [ ] │
│ Host address: _______________ │
│ TLS cert path (if TCP): _______________ │
│ `terraform plan` result: PASS [ ] FAIL [ ] │
│ Working provider block committed to git: YES [ ] NO [ ] │
└─────────────────────────────────────────────────────────────────┘
```

You've mapped your lab topology and written your first resources — and right now, your Terraform state lives in a `terraform.tfstate` file sitting next to your `.tf` files. That file is the single most important artifact your entire IaC setup produces, and it's one corrupted NVMe drive away from making your entire codebase a lie.
---
The Self-Hosted State Vault System is a four-stage architecture for storing, locking, versioning, and recovering Terraform state without touching a cloud provider. It's designed specifically for lab environments where you already have running services — Proxmox, TrueNAS, maybe a GitLab instance — and you want state management that integrates with what you already have rather than bolting on something new.
Why local state files will eventually destroy you
A local `terraform.tfstate` file has three failure modes that are uniquely painful in a home lab:

- Loss: it's a single copy on a single disk — often the same machine you're about to rebuild.
- Corruption: two concurrent `terraform apply` runs, or one interrupted mid-write, can leave the file inconsistent, with no lock to stop them.
- No rollback: after a bad apply there is no prior version to restore — the file only ever reflects "now".
The fix isn't Terraform Cloud. It's a self-hosted backend with three non-negotiable properties: S3-compatible storage (for the state blob), a locking mechanism (so concurrent applies are blocked), and versioned snapshots (so you can roll back to any prior state).
---
Stage 1: Choose Your Backend
Your existing lab services determine which backend makes the most sense. The three viable options for a home lab are:

- MinIO, a self-hosted S3-compatible object store (Stage 2 below)
- GitLab CE's built-in Terraform state backend, if you already run GitLab
- Terraform's built-in `pg` backend against a PostgreSQL instance you already run, which provides state storage and locking in one service
Use the decision tree worksheet below before proceeding.
Stage 2: Deploy MinIO as Your S3-Compatible Backend
If you're running Proxmox, deploy MinIO as an LXC container (512MB RAM, 1 vCPU, a dedicated disk or ZFS dataset for the data directory). The MinIO team publishes a single binary — no Docker required, no Kubernetes.
```bash
wget https://dl.min.io/server/minio/release/linux-amd64/minio
chmod +x minio
MINIO_ROOT_USER=admin MINIO_ROOT_PASSWORD=changeme \
./minio server /mnt/data --console-address ":9001"
```
Create a bucket called `terraform-state` and a dedicated access key with read/write permissions scoped to that bucket only. Do not use your root credentials in Terraform.
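A bucket-scoped policy for that dedicated key, sketched in standard S3 policy JSON (attach it to the key via the MinIO console, or `mc admin policy create` on recent MinIO releases):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:ListBucket"],
      "Resource": ["arn:aws:s3:::terraform-state"]
    },
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
      "Resource": ["arn:aws:s3:::terraform-state/*"]
    }
  ]
}
```

Even if the key leaks, the blast radius is one bucket — not your whole object store.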
Your backend block looks like this:
```hcl
terraform {
  backend "s3" {
    bucket     = "terraform-state"
    key        = "homelab/proxmox/terraform.tfstate"
    region     = "us-east-1" # Required field, value is arbitrary
    access_key = "your-minio-access-key"
    secret_key = "your-minio-secret-key"

    # Terraform 1.6+ syntax; on 1.5 and earlier use `endpoint = "..."` instead
    endpoints = {
      s3 = "http://minio.lab.internal:9000"
    }

    skip_credentials_validation = true
    skip_metadata_api_check     = true
    skip_region_validation      = true
    skip_requesting_account_id  = true

    # Named `force_path_style` on Terraform 1.5 and earlier
    use_path_style = true
  }
}
```
The `use_path_style = true` flag (`force_path_style` before Terraform 1.6) is non-negotiable for MinIO. Without it, Terraform tries to reach `terraform-state.minio.lab.internal` instead of `minio.lab.internal/terraform-state`, and you'll spend an hour debugging DNS that isn't broken.
Stage 3: Add State Locking
MinIO doesn't implement the DynamoDB API, so you need a sidecar for locking. The cleanest option for a home lab is DynamoDB Local — Amazon's official local DynamoDB emulator, packaged as a JAR or Docker image (`amazon/dynamodb-local`).
Run it on the same LXC as MinIO:
```bash
docker run -p 8000:8000 amazon/dynamodb-local
```
Then create your lock table:
```bash
# The AWS CLI needs (dummy) credentials and a region even for a local endpoint
export AWS_ACCESS_KEY_ID=dummy AWS_SECRET_ACCESS_KEY=dummy AWS_DEFAULT_REGION=us-east-1

aws dynamodb create-table \
  --table-name terraform-locks \
  --attribute-definitions AttributeName=LockID,AttributeType=S \
  --key-schema AttributeName=LockID,KeyType=HASH \
  --billing-mode PAY_PER_REQUEST \
  --endpoint-url http://localhost:8000
```
Add the `dynamodb_table` parameter to your backend block, plus an endpoint override pointing at the emulator (`dynamodb_endpoint` on Terraform 1.5 and earlier, `endpoints.dynamodb` on 1.6+). Now every `terraform apply` acquires a lock entry before writing state, and any concurrent run will fail with a clear error instead of silently corrupting your file.
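The additions to the backend block are small; the endpoint argument name varies by Terraform version, as noted in the comments:

```hcl
terraform {
  backend "s3" {
    # ...existing MinIO settings from Stage 2...

    dynamodb_table = "terraform-locks"

    # Point the locking client at the local emulator instead of AWS.
    # Terraform 1.5 and earlier:
    #   dynamodb_endpoint = "http://minio.lab.internal:8000"
    # Terraform 1.6+:
    #   endpoints = { dynamodb = "http://minio.lab.internal:8000" }
  }
}
```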
Stage 4: Automate State Backups
State locking prevents corruption. Backups prevent loss. You need both.
Set up a cron job on your MinIO host that runs `restic` against your TrueNAS or a second storage target:
```bash
# /etc/crontab format: minute hour day month weekday user command
0 2 * * * root restic -r sftp:truenas.lab.internal:/mnt/pool/backups/terraform backup /mnt/data/terraform-state/ --tag terraform-state
```
Enable versioning on your MinIO bucket via the MinIO console (Buckets → terraform-state → Versioning → Enable). This gives you object-level version history independent of your restic snapshots — two independent recovery paths for the same data.
---
If you already run GitLab CE on your lab (and if you followed Chapter 2's inventory, you know whether you do), you get managed Terraform state for free. Navigate to your project → Infrastructure → Terraform, and GitLab generates a backend configuration block with HTTP locking already wired in:
```hcl
terraform {
  backend "http" {
    address        = "https://gitlab.lab.internal/api/v4/projects/12/terraform/state/homelab"
    lock_address   = "https://gitlab.lab.internal/api/v4/projects/12/terraform/state/homelab/lock"
    unlock_address = "https://gitlab.lab.internal/api/v4/projects/12/terraform/state/homelab/lock"
    username       = "terraform-bot"
    password       = "glpat-xxxxxxxxxxxx"
    lock_method    = "POST"
    unlock_method  = "DELETE"
    retry_wait_min = 5
  }
}
}
```
GitLab stores state versions automatically and shows diffs in the UI. If you're already running GitLab for your lab's Git hosting, this is the zero-overhead choice.
---
Your lab topology inventory from Chapter 1 identified resources that already exist — VMs, networks, storage pools that Terraform doesn't know about yet. Before your state backend matters, you need to get those resources into state without destroying and recreating them.
The `terraform import` workflow has a specific sequence that tutorials always get wrong:

1. Write the resource block first, from your topology inventory — never run the import before the block exists.
2. Run `terraform import <resource_address> <provider_id>` to attach the real resource to that block.
3. Run `terraform plan` and fix every attribute mismatch it reports, editing the block until it matches reality.
4. Repeat the plan until it's clean — never apply while mismatches remain.
The goal is a clean `No changes. Infrastructure is up-to-date.` plan output. That's your confirmation that Terraform's state accurately reflects reality.
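On Terraform 1.5+ you can also declare the import in code instead of running the CLI command — a sketch using the `bpg/proxmox` VM resource, where the `node/vmid` ID format is that provider's convention (check your provider's documented import ID):

```hcl
# Declarative import: `terraform plan` shows what will be adopted
import {
  to = proxmox_virtual_environment_vm.gitea
  id = "pve-node01/105"
}

resource "proxmox_virtual_environment_vm" "gitea" {
  name      = "gitea"
  node_name = "pve-node01"
  vm_id     = 105
  # Fill in remaining attributes until `terraform plan` is clean
}
```

The advantage over the CLI is reviewability: the import intent lives in git alongside the resource block it targets.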
---
Scenario: Marcus runs a home lab with Proxmox on two nodes, TrueNAS Scale on a third machine, and a GitLab CE instance he set up for personal projects. He's been managing 14 VMs manually and wants to bring them under Terraform control.
Marcus opens the decision tree worksheet and answers three questions: Does he already run MinIO? No. Does he run GitLab CE? Yes. Is GitLab highly available? No, it's a single VM. He lands on GitLab CE Managed State as his backend.
He creates a `terraform-bot` GitLab user with Developer access to his `homelab-iac` project, generates a project access token, and pastes the backend block into his `backend.tf` file. He runs `terraform init` and migrates his existing local state with `terraform init -migrate-state`. GitLab now shows version 1 of his state file in the UI.
Next, Marcus picks his most critical VM — his pfSense router — and imports it. He writes the resource block first, referencing the topology inventory he built in Chapter 1. The first `terraform plan` after import shows 11 attribute mismatches: CPU type, BIOS mode, network bridge names, and several cloud-init fields. He fixes them one by one. After four iterations, the plan is clean.
He then sets up a nightly restic backup from his GitLab VM to TrueNAS, targeting the GitLab backup directory that already includes Terraform state exports. Two independent backup paths, zero cloud dependencies.
You've mapped your lab topology. Now it's time to stop clicking through the Proxmox web UI and start treating your compute layer like the code it should be. Every VM you've been manually cloning from a half-configured template is technical debt waiting to bite you at 11pm during a failed upgrade.
---
The Golden Image Pipeline is a three-stage process that separates image creation from infrastructure provisioning from instance configuration. Most home labbers collapse all three into one messy manual process — which is exactly why rebuilding takes days instead of minutes.
Stage 1: Build the Base Image (Packer + cloud-init)
Packer talks directly to the Proxmox API and builds a VM from an ISO, installs the OS unattended using a cloud-init seed, then converts the result into a Proxmox template. You do this once per OS flavor. Your `packer/ubuntu-2204.pkr.hcl` file should look structurally like this:
```hcl
source "proxmox-iso" "ubuntu-2204" {
  proxmox_url              = "https://${var.proxmox_host}:8006/api2/json"
  username                 = var.proxmox_user
  password                 = var.proxmox_password
  insecure_skip_tls_verify = true
  node                     = "pve01"

  iso_url      = "https://releases.ubuntu.com/22.04/ubuntu-22.04.3-live-server-amd64.iso"
  iso_checksum = "sha256:a4acfda10b18da50e2ec50ccaf860d7f20b389df8765611142305c0e911d16fd"

  vm_id                = 9000
  vm_name              = "ubuntu-2204-template"
  template_description = "Ubuntu 22.04 Golden Image — built with Packer"
  cores                = 2
  memory               = 2048

  disk {
    disk_size    = "20G"
    storage_pool = "local-lvm"
    type         = "scsi"
  }

  network_adapters {
    model  = "virtio"
    bridge = "vmbr0"
  }

  cloud_init              = true
  cloud_init_storage_pool = "local-lvm"

  boot_command = [
    "<esc><wait>",
    "linux /casper/vmlinuz --- autoinstall ds=nocloud-net;s=http://{{ .HTTPIP }}:{{ .HTTPPort }}/",
    "<enter><wait>",
    "initrd /casper/initrd",
    "<enter><wait>",
    "boot<enter>"
  ]
  http_directory = "http"
}

# A build block is required for `packer build` to have something to run
build {
  sources = ["source.proxmox-iso.ubuntu-2204"]
}
```
Your `http/` directory contains `user-data` and `meta-data` files. The `user-data` file is your cloud-init autoinstall config — set your locale, create your default user with an SSH key, install `qemu-guest-agent`, and nothing else. Keep the golden image lean. Application-layer packages belong in Terraform provisioners or Ansible, not the base image.
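A minimal `http/user-data` along those lines might look like this — a hedged sketch of the Ubuntu autoinstall format, where the hostname, username, password hash, and key are placeholders:

```yaml
#cloud-config
autoinstall:
  version: 1
  locale: en_US.UTF-8
  identity:
    hostname: ubuntu-template
    username: labadmin
    password: "$6$rounds-placeholder" # generate with: mkpasswd -m sha-512
  ssh:
    install-server: true
    authorized-keys:
      - ssh-ed25519 AAAA-placeholder labadmin@workstation
  packages:
    - qemu-guest-agent
```

Note what isn't here: no application packages, no per-host IPs. That configuration arrives later, at clone time, via Terraform's `initialization` block.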
Run `packer build ubuntu-2204.pkr.hcl` and Proxmox will spin up a VM, install Ubuntu, shut it down, and register it as template ID 9000. Repeat this process for Debian 12 (template 9001) and Rocky Linux 9 (template 9002). These three templates cover 95% of home lab workloads.
Stage 2: Provision Instances (Terraform)
With templates in place, Terraform's job is to clone them and inject per-VM configuration via cloud-init. The `bpg/proxmox` provider has overtaken the older `telmate/proxmox` as the community standard — it has better resource coverage and more predictable behavior for LXC containers.
A complete VM resource block for a home lab DNS server:
```hcl
resource "proxmox_virtual_environment_vm" "dns_primary" {
  name        = "dns-01"
  description = "Primary DNS — AdGuard Home + Unbound"
  node_name   = "pve01"
  vm_id       = 101
  tags        = ["dns", "infrastructure", "terraform"]

  clone {
    vm_id = 9000 # Ubuntu 22.04 golden image
    full  = true
  }

  cpu {
    cores = 2
    type  = "x86-64-v2-AES"
  }

  memory {
    dedicated = 1024
  }

  disk {
    datastore_id = "local-lvm"
    size         = 20
    interface    = "scsi0"
    discard      = "on"
    ssd          = true
  }

  network_device {
    bridge  = "vmbr10" # Infrastructure VLAN
    vlan_id = 10
    model   = "virtio"
  }

  initialization {
    ip_config {
      ipv4 {
        address = "10.10.10.53/24"
        gateway = "10.10.10.1"
      }
    }
    dns {
      servers = ["1.1.1.1"]
      domain  = "lab.internal"
    }
    user_account {
      username = "labadmin"
      keys     = [var.ssh_public_key]
    }
  }

  lifecycle {
    prevent_destroy = true # DNS going down breaks everything
  }
}
```
Notice `prevent_destroy = true` on DNS. Apply this to any VM whose absence breaks other provisioning — your DNS server, your NTP server, your Vault instance if you're running one. Terraform will refuse to destroy these without an explicit `-target` override, which forces you to be intentional.
For rolling updates on non-critical VMs, set `create_before_destroy = true` in the `lifecycle` block instead. This matters when you're rebuilding an app server that sits behind a load balancer — Terraform spins up the replacement, then tears down the old one, keeping your service online.
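As a fragment (the rest of the resource omitted for brevity), that lifecycle choice looks like:

```hcl
resource "proxmox_virtual_environment_vm" "app_server" {
  # ...clone, cpu, memory, disk, network as in the DNS example...

  lifecycle {
    # Build the replacement VM before destroying the old one,
    # so a service behind a load balancer stays online during rebuilds
    create_before_destroy = true
  }
}
```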
Stage 3: LXC Containers for Lightweight Services
Not everything needs a full VM. Containers are ideal for stateless services, monitoring agents, and anything where you want sub-second startup times. The golden rule for home lab LXC: always use unprivileged containers (`unprivileged = true`). Privileged containers can escape to the host — that's a risk even in a home lab if you're running anything internet-facing.
```hcl
resource "proxmox_virtual_environment_container" "monitoring" {
  description = "Prometheus + Node Exporter"
  node_name   = "pve01"
  vm_id       = 200

  initialization {
    hostname = "monitoring-01"
    ip_config {
      ipv4 {
        address = "10.10.10.100/24"
        gateway = "10.10.10.1"
      }
    }
    user_account {
      keys = [var.ssh_public_key]
    }
  }

  cpu {
    cores = 2
  }

  memory {
    dedicated = 512
    swap      = 512
  }

  disk {
    datastore_id = "local-lvm"
    size         = 8
  }

  network_interface {
    name    = "eth0"
    bridge  = "vmbr10"
    vlan_id = 10
  }

  operating_system {
    template_file_id = "local:vztmpl/debian-12-standard_12.2-1_amd64.tar.zst"
    type             = "debian"
  }

  unprivileged = true

  features {
    nesting = false # Only enable if running Docker inside LXC — adds attack surface
  }
}
```
Parameterizing for Home Lab Constraints
Your home lab isn't AWS. You have a fixed RAM ceiling — probably 32GB or 64GB across one or two nodes. Use `terraform.tfvars` to define your hardware envelope and variable validation to enforce it:
```hcl
variable "vm_memory_mb" {
  type        = number
  description = "VM memory allocation in MB"

  validation {
    condition     = var.vm_memory_mb >= 512 && var.vm_memory_mb <= 16384
    error_message = "Memory must be between 512MB and 16GB per VM. Check your node capacity."
  }
}
```
This prevents you from accidentally allocating 32GB to a single test VM and wondering why everything else is swapping.
Provisioners: The Last Resort
`remote-exec` provisioners exist for situations where cloud-init isn't enough and you don't have Ansible in your pipeline yet. Use them sparingly — they create implicit dependencies on network connectivity and SSH availability that make `terraform apply` flaky. Acceptable home lab use cases: installing a specific package version that isn't in apt repos, registering a VM with a license server, or running a one-time database initialization script.
```hcl
provisioner "remote-exec" {
  inline = [
    "sudo apt-get update -qq",
    "sudo apt-get install -y adguardhome"
  ]

  connection {
    type        = "ssh"
    user        = "labadmin"
    private_key = file("~/.ssh/lab_ed25519")
    host        = self.initialization[0].ip_config[0].ipv4[0].address
  }
}
```
If you find yourself writing more than 5 lines of `remote-exec`, stop and write an Ansible playbook instead.
---
Scenario: Marcus runs a Proxmox cluster on two nodes — a repurposed Dell PowerEdge R620 (64GB RAM) and a mini PC (16GB RAM). He's been manually cloning a "base Ubuntu" template he built 18 months ago that still has an old SSH key baked in from a previous job. Every new VM gets that key. He's also been forgetting to set static IPs consistently, so his DNS records drift.
Using the Golden Image Pipeline, Marcus builds fresh templates with Packer on a Sunday afternoon.
---
You've already codified your VM images and state backend — but every time you rebuild, you're still clicking through the pfSense or OPNsense GUI to recreate VLANs, punch firewall holes, and add DNS overrides. That manual gap is exactly where rebuilds go sideways at 11pm.
The Network Segmentation Code Map is a five-stage process for translating your entire layer-2/layer-3 topology — VLANs, firewall rules, DHCP reservations, and DNS records — into Terraform resources that apply in dependency order. The key insight is that network configuration has strict sequencing requirements: a firewall rule referencing an alias that doesn't exist yet will fail silently or apply incorrectly. This framework enforces that ordering explicitly.
Stage 1: Enumerate Your Segments as Variables
Before writing a single resource block, define your VLANs as a structured Terraform variable. This becomes the single source of truth that every downstream resource references.
```hcl
variable "vlans" {
  type = map(object({
    id          = number
    description = string
    subnet      = string
    gateway     = string
  }))
  default = {
    trusted = { id = 10, description = "Trusted Devices", subnet = "10.10.10.0/24", gateway = "10.10.10.1" }
    iot     = { id = 20, description = "IoT Devices",     subnet = "10.10.20.0/24", gateway = "10.10.20.1" }
    lab     = { id = 30, description = "Home Lab",        subnet = "10.10.30.0/24", gateway = "10.10.30.1" }
    dmz     = { id = 40, description = "DMZ Services",    subnet = "10.10.40.0/24", gateway = "10.10.40.1" }
  }
}
```
This map-of-objects pattern lets you `for_each` over segments consistently across interface, DHCP, and DNS resources — change the map once, and everything downstream updates.
Stage 2: Declare Interfaces and VLANs via Provider
For OPNsense, use the `browningluke/opnsense` Terraform provider. For pfSense, the `marshallford/pfsense` provider covers most resources. Both require API access enabled on the firewall — do this once manually, then never touch the GUI again.
```hcl
resource "opnsense_vlan" "segments" {
  for_each    = var.vlans
  device      = "igb1"
  tag         = each.value.id
  description = each.value.description
}
```
Stage 3: Build Aliases Before Rules
Firewall aliases are the dependency anchors. Define host groups, network groups, and port groups as `opnsense_firewall_alias` resources before any rule resources. Terraform's dependency graph handles ordering when you reference alias names in rule resources directly — but only if you use resource references, not hardcoded strings.
```hcl
resource "opnsense_firewall_alias" "lab_servers" {
  name        = "lab_servers"
  type        = "host"
  description = "Lab segment hosts allowed to reach DMZ"
  content     = ["10.10.30.10", "10.10.30.11"]
}
```
Stage 4: Codify Rules with Explicit Sequence Numbers
OPNsense and pfSense both evaluate rules top-down. Terraform doesn't guarantee apply order within a resource type unless you use `depends_on` or sequence fields. Always set explicit sequence numbers and use `depends_on` referencing your alias resources.
```hcl
resource "opnsense_firewall_rule" "lab_to_dmz_https" {
  sequence         = 100
  action           = "pass"
  direction        = "in"
  interface        = "opt3" # lab VLAN interface
  source           = opnsense_firewall_alias.lab_servers.name
  destination      = "10.10.40.0/24"
  destination_port = "443"
  protocol         = "tcp"
  description      = "Lab servers to DMZ HTTPS"

  depends_on = [opnsense_firewall_alias.lab_servers]
}
```
Stage 5: DNS Records and DHCP Reservations as Code
Pi-hole's REST API and AdGuard Home's API are both consumable via the `restapi` Terraform provider or targeted shell provisioners. For AdGuard Home:
```hcl
resource "restapi_object" "adguard_dns_record" {
  for_each = local.dns_records
  path     = "/control/rewrite/add"
  data = jsonencode({
    domain = each.value.hostname
    answer = each.value.ip
  })
}
```
DHCP reservations map directly to OPNsense's `opnsense_dhcp_static_map` resource — MAC address to IP, documented in your `terraform.tfvars` alongside the device inventory you built in Chapter 1.
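As a sketch — the argument names here (`interface`, `mac`, `ipaddr`, `hostname`) follow the OPNsense provider's naming conventions but should be checked against its docs, and `var.reservations` is a hypothetical variable:

```hcl
variable "reservations" {
  # Device name -> reservation details
  type = map(object({
    mac       = string
    ip        = string
    interface = string # firewall interface for the device's segment, e.g. "opt2"
  }))
}

resource "opnsense_dhcp_static_map" "devices" {
  for_each  = var.reservations
  interface = each.value.interface
  mac       = each.value.mac
  ipaddr    = each.value.ip
  hostname  = each.key
}
```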
---
Terraform needs network connectivity to provision the network. If your OPNsense box is both the target and the gateway, a failed apply can lock you out. The fix is a bootstrap interface: keep one physical interface (`igb0`) statically configured as your management LAN, outside of Terraform's scope. Document that it must never be destroyed, and if you do import it, protect it with `lifecycle { prevent_destroy = true }` and a matching `ignore_changes` list. Everything else — VLANs, rules, DNS — is Terraform-managed. Your management interface is the ladder you don't kick away.
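If you do bring the management interface under state, a guarded sketch might look like this — the resource type and arguments mirror the OPNsense provider naming used above and are assumptions to verify:

```hcl
# Imported once, never recreated:
#   terraform import opnsense_interface.mgmt igb0
resource "opnsense_interface" "mgmt" {
  device      = "igb0"
  description = "Management LAN — bootstrap interface"

  lifecycle {
    prevent_destroy = true # apply errors out instead of removing your ladder
    ignore_changes  = all  # GUI tweaks here never produce a diff
  }
}
```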
---
Marcus runs a home lab with Proxmox on two nodes, an OPNsense VM on dedicated hardware, and a Pi-hole container on VLAN 10. His IoT VLAN (20) had 14 firewall rules built up over two years of GUI clicking — some redundant, some conflicting. When he tried to upgrade OPNsense and the config backup failed to restore cleanly, he spent six hours recreating rules from memory.
After applying the Network Segmentation Code Map, Marcus defined all four VLANs in a single `vlans` variable map. He ran `terraform import` to pull his existing aliases into state (using the workflow from Chapter 3), then translated each firewall rule into a numbered resource block. His IoT rules collapsed from 14 to 9 once he could see them side-by-side in code — three were duplicates, two were shadowed by earlier rules and never evaluated.
His Pi-hole DNS overrides — 23 internal hostnames — became a `local.dns_records` map in `terraform.tfvars`. The `restapi` provider pushes them on every apply. Now his full network configuration is a `terraform apply` that completes in under four minutes.
---
Use this template to document your current network state before writing a single resource block. Complete columns left-to-right; the Translation Reference at the bottom maps each field to its Terraform argument.
```
SEGMENT INVENTORY
─────────────────────────────────────────────────────────────────
Segment Name | VLAN ID | Subnet CIDR | Gateway IP | Interface
-------------|---------|-------------|------------|----------
Trusted | | | |
IoT | | | |
Lab | | | |
DMZ | | | |
[Add rows] | | | |
FIREWALL RULE INVENTORY (complete one table per segment)
─────────────────────────────────────────────────────────────────
Segment: _______________
Seq# | Action | Protocol | Source (IP/Alias) | Destination | Port | Description
-----|--------|----------|-------------------|-------------|------|------------
| | | | | |
| | | | | |
| | | | | |
ALIAS INVENTORY
─────────────────────────────────────────────────────────────────
Alias Name | Type (host/network/port) | Members | Used In Rules (Seq#)
-----------|--------------------------|---------|---------------------
| | |
| | |
DNS OVERRIDE INVENTORY
─────────────────────────────────────────────────────────────────
Hostname (FQDN) | Internal IP | DNS Server (Pi-hole/AdGuard) | Notes
----------------|-------------|------------------------------|------
| | |
| | |
DHCP RESERVATION INVENTORY
─────────────────────────────────────────────────────────────────
Device Name | MAC Address | Reserved IP | VLAN Segment
------------|-------------|-------------|-------------
| | |
| | |
TERRAFORM TRANSLATION REFERENCE
─────────────────────────────────────────────────────────────────
Worksheet Field → Terraform Argument
─────────────────────────────────────────────────────────────────
VLAN ID → tag (opnsense_vlan)
Subnet CIDR → subnet in var.vlans map
Seq# → sequence (opnsense_firewall_rule)
Action → action = "pass" | "block" | "reject"
Source (Alias name) → source = opnsense_firewall_alias.<name>.name
Hostname → domain (restapi AdGuard rewrite)
MAC Address → mac (opnsense_dhcp_static_map)
Reserved IP → ipaddr (opnsense_dhcp_static_map)
```
---
You've got VMs provisioning cleanly and your state is locked in a self-hosted backend — but the moment your application stack needs a database volume, a TLS certificate, and a reverse proxy entry, you're back to clicking through UIs and hoping you remember the steps next time. This chapter closes that gap by treating storage, services, and secrets as first-class Terraform citizens.
---
Every self-hosted service stack has an implicit boot order that most people manage manually without realizing it: storage must exist before the database can mount it, the database must be healthy before the application starts, and the reverse proxy needs a valid certificate before it can route traffic. When you're clicking through UIs, your brain handles this sequencing automatically. When you're codifying it, Terraform needs explicit instruction.
The Service Dependency Orchestration Model (SDOM) is a five-stage process that forces you to make those implicit dependencies explicit before writing a single resource block.
Stage 1 — Enumerate Primitives
List every infrastructure primitive your stack requires: datasets, shares, DNS records, certificates, secrets, container networks, and volumes. Don't think in services yet — think in atoms.
Stage 2 — Assign Ownership Layers
Group primitives into four layers: Storage (TrueNAS datasets, NFS/SMB shares, snapshot policies), Network (VLANs, DNS entries, firewall rules), Platform (Docker networks, Traefik entrypoints, certificate resolvers), and Application (containers, environment variables, secrets injection). Each layer depends on the layers before it — nothing in Application should apply until Storage, Network, and Platform are in place.
Stage 3 — Map Explicit Dependencies
For each resource, write down what it reads from or waits on. A PostgreSQL container reads a dataset path from TrueNAS. Traefik reads a certificate from a resolver that needs a DNS record. This becomes your `depends_on` and `data` source map.
Stage 4 — Modularize by Layer
Each ownership layer becomes a Terraform module. Your root `main.tf` calls them in order: `module.storage` → `module.platform` → `module.application`. Pass outputs from lower layers as inputs to higher ones — never hardcode paths or IPs across module boundaries.
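A minimal root `main.tf` following that ordering — module paths and output names are illustrative:

```hcl
module "storage" {
  source = "./modules/storage"
}

module "platform" {
  source     = "./modules/platform"
  nfs_server = module.storage.nfs_host # output of the storage layer
  nfs_path   = module.storage.dataset_path
}

module "application" {
  source       = "./modules/application"
  network_name = module.platform.docker_network_name
  volume_name  = module.platform.postgres_volume_name
}
```

Because each module consumes the previous layer's outputs, Terraform derives the apply order from the dependency graph — no `depends_on` needed at the root.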
Stage 5 — Validate Boot Order in CI
Before `terraform apply`, run `terraform plan` in your pipeline and grep for dependency cycles. A plan that succeeds in isolation but fails on a fresh apply usually means a missing `depends_on` or a resource that assumes a side effect rather than declaring it.
---
The `dariusbakunas/truenas` Terraform provider gives you full API coverage for TrueNAS CORE and SCALE. Authentication uses an API key — generate one in TrueNAS under Account → API Keys and store it with SOPS (covered below).
```hcl
provider "truenas" {
  api_key  = var.truenas_api_key
  base_url = "https://truenas.lab.internal/api/v2.0"
}

resource "truenas_dataset" "postgres_data" {
  pool       = "tank"
  name       = "postgres/immich"
  comments   = "Managed by Terraform — do not edit manually"
  share_type = "UNIX"
  atime      = "OFF"
  recordsize = "16K" # Optimal for PostgreSQL workloads
}

resource "truenas_share_nfs" "postgres_nfs" {
  paths         = ["/mnt/tank/postgres/immich"]
  comment       = "Immich PostgreSQL data"
  networks      = ["10.10.20.0/24"] # Your services VLAN
  maproot_user  = "root"
  maproot_group = "wheel"

  depends_on = [truenas_dataset.postgres_data]
}

resource "truenas_snapshot_task" "postgres_hourly" {
  dataset        = truenas_dataset.postgres_data.id
  recursive      = false
  lifetime_value = 2
  lifetime_unit  = "WEEK"
  schedule = {
    minute = "0"
    hour   = "*"
    dom    = "*"
    month  = "*"
    dow    = "*"
  }

  depends_on = [truenas_dataset.postgres_data]
}
```
Notice the `recordsize = "16K"` — that's not a generic setting. PostgreSQL performs significantly better with 8K or 16K ZFS record sizes versus the default 128K. These details live in your code now, not in someone's memory.
---
The `kreuzwerker/docker` provider lets you manage containers, networks, and volumes directly. For self-hosted services, the pattern is: Terraform provisions the infrastructure layer, Docker Compose handles the application layer, and Terraform calls `docker_container` or invokes Compose via `null_resource` with a `local-exec` provisioner.
For production-grade home labs, the hybrid approach works best: use `docker_network` and `docker_volume` resources in Terraform (so they're tracked in state), but reference a Compose file for the actual service definitions:
```hcl
resource "docker_network" "services" {
  name   = "services_net"
  driver = "bridge"

  ipam_config {
    subnet  = "172.20.0.0/24"
    gateway = "172.20.0.1"
  }
}

resource "docker_volume" "postgres_data" {
  name   = "immich_postgres"
  driver = "local"
  driver_opts = {
    type   = "nfs"
    o      = "addr=10.10.10.5,rw,nfsvers=4"
    device = ":/mnt/tank/postgres/immich"
  }

  depends_on = [module.storage]
}
```
The NFS-backed Docker volume is the critical link between your TrueNAS module and your application module. When `module.storage` completes, the NFS share is live and the volume can mount it.
---
Traefik's file provider watches a directory for YAML configuration files. Terraform can write those files using `local_file` resources, giving you version-controlled dynamic routing without touching Traefik's API directly.
```hcl
resource "local_file" "traefik_immich_router" {
  filename = "/opt/traefik/dynamic/immich.yml"
  content = templatefile("${path.module}/templates/traefik_route.yml.tpl", {
    service_name  = "immich"
    backend_url   = "http://immich:3001"
    hostname      = "photos.lab.internal"
    cert_resolver = var.use_letsencrypt ? "letsencrypt" : "internal"
  })
}
```
Your `traefik_route.yml.tpl` template handles both internal self-signed and Let's Encrypt paths based on a single variable. For internal-only services, use a self-signed wildcard cert generated with `tls_self_signed_cert` and `tls_private_key` Terraform resources. For services exposed externally, set `cert_resolver = "letsencrypt"` and ensure your DNS-01 challenge credentials are injected as environment variables — not hardcoded in the Traefik static config.
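A minimal `traefik_route.yml.tpl` compatible with the `templatefile()` call above might look like this — the template variables match that call; the YAML shape follows Traefik's dynamic-configuration file format:

```yaml
http:
  routers:
    ${service_name}:
      rule: "Host(`${hostname}`)"
      service: ${service_name}
      tls:
        certResolver: ${cert_resolver}
  services:
    ${service_name}:
      loadBalancer:
        servers:
          - url: "${backend_url}"
```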
For Docker label-based routing, the `docker_container` resource accepts a `labels` block:
```hcl
# The kreuzwerker/docker provider models labels as repeated blocks,
# so use a dynamic block to expand a map:
dynamic "labels" {
  for_each = {
    "traefik.enable"                                        = "true"
    "traefik.http.routers.immich.rule"                      = "Host(`photos.lab.internal`)"
    "traefik.http.routers.immich.tls"                       = "true"
    "traefik.http.routers.immich.tls.certresolver"          = "internal"
    "traefik.http.services.immich.loadbalancer.server.port" = "3001"
  }
  content {
    label = labels.key
    value = labels.value
  }
}
```
Both approaches work. Use file-based for services not managed by the Docker provider, and label-based for containers Terraform directly manages.
---
You have three viable options for a home lab, ordered by complexity: SOPS with age encryption, self-hosted Vault, and plain `terraform.tfvars` protected by git-crypt.
SOPS + age is the right choice for most home labs. Age generates a keypair; you encrypt your secrets file with the public key; SOPS handles the encryption envelope. The private key lives on your workstation (or in a Proxmox VM) and never touches your git repo.
```bash
age-keygen -o ~/.config/sops/age/keys.txt
sops --age=$(cat ~/.config/sops/age/keys.txt | grep "public key" | awk '{print $4}') \
--encrypt secrets.yaml > secrets.enc.yaml
```
In your `.sops.yaml`:
```yaml
creation_rules:
  - path_regex: secrets\.yaml$
    age: age1ql3z7hjy54pw3hyww5ayyfg7zqgvc7w3j2elw8zmrj2kg5sfn9aqmcac8p
```
Terraform reads decrypted values at plan time via the `sops_file` data source from the `carlpett/sops` provider:
```hcl
data "sops_file" "secrets" {
  source_file = "${path.module}/secrets.enc.yaml"
}

locals {
  truenas_api_key   = data.sops_file.secrets.data["truenas_api_key"]
  postgres_password = data.sops_file.secrets.data["postgres_password"]
}
```
Self-hosted Vault makes sense if you already have it running or plan to manage secrets for more than one project. The `hashicorp/vault` provider works identically to the cloud version — point it at your internal Vault address and authenticate with AppRole or a token.
Plain `terraform.tfvars` with git-crypt is the escape hatch. It works, but it's fragile — one misconfigured `.gitignore` and your secrets are in your repo history. Use SOPS instead.
You started with a single `main.tf` that provisioned one VM. Now you have eight files, three VLANs, and a growing suspicion that you've copy-pasted the same VM block six times with slightly different names. That's not Infrastructure-as-Code — that's Infrastructure-as-Spreadsheet.
The Lab Module Architecture is a five-stage process for decomposing a monolithic Terraform configuration into a hierarchy of reusable, composable units — without over-engineering it into something that requires a PhD to maintain.
Stage 1: Run the Duplication Detector
Before touching a single file, audit your existing code for patterns that repeat with minor variation. The threshold for extraction in a home lab context is lower than in enterprise work: if a resource block appears more than once with different variable values, it's a module candidate. If it appears once but contains four or more interdependent resources (VM + disk + DNS + firewall), it's also a module candidate, because the unit of deployment is meaningful, not just the repetition.
Stage 2: Define Your Module Boundaries
Home labs have two natural module layers: base modules that wrap a single logical resource (a VM, a VLAN, a dataset), and stack modules that compose base modules into a complete, deployable service.
The decision rule: if you'd rebuild these resources together or tear them down together, they belong in the same stack module.
Stage 3: Build the Base VM Module
Create `modules/base-vm/main.tf`. The key variables are `template_name` (string, references your Packer-built golden image), `vm_name`, `cores`, `memory`, `vlan_tag`, `cloud_init_file`, and `target_node`. Set conservative defaults — `cores = 2`, `memory = 2048` — so callers only override what they need.
The cloud-init variable should accept a path to a `.yml` file, not inline content. This keeps your module clean and lets you version cloud-init configs separately in your repo. Use Terraform's `templatefile()` function to render it with per-VM variables like hostname and static IP.
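Inside the module, the rendering step could be sketched like this — the variable names are the module inputs listed above, while writing the rendered file to a local snippets directory is one delivery option among several (an assumption, not the only pattern):

```hcl
# Render the caller's cloud-init template with per-VM values
locals {
  cloud_init_rendered = templatefile(var.cloud_init_file, {
    hostname = var.vm_name
    ip_cidr  = var.ip_address
    gateway  = var.gateway
  })
}

# Write the rendered config where Proxmox's cicustom can pick it up
resource "local_file" "cloud_init" {
  content  = local.cloud_init_rendered
  filename = "${path.module}/rendered/${var.vm_name}-user-data.yml"
}
```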
Stage 4: Build the Service Stack Module
Create `modules/service-stack/`. This module's `variables.tf` accepts a `service_name`, `vlan_id`, `ip_address`, `storage_size_gb`, and a `firewall_rules` list of objects. Internally it calls `module "vm" { source = "../base-vm" }`, then creates dependent resources using the VM module's outputs. The firewall rule resource references `module.vm.ip_address` directly — no hardcoded IPs, no manual coordination.
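A skeleton of `modules/service-stack/main.tf` under those assumptions — the firewall resource arguments are illustrative, not tied to a specific provider version:

```hcl
module "vm" {
  source     = "../base-vm"
  vm_name    = var.service_name
  vlan_tag   = var.vlan_id
  ip_address = var.ip_address
}

# One rule per entry in var.firewall_rules, targeting the VM's address
resource "opnsense_firewall_rule" "service" {
  count            = length(var.firewall_rules)
  action           = "pass"
  protocol         = var.firewall_rules[count.index].proto
  source           = var.firewall_rules[count.index].src
  destination      = module.vm.ip_address # no hardcoded IPs
  destination_port = var.firewall_rules[count.index].port
  description      = "${var.service_name} rule ${count.index}"
}
```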
Stage 5: Manage Environments with Directory Separation
Workspaces are tempting for dev/staging separation, but they share a single backend configuration and make it easy to accidentally apply staging changes to production. For a home lab, directory-based separation is cleaner and more explicit:
```
environments/
  dev/
    main.tf            # calls modules, dev-specific vars
    terraform.tfvars
  staging/
    main.tf
    terraform.tfvars
Both environments call the same modules from `../../modules/`. The `terraform.tfvars` files hold environment-specific values like VLAN IDs and IP ranges. This mirrors the pattern your Self-Hosted State Vault from Chapter 3 already supports — each environment directory gets its own backend block pointing to a separate state file path.
The `for_each` Fleet Pattern
For homogeneous nodes — Kubernetes workers, monitoring exporters, build agents — use `for_each` on a map variable instead of duplicating module blocks:
```hcl
variable "k8s_workers" {
  type = map(object({
    ip     = string
    cores  = number
    memory = number
  }))
}

module "k8s_worker" {
  for_each = var.k8s_workers
  source   = "../../modules/base-vm"

  vm_name    = "k8s-worker-${each.key}"
  ip_address = each.value.ip
  cores      = each.value.cores
  memory     = each.value.memory
}
```
In `terraform.tfvars`, define your five workers as a map. Adding a sixth node is a one-line addition to the map — no new resource blocks, no copy-paste.
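For example (addresses and sizing are placeholders):

```hcl
k8s_workers = {
  "01" = { ip = "10.10.30.21", cores = 4, memory = 8192 }
  "02" = { ip = "10.10.30.22", cores = 4, memory = 8192 }
  "03" = { ip = "10.10.30.23", cores = 4, memory = 8192 }
  "04" = { ip = "10.10.30.24", cores = 2, memory = 4096 }
  "05" = { ip = "10.10.30.25", cores = 2, memory = 4096 }
}
```

Worker "06" is one more line in the map; `terraform plan` then shows exactly one new module instance.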
Local Module Registry with Versioned Tags
Store all modules in a `modules/` directory at the repo root. Tag releases in git: `git tag modules/base-vm/v1.2.0`. Reference modules with a git source and tag in production environments:
```hcl
source = "git::https://gitlab.homelab.local/infra/terraform-modules.git//modules/base-vm?ref=modules/base-vm/v1.2.0"
```
In dev environments, use a relative path for rapid iteration. This gives you the stability of versioned modules without standing up a private Terraform registry.
---
Marcus runs a home lab with Proxmox across three nodes. His original `main.tf` had separate `proxmox_vm_qemu` blocks for his Gitea server, his Nextcloud instance, and his Vaultwarden deployment — 180 lines of near-identical configuration with different VM names, IPs, and VLAN tags. DNS records and firewall rules were managed manually in pfSense.
After applying the Lab Module Architecture, his root `main.tf` became:
```hcl
module "gitea" {
  source          = "../../modules/service-stack"
  service_name    = "gitea"
  vlan_id         = 20
  ip_address      = "10.20.0.10"
  storage_size_gb = 50
  firewall_rules  = [{ port = 3000, proto = "tcp", src = "10.0.0.0/8" }]
}

module "nextcloud" {
  source          = "../../modules/service-stack"
  service_name    = "nextcloud"
  vlan_id         = 20
  ip_address      = "10.20.0.11"
  storage_size_gb = 500
  firewall_rules  = [{ port = 443, proto = "tcp", src = "any" }]
}
```
His 180-line `main.tf` became 40 lines. Adding Vaultwarden took 8 lines and 3 minutes. When he spun up a staging environment for testing a Nextcloud upgrade, he created `environments/staging/main.tf`, pointed it at the same modules, and ran `terraform apply` — a complete staging clone was live in 11 minutes, DNS and firewall rules included.
---
Use this worksheet against the code you built in Chapters 4–6. Complete each section before writing a single new `.tf` file.
---
Section 1 — Duplication Detector
List every resource block in your current configuration. Mark each one:
| Resource Block Name | Type | Appears N Times | Varies By | Module Candidate? (Y/N) |
|---|---|---|---|---|
| `proxmox_vm_qemu.gitea` | VM | ___ | ___ | ___ |
| `proxmox_vm_qemu.nextcloud` | VM | ___ | ___ | ___ |
| (add all your blocks) | | | | |
Extraction threshold: Mark Y if N ≥ 2 OR if the block has ≥ 4 dependent resources.
---
Section 2 — Module Boundary Map
For each group of Y-marked resources, define the module:
```
Module Name: ______________________
Layer (base / stack): ______________
Input Variables:
- _________________ (type: ______, default: ______)
- _________________ (type: ______, default: ______)
- _________________ (type: ______, default: ______)
Output Values:
- _________________
- _________________
Internal Resources:
- _________________
- _________________
```
---
Section 3 — Fleet Inventory
List any resource groups that are homogeneous (same config, different names/IPs):
```
Fleet Name: _______________________
Count: ____________________________
Differentiating attributes (map keys):
- _________________
- _________________
for_each map variable name: ________
```
---
Section 4 — Environment Separation Plan
```
Environments needed: dev / staging / prod (circle all that apply)
Backend state path for dev: ___________________________
Backend state path for staging: ___________________________
Shared module source path: ../../modules/
Environment-specific variables:
dev: VLAN prefix _____, IP range _____
staging: VLAN prefix _____, IP range _____
```
---
Section 5 — Plan Comparison Validation
After refactoring, run `terraform plan` in your existing environment directory. Record:
```
Resources to add: _____ (should be 0 if refactor is clean)
Resources to change: _____ (should be 0)
Resources to destroy:_____ (should be 0)
```
A clean refactor produces a no-op plan. Any non-zero value means a resource address or attribute changed during extraction — use `moved` blocks or `terraform state mv` for address changes, and track down attribute drift before applying.
---
You've spent seven chapters building something most home labbers never achieve: a fully codified, version-controlled infrastructure stack. The question isn't whether your Terraform code works today — it's whether it will work at 2 AM six months from now when your Proxmox host fails after a botched kernel upgrade and you're staring at a blank hypervisor.
This chapter closes the loop. By the end, you'll have a self-hosted CI/CD pipeline validating every change before it touches your lab, a tested disaster recovery runbook with real timing data, and a drift detection system that catches configuration rot before it becomes a crisis.
---
The Full-Stack Recovery Protocol is a five-phase system that transforms your Terraform codebase from "code that works when I run it" into a self-validating, self-documenting recovery machine. It addresses the two failure modes that kill home lab IaC projects: breaking changes that slip through undetected, and the bootstrap paradox — the gap between a bare hypervisor and the moment Terraform can take over.
Phase 1: Pipeline Infrastructure
Deploy your CI/CD runner on a VM that is not managed by the pipeline itself. This is non-negotiable. Woodpecker CI is the recommended choice here — it's lightweight, runs on a single container, integrates natively with Gitea (which you're likely already running from Chapter 3's state backend work), and its YAML syntax is close enough to GitHub Actions that the learning curve is minimal.
Minimum viable Woodpecker setup: one server container, one agent container, a PostgreSQL backend (reuse your existing instance if you have one), and an OAuth application registered in Gitea. The agent needs network access to your Proxmox API, your Minio state backend, and your DNS server. Mount your Terraform provider cache as a volume so pipeline runs don't re-download 200MB of provider binaries on every commit.
Phase 2: Pipeline Stages
Your `.woodpecker.yml` pipeline runs five sequential stages, with one gate: `terraform fmt -check`, `terraform validate`, `terraform plan -out=tfplan` (stored as a build artifact), a manual review gate on the plan artifact, and finally `terraform apply` against the approved plan file.
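A sketch of such a pipeline — image tags and secret names are assumptions, and the gate is expressed here as a branch/event filter on the apply step so that plans from feature branches never apply:

```yaml
steps:
  fmt:
    image: hashicorp/terraform:1.7
    commands:
      - terraform fmt -check -recursive
  validate:
    image: hashicorp/terraform:1.7
    commands:
      - terraform init -backend=false
      - terraform validate
  plan:
    image: hashicorp/terraform:1.7
    secrets: [minio_access_key, minio_secret_key]
    commands:
      - terraform init
      - terraform plan -out=tfplan -no-color | tee plan.txt
  apply:
    image: hashicorp/terraform:1.7
    secrets: [minio_access_key, minio_secret_key]
    commands:
      - terraform apply tfplan
    when:
      branch: main
      event: push
```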
Phase 3: Automated Testing with Terratest
Terratest runs Go-based integration tests against your modules after apply. For a home lab, focus on three test categories: VM reachability (SSH to the provisioned IP within 60 seconds), service health (HTTP 200 from your internal DNS-resolved service endpoint), and idempotency (run `terraform plan` again immediately after apply and assert zero changes). Write one test file per module. Keep them in a `tests/` directory alongside your module code.
Phase 4: The Bootstrap Paradox Documentation
Every Terraform codebase has a line zero — the moment before Terraform exists. Your pipeline can't provision the Gitea server that hosts your pipeline code. Your state backend can't store the state of the VM running your state backend. Document this explicitly.
The bootstrap sequence is a numbered list of manual steps, each with an estimated time, that gets a bare hypervisor to the point where `terraform init && terraform apply` can handle everything else. It lives in `BOOTSTRAP.md` at the root of your repo. It is the most important file in your codebase.
Phase 5: Scheduled Drift Detection
Add a nightly cron pipeline in Woodpecker that runs `terraform plan` across all workspaces and pipes the output to a script that checks for non-zero planned changes. If drift is detected, push a notification to your preferred alerting channel — a Gotify server on your lab network works well for this. Drift that goes undetected for weeks becomes a merge conflict between your code and reality. Catching it daily keeps the delta manageable.
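A sketch of the drift-check script — the Gotify endpoint shape and the `GOTIFY_URL`/`GOTIFY_TOKEN`/`RUN_DRIFT_CHECK` environment variables are assumptions; `terraform plan -detailed-exitcode` really does return 0 for no changes, 2 for pending changes, and 1 for errors:

```shell
#!/bin/sh
# drift-check.sh — classify the exit code of `terraform plan -detailed-exitcode`:
#   0 = no changes, 2 = pending changes (drift), anything else = plan error.
classify_plan() {
  case "$1" in
    0) echo "clean" ;;
    2) echo "drift" ;;
    *) echo "error" ;;
  esac
}

# Guarded so the script can be sourced for testing; the nightly cron
# pipeline sets RUN_DRIFT_CHECK=1 before invoking it.
if [ -n "${RUN_DRIFT_CHECK:-}" ]; then
  terraform plan -detailed-exitcode -no-color > plan.log 2>&1
  status=$(classify_plan $?)
  if [ "$status" != "clean" ]; then
    # Push the plan tail to a Gotify server on the lab network
    curl -s -X POST "$GOTIFY_URL/message?token=$GOTIFY_TOKEN" \
      -F "title=Terraform $status detected" \
      -F "message=$(tail -n 20 plan.log)"
  fi
fi
```

Wiring it into Woodpecker is one cron pipeline with this script as its only step.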
---
Marcus runs a six-node Proxmox cluster with 40+ VMs across four VLANs. His Terraform codebase covers VM provisioning, Nginx Proxy Manager configuration, Pi-hole DNS records, and Minio bucket policies. Before this chapter's work, he'd had two incidents where `terraform apply` silently failed mid-run, leaving infrastructure in a half-applied state that took hours to untangle.
After implementing the Full-Stack Recovery Protocol, his pipeline caught a breaking change before it reached production: a module refactor had renamed a Proxmox VM resource, which would have caused Terraform to destroy and recreate 12 VMs. The plan stage surfaced this as a `-12/+12` change. Marcus reviewed the artifact, recognized the problem, added `moved` blocks to the module, and re-ran. Zero downtime, zero destroyed VMs.
His bootstrap sequence clocks in at 22 minutes: 8 minutes for Proxmox ISO install and initial network config, 6 minutes for the Gitea/Woodpecker/Minio seed VMs (deployed via a standalone `bootstrap.sh` script that uses the Proxmox API directly), and 8 minutes for `terraform apply` to rebuild everything else. His last timed drill came in at 31 minutes — 9 minutes over target, traced to a Minio bucket policy that needed manual seeding before the Terraform provider could authenticate. That fix took 20 minutes to implement and is now part of the bootstrap script.
---
Use this template to document your bootstrap sequence, then run a timed drill against a non-critical subset of your lab. Fill in each section before you run the drill, then update it with actual results afterward.
---
BOOTSTRAP.md Template
```
Last validated: ____________
Validated by: ____________
Target rebuild time: ____________ minutes
Phase 0 — Hypervisor Install
Estimated time: ______ minutes
Actual time (drill): ______ minutes
Step 1: ____________________________________________
Step 2: ____________________________________________
Step 3: ____________________________________________
[Add steps until Proxmox API is reachable]
Phase 1 — Seed Infrastructure (bootstrap.sh)
Estimated time: ______ minutes
Actual time (drill): ______ minutes
Script location: ____________
Required environment variables:
- ____________
- ____________
Expected output: ____________
Failure indicators: ____________
Phase 2 — State Restore
Estimated time: ______ minutes
Actual time (drill): ______ minutes
Minio bucket name: ____________
State files to restore from backup: ____________
Backup location: ____________
Phase 3 — Terraform Apply
Estimated time: ______ minutes
Actual time (drill): ______ minutes
Command sequence:
cd ____________
terraform init -backend-config=____________
terraform workspace select ____________
terraform apply
| Drill Date | Total Time | Failures Encountered | Fixes Applied |
|------------|------------|---------------------|---------------|
| | | | |
| | | | |
```
---
Drill Execution Log (fill in during the drill)
```
Drill date: ____________
Resources destroyed for drill: ____________
Pipeline URL for rebuild run: ____________
Timeline:
Drill start: ______:______
Phase 0 complete: ______:______
Phase 1 complete: ______:______
Phase 2 complete: ______:______
Terraform apply started: ______:______
All services healthy: ______:______
Total elapsed: ______ minutes
Failures encountered:
1. ____________________________________________
Root cause: ____________________________________________
Fix applied: ____________________________________________
Code change required: [ ] Yes [ ] No
2. ____________________________________________
Root cause: ____________________________________________
Fix applied: ____________________________________________
Code change required: [ ] Yes [ ] No
Target met: [ ] Yes [ ] No
If no, primary bottleneck: ____________________________________________
```
---
---
---
---
#### Template 1: `terraform.tfvars` — Home Lab Topology Definition File
```hcl
# --- Proxmox connection ---
proxmox_host             = "[YOUR_PROXMOX_IP_OR_HOSTNAME]" # e.g., "192.168.1.10" or "pve01.lan"
proxmox_node             = "[YOUR_NODE_NAME]"              # e.g., "pve01" — must match node name in PVE UI
proxmox_api_user         = "terraform@pam"                 # Service account — see setup guide
proxmox_api_token_id     = "terraform@pam!homelab"
proxmox_api_token_secret = "[YOUR_API_TOKEN_SECRET]"       # Generate in PVE → Datacenter → API Tokens

# --- pfSense connection ---
pfsense_url      = "https://[YOUR_PFSENSE_IP]" # e.g., "https://192.168.1.1"
pfsense_user     = "admin"
pfsense_password = "[YOUR_PFSENSE_PASSWORD]"
pfsense_insecure = true # Set false if you've installed a valid cert

# --- Storage ---
proxmox_storage_pool = "[YOUR_STORAGE_POOL]" # e.g., "local-lvm", "nvme-pool", "ceph-pool"
proxmox_iso_storage  = "local"               # Where ISOs live — usually "local"

# --- VLAN topology ---
vlans = {
  management = {
    id          = 10
    subnet      = "10.10.10.0/24"
    gateway     = "10.10.10.1"
    description = "Proxmox hosts, switches, OOB management"
  }
  servers = {
    id          = 20
    subnet      = "10.10.20.0/24"
    gateway     = "10.10.20.1"
    description = "Self-hosted services — Nextcloud, Gitea, etc."
  }
  iot = {
    id          = 30
    subnet      = "10.10.30.0/24"
    gateway     = "10.10.30.1"
    description = "IoT devices — isolated, no LAN routing"
  }
  lab = {
    id          = 99
    subnet      = "10.10.99.0/24"
    gateway     = "10.10.99.1"
    description = "Experimental — can be nuked freely"
  }
}
default_vm_user     = "[YOUR_DEFAULT_USER]"   # e.g., "labadmin"
default_ssh_key     = "[YOUR_SSH_PUBLIC_KEY]" # Paste full public key string
default_template_id = 9000                    # VMID of your cloud-init template
```
---
#### Template 2: `modules/proxmox-vm/variables.tf` — Reusable VM Module Input Schema
```hcl
# Input variables for the reusable Proxmox VM module
variable "vm_name" {
  description = "Hostname of the VM — used for DNS and PVE display name"
  type        = string
  # Example: "gitea-01"
}

variable "vmid" {
  description = "Proxmox VMID — must be unique across the cluster. Use 100–199 for servers, 200–299 for lab VMs"
  type        = number
  # Example: 121
}

variable "target_node" {
  description = "Proxmox node to deploy on — must match node name exactly"
  type        = string
  # Example: "pve01"
}

variable "clone_template" {
  description = "VMID of the cloud-init template to clone from"
  type        = number
  # Example: 9000
}

variable "cores" {
  description = "Number of vCPU cores"
  type        = number
  default     = 2
}

variable "memory_mb" {
  description = "RAM in megabytes — use multiples of 1024"
  type        = number
  default     = 2048
  # 2048 = 2GB, 4096 = 4GB, 8192 = 8GB
}

variable "disk_size" {
  description = "Primary disk size — include unit suffix"
  type        = string
  default     = "20G"
}

variable "storage_pool" {
  description = "Proxmox storage pool for VM disk"
  type        = string
  # Example: "local-lvm"
}

variable "vlan_tag" {
  description = "VLAN ID for the primary network interface"
  type        = number
  # Example: 20 for servers VLAN
}

variable "ip_address" {
  description = "Static IP in CIDR notation — passed to cloud-init"
  type        = string
  # Example: "10.10.20.21/24"
}

variable "gateway" {
  description = "Default gateway for this VM"
  type        = string
  # Example: "10.10.20.1"
}

variable "dns_servers" {
  description = "List of DNS resolvers — use your internal resolver first"
  type        = list(string)
  default     = ["10.10.10.1", "1.1.1.1"]
}

variable "ssh_public_key" {
  description = "SSH public key injected via cloud-init"
  type        = string
  sensitive   = true
}

variable "tags" {
  description = "Proxmox tags for filtering in the UI — use lowercase, hyphen-separated"
  type        = list(string)
  default     = []
  # Example: ["terraform", "servers", "gitea"]
}
```
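A module with this input schema would be called like the sketch below. The `gitea` values are illustrative (taken from the examples in the comments above), and the module path assumes the `modules/proxmox-vm` layout this template sits in:

```hcl
module "gitea" {
  source = "../../modules/proxmox-vm"

  vm_name        = "gitea-01"
  vmid           = 121    # servers range: 100–199
  target_node    = "pve01"
  clone_template = var.default_template_id # the cloud-init template, e.g. 9000
  cores          = 2
  memory_mb      = 4096
  disk_size      = "40G"
  storage_pool   = var.proxmox_storage_pool
  vlan_tag       = var.vlans["servers"].id # 20
  ip_address     = "10.10.20.21/24"
  gateway        = var.vlans["servers"].gateway
  ssh_public_key = var.default_ssh_key
  tags           = ["terraform", "servers", "gitea"]
}
```

Variables with defaults (`cores`, `memory_mb`, `dns_servers`, `tags`) can be omitted; the values above override them explicitly for clarity.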
---
#### Template 3: `modules/pfsense-vlan/main.tf` — pfSense VLAN + Interface + DHCP Block
```hcl
terraform {
  required_providers {
    pfsense = {
      source  = "markdumay/pfsense"
      version = "~> 0.6"
    }
  }
}

resource "pfsense_vlan" "this" {
  interface   = var.parent_interface # e.g., "igb1" — your LAN-side NIC, NOT WAN
  vlan_tag    = var.vlan_id
  description = var.description
  priority    = 0
}
resource "pfsense_interface" "this" {
  name               = var.interface_name # e.g., "SERVERS" — appears in pfSense UI
  interface          = "vlan${var.vlan_id}" # References the VLAN created above
  enable             = true
  description        = var.description
  ipv4_type          = "staticv4"
  ipv4_address       = var.gateway_ip # e.g., "10.10.20.1"
  ipv4_prefix_length = var.prefix_length # e.g., 24

  depends_on = [pfsense_vlan.this]
}

resource "pfsense_dhcp_server" "this" {
  interface   = pfsense_interface.this.name
  enable      = true
  range_from  = var.dhcp_start # e.g., "10.10.20.100"
  range_to    = var.dhcp_end # e.g., "10.10.20.199"
  dns_servers = var.dns_servers
  gateway     = var.gateway_ip
  domain_name = var.domain_name # e.g., "servers.lab"

  depends_on = [pfsense_interface.this]
}
resource "pfsense_firewall_rule" "default_deny" {
  count = var.isolate_vlan ? 1 : 0

  interface   = pfsense_interface.this.name
  action      = "block"
  protocol    = "any"
  source      = "any"
  destination = "!${var.interface_name}" # Block traffic leaving this VLAN to other VLANs
  description = "TF-MANAGED: Default deny inter-VLAN — ${var.description}"
  log         = true

  depends_on = [pfsense_interface.this]
}
```
---
#### Template 4: `environments/lab/main.tf` — Full Lab Environment Orchestration File
```hcl
module "vlan_servers" {
  source = "../../modules/pfsense-vlan"

  vlan_id          = var.vlans["servers"].id
  parent_interface = "[YOUR_LAN_NIC]" # e.g., "igb1"
  interface_name   = "SERVERS"
  gateway_ip       = var.vlans["servers"].gateway
  prefix_length    = 24 # matches the /24 subnets defined in terraform.tfvars
  description      = var.vlans["servers"].description
  dhcp_start       = "10.10.20.100"
  dhcp_end         = "10.10.20.199"
  dns_servers      = ["10.10.10.1", "1.1.1.1"]
  domain_name      = "servers.lab"
  isolate_vlan     = false # servers VLAN may route to other VLANs
}
```
---
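Rather than repeating one module block per VLAN, the `vlans` map from Template 1 can drive a single `for_each` call. The sketch below uses standard Terraform functions (`upper`, `split`, `tonumber`, `cidrhost`); the `.100`–`.199` DHCP range and the `<name>.lab` domain convention are assumptions carried over from the example comments in Template 3:

```hcl
module "vlan" {
  source   = "../../modules/pfsense-vlan"
  for_each = var.vlans # one instance per key: management, servers, iot, lab

  vlan_id          = each.value.id
  parent_interface = "[YOUR_LAN_NIC]" # e.g., "igb1"
  interface_name   = upper(each.key) # "servers" -> "SERVERS"
  description      = each.value.description
  gateway_ip       = each.value.gateway
  prefix_length    = tonumber(split("/", each.value.subnet)[1]) # "10.10.20.0/24" -> 24
  dhcp_start       = cidrhost(each.value.subnet, 100) # assumed .100–.199 DHCP convention
  dhcp_end         = cidrhost(each.value.subnet, 199)
  dns_servers      = ["10.10.10.1", "1.1.1.1"]
  domain_name      = "${each.key}.lab" # e.g., "servers.lab"
  isolate_vlan     = each.key == "iot" # only the IoT VLAN gets the default-deny rule
}
```

Adding a fifth VLAN then means adding one entry to the `vlans` map, not a new module block; individual instances are addressed as `module.vlan["servers"]`.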
Home Lab Infrastructure as Code: The Terraform Blueprint for Proxmox, pfSense & Self-Hosted Stacks