Advanced Homelab Topics

Monitoring, Security, Backups & Automation

As my homelab grew, I realized the basics were just the start. Proper monitoring, ironclad security, reliable backups, and smart automation became non-negotiable to avoid headaches and keep everything humming along without constant babysitting.

Monitoring & Observability

In my homelab, monitoring isn't optional: it's the difference between catching a failing drive at 3 AM via an alert and waking up to a crashed server. I rely on Grafana for beautiful dashboards, Prometheus for metrics collection, and Uptime Kuma for simple uptime checks. These tools give me eyes on everything from CPU spikes to service downtime.

Grafana · Prometheus · Uptime Kuma · Node Exporter · cAdvisor

My Monitoring Stack

📈

Grafana - The Visual Dashboard

Grafana is my command center. I've built custom dashboards that show CPU, RAM, disk usage, and network traffic across all my VMs and containers. The best part? It's gorgeous and makes you feel like you're running NASA Mission Control.

# Deploy Grafana with Docker (plain docker run, no Compose file needed):
docker run -d -p 3000:3000 \
  -v grafana-storage:/var/lib/grafana \
  grafana/grafana-oss
🔍

Prometheus - The Data Collector

Prometheus scrapes metrics from all my services every 15 seconds. It stores time-series data that Grafana queries to build those beautiful graphs. I use Node Exporter for system metrics and cAdvisor for Docker container stats.

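# Prometheus config example (prometheus.yml):
global:
  scrape_interval: 15s  # matches the 15-second scrape described above

scrape_configs:
  - job_name: 'node'  # Node Exporter on its default port, 9100
    static_configs:
      - targets: ['192.168.1.10:9100']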
🚨

Uptime Kuma - Simple Status Checks

For quick uptime monitoring, Uptime Kuma is perfect. It pings all my services every minute and sends me a Telegram notification if anything goes down. Clean UI, easy setup, no fuss.
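To make the idea concrete, here is a minimal Python sketch of what a per-minute check boils down to: probe each service, then report only the ones that just went from up to down. This is not how Uptime Kuma is implemented; the function names are hypothetical, and a plain TCP connect stands in for its richer HTTP, ping, and keyword checks.

```python
import socket
from typing import Callable, Dict, List, Tuple

Target = Tuple[str, int]  # (host, port)


def tcp_probe(target: Target, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to the target succeeds within the timeout."""
    host, port = target
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_all(
    targets: Dict[str, Target],
    probe: Callable[[Target], bool] = tcp_probe,
) -> Dict[str, bool]:
    """Probe every named service and return a name -> is_up mapping."""
    return {name: probe(target) for name, target in targets.items()}


def newly_down(previous: Dict[str, bool], current: Dict[str, bool]) -> List[str]:
    """Names that were up (or unknown) last round and are down now."""
    return sorted(
        name for name, up in current.items()
        if not up and previous.get(name, True)
    )
```

Calling `check_all({"node-exporter": ("192.168.1.10", 9100)})` once a minute and sending a notification for whatever `newly_down` returns avoids repeating the same alert every minute while a service stays down.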

Network Topology

A simplified logical view of how traffic flows into my homelab.

Network Topology v2.0

Real-time traffic orchestration from the global edge to the local compute nodes.

☁️ Internet → 🛡️ Cloudflare → 🔄 OPNsense → 🔌 UniFi 10G → 🖥️ Proxmox, 💾 Synology, 📡 IoT Stack

Why Monitoring Changed Everything

✅ Catch Issues Early: Spotted a memory leak in a container before it took down the entire host. Prometheus showed RAM climbing steadily over 3 days.
✅ Performance Optimization: Discovered which services were CPU hogs and moved them to separate VMs for better resource distribution.
✅ Historical Data: When something breaks, I can look back and see exactly when things started going wrong. Invaluable for troubleshooting.
✅ Peace of Mind: Getting an "All systems operational" notification every morning is surprisingly reassuring. No more wondering if everything is still running.
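The memory-leak catch above comes down to spotting a steady upward slope in a time series. Below is a small Python sketch of that check using a least-squares fit over (hours, MiB) samples. The sample format, function names, and the 1 MiB/hour threshold are assumptions for illustration. In a real Prometheus setup this would more likely be an alert rule built on PromQL functions such as `deriv()` or `predict_linear()`.

```python
from typing import List, Tuple

Sample = Tuple[float, float]  # (hours since first sample, memory in MiB)


def slope_per_hour(samples: List[Sample]) -> float:
    """Least-squares slope of memory usage, in MiB per hour."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_v = sum(v for _, v in samples) / n
    denom = sum((t - mean_t) ** 2 for t, _ in samples)
    if denom == 0:
        raise ValueError("samples must span more than one point in time")
    return sum((t - mean_t) * (v - mean_v) for t, v in samples) / denom


def looks_like_leak(samples: List[Sample], min_slope: float = 1.0) -> bool:
    """Flag a series whose fitted growth rate is at least min_slope MiB/hour."""
    return slope_per_hour(samples) >= min_slope
```

A container that grows from 1000 MiB to 1720 MiB over 72 hours fits a slope of 10 MiB/hour and gets flagged, while one that wobbles around 1000 MiB does not.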

Backups & Disaster Recovery

Here's the truth: if you don't have backups, you don't have data. I learned this the hard way when a power surge killed a drive with 6 months of config files. Now I follow the 3-2-1 rule religiously: 3 copies of data, on 2 different media types, with 1 copy offsite. Proxmox Backup Server handles VM snapshots, Restic backs up Docker volumes to cloud storage, and I test restores monthly.

3-2-1 Rule · Proxmox Backup · Restic · Automated Schedules

The 3-2-1 Backup Rule

3️⃣

Three Copies of Your Data

Your original data plus two backups. If one fails, you've got another. I keep: (1) Production data on my Proxmox host, (2) Daily backups on a NAS, (3) Weekly backups encrypted to Backblaze B2.

2️⃣

Two Different Media Types

Don't put all backups on the same type of storage. I use SSDs for production, spinning HDDs in my NAS for local backups, and cloud object storage for offsite. Different media fail in different ways, so a fault that takes out my SSDs (a bad firmware batch, say) is unlikely to hit the HDD copy too.

1️⃣

One Copy Offsite

House fires, floods, theft: local disasters happen. My weekly encrypted backups go to Backblaze B2 via Restic. Costs me $2/month for 100GB. Best $2 I've ever spent for peace of mind.

# Restic backup to B2:
restic -r b2:my-bucket:/backups backup /data

# Restore if disaster strikes (--target / writes files back to their original paths):
restic -r b2:my-bucket:/backups restore latest --target /
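For the Docker volume backups, one way to script the same `restic backup` call per volume is sketched below in Python. It only builds the command lines. The volumes directory you pass in (commonly `/var/lib/docker/volumes`), the per-volume `--tag`, and the function names are assumptions rather than my exact setup. Restic would still need its usual credentials in the environment, for example `RESTIC_PASSWORD_FILE`, `B2_ACCOUNT_ID`, and `B2_ACCOUNT_KEY`.

```python
from pathlib import Path
from typing import List


def volume_dirs(root: str) -> List[Path]:
    """Every directory directly under the volumes root, in name order."""
    return sorted(p for p in Path(root).iterdir() if p.is_dir())


def backup_commands(repo: str, root: str) -> List[List[str]]:
    """One `restic backup` argument list per volume, tagged with the volume name."""
    return [
        ["restic", "-r", repo, "backup", "--tag", vol.name, str(vol)]
        for vol in volume_dirs(root)
    ]
```

Each returned list can be handed to `subprocess.run(cmd, check=True)` from a scheduled job. Keeping command construction separate from execution makes it easy to print the commands first and confirm what would be backed up.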
🕐
Daily VM Snapshots
Proxmox Backup Server runs at 2 AM daily. Full VM backups of critical infrastructure servers. Retention: 7 daily, 4 weekly.

🐳
Docker Volume Backups
Restic backs up all Docker volumes every 6 hours. Includes databases, config files, user data. Encrypted before leaving my network.

🧪
Monthly Restore Tests
I set a calendar reminder to restore a random backup monthly. Untested backups are just hopes and dreams. Test or regret later.

📋
Disaster Recovery Docs
Step-by-step guide for rebuilding everything from scratch. Stored in GitHub and printed on paper. When the server's dead, you can't access the docs on the server.
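A "7 daily, 4 weekly" retention policy is easier to reason about with a worked example. The Python sketch below walks snapshots from newest to oldest and keeps the newest one for each of the last 7 calendar days and each of the last 4 ISO weeks that have a snapshot. It is a simplified illustration with a hypothetical function name, not Proxmox Backup Server's actual prune algorithm, which handles overlap between the daily and weekly rules differently.

```python
from datetime import datetime, timedelta
from typing import Iterable, List


def select_keep(
    snapshots: Iterable[datetime],
    keep_daily: int = 7,
    keep_weekly: int = 4,
) -> List[datetime]:
    """Return the snapshots a simple daily/weekly retention policy would keep."""
    keep = set()
    seen_days = set()
    seen_weeks = set()
    for ts in sorted(snapshots, reverse=True):  # newest first
        day = ts.date()
        week = tuple(ts.isocalendar())[:2]  # (ISO year, ISO week number)
        if day not in seen_days and len(seen_days) < keep_daily:
            seen_days.add(day)
            keep.add(ts)
        if week not in seen_weeks and len(seen_weeks) < keep_weekly:
            seen_weeks.add(week)
            keep.add(ts)
    return sorted(keep)
```

With one 2 AM snapshot per day from 1 to 30 January 2024, this keeps the seven snapshots from 24 to 30 January plus the Sunday snapshots of 14 and 21 January: nine in total, with everything else eligible for pruning.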

Security Best Practices

Security in my homelab is layered like an onion: peel one back, and there's another keeping the bad guys out. I start with strict firewall rules on my router and Proxmox host, only opening ports I absolutely need. Fail2ban scans logs and bans IPs after failed login attempts, which has stopped countless brute-force attacks. Two-factor authentication is non-negotiable for anything exposed to the internet.

Defense in Depth · Zero Trust · Fail2ban · 2FA Everywhere · Regular Audits

My Security Layers (Defense in Depth)

🌐

Layer 1: Perimeter (Cloudflare + Firewall)

Cloudflare sits in front of everything public-facing, absorbing DDoS attacks and hiding my real IP. My router firewall blocks all inbound traffic except what comes through Cloudflare Tunnel (no open ports!). If attackers can't reach the door, they can't pick the lock.

🚪

Layer 2: Authentication (VPN + 2FA)

Admin panels and internal tools require WireGuard VPN access. Even if someone gets past that, every service has 2FA enabled via Authelia or built-in TOTP. Passwords alone aren't enough anymore.

# Authelia config for 2FA:
default_2fa_method: "totp"
totp:
  issuer: leleasley.uk
  period: 30
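The `period: 30` setting is the TOTP time step: client and server each derive a code from a shared secret and the current 30-second window. A minimal Python sketch of the standard algorithm (RFC 6238 with HMAC-SHA1) is below. It illustrates the mechanism only and is not Authelia's code; the function name is hypothetical, and real secrets are normally exchanged Base32-encoded.

```python
import hashlib
import hmac
import struct


def totp(secret: bytes, unix_time: int, period: int = 30, digits: int = 6) -> str:
    """Compute the TOTP code for a given Unix timestamp (RFC 6238, HMAC-SHA1)."""
    counter = unix_time // period                      # index of the current time step
    digest = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = digest[-1] & 0x0F                         # dynamic truncation (RFC 4226)
    code = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)
```

Because the code depends only on the secret and the time step, every timestamp inside the same 30-second window yields the same code, which is why clock drift between the phone and the server matters.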
πŸ‘οΈ

Layer 3: Monitoring (Fail2ban + CrowdSec)

Fail2ban watches SSH, Nginx, and other logs. After 3 failed attempts, the offending IP gets banned for 24 hours. CrowdSec takes this further by sharing threat intelligence: if someone attacks another homelab, their IP gets banned on mine automatically.
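The ban rule can be sketched in a few lines of Python: count recent failures per IP and ban once the limit is reached. The parameter names mirror Fail2ban's `maxretry`, `findtime`, and `bantime` options, but the class is a hypothetical illustration rather than Fail2ban's implementation, and the 10-minute counting window is an assumption (only the 3 attempts and 24 hours come from my setup).

```python
from collections import defaultdict, deque
from typing import Deque, Dict


class BanTracker:
    """Ban an IP after max_retry failures inside a find_time window (seconds)."""

    def __init__(self, max_retry: int = 3, find_time: float = 600.0,
                 ban_time: float = 86400.0) -> None:
        self.max_retry = max_retry
        self.find_time = find_time      # assumed 10-minute counting window
        self.ban_time = ban_time        # 24 hours
        self._failures: Dict[str, Deque[float]] = defaultdict(deque)
        self._banned_until: Dict[str, float] = {}

    def is_banned(self, ip: str, now: float) -> bool:
        return self._banned_until.get(ip, 0.0) > now

    def record_failure(self, ip: str, now: float) -> bool:
        """Register a failed login; return True if the IP is banned afterwards."""
        if self.is_banned(ip, now):
            return True
        window = self._failures[ip]
        window.append(now)
        while window and now - window[0] > self.find_time:
            window.popleft()            # forget failures older than the window
        if len(window) >= self.max_retry:
            self._banned_until[ip] = now + self.ban_time
            window.clear()
            return True
        return False
```

Three failures within the window trigger a ban that expires 24 hours later, while failures spread further apart than the window never accumulate.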

🔍

Layer 4: Vulnerability Scanning

Weekly Trivy scans check Docker images for known CVEs. Lynis audits my Linux systems for misconfigurations. Keeping software updated is crucial, since most breaches exploit old, unpatched systems.

# Scan Docker images weekly:
trivy image nginx:latest

# Audit system security:
sudo lynis audit system

My Security Checklist

✅ SSH Keys Only: Disabled password auth for SSH. Key-based logins only, with private keys stored securely. Brute-force password guessing simply can't succeed anymore.
✅ Separate VLANs: IoT devices on their own VLAN with no access to my servers. Smart bulbs don't need to talk to Proxmox.
✅ Regular Updates: Automated unattended-upgrades on Ubuntu, manual updates on Proxmox after testing. Patch early, patch often.
✅ Encrypted Backups: All offsite backups are encrypted before leaving my network. Restic uses AES-256. Even if cloud storage leaks, data stays safe.
✅ Least Privilege: Services run with minimal permissions. Docker containers as non-root users. If compromised, damage is contained.

Automation & Infrastructure as Code

The moment I started treating my infrastructure as code, everything became reproducible. Ansible manages all my VM configurations, Terraform provisions cloud resources, and Docker Compose defines every service. If my entire lab burns down tomorrow, I can rebuild it from GitHub in a few hours instead of weeks.

Ansible · Terraform · Docker Compose · Git for Everything
📜
Ansible Playbooks
All VM configuration in YAML. Install Docker, configure firewall rules, deploy monitoring: all automated. Run once, applies everywhere consistently.

🏗️
Docker Compose Stacks
Every service defined in compose files. Want to add a new container? Edit YAML, run `docker compose up`. No manual clicking through dashboards.

🔄
GitOps Workflow
All configs in Git. Make changes locally, test, commit, push. Git history becomes disaster recovery documentation. Broke something? Git revert.

⏰
Scheduled Tasks
Cron jobs for backups, updates, and maintenance. Certificate renewals, log rotation, cleanup tasks: all automated. Set it once, forget it forever.

# Example Ansible playbook for Docker setup (setup.yml):
- hosts: all
  become: true  # installing packages needs root
  tasks:
    - name: Install Docker
      ansible.builtin.apt:
        name: docker.io
        state: present
        update_cache: true  # refresh the package index first

# Run across all VMs:
ansible-playbook -i inventory setup.yml
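
What makes re-running a playbook safe is idempotence: each task describes a desired state and only acts, and reports "changed", when the system doesn't already match. Here is a minimal Python sketch of that idea, loosely modelled on what a line-in-file style task does. The function name is hypothetical and this is not Ansible's implementation.

```python
from pathlib import Path


def ensure_line(path: str, line: str) -> bool:
    """Make sure `line` is present in the file; return True only if the file changed."""
    target = Path(path)
    existing = target.read_text().splitlines() if target.exists() else []
    if line in existing:
        return False                    # already in the desired state, do nothing
    existing.append(line)
    target.write_text("\n".join(existing) + "\n")
    return True
```

The first call adds the line and returns True; every later call returns False and leaves the file untouched, which is the same behaviour that lets a playbook run repeatedly against all VMs without piling up duplicate changes.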