The Tailscale Revolution: 16 Months from Port Forwarding Hell to Mesh Nirvana

I’m sitting in a coffee shop, forty miles from home. I open a terminal and type:

ssh commander@192.168.20.196

38ms. Connected. I’m inside an LXC container running on a Proxmox host at my dad’s house, on a residential ISP connection, behind a consumer router that has never had a port forwarded in its life. No VPN client window. No tunneling app. No DNS gymnastics. Just an IP address that shouldn’t work from here — but does.

Sixteen months ago, this would have been a three-beer problem.


The Bad Old Days

My infrastructure spans two physical locations. The Milky Way network — 10.42.0.0/24 — is the home base. Proxmox hosts, Docker swarms, a build swarm that compiles Gentoo packages, the whole circus. Forty miles south, at dad’s house, sits the Andromeda network — 192.168.20.0/24 — with another Proxmox hypervisor, an Unraid array stuffed with media, and a Synology quietly doing backup duty.

For sixteen months, connecting them was an exercise in creative suffering.

OpenVPN was the first attempt. It worked. Slowly. The connection was stable enough to SSH over, but transferring anything larger than a config file felt like pushing a shopping cart through mud. The client was clunky. The config files were fragile. And every time the residential IP changed — which was roughly whenever Comcast felt like it — everything broke until I noticed, logged in locally (via some other hack), and updated the endpoint.

WireGuard was attempt two. Faster. Much faster. The protocol itself is elegant. But managing keys across a dozen devices, keeping configs in sync, handling the endpoint dance on dynamic IPs… it scaled about as well as a to-do list written on a napkin. Every new VM or container meant generating a new keypair, updating every peer config, and hoping I didn’t fat-finger a public key somewhere.

Port forwarding was the constant background noise through all of this. SSH on some non-standard port. Plex remote access punching through the router. A reverse proxy exposed to the internet for services I didn’t want to think about too hard. Dynamic DNS updating on a cron job that sometimes just… didn’t. I’d wake up to find my DDNS record pointed at some IP Comcast assigned to someone in the next zip code.

The whole thing held together with duct tape and denial.


One Command

I don’t remember exactly when I first ran tailscale up. Sometime in late 2024. Probably 11:43 PM on a Saturday, because that’s when I make all my best infrastructure decisions.

The laptop joined the tailnet in about four seconds. I could ping my desktop across the room via its 100.64.x.x address. Cool. But that wasn’t the point. I didn’t need to reach devices that already had Tailscale installed. I needed to reach everything behind those devices — the LXC containers, the Docker services, the NAS boxes, the random VM I spun up three months ago and forgot about.

Simply putting Tailscale on my laptop was step zero. The real work was turning my existing infrastructure into subnet routers — nodes that say “hey, I know how to reach this entire network, send the packets to me.”

That’s where things got interesting.


Subnet Routing: The Actual Magic

The concept is simple. You pick a node on each network, install Tailscale on it, and tell it to advertise the local subnet. Any device on the tailnet can then reach any IP on that subnet, routed through the advertiser.

For the Milky Way network, I designated Altair-Link (10.42.0.199) as the subnet router. For Andromeda, it’s Tarn-Host (192.168.20.100) — the Proxmox hypervisor at dad’s place.

The magic incantation, run on each subnet router:

# On Altair-Link (Milky Way)
tailscale up --advertise-routes=10.42.0.0/24 \
             --advertise-tags=tag:milkyway-altair

# On Tarn-Host (Andromeda)
tailscale up --advertise-routes=192.168.20.0/24 \
             --advertise-tags=tag:andromeda-tarn

That’s it. That one flag — --advertise-routes — turns a regular Tailscale node into a gateway for an entire subnet. Every device on my tailnet can now route to those CIDRs through the node that advertises them.
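
One client-side detail: on Linux machines, advertised subnet routes are ignored until the client opts in (the macOS and Windows apps accept them by default, as far as I can tell). On each Linux client that should use these routes:

tailscale up --accept-routes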

But there’s a catch. By default, Tailscale requires you to manually approve advertised routes in the admin console. Every time the node restarts, you’re back in the web UI clicking “approve.” At 2 AM. After a power blip at dad’s house. While half asleep.

No. We fix this with ACLs and auto-approvers.


ACLs: The Bouncer at the Door

Tailscale’s default policy is “allow all.” Everyone on the tailnet can reach everyone else. That’s fine for three devices. It’s reckless for thirty.

I locked it down with tags and ACL rules. The ACL policy lives in the Tailscale admin console and looks like this:

{
  "tagOwners": {
    "tag:milkyway-altair":   ["commander@github"],
    "tag:andromeda-tarn":   ["commander@github"],
    "tag:milkyway-gateway":   ["commander@github"],
    "tag:server":         ["commander@github"]
  },
  "acls": [
    {
      "action": "accept",
      "src": ["commander@github"],
      "dst": ["*:*"]
    },
    {
      "action": "accept",
      "src": ["tag:milkyway-gateway"],
      "dst": ["*:*"]
    },
    {
      "action": "accept",
      "src": ["tag:server"],
      "dst": ["tag:server:*"]
    }
  ],
  "autoApprovers": {
    "routes": {
      "10.42.0.0/24":    ["tag:milkyway-altair", "tag:milkyway-gateway"],
      "192.168.20.0/24": ["tag:andromeda-tarn"]
    }
  }
}

The autoApprovers block is the critical piece. It says: “When a node tagged tag:andromeda-tarn advertises the 192.168.20.0/24 route, approve it automatically.” No manual clicking. No 2 AM admin console visits. The node reboots, re-advertises its routes, and the tailnet accepts them without intervention.

The tag structure also gives me granular control. My personal devices (logged in as commander@github) can reach everything. Servers tagged tag:server can only talk to other servers. The gateway nodes can route freely because that’s their entire job.

This took maybe twenty minutes to set up. I spent longer than that per week managing WireGuard peer configs.
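
For reference, the tags themselves get attached when a node is brought up. Since commander@github owns the tags in the policy above, tagging a new server node is just another flag on tailscale up, something like:

# Tag a node at join time; only tag owners (per tagOwners above) can do this
tailscale up --advertise-tags=tag:server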


The LXC Problem: Where Everyone Gets Stuck

Here’s where the story gets ugly.

Subnet routing was working. I could reach 192.168.20.100 (Tarn-Host itself) from anywhere on the tailnet. Beautiful. So I tried reaching the LXC container at 192.168.20.196.

ssh commander@192.168.20.196

Nothing. The terminal just… hung. No timeout. No connection refused. No error. Just infinite patience from a TCP stack that would never get a reply.

I ran tcpdump on Tarn-Host. The SYN packet arrived. It got forwarded to the container. The container received it. The container sent back a SYN-ACK.

To the wrong place.

Here’s what was happening, step by painful step:

  1. My laptop sends a packet to 192.168.20.196. The source IP is 100.64.0.22 (my Tailscale IP).
  2. The packet hits Tarn-Host via the tailscale0 interface.
  3. Tarn-Host says “I know where .196 is, it’s on vmbr0” and forwards the packet.
  4. The LXC container at .196 receives it. Source IP: 100.64.0.22. Destination: itself.
  5. The container prepares a reply to 100.64.0.22.
  6. The container checks its routing table. Default gateway: 192.168.20.1 (the physical router).
  7. The reply goes to the router. The router sees destination 100.64.0.22, a CGNAT range it’s never heard of, and drops the packet into the void.

The container was replying, but replying to the wrong door. It sent its answer to the physical router instead of back through the Proxmox host’s Tailscale interface. The router, having no concept of the 100.64.0.0/10 range, did what any sensible router does with traffic it can’t route: absolutely nothing.

Asymmetric routing. The packet came in through Tailscale but tried to leave through the default gateway. Classic.

I stared at tcpdump output for about forty minutes before this clicked. Past Me thought subnet routing was just “advertise and go.” Present Me learned that Linux routing tables don’t care about your assumptions.
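
If you want to watch the failure happen yourself, two captures tell the whole story. A sketch, with the container-side interface name (eth0) being an assumption:

# On the Proxmox host: the SYN arrives over the tailnet
tcpdump -ni tailscale0 'host 192.168.20.196 and tcp port 22'

# Inside the container: the SYN-ACK heads for the default gateway and never comes back
tcpdump -ni eth0 'tcp port 22'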


Policy-Based Routing: The Fix

The solution has four parts. None of them are optional. I learned this by trying to skip each one individually.

Part 1: IP Forwarding

The Proxmox host needs to act as a router, forwarding packets between its interfaces. This is probably already enabled on most Proxmox installations, but verify it:

# /etc/sysctl.conf
net.ipv4.ip_forward = 1
net.ipv6.conf.all.forwarding = 1
sysctl -p
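
A quick read-back confirms both knobs actually took (each should print 1):

sysctl net.ipv4.ip_forward net.ipv6.conf.all.forwarding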

Part 2: Reverse Path Filtering

Linux has a security feature called reverse path filtering (rp_filter). It checks: “Did this packet arrive on the interface it should have arrived on, given its source IP?” For our asymmetric routing situation, the answer is always “no,” so Linux drops the packet as a potential spoof.

We need to disable this for the relevant interfaces:

# /etc/sysctl.d/99-tailscale.conf
net.ipv4.conf.tailscale0.rp_filter = 0
net.ipv4.conf.vmbr0.rp_filter = 0

I initially only disabled it for tailscale0. Spent another thirty minutes wondering why packets were still being dropped before realizing vmbr0 needed it too. Both sides of the asymmetric path need the check relaxed.
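
Apply the drop-in and read the values back to be sure. If turning the check fully off bothers you, setting rp_filter to 2 (the kernel's "loose" mode) is a softer alternative that still tolerates asymmetric paths:

sysctl -p /etc/sysctl.d/99-tailscale.conf
sysctl net.ipv4.conf.tailscale0.rp_filter net.ipv4.conf.vmbr0.rp_filter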

Part 3: Custom Routing Table

This is the actual routing fix. We create a custom routing table (number 52, which also happens to be the table Tailscale itself manages on Linux; any free table number works, as long as the ip rule below points at the same one) and tell the kernel: “If traffic arrives on the tailscale0 interface, consult this special table for the return path.”

# Ensure Tailscale CGNAT range routes through tailscale0
ip route add 100.64.0.0/10 dev tailscale0 table main

# In table 52, tell traffic how to reach the local subnet
ip route add 192.168.20.0/24 dev vmbr0 src 192.168.20.100 table 52

# Policy rule: traffic arriving from tailscale0 uses table 52
ip rule add iif tailscale0 lookup 52 prio 100

The src 192.168.20.100 in the route is important. It tells the kernel what source IP to use when forwarding to the local subnet — the Proxmox host’s own LAN IP. Without this, replies can get confused about which interface to use.
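
To confirm the kernel sees all of this, read the rule and the table back:

# The prio 100 rule should be listed ahead of anything Tailscale adds
ip rule show

# Table 52 should show the local subnet via vmbr0 with the right source address
ip route show table 52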

Part 4: SNAT Masquerading

The final piece. Even with the routing table fixed, the LXC container still sees 100.64.0.22 as the source IP in incoming packets. Its reply will go to its default gateway (the router) because it doesn’t know what 100.64.0.0/10 is.

We fix this by masquerading. As traffic leaves Tarn-Host toward the container, we rewrite the source IP to be Tarn-Host’s own LAN address:

iptables -t nat -A POSTROUTING -s 100.64.0.0/10 -o vmbr0 -j SNAT --to-source 192.168.20.100

Now the container sees the request coming from 192.168.20.100 — its local neighbor. It replies to .100, which is Tarn-Host, which un-NATs the packet, looks up the routing table, and sends it back down tailscale0.
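
The packet counters on that rule are a decent sanity check; they should start climbing the moment a remote connection comes in:

iptables -t nat -L POSTROUTING -n -v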

The Complete Traffic Flow

Here’s the full path, all seven steps, for an SSH connection from my laptop to the LXC container:

  1. Laptop (100.64.0.22) sends packet to 192.168.20.196 via Tailscale.
  2. Tarn-Host receives it on tailscale0. Destination is on vmbr0.
  3. SNAT rewrites source from 100.64.0.22 to 192.168.20.100.
  4. Tarn-Host forwards the packet out vmbr0 to the container.
  5. Container (192.168.20.196) receives the packet, sees source 192.168.20.100, replies normally.
  6. Tarn-Host receives the reply, un-NATs it (restoring destination to 100.64.0.22), consults routing table.
  7. Tarn-Host routes the reply out tailscale0. Laptop receives it.

38ms round trip. The container has no idea it’s talking to someone forty miles away.


The Unraid Challenge

With Proxmox sorted, I turned to Meridian-Mako-Silo — the Unraid box at dad’s place. Surely installing Tailscale on Unraid would be straightforward.

It was not.

Unraid boots from a USB flash drive formatted as FAT. FAT doesn’t support Unix execute permissions. FAT doesn’t support Unix sockets. The Tailscale binary needs both.

The solution: copy the binaries to a RAM-backed filesystem at boot, where execute permissions work, and persist only the authentication state back to the flash drive.

The boot script (/boot/config/go):

# Copy Tailscale binaries to RAM
mkdir -p /usr/local/bin
cp /boot/config/tailscale/tailscaled /usr/local/bin/
cp /boot/config/tailscale/tailscale /usr/local/bin/
chmod +x /usr/local/bin/tailscaled /usr/local/bin/tailscale

# State directory on flash for auth persistence
mkdir -p /boot/config/tailscale/state

# Start the daemon with state on flash
/usr/local/bin/tailscaled \
  --state=/boot/config/tailscale/state/tailscaled.state \
  --socket=/var/run/tailscale/tailscaled.sock &

sleep 3

# Bring up Tailscale with subnet advertising
/usr/local/bin/tailscale up \
  --advertise-routes=192.168.20.0/24 \
  --accept-routes \
  --hostname=meridian-mako-silo

This worked. Mostly. The problem was that local traffic — Plex talking to its own media shares, Docker containers reaching the NAS — started routing through Tailscale instead of staying on the local network. Latency for local operations went from sub-millisecond to 15ms as packets took a scenic tour through the tailnet.

The fix: a local routing priority rule that says “traffic destined for the local subnet stays local, period.”

ip rule add to 192.168.20.0/24 lookup main priority 5200

This inserts a rule with higher priority (lower number = higher priority) than the Tailscale-injected rules, ensuring local-to-local traffic never touches tailscale0. Plex goes back to sub-millisecond access to its media. The Tailscale route only kicks in for traffic from outside the local subnet.

A subtle thing, and the kind of bug that doesn’t show up immediately. I only caught it because Plex transcoding suddenly felt sluggish and I started looking at interface counters. The tailscale0 interface was carrying way more traffic than it should have been for a node with no active remote sessions.
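
Reading the rules back makes the fix easy to verify; the priority 5200 rule should be listed ahead of the rules Tailscale injects:

ip rule show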


The Gentoo Quirk

Back on the Milky Way side, Capella-Outpost runs Gentoo. (Everything I touch eventually runs Gentoo. It’s a disease.) Tailscale on Gentoo works fine, but there’s a subtle issue with --netfilter-mode.

On most distros, Tailscale manages its own iptables/nftables rules to handle routing. On Gentoo with OpenRC, the --netfilter-mode=off flag doesn’t persist across reboots the way it does on systemd distros. The Tailscale daemon starts, ignores your preference, and quietly re-enables netfilter management, which conflicts with any custom iptables rules you’ve written.

The fix: a helper script that runs after Tailscale starts and reapplies the correct configuration.

#!/bin/bash
# /etc/local.d/tailscale-fixup.start

sleep 10  # Wait for tailscaled to fully initialize

# Re-apply netfilter-mode
tailscale set --netfilter-mode=off

# Manually add routes that Tailscale would normally manage
ip route add 192.168.20.0/24 dev tailscale0 2>/dev/null || true
ip route add 10.42.0.0/24 dev tailscale0 2>/dev/null || true

The sleep 10 is ugly but necessary. The Tailscale daemon takes a few seconds to connect to the coordination server and establish its interface. If you try to tailscale set before it’s ready, you get a cryptic “not connected” error that doesn’t tell you to just wait.

I spent an entire evening debugging that timing issue. The script worked perfectly when I ran it manually. Failed silently on boot. Turned out the daemon wasn’t ready yet. Added the sleep. Felt dirty. Moved on.
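
If the blind sleep offends you more than it offends me, a retry loop around the command itself tolerates slow boots without guessing at a number. A sketch of the same fixup script with that change:

#!/bin/bash
# /etc/local.d/tailscale-fixup.start (variant: retry instead of a fixed sleep)

# Keep trying until tailscaled is reachable and accepts the setting (up to ~60s)
for i in $(seq 1 60); do
    tailscale set --netfilter-mode=off 2>/dev/null && break
    sleep 1
done

# Manually add routes that Tailscale would normally manage
ip route add 192.168.20.0/24 dev tailscale0 2>/dev/null || true
ip route add 10.42.0.0/24 dev tailscale0 2>/dev/null || true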


Persistence: Surviving Reboots

None of this matters if it vanishes after a reboot. For the Proxmox hosts, I wrap the routing rules into a script at /etc/network/if-up.d/tailscale-routes:

#!/bin/bash
# /etc/network/if-up.d/tailscale-routes
# Policy-based routing for Tailscale subnet access to LXC containers

# Wait for tailscale0 to exist
for i in $(seq 1 30); do
    ip link show tailscale0 &>/dev/null && break
    sleep 1
done

# Bail if Tailscale never came up
ip link show tailscale0 &>/dev/null || exit 0

# Routes
ip route add 100.64.0.0/10 dev tailscale0 table main 2>/dev/null || true
ip route add 192.168.20.0/24 dev vmbr0 src 192.168.20.100 table 52 2>/dev/null || true

# Policy rule
ip rule add iif tailscale0 lookup 52 prio 100 2>/dev/null || true

# SNAT
iptables -t nat -C POSTROUTING -s 100.64.0.0/10 -o vmbr0 -j SNAT --to-source 192.168.20.100 2>/dev/null || \
iptables -t nat -A POSTROUTING -s 100.64.0.0/10 -o vmbr0 -j SNAT --to-source 192.168.20.100

The 2>/dev/null || true pattern on the ip route commands prevents errors if the rules already exist (idempotency). The iptables -t nat -C check before -A prevents duplicate NAT rules stacking up on every interface bounce. I learned that one the hard way — after a few days of interface flaps, I had seventeen identical SNAT rules and couldn’t figure out why conntrack was acting weird.

chmod +x /etc/network/if-up.d/tailscale-routes


Testing: Trust but Verify

After all this, I run four tests. Every time. Even if I’m sure it works.

Test 1: Local traffic stays local.

From Tarn-Host, ping a local device and verify packets go through vmbr0, not tailscale0:

tcpdump -i tailscale0 host 192.168.20.1 &
ping -c 3 192.168.20.1
# Should see ZERO packets on tailscale0

If you see traffic here, your local routing priority is wrong. Local traffic is leaking into the mesh.

Test 2: Remote subnet routing works.

From the laptop (on an external network), reach the remote subnet router:

ping 192.168.20.100
# Should get replies, ~38ms from 40 miles away

If this fails, check that the route is approved in the Tailscale admin console (or that your autoApprovers ACL is correct).

Test 3: LXC container access.

The moment of truth. From the laptop:

ssh commander@192.168.20.196

If this hangs, the asymmetric routing fix isn’t working. Check tcpdump on both tailscale0 and vmbr0 on the Proxmox host. You should see the SYN arrive on tailscale0, the SNAT’d SYN leave on vmbr0, the SYN-ACK come back on vmbr0, and the reply go out on tailscale0.
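
Something like this, in two terminals on the Proxmox host, makes all four legs visible:

# Terminal 1: the tailnet side
tcpdump -ni tailscale0 'host 192.168.20.196 and tcp port 22'

# Terminal 2: the LAN side; expect the SNAT'd SYN out and the SYN-ACK back
tcpdump -ni vmbr0 'host 192.168.20.196 and tcp port 22'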

Test 4: Reboot persistence.

reboot
# Wait for system to come back
ssh commander@192.168.20.196

If it works before reboot but not after, your persistence script isn’t running or isn’t waiting long enough for tailscale0 to initialize. Check the timing.


What I Actually Have Now

Two sites. Flat mesh. Zero exposed ports.

I SSH from coffee shops into LXC containers like they’re on the next desk. Plex streams from Meridian-Mako-Silo via its Tailscale IP — no remote access configuration, no relay servers, no “not available outside your network” nonsense. The build swarm drones at dad’s house communicate with the orchestrator at mine across the tailnet, coordinating Gentoo package compilation as if they’re on the same switch.

The Tailscale admin console shows twelve nodes. Two subnet routers. Four advertised routes. The ACL policy is twenty-three lines long and hasn’t needed a change in months.

Here’s what replaced what:

Before → After

OpenVPN config files on every device → Tailscale installed on 2 subnet routers
WireGuard keypairs for every peer → Tags and ACLs in one JSON policy
Port forwarding on two routers → Zero ports forwarded
Dynamic DNS with stale records → Tailscale MagicDNS (or just IPs)
Praying after power outages → Auto-approvers + persistence scripts

The routing table hacks are ugly. Policy-based routing, SNAT masquerading, reverse path filter exceptions — this isn’t clean networking. It’s the kind of thing that makes a network engineer twitch. But it works. It’s been working for months. The persistence scripts survive reboots. The auto-approvers handle restarts. The LXC containers don’t even know they’re accessible from forty miles away.

The Unraid FAT filesystem problem is absurd. Copying binaries to RAM on every boot, persisting auth state to flash, adding local routing priority rules so your own NAS doesn’t route local traffic through a WireGuard tunnel and back instead of just crossing its own switch. It’s a hack. It works. I’m not proud. I’m not ashamed. I’m just running.

The Gentoo timing issue is the most Gentoo thing ever. A ten-second sleep in a startup script because the daemon isn’t ready yet and the error message doesn’t say “wait.” I have written more elegant code. I have also written code that doesn’t work. I’ll take the sleep.

Sixteen months of VPN pain. Port forwarding nightmares. Dynamic DNS failures at the worst possible time. And the fix was a mesh network that took an afternoon to set up and a weekend to polish the routing edges.

The laptop doesn’t care where I am anymore. Coffee shop, hotel, airport — I type an IP address and I’m home.

That’s the whole point.