user@argobox:~/journal/2026-02-11-the-admin-panel-fix-and-the-duplicate-build-crisis
$ cat entry.md

The Admin Panel Fix & The Duplicate Build Crisis



Started: 22:00 on Feb 11
Ended: 01:30 on Feb 12
Duration: 3.5 hours
Bugs Found: 3 critical connection issues
Systems Affected: Admin panel, binary flow, cron automation
Status: Admin panel working, duplicate builds mitigated, package control system designed


22:00 -- The Admin Panel Couldn't Connect

The build swarm admin panel was completely offline. Tried to load the dashboard, got connection refused.

Took a step back and checked the config. Found three separate port/IP mismatches:

Issue 1: build-swarm-v3/admin/app.js line 10 had fallback IP 10.0.0.199 (the gateway, which doesn't run the admin service). Should be 10.0.0.100 (callisto).

Issue 2: argobox/[...path].ts line 14 had port 8094 hardcoded. But the admin service listens on 8093.

Issue 3: argobox/astro.config.mjs line 223 also had port 8094. Same problem.

All three were pointing to wrong places. Fixed them all. Admin panel came online immediately.

This is embarrassing because these are config values that should have been synchronized. They drifted over time as people added features without checking if there was a canonical place to store them.
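A canonical config file would have prevented the drift. A minimal sketch, assuming a shared swarm.env that every service sources instead of carrying its own fallbacks (the filename and variable names are mine; the IP and port values are the corrected ones from above):

```shell
# Hypothetical canonical config: swarm.env. Every service sources this
# instead of hardcoding its own fallback IPs and ports.
cat > /tmp/swarm.env <<'EOF'
ADMIN_HOST=10.0.0.100   # callisto runs the admin service, not the gateway
ADMIN_PORT=8093         # the port the admin service actually listens on
BINHOST_URL=http://10.0.0.204/packages/
EOF

# A service reads the shared values rather than keeping a private copy:
. /tmp/swarm.env
echo "admin endpoint: http://${ADMIN_HOST}:${ADMIN_PORT}"
```

With one file as the source of truth, a wrong port is a one-line fix instead of a hunt across three repos.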


23:00 -- The Binary Flow Mystery

Started investigating why duplicate builds were happening. The issue: drones would build the same package twice, wasting compute.

Traced the flow:

  1. Package gets built on drone
  2. Binary rsync'd from drone → callisto staging dir (/var/cache/binpkgs-staging/)
  3. Here it sits. Never syncs to io's binhost.
  4. Meanwhile, orchestrator still sees "needed" in the queue
  5. Different drone picks it up again
  6. Builds it again

690MB stuck in the staging directory. Stale binpkgs going nowhere.

The SQL delegation IS atomic: the WHERE status = 'needed' guard prevents double-assignment of a queue row. But the real problem was explicit in the code: swarm-drone was deliberately blanking PORTAGE_BINHOST and disabling getbinpkg.
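The guard pattern itself is easy to demonstrate with a throwaway SQLite database (the queue schema here is illustrative, not the real one):

```shell
# Sketch of the atomic claim: UPDATE ... WHERE status='needed' acts as a
# test-and-set, so only one drone can flip a row to 'building'.
db=$(mktemp)
sqlite3 "$db" "CREATE TABLE queue(pkg TEXT PRIMARY KEY, status TEXT);
               INSERT INTO queue VALUES('dev-lang/python','needed');"

# Drone 1 claims the row; the status guard still holds, so one row changes.
sqlite3 "$db" "UPDATE queue SET status='building'
               WHERE pkg='dev-lang/python' AND status='needed';
               SELECT changes();"

# Drone 2 runs the same statement; the guard sees status != 'needed',
# so zero rows change: no double-assignment at the SQL level.
sqlite3 "$db" "UPDATE queue SET status='building'
               WHERE pkg='dev-lang/python' AND status='needed';
               SELECT changes();"
```

So the queue rows themselves were never handed out twice; the duplication came from dependencies entering the queue as separate packages.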

The drones weren't using binpkgs from the binhost at all. They were building everything from source. So when they resolved dependencies, emerge would pull in all the transitive dependencies. Another drone would get assigned a dependency as a separate package. Both drones build it. Duplicate work.


23:30 -- Option 4: Use the Binhost

The fix was straightforward but required coordination:

Modified swarm-drone's emerge command:

  • --usepkg=y
  • --getbinpkg=y
  • PORTAGE_BINHOST=http://10.0.0.204/packages/
  • getbinpkg in FEATURES

Now drones try to download pre-built packages from the binhost before building from source. Dependency resolution still pulls in transitive deps, but they exist on the binhost, so no build is needed. No duplicate work.
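Put together, the drone's invocation looks roughly like this (assembled and echoed rather than executed, since emerge isn't available everywhere; app-misc/example is a placeholder atom):

```shell
# Assemble the post-fix emerge invocation. The binhost URL is the one from
# the journal; the package atom is a placeholder.
binhost="http://10.0.0.204/packages/"
cmd="PORTAGE_BINHOST=$binhost FEATURES=getbinpkg emerge --usepkg=y --getbinpkg=y app-misc/example"
echo "$cmd"
```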

Automated the staging → binhost sync:

Rewrote sync-to-binhost.sh with:

  • Lock file to prevent concurrent syncs
  • Quiet mode for cron
  • Proper error logging
  • Installed cron: */2 * * * * (every 2 minutes)

Now the 690MB stuck in staging automatically syncs to io every 2 minutes. Deployed to all 4 drones via scp + rc-service restart. All back online.


00:45 -- The Package Control System Design

This is the real problem: the build system has no way to tell which packages SHOULD be built and which shouldn't. It's reactive — "here's the queue, build it" — with no strategy for what actually needs to exist.

Designed a 3-phase solution:

Phase 1: Package Exclusion List

  • Global exclusion patterns (e.g., never build media-video/nvidia-driver)
  • Per-drone exclusions (some hardware needs specific packages)
  • Auto-exclude on cross-compilation failures (if a package fails on one drone, exclude it from others with different CFLAGS)
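Phase 1's exclusion matching could be as simple as shell glob patterns (the pattern list and function name are illustrative, not from the plan):

```shell
# Hypothetical exclusion check: a package atom is skipped if it matches
# any glob pattern in the exclusion list.
EXCLUDE_PATTERNS="media-video/nvidia-driver* sys-kernel/*-sources"

is_excluded() {
    pkg=$1
    for pat in $EXCLUDE_PATTERNS; do
        case $pkg in
            $pat) return 0 ;;   # matched an exclusion pattern: skip it
        esac
    done
    return 1                    # no pattern matched: safe to build
}

is_excluded "media-video/nvidia-driver-550" && echo "skip"
is_excluded "app-editors/vim" || echo "build"
```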

Phase 2: Capability-Based Filtering

  • Route multilib packages (ABI_X86=32 variants) only to drones that have multilib enabled
  • Route ARM packages only to ARM drones
  • Route CUDA packages only to NVIDIA drones
  • Use existing heartbeat data to know what each drone can actually build

Phase 3: Dependency-Aware Scheduling

  • Build a dependency cache
  • Spawn emerge --pretend in a background thread to resolve dependencies
  • Schedule packages in dependency order instead of FIFO
  • Prevent "build the deps, then the package" redundancy
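For Phase 3, dependency-order scheduling is exactly what tsort(1) computes: feed it "dependency dependent" pairs and it emits an order in which every dependency precedes its dependents (the package pairs below are illustrative):

```shell
# Toy dependency graph: zlib is needed by openssl and curl, openssl by
# curl. tsort emits a valid build order: zlib, then openssl, then curl.
tsort <<'EOF'
dev-libs/openssl net-misc/curl
sys-libs/zlib net-misc/curl
sys-libs/zlib dev-libs/openssl
EOF
```

A scheduler walking that order would hand out zlib first, so by the time curl is assigned, its deps already exist as binpkgs instead of being rebuilt inline.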

The full plan is at ~/.claude/plans/floating-gliding-mitten.md. Approved and ready to implement.


01:00 -- How Packages Actually Enter the Queue

Spent time documenting the actual flow because it's not obvious:

build-swarmv3 fresh → runs emerge --pretend --emptytree @world on callisto → parses output → populates queue table.

That's it. Manual only. No auto-rescan when portage syncs. No incremental updates.
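The parse step behind fresh can be sketched with sed against sample output (the real parser lives in build-swarm-v3; these lines only mimic emerge --pretend's format):

```shell
# Extract package atoms from [ebuild ...] lines, ignoring everything else.
parse_queue() {
    sed -n 's/^\[ebuild[^]]*\][[:space:]]*\([^[:space:]]*\).*/\1/p'
}

# Sample lines shaped like emerge --pretend --emptytree @world output:
parse_queue <<'EOF'
[ebuild  N    ] sys-libs/zlib-1.3-r1  USE="minizip"
[ebuild  N    ] dev-libs/openssl-3.0.13  USE="asm"
These are the packages that would be merged, in order:
EOF
```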

This is why the build swarm can't maintain a distribution. You run fresh once, get 1,680 packages, build them all, and then... nothing. The next day, portage adds 5 new packages. But they're not in the queue because nobody ran fresh again.

This is why the Build Profiles system (planned for v2.7.0) is necessary. You define what SHOULD be built, and it stays built. Portage updates? Auto-rescan and queue only what changed. Someone adds a package to the world file? Next sync picks it up.


01:30 -- Final Status

All drones are back online:

  • 4/4 drones online
  • 81.2% success rate (224/276 packages built successfully)
  • 1,299 packages needed
  • 368 packages received (binhost transfers completed)
  • 234 packages synced to io's binhost
  • Duplicate builds mitigated
  • Admin panel working
  • Binhost sync automated via cron

The system is more stable now, but the root problem — lack of strategy for what to build — is still there. The Package Control System will fix that.


The Insight

Infrastructure doesn't have opinions. It'll build the same package twice. It'll sit on 690MB of binpkgs forever. It'll let the admin panel point to the wrong port. The system needs explicit rules about what should happen.

That's what the Package Control System is: explicit rules about which packages should be built, which drones can build them, and what order they should be built in.

Right now we have reactive chaos. Phase 1 is exclusions (stop building things we don't want). Phase 2 is filtering (send each package to drones that can actually build it). Phase 3 is intelligence (understand dependencies and build in the right order).

One session can't implement all three. But the design is done. Implementation starts next sprint.


690MB of binpkgs sitting in staging, waiting to be synced. An admin panel pointing to the wrong port. Drones building the same package twice. All of it because the system had no strategy. Strategy starts with Phase 1.