Skip to main content
user@argobox:~/journal/2026-02-07-25-bugs-one-night
$ cat entry.md

25 Bugs, One Night

○ NOT REVIEWED

25 Bugs, One Night

Session: Feb 7, 6:00 PM — Feb 8, 6:30 AM Duration: 12.5 hours Bugs Fixed: 25+ Components Touched: 4 Coffee Consumed: Don’t ask


18:00 — The Audit Begins

Sat down planning to “quickly review” the codebase before a deploy. Opened the drone module first. Opened five more files. Opened the coordinator, the dashboard, the auto-balance logic.

By 18:30 I had six component reviews running in parallel and a growing list of things that made me wince.

19:15 — The Drone Bugs (7 total)

Seven bugs in the drone module alone. Most were edge cases around error handling and state transitions. But one was special.

The recursive retry bug. The star of the evening.

When a package build failed and the drone retried, the retry logic was appending --retry-after-fix to the package atom string. Every. Single. Retry.

Attempt 1: emerge -1 sys-libs/glibc
Attempt 2: emerge -1 sys-libs/glibc --retry-after-fix
Attempt 3: emerge -1 sys-libs/glibc --retry-after-fix --retry-after-fix
Attempt 4: emerge -1 sys-libs/glibc --retry-after-fix --retry-after-fix --retry-after-fix

Portage doesn’t know what --retry-after-fix is. So every retry after the first was guaranteed to fail. Which triggered another retry. Which appended another flag. The string was growing unbounded until the drone gave up and grounded itself.

Past Me wrote that. Present Me fixed it and pretended it never happened.

21:40 — The Coordinator Bugs (8 total)

Eight bugs in the coordinator. State machine inconsistencies, a race condition where two drones could claim the same package if the coordinator was slow to update, and a nasty one where completed packages weren’t being removed from the queue under specific failure-then-success sequences.

The race condition was the scariest. In a 4-drone swarm, the odds of collision were low but nonzero. On a big queue run with drone-Izar (16 cores) and drone-Meridian (24 cores) both grabbing fast, it would have surfaced eventually.

23:30 — Dashboard Overhaul

The dashboard had a WebSocket data overwrite bug. When multiple drones reported status within the same render cycle, the last one to report would overwrite the others. So if drone-Tarn reported idle and drone-Izar reported building, you’d only see drone-Izar’s state.

Turned out the WebSocket handler was writing directly to shared state without merging. Fixed it with per-drone state buckets that merge on render. Simple fix, embarrassing bug.

01:15 — Auto-Balance for Idle Drones

New feature instead of a fix. If a drone finishes its batch and the queue still has work, the coordinator now automatically assigns more. Before this, idle drones would just sit there smugly while others were still grinding.

drone-Meridian has 24 cores. When it finishes fast and sits idle while drone-Tau-Beta (8 cores) is still chugging, that’s wasted compute.

03:00 — Deployment

Version bumped to 2.5.1. Tagged. Built. Deployed to all four drones and orchestrator-Izar.

for host in drone-Izar drone-Tarn drone-Meridian drone-Tau-Beta orchestrator-Izar; do
    echo "Deploying v2.5.1 to $host..."
    # rsync, restart, verify
done

Every node updated. Every service restarted. Every health check green.

05:45 — Running Clean

Kicked off a full queue run to verify. Watched the dashboard (now actually showing all drones correctly) as work flowed through.

Queue:     152 packages
Drones:    4 active, 0 idle, 0 grounded
Cores:     58 active
Status:    BUILDING

152 packages in the queue. 58 cores across 4 drones. All of them building. No corrupted atom strings. No race conditions. No overwritten dashboard state. No idle drones sitting around doing nothing.

06:30 — Done

12.5 hours. 25+ bugs. 1 new feature. 4 components. 5 nodes deployed. v2.5.1 out the door before sunrise.

The recursive retry bug was the worst one — silent corruption that made every retry worse than the last. The kind of bug that looks like “intermittent build failures” in the logs until you read the actual emerge command and see a flag that shouldn’t exist, appended four times.

I should sleep. The drones don’t need to, but I do.


152 packages. 58 cores. Zero corrupted atom strings. v2.5.1 earned its version number.