The Placeholder That Broke Telegram
The Discovery
I check the OpenClaw logs and see this on repeat:
[telegram] deleteMyCommands failed: 404: Not Found
[telegram] deleteWebhook failed: 404: Not Found
[telegram] [default] channel exited: Call to 'deleteWebhook' failed
[telegram] [default] auto-restart attempt 7/10
Ten restart attempts. Then the health monitor pauses recovery — rate limit, 3 restarts per hour max. And then... nothing. The bot sits there, broken, while every other channel works fine.
The Root Cause
Open up the config file on the Docker host. There it is:
{
  "channels": {
    "telegram": {
      "enabled": true,
      "botToken": "REPLACE_WITH_TELEGRAM_BOT_TOKEN"
    }
  }
}
That's a placeholder. From the initial template. It was NEVER replaced with the actual bot token.
The Telegram API is getting requests with the literal string REPLACE_WITH_TELEGRAM_BOT_TOKEN as the auth token. Of course it returns 404: the Bot API embeds the token in the request path (api.telegram.org/bot&lt;token&gt;/...), so a garbage token resolves to a path that doesn't exist. There's no bot with that token because that's not a token, it's a sticky note that says "put the real thing here."
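A placeholder doesn't even have the shape of a real token. Telegram bot tokens look like a numeric bot ID, a colon, then a long secret (e.g. 123456789:AAE...), so a cheap sanity check can catch this class of mistake before any API call is made. A sketch, with the exact secret-length rule being a loose assumption on my part:

```python
import re

# Real Telegram bot tokens have the shape "<numeric_bot_id>:<secret>".
# The secret length here (30+) is an approximation, not a spec.
TOKEN_SHAPE = re.compile(r"^\d+:[A-Za-z0-9_-]{30,}$")

def looks_like_bot_token(token: str) -> bool:
    """Cheap pre-flight check: does this string even resemble a token?"""
    return bool(TOKEN_SHAPE.match(token))

print(looks_like_bot_token("REPLACE_WITH_TELEGRAM_BOT_TOKEN"))  # False
print(looks_like_bot_token("123456789:AAEaBcDeFgHiJkLmNoPqRsTuVwXyZ012345"))  # True
```

This wouldn't catch a wrong-but-well-formed token, but it would have caught a sticky note.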
How Long Was It Broken?
That's the uncomfortable question. The web UI worked fine. The admin APIs worked fine. Cron jobs ran on schedule. Everything about OpenClaw looked healthy... unless you specifically tried to use Telegram.
Multi-channel systems are sneaky like that. One channel dies and the others keep humming along. If nobody's actively using the broken channel, nobody notices. The monitoring showed a healthy system because it was checking the wrong things.
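The check that was missing is per-channel: ask the upstream API whether this channel's credentials actually work, not whether the container is running. A minimal sketch against Telegram's public getMe endpoint (the function names are mine, not OpenClaw's):

```python
from urllib.error import HTTPError
from urllib.request import urlopen

def getme_url(token: str) -> str:
    """Build the Bot API getMe URL for a given token."""
    return f"https://api.telegram.org/bot{token}/getMe"

def telegram_channel_healthy(token: str, timeout: float = 5.0) -> bool:
    """True only if the token authenticates against the live API.
    A failed request (404 for a malformed token, 401 for a wrong one)
    means the channel is down, whatever the container says."""
    try:
        with urlopen(getme_url(token), timeout=timeout) as resp:
            return resp.status == 200
    except (HTTPError, OSError):
        return False
```

Wire something like this into the monitoring loop per channel, and a dead Telegram integration shows up as red even while the web UI, admin APIs, and cron jobs stay green.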
The Fix
Fifteen minutes. Looked up the actual bot token from my credentials vault. Set it in three places — the .env file, the docker-compose override, and the config file directly. Restarted the container.
[telegram] [default] starting provider (@argobox_oc_bot)
Delivery recovery: 2 messages recovered
Two messages that had been sitting in the queue, waiting for a channel that worked. Recovered automatically once the token was real.
Why This Happens
Placeholder values in config templates are a trap. You write REPLACE_WITH_ACTUAL_VALUE as a reminder to yourself. Then deployment happens and nobody runs the replacement step. Or it gets skipped because the container starts up fine and the logs are 200 lines of normal startup noise before the Telegram 404s scroll past.
The config file was version-controlled with the placeholder (correct practice — don't commit secrets). But the step between "committed template" and "running production config" was... a human remembering to do it. And humans forget.
What I Changed
First: the obvious. Real token in place, triple-checked.
Second: I documented every placeholder that exists in OpenClaw's config with a big "THIS MUST BE REPLACED" note. Not in a separate runbook that nobody reads — in the config file itself, as comments next to the placeholders.
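One wrinkle: plain JSON has no comment syntax, so "comments next to the placeholders" needs a workaround. Assuming the config loader ignores unknown keys (worth verifying for OpenClaw), a sibling `_comment` key is a common pattern:

```json
{
  "channels": {
    "telegram": {
      "enabled": true,
      "_comment": "botToken MUST be replaced before deploy. Get it from the credentials vault.",
      "botToken": "REPLACE_WITH_TELEGRAM_BOT_TOKEN"
    }
  }
}
```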
Third: I started writing a pre-deployment validation script. Something that greps for REPLACE_WITH in the running config and screams if it finds anything. Should've had this from day one.
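The core of that script is small. A sketch that walks a parsed JSON config and reports the path of every string value that still looks like a template placeholder (the marker list is an assumption; extend it for your own templates):

```python
import json
import re

# Markers that indicate a template value was never filled in.
PLACEHOLDER = re.compile(r"REPLACE_WITH|CHANGE_ME")

def find_placeholders(node, path=""):
    """Recursively walk parsed JSON, yielding the dotted path of
    every string value that still matches a placeholder marker."""
    if isinstance(node, dict):
        for key, value in node.items():
            child = f"{path}.{key}" if path else key
            yield from find_placeholders(value, child)
    elif isinstance(node, list):
        for i, value in enumerate(node):
            yield from find_placeholders(value, f"{path}[{i}]")
    elif isinstance(node, str) and PLACEHOLDER.search(node):
        yield path

def check_config(filename):
    """Return the list of unreplaced placeholders; empty means safe."""
    with open(filename) as f:
        return list(find_placeholders(json.load(f)))
```

Run it as a pre-deployment gate and exit nonzero if the list is non-empty; the deploy fails loudly instead of the bot failing quietly.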
The health monitor's rate-limiting actually worked as designed — it prevented a cascade of rapid restarts from making things worse. But it also masked the problem by making the bot appear "stable" (paused) instead of "broken" (restart-looping). Trade-offs everywhere.
The Real Problem
I have a monitoring system. I have a health check. I have container restart policies and rate limits and delivery recovery queues. All of that worked exactly as designed.
What I didn't have was a check for the most basic failure mode: "is the config file actually configured?" Sometimes the sophisticated system works perfectly and the thing that breaks you is a string that says REPLACE_WITH_TELEGRAM_BOT_TOKEN sitting in production for weeks.