AI Engineering Curriculum
Phase 4: OpenClaw Mastery

Module 4.3

Agentic Loops & Overnight Tasks

Part 1 — The Heartbeat & Cron System

What Is an Agentic Loop?

Every AI agent runs the same fundamental loop: receive input → think → act → observe result → repeat. That's true for Claude Code, for your LangGraph pipelines, for CrewAI crews. The loop is the loop.

What makes OpenClaw different is that the loop can start without you. You don't have to send a message. You don't have to be awake. At a configured interval, the Gateway fires the loop on its own — wakes the agent up, runs a turn, and goes back to sleep. This is what makes it a 24/7 assistant rather than a chat window.

OpenClaw gives you two mechanisms for firing autonomous turns: the heartbeat and cron jobs. They're related but distinct, and understanding the difference is the key to building reliable autonomous workflows.

Real-World Use Cases

  • Heartbeat: Check email every 30 minutes. Flag anything urgent. Move a reminder to your calendar if it detects a scheduled item in a thread.
  • Heartbeat: Every hour, scan your task queue file and note if anything is overdue.
  • Cron job: Every morning at 7:00am, run the daily briefing and deliver it to Telegram.
  • Cron job: Every Monday at 6:00am, generate a weekly project status summary and save it to your workspace.
  • Cron job: At a specific time tonight, search for articles on a topic, synthesize findings, write a report, send it to you.
  • Cron + overnight: Queue five research tasks. Let the agent work through them between midnight and 6am. Wake up to five completed files.

Key Terms

Heartbeat — a periodic agent turn that fires on a timer without user input. The agent wakes up, checks its SOUL.md instructions, decides if there's anything to do, and goes back to sleep. It's passive context awareness on a pulse.

Cron job — a scheduled task with a specific prompt that fires the agent at a defined time. Unlike the heartbeat, a cron job gives the agent a direct instruction: "do this right now."

Main session — the ongoing conversation thread the agent shares with you. Messages inject into your existing context and conversation history.

Isolated session — a fresh, blank agent session with no history. Each cron run starts clean. The agent knows nothing about your previous conversations unless you put that context in the job's prompt.

Delivery mode — how the output of a cron job gets to you. Options: announce (posts directly to a channel), webhook (HTTP POST to a URL), or none (silent, internal only).


The Heartbeat

The heartbeat is the simplest autonomous mechanism. Every N minutes, the Gateway injects a silent system event into the agent's main session. The agent runs a turn — no user message, just the event — and decides what to do.

By default, this fires every 30 minutes (60 minutes if you're on Anthropic OAuth, due to rate limits). You configure the interval:

```json5
{
  gateway: {
    heartbeatInterval: 1800000 // milliseconds — 30 minutes
  }
}
```
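Because the interval is in milliseconds, it's easy to be off by a factor of 1000. A quick shell sanity check of the conversion:

```shell
# Convert minutes to the millisecond value heartbeatInterval expects
minutes=30
echo $(( minutes * 60 * 1000 ))   # prints 1800000
```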

What does the agent do during a heartbeat? Whatever its SOUL.md tells it to do.

If your SOUL.md says nothing about heartbeats, the agent will likely do very little — maybe check if there's anything pending, decide there isn't, and stop. If your SOUL.md has explicit heartbeat instructions, it will follow them:

```markdown
## Heartbeat Instructions

On every heartbeat:
1. Check ~/tasks/queue.md for any tasks in "Up Next" status
2. If there's an "Up Next" task, move it to "In Progress" and begin working on it
3. Check for any overdue calendar events and flag them to me via Telegram
4. If nothing to do, do nothing — don't send a message just to confirm you're alive
```

The heartbeat is how you turn your assistant from "reactive" (only responds when you talk to it) to "proactive" (monitors things and acts on its own).

The key mental model: The heartbeat is a passive sweep. It's the agent glancing at the situation every half-hour. It's not designed for heavy lifting — it's designed for ambient awareness and lightweight tasks that keep your workflow moving.


Cron Jobs

Cron is the full scheduler. If the heartbeat is a glance, a cron job is a dedicated work session with a specific brief.

Three ways to schedule a cron job:

1. One-shot — fires once at an exact time:

```bash
openclaw cron add \
  --name "Review reminder" \
  --at "2026-02-20T09:00:00Z" \
  --session main \
  --system-event "Remind Tommy to review the Q1 report." \
  --wake now
```

2. Interval — fires every N milliseconds:

```json5
// every hour
{ "schedule": { "kind": "every", "everyMs": 3600000 } }
```

3. Cron expression — the standard 5-field schedule:

```bash
openclaw cron add \
  --name "Morning briefing" \
  --cron "0 7 * * *" \
  --tz "America/Los_Angeles" \
  --session isolated \
  --message "Generate my daily briefing." \
  --announce --channel telegram --to "+15551234567"
```

Cron expressions follow the format: minute hour day-of-month month day-of-week. Five fields. * means "every". So 0 7 * * * means: at minute 0 of hour 7, every day of every month, every day of the week — i.e., 7:00am daily.
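A few common expressions, annotated (standard 5-field cron syntax; verify any expression against your scheduler before relying on it):

```shell
# field order: minute hour day-of-month month day-of-week
# 0 7 * * *        07:00 every day
# 0 6 * * 1        06:00 every Monday
# */30 * * * *     every 30 minutes
# 0 0 1 * *        midnight on the 1st of each month
```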

Jobs persist at ~/.openclaw/cron/jobs.json. They survive Gateway restarts. Add a job once; it runs forever until you remove it.


Main Session vs. Isolated Session

This is the most important decision in cron job design.

Main session jobs inject a system event into your active conversation. The agent sees your full history — everything you've discussed, all prior context. Good for: reminders, nudges, injecting time-sensitive information into an ongoing conversation.

```bash
--session main --system-event "Reminder: budget review in 30 minutes."
```

The agent will respond in your conversation thread, as if you'd sent that message yourself.

Isolated session jobs start a completely blank agent context. No conversation history. No prior context. The agent knows only: its SOUL.md, its skills, its tools, and whatever you put in the job's --message. Good for: recurring reports, overnight research, anything that should be a clean slate every run.

```bash
--session isolated --message "Summarize all open GitHub issues from the last 7 days."
```

Each isolated run is labeled cron:<jobId> in your transcripts. This makes it easy to find and review what ran overnight.

The rule: Use main for context-aware nudges. Use isolated for task execution. Almost all production overnight workflows use isolated — you don't want the Monday briefing contaminated by what you were discussing on Sunday.

The trap: Isolated sessions know nothing. If your isolated job prompt says "check on the project we discussed", the agent has no idea what project that is — it has no history. Every isolated job prompt must be fully self-contained. All context the agent needs must be in the prompt itself.


Delivery Modes

After an isolated cron job finishes, how does the output reach you?

announce — the Gateway sends the output directly to a channel (Telegram, WhatsApp, Slack, etc.) on your behalf. No extra model turn for delivery — it uses the channel adapter directly. This is what "deliver to my phone" looks like in practice.

```bash
--announce --channel telegram --to "+15551234567"
```

webhook — the Gateway POSTs the completed job payload as JSON to a URL you specify. Good for pushing results into external systems — a dashboard, a database, another service.

```bash
--webhook --to "https://myapp.example.com/agent-results"
```

none — no delivery. The job runs silently. Output exists in the transcript but goes nowhere. Useful for internal jobs where the agent is writing to files or updating state, not producing a message.


Model and Thinking Overrides Per Job

Not every cron job deserves the same model. A daily health-check job can run on Haiku. A deep weekly analysis should run on Opus with extended thinking.

You can override the model and thinking level per job:

```bash
openclaw cron add \
  --name "Weekly strategy review" \
  --cron "0 6 * * 1" \
  --session isolated \
  --message "Analyze last week's work. What patterns do you see? What should I prioritize this week?" \
  --model "anthropic/claude-opus-4-6" \
  --thinking high \
  --announce --channel telegram --to "+15551234567"
```

--thinking high enables extended thinking — the model reasons more deeply before responding. For complex analysis, it's worth the extra cost and latency. For a simple briefing, it's waste.

Resolution order when a job runs: job-level override → agent config default → global default.
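That fallback chain is plain defaulting: take the first non-empty value. A shell sketch of the idea (illustrative only, not OpenClaw source; the model names are placeholders):

```shell
# First non-empty value wins: job override, then agent default, then global default
resolve_model() {
  local job=$1 agent=$2 global=$3
  echo "${job:-${agent:-$global}}"
}

resolve_model ""            ""             "claude-sonnet"   # → claude-sonnet
resolve_model ""            "claude-haiku" "claude-sonnet"   # → claude-haiku
resolve_model "claude-opus" "claude-haiku" "claude-sonnet"   # → claude-opus
```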


The Overnight Task Pattern

Here's how to structure work you want done while you sleep.

What makes a good overnight task:

  • Bounded scope. "Research X" is not bounded. "Search for the top 5 competitors of ProductX, find their pricing pages, extract the pricing tiers, and save the results as a table to ~/research/competitors.md" is bounded. The agent knows when it's done.

  • Explicit output location. Tell the agent exactly where to save results — file path, format, headers. Don't make it decide.

  • Failure mode instruction. "If you can't find pricing for a competitor, note it in the table as 'N/A' and continue." This prevents one failure from halting the whole task.

  • Self-contained context. Since it's isolated, everything the agent needs is in the prompt. No references to "the thing we discussed yesterday."

A complete overnight research job:

```bash
openclaw cron add \
  --name "Competitor research" \
  --at "2026-02-20T00:00:00Z" \
  --session isolated \
  --model "anthropic/claude-opus-4-6" \
  --message "Research the top 5 direct competitors of ProductX (a B2B SaaS project management tool). For each competitor: (1) company name, (2) pricing tiers with prices, (3) target market, (4) one key differentiator. If you can't find pricing publicly, note 'pricing not public'. Save results as a markdown table to ~/research/competitors.md. Add a 2-sentence summary at the top of the file." \
  --announce --channel telegram --to "+15551234567"
```

When you wake up, you get a Telegram message saying the job completed, and the file is waiting in your workspace.


Error Handling & Retry

For recurring jobs, OpenClaw applies exponential backoff on consecutive failures:

30 seconds → 1 minute → 5 minutes → 15 minutes → 60 minutes

After a successful run, the backoff resets. The agent retries, but it doesn't hammer a broken system with constant attempts.
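The schedule above behaves like a capped lookup: the Nth consecutive failure maps to the Nth delay, and everything past the last step stays at 60 minutes. A sketch (illustrative, not the actual implementation):

```shell
# Delay in seconds for the Nth consecutive failure, capped at the last step
backoff=(30 60 300 900 3600)

delay_for_failure() {
  local idx=$(( $1 - 1 ))
  (( idx >= ${#backoff[@]} )) && idx=$(( ${#backoff[@]} - 1 ))
  echo "${backoff[idx]}"
}

delay_for_failure 1   # → 30
delay_for_failure 3   # → 300
delay_for_failure 9   # → 3600 (stays at 60 minutes)
```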

One-shot jobs don't retry. After a terminal status (ok, error, skipped), the job is done.

Managing your cron jobs:

```bash
openclaw cron list                            # see all scheduled jobs
openclaw cron run <jobId>                     # trigger immediately, skip schedule
openclaw cron runs --id <jobId> --limit 20    # view run history
openclaw cron edit <jobId> --message "Updated prompt"
openclaw cron remove <jobId>
```

Check run history regularly when setting up a new overnight job. The first few runs will reveal whether your prompt is working as intended. Don't trust it blindly until you've verified three successful runs.


Gotchas

Top-of-hour cron expressions get a stagger. 0 7 * * * doesn't fire exactly at 7:00:00. OpenClaw adds a deterministic random stagger (up to 5 minutes) for recurring top-of-hour expressions, to spread load across the Gateway. If you need exactly 7:00:00, use --exact in the CLI. For morning briefings, the stagger doesn't matter.

Isolated jobs forget everything — by design. This is not a bug. It's the guarantee that each overnight run is clean and repeatable. But it means you must write prompts that are fully self-contained. Every time you catch yourself writing "as we discussed" in a cron job prompt, stop. The agent doesn't know what you discussed.

Long-running jobs can hit token limits. An open-ended prompt like "research everything about X" can produce an agent loop that runs for hours, accumulating context until it hits the model's limit. Be specific. Bound the scope. Add "complete this in no more than 3 web searches" if needed.

The heartbeat is not a guarantee. If the Gateway is under load, or the previous heartbeat turn is still running, a heartbeat can be skipped. Don't design critical workflows around heartbeat timing. Use cron for anything time-sensitive.

allowUnsafeExternalContent on cron payloads is a debugging flag only. If you see it in a tutorial or example, don't copy it. It disables content safety checks for that job — which means an adversarial email or web page can hijack the overnight task. Leave it off.


Part 2 — Running OpenClaw 24/7

The Real Problem

Setting up a heartbeat and a morning briefing cron job is satisfying. Then your laptop lid closes, or your machine goes to sleep, or the process crashes at 3am — and the "24/7 assistant" is just a chat window that's not running.

The Gateway is the single point of failure. If it's down, nothing works. No heartbeats. No cron jobs. No channel responses. Your overnight tasks don't run. Your morning briefing doesn't arrive. The whole system is only as reliable as the process keeping the Gateway alive.

This part is about making that reliability real — not aspirational.

What "Running 24/7" Actually Means

Running something 24/7 means two things:

  1. It starts automatically — on boot, on login, without you manually launching it every time
  2. It stays running — if it crashes, something restarts it; it doesn't just silently die

On Linux, systemd handles both. On macOS, launchd (via LaunchAgent/LaunchDaemon) handles both. On Windows, you're running inside WSL2, which has its own considerations.

The tool that solves this problem is called a process supervisor — software whose only job is to start your process, watch it, and restart it if it dies. systemd is the standard process supervisor on Linux. It's already installed. You just need to register OpenClaw with it.


Linux: systemd

On any modern Linux system, OpenClaw's onboarding wizard handles this automatically:

```bash
openclaw onboard --install-daemon
```

This registers OpenClaw as a systemd user service — a background service that runs under your user account. You can also do it manually:

```bash
openclaw service install
```

Once registered, the core commands you need to know:

```bash
# Start the Gateway now
systemctl --user start openclaw-gateway

# Stop it
systemctl --user stop openclaw-gateway

# Check whether it's running and any recent errors
systemctl --user status openclaw-gateway

# Watch live logs as they come in
journalctl --user -u openclaw-gateway -f

# Enable auto-start at login (done by default on install)
systemctl --user enable openclaw-gateway
```

The --user flag means this service runs as your user, not as root. It starts when you log in. If you want it to start at boot (before any login), you need to enable user lingering:

```bash
loginctl enable-linger $USER
```

That one command is often the missing piece. Without it, your systemd user services only start when you're logged in — the machine reboots, nobody logs in, OpenClaw doesn't start.

Automatic restart on crash: The systemd service file that OpenClaw installs includes restart behavior. But it's worth knowing what it looks like and confirming it's in place:

```ini
[Service]
Restart=on-failure
RestartSec=10s
```

Restart=on-failure means: if the process exits with a non-zero status (i.e., it crashed), systemd waits 10 seconds and starts it again. It won't restart if you manually stop it (systemctl --user stop). That's the correct behavior.

Reading logs when something goes wrong:

```bash
# Last 50 lines of Gateway logs
journalctl --user -u openclaw-gateway -n 50

# Logs from the last hour
journalctl --user -u openclaw-gateway --since "1 hour ago"

# Follow live
journalctl --user -u openclaw-gateway -f
```

When a cron job fails silently or the heartbeat stops, the logs are the first place to look.


macOS: LaunchAgent

On macOS, the equivalent of systemd is launchd, and the user-level variant is a LaunchAgent.

```bash
openclaw service install
```

This creates a LaunchAgent plist file in ~/Library/LaunchAgents/ and loads it. Check that it's registered:

```bash
launchctl list | grep openclaw
```

If you see it in the output, it's registered. The LaunchAgent starts when you log in and keeps the Gateway running in the background.

The limitation: LaunchAgents start at login, not at boot. If your Mac restarts and nobody logs in, OpenClaw doesn't start. For a personal machine where you're always logged in, this doesn't matter. For a machine you want running headless (no monitor, no login), it does.

For true headless 24/7 on macOS: enable auto-login in System Settings → Users & Groups, or use a LaunchDaemon (system-level, runs at boot before login — requires root). In practice, most macOS users running OpenClaw seriously just use a VPS instead.

macOS sleep is the other enemy. By default, Macs sleep after inactivity. A sleeping Mac doesn't run background processes. Prevent sleep:

```bash
# Prevent idle sleep while this process runs (stops when caffeinate exits)
caffeinate -i &

# Or configure it permanently in System Settings → Battery → Prevent sleep
```

Windows: WSL2

OpenClaw has no native Windows support. On Windows, you run it inside WSL2 — Windows Subsystem for Linux 2, which runs a real Linux kernel inside a lightweight VM.

Inside WSL2, you have a full Ubuntu (or Debian) environment. Install OpenClaw as you would on Linux. Run it as a systemd service inside WSL2.

The complication: WSL2 doesn't fully support systemd by default on older Windows builds. On Windows 11 (build 22621+), systemd is supported. Enable it in your WSL2 config:

```ini
# /etc/wsl.conf inside WSL2
[boot]
systemd=true
```

Restart WSL2: wsl --shutdown in PowerShell, then reopen.

The bigger problem: WSL2 shuts down when you close the last terminal window. OpenClaw dies with it. Workarounds:

  • Keep a terminal open (not real 24/7)
  • Use a Windows Task Scheduler job to keep WSL2 running
  • Use a VPS (the honest solution)

For anything beyond experimenting on Windows, use a VPS.


The VPS Option (Best for Real 24/7)

A VPS — Virtual Private Server — is a small Linux server you rent in a data center. It's always on, always connected, never sleeps. The entry level is $4–6/month at DigitalOcean, Hetzner, or Linode. For OpenClaw, that's more than enough.

The setup is identical to Linux locally:

```bash
# On the VPS (Ubuntu 22.04 or later)
npm install -g openclaw@latest
openclaw onboard --install-daemon
loginctl enable-linger $USER
```

Then configure your channels, add your .env with API keys, and the VPS runs 24/7 indefinitely.

Why a VPS beats your laptop for this:

|          | Laptop/Desktop               | VPS                         |
|----------|------------------------------|-----------------------------|
| Uptime   | Only when on + awake         | 24/7 always                 |
| Network  | Changes IPs, can go offline  | Static IP, always connected |
| Restarts | Manual                       | Automatic (systemd)         |
| Cost     | Already own it               | ~$5/month                   |
| Sleep    | Kills processes              | Never sleeps                |

The one thing you give up: local file system access. The VPS can't read files on your laptop. For tasks that need local files, you'd either sync them to the VPS (via git, rsync, or a shared folder) or keep OpenClaw local and accept the uptime tradeoff.

A common pattern: run OpenClaw on a VPS for channels and scheduling, but mount a network share or use git to sync the workspace back to your local machine. Best of both worlds.


The Watchdog: Knowing When Things Break

systemd handles crashes. But the worst failure mode isn't a crash — it's a silent failure. The Gateway is running. Systemd is happy. But heartbeats are failing, cron jobs are erroring, and you have no idea.

The simplest watchdog: a daily cron job that proves the system is working by sending you a message.

```bash
openclaw cron add \
  --name "Daily health ping" \
  --cron "0 9 * * *" \
  --tz "America/Los_Angeles" \
  --session isolated \
  --message "Send a single short message: 'Gateway healthy ✅ — [today's date]'. Nothing else." \
  --announce --channel telegram --to "YOUR_PHONE_NUMBER"
```

If you don't get this message at 9am, something is wrong. You know immediately, instead of discovering at 11pm that the overnight tasks didn't run.

Add a second check for cron job health:

```bash
# Check recent run history for any jobs that errored
openclaw cron runs --limit 20
```

Make it a habit to run this after any Gateway restart or update.

OS-level health check (belt-and-suspenders): A cron job at the OS level (not inside OpenClaw — actual Linux crontab) that pings the Gateway API and restarts the service if it doesn't respond:

```bash
# In your Linux crontab (crontab -e)
*/5 * * * * curl -sf http://localhost:18789/health || systemctl --user restart openclaw-gateway
```

Every 5 minutes, curl the Gateway health endpoint. If it fails (non-zero exit), restart the service. This catches cases where the Gateway is technically running but not responding — a zombie process that systemd doesn't know is broken.


Known Production Failure Modes

These are documented real failures from users running OpenClaw in production. Knowing them in advance saves hours.

Silent heartbeat failures. The Gateway runs. Systemd is happy. But the heartbeat silently errors — maybe an API call times out, maybe a skill fails, maybe there's a transient rate limit. The Gateway doesn't crash, so systemd doesn't restart it. The daily health ping catches this; nothing else does.

Config drift after updates. OpenClaw updates occasionally. New config fields appear, old ones change. After an update, the Gateway might reject your existing config. Always check openclaw gateway --validate-config after updating before restarting.

Gateway token mismatch. After a version upgrade or VPS rebuild, the stored Gateway auth token may no longer match. Symptoms: channels can connect but the Control UI can't authenticate. Fix: rotate the token in config and reconnect all clients.

Path issues in systemd. When the Gateway runs as a systemd service, it doesn't inherit your shell's PATH. If a skill requires a binary that's installed in a non-standard location (like Homebrew: /home/linuxbrew/.linuxbrew/bin), the systemd service won't find it. Fix: add the path explicitly in the service's Environment=PATH=... directive, or use absolute paths in skill commands.
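One way to apply that fix is a systemd drop-in file, which extends the installed unit without editing it (a sketch; the service name and extra path are taken from the text above, so adjust both to your setup):

```shell
# Create a drop-in that extends PATH for the user service
mkdir -p ~/.config/systemd/user/openclaw-gateway.service.d
cat > ~/.config/systemd/user/openclaw-gateway.service.d/path.conf <<'EOF'
[Service]
Environment=PATH=/home/linuxbrew/.linuxbrew/bin:/usr/local/bin:/usr/bin:/bin
EOF

# Then reload and restart:
#   systemctl --user daemon-reload
#   systemctl --user restart openclaw-gateway
```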

WSL2 sleep. On Windows, WSL2 will shut down after a period of inactivity if no Windows processes are keeping it alive. OpenClaw inside WSL2 dies with it, silently. The health ping will catch it — the next morning.

Invalid config causes immediate shutdown. An unclosed bracket, a typo in a key name, an invalid value — the Gateway rejects the config at startup and shuts down, sometimes without a helpful error message. Always edit config with the Gateway stopped, validate the JSON5 syntax first, then restart.


The Minimal Production Checklist

Before declaring your OpenClaw setup "production":

  • Gateway registered as a system service (openclaw service install)
  • Auto-start at boot enabled (loginctl enable-linger $USER on Linux)
  • Daily health ping cron job configured and delivering
  • OS-level watchdog cron checking Gateway health every 5 minutes
  • Log access confirmed (journalctl --user -u openclaw-gateway works)
  • Config validated after initial setup
  • .env file permissions locked down (chmod 600 ~/.openclaw/.env)
  • At least 3 successful heartbeat cycles confirmed after setup

Do this once, verify it, and you have an assistant you can actually trust to be running when you need it.

