English
A trip2g network on preemptible nodes
Short version. A trip2g knowledge network (markdown vaults backed by SQLite) can run on preemptible nodes. They cost far less and can vanish at any moment. In a synthetic test the network survives it: a node dies, comes up on another machine, and restores its data in about 35 seconds, losing no more than 1 second of writes. What holds the data is not the local disk but a copy in object storage (Litestream). The rest is detail: what I tested, how it works, and where the limits are.
I tested on a throwaway two-node cluster, a real power-off via the Hetzner API. The test is synthetic, not production: it shows the mechanism holds, not that you will never lose data.
What preemptible nodes are
Preemptible (spot) nodes can be taken back by the provider within seconds, for a much lower price. It takes them back hard: the hypervisor reclaims the machine, you get a SIGTERM and about 30 seconds. There is no polite "let me finish." For a knowledge network that chaos is acceptable under one condition: every node must come up with its data, on any machine.
What it saves (provider numbers): AWS Spot is usually 70-90% cheaper than on-demand, Google Cloud Spot 60-91%, Yandex Cloud preemptible a flat 50% (and they live no longer than 24 hours). So 2x to about 10x depending on provider and machine type. One caveat: what follows is a simulation (power-off via API on throwaway VMs), repeatable even on a home machine. The real saving comes from actual cloud spot; the test only shows the network survives how spot behaves.
The warm cache migrates, the data doesn't
Nomad has ephemeral_disk { sticky = true, migrate = true }. The docs are accurate: when an allocation moves, the warm cache really does migrate to the new node. The trap is the word "moves": it means the graceful case, and there is no graceful case on spot.
- Graceful drain (
nomad node drain): data migrates. A marker from node 2 showed up intact on node 1 in 27-34 seconds. - Hard power-off (which is what preemption is): the allocation reschedules onto a fresh node and comes up empty.
migratecopies from the old node, and the old node is dead. Nothing to copy.
Why migrate structurally cannot help here: for it to fire, the scheduler has to gracefully drain the allocation, which takes minutes. Spot gives you a SIGTERM and 30 seconds. It does not add up.
What actually survives: a copy off the node
Only an off-node copy survives. Litestream streams the SQLite WAL to object storage (MinIO here) once a second, and on every start the allocation restores from object storage before the app boots. Same hard power-off:
ephemeral_disk{migrate} |
Litestream in MinIO | |
|---|---|---|
| Graceful drain | survives, ~26s | survives, 27-34s |
| Hard power-off | empty disk, data lost | fresh node, 33-35s |
| Loss window | the whole DB if there is no off-node copy (with a periodic backup, the backup interval) | no more than the sync interval (1s); ~0 in the run |
The database came back on a node that had never held it. The trick that makes it robust: restore on every start, not only on a fresh node. Then the object store becomes the single source of truth at every placement.
The config
# prestart: object store is the source of truth on every start
task "restore" {
lifecycle { hook = "prestart" }
# rm -f /data/db.sqlite3* && litestream restore -if-replica-exists -o /data/db.sqlite3 s3://bucket/db
}
# poststart sidecar: stream the WAL once a second
task "replicate" {
lifecycle { hook = "poststart", sidecar = true }
# litestream replicate /data/db.sqlite3 s3://bucket/db (sync-interval: 1s)
}
# job: let Nomad recreate forever and declare the node lost fast
reschedule { unlimited = true }
disconnect { lost_after = "30s", replace = true }
Plus docker volumes { enabled = true } for the host bind, and keep the control-plane and object store on a stable node, not on spot.
Where it loses
One bound: Litestream replicates once a second. A write landing under a second before the power-off is lost; if Litestream dies in the same window as the node, those WAL pages go with it. I measured about zero in this run, but that is luck of timing, not a zero-loss guarantee. Need it tighter: lower the interval and pay for more PUTs.
Between the two extremes (migrate with no copy = total loss, Litestream = no more than 1 second) there is a middle: a periodic off-node backup every N. Loss is then bounded by N (a minute, if you back up every minute), not "everything since placement." The downside of naive rotation: you keep only the last few points and can roll back ~5 minutes at most. The fix is exponential thinning (a point a minute ago, an hour, a day, 7 days) with continuous rotation: tight RPO and deep history at once. That is a separate piece of work.
What's next: zero-downtime reads via a read replica
The restore above costs ~35 seconds where the network is rebuilding. Reads can stay up with no gap. Litestream can keep a continuously-updated read replica on a second node; if the writer's node is preempted, reads keep serving from the replica while a fresh writer comes back from object storage. That is a separate scenario with its own test and write-up (planned; this section will link to it once measured).
Bottom line
- Durability on spot lives off the node: WAL to object storage, restore on every start.
ephemeral_disk migrateis a warm cache for graceful moves, not durability. Spot has no graceful move.- A node's death becomes a 35-second reassembly, not a loss.
- The loss window equals the sync interval (no more than 1 second here). This is a sandbox and a 2x-10x saving, not a zero-loss production guarantee.