
Engineering

How we handle customer backups (and why we don't trust "free daily backups" alone)

Omega Digital 8 min read

Every hosting provider advertises "free daily backups". That phrase is doing a lot of work. Here is exactly what our backup setup covers, what it does not cover, how often we actually test restores, and what we think customers should do on top.

Contents

  • What "daily backup" actually means at most hosts
  • What our backup setup covers
  • Retention, storage location, and who can see what
  • Restore testing: the part nobody advertises
  • What you should do on top of our backups
  • Provider backup versus disaster recovery

If you have ever read hosting marketing, you have read "free daily backups". It is one of those phrases that sounds reassuring and means something slightly different at every provider. We want to be specific about what we do, what we do not, and where the gaps are that a customer should fill on their own.

What "daily backup" actually means at most hosts

At most providers, and for years at ours, "daily backup" has meant one of two things: a full filesystem snapshot taken at the hypervisor level once every 24 hours, or an overnight rsync of changed files to another disk. Either approach produces something you can restore from. Neither is a disaster recovery plan, and both share two specific failure modes:

  • If the backup is on the same server (or same SAN) as the origin, a hardware failure takes both out simultaneously. We have seen this happen elsewhere: the "daily backup" folder was on the same RAID array that failed.
  • If the backup runs at 03:00 and the site is corrupted at 02:55, the backup captures the corrupted state. With a single point-in-time snapshot, or a short retention window, the last clean copy is overwritten within hours, often before anyone has noticed the problem.

Neither of these is theoretical. Both have happened to real customers at real hosts. Our setup is designed specifically to avoid them.
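
As a rough illustration of the first failure mode, the sketch below checks whether a backup directory sits on the same filesystem device as the data it protects. It is not part of the setup described in this post: same_device is a hypothetical helper, and it assumes GNU stat on Linux. It only catches the most obvious case, because two volumes on the same RAID array report different device numbers and would pass.

```bash
#!/bin/bash
# Hypothetical helper: assumes GNU coreutils stat (Linux).

# Usage: same_device PATH_A PATH_B
# Returns 0 when both paths report the same filesystem device number,
# 1 when they differ, 2 when either path cannot be inspected.
same_device() {
  local dev_a dev_b
  dev_a=$(stat -c %d "$1") || return 2
  dev_b=$(stat -c %d "$2") || return 2
  [ "$dev_a" = "$dev_b" ]
}

# Example (paths are placeholders):
#   if same_device /var/www/site /var/backups/site; then
#     echo "backup shares a device with the origin" >&2
#   fi
```

A "different device" result is necessary but not sufficient: it does not prove the two paths are on separate physical hardware.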

What our backup setup covers

On every shared and reseller server, the backup schedule is:

  • Filesystem snapshot: daily, using LVM snapshots on the host. Retained for 14 days rolling.
  • Off-server sync: daily, to a separate backup storage server in a different network segment. Retained for 30 days rolling.
  • Off-site sync: weekly, to a physically separate datacenter. Retained for 90 days rolling.
  • Database-only dumps: daily mysqldump per customer database, compressed, stored alongside the filesystem sync. Retained for 14 days.
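
The database-only dump in the last item could look roughly like the sketch below. This illustrates the general technique, not the script that runs on these servers: dump_path, dump_database, and the one-file-per-database-per-day naming are assumptions, and mysqldump is assumed to find its credentials in a client option file such as ~/.my.cnf.

```bash
#!/bin/bash
# Sketch only: function names and file layout are hypothetical.

# Usage: dump_path DUMP_DIR DB_NAME STAMP
# Prints the target file name for one database on one day.
dump_path() {
  printf '%s/%s-%s.sql.gz\n' "$1" "$2" "$3"
}

# Usage: dump_database DUMP_DIR DB_NAME
# Writes a compressed dump, leaving no partial file behind on failure.
dump_database() {
  local dir="$1" db="$2" out sql
  out=$(dump_path "$dir" "$db" "$(date -u +%Y%m%d)")
  sql="${out%.gz}"
  # --single-transaction takes a consistent snapshot of InnoDB tables
  # without locking them for the duration of the dump.
  if mysqldump --single-transaction "$db" > "$sql.partial"; then
    mv "$sql.partial" "$sql" && gzip -f "$sql"
  else
    rm -f "$sql.partial"
    return 1
  fi
}
```

Writing to a .partial file first means a dump that dies halfway never sits in the backup directory looking like a finished one.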

What the schedule above does not cover, and we want to be explicit about this: the email spool on servers where customer mail is hosted on the same box (we back it up separately, on a slower schedule); DNS zones managed by external registrars (we have no access); and any data that lives in a customer's third-party service, such as a Stripe account or an external CDN cache.

For VPS and dedicated customers, the default is filesystem-snapshot only, daily, 7-day retention, in the same datacenter. Off-site backup is an opt-in add-on. This is because VPS customers are much more heterogeneous: some run databases we cannot dump without cooperation, some deliberately treat their server as immutable infrastructure and do not want us touching it, and some run backup systems of their own.

Retention, storage location, and who can see what

Backups are encrypted at rest on the off-site target with AES-256. The encryption key is held by the backup service account, not by the individual server, so a compromised web server cannot read the backup even for its own data. Access to restore backups requires a separate administrative credential.

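Retention is retention. A 30-day rolling backup means day 31 is gone. If a customer reports a problem 45 days after it happened, every daily copy from before the problem has already expired. On shared and reseller plans a weekly off-site copy may still predate it, at weekly granularity; on a VPS without the off-site add-on, nothing that old exists. This is a limitation we are honest about, not a gap we hide.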
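
To make "day 31 is gone" concrete, here is a minimal sketch of a rolling-window expiry check. It assumes backups are labelled with a YYYY-MM-DD stamp and that GNU date is available. expired_stamps is a hypothetical name, and treating a copy exactly N days old as expired is an assumption about where the boundary falls.

```bash
#!/bin/bash
# Sketch only: assumes GNU date and YYYY-MM-DD stamps, one per line on stdin.

# Usage: expired_stamps RETENTION_DAYS TODAY < stamps
# Prints every stamp that has fallen out of the rolling window ending TODAY.
expired_stamps() {
  local retention_days="$1" today="$2"
  local today_s cutoff_s stamp stamp_s
  today_s=$(date -u -d "$today" +%s)
  cutoff_s=$(( today_s - retention_days * 86400 ))
  while IFS= read -r stamp; do
    stamp_s=$(date -u -d "$stamp" +%s)
    # A copy that is exactly RETENTION_DAYS old counts as expired here.
    if [ "$stamp_s" -le "$cutoff_s" ]; then
      printf '%s\n' "$stamp"
    fi
  done
}
```

Under that boundary rule, a 14-day window ending on 2024-01-31 keeps the copies from 2024-01-18 onward and expires 2024-01-17 and earlier.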

Restore testing: the part nobody advertises

A backup you have never restored is a hypothesis. We do not test-restore every backup; we test a random sample on a fixed schedule. Specifically:

  • One random customer site per shared server is restored to a scratch server each month, and we verify the site loads and the database is readable. The customer is not notified (we are restoring to a scratch host, not to their live environment).
  • A checksum sweep runs weekly against each off-site backup destination: we pick 500 random files per server, compare the backup checksum to the origin checksum, and alert if any differ.
  • Full disaster recovery rehearsals happen quarterly. We simulate loss of an entire server and restore it to a new host end-to-end, timing how long it takes.
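
A sampled checksum sweep like the one in the second item can be sketched as follows. This is an illustration under stated assumptions, not the production job: checksum_sweep is a hypothetical function, both trees are assumed to be mounted locally, GNU shuf and sha256sum are assumed, and file names must not contain newlines. A real sweep also has to tolerate files that changed legitimately after the backup ran.

```bash
#!/bin/bash
# Sketch only: assumes GNU coreutils (shuf, sha256sum).

# Usage: checksum_sweep ORIGIN_DIR BACKUP_DIR SAMPLE_SIZE
# (directories given without a trailing slash)
# Prints the relative path of each sampled file whose backup copy is missing
# or differs. Returns 0 if every sampled file matches, 1 otherwise.
checksum_sweep() {
  local origin="$1" backup="$2" sample_size="$3" mismatched
  mismatched=$(
    find "$origin" -type f | shuf -n "$sample_size" |
      while IFS= read -r path; do
        rel=${path#"$origin"/}
        origin_sum=$(sha256sum < "$path" | cut -d' ' -f1)
        if [ -f "$backup/$rel" ]; then
          backup_sum=$(sha256sum < "$backup/$rel" | cut -d' ' -f1)
        else
          backup_sum=missing
        fi
        [ "$origin_sum" = "$backup_sum" ] || printf '%s\n' "$rel"
      done
  )
  if [ -n "$mismatched" ]; then
    printf '%s\n' "$mismatched"
    return 1
  fi
}
```

Sampling keeps the weekly cost bounded: 500 files per server is cheap to hash, while a full comparison of every file would compete with the backups themselves for I/O.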

The quarterly rehearsal is the most useful of the three. On the first one we ran, we discovered that the LVM snapshot format we were producing was fine for file-level restore but painful for whole-machine restore: the process took 11 hours where we expected 3. We changed the backup format afterwards. That is the kind of thing you only find out by actually running the restore.
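
The 11-hours-versus-3 surprise is the kind of thing a rehearsal wrapper can flag automatically. The sketch below times an arbitrary restore command and compares it to a target. timed_rehearsal is a hypothetical helper, and the restore command is whatever your own procedure uses.

```bash
#!/bin/bash
# Sketch only: wraps any restore command and checks it against a time target.

# Usage: timed_rehearsal TARGET_SECONDS COMMAND [ARGS...]
# Returns 0 if the command succeeded within the target, 1 if it succeeded
# but took longer, 2 if the command itself failed.
timed_rehearsal() {
  local target="$1" start end elapsed
  shift
  start=$(date +%s)
  "$@" || return 2
  end=$(date +%s)
  elapsed=$(( end - start ))
  echo "restore took ${elapsed}s (target ${target}s)"
  [ "$elapsed" -le "$target" ]
}
```

For a three-hour target the call would be something like timed_rehearsal 10800 ./restore-server.sh, where restore-server.sh stands in for your own restore procedure.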

What you should do on top of our backups

Even with everything above, we tell every customer: keep your own backup. Not because we do not trust ours, but because the scenario where you most need a backup is also the scenario where your provider might be unavailable. "I cannot reach my hosting provider" and "I cannot reach my backup" should never be the same sentence.

Concrete recommendations:

  • Set up at least one backup destination that is not us. An S3-compatible bucket at a different provider is the easiest. Roughly €1/month for a small WordPress site's worth of data.
  • For WordPress, schedule a weekly wp-cli database dump pushed offsite. We publish the cron snippet below. For a database-driven app of any kind, the database dump is the part you cannot afford to lose.
  • Keep your configuration in git. Nginx/Apache config, systemd units, deploy scripts, cron entries: everything that took you a day to get right the first time. A server you can rebuild from a git clone and a backup tarball is a resilient server.
  • Test your own restore at least once. Stand up a VM from scratch, pull your offsite backup, and see how long it takes to get back to a working state. If the answer is "I do not know", that is the experiment worth running before you need the answer.

```bash
#!/bin/bash
# /etc/cron.weekly/wp-backup-offsite
# Dump the WordPress DB and uploads, then copy both to an offsite
# S3-compatible bucket.
# Expects: WP_PATH, BACKUP_DIR, and S3_BUCKET set in /etc/default/wp-backup
# Note: cron.weekly jobs run as root, and wp-cli refuses to run as root
# unless --allow-root is passed; running this as the site's own user is safer.
set -euo pipefail
source /etc/default/wp-backup

stamp=$(date -u +%Y%m%dT%H%M%SZ)
site=$(basename "$WP_PATH")
db_out="$BACKUP_DIR/$site-$stamp.sql.gz"
uploads_out="$BACKUP_DIR/$site-uploads-$stamp.tar.gz"

# DB dump via wp-cli, piped straight through gzip
wp --path="$WP_PATH" db export - --default-character-set=utf8mb4 \
  | gzip -9 > "$db_out"

# Also capture the uploads directory
tar -C "$WP_PATH/wp-content" -czf "$uploads_out" uploads/

# Push offsite. This script never deletes remote objects: keeping only the
# last 12 weekly copies needs a lifecycle/expiry rule on the bucket itself.
aws s3 cp "$db_out" "s3://$S3_BUCKET/" --storage-class STANDARD_IA
aws s3 cp "$uploads_out" "s3://$S3_BUCKET/" --storage-class STANDARD_IA

# Clean up local copies older than 7 days
find "$BACKUP_DIR" -type f -mtime +7 -delete
```
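
Before trusting a pulled dump in a restore test, a cheap sanity check helps. The sketch below is one possible check, not part of the published snippet. verify_dump is a hypothetical helper, and it assumes the dump came from mysqldump with comments enabled, which ends its output with a "-- Dump completed" line.

```bash
#!/bin/bash
# Sketch only: cheap pre-restore checks on a gzip-compressed SQL dump.

# Usage: verify_dump FILE.sql.gz
# Returns 0 if the file is non-empty, decompresses cleanly, and ends with
# mysqldump's completion marker; returns 1 otherwise.
verify_dump() {
  local dump="$1"
  [ -s "$dump" ] || { echo "missing or empty: $dump" >&2; return 1; }
  gzip -t "$dump" 2>/dev/null || { echo "corrupt gzip: $dump" >&2; return 1; }
  if ! gzip -dc "$dump" | tail -n 5 | grep -q '^-- Dump completed'; then
    echo "no completion marker: $dump" >&2
    return 1
  fi
}
```

Passing these checks only shows the file is intact and the dump ran to the end. It is not a substitute for loading the dump into a scratch database and opening the site.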

Provider backup versus disaster recovery

The distinction matters. A provider backup is insurance against the provider's failures: a drive dies, a process corrupts a file, an operator makes a mistake. A disaster recovery plan is about your ability to keep operating when something larger happens: the provider has a regional outage, a billing issue suspends your account, a state actor seizes the datacenter, a lightning strike takes out a rack.

Provider backups solve the first. Only you can solve the second. The honest SLA for any hosting provider is: we will hold up our end, and we will tell you when we cannot, and we recommend you be in a position to keep operating either way.

If you host with us, you get the backup setup above by default. If you host elsewhere, ask your provider the same specific questions: where is the backup physically, how often is restore tested, what is the retention, who holds the encryption key. If your provider cannot answer those questions precisely, back up independently of them, just as aggressively.
