Zfs backup juggling tool -- snapshotting, archiving, retention.
You can not select more than 25 topics Topics must start with a letter or number, can include dashes ('-') and can be up to 35 characters long.
Mirek Kratochvil 7ddbff0376 add a relatively permissive license 1 year ago
LICENSE add a relatively permissive license 1 year ago
Makefile Makefile for installing 5 years ago
README.md readme: notice about local usage 1 year ago
zb-cleanup add some useful error messages 4 years ago
zb-pull support local pull 2 years ago
zb-snap require bash 3 years ago

README.md

ZFS-Backup

The zfs backing-up tool. ha-ha.

Tools

  • zb-snap <zfs_object> creates a snapshot
  • zb-cleanup <zfs_object> <density> [max age] destroys unnecessary snapshots
  • zb-pull <ssh_connection> <remote_zfs_object> <local_zfs_object> pulls most recent snapshots of remote_zfs_object to local_zfs_object, using ssh called with ssh_connection

Requirements

bash shell and zfs utils are needed. zb-pull requires ssh.

zfs-backup requires GNU date or compatible, other date programs may fail. Test is simple, check if this command works for you:

date --date=now

Installation

Run make install, it installs itself to some sbin/. You can also specify DESTDIR=/usr/local/ or similar.

For local changes (command aliases/wrappers, PATH setting etc.), file $HOME/.zb-rc is sourced before any commands are run.

Example

$ zb-snap tank/test
$ zfs list -t snapshot
NAME                                    USED  AVAIL  REFER  MOUNTPOINT
tank/test@zb-2014-06-07_10:46:19_p0200     0      -    34K  -

$ zb-snap tank/test
$ zb-snap tank/test
$ zb-snap tank/test
$ zfs list -t snapshot
NAME                                    USED  AVAIL  REFER  MOUNTPOINT
tank/test@zb-2014-06-07_10:46:19_p0200     0      -    34K  -
tank/test@zb-2014-06-07_10:46:51_p0200     0      -    34K  -
tank/test@zb-2014-06-07_10:46:52_p0200     0      -    34K  -
tank/test@zb-2014-06-07_10:46:54_p0200     0      -    34K  -

$ zb-cleanup tank/test 200
$ zfs list -t snapshot
NAME                                    USED  AVAIL  REFER  MOUNTPOINT
tank/test@zb-2014-06-07_10:46:19_p0200     0      -    34K  -
tank/test@zb-2014-06-07_10:46:54_p0200     0      -    34K  -

---- other machine ----

$ zb-pull root@first.machine.example.com tank/test tank/repl
$ zfs list -t snapshot
NAME                                    USED  AVAIL  REFER  MOUNTPOINT
tank/repl@zb-2014-06-07_10:46:19_p0200     0      -    34K  -
tank/repl@zb-2014-06-07_10:46:54_p0200     0      -    34K  -

Recommended usage and a word about density

There is a long-time backup weirdness about that everyone wants some “hourly backups” along with “daily backups”, “monthly backups”, sometimes “weekly”, “yearly”, “full-moon”, “christmas” and “ramadan”.

I don’t like this approach simply for it’s not machine-enough. Instead, I choose to generate the backups regularly, and forget some of the backups from time to time. Obvious way to achieve a good ratio between how many backups to hold vs. their age is “less with the time”, e.g. “for backups that are X hours old, don’t keep backups that are closer than X/10 hours apart”.

This creates a pretty good logarithmic distribution of datapoints in time, can be generally extended to any backup scheme, and looks cool because there is no god damned human timing.

From there, my setup goes like this:

  • run zb-snap every night (or every hour, if I want it to be denser; it generally doesn’t really matter).
  • run zb-cleanup with density around 400 to cleanup old stuff

And on remote backup machines:

  • zb-pull every morning
  • zb-cleanup with a slightly higher density number (it keeps more backups)

FAQ

What exactly does zb-cleanup clean up?

Candidates for backup deletion are determined like this:

  1. if shapshot is older than max_age, delete it right away.
  2. get two historically subsequent snapshots. Determine time in seconds since the newer was created is X seconds, time since the older was created is Y. Obviously X is less than Y.
  3. Calculate density*(Y-X)/Y. If the result is less than 1.0, delete the closer backup.

How to determine your density and other numbers?

Density is “maximum ratio of time between backups to age of backups, in percent”.

Good approach to determine it (with all the other numbers) is this:

  1. Take several time specifications of how much backups you want:
    • “I want at least 7 backups per last week”
    • “I need One backup daily”
    • “I want at least 4 backups per month”
    • “I want one backup yearly”
  2. Convert them to reasonable numbers to the sortof table:
    • 7 times, 7 days
    • 1 time, 1 day
    • 4 times, 31 days
    • 1 time, 365 days
  3. Get your density as maximal value from the first column, and max_age as maximum of the second column. Run zb-cleanup periodically with that values. E.g. in our example: zb-cleanup data/set 700 '1 year ago'.
  4. Setup cron to run zb-snap periodically in time interval same as minimum value from the second row - in our case, daily. (probably in morning or somehow off-peak hours).

It doesn’t work from cron!

Check if the environment is the same as when you test the stuff from the command line. At least two common caveats exist:

  • PATH may be different in cron (which may select wrong date program to run, or not find something other like custom-installed zfs). Edit ~/.zb-rc and fix PATH there.
  • Some SSH authentication methods may not work from cron environment due to missing ssh-agent, especially the password-protected privkeys. Descriptions of many workarounds are available around the internet.

Backups pulling soo sloowwwwwww!

There are two possible bottlenecks. We cannot actually cure ZFS’s internal send/recv speed (for that, add a multitude of faster disks and caches), but we can usually speed up SSH data tranfer a lot. Best advice currently available is this: https://gist.github.com/KartikTalwar/4393116

In short, to use the fastest SSH cipher around, add something like this to your user’s SSH config file:

Host  fill.in.some.host
Ciphers arcfour

Make sure that you understand possible security and compatibility implications of this configuration. Specifically, note that some recent SSH installations disable arcfour-family ciphers completely for a good reason. If you have aes CPU extension, aes128-gcm could work quite fast as well.

Can I pull among backups on local machine without SSH?

Yep, use - instead of the SSH connection string.

Disclaimer

Be sure to verify that this software really fits your use-case before you use it. Backups are precious.