UoA

Rsync and Grsync Backups for Linux

From Science IT


Grsync

Grsync is a simple GUI for rsync, making it easy to set up and use different rsync configurations. There is a much more detailed explanation of rsync and its options below.

The main advantage of grsync is that it makes it much easier to work out which rsync options are useful - grsync has a selection of the more common rsync options (along with tooltips for them). Once you have worked out which options work well in grsync you can happily invoke the exact same rsync options from the command line or another (more powerful, more useful) rsync front-end.

The disadvantage of grsync is that it is a very thin veneer over rsync and doesn't do anything at all fancy.

This makes grsync a good choice for ad hoc backups and working out what you really want from a more powerful (and more confusing) backup system. Grsync in itself is good enough for (say) mirroring your entire /home directory to an external USB drive (that you normally keep off-site) once a week or once a month.

It is worth making sure rsync/grsync is working properly and doing the right things for you before you start relying on it as a reliable backup.

  • Ensure rsync and grsync are installed on your local desktop/laptop. Grsync is available under most common Linux distributions; finding and installing grsync through your favourite package manager will implicitly install rsync (if rsync isn't already installed).
  • I prefer to turn on the "Verbose" option for reassurance. It adds information about which files are being copied during a backup; it is otherwise difficult to tell if anything is happening at all.
  • There is a "Simulation" button in grsync - this tells you exactly which files will be copied/updated without doing a backup.
  • Do a first backup while you have good bandwidth (you are preferably on the same network as the destination server). This first backup will be much larger (and slower) than subsequent backups.
  • After this first backup check what files are created on the server - if you can't find your essential files grsync options need to be changed (perhaps the source directory, or filtering/exclusion/filelist options as in the examples below).
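
The first check in the list above can also be done from a terminal. This is only a sketch: "have_tool" is a helper name invented here, and the message text is illustrative.

```shell
# Report whether a command is available on the PATH.
# "have_tool" is a helper name invented for this sketch.
have_tool() {
    command -v "$1" >/dev/null 2>&1
}

for tool in rsync grsync; do
    if have_tool "$tool"; then
        echo "$tool: installed"
    else
        echo "$tool: not found - install it with your package manager"
    fi
done
```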


General ideas for any backup (using any software):

  • Decide what files are really important to you, and check that they are being backed up.
  • It may be equally important to ensure you don't back up unnecessary (large) files that you don't need backed up. Why back up files that can be restored by other means (videos, music, ISO disk images, snapshotted virtual machines, operating system files) or your huge, un-emptied trash folder?
  • Make sure you can back up and restore files. False security is worse than no security at all.
  • Backing up files should be an easy, worry-free exercise. Restoring files however should be done with caution. If I am writing a book in separate chapters and want to restore an earlier version of Chapter 3, I may not want to restore Chapter 4, which I have worked on since the last backup was done. I personally restore files to a different folder and look at them carefully before moving them to their original location.
  • It doesn't work well if you are backing up files locally and (unintentionally) try to back up the destination backup directory. At worst you may completely fill up your disk.
  • Be careful and restrained about backing up files to space-limited servers.
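
One way to guard against backing up the backup directory into itself is to compare the two paths before running anything. The function below is a minimal sketch, not part of rsync or grsync: the name "dest_inside_source" and the example paths are assumptions, and both directories must already exist.

```shell
# Succeed (exit status 0) when directory $2 is the same as, or lies
# beneath, directory $1. Both directories must already exist.
# "dest_inside_source" is a name made up for this sketch.
dest_inside_source() {
    src_real=$(cd "$1" 2>/dev/null && pwd -P) || return 2
    dst_real=$(cd "$2" 2>/dev/null && pwd -P) || return 2
    case "$dst_real/" in
        "${src_real%/}"/*) return 0 ;;
        *) return 1 ;;
    esac
}

# Example paths only - substitute your own source and destination.
if dest_inside_source "$HOME" "$HOME/backups"; then
    echo "Destination is inside the source - exclude it or move it." >&2
fi
```

If the destination does turn out to be inside the source, adding it to the exclusion list is usually simpler than moving it.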


Grsync example 1 - Selective backup of my home directory to a USB stick

I want to back up my Linux home directory, but not the directories Music, Pictures, Videos, Downloads - the USB stick isn't large enough.

There is also no point in backing up various directories produced by Gnome and Mozilla - I want to exclude the directories .thumbnails, .cache, .icons, and any directory called Cache or Trash. There are other directories that could be excluded but they aren't so obtrusive. The idea for grsync (and rsync-based applications generally) is to exclude unneeded large files and unneeded files that continually change.

One good way to do this is to set up an exclusion list - files and directories under my home directory that I don't want to back up.

Below is an example file:

#######file /home/fred/.grsync/grsync-rules#######
# .grsync/grsync-rules

# Large files that I exclude from a backup of the home directory /home/fred/ .
# Feel free to add more file types.  NB *Most* of these files will
# be in the directories I exclude below, but not all of them.

- *.gz
- *.tgz
- *.iso
- *.jpeg
- *.mp3
- *.mp4
- *.png
- *.wmv
# Virtualbox disk images
- *.vdi
# I exclude Mozilla Thunderbird mail index files - they continually rebuild.
- *.msf

# Directories that I exclude.
# A leading slash '/' anchors from the top level of the backup (rather
# than the root of the filesystem).

- /Downloads/
- /Music/
- /Pictures/
- /Videos/
- /.cache/
- /.icons/
- /.thumbnails/

# Subdirectories that I exclude.
- Trash/
- Cache/
- perl-bookshelf/
#######end of file /home/fred/.grsync/grsync-rules#######



Example 1 - Basic options

I start grsync, add a new session called "exclude-file-example", and set source and destination directories. The "Browse" buttons only work on locally mounted directories.

The basic options "Preserve time", "Preserve permissions", "Preserve owner", "Preserve group" and "Verbose" should almost always be set. Grsync 0.92 (the version I have, under Ubuntu 10.04) implicitly always enables the "-r" recursion option. Grsync 1.11 (the other version I have seen, via Ubuntu 10.10) has only minor, cosmetic changes to the user interface.

Example 1 - Tooltip

As you mouse over options you get tooltips that give a bit more information.

Example 1 - Advanced options

On the "Advanced options" tab I haven't set any options except in the "Additional options" window: "--exclude-from=/home/fred/.grsync/grsync-rules".

I then click on the "Simulation" button, which opens another window reporting what rsync would say if it did do a backup - without "Verbose" rsync says very little indeed.
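
The same kind of simulation can be tried from the command line with rsync's "-n" (dry-run) option. The sketch below works on a throwaway directory tree so nothing real is touched; the file names and the cut-down rules file are invented for illustration.

```shell
# Build a small throwaway tree so nothing real is touched.
demo=$(mktemp -d)
mkdir -p "$demo/src/Downloads" "$demo/src/work/Trash" "$demo/dst"
echo "keep me"  > "$demo/src/work/notes.txt"
echo "big file" > "$demo/src/Downloads/distro.iso"
echo "rubbish"  > "$demo/src/work/Trash/old.txt"

# A cut-down rules file in the same style as grsync-rules.
cat > "$demo/rules" <<'EOF'
# file types
- *.iso
# top-level directory (anchored with a leading slash)
- /Downloads/
# a directory with this name at any depth
- Trash/
EOF

# -n makes this a dry run: rsync lists what it would copy and copies nothing.
sim_output=$(rsync -avn --exclude-from="$demo/rules" "$demo/src/" "$demo/dst/")
echo "$sim_output"
```

The listing should mention work/notes.txt but none of the excluded names, and the destination directory stays empty.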

Example 1 - Simulation

I took a look at the list of files rsync would back up, along with the total size (a bit over 200 MB), but the important detail was the message "Completed successfully!"

"Non-regular" files are things like soft and hard links and device files.

I then start the real session by clicking the "Execute" button. This has to work much harder. Once again, without the "Verbose" option set you get much less indication of progress. The "Global progress" bar isn't that useful or meaningful - on a large backup I did it continually overestimated how far it had got (and underestimated how much more time it needed).

Example 1 - Execution

The triangle beside "Rsync output:" reveals more when clicked:

Example 1 - Execution detail

If you scroll back to the top of the rsync output and resize the window you get the full rsync command:

Example 1 - Execution rsync command

This is a major advantage of grsync; you can experiment and get an idea of what rsync options are useful to you.

Example 1 - Execution rerun

If I run the "Execute" command again there are only a few files that have changed - about 3 MB. The "speedup" quoted is the ratio of the total size of the files to the amount of data actually transferred, once again not that meaningful.


If I was doing the same thing directly using rsync I would prefer "-a" to "-r -t -p -o -g" ("-a" implies those five options plus "-l" and "-D"). A generic form would be:

Rsync command:

rsync -av --exclude-from=FILE   /home/user/   /media/usb/backup-dir/

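It is very important to remember the trailing '/' on the source and destination directories. The one on the source matters most: with it, rsync copies the contents of /home/user/ into backup-dir/; without it, rsync creates backup-dir/user/ and copies into that, and anchored exclude patterns such as "/Downloads/" stop matching because the top level of the transfer is now "user". The results are confusing and rsync will not be backing up what you expect.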
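
The effect of the trailing '/' on the source is easy to see on a scratch directory. This is a sketch with invented directory names, not a recommended backup layout.

```shell
demo=$(mktemp -d)
mkdir -p "$demo/user" "$demo/with-slash" "$demo/without-slash"
echo "hello" > "$demo/user/file.txt"

# Trailing slash on the source: the CONTENTS of user/ are copied.
rsync -a "$demo/user/" "$demo/with-slash/"

# No trailing slash: the directory itself is copied, one level deeper.
rsync -a "$demo/user" "$demo/without-slash/"

ls "$demo/with-slash"      # file.txt
ls "$demo/without-slash"   # user
```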


Grsync example 2 - Backup of a few directories to a network-mounted drive

I want to back up the directories "/home/fred/work", "/home/fred/bin" and "/home/fred/Documents" from my desktop PC to a mounted directory /media/aitken (from aitken.math.auckland.ac.nz). This has the advantage of getting my files backed up; however, be careful and restrained about pushing data at servers whose primary purpose isn't file archiving.

I can do this by setting up a file with an explicit list of directories (and files) to be backed up.

#######/home/fred/.grsync/file-list#######
# .grsync/file-list

# A list of files and directories that I explicitly want backed up
# (using the "--files-from" option).

# These references are implicitly below the source directory
# specified.  Leading slashes are stripped, and '../' references are
# not allowed to go above the source directory.

# "--files-from" alters rsync "-a" behaviour - if you want recursion you need
# to explicitly include the "-r" option.

# Grsync 0.92 doesn't use "-a" and always adds "-r".

/bin
/work
/Documents
#######end of /home/fred/.grsync/file-list#######


Example 2 - Basic options

Example 2 - Advanced options

Rsync command:

rsync -avr --files-from=FILE   /home/user/   /media/mounted-dir/backup-dir/

It is very important to remember the trailing '/' on the source and destination directories. You will otherwise get undesired and confusing results - rsync will not be backing things up for you.
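
The "--files-from" approach can be tried on a scratch tree first. In this sketch the directory and file names are invented; note the explicit "-r", since "-a" does not imply recursion when "--files-from" is used.

```shell
demo=$(mktemp -d)
mkdir -p "$demo/src/bin" "$demo/src/work/project" "$demo/src/Music" "$demo/dst"
echo "#!/bin/sh"  > "$demo/src/bin/tool"
echo "draft"      > "$demo/src/work/project/report.txt"
echo "not wanted" > "$demo/src/Music/song.mp3"

# Paths in the list are relative to the source directory given to rsync.
printf '%s\n' bin work > "$demo/file-list"

# -r is given explicitly because -a does not imply it with --files-from.
rsync -ar --files-from="$demo/file-list" "$demo/src/" "$demo/dst/"

# Show what arrived: bin/tool and work/project/report.txt, but nothing from Music.
find "$demo/dst" -type f
```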


Grsync example 3 - Backup of a few directories to a remote machine

I am backing up a laptop from home and like example 2 I want to back up "/home/fred/work", "/home/fred/bin" and "/home/fred/Documents" to Aitken. In this case I don't have a network mount, and I also want to be more selective and exclude some subdirectories. The network connection is a bottleneck, so the grsync advanced option "Compress file data" (rsync "-z") is a good idea.


#######/home/fred/.grsync/filter-patterns#######
# .grsync/filter-patterns

# Call with --filter=". /home/fred/.grsync/filter-patterns".
# I recommend also adding "-m"/"--prune-empty-dirs".

# WARNING WARNING WARNING That is dot-space-slash in '. /home' -
# having no spaces OR two spaces will not work WARNING WARNING WARNING

# These include and exclude patterns have complex interactions - you
# really should check your results using grsync "Simulation" or rsync
# "-n"/"--dry-run".

# The order of these patterns matters and patterns are re-applied in
# each subdirectory rsync traverses - but exclusion rules can
# short-circuit traversal.

# Rsync filename/pathname wildcard semantics are not the same as shell
# wildcard (filename glob) semantics.  '***' != '**' != '*'.  Beware.

# Match bin, work and Documents and subdirectories and files (when
# rsync options '-a' or '-r' used), except disallow
# Documents/perl-bookshelf/ .
+ /bin/***
+ /work/***
- /Documents/perl-bookshelf/
+ /Documents/***

# Include all subdirectories (prune with "-m"/"--prune-empty-dirs")
# but exclude all files not included above.
+ */
- *
#######end of /home/fred/.grsync/filter-patterns#######


Example 3 - Basic options

Rsync/grsync can use SSH-style logins. If you can ssh to a host, you have most of what you need in place.
If ssh normally asks you for a password, rsync/grsync will too.

Example 3 - Advanced options

It is definitely worth running the first backup while you have a fast connection.

Rsync command:

rsync -avz -m --filter=". FILE"   /home/user/   user@full-hostname:/home/user/backup-dir/

Be careful about that ". FILE" syntax! And remember the trailing '/' on the source and destination directories.
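
Because these filter rules interact, it may help to try them on a scratch tree before pointing them at a remote machine. The sketch below is local only (so no ssh login and no "-z"), and every file name in it is invented for illustration.

```shell
demo=$(mktemp -d)
mkdir -p "$demo/src/bin" "$demo/src/Documents/perl-bookshelf" \
         "$demo/src/Music" "$demo/dst"
echo "tool"  > "$demo/src/bin/tool"
echo "notes" > "$demo/src/Documents/notes.txt"
echo "book"  > "$demo/src/Documents/perl-bookshelf/ch1.html"
echo "song"  > "$demo/src/Music/song.mp3"
echo "stray" > "$demo/src/stray.txt"

cat > "$demo/filter-patterns" <<'EOF'
+ /bin/***
- /Documents/perl-bookshelf/
+ /Documents/***
+ */
- *
EOF

# Note the dot-space before the file name inside the quotes.
rsync -a -m --filter=". $demo/filter-patterns" "$demo/src/" "$demo/dst/"

# Only bin/tool and Documents/notes.txt should have been copied.
find "$demo/dst" -type f
```

The Music directory matches "+ */" but its only file is excluded, so "-m" prunes it rather than creating an empty directory at the destination.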


Rsync Reference

Rsync is a very powerful command-line tool for efficient copying of files either locally (from one directory to another) or to/from a remote Linux/Unix host. Many non-commercial backup systems for Linux are based on rsync. These systems have rsync (doing very efficient file copying and updating) and a GUI or front-end providing flexibility that rsync doesn't have.

Rsync strengths

  • Rsync assumes by default that if the file size and "last modified" date/time stamp are the same for source and destination the file doesn't need to be re-copied.
  • Rsync can preserve date/time stamps and file permissions.
  • Rsync tries very hard to minimise network traffic - after the first (full) backup, subsequent backups will only transfer the parts of files that have changed.
  • For transfers to or from a remote host rsync normally runs on top of SSH, so is reasonably secure.
  • Rsync will back up hidden files and directories.


Rsync weaknesses

  • Rsync is not friendly or forgiving.
  • Rsync is so flexible the rsync man page is huge and heavy reading.
  • Rsync minimises network traffic by being (potentially) very heavy on CPU use at both source and destination ends.
  • Both local and remote hosts must have rsync installed (should apply to most Linux/Unix servers and desktops/laptops).
  • Rsync will only back up when you tell it to back up - it isn't quietly working in the background all the time (as, for example, Mac OS X 10.5+ Time Machine can).
  • Backed up hidden files and directories are hidden at the destination end too.
  • Rsync by design does not cope with both source and destination being remote (non-local). Investigate "scp" instead if you are interested in this.
  • Rsync isn't useful by itself if you want automatic snapshots of projects or multiple versions of files generally. You need some sort of versioning/snapshot front-end (dirvish) or a source code control system (Mercurial, git, perhaps CVS?). Backing up your local Mercurial or git repository using rsync may be good enough to avoid the need to back up anything else.


Rsync isn't a wonderful choice for versioning and snapshots. It is fundamentally designed to keep two directory trees (one local, one remote) in step with each other - "Remote SYNChronisation." Grsync doesn't do versioning/snapshots either.


There is little point in using rsync on tarballs (.tar files) and no point at all in using rsync on compressed tarballs (e.g., .tgz files). A few files altered in a .tgz may completely change the bit pattern of the .tgz - this defeats ANY sort of incremental backup scheme. Use the strengths of rsync by letting it work on "raw" versions of files - as you use those files yourself.


Rsync options related to backups

-v (--verbose)

-n (--dry-run) Perform a trial run but don't actually do anything.

-z (--compress) Compress during transfer. Only really needed if there is a network bottleneck between source and destination.

-s (--protect-args) Help remote end cope with whitespace in filenames. Linux GUI file managers sometimes like to create files with embedded whitespace (for example, links called "Link to file").

-a (--archive) Archive mode (implies -rlptgoD). See the following six options.

-r (--recursive) Back up whole directory trees.

-p (--perms) Preserve file permissions.

-t (--times) Preserve file timestamps. Rsync may have to work much harder if timestamps aren't preserved.

-o (--owner) Preserve file owner.

-g (--group) Preserve group.

-l (--links) Recreate the symlink on the destination. It may be pointing at a file that has not been backed up.


Less common rsync options

--bwlimit=KBPS Limit bandwidth used by rsync. Useful for limited speed network connections if you want to (say) keep browsing at the same time as a largish rsync is running. The bandwidth limit is an average; rsync may well cause delays for other network traffic.

-i (--itemize-changes) This gives a terse (coded) summary of changes made. Itemize-changes interacts with --dry-run and in this case tells you "there are differences but no changes were made." This is disconcerting if you expected --itemize-changes to say what changes would be made if --dry-run was removed.

-W (--whole-file) Don't bother with any partial-file transfer fanciness. This is useful if both source and destination are local directories (or there is such a fast network connection that disk access is the bottleneck).

--delete Delete extraneous files from the destination. This makes the destination exactly mirror the source.
If you use --delete be aware that any accidental file deletion will delete the same file from the destination (backup) end at the next backup.
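
Before trusting "--delete", it is worth combining it with "-n" to see what would be removed. A sketch on a scratch tree, with invented file names:

```shell
demo=$(mktemp -d)
mkdir -p "$demo/src" "$demo/dst"
echo "current" > "$demo/src/current.txt"
echo "stale"   > "$demo/dst/deleted-at-source.txt"

# -n: report what --delete would remove without removing it.
delete_report=$(rsync -avn --delete "$demo/src/" "$demo/dst/")
echo "$delete_report" | grep '^deleting '
```

With "-v", rsync reports each file it would remove on a line starting with "deleting"; because of "-n" the file is still there afterwards.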

--exclude-from=FILE Read exclude patterns from a file. See Example 1 above.

--files-from=FILE Read a list of files (and directories) from a file. See Example 2 above.

--filter=". FILE" Source a list of filter rules from a file. The syntax of the command line is very unforgiving - that is double-quote - period - space - filename - double-quote. See Example 3 above.

-m (--prune-empty-dirs) Don't create empty directories. This is after filtering and exclusion rules. In Example 3 above a lot of empty directories would otherwise be created.

--no-D Turn off preserving device and special files. Preserving them may otherwise make rsync ask for the root password each time, making non-interactive use (scheduled backups) difficult.

-x (--one-file-system) Don't cross file system boundaries (when following source links). This may be useful as a safety measure.

--copy-unsafe-links Copy the file or directory a symlink points to, in place of the link, but only for links that point outside the directory tree you are backing up. This is not quite the converse of "-x".

-S (--sparse) Handle sparse files efficiently. If you don't know that you have sparse files, don't bother to use '-S'.

-b (--backup) This keeps one previous version of files. It is much less useful than it sounds.
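
A small sketch of what "-b" does, run on a scratch tree with an invented file name. It assumes rsync's default backup suffix "~" (no "--suffix" or "--backup-dir" given).

```shell
demo=$(mktemp -d)
mkdir -p "$demo/src" "$demo/dst"
echo "first draft" > "$demo/src/chapter3.txt"
rsync -a "$demo/src/" "$demo/dst/"

# Change the file, then back up again with -b.
echo "second draft, longer" > "$demo/src/chapter3.txt"
rsync -ab "$demo/src/" "$demo/dst/"

ls "$demo/dst"                 # lists chapter3.txt and chapter3.txt~
cat "$demo/dst/chapter3.txt~"  # first draft
```

A third run after another edit would replace chapter3.txt~ with the second draft, which is why only one previous version is ever kept.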


Rsync example

Rsync is capable of doing some powerful and surprising things. Eamonn O'Brien's snippet

rsync -a --dry-run --itemize-changes --cvs-exclude /tree/one/ /tree/two/ | grep ">" | less

will give a list of files differing between two directory trees without making any changes. A more-or-less equivalent using diff is

diff -r -q /tree/one /tree/two | grep -v .svn | less


Rsync summary

It is fairly common to see examples on the Internet using "rsync -avz". This covers most needs. "rsync -av" is probably better with most broadband-or-faster network connections.

It is very important to remember the trailing '/' on the source and destination directories. You will otherwise get undesired and confusing results - rsync will not be backing things up for you.

See also

Your local man pages - "man rsync". Looking up specific options isn't quite as hard as trying to find an option that "does the right thing" in the first place.

There are other rsync front-ends - "dirvish" looks very powerful and (unfortunately) rather less friendly than grsync. Once you have worked out rsync options that suit you in grsync, it should be rather easier to make dirvish do the right things. You probably don't want the "-D" option. Remember, be restrained about pushing large snapshots at network-accessible storage (that isn't set up for archive purposes).

This page was last modified on 19 November 2010, at 15:41. This page has been accessed 9,964 times.