1. GETTING THE SOURCE

Simply clone the source with:

git clone https://github.com/rdiff-backup/rdiff-backup.git
Note
If you plan to provide your own code, you should first fork our repo and clone your own fork (probably using ssh rather than https). How to do this is described at https://help.github.com/en/github/collaborating-with-issues-and-pull-requests/working-with-forks

2. GENERAL GUIDELINES

  • Before committing to a lot of writing or coding, please file an issue on GitHub to discuss your plans and gather feedback. In the end, it will be much easier to merge your change request if the idea and design have been agreed upon, and there will be less work for you as a contributor if you implement your idea along the correct lines to begin with.

  • Please check out existing issues and existing pull requests, and browse the git history, to see if somebody already tried to address the thing you are interested in. It might provide useful insight into why the current state is as it is.

  • Changes can be submitted using the typical GitHub workflow: clone this repository, make your changes, test and verify, and submit a Pull Request (PR).

  • For all code changes, please remember also to include inline comments and update tests where needed.

  • Follow our coding and documentation guidelines.

2.1. License

Rdiff-backup is licensed under the GNU General Public License v2.0 or later. By contributing to this repository you agree that your work is licensed under the chosen project license.

2.2. Branching model and pull requests

The master branch is always kept in a clean state. Anybody can at any time clone this repository, branch off from master, and expect the test suite to pass and the code and other contents to be of good quality and a reasonable foundation for them to continue development on.

Each PR focuses on one topic and resists changing anything else. Keeping the scope clear also makes it easier to review the pull request. A good pull request has only one or a few commits, with each commit having a good commit subject and, if needed, also a body that explains the change.

Each pull request has only one author, but anybody can give feedback. The original author should be given time to address the feedback — reviewers should not do the fixes for the author, but instead let the author keep the authorship. Things can always be iterated and extended in future commits once the PR has been merged, or even in parallel if the changes are in different files or at least on different lines and do not cause merge conflicts if worked on.

It is the responsibility of the PR author to keep it free of conflicts with master (e.g. if it is not merged quickly) and overall to support the review process.

Ideally each pull request gets some feedback within 24 hours from it having been filed, and is merged within days or a couple of weeks. Each author should facilitate quick reviews and merges by making clean and neat commits and pull requests that are quick to review and do not spiral out in long discussions.

If something is of interest for the changelog, prefix the statement in the commit body with three uppercase letters and a colon. Which acronym is used is not that important, but here is a list of recommended ones (see the release section to understand why it’s important):

  • FIX: for a bug fix

  • NEW: for a new feature

  • CHG: for a change requesting consideration when upgrading

  • DOC: for documentation aspects

  • WEB: anything regarding the website
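
To illustrate the convention above, here is a minimal Python sketch that picks the changelog statements out of a commit body. It is not the project's tooling: the function name changelog_entries, the sample commit text and the assumption that the prefix sits at the start of a line are all made up for this example.

```python
import re

# Assumption: a changelog statement starts a line with three uppercase
# letters, a colon and a space, e.g. "FIX: ...".
CHANGELOG_LINE = re.compile(r"^([A-Z]{3}): (.+)$")


def changelog_entries(commit_body):
    """Return (prefix, statement) pairs found in a commit body."""
    entries = []
    for line in commit_body.splitlines():
        match = CHANGELOG_LINE.match(line.strip())
        if match:
            entries.append((match.group(1), match.group(2)))
    return entries


# Hypothetical commit body, only here to show the expected shape.
SAMPLE_BODY = """\
Some explanation meant for reviewers only.
FIX: example statement about a bug fix
DOC: example statement about the documentation
"""

if __name__ == "__main__":
    for prefix, statement in changelog_entries(SAMPLE_BODY):
        print(prefix, "->", statement)
```

Lines without such a prefix are ignored, which is why the explanatory part of a commit body can stay as verbose as needed.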

2.2.1. Merging changes to master

Currently the rdiff-backup GitHub repository is configured so that merging a pull request is possible only if it:

  • passes the CI testing

  • has at least one approving review

While anybody can make forks and pull requests and comment on them, only a developer with write access to the main repository can merge and land commits in the master branch. To get write access, the person must exhibit commitment to high standards and have a track record of meaningful contributions over several months.

It is the responsibility of the merging developer to make sure that the PR is squashed and that the squash commit message helps the release process with the right description and 3-capital-letters prefix (it is still the obligation of the PR author to provide enough information in their commit messages).

2.3. Versioning

In versioning we utilize git tags as understood by setuptools_scm. Version strings follow the PEP-440 standard.

The rules are currently as follows (check the files in .github/workflows for details):

  • all commits tagged with an underscore at the end or with a tag looking like a version number (i.e. as in next two bullets) are released to GitHub.

  • all commits tagged with alpha, beta, rc or final format are released to PyPI, i.e. the ones looking like: vX.Y.ZaN (alpha), vX.Y.ZbN (beta), vX.Y.ZrcN (release candidate) or vX.Y.Z (final).

  • all commits where the "version tag" is a development one, i.e. like previously with an additional .devM at the end, are released to Test PyPI. They are meant mostly to test the deployment itself (use alpha versions to release development code).

Note
the GitHub releases are created as draft, meaning that a maintainer must review them and publish them before they become visible.
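
The tagging rules above can be summarized in a small Python sketch. It only restates the rules as written here; the authoritative logic lives in the files under .github/workflows, and the function name release_targets and the target labels are assumptions made for this illustration.

```python
import re

# vX.Y.Z, optionally followed by aN, bN or rcN (alpha, beta, release candidate)
RELEASE_TAG = re.compile(r"^v\d+\.\d+\.\d+((a|b|rc)\d+)?$")
# same as above with an additional .devM suffix (development version)
DEVELOPMENT_TAG = re.compile(r"^v\d+\.\d+\.\d+((a|b|rc)\d+)?\.dev\d+$")


def release_targets(tag):
    """Return the set of places a tagged commit would be released to."""
    if RELEASE_TAG.match(tag):
        return {"github", "pypi"}
    if DEVELOPMENT_TAG.match(tag):
        return {"github", "testpypi"}
    if tag.endswith("_"):
        # tags with a trailing underscore only trigger a GitHub release
        return {"github"}
    return set()


if __name__ == "__main__":
    for tag in ("v1.2.3", "v1.2.3rc1", "v1.2.3b4.dev5", "mytest_", "other"):
        print(tag, sorted(release_targets(tag)))
```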

3. BUILD AND INSTALL

3.1. Pre-requisites

The same pre-requisites as for the installation of rdiff-backup also apply for building:

  • Python 3.6 or higher

  • librsync 1.0.0 or higher

Further python dependencies are documented in requirements.txt.

Additionally, the following pre-requisites are needed:

  • python3-dev (or -devel)

  • librsync-dev (or -devel)

  • a C compiler (gcc)

  • libacl-devel (for sys/acl.h)

  • rdiff (for testing)

  • asciidoctor (for documentation generation)

  • rpdb and netcat/ncat/nc (for remote debugging of server processes)

All of those should come packaged with your system or available from https://pypi.org/ but if you need them otherwise, here are some sources:

3.1.1. Changing dependencies versions

Python interpreter
  • Windows:

    • .github/workflows/test_windows.yml - check for WIN_PYTHON_VERSION

    • .github/workflows/deploy.yml - check for WIN_PYTHON_VERSION

    • tools/windows/group_vars/windows_hosts/generic.yml - check for python_version and python_version_full

  • Linux:

    • tox.ini, tox_root.ini, tox_dist.ini and tox_slow.ini - check for envlist

    • .github/workflows/test_linux.yml - check for python-version

    • .github/workflows/deploy.yml - check for /opt/python/cp3…​ (and possibly many-linux)

    • setup.py - check for python-requires

    • README.adoc - check for Python references

Python libraries and binary dependencies

All Python dependencies have been concentrated into requirements.txt, generated from requs/*.txt with one file for each purpose. Only those files should be used, and maintained, throughout the build/release process.

Binaries are listed in bindep.txt (based on the bindep utility).

In all cases, a validation of the documentation is also necessary, but the above files should be considered the ultimate source of truth, and correctly maintained.

3.2. Build and install using Makefile

The project has a Makefile that defines steps like all, build, test and others. You can view the contents to see exactly what it does. Using the Makefile is the easiest way to quickly build and test the source code.

By default the Makefile runs all of its commands in a clean Docker container, thus making sure all the build dependencies are correctly defined and also protecting the host system from having to install them.

The CI pipeline also uses the Makefile, so if all commands in the Makefile succeed locally, the CI is most likely to pass as well.

3.3. Build and install with setup.py

To install, simply run:

python3 setup.py install

The build process can also be run separately:

python3 setup.py build

The setup script expects to find librsync headers and libraries in the default location, usually /usr/include and /usr/lib. If you want the setup script to check different locations, use the --librsync-dir switch or the LIBRSYNC_DIR environment variable. For instance to instruct the setup program to look in /usr/local/include and /usr/local/lib for the librsync files run:

python3 setup.py --librsync-dir=/usr/local build

Finally, the --lflags and --libs options, and the LFLAGS and LIBS environment variables are also recognized. Running setup.py with no arguments will display some help. Additional help is displayed by the command:

python3 setup.py install --help

More information about using setup.py and how rdiff-backup is installed is available from the Python guide, Installing Python Modules for System Administrators, located at https://docs.python.org/3/install/index.html

Note
There is no uninstall command provided by the Python distutils/setuptools system. One strategy is to use the python3 setup.py install --record <file> option to save a list of the installed files to <file>; another is to create a wheel package with python3 setup.py bdist_wheel, as it can be installed and uninstalled.
Note
if you plan to use ./setup.py bdist_rpm to create an RPM, you need rpm-build, but be aware that it will currently fail due to a known bug in setuptools with compressed man pages.
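
As a minimal sketch of the --record strategy mentioned in the first note, the following Python function walks through such a record file and removes the files it lists. It assumes that the record file contains one path per line; the function name remove_recorded_files and the dry_run parameter are made up for this example, and directories created by the installation are deliberately left alone.

```python
from pathlib import Path


def remove_recorded_files(record_file, dry_run=True):
    """Return the recorded files that still exist; delete them unless dry_run.

    Assumption: record_file was written by
    'python3 setup.py install --record <file>' and lists one path per line.
    """
    found = []
    for line in Path(record_file).read_text().splitlines():
        name = line.strip()
        if name and Path(name).is_file():
            found.append(name)
            if not dry_run:
                Path(name).unlink()
    return found
```

Calling the function first with the default dry_run=True lets you review the list before anything is deleted; only a second call with dry_run=False removes the files.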

To build from source on Windows, check the Windows tools to build a single executable file which contains Python, librsync, and all required modules.

4. TESTING

Clone, unpack and prepare the testfiles by calling the script tools/setup-testfiles.sh from the cloned source Git repo. You will most probably be asked for your password so that sudo can extract and prepare the testfiles (else the tests will fail).

That’s it, you can now run the tests:

  • run tox to use the default tox.ini

  • or tox -c tox_slow.ini for long tests

  • or sudo tox -c tox_root.ini for the few tests needing root rights

For more details on testing, see the test sections in the Makefile and the GitHub Actions.

5. DEBUGGING

5.1. Trace back a coredump

At the time of writing these notes, there was an issue where calling the program generates a Segmentation fault (core dumped). This chapter is based on this experience debugging under Fedora 29.

References:

Note
This assumes gdb was already installed.
  1. First install:

    sudo dnf install python3-debug
    sudo dnf debuginfo-install python3-debug-3.7.3-1.fc29.x86_64
    sudo dnf debuginfo-install bzip2-libs-1.0.6-28.fc29.x86_64 glibc-2.28-27.fc29.x86_64 \
        librsync-1.0.0-8.fc29.x86_64 libxcrypt-4.4.4-2.fc29.x86_64 \
        openssl-libs-1.1.1b-3.fc29.x86_64 popt-1.16-15.fc29.x86_64 \
        sssd-client-2.1.0-2.fc29.x86_64 xz-libs-5.2.4-3.fc29.x86_64 zlib-1.2.11-14.fc29.x86_64
  2. Then run:

    python3 ./setup.py clean --all
    python3-debug ./setup.py clean --all
    CFLAGS='-Wall -O0 -g' python3-debug ./setup.py build
    PATH=$PWD/build/scripts-3.7:$PATH PYTHONPATH=$PWD/build/lib.linux-x86_64-3.7-pydebug/ python -m rdiffbackup.run -v 10 \
        /some/dir1 /some/dir2
    [...]
    Segmentation fault (core dumped)
Note
The CFLAGS settings avoid optimizations that would make debugging too complicated

At this stage, coredumpctl list shows that this coredump is the last one, so that one can call coredumpctl gdb, which itself tells (in multiple steps) that we are missing some more debug information, hence the above debuginfo-install statements (I guess you could install the packages without version information if you’re sure they fit the installed package versions).

So now back into coredumpctl gdb, with some commands:

help
help stack
backtrace  # (1)
bt full  # (2)
py-bt  # (3)
frame <FrameNumber>  # (4)
p <SomeVar>  # (5)
  1. get a backtrace of all function calls leading to the coredump (also bt)

  2. backtrace with local vars

  3. py-bt is the Python version of backtrace

  4. jump between frames as listed by bt using their #FrameNumber

  5. print some variable/expression in the context of the selected frame

Jumping between frames and printing the different variables, we can recognize that:

  1. the core dump is due to a seek on a null file pointer

  2. that the file pointer comes from the job pointer handed over to the function rs_job_iter

  3. the job pointer itself comes from the self variable handed over to _librsync_patchmaker_cycle

  4. reading through the librsync documentation, it appears that the job type is opaque, i.e. I can’t directly influence it, and that it has been created via the rs_patch_begin function within the function _librsync_new_patchmaker in rdiff_backup/_librsyncmodule.c.

At this stage, it seems that the core file has given most of its secrets and we need to debug the live program:

$ PYTHONTRACEMALLOC=1 PATH=$PWD/build/scripts-3.7:$PATH PYTHONPATH=$PWD/build/lib.linux-x86_64-3.7-pydebug/ gdb python3-debug
(gdb) break rdiff_backup/_librsyncmodule.c:_librsync_new_patchmaker
(gdb) run build/scripts-3.7/rdiff-backup /some/source/dir /some/target/dir

The debugger runs until the breakpoint is reached, after which a succession of next and print <SomeVar> allows me to analyze the code step by step, and to come to the conclusion that cfile = fdopen(python_fd, ... is somehow wrong as it creates a null file pointer whereas python_fd looks like a valid file descriptor (an integer equal to 5).

5.2. ResourceWarning unclosed file

If you get something looking like a ResourceWarning: Enable tracemalloc to get the object allocation traceback, enable tracemalloc with the PYTHONTRACEMALLOC environment variable:

PYTHONTRACEMALLOC=1 PATH=$PWD/build/scripts-3.7:$PATH \
PYTHONPATH=$PWD/build/lib.linux-x86_64-3.7-pydebug/ \
	rdiff-backup -v 10 /tmp/äłtèr /var/tmp/rdiff

This tells you indeed where the file was opened: Object allocated at (most recent call last) but it still requires deeper analysis to understand the reason.

5.3. Debug client / server mode

In order to make sure the debug messages are properly sorted, you need to set the verbosity level to 9, mix stdout and stderr, and then use the date/time output to properly sort the lines coming both from server and client, while making sure that lines belonging together stay together. The resulting command line might look as follows:

rdiff-backup -v9 localhost::/sourcedir /backupdir 2>&1 | awk \
	'/^2019-09-16/ { if (line) print line; line = $0 } ! /^2019-09-16/ { line = line " ## " $0 } END { if (line) print line }' \
	| sort | sed 's/ ## /\n/g'
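
The same idea can be sketched in Python for cases where the log was saved to a file: lines starting with a date open a new record, other lines are attached to the previous record, and records are sorted by their first line. This is only an illustration under the assumption that timestamped lines start with a date in YYYY-MM-DD format; the function name sort_log_records is made up.

```python
import re

# Assumption: lines carrying a timestamp start with a date like 2019-09-16
TIMESTAMP = re.compile(r"^\d{4}-\d{2}-\d{2}")


def sort_log_records(lines):
    """Sort log lines by timestamp, keeping continuation lines attached."""
    records = []
    for line in lines:
        line = line.rstrip("\n")
        if TIMESTAMP.match(line) or not records:
            records.append([line])  # a new record starts here
        else:
            records[-1].append(line)  # continuation of the previous record
    # list.sort is stable, so records with equal first lines keep their order
    records.sort(key=lambda record: record[0])
    return [line for record in records for line in record]
```

With the mixed output saved to a file, something like print("\n".join(sort_log_records(open("mixed.log")))) would print the sorted result, where mixed.log is a placeholder name.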

Since version 2.1, you can use the server’s --debug option to remotely debug the server process. First make sure that you’ve installed rpdb (remote pdb) and netcat (also called nc or ncat).

If you make sure that you run the latest code version, and set all the environment variables correctly, you can then connect remotely to the spawned server process:

./setup.py build
PATH=$PWD/build/scripts-3.9:$PATH PYTHONPATH=$PWD/build/lib.linux-x86_64-3.9 \
	python -m pdb -m rdiffbackup.run --remote-schema \
		"ssh -C {h}  # (1)
		RDIFF_BACKUP_DEBUG=0.0.0.0:4445  # (2)
		PATH=$PWD/build/scripts-3.9:$PATH
		PYTHONPATH=$PWD/build/lib.linux-x86_64-3.9
		rdiff-backup server --debug" \  # (3)
	backup source_dir localhost::/target_dir
pdb is running on 0.0.0.0:4445  # (4)
  1. the double quotes are important to make sure that the PATH variable is resolved locally

  2. this variable is optional and only required if you want another address/port

  3. note the --debug option necessary to set a breakpoint early in the process

  4. here the address:port where the debug process is listening, the default is 127.0.0.1:4444

Once you’ve done this, in another terminal, you can call ncat localhost 4445 (or 4444 by default) and you’ll arrive in the pdb command line. You’re one or two n(ext) steps away from the pre-check method, so you can start to debug the server process relatively early (not the argument parsing step though).

Tip
rpdb is just a wrapper around pdb so it acts very similarly.

5.4. Debug iterators

When debugging, the fact that rdiff-backup uses a lot of iterators makes it rather complex to understand what’s happening. It would sometimes be easier to have a list to study at once instead of iterating painfully through each element, but if you simply use p list(some_iter_var), you run through the iterator and it is lost for the program, which can only fail.

The solution is to use itertools.tee, create a copy of the iterator and print the copy, e.g.:

(Pdb) import itertools
(Pdb) inc_pair_iter,mycopy = itertools.tee(inc_pair_iter)
(Pdb) p list(map(lambda x: [str(x[0]),list(map(str,x[1]))], mycopy))
[... whatever output ...]

Assuming the iteration has no side effects, the initial variable inc_pair_iter is still valid for the rest of the program, whereas the mycopy is "dried out" (but you can repeat the tee operation as often as you want).
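
Outside of the debugger, the effect of itertools.tee can be checked with a small self-contained Python example. The generator numbers is a made-up stand-in for an iterator like inc_pair_iter.

```python
import itertools


def numbers():
    """Stand-in for an iterator that can only be consumed once."""
    yield from range(3)


original = numbers()
# tee returns two independent iterators; rebind the original name to one
# of them, as the iterator passed to tee must not be used anymore
original, peek = itertools.tee(original)

seen = list(peek)           # consuming the copy ...
remaining = list(original)  # ... leaves the other iterator untouched

print(seen)       # [0, 1, 2]
print(remaining)  # [0, 1, 2]
```

Note that tee has to buffer every element consumed from one copy until the other copy has consumed it too, so listing a very large iterator this way can use a lot of memory.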

5.5. Hints where to place breakpoints

Depending on the kind of issue, there are some good places to put a breakpoint:

  • if there is a file access issue, src/rdiff_backup/rpath.py in the make_file_dict(filename) function.

  • if you need to follow the listing of files and directories, src/rdiff_backup/selection.py in the diryield(rpath) function.

5.6. Get coverage details

If you need to check the details of the coverage report after the run of tox -e pyXY, you can simply call something like the following:

COVERAGE_FILE=.tox/pyXY/log/coverage.sqlite .tox/pyXY/bin/coverage report -m

The report output will show you which code lines aren’t covered by the tests.

Tip
if a clause needs to be excluded from the report, you can use the comment # pragma: no cover. But don’t do it because you can but only because you must!

5.7. Profile rdiff-backup

5.7.1. Profiling without code changes

After having called ./setup.py build, you may call something like the following to profile the current code (adapt to your Python version):

PATH=$PWD/build/scripts-3.8:$PATH PYTHONPATH=$PWD/build/lib.linux-x86_64-3.8 \
	python -m cProfile -s tottime \
	build/scripts-3.8/rdiff-backup [... rdiff-backup parameters ...]

The -s tottime option sorts by total time spent in the function. More information can be found in the profile documentation.

Tip
if you’re into graphical tools and overviews, have a look e.g. at https://pythonhosted.org//ProfileEye/ ?

You may also do memory profiling using the memory-profiler, though more detailed information requires changes to the code by adding the @profile decorator to functions:

pip install --user memory-profiler
PATH=$PWD/build/scripts-3.8:$PATH PYTHONPATH=$PWD/build/lib.linux-x86_64-3.8 \
	mprof run \
	build/scripts-3.8/rdiff-backup [... rdiff-backup parameters ...]
mprof plot
mprof clean
Note
sometimes calling rdiff-backup this way fails because the script has the wrong interpreter (a side effect of wheel building). Call ./setup.py clean --all && ./setup.py build to fix it.
Tip
there is also a line-profiler, but I didn’t try it because it requires changes to the code (again the @profile decorator).

5.7.2. More profiling with code changes

Once profiling has pointed you to an object that uses a lot of memory, you can use print(sys.getsizeof(x)) to print its memory footprint, then iterate on the code to bring it down.
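
Keep in mind that sys.getsizeof only reports the size of the object itself, not of the objects it refers to. The following Python sketch adds up the sizes of nested built-in containers to give a rough total; the helper total_size is made up for this example, and it ignores attributes of custom classes.

```python
import sys


def total_size(obj, seen=None):
    """Roughly sum sys.getsizeof over an object and its contained items.

    Only dict, list, tuple, set and frozenset are followed; each object is
    counted once, even if it is referenced several times.
    """
    seen = set() if seen is None else seen
    if id(obj) in seen:
        return 0
    seen.add(id(obj))
    size = sys.getsizeof(obj)
    if isinstance(obj, dict):
        size += sum(total_size(key, seen) + total_size(value, seen)
                    for key, value in obj.items())
    elif isinstance(obj, (list, tuple, set, frozenset)):
        size += sum(total_size(item, seen) for item in obj)
    return size


if __name__ == "__main__":
    paths = ["some/example/path/number/%d" % i for i in range(1000)]
    print("list object only:", sys.getsizeof(paths))
    print("list and strings:", total_size(paths))
```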

Memory can be freed manually with:

import gc
collected_objects = gc.collect()

Profiling can also be run from within Python:

import cProfile
import io
import pstats

pr = cProfile.Profile()
pr.enable()
# ... do something ...
pr.disable()
s = io.StringIO()
# sort by cumulative time spent in each function
ps = pstats.Stats(pr, stream=s).sort_stats('cumulative')
ps.print_stats()
print(s.getvalue())

6. RELEASING

There is no fixed release schedule; releases are made when deemed fit.

We use GitHub Actions to release automatically, as set up in the GitHub Workflows.

The following rules apply:

  • each modification to master happens through a Pull Request (PR) which triggers a pipeline job, which must be successful for the merge to have a chance to happen. Such PR jobs will not trigger a release.

  • GitHub releases are generated as draft only on Git tags looking like a release. The release manager then reviews the draft release, and names and describes it before making it visible. An automated PyPI release is foreseen but not yet implemented.

  • If you need to trigger a job for test purposes (e.g. because you changed something in the pipeline), create a branch or a tag with an underscore at the end of its name. Just make sure that you remove such tags, and potential draft releases, after usage.

  • If you want, again for test purposes, to trigger a PyPI deployment towards test.pypi.org, tag the commit before you push it with a development release tag, like vA.B.CbD.devN, then explicitly push the tag and the branch at the same time e.g. with git push origin vA.B.CbD.devN myname-mybranch.

Given the above rules, a release cycle looks roughly as follows:

  1. Call ./tools/get_changelog_since.sh PREVIOUSTAG to get a list of changes (see above) since the last release and a sorted and unique list of authors, on which basis you can extend the CHANGELOG for the new release. IMPORTANT: make sure that the PR is squashed or you won’t be able to trigger the release pipeline via a tag on master.

  2. Make sure you have the latest master commits with git checkout master && git pull --prune.

  3. Tag the last commit with git tag vX.Y.ZbN (beta) or git tag vX.Y.Z (stable).

  4. Push the tag to GitHub with git push --tags.

  5. You can go to Actions to verify that the pipeline has started.

  6. If everything goes well, you should see the new draft release with all assets (aka packages) attached to it after all jobs have finished.

  7. Give the release a title and description and save it to make it visible to everybody.

  8. You’ll get a notification e-mail telling you that rdiff-backup-admin has released a new version.

  9. Use this e-mail to inform the rdiff-backup users.

Important
if something goes wrong, remove the tag both locally with git tag -d TAG and remotely with git push -d origin TAG. Then fix the issue with a new PR and start from the beginning.
Tip
if the PyPI deploy pipeline is broken, you may download the impacted wheel(s) from GitHub and upload them to PyPI from the command line using twine: twine upload [--repository-url https://test.pypi.org/legacy/] dist/rdiff_backup-*.whl

The following sub-chapters list some lessons learned and specifics in case you need to modify the pipeline.

6.1. Delete draft releases

Because there is one draft release created for each pipeline job, it can be quite a lot when one tests the release pipeline. The GitHub WebUI requires quite a lot of clicks to delete them. A way to simplify (a bit) the deletion is to install the command line tool hub and call the following command:

hub release --include-drafts -f '%U %S %cr%n' | \
	awk '$2 == "draft" && $4 == "days" && $3 > 2 {print $1}' | xargs firefox

The 2 compared to $3 is the number of days. You get one tab opened in firefox for each draft release older than that, so that you only need 2 clicks and one Ctrl+W (close the tab) to delete each of those releases.
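
For readers less familiar with awk, the filter can be sketched in Python as follows. The line format (URL, state, then a relative age such as "5 days ago") is inferred from the format string and the awk condition above; the function name old_draft_urls is made up for this example.

```python
def old_draft_urls(lines, min_days=2):
    """Return the URLs of draft releases older than min_days days.

    Assumption: each line looks like '<URL> draft <N> days ago', as
    produced by the format string '%U %S %cr%n'.
    """
    urls = []
    for line in lines:
        fields = line.split()
        if (len(fields) >= 4
                and fields[1] == "draft"
                and fields[3] == "days"
                and fields[2].isdigit()
                and int(fields[2]) > min_days):
            urls.append(fields[0])
    return urls
```

The returned URLs could then be opened one by one, for example with the webbrowser module from the standard library.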

Note
deletion directly using hub isn’t possible as it only supports tags and not release IDs. Drafts do NOT have tags…​