Thank you so much for attending compu-salon episode 6: controlling your versions. Here's a summary of our discussion.
This week (Mar 9) we'll talk about building a basic "CGI" web application (i.e., a webpage that takes input from the user, runs on a server, and returns some useful output). I'm thinking of taking a simple GW SNR calculation as an example, but if you have ideas for something directly useful to you, please let me know.
I'll send out a reminder on Friday morning. Until then!
Also at www.vallis.org/salon.
Many good reasons:
Subversion (SVN) is a classic (but evolved) version control system. It's classic in that it's based on a centralized, authoritative repository, which usually sits on a server and is accessed using an internet protocol. It improves on older systems such as CVS in several ways: it has a global version counter with "atomic" multiple-file commits (i.e., a single revision number describes the state of all the files in the repository, and multiple files can be updated at once, generating only one new revision); it allows files and directories to be moved while keeping their history; and it has convenient options for remote operation, and strong client tools. The one drawback that's often cited for SVN is that branches and tags are not first-class citizens: they are implemented by convention, by making copies of subdirectories.
Some good resources on SVN:
The SVN cheatsheet
We need to distinguish between operations performed on the central repository itself (creating it, adding or copying a directory, importing non-version-controlled files) and operations performed by individual users on a working copy of the repository.
The repository always holds the authoritative version of all files; there is never any conflict in the repository. All editing work happens in working copies, and is then committed to the central repository; commits are only successful (establishing new authoritative versions) if there are no conflicts between the files in the working copy and those in the repository; if there are conflicts, they must be resolved in the working copy.
# create repository (general form, followed by example)
$ svnadmin create PATH
$ svnadmin create /home/user/repository

# make directories within repository
$ svn mkdir -m MESSAGE URL
$ svn mkdir -m "New project" svn+ssh://server/home/user/paper
$ svn mkdir -m "New project" svn+ssh://server/home/user/paper/trunk

# add all files from a directory into repository
$ svn import -m MESSAGE PATH URL
# (the directory itself is stripped out, so in this example files end up in "trunk")
$ svn import -m "First import" /home/user/paperfiles svn+ssh://server/home/user/paper/trunk
# check out a working copy of the repository
$ svn checkout URL DIR
# the path is stripped out, so in this example the files end up in the new directory "workingcopy"
$ svn checkout svn+ssh://server/home/user/paper/trunk workingcopy

# display modified (M), added (A), deleted (D), or unknown (?) local files
# the command does not query the repository, but checks against the local cache of the last update
$ svn status
$ svn status -u    # also checks for updates
$ svn info         # even more information
$ svn log [FILE]   # see the commit history of a directory or file

# schedule new files for addition, deletion, copy
$ svn add FILE1 FILE2 ...
$ svn delete FILE1 FILE2 ...
$ svn copy FILE1 FILE2

# commit edits (as well as additions/deletions/copies) to the repository
# commit will fail if the working copy has not been _updated_ to the latest repository revision
$ svn commit -m MESSAGE FILE1 FILE2 ...
# or commit the entire directory:
$ svn commit -m MESSAGE .

# undo local changes by reverting to the latest revision updated from the repository (_not_ the latest version in the repository)
$ svn revert FILE

# compare local changes to the latest revision updated from the repository
# in the output, the ranges between @@/@@ indicate the blocks of lines that have changed
$ svn diff FILE
# compare two arbitrary revisions (note also BASE = cached revision; HEAD = latest in repository; PREV = previous revision)
$ svn diff -rREVNUM1:REVNUM2 FILE

# update the working copy to the latest state of the repository
# will print A,G,C for files added, successfully merged, in conflict
$ svn update [FILE]
# IMPORTANT: after you have updated, be careful not to save an older version of a file that you may have in your editor's buffer
# if there are conflicts in FILE, four files appear (FILE, FILE.mine, FILE.HEADREVISION, FILE.BASEREVISION)
# conflicts are resolved by editing FILE, removing <<< === >>> blocks, and issuing
$ svn resolved [FILE]
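Before issuing svn resolved, it can help to confirm that no conflict markers are left in the file. The following is a minimal sketch, assuming the default marker style (lines starting with <<<<<<< and >>>>>>>); the function name has_conflict_markers is hypothetical and not part of SVN.

```shell
# Hypothetical helper (not an svn command): succeeds if FILE still contains
# lines that look like conflict markers left behind by "svn update".
has_conflict_markers() {
    grep -Eq '^(<<<<<<<|>>>>>>>)( |$)' "$1"
}

# Example on a throwaway file that imitates a conflicted paragraph.
demo_file=$(mktemp)
printf '%s\n' 'Intro' '<<<<<<< .mine' 'my sentence' '=======' 'their sentence' '>>>>>>> .r42' > "$demo_file"

if has_conflict_markers "$demo_file"; then
    echo "conflict markers remain: keep editing before 'svn resolved'"
fi
```

The check deliberately ignores a bare ======= line, since that can occur legitimately in text files.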
# undo changes by specifying a reverse version range
$ svn merge -rWRONGREV:RIGHTREV URL
$ svn merge -r303:302 svn+ssh://server/home/user/paper/trunk/refs.bib
# then commit...

# resurrecting a deleted item
$ svn copy URL/FILE@REVNUM ./filename
$ svn copy svn+ssh://server/home/user/paper/trunk/figure.pdf@807 .
# then commit...
By convention, for each project an SVN repository should include a trunk directory (the place for stable code and text), as well as branches and tags directories. There are two main approaches to dealing with branches.
In the "never branch" approach (probably appropriate for papers, sometimes for code), development happens on the trunk, releases are branched off, and tags are made as appropriate.
In the "always branch" approach (useful especially for code, see Jean-Michel Feurprier):
Note that SVN has no internal notion of branching or tagging: users implement these by making copies of directories to a different location in the repository, usually within the branches and tags directories. However, the copies are "cheap": internally, SVN stores a copy as a reference to the original (much like a Unix hard link), and duplicates data only for files that are later modified.
So in practice:
# create a branch (on the server!)
$ svn copy -m MESSAGE URL/trunk URL/branches/BRANCHNAME
$ svn copy -m "Create branch" svn+ssh://server/home/user/paper/trunk svn+ssh://server/home/user/paper/branches/newbranch

# check out a branch
$ svn checkout URL/branches/BRANCHNAME DIR
$ svn checkout svn+ssh://server/home/user/paper/branches/newbranch branchcopy
After which, you continue your work in the working copy, occasionally merging the changes that have happened on the trunk (this is a sync merge):
# _sync merge_: bring a branch up to date with changes made to ancestral parent branch
$ svn merge URL/trunk
# (while in the branch working copy)
$ svn merge svn+ssh://server/home/user/paper/trunk
# then (possibly resolve conflicts) and commit the new state of the branch
$ svn commit -m "Merged trunk changes to branch"
Note that revision numbers are unique throughout the repository, so if commits are made both on the trunk and on the branch, the log history of each will skip some revision numbers. Also, the history of the branch won't be visible on the trunk, and vice versa (to see both, you'd have to check out the project repository directory that contains both trunk and branches).
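The global revision counter can be seen in a throwaway repository on the local disk. The sketch below is meant to be run as a script and assumes that svn and svnadmin are installed; the temporary directory, file names, and commit messages are made up for illustration.

```shell
# Throwaway demonstration; all paths and file names are hypothetical.
set -e
command -v svnadmin >/dev/null 2>&1 || { echo "svn is not installed; skipping"; exit 0; }

demo=$(mktemp -d)
svnadmin create "$demo/repo"
url="file://$demo/repo"

# r1: project layout; r2: the branch (a copy made directly in the repository)
svn mkdir -q -m "Layout" "$url/trunk" "$url/branches"
svn copy -q -m "Create branch" "$url/trunk" "$url/branches/newbranch"

svn checkout -q "$url/trunk" "$demo/trunkcopy"
svn checkout -q "$url/branches/newbranch" "$demo/branchcopy"

# r3: a commit on the trunk
echo "trunk text" > "$demo/trunkcopy/paper.tex"
svn add -q "$demo/trunkcopy/paper.tex"
svn commit -q -m "Trunk edit" "$demo/trunkcopy"

# r4: a commit on the branch
echo "branch text" > "$demo/branchcopy/notes.txt"
svn add -q "$demo/branchcopy/notes.txt"
svn commit -q -m "Branch edit" "$demo/branchcopy"

# The trunk log lists r3 and r1; the branch log (stopping at the copy) lists r4 and r2.
svn log -q "$url/trunk"
svn log -q --stop-on-copy "$url/branches/newbranch"
```

Without --stop-on-copy, the branch log would also follow the history back through the copy into the trunk.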
Once the work on the branch is complete, it is time to port the results of your development back to the trunk (a branch reintegration merge)—but first:
# while in a working copy of the _trunk_
$ svn merge --reintegrate URL/branches/BRANCHNAME
$ svn merge --reintegrate svn+ssh://server/home/user/paper/branches/newbranch
# the --reintegrate option is important for svn to keep the history right
# then (possibly resolve conflicts) and commit the new state of the trunk
$ svn commit -m "Merged branch back into trunk"
It is good practice to delete a branch (svn delete -m MESSAGE URL/branches/BRANCHNAME) after it's been reintegrated (it will remain in the history in any case), or at least rename it (svn move -m MESSAGE URL/branches/BRANCHNAME URL/branches/OBSNAME) so that it is clearly marked as obsolete.
One last thing: tagging is the same as creating a branch:
# create a tag
$ svn copy -m MESSAGE trunk-URL tag-URL
$ svn copy -m "Tagged reintegrated trunk" svn+ssh://server/home/user/paper/trunk svn+ssh://server/home/user/paper/tags/reintegrate-newbranch
SVN can operate across several remote internet protocols. You may have noticed that whenever I specified a repository URL in the tutorial above, it began with svn+ssh://server. That's one example of a protocol, which runs SVN remotely after accessing the server over ssh. Here are all of them.
file:///location/project/trunk (direct access to a repository on the local disk). The file permissions of the repository need to be such that all users can edit the files.
svn://server.com/location/project/trunk (the svnserve server's own protocol). See the SVN book on this.
https://server.com/location/project/trunk (access through a web server). See the SVN book on this.
The users generate private/public pairs of ssh keys:
# on user1's account $ ssh-keygen -t rsa -f user1_svn_rsa
Then user1 sends the public key user1_svn_rsa.pub to the server administrator. Let's say SVN will be run under the account "svn". The public key needs to be added in a new line to the file ~svn/.ssh/authorized_keys, as follows:
command="svnserve -t --tunnel-user=user1 --root=SVNDIR",no-port-forwarding,no-agent-forwarding,no-X11-forwarding,no-pty KEYTYPE KEYBODY user1
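For the administrator, assembling that line by hand is error-prone, so it could be generated from the public key file. This is only a sketch: the function name svn_authorized_keys_entry, the repository root /home/svn/repositories, and the fake key are all hypothetical.

```shell
# Hypothetical helper: print the authorized_keys line for one SVN user.
# Arguments: user name, repository root on the server (SVNDIR), public key file.
svn_authorized_keys_entry() {
    user=$1
    svndir=$2
    pubkey_file=$3
    printf 'command="svnserve -t --tunnel-user=%s --root=%s",no-port-forwarding,no-agent-forwarding,no-X11-forwarding,no-pty %s\n' \
        "$user" "$svndir" "$(cat "$pubkey_file")"
}

# Example with a fake key (a real .pub file holds "KEYTYPE KEYBODY COMMENT" on one line).
fake_pub=$(mktemp)
echo "ssh-rsa AAAAEXAMPLEKEYBODY user1" > "$fake_pub"
svn_authorized_keys_entry user1 /home/svn/repositories "$fake_pub"
```

On the server, the output would then be appended to ~svn/.ssh/authorized_keys.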
The repository also needs to be created on the server (svnadmin create PROJECT inside SVNDIR), and will be accessed with the URL svn+ssh://svn@server.com/PROJECT/TRUNK (note the account name svn); however, user1 needs to tell SVN which private key to use, which can be done, for instance, by defining
$ export SVN_SSH="ssh -q -i PATH/user1_svn_rsa"
SVN does support keywords (e.g., $Author$), which are replaced in the file upon committing, much like CVS does. However, they need to be enabled for each file:
$ svn propset svn:keywords "Id Revision Date Author" /path/to/filename
which needs to be followed by a commit. There's a way to set keywords automatically for new files, by adding the following to your SVN client configuration file:
[miscellany]
enable-auto-props = true
[auto-props]
*.m = svn:keywords=Id Revision Date Author
(ANY OTHER FILE TYPES THAT NEED KEYWORDS WOULD GO HERE...)
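If several file types need keywords, the corresponding [auto-props] lines can be generated rather than typed. This is a minimal sketch; the function name auto_props_lines and the extensions in the example are hypothetical.

```shell
# Print one [auto-props] line per file extension given as an argument.
# The extensions used below (m, py, tex) are only examples.
auto_props_lines() {
    for ext in "$@"; do
        printf '*.%s = svn:keywords=Id Revision Date Author\n' "$ext"
    done
}

auto_props_lines m py tex
```

The printed lines would be pasted under the [auto-props] heading of the configuration shown above.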
To avoid confusing you too much, I won't say much about distributed version control systems, other than that they're good stuff, and that you should have a look when you feel advanced enough, or if your colleagues prompt you.
The idea is that there is no centralized repository; instead, the repositories reside in the accounts or workstations of individual users. So commits are local operations performed on the local repository, which holds the user's authoritative version of the files. There can be (and there usually is) a central reference repository, which can be cloned (rather than checked out), and from which and to which users pull and push changes. The main advantages are that development can proceed even if the central repository is not available, and that experimental or development branches can be dealt with locally, without burdening all users.
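As an illustration of the clone/commit/push/pull cycle, here is a minimal sketch using Git (one such distributed system). It is meant to be run as a script and assumes that git is installed; the users "alice" and "bob", the file name, and the temporary directory are all invented for the demonstration.

```shell
# Throwaway demonstration; all names and paths are hypothetical.
set -e
command -v git >/dev/null 2>&1 || { echo "git is not installed; skipping"; exit 0; }

demo=$(mktemp -d)
git init -q --bare "$demo/central.git"                        # central reference repository
git clone -q "$demo/central.git" "$demo/alice" 2>/dev/null    # (git warns that it is empty)

# Alice commits locally (no server involved), then pushes to the central repository.
echo "first draft" > "$demo/alice/paper.tex"
git -C "$demo/alice" add paper.tex
git -C "$demo/alice" -c user.name="Alice" -c user.email="alice@example.org" commit -q -m "First draft"
git -C "$demo/alice" push -q origin HEAD

# Bob clones the central repository and gets the full history.
git clone -q "$demo/central.git" "$demo/bob"

# Alice pushes another change; Bob pulls it.
echo "second paragraph" >> "$demo/alice/paper.tex"
git -C "$demo/alice" -c user.name="Alice" -c user.email="alice@example.org" commit -q -a -m "Second paragraph"
git -C "$demo/alice" push -q origin HEAD
git -C "$demo/bob" pull -q --ff-only

cat "$demo/bob/paper.tex"
```

Each working directory here holds a complete repository, so the commits succeed even before anything has been pushed.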
The distributed version-control system du jour is Git. I especially like how I can version-control a directory in my account just by doing