Thursday, July 22, 2010

Looking at source code version control: Comments wanted

I've been investigating version control systems for use by a team. Does anyone who uses these have comments? I'm leaning towards Mercurial, but I am open to change (our best programmer likes Git). What follows are my notes on some version control systems. Note: While I personally like and use Linux, as a team we are being provided with Windows machines for this project. Based on my review, I'm only looking at Subversion, Mercurial and Git. Comments and experience would be helpful.

=====

Version control systems (VCS) are used to track historical changes to source code (text) files and to enable collaboration between team members working on a single project. Note that this can be for any text-based project, whether programming or LaTeX writing. There are 2 major advantages:
  1. Rollback: It is possible to retrieve the version of a file at any point in time. This is especially useful when one is exploring a new idea that ends up not working (or has introduced a bug or error into a program/file). (Note: Apple advertises a similar feature in recent versions of Mac OS X as "Time Machine", which works much like version control built into an operating system.)
  2. Collaboration: In the case of multiple people working on different parts of a single project, changes that are made by one person can be committed to a shared repository, then distributed to all other team members. This results in a definitive version of a system, even as multiple team members are involved in ongoing work on their own parts of the system. As it does so, version control keeps track of the changes, so that if a change is determined to cause problems and needs to be removed, it can be removed for everyone (or only on an individual's local version as needed).
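
To make the rollback idea concrete, here is a minimal sketch in Python. It is a toy model, not how Subversion, Git or Mercurial actually store history: the class `ToyHistory` and its methods are made up for illustration, and it simply keeps every committed version of one file in a list.

```python
from dataclasses import dataclass, field


@dataclass
class ToyHistory:
    """Keeps every committed version of one text file, oldest first.

    Hypothetical illustration only; real VCS tools store history differently.
    """
    versions: list = field(default_factory=list)

    def commit(self, text: str) -> int:
        """Store a new version and return its revision number (starting at 0)."""
        self.versions.append(text)
        return len(self.versions) - 1

    def checkout(self, revision: int) -> str:
        """Return the file content as it was at the given revision."""
        return self.versions[revision]

    def rollback(self, revision: int) -> int:
        """Undo later work by committing the old content again as a new revision."""
        return self.commit(self.checkout(revision))


history = ToyHistory()
good = history.commit("print('hello')\n")
history.commit("print('hello'\n")   # an experiment that introduced a bug
restored = history.rollback(good)   # the buggy revision stays in the history
```

Note that the rollback here adds a new revision rather than erasing the bad one, so the failed experiment remains available for later inspection.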

There are two major categories of modern version control system: Centralized and Distributed

Centralized version control requires the use of a server, and all team members update or branch using the centralized repository. For distributed version control, all team members have a local copy of the source code repository that can be synchronized to a definitive repository as needed. Distributed version control is a relatively recent development (
  1. Native availability under MS Windows (availability under UNIX/LINUX is true for almost all Version Control systems). Preferably without use of Cygwin or MSYS.
  2. Can be used through a GUI - either the operating system file explorer or a standalone GUI (all open source VCS enable command line use)
  3. Integration with IDE - Visual Studio, Eclipse etc (EMACS/VIM integration can be assumed for open source VCS)
  4. Documentation - Tutorials available for those learning the system with the expectation of infrequent use
  5. Existing support within ___


  Evaluation

    For evaluation purposes we will compare three systems: Subversion, Git and Mercurial. Each is open source (the license allows for free distribution) and in widespread use, including by a number of high profile large projects.


    Subversion - Centralized version control system http://subversion.apache.org/
    • Available for MS Windows, Linux, Mac
    • Standard plugins for Visual Studio (http://www.visualsvn.com/visualsvn/) and Eclipse (http://subclipse.tigris.org/).
    • TortoiseSVN (http://tortoisesvn.net/) is a standard GUI that integrates into Windows Explorer (MS Windows) as a context sensitive menu (usually right-click on a file to see options)
    • Requires server
    • The O'Reilly Press Subversion book is freely available as an ebook (http://svnbook.red-bean.com/)
    • Many tutorials available

    Git - Decentralized version control system
    • Written mainly in C, with a number of commands implemented as UNIX shell and Perl scripts. For Windows, this seems to require Cygwin or MSYS to provide UNIX shell services (the msysGit installer bundles its own MSYS environment).

    Mercurial - Decentralized version control system
    http://mercurial.selenic.com/
    • Available for MS Windows, Linux, Mac
    • Multiple plugins for Visual Studio (http://visualhg.codeplex.com/) and Eclipse (http://javaforge.com/project/HGE).
    • TortoiseHg (http://mercurial.selenic.com/wiki/TortoiseHg) is a standard GUI that integrates into Windows Explorer (MS Windows) or Nautilus (Linux file explorer) as a context sensitive menu (usually right-click on a file to see options)
    • The O'Reilly Press Mercurial book is freely available as an ebook
    • Some tutorials available. http://hginit.com has an introductory tutorial on version control.
    • Also http://mercurial.selenic.com/wiki/UnderstandingMercurial and a tutorial (http://mercurial.selenic.com/wiki/Tutorial) and a Quick Start guide (http://mercurial.selenic.com/wiki/QuickStart)


    MS Visual SourceSafe, MS Team Foundation Server and IBM ClearCase are commercial centralized VCS that are often used. In corporate environments they are often mandated by upper management because of their connections with Microsoft and IBM. Comments from working programmers are near-universally derogatory. The complaints include overly tight integration with other vendor tools (so if a project does not live in MS Visual Studio, e.g. LaTeX files, it is very difficult to get it into VSS/TFS). In addition, there are numerous complaints that the repositories are difficult to manage, to the extent that it seems to be very easy to corrupt a repository, making it useless.

    There are other open source and commercial VCS (e.g. CVS, Perforce, AccuRev, Bazaar). Subversion, Git and Mercurial were chosen as the comparison set because, as a set, they are consistently viewed as superior to the others in comparisons among VCS. Note that because distributed VCS is so new (2005), comparisons more than 2 years old should be considered obsolete.


    Some references:

    Overview of Version Control tools by Martin Fowler (writer and consultant on organizing programming teams)
    IBM DeveloperWorks article on Introduction to Distributed Version Control Systems

1 comment:

Mattox Beckman said...

I migrated from CVS to Subversion after I left graduate school, and in the past few years migrated from Subversion to Git. I require my students to use version control to turn in assignments and to collaborate, both with me and the other students.

The problem with a central repository is that, if something goes wrong, *nobody* can use the code anymore. Subversion is pretty stable, but once in a while "things happen".

The problem with git is that you have to teach people the difference between committing code (putting the changes in your own repository) vs pushing the code (publishing the changes to the remote repository). It's not that hard, but I always have a few students who don't pay serious enough attention to the tutorial and get in trouble later when they can't read the error messages and discover that they cannot turn in their assignments. Most of my students seem to switch to git for their personal coding afterwards, though.
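
The commit/push distinction can be sketched with a toy model in Python. This is an illustration under simplifying assumptions, not git's actual data model: the class `ToyRepo` and its methods are hypothetical, and a repository is reduced to an ordered list of commit messages.

```python
class ToyRepo:
    """A repository reduced to an ordered list of commit messages (hypothetical)."""

    def __init__(self):
        self.commits = []

    def commit(self, message):
        """Record a change in this repository only."""
        self.commits.append(message)

    def clone(self):
        """Return a new repository with a full copy of the history."""
        copy = ToyRepo()
        copy.commits = list(self.commits)
        return copy

    def push(self, remote):
        """Publish commits the remote lacks; return how many were sent."""
        # Toy rule: only allow the push if the remote history is a prefix of ours.
        if remote.commits != self.commits[:len(remote.commits)]:
            raise RuntimeError("remote has changes you do not have; pull first")
        new = self.commits[len(remote.commits):]
        remote.commits.extend(new)
        return len(new)


shared = ToyRepo()                          # plays the role of the remote repository
student = shared.clone()                    # the student's own repository
student.commit("finish assignment 1")       # committed, but only locally
visible_before_push = list(shared.commits)  # nothing has been published yet
sent = student.push(shared)                 # now the work is turned in
```

A student who stops after `commit` has saved the work but not turned it in, which is the situation described above.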

I have been using gitosis for collaboration (since I have a linux server under my command), and it mostly meets my needs. The fact that it's scriptable helps. It uses secure shell keys for authentication, and I have a few python scripts to watch student accounts and copy over secure shell keys as needed. Note: if you break the master secure shell key, the gitosis server needs its own account; it's best to set this up on a machine on which you have root access.
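
A key-copying script of the kind mentioned above might look roughly like the following sketch. It is not the commenter's actual script. It assumes that each account keeps its public key at `.ssh/id_rsa.pub` and that gitosis reads keys named `<user>.pub` from the `keydir` directory of the gitosis-admin repository; the function name `sync_keys`, the directory layout and the demo user are all made up for illustration.

```python
import shutil
import tempfile
from pathlib import Path


def sync_keys(accounts_dir: Path, keydir: Path) -> list:
    """Copy each account's public key into keydir as <user>.pub.

    Returns the sorted names of users whose key was added or updated.
    """
    keydir.mkdir(parents=True, exist_ok=True)
    updated = []
    for home in sorted(p for p in accounts_dir.iterdir() if p.is_dir()):
        source = home / ".ssh" / "id_rsa.pub"   # assumed key location
        if not source.is_file():
            continue                            # this account has no key yet
        target = keydir / f"{home.name}.pub"
        if not target.exists() or target.read_bytes() != source.read_bytes():
            shutil.copyfile(source, target)
            updated.append(home.name)
    return updated


# Demonstration on throwaway directories with a made-up user and placeholder key.
with tempfile.TemporaryDirectory() as tmp:
    accounts = Path(tmp) / "home"
    (accounts / "alice" / ".ssh").mkdir(parents=True)
    (accounts / "alice" / ".ssh" / "id_rsa.pub").write_text("ssh-rsa AAAA... alice@example\n")
    (accounts / "bob").mkdir()                  # no key, so bob is skipped
    keydir = Path(tmp) / "gitosis-admin" / "keydir"
    first_run = sync_keys(accounts, keydir)
    second_run = sync_keys(accounts, keydir)    # nothing changed, nothing copied
```

In a real setup the copied keys would still have to be committed and pushed in the gitosis-admin repository before the server picks them up.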

I would absolutely recommend git over subversion (hold on, let me raise shields here). My impression is that mercurial is extremely similar to git; the git tutorial I started on was a rewrite of mercurial's. I would say use whichever of those two best supports your own platforms.