November 15, 2009

No forks please

Our experiences with code forks in the Zope world and customer projects


Over the last year I had to deal with a major Zope code fork within our company.

Here was the situation:

  • some years ago a co-worker decided that it would be a good idea to import a Zope 2.8 release into our company-internal repository and modify it heavily over the years
  • by 2008 we had accumulated 50 or more feature modifications within our fork of Zope 2.8
  • with new Zope releases arriving - especially Zope 2.11 - we needed to migrate to a newer Zope version
  • the co-worker was handed the big task of refactoring all his changes and forward-porting them into dedicated feature branches on top of Zope 2.11 (you can imagine this was a lot of work)
  • this work resulted in roughly 45 SVN branches (ranging from a single one-line bugfix to major modifications and extensions of Zope and related modules, including ZODB)

After the co-worker left the company, I had to deal with this legacy. It took a huge amount of work by several people to walk through each individual feature branch and decide which of the features could be useful for integration into the mainstream code bases (Zope2, CMF, ZODB and some other modules like ZConfig). It also took several days to reintegrate the useful features into the public repositories.

Another code fork landed on my desk some days ago. A customer presented their own heavily patched and extended version of a well-known product available from PyPI and the Collective. The tasks: fix the code of the former internal maintainer and extend the functionality with new features. The first steps were:

  • figuring out the branch point
  • figuring out the internal changes made after forking the code (this is hard with a 5000 line diff!)
  • creating a branch within the Collective at the original branch point and applying the internal changes to it as a patch
  • throwing away and rewriting half of the code of the former internal maintainer
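
When the fork carries no version history, one way to narrow down the branch point is to compare the forked tree against each unpacked upstream release and pick the release with the smallest difference. The following is only a rough sketch of that idea, not the procedure used in the project above: the function names are made up for illustration, it assumes every candidate release has been unpacked into a local directory, and it treats every file as text.

```python
import difflib
from pathlib import Path


def read_tree(root):
    """Map each relative file path under root to its list of lines."""
    root = Path(root)
    tree = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            text = path.read_text(encoding="utf-8", errors="replace")
            tree[path.relative_to(root).as_posix()] = text.splitlines()
    return tree


def changed_lines(old, new):
    """Count lines removed from old plus lines added in new."""
    matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    total = 0
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            total += (i2 - i1) + (j2 - j1)
    return total


def tree_distance(fork, upstream):
    """Sum the changed lines over all files found in either tree."""
    fork_tree = read_tree(fork)
    upstream_tree = read_tree(upstream)
    return sum(
        changed_lines(upstream_tree.get(name, []), fork_tree.get(name, []))
        for name in fork_tree.keys() | upstream_tree.keys()
    )


def guess_branch_point(fork, candidates):
    """Return the closest release label and all distances.

    candidates maps a release label (hypothetical, e.g. "1.1") to the
    directory holding that unpacked upstream release.
    """
    distances = {
        label: tree_distance(fork, path) for label, path in candidates.items()
    }
    return min(distances, key=distances.get), distances
```

The release with the smallest distance is only a likely branch point, not a proven one: a fork that backported later upstream fixes can look closer to a newer release than to the one it actually started from, so the result still needs a manual check against the diff.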

Lessons learned

  • a code fork is easily done and is a good way to add your own features to a codebase for internal use
  • reintegration of a code fork into the mainstream repository is a PITA: extremely complex and full of risks
  • avoid code forks whenever you can

If you have a serious need to fork code

  • create branches within the repository of the mainstream code (it is not so hard to get committer rights for the Zope and Plone repositories)
  • talk to the maintainers of the particular module or codebase and ask for their opinion, e.g. whether a feature might be useful for the public
  • document and comment your changes within your fork with the same quality as you would (and are usually obliged to) within a public repository
  • take over responsibility for a particular module or codebase when you are working on it and see that there is no current maintainer - users of the module will thank you that it is supported and maintained again
  • don't think that you are smarter than others and do share your code
  • avoid any kind of blinkered view of the world - look left and right - avoid doing "your-own-thing" - play nice with others within an open-source community (especially when the success of your own projects and business depends on exactly this community)

A side note: I never had to maintain private forks of any module within our company or for customer projects. All related changes were made in public repositories, and in most cases they have been merged back into the mainstream codebase. Anti-examples of playing nice, and of worst-practice contributing to OSS projects, can be found here.