
I am a programmer and architect (the kind that writes code) with a focus on testing and open source; I maintain the PHPUnit_Selenium project. I believe programming is one of the hardest and most beautiful jobs in the world. Giorgio is a DZone MVB and is not an employee of DZone and has posted 638 posts at DZone.

On commits and commit messages

06.07.2011
I confess I do not pay as much attention to commit messages as I should. It's really easy not to put care into our messages, especially when you have a habit of committing very small increments. Yet how you perform a commit affects how your work is perceived by other people, with variables ranging from the message to the files included and the size of the change. Commits are Polaroids of us working, and we don't want to take a bad photograph, right?

ACID commits

Commits, especially in centralized version control systems, closely resemble ACID transactions in a relational database:
  • Atomic: they must be applied in a single shot, or reverted entirely.
  • Consistent: the build should be green before and after a commit (at least after published commits if you're working with Git or Mercurial).
  • Isolated: no effects are seen on the build until we hit the big button.
  • Durable: I don't want to lose commits I made an hour ago.

Small, ACID commits are important both for the ease of rolling them back in case something goes wrong, and for independent application while merging or pushing.

But that's not all: I like them also because it's simpler to write their messages:

  • You don't forget something that you did 45 minutes ago, or as a quick refactoring.
  • You don't include unnecessary files, since there are just 4-5 of them and an intruder can easily be spotted with a git status.
  • You don't have to do a git add -i to select only the files that had to be included, or even single lines of a file (how do you test that this partial commit isn't breaking the build? Stash, run the tests and stash pop? I think it's already too complex). Subversion, for example, lets you list the paths to commit but has no staging area, so selecting single lines of a file is not possible at all.
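Spotting an intruder can also be done mechanically. The following is only a minimal sketch, assuming the two-letter-status layout printed by git status --porcelain; the function name unexpected_paths, the sample paths and the expected directories are all hypothetical, and paths that Git quotes (for example those containing spaces) are not handled.

```python
def unexpected_paths(porcelain_output, expected_prefixes):
    """Return changed paths that fall outside the directories we meant to touch."""
    intruders = []
    for line in porcelain_output.splitlines():
        if len(line) < 4:
            continue  # skip blank or malformed lines
        # Each line looks like "XY path": two status letters, a space, the path.
        path = line[3:]
        if " -> " in path:
            # Renames are reported as "old -> new"; keep the new name.
            path = path.split(" -> ", 1)[1]
        if not path.startswith(tuple(expected_prefixes)):
            intruders.append(path)
    return intruders


# Sample text in the shape of `git status --porcelain` output (invented paths).
sample = (
    " M src/Cart.php\n"
    "A  tests/CartTest.php\n"
    "?? notes.txt\n"
)
intruders = unexpected_paths(sample, ["src/", "tests/"])
print(intruders)
```

With a commit of 4-5 files the list of intruders is usually empty or a single entry, which is exactly why small commits make this check cheap.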

Maybe we should publish a paper titled "Multiple lines in commit messages considered harmful"?

DRY

My issue, however, is not only with the content of commits but also with their messages. Let's face it: commit messages are part of the code, just like filenames, because no one is going to compile or test your code without going through the version control system.

I think the status of commit messages is similar to that of docblocks: written multiple times every hour by the majority of us, but without a great notion of their importance.

A first suggestion that I had no difficulty in following is not to repeat things already stored in the VCS, like the author of the commit, or the date and time at which it was made. What I struggle with instead is semantics.

A commit message is usually written once, but it is read every time someone takes a look at the log or wants to merge your changes. Once the commit is published, you can't even rewrite the message as you would a problematic, ugly line of code in a refactoring session.

Thus commit messages are a kind of CD-R, but why should we care about them? They turn out to be handy in some other moments of the day:

  • While searching for changes made a long time ago to class X or component Y, commit messages are the only field we can filter on. Other fields are too coarse (author) or not human-readable (revision or UUID).
  • While reading git log or svn log, to get an update about what our colleagues have done.
  • While generating changelogs and other reports or statistics, also by keeping an eye on tags like [1.1] or [feature_name].
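The changelog case can be illustrated with a small sketch. It assumes one-line messages that optionally start with a bracketed tag such as [1.1] or [feature_name]; the function name group_by_tag and the sample log entries are invented for the example, not taken from a real project.

```python
import re
from collections import defaultdict

# A leading tag in square brackets, followed by the rest of the message.
TAG_PATTERN = re.compile(r"^\[([^\]]+)\]\s*(.*)$")


def group_by_tag(messages):
    """Group one-line commit messages by leading [tag]; untagged ones go under None."""
    groups = defaultdict(list)
    for message in messages:
        text = message.strip()
        match = TAG_PATTERN.match(text)
        if match:
            groups[match.group(1)].append(match.group(2))
        else:
            groups[None].append(text)
    return dict(groups)


# Hypothetical log entries, in the order a log command might list them.
log = [
    "[1.1] Added Decorator for cached repositories",
    "[checkout] Fixed rounding of totals",
    "Removed unused fixtures",
    "[1.1] Renamed CartTest methods",
]
changelog = group_by_tag(log)
for summary in changelog["1.1"]:
    print("-", summary)
```

The grouping only works because the tags are written consistently; a message tagged (1.1) or v1.1 would silently end up among the untagged ones.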

Project guidelines and standards

You don't have to invent a commit message style; probably your project coding standards already have some conventions on:
  • formatting: uppercase and lowercase and which words should be treated differently from common nouns (e.g. pattern names are usually indicated as Decorator, not decorator).
  • verb tenses and noun forms to use.
  • composition: how many files and folders a commit should touch.
  • tags: [1.0] and similar textual metadata.

The point of all these rules is just to limit the degrees of freedom of a commit message, which otherwise would be free-form text. Just as we prefer well-formatted code with 4 spaces (or 8 or 2 or whatever consistent value you like) as indentation, the same goes for the style of our messages. Grepping a set of consistently written commit messages becomes a pleasure.

You may want to create these guidelines if they are not already in place on your project and you have the power to do so. It's not micromanaging, it's just an extension of coding standards.
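Guidelines of this kind are simple enough to check automatically. What follows is a sketch under stated assumptions, not a recommended standard: the rules themselves (a single line, an optional [tag], an uppercase first letter, a 72-character limit, no date in the text) are hypothetical, and so is the function name check_message.

```python
import re


def check_message(message, max_length=72):
    """Return a list of guideline violations for a commit message (empty if none)."""
    problems = []
    lines = message.splitlines()
    if len(lines) != 1:
        problems.append("message must be a single line")
    summary = lines[0] if lines else ""
    # Strip an optional leading tag such as [1.0] or [feature_name].
    text = re.sub(r"^\[[^\]]+\]\s*", "", summary)
    if not text:
        problems.append("message has no description")
    elif not text[0].isupper():
        problems.append("description must start with an uppercase letter")
    if len(summary) > max_length:
        problems.append("message is longer than %d characters" % max_length)
    # DRY: the VCS already stores the date, so repeating it in the text is redundant.
    if re.search(r"\b\d{4}-\d{2}-\d{2}\b", text):
        problems.append("message repeats a date already stored by the VCS")
    return problems


print(check_message("[1.0] Extracted Decorator from CartRepository"))
print(check_message("fixed stuff on 2011-06-07"))
```

A check like this could run from a commit hook, but its real value is that it forces the team to write the conventions down.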

A last word

Finally here's a tool for the cases when you do not really know what to write in a message (humor intended). It's the wonderful generator of meaningless commit messages:

Refactored configuration.
Fixed unnecessary bug.
I must sleep... it's working... in just three hours...
add actual words
fix bug, for realz
more ignored words
Don't push this commit
Ah, so that's what Hell looks like... (this final comment doubles as a possible commit message.)
Published at DZone with permission of Giorgio Sironi, author and DZone MVB.
