Sunday, January 17, 2010

Visualizing CVS

In a retrospective mood after finishing development on a long-running project, I started wondering just how much coding had been done. How much code has been written, over what timescale, and by whom? Of course, all the answers are sitting there in our CVS repository. I just need a nice way to visualize it.

After hacking around for a bit, here's a pretty graph I came up with:

To explain: Each circle in this graph represents a single CVS commit to our custom Linux kernel. A commit is positioned along the X-axis according to the date it was committed. Each tick represents one month, so you're seeing a bit over a year's worth of development out of the repository's entire lifetime. The size of each circle shows how big the commit was: the unified diff for each commit was run through diffstat to determine its lines added (A) and lines removed (D). The area of each circle is proportional to A+D. The color of the circle shows who performed the commit. Positioning along the Y-axis is random, just to spread out the circles a bit.
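The encoding described above can be sketched in a few lines of Python. This is only an illustration, not the script used for the graph: the sample records, the `scatter_attributes` name, and the `scale` and `seed` parameters are all made up for the example.

```python
import random
from datetime import date

# Hypothetical sample records: (commit date, author, lines added, lines removed).
commits = [
    (date(2009, 3, 2), "alice", 120, 30),
    (date(2009, 3, 9), "bob", 4, 0),
    (date(2009, 4, 1), "alice", 900, 450),
]


def scatter_attributes(commits, scale=1.0, seed=0):
    """Map each commit to an (x, y, area, color_index) tuple."""
    rng = random.Random(seed)  # seeded so the layout is reproducible
    authors = sorted({author for _, author, _, _ in commits})
    color_of = {author: i for i, author in enumerate(authors)}

    points = []
    for when, author, added, removed in commits:
        x = when.toordinal()              # X: commit date as a day number
        y = rng.random()                  # Y: random, only to spread circles out
        area = scale * (added + removed)  # circle area proportional to A + D
        points.append((x, y, area, color_of[author]))
    return points


for point in scatter_attributes(commits):
    print(point)
```

One detail worth knowing: matplotlib's `scatter` takes its `s` argument as a marker area (in points squared), so a value proportional to A+D can be passed straight through without taking a square root.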

This project was driven mostly by paper deadlines. I have drawn lines to highlight two important dates. Our first deadline was when the project's paper was first submitted to a conference (line #1). This marked an enormous push by everyone to get the system fully working. Work beyond that was at a subdued pace, though not completely absent. Though we had a paper draft submitted, we knew what its problems were, so we kept trying to improve the system.

Our paper thankfully got accepted, so the next deadline was the camera-ready version of our paper (line #2) that addressed our reviewers' comments and included up-to-date results. It was a mad dash there at the end, but we were able to add enough system support to get some additional benchmarks working in time for our deadline.

A bit about the graph generation: I used cvsps as my interface to the CVS repository. This tool collects and correlates information about the commits to each file and presents that information as a series of patch sets. Strictly speaking, each point on the graph is a patch set. The tool provided the commit date and author for each patch set, and it also extracted the patch itself so that it could be fed to diffstat.
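For the diffstat step, a minimal sketch of pulling A and D out of the summary line might look like the following. It assumes the common summary format ending in "N insertions(+), M deletions(-)". The function name is hypothetical, and how the diffstat text is obtained for each patch set is left out.

```python
import re


def parse_diffstat_summary(text):
    """Return (lines_added, lines_removed) from diffstat's summary line.

    Assumes a summary such as:
        " 3 files changed, 120 insertions(+), 30 deletions(-)"
    Either count may be missing, in which case it is treated as 0.
    """
    added = removed = 0
    match = re.search(r"(\d+) insertions?\(\+\)", text)
    if match:
        added = int(match.group(1))
    match = re.search(r"(\d+) deletions?\(-\)", text)
    if match:
        removed = int(match.group(1))
    return added, removed


sample = " 3 files changed, 120 insertions(+), 30 deletions(-)"
added, removed = parse_diffstat_summary(sample)
print(added + removed)  # the A + D value that sets the circle's area
```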

The intermediate output of this process was saved to text files and then tweaked. Patch extraction was a rather lengthy process, probably because I wasn't doing this on the CVS server itself. Also, for some reason, one of the patch sets contained a new version of the file rather than a 4-line patch. Something was obviously wrong when I got a circle 100x larger than the rest. That's really the only thing that needed hand-modification.
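A bad patch set like that one could also be caught before plotting rather than by eye. The sketch below flags any patch set whose size is far above the median. The `flag_outliers` name, the factor of 50, and the sample sizes are assumptions chosen for illustration, not part of the original workflow.

```python
from statistics import median

# Hypothetical sizes (lines added + removed), keyed by patch set number.
sizes = {101: 12, 102: 40, 103: 8, 104: 4000, 105: 25}


def flag_outliers(sizes, factor=50):
    """Return patch set numbers whose size exceeds factor * median size."""
    typical = median(sizes.values())
    if typical <= 0:
        return []
    return sorted(ps for ps, size in sizes.items() if size > factor * typical)


print(flag_outliers(sizes))
```

Anything flagged this way still needs a human look, since a genuinely large commit and a botched patch extraction are indistinguishable by size alone.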

That information was then loaded through a Python script that used matplotlib to generate the graph itself. I'd never really used that graphing tool before, but I was sick of working with gnuplot and jgraph. While I don't quite have a full grasp of the platform, I was able to put together enough to make the plot above.
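For anyone wanting to try something similar, here is a rough sketch of a plot in this style using matplotlib. It is not the original script: the sample data, the deadline dates, and the `plot_patch_sets` function are invented for the example, and the styling choices (transparency, dashed deadline lines) are guesses.

```python
from datetime import datetime

# Hypothetical sample data: (commit time, author, lines added + removed).
patch_sets = [
    (datetime(2009, 1, 12), "alice", 150),
    (datetime(2009, 2, 3), "bob", 40),
    (datetime(2009, 2, 20), "alice", 1350),
]
# Hypothetical deadline dates to mark with vertical lines.
deadlines = [datetime(2009, 2, 25)]


def plot_patch_sets(patch_sets, deadlines, path):
    """Draw one circle per patch set and save the figure to `path`."""
    import random
    import matplotlib
    matplotlib.use("Agg")  # render to a file; no display needed
    import matplotlib.pyplot as plt
    import matplotlib.dates as mdates

    authors = sorted({author for _, author, _ in patch_sets})
    color_of = {a: f"C{i % 10}" for i, a in enumerate(authors)}
    rng = random.Random(0)  # seeded so the vertical jitter is reproducible

    fig, ax = plt.subplots(figsize=(12, 4))
    ax.scatter(
        [mdates.date2num(when) for when, _, _ in patch_sets],
        [rng.random() for _ in patch_sets],       # random Y position
        s=[size for _, _, size in patch_sets],    # marker area, in points^2
        c=[color_of[a] for _, a, _ in patch_sets],
        alpha=0.5,
    )
    for when in deadlines:
        ax.axvline(mdates.date2num(when), color="black", linestyle="--")

    ax.xaxis.set_major_locator(mdates.MonthLocator())  # one tick per month
    ax.xaxis.set_major_formatter(mdates.DateFormatter("%b %Y"))
    ax.set_yticks([])  # the Y position carries no meaning
    fig.savefig(path)
    plt.close(fig)
```

Calling `plot_patch_sets(patch_sets, deadlines, "commits.png")` writes the image to disk. The import of matplotlib is kept inside the function so the data definitions can be reused without it.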
