Thursday, September 29, 2011

Zài Jiàn Replay

I eagerly installed VMware Workstation 8.0 today, hopeful that this update would improve replay debugging and fix a longstanding bug in the GDB stub's handling of the 'monitor stopat' command. My enthusiasm lasted until I loaded the program for the first time and thought to myself:

That's funny. Where'd the "Record" button go?

After scouring the VMware support documentation and blogs, I finally discovered where it went. According to the blog of E. Lewis, an engineer at VMware:

The Workstation management team concluded that not enough people had demonstrated the need or invested the time necessary to configure and use the feature, so they decided to dedicate our engineers to features that would be used by more developers and other customers.

Yep, it's gone for good. Amidst a slew of new features added to Workstation 8, a very useful one (the whole reason I use virtual machines in development) was quietly removed. I'm very sorry to see it go. It will be interesting to see if VMware will ever reinstate this feature, or if a (small) swarm of angry developers can get together and implement the feature for some open source VM system.

Monday, October 18, 2010

System Architecture of the Future

I just read an interesting HotOS paper titled: "Turning Down the LAMP: Software Specialisation for the Cloud." The authors explore the idea of condensing the software stack needed for an application. For instance, your application code is using a language runtime, isolated in a user-space process, talking to a MySQL process, reporting back to an Apache instance, running under the Linux kernel, all executing as a virtual machine in the Xen hypervisor. The paper introduces the Mirage system to shrink this down to three layers: App Code, Mirage Kernel, and Xen.

Specifically, they compile an OCaml program straight from source code to a virtual machine image that can run under Xen. The stuff that would be provided by the OS, like concurrency, storage, and isolation, is provided by libraries and the OCaml runtime. The runtime only includes features needed by the application, so it's lighter and faster than Linux. Without having to worry about POSIX, the application can make better use of system hardware. It can also do some things better: OCaml programs can be statically verified to be safe, so everything runs in only one address space with minimal memory protection (like Singularity).

This was just a workshop paper, so their ideas and evaluation aren't fleshed out fully. I look forward to seeing the full version of this paper.

This idea of specializing a kernel for the application isn't new. But when you think about this as a cloud service, the Mirage model starts to look like a different picture: Exokernels. Let's see:

  • One thin layer that multiplexes hardware: Check! (exokernel / Xen)
  • OS functionality provided by application: Check! (libos / Mirage kernel)
  • Application runs on the kernel best suited for it: Check!

The tricky parts of an exokernel come from its attempts to share some state among its applications. (Can one Linux-like libos provide an effective file system, with permissions, to many applications? How do two different apps using different libos-s share disk blocks?) For a cloud service provider, there are no such problems! Just use virtual disks (i.e. a static partitioning of disk space without sharing) and limit VM communication to the networking system (i.e. message passing).

So 16 years after exokernels were first proposed, we're back to exploring the same architecture. Except our exokernel is called Xen.

Sunday, January 17, 2010

Visualizing CVS

In a retrospective mood after finishing development on a long-running project, I started wondering just how much coding had been done. How much code has been written, over what timescale, and by whom? Of course, all the answers are sitting there in our CVS repository. I just need a nice way to visualize it.

After hacking around for a bit, here's a pretty graph I came up with:

To explain: Each circle in this graph represents a single CVS commit to our custom Linux kernel. A commit is positioned along the X-axis according to the date it was committed. Each tick represents one month, so you're seeing a bit over a year's worth of development out of the repository's entire lifetime. The size of each circle shows how big the commit was: the unified diff for each commit was run through diffstat to determine its lines added (A) and lines removed (D). The area of each circle is proportional to A+D. The color of the circle shows who performed the commit. Positioning along the Y-axis is random, just to spread out the circles a bit.
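The mapping described above can be sketched in a few lines of Python. This is only an illustration, not the script I actually ran: the function name, the `scale` and `seed` parameters, and the sample committers are all made up for the example.

```python
import random
from datetime import datetime


def layout_commits(commits, scale=1.0, seed=0):
    """Map (date, author, added, removed) tuples to plot-ready points.

    `scale` and `seed` are illustrative knobs, not values from the post.
    """
    rng = random.Random(seed)
    # Give every committer a stable color slot.
    authors = sorted({author for _, author, _, _ in commits})
    color_index = {author: i for i, author in enumerate(authors)}

    points = []
    for date, author, added, removed in commits:
        points.append({
            "x": date,                          # commit date on the X-axis
            "y": rng.random(),                  # random Y, only to spread circles out
            "area": scale * (added + removed),  # circle area proportional to A+D
            "color": color_index[author],       # one color slot per committer
        })
    return points


# Hypothetical sample data: two commits by two made-up committers.
commits = [
    (datetime(2009, 3, 2), "alice", 120, 30),
    (datetime(2009, 3, 9), "bob", 10, 5),
]
points = layout_commits(commits)
```

Scaling the area (rather than the radius) by A+D keeps a commit that touches ten times as many lines looking ten times as big, instead of a hundred times.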

This project was driven mostly by paper deadlines. I have drawn lines to highlight two important dates. Our first deadline was when the project's paper was first submitted to a conference (line #1). This marked an enormous push by everyone to get the system fully working. Work beyond that was at a subdued pace, though not completely absent. Though we had a paper draft submitted, we knew what its problems were, so we kept trying to improve the system.

Our paper thankfully got accepted, so the next deadline was the camera-ready version of our paper (line #2) that addressed our reviewers' comments and included up-to-date results. It was a mad dash there at the end, but we were able to add enough system support to get some additional benchmarks working in time for our deadline.

A bit about the graph generation: I used cvsps as my interface to the CVS repository. This tool collects and correlates information about the commits to each file and presents that information as a series of patch sets. Strictly, each point on the graph shows a patch set. Cvsps provided the commit date and author, and it would also extract the patch for each patch set, which was then fed to diffstat.
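Getting A and D out of diffstat is mostly a matter of reading its summary line. Here is a rough sketch, assuming the summary looks like "2 files changed, 9 insertions(+), 7 deletions(-)" and that a zero count may simply be left out; the function name and the file names in the sample are hypothetical.

```python
import re

# Patterns for the pieces of the summary line (assumed format, see above).
_SUMMARY = re.compile(r"(\d+) files? changed")
_INSERTED = re.compile(r"(\d+) insertions?\(\+\)")
_DELETED = re.compile(r"(\d+) deletions?\(-\)")


def parse_diffstat_summary(text):
    """Return (added, removed) from the summary line of diffstat output."""
    # The summary is the last line, so search from the bottom up.
    for line in reversed(text.splitlines()):
        if _SUMMARY.search(line):
            ins = _INSERTED.search(line)
            dele = _DELETED.search(line)
            added = int(ins.group(1)) if ins else 0
            removed = int(dele.group(1)) if dele else 0
            return added, removed
    raise ValueError("no diffstat summary line found")


# Made-up sample output for a two-file patch set.
sample = """\
 kernel/sched.c |   12 +++++++-----
 mm/mmap.c      |    4 ++--
 2 files changed, 9 insertions(+), 7 deletions(-)
"""
added, removed = parse_diffstat_summary(sample)
```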

The intermediate output of this process was saved to text files and then tweaked. Patch extraction was a rather lengthy process, probably because I wasn't doing this on the CVS server itself. Also, for some reason, one of the patch sets contained a new version of the file rather than a 4-line patch. Something was obviously wrong when I got a circle 100x larger than the rest. That's really the only thing that needed hand-modification.
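I spotted the bogus patch set by eye, but a quick check along these lines would catch the same thing before plotting. It is a sketch only: the function name is made up, and the threshold of 100 times the median is an arbitrary choice that happens to echo the oversized circle.

```python
from statistics import median


def flag_outliers(sizes, factor=100):
    """Return indices of patch sets at least `factor` times the median size."""
    typical = median(sizes)
    if typical == 0:
        return []  # nothing sensible to compare against
    return [i for i, size in enumerate(sizes) if size >= factor * typical]


# Hypothetical A+D totals; the fifth entry stands in for the whole-file "patch".
sizes = [4, 12, 7, 30, 5200, 9]
suspects = flag_outliers(sizes)
```

Anything flagged this way still deserves a look by hand, since a genuinely huge commit would trip the same check.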

That information was then loaded through a Python script that used matplotlib to generate the graph itself. I'd never really used that graphing tool before, but I was sick of working with gnuplot and jgraph. While I don't quite have a full grasp of the platform, I was able to put together enough to make the plot above.
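For the curious, the plotting step looks roughly like the sketch below. This is a reconstruction under assumptions rather than my original script: the function name, the output file name, the colors, and the sample dates are all hypothetical. It relies on matplotlib's `scatter`, whose `s` argument is the marker area, and on `MonthLocator` to get one tick per month.

```python
from datetime import datetime


def plot_commits(dates, heights, areas, colors, deadlines, out_path="commits.png"):
    """Draw one circle per patch set and a dashed line per deadline."""
    # Imported inside the function so the module loads even without a display.
    import matplotlib
    matplotlib.use("Agg")  # render straight to a file
    import matplotlib.pyplot as plt
    import matplotlib.dates as mdates

    fig, ax = plt.subplots(figsize=(12, 4))
    # `s` is the marker area in points^2, so it maps directly to A+D.
    ax.scatter(mdates.date2num(dates), heights, s=areas, c=colors, alpha=0.6)

    ax.xaxis_date()                                   # treat X values as dates
    ax.xaxis.set_major_locator(mdates.MonthLocator()) # one tick per month
    ax.xaxis.set_major_formatter(mdates.DateFormatter("%b %Y"))
    ax.set_yticks([])                                 # Y is random, so hide it

    # Numbered vertical lines for the deadlines.
    for number, when in enumerate(deadlines, start=1):
        x = mdates.date2num(when)
        ax.axvline(x, color="black", linestyle="--")
        ax.text(x, 1.02, f"#{number}", ha="center",
                transform=ax.get_xaxis_transform())

    fig.autofmt_xdate()
    fig.savefig(out_path)
    plt.close(fig)
    return out_path


# Hypothetical inputs: three patch sets and one deadline.
dates = [datetime(2009, 3, 2), datetime(2009, 3, 9), datetime(2009, 4, 20)]
heights = [0.2, 0.7, 0.5]
areas = [150, 15, 60]
colors = ["tab:blue", "tab:orange", "tab:blue"]
deadlines = [datetime(2009, 4, 1)]
```

Calling `plot_commits(dates, heights, areas, colors, deadlines)` writes the figure to `commits.png` in the current directory.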