Theseus at CHI 2014

Joel Brandt, Robert C. Miller, and I wrote about Theseus and always-on programming tools. It was accepted at CHI in 2014 and I presented it in Toronto.

Paper

I took the time between paper acceptance and the conference talk to refine our message, so the slides below are probably where I would start if I were you.

Lieber, Tom, Joel Brandt, and Robert C. Miller. “Addressing Misconceptions About Code with Always-On Programming Visualizations.” CHI 2014.

@inproceedings{Lieber:2014:AMC:2556288.2557409,
 author = {Lieber, Tom and Brandt, Joel and Miller, Robert C.},
 title = {Addressing Misconceptions About Code with Always-on Programming Visualizations},
 booktitle = {Proceedings of the SIGCHI Conference on Human Factors in Computing Systems},
 series = {CHI '14},
 year = {2014},
 isbn = {978-1-4503-2473-1},
 location = {Toronto, Ontario, Canada},
 pages = {2481--2490},
 numpages = {10},
 url = {http://doi.acm.org/10.1145/2556288.2557409},
 doi = {10.1145/2556288.2557409},
 acmid = {2557409},
 publisher = {ACM},
 address = {New York, NY, USA},
 keywords = {code understanding, debugging, programming},
}

Conference Presentation

Introduction and Motivation

Think about all the feedback you get when you’re starting your car.

It starts when you put the key into the ignition. You can feel the key’s teeth running over the tumblers. You can hear the engine starting and you can feel it rumbling when it does. And when you push the accelerator, you can hear the engine revving up and you can feel the pedal pushing back into your foot. And finally, you can see your car moving forward.

But it’d be a little different if we were starting a software car. You can imagine one implemented like this, with functions for each of the things that a car might do, like accelerate, brake, honk, and so on.
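The slide with that code isn’t reproduced here. As a rough sketch only, a software car along those lines might look like the following; `createCar`, the state fields, and the numbers are all made up for illustration.

```javascript
// Hypothetical software car: every name and number here is illustrative.
function createCar() {
  const state = { engineRunning: false, speed: 0 };
  return {
    turnKey() { state.engineRunning = true; },
    accelerate() {
      // Silently does nothing if the engine never started.
      if (state.engineRunning) { state.speed += 10; }
    },
    brake() { state.speed = Math.max(0, state.speed - 10); },
    honk() { return 'beep'; },
    speed() { return state.speed; },
  };
}

const car = createCar();
car.accelerate();         // the key was never turned
console.log(car.speed()); // 0, with no hint as to why
```

Nothing in the output says whether `accelerate()` ran, or why the speed stayed at zero.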

When we’re programming, the information we have available to us is much different than the information we have when we’re starting the real car.

We have the code that we’re editing, and we have the output of the program. But what’s invisible most of the time is all the internal state and how execution is progressing. We had access to it in the real car because we could hear and feel everything that was happening, but for the software car, everything happens silently in computer memory.

That makes a big difference when it comes to understanding what’s happening when things go wrong. With the real car, our human, pattern-matching brains get accustomed to sounds and the feel of normal operation, so that when something goes wrong, we have an idea of where to start.

For example, suppose we go through all the steps of starting a car, press the accelerator, and it doesn’t go forward. With the software car we don’t have much to go on, but with the real car, we would have known way back when we turned the key that the engine never started.

So, if this is what programmers are looking at, the input and output, then this is where we should show any and all information we can to help them maintain a solid understanding of what their code is actually doing. And that’s the motivation for our research.

We’re looking specifically into what I’m going to refer to as “always-on” interfaces—interfaces that are intended to be used by programmers at all stages of programming. This is in contrast to debugging tools, which are invoked only when there’s some kind of a problem.

Today, I’ll be talking about always-on interfaces integrated into the code editor specifically.

Framing Our Research

So the aim of our research is to figure out a few things:

First, would this style of always-on interface be helpful to programmers?
If so, how does it help them? How does it change their behavior, and what information needs does it satisfy that are currently unmet?
And once we know that, how do we design and implement these interfaces to maximize their benefit?

Presenting Theseus

So we built an always-on tool (that I’ll show you in a second) to help us answer those questions, and we had a few goals.

First, we wanted it to help answer reachability questions, the kind that LaToza and Myers found are some of the hardest for programmers to answer. These are questions about code paths, where you want to find all execution paths through your program that satisfy certain conditions. For example, maybe you want to find all code paths in your program where you call malloc() without calling free(), or maybe all of the places in your program that accidentally make blocking network calls on the UI thread.

Secondly, we wanted our tool to have a low barrier to entry and a high ceiling. To do that, we looked to two of the most popular debugging tools, breakpoints and log statements, which when combined are fairly approachable and can satisfy many information needs, but which require a lot of effort that we thought we could eliminate.

Now I’m going to show you a bit of Theseus's interface.

Here’s a bit of JavaScript in the Brackets IDE.

fetch() begins a file download.
The file is loaded into memory by listening for the ‘data’ events and adding the data to a buffer.
When the stream ends, fetch() returns the result asynchronously, or returns the error if there was one.
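The code from the slide isn’t reproduced here. A minimal sketch matching that description, assuming a Node-style event emitter, could look like this; the `startDownload` parameter and the callback signature are assumptions, not the original code.

```javascript
const { EventEmitter } = require('events');

// Hypothetical reconstruction: `startDownload` stands in for whatever
// download API the slide used. It must return an object that emits
// 'data', 'error', and 'end' events.
function fetch(startDownload, callback) {
  const chunks = [];
  const stream = startDownload(); // begin the file download

  // Load the file into memory one chunk at a time.
  stream.on('data', function (data) {
    chunks.push(data);
  });

  // Hand the error to the caller if the download fails.
  stream.on('error', function (err) {
    callback(err);
  });

  // When the stream ends, hand back everything that was buffered.
  stream.on('end', function () {
    callback(null, chunks.join(''));
  });
}

// Usage with a fake download that delivers two chunks.
const download = new EventEmitter();
fetch(() => download, function (err, result) {
  console.log(err, result);
});
download.emit('data', 'hello, ');
download.emit('data', 'world');
download.emit('end');
```

The three anonymous functions are the ones whose call counts and log entries are discussed next.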

At this moment, Theseus isn’t running. If we start it…

Call counts appear.

These call counts are designed to look a little bit like breakpoints, but they have a little more information inside, which is the number of times that breakpoint has been “hit” (although it doesn’t stop execution).
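The talk doesn’t describe how the counts are collected. One way to picture a hit counter that never pauses execution is a wrapper like the one below; `counted` and `callCounts` are hypothetical names, and this is not Theseus’s implementation.

```javascript
// Hypothetical hit counter: counts calls without ever pausing execution.
const callCounts = new Map();

function counted(name, fn) {
  callCounts.set(name, 0);
  return function (...args) {
    callCounts.set(name, callCounts.get(name) + 1);
    return fn.apply(this, args); // behave exactly like the original
  };
}

const onData = counted('onData', function (data) { return data.length; });
const onError = counted('onError', function (err) { return err.message; });

onData('abc');
onData('de');

console.log(callCounts.get('onData'));  // 2
console.log(callCounts.get('onError')); // 0: never called
```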

So then let’s say I click one of those call counts, the one for the ‘data’ event handler.

That adds the information about that function to the log at the bottom. It’s printing the value of every argument that was passed in (here there’s just data). It also shows the object that the function was called on, this.

Both of those values can be explored as if you really were stopped at the debugger at that point, and Theseus makes some effort to ensure that the value is as it existed at the time, without any mutations that may have happened since.
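To make the “as it existed at the time” point concrete, here is a rough sketch of a log that copies the arguments and `this` at the moment of the call. The `logged` and `snapshot` helpers are hypothetical, and the JSON round trip is a deliberately crude copy that only handles plain data; how Theseus actually captures values isn’t covered in the talk.

```javascript
// Hypothetical call log: each entry keeps a copy of the arguments and of
// `this` as they were at call time, so later mutations don't leak in.
const log = [];

function snapshot(value) {
  // Crude deep copy; only handles JSON-compatible data.
  return value === undefined ? undefined : JSON.parse(JSON.stringify(value));
}

function logged(name, fn) {
  return function (...args) {
    log.push({ name, args: snapshot(args), self: snapshot(this) });
    return fn.apply(this, args);
  };
}

const request = {
  url: '/file.txt',
  received: 0,
  onData: logged('onData', function (data) {
    this.received += data.bytes;
    data.bytes = 0; // mutation that happens after the call was logged
  }),
};

request.onData({ bytes: 5 });

console.log(log[0].args[0].bytes); // 5, not 0
console.log(log[0].self.received); // 0, the value when the call began
```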

Now, let’s click a second function’s call count…

Now it’s starting to look like the log of someone using print debugging to understand a program, except instead of having to type code and restart the program, we’ve just clicked our mouse twice.

Finally, let’s click the call count for fetch() itself.

What’s happened is that our log doesn’t show the entries in strict chronological order. What makes it useful for answering reachability questions is that when one entry is for a function that’s a caller of another, we nest those entries to make that relationship clear. Because of that, we can now tell which call to fetch() each of those ‘data’ and ‘error’ events was for. We can clearly see that while it looks like our first call to fetch() received data, the second received no data and got some kind of download error.
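The nesting can be sketched as a tree of log entries. In this hypothetical version, each wrapped function remembers which call was running when it was created, so an event handler that fires later is still filed under the call that registered it. The `traced` and `render` helpers are illustrative names; the talk doesn’t say how Theseus tracks these relationships.

```javascript
// Hypothetical nested log. `current` is the entry for the call that is
// running right now; new entries are filed under it.
const root = { name: '(top)', children: [] };
let current = root;

function traced(name, fn) {
  // Remember the call that was running when this function was created,
  // so callbacks that fire later are filed under the call that made them.
  const creator = current;
  return function (...args) {
    const parent = current === root ? creator : current;
    const entry = { name, children: [] };
    parent.children.push(entry);
    const previous = current;
    current = entry;
    try {
      return fn.apply(this, args);
    } finally {
      current = previous;
    }
  };
}

const pending = {};
const fetch = traced('fetch', function (url) {
  pending[url] = {
    data: traced('data', function (chunk) {}),
    error: traced('error', function (err) {}),
  };
});

fetch('/a');
fetch('/b');
pending['/a'].data('chunk'); // events arrive after both calls returned
pending['/b'].error(new Error('download failed'));

function render(entry, depth = 0) {
  return entry.children
    .map((c) => '  '.repeat(depth) + c.name + '\n' + render(c, depth + 1))
    .join('');
}

console.log(render(root));
// fetch
//   data
// fetch
//   error
```

A flat chronological log of the same run would read fetch, fetch, data, error, which doesn’t say which download failed.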

Design Principles

So just for a second, I want to step back and look at some of the design principles that we feel worked out well for this interface.

It’s possible to answer a lot of questions with a little bit of information, as long as it’s the right information. But a code editor is a noisy place, so you have to choose the information carefully. And though the point of all this is to help the programmer maintain a correct mental model by being constantly exposed to how their program works, there’s just too much going on for it all to be presented. We have to be selective.

For us, the most important bit of information about how many times a function has been called is whether it’s been called zero times, or at least once. We devoted the most space to that information by coloring the entire background of a function gray if it had never been called.

The call count itself carries more information but is inherently noisier, so it was relegated to the gutter to the left of the function and given much less space.

Secondly, we should think about the efficiency of the interface. The idea here is that, even though developers ask questions that often require a full-blown interface to answer well, such as our log, we can often provide always-on versions of those tools embedded in the code editor.

At the very least, that may save the programmer some time if they’re able to open your tool using the context of the code they’re looking at. But at its best, the always-on interface has the opportunity to answer their questions without them having to click on anything, or it might even clue them in to problems that would have been invisible without the always-on displays.

Research Question

When we began evaluating this interface, we had one major question, which was how programmer behavior changes when they use always-on programming tools. We ran two studies, one after the other, in order to find the answer, and I’m going to share some of the results of those studies now.

Evaluation 1

During the study, we watched people solve five programming challenges. What I’m going to present next is a sampling of the behaviors we noticed that gave us insight into how always-on interfaces like Theseus would be used.

Programmers found several ways to use the real-time call counts in the interface to help them, and I’m going to single out three of them to give you an idea.

This is analogous to the malloc()/free() example from earlier.

Evaluation 2

Programmers in our study requested more types of always-on displays, such as the ones listed here. Now, they’re not interface designers, but it’s important to note that these information needs are technically already served by existing tools such as profilers, breakpoint debuggers, and Theseus. Even so, the programmers who used Theseus for a week requested always-on displays to surface this information as well, which indicates that there are many opportunities here.

Future Work

One important area of future work for Theseus is to address a common source of mental burden in the study: programmers occasionally had to memorize call counts to use some of the debugging strategies that involved comparing call counts over time or between functions. Theseus does not give the user much flexibility in how it displays information, so information wasn’t always where programmers needed it most.

Secondly, to understand how always-on interfaces are used, it’s very important that we study more diverse populations. One of the major drawbacks of our current study is that we were only able to recruit male programmers with varying amounts of experience, and left out many groups with different backgrounds who may have responded to Theseus's always-on displays much differently.