Data Flow Research

Purpose

For my Ph.D. work, I developed a type inferencer for Smalltalk and a modified IDE that uses it.

Smalltalk is designed for a general audience, up to and including children, with the idea that a programming language doesn’t have to have a high floor to have a high ceiling. As such, the only time you write a type in Smalltalk is when you want to create an instance of a new object.

I used data-flow analysis to infer types even where they aren’t written. Based on the constraints of the problem, the resulting “DDP” algorithm has two special properties that combine well together:

It is demand-driven: it posts goals to itself and then tries to answer those goals. A developer who is working in Smalltalk will be changing the program all the time, so it's not reasonable to infer types for the entire 300k+ lines of code that a program can easily contain. Instead, queries are issued for whichever part of the code the developer is currently inspecting.
It creates speculative subgoals that can be pruned, with more aggressive pruning for subgoals further from the root. Smalltalk is a powerful language, including both virtual method call and first-class function literals (lambdas). The solution in DDP is to phrase all goals in a way that thay have a trivial answer that is cheap to compute but still sound. This approach allows using aggressive context-sensitivity near the root of the goal graph, while keeping the overall goal graph with just a few thousand total goals in it.

To my knowledge, DDP is the first type inference algorithm that is effective in the environment it was designed for: full-sized programs (100+ kloc), live edited, with interactive response times, for a general-purpose language with dynamic dispatch.

Documents

The most comprehensive documents about DDP and Chuck are the following two:

"Demand-Driven Type Inference with Subgoal Pruning", an online report. A revised version of my dissertation in a readable style for online PDFs.
"Demand-Driven Type Inference with Subgoal Pruning", my dissertation. My submitted and approved dissertation at Georgia Tech, using the official dissertation style. The raw experimental data that supports the dissertation is in chuck-results-f2004.tar.bz2.

In addition, the following peer-reviewed conference papers have been published about DDP and Chuck:

ECOO 2004: "Demand-Driven Type Inference with Subgoal Pruning: Trading Precision for Scalability". This paper describes the DDP algorithm in general. The experimental results described in the paper are in "Classification of Types Inferred in DDP Experiment" The raw experimental data is in chuck-results-ecoop04.tar.bz2.
WCRE 2005: "Semantic Navigation of Large Code Bases in Higher-Order, Dynamically Typed Languages" This paper describes the Chuck user interface. This describes DDP being used for its originally intended goal.
DLS 2005: "Dynamic Data Polyvariance Using Source-Tagged Classes".. This paper describes the "class tags" extension to DDP, called DDP/CT. DDP/CT can effectively analyze data-polymorphic code such as collection types.

To cite these papers, you can use the details in lexspoon.bib.

Code

ChuckDemo.zip is a pre-loaded Squeak image that lets you download Chuck and play with it immediately. To use it, you will also need to download a Squeak VM for your platform.