Donald

Professional Interests

I love building new tools (broadly defined) to help software developers be more efficient and focus on the interesting and fun parts of building software!

Concretely, I've worked on improving fundamental programming abstractions, such as serverless computing, as well as building better developer tools, such as compilers, runtime systems, and most recently, package managers. Going forward, I expect to think deeply about how generative AI tools can best interface with human developers.

I'm excited to be looking for new opportunities where I can apply my deep technical background to build the core engineering foundation at a startup or small company with a strong vision for the future. If that sounds like a good fit for you, feel free to shoot me an email!

Projects

  1. Automating Dependency Repair (in progress!): Dependency management is extremely tricky to get right: programmers need to make sure their dependency version constraints are neither too broad nor too narrow (a minimal illustration appears after this list). Compounding this, if even one package in your dependency tree has a defective constraint, you may end up with a solution that causes bugs at runtime. To address this problem, we're building automated dependency repair tooling, starting with the Python/Pip ecosystem. Building on our previous work (MaxNPM), we aim to integrate constraint solving with large language models' suggestions for patching constraints, iteratively narrowing the search space to find a repaired dependency solution.
  2. Optimizing Package Management (ICSE 2023): Dependency management is unfortunately not as simple as installing the dependencies you want: part of software engineering now involves careful selection of dependency versions, to get newer versions, avoid security vulnerabilities, unify dependencies on the same package into a single version, and so on. These goals are often at odds (a toy example of the trade-off appears after this list), and they are not handled well by existing package managers' baked-in heuristics. To solve this, we built MaxNPM, a fork of the NPM CLI that lets users customize dependency solving goals, so that software developers can guide the tooling appropriately for their situation. We evaluated MaxNPM on a large sample of packages from the NPM ecosystem and showed that it can reduce vulnerabilities in dependencies, choose newer dependencies than NPM, and choose fewer dependencies than NPM.
    [paper] [talk] [github] [install]
  3. Big Data Analyses of NPM (MSR 2023, ESEC/FSE 2023): Package managers and their massive open-source ecosystems are the foundation for how practical software is built. However, how developers share and consume packages at scale is not well understood. How do developers specify their dependencies? How do developers tag new releases of their own packages? Do common practices lead to issues in dependency management, such as out-of-date dependencies or security vulnerabilities? To answer these questions, we built a system that archives, in real time, a replica of the entire NPM ecosystem, including both package metadata and code (20+ TB). The dataset is now public and can be used for a wide variety of NPM dependency graph queries and big code analyses. Our analysis surfaces some surprising results, including an asymmetry between how developers tag the updates they publish and how they write constraints on the dependencies they consume.
    [MSR paper] [ESEC/FSE paper] [ESEC/FSE talk] [dataset] [github]
  4. Evaluating LLMs Across Programming Languages (TSE 2023): Large language models (LLMs) are blowing up the internet right now, both for casual natural language use and for programming tasks. ChatGPT, Codex, and other tools appear to code fairly well, but how well depends on which programming language! We designed and built MultiPL-E, a systematic and extensible framework for fairly evaluating LLMs across a large number of programming languages (18!). The key insight is that LLM programming benchmark suites (HumanEval, etc.) are written as Python unit tests, and those tests are (almost always) written in a small subset of Python that avoids features such as functions and loops. Therefore, we were able to write trivial "compilers" that translate Python unit tests to nearly any other language (sketched after this list), obtaining equivalent benchmark suites. This work was published in TSE 2023 and presented at ESEC/FSE 2023.
    [paper] [talk] [github] [website]
  5. Delimited Continuations for WebAssembly (DLS 2020): WebAssembly (Wasm) is a rapidly growing compilation target for the web, but it lacks support for user-level (multiplexed) threads, as seen in Go. Currently such threads must be simulated, imposing a significant performance penalty on Go code compiled to Wasm. We resolve this by adding support for delimited continuations (a form of stack capture) to WebAssembly, which allows for efficient expression of user-level threads (see the generator sketch after this list) as well as many other interesting computational effects. This work was presented at DLS 2020.
    [paper] [talk] [github] [website]
  6. Serverless Computing (OOPSLA 2019, distinguished paper): Serverless functions are super convenient, but the underlying cloud platforms (such as AWS Lambda) nondeterministically reuse or restart the containers in which code runs, leading to bugs in real-world code (a small simulation appears after this list). Documentation and online tutorials offer spotty guidance on what exactly this behavior is and how a programmer can guarantee their code is safe. We help explain these dynamics with a framework for analyzing the semantics of serverless functions, which others built on when designing richer serverless abstractions, such as Microsoft Azure's Durable Functions. This work was presented at OOPSLA 2019 and received a distinguished paper award.
    [paper] [talk] [website]
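
A minimal illustration of "too narrow" vs. "too broad" (project 1) in the Python/Pip world, using the real packaging library; the package and version numbers here are made up:

    from packaging.specifiers import SpecifierSet

    # Suppose a hypothetical library "libfoo" is actually compatible with
    # every 1.x and 2.x release of its dependency.
    too_narrow = SpecifierSet("==1.4.2")   # pins one version; blocks safe upgrades
    too_broad = SpecifierSet(">=1.0")      # admits a future 3.0 that may break
    plausible = SpecifierSet(">=1.4,<3")   # tracks the tested compatibility range

    for name, spec in [("too narrow", too_narrow),
                       ("too broad", too_broad),
                       ("plausible", plausible)]:
        print(f"{name}: admits 2.1.0? {spec.contains('2.1.0')}")
    # too narrow: admits 2.1.0? False
    # too broad: admits 2.1.0? True
    # plausible: admits 2.1.0? True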
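
The goal conflicts MaxNPM navigates (project 2) can be seen in miniature. This toy selector is not MaxNPM's solver (the real tool uses constraint solving over the whole dependency graph); it just shows why "newest" and "fewest vulnerabilities" can disagree, with made-up data:

    # Candidate versions of one hypothetical package, with made-up CVE counts.
    candidates = [
        {"version": (2, 0, 0), "vulns": 1},  # newest, but has a known CVE
        {"version": (1, 9, 3), "vulns": 0},  # slightly older, clean
        {"version": (1, 2, 0), "vulns": 0},  # old and clean
    ]

    def pick(candidates, goal):
        """Select a version under a single, user-chosen objective."""
        if goal == "newest":
            return max(candidates, key=lambda c: c["version"])
        if goal == "fewest_vulns":
            # Minimize vulnerabilities; break ties by preferring newer versions.
            return min(candidates,
                       key=lambda c: (c["vulns"], tuple(-n for n in c["version"])))
        raise ValueError(goal)

    print(pick(candidates, "newest"))        # picks 2.0.0, despite the CVE
    print(pick(candidates, "fewest_vulns"))  # picks 1.9.3: newest among the clean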
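
The "trivial compiler" idea behind MultiPL-E (project 4) can be sketched in a few lines. Because each benchmark test reduces to comparing a call's result against a literal expected value, and JSON literals are valid syntax in many target languages, a Python test can be re-emitted mechanically. This is my own simplification, not MultiPL-E's actual code, and deepEqual stands in for whatever equality helper a target suite provides:

    import json

    def translate_test(func_name, args, expected):
        """Re-emit a Python `assert f(*args) == expected` test as
        JavaScript/TypeScript source. JSON literal syntax doubles as valid
        source in the target language; the real system handles many languages
        and values that JSON alone cannot express."""
        arg_src = ", ".join(json.dumps(a) for a in args)
        return f"console.assert(deepEqual({func_name}({arg_src}), {json.dumps(expected)}));"

    # A HumanEval-style test case, mechanically translated:
    print(translate_test("sortNumbers", [[3, 1, 2]], [1, 2, 3]))
    # console.assert(deepEqual(sortNumbers([3, 1, 2]), [1, 2, 3]));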
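
Why stack capture enables user-level threads (project 5) can be seen with Python generators, which capture a single stack frame; delimited continuations generalize this to a whole stack segment. A toy cooperative scheduler, unrelated to the actual Wasm proposal's instructions:

    from collections import deque

    def worker(name, steps):
        """A cooperative 'thread': yield suspends it, the scheduler resumes it."""
        for i in range(steps):
            print(f"{name}: step {i}")
            yield  # suspend: the generator's frame is the captured continuation

    def run(threads):
        """Round-robin scheduler multiplexing user-level threads."""
        ready = deque(threads)
        while ready:
            t = ready.popleft()
            try:
                next(t)          # resume where the thread last suspended
                ready.append(t)  # it yielded, so schedule it again
            except StopIteration:
                pass             # thread finished

    run([worker("A", 2), worker("B", 3)])
    # A: step 0, B: step 0, A: step 1, B: step 1, B: step 2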
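
Finally, the container-reuse pitfall (project 6) is easy to reproduce: state that escapes the handler survives warm starts but not cold starts, so code that depends on it behaves nondeterministically. A self-contained simulation; the handler mimics AWS Lambda's Python signature, but everything runs locally:

    # Global state lives as long as the container, not as long as one request.
    cache = {}

    def handler(event, context=None):
        """Lambda-style handler that (buggily) assumes cache starts empty."""
        user = event["user"]
        cache[user] = cache.get(user, 0) + 1
        return {"user": user, "count": cache[user]}

    # Warm container: one process serves both requests, so state leaks across.
    print(handler({"user": "alice"}))  # {'user': 'alice', 'count': 1}
    print(handler({"user": "alice"}))  # {'user': 'alice', 'count': 2}

    # On a cold start, the platform instead runs the second request in a fresh
    # container, where the count silently resets to 1. Whether reuse happens is
    # the platform's choice, so code relying on either behavior is buggy.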

About Me

Currently, I'm a PhD candidate at Northeastern University, where I study programming languages and software engineering. I'm advised by Arjun Guha and Jonathan Bell, and I'm a member of the Programming Research Laboratory. Previously, I was a PhD student at UMass Amherst, where I was additionally advised by Yuriy Brun and was a member of the PLASMA lab.