Projects

Bingo: An Interactive System for Ranking Static Analysis Alarms

Because of well-known limitations such as undecidability and the requirement to scale to large real-world codebases, static analysis tools are forced to make approximations, which in turn result in large numbers of false alarms. Bingo extracts a probabilistic model directly from the analysis rules, and thereby associates each alarm with a confidence value and prioritizes true alarms over false positives. It can then gracefully incorporate user feedback into updated confidence values by Bayesian inference. By leveraging human interaction to significantly reduce the alarm inspection burden, we hope to make important progress towards the widespread adoption of static analysis tools by professional programmers.   [PLDI 2018], [MLP 2018], [PLDI 2019]

Drake System Diagram

Code2Inv: Inferring Loop Invariants by Reinforcement Learning

Inferring loop invariants is an archetypal challenge in automatic program verification. Existing systems may be largely described as carefully searching through a space of potential invariants, and aggressively pruning infeasible branches. But programs can be represented as graphs—abstract syntax trees, def-use graphs, etc.—and graph neural networks have previously shown surprising success in various graph search and combinatorial optimization problems. Code2Inv is a system that considers loop invariant synthesis as an instance of reinforcement learning, where the learner plays against an adversarial theorem theorem prover and learns the structure of successful loop invariants. In preliminary experiments, the system learns from significantly fewer calls to the theorem prover, and can tolerate large numbers of confounding variables commonly seen in large software systems.   [NeurIPS 2018 Spotlight], [Code]

Code2Inv System Diagram

SyGuS: Interchange Format for Syntax-Guided Synthesis Problems

To facilitate both the development of application-specific program synthesizers and algorithmic research into new backend synthesis algorithms, we proposed the SyGuS interchange format. We based its design on the observation that program synthesis problems can often be decomposed into a semantic specification describing the input-output behavior of the desired program, and a syntactic specification describing its appearance.   [Proposal], [Format specification], [Repository of solvers and benchmarks]

SyGuS Project Logo
Synthesize a function \(f(x, y)\) of the form \(\texttt{IntExpr}\), where \[ \texttt{IntExpr} ::= 0 \mid 1 \mid x \mid y \mid (+\ \texttt{IntExpr}\ \texttt{IntExpr}) \mid (\texttt{ite}\ \texttt{BoolExpr}\ \texttt{IntExpr}\ \texttt{IntExpr}), \] \[ \texttt{BoolExpr} ::= (\leq\ \texttt{IntExpr}\ \texttt{IntExpr}) \mid (\texttt{and}\ \texttt{BoolExpr}\ \texttt{BoolExpr}), \] and such that for all integers \(x\), \(y\), \(f(x, y) \geq x\) and \(f(x, y) \geq y\).

Difflog: Synthesizing Datalog Programs using Numerical Relaxation

Difflog is a simple extension of the classical logic programming language Datalog to the continuous domain. Most simply, we attach weights to each rule of the Datalog query, and provide a mechanism so that these weights propagate through derivations to the output tuples. Changing the rule weights changes the values of the output tuples, and conversely, query synthesis is the problem of determining the rule weights which maximize conformance between the tuples actually output by the program and the output we would like to see. Datalog program synthesis is an important problem in inductive logic programming, database query synthesis, and data-driven static analysis. Difflog-based program synthesizers demonstrate improved performance and scalability than the state-of-the-art.   [MLP 2018], [IJCAI 2019]

SWIM: Natural Language Code Search

SWIM synthesizes idiomatic snippets of C# code to answer API-related natural language queries such as "Generate MD5 hash code". By not having to explicitly specify type or method names relevant to their query, SWIM allows the programmer to explore the API framework. The patterns of idioms supported are quite general, including constructs such as if statements, and while and for loops.   [ICSE 2016]

SWIM System Diagram

StreamQRE and DReX: Domain-Specific Languages for Stream Processing

StreamQRE is a domain-specific language for efficient stream processing. Queries are modular, have simple, clear semantics, and enable fast one-pass evaluation. It borrows ideas both from quantitative automata and from relational databases. DReX is a related formalism tailored to the domain of string transformation utilities. These are the main software artifacts I developed during my Ph.D.   [StreamQRE], [DReX], [LICS 2014], [POPL 2015], [ESOP 2016], [PLDI 2017], [Thesis]


Last updated: Mon Jan 8 00:15:31 PST 2024