Towards better simplifications in SMT solvers with applications in string solving
- Today, software is used in a wide range of critical systems. Ensuring that these systems are working as intended, secure, and safe is of utmost importance. It is well known that rigorous reasoning about programs can be achieved by encoding their semantics and desired properties as logical formulas and then using Satisfiability Modulo Theories (SMT) solvers to determine the satisfiability of those formulas. While SMT solvers have been employed successfully for those tasks in both industry and academia, making them scale to larger and more complex problems is an ongoing endeavor. An important ingredient in making SMT solvers more scalable is the simplification of formulas. While all state-of-the-art solvers implement aggressive simplification techniques to scale, this aspect of SMT solvers has seen comparatively little attention. In this work, we discuss simplifications along two dimensions: rewrites, which replace terms in a formula with terms in the same logic that are cheaper to reason about, and reductions, which reduce a problem to an equivalent problem in a simpler logic, which can be handled with a less complex solver that is easier to implement. In doing so, we are particularly interested in the theory of strings, which has recently received significant attention. Strings are native types in modern programming languages and widely used but notoriously difficult to reason about due to the complexity of common operations performed on strings. There has been a lot of effort invested in developing dedicated string solvers as a result. As we show in this work, the performance of a modern string solver is strongly dependent on its rewrites and reductions. In this work, we discuss some of the challenges associated with developing good rewrites and reductions and propose approaches to address them. Currently, developers of SMT solvers painstakingly analyze challenging SMT problems to find inspiration for new rewrite rules, manually prove them, and then implement them in programming languages optimized for speed and not correctness. Given the use cases of SMT solvers, wrong answers can be disastrous. Clearly, this situation is suboptimal. In this work, we propose a new domain-specific programming language for expressing rewrite rules in an SMT solver. Using a high-level language enables checking the correctness of rewrite rules. We also propose an approach for automatically discovering missing rewrite rules in a given SMT solver by repurposing existing machinery for Syntax-Guided Synthesis (SyGuS). Using some of the insights gained from our automated rewrite rule discovery, we develop new kinds of rewrites for the theory of strings that reason about high-level abstractions of terms in an effort to avoid or simplify reductions. Reductions cannot always be avoided, so we discuss how we can make them more efficient. Finally, we propose a decision procedure for a fragment of the theory of strings that includes a string to code point conversion. This fragment can be used as an efficient target when reducing certain kinds of constraints. We implement these techniques in the state-of-the-art SMT solver CVC4. We show that implementing missing rewrite rules and performing limited forms of reasoning in the rewriter can halve the number of timeouts on symbolic execution benchmarks, optimizing reductions can reduce the number of timeouts by over $60\%$, and implementing a decision procedure for string to code point conversions can reduce the number of timeouts by more than an order of magnitude on benchmarks with conversions between numbers and strings.
|Type of resource
|electronic resource; remote; computer; online resource
|1 online resource.
|Nöetzli, Andres Philipp
|Engler, Dawson R
|Degree committee member
|Degree committee member
|Engler, Dawson R
|Stanford University, Computer Science Department
|Statement of responsibility
|Submitted to the Computer Science Department.
|Thesis Ph.D. Stanford University 2021.
- © 2021 by Andres Philipp Noetzli
Also listed in
Loading usage metrics...