Skip to content

GSoC 2018 Project Ideas

Quoc-Sang Phan edited this page Mar 24, 2018 · 19 revisions

Project Ideas

Please note that this list is not exclusive. If you have other ideas and topics related to JPF, please let us know on <jpf.gsoc [at] gmail.com> or the JPF Google group.

JPF Infrastructure

JPF Application Domains

Separation Logic

Symbolic Execution

Fuzzing

Smart Contract

Android

Concolic Execution

Environment and Test Case Generation

Projects Descriptions

Upgrading the Build System from Ant to sbt

The goal of this project is to improve the JPF build system. Currently, JPF uses Ant, and this project includes changing the JPF build system to sbt. This also includes bringing the configuration mechanism of JPF under sbt. Currently, the configuration mechanism is part of the core of JPF, jpf-core. The goal is to make this functionally as part of the build system.

Support Java 9 for jpf-core

jpf-core is essentially a JVM that currently supports only Java 8. The goal of this project is to make it up-to-date with new features of Java 9. First, the JPF source itself has to be compatible with Java 9. Second, JPF should support new features of Java 9 bytecode and archives. Among new features of Java 9 are multi-version archives (JAR files) and the ability to link JAR files before they are used by the JVM.

Visualization of Execution Traces v2

JPF is able to find notorious concurrency bugs such as deadlocks. Although finding bugs is one of the major strengths of JPF, providing feedback to the programmer is one of its main weaknesses. For example, for a deadlock JPF provides the programmer at which line each thread is stuck. Although this is of some use, what is much more valuable is to report how each thread got to that point. This project is concerned with extending jpf-visual, a visual analytics tool from GSoC 2017. The web site has about a dozen possible enhancements listed as open issues Some of these may be feasible to implement as a GSoC project. Other ideas for enhancements are of course also welcome.

Model Checking Distributed Java Applications

jpf-nas is an extension of JPF that provides support for model checking distributed multithreaded Java applications. It relies on the multiprocess support included in the JPF core which provides basic functionality to verify the bytecode of distributed applications. jpf-nas supports interprocess communication via TCP sockets by modeling the Java networking package java.net. This tool can handle simple multi-client server applications. Some examples can be found in the jpf-nas distribution (at jpf-nas/src/examples/). The goal of this project is to extend the functionality of jpf-nas in various ways, such as extending the communication model supported by the tool towards an existing open source Java library/framework, called QuickServer, increasing the performance of the tool by improving the mechanism used to manage the state of communication objects, extending the tool with the cache-based approach used in net-iocache, etc.

Verification of Multi Agent Systems

The goal of this project is to develop techniques that analyze key properties in multi-agent systems. The jpf-mas extension will initially provide the ability to generate the reachable state space of Brahms models. The reachable state space can then be encoded into input for a variety of model checkers such as SPIN, NuSMV and PRISM, thereby enabling the verification of LTL, CTL and PCTL properties. The project will also need to investigate how to generate the set of reachable states for other kinds of models, such as Jason models, and how to compose reachable states of different modelling languages both at run-time and off-line.

Synthesis to repair heap-manipulating programs

This project aims to automatically repair a data structure if it is implemented incorrectly, given its specification in separation logic. This project is based on Java StarFinder, a new symbolic execution engine developed in GSoC 2017. Reference:

Verification of unbounded heap-manipulating programs via learning

The goal of this project is to prove (or refute) a Hoare triple {Pre}P{Post}, where the program P contains unbound loops, and loop invariants are not available. We will use Java StarFinder with the precondition {Pre} to generate test inputs for the program. We then execute the program with these inputs, and synthesize loop invariants from program executions using machine learning techniques. The resulted loop invariants will be validated by verification. If an invariant is invalid, we will obtain a counter-example, which is then used as a new test input, and the process repeats. Reference:

Concolic execution with separation logic

Java StarFinder (JSF) currently performs classical symbolic execution on heap-based programs. This project aims to extends JSF with concolic execution. Reference:

Verification of Event-Driven Applications

The goal of this project is to evaluate and/or advance existing state-of-the-art tools for analysis of event-driven applications, for example jpf-awt, jpf-android. Some of the issues that need to be addressed include event sequence generation, search heuristics, and scalability. Other related problems and solutions are welcome.

Refactoring SPF constraint library

SPF constraints need to be refactored to allow different kinds of constraints to be combined during the construction of a path condition. An example of how it should be after the refactoring is the Abstract Syntax Tree constructed by GREEN.

"Higher Order" veritesting

It is well known that one of the limiting factors in symbolic execution is the path explosion problem; complex programs have billions of paths that make exhaustive exploration computationally infeasible. Recently, an approach for dramatically reducing the number of paths was proposed by Brumley in the paper: Enhancing Symbolic Execution with Veritesting, which led to dramatic performance improvements for an x86-assembly-based symbolic execution engine. This engine, run against the Debian Linux distribution, found over 11000 crash bugs and 154 privilege escalation bugs in the distribution, and achieved far greater coverage than previous approaches.

Last year, in Google Summer of Code 2017, we implemented initial support for static regions (similar to Veritesting) in JPF, and we have seen 100x - 1000x speedups on some models, such as the TCAS and the WBS benchmarks. However, on other benchmarks we see little performance improvement because non-local control jumps and object creation limit the applicability of static regions, especially due to the JVM's use of exceptions and the nature of Java programs as opposed to C/C++ programs (e.g., small dynamically-dispatched methods).

In this project, we would dramatically improve the applicability of static regions by examining "partial static regions" that contain most of the paths through a function, but skip paths involving exceptions or object creation. These would be handled by coordinating with SPF by adding a path constraint to SPF to handle the remaining "exceptional" paths. In addition, we will support for "higher order" static regions, in which we inject the region for a function into another region. With these additions, we expect to achieve 100x to 1000x speedups for arbitrary Java programs for symbolic execution.

In addition, we would like to examine the effect of static regions on other aspects of symbolic execution, such as test generation and probabilistic symbolic execution.

Whitebox Fuzzer and Grammar Learner

Build a whitebox fuzzing tool on top of Symbolic PathFinder, that can learn the input grammar of a piece of code in an iterative fashion. The idea would be to first run the code on symbolic input of a fixed length and then learn a possible grammar for this length, at that point extend the length and generalise the grammar. The main research goal behind this project is to see if one can do whitebox fuzzing without a pre-determined seed file (which is the way most whitebox fuzzers work at the moment).

Fuzzing and Symbolic Execution

Develop a fuzzer for Java that can be integrated with SPF (or another Java based symbolic execution engine). The idea would be that when fuzzing gets stuck and makes no progress that the symbolic analysis can create a new seed file to allow analysis to progress.

Comparison between concolic and classical symbolic execution

Comparison between concolic execution, e.g. DEEPSEA and JDart, and classical symbolic execution, e.g. SPF.

Analysis of Android Applications

Various ideas are welcome here. Here are a couple of possible subprojects:

  1. The goal of this project is collecting interesting Android applications, and evaluating and applying JPF-Android to analyze them. JPF-Android is an extension to JPF used to model check Android Applications. Android applications have many dependencies which make them hard to test and verify. They also require events to drive the execution of the applications. The goal of this project is to identify interesting Android applications to run on ]JPF-Android and then evaluate the efficiency and effectiveness of the tool on these apps. You will be using/improving existing approaches to generate stubs and models for applications and then compare the coverage and runtime on JPF-Android to other dynamic analysis tools.

  2. This project includes using JPF-Android to generated test sequences for android applications, and implementing a tool to convert these sequences into tests that can be run on the emulator. JPF-Android verifies Android applications outside of the Android software stack on JPF using a model environment to improve coverage and efficiency. It generates event sequences to drive the execution of the application during exploration. Each sequence also includes the configuration of the environment (device) for which the sequence was executed. This project uses the AndroidViewClient API in Python to run the set of event sequences as detected by JPF-Android on an emulator to find the number of valid sequences and the code coverage they obtain compared to JPF-Android.

Smart Contract Analysis

Develop a mechanism to allow the analysis of Ethereum Virtual Machine (EVM) bytecode by replacing the JVM bytecodes with EVM bytecodes within JPF. The second part of the project would be to extend the bytecodes further to allow symbolic execution as well.

JDart for Dynamic Taint Analysis

JDart is a dynamic symbolic execution engine Java based on Java PathFinder (JPF). The tool executes Java programs with concrete and symbolic values at the same time and records symbolic constraints describing all the decisions along a particular path of the execution. These path constraints are then used to find new paths in the program. Concrete data values for exercising these paths are generated using a constraint solver.

Currently JDart has two mechanisms for marking data values in a program for symbolic analysis. First, JDart analyzes methods during starting symbolic analysis. All parameters of an analyzed method are treated symbolically. Second, special annotations can be used to mark class members that should be analyzed symbolically. This mode of operation is geared towards generating test cases as well as the symbolic summaries of methods.

The goal of this project is to add a new mode of operation to JDart, in which symbolic variables can be introduced dynamically during execution – and also be made purely concrete again. This can be done by introducing symbolic variables for (or removing symbolic annotations from) the return values of specific methods (e.g., Random.nextInt()) – using listeners provided by JPF. This new mode of operation would have two benefits:

  1. Existing programs can be analyzed without modifications. And analysis is not confined to individual methods. This will, e.g., enable JDart to analyze the Java programs in the SVCOMP benchmarks (Injection Flaws).
  2. It is one step towards enabling using JDart for dynamic taint analysis, e.g., for analyzing potential for injection attacks (Minepump). Web-applications tend to contain code of form:
u = Request.getValueOf(“user”); //Taint u (symbolically) 
query = “SELECT user FROM table WHERE uid=”+u; //query becomes tainted
query = santitize(query); //Un-taint query (symbolically.) 
db.select(query);

If the value of u is not properly analyzed and sanitized, an attacker can exploit this code to gain access to arbitrary information in the applications database. Making the return value of Request.getValueOf(“user”) symbolic, is the first step towards tracing taint of this value.

Please note: For a full dynamic taint analysis, JDart would also have to be extended to analyze strings and arrays symbolically --- this is out of the scope of this project.

New Features for JDart

JDart is an open-source, dynamic symbolic analysis framework built on Java PathFinder. It has been applied to industrial scale software, including complex NASA systems.

We have many ideas for improving JDart and welcome additional ideas too:

  1. JDart supports various (fixed-size) symbolic data structures, e.g., arrays and HashMap. It would be interesting to improve this, in particular with symbolic array indexes and support for unbounded data structures possibly with Lazy Initialization.
  2. Add more exploration strategies to JDart. In particular, new exploration heuristics could be interesting, e.g., for targeted concolic execution. It could also combine other static analyses, such as, program slicing, where the computed slice can be used to constrain the exploration.
  3. JConstraints is a solver abstraction layer used by JDart to interact transparently with SMT solvers. JConstraints has support for some solvers, e.g., Z3 and SMTinterpol. Often, however, one needs to select a solver that is best suited for the constraints generated (e.g., linear, non-linear). It could be very useful adding additional solvers to JDart based on JConstraints and evaluate them to understand better their applicability. This could also comprise adding a general interface to solvers that support the SMTLib format.

Concolic Execution for Android Apps

JDart is an open-source, dynamic symbolic analysis framework built on Java PathFinder. It has been applied to industrial scale software, including complex NASA systems. This project seeks to extend this capability to Android applications by supporting the Dalvik instruction set, e.g., by using jpf-pathdroid. This would enable analyses build on JDart to also work for Android, e.g., automated test case generation, finding bugs, and program understanding.

Support for parallel or distributed exploration in JDart and Regression tests for JDart

Here is an incomplete list of ideas and projects for extending and improving JDart. If you would like to work on any of the projects or have your own ideas (or just want to contribute to JDart), let us know and we can talk more.

  • Add regression test suite
  • Add visualization capabilities to JDart. This could be really useful -- especially for program understanding and debugging. It could be as simple as translating the constraints tree to DOT, but it would be a great feature to have more powerful, interactive visualizations, e.g., browser-based with d3, or using Gephi, yEd, jung, prefuse, or jgraph.
  • Finish (or re-implement) JConstraints SMTLib interface and experiment with other solvers. dReal is (partly) integrated using this, but it would be interesting to experiment more
  • Add more search heuristics and evaluate
  • Make support for PathDroid. This would allow JDart to analyze Dalvik bytecode programs
  • Work on constraints caching in a similar fashion as Green (or make support for Green)
  • Add support for parallel/distributed exploration
  • Add support for symbolic data structures. Maybe ala Lazy Initialization
  • Improve test suite generation from symbolic analysis. Currently, static target methods are fully supported while instance methods are only partly supported
  • General refactoring and code improvement

Environment and Test Case Generation for Specific Domains

When model checking applications belonging to specific domains (e.g., Swing, Android), JPF users have to provide application environment, consisting of test drivers and models for libraries that are too complex for JPF to run. The goal of this project is to evaluate the existing (or provide new) semi-automated support for generation of test drivers and library models/stubs based on the results of domain-specific static analysis, specifications, run-time information, or other suitable techniques. Once generated, such drivers and stubs can be used to verify applications belonging to specific domains using appropriate jpf extensions (e.g., jpf-awt, jpf-android). The project can be implemented on top of OCSEGen or another suitable tool.

Clone this wiki locally