An Adaptive Pipeline From Scientific Data to Models

Under DARPA’s Synergistic Discovery and Design program, a team of scientists from Duke, Rutgers, Montana State, and Florida Atlantic Universities, together with Geometric Data Analytics and Netrias, Inc., researched and developed data-driven techniques for scientific discovery and robust design, demonstrating feasibility through program challenge problems in Yeast States, Novel Chassis, Protein Stability, and Perovskite.

Figure 1. Flow diagram of integrating Haase Lab Aquarium into the SD2 Infrastructure.

The Duke Team, composed of scientists from Duke University, Rutgers The State University of New Jersey (Rutgers), Montana State University, Florida Atlantic University, Geometric Data Analytics (GDA), and Netrias, Inc., has worked broadly within the Defense Advanced Research Projects Agency (DARPA) Synergistic Discovery and Design (SD2) program, contributing to efforts in the Yeast States, Novel Chassis, Protein Stability, and Perovskite challenge problems (CP).

The SD2 program was structured across five technical areas (TAs): TA1, Data-Centric Scientific Discovery; TA2, Design in the Context of Uncertainty; TA3, Hypothesis and Design Evaluation; TA4, Data and Analysis Hub; and TA5, Challenge Problem Integrator. The Duke Team supported TA1 and TA3 capabilities. Early in the program, it became clear that some of the data needed to achieve the goals of this effort could not be produced by the automated and semi-automated TA3 laboratories. To merge the data collected in the Duke Team’s benchtop laboratory with data from the automated labs, the benchtop lab had to adopt the same approaches to protocol execution and data collection that the TA3 labs used. In collaboration with the University of Washington (UW) Biofab team, Aquarium for the benchtop was developed, enabling the Duke Team’s benchtop lab to execute protocols and collect data that, when delivered to the database, were indistinguishable from data collected at the automated TA3 labs.

Before SD2, circuit design in synthetic biology focused on proposed function. Findings early in the program indicated that synthetic genetic circuits did not perform their functions consistently across varying growth conditions, which likely alter the parameters under which a circuit operates. At that time, there were no tools for designing circuits that took into account both function and robustness to growth conditions. Robust designs matter in practice because circuits must perform their functions under the varying conditions of a deployment, even in highly controlled settings such as a fermenter.

Dynamic Signatures Generated by Regulatory Networks (DSGRN) was developed as a design tool that could computationally assess the robustness of a particular circuit topology across parameter conditions. Given circuit performance data, DSGRN could also infer the mode of failure of a circuit so that it might be “repaired”.
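
The idea of scoring a design by how much of parameter space it works in can be illustrated with a small sketch. This is not DSGRN's algorithm or API: it swaps in a brute-force grid search over a toy Hill-function model of a NOR gate, and every name, input level, and threshold in it (nor_output, is_nor, robustness, low, high, threshold) is a hypothetical choice made for illustration only.

```python
from itertools import product


def hill_repression(x, K, n):
    """Fraction of maximal expression remaining at repressor level x."""
    return 1.0 / (1.0 + (x / K) ** n)


def nor_output(a, b, K, n, beta):
    """Steady-state output of a toy NOR gate in which both inputs repress."""
    return beta * hill_repression(a, K, n) * hill_repression(b, K, n)


def is_nor(K, n, beta, low=0.1, high=10.0, threshold=1.0):
    """True if the output exceeds the threshold only when both inputs are low.

    low, high, and threshold are assumed values, not measured quantities.
    """
    for a, b in product((low, high), repeat=2):
        expected_on = (a == low and b == low)
        observed_on = nor_output(a, b, K, n, beta) > threshold
        if expected_on != observed_on:
            return False
    return True


def robustness(Ks, ns, betas):
    """Fraction of grid points (K, n, beta) at which the gate behaves as NOR."""
    grid = list(product(Ks, ns, betas))
    passing = sum(is_nor(K, n, beta) for K, n, beta in grid)
    return passing / len(grid)


if __name__ == "__main__":
    # Hypothetical parameter ranges; a real study would take these from data.
    score = robustness(Ks=[0.5, 1.0, 2.0, 100.0],
                       ns=[1, 2, 4],
                       betas=[0.5, 2.0, 5.0])
    print(f"robustness score: {score:.2f}")
```

A score near 1 means the toy gate keeps its truth table across most of the sampled parameters, while a score near 0 means it works only in a narrow region. Grid sampling is used here only because it is easy to read; it scales poorly as the number of parameters grows.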

DSGRN has been wrapped with extensive tooling to improve the design process, especially for logic circuits with arbitrary biological parts. Multiple design problems were tackled in Yeast States and Novel Chassis, including the comprehensive analysis of all three-node networks for bistability, the redesign of 2-input logic circuits for enhanced robustness, the redesign of a 3-input logic circuit for glitch removal, and the analysis of experimental data from an external Department of Defense (DoD) partner. The design process was reduced from months to days, with 50 to 100 percent qualitative agreement with data where data were available.

Several SD2 achievements facilitated this work. An easy-to-use design tool was created that predicts the robustness of logic circuit designs given user-supplied experimental constraints and network functionality requirements. The concepts of robustness of performance, incorporating design parameters, neighboring parameters, and continuation in Hill function models, were improved. Concrete connections between DSGRN parameters were developed, along with build constraints that are easily communicated to experimentalists. The number of DSGRN-computable network topologies was increased to include more complex networks (more in-edges, multiple edges between nodes, self-repressors, no in-edges, no out-edges, non-monotone interactions).

This work was performed by Steven B. Haase, Ph.D., Duke University, for the Air Force Research Laboratory. For more information, download the Technical Support Package (free white paper) below.



This Brief includes a Technical Support Package (TSP).
An Adaptive Pipeline From Scientific Data to Models (reference AFRL-2023059) is currently available for download from the TSP library.
