1. The Molecular Programming Project The California Institute of Technology (Caltech) The University of Washington, Seattle (UW) Paul Rothemund Shuki Bruck Niles Pierce Eric Klavins Richard Murray LEADERSHIP Erik Winfree (PI) “ Creating the theory and practice of programming molecular systems” The MPP The Art of Molecular Programming 1962-2008 2008-20?? MPP 1 Biology Chemistry Nanotechnology Computer Science MPP
2. A tale of two technologies Size: 25 m, 30 tons vs. 20 m, 50 tons. Smarts: multiple CPUs vs. 7 kg brain. Resolution: 45 nm in chips vs. 0.3 nm everywhere. Complexity: 10^6 parts, 10^10 transistors vs. 10^17 cells, 10^27 proteins. Construction: built in factory vs. growth algorithm. Specification: CAD files vs. genetic program. ? MPP 2 Molecular Programming
3. Programmable molecular subroutines MPP 3 DNA circuits DNA tiles 50 nm DNA walkers circuits self-assembly folding biology DNA origami 100 nm MPP team achievements dynamics
4. Growth of design complexity in DNA nanotechnology and DNA computing [Plot: complexity (nt, log scale from 10 to 10000) versus time (1980 to 2010), starting from DNA 4-arm junctions (Seeman, 1982); annotations: "doubles every 3 years", "conceptual advances"] MPP 4
5.
6. Models of Computation folding self-assembly circuits dynamics MPP 6 Molecular program : a sequence of beads on a string, bond energies System state : a path on a square lattice System energy : sum of matching bond energies G = -3 Execution : flip moves Output : a finite shape
7. Models of Computation Molecular program : a set of tiles with attachment types and strengths System state : an assembly of tiles System energy : sum of matching bond energies G = -3 Input : an initial assembly Output : an extended structure MPP 7 Execution : attachment of matching tiles folding self-assembly circuits dynamics [Figure: tile set and assembly with tiles labeled 0, 1, and B]
8. Models of Computation folding self-assembly circuits dynamics Molecular program : a set of tiles with attachment types and strengths System state : an assembly of tiles System energy : sum of matching bond energies G = -3 Execution : attachment of matching tiles Output : an extended structure MPP 8 Input : an initial assembly [Figure: counter tile set and assembly with tiles labeled c, n, 0, 1, and B]
9. Models of Computation folding self-assembly circuits dynamics Theory of Computation by Algorithmic Self-Assembly Turing universal for computation (Winfree, 1996) Program-size complexity (Rothemund & Winfree, 2000) Time complexity (Adleman, Cheng, Goel, Huang, 2001) Error-correction & fault-tolerance (Chen & Goel, 2004) Self-healing (Winfree, 2006) Graph grammars and rule synthesis (Klavins & Ghrist, 2006) Turing universal for construction (Soloveichik & Winfree, 2007) MPP 9
10. Models of Computation Molecular program : a set of formal chemical reaction steps System state : concentrations or counts of species System energy : chemical free energy Input : amount of input species Output : amount of output species MPP 10 Execution : chemical kinetics folding self-assembly circuits dynamics
11. Models of Computation folding self-assembly circuits dynamics Theory of Computation by Chemical Reaction Networks Digital logic circuits (Magnasco, 1997) Space-bounded Turing machines (Angluin, Aspnes, Eisenstat, 2007) Turing universal (Soloveichik, Cook, Winfree, Bruck, 2008) Formal machines and semantics (Cardelli, 2008) Time complexity? Linear systems & signal processing? Error-correction & fault-tolerance? Programming stochastic behavior? Reaction-diffusion and spatial organization? MPP 11
12. Models of Computation folding self-assembly circuits dynamics Molecular program : a set of units with attachment and detachment rules System state : an assembly of units and port states Execution : attachment and detachment of applicable units Output : a reconfigured structure MPP 12 Input : an initial assembly
13. Design process: DNA origami P. W. K. Rothemund, Nature, 440: 297-302 (2006) MPP 13 100 nm
14. Design process: algorithmic self-assembly K. Fujibayashi, R. Hariadi, S. H. Park, E. Winfree, S. Murata (Nano Letters, 2008) MPP 14 100 nm
15. Design process: DNA gate circuits G. Seelig, D. Soloveichik, D. Y. Zhang, E. Winfree, Science, 314: 1585-1588 (2006) MPP 15
16. Design process: self-assembly & disassembly P. Yin, H. M. T. Choi, C. R. Calvert, N. A. Pierce, Nature, 451: 318-322 (2008) MPP 16
23. Enabling Diverse Applications Genome-based manufacturing of inanimate objects Embedding systematic programmable molecular subsystems within biological, chemical, and nanotechnology systems. Programmable therapies Molecular instrumentation for probing cellular processes MPP 23 Circuits for detection & analysis of features within in situ samples. Circuits for diagnosis & response to diseases in living cells. Algorithms for growth of complex materials and structures. DNA patterned scaffold
24. Outreach, Knowledge Transfer, and Education NUPACK software On-line: thermodynamic analysis sequence design In development: kinetics simulations energy landscapes compiler tools Courses Textbooks Workshops Boot camps UG research (~60 in 5 years) Science-inspired Art Paintings by Ann Erpino. MOMA exhibit. MPP 24 K-12 visiting days Giving Pasadena & Seattle public school kids a personal view of science & higher education. Pasadena Seattle
25. Expertise and Project Management Paul Rothemund BE, CNS,CS, Caltech MacArthur Fellow DNA computing DNA origami materials science chemistry Shuki Bruck CNS, EE, Caltech IST founding director distributed systems circuit complexity fault tolerance stochastic chemistry Niles Pierce ACM, BE, Caltech Bioengineering chair numerical methods sequence design DNA engineering biomedicine Eric Klavins EE, UW NSF CAREER control theory robotics formal languages synthetic biology Richard Murray CDS, ME, Caltech IST director control theory robotics distributed systems synthetic biology LEADERSHIP Erik Winfree (PI) CS, CNS, BE, Caltech MacArthur Fellow DNA computing DNA self-assembly biochemical circuits theory of computation STUDENTS AND POSTDOCS Simple flat structure, encouraging independence and exploration: MPP pool of ~7 undergraduates, ~10 graduate students, ~2 postdocs, ~1 visiting scholar (UW + Caltech) other funding: ~5 undergraduates, ~14 graduate students, ~5 postdocs involved with MPP areas Monthly joint (cyber) group meetings, annual joint meeting at Caltech MPP 25
[0 min] Thank you for giving me the opportunity to present our vision for the Molecular Programming Project. The Molecular Programming Project aims to establish a foundation for the theory and practice of programming molecular systems. Our dream is to do for molecular systems what Knuth and his generation did for electronic computers. We believe that biological, chemical, and nanotechnological engineering have advanced to the point where an influx of ideas from computer science will be transformative; conversely, the attempt to understand information processing in these substrates will bring new questions and challenges to computer science.
[30 sec] Let me begin by considering two technologies, one engineered and one evolved. Here are two systems built for underwater locomotion, approximately the same size and weight. Both have embedded information processors that control their behavior. But the biological system is constructed at atomic resolution and has a complexity that dwarfs that of the human-engineered system. Furthermore, the biological system is more profoundly an informational system: rather than being built in a factory, the entire structure is grown from a single cell, an egg, according to the specifications of the developmental program encoded in its genome. Thus, biology is a proof of principle that molecular algorithms can guide the fabrication of complex systems and can dictate the response of a system to its environment. How can we create technologies that embed computation at the finest scale, controlling individual molecular events? Such technologies would fall somewhere between biology and conventional engineering. This is the realm that our Expedition aims to explore. Molecular programming will enable us to create novel molecular systems that can be embedded both within manufactured devices and within biological systems -- much as electronic microprocessors are now embedded within a wide range of electro-mechanical devices. Knuth identified subroutines such as sorting, searching, arithmetic, and enumeration as some of the fundamental building blocks for computer programs. What are the fundamental building blocks for molecular algorithms?
[2:00] We can take our inspiration from biology. Molecular programs in biology start with information in a polymer -- DNA and proteins -- and that information directs processes such as polymer folding to make devices, self-assembly to make extended structures, biochemical circuits to process signals and make decisions, and molecular motors to move things from place to place. Prior work by the MPP team has shown that it is possible to design, from scratch, DNA systems that achieve each of these functionalities: DNA origami for folding, DNA tiles for self-assembly, DNA circuits for signal processing and logic, and DNA motors for dynamics. Does this mean that we will soon be able to create, from scratch, molecular systems as complex and sophisticated as a biological cell? We have a long way to go.
[3:00] Let's look at the history of DNA nanotechnology and DNA computing. For a selection of prominent works, I've plotted the year versus the complexity of the system, as measured by the total length of the DNA molecules synthesized. The field started in the early 1980's with Ned Seeman's pioneering work, which involved just 32 bases of DNA. Developments including DNA computing, self-assembly, motors, circuits, and origami gave rise to ever increasing complexity. In fact, considering the most complex systems created, we see a doubling of complexity roughly every 3 years. If this trend continues, in 20 years we'll be designing, from scratch, systems containing upwards of 4 million nucleotides -- roughly the size of the E. coli bacterial genome. But there's a problem. The systems are already too complex to design by hand, and if we aim to create systems with complex programmable dynamical behaviors, we'll need a systematic approach. We need to turn this into computer science, into molecular programming.
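The extrapolation above can be checked with a few lines of arithmetic. The sketch below assumes steady doubling every 3 years, as stated in the talk; the function names are illustrative, and the baseline it computes is an inference, since the talk does not state the starting complexity used for the projection.

```python
def growth_factor(years: float, doubling_time_years: float = 3.0) -> float:
    """Multiplicative growth after `years` of steady doubling."""
    return 2.0 ** (years / doubling_time_years)


def required_baseline(target_nt: float, years: float,
                      doubling_time_years: float = 3.0) -> float:
    """Starting complexity (in nucleotides) needed to reach `target_nt`."""
    return target_nt / growth_factor(years, doubling_time_years)


# Twenty years of doubling every 3 years is roughly a 100-fold increase.
factor = growth_factor(20)

# Assumption: the 4-million-nucleotide target is taken from the talk;
# the baseline is derived here, not quoted from the source.
baseline = required_baseline(4e6, 20)
print(f"growth factor: {factor:.1f}, implied baseline: {baseline:.0f} nt")
```

Under these assumptions, reaching 4 million nucleotides in 20 years corresponds to a present-day baseline of a few tens of thousands of nucleotides.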
[4:00] So that brings us to the five main goals of the Molecular Programming Project. The first goal relates to how to specify molecular algorithms at various levels of detail, and how to derive a detailed implementation from an abstract specification. We need programming languages and compilers. The second goal relates to how to think about molecular algorithms, their correctness, their robustness, their efficiency. Not only are the fundamental building blocks of molecular algorithms quite novel, but asynchrony, stochasticity, reversibility, parallelism, and geometry play unusually large roles, requiring new theoretical approaches. The third goal relates to the execution of molecular algorithms by actual molecules in the laboratory. Our entire effort is focussed on making it possible to systematically design molecular systems that can be chemically synthesized and will work robustly and efficiently. The fourth goal is to demonstrate useful function in real world applications. Finally, our crowning achievement will be to train a generation of molecular programmers -- both the users in the broader community whose work will be empowered by the application of molecular programming techniques, and the researchers who will make the next breakthroughs in molecular programming.
[5:00] To specify a molecular algorithm, we need a model of computation, or, at a higher level, a programming language. Our philosophy of considering systems that can be implemented in the laboratory dictates that we start with models that are fairly close to basic molecular phenomena -- for example, the ones I mentioned earlier. A molecular program for folding is the sequence of information in a polymer string (or sometimes several strings). Folding proceeds by steps that minimize the energy. The output is a folded shape. Beyond the observation that optimal polymer folding is NP-hard, relatively little is known about it as a model of computation.
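As a minimal sketch of the folding model on slide 6, the code below scores one conformation of a bead string on a square lattice. The matching rule is an assumption borrowed from the simple hydrophobic-polar (HP) lattice model: two "H" beads that are lattice neighbors but not adjacent along the string contribute one bond energy. The slide does not specify the actual bond-energy table, so `bond_energy` and the "H"/"P" alphabet are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[int, int]


def is_valid_path(path: List[Point]) -> bool:
    """A state is a self-avoiding walk with unit steps between beads."""
    if len(set(path)) != len(path):
        return False
    return all(abs(x1 - x2) + abs(y1 - y2) == 1
               for (x1, y1), (x2, y2) in zip(path, path[1:]))


def fold_energy(sequence: str, path: List[Point],
                bond_energy: float = -1.0) -> float:
    """Sum bond energies over matching, non-consecutive lattice contacts."""
    if len(sequence) != len(path) or not is_valid_path(path):
        raise ValueError("path must be a self-avoiding walk matching the sequence")
    index_at = {point: i for i, point in enumerate(path)}
    energy = 0.0
    for i, (x, y) in enumerate(path):
        # Look only right and up so that each contact is counted once.
        for neighbor in ((x + 1, y), (x, y + 1)):
            j = index_at.get(neighbor)
            if j is None or abs(i - j) == 1:
                continue  # empty site, or a backbone bond rather than a contact
            if sequence[i] == "H" and sequence[j] == "H":
                energy += bond_energy
    return energy


# A four-bead string: folded into a square, the two end beads touch.
folded = [(0, 0), (1, 0), (1, 1), (0, 1)]
straight = [(0, 0), (1, 0), (2, 0), (3, 0)]
print(fold_energy("HPPH", folded), fold_energy("HPPH", straight))
```

Execution by flip moves would then amount to proposing local changes to `path` and preferring those that lower this energy.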
[5:50] So let's move on to self-assembly. In this case the program is a set of abstract molecules, called tiles, with specified matching rules for how they attach to each other. Starting with an initial assembly, stochastic and asynchronous attachment of tiles according to these local matching rules leads to growth of a potentially large and complex structure.
[6:15] A different program, or a different input, will result in a different structure. For example, this set of tiles counts in binary, and can be used as a subroutine for growing an object of a specified size. The input assembly specifies the initial value for the counter.
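The counting behavior can be sketched abstractly. In the code below, each rule tile reads the bit beneath it and the carry signal arriving from its right, then presents a new bit on top and a carry to its left. The labels "c" (carry) and "n" (no carry) follow the slide, but the exact tile set is a reconstruction and should be read as an assumption. Real tile attachment is asynchronous and stochastic; because each site here admits exactly one matching tile, the final assembly is the same in any attachment order, so the sketch simply fills rows in sequence.

```python
from typing import Dict, List, Tuple

# (bit below, carry from the right) -> (bit on top, carry to the left)
TILES: Dict[Tuple[str, str], Tuple[str, str]] = {
    ("0", "c"): ("1", "n"),  # increment absorbed here
    ("1", "c"): ("0", "c"),  # increment propagates leftward
    ("0", "n"): ("0", "n"),  # copy the bit unchanged
    ("1", "n"): ("1", "n"),
}


def grow_row(below: str) -> str:
    """Grow one row on top of `below`, starting from the right border."""
    carry = "c"  # the right border tile injects an increment signal
    top_bits = []
    for bit in reversed(below):
        top, carry = TILES[(bit, carry)]
        top_bits.append(top)
    return "".join(reversed(top_bits))


def grow_counter(seed: str, rows: int) -> List[str]:
    """Return the seed row followed by `rows` grown rows."""
    assembly = [seed]
    for _ in range(rows):
        assembly.append(grow_row(assembly[-1]))
    return assembly


for row in grow_counter("000", 3):
    print(row)
```

The seed row plays the role of the input assembly: starting from a different seed yields a counter that begins at a different value, and a fixed-width counter wraps around once every bit is 1.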
[6:30] A lot is now known about this model of molecular computation. For example, it is Turing-universal for computation and for construction; the programs can be evaluated with respect to time and program-size complexity measures; and fault-tolerance mechanisms have been studied.
[6:40] However, other molecular phenomena that we need for molecular programming have not been as well explored. For example, to design biochemical circuits, it is natural to treat formal chemical reaction networks as a low-level programming language. Here, the concentrations of specific molecular species represent input and output signals, as well as internal variables.
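To make the idea of a reaction network as a program concrete, here is a minimal sketch of a stochastic simulation in the style of Gillespie's algorithm, operating on molecule counts. The data layout (a reaction as a tuple of reactants, products, and rate constant) and all names are illustrative assumptions, not a format used by the MPP. The example program is the single reaction A + B → C, whose output count of C ends at the minimum of the two input counts.

```python
import math
import random
from typing import Dict, List, Tuple

# A reaction: (reactant stoichiometry, product stoichiometry, rate constant)
Reaction = Tuple[Dict[str, int], Dict[str, int], float]


def propensity(state: Dict[str, int], reactants: Dict[str, int],
               rate: float) -> float:
    """Mass-action propensity for molecule counts."""
    a = rate
    for species, n in reactants.items():
        a *= math.comb(state[species], n)
    return a


def simulate(state: Dict[str, int], reactions: List[Reaction],
             t_end: float, seed: int = 0) -> Dict[str, int]:
    """Run until time t_end or until no reaction can fire."""
    rng = random.Random(seed)
    state = dict(state)  # do not mutate the caller's input
    t = 0.0
    while True:
        weights = [propensity(state, r, k) for r, _, k in reactions]
        total = sum(weights)
        if total == 0:
            break  # no reaction is possible any more
        t += rng.expovariate(total)  # waiting time to the next event
        if t > t_end:
            break
        reactants, products, _ = rng.choices(reactions, weights=weights)[0]
        for species, n in reactants.items():
            state[species] -= n
        for species, n in products.items():
            state[species] = state.get(species, 0) + n
    return state


program = [({"A": 1, "B": 1}, {"C": 1}, 1.0)]  # A + B -> C
final = simulate({"A": 10, "B": 7, "C": 0}, program, t_end=1e9)
print(final)
```

The individual firing times are random, but for this program the final counts are not: the reaction stops only when A or B is exhausted.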
[7:00] It has been well known for many years that chemical circuitry can implement a variety of analog and digital circuit behaviors. Only recently, however, did it come to light that well-mixed chemical reaction networks are Turing-universal. And almost nothing is known about the complexity theory for this model, nor about mechanisms for fighting stochasticity with fault-tolerant architectures, nor about exploiting stochasticity for randomized algorithmic behavior, nor about how to incorporate spatial organization into the model and program it. It will be exciting to explore these frontiers.
[7:30] A very new model, developed by MPP researchers, allows the specification of both attachment and detachment rules. Thus a molecular assembly can be programmed to reconfigure itself. Shown is the program for a molecular walking motor. Very little is known about the expressive power of this model, although there are hints that it may be dramatically more efficient for construction tasks than the previously described passive self-assembly. Each of the models of computation I just described was motivated by the potential for implementing its molecular programs using DNA molecules. An initial aim of the MPP will be to develop domain-specific compilers that automatically compile such molecular programs to DNA implementations. So let's take a look at the state of the art design process, which can be described as an ad hoc computer-assisted process carried out by a human. This is what we'll need to improve upon.
[8:00] For DNA origami, the specification for a target shape is transformed through a series of steps into a set of molecules that guide the folding of a long scaffold strand into the target nanoscale shape. Execution of the folding program consists of commercial synthesis of the DNA molecules (we order them from a company in Texas), followed by mixing them together, heating them up, and cooling them down. Then the output can be visualized by atomic force microscopy -- in this case, 50 billion smiley faces in a few microliters, made in less than an hour.
[8:50] The design process for algorithmic self-assembly follows a similar route, but in this case it starts not with an explicitly represented structure, but rather with a set of local rules for constructing the structure, for example, a cellular automaton. These can be converted into a set of tiles and then to a set of DNA molecules. Execution of the program is as before.
[9:10] DNA digital logic circuits have been designed using a similar process. I'd like to emphasize one point that really highlights why DNA is so compelling as a substrate for exploring molecular programming. To implement a circuit such as this one, each logic element requires a unique molecular species to be designed. Fortunately, the same structural motif -- for example, this three-input AND gate motif -- can be instantiated with almost any choice of sequence for the input and output domains. Thus, wiring up circuits is almost plug-and-play once a set of sequences representing the signals has been determined.
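The plug-and-play point can be illustrated with a toy sketch in which a motif is written once in terms of named domains and then instantiated with different sequences. The motif representation (a flat list of domain names, with a trailing `*` marking the Watson-Crick complement) is a hypothetical simplification and does not reproduce the actual three-input AND gate structure; the example sequences are arbitrary.

```python
from typing import Dict, List

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Watson-Crick reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def instantiate(motif: List[str], domains: Dict[str, str]) -> str:
    """Concatenate the domains of a strand; 'x*' means complement of x."""
    parts = []
    for name in motif:
        if name.endswith("*"):
            parts.append(reverse_complement(domains[name[:-1]]))
        else:
            parts.append(domains[name])
    return "".join(parts)


# Hypothetical strand motif: binds input "in1", then carries output "out".
motif = ["in1*", "out"]

# The same motif instantiated with two different sequence assignments.
print(instantiate(motif, {"in1": "ACCTG", "out": "GGATC"}))
print(instantiate(motif, {"in1": "TTGCA", "out": "CATGA"}))
```

Rewiring a circuit then amounts to changing which signal names appear in each motif, while the sequence assignment for the signals is chosen once.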
[9:50] The most recently developed model for molecular programming, which allows both assembly and disassembly steps, was demonstrated in the laboratory for four molecular programs implementing a range of behaviors from exponential amplification of a signal, to the DNA walker mentioned earlier. While we have a clear route to designing domain-specific compilers for these four models of molecular programming, I do not want to underestimate the difficulties of creating a compiler that works -- one that takes arbitrary programs and outputs DNA molecules that, when executed in the laboratory, can be relied upon to behave according to specification. It will be essential to develop appropriate design rules, fault-tolerant architectures, and robust molecular compensators against manufacturing errors and unknown system parameters. Another grand challenge for our Expedition is creating a general-purpose molecular programming language that allows the user to write programs that integrate molecular folding, self-assembly, circuits, and dynamics -- as well as other molecular programming components that are currently in use or that will be discovered in the future.
[10:30] Let's look at a few of the issues that will arise in the development of a compiler for such a language. Choosing the appropriate functional abstractions for specifying molecular algorithms and data structures will depend upon our development of theories for modularity and composition of designed molecular subsystems, as well as our approach to attaining robust fault-tolerant behavior.
[10:50] The abstract functions then must be represented in terms of components that can be implemented with established molecular mechanisms.
[11:00] In many cases, proper design of the mechanism will require detailed understanding of the device physics. Thankfully, many of the most advanced theories and tools for the analysis of nucleic acid thermodynamics, energy landscapes, and kinetic pathways were developed by MPP researchers.
[11:15] Actually creating sequences involves combinatorial optimization of design constraints, as well as in-silico validation of the design according to the thermodynamic and kinetic models. Again, MPP researchers have created some of the leading tools for DNA sequence design.
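A toy version of constraint-driven sequence design is sketched below. It is not how the MPP tools work: real design relies on thermodynamic and kinetic models, whereas this sketch substitutes three simple stand-in constraints chosen for illustration (GC content between 40% and 60%, no run of more than three identical bases, and a minimum Hamming distance between any two sequences). All names and thresholds are assumptions.

```python
import random
from typing import List


def gc_fraction(seq: str) -> float:
    return (seq.count("G") + seq.count("C")) / len(seq)


def hamming(a: str, b: str) -> int:
    return sum(x != y for x, y in zip(a, b))


def has_long_run(seq: str, max_run: int = 3) -> bool:
    """True if some base repeats more than `max_run` times in a row."""
    run = 1
    for prev, cur in zip(seq, seq[1:]):
        run = run + 1 if cur == prev else 1
        if run > max_run:
            return True
    return False


def acceptable(candidate: str, chosen: List[str], min_distance: int) -> bool:
    if not 0.4 <= gc_fraction(candidate) <= 0.6:
        return False
    if has_long_run(candidate):
        return False
    return all(hamming(candidate, s) >= min_distance for s in chosen)


def design(count: int, length: int, min_distance: int,
           seed: int = 0, max_tries: int = 100_000) -> List[str]:
    """Randomized search: keep candidates that satisfy every constraint."""
    rng = random.Random(seed)
    chosen: List[str] = []
    for _ in range(max_tries):
        if len(chosen) == count:
            break
        candidate = "".join(rng.choice("ACGT") for _ in range(length))
        if acceptable(candidate, chosen, min_distance):
            chosen.append(candidate)
    if len(chosen) < count:
        raise RuntimeError("constraints too tight for the given budget")
    return chosen


for seq in design(count=5, length=12, min_distance=6):
    print(seq)
```

In-silico validation would follow as a separate step, scoring the chosen set against a physical model rather than these proxy constraints.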
[11:30] After synthesis and sample preparation -- mixing the molecules together -- the execution of the molecular program is autonomous. The growth processes, signal processing, and programmed motion take place by themselves within the test tube. This part is easy. The difficult part of any experiment is the monitoring, characterization, and debugging. How can you figure out what actually happened, when you can't see the molecules?
[12:00] Currently, these steps -- performing the experiments required to know what happened -- are performed manually. This is very time consuming. It is not feasible to characterize and debug large molecular programs this way. For this reason, our Expedition will purchase off-the-shelf laboratory automation systems and high-throughput measurement instruments that can automate the process of performing a systematic series of tests on a molecular system.
[12:20] What are molecular programs good for? Our long-term vision is that embedding molecular sensors, actuators, and, most importantly, programmable control systems within chemical, biological, and nanotechnological systems will be as transformative to those industries as the microprocessor was for the electrical and mechanical industries. Consider these three classes of applications that will be enabled if the Molecular Programming Project is successful. "Genome-based manufacturing" will be technologies in which products, specified by molecularly-encoded information, are grown in vitro as molecular algorithms. In collaboration with physicists and chemists at Caltech, we are already exploring the use of DNA origami and algorithmic self-assembly of DNA tiles to grow scaffolds for carbon-nanotube-based electronic devices and circuits. "Molecular instrumentation" will be biochemical circuitry that queries the chemical state of a sample; thus, in the future, scientific instrumentation will involve not only electrical circuitry in the instrument, but also biochemical circuitry deployed within the sample. In collaboration with biologists at Caltech, we are already developing new methods for in situ biological imaging, where sensitive nucleic acid circuits detect the target pattern and amplify the signal. "Programmable therapies" will place our molecular algorithms within living cells to diagnose and directly respond to diseases. With the help of biologists at UW, we are already testing a prototype "programmable immune system" for fighting viral infections within bacteria, as a model organism. These application areas will rely on advances in chemistry, synthetic biology, and nanotechnology; the contribution of the MPP will be the framework, principles, and design tools for programming the embedded biochemical control circuitry.
[14:20] Molecular programming is a long-term vision, and making this vision happen will involve a whole community of researchers. We have already provided the broader community with on-line software tools for thermodynamic analysis of nucleic acid systems and sequence design, and in the course of the MPP we will expand this to kinetic simulations, analysis of energy landscapes, and an extensive collection of molecular compilers and tools. At the level of educating the next generation of molecular programmers, all MPP faculty are actively developing courses relevant to the project, from the foundations of information science to nucleic acid engineering to synthetic biology and biomolecular computation. It is our vision that in the future, molecular programming will become a standard part of the well-rounded computer science curriculum. At UW, Klavins is already developing a textbook for such a course. MPP faculty have a track record of organizing innovative workshops and summer schools that vitalize the community. We especially look forward to putting together a molecular programming "boot camp", an intensive project-based month-long experience that gives participants the knowledge and skills to design, analyze, and test molecular algorithms, from DNA origami to dynamic circuits. We also have a strong track record in providing cutting-edge research experiences to undergraduates, several of whose projects resulted in published journal articles. We also enjoy exposing the general public to what is going on at the frontier of science. As two recent examples, Rothemund recently had an exhibit of his DNA origami artwork shown at the Museum of Modern Art in New York, and I have collaborated with a surrealist painter to produce a series of paintings inspired by scientific topics. 
To convey the excitement of science to kids at the K-12 level, both Klavins and Pierce have developed visiting programs that bring kids onto the university campus for a day of science experiments and interaction with faculty and students. The feedback we get from these programs is delightful! These programs will be continued and strengthened within the Molecular Programming Project.
[16:45] I've now described what our Expedition aims to do. I'd like to say a few words about how we'll do it. The backgrounds of the six MPP faculty are very strong in the mathematical and computer sciences. This is essential, because the core intellectual challenges that we will be facing relate to fundamental computer science: models of computation, algorithms, and compilers. Together, we cover a broad range of expertise, including circuit complexity, control theory, robotics, languages, applied math, and the theory of computation. On top of this base, all faculty have been actively re-orienting toward biological and chemical systems, and the MPP team now includes three of the leading experimentalists in DNA computing and DNA nanotechnology. We are ready. Success in this Expedition will require courageous, creative, and energetic exploration of challenging concepts and difficult experimental systems. We must encourage independent thinking in our students -- the kind of thinking that was responsible for the breakthroughs that paved the way for this Expedition. Consistent with this philosophy, the MPP will support a pool of students and postdocs who will work with MPP faculty to explore molecular programming with an open mind. Monthly group meetings involving both schools, and an annual retreat at Caltech, will provide us with the opportunity to re-evaluate the challenges and vision for molecular programming and coordinate our efforts.
[18:10] MPP faculty have an excellent track record of recruiting and training the kind of talented student that the Project needs. These are the students and postdocs who laid the foundation for this Project, and who are ready to carry it forward. These young researchers will be the new face of molecular programming.
[18:30] Within the five years of the Project, we envision concrete achievements. Toward our first goal, we expect to develop and release domain-specific compilers for several existing systems, to prototype a general-purpose compiler, and to develop software tools and algorithms for the analysis of molecular systems. Toward the second goal, we expect to find answers to pressing conceptual and theoretical problems, such as fault-tolerance and stochastic behavior, leading to a rich new complexity theory for molecular algorithms. Toward the third goal, we expect to establish experimental techniques for characterizing and debugging complex molecular systems, and to demonstrate our compilers and techniques by creating DNA circuits and structures 100 times larger than current practice. Toward the fourth goal, we expect MPP approaches to enable collaborations with physicists, chemists, and biologists to develop applications in genome-based manufacturing and biological circuits for analysis and therapeutics. Toward the fifth goal, we will train a new generation of molecular programmers by making programming tools widely available to non-experts and by educating a new generation of researchers. To summarize, the Molecular Programming Project is positioned on the interface of computer science and chemistry, biology, and nanotechnology; it will bring computer science principles to the design of complex autonomous molecular systems, and it will bring rich new questions to computer science. Thank you.