Pauling’s models were not merely a visualization tool to help him build intuitions for the molecular configurations of peptides. Rather, his models were precisely machined analog computers that allowed him to empirically evaluate hypotheses at high speed.
Over time, our questions began to veer into the realm of complex systems that are less amenable to analytical modeling, and molecular biology became more and more of an experimental science.
Machine learning tools are only now enabling us to regain the model-driven mode of inquiry we lost during that inflection of complexity.
When Watson and Crick unveiled their first such proposal, Rosalind Franklin highlighted something akin to a software error: the modelers had failed to encode a chemical rule about the balance of charges along the sugar backbone of DNA, and had proposed an impossible structure as a result.
Only when they built a model with homotypic, like-with-like base pairs and found that the resulting “bulges” were incompatible with chemical rules did Watson and Crick realize that heterotypic pairs – our well-known friends A to T, C to G – not only worked structurally, but also confirmed Erwin Chargaff’s experimental ratios [4].
These essential foundations of molecular biology were laid through empirical exploration with evidence-based models, yet that mode of inquiry is rarely found in our modern practice. Rather, we largely develop individual hypotheses based on intuitions and heuristics, then test those hypotheses directly in cumbersome experimental systems.
The inductive bias guiding most experiments was that high-level biological phenomena – heredity, differentiation, development, cell division – could be explained by the action of a relatively small number of molecules.
“John von Neumann […] asked, How does one state a theory of pattern vision? And he said, maybe the thing is that you can’t give a theory of pattern vision – but all you can do is to give a prescription for making a device that will see patterns!
In other words, where a science like physics works in terms of laws, or a science like molecular biology, to now, is stated in terms of mechanisms, maybe now what one has to begin to think of is algorithms. Recipes. Procedures.” – Sydney Brenner [9]
By exploring these representations and model behaviors, we can extract insights similar to those gained from testing atomic configurations with a carefully machined physical model.
One beautiful aspect of this approach is that the learned representations often reveal relationships between the observations that aren’t explicitly called for during training. For instance, our cell type classifier might naturally learn to group similar cell types near one another, revealing something akin to their lineage structure.
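As a concrete sketch of what this looks like in practice, the toy example below trains a small classifier on simulated expression profiles, extracts its penultimate-layer representation, and clusters the per-type centroids. All names and dimensions here (`X`, `y`, the layer sizes) are illustrative assumptions, not a reference to any particular published model.

```python
# A minimal sketch of representation exploration, assuming a PyTorch-style
# classifier; the data are simulated stand-ins for expression profiles.
import torch
import torch.nn as nn
from scipy.cluster.hierarchy import linkage

torch.manual_seed(0)
n_types, n_genes = 6, 50
centers = torch.randn(n_types, n_genes)           # one "prototype" per cell type
y = torch.arange(n_types).repeat_interleave(100)  # 100 cells per type
X = centers[y] + 0.5 * torch.randn(len(y), n_genes)

# The model is trained only to predict cell type labels.
model = nn.Sequential(
    nn.Linear(n_genes, 32), nn.ReLU(),
    nn.Linear(32, 8), nn.ReLU(),   # penultimate layer: the learned representation
    nn.Linear(8, n_types),
)
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
loss_fn = nn.CrossEntropyLoss()
for _ in range(300):
    opt.zero_grad()
    loss_fn(model(X), y).backward()
    opt.step()

# Chop off the output layer to read out the learned embeddings.
embed = nn.Sequential(*list(model)[:-1])
with torch.no_grad():
    Z = embed(X)

# Cluster per-type centroids: types the model places close together merge
# early in the tree, hinting at a lineage-like grouping it was never taught.
centroids = torch.stack([Z[y == t].mean(0) for t in range(n_types)])
tree = linkage(centroids.numpy(), method="average")
print(tree)  # each row: (cluster_i, cluster_j, distance, size) for one merge
```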
If we continue to explore the learned representation of our cell type classifier, we can use it to test hypotheses in much the same way Pauling, Crick, and countless others tested structural hypotheses with mechanical tools.
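Continuing the hypothetical setup above, one cheap "model experiment" is to embed a new, unlabeled profile and ask which known types it lands nearest to; the distances act as a quick test of a relatedness hypothesis before any bench work. The specific profile and type indices below are, again, illustrative assumptions.

```python
# Hypothesis: this new profile is more closely related to type 2 than to type 5.
with torch.no_grad():
    query = embed(centers[2] + 0.5 * torch.randn(n_genes)).unsqueeze(0)

# Distance from the query to every cell type centroid in the learned space.
d = torch.cdist(query, centroids).squeeze(0)
print("distances:", d.round(decimals=2).tolist())
print("nearest types:", torch.argsort(d).tolist())  # type 2 ranked first supports the hypothesis
```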
Regardless of how incorrect rules find their way into either type of model, the remedy is the same. Models are tools for hypothesis exploration and generation, and real-world experiments are still required for validation.
The main distinction is how those rules are encoded: specified by hand in a machined physical model, learned from data in a statistical one. Framed that way, the distinction in how the rules are derived is rather small in the grand scheme.