
LEARNING FROM THE TREE OF LIFE

What do protein molecules -- some of the most fundamental components of life -- have in common with the paper clip, the internal combustion engine, or a successful blue-chip stock?

Evolutionary fitness, argues Gustavo Caetano-Anolles, a biochemist at the University of Illinois at Urbana-Champaign. "At some point, someone invented the paper clip. There's a rationale for why the current design is the one we use now, rather than some other. There's a rationale why we have a car and not something different. There's a history linked to this, an inherent entity that defines why a car is what it is, or why a paper clip is what it is."

History, aesthetics, practical design, even random accident all figure into the evolution of the technologies and institutions on which we rely heavily today -- and in each case, fitness determined the outcome of that evolution, just as fitness to perform a particular function determined the structure of the protein molecule. "Fitness is ultimately the element that defines the success of a particular molecule, or of any organism or entity you study," says Caetano-Anolles. "If you start thinking about the different elements we humans use and have generated as inventions, they also have fitness components. Those that are not fit disappear."

A researcher in the Department of Crop Sciences at UIUC, Caetano-Anolles is examining how the evolutionary mechanisms of protein molecules may also have implications for more complex systems and organisms. His research is currently supported by the Technology Research, Education, and Commercialization Center (TRECC), a UIUC program funded by the Office of Naval Research (ONR) and administered by the National Center for Supercomputing Applications (NCSA). TRECC supports innovative research in advanced information technologies and their application for the Navy R&D community.

Protein structures resemble bead necklaces, where the "beads" are amino acids. It is well known that the order and arrangement of the amino acids in the necklace affect the structure of the protein. Furthermore, how the protein folds (what shape the amino acid chain takes when coiled upon itself) is closely tied to the protein's biological function. Thanks to genomics, there is now an enormous amount of data about the sequence and structure of proteins, and it is being used to generate a classification of protein architecture.

Caetano-Anolles likens this effort to that of 18th-century naturalist Carolus Linnaeus, who created a taxonomy of organisms -- a "tree of life" that, over the centuries, came to extend from one-celled plants and other microscopic organisms to enormously complex, highly advanced sentient mammals and is still expanding today. "We go to the deep vents in the ocean and we're still isolating new bacteria, new archaea, new organisms and groups of organisms that don't fit within our previous classifications, so we're still exploring and discovering our world," says Caetano-Anolles. "And with proteins we're doing the same thing -- we're trying to understand just how complex their world is."

Proteins are classified into different groups based on their underlying architecture. The question for Caetano-Anolles, however, is what the logic behind the classification really is -- and whether it is something that could be a useful model in areas other than biology. To answer this question, he is examining how proteins have evolved based on their structure, but he begins not at the bottom, with the amino acids, but at the top, with simple organisms such as bacteria and eukaryotes, and works downward. After classifying the proteins that appear within the genome of a given organism, he builds mathematical hierarchies, or phylogenetic trees, which show the relationships between proteins as evidenced by their architectures. Doing so, Caetano-Anolles explains, allows him to examine how two protein folds might be related to each other -- and possibly to an extinct third protein fold. Protein folds, he says, are especially useful for evolutionary research, because they have changed so little in hundreds of millions of years. "It's an ideal molecule for study," he says. "There's a rationale for that. If something is working very well, nature will try to preserve it and will keep the design static until, perhaps, some big revolution occurs, producing a new design that works better."
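
For readers who want a feel for what building a hierarchy from protein architectures can look like, here is a minimal sketch in Python. It is not Caetano-Anolles's method, which the article does not describe in detail. It assumes a toy table of made-up folds and made-up genomes, measures how differently two folds are distributed across genomes (Jaccard distance), and repeatedly merges the closest pair (average-linkage clustering). All names and data are hypothetical.

```python
from itertools import combinations

# Hypothetical toy data: which (made-up) genomes contain each (made-up) fold.
FOLD_OCCURRENCE = {
    "fold_A": {"g1", "g2", "g3", "g4"},
    "fold_B": {"g1", "g2", "g3"},
    "fold_C": {"g3", "g4"},
    "fold_D": {"g4"},
}


def jaccard_distance(a, b):
    """Return 1 - |a & b| / |a | b| for two sets of genome names."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def leaves(tree):
    """Return the fold names under a tree (a name or a nested 2-tuple)."""
    if isinstance(tree, str):
        return [tree]
    left, right = tree
    return leaves(left) + leaves(right)


def cluster_distance(t1, t2, occurrence):
    """Average pairwise distance between the leaves of two subtrees."""
    pairs = [(x, y) for x in leaves(t1) for y in leaves(t2)]
    total = sum(jaccard_distance(occurrence[x], occurrence[y]) for x, y in pairs)
    return total / len(pairs)


def build_tree(occurrence):
    """Greedy average-linkage clustering; returns the tree as nested tuples."""
    clusters = sorted(occurrence)
    while len(clusters) > 1:
        # Find the two clusters that are closest to each other and merge them.
        a, b = min(
            combinations(clusters, 2),
            key=lambda pair: cluster_distance(pair[0], pair[1], occurrence),
        )
        clusters = [c for c in clusters if c not in (a, b)]
        clusters.append((a, b))
    return clusters[0]


tree = build_tree(FOLD_OCCURRENCE)
print(tree)
```

In this toy table, folds that occur in similar sets of genomes end up as neighbors in the tree. Real phylogenetic analysis uses far richer data and more careful statistical methods; the sketch only illustrates the idea of turning a classification into a hierarchy.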

But what, exactly, is it that determines which proteins succeed, and which ones have been eliminated? And how can the presence of certain proteins in a genome, and not others, help illuminate the evolutionary mechanism of a complex organism? The answer, says Caetano-Anolles, may have to do with RNA (ribonucleic acid), a molecule essential for turning the genetic information encoded in DNA into proteins. Like DNA, RNA consists of nucleotides strung together, but in a single strand rather than a double one. Messenger RNA interacts with the ribosome, a kind of infinitesimal machine within the cell that is made of ribosomal RNA and proteins and actually performs the protein synthesis. "It's really what makes life, basically," says Caetano-Anolles. "It's the central element that turns information into something that works out and produces the complexity that we see, both its structure and all the catalysts that convert chemicals or light into energy and fuel the cell's workings."

It's not hard to draw parallels between cellular structures and mechanisms and computational processes, where DNA resembles a repository of stored, encoded data, and the RNA-encoded ribosomal molecules are essentially the "central script" for a machinery that converts the data into usable output -- here, the crucial proteins. The structure of the ribosomal "script" depends on the function it needs to perform. "If the structure you're studying changes," says Caetano-Anolles, "you can use the information to model the evolution of the molecule."
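
The data-to-output parallel can be made concrete with a small Python sketch. It treats a DNA coding strand as stored data, copies it into messenger RNA, and reads the message three letters at a time to produce a chain of amino acids. The codon assignments shown are a small subset of the standard genetic code; the function names and the example sequence are made up for illustration, and the sketch ignores nearly everything a real ribosome does.

```python
# A small subset of the standard genetic code (mRNA codon -> amino acid).
# None marks a stop codon. Any codon not listed here is rejected.
CODON_TABLE = {
    "AUG": "Met",  # also the usual start codon
    "UUU": "Phe",
    "GGC": "Gly",
    "GCU": "Ala",
    "UAA": None,
    "UAG": None,
    "UGA": None,
}


def transcribe(coding_dna):
    """Copy a DNA coding strand into mRNA by replacing T with U."""
    return coding_dna.upper().replace("T", "U")


def translate(mrna):
    """Read codons from the first AUG until a stop codon or the end."""
    protein = []
    start = mrna.find("AUG")
    if start == -1:
        return protein  # no start codon, so nothing is produced
    for i in range(start, len(mrna) - 2, 3):
        codon = mrna[i:i + 3]
        if codon not in CODON_TABLE:
            raise ValueError(f"codon {codon!r} is not in this partial table")
        residue = CODON_TABLE[codon]
        if residue is None:  # stop codon ends the chain
            break
        protein.append(residue)
    return protein


# Hypothetical example sequence, chosen only to fit the partial table.
example_dna = "ATGTTTGGCGCTTAA"
example_mrna = transcribe(example_dna)
example_protein = translate(example_mrna)
print(example_mrna)
print("-".join(example_protein))
```

Here the DNA string plays the role of the stored data, `transcribe` stands in for the copying step, and `translate` stands in for the ribosome's reading of the message.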

This model is useful, says Caetano-Anolles, because present-day molecules can be used to reconstruct the past and to study trends that indicate future evolutionary development as well. But he also sees broader applications for the model. "We're exploring whether we can devise a system that will act also as a predictive element, looking not only to the past, but using what we know from the past to project into the future," he says. "Are there tools that we can use, based on the knowledge systems of biology that have been in use for a long time, that we can combine with technologies like neural networks and machine learning techniques to predict future events?"

If the answer is yes, Caetano-Anolles believes that the result could be a powerful system that combines modeling, phylogenetic analysis, and new data mining and computational analysis tools to predict outcomes in many different fields, such as engineering, social science, biology, military science, or national security. "That's far removed from what we're doing right now, but the principle is what's important," Caetano-Anolles says. "If you look at human beings as perfect machines -- I think we're more than that, of course, but it's a perfect example of how this process of change has ultimately resulted in these fantastic entities."



Return to September 2004 Newslink Table of Contents
