It was a curious way to die. On farms all over Texas, the same grim scene was playing itself out. Horses were acting in strange ways, walking into objects or staggering about, unable to place one foot in front of the other. Eventually, they showed no interest in eating. Finally, of course, they fell dead. When they were autopsied, veterinary doctors found something that could have come from the latest X-Files episode. Their brains, normally a hard mass of tissue, had turned to liquid. State veterinary authorities were called on to explain to farmers what exactly was happening.
The condition, known as equine leukoencephalomalacia, was well documented. Each year, in fact, it claimed a few horses throughout the vast state of Texas. But this year was different. More than 100 horses, found in more than 60 clusters, had become ill and died. The cause, the authorities knew, was fumonisins, deadly toxins produced by a corn mold called Fusarium moniliforme, which grew on corn that had been harvested while wet or improperly stored.
Fumonisins, long considered a minor hazard of raising livestock, were also known to kill pigs, and researchers suspected a link to several diseases in other animals. So Texas farmers were advised to test their corn-based feed for the presence of fumonisins and make sure it was stored in a dry place for no longer than two weeks. Beyond that, there was little they could do.
It was Texas in 1989, and all that was true. Less than a decade later, a few hundred miles to the north, in Des Moines, Iowa, researchers at the large agritech company Pioneer Hi-Bred International wondered if a new computer-driven discipline called bioinformatics could come up with a way to control the toxin. To pursue that research, they partnered with a New Haven, Conn.-based biotech company called CuraGen, whose cutting-edge database-searching tools could scour millions of genetic sequences in a few seconds and possibly identify the genetic structures of proteins that appeared to inhibit the growth of the fumonisins.
The Pioneer Hi-Bred research team, led by Susan Martino-Catt, a research coordinator in the genomic division in Johnston, Iowa, had already identified a black yeast that seemed to contribute to the degradation of fumonisins.
Martino-Catt's team extracted RNA from the yeast and sent samples to the CuraGen lab half a continent away. There, the RNA samples were run through a battery of genomic analyses, every one of which was controlled and studied by the scientists back in Iowa.
Just a couple of years ago, the fumonisin research could have easily taken two years in the lab. But last year, it took only 12 weeks to identify all of the genes in the enzymes that appear to break down the toxin. Much of that speed can be attributed to CuraGen's complex algorithmic genomics tools. And much of it can be attributed to the small biotech company's imaginative use of the Internet, which CuraGen uses to give research partners access to tools and databases.
In basic terms, CuraGen uses the Internet in the same way financial planners do: to deliver information, in the form of bioinformatic databases, and tools to help clients wring meaning out of that information. But the databases that CuraGen offers, and the tools that it leases, are many thousands of times more complex than those found on most financial Web sites. And while CuraGen's tools, like those on financial sites, are intended to make many people very rich, they have a higher purpose as well. CuraGen is in the business of mining genomic data to make drugs that treat and prevent diseases such as cancer.
CuraGen's genomic quest is not unique. For the past decade, thousands of researchers at hundreds of institutes, universities and commercial businesses have been racing to identify and characterize the genomes of many organisms, including the 80,000 to 100,000 human genes. The product of that effort is several genomics databases, some of which are available to researchers at no cost, and some of which are proprietary, sold to researchers for cash or a percentage of royalties from any drugs that may result from the research.
CuraGen's database is a combination of the two: The company sells its database to pharmaceutical companies hoping to shave a few tens of millions of dollars off the current $300 million to $500 million cost of developing a new drug, and it gives away the database to academic researchers whose credentials are sufficiently impressive. In all, genomics researchers believe, they have identified meaningful sequences for as much as 95 per cent of the human genome.
Scientists at CuraGen have also identified more than 60,000 previously unknown SNPs, or single nucleotide polymorphisms: genetic mistakes, if you will, and the potential causes of disease.
Founded six years ago by a 36-year-old scientist named Jonathan M. Rothberg, CuraGen now employs more than 300 people. It is one of many companies that hope to develop drugs based on an algorithmic vetting of genetic sequences and one of several young companies that essentially rent bioinformatic tools to large pharmaceutical companies. CuraGen's business plan calls for three streams of revenue: one from discovering and developing its own drugs, one from leasing its tools and databases to partner drug developers, and one from royalties from drugs developed by partners.
Although the company's technology perches on biotech's bleeding edge, CuraGen has signed multimillion-dollar deals with major pharmaceutical companies ("big pharmas" in biotechspeak) such as Biogen, E.I. du Pont de Nemours and Co., Genentech, Glaxo Wellcome and F. Hoffmann-La Roche Ltd. There is good reason why.
"Developing a drug can cost a million dollars a day," says Jon Soderstrom, director of cooperative research at Yale University. "If you can knock just one day off, you save a million dollars." Soderstrom says that there are several companies selling their databases, and there will no doubt be more-all of them banking on the need of the "big pharmas" to know all there is to know. "Each one of the databases gives scientists a different view on how to plumb the [genetic] depths," says Soderstrom. "And anything that can take time out of the development process is worth the cost to their companies." And while that cost is significant-the Glaxo Wellcome deal was worth $48 million-it is less than it would be without Web technology. When CuraGen researchers collaborate with those at Glaxo Wellcome, for example, their colleagues are located in Research Triangle Park, N.C., in Verona, Italy, and in the United Kingdom. When they work with F. Hoffman-La Roche, their colleagues are in Palo Alto, Calif., in Nutley, N.J., in Basel, Switzerland, and in Heidelberg, Germany. When they work with Genentech, their partners are in San Francisco, and when they work with Biogen, their fellow researchers are in Cambridge, Mass. Yet, except for occasional visits from a few CuraGen staffers, most of the research staff never has to leave New Haven.
"We coordinate everything through the Internet and through phone conferences," says Martin Leach, CuraGen's acting director of bioinformatics. "But, while the Internet is very useful, it can do only 80 per cent or 90 per cent of what we have to do. So we get together with partners every three or four months." THE WEB-BASED ACCESS IS CONVENIENT and inexpensive. Still, some partners worry that their password-protected work areas are vulnerable to snooping competitors. In such cases, says Leach, CuraGen offers the greater security of a dedicated line. Leach himself, however, believes that the SSL encryption used by CuraGen is perfectly sound. It is, he points out, the same encryption method widely used to transmit credit card information.
Leach is a kind of professional hybrid, one part biologist, one part IT geek.
As early as 1993, before the emergence of browsers, he would join dozens of other researchers logging in from all over the world in online discussions of the implications of certain gene sequences. "We would hold international meetings," says Leach, then a student at the Boston University School of Medicine. "We would take scientific papers, turn them into ASCII and have an online conference." In 1996 Leach was hired by CuraGen to act as a liaison between the biologists and the software designers. When he arrived, he was both thrilled and dismayed.
"I looked around the laboratory," he says. "I saw what the technology could do and thought, 'Damn, I could have done my thesis in three weeks.' As a biologist, it was demoralizing. It's possible that research that would have taken 10 or 15 years could be generated here in six weeks." Working with Director of Bioinformatics James Knight and others on the bioinformatics team, Leach built an informatics infrastructure that advanced the development of CuraGen's high-throughput genomics workflow process. "What it does," says Leach, "is allow us to take a lump of tissue and run the output through thousands and thousands of gene sequences, some new, some that we've seen before." In addition to building the tools themselves, Leach and Knight were charged with figuring out how to deliver those tools to researchers not just at CuraGen, but at other companies that might find them useful. Back in those prebrowser days, the challenge was not inconsiderable, and the CuraGen team set out to build applications for tracking and processing data that would run on both PCs and Macs.
"When we first started the process, we were terrified," says CEO Jonathan Rothberg. "We were thinking about how we were going to support the different heterogeneous environments. We thought we'd need 40 people just to work on that alone," he admits. "We also thought we would have to have special leased lines to call in. Then the Internet came along, and it was perfect timing," he says.
"Our first collaborations began in '96, and the tools were newly available. We installed Oracle, put the servers up, put the Web browsers up, and we were in business." Rothberg says the Internet made it possible to build a distribution system that otherwise would have been so expensive that his company would have had to seek venture capital money. Web technology freed his staff to concentrate on the development of genomics tools, a competitive arena where time to market is crucial.
Consequently, CuraGen's system evolved more quickly from a workflow system to what Leach calls a workflow discovery system.
"Imagine a very well-organized laboratory in which you are using Excel spreadsheets to track what is going on," says Leach. "That's how it got started. Then those Excel spreadsheets were automated and put into an Oracle database so that you could build them dynamically on the fly or bring them up and modify them. That's what evolved into tools for processing data. So with what we have now, you can take that data and ask questions, such as 'How many secreted proteins are different or similar between two samples?' or 'How many metabolic enzymes are involved in a particular degradations pathway?'" When a company comes in with a new experiment, Leach says, all the data and queries must be formatted so that researchers on both ends of the Internet connection end up talking about the same things. CuraGen sends the client a set of bar codes and standard operating procedures for processing the tissue samples. The physical samples, such as Pioneer Hi-Bred's black yeast RNA, are sent to CuraGen with bar codes. Working with the discovery scientists, the client fills in the data online, literally filling in blanks on CuraGen Web pages. Scientists at CuraGen and at the client company then work together online, adding notes and comments. CuraGen enters a study statement that explains the objectives of the experiment, and the work gets underway.
"The first step is usually to take a lump of meat-the tissue-and turn it into single cells," says Leach. "Then we take the single cells and isolate the genetic material inside. That genetic material gets processed in various ways to generate the data we need. Then we label it with fluorescent tags so that we can read a snapshot of what's inside the cells on a machine. We developed that architecture so that given a DNA or a protein sequence, it can put it all in the form that is needed and will have links to other relevant information." The goal, says Leach, is often to identify disease genes, whose proteins can serve as drug targets. Those drug targets can be put into experiments that mimic the disease, and then the drug targets can be screened against large numbers of molecular compounds in the hope of revealing some beneficial interaction. Eventually, says Martin, those drugs that appear to interact with the disease can go to preclinical tests in animals.
The catch, says Leach, is that the final stages of that process are, at this point, largely theoretical. The proteins built from Pioneer Hi-Bred's promising genetic sequences, for example, have yet to prove that they can prevent fumonisin from growing in corn. Researchers at Pioneer Hi-Bred are now taking the next steps: turning those genes into proteins, mixing those proteins with the toxins and injecting the toxin into horses. If the horses remain healthy, Pioneer Hi-Bred will try to engineer the genes into a new fumonisin-resistant strain of corn. There is a certain degree of faith involved, but there is also a lot of promising science. Enough, at least, to persuade the big pharmas to bet hundreds of millions of dollars that bioinformatics is the cornerstone of 21st-century medicine.