Digitally Grounded Scientific Milestones

April 09, 2022

Venter broke the news on May 24, 1995, at a session of the 95th General Meeting of the American Society for Microbiology in Washington, D.C. It was a classic performance, particularly in two respects.

First, not content with amazing his scientific peers even more than he had with his EST paper, “he also took obvious pleasure in noting that NIH had refused to provide federal funds for the effort,” as a report in Science pointed out. Despite this, Francis Collins had enough practice at the sound-bite game to come up with two variants on a theme, calling Venter’s work “a remarkable milestone” when interviewed by Science and “a significant milestone” when interviewed by The New York Times.

The second notable feature of Venter’s speech was his parting comment: his method, which he dubbed the “whole-genome shotgun” approach, had worked so well and so fast that his team had sequenced not just one bacterium in a year, but two. The second was Mycoplasma genitalium, one of the simplest bacteria and one associated with reproductive tract infections. Venter later revealed that “TIGR had a T-shirt that says ‘I ❤ my genitalium’,” a sign of the playful atmosphere that reigned during those heady early days.

The 1995 paper in Science, entitled “Whole-genome random sequencing and assembly of Haemophilus influenzae Rd,” was, as Francis Collins had indicated, truly a milestone. It presented the first complete genome of a free-living organism. With H. influenzae, scientists could for the first time investigate the complete range of digital code (and hence of analogue machinery, in the form of the corresponding proteins) required for life. This mattered not only because it promised detailed knowledge of how cells function, but also because it demonstrated that it was possible, at least for a bacterium, to obtain the complete digital code that runs an organism. Until TIGR’s paper, the possibility remained that some final, unsuspected obstacle would prevent anyone from reading the full chemical text of the program. In a sense, Venter’s work also validated the entire concept of genomics (the study of entire genomes) as distinct from traditional genetics (the study of individual genes). It implicitly marked the start of a new phase in molecular biology, one based on complete digital knowledge of an organism. The long-term effects of this shift will be so profound that future generations will probably struggle to imagine how biology and medicine were ever conducted without genomes.

The paper from Venter and his team gave some details of how the work was carried out. First, it noted the continuity with his earlier EST work: “The computational methods developed to create assemblies from hundreds of thousands of 300- to 500-bp [base pairs] complementary DNA (cDNA) sequences led us to test the hypothesis that segments of DNA several megabases [millions of bases] in size could be sequenced rapidly, accurately and cost-effectively by applying a shotgun sequencing strategy to whole genomes.” As with the EST work, the key to Venter’s success was plenty of powerful technology. According to the paper, 14 ABI sequencing machines, run by eight technicians for three months, produced 23,304 sequence fragments. These were put together with the TIGR Assembler program, running on a SPARCenter 2000 with 512 megabytes of RAM, a huge amount for 1995. Even so, the assembly took 30 hours of central processing unit (CPU) time and produced 210 contigs: unbroken sequences formed from overlapping shotgun fragments. The gaps between these contigs were then closed using a variety of methods to complete the sequence. Because bacterial DNA, unlike human chromosomes, is generally a closed circle, the finished sequence had to join up end to end. The lead author of the Science paper, Robert Fleischmann, recalled the moment when everything came together: “Lo and behold, the two ends joined. I was as stunned as anyone.”
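To make the idea of fragments, overlaps and contigs concrete, here is a toy sketch in Python of greedy overlap assembly, assuming error-free reads taken from a single DNA strand. It is not the TIGR Assembler’s actual algorithm, which the text does not describe; the function names `overlap`, `greedy_assemble` and `end_overlap`, the `min_len` threshold and the sample reads are all hypothetical, chosen for illustration. Fragments are repeatedly merged by their longest suffix/prefix overlap; whatever cannot be merged is left as separate contigs, and `end_overlap` checks whether a finished contig’s two ends overlap, as they must for a circular bacterial chromosome.

```python
from typing import List


def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` equal to a prefix of `b`.

    Overlaps shorter than `min_len` (assumed >= 1) are ignored and reported as 0.
    """
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a[-k:] == b[:k]:
            return k
    return 0


def greedy_assemble(fragments: List[str], min_len: int = 3) -> List[str]:
    """Toy greedy assembly: merge the pair with the largest overlap until
    no overlap >= min_len remains; the leftovers are the contigs."""
    # Drop exact duplicates (keeping order), then fragments contained in another.
    frags = list(dict.fromkeys(fragments))
    frags = [f for f in frags if not any(f != g and f in g for g in frags)]

    while len(frags) > 1:
        best_k, best_i, best_j = 0, -1, -1
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # no overlaps left: gaps separate the remaining contigs
        merged = frags[best_i] + frags[best_j][best_k:]
        frags = [f for idx, f in enumerate(frags) if idx not in (best_i, best_j)]
        frags.append(merged)
    return frags


def end_overlap(contig: str, min_len: int = 3) -> int:
    """Length of the overlap between a contig's end and its own start
    (0 if the two ends do not join, i.e. the sequence looks linear)."""
    for k in range(len(contig) // 2, min_len - 1, -1):
        if contig[-k:] == contig[:k]:
            return k
    return 0


if __name__ == "__main__":
    reads = ["ATTAGACCTG", "AGACCTGCCGG", "GCCGGAATAC"]
    contigs = greedy_assemble(reads)
    print(contigs)                           # ['ATTAGACCTGCCGGAATAC']
    print(end_overlap(contigs[0]))           # 0: this toy sequence is linear
    print(end_overlap("ATTAGACCTGCCATTAG"))  # 5: the two ends join
```

Real assemblers must also cope with sequencing errors, reads from both DNA strands, and repeats longer than a read; this sketch ignores all three, and its all-pairs search would be far too slow for tens of thousands of fragments.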

The final result was a genome 1,830,137 base pairs long, sequenced at an average cost of 48 cents per base pair. This was another breakthrough for Venter. As he remarked at the time: “People thought [that bacterial genomes] were multiyear, multimillion-dollar projects. We’ve shown that it can be done in less than a year and for less than 50 cents per base.” The consequence, he noted, was that “it’s opened the floodgates.” TIGR itself went on to sequence dozens more bacteria (discussed in chapter 14), and others soon followed in its footsteps. But Venter was keenly aware of even broader implications. The Science paper’s peroration suggested various areas where the whole-genome shotgun approach could usefully be applied. And as usual, Venter saved his most provocative thought for the last, throwaway line: “Finally, this strategy has potential to facilitate the sequencing of the human genome.”
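As a rough check on the scale of the saving, multiplying the genome length by the quoted average cost gives the total, assuming the 48-cent figure is averaged per base pair over the whole finished genome:

```latex
1{,}830{,}137~\text{bp} \times \$0.48/\text{bp} \approx \$878{,}466 < \$1~\text{million}
```

On these figures the complete genome cost a little under a million dollars, which is the contrast Venter drew with “multimillion-dollar projects.”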
