Nature Biotechnology February, 1999, by: Richard C. Strohman Professor Emeritus, University of California, Department of Molecular and Cell Biology, 229 Stanley Hall # 3206, Berkeley, CA 94720-3206, tel. (510) 642-4941, fax. (510) 643-9290
Below I describe the Human Genome Project (HGP) in terms of five overlapping phases defined mostly in terms of their biomedical goals. The staging is somewhat arbitrary and is informed mostly by my perception of a steady turnover of ideas having to do with "Program for life", what it means, and where it is located. Initially located in the genome and assumed to be isomorphic with phenotype, the Program is gradually being relocated, through intermediate stages, to the level of the organism itself. Each succeeding stage of HGP is driven by results not anticipated in the prior stage.
While HGP is not yet complete, enough data has been collected from it and from other genome projects (mouse, worm, fly) to provide the following tentative conclusions:
1. There is not sufficient information in genomic data bases to provide explanations for complex functional attributes of cells and organisms.
2. Therefore, there must be other informational systems and operating rules that complement genomic systems.
3. Epigenesis is identified as one such system.
4. Program rules by which regulation is produced are extra-genomic and are most likely to be found not in molecular mechanisms per se but in their integration into complex gene circuits and, more peripherally, into their connectedness with regulatory networks (metabolic and other) of cellular dimensions.
Evidence establishing epigenetic networks as sites of control over cellular phenotype, including control over covalent marking of DNA and chromatin (the phenotype of the genotype) (1), has shifted attention from a narrow focus on DNA to the more complex dynamics of gene circuits and their integration into larger, environmentally open networks of cellular dimensions (2). This change in emphasis from linear "causal" molecules to the regulatory dynamics of molecular networks is recognized by a growing number of molecular biologists and geneticists working within the various genome projects (3) and in biochemistry (4), integrative biology and physiology (5), developmental biology (6), and medical (7) and behavioral genetics (8).
The five stages of HGP are summarized as follows. Stages I (Monogenic causality) and II (Polygenic causality) deal, respectively, with rare monogenic diseases and with polygenic diseases shaped by an unknown complexity of gene number coupled to individual experience and environmental history. However, monogenic diseases account for only a small fraction (2%) (9) of non-infectious diseases. Accordingly, the emphasis has shifted more to Stage II, which attempts to "reduce" to candidate genes or small clusters of key genes the many genes (tens, hundreds, thousands, no one knows) involved in shaping a disease or any other complex phenotype. Gene maps for each physiological function and disease will begin to show the way over the divide between genetic information and functional outcome in cells and organisms. However, as noted previously, these maps mostly assume additive and dominant effects and do not include dynamic rules governing deployment, interaction (epistasis), redundancy (pleiotropy), and connectedness of these genes.
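The limitation of purely additive maps can be made concrete with a toy calculation. The sketch below is only an illustration under stated assumptions: the two loci, the function names, and all effect sizes are hypothetical and are not drawn from any genome data. It shows that an additive two-locus map always has a zero interaction contrast, whereas a map in which the trait requires both variants together does not, so no choice of additive effects can reproduce it.

```python
# Hypothetical two-locus illustration; effect sizes are invented, not measured data.

def additive_phenotype(a, b, baseline=0.0, effect_a=1.0, effect_b=0.5):
    """Each variant (0 = absent, 1 = present) adds a fixed amount to the trait."""
    return baseline + effect_a * a + effect_b * b


def epistatic_phenotype(a, b):
    """The trait appears only when both variants are present together."""
    return 2.0 if (a and b) else 0.0


def interaction_contrast(phenotype):
    """Zero for any purely additive two-locus map; nonzero signals interaction."""
    return phenotype(1, 1) - phenotype(1, 0) - phenotype(0, 1) + phenotype(0, 0)


if __name__ == "__main__":
    print("additive contrast: ", interaction_contrast(additive_phenotype))
    print("epistatic contrast:", interaction_contrast(epistatic_phenotype))
```

With only two loci the interaction is easy to detect; the argument in the text is that with tens to thousands of interacting genes the number of such terms grows far beyond what an additive map records.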
The move to Stage III, proteome analysis (analysis of the entire protein complement of a genome), is an acknowledgment of problematic genomic complexity. It focuses on expressed genes (proteins), thereby avoiding some of the problems of genome size, but it has its own "levels" problems: it continues to rely on a description of large numbers of additive agents (proteins) without recourse to the rules of interaction, redundancy, and connectedness just mentioned.
Transgenic analysis (Stage IV) acknowledges the many problems in the above approaches and will rely on the normal dynamics of the organism in conjunction with gene transfer between species to produce "novel" phenotypes and thereby reveal programmatic aspects of morphogenetic processes. But there are problems here as well (pleiotropic genes and proteins) in which developmental and other higher levels of organization are bracketed and remain unexplored (9). Therefore, with this strategy, while detailed genetic maps for a variety of cellular functions will be established, the nature of the processes being perturbed by gene manipulation remains a black box.
The final stage V, complexity, is the logical and unpredictable extension of IV and represents, perhaps, an entirely different approach where higher levels of cellular organization and regulation impose constraints on the genome, and where genes and environments are inseparably integrated. Epigenetic regulation of the genome (1) is seen as the most proximal of a series of levels or hierarchy of constraints extending outward from DNA structure to the cell boundary and beyond. This stage is now occupied with a description of the molecular events involved in DNA and chromatin marking and seeks to understand, among other things, how marking constrains and orders patterns of gene expression. While these studies remain in a descriptive mode, the next logical stage is already under way; one where connectedness is restated in terms of gene circuits (2) or metabolic networks (4), and where entire states of genetic or biochemical activity may come under the control of a "circuit or network logic" so that the large amount of information inherent in many participating elements of a system may be compressed into a logic of circuits and networks.
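The idea of compressing many element states into a "circuit logic" can be sketched with a small synchronous Boolean network of the general kind introduced in the gene-circuit literature cited above (2). This is a minimal illustration, not a model of any real circuit: the three genes A, B, and C and their update rules are hypothetical, chosen only so that the full state space can be enumerated by hand.

```python
from itertools import product

# Hypothetical three-gene circuit: A and B repress each other,
# and C is switched on only when A is on and B is off.

def step(state):
    """Synchronously update all three genes from the current state."""
    a, b, c = state
    return (int(not b), int(not a), int(a and not b))


def find_attractor(state):
    """Follow the trajectory until a state repeats; return the repeating cycle."""
    seen = []
    while state not in seen:
        seen.append(state)
        state = step(state)
    cycle = seen[seen.index(state):]
    # Rotate the cycle so it starts at its smallest state (canonical form).
    start = cycle.index(min(cycle))
    return tuple(cycle[start:] + cycle[:start])


def attractors():
    """Map each attractor to the list of initial states that flow into it."""
    basins = {}
    for initial in product((0, 1), repeat=3):
        basins.setdefault(find_attractor(initial), []).append(initial)
    return basins


if __name__ == "__main__":
    for attractor, basin in attractors().items():
        print(attractor, "<-", basin)
```

In this toy circuit the eight possible expression states settle into three attractors: two stable states, (0, 1, 0) and (1, 0, 1), and one two-state oscillation. The wiring rules, rather than a list of the individual states, are what summarize the system's long-run behavior.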
The main message here for a biotechnology devoted to finding specific causes and cures for complex diseases is that the hoped-for specificity is severely compromised by a profound genetic, molecular, and informational complexity. For example, coronary artery disease involves several hundred genes. A complex disease like colon cancer is now acknowledged to include not only large-scale mutation but also profound changes in patterns of gene expression (10). Genetic instability in the forms of loss of heterozygosity (11) and aneuploidy (12) also complicates the simple single or even multiple gene mutation theories of cancer (13). Genetic instability may be the primary defect in common cancer, in which case mutation in oncogenes and tumor suppressor (TS) genes would be secondary and would select affected cancer cells through avoidance of apoptosis. When one adds the classical but mostly unrecognized uncertainties present in widespread epistasis and pleiotropy, the present emphasis on dominant gene effects and on single gene or protein-based diagnosis and therapy for common human diseases must be seen as unrealistic. The gene/protein circuits and network logic studies cited above represent some starting points for the development of new understanding and new technologies for managing complex phenotypes.
1. Cellular and Molecular Life Sciences (1998). Volume 54 is devoted to epigenetic control of transcription.
2. Kauffman, S.A. (1969) J. Theor. Biol. 22:437-467; The Origins of Order, Oxford University Press (1993); Thomas, R. (1998) Int. J. Dev. Biol. 42:479-485; Thieffry, D., Huerta, A.M., Perez-Rueda, E. & Collado-Vides, J. (1998) BioEssays 20:433-440.
3. Miklos, G.L.G. & Rubin, G. (1996) Cell 86:521-529.
4. Fell, D. (1997) Understanding the control of metabolism, Portland Press, London; Veech, R.L. & Fell, D.A. Cell and Biochem. Function 229:236.
5. Savageau, M.A. (1991) The New Biologist 3:190-197.
6. Gilbert, S., Opitz, J.M. & Raff, R.A. (1996) Develop. Biol. 173:357-372; Goodwin, B.C., Kauffman, S.A. & Murray, J.D. (1993) J. Theoret. Biol. 163:135-144; Webster, G. & Goodwin, B. (1996) Form and Transformation: Generative and relational principles in biology, Cambridge University Press.
7. Sing, C.F., Haviland, M.B. & Reilly, S.L. (1996) Ciba Foundation Symposium 197:211-232.
8. Wahlsten, D. (1999) Annual Review of Psychology 50:599-624.
9. Strohman, R.C. (1994) Bio/Technology 12:156-164.
10. Zhang, L. et al. (1997) Science 276:1268-1272.
11. Vogelstein, B., Fearon, E.R., Kern, S.E. et al. (1989) Science 244:207-211.
12. Duesberg, P., Rausch, C., Rasnick, D. & Hehlmann, R. (1998) Proc. Natl. Acad. Sci. U.S.A. 95:13692-13697.
13. Rubin, H. (1998) J. Surgical Oncology 69:4-8; Waliszewski, P., Molski, M. & Konarski, J. J. Surgical Oncology 68:70-78.