Annals of Botany 95/1 © Annals of Botany Company 2005; all rights reserved
Economy, Speed and Size Matter: Evolutionary Forces Driving Nuclear Genome Miniaturization and Expansion
Department of Zoology, University of Oxford, South Parks Road, Oxford OX1 3PS, UK
* E-mail tom.cavalier-smith{at}zoo.ox.ac.uk
Received: 10 November 2003 Returned for revision: 8 January 2004 Accepted: 12 February 2004
| ABSTRACT |
|---|
|
|
|---|
Background Nuclear genome size varies 300 000-fold, whereas transcriptome size varies merely 17-fold. In the largest genomes nearly all DNA is non-genic secondary DNA, mostly intergenic but also within introns. There is now compelling evidence that secondary DNA is functional, i.e. positively selected by organismal selection, not the purely neutral or selfish outcome of mutation pressure. The skeletal DNA theory argued that nuclear volumes are genetically determined primarily by nuclear DNA amounts, modulated somewhat by genes affecting the degree of DNA packing or unfolding; the huge spread of nuclear genome sizes is the necessary consequence of the origin of the nuclear envelope and the nucleation of its assembly by DNA, plus the adaptively significant 300 000-fold range of cell volumes and selection for balanced growth by optimizing karyoplasmic volume ratios (essentially invariant with cell volume in growing/multiplying cells). This simple explanation of the C-value paradox is refined here in the light of new insights into the nature of heterochromatin and the nuclear lamina, the genetic control of cell volume, and large-scale eukaryote phylogeny, placing special emphasis on protist test cases of the basic principles of nuclear genome size evolution.
Genome Miniaturization and Expansion Intracellular parasites (e.g. Plasmodium, microsporidia) dwarfed their genomes by gene loss and eliminating virtually all secondary DNA. The primary driving forces for genome reduction are metabolic and spatial economy and cell multiplication speed. Most extreme nuclear shrinkage yielded genomes as tiny as 0·38 Mb (making the nuclear genome size range effectively 1·8 million-fold!) in some minute enslaved nuclei (nucleomorphs) of cryptomonads and chlorarachneans, chimaeric cells that also retain a separate normal large nucleus. The latter shows typical correlation between genome size and cell volume, but nucleomorphs do not despite co-existing in the same cell for >500 My. Thus mutation pressure does not inexorably increase genome size; selection can eliminate essentially all non-coding DNA if need be. Nucleomorphs and microsporidia even reduced gene size. Expansion of secondary DNA in the main nucleus, and in large-celled eukaryotes generally, must be positively selected for function. Ciliate nuclear dimorphism provides a key test that refutes the selfish DNA and strongly supports the skeletal DNA/karyoplasmic ratio interpretation of genome size evolution.
Genetic Control of Cell Volume is Multigenic The quantitatively proportional correlation between genome size and cell size cannot be explained by purely mutational theories, as eukaryote cell volumes are causally determined by cell cycle control genes, not by DNA amounts.
Key words: Skeletal DNA, genome size evolution, nucleomorphs, heterochromatin, karyoplasmic ratio, cell volume determination, selection for economy, origin of the nucleus, nuclear volume
| INTRODUCTION |
|---|
|
|
|---|
For over half a century biologists have been greatly puzzled because the amount of DNA in cell nuclei does not generally correlate significantly with the number of genes, as would have been expected if DNA's sole function was genic (Mirsky and Ris, 1951
It should go without saying that mutations are the primary cause of any evolutionary changes and that selection can only act on those that actually occur. Although understanding the causes of and biases among mutations is desirable in itself, to focus on mutations at the expense of selective forces (Petrov, 2001
), far from giving new insights into the C-value paradox, sidesteps the core issues. We need to understand both mutations and their differential survival (selection). Mutational biases alone cannot explain the most important facts about genome size evolution; mutational processes stayed essentially unchanged across the bacteria eukaryote divide. What changed fundamentally was cell structure and the manner of coevolution between genomes and cells, transforming the selective forces on genomic changes (Cavalier-Smith, 1993
). The main reason why many do not realise that we already have a basically sound explanation of the C-value paradox is not that we do not understand mutations well enough, but that the solution I have offered, and develop further here, is a synthesis drawing on cell and molecular biology, population and evolutionary biology, developmental biology and ecology; evaluating such a complex synthesis is tough to a specialist in just one area. Population geneticists might prefer to be able to solve evolutionary problems strictly in their own terms and ignore detailed cell biology; cytologists might prefer to do the same and ignore abstruse population genetic arguments; molecular biologists might prefer to sequence DNA and ignore both the cell biology and the population biology. But for deeper understanding we must make the effort to combine the explanatory modes of all relevant disciplines.
Mirsky and Ris (1951)
first showed a very strong quantitative correlation between cell size and genome size in vertebrate animals. This proved equally true for plants and unicellular eukaryotes (protists) (Cavalier-Smith, 1985a
), as did the lack of correlation with organismal complexity or the inferred number of genes (Cavalier-Smith, 1985b
). Nuclear volume also correlates with genome size in just the same way in both animals and plants (Vialli, 1957
; Baetke et al., 1967
). Yet most molecular biologists and geneticists were so obsessed with sequence-related functions of DNA in the heyday of deciphering the genetic code that both fundamental cellular correlations were ignored. It was left to others to suggest that DNA may have functions additional to genic ones. Bennett (1972)
suggested that DNA has a structural role in controlling nuclear volume and referred to this and other possible functions of genome size unrelated to sequence as nucleotypic. Commoner (1964)
postulated that DNA amounts control cell size, and van't Hof and Sparrrow (1963)
that they controlled cell-cycle length. Although Bennett (1972)
also supported both suggestions, I argued that correlation of genome size and cell-cycle length was much weaker and more variable and indirect than for cell and nuclear volume (Cavalier-Smith, 1978
, 1980a
, 1982a
). The two latter correlations typically essentially scale isometrically; organisms with 10, 100, 1000, 10 000 or 100 000-fold larger genomes than others have approximately 10, 100, 1000, 10 000 or 100 000-fold larger cells and nuclei (Fig. 1). This is not so for cell-cycle lengths or the inverse correlation of genome size with basal metabolic rates in animals emphasized by Szarski (1970
, 1976
, 1983)
or the various other things that may correlate weakly with genome size; the latter are very indirect consequences of different cell volumes, have nothing directly to do with genome size, and can be modulated by many secondary processes (Cavalier-Smith, 1985a
, b
, 1991b
).
|
In principle the remarkable universal correlation of eukaryote genome size and cell volume could have been explained in three contrasting ways:
(1) It might be the result of a purely mutational equilibrium, e.g. Petrov (2002)
suggested that the spectrum of genome sizes is determined by a balance between a universal bias in favour of small deletions and a varying tendency to accumulate DNA by duplicative transposition of selfish genetic elements. This or any other essentially mutational hypothesis (e.g. the original selfish DNA ideas: Doolittle and Sapienza, 1980
; Orgel et al., 1980
) are compatible with the correlation only if genome size directly causally determines cell size, i.e. if DNA has a nucleotypic function. I shall show that genome size does not determine cell volume and therefore that all purely mutational theories must be false: selection as a function of genome size must also be involved in addition to any mutation pressures that may exist. Another reason why selection must be involved is that the spread of cell volumes differs dramatically in different taxonomic groups in an apparently adaptive way. This cannot be explained by purely mutational theories.
(2) One can postulate a universal net excess of duplications over deletions that would inexorably increase genome size, coupled with threshold selection against excessive DNA that was a function of cell sizebelow the threshold mutational bias would increase DNA, above it selection would hold it in check. I pointed out earlier that such a mutationselection equilibrium is the only way of making the idea of selfish DNA even remotely plausible as an explanation of the C-value paradox without invoking a nucleotypic function for DNA (Cavalier-Smith, 1985c
). The central assumption behind this theory is that natural selection is ineffective at limiting or reducing genome size below a cell-size-determined threshold. In a later section I explain how the differential scaling of nucleomorph and nuclear genome sizes show that this assumption is false. Natural selection can reduce genome size very efficiently and if necessary eliminate essentially all non-coding DNA (Beaton and Cavalier-Smith, 1999
).
(3) One therefore has to explain the correlation by a varying balance between two opposing selective forces: selection against extra nuclear DNA in smaller cells and selection for it in larger cells (Cavalier-Smith, 1978
). This is sometimes called the optimal DNA theory (Orgel et al., 1980
). But no biological optimization is ever perfect. Physical and developmental constraints and recurrent harmful mutations inevitably mean that any structure or process is to some degree suboptimal, so it might be better named the near-optimal DNA theory.
To be satisfactory any theory must explain, preferably quantitatively, the sharp contrast between the bacterial and eukaryote scaling laws shown in Fig. 1. Only the skeletal DNA theory has seriously addressed this. Labelling this interpretation of eukaryote genome evolution as the skeletal DNA theory is an oversimplifying convenience. The skeletal DNA idea is but one of seven principles that must be combined to understand eukaryote nuclear genome size evolution. I reassert them briefly in updated form:
(1) The central factor is cell volume. This is generally highly adaptive in both multicellular organisms and protists. A huge range (roughly 300 000-fold) of cell sizes has evolved in eukaryotes for adaptive reasons; but the spectrum is markedly different in breadth and mean in different groups, which is also adaptively explicable. The spectrum results from opposing advantages and disadvantages of small versus large cells. Cell volume for protists is the same as body size and thus fundamentally and centrally important for defining their ecological niche (Cavalier-Smith, 1980a
). Understanding the importance of cell size in plants, and even more so in animals, is greatly complicated by the immense variation in cell volume possible from tissue to tissue and by the false but widespread dogma that cell size does not matter for multicells and that only body size counts (Gould, 1977
). Botanists have been more ready to recognize that somatic cell size is physiologically important, because much functional machinery in plants consists of individual cells, e.g. tracheids, phloem sieve tube elements, stomatal guard cells. But even in animals the size of blood cells in relation to capillaries and the size of nerve cells that have to stretch from an elephant's spine to its toes or from brain to the tip of its trunk, or to the tip of a blue whale's penis are functionally important (Cavalier-Smith, 1991b
). Throughout biology size matters.
(2) Eukaryote cell volumes evolve by mutating cell-cycle control genes, not by changing genome size. Genetic control of cell size is crucial for understanding nuclear genome size evolution, because had the hypothesis that genome size determines cell size (Commoner, 1964
) been correct, it would have provided a simple explanation of their universal correlation: mutations increasing or decreasing nuclear DNA amounts would necessarily proportionally increase or decrease cell volumes. A purely mutational theory of genome size (Petrov, 2002
) contradicts both principles 1 and 2 by assuming that: (a) DNA amounts causally determine cell volumes (with a scaling of 1 on Fig. 1), and (b) cell volume is a neutral character not subject to selection. If either assumption is false, the purely neutral theory is wrong and selection is also involved. I have previously given many reasons why cell volume is adaptive and the purely neutral theory false (Cavalier-Smith, 1985a
). But understanding of eukaryote cell cycles was insufficient then to reject the theory that DNA amounts determine cell volumes; only indirect arguments could be given against it.
(3) DNA is the fundamental nuclear skeleton. As the nuclear envelope assembles around chromatin and is always attached to it during interphase, nuclear DNA content plus its tightness of packing or degree of unfolding causally determine nuclear volumes; total nuclear DNA therefore has a non-genic nucleotypic function. By contrast, bacterial, mitochondrial and chloroplast DNAs do not. The assertion by Gregory (2001)
that my use of the term nucleotypic for this skeletal function is contrary to its original definition by Bennett (1972)
is mistaken.
(4) The karyoplasmic ratio is optimized. The ratio of the volume of the nucleus to that of the cytoplasm (karyoplasmic ratio: Strasburger, 1893
; Wilson, 1925
) is functionally important and essentially invariant with cell volume across many orders of magnitude (Trombetta, 1942
). Its importance is not, as originally suggested (Cavalier-Smith, 1978
) and Gregory (2001)
unnecessarily dwells on, because transport across the nuclear envelope is rate-limiting for growth, which is not generally true for cycling cells, but may be for a few giant ones (Cavalier-Smith, 1982a
). Instead it lies in the unavoidable requirement to balance the overall rate of RNA synthesis (mass per unit time) and processing (which both require nuclear machinery that occupies space) with the rate of protein synthesis (which requires ribosomes, which occupy a major part of the cytoplasmic space) in actively multiplying cells undergoing balanced growth (defined as growth that leaves the quantitative proportion of different cell constituents unchanged from one cell generation to the next: Ingraham et al., 1983
).
(5) Therefore, bigger cells need larger nuclei. When cell size increases in evolution there is positive selection for a corresponding increase in nuclear volume; it is generally easier to achieve this by increasing the amount of DNA rather than by altering its folding parameters.
(6) There is universal selection against excessive amounts of DNA. This stems from pervasive selection to maintain economy in the use of energy, nutrients and space and maximize the output of grandchildren cells from limited resources. Therefore when cell size decreases in evolution there is stronger selection for deletions than insertions to reduce the now partly wasteful non-coding DNA until the optimal karyoplasmic ratio is restored.
(7) As a result of these opposing selective forces, larger genomes have relatively more non-coding skeletal DNA. This DNA provides a larger habitat for selfish genetic elements that spread by duplicative transposition, so they will inevitably be much more numerous in larger genomes than small onesbut their abundance is a consequence, not a cause of the larger genomes. Although the sequence of such transposable elements can be regarded as selfish and of no benefit to the host cell, their DNA contributes as effectively as non-transposon secondary DNA and genic DNA to the overall skeleton and volume of the nucleus and thus benefits the cell; calling them selfish is partially misleading. Alhough selfish in origin, there is continual turnover of different types of transposable element within a chromosomal habitat size, determined not by the elements themselves but by principles 16 above; thus as some families increase, others will decline. This is well shown in mammals, where all eutherian orders have essentially the same genome size, except bats which, like birds, have smaller cells to allow more rapid gas exchange by red cells during flight. This near-constancy in mammalian genome size reflects strong stabilizing selection for cell size and implies that mammalian genome size has been essentially constant for 70 million years. But different kinds of transposable elements spread in different groups. Thus the fact that the human genome is made up about 40 % of Alu sequences, similarly abundant in primates but not in other orders, does not mean that transpositional spread of Alu sequences increased genome size in the long term (briefly they must have, the more so if they spread faster than compensatory elimination of non-coding DNA); they probably simply replaced other non-coding sequences.
It is important not to confuse correlation with causation. Consider the maize genome, packed with six major families of transposable elements constituting 70 % of its mass, none older than about 5 million years. It has been assumed that this means that its genome increased twofold during this period (SanMiguel et al., 1998
). But without independent evidence for genome increase (e.g. from fossil cell size) this is circular reasoning. Maize also deleted many genes since diverging from sorghum (Ilic et al., 2003
). Its genome size might have increased as assumed, decreased or remained the same subject only to genomic turnover, with new retrotransposon families replacing old ones or other secondary DNA. I think it may have increasednot through transposition pressure, but as a result of selection by humans for larger seeds (on average associated with larger genomes: Thompson, 1990
), which would favour larger cells (although endopolyploidy makes larger cells for starch storage in the endosperm, a larger starting size might also contribute).
All seven principles were stated 25 years ago (Cavalier-Smith, 1978
), but the evidence has increased considerably since; some of the numerous subsidiary arguments in that paper have been supported by new evidence, but a few have been disproved. This paper has three purposes: first, to present the basic theory more thoroughly then before, in the light of recent molecular data; second, to discuss examples that support it and contradict rival theories; third, to criticize misunderstandings of the theory and explain why existing alternatives are unsatisfactory.
After briefly listing the mutational causes of genome size evolution, I explain the dual nucleotypic/genic control of nuclear volume (principle 3), refining skeletal DNA theory to take account of new molecular information about heterochromatin and the involvement of the nuclear lamina in nuclear assembly around DNA, and discuss the origins of these mechanisms in the ancestral eukaryote. I then explain the importance of the karyoplasmic ratio (principle 4) and universal selection for economy. After emphasizing that the skeletal role of DNA and the karyoplasmic ratio's constancy are both essential for understanding coevolution of genome size and cell size, I present case examples, mainly in protists, exemplifying these principles. Finally, I consider fallacious criticisms of the theory and explain that eukaryote cell-cycle controls are such that they do not involve overall DNA amounts as causal factors. The control of eukaryote cell size is therefore multigenic not nucleotypic, allowing us to reject decisively purely mutational theories of nuclear genome size evolution.
| MUTATIONAL MECHANISMS OF GENOME SIZE CHANGE |
|---|
|
|
|---|
There are five major ones:
(1) Local indels of a few nucleotides probably mainly caused by replication errors; these will affect the lengths of introns and intergenic spacers.
(2) Duplication/deletion of whole genes or major chromosomal segments, probably mainly caused by recombination errors, e.g. unequal sister chromatid exchanges.
(3) Duplicative transposition of transposons.
(4) Errors in chromosome disjunction causing aneuploidy.
(5) If polyploids gradually become functional diploids and stabilize with more DNA than before, polyploidy can be an important cause of increased genome size (this often occurs transiently).
Mechanisms (2), (3) and (4) are most important for changing gene number. However, although selection has the major influence on genome size, mutational biases probably more dominantly affect many features of secondary DNA composition (Cavalier-Smith, 1993
).
For genome size increase, (2), (4) and (5) may be the most important mechanisms. Each event makes a much bigger increase than for (1), and in contrast to (3) the new DNA already comes with appropriately spaced replicon origins and attachment regions for chromosomal core proteins to allow reversible folding into chromosomes and proper attachment to the nuclear lamina and matrix in interphase (Cavalier-Smith, 1985b
). A sixth potential mechanism is insertion of foreign genes by lateral gene transfer (probably involving illegitimate recombination); lateral gene transfer may have had quantitatively significant effects on genome size in a few bacteria (notably in the acquisition of numerous hyperthermophilic genes by some eubacteria and numerous mesophilic genes by some archaebacteria: Cavalier-Smith, 2002b
) but is probably quantitatively minor in eukaryoteshowever even microsporidia, the eukaryote cells with the smallest genomes, got at least one foreign gene thus (Fast et al., 2003
).
For genome shrinking, mechanism (2) (unequal recombination) is probably most important. Since large segmental deletions and insertions are much easier by homologous than by illegitimate recombination, the rates should be substantially greater in a genome with numerous related transposable elements than one lacking them with mostly unique DNA. Therefore selfish DNA and former selfish DNA may provide the most powerful means of genomic reduction available to a cell, the opposite of what its original proponents imagined. It is therefore not obvious that the net effects of transposable elements need be to increase genome size. Gregory (2003)
correctly criticised the idea that small local indels are the major mutational cause of genome reduction (Petrov, 2002
), as they would be much less effective than larger deletions. Nonetheless it is of considerable interest that deletions locally may exceed duplications/insertions (Petrov, 2002
). Unless this apparent bias is really the result of selection (Charlesworth, 1996
) it provides another piece of evidence that mutational bias is not invariably upwards.
| HOW DNA AMOUNTS CONTROL NUCLEAR VOLUME: THE BASICS |
|---|
|
|
|---|
In interphase the nuclear envelope is physically attached to chromatin. In animals and plants, having open mitosis, it assembles on the surface of condensed chromatin at telophase. The volume of the interphase nucleus is set by the total volume of the chromatin (determined by the genome size and the DNA/protein packing ratio H; i.e. the ratio of the total volume of a 30 nm chromatin thread [DNA + proteins] to that of the DNA within it) plus the swelling factor (s, how many times the chromatin polyelectrolyte gel increases in volume subsequent to telophase). Thus the volume (V) of interphase nuclei is given by the formula: V = aHpsC, where a is a universal constant depending only on the measurement units, p is the ploidy, and C the genome size or C-value. It is therefore necessarily the case that interphase nuclear volume is determined jointly nucleotypically (by C) and genically (by H and s). The genome size, C, determines the theoretical minimal volume; genes for chromatin proteins (typically mainly histones) determine the minimal practical packing ratio H (e.g. tighter packing and smaller nuclei are allowed in many sperm by protamines than by histones in somatic cells, such as most vertebrate red blood cells); and various gene products influence the swelling factor by unwinding or condensing chromatin to different degrees. But s partly depends on fundamental physical properties of polyelectrolytes and their counterions, and thus is also subject to basic physical constraints.
Thus genome size (C) does causally affect nuclear volumes, but these are not exclusively controlled by genome size. Changes in genome size necessarily change nuclear volume in direct proportion, in the absence of changes to genes influencing H or s. This is not the case for cell volume. Skeletal DNA theory asserts that the normal and major way that nuclear volume changes in evolution is by changing C, not H or s. As we know that H is essentially invariant because of conservatism of histones, nucleosomes and the folding pattern of the basic chromatin thread, the only way nuclear volume could change significantly is by caltering C or s. Empirically what has changed is C not s. The swelling factor s, unlike C is not a constant but increases with the degree of transcription and can be experimentally manipulated by changing ionic conditions, as expected by polyelectrolyte theory (Nicolini et al., 1984
); therefore cells have a range of physiologically acceptable s-values, not a single fixed one. One question not satisfactorily answered is whether s has a fixed upper limit. We know that in animal oocytes that are halted in meiosis, during which chromatin connections to the nuclear envelope are brokenas in mitosisthe nuclear envelope can swell to an immensely greater diameter than is possible in interphase. Thus there is no inherent limitation to its growth when not constrained by being bound to chromatin. But attachment of the nuclear envelope to DNA imposes an upper limit to nuclear volume.
Injection experiments of mammalian HeLa nuclei into frog oocytes, which immediately swell about 50-fold (Gurdon, 1968
), give an empirical estimate of an upper limit to s. Equally illuminating were cell fusion experiments to make heterokaryons between genetically inactive chicken red blood cell nuclei with maximally condensed chromatin and transcriptionally active HeLa nuclei (Harris, 1970
). The inactive nuclei swell at least 20-fold and then are transcribed and replicated. Harris stressed that the cytoplasm controls gene activity not the DNA; by supplying swelling factors (ultimately coded by genes in the usual chicken-and-egg way of biology) it also affects nuclear volume. Of course, the chicken nuclei remain smaller than the HeLa one, as expected from their 3-fold smaller genomes. Using these nuclear injection and cell fusion experiments as a guide, I suggest that the maximum possible range for s is of the order of a thousand-fold, over two orders of magnitude less than the 300 000-fold range of cell volume. Since the practical limit for transcribing, growing nuclear volume ranges may be closer to 50-fold, germ-line cells (which cannot resort to endopolyploidy, as many somatic cells do, especially in invertebrates and angiosperms) obviously must rely mainly on evolving different genome sizes to adjust their nuclear volumes in proportion to their vastly differing cell volumes (if we include nucleomorphs, with smallest genomes six times less than the smallest microsporidia, the total range in nuclear DNA content is 1 800 000-fold assuming a uniform degree of folding). Using the term skeleton for the size-determining function of DNA does not imply that it is rigid; at the local level it is notsegments writhe, flex and diffuse as far as their attachments allow (Vazquez et al., 2001
; Ostashevsky, 2002
).
When I spoke frequently on skeletal DNA over 20 years ago, a common reaction was why not make the nuclear skeleton of protein? One sometimes felt that one's critics would prefer any material but DNAprotein, chitin, even steel or teflon. Why use DNA? A key advantage of DNA as a nuclear skeleton is that as a polyelectrolyte gel its volume can stretch or shrink within certain limits by binding specific proteins that modify its self-repulsive negative electrical charge (notably the highly positive core histones that induce it to wrap around them) and by modifying the charge on such binding proteins themselves by phosphorylation (adding negative charge) and dephosphorylation (removing it) or by acetylation (reducing positive charge) or deacetylation (increasing it), methylation or demethylation. Core histones are also covalently modified by ubiquitination, glycosylation, ADP ribosylation, and sumoylation: Shiio and Eisenman, 2003
.
Gregory (2001)
wrote that nobody disputes that DNA amounts causally determine nuclear volumes. I wish that were so, but he also cites Vinogradov (1998)
, who asserts that a nucleoskeleton does not require the DNA molecule and could be fulfilled by a proteinaceous cytoskeleton without involving the precarious informational molecules. The volume considerations in previous paragraphs indicate that, so long as DNA remains attached to the envelope, it is physically impossible for a (hypothetical) protein skeleton to achieve the expansion in size that occurs in the largest cells unless the DNA amount also increases hugely. Thus Vinogradov's assertion is strongly contradicted by the facts and amounts to saying that he would have designed things differently. But nature chose DNA because it was there, convenient, and did the job with supreme efficiency. For that job, size, flexibility and potential to swell with activity and shrink into inaction, matter. Sequence does not, as shown by injection experiments, where any DNA (bacteriophage, plasmid or whatever) will nucleate the assembly first of chromatin and then the nuclear pores and envelope to make morphologically normal nuclei (Forbes et al., 1983
). The frog egg is packed with histones, pore complexes, membrane vesicles and soluble lamins, awaiting arrival of the sperm and the almost explosive replication of DNA during cleavage to spring into action and assemble thousands of nuclei without further RNA or protein synthesis or gene expression for nuclear structural proteins. If you inject any foreign DNA instead it nucleates assembly, the mass of nuclei assembled depending simply on how much is injected.
In principle, DNA's skeletal function is achieved in four ways:
(1) Nucleosome assembly by DNA coiling around core histones.
(2) Nucleosome supercoiling to form 30 nm chromatin threads.
(3) Folding 30 nm chromatin threads into large chromatin masses: either interphase heterochromatin or mitotic chromosomes (only partially understood: Belmont et al., 1999
; Grewal and Moazad, 2003
).
(4) Attachment of interphase chromatin to the nuclear lamina and inner nuclear matrix, or in mitotic chromosomes to the chromosome core and via the kinetochore to spindle microtubules.
These skeletal functions of DNA are achieved by two types of binding to nuclear proteins: partially sequence-specific binding, notably by DNA at centromeres, telomeres and the very numerous local regions that mediate attachment to the nuclear matrix (matrix attachment regions, MARs); and a more generalized, largely sequence-independent binding, notably to core histones but also to the nuclear lamina. As nuclear volume changes in evolution, amounts of both skeletal DNA types will change in concert with each other and overall genome size. The geometrical position of these sequence-specific parts of the skeletal DNA is probably important for the DNA skeleton to alternate so dramatically between its interphase and mitotic state. At least part of the short evolutionarily conserved regions of intergenic non-coding DNA (Kondrashov and Shabalina, 2002
) is likely to mediate attachments to the nuclear matrix and lamina. As most skeletal DNA has largely sequence-independent functions, changes in its amount contribute more to overall genome size than the sequence-specific components. In all eukaryotes telomeres are specifically attached to the nuclear envelope during interphase and meiotic prophase. Their movement in the plane of the membrane and aggregation to yield the bouquet stage is probably important for pairing. Centromeres commonly, but less invariably, associate with the nuclear envelope. Centromeres and telomeres are necessarily so disposed at the end of mitosis to make attachment to the nuclear envelope easy: Rabl (1885)
first recognized that they typically maintain their positions after mitosis and that interphase nuclei are essentially swollen chromosome arms packed into a sphere, but maintaining much spatial concentration in distinct domains.
Acetylation and phosphorylation of histones are generally considered important in the switch between interphase and mitotic chromatin. There ought to be more study of how peridinean dinoflagellates, which apparently lack core histones and nucleosomesat least in bulk DNAand have supercoiled chromosomes throughout interphase, achieve this. Their condition is clearly derived, as they belong to the advanced protozoan group Alveolata (Fig. 2) and are not primitive eukaryotes (Cavalier-Smith, 2004b
). Methylation of DNA and histones and histone deacetylation also help organize the more compact, typically genetically inactive state of heterochromatin (Grewal and Moazad, 2003
). Heterochromatin is much more obvious in animals and plants than in fungi and protozoa, which long made it debatable whether it was a fundamental nuclear feature or an optional extra.
|
| HOW HETEROCHROMATIN COMPLICATES THE BASIC SKELETAL DNA THEORY |
|---|
|
|
|---|
Two recent developments show that heterochromatinization is fundamental to the eukaryote state and probably originated in the ancestral eukaryote at the same time as mitosis. First, is major progress in establishing where the root of the eukaryote phylogenetic tree really lies. Contrary to previous ideas, the last common ancestor of animals and plants was also the last common ancestor (cenancestor) of fungi, chromists and all protozoa, i.e. the same as the last common ancestor of all eukaryotes (Fig. 2). Thus all features of animal and plant cells that are truly homologous (and did not travel between them by lateral gene transfer) must have already been present in the eukaryote cenancestor. Thus it not only had nucleosomes and histone acetylation and methylation, but was also sexual with meiosis and a bouquet stage. Any non-laterally transferred homologous feature present on both sides of the basal eukaryotic bifurcation (Fig. 2) must also have been present in the common ancestor, even if absent in some groups, e.g. centrioles were certainly present in the ancestors of higher fungi and plants, which both lack them. Until we rooted the eukaryote tree correctly it was harder to decide whether such features absent in some groups were ancestrally absent or secondarily lost.
For example, the much laxer chromatin folding in Saccharomyces cerevisiae chromosomes, which unlike in most eukaryotes allows transcription during mitosis, was postulated as primitive (Nasmyth, 1995
). Almost certainly it is a secondary consequence of a major reduction in genome size after the budding yeast state evolved from a filamentous ancestor (Cavalier-Smith, 2000b
); a lower degree of compaction can be tolerated by chromosomes with substantially less DNA without making arms too long to fit within the spindle (see Cavalier-Smith, 2003a
). The combination of a much smaller genome and more numerous chromosomes than Schizosaccharomyces pombe allowed S. cerevisiae to have looser mitotic chromosomes without excessive arm lengths, unlike S. pombe that necessarily retains ancestrally tighter folding. Filamentous ascomycete fungi have nearly twice as many genes (Galagan et al., 2003
) and are much more typical higher fungi than yeasts. S. cerevisiae is almost the worst possible model for a typical unicellular eukaryote. Compared with the cenancestor, S. cerevisiae has dramatically reduced genome size, lost most introns, centrioles, cilia, phagocytosis, intermediate filaments and the nuclear lamina, evolved a novel asymmetric mode of budding, and lost the highest levels of chromatin folding, and much else. Budding yeasts are unlike virtually all other eukaryotes in centromere structure (McAinsh et al., 2003
); typically centromeres have a central core containing the kinetochore, which binds spindle microtubules, and peripheral repeated heterochromatic DNA with substantial bulk important for sister chromatid adhesion. But an ancestor of S. cerevisiae simplified its centromere to a single microtubule-binding element and dispensed with centromeric heterochromatin, substantially reducing this part of the skeletal DNA inventory; although it no longer needs centromeric heterochromatin it still uses heterochromatin silencing protein Sir1 (Hediger et al., 2002
) to attract CAF1, an assembly factor helping direct CenpA to the centromere (Sharp et al., 2003
), but it must do so bypassing the typical intermediate need for heterochromatin (Sharp and Kaufman, 2003
).
The most reduced nuclear genomes of all (nucleomorphs: see later section) have no repeated DNA and no sign of heterochromatin ultrastructurally or in their sequences. In cryptophytes, members of the kingdom Chromista formed by red algal nuclear enslavement (Fig. 2), each of the three minute nucleomorph chromosomes has just enough space in one position for a tiny centromere like in S. cerevisiae; as one encodes the key centrosomal histone CenpA they have centromeres of unknown location and size. Nucleomorphs may even lack telomeric heterochromatin (the only kind S. cerevisiae kept); both sorts have telomeric sequencestypical in chlorarachneans, modified in cryptophytes.
Against this phylogenetic background the fact that heterochromatinization, with methylation of both DNA and histones centrally involved, is mechanistically fundamentally similar in animals and plants (Grewal and Moazad, 2003
) indicates its presence in the eukaryote cenancestor. I have suggested that repetitive constitutive heterochromatin originated in the ancestral eukaryote because of its importance in the folding and attachments of centromeres and telomeres, and then spread essentially non-adaptively to intercalary positions, by intragenomic drift (Cavalier-Smith, 1985b
). The latter hypothesis explained why the relative amounts of constitutive heterochromatin vary among animals and plants independently of genome size and other adaptive variables. It now seems likely that heterochromatin's original role was in centromere assembly (notably helping direct CenpA rather than H3 to the kinetochore region) and function (notably in centromere cohesion; perhaps also in controlling its three-dimensional structure to ensure bipolar kinetochore attachment in mitosis and meiosis; S. cerevisiae monopolin may provide a novel simplified way of achieving this in meiosis I after losing centromeric heterochromatin: Toth et al., 2000
; Clyne et al., 2003
; Rabitsch et al., 2003
).
Centromeres originated very early in eukaryotic evolution, most likely from the bacterial chromosome terminus after the origin of core histones (Cavalier-Smith, 1981
). Such continuity between the bacterial and eukaryotic mechanism was literally vital, so centromere splitting arguably evolved from bacterial prokinetochore splitting (Cavalier-Smith, 1987b
). This seems increasingly likely, as centromeres split by the destruction of cohesins, which form an evolutionarily related kleisin superfamily (Schleiffer et al., 2003
) with bacterial Smc proteins (Soppa, 2001
; Herrmann and Soppa, 2002
; Schleiffer et al., 2003
; Volkov et al., 2003
) that bind to DNA as a ring-shaped structure (Volkov et al., 2003
) like cohesins (Campbell and Cohen-Fix, 2002
; Gruber et al., 2003
). Thus the neomuran ancestor of eukaryotes already had Smc proteins for chromosome segregation and a core nucleosome particle of a H3/H4 tetramer (Cavalier-Smith, 2002a
). What was new in the first eukaryote was a kinetochore for attaching microtubules (which originated from FtsZ, the bacterial segregator; Cavalier-Smith, 2002a
) and tight and regular folding of the centromeric heterochromatin to orient them correctly (Cavalier-Smith, 1987b
); the latter was probably the original function of heterochromatin. Chromatin folding at centromeres must also be tight and semi-rigid in order to withstand tension, important for the mitotic surveillance mechanism (Nasmyth, 1995
). As hypoacetylation seems important in centromere maintenance, might this be a memory of an origin of their basic assembly mechanism before widespread chromatin acetylation evolved? From the beginning there would have been conflicting requirements between a tight, semi-rigid centromere folding for segregation and looser uncoiling during transcription. Evolution of CenpA and histones H2A and H2B perhaps helped make centromere folding tighter.
If much intercalary heterochromatin spread by intragenomic drift there would be quantitatively significant non-adaptive noise superimposed on adaptively significant changes in genome size, because constitutive heterochromatin has a quantitatively different impact on nuclear volume from euchromatin (as originally mentioned: Cavalier-Smith, 1978
). Taking heterochromatin into account, and assuming that it does not unfold after telophase (its s = 0), the formula for nuclear volume becomes: V = apC(hf + Hbs), where h is the DNA/protein packing ratio for heterochromatin and H that for euchromatin, b the fraction of the genome that is euchromatin, and f the fraction that is heterochromatin. Putting b = 1 f, V = apC(hf + Hs Hfs). If h = H, probably a close approximation, the formula becomes V = apCH(f + s fs).
Changes in the fraction, f, of heterochromatin inevitably alter genome size scaling with cell size simply because constitutive heterochromatin is more tightly folded. If f remains constant within a group but differs among groups, each will follow a scaling law with slope 1, but intercepts on the ordinates will differ, giving a series of parallel regression lines. If, however, f changed systematically with cell size within a group the slope would differ from 1. Erratic variation in f would increase point scatter around the line, not its slope. Fig. 1 illustrates these considerations. The main regression line U is for unicellular eukaryotes with negligible amounts of heterochromatin (possibly none in dinoflagellates, although I should not be surprised if they retained CenpA and other histones at least for their centromeres, and only very tiny amounts in yeasts, diatoms, amoeba and typical green algae). The dashed line is the regression line for meristem cells of herbaceous angiosperms, which have substantial amounts of heterochromatin but are also transcriptionally very active. The fact that nuclear volumes in both root and shoot meristems also scale with genome size with a slope of 0·826 not 1·0 (Fig. 3: Baetke et al., 1967
) implies that the ratio of condensed chromatin to euchromatin is not invariant among the 30 taxa, but increases somewhat with cell size (microscopical observations appear consistent with this)or that s decreases slightly with size. In small plant genomes, lysine 9 of histone H3 is strongly methylated only in constitutive heterochromatin, but in larger plant genomes methylation is more widespread (Houben et al., 2003
) and might therefore be involved in greater facultative chromatin condensation. In the unicellular cryptophytes, we found a slope for genome size versus cell size distinctly below 1 (0·74: Fig. 1); either the measurements of cell volume and/or DNA content are systematically biased with size or some chromatin packing feature varies systematically across the class. Possibly in cryptophytes f declines with cell sizeevolutionarily comprehensible if there is selection for using DNA as a nuclear skeleton more efficiently in larger unicells than in smaller ones by decreasing the heterochromatin fraction.
|
I once assumed that selection could keep heterochromatin to the minimum needed for centromeres and telomeres in all unicells undergoing binary fission; if that were true, they should all have negligible amounts of heterochromatin and scale quantitatively like dinoflagellates, diatoms and amoebae (Cavalier-Smith, 1980b
Protozoa with large masses of well-developed peripheral heterochromatin are scattered so widely across the tree (Fig. 2) that I now suggest that this was actually the ancestral state for eukaryotes and arose as a cortical nucleoskeletal shell of heterochromatin in the earliest eukaryotes. The very sparse, swollen chromatin typifying fungi and most groups of algae other than cryptomonads may be derived specializations to economize on DNA. I suggest that the key factor may have been phosphate economy and loss of phagotrophy. The first eukaryotes were phagotrophs that got lots of phosphate from their prey; both from its RNA and phospholipids, so phosphate would seldom be limiting. Fungi and algae that evolved cell walls gave up phagotrophy, making it advantageous to reduce DNA/nuclear volume ratios by reducing f to conserve phosphate. Algae that evolved walls had to abandon phagocytosis but those that evolved a complex pellicle instead (like dinoflagellates, euglenoids and cryptophytes) or retained a softer surface (like some chlorarachneans) could retain phagocytosis, like some cryptomonads, and might therefore retain a peripheral heterochromatin shell as their major nucleoskeleton. The larger-celled Amoebozoa may have been able to give up the use of a cortical skeleton of condensed chromatin because they evolved a much more complex and rigid nuclear lamina than any other protists except the aberrant dinoflagellate Noctiluca, which also lacks both heterochromatin (unlike more primitive members of the phylum Myzozoa to which it belongs: Cavalier-Smith and Chao, 2004
) and interphase condensed chromosomes (unlike its sisters the Peridinea). This hypertrophied nuclear lamina probably evolved as protection against shearing damage caused by the exceptional development of amoeboid cytoplasmic motility of Amoebozoa.
The widespread presence of heterochromatin in vascular plants is probably not simply an inherited ancestral state, as most green algae have relatively little. It is more likely a reversion to it, probably related to a need for substantially different nuclear volumes at different stages of the life history (Cavalier-Smith, 1980a
, 1982a
), which arose after the origin of a vascular system led to large cell size increase (see later section). Animals also need different nuclear volumes in different differentiated cell typesmet by controlling the degree of heterochromatinization and transmitted by epigenetic inheritance. The latter is visually obvious in heterokaryons between mammalian cells (with the same genome size) but different average nuclear size, e.g. lymphocytes with smaller denser nuclei (because they have smaller cytoplasm: see next section) and macrophages with larger nuclei and looser chromatin (Harris, 1970
). Such epigenetic inheritance across somatic cell generations is well accepted (Grewal and Moazad, 2003
). It seems that animals and plants independently recruited the ancestral heterochromatinization process invented by unicells for ordinary cell cycle purposes to make substantial chromosome segments, sometimes whole chromosomes as in the inactive X in female mammals, facultatively inactive during development. Although heterochromatinization machinery is very well conserved among animals, fungi and plants, including use of non-coding transcripts RNAi (Gendrel et al., 2002
; Johnson et al., 2002
; Soppe et al., 2002
; Hennig et al., 2003
; Lehnertz et al., 2003
; Stevenson and Jarvis, 2003
; Tamaru et al., 2003
; Tariq et al., 2003
), the uses to which it has been secondarily put may be quite different and need not be conserved (Gaudin et al., 2001
; Kotake et al., 2003
). Their unicellular common ancestor would have had no distinction between germ line and soma; its only cell differentiation would have been alternation between naked vegetative growth by phagotrophy and dormant walled cysts, plus associated sexual differentiation (switch to gametic and meiotic states). Possibly chromatin silencing by heterochromatinization is involved in these switches, but the only functionally necessary epigenetic inheritance would have been the differentiation of centromeric heterochromatin with CenpA and any needed for selective attachment of centromeric and telomeric heterochromatin to the nuclear lamina.
The concentration of many potentially harmful transposable elements in heterochromatin (Vershinin et al., 1995
; Dasilva et al., 2002
;) has stimulated the idea that cells may also use it to inactivate such harmful genetic parasites phenotypically. However, such genetic parasites may simply find it easier to accumulate in such regions (their lower recombination favours this: Charlesworth, 1988
) or specifically target themselves there because selection against them is then weaker. SINES preferentially insert into MARs in Brassica (Tikhonov et al., 2001
). It pays transposons to target such conserved non-genic regions present at above random concentration. Nonetheless the idea that cells can protect themselves against selfish DNA is well-founded, the clearest example being the RIP mechanism that Neurospora uses to destroy repeated DNA (Galagan et al., 2003
).
| THE NUCLEAR LAMINA AND ITS EVOLUTIONARY ORIGIN |
|---|
|
|
|---|
Clearly, the nuclear skeleton consists not only of the chromatin mass but also of the nuclear lamina, which is firmly bound to its surface and mediates the attachment of the nuclear membranes to the DNA (Gerace and Blobel, 1982


