Beyond the DNA code
This essay was written by James Turner and was first published in the 2013 Mill Hill Essays.
The year 2000 heralded one of the most important breakthroughs in science and human endeavour. Research groups from around the world, including many scientists based in the UK, together generated the first draft sequence of the human genome, the three-billion-base-long DNA code in which genes are located. This was a heroic task, akin to completing an enormous jigsaw puzzle, with the added complexity that the researchers had no idea what the final picture should look like. The subsequent completion of a high-quality version of the human genome sequence in 2003 was timely, occurring fifty years after Watson and Crick’s discovery of the structure of DNA, and a century after Gregor Mendel’s laws of heredity were finally embraced by the scientific community. The human genome sequencing project promised great things. With a blueprint of the human DNA sequence at our fingertips, we could finally identify the harmful changes in the genetic code, known as mutations, that give rise to common diseases like obesity and heart disease, and could therefore design better treatments for these conditions. We would be able to predict which diseases individuals would develop in later life and, by doing so, could take pre-emptive steps to ameliorate their effects. Most provocatively, we could resolve philosophical questions such as whether our thoughts, feelings and behaviours are the result of free will, or are influenced, or predetermined, by our genetic make-up.
A decade later, some of the breakthroughs promised by the human genome sequencing project have been realised. DNA sequence changes that predispose to diseases such as Crohn’s disease and type 2 diabetes have been identified, providing important mechanistic insight into how these conditions arise. Genomes from individuals can now be sequenced in a matter of days, rather than months, and at a cost of only a few thousand dollars, representing a major step forward in personalised genomics and medicine. However, contrary to the impression created by the popular media and by science fiction films such as Gattaca and In Time, our DNA sequence alone cannot predict everything about how our bodies are destined to function, and from which ailments we are likely to suffer. It is becoming increasingly clear that many diseases develop in the absence of detectable changes in an individual’s DNA sequence. In these conditions, abnormalities arise instead because of changes in the behaviour, or “expression”, of genes.
Gene expression begins with the DNA sequence of a gene being copied, generating a molecule called RNA. This RNA leaves the nucleus, the part of the cell in which the DNA is stored, and enters the cytoplasm, where it is used as a template for the synthesis of a specific protein. Proteins are the building blocks of cells, and function in a wide range of biological processes, from enzyme reactions to cell signalling. When expression of a gene is inappropriately inhibited, the corresponding protein is not made, and this can give rise to disease with characteristic signs and symptoms. The study of factors that affect the expression and function of genes without causing changes in their underlying DNA sequence is known as epigenetics, with epi deriving from the Greek word meaning “upon”, “over” or “near”.
Historically, the study of epigenetics dates all the way back to Aristotle and his work on developing chicken embryos. Embryos are formed by fertilisation of an egg by a sperm. During the first few days of life, embryos comprise a ball of cells, each one identical to the next, and are therefore termed “undifferentiated”. As development proceeds, these amorphous structures begin to differentiate, forming discrete, specialised tissues such as skin, liver and nerves, and assuming a shape that more closely resembles that of an adult. This differentiation happens despite the fact that all of the cells in all of the different tissues contain exactly the same DNA sequence, and therefore the same set of genes. Cellular differentiation is made possible by the fact that each tissue expresses a different combination of genes from that of other tissues. For example, genes involved in pigmentation are expressed in skin cells but not in liver cells, while genes involved in the breakdown of alcohol are expressed in liver cells but not in skin cells. By analogy, a piano can create an infinite number of different tunes, despite containing only eighty-eight keys, because different combinations of keys can be pressed at any one time.
Aristotle noted that chicken embryos changed their morphology as they grew, and was therefore one of the first scientists to observe epigenetics in action. However, it was not until the seventeenth century, when significant advances in microscopy were made by Antonie van Leeuwenhoek, that the true extent of structural variability in tissues and cells was fully appreciated. Despite this progress, there was still great resistance to the concept that organisms undergo morphological change during their development, and an alternative model, mooted largely by creationists, suggested that embryos were miniature versions of their adult form. This concept, called Preformationism, dated back centuries and is best exemplified by Nicolaas Hartsoeker’s iconic 1695 drawing of the homunculus (Figure 1), a miniature human being encased within a sperm (indeed, the translation of spermatozoon, the scientific term for sperm, is “little living being”). According to the preformation model, all embryos were “ready-made” within a single human being at the beginning of life on earth, with each embryo containing smaller embryos, and within those even smaller embryos, rather like Russian Matryoshka dolls.
Late in the nineteenth century, more detailed and beautiful drawings depicting the changing anatomy of the developing embryo caused the Preformationism model to wane, and the challenge then became to understand how a defined set of genetic information could give rise to different tissue types. In 1893, the revered evolutionary biologist August Weismann proposed the Germ Cell Plasm theory. This stated that organisms are composed of germ cells (sperm and eggs), which transmit genetic information to offspring, and somatic cells (such as skin and liver cells), which carry out all other functions. Although this idea was essentially correct, Weismann also hypothesised that somatic tissues differentiated along distinct pathways because they gradually lost genetic material. Evidence to refute this came in 1958, when John Gurdon demonstrated that tissues retain a complete set of DNA as they differentiate. Earlier, in 1942, Conrad Waddington had coined the term “epigenetics” when introducing his metaphor of the “epigenetic landscape”; this idea, together with Gurdon’s work, is described in more detail in an accompanying Mill Hill Essay by Ben Martynoga. Barbara McClintock, and subsequently François Jacob and Jacques Monod, discovered the existence of elements within the DNA that can control the expression of nearby genes. These studies signified the dawn of molecular epigenetics, i.e. the elucidation of the precise mechanisms by which gene expression can be controlled.
Before thinking about how epigenetics works at the level of molecules, we must first understand how genes are organised within the nucleus. Each human cell carries 46 chromosomes, organised into 23 pairs. When added together, the DNA from these chromosomes is approximately two metres long in each cell. Quite staggeringly, this length of DNA is packaged into a nucleus that is only two thousandths of a millimetre in diameter. This incredible feat of cellular engineering is achieved by wrapping the DNA around a group of proteins called histones, in a manner akin to winding cotton around a spool. Together, the DNA and its associated histones and other proteins are termed “chromatin” (Figure 2).
How is DNA packaging controlled? The answer lies in the histones, the spools around which DNA is wound. The core histones come in four types, known as H2A, H2B, H3 and H4, and possess long tails that protrude from the chromatin. These tails can be chemically modified by so-called chromatin-modifying enzymes, which add small chemical groups such as methyl or acetyl groups (Figure 2). It was originally believed that histone modifications influence gene expression by affecting how densely chromatin is packaged. For instance, acetylation of histone H3 neutralises the positive charge on its tail, and this could weaken its attraction to the negatively charged DNA, thereby loosening the chromatin and allowing access of proteins required for the initiation of gene expression. However, the situation is far more complex, and it appears instead that histone modifications act as docking sites for other chromatin proteins, whose job is to directly activate or inhibit the expression of specific genes. Because many histone molecules are associated with each gene, and each histone can be subject to many different chemical modifications, the number of possible combinations of histone modifications that can exist at each gene is vast. Indeed, a recent study found around four thousand different combinations of histone modifications scattered across the human genome. Brian Strahl and C. David Allis proposed in 2001 that these different combinations of histone modifications could bring about different gene expression patterns, thus adding complexity to the (already complex) genetic code. This hypothesis is called the “histone code”. This level of complexity could help to explain the intricate changes in morphology that accompany embryonic development, as first observed by Aristotle.
Defects in histones and histone modifications are associated with diseases, most notably cancer. Mutations in specific histones have been identified in a subclass of childhood brain cancers called gliomas, and mutations in chromatin-modifying enzymes have been detected in cancers of the stomach and uterus. Early in their development, most types of cancer exhibit abnormalities in the pattern of histone modifications across their genome, one of the commonest being loss of acetylation of histone H4. The removal of acetyl groups from histones is catalysed by enzymes called histone deacetylases. To counteract the loss of histone acetylation, histone deacetylase inhibitors are now being used successfully as anti-cancer therapies; for example, Vorinostat is used for the treatment of lymphoma, and many others are completing clinical trials. This is an impressive example of how epigenetics can be manipulated therapeutically to treat disease.
Histone modifications are not the only way that genes can be switched on or off. Another mechanism is via chemical modification of the DNA itself by methylation (Figure 2). DNA methylation has a special place in history as the first epigenetic modification to be discovered. Although the existence of DNA methylation had been known for some time, it was not until 1975 that Robin Holliday, who worked at NIMR, and US scientist Art Riggs independently suggested that DNA methylation might act to repress gene expression, an idea later borne out by experiments in different model organisms. Another key suggestion made by these visionaries was that DNA methylation might act as a permanent “mark” on genes, influencing their expression not in a temporary fashion but across many cell generations. The idea that epigenetic marks can influence genes in such a sustained manner has forced us to rethink the concept of epigenetics, which we now define as changes in gene function or expression that do not involve alterations in DNA sequence and that are heritable, i.e. can be passed from mother cell to daughter cell.
DNA methylation plays a well-established role in many types of disease. For instance, abnormalities of DNA methylation are associated with developmental disorders of the brain. Rett syndrome is a debilitating disease of the central nervous system associated with poor development, epilepsy and decreased social interaction. The condition is thought to be caused by the expression in neurons of many genes that should be silent, which leads to profound neurological consequences. Most cases of Rett syndrome arise as a result of mutations in a gene called MECP2, which encodes a protein that binds to methylated DNA and helps to keep genes in an inactive state.
A third mechanism by which gene expression can be controlled is via RNA molecules (Figure 2). The central dogma of molecular biology states that genes are expressed, giving rise to RNA, which is subsequently used to make a specific protein. This was succinctly summarised by the 1968 Nobel Prize winner Marshall Nirenberg as: “DNA makes RNA makes protein”. However, it is now becoming clear that many RNA molecules are not used as templates for protein synthesis but are instead involved directly in the regulation of gene expression. These are called “non-coding RNAs”, and they currently represent one of the most rapidly developing and, in my opinion, most exciting areas of research in epigenetics.
The true impact of non-coding RNAs on human biology is best demonstrated by the recent results of the ENCODE (ENCyclopedia Of DNA Elements) Project, which aims to identify, on a genome-wide scale, the DNA elements that control how genes are regulated in different tissues and in disease. The human genome contains approximately twenty thousand genes that make proteins, i.e. that are “protein-coding”, but these genes occupy only two percent of the total DNA. The remaining ninety-eight percent of DNA has often been referred to as “junk DNA”, reflecting the fact that it had no obvious function. The ENCODE Project discovered that junk DNA actually contains many thousands of genes that produce non-coding RNAs, demonstrating that this class of genes is much more common than previously thought. Some controversy exists about whether many of these genes have important functions. For instance, some may be parasitic, so-called “transposable elements”, which hitch a ride in the human genome, while others may be “pseudogenes”: genes that have lost their function through mutation but have yet to disappear completely from the DNA sequence. Nevertheless, the ENCODE findings are significant when considered alongside studies that aim to identify DNA changes linked to disease, so-called “genome-wide association studies”. These have revealed that around 90% of the DNA variants associated with disease lie outside the protein-coding sequences of genes, rather than within them. This suggests that many diseases could arise as a result of mutations in non-coding genes rather than in protein-coding ones.
Non-coding RNAs have also been invoked to explain the fundamental differences in complexity between organisms. A sobering and poorly appreciated fact is that although humans have far more complex body systems, such as the nervous system, they carry approximately the same number of protein-coding genes as “simpler” organisms like worms and sea sponges. This fact has baffled evolutionary biologists for many years. Interestingly, preliminary studies suggest that humans have many more non-coding RNA genes than simpler organisms. The larger number of non-coding RNAs present in humans might permit protein-coding genes to be regulated in a much more refined way, and this may in turn have allowed more sophisticated body systems to evolve.
Non-coding RNAs form many different classes and have a variety of roles. Perhaps the best-studied of these roles is in X chromosome inactivation. Male and female mammals share the same genetic material with the exception of the sex chromosomes, the specialised chromosomes that are important for sex determination and the formation of germ cells. Males have a single X chromosome and a single Y chromosome (XY), while females have two X chromosomes (XX). The difference in X chromosome number between males and females is problematic, because it creates an imbalance in the dosage of X-linked genes between the sexes. To resolve this, one of the two X chromosomes is inactivated in each and every cell in the female, the result being that female cells express only a single dose of X-linked genes, just like males. The first clue to X chromosome inactivation came from Ewart Bertram and Murray Barr, who in 1948 found that the nucleus of female neurons contained a dark chromatin body that was not apparent in male neurons. This structure, the Barr body, was later identified by Mary Lyon as the inactive X chromosome. X chromosome inactivation is mediated by a single non-coding RNA called Xist. The Xist RNA is expressed from the X chromosome that is destined to be inactivated, and it exhibits the remarkable property of being able to physically coat the whole X chromosome, leading to silencing of hundreds of genes (Figure 3). Xist represses gene expression by recruiting chromatin-modifying enzymes to the X chromosome and, later, by inducing DNA methylation at X-linked genes. This is a wonderful example of how different epigenetic mechanisms cooperate to control gene expression, and is likely to be a recurring theme in many different epigenetic processes.
As a scientist, I am frequently asked two questions: what do you think are the biggest challenges in your field, and what do you think the next big discoveries in your field will be? These are difficult to answer because epigenetics is still, at least at the molecular level, a relatively young field, with paradigm-shifting discoveries being published on an almost weekly basis. Nevertheless, one area in which I think epigenetics will have a central role is regenerative medicine. A great many degenerative diseases, such as Parkinson’s disease and motor neurone disease, cause irreversible tissue loss, and identifying a source of material for replacement therapy is challenging. As already discussed, tissues vary only in their epigenetic and not their genetic state. It should therefore be possible in principle to take an unaffected tissue type from a patient, such as blood cells, and use experimental approaches to erase its epigenetic marks and reset them to those of the affected tissue type, such as neurons, which could then be used for replacement therapy. This is clearly very challenging, and will first require a more detailed knowledge of how epigenetic states are created and reset during development, and of how the epigenetic signature of one tissue differs from another. Nevertheless, if it succeeds, it could prove to be a very powerful therapeutic strategy.
Two other areas of epigenetics that are likely to receive great attention are the effect of the environment on epigenetics, and the extent to which epigenetic abnormalities that cause disease in one generation can be passed on to, and affect health in, subsequent generations. There is currently a reasonable amount of evidence that adverse environmental effects can lead to epigenetic changes, especially during critical periods of development. Experiments have shown that starvation in female rats during pregnancy can result in the epigenetic silencing of genes essential for metabolism in their offspring. In addition, exposure of young mice to stress can lead to specific changes in DNA methylation at genes involved in social behaviour, and these epigenetic changes can persist for the rest of the animal’s life. Evidence that deleterious epigenetic changes can be passed from parent to offspring is also mounting, but remains somewhat controversial. A particularly interesting recent study found a link between paternal diet and disease in offspring. Male rats were given a high-fat diet in order to induce weight gain and diabetes. These males were then mated, and the resulting daughters were fed a normal diet. Remarkably, these daughters developed diabetes at a young age, while daughters born to non-diabetic fathers did not. This finding suggests that environmental effects can impose epigenetic changes on a father’s sperm that can be inherited and influence the health of his offspring. These kinds of experiments are topical and provocative, because they raise questions about the extent to which our life experiences, including what we eat and how we behave, can affect the well-being of our children, and even our children’s children.
However, a major challenge is to identify the molecular mechanisms by which these effects can be passed on. With the experimental tools to address this and other questions now at hand, there is every reason to believe that epigenetics will revolutionise our understanding of human pathology in the twenty-first century, in the same way that genetics did in the twentieth century.