High Mobility Group proteins (HMGs) are a set of chromatin proteins first identified in the 1970s because they are very abundant and run fast on SDS–PAGE. In these and other properties they resemble histones. And like histones, which have seen a resurgence of interest thanks to the discovery that their modification modulates transcription, HMGs are staging a comeback. They now appear to be important and versatile players in the same complex plot: they regulate the expression of genes in normal or pathological conditions.
About a hundred researchers from four continents gathered for 2 days (May 1 and 2, 2000) at the Lister Hill Center of the NIH, Bethesda, to discuss HMGs. Here we report on both the general picture and some of the most novel results (at least to us). Only work published in the last year is cited in the References; for background information, excellent starting points are the reviews by Bustin (1999) and Wegner (1999).
A rose is a rose is a rose (but there are 2 × 3 HMGs)
HMGs were discovered by the British scientist H.M. Goodwin, which has led some to speculate that their name reflects the initials of the discoverer, or of Her Majesty's Government. Today, the name refers to two classes of proteins: the canonical HMGs, and HMG‐motif proteins. Canonical HMGs are ubiquitous in eukaryotes but absent from eubacteria and archaea. They can be divided into three groups that are completely dissimilar from one another at the level of sequence and structure, but are internally homogeneous: the HMG1/2, HMG‐14/17 and HMG‐I(Y) families (Table 1). Each family is characterized by a functional sequence motif: the HMG box, the nucleosome binding domain and the AT‐hook, respectively (Figure 1). HMG‐motif proteins contain one of these functional motifs, but the rest of their sequence is unrelated to that of the canonical HMGs.
Some quantitative considerations are relevant here: an ‘average mammalian cell’ (3 × 10⁹ bp genome) contains ∼10⁶ molecules of HMG1/2, 10⁵ of HMG‐14/17 and 10⁴ of HMG‐I(Y), whereas it would contain ∼2 × 10⁷ core histone molecules. HMG‐motif proteins are much rarer, much more diverse (dozens in a single organism) and can be cell‐specific or stage‐specific.
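These copy numbers are easier to compare as average densities along the genome. The Python sketch below is only back-of-the-envelope arithmetic on the figures quoted above, read as powers of ten (3 × 10^9 bp, 10^6, 10^5, 10^4 and 2 × 10^7); the names `GENOME_BP`, `COPIES`, `bp_per_molecule` and `histones_per_molecule` are illustrative and do not come from the meeting.

```python
# Order-of-magnitude figures for an 'average mammalian cell', as quoted in the text.
GENOME_BP = 3 * 10**9
COPIES = {
    "HMG1/2": 10**6,
    "HMG-14/17": 10**5,
    "HMG-I(Y)": 10**4,
    "core histones": 2 * 10**7,
}


def bp_per_molecule(copies, genome_bp=GENOME_BP):
    """Average stretch of genome (in bp) per protein molecule."""
    return genome_bp / copies


def histones_per_molecule(copies):
    """Number of core histone molecules per molecule of the given protein."""
    return COPIES["core histones"] / copies


for name, copies in COPIES.items():
    print(f"{name}: one molecule per {bp_per_molecule(copies):,.0f} bp, "
          f"{histones_per_molecule(copies):,.0f} core histones per molecule")
```

On these assumptions HMG1/2 comes out at roughly one molecule per 3 kb of DNA and HMG‐I(Y) at one per 300 kb. These are averages only, and say nothing about how the proteins are actually distributed along chromatin.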
This taxonomy is unfortunate, however, for a single name (HMG) refers to three very different motifs (or domains). Worse, HMG‐I and HMG1 belong to different families, but the typographical mimetism of 1 (one) and I (capital i) keeps confusing the uninitiated. Michael Bustin and his two co‐organizers therefore proposed changing the nomenclature, using new roots for each group (Table 1):
•HMGB (HMG box) for the HMG1 family;
•ATHI (AT‐hook) for the HMG‐I(Y) family; and
•NSBP (nucleosome binding protein) for the HMG‐14/17 family.
The new nomenclature did get some enthusiastic appreciation, but was also met with fears of discontinuity with the previous literature, and warnings that man should not divide what God united in a single gel. The issue should be discussed further, but many felt that semantic barriers to the field should be removed, and that the name HMG should be kept as a generic indication for the three families. After all, one point that emerged over and over at the meeting was that these proteins really do have a common overarching property: they change the local conformation of DNA and/or nucleosomes, and enhance their accessibility and plasticity.
AT‐hook proteins and the enhanceosome paradigm
AT‐hook proteins originally came into the limelight when Thanos and Maniatis identified HMG‐I(Y) as a critical component in the formation of an ordered assembly of proteins over enhancer DNA. Such structures, where protein–DNA and protein–protein interactions synergize with each other, were dubbed enhanceosomes.
HMG‐I and HMG‐Y are generated by alternative splicing of the same gene, while the related protein HMGI‐C is encoded by a different gene. All of these proteins contain three AT‐hooks. This motif comprises a sequence of amino acids of the general form Pro‐Arg‐Gly‐Arg‐Pro which, in the absence of DNA, has no defined conformation. However, this sequence binds to the minor groove of runs of A/T bases (see Figure 1A), adopting a crescent‐shaped conformation that is reminiscent of the minor‐groove binding drug netropsin. The occupancy of the minor groove by a single AT‐hook bends the DNA mildly towards the major groove, whereas the binding of multiple AT‐hooks can induce complex structural alterations.
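As a rough illustration of how the AT‐hook core could be located in a protein sequence, the sketch below scans for the one‐letter string PRGRP, which is simply the Pro‐Arg‐Gly‐Arg‐Pro form quoted above. It is a minimal example under stated assumptions: the toy sequence is hypothetical and is not a real HMG‐I(Y) sequence, the function name `find_at_hook_cores` is invented here, and real AT‐hooks vary in their flanking residues, which an exact‐match scan ignores.

```python
import re

# Core AT-hook motif as given in the text: Pro-Arg-Gly-Arg-Pro (one-letter code PRGRP).
# A zero-width lookahead is used so that overlapping occurrences are also reported.
AT_HOOK_CORE = re.compile(r"(?=PRGRP)")

# Hypothetical toy sequence for illustration only, NOT a real HMG-I(Y) sequence.
TOY_SEQUENCE = "AAAPRGRPAAAAGRPRGRPKAAAPRGRPAA"


def find_at_hook_cores(protein_seq):
    """Return the 0-based start position of every PRGRP core in the sequence."""
    return [m.start() for m in AT_HOOK_CORE.finditer(protein_seq.upper())]


print(find_at_hook_cores(TOY_SEQUENCE))
```

The toy sequence was built to contain three copies of the core, mirroring the three AT‐hooks per protein mentioned in the text.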
The archetypal enhanceosome is formed on the virus‐inducible enhancer of the human interferon‐β gene (IFN‐β), whose overlapping sites are bound by the transcription factors ATF‐2/c‐jun, IRF1/3/7 and NF‐κB and by the protein HMG‐I(Y). The IFN‐β enhancer DNA is spontaneously curved towards the minor groove, and this conformation is unfavorable for recognition by its cognate transcription factors. HMG‐I(Y) binds to multiple sites within the enhancer via both intra‐ and intermolecular interactions, reversing the curvature of the DNA towards the major groove. NF‐κB and ATF‐2/c‐jun now bind better, thanks to the more favorable conformation of DNA. Multiple protein–protein interactions among the transcription factors and the HMG‐I(Y) molecules then stabilize the enhanceosome, which presents a stable surface that in turn recruits the transcriptional coactivator CBP/p300 as a complex with the pol II holoenzyme (Yie et al., 1999).
D. Thanos (New York, NY) presented a full movie of the life and works of the IFN‐β enhanceosome, as determined by immunoprecipitation of formaldehyde‐fixed chromatin. The IFN‐β gene starts to be transcribed ∼6 h after infection, grows into the most actively transcribed gene in the cell, and is turned off within 24 h. Prior to activation, the enhancer is nucleosome‐free, but is surrounded by localized nucleosomes, one of which abuts the TATA box. Two hours after infection, NF‐κB binds to its site, marking the formation of the enhanceosome. At 4 h, GCN5 is present, and within 6 h it acetylates HMG‐I(Y) on Lys71, as shown by a specific anti‐K71‐HMG‐I(Y) antibody. HMG‐I(Y) was presumably already associated with the enhanceosome, but K71 acetylation stabilizes the enhanceosome even further. CBP, Pol II and BRG1 appear at 6–9 h. The human SWI–SNF complex containing BRG1 in turn makes the chromatin template more accessible. TFIID recruitment becomes apparent at 9–12 h and is complete at 19 h. When HMG‐I(Y) is eventually acetylated by CBP on Lys65, this marks the disruption of the enhanceosome, and by 24 h both transcription and enhancer occupancy decline. Acetylation of HMG‐I(Y) thus appears to be the molecular switch that turns transcription on and off.
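The order of events in this time course can be kept straight with a small lookup structure. The Python sketch below is only a restatement of the times given above, not data from the talk itself: where the text gives an interval (for example 6–9 h) the lower bound is used as an assumption, the Lys65 acetylation is left out because no time is stated for it, and the names `TIMELINE` and `events_reached_by` are invented for illustration.

```python
# Approximate times (hours after infection) taken from the narrative above.
# Where the text gives an interval (e.g. 6-9 h), the lower bound is used: an assumption.
TIMELINE = [
    (2, "NF-kB binds its site"),
    (4, "GCN5 present"),
    (6, "HMG-I(Y) acetylated on Lys71; transcription starts"),
    (6, "CBP, Pol II and BRG1 appear"),
    (9, "TFIID recruitment becomes apparent"),
    (19, "TFIID recruitment complete"),
    (24, "transcription and enhancer occupancy decline"),
]


def events_reached_by(hours):
    """List every event whose (approximate) onset is at or before `hours`."""
    return [event for onset, event in TIMELINE if onset <= hours]


for t in (2, 6, 12, 24):
    print(f"{t:>2} h: " + "; ".join(events_reached_by(t)))
```

Listing the events cumulatively makes the point of the narrative visible: the coactivators and the basal machinery arrive only after NF‐κB binding and the first HMG‐I(Y) acetylation.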
Other genes are supposed to work like IFN‐β, with the assembly/disassembly of an enhanceosome controlling the recruitment of the Pol II holoenzyme. However, so far there is a dearth of bona fide enhanceosomes. R. Reeves (Pullman, WA) reported that HMG‐I(Y) remodels the chromatin at the promoter of the human IL‐2Rα gene following lymphocyte activation. HMG‐I(Y) has the ability to bind to DNA packaged in nucleosomes (Reeves et al., 2000), and its multiple post‐translational modifications (phosphorylations and at least six different methylations, as well as acetylations) modulate this ability (Banks et al., 2000). A large number of other genes require HMG‐I(Y) for their correct expression, but the details of the interactions on their enhancers/promoters are less well known.
Fat, cancer and lipomas
A group of presentations focused on the role of HMG‐I(Y) and HMGI‐C in normal and pathological adipocyte differentiation. HMGI‐C is predominantly expressed in proliferating, undifferentiated mesenchymal cells and is absent in adult tissues. However, the HMGI‐C gene is rearranged, and expressed, in a number of tumors of mesenchymal origin, including lipomas, hamartomas and leiomyomata. The latter, popularly known as uterine fibromas, are probably the most common tumors; they affect between 20 and 70% of all females of reproductive age, according to different estimates. In most rearrangements, breaks occur within the large third intron of HMGI‐C, resulting in a protein that bears the three AT‐hooks fused to a different peptide.
K. Chada (Piscataway, NJ) gave an exciting report on the role of HMGI‐C in adipogenesis and obesity. Hmgi‐c−/− mice (corresponding to the classical pygmy mutants) are much smaller than their siblings, with an almost 20‐fold decrease in fat cells and a reduction in mesenchymal tissues. The gene is always on in preadipocytes, but in wild‐type mice fed a normal diet expression is undetectable. On a high‐fat diet, Hmgi‐c expression becomes significant because of an expansion of the preadipocyte population. Disruption of Hmgi‐c causes a striking reduction in obesity of leptin‐deficient mice (Lepob/Lepob), in a dosage‐dependent manner: Hmgi‐c+/+ Lepob/Lepob weigh over three times more than Hmgi‐c−/− Lepob/Lepob animals, and Hmgi‐c+/− Lepob/Lepob are intermediate (Anand and Chada, 2000). HMGI‐C therefore appears to be crucial in fat cell proliferation, and is misexpressed in lipomas.
The question arises whether expression of Hmgi‐c per se or the fusion to other genes causes neoplastic transformation. A talk by A. Fusco (Naples, Italy) and a poster by P. Arlotta (Cambridge, MA) presented independent evidence that in transgenic mice the truncated Hmgi‐c gene is sufficient for adipocyte growth and transformation (Battista et al., 1999; Arlotta et al., 2000). Remarkably, only lipomas appear, although the gene is now expressed in most tissues. The HMG‐I(Y) protein can play a role similar to that of HMGI‐C in adipocyte differentiation: suppression of HMG‐I(Y) synthesis by antisense treatment blocks differentiation of 3T3‐L1 preadipocytic cells. Moreover, HMG‐I(Y) interacts physically and functionally with C/EBP‐β, a transcription factor that controls the expression of fat‐related genes such as the leptin gene itself.
The HMG1 family
In contrast to the proteins of the HMG‐I(Y) family, which are expressed at low levels in differentiated tissues, HMG1 is present in large amounts. Give or take some variation, there might be one copy of HMG1 for every 2 kb of the human genome. HMG1 contains just two HMG boxes and an acidic tail of 30 aspartic and glutamic acids. Moreover, its HMG boxes have no sequence specificity on their own, and indeed bind very inefficiently to standard, B‐form DNA. However, HMG1 and its individual boxes bind very well to DNA substrates that have a wide minor groove: four‐way junctions, severely undertwisted DNA, DNA crosslinked by cisplatin drugs and DNA at the entry and exit sites of nucleosomes. Conversely, when HMG1 binds to normal B‐form DNA, either because it is present in very high concentrations or because it is recruited by interaction with other proteins, it bends the double helix very significantly.
Why does HMG1 contain two boxes when in most in vitro conditions just one will do? J. Thomas (Cambridge, UK) showed that the two boxes are biochemically similar but not completely equivalent, and R. Johnson (Los Angeles, CA) presented evidence that two boxes, and not just one, are required for enhanceosome formation by HMG1. The BHLF1 proximal promoter in the Epstein–Barr virus is activated following the binding of two dimers of the viral ZEBRA transcription factor to closely spaced sites (Ellwood et al., 2000). HMG1 facilitates their binding, and binds to DNA between them. HMG1 does not bind to the promoter in the absence of ZEBRA, nor to ZEBRA in the absence of the promoter. Moreover, ZEBRA requires the architectural function of HMG1, as shown by the lack of transcriptional activity of a promoter where the spatial arrangement of the ZEBRA proteins has been altered by changing the helical phase of their sites on DNA. Perhaps a recurrent theme here is that architectural proteins are heavily involved in the fast switching of gene expression (both on and off): in support of this, yeast lacking its HMG1‐related proteins NHP6A and ‐B is severely retarded in the kinetics of activation of several inducible genes. Nonetheless, these genes appear to attain some level of transcription, and the double knockout of NHP6A and ‐B is not lethal.
The cell's appetite for HMG1 is surprising: transient overexpression of HMG1 by transfection enhances the biological activity of HMG1 interactors, be they HOX proteins, RAG1/2 recombinase, p53 or steroid hormone receptors. The effects are not sensational (between 3‐ and 10‐fold), but as someone pointed out: if I were twice as tall, it would change my way of life. Indeed, the transient 2‐fold increase in the cellular content of HMG1 that can be achieved by treating cells with progesterone or estrogen sensitizes the same cells to the cytotoxic effects of cisplatin drugs (HMG1 bound to cisplatin adducts inhibits nucleotide excision repair). S. Lippard (Cambridge, MA), who reported the finding (He et al., 2000), is thus organizing a clinical trial to investigate whether the antitumor efficacy of cisplatin can be augmented by treating patients with steroid‐sensitive tumors with the appropriate steroid agonist.
Despite being apparently limiting in the cell, HMG1 is not essential for cellular life. M.E. Bianchi (Milan, Italy) reported that knockout mice for HMG1 are born, although they die just after birth because of problems with glucose metabolism, or later because of a large number of subtle defects if rescued from hypoglycaemia with parenteral glucose administration (Calogero et al., 1999). Cell lines containing no HMG1 can be established and grow normally, although they are less responsive to steroid hormones. HMG2 might provide partial redundancy to HMG1, but HMG2 is expressed at significant levels only during early embryogenesis, and in lymphoid tissues and testis in the adult. HMG2 knockouts only show reduced fertility in males, which correlates with increased apoptosis of germ cells in seminiferous tubules and production of immobile spermatozoa. Double knockouts for HMG1 and HMG2 are being bred, and will be closely scrutinized for chromatin organization, gene expression, and immunological defects (which neither single knockout has). In vitro experiments clearly show that either HMG1 or HMG2 is required for efficient V(D)J recombination, possibly because HMG1/2 render the sequences recognized by the recombinase RAG1/2 accessible even when packaged in nucleosomes (M. Oettinger, Cambridge, MA).
D. Edwards (Denver, CO) showed that HMG1 (or HMG2) greatly facilitates the binding of hormone receptors to DNA. The effect is ∼10‐fold on canonical palindromic receptor binding sites, and in vitro HMG1 even makes possible the binding to half‐sites. This may be very relevant, since several hormone‐responsive genes do not contain recognizable receptor binding sites. HMG1 exerts its effect on the zinc finger DNA‐binding domains of steroid receptors, but not on those of non‐steroid receptors (the receptors for retinoic acid, thyroid hormone and vitamin D). The difference appears to be rooted in the presence of an additional α‐helix C‐terminal to the zinc fingers of non‐steroid receptors, which touches the minor groove almost on the opposite side of the bases recognized in the major groove. When this C‐terminal extension (CTE) is grafted onto steroid receptors, they bind DNA much better and become HMG1‐insensitive. Conversely, non‐steroid receptors without the CTE bind DNA with reduced affinity, and are HMG1‐sensitive. Part of this effect might be mediated via DNA conformation: the target sites get bent upon binding of hormone receptors, and HMG1 might stabilize the bending. The same may be true in other cases as well: p53, RAG1 and TBP (another interactor of HMG1) all bend DNA to varying extents.
HMG‐box proteins and cell fate determination
Sox (Sry‐related HMG box) proteins are a growing family of sequence‐specific DNA binding proteins characterized by an HMG domain highly related to that of Sry, the mammalian male sex‐determining factor. Sox proteins are present throughout the animal kingdom and are involved in several major cell fate determination processes. All Sox proteins are able to bind and bend the same DNA sequences recognized by Sry. Protein–protein interactions with other transcription factors play a crucial role in selecting their in vivo target genes, enabling each one to perform a specific function.
R. Lovell‐Badge (London, UK) gave an exhaustive talk on the role of Sox2. Sox2 is expressed both within the multipotent precursor cells of the developing central nervous system, and in the early mouse embryo. There, it controls the series of cell fate decisions that restrict developmental potential in an asymmetric fashion. The first decision is between making cells of the inner cell mass (ICM) or the trophectoderm. The ICM cells then develop into either the epiblast (giving rise to the embryo) or the primitive endoderm (forming extraembryonic tissues). Each of the early lineages depends on the others for its survival and differentiation.
Gene inactivation experiments reveal that Sox2 is required in at least two separate lineages. Sox2 protein is apparently laid down in the oocyte cytoplasm and maternally inherited; all blastocyst cells have Sox2 protein, but this is nuclear only in the ICM. The protein is very stable, and can be detected in the nucleus of +/+, +/− and −/− ICM cells. Sox2−/− embryos develop normally up to blastocyst stage, but then ICM cells fail and only fragments of extraembryonic tissue remain after implantation. If Sox2+/+ ES cells are placed within −/− blastocysts, they rescue the epiblast, and the embryo survives until Sox2−/− extraembryonic ectoderm fails. Thus, Sox2 plays a crucial role both in the embryo proper and in extraembryonic ectoderm. Apparently, it is the combination of OCT4 and SOX2 that determines cell fate: only Sox2 is present in extraembryonic ectoderm, only OCT4 in extraembryonic endoderm, but the two proteins are coexpressed in ICM.
Very little is known so far about the molecular targets of Sox proteins, but M. Wegner (Erlangen, Germany), and V. Lefebvre and B. de Crombrugghe (Houston, TX) found some.
Sox10 is expressed in the emerging neural crest, and later in glial cells of the peripheral and central nervous systems. Inactivation of one Sox10 allele causes dominant defects in multiple neural crest‐derived lineages, leading to Waardenburg–Hirschsprung disease (characterized by combined deafness, pigmentation defects and aganglionic megacolon) in humans, and the similar Dom (dominant megacolon) phenotype in mice. Also, loss of both Sox10 alleles in mouse is embryonic lethal and wipes out melanocytes, the enteric nervous system and peripheral glia. ErbB3, which codes for a neuregulin receptor essential for Schwann cell development, was identified as a Sox10 target because its expression pattern in wild‐type embryos parallels that of Sox10, and Dom/Dom mice lose erbB3 expression in the dorsal root ganglia. Protein zero (P0) is a major component of the myelin sheaths of differentiated, axon‐embracing Schwann cells. Sox10 binds as a monomer or as a dimer to distinct sites within the P0 promoter, in both cases activating gene expression (Peirano et al., 2000).
L‐Sox5, Sox6 and Sox9 code for master chondrogenic transcription factors. The closely related L‐Sox5 and Sox6 proteins can dimerize and bind pairs of target sequences, but harbor no transactivation domain and are thought to act as purely architectural factors, whereas Sox9 can both bind DNA as a monomer and transactivate genes, acting as a classical transcription factor. The three proteins are expressed at high level at every site of chondrogenesis in the mouse embryo, and cooperatively transactivate the Col2a1 collagen 2 gene. Knockouts demonstrate that each of this trio of genes is essential for cartilage formation.
R. Grosschedl (Munich, Germany) was the first to demonstrate an architectural activity for a eukaryotic transcription factor, LEF‐1. LEF‐1 was initially identified as being lymphocyte‐specific, but is now known to control a vast number of differentiation processes. In association with intranuclear β‐catenin it performs the last step in the WNT signaling pathway: turning on the appropriate genes. In this mode, LEF‐1 regulates inductive interactions between epithelial and mesenchymal tissues, for example in tooth formation: Wnt10 signals to the epithelial cells, where LEF‐1 in combination with β‐catenin binds to a specific enhancer of the Fgf4 gene. The LEF‐1 knock‐out has defective tooth formation, which can however be rescued by the placement of FGF4‐releasing beads at the epithelial‐mesenchymal junctions in organ cultures. LEF‐1 also controls the expression of the TCRα gene, but here the situation is completely different. β‐catenin is not involved, and just the HMG‐box domain of LEF‐1 (without the β‐catenin binding domain) is capable of partially rescuing the thymus phenotypes of Lef‐1 Tcf‐1 double knockouts. The architectural function of LEF‐1 is all that is needed, and actually the TCRα enhancer sequence can be tweaked in such a way that the other transcription factors comprising the putative enhanceosome can touch each other in the correct way, even in the absence of LEF‐1.
HMG‐14 and ‐17 assist transcription
HMG‐14 and ‐17 proteins wedge between the core histones and the DNA gyres in nucleosomes, or rather in ∼1% of them. This average is however misleading: first, because two HMG‐14 or two HMG‐17 molecules bind to the same nucleosome (mixed HMG‐14/HMG‐17 binding to the same nucleosome is not observed), and second, because runs of at least six nucleosomes are associated with the same ‘flavor’ of HMG‐14/17 (Y. Postnikov, Bethesda, MD). This implies that HMG‐14/17 alter both intra‐ and inter‐nucleosomal contacts, with an extensive cooperative effect. HMG‐14/17 ‘tagging’ of nucleosomes owes little to DNA sequence, and depends on the transcriptional activity of the DNA packaged into nucleosomes. R. Hock (Würzburg, Germany) reported that in actively transcribing cells HMG‐17 is dispersed in a punctate pattern throughout the nucleus, but relocalizes to interchromatin granule clusters (IGC) upon transcription blockage with α‐amanitin or actinomycin. Likewise, HMG‐14 colocalizes with BrUTP in in situ run‐ons. Treatment of cells with synthetic peptides corresponding to the nucleosome binding domains of HMG‐14/17 competes with binding by the full‐length proteins, and arrests transcription.
J. Herrera (Bethesda, MD) showed that the localization of HMG‐14/17 also responds to the progression of the cell cycle, and depends on post‐translational modifications (Bergel et al., 2000). PCAF and p300 acetylate both proteins at several sites, including the bipartite nuclear localization signal (NLS) and the NBD. Moreover, both proteins can be multiply and differentially phosphorylated. Phosphorylation is highest during M phase, and causes the detachment of both proteins from nucleosomes. After mitosis, HMG‐14/17 must be dephosphorylated to re‐enter the nucleus. L. Mahadevan (Oxford, UK) presented evidence that phosphorylation of HMG‐14 may play a role in the expression of immediate‐early genes (Thomson et al., 1999).
Despite the non‐equivalence of HMG‐14 and HMG‐17 binding to nucleosomes, a poster by Y. Birger (Bethesda, MD) reported that Hmg‐14−/− mice are alive and well. Embryonic fibroblast lines established from the null mice grow significantly faster than wild type controls, and reach higher saturation densities. Before concluding that HMG‐14 is actually bad for your cells, consider that the null fibroblasts are significantly more sensitive to DNA damage. The rate of removal of thymidine dimers, for example, is much reduced. The implication is that HMG‐14 may also be responsible for access to chromatin by DNA repair machineries.
All three families of HMG play a role in chromatin that is between the structural and the regulatory. Whereas HMG‐14/17 modify nucleosomes directly, AT‐hook and HMG‐box proteins remodel the regions of chromatin involved in gene control, and organize enhanceosomes. The classical HMGs essentially contain just the DNA binding domains, which double up as domains for protein–protein interaction. In contrast, HMG‐box proteins may contain conventional activation domains, and the same protein can act both as a classical transcription factor and as an architectural component of chromatin, in a manner highly dependent on cell type and promoter context. For sure, HMG‐motif proteins are both important and versatile, and their action appears to be modulated by a plethora of post‐translational modifications. The meeting was deftly summed up by A. Wolffe (Richmond, CA): there must be money in proteins that control your fat, your teeth, your sex and your health (besides minor things such as transcription, cell division, and DNA recombination and repair). All the participants came away with scientific elation for an excellent meeting on Highly Motivating Great proteins, and high hopes of personal wealth.
The meeting was organized by M. Bustin (NCI, Bethesda), with R. Reeves and M.E.B. as co‐organizers. Special thanks to all participants, many of whom provided comments on this review. Nonetheless, opinions expressed here are those of M.B. and M.E.B., and not necessarily of the speakers. S. Müller, T. Bonaldi and P. Scaffidi helped us regenerate our incomplete recollections. The authors are supported by grants from AIRC, CNR Biotecnologie, MURST, and the TMR Program of the EU.
Copyright © 2000 European Molecular Biology Organization