This second chapter on transcription focuses on the cis-acting elements needed for accurate transcription, with an emphasis on promoters. The chapter begins with a discussion of techniques used to find the start site for transcription and to identify the segments of DNA bound by protein. It then covers promoters, elongation, termination, and mRNA structure. The phenomenon of polarity is explored to show the relationships among mRNA structure, transcription and translation in E. coli.
The nucleotide in DNA that encodes the 5' end of mRNA is almost always the start site for transcription. Thus methods to map the 5’ end of the mRNA are critical first steps in defining the promoter.
Figure 3.2.1. Nuclease protection to map 5’ end of a gene
This assay measures the distance between an end label (at a specific known site on DNA) and the end of a duplex between RNA and the labeled DNA. A fragment of DNA (complementary to the RNA) that extends beyond the 5' end of the RNA is labeled at a restriction site within the RNA‑complementary region. The labeled DNA is hybridized to RNA and then digested with the single‑strand specific nuclease S1. The resulting fragment of protected DNA is run on a denaturing gel to determine its size. Note that this fragment runs from the labeled site to the nearest interruption between the DNA and the RNA. This could be the beginning of the RNA, or it could be an intron, or it could be an S1 sensitive site.
Fig. 3.2.2. Nuclease protection assay to define the 3’ end of a gene.
This assay measures the distance between an end label and the point to which reverse transcriptase can copy the RNA. A short fragment of DNA, complementary to RNA, shorter than the RNA and labeled at the 5' end, is hybridized to the RNA. It will now serve as a primer for synthesis of the complementary DNA by reverse transcriptase. The size of the resulting primer extension product gives the distance from the labeled site to the 5' end of the RNA (or to the nearest block to reverse transcriptase).
Fig. 3.2.3. Primer extension assay, another way to map the 5’ ends of genes
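Both nuclease protection and primer extension reduce to the same arithmetic: the size of the fragment on the gel is the distance from the labeled nucleotide to the 5' end of the RNA. The following is a minimal Python sketch of that calculation. The function name, the 1-based coordinate system, and the strand convention are illustrative assumptions, not part of the assays themselves. The result is the true start site only if no intron, S1-sensitive site, or block to reverse transcriptase lies between the label and the 5' end.

```python
def map_five_prime_end(label_pos, fragment_len, strand="+"):
    """Estimate the genomic coordinate of the 5' end of an RNA.

    label_pos    : 1-based genomic coordinate of the end-labeled nucleotide
                   (hypothetical coordinate system, chosen for illustration)
    fragment_len : size in nt of the protected or extended fragment,
                   as read from the denaturing gel
    strand       : "+" if the gene is transcribed toward increasing
                   coordinates, "-" if toward decreasing coordinates
    """
    if fragment_len < 1:
        raise ValueError("fragment length must be at least 1 nt")
    if strand == "+":
        # The fragment spans from the start site up to label_pos, inclusive.
        return label_pos - fragment_len + 1
    if strand == "-":
        # Mirror image for a gene transcribed in the other direction.
        return label_pos + fragment_len - 1
    raise ValueError("strand must be '+' or '-'")


print(map_five_prime_end(250, 150))
```

With these made-up numbers, a 150 nt fragment labeled at position 250 of a gene on the + strand places the 5' end of the RNA at position 101.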
A technique utilizing the high sensitivity of PCR has been developed to determine the 5’ ends of mRNAs, which can then be mapped onto genomic DNA sequences to find the 5’ ends of genes. This technique is called rapid amplification of cDNA ends and is abbreviated RACE. When RACE is used to determine the 5’ end of mRNA, it is called 5’ RACE. This method requires two things: an artificial primer binding site added to the end of the cDNA that corresponds to the 5’ end of the mRNA, and knowledge of a specific sequence within the cDNA, which serves as the second, specific primer for amplification during PCR (Fig. 3.2.3.b).
Fig. 3.2.3.b. Rapid amplification of cDNA ends, or 5’ RACE
The methods for making cDNA from mRNA are more prone to copy the 3’ ends and middle of mRNAs than the 5’ ends. Thus it is common to have access to this part of the cDNA, and that provides the sequence information for the second, or internal, primer. In contrast, specialized techniques are often employed to get information about the 5’ ends of mRNAs. In the technique outlined in Fig. 3.2.3.b, the fact that reverse transcriptase tends to add a few C residues to the 3’ end of the cDNA is used to design an artificial template that will anneal to those extra C nucleotides. Then reverse transcriptase copies the second template, thereby adding the artificial primer binding site. This artificial primer binding site is needed because the sequence of the 5’ end of the mRNA is not known in this experiment; indeed, that is what the experimenter is trying to determine. Once the artificial primer binding site has been added to the cDNA, then the modified cDNA serves as the template for PCR. The PCR product is sequenced and compared to an appropriate genomic DNA sequence. The first exon or exons of the genes will match the sequence of the PCR product, starting right after the first primer.
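The final comparison step, matching the sequenced PCR product to genomic DNA, can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the sequences are invented, the adapter string is assumed to include any nucleotides contributed by the extra C residues, and the first exon is assumed to be at least `seed_len` nucleotides long so that an exact match can be found. The function name and parameters are hypothetical.

```python
def map_race_product(product, adapter, genome, seed_len=12):
    """Locate the 5' end of a gene from a 5' RACE product.

    product  : sequence of the PCR product, read in the mRNA sense
    adapter  : sequence of the artificial primer binding site at its start
    genome   : genomic DNA sequence (same strand as the mRNA)
    seed_len : number of nt after the adapter used for the exact match
               (hypothetical parameter; assumes no intron within the seed)

    Returns the 0-based index in `genome` of the first transcribed
    nucleotide, or -1 if the seed is not found.
    """
    if not product.startswith(adapter):
        raise ValueError("product does not begin with the adapter sequence")
    cdna = product[len(adapter):]      # sequence right after the first primer
    seed = cdna[:seed_len]
    return genome.find(seed)


# Invented sequences for illustration only.
genome = "TTTTACGTGCATGCAAGGCTAGCTAAAA"
adapter = "GGAATTCC"
product = adapter + "GCATGCAAGGCTAGC"
print(map_race_product(product, adapter, genome))
```

A real analysis would use an alignment tool that tolerates sequencing errors and splits the product across exons; the exact string search here only shows the logic of "the match begins right after the first primer".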
a. Electrophoretic mobility shift assay (EMSA), or gel retardation assay: This assay tests for the ability of a particular sequence to form a complex with a protein. Many protein‑DNA complexes are sufficiently stable that they will remain together during electrophoresis through a (nondenaturing) polyacrylamide gel. A selected restriction fragment or synthetic duplex oligonucleotide is labeled (to make a probe) and mixed with a protein (or crude mixture of proteins). If the DNA fragment binds to the protein, the complex will migrate much more slowly in the gel than does the free probe; it moves with roughly the mobility of the bound protein. The presence of a slowly moving signal is indicative of a complex between the DNA probe and some protein(s). By incubating the probe and proteins in the presence of increasing amounts of competitor DNA fragments, one can test for specificity and even glean some information about the identity of the binding protein.
Figure 3.2.4. Diagram of results from an electrophoretic mobility shift assay
In this example, two proteins recognize sequences in the labeled probe, forming complexes A and B (lane 2). The proteins in complexes A and B recognize specific DNA sequences in the probe. This is shown by the competition assays in lanes 3-8. An excess of unlabeled oligonucleotide with the same sequence as the labeled probe (“self”) prevents formation of the complexes with labeled probe, whereas “nonspecific DNA” in the form of E. coli DNA does not compete effectively (compare lanes 6-7 with lanes 3-5).
This experiment also provides some information about the identity of the protein forming complex A. It recognizes an Sp1-binding site, as shown by the ability of an oligonucleotide with an Sp1-binding site to compete for complex A, but not complex B (lanes 9-11). Hence the protein could be Sp1 or a relative of it. The proteins forming complexes A and B do not recognize an Oct1-binding site (lanes 12-14).
b. Nitrocellulose binding: Free duplex DNA will not stick to a nitrocellulose membrane, but a protein‑DNA complex will bind.
The presence of a protein will either protect a segment of DNA from attack by a nuclease or other degradative reagent, or in some cases will enhance cleavage (e.g. to an adjacent sequence that is distorted from normal B‑form). An end‑labeled DNA fragment in complex with protein is treated with a nuclease (or other cleaving reagent), and the protected fragments are resolved on a denaturing polyacrylamide gel, and their sizes measured.
Any reagent that will cleave DNA in a non-sequence-specific manner can be used in this assay. Some chemical probes, such as copper ortho-phenanthroline, are very useful. Figure 3.2.5 presents a schematic diagram of the in vitro DNase I footprint analysis in the top two panels, and then an example of the results of binding a purified transcriptional regulator to its cognate site on DNA.
Figure 3.2.5. DNase footprint analysis
a. Methylation interference reactions: When a purine that makes contact with the protein is methylated by dimethyl sulphate (DMS), the DNA will no longer bind to the protein. Thus, DNA is gently methylated (about one hit per molecule), mixed with the protein, and then the bound complexes are separated from the unbound probe. The unbound probe will be modified at all sites (when the whole population of molecules is examined) but the bound DNA will not be modified at any critical contact points. The methylated DNA is then isolated, cleaved (with piperidine at high temperature, just like a Maxam and Gilbert sequencing reaction) and resolved on a denaturing gel. The critical contact points are identified by the clear areas in the lane of bound DNA; they correspond to sites that, when methylated, prevent the DNA from binding to the protein. DMS reacts mainly with G's at N‑7, which is in the major groove of the DNA, so these are the contacts most sensitive to this reagent.
b. Other reagents are specific for the minor groove or for the phosphodiester backbone.
Figure 3.2.6. Methylation interference assay.
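Reading a methylation interference gel amounts to a set comparison: bands present in the unbound (free) lane but missing from the bound lane mark the critical contacts. The Python sketch below illustrates that comparison. The band positions are invented, and the function name is hypothetical; a real gel would also require a judgment about partially reduced bands, which this sketch ignores.

```python
def contact_positions(free_bands, bound_bands):
    """Return positions methylated in free probe but absent from bound DNA.

    free_bands  : positions (nt from the labeled end) of cleavage products
                  seen in the lane of unbound probe
    bound_bands : positions seen in the lane of protein-bound DNA
    """
    # Methylation at a contact point blocks binding, so DNA methylated
    # there never appears in the bound fraction.
    return sorted(set(free_bands) - set(bound_bands))


# Invented band positions for illustration only.
free_lane = [3, 7, 12, 15, 21]
bound_lane = [3, 12, 21]
print(contact_positions(free_lane, bound_lane))
```

In this made-up example the G's at positions 7 and 15 would be scored as contact points.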
The specific binding sites (often 6 to 8 bp) can serve as an affinity ligand for chromatography. Multimers of the binding site are made by ligating together duplex oligonucleotides that contain the specific site. After a few crude initial steps (e.g. isolating all DNA‑binding proteins on DNA‑sepharose) the extract is applied to the affinity column. Most of the proteins do not bind, and subsequently the specifically bound proteins are eluted.
a. cis‑acting: A cis-acting regulatory element functions as a segment of DNA to affect the expression of genes on the same chromosome that it is located on. Cis-acting elements do not encode a diffusible product. The promoter is a cis-acting regulatory element.
Compare the phenotypes of mutations in the gene encoding β‑galactosidase (lacZ) versus mutations in its promoter (p).
Consider a heterozygote that is p+ lacZ‑ /p+ lacZ+ .
The phenotype is Lac+. lacZ+ complements lacZ‑ in trans. In this case, lacZ+ is dominant to lacZ-.
Consider a heterozygote that is p+ lacZ‑ /p‑ lacZ+ .
The phenotype is Lac‑. p+ does not complement p‑ in trans.
p‑ operates in cis to prevent expression of lacZ+ on this chromosome. The mutant promoter is dominant over the wild-type when the mutant promoter is in cis to the wild-type lacZ.
Consider a heterozygote that is p+ lacZ+ /p‑ lacZ‑ .
The phenotype is Lac+. lacZ+ now complements lacZ‑ in trans because it is driven by a functional promoter in cis, p+.
b. Dominance in cis: the promoter “allele” that is in cis to the wild-type structural gene (lacZ) is dominant over the other promoter allele.
c. Promoter mutations affect the amount of product from the gene but do not affect the structure of the gene product.
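The logic of the three heterozygotes above can be captured in a short Python sketch. It assumes only what the text states: a chromosome produces functional β-galactosidase when a functional promoter is in cis to a functional lacZ, and one producing chromosome is enough for a Lac+ phenotype. The function names and the boolean encoding of alleles are illustrative choices.

```python
def chromosome_makes_enzyme(promoter_ok, lacz_ok):
    """A chromosome yields enzyme only if p+ is in cis to lacZ+."""
    return promoter_ok and lacz_ok


def phenotype(chrom1, chrom2):
    """Each chromosome is a (promoter_ok, lacz_ok) pair of booleans."""
    if chromosome_makes_enzyme(*chrom1) or chromosome_makes_enzyme(*chrom2):
        return "Lac+"
    return "Lac-"


# The three heterozygotes discussed in the text.
print(phenotype((True, False), (True, True)))    # p+ lacZ- / p+ lacZ+
print(phenotype((True, False), (False, True)))   # p+ lacZ- / p- lacZ+
print(phenotype((True, True), (False, False)))   # p+ lacZ+ / p- lacZ-
```

The second case is the informative one: both chromosomes carry one wild-type element, but neither has the two in cis, so no enzyme is made.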
a. ‑35 and ‑10 sequences
‑35 sequence, then a spacer of 16‑19 bp, then ‑10 sequence, then +1 (start site)
‑35 sequence: recognition by RNA polymerase
‑10 sequence: allows the binary complex to convert from closed to open
b. The sequences are conserved in all E. coli genes transcribed by holoenzyme with σ70.
Figure 3.2.7. Correlation of conserved sequences, location of promoter mutants, and regions of contact with polymerase at bacterial promoters
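The arrangement of a ‑35 sequence, a 16‑19 bp spacer, and a ‑10 sequence lends itself to a simple sequence scan. The Python sketch below is one way to do it. Note the assumptions: the hexamers TTGACA and TATAAT are the commonly cited σ70 consensus sequences and are not stated in the text above, the mismatch tolerance is an arbitrary illustrative parameter, and real promoters are usually scored with weight matrices rather than mismatch counts.

```python
MINUS_35 = "TTGACA"   # commonly cited consensus, assumed here
MINUS_10 = "TATAAT"   # commonly cited consensus, assumed here


def mismatches(a, b):
    """Count positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def find_promoters(seq, max_mismatch=1, spacers=range(16, 20)):
    """Return (minus35_index, spacer_length, minus10_index) tuples.

    Indices are 0-based positions of the first nt of each hexamer.
    """
    hits = []
    for i in range(len(seq) - 5):
        if mismatches(seq[i:i + 6], MINUS_35) > max_mismatch:
            continue
        for spacer in spacers:
            j = i + 6 + spacer
            box = seq[j:j + 6]
            if len(box) == 6 and mismatches(box, MINUS_10) <= max_mismatch:
                hits.append((i, spacer, j))
    return hits


# Invented test sequence: perfect hexamers separated by 17 bp.
test_seq = "GG" + "TTGACA" + "A" * 17 + "TATAAT" + "GGGGGGG"
print(find_promoters(test_seq))
```

Loosening `max_mismatch` finds weaker matches at the cost of more false positives, which mirrors the observation that promoters vary in how closely they fit the conserved sequences.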
Promoters contain binding sites for nuclear proteins, but which of these binding sites have a function in gene expression? This requires a genetic approach for an answer.
a. In vitro mutagenesis (deletions or point mutations)
Figure 3.2.8. Evidence for an RNA polymerase II promoter.
b. Test in an expression assay
(1) The mutagenized promoter is linked to a reporter gene so that RNA or protein from that gene can be measured quantitatively.
(2) The promoter‑reporter DNA constructs are introduced into an assay system that will allow the reporter to be expressed.
(a) Whole cells
microinjection into Xenopus oocytes
transfection of cell lines: introduce the DNA via electroporation or by getting the cells to take up a precipitate of DNA and Ca phosphate by pinocytosis
(b) Whole animals = transgenic animals
Introduce the DNA into the germ line of an animal, in mammals by microinjecting into a fertilized egg and placing that into a pseudopregnant female. This technology allows one to examine the effects of the mutation throughout the development of the animal.
(c) Cell‑free systems
Extracts of nuclei, or purified systems (i.e. with all the necessary components purified)
a. The minimal promoter is needed for basal activity and accurate initiation.
(a) TATA box
Figure 3.2.9. Two general parts of promoters for RNA polymerase II.
b. The amount of expression is regulated via upstream elements.
Sp1: binds GGGGCGGGG = GC box
Octn: binds ATTTGCAT = octamer motif
Oct1 is a general factor (ubiquitous)
Oct2 is specific for lymphoid cells
CP1, CTF = NF1, C/EBP bind to CCAAT = CCAAT box (pronounced "cat" box)
These are different families of proteins. CP1 and CTF are found in many cell types, whereas C/EBP is found in liver and adipose tissue.
(4) These upstream control elements may be inducible (e.g. by hormones), may be cell‑type specific, or they may be present and active in virtually all cell types (i.e. ubiquitous and constitutive).
Fig. 3.2.11. Binding of proteins to the promoter for RNA polymerase I
a. This promoter has internal control sequences. Deletion of 5' flanking DNA still permits efficient transcription of (most) genes transcribed by RNA Pol III. Even the initial part of the gene is expendable, as is the 3' end. Sequences internal to the gene (e.g. +55 to +80 in 5S rRNA genes) are required for efficient initiation, in contrast to the familiar situation in bacteria, where most of the promoter sequences are 5' to the gene.
b. As discussed above, TFIIIA binds to the internal control region of genes that encode 5S RNA (type 1 internal promoter). TFIIIC binds to internal control regions of genes for 5S RNA (alongside TFIIIA) and for tRNAs (type 2 internal promoters). The binding of TFIIIC directs TFIIIB to bind to sequences (-40 to +11) that overlap the start site for transcription. One subunit of TFIIIB is TBP, even though no TATA box is required for transcription. TFIIIA and TFIIIC can now be removed without affecting the ability of RNA polymerase III to initiate transcription. Thus TFIIIA and TFIIIC are assembly factors, and TFIIIB is the initiation factor.
c. RNA polymerase III binds to the complex of TFIIIB+DNA to accurately and efficiently initiate transcription.
Fig. 3.2.11. Binding of proteins to the promoter for RNA polymerase III
σ in bacteria: The conformation of the polymerase changes upon dissociation of σ so that it enters a processive mode for elongation.
For eukaryotic transcription by RNA polymerase II, TFIID and TFIIA are thought to stay behind after the transcription complex clears the promoter. The release of the transcription complex from the promoter appears to be dependent on the phosphorylation of the CTD of RNA polymerase II. One of the protein kinases implicated in this process is TFIIH, but others, such as P-TEFb, have also been implicated.
Fig. 3.2.13. Model for role of phosphorylation of RNA polymerase in shift from initiating to elongating enzyme.
Fig. 3.2.14. Supportive evidence: Immunofluorescence shows Pol IIa is on heat shock genes when quiescent (stalled polymerases), but Pol IIo is present once the genes are actively transcribed (elongating polymerases).
3. There is some indication that factors that increase the processivity of the transcription complex bind to the elongating polymerase. Examples include the following.
4. GreA and GreB in E. coli and TFIIS in eukaryotes induce hydrolytic cleavage of the transcript within the RNA polymerase, followed by release of the 3' terminal RNA fragment. This process has been implicated in overcoming pausing of the polymerase.
Fig. 3.2.15. Cleavage of RNA to help overcome pausing
4. Regulation of elongation is an under‑studied area at present. In fact, many transcription complexes pause about 20 nt into the gene and stay there, primed for transcription, until they are released for elongation in response to some stimulus. The classic examples are the heat shock genes in Drosophila, but this may be a fairly general phenomenon.
5. The regulation of transcription is primarily at initiation (in most cases) but that regulation can be exerted at the frequency of assembling an initiation complex or by the frequency of release into the elongation mode (or any step prior to elongation).
6. The elongation rate averages about 50 nt per sec. This is not a constant rate and many pause sites are seen. Also, some templates may be transcribed at different rates.
7. Variation in elongation rate will not affect the output of gene product (e.g. transcript). It will affect the lag time between initiation and the first appearance of a product. Of course, a sufficiently long pause, i.e. when no elongation occurs, can reduce the amount of RNA synthesized from a gene.
8. As an illustration of the importance of elongation in regulation, consider the Tat and tar system in the human immunodeficiency virus, HIV. This case study also illustrates the complexity of the system.
Elongation of transcription in HIV requires the virally-encoded protein Tat that binds to an RNA structure centered at about +60, called the tar. Elongation requires the CTD of RNA polymerase II, and now it is clear that Tat leads to phosphorylation of the CTD. One step, probably promoter clearance, uses the kinase activity in the CDK7 subunit of TFIIH (or a trimeric complex of CDK7, cyclin H, and MAT1, referred to as CAK). This was shown by the ability of a pseudosubstrate inhibitor of CDK7 to block Tat-dependent elongation.
Further phosphorylation of the CTD of RNA polymerase II is catalyzed by the positive transcription elongation factor b, called P-TEFb, which contains a kinase subunit known as PITALRE or CDK9. P-TEFb is needed for Tat-stimulated elongation of transcripts from the HIV promoter (a combination of promoter and enhancer called a long terminal repeat, or LTR). A stylized example of these data is shown below.
The inhibitor of elongation, DRB, blocks the P-TEFb kinase. Indeed, a random screen of >100,000 compounds for the ability to block Tat-stimulated HIV transcription found several new compounds. All of these blocked elongation, and many structurally diverse compounds also inhibit the P-TEFb kinase. Thus Tat-dependent activation works through both TFIIH (perhaps at promoter clearance) and P-TEFb (for full elongation).
Fig. 3.2.16. P-TEFb is needed for elongation in HIV.
Figure legend. When a DNA template containing the LTR and encoding the TAR is used for in vitro transcription in a HeLa cell nuclear extract (which is competent for transcription by RNA polymerase II and associated general transcription factors) plus all 4 ribonucleoside triphosphates, a short RNA of about 70 nucleotides is produced (lane 1 in the figure below). Addition of increasing amounts of Tat (indicated by the triangle labeled Tat) causes transcription to continue to the end of the template, to produce a "run-off" transcript of about 700 nucleotides (lanes 2-4; darker shading indicates greater abundance). The results of removing the segment of DNA encoding the TAR from the template are shown in lanes 5-8. A cellular protein kinase complex called P-TEFb has been found associated with Tat. It can be removed from the HeLa cell nuclear extract, and the effects of this treatment are shown in lanes 9-12. For a review of this work, see the article by K. A. Jones (1997) "Taking a new TAK on Tat transactivation." Genes & Development 11: 2593-2599.
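The earlier points about elongation rate (an average of about 50 nt per sec, interrupted by pauses) and the lag between initiation and the first appearance of product can be put into numbers. The Python sketch below is a back-of-the-envelope calculation only. The gene length and pause durations are invented, and treating the rate as constant between pauses is a simplifying assumption.

```python
def transcription_time(length_nt, rate_nt_per_s=50.0, pauses_s=()):
    """Time in seconds for one polymerase to transcribe `length_nt`.

    rate_nt_per_s : average elongation rate (about 50 nt per sec in the text)
    pauses_s      : durations of individual pauses in seconds
                    (hypothetical values supplied by the caller)
    """
    if rate_nt_per_s <= 0:
        raise ValueError("elongation rate must be positive")
    return length_nt / rate_nt_per_s + sum(pauses_s)


# A hypothetical 5000 nt gene, without and with two pauses.
print(transcription_time(5000))
print(transcription_time(5000, pauses_s=(10, 20)))
```

The calculation gives the lag for a single polymerase. As noted above, a slower rate lengthens this lag without changing steady-state output, unless a pause is long enough to hold up the polymerases that follow.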
1. Terminator sequences in E. coli cause pausing by RNA polymerase
2. ρ factor
3. Model for action of ρ factor
1. Termination by RNA Pol II
(1) AAUAAA, about 20 nt before the 3' end of the mRNA
(2) Other sequences 3' to cleavage site
2. Termination by RNA Pol III:
Termination occurs at a run of 4‑5 T's (on the nontemplate strand of DNA) surrounded by GC‑rich DNA
3. Termination by RNA Pol I:
Termination requires an 11 bp binding site for the protein Reb1p, which causes the polymerase to pause, and a 46 bp segment located 5' to the Reb1p site, which may be required for release of the polymerase [Lang...Reeder (1994) Cell, 79:527-534].
Strong pausing may be a component of the transcription termination process for several RNA polymerases.
Fig. 3.2.19. Model for termination by RNA polymerase I
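Two of the eukaryotic signals listed above are simple enough to search for directly: the AAUAAA sequence near the 3' end of Pol II transcripts, and the run of T's that terminates Pol III transcription. The Python sketch below shows both searches. The sequences are invented, the function names are hypothetical, and the mRNA is assumed to be given without its poly A tail. The GC-rich context of the Pol III terminator is not checked.

```python
import re


def polya_signal_distance(mrna):
    """Distance in nt from the end of the last AAUAAA to the 3' end.

    Returns None if the mRNA (given without its poly A tail) has no AAUAAA.
    """
    idx = mrna.rfind("AAUAAA")
    if idx == -1:
        return None
    return len(mrna) - (idx + 6)


def t_runs(nontemplate_dna, min_len=4):
    """Return (start_index, length) for each run of at least min_len T's."""
    pattern = re.compile("T{%d,}" % min_len)
    return [(m.start(), len(m.group())) for m in pattern.finditer(nontemplate_dna)]


# Invented sequences for illustration only.
example_mrna = "GGGCCC" + "AAUAAA" + "C" * 20
example_dna = "GCGCTTTTTGCGC"
print(polya_signal_distance(example_mrna))
print(t_runs(example_dna))
```

A distance of roughly 20 nt from `polya_signal_distance` is consistent with the spacing given in the text; values far from that would suggest the match is not a functional signal.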
1. Bacterial mRNA is often polycistronic.
One transcript can encode the products from several adjacent genes.
a. The set of adjacent genes that are transcribed into one mRNA is an operon.
b. This organization allows for common transcriptional control. Thus it is part of the mechanism for coordination of expression of genes whose products are required at the same time. E.g. the lac operon, lacZYA, encodes three enzymes involved in the uptake and metabolism of lactose.
c. Production of proteins from polycistronic mRNAs requires initiation at internal AUGs, allowing for translation of the part of the mRNA encoding the second, third, etc. proteins.
Figure 3.2.20. A polycistronic operon in E. coli.
2. The initial transcript is also translated and subsequently degraded.
That is, transcription, translation and degradation are all going on simultaneously. The mRNA (usually) is not extensively processed prior to translation.
Figure 3.2.20. Translation occurs simultaneously with transcription in bacteria.
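The idea that one polycistronic mRNA carries several coding regions, each beginning at its own AUG, can be illustrated with a small Python sketch. It is deliberately simplified: it treats the first AUG after each stop codon as the next start, assumes the coding regions do not overlap, and ignores the ribosome binding sites that select genuine initiation codons in bacteria. The sequence and function name are invented for illustration.

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}


def find_cistrons(mrna):
    """Return (start, end) index pairs of successive coding regions.

    start is the 0-based index of the A in AUG; end is the index just
    past the stop codon.
    """
    cistrons = []
    i = 0
    while i <= len(mrna) - 3:
        if mrna[i:i + 3] == "AUG":
            j = i
            # Read codon by codon until a stop codon or the end of the RNA.
            while j <= len(mrna) - 3 and mrna[j:j + 3] not in STOP_CODONS:
                j += 3
            if j <= len(mrna) - 3:
                cistrons.append((i, j + 3))
                i = j + 3          # resume scanning after the stop codon
                continue
        i += 1
    return cistrons


# Invented two-gene message: two short coding regions separated by CCC.
message = "GG" + "AUGAAAUAA" + "CCC" + "AUGGGGUGA" + "C"
print(find_cistrons(message))
```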
The phenomenon of polarity occurs because of tight linkage between transcription and translation in bacteria.
1. Definition: Polar mutations are mutations early in the operon that exert a negative effect on the expression of genes later in the operon. This is generally a result of (some) nonsense mutations (those that cause premature termination of translation) in a gene toward the 5' end of the operon, which results in a cessation of transcription before the subsequent genes are reached.
2. The model for ρ action can explain why stopping translation can also lead to a cessation of transcription.
3. Mutations in ρ suppress polarity of nonsense mutations.
Since ρ is no longer functional, termination does not occur at the ρ‑dependent site early in the operon, and subsequent genes are then transcribed. So even though translation will still terminate in the first gene, transcription (and then translation) will continue in the downstream genes of the operon.
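The reasoning about polarity and its suppression can be summarized as a small decision procedure. The Python sketch below is a toy model, not a description of the molecular mechanism. It assumes that a polar nonsense mutation always exposes a termination site when the termination factor is functional (the text notes that only some nonsense mutations are polar), and the function and parameter names are hypothetical.

```python
def expressed_genes(operon, nonsense_in=None, rho_functional=True):
    """Return the genes of an operon that yield full-length protein.

    operon         : gene names in 5' to 3' order
    nonsense_in    : name of the gene carrying a polar nonsense mutation
    rho_functional : whether the termination factor is active
    """
    products = []
    for gene in operon:
        if gene == nonsense_in:
            # Translation stops early, so this gene gives no full-length product.
            if rho_functional:
                # Transcription terminates too; downstream genes are never reached.
                break
            # Without the factor, transcription continues into downstream genes.
            continue
        products.append(gene)
    return products


lac = ["lacZ", "lacY", "lacA"]
print(expressed_genes(lac))
print(expressed_genes(lac, nonsense_in="lacZ"))
print(expressed_genes(lac, nonsense_in="lacZ", rho_functional=False))
```

The third call shows suppression of polarity: the mutant gene still gives no full-length product, but the downstream genes are expressed again.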
1. Most mRNAs in eukaryotes are capped at their 5' ends and polyadenylated at their 3' ends.
This general structure is true for almost all eukaryotic mRNAs. The cap structure is almost ubiquitous. A few examples of mRNAs without poly A at the 3' end have been found. Some of the most abundant mRNAs without poly A encode the histones. However, most mRNAs do have the 3' poly A tail.
The poly A tail at the 3' end can be used to purify mRNAs from other RNAs. Total RNA from a cell (which is about 90% rRNA and less than 10% mRNA) can be passed over an oligo(dT)-cellulose column. The poly A-containing mRNAs will bind, whereas other RNAs will elute.
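The separation achieved on an oligo(dT)-cellulose column can be mimicked computationally by sorting sequences according to whether they end in a run of A's. The Python sketch below illustrates the principle only. The minimum tail length is an arbitrary illustrative threshold, the sequences and RNA names are invented, and real binding depends on salt and temperature rather than a sharp length cutoff.

```python
def oligo_dt_select(rnas, min_tail=20):
    """Split RNAs into those retained by oligo(dT) and those that flow through.

    rnas     : dict mapping an RNA name to its sequence (5' to 3')
    min_tail : minimum number of consecutive A's at the 3' end needed
               for retention (hypothetical threshold)
    """
    bound, flow_through = {}, {}
    for name, seq in rnas.items():
        tail_len = len(seq) - len(seq.rstrip("A"))   # length of the 3' A run
        if tail_len >= min_tail:
            bound[name] = seq
        else:
            flow_through[name] = seq
    return bound, flow_through


# Invented sequences for illustration only.
sample = {
    "polyA_mRNA": "AUGGCC" + "A" * 30,
    "histone_mRNA": "AUGGCCUUU",
    "rRNA_fragment": "GGCCUUAA",
}
bound, flow_through = oligo_dt_select(sample)
print(sorted(bound))
print(sorted(flow_through))
```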