AOBPreview originally published online on July 18, 2005
Annals of Botany 2005 96(4):669-681; doi:10.1093/aob/mci219
Detection and Preliminary Analysis of Motifs in Promoters of Anaerobically Induced Genes of Different Plant Species
1 Knowledge Extraction Laboratory, Institute for Infocomm Research, 21 Heng Mui Keng Terrace, Singapore 119613 and 2 Department of Biological Sciences, National University of Singapore, Singapore
* For correspondence. E-mail mohanty{at}i2r.a-star.edu.sg
Received: 29 October 2004 Returned for revision: 16 December 2004 Accepted: 31 January 2005 Published electronically: 18 July 2005
| ABSTRACT |
|---|
Background and Aims Plants can suffer from oxygen limitation during flooding or more complete submergence and may therefore switch from Krebs cycle respiration to fermentation in association with the expression of anaerobically inducible genes coding for enzymes involved in glycolysis and fermentation. The aim of this study was to clarify mechanisms of transcriptional regulation of these anaerobic genes by identifying motifs shared by their promoter regions.
Methods Statistically significant motifs were detected by an in silico method from 13 promoters of anaerobic genes. The selected motifs were common for the majority of analysed promoters. Their significance was evaluated by searching for their presence in transcription factor-binding site databases (TRANSFAC, PlantCARE and PLACE). Using several negative control data sets, it was tested whether the motifs found were specific to the anaerobic group.
Key Results Previously, anaerobic response elements have been identified in maize (Zea mays) and arabidopsis (Arabidopsis thaliana) genes. Known functional motifs were detected, such as GT and GC motifs, but also other motifs shared by most of the genes examined. Five motifs detected have not been found in plants hitherto but are present in the promoters of animal genes with various functions. The consensus sequences of these novel motifs are 5'-AAACAAA-3', 5'-AGCAGC-3', 5'-TCATCAC-3', 5'-GTTT(A/C/T)GCAA-3' and 5'-TTCCCTGTT-3'.
Conclusions It is believed that the promoter motifs identified could be functional by conferring anaerobic sensitivity to the genes that possess them. This proposal now requires experimental verification.
Key words: Anaerobic genes, promoters, motifs, anaerobic response elements, ab initio motif detection, transcription factors, transcription factor-binding sites, Arabidopsis thaliana, ethanolic fermentative pathway
| INTRODUCTION |
|---|
Plants often suffer from a shortage of oxygen during partial or complete submergence. Initially the inundated parts, especially roots, suffer from hypoxia, which later turns to anoxia, as the slow diffusion of oxygen in water (10000 times slower than in air, Armstrong, 1979
During anaerobiosis, plants switch from Krebs cycle respiration to fermentation. Although there are a number of fermentative pathways operating during anoxia (ethanol, lactic acid and alanine fermentation; Kennedy et al., 1992), ethanolic fermentation is the main energy-producing pathway (ap Rees et al., 1987; Greenway and Gibbs, 2003). However, how this pathway is controlled is not clearly understood, although various anaerobic proteins (ANPs) become induced during anaerobiosis in the roots. Approximately 20 ANPs have been identified in maize roots by cDNA cloning, and most are enzymes of glycolysis and fermentation such as sucrose synthase, pyrophosphate-dependent phosphofructokinase, fructose-1,6-phosphate aldolase, glucose-6-phosphate isomerase, glyceraldehyde-3-phosphate dehydrogenase, alcohol dehydrogenase, lactate dehydrogenase, pyruvate decarboxylase and others (Sachs et al., 1980, 1996). The expression of such ANPs is controlled predominantly at the transcriptional level, although a post-transcriptional regulatory mechanism has also been demonstrated (Fennoy and Bailey-Serres, 1995).
It is generally believed that genes having similar expression patterns contain common motifs in their promoter regions (Vilo et al., 2000). Klok et al. (2002) analysed the low-oxygen response in arabidopsis root cultures and found that the transcriptional regulatory regions of genes with similar expression share similar motifs. Thus, a common set of transcription factors (TFs) is likely to control these genes. Common promoter motifs are the key signatures for a family of co-regulated genes and are usually present in the regions where complex protein interactions occur (Z. Wang et al., 2004). However, in some cases, single motifs can bind various transcription factors, thereby bringing the genes under multiple regulatory controls (Jin and Martin, 1999; Yanagisawa, 2002). Extensive studies on 500 bp upstream regions of yeast promoters suggest that regulatory elements are commonly present in those regions (Caselle et al., 2002). In eukaryotes, the computational detection of regulatory sites is difficult as the sequences where TFs bind are generally much shorter than in prokaryotes (van Helden et al., 1998). In addition, they are generally active in both orientations and can be dispersed over a large distance. Sometimes, they can be present in introns and also in distal parts of the promoter (Caselle et al., 2002). While many computer programs have been developed to detect biological motifs (Lawrence et al., 1993; Bailey and Elkan, 1994; Hertz and Stormo, 1999; Califano, 2000; Pavesi et al., 2004; Huang et al., 2005; Yang et al., 2005; and others), it is still a considerable challenge to detect accurately previously defined regulatory sites in regions of interest, let alone identify new regulatory sites responsible for activation and expression of a functional gene group.
Anaerobically induced genes are often characterized by the presence of anaerobic response elements (AREs) in their promoter regions (Walker et al., 1987). AREs have been reported in promoters of maize ADH1, ADH2 and aldolase genes, and arabidopsis (Arabidopsis thaliana) ADH1, LDH1 and PDC1 genes (Olive et al., 1990; Dolferus et al., 1994; Hoeren et al., 1998). Two motifs, the GT motif and GC motif, have been identified as AREs in these promoters (Olive et al., 1990, 1991a, b). The transcription factor AtMYB2 in arabidopsis binds to the GT motif of the ADH1 promoter and a GC-binding protein binds to the GC motif (Olive et al., 1991b). The consensus sequence for an AtMYB2-binding site present in all known anaerobically induced genes is 5'-AAACC(G/A)(G/A)-3' (reverse complement of the GT-box) (Hoeren et al., 1998), while the one for the GC motif is 5'-GCCCC-3'. There is evidence that the transcription factor AtMYB2 is induced by low-oxygen stress (Hoeren et al., 1998) and, thus, could be another important factor for the regulation of anaerobic genes.
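As an illustration of how a bracketed consensus such as 5'-AAACC(G/A)(G/A)-3' can be matched on both strands of a promoter, the following is a minimal sketch only. It is not part of the published analysis; the function names and the toy promoter strings in the usage note are hypothetical, and it assumes uppercase A/C/G/T input.

```python
import re

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def consensus_to_regex(consensus):
    """Convert bracketed alternatives, e.g. '(G/A)', into a regex class '[GA]'."""
    return re.sub(
        r"\(([ACGT/]+)\)",
        lambda m: "[" + m.group(1).replace("/", "") + "]",
        consensus,
    )


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_sites(consensus, promoter):
    """List (start, strand, site) for every match on either strand.

    start is the 0-based index, on the given strand, of the leftmost base
    covered by the site; site is reported 5'->3' on the matching strand.
    """
    # A lookahead keeps overlapping matches that a plain search would skip.
    pattern = re.compile("(?=(" + consensus_to_regex(consensus) + "))")
    hits = [(m.start(), "+", m.group(1)) for m in pattern.finditer(promoter)]
    n = len(promoter)
    for m in pattern.finditer(reverse_complement(promoter)):
        site = m.group(1)
        hits.append((n - m.start() - len(site), "-", site))
    return sorted(hits)
```

For example, `find_sites("AAACC(G/A)(G/A)", "GGTCGGTTTCC")` reports a single site on the minus strand, because the toy string contains TCGGTTT, the reverse complement of AAACCGA.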
Our aim was to find common motifs shared by the majority of regulatory regions of anaerobic genes that would be in addition to the previously well-characterized GC and GT motifs. Detection of new motifs that could be potential transcription factor-binding sites (TFBSs) could help in understanding the transcriptional regulation of these anaerobic genes under stress conditions. More importantly, a clearer understanding of the promoter content and architecture would allow a better understanding of co-ordinate regulation and could provide background for reconstruction of parts of gene regulatory networks of anaerobic responsive genes (Werner et al., 2003). TFBSs usually have a length of 4–20 nt with 0–50 % mismatches per motif (Poluliakh and Nakai, 2003). Important motifs can be discovered computationally as patterns common to most sequences in a family of sequences, either as sequence patterns based on sequence comparison or from comparison of structures (Z. Wang et al., 2004). Many biologically relevant patterns have been found using motif discovery algorithms. There are several programs available for the extraction of motifs, such as MEME (Bailey and Elkan, 1994), GibbsDNA (Lawrence et al., 1993), CONSENSUS (Hertz and Stormo, 1999), SPLASH (Califano, 2000) and Weeder Web (Pavesi et al., 2004), among others. We have used the DRAGON MOTIF BUILDER (DMB) program (Huang et al., 2005; Yang et al., 2005) for our detection work. This program has already been applied successfully to the analysis of 11 500 mouse and 18 300 human promoters to detect motifs in an ab initio manner (Huang et al., 2005).
In this report, we describe several new common patterns that have not been reported previously for plants and could be functional TFBSs. The fidelity of our list of over-represented motifs is enhanced by finding several already known TFBSs in our target group of anaerobic genes. We demonstrate by using negative control promoter sets that the patterns we discovered are over-represented only in the anaerobic promoter target group and are specific to that group, giving further support to our hypothesis that the new motifs we report here could have a functional role as anaerobic promoter elements.
| METHODS |
|---|
Promoter sequences
Plant promoter sequences were extracted from SoftBerry's Plant Promoter Database (PPD) (Shahmuradov et al., 2003
Data set of anaerobic genes (anaerobic set 1)
A group of 13 anaerobic genes that belong to the ethanolic fermentative pathway from seven different plant species were used to identify probable promoter regulatory elements (motifs). We have extracted six promoter sequences from PPD, five sequences from EPD and two from GenBank. The genes included are maize alcohol dehydrogenase 1 (ADH1), maize ADH2, arabidopsis ADH, pea (Pisum sativum) ADH1, petunia (Petunia hybrida) ADH2, tomato (Lycopersicon esculentum) ADH3a, tomato ADH3b, barley ADH2, maize sucrose synthase, arabidopsis sucrose synthase, rice sucrose synthase, rice pyruvate decarboxylase (PDC) and maize fructose bisphosphate aldolase. This promoter set we denote as anaerobic set 1. Although anaerobic gene sequences from this same pathway, such as arabidopsis PDC1, maize glyceraldehyde-3-phosphate dehydrogenase, rice ADH2, cotton (Gossypium hirsutum) ADH2b-2 and arabidopsis lactate dehydrogenase1, were available, we were not able to use them since they lack accurate information regarding TSSs and promoter regions.
Tool for motif detection (DMB program)
To find the known and unknown promoter motifs in the compiled promoter sequences, we used the DMB program (http://research.i2r.a-star.edu.sg/DRAGON/Motif_Search/) with the following parameters: EM2, single motif occurrence in the sequences with zero or one motif per sequence. We searched for all motifs of lengths 5–12 nt, with a total of ten motifs per session. Between sessions, we changed the thresholds manually.
Analysis of motifs
A total of 120 motifs were detected in the promoters of the 13 anaerobic genes using the DMB program with different thresholds. Out of the 120 motifs, we selected 16 motifs with the highest frequency of appearance. The significance of the selected motifs was evaluated by searching for their presence in TFBS databases such as TRANSFAC (Matys et al., 2003; http://www.gene-regulation.com), PlantCARE (Lescot et al., 2002; http://oberon.rug.ac.be:8080/PlantCARE/index.html) and PLACE (Higo et al., 1999; http://www.dna.affrc.go.jp/htdocs/PLACE/).
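The selection by frequency of appearance can be sketched as follows. This is an illustrative assumption about the bookkeeping, not the DMB implementation: the function name, the motif labels and the promoter identifiers are hypothetical, and a motif is counted once per promoter regardless of how many times it occurs there.

```python
def rank_by_occurrence(hits, n_promoters, min_promoters):
    """Rank motifs by the number of promoters that contain them.

    hits maps a motif label to an iterable of promoter identifiers in which
    the motif was detected. Returns (motif, count, percent) tuples for motifs
    found in at least min_promoters promoters, most frequent first.
    """
    rows = []
    for motif, promoters in hits.items():
        count = len(set(promoters))  # one vote per promoter
        if count >= min_promoters:
            rows.append((motif, count, round(100 * count / n_promoters)))
    return sorted(rows, key=lambda row: (-row[1], row[0]))


# Hypothetical toy input: four motif labels over 13 promoters p0..p12.
toy_hits = {
    "motif_A": ["p%d" % i for i in range(12)],
    "motif_B": ["p%d" % i for i in range(8)],
    "motif_C": ["p%d" % i for i in range(6)],
    "motif_D": ["p%d" % i for i in range(5)],
}
ranked = rank_by_occurrence(toy_hits, n_promoters=13, min_promoters=6)
```

With 13 promoters, counts of 6, 8 and 12 correspond to 46 %, 62 % and 92 % after rounding, so a cut-off of six promoters drops `motif_D` from the toy list.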
Homology search for TFs that could bind motifs (found in anaerobic set 1) that are unknown in plants
We found five unknown motifs in anaerobic set 1, which are not known to act as TFBSs in plants, but are known to be TFBSs in animal cells. We searched for homology of TFs that bind these animal TFBSs to arabidopsis and rice proteins. We used BLAST (Altschul et al., 1997) and the internet service at http://www.ncbi.nlm.nih.gov/BLAST/producttable.shtml. The protein sequences found in animals were extracted from Swiss-Prot Protein Knowledgebase (Boeckmann et al., 2003; http://tw.expasy.org/sprot/).
Data set of 117 anaerobic genes involved in signal transduction/transcription and various metabolic pathways (anaerobic set 2)
Based on the results of a microarray experiment on the low-oxygen response in arabidopsis root culture by Klok et al. (2002), we selected a larger data set of anaerobic genes that are highly overexpressed or underexpressed. Based on gene names in this group, we extracted from different plant species 117 promoter sequences from PPD and EPD and we denote this set as anaerobic set 2. This set includes 13 anaerobic genes of the ethanolic fermentative pathway (anaerobic set 1), as well as many other genes of a heterogeneous nature involved in signal transduction/transcription which belong to a number of different metabolic pathways. With this set, we aimed to check to what extent motifs found in anaerobic set 1 are shared with promoters of anaerobic set 2.
Negative control set 1 (data set of α-amylase genes)
A data set of 15 promoters of α-amylase genes from four plants of the cereal group [rice, wild oat (Avena fatua), barley and wheat] was selected as the negative control set 1. α-Amylase was chosen for negative control since it is a starch-degrading enzyme and is known to be anaerobically induced only in rice, which is in contrast to our sugar-degrading anaerobic genes. We have extracted promoter regions of these genes from PPD and EPD databases.
Negative control set 2 (promoters from PPD and EPD which excludes known anaerobically induced genes)
From PPD and EPD, we extracted another negative data set of 303 genes having different functions and originating from different plants. This set does not include either the anaerobic genes or the genes differentially expressed by low oxygen in the microarray experiment of Klok et al. (2002). This was mainly done to observe whether the motifs detected for the anaerobic genes in the ethanolic fermentative pathway would also turn up in this negative control set.
Negative control set 3 (promoters of genes induced by cold, drought, high salinity stresses and ABA application)
Promoter sequences of the genes dehydrin, aldehyde dehydrogenase, protease inhibitor, catalase, chlorophyll a-b binding protein, actin and phenylalanine ammonia-lyase (based on expression of rice genes in microarray experiments; Rabbani et al., 2003) induced by cold, drought, high salinity and abscisic acid (ABA) application were extracted from PPD and EPD. Altogether, 17 sequences were extracted from seven different plants [arabidopsis, rice, potato (Solanum tuberosum), tomato, pea, curled-leaved tobacco (Nicotiana plumbaginifolia), wood tobacco (Nicotiana sylvestris) and oilseed rape (Brassica napus)].
Negative control set 4 (promoters of heat shock protein genes)
To compare the motifs detected in the anaerobic genes with other stress response genes, we extracted from PPD and EPD a set of 12 genes responsive in heat stress from four different plant species [rice, arabidopsis, soybean (Glycine max) and maize].
| RESULTS AND DISCUSSION |
|---|
Selection and significance of motifs
We analysed 13 anaerobic genes from the ethanolic fermentative pathway from seven plant species (anaerobic set 1) and searched for shared promoter motifs. This pathway is the most prominent pathway involved in anaerobic stress. Out of 120 motif groups detected, we selected the 16 that were the most frequent (they appeared in >46 % of promoters, i.e. in at least six promoters) (Fig. 1). The maximum occurrence in anaerobic promoters was 92 % (12 out of 13), and 13 motifs occurred in at least 62 % of promoters (eight out of 13). The total information content (IC) (for definition see Stormo, 2000
We analysed the distribution of motifs in different segments of promoters and, if the first nucleotide of the motif fell within the considered region, we counted that motif as appearing in that region. We looked for the presence of motifs in the [−200, −150], [−150, −100], [−100, −50], [−50, −1] and [+1, +50] regions. We observed that most of the motifs were found in the regions [−200, −150], [−150, −100] and [−50, −1]. The highest percentage of the motifs was found in the region [−50, −1]. The motifs AGCAGC, CACAAT, TTATTA and CAACTCA were found in all upstream regions. However, most interestingly, some motifs seem to prefer very specific regions. For example, the ATATAAATT motif has a preferred region [−50, −1] where it is found in 77 % of promoters. The TATAAAAAC motif appeared in 47 % of promoters in the same region. Both of these motifs contain the TATA-box motif TATAAA. It is commonly believed that the TATA-box is present at positions around −30 relative to the TSS, which has good concordance with our result. In plants, two types of TATA-binding proteins are present, which bind to the TATA-box (Vogel et al., 1993
The motif AAA(A/C)CCTC was found in the region −200 to −150 in 46 % of promoters and it contains a central motif AACC that is at the core of the GT-box (reverse complement). The presence of GT motifs in the promoter of anaerobically induced genes of different plant species was studied by Hoeren et al. (1998). They observed that the GT motif is present in all anaerobically induced genes and is located within 300 to 100 bp upstream of the TSS. In our analysis, the location of the GT motif in the [−200, −150] region agrees very closely with the observations of Hoeren et al. In arabidopsis, the presence of a GT motif located in the promoters of the ADH1 gene is responsible for the induction of anaerobic genes.
Expression profiling of low-oxygen responsive genes in arabidopsis using a 3500 cDNA array revealed 210 differentially expressed transcripts (Klok et al., 2002). These genes were organized in six related groups based on their patterns of RNA accumulation levels. The clustered genes showed over-representation of 6–10 bp motifs that were previously described for the ADH1 promoter. Out of the six motifs listed by Klok et al. (2002), GC and GT motifs were similar to the two motifs identified by our analysis. The presence of previously identified motifs in our search gives an indication that our methodology is reasonable.
The motifs TTTTTCT and TTTTCTTC are each present in 62 % of promoters in the [−50, −1] region. The motif GTTT(A/C/T)GCAA was also found in [−50, −1] with 47 % occurrence. The motif TCATCAC was present in the region [−50, −1] in 54 % of promoters. One motif, TTCCCTGTT, is found in 54 % of promoters in the [−100, −50] region. For convenience, Fig. 3 shows the positional distributions of the motifs found in the 13 promoters of anaerobic set 1. In Fig. 3, the patterns ATATAAATT (motif 11) and TATAAAAAC (motif 13) contain the commonly found TATA-box motif and both have a preferred position in [−50, −1], as one would expect for a TATA-box motif. The new motifs which are not found in plants, such as TCATCAC (motif 7) and TTCCCTGTT (motif 16), have a similar preferred region [−75, +20], with just one of each of these motifs falling outside the region. The motif TCATCAC is present in the promoter sequences of ADH, sucrose synthase and aldolase genes, whereas the motif TTCCCTGTT is present in ADH and aldolase genes. The other new motifs AAACAAA (motif 1) and AGCAGC (motif 2) also show closer distribution and are found in ADH, sucrose synthase and aldolase gene sequences. Among the other motifs, TCCTCCT (motif 14) is distributed around the same region of the promoters and is found in ADH, PDC and sucrose synthase genes. The motifs CACAAT (motif 3) and AA(A/G)ATT (motif 5) are distributed similarly on the promoters and are found in ADH and sucrose synthase genes. The motifs TTTTTCT (motif 6) and TTTTCTTC (motif 8) are both found in ADH, PDC and sucrose synthase genes and also show a similar distribution pattern. The other motifs, TTATTA, GTTT(A/C/T)GCAA and CAACTCA, are more randomly distributed across promoters. Thus, the positional distributions of different motifs indicate specific positional preferences that may suggest regulatory roles for such motifs in the fermentative pathway of anaerobic genes.
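The region-counting rule described above (a motif counts for the region that contains its first nucleotide) can be sketched as below. This is a hedged illustration rather than the procedure actually used: positions are taken relative to the TSS with upstream positions negative and no position 0, the function names and toy positions are hypothetical, and a position on a shared boundary such as −150 is assigned to the first listed region, which is an assumption the text does not settle.

```python
from collections import defaultdict

# Regions relative to the TSS (+1); upstream positions are negative.
REGIONS = [(-200, -150), (-150, -100), (-100, -50), (-50, -1), (1, 50)]


def region_of(pos):
    """Return the first region whose inclusive bounds contain pos, else None."""
    for lo, hi in REGIONS:
        if lo <= pos <= hi:
            return (lo, hi)
    return None


def region_percentages(first_positions, n_promoters):
    """Percentage of promoters with at least one motif start in each region.

    first_positions maps a promoter identifier to the positions of the first
    nucleotide of each occurrence of one motif in that promoter.
    """
    promoters_by_region = defaultdict(set)
    for promoter, positions in first_positions.items():
        for pos in positions:
            region = region_of(pos)
            if region is not None:
                promoters_by_region[region].add(promoter)
    return {
        region: round(100 * len(promoters_by_region[region]) / n_promoters)
        for region in REGIONS
    }


# Hypothetical toy input: ten promoters with a start near -30, three elsewhere.
toy_positions = {"p%d" % i: [-30 - i] for i in range(10)}
toy_positions.update({"p10": [-150], "p11": [5], "p12": [-300]})
toy_result = region_percentages(toy_positions, n_promoters=13)
```

In the toy input, ten of 13 promoters have a start in [−50, −1], which rounds to 77 %, and the start at −300 falls outside every region and is ignored.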
The significance of the motifs found was determined by searching for their presence in TFBS databases such as the TRANSFAC, PlantCARE and PLACE database (Table 1). Out of 16 motifs studied, eight of them [TTTTTCT, TTTTCTTC, AAA(A/C)CCTC, ATATAAATT, (A/C/G)AAAAACAAA, TATAAAAAC, TCCTCCT and CAACTCA] are reported in the TRANSFAC database as TFBSs for plants. The motifs CACAAT, TTATTA and AA(A/G)ATT have been reported in the PLACE database as parts of motifs of other TFBSs for plants. Only one motif, AAA(A/C)CCTC, has been reported partly as the TFBS of the ARE GT motif in maize (Walker et al., 1987
Presence of animal TFs (unknown anaerobic motifs detected in anaerobic set 1) in plant homologues
The motifs AAACAAA, AGCAGC, TCATCAC, GTTT(A/C/T)GCAA and TTCCCTGTT, found in anaerobic set 1, are not known to be binding sites for plant TFs, but are known as TFBSs in animals. The protein sequences of the animal TFs that bind them were aligned against the protein sequences of arabidopsis and rice to test whether homologues of the animal TFs exist in plants. The BLAST hits (Table 2) were not similar enough to conclude that TFs resembling the animal TFs exist in these two species. Nevertheless, the results are promising, since some hits were to plant TFs and many others were to plant proteins of unknown function or to hypothetical plant proteins. Although this analysis was inconclusive, in the sense that we did not detect plant proteins very similar to the animal TFs, these hits are encouraging and require further study.
Detection of motifs in promoters of 117 anaerobic genes in anaerobic set 2
The previously analysed data set of 13 anaerobic genes was homogeneous in the sense that all the genes belong to the ethanolic fermentative pathway, and, thus, one could expect that these genes share many similarities in their promoter regions. Our finding of common promoter motifs supports such an assumption. However, many anaerobic genes code for proteins not involved in fermentation. Thus, it was important also to examine a selection of such genes. We used a larger promoter group from 117 genes (anaerobic set 2) determined from various species. The chosen genes were based on gene name matching to ones highly over- and underexpressed in arabidopsis during a low-oxygen microarray experiment (Klok et al., 2002
We searched the TRANSFAC, PlantCARE and PLACE databases for the motifs found in anaerobic set 2, whose genes relate to different metabolic pathways. A few motifs, such as AAAGAAA, AAAGAAAAA, ATTTTTAT, AAAACC and CAACTT, are listed as parts of TFBSs in plants. The others were not listed for plants, but have been found in other organisms.
Use of α-amylase as negative control (negative control set 1)
Sugar availability plays an important role in the production of energy by fermentation in oxygen-starved tissues (Vartapetian and Jackson, 1997). As the amount of hexoses and disaccharides is limited, the degradation of starch reserves becomes crucial for survival under anoxic conditions (Perata et al., 1998). Among the starch-degrading enzymes, α-amylase plays a major role (Sun and Henson, 1991). There is evidence that, in rice, it is produced in germinating seeds under anoxia (Perata et al., 1992), but not in anoxia-intolerant wheat, barley and other cereals (Guglielminetti et al., 1995). There is also a report that α-amylase plays a similarly important role in the anoxia-tolerant rhizome of Acorus calamus (Arpagaus and Braendle, 2000). Loreti et al. (2003) demonstrated that α-amylase production under anoxia is mostly due to the activity of the Ramy3D gene. Due to the critical role these genes have in the supply of sugar during anoxic conditions in tolerant rice seeds and other anoxia-tolerant tubers, it was logical to compare these genes with anaerobic genes. Thus, we compiled a data set of promoters of α-amylase genes from various plants as a negative control set.
Using DMB, we detected motifs in promoters of α-amylase genes as we did for anaerobic genes. We selected 20 motifs with the highest occurrence. These motifs together with their percentage occurrence are presented in Table 5. The results show that the motifs detected are not the same as those detected for anaerobic genes (Table 1).
We searched for the presence of motifs from α-amylase promoters in the TRANSFAC, PlantCARE and PLACE databases. Six motifs that were previously reported as TFBSs for α-amylase genes were identified. These are TTTCCAT (Amy 32B in barley), CCTTTTCA (Amy 32B in barley), CAGTGCCTCCAA (Amy 3D in rice), GTAGCCATCAAT (Amy 32B in barley), AGTGCCTCCAA (Amy 3D in rice) and CACTGCCTATAAAT (Amy 3D in rice). The motifs CTATAA, CCATCAGC, CCATCAAC, AGCCATCA(A/G) and CTGCCTATAA are unknown in plants, but they are known for other organisms.
Detection of motifs in 303 promoters of presumably non-anaerobic genes (negative control set 2)
To validate our results further and to check that the system did not generate an excessive number of false-positive motifs, a similar detection protocol was applied using 303 genes having different functions and originating from different plant species (negative control set 2). The top 20 motifs detected with the highest frequency are reported in Table 6. In this data set, we did not find the same motifs as in anaerobic promoters of anaerobic set 1 (Table 6), but we did find three shorter motifs that partly overlap with motifs of anaerobic set 1. Hence, we conclude that the motifs of the negative control set 2 are not prominent in anaerobic set 1.
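One way to make "partly overlap" concrete is to look for a shared contiguous substring between two motifs. The sketch below is an assumption about how such a comparison could be done, not the criterion used in this study: the function names and the minimum shared length of 5 nt are hypothetical, and degenerate positions such as (A/C) are not handled.

```python
def longest_common_substring(a, b):
    """Return the longest contiguous substring shared by a and b."""
    best = ""
    for i in range(len(a)):
        # Only try candidates longer than the best found so far.
        for j in range(i + len(best) + 1, len(a) + 1):
            if a[i:j] in b:
                best = a[i:j]
            else:
                break  # extending a non-shared substring cannot help
    return best


def partly_overlaps(a, b, min_shared=5):
    """True if the two motifs share at least min_shared contiguous bases."""
    return len(longest_common_substring(a, b)) >= min_shared
```

Under this rule, TATAAAT is fully contained in ATATAAATT, and CAACTT shares the 5 nt stretch CAACT with CAACTCA, whereas TTCCCTGTT and CTATAA share only 2 nt and would not be counted as overlapping.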
Out of 20 motifs we detected in this group, only three partly overlap with the motifs detected in anaerobic set 1 (Table 7). One of them, TATAAAT, found in 79 % of sequences, contains a commonly found TATA-box that binds general TFs and, thus, its presence can be expected. The other two motifs, AAAACAA and CAACTT, that were similar to motifs detected in anaerobic set 1, were found in 61 and 56 % of sequences, respectively. The motif AAAACAA is present as an auxin-responsive element in pea in the primary indole acetic acid-inducible gene (Ballas et al., 1993
Comparison with motifs in the promoters of negative control set 3
A number of genes have been identified as being induced by different abiotic stresses (Thomashow, 1999
Due to the unavailability of promoter sequences in the PPD and EPD, it was difficult to analyse all individual genes induced by factors such as cold, drought or ABA. However, based on cDNA microarray expression analysis performed in rice (Rabbani et al., 2003), we compiled a data set of seven genes out of this set whose promoters are present in PPD and EPD. We detected 20 motifs (Table 8) with high frequency of occurrence across the promoters. The motifs detected in this data set were different from the motifs detected in anaerobic set 1, except for the TATA-box and AA(A/T)CAAA, which is partly similar to the motif AAACAAA. The other 18 motifs were very different from the motifs detected in anaerobic set 1. As we have discussed earlier, the TATA-box is a common motif found in many genes, but the motif AA(A/T)CAAA could play a common role in stress-induced genes in plants. This now requires further analysis and experimental verification. Thus, our results suggest that most of the motifs detected in this negative control data set are different from the motifs found in anaerobic genes of the ethanolic fermentative pathway.
Comparison with motifs in the promoters of heat shock protein genes (negative control set 4)
As a test of specificity, we thought it logical to compare the motifs we detected in anaerobic genes with those occurring in the promoter regions of other stress response genes. Accordingly, we compiled a set of promoters of heat shock protein (HSP) genes. These proteins are usually undetectable under normal growth conditions but become rapidly induced in response to heat. The accumulation of HSPs depends on both temperature and duration of the stress (Howarth, 1991
In the HSP set, we detected 20 motifs showing a high frequency of occurrence. Some motifs occurred in no less than 67 % of the genes, while others were present in all the genes (Table 9). We also searched for these motifs in the TRANSFAC, PlantCARE and PLACE databases. Seventeen motifs are already known to be present in plants as TFBSs, and those not seen in plants before have been found in other organisms. None of these 20 motifs is present in the anaerobic genes except the TATA-box. However, the TATA-box is one of the general core promoter elements and thus not specific for transcriptional activation of any particular functional gene group. The motifs detected suggest that those we have found for anaerobic genes do not have any role in HSP gene activation but may well play a major role in the control of anaerobic genes themselves.
The list of motifs over-represented in the anaerobic genes from the ethanolic fermentative pathway, together with the expression profiles of these genes could provide necessary clues for reconstruction of parts of regulatory networks for this pathway. In addition, the results shown here will allow biological validation using standard methods such as quantitative real-time PCR assays and by reporter gene assays of transgenic plants carrying chimaeric constructs of selected promoter regions fused to reporter genes.
| CONCLUSIONS |
|---|
We detected motifs in the [−200, +50] region relative to the TSS of 13 anaerobic genes selected from seven different plants. Sixteen of the most significant motifs were selected out of 120 motifs. Of these, eight are reported in the TRANSFAC database as TFBSs, while another three are included in the PLACE database as parts of other known motifs. The remaining five motifs have not been reported previously for plants as binding sites of TFs. However, they have been reported as such for other organisms including humans, mouse, rat, Drosophila and others, increasing the chances that the new motifs we found in the majority of anaerobic promoters are biologically active and relevant to the regulation of anaerobic metabolism in plants. We also searched arabidopsis and rice for homologues of the animal TFs that bind these motifs. Although the results did not provide sufficient support to prove that the animal TFs are present in plants, they do provide some encouraging clues to guide further analysis, since several hits were to TFs in plants, albeit without sufficiently strong similarity. We also detected motifs in a larger promoter group of 117 genes from various species (anaerobic set 2). Four motifs, TTTTTGT, TTCATCA, AAAACC and CAACTT, were found to be similar to the motifs detected in the data set of 13 anaerobic promoters from the ethanolic fermentative pathway.
We compiled a set of α-amylase promoters as a negative control to compare with anaerobic promoters. The motifs found for anaerobic promoters were different from the motifs detected for α-amylase promoters. We also compared the presence of motifs from anaerobic promoters in a set of 303 promoters compiled from presumably non-anaerobic genes (negative control set 2). Out of the 20 most significant motifs, only three partly overlapped with motifs from anaerobic set 1. One of these was similar to the TATA-box, which is commonly found in the upstream regions of very many genes. The other two could be TFBSs that are more commonly shared in plant promoters. To validate further our motifs found in anaerobic set 1, we detected motifs in a number of genes induced by cold, drought, high salinity and ABA application (negative control set 3), stresses to which ADH also responds. The 16 motifs detected in anaerobic set 1 were not found in the top 20 significant motifs detected in the negative control set 3, with the exception of a partial TATA-box motif and AA(A/T)CAAA, which is partly similar to the AAACAAA motif of anaerobic set 1. This result suggests that although ADH responds to different stress conditions, the regulation could be different depending on the stress condition. In the data set of HSP promoters, no motif (out of the top 16) from anaerobic set 1 was present. These observations indicate that the 16 motifs we detected for anaerobic promoters could be biologically active, as they are highly specific to promoters of anaerobic genes that belong to the ethanolic fermentative pathway. The five new motifs that are not yet known as plant TFBSs are potentially new binding sites in plants and they, either individually or in combination with other motifs, could play an important role in regulating anaerobic metabolism. However, experimental verification will be necessary to establish the functionality of these motifs more certainly.
| LITERATURE CITED |
|---|
|
|
|---|
-
Altschul SF, Madden T, Schäffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ. 1997. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Research 25: 33893402.
Armstrong W. 1979. Aeration in higher plants. Advances in Botanical Research 7: 225232.
Arpagaus S, Braendle R. 2000. The significance of
-amylase under anoxia stress in tolerant rhizomes (Acorus calamus L.) and non-tolerant tubers (Solanum tuberosum L. var. Desiree). Journal of Experimental Botany 51: 14751477.
Bailey TL, Elkan C. 1994. Fitting a mixture model by expectation maximization to discover motifs in biopolymers. Proceedings of the Second International Conference on Intelligent Systems for Molecular Biology (ISMB'94) 2: 2836, AAAI Press, Menlo Park, California, August.
Ballas N, Wong LM, Theologis A. 1993. Identification of the auxin-responsive element, AuxRE, in the primary indoleacetic acid-inducible gene, PS-IAA4/5, of pea (Pisum sativum). Journal of Molecular Biology 233: 580596.[CrossRef][ISI][Medline]
Benson DA, Karsch-Mizrachi I, Lipman DJ, Ostell J, Wheeler DL. 2004. GenBank update. Nucleic Acids Research 32: D23D26.
Boeckmann B, Bairoch A, Apweiler R, Blatter M, Estreicher A, Gasteige E, et al. 2003. The SWISS-PROT protein knowledgebase and its supplement TRMBL. in 2003. Nucleic Acids Research 31: 365370.
Califano A. 2000. Splash, structural pattern localization analysis by sequential histograms. Bioinformatics 16: 341357.
Caselle M, Di Cunto F, Provero P. 2002. Correlating overrepresented upstream motifs to gene expression: a computational approach to regulatory element discovery in eukaryotes. BMC Bioinformatics 3: 7.[CrossRef][Medline]
Dolferus R, Jacobs M, Peacock WJ, Dennisn ES. 1994. Differential interactions of promoter elements in stress responses of the arabidopsis Adh gene. Plant Physiology 105: 10751087.[Abstract]
Dolferus R, Klok EJ, Delessert C, Wilson S, Ismond KP, Good AG, et al. 2003. Enhancing the anaerobic response. Annals of Botany 91: 111117.
Fennoy SL, Bailey-Serres J. 1995. Post-transcriptional regulation of gene expression in oxygen-deprived roots of maize. Plant Journal 7: 287295.[CrossRef][ISI]
Geffers R, Sell S, Cerff R, Hehl R. 2001. The TATA box and a Myb binding site are essential for anaerobic expression of a maize Gap C4 minimal promoter in tobacco. Biochimica et Biophysica Acta 1521: 120125.[Medline]
Greenway H, Gibbs J. 2003. Mechanism of anoxia tolerance in plants. I. Growth, survival and anaerobic catabolism. Functional Plant Biology 30: 147.[CrossRef][ISI]
Grover A, Pareek A, Singla SL, Minhas D, Katiyar S, Ghawana S, et al. 1998. Engineering crops for tolerance against abiotic stresses through gene manipulation. Current Science 75: 689696.
Guglielminetti L, Yamaguchi J, Perata P, Alpi A. 1995. Amylolytic activities in cereal seeds under aerobic and anaerobic conditions. Plant Physiology 109: 10691076.[Abstract]
van Helden J, Andre B, Collado-Vides J. 1998. Extracting regulatory sites from the upstream region of yeast genes by computational analysis of oligonucleotide frequencies. Journal of Molecular Biology 281: 827842[CrossRef][ISI][Medline]
Hertz GZ, Stormo GD. 1999. Identifying DNA and protein patterns with statistically significant alignments of multiple sequences. Bioinformatics 15: 563577.
Higo K, Ugawa Y, Iwamoto M, Korenaga T. 1999. Plant cis-acting regulatory DNA elements (PLACE) database. Nucleic Acids Research 27: 297300.
Hoeren FU, Dolferus R, Wu Y, Peacock WJ, Dennis ES. 1998. Evidence for a role of AtMYB2 in the induction of the arabidopsis alcohol dehydrogenase (ADH1) gene by low oxygen. Genetics 149: 479490.
Howarth CJ. 1991. Molecular responses of plants to an increased incidence of heat shock. Plant, Cell and Environment 14: 831841.[CrossRef]
Huang E, Yang L, Chowdhary R, Kassim A, Bajic VB. 2005. An algorithm for ab-initio DNA motif detection. In: Bajic VB, Tan TW, eds. Information processing and living system. World Scientific, Imperial College Press, London, 611614.
Jin H, Martin C. 1999. Multifunctionality and diversity within the plant MYB-gene family. Plant Molecular Biology 41: 577585.[CrossRef][ISI][Medline]
Kennedy RA, Rumpho ME, Fox TC. 1992. Anaerobic metabolism in plants. Plant Physiology 84: 12041209.
Khush GS, Baenziger PS. 1998. Crop improvement: emerging trends in rice and wheat. In: Chopra VL, Singh RB, Verma A, eds. Crop productivity and sustainabilityshaping the future. New Delhi: Oxford and BH publishing, 113125.
Klok EJ, Wilson IW, Wilson D, Chapman SC, Ewing RM, Somerville SC, et al. 2002. Expression profile analysis of the low-oxygen response in arabidopsis root cultures. Plant Cell 14: 24812494.
Lawrence CE, Altschul SF, Boguski MS, Liu JS, Neuwald AF, Wootton JC. 1993. Detecting subtle sequence signals: a Gibbs sampling strategy for multiple alignments. Science 262: 208214.
Lescot M, Dehais P, Thijs G, Marchal K, Moreau Y, Van de Peer Y, et al. 2002. PlantCARE, a database of plant cis-acting regulatory elements and a portal to tools for in silico analysis of promoter sequences. Nucleic Acids Research 30: 325327.
Loreti E, Yamaguchi J, Alpi A, Perata P. 2003. Sugar modulation of
-amylase genes under anoxia. Annals of Botany 91: 143148.
Matys V, Fricke E, Geffers R, GoBling E, Haubrock M, Hehl R, et al. 2003. TRANSFAC: transcriptional regulation, from patterns to profiles. Nucleic Acids Research 31: 374378.
Olive MR, Walker JC, Singh K, Dennis ES, Peacock WJ. 1990. Functional properties of the anaerobic response element of the maize Adh1 gene. Plant Molecular Biology 15: 593604.[CrossRef][ISI][Medline]
Olive MR, Walker JC, Singh K, Ellis JG, Llewellyn D, Dennis ES, et al. 1991a. The anaerobic response element. Plant Molecular Biology 2: 673684.
Olive MR, Peacock WJ, Dennis ES. 1991b. The anaerobic responsive element contains two GC-rich sequences essential for binding a nuclear protein and hypoxic activation of the maize Adh1 promoter. Nucleic Acids Research 19: 70537060.
Pavesi G, Mereghetti P, Mauri G, Pesole G. 2004. Weeder Web: discovery of transcription factor binding sites in a set of sequences from co-regulated genes. Nucleic Acids Research 32: W199W203.
Perata P, Pozueta-Romero J, Akazawa T, Yamaguchi J. 1992. Effect of anoxia on starch breakdown in rice and wheat seeds. Planta 188: 611618.[ISI]
Perata P, Loreti E, Guglielminetti L, Alpi A. 1998. Carbohydrate metabolism and anoxia tolerance in cereal grains. Acta Botanica Neerlandica 47: 269283.[ISI]
Poluliakh N, Nakai K. 2003. Extraction of biological motifs by Gibbs Sampler from the promoters of Homo sapiens, Saccharomyces cerevisiae and Bacillus subtilis. Genome Informatics 14: 406407.
Praz V, Perier R, Bonnard C, Bucher P. 2002. The Eukaryotic promoter database, EPD: new entry types and links to gene expression data. Nucleic Acids Research 30: 322324.
Rabbani MA, Maruyama K, Abe H, Khan MA, Katsura K, Ito Y, et al. 2003. Monitoring expression profiles of rice genes under cold, drought, and high-salinity stresses and abscisic acid application using cDNA microarray and RNA gel-blot analyses. Plant Physiology. 133: 17551767.
ap Rees T, Jenkin LET, Smith AM, Wilson PM. 1987. The metabolism of flood tolerance plants. In Crawford RMM, ed. Plant life in aquatic and amphibious habitats. Oxford: Blackwell Scientific, 227238.
Sachs MM, Freeling M, Okimoto R. 1980. The anaerobic proteins of maize. Cell 20: 761767.[CrossRef][ISI][Medline]
Sachs MM, Subbaiah CC, Saab IN. 1996. Anaerobic gene expression and flooding tolerance in maize. Journal of Experimental Botany 47: 115.[ISI]
Shahmuradov IA, Gammerman AJ, Hancock JM, Bramley PM, Solovyev VV. 2003. PlantProm: a database of plant promoter sequences. Nucleic Acids Research 31: 114117.
Shinozaki K, Yamaguchi-Shinozaki K. 2000. Molecular responses to dehydration and low temperature: differences and cross-talk between two stress signaling pathways. Current Opinion in Plant Biology 3: 217223.[ISI][Medline]
Stormo GD. 2000. DNA binding sites: representation and discovery. Bioinformatics. 16:1623.
Sun Z, Henson CA. 1991. A quantitative assessment of the importance of barley seed
-amylase, debranching enzyme, and
-glucosidase in starch degradation. Archives of Biochemistry and Biophysics 284: 298305.[CrossRef][ISI][Medline]
Sun W, Montagu MV, Verbruggen N. 2002. Small heat shock proteins and stress tolerance in plants. Biochimica et Biophysica Acta 1577: 19.[Medline]
Thomashow MF. 1999. Plant cold acclimation: freezing tolerance genes and regulatory mechanisms. Annual Review of Plant Physiology and Plant Molecular Biology 50: 571599.[CrossRef][ISI]
Vartapetian BB, Jackson MB. 1997. Plant adaptations to anaerobic stress. Annals of Botany 79: 320.
Vilo J, Brazma A, Jonassen I, Robinson A, Ukkonen E. 2000. Mining for putative regulatory elements in the yeast genome using gene expression data. Proceedings of the International Conference on Intelligent Systems for Molecular Biology 8: 384394.
Vogel JM, Roth B, Cigan M, Freeling M. 1993. Expression of the two maize TATA binding protein genes and function of the encoded TBP proteins by complementation in yeast. Plant Cell 5: 16271638.[Abstract]
Walker JC, Howard EA, Dennis ES, Peacock WJ. 1987. DNA sequences required for anaerobic expression of the maize Adh1 gene. Proceedings of the National Academy of Sciences of the USA 84: 66246629.
Wang W, Vinocur B, Shoseyov O, Altman A. 2004. Role of plant heat-shock proteins and molecular chaperones in the abiotic stress response. Trends in Plant Science 9: 244252.[CrossRef][ISI][Medline]
Wang Z, Dalkilic M, Kim S. 2004. Guiding motif discovery by iterative pattern refinement. ACM Symposium on Applied Computing, Nicosia, Cyprus. March 1417: 162166.
Werner T, Fessele S, Maier H, Nelson PJ. 2003. Computer modeling of promoter organization as a tool to study transcriptional coregulation. FASEB Journal 17: 12281237.
Yanagisawa S. 2002. The Dof family of plant transcription factors. Trends in Plant Science 7: 555560.[CrossRef][ISI][Medline]
Yang L, Huang E, Bajic VB. 2005. Some implementation issues of heuristic methods for motif extraction from DNA sequences. International Journal of Computers, System, and Signals (in press).
Zhu JK. 2002. Salt and drought stress signal transduction in plants. Annual Review of Plant Biology 53: 247273.[CrossRef][Medline]