EPDnew, the Arabidopsis thaliana (thale cress) curated promoter database


Version: 003
Coverage: 21233 promoters, 21223 genes.
Genome assembly: A. thaliana (Feb 2011 TAIR10/araTha1)
Gene annotation: TAIR10 genes (1-Feb-2015)
Based on data from:
  • PEAT data from Morton et al. 2014
  • DeepCAGE from Cumbie et al. 2015
  • CAGE and OligoCap from Tokizawa et al. 2017
  • EPD (old)
Documentation files: Promoter assembly pipeline description

    Promoter Selection and Analysis tools

    Various tools allow you to analyse promoters from EPD and/or to select subsets of promoters. To analyse the complete EPD promoter set, go directly to one of the analysis pages. If you prefer to first select a subset of promoters, go to one of the selection pages. From the output of a selection page you can then navigate directly to one of the analysis pages, or continue with another selection page to refine your promoter selection.

    Selection tools

    • EPD selection tool: Promoter subset selection based on EPD-supplied annotation.
    • ChIP-Cor: Promoter subset selection based on experimental data or genome annotations residing in the MGA repository. Example: select promoters that have more than 100 H3K4me3 ChIP-seq tags between -100 and +100 relative to the TSS.
    • FindM: Promoter subset selection based on DNA motif occurrences. Example: select promoters that have (or don't have) a c-Myc binding site between -100 and +100 relative to the TSS.

    Analysis tools

    • ChIP-Cor: Generation of an aggregation plot (feature correlation plot) for a specific chromatin or genome annotation feature. Example: Distribution of nucleosomes (MNase-seq tags) near promoters, e.g. from -1000 to +1000 relative to the TSS.
    • ChIP-Extract: Extraction of specific chromatin features around each promoter in table format. The output is a table in which each row represents a promoter and each column gives the feature tag occurrence at a specific distance from the TSS. Example: Distribution of nucleosomes (MNase-seq tags) near each promoter, e.g. from -1000 to +1000 relative to the TSS. Useful for downstream analysis in R, for example to classify promoters according to differences in feature distribution.
    • OProf: Generate a motif occurrence profile around TSS positions. Example: Generate a plot showing the occurrence frequency of TATA-boxes between -100 to +100 relative to the TSS.
    • FindM: Extract DNA motif positions near transcription start sites. Example: extract coordinates of CCAAT-boxes located between -150 and -50 relative to a TSS. The output is a set of CCAAT-box positions that can be further analysed in the same way as a set of TSS positions.
    How-To Documentation: OProf, FindM and ChIP-Cor.
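
    As an illustration of the kind of downstream processing mentioned for ChIP-Extract, the sketch below selects promoters whose summed tag count in a window around the TSS exceeds a threshold, in the spirit of the ChIP-Cor selection example above. The table layout, promoter IDs, bin positions and function names are all hypothetical and chosen for illustration; they do not reproduce the actual output format of the EPD tools.

```python
# Hypothetical in-memory form of a promoter-by-distance tag count table.
# Rows are promoters, columns are bins at fixed distances from the TSS.

def window_tag_sum(positions, counts, start, end):
    """Sum the tag counts of all bins whose position lies in [start, end]."""
    return sum(c for p, c in zip(positions, counts) if start <= p <= end)


def select_promoters(table, positions, start, end, min_tags):
    """Return IDs of promoters with more than min_tags tags in the window."""
    return [
        promoter_id
        for promoter_id, counts in table.items()
        if window_tag_sum(positions, counts, start, end) > min_tags
    ]


# Toy data (invented): bin positions relative to the TSS and tag counts.
positions = [-200, -100, 0, 100, 200]
table = {
    "promoter_A": [3, 40, 55, 30, 2],
    "promoter_B": [1, 5, 8, 4, 0],
}

# Keep promoters with more than 100 tags between -100 and +100.
selected = select_promoters(table, positions, -100, 100, 100)
print(selected)
```

    With the toy data, only promoter_A passes the threshold (125 tags in the window, against 17 for promoter_B).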

    Database quality controls

    Core promoter elements enrichment

    Core promoter element analysis is performed in order to investigate the quality of the promoter collection. It exploits the fact that certain DNA motifs preferentially occur at characteristic distances from a TSS. For instance, the TATA-box occurs in a narrow region centered about 28 bp upstream of the TSS whereas the CCAAT-box occurs in a much wider area with a peak frequency at position −80. Based on these observations, we would expect a high-quality promoter collection to show high peaks for both sequence motifs. In addition, a narrow TATA-box peak at −28 would indicate precise TSS mapping. This analysis has been performed using OProf. Readers are encouraged to repeat this analysis and perform others in order to check the quality of the promoter list.
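
    The idea behind such a positional profile can be sketched in a few lines of Python. This is a simplified illustration, not the OProf implementation: it assumes exact matching of the string TATAAA as a stand-in for a TATA-box model, records the position of the first base of each match, and uses invented toy sequences in which the TSS sits at a known index. The function name and parameters are hypothetical.

```python
from collections import Counter


def motif_profile(sequences, tss_index, motif):
    """Count motif start positions, relative to the TSS, over all sequences.

    tss_index is the 0-based index of the TSS within every sequence, so a
    match starting 28 bp upstream is recorded at position -28.
    """
    profile = Counter()
    for seq in sequences:
        start = seq.find(motif)
        while start != -1:
            profile[start - tss_index] += 1
            # Continue searching one base further to catch overlapping hits.
            start = seq.find(motif, start + 1)
    return profile


# Toy sequences (invented), each with the TSS at index 40.
tss_index = 40
sequences = [
    "G" * 12 + "TATAAA" + "G" * 22 + "A" + "C" * 10,  # motif starts at -28
    "C" * 12 + "TATAAA" + "C" * 22 + "A" + "G" * 10,  # motif starts at -28
    "C" * 20 + "TATAAA" + "C" * 14 + "A" + "G" * 10,  # motif starts at -20
]

profile = motif_profile(sequences, tss_index, "TATAAA")
for position in sorted(profile):
    print(position, profile[position])
```

    In a collection with precisely mapped TSSs, counts of this kind concentrate in a narrow range of positions; imprecise TSS mapping spreads them out and flattens the peak.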

    TATA-box: this core promoter element is normally found 28 bp upstream of the transcription start site. The following plot shows that the EPDnew promoter collection has a more focused TATA-box distribution than the TAIR10 annotation, suggesting precise TSS mapping in EPDnew.

    Initiator: this element is found at the TSS and shows a strong enrichment in EPDnew compared to the TAIR10 promoter collection.

    Last update May 2017