Classifying Coding DNA with Nucleotide Statistics
Nicolas Carels and Diego Frias

Presentation Outline:

Carels, N. and Frias, D. (2009). Classifying coding DNA with nucleotide statistics. Bioinformatics and Biology Insights 3:141-154.

  1. Importance of accurate methods for the detection of coding DNA
  2. Methods which have been used to identify CDS
    1. Based on codon usage (Hidden Markov Models)
    2. Nucleotide periodicity
    3. Detection of ancestral codon, RNY pattern
  3. Methodology behind UFM
  4. Results
  5. Discussion
    1. Scoring Purine Bias with UFM
    2. Comparison of CSF and UFM
    3. Comparison of the Classification of Coding and Non-coding ORF by UFM
  6. References

1. Importance of accurate methods for the detection of coding DNA

Methods of Gene Detection

Extrinsic Methods
Intrinsic Methods

What is CDS?

The coding sequence (CDS) of a gene is the sequence of nucleotides that corresponds to the sequence of amino acids in a protein. A typical CDS starts with ATG and ends in a stop codon (1).
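
The definition above can be illustrated with a minimal Python sketch. It assumes the standard genetic code (stop codons TAA, TAG and TGA) and a sequence given on the coding strand; the function name `looks_like_cds` is hypothetical, and the extra check for in-frame stop codons is an assumption added for illustration. Because the text describes only a typical CDS, real coding sequences may fail this test (for example, partial sequences or alternative start codons).

```python
# Stop codons of the standard genetic code (assumed here).
STOP_CODONS = {"TAA", "TAG", "TGA"}


def looks_like_cds(seq: str) -> bool:
    """Return True if seq has the shape of a typical CDS.

    Checks: length is a multiple of 3, first codon is ATG,
    last codon is a stop codon, and no stop codon occurs
    in frame before the end (an illustrative assumption).
    """
    seq = seq.upper()
    # Need at least a start codon and a stop codon.
    if len(seq) < 6 or len(seq) % 3 != 0:
        return False
    # Split the sequence into consecutive codons in frame 1.
    codons = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    if codons[0] != "ATG" or codons[-1] not in STOP_CODONS:
        return False
    # Reject sequences with a premature in-frame stop codon.
    return not any(codon in STOP_CODONS for codon in codons[1:-1])


if __name__ == "__main__":
    print(looks_like_cds("ATGGCCTAA"))     # start codon, one codon, stop codon
    print(looks_like_cds("ATGTAAGCCTGA"))  # premature in-frame stop
```

A check of this kind only describes the shape of an open reading frame; it does not decide whether the ORF is actually coding, which is the problem the methods in the next section address.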


A large fraction of genomic DNA is non-coding, so identifying the coding regions is critical for determining which areas of the genome code for proteins. Owing to advances in sequencing technologies, there is a large amount of genomic data that must be searched to identify genes. This has created demand for automated programs that can identify coding regions accurately and quickly, thereby providing insight into the function of a gene.

2. Methods for CDS Identification

A. Codon Usage

Hidden Markov Models

B. Nucleotide Periodicity

Average Mutual Information and Spectral Rotation Measure

C. Detection of Ancestral Codon, RNY Pattern (CSF and UFM)

Codon Structure Factor (CSF)
Universal Feature Method (UFM)