About nonAUG
The nonAUG Help Page
Accessing nonAUG
  • Every mRNA sequence in the nonAUG database can be accessed directly via
  • Sequences can be downloaded in FASTA, GenBank or XML format.
  • A nonAUG ID is a unique 10-character string in the following format.
  • The accession starts with the string "nA_". The next two characters are the first letters of the genus and species names in the binomial name of the corresponding organism. These are followed by an underscore ('_') and a four-digit numeric identifier.
  • Ex:   nA_Nc_0007   Nc: Neurospora crassa
  • The organism browser provides details of the sequences found in each organism. Details include sequence status, the number and positions of upstream AUGs, and the Kozak context surrounding the start codon.
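The accession format described above can be checked mechanically. The following is a minimal sketch, not part of the nonAUG site: the function name `parse_accession` is hypothetical, and the pattern accepts any two letters for the organism code because the page gives only one example ("Nc") and does not state the casing rule.

```python
import re

# Pattern assumed from the description: "nA_", two letters taken from the
# genus and species names, "_", then four digits (10 characters in total).
ACCESSION_RE = re.compile(r"^nA_([A-Za-z]{2})_(\d{4})$")


def parse_accession(accession):
    """Split a nonAUG accession into (organism_code, serial_number).

    Returns None when the string does not follow the documented format.
    """
    match = ACCESSION_RE.match(accession)
    if match is None:
        return None
    organism_code, serial = match.groups()
    return organism_code, int(serial)


print(parse_accession("nA_Nc_0007"))
print(parse_accession("nA_Nc_007"))
```

The first call returns the organism code and the numeric identifier; the second returns `None` because the identifier has only three digits.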
Searching nonAUG
  • nonAUG can be searched by organism name or by start codon. Sequences can also be searched by their annotation status as classified in the RefSeq database.
  • The status of a sequence can be "Reviewed", "Provisional", "Predicted", "Model" or "Validated".
Sequence Details
  • Sequences are listed with the following details:
  nonAUG Accession: Specifies a unique 10-character accession number.
  Description: Contains a brief description of the sequence.
  Organism: Lists the scientific name of the source organism (genus and species).
  Start Codon/Amino acid: Specifies the non-AUG start codon and the amino acid it encodes.
  nonAUG Annotation: Indicates whether this particular entry is annotated in the RefSeq database as having a non-canonical start codon.
  RefSeq Status:
Indicates the status of the sequence at RefSeq. The status falls under one of the following categories:
  • Reviewed
  • Provisional
  • Model
  • Validated
  • Predicted
A sequence with no status is categorized as "unclassified".
Shows the position of the non-AUG start codon in the given mRNA sequence
  No. of upstream AUGs:
Indicates the number of upstream AUG codons in the 5' UTR of the sequence. The number of AUGs is unavailable if the sequence is incomplete at the 5' end.
  Coding Length (bp): Gives the length of the coding region.
  RefSeq Accession: Shows the accession number of the sequence at RefSeq.
  GenInfo Identifier (GI): Gives the GenInfo Identifier at NCBI.
  Kozak Context:
Specifies the context around the start codon of the sequence. The context is broadly classified by the bases at positions -3 and +4. A favorable context has a purine at position -3 and a guanine at position +4. A sequence may instead lack the purine at -3, the guanine at +4, or both. The context is represented by these key positions.

The context is not defined if sequence data at the 5' end is unavailable.

  Gene Name: Lists the name of the gene corresponding to the sequence.
  GeneId: Shows the Gene Identifier at NCBI.
  Product: Specifies the name of the protein product.
  Protein ID: Gives the protein sequence accession identifier at NCBI.
  Reference: Gives the journal reference citing the sequence.
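The Kozak Context rule in the field list above can be sketched in a few lines. This is an illustration under stated assumptions, not the code used by nonAUG: the function name `kozak_context` and the returned labels are hypothetical, `start_index` is taken to be the 0-based index of the first base of the start codon (position +1), and positions follow the usual convention with no position 0, so -3 is three bases upstream and +4 is the base immediately after the codon.

```python
PURINES = {"A", "G"}


def kozak_context(mrna, start_index):
    """Classify the context around a start codon from positions -3 and +4.

    Returns None when either position falls outside the sequence, which
    covers the case where 5' sequence data is unavailable.
    """
    minus3 = start_index - 3  # position -3 (three bases upstream of +1)
    plus4 = start_index + 3   # position +4 (first base after the codon)
    if minus3 < 0 or plus4 >= len(mrna):
        return None
    seq = mrna.upper()
    has_purine = seq[minus3] in PURINES
    has_guanine = seq[plus4] == "G"
    # The labels below are hypothetical names for the four combinations.
    if has_purine and has_guanine:
        return "favorable"
    if has_purine:
        return "purine at -3 only"
    if has_guanine:
        return "G at +4 only"
    return "neither"


# CUG start codon at index 6, with A at -3 and G at +4.
print(kozak_context("GCCACCCUGG", 6))
```

Because only A and G are tested, the same function works for sequences written with T instead of U.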
Organism Browser
  • The browser lists the organisms that have mRNA sequences with non-canonical start codons. A direct link is provided to each organism.
  • For each organism, the browser summarizes the number of sequences with non-AUG start codons, the different types of non-AUG start codons, the number of sequences having upstream AUG codons, the sequences with available context, and the status at RefSeq.
  • The presence of upstream AUGs in each sequence is shown in the upstream AUG codon statistics table.
  • Positions of upstream AUG codons in the 5' UTR are given for all reading frames.
  • Statistics shows the number of sequences available at nonAUG with each release of RefSeq at NCBI. Sequences in nonAUG are divided according to RefSeq annotation status. A new category, "non-AUG annotated", is defined as a subset of the sequences in the "Reviewed" category. All sequences classified as "non-AUG annotated" have explicit annotation regarding alternate translation initiation.
  • Figures showing the number of sequences in each organism are available for the current release of RefSeq.
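The upstream AUG counts and per-frame positions mentioned above can be derived from an mRNA sequence and the position of its start codon. The sketch below is one way to do this and makes several assumptions not stated on this page: the function name `upstream_augs` is hypothetical, positions are 0-based, frame 0 means in frame with the annotated start codon, and only AUG codons lying entirely within the 5' UTR are counted.

```python
def upstream_augs(mrna, start_index):
    """Return {frame: [positions]} for AUG codons in the 5' UTR.

    start_index is the 0-based index of the first base of the start codon,
    so the 5' UTR is mrna[:start_index]. Frame 0 is in frame with the start
    codon; frames 1 and 2 are the two shifted frames.
    """
    utr = mrna[:start_index].upper().replace("T", "U")
    frames = {0: [], 1: [], 2: []}
    pos = utr.find("AUG")
    while pos != -1:
        frames[(start_index - pos) % 3].append(pos)
        pos = utr.find("AUG", pos + 1)
    return frames


# 5' UTR "AUGCAUGCC" followed by a CUG start codon at index 9.
frames = upstream_augs("AUGCAUGCCCUGAAA", 9)
print(frames)
print(sum(len(positions) for positions in frames.values()))
```

Summing the lengths of the three lists gives the total number of upstream AUGs for the sequence.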
Access Problems

Copyright 2009 All rights reserved, IIT Kanpur.

Laboratory of Computational Biology
Department of Biological Sciences and Bioengineering

Indian Institute of Technology Kanpur -208016