Class HgvsDna

java.lang.Object
org.snpeff.snpEffect.Hgvs
org.snpeff.snpEffect.HgvsDna

public class HgvsDna extends Hgvs
Coding DNA reference sequence

References: http://www.hgvs.org/mutnomen/recs.html

Nucleotide numbering:
- there is no nucleotide 0
- nucleotide 1 is the A of the ATG-translation initiation codon
- the nucleotide 5' of the ATG-translation initiation codon is -1, the previous -2, etc.
- the nucleotide 3' of the translation stop codon is *1, the next *2, etc.
- intronic nucleotides (coding DNA reference sequence only)
  - beginning of the intron: the number of the last nucleotide of the preceding exon, a plus sign and the position in the intron, like c.77+1G, c.77+2T, ....
  - end of the intron: the number of the first nucleotide of the following exon, a minus sign and the position upstream in the intron, like ..., c.78-2A, c.78-1G.
  - in the middle of the intron, numbering changes from "c.77+.." to "c.78-.."; for introns with an uneven number of nucleotides the central nucleotide is the last described with a "+" (see Discussion)

Genomic reference sequence:
- nucleotide numbering starts with 1 at the first nucleotide of the sequence
  NOTE: the sequence should include all nucleotides covering the sequence (gene) of interest and should start well 5' of the promoter of a gene
- no +, - or other signs are used
- when the complete genomic sequence is not known, a coding DNA reference sequence should be used
- for all descriptions the most 3' position possible is arbitrarily assigned to have been changed (see Exception)
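The coding DNA numbering rules above can be illustrated with a minimal sketch. This is not the HgvsDna implementation: the class CodingDnaNumberingSketch, its method names, and its parameters are hypothetical. It assumes positions are given as signed offsets in spliced-transcript coordinates (0 is the A of the ATG) and that introns are addressed by a 0-based index in transcript direction. Introns 3' of the stop codon (the "*" range) are not handled.

```java
/** Illustrative sketch of HGVS coding DNA numbering; not the snpEff implementation. */
final class CodingDnaNumberingSketch {

    /**
     * Label for an exonic nucleotide.
     *
     * @param offsetFromAtg signed offset in spliced-transcript coordinates; 0 is the A of
     *                      the ATG, -1 is the base immediately 5' of it (assumed convention)
     * @param cdsLength     number of bases from the A of the ATG through the stop codon
     */
    static String exonicLabel(int offsetFromAtg, int cdsLength) {
        if (cdsLength <= 0) {
            throw new IllegalArgumentException("cdsLength must be positive");
        }
        if (offsetFromAtg < 0) {
            return "c." + offsetFromAtg;              // c.-1, c.-2, ... (there is no c.0)
        }
        if (offsetFromAtg < cdsLength) {
            return "c." + (offsetFromAtg + 1);        // c.1 is the A of the ATG
        }
        return "c.*" + (offsetFromAtg - cdsLength + 1); // c.*1 is the first base 3' of the stop codon
    }

    /**
     * Label for an intronic nucleotide.
     *
     * @param lastExonBase  c. number of the last nucleotide of the preceding exon (e.g. 77 or -15)
     * @param indexInIntron 0-based index within the intron, in transcript direction
     * @param intronLength  number of nucleotides in the intron
     */
    static String intronicLabel(int lastExonBase, int indexInIntron, int intronLength) {
        if (indexInIntron < 0 || indexInIntron >= intronLength) {
            throw new IllegalArgumentException("indexInIntron is outside the intron");
        }
        int fromStart = indexInIntron + 1;           // distance from the preceding exon
        int fromEnd = intronLength - indexInIntron;  // distance from the following exon
        // Ties go to "+", so the central base of an odd-length intron is the last "+" position.
        if (fromStart <= fromEnd) {
            return "c." + lastExonBase + "+" + fromStart;
        }
        // The following exon starts at the next number, skipping the nonexistent nucleotide 0.
        int firstNextExonBase = (lastExonBase == -1) ? 1 : lastExonBase + 1;
        return "c." + firstNextExonBase + "-" + fromEnd;
    }

    public static void main(String[] args) {
        System.out.println(exonicLabel(0, 9));
        System.out.println(exonicLabel(-1, 9));
        System.out.println(exonicLabel(9, 9));
        for (int i = 0; i < 5; i++) {
            System.out.println(intronicLabel(77, i, 5));
        }
    }
}
```

Running main prints the labels for the A of the ATG, the base just 5' of it, the first base after the stop codon, and every position of a five-base intron following c.77.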
  • Field Details

    • debug

      public static boolean debug
  • Constructor Details

  • Method Details

    • alt

      protected String alt()
    • dnaBaseChange

      protected String dnaBaseChange()
      DNA level base changes
    • isDuplication

      protected boolean isDuplication()
      Is this a duplication?
    • pos

      protected String pos()
      Genomic position for exonic variants
    • pos

      protected String pos(int pos)
      HGVS position based on genomic coordinates (chr is assumed to be the same as in transcript/marker).
    • posDownstream

      protected String posDownstream(int pos)
      Position downstream of the transcript
    • posExon

      protected String posExon(int pos)
      Convert genomic position to HGVS compatible (DNA) position
    • posIntron

      protected String posIntron(int pos, Intron intron)
      Intronic position
    • posUpstream

      protected String posUpstream(int pos)
      Position upstream of the transcript.

      Note: How to calculate the upstream position. If the strand is '-', as for NM_016176.3, with "genomicTxStart" being the rightmost tx coordinate:

        cDotUpstream = -(cdsStart + variantPos - genomicTxStart)

      This is used instead of "-(variantPos - genomicCdsStart)". The method that stays in transcript space until extending beyond the transcript is correct because of these statements on http://varnomen.hgvs.org/bg-material/numbering/:

      * nucleotides upstream (5') of the ATG-translation initiation codon (start) are marked with a "-" (minus) and numbered c.-1, c.-2, c.-3, etc. (i.e. going further upstream)
      * Question: When the ATG translation initiation codon is in exon 2, and we find a variant in exon 1, should we include intron 1 (upstream of c.-14) in nucleotide numbering? (Isabelle Touitou, Montpellier, France)
        Answer: Nucleotides in introns 5' of the ATG translation initiation codon (i.e. in the 5'UTR) are numbered as introns in the protein coding sequence (see coding DNA numbering). In your example, based on a coding DNA reference sequence, the intron is present between nucleotides c.-15 and c.-14. The nucleotides for this intron are numbered as c.-15+1, c.-15+2, c.-15+3, ...., c.-14-3, c.-14-2, c.-14-1. Consequently, regarding the question, when a coding DNA reference sequence is used, the intronic nucleotides are not counted.
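The difference between the two formulas in the note can be seen in a small sketch. It is not the body of posUpstream: the class UpstreamPositionSketch and its method names are hypothetical. It assumes cdsStart is the number of exonic 5'UTR bases before the A of the ATG (the 0-based transcript index of the A). The '+' strand branch is an assumption made by symmetry; the note itself only gives the '-' strand formula.

```java
/** Illustrative sketch of the upstream c. position calculation; not the snpEff implementation. */
final class UpstreamPositionSketch {

    /**
     * Upstream position computed in transcript space.
     *
     * @param cdsStartInTx   exonic 5'UTR bases before the A of the ATG (assumed meaning of cdsStart)
     * @param variantPos     genomic coordinate of the variant
     * @param genomicTxStart genomic coordinate of the first transcript base
     *                       (the rightmost tx coordinate on the '-' strand)
     * @param strandMinus    true if the transcript is on the '-' strand
     */
    static int cDotUpstream(int cdsStartInTx, int variantPos, int genomicTxStart, boolean strandMinus) {
        // Bases between the variant and the transcript start, counted outside the transcript.
        // The '+' strand expression is an assumption by symmetry with the documented '-' case.
        int beyondTx = strandMinus ? variantPos - genomicTxStart : genomicTxStart - variantPos;
        if (beyondTx <= 0) {
            throw new IllegalArgumentException("variant is not upstream of the transcript");
        }
        // '-' strand: -(cdsStart + variantPos - genomicTxStart), as in the note.
        return -(cdsStartInTx + beyondTx);
    }

    /** The rejected genomic-distance formula ('-' strand), which also counts 5'UTR introns. */
    static int naiveGenomicDistance(int variantPos, int genomicCdsStart) {
        return -(variantPos - genomicCdsStart);
    }

    public static void main(String[] args) {
        // Hypothetical '-' strand transcript: tx start at 2000, A of the ATG at 1900,
        // 50 exonic 5'UTR bases, so the 5'UTR also spans an intron.
        System.out.println("transcript space: c." + cDotUpstream(50, 2001, 2000, true));
        System.out.println("genomic distance: c." + naiveGenomicDistance(2001, 1900));
    }
}
```

In the hypothetical transcript used in main, the two results differ by exactly the number of intronic bases inside the 5'UTR, which is the discrepancy the HGVS answer rules out.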
    • posUtr3

      protected String posUtr3(int pos)
      Position within 3'UTR
    • posUtr5

      protected String posUtr5(int pos)
      Position within 5'UTR
    • prefixTranslocation

      protected String prefixTranslocation()
      Translocation nomenclature.

      From HGVS: Translocations are described at the molecular level using the format "t(X;4)(p21.2;q34)", followed by the usual numbering, indicating the position of the translocation breakpoint. The sequences of the translocation breakpoints need to be submitted to a sequence database (Genbank, EMBL, DDBJ) and the accession.version numbers should be given (see Discussion).

      E.g.: t(X;4)(p21.2;q35)(c.857+101_857+102) denotes a translocation breakpoint in the intron between coding DNA nucleotides 857+101 and 857+102, joining chromosome bands Xp21.2 and 4q34
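As a rough illustration of the format quoted above, the prefix can be assembled from two chromosome names and two band names. The class TranslocationPrefixSketch and its methods are hypothetical, and nothing here reflects how prefixTranslocation obtains chromosome or band information.

```java
/** Illustrative sketch of the "t(X;4)(p21.2;q34)" prefix format; not the snpEff implementation. */
final class TranslocationPrefixSketch {

    /** Builds the translocation prefix, e.g. t(X;4)(p21.2;q34). */
    static String prefix(String chr1, String chr2, String band1, String band2) {
        return "t(" + chr1 + ";" + chr2 + ")(" + band1 + ";" + band2 + ")";
    }

    /** Appends the breakpoint description in the usual numbering, wrapped in parentheses. */
    static String withBreakpoint(String prefix, String breakpoint) {
        return prefix + "(" + breakpoint + ")";
    }

    public static void main(String[] args) {
        String p = prefix("X", "4", "p21.2", "q34");
        System.out.println(withBreakpoint(p, "c.857+101_857+102"));
    }
}
```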
    • ref

      protected String ref()
    • toString

      public String toString()
      Overrides:
      toString in class Object
    • typeOfReference

      protected String typeOfReference()
      Prefix for coding or non-coding sequences