How do I interpret SNP nomenclature?

I am combing through my 23andMe raw data and I am a little confused by SNP terminology. I am using NCBI's genome browser and SNP database. As an example we can all follow, here is a link to a CYP1A1 SNP. When you open this link you will see the alleles listed as G>C. Which one of these is the reference genome allele and which is the pathogenic allele? Is the relationship always constant (as in x>y, where x is always the reference and y is the substitution)? If you scroll down to the genome browser view, there also doesn't appear to be any way to tell the + strand from the - strand (i.e. making it difficult to tell G-for-C mutations from C-for-G). I would just assume it's the top strand? The ultimate goal here is for me to look through my 23andMe data, compare my SNP genotype to the reference genome, and see if my genotype exhibits pathogenic SNPs in certain genes.

Quick answer is yeah, the G is the "reference allele" as you put it. The greater than symbol is acting as an arrow, ie, G goes to C, that is, G is replaced with C.

Actually you get a pretty clear indication of that if you look down a line or two, where it mentions the frequency:

C=0.000419 (104/247982, GnomAD_exome) C=0.000494 (62/125568, TOPMED) C=0.000538 (62/115296, ExAC)

The C frequency is way too low to be the "normal" allele. It's rather the substitution which results in a missense variant. You can see more specific stats by clicking on the frequency button. In this case they basically just say that in all studied populations the G variant is by far the most common.
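To make the arithmetic behind that answer concrete, here is a minimal Python sketch. It splits an `x>y` string into reference and substituted allele, recomputes the C frequencies from the counts quoted above, and shows how the same change reads on the opposite strand. The function names are made up for illustration; they are not part of any NCBI tool.

```python
# Hypothetical helper names; only the counts come from the dbSNP line quoted above.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

def parse_substitution(notation):
    """Split 'G>C' into (reference_allele, substituted_allele)."""
    ref, alt = notation.strip().split(">")
    return ref, alt

def on_opposite_strand(notation):
    """Express the same substitution as read from the complementary strand."""
    ref, alt = parse_substitution(notation)
    return f"{COMPLEMENT[ref]}>{COMPLEMENT[alt]}"

def allele_frequency(allele_count, total_alleles):
    """Fraction of sampled chromosomes that carry the allele."""
    return allele_count / total_alleles

# (C allele count, total alleles sampled) per study, as quoted above
counts = {
    "GnomAD_exome": (104, 247982),
    "TOPMED": (62, 125568),
    "ExAC": (62, 115296),
}

ref, alt = parse_substitution("G>C")
print(f"reference={ref}, substitution={alt}")
for study, (n, total) in counts.items():
    print(f"{study}: {alt}={allele_frequency(n, total):.6f}")
print("opposite strand:", on_opposite_strand("G>C"))
```

A frequency this small means C is the rare allele, which is consistent with G being the reference.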

Discussions regarding the uniform and unequivocal description of sequence variants in DNA and protein sequences (mutations, polymorphisms) were initiated by two papers published in 1993, one by Beaudet AL & Tsui LC and one by Beutler E. The original suggestions were widely discussed, modified and extended, and ultimately resulted in nomenclature recommendations that have been largely accepted and are applied world-wide (see History).

Current rules (den Dunnen, JT and Antonarakis, SE (2000), paper / abstract) however do not extensively cover all types of variants and the more complex changes. These pages will list, based on the last publication, the existing nomenclature recommendations as well as the most recent suggestions (in italics and marked ). More details regarding the latest additions can be found at the Discussion page. These pages can be used as a guide to describe any sequence variant identified and should help to get a uniformly accepted standard.

Discussions regarding the advantages and disadvantages of the recommendations made are necessary in order to continuously improve the system. What is listed on these pages represents the current consensus of the discussions. We invite investigators to communicate to us regarding the recommendations as well as to send us complicated cases not yet covered, with a suggestion of how to describe these (E-mail to: VarNomen @

Mutation and polymorphism

In some disciplines the term "mutation" is used to indicate "a change" while in other disciplines it is used to indicate "a disease-causing change". Similarly, the term "polymorphism" is used both to indicate "a non disease-causing change" and "a change found at a frequency of 1% or higher in the population". To prevent this confusion we do not use the terms mutation and polymorphism (including SNP or Single Nucleotide Polymorphism) but use neutral terms like "sequence variant", "alteration" and "allelic variant". The Vol.19(1) issue of Human Mutation (2002) contains several contributions discussing these issues as well as the fact that the term "mutation" has developed a negative connotation (see Cotton RGH - p.2, Condit CM et al. - p.69 and Marshall JH - p.76). Therefore, current guidelines of authoritative organisations now also recommend using only the neutral term "variant" (e.g. Richards 2015, Genet.Med. 17:405-424).


Another confusing term used frequently is "a pathogenic variant". While a non-expert concludes that the variant described "causes disease", the expert probably means "causes disease when in a specific context":

  • causes disease when found in a male (X-linked recessive disorder)
  • causes disease when combined with a similar change in the other allele (autosomal recessive)
  • causes disease when inherited from the father (imprinted)

To prevent confusion it therefore seems best not to use the term "pathogenic". A good alternative seems to be a neutral term like "affects function". In fact this properly describes what one actually means: the variant affects the normal function of the gene/protein (in whatever way). This also solves the issue of what term to use for non-disease phenotypes like skin/hair/eye colour or blood group. In such cases it is problematic to choose which phenotype to call "normal" or "pathogenic". Using "affects function" is clear and effective. Variants are most frequently classified into 5 categories. Based on "affects function" these could be: affects function, probably affects function, unknown, probably does not affect function (or probably no functional effect), and does not affect function (no functional effect). Variants for which a functional effect is unknown can together be called "variants of unknown significance" (VUS).

How do I interpret my raw data?

Your raw data file contains a list of all SNPs, or genetic markers, you have tested positive for. This means that the file contains a very long list of all the SNPs we have found when running your DNA against our testing chip.

The information isn't particularly easy to read unless you have studied genetics or have a computer system that can translate the data into a more user-friendly format - such as our portal.

This file does not contain any imputed markers. Genotype imputation is a technique which geneticists use to make an educated guess about what SNPs are likely to be in a DNA sample, but which haven't been specifically tested for.

The file contains several tab-delimited columns (meaning that the data on each line is separated by a tab, not space or a comma) of information.

A random snippet of a raw data file

The first column gives the rsID (Reference SNP cluster ID) - this is an accession number used by researchers and databases to refer to specific SNPs. This is effectively a universal name for the SNP you have tested positive for.

The second column gives the chromosome number where the SNP is found, and the third column gives the position on the chromosome - this is how far along the chromosome that specific SNP is found.

The final column is your genotype. Your genotype is written using the two letters of the two nucleotide bases at a particular position, such as AA or CT.

All DNA is made up of four nucleotide bases: adenine (A), cytosine (C), guanine (G), and thymine (T). Your DNA is made of two strands of nucleotide bases that bind together and coil into a double helix. You carry two copies of most positions, one inherited from each parent, and the two letters of your genotype (such as AA or CT) give the base found on each copy.

If you have a " -- " in the genotype column of your data file, this means that we could not detect your genotype variant at this SNP when we tested your DNA.
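The column layout described above can be read with a few lines of Python. This is a minimal sketch, not the provider's own tooling. The sample rows and rsIDs are invented for illustration, and it assumes that comment lines start with `#` and that any header row has a non-numeric position column.

```python
import csv
import io

# Invented sample rows: rsID, chromosome, position, genotype (tab-delimited)
SAMPLE = (
    "# hypothetical comment line\n"
    "rs0000001\t1\t12345\tAA\n"
    "rs0000002\t7\t67890\tCT\n"
    "rs0000003\tX\t13579\t--\n"
)

def read_genotypes(handle):
    """Return {rsid: {"chromosome", "position", "genotype"}} from a raw data file."""
    genotypes = {}
    for row in csv.reader(handle, delimiter="\t"):
        # Skip blank lines, comment lines, and anything too short to be a data row
        if len(row) < 4 or row[0].startswith("#"):
            continue
        rsid, chromosome, position, genotype = row[0], row[1], row[2], row[-1]
        if not position.isdigit():
            continue  # assumed header row
        genotypes[rsid] = {
            "chromosome": chromosome,
            "position": int(position),
            # "--" means no genotype could be detected at this SNP
            "genotype": None if genotype == "--" else genotype,
        }
    return genotypes

calls = read_genotypes(io.StringIO(SAMPLE))
for rsid, info in calls.items():
    print(rsid, info["chromosome"], info["position"], info["genotype"])
```

For a real file, pass an open file handle (for example `open(path, newline="")`) instead of the `io.StringIO` wrapper.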

Your mtDNA and/or Y DNA file:

Your mitochondrial and (if you're genetically male) your Y DNA data file will contain a list of DNA markers that make up your mt or Y genetic signature.

We will provide you with the most common name for a particular genetic marker and any alternative names that this marker may have - for example, if you have one row of your file like this "F83/M1185/PF5861" - this means that you have tested positive for marker "F83" which is also known as "M1185" and "PF5861".

Please note that we do not include genetic markers you have not tested positive for in these files.

We have used these positive markers to assign you your haplogroup and subclade (if applicable). Each haplogroup has a defining marker and, often, a set of supporting markers. We search for which defining and supporting markers you have in order to assign you your haplogroup - we always read as far down the mt and/or Y DNA phylogenetic tree as we can within scientific reason.

You may find that you possess markers for haplogroups or subclades lower down the tree than the one we have assigned you.

Although you may have markers further down the tree, they will not appear reliably at each level/branch of the tree. In order to be certain about your haplogroup, we need to see the marker for each level in your DNA.

Therefore, if you do have markers for a haplogroup further down the tree than the one we have assigned you, but you do not have any markers for the haplogroups above it, then we are not able to reliably call you this haplogroup, and will assign one that is higher up the tree.

We test for SNPs, not STRs, thus your raw data file from us may look different to that of other providers.

This genotype data is not suitable for clinical/medical research or diagnosis.

The user assumes all responsibility for the security of this file - please refer to the Living DNA website for more information.

How do I interpret SNP nomenclature? - Biology

Inbred strains of mice represent unique fixed genotypes that can be repeatedly accessed as homogeneous experimental individuals, with predictable phenotypes and defined allelic composition. Hundreds of inbred strains of mice have been described and new strains continue to be developed, taking advantage of the rich genetic diversity among the existing strains and the ease with which the mouse genome can be manipulated.

MGI serves as a registry for mouse strains worldwide, maintaining the authoritative nomenclature for existing strains. Comparative data on inbred strain characteristics, SNPs, polymorphisms, and quantitative phenotypes are integrated with other genetic, genomic, and biological data in MGI.

SNPs (single nucleotide polymorphisms)

MGI provides comprehensive information about reference SNPs including the reference flanking sequence, assays that define the SNP, and gene/marker associations with their corresponding function class annotations. Each SNP detail page includes links to popular gene browsers including the MGI JBrowse Genome Browser.

Other molecular polymorphisms

Strain characteristics and historical origins

MGI holds information on comparative strain characteristics as originally curated by Dr. Michael Festing. These narratives provide key phenotypic traits of major inbred strains, such as behavior, physiology, anatomy, drug responses, immunology, infection, and reproduction. The Genealogy of Inbred Strains provides a "pedigree" of relationships of strains since their origin. The Genealogy Chart graphically displays the movement and development of inbred strains and is particularly useful in looking at dispersion of strains and how inbreeding (and allele fixation) occurred in relation to conserved sequence blocks observed in SNP analysis. Data are fully referenced.

What are the ApoE variants and what do they mean?

Here’s where it gets somewhat confusing. There are three common alleles for the APOE gene: Apo-ε2, ε3 and ε4 (often called E2 etc.). There are also some less common versions such as ε1 and even an ε5 allele. You may read that these forms just don’t exist. That is incorrect; they are just rare.

For now let’s focus on ε2 to ε4. As you know, we carry two copies of almost every gene in the body. That means that you can have one copy of ε2 and one of ε4, or two of ε4 and so on. Knowing which two copies you have is important as from this you can infer your AD risk.
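As a small illustration of "two copies", the possible pairings of the three common alleles can be enumerated in Python. This is only a counting sketch. It attaches no risk values, and the `/` separator is just a display choice.

```python
from itertools import combinations_with_replacement

# The three common APOE alleles named above
ALLELES = ("ε2", "ε3", "ε4")

# Order does not matter (ε2/ε4 is the same genotype as ε4/ε2),
# and both copies may be the same allele.
genotypes = [f"{a}/{b}" for a, b in combinations_with_replacement(ALLELES, 2)]

print(len(genotypes), "possible genotypes:")
for g in genotypes:
    print(g)
```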

I’ve summarized the various risks in the table below, which are adapted from this paper.

APOE Allele 1 | APOE Allele 2 | AD Risk
ε3 | ε3 | Neutral

So from the above you can see that carrying two copies of ε4 significantly increases your risk of developing AD. There are several studies that have looked at this and so you may often see different numbers, but the pattern is the same.

Again, increased risk is NOT guaranteed Alzheimer’s disease. If anything, learning you carry an ApoE4 variant is a signal that watching what you eat, blood pressure, exercise, and what your blood lipid markers look like is all that much more necessary.

4 Transgenes

Any DNA that has been stably introduced into the germline of mice or rats is a transgene. Transgenes can be broken down into two categories:

  • Those that are produced by homologous recombination as targeted events at particular loci
  • Those that occur by random insertion into the genome (usually by means of microinjection)

Nomenclature for targeted genes is dealt with in Section 3.5. Random insertion of a transgene in or near an endogenous gene may produce a new allele of this gene. This new allele should be named as described in Section 3.4.2. The transgene itself is a new genetic entity for which a name may be required. This section describes the guidelines for naming the inserted transgene.

4.1 Symbols for transgenes

It is recognized that it is not necessary, or even desirable, to name all transgenes. For example, if a number of transgenic lines are described in a publication but not all are subsequently maintained or archived, then only those that are maintained require standardized names. The following Guidelines were developed by an interspecies committee sponsored by ILAR in 1992 and modified by the Nomenclature Committee in 1999 and 2000. Transgenic symbols should be submitted to MGD or RGD through the nomenclature submission form for new loci. The transgene symbol is made up of four parts:

  • Tg denoting transgene
  • In parentheses, the official gene symbol of the inserted DNA, using nomenclature conventions of the species of origin
  • The laboratory's line or founder designation or a serial number (note that numbering is independent for mouse and rat series)
  • The Laboratory code of the originating lab

No part of a transgene symbol is ever italicized as these are random insertions of foreign DNA material and are not part of the native genome.

Tg(Zfp38)D1Htz: a transgene containing the mouse Zfp38 gene, in line D1 reported by Nathaniel Heintz.
Tg(CD8)1Jwg: a transgene containing the human CD8 gene, the first transgenic line using this construct described by the lab of Jon W. Gordon.
Tg(HLA-B*2705,B2M)33-3Trg: a double transgene in rat containing the human HLA-B*2705 and B2M genes, that were co-injected, giving rise to line 33-3 by Joel D. Taurog.

The *, as used in the last example above, indicates that the included gene is mutant.
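The four-part layout can be sketched as a tiny Python helper that assembles a symbol from its parts. The function name and argument names are invented for illustration. Real symbols still need to be registered through the nomenclature submission process described above.

```python
def transgene_symbol(inserted_gene, line_designation, lab_code):
    """Assemble 'Tg' + '(gene)' + line/founder designation + Laboratory code."""
    return f"Tg({inserted_gene}){line_designation}{lab_code}"

# Parts taken from the examples above
print(transgene_symbol("Zfp38", "D1", "Htz"))
print(transgene_symbol("CD8", "1", "Jwg"))
print(transgene_symbol("HLA-B*2705,B2M", "33-3", "Trg"))
```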

Different transgenic constructs containing the same gene should not be differentiated in the symbol; they will use the same gene symbol in parentheses and will be distinguished by the serial number/Laboratory code. Information about the nature of the transgenic entity should be given in associated publications and database entries.

In many cases, a large number of transgenic lines are made from the same gene construct and only differ by tissue specificity of expression. The most common of these are transgenes that use reporter constructs or recombinases (e.g., GFP, lacZ, cre), where the promoter should be specified as the first part of the gene insertion designation, separated by a hyphen from the reporter or recombinase designation. The SV40 large T antigen is another example. The use of promoter designations is helpful in such cases.

the LacZ transgene with a Wnt1 promoter, from mouse line 206 in the laboratory of Andrew McMahon
Tg(Zp3-cre)3Mrt: the cre transgene with a Zp3 promoter, the third transgenic mouse line from the laboratory of Gail Martin

In the case of a fusion gene insert, where roughly equal parts of two genes compose the construct, a forward slash separates the two genes in parentheses.

Tg(TCF3/HLF)1Mlc a transgene in which the human transcription factor 3 gene and the hepatic leukemia factor gene were inserted as a fusion chimeric cDNA, the first transgenic mouse line produced by Michael L. Cleary's laboratory (Mlc).

This scheme is to name the transgene entity only. The mouse or rat strain on which the transgene is maintained should be named separately as in the Rules and Guidelines for Nomenclature of Mouse and Rat Strains. In describing a transgenic mouse or rat strain, the strain name should precede the transgene designation.

C57BL/6J-Tg(CD8)1Jwg: mouse strain C57BL/6J carrying the Tg(CD8)1Jwg transgene.
rat strain F344/CrlBR carrying the Tg(HLA-B*2705,B2M)33-3Trg double transgene

For BAC transgenics, the insert designation is the BAC clone and follows the same naming convention as the Clone Registry at NCBI.

a BAC transgene where the inserted BAC is from the RP22 BAC library, plate 412, row K, column 21. It is the 15th in the mouse made in the laboratory of Stefan Somlo (Som).

Transgenes containing RNAi constructs can be designated minimally as:

An expanded version of this designation is:

While there is the option to include significant information on vectors, promoters, etc. within the parentheses of a transgene symbol, this should be minimized for brevity and clarity. The function of a symbol is to provide a unique designation to a gene, locus, or mutation. The fine molecular detail of these loci and mutations should reside in databases such as MGD and RGD.

4.2 Intergenic sites used as "neutral" recipient sequence landing sites

Commonly used insertion sites include Gt(Rosa)26Sor and Hprt. The characteristics of these loci are such that they are "benign" in not affecting expression or function of other genes. New sites that are intergenic are being identified that can also serve as neutral insertion sites for transgenesis and are designated by Igs# (Intergenic site #), where # indicates a serial number.

These intergenic genomic sequences can be modified by targeted, spontaneous or other means of mutagenesis to facilitate the creation of alleles for modified intergenic sites such as those generated by MICER targeting or knock-out alleles for highly conserved sequences that reside within intergenic sequences. In general, these sites are benign, not affecting expression or function of other genes, but can act as a generic site for many kinds of inserted DNA. These markers differ from Regulatory Region markers in that Igs# loci do not exhibit regulator function.

Intergenic sites are to be symbolized as:



Chairperson: Cynthia Smith

To see previous versions of these guidelines (last revised in November 2013), click here.

1. General Guidelines for Designating Chromosomes

Mouse chromosomes are numbered and identified according to the system given by Nesbitt and Francke (1973), Sawyer et al. (1987), Beechey and Evans (1996), and Evans (1996). The word Chromosome should start with a capital letter when referring to a specific chromosome and may be abbreviated to Chr after the first use, e.g., Chromosome (Chr) 1 and Chr 1. The X and Y chromosomes are indicated by capital letters rather than numbers.

Cytogenetic bands are named by capital letters, alphabetically designating the major Giemsa (G)-staining bands from centromere to telomere. Major subdivisions within cytogenetic bands are numbered. Additional subdivisions are designated using a decimal system.

Major G-band designation: Chr 17B
Major subdivisions within the Chr 17B band: 17B1, 17B2
Additional subdivision of band 17B1: 17B1.1, 17B1.2, 17B1.3, etc.
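The band hierarchy lends itself to a simple pattern match. The Python sketch below splits a designation such as 17B1.2 into chromosome, major band, subdivision, and decimal sub-subdivision. The regular expression is an assumption based only on the examples above and is not an official grammar.

```python
import re

# chromosome (number, X or Y) + capital-letter band + optional number + optional ".number"
BAND = re.compile(
    r"^(?P<chromosome>\d+|[XY])"
    r"(?P<band>[A-Z])"
    r"(?P<subdivision>\d+)?"
    r"(?:\.(?P<sub_subdivision>\d+))?$"
)

def parse_band(designation):
    """Return the parts of a cytogenetic band designation as a dict."""
    match = BAND.match(designation)
    if match is None:
        raise ValueError(f"not a recognised band designation: {designation!r}")
    return match.groupdict()

for example in ("17B", "17B1", "17B1.2"):
    print(example, parse_band(example))
```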

2. Symbols for Chromosome Anomalies

Chromosome anomaly symbols are not italicized (unlike gene symbols). A symbol is made up of three parts:

  • A prefix defining the type of anomaly
  • Specifically formatted information indicating the chromosomes involved
  • A series number and Laboratory code designation that uniquely identifies the anomaly

2.1 Prefix

A chromosome anomaly designation begins with a prefix that denotes the type of anomaly. Each prefix begins with a capital letter, with any subsequent letters being lowercase. The accepted prefixes are:

Hc: Pericentric heterochromatin
Hsr: Homogeneous staining region
MatDf: Maternal deficiency
MatDi: Maternal disomy
MatDp: Maternal duplication
PatDf: Paternal deficiency
PatDi: Paternal disomy
PatDp: Paternal duplication
Rb: Robertsonian translocation
Tg: Transgenic insertion (see Rules for Nomenclature of Genes, Genetic Markers, Alleles, and Mutations in Mouse and Rat)
UpDf: Uniparental deficiency
UpDi: Uniparental disomy
UpDp: Uniparental duplication

2.2 Designating the chromosomes involved in an anomaly

The chromosome(s) involved in the anomaly should be indicated by adding the appropriate Arabic numerals or letters in parentheses, between the anomaly prefix and the series symbol.

If two chromosomes are involved in a chromosome anomaly, such as translocations and insertions, the chromosomes are separated by a semicolon. In the case of Robertsonian translocations, the chromosomes involved are separated by a period indicating the centromere.

In the case of insertions, the chromosome donating the inserted portion should be given first, followed by the recipient chromosome.

2.3 A series number and Laboratory code designation that uniquely identifies the anomaly

The first and each successive anomaly from a particular laboratory or institution is distinguished by a series symbol, consisting of a serial number followed by the Laboratory Registration Code or Laboratory code of the person or laboratory who discovered the anomaly. Each type of chromosomal anomaly from a given laboratory will have its own series of serial numbers (see examples). The Laboratory code should be the code already assigned for the particular institute, laboratory, or investigator for use with strains that they hold. If there is no preassigned code, one should be obtained from the Institute of Laboratory Animal Research (ILAR). Laboratory codes are uniquely assigned to institutes or investigators and are usually three to four letters (first letter uppercase, followed by all lowercase).

Del(9)4H: deletion involving Chr 9, the 4th deletion from Harwell
In(15)4H: inversion involving Chr 15, the 4th inversion from Harwell
Is(13;1)4H: insertion of part of Chr 13 into Chr 1, the 4th insertion from Harwell
In(5)2Rk: inversion involving Chr 5, the 2nd inversion from T.H. Roderick's lab
Rb(3.15)2Rk: Robertsonian translocation involving Chr 3 and Chr 15, the 2nd Robertsonian translocation from T.H. Roderick's lab
Iso(6)1H: isochromosome 6, the 1st isochromosome from Harwell

Note: As mouse chromosomes are all acrocentric, with the exception of Chr Y, the p and q arm designations standard for human chromosomes are not used. For mouse Chr Y, p and q are appended as required. Example: Iso(Yq).
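The three-part structure (prefix, chromosomes in parentheses, serial number plus Laboratory code) can be taken apart with a regular expression. The Python sketch below handles only simple symbols of the kind listed in section 2.3, written with the semicolon and period separators from section 2.2. It does not attempt band, arm, or nested designations, and the pattern itself is an assumption for illustration.

```python
import re

ANOMALY = re.compile(
    r"^(?P<prefix>[A-Z][a-z]*)"      # e.g. Del, In, Is, Rb, Iso
    r"\((?P<chromosomes>[^)]+)\)"    # chromosome(s) involved
    r"(?P<serial>\d+)"               # serial number within the lab's series
    r"(?P<lab_code>[A-Z][a-z]*)$"    # Laboratory code
)

def parse_anomaly(symbol):
    """Split a simple chromosome anomaly symbol into its parts."""
    match = ANOMALY.match(symbol)
    if match is None:
        raise ValueError(f"unsupported anomaly symbol: {symbol!r}")
    return {
        "prefix": match["prefix"],
        # ';' separates chromosomes in general, '.' marks a Robertsonian centromere
        "chromosomes": re.split(r"[.;]", match["chromosomes"]),
        "serial": int(match["serial"]),
        "lab_code": match["lab_code"],
    }

for symbol in ("Del(9)4H", "Is(13;1)4H", "Rb(3.15)2Rk"):
    print(symbol, parse_anomaly(symbol))
```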

2.4 Abbreviating chromosome anomalies

Once the full designation for a chromosome anomaly is written in a document, an abbreviation can be used thereafter. The abbreviation consists of the anomaly prefix plus the serial number designation and Laboratory code. The chromosomal content in parentheses is omitted.

Using the examples from section 2.3:
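As a minimal sketch of the abbreviation rule (not the guideline's own listing), dropping the parenthesised chromosomal content can be done in Python with one substitution. The function name is invented for illustration.

```python
import re

def abbreviate(symbol):
    """Drop the parenthesised chromosome content, keeping prefix + serial + Laboratory code."""
    return re.sub(r"\([^)]*\)", "", symbol, count=1)

for full in ("Del(9)4H", "In(15)4H", "In(5)2Rk", "Rb(3.15)2Rk", "Iso(6)1H"):
    print(full, "->", abbreviate(full))
```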

2.5 Symbols for multiple chromosome anomalies

When an animal carries two or more anomalies that are potentially separable by recombination, the symbols for both (or all) anomalies should be given.

Rb(16.17)7Bnr T(1;17)190Ca/+ +: an animal heterozygous for a Robertsonian and a reciprocal translocation, each involving Chr 17. The anomalies are organizationally in "coupling", i.e., the same Chr 17 is involved in both.
Rb(5.15)3Bnr +/+ In(5)9Rk: an animal heterozygous for a Robertsonian and heterozygous for an inversion. Because they share a common chromosome, Chr 5, the organization of the anomalies is specified as in "repulsion."
Rb(10.11)5Rma/+ T(3;4)5Rk: an animal that is heterozygous for a Robertsonian translocation and homozygous for an unrelated reciprocal translocation.

2.6 Symbols for complex chromosome anomalies

When one chromosome anomaly is contained within another or is inseparable from it, the symbols should be combined.

2.7 Designating chromosomal breakpoints

The symbols p and q are used to denote the short and long arms, respectively, of mouse chromosomes. In translocations, breaks in the short arm should be designated with a p, but the q for long arm may be omitted if the meaning is clear. Because mouse autosomes and the X Chromosome are acrocentric, they do not have a short arm other than a telomere proximal to the centromere. Therefore, most rearrangements in mouse chromosomes involve breaks in the long arm (q arm). In mouse, Chr Y has both a p and q arm.

T(Yp;5)21Lub: translocation involving a break in the short arm of the Y Chromosome and the long arm of Chr 5, the 21st from Lubeck.

2.7.1 Defining the chromosomal band

When the positions of the chromosomal breakpoints relative to the G-banded karyotype are known, these are indicated by adding the band numbers, as given in the standard karyotype of the mouse (Evans 1996), after the appropriate chromosome numbers.

T(2H1;8A4)26H: reciprocal translocation having breakpoints in band H1 of Chr 2 and band A4 of Chr 8, the 26th from Harwell
In(XA1;XE)1H: inversion of the region between the breakpoints in bands A1 and E of the X Chromosome, the 1st from Harwell
Del(7E1)Tyr8Rl: deletion of band 7E1 manifesting as a mutation to albino, Tyr c, the 8th from Russell
Is(In7F1-7C;XF1)1Ct: inverted insertion of a segment of Chr 7 band F1-C into the X Chromosome at band F1, the 1st from Cattanach

For pericentric inversions the symbols pq and/or appropriate band numbers should be used.

In(8pq)1Rl: pericentric inversion involving Chr 8, the 1st from Russell
In(8pqA2): pericentric inversion of the region between the short arm and band A2 of the long arm of Chr 8
In(5C2;15E1)Rb3Bnr 1Ct: the first inversion found by Cattanach in Rb3Bnr of the region between bands 5C2 and 15E1

2.8 Deficiencies and deletions as chromosomal anomalies

The deficiency (Df) and duplication (Dp) nomenclature should be restricted in its use to defining the unbalanced products of chromosome aberrations, i.e., deficient/duplicated chromosomes resulting from malsegregation of reciprocal translocations. Deletions are interstitial losses often, although not always, cytologically visible. Neither of these terms should be applied to small intragenic deletions. The latter give rise to allelic variation in a single locus and are given allele symbols.

2.9 Imprinting and chromosomal anomalies

Since the 1980s, mouse translocations have been extensively used in imprinting studies to generate uniparental disomies and uniparental duplications (partial disomies) and deficiencies of whole or selected chromosome regions, respectively (reviewed by Cattanach and Beechey 1997 and Beechey 1999). The resulting chromosomal change may be of maternal, paternal, or uniparental (referring to one or the other parent without specification of maternal vs. paternal) origin.

  • Disomy - two copies of a chromosome derived from one parent
  • Duplications - two copies of a chromosome region derived from one parent
  • Deficiencies - missing segments of a particular chromosome region originating from one parent

Disomies and duplications of one parental copy imply deficiency of the other parental copy.

The nomenclature for these anomalies includes the affected chromosome in parentheses. The abbreviations, prox (proximal) and dist (distal) can be used to denote the position of the duplication/deficiency relative to the breakpoint of a translocation used to generate the duplication/deficiency. Similarly, if a translocation is used to produce a uniparental disomy or duplication, this can be indicated in the symbol.

MatDi(12): maternal disomy for Chr 12
PatDp(10): paternal duplication for a region of Chr 10
MatDp(dist2): maternal duplication for distal Chr 2
MatDf(7): maternal deficiency for Chr 7
PatDi(11)Rb4Bnr: paternal disomy for Chr 11 produced using Robertsonian translocation Rb(11.13)4Bnr
MatDp(dist2)T26H: maternal duplication for the region of Chr 2 distal to the breakpoint of the reciprocal translocation T(2;8)26H

2.10 Deletions identified through phenotypic change

If cytologically visible deletions are first detected by a change in the phenotype produced by a gene (e.g., Mgf Sl-12H), the gene and allele symbol designation should be included in the chromosome anomaly symbol, e.g., Del(10)Mgf Sl-12H 1H, which was originally identified as Sl 12H (see Rules for Nomenclature of Genes, Genetic Markers, Alleles, and Mutations in Mouse and Rat).

2.11 Chromosomal aneuploidy

Trisomies and monosomies should be denoted by the appropriate prefix symbol, followed by the chromosome(s) concerned. If a tertiary aneuploid or partial aneuploid is derived from a translocation, then the chromosome composition (proximal chromosome end superscripted distal chromosome end) is denoted in parentheses, followed by the serial number and Laboratory code.

Ts16: trisomy for Chr 16
Ts(1 13)70H (with 13 superscripted): trisomy for the proximal end of Chr 1 and the distal end of Chr 13, derived from the translocation T(1;13)70H (also referred to as tertiary trisomy or partial trisomy).

Nullisomy, monosomy, and tetrasomy are denoted similarly.

2.12 Transchromosomal anomalies

Transchromosomal is the term used to reference the case where a chromosome, chromosomal fragment, or engineered chromosome from another species exists as a separate, heritable, freely segregating entity or is centromerically fused to an endogenous chromosome. The designation of the additional chromosome is represented parenthetically including the species abbreviation and chromosome from that species, followed by an established line number and an ILAR Laboratory code.

The format for a transchromosomal is: Tc(AAAbb)CCXxx

Tc = transchromosomal
AAA = species abbreviation (e.g., HSA = human, MUS = mouse, BOV = bovine)
bb = chromosome number of the inserted fragment from the other species
CC = line number
Xxx = Laboratory code

Tc(HSA21)91-1Emcf: transchromosomal, human 21, line 91-1, Elizabeth M. C. Fisher
This is an engineered mouse line containing a fragment of human chromosome 21 as a freely segregating heritable fragment.
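The Tc(AAAbb)CCXxx template can be checked against the example with a short Python pattern. This is a sketch under the assumption that the species abbreviation is always three capital letters and that the Laboratory code is one capital followed by lowercase letters. The species lookup contains only the three abbreviations mentioned above.

```python
import re

TC = re.compile(
    r"^Tc\((?P<species>[A-Z]{3})(?P<chromosome>\d+|[XY])\)"
    r"(?P<line>.+?)"                 # line number, e.g. 91-1
    r"(?P<lab_code>[A-Z][a-z]+)$"    # Laboratory code, e.g. Emcf
)

SPECIES = {"HSA": "human", "MUS": "mouse", "BOV": "bovine"}

def parse_transchromosomal(symbol):
    """Split a transchromosomal symbol into species, chromosome, line and lab code."""
    match = TC.match(symbol)
    if match is None:
        raise ValueError(f"not a transchromosomal symbol: {symbol!r}")
    parts = match.groupdict()
    parts["species_name"] = SPECIES.get(parts["species"], "unknown")
    return parts

print(parse_transchromosomal("Tc(HSA21)91-1Emcf"))
```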

3. Variations in Heterochromatin and Chromosome Banding

3.1 Nucleolus organizers

The symbol NOR should be reserved for nucleolus organizers. Different organizers should be distinguished by chromosome numbers. Polymorphic loci within the ribosomal DNA region are designated with the root gene symbol, Rnr and the chromosome number (see Rules and Nomenclature of Genes, Genetic Markers, Alleles, and Mutations in Mouse and Rat).

Rnr12: a polymorphic DNA segment that identifies the ribosomal DNA region on Chr 12

3.2 Pericentric heterochromatin

The symbol H should be used for heterochromatin visualized cytologically, followed by a symbol indicating the chromosome region involved, in this case c for centromeric, and a number indicating the chromosome on which it lies.

Variations in size, etc., of any block should be indicated by superscripts, using n for normal or standard, l for large and s for small bands.

In describing a new variant, a single inbred strain should be named as the prototype or standard strain.

3.3 Loci within heterochromatin

Individual loci or DNA segments mapped within heterochromatin should be symbolized with D- symbols (for details of naming DNA segments, see Rules for Nomenclature of Genes, Genetic Markers, Alleles, and Mutations in Mouse and Rat). A lowercase h follows the D to indicate the DNA locus is a genetic marker for the heterochromatin region.

Dh1H: the first DNA segment within the pericentromeric heterochromatin region of Chr 1 discovered at Harwell.

3.4 Centromeres

The centromere itself (as opposed to pericentric heterochromatin) should be denoted by the symbol Cen. Individual loci or DNA segments mapped within the centromere region should be symbolized with D- symbols. It should be noted that at present there is no sequence definition for the centromere; Cen refers to the functional unit of the centromere.

3.5 Telomeres

The telomere should be denoted by the symbol Tel. The symbol Tel may be substituted for D in a locus symbol that refers to a locus recognized by a telomere consensus sequence probe. Symbols for such loci (mapping to the telomere region) are italicized and consist of three parts:

  • The letters Tel (for telomere)
  • A number denoting the chromosome
  • A letter denoting the centromeric or distal end of the chromosome, namely p for centromeric and q for distal (derived from p and q for short and long arms, respectively).

Multiple loci assigned to telomeres of individual chromosomes are numbered serially.

Tel14p1: the first telomere sequence mapped at the centromeric end of Chr 14
Tel19q2: the second telomere sequence mapped at the distal end of Chr 19
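
The three-part telomere symbol can be decoded mechanically. The following is a minimal sketch under stated assumptions: the function name parse_telomere_symbol is hypothetical, and the pattern treats the serial number as optional, which the rules above do not state explicitly.

```python
import re

# Tel + chromosome + p/q + optional serial number, per the three parts above.
TEL_PATTERN = re.compile(
    r"^Tel(?P<chromosome>\d+|[XY])(?P<end>[pq])(?P<serial>\d*)$"
)

# p marks the centromeric end and q the distal end, as defined in the rules.
END_NAMES = {"p": "centromeric", "q": "distal"}


def parse_telomere_symbol(symbol):
    """Decode a telomere locus symbol such as Tel14p1."""
    match = TEL_PATTERN.match(symbol)
    if match is None:
        raise ValueError(f"not a telomere locus symbol: {symbol!r}")
    serial = match.group("serial")
    return {
        "chromosome": match.group("chromosome"),
        "end": END_NAMES[match.group("end")],
        "serial": int(serial) if serial else None,
    }


if __name__ == "__main__":
    for example in ("Tel14p1", "Tel19q2"):
        print(example, parse_telomere_symbol(example))
```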

Telomeric sequences mapped to other chromosome regions should be designated as -rs loci and are sequentially numbered (see Rules for Nomenclature of Genes, Genetic Markers, Alleles, and Mutations in Mouse and Rat and Sawyer et al., 1987).

3.6 G-band polymorphisms

When a recognizable and heritable variant in size, staining density, etc. of a particular chromosomal G-band is discovered, this should be indicated by giving the designation of the band affected, in accordance with the standard karyotype of the mouse (Evans 1996), with a superscript to indicate the variant concerned.

When a supernumerary band becomes visible, this may be due to a small duplication, and if so should be designated as such. If the supernumerary band is due not to a duplication but to a further resolution within a band, then a new band should be designated as a subdivision of the appropriate known band (see Section 1 above).

4. Use of Human Chromosome Nomenclature

Chromosomal complements may be described using the type of nomenclature used for human chromosomes when dealing with whole arm changes. In this case the number of chromosomes is specified, followed by a comma and a specification of the whole arm chromosome change. Symbols used to designate these whole arm chromosome changes are:

  • "+" to indicate the presence of a specific additional autosome
  • "–" to indicate the absence of a specific autosome
  • "O" to indicate a missing sex chromosome
  • Additional Xs or Ys to indicate supernumerary sex chromosomes

For mosaics a double slash is used to separate the components of the chromosomal mosaic.


Although in general use, polymorphism is a very broad term. In biology, polymorphism has been given a specific meaning, being distinguishable from monomorphism (having only one form). A more specific term, when only two forms occur, is dimorphism.

  • The term omits characteristics showing continuous variation (such as weight), though this has a heritable component. Polymorphism deals with forms in which the variation is discrete (discontinuous) or strongly bimodal or polymodal. [4]
  • Morphs must occupy the same habitat at the same time; this excludes geographical races and seasonal forms. [5] The use of the words "morph" or "polymorphism" for what is a visibly different geographical race or variant is common, but incorrect. The significance of geographical variation is that it may lead to allopatric speciation, whereas true polymorphism takes place in panmictic populations.
  • The term was first used to describe visible forms, but nowadays it has been extended to include cryptic morphs, for instance blood types, which can be revealed by a test.
  • Rare variations are not classified as polymorphisms, and mutations by themselves do not constitute polymorphisms. To qualify as a polymorphism, some kind of balance must exist between morphs underpinned by inheritance. The criterion is that the frequency of the least common morph is too high simply to be the result of new mutations[4][6] or, as a rough guide, that it is greater than 1% (though that is far higher than any normal mutation rate for a single allele). [5] : ch. 5

Nomenclature

Polymorphism crosses several discipline boundaries, including ecology and genetics, evolution theory, taxonomy, cytology, and biochemistry. Different disciplines may give the same concept different names, and different concepts may be given the same name. For example, there are the terms established in ecological genetics by E.B. Ford (1975), [4] and for classical genetics by John Maynard Smith (1998). [7] The shorter term morphism may be more accurate than polymorphism, but is not often used. It was the preferred term of the evolutionary biologist Julian Huxley (1955). [8]

Various synonymous terms exist for the various polymorphic forms of an organism. The most common are morph and morpha, while a more formal term is morphotype. Form and phase are sometimes also used, but are easily confused in zoology with, respectively, "form" in a population of animals, and "phase" as a color or other change in an organism due to environmental conditions (temperature, humidity, etc.). Phenotypic traits and characteristics are also possible descriptions, though that would imply just a limited aspect of the body.

In the taxonomic nomenclature of zoology, the word "morpha" plus a Latin name for the morph can be added to a binomial or trinomial name. However, this invites confusion with geographically variant ring species or subspecies, especially if polytypic. Morphs have no formal standing in the ICZN. In botanical taxonomy, the concept of morphs is represented with the terms "variety", "subvariety" and "form", which are formally regulated by the ICN. Horticulturists sometimes confuse this usage of "variety" both with cultivar ("variety" in viticultural usage, rice agriculture jargon, and informal gardening lingo) and with the legal concept "plant variety" (protection of a cultivar as a form of intellectual property).

Three mechanisms may cause polymorphism: [9]

  • Genetic polymorphism, where the phenotype of each individual is genetically determined
  • A conditional development strategy, where the phenotype of each individual is set by environmental cues
  • A mixed development strategy, where the phenotype is randomly assigned during development

Endler's survey of natural selection gave an indication of the relative importance of polymorphisms among studies showing natural selection. [10] The results, in summary: Number of species demonstrating natural selection: 141. Number showing quantitative traits: 56. Number showing polymorphic traits: 62. Number showing both Q and P traits: 23. This shows that polymorphisms are found to be at least as common as continuous variation in studies of natural selection, and hence just as likely to be part of the evolutionary process.

Genetic polymorphism

Since all polymorphism has a genetic basis, genetic polymorphism has a particular meaning:

  • Genetic polymorphism is the simultaneous occurrence in the same locality of two or more discontinuous forms in such proportions that the rarest of them cannot be maintained just by recurrent mutation or immigration, originally defined by Ford (1940). [6][11] : 11 The later definition by Cavalli-Sforza & Bodmer (1971) is currently used: "Genetic polymorphism is the occurrence in the same population of two or more alleles at one locus, each with appreciable frequency", where the minimum frequency is typically taken as 1%. [12][13]

The definition has three parts: a) sympatry: one interbreeding population; b) discrete forms; and c) not maintained just by mutation.
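
The 1% frequency criterion can be written as a small check on allele counts. This is an illustrative sketch only: the function names and the example counts are made up, and the check covers just the frequency part of the definition, not sympatry or how the variation is maintained.

```python
def allele_frequencies(counts):
    """Convert allele counts at one locus into frequencies."""
    total = sum(counts.values())
    if total <= 0:
        raise ValueError("at least one allele must have been observed")
    return {allele: n / total for allele, n in counts.items()}


def is_polymorphic(counts, min_frequency=0.01):
    """True if two or more alleles each reach the minimum frequency.

    The 1% default follows the rough guide quoted in the text.
    """
    frequencies = allele_frequencies(counts)
    common = [a for a, f in frequencies.items() if f >= min_frequency]
    return len(common) >= 2


if __name__ == "__main__":
    # Hypothetical counts: the rare allele is at 0.1%, below the threshold.
    print(is_polymorphic({"G": 9990, "C": 10}))
    # Hypothetical counts: both alleles are well above 1%.
    print(is_polymorphic({"A": 700, "T": 300}))
```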

In simple words, the term polymorphism was originally used to describe variations in shape and form that distinguish normal individuals within a species from each other. Presently, geneticists use the term genetic polymorphism to describe the inter-individual, functionally silent differences in DNA sequence that make each human genome unique. [14]

Genetic polymorphism is actively and steadily maintained in populations by natural selection, in contrast to transient polymorphisms where a form is progressively replaced by another. [15] : 6–7 By definition, genetic polymorphism relates to a balance or equilibrium between morphs. The mechanisms that conserve it are types of balancing selection.

Mechanisms of balancing selection

  • Heterosis (or heterozygote advantage): "Heterosis: the heterozygote at a locus is fitter than either homozygote". [4][7] : 65 [11]
  • Frequency-dependent selection: The fitness of a particular phenotype is dependent on its frequency relative to other phenotypes in a given population. Example: prey switching, where rare morphs of prey are actually fitter due to predators concentrating on the more frequent morphs. [4][15]
  • Fitness varies in time and space. Fitness of a genotype may vary greatly between larval and adult stages, or between parts of a habitat range. [11] : 26
  • Selection acts differently at different levels. The fitness of a genotype may depend on the fitness of other genotypes in the population: this covers many natural situations where the best thing to do (from the point of view of survival and reproduction) depends on what other members of the population are doing at the time. [7] : 17 & ch. 7

Pleiotropism

Most genes have more than one effect on the phenotype of an organism (pleiotropism). Some of these effects may be visible, and others cryptic, so it is often important to look beyond the most obvious effects of a gene to identify other effects. Cases occur where a gene affects an unimportant visible character, yet a change in fitness is recorded. In such cases the gene's other (cryptic or 'physiological') effects may be responsible for the change in fitness. Pleiotropism poses continual challenges for clinical dysmorphologists attempting to explain birth defects that affect one or more organ systems but have only a single underlying causative agent. For many pleiotropic disorders, the connection between the gene defect and the various manifestations is neither obvious nor well understood. [16]

"If a neutral trait is pleiotropically linked to an advantageous one, it may emerge because of a process of natural selection. It was selected but this doesn't mean it is an adaptation. The reason is that, although it was selected, there was no selection for that trait." [17]

Epistasis

Epistasis occurs when the expression of one gene is modified by another gene. For example, gene A only shows its effect when allele B1 (at another locus) is present, but not if it is absent. This is one of the ways in which two or more genes may combine to produce a coordinated change in more than one characteristic (for instance, in mimicry). Unlike the supergene, epistatic genes do not need to be closely linked or even on the same chromosome.

Both pleiotropism and epistasis show that a gene need not relate to a character in the simple manner that was once supposed.

The origin of supergenes

Although a polymorphism can be controlled by alleles at a single locus (e.g. human ABO blood groups), the more complex forms are controlled by supergenes consisting of several tightly linked genes on a single chromosome. Batesian mimicry in butterflies and heterostyly in angiosperms are good examples. There is a long-standing debate as to how this situation could have arisen, and the question is not yet resolved.

Whereas a gene family (several tightly linked genes performing similar or identical functions) arises by duplication of a single original gene, this is usually not the case with supergenes. In a supergene some of the constituent genes have quite distinct functions, so they must have come together under selection. This process might involve suppression of crossing-over, translocation of chromosome fragments and possibly occasional cistron duplication. That crossing-over can be suppressed by selection has been known for many years. [18] [19]

Debate has centered on the question of whether the component genes in a supergene could have started off on separate chromosomes, with subsequent reorganization, or whether they must start on the same chromosome. Originally, it was held that chromosome rearrangement would play an important role. [20] This explanation was accepted by E. B. Ford and incorporated into his accounts of ecological genetics. [4] : ch. 6 [11] : 17–25

However, today many believe it more likely that the genes start on the same chromosome. [21] They argue that supergenes arose in situ. This is known as Turner's sieve hypothesis. [22] John Maynard Smith agreed with this view in his authoritative textbook, [7] but the question is still not definitively settled.

Selection, whether natural or artificial, changes the frequency of morphs within a population; this occurs when morphs reproduce with different degrees of success. A genetic (or balanced) polymorphism usually persists over many generations, maintained by two or more opposed and powerful selection pressures. [6] Diver (1929) found banding morphs in Cepaea nemoralis could be seen in prefossil shells going back to the Mesolithic Holocene. [23] [24] Non-human apes have similar blood groups to humans; this strongly suggests that this kind of polymorphism is ancient, at least as far back as the last common ancestor of the apes and man, and possibly even further.

The relative proportions of the morphs may vary; the actual values are determined by the effective fitness of the morphs at a particular time and place. The mechanism of heterozygote advantage assures the population of some alternative alleles at the locus or loci involved. Only if competing selection disappears will an allele disappear. However, heterozygote advantage is not the only way a polymorphism can be maintained. Apostatic selection, whereby a predator consumes a common morph whilst overlooking rarer morphs, is possible and does occur. This would tend to preserve rarer morphs from extinction.
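
How heterozygote advantage keeps both alleles in a population can be seen in the standard textbook one-locus, two-allele selection model. The sketch below is not taken from the cited sources. It assumes random mating and a very large population, and the selection coefficients s and t and the function names are introduced here purely for illustration. With relative fitnesses AA = 1 - s, Aa = 1 and aa = 1 - t, the frequency of allele A moves toward t / (s + t) from either direction, so neither allele is lost.

```python
def next_frequency(p, s, t):
    """Frequency of allele A after one generation of viability selection.

    Relative fitnesses: AA = 1 - s, Aa = 1, aa = 1 - t (heterozygote fittest
    when s and t are both between 0 and 1).
    """
    q = 1.0 - p
    w_AA, w_Aa, w_aa = 1.0 - s, 1.0, 1.0 - t
    mean_fitness = p * p * w_AA + 2 * p * q * w_Aa + q * q * w_aa
    # A alleles come from AA homozygotes and from half of the heterozygotes.
    return (p * p * w_AA + p * q * w_Aa) / mean_fitness


def simulate(p0, s, t, generations):
    """Iterate the recursion from starting frequency p0."""
    p = p0
    for _ in range(generations):
        p = next_frequency(p, s, t)
    return p


if __name__ == "__main__":
    s, t = 0.2, 0.1  # hypothetical selection coefficients
    print("predicted equilibrium:", t / (s + t))
    for start in (0.01, 0.99):
        print("start", start, "->", simulate(start, s, t, 1000))
```

Starting the allele rare or nearly fixed leads to the same intermediate frequency, which is the sense in which the polymorphism is balanced rather than transient.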

Polymorphism is strongly tied to the adaptation of a species to its environment, which may vary in colour, food supply, predation and many other ways. Polymorphism is one good way for a species to use these varied opportunities; it has survival value, and the selection of modifier genes may reinforce the polymorphism. In addition, polymorphism seems to be associated with a higher rate of speciation.

Polymorphism and niche diversity

G. Evelyn Hutchinson, a founder of niche research, commented "It is very likely from an ecological point of view that all species, or at least all common species, consist of populations adapted to more than one niche". [26] He gave as examples sexual size dimorphism and mimicry. In many cases where the male is short-lived and smaller than the female, he does not compete with her during her late pre-adult and adult life. Size difference may permit both sexes to exploit different niches. In elaborate cases of mimicry, such as the African butterfly Papilio dardanus, female morphs mimic a range of distasteful models, often in the same region. The fitness of each type of mimic decreases as it becomes more common, so the polymorphism is maintained by frequency-dependent selection. Thus the efficiency of the mimicry is maintained in a much increased total population. However it can exist within one gender. [4] : ch. 13

The switch

The mechanism which decides which of several morphs an individual displays is called the switch. This switch may be genetic, or it may be environmental. Taking sex determination as the example, in humans the determination is genetic, by the XY sex-determination system. In Hymenoptera (ants, bees and wasps), sex determination is by haplo-diploidy: the females are all diploid, the males are haploid. However, in some animals an environmental trigger determines the sex: alligators are a famous case in point. In ants the distinction between workers and guards is environmental, by the feeding of the grubs. Polymorphism with an environmental trigger is called polyphenism.

The polyphenic system does have a degree of environmental flexibility not present in the genetic polymorphism. However, such environmental triggers are the less common of the two methods.

Investigative methods

Investigation of polymorphism requires use of both field and laboratory techniques. In the field:

  • detailed survey of occurrence, habits and predation
  • selection of an ecological area or areas, with well-defined boundaries
  • capture, mark, release and recapture data
  • relative numbers and distribution of morphs
  • estimation of population sizes

In the laboratory:

  • genetic data from crosses
  • population cages
  • chromosome cytology if possible
  • use of chromatography, biochemistry or similar techniques if morphs are cryptic

Without proper field-work, the significance of the polymorphism to the species is uncertain, and without laboratory breeding the genetic basis is obscure. Even with insects, the work may take many years; examples of Batesian mimicry noted in the nineteenth century are still being researched.

Polymorphism was crucial to research in ecological genetics by E. B. Ford and his co-workers from the mid-1920s to the 1970s (similar work continues today, especially on mimicry). The results had a considerable effect on the mid-century evolutionary synthesis, and on present evolutionary theory. The work started at a time when natural selection was largely discounted as the leading mechanism for evolution, [27] [28] continued through the middle period when Sewall Wright's ideas on drift were prominent, to the last quarter of the 20th century when ideas such as Kimura's neutral theory of molecular evolution were given much attention. The significance of the work on ecological genetics is that it has shown how important selection is in the evolution of natural populations, and that selection is a much stronger force than was envisaged even by those population geneticists who believed in its importance, such as Haldane and Fisher. [29]

In just a couple of decades the work of Fisher, Ford, Arthur Cain, Philip Sheppard and Cyril Clarke promoted natural selection as the primary explanation of variation in natural populations, instead of genetic drift. Evidence can be seen in Mayr's famous book Animal Species and Evolution, [30] and Ford's Ecological Genetics. [4] Similar shifts in emphasis can be seen in most of the other participants in the evolutionary synthesis, such as Stebbins and Dobzhansky, though the latter was slow to change. [3] [31] [32] [33]

Kimura drew a distinction between molecular evolution, which he saw as dominated by selectively neutral mutations, and phenotypic characters, probably dominated by natural selection rather than drift. [34]

WHO Updates the Nomenclature of SARS-CoV-2 Variants

Lisa Winter
Jun 1, 2021

The naming of variants of SARS-CoV-2 has been a bit slapdash. Different databases that share the sequences of the virus have different nomenclature norms. For instance, the variant that emerged in the United Kingdom is called B.1.1.7 on the Pango platform, but is called 20I/S:501Y.V1 on Nextstrain. Yesterday (May 31), the World Health Organization (WHO) announced that SARS-CoV-2 variants of interest (VOI) and variants of concern (VOC) will be named based on the Greek alphabet for purposes of public discourse.

As B.1.1.7 was the first VOC designated by WHO, it is called Alpha under the new naming system. B.1.351, which was first identified in South Africa, is now called Beta. The two other VOCs are P.1, the variant first identified in Brazil and now referred to as Gamma, and B.1.617.2, which originated in India and is now called Delta. The six VOIs designated by WHO take up Epsilon through Kappa in the Greek alphabet. The full list will be maintained on WHO’s website.
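
For readers who handle lineage names in code, the four VOC labels named above amount to a small lookup table. This is only a sketch: the function name who_label is hypothetical, and the table covers just the four variants of concern mentioned in this article, not WHO's full or current list.

```python
# Pango lineage -> WHO label, limited to the four VOCs named in the article.
WHO_LABELS = {
    "B.1.1.7": "Alpha",
    "B.1.351": "Beta",
    "P.1": "Gamma",
    "B.1.617.2": "Delta",
}


def who_label(pango_lineage):
    """Return the WHO label for a Pango lineage, or None if not in the table."""
    return WHO_LABELS.get(pango_lineage.strip().upper())


if __name__ == "__main__":
    for lineage in ("B.1.1.7", "B.1.351", "P.1", "B.1.617.2"):
        print(lineage, "->", who_label(lineage))
```

Returning None for unknown lineages reflects the point in WHO's statement that the Greek labels sit alongside the scientific names rather than replacing them.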

“These [Greek] labels do not replace existing scientific names (e.g. those assigned by GISAID, Nextstrain and Pango), which convey important scientific information and will continue to be used in research,” WHO’s statement reads.

According to WHO, the technical variant names are too confusing for the general public and so “people often resort to calling variants by the places where they are detected, which is stigmatizing and discriminatory.”

The new naming system comes long after the first variants were described. WHO officials say the decision came after a great deal of discussion on which naming convention would be best. Reuters reports that the group considered other possibilities including portmanteaus, fruits, or Greek deities.

According to STAT, the group behind the decision was made up of many of the same people who are on the International Committee on the Taxonomy of Viruses. Although the organization named SARS-CoV-2, variant nomenclature is beyond its official scope, and so the task was left to WHO.

“I heard it’s sometimes quite a challenge to come to an agreement with regards to nomenclature,” Frank Konings, the leader of the working group, tells STAT. “This was a relatively straightforward discussion in getting to the point where everybody agreed.”
