Are plasmids found in eukaryotes?

Some people say plasmids are found in some eukaryotes, but is this scientifically proven?

These resources may help.

This book, “Plasmids of Eukaryotes”, explains that…

“The possession of plasmids was for a long time recognized only in the bacteria. It is now evident that plasmids, or replicative forms of DNA structurally and experimentally comparable to bacterial plasmids, exist in eukaryotic organisms as well. Such plasmids are in fact common among fungi and higher plants.”

Also, this website (ScienceDirect) explains that…

“Most plasmids inhabit bacteria, and indeed around 50% of bacteria found in the wild contain one or more plasmids. Plasmids are also found in higher organisms such as yeast and fungi. The 2 micron circle of yeast (discussed later) is a well-known example that has been modified for use as a cloning vector.”

A plasmid is a small DNA molecule that is physically separate from, and can replicate independently of, chromosomal DNA within a cell.

Learning Objectives

Outline the utility of plasmids

Key Takeaways

Key Points

  • Plasmids can be found in all three major domains: Archaea, Bacteria, and Eukarya. Similar to viruses, plasmids are not considered by some to be a form of life.
  • Plasmids provide a mechanism for horizontal gene transfer within a population of microbes and typically provide a selective advantage under a given environmental state.
  • Plasmids may carry genes that provide resistance to naturally occurring antibiotics in a competitive environmental niche, or the proteins produced may act as toxins under similar circumstances.

Key Terms

  • plasmid: A DNA molecule, usually a double-stranded circle, that is separate from the chromosomes; found mainly in bacteria, but also in archaea and some eukaryotes.
  • mobilome: The entirety of the mobile (transposable) elements of a genome.
  • replicon: A region of DNA or RNA that replicates from a single origin of replication.

In microbiology and genetics, a plasmid is a DNA molecule that is separate from, and can replicate independently of, the chromosomal DNA. Plasmids are double-stranded and, in many cases, circular. They usually occur naturally in bacteria, but are sometimes found in archaea and even in eukaryotic organisms (e.g., the 2-micrometre circle in Saccharomyces cerevisiae).

Non-integrating plasmids compared with episomes during cell division: This line drawing compares non-integrating plasmids (top) with episomes (bottom). In the upper half, a bacterium with its chromosomal DNA and plasmids divides into two identical bacteria, each carrying chromosomal DNA and plasmids. In the lower half, a bacterium carries an episome; the episome then integrates into the chromosomal DNA and becomes part of it, and this bacterium divides into two identical bacteria, each with the integrated episome.

Plasmid sizes vary from 1 to over 1,000 kbp. The number of identical plasmids in a single cell can range anywhere from one to thousands under some circumstances. Plasmids can be considered part of the mobilome because they are often associated with conjugation, a mechanism of horizontal gene transfer.

The term plasmid was first introduced by the American molecular biologist Joshua Lederberg in 1952.

Plasmids are considered replicons. They can be found in all three major domains: Archaea, Bacteria, and Eukarya. Similar to viruses, plasmids are not considered by some to be a form of life. Unlike viruses, they are naked DNA and do not encode the genes needed to encase their genetic material for transfer to a new host, although some classes of plasmids encode the sex pilus needed for their own transfer. Host-to-host transfer of plasmids requires either direct transfer by conjugation or changes in host gene expression that allow uptake of the genetic element by transformation. Microbial transformation with plasmid DNA is neither parasitic nor symbiotic in nature, because both terms imply an independent species living in a commensal or detrimental state with the host organism. Rather, plasmids provide a mechanism for horizontal gene transfer within a population of microbes and typically provide a selective advantage under a given environmental state. Plasmids may carry genes that provide resistance to naturally occurring antibiotics in a competitive environmental niche, or the proteins they produce may act as toxins under similar circumstances. Plasmids can also give bacteria the ability to fix elemental nitrogen or to degrade recalcitrant organic compounds, which provides an advantage when nutrients are scarce.

Characteristics of Eukaryotic DNA compared to Prokaryotic DNA

Prokaryotic cells are much less complex than eukaryotic cells, and eukaryotic cells are thought to have appeared later in evolution; it is probable that eukaryotic cells evolved from prokaryotic cells. Differences in complexity can be seen at the cellular level.

The single characteristic that is both necessary and sufficient to define an organism as a eukaryote is a nucleus surrounded by a nuclear envelope with nuclear pores. All extant eukaryotes have cells with nuclei, and most of a eukaryotic cell's genetic material is contained within the nucleus. In contrast, prokaryotic DNA is not enclosed in a nucleus; instead, it is attached to the plasma membrane and contained in a nucleoid, an irregularly shaped region that is not surrounded by a nuclear membrane.

Figure 1: Cellular location of eukaryotic and prokaryotic DNA: Eukaryotic DNA is stored in a nucleus, whereas prokaryotic DNA is in the cytoplasm in the form of a nucleoid.

Eukaryotic DNA is packaged into chromosomes, each consisting of a linear DNA molecule coiled around basic (alkaline) proteins called histones, which wind the DNA into a more compact form. Prokaryotic genomic DNA, by contrast, is usually a single circular chromosome that is not packaged into eukaryotic-style chromatin. In addition, many prokaryotes carry plasmids, smaller and usually circular DNA molecules that can replicate separately from the prokaryotic genomic DNA. Because eukaryotic chromosomes are linear, repeating non-coding DNA sequences called telomeres are present at both ends of each chromosome and protect it from deterioration.

Mitosis, a process of nuclear division in which replicated chromosomes are divided and separated using elements of the cytoskeleton, is universally present in eukaryotes. The cytoskeleton contains structural and motility components called actin microfilaments and microtubules. All extant eukaryotes have these cytoskeletal elements. Prokaryotes, on the other hand, divide by binary fission, a process in which the DNA is replicated, the copies separate to the two poles of the cell, and finally the cell divides.

A major DNA difference between eukaryotes and prokaryotes is the presence of mitochondrial DNA (mtDNA) in eukaryotes. Because eukaryotes have mitochondria and prokaryotes do not, eukaryotic cells contain mitochondrial DNA in addition to their nuclear DNA. The mtDNA has far fewer base pairs than nuclear DNA and encodes only a few dozen genes, depending on the organism.

Plasmids: Definition, Types and Replication | Microbiology

In this article we will discuss: 1. Definition of Plasmids 2. Physical Nature and Copy Number of Plasmids 3. Properties 4. Incompatibility 5. Types 6. Replication 7. Plasmid Curing 8. Use of Plasmids as Cloning Vectors.

Definition of Plasmids:

In addition to the bacterial chromosome (nucleoid), bacterial cells often contain other genetic elements in their cytoplasm. These elements exist and replicate separately from the chromosome and are called plasmids. Their existence in the bacterial cytoplasm was revealed by Lederberg in 1952 while he was studying conjugation in bacteria.

Lederberg coined the term ‘plasmid’ to refer to the transmissible genetic elements that were transferred from one bacterial cell to another and determined the maleness in bacteria.

Thousands of plasmids are now known; over 300 different naturally occurring plasmids have been isolated from strains of Escherichia coli alone. Besides naturally occurring plasmids, many artificially modified plasmids have been developed and used as vectors in gene cloning (genetic engineering).

Physical Nature and Copy Number of Plasmids:

The physical nature of plasmids is quite simple: they are small, double-stranded DNA molecules. The majority of plasmids are circular, but many linear plasmids are also known.

Naturally occurring plasmids range in size from approximately 1 kilobase to more than 1 megabase, and a typical plasmid is less than 5% of the size of the bacterial chromosome. Most plasmid DNA isolated from bacterial cells exists in the supercoiled configuration, which is the most compact form in which DNA exists within the cell.

Copy number is the number of copies of a given plasmid per cell, and it differs from plasmid to plasmid. Some plasmids are present in only 1-3 copies per cell, whereas others may be present in over 100 copies. Copy number is controlled by genes on the plasmid and by interactions between the host and the plasmid.

Properties of Plasmids:

(i) Many are specific to one or a few particular bacteria.

(ii) They replicate independently of the bacterial chromosome.

(iii) Many code for their own transfer.

(iv) Some act as episomes and can reversibly integrate into the bacterial chromosome.

(v) They may pick up and transfer certain genes of the bacterial chromosome.

(vi) They may affect certain characteristics of the bacterial cell.

(vii) Plasmids differ from viruses in the following two ways:

(a) They do not cause damage to cells and are generally beneficial.

(b) They have no extracellular form and exist inside cells simply as free, typically circular DNA.

Incompatibility of Plasmids:

In some cases, a single bacterial cell contains several different types of plasmids. For example, Borrelia burgdorferi, which causes Lyme disease, possesses 17 different circular and linear plasmids.

When a plasmid is transferred into a bacterial cell that already carries another plasmid, it is commonly observed that the second (transferred) plasmid is not maintained and is lost during subsequent rounds of replication and cell division.

This condition is called plasmid incompatibility and the two plasmids are said to be incompatible. A number of incompatibility groups of plasmids have been recognised in bacteria. The plasmids of one incompatibility group exclude each other from replicating in the cell but generally coexist with plasmids from other groups.

Plasmids of the same incompatibility group share a common mechanism for regulating their replication and are thus related to one another. Therefore, when a bacterial cell carries several types of plasmids, they generally belong to different incompatibility groups.

Types of Plasmids:

Various types of plasmids occur naturally in bacterial cells, and the most favoured classification is based on the main functions encoded by their genes.

The main types of plasmids recognised on this basis are the following:

1. F-plasmid (or F-factor):

The F-plasmid or F-factor (“F” stands for fertility) is a very well characterised plasmid. It plays a major role in conjugation in E. coli and was the first plasmid to be described. It is this plasmid that confers ‘maleness’ on bacterial cells, and because of this property it is also called the ‘sex factor’. The F-plasmid is a circular dsDNA molecule of 99,159 base pairs.

The genetic map of the F-plasmid is shown in Fig. 5.31. One region of the plasmid contains genes involved in regulating DNA replication (rep genes); another contains transposable elements (two copies of IS3, Tn1000 and IS2) that underlie its ability to function as an episome; and a third, large region, the tra region, consists of tra genes that promote transfer of the plasmid during conjugation.

2. R-plasmids:

R-plasmids are the most widespread and well-studied group of plasmids; they confer resistance (hence the name resistance plasmids) to antibiotics and various other growth inhibitors.

R-plasmids typically carry genes encoding enzymes that destroy or modify antibiotics. They are not usually integrated into the host chromosome. Some R-plasmids carry only a single resistance gene, whereas others carry as many as eight.

Plasmid R100, for example, is a 94.3 kilobase-pair plasmid (Fig. 5.32) that carries resistance genes for sulfonamides, streptomycin and spectinomycin, chloramphenicol, tetracycline and others. It also carries genes conferring resistance to mercury.

Many R-plasmids are conjugative and carry their drug-resistance genes on transposable elements. They therefore play an important role in medical microbiology, because their spread through natural populations can have profound consequences for the treatment of bacterial infections.

3. Virulence plasmids:

Virulence plasmids confer pathogenicity on the host bacterium. They make the bacterium more pathogenic by enabling it to resist host defences better or to produce toxins.

For example, Ti-plasmids of Agrobacterium tumefaciens induce crown gall disease in angiosperms, and enterotoxigenic strains of E. coli cause traveller’s diarrhoea because they carry a plasmid coding for an enterotoxin, which induces extensive secretion of water and salts into the bowel.

4. Col-plasmids:

Col-plasmids carry genes that enable the host bacterium to kill other bacteria by secreting bacteriocins, a class of antibacterial proteins. Bacteriocins often kill cells by creating channels in the plasma membrane, thus increasing its permeability. They may also degrade DNA or RNA, or attack peptidoglycan and weaken the cell wall.

Bacteriocins act only against closely related strains. The ColE1 plasmid of E. coli codes for the synthesis of a bacteriocin, colicin E1, which kills other susceptible strains of E. coli. Other Col-type plasmids code for bacteriocins such as the cloacins, named after their producer, Enterobacter cloacae.

Lactic acid bacteria produce the bacteriocin nisin A, which strongly inhibits the growth of a wide variety of gram-positive bacteria and is used as a preservative in the food industry.

5. Metabolic plasmids:

Metabolic plasmids (also called degradative plasmids) carry genes encoding enzymes that degrade substances such as aromatic compounds (e.g. toluene), pesticides (e.g. 2,4-dichlorophenoxyacetic acid) and sugars (e.g. lactose).

The TOL plasmid (pWW0) of Pseudomonas putida is an example. Some metabolic plasmids in certain strains of Rhizobium also carry genes for nodule formation in legumes and for fixation of atmospheric nitrogen.

A brief summary of important types of bacterial plasmids, their hosts, and properties is given in Table 5.2.

Replication of Plasmids:

Plasmids replicate autonomously because they have their own origins of replication. The enzymes involved in plasmid replication are usually the normal cellular enzymes, particularly for small plasmids, but some large plasmids carry genes for enzymes that are specific to plasmid replication.

Plasmids possess relatively few genes, generally fewer than 30, and these genes are concerned primarily with controlling the initiation of replication and with apportioning the replicated plasmids between daughter cells. The genetic information carried on plasmids is not essential to the host, because bacteria that lack plasmids usually function normally.
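Why apportionment matters can be illustrated with a simple random-segregation model. This model is an assumption added here, not a claim from the text: if a plasmid had no partitioning system and each copy present at division went to either daughter cell independently with probability 1/2, the chance that a division yields a plasmid-free daughter falls steeply with copy number. The function name is hypothetical.

```python
def loss_probability(copies_at_division: int) -> float:
    """Chance that a dividing cell yields at least one plasmid-free daughter.

    Assumes no partitioning system: each copy present at division goes to
    either daughter independently with probability 1/2.
    """
    if copies_at_division < 1:
        raise ValueError("at least one copy must be present at division")
    # P(a given daughter receives no copies) = (1/2) ** n; the two daughters
    # cannot both be empty when n >= 1, so the two probabilities simply add.
    return 2 * 0.5 ** copies_at_division


if __name__ == "__main__":
    # 1-3 copies replicated to 2-6 at division, and ~100 copies replicated to ~200
    for n in (2, 6, 200):
        print(n, loss_probability(n))
```

Under this model a plasmid present in 1-3 copies (2-6 at division) would leave a plasmid-free daughter in roughly 3-50% of divisions, which suggests why low-copy plasmids rely on dedicated partitioning genes, whereas a plasmid with over 100 copies is essentially never lost by chance.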

Because plasmid DNA is small, its replication is completed very quickly, perhaps in one-tenth or less of the total time of the cell division cycle.

Most plasmids in gram-negative bacteria replicate in a manner similar to the bacterial chromosome, initiating at the replication origin and proceeding bidirectionally around the DNA circle to give a theta (θ) intermediate.

However, some plasmids of gram-negative bacteria replicate unidirectionally. Most plasmids of gram-positive bacteria replicate by a rolling-circle mechanism similar to that used by phage φX174. Most linear plasmids replicate by a mechanism involving a protein bound to the 5′ end of each DNA strand, which is used to prime DNA synthesis.

Plasmid Curing:

Plasmids can be eliminated from bacterial cells, and this process is called curing. Curing may take place spontaneously or it may be induced by various treatments, which inhibit plasmid replication but do not affect bacterial chromosome replication and cell reproduction. The inhibited plasmids are slowly diluted out of the growing bacterial population.
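The dilution described above can be put in numbers. As a simplifying assumption (not from the text), suppose the curing treatment blocks plasmid replication completely and the existing copies are shared equally at each division; the average copy number per cell then halves every generation. Both function names are hypothetical.

```python
def copies_per_cell(initial_copies: float, generations: int) -> float:
    """Average plasmid copies per cell after `generations` divisions,
    assuming replication is fully blocked and copies are shared equally."""
    return initial_copies / 2 ** generations


def generations_to_cure(initial_copies: float) -> int:
    """Fewest divisions until the average falls below one copy per cell."""
    generations = 0
    while copies_per_cell(initial_copies, generations) >= 1:
        generations += 1
    return generations


if __name__ == "__main__":
    print(generations_to_cure(100))  # a plasmid starting at about 100 copies per cell
```

For a plasmid starting at about 100 copies per cell, the average drops below one copy per cell after 7 generations (100/2^7 ≈ 0.78). Partial inhibition of replication would slow this down, and unequal partitioning makes the outcome vary from cell to cell.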

Some commonly used curing agents and treatments are acridine dyes, ultraviolet (UV) and ionizing radiation, thymine starvation and growth above the optimal temperature. These treatments interfere more with plasmid replication than with bacterial chromosome replication.

Use of Plasmids as Cloning Vectors:

The significance of plasmids increased dramatically with the advent of recombinant DNA technology, as they became the first cloning vectors; even today they are the most widely used cloning vectors, especially for gene cloning in bacteria.

They enjoy this status because they have very useful properties as cloning vectors that include:

(i) Small size, which makes the plasmid easy to isolate and manipulate;

(ii) An independent origin of replication, which allows plasmid replication in the cell to proceed independently of direct chromosomal control;

(iii) Multiple copy number, so that several copies are present in each cell and the plasmid DNA is easy to amplify; and

(iv) Selectable markers such as antibiotic resistance genes, which make detection and selection of plasmid-containing clones easier.

The plasmid vector is isolated from the bacterial cell and cut at a single site with a restriction enzyme. The cleavage converts the circular plasmid DNA into a linear DNA molecule.

The two open ends of the linear plasmid are then joined to the ends of the foreign DNA to be inserted, using the enzyme DNA ligase. This regenerates a circular hybrid, or chimeric, plasmid, which is transferred into a bacterium, where it replicates and is maintained.
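The cut-and-ligate steps can be mimicked on the top strand of a DNA sequence. This is a toy sketch under several assumptions: it uses EcoRI (recognition site GAATTC, top strand cut between G and AATTC) as the restriction enzyme, which the text does not specify; it treats the circular plasmid as a string whose last base joins its first; and it ignores the complementary strand and insert orientation. The sequences and helper names are hypothetical.

```python
# Toy model: only the top strand is tracked; EcoRI is an assumed example enzyme.
SITE = "GAATTC"  # EcoRI recognition sequence
CUT_AFTER = 1    # top strand is cut between G and AATTC


def find_sites(seq: str, circular: bool) -> list[int]:
    """Start positions of SITE; on a circle, sites spanning the origin count too."""
    scan = seq + seq[: len(SITE) - 1] if circular else seq
    last_start = len(seq) if circular else len(seq) - len(SITE) + 1
    return [i for i in range(last_start) if scan[i : i + len(SITE)] == SITE]


def linearize(plasmid: str) -> str:
    """Open a circular plasmid at its single site (result starts AATTC, ends G)."""
    sites = find_sites(plasmid, circular=True)
    if len(sites) != 1:
        raise ValueError(f"expected exactly one site, found {len(sites)}")
    cut = (sites[0] + CUT_AFTER) % len(plasmid)
    return plasmid[cut:] + plasmid[:cut]


def excise(donor: str) -> str:
    """Cut linear donor DNA at its two sites and return the fragment between them."""
    sites = find_sites(donor, circular=False)
    if len(sites) != 2:
        raise ValueError(f"expected exactly two sites, found {len(sites)}")
    return donor[sites[0] + CUT_AFTER : sites[1] + CUT_AFTER]


def ligate(linear_vector: str, fragment: str) -> str:
    """Join vector and fragment; the result is read as circular (end joins start)."""
    return linear_vector + fragment


if __name__ == "__main__":
    vector = linearize("TTTTGAATTCCCCC")     # hypothetical vector sequence
    insert = excise("AAGAATTCGGGGGAATTCAA")  # hypothetical donor sequence
    recombinant = ligate(vector, insert)
    print(recombinant, find_sites(recombinant, circular=True))
```

Joining the vector's G end to the fragment's AATTC start, and the fragment's G end back to the vector's start, recreates an EcoRI site at each junction, so the recombinant above carries two sites where the vector had one. In a real ligation the fragment can enter in either orientation, and the vector can also re-close without an insert.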

One of the most widely used plasmids for gene cloning in bacteria is pBR322, which carries resistance genes for both ampicillin and tetracycline and has many restriction sites. When foreign DNA is inserted into the ampicillin resistance gene of pBR322, the plasmid can no longer confer resistance to ampicillin.
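The loss of ampicillin resistance gives a simple way to tell transformants apart by growing each colony on tetracycline and on ampicillin plates. The sketch below encodes that logic; it assumes the host bacterium is sensitive to both antibiotics on its own and that the insert lies within the ampicillin resistance gene while the tetracycline resistance gene stays intact. The function name and labels are hypothetical.

```python
def classify_colony(grows_on_tetracycline: bool, grows_on_ampicillin: bool) -> str:
    """Interpret growth of one colony on the two selective plates (pBR322 cloning)."""
    if not grows_on_tetracycline:
        # Any pBR322 derivative still supplies tetracycline resistance under our
        # assumptions, so no growth on tetracycline means the cell has no plasmid.
        return "no plasmid"
    if grows_on_ampicillin:
        # Both resistance genes intact: the vector re-closed without an insert.
        return "vector without insert"
    # Tetracycline-resistant but ampicillin-sensitive: the insert disrupted
    # the ampicillin resistance gene.
    return "recombinant plasmid"


if __name__ == "__main__":
    for tet, amp in [(False, False), (True, True), (True, False)]:
        print(tet, amp, classify_colony(tet, amp))
```

In practice the same colonies are transferred to both plates (for example by replica plating), and the tetracycline-resistant, ampicillin-sensitive ones are picked as candidates carrying the foreign DNA.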

Plasmids of Eukaryotes

Authors: Esser, K., Kück, U., Lang-Hinrichs, C., Lemke, P., Osiewacz, H.D., Stahl, U., Tudzynski, P.

  • ISBN 978-3-642-82585-9 (eBook)
  • ISBN 978-3-540-15798-4 (print)

The possession of plasmids was for a long time recognized only in the bacteria. It is now evident that plasmids, or replicative forms of DNA structurally and experimentally comparable to bacterial plasmids, exist in eukaryotic organisms as well. Such plasmids are in fact common among fungi and higher plants. The present review is undertaken to provide a comprehensive account of the data available on plasmids found in eukaryotic organisms. This review will not consider plasmids of prokaryotic origin, even though certain bacterial plasmids, such as the tumor-inducing (Ti) plasmids of Agrobacterium tumefaciens, may be intimately associated with transformation of the eukaryotic host. This book, moreover, does not consider transformation experiments in eukaryotic hosts involving viral DNA as vectors, although indeed such vectors have been developed for use in plant and animal systems. After a general introduction, providing historical perspective on the nature and role of plasmids, a list of eukaryotic plasmids will be presented according to their origin. This is followed by a detailed discussion of known structure and function. In subsequent chapters the practical implications of eukaryotic plasmids for molecular cloning and biotechnology will be discussed. This latter part traces the development of interest in biotechnical genetics and gives special consideration to the use of eukaryotic systems for gene cloning. The terminology biotechnical genetics is introduced to the reader and is used in a general sense as equivalent to genetic engineering. Biotechnical genetics includes, but is not limited to, gene cloning through recombinant DNA technology.


All cellular life synthesizes proteins, and organisms in all three domains of life possess ribosomes, the structures responsible for protein synthesis. However, ribosomes in each of the three domains are structurally different. Ribosomes themselves are constructed from proteins along with ribosomal RNA (rRNA). Prokaryotic ribosomes are found in the cytoplasm. They are called 70S ribosomes because they have a size of 70S (Figure 7), whereas eukaryotic cytoplasmic ribosomes have a size of 80S. (The S stands for Svedberg unit, a measure of sedimentation in an ultracentrifuge, which depends on the size, shape, and surface qualities of the structure being analyzed.) Although they are the same size, bacterial and archaeal ribosomes have different proteins and rRNA molecules, and the archaeal versions are more similar to their eukaryotic counterparts than to those found in bacteria.

Figure 7: Prokaryotic ribosomes (70S) are composed of two subunits, the 30S (small) subunit and the 50S (large) subunit, each of which is composed of protein and rRNA components.

Types of Fungal Cells

The two major types of fungal cells are yeast cells (unicellular fungi) and hyphal cells (the individual cells of multicellular, filamentous fungi).

Hyphal Cells

Hyphae are branching, threadlike structures found in multicellular, filamentous fungi. A single hypha consists of a long chain of tubular hyphal cells, which are joined together and divided by internal walls called septa. Hyphae grow from their tips, which allows the filaments to spread and branch out into new substrates. The hyphae release digestive enzymes into the substrate, which break it down into nutrients that the fungus can absorb.

Hyphae reproduce asexually using a process called fragmentation. During fragmentation, small pieces of hyphae can break off and grow to form a new colony of fungal cells.

Yeast Cells

Yeasts are unicellular fungi that reproduce asexually by budding. During this process, the yeast cell nucleus divides by mitosis, and a bud forms on the surface of the cell. Eventually, the bud will detach to become a new, genetically identical daughter cell.

Budding can produce pseudohyphae: elongated, newly formed cells that resemble hyphae. Unlike true hyphae, pseudohyphae are made of conjoined but individual yeast cells that are easily separated from one another. The cells of true hyphae are parts of a multicellular organism and are joined end to end, divided by septa.


Coral reefs are colored by fluorescent proteins (FPs) and chromoproteins (CPs) that constitute a homologous eukaryotic protein family with the jellyfish green FP (GFP) [1, 2]. These GFP homologs are small proteins each encoded by a single gene, comprise a relatively high percentage of soluble proteins in expressed tissues, and form their chromophore without needing cofactors or substrates other than oxygen. Such properties facilitated their cloning and engineering to revolutionize imaging in vivo. In contrast to FPs, CPs absorb visible light intensely to give colors clearly visible under ambient light and almost all have low fluorescence [1, 2].

CP absorption properties endow CPs with certain advantages over FPs, such as instrument-free detection by eye, efficient FRET quenching and photoacoustic imaging [3, 4]. Detection of FPs requires an ultraviolet (UV) lamp, fluorometer or flow cytometer and can be limited by background fluorescence, photobleaching and UV damage of the sample. An alternative popular genetic reporter, the lux gene cluster, requires a luminometer for detection. CP detection is also advantageous over traditional colorimetric assays such as lacZ, which require expensive exogenously added substrate and can be limited by background from endogenous enzyme [5]. CPs are thus particularly attractive as markers in living organisms [6,7,8], for the annual international Genetically Engineered Machine (iGEM) competition and teaching [9], as dye replacements, for art, and for cell biosensor applications in the field where costs and low resources are important considerations. Current methods for detecting environmental, agricultural and food contaminants, landmines and biowarfare agents, and medically relevant targets can be improved by synthetic biology [10]. For example, bacteriophage have been engineered to cause bioluminescence of pathogenic bacteria in food, and bacteria have been engineered to fluoresce upon detection of spoiled meat gas [11], trinitrotoluene (TNT) products [12] or arsenic [13]. Adaptation of these biosensors to non-fluorescent detection for use in supermarkets or the field beckons, but it is unclear which CP genes might be suitable or how best to assay them quantitatively.

While the GFP family is native to eukaryotes, most foreseen near-term applications of CPs require efficient expression in bacteria such as E. coli where engineering is more straightforward. Such efficient heterologous expression often requires codon optimization, a mostly proprietary process that is still more of an art than a science, necessitating validation in each case [14]. Some CP genes are available commercially, but items have been discontinued without warning (e.g. fwYellow) and they lack the characterization and free availability associated with publication. CPs from ATUM cost $225/gene [15] and contain unwanted (“illegal”) restriction sites that interfere with the popular, standardized, BioBrick cloning method [16], while our CPs made available via the Registry of Biological Parts [16] incur their $500 annual fee. Furthermore, just as FP comparisons were needed to determine which FPs were best for engineering and certain applications [17], CPs need to be compared and their properties and assays improved.

CP publications to date have not reported on bacterial cell toxicity and typically focus on an individual CP ([2, 15, 18,19,20,21,22] in Table 1), making a survey of the relative properties of CPs and their genes difficult. Of the 11 non-synthetic CP genes listed in the right seven columns of Table 1, all were expressed solely from their native eukaryotic DNA sequences, with only four reported to be expressed and matured highly enough in E. coli to give intensely-colored colonies (asPink, amilGFP, aeBlue and amilCP). Thus we considered that, for some of the other seven native CP genes, codon optimization of the eukaryotic sequences to match E. coli preferences might be necessary for high functional expression in bacteria. Here, to simplify and expand CP study and applications, we make available 14 engineered CP genes that are functionally expressed, characterized and compared in E. coli, together with chromosomal mutants suitable for competition assays.

What is Chromosomal DNA

Chromosomal DNA is the genomic DNA. The genomes of both eukaryotes and prokaryotes are organized into chromosomes. A prokaryotic genome typically contains a single circular chromosome, whereas a eukaryotic genome contains several linear chromosomes. Each chromosome contains an origin of replication, and eukaryotic chromosomes contain more than one origin because of their large size. Chromosomal DNA is always double-stranded.

Figure 2: Chromosomal DNA

The number of copies of each type of chromosome in the genome depends on the species. Many eukaryotes, including humans, are diploid and contain two copies of each type of chromosome. Because chromosomal DNA represents the genome of an organism, the information in its genes is necessary for the growth, development, and reproduction of the organism.



Homologs of the HUH endonucleases were retrieved by running searches against protein sequence databases filtered to 50 and 90% sequence identity (UniRef50 and UniRef90, respectively), which were downloaded from (…). Search for bacterial homologs of CRESS-DNA virus Reps was performed against nr90 (NCBI’s nr database (…) filtered to 90% identity). To detect remote sequence similarity, we used sequence profile databases, which included profiles from PDB (…), SCOP 69, Pfam 70, and CDD 71. For query profile generation, the nr70 database was used.

Sequence searches and clustering

Homologs of the HUH superfamily endonuclease domains for each representative Rep sequence were obtained by performing three jackhmmer 72 iterations against the UniRef50 database. Representative Reps were selected as queries for homology searches based on an exhaustive review of the literature on the HUH superfamily 16,17,28,53. In addition, for HUH groups with fewer than 10 homologs in UniRef50, we repeated the searches against the UniRef90 database. For homology searches, only the HUH endonuclease domain was used, to avoid attracting unrelated proteins, for example those containing superfamily 1 or 3 helicase domains. However, clustering was performed using full-length sequences to better reflect their evolutionary history. The dataset obtained by searches against the UniRef databases was supplemented with CRESS-DNA virus Reps devoid of obvious recombinant sequences from our previous study 53. Sequences were clustered using CLANS with the BLAST option 35. CLANS is an implementation of the Fruchterman-Reingold force-directed layout algorithm, which treats protein sequences as point masses in a virtual multidimensional space, in which they attract or repel each other based on the strength of their pairwise similarities (CLANS P-values) 35. Thus, evolutionarily more closely related sequences gravitate to the same parts of the map, forming distinct clusters. Rep clusters were identified by the CLANS convex algorithm at P-value = 1e−08. To collect bacterial homologs of CRESS-DNA virus Reps, we used representative sequences as queries and performed two jackhmmer iterations against the nr90 database. The resulting set of sequences was grouped using a convex clustering algorithm (at P-value = 1e−05) in CLANS. To ensure that we gathered all bacterial homologs, HMM profiles were constructed for each identified cluster and used as queries for searches against nr90 with hmmsearch 72. Accessions of proteins for each group, shown in Fig. 1, are available for download (Supplementary Data 1).
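To illustrate how a force-directed layout turns pairwise similarities into clusters, here is a toy two-dimensional version. It is a hypothetical sketch in the spirit of the description, not CLANS's actual force functions or parameters: similarity weights in [0, 1] stand in for values derived from CLANS P-values, each pair attracts with a force proportional to weight × distance, and every pair repels with a force proportional to 1/distance. The function and parameter names are made up for illustration.

```python
import math
import random


def force_directed_layout(weights, steps=2000, step_size=0.01, repulsion=0.05, seed=0):
    """Toy 2-D force-directed layout (hypothetical; not CLANS's exact forces).

    weights[i][j] in [0, 1] is a symmetric pairwise similarity: higher means
    stronger attraction. Every pair also repels with strength repulsion / distance.
    """
    rng = random.Random(seed)  # fixed seed so the layout is reproducible
    n = len(weights)
    pos = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(n)]
    for _ in range(steps):
        forces = [[0.0, 0.0] for _ in range(n)]
        for i in range(n):
            for j in range(i + 1, n):
                dx = pos[j][0] - pos[i][0]
                dy = pos[j][1] - pos[i][1]
                dist = math.hypot(dx, dy) + 1e-9  # avoid division by zero
                # positive f pulls i and j together; negative f pushes them apart
                f = weights[i][j] * dist - repulsion / dist
                fx, fy = f * dx / dist, f * dy / dist
                forces[i][0] += fx
                forces[i][1] += fy
                forces[j][0] -= fx
                forces[j][1] -= fy
        # move every point a small step along its net force
        for i in range(n):
            pos[i][0] += step_size * forces[i][0]
            pos[i][1] += step_size * forces[i][1]
    return pos


if __name__ == "__main__":
    # two hypothetical families: 0-1 similar, 2-3 similar, families barely related
    W = [[0, 1, 0.01, 0.01],
         [1, 0, 0.01, 0.01],
         [0.01, 0.01, 0, 1],
         [0.01, 0.01, 1, 0]]
    for point in force_directed_layout(W):
        print([round(c, 3) for c in point])
```

With these weights each strongly similar pair settles roughly where attraction and repulsion balance (about √(0.05/1) ≈ 0.22 apart), while the weakly similar families drift far apart, which is the qualitative behaviour behind related sequences gravitating to the same parts of the map.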
For collection of the SF3 helicase dataset, the helicase domain of a YLxH supergroup member from Streptococcus canis (WP_003048523) was used as a query for an hmmer search against the nr30 database available at the Bioinformatics Toolkit server 73. The resulting dataset was supplemented with SF3 helicase sequences from CRESS-DNA viruses 53, polyomaviruses, papillomaviruses, parvoviruses and P. pulchra-like plasmids (Supplementary Data 1). Extracted helicase domains were filtered to 70% identity with CD-HIT (parameter “-c 0.7”) 74.
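Filtering to 70% identity with CD-HIT follows a greedy incremental idea: sequences are processed from longest to shortest, and each one is kept as a representative only if it is less than 70% identical to every representative kept so far. The sketch below is a simplified stand-in, not CD-HIT itself: it replaces CD-HIT's word filtering and alignment with matching blocks from Python's `difflib`, and measures identity relative to the shorter sequence. Function names and the toy sequences are hypothetical.

```python
from difflib import SequenceMatcher


def identity(a: str, b: str) -> float:
    """Crude identity: residues in difflib matching blocks (a stand-in for a
    real alignment) divided by the length of the shorter sequence."""
    blocks = SequenceMatcher(None, a, b, autojunk=False).get_matching_blocks()
    matched = sum(block.size for block in blocks)
    return matched / min(len(a), len(b))


def greedy_identity_filter(seqs: dict[str, str], threshold: float = 0.7) -> list[str]:
    """Return representative names: longest first, and a sequence is kept only
    if it is below `threshold` identity to every representative kept so far."""
    representatives: list[str] = []
    # sorted() is stable, so equal-length sequences keep their input order
    for name in sorted(seqs, key=lambda k: len(seqs[k]), reverse=True):
        if all(identity(seqs[name], seqs[rep]) < threshold for rep in representatives):
            representatives.append(name)
    return representatives


if __name__ == "__main__":
    toy = {
        "a": "MKTAYIAKQRQISFVKSHFSRQ",
        "b": "MKTAYIAKQRQISFVKSHFSRE",  # one substitution relative to "a"
        "c": "GGGGWWWWPPPPLLLLHHHHNN",  # unrelated
    }
    print(greedy_identity_filter(toy))
```

Here "b" is about 95% identical to "a" and is absorbed into its cluster, so only "a" and "c" remain as representatives.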

Remote homology detection

Sequence searches based on profile-profile comparisons were used to detect remote homology. For profile generation, two iterations of jackhmmer 72 were run against the nr70 sequence database with an inclusion threshold of E-value = 1e−03. The resulting profiles were used to search profile databases with HHsearch 75. Search results for proteins from representative bacterial plasmids and integrative elements are available in Supplementary Data 1.

Multiple sequence alignments and phylogenetic analysis

To construct multiple sequence alignments for phylogenetic analysis, we used MAFFT 76 and TrimAl 77. MAFFT options G-INS-i and L-INS-i and TrimAl gap thresholds 0.05 and 0.15 were used to generate the alignments for Figs. 2 and 5, respectively. The resulting alignments covered both the HUH and (where available) SF3 domains and contained 743 and 508 positions, respectively. Both alignments can be found in Supplementary Data 2 and 3. Phylogenetic trees were calculated with PhyML 78 using automatic model selection and aBayes branch support. The substitution models VT + G + I + F (VT, amino acid replacement matrix; G, gamma shape parameter, estimated at 1.864; I, proportion of invariable sites, estimated at 0.005; F, equilibrium frequencies, empirical) and LG + G (LG, amino acid replacement matrix; G, gamma shape parameter, estimated at 1.807) were selected for the phylogenetic analyses shown in Figs. 2 and 5, respectively. Additional trees were constructed using IQ-Tree v1.6.8 (ref. 79) with Ultrafast Bootstrap Approximation branch support 80, and with RAxML using non-parametric bootstrapping 81. A mixture-model tree was constructed with IQ-Tree 79 using the model parameters LG + C20 + F + G and ultrafast bootstrap (1000 replicates). The alignment and guide tree (parameters “-s” and “-ft”, respectively) were the same as in Fig. 5. Highly diverged sequences forming long branches were removed before constructing the final trees. Bacilladnaviridae viruses were also removed, because their position was not stable in trees with different sequence sampling (Supplementary Fig. 6). Phylogenetic trees are available from the authors upon request. The trees shown in Figs. 2, 5, S5 and S6 can be found in Supplementary Data 4 to 10.
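The TrimAl gap thresholds quoted above control which alignment columns survive. The sketch below assumes trimAl's documented meaning of the `-gt` value, namely 1 minus the fraction of sequences allowed to have a gap in a column, so a column is kept when its fraction of non-gap characters is at least the threshold. Under that reading, 0.05 removes only columns that are gaps in more than 95% of sequences. It is an illustration, not trimAl's implementation, and the function name is hypothetical.

```python
def trim_gappy_columns(alignment, gap_threshold):
    """Drop alignment columns whose non-gap fraction is below gap_threshold.

    Assumes trimAl's -gt semantics: gap_threshold = 1 - (fraction of
    sequences allowed to have a gap). '-' is treated as the gap character.
    """
    n_seqs = len(alignment)
    width = len(alignment[0])
    keep = [
        col for col in range(width)
        if sum(seq[col] != "-" for seq in alignment) / n_seqs >= gap_threshold
    ]
    return ["".join(seq[col] for col in keep) for seq in alignment]


toy = ["A-C-", "A-CG", "AGC-", "A-C-"]
print(trim_gappy_columns(toy, 0.5))
print(trim_gappy_columns(toy, 0.25))
```

In the toy alignment, columns 2 and 4 are 75% gaps: a threshold of 0.5 removes them, while 0.25 keeps every column.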

Statistical tests

Alternative topologies for the Rep tree were tested using IQ-Tree version 1.6.8 with the following parameters: -m LG+G -n 0 -zb 100000 -zw -au (ref. 79). The original PhyML tree (Fig. 5) was used as the unconstrained tree and was tested against each of the constrained trees. The following tests were performed: the Approximately Unbiased (AU) test 82, the logL difference from the maximal logL in the set, the RELL test 83, one-sided and weighted Kishino–Hasegawa (KH) tests 84, the Shimodaira–Hasegawa (SH) test 85, the weighted SH test, and the Expected Likelihood Weight (ELW) test 86.
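Of the tests listed above, RELL (resampling estimated log-likelihoods) is the simplest to illustrate. Per-site log-likelihoods are computed once for each candidate tree, alignment sites are then resampled with replacement without re-optimizing any tree, and each tree's support is the fraction of replicates in which it has the highest summed log-likelihood. A minimal sketch, assuming the per-site log-likelihoods are already available; the tree names and values below are made up for illustration.

```python
import random


def rell_support(site_lnl, replicates=1000, seed=1):
    """RELL bootstrap support for each candidate tree.

    site_lnl maps a tree name to its list of per-site log-likelihoods
    (all lists the same length). Returns the fraction of bootstrap
    replicates in which each tree has the highest total log-likelihood.
    """
    rng = random.Random(seed)
    names = list(site_lnl)
    n_sites = len(site_lnl[names[0]])
    wins = dict.fromkeys(names, 0)
    for _ in range(replicates):
        # Resample alignment sites with replacement; no tree is re-optimized.
        sample = [rng.randrange(n_sites) for _ in range(n_sites)]
        totals = {t: sum(site_lnl[t][i] for i in sample) for t in names}
        wins[max(totals, key=totals.get)] += 1
    return {t: wins[t] / replicates for t in names}


# Made-up per-site values for illustration only.
print(rell_support({
    "unconstrained": [-1.0, -2.0, -1.5, -0.8],
    "constrained": [-1.2, -2.5, -1.6, -0.9],
}))
```

Because the made-up unconstrained tree is better at every site, it wins every replicate. With real data, sites disagree and the support values fall between 0 and 1.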

Sequence logos

Sequence logos for the Reps of CRESS-DNA virus families were taken from ref. 57. Alignments for the other groups were extracted from the alignment used to build the tree shown in Fig. 5. Sequence logos were created using the WebLogo server 87.
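The height of each stack in a protein sequence logo is the information content of that alignment column: log2(20) minus the Shannon entropy of the residue frequencies. The sketch below computes that quantity for one column. It omits the small-sample correction that WebLogo applies, ignores gap characters, and uses a hypothetical function name.

```python
import math
from collections import Counter


def column_information(column, alphabet_size=20):
    """Information content (bits) of one alignment column, ignoring gaps.

    R = log2(alphabet_size) - H, where H is the Shannon entropy of the
    observed residue frequencies. No small-sample correction is applied.
    """
    residues = [c for c in column if c != "-"]
    if not residues:
        return 0.0
    counts = Counter(residues)
    total = len(residues)
    entropy = -sum((n / total) * math.log2(n / total) for n in counts.values())
    return math.log2(alphabet_size) - entropy


print(column_information("AAAA"))  # fully conserved column
print(column_information("AACC"))  # two residues, equally frequent
```

A fully conserved column reaches the maximum of log2(20), about 4.32 bits, and each doubling of equally likely residues costs one bit.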

Genomic context analysis

The integrated plasmids were identified by thorough analysis of the genomic neighborhoods of the Rep-encoding genes. The precise borders of integration were defined by the presence of direct repeats corresponding to attachment sites. The repeats were searched for using Unipro UGENE 88. Genes of the integrated plasmids were annotated based on HHsearch searches 75. Genome maps were compared and visualized using Easyfig with the tBLASTx option 89.
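Attachment sites flanking an integrated element appear as the same short sequence on both sides of the insertion. As a rough illustration of what such a search looks for (not how UGENE does it), the sketch below finds the longest exact substring shared by a left and a right flank using dynamic programming. It considers direct repeats only, not inverted repeats, and the flanks and function name are hypothetical.

```python
def longest_direct_repeat(left, right, min_len=1):
    """Longest exact substring shared by two flanking regions.

    Classic longest-common-substring dynamic programming: cur[j] holds the
    length of the common suffix ending at left[i-1] and right[j-1].
    Returns "" if the best repeat is shorter than min_len.
    """
    best_len, best_end = 0, 0
    prev = [0] * (len(right) + 1)
    for i in range(1, len(left) + 1):
        cur = [0] * (len(right) + 1)
        for j in range(1, len(right) + 1):
            if left[i - 1] == right[j - 1]:
                cur[j] = prev[j - 1] + 1
                if cur[j] > best_len:
                    best_len, best_end = cur[j], i
        prev = cur
    return left[best_end - best_len:best_end] if best_len >= min_len else ""


# Toy element: an insert of 20 A's bounded by the repeat GATCCGTA.
seq = "TTTT" + "GATCCGTA" + "A" * 20 + "GATCCGTA" + "CCCC"
print(longest_direct_repeat(seq[:12], seq[-12:]))
```

Setting `min_len` to a realistic attachment-site length lets chance matches of a few bases be ignored.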

Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.

Packaging of DNA, Genome, chromosomal proteins, DNA in Prokaryotes & Eukaryotes

Prokaryotes, such as bacteria, are living organisms whose genetic material is not surrounded by a nuclear membrane but lies free in the cytoplasm. The DNA molecules of mitochondria and chloroplasts (organelles of eukaryotic cells) are very similar to those of prokaryotes. Plasmids are also found in some eukaryotic cells, such as yeast.

DNA in Prokaryotes

The DNA of the bacterium Escherichia coli, as an example of prokaryotic DNA:

  1. The DNA exists as a double helix whose ends are joined to each other to form a circle.
  2. If the DNA were stretched out in a straight line, it would be about 1.4 millimeters long, whereas the cell itself is only about 2 microns long.
  3. The DNA is folded many times and occupies a nuclear region (the nucleoid) of about one-tenth of the cell's volume.
  4. The DNA molecule is attached to the plasma membrane at one or more points.
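The figures in the list above can be checked with simple arithmetic. The sketch uses the 1.4 mm and 2 micron values from the text and one added assumption: that each base pair of B-form DNA adds about 0.34 nm of length.

```python
BP_RISE_M = 0.34e-9        # assumed length of one base pair in B-DNA (metres)
dna_length_m = 1.4e-3      # E. coli DNA length from the text (1.4 mm)
cell_length_m = 2e-6       # E. coli cell length from the text (about 2 microns)

fold_longer = dna_length_m / cell_length_m
approx_bp = dna_length_m / BP_RISE_M

print(f"DNA is about {fold_longer:.0f} times longer than the cell")
print(f"which corresponds to roughly {approx_bp / 1e6:.1f} million base pairs")
```

The DNA is about 700 times longer than the cell, and the estimate of roughly 4.1 million base pairs is of the same order as the sequenced E. coli K-12 chromosome (about 4.6 million base pairs).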

Some bacteria contain one or several additional circular DNA molecules, much smaller than the main DNA molecule, which are called “plasmids”. Plasmids are not packaged with histones.

Importance of plasmids: Plasmids are widely used in genetic engineering. A bacterial cell replicates the plasmids it contains, independently of its main DNA, and scientists take advantage of this by introducing artificial plasmids into bacterial cells in order to obtain many copies of them.

Packaging of DNA

DNA in Eukaryotes

Eukaryotes are living organisms whose genetic material is surrounded by a nuclear membrane that separates it from the cytoplasm. Their DNA is organized into several chromosomes; a human somatic cell, for example, contains 46 chromosomes. Chromosomes become visible in eukaryotic cells during cell division.

Structure of chromosomes

Each chromosome contains a single DNA molecule extending from one end of the chromosome to the other. This DNA molecule is coiled and folded many times and associated with various proteins, forming “chromatin”, which contains roughly equal amounts of DNA and protein.

Chromatin is a single DNA molecule that is coiled and folded many times and associated with proteins. The chromosomal proteins are divided into histone proteins and non-histone proteins.

Histone proteins

Histones are a well-defined group of small structural proteins. They have a high content of the basic amino acids arginine and lysine and are present in large amounts in the chromatin of every eukaryotic cell.

At the normal pH of the cell, the side chains (R groups) of arginine and lysine carry a positive charge, so they bind strongly to the negatively charged phosphate groups of the DNA molecule. Histone proteins shorten the DNA molecule about tenfold by forming a string of nucleosomes.
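Why these side chains are positively charged at cellular pH follows from the Henderson-Hasselbalch equation: the fraction of a basic group that carries its proton is 1 / (1 + 10^(pH - pKa)). The sketch assumes commonly quoted approximate side-chain pKa values for the free amino acids (about 10.5 for lysine and 12.5 for arginine) and a cell pH of 7.4; real values inside a protein can shift.

```python
def fraction_protonated(pka, ph):
    """Henderson-Hasselbalch: fraction of a basic group carrying its proton."""
    return 1.0 / (1.0 + 10 ** (ph - pka))


# Approximate side-chain pKa values for free amino acids (assumed).
SIDE_CHAIN_PKA = {"lysine": 10.5, "arginine": 12.5}
CELL_PH = 7.4  # assumed physiological pH

for name, pka in SIDE_CHAIN_PKA.items():
    print(f"{name}: {fraction_protonated(pka, CELL_PH):.4f} positively charged")
```

Because both pKa values lie several units above the cell's pH, more than 99.9% of these side chains are protonated, which is what lets histones bind the phosphate backbone so strongly.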

Non-histone proteins

Non-histones are a heterogeneous group of structural and regulatory proteins with many functions. They are found in small amounts in the structure of chromatin and include:

  • Structural proteins, which associate with specific regions of the DNA molecule and play the main role in the spatial organization of DNA within the nucleus; they are responsible for shortening the DNA about 100,000 times by forming the packed chromatin.
  • Regulatory proteins, which determine whether or not the DNA code will be used to make RNA, proteins and enzymes.
Packaging of DNA

If the DNA double helices of all the chromosomes in a human cell were lined up end to end and stretched out, they would be about 2 metres long. The histones and other proteins pack this long molecule into the cell's nucleus, which is only a few microns in diameter.
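The 2-metre figure can be put in perspective with a little arithmetic. The sketch uses the 2 m and 46-chromosome values from the text and assumes, as before, about 0.34 nm of length per base pair of B-form DNA.

```python
BP_RISE_M = 0.34e-9          # assumed length of one base pair (metres)
total_dna_m = 2.0            # total DNA per human somatic cell, from the text
n_chromosomes = 46           # chromosomes per human somatic cell, from the text

mean_chromosome_dna_m = total_dna_m / n_chromosomes
approx_total_bp = total_dna_m / BP_RISE_M

print(f"average DNA per chromosome: {mean_chromosome_dna_m * 100:.1f} cm")
print(f"approximate base pairs per cell: {approx_total_bp:.2e}")
```

Each chromosome holds on average about 4.3 cm of DNA, and the total of roughly 5.9 billion base pairs is in line with a diploid human genome of about 6 billion base pairs.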

Steps of DNA packaging

Biochemical analysis and electron micrographs have shown that DNA is packed as follows:

  1. DNA is wound around clusters of histones, forming a string of particles called “nucleosomes”. This shortens the molecule about tenfold, but it must be packed about 100,000 times more tightly to fit into the nucleus.
  2. The string of nucleosomes is coiled to pack the nucleosomes together. However, even this is not enough to shorten the DNA molecule to the required length.
  3. The tightly coiled strings of nucleosomes are arranged in large loops that are held together by the structural non-histone proteins to form chromatin. The chromatin is packed as tightly as possible to condense into the chromosome.

Nucleosomes form a string of particles found in the chromosomes; each consists of DNA wound (wrapped) around a cluster of histones, and together they shorten the DNA molecule about tenfold.

When DNA is packed as chromatin, replication enzymes apparently cannot reach it. This packaging must be unwound, at least into a string of nucleosomes, before the DNA can serve as a template for DNA or RNA synthesis.

Structure of the genome

In 1977, researchers developed methods for determining the sequence of nucleotides in DNA and RNA molecules. This provided the tools to describe precisely how genes are arranged within the cell's DNA. The genome is the complete set of genes (all of the DNA) in a cell.

DNA contains genes that carry the instructions for building the cell's proteins and RNAs. These include:

  • Nucleotide sequences that code for proteins.
  • Nucleotide sequences that are transcribed into ribosomal RNA (rRNA), which forms part of the ribosomes.
  • Nucleotide sequences that are transcribed into transfer RNA (tRNA), which carries amino acids during protein synthesis.

In prokaryotes, the genes responsible for RNA and protein synthesis make up most of the genome.

In eukaryotes, less than 70% of the genome serves the function of RNA and protein synthesis, and the function of the rest of the genome is unaccounted for (unknown).

Repetitive DNA

Most genes are present in only one or a few copies in the genome. The exceptions include the genes needed to synthesize ribosomal RNA and histones, which the cell needs in large amounts. It is reasonable to suppose that having multiple copies of these genes speeds up the cell's production of new ribosomes and histones, so there are many (often hundreds of) copies of these genes in all eukaryotic cells.

Some nucleotide sequences of DNA are repeated many times, and the role of most of this repetitive DNA is still unclear. For instance, in the fruit fly (Drosophila), the short nucleotide sequence A-G-A-A-G is repeated about 100,000 times in the middle of one chromosome. This and many other repeated sequences are noncoding DNA.
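To make tandem repetition concrete, the sketch below counts the longest run of back-to-back copies of a motif in a sequence, trying every offset at which the run could start. The toy sequence is hypothetical; only the A-G-A-A-G motif and the figure of about 100,000 copies come from the text, which together imply an array of roughly 500,000 base pairs.

```python
def longest_tandem_run(seq, motif):
    """Largest number of back-to-back copies of motif anywhere in seq."""
    k = len(motif)
    best = 0
    for frame in range(k):  # a run may start at any offset
        run = 0
        for i in range(frame, len(seq) - k + 1, k):
            if seq[i:i + k] == motif:
                run += 1
                best = max(best, run)
            else:
                run = 0
    return best


# Toy sequence: five tandem copies of the Drosophila motif from the text.
example = "TT" + "AGAAG" * 5 + "CC"
print(longest_tandem_run(example, "AGAAG"))
# Scale of the real array described in the text: 100,000 copies of a 5-bp unit.
print(5 * 100_000, "bp")
```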

Other noncoding DNA

The satellite DNA of some chromosomes is noncoding. Eukaryotic genomes also contain a great deal of other noncoding DNA: geneticists have observed that the amount of DNA in a species' genome bears little relationship to the complexity of the organism or to the number of proteins it produces. Only a small fraction of the DNA of plants and animals actually codes for proteins.

Example: Salamanders have some of the largest known genomes; their cells contain about 30 times as much DNA as human cells, although they produce fewer proteins. This is because a large amount of their DNA is noncoding.

Functions of noncoding DNA:

  1. Some of the noncoding DNA may help maintain the structure of the chromosomes.
  2. Some regions of DNA mark the places at which messenger RNA (mRNA) synthesis should start; these regions are important in protein synthesis.

We can compare prokaryotic DNA and eukaryotic DNA as follows:

Prokaryotic DNA

It exists as a double helix whose ends are joined together. It is attached to the plasma membrane at one or more points and is not organized into chromosomes. It is found in the cytoplasm (not surrounded by a nuclear membrane). It is not packaged with histones.

Most of it is coding, and most of the genes are responsible for making RNA and proteins. Its replication starts from the point of attachment to the plasma membrane. Plasmids are present and are not packaged with histones.

Eukaryotic DNA

Eukaryotic DNA exists as a double helix with free ends and is organized into several chromosomes. It is found in the nucleus (surrounded by a nuclear membrane). It is complexed with histone and non-histone proteins.

Part of it is noncoding: less than 70% serves the function of RNA and protein synthesis, and the function of the rest of the genome is unaccounted for (unknown). Its replication can start at many points along the DNA molecule. Plasmids are present only in some eukaryotes, such as yeast.