December 21, 2021
Seventeen years ago, the Human Genome Project was completed through painstaking manual effort and at tremendous cost, heralding the beginning of the genomics era. We are now on the cusp of yet another genomics revolution. By 2031, global initiatives such as the Earth BioGenome Project aim to generate an unprecedented repository of high-quality reference genome assemblies for every known eukaryotic species: all 1.5 million of them.
Genomic data is used across diverse biological fields, from conservation to biomedical research, and the quality of this genetic information is paramount. The Earth BioGenome Project is working towards the ultimate goal of sequencing all complex life on Earth. It relies on affiliated project networks, including the Vertebrate Genomes Project (VGP) and the Darwin Tree of Life project, to generate these high-quality, annotated reference genomes.
A major challenge for conducting high-quality genomic research is that most current genome assemblies are riddled with errors. Some genes are incomplete or incorrectly assembled. Others are missing from the assemblies entirely, even though pieces of them are present in the raw sequence reads. Technological advances, improved computational methods, and the ever-decreasing cost of sequencing have enabled the Vertebrate Genomes Project to pursue an ambitious goal: producing a reference genome assembly for each of the 71,657 extant vertebrate species on Earth. Its first step was to test and improve genome sequencing and assembly approaches. The aim was to create reference genomes that are high-quality, near error-free, gapless, haplotype-phased, and annotated.
The Darwin Tree of Life project similarly aims to sequence the genomes of 70,000 eukaryotic species in Britain and Ireland, with the goal of transforming our approach to biology, conservation, and biotechnology. Wellcome Open Research releases each Darwin Tree of Life genome assembly as a micropublication called a ‘Genome Note’. Each Genome Note summarizes the origin of the specimen used for sequencing, the methods used to extract and sequence the genetic material, and the methods used to assemble and polish the assembly.
Arima Genomics has partnered with various initiatives, including the VGP and the Darwin Tree of Life, to harness our Hi-C technology to improve the contiguity, quality, and phasing of their assemblies. Hi-C captures long-range contacts between DNA sequences that lie close together in the nucleus, so it provides information that sequencing reads alone cannot. In genome assembly, Hi-C data are used to detect and correct errors, such as false duplications, and to order and orient sequences into chromosomes. Hi-C analysis also facilitates complete haplotype phasing. It helps resolve long, repeat-rich regions such as telomeres, centromeres, and sex chromosomes.
The high-quality reference genomes generated by the Vertebrate Genomes Project have contributed to multiple scientific disciplines. Arima Genomics is proud to have facilitated these efforts, a few of which are highlighted here:
Marmoset Genome Reveals Biomedical Insights
The marmoset is a primate model for a broad range of biomedical research, including neuroscience, stem cell biology, and regenerative medicine. Trio analysis (mother-father-offspring), combined with long-read sequencing and Hi-C, enabled accurate and complete assemblies of both its maternal and paternal haploid genomes. Arima Hi-C analysis was essential for resolving both sex chromosomes. This was a considerable challenge given the densely repetitive elements of the Y chromosome. The heterozygosity between the two haploid genomes proved to be ten times higher than previous genomic sequencing methods had revealed. Marmoset and human brain genes are largely conserved. However, several marmoset genes carry amino acids that correspond to pathogenic variants in humans, which highlights the need to consider genomic context when developing animal models.
Conservation Genomics to Save the Vaquita
The vaquita is a critically endangered porpoise. Its genome assembly revealed genome-wide heterozygosity lower than that of any mammalian species studied to date. Surprisingly, this low heterozygosity was evenly distributed throughout the genome. That pattern is consistent with a population that has remained small but at equilibrium over a long period. Hi-C scaffolding enabled the most complete marine mammal genome assembly to date. The assembly offers insight not only into the vaquita but into cetaceans more broadly, including dolphins, porpoises, and whales. This knowledge is important for conservation because it suggests the vaquita could recover. The main threat to the species is incidental bycatch in gillnets. If that threat is removed, the remaining population, though tiny, may be genetically healthy enough to rebound.
Biology and Evolution of Bat Adaptations
Reference-quality genome assemblies of six bat species enabled the study of evolutionary adaptations, including the loss of immunity-related genes. Consistent with this finding, the bat genomes contain a high diversity of endogenized viruses. Analysis of these reference genomes also uncovered hearing-related genes that shed light on the ancestral origin of laryngeal echolocation. These high-quality genomes provide a rich resource for studying bat evolutionary history and the genomic basis of bat adaptations and biology. A better understanding of bats' exceptional immunity and longevity could, in turn, help identify molecular targets for alleviating human aging and disease.
Explore the Vertebrate Genomes Project collection page for additional VGP publications and information.
The Genome Notes are published as part of the Wellcome Open Research Tree of Life Gateway. They are intended to promote the discovery and use of the sequence datasets by providing a detailed description of each. All submitted Genome Notes can be found on the Tree of Life Gateway; a few recently published examples are listed below:
- Devil’s coach horse (Ocypus olens)
- Small copper or common copper (Lycaena phlaeas)
- Common toad (Bufo bufo)
- Common frog (Rana temporaria)
- Tapered dronefly (Eristalis pertinax)
- Green-veined white butterfly (Pieris napi)
- Predatory ribbon worm (Lineus longissimus)
- Garden bumblebee (Bombus hortorum)
- Peach blossom moth (Thyatira batis)
- Glanville fritillary (Melitaea cinxia)
Rhie A, et al. Towards complete and error-free genome assemblies of all vertebrate species. Nature 592, 737–746 (2021). https://doi.org/10.1038/s41586-021-03451-0
Yang C, et al. Evolutionary and biomedical insights from a marmoset diploid genome assembly. Nature 594, 227–233 (2021). https://doi.org/10.1038/s41586-021-03535-x
Jebb D, et al. Six reference-quality genomes reveal evolution of bat adaptations. Nature 583, 578–584 (2020). https://doi.org/10.1038/s41586-020-2486-3
Morin PA, et al. Reference genome and demographic history of the most endangered marine mammal, the vaquita. Molecular Ecology Resources 21, 1008–1020 (2021). https://doi.org/10.1111/1755-0998.13284