April 29, 2021
The Vertebrate Genomes Project (VGP) has announced its flagship study and other publications on genome assembly in a special issue of Nature. The study focuses on quality and seeks to standardize and optimize methods of genome assembly. The publications present complete genomes of 16 species assembled over 5 years and mark the beginning of Phase 1 of the VGP. Phase 1 involves assembling the genome of one representative species from each of 260 orders. Phase 2 will focus on representative species from vertebrate families. These representatives will serve as reference genomes for assembling additional species’ genomes.
The VGP’s mission grows out of that of the Genome 10K Community of Scientists (G10K), which set out to sequence the genomes of 10,000 vertebrate species. The VGP expands on that goal by seeking to produce genome assemblies for all ~70,000 living vertebrate species. To date, 129 genome assemblies have been submitted to the VGP workflow. These genomes provide detailed information about evolutionary history and gene annotations.
Arima Genomics’ stringent quality standards fall in line with the VGP’s mission to optimize genome assembly processes. The VGP works collaboratively across the globe with individual labs as well as other consortia to determine quality metrics and produce high-quality reference genomes. The project uses a state-of-the-art automated approach that combines long-read sequencing and long-range chromosome scaffolding with novel algorithms that put the pieces of the genome assembly puzzle together. Many existing genomes contain gaps and missing information that make further sequencing difficult. The VGP’s efforts revealed gene duplications and similarities as well as incorrect sequencing.
In the VGP’s typical workflow, long-read PacBio sequencing is performed to generate the initial contigs, followed by long-range scaffolding approaches to assemble the contigs into chromosomes. Arima Genomics is a technology partner for Phase 1 of the VGP and leverages Hi-C technology to detect and correct errors in assemblies and to orient sequences to chromosomes. Hi-C captures the sequence and structure of chromosomes within cells, which enables chromosome-scale scaffolding. It has emerged as a powerful tool for deriving high-quality, contiguous genome assemblies and is widely used today in a variety of research areas across the plant and animal kingdoms, including large-scale genome assembly consortia projects such as the VGP.
Often, the sample available for genome assembly is small or of low quality due to preservation methods. In these cases, methods that account for low-input or low-quality samples must be used. For difficult samples, Arima has low-input and high-coverage Hi-C protocols. These protocols have supported genome assembly of endangered mammals, like the marine vaquita, and representatives of threatened species, like the channel bull blenny. Scaling up the protocol to assemble multiple species at once, such as six bat species in collaboration with the Bat 1K consortium, has allowed study of evolutionary adaptations, including those that lead to the pathogenesis of the coronavirus SARS-CoV-2.
This dedication to quality and optimization is echoed by Arima’s CEO, Sid Selvaraj. “Generating a high-quality genome assembly is a critical starting point towards understanding the biology of an organism,” Sid says. “Arima’s mission is to accelerate the understanding of genome sequence and structure. For researchers to be able to assemble genomes quickly and easily at the VGP’s standards of quality and performance, they need technology and support that they can count on.”
Links to all of the publications related to this project can be found in this special issue of Nature.