September 17, 2024
A new article published in npj Biodiversity has established the first process to ensure that accurate, complete reference genomes of many species will become available to the scientific community. This pilot project from the European Reference Genome Atlas (ERGA), the European node of the Earth BioGenome Project (EBP), involved the development and testing of an infrastructure model to coordinate, produce, and distribute reference genomic resources on 98 eukaryotic species representing samples of local native biodiversity.
ERGA’s unique pan-European effort describes an inclusive and equitable network that will make reference-grade genomes available for over 40,000 native species facing challenges, accelerating our understanding of biodiversity.
This massive endeavor exposed challenges of working at the international scale, involving hundreds of people willing to work towards common solutions across sample storage, wet lab preparation, sequencing, data handling and storage. Arima Genomics was proud to be a technology partner, and to see our solutions used to enable genome assembly work.
When Only Hi-C Quality Will Do for Accurate Scaffolding
To maintain the delicate balance between the model’s decentralized approach and the high standards required for reference genome assemblies, the researchers aligned on nine iterative steps to support the production of a complete reference genomics resource for each of the species included in the ERGA pilot project.
Since participating researchers were working with samples of varying quality – ranging from archival to frozen – many of the ERGA recommendations are intended to ensure consistent quality. More specifically, participants' data had to reach minimum EBP recommendations for near error-free, gapless, haplotype-phased, and annotated reference genomes, regardless of the species being sequenced or the input sample type.
For each sample, high molecular weight DNA was used to produce long-read data, proximity ligation assemblies, and sequence annotation. Hi-C technology is used specifically to detect and correct assembly errors and to orient sequences correctly on chromosomes. Hi-C also provides complete haplotype phasing and resolution of long repeat regions, such as those found in telomeres, to further improve contiguity, quality, and phasing. Because Arima Hi-C is the preferred method to produce high quality scaffolds in any species, 47 out of 98 species were assembled using proximity ligation. Table 1 lists the quality metrics from the paper that were considered for reference genome scaffolding.
Table 1: ERGA Assembly Quality Recommendations
Minimum Reference Standard | Error Rate¹ | False Duplications¹ | Kmer Completeness | Sequence Assigned | Single Copy Conserved Genes (i.e. BUSCO) | Transcripts from the Same Organism |
6.C.Q40 or C.C.Q40* | 1/10,000 | <5% | >90% | >90% to candidate chromosomal sequences | >90% complete, single copy | >90% mappable |
*Species with chromosome N50 <1 Mb
¹ See more: https://zenodo.org/records/8088393
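To make the Table 1 thresholds concrete, here is a minimal Python sketch of how an assembly's summary metrics might be compared against them. It is an illustration only, not part of the ERGA or EBP tooling: the function names, dictionary keys, and example values are hypothetical, and it assumes the transcript criterion is a percentage. It also shows the arithmetic linking the Q40 in the standard name to the 1/10,000 error rate, using the usual Phred relationship QV = -10 · log10(error rate).

```python
import math

# Minimum percentages transcribed from Table 1 (a value must exceed each one).
# The key names are hypothetical labels chosen for this sketch.
MIN_PERCENTAGES = {
    "kmer_completeness_pct": 90.0,
    "sequence_assigned_pct": 90.0,
    "busco_single_copy_pct": 90.0,
    "transcripts_mappable_pct": 90.0,  # assumes the table entry is a percentage
}
MAX_FALSE_DUPLICATION_PCT = 5.0  # Table 1: false duplications below 5%
MIN_QV = 40.0                    # Q40, i.e. about 1 error per 10,000 bases


def qv_to_error_rate(qv):
    """Convert a Phred-scaled quality value to a per-base error rate."""
    return 10 ** (-qv / 10)


def error_rate_to_qv(rate):
    """Convert a per-base error rate to a Phred-scaled quality value."""
    return -10 * math.log10(rate)


def check_assembly(metrics):
    """Return the names of the metrics that miss the Table 1 thresholds."""
    failures = []
    if metrics["qv"] < MIN_QV:
        failures.append("qv")
    if metrics["false_duplication_pct"] >= MAX_FALSE_DUPLICATION_PCT:
        failures.append("false_duplication_pct")
    for name, minimum in MIN_PERCENTAGES.items():
        if metrics[name] <= minimum:
            failures.append(name)
    return failures


# Hypothetical metrics for a single assembly (not data from the paper).
example = {
    "qv": 45.2,
    "false_duplication_pct": 1.3,
    "kmer_completeness_pct": 97.5,
    "sequence_assigned_pct": 98.1,
    "busco_single_copy_pct": 95.0,
    "transcripts_mappable_pct": 93.4,
}

print("Error rate at Q40:", qv_to_error_rate(40))
print("Failed metrics:", check_assembly(example))
```

An empty list from `check_assembly` means every metric in the sketch clears its threshold. In practice these figures would come from dedicated assessment tools rather than being entered by hand.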
Next Steps
By fostering international collaboration and focusing on inclusivity and equity, ERGA is setting new standards for biodiversity genomics. For many of the 33 participating countries, the project offered scientists their first opportunity to generate state-of-the-art reference genomes themselves.
The lessons learned during this pilot provide a solid foundation for ERGA while offering insight for designing the next large genomic resource project.
Noted Mark Blaxter, Head of the Tree of Life Programme, Wellcome Sanger Institute, “the ERGA Pilot project is a radical step forward for the continent as it links researchers who need genomes sequenced with sequencing hubs ready to do just that. This exchange promotes sharing of tools, approaches, and understanding and has led to the successful bid for Europe-wide Horizon 2020 Biodiversity Genomics Europe funding. The pilot teams’ enthusiasm has carried through to the wider project, and I am excited to see what they will accomplish in the future.”
Read more from ERGA on how this foundational work will promote robust and standardized workflows for a comprehensive species genomic database for Europe and beyond. https://www.erga-biodiversity.eu/post/better-together-scientists-from-33-european-countries-join-forces-to-generate-reference-genomes-for
References:
- The European Reference Genome Atlas: piloting a decentralised approach to equitable biodiversity genomics.
- https://www.youtube.com/watch?v=kYL9Pkp6Bbs
- https://www.earthbiogenome.org/
- https://www.earthbiogenome.org/report-on-assembly-standards
- https://arimagenomics.com/resources/blog/high-quality-genome-assembly-with-hi-c/
- https://arimagenomics.com/resources/blog/high-quality-genome-assemblies/