To facilitate the genomic surveillance of COVID-19 variants, END-VOC researchers have developed a detailed pipeline for sequencing optimisation and bioinformatics, which is now available to the wider scientific community through this website.
The effort to sequence SARS-CoV-2 samples is unprecedented. To date, more than 16 million SARS-CoV-2 genome assemblies have been deposited in the GISAID database and consulted by hundreds of researchers. A critical part of identifying emerging SARS-CoV-2 variants is effectively integrating and standardising the results produced by different teams across the world.
Two documents to facilitate SARS-CoV-2 surveillance
END-VOC researchers have produced two documents that can help to facilitate this process, and which are now available to the wider scientific community through the Resources page of this website.
The first document (Sequencing Optimisation Pipeline) describes the challenge of integrating different sources of raw genomic data and proposes a series of recommendations to minimise biases and problems when analysing datasets that combine raw data drawn from heterogeneous sources.
The second document (Bioinformatics Pipeline) provides a ‘best-practice’ case study in which END-VOC researchers compiled highly heterogeneous sequencing data, comprising around 850 publicly available SARS-CoV-2 raw read datasets from the NCBI Sequence Read Archive (SRA) repository, and analysed the dataset using an ‘all-in-one’ workflow to generate consensus sequences and perform phylogenetic reconstructions. The detailed steps and scripts are available on GitHub.
“We hope that this effort will not only be of use to our END-VOC partners, but will also foster a collaborative environment in which the methodological choices for SARS-CoV-2 sequencing and analysis are collectively discussed, evaluated and endorsed by the wider scientific community”, says François Balloux, researcher at UCL and leader of the END-VOC work package dedicated to SARS-CoV-2 sequencing and phylogenetics.