Sequencing Pipeline Optimisation

When multiple laboratories worldwide collaborate on SARS-CoV-2 surveillance and sequencing, the results produced by different teams must be integrated and standardised effectively. This document describes the challenge of integrating raw genomic data from different sources and recommends ways to minimise biases and other problems when analysing datasets that combine raw data from heterogeneous sources.

Bioinformatics Pipeline

In this document we provide a bioinformatics pipeline that analyses a heterogeneous dataset of publicly shared SARS-CoV-2 raw reads and generates a consensus sequence for each sample. We then perform a standard phylogenetic reconstruction on an alignment of the consensus genomes. The chosen pipeline performed well despite the highly heterogeneous nature of the selected raw sequence data. Detailed annotation of every step, together with all the scripts needed to reproduce the analysis, is available on GitHub.
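To make the idea of a per-sample consensus sequence concrete, the following is a minimal sketch of majority-rule consensus calling with depth masking. It is not the pipeline's actual implementation: the function name `call_consensus`, the input layout (one string of observed bases per reference position), and the thresholds `min_depth` and `min_fraction` are all assumptions chosen for illustration.

```python
from collections import Counter


def call_consensus(columns, min_depth=10, min_fraction=0.5):
    """Build a consensus string from per-position base observations.

    columns: list of strings, one per reference position, each holding the
    bases observed in the reads covering that position (hypothetical layout).
    Positions with too little coverage, or without a clear majority base,
    are masked with "N".
    """
    consensus = []
    for bases in columns:
        # Count only unambiguous nucleotide calls.
        counts = Counter(b for b in bases.upper() if b in "ACGT")
        depth = sum(counts.values())
        if depth < min_depth:
            consensus.append("N")  # insufficient coverage: mask
            continue
        base, n = counts.most_common(1)[0]
        # Keep the base only if it is a strict majority at this depth.
        consensus.append(base if n / depth > min_fraction else "N")
    return "".join(consensus)


if __name__ == "__main__":
    example = ["A" * 12, "C" * 6 + "T" * 6, "G" * 3, "T" * 9 + "A" * 3]
    print(call_consensus(example))
```

Masking low-coverage positions rather than guessing a base is one way to limit the biases that arise when samples sequenced at very different depths are combined in a single alignment.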