Charles Darwin University

CDU eSpace
Institutional Repository

CDU Staff and Student only

SPANDx: a genomics pipeline for comparative analysis of large haploid whole genome re-sequencing datasets

Sarovich, Derek S. and Price, Erin P. (2014). SPANDx: a genomics pipeline for comparative analysis of large haploid whole genome re-sequencing datasets. BMC Research Notes, 7 (Article No. 618).

Document type: Journal Article
Citation counts: Altmetric Score is 11
Attached Files
Name Sarovich_49410.pdf
Description Published version
MIMEType application/pdf
Size 2.21MB
Downloads 126

IRMA ID 84473293xPUB15
Title SPANDx: a genomics pipeline for comparative analysis of large haploid whole genome re-sequencing datasets
Author Sarovich, Derek S.
Price, Erin P.
Journal Name BMC Research Notes
Publication Date 2014
Volume Number 7
Issue Number Article No. 618
ISSN 1756-0500
Scopus ID 2-s2.0-84907447157
Total Pages 9
Place of Publication United Kingdom
Publisher BioMed Central Ltd.
HERDC Category C1 - Journal Article (DIISR)
Abstract Background
Next-generation sequencing (NGS) is now a commonplace tool for molecular characterisation of virtually any species of interest. Despite the ever-increasing use of NGS in laboratories worldwide, analysis of whole genome re-sequencing (WGS) datasets from start to finish remains nontrivial due to the fragmented nature of NGS software and the lack of experienced bioinformaticists in many research teams.

Findings
We describe SPANDx (Synergised Pipeline for Analysis of NGS Data in Linux), a new tool for high-throughput comparative analysis of haploid WGS datasets comprising one through thousands of genomes. SPANDx consolidates several well-validated, open-source packages into a single tool, mitigating the need to learn and manipulate individual NGS programs. SPANDx incorporates BWA for alignment of raw NGS reads against a reference genome or pan-genome, followed by data filtering, variant calling and annotation using Picard, GATK, SAMtools and SnpEff. BEDTools has also been included for genetic locus presence/absence (P/A) determination to easily visualise the core and accessory genomes. Additional SPANDx features include construction of error-corrected single-nucleotide polymorphism (SNP) and insertion-deletion matrices, and P/A matrices, to enable user-friendly visualisation of genetic variants. The SNP matrices generated using VCFtools and GATK are directly importable into PAUP*, PHYLIP or RAxML for downstream phylogenetic analysis. SPANDx has been developed to handle NGS data from Illumina, Ion Personal Genome Machine (PGM) and 454 platforms, and we demonstrate that it has comparable performance across Illumina MiSeq/HiSeq2000 and Ion PGM data.

Conclusions
SPANDx is an all-in-one tool for comprehensive haploid WGS analysis. SPANDx is open source and freely available.
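
Illustrative sketch (not part of the published abstract or of SPANDx itself): the abstract mentions SNP matrices that can be imported into PAUP*, PHYLIP or RAxML. The Python below shows, under simplifying assumptions, how per-sample SNP calls for haploid genomes might be merged into such a matrix and written in a PHYLIP-style layout. The function names, the input dictionaries and the rule that a sample with no call at a position carries the reference base are all hypothetical choices made for this example; a real pipeline would also need to handle positions with missing or low-quality coverage.

```python
from typing import Dict


def build_snp_matrix(reference: Dict[int, str],
                     calls: Dict[str, Dict[int, str]]) -> Dict[str, str]:
    """Return one string of bases per sample, covering only variable positions.

    reference maps genome position -> reference base.
    calls maps sample name -> {position: called base}.
    Assumption: a sample with no call at a position has the reference base.
    """
    # Keep only positions where at least one sample differs from the reference.
    positions = sorted({
        pos
        for sample_calls in calls.values()
        for pos, base in sample_calls.items()
        if base != reference[pos]
    })
    matrix = {"Reference": "".join(reference[p] for p in positions)}
    for sample, sample_calls in calls.items():
        matrix[sample] = "".join(
            sample_calls.get(p, reference[p]) for p in positions
        )
    return matrix


def to_phylip(matrix: Dict[str, str]) -> str:
    """Format the matrix as text: a header line with the number of taxa and
    characters, then one line per taxon with a 10-character name field."""
    n_chars = len(next(iter(matrix.values())))
    lines = [f"{len(matrix)} {n_chars}"]
    for name, seq in matrix.items():
        lines.append(f"{name[:10]:<10}{seq}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Made-up toy data, for illustration only.
    reference = {101: "A", 250: "G", 377: "T"}
    calls = {
        "strain1": {101: "C"},
        "strain2": {250: "A", 377: "T"},
    }
    print(to_phylip(build_snp_matrix(reference, calls)))
```

Position 377 is dropped in this toy example because no sample differs from the reference there, so only phylogenetically informative columns reach the output.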
Keywords NGS
Comparative genomics
Variant calling
Additional Notes This is an Open Access article distributed under the terms of the Creative Commons Attribution License 4.0, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Access Statistics: 117 Abstract Views, 126 File Downloads
Created: Wed, 19 Aug 2015, 12:31:18 CST