Charles Darwin University

CDU eSpace
Institutional Repository

Deriving accurate microbiota profiles from human samples with low bacterial content through post-sequencing processing of Illumina MiSeq data

Jervis-Brady, Jake, Leong, Lex E. X., Marri, Shashikanth, Smith, Renee J., Choo, Jocelyn M., Smith-Vaughan, Heidi C., Nosworthy, Elizabeth, Morris, Peter S., O'Leary, Stephen, Rogers, Geraint B. and Marsh, Robyn L. (2015). Deriving accurate microbiota profiles from human samples with low bacterial content through post-sequencing processing of Illumina MiSeq data. Microbiome, 3 (Article No. 19).

Document type: Journal Article
Citation counts: Altmetric Score is 13

IRMA ID: 11381xPUB115
NHMRC Grant No.: 1007641
Title: Deriving accurate microbiota profiles from human samples with low bacterial content through post-sequencing processing of Illumina MiSeq data
Author: Jervis-Brady, Jake; Leong, Lex E. X.; Marri, Shashikanth; Smith, Renee J.; Choo, Jocelyn M.; Smith-Vaughan, Heidi C.; Nosworthy, Elizabeth; Morris, Peter S.; O'Leary, Stephen; Rogers, Geraint B.; Marsh, Robyn L.
Journal Name: Microbiome
Publication Date: 2015
Volume Number: 3
Issue Number: Article No. 19
ISSN: 2049-2618
Total Pages: 11
Place of Publication: United Kingdom
Publisher: BioMed Central Ltd.
HERDC Category: C1 - Journal Article (DIISR)
Abstract:
Background
The rapid expansion of 16S rRNA gene sequencing in challenging clinical contexts has resulted in a growing body of literature of variable quality. To a large extent, this is due to a failure to address spurious signal that is characteristic of samples with low levels of bacteria and high levels of non-bacterial DNA. We have developed a workflow based on the paired-end read Illumina MiSeq-based approach, which enables significant improvement in data quality, post-sequencing. We demonstrate the efficacy of this methodology through its application to paediatric upper-respiratory samples from several anatomical sites.

Results
A workflow for processing sequence data was developed based on commonly available tools. Data generated from different sample types showed a marked variation in levels of non-bacterial signal and ‘contaminant’ bacterial reads. Significant differences in the ability of reference databases to accurately assign identity to operational taxonomic units (OTU) were observed. Three OTU-picking strategies were trialled as follows: de novo, open-reference and closed-reference, with open-reference performing substantially better. Relative abundance of OTUs identified as potential reagent contamination showed a strong inverse correlation with amplicon concentration allowing their objective removal. The removal of the spurious signal showed the greatest improvement in sample types typically containing low levels of bacteria and high levels of human DNA. A substantial impact of pre-filtering data and spurious signal removal was demonstrated by principal coordinate and co-occurrence analysis. For example, analysis of taxon co-occurrence in adenoid swab and middle ear fluid samples indicated that failure to remove the spurious signal resulted in the inclusion of six out of eleven bacterial genera that accounted for 80% of similarity between the sample types.

Conclusions
The application of the presented workflow to a set of challenging clinical samples demonstrates its utility in removing the spurious signal from the dataset, allowing clinical insight to be derived from what would otherwise be highly misleading output. While other approaches could potentially achieve similar improvements, the methodology employed here represents an accessible means to exclude the signal from contamination and other artefacts.
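
The abstract reports that the relative abundance of likely reagent-contaminant OTUs correlated inversely with amplicon concentration, which allowed them to be removed objectively. The sketch below illustrates that idea only; it is not the authors' implementation. It assumes a Spearman rank correlation and a cut-off of -0.7, neither of which is stated in this record, and the function names, OTU labels and numbers are hypothetical illustration data.

```python
from statistics import mean


def ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either input is constant."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman correlation, computed as Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def flag_contaminants(rel_abundance, amplicon_conc, threshold=-0.7):
    """Return OTUs whose abundance falls as amplicon concentration rises.

    rel_abundance maps an OTU name to its relative abundance in each sample;
    amplicon_conc lists the amplicon concentration of the same samples, in
    the same order. The threshold is an assumed, hypothetical cut-off.
    """
    return sorted(
        otu
        for otu, abundances in rel_abundance.items()
        if spearman(abundances, amplicon_conc) <= threshold
    )


# Made-up example: five samples ordered by increasing amplicon concentration.
amplicon_conc = [0.5, 1.0, 2.0, 4.0, 8.0]
rel_abundance = {
    "OTU_reagent": [0.40, 0.30, 0.15, 0.05, 0.01],   # shrinks as input DNA grows
    "OTU_resident": [0.10, 0.30, 0.20, 0.35, 0.25],  # no inverse trend
}
flagged = flag_contaminants(rel_abundance, amplicon_conc)
print(flagged)
```

In practice the flagged OTUs would then be removed and the remaining counts renormalised. The choice of statistic and cut-off should follow the published methods rather than the values assumed here.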
Keywords: 16S rRNA; Paired-end reads; Otitis media
Additional Notes: This is an Open Access article distributed under the terms of the Creative Commons Attribution License 4.0, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Created: Tue, 26 Jul 2016, 12:43:54 CST