First, I’d like to briefly explain our study to give some background to the analysis I carried out (more detail can be found in the paper: link here when accepted). Our study aimed to characterise the microbiome of the nasopharynx (the area at the back of the nose) and the middle ear in children with recurrent ear infections. We also compared the nasopharynx of these children with that of healthy children to see whether we could find commensal (“good”) bacteria that might protect against recurrent ear infections. We ended up with almost 100 children in each group and over 450 samples in total for analysis.

Prior to sequencing

As this document focuses on the data analysis, I will not go into detail on how the samples were prepared; for that, please refer to the paper (future link). We sequenced the V3/V4 region of the 16S rRNA gene, using the recommended primers and protocol (with some changes) from the Illumina workflow.

The samples were spread across four separate MiSeq runs (2 x 300 bp), each with positive and negative sequencing controls. The data were returned as a pair of files for each sample, _R1.fastq.gz (forward reads) and _R2.fastq.gz (reverse reads), so these were the files I started with.
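
Before any processing, it can be worth checking that every sample really does have both of its read files. The following is a minimal Python sketch of that check, not part of the original analysis. It assumes file names end in exactly _R1.fastq.gz or _R2.fastq.gz, as described above; the function name pair_fastq_files and the example file names are hypothetical, and real Illumina output may carry extra suffixes that would need a different pattern.

```python
from collections import defaultdict
from pathlib import Path


def pair_fastq_files(filenames):
    """Group FASTQ file names by sample, based on the _R1/_R2 suffix.

    Returns (complete, incomplete): a dict mapping sample name to an
    (R1, R2) tuple, and a sorted list of samples missing one read file.
    The suffix convention is an assumption taken from the text.
    """
    reads_by_sample = defaultdict(dict)
    for name in filenames:
        base = Path(name).name
        for read in ("R1", "R2"):
            suffix = f"_{read}.fastq.gz"
            if base.endswith(suffix):
                sample = base[: -len(suffix)]
                reads_by_sample[sample][read] = name

    complete = {
        sample: (reads["R1"], reads["R2"])
        for sample, reads in reads_by_sample.items()
        if "R1" in reads and "R2" in reads
    }
    incomplete = sorted(
        sample for sample, reads in reads_by_sample.items() if len(reads) < 2
    )
    return complete, incomplete


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    example = [
        "sampleA_R1.fastq.gz",
        "sampleA_R2.fastq.gz",
        "sampleB_R1.fastq.gz",
    ]
    complete, incomplete = pair_fastq_files(example)
    print("paired samples:", sorted(complete))
    print("samples missing a read file:", incomplete)
```

In practice the list of names could come from something like Path("run1").glob("*.fastq.gz"), where the directory name is again only an example. Files that match neither suffix are simply ignored.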

Major software used

I have used the following software in this analysis:

Outline of pre-processing analysis

See the diagram below for an overview of the full analysis.