Asker Daniel Brejnrod:
Bioinformatics for discovery of microbiome variation

Date: 04-04-2017    Supervisor: Søren Johannes Sørensen

Sequencing-based tools have revolutionized microbiology in recent years. High-throughput DNA sequencing has enabled high-resolution studies of microbial life in many different environments at unprecedentedly low cost. These culture-independent methods have helped discover novel bacteria, both in the environment and in human hosts, that are not accessible through traditional means. Genome bioinformatics has made it possible to interrogate the metabolic capabilities of these bacteria through novel tools for reconstructing genomes from short DNA sequencing reads. These new approaches to studying bacterial taxonomy and function make it possible to generate hypotheses about how variation in microbial communities and their functionality affects their interactions with hosts and environments.

This thesis contributes new tools, and improvements to existing ones, for discovering relevant variation in microbiomes. In addition, it explores how various molecular methods can be integrated to build hypotheses about the impact of copper contamination on soil.

The introduction gives a broad overview of the field of microbiome research. It focuses on the technologies that enable these discoveries and on how some of the broader issues in the field relate to this thesis, particularly considerations there was no space for in the papers. The main focus is on bioinformatics, that is, the processing of data after it has been generated by the sequencing machines. However, some topics, such as DNA extraction, are also touched upon because they have a large impact on downstream results.

Manuscript 1, “Large-scale benchmarking reveals false discoveries and count transformation sensitivity in 16S rRNA gene amplicon data analysis methods used in microbiome studies”, benchmarks the performance of a variety of popular statistical methods for discovering bacteria that are differentially abundant between two conditions. The purpose is to assess the false discovery rate, the recovery of truly differentially abundant bacteria, and the impact of beta diversity exploration strategies commonly used in microbiome research. We assess these differences through simulation and by making biological assumptions about the data. We conclude that researchers should choose their methods carefully, as the wrong choice may lead to inflated rates of both type I and type II errors.

Manuscript 2 describes a new software package for exploring the microbial functionality found on mobile genetic elements. The exploration is done by annotating metagenomic contigs and then applying statistical tests to discover differences. The approach is validated statistically and applied to two datasets: the Hygum soil metagenomes and a public dataset from patients with ulcerative colitis (UC). A number of significant candidates are discovered, and their plausibility is discussed in the context of what is known about the respective microbiomes.

Manuscripts 3 and 4 describe the exploration of a heavily copper-contaminated site in Hygum, Denmark. The site was used for impregnating wood with CuSO4, and although it was shut down in 1924, it remains contaminated along a copper gradient spanning a 100-fold range in concentration. Manuscript 3, “Coping with copper: legacy effect of copper on potential activity of soil bacteria following a century of exposure”, uses RNA-based sequencing to study the active fraction of the microbial community. It extends past work on the soil by enumerating only the bacteria in the active fraction and by defining functional response groups, which are groups of bacteria that respond similarly to the copper concentrations. Manuscript 4 extends previous literature by using metagenomic sequencing and sequencing of mRNA to characterize the functionality of the microbial community. Based on an assessment of catabolic diversity and of the expression of growth-related proteins, we conclude that the copper contamination selects for K-strategists, and that these bacteria are likely infected by lytic phages that prey on the community.