Søren Kaae Sønderby:
Sequence Analysis

Date: 21-12-2017    Supervisors: Anders Krogh & Ole Winther




This thesis explores machine learning for the analysis of biological sequences. We demonstrate that neural networks naturally handle biological sequences and perform better than competing methods for protein secondary structure prediction and for subcellular localization of proteins. Neural network models can be built by combining different differentiable sub-modules. We use this to construct models that can be interpreted and communicated to biologists. Specifically, we show that 1D convolution layers can be interpreted as motif detectors that can be visualized similarly to sequence logos. Secondly, we show that neural networks augmented with attention modules learn to attend to specific parts of the sequence.
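
As a rough illustration of why a 1D convolution filter can be read as a motif detector, the sketch below slides a single filter over a one-hot encoded protein sequence and reports where it responds most strongly. It is a minimal toy under stated assumptions, not the models used in the thesis: the filter is set by hand to the motif KDEL rather than learned, and the sequence, the helper names `one_hot` and `conv1d_scan`, and the 20-letter alphabet ordering are all hypothetical choices made for the example.

```python
import numpy as np

# Assumption: the 20 standard amino acids, in an arbitrary fixed order.
ALPHABET = "ACDEFGHIKLMNPQRSTVWY"


def one_hot(seq):
    """Encode a sequence as a (length, alphabet size) one-hot matrix."""
    index = {char: i for i, char in enumerate(ALPHABET)}
    encoded = np.zeros((len(seq), len(ALPHABET)))
    for pos, char in enumerate(seq):
        encoded[pos, index[char]] = 1.0
    return encoded


def conv1d_scan(x, kernel):
    """Slide one filter over the sequence (no padding, stride 1).

    Each output is the sum of elementwise products between the filter
    and one window of the input, which is what a 1D convolution layer
    computes for a single filter before bias and nonlinearity.
    """
    width = kernel.shape[0]
    n_windows = x.shape[0] - width + 1
    return np.array(
        [np.sum(x[i:i + width] * kernel) for i in range(n_windows)]
    )


# Hand-set filter (assumption): a trained layer would learn these weights.
kernel = one_hot("KDEL")

# Hypothetical toy sequence containing the motif at position 4.
sequence = "MASTKDELGG"
scores = conv1d_scan(one_hot(sequence), kernel)
best = int(np.argmax(scores))

print("filter response per window:", scores)
print("strongest response at position", best)
```

Because each filter weight is tied to one residue at one offset, the weights can be displayed per position in much the same way as a sequence logo.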

The second part of this thesis demonstrates that distances in a learned feature space capture face attributes better than distances in the raw feature space.