Lukasz Kielpinski:
High-throughput sequencing based methods of RNA structure investigation

Date: 14-02-2014    Supervisor: Jeppe Vinther & Jan Christiansen

RNA exists in cells as dynamic, three-dimensional entities, but to describe it researchers study its primary (sequence), secondary (base pairing) and tertiary (three-dimensional) structure. Traditional methods of studying secondary and tertiary structure are labor intensive and require analyzing every molecule of interest separately. Since the emergence of massively parallel sequencing, the field of RNA structure determination has been changing rapidly, with a large increase in experimental throughput and new approaches to data analysis. This thesis consists of four manuscripts that describe developments within this methodological shift by presenting and validating novel experimental and computational approaches that harness next-generation sequencing for RNA structural studies.

The first paper (“Detection of Reverse Transcriptase Termination Sites Using cDNA Ligation and Massive Parallel Sequencing”) presents RTTS-Seq, a flexible, easy-to-follow method of preparing Illumina sequencing libraries that allows for large-scale identification of reverse transcription termination sites (RTTS). Depending on the experimental design, RTTS detection can be used to investigate various RNA properties, for example by mapping 5’ ends, measuring susceptibility to particular treatments (e.g. structure probing) or detecting base modifications. In addition to the detailed experimental protocol, we provide a data analysis workflow suitable for researchers without bioinformatics expertise.

The experience gained with RTTS-Seq was applied in the second paper (“Massive parallel sequencing based hydroxyl radical probing of RNA accessibility”) to probe tertiary RNA structure. The method was extended with a technique for countering PCR bias and combined with a normalization scheme that takes into account local coverage and background reverse transcription termination, as assessed by a control reaction. The method allows multiple long molecules to be probed simultaneously, and the obtained signal correlates well with backbone solvent accessibility for both assayed molecules (the RNase P specificity domain and the 16S ribosomal RNA).

The third paper (“The search for functional RNA secondary structures within 3’ untranslated regions by enzymatic probing of liver transcripts from multiple species (FragSeq2)”) presents a method of RNA secondary structure probing that is again a modification of RTTS-Seq, but is compatible with nuclease-based (P1 and V1) probing. In this protocol we ligated the adapter at the RNA level, as opposed to the cDNA-level ligation used in RTTS-Seq. Moreover, we anchored the reverse transcription at the poly(A) tail border, which focuses the assay on the 3’ untranslated regions. This set-up required a new data normalization workflow that accounts for the signal decay from the 3’ ends of molecules. We performed the experiments with liver RNA from three species, which allows us to combine the nuclease probing data with a structure conservation analysis, creating an information-rich dataset. We validate the method by comparing the nuclease signal with the known structures for three classes of RNA molecules. The search for novel functional structures is ongoing.

In parallel with studying RNA structure, we investigated the interactions between RNA and an oligonucleotide with therapeutic potential (“Transcriptome-wide detection of binding sites of Locked Nucleic Acid containing oligonucleotides (LNA-Stop-Seq)”). We describe the development of LNA-Stop-Seq, a method that can detect hybridization sites on a transcriptome-wide scale. We characterize and optimize various steps in the procedure and propose strategies for enriching for cDNA molecules terminated upon reaching the crosslinked oligonucleotide. The sequencing results confirm that the enrichment works, but the unexpected signal distribution requires further data analysis.

The methods presented in this thesis can provide a holistic view of RNA: its primary, secondary and tertiary structure, as well as its interactions with oligonucleotides. We expect that the advances made in the experimental and computational methods, together with the gathered results, will allow for a better understanding of the RNA structure-function relationship and for better and simpler design of antisense drugs.