Bogumil Kaczkowski:
Computational Cancer Biology: From Carcinogenesis to Metastasis

Date: 15-04-2012    Supervisors: Anders Krogh and Ole Winther

Cancer biology is an exciting and dynamic field of research. Recently, it has become increasingly dependent on high-throughput technologies to generate biological data. Because of the massive amounts of (often noisy) data these technologies produce, cancer biology research needs computational and machine learning methods to handle them.

In the cell, the genomic information is stored in DNA in the nucleus. Messenger RNA (mRNA) is a working copy of DNA, generated by transcription; it passes the information to the cytoplasm, where it is translated into proteins. The process is called gene expression. MicroRNAs (miRNAs) are small, non-protein-coding RNAs that regulate gene expression at the post-transcriptional level. High-throughput profiling of DNA, mRNA and miRNAs provides valuable insight into different layers of biological activity within the cell.

In the first part of the thesis, I present the analysis of mRNA and miRNA expression in a cell model of Human Papilloma Virus (HPV) infection. HPV is responsible for 5% of cancers worldwide, and a better understanding of the molecular mechanisms of the infection can lead to improved treatment, diagnostics and prevention of these cancers. The generated mRNA expression profiles of the infected cells have been analyzed by integrating the available knowledge about cellular pathways and protein-protein interactions. The results show differential expression of many interesting genes, and deregulation of several vital signaling pathways, such as Interleukin-2, JAK-STAT, TGF-β, NOTCH and tyrosine kinase signaling. Profiling of microRNA expression within the model showed differential expression of dozens of cellular microRNAs, and provided targets for further experimental research.

The second part of this thesis focuses on building a classifier that can predict the primary site of Cancers of Unknown Primary (CUP). CUP is generally a highly aggressive disease with poor prognosis and is the fourth most common cause of death by cancer in developed countries. Predicting its primary site enables more targeted therapy, hopefully improving the response to treatment. In the first project, mRNA expression profiles of more than 2400 tumor samples are used to train a classifier. The classifier is reasonably successful at predicting the origin of primary and metastatic samples. However, the expression profiles of 60 CUP patients appeared distinct from those of the primary tumors and metastases of known origin. Therefore, CUP patients may require a different diagnostic strategy and treatment. In the second project, DNA copy number data are used to build a similar classifier. The results from primary tumors and cancer cell lines are promising and open the way for the development of a novel classifier of primary site.