Cancer biology is an exciting and dynamic field of research. Recently, it has become increasingly dependent on high-throughput technologies to generate biological data. Because of the massive amount of (often noisy) data these technologies produce, cancer biology research needs computational and machine learning methods to handle it.
In the cell, genomic information is stored in DNA in the nucleus. Messenger RNA (mRNA) is a working copy of DNA, generated by transcription; it passes the information to the cytoplasm, where it is translated into proteins. The process is called gene expression. MicroRNAs (miRNAs) are small, non-protein-coding RNAs that regulate gene expression at the post-transcriptional level. High-throughput profiling of DNA, mRNA and miRNAs provides valuable insight into different layers of biological activity within the cell.
In the first part of the thesis, I present the analysis of mRNA and miRNA expression in a cell model of Human Papilloma Virus (HPV) infection. HPV is responsible for 5% of cancers worldwide, and a better understanding of the molecular mechanisms of the infection can lead to improved treatment, diagnostics and prevention of these cancers. The mRNA expression profiles generated from the infected cells were analyzed by integrating available knowledge about cellular pathways and protein-protein interactions. The results show differential expression of many interesting genes and deregulation of several vital signaling pathways, such as Interleukin-2, JAK-STAT, TGF-β, NOTCH and tyrosine kinase signaling. Profiling of microRNA expression within the model showed differential expression of dozens of cellular microRNAs and provided targets for further experimental research.
The second part of this thesis focuses on building a classifier that can predict the primary site of Cancers of Unknown Primary (CUP). CUP is generally a highly aggressive disease with poor prognosis and is the fourth most common cause of death by cancer in developed countries. Predicting its primary site enables more targeted therapy, hopefully improving the response to treatment. In the first project, mRNA expression profiles of more than 2400 tumor samples are used to train a classifier. The classifier is reasonably successful at predicting the origin of primary and metastatic samples. However, the expression profiles of 60 CUP patients appeared distinct from those of the primary tumors and metastases of known origin. Therefore, CUP patients may require a different diagnostic strategy and treatment. In the second project, DNA copy number data are used to build a similar classifier. The results from primary tumors and cancer cell lines are promising and open the way for the development of a novel classifier of primary site.