TOM – 19.10.2016

Venue: Biomedicum 1, Seminar room 1-2

Timing: 15:30 onwards

Presenter: Gopal Peddinti, Senior Researcher, FIMM

Genome scale metabolic modeling of cancer cells

Genome-based research in cancer has identified a plethora of mutational events involved in the initiation and progression of cancers. Surprisingly, this huge diversity of genomic alterations appears to converge on altering tumor metabolism. Therefore, “starving the tumor cells of their energy supplies” (i.e. targeting tumor metabolism) appears to be a potentially universal approach to treating cancers. To explore the therapeutic potential of metabolism, genome-scale metabolic models (GSMMs) have emerged as powerful in silico tools in cancer systems biology research for predicting biomarkers, therapeutic targets, and treatment side effects.

GSMMs are stoichiometric models that provide a comprehensive view of cell metabolism. GSMMs are reconstructed and simulated within the constraint-based modeling framework. The COnstraint-Based Reconstruction and Analysis (COBRA) Toolbox is a comprehensive toolbox widely used in metabolic modeling. I will briefly review the GSMM approach and some of its successful applications, and demonstrate the COBRA Toolbox for MATLAB (https://opencobra.github.io/cobratoolbox/).
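For readers new to constraint-based modeling: its core constraints are steady-state mass balance, S·v = 0 (S is the stoichiometric matrix with metabolites as rows and reactions as columns, v is the flux vector), together with lower and upper bounds on each flux. The sketch below is plain Python and does not use the COBRA Toolbox. It only checks whether a candidate flux vector satisfies these constraints. The three-reaction network, its metabolite names and its bounds are hypothetical.

```python
# Hypothetical toy network (not a real GSMM):
#   R1: -> A   (uptake, capped at 10 units)
#   R2: A -> B
#   R3: B ->   (drain, e.g. a stand-in for biomass)
# Rows of S are metabolites (A, B); columns are reactions (R1, R2, R3).
S = [
    [1, -1, 0],   # A is produced by R1 and consumed by R2
    [0, 1, -1],   # B is produced by R2 and consumed by R3
]
BOUNDS = [(0.0, 10.0), (0.0, 1000.0), (0.0, 1000.0)]  # (lower, upper) per reaction


def mass_balance(S, v):
    """Return S·v: the net production rate of each metabolite."""
    return [sum(s_ij * v_j for s_ij, v_j in zip(row, v)) for row in S]


def is_feasible(S, v, bounds, tol=1e-9):
    """True if v satisfies steady state (S·v = 0) and every flux bound."""
    steady = all(abs(x) <= tol for x in mass_balance(S, v))
    within = all(lo - tol <= vj <= hi + tol for vj, (lo, hi) in zip(v, bounds))
    return steady and within


print(is_feasible(S, [5.0, 5.0, 5.0], BOUNDS))     # True
print(is_feasible(S, [5.0, 4.0, 4.0], BOUNDS))     # False: A accumulates
print(is_feasible(S, [12.0, 12.0, 12.0], BOUNDS))  # False: uptake exceeds 10
```

Flux balance analysis then uses linear programming to choose, among all feasible v, one that maximizes an objective such as the flux through R3. In this toy chain, mass balance forces v1 = v2 = v3, so the optimum is simply the uptake cap of 10.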

TOM – 21.09.2016

Venue: Biomedicum 1, Meeting room 10

Timing: 15:30 onwards

Presenter: Tuomo Hartonen, PhD student

Transcription factor binding site discovery from ChIP-exo and ChIP-nexus data

The novel chromatin immunoprecipitation (ChIP) protocols ChIP-exo and ChIP-nexus allow transcription factor binding to be studied with unprecedented resolution. Peak calling software separates true transcription factor binding locations from noise.

Most peak callers search for binding events by building a model of “true” peaks from the sites with the highest enrichment in the ChIP experiment, and then accept only peaks that resemble this model. However, most transcription factors are known to bind cooperatively with other factors, form dimers or interact with other proteins. Moreover, the currently most widely used approach for predicting transcription factor binding sites, the simple Position Weight Matrix (PWM) model, does not explain all binding observed in vivo. These different types of binding create different ChIP-nexus/exo fingerprints, so fitting all peaks to a single model may miss important binding events.
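As background on the PWM models mentioned above, the following minimal sketch converts per-position base counts into a log-odds matrix and scans a sequence for its best-scoring window. The 4-bp count matrix, the uniform background and the pseudocount of 0.5 are all hypothetical. The sketch is not part of PeakXus.

```python
import math

# Hypothetical counts for a 4-bp motif (each column sums to 10 sites).
COUNTS = {
    "A": [8, 0, 0, 1],
    "C": [1, 9, 0, 1],
    "G": [0, 0, 9, 0],
    "T": [1, 1, 1, 8],
}
BACKGROUND = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
PSEUDOCOUNT = 0.5


def log_odds_pwm(counts, background, pseudocount):
    """Turn base counts into log2(p(base at position) / p(base in background))."""
    length = len(counts["A"])
    pwm = {b: [0.0] * length for b in counts}
    for i in range(length):
        total = sum(counts[b][i] for b in counts) + pseudocount * len(counts)
        for b in counts:
            p = (counts[b][i] + pseudocount) / total
            pwm[b][i] = math.log2(p / background[b])
    return pwm


def scan(seq, pwm):
    """Score every window of an uppercase ACGT sequence; return (best score, offset)."""
    width = len(pwm["A"])
    scores = [
        (sum(pwm[seq[i + j]][j] for j in range(width)), i)
        for i in range(len(seq) - width + 1)
    ]
    return max(scores)


pwm = log_odds_pwm(COUNTS, BACKGROUND, PSEUDOCOUNT)
score, offset = scan("TTACGTGG", pwm)
print(offset)  # 2: the window "ACGT" matches the consensus
```

A site is typically reported as a PWM hit when its score exceeds a chosen threshold. Cooperative or dimeric binding can produce in vivo sites that such an independent-position score ranks poorly, which is the limitation described above.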

PeakXus is a peak caller specifically designed to leverage the increased resolution of ChIP-nexus/exo experiments. It was developed with the aim of making as few assumptions about the data as possible, to allow novel discoveries. PeakXus supports the use of Unique Molecular Identifiers (UMIs) to remove PCR duplicates, which can create artefacts closely resembling true ChIP-nexus/exo binding events. We show that PeakXus consistently finds more peaks overlapping with transcription-factor-specific PWM hits than published methods.
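The idea behind UMI-based duplicate removal can be sketched as follows. Reads that share the same mapping position, strand and UMI are treated as PCR copies of one original molecule. Reads at the same position but with different UMIs are kept, because they come from distinct molecules. This is a simplified exact-match sketch, not PeakXus's implementation. The `Read` record and its fields are hypothetical, and the sketch does not handle sequencing errors in UMIs.

```python
from collections import namedtuple

# Hypothetical read record: pos is the 5' mapping coordinate.
Read = namedtuple("Read", ["chrom", "pos", "strand", "umi"])


def dedup_by_umi(reads):
    """Keep the first read for each (chrom, pos, strand, UMI) combination."""
    seen = set()
    unique = []
    for r in reads:
        key = (r.chrom, r.pos, r.strand, r.umi)
        if key not in seen:
            seen.add(key)
            unique.append(r)
    return unique


reads = [
    Read("chr1", 1000, "+", "ACGTAC"),
    Read("chr1", 1000, "+", "ACGTAC"),  # PCR duplicate of the read above
    Read("chr1", 1000, "+", "TTGCAA"),  # same position, different molecule
    Read("chr1", 1000, "-", "ACGTAC"),  # opposite strand
]
print(len(dedup_by_umi(reads)))  # 3
```

Without UMIs, all four reads would look like a pile-up at one base, which is exactly the kind of sharp signal a ChIP-exo/nexus peak caller looks for.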

I will try to give a clear introduction to the topic so that those less familiar with ChIP experiments can also follow. I will compare the ChIP-exo/nexus protocols to ChIP-seq to highlight applications where the novel experiments are more suitable (for example, allele-specific binding analysis). I will also briefly explain the kind of output the algorithm provides.

PeakXus is published at: http://bioinformatics.oxfordjournals.org/content/32/17/i629.short

PeakXus code at GitHub: https://github.com/hartonen/PeakXus

Other useful references for those who are interested in reading more:

He et al. (2015). ChIP-nexus enables improved detection of in vivo transcription factor binding footprints. Nature Biotechnology.

Rhee & Pugh (2011). Comprehensive genome-wide protein-DNA interactions detected at single-nucleotide resolution. Cell.

Kivioja et al. (2012). Counting absolute numbers of molecules using unique molecular identifiers. Nature Methods.

TOM – Course description

When?
-----
Usually starts at 15:30
Biomedicum 1

Why?
-----
First things first: these meetings are very casual. We want to create a community for exchanging knowledge and ideas among people working in or interested in Bioinformatics/Applied Statistics. The idea is to bring them together, and we hope this will benefit all of us.

What?
-----
Tool Of the Month (TOM) is a platform for discussing the various tools and methods we use in our research. We can discuss bioinformatics algorithms, software and packages in R/Python/Perl/any-of-your-favourite-languages. And if you want to talk about your research, that’s even better!

Each time we meet, we will have one or two talks about a method/tool/idea, in a format freely chosen by the presenter. If you want to continue the exciting conversation with the eclectic crowd, feel free to take it to nearby restaurants/pubs. What’s more, you can earn credits by presenting just once and actively participating in TOM meetings!

Credits:  

Students can obtain credits from presenting and/or attending TOM-meetings as follows:
a) attending 6 meetings & filling in the feedback form = 1 cr
b) doing a) and giving a presentation = 2 cr

The attended meetings do not all have to be in the same year.
Credits can be registered after each spring term.

TOM mailing list
----------------
If you would like to receive notifications of upcoming meetings, you can join the TOM mailing list by following the instructions here:

https://helpdesk.it.helsinki.fi/en/instructions/collaboration-and-publication/e-mail/mailing-lists-basic-use

The name of the mailing list is ils-tom.

TOM Fall Schedule

Tentative schedule for Fall session 2016

Date   Presenter         Title of presentation
31.08  Abhishekh Gupta   SGNS2: a tool for systems modeling
21.09  Tuomo Hartonen    Transcription factor binding site discovery from ChIP-exo/Nexus data
19.10  Gopal Peddinti    COBRA: Constraint Based Modeling for Genome scale metabolic networks
16.11  Teemu Kivioja     Integer Linear Programming
14.12  Boris Vassiliev   Document, automate and share your data analysis with Lir

TOM – 31.08.2016

Venue: Biomedicum 1, Seminar room 1-2

Timing: 16:00 onwards

Presenter: Abhishekh Gupta, Post-doc, Group Aittokallio, FIMM

SGNS2: a tool for systems modeling

To move beyond traditional research that focuses on individual genes or single mutations and their relationship with a phenotype, we need to integrate information from various technologies (omics datasets) to obtain a holistic view of biological systems (my current research focuses on cancer systems). The work I will present therefore falls within the area of cancer systems biology.

Description:

Interactions in biological systems differ from those in physical and chemical systems, because biological processes are mostly stochastic in nature and involve low copy numbers of interacting species (e.g. mRNAs, proteins, substrates). Transcriptional regulation of gene expression is a dynamic process, and it is inherently noisy due to fluctuations in the numbers of biomolecules involved. This dynamic behavior of the transcriptomic landscape can significantly affect the overall cellular response to a given stimulus, for instance how a cancer cell responds to a drug. For such systems, a stochastic formulation is usually preferred.

I will present how the tool SGNS2 can be used in systems biology projects. SGNS2 is an open-source simulator of chemical reaction systems based on the Stochastic Simulation Algorithm (SSA). SGNS2 uses an enhanced Next Reaction Method, one of the efficient sampling procedures for the SSA. The simulator also supports dynamic compartments, and it is optimized for simulating models with large numbers of interacting species.
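To show what an SSA simulation does, the sketch below implements Gillespie's direct method, a different but statistically equivalent formulation of the SSA, not the Next Reaction Method used by SGNS2. It simulates a hypothetical one-species birth-death model of mRNA production and degradation. The rate constants are illustrative, and the sketch has no compartments or cell division.

```python
import random


def gillespie_birth_death(k_prod, k_deg, x0, t_end, seed=0):
    """Gillespie direct method for 0 -> X (rate k_prod) and X -> 0 (rate k_deg * X).

    Returns the event times and the copy number of X after each event.
    """
    rng = random.Random(seed)
    t, x = 0.0, x0
    times, counts = [t], [x]
    while True:
        a_prod = k_prod            # propensity of production
        a_deg = k_deg * x          # propensity of degradation
        a_total = a_prod + a_deg
        if a_total == 0:           # nothing can happen any more
            break
        tau = rng.expovariate(a_total)  # exponential waiting time to the next event
        if t + tau > t_end:
            break
        t += tau
        # Pick which reaction fires, with probability proportional to its propensity
        if rng.random() * a_total < a_prod:
            x += 1
        else:
            x -= 1
        times.append(t)
        counts.append(x)
    return times, counts


times, counts = gillespie_birth_death(k_prod=10.0, k_deg=1.0, x0=0, t_end=200.0)
# Time-weighted mean copy number; it should fluctuate around k_prod / k_deg = 10
mean_x = sum(c * (t1 - t0) for c, t0, t1 in zip(counts, times, times[1:])) / times[-1]
print(round(mean_x, 1))
```

Every run follows a different noisy trajectory. For this model, the stationary copy-number distribution is Poisson with mean k_prod / k_deg, and this low-copy-number noise is why a stochastic formulation is preferred.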

Lloyd-Price J, Gupta A, Ribeiro AS. SGNS2: a compartmentalized stochastic chemical kinetics simulator for dynamic cell populations. Bioinformatics. 2012 Nov 15;28(22):3004-5.

TOM – 11.05.2016

Venue: Biomedicum 1, Meeting room 1 

Timing: 15:30 – 17:00

Speaker: Christian Benner, PhD Student

FINEMAP: Ultrafast high-resolution fine-mapping using summary data from genome-wide association studies

Genome-Wide Association Studies (GWAS) have identified thousands of loci associated with complex diseases. The next crucial step is fine-mapping: identifying causal variants that point to the molecular mechanisms behind the associations and, eventually, suggest therapeutic targets. Recently, fine-mapping methods have been extended to use only GWAS summary data together with pairwise correlations between the variants. Common to these approaches is their reliance on a computationally expensive exhaustive search, which restricts their use to a few hundred variants. We introduce a software package, FINEMAP, that replaces the exhaustive search with an ultrafast stochastic search and thereby allows fine-mapping analyses to scale up to whole chromosomes.
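The short calculation below shows why exhaustive search does not scale. With m variants and up to k causal variants, the number of causal configurations to evaluate is the sum of C(m, k) over k. The cap of 3 causal variants is an assumption, chosen to match the 3-SNP pattern discussed next. The sketch only counts configurations and does not model FINEMAP's priors or its stochastic search.

```python
from math import comb


def n_causal_configurations(m, k_max):
    """Number of configurations with 1..k_max causal variants among m variants."""
    return sum(comb(m, k) for k in range(1, k_max + 1))


for m in (100, 1000, 20000):
    print(m, n_causal_configurations(m, 3))
# 100 166750
# 1000 166667500
# 20000 1333333350000
```

The count grows roughly as m^3 / 6, so a locus with 20,000 variants already has over a trillion configurations with at most three causal variants. A stochastic search that visits only the high-probability configurations avoids enumerating them all.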

We show that FINEMAP (1) opens up completely new opportunities, e.g. by exploring the 15q21/LIPC locus for high-density lipoprotein (HDL) with 20,000 variants in less than 90 seconds, whereas exhaustive search would require more than 9,000 years, (2) provides accuracy similar to exhaustive search when the latter can be completed, (3) achieves even higher accuracy when the latter must be restricted for computational reasons, and (4) identifies more plausible variant combinations than conditional analysis. At the 15q21/LIPC locus, which shows at least a 3-SNP association pattern with HDL, a missense variant and a promoter polymorphism are likely to be causal, whereas the lead variant in single-SNP testing has less evidence than a regulatory variant correlated with it.

We believe that FINEMAP’s approach of jointly modeling the whole locus, together with its unprecedented computational efficiency, will help reveal valuable knowledge that could otherwise remain hidden due to the limitations of existing fine-mapping methods.

http://bioinformatics.oxfordjournals.org/content/early/2016/01/14/bioinformatics.btw018.abstract

TOM – 27.04.2016

Venue: Biomedicum 1, Meeting room 3, P floor 

Timing: 15:30 – 17:00

Speaker: Mikhail Shubin, PhD Student

Data Visualization

Everybody loves visualization! It is used to search the data for new discoveries and to communicate these discoveries to fellow scientists.

But are we doing it right? Could we do it better? Is it possible that we have been carried away by the hype?

I will try to give a brief overview of scientific visualization from the point of view of graphic design. I will use keywords like “information/noise ratio”, “human visual cognition”, “fashion”, “colour blindness” and “everybody is doing it wrong”. But I would prefer a discussion to a lecture, so I will bring a number of examples which (I hope) we can talk about together.

About me: My name is Mikhail, and I have a master’s degree in CS and bioinformatics. I am now spending my last months as a PhD student in Statistics. I have a blog about scientific visualization: https://ctg2pi.wordpress.com.

TOM – 12.04.2016

Venue: Biomedicum 1, 5th Floor, Meeting room 7

Timing: 15:30 – 17:00

Speaker: Svetlana Ovchinnikova, PhD Student, Group Anders, FIMM

Metric Learning Algorithms

Many popular machine learning approaches rely heavily on the distance or similarity between samples. Yet a predefined distance metric is not always appropriate for the property of interest, and a suitable transformation of the feature space can increase the effectiveness of a machine learning algorithm.
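Most metric learning methods, including the two packages below, learn a Mahalanobis distance d(x, y) = sqrt((x − y)^T M (x − y)) with M = L^T L. This is equivalent to applying a linear map L and then taking Euclidean distance. The sketch below shows how a different L changes a nearest-neighbour decision. The matrix `shrink_y` is a hypothetical "learned" map that down-weights a noisy second feature. It is not the output of LMNN or MLKR.

```python
import math


def mahalanobis(x, y, L):
    """||L (x - y)||, i.e. sqrt((x - y)^T M (x - y)) with M = L^T L."""
    diff = [a - b for a, b in zip(x, y)]
    proj = [sum(l_ij * d_j for l_ij, d_j in zip(row, diff)) for row in L]
    return math.sqrt(sum(p * p for p in proj))


def nearest_neighbour(query, points, L):
    """Index of the point closest to query under the metric defined by L."""
    return min(range(len(points)), key=lambda i: mahalanobis(query, points[i], L))


points = [[0.0, 10.0], [1.0, 0.0]]
query = [0.0, 0.0]
identity = [[1.0, 0.0], [0.0, 1.0]]     # plain Euclidean distance
shrink_y = [[1.0, 0.0], [0.0, 0.05]]    # hypothetical learned map

print(nearest_neighbour(query, points, identity))  # 1
print(nearest_neighbour(query, points, shrink_y))  # 0
```

Metric learning algorithms differ mainly in the objective they optimize to choose L.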

In my presentation, I will give an overview of existing approaches to metric learning and talk about the mathematical background behind these algorithms. I will also demonstrate two Matlab packages, LMNN (Large Margin Nearest Neighbours) and MLKR (Metric Learning for Kernel Regression), which can be used for classification and regression problems, respectively.
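For reference, the objectives of the two methods can be summarized as follows, in the notation commonly used for them. LMNN pulls each point's target neighbours (same-class neighbours, written j ⇝ i) closer, and pushes differently labelled "impostors" l at least one unit further away than those target neighbours. Here [z]_+ = max(0, z), and the weighting μ between the two terms varies between presentations. MLKR learns A so that leave-one-out Gaussian-kernel regression in the transformed space has a small squared error. The exact parameterizations in the Matlab packages may differ.

```latex
% LMNN: y_{il} = 1 if x_i and x_l share a label, else 0
\varepsilon_{\mathrm{LMNN}}(L) = \sum_{i,\, j \rightsquigarrow i} \lVert L(x_i - x_j) \rVert^2
  + \mu \sum_{i,\, j \rightsquigarrow i} \sum_{l} (1 - y_{il})
    \bigl[\, 1 + \lVert L(x_i - x_j) \rVert^2 - \lVert L(x_i - x_l) \rVert^2 \,\bigr]_+

% MLKR: leave-one-out kernel regression in the learned metric
k_{ij} = \exp\bigl(-\lVert A(x_i - x_j) \rVert^2\bigr), \qquad
\hat{y}_i = \frac{\sum_{j \ne i} y_j\, k_{ij}}{\sum_{j \ne i} k_{ij}}, \qquad
\mathcal{L}_{\mathrm{MLKR}}(A) = \sum_i (y_i - \hat{y}_i)^2
```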

Weinberger, K. Q., Blitzer, J., & Saul, L. K. (2005). Distance metric learning for large margin nearest neighbor classification. In Advances in neural information processing systems (pp. 1473-1480). – http://machinelearning.wustl.edu/mlpapers/paper_files/NIPS2005_265.pdf

Weinberger, K. Q., & Tesauro, G. (2007). Metric learning for kernel regression. In International Conference on Artificial Intelligence and Statistics (pp. 612-619). – http://machinelearning.wustl.edu/mlpapers/paper_files/AISTATS07_WeinbergerT.pdf

Kireeva, N. V., Ovchinnikova, S. I., Kuznetsov, S. L., Kazennov, A. M., & Tsivadze, A. Y. (2014). Impact of distance-based metric learning on classification and visualization model performance and structure–activity landscapes. Journal of computer-aided molecular design, 28(2), 61-73. – http://link.springer.com/article/10.1007/s10822-014-9719-1

TOM – 30.03.2016

Venue: Biomedicum 1, Meeting room B236a

Timing: 15:30 – 17:00

Speaker: Liye He, PhD Student, Group Aittokallio, FIMM

LICHeE (Lineage Inference for Cancer Heterogeneity and Evolution) is a novel computational method that reconstructs tumour phylogenies and decomposes tumours into sub-clones. It uses the presence patterns of somatic single nucleotide variants (SSNVs) across multiple samples and their variant allele frequencies (VAFs).
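The following sketch illustrates what a "presence pattern" means. Each SSNV is called present or absent in each sample, and SSNVs that share the same binary profile across samples are grouped together. The sketch covers only this grouping step. It does not reproduce LICHeE's VAF-based clustering or its phylogeny search. The VAF values and the 0.05 presence threshold are hypothetical.

```python
from collections import defaultdict


def presence_pattern(vafs, threshold):
    """Binary profile across samples: '1' if the SSNV is called present."""
    return "".join("1" if v >= threshold else "0" for v in vafs)


def group_by_pattern(ssnvs, threshold=0.05):
    """Group SSNV names by their presence pattern."""
    groups = defaultdict(list)
    for name, vafs in ssnvs.items():
        groups[presence_pattern(vafs, threshold)].append(name)
    return dict(groups)


# Hypothetical VAFs of four SSNVs across three samples from one patient
ssnvs = {
    "snv1": [0.45, 0.40, 0.48],  # present in every sample (truncal)
    "snv2": [0.20, 0.00, 0.22],
    "snv3": [0.18, 0.01, 0.25],
    "snv4": [0.00, 0.30, 0.00],
}
print(group_by_pattern(ssnvs))
# {'111': ['snv1'], '101': ['snv2', 'snv3'], '010': ['snv4']}
```

Intuitively, a variant present in all samples (pattern 111) is expected to lie near the root of the phylogeny. Variants with patterns such as 101 and 010 point to separate sub-clonal lineages.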

Link to the paper: http://www.ncbi.nlm.nih.gov/pubmed/25944252

TOM – 16.03.2016

Venue: Biomedicum 1, Seminar room 1-2

Timing: 16:00 – 17:30

Speaker: Simon Anders, FIMM-EMBL Group Leader

(i) DESeq is a Bioconductor package for the analysis of RNA-Seq data that we developed and that has become one of the most widely used tools for this purpose. The history of this project offers some useful lessons on how to design, author, publish, disseminate and advertise bioinformatics software with impact, i.e., such that it is widely used by the research community.
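As an example of the methodology DESeq is known for, the following sketch implements median-of-ratios normalization, which estimates per-sample size factors to correct for sequencing depth. The count matrix is hypothetical, and the sketch is a plain-Python illustration, not the package's code. In the Bioconductor packages, this step is carried out by `estimateSizeFactors`.

```python
import math
from statistics import median


def size_factors(counts):
    """Median-of-ratios size factors; counts[g][s] is the count of gene g in sample s."""
    n_samples = len(counts[0])
    # Genes with a zero in any sample are skipped: their geometric mean would be 0
    usable = [row for row in counts if all(c > 0 for c in row)]
    # Log of each gene's geometric mean across samples (the pseudo-reference)
    log_ref = [sum(math.log(c) for c in row) / n_samples for row in usable]
    factors = []
    for s in range(n_samples):
        log_ratios = [math.log(row[s]) - ref for row, ref in zip(usable, log_ref)]
        factors.append(math.exp(median(log_ratios)))
    return factors


# Hypothetical counts: sample 2 was sequenced twice as deeply as sample 1
counts = [
    [10, 20],
    [100, 200],
    [5, 10],
    [0, 3],   # dropped from the estimate because of the zero
]
print(size_factors(counts))  # roughly [0.707, 1.414]: a factor of 2 between samples
```

Taking the median over genes makes the estimate robust to a minority of genes that are truly differentially expressed.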

(ii) Recently, I have shifted the focus of my research to interactive visualization tools for high-dimensional data. Our platform of choice is D3, a data visualization library written in JavaScript. I will give a short overview of the library and show how we have used it.