TOM 24.10.2018

Venue: Room 2013 Terkko Meilahti and Room 139 Infocenter Viikki

Timing: 16:00 onwards

Presenter: Salla Välipakka

CNV detection and annotation from NGS data

Next generation sequencing (NGS) represents a comprehensive and increasingly cost-effective approach to diagnose genetically challenging disorders. However, diagnostic efforts with these methods have so far primarily focused on single nucleotide variants (SNVs) and short insertions and deletions (indels), leaving 40% to 70% of patients undiagnosed depending on the level of prescreening. Generally, the development of bioinformatics tools for copy number variant (CNV) analysis from NGS data lags behind that for other variant types. Additionally, the annotation of CNVs remains quite laborious since the few existing CNV databases are not as comprehensive or well curated as databases for other types of variants. In this talk, I am going to present the current advances and problems in developing CNV analysis for different NGS data types with some concrete examples.

TOM 30.05.2018

Venue: Meeting room 5-6 Biomedicum 1 Meilahti

Timing: 15:30 onwards

Presenter: Tuomas Puoliväli

MultiPy: Multiple hypothesis testing in Python

The reproducibility of research findings has recently been questioned in many fields of science. This problem is partially caused by testing multiple hypotheses simultaneously, which increases the number of false positive findings if the corresponding p-values are not properly corrected. While this multiple testing problem is well known and has been studied extensively for decades, the classic and advanced methods have yet to be implemented in an accessible and comprehensive Python package. Here we developed a software package called MultiPy (MULTIple hypothesis testing in PYthon), which is an open-source, unit-tested, and freely available collection of procedures for controlling the family-wise error rate and the false discovery rate. We hope that this effort will help to improve current data analysis practices, as well as facilitate building software for large-scale group analyses.
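
As an illustration of the two error rates mentioned in the abstract, the following is a minimal sketch in plain Python of one classic procedure for each: Bonferroni correction (family-wise error rate) and the Benjamini-Hochberg step-up procedure (false discovery rate). The function names and the example p-values are made up for illustration; this is not the MultiPy API.

```python
def bonferroni(pvals, alpha=0.05):
    """Reject hypothesis i when p_i <= alpha / m (controls the family-wise error rate)."""
    m = len(pvals)
    return [p <= alpha / m for p in pvals]


def benjamini_hochberg(pvals, alpha=0.05):
    """Benjamini-Hochberg step-up procedure (controls the false discovery rate)."""
    m = len(pvals)
    # Indices of the p-values in ascending order.
    order = sorted(range(m), key=lambda i: pvals[i])
    # Find the largest 1-based rank k with p_(k) <= k * alpha / m.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if pvals[idx] <= rank * alpha / m:
            k_max = rank
    # Reject the k_max hypotheses with the smallest p-values.
    reject = [False] * m
    for idx in order[:k_max]:
        reject[idx] = True
    return reject


# Hypothetical p-values from eight simultaneous tests.
pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.060, 0.074, 0.205]
print(bonferroni(pvals))
print(benjamini_hochberg(pvals))
```

Without any correction, five of these eight p-values fall below 0.05; Bonferroni keeps one rejection and Benjamini-Hochberg keeps two, which shows how the choice of error rate changes the number of reported findings.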

TOM 25.04.2018

Venue: Viikki Infocenter room 139, live stream in Meilahti Terkko auditorium 2013

Timing: 16:00 onwards

Presenter: Rishi Das Roy


RNA sequencing is currently a very popular method for measuring gene expression. Data from sequencing machines must pass through QC, filtering and quantification steps, which together form a pipeline (a collection of software tools). Plenty of online tutorials and workshops are available for learning these pipelines. However, they are often difficult to install on a user's own desktop or on a server (such as CSC's Taito supercluster).

Here, I present 4-RNA-seq, a pipeline that can be easily installed on any system that uses Slurm-based job scheduling, such as Taito. I will also discuss how to run and customize it.

Target audience: those who want to build their own pipelines on superclusters.


For those who are unable to come to Viikki, we have booked Terkko auditorium 2013 in Meilahti, where you can follow the live stream of this month’s TOM session.

TOM 14.03.2018

Venue: Biomedicum 1, Seminar room 1-2, P-floor

Timing: 16:00 onwards

Presenter: Julia Casado

Cyto: single cell data analysis framework

New single-cell technologies offer high-dimensional quantitative data. Technologies like mass cytometry and scRNA-seq are increasingly used in clinical research, especially in the study of cancer and immunology. This creates an unprecedented opportunity as well as a computational challenge due to the vast amount of data produced in each experiment. I will give a tutorial-style talk on the existing methods to overcome this challenge and will present Cyto, a computational framework designed to rapidly integrate these methods and adapt to new research questions. Cyto is fully open source and currently available only as a developer version. I hope the talk will be beneficial both for bioinformaticians and for biologists interested in single-cell transcriptomics and proteomics. Although there will be some small code examples, the talk will highlight the main steps in single-cell analyses using a mass cytometry dataset from ovarian tumors as the central example.

TOM 21.02.2018

Venue: Viikki Infocenter room 139, live stream in Meilahti Terkko auditorium 2013

Timing: 16:00 onwards

Presenter: Ilida Suleymanova

A deep convolutional neural network approach for astrocyte detection

Astrocytes are involved in brain pathologies such as trauma or stroke, neurodegenerative disorders like Alzheimer’s and Parkinson’s, and many others. Determining the timing of morphological and biochemical changes is important for understanding the discrete steps of cell function in health and disease. The most widely used approaches for image processing and quantification are either manual cell counting or semi-automatic techniques. Currently, we lack a fully automated and efficient image analysis tool to quantify astrocytes. In this study, we developed fast, fully automated graphical software that estimates the number of astrocytes using a Deep Convolutional Neural Network (DCNN) technique. Because the image features of this cell type can be learned quickly, a DCNN may offer the most efficient solution. The proposed method shows a strong positive correlation with manual counts.


For those who are unable to come to Viikki, we have booked Terkko auditorium 2013 in Meilahti, where you can follow the live stream of this month’s TOM session.

TOM 31.01.2018

Venue: Biomedicum 1 Seminar room 3

Timing: 15:30 onwards

Presenter: Jarkko Toivonen

MODER: discovering structures of dimeric transcription factor binding sites

Transcription factor (TF) binding sites can be modeled with the usual
Position-specific Probability Matrices (PPMs).
However, when these models are used to scan the genome, too many
putative binding sites are predicted. This problem
can be alleviated by incorporating additional information
about the binding process. One source of additional information
is the collaborative binding of TFs. To obtain information
about collaborative binding, we have developed an algorithm
and a program called MODER, which learns a total probability model
that includes monomeric models for binding sites and models for
their dimeric interactions.
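
To make the scanning step concrete, here is a minimal sketch of scoring a sequence against a single monomeric PPM using log-odds against a uniform background. The matrix, the sequence, the thresholds and the function names are all made up for illustration; this is plain PPM scanning, not the MODER algorithm or its dimeric model.

```python
import math

# Hypothetical 4-column PPM: each column gives P(base) at that motif position.
PPM = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.1, "T": 0.7},
]
BACKGROUND = 0.25  # assumed uniform background probability for every base


def log_odds(window, ppm=PPM, background=BACKGROUND):
    """Sum of log2(P_motif(base) / P_background(base)) over the motif columns."""
    return sum(math.log2(col[base] / background) for col, base in zip(ppm, window))


def scan(sequence, threshold, ppm=PPM):
    """Return (start, score) for every window whose score reaches the threshold."""
    width = len(ppm)
    hits = []
    for start in range(len(sequence) - width + 1):
        score = log_odds(sequence[start:start + width], ppm)
        if score >= threshold:
            hits.append((start, score))
    return hits


sequence = "TTACGTGGACGTA"  # made-up sequence containing the motif ACGT twice
strict_hits = scan(sequence, threshold=3.0)
loose_hits = scan(sequence, threshold=-3.0)
print([start for start, _ in strict_hits])
print([start for start, _ in loose_hits])
```

In this toy case the stricter threshold reports only the two exact matches, while the looser one also reports weak partial matches. On a genome-sized sequence, such spurious hits accumulate quickly, which is the over-prediction described above.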

In this talk I will first briefly introduce the problem MODER
solves. After that I will show examples of running MODER and
introduce the visualization of the results.

MODER is freely available at:

For more reading: