TOM – Course description


When?
——-
Meetings usually start at 15:30 in Biomedicum 1.

Why?
—————————-
First things first: the setting of these meetings is very casual. We want to create a community for exchanging knowledge and ideas among people working in or interested in bioinformatics and applied statistics. The idea is to bring them together, and we hope that this will be beneficial for all of us.

What?
—————————————
Tool Of the Month (TOM) is a platform for discussing the tools and methods that we use in our research. We can discuss bioinformatics algorithms, software, and packages in R, Python, Perl, or any of your favourite languages. And if you want to talk about your research, that’s even better!

Each time we meet, we will have one or two talks about a method, tool, or idea in a format that can be freely decided by the presenter. And if you want to continue the exciting conversation with the eclectic crowd, feel free to take it to nearby restaurants or pubs. What’s more, you can receive credits just by presenting once and actively participating in TOM meetings!

Credits:  

Students can obtain credits for presenting at and/or attending TOM meetings as follows:
a) attending 6 meetings and filling in the feedback form = 1 cr
b) doing a) and giving a presentation = 2 cr

The attended meetings do not all have to be from the same year.
Credits can be registered after each spring term.

 

TOM mailing list
——————-
If you would like to receive notifications of the upcoming meetings, you can join the TOM mailing list following the instructions shown here:

https://helpdesk.it.helsinki.fi/en/instructions/collaboration-and-publication/e-mail/mailing-lists-basic-use

The name of the mailing list is ils-tom.

TOM 30.05.2018

Venue: Meeting room 5-6 Biomedicum 1 Meilahti

Timing: 15:30 onwards

Presenter: Tuomas Puoliväli

MultiPy: Multiple hypothesis testing in Python

The reproducibility of research findings has recently been questioned in many fields of science. This problem is partially caused by testing multiple hypotheses simultaneously, which increases the number of false positive findings if the corresponding p-values are not properly corrected. While this multiple testing problem is well known and has been extensively studied for decades, the classic and advanced methods are yet to be implemented in an accessible and comprehensive Python package. Here we developed a software package called multipy (MULTIple hypothesis in PYthon), which is an open-source, unit-tested, and freely available collection of procedures for controlling the family-wise error rate and the false discovery rate. We hope that this effort will help to improve current data analysis practices, as well as facilitate building software for large-scale group analyses.
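
To make the two error rates mentioned in the abstract concrete, here is a minimal sketch of one classic procedure for each: Bonferroni correction (family-wise error rate) and the Benjamini-Hochberg step-up procedure (false discovery rate). It is written in plain Python for illustration only; the function names and the example p-values are assumptions and do not reflect the multipy API.

```python
def bonferroni(pvals, alpha=0.05):
    """Reject hypothesis i when p_i <= alpha / m (controls the family-wise error rate)."""
    m = len(pvals)
    return [p <= alpha / m for p in pvals]


def benjamini_hochberg(pvals, alpha=0.05):
    """Benjamini-Hochberg step-up procedure (controls the false discovery rate)."""
    m = len(pvals)
    # Indices of the p-values in ascending order.
    order = sorted(range(m), key=lambda i: pvals[i])
    # Find the largest rank k such that p_(k) <= (k / m) * alpha.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if pvals[idx] <= rank / m * alpha:
            k_max = rank
    # Reject the k_max hypotheses with the smallest p-values.
    reject = [False] * m
    for idx in order[:k_max]:
        reject[idx] = True
    return reject


# Made-up p-values, for illustration only.
pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205]
print(bonferroni(pvals))
print(benjamini_hochberg(pvals))
```

With these example values, Bonferroni rejects fewer hypotheses than Benjamini-Hochberg, which illustrates the usual trade-off: controlling the family-wise error rate is stricter than controlling the false discovery rate.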

TOM 25.04.2018

Venue: Viikki Infocenter room 139, live stream in Meilahti Terkko auditorium 2013

Timing: 16:00 onwards

Presenter: Rishi Das Roy

4-RNA-seq

At present, RNA sequencing is a very popular method for measuring gene expression. The data generated by sequencing machines must go through QC, filtering, and quantification steps, that is, a pipeline (a collection of software tools). There are plenty of tutorials on the internet and workshops available for learning any of these pipelines. However, they are often difficult to install on the user's own desktop or server (such as the Taito supercluster of CSC).

Here, I present a pipeline, 4-RNA-seq, which can be easily installed on any Slurm-based job scheduling system such as Taito. Further, I will discuss how to execute and customize it.

Target audience: those who want to build their own pipelines on superclusters.

_____________________________________________________________________________

For those who are unable to go to Viikki, we have booked Terkko auditorium 2013 in Meilahti, where you can follow a live stream of this month’s TOM session.

TOM 14.03.2018

Venue: Biomedicum 1, Seminar room 1-2, P-floor

Timing: 16:00 onwards

Presenter: Julia Casado

Cyto: single cell data analysis framework

New single-cell technologies offer high-dimensional quantitative data. Technologies like mass cytometry and scRNA-seq are increasingly used in clinical research, especially in the study of cancer and immunology. This creates an unprecedented opportunity as well as a computational challenge due to the vast amount of data produced in each experiment. I will give a tutorial-style talk on the existing methods to overcome this challenge and will present Cyto, a computational framework designed to rapidly integrate these methods and adapt to new research questions. Cyto is fully open source and currently available only as a developer version. I hope the talk will be beneficial for both bioinformaticians and biologists interested in single-cell transcriptomics and proteomics. Although there will be some small code examples, the talk will highlight the main steps in single-cell analyses using a mass cytometry dataset from ovarian tumors as the central example.

TOM 21.02.2018

Venue: Viikki Infocenter room 139, live stream in Meilahti Terkko auditorium 2013

Timing: 16:00 onwards

Presenter: Ilida Suleymanova

A deep convolutional neural network approach for astrocyte detection

Astrocytes are involved in brain pathologies such as trauma or stroke, neurodegenerative disorders like Alzheimer’s and Parkinson’s, and many others. Determining the timing of morphological and biochemical changes is important for understanding the discrete steps of cell function in health and disease. The most widely used approaches for image processing and quantification are either manual cell counting or semi-automatic techniques. Currently, we lack a fully automated and efficient image analysis tool to quantify astrocytes. In this study, we developed fast, fully automated graphical software that assesses the number of astrocytes using a Deep Convolutional Neural Network (DCNN). Because the image features of this cell type can be learned quickly, a DCNN may offer the most efficient solution. The proposed method shows a strong positive correlation with manual counts.

_____________________________________________________________________________

For those who are unable to go to Viikki, we have booked Terkko auditorium 2013 in Meilahti, where you can follow a live stream of this month’s TOM session.

TOM 31.01.2018

Venue: Biomedicum 1 Seminar room 3

Timing: 15:30 onwards

Presenter: Jarkko Toivonen

MODER: discovering structures of dimeric transcription factor binding sites

Transcription factor (TF) binding sites can be modeled with the usual
Position-specific Probability Matrices (PPMs).
However, when these models are used to scan the genome, too many
putative binding sites are predicted. This problem
can be alleviated by incorporating additional information
about the binding process. One source of additional information
is the collaborative binding of TFs. To obtain information
about collaborative binding, we have developed an algorithm
and a program called MODER, which learns a total probability model
that includes monomeric models for binding sites and for
their dimeric interactions.
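
As a rough illustration of what scanning a sequence with a PPM means, the sketch below scores every window of a sequence by its log-odds against a uniform background and reports the windows above a threshold. The matrix, the background, the threshold, and the function names are all hypothetical; this is not MODER's algorithm, only the monomeric scanning step the abstract starts from.

```python
import math

# Hypothetical 4-position PPM: one dict of base probabilities per position.
PPM = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
    {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.1, "T": 0.7},
]
BACKGROUND = 0.25  # assumed uniform background probability for each base


def log_odds(window, ppm=PPM, background=BACKGROUND):
    """Sum of per-position log2 ratios of PPM probability to background."""
    return sum(math.log2(col[base] / background) for col, base in zip(ppm, window))


def scan(sequence, ppm=PPM, threshold=0.0):
    """Return (start, score) for every window scoring at least `threshold`."""
    width = len(ppm)
    hits = []
    for start in range(len(sequence) - width + 1):
        score = log_odds(sequence[start:start + width], ppm)
        if score >= threshold:
            hits.append((start, score))
    return hits


print(scan("TTAGCTAA", threshold=3.0))
```

Lowering the threshold makes more windows count as putative sites, which is the over-prediction problem that the abstract describes when scanning at genome scale.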

In this talk I will first briefly introduce the problem MODER
solves. After that I will show examples of running MODER and
introduce the visualization of the results.

MODER is freely available at:
https://github.com/jttoivon/moder

For more reading:
https://doi.org/10.1093/nar/gky027

TOM Spring schedule 2018

Tentative schedule for Spring session 2018

 

Date   Presenter          Title of presentation
31.01  Jarkko Toivonen    MODER: Discovering structures of dimeric transcription factor binding sites
21.02  Ilida Suleymanova  A deep convolutional neural network approach for astrocyte detection
14.03  Julia Casado       CYTO – single cell analysis powered by Anduril
25.04  Rishi Das Roy      For RNA-seq: a pipeline for RNA-seq analysis
30.05  Tuomas Puoliväli   MultiPy: Multiple hypothesis testing in Python

 

TOM 13.12.2017

Venue: Biomedicum 1, Meeting room 8-9, P-floor

Timing: 15:30 onwards

Presenter: Chiara Facciotto, PhD student

Interactive visualizations in R using Shiny and Plotly

The first step of downstream data analysis usually involves data exploration through different types of visualizations such as barplots, boxplots, histograms, scatterplots and heatmaps. This can be quite a tedious step, often requiring you to adjust parameters or analyze different subsets of the data. Moreover, browsing of the data by project collaborators can be difficult due to data sharing policies.

In this talk I will show how to use Shiny and Plotly, two R packages, to produce interactive custom-made visualizations that can be used to explore your data and that can be shared as online apps with your collaborators. Feel free to bring your computer with you if you want to try to build your own app.

 

TOM 15.11.2017

Venue: Biomedicum 1, Meeting room 8-9, P-floor

Timing: 15:30 onwards

Presenter: Balaguru Ravikumar, PhD student

C-SPADE: a web-tool for interactive analysis and visualization of drug screening experiments through compound-specific bioactivity dendrograms

A necessary task for every chemical biologist is to analyze the similarity or diversity of drug molecules prior to screening experiments. Owing to the wide variety of methods, chemical biologists currently require the aid of a chemoinformatician to perform such analyses. Toward that end, we introduce C-SPADE, an open-source exploratory web-tool for interactive analysis and visualization of drug profiling assays (biochemical, cell-based or cell-free) using compound-centric similarity clustering. C-SPADE allows users, in real time, to visually map the chemical diversity of a screening panel, explore investigational compounds in terms of their similarity to the screening panel, perform polypharmacological analyses and guide drug-target interaction predictions. C-SPADE provides an intuitive representation of the chemical space by capturing and visualizing underlying patterns of compound similarities linked to their polypharmacological effects, thereby reducing the time required for manual analysis in drug development or repurposing applications. The web-tool provides a customizable visual workspace that can either be downloaded as a figure or a Newick tree file, or shared as a hyperlink with other users.

C-SPADE is freely available at http://cspade.fimm.fi/.

In this talk I will briefly take you through C-SPADE and its functionality, and demonstrate its use with a case study.

For more reading:

https://academic.oup.com/nar/article-lookup/doi/10.1093/nar/gkx384

TOM 18.10.2017

Venue: Biomedicum 1, Meeting room 8-9, P-floor

Timing: 15:30 onwards

Presenter: Andres Veidenberg, PhD student

Web-based evolutionary sequence analysis with Wasabi

Multiple sequence alignments and phylogenetic trees form the basis of comparative sequence analyses. Downstream analysis pipelines, however, can easily grow overly complex with each added tool and branching dataset. To address the issue, I’m building Wasabi: a web-browser-based graphical environment for evolutionary sequence analysis. Wasabi is designed for joint visualization and analysis of trees and multiple sequence alignments, and incorporates external tools and databases into a coherent analysis platform. My talk will introduce Wasabi as an analysis tool on a public server (http://wasabiapp.org), a visualization provider for web services, and an interactive plugin in scientific publishing.

TOM 13.09.2017

Venue: Biomedicum 1, Meeting room 8-9, P-floor

Timing: 15:30 onwards

Presenter: Mehreen Ali, PhD student, FIMM

Missing data imputation methods in proteomic datasets

Mass spectrometry (MS)-based proteomic profiling has the potential to comprehensively profile the proteins in biological samples and has thus shifted the focus of research from qualitative to quantitative analyses. MS-based proteomics provides an opportunity for more global profiling of post-translational modifications, yielding proteome-wide information about cancer cell signaling activity that is not accessible by genomics or transcriptomics alone.

However, a substantial amount of data is missing at the peptide/protein level, primarily due to low-abundance peptides and/or poor ionization. Missing values limit the information that statistical and machine learning methods can extract from proteomics datasets, and thus have a detrimental effect on downstream analyses.

In this talk, I will give a general overview of imputation methods adapted for proteomics datasets.
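
For readers new to the idea, the sketch below shows two very simple imputation strategies on a single protein's intensity values, with `None` marking a missing measurement: filling with half of the smallest observed value (a common heuristic when values are missing because of low abundance) and filling with the mean of the observed values. The function names and data are made up for illustration, and these are not necessarily the methods covered in the talk.

```python
def impute_half_min(row):
    """Replace missing values (None) with half of the smallest observed value."""
    observed = [v for v in row if v is not None]
    if not observed:
        return list(row)  # nothing observed, so nothing to base the fill value on
    fill = min(observed) / 2
    return [fill if v is None else v for v in row]


def impute_mean(row):
    """Replace missing values (None) with the mean of the observed values."""
    observed = [v for v in row if v is not None]
    if not observed:
        return list(row)
    fill = sum(observed) / len(observed)
    return [fill if v is None else v for v in row]


# Made-up intensities for one protein across three samples.
intensities = [10.0, None, 4.0]
print(impute_half_min(intensities))
print(impute_mean(intensities))
```

The two strategies encode different assumptions: half-minimum treats a missing value as a signal below the detection limit, while mean imputation treats it as missing at random.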