An interview with Prof. Carasso on Israeli Channel 10 (a full version of the program is available).

International Course in Text Mining in Cancer Research

This is an international course sponsored by the Danish Agency of Science as part of the Israeli-Danish collaboration between Dr. Frenkel-Morgenstern and Prof. Jensen (2015). We brought scientists at the cutting edge of text mining and bioinformatics to teach students during the four-day course, with accommodation, meals and transportation fully supported. The course generated fruitful interaction between the international students and the scientists.

Bioinformatics Unit

The Bioinformatics Center comprises an interdisciplinary team of researchers and research students in biology, computer science, statistics, mathematics and engineering. The team works together to develop methods and software tools for analyzing biological data and for studying biological processes. Recent advances in RNA and DNA sequencing technologies have made whole-genome and whole-transcriptome sequencing available to any interested research laboratory at an affordable price.

ChiPPI Webserver and Database

Fusion proteins, comprising peptides derived from the translation of two parental genes, are produced in cancer through chromosomal rearrangements or aberrations. The expressed fusion gene incorporates exons of both parental genes while keeping protein domains and domain boundaries intact. Using a methodology that treats discrete protein domains as binding sites for specific domains of partner proteins, we have cataloged the partner proteins of more than 29,000 fusion proteins. We developed the ChiPPI (Chimeric Protein-Protein Interactions) method, which compares the protein domains in fusion proteins to the domains present in both parental proteins.
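The domain comparison described above can be illustrated with a small sketch. It assumes a hypothetical table of domain-domain interactions (pairs of domain names assumed to bind) and a hypothetical map of partner proteins to their domains; all names below are made up. A partner counts as an interactor if it carries a domain that binds any domain of the query protein, and the fusion's interactors are compared with those of the two parental proteins to see which interactions are retained and which are lost. This is an illustration of the idea, not the ChiPPI implementation.

```python
from typing import Dict, Set, Tuple

# Hypothetical domain-domain interaction (DDI) table: unordered pairs of
# domain names that are assumed to bind each other.
DDI = Set[Tuple[str, str]]


def partners_by_domain(domains: Set[str], ddi: DDI,
                       partner_domains: Dict[str, Set[str]]) -> Set[str]:
    """Partner proteins carrying at least one domain that binds any of `domains`."""
    found = set()
    for partner, p_domains in partner_domains.items():
        if any((d, p) in ddi or (p, d) in ddi for d in domains for p in p_domains):
            found.add(partner)
    return found


def compare_fusion(fusion: Set[str], parent5: Set[str], parent3: Set[str],
                   ddi: DDI, partner_domains: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Split the parental interactors into those kept and those lost by the fusion."""
    parental = partners_by_domain(parent5 | parent3, ddi, partner_domains)
    chimeric = partners_by_domain(fusion, ddi, partner_domains)
    return {"retained": parental & chimeric, "lost": parental - chimeric}


# Toy data with made-up domain and protein names.
ddi = {("D1", "X1"), ("D2", "X2"), ("D3", "X3")}
partner_domains = {"P1": {"X1"}, "P2": {"X2"}, "P3": {"X3"}}
result = compare_fusion(fusion={"D1", "D3"}, parent5={"D1", "D2"},
                        parent3={"D3", "D4"}, ddi=ddi,
                        partner_domains=partner_domains)
print(result)
```

In this toy case the fusion keeps domains D1 and D3 but drops D2, so P1 and P3 are retained while P2 is lost.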

The Improved database of Chimeric Transcripts and RNA-Seq data – ChiTaRS-5.0

ChiTaRS-5.0 is a database of 111,582 chimeric transcripts in humans, mice, fruit flies, zebrafish, cows, rats, pigs, and yeast. It was developed by Dr. Milana Frenkel-Morgenstern and Dr. Alessandro Gorohovski in the Structural Biology and Biocomputing Programme at the Spanish National Cancer Research Centre (CNIO), Madrid, Spain, under the supervision of Prof. Alfonso Valencia.
In the current version, ChiTaRS-5.0, we extended the experimental evidence and added a novel type of chimeric transcript: sense-antisense chimeras of the same gene, confirmed experimentally by RT-PCR, qPCR, RNA sequencing and peptides detected by mass spectrometry. In addition, we collected 23,167 human cancer breakpoints across different cancer types. The database includes unique information correlating chimeric breakpoints with 3D chromatin contact maps generated from public chromosome conformation capture (Hi-C) datasets. In this update, we added curated information on druggable fusion targets matched with chimeric breakpoints, which is applicable to precision medicine in cancer. Another salient feature is a new section listing chimeric RNAs in various cell lines.
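Correlating a breakpoint with a 3D contact map starts by asking whether the two genomic loci joined by the breakpoint are in contact in Hi-C data. The sketch below shows only that lookup step, under stated assumptions: the contact map is a hypothetical Python dict keyed by pairs of (chromosome, bin) tuples, the bin resolution defaults to an assumed 100 kb, and the coordinates and contact value are invented for illustration. Real Hi-C matrices are usually distributed in dedicated file formats, and the ChiTaRS pipeline itself is not reproduced here.

```python
from typing import Dict, Tuple

Bin = Tuple[str, int]                  # (chromosome, bin index)
Contacts = Dict[Tuple[Bin, Bin], float]


def to_bin(chrom: str, pos: int, resolution: int) -> Bin:
    """Map a base-pair position to its Hi-C bin at the given resolution."""
    return (chrom, pos // resolution)


def breakpoint_contact(bp5: Tuple[str, int], bp3: Tuple[str, int],
                       contacts: Contacts, resolution: int = 100_000) -> float:
    """Contact value between the bins holding the two breakpoint ends.

    The contact matrix is symmetric, so both key orders are checked;
    0.0 is returned when the pair is absent from the map.
    """
    a = to_bin(bp5[0], bp5[1], resolution)
    b = to_bin(bp3[0], bp3[1], resolution)
    return contacts.get((a, b), contacts.get((b, a), 0.0))


# Illustrative, made-up contact map and breakpoint coordinates.
contacts: Contacts = {(("chr9", 1306), ("chr22", 232)): 12.0}
print(breakpoint_contact(("chr22", 23_250_000), ("chr9", 130_650_000), contacts))  # 12.0
```

Checking both key orders keeps the lookup independent of which partner gene is listed first in a breakpoint record.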
Finally, using text-mining techniques, novel chimeras in Alzheimer’s disease, schizophrenia, dyslexia and other diseases were collected in ChiTaRS. Thus, this improved version is an extensive catalogue of chimeras from multiple species. It extends our understanding of the evolution of chimeric transcripts in eukaryotes and contributes to the analysis of 3D genome conformational changes and the functional role of chimeras in the etiopathogenesis of cancers and other complex diseases.
Work on improving the ChiTaRS database is currently carried out at the Cancer Genomics and BioComputing Lab at Bar-Ilan University.

Identifying Fusion Proteins and their Interactions Using Text Mining

ProtFus is a resource of information about fusion proteins and their interactions, built with a text-mining approach. “Tagging” is the process of registering mentions of given entities in a document. In ProtFus, two or more types of entities, for example human fusion proteins and cancer, were tagged concurrently to find co-mentions. Suppose we are interested in the fusion protein BCR-ABL1 and want to find all of its mentions in the literature. BCR-ABL1 can be spelled in a variety of ways: BCR-ABL1, BCR/ABL1, bcr-abl1, bcr/abl1, bcr:abl1, BCR:ABL1, etc. We therefore developed ProtFus, a “tagger” that recognizes all these variants and finds their co-mentions. It uses natural language processing methodology for tagging interactions.
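The spelling-variant problem and sentence-level co-mentions described above can be sketched with regular expressions. This is a minimal illustration, not the ProtFus tagger, which uses natural language processing: the separator set (hyphen, slash, colon), the naive sentence splitter, and the short list of cancer terms are all simplifying assumptions made here.

```python
import re
from typing import List

# Hypothetical, deliberately short list of cancer terms for the co-mention check.
CANCER_TERMS = re.compile(
    r"\b(cancer|leuka?emia|tumou?r|carcinoma|lymphoma)\b", re.IGNORECASE)


def fusion_pattern(gene5: str, gene3: str) -> re.Pattern:
    """Match GENE5-GENE3, GENE5/GENE3 and GENE5:GENE3 in any letter case."""
    return re.compile(rf"\b{re.escape(gene5)}\s*[-/:]\s*{re.escape(gene3)}\b",
                      re.IGNORECASE)


def co_mentions(text: str, gene5: str, gene3: str) -> List[str]:
    """Sentences that mention both the fusion (any spelling) and a cancer term."""
    fusion = fusion_pattern(gene5, gene3)
    # Naive split: break after '.', '!' or '?' followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text)
    return [s for s in sentences if fusion.search(s) and CANCER_TERMS.search(s)]


text = ("The bcr/abl1 transcript drives chronic myeloid leukemia. "
        "BCR:ABL1 was detected in several cell lines. "
        "Imatinib targets BCR-ABL1 in cancer patients.")
for sentence in co_mentions(text, "BCR", "ABL1"):
    print(sentence)
```

Here all three sentences mention the fusion in some spelling, but only the first and third also mention a cancer term, so only those two are reported as co-mentions.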
Publicly available information from biomedical research is readily accessible through the Internet and is becoming a powerful resource for predicting protein-protein interactions and for protein docking. We plan to extend these methods in the future.

Scripts. The NBC algorithm for oncogene and tumor suppressor classification, the percolation model and other scripts are available upon request.

EMBOSS explorer, a graphical user interface to the EMBOSS suite of bioinformatics tools, is available at the Azrieli Faculty of Medicine and open for use.