International Course on Text-Mining in Cancer Research

This international course was sponsored by the Danish Agency of Science as part of the Israeli-Danish collaboration between Dr. Frenkel-Morgenstern and Prof. Jensen (2015). We brought scientists at the cutting edge of text mining and bioinformatics to teach students during the four-day course, with accommodation, meals and transportation fully supported. The course generated fruitful interaction between the international students and scientists.

Bioinformatics Unit

The Bioinformatics Center comprises an interdisciplinary team of researchers and research students in the fields of biology, computer science, statistics, mathematics and engineering. The team works together to develop methods and software tools both for analyzing biological data and for studying biological processes. Recent advances in RNA and DNA sequencing technologies make whole-genome and whole-transcriptome sequencing available to all interested research laboratories at an affordable price.

ChiPPI Webserver and Database

Fusion proteins, comprising peptides derived from the translation of two parental genes, are produced in cancer by chromosomal rearrangements or aberrations. The expressed fusion gene incorporates exons of both parental genes while maintaining intact protein domains and domain boundaries. Using a methodology that treats discrete protein domains as binding sites for specific domains of partner proteins, we have cataloged the partner proteins for more than 29,000 fusion proteins. We have developed the ChiPPI (Chimeric Protein-Protein Interactions) method, which compares the protein domains in fusion proteins to the domains present in both parental proteins.
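The domain comparison described above can be illustrated with a minimal sketch. It is not the ChiPPI implementation: the function name `compare_domains`, the report keys and the domain labels (`D1` to `D5`) are hypothetical placeholders, and the sketch only assumes that each protein is represented as a list of domain identifiers.

```python
def compare_domains(fusion_domains, parent5_domains, parent3_domains):
    """Report which parental domains are retained or lost in a fusion protein.

    All arguments are lists of domain identifiers (hypothetical labels here).
    """
    fusion = set(fusion_domains)
    report = {}
    for label, parent in (("5p_parent", parent5_domains),
                          ("3p_parent", parent3_domains)):
        parent_set = set(parent)
        report[label] = {
            # Domains present in both the parent and the fusion protein
            "retained": sorted(parent_set & fusion),
            # Domains present in the parent but missing from the fusion protein
            "lost": sorted(parent_set - fusion),
        }
    return report


# Illustrative placeholder data, not taken from any real fusion protein
example = compare_domains(
    fusion_domains=["D1", "D2", "D4"],
    parent5_domains=["D1", "D2", "D3"],
    parent3_domains=["D4", "D5"],
)
print(example)
```

Under the assumption that a lost domain also removes the interactions it mediates, the "lost" lists would point to partner proteins that the fusion can no longer bind, while the "retained" lists would point to interactions it may preserve.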

The ChiTaRS-3.1 database of the Chimeric Transcripts and RNA-seq Data

The ChiTaRS database (ChiTaRS-3.1) collates about 40,000 chimeric transcripts in humans, mice, fruit flies, zebrafish, cows, rats, pigs and yeast. In the current version we extended the experimental evidence and included a novel type of sense-antisense chimeric transcript of the same gene, confirmed experimentally by RT-PCR, qPCR, RNA-sequencing and mass-spectrometry peptides. In addition, we collected about 11,700 human cancer breakpoints, together with the expression levels of chimeric RNAs confirmed by paired-end RNA-sequencing experiments in different tissues in humans, mice and fruit flies.

Identifying Fusion Proteins and their Interactions Using Text Mining

ProtFus is a resource that holds information about fusion proteins and their interactions, gathered by text mining. “Tagging” is the process of registering each mention of a given entity in a document. In ProtFus, two or more types of entity, for example human fusion proteins and cancer, are tagged concurrently to find co-mentions. Suppose we are interested in the fusion protein BCR-ABL1 and want to find all of its mentions in the literature. BCR-ABL1 can be spelled in a variety of ways: BCR-ABL1, BCR/ABL1, bcr-abl1, bcr/abl1, bcr:abl1, BCR:ABL1, etc. We therefore developed ProtFus, a “tagger” that identifies all of these spelling variants and their co-mentions. It uses natural language processing methods to tag interactions.
Publicly available information from biomedical research is thus readily accessible through the Internet and is becoming a powerful resource for predicting protein-protein interactions and protein docking. We plan to extend these methods in the future.
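The spelling-variant and co-mention problem described for BCR-ABL1 can be sketched with a regular expression. This is only an illustration under stated assumptions, not the ProtFus tagger itself: the function names `fusion_pattern` and `tag_co_mentions`, the separator set (hyphen, slash, colon) and the sample sentences are all hypothetical.

```python
import re


def fusion_pattern(gene_a, gene_b):
    """Build a case-insensitive regex for common spellings of a fusion name."""
    # Assumed separator set, taken from the variants listed in the text
    separator = r"[-/:]"
    return re.compile(
        r"\b" + re.escape(gene_a) + separator + re.escape(gene_b) + r"\b",
        re.IGNORECASE,
    )


def tag_co_mentions(sentences, gene_a, gene_b, disease_terms):
    """Return sentences that mention both the fusion and a disease term."""
    fusion = fusion_pattern(gene_a, gene_b)
    disease = re.compile(
        r"\b(" + "|".join(re.escape(t) for t in disease_terms) + r")\b",
        re.IGNORECASE,
    )
    hits = []
    for sentence in sentences:
        fusion_mentions = fusion.findall(sentence)
        disease_mentions = disease.findall(sentence)
        # Keep the sentence only when both entity types are mentioned together
        if fusion_mentions and disease_mentions:
            hits.append((sentence, fusion_mentions, disease_mentions))
    return hits


# Hypothetical sample sentences for illustration only
sentences = [
    "The BCR/ABL1 fusion drives chronic myeloid leukemia.",
    "bcr:abl1 was detected in the sample.",
    "Leukemia is a cancer of the blood.",
]
for hit in tag_co_mentions(sentences, "BCR", "ABL1", ["leukemia", "cancer"]):
    print(hit)
```

A sentence-level co-mention is the simplest possible unit here; a real tagger would likely also need a dictionary of gene synonyms and handling for mentions that span sentence boundaries.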

The Database of Protein-protein interActions of Stress-response genes in subTerranean and fossORial AnimaLs (PASTORAL)

PASTORAL is a unique database developed to catalog and identify protein-protein interactions (PPI) of stress-response genes in subterranean and fossorial animals, with Nannospalax galili as a model organism.
In addition, PASTORAL can be used to search for relevant stress-response genes and their roles in specific environmental conditions, and to identify their corresponding protein-protein interactions, orthologs, codon usage preferences and network-related features based on protein-protein interactions.

Annotator PPI Webserver and Database

The AnnotatorPPI server is a webserver that automatically annotates uploaded FASTA sequences of the clones of interest and builds a protein-protein interaction network for every sequence provided. Once “Bulk Annotation” is done, each entry is associated with a gene sequence, a detailed functional annotation, and links to the Ensembl, Entrez, GeneCards, InterPro and UniProt databases for that gene.

Scripts. The NBC algorithm for Oncogene and Tumor Suppressor Classification, the percolation model and other scripts are available upon request.

EMBOSS explorer is a graphical user interface to the EMBOSS suite of bioinformatics tools. This resource is available at the Azrieli Faculty of Medicine and is open for use.