Rhapsody

What is Rhapsody?

Rhapsody is a machine learning tool for predicting the impact of amino acid substitutions in proteins. It consists of a random forest classifier trained not only on traditional conservation properties, but also on structural and dynamical properties of the mutation site, localized on the protein's PDB structure, and coevolution properties, extracted from Pfam sequence alignments.
What kind of variants can Rhapsody analyze?

Rhapsody can provide predictions for Single Amino acid Variants (SAVs) in human proteins for which PDB structures are available.
Why only human SAVs?

Because Rhapsody derives sequence conservation properties from PolyPhen-2, which is designed to work only for human SAVs.
What are the accepted input formats?

Rhapsody only accepts SAVs in Uniprot coordinates, with the format:
<Uniprot ID> <position> <wild-type aa> <mutated aa>
For instance, mutation Q99R in human protein GTPase HRas can be queried by submitting the input string P01112 99 Q R or RASH_HUMAN 99 Q R.
We provide a Uniprot search tool to help with the identification of a sequence's unique accession number. When running an in silico saturation mutagenesis analysis, only the Uniprot sequence identifier (plus, optionally, a specific position) should be provided.
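As an illustration of the input format, a query string can be checked before submission with a few lines of Python. This is a minimal sketch, not part of Rhapsody: the function name `parse_sav` and the validation rules (20 standard amino acids, positive integer position, wild-type different from mutant) are assumptions made for this example.

```python
# Hypothetical helper, not part of Rhapsody: validates one SAV query string.
AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")  # one-letter codes of the 20 standard amino acids


def parse_sav(query):
    """Split '<Uniprot ID> <position> <wild-type aa> <mutated aa>' into typed fields."""
    fields = query.split()
    if len(fields) != 4:
        raise ValueError(f"expected 4 fields, got {len(fields)}: {query!r}")
    uniprot_id, position_str, wt_aa, mut_aa = fields
    try:
        position = int(position_str)
    except ValueError:
        raise ValueError(f"position must be an integer: {position_str!r}") from None
    if position < 1:
        raise ValueError(f"position must be 1 or greater: {position}")
    if wt_aa not in AMINO_ACIDS or mut_aa not in AMINO_ACIDS:
        raise ValueError(f"unknown amino acid code in {query!r}")
    if wt_aa == mut_aa:
        raise ValueError("wild-type and mutated amino acids are identical")
    return uniprot_id, position, wt_aa, mut_aa


print(parse_sav("P01112 99 Q R"))
print(parse_sav("RASH_HUMAN 99 Q R"))
```

Both the accession number and the entry name pass through unchanged as the first field, since the sketch does not try to resolve identifiers.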
What does "in silico saturation mutagenesis" mean?

It is a complete scan of all 19 possible amino acid substitutions at every position in a protein sequence. The result is a "saturation mutagenesis table" (see example) that contains predictions for individual mutations and also gives an overview of which parts of the sequence are predicted to be more (or less) sensitive to mutations.
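To make the size of such a scan concrete, the sketch below enumerates every substitution for a short toy sequence. It is only an illustration: the function `saturation_mutagenesis` is hypothetical, the five-residue sequence is a made-up example rather than a full protein, and Rhapsody generates these variants internally from the Uniprot identifier.

```python
# Hypothetical illustration of what a saturation mutagenesis scan enumerates.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def saturation_mutagenesis(uniprot_id, sequence, position=None):
    """List all 19 substitutions at every position, or at one 1-based position."""
    if position is None:
        positions = range(1, len(sequence) + 1)
    else:
        positions = [position]
    variants = []
    for pos in positions:
        wt_aa = sequence[pos - 1]
        for mut_aa in AMINO_ACIDS:
            if mut_aa != wt_aa:  # skip the wild-type residue itself
                variants.append(f"{uniprot_id} {pos} {wt_aa} {mut_aa}")
    return variants


toy_sequence = "MTEYK"  # toy sequence used only for this example
all_variants = saturation_mutagenesis("P01112", toy_sequence)
print(len(all_variants))  # 19 substitutions x 5 positions
print(saturation_mutagenesis("P01112", toy_sequence, position=2)[:3])
```

A sequence of N residues therefore yields 19 × N variants, which is why a full scan of a long protein takes longer than a handful of individual queries.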
What is a "batch query"?

A batch query lets you submit a list of individual variants from one or more protein sequences. The list must contain one variant per line, in Uniprot coordinates.
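A batch list can be sanity-checked before submission. The following sketch groups the variants by protein and reports the line number of any malformed entry; the function `read_batch` and the identifier `HYPOTHETICAL_ID` are placeholders invented for this example and are not part of Rhapsody.

```python
# Hypothetical pre-submission check for a batch list (one variant per line).
from collections import defaultdict


def read_batch(text):
    """Group 'ID position wt mut' lines by protein identifier, skipping blank lines."""
    by_protein = defaultdict(list)
    for line_no, line in enumerate(text.splitlines(), start=1):
        fields = line.split()
        if not fields:
            continue  # ignore empty lines
        if len(fields) != 4:
            raise ValueError(f"line {line_no}: expected 4 fields, got {len(fields)}")
        uniprot_id, position, wt_aa, mut_aa = fields
        by_protein[uniprot_id].append((int(position), wt_aa, mut_aa))
    return dict(by_protein)


batch = """\
P01112 99 Q R
P01112 12 G V
HYPOTHETICAL_ID 5 A T
"""
grouped = read_batch(batch)
for uniprot_id, variants in grouped.items():
    print(uniprot_id, variants)
```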
What if there is no PDB structure for a given protein?

Normally, when queried with a sequence, Rhapsody searches the Protein Data Bank for the "best" (i.e. the largest) structure available. If a structure is not found, the user can manually provide a custom protein structure, by either indicating a PDB code (for instance, of a homologous protein from another organism) or uploading a file in PDB format (e.g. downloaded from the SWISS-MODEL repository of homology models, see ROMK tutorial for an example). This option can also be used to run predictions on a particular protein structure or conformation (see HRAS tutorial for an example). Please note that Rhapsody will automatically align the Uniprot sequence to the PDB sequence and compute predictions only for matching amino acids: if the two sequences are too dissimilar, the resulting predictions might be too sparse.
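The effect of a partial match between the Uniprot and PDB sequences can be illustrated with a small sketch. It uses Python's standard `difflib.SequenceMatcher` as a stand-in for a real sequence alignment; this is an assumption made for the example, not the alignment method used by Rhapsody, and the function names and toy sequences are hypothetical.

```python
# Illustration only: difflib is a stand-in for a proper sequence alignment.
from difflib import SequenceMatcher


def map_matching_residues(uniprot_seq, pdb_seq):
    """Map 1-based Uniprot positions to 0-based PDB indices for identical residues."""
    matcher = SequenceMatcher(None, uniprot_seq, pdb_seq, autojunk=False)
    mapping = {}
    for block in matcher.get_matching_blocks():
        for offset in range(block.size):
            mapping[block.a + offset + 1] = block.b + offset
    return mapping


def coverage(uniprot_seq, pdb_seq):
    """Fraction of Uniprot positions that have a matching residue in the structure."""
    return len(map_matching_residues(uniprot_seq, pdb_seq)) / len(uniprot_seq)


uniprot_seq = "MTEYKLVVVG"  # toy sequence
pdb_seq = "TEYKLAVVG"       # toy structure sequence: first residue missing, one mismatch
mapping = map_matching_residues(uniprot_seq, pdb_seq)
print(sorted(mapping))
print(coverage(uniprot_seq, pdb_seq))
```

In this toy case positions 1 and 7 have no counterpart in the structure, so no structure-based prediction could be computed for them. The more the two sequences diverge, the more positions fall into such gaps.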
What does it mean to include "environmental effects"?

When computing structural and dynamical features from a PDB structure, by default Rhapsody considers only a single chain (the one with the highest sequence similarity to the given Uniprot sequence) and ignores any other chains present in the PDB file. In some cases, for instance multimers or other complexes, the other chains should not be ignored and those properties should be computed for the entire complex. This is done with a variant of the Elastic Network Model called "environmental ANM" (more precisely, a "sliced" model, see main publication and ROMK tutorial). In short, environmental effects should be included if the chain of interest is part of a "stable" complex (e.g. a multimer), so that its dynamical properties are influenced by the presence of other chains. Please be aware, however, that computing predictions on large complexes takes significantly longer.
What is the difference between "full" and "reduced" classifiers?

Both "full" and "reduced" classifiers are trained on sequence-, structure- and dynamics-based features. The main difference is that the "full" classifier also includes coevolutionary properties computed on Pfam multiple sequence alignments. If part of a sequence is not covered by a Pfam domain, predictions from the "reduced" classifier are returned instead.
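The fallback rule can be pictured as a per-position check against the Pfam domain boundaries. The sketch below is only a schematic: the function `choose_classifier` is hypothetical, and the domain range used in the example is made up rather than taken from Pfam.

```python
# Schematic of the fallback rule; the domain boundaries below are invented.
def choose_classifier(position, pfam_domains):
    """Return 'full' if the position lies inside any Pfam domain, else 'reduced'."""
    covered = any(start <= position <= end for start, end in pfam_domains)
    return "full" if covered else "reduced"


example_domains = [(5, 164)]  # hypothetical (start, end) range, 1-based and inclusive
for position in (3, 99, 170):
    print(position, choose_classifier(position, example_domains))
```

A single protein can therefore receive predictions from both classifiers, depending on which positions fall inside a Pfam domain.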
What is the "full+EVmutation" classifier?

The "full+EVmutation" classifier adds one more feature to those used for predictions: the "epistatic statistical energy difference of mutant", which is computed by EVmutation from a coevolution analysis of multiple sequence alignments. Although this feature has been shown to slightly improve prediction accuracy (see Rhapsody paper), it is not included by default, so that Rhapsody's predictions remain independent of those computed by EVmutation. EVmutation predictions alone are always displayed in the final results, alongside those from Rhapsody and PolyPhen-2.
What is displayed in the output files?
