In summary, HOPE collects structural information from a series of sources, including calculations on the 3D protein structure, sequence annotations in UniProt and predictions from the Reprof software. HOPE combines this information to analyse the effect of a certain mutation on the protein structure. HOPE is an online web service where the user can submit a sequence and a mutation. It shows the effect of that mutation in such a way that even those without a bioinformatics background can understand it. A more detailed description of the system is given below.
HOPE is available at https://www.cmbi.umcn.nl/hope. The web server allows the user to submit a protein sequence (in FASTA format or as plain text) or an accession code of the protein of interest. In the next step the user indicates the residue to be mutated with a simple mouse click. In the final step the user clicks on one of the other 19 amino acid types, which will become the mutant residue.
The Basic Local Alignment Search Tool (BLAST) is a well-known bioinformatics method to compare sequences, described in 1990 by Altschul et al. The BLAST algorithm is often used to search sequence databases for homologous sequences. HOPE uses BLAST to search with the submitted sequence in both the UniProt database and the PDB. The MRS web service is used to perform the BLAST search, with the low-complexity filter switched off.
- BLAST against UniProt
- This BLAST search will result in the UniProt entry of the protein of interest. Every known protein has a UniProt entry identified by an accession code. We save this accession code to be able to access other servers/services later in the process.
- BLAST against PDB
- We perform this second BLAST search to find the 3D structure of the protein of interest, or a possible template for homology modelling. If the hit has 100% sequence identity with the submitted sequence, we have found a solved 3D structure of the protein, determined by either NMR or X-ray crystallography. This file can simply be downloaded and used for further analysis. Sometimes a sequence does not have a 100% identical match, which means that no structure has been solved for that protein. However, the structure of a homologous protein might have been solved. When the sequence identity between the submitted sequence and the PDB file exceeds the homology modelling threshold, HOPE will build a model. When neither a structure nor a modelling template is available, we have to rely on other sources of information.
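The three outcomes of the PDB search can be sketched as a small decision function. This is only an illustration of the logic described above, not HOPE's actual code: the names `PdbHit` and `choose_structure_source` are hypothetical, and the value of `MODELLING_THRESHOLD` is an assumed placeholder, because the text does not state the threshold HOPE uses.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed placeholder: the text mentions a "homology modelling threshold"
# but does not give its value.
MODELLING_THRESHOLD = 30.0  # percent sequence identity


@dataclass
class PdbHit:
    """Best BLAST hit against the PDB (hypothetical container)."""
    pdb_id: str
    identity: float  # percent identity with the submitted sequence


def choose_structure_source(best_hit: Optional[PdbHit],
                            threshold: float = MODELLING_THRESHOLD) -> str:
    """Classify the best PDB hit into one of the three cases in the text."""
    if best_hit is None:
        return "no_structure"        # rely on UniProt and predictions
    if best_hit.identity >= 100.0:
        return "solved_structure"    # download the PDB file and use it
    if best_hit.identity > threshold:
        return "homology_model"      # use the hit as a modelling template
    return "no_structure"            # hit too distant to model on
```

For example, under the assumed threshold, `choose_structure_source(PdbHit("hit", 62.5))` returns `"homology_model"`, while passing `None` returns `"no_structure"`.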
The program Yasara, developed by E. Krieger, is used to build a homology model when possible. This program has an automatic modelling script that obtained very good results in the CASP8 competition. The script only needs the sequence of the protein of interest and builds a model fully automatically. This model is stored as a PDB file.
Information from the 3D Structure
The protein structure of interest (downloaded or built by Yasara) is analyzed by a series of WHAT IF web services. (Reference.) These services can, for example, calculate residue accessibility, secondary structure, ligand contacts, metal contacts, ionic interactions, disulfide bonds and hydrogen bonds. This information is collected and stored in HOPE's database.
Information from UniProt
We access the UniProt XML file of the protein of interest using the accession code as identifier. In this file we find the sequence features: structural information that can be assigned to one or more residues. A detailed explanation of all these features can be found in the UniProt manual. Every feature is assigned to a certain location, and some features carry an additional remark, which is also stored in HOPE. A few examples of these features are: active sites, domains, motifs, regions, glycosylation sites, metal contacts, DNA contacts, transmembrane domains, variations and mutagenesis sites. The following figure shows part of UniProt's sequence features for the entry EHMT1_human.
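As an illustration of how such features can be read from a UniProt XML file, the sketch below lists the features that cover a given residue. It is not HOPE's own code: the function name `features_at` is hypothetical, the embedded `SAMPLE_XML` (including the accession `P00000`) is a made-up fragment that only mimics the layout of a UniProt entry, and the namespace URI is an assumption that should be checked against the downloaded file.

```python
import xml.etree.ElementTree as ET

# Assumed namespace of UniProt XML entries; verify against a real file.
NS = {"up": "http://uniprot.org/uniprot"}

# Made-up sample that mimics the feature layout of a UniProt XML entry.
SAMPLE_XML = """<uniprot xmlns="http://uniprot.org/uniprot">
  <entry>
    <accession>P00000</accession>
    <feature type="active site" description="Proton acceptor">
      <location><position position="57"/></location>
    </feature>
    <feature type="transmembrane region" description="Helical">
      <location><begin position="10"/><end position="30"/></location>
    </feature>
  </entry>
</uniprot>"""


def features_at(xml_text, residue):
    """Return (type, description) for every feature covering `residue`."""
    root = ET.fromstring(xml_text)
    found = []
    for feature in root.iterfind(".//up:feature", NS):
        location = feature.find("up:location", NS)
        if location is None:
            continue
        single = location.find("up:position", NS)
        if single is not None:
            # Feature assigned to exactly one residue.
            start = end = int(single.get("position"))
        else:
            # Feature assigned to a range of residues.
            begin = location.find("up:begin", NS)
            finish = location.find("up:end", NS)
            if begin is None or finish is None:
                continue
            if begin.get("position") is None or finish.get("position") is None:
                continue  # skip ranges without explicit positions
            start, end = int(begin.get("position")), int(finish.get("position"))
        if start <= residue <= end:
            found.append((feature.get("type"), feature.get("description", "")))
    return found
```

With the sample above, `features_at(SAMPLE_XML, 57)` returns `[("active site", "Proton acceptor")]` and `features_at(SAMPLE_XML, 5)` returns an empty list.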
Information from Reprof
Reprof is a secondary structure prediction program. HOPE uses it to acquire information about both secondary structure and solvent accessibility. It is important to keep in mind that this information is obtained by prediction. HOPE will only use it when no other information source is available. Of course, other servers/services can be added in the future.
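The fallback between information sources can be sketched as a simple priority lookup: values calculated from the 3D structure come first, then UniProt annotations, then Reprof predictions. The names `SOURCE_PRIORITY` and `most_reliable` and the dictionary layout are hypothetical and chosen for illustration only.

```python
from typing import Any, Dict, Optional, Tuple

# Sources ordered from most to least reliable.
SOURCE_PRIORITY = ("structure", "uniprot", "reprof")


def most_reliable(values: Dict[str, Any]) -> Optional[Tuple[str, Any]]:
    """Return (source, value) from the most reliable source that has a value.

    `values` maps a source name to the value it reported for one property
    of one residue, or to None when that source has nothing to say.
    """
    for source in SOURCE_PRIORITY:
        value = values.get(source)
        if value is not None:
            return source, value
    return None  # no source provided this piece of information
```

For a residue without a structure or annotation, `most_reliable({"structure": None, "reprof": "helix"})` returns `("reprof", "helix")`. As soon as a structure-derived value is present, it takes precedence over the prediction.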
In this step, all collected information is combined into a report by means of a decision tree. The tree makes sure that the most reliable source of information is used: information from the "real protein structure" comes first, followed by annotated information in UniProt, followed by predictions by Reprof. The tree is divided into 5 branches that correspond to the paragraphs that appear in the report.
- Contacts like metal, DNA, hydrogen bonds, ionic interactions, etc. The tree checks whether the mutation affects an important contact;
- Structural locations including motifs, domains, transmembrane domains, etc. The tree checks whether the mutation is located in a part of the protein with an assigned name and corresponding function;
- Non-structural features like post-translational modifications. These are the features that do not directly affect the structure of the protein but that can affect its function, for example glycosylation sites;
- Variations, containing information about known variants at that position. The tree checks whether a variation (a SNP, mutagenesis site, splice variant, etc.) is known at this position;
- Amino acid properties consisting of size, charge and hydrophobicity. These features are used to determine the effect of the mutation in the other branches of the tree. Only when all other branches of the tree have been evaluated without yielding an answer will the tree fall back on the differences between the amino acid types themselves.
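The comparison of amino acid properties in the last branch can be illustrated with a small lookup table. The property scales below are assumptions made for this sketch, not necessarily the ones HOPE uses: size is approximated by the number of non-hydrogen side-chain atoms, charge is the formal charge at neutral pH with histidine treated as neutral, and hydrophobicity is taken from the Kyte-Doolittle hydropathy scale. The names `PROPERTIES` and `compare_residues` are hypothetical.

```python
# One-letter code -> (side-chain heavy atoms, charge, Kyte-Doolittle hydropathy)
PROPERTIES = {
    "G": (0, 0, -0.4), "A": (1, 0, 1.8), "S": (2, 0, -0.8), "C": (2, 0, 2.5),
    "V": (3, 0, 4.2), "T": (3, 0, -0.7), "P": (3, 0, -1.6), "D": (4, -1, -3.5),
    "N": (4, 0, -3.5), "I": (4, 0, 4.5), "L": (4, 0, 3.8), "M": (4, 0, 1.9),
    "E": (5, -1, -3.5), "Q": (5, 0, -3.5), "K": (5, 1, -3.9), "H": (6, 0, -3.2),
    "F": (7, 0, 2.8), "R": (7, 1, -4.5), "Y": (8, 0, -1.3), "W": (10, 0, -0.9),
}


def compare_residues(wild_type, mutant):
    """Return mutant-minus-wild-type differences in size, charge and hydropathy."""
    wt_size, wt_charge, wt_hydro = PROPERTIES[wild_type]
    mt_size, mt_charge, mt_hydro = PROPERTIES[mutant]
    return {
        "size": mt_size - wt_size,                 # > 0: mutant is bigger
        "charge": mt_charge - wt_charge,           # != 0: charge changes
        "hydropathy": round(mt_hydro - wt_hydro, 1),  # > 0: more hydrophobic
    }
```

For an arginine-to-tryptophan mutation, `compare_residues("R", "W")` returns `{"size": 3, "charge": -1, "hydropathy": 3.6}`: the mutant residue is bigger, the positive charge is lost and the side chain becomes more hydrophobic.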
The report is divided into parts that correspond to the branches of the decision tree. It is built from small pieces of text that combine into a complete story. The report shows whether a structure was known, a model was built or predictions were made. This is followed by the effect of the mutation, illustrated by figures (in case a structure is available) and animations. These figures are made using a Yasara macro, with POV-Ray for rendering. Difficult keywords in the report are linked to our own online Wikipedia.