Sander-Schneider curve
The Sander and Schneider curve indicates the minimum percentage of identical residues that two sequences of a given alignment length must share for safe homology modelling.
Only 20% of human protein sequences have a known protein structure. These structures are solved by either X-ray crystallography or NMR and are stored in a worldwide database called the Protein Data Bank (PDB). Another 30% of human proteins have a structure that is similar to one already stored in the PDB, and therefore we can build a homology model for these proteins.
You can imagine that we cannot build a model using just a random structure from the database: the two proteins should be homologous. When is it safe to assume homology between two proteins? This depends on the number of identical residues in the alignment of the two sequences and on the length of this alignment. Consider the following example:
Q:  97 VPSQKTYQGSYGFRLGFLHSGTAKSVTCTYSPALNKMFCQLAKTCPVQLWVDSTPPPGTR 156
       +PS   Y G + F + F  S TAKS T TYSP L K++CQ+AKTCP+Q+ V + PPPGT
S:  13 IPSNTDYPGPHHFEVTFQQSSTAKSATWTYSPLLKKLYCQIAKTCPIQIKVSTPPPPGTA 72

Q: 157 VRAMAIYKQSQHMTEVVRRCPHHERCSD-SDG-LAPPQHLIRVEGNLRVEYLDDRNTFRH 214
       +RAM +YK+++H+T+VV+RCP+HE   D ++G  AP  HLIRVEGN   +Y+DD  T R
S:  73 IRAMPVYKKAEHVTDVVKRCPNHELGRDFNEGQSAPASHLIRVEGNNLSQYVDDPVTGRQ 132

Q: 215 SVVVPYEPPEVGSDCTTIHYNYMCNSSCMGGMNRRPILTIITLEDSSGNLLGRNSFEVRV 274
       SVVVPYEPP+VG++ TTI YN+MCNSSC+GGMNRRPIL IITLE   G +LGR SFE R+
S: 133 SVVVPYEPPQVGTEFTTILYNFMCNSSCVGGMNRRPILIIITLEMRDGQVLGRRSFEGRI 192

Q: 275 CACPGRDRRTEEENLRKK 292
       CACPGRDR+ +E++ R++
S: 193 CACPGRDRKADEDHYREQ 210
In this example, the two protein sequences Q and S are not exactly the same, but they do share many identical residues (117 of the 198 aligned positions, which is 59%). In this case it is easy to see that the two proteins will probably adopt the same conformation, with some differences at certain positions (usually in surface loops). In a modelling experiment it is therefore possible to use the protein structure of S as a template to build a model for protein Q.
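The identity percentage quoted above can be checked with a few lines of Python. This is a minimal sketch, not a standard tool: the function name `count_identities` is hypothetical, and it assumes the convention that matches the 59% figure, namely that the denominator is the full alignment length including gap columns.

```python
def count_identities(query: str, subject: str) -> tuple:
    """Return (identical positions, alignment length) for two aligned rows.

    Both rows must already be aligned and of equal length; '-' marks a gap.
    Assumption: gap columns count towards the alignment length.
    """
    if len(query) != len(subject):
        raise ValueError("aligned sequences must have the same length")
    identical = sum(1 for q, s in zip(query, subject) if q == s and q != "-")
    return identical, len(query)


# Last block of the example alignment (Q 275-292 against S 193-210).
q_block = "CACPGRDRRTEEENLRKK"
s_block = "CACPGRDRKADEDHYREQ"

identical, length = count_identities(q_block, s_block)
percent = 100.0 * identical / length
print(f"{identical} identical out of {length} positions ({percent:.1f}%)")
```

Applying the same count to all four blocks of the alignment gives the 117 identical positions out of 198 mentioned in the text.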
The question of whether two proteins are homologous becomes difficult to answer when we enter the zone of 20-35% sequence identity. In 1991, Sander and Schneider studied all the protein structures that were stored in the PDB at that time. They found that homology between two proteins depends on the length of the sequence alignment between the two proteins and on the number of identical residues in it. They suggested a formula that can be used to calculate the minimal percentage of identical residues required to build a model for a sequence of a certain length [1]. Later, Rost repeated this analysis and slightly adapted the formula, so it became:
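For reference, a commonly quoted form of Rost's curve is sketched here. The symbols $L$ (alignment length in residues) and $p_I$ (threshold percentage of identical residues) are notation chosen for this sketch, and the constants are given as recalled from Rost's 1999 paper, so treat it as an illustration and check it against the publication before relying on it.

```latex
p_I(L) =
\begin{cases}
100 & L \le 11 \\
480 \cdot L^{-0.32\,\left(1 + e^{-L/1000}\right)} & 11 < L \le 450 \\
19.5 & L > 450
\end{cases}
```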
The formula might be easier to understand when we plot it in the following picture.
When the length and the percentage of identical residues fall in the safe zone (happy smiley), we can assume that the two proteins have a similar structure and that we can build a homology model. When the identity and length of the alignment fall in the zone below the threshold (sad smiley), modelling becomes a challenge. In the plot you can also see that the percentage of identical residues needed to safely build a homology model decreases when the length of the alignment increases. However, the threshold does not reach 25%. In the zone below the threshold it might still be possible to build a model, but finding the correct template will be difficult and other techniques might be needed.
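The safe-zone check described above can be sketched in Python. The threshold function below assumes the form of the curve published by Rost (1999), with constants as recalled from that paper; the function names `identity_threshold` and `in_safe_zone` are hypothetical. Treat it as an illustration of the idea, not as a reference implementation.

```python
import math


def identity_threshold(length: int) -> float:
    """Minimal percentage identity for an alignment of `length` residues.

    Assumption: the piecewise form attributed to Rost (1999).
    """
    if length <= 0:
        raise ValueError("alignment length must be positive")
    if length <= 11:
        return 100.0  # very short alignments need full identity
    if length > 450:
        return 19.5   # the curve is treated as flat for long alignments
    exponent = -0.32 * (1.0 + math.exp(-length / 1000.0))
    return 480.0 * length ** exponent


def in_safe_zone(percent_identity: float, length: int) -> bool:
    """True when the alignment lies on or above the threshold curve."""
    return percent_identity >= identity_threshold(length)


# The example alignment: 59% identity over 198 aligned positions.
print(in_safe_zone(59.0, 198))  # True
# A short alignment with low identity falls below the curve.
print(in_safe_zone(20.0, 60))   # False
```

An alignment that falls below the curve is not necessarily useless. In line with the text, it only means that sequence identity alone is not enough to trust the template.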