When a protein molecule binds with another biological polymer (protein or nucleic acid) to form a complex, the interface residues that account for most of the binding free energy are called binding hot spots. The KFC2 Server provides a user-friendly, web-based tool for predicting protein binding hot spots based on machine learning approaches. For each residue within the binding interface, the KFC2 Server characterizes its local structural environment and compares it to known environments of experimentally determined hot spots. A prediction is then made as to whether or not the residue is a hot spot. After the computational analysis is complete, the results may be visualized using an interactive job viewer. In addition to standard molecular viewing functionality, the job viewer allows the user to quickly highlight predicted hot spots and surrounding structural features. Two different machine learning methods are implemented on the KFC2 Server.
With KFC2, two separate SVM models are implemented. These are referred to as KFC2a and KFC2b. KFC2a offers higher sensitivity and accuracy, but lower specificity than KFC2b. The user may examine both model scores interactively with the KFC Viewer as described below. The results may also be retrieved in tabular form. Depending on how the predictions are to be used, one model may be preferred over the other. If the user wants to have the highest degree of confidence that the predicted hotspots are truly hotspots, then the KFC2b model should be used. On the other hand, if it is important to not overlook any possible hotspots, then KFC2a should be used.
The citations given below provide a complete discussion of the development and performance of the KFC and KFC2 methods:
Please cite the appropriate article in any work that uses the KFC/KFC2 Server:
Users can register prior to submitting jobs to any of the tools hosted by the Mitchell Lab or submit jobs anonymously. Personal information is only used to contact users when their analysis is complete; it will not be shared. To register, enter a unique user name and email address on the registration page, then click the submit button. An error message will display if the selected user name is in use by another user.
Once registered, users may log in to the server. Although login is not required to submit jobs, it allows a user to view their personal jobs in the job viewer. Both the username and password are case sensitive. By default, a login will expire after two weeks; however, a user may also log off manually.
A protein binding interface is the region between two or more polymer chains where the atoms from the different chains interact strongly enough to form a stable complex. A valid PDB file submitted to the KFC2 Server must contain at least two separate polymer chains in order to contain an interface. In the case of homodimers, the chemical composition of the chains will be identical, but the PDB file should still contain two unique chain label identifiers. In some cases, the PDB file downloaded from the Protein Data Bank may require the application of symmetry operators in order to generate the biologically significant interface. Beware that, although the Protein Data Bank does offer the capability of downloading a PDB file containing the biologically significant interface, the file should be checked for proper chain labels, and edited if necessary, before it is run through KFC2. The current version of these generated files from the Protein Data Bank does not use unique chain labels to distinguish the interface chains, but rather separates the chain groups using MODEL and ENDMDL keywords.
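One way to edit such a file by hand is sketched below in Python. This is only an illustration, not part of the KFC2 Server: the function name `relabel_models` is hypothetical, and it assumes a well-formed PDB file in which the chain label sits in column 22. It keeps the labels of the first MODEL block, gives every chain in later MODEL blocks a label that is not yet in use, and drops the MODEL and ENDMDL lines.

```python
import string

# Labels to draw from: letters (upper- and lowercase) and digits 0-9.
LABELS = string.ascii_uppercase + string.ascii_lowercase + string.digits
# Record types that carry a chain label in column 22 (index 21).
RECORDS = ("ATOM", "HETATM", "ANISOU", "TER")


def relabel_models(pdb_lines):
    """Merge MODEL blocks into one structure with unique chain labels.

    Hypothetical helper: labels in the first model are kept, and each
    chain in a later model receives a label not used anywhere else.
    """
    used = {l[21] for l in pdb_lines if l.startswith(RECORDS) and len(l) > 21}
    free = [c for c in LABELS if c not in used]
    mapping = {}  # (model number, original label) -> new label
    model = 0
    out = []
    for line in pdb_lines:
        if line.startswith("MODEL"):
            model += 1
            continue  # drop the MODEL keyword line
        if line.startswith("ENDMDL"):
            continue  # drop the ENDMDL keyword line
        if line.startswith(RECORDS) and len(line) > 21 and model > 1:
            key = (model, line[21])
            if key not in mapping:
                if not free:
                    raise ValueError("ran out of single-character chain labels")
                mapping[key] = free.pop(0)
            line = line[:21] + mapping[key] + line[22:]
        out.append(line)
    return out
```

The relabeled file should still be inspected in a molecular viewer before submission, since this sketch does not check that the chains actually form the intended interface.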
In the job submission form, two chain sets may be specified. Chain labels are case sensitive; however, if there are no lowercase chain labels in the PDB file, lowercase input will automatically be converted to uppercase. Valid chain labels may be letters, uppercase or lowercase, or digits 0 through 9. Although it is recommended that you specify the chain labels, if none are specified in the form, the KFC Server will attempt to assign the interface chain sets automatically.
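The label rules above can be illustrated with a short Python sketch. It is not the server's code: the function name `normalize_chain_labels` and its arguments are hypothetical, and it only mirrors the stated rules (single letters or digits, uppercasing only when the file has no lowercase labels).

```python
def normalize_chain_labels(requested, labels_in_file):
    """Validate requested chain labels and apply the case rule.

    requested      -- labels typed by the user, e.g. ["h", "x"]
    labels_in_file -- labels that occur in the PDB file, e.g. {"H", "X"}
    """
    # Lowercase input is only meaningful if the file uses lowercase labels.
    file_has_lowercase = any(c.islower() for c in labels_in_file)
    result = []
    for label in requested:
        # A valid label is a single ASCII letter or a digit 0 through 9.
        if len(label) != 1 or not (label.isascii() and label.isalnum()):
            raise ValueError(f"invalid chain label: {label!r}")
        if not file_has_lowercase:
            label = label.upper()
        result.append(label)
    return result
```

For example, with a file that contains only chains H, L, and X, the input `["h", "x"]` would be treated as `["H", "X"]`, whereas the same input is left unchanged if the file also contains a chain labeled `h`.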
The automatic chain selection works as follows: If a TER record is found in the PDB file, all ATOM records before the TER are used for Chain Set 1, and all ATOM records after the TER are used for Chain Set 2. If a chain in the PDB file does not have a chain label (a space character in column 22), a unique chain label will be assigned to that chain (as defined by the location of the TER records). Also, if the same chain label is used before and after the first TER record, the second chain will be given the lowercase label of the first chain. If a chain-separating TER record is not found in the PDB file, then the automatic chain selection will be based on the chain labels. All ATOM records with the first chain label will be used for Chain Set 1, and all ATOM records after this chain will be used for Chain Set 2. In this mode, if the file contains additional chains after the first chain that have the same chain label as the first chain, these chains will be assigned different unique chain labels. Beware that TER records are also used to mark breaks within a single chain in regions where the structure is undetermined. The presence of such chain breaks may lead to confusing results.
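The splitting rule can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the server's implementation: the function name `auto_chain_sets` is hypothetical, a TER record is treated as chain separating only if ATOM records occur both before and after it, and the relabeling of blank or duplicate chain labels is left out.

```python
def auto_chain_sets(pdb_lines):
    """Split ATOM records into (chain_set_1, chain_set_2).

    Simplified sketch: split at the first TER record that has ATOM
    records on both sides; otherwise split on the first chain label.
    """
    before, after = [], []
    seen_ter = False
    for line in pdb_lines:
        if line.startswith("TER") and before:
            seen_ter = True
        elif line.startswith("ATOM"):
            (after if seen_ter else before).append(line)
    if after:
        # A chain-separating TER was found.
        return before, after
    # Fallback: the leading run of the first chain label (column 22,
    # index 21) is Chain Set 1; everything after it is Chain Set 2.
    atoms = before
    if not atoms:
        return [], []
    first = atoms[0][21]
    split = next((i for i, a in enumerate(atoms) if a[21] != first), len(atoms))
    return atoms[:split], atoms[split:]
```

Because a TER record inside a single chain also triggers the first branch, a file with chain breaks would be split at the break in this sketch, which matches the caution given above.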
Note that the Jmol viewer, when auto-selecting the color of a chain based on chain label, uses the same color for uppercase and lowercase labels. Also, chain labels with the digits 0-9 will use the same colors as chains Q-Z. You may have to rename the chains in order to obtain unique colors for the chains in the displayed molecule. By using the Viewer to toggle a given chain on or off, it is relatively simple to locate specific chains by label. Also, when you hover over a chain, an atom label will pop up which includes the chain label. A complete summary of the chain sets used in a job may be viewed as described below in the Examining Results section.
Note that the KFC Server does not predict interfaces; it analyzes given interfaces for hotspots.
If you are trying to predict potential interface residues for a
single protein chain, we highly recommend that you look into
or one of the many websites mentioned in this
Note that model structures containing many clashes (unnaturally close contacts between atoms) may cause the number of hot spots to be vastly overestimated. Please remove such clashes from your PDB file before submitting it to the server. In addition, PDB files which contain "iCodes" in column 27 of the ATOM records (immediately after the residue number) will not be properly handled by the KFC Server. If your PDB file contains any iCodes, please uniquely renumber the residues and eliminate the iCodes before submitting the file.
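A quick way to check a file for iCodes before submission is sketched below in Python. The function name `find_icodes` is hypothetical and the sketch assumes fixed-column PDB formatting; it only reports the affected records and does not renumber anything.

```python
def find_icodes(pdb_lines):
    """Return (line number, chain label, residue number + iCode) for
    every ATOM record with a non-blank character in column 27."""
    hits = []
    for number, line in enumerate(pdb_lines, start=1):
        # Column 27 is index 26; columns 23-26 hold the residue number.
        if line.startswith("ATOM") and len(line) > 26 and line[26] != " ":
            hits.append((number, line[21], line[22:27].strip()))
    return hits
```

An empty result means no ATOM record carries an iCode; any entries returned point to the residues that need to be renumbered.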
Finally, the original KFC model is able to analyze structures containing proteins and DNA/RNA but not other types of molecules. Presently, nucleic acid chains are not handled with the new KFC2 method. Until work is completed to add this capability, the current KFC2 Server will automatically switch to using the original KFC model for cases when interface nucleic acid chains are selected.
To analyze an interface, enter the following information on the submission page and click the Submit button:
The job queue displays the current status for each submitted job (Queued, Active, View Results, or Error), and provides links to KFC input and output files. After processing begins, a typical KFC analysis finishes within two minutes. When the task is complete, an email is optionally sent to you with a link to your KFC hot spot predictions or an error message. If the job finishes successfully, the status field will contain a link to the interactive job viewer.
For jobs run under the original KFC server, the following error codes were used:
Using the Job Queue display, you may access KFC input, output, and error files by clicking on a job’s identification number. This is the number in the very first column of the job queue listing. Clicking on the number will bring up a list of the files which you may examine or download. The Chains link on this file list page displays how the interface chain sets were defined, along with the Jmol color of each chain. Depending on your web browser, this page will be displayed either in a new browser window or in a separate tab. Also available from the file list is a file whose name ends with the extension .results. This file contains the numerical results, as well as the hot spot classification, for the residues determined to be part of the interface. The results files generated by the KFC2 Server from the PDB file for 1DVA, for the interface between chains H and X, are given below for both the original KFC and KFC2 methods.
The job viewer has two major components: a molecular viewer on the left, and a control panel on the right. Users can directly interact with the molecular viewer or use the control panel to affect the display.
In the sample screenshot below, the longer chain H is colored in light coral and the short chain X in teal. Since only these two chains were selected in the job submission form, the other chains contained in the PDB file for 1DVA are not shown. Included in the view are space-filled hetero atom residues which have at least one atom within 4 Angstroms of an atom in one of the chosen chains.
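The 4 Angstrom criterion for hetero atom residues can be illustrated with a small Python sketch. It is not the viewer's code: the function name `hetero_residues_near` is hypothetical, it assumes fixed-column PDB formatting with coordinates in columns 31-54, and it uses a plain all-pairs distance check that is adequate only for small inputs.

```python
import math


def _xyz(line):
    """Read the x, y, z coordinates from columns 31-54 of a PDB record."""
    return (float(line[30:38]), float(line[38:46]), float(line[46:54]))


def hetero_residues_near(pdb_lines, chains, cutoff=4.0):
    """Return hetero residues with at least one atom within `cutoff`
    Angstroms of an ATOM record in one of the chosen chains.

    Each residue is reported as (residue name, chain label, residue number).
    """
    chain_atoms = [
        _xyz(l) for l in pdb_lines if l.startswith("ATOM") and l[21] in chains
    ]
    near = set()
    for line in pdb_lines:
        if not line.startswith("HETATM"):
            continue
        key = (line[17:20].strip(), line[21], line[22:26].strip())
        if key in near:
            continue  # residue already selected through another atom
        point = _xyz(line)
        if any(math.dist(point, atom) <= cutoff for atom in chain_atoms):
            near.add(key)
    return near
```

For the example job this would be called with `chains={"H", "X"}`; whether the viewer treats the 4 Angstrom boundary as inclusive is an assumption made here.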
Just below the Interface and KFC-2 Hot Spots heading, the residues determined to be in the interface region are listed. For interface residues which are predicted to be hot spots by either of the two KFC2 models or by any of the additional data (Consurf, Rosetta, Experimental), a pink background is used in the box immediately surrounding the residue label. Non-hot spot interface residues are indicated with a white background. For each of the three hotspot residues LEU32:H, LEU34:H, and TYR12:X, the first check box immediately below the residue label has been clicked in order to display these residues in space filling mode. Notice that in this snapshot, the mouse pointer was positioned over the TYR12:X label, causing the actual data for this hot spot to pop up. Simply hovering should cause the data to appear, but depending on your browser, you may have to click the label instead.
Each component is described in more detail below.
KFC uses the Fast Atomic Density Evaluator (FADE) to analyze the shape specificity within a protein-protein interface. Users can highlight different degrees of shape specificity by clicking on the different color-coded checkboxes.
These controls alter the appearance of the selected atoms. By default, KFC selects all protein atoms in the complex. Advanced users may change the atom selection by using the Jmol scripting language.
Additionally, users can save up to four different views of their session.
The three checkboxes in each cell control the display of an interface residue.
The coloring within each cell also encodes information about the residue.
If you use the console to make selections and change displays, the selections shown in the Control Panel may no longer be accurate. Actions taken using the console override any mouse-driven selection and display controls.