The KFC-2 Server

Introduction

When a protein molecule binds to another biological polymer (protein or nucleic acid) to form a complex, the residues in the interface that account for most of the binding free energy are called binding hot spots. The KFC2 Server provides a user-friendly, web-based tool for predicting protein binding hot spots using machine learning. For each residue within the binding interface, the KFC2 Server characterizes its local structural environment and compares it to known environments of experimentally determined hot spots. It then predicts whether or not the residue is a hot spot. After the computational analysis is complete, the results may be visualized using an interactive job viewer. In addition to standard molecular viewing functionality, the job viewer allows the user to quickly highlight predicted hot spots and surrounding structural features. Two different machine learning methods are implemented on the KFC2 Server.

The KFC2 Method

In April 2011, a major enhancement was made to the KFC Server: improved hot spot prediction methods are now available. The KFC2 method uses support vector machines (SVMs) with a training set containing roughly equal numbers of hot spot and non-hot spot residues, and incorporates additional features that capture the degree of residue flexibility. Together, these changes make improved hot spot predictions possible. Please see the KFC2 reference given below for more details.

KFC2 implements two separate SVM models, referred to as KFC2a and KFC2b. KFC2a offers higher sensitivity and accuracy, but lower specificity, than KFC2b. The user may examine both model scores interactively with the KFC Viewer as described below. The results may also be retrieved in tabular form. Depending on how the predictions are to be used, one model may be preferred over the other. If the user wants the highest degree of confidence that the predicted hot spots are truly hot spots, the KFC2b model should be used. If, on the other hand, it is important not to overlook any possible hot spots, KFC2a should be used.
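
The trade-off between the two models can be illustrated with the standard definitions of sensitivity and specificity. The sketch below is only an illustration: the function name and the toy prediction lists are hypothetical and are not KFC2 data or benchmark results.

```python
def sensitivity_specificity(predicted, actual):
    """Return (sensitivity, specificity) for parallel lists of booleans.

    Assumes `actual` contains at least one hot spot and one non-hot spot.
    """
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p and a)          # hot spots found
    fn = sum(1 for p, a in pairs if not p and a)      # hot spots missed
    tn = sum(1 for p, a in pairs if not p and not a)  # non-hot spots rejected
    fp = sum(1 for p, a in pairs if p and not a)      # false alarms
    return tp / (tp + fn), tn / (tn + fp)


# Toy data, invented for illustration only.
actual = [True, True, False, False]
permissive_model = [True, True, True, False]   # misses nothing, one false alarm
strict_model = [True, False, False, False]     # no false alarms, misses one

print(sensitivity_specificity(permissive_model, actual))
print(sensitivity_specificity(strict_model, actual))
```

In this toy case the permissive model behaves like the KFC2a description (nothing overlooked) and the strict model like the KFC2b description (every prediction is a true hot spot).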

Publications

The citations given below provide a complete discussion of the development and performance of the KFC and KFC2 methods:

Please cite the appropriate article in any work that uses the KFC/KFC2 Server:

Running a KFC Analysis

Registration and Login

Users can register prior to submitting jobs to any of the tools hosted by the Mitchell Lab or submit jobs anonymously. Personal information is only used to contact users when their analysis is complete; it will not be shared. To register, enter a unique user name and email address on the registration page, then click the submit button. An error message will display if the selected user name is in use by another user.

Once registered, users may log in to the server. Although login is not required to submit jobs, it allows users to view their personal jobs in the job viewer. Both the username and password are case sensitive. By default, a login expires after two weeks; however, a user may also log off manually.

The Job Submission Form

PDB File Requirements and Chain Labels: the Protein Interface

A protein binding interface is the region between two or more polymer chains where the atoms from the different chains interact strongly enough to form a stable complex. A valid PDB file submitted to the KFC2 Server must contain at least two separate polymer chains in order to contain an interface. In the case of homodimers, the chemical composition of the chains will be identical, but the PDB file should still contain two unique chain label identifiers. In some cases, the PDB file downloaded from the Protein Data Bank may require the application of symmetry operators in order to generate the biologically significant interface. Be aware that although the Protein Data Bank does offer the option of downloading a PDB file containing the biologically significant interface, the file should be checked for proper chain labels, and edited if necessary, before it is run through KFC2. The current version of these generated files from the Protein Data Bank does not use unique chain labels to distinguish the interface chains, but instead separates the chain groups using MODEL and ENDMDL keywords.
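
Before submitting, a file can be checked locally for the two issues described above. The following is a minimal sketch, not the server's own check; the function names and the `atom_line` helper (which builds simplified fixed-column ATOM records for the demonstration) are hypothetical.

```python
def atom_line(serial, name, res, chain, resseq, icode=" "):
    """Build a minimal fixed-column ATOM record (illustrative coordinates of 0.0)."""
    return (f"ATOM  {serial:>5} {name:<4} {res:>3} {chain}{resseq:>4}{icode}   "
            f"{0.0:8.3f}{0.0:8.3f}{0.0:8.3f}")


def chain_labels(pdb_lines):
    """Distinct chain labels (column 22) of ATOM records, in order of first appearance."""
    labels = []
    for line in pdb_lines:
        if line.startswith("ATOM") and len(line) >= 22 and line[21] not in labels:
            labels.append(line[21])
    return labels


def precheck(pdb_lines):
    """Return a list of warnings; an empty list means no issue was found."""
    warnings = []
    named = [c for c in chain_labels(pdb_lines) if c != " "]
    if len(named) < 2:
        warnings.append("fewer than two distinct chain labels")
    if any(line.startswith("MODEL") for line in pdb_lines):
        warnings.append("MODEL/ENDMDL records present; chains may share labels")
    return warnings


good = [atom_line(1, "CA", "LEU", "H", 32), "TER", atom_line(2, "CA", "ALA", "X", 1)]
biounit = ["MODEL        1", atom_line(1, "CA", "LEU", "A", 32), "ENDMDL",
           "MODEL        2", atom_line(2, "CA", "LEU", "A", 32), "ENDMDL"]
print(precheck(good))
print(precheck(biounit))
```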

In the job submission form, two chain sets may be specified. Chain labels are case sensitive; however, if there are no lowercase chain labels in the PDB file, lowercase input will automatically be converted to uppercase. Valid chain labels may be letters, uppercase or lowercase, or digits 0 through 9. Although it is recommended that the user specify the chain labels, if none are given in the form, the KFC Server will attempt to assign the interface chain sets automatically.
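
The label rules above can be sketched as a small validation routine. This is an assumption-laden illustration rather than the server's code: `normalize_chain_input` is a hypothetical name, and rejecting labels that are absent from the file is modeled on the consistency check described in the Job Queue section.

```python
import string

VALID_LABELS = set(string.ascii_letters + string.digits)


def normalize_chain_input(user_labels, file_labels):
    """Validate user-entered chain labels against the labels found in the file.

    Lowercase input is upper-cased only when the file has no lowercase labels.
    """
    file_has_lower = any(c.islower() for c in file_labels)
    result = []
    for label in user_labels:
        if len(label) != 1 or label not in VALID_LABELS:
            raise ValueError(f"invalid chain label: {label!r}")
        if not file_has_lower:
            label = label.upper()
        if label not in file_labels:
            raise ValueError(f"chain label {label!r} not found in file")
        result.append(label)
    return result


print(normalize_chain_input(["h", "x"], ["H", "X"]))   # file has no lowercase labels
print(normalize_chain_input(["a"], ["A", "a"]))        # case is significant here
```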

The automatic chain selection works as follows. If a TER record is found in the PDB file, all ATOM records before the TER are used for Chain Set 1, and all ATOM records after the TER are used for Chain Set 2. If a chain in the PDB file has no chain label (a space character in column 22), a unique chain label will be assigned to that chain, with chains delimited by the location of the TER records. Also, if the same chain label is used before and after the first TER record, the second chain will be given the lowercase label of the first chain. If a chain-separating TER record is not found in the PDB file, the automatic chain selection is based on the chain labels: all ATOM records with the first chain label are used for Chain Set 1, and all ATOM records after this chain are used for Chain Set 2. In this mode, if the file contains additional chains after the first chain that have the same chain label as the first chain, these chains will be assigned different unique chain labels. Be aware that TER records are also used to mark breaks within a single chain in regions where the structure is undetermined. The presence of such chain breaks may produce confusing results.
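
The two selection modes described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the server's implementation: it only splits the ATOM records into two sets and omits the relabeling steps, it treats a TER as chain-separating only when ATOM records follow it, and the names `auto_chain_sets` and `atom_line` are hypothetical.

```python
def atom_line(serial, name, res, chain, resseq, icode=" "):
    """Build a minimal fixed-column ATOM record (illustrative coordinates of 0.0)."""
    return (f"ATOM  {serial:>5} {name:<4} {res:>3} {chain}{resseq:>4}{icode}   "
            f"{0.0:8.3f}{0.0:8.3f}{0.0:8.3f}")


def auto_chain_sets(pdb_lines):
    """Return (mode, chain_set_1, chain_set_2) as lists of ATOM records."""
    atoms = [(i, line) for i, line in enumerate(pdb_lines) if line.startswith("ATOM")]
    if not atoms:
        raise ValueError("no ATOM records found")
    last_atom = atoms[-1][0]
    # Assumption: a TER separates chains only if ATOM records follow it.
    ter = next((i for i, line in enumerate(pdb_lines)
                if line.startswith("TER") and i < last_atom), None)
    if ter is not None:
        set1 = [line for i, line in atoms if i < ter]
        set2 = [line for i, line in atoms if i > ter]
        return "TER", set1, set2
    # No separating TER: the leading run with the first chain label is set 1.
    first_label = atoms[0][1][21]
    split = next((k for k, (_, line) in enumerate(atoms)
                  if line[21] != first_label), len(atoms))
    return ("label",
            [line for _, line in atoms[:split]],
            [line for _, line in atoms[split:]])


with_ter = [atom_line(1, "CA", "LEU", "A", 1), "TER", atom_line(2, "CA", "ALA", "A", 1)]
no_ter = [atom_line(1, "CA", "LEU", "H", 32), atom_line(2, "CA", "LEU", "H", 34),
          atom_line(3, "CA", "ALA", "X", 1)]
print(auto_chain_sets(with_ter)[0])
print(auto_chain_sets(no_ter)[0])
```

Because a TER inside a single chain would also trigger the first branch, this sketch reproduces the chain-break pitfall mentioned above.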

Note that the Jmol viewer, when auto-selecting the color of a chain based on its chain label, uses the same color for uppercase and lowercase labels. Also, chain labels with the digits 0-9 use the same colors as chains Q-Z. You may therefore have to rename the chains in order to obtain unique colors for the chains in the displayed molecule. By using the Viewer to toggle a given chain on or off, it is relatively simple to locate specific chains by label. Also, hovering over a chain causes an atom label to pop up which includes the chain label. A complete summary of the chain sets used in a job may be viewed as described below in the Examining Results section.
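
To see in advance which chains will share a color, the labels can be grouped by the color they are expected to receive. The sketch below is hypothetical: the text states only that digits 0-9 share colors with chains Q-Z, so the specific pairing used here (0 with Q through 9 with Z) is an assumption, as are the function names.

```python
import string


def color_key(label):
    """Map a chain label to the uppercase letter whose color it is assumed to share."""
    if len(label) == 1 and label in string.digits:
        return "QRSTUVWXYZ"[int(label)]   # assumed pairing: 0-Q, 1-R, ..., 9-Z
    return label.upper()


def color_collisions(labels):
    """Return {color_key: [labels]} for every group of labels sharing a color."""
    groups = {}
    for label in labels:
        groups.setdefault(color_key(label), []).append(label)
    return {key: members for key, members in groups.items() if len(members) > 1}


print(color_collisions(["A", "a", "Q", "0", "B"]))
```

Any group reported here is a candidate for renaming before submission.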

Note that the KFC Server does not predict interfaces; it analyzes given interfaces for hot spots. If you are trying to predict potential interface residues for a single protein chain, we highly recommend that you look into Consurf or Evolutionary Trace or one of the many websites mentioned in this paper.

Note that model structures containing many clashes (unnaturally close contacts between atoms) may vastly overestimate the number of hot spots. Please remove these clashes from your PDB file before submitting it to the server. In addition, PDB files which contain "iCodes" in column 27 of the ATOM records (immediately after the residue number) will not be properly handled by the KFC Server. If your PDB file contains any iCodes, please renumber the residues uniquely and eliminate the iCodes before submitting the file.
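
One possible way to detect and eliminate iCodes is sketched below. It is an illustration only: the renumbering policy (keep the original number where possible, otherwise use the previous number plus one) is an assumption, only ATOM records are rewritten (HETATM and TER records are left untouched), and the function names and the `atom_line` demo helper are hypothetical.

```python
def atom_line(serial, name, res, chain, resseq, icode=" "):
    """Build a minimal fixed-column ATOM record (illustrative coordinates of 0.0)."""
    return (f"ATOM  {serial:>5} {name:<4} {res:>3} {chain}{resseq:>4}{icode}   "
            f"{0.0:8.3f}{0.0:8.3f}{0.0:8.3f}")


def has_icodes(pdb_lines):
    """True if any ATOM record has a non-blank insertion code in column 27."""
    return any(line.startswith("ATOM") and len(line) > 26 and line[26] != " "
               for line in pdb_lines)


def renumber_without_icodes(pdb_lines):
    """Give each residue a unique number per chain and blank out column 27."""
    out, last_number, last_key = [], {}, {}
    for line in pdb_lines:
        if not line.startswith("ATOM") or len(line) < 27:
            out.append(line)
            continue
        chain, key = line[21], line[22:27]      # key = residue number + iCode
        if last_key.get(chain) != key:          # a new residue starts here
            original = int(line[22:26])
            if chain in last_number:
                last_number[chain] = max(original, last_number[chain] + 1)
            else:
                last_number[chain] = original
            last_key[chain] = key
        if last_number[chain] > 9999:
            raise ValueError("residue number does not fit in four columns")
        out.append(line[:22] + f"{last_number[chain]:>4}" + " " + line[27:])
    return out


lines = [atom_line(1, "CA", "ASP", "H", 60), atom_line(2, "CA", "GLY", "H", 60, "A"),
         atom_line(3, "CA", "ALA", "H", 61), atom_line(4, "CA", "SER", "H", 70)]
fixed = renumber_without_icodes(lines)
print([int(line[22:26]) for line in fixed])
```

Keep a record of the mapping between old and new residue numbers so that the predictions can be related back to the original structure.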

Finally, the original KFC model is able to analyze structures containing proteins and DNA/RNA but not other types of molecules. Presently, nucleic acid chains are not handled with the new KFC2 method. Until work is completed to add this capability, the current KFC2 Server will automatically switch to using the original KFC model for cases when interface nucleic acid chains are selected.

Filling Out the Job Form

To analyze an interface, enter the following information on the submission page and click the Submit button:

The Job Queue

After a job is submitted, the PDB file and chain labels are first checked to confirm that the labels provided are consistent with the chain labels found in the PDB file. If a problem is identified, a message will be issued and the server will take no further action. After passing the consistency check, the job will enter a queue and wait for processing. You may check on the current status of the job by clicking the Job Queue button on the top menu bar.

The job queue displays the current status for each submitted job (Queued, Active, View Results, or Error), and provides links to KFC input and output files. After processing begins, a typical KFC analysis finishes within two minutes. When the task is complete, an email is optionally sent to you with a link to your KFC hot spot predictions or an error message. If the job finishes successfully, the status field will contain a link to the interactive job viewer.

Error Messages

When a job fails with the KFC2 Server, an error report file will give information about the step in which the problem occurred, and, in some cases, additional information about how the problem might be fixed.

For jobs run under the original KFC server, the following error codes were used:

Most errors are caused by non-standard amino acids or ligands incorrectly labeled as ATOM records within the PDB coordinate file. If possible, the user should resolve the inconsistencies in the file and submit a new job. If subsequent jobs still end in error, users can contact mitchelljc@ornl.gov for assistance.

Examining the results

Obtaining the Analysis Results File

Using the Job Queue display, you may access KFC input, output, and error files by clicking on a job’s identification number. This is the number in the very first column of the job queue listing. Clicking on the number will bring up a list of the files which you may examine or download. The Chains link on this file list page displays how the interface chain sets were defined, along with their Jmol colors. Depending on your web browser, this page will be displayed either in a new browser window or in a separate tab. Also available from the file list is a file whose name ends with the extension .results. This file contains the numerical results, as well as the hot spot classification, for the residues determined to be part of the interface. Results files generated by the KFC2 Server from the PDB file for 1DVA, using the interface between chains H and X, are given below for both the original KFC and KFC2 methods.

KFC2 Format of Hot Spot Prediction Results File

KFC2 Hot Spot Prediction Server @mitchell-lab.org from Thu, 17 Mar 2011 12:18:45 CDT
JobId: 3748   JobName: Demo_22_1dva_kfc2

                       KFC2-A  KFC2-A  KFC2-B   KFC2-B   ConSurf ConSu  Rosetta  Roset   Exper  Exper
  Chain    Res    Num  Class    Conf   Class     Conf     Class  Value   Class    DDG    Class  Value
------------------------------------------------------------------------------------------------------
    H      LEU     32  -------  -0.75  Hotspot    0.10   -------     2  -------  0.41  Hotspot  Str 
    H      LEU     34  -------  -0.71  Hotspot    0.11   -------     2  -------  1.25  Hotspot  Str 
    H      ASN     37  -------  -1.79  -------   -0.97   -------     1  -------  0.01  -------  Ins 
    H      GLY     38  -------  -0.15  -------   -0.61   -------     3  -------   ---  -------  --- 
    H      ALA     39  -------  -1.59  -------   -0.87   -------     1  -------   ---  -------  --- 
    H      GLN     40  -------  -1.53  -------   -0.98   -------     6  -------  0.01  -------  --- 
    H      ASP     60  -------  -----  -------   -----   -------     1  -------   ---  -------  --- 
    H      ILE     65  -------  -0.77  -------   -0.40   -------     3  -------  0.73  -------  Ins 
    H      VAL     67  -------  -0.30  -------   -0.12   -------     5  -------  0.70  -------  Ins 
    H      GLU     70  -------  -1.28  -------   -0.73   Conserv     7  -------  1.02  -------  Weak 
    H      LEU     73  Hotspot   0.14  Hotspot    0.24   -------     2  -------  0.53  -------  Ins 
    H      SER     74  -------  -1.20  -------   -0.89   -------     5  -------  0.11  -------  Ins 
    H      GLU     75  -------  -1.83  -------   -0.98   -------     1  -------  0.00  -------  Ins 
    H      HIS     76  -------  -0.95  -------   -0.81   -------     1  -------  0.43  Hotspot  Str 
    H      GLU     80  -------  -1.26  -------   -0.65   Conserv     7  -------  0.01  -------  Ins 
    H      GLN     81  -------  -2.03  -------   -0.98   -------     2  -------   ---  -------  --- 
    H      SER     82  -------  -1.23  -------   -0.86   -------     1  ------- -0.01  -------  Ins 
    H      SER    129  -------  -----  -------   -----   -------     6  -------   ---  -------  --- 
    H      LEU    144  -------  -0.75  -------   -0.19   -------     1  -------  0.28  -------  Ins 
    H      LEU    145  -------  -1.62  -------   -0.98   -------     1  -------   ---  -------  --- 
    H      ASP    146  -------  -2.30  -------   -0.92   -------     1  -------   ---  -------  --- 
    H      ARG    147  -------  -2.29  -------   -0.78   -------     1  -------   ---  -------  --- 
    H      GLY    149  -------  -2.34  -------   -0.73   -------     6  -------   ---  -------  --- 
    H      ALA    152  -------  -2.16  -------   -0.87   Conserv     7  -------   ---  -------  --- 
    H      LEU    153  -------  -0.71  -------   -0.08   -------     2  -------  0.82  -------  Weak 
    H      GLN    170  -------  -----  -------   -----   -------     2  -------   ---  -------  --- 
    H      TYR    184  -------  -----  -------   -----   Conserv     7  -------   ---  -------  --- 
    H      LYS    188  -------  -----  -------   -----   -------     5  -------   ---  -------  --- 
    X      ALA      1  -------  -1.60  -------   -0.90   -------   ---  -------   ---  -------  --- 
    X      LEU      2  Hotspot   0.39  Hotspot    0.14   -------   ---  Hotspot  2.31  Hotspot  Str 
    X      CYS      3  -------  -1.93  -------   -0.88   -------   ---  -------   ---  -------  --- 
    X      ASP      5  -------  -1.66  -------   -0.89   -------   ---  -------  1.65  -------  --- 
    X      ARG      7  -------  -0.52  -------   -0.29   -------   ---  Hotspot  4.40  -------  Weak 
    X      VAL      8  -------  -0.57  -------   -0.36   -------   ---  -------  0.57  -------  Int 
    X      ASP      9  -------  -0.01  -------   -0.27   -------   ---  -------  0.66  -------  Int 
    X      TRP     11  -------  -0.53  Hotspot    0.01   -------   ---  Hotspot  2.61  Hotspot  Str 
    X      TYR     12  Hotspot   0.34  Hotspot    0.35   -------   ---  Hotspot  3.16  Hotspot  Str 
    X      GLN     14  -------  -1.99  -------   -0.94   -------   ---  -------  0.10  -------  --- 
    X      PHE     15  Hotspot   0.04  Hotspot    0.13   -------   ---  -------  1.58  Hotspot  Str 
    X      VAL     16  -------  -2.21  -------   -1.00   -------   ---  -------  0.22  -------  --- 
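
A results file in the layout shown above can be read with a simple whitespace split, since every data row has thirteen columns and dashes stand in for missing values. The parser below is a minimal sketch written against this one example; the `Row` type and function names are hypothetical, and other results files may differ in ways this sketch does not handle.

```python
from typing import NamedTuple, Optional


class Row(NamedTuple):
    chain: str
    res: str
    num: int
    kfc2a_hot: bool
    kfc2a_conf: Optional[float]
    kfc2b_hot: bool
    kfc2b_conf: Optional[float]


def _value(token):
    """Convert a numeric token to float; a run of dashes means 'no value'."""
    return None if set(token) == {"-"} else float(token)


def parse_results(text):
    """Extract the KFC2-A and KFC2-B columns from each data row."""
    rows = []
    for line in text.splitlines():
        tok = line.split()
        if len(tok) != 13:
            continue                      # separator or short header line
        try:
            num = int(tok[2])
        except ValueError:
            continue                      # title or column-header line
        rows.append(Row(tok[0], tok[1], num,
                        tok[3] == "Hotspot", _value(tok[4]),
                        tok[5] == "Hotspot", _value(tok[6])))
    return rows


# Four rows copied from the example above, plus the column-header line.
sample = """\
  Chain    Res    Num  Class    Conf   Class     Conf     Class  Value   Class    DDG    Class  Value
    H      LEU     73  Hotspot   0.14  Hotspot    0.24   -------     2  -------  0.53  -------  Ins
    H      ASP     60  -------  -----  -------   -----   -------     1  -------   ---  -------  ---
    X      TRP     11  -------  -0.53  Hotspot    0.01   -------   ---  Hotspot  2.61  Hotspot  Str
    X      TYR     12  Hotspot   0.34  Hotspot    0.35   -------   ---  Hotspot  3.16  Hotspot  Str
"""
rows = parse_results(sample)
only_b = [f"{r.res}{r.num}:{r.chain}" for r in rows if r.kfc2b_hot and not r.kfc2a_hot]
print(only_b)
```

Listing the residues flagged by only one of the two models, as in the last lines, is one way to review the cases where the choice of model matters.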

Using the KFC Viewer

The job viewer has two major components: a molecular viewer on the left, and a control panel on the right. Users can directly interact with the molecular viewer or use the control panel to affect the display.

In the sample screenshot below, the longer chain H is colored light coral and the short chain X teal. Since only these two chains were selected in the job submission form, the other chains contained in the PDB file for 1DVA are not shown. Included in the view are space-filled hetero atom residues which have at least one atom within 4 Angstroms of an atom in one of the chosen chains.

Just below the Interface and KFC-2 Hot Spots heading, the residues determined to be in the interface region are listed. For interface residues which are predicted to be hot spots by either of the two KFC2 models or by any of the additional data (Consurf, Rosetta, Experimental), a pink background is used in the box immediately surrounding the residue label. Non-hot spot interface residues are indicated with a white background. For each of the three hot spot residues LEU32:H, LEU34:H, and TYR12:X, the first check box immediately below the residue label has been clicked in order to display these residues in space-filling mode. Notice that in this snapshot, the mouse pointer was positioned over the TYR12:X label, causing the actual data for this hot spot to pop up. Depending on your browser, you may have to click the label to get the pop-up data, though simply hovering should cause it to appear.

Each component is described in more detail below.

Control Panel: FADE Shape Markers

KFC uses the Fast Atomic Density Evaluator (FADE) to analyze the shape specificity within a protein-protein interface. Users can highlight different degrees of shape specificity by clicking on the different color-coded checkboxes.

Control Panel: Display Controls

These controls alter the appearance of the selected atoms. By default, KFC selects all protein atoms in the complex. Advanced users may change the atom selection by using the Jmol scripting language.

Additionally, users can save up to four different views of their session.

Control Panel: Interface and Hot Spots

Each chain produces a unique group in the interface display.

The three checkboxes in each cell control the display of an interface residue.

The coloring within each cell also encodes information about the residue.

Control Panel: Miscellaneous Buttons

Molecular Viewer: Jmol

Jmol is the molecular viewer used throughout the Mitchell Lab website. It is an applet written in Java, so users must enable Java and JavaScript in their web browsers in order to use the KFC Server. Windows users may also need to install the most current Sun Java Runtime Environment (JRE) in order to use Jmol. Jmol is extensively documented, so we direct users to the following websites for information about its use.

Caveat

If you use the console to make selections and change displays, the selections shown in the Control Panel may no longer be accurate. Actions taken using the console override any mouse-driven selection and display controls.
