The average number of vertices per patch is about 500

Patch Generation. Patches are generated from a given center point. Some programs generate patches by including all points within a given distance of the center point (19); however, this approach may only work for surfaces with relatively simple topology. We define a patch as a continuous surface area within a cutoff geodesic distance from the center point. By using geodesic distance, we guarantee that the generated surface patches are continuous, uniform, and easily extensible to any size. In the graph representation, the surface patch can be generated efficiently by taking advantage of fast shortest-path search algorithms. For this purpose, we implement a modified Dijkstra algorithm to calculate the geodesic distance. We choose a cutoff distance of 9 Å, which gives reasonable results for describing similarities between protein-protein interactions. The average number of vertices per patch is about 500. For a typical protein with 100 residues, the final graph has 9,000 vertices. The number of patches generated for each protein is the same as the number of vertices. To save memory, all of the patches are generated during the fingerprint calculation stage and are not stored. Only 5 patches are regenerated in the EPSS scoring stage for explicit alignment.

Fingerprint Generation. We use the distance-dependent distribution of curvatures as the fingerprint of the patch. More specifically (see Fig. S1), for each vertex in the patch, the curvature between that vertex and the center vertex is calculated from the normals and coordinates of the 2 vertices (23), using an expression that includes a step function. The normal associated with the center vertex is taken as the average of all normals for vertices within 2.5 Å of the center vertex. To compare the fingerprints of 2 patches, the distributions are divided into bins: the total number of bins is 60, and the score is computed from the normalized distributions in each bin for the 2 patches.

Averaged Fingerprint Similarity Score. The similarity between a pair of patches is measured using a scoring function defined as a minimum (min) over candidate values. The explicit comparison accounts for any vertex in one patch that cannot fit in the other patch within the sampling accuracy, and it makes use of the auxiliary patches of the 2 patches being compared (the second of which is denoted Y).

PDB Screening Dataset. The structure database we use for screening is a snapshot of the Protein Data Bank created on January 7th, 2008. We first separate each PDB file into different chains based on the chain ID, and all atoms without a chain ID (mostly solvent) are discarded. By parsing the metadata and residue information in the PDB files, we eliminate the DNA and RNA chains. We also eliminate chains that contain only metal, water, or other small cofactors. The final number of valid chains is 107,592. We select 2 enzyme-inhibitor sets and search for patch similarity in the PDB. The first inhibitor set contains alpha-chymotrypsin inhibitors. To find known chymotrypsin inhibitors, we first search the Protein Data Bank Web interface using the keywords "chymotrypsin inhibitor", and manually check the SCOP (24) classification (version 1.73) of the search results to locate the SCOP protein entries that correspond to real alpha-chymotrypsin inhibitors. For each such entry we search the SCOP database and find all PDB IDs and chain IDs of the proteins that belong to the same entry. The reason for this approach is that chymotrypsin inhibitors are diverse in sequence and fold, and therefore cannot be identified by searching only for sequence or fold similarity. Furthermore, the inhibitors themselves are not usually annotated as chymotrypsin inhibitors in the PDB files. For the second set, which contains uracil-DNA glycosylase inhibitors, we simply search the text of the PDB files with the keywords "uracil glycosylase inhibitors" and manually select the inhibitors from the search results. In total, we collect 243 chymotrypsin inhibitor domains (Table S5) and 26 uracil-DNA glycosylase inhibitor domains (Table S6) from the PDB snapshot.

Screening Protocol.
For each protein structure, we first calculate the DFSS scores of all possible patches against the query patch, and keep the best-scoring 10% of patches (by DFSS) for more accurate AFSS scoring. We select the query patch whose center vertex is located in the middle of.
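
As an illustration of the patch generation step, the following is a minimal Python sketch of a cutoff-limited Dijkstra search over a surface graph. It is not the authors' modified algorithm, whose details are not given in this text. The function name `geodesic_patch`, the adjacency-dictionary layout, and the toy graph are assumptions made for this example. Path length along graph edges stands in for the geodesic distance, and a vertex is added only if its path length stays within the cutoff.

```python
import heapq


def geodesic_patch(adjacency, center, cutoff):
    """Return {vertex: path length} for all vertices within `cutoff` of `center`.

    adjacency maps each vertex to a list of (neighbor, edge_length) pairs.
    This layout is an assumption for the sketch, not the source's data model.
    Vertices are assumed to be mutually comparable (e.g. ints) so that heap
    ties on distance can be broken.
    """
    dist = {center: 0.0}
    heap = [(0.0, center)]
    while heap:
        d, v = heapq.heappop(heap)
        if d > dist.get(v, float("inf")):
            continue  # stale heap entry; a shorter path was already found
        for w, length in adjacency.get(v, ()):
            nd = d + length
            # Prune at the cutoff: edge lengths are non-negative, so any
            # shortest path to a vertex inside the patch stays inside it.
            if nd <= cutoff and nd < dist.get(w, float("inf")):
                dist[w] = nd
                heapq.heappush(heap, (nd, w))
    return dist


# Toy chain of 4 vertices with 4 Å edges (hypothetical data).
toy_graph = {
    0: [(1, 4.0)],
    1: [(0, 4.0), (2, 4.0)],
    2: [(1, 4.0), (3, 4.0)],
    3: [(2, 4.0)],
}
```

Calling `geodesic_patch(toy_graph, 0, 9.0)` keeps vertices 0, 1, and 2 (path lengths 0, 4, and 8 Å) and leaves out vertex 3, which lies 12 Å away along the graph. Because membership is decided by path length along edges rather than straight-line distance, two surface regions that are close in space but far apart along the surface do not end up in the same patch.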

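The first stage of the screening protocol, keeping the best-scoring 10% of patches for the more expensive scoring step, can be sketched as below. This is only an illustration: the function name `prefilter_patches` is hypothetical, and because this text does not state whether a higher or a lower DFSS value counts as better, the direction is exposed as the assumed parameter `higher_is_better`.

```python
import math


def prefilter_patches(scores, fraction=0.10, higher_is_better=True):
    """Return indices of the best-scoring `fraction` of patches (at least one).

    scores[i] is the fast first-stage score of patch i against the query patch.
    `higher_is_better` is an assumption; set it to False if the score behaves
    like a distance.
    """
    if not scores:
        return []
    keep = max(1, math.ceil(len(scores) * fraction))
    order = sorted(range(len(scores)), key=lambda i: scores[i],
                   reverse=higher_is_better)
    return order[:keep]
```

Only the returned indices would then be passed to the slower, more accurate scoring stage, which is what keeps the cost of screening a large structure database manageable.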