Files (15):
splits.txt (6.31 kB)
rot70k.0.0.4.prm (8.2 MB)
1.data.zip (5.74 GB)
2.data.zip (9.8 GB)
3.data.zip (9.58 GB)
4.data.zip (12.12 GB)
5.data.zip (10.92 GB)
6.data.zip (6.08 GB)
apply_ftresult_improved.py (3.1 kB)
apo.data.zip (4.59 GB)
holo.data.zip (4.6 GB)
scatter_plots.pdf (306.3 MB)
splitsAPOHOLO.tsv (2.87 kB)
README.txt (2.08 kB)
MANIFEST.txt (0.62 kB)

IPR0220 - InterPepRank set

dataset
posted on 2021-04-26, 12:10, authored by Isak Johansson-Åkhe, Claudio Mirabello, Björn Wallner

Peptide-protein interactions between a smaller or disordered peptide stretch and a folded receptor make up a large part of all protein-protein interactions. A common approach for modelling such interactions is to exhaustively sample the conformational space by fast Fourier transform (FFT) docking, and then refine a top percentage of the decoys. Methods that can rank the decoys for selection quickly enough for larger-scale studies commonly rely either on first-principle energy terms, such as electrostatics and van der Waals forces, or on pre-calculated statistical pairwise potentials.


InterPepRank is a machine-learning-based method for peptide-protein complex scoring and ranking. It encodes the structure of the complex as a graph, with physical pairwise interactions as edges and evolutionary and sequence features as nodes. The graph network is trained to predict the ligand root-mean-square deviation (LRMSD) of decoys, using edge-conditioned graph convolutions on a large set of peptide-protein complex decoys.
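
As a rough illustration of the training target, the sketch below computes a plain RMSD between two matched sets of peptide coordinates. It assumes LRMSD means the RMSD of the peptide (ligand) atoms after the decoy and the native complex have already been superimposed on the receptor; the function name `rmsd` and the coordinate lists are hypothetical and are not taken from the InterPepRank code.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally long lists of (x, y, z).

    Assumption: both coordinate sets are already in the same reference frame
    (e.g. superimposed on the receptor) and atoms are listed in the same order.
    """
    if not coords_a or len(coords_a) != len(coords_b):
        raise ValueError("coordinate lists must be non-empty and equally long")
    squared_sum = sum(
        (a[i] - b[i]) ** 2
        for a, b in zip(coords_a, coords_b)
        for i in range(3)
    )
    return math.sqrt(squared_sum / len(coords_a))


# Hypothetical two-atom peptide: the decoy is shifted 3 units along z.
native = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
decoy = [(0.0, 0.0, 3.0), (1.0, 0.0, 3.0)]
print(rmsd(decoy, native))
```

A lower value means the decoy peptide sits closer to its native position, which is why a predicted LRMSD can be used directly to rank decoys.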


Here we present the complete dataset used to train InterPepRank. The set contains 679 receptor-peptide pairs; for each pair, 50 different peptide conformations are docked using 70,000 different rotations, giving roughly 2.5 billion conformations in total. This is too large to distribute as flat files, so the dataset is instead distributed as a set of ft-files that describe which rotations and translations to apply to the corresponding peptide ligands to generate decoy poses docked to the receptor structures. The apply_ftresult_improved.py script is provided to generate these structures.
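
To make the idea of regenerating a decoy from a rotation and a translation concrete, here is a minimal sketch of a rigid-body transform on peptide coordinates. It is not the logic of apply_ftresult_improved.py: the convention used (rotate about the peptide centroid, then translate), the helper names, and the example values are all assumptions for illustration. Consult README.txt and the script itself for the actual ft-file layout and conventions.

```python
import math


def rotation_z(angle_rad):
    """3x3 rotation matrix for a rotation about the z-axis (example rotation)."""
    c, s = math.cos(angle_rad), math.sin(angle_rad)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def centroid(coords):
    """Geometric centre of a list of (x, y, z) tuples."""
    n = len(coords)
    return tuple(sum(p[i] for p in coords) / n for i in range(3))


def apply_pose(coords, rotation, translation):
    """Rotate coords about their centroid, then shift by translation.

    Assumed convention only; the real script may centre and translate
    differently.
    """
    center = centroid(coords)
    posed = []
    for p in coords:
        d = [p[i] - center[i] for i in range(3)]
        r = [sum(rotation[i][j] * d[j] for j in range(3)) for i in range(3)]
        posed.append(tuple(r[i] + center[i] + translation[i] for i in range(3)))
    return posed


# Hypothetical two-atom peptide, rotated 90 degrees about z and moved 5 units up.
peptide = [(1.0, 0.0, 0.0), (-1.0, 0.0, 0.0)]
decoy = apply_pose(peptide, rotation_z(math.pi / 2), (0.0, 0.0, 5.0))
print(decoy)
```

Storing only a rotation and a translation per decoy, rather than full coordinates, is what keeps the distributed files small: each pose can be rebuilt on demand from the peptide conformation it refers to.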


In addition, the dataset contains a set of apo and holo models that were used to benchmark unbound docking.


All files and scripts are given as-is with no warranty.

Funding

Modeling transient protein-protein interactions relevant for cancer

Swedish Research Council


Publisher

Linköping Universitet