Files

hhblits_msas.tar.zst (857.33 MB)
assembly.tar.zst (27.88 GB)
MANIFEST.txt (0.19 kB)
README.txt (1.87 kB)

Modelling of Large Protein Complexes

dataset
posted on 2022-03-22, 05:28, authored by Patrick Bryant, Arne Elofsson

AlphaFold and AlphaFold-multimer can predict the structure of single- and multiple-chain proteins with very high accuracy. However, predicting protein complexes with more than a handful of chains is still infeasible, as the accuracy decreases rapidly with the number of chains and the protein size is limited by GPU memory. Nevertheless, it might be possible to predict the structure of large complexes starting from predictions of subcomponents. Here, we take a graph traversal approach to assemble 175 protein complexes with 10-30 chains using predictions of subcomponents. We compute paths through a complex graph constructed from subcomponents using Monte Carlo Tree Search and assemble these in a stepwise fashion. Using subcomponents predicted from all possible trimeric interactions, 91 complexes (52%) are assembled to completion. We create a scoring function, mpDockQ, that can distinguish whether assemblies are complete and predict their accuracy. Selecting complete complexes with TM-score ≥0.9 at FPR 10% using mpDockQ results in 20 complete complexes with a median TM-score of 0.92. The complete assembly protocol, starting from the sequences, is freely available at: https://gitlab.com/patrickbryant1/molpc
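
To illustrate the idea of a complex graph built from subcomponents, the following is a minimal sketch and not the authors' implementation. It assumes each predicted trimer is given as a tuple of three chain IDs, and it substitutes a plain breadth-first traversal for the Monte Carlo Tree Search described above; it also ignores structural overlap and scoring entirely. The function names `build_chain_graph` and `assembly_order` and the example chain IDs are hypothetical.

```python
from collections import deque
from itertools import combinations


def build_chain_graph(subcomponents):
    """Connect every pair of chains that co-occur in a predicted subcomponent."""
    graph = {}
    for chains in subcomponents:
        for chain in chains:
            graph.setdefault(chain, set())
        for a, b in combinations(chains, 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph


def assembly_order(graph, start):
    """Return the order in which chains are reached from `start`.

    Assumption: a breadth-first traversal stands in for the tree search;
    a chain can only be added if it shares a subcomponent with a chain
    that has already been placed.
    """
    order = [start]
    seen = {start}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for neighbour in sorted(graph[current]):
            if neighbour not in seen:
                seen.add(neighbour)
                order.append(neighbour)
                queue.append(neighbour)
    return order


# Hypothetical nine-chain complex: no trimer links G, H, I to the rest.
trimers = [("A", "B", "C"), ("B", "C", "D"), ("D", "E", "F"), ("G", "H", "I")]
graph = build_chain_graph(trimers)
order = assembly_order(graph, "A")
complete = len(order) == len(graph)
print(order, complete)
```

In this toy example the traversal cannot reach chains G, H and I, so the assembly is incomplete. This is only one simple way an assembly can fail to reach completion; the real protocol is available in the GitLab repository linked above.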


This repository contains the MSAs and predicted subcomponents needed to reproduce the assembly for the "all-trimer" approach.
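
The two archives are zstd-compressed tarballs. As a sketch, they could be unpacked from Python by calling the system `tar`. This assumes GNU tar 1.31 or later with zstd installed, which provides the `--zstd` option. The helper names `build_extract_command` and `extract` and the `data` output directory are hypothetical, and the layout inside the archives is not described here.

```python
import subprocess
from pathlib import Path


def build_extract_command(archive, dest):
    """Build the tar invocation that unpacks a .tar.zst archive into dest."""
    # Assumption: GNU tar with zstd support is on the PATH.
    return ["tar", "--zstd", "-xf", str(archive), "-C", str(dest)]


def extract(archive, dest):
    """Create dest if needed, then run tar and raise if it fails."""
    Path(dest).mkdir(parents=True, exist_ok=True)
    subprocess.run(build_extract_command(archive, dest), check=True)


if __name__ == "__main__":
    for name in ("hhblits_msas.tar.zst", "assembly.tar.zst"):
        if Path(name).exists():  # skip archives that were not downloaded
            extract(name, "data")
```

Note that assembly.tar.zst is 27.88 GB compressed, so the destination needs more free disk space than that.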



Funding

VR-2016-06301

SNIC 2021/5-297

SNIC 2021/6-197

Berzelius-2021-29

Publisher

Stockholm University
