Modelling of Large Protein Complexes

dataset

posted on 2022-03-22, 05:28 authored by Patrick Bryant, Arne Elofsson

AlphaFold and AlphaFold-multimer can predict the structure of single- and multiple-chain proteins with very high accuracy. However, predicting protein complexes with more than a handful of chains is still infeasible: accuracy decreases rapidly with the number of chains, and the size of the protein that can be predicted is limited by GPU memory. Nevertheless, it might be possible to predict the structure of large complexes starting from predictions of subcomponents. Here, we take a graph traversal approach to assemble 175 protein complexes with 10-30 chains using predictions of subcomponents. We construct a graph of each complex from its subcomponents, compute paths through this graph using Monte Carlo Tree Search, and assemble the subcomponents along these paths in a stepwise fashion. Using subcomponents predicted from all possible trimeric interactions, 91 of the 175 complexes (52%) are assembled to completion. We create a scoring function, mpDockQ, that can distinguish whether assemblies are complete and predict their accuracy. Using mpDockQ to select complete complexes with TM-score ≥0.9 at a false positive rate (FPR) of 10% yields 20 complete complexes with a median TM-score of 0.92. The complete assembly protocol, starting from the sequences, is freely available at: https://gitlab.com/patrickbryant1/molpc

This repository contains the MSAs and predicted subcomponents needed to reproduce the assembly with the "all-trimer" approach.
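
To make the idea of stepwise assembly from trimeric subcomponents more concrete, the following is a minimal sketch and not the MoLPC implementation. It builds a chain-level graph from a list of hypothetical trimers and searches for an order in which each newly added chain shares a subcomponent with a chain already placed. Note the assumptions: the chain labels and the function names `build_interaction_graph` and `assembly_path` are invented for illustration, a plain depth-first search stands in for the Monte Carlo Tree Search used in the actual protocol, and the structural superposition and scoring steps are ignored entirely.

```python
from itertools import combinations


def build_interaction_graph(subcomponents):
    """Map each chain to the chains it shares a predicted subcomponent with."""
    graph = {}
    for chains in subcomponents:
        for a, b in combinations(chains, 2):
            graph.setdefault(a, set()).add(b)
            graph.setdefault(b, set()).add(a)
    return graph


def assembly_path(graph, start, n_chains):
    """Depth-first search for an order that adds one connected chain per step.

    Returns a list of chain labels, or None if no path reaches n_chains.
    This is a simplification: the real protocol also has to check geometry.
    """
    def extend(path, placed):
        if len(path) == n_chains:
            return path
        # Candidates: unplaced neighbours of any chain already in the assembly.
        frontier = sorted(
            {nb for c in path for nb in graph.get(c, ()) if nb not in placed}
        )
        for nxt in frontier:
            result = extend(path + [nxt], placed | {nxt})
            if result is not None:
                return result
        return None

    return extend([start], {start})


# Hypothetical example: two trimers that overlap in chain C.
trimers = [("A", "B", "C"), ("C", "D", "E")]
graph = build_interaction_graph(trimers)
path = assembly_path(graph, "A", n_chains=5)
incomplete = assembly_path(graph, "A", n_chains=6)  # no sixth chain exists
print(path, incomplete)
```

In this toy case the two trimers can be joined through their shared chain C, so a complete five-chain path exists, whereas asking for a sixth chain returns `None`, which loosely mirrors an assembly that cannot be completed from the available subcomponents.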