Proteins and nucleic acids are large polymers (> 10^4 atoms) of amino acids and nucleotides, respectively, that fold into specific 3D structures to perform specific functions. They also interact physically, to transmit signals or regulate each other. In particular, protein-RNA assemblies participate in many aspects of cell regulation, and the deregulation of their binding is implicated in various diseases. Modelling their atomistic structure is crucial, for example, to understand the recognition mechanism, predict the effect of mutations, and design new drugs or artificial RNAs.
Structural knowledge from biophysical experiments can be obtained for isolated proteins and RNAs, but rarely for their 3D assembly. Computational methods called docking are therefore developed to model such assemblies in 3D: they take as input the experimentally determined 3D structures of the protein and the RNA, and search for their most probable (lowest-energy) spatial arrangements.
Most methods treat the protein and the RNA as rigid bodies, exploring only 6 positional degrees of freedom (DOF). But the 3D structure of the RNA itself can vary between the isolated state (input) and the protein-bound state (target). Current methods can accommodate or model some flexibility, but they cannot handle the most flexible regions of RNA, i.e. the single-stranded RNA (ssRNA) regions: as each nucleotide bears about a dozen DOF (atomic angles), the conformational space cannot be enumerated beyond a few nucleotides.
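To give a sense of scale, the growth of the conformational space can be illustrated with a back-of-the-envelope count. The sketch below assumes 12 DOF per nucleotide (the "dozen" mentioned above) and, purely for illustration, only 3 allowed values per DOF; both constants and the function name are hypothetical choices, not measured quantities.

```python
# Illustrative count of discretized ssRNA conformations.
# Assumptions (not from any experiment): 12 DOF per nucleotide,
# each DOF restricted to 3 discrete values.
DOF_PER_NUCLEOTIDE = 12
VALUES_PER_DOF = 3


def conformation_count(n_nucleotides,
                       dof_per_nucleotide=DOF_PER_NUCLEOTIDE,
                       values_per_dof=VALUES_PER_DOF):
    """Number of conformations when every DOF varies independently."""
    return values_per_dof ** (dof_per_nucleotide * n_nucleotides)


if __name__ == "__main__":
    for n in (1, 2, 5, 10):
        print(f"{n:2d} nucleotides: {conformation_count(n):.3e} conformations")
```

Even under this very coarse discretization the count grows exponentially with chain length, which is why exhaustive enumeration is impractical beyond a few nucleotides.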
To bypass this combinatorial explosion, we have designed a fragment-based docking algorithm that discretizes the search space: we first dock onto the protein surface a set of small fragments that represent all possible local conformations of ssRNA, then combinatorially reassemble the docked fragments to recover a contiguous ssRNA structure. As the number of chains of compatible fragments is typically beyond the reach of brute-force approaches, we have implemented a stochastic backtracking algorithm that performs an unbiased sampling of chains from the fragment connectivity graph, after computing the partition function of each pose by dynamic programming.
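The sampling step can be sketched as follows. This is a minimal illustration under stated assumptions, not the actual implementation: the connectivity graph is assumed to be layered (one list of docked poses per fragment position), `edges` maps each pose to the compatible poses in the next layer, and every pose has weight 1 unless a `weight` is supplied, so that the partition function of a pose simply counts the chains that start from it. The names `partition_functions` and `sample_chain` and the data layout are hypothetical.

```python
import random


def partition_functions(layers, edges, weight=None):
    """Backward dynamic programming over a layered pose graph.

    layers: list of lists of unique pose ids, one list per fragment position.
    edges:  dict mapping a pose id to the compatible pose ids in the next layer.
    weight: optional dict of positive pose weights (default 1.0 per pose).
    Returns Z, where Z[p] is the total weight of all chains that start at
    pose p and reach the last layer.
    """
    weight = weight or {}
    Z = {}
    for p in layers[-1]:
        Z[p] = weight.get(p, 1.0)
    for layer in reversed(layers[:-1]):
        for p in layer:
            Z[p] = weight.get(p, 1.0) * sum(Z[q] for q in edges.get(p, ()))
    return Z


def sample_chain(layers, edges, Z, rng=random):
    """Stochastic backtracking: draw one complete chain.

    Each pose is drawn with probability proportional to its partition
    function, so a chain is sampled with probability equal to its weight
    divided by the total weight (uniform when all weights are 1).
    """
    candidates = [p for p in layers[0] if Z[p] > 0]
    if not candidates:
        raise ValueError("no complete chain exists in this graph")
    chain = []
    for _ in range(len(layers)):
        pose = rng.choices(candidates, weights=[Z[p] for p in candidates])[0]
        chain.append(pose)
        # Dead-end poses have Z == 0 and are never selected.
        candidates = [q for q in edges.get(pose, ()) if Z[q] > 0]
    return chain


if __name__ == "__main__":
    # Toy graph: three fragment positions; pose "a2" is a dead end.
    layers = [["a0", "a1", "a2"], ["b0", "b1"], ["c0"]]
    edges = {"a0": ["b0", "b1"], "a1": ["b1"], "b0": ["c0"], "b1": ["c0"]}
    Z = partition_functions(layers, edges)
    print("chains in total:", sum(Z[p] for p in layers[0]))
    print("sampled chain:", sample_chain(layers, edges, Z))
```

Because the partition functions are computed once and each draw only walks the graph from the first layer to the last, many chains can be sampled without ever enumerating the full set of compatible chains.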
This method could model ab initio a protein-bound ssRNA of up to 10 nucleotides, an unprecedented length far beyond the reach of standard small-molecule docking programs.