RNA structure models generated by FARFAR2 benchmarking


Abstract/Contents

Abstract

We have updated the Das lab code for fragment assembly of RNA with full-atom refinement (FARFAR) to consolidate a number of new best practices. In the process of developing and testing these best practices, we created three benchmark sets: FARFAR2-Classics, FARFAR2-Motifs, and FARFAR2-Puzzles. These data sets describe different structure prediction challenges (18 small whole RNAs, 82 small RNA sub-structures, and 21 large simulated blind prediction tasks, respectively), and here we deposit millions of RNA models generated in an attempt to solve them. These models have been scored with Rosetta energy functions, and their heavy-atom RMSDs to the native structure (the correct answer) have been measured. We hope this data set will serve as a starting point for further high-resolution refinement and as labeled training data for energy function development or machine learning research.

In a 2020 revision to this PURL, we added RESTRAINED.tar, which contains an additional set of models for each of the 21 RNA-Puzzles used to benchmark this method. To identify the remaining bottlenecks in the FARFAR2 method, we generated these models through simulations augmented with coordinate restraints to the crystal structure. In an ideal scoring function, these native-like models, almost all of which are within 1 Å of the crystal structure, would score much better than any models further away.
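The "ideal scoring function" criterion above can be phrased as a simple check. The sketch below is hypothetical: it assumes you have already extracted (score, RMSD) pairs for pooled restrained and unrestrained models (the tuple layout is illustrative, not the deposit's file format), uses the 1 Å cutoff mentioned in the text, and relies on the Rosetta convention that lower scores are more favorable. It asks whether even the worst-scoring native-like model beats the best-scoring model outside the cutoff.

```python
from typing import List, Tuple

# Hypothetical records: (Rosetta total score, heavy-atom RMSD to native in Angstroms).
# Lower Rosetta scores are more favorable.
ScoredModel = Tuple[float, float]


def score_gap(models: List[ScoredModel], native_cutoff: float = 1.0) -> float:
    """Best score among models at or beyond `native_cutoff` minus the worst
    score among native-like models.

    A positive gap means every native-like model out-scores every model
    further away, which is what an ideal scoring function would show.
    """
    near = [score for score, rmsd in models if rmsd < native_cutoff]
    far = [score for score, rmsd in models if rmsd >= native_cutoff]
    if not near or not far:
        raise ValueError("need models on both sides of the RMSD cutoff")
    return min(far) - max(near)


# Made-up numbers: two restrained, native-like models pooled with two de novo models.
pooled = [(-120.0, 0.6), (-118.5, 0.9), (-110.0, 4.2), (-115.0, 7.5)]
print(score_gap(pooled))  # 3.5
```

A negative gap on real data would point to a scoring (rather than sampling) bottleneck, since native-like conformations were available but not preferred.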

One important note: the RMSDs associated with these models are heavy-atom RMSDs calculated with Rosetta, which makes an uncommon but important choice that may lead to small deviations from other software in a limited set of cases. In these RMSD calculations, "free moieties" in the experimental structure (that is, moieties that make no atomic contacts with other residues) are omitted. (The intuition is that it may be enough to predict that they are non-interacting!) This behavior can be turned off in Rosetta with the flag -virtualize_free_moieties_in_native false, to confirm that the RMSD otherwise matches that reported by other prediction software. Such "free moieties" rarely make a significant difference at "RNA Puzzle scale"; one case where they do appear to matter is RNA Puzzle 1 (PDB code: 3MEI).
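The effect of omitting free moieties can be illustrated with a minimal pure-Python sketch. This is not Rosetta's implementation: the 4.0 Å contact cutoff, the choice to ignore contacts with covalently bonded sequence neighbors, and the data layout (residue number mapped to heavy-atom coordinates) are all hypothetical. The sketch also assumes the model is already superimposed on the native with matching residue numbers and atom names, so it skips the superposition step a real RMSD tool would perform.

```python
import math
from typing import Dict, Set, Tuple

Coord = Tuple[float, float, float]
# Hypothetical layout: residue number -> {heavy-atom name -> (x, y, z)}
Structure = Dict[int, Dict[str, Coord]]

# Hypothetical contact cutoff in Angstroms; Rosetta's actual criterion may differ.
CONTACT_CUTOFF = 4.0


def free_moieties(native: Structure, cutoff: float = CONTACT_CUTOFF) -> Set[int]:
    """Residues whose heavy atoms come no closer than `cutoff` to any
    non-adjacent residue (i +/- 1 are skipped because they are covalently bonded)."""
    free = set()
    for i, atoms_i in native.items():
        in_contact = any(
            math.dist(a, b) < cutoff
            for j, atoms_j in native.items()
            if abs(i - j) > 1
            for a in atoms_i.values()
            for b in atoms_j.values()
        )
        if not in_contact:
            free.add(i)
    return free


def heavy_atom_rmsd(model: Structure, native: Structure, skip_free: bool = True) -> float:
    """RMSD over native heavy atoms, assuming the model is already superimposed
    on the native and uses the same residue numbers and atom names."""
    skip = free_moieties(native) if skip_free else set()
    sq_sum, n = 0.0, 0
    for res, atoms in native.items():
        if res in skip:
            continue  # omit free moieties, mimicking the behavior described above
        for name, xyz in atoms.items():
            sq_sum += math.dist(xyz, model[res][name]) ** 2
            n += 1
    return math.sqrt(sq_sum / n)


# Toy four-residue example with one "P" atom per residue (made-up coordinates).
native = {1: {"P": (0.0, 0.0, 0.0)}, 2: {"P": (0.0, 10.0, 0.0)},
          3: {"P": (0.0, 3.0, 0.0)}, 4: {"P": (3.0, 0.0, 0.0)}}
# The model differs from the native only at residue 2, which is moved by 4 A.
model = {**native, 2: {"P": (4.0, 10.0, 0.0)}}

print(free_moieties(native))                             # {2}
print(heavy_atom_rmsd(model, native))                    # 0.0
print(heavy_atom_rmsd(model, native, skip_free=False))   # 2.0
```

In this toy case the only deviation sits on a residue that touches nothing in the native, so the free-moiety rule reports a perfect model while a conventional all-atom RMSD reports 2.0 Å; this is the kind of discrepancy the note above warns about.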

Description

Type of resource software, multimedia
Date created 2020

Creators/Contributors

Author Watkins, Andrew
Author Das, Rhiju
Author Rangan, Ramya

Subjects

Subject Biochemistry
Subject School of Medicine
Subject RNA
Subject structure prediction
Genre Dataset

Bibliographic information

Related Publication Watkins, A.M., Rangan, R., Das, R. (2020) "FARFAR2: Improved de novo Rosetta prediction of complex global RNA folds" Structure 28 (8) : 963 - 976
Location https://purl.stanford.edu/wn364wz7925

Access conditions

Use and reproduction
User agrees that, where applicable, content will not be used to identify or to otherwise infringe the privacy or confidentiality rights of individuals. Content distributed via the Stanford Digital Repository may be subject to additional license and use restrictions applied by the depositor.
License
This work is licensed under a Creative Commons Attribution Share Alike 3.0 Unported license (CC BY-SA).

Preferred citation

Watkins, Andrew and Das, Rhiju. (2019). RNA structure models generated by FARFAR2 benchmarking. Stanford Digital Repository. Available at: https://purl.stanford.edu/wn364wz7925
