RNA structure models generated by FARFAR2 benchmarking
We have updated the Das lab code for the fragment assembly of RNA with full-atom refinement (FARFAR) to consolidate a number of new best practices. In the process of developing and testing these best practices, we have created three benchmark sets: FARFAR2-Classics, FARFAR2-Motifs, and FARFAR2-Puzzles. These data sets pose different structure prediction challenges (respectively, 18 small whole RNAs, 82 small RNA sub-structures, and 21 large simulated blind prediction tasks), and here we deposit millions of RNA models generated in an attempt to solve them. These models have been scored with Rosetta energy functions, and their heavy-atom RMSDs to the native structure (the correct answer) have been measured. We hope this data set will serve as a source of starting points for further high-resolution refinement and as labeled training data for energy function development or machine learning research.
In a 2020 revision to this PURL, we added RESTRAINED.tar, which contains an additional set of models for each of the 21 RNA-Puzzles used to benchmark this method. To determine the remaining bottlenecks in the FARFAR2 method, we generated these additional models through simulations augmented with restraints to the crystal structure's coordinates. Under an ideal scoring function, these native-like models, almost all of which are less than 1 Å from the crystal structure, would score much better than any models further away.
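The comparison described above can be sketched as a simple energy-gap check. This is a minimal illustration, not part of the deposited data or the FARFAR2 code: the function name `native_energy_gap`, the (score, RMSD) pair layout, the 1 Å cutoff as a default, and the toy numbers are all assumptions made for the example. It relies only on the convention that lower Rosetta scores are better.

```python
def native_energy_gap(models, rmsd_cutoff=1.0):
    """Return (best far-from-native score) - (best near-native score).

    `models` is a list of (score, rmsd) pairs; lower scores are better.
    A positive gap means the best near-native model outscores every
    model at or beyond the cutoff.
    """
    near = [score for score, rmsd in models if rmsd < rmsd_cutoff]
    far = [score for score, rmsd in models if rmsd >= rmsd_cutoff]
    if not near or not far:
        raise ValueError("need at least one model on each side of the cutoff")
    return min(far) - min(near)


# Toy (score, rmsd) pairs, invented purely for illustration.
toy_models = [(-310.0, 0.6), (-305.5, 0.9), (-298.0, 4.2), (-301.0, 7.5)]
gap = native_energy_gap(toy_models)
print(gap)
```

A gap near zero or below zero for a given puzzle would suggest that scoring, rather than sampling, is the limiting factor for that target.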
One important note is that the RMSDs associated with these models are heavy-atom RMSDs calculated with Rosetta, which makes an uncommon but important choice that may lead to small deviations from other software in a limited set of cases. For these RMSD calculations, "free moieties" in the experimental structure (that is, those that do not make any atomic contacts with other residues) are omitted from the calculation. (The intuition is that it may be enough to predict that they are non-interacting.) This behavior may be turned off in Rosetta (-virtualize_free_moieties_in_native false) to confirm that the RMSD otherwise matches that reported by other software. It is rare for such "free moieties" to make a significant difference in cases at "RNA Puzzle scale"; one case where it does appear to matter is RNA Puzzle 1, PDB code: 3MEI.
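The effect of omitting free moieties can be illustrated with a small sketch. It is not Rosetta's implementation: the function name `masked_rmsd`, the dictionary layout keyed by (residue number, atom name), and the toy coordinates are assumptions for illustration. The sketch also assumes the model has already been superimposed on the native structure and that the set of free atoms is supplied by the caller, since how Rosetta identifies them is not reproduced here.

```python
import math


def masked_rmsd(native, model, free_atoms=frozenset()):
    """RMSD over heavy atoms present in both structures, skipping free atoms.

    `native` and `model` map (residue_number, atom_name) -> (x, y, z).
    No superposition is performed; coordinates are compared as given.
    """
    keys = [k for k in native if k in model and k not in free_atoms]
    if not keys:
        raise ValueError("no atoms left to compare")
    total = sum(
        (a - b) ** 2
        for k in keys
        for a, b in zip(native[k], model[k])
    )
    return math.sqrt(total / len(keys))


# Toy coordinates, invented for illustration: only the N1 atom differs.
native = {(1, "P"): (0.0, 0.0, 0.0), (1, "C4'"): (1.0, 0.0, 0.0), (2, "N1"): (0.0, 0.0, 5.0)}
model = {(1, "P"): (0.0, 0.0, 0.0), (1, "C4'"): (1.0, 0.0, 0.0), (2, "N1"): (0.0, 0.0, 8.0)}

full = masked_rmsd(native, model)
masked = masked_rmsd(native, model, free_atoms={(2, "N1")})
print(full, masked)
```

In this toy case the only deviation sits on the atom treated as free, so excluding it drops the RMSD to zero while the unmasked value stays above 1.7 Å. That is the kind of discrepancy one might see when comparing against software that includes every atom.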
School of Medicine
Watkins, A.M., Rangan, R., Das, R. (2020). "FARFAR2: Improved de novo Rosetta prediction of complex global RNA folds." Structure 28(8): 963-976.
- Use and reproduction
- User agrees that, where applicable, content will not be used to identify or to otherwise infringe the privacy or confidentiality rights of individuals. Content distributed via the Stanford Digital Repository may be subject to additional license and use restrictions applied by the depositor.
- This work is licensed under a Creative Commons Attribution Share Alike 3.0 Unported license (CC BY-SA).
- Preferred Citation
- Watkins, Andrew and Das, Rhiju. (2019). RNA structure models generated by FARFAR2 benchmarking. Stanford Digital Repository. Available at: https://purl.stanford.edu/wn364wz7925