Structural data used to test a new geometric deep learning RNA scoring function emulating fully de novo modeling conditions

Watkins, Andrew; Rangan, Ramya; Townshend, Raphael; Eismann, Stephan; Karelina, Masha; Dror, Ron; Das, Rhiju

doi:10.25740/sq987cc0358

Abstract/Contents

Abstract

This deposition contains two primary subdirectories, each compressed as a
tarball. (Depositions in the Stanford Digital Repository must have nominally flat
directory structures; tarballs work around this constraint.) The subdirectory
"nonnative_secstruct" corresponds to the originally published version of the
paper. The subdirectory "xtal_secstruct" corresponds to the corrected version of
the paper.

For 11 of the 16 RNA molecules in this benchmark (benchmark 2 of the paper), all
of the candidate models in "nonnative_secstruct" were constructed using
incorrect Watson-Crick base pairing, in contrast to what would typically happen
in actual blind structure prediction (where modelers might try multiple
secondary structures and pick those that generate realistic 3D models). In
"xtal_secstruct", the candidate models were regenerated using correct
Watson-Crick base pairing.

Each of these primary subdirectories contains 16 .tar.gz compressed directories,
one for each of the 16 distinct RNA molecules in the benchmark; each holds 5,000
structural models of that molecule in PDB format. These models were used to
benchmark a new scoring function for RNA structure.
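As a quick integrity check after download, the layout described above can be verified by counting PDB files in each per-molecule archive. The following is a minimal sketch, not part of the deposition: it assumes the primary tarball has already been unpacked into a local directory (here called xtal_secstruct), that model files carry a .pdb extension, and that any inputs.tar.gz in that directory should be skipped. The function names are hypothetical.

```python
import tarfile
from pathlib import Path


def count_pdb_models(archive_path):
    """Count regular-file members ending in .pdb inside one .tar.gz archive."""
    with tarfile.open(archive_path, "r:gz") as archive:
        return sum(
            1
            for member in archive
            # Assumption: models are stored as regular files named *.pdb.
            if member.isfile() and member.name.lower().endswith(".pdb")
        )


def summarize_models(directory):
    """Map each per-molecule archive in `directory` to its PDB model count."""
    return {
        path.name: count_pdb_models(path)
        for path in sorted(Path(directory).glob("*.tar.gz"))
        # inputs.tar.gz holds rerun inputs, not candidate models.
        if path.name != "inputs.tar.gz"
    }


if __name__ == "__main__":
    # Assumed local directory name; adjust to wherever the tarball was unpacked.
    for name, count in summarize_models("xtal_secstruct").items():
        print(f"{name}\t{count}")
```

If the description above holds, the summary should list 16 archives with 5,000 models each; any other count would suggest an incomplete download or a different file-naming convention than assumed here.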

The first priority in choosing the cases studied in this benchmark was ensuring
that they did not overlap with any RNA molecules studied previously in the ARES
project. This benchmark therefore does not represent a comprehensive account of
RNA structure and was not meant to: it is just one of several ways to select a
set of structures complementary to those studied previously in the paper. These
models will therefore likely be of archival value alone; alternatively, the
models, or the modeling problems they address, could make up part of a larger
comprehensive benchmark.

To that end, in each primary subdirectory we have also included inputs.tar.gz,
which provides all the files necessary for rerunning these benchmark cases
yourself. Models were generated with FARFAR2, code for RNA fragment assembly
documented extensively at https://new.rosettacommons.org/docs/latest/FARFAR2.
That documentation should demystify the executable commands found in each
README_FARFAR file.
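To read the executable commands without unpacking everything, the README_FARFAR files can be pulled directly out of inputs.tar.gz. This is a hedged sketch: the internal layout of inputs.tar.gz is not described here, so the code simply collects every regular file whose base name starts with README_FARFAR, wherever it sits in the archive. The function name is hypothetical.

```python
import tarfile
from pathlib import PurePosixPath


def read_farfar_readmes(inputs_archive):
    """Return {member path: text} for each README_FARFAR file in the archive."""
    readmes = {}
    with tarfile.open(inputs_archive, "r:gz") as archive:
        for member in archive:
            base_name = PurePosixPath(member.name).name
            # Assumption: the files are named README_FARFAR, possibly with a suffix.
            if member.isfile() and base_name.startswith("README_FARFAR"):
                handle = archive.extractfile(member)
                readmes[member.name] = handle.read().decode("utf-8", errors="replace")
    return readmes


if __name__ == "__main__":
    # Assumed local path to the downloaded archive.
    for path, text in read_farfar_readmes("inputs.tar.gz").items():
        print(f"== {path} ==")
        print(text)
```

Printing each file alongside its path keeps the commands associated with the benchmark case they belong to, which helps when cross-referencing the FARFAR2 documentation.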

If you are already familiar with the Das lab's repository for RNA benchmarking,
you can use that system to set up replications of the xtal_secstruct simulations
at https://github.com/DasLab/rna_benchmark, using the benchmark definition file
ares_benchmark2.txt.

If you are interested in training a new RNA scoring function or sampling method,
consider the FARFAR2-Classics and FARFAR2-Puzzles benchmarks, available at
https://purl.stanford.edu/wn364wz7925.

Description

Type of resource	Dataset
Date modified	September 8, 2022; October 5, 2022; October 5, 2022; November 1, 2022
Publication date	September 2, 2022

Creators/Contributors

Author	Watkins, Andrew	https://orcid.org/0000-0003-1617-1720 (unverified)
Author	Rangan, Ramya	https://orcid.org/0000-0002-0960-0825 (unverified)
Author	Townshend, Raphael	https://orcid.org/0000-0001-6362-1451 (unverified)
Author	Eismann, Stephan
Author	Karelina, Masha	https://orcid.org/0000-0003-1880-4536 (unverified)
Author	Dror, Ron	https://orcid.org/0000-0002-6418-2793 (unverified)
Author	Das, Rhiju	https://orcid.org/0000-0001-7497-0972 (unverified)

Subjects

Subject	Biochemistry
Subject	RNA structure
Subject	fragment assembly
Subject	blind prediction
Subject	Computer science
Subject	Deep learning (Machine learning)
Genre	Data
Genre	Data sets
Genre	Dataset

Bibliographic information

Related item	Citation: R. J. L. Townshend, S. Eismann, A. M. Watkins, et al., Science 373, 1047 (2021).
DOI	https://doi.org/10.25740/sq987cc0358
Location	https://purl.stanford.edu/sq987cc0358

Access conditions

Use and reproduction: User agrees that, where applicable, content will not be used to identify or to otherwise infringe the privacy or confidentiality rights of individuals. Content distributed via the Stanford Digital Repository may be subject to additional license and use restrictions applied by the depositor.
License: This work is licensed under a Creative Commons Attribution Share Alike 4.0 International license (CC BY-SA).

Preferred citation

Preferred citation: Watkins, Andrew M. and Rangan, Ramya and Townshend, Raphael J. L. and Eismann, Stephan and Karelina, Masha and Dror, Ron O. and Das, Rhiju. (2021). Structural data used to test a new geometric deep learning RNA scoring function emulating fully de novo modeling conditions. Stanford Digital Repository. Available at: https://purl.stanford.edu/sq987cc0358 https://doi.org/10.25740/sq987cc0358

Collection

Stanford Research Data

Contact information

Contact: watkina6@gene.com
