Identification of DNA Termini in Sequencing Data through Combined Analysis of End Capture and Local Strand Bias

Abstract/Contents

Abstract
Detecting DNA termini, such as the ends of linear extrachromosomal DNA, plays an essential role in understanding the structure and functions of DNA molecules. Here we describe an approach combining direct and indirect computational methods to detect DNA termini in next-generation short-read sequencing data. While a direct inference of ends can come from mapping the specific capture points of DNA fragments, this approach is insufficient for analytical pipelines in which the DNA termini are not captured. Thus, we add an indirect detection of ends based on strand bias, the difference in sequence representation between the plus and minus strands of DNA in a dataset. Termini are reflected by a strong strand bias, with inward-facing reads greatly enriched over outward-facing reads in the immediate proximity of any end. Applying this analysis to negative control regions (where DNA is continuous, with no known termini), we observe no strong end capture peaks or strand bias. Applying it to positive control regions where known DNA termini are present yields strong strand bias signals even in cases where blocked termini prevent end capture (for a protein-blocked adenovirus), or where ends are not explicitly captured (tagmentation of restriction-digested lambda DNA). Analysis of a more complex situation (HIV replication) produces a picture that includes both the known termini of the reverse-transcribed genome (the PBS [primer binding site] on the negative strand and the PPT [3’ polypurine tract] on the positive strand) and a signal corresponding to a previously described additional initiation site for second strand synthesis (cPPT [central polypurine tract]). These results confirm the ability to detect DNA structural discontinuities in a pooled sample where high-throughput shotgun sequence data are available.
In addition to the known initiation sequences in the HIV genome, we detect a signal of positive-strand DNA termini at several other positions on the plus-strand sequence. These sites share several characteristics with the previously characterized second strand initiation sites (the cPPT and 3’ PPT sites): (i) an observed spike in directly captured cDNA ends, (ii) an indirect terminus signal evident in localized strand bias, (iii) a strong preference for forward-facing termini, (iv) an upstream purine-rich motif, and (v) a decrease in terminus signal at late time points after infection. These characteristics are consistent in duplicate samples in two different genotypes (wild-type and integrase-lacking HIV). The observation of distinct internal termini associated with multiple purine-rich regions suggests the possibility that multiple internal initiations of second strand synthesis might contribute to HIV replication through acceleration of second strand synthesis and/or strand displacement at the HIV 3’ end.
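
The two signals described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's pipeline: the `Read` record, the function names, the window size, the pseudocount, and the log2 plus/minus ratio used as the strand-bias score are all assumptions chosen for illustration, and the reads come from a hypothetical toy molecule rather than real alignments. The toy data give the left terminus extra captured fragment ends but leave the right terminus without any, loosely mimicking a case where one end is captured directly and the other is visible only through strand bias.

```python
import math
from collections import Counter
from typing import Dict, Iterable, List, NamedTuple


class Read(NamedTuple):
    start: int   # leftmost aligned reference position (0-based, inclusive)
    end: int     # rightmost aligned reference position (inclusive)
    strand: str  # "+" or "-"


def five_prime(read: Read) -> int:
    """Reference position of the read's 5' end."""
    return read.start if read.strand == "+" else read.end


def end_capture_counts(reads: Iterable[Read]) -> Dict[str, Counter]:
    """Direct signal: tally read 5' ends per strand and position."""
    counts = {"+": Counter(), "-": Counter()}
    for r in reads:
        counts[r.strand][five_prime(r)] += 1
    return counts


def strand_log_ratio(reads: Iterable[Read], lo: int, hi: int,
                     pseudocount: float = 1.0) -> float:
    """Indirect signal (assumed score): log2(plus/minus) over reads whose
    5' end falls in the half-open window [lo, hi)."""
    plus = minus = 0
    for r in reads:
        if lo <= five_prime(r) < hi:
            if r.strand == "+":
                plus += 1
            else:
                minus += 1
    return math.log2((plus + pseudocount) / (minus + pseudocount))


def toy_reads(length: int = 1000, frag: int = 300, read_len: int = 50,
              step: int = 10, end_copies: int = 20) -> List[Read]:
    """Hypothetical read pairs from a linear molecule spanning [0, length)."""
    starts = list(range(0, length - frag + 1, step))
    starts += [0] * end_copies  # extra fragments beginning exactly at the left terminus
    reads = []
    for s in starts:
        reads.append(Read(s, s + read_len - 1, "+"))                # left mate, faces right
        reads.append(Read(s + frag - read_len, s + frag - 1, "-"))  # right mate, faces left
    return reads


reads = toy_reads()
counts = end_capture_counts(reads)
left = strand_log_ratio(reads, 0, 100)       # window at the left terminus
middle = strand_log_ratio(reads, 400, 500)   # interior window
right = strand_log_ratio(reads, 900, 1000)   # window at the right terminus
print(counts["+"][0], counts["+"][10], left, middle, right)
```

In this toy, the plus-strand 5' end count at position 0 stands out against neighboring positions (the direct signal), while the windowed score is strongly positive at the left terminus, zero in the interior, and strongly negative at the right terminus (the indirect signal), in line with inward-facing reads dominating near any end. With real data, the `Read` records would be derived from aligned reads, and the window size would need to be chosen relative to the fragment length distribution.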

Description

Type of resource text
Publication date May 4, 2023

Creators/Contributors

Author Wang, William (ORCiD: https://orcid.org/0000-0002-0878-1257, unverified)
Thesis advisor Fire, Andrew (ORCiD: https://orcid.org/0000-0001-6217-8312, unverified)
Thesis advisor Walbot, Virginia (ORCiD: https://orcid.org/0000-0002-1596-7279, unverified)
Degree granting institution Stanford University, Department of Biology

Subjects

Subject Genetics
Subject Molecular genetics
Subject Genomics
Subject Cancer
Subject DNA
Subject DNA termini
Subject Extrachromosomal DNA
Subject Linear extrachromosomal DNA
Subject Next-generation sequencing (NGS)
Subject High-throughput sequencing (HTS)
Subject Illumina
Subject HIV
Subject Retrovirus
Subject Bacteriophage
Subject Phage
Subject Bacteriophage lambda
Subject Restriction enzyme
Subject HIV replication
Subject HIV reverse transcription
Subject Double-stranded DNA
Subject Single-stranded DNA
Subject RNA
Subject Purine-rich sequences
Subject Central polypurine tract (cPPT)
Subject Alternative polypurine tracts (altPPT)
Subject Adenovirus
Subject Mitochondrial DNA
Subject Mitochondria
Subject Caenorhabditis elegans
Subject Chlamydomonas reinhardtii
Subject Innate immunity
Subject Biology
Genre Text
Genre Thesis

Access conditions

Use and reproduction
User agrees that, where applicable, content will not be used to identify or to otherwise infringe the privacy or confidentiality rights of individuals. Content distributed via the Stanford Digital Repository may be subject to additional license and use restrictions applied by the depositor.
License
This work is licensed under a Creative Commons Attribution Non Commercial 4.0 International license (CC BY-NC).

Preferred citation

Wang, W. and Fire, A. (2023). Identification of DNA Termini in Sequencing Data through Combined Analysis of End Capture and Local Strand Bias. Stanford Digital Repository. Available at https://purl.stanford.edu/xf213fn4785. https://doi.org/10.25740/xf213fn4785.

Collection

Undergraduate Theses, Department of Biology, 2022-2023