NIST Data Publication: Supporting data for "DNA polymerase characteristics influence noise levels in sequencing of short tandem repeats"
Version 1.0.0
DOI: https://doi.org/10.18434/mds2-4088

Authors:
Tova Lindh
  Lund University, Division of Biotechnology and Applied Microbiology, Department of Process and Life Science Engineering
Maja Sidstedt
  National Forensic Center, Swedish Police Authority
Kevin M. Kiesler
  National Institute of Standards and Technology, Biomolecular Measurement Division
Peter M. Vallone
  National Institute of Standards and Technology, Biomolecular Measurement Division
Johannes Hedman
  Lund University, Division of Biotechnology and Applied Microbiology, Department of Process and Life Science Engineering

Contact:
Kevin Kiesler, kevin.kiesler@nist.gov

Description:
Polymerase chain reaction (PCR) applications, including sequencing, rely on accurate thermostable DNA polymerases. Polymerization errors may hinder the detection of low-level DNA variants, such as mutations in clinical samples or DNA from minor contributors in crime scene traces. Short Tandem Repeat (STR) markers are particularly affected by artefacts. In addition to random base substitutions, the repeated structure of STRs makes them prone to the formation of stutter products. However, the mechanisms leading to stutter formation have not yet been fully elucidated. Here, we applied an STR assay based on Unique Molecular Identifiers (UMIs) to study how DNA polymerases with different characteristics affect amplicon yield and the formation of PCR errors. The use of UMIs made it possible to study how error formation is affected by using either genomic DNA (mimicking the early PCR cycles) or amplicons (mimicking the later cycles) as template. The levels of base substitutions were clearly connected to the fidelity of the DNA polymerases, which in turn was coupled with the presence of an integrated 3' to 5' exonuclease (proofreading) domain.
Stutter formation, on the other hand, was not as directly associated with fidelity, as two high-fidelity polymerases showed quite different levels of stutter. DNA binding domains generally improve processivity, which could lower the incidence of stutter. However, this effect was not evident in the present study, as a polymerase having a DNA binding domain gave the highest stutter levels. Overall, the degree of polymerase stuttering is likely due to several different DNA polymerase characteristics. Identifying a DNA polymerase that provides low levels of both stutter and base substitutions may enable the detection of low-level variants, such as DNA from minor contributors in mixed forensic traces.

--------------
Data Use Notes
--------------
This data is publicly available according to the NIST statements of copyright, fair use and licensing; see https://www.nist.gov/director/copyright-fair-use-and-licensing-statements-srd-data-and-software

You may cite the use of this data as follows:
Lindh, Tova, Sidstedt, Maja, Kiesler, Kevin M., Vallone, Peter M., Hedman, Johannes (2026), Supporting data for "DNA polymerase characteristics influence noise levels in sequencing of short tandem repeats", Version 1.0.0, National Institute of Standards and Technology, https://doi.org/10.18434/mds2-4088 (Accessed: [give download date])

-------------
Data Overview
-------------
The repository contains the files produced by the sequencing instrument, in FASTQ format, which constitute all data from the experiments performed.

File naming structure:
"Number"_"Polymerase used in barcoding PCR"_"Polymerase used in adaptor PCR"_"DNA sample"_"Input amount"_"replicate"

Where:
  "Number" is the sample barcode ID (1 through 88) for each individual library preparation.
  "Polymerase used in barcoding PCR" is the enzyme used in the first steps of PCR to introduce the Unique Molecular Identifier (UMI) sequence tag.
  "Polymerase used in adaptor PCR" is the enzyme used in the second phase of amplification to generate high-concentration PCR products with sequencing adaptors at the 5' and 3' ends.
  "DNA sample" is the name of the DNA template used, which may be 2800M, NIST SRM 2391d Component C, or one of ten single-source samples used as a testbed (SS1 through SS10).
  "Input amount" is the quantity of DNA (in ng) used as template for the PCR amplification.
  "replicate" is the number of the replicate library preparation for the sample (either 1 or 2).

---------------
Version History
---------------
1.0.0 (this version): initial release
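As an illustration only, the file naming structure described under Data Overview can be split into its six fields with a short parser. This is a hypothetical sketch, not part of the data publication. It assumes that no field contains an underscore, that names end in ".fastq" or ".fastq.gz" (the extension is not stated in this README), and that there are no extra instrument suffixes such as read or lane tags. The helper parse_sample_name, the SampleName class, and the polymerase names PolA and PolB used below are placeholders.

```python
from dataclasses import dataclass
from pathlib import PurePath

# Assumed file extensions; the README does not state them.
# ".fastq.gz" is checked first so it is not mistaken for ".fastq".
_SUFFIXES = (".fastq.gz", ".fastq")


@dataclass(frozen=True)
class SampleName:
    number: int         # sample barcode ID, 1 through 88
    barcoding_pcr: str  # polymerase used in the barcoding (UMI) PCR
    adaptor_pcr: str    # polymerase used in the adaptor PCR
    dna_sample: str     # e.g. 2800M, an SRM 2391d label, or SS1..SS10
    input_ng: str       # DNA input in ng, kept as the raw string
    replicate: int      # replicate library preparation, 1 or 2


def parse_sample_name(filename: str) -> SampleName:
    """Hypothetical helper: split
    Number_BarcodingPol_AdaptorPol_DNAsample_Input_Replicate[.fastq[.gz]]
    into its fields. Assumes no field itself contains an underscore."""
    stem = PurePath(filename).name  # drop any directory part
    for suffix in _SUFFIXES:
        if stem.endswith(suffix):
            stem = stem[: -len(suffix)]
            break
    fields = stem.split("_")
    if len(fields) != 6:
        raise ValueError(
            f"expected 6 underscore-separated fields, got {len(fields)}: {stem!r}"
        )
    number, bc_pol, ad_pol, sample, amount, rep = fields
    parsed = SampleName(int(number), bc_pol, ad_pol, sample, amount, int(rep))
    # Range checks taken from the field descriptions in this README.
    if not 1 <= parsed.number <= 88:
        raise ValueError(f"sample number out of range 1-88: {parsed.number}")
    if parsed.replicate not in (1, 2):
        raise ValueError(f"replicate must be 1 or 2: {parsed.replicate}")
    return parsed


if __name__ == "__main__":
    print(parse_sample_name("12_PolA_PolB_SS3_1_2.fastq.gz"))
```

The input amount is kept as a string because this README does not specify how fractional amounts are written in file names; convert it with float() only after checking the actual names in the repository.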