Improvements from applying filters at k-mer counting time in kevlar

Permalink: 2018-07-16 by Daniel S. Standage in blog tags: kevlar

One of the fundamental insights of the kevlar de novo variant caller is the framing of the variant discovery problem as a search for novel k-mers. In this case, "novel" means abundant in the focal sample and effectively absent from all control samples. In the early stages of creating kevlar, it quickly became clear that many k-mers satisfying these simple …
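As a rough illustration of the "novel k-mer" definition above, here is a minimal sketch using exact counts held in memory. The function names and the thresholds `min_focal` and `max_control` are hypothetical, chosen only for this example; they are not kevlar's API or defaults. The sketch also ignores reverse complements and sequencing errors.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every k-mer (substring of length k) across a collection of reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def novel_kmers(focal_reads, control_read_sets, k, min_focal=5, max_control=1):
    """Return k-mers abundant in the focal sample and (nearly) absent from all controls.

    min_focal and max_control are illustrative thresholds (assumptions),
    not values taken from kevlar.
    """
    focal = count_kmers(focal_reads, k)
    controls = [count_kmers(reads, k) for reads in control_read_sets]
    return {
        kmer
        for kmer, abundance in focal.items()
        if abundance >= min_focal
        and all(control[kmer] <= max_control for control in controls)
    }
```

For example, `novel_kmers(focal, [mother, father], k=31)` would return the set of 31-mers seen at least 5 times in `focal` and at most once in each parent, where `focal`, `mother`, and `father` are lists of read strings.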


Information content versus data volume and k-mer counting accuracy

Keeping track of k-mers for simple operations has become a fundamental component of many bioinformatics techniques. Two common operations on k-mers are set membership queries ("is k-mer X present in data set Y?") and abundance queries ("how many times does k-mer X occur in data set Y?"). Several probabilistic data structures have been developed to support …
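To make the two queries concrete, here is a minimal exact baseline built on Python's `collections.Counter`. It is only a sketch: the helper names are hypothetical, and an exact table like this stores every distinct k-mer, which is the memory cost that probabilistic structures are designed to avoid.

```python
from collections import Counter


def iter_kmers(sequence, k):
    """Yield each substring of length k from the sequence, left to right."""
    for i in range(len(sequence) - k + 1):
        yield sequence[i:i + k]


def build_index(sequences, k):
    """Build an exact k-mer count table for a data set."""
    counts = Counter()
    for sequence in sequences:
        counts.update(iter_kmers(sequence, k))
    return counts


def is_present(counts, kmer):
    """Set membership query: is this k-mer present in the data set?"""
    return counts[kmer] > 0


def abundance(counts, kmer):
    """Abundance query: how many times does this k-mer occur in the data set?"""
    return counts[kmer]
```

Looking up a k-mer that was never counted returns 0 without adding it to the table, so repeated queries do not grow the index.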
