Crate fasten_kmer

Counts kmers. Each output line describes one kmer in tab-separated columns: the kmer, then its count. Optional columns, starting at column 3, hold the reads that start with that kmer, with ~ as the delimiter within each entry.
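
The output format described above can be read back with a few lines of Rust. This is a minimal sketch, not part of the crate: the names `KmerRow` and `parse_kmer_row` are hypothetical, and it assumes that every column after the count is one read entry whose fields are separated by ~.

```rust
/// One row of the kmer table (hypothetical type, not part of fasten_kmer).
#[derive(Debug, PartialEq)]
pub struct KmerRow {
    pub kmer: String,
    pub count: u64,
    /// One entry per optional column; each entry is split on '~'.
    pub reads: Vec<Vec<String>>,
}

/// Parse one tab-separated output line.
/// Returns None if the count column is missing or is not an integer.
pub fn parse_kmer_row(line: &str) -> Option<KmerRow> {
    // Drop a trailing newline, if any, before splitting into columns.
    let line = line.trim_end_matches(|c| c == '\r' || c == '\n');
    let mut cols = line.split('\t');

    let kmer = cols.next()?.to_string();
    let count = cols.next()?.trim().parse::<u64>().ok()?;

    // Assumption: columns 3 and onward each hold one '~'-delimited entry.
    let reads = cols
        .filter(|c| !c.is_empty())
        .map(|c| c.split('~').map(str::to_string).collect())
        .collect();

    Some(KmerRow { kmer, count, reads })
}
```

Calling `parse_kmer_row` on a line without read columns yields a row whose `reads` vector is empty, so the same function covers output produced with or without --remember-reads.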

Examples

Counting kmers of length 15. The --paired-end option makes no difference here.

cat testdata/four_reads.fastq | fasten_kmer -k 15 > 15mers.tsv

Counting kmers and retaining reads

cat testdata/four_reads.fastq | \
  fasten_kmer -k 15 --remember-reads > 15mers.tsv

Example output

First two lines of kmer output that includes reads (the kmers shown are 21 bases long)

TAGTGAATCCTTTTTCATAAA   39      @M03235:53:000000000-AHLTD:1:1113:23312:4764 1:N:0:6~TAGTGAATCCTTTTTCATAAAATCTTGCTTCAAAATTGCTAAGAGTTTATAAGCAAGAAGTGTTCCAAGTTTGCAAGATGAGGTGAGATTGTGTAAATAAGCTACAAAATTTTTAATTTAAGCCCTACAAGCTCTTAAATATCAAAAGCATTTTCTAAAATATGCAAAAATGTAAGCAAAATGTTTAAAGGAAAGTCGTGAAAAATGCTGAAAAAACTTTAAGAAGGAATTTTTTTACCCTAATCTTACTT~+~>AAA>DDFFFFFGGGGGGGGGGHHHHHHHHHHHHHHHHHHHHGHHGHHHHHHHHFHHHHHGGHHHHHHHGHHHHHHGHHHHHHCHGHHFHHHHGHHHHHHHHHHHGHHHGHHHHHHHHHHHHHHHGHHHGHHFHGHFHHHHHHGHHHFHHHHHHHGHHHHHHHHGHHHHHHHHGHHFHHHHHHHHHHHHHHHGHGFHAFDHGHFHFHHGHGHHFHGFFGGHHHHHFGHFHB=0D::GCGHBHHFBCGGGGG~@M03235:53:000000000-AHLTD:1:1113:23312:4764  2:N:0:6~TCGTAGTAGTATTTCCTAAAATAAGGCAAACCATAGATGATAGACCCACAAAAAGAAAGTAAGATTAGGGTAAAAAAATTCCTTCTTAAAGTTTTTTCAGCATTTTTCACGACTTTCCTTTAAACATTTTGCTTACATTTTTGCATATTTTAGAAAATGCTTTTGATATTTAAGAGCTTGTAGGGCTTAAATTAAAAATTTTGTAGCTTATTTACACAATCTCACCTCATCTTGCAAACTTGGAACACTT~+~CCCCDCCFFFFFGGGGGGGGGGHHHHHHHHHGHHHHHHHHHHGHHHGHGGGHHHGGGHHHHFFHHFHHHFEGGHHHGGHHFHHHHHHHGHHFDGHGGFHHHHHHHHHHGHHGGGGGHHGGHHGHHHHHHHHHGHHHHHHHHHGGHHHHHHHHHHHHGHFFHHHHHGHGHHHFHFHHFHHHHHFBFFHHHEDHHHGFHHHHGHGHHDHBGGHHGHHFDGEHHHFFHHFHGHHHHGFC::CBFFBBFF/CFB
TATCAAGGCTGCTCAAATGAT   35      @M03235:53:000000000-AHLTD:1:1114:18962:2371  1:N:0:6~TATCAAGGCTGCTCAAATGATGGCTTTTGTTATGCTCCGCAAAAGCGTGAATTTAGAATTTTTAAAGAGGGTCAAATTTATAAAACTAGCCCTTATGAAACAATGCAAAGTGAAGAAGAGCAAATCGCCTTTTCTTTGAAAAATGAAAATTTAGCACTCATCTTGCTTAGTTTTTTTGGTTACGGACTTTTGCTTTCTCTTACGCCTTGCACCTTACCGATGATTCCTATTTTATCTTCACTTATCATAG~+~AABBA5FBAFFBGGGGGGGGGGHGHHHFHHHHHHHHHCGGGGGBHFFEE2FHHFHHHFGGHHHGHHHFHGGGHGHHHHGHHHHHHHHHHHHFHHHGHHHHFFHHHHHHHHHHGHHHCGGGH3FHEGDAFGGGGHHFGHFHHEHHHGHFFFHHGEHHHHGHHHFHFFHHHHDFHHGCFDGHEHFEGDCCHHHBBG0GFHFHHBGGF-G?BGGGCGCG//;.9.CBFB0BBGGGGBFFFF0;0FFGFGBF00~@M03235:53:000000000-AHLTD:1:1114:18962:2371   2:N:0:6~GATTAAAGAAAGTAAAAAGCTTTGTTTTTTAGAAGGTTTCGTGCCACCTTTTGCTATGATAAGTGAAGATAAAATAGGAATCATCGGTAAGGTGCAAGGCGTAAGAGAAAGCAAAAGTCCGTAACCAAAAAAACTAAGCAAGATGAGTGCTAAATTTTCATTTTTCAAAGAAAAGGCGATTTGCTCTTCTTCACTTTGCATTGTTTCATAAGGGCTAGTTTTATAAATTTGACCCTCTTTAAAAATTCTAA~+~CCCCCFFFFFFFGGGGGGGGGGHHFHHHGGHHGGHHGHHHGHGGHHHGHHHHHHHHHHHHGHHHHHHHHHHHHHHHHHHHHHHHHHGGGEGFGFHHFHGHGGGEHGHHHHHHGHHHHFHFE?GEGHHHHGGGGGGGHHHHGHHHHHHHFDFHHHHGFHFHHGHHGHHHHHHHHHH.A@EGGC0D0G0D0GDHFHHFCC00FGFHHHHHHHFHB;EFGGFGGBFGEFGGFFFFGCBFGGGGGGBFGGFGFFF

Usage

Usage: fasten_kmer [-h] [-n INT] [-p] [-v] [-k INT] [-r] [-m]
Options:
   -h, --help          Print this help menu.
   -n, --numcpus INT   Number of CPUs (default: 1)
   -p, --paired-end    The input reads are interleaved paired-end
   -v, --verbose       Print more status messages
   -k, --kmer-length INT
                       The size of the kmer
   -r, --revcomp       Count kmers on the reverse complement strand too
   -m, --remember-reads
                       Add reads to subsequent columns. Each read begins with
                       the kmer. Only lists reads in the forward direction.

Constants

  • Character that glues paired-end reads together internally; it is not expected to appear in any read
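
Judging by the example output, a read entry appears to be the FASTQ fields of each read joined with ~, and the two mates of a pair are joined the same way. The sketch below illustrates that idea only; `SEPARATOR`, `FastqRecord`, `glue_records`, and `split_records` are hypothetical names, and the crate's internal representation may differ.

```rust
/// Separator character, taken from the documented '~' delimiter.
/// The constant name is an assumption for this sketch.
const SEPARATOR: char = '~';

/// A minimal FASTQ record (hypothetical type).
#[derive(Debug, Clone, PartialEq)]
pub struct FastqRecord {
    pub id: String,
    pub seq: String,
    pub qual: String,
}

/// Join one or more records into a single '~'-delimited string:
/// id~seq~+~qual for each record, with records joined by '~' as well.
pub fn glue_records(records: &[FastqRecord]) -> String {
    let sep = SEPARATOR.to_string();
    records
        .iter()
        .map(|r| [r.id.as_str(), r.seq.as_str(), "+", r.qual.as_str()].join(sep.as_str()))
        .collect::<Vec<_>>()
        .join(sep.as_str())
}

/// Split a glued string back into records.
/// Returns None unless the number of fields is a multiple of four.
/// This only works if '~' never occurs inside a read, as the
/// constant's description assumes.
pub fn split_records(glued: &str) -> Option<Vec<FastqRecord>> {
    let fields: Vec<&str> = glued.split(SEPARATOR).collect();
    if fields.len() % 4 != 0 {
        return None;
    }
    Some(
        fields
            .chunks(4)
            .map(|f| FastqRecord {
                id: f[0].to_string(),
                seq: f[1].to_string(),
                qual: f[3].to_string(),
            })
            .collect(),
    )
}
```

The round trip is lossless only while the separator stays out of the data, which is why the constant is described as a character not expected in any read.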

Functions

  • Read fastq from stdin and count kmers
  • Read a str of nucleotides and count kmers. If should_revcomp is true, kmers on the opposite strand are counted as well.
  • main 🔒
  • revcomp 🔒
    Reverse-complement a DNA sequence
  • Complementary nucleotide for ACTGUN, case insensitive
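
The counting and reverse-complement steps listed above can be sketched as follows. This is an illustration under stated assumptions, not the crate's source: the function names `complement`, `reverse_complement`, `add_kmers`, and `count_kmers` are hypothetical, input is assumed to be ASCII, case is preserved when complementing, and unrecognized characters are mapped to N.

```rust
use std::collections::HashMap;

/// Complement of one nucleotide among A, C, G, T, U, N in either case.
/// Assumption of this sketch: any other character becomes 'N'.
fn complement(nt: char) -> char {
    match nt {
        'A' => 'T',
        'a' => 't',
        'T' | 'U' => 'A',
        't' | 'u' => 'a',
        'C' => 'G',
        'c' => 'g',
        'G' => 'C',
        'g' => 'c',
        'n' => 'n',
        _ => 'N',
    }
}

/// Reverse the sequence and complement every base.
fn reverse_complement(seq: &str) -> String {
    seq.chars().rev().map(complement).collect()
}

/// Add every window of length k in `seq` to `counts`.
/// The caller must ensure k > 0.
fn add_kmers(seq: &str, k: usize, counts: &mut HashMap<String, u32>) {
    for window in seq.as_bytes().windows(k) {
        // Input is assumed to be ASCII, so each window is valid UTF-8.
        let kmer = String::from_utf8_lossy(window).into_owned();
        *counts.entry(kmer).or_insert(0) += 1;
    }
}

/// Count kmers in a nucleotide string. When `should_revcomp` is true,
/// kmers from the reverse-complement strand are counted too.
fn count_kmers(seq: &str, k: usize, should_revcomp: bool) -> HashMap<String, u32> {
    let mut counts = HashMap::new();
    if k == 0 || seq.len() < k {
        return counts;
    }
    add_kmers(seq, k, &mut counts);
    if should_revcomp {
        add_kmers(&reverse_complement(seq), k, &mut counts);
    }
    counts
}
```

A sequence of length n contributes n - k + 1 kmers per strand, so counting both strands doubles the total number of kmer observations.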