Crate fasten_trim

Trims reads using 0-based coordinates

§Examples

§Adapters

§Download the adapter files

mkdir -pv $HOME/db
pushd $HOME/db # step into the db directory
git clone https://github.com/lskatz/adapterseqs
ADAPTERS=$(find $HOME/db/adapterseqs -name '*.fa')
popd # return to the original directory

§Trim the adapters

cat file.fastq | \
  fasten_trim --adapterseqs <(echo -e ">test\nCTTT") > trimmed.fastq
 
cat $HOME/db/adapterseqs/adapters/*.fa > ./adapters.fasta
cat file.fastq | \
  fasten_trim --adapterseqs ./adapters.fasta > trimmed.fastq

§Blunt-end trim five bases from the right side

cat file.fastq | fasten_trim -l -5 > trimmed.fastq

§Keep a maximum of 100bp with blunt-end trimming on the right side

cat file.fastq | fasten_trim -l 99 > trimmed.fastq
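
The two `-l` examples above suggest that a non-positive value counts back from the end of the read, while a positive value is an absolute 0-based index. A minimal Rust sketch of that interpretation follows. `resolve_last_base` is a hypothetical helper, not a function from this crate, and the clamping of out-of-range values is an assumption.

```rust
/// Hypothetical helper: turn a `--last-base` value into a 0-based,
/// inclusive index of the last base to keep.
/// Assumption: values <= 0 are offsets from the end of the read
/// (0 keeps everything, -5 drops five bases); positive values are
/// absolute indices, clamped to the read length.
fn resolve_last_base(read_len: usize, last_base: i64) -> Option<usize> {
    if read_len == 0 {
        return None; // nothing to keep in an empty read
    }
    let end = read_len as i64 - 1; // index of the final base
    let idx = if last_base <= 0 {
        end + last_base
    } else {
        last_base.min(end)
    };
    if idx < 0 {
        None // the requested trim removes the whole read
    } else {
        Some(idx as usize)
    }
}

fn main() {
    let read = "ACGTACGTAC"; // 10 bases
    // Equivalent of `-l -5`: keep indices 0 through 4.
    let last = resolve_last_base(read.len(), -5).unwrap();
    println!("{}", &read[..=last]);
}
```

Under these assumptions, `-l 99` on a 150bp read keeps indices 0 through 99 (100 bases), and the same option on a 50bp read leaves the read untouched.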

§Blunt-end trim 4bp from the left side

cat file.fastq | fasten_trim -f 4 > trimmed.fastq

§Usage

Usage: fasten_trim [-h] [-n INT] [-p] [-v] [-f INT] [-l INT]
 
Options:
    -h, --help          Print this help menu.
    -n, --numcpus INT   Number of CPUs (default: 1)
    -p, --paired-end    The input reads are interleaved paired-end
    -v, --verbose       Print more status messages
    -f, --first-base INT
                        The first base to keep (default: 0)
    -l, --last-base INT The last base to keep. (default: 0)
    -a, --adapterseqs path/to/file.fa
                        fasta file of adapters

§Notes

The algorithm is as follows:

  1. Mark the first and last bases to keep as 0 and the last base of the read, respectively.
  2. If an adapter is found at the beginning of the sequence, move the marker to where the adapter will be trimmed.
  3. Compare the trimming suggested by the blunt-end options against where an adapter was found, and move each marker to the most inward position.
  4. Trim the sequence and quality strings.
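
The steps above can be sketched in Rust as follows. This is an illustration under stated assumptions, not the crate's implementation: `trim_markers` and `trim_read` are hypothetical names, only an adapter at the start of the read is handled (as in step 2), a `last_base` of 0 is taken to mean "no right-side trimming", and reads are assumed to be ASCII.

```rust
/// Hypothetical helper: return the 0-based, inclusive (first, last)
/// indices to keep, or None if nothing would remain.
fn trim_markers(
    seq: &str,
    first_base: usize,
    last_base: usize,
    adapters: &[&str],
) -> Option<(usize, usize)> {
    if seq.is_empty() {
        return None;
    }
    // Step 1: start with the whole read.
    let mut first = 0usize;
    let mut last = seq.len() - 1;
    // Step 2: an adapter at the start of the read moves the left marker.
    for adapter in adapters {
        if !adapter.is_empty() && seq.starts_with(adapter) {
            first = first.max(adapter.len());
        }
    }
    // Step 3: keep the most inward of the adapter and blunt-end markers.
    first = first.max(first_base);
    // Assumption: last_base == 0 means no right-side trimming.
    if last_base > 0 {
        last = last.min(last_base);
    }
    if first > last {
        None
    } else {
        Some((first, last))
    }
}

/// Step 4: apply the same markers to the sequence and quality strings.
fn trim_read<'a>(
    seq: &'a str,
    qual: &'a str,
    first_base: usize,
    last_base: usize,
    adapters: &[&str],
) -> Option<(&'a str, &'a str)> {
    if seq.len() != qual.len() {
        return None; // malformed record: lengths must match
    }
    let (first, last) = trim_markers(seq, first_base, last_base, adapters)?;
    Some((&seq[first..=last], &qual[first..=last]))
}

fn main() {
    let adapters = ["AA"];
    // Adapter "AA" moves the left marker to 2; last_base 7 moves the right marker.
    if let Some((s, q)) = trim_read("AACCGGTTAC", "IIIIHHHHGG", 0, 7, &adapters) {
        println!("{}\n{}", s, q);
    }
}
```

Taking the maximum for the left marker and the minimum for the right marker is what makes the result "the most inward possible" when both an adapter and a blunt-end option apply.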

To make the output more explicit when combining both algorithms, run the trimming as a two-step process, adapters first and blunt ends second:

cat file.fastq | \
  fasten_trim --adapterseqs ./adapters.fasta | \
  fasten_trim -f 4 -l 99 > trimmed.fastq

§Output

Each defline is annotated with a description of the trimming, written as space-separated key=value pairs, e.g.,
@M03235:53:000000000-AHLTD:1:1101:1826:14428 trimmed_adapter_rev=TT trimmed_left=0 trimmed_right=249
or for a forward adapter,
@M03235:53:000000000-AHLTD:1:1101:1758:14922 trimmed_adapter_fwd=AA trimmed_left=2 trimmed_right=251
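
Downstream tools may want to read these annotations back. The following Rust sketch splits a defline into the read ID and its key=value pairs. `parse_defline` is a hypothetical helper, not part of this crate, and it assumes that values contain no spaces.

```rust
use std::collections::HashMap;

/// Hypothetical helper: split a fastq defline into the read ID and
/// its key=value annotations.
fn parse_defline(defline: &str) -> (String, HashMap<String, String>) {
    // Drop the leading '@' if present, then split on whitespace.
    let body = defline.strip_prefix('@').unwrap_or(defline);
    let mut fields = body.split_whitespace();
    let id = fields.next().unwrap_or("").to_string();
    let mut tags = HashMap::new();
    for field in fields {
        // Tokens without '=' are skipped rather than treated as errors.
        if let Some((key, value)) = field.split_once('=') {
            tags.insert(key.to_string(), value.to_string());
        }
    }
    (id, tags)
}

fn main() {
    let defline = "@M03235:53:000000000-AHLTD:1:1101:1758:14922 \
                   trimmed_adapter_fwd=AA trimmed_left=2 trimmed_right=251";
    let (id, tags) = parse_defline(defline);
    println!("{} trimmed_left={:?}", id, tags.get("trimmed_left"));
}
```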

Functions§

  • main 🔒
  • read_fasta 🔒
    Read a fasta file and return a HashMap of the sequences
  • Trim a set of fastq entries and send it to a channel