Crate fasten_trim
Trims reads using 0-based coordinates
§Examples
§Adapters
§Download the adapter files
mkdir -pv $HOME/db
pushd $HOME/db # step into the db directory
git clone https://github.com/lskatz/adapterseqs
ADAPTERS=$(find $HOME/db/adapterseqs -name '*.fa')
popd # return to the original directory
§Trim the adapters
cat file.fastq | \
fasten_trim --adapterseqs <(echo -e ">test\nCTTT") > trimmed.fastq
cat $HOME/db/adapterseqs/adapters/*.fa > ./adapters.fasta
cat file.fastq | \
fasten_trim --adapterseqs ./adapters.fasta > trimmed.fastq
§Blunt-end trim five bases from the right side
cat file.fastq | fasten_trim -l -5 > trimmed.fastq
§Keep a maximum of 100bp with blunt-end trimming on the right side
cat file.fastq | fasten_trim -l 99 > trimmed.fastq
§Blunt-end trim 4bp from the left side
cat file.fastq | fasten_trim -f 4 > trimmed.fastq
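The blunt-end examples above follow from treating `-f` and `-l` as inclusive 0-based positions. The following is a minimal Rust sketch of that arithmetic, not taken from the crate's source. The function name `blunt_range` is hypothetical, and the sketch assumes that a `--last-base` of zero or less counts back from the final base of the read, which would be consistent with the default of 0 and with the `-l -5` example.

```rust
/// Resolve `--first-base` / `--last-base` into an inclusive 0-based range.
/// Assumption (not confirmed by the crate): a `last_base` of zero or less
/// counts back from the final base of the read.
fn blunt_range(read_len: usize, first_base: usize, last_base: i64) -> Option<(usize, usize)> {
    if read_len == 0 {
        return None;
    }
    let final_idx = read_len as i64 - 1;
    let last = if last_base <= 0 {
        // 0 keeps everything; -5 drops the five rightmost bases.
        final_idx + last_base
    } else {
        // A positive value is an absolute position, capped at the read end.
        last_base.min(final_idx)
    };
    // Nothing is left if the markers cross.
    if last < 0 || (first_base as i64) > last {
        return None;
    }
    Some((first_base, last as usize))
}
```

Under these assumptions, `blunt_range(250, 0, 99)` gives `Some((0, 99))`, which is 100 bases kept, and `blunt_range(250, 0, -5)` gives `Some((0, 244))`, which drops five bases from the right.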
§Usage
Usage: fasten_trim [-h] [-n INT] [-p] [-v] [-f INT] [-l INT]
Options:
-h, --help Print this help menu.
-n, --numcpus INT Number of CPUs (default: 1)
-p, --paired-end The input reads are interleaved paired-end
-v, --verbose Print more status messages
-f, --first-base INT
The first base to keep (default: 0)
-l, --last-base INT The last base to keep. (default: 0)
-a, --adapterseqs path/to/file.fa
fasta file of adapters
§Notes
The algorithm is as follows:
- Mark the first and last bases to keep as position 0 and the last base of the read, respectively
- If an adapter is found at the beginning of the sequence, move the marker to where the read will be trimmed
- Compare the blunt-end trimming positions against where an adapter was found, and move each marker to the most inward of the two
- Trim the sequence and quality strings
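The steps above can be sketched in Rust roughly as follows. This is an illustration under stated assumptions, not the crate's implementation: `trim_read` is a hypothetical name, the blunt-end positions are passed in as already-resolved inclusive 0-based indices, adapters are matched exactly and only at the start of the read, and reads are assumed to be ASCII.

```rust
/// Return the trimmed sequence and quality strings, or None if nothing is left.
/// `blunt_first` and `blunt_last` are inclusive 0-based positions.
/// Hypothetical helper: exact adapter matching at the read start only.
fn trim_read(
    seq: &str,
    qual: &str,
    adapters: &[&str],
    blunt_first: usize,
    blunt_last: usize,
) -> Option<(String, String)> {
    // Byte slicing below is only safe for ASCII reads of matching length.
    if seq.is_empty() || seq.len() != qual.len() || !seq.is_ascii() || !qual.is_ascii() {
        return None;
    }

    // Step 1: start with the markers on the first and last bases.
    let mut first = 0usize;
    let mut last = seq.len() - 1;

    // Step 2: an adapter at the start of the read moves the left marker past it.
    for adapter in adapters {
        if !adapter.is_empty() && seq.starts_with(adapter) {
            first = first.max(adapter.len());
        }
    }

    // Step 3: take the most inward of the adapter and blunt-end markers.
    first = first.max(blunt_first);
    last = last.min(blunt_last);
    if first > last {
        return None;
    }

    // Step 4: trim the sequence and quality strings with the same range.
    Some((seq[first..=last].to_string(), qual[first..=last].to_string()))
}
```

For example, with the adapter `CTTT` and blunt-end positions 2 and 9, the read `CTTTACGTACGT` is cut to positions 4 through 9, because the adapter marker (4) is further inward than the blunt-end marker (2).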
To make the output more explicit when combining both algorithms, run the trimming as a two-step process:
cat file.fastq | \
fasten_trim --adapterseqs ./adapters.fasta | \
fasten_trim -f 4 -l 99 > trimmed.fastq
§Output
The deflines will be altered with a description of the trimming using key=value syntax, separated by spaces, e.g.,
@M03235:53:000000000-AHLTD:1:1101:1826:14428 trimmed_adapter_rev=TT trimmed_left=0 trimmed_right=249
or for a forward adapter,
@M03235:53:000000000-AHLTD:1:1101:1758:14922 trimmed_adapter_fwd=AA trimmed_left=2 trimmed_right=251
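Downstream tools can recover these annotations by splitting the defline on whitespace and then on `=`. A minimal sketch, where `parse_defline` is a hypothetical helper and not part of the crate, and where it is assumed that neither keys nor values contain spaces:

```rust
use std::collections::HashMap;

/// Split a defline into the read ID and its key=value annotations.
/// Hypothetical helper; fields without an `=` are ignored.
fn parse_defline(defline: &str) -> (String, HashMap<String, String>) {
    // Drop the leading '@' if present, then split on whitespace.
    let mut fields = defline
        .strip_prefix('@')
        .unwrap_or(defline)
        .split_whitespace();
    // The first field is the read ID.
    let id = fields.next().unwrap_or("").to_string();
    // Remaining fields are collected as key=value pairs.
    let mut tags = HashMap::new();
    for field in fields {
        if let Some((key, value)) = field.split_once('=') {
            tags.insert(key.to_string(), value.to_string());
        }
    }
    (id, tags)
}
```

Applied to the first defline above, this yields the read ID `M03235:53:000000000-AHLTD:1:1101:1826:14428` and a map with `trimmed_adapter_rev`, `trimmed_left`, and `trimmed_right` as keys.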
Functions§
- main 🔒
- Read a fasta file and return a HashMap of the sequences
- Trim a set of fastq entries and send it to a channel
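As an illustration of the FASTA-reading step listed above, the following sketch loads adapter sequences into a `HashMap`. It is not the crate's function: the name `read_fasta_sketch` is hypothetical, and it assumes the map is keyed by the first word of each header with the concatenated sequence lines as the value.

```rust
use std::collections::HashMap;
use std::io::BufRead;

/// Read FASTA records from any buffered reader into an id -> sequence map.
/// Hypothetical sketch; the key/value layout is an assumption.
fn read_fasta_sketch<R: BufRead>(reader: R) -> std::io::Result<HashMap<String, String>> {
    let mut seqs: HashMap<String, String> = HashMap::new();
    let mut current: Option<String> = None;
    for line in reader.lines() {
        let line = line?;
        let line = line.trim();
        if line.is_empty() {
            continue;
        }
        if let Some(header) = line.strip_prefix('>') {
            // A header starts a new record, keyed by its first word.
            let id = header.split_whitespace().next().unwrap_or("").to_string();
            seqs.insert(id.clone(), String::new());
            current = Some(id);
        } else if let Some(id) = &current {
            // Sequence lines are appended to the most recent record.
            if let Some(seq) = seqs.get_mut(id) {
                seq.push_str(line);
            }
        }
    }
    Ok(seqs)
}
```

A file on disk can be passed in by wrapping `std::fs::File` in a `std::io::BufReader`; an in-memory string such as `">test\nCTTT\n".as_bytes()` works as well.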