Clustering with pimba_run

pimba_prepare.sh produces a FASTA file, which pimba_run.sh takes as input in the following command:

./pimba_run.sh -i <input_reads> -o <output_dir> -w <approach> -s <otu_similarity> -a <assign_similarity> -c <coverage> -l <otu_length> -h <hits_per_subject> -g <marker_gene> -t <num_threads> -e <E-value> -d <databases_file.txt> -m <its>

-i <input_reads> = FASTA file with the reads generated by pimba_prepare.sh;
-o <output_dir> = Directory where the results will be stored;
-w <approach> = Analysis strategy to use, either ‘otu’ or ‘asv’. With ‘otu’, PIMBA uses vsearch; with ‘asv’, PIMBA uses swarm. Default is ‘otu’;
-s <otu_similarity> = Similarity threshold used in the OTU clustering, given as a fraction between 0 and 1. Default is 0.97;
-a <assign_similarity> = Similarity threshold used in the taxonomy assignment, given as a fraction between 0 and 1. Default is 0.9;
-c <coverage> = Minimum coverage for the alignment, given as a fraction between 0 and 1. Default is 0.9;
-l <otu_length> = Length to which the reads are trimmed. If 0, no reads are trimmed;
-h <hits_per_subject> = If 1, the best hit is chosen. If > 1, the assignment is chosen by majority. Default is 1;
-g <marker_gene> = Path to the database that will be used in the analysis;
-t <num_threads> = Number of threads to use in the BLAST step. Default is 1;
-e <E-value> = Expected value (E-value) used by BLAST. Default is 0.00001;
-d <databases_file.txt> = File with the paths of the databases;
-m <its> = If set to ‘its’, PIMBA runs the ITSx tool to extract only the ITS (internal transcribed spacer) regions and discard the ribosomal data. Otherwise, omit this option.
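
To make the defaults listed above easier to see in one place, the sketch below prints (but does not execute) a pimba_run.sh command line with every optional value filled in from its documented default. The function name pimba_cmd and the environment variables (APPROACH, OTU_SIM, and so on) are hypothetical and not part of PIMBA. The value -l 0 is an assumption, since no default is documented for that option.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of PIMBA: prints a pimba_run.sh command line.
# Arguments: input FASTA, output directory, database path (-g), databases file (-d).
# Optional values fall back to the defaults documented in the option list.
pimba_cmd() {
  local input="$1" outdir="$2" db="$3" dbfile="$4"
  printf './pimba_run.sh -i %s -o %s -w %s -s %s -a %s -c %s -l %s -h %s -g %s -t %s -e %s -d %s\n' \
    "$input" "$outdir" \
    "${APPROACH:-otu}" \
    "${OTU_SIM:-0.97}" \
    "${ASSIGN_SIM:-0.9}" \
    "${COVERAGE:-0.9}" \
    "${OTU_LEN:-0}" \
    "${HITS:-1}" \
    "$db" \
    "${THREADS:-1}" \
    "${EVALUE:-0.00001}" \
    "$dbfile"
}

# Assumption: -l 0 (no trimming) is used because no default is documented for -l.
pimba_cmd AllSamples.fasta results /path/to/your/database/ databases.txt
```

With no overrides set, this prints a command using -w otu, -s 0.97, -a 0.9, -c 0.9, -h 1, -t 1 and -e 0.00001. The printed line can be reviewed before running it by hand.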

The databases_file.txt must be properly configured. You can download it here.

Here is an example of a command:

./pimba_run.sh -i AllSamples.fasta -o AllSamplesCOI_98clust90assign -w otu -s 0.98 -a 0.9 -c 0.9 -l 200 -h 1 -g /path/to/your/database/ -t 24 -e 0.1 -d databases.txt
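
Before launching a run like the one above, it can help to confirm that the input really is a FASTA file. The following is a minimal sketch of such a pre-flight check, assuming only that a FASTA file's first non-empty line is a header starting with ">". The function name is_fasta is hypothetical and PIMBA does not provide it.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check, not part of PIMBA.
# Succeeds when the first non-empty line of the file starts with ">".
is_fasta() {
  local file="$1"
  [ -s "$file" ] || return 1            # file must exist and be non-empty
  local first
  # Take the first line that is not blank; fail if there is none.
  first=$(grep -m 1 -v '^[[:space:]]*$' "$file") || return 1
  case "$first" in
    '>'*) return 0 ;;
    *)    return 1 ;;
  esac
}

# Usage: report whether the input passes the check before starting the run.
input="AllSamples.fasta"
if is_fasta "$input"; then
  echo "OK: $input looks like FASTA"
else
  echo "Skipping: $input is missing or does not look like FASTA" >&2
fi
```

The check only inspects the first record header, so it catches a wrong or empty file but does not validate the sequences themselves.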

If you are analyzing an ITS dataset, use the command below, which only adds the -m its parameter:

./pimba_run.sh -i AllSamples.fasta -o AllSamplesCOI_98clust90assign -w otu -s 0.98 -a 0.9 -c 0.9 -l 200 -h 1 -g /path/to/your/database/ -t 24 -e 0.1 -d databases.txt -m its
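
Since the two example commands differ only in the trailing -m its, a wrapper script can build the shared arguments once and append the flag when needed. The sketch below illustrates this with a bash array and only prints the resulting command. The function build_args and its marker_type argument are hypothetical names introduced for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper, not part of PIMBA: appends "-m its" only for ITS datasets.
build_args() {
  # Assumption: the caller passes "its" for ITS data and any other label otherwise.
  local marker_type="$1"
  local args=(-i AllSamples.fasta -o AllSamplesCOI_98clust90assign -w otu
              -s 0.98 -a 0.9 -c 0.9 -l 200 -h 1
              -g /path/to/your/database/ -t 24 -e 0.1 -d databases.txt)
  if [ "$marker_type" = "its" ]; then
    args+=(-m its)   # enables the ITSx step described for the -m option
  fi
  # Print the command instead of running it, so it can be inspected first.
  echo "./pimba_run.sh ${args[*]}"
}

build_args its   # printed command ends with "-m its"
build_args coi   # printed command has no -m option
```

Keeping the arguments in an array avoids maintaining two nearly identical command lines.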