Updated March 2, 2021

NAME

Gblocks.py - Python wrapper script to run Gblocks from BioLegato.

SYNOPSIS

Gblocks.py infile [options]

DESCRIPTION

Gblocks removes highly gapped or poorly conserved positions from a multiple sequence alignment prior to phylogenetic analysis. Some Gblocks parameters take an absolute number of sequences. For example, -b1 is the minimum number of sequences for a conserved position. Such thresholds are easier to specify as a percentage, so the --b1 option of Gblocks.py takes a percentage value and calls the Gblocks binary with the corresponding number of sequences, calculated from the number of sequences in the input file. Before running Gblocks, Gblocks.py also performs sanity checks to make sure that the parameters do not conflict with each other.
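
The percentage-to-count conversion described above can be sketched as follows. This is a minimal illustration, not the code in Gblocks.py: the function names, the round-up rule and the single sanity check shown are assumptions made for the example.

```python
def count_fasta_sequences(lines):
    """Count sequences in FASTA text: one per header line starting with '>'."""
    return sum(1 for line in lines if line.startswith(">"))


def percent_to_count(percent, nseq):
    """Convert a percentage of nseq sequences to a whole number of sequences.

    Assumption: round up, so that 51% of 10 sequences gives 6, not 5.
    Integer arithmetic avoids floating-point rounding surprises.
    """
    return (nseq * percent + 99) // 100


def check_thresholds(b1_count, b2_count):
    """Hypothetical sanity check: the flanking-position threshold must not
    be lower than the conserved-position threshold."""
    if b2_count < b1_count:
        raise ValueError("b2 must be at least as large as b1")


# A toy alignment of four protein sequences (made up for illustration).
alignment = [">seq1", "MKT-LV", ">seq2", "MKTALV",
             ">seq3", "MRT-LV", ">seq4", "MKTALI"]

nseq = count_fasta_sequences(alignment)
b1 = percent_to_count(51, nseq)   # default --b1
b2 = percent_to_count(85, nseq)   # default --b2
check_thresholds(b1, b2)
print(nseq, b1, b2)               # prints: 4 3 4
```

With the default percentages and four sequences, the Gblocks binary would be called with 3 for -b1 and 4 for -b2 under these assumptions.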


OPTIONS

Note: Not all command line options of Gblocks are implemented in this script.

infile - File containing a multiple sequence alignment in FASTA format. This must be the first parameter on the command line.

--t [p,d,c] - Type of sequence. p:protein (default), d:DNA, c:codons

--b1=<integer> - Minimum percent of sequences for a conserved position. Default: 51.

--b2=<integer> - Minimum percent of sequences for a flanking position. Default: 85.

--b3=<integer> - Maximum Number Of Contiguous Nonconserved Positions. Default: 8.

--b4=<integer >= 2> - Minimum Length Of A Block. Default: 10.

--b5=[n,h,a] - Allowed Gap Positions. n:none (default), h:half, a:all

--b6=[y,n] - Use similarity matrices (protein only). y:yes (default), n:no

--v=<integer >= 50> - Characters per line in results and parameters files. Default: 60.

OUTPUT FILES

Gblocks writes two output files: a FASTA file with gappy or poorly conserved positions removed, and an HTML report showing the complete alignment, with highlighting to indicate which positions were written to the FASTA file. Gblocks simply appends a file extension to the input filename to create each output file. For example, if the input file is a1.fsa, the alignment is written to a1.fsa.fsa and the HTML report to a1.fsa.htm.
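
The naming rule can be sketched in a few lines of Python. The function name is hypothetical, and the .fsa and .htm extensions are taken from the a1.fsa example above; this is an illustration of the rule, not code from Gblocks.py.

```python
def output_names(infile):
    """Return the (FASTA, HTML) output filenames for a given input file.

    The extension is appended to the full input name; the existing
    extension, if any, is kept.
    """
    return infile + ".fsa", infile + ".htm"


fasta_out, html_out = output_names("a1.fsa")
print(fasta_out, html_out)   # prints: a1.fsa.fsa a1.fsa.htm
```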


EXAMPLE
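
One way to assemble a Gblocks.py command line from another Python script is sketched below. The helper name build_command and the input file a1.fsa are hypothetical. The sketch assumes that Gblocks.py is on the PATH, and it uses only options documented above in the --name=value form.

```python
def build_command(infile, **options):
    """Build the argument list for Gblocks.py.

    infile comes first, as required; each keyword argument becomes
    one --name=value option, in the order given.
    """
    cmd = ["Gblocks.py", infile]
    cmd += [f"--{name}={value}" for name, value in options.items()]
    return cmd


# Require 60% of sequences for a conserved position, and allow
# positions where up to half of the sequences have gaps.
cmd = build_command("a1.fsa", b1=60, b5="h")
print(" ".join(cmd))   # prints: Gblocks.py a1.fsa --b1=60 --b5=h
```

The resulting list can be passed to subprocess.run(cmd) to run the script.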


REFERENCE
Gblocks - Selection of conserved blocks from multiple alignments for their use in phylogenetic analysis [http://molevol.cmima.csic.es/castresana/Gblocks.html]

AUTHOR

Dr. Brian Fristensky
Department of Plant Science
University of Manitoba
Winnipeg, MB  Canada R3T 2N2
frist@cc.umanitoba.ca
http://home.cc.umanitoba.ca/~frist