Updated May 16, 2013
gstat.py - calculate protein statistics
SYNOPSIS
gstat.py infile [infile]
Calculates the following statistics from protein sequences in a FASTA file:
1. the molecular weight of each protein.
2. the theoretical pI value of each protein.
3. the composition of each amino acid.
4. the total number of amino acids in each protein.
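As a rough illustration of items 1, 3 and 4, the sketch below computes a molecular weight, an amino acid composition and a length for one sequence. It is not the code inside gstat.py: the function names are hypothetical, and the mass table holds standard average residue masses, which may differ from whatever table gstat.py uses. The kda argument mimics the effect of the -kda option.

```python
from collections import Counter

# Average residue masses in daltons (standard published values).
# ASSUMPTION: the table inside gstat.py is not shown here and may differ.
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER_MASS = 18.01528  # one water accounts for the free N- and C-termini


def molecular_weight(seq, kda=False):
    """Sum the residue masses plus one water; optionally report in kDa."""
    seq = seq.upper()
    mass = sum(RESIDUE_MASS[aa] for aa in seq) + WATER_MASS
    return mass / 1000.0 if kda else mass


def composition(seq):
    """Count how many times each amino acid letter occurs."""
    return Counter(seq.upper())


def residue_count(seq):
    """Total number of amino acids in the sequence."""
    return len(seq)
```

This sketch raises KeyError on letters outside the 20 standard amino acids (for example X or *). How gstat.py treats such characters is not stated here.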
-kda --- display molecular weights in kDa
-img --- display locus tags from IMG/ER FASTA files
-aacount --- perform an amino acid count for each sequence
-tsv --- present the output in TSV format
gstat.py -kda ecoli-k12.fsa > data_save.csv
gstat.py -img ecoli-k12.fsa ecoli-plasmid1.fsa
gstat.py -kda -img ecoli-k12.fsa ecoli-plasmid1.fsa > data_save.csv
Input is a file containing one or more proteins in FASTA format.
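For readers unfamiliar with FASTA, a minimal reader could look like the sketch below. It is an assumption about how such input can be parsed, not the parser used by gstat.py. The names read_fasta and read_fasta_files are hypothetical.

```python
def read_fasta(lines):
    """Yield (name, sequence) pairs from an iterable of FASTA lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:].strip(), []
        elif name is not None:
            chunks.append(line)  # sequence may span several lines
    if name is not None:
        yield name, "".join(chunks)


def read_fasta_files(paths):
    """Yield records from several files, one file after another."""
    for path in paths:
        with open(path) as handle:
            yield from read_fasta(handle)
```

Here the whole header line after ">" is kept as the name. Any text that appears before the first header is ignored.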
The output of the program is tab-delimited CSV, written to stdout.
The output columns are as follows:
NAME Mol. Wt. pI COMPOSITION SEQUENCE
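A tab-delimited table with these columns can be produced with the standard csv module, as in the sketch below. This is only an illustration: write_rows is a hypothetical helper, and whether gstat.py prints a header row or how it formats the COMPOSITION field is not stated in this document.

```python
import csv
import sys

# Column names as listed in this document.
COLUMNS = ["NAME", "Mol. Wt.", "pI", "COMPOSITION", "SEQUENCE"]


def write_rows(records, out=None):
    """Write a header plus one tab-delimited row per record.

    ASSUMPTION: each record is a 5-item sequence in COLUMNS order.
    """
    if out is None:
        out = sys.stdout
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(COLUMNS)
    for record in records:
        writer.writerow(record)
```

Writing to stdout keeps the shell redirection shown in the examples (> data_save.csv) working unchanged.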
If you supply multiple files, this program prints all
of the sequences in succession to standard output.
Department of Plant Science
University of Manitoba
Winnipeg, MB Canada R3T 2N2
Dr. Brian Fristensky - my work supervisor, and the man who introduced
me to the wonderful field of bioinformatics.
Lukasz Kozlowski - bisection Henderson-Hasselbalch algorithm
for determining the theoretical pI of proteins
Kozlowski L. 2007-2012 Isoelectric Point Calculator.
Dr. Abby Perrill - amino acid pKa table
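The general idea behind a bisection Henderson-Hasselbalch pI estimate can be sketched as follows: compute the net charge of the protein at a trial pH, then halve the pH interval until the charge is close to zero. This is a simplified sketch, not the credited implementation. The pKa values are illustrative placeholders and are NOT the pKa table credited above, and the function names are hypothetical.

```python
# ASSUMPTION: illustrative pKa values only, not the table used by gstat.py.
PKA_NTERM = 8.6
PKA_CTERM = 3.6
PKA_POSITIVE = {"K": 10.8, "R": 12.5, "H": 6.5}            # gain a proton
PKA_NEGATIVE = {"D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}   # lose a proton


def net_charge(seq, ph):
    """Net charge at a given pH from the Henderson-Hasselbalch equation."""
    charge = 1.0 / (1.0 + 10 ** (ph - PKA_NTERM))    # N-terminus
    charge -= 1.0 / (1.0 + 10 ** (PKA_CTERM - ph))   # C-terminus
    for aa in seq.upper():
        if aa in PKA_POSITIVE:
            charge += 1.0 / (1.0 + 10 ** (ph - PKA_POSITIVE[aa]))
        elif aa in PKA_NEGATIVE:
            charge -= 1.0 / (1.0 + 10 ** (PKA_NEGATIVE[aa] - ph))
    return charge


def isoelectric_point(seq, tol=1e-4):
    """Bisect the pH range 0-14 until the interval is narrower than tol."""
    lo, hi = 0.0, 14.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if net_charge(seq, mid) > 0:
            lo = mid   # still positive, so the pI lies at a higher pH
        else:
            hi = mid   # zero or negative, so the pI lies at a lower pH
    return (lo + hi) / 2.0
```

Bisection works here because the net charge only decreases as the pH rises, so it crosses zero exactly once in the interval.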
QUESTIONS & COMMENTS
If you have any questions, please contact me: firstname.lastname@example.org
I usually get back to people within 1-2 weekdays (on weekends, I am slower).
P.S. Please also let me know of any bugs, or if you have any suggestions.
I am generally happy to help create new tools, or modify my existing
tools to make them more useful.
This code is licensed under the Creative Commons 3.0
Attribution + ShareAlike license - for details see: