Updated January 7, 2020
NAME
customdoc.py - recursively descend through a
directory structure, changing HTML documentation files to
correspond to the directory structure on the local system.
SYNOPSIS
python customdoc.py oldstr newstr htmldir
DESCRIPTION
Files:
oldstr - file of strings to be changed in all HTML files, one per line
newstr - file of new strings to be substituted for the strings in
oldstr, one per line
htmldir - directory in which to begin the search
customdoc.py begins in a high-level directory and recursively
descends, changing all HTML files. Symbolic links are not followed
to avoid redundant or unintended changes.
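The directory walk can be sketched in Python as follows. This is a minimal illustration, not the code of customdoc.py itself; the function name find_html_files and the assumption that HTML files are those ending in .html or .htm are hypothetical.

```python
import os


def find_html_files(htmldir, suffixes=(".html", ".htm")):
    """Yield paths of HTML files below htmldir, never following symbolic links.

    Hypothetical helper: the suffix list is an assumption, not part of customdoc.py.
    """
    # followlinks=False (the default) keeps os.walk out of symlinked
    # directories; symlinked *files* still appear in filenames, so they
    # are skipped explicitly below.
    for dirpath, dirnames, filenames in os.walk(htmldir, followlinks=False):
        dirnames.sort()  # visit subdirectories in a stable order
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            if name.lower().endswith(suffixes) and not os.path.islink(path):
                yield path
```

For example, iterating over find_html_files("/home/birch/public_html") would produce the candidate files one at a time, so the whole tree never has to be held in memory.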
For every line in oldstr, there must be a corresponding
line in newstr.
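A minimal sketch of reading oldstr and newstr into matched pairs, assuming one string per line and the latin-1 encoding discussed under BUGS. The helper name load_pairs and the blank-line check are illustrative assumptions, not documented behaviour of customdoc.py.

```python
def load_pairs(oldstr_path, newstr_path, encoding="latin-1"):
    """Return [(old, new), ...] built from two files, one string per line."""
    with open(oldstr_path, encoding=encoding) as f:
        olds = [line.rstrip("\n") for line in f]
    with open(newstr_path, encoding=encoding) as f:
        news = [line.rstrip("\n") for line in f]
    # Every line in oldstr needs a corresponding line in newstr.
    if len(olds) != len(news):
        raise ValueError(
            f"{oldstr_path} has {len(olds)} lines but {newstr_path} has {len(news)}"
        )
    if "" in olds:
        # str.replace("", new) would insert new between every character,
        # so a blank line in oldstr is rejected (an assumed safeguard).
        raise ValueError(f"blank line in {oldstr_path}")
    return list(zip(olds, news))
```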
Each line of an HTML file is read in turn, and the substitutions are
applied to it consecutively, in the order in which they appear in
oldstr and newstr, until all possible changes have been made. For
example, occurrences of string 1 from oldstr are changed to string 1
from newstr, then occurrences of string 2 from oldstr are changed to
string 2 from newstr, and so forth.
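The per-line rule can be sketched as a loop over the pairs in file order. This assumes plain substring replacement (no regular expressions); the name substitute_line and the sample strings are hypothetical.

```python
def substitute_line(line, pairs):
    """Apply every (old, new) pair to one line, in the order given.

    str.replace changes all occurrences, and each pair sees the result
    of the pairs before it.
    """
    for old, new in pairs:
        line = line.replace(old, new)
    return line


pairs = [("%7E", "~"), ("/~psgendb", "/~birch")]
print(substitute_line("http://host/%7Epsgendb/x", pairs))  # prints http://host/~birch/x
```

Because each replacement works on the output of the previous one, order matters: with the two pairs reversed, '/~psgendb' would not match, since '%7E' would not yet have been converted to '~'.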
If no changes are made in a file, the file is left unmodified,
including its date and file mode. If the file is changed, the old HTML
file is overwritten and the file mode is set to 644 (rw-r--r--).
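One way to get this behaviour is to compare the new text with the old and open the file for writing only when they differ, as in the sketch below. The function name rewrite_if_changed and the use of latin-1 (see BUGS) are assumptions; newline="" keeps the file's original line endings.

```python
import os


def rewrite_if_changed(path, new_text, encoding="latin-1"):
    """Overwrite path with new_text only if the contents differ.

    Returns True if the file was rewritten. An unchanged file is never
    opened for writing, so its date and mode are preserved.
    """
    with open(path, encoding=encoding, newline="") as f:
        old_text = f.read()
    if old_text == new_text:
        return False
    with open(path, "w", encoding=encoding, newline="") as f:
        f.write(new_text)
    os.chmod(path, 0o644)  # rw-r--r--
    return True
```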
EXAMPLE
Customizing BIRCH documentation files for a local
installation.
oldstr
%7E
http://home.cc.umanitoba.ca/~psgendb
http://www.umanitoba.ca/afs/plant_science/psgendb
psgendb@cc.umanitoba.ca
/home/psgendb
psgendb
newstr
~
http://home.cc.umanitoba.ca/~psgendb
file:///home/birch
birch@tux.mb.ca
/home/birch
birch
htmldir
/home/birch/public_html
In the example above, the first change made to each line is to
convert '%7E' to '~' (tilde), just to keep things consistent. Some
HTML editors, such as Netscape Composer, will change tilde to %7E, so
this step makes it easier to ensure that the later changes, whose
strings contain '~', will all be made. For BIRCH, the other lines
above are as follows:
line 2 - Main URL for the BIRCH
system. This URL points to the directory for the main BIRCH Web
site, e.g. /home/psgendb/BIRCHDEV/public_html. In oldstr, this
directory is served by httpd and is therefore accessible to the
world. In newstr, the 'file://' string tells a browser that this is
a local file, only accessible to people logged into the local
system. The right choice depends on how your local web site, if any,
is set up.
line 3 - This is an alternative URL
pointing to the BIRCH home directory, e.g.
/home/psgendb/BIRCHDEV. If symbolic links ARE allowed, you can
simply make line 3 identical
to line 2. This works because symbolic links already exist
within public_html for important documentation directories such
as doc, dat and local.
line 4 - Email address for the BIRCH administrator. This will be
included in links from documentation so that users can mail the
sysadmin.
line 5 - The path of the BIRCH home directory.
line 6 - The userid of the BIRCH system administrator.
If your site DOES permit symbolic links in personal web sites
The symbolic link public_html/birchhomedir points to
$BIRCH, making it easy to find documentation files by
referencing the main $BIRCH directory. Thus, lines 2 and 3
in newstr should point to public_html and
public_html/birchhomedir:
http://www.biology.abc.edu/~birch
http://www.biology.abc.edu/~birch/birchhomedir
Preventing lines from being changed
There are two ways to protect lines from being changed.
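1) DEPRECATED: Any line containing the HTML comment
<!-- DON'T CHANGE --> will be left unchanged.
2) An entire block of HTML can be protected by enclosing it between
<!-- BEGIN PROTECT --> and <!-- END PROTECT --> comments, as in this
example: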
<HTML>
<BODY>
Any text here may be changed.
<!-- BEGIN PROTECT -->
Nothing in this section will be changed.
For example, the anchor tag below was originally on a single
line, but SeaMonkey Composer automatically splits the anchor
across lines in the HTML code:
<a
href="ftp://ftp.cc.umanitoba.ca/psgendb/BIRCH/data/blreads/genome">ftp://ftp.cc.umanitoba.ca/psgendb/BIRCH/data/blreads/genome</a>.
Thus, at most BIRCH sites, 'psgendb' would be changed to 'birch',
which would break the link. The PROTECT tags allow us to protect
the entire block from change.
<!-- END PROTECT -->
</BODY>
</HTML>
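A sketch of how the two protection mechanisms might interact with the substitutions, assuming the marker comments appear exactly as written above and that the marker lines themselves are left alone. The name substitute_lines and the constant names are hypothetical.

```python
BEGIN_PROTECT = "<!-- BEGIN PROTECT -->"
END_PROTECT = "<!-- END PROTECT -->"
DONT_CHANGE = "<!-- DON'T CHANGE -->"


def substitute_lines(lines, pairs):
    """Apply (old, new) pairs to each line, skipping protected lines."""
    out = []
    protected = False
    for line in lines:
        if BEGIN_PROTECT in line:
            protected = True
        if protected or DONT_CHANGE in line:
            # Inside a PROTECT block, or on a DON'T CHANGE line: copy as-is.
            out.append(line)
        else:
            new_line = line
            for old, new in pairs:
                new_line = new_line.replace(old, new)
            out.append(new_line)
        if END_PROTECT in line:
            # The END PROTECT line itself stays protected; the next one is not.
            protected = False
    return out
```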
Tagging one or more lines to be omitted from the output
When a line contains the HTML comment <!-- BEGIN DELETE -->,
customdoc.py deletes that line and all following lines, up to and
including the next line containing <!-- END DELETE -->. For example,
<HTML>
<BODY>
This page will have only one line of text when processed.
<!-- BEGIN DELETE -->
The text in this section will
be deleted.
<!-- END DELETE -->
</BODY>
</HTML>
will be changed to
<HTML>
<BODY>
This page will have only one line of text when processed.
</BODY>
</HTML>
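The delete rule can be sketched as a small state machine over the lines of a file. The name strip_delete_blocks is hypothetical, and treating an unterminated BEGIN DELETE as deleting to the end of the file is an assumption; the manual does not say what happens in that case.

```python
BEGIN_DELETE = "<!-- BEGIN DELETE -->"
END_DELETE = "<!-- END DELETE -->"


def strip_delete_blocks(lines):
    """Drop every line from BEGIN DELETE through END DELETE, inclusive."""
    out = []
    deleting = False
    for line in lines:
        if not deleting and BEGIN_DELETE in line:
            deleting = True
        if not deleting:
            out.append(line)
        elif END_DELETE in line:
            # The END DELETE line is dropped too; copying resumes on the next line.
            deleting = False
    return out
```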
Replacing one or more lines with HTML from another file
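When a line contains the HTML comment
<!-- BEGIN REPLACE name="filename" -->,
customdoc.py deletes all lines between it and the next line
containing <!-- END REPLACE -->, and inserts the contents of
filename in their place. The BEGIN REPLACE and END REPLACE lines
themselves are kept. For example,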
<HTML>
<BODY>
This line won't be changed.
<!-- BEGIN REPLACE name="localident.html" -->
The text in this section will
be replaced.
<!-- END REPLACE -->
</BODY>
</HTML>
will be changed to
<HTML>
<BODY>
This line won't be changed.
<!-- BEGIN REPLACE name="localident.html" -->
This text was taken from the file localident.html
<!-- END REPLACE -->
</BODY>
</HTML>
Notes:
1. The path of filename is relative to the current directory.
2. customdoc.py does very primitive parsing of these
pseudo-comment lines. The pseudo-comments must appear exactly as
shown above. For example, no blanks can appear between "name" and
"=" or between "=" and the quoted filename.
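The replace rule and the strict format described in the notes can be sketched as follows. This sketch uses a regular expression that matches only the exact form name="filename"; it is a stand-in for customdoc.py's own parsing, which is not shown in this manual. The names expand_replace_blocks and read_file are hypothetical; read_file is passed in so the caller decides how the filename is resolved.

```python
import re

# Matches only the documented form: no blanks around "=".
BEGIN_REPLACE = re.compile(r'<!-- BEGIN REPLACE name="([^"]+)" -->')
END_REPLACE = "<!-- END REPLACE -->"


def expand_replace_blocks(lines, read_file):
    """Replace the body of each BEGIN/END REPLACE block with file contents.

    read_file(name) must return a list of lines. The BEGIN and END marker
    lines are kept. An unterminated block drops the rest of the file
    (an assumption; the manual does not cover that case).
    """
    out = []
    skipping = False
    for line in lines:
        if skipping:
            if END_REPLACE in line:
                out.append(line)
                skipping = False
            continue
        out.append(line)
        match = BEGIN_REPLACE.search(line)
        if match:
            out.extend(read_file(match.group(1)))
            skipping = True
    return out
```

Following note 1, a real read_file would open the name relative to the current directory and return its lines; a line written as name = "x" is not recognized and passes through unchanged.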
BUGS
1. In Python 2, very little checking was done to determine the
encoding of text files. Python 3 does more checking and has a
number of ways to handle encodings. The current approach used by
customdoc.py is to explicitly set the encoding when opening a text
file. For web pages in English, latin-1 usually works. All bets are
off for other languages and Unicode encodings. For a thorough
discussion, see:
http://python-notes.curiousefficiency.org/en/latest/python3/text_file_processing.html#files-with-a-reliable-encoding-marker
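A minimal sketch of the latin-1 approach, with hypothetical helper names. Latin-1 maps each of the 256 byte values to exactly one character, so decoding never raises an error, and a file read and written with latin-1 is reproduced byte for byte even if it is really UTF-8; non-ASCII characters are simply not interpreted correctly in between.

```python
def read_html(path):
    # latin-1 decodes any byte sequence without error; newline="" keeps
    # the original line endings instead of translating them.
    with open(path, encoding="latin-1", newline="") as f:
        return f.read()


def write_html(path, text):
    # Encoding with latin-1 restores exactly the bytes that were read.
    with open(path, "w", encoding="latin-1", newline="") as f:
        f.write(text)
```

ASCII strings in oldstr and newstr are still matched correctly under this scheme, which is why it usually works for English pages.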
AUTHOR
Dr. Brian Fristensky
Department of Plant Science
University of Manitoba
Winnipeg, MB Canada R3T 2N2
frist@cc.umanitoba.ca
http://home.cc.umanitoba.ca/~frist