Updated January 7, 2020
NAME
customdoc.py - recursively descend through a directory structure, changing HTML documentation files to correspond to the directory structure on the local system.

SYNOPSIS
python customdoc.py oldstr newstr htmldir
DESCRIPTION
Files:
oldstr - strings that need to be changed in all HTML files
newstr - new strings to be substituted for strings in oldstr
htmldir - directories in which to begin the search. customdoc.py begins in a high-level directory and recursively descends, changing all HTML files. Symbolic links are not followed to avoid redundant or unintended changes.
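The traversal described above can be sketched with os.walk, which does not descend into symbolically linked directories unless followlinks=True is passed. This is a minimal sketch, not the code in customdoc.py. The helper find_html_files is hypothetical. It assumes that HTML files are recognized by a .html or .htm suffix, and it also skips symbolically linked files, which is an assumption about what "not followed" means for files.

```python
import os


def find_html_files(topdir):
    """Yield paths of HTML files under topdir without following symbolic links.

    Hypothetical helper: the .html/.htm suffix test is an assumption.
    """
    # With followlinks=False (the default), symlinked directories are listed
    # but never descended into.
    for dirpath, dirnames, filenames in os.walk(topdir, followlinks=False):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symlinked files too, so each real file is edited only once.
            if os.path.islink(path):
                continue
            if name.lower().endswith((".html", ".htm")):
                yield path
```

Each directory listed in htmldir would be passed in turn, e.g. find_html_files("/home/birch/public_html").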

For every line in oldstr, there must be a corresponding line in newstr.
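A sketch of loading the two files into (old, new) pairs while enforcing this one-to-one rule. The helper name read_substitutions is hypothetical. Its latin-1 default (borrowed from the BUGS section) and its decision to strip only the trailing newline, preserving any other whitespace, are assumptions.

```python
def read_substitutions(oldstr_path, newstr_path, encoding="latin-1"):
    """Return a list of (old, new) pairs read line by line from two files.

    Hypothetical helper; raises ValueError if the files differ in length.
    """
    with open(oldstr_path, encoding=encoding) as f:
        old = [line.rstrip("\n") for line in f]  # keep all but the newline
    with open(newstr_path, encoding=encoding) as f:
        new = [line.rstrip("\n") for line in f]
    if len(old) != len(new):
        raise ValueError(
            f"{oldstr_path} has {len(old)} lines but {newstr_path} has {len(new)}"
        )
    return list(zip(old, new))
```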

Each line of an HTML file is read in turn, and all substitutions are applied to it consecutively, in the order in which they appear in oldstr. For example, string 1 from oldstr is changed to string 1 from newstr, then string 2 from oldstr is changed to string 2 from newstr, and so forth.
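The consecutive substitution rule amounts to a loop of str.replace calls, where each replacement operates on the output of the previous one. This is a minimal sketch; the name apply_substitutions is hypothetical and not taken from customdoc.py.

```python
def apply_substitutions(line, pairs):
    """Apply each (old, new) pair to line, in the order the pairs appear.

    Later pairs operate on the output of earlier ones, so the order of
    lines in oldstr and newstr matters.
    """
    for old, new in pairs:
        line = line.replace(old, new)  # replaces every occurrence in the line
    return line


# Two pairs from the EXAMPLE section
pairs = [("/home/psgendb", "/home/birch"), ("psgendb", "birch")]
print(apply_substitutions("files in /home/psgendb belong to psgendb", pairs))
# prints "files in /home/birch belong to birch"
```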

If no changes are made to a file, the file is left unmodified, including its modification date and file mode. If the file is changed, the old HTML file is overwritten and the file mode is set to 644 (rw-r--r--).
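A sketch of the write-only-if-changed rule. Because an unchanged file is never opened for writing, its modification time and mode stay as they were. os.chmod then sets 644 explicitly, so the result does not depend on the user's umask. The helper save_if_changed and the latin-1 encoding are assumptions.

```python
import os


def save_if_changed(path, original_text, new_text, encoding="latin-1"):
    """Overwrite path only if new_text differs from original_text.

    Returns True if the file was rewritten. Hypothetical helper.
    """
    if new_text == original_text:
        return False  # untouched: date and mode are preserved
    with open(path, "w", encoding=encoding) as f:
        f.write(new_text)
    os.chmod(path, 0o644)  # rw-r--r--
    return True
```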

EXAMPLE
Customizing BIRCH documentation files for a local installation.

oldstr
  ~
http://home.cc.umanitoba.ca/~psgendb
http://www.umanitoba.ca/afs/plant_science/psgendb
psgendb@cc.umanitoba.ca
/home/psgendb
psgendb
newstr
  ~
http://home.cc.umanitoba.ca/~psgendb
file:///home/birch
birch@tux.mb.ca
/home/birch
birch
htmldir
/home/birch/public_html

In the example above, the first substitution applied to each line converts the encoded form of the tilde written by some HTML editors back to a literal '~' (tilde), just to keep things consistent. Some HTML editors, such as Netscape Composer, replace the tilde with an encoded form, so this step makes it easier to ensure that the later changes will all be made. For BIRCH, the other lines above are as follows:
line 2 - Main URL for the BIRCH system. This URL points to the directory for the main BIRCH Web site, e.g. /home/psgendb/BIRCHDEV/public_html. In oldstr, this directory is served through httpd and is therefore accessible to the world. In newstr, the 'file://' string tells a browser that this is a local file, only accessible to people logged into the local system. The right choice depends on how your local web site, if any, is set up.

line 3 - This is an alternative URL pointing to the BIRCH home directory, e.g. /home/psgendb/BIRCHDEV. If symbolic links ARE allowed, you can simply make line 3 identical to line 2. This works because symbolic links already exist within public_html for important documentation directories such as doc, dat and local.

line 4 - Email address for the BIRCH administrator. This will be included in links from documentation so that users can mail the sysadmin.

line 5 - The path of the BIRCH home directory.

line 6 - userid of the BIRCH system administrator.
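Because substitutions are applied in order, the more specific strings (the email address on line 4 and the path on line 5) must precede the bare userid on line 6. The sketch below uses three of the pairs from the example to show what happens if the userid is moved to the front. The sample line and the substitute helper are hypothetical.

```python
# Three (old, new) pairs from the EXAMPLE section, in file order.
pairs = [
    ("psgendb@cc.umanitoba.ca", "birch@tux.mb.ca"),
    ("/home/psgendb", "/home/birch"),
    ("psgendb", "birch"),
]


def substitute(line, pairs):
    """Apply the pairs in order, as described for customdoc.py."""
    for old, new in pairs:
        line = line.replace(old, new)
    return line


line = 'Mail <a href="mailto:psgendb@cc.umanitoba.ca">psgendb</a>'
in_order = substitute(line, pairs)
# Moving the bare userid first rewrites part of the address too early.
reordered = substitute(line, [pairs[-1]] + pairs[:-1])
print(in_order)   # Mail <a href="mailto:birch@tux.mb.ca">birch</a>
print(reordered)  # Mail <a href="mailto:birch@cc.umanitoba.ca">birch</a>
```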


If your site DOES permit symbolic links in personal web sites
The symbolic link public_html/birchhomedir points to $BIRCH, making it easy to find documentation files by referencing the main $BIRCH directory. Thus, lines 2 and 3 in newstr should point to public_html and public_html/birchhomedir, for example:

http://www.biology.abc.edu/~birch
http://www.biology.abc.edu/~birch/birchhomedir
Preventing lines from being changed

There are two ways to protect lines from being changed.

1) DEPRECATED: Any line containing the HTML comment <!-- DON'T CHANGE --> will be left unchanged.

2) An entire block of HTML can be protected by enclosing it between the comments <!-- BEGIN PROTECT --> and <!-- END PROTECT -->. For example:

<HTML>
<BODY>
Any text here may be changed.
<!-- BEGIN PROTECT -->
Nothing in this section will be changed.
For example, the anchor tag below was originally on a single line, but SeaMonkey Composer automatically split the anchor across two lines in the HTML code:

<a
   href="ftp://ftp.cc.umanitoba.ca/psgendb/BIRCH/data/blreads/genome">ftp://ftp.cc.umanitoba.ca/psgendb/BIRCH/data/blreads/genome</a>.

Thus, 'psgendb' would be changed to 'birch' at most BIRCH sites, which would break the link. The PROTECT tags allow us to protect the entire block from change.
<!-- END PROTECT -->
</BODY>
</HTML>
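One way to honor both protection mechanisms is a line-by-line state flag that turns on at BEGIN PROTECT and off after END PROTECT. This sketch assumes the markers are matched as exact substrings; process_lines and its substitute argument are hypothetical names, not part of customdoc.py.

```python
def process_lines(lines, substitute):
    """Apply substitute() to each line, except lines inside a PROTECT block
    or lines carrying the deprecated DON'T CHANGE comment.

    substitute is any callable taking and returning a str (hypothetical).
    """
    out = []
    protected = False
    for line in lines:
        if "<!-- BEGIN PROTECT -->" in line:
            protected = True
        if protected or "<!-- DON'T CHANGE -->" in line:
            out.append(line)  # copied through untouched
        else:
            out.append(substitute(line))
        if "<!-- END PROTECT -->" in line:
            protected = False  # lines after this one are changed again
    return out
```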

Tagging one or more lines to be omitted from the output
When a line contains the HTML comment <!-- BEGIN DELETE -->, customdoc.py deletes all lines until a line containing <!-- END DELETE --> is found. For example,

<HTML>
<BODY>
This page will have only one line of text when processed.
<!-- BEGIN DELETE -->
The text in this section will
be deleted.
<!-- END DELETE -->
</BODY>
</HTML>

will be changed to

<HTML>
<BODY>
This page will have only one line of text when processed.
</BODY>
</HTML>
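The DELETE behaviour shown above, where the marker lines are removed along with everything between them, can be sketched with a single flag. The name delete_blocks is hypothetical, and matching the markers as exact substrings is an assumption.

```python
def delete_blocks(lines):
    """Drop every line from BEGIN DELETE through the next END DELETE, inclusive."""
    out = []
    deleting = False
    for line in lines:
        if "<!-- BEGIN DELETE -->" in line:
            deleting = True
        if not deleting:
            out.append(line)
        if "<!-- END DELETE -->" in line:
            deleting = False  # the END marker itself was already skipped
    return out
```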
Replacing one or more lines with HTML from another file
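When a line contains the HTML comment <!-- BEGIN REPLACE name="filename" -->, customdoc.py replaces all following lines, up to the next line containing <!-- END REPLACE -->, with the contents of filename. The BEGIN REPLACE and END REPLACE lines themselves are kept. For example,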

<HTML>
<BODY>
This line won't be changed.
<!-- BEGIN REPLACE name="localident.html" -->
The text in this section will
be replaced.
<!-- END REPLACE -->
</BODY>
</HTML>

will be changed to

<HTML>
<BODY>
This line won't be changed.
<!-- BEGIN REPLACE name="localident.html" -->
This text was taken from the file localident.html
<!-- END REPLACE -->
</BODY>
</HTML>

Notes:
1. The path of filename is relative to the current directory.
2. customdoc.py does very primitive parsing of these pseudo-comment lines. The pseudo-comments must appear exactly as shown above. For example, no blanks can appear between "name" and "=", or between "=" and the filename.
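A sketch of the REPLACE behaviour, using a regular expression that mirrors the exact-form rule in note 2, so no blanks are accepted around "=". The names replace_blocks and BEGIN_REPLACE are hypothetical, as is the read_file parameter, which lets the sketch run without disk I/O.

```python
import re

# Exact form only: a blank around "=" will not match.
BEGIN_REPLACE = re.compile(r'<!-- BEGIN REPLACE name="([^"]+)" -->')
END_REPLACE = "<!-- END REPLACE -->"


def replace_blocks(lines, read_file):
    """Replace the body of each REPLACE block with the named file's lines.

    read_file takes a filename and returns a list of lines (hypothetical).
    The BEGIN and END marker lines are kept, as in the example above.
    """
    out = []
    skipping = False
    for line in lines:
        if skipping:
            if END_REPLACE in line:
                out.append(line)  # keep the END marker
                skipping = False
            continue  # drop the old body
        out.append(line)
        match = BEGIN_REPLACE.search(line)
        if match:
            out.extend(read_file(match.group(1)))
            skipping = True
    return out
```

In practice read_file might be something like lambda name: open(name, encoding="latin-1").read().splitlines(), which resolves name relative to the current directory, as note 1 states.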
BUGS
1.  In Python 2, very little checking was done to determine the encoding of text files. Python 3 does more checking, and has a number of ways to handle encodings. The current approach used by customdoc.py is to set the encoding explicitly when opening a text file. For web pages in English, latin-1 usually works. All bets are off for other languages and Unicode encodings. For a thorough discussion, see:
http://python-notes.curiousefficiency.org/en/latest/python3/text_file_processing.html#files-with-a-reliable-encoding-marker
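The explicit-encoding approach can be illustrated with a small sketch. Latin-1 assigns a character to all 256 byte values, so decoding never fails and re-encoding reproduces the original bytes. For UTF-8 pages, this means ASCII-only substitutions leave non-ASCII text intact, because UTF-8 multi-byte sequences never contain ASCII bytes. The same guarantee does not hold for every encoding, which is consistent with the caution above. The helper substitute_via_latin1 and the sample page are hypothetical.

```python
def substitute_via_latin1(data, old, new):
    """Hypothetical helper: replace old with new in raw file bytes.

    Decoding as latin-1 never raises, and encoding back restores every
    byte that was not part of a replacement. new must be latin-1 encodable.
    """
    text = data.decode("latin-1")
    text = text.replace(old, new)
    return text.encode("latin-1")


# A UTF-8 page containing a non-ASCII character
page = "<p>café: /home/psgendb</p>".encode("utf-8")
result = substitute_via_latin1(page, "/home/psgendb", "/home/birch")
print(result.decode("utf-8"))  # <p>café: /home/birch</p>
```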
   

AUTHOR
Dr. Brian Fristensky
Department of Plant Science
University of Manitoba
Winnipeg, MB  Canada R3T 2N2
frist@cc.umanitoba.ca
http://home.cc.umanitoba.ca/~frist