Behind the scenes: what does a computer actually do?

Here are some important concepts about computers that are often ignored:

A computer doesn't do very much with files that are on disk. Rather, a copy of a file is read into memory (RAM), where the computer works on that copy, following the instructions you give it. At the end of this process, the modified copy is written back to disk. For example, in a word processing program, you must ask the program to read in (open) a file, and to write it back out (save) when you are done.
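The read, modify, write cycle can be imitated at the command line. The following is only a minimal sketch: the file name notes.txt and its contents are hypothetical, and a shell variable stands in for the copy held in memory.

```shell
# Hypothetical file, created here so the example is self-contained.
workdir=$(mktemp -d)
file="$workdir/notes.txt"
printf 'hello world\n' > "$file"

# "Open": read a copy of the file from disk into memory (a shell variable).
contents=$(cat "$file")

# Work on the in-memory copy; the file on disk has not changed yet.
contents=$(printf '%s\n' "$contents" | tr 'a-z' 'A-Z')
on_disk_before=$(cat "$file")

# "Save": write the modified copy back to disk.
printf '%s\n' "$contents" > "$file"
on_disk_after=$(cat "$file")

echo "before save: $on_disk_before"
echo "after save:  $on_disk_after"
```

Until the save step runs, the file on disk still holds the original text, which is why unsaved work is lost if a program exits before writing.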

The computer does with data what enzymes do with substrates. An enzyme processes a substrate to make a product. A computer processes input to produce output. A cell can produce a diverse array of products by linking together enzymes into biochemical pathways. Pathways can have many branches, making it possible to produce many different compounds from a small number of starting compounds. Similarly, a computer system typically has a large number of programs. By using output from one program as input for another program, programs can be used in series to generate many different kinds of output from a single dataset.
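As a rough illustration of programs used in series, the sketch below chains standard Unix tools with the pipe symbol (|), so that each program's output becomes the next program's input. The dataset (a hypothetical file species.txt with one name per line) is made up for the example.

```shell
# Hypothetical dataset: one organism name per line, created for the example.
workdir=$(mktemp -d)
data="$workdir/species.txt"
printf 'maize\nrice\nmaize\nwheat\nmaize\nrice\n' > "$data"

# One "pathway": sort the names, collapse duplicates while counting them,
# then rank by count. Each program's output is the next program's input.
ranked=$(sort "$data" | uniq -c | sort -rn)

# A "branch" from the same dataset: count how many distinct names there are.
distinct=$(sort "$data" | uniq | wc -l | tr -d ' ')

echo "$ranked"
echo "distinct names: $distinct"
```

Both results come from the same input file; only the chain of programs differs, much as branches of a pathway yield different products from one starting compound.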

Unix includes a command language that provides the structural framework for executing programs and specifying where input comes from and where output goes.

ls -l                  {write out a listing of the files in the current directory}
cp adh1.dna adh1.bak   {make a copy of the file adh1.dna and call it adh1.bak}
rm adh1.dna            {remove (delete) the file adh1.dna}
ls -l > listing.asc    {write out a directory listing, but send the output to
                        a file called listing.asc}
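The commands above redirect output only. As a small sketch of the other directions, the block below also takes input from a file (<) and appends to an existing file (>>). It works in a temporary scratch directory, an assumption made so that no real files are touched.

```shell
# Scratch directory so the example does not touch real files.
workdir=$(mktemp -d)

# Output goes to a file instead of the screen.
ls -l "$workdir" > "$workdir/listing.asc"

# Input comes from a file instead of the keyboard: count its lines.
before=$(wc -l < "$workdir/listing.asc" | tr -d ' ')

# Append one more line of output to the end of the existing file.
date >> "$workdir/listing.asc"
after=$(wc -l < "$workdir/listing.asc" | tr -d ' ')

echo "lines before append: $before"
echo "lines after append:  $after"
```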

Graphical user interfaces such as MS-Windows are powerful, but they hide what's really going on. It is important to realize that when you point and click to do things, a series of commands is executing to accomplish what you have told the computer to do.

32-bit vs. 64-bit operating systems

One way in which the PC world lags behind is in how the operating system uses memory and disk space. Current PCs use 32-bit processors, meaning that only 2^32, or about 4.3 x 10^9, addresses (locations) in memory can be directly specified by a program. While most user applications fall well below this limit, large databases frequently exceed this size. To use larger memory, the operating system has to break up memory into 4.3 GB pages. So, to refer to any piece of data in memory takes 2 steps: one to specify the page, and the other to specify the address on that page. Also, code that explicitly requires paging is far more complex than code that doesn't have to worry about paging.
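The arithmetic behind these figures can be checked with shell arithmetic. This is a sketch only: it assumes a shell with 64-bit integer arithmetic (bash, for example), and the sample address is a made-up value chosen to lie beyond the 32-bit range.

```shell
# Number of distinct addresses that 32 bits can express: 2^32.
addresses_32=$((1 << 32))
echo "32-bit addresses: $addresses_32"     # 4294967296, about 4.3 x 10^9

# Two-step addressing as described above: a page number plus an
# address (offset) within that page.
page_size=$addresses_32
address=10000000000                        # hypothetical location
page=$((address / page_size))
offset=$((address % page_size))
echo "page $page, offset $offset"          # page 2, offset 1410065408
```

A program with 64-bit addresses can name location 10000000000 in one step; with 32-bit addresses it has to carry the page number and the offset separately.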

Operating systems such as Solaris Unix use native 64-bit code, meaning that they can directly address far more than 4.3 gigabytes of memory. While the need for 64-bit systems is still limited mostly to high-end servers and mainframes, systems such as Windows still lag far behind on the development curve. For example, Sun has been shipping 64-bit UltraSPARC processors since 1995 and is already shipping its 3rd generation 64-bit machine. As of this writing (Sept. 2002), Intel has only shipped limited 64-bit machines to developers, and a 64-bit Windows is still not a finished product.