[Italian Article] [BETA] [Summary] [Information] [Editorial Staff] [Browser]
WebTech, edited by Luciano Giustini

CGI Corner

General information - What are CGIs for - How CGIs work - Server configuration - Environment variables

This is the first in a series of articles dedicated to Common Gateway Interface (CGI) programming.

by Michele Beltrame

General information

In this article I would like to briefly describe CGI (Common Gateway Interface) programming. When you see, on a Web site, things such as access counters, imagemaps or dynamic pages, you can be almost sure that they've been created using CGI technology.

First of all, let's clarify that CGI isn't a programming language but an interface through which the Web client (the browser) is able to interact with programs located on the Web server. You can write CGI programs in the programming language you like best or know best: C, C++, Pascal, Smalltalk, Python, Basic, .... However, the most widely used language for these programs is Perl, a language born on Unix some years ago, for which we should especially thank its creator, Larry Wall. Perl has many advantages: it is easy to learn, it has very powerful string manipulation functions and operators, it is highly portable and it is rich in extensions. However, a Perl interpreter must be installed on your server (or on your provider's server), because Perl is an interpreted language (to be complete, there is also a compiler, written by Malcolm Beattie, but it is still in alpha testing). All the examples in these articles are written in Perl. In any case, as I said before, you may use whichever programming language you like. You can find the Perl interpreter for many operating systems (besides the Unix versions, there is also a Windows NT port and many others) on Tom Christiansen's www.perl.com.

In this article I'll call the CGI programs simply CGIs. This is not exactly correct, but everyone does and it's brief. ;-)


What do I do with CGIs?

There are many applications of CGI programming, all of them aimed at allowing greater interaction between user and server, something which normal HTML pages don't offer (apart from links). There are three main applications of CGI programming, from which many of the others derive:

Dynamic pages - Dynamic pages, picturesquely called virtual pages by some people, are HTML pages created on the fly, in response to a specific request made by the client or because of the need to display data which changes frequently. An example of a dynamic page might be the following:

Michele Beltrame home page
You are visitor number 330 and you use Mozilla/3.0 (X11; I; Linux 2.0.21 i586) as Web browser.

This page has clearly been generated dynamically, because it displays data which was unknown to the creator of the CGI program (in this specific case, the number of hits the page has received and the name of the client).

Forms - [Image: form example] One of the most useful applications of CGI technology is the ability to handle online forms filled in by the user. Many of the controls typical of graphical interfaces may be used in a form: radio buttons, check boxes, list boxes, text areas, .... You can see an example of a form inserted in an HTML page in the image. Forms are usually used to collect information from the user and then store it in a database (or send it to a mailbox). However, it's also possible to create dynamic pages from the data the user enters in a form.

Gateways - There are kinds of information, such as the contents of a database, which can't be directly accessed by the client. To get around this problem it is necessary to use gateways, which are simply programs that read a certain file and interpret its contents, translating them into a format readable by the client.
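
The dynamic page shown earlier (visitor number and browser name) could be produced by a program along the following lines. This is only a minimal sketch: the counter file name counter.txt and the helpers next_visit and visitor_page are assumptions made for illustration, and no file locking is done, so two simultaneous requests could lose a hit.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: read the count stored in $file, add one,
# write it back and return the new value. No locking is performed.
sub next_visit {
    my ($file) = @_;
    my $count = 0;
    if (open(my $in, '<', $file)) {
        my $line = <$in>;
        close($in);
        $count = $1 if defined $line && $line =~ /^(\d+)/;
    }
    $count++;
    open(my $out, '>', $file) or die "Cannot write $file: $!";
    print $out "$count\n";
    close($out);
    return $count;
}

# Hypothetical helper: build the HTML page shown to the visitor.
sub visitor_page {
    my ($count, $browser) = @_;
    return "<HTML><BODY>\n"
         . "<H1>Michele Beltrame home page</H1>\n"
         . "<P>You are visitor number $count and you use $browser as Web browser.</P>\n"
         . "</BODY></HTML>\n";
}

# counter.txt is an assumed file name in the current directory.
my $count   = next_visit('counter.txt');
my $browser = $ENV{'HTTP_USER_AGENT'} || 'an unknown browser';

print "Content-type: text/html\n\n";   # MIME type, then a blank line
print visitor_page($count, $browser);
```

Keeping the counting and the page building in separate subroutines makes it easy to reuse the counter for other pages.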


How the CGI interface works

For those who are interested not just in knowing how to use a program but also in understanding how it works (and I hope there are many of you ;-) ), I'll write a few lines on how a CGI is invoked by the client and how it returns its output. A CGI program is requested just like any HTML document: the client sends the server something like the following:

GET /cgi-bin/booksearch.pl HTTP/1.0
Accept: text/html
Accept: text/plain
Accept: image/gif
Accept: image/jpeg
User-Agent: Mozilla/3.0 (X11; I; Linux 2.0.21 i586)

The first line contains the name of the requested file (booksearch.pl in the directory /cgi-bin) and the protocol used (HTTP 1.0). The following lines list the formats which the client can accept in reply to the request (in this case HTML files, text files, GIFs and JPEGs). The last line gives the client name (Netscape 3.0 on Linux in this case). There may be other lines in the request, such as the username, but only the first line is relevant to understanding how CGI works.
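
To make the structure of that first line concrete, here is a small sketch that splits it into its three parts. The helper name parse_request_line is hypothetical, and a real server does considerably more checking than this.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: split an HTTP request line into method, path and
# protocol. Returns an empty list if the line does not have that shape.
sub parse_request_line {
    my ($line) = @_;
    $line =~ s/\r?\n\z//;                          # drop the line terminator
    if ($line =~ m{^([A-Z]+) (\S+) (HTTP/\d+\.\d+)$}) {
        return ($1, $2, $3);
    }
    return ();
}

my ($method, $path, $protocol) =
    parse_request_line("GET /cgi-bin/booksearch.pl HTTP/1.0\r\n");

print "Method:   $method\n";
print "Path:     $path\n";
print "Protocol: $protocol\n";
```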

If the server receives a request for a document that is located in a specific directory (/cgi-bin in this case) or that has a particular extension (it may be .cgi for files located outside the /cgi-bin directory), it doesn't send the document to the client. Instead it runs it as an executable program (which in fact it is) and sends its output to the client rather than to the standard output device (the screen, for instance). For the output to reach the client properly, it should be structured as follows:

HTTP/1.0 200 OK
Date: Sunday, 22-September-96 11:09:00 GMT
Server: Apache/1.1.1
MIME-version: 1.0
Content-type: text/html
Content-length: 4539

<HTML>
HTML page created by the CGI
</HTML>

As you can see, you should return to the client a document with a full header containing the date, time, name of the server program, MIME protocol version, content type and content length. However, in most cases it is enough to return a partial header which specifies just the content type, followed by an empty line:

Content-type: text/html

<HTML>
HTML page created by the CGI
</HTML>

The server will complete the header with the missing information. This feature makes the creation of HTML pages much easier, although there are circumstances, which we'll see in a future article, in which full headers have to be used.
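
In Perl, producing such a partial header comes down to printing the Content-type line followed by an empty line before the document itself. The sketch below wraps this in a hypothetical helper, cgi_response, and returns plain text instead of HTML to show that only the MIME type changes.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: prepend a partial header to a document body.
# The empty line ("\n\n") separates the header from the body.
sub cgi_response {
    my ($type, $body) = @_;
    return "Content-type: $type\n\n" . $body;
}

print cgi_response('text/plain', "Plain text created by the CGI\n");
```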


Server configuration

Some simple changes to your server configuration are needed for the CGI interface to work properly. Many of you won't need to touch the configuration files, because your service provider has probably already configured everything. Still, I think the topic is worth a few words. The configuration examples given here are for the Apache HTTP server (you can freely fetch it from http://www.apache.org), but they also work with NCSA httpd. However, I cannot vouch for other servers. ;-)

There are three configuration files (which in most cases are located in /usr/local/etc/apache/conf): httpd.conf, access.conf and srm.conf. The following directives should be included (or changed if they're already there) in access.conf:

  • <Directory /usr/local/business/http/italpro.com> - This directive should point to the path where your html files are stored (the so called DocumentRoot path).

  • Options All - This directive defines which features are enabled by default when Apache is started. The possible options are "ExecCGI", "Indexes", "Includes", "FollowSymLinks", "MultiViews" or any combination of these; with "All" they're all enabled, with "None" they're all disabled. The options needed for the CGI interface to work are "ExecCGI" (which enables program execution) and "Includes" (which enables Server Side Includes, which we'll talk about in the future). It's a good idea to activate them all.

Let's go on with srm.conf :

  • DocumentRoot /usr/local/business/http/italpro.com - Like the "Directory" directive of access.conf, this one should also point to the path where your html files are stored.

  • ScriptAlias /cgi-bin/ /usr/local/business/http/italpro.com/cgi-bin/ - This directive defines a directory within which every file, regardless of its name or extension, is considered an executable file. A directory like this is therefore a good place to store CGI programs. Remember that it's possible to have more than one ScriptAlias directory. The parameters for this directive are the alias, that is to say the virtual path (the virtual root directory is the DocumentRoot), and the real path on the hard disk where the CGI programs are. This means that if the client makes a request like this:
    http://www.italpro.com/cgi-bin/conta.pl
    the following program will be executed:
    /usr/local/business/http/italpro.com/cgi-bin/conta.pl

  • AddHandler cgi-script .cgi - This directive tells the server to treat all files with the .cgi extension outside the /cgi-bin directory as CGI programs (executables). It is possible to specify more than one extension: many people, for instance, make the server treat all files with the .pl extension (Perl) as CGIs.

  • AddHandler server-parsed .shtml - This directive causes all the files with .shtml extension to be parsed by the server before being sent to the client. This allows the server to look for Server Side Includes (of which, as I said before, we'll talk in future articles). It is possible to make the server parse all files with .html and .htm extension; however, this may slow down page downloading.
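
The effect of DocumentRoot, ScriptAlias and the cgi-script handler can be illustrated with a short sketch that maps a virtual path to a real file and decides whether it would be executed. This is only a model of the rules described above, not how Apache is implemented; the helper resolve and the sample path /tools/search.cgi are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Values taken from the configuration examples in the text.
my $document_root  = '/usr/local/business/http/italpro.com';
my %script_alias   = ('/cgi-bin/' => '/usr/local/business/http/italpro.com/cgi-bin/');
my @cgi_extensions = ('.cgi');

# Hypothetical helper: return the real file path for a virtual path and
# a flag (1 or 0) telling whether the file would be run as a CGI program.
sub resolve {
    my ($virtual) = @_;
    foreach my $alias (keys %script_alias) {
        if (index($virtual, $alias) == 0) {
            # Every file under a ScriptAlias directory is executed
            return ($script_alias{$alias} . substr($virtual, length($alias)), 1);
        }
    }
    foreach my $ext (@cgi_extensions) {
        # Elsewhere, only files with a listed extension are executed
        return ($document_root . $virtual, 1) if $virtual =~ /\Q$ext\E\z/;
    }
    return ($document_root . $virtual, 0);   # ordinary document, sent as is
}

foreach my $url ('/cgi-bin/conta.pl', '/tools/search.cgi', '/index.html') {
    my ($file, $is_cgi) = resolve($url);
    print "$url -> $file (", ($is_cgi ? 'executed' : 'sent as is'), ")\n";
}
```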

The last file we need to analyze is httpd.conf :

  • ServerRoot /usr/local/etc/apache - Indicates the directory whose subdirectories hold the configuration and log files.
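
Put together, the directives discussed in this section might look as follows in the three files. This is just a sketch assembled from the examples above; the paths are the ones used in this article and must be adapted to your own installation. Note that a <Directory> section has to be closed with </Directory>.

```apache
# access.conf
<Directory /usr/local/business/http/italpro.com>
Options All
</Directory>

# srm.conf
DocumentRoot /usr/local/business/http/italpro.com
ScriptAlias /cgi-bin/ /usr/local/business/http/italpro.com/cgi-bin/
AddHandler cgi-script .cgi
AddHandler server-parsed .shtml

# httpd.conf
ServerRoot /usr/local/etc/apache
```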


Environment variables

All right, everything should be configured by now, so we are ready to begin with a simple CGI application. The CGI interface sets a number of environment variables which programs can access. These variables provide many kinds of information. Here is a summary of them; don't worry if you don't understand the purpose of some of them:

Variable - Description
GATEWAY_INTERFACE - Revision number of the Common Gateway Interface used by the server
SERVER_NAME - Server name or IP address
SERVER_SOFTWARE - Server name and version
SERVER_PROTOCOL - Name and version of the protocol with which the request was made
SERVER_PORT - Port number of the host on which the server is running
REQUEST_METHOD - Method with which the request was sent
PATH_INFO - Extra path information passed to the CGI program
PATH_TRANSLATED - Translated PATH_INFO variable
SCRIPT_NAME - Virtual path of the CGI program being executed (e.g. /cgi-bin/booksearch.pl)
DOCUMENT_ROOT - Root directory where HTML files are stored
QUERY_STRING - String containing the information passed to the program. It is appended to the URL using a "?"
REMOTE_HOST - Name of the remote host which made the request
REMOTE_ADDR - IP address of the remote host which made the request
AUTH_TYPE - Authentication method used to validate the user
REMOTE_USER - Authenticated user name
REMOTE_IDENT - Name of the user who made the request. This variable is set only if the client supports the RFC 931 identification scheme and the NCSA IdentityCheck flag is enabled
CONTENT_TYPE - MIME type of the data passed to the CGI program (e.g. text/plain, text/html, ...)
CONTENT_LENGTH - Length in bytes of the data passed to the CGI program
HTTP_FROM - E-mail address of the user who made the request. This variable is not set by most browsers
HTTP_ACCEPT - A list of MIME types which the client accepts
HTTP_USER_AGENT - Name of the browser used by the client
HTTP_REFERER - URL of the document where the client was before the call to the CGI program

Not all servers and not all clients set all of these variables, so some of them may not always be available.

Now let's look at a simple Perl program which uses some of the variables described above:

#!/usr/bin/perl

print "Content-type: text/html\n\n"; # Indicates the document MIME type

print "<HTML>\n";
print "<HEAD><TITLE>CGI Test</TITLE></HEAD>\n"; # Sends the header
print "<BODY>\n";
print "<P>Hallo!\n<BR>";
print "You come from ", $ENV{'REMOTE_HOST'}, "<BR>\n"; # Sends the name of the remote host, where the client is
print "You are using ", $ENV{'HTTP_USER_AGENT'}, " as Web browser</P>\n"; # Sends the Web browser name
print "</BODY></HTML>";

exit (0);

This CGI program can be called from any HTML document simply by adding a link to it. For instance, let's suppose that the program is named infoclient.pl and that it is stored in the directory /cgi-bin. In this case, the line to add is the following:

<A HREF="/cgi-bin/infoclient.pl">Client information</A>

When called, the CGI program will create a new (dynamic) HTML page, which will look similar to the following:

Hallo!
You come from atlantis.io.com
You are using Mozilla/3.0 (X11; I; Linux 2.0.21 i586) as Web browser.

The client host name and the browser name are extracted from the environment variables and then displayed to the user. This example, though very simple, clearly shows the potential of CGI technology.
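
Since not every server and client sets every variable, a program may want to fall back on another variable or on a default text. The following sketch shows one possible way to do that; the helper env_or and the default strings are assumptions made for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return the value of the first variable in @names
# that is set and not empty in the given hash, or $default otherwise.
sub env_or {
    my ($env, $default, @names) = @_;
    foreach my $name (@names) {
        return $env->{$name}
            if defined $env->{$name} && $env->{$name} ne '';
    }
    return $default;
}

# Use the host name if the server resolved it, otherwise the IP address.
my $host    = env_or(\%ENV, 'an unknown host', 'REMOTE_HOST', 'REMOTE_ADDR');
my $browser = env_or(\%ENV, 'an unknown browser', 'HTTP_USER_AGENT');

print "Content-type: text/html\n\n";
print "<HTML><BODY>\n";
print "<P>You come from $host<BR>\n";
print "You are using $browser as Web browser</P>\n";
print "</BODY></HTML>\n";
```

Passing the hash by reference, rather than reading %ENV inside the helper, makes it possible to try the helper with made-up values outside a Web server.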


Conclusion...

I hope I have succeeded in introducing CGI programming. In the next article I'll look at forms in depth, so don't miss the next issue of BETA. Bye!


Bibliography:
Larry Wall / Randal L. Schwartz - "Programming Perl" - O'Reilly & Associates
Shishir Gundavaram - "CGI Programming on the World Wide Web" - O'Reilly & Associates
Apache Documentation - http://www.apache.org/docs/

Michele Beltrame is the Webmaster of ItalPro and can be reached on the Internet through the editorial staff page.

Copyright © 1996 BETA Group. All rights reserved.

