
Edited by Luciano Giustini

CGI Corner

General information - What are CGIs for - How CGIs work - Server configuration - Environment variables

This is the first in a series of articles dedicated to Common Gateway Interface (CGI) programming

by Michele Beltrame

General information

In this article I would like to briefly describe CGI (Common Gateway Interface) programming. When you see, on a Web site, things such as access counters, imagemaps or dynamic pages, you can be almost sure that they've been created using CGI technology.
First of all, let's clarify that CGI isn't a programming language but an interface through which the Web client (the browser) can interact with programs located on the Web server. In fact, you can write CGI programs in whichever programming language you like best or know best: C, C++, Pascal, Smalltalk, Python, Basic, .... However, the most widely used language for these programs is Perl, a language born on Unix some years ago, for which we should especially thank its creator, Larry Wall. Perl has many advantages: it is easy to learn, it has very powerful string manipulation functions and operators, it is highly portable and it is rich in extensions. However, a Perl interpreter must be installed on your server (or on your provider's server), because Perl is an interpreted language (for the record, there is also a compiler, written by Malcolm Beattie, but it's still in alpha testing). All the examples in these articles are written in Perl. In any case, as I said before, you may use any programming language you like. You can find the Perl interpreter for many operating systems (besides the Unix versions, there is also a Windows NT port and many others) on Tom Christiansen's www.perl.com.
In this article I'll call the CGI programs simply CGIs. This is not exactly correct, but everyone does and it's brief. ;-)
What do I do with CGIs?
There are lots of applications of CGI programming, all of them aimed at allowing greater interaction between user and server, something that normal HTML pages do not offer (apart from links). Anyway, there are three main applications of CGI programming, from which many of the others derive:
Dynamic pages - Dynamic pages, picturesquely called virtual pages by some people, are HTML pages created on the fly, either in response to a specific request made by the client or because the data to display changes frequently. An example of a dynamic page might be the following:
Michele Beltrame home page You are visitor number 330 and you use Mozilla/3.0 (X11; I; Linux 2.0.21 i586) as Web browser.
This page has clearly been generated dynamically, because it displays data which was unknown to the creator of the CGI program (in this specific case, the number of hits the page has received and the name of the client).
Forms - [Form example] One of the most useful applications of CGI technology is handling online forms filled in by the user. Many of the controls typical of graphical interfaces may be used in a form: radio buttons, check boxes, list boxes, text areas, .... You can see an example of a form inserted in an HTML page in the image. Forms are usually used to collect information from the user and then store it in a database (or send it to a mailbox). However, it's also possible to create dynamic pages from the data the user enters in a form.

Gateways - There are kinds of information, such as the contents of a database, which can't be accessed directly by the client. To get past this problem it is necessary to use gateways, which are simply programs that read a certain file and interpret its contents, translating them into a format readable by the client.
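As a rough sketch of the gateway idea, the following Perl fragment takes records from a plain-text "database" and translates them into an HTML table. The record format (one record per line, fields separated by "|"), the sample data and the subroutine name records_to_html are assumptions made only for illustration; a real gateway would read the records from a file or query a database.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical record format: one record per line, fields separated by "|".
# Translates the records into an HTML table that any browser can display.
sub records_to_html {
    my (@lines) = @_;
    my $html = "<TABLE>\n";
    foreach my $line (@lines) {
        chomp $line;
        next if $line eq '';              # skip empty lines
        my @fields = split /\|/, $line;
        # Escape the characters that have a special meaning in HTML
        foreach my $field (@fields) {
            $field =~ s/&/&amp;/g;
            $field =~ s/</&lt;/g;
            $field =~ s/>/&gt;/g;
        }
        $html .= "<TR><TD>" . join("</TD><TD>", @fields) . "</TD></TR>\n";
    }
    return $html . "</TABLE>\n";
}

# Sample data standing in for the contents of a real database file
my @sample = ("Programming perl|Wall & Schwartz\n",
              "CGI Programming on the World Wide Web|Gundavaram\n");

# A CGI program first tells the client what kind of document follows
print "Content-type: text/html\n\n";
print "<HTML><BODY>\n", records_to_html(@sample), "</BODY></HTML>\n";
```

Keeping the translation in its own subroutine means the same code can be reused whatever the source of the records is.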
How the CGI interface works

For those who are interested not just in knowing how to use a program, but also in understanding how it works (and I hope there are many of you ;-) ), I'll write a few lines on how a CGI is invoked by the client and on how it returns its output to the client. A CGI program is invoked like any other HTML document (it is "requested"): the client sends the server something like the following:
    GET /cgi-bin/booksearch.pl HTTP/1.0
    Accept: text/html
    Accept: text/plain
    Accept: image/gif
    Accept: image/jpeg
    User-Agent: Mozilla/3.0 (X11; I; Linux 2.0.21 i586)
The first line contains the name of the requested file (booksearch.pl in the directory /cgi-bin) and the protocol used (HTTP 1.0). The following lines list the formats which the client can accept in reply to the request (in this case HTML files, plain text files, GIFs and JPEGs). The last line gives the client name (Netscape 3.0 for Linux in this case). There may be other lines in the request, such as the username, but only the first line is relevant to understanding how CGI works.
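To make the structure of the request more concrete, here is a small Perl sketch that splits a request like the one above into its parts. It is only an illustration of the format, not the way a real server is implemented, and the subroutine name parse_request is made up for this example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Splits a raw HTTP request into its method, path, protocol and header lines.
# Header names may repeat (several "Accept:" lines), so values are kept in lists.
sub parse_request {
    my ($raw) = @_;
    my @lines = split /\r?\n/, $raw;
    my $first = shift @lines;
    my ($method, $path, $protocol) = split ' ', $first;
    my %headers;
    foreach my $line (@lines) {
        last if $line eq '';                       # an empty line ends the header
        my ($name, $value) = split /:\s*/, $line, 2;
        push @{ $headers{$name} }, $value;
    }
    return ($method, $path, $protocol, \%headers);
}

my $request = join "\n",
    'GET /cgi-bin/booksearch.pl HTTP/1.0',
    'Accept: text/html',
    'Accept: image/gif',
    'User-Agent: Mozilla/3.0 (X11; I; Linux 2.0.21 i586)',
    '';

my ($method, $path, $protocol, $headers) = parse_request($request);
print "Method:   $method\n";
print "Path:     $path\n";
print "Protocol: $protocol\n";
print "Accepts:  ", join(', ', @{ $headers->{'Accept'} }), "\n";
```

The path extracted from the first line is what the server looks at to decide whether the request is for a plain document or for a CGI program.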
If the server receives a request for a document that is located in a specific directory (/cgi-bin in this case) or that has a particular extension (it may be .cgi for files located outside the /cgi-bin directory), it doesn't send the document to the client. Instead it runs it as an executable program (which in fact it is) and sends its output to the client rather than to the standard device (the screen, for instance). For the output to reach the client properly, it should be structured as follows:
    HTTP/1.0 200 OK
    Date: Sunday, 22-September-96 11:09:00 GMT
    Server: Apache/1.1.1
    MIME-version: 1.0
    Content-type: text/html
    Content-length: 4539

    <HTML>
    HTML page created by the CGI
    </HTML>
As you can see, you should return to the client a document with a full header containing the date and time, the name of the server program, the MIME protocol version, the content type and the content length. However, in most cases it is enough to return a partial header which just specifies the content type:
    Content-type: text/html

    <HTML>
    HTML page created by the CGI
    </HTML>
The server will complete the header with the missing information. This feature makes the creation of HTML pages much easier, although there are circumstances, which we'll see in a future article, in which full headers have to be used.
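In Perl, returning a partial header comes down to printing the Content-type line followed by one empty line before the document itself; the empty line is what separates the header from the body. The sketch below shows this with a small helper, whose name cgi_output is made up for the example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Builds the text a CGI program writes to standard output: the partial
# header, one empty line, then the document itself.
sub cgi_output {
    my ($content_type, $body) = @_;
    return "Content-type: $content_type\n\n" . $body;
}

print cgi_output('text/html', "<HTML>\nHTML page created by the CGI\n</HTML>\n");
```

Forgetting the empty line is a common mistake: the server then cannot tell where the header ends and usually reports an error instead of showing the page.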
Server configuration

It is necessary to make some simple changes to your server configuration for the CGI interface to work properly. Many of you won't need to work on the configuration files, because your service provider has probably already configured everything. Anyway, I think the topic deserves a few words. The configuration examples given here are for the Apache HTTP server (you can freely fetch it from http://www.apache.org), but they also work with NCSA httpd. However, I cannot vouch for other servers. ;-)
There are three configuration files (which in most cases are located in /usr/local/etc/apache/conf): httpd.conf, access.conf and srm.conf. The following directives should be included (or changed if they're already there) in access.conf:

<Directory /usr/local/business/http/italpro.com> - This directive should point to the path where your html files are stored (the so called DocumentRoot path).

Options All - This directive defines which features are enabled by default when Apache is started. The possible options are "ExecCGI", "Indexes", "Includes", "FollowSymLinks", "MultiViews" or any combination of these; with "All" they're all enabled, with "None" they're all disabled. The options needed for the CGI interface to work are "ExecCGI" (which enables program execution) and "Includes" (which enables Server Side Includes, which we'll talk about in the future). It's a good idea to activate them all.

Let's go on with srm.conf:

DocumentRoot /usr/local/business/http/italpro.com - Like the "Directory" directive of access.conf, this one should also point to the path where your html files are stored.

ScriptAlias /cgi-bin/ /usr/local/business/http/italpro.com/cgi-bin/ - This directive defines a directory within which every file, regardless of its name or extension, is considered an executable file. So, a directory like this is a good place to store CGI programs. Remember that it's possible to have more than one ScriptAlias directory. The parameters for this directive are the alias, that is to say the virtual path (the virtual root directory is the DocumentRoot), and the real path on the hard disk where the CGI programs are stored. This means that if the client makes a request like this:
http://www.italpro.com/cgi-bin/conta.pl
the following program will be executed :
/usr/local/business/http/italpro.com/cgi-bin/conta.pl

AddHandler cgi-script .cgi - This directive instructs the server to treat all files with the .cgi extension outside the /cgi-bin directory as CGI programs (executables). It is possible to specify more than one extension: many people, for instance, make the server treat all files with the .pl extension (Perl) as CGIs.

AddHandler server-parsed .shtml - This directive causes all the files with .shtml extension to be parsed by the server before being sent to the client. This allows the server to look for Server Side Includes (of which, as I said before, we'll talk in future articles). It is possible to make the server parse all files with .html and .htm extension; however, this may slow down page downloading.

The last file we need to analyze is httpd.conf:

ServerRoot /usr/local/etc/apache - Indicates the directory whose subdirectories hold the configuration and log files.
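The effect of the ScriptAlias, AddHandler and DocumentRoot directives can be summarized with a small Perl model of the decision the server makes for each requested path. This is a deliberately simplified sketch, not Apache's actual algorithm; the subroutine name resolve, the label "send-as-is" and the sample paths other than /cgi-bin/conta.pl are invented for the example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Values taken from the configuration examples above
my $document_root = '/usr/local/business/http/italpro.com';
my %script_alias  = ('/cgi-bin/' => '/usr/local/business/http/italpro.com/cgi-bin/');
my %handlers      = ('.cgi' => 'cgi-script', '.shtml' => 'server-parsed');

# Simplified model: returns the action the server would take for a
# virtual path, together with the real path of the file on disk.
sub resolve {
    my ($virtual) = @_;
    # 1. Anything under a ScriptAlias directory is executed
    foreach my $alias (keys %script_alias) {
        if (index($virtual, $alias) == 0) {
            my $rest = substr($virtual, length $alias);
            return ('cgi-script', $script_alias{$alias} . $rest);
        }
    }
    # 2. Otherwise the extension decides, through AddHandler
    my $real = $document_root . $virtual;
    if ($virtual =~ /(\.[^.\/]+)$/ && exists $handlers{$1}) {
        return ($handlers{$1}, $real);
    }
    # 3. Everything else is sent to the client unchanged
    return ('send-as-is', $real);
}

# Hypothetical requests, one for each case
foreach my $path ('/cgi-bin/conta.pl', '/tools/search.cgi', '/news.shtml', '/index.html') {
    my ($action, $file) = resolve($path);
    print "$path -> $action ($file)\n";
}
```

Note that files under the ScriptAlias directory are executed whatever their extension is, while outside it only the extensions named by AddHandler get special treatment.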

Environment variables

All right, everything should be configured by now... we are ready to begin with some simple CGI applications. The CGI interface sets a number of environment variables which the programs can access. These variables provide many types of information. Here is a summary of them; don't worry if you don't understand the purpose of some of them:

Variable           Description

GATEWAY_INTERFACE  Revision number of the Common Gateway Interface used by the server
SERVER_NAME        Server name or IP address
SERVER_SOFTWARE    Name and version of the server software
SERVER_PROTOCOL    Name and version of the protocol with which the request was made
SERVER_PORT        Port number to which the request was sent
REQUEST_METHOD     Method with which the request was sent
PATH_INFO          Extra path information passed to the CGI program
PATH_TRANSLATED    PATH_INFO translated into a real path on the server
SCRIPT_NAME        Virtual path of the CGI program being executed (e.g. /cgi-bin/booksearch.pl)
DOCUMENT_ROOT      Root directory where HTML files are stored
QUERY_STRING       String containing the information passed to the program. It is appended to the URL using a "?"
REMOTE_HOST        Name of the remote host which made the request
REMOTE_ADDR        IP address of the remote host which made the request
AUTH_TYPE          Authentication method used to validate the user
REMOTE_USER        Authenticated user name
REMOTE_IDENT       Name of the user who made the request. This variable is set only if the RFC 931 identification scheme is supported by the client and if the NCSA IdentityCheck flag is enabled
CONTENT_TYPE       MIME type of the data passed to the CGI program (e.g. text/plain, text/html, ...)
CONTENT_LENGTH     Length in bytes of the data passed to the CGI program
HTTP_FROM          E-mail address of the user who made the request. This variable is not set by most browsers
HTTP_ACCEPT        A list of MIME types which the client accepts
HTTP_USER_AGENT    Name of the browser used by the client
HTTP_REFERER       URL of the document where the client was before the call to the CGI program

Not all servers and not all clients set all these variables, so some of them may not always be available.
Now let's see a simple Perl program which uses some of the variables described above:

    #!/usr/bin/perl
    # Indicates the document MIME type
    print "Content-type: text/html\n\n";
    print "<HTML>\n";
    # Sends the HTML header
    print "<HEAD><TITLE>CGI Test</TITLE></HEAD>\n";
    print "<BODY>\n";
    print "<P>Hello!<BR>\n";
    # Sends the name of the remote host, where the client is
    print "You come from ", $ENV{'REMOTE_HOST'}, "<BR>\n";
    # Sends the Web browser name
    print "You are using ", $ENV{'HTTP_USER_AGENT'}, " as Web browser</P>\n";
    print "</BODY></HTML>";
    exit(0);

It is possible to call this CGI program from any HTML document simply by adding a link to it. For instance, let's suppose that the program is named infoclient.pl and that it is stored in the directory /cgi-bin. In this case, the line to add is the following:

    <A HREF="/cgi-bin/infoclient.pl">Client information</A>

When called, the CGI program will create a new (dynamic) HTML page, which will be similar to the following:

    Hello!
    You come from atlantis.io.com
    You are using Mozilla/3.0 (X11; I; Linux 2.0.21 i586) as Web browser

The client host name and the browser name are extracted from the environment variables and then displayed to the user. This example, though very simple, clearly shows the potential of CGI technology.
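Since some variables may not be set, and since values such as the browser name are chosen by the client, a slightly more defensive variant of the program could look like the sketch below. The helper names env_or and escape_html and the fallback texts are made up for this example; the choice of falling back from REMOTE_HOST to REMOTE_ADDR is an assumption about what is useful to display, not something the CGI interface requires.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Returns the value of an environment variable, or a fallback text when
# the server or the client did not set it.
sub env_or {
    my ($name, $fallback) = @_;
    my $value = $ENV{$name};
    return (defined $value && $value ne '') ? $value : $fallback;
}

# Escapes the characters that have a special meaning in HTML, since
# values such as HTTP_USER_AGENT are chosen by the client.
sub escape_html {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    $text =~ s/"/&quot;/g;
    return $text;
}

print "Content-type: text/html\n\n";
print "<HTML><BODY>\n";
# REMOTE_HOST may be missing, so fall back to the IP address
my $host = env_or('REMOTE_HOST', env_or('REMOTE_ADDR', 'an unknown host'));
print "<P>You come from ", escape_html($host), "<BR>\n";
print "You are using ",
      escape_html(env_or('HTTP_USER_AGENT', 'an unknown browser')),
      " as Web browser</P>\n";
print "</BODY></HTML>\n";
```

Run from the command line, where none of these variables exist, the program prints the fallback texts instead of empty strings.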
Conclusion...

I hope I have succeeded in introducing CGI programming. In the next article I'll look at forms in depth, so don't miss the next issue of BETA. Bye!

Michele Beltrame is the Webmaster of ItalPro and can be reached on the Internet through the editorial staff page.
