
Introduction

by Paul Doyle

This book is about a clever little programming language called Perl and how you can use it to make the most of your World Wide Web server.

The book tells you what Perl is, how it works, and how to write Perl programs. Much of this material will be useful even if you never do any Web server work. The book also deals with some general Web server issues, such as security. But at heart, this book is about Perl programming applied to Web development.

The Web

You've surely heard of the Internet and the World Wide Web by now. If you haven't, now may be a good time to put this book back on the shelf and check out one of the many introductory volumes about the Web instead. After you've used the Web for a while, you may find yourself producing Web pages for other people to use; then would be a good time to come back to this book.

The term intranet, which doesn't have the same currency as Internet, refers to a network along the lines of the Internet but internal to a corporation and usually protected from the Internet by a firewall. Web servers dominate on these so-called local Internets just as much as they do on the real thing, so this book is as relevant to them as it is to the global network.

Terminology

If you're a Web user, terms such as URL, httpd, and browser are old hat to you. You're familiar with Apache, CERN, and proxy; and you're at least on nodding terms with the likes of CGI, MIME, and socket. (Don't worry if you've never heard of Mozilla.)

Still, just so that there's no confusion, review the following list:

That list is a roundabout way of saying that a Web server and an HTTP server are not necessarily the same thing, and that this book is about Perl in the context of HTTP servers. We'll refer to Web servers frequently throughout the book, because the concept is often useful for dealing with the presentation of information on the Web. Also, Web is in the book's title because it reads better than Using Perl for HTTP Server Programming.

Growth

The Internet is getting to be an awfully big place. According to Network Wizards (http://www.nw.com/), there appeared to be more than 9 million Internet hosts at the time when this book was written.

Estimating the number of people who use the Internet is notoriously difficult, but it's generally recognized that more than 20 million people now use the Internet on a regular basis. This figure includes people who have electronic mail or FTP access, as well as those who are fortunate enough to have the type of connection and equipment that allows them to use the Web.

Not all of the 20 million Internet users have access to the Web, but the number who have Web access is growing faster than the overall number who have Internet access. Recently, publicity about the Web has become so overwhelming that many people think of the Internet purely as being the Web. Also, for the first time, many people are purchasing PCs for the primary purpose of Web access.

The Web, in short, is in an upward growth spiral that shows no sign of leveling out before the end of the century.

Trends

Apart from the sheer scale of growth, some interesting facts about the development of the Web are beginning to emerge. All the facts presented in this section arise from a curious combination: the lack of rules about what you can do on the Web, and the very strict rules about how you do it.

One interesting fact is that people are astonishingly creative in thinking up uses for the system. Live share prices as a Web-based screen saver, political agitation and petition collection, merchandise sales via the Web, multimedia résumés... there's just no telling what people will get up to, given enough bandwidth.

Another interesting fact is that in spite of its scale (which suggests a homogenizing influence), the Web appears to act as an agent of diversity. Small companies, community groups, and schools are there along with the big corporations. The number of languages represented on the Web is growing, not declining. This broad spectrum of interests may be due, in part, to the increasing ease with which organizations can establish an effective Web presence.

Perhaps the most important trend, from the point of view of this book, is that the Web is becoming a much more dynamic place. Dynamic doesn't just mean that pages are now being replaced on a regular basis (although they are, which is a welcome change from the time when Web pages tended to be less recent than printed matter). The word also doesn't just mean that the people who produce the Web every day have a dynamic, creative demeanor (although many of them do, which is why we have such wonders as Robotman, at http://www.unitedmedia.com/comics/robotman). Dynamic means that much more of the information available on the Web is generated live when a user requests it. Databases are searched, files are counted, text is translated, and so on.

This trend is part of the excitement of using the Web now. An interesting page is all very well, but if the page is static, you probably won't visit it again except to see whether it has been updated. If, however, the content of the page depends on the passage of time, on your input, or on the input of other users, you are much more likely to come back.

The trend is also a big part of the excitement of developing on the Web now. Web server management involves much more than writing pages of deathless HTML; a good deal of real-time programming goes on, too. This programming is real-time in the sense that the programs react to external events and produce output that is used there and then. You could also say that the programming is real-time because the pressure for rapid development and new features means that the code is often edited while it is in use.

Perl and the Web

Which brings us to Perl.

Perl is the ideal development language for Web server work, for many reasons. Chapter 1, "Perl Overview," discusses the nature of Perl in much more detail; the following sections concentrate on the reasons why Perl suits Web server development.

Rapid Development

Many Web server programming projects are high-level rather than low-level, which means that they tend not to involve bit-level manipulations, direct operating-system calls, or interaction with the server hardware. Instead, the projects focus on reading from files, reformatting the output, and writing to standard output (in other words, the browser). The programmer does not need (or want) to get involved in the details of how file handles and buffers are manipulated, how memory is allocated, and so on.

High-level tasks such as file manipulation and text formatting are exactly the kind of tasks at which Perl excels. You can tell Perl to slurp in the contents of a file and display it on the user's browser with all newlines replaced by tabs, as follows:

while ( <INFILE> ) { s/\n/\t/; print; }

Don't worry about the details of that code example until you read Chapter 1, "Perl Overview." Just notice two things:

In a nutshell, the secret of rapid development is writing small amounts of powerful code without having to consider awkward issues of syntax at every step.
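For readers who want to try the idea right away, here is a slightly longer sketch of the same newline-to-tab filter. The helper name newlines_to_tabs and the in-memory sample text are our own illustrative choices, not part of the original example, and the in-memory file handle assumes a reasonably modern Perl (5.8 or later).

```perl
use strict;
use warnings;

# Hypothetical helper: read every line from an open filehandle and
# return the text with each newline replaced by a tab.
sub newlines_to_tabs {
    my ($fh) = @_;
    my $out = '';
    while ( my $line = <$fh> ) {
        $line =~ s/\n/\t/;    # same substitution as the one-liner above
        $out .= $line;
    }
    return $out;
}

# Demonstration on an in-memory "file" so the sketch needs no real file.
my $sample = "alpha\nbeta\ngamma\n";
open( my $in, '<', \$sample ) or die "Cannot open in-memory file: $!";
print newlines_to_tabs($in), "\n";
close($in);
```

In a real script, you would open a file on disk instead of the sample string; the loop itself would not change.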

Perl is pithy; a little Perl code goes a long way. In terms of programming languages, that statement usually means that the code is difficult to read and painful to write. But although Larry Wall (the author of Perl) says that Perl is functional rather than elegant, most programmers quickly find that Perl code is very readable and that becoming fluent in writing it is not difficult. The fact that Perl is pithy rather than terse makes it especially appropriate for the high-level macro operations that are typically required in Web development.

As it happens, Perl is quite capable of handling some fairly low-level operations, too: handling operating-system signals and talking to network sockets, for example. But for most Web programming purposes, that level of detail is just not needed.

Compiler and Interpreter

A program can't achieve anything by itself; to carry out its work, it needs to be fed to either a compiler or an interpreter. Each of these entities has its advantages, as follows:

Compilers and interpreters each have relative advantages and disadvantages. Compiled code takes longer to prepare, but it runs fast, and your source stays secret. Interpreted code gets up and running quickly, but it isn't as fast as compiled code; in addition, you need to distribute the program source if you want to allow other people to run your programs.

Which of these categories describes Perl?

Perl is special in this regard: it's a compiler that thinks it's an interpreter. Perl compiles program code into executable code before running it, so an optimization stage occurs, and the executable code runs quickly. Perl doesn't write this code to a separate executable file, however; instead, it stores the code in memory and then executes it. Therefore, Perl combines the rapid development cycle of an interpreted language with the efficient execution of compiled code.

The corresponding disadvantages of compilers and interpreters also apply to Perl. The need to compile the program each time it is run makes for slower startup than a purely compiled language provides, and developers are required to distribute source code to users. In practice, however, these disadvantages are not too limiting, for the following reasons:

In summary, Perl is compiled behind the scenes for rapid execution, but you can treat it as though it were interpreted. Tweaking your HTML is easy; just edit the code, and allow the users to run it. But is that good programming practice? Hey, that's one for the philosophers.


Because Perl code is truly compiled, it has no such thing as a run-time syntax error (unless you get into the realm of generating Perl code on the fly and then executing it). This fact is important when you consider that your server is your interface to the outside world; sudden script crashes caused by minor typos are not what you want people to see. Quick execution of a Perl script tells you whether all the syntax in the script is valid.

Of course, that's no guarantee that your code won't disgrace you for some other reason.
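The exception mentioned above (code generated on the fly) can be seen in a small sketch. Here a string of Perl code is handed to eval, which compiles it at run time; the helper name runs_cleanly and the $ran counter are hypothetical names invented for this illustration. For ordinary scripts, running perl with the -c switch checks the syntax without executing the program.

```perl
use strict;
use warnings;

# Hypothetical helper: compile and run a string of Perl code, returning
# a true value on success and false if it failed to compile or run.
sub runs_cleanly {
    my ($code) = @_;
    my $ok = eval "$code; 1";    # string eval compiles $code at run time
    return $ok ? 1 : 0;
}

our $ran = 0;

# Valid code: compiles, then runs, so $ran is incremented.
print "valid snippet ok\n" if runs_cleanly('$main::ran++');

# A syntax error anywhere in the string stops compilation, so the
# increment at the start of the string never runs either.
print "broken snippet rejected: $@" unless runs_cleanly('$main::ran++; if (');
print "ran = $ran\n";
```

The point to notice is that the broken snippet is rejected as a whole: because compilation comes first, none of it executes.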

Flexibility

Perl was not designed in the abstract; it was written to solve a particular problem, and it evolved to serve an ever-widening set of real-world problems.

Perl's developer could have expanded the language to handle these tasks by adding more and more keywords and operators, that is, by making the language bigger. Instead, the core of the Perl language started small and became more refined as time went on. In some ways, the language actually contracted. The number of reserved words in Perl 5.0 is less than half the number in Perl 4.0.

This situation reflects an awareness that Perl's power lies in its unique combination of efficiency and flexibility. Perl itself has grown slowly and thoughtfully, usually in ways that allow for enhancements and extensions to be added rather than hard-wired in. This approach has been critical in the development of Perl's extensibility over time, as the following section explains.

Extensibility

Much of the growth in Perl as a platform has come by way of the increasing use of libraries (Perl 4.0) and modules (Perl 5.0). These add-on elements (discussed in more detail in Chapter 1, "Perl Overview," and in Chapter 16, "Subroutine Definition") essentially allow developers to write self-contained portions of Perl code that can be slotted into a Perl application. The add-ons range from fairly high-level utilities (such as a module that adds HTML tags to text) to low-level, down-and-dirty development tools (such as code profilers and debuggers).

The capability to use extensions such as these is a remarkable advance in the development of a fairly slick language, and it has helped to fuel the growth in Perl use. Perl developers can easily share their work with others, and the arrival of objects in Perl 5.0 makes structured design methodologies possible for Perl applications. The language has come of age without losing any of its flexibility or raw power.
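To give a feel for what objects look like in Perl 5, here is a minimal sketch of a class. The PageCounter package and its methods are hypothetical, made up purely for illustration; a real module would normally live in its own file and be pulled in with use.

```perl
use strict;
use warnings;

# Hypothetical class: counts how often each page is requested.
package PageCounter;

sub new {
    my ($class) = @_;
    my $self = { hits => {} };     # per-object state lives in a hash
    return bless $self, $class;    # bless turns the hash into an object
}

# Record one request for a page and return the new total.
sub record {
    my ( $self, $page ) = @_;
    return ++$self->{hits}{$page};
}

# Return the number of requests seen for a page (0 if none).
sub hits {
    my ( $self, $page ) = @_;
    return $self->{hits}{$page} || 0;
}

package main;

my $counter = PageCounter->new;
$counter->record('/index.html') for 1 .. 3;
$counter->record('/about.html');
print "index: ", $counter->hits('/index.html'), "\n";
print "about: ", $counter->hits('/about.html'), "\n";
```

Because all the state is kept inside the object, another developer can use PageCounter without knowing (or caring) how the counts are stored.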


Appendix B, "Perl Web Reference," describes several Perl libraries and modules. Browse through the appendix to get the flavor of the modules that are available. Also, the CD-ROM that came with this book contains a collection of freely available modules, along with documentation that explains how to use them. For details, see Appendix C, "What's on the CD?"

Web Server Work

Web servers generate huge amounts of HTML. The M stands for markup, and you need a great deal of it to make your Web pages more exciting than the average insurance contract. Using HTML is a fiddly business, though; problems can easily arise if tags are misplaced or misspelled. Perl is a good choice of language to look after the details for you while you get on with the big picture, especially if you call on the object-oriented capabilities of Perl 5.0.
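As a rough illustration of letting Perl look after the fiddly details, the following sketch wraps text in a matching pair of tags and escapes the characters that HTML treats specially. The helper names escape_html and tag are our own inventions for this example, not functions from any particular module.

```perl
use strict;
use warnings;

# Hypothetical helper: escape the characters that are special in HTML.
sub escape_html {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;    # must come first, or later entities get re-escaped
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    $text =~ s/"/&quot;/g;
    return $text;
}

# Hypothetical helper: wrap text in a matching pair of tags, so the
# closing tag can never be misspelled or forgotten.
sub tag {
    my ( $name, $text ) = @_;
    return "<$name>" . escape_html($text) . "</$name>";
}

print tag( 'h1', 'Prices & Orders' ), "\n";
print tag( 'p',  'Widgets cost <5 dollars.' ), "\n";
```

Since the opening and closing tags are built from the same variable, a typo in a tag name shows up in both places at once rather than silently unbalancing the page.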

Of particular interest to many Web server managers is the fact that Perl works well with standard UNIX DBM files. Also, support for proprietary databases is growing rapidly. These considerations are significant if you plan to allow users to query database material over the Web.
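The following sketch suggests how a script might store and query records in a DBM file. It assumes the SDBM_File module that is bundled with current Perl distributions; the helper names save_records and lookup, the file name parts, and the sample records are all hypothetical. Other DBM flavors are tied in a similar way, but check the documentation for the one you have.

```perl
use strict;
use warnings;
use Fcntl;                      # O_RDWR, O_CREAT and O_RDONLY flags
use SDBM_File;                  # a DBM implementation bundled with Perl
use File::Temp qw(tempdir);

# Hypothetical helper: store a hash of records in a DBM file.
sub save_records {
    my ( $path, %records ) = @_;
    tie( my %db, 'SDBM_File', $path, O_RDWR | O_CREAT, 0666 )
        or die "Cannot open DBM file $path: $!";
    $db{$_} = $records{$_} for keys %records;
    untie %db;
    return;
}

# Hypothetical helper: look up one key, as a Web query script might.
sub lookup {
    my ( $path, $key ) = @_;
    tie( my %db, 'SDBM_File', $path, O_RDONLY, 0666 )
        or die "Cannot open DBM file $path: $!";
    my $value = $db{$key};
    untie %db;
    return $value;
}

my $dir  = tempdir( CLEANUP => 1 );    # scratch directory for the demo
my $file = "$dir/parts";               # SDBM adds .pag and .dir itself
save_records( $file, widget => 'in stock', gadget => 'back-ordered' );
print "widget: ", lookup( $file, 'widget' ), "\n";
```

Once the file is tied, the database behaves like an ordinary Perl hash, which is what makes DBM files so convenient for simple lookups.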

Security

Security is a major issue on the Internet in general. If you use Perl for scripting on your Web server, you can easily prevent users from trying to sneak commands through for the server to execute on their behalf. Also, an excellent Perl 5.0 module called pgpperl (also known as Penguin) allows your server to use public-key cryptography techniques to protect sensitive data from eavesdroppers. For more information on pgpperl, see Appendix B, "Perl Web Reference."
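One common way to keep users from sneaking commands through is to accept only input that matches a strict pattern. The sketch below is one possible approach, not a complete security policy; the helper name safe_filename and the sample inputs are hypothetical. Perl's taint mode (the -T switch) can additionally flag outside data that has not been checked in this way.

```perl
use strict;
use warnings;

# Hypothetical helper: accept a user-supplied file name only if it is
# made of letters, digits, underscores, hyphens and single dots.
# Anything else (slashes, semicolons, backticks, pipes) is rejected.
sub safe_filename {
    my ($name) = @_;
    return undef unless defined $name;
    return undef if $name =~ /\.\./;                 # no parent-directory tricks
    return $1 if $name =~ /\A([A-Za-z0-9_.-]+)\z/;   # allow-list match
    return undef;
}

for my $input ( 'report.txt', 'report.txt; rm -rf /', '../etc/passwd' ) {
    my $clean = safe_filename($input);
    print defined $clean ? "accepted: $clean\n" : "rejected: $input\n";
}
```

The design choice here is to list what is allowed rather than what is forbidden; it is much easier to forget a dangerous character than to forget a safe one.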

Ubiquity

Many people on the Web already use Perl. Going with the flow isn't always the best approach, but Perl has grown with the Web. A great deal of experience is available on the Web if you need advice. The Perl developers are keenly aware of Web issues as they add to Perl, and they have built many Perl modules with the Web in mind.

Perl Summary

You want to use Perl for many reasons, including the fact that it is small, efficient, flexible, and robust. Perl is particularly well suited for Web development work, in which text output is a major preoccupation. And if these reasons aren't quite enough, consider this one: Perl is free.

The Structure of This Book

By now, you should be sold on the idea of using Perl for Web development work. This book tells you how to do so, using the following structure:

Throughout the book, we'll use snippets of Perl code and sometimes entire listings for illustration. All code listed in this manner is available on the CD-ROM.


This book (like life in general) is too short to describe how to do all the things in the list for all the HTTP servers and browsers that currently exist. For the sake of manageability, we'll concentrate on the Apache server and the Netscape browser, which were the most popular products of their types at the time when this book was written. When significant differences exist between these products and other popular products, we'll draw your attention to that fact.

The Conventions Used in This Book

There are many different ways to write a book, and even more ways to format one. The Que style is an excellent way of breaking information into a readable form that is easy to digest.

The majority of application programs written these days allow you to use either a mouse or a keyboard to operate the program. In steps that tell you how to perform a particular task, I always indicate the appropriate keystroke combinations that perform the action. Hot keys, or accelerators, are designated by an underline below the character that's the accelerator. If I were giving you instructions to open a file using Microsoft Word, for example, you would see:

From the File menu, choose Open.

To indicate a combination of keys to be pressed at the same time, the two keys are joined by a plus (+) character. If I were giving you instructions to paste a piece of text from the Clipboard into a Word document, you would see:

From the Edit menu, choose Paste, or press Ctrl+V.

The names of dialog boxes and windows, and the names of dialog-box and window options, are indicated by capitalizing the initial letters of the title. When you are saving a file in Microsoft Word, for example, the dialog box that you use to specify the file name is:

File Save

Any new terms and ideas are introduced in italic type, and messages that may appear on the screen are presented in a special font:

All source code and examples of code listings are presented in a monospace font.

Tips are used to indicate some cool trick or a neat way to organize your code. Watch out for tips, and use them in your day-to-day work, because they'll generally save you time or offer you a unique solution to an existing problem.


Notes provide extra information related to the topic that is being discussed in the body of the text.

Cautions are designed to alert you to dangerous actions or situations that could cause damage to your environment. You should pay particular attention to cautions so that you do not create a problem at your site.

If more information on a particular topic appears in another chapter, you see a cross-reference that indicates the chapter to look for. A right-facing triangle indicates that the reference is in a later chapter of the book. A left-facing triangle means that the reference is in an earlier chapter. Following is an example:

Enjoy the book!


Copyright © 1996, Que Corporation
Technical support for our books and software is available by email from
support@mcp.com

Copyright ©1996, Que Corporation. All rights reserved. No part of this book may be used or reproduced in any form or by any means, or stored in a database or retrieval system without prior written permission of the publisher except in the case of brief quotations embodied in critical articles and reviews. Making copies of any part of this book for any purpose other than your own personal use is a violation of United States copyright laws. For information, address Que Corporation, 201 West 103rd Street, Indianapolis, IN 46290.

Notice: This material is from Special Edition, Using Perl for Web Programming, ISBN: 0-7897-0659-8. The electronic version of this material has not been through the final proof reading stage that the book goes through before being published in printed form. Some errors may exist here that are corrected before the book is published. This material is provided "as is" without any warranty of any kind.
