Web Database Applications with PHP \& MySQLWeb Database Applications with PHP \& MySQLSearch this book

Chapter 5. User-Driven Querying

Contents:

User Input
Querying with User Input
Case Study: Previous and Next Browsing
Case Study: Producing a select List

In this chapter, we build on the querying techniques discussed in Chapter 4 and complete our coverage of techniques that read data from web databases. We focus here on user-driven querying, in which the user provides data that controls the query process. To input parameters into the querying process, the user usually selects or types data into an HTML <form> environment, or clicks on links that request scripts.

We explain user-driven querying by introducing how to:

Our case-study example in this chapter is the wine browsing component of the winestore. Similar to most user-driven modules, the wine browsing component has two subcomponents: first, the search bar allows the user to enter a type of wine as a criteria for a database query; and, second, the results pages show the user the wines that match the criteria entered in the search bar. The search bar is shown in Figure 5-1 at the base of the winestore search page, and the results of running the query are presented above it in a results page. The results pages allow the user to view the wines in pages of 12 wines each, move between results pages, and add wines to his shopping cart.

Figure 5-1

Figure 5-1. The winestore search bar and results page

The querying just described is a two-component user-driven querying process. A less common type of user-driven querying describes a query that doesn't produce output, but instead returns the user directly to the query input component. This one-component querying process is often used to add items to a shopping cart. We also explain one-component querying in this chapter.

Extended examples of user-driven querying can be found in Chapter 10 to Chapter 13.

5.1. User Input

Three techniques can be used to pass data that drives the querying process in a web database application:

Using an HTML <form> and clicking on hypertext links are the most common techniques for providing user input for querying in web database applications.

In practice, user data or parameters are passed from a web browser to a web server using HTTP; Chapter 1 contains an introduction to HTTP and more details can be found in Appendix B. Using HTTP, data is passed with one of two methods, GET or POST. In the GET method, data is passed as part of the requested URL; the GET method gets a resource with the parameters modifying how the resource is retrieved. In the POST method, the data is encoded separately from the URL and forms part of the body of the HTTP request; the POST method is used when data is to be posted or stored on the server. The HTML <form> environment can specify either the GET or POST method, while an embedded link or a manually entered URL with parameters always uses the GET method.

In this section, we discuss how to:

Section 5.2 introduces techniques to execute queries that include user input and to present the results.

5.1.1. Passing Data with URLs

The first technique that passes data from a web browser to a web server is manual entry of a URL in a web browser.

Consider an example user request with a parameter. In this example, the user types the following URL directly into the Location box in the Location toolbar of a Netscape browser:

http://localhost/example.5-1.php?regionName=Riverland

The URL specifies that the resource to be retrieved is example.5-1.php with a query string parameter of regionName=Riverland appended to the resource name. The user then presses the Enter key to issue an HTTP request for the resource and to use the GET method that passes the parameter to the resource. The query string parameter consists of two parts: a parameter name regionName and a value for that parameter of Riverland.

The script resource example.5-1.php is shown in Example 5-1. Before the script is processed by the PHP scripting engine, variables associated with any parameters to the resource are initialized and assigned values. In this example, a variable $regionName, which has the same name as the URL parameter name, is automatically initialized by the PHP engine and assigned the value Riverland that was passed in the URL. This variable and its value are then accessible from within the script, making the data passed by the user available in the middle tier.

Example 5-1. Printing the value of a parameter passed to the script with an HTTP request

<!DOCTYPE HTML PUBLIC 
               "-//W3C//DTD HTML 4.0 Transitional//EN"
               "http://www.w3.org/TR/html4/loose.dtd">
<html>
<head>
  <title>Parameter</title>
</head>
<body>
<?php
  include 'db.inc';

  echo "RegionName is " . $regionName . "\n";
?>
</body>
</html>

As a result of running the script, the following HTML document is created with the value of the query string parameter printed as part of the output:

<!DOCTYPE HTML PUBLIC 
               "-//W3C//DTD HTML 4.0 Transitional//EN"
               "http://www.w3.org/TR/html4/loose.dtd">
<html>
<head>
  <title>Parameter</title>
</head>
<body>
RegionName is Riverland
</body>
</html>

In practice, as discussed later in Section 5.2, this data might be used as part of a clause in an SQL query.

Automatic variable initialization from parameters is one of the best features of PHP. PHP automatically initializes each variable that has the same name as a parameter in an HTTP request, and the parameter values are automatically assigned to the variables. No additional programming is required to access query string parameters.

More than one parameter can be passed with an HTTP GET request by separating each parameter with an ampersand character. For example, to pass two parameters regionName and Type with the values Yarra and Red, respectively, the following URL can be created:

http://localhost/test.php?regionName=Yarra&Type=Red

The values of these parameters can then be printed in the script test.php using the fragment:

echo $regionName;
echo $Type;

5.1.2. Passing Data with the HTML <form> Environment

The second technique that captures data passed from a browser to a server is the HTML <form> environment.

Manually entering data as part of a URL is unusual. Instead, users typically enter data into an HTML <form> that is then encoded by the browser as part of an HTTP request. Example 5-2 is an HTML document that contains a <form> in which to enter the name of a wine region. The page, rendered with a Netscape browser, is shown in Figure 5-2.

Figure 5-2

Figure 5-2. A simple page to capture user input

Example 5-2. An HTML <form> for entry of a regionName

<!DOCTYPE HTML PUBLIC 
               "-//W3C//DTD HTML 4.0 Transitional//EN"
               "http://www.w3.org/TR/html4/loose.dtd">
<html>
<head>
  <title>Explore Wines in a Region</title>
</head>
<body bgcolor="white">
  <form action="example.5-1.php" method="GET">
    <br>Enter a region to browse :
    <input type="text" name="regionName" value="All">
    (type All to see all regions)
    <br>
    <input type="submit" value="Show wines">
  </form>
  <br><a href="index.html">Home</a>
</body>
</html>

When the user presses the button labeled Show Wines, the data entered in the <form> is encoded in an HTTP request for the resource example.5-1.php. The resource to be requested is specified in the action attribute of the <form> tag, as is the method used for the HTTP request:

<form action="example.5-1.php" method="GET">

In this <form>, there is only one <input> widget with the attribute type="text" and name="regionName". When the GET method is used, the name of this attribute and its value result are appended to the URL as query string parameters. If the user types Yarra Valley into the text widget and then clicks on Show Wines, the following URL is requested:

http://localhost/example.5-1.php?regionName=Yarra+Valley

Submitting the <form> has the same result as manually typing in the URL but the user need not understand URLs and HTTP requests when using <form>.

After submitting the <form>, the script in Example 5-1 outputs as a response an HTML document containing the phrase "regionName is Yarra Valley". Note that the space character entered by the user in the <form> is automatically encoded in the URL as a plus character by the web browser, then decoded back to a space character by the PHP scripting engine.

The HTTP POST method can be used in a <form> instead of the GET method by changing the method="GET" attribute of the <form> tag to method="POST"; the merits of POST versus GET are discussed in more detail in Appendix B. This change of method has no effect on automatic variable initialization in PHP scripts, and the PHP script engine initializes variables from the parameters passed in the POST request in the same way it does for GET requests. The script in Example 5-1 can be used without modification to process a regionName attribute that is passed with a POST request.

NOTE: All <form> fields—whether passed using the GET or POST methods—are automatically translated into PHP variables for direct use in scripts.

This is one of the best features of PHP, making it far simpler to write web-enabled scripts in PHP than in other languages. However, it introduces a minor security risk discussed later.

5.1.3. Passing Data with Embedded Links

The third technique that passes data from a web browser to a web server is embedding links in an HTML document. This technique runs queries in most web database applications and is conceptually similar to manually entering a URL. We show how to create embedded links using the results of database queries in Section 5.2.

Embedded links in an HTML document can be authored in the same way a manually created URL is typed into a web browser. Consider the script shown in Example 5-3 that is rendered in a Netscape browser in Figure 5-3.

Figure 5-3

Figure 5-3. The HTML document shown in Example 5-3 rendered in a Netscape browser

Example 5-3. HTML document containing three links that pass two different parameters

<!DOCTYPE HTML PUBLIC
               "-//W3C//DTD HTML 4.0 Transitional//EN"
               "http://www.w3.org/TR/html401/loose.dtd">
<html>
<head>
  <title>Explore Wines</title>
</head>

<body bgcolor="#ffffff">

Explore all our 
<a href="example.5-4.php?regionName=All&amp;wineType=All">
wines</a>

<br>Explore our 
<a href="example.5-4.php?regionName=All&amp;wineType=Red">
red wines</a>

<br>Explore our
<a href="example.5-4.php?regionName=Riverland&amp;wineType=Red">
premium reds from the Riverland</a>

<br><a href="index.html">Home</a></body>
</html>

The script contains three links that can request the resource example.5-4.php and pass different parameters to the resource. For example, the first link in the HTML document is:

Explore all our <a href="example.5-4.php? regionName=All&amp;wineType=All">
wines</a>

Clicking on this link creates an HTTP request for the URL:

http://localhost/example.5-4.php?
regionName=All&wineType=All

The result of the request is that the script in Example 5-4 is run, and the following HTML document is created:

<!DOCTYPE HTML PUBLIC 
               "-//W3C//DTD HTML 4.0 Transitional//EN"
               "http://www.w3.org/TR/html4/loose.dtd">
<html>
<head>
  <title>Parameters</title>
</head>
<body>
regionName is All
<br>wineType is All
</body>
</html>

Example 5-4. A simple script to print out HTTP attributes and values

<!DOCTYPE HTML PUBLIC  
               "-//W3C//DTD HTML 4.0 Transitional//EN"
               "http://www.w3.org/TR/html4/loose.dtd">
<html>
<head>
  <title>Parameters</title>
</head>
<body>
<?php
  include 'db.inc';

  $regionName  = clean($regionName, 30);
  $wineType    = clean($wineType, 10);

  echo "regionName is "      . $regionName  . "\n";
  echo "<br>wineType is "    . $wineType    .  "\n";
?>
</body>
</html>

Note that the ampersand characters in the URLs in the HTML document are replaced with &amp;, because the ampersand character has a special meaning in HTML and should not be included directly in a document. When the link is clicked, the encoded &amp; is translated by the browser to & in forming the HTTP request.

5.1.4. Security and User Data

This section introduces simple techniques that preprocess user data to solve many common security holes in web database applications.

WARNING: Using the techniques described here doesn't completely secure a system. Remember that securing a web database application is important, and that the advice offered here isn't a complete solution. A discussion of other security issues is presented in Chapter 9.

Data that is passed from a web browser to a web server should be secured using the steps described here. For this purpose, we have authored the clean( ) function to ensure that the data passed to a script is of the correct length and that special characters aren't misused to attack the system. To understand why the clean( ) function is needed, we describe an example attack later in this section. The function is part the include file db.inc that is used in all scripts in the online winestore.

Consider the following script. It uses the PHP exec( ) library function to run a program on the web server. The exec( ) function takes two parameters, the program to run and an array populated with any output of the program. In this example, the script uses exec( ) to run the Unix cal program and to pass the user-entered parameter $userString to the program. The information in the parameter userString can be provided by using an HTML <form> with a text input widget, by manually creating a URL, or by embedding a link in an HTML document.

<?php
  // DO NOT INSTALL THIS SCRIPT ON A WEB SERVER

  // Run "cal" with the parameter $userString
  // Store the results in the array $result
  exec("/usr/bin/cal $userString", $result);   
?>

<!DOCTYPE HTML PUBLIC
                  "-//W3C//DTD HTML 4.0 Transitional//EN"
                  "http://www.w3.org/TR/html4/loose.dtd">
<html>
  <head>
    <title>Calendar</title>
  </head>
<body>
<pre>
<?php
   // Print out each line of the calendar
   foreach($result as $element)
      echo  "$element\n";
?>
</pre>
</body>
</html>
WARNING: Never use exec( ) or other commands to run programs from a web script or to query a database without securing the user data. Do not install the calendar example on a web server.

The Unix cal program is a useful utility that produces monthly or yearly calendars for any date. For example, to produce a calendar for the whole of 2003, a user could request the URL:

http://localhost/cal.php?userString=2003

This runs the command /usr/bin/cal 2003 and outputs the complete 2003 calendar, as shown in Figure 5-4.

Figure 5-4

Figure 5-4. Output of the dangerous calendar example when the user requests a 2003 calendar

To produce a calendar for February 2003, the user requests:

http://localhost/cal.php?userString=2+2003

Requesting the URL without any parameters produces the calendar for the current month:

http://localhost/cal.php

While this script might seem useful and innocuous, this script is a major security hole and should never be installed on a web server.

To illustrate why the script should never be installed, consider how it can be misused. If a user wants to enter two or more commands on a single line in a Unix shell, he can do so by separating the commands with a semicolon character. For example, to see who is logged in and then to list the files in the current directory, a user can type the following commands at the shell:

% who ; ls

Now, consider what happens if he exploits this feature by requesting the following URL:

http://localhost/cal.php?userString=2001;cat+/etc/passwd

The script produces a 2001 calendar, followed by the system password file, as shown in Figure 5-5! The script allows a creative user to do things the web server process can do. The identity of the owner of the web server process affects the severity of the actions that can be performed, but this is at best a major security hole.

Figure 5-5

Figure 5-5. Output when the user requests a 2001 calendar and the system password file

Semicolons, colons, greater-than and less-than signs, and other special characters can cause a script or a query to provide undesirable functions, especially if the script executes the library functions system( ) or exec( ) to run server commands. Even if a <form> makes it difficult for a user to enter undesirable data, he can manually create his own request by entering a URL and authoring a query string.

WARNING: Never trust anything you don't have control of, that is, anything not in middle or database tiers.

SQL querying also has problems. For example, a user can guess the structure of database tables and how a query is formed from user input. A user might guess that a query uses an AND clause and that a particular <form> text widget provides one of the values to the query. The user might then add additional AND and OR clauses to the query by entering a partial SQL query in the text widget. While such tricks may expose data that should remain hidden from the user, problems compound if the user inserts or deletes data with the techniques discussed in Chapter 6. However, many problems can be solved with careful server-side validation, as discussed in Chapter 7 and the approach described next.

To improve security and prevent special-character attacks, user data should be processed with the clean( ) function:

function clean($input, $maxlength)
{
  $input = substr($input, 0, $maxlength);
  $input = EscapeShellCmd($input);
  return ($input);
}

The first line uses the substr( ) function to reduce the variable $input to a maximum length of $maxlength by taking a substring beginning at the first character. You can use 30 as a maximum $regionName length for Example 5-1, and the calendar example might use a maximum length of 7. The second line calls the library function EscapeShellCmd( ), which escapes any special-purpose characters—such as semicolons, colons, greater-than and less-than signs, and so on—by replacing the character with a single backslash and then the character.

For many purposes, the clean steps are sufficient to ensure data is safe. As an example, if the parameter userString has a value of:

2001;cat /etc/passwd

then a call of:

clean($userString, 7) 

produces the harmless string 2001\;ca.

This string has no detrimental effect and provides the user with no hidden data. clean( ) is used to preprocess all user data for the winestore.

User data that has not been preprocessed or cleaned is often known as tainted data, a term originating from the Perl scripting language. Rectifying this through the processing we have described untaints user data.

5.1.5. How PHP Initializes Variables

As we have discussed throughout this section, variables are automatically initialized by PHP from the parameters passed in an HTTP request.

Automatic initialization of variables is an excellent feature for simple scripts, but it has security and processing implications. If required, the automatic initialization can be turned off by setting register_globals=false in the php.ini configuration file, usually found in the directory /usr/local/lib/. The php.ini file was copied to this location as part of the PHP installation instructions in Appendix A.

When the PHP script engine is invoked, the engine declares and initializes variables in a predefined order. The automatic initialization feature works in this order:

  1. By default, environment variables are initialized first.

  2. Variables are initialized from query string parameters passed with the GET method.

  3. POST method parameters are initialized.

  4. Variables from cookies are initialized.

  5. The Apache server internal variables are initialized.

The initialization order can be changed from the default by adjusting the variables_order setting in php.ini. The security problem occurs when a user knowingly or inadvertently overrides a previously initialized variable. For example, the PATH environment variable is one of the first initialized when the script engine is invoked. If a GET request contains an attribute named PATH, this overrides the environment variable of the same name, because GET variables are initialized after environment variables. By understanding the initialization process, the user can override previously set variables by passing through parameters. This can change script behavior and possibly lead to a security problem.

If the register_globals feature is turned off, a PHP script must use a different method to access user data. This method is more secure and requires that arrays be accessed to retrieve specific user parameters. For example, the GET variables are stored in an associative array, $HTTP_GET_VARS.

Consider the following URL that is requested by a user:

http://localhost/test.php?varname=value

The variable $varname can be printed in a PHP script by accessing the associative array $HTTP_GET_VARS using:

echo $HTTP_GET_VARS["varname"];

The only disadvantage of this approach is that the script is tailored for the GET method. Changing the <form> submission method from GET to POST requires modifying the script. All references to $HTTP_GET_VARS must be replaced with references to $HTTP_POST_VARS, because the array $HTTP_POST_VARS stores all variables passed using the POST method. However, the use of associative arrays is more secure because the script doesn't function if the user maliciously changes the <form> submission method in an attempt to compromise the system.

We often initialize local variables at the beginning of a script from the contents of the $HTTP_GET_VARS or $HTTP_POST_VARS arrays. This emulates the register_globals feature of PHP but without the security issues. Local variables make the code more attractive and readable. For example, the following code fragment initializes three variables from the contents of the $HTTP_GET_VARS array:

$surname = $HTTP_GET_VARS["surname"];
$firstname = $HTTP_GET_VARS["firstname"];
$title = $HTTP_GET_VARS["title"];

The result is that the script behaves the same as if register_globals is on. This also has the advantage that if the <form> submission method is changed from GET to POST, the code need be modified only in one place.

Other external variables can be accessed similarly:

Cookies and sessions are discussed in Chapter 8.

We have set register_globals=true in the online winestore application. However, the security implications of automatic initialization should be considered when designing any application.



Library Navigation Links

Copyright © 2003 O'Reilly & Associates. All rights reserved.