2. Outline
• Architectures for dynamic content
publishing
– CGI
– Java Servlet
– Server-side scripting
– JSP tag libraries
3. Motivations
• Creating pages on the fly based on the user’s
request and from structured data (e.g.,
database content)
• Client-side scripting & components do not
suffice
– They manipulate an existing document/page; they do
not create a new one from structured content
• Solution:
– Server-side architectures for dynamic content
production
4. Common Gateway Interface
• An interface that allows the Web Server to launch
external applications that create pages dynamically
• A kind of «double client-server loop»
5. What CGI is/is not
• It is not
– A programming language
– A telecommunication protocol
• It is
– An interface between the web server and the applications that
defines some standard communication variables
• The interface is implemented through environment variables, a
mechanism available in all common operating systems
• A CGI program can be written in any programming language
6. Invocation
• The client specifies in the URI the name of
the program to invoke
• The program must be deployed in a
specified location at the web server (e.g.,
the cgi-bin directory)
– http://my.server.web/cgi-bin/xyz.exe
7. Execution
• The server recognizes from the URI that
the requested resource is an executable
– Permissions must be set in the web server for
allowing program execution
– E.g., the extensions of executable files must
be explicitly specified
• http://my.server.web/cgi-bin/xyz.exe
8. Execution
• The web server decodes the parameters
sent by the client and initializes the CGI
variables
• REQUEST_METHOD, QUERY_STRING, CONTENT_LENGTH,
CONTENT_TYPE
• http://my.server.web/cgi-bin/xyz.exe?par=val
11. Execution
• The server builds the response from the
content emitted to the standard output and
sends it to the client
12. Handling request parameters
• Client parameters can be sent in two ways
– With the HTTP GET method
• Parameters are appended to the URL (1)
• http://www.myserver.it/cgi-bin/xyz?par=val
– With the HTTP POST method
• Parameters are inserted as an HTTP entity in the
body of the request (suitable when their size is substantial)
• Requires the use of HTML forms to let users
enter data into the body of the request
– (1) The HTTP specification does not set a maximum URI length;
practical limits are imposed by web browser and server software
13. HTML Form
<HTML>
<BODY>
<FORM
action="http://www.mysrvr.it/cgi-bin/xyz.exe"
method="post">
<P> Tell me your name:</P>
<P><INPUT type="text"
NAME="whoareyou"> </p>
<INPUT type="submit"
VALUE="Send">
</FORM>
</BODY>
</HTML>
14. Structure of a CGI program
Read environment variable
Execute business logic
Print MIME heading "Content-type: text/html"
Print HTML markup
15. Parameter decoding
Read variable REQUEST_METHOD
– If GET: read variable QUERY_STRING
– If POST: read variable CONTENT_LENGTH, then read
CONTENT_LENGTH bytes from the standard input
16. CGI development
• A CGI program can be written in any programming language:
– C/C++
– Fortran
– Perl
– Tcl
– Unix shell
– Visual Basic
• If a compiled programming language is used, the
source code must be compiled first
– Normally source files are in cgi-src
– Executable binaries are in cgi-bin
• If an interpreted scripting language is used, the source
files are deployed directly
– Normally in the cgi-bin folder
17. Overview of CGI variables
• Clustered per type:
– server
– request
– headers
18. Server variables
• These variables are always available, i.e.,
they do not depend on the request
– SERVER_SOFTWARE: name and version of
the server software
• Format: name/version
– SERVER_NAME: hostname or IP of the
server
– GATEWAY_INTERFACE: supported CGI
version
• Format: CGI/version
19. Request variables
• These variables depend on the request
– SERVER_PROTOCOL: name and version of the protocol
used by the request (e.g., HTTP/1.1)
• Format: protocol/version
– SERVER_PORT: port to which the request is
sent
– REQUEST_METHOD: HTTP request method
– PATH_INFO: extra path information
– PATH_TRANSLATED: translation of PATH_INFO
from virtual to physical
– SCRIPT_NAME: virtual path of the invoked script
– QUERY_STRING: the query string
20. Other request variables
• REMOTE_HOST: client hostname
• REMOTE_ADDR: client IP address
• AUTH_TYPE: authentication type used by
the protocol
• REMOTE_USER: username used during the
authentication
• CONTENT_TYPE: content type in case of
POST and PUT request methods
• CONTENT_LENGTH: content length
21. Environment variables: headers
• The HTTP headers contained in the request
are stored in the environment with the prefix
HTTP_
– HTTP_USER_AGENT: browser used for the
request
– HTTP_ACCEPT_ENCODING: encoding type
accepted by the client
– HTTP_ACCEPT_CHARSET: charset accepted
by the client
– HTTP_ACCEPT_LANGUAGE: language
accepted by the client
24. Problems with CGI
• Performance and security issues in web server to
application communication
• When the server receives a request, it creates a new
process in order to run the CGI program
• This requires time and significant server resources
• A CGI program cannot interact back with the web server
• The process of the CGI program is terminated when
the program finishes
• No sharing of resources between subsequent calls (e.g., reuse of
database connections)
• No main memory preservation of the user’s session (database
storage is necessary if session data are to be preserved)
• Exposing to the web the physical path to an
executable program can breach security
26. Complete example
1. First request (Form.html)
2. Resource retrieval (Form.html)
3. Response
4. Second request
5. Environment variables are set and Mult.cgi is called
6. Response computation (Mult.cgi)
7. Response is sent
Mult.c was previously compiled into Mult.cgi
27. The form (form.html)
<HTML>
<HEAD><TITLE>Multiplication form</TITLE></HEAD>
<BODY>
<!-- ACTION is the URL that is called when the form is submitted -->
<FORM ACTION="http://www.polimi.it/cgi-bin/run/mult.cgi">
<P>Enter the multiplicands</P>
<INPUT NAME="m" SIZE="5"><BR/>
<INPUT NAME="n" SIZE="5"><BR/>
<INPUT TYPE="SUBMIT" VALUE="Multiply">
</FORM>
</BODY>
</HTML>
28. The script (mult.c)
#include <stdio.h>
#include <stdlib.h>

int main(void){
    char *data;
    long m,n;
    /* Print the response on the standard output: header, blank line, HTML */
    printf("%s%c%c\n", "Content-Type:text/html;charset=iso-8859-1",13,10);
    printf("<HTML>\n<HEAD>\n<TITLE>Multiplication result</TITLE>\n</HEAD>\n");
    printf("<BODY>\n<H3>Multiplication result</H3>\n");
    /* Retrieve the values from the environment variables */
    data = getenv("QUERY_STRING");
    if(data == NULL)
        printf("<P>Error! Error receiving the data from the form.</P>\n");
    else if(sscanf(data,"m=%ld&n=%ld",&m,&n)!=2)
        printf("<P>Error! Invalid data. The values must be numeric.</P>\n");
    else
        printf("<P>Result: %ld * %ld = %ld</P>\n",m,n,m*n);
    printf("</BODY>\n</HTML>\n");
    return 0;
}
29. Compilation and local test
• Compilation:
$ gcc -o mult.cgi mult.c
• Local test (the environment variable containing the query string is set manually):
$ export QUERY_STRING="m=2&n=3"
$ ./mult.cgi
• Result:
Content-Type:text/html;charset=iso-8859-1

<HTML>
<HEAD>
<TITLE>Multiplication result</TITLE>
</HEAD>
<BODY>
<H3>Multiplication result</H3>
<P>Result: 2 * 3 = 6</P>
</BODY>
</HTML>
30. Remarks on CGI
• Possible security problems
• Performance (overhead)
– creating and terminating processes takes time
– context switches take time
• CGI processes:
– are created at each invocation
– do not inherit process state from previous
invocations (e.g., database connections)
31. References
• CGI reference:
http://hoohoo.ncsa.uiuc.edu/cgi/overview.html
• Security and CGI:
http://www.w3.org/Security/Faq/wwwsf4.html
Editor's notes
Scripts can be accessed by their virtual pathname, followed by extra information at the end of this path. The extra information is sent as PATH_INFO. If it comes from a URL, this information should be decoded by the server before it is passed to the CGI script. The "extra path info" is the information that follows the filename in a URL when separated by a "/" (as opposed to query string info, which is what follows a "?").
– AUTH_TYPE: the name of the authentication scheme used to protect the servlet, for example BASIC or SSL, or null if the servlet was not protected.
– CONTENT_LENGTH: the length of the request body in bytes made available by the input stream, or -1 if the length is not known. For HTTP servlets, the value returned is the same as the value of the CGI variable CONTENT_LENGTH.
– CONTENT_TYPE: the MIME type of the body of the request, or null if the type is not known. For HTTP servlets, the value returned is the same as the value of the CGI variable CONTENT_TYPE.
– GATEWAY_INTERFACE: the revision of the CGI specification being used by the server to communicate with the script. It is "CGI/1.1".
– HTTP_ACCEPT: variables with names beginning with "HTTP_" contain values from the request header, if the scheme used is HTTP. HTTP_ACCEPT specifies the content types the browser supports, for example text/xml.
– HTTP_ACCEPT_CHARSET: character preference information, used to indicate the client's preferred character set, if any. For example, utf-8;q=0.5.
– HTTP_ACCEPT_ENCODING: defines the type of encoding that may be carried out on content returned to the client. For example, compress;q=0.5.
– HTTP_ACCEPT_LANGUAGE: defines the languages in which the client would prefer to receive content. For example, en;q=0.5. If nothing is returned, no language preference is indicated.
– HTTP_FORWARDED: if the request was forwarded, shows the address and port of the proxy server through which it passed.
– HTTP_HOST: specifies the Internet host and port number of the resource being requested. Required for all HTTP/1.1 requests.
– HTTP_PROXY_AUTHORIZATION: used by a client to identify itself (or its user) to a proxy that requires authentication.
– HTTP_USER_AGENT: the type and version of the browser the client is using to send the request. For example, Mozilla/1.5.
– PATH_INFO: optionally contains extra path information from the HTTP request that invoked the script, specifying a path to be interpreted by the CGI script. PATH_INFO identifies the resource or sub-resource to be returned by the CGI script, and it is derived from the portion of the URI path following the script name but preceding any query data.
– PATH_TRANSLATED: maps the script's virtual path to the physical path used to call the script. This is done by taking any PATH_INFO component of the request URI and performing any appropriate virtual-to-physical translation.
– QUERY_STRING: the query string that is contained in the request URL after the path.
– REMOTE_ADDR: the IP address of the client that sent the request. For HTTP servlets, the value returned is the same as the value of the CGI variable REMOTE_ADDR.
– REMOTE_HOST: the fully qualified name of the client that sent the request, or the IP address of the client if the name cannot be determined. For HTTP servlets, the value returned is the same as the value of the CGI variable REMOTE_HOST.
– REMOTE_USER: the login of the user making this request if the user has been authenticated, or null if the user has not been authenticated.
– REQUEST_METHOD: the name of the HTTP method with which this request was made. For example, GET, POST, or PUT.
– SCRIPT_NAME: the part of the URL from the protocol name up to the query string in the first line of the HTTP request.
– SERVER_NAME: the host name of the server that received the request. For HTTP servlets, it is the same as the value of the CGI variable SERVER_NAME.
– SERVER_PORT: the port number on which this request was received. For HTTP servlets, the value returned is the same as the value of the CGI variable SERVER_PORT.
– SERVER_PROTOCOL: the name and version of the protocol the request uses, in the form protocol/majorVersion.minorVersion. For example, HTTP/1.1. For HTTP servlets, the value returned is the same as the value of the CGI variable SERVER_PROTOCOL.
– SERVER_SOFTWARE: the name and version of the servlet container on which the servlet is running.
– HTTP_COOKIE: the HTTP cookie string.
– WEBTOP_USER: the user name of the user who is logged in.
– NCHOME: the NCHOME environment variable.