Unix Systems Programming: Communication

The Client-Server Model

Many network applications and services such as web browsing, mail, file transfer (ftp), authentication (Kerberos), remote login (telnet) and access to remote file systems (NFS) use the client-server paradigm. In each of these applications, a client sends a request for service to a server. A service is an action, such as changing the status of a remote file, that the server performs on behalf of the client. Often the service includes a response or returns information, for example by retrieving a remote file or web page.

The client-server model appears at many levels in computer systems. For example, an object that calls a method of another object in an object-oriented program is said to be a client of the object. At the system level, daemons that manage resources such as printers are servers for system user clients. On the Internet, browsers are client processes that request resources from web servers. The key elements of the client-server model are as follows.

The client, not the service provider, initiates the action.
The server waits passively for requests from clients.
The client and server are connected by a communication channel that they access through communication endpoints.

Servers should robustly handle multiple simultaneous client requests in the face of unexpected client behavior. This chapter especially emphasizes the importance of catching errors and taking appropriate action during client-server interactions. You wouldn't want a web server to exit when a user mistypes a URL in the browser. Servers are long-running and must release all the resources allocated for individual client requests.
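A minimal sketch of that discipline in C. It assumes a hypothetical one-line protocol: the client sends `PING` and the server answers `PONG`, or `ERR` for anything else. Every failure is reported to the caller instead of ending the process, and the per-client buffer and descriptor are released on every path. For brevity, the sketch serves clients one at a time and assumes each request arrives in a single `read`. `handle_client` and `serve_forever` are illustrative names, not part of UICI.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

#define MAXREQ 256   /* hypothetical limit on one request line */

/* Serve one client on fd. Never calls exit(): errors are returned, and
 * the per-client buffer and descriptor are always released. Assumes the
 * whole request arrives in one read (a simplification). */
int handle_client(int fd) {
    char *req = malloc(MAXREQ);
    const char *reply;
    ssize_t n;
    int status = 0;

    if (req == NULL) {
        close(fd);
        return -1;
    }
    while ((n = read(fd, req, MAXREQ - 1)) == -1 && errno == EINTR)
        ;                                   /* restart if interrupted */
    if (n <= 0) {                           /* read error or client hung up */
        status = -1;
    } else {
        req[n] = '\0';
        /* a bad request gets an error reply, not a server exit */
        reply = (strcmp(req, "PING\n") == 0) ? "PONG\n" : "ERR\n";
        if (write(fd, reply, strlen(reply)) == -1)
            status = -1;
    }
    free(req);                              /* release per-client resources */
    close(fd);
    return status;
}

/* Accept clients forever; a misbehaving client costs a log line,
 * not the whole server. */
void serve_forever(int listenfd) {
    for (;;) {
        int fd = accept(listenfd, NULL, NULL);
        if (fd == -1) {
            if (errno != EINTR)
                perror("accept");
            continue;
        }
        if (handle_client(fd) == -1)
            fprintf(stderr, "client request failed; still serving\n");
    }
}
```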

Communication Channels

A communication channel is a logical pathway for information that is accessed by participants through communication endpoints. The characteristics of the channel constrain the types of interaction allowed between sender and receiver. Channels can be shared or private, one-way or two-way. Two-way channels can be symmetric or asymmetric. Channels are distinguished from the underlying physical conduit, which may support many types of channels.
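Two standard POSIX calls make the one-way and two-way distinction concrete. `pipe` creates a private channel that POSIX guarantees to work in only one direction. `socketpair` creates a private channel in which each endpoint can both send and receive. The function names below are hypothetical.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* One-way channel: bytes written to fds[1] can only be read from fds[0].
 * Returns 0 if a message made the trip, -1 otherwise. */
int one_way_demo(void) {
    int fds[2];
    char buf[8];
    if (pipe(fds) == -1)
        return -1;
    ssize_t n = write(fds[1], "req", 3);
    ssize_t m = read(fds[0], buf, sizeof buf);
    close(fds[0]);
    close(fds[1]);
    return (n == 3 && m == 3 && memcmp(buf, "req", 3) == 0) ? 0 : -1;
}

/* Two-way channel: a request travels one way and a reply the other,
 * over the same pair of endpoints. */
int two_way_demo(void) {
    int sv[2];
    char buf[8];
    int ok;
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) == -1)
        return -1;
    ok = write(sv[0], "req", 3) == 3 &&      /* client -> server */
         read(sv[1], buf, sizeof buf) == 3 &&
         write(sv[1], "rep", 3) == 3 &&      /* server -> client */
         read(sv[0], buf, sizeof buf) == 3 &&
         memcmp(buf, "rep", 3) == 0;
    close(sv[0]);
    close(sv[1]);
    return ok ? 0 : -1;
}
```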

In object-oriented programming, clients communicate with an object by calling a method. In this context, client and server share an address space, and the communication channel is the activation record that is created on the process stack for the call. The request consists of the parameter values that are pushed on the stack as part of the call, and the optional reply is the method's return value. Thus, the activation record is a private, asymmetric two-way communication channel. The method call mechanism of the object-oriented programming language establishes the communication endpoints. The system infrastructure for managing the process stack furnishes the underlying conduit for communication.

Universal Internet Communication Interface (UICI)

The Universal Internet Communication Interface (UICI) library, summarized in the table below, provides a simplified interface to connection-oriented communication in UNIX. UICI is not part of any UNIX standard. The interface was designed by the authors to abstract the essentials of network communication while hiding the details of the underlying network protocols. UICI has been placed in the public domain and is available on the book web site. Programs that use UICI should include the uici.h header file.

This section introduces the UICI library. The next two sections implement several client-server strategies in terms of UICI. Later material discusses the implementation of UICI using sockets and provides a complete UICI implementation.

When using sockets, a server creates a communication endpoint (a socket) and associates it with a well-known port (binds the socket to the port). Before waiting for client requests, the server sets the socket to be passive so that it can accept client requests (sets the socket to listen). Upon detection of a client connection request on this endpoint, the server generates a new communication endpoint for private two-way communication with the client. The client and server access their communication endpoints by using file descriptors to read and write. When finished, both parties close the file descriptors, releasing the resources associated with the communication channel.
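The socket sequence just described can be sketched directly with the POSIX socket calls. This illustrates the steps and is not the UICI implementation. `open_passive`, `accept_client`, and `bound_port` are hypothetical helpers. The sketch binds to port 0, which lets the kernel choose a port, only so that it can run anywhere.

```c
#define _DEFAULT_SOURCE
#include <arpa/inet.h>
#include <assert.h>
#include <errno.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Steps 1-3: create a TCP endpoint, bind it to port on every local
 * interface, and make it passive. Returns the listening descriptor,
 * or -1 on error. */
int open_passive(unsigned short port, int backlog) {
    struct sockaddr_in addr;
    int one = 1;
    int fd = socket(AF_INET, SOCK_STREAM, 0);          /* 1. endpoint */

    if (fd == -1)
        return -1;
    /* allow a restarted server to rebind while old connections linger */
    (void)setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &one, sizeof one);
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(port);
    if (bind(fd, (struct sockaddr *)&addr, sizeof addr) == -1 ||  /* 2. bind */
        listen(fd, backlog) == -1) {                               /* 3. listen */
        close(fd);
        return -1;
    }
    return fd;
}

/* Step 4: wait for a client. The returned descriptor is a new, private
 * two-way endpoint for that client only. */
int accept_client(int listenfd) {
    int fd;
    while ((fd = accept(listenfd, NULL, NULL)) == -1 && errno == EINTR)
        ;                                   /* restart if interrupted */
    return fd;
}

/* Port actually bound (useful after asking for port 0). */
int bound_port(int fd) {
    struct sockaddr_in addr;
    socklen_t len = sizeof addr;
    if (getsockname(fd, (struct sockaddr *)&addr, &len) == -1)
        return -1;
    return ntohs(addr.sin_port);
}
```

A server would call `open_passive` once and then loop on `accept_client`, reading and writing each returned descriptor and closing it when that client is finished.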

UICI prototype	description (assuming no errors)
int u_open(u_port_t port)	creates a TCP socket bound to `port` and sets the socket to be passive; returns a file descriptor for the socket
int u_accept(int fd, char *hostn, int hostnsize)	waits for a connection request on `fd`; on return, `hostn` holds the first `hostnsize-1` characters of the client's host name; returns a communication file descriptor
int u_connect(u_port_t port, char *hostn)	initiates a connection to the server on port `port` and host `hostn`; returns a communication file descriptor
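One possible way to express the connect and accept operations with sockets is sketched below. The functions are named `my_u_connect` and `my_u_accept` so they are not mistaken for the real UICI code. The sketch assumes `u_port_t` is an unsigned 16-bit port number and uses `getaddrinfo` to resolve `hostn`. It uses `getnameinfo` plus `snprintf` so that the returned host name holds at most `hostnsize-1` characters and is always NUL-terminated.

```c
#define _DEFAULT_SOURCE
#include <arpa/inet.h>
#include <assert.h>
#include <errno.h>
#include <netdb.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

typedef unsigned short u_port_t;   /* assumption: a 16-bit port number */

/* Hypothetical u_connect-like call: resolve hostn and connect to port,
 * trying each address returned. Returns a descriptor or -1. */
int my_u_connect(u_port_t port, const char *hostn) {
    struct addrinfo hints, *res, *p;
    char portstr[16];
    int fd = -1;

    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_UNSPEC;        /* IPv4 or IPv6 */
    hints.ai_socktype = SOCK_STREAM;    /* TCP */
    snprintf(portstr, sizeof portstr, "%u", (unsigned)port);
    if (getaddrinfo(hostn, portstr, &hints, &res) != 0)
        return -1;
    for (p = res; p != NULL; p = p->ai_next) {
        fd = socket(p->ai_family, p->ai_socktype, p->ai_protocol);
        if (fd == -1)
            continue;
        if (connect(fd, p->ai_addr, p->ai_addrlen) == 0)
            break;                      /* connected */
        close(fd);
        fd = -1;
    }
    freeaddrinfo(res);
    return fd;
}

/* Hypothetical u_accept-like call: wait for a client on fd and copy at
 * most hostnsize-1 characters of its host name into hostn. */
int my_u_accept(int fd, char *hostn, int hostnsize) {
    struct sockaddr_storage addr;
    socklen_t len = sizeof addr;
    char name[NI_MAXHOST];
    int commfd;

    while ((commfd = accept(fd, (struct sockaddr *)&addr, &len)) == -1 &&
           errno == EINTR)
        ;                               /* restart if interrupted */
    if (commfd == -1)
        return -1;
    if (hostn != NULL && hostnsize > 0) {
        /* flags 0: falls back to the numeric address if no name exists */
        if (getnameinfo((struct sockaddr *)&addr, len, name, sizeof name,
                        NULL, 0, 0) != 0)
            snprintf(name, sizeof name, "unknown");
        snprintf(hostn, (size_t)hostnsize, "%s", name);  /* truncates safely */
    }
    return commfd;
}
```

Note that when `getaddrinfo` fails, it reports the problem through its return value rather than `errno`. A caller's `perror` message may therefore not be meaningful in that case.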

UICI Clients

The program below implements the client side of the file copy. The client connects to the desired port on a specified host by calling u_connect, which returns a communication file descriptor. The client then reads from standard input and copies what it reads to the server. It exits when it reaches end-of-file on standard input or when it encounters an error while writing to the server.

A client that uses UICI for communication.

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include "restart.h"   /* copyfile */
#include "uici.h"      /* u_port_t, u_connect */

int main(int argc, char *argv[]) {
  int bytescopied;
  int communfd;
  u_port_t portnumber;

  if (argc != 3) {
     fprintf(stderr, "Usage: %s host port\n", argv[0]);
     return 1;
  }
  portnumber = (u_port_t)atoi(argv[2]);
  /* connect to the server; communfd is the client's communication endpoint */
  if ((communfd = u_connect(portnumber, argv[1])) == -1) {
     perror("Failed to make connection");
     return 1;
  }
  fprintf(stderr, "[%ld]:connected %s\n", (long)getpid(), argv[1]);
  /* copy standard input to the server until end-of-file or a write error */
  bytescopied = copyfile(STDIN_FILENO, communfd);
  fprintf(stderr, "[%ld]:sent %d bytes\n", (long)getpid(), bytescopied);
  return 0;
}
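The client depends on `copyfile` from the book's restart library, which this section does not show. The following hypothetical stand-in, `copy_all`, sketches what such a function has to handle when one side is a socket. Both `read` and `write` may be interrupted by signals, and `write` may transfer fewer bytes than requested, so the loop resumes until everything read has been written.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <unistd.h>

#define BLKSIZE 1024

/* Write all n bytes, resuming after partial writes and interruptions. */
static ssize_t write_all(int fd, const char *buf, size_t n) {
    size_t done = 0;
    while (done < n) {
        ssize_t w = write(fd, buf + done, n - done);
        if (w == -1) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        done += (size_t)w;
    }
    return (ssize_t)done;
}

/* Copy fromfd to tofd until end-of-file. Returns the number of bytes
 * copied, or -1 on a read or write error. */
long copy_all(int fromfd, int tofd) {
    char buf[BLKSIZE];
    long total = 0;
    for (;;) {
        ssize_t r = read(fromfd, buf, sizeof buf);
        if (r == 0)
            return total;                 /* end-of-file */
        if (r == -1) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        if (write_all(tofd, buf, (size_t)r) == -1)
            return -1;                    /* e.g. the server went away */
        total += r;
    }
}
```

Writing to a socket whose peer has closed normally raises `SIGPIPE`, which terminates the process by default. A client that wants `copy_all` to return -1 (with `errno` set to `EPIPE`) instead can ignore that signal with `signal(SIGPIPE, SIG_IGN)`.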

The World Wide Web

Electronic hypertext contains links to expanded or related information embedded at relevant points in a document. The links are analogous to footnotes in a traditional paper document, but the electronic nature of these documents allows easier physical access to the links. As early as 1945, Vannevar Bush proposed linked systems for documents on microfilm [18], but electronic hypertext systems did not take hold until the 1960s and 1970s.

In 1980, Tim Berners-Lee wrote a notebook program for CERN called ENQUIRE that had bidirectional links between nodes representing information. In 1989, he proposed a system for browsing the CERN Computer Center's documentation and help service. Tim Berners-Lee and Robert Cailliau developed a prototype GUI browser-editor for the system in 1990 and coined the name "World Wide Web." The initial system was released in 1991. At the beginning of 1993 there were 50 known web servers, a number that grew to 500 by the end of 1993 and to 650,000 by 1997. Today, web browsers have become an integral interface to information, and the Internet has millions of web servers.

The World Wide Web is a collection of clients and servers that have agreed to interact and exchange information in a certain format. The client (an application such as a browser) first establishes a connection with a server (an application that accepts connections and responds). Once it has established a connection, the client sends an initial request asking for service. The server responds with the requested information or an error.

HTTP Primer

Clients and web servers exchange information according to a specific set of rules, or protocol, called the Hypertext Transfer Protocol (HTTP). HTTP is a request-reply protocol that assumes that messages are delivered reliably. For this reason, HTTP communication usually uses TCP, and that is what we assume in this discussion.
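An HTTP request is plain text: a request line, header lines, and an empty line, each terminated by CRLF. The sketch below builds a minimal HTTP/1.0 `GET` request and extracts the status code from a response's status line. The function names are hypothetical. The `Host` header is optional in HTTP/1.0 but required in HTTP/1.1, so sending it is harmless.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Format a minimal GET request for path on host into buf. Returns the
 * request length, or -1 if buf is too small. */
int format_get(char *buf, size_t size, const char *host, const char *path) {
    int n = snprintf(buf, size,
                     "GET %s HTTP/1.0\r\n"   /* request line */
                     "Host: %s\r\n"          /* one header line */
                     "\r\n",                 /* empty line ends the headers */
                     path, host);
    return (n < 0 || (size_t)n >= size) ? -1 : n;
}

/* Return the 3-digit code from a status line such as "HTTP/1.1 200 OK",
 * or -1 if the line is malformed. */
int parse_status(const char *line) {
    int major, minor, code;
    if (sscanf(line, "HTTP/%d.%d %3d", &major, &minor, &code) != 3)
        return -1;
    return (code >= 100 && code <= 599) ? code : -1;
}
```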

Proxy Cache

Proxy caches save resources in local storage so that requests can be satisfied locally. The cache can be in memory or on disk.

Write a program called proxycache that stores all the resources from the remote hosts on disk. Each unique resource must be stored in a unique file. One way to do this is to use sequential file names like cache00001, cache00002, etc., and keep a list containing host name, resource name and filename. Most proxy implementations use some type of hashing or digest mechanism to efficiently represent and search the contents of the cache for a particular resource.

Start by storing the resources without changing how requests are handled: the proxy still forwards every request to the remote server. If the same resource is requested again, update the stored copy rather than creating a new entry. Keep track of the number of hits on each resource.
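A minimal in-memory sketch of the bookkeeping described above, under these assumptions:

- Entries are keyed by host name and resource name.
- A 32-bit FNV-1a hash selects one of a fixed number of buckets. This is one simple instance of the hashing mentioned in the text.
- New entries receive sequential file names like `cache00001`.
- A repeated request reuses its entry and counts a hit.

`cache_touch` and the structures are hypothetical. Writing or rewriting the cache file itself is left to the caller.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define NBUCKETS 256

struct entry {
    char *host;
    char *resource;
    char filename[32];          /* e.g. "cache00001" */
    unsigned long hits;         /* repeat requests for this resource */
    struct entry *next;         /* chain for colliding hashes */
};

struct cache {
    struct entry *bucket[NBUCKETS];
    unsigned long nextid;       /* source of sequential file names */
};

/* 32-bit FNV-1a over host, a 0 separator byte, and resource. */
static uint32_t hash_key(const char *host, const char *resource) {
    uint32_t h = 2166136261u;                      /* offset basis */
    const char *p;
    for (p = host; *p != '\0'; p++)
        h = (h ^ (unsigned char)*p) * 16777619u;   /* FNV prime */
    h *= 16777619u;                                /* the separator byte */
    for (p = resource; *p != '\0'; p++)
        h = (h ^ (unsigned char)*p) * 16777619u;
    return h;
}

/* Find (host, resource). If present, count a hit; otherwise add an
 * entry with the next sequential file name. Returns NULL on no memory. */
struct entry *cache_touch(struct cache *c, const char *host,
                          const char *resource) {
    uint32_t b = hash_key(host, resource) % NBUCKETS;
    struct entry *e;

    for (e = c->bucket[b]; e != NULL; e = e->next)
        if (strcmp(e->host, host) == 0 && strcmp(e->resource, resource) == 0) {
            e->hits++;                  /* same resource: reuse the entry */
            return e;
        }
    if ((e = calloc(1, sizeof *e)) == NULL)
        return NULL;
    e->host = strdup(host);
    e->resource = strdup(resource);
    if (e->host == NULL || e->resource == NULL) {
        free(e->host);
        free(e->resource);
        free(e);
        return NULL;
    }
    snprintf(e->filename, sizeof e->filename, "cache%05lu", ++c->nextid);
    e->next = c->bucket[b];
    c->bucket[b] = e;
    return e;
}
```

Because the first request for a resource is a miss, a new entry starts with zero hits.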

The child processes that handle client requests must coordinate their access to the list of resources, and they must coordinate the generation of unique file names. Consider using threads, shared memory or message passing to implement the coordination.
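One way to coordinate file-name generation among forked children is a counter in shared memory. The sketch below creates an anonymous `MAP_SHARED` mapping before forking and increments it with a C11 atomic operation. It illustrates only the shared-memory option, and the function names are hypothetical. The approach assumes `atomic_ulong` is lock-free, since the C standard recommends that lock-free atomic operations be address-free and therefore usable between processes that share memory. Protecting the resource list itself would additionally need a lock, such as a process-shared semaphore.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stdatomic.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

/* Allocate a counter in anonymous shared memory. Children forked after
 * this call see the same counter, not a private copy. */
atomic_ulong *make_shared_counter(void) {
    void *p = mmap(NULL, sizeof(atomic_ulong), PROT_READ | PROT_WRITE,
                   MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return NULL;
    atomic_init((atomic_ulong *)p, 0UL);
    return (atomic_ulong *)p;
}

/* Write the next unique name (cache00001, cache00002, ...) into buf.
 * atomic_fetch_add is one indivisible read-modify-write, so no two
 * processes receive the same number. */
void next_cache_name(atomic_ulong *counter, char *buf, size_t size) {
    unsigned long id = atomic_fetch_add(counter, 1UL) + 1UL;
    snprintf(buf, size, "cache%05lu", id);
}
```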

Once you have the coordination working, implement the code to satisfy requests for cached items locally. Keep track of the total number of bytes transferred from client to proxy, proxy to server, server to proxy and proxy to client. Now the last two of these should differ, because a request satisfied from the cache sends bytes from proxy to client without any corresponding transfer from server to proxy. Remember that when you test with a browser, the browser also caches resources, so some requests never reach the proxy server. Either turn off the browser's caching or force a remote access in the browser (usually by holding down the SHIFT key and pressing reload or refresh).
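The byte accounting can be sketched with four counters. For simplicity, the sketch assumes that on a miss the proxy forwards the request unchanged and relays the response unchanged. On a hit it answers from local storage, so only the client-side counters grow. The structure and function names are hypothetical.

```c
#include <assert.h>

/* Bytes moved on each leg of proxied traffic. */
struct traffic {
    long client_to_proxy;
    long proxy_to_server;
    long server_to_proxy;
    long proxy_to_client;
};

/* Record one request of reqbytes answered with respbytes. A hit is
 * served from the cache, so the proxy-server legs carry nothing. */
void record_transfer(struct traffic *t, long reqbytes, long respbytes, int hit) {
    t->client_to_proxy += reqbytes;
    t->proxy_to_client += respbytes;
    if (!hit) {                 /* miss: forward request, relay reply */
        t->proxy_to_server += reqbytes;
        t->server_to_proxy += respbytes;
    }
}
```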

Real proxy caches need to contend with a number of issues.

Real caches are not infinite.
Caches should not store items above a certain size. The optimal size may vary dynamically with cache content.
The cache should have an expiration policy so that resources do not stay in the cache forever.
The cache should respect directives from the server stating that certain items should not be cached.
The cache should check whether an item has been modified before using a local copy.
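Several of these policies reduce to small checks. The sketch below makes the following assumptions:

- The limits are arbitrary: a 1 MB maximum item size and a one-hour lifetime.
- The `Cache-Control` directives `no-store` and `private` mean "do not store" for a shared proxy cache.
- Directives are matched with a case-sensitive substring test, which is a simplification.

The sketch also formats an `If-Modified-Since` header so that the proxy can ask the server whether a stored copy has changed. The server answers `304 Not Modified` if it has not. The header uses the time the copy was stored; the server's own `Last-Modified` value is more precise when available. The function names are hypothetical.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

/* Hypothetical limits chosen only for illustration. */
#define MAX_ITEM_BYTES (1024L * 1024L)   /* skip items larger than 1 MB */
#define MAX_AGE_SECONDS 3600.0           /* expire entries after one hour */

/* May a response of this size be stored? cache_control is the value of
 * the response's Cache-Control header, or NULL if there was none. */
int may_store(long size, const char *cache_control) {
    if (size > MAX_ITEM_BYTES)
        return 0;
    if (cache_control != NULL &&
        (strstr(cache_control, "no-store") != NULL ||
         strstr(cache_control, "private") != NULL))
        return 0;                        /* the server said not to cache */
    return 1;
}

/* Is a copy stored at stored_at still within its lifetime at now? */
int is_fresh(time_t stored_at, time_t now) {
    return difftime(now, stored_at) <= MAX_AGE_SECONDS;
}

/* Format an If-Modified-Since header line for a copy stored at
 * stored_at. Returns its length, or -1 if buf is too small. */
int format_if_modified_since(char *buf, size_t size, time_t stored_at) {
    struct tm tm;
    char date[64];
    int n;

    if (gmtime_r(&stored_at, &tm) == NULL)
        return -1;
    /* HTTP date such as "Sun, 06 Nov 1994 08:49:37 GMT"; the day and
     * month names are English in the default "C" locale */
    if (strftime(date, sizeof date, "%a, %d %b %Y %H:%M:%S GMT", &tm) == 0)
        return -1;
    n = snprintf(buf, size, "If-Modified-Since: %s\r\n", date);
    return (n < 0 || (size_t)n >= size) ? -1 : n;
}
```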

Introduction to Connectionless Communication

Connectionless communication is an abstraction based on transmission of single messages or datagrams between sender and receiver. A datagram is a unit of data transferred from one endpoint to another. Connectionless communication makes no association between the endpoints, and a process can use a single connectionless endpoint to send messages to or receive messages from many other endpoints.
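A minimal UDP sketch of the idea, assuming the loopback interface is available. Each endpoint is a `SOCK_DGRAM` socket, and `sendto` names the destination with every datagram. One receiving socket accepts datagrams from any number of senders, and `recvfrom` reports who sent each one. The helper names are hypothetical.

```c
#define _DEFAULT_SOURCE
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Create a UDP endpoint on 127.0.0.1 with a kernel-chosen port, which
 * is stored in *port. Returns the descriptor, or -1 on error. */
int udp_endpoint(unsigned short *port) {
    struct sockaddr_in a;
    socklen_t len = sizeof a;
    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd == -1)
        return -1;
    memset(&a, 0, sizeof a);
    a.sin_family = AF_INET;
    a.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    a.sin_port = htons(0);
    if (bind(fd, (struct sockaddr *)&a, sizeof a) == -1 ||
        getsockname(fd, (struct sockaddr *)&a, &len) == -1) {
        close(fd);
        return -1;
    }
    *port = ntohs(a.sin_port);
    return fd;
}

/* Send msg as one datagram to 127.0.0.1:port. No connection exists;
 * the destination address accompanies each datagram. */
ssize_t send_datagram(int fd, unsigned short port, const char *msg) {
    struct sockaddr_in to;
    memset(&to, 0, sizeof to);
    to.sin_family = AF_INET;
    to.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    to.sin_port = htons(port);
    return sendto(fd, msg, strlen(msg), 0, (struct sockaddr *)&to, sizeof to);
}
```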

Tuesday, December 11, 2007
