Tuesday, December 11, 2007

Concurrency

Thread Safety

A hidden problem with threads is that they may call library functions that are not thread-safe, possibly producing spurious results. A function is thread-safe if multiple threads can execute simultaneous active invocations of the function without interference. POSIX specifies that all the required functions, including the functions from the standard C library, be implemented in a thread-safe manner

An important example of a function that does not have to be thread-safe is strerror. Although strerror is not guaranteed to be thread-safe, many systems have implemented this function in a thread-safe manner. Unfortunately.

Synchronization functions for POSIX:THR threads.

description

POSIX function

mutex locks

pthread_mutex_destroy

pthread_mutex_init

pthread_mutex_lock

pthread_mutex_trylock

pthread_mutex_unlock

condition variables

pthread_cond_broadcast

pthread_cond_destroy

pthread_cond_init

pthread_cond_signal

pthread_cond_timedwait

pthread_cond_wait

read-write locks

pthread_rwlock_destroy

pthread_rwlock_init

pthread_rwlock_rdlock

pthread_rwlock_timedrdlock

pthread_rwlock_timedwrlock

pthread_rwlock_tryrdlock

pthread_rwlock_trywrlock

pthread_rwlock_wrlock

Mutex Locks

A mutex is a special variable that can be either in the locked state or the unlocked state. If the mutex is locked, it has a distinguished thread that holds or owns the mutex. If no thread holds the mutex, we say the mutex is unlocked, free or available. The mutex also has a queue for the threads that are waiting to hold the mutex. The order in which the threads in the mutex queue obtain the mutex is determined by the thread-scheduling policy, but POSIX does not require that any particular policy be implemented.

When the mutex is free and a thread attempts to acquire the mutex, that thread obtains the mutex and is not blocked. It is convenient to think of this case as first causing the thread to enter the queue and then automatically removing it from the queue and giving it the mutex.

The mutex or mutex lock is the simplest and most efficient thread synchronization mechanism. Programs use mutex locks to preserve critical sections and to obtain exclusive access to resources. A mutex is meant to be held for short periods of time. Mutex functions are not thread cancellation points and are not interrupted by signals. A thread that waits for a mutex is not logically interruptible except by termination of the process, termination of a thread with pthread_exit (from a signal handler), or asynchronous cancellation (which is normally not used).

Mutex locks are ideal for making changes to data structures in which the state of the data structure is temporarily inconsistent, as when updating pointers in a shared linked list. These locks are designed to be held for a short time. Use condition variables to synchronize on events of indefinite duration such as waiting for input.

Semaphores

In 1965, E. W. Dijkstra proposed the semaphore abstraction for high-level management of mutual exclusion and synchronization. A semaphore is an integer variable with two atomic operations, wait and signal. Other names for wait are down, P and lock. Other names for signal are up, V, unlock and post.

If S is greater than zero, wait tests and decrements S in an atomic operation. If S is equal to zero, the wait tests S and blocks the caller in an atomic operation.

If threads are blocked on the semaphore, then S is equal to zero and signal unblocks one of these waiting threads. If no threads are blocked on the semaphore, signal increments S. In POSIX:SEM terminology, the wait and signal operations are called semaphore lock and semaphore unlock, respectively. We can think of a semaphore as an integer value and a list of processes waiting for a signal operation.

Communication

The Client-Server Model

Many network applications and services such as web browsing, mail, file transfer (ftp), authentication (Kerberos), remote login (telnet) and access to remote file systems (NFS) use the client-server paradigm. In each of these applications, a client sends a request for service to a server. A service is an action, such as changing the status of a remote file, that the server performs on behalf of the client. Often the service includes a response or returns information, for example by retrieving a remote file or web page.

The client-server model appears at many levels in computer systems. For example, an object that calls a method of another object in an object-oriented program is said to be a client of the object. At the system level, daemons that manage resources such as printers are servers for system user clients. On the Internet, browsers are client processes that request resources from web servers. The key elements of the client-server model are as follows.

  • The client, not the service provider, initiates the action.

  • The server waits passively for requests from clients.

  • The client and server are connected by a communication channel that they access through communication endpoints.

Servers should robustly handle multiple simultaneous client requests in the face of unexpected client behavior. This chapter especially emphasizes the importance of catching errors and taking appropriate action during client-server interactions. You wouldn't want a web server to exit when a user mistypes a URL in the browser. Servers are long-running and must release all the resources allocated for individual client requests.

Communication Channels

A communication channel is a logical pathway for information that is accessed by participants through communication endpoints. The characteristics of the channel constrain the types of interaction allowed between sender and receiver. Channels can be shared or private, one-way or two-way. Two-way channels can be symmetric or asymmetric. Channels are distinguished from the underlying physical conduit, which may support many types of channels.

In object-orient programming, clients communicate with an object by calling a method. In this context, client and server share an address space, and the communication channel is the activation record that is created on the process stack for the call. The request consists of the parameter values that are pushed on the stack as part of the call, and the optional reply is the method's return value. Thus, the activation record is a private, asymmetric two-way communication channel. The method call mechanism of the object-oriented programming language establishes the communication endpoints. The system infrastructure for managing the process stack furnishes the underlying conduit for communication.

Universal Internet Communication Interface (UICI)

The Universal Internet Communication Interface (UICI) library, summarized in provides a simplified interface to connection-oriented communication in UNIX. UICI is not part of any UNIX standard. The interface was designed by the authors to abstract the essentials of network communication while hiding the details of the underlying network protocols. UICI has been placed in the public domain and is available on the book web site. Programs that use UICI should include the uici.h header file.

This section introduces the UICI library. The next two sections implement several client-server strategies in terms of UICI. discusses the implementation of UICI using sockets, provides a complete UICI implementation.

When using sockets, a server creates a communication endpoint (a socket) and associates it with a well-known port (binds the socket to the port). Before waiting for client requests, the server sets the socket to be passive so that it can accept client requests (sets the socket to listen). Upon detection of a client connection request on this endpoint, the server generates a new communication endpoint for private two-way communication with the client. The client and server access their communication endpoints by using file descriptors to read and write. When finished, both parties close the file descriptors, releasing the resources associated with the communication channel.

UICI prototype

description (assuming no errors)

int u_open(u_port_t port)

creates a TCP socket bound to port and sets the socket to be passive returns a file descriptor for the socket

int u_accept(int fd,
char *hostn,
int hostnsize)

waits for connection request on fd; on return, hostn has first hostname-1 characters of the client's host name returns a communication file descriptor

int u_connect(u_port_t port,
char *hostn)

initiates a connection to server on port port and host hostn. returns a communication file descriptor

UICI Clients

The client side of the file copy. The client connects to the desired port on a specified host by calling u_connect. The u_connect function returns the communication file descriptor. The client reads the information from standard input and copies it to the server. The client exits when it receives end-of-file from standard input or if it encounters an error while writing to the server.

A client that uses UICI for communication.

#include 
#include
#include
#include "restart.h"
#include "uici.h"

int main(int argc, char *argv[]) {
int bytescopied;
int communfd;
u_port_t portnumber;

if (argc != 3) {
fprintf(stderr, "Usage: %s host port\n", argv[0]);
return 1;
}
portnumber = (u_port_t)atoi(argv[2]);
if ((communfd = u_connect(portnumber, argv[1])) == -1) {
perror("Failed to make connection");
return 1;
}
fprintf(stderr, "[%ld]:connected %s\n", (long)getpid(), argv[1]);
bytescopied = copyfile(STDIN_FILENO, communfd);
fprintf(stderr, "[%ld]:sent %d bytes\n", (long)getpid(), bytescopied);
return 0;
}

The World Wide Web

Electronic hypertext contains links to expanded or related information embedded at relevant points in a document. The links are analogous to footnotes in a traditional paper document, but the electronic nature of these documents allows easier physical access to the links. As early as 1945, Vannevar Bush proposed linked systems for documents on microfiche [18], but electronic hypertext systems did not take hold until the 1960s and 1970s.

In 1980, Tim Berners-Lee wrote a notebook program for CERN called ENQUIRE that had bidirectional links between nodes representing information. In 1989, he proposed a system for browsing the CERN Computer Center's documentation and help service. Tim Berners-Lee and Robert Cailliau developed a prototype GUI browser-editor for the system in 1990 and coined the name "World Wide Web." The initial system was released in 1991. At the beginning of 1993 there were 50 known web servers, a number that grew to 500 by the end of 1993 and to 650,000 by 1997. Today, web browsers have become an integral interface to information, and the Internet has millions of web servers.

The World Wide Web is a collection of clients and servers that have agreed to interact and exchange information in a certain format. The client (an application such as a browser) first establishes a connection with a server (an application that accepts connections and responds). Once it has established a connection, the client sends an initial request asking for service. The server responds with the requested information or an error.

HTTP Primer

Clients and web servers have a specific set of rules, or protocol, for exchanging information called Hyper Text Transfer Protocol (HTTP). HTTP is a request-reply protocol that assumes that messages are delivered reliably. For this reason, HTTP communication usually uses TCP, and that is what we assume in this discussion.

Proxy Cache

Proxy caches save resources in local storage so that requests can be satisfied locally. The cache can be in memory or on disk.

Write a program called proxycache that stores all the resources from the remote hosts on disk. Each unique resource must be stored in a unique file. One way to do this is to use sequential file names like cache00001, cache00002, etc., and keep a list containing host name, resource name and filename. Most proxy implementations use some type of hashing or digest mechanism to efficiently represent and search the contents of the cache for a particular resource.

Start by just storing the resources without modifying the communication. If the same resource is requested again, update the stored value rather than create a new entry. Keep track of the number of hits on each resource.

The child processes must coordinate their access to the list of resources, and they must coordinate the generation of unique file names. Consider using threads, shared memory or message passing to implement the coordination.

Once you have the coordination working, implement the code to satisfy requests for cached items locally. Keep track of the total number of bytes transferred from client to proxy, proxy to server, server to proxy and proxy to client. Now the last two of these should be different. Remember that when you are testing with a browser, the browser also does caching, so some requests will not even go to the proxy server. Either turn off the browser's caching or force a remote access in the browser (usually by holding down the SHIFT key and pressing reload or refresh).

Real proxy caches need to contend with a number of issues.

  • Real caches are not infinite.

  • Caches should not store items above a certain size. The optimal size may vary dynamically with cache content.

  • The cache should have an expiration policy so that resources do not stay in the cache forever.

  • The cache should respect directives from the server stating that certain items should not be cached.

  • The cache should check whether an item has been modified before using a local copy.

Introduction to Connectionless Communication

Connectionless communication is an abstraction based on transmission of single messages or datagrams between sender and receiver. A datagram is a unit of data transferred from one endpoint to another. Connectionless communication makes no association between the endpoints, and a process can use a single connectionless endpoint to send messages to or receive messages from many other endpoints