Distributed Computing

When many similar systems are found on the same network, it is often desirable to share common files and utilities among them. For example, a system administrator might choose to keep a copy of the system documentation on one computer's disk and to make those files available to remote systems. In this case, the system administrator configures the files so users who need to access the online documentation are not aware that the files are stored on a remote system. This type of setup, which is an example of distributed computing, not only conserves disk space but also allows you to update one central copy of the documentation rather than tracking down and updating copies scattered throughout the network on many different systems.

Figure 10-2 illustrates a fileserver that stores the system manual pages and users' home directories. With this arrangement, a user's files are always available to that user, no matter which system the user logs in on. Each system's disk might contain a directory to hold temporary files as well as a copy of the operating system. Chapter 22 contains instructions for setting up NFS clients and servers in networked configurations.

Figure 10-2. A fileserver

The Client/Server Model

Mainframe model

The client/server model was not the first computational model. First came the mainframe, which follows a one-machine-does-it-all model. That is, all the intelligence resides in one system, including the data and the program that manipulates and reports on the data. Users connect to a mainframe using terminals.

File-sharing model

With the introduction of PCs, file-sharing networks became available. In this scheme data is downloaded from a shared location to a user's PC, where a program then manipulates the data. The file-sharing model ran into problems as networks expanded and more users needed access to the data.

Client/server model

In the client/server model, a client uses a protocol, such as FTP, to request services, and a server provides the services that the client requests. Rather than providing data files as the file-sharing model does, the server in a client/server relationship is a database that provides only those pieces of information that the client needs or requests.

The client/server model dominates UNIX and Linux system networking and underlies most of the network services described in this book. FTP, NFS, DNS, email, and HTTP (the Web browsing protocol) all rely on the client/server model. Some servers, such as Web servers and browser clients, are designed to interact with specific utilities. Other servers, such as those supporting DNS, communicate with one another, in addition to answering queries from a variety of clients. Clients and servers can reside on the same or different systems running the same or different operating systems. The systems can be proximate or thousands of miles apart. A system that is a server to one system can turn around and act as a client to another. A server can reside on a single system or, as is the case with DNS, be distributed among thousands of geographically separated systems running many different operating systems.
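The request/response pattern can be sketched in a few lines of Python. This is a minimal illustration, not how any real service is implemented: the one-message uppercase "protocol," the function names serve_once, ask, and demo, and the use of the loopback address are all assumptions made for the example.

```python
import socket
import threading


def serve_once(listener: socket.socket) -> None:
    """Act as the server: accept one client, read its request, send a reply."""
    conn, _addr = listener.accept()
    with conn:
        request = conn.recv(1024)
        # Hypothetical toy protocol: the service returns the request uppercased.
        conn.sendall(request.upper())


def ask(host: str, port: int, request: bytes) -> bytes:
    """Act as the client: connect, send one request, return the reply."""
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(request)
        return sock.recv(1024)


def demo() -> bytes:
    """Run the server and the client on the same system."""
    # Port 0 asks the kernel for any free port, so no privileges are needed.
    with socket.create_server(("127.0.0.1", 0)) as listener:
        port = listener.getsockname()[1]
        worker = threading.Thread(target=serve_once, args=(listener,))
        worker.start()
        reply = ask("127.0.0.1", port, b"hello")
        worker.join()
        return reply
```

Here the client and server share one system, but nothing in ask depends on that; pointing it at another host's address would exercise the same exchange across a network.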

Peer-to-peer model

The peer-to-peer (PTP) model, in which either program can initiate a transaction, stands in contrast to the client/server model. PTP protocols are common on small networks. For example, Microsoft's Network Neighborhood and Apple's AppleTalk both rely on broadcast-based PTP protocols for browsing and automatic configuration. The Zeroconf multicast DNS protocol is a PTP alternative to DNS for small networks. The highest-profile PTP networks are those used for file sharing, such as Kazaa and Gnutella. Many of these networks are not pure PTP topologies. Pure PTP networks do not scale well, so networks such as Napster and Kazaa employ a hybrid approach.

DNS: Domain Name Service

DNS is a distributed service: Nameservers on thousands of machines around the world cooperate to keep the database up-to-date. The database itself, which maps hundreds of thousands of alphanumeric hostnames to numeric IP addresses, does not exist in one place. That is, no system has a complete copy of the database. Instead, each system that runs DNS knows which hosts are local to that site and understands how to contact other nameservers to learn about other, nonlocal hosts.

Like the Linux filesystem, DNS is organized hierarchically. Each country has an ISO (International Organization for Standardization) country code designation as its domain name. (For example, AU represents Australia, IL is Israel, and JP is Japan; see www.iana.org/cctld/cctld.htm for a complete list.) Although the United States is represented in the same way (US) and uses the standard two-letter Postal Service abbreviations to identify the next level of the domain, only governments and a few organizations use these codes. Schools in the US domain are represented by a third- (and sometimes second-) level domain: k12. For example, the domain name for Myschool in New York state could be www.myschool.k12.ny.us.

Following is a list of the six original top-level domains. These domains are used extensively within the United States and, to a lesser degree, by users in other countries:

`COM`	Commercial enterprises
`EDU`	Educational institutions
`GOV`	Nonmilitary government agencies
`MIL`	Military government agencies
`NET`	Networking organizations
`ORG`	Other (often nonprofit) organizations

As this book was being written, the following additional top-level domains had been approved for use:

`AERO`	Air-transport industry
`BIZ`	Business
`COOP`	Cooperatives
`INFO`	Unrestricted use
`MUSEUM`	Museums
`NAME`	Name registries

Like Internet addresses, domain names were once assigned by the Network Information Center (NIC, page 353); now they are assigned by several companies. A system's full name, referred to as its fully qualified domain name (FQDN), is unambiguous in the way that a simple hostname cannot be. The system okeeffe.berkeley.edu at the University of California at Berkeley (Figure 10-3) is not the same as one named okeeffe.moma.org, which might represent a host at the Museum of Modern Art. The domain name not only tells you something about where the system is located but also adds enough diversity to the namespace to avoid confusion when different sites choose similar names for their systems.

Figure 10-3. U.S. top-level domains

Unlike the filesystem hierarchy, the top-level domain name appears last (reading from left to right). Also, domain names are not case sensitive, so the names okeeffe.berkeley.edu, okeeffe.Berkeley.edu, and okeeffe.Berkeley.EDU refer to the same computer. Once a domain has been assigned, the local site is free to extend the hierarchy to meet local needs.
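As a small sketch of these two properties, the following Python splits a domain name into its labels, most general first, and compares names without regard to case. The function names labels_from_root and same_host are made up for this illustration.

```python
from typing import List


def labels_from_root(fqdn: str) -> List[str]:
    """Split a domain name into labels, top-level domain first."""
    # Domain names are not case sensitive, so normalize before comparing.
    labels = fqdn.rstrip(".").lower().split(".")
    # The top-level domain is written last, so reverse to walk the
    # hierarchy from the top down, as you would a filesystem path.
    return list(reversed(labels))


def same_host(a: str, b: str) -> bool:
    """Return True if two domain names refer to the same system."""
    return labels_from_root(a) == labels_from_root(b)
```

For example, labels_from_root("okeeffe.Berkeley.EDU") yields ["edu", "berkeley", "okeeffe"], and same_host treats all three spellings in the paragraph above as one computer.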

With DNS, email addressed to user@example.com can be delivered to the computer named example.com that handles the corporate mail and knows how to forward messages to user mailboxes on individual machines. As the company grows, its site administrator might decide to create organizational or geographical subdomains. The name delta.ca.example.com might refer to a system that supports California offices, for example, while alpha.co.example.com is dedicated to Colorado. Functional subdomains might be another choice, with delta.sales.example.com and alpha.dev.example.com representing the sales and development divisions, respectively.

BIND

On Linux systems, the most common interface to the DNS is BIND (Berkeley Internet Name Domain). BIND follows the client/server model. On any given local network, one or more systems may be running a nameserver, supporting all the local hosts as clients. When it wants to send a message to another host, a system queries the nearest nameserver to learn the remote host's IP address. The client, called a resolver, may be a process running on the same computer as the nameserver, or it may pass the request over the network to reach a server. To reduce network traffic and facilitate name lookups, the local nameserver maintains some knowledge of distant hosts. If the local server must contact a remote server to pick up an address, when the answer comes back, the local server adds that address to its internal table and reuses it for a while. The nameserver deletes the nonlocal information before it can become outdated. Refer to "TTL" on page 1060.
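The caching behavior described above can be sketched as a table whose entries expire. This is not BIND's implementation; the TtlCache class, its method names, and the injectable clock are assumptions chosen to keep the example short and testable.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class TtlCache:
    """Remember name-to-address answers until their time to live runs out."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._entries: Dict[str, Tuple[str, float]] = {}

    def put(self, name: str, address: str, ttl_seconds: float) -> None:
        """Store an answer learned from a remote server."""
        expires_at = self._clock() + ttl_seconds
        self._entries[name.lower()] = (address, expires_at)

    def get(self, name: str) -> Optional[str]:
        """Return a cached address, or None if it is missing or stale."""
        key = name.lower()
        entry = self._entries.get(key)
        if entry is None:
            return None
        address, expires_at = entry
        if self._clock() >= expires_at:
            # The answer is stale: drop it so the caller asks a remote server.
            del self._entries[key]
            return None
        return address
```

A lookup that returns None is the signal to query a remote nameserver and then call put with the fresh answer and its TTL.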

The system's translation of symbolic hostnames into addresses is transparent to most users; only the system administrator of a networked system needs to be concerned with the details of name resolution. Systems that use DNS for name resolution are generally capable of communicating with the greatest number of hosts, more than would be practical to maintain in a /etc/hosts file or private NIS database. Chapter 24 covers setting up and running a DNS server.

Three common sources are referenced for hostname resolution: NIS, DNS, and system files (such as /etc/hosts). Linux does not ask you to choose among these sources; rather, the nsswitch.conf file (page 435) allows you to choose any of these sources, in any combination, and in any order.
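To illustrate how an ordered list of sources might be consulted, the sketch below reads the hosts line from nsswitch.conf-style text and tries each source in turn. It is a simplification under stated assumptions: the real resolver library also honors bracketed action items, which this code merely skips, and the lookup functions passed in are hypothetical stand-ins for the files, NIS, and DNS back ends.

```python
from typing import Callable, Dict, List, Optional

Lookup = Callable[[str], Optional[str]]


def hosts_order(nsswitch_text: str) -> List[str]:
    """Return the sources listed on the hosts line, in order."""
    for line in nsswitch_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        if line.startswith("hosts:"):
            fields = line[len("hosts:"):].split()
            # Skip bracketed action items such as [NOTFOUND=return].
            return [f for f in fields if not f.startswith("[")]
    return []


def resolve(name: str, order: List[str],
            sources: Dict[str, Lookup]) -> Optional[str]:
    """Try each configured source in turn and return the first answer."""
    for source in order:
        lookup = sources.get(source)
        if lookup is None:
            continue  # source named in the file but not available here
        address = lookup(name)
        if address is not None:
            return address
    return None
```

With a hosts line of "files dns", a name found in the local file is answered without any DNS query; reversing the two words reverses that preference.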

Ports

Ports are logical channels on a network interface and are numbered from 1 to 65,535. Each network connection is uniquely identified by the IP address and port number of each endpoint.

In a system that has many network connections open simultaneously, the use of ports keeps packets (page 1047) flowing to and from the appropriate programs. A program that needs to receive data binds to a port and then uses that port for communication.

Privileged ports

Services are associated with specific ports, generally with numbers less than 1024. These ports are called privileged (or reserved) ports. For security reasons, only root can bind to privileged ports. A service run on a privileged port provides assurance that the service is being provided by someone with authority over the system, with the exception that any user on Windows 98 and earlier Windows systems can bind to any port. Commonly used ports include 22 (SSH), 23 (TELNET), 80 (HTTP), 111 (Sun RPC), and 201 through 208 (AppleTalk).
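The following Python sketch models the two ideas in this section: a connection identified by the address and port of each endpoint, and the privileged range below 1024. The Endpoint and Connection types, the is_privileged helper, and the sample addresses (taken from a range reserved for documentation) are assumptions for illustration only.

```python
from typing import NamedTuple


class Endpoint(NamedTuple):
    ip: str
    port: int


class Connection(NamedTuple):
    local: Endpoint
    remote: Endpoint


def is_privileged(port: int) -> bool:
    """Return True for ports below 1024, which only root may bind to."""
    if not 1 <= port <= 65535:
        raise ValueError(f"port out of range: {port}")
    return port < 1024


# Two connections from the same client host to the same server port differ
# only in the client-side port, which is enough to tell them apart.
web = Endpoint("192.0.2.10", 80)
first = Connection(local=Endpoint("192.0.2.20", 40001), remote=web)
second = Connection(local=Endpoint("192.0.2.20", 40002), remote=web)

# A table keyed by the full connection lets incoming packets be handed
# to the right program (the program names here are hypothetical).
owners = {first: "browser tab 1", second: "browser tab 2"}
```

Because the whole four-part identity is the key, both entries coexist even though three of the four values match.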

NIS: Network Information Service

NIS (Network Information Service) simplifies the maintenance of frequently used administrative files by keeping them in a central database and having clients contact the database server to retrieve information from the database. Just as DNS addresses the problem of keeping multiple copies of hosts files up-to-date, NIS deals with the issue of keeping system-independent configuration files (such as /etc/passwd) current. Refer to Chapter 21 for coverage of NIS.

NFS: Network Filesystem

The NFS (Network Filesystem) protocol allows a server to share selected local directory hierarchies with client systems on a heterogeneous network. Files on the remote fileserver appear as if they are present on the local system. NFS is covered in Chapter 22.

Optional: Internet Services

Linux Internet services are provided by daemons that run continuously or by a daemon that is started automatically by the xinetd daemon (page 376) when a service request comes in. The /etc/services file lists network services (for example, telnet, ftp, and ssh) and their associated numbers. Any service that uses TCP/IP or UDP/IP has an entry in this file. IANA (Internet Assigned Numbers Authority) maintains a database of all permanent, registered services. The /etc/services file usually lists a small, commonly used subset of services. Visit www.rfc.net/rfc1700.html for more information and a complete list of registered services.
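The layout of /etc/services is simple enough to parse by hand. The sketch below assumes the usual layout of a service name, a port/protocol pair, optional aliases, and # comments; the SAMPLE text is a made-up excerpt rather than the contents of any particular system's file, and parse_services is a hypothetical helper name.

```python
from typing import Dict, Tuple

# Made-up excerpt in the style of /etc/services.
SAMPLE = """\
# name    port/protocol   aliases
ftp       21/tcp
ssh       22/tcp
telnet    23/tcp
sunrpc    111/tcp         portmapper rpcbind
sunrpc    111/udp         portmapper rpcbind
"""


def parse_services(text: str) -> Dict[Tuple[str, str], int]:
    """Map (service name or alias, protocol) to a port number."""
    table: Dict[Tuple[str, str], int] = {}
    for line in text.splitlines():
        fields = line.split("#", 1)[0].split()
        if len(fields) < 2:
            continue  # blank line or comment only
        name, port_proto, *aliases = fields
        port, proto = port_proto.split("/")
        for label in (name, *aliases):
            table[(label, proto)] = int(port)
    return table
```

On a live system, Python's socket.getservbyname performs a comparable lookup through the system's own service database.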

Most of the daemons (the executable files) are stored in /usr/sbin. By convention the names of many daemons end with the letter d to distinguish them from utilities (one common daemon whose name does not end in d is sendmail). The prefix in. or rpc. is often used for daemon names. Look at /usr/sbin/*d to see a list of many of the daemon programs on the local system. Refer to "Init Scripts: Start and Stop System Services" on page 404 and "service: Configures Services I" on page 406 for information about starting and stopping these daemons.

To see how a daemon works, consider what happens when you run ssh. The local system contacts the ssh daemon (sshd) on the remote system to establish a connection. The two systems negotiate the connection according to a fixed protocol. Each system identifies itself to the other, and then they take turns asking each other specific questions and waiting for valid replies. Each network service follows its own protocol.

In addition to the daemons that support the utilities described up to this point, many other daemons support system-level network services that you will not typically interact with. Table 10-4 lists some of these daemons.

Table 10-4. Common daemons
Daemon	Used for or by	Function
`acpid`	Advanced configuration and power interface	Flexible daemon for delivering ACPI events. Replaces `apmd`.
`apmd`	Advanced power management	Reports and takes action on specified changes in system power, including shutdowns. Useful with machines, such as laptops, that run on batteries.
`atd`	`at`	Executes a command once at a specific time and date. See `crond` for periodic execution of a command.
`automount`	Automatic mounting	Automatically mounts filesystems when they are accessed. Automatic mounting is a way of demand-mounting remote directories without having to hard-configure them into `/etc/fstab`.
`crond`	`cron`	Used for periodic execution of tasks. This daemon looks in the `/var/spool/cron` directory for files with filenames that correspond to users' usernames. It also looks at the `/etc/crontab` file and at files in the `/etc/cron.d` directory. When a task comes up for execution, `crond` executes it as the user who owns the file that describes the task.
`dhcpcd`	DHCP	DHCP client daemon (page 432).
`dhcpd`	DHCP	Assigns Internet address, subnet mask, default gateway, DNS, and other information to hosts. This daemon answers DHCP requests and, optionally, BOOTP requests. Refer to "DHCP: Configures Hosts" on page 431.
`ftpd`	FTP	Handles FTP requests. Refer to "`ftp:` Transfers Files over a Network" on page 365. See also `vsftpd` (page 601). Launched by `xinetd`.
`gpm`	General-purpose mouse or GNU paste manager	Allows you to use a mouse to cut and paste text on console applications.
`httpd`	HTTP	The Web server daemon (Apache, page 785).
`in.fingerd`	`finger`	Handles requests for user information from the `finger` utility. Launched by `xinetd`.
`inetd`		Deprecated in favor of `xinetd`.
`lpd`	Line printer spooler daemon	Launched by `xinetd` when printing requests come to the machine. Not used with CUPS.
`named`	DNS	Supports DNS (page 719).
`nfsd, statd, lockd, mountd, rquotad`	NFS	These five daemons operate together to handle NFS (page 673) operations. The `nfsd` daemon handles file and directory requests. The `statd` and `lockd` daemons implement network file and record locking. The `mountd` daemon converts filesystem name requests from the mount utility into NFS handles and checks access permissions. If disk quotas are enabled, `rquotad` handles those.
`ntpd`	NTP	Synchronizes time on network computers. Requires a `/etc/ntp.conf` file. For more information go to www.ntp.org.
`portmap`	RPC	Maps incoming requests for RPC service numbers to TCP or UDP port numbers on the local system. Refer to "RPC Network Services" on page 377.
`pppd`	PPP	For a modem, this protocol controls the pseudointerface represented by the IP connection between the local computer and a remote computer. Refer to "PPP: Point-to-Point Protocol" on page 353.
`rexecd`	`rexec`	Allows a remote user with a valid username and password to run programs on a system. Its use is generally deprecated for security reasons; certain programs, such as PC-based X servers, may still have it as an option. Launched by `xinetd`.
`routed`	Routing tables	Manages the routing tables so your system knows where to send messages that are destined for remote networks. If your system does not have a `/etc/defaultrouter` file, `routed` is started automatically to listen to incoming routing messages and to advertise outgoing routes to other systems on the local network. A newer daemon, the gateway daemon (`gated`), offers enhanced configurability and support for more routing protocols and is proportionally more complex.
`sendmail`	Mail programs	The `sendmail` daemon came from Berkeley UNIX and has been available for a long time. The de facto mail transfer program on the Internet, the `sendmail` daemon always listens on port 25 for incoming mail connections and then calls a local delivery agent, such as `/bin/mail`. Mail user agents, such as KMail and Thunderbird, typically use `sendmail` to deliver mail messages.
`smbd, nmbd`	Samba	Allow Windows PCs to share files and printers with UNIX and Linux computers (page 695).
`sshd`	`ssh, scp`	Enables secure logins between remote systems (page 591).
`syslogd`	System log	Transcribes important system events and stores them in files and/or forwards them to users or another host running the `syslogd` daemon. This daemon is configured with `/etc/syslog.conf` and used with the `syslog` utility. See page 562.
`talkd`	`talk`	Allows you to have a conversation with another user on the same or a remote machine. The `talkd` daemon handles the connections between the machines. The `talk` utility on each system contacts the `talkd` daemon on the other system for a bidirectional conversation. Launched by `xinetd`.
`telnetd`	TELNET	One of the original Internet remote access protocols (page 363). Launched by `xinetd`.
`tftpd`	TFTP	Used to boot a system or get information from a network. Examples include network computers, routers, and some printers. Launched by `xinetd`.
`timed`	Time server	On a LAN synchronizes time with other computers that are also running `timed`.
`xinetd`	Internet superserver	Listens for service requests on network connections and starts up the appropriate daemon to respond to any particular request. Because of `xinetd`, a system does not need the daemons running continually to handle various network requests. For more information refer to "The `xinetd` Superserver" on page 425.

Proxy Servers

A proxy is a network service that is authorized to act for a system while not being part of that system. A proxy server or proxy gateway provides proxy services; it is a transparent intermediary, relaying communications back and forth between an application, such as a browser, and a server, usually outside of a LAN and frequently on the Internet. When more than one process uses the proxy gateway/server, the proxy must keep track of which processes are connecting to which hosts/servers so that it can route the return messages to the proper process. The most commonly encountered proxies are email and Web proxies.

A proxy server/gateway insulates the local computer from all other computers or from specified domains by using at least two IP addresses: one to communicate with the local computer and one to communicate with a server. The proxy server/gateway examines and changes the header information on all packets it handles so that it can encode, route, and decode them properly. The difference between a proxy gateway and a proxy server is that the proxy server usually includes cache (page 1023) to store frequently used Web pages so that the next request for that page is available locally and quickly; a proxy gateway typically does not use cache. The terms "proxy server" and "proxy gateway" are frequently used interchangeably.

Proxy servers/gateways are available for such common Internet services as HTTP, HTTPS, FTP, SMTP, and SNMP. When an HTTP proxy sends queries from local systems, it presents a single organizationwide IP address (the external IP address of the proxy server/gateway) to all servers. It funnels all user requests to the appropriate servers and keeps track of them. When the responses come back, the HTTP proxy fans them out to the appropriate applications using each machine's unique IP address, thereby protecting local addresses from remote/specified servers.
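The bookkeeping involved can be sketched as a table that remembers which local system each outbound request came from. This is a deliberately simplified model, not how any particular proxy is built: the ProxyTable class, its method names, the starting port 50000, and the sample addresses are all assumptions.

```python
from typing import Dict, Tuple

Address = Tuple[str, int]  # (IP address, port)


class ProxyTable:
    """Track which local client is waiting on each outbound request."""

    def __init__(self, external_ip: str) -> None:
        self.external_ip = external_ip
        self._next_port = 50000  # hypothetical start of the outbound range
        self._pending: Dict[int, Address] = {}

    def forward(self, client: Address) -> Address:
        """Record who asked; return the address the remote server sees."""
        port = self._next_port
        self._next_port += 1
        self._pending[port] = client
        # Every request leaves with the same organizationwide IP address.
        return (self.external_ip, port)

    def deliver(self, external_port: int) -> Address:
        """Find the local client that a returning response belongs to."""
        return self._pending.pop(external_port)
```

Remote servers see only external_ip; the mapping back to individual machines never leaves the proxy.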

Proxy servers/gateways are generally just one part of an overall firewall strategy to prevent intruders from stealing information or damaging an internal network. Other functions, which can be either combined with or kept separate from the proxy server/gateway, include packet filtering, which blocks traffic based on origin and type, and user activity reporting, which helps management learn how the Internet is being used.

RPC Network Services

Much of the client/server interaction over a network is implemented using the RPC (Remote Procedure Call) protocol, which is implemented as a set of library calls that make network access transparent to the client and server. RPC specifies and interprets messages but does not concern itself with transport protocols; it runs on top of TCP/IP and UDP/IP. Services that use RPC include NFS and NIS. RPC was developed by Sun as ONC RPC (Open Network Computing Remote Procedure Calls) and differs from Microsoft RPC.

In the client/server model, a client contacts a server on a specific port (page 373) to avoid any mixup between services, clients, and servers. To avoid maintaining a long list of port numbers and to enable new clients/servers to start up without registering a port number with a central registry, when a server that uses RPC starts, it specifies the port it expects to be contacted on. RPC servers typically use port numbers that have been defined by Sun. If a server does not use a predefined port number, it picks an arbitrary number.

The server then registers this port with the RPC portmapper (the portmap daemon) on the local system. The server tells the daemon which port number it is listening on and which RPC program numbers it serves. Through these exchanges, the portmap daemon learns the location of every registered port on the host and the programs that are available on each port. The portmap daemon, which always listens on port 111 for both TCP and UDP, must be running to make RPC calls.

Files

The /etc/rpc file (page 456) maps RPC services to RPC numbers. The /etc/services file (page 456) lists system services.

RPC client/server communication

The sequence of events for communication between an RPC client and server occurs as follows:

1. The client program on the client system makes an RPC call to obtain data from a (remote) server system. (The client issues a "read record from a file" request.)
2. If RPC has not yet established a connection with the server system for the client program, it contacts portmap on port 111 of the server and asks which port the desired RPC server is listening on (for example, rpc.nfsd).
3. The portmap daemon on the remote server looks in its tables and returns a UDP or TCP port number to the local system, the client (typically 2049 for nfs).
4. The RPC libraries on the server system receive the call from the client and pass the request to the appropriate server program. The origin of the request is transparent to the server program. (The filesystem receives the "read record from file" request.)
5. The server responds to the request. (The filesystem reads the record.)
6. The RPC libraries on the remote server return the result over the network to the client program. (The read record is returned to the calling program.)
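The registration and lookup parts of this exchange can be modeled with a small in-memory table. The sketch below is a toy stand-in for the portmap daemon, not its actual protocol or code; the Portmapper class and its method names are assumptions. The NFS program number 100003 is the value conventionally listed in /etc/rpc.

```python
from typing import Dict, Optional, Tuple

PORTMAP_PORT = 111  # the portmapper's own well-known port


class Portmapper:
    """Toy in-memory stand-in for the portmap daemon's registration table."""

    def __init__(self) -> None:
        self._ports: Dict[Tuple[int, str], int] = {}

    def register(self, program: int, protocol: str, port: int) -> None:
        """Called by an RPC server at startup to announce where it listens."""
        self._ports[(program, protocol)] = port

    def lookup(self, program: int, protocol: str) -> Optional[int]:
        """Called for a client that knows only the RPC program number."""
        return self._ports.get((program, protocol))


NFS_PROGRAM = 100003  # RPC program number for nfs

portmapper = Portmapper()
# Server side: nfsd starts and registers the port it listens on.
portmapper.register(NFS_PROGRAM, "udp", 2049)
# Client side: ask the portmapper where to send NFS requests.
nfs_port = portmapper.lookup(NFS_PROGRAM, "udp")
```

A lookup that returns None corresponds to a server that is not running or never registered, which is why RPC servers must be restarted after portmap stops and loses its table.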

Because standard RPC servers are normally started by the xinetd daemon (page 389), the portmap daemon must be started before the xinetd daemon is invoked. The init scripts (page 404) make sure portmap starts before xinetd. You can confirm this sequence by looking at the numbers associated with /etc/rc.d/*/S*portmap and /etc/rc.d/*/S*xinetd. If the portmap daemon stops, you must restart all RPC servers on the local system.
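Comparing those numbers can be automated with a short helper. In this sketch the start_order function is hypothetical, and the link names S13portmap and S56xinetd are example values that may differ on the local system.

```python
import re
from typing import Optional


def start_order(link_name: str) -> Optional[int]:
    """Return the sequence number from a start link name such as S13portmap."""
    match = re.fullmatch(r"S(\d+)(.+)", link_name)
    # Names that do not begin with S and a number (kill links, for
    # example) are not start links, so report None for them.
    return int(match.group(1)) if match else None


# Example link names; check the local /etc/rc.d directories for real ones.
portmap_first = start_order("S13portmap") < start_order("S56xinetd")
```

A lower number means an earlier start within a runlevel, so portmap_first being true indicates the required ordering.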
