Single System Image Cluster

Introduction

A single system image (SSI) is the property of a system that hides the heterogeneous and distributed nature of the available resources and presents them to users and applications as a single unified computing resource. SSI can be enabled in numerous ways, ranging from extended hardware to various software mechanisms. SSI means that users have a global view of the resources available to them, irrespective of the node with which they are physically associated. Furthermore, SSI can ensure that a system continues to operate after a failure (high availability), that the system is evenly loaded, and that it provides communal multiprocessing (resource management and scheduling). SSI design goals for cluster-based systems focus mainly on complete transparency of resource management, scalable performance, and system availability in supporting user applications (Buyya, 1999; Pfister, 1998; Hwang et al., 1999; Walker and Steel, 1999a; Popek and Walker, 1996). An SSI can be defined as the illusion (Buyya, 1999; Pfister, 1998), created by hardware or software, that presents a collection of resources as one more powerful, unified resource.

Services and Benefits

The key services of a single-system image cluster include the following:

  • Single entry point: A user can connect to the cluster as a virtual host (e.g., telnet beowulf.myinstitute.edu), although the cluster may have multiple physical host nodes to serve the login session. The system transparently distributes the user’s connection requests to different physical hosts to balance the load.
  • Single user interface: The user should be able to use the cluster through a single GUI. The interface must have the same look and feel as the one available for workstations (e.g., Solaris OpenWin or Windows NT GUI).
  • Single process space: All user processes, no matter on which nodes they reside, have a unique cluster-wide process ID. A process on any node can create child processes on the same or a different node (through a UNIX fork). A process should also be able to communicate with any other process on a remote node (through signals and pipes). Clusters should support globalized process management and allow processes to be managed and controlled as if they were running on local machines.
  • Single memory space: Users have the illusion of a big, centralized main memory, which in reality may be a set of distributed local memories. The software distributed shared memory (DSM) approach has already been used to achieve a single memory space on clusters. Another approach is to let the compiler distribute an application's data structures across multiple nodes. It is still a challenging task to develop a single memory scheme that is efficient, platform independent, and able to support sequential binary codes.
  • Single I/O space (SIOS): This allows any node to perform I/O operations on local or remotely located peripheral or disk devices. In an SIOS design, disks attached to cluster nodes, network-attached RAIDs, and peripheral devices form a single address space.
  • Single-file hierarchy: On entering the system, the user sees a single, huge file-system image: one hierarchy of files and directories under the same root directory that transparently integrates local and global disks and other file devices. Examples of single-file hierarchy include NFS, AFS, xFS, and Solaris MCProxy.
  • Single virtual networking: Any node can access any network connection throughout the cluster domain, even if that network is not physically connected to every node in the cluster. Multiple physical networks together support a single virtual network.
  • Single job management system: Under a global job scheduler, a user job can be submitted from any node to request any number of host nodes to execute it. Jobs can be scheduled to run in either batch, interactive, or parallel modes. Examples of job management systems for clusters include GLUnix, LSF, and CODINE.
  • Single control point and management: The entire cluster and each individual node can be configured, monitored, tested, and controlled from a single window using single GUI tools, much like an NT workstation managed by the task manager tool.
  • Checkpointing and process migration: Checkpointing is a software mechanism that periodically saves the process state and intermediate computing results in memory or on disk. This allows rollback recovery after a failure. Process migration is needed for dynamic load balancing among the cluster nodes and for supporting checkpointing.
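The single-entry-point service can be sketched as a virtual host that forwards each new login session to the least-loaded physical node. This is a minimal Python illustration only; the `VirtualHost` class, its methods, and the node names are hypothetical, and a real cluster might perform this distribution with DNS or a front-end load balancer instead.

```python
from dataclasses import dataclass, field


@dataclass
class VirtualHost:
    """Hypothetical single entry point: one public host name backed by many physical nodes."""

    name: str
    # Active login sessions per physical node.
    sessions: dict[str, int] = field(default_factory=dict)
    # Which physical node each user's session was routed to.
    placement: dict[str, str] = field(default_factory=dict)

    def add_node(self, node: str) -> None:
        self.sessions.setdefault(node, 0)

    def connect(self, user: str) -> str:
        """Route a new login session to the node with the fewest active sessions."""
        if not self.sessions:
            raise RuntimeError(f"{self.name}: no physical nodes available")
        # Ties are broken by node name so the routing is deterministic.
        node = min(self.sessions, key=lambda n: (self.sessions[n], n))
        self.sessions[node] += 1
        self.placement[user] = node
        return node

    def disconnect(self, user: str) -> None:
        """End a user's session and release its slot on the physical node."""
        node = self.placement.pop(user)
        self.sessions[node] -= 1


if __name__ == "__main__":
    vh = VirtualHost("beowulf.myinstitute.edu")
    for n in ("node1", "node2", "node3"):
        vh.add_node(n)
    for user in ("alice", "bob", "carol", "dave"):
        print(f"{user} -> {vh.connect(user)}")
```

With three idle nodes, the four logins land on node1, node2, node3, and then node1 again; the user only ever sees the single name `beowulf.myinstitute.edu`.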
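One way to give every process a unique cluster-wide ID, as the single-process-space service requires, is to pack the node number and the node-local PID into one integer. The sketch below assumes a hypothetical layout (22 low bits for the local PID); it is not the format of any particular SSI system, and `ClusterProcessTable` only simulates bookkeeping rather than creating real processes.

```python
from typing import Optional

# Hypothetical global PID layout (an assumption, not a specific system's format):
# the high bits hold the node number, the low 22 bits hold the node-local PID.
LOCAL_PID_BITS = 22
LOCAL_PID_MASK = (1 << LOCAL_PID_BITS) - 1


def make_global_pid(node_id: int, local_pid: int) -> int:
    """Combine a node number and a node-local PID into one cluster-wide PID."""
    if node_id < 0:
        raise ValueError("node id must be non-negative")
    if not 0 < local_pid <= LOCAL_PID_MASK:
        raise ValueError("local PID out of range")
    return (node_id << LOCAL_PID_BITS) | local_pid


def split_global_pid(gpid: int) -> tuple[int, int]:
    """Recover (node_id, local_pid), e.g. to route a signal to the owning node."""
    return gpid >> LOCAL_PID_BITS, gpid & LOCAL_PID_MASK


class ClusterProcessTable:
    """Bookkeeping only: records where each process lives; it does not fork anything."""

    def __init__(self, num_nodes: int) -> None:
        self.next_local = {n: 1 for n in range(num_nodes)}
        self.parent: dict[int, Optional[int]] = {}

    def spawn(self, node_id: int, parent_gpid: Optional[int] = None) -> int:
        """Record a new process on node_id; its parent may live on any node."""
        local_pid = self.next_local[node_id]
        self.next_local[node_id] += 1
        gpid = make_global_pid(node_id, local_pid)
        self.parent[gpid] = parent_gpid
        return gpid
```

Because the node number is embedded in the ID, any node can work out where to forward a signal for a given process by calling `split_global_pid`, without consulting a central table.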
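The single-file-hierarchy idea can be illustrated with a namespace that maps each absolute path to whichever backend holds it, choosing the deepest matching mount point. This is a hypothetical sketch; the `GlobalNamespace` class and the backend strings are made up for illustration and do not reflect how NFS, AFS, xFS, or MCProxy are implemented.

```python
from pathlib import PurePosixPath


class GlobalNamespace:
    """Hypothetical single-file hierarchy: one root whose subtrees live on different backends."""

    def __init__(self) -> None:
        self.mounts: dict[PurePosixPath, str] = {}

    def mount(self, mount_point: str, backend: str) -> None:
        self.mounts[PurePosixPath(mount_point)] = backend

    def resolve(self, path: str) -> tuple[str, str]:
        """Return (backend, path relative to that backend) using the deepest matching mount."""
        p = PurePosixPath(path)
        if not p.is_absolute():
            raise ValueError("paths in the global hierarchy must be absolute")
        # p.parents runs from the immediate parent up to "/", so the first hit is the deepest mount.
        for candidate in (p, *p.parents):
            if candidate in self.mounts:
                return self.mounts[candidate], str(p.relative_to(candidate))
        raise FileNotFoundError(path)
```

Matching on whole path components rather than raw string prefixes means `/homework/x` is not mistaken for something under a `/home` mount.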
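A global job scheduler accepts jobs from any node and assigns each one the number of host nodes it requests. The following is a minimal batch-mode sketch using a strict first-in, first-out policy; `Job` and `GlobalScheduler` are hypothetical names, and this is not the algorithm used by GLUnix, LSF, or CODINE. Interactive and parallel modes are not modeled.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Job:
    name: str            # assumed unique
    nodes_needed: int
    submitted_from: str  # any node may submit; it does not affect placement


class GlobalScheduler:
    """Hypothetical cluster-wide FIFO batch scheduler."""

    def __init__(self, nodes: list[str]) -> None:
        self.total = len(nodes)
        self.free = list(nodes)
        self.queue: deque[Job] = deque()
        self.running: dict[str, list[str]] = {}

    def submit(self, job: Job) -> None:
        if not 0 < job.nodes_needed <= self.total:
            raise ValueError(f"{job.name}: cannot request {job.nodes_needed} of {self.total} nodes")
        self.queue.append(job)

    def schedule(self) -> list[str]:
        """Start queued jobs in submission order while enough nodes are free."""
        started = []
        # Strict FIFO: stop at the first job that does not fit, so a large job
        # is not overtaken indefinitely by smaller jobs queued behind it.
        while self.queue and self.queue[0].nodes_needed <= len(self.free):
            job = self.queue.popleft()
            allocated = self.free[: job.nodes_needed]
            self.free = self.free[job.nodes_needed :]
            self.running[job.name] = allocated
            started.append(job.name)
        return started

    def finish(self, name: str) -> None:
        """Release a finished job's nodes back to the free pool."""
        self.free.extend(self.running.pop(name))
```

Strict FIFO trades some utilization for fairness: a one-node job may wait while nodes sit idle. Real schedulers often add backfilling to recover that capacity.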
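Checkpointing and rollback recovery can be sketched as a computation that periodically saves its state to disk and, after a failure, resumes from the last saved state. Note that this is application-level checkpointing (the program saves its own state), a simpler technique than the transparent, system-level checkpointing of whole processes described above, and it does not cover process migration. The function names and the summation workload are hypothetical.

```python
import os
import pickle
import tempfile
from typing import Optional


def save_checkpoint(path: str, state: dict) -> None:
    """Write state to a temporary file, then rename it over the previous checkpoint."""
    fd, tmp_path = tempfile.mkstemp(dir=os.path.dirname(os.path.abspath(path)))
    with os.fdopen(fd, "wb") as f:
        pickle.dump(state, f)
    # The rename is atomic on POSIX, so a crash never leaves a half-written checkpoint.
    os.replace(tmp_path, path)


def load_checkpoint(path: str) -> Optional[dict]:
    """Return the last saved state, or None if no checkpoint exists yet."""
    try:
        with open(path, "rb") as f:
            return pickle.load(f)
    except FileNotFoundError:
        return None


def run(path: str, n_steps: int, every: int = 10, crash_at: Optional[int] = None) -> int:
    """Sum 0..n_steps-1, checkpointing every `every` steps; resumes from any saved state."""
    state = load_checkpoint(path) or {"step": 0, "total": 0}
    while state["step"] < n_steps:
        if crash_at is not None and state["step"] == crash_at:
            raise RuntimeError(f"simulated node failure at step {state['step']}")
        state["total"] += state["step"]  # the intermediate computing result
        state["step"] += 1
        if state["step"] % every == 0:
            save_checkpoint(path, state)
    return state["total"]
```

If a run of 100 steps fails at step 55, the next call rolls back to the step-50 checkpoint, redoes only five steps of lost work, and still returns 4950.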

The most important benefits of SSI include the following:

  • It provides a simple, straightforward view of all system resources and activities from any node in the cluster.
  • It frees the end user from having to know where in the cluster an application will run.
  • It allows the use of resources in a transparent way irrespective of their physical location.
  • It lets the user work with a familiar interface and commands and allows the administrator to manage the entire cluster as a single entity.
  • It offers the same command syntax as other systems and thus reduces the risk of operator errors, with the result that end users see improved performance, reliability, and higher availability of the system.
  • It allows system management and control to be centralized or decentralized, reducing the need for skilled administrators.
  • It greatly simplifies system management and thus reduces the cost of ownership.
  • It provides location-independent message communication.
  • It helps system programmers reduce the time, effort, and knowledge required to perform their tasks and allows current staff to handle larger or more complex systems.
  • It promotes the development of standard tools and utilities.

SSI Layers/Levels

The two important characteristics of SSI are the following:

  1. Every SSI has a boundary.
  2. SSI support can exist at different levels within a system, with one level able to be built on another.

SSI can be implemented in one or more of the following levels:

  • Hardware,
  • Operating system (so-called “underware”) (Walker and Steel, 1999a),
  • Middleware (runtime subsystems),
  • Application.

A good SSI is usually obtained through cooperation between all these levels, because a lower level can simplify the implementation of a higher one.
