
How to Build Cluster Computers

Introduction

With James Cameron's Titanic already way over budget, the film's special effects crew couldn't afford a supercomputer to do the critical rendering, and anything less would take far too long. Like all high-end animators and special effects houses, the Titanic team had a slew of SGI Indigo workstations (as well as a pile of new Windows NT workstations for the low-end jobs), but Titanic's romance and tragedy were far more demanding than most projects. A much greater degree of realism was required than for the typical science fiction epic, and realism is expensive. Rendering the water scenes was obviously a job for a supercomputer, but with the film already far over budget, a $10,000,000 computer wasn't realistic. The performance problem was solved by assembling DEC Alpha based computers into a Linux cluster, an instant supercomputer at a small fraction of the cost, which produced a large number of extraordinarily challenging visual effects for this demanding film.

In this article, I will discuss how to build a generic Linux or Windows supercomputer using the cluster computing concept. You will find how easy it is to build a supercomputer with Linux clusters. We will limit our discussion to building Linux and Windows clusters that deliver supercomputer-class computational power. How to solve computationally intensive algorithmic problems, and how to code those algorithms for a cluster architecture, is outside the scope of this article.

Definition and Benefits from Clustering

Greg Pfister, in his wonderful book In Search of Clusters, defines a cluster as "a type of parallel or distributed system that: consists of a collection of interconnected whole computers, and is used as a single, unified computing resource." A cluster is therefore a group of computers bound together into a common resource pool. A given task can be executed on all computers or on any specific computer in the cluster. Let's look at who benefits from clustering:

Scientific applications: Enterprises running scientific applications on supercomputers can benefit from migrating to a more cost-effective Linux cluster.

Large ISPs and e-commerce enterprises with large databases: Internet service providers or e-commerce web sites that require high availability, load balancing and scalability.

Graphics rendering and animation: Linux clusters have become important in the film industry for rendering high-quality graphics. In the movie Titanic, a Linux cluster was used to render the background in ocean scenes. The same concept was used in the movies True Lies and Interview with the Vampire.

One may also characterize clusters by their function:

Distributed processing clusters: Tasks (small pieces of executable code) are broken down and worked on by many small systems rather than one large system. These clusters are often deployed for tasks previously handled by supercomputers, and are very suitable for scientific or financial analysis.

Fail-over clusters: Clusters are used to increase the availability and serviceability of network services. When an application or server fails, its services are migrated to another system, and the identity of the failed system is migrated as well. Fail-over clusters are used for database servers, mail servers or file servers.

High availability load balancing clusters: A given application can run on all computers, and a given computer can host multiple applications. The "outside world" interacts with the cluster as a whole, and the individual computers are "hidden". This type supports large cluster pools, and applications do not need to be specialized. High availability clustering works best with stateless applications that can be run concurrently.
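To make the load balancing and fail-over ideas a little more concrete, here is a minimal Python sketch. It is only an illustration under simple assumptions: requests are stateless, node health is supplied by a caller-provided check, and the names dispatch, is_up and the node labels are hypothetical, not part of any clustering product.

```python
from itertools import cycle


def dispatch(requests, nodes, is_up):
    """Assign each stateless request to the next healthy node, round-robin.

    requests: list of hashable request ids
    nodes:    list of node names in the cluster pool
    is_up:    function(node) -> bool, a stand-in for a real health check
    """
    healthy = [n for n in nodes if is_up(n)]
    if not healthy:
        raise RuntimeError("no healthy nodes left in the cluster")
    ring = cycle(healthy)  # endless round-robin over the healthy nodes
    return {req: next(ring) for req in requests}


if __name__ == "__main__":
    pool = ["node1", "node2", "node3"]
    failed = {"node2"}  # pretend node2 has gone down
    plan = dispatch(["req-a", "req-b", "req-c", "req-d"], pool,
                    lambda n: n not in failed)
    for req, node in plan.items():
        print(req, "->", node)
```

Because the requests carry no state, work that would have gone to the failed node simply lands on the remaining ones, which is why stateless applications suit this kind of cluster so well.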



Building Windows Clusters

Hardware
Before starting, you need the following hardware and software. You need at least two computers running Windows NT (SP6) or Windows 2000, networked with some sort of LAN equipment (hub, switch, etc.). During the Windows setup phase, ensure that TCP/IP and NetBEUI are installed, and that the network is started, with all the network cards detected and the correct drivers installed. We will call these two computers the Windows cluster. Now you need software that will help you develop, deploy and execute applications over this cluster. This software is the core of what makes a Windows cluster possible.

Software
The Message Passing Interface (MPI) is an evolving de facto standard for supporting cluster computing based on message passing. There are several implementations of this standard. In this article, we will use MPICH, which is freely available. You can download it from here for Windows clustering, and find related documentation here. Please read Quick Start.pdf and the manual before starting the following steps.

Step 1: Download and unzip nt-mpich-1.3.0-a.zip into any folder (for example C:\NT-MPICH) and share this folder with write permission.

Step 2: Copy all files with the .dll extension from C:\NT-MPICH\lib to the folder C:\Windows\system32

Step 3: Install the Cluster Manager Service on each host you want to use for remote execution of MPI processes. For installation, start rcluma-install.bat (located in subdirectory C:\NT-MPICH\bin) by double-clicking from local or network-drive. You must have administrator rights on the hosts to install the service.

Step 4: Repeat steps 1 and 2 on each node in the cluster (we will call each computer in the cluster a node).

Step 5: Now Start RexecShell (from folder C:\NT-MPICH\bin) by double-clicking it. Open the configuration dialog by pressing F2. The distribution contains a precompiled example MPI program named cpi.exe (located in NT-MPICH/bin). Choose it as the actual program. Make sure that each host can reach cpi.exe at the specified path. Choose ch_wsock as active plug-in. Select the hosts to compute on. On the tab 'Account', enter your username, domain and password, which need to be valid on each host chosen. Press OK to confirm your selections. The Start Button (from Window RexecShell) is now enabled and can be pressed to start cpi.exe on all chosen hosts. The output will be displayed in separate windows.

Congratulations! Your supercomputer (Windows cluster) is ready to run MPI programs.


Building Linux Cluster


Linux clusters are more common, robust, efficient and cost-effective than Windows clusters. The following steps are involved in building a Linux cluster. For more information, see here.

Step 1:
Install a Linux distribution (I am using Red Hat 7.1 and working with two Linux boxes) on each computer in your cluster. During the installation process, assign a hostname and, of course, a unique IP address to each node in your cluster. Usually, one node is designated as the master node (where you'll control the cluster, write and run programs, etc.), with all the other nodes used as computational slaves. We named one of our nodes Master and the other Slave. Our cluster is private, so theoretically we could assign any valid IP address to our nodes as long as each had a unique value. We used the IP address 192.168.0.190 for the master node and 192.168.0.191 for the slave node. If you already have Linux installed on each node in your cluster, then you don't have to make changes to your IP addresses or hostnames unless you want to. Changes (if needed) can be made using Linuxconf, the network configuration program in Red Hat. Finally, create identical user accounts on each node. In our case, we created the user DevArticle on each node in our cluster. You can either create the identical user accounts during installation or use the adduser command as root.

Step 2:
Next, configure rsh on each node in your cluster.
Create .rhosts files in the user and root home directories. Our .rhosts files for the DevArticle user are as follows:

Master DevArticle
Slave DevArticle


The .rhosts files for the root user are as follows:

Master root
Slave root

Next, we created a hosts file in the /etc directory. Below is our hosts file for Master (the master node):

192.168.0.190 Master.home.net Master
127.0.0.1 localhost
192.168.0.191 Slave
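With more than a couple of nodes, typing the hosts entries by hand gets error-prone. As a rough sketch only, the following Python function renders hosts-file lines from a table of node names and addresses. The function name render_hosts is hypothetical, the home.net domain is taken from the Master entry above, and giving every node a fully qualified name is an assumption (our own file lists Slave by short name only).

```python
def render_hosts(nodes, domain="home.net"):
    """Return hosts-file text for a dict of {hostname: ip_address}.

    Always keeps the 127.0.0.1 localhost line, and refuses duplicate
    IP addresses because every node needs a unique one.
    """
    ips = list(nodes.values())
    if len(set(ips)) != len(ips):
        raise ValueError("each node must have a unique IP address")
    lines = ["127.0.0.1 localhost"]
    for hostname, ip in nodes.items():
        # Format: address, fully qualified name, short alias
        lines.append(f"{ip} {hostname}.{domain} {hostname}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    cluster = {"Master": "192.168.0.190", "Slave": "192.168.0.191"}
    print(render_hosts(cluster), end="")
```

The output is meant to be reviewed and then copied into /etc/hosts on each node by hand; the sketch deliberately does not write to any system file.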

Do not remove the 127.0.0.1 localhost line from the hosts file.

Step 3:
The hosts.allow files on each node were modified by adding ALL+ as the only line in the file. This allows anyone on any node permission to connect to any other node in our private cluster. To allow root users to use rsh, I had to add the following entries to the /etc/securetty file: rsh, rlogin, rexec, pts/0 and pts/1. Also, I modified the /etc/pam.d/rsh file as follows:
#%PAM-1.0
# For root login to succeed here with pam_securetty, "rsh" must be
# listed in /etc/securetty.
auth sufficient /lib/security/pam_nologin.so
auth optional /lib/security/pam_securetty.so
auth sufficient /lib/security/pam_env.so
auth sufficient /lib/security/pam_rhosts_auth.so
account sufficient /lib/security/pam_stack.so service=system-auth
session sufficient /lib/security/pam_stack.so service=system-auth

Step 4:
rsh, rlogin, telnet and rexec are disabled in Red Hat 7.1 by default. To change this, I navigated to the /etc/xinetd.d directory and modified each of the service files (rsh, rlogin, telnet and rexec), changing the disable = yes line to disable = no.
Once the changes were made to each file (and saved), I closed the editor and issued the following command: xinetd -restart to enable rsh, rlogin, etc.

Step 5:
Next, download the latest version of MPICH (UNIX all flavors) from here to the master node. Untar the file in either the common user directory (the identical user you established for all nodes, DevArticle on our cluster) or in the root directory (if you want to run the cluster as root). Issue the command: tar zxfv mpich.tar.gz. Change to the newly created mpich-1.2.2.3 directory. Type ./configure, and when the configuration is complete and you have a command prompt, type make.
The make may take a few minutes, depending on the speed of your master computer. Once make has finished, add the mpich-1.2.2.3/bin and mpich-1.2.2.3/util directories to your PATH in .bash_profile, or wherever you set your PATH environment variable. The full root paths for the MPICH bin and util directories on our master node are /root/mpich-1.2.2.3/util and /root/mpich-1.2.2.3/bin. For the DevArticle user on our cluster, /root is replaced with /home/DevArticle in the path statements. Log out and then log in to enable the modified PATH containing your MPICH directories.

Step 6:
Then make all the example files and the MPE graphic files. First, navigate to the mpich-1.2.2.3/examples/basic directory and type make to make all the basic example files. When this process has finished, you might as well change to the mpich-1.2.2.3/mpe/contrib directory and make some additional MPE example files, especially if you want to view graphics. Within the mpe/contrib directory, you should see several subdirectories. The one we will be interested in is the mandel directory. Change to the mandel directory, and type make to create the pmandel exec file. You are now ready to test your cluster.

Test your installation

The first program we will run is cpilog. From within the mpich-1.2.2.3/examples/basic directory, copy the cpilog exec file (if this file isn't present, run make again) to your top-level directory. On our cluster, this is either /root (if we are logged in as root) or /home/DevArticle (if we are logged in as DevArticle), since we have installed MPICH in both places. Then, from your top directory, rcp the cpilog file to each node in your cluster, placing the file in the corresponding directory on each node. For example, if I am logged in as DevArticle on the master node, I'll issue rcp cpilog Slave:/home/DevArticle to copy cpilog to the DevArticle directory on Slave. I'll do the same for each node (if there are more than two nodes). If I want to run a program as root, then I'll copy the cpilog file to the root directories of all nodes in the cluster.

Congratulations! Your supercomputer (Linux cluster) is ready to run MPI programs.

Once the files have been copied, I'll type the following from the top directory of my master node to test my cluster:

mpirun -np 1 cpilog

This will run the cpilog program on the master node to see if the program works correctly. Some MPI programs require at least two processors (-np 2), but cpilog will work with only one. The output looks like the following:

pi is approximately 3.1415926535899406,
Error is 0.0000000000001474
Process 0 is running on Master.home.net
wall clock time = 0.360909

Now try both nodes (or however many you want to try) by typing: mpirun -np 2 cpilog and you'll see

pi is approximately 3.1415926535899406,
Error is 0.0000000000001474
Process 0 is running on Master.home.net
Process 1 is running on Slave.home.net
wall clock time = 0.0611228

or something similar to this. The number following the -np parameter corresponds with the number of processors (nodes) you want to use in running your program. This number may not exceed the number of machines listed in your machines.LINUX file plus one (the master node is not listed in the machines.LINUX file).
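To get a feel for how a pi calculation like this can be shared between processes, here is a minimal Python sketch. It assumes cpilog follows the classic MPICH cpi approach of integrating 4/(1+x^2) over [0,1] with the midpoint rule, with each process taking every nprocs-th interval. The names partial_sum and estimate_pi are hypothetical, and the "processes" are simulated by an ordinary loop, not by real MPI ranks.

```python
import math


def partial_sum(rank, nprocs, n):
    """Contribution of one simulated process to the midpoint-rule integral.

    Process `rank` handles intervals rank+1, rank+1+nprocs, ... up to n,
    so the n intervals are interleaved evenly across all processes.
    """
    h = 1.0 / n
    total = 0.0
    for i in range(rank + 1, n + 1, nprocs):
        x = h * (i - 0.5)  # midpoint of interval i
        total += 4.0 / (1.0 + x * x)
    return h * total


def estimate_pi(nprocs, n=10_000):
    """Sum the partial results, standing in for an MPI reduce to rank 0."""
    return sum(partial_sum(rank, nprocs, n) for rank in range(nprocs))


if __name__ == "__main__":
    for nprocs in (1, 2, 4):
        approx = estimate_pi(nprocs)
        print(nprocs, "processes:", approx, "error:", abs(approx - math.pi))
```

The estimate is essentially the same whatever the process count; only the amount of work per process shrinks, which is where the drop in wall clock time comes from once the processes really run on separate nodes.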

To see some graphics, we must run the pmandel program. Copy the pmandel exec file (from the mpich-1.2.2.3/mpe/contrib/mandel directory) to your top-level directory and then to each node (as you did for cpilog). Then, if X isn't already running, issue a startx command. From a command console, type xhost + to allow any node to use your X display, and then set your DISPLAY variable as follows: DISPLAY=Server:0 (be sure to replace Server with the hostname of your master node). Setting the DISPLAY variable directs all graphics output to your master node. Run pmandel by typing: mpirun -np 2 pmandel.

The pmandel program requires at least two processors to run correctly. You should see the Mandelbrot set rendered on your master node.


The Mandelbrot Set Rendered on the Master Node


Adding more processors (mpirun -np 10 pmandel) should increase the rendering speed dramatically. The Mandelbrot set graphic is partitioned into small rectangles for rendering by the individual nodes. You can actually see the nodes working as the rectangles are filled in. If one node is a bit slow, then the rectangles from that node will be the last to fill in. It is fascinating to watch.
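The rectangle idea is easy to sketch in Python. The following is an illustration only and is not taken from the pmandel source: the function names, the tile size and the plane coordinates are all assumptions, and the tiles are rendered one after another in a single process, whereas on the cluster each rectangle would be handed to a node.

```python
def make_tiles(width, height, tile_w, tile_h):
    """Split a width x height image into (x, y, w, h) rectangles."""
    tiles = []
    for y in range(0, height, tile_h):
        for x in range(0, width, tile_w):
            # Edge tiles are clipped so the rectangles exactly cover the image
            tiles.append((x, y, min(tile_w, width - x), min(tile_h, height - y)))
    return tiles


def mandel_iters(c, max_iter=100):
    """Iterations of z = z*z + c before |z| exceeds 2 (max_iter if it never does)."""
    z = 0j
    for n in range(max_iter):
        if abs(z) > 2.0:
            return n
        z = z * z + c
    return max_iter


def render_tile(tile, width, height, max_iter=100):
    """Compute iteration counts for one rectangle: the unit of work per node."""
    x0, y0, w, h = tile
    rows = []
    for py in range(y0, y0 + h):
        row = []
        for px in range(x0, x0 + w):
            # Map the pixel to the region [-2, 1] x [-1.5, 1.5] of the complex plane
            c = complex(-2.0 + 3.0 * px / width, -1.5 + 3.0 * py / height)
            row.append(mandel_iters(c, max_iter))
        rows.append(row)
    return rows


if __name__ == "__main__":
    width, height = 48, 24
    for tile in make_tiles(width, height, 16, 12):
        counts = render_tile(tile, width, height)
        inside = sum(v == 100 for row in counts for v in row)
        print("tile", tile, "points inside the set:", inside)
```

Rectangles that contain many points inside the set take the longest, because those points run the full iteration count. That uneven cost is one reason the work is cut into many small rectangles instead of one big block per node.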

Conclusion:
Clustered computing has been with us for several years. It represents an attempt to solve larger problems, or to solve problems more cost-effectively, than the conventional systems of the time allow. If you are interested in learning more about cluster computing, you may start from here.



