Other features, enhancements, bug fixes, and updates
Please read on for more!
Pooch Application
Parallel OperatiOn and Control Heuristic Application
Version 1.7
Copyright © 2001-6 Dauger Research, Inc.
Code and manual written by Dean E. Dauger
Table of Contents
I. Installing Pooch
II. Your First Parallel Computation
III. Introduction
IV. Menus and Windows
    Menus
    Job Window
    Network Scan Window
    Node Info Window
    Get Files Window
    Node Registration Schedule Dialog
V. Pooch Pro
VI. AppleScript
    Standard Suite
    Pooch Suite
VII. Security
VIII. Frequently Asked Questions
IX. Revision History
X. Credits
XI. Contact Information
A. Compiling the MPIs
I. Installing Pooch
Make sure your Macintosh running OS X 10.2 is connected properly to the Internet. Almost any type of network connection (10BaseT, 100BaseT, Gigabit, Airport, or a mix) is sufficient. If this Mac is on an isolated network, manually configure the TCP/IP control panel to a unique IP address from 192.168.1.1 to 192.168.1.254. There are two ways to install Pooch:
For automatic installation, double-click the Pooch Installer:
(There is no step three. Or two.)
For manual installation:
1. Copy the Pooch folder, containing the Pooch application, to your hard drive.
2. Open the Pooch application.
So that your Mac will be available for computation if rebooted, we recommend the following:
3. From the File menu, select Set Pooch in Account Startup Items.
That's it! Your Mac is now ready for parallel computing. Repeat for additional Macs.
II. Your First Parallel Computation
Once Pooch is installed and running on each Mac of a local area network, you can use them as a parallel computer. You will need a parallel application to run. For our first test, let's use the Power Fractal app. Run it on your machine first and note its performance.
Step One - Selecting a Parallel Application
Switch to the Pooch application and select New Job from the File menu, which results in a new Job Window. Drag the Power Fractal app from the Finder to Pooch's Job Window or click Select App.
Step Two - Selecting Nodes
By default, Pooch selects your own node as participating in the parallel job. To add other nodes, click on Select Nodes in the Job Window to invoke the Network Scan Window.
Double-clicking on a node moves it to the node list of the Job Window.
Step Three - Launching the Parallel Job
Finally, click Launch Job to start your parallel job.
Pooch should now be distributing the code and starting the parallel application. Once the job runs, you should see a measurable performance increase.
Congratulations! You are now operating your first parallel computer.
Optional - Your Second Parallel Computation
1. Open version 1.1 or later of Power Fractal. Note its performance.
2. From the Parallel menu, select Automatically Launch onto Four Nodes.
The demo will now automatically ask Pooch to create a job using the four best nodes on the local network, then submit itself for parallel launch. This is an example of Computational Grid computing.
III. Introduction
What is Pooch?
Pooch serves as both the user interface to your parallel computer and the manager of your parallel computer. Cluster and grid computing are both examples of parallel computation. Pooch is the parallel computer management and user interface software that can prepare, launch, and monitor these types of parallel computing jobs.
The first role of Pooch is to automate the process of starting a parallel application. A parallel application is much like a normal application except that it is designed to run on a collection of intercommunicating computational nodes. Manually starting such an application is a tedious process. Pooch automates this process, minimizing the user's effort to operate and control a parallel computation. However, Pooch provides much more beyond this fundamental function.
Pooch provides access to a great deal of information, customization, and automation regarding the life of jobs run on the cluster and the health of the cluster. Jobs can be launched immediately or queued for a later time or when specific conditions are met. Pooch can intelligently and dynamically acquire nodes using its heuristic algorithms or using custom AppleScripts prior to launch. Statistical information about nodes can be accessed, and some node behavior can be customized. Job status is monitored, and statistics about jobs are collected and retained during and after jobs have completed. Pooch serves as launcher, scheduler, queuing system, and multisystem cluster monitor, all presented in a friendly user interface.
The first thing that makes Pooch unique from all other software of its kind on any platform is the extension of the revered Macintosh ease-of-use to parallel computing. Since 1998, the AppleSeed project at UCLA's Department of Physics has unquestionably demonstrated the practicality of leveraging the advantages of the Macintosh platform for the goals of numerically-intensive parallel computing. While others used Linux and Windows NT with a hodgepodge of command-line only, unreliable, unstable code fragments, AppleSeed provided the only known solution for parallel computing on the Mac OS. Users of software built by the AppleSeed team are not required to know the inner workings of an operating system, nor are they required to hire expert assistance in such matters. Demonstrated at UCLA and around the world time and time again for three years, that advantage has enabled college students, high school and junior high school students, and even ordinary scientists on shoestring budgets to successfully build and operate their very own parallel computer.
We have continued this success in the latest incarnation of parallel computing on the Mac. Two months after the introduction of Pooch, a fellow in Hawaii used it to build and run his own five-iMac cluster. He then informed Dauger Research of his success and that he was in the sixth grade. While achieving such strides in ease of use, the same software harnesses massively parallel hardware.
Macintosh Parallel Computing
Forming a parallel computer out of a network of computers is called cluster computing. In this case, the hardware of the parallel computer is a cluster of Macintoshes. AppleSeed provided two major software pieces essential for using such hardware for parallel computation.
The first is a communications library named MacMPI, authored by Dr. Viktor K. Decyk and introduced in 1998. MacMPI provides a means for executables on each Macintosh to communicate with one another. It is a source code wrapper library that translates the calls by the parallel code into calls directly to the Mac OS. MPI (Message-Passing Interface) is a standard application programming interface (API) of the high-performance parallel computing industry. That API is supported on almost all large parallel computers (Cray, IBM SP, Fujitsu, SGI, etc.). Using the Mac OS, MacMPI supports a commonly used subset of those MPI calls.
The second software component satisfies a different essential need of the parallel computation: initiation. To begin, MacMPI requires information about the Macintoshes designated to participate in the computation. In addition, since, when compiled, it is mixed with the object code of the executable itself, MacMPI cannot copy and start the parallel application or start itself on all the Macintoshes that will perform the computation. These needs are filled by parallel computing launching software.
Through 2000, AppleSeed provided software called the Launch Den Mother and the Launch Puppy (LDM & LP) that filled this second need. Together, MacMPI and LDM & LP are the key ingredients that allowed students and researchers around the world to develop and run their own parallel computing software on the Mac OS.
Meanwhile, Apple was introducing prereleases of Mac OS X in preparation for its final release in 2001. Mac OS X was fundamentally distinct from the original Mac OS while being the future of the Mac OS. Apple required Mac developers to port their code to Carbon, a new API, so that their code could run natively in OS X. Carbonizing MacMPI (renamed MacMPI_X) was straightforward, but the launching mechanism was another story. Most of LDM & LP's fundamental design made it impossible for a simple conversion to OS X. For example, LDM & LP's primary communications mechanisms and LDM's user interface were tied to pieces of the Mac OS not carried forward into Mac OS X.
Since porting LDM & LP to OS X would largely be a rewrite of LDM & LP, their author decided instead to simply start from scratch. Rather than a reconstruction of what came before, the new project offered the opportunity to create software that vastly superseded LDM & LP's functions, capabilities, and use of the Macintosh user interface. Utilizing three years of experience with parallel computation using Macintosh clusters at UCLA Physics, this project called for a rethinking and redesign of the user interface for a parallel computer. Hence, Pooch was born.
Using Pooch
Pooch is meant to be used in an environment where your cluster is a computational resource shared by a cooperative group, much like how a workplace printer is shared. A few scenarios are possible:
- A dedicated Macintosh cluster - Perhaps the simplest configuration, a group of eight, sixteen, or thirty-two identical Macintoshes or Xserves devoted to computation at all hours of the day. Pooch enables members of your group to submit jobs to the cluster and retrieve the output at a later time. Also, a subset of the nodes can be used for one job while a second job runs on the others. Many, including UCLA's Department of Statistics and NASA's Jet Propulsion Laboratory, have clusters like this.
- Reusing a network of individual Macs for computations - There are secretaries, scientists, artists, and professors using their own Macs for e-mail, spreadsheets, web browsing, and word processing during the day. After they leave for home, the Macs are usually idle and unused. Merely with the addition of Pooch, this network of Macs can become a computational cluster at night or on weekends.
- Combinations of dedicated and individual Macs - A few Macs, perhaps four at a time, can be added to an existing network of Macs as described in the preceding example. The groups of four devoted computational nodes can be used for debugging and diagnosing an experimental parallel code during the day, but then the entire network can be combined in the off-hours for larger (and well-tested) parallel codes. As new hardware is purchased, the compute nodes are mixed with the rest of the population as needed. This more dynamic model is in practice in the Plasma Physics group at UCLA. This setup is well suited for a circumstance where, often, the groups of four are enough for most computations, but occasionally a member of the Plasma Physics group needs to complete some large runs to meet, for example, a conference deadline. So this researcher asks the group and gets time on the entire network. This flexibility has saved the day for more than one researcher's work.
- Be creative: You may find additional scenarios useful. For example, from a PowerBook connected to the Internet via Airport, the author has initiated and controlled, in real-time, parallel jobs computed over 40 miles away. Using a network connection at the Max-Planck-Institut für extraterrestrische Physik in Garching, Germany, the distance record has increased to 6000 miles.
System Requirements
In general, Pooch needs 4 MB of free memory on a Macintosh with a valid TCP/IP connection. Pooch requires CarbonLib version 1.2 or later and Mac OS 9 or later. The latest CarbonLib is available from http://www.info.apple.com/support/downloads.html.
Because Pooch is a Carbon application, it can run on OS X too. In version 10.2 of OS X, Apple fixed significant bugs present in previous versions of OS X. Thus, we are proud to say clustering on Mac OS X 10.2 and 10.3 is fully operational and fully supported. Pooch Pro requires OS X 10.2.1 or later.
IV. Menus and Windows
Pooch Menus
File menu
- New Job - Opens a new Job Window (see the Job Window section)
- New Queue Job - Opens a new Job Window enabling the Queue Job and Automatically Acquire options.
- Save Job As - Saves the job of a Job Window to a file on disk, which can later be used for another job. New in 1.7.
- Select App and Files - Opens a get files dialog for selecting an application and files for the Job Window. If the Job Window is present, the file dialog will appear as a sheet. If the Job Window is not present, the dialog will be standalone, but if a file is selected it will be added to a newly created Job Window.
- Reveal Remote Files Folder in Finder - When Pooch receives files from another node, Pooch must hold the files in a folder on the hard drive. Invoking this item directs the Finder to open that folder where those files are being held. See Collect Remote Files. Also, this item is available by control-clicking the Pooch icon in the Dock.
- Select Nodes - Opens the Node View of the Network Scan Window (see the Network Scan Window section)
- Show Job View - Opens the Job View of the Network Scan Window (see the Network Scan Window section)
- Recent Node Lists - When the Job Window is open, this submenu lists the node lists of the most recently launched jobs. The number before the : is the number of nodes. When the node names have similarities, Pooch attempts to abbreviate the names. For example, a node list of four computers named node01, node02, node26, and node27, will be abbreviated to 4: node01, 02, 26, 27.
- Recent Apps and Files - When the Job Window is open, this submenu lists the file lists of the most recently launched jobs. The number before the : is the number of files. If one of the files is an executable that was launched, it will be listed first.
- Set Pooch in Account Startup Items - Makes (or updates) an entry for Pooch in the Login Items system preference so that Pooch will automatically start up and make this Mac available for parallel computation after a restart. This feature is provided as a convenience.
- Collect Remote Files In - When Pooch receives files from another node, it must store those files in a folder on the hard drive. Pooch can hold those files either: 1. in the folder where the Pooch App itself resides; 2. inside the Temporary Items folder, as designated by the OS; 3. in the path /tmp/pooch, which is the default case on OS X; or 4. in the path /Users/Shared/pooch/remote. Note that files in /tmp may be deleted when the machine is restarted.
- Open Pooch Log - Opens the Pooch Log in Console.app. Pooch logs events into a text file located at /Users/Shared/pooch/poochlog.txt. These entries detail steps in the launch process, the queuing system, and network access.
- Quit Pooch - Selecting this item, or pressing Command-Option-Q, will quit Pooch. Pooch is an unusual application because, while its user interface is not in use, it is meant to run at all times in the background. This mechanism will make it less likely to be quit accidentally.
- Finder to the Foreground - Selecting this item sends Finder to the foreground.
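The abbreviation described under Recent Node Lists above can be illustrated with a short Python sketch. It assumes that the shared part of the names is their longest common prefix with any trailing digits trimmed; the function name abbreviate_node_list is hypothetical, and the heuristic Pooch actually uses may differ.

```python
import os


def abbreviate_node_list(names):
    """Return a label such as '4: node01, 02, 26, 27' for a list of node names.

    Assumption: the shared part is the longest common prefix of all names,
    with trailing digits trimmed so numeric suffixes stay whole. This is an
    illustrative guess, not Pooch's documented algorithm.
    """
    if not names:
        return "0:"
    prefix = os.path.commonprefix(names).rstrip("0123456789")
    # Keep the first name intact; drop the shared prefix from the others.
    # If stripping would leave nothing, fall back to the full name.
    shortened = [names[0]] + [name[len(prefix):] or name for name in names[1:]]
    return f"{len(names)}: " + ", ".join(shortened)


print(abbreviate_node_list(["node01", "node02", "node26", "node27"]))
```

Names with nothing in common pass through unchanged, so the label then simply lists them in full after the node count.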
Edit menu
- Paste - Invoking this item in the Job Window pastes an item on the clipboard into its file pane. Launching such a job distributes the clipboard to the clipboards on the target nodes. In the Network Scan Window, this pastes into the remote network address field. Modified in 1.7.
Node menu
To use the items on the Node menu, you must select one node from either the node list of the Job Window or the list of nodes in the Node View of the Network Scan Window. Contextual menu-clicking on a single node in the Job Window or the Network Scan Window also invokes this menu.
- Get Node Info - Invokes a Node Info Window that retrieves a variety of information about a node, such as processor type, OS version, load, and free memory. It also retrieves a list of running processes (an application is one type of process) on that node.
- Get Job Queue - Retrieves the parallel jobs queued on that node. Typically, a job is forwarded to the Pooch at the job's node zero and held there until certain conditions have developed. For example, a job might start only when its designated start time has passed and the target nodes are no longer busy.
- Get Files - Opens a new window where you can retrieve files from the remote node.
- Connect to Server... - Opens an Apple File Sharing connection to that node, much like the menu item of the same name in the Finder. If that node has File Sharing on, you can enter your password to access files on its volumes. This feature is provided for your convenience.
Network menu
- Register this Node - Pooch, by default, registers its node on the network, meaning that it puts up a flag that other Pooches can see. This flag has enough information for other Pooches to identify and communicate with this particular Macintosh. Selecting Never removes this flag, making this Pooch invisible to other Pooches. Selecting On a Schedule opens a node registration schedule dialog (see below). Pooch will then register and deregister itself according to the specified schedule.
- Local Scan Includes this Node - When scanning for nodes in the local network, selecting Always will cause Pooch to always add itself to the network results, regardless of the Register this Node settings. Otherwise your node might not appear in the local network scan if your Pooch did not register itself. Selecting Always does not cause other Pooches to see your node.
- On Node Deregistration - When Pooch deregisters itself, Pooch will kill all jobs it is tracking on this node if Kill All Jobs is checked. For example, when this node deregisters itself according to the node registration dialog (see On a Schedule of the Register this Node submenu, above), Pooch will kill any remaining jobs that it launched. If this menu item is not checked, Pooch will allow jobs to continue to their own completion.
- Primary Address - This submenu specifies which active IP address Pooch registers as primary or the default address. Use System Default has Pooch follow the precedence specified by the system, usually set via the port ordering in the Network system preference. Setting this to another IP address, if available, resets Pooch's network registration of this node to the new setting. It will encourage, but not guarantee, that other Pooches and their jobs will use that address, perhaps because of its more optimal performance. Pooch will still listen on the other addresses, as not every IP address will be accessible from other nodes. New in 1.6, selecting the Import Addresses option selects a whitespace-delimited text file containing additional addresses that identify this node. This is appropriate when, for example, a firewall redirects activity from a public IP address to a private one.
- Network Scan Columns - This submenu allows you to show and hide columns in the Network Scan Window. Check and uncheck the items to toggle them. See the Network Scan Window section for descriptions. Selecting Show Columns opens a window where multiple columns can be toggled at once.
- Automatic Rescan - When the Network Scan Window is open, Pooch will automatically repeat its scans on the network at a particular interval. You can set this time interval, or shut off the automatic rescan, by selecting an item on this menu.
- Job View Access Time - This setting determines how far back in time the Network Scan Window, when in Job View, will query when compiling archived job information.
- Discover using - Since its birth, Pooch used Apple's implementation of the Service Location Protocol (SLP), an IETF standard, to perform discovery of other Pooches on a TCP/IP network. In May 2002, Apple introduced Bonjour (formerly Rendezvous), also known as ZeroConfig or mDNS, to perform discovery as well. The latest version of Pooch uses both to discover nodes on a network. For best compatibility, the default setting uses both SLP and Bonjour so that this Pooch can discover and be discovered by Pooches running on OSs that do not support Bonjour (such as OS 9). You may select Bonjour Only if you know you will only be using Macs running OS X 10.2 or later. The benefit of using only Bonjour is faster discovery, but its drawback is not being able to locate Macs running older versions of the Mac OS.
- Access Nodes - Normally Pooch contacts other nodes directly to inquire about their status. A node accessed using the Remote Node Scan feature could return the addresses of nodes behind a firewall or other network barriers. Selecting via Proxy will use the proxy connection provided by the first remote node to forward data between your local machine and those behind the firewall. New in 1.6 and available only on OS X 10.4 Tiger, Pooch can access an Xgrid cluster. To access Xgrid, enter the host address of the Xgrid controller and its password using the via Xgrid item of this submenu. If no password is required, leave that field blank. Pooch spawns jobs containing a piece of itself through Xgrid, which Pooch can then discover and access.
- Remote Network Job Access - Normally Pooch on other nodes contacts yours directly to forward jobs and files. For example, a node accessed using the Remote Node Scan feature might not be able to return files to your nodes because of firewalls or other network barriers. This feature permits the Network Scan Window to pull these files onto your machine. Typically this is used in combination with the via Proxy setting of the previous item. None turns off this feature, Only My Jobs accesses only jobs that resulted from a job from this node and user, typically a job triggered by the Retrieve Output job option (see Options of the Job Window), while All Jobs will pull all jobs destined for this node. If the name and IP address of your node changes, then this feature might not successfully locate the jobs. New in 1.7.
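The Import Addresses option of the Primary Address submenu above reads a whitespace-delimited text file of addresses. As a rough illustration of that file format only, the Python sketch below splits such text on any whitespace and separates valid IP address literals from other tokens. The function name parse_address_list and the sample addresses are hypothetical, and how Pooch itself treats entries that are not IP literals is not stated here.

```python
import ipaddress


def parse_address_list(text):
    """Split whitespace-delimited text into (ip_addresses, other_tokens).

    Spaces, tabs, and newlines all count as delimiters. Tokens that are
    not valid IPv4 or IPv6 literals are returned separately rather than
    dropped, since this sketch makes no assumption about what they mean.
    """
    addresses, others = [], []
    for token in text.split():
        try:
            addresses.append(ipaddress.ip_address(token))
        except ValueError:
            others.append(token)
    return addresses, others


# Hypothetical file contents: a private address, two others, and a host name.
sample = "192.168.1.10 10.0.0.5\n203.0.113.7\tnode01.example.com"
addrs, rest = parse_address_list(sample)
print([str(a) for a in addrs])
print(rest)
```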
Settings menu
- New Job Automatically Reloads - When the Job Window is created, it will remember information, such as the file list (including the application), the node list, and various job options, from the last parallel job submitted. This submenu toggles what portions of the last job it will retain and what it will discard. Note: some of this information may become stale, such as when you move the app on your hard drive or if a node has a dynamic IP address (which could happen if your network uses DHCP). Extended in 1.7. Selecting On Job Submit triggers a new job to invoke as soon as the previous job is submitted to the launch queue. New in 1.7.
- Remember Recent Node Lists - This setting determines how many node lists will be displayed in the Recent Node Lists submenu of the File menu.
- Remember Recent Apps and Files - This setting determines how many lists of apps and files will be displayed in the Recent Apps and Files submenu of the File menu.
- Job Port Number Range - When Pooch launches a job it pseudorandomly assigns a primary TCP port number to the job for its communications. Normally Pooch chooses from 16384 up to 49152. You may specify a different range of port numbers here, say, for compatibility with a firewall. The Use Port Number option in the Job Window overrides this setting.
- Launch Unix Jobs - By selecting in the Background, Unix-based executables (such as Mach-O executables and Unix scripts) are launched in the background, with output directed to a stdout.txt file. Selecting using Terminal directs this Pooch to launch Unix-based executables in a window of the Terminal app, allowing for user input. This setting is local to this node: the launch on the remote nodes will use Terminal instead of background launching if using Terminal is checked on those nodes as well.
- Restrict/Allow Access to Local Processes - When this item is set to restrict, Pooch will block other nodes from seeing process IDs of processes not launched by Pooch. This prevents someone else from using the Node Info window to kill, say, your word processor. Use this item only if you wish to restrict control of processes on this node by other Pooches.
- Accept Pooch Commands from - When set to Local Subnet, Pooch will check the IP address of incoming connections against its local subnet. If they do not match, the connection will be rejected and this Pooch's listener port will reset. Selecting Import Addresses allows you to select a whitespace-delimited text file containing a list of IP addresses. Pooch will compare the addresses of incoming connections against this list, including the resolution of DNS data. Setting to Any Address removes this restriction. Use this item only if you do not want to control this node from other subnets of the Internet.
- Reject/Accept Remote Preferences Command - When set to accept, other Pooches will be allowed to overwrite this Pooch's preferences and settings. For example, the node registration schedule may be modified remotely. When set to reject, other Pooches will not be allowed to modify settings of this Pooch.
- Disable/Enable Pooch at Logout - Selecting this item will direct Pooch to run while nobody is logged in. This feature makes it possible to use unutilized, logged-out machines for parallel applications. Pooch automatically quits when someone logs in and will return when the user logs out or if they run Pooch. Toggling this item may require administrative authorization. The words (To Be Enabled On Restart) will be appended if it is necessary to restart the machine to fully enable this feature. Updated in 1.7.
- Disable/Enable Verbose Error Reporting - Reports internal errors, such as network errors, in notification windows. The information reported is most useful when diagnosing difficult-to-find problems during Pooch operation. It also reports a problem when Pooch encounters AppleScript objects or commands that it cannot recognize.
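The three choices of the Accept Pooch Commands from item above amount to a filter on the address of each incoming connection. The Python sketch below shows one way such a filter could work. It is an illustration under stated assumptions, not Pooch's implementation: the function name accept_connection, the mode strings, and the sample addresses are hypothetical, the local subnet is given explicitly in address/prefix form, and the DNS resolution mentioned for imported lists is omitted.

```python
import ipaddress


def accept_connection(peer, mode, local_interface=None, imported=()):
    """Decide whether a connection from `peer` should be accepted.

    mode is one of (hypothetical names for the three menu choices):
      "local_subnet" - peer must lie in the subnet of local_interface,
                       given like "192.168.1.20/24"
      "imported"     - peer must appear in the imported address list
      "any"          - no restriction
    """
    peer_ip = ipaddress.ip_address(peer)
    if mode == "any":
        return True
    if mode == "local_subnet":
        # strict=False lets us pass the host address plus prefix length.
        network = ipaddress.ip_network(local_interface, strict=False)
        return peer_ip in network
    if mode == "imported":
        return peer_ip in {ipaddress.ip_address(a) for a in imported}
    raise ValueError(f"unknown mode: {mode!r}")


print(accept_connection("192.168.1.77", "local_subnet", local_interface="192.168.1.20/24"))
print(accept_connection("10.0.0.9", "local_subnet", local_interface="192.168.1.20/24"))
print(accept_connection("10.0.0.9", "imported", imported=["10.0.0.9", "10.0.0.10"]))
```

With these sample values, the first and third connections pass the filter and the second, which comes from outside the 192.168.1.0/24 subnet, does not.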
Pooch menu
- About Pooch - Displays the About Box of Pooch, which also can contain status messages. This window normally opens briefly, then closes on its own, at startup. Clicking on the window closes it.
- Preferences - Displays the Pooch Preferences window. This window displays all of the preferences available in Pooch. These settings are divided into three categories: General, Nodes, and Jobs. Updated in 1.7.
General Preference Pane
Clicking on the icons in the toolbar at the top of the window switches the Preference Window to the other panes. It is normal for this window to disappear when Pooch is in the background. For backwards compatibility, the Settings Menu is shown by default.
Remote Head Node - This preference modifies and coordinates four elements of Pooch's behavior to support launching jobs into and retrieving files from a cluster with a designated head node. If this feature is enabled, the default Network Scan window scans the head node's network instead of the Local network, the Forward via job option chooses the head node, the Retrieve Output and Queue Job options are enabled, the Access Nodes via Proxy setting is enabled, and the Automatically Acquire Nodes job option automatically selects the head node as the starting node. New head node IP addresses can be entered via the address field of the Network Scan window. New in 1.7.
Nodes Preference Pane
Remote Network Job Access - Normally Pooch on other nodes contacts yours directly to forward jobs and files. For example, a node accessed using the Remote Node Scan feature might not be able to return files to your nodes because of firewalls or other network barriers. This feature permits the Network Scan Window to pull these files onto your machine. Typically this is used in combination with the via Proxy setting of the previous item. None turns off this feature, Only My Jobs accesses only jobs that resulted from a job from this node and user, typically a job triggered by the Retrieve Output job option (see Options of the Job Window), while All Jobs will pull all jobs destined for this node. If the name and IP address of your node changes, then this feature might not successfully locate the jobs. New in 1.7.
Jobs Preference Pane
Selecting On Job Submit of the New Job Automatically Reloads menu triggers a new job to invoke as soon as the previous job is submitted to the launch queue. New in 1.7.
Pooch Windows
Job Window
The Job Window is where all parallel jobs are prepared for launching, scheduling, or queuing. A job requires information about the parallel application you want executed and what nodes will run this application. This window is resizable so it can display large amounts of information.
- File List - The left pane of this window displays the list of files in this job. This list can include an application, which, if present, is placed at the top of the list. Additional applications dragged here will be added to the end of the list, but they will not be launched by Pooch. Removing the first application in this list will cause the next application, if present, to shift to the top of the list. Additional files, which some parallel apps use for input data, may be added here. Pooch will distribute these files to the nodes on launch. Pooch is also capable of selecting and copying folder structures to other nodes. The folder tree will be scanned recursively. Files inside these folders will be copied as well, but only those files present at the moment of launch. Aliases will not be resolved. You may use the Select App button to invoke a dialog sheet to select an application to execute and files to distribute, or you may drag them from Finder to this pane if you prefer. Use the Remove All button to clear the file list, and select a file or files and click the Remove button to remove individual files.
- Node List - The right pane displays the list of nodes for this job. Each node you select is assigned to run an MPI task or tasks. Each task is given a unique identification number. The number(s) on the icon to the left of the node's name indicates which task(s) that node will be assigned. (See the Tasks per Computer job option, below.) This information will be relayed to MacMPI. The parallel code can retrieve this information from MacMPI and use it to partition its problem among the processors. Experimenting with the order of these nodes or the number of tasks can be useful for debugging. Clicking on Select Nodes invokes the Network Scan Window. You can add nodes to the node list by selecting them in the Network Scan Window or dragging them from that window to this one. Remove All clears the node list, and selecting a node or nodes and clicking Remove removes individual nodes from the list. The limit on the number of nodes is determined when this Pooch is registered. Note: Pooch mandates that your Mac is node zero if it is participating in the computation.
- Job Options - In OS X 10.3 and later, clicking the Options button produces a drawer presenting options for this job. Holding down the Options pop-up creates a menu containing the same options. Only the menu is available on previous OS versions.
- Place in Subdirectory - When sending files to another Mac, Pooch by default places all apps and files within the same folder in which Pooch resides. Over the course of many jobs, the mixture of files can become confusing. Specifying a name here asks Pooch to create a subdirectory inside that Pooch's folder and place all apps and files in that new subdirectory. (Pooch does not write outside its folder during the launch process.) For the purposes of organization, you can name these subdirectories based on the job name, or the date, or the name of the user who submitted the job.
- Start Job at Time - By default, when you submit a job, Pooch will start the job on the specified nodes immediately. This option, however, allows you to have Pooch place the job in the job queue and delay its start until a particular future time, any minute, any day. The job will be held in the queue on the node designated zero in the node list. (So, if your Mac is not node zero, it will be forwarded to that node.) The following shows the dialog that appears when selecting a start time:
- Tasks per Computer - Many of the recent Power Macintoshes have more than one processor each, so it would be appropriate to be able to allocate one parallel processing task per processor, enabling parallel computing inside boxes as well as between boxes. By default, this option is set to do just that. You may override this setting by selecting a specific number of tasks per processor. Increasing the number of tasks beyond the number of nodes will probably be less efficient, but this possibility is useful for debugging parallel code. Because multitasking is significantly better in OS X than in OS 9, we recommend launching multiple tasks per computer only on OS X. (By default, Pooch will launch only one task per computer running OS 9.) New in 1.7, control-clicking on the node icon in the node list will override this setting, allowing you to select different task counts for each node.
- Job Type - This option specifies the type of parallel launch needed for this parallel job. MacMPI, mpich, MPI-Pro, mpich-gm, and LAM/MPI need information about the cluster configuration, but they each require significantly different data formats and launch procedures. The MacMPI launch type is the default. It is generally the case that only Unix-based executables can accept information the way that the other MPIs require it, so Carbon, Cocoa, and most other types of apps will usually follow the MacMPI convention. See below for the Grid Job Type.
- Command-Line Arguments - This option specifies command-line arguments to add when launching Unix-style Mach-O executables. Only some Unix-based executables require input information using such options (such as -a -e iou), and it is usually preferable to supply such input information via input files for portability. This item will be disabled when the file list contains other executable types.
- Use Port Number - In order to use TCP/IP, an MPI needs to use a particular TCP port number to initialize its network connections. Under normal circumstances, Pooch generates a pseudorandom port number based on the job name and the time of day and supplies this number to the MPI. You can use this job option to override that random value and set a particular port number. This feature can be useful to comply with firewalls that only have specific ports open. However, the TCP convention requires that the same port number not be used again for approximately two minutes. Some TCP implementations follow this convention, and some do not, so your network could appear to be blocked if you launch a second job using the same port number too soon.
- Forward via - Normally Pooch attempts to contact all nodes directly when launching jobs and distributing data. In some configurations, nodes of a cluster could be protected by a firewall or a machine acting as a firewall, such as a head node, which prevents this kind of access. This feature allows you to forward the job to an in-between node, which acts as a proxy. The job will be forwarded to the proxy node, which then forwards it to node 0 in your node list. Usually such proxy nodes should be chosen from the ones used for the Remote Node Scan. (New in 1.5.5)
- Retrieve Output - Turning this setting on directs Pooch to automatically retrieve files created and modified by a job after Pooch detects that the job has ended. Pooch checks for this condition once per minute, and the queuing system also operates on a one-minute cycle. This feature can be set to only send back the files generated at node 0 or on all the nodes. Pooch will make a best effort to place files in the directory on the node where the executable originated. If that node is not available for some reason, the forwarding process will be queued until it is able to complete the task. As needed, the files will be forwarded through the proxy node specified in the Forward via feature, above. New in 1.6, the retrieved files are accompanied by a .joblog text file containing useful information about the job's execution.
- Bark - Pooch will not bark when launching a job if this item is unchecked.
- Don't Use Busy Nodes - Normally, Pooch checks if the nodes on the node list are already running a job submitted using Pooch. If any of the nodes are running such a job, Pooch will not launch the new job. This feature is useful because a parallel job usually runs best if there are no other jobs running. Unchecking this item disables this feature.
- Queue this Job - If this option is enabled, Pooch will save this job into a job queue and launch it as soon as possible. If the job launch process fails, which can happen if the nodes are busy or inaccessible, the job will be saved back into the job queue of the node from which the launch was attempted. Pooch will check the queue once per minute. (See the Node Info Window or the Job View of the Network Scan Window to access this queue.) Otherwise, Pooch discards the job after the first attempt. If you are using the Start Job at Time feature, you may want to consider checking this flag in case a different job is already running on the same nodes.
- Automatically Acquire Nodes - If this option is enabled, the Job Window will replace the node list with options to automatically acquire the nodes needed to run this job. Pooch will attempt to fulfill the request specified in the acquire controls just prior to launch. Used in conjunction with the Start Job at Time option, the acquisition will be scheduled for that specified time. Used in conjunction with the Queue this Job option, the acquisition will be attempted again if the request was not satisfied.
- Launching, Scheduling, or Queuing a parallel job - Clicking on Launch/Schedule/Queue Job will ask Pooch to launch, schedule, or queue, depending on the Start Job at Time and Queue this Job options, a parallel application onto the nodes according to the information given in this window. If your node is participating in the parallel computation, your machine's Pooch will direct the launch process or hold the job in its queue, depending on the job and the circumstances. If your machine is not participating in the computation, the job will be forwarded to the Pooch at node zero of the computation, and that Pooch becomes responsible for the job. If the job has not yet launched, you may inquire about this job's status by accessing the job queue of the job's node zero or using Job View in the Network Scan Window. After the job successfully launches, the nodes running the parallel job will report BUSY and the job will report Running in the Status column of the Network Scan Window until the job ends. Getting Info about a node will return additional details about the computation's use of that node.
- Recent Items Drawer - Clicking on the recent button at the bottom of the Job Window will open a new drawer containing recent items. You can use this feature to recall recently used executables, files, or nodes.
The Recent File Lists folder contains the file lists of previously submitted jobs, each of which can be opened to access specific files. The Recent Node Lists folder contains the node lists of those previously submitted jobs, which can also be opened to recall specific nodes. The Remote Scan List displays the cached IP addresses used in the Remote Node Scan feature of the Network Scan Window. For your convenience, an icon at the top of this drawer contains the name and IP address information of your node. This drawer is available only in OS X 10.2 or later.
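The Node List and Tasks per Computer descriptions above say that every task gets a unique identification number and that each node may carry one or more tasks. The following Python sketch illustrates one way such a numbering could work. It assumes that task numbers are contiguous and follow the order of the node list, which the text does not state explicitly; the function name and node names are hypothetical and are not part of Pooch or MacMPI.

```python
def assign_tasks(nodes, tasks_per_node):
    """Return {node name: [task ids]}.

    Assumption (not confirmed by the manual): task ids are contiguous
    and are handed out in node-list order, starting at zero.
    """
    assignment = {}
    next_task = 0
    for name, count in zip(nodes, tasks_per_node):
        assignment[name] = list(range(next_task, next_task + count))
        next_task += count
    return assignment


# Hypothetical node list: two dual-processor Macs and one single-processor Mac.
nodes = ["my-mac", "lab-g5", "lab-g4"]
tasks = [2, 2, 1]
plan = assign_tasks(nodes, tasks)
total_tasks = sum(len(ids) for ids in plan.values())
```

Under this assumption, reordering the node list or changing a node's task count changes which task numbers land on which machine, which is why experimenting with either can help when debugging.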
Variations of the Job window
The Job window offers two variations from the node list. The first is the ability to acquire nodes automatically from the cluster at launch time. This feature allows the executable to be launched on nodes as they become available. New in 1.6, the second is the ability to distribute a single-processor executable on a cluster, giving each instance of the executable a different integer. This feature gives you the ability to explore a parameter space using an executable not otherwise designed for parallel execution.
Automatically Acquire Nodes
- Acquire nodes or processors - When the Automatically Acquire Nodes option is enabled, selecting from these pop-up menus specifies how many nodes or processors Pooch will attempt to acquire just before launching the job. If available, nodes of different processor counts could be combined into a parallel job. Options such as Tasks per Computer do apply. Pooch will add as many available local nodes as it needs to satisfy this request.
- Starting Node of the acquisition - When the Automatically Acquire Nodes option is enabled, the acquisition is performed by the node selected here. By default, the local node is selected. If another node is selected, the job will be forwarded to that node prior to launch. This pop-up menu is populated with the nodes previously used for the remote network scan feature of the Network Scan window. Adding a node from the Network Scan window to the Job window also sets this acquire setting.
- Acquire Options
- When the Automatically Acquire Nodes option is enabled, the acquisition will include the starting node if the Include Starting Node item is checked. Otherwise, the starting node may or may not be included. The acquisition depends on the number, status, and rating of the nodes on the network. The starting node can be selected if it helps satisfy the acquisition request. If the Count Is Required item is checked, the processor or node acquisition request becomes a minimum requirement, without which the job will not be launched and, if Queue this Job is selected, will be queued for a later attempt. In that case, the acquisition attempt will be repeated until these requirements are satisfied.
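The interaction of Include Starting Node and Count Is Required described above can be sketched as follows. This is only an illustration of the stated rules, not Pooch's actual acquisition code: the function name and its parameters are hypothetical, and the list of available nodes is assumed to be already ordered from most to least desirable.

```python
def acquire(available, wanted, starting_node,
            include_starting=True, count_required=False):
    """Return the nodes to launch on, or None if the job should wait.

    `available` is assumed to be sorted best-first and may or may not
    contain the starting node.
    """
    chosen = [starting_node] if include_starting else []
    for node in available:
        if len(chosen) >= wanted:
            break
        if node not in chosen:
            chosen.append(node)
    # With Count Is Required, a shortfall means no launch; a queued job
    # would simply be retried later.
    if count_required and len(chosen) < wanted:
        return None
    return chosen


short = acquire(["a", "b"], 3, "me", count_required=True, include_starting=False)
relaxed = acquire(["a", "b"], 3, "me")
without_start = acquire(["a", "b", "c"], 2, "me", include_starting=False)
```

In this sketch a None result corresponds to the case where the job is not launched and, if Queue this Job is selected, waits for a later attempt.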
Grid Job Type
New in 1.6, the Grid job type allows you to distribute single-processor tasks on a cluster automatically, exercising a subset of parallel computing called distributed computing, well known from brute-force key breaking and SETI@Home. This feature implicitly activates the Queue this Job, Automatically Acquire Nodes, and Retrieve Output options because its operation requires these capabilities. Launching this job creates subfolders, with names beginning with "case", where the original executable resides, each numbered with the integer assigned to that job. The output of the executables will be returned to these folders. Use the Subdirectory job option to customize these folder names.
- Distribute From: To: - When the Grid job type is selected, these entries specify the range of integers Pooch will use for each instance of execution. The Job in the above figure shows that it will launch cases using every integer from 1 to 100, inclusive. Pooch will create a text file named input in the same directory as the executable containing that integer. By default, Pooch will also append that integer to the command line of the executable. To change that setting, use the Command-Line Arguments option. The %index argument will be replaced with the integer for that case.
- Step - When the Grid job type is selected, this pop-up menu specifies the step between cases. Pooch will start with the From entry and increase the integer by the Step selection. So, if 7 were selected above, Pooch will launch the executable with the integers 1, 8, 15, 22, and so on until case 99.
- Starting Node - Similar to the Starting Node selection of the Automatically Acquire Nodes option, the acquisition of nodes will be performed by the node selected here.
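The From, To, and Step entries above define an arithmetic sequence of case numbers, and the %index argument stands in for the current case. The Python sketch below reproduces that arithmetic for the example in the text (1 to 100 in steps of 7). The function names and the -n flag are hypothetical, chosen only for illustration; how Pooch performs the substitution internally is not described in this manual.

```python
def grid_cases(first, last, step=1):
    """List the case integers from `first` to `last`, inclusive."""
    return list(range(first, last + 1, step))


def expand_arguments(template, index):
    """Replace every %index in a command-line template with the case integer."""
    return template.replace("%index", str(index))


cases = grid_cases(1, 100, 7)
# "-n" is a made-up flag for a hypothetical executable.
command_line = expand_arguments("-n %index", cases[1])
```

With a step of 7, the last case is 99 because the next value, 106, would exceed the To entry of 100.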
Network Scan Window
Pooch's Network Scan window features three different views, toggled via the segmented view control in its upper left corner.
The Node View displays the nodes discovered on the network, along with diagnostics and statistics about these nodes. This view is used to access nodes, query their status and health, and select them for parallel computational jobs.
The Job View retrieves from those nodes data about jobs that are queued, launching, running, or terminated. This view compiles data from these sources to display the condition and other statistics about parallel computing jobs.
New in 1.6, the Network view displays nodes in saved node lists, as well as remotely scanned networks, in one large view. Each network of nodes can be revealed by clicking on the node list's or network's disclosure triangle.
Node View
When this window is in Node View, Pooch scans the network for available nodes and displays its results here. Once Pooch has finished scanning, you can use this window to select nodes for your computation. Double-clicking on a node moves it to the node list of the Job Window, which will open if not already present. Clicking the Add button while a node is selected will have the same effect. Nodes also present in the Job Window will appear in gray in the Network Scan Window.
Selecting a number from the pop-up menu of Add Best moves that many nodes that Pooch considers best to the Job Window. The selection excludes nodes that are already on the Job Window's node list. The rating system Pooch uses to rank the nodes is described fully in the Network Scan Columns section below. This mechanism is designed to be more likely to select nodes that perform better than others and be less likely to select nodes that are busy or heavily loaded. Selecting the highest number from this menu selects all nodes that Pooch can communicate with and are not busy with a parallel job.
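The Add Best selection can be pictured as a filter followed by a sort on the Rating column. The sketch below is an assumption-laden illustration, not Pooch's code: the function name and the dictionary layout are hypothetical, and it relies only on two facts from this manual, namely that nodes already in the Job Window are excluded and that busy or unreachable nodes carry a rating of zero.

```python
def add_best(scanned, already_listed, how_many):
    """Pick up to `how_many` node names with the highest nonzero rating."""
    candidates = [
        node for node in scanned
        if node["name"] not in already_listed and node["rating"] > 0
    ]
    candidates.sort(key=lambda node: node["rating"], reverse=True)
    return [node["name"] for node in candidates[:how_many]]


# Hypothetical scan results; a rating of 0.0 marks a busy or unreachable node.
scanned = [
    {"name": "g5-a", "rating": 4.2},
    {"name": "g4-b", "rating": 2.9},
    {"name": "g5-c", "rating": 0.0},
    {"name": "g4-d", "rating": 3.5},
]
best_two = add_best(scanned, {"g5-a"}, 2)
everything = add_best(scanned, set(), len(scanned))
```

Asking for as many nodes as were scanned returns every usable node, mirroring the behavior of the highest number in the menu.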
Normally, the Network Scan Window will scan the network again according to the Automatic Rescan specified in the Network menu. The Refresh button directs Pooch to rescan and refresh the list immediately. This window is resizable for large amounts of information.
- After selecting one node, you can use this pop-up button to get information about that node. The Get Node Info and Get Job Queue items invoke a Node Info Window. The Get Files item invokes a Finder-like window from which you can drag files to the Finder.
- The Search box at the top right of the Network Scan window accepts a search string to limit the displayed data. It compares the search string against the node and job data available in the list below and displays only those items that match. For example, if you type G5, it will show nodes that use a G5 processor. If you type dual, only those with two processors will appear. Other text will be compared against the names and other data to reveal appropriate matches. As with any search using text, overly specific search strings will yield little or no data, and overly general strings will produce excess. New in 1.6.
- The address box at the top middle of the window provides access to remote clusters of nodes, something like the address box in a web browser. (Revised in 1.6.) Pooch normally can only directly scan within its own local network neighborhood. This is the default scanning process. You may always return Pooch to this mode by selecting Local Network from this pop-up menu or typing Local in the box. Technically, unless your network administrators have set up your routers to support IP multicasting, this means that Pooch can only see other nodes in the same IP subnet. (This limitation is inherited from the Internet-standard Service Location Protocol implemented in Apple's Network Services Location Manager and Apple's Bonjour implementation.)
Entering an IP address in this box allows you to circumvent this limitation. Enter the address, in DNS (e.g., mymac.mydomain.com) or dotted quad (e.g., 192.1.2.34) form, of a Mac you know is running Pooch. Your local Pooch will then ask the far away Pooch to do its local scan and relay the results. Essentially, the Pooch on your Mac borrows the nose of the other Pooch. This feature allows you to scan for nodes on any other accessible subnet. This is useful for circumstances where, say, you want to access a chem.ucla.edu Mac from a physics.ucla.edu Mac. Selecting Local Network returns the network search to your local network.
Once you enter one address, Pooch will remember that address and make it available on the menu of the pop-up button for future use. As you accumulate more addresses, Pooch will re-sort the list in order of most recently accessed first.
The functionality of this feature extends well beyond just one domain, however. From your Mac, you may use this to access nodes on any subnet anywhere on the Internet. This means you can even access your cluster running Pooch from your home dial-up connection.
This feature is very powerful. With this power comes responsibility. There are security features in Pooch that will make unauthorized access without Pooch extraordinarily unlikely. These issues are discussed in the Security section.
Job View
When the Network Scan window is in Job View, it scans the network for nodes and accesses data, if present, about queued, launching, running, terminated, and aborted jobs and compiles that job data for display here. The root level of the list shows all the jobs, named according to their executable, found on the network. The job icon for each job is color coded as follows:
Icon Color        Meaning
white             A normal parallel computing job
blue              A queued job (on ice)
yellow            The job is being launched
green             The job is running
gray              The job has ended
white X on black  The job was aborted
Opening the disclosure triangle displays the nodes that will be included in, are being used for, or were included in that job, depending on the state of the job. If possible, the information about a particular node will be queried directly, otherwise the data is derived from the data associated with the job.
This view correlates and compiles this data from all the nodes on the network. A variety of otherwise independent jobs may be accessed as far back as the Job View Access Time preference will permit. Since nodes used in one job can be reused for another one later, nodes will naturally appear more than once in multiple jobs. Although information such as the job ID, number of nodes and files, origin, and submission time should be consistent between nodes of a job, the process ID, start time, and end time could be different. If certain nodes of a job are not available, data about a node will be derived from the job data, if possible, residing on the other nodes. The Last Access and Determined Via columns are easy ways to tell whether or not the data are fresh.
Network View
In the Network View, the window organizes nodes into saved node lists followed by remote networks. Opening the disclosure triangle of a node list displays the nodes it contains and includes those nodes for access. Opening a remote network triggers Pooch to access the remote node and query the nodes in its local area network, just as when using the address box in the Node View. The difference is that you can have multiple networks open and accessible at once, and multiple saved node lists open as well. You may mix and match nodes from these multiple networks when selecting nodes for your parallel computing job. New in 1.6.
Network Pane
When the Network Scan window is initially opened, a pane splitter appears on the far left side of the window. Dragging that to the right opens a network pane, listing both saved node lists and remote networks. The remote networks shown are the same ones that were entered via the address box earlier, and they are listed in the same order, with the Local Network listed first. This pane gives another way to quickly select and scan a remote network, much like the left pane in OS X's Finder windows.
Clicking on the plus sign below the Network Pane creates a new saved node list. You may think of these as iTunes playlists, but for nodes. You may drag nodes from any network to these saved node lists, creating arbitrary groups of nodes useful for different kinds of parallel computing jobs. For example, you may have a group that you know is only available at night, or another set of nodes that are all G5s or dual-processor machines, but are on disparate, mixed networks or clusters. This gives you a way to group commonly used nodes together, allowing you to conveniently view, scan, and monitor these nodes as if they were together and include them in a new job in the Job window. Double-click a node list to scan its nodes. When viewing a saved node list, select a node and press the delete key to remove it from that list. The Network Pane and saved node lists are new in 1.6.
Network Scan Columns
The Network Scan Window displays information about each node Pooch queries. This information is shown in a spreadsheet-like format. This window is invoked using the columns button at the top of the Network Scan Window or the Show Columns item of the Network Scan Columns submenu of the Network menu. The columns can also be added or removed using the Network Scan Columns submenu of the Network menu. You can drag a column at its header and resize it by clicking on its right edge. Clicking on a column header re-sorts the list according to that column.
Sometimes, if the node is running a computationally intensive task, the Pooch on that node may not be able to respond immediately, delaying the appearance of this information. In that case, it probably should not be used for a new job.
Column Function
IP Address the dotted quad Internet Protocol address of this node
Machine Type the machine type, based on its ROMs
CPU Type the processor type (e.g., G3 or G4), including number of processors
Clock Speed the processor clock cycle speed in millions of cycles per second
OS Version the version of the operating system (e.g., 9.0.4 or 10.0)
Load the estimated load on the system in %. On OS 9, this item uses the amount of CPU time, in sixtieths of a second, each application has consumed according to the Process Manager approximately every ten seconds. On OS X, this item is calculated using the results of a call to the Unix ps command. Since the resolution is rather low and the process of measurement can influence the measurement (like Quantum Mechanics), this number can be somewhat inaccurate. It is intended, if the number is measurably nonzero, to hint that this node is probably in use and should be avoided.
Free Memory the amount of contiguous free RAM available on this machine, which is the largest available space for a newly launched application. Updated in 1.7 to report more than 2 GB.
Free Disk Space the amount of free space on the drive on which this node's Pooch resides
Achieved Performance measured achieved single-precision floating-point performance according to a benchmark based on code from the Power Fractal app. Automatically recognizes the Velocity Engine and SSE, and updated once per day. (not multiprocessor aware)
Peak Single-Precision measured single-precision floating-point performance of the FPU of this processor based on an IBM benchmark. Updated once per day. (not multiprocessor aware)
Peak Double-Precision measured double-precision floating-point performance of the FPU of this processor based on an IBM benchmark. Updated once per day. (not multiprocessor aware)
Peak Vector Float measured single-precision vector floating-point (vector float or vFloat) performance of the vector FPU (in the Velocity Engine or SSE) of this processor based on an IBM benchmark. Updated once per day. (not multiprocessor aware)
User Idle Time the time that has elapsed since Pooch last detected interaction from the user, such as moving the mouse or other activity
Rating a nonnegative real number describing Pooch's rating of that node. This number is a function of the information Pooch has collected. If the node is busy, or Pooch has not successfully communicated with the node, the rating is zero. Otherwise, it is related to the above properties according to a heuristic function of the following normalized quantities:
Achieved Performance divided by 300 MFlops, the load (from 0 to 1), Free Memory divided by 32 MB, Free Disk Space divided by 16 MB, and User Idle Time divided by 15 minutes. The rating is designed to be very low when the node is undesirable to use, such as when it is heavily loaded or is running out of memory or disk space. Note that, assuming load and memory space are non-issues, the rating is approximately logarithmic with processor performance. Also, many of these variables can easily change at any moment, causing the rating to fluctuate. This rating function may change in future versions of Pooch. (To create and use your own rating, use the AppleScript interface described in Chapter VI.)
Jobs Queued the number of jobs that are queued on that particular node
Current Job Name if the node is running a job, the name of the first parallel executable it is running
Last Access the time and date this data was accessed, according to that node's clock
Pooch Version the version of Pooch that node is running
Job ID the job identification number assigned to that job
Node Count the number of nodes that job will use, is using, or had used
File Count the number of files supplied for that job, as submitted to Pooch
Process ID the process identification number of a particular node's instance of the executable of a job when it is running or a job when it was running
Submission Time the time and date this job was originally submitted
Start Time the time and date a particular node's instance of the executable of a job was first determined to be running
End Time the time and date a particular node's instance of the executable of a job was first determined to be ended
Elapsed Time the amount of time a particular node's instance of the executable of a job took that node
Job Origin the node that was the origin of the job
Determined Via the node that provided the information about this job or node
Finally, the Status column has a number of different possible displays.
Status Reading Meaning
attempting to connect to this node
a connection was established to Pooch at this node and security authorization is being approved. If the node is under heavy load, Pooch might not be getting enough CPU time to respond. On OS 9, writing your parallel app to give more background time (for example, using the checkesc function in MacMPI) would help.
access to Pooch at this node is in progress
access to Pooch at this node is almost done
Okay the Pooch at this node was successfully accessed and is not busy
BUSY the Pooch at this node was successfully accessed and is busy running an application launched by Pooch
(breathing) the Pooch at this node is being used for remote neighborhood access and is returning addresses of and/or accessing other nodes
retrieving job data
retrieving file and folder data
exchanging data or commands
Queued this job was successfully queued and is awaiting launch
Launching Pooch is presently attempting to launch this job
Running this job was successfully launched and is running
Ended Pooch has determined that this job terminated
Aborted Pooch was used to kill this job and it is no longer running
Node Info Window
Selecting one node in the Job window or the Network Scan window allows you to access that node's information. Much of the information on the left is also available in the columns of the Network Scan Window, with a few additions:
Item Function
DNS Name the IP name of this node according to the DNS network, if available
Total Memory the total memory available, including virtual memory (Updated 1.7)
The right side shows a list of running processes each with an approximate distribution of processor load. The determination of this information is OS specific. On OS 9, Pooch derives this information from sampling the Process Manager every ten seconds. On OS X, Pooch calls the Unix ps command, whose results can change at any moment and can be adversely influenced by the call to ps.
After selecting a process on this list, clicking Kill Process will direct the Pooch on that node, if it is running OS 9, to ask that process to quit. Applications that a user is presently using can respond to this request, and some applications ignore this request. If that node is running OS X, Pooch will use a kill -9 command instead. However, in accordance with the Unix underpinnings of OS X, Pooch can only kill processes that Pooch has permission to kill.
Clicking on the pop-up menu will allow you to switch between Stats & Apps and Job Queue. In the Job Queue mode, this window will display parallel jobs in the queue on this node. The columns list the number of nodes and files selected in this job and the launch time of this job. If no launch time was selected, or if the launch time has passed, it will display Normal. Selecting one will allow you to kill that job using the Kill Job button.
Get Files window
Selecting a node in the Network Scan Window or the Job Window and selecting Get Files from the Node menu will have your Pooch access the Pooch on that node and access files there. It displays information about the files in a window much like a Finder window. You can drag files out of it to the Finder, you can double click on a folder to access its contents, and you can Command-click on the window title to pop up a menu to back out a directory. As a security feature, you can only access the folder in which the remote Pooch resides and any of its subdirectories (folder aliases will not resolve), and you can only get files out, not drag files in. This access is completely independent of Mac OS File Sharing.
Node Registration Schedule Dialog
This dialog allows you to select the times Pooch registers and deregisters itself on the network, which determines this node's visibility to other Pooches. As seen in the above screen shot, you may set the schedule so that, on weekdays, this node becomes visible on the network at 5:15 pm, after you leave for home, and is no longer available for computation at 9 am, when you arrive at work the next morning. The Schedule Type pop-up menu can be set to Everyday, where the daily schedule will be the same every day, Weekdays & Weekends, where the schedule for weekends (Saturday and Sunday) is distinct from the schedule on Monday through Friday, and Weekly, where the schedule on each day of the week can be set separately. The Day pop-up menu selects which day you are editing, which depends on the Schedule Type setting.
For each day, the large circular dial is used to specify what portion of the day Pooch will register itself. The circle represents a 24-hour clock with midnight at the top and noon at the bottom. The highlighted sector of the circle indicates the portion of the day registration will occur. Dragging the mouse inside the circle selects times rounded to the nearest 15 minutes. Dragging outside the circle, or using the text fields, allows you to select any minute of the day.
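Two details of the schedule above lend themselves to a short worked example: the dial snaps to 15-minute marks, and a registration window such as 5:15 pm to 9 am wraps past midnight. The Python sketch below illustrates both, with times expressed as minutes since midnight. The function names are hypothetical, and the sketch ignores the per-day Schedule Type settings.

```python
MINUTES_PER_DAY = 24 * 60


def round_to_quarter_hour(minute_of_day):
    """Snap a minute of the day to the nearest 15-minute mark on the dial."""
    return ((minute_of_day + 7) // 15) * 15 % MINUTES_PER_DAY


def is_registered(now, start, end):
    """True if `now` falls in the window [start, end), which may wrap past midnight."""
    if start <= end:
        return start <= now < end
    return now >= start or now < end


start = round_to_quarter_hour(17 * 60 + 11)   # a drag near 5:11 pm snaps to 5:15 pm
end = 9 * 60                                  # 9:00 am
visible_at_10pm = is_registered(22 * 60, start, end)
visible_at_noon = is_registered(12 * 60, start, end)
```

Comparing start and end first is what lets the same function handle both daytime windows and overnight windows like the one in the example.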
V. Pooch Pro
Pooch was created to make clustering as easy to use as possible. Generally there are two ways ease-of-use can be applied: 1. minimize the effort to perform an otherwise technically challenging feat; or 2. apply the same amount of effort to achieve much more than could otherwise be done. With the Pooch Application, we have done our best, in the context of cluster computing, to pursue the former while making progress with the latter. It is clear that many users need a solution that focuses on this first goal.
However, we recognize that we can do still more addressing the latter. To do so, we require a version of Pooch that enables larger clusters to be managed for use by a larger number of users. Administrators are often responsible for managing much larger quantities of computational resources than single users usually do. They require more detailed control over how the hardware is used and shared amongst users. Administrators would like to accomplish as much as they can with the effort they employ.
Pooch Pro is designed to support that latter form of cluster computing. This Pro version of Pooch supplies the infrastructure of users, groups, and administrative accounts to manage the usage of cluster resources. These capabilities imply mechanisms to identify parallel computing jobs with users and track their time spent against usage policies decided by the cluster administrator as well as an account hierarchy between users and administrators.
But we should be clear that the introduction of Pooch Pro by no means is the end of Pooch. We recognize that a spectrum of cluster solutions exists, from an individual running a few nodes to many users sharing a university lab as a cluster. To better satisfy these cluster categories, Pooch has bifurcated into two incarnations: 1. Pooch continues to make it easy for a user, not necessarily skilled in computing, to keep their cluster up and running so they can maximize their time applying, rather than maintaining, the technology; while 2. Pooch Pro allows one to administer a cluster for a large number of users, all via a user interface that is easy to use for administrators and users alike. Both will evolve and improve to address their users' needs.
When Pooch Pro is first installed, it creates an initial user database file with two groups and two user accounts, user and admin. The password for both accounts is set to a pseudorandomly generated string supplied with your Pooch CD. (For the Pooch Pro downloadable from the web site, the admin password is pooch.) When you set up your accounts, we recommend first logging into your admin account and either setting a new password or creating a new administrator-level account for yourself and deleting the old admin account. Then, after configuring the user accounts, use Update User Database to distribute your changes to all other cluster nodes. If you need to back up the encrypted user database file, it is at /Users/Shared/pooch/poochuserdata.
Pooch Pro Menu
User menu
- Log In/Log Out - Opens a Cluster Log In window to initiate use of Pooch Pro after entering a user name and password.
- Change Password - Opens a Change Password Window where the user can enter and verify a new password for account log in.
- Show/Hide User Info - Toggles a User Info Window for the currently logged in user, showing statistics on the amount of compute time the user has consumed out of their assigned quota. See the User Administration Columns section for more detail on these fields.
- Migrate User Access - Adds a reference describing this user's account information, including password data, to the Job Window. The user may then choose nodes, just as with parallel computing jobs, where a new or updated copy of their user data will reside.
- Administrate Users - Opens the User Administration Window (see the User Administration Window section).
- Update User Database - Choosing from this submenu allows the distribution of this node's user database to other nodes on the cluster. Assuming all nodes are available and running on the local subnet, the Automatically choice will have Pooch Pro automatically acquire those nodes for updating. The Manually item adds the user database to the Job Window, making it possible to choose nodes that may not have been previously available or are outside the local subnet. Ideally, this is an operation that should be performed on all nodes of the cluster.
- Import Users - Opens a file dialog to choose a text file from which Pooch Pro will interpret data to create new users and modify existing ones. This feature allows an administrator to generate a user database based on an external source or text file, then import this user data into Pooch Pro with one action. The data format must be Unix linefeed text (such as that produced using Make Plain Text in TextEdit); the first line contains tab-delimited headers, followed by lines of corresponding tab-delimited data. The import ends with the first blank line or at the end of the file. We recommend examining the output of Export Users for an example, although the column order does not matter to the import, nor do all columns demonstrated in export need to exist for import. Password is also a recognized column. New in 1.6.5.
- Export Users - Opens a file dialog to choose the name and location of a new text file containing tab-delimited data about user settings, including CPU quota and rollover minutes, as well as calculated CPU time used and remaining for each user, parceled between quota versus rollover minutes. This text format should be straightforward to read by standard spreadsheet programs. The header fields in this file are the header keywords used in Import Users. New in 1.6.5.
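To make the Import Users file rules concrete, here is a minimal Python sketch that reads text laid out as described: Unix linefeeds, one tab-delimited header line, tab-delimited data lines, stopping at the first blank line or the end of the file. It is not Pooch Pro's parser. The header names used in the sample ("Short Name", "Name") are assumptions for illustration only; consult the output of Export Users for the actual header keywords.

```python
def parse_user_table(text):
    """Parse tab-delimited user data with a header line into a list of dicts.

    Reading stops at the first blank line or at the end of the text.
    Column order is irrelevant because each value is keyed by its header.
    """
    lines = text.split("\n")  # Unix linefeeds only
    headers = lines[0].split("\t")
    records = []
    for line in lines[1:]:
        if not line.strip():
            break  # the first blank line ends the import
        values = line.split("\t")
        records.append(dict(zip(headers, values)))
    return records

# Hypothetical sample; the real header keywords come from Export Users.
sample = (
    "Short Name\tName\tPassword\n"
    "alice\tAlice Example\tsecret1\n"
    "bob\tBob Example\tsecret2\n"
    "\n"
    "ignored\tAfter Blank Line\tnever-read\n"
)

for record in parse_user_table(sample):
    print(record["Short Name"], "->", record["Name"])
```

A file prepared this way can be checked with a script like this one before it is handed to Import Users.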
Command-line Access
When using the macmpi.tar.gz package with Pooch Pro, we recommend that you create users on the system whose user names match the user names registered in Pooch Pro's user database and use the -m flag when launching jobs. This makes it possible to match command-line based logins (e.g., ssh) with jobs submitted by users known to Pooch Pro. Without those consistent user names, Pooch Pro will not recognize jobs submitted by users it does not know, and will therefore kill those jobs because they appear to be foreign.
Pooch Pro Windows
User Administration Window
Pooch Pro's User Administration Window provides access to the user and group account database. Usually only administrators would have access to this window. This window features two different views, toggled by the view buttons in its upper left corner. A dot in the close box indicates that user or group data has recently been changed.
The User View displays all user accounts in the cluster, including statistical data about these users. This view is used to administer, create, edit, and delete user accounts.
The Group View lists all groups of user accounts on the cluster and the users they contain in a hierarchical format. This view is used to administer groups and organize which users they include.
User View
The User View of the User Administration Window is where user accounts can be created, edited, and deleted by selecting the account and clicking on the buttons at the bottom of the window.
Group View
The Group View of the User Administration Window is where groups can be created, edited, and deleted. Users can be dragged between groups here.
User Administration Columns
The User Administration Window displays information about users and groups in a spreadsheet-like format. This window is invoked using the columns button at the top of the User Administration Window. You can drag a column by its header and resize it by dragging its right edge. Clicking on a column header re-sorts the list according to that column.
Column - Function
Name - the full name of a user or group
Short Name - the short name of a user (a.k.a. user name) or group. The ID number of that group or user is shown in parentheses.
Group - the short name of the user's group, with the group ID in parentheses
Compute Time Quota - the compute time allocated to a user. The quota is specified as a particular amount of time, summed over all nodes used by that user, allowed within a certain wall-clock time. For example, if a user is allocated five hours per day, then Pooch will allow that user's jobs, in aggregate, to use up to five nodes for one hour each, or one node for five hours, every day.
Rollover Minutes Expiration - the expiration limit of this user's rollover minutes. If a user does not use up their quota in a previous quota interval, the user may carry those minutes over to the next interval. However, those minutes will expire after the time specified here. Using the above example of a user quota set to five hours per day, if this rollover expiration is set to two days and the user did not use any time yesterday, then that user could use up to ten hours on the cluster today.
Maximum Job Duration - the wall-clock time limit of this user's jobs. Jobs submitted by the user will be limited to running for the amount of time specified here.
Total Time Remaining - the sum of the compute time available to the user if that user were to start a job now. Pooch keeps track of jobs submitted by users and compares that time against the policies specified by the quota and rollover minutes settings. The time left is displayed in this column.
Total Time Used - the sum of the compute time used by the user within the quota and rollover minute time intervals
Quota Time Remaining - the sum of the compute time available to the user according to the compute time quota rule only
Quota Time Used - the sum of the compute time used by the user within the quota time intervals
Rollover Time Remaining - the sum of the compute time available to the user due to rollover minutes only
Rollover Time Used - the sum of the compute time used by the user outside the quota time interval but within the rollover minute expiration
User Flags - the flags of a user or group, specifying user policy features, shown graphically. Their meanings are shown in the chart below.
User Flag Icon Meaning
this user or users of this group are allowed to administrate on the cluster
this user or users of this group have a compute time quota limiting their allocation on the cluster
this user or users of this group have been allowed rollover minutes
this user or users of this group have been allowed to migrate their access to the cluster from node to node
this user or users of this group have been allowed to change their own password
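The quota and rollover examples in the column descriptions above can be checked with a short Python sketch. It is a simplified model, not Pooch Pro's accounting: it assumes minutes granted in one interval stay usable for the number of intervals given by the rollover expiration (so a two-day expiration covers today and yesterday), and it treats each earlier interval independently. All function names are hypothetical.

```python
def node_minutes(jobs):
    """Sum compute time over jobs given as (node_count, wall_clock_minutes) pairs."""
    return sum(nodes * minutes for nodes, minutes in jobs)

def available_today(quota, rollover_intervals, previous_usage):
    """Minutes available at the start of the current interval.

    quota              -- minutes granted per interval
    rollover_intervals -- how many intervals granted minutes stay usable
                          (1 means no rollover; this reading is an assumption)
    previous_usage     -- minutes used in earlier intervals, most recent last
    """
    carried = 0
    lookback = rollover_intervals - 1
    if lookback > 0:
        for used in previous_usage[-lookback:]:
            carried += max(quota - used, 0)  # unused minutes roll over
    return quota + carried

FIVE_HOURS = 5 * 60

# Five nodes for one hour each costs the same as one node for five hours.
print(node_minutes([(5, 60)]), node_minutes([(1, 300)]))

# Five hours per day, two-day rollover, nothing used yesterday: ten hours today.
print(available_today(FIVE_HOURS, 2, [0]) / 60)
```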
Edit User Sheet
Editing a user via double-clicking or the Edit button invokes the Edit User Sheet.
The sliders set the compute time policy for this particular user. A user's total compute quota can be limited to a certain amount of wall-clock node time per time interval, which can be a day, a week, a month, or a year. This time is the sum of all the time this user keeps any node or nodes busy, as tracked by Pooch. This calculation is compared against the quota specified in this dialog and the quota of this user's group. In addition, the user may be allowed minutes to be rolled over from previous quota intervals. This feature allows users to combine previously unused minutes with their current quota for their compute time. Users' individual jobs can also be limited to running for a specified amount of time using the Limit Job Duration slider. When creating a new user, a New User Sheet appears that is very similar to the Edit User Sheet. Unless one is entered, the default password given to new users is pooch, all lowercase. Once the short name and user ID are set, they cannot be changed without deleting the user.
Edit Group Sheet
Editing a group via double-clicking or the Edit button while in the Group View invokes the Edit Group Sheet.
As with an individual user, the sliders set compute time policy, but instead of just a particular user, a policy can be set for all the users in this group at once. A user's compute time is limited to either that user's particular quota or the group's, whichever is more limiting. This hierarchy applies individually to the rollover minutes and job duration limits as well. When creating a new group, a New Group Sheet appears that is very similar to the Edit Group Sheet. Once the short name and group ID are set, they cannot be changed without deleting the group.
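The "whichever is more limiting" rule can be illustrated with a small Python sketch. It assumes each limit is a number of minutes, that a missing limit (None) means unlimited, and that the smaller value is the more limiting one. The dictionary keys are hypothetical names, not Pooch Pro identifiers.

```python
POLICY_KEYS = ("quota", "rollover_expiration", "max_job_duration")

def more_limiting(user_value, group_value):
    """Return the smaller of two limits, treating None as 'no limit'."""
    limits = [v for v in (user_value, group_value) if v is not None]
    return min(limits) if limits else None

def effective_policy(user, group):
    """Apply the user/group hierarchy to each limit independently."""
    return {key: more_limiting(user.get(key), group.get(key)) for key in POLICY_KEYS}

# Assumed example values, all in minutes.
user = {"quota": 600, "rollover_expiration": None, "max_job_duration": 120}
group = {"quota": 300, "rollover_expiration": 2880, "max_job_duration": 240}

print(effective_policy(user, group))
```

In this example the group's quota wins because it is smaller, while the user's own job duration limit wins for the same reason.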
AppleScript
Pooch Pro supports three AppleScript types regarding user functions. New in 1.6.5.
user noun -- a user of the cluster
group noun -- a user group of the cluster
The user record is data about the user, including settings specific to the user as well as calculated quota and rollover CPU time used. Similarly, a group record is data about a group. The units of the time-valued properties, such as quota or rollover time, are in minutes. Via an administrator account, these records can be accessed using the usual Standard Suite in AppleScript vocabulary. For example, to get a complete list of users, use the following command:
get every user
To create a new user, use make:
make new user with properties {name:"Test User", user name:"testuser", has quota:yes, quota:10000, group ID:301}
These capabilities are useful to set up and create a new user programmatically via AppleScript or a Unix script. If not all properties are provided, as in the make example, Pooch Pro will fill in the remaining fields with default values. Properties that return calculated time used and remaining are, of course, ignored if fed into Pooch Pro.
login noun -- a user's login session to the cluster
The login record is about the login state of Pooch. Data about the currently logged-in user can be returned via the get command, while delete can log out the user. To log in from AppleScript, use make:
make new login with properties {user:{user name:"testuser"}, password:"pooch"}
VI. AppleScript
Pooch has the ability to respond to commands from a script written in AppleScript. This feature makes possible customized launch configurations and queuing systems. For example, simply by dragging an application to it, a script could launch that application automatically onto the four best nodes found on the network. Also, it is possible to write and launch scripts from OS X's Unix command line; thus, launching jobs through Pooch from OS X's Unix shell is possible. In addition, because the underlying mechanism of AppleScript is AppleEvents, the inter-application messaging system on the Mac OS, it is possible for Mac applications, including mainstream ones, to ask Pooch to query the network and launch parallel computing jobs.
You may look at the Pooch dictionary by selecting Open Dictionary from the File menu of Script Editor (an application commonly supplied with the Mac OS for editing AppleScript scripts) and selecting Pooch App in that dialog. We also present that vocabulary here. Pooch fits its AppleScript vocabulary within the official AppleScript object-oriented approach, accompanied by only a few Pooch-specific commands and data structures.
Standard Suite
The AppleScript Standard Suite was defined by Apple Computer, Inc., to create a standard vocabulary that all AppleScript-aware applications would understand and respond to in a consistent manner. This vocabulary is meant to encompass as many operations as possible that a user could do with an application, while allowing only a minimum of application-specific operations to fall outside this suite. The goal is that, once an AppleScript user learns this suite for one application, that user may apply that knowledge to all other applications with the greatest possible efficiency.
make
new type class -- the class of the new element. Keyword 'new' is optional in AppleScript
[at reference] -- the location at which to insert the element
[with data anything] -- the initial data for the element, such as a list of alias records for the file list of a job
[with properties record] -- the initial values with properties of these items, needed for nodes in a node list of a job
[with last type class] -- use the data from the last submitted job for the new job
Result: reference -- to the new object(s)
make is probably the second most common command you will use with Pooch. You may use it to create a new job.
set myjob to make new job
This command directs Pooch to open a new Job Window with only the node this Pooch is running on as node zero. It then sets the variable myjob to be a reference to the new job just created.
By default, make new job creates a new job with an empty file list and a node list containing the node this Pooch is running on. make new job with last job causes Pooch to derive the new job from the last job this Pooch launched according to the settings in the New Job Automatically Reloads submenu.
To add an application or files to the job, use the make new file form:
make new file at myjob with data alias "Power Fractal" of the startup disk
This command adds the Power Fractal application, which resides on the root directory of the startup disk, to the file list of myjob. The word alias typecasts the reference into an alias. Script Editor often checks to make sure the file exists before allowing you to compile the script. Adding files is similar, and this mechanism also accepts lists of files.
To add a node explicitly to the job, use the make new node form:
make new node at myjob with properties node {name:"Mac number 2", address:"192.168.1.2"}
The information between the brackets is interpreted as a record containing the name and address of the node. This record is typecast as type node, then added to the node list of myjob. You may also construct a list of nodes,
set mynodelist to {{name:"computer01", address:"192.168.0.1"}, {name:"computer02", address:"192.168.0.2"}}
and add that list to the job.
make new node at myjob with properties mynodelist
In addition, you may add the result of a node scan directly to the node list.
make new node at myjob with properties (node scan)
The node scan command is described in the Pooch Suite section below.
The remaining AppleScript commands of the Standard Suite that Pooch recognizes are:
close reference -- the object to close
count reference -- the object whose elements are to be counted
delete reference -- the element to delete
exists reference -- the object in question
get reference -- the object whose data is to be returned
open reference -- list of objects to open
set reference -- the object to change
to anything -- the new value
These commands are general purpose because they can refer to almost any object, within reason, within the Pooch application. Some of them are used simply, e.g., close job window closes the job window, or close every window closes all windows. count the node list of myjob returns an integer giving how many nodes are in myjob. Using the open command is the equivalent of dragging the item onto the Pooch icon, so its function overlaps with make new file. Commands that do not make logical sense (such as close BUSY check of myjob) either return an error or do nothing.
The objects that these commands act upon can vary widely. Individual nodes of a node list or files of a file list can be referred to by number, such as node 2 or file 4 or item 5. Note that, in AppleScript, all counting is 1-based, so the nodes of an n-node node list are node 1 through node n. The node list or file list of a job can also be specified. For example, exists file 4 of the file list of myjob tests if that file of the job exists, while delete the node list of myjob clears the node list. In the former example, file 4 of the file list of myjob may also be simplified to file 4 of myjob because the reference is clear; however item 4 of myjob is not specific enough.
The set command is one of the most versatile members in this suite and will probably be the most common command you use. Once you create a job and set the variable j to be a reference to that job:
set j to make new job
you can set entire file or node lists:
set the file list of j to {alias "Enterprise:Pooch folder:pinput2d"}
set the node list of j to {{name:"computer01", address:"192.168.0.1"}, {name:"computer02", address:"192.168.0.2"}}
set a particular file of a file list to another member of the same list:
set file 2 of j to file 1 of j
or set particular options of the current job:
set BUSY check of j to no
set launch failure queues job of j to yes
set target subfolder of j to "run001 folder"
set start time of j to date "Wednesday, October 31, 2001 12:00:00 AM"
set tasks per computer of j to use processor count
Setting the target subfolder option to "" and start time to current date or an earlier date turns off those job options. All the substructures that make up a job are listed in the job class listing in the Pooch Suite section of Pooch's dictionary.
In addition, Pooch's Application Class has two properties beyond those of the normal suite. You can query the launching a parallel job property to find out whether or not Pooch is currently launching a parallel job onto nodes. This information is useful for scripts or parallel applications that need to coordinate their activities with Pooch's launch process. Also, you may find out whether or not this Pooch is registered by accessing the node registration property.
Pooch Suite
The commands particular to Pooch are in the Pooch Suite. Expanded in 1.6.
node scan reference
[for best integer] -- Search for a given number of 'best' nodes
[for [best] processors integer] -- Search for a given number of processors using the 'best' nodes
[starting at string] -- scan starting at a remote node specified by an IP address in a character string
[BUSY boolean] -- Include the nodes labeled BUSY
[all boolean] -- Include all nodes found
[new scan boolean] -- Explicitly perform a new node scan
Result: node -- list of nodes
job scan reference
[starting at string] -- scan starting at a remote node specified by an IP address in a character string
[all boolean] -- Include all jobs found
[new scan boolean] -- Explicitly perform a new node scan
[lists boolean] -- Explicitly include node and file lists in job records
[recalling integer] -- Recalls job records back this far in time (e.g., days, weeks, or months)
Result: job -- list of jobs
network scan reference
[for best integer] -- Search for a given number of 'best' nodes
[for [best] processors integer] -- Search for a given number of processors using the 'best' nodes
[starting at string] -- scan starting at a remote node specified by an IP address in a character string
[BUSY boolean] -- Include the nodes labeled BUSY
[all boolean] -- Include all nodes found
[new scan boolean] -- Explicitly perform a new node scan
[nodes boolean] -- return found nodes
[jobs boolean] -- return found jobs
[lists boolean] -- Explicitly include node and file lists in job records
[hidden boolean] -- hide the network scan to reduce its visual impact
Result: node or job -- list of nodes or jobs
network scan directs Pooch to open the Network Scan Window and begin scanning the network. By default it returns node records, but the with jobs switch will have it return job records instead. job scan is equivalent to network scan with jobs, and node scan is equivalent to network scan with nodes. Pooch will wait about eight seconds before returning the results of its search. The command will only return nodes which, in that time, successfully reported their status as Okay after being discovered. Note that the results of this search may or may not overlap with the nodes already present in a job you are constructing.
When the returned object is a list of nodes, each item of this list is a record containing information about the node's name, IP address, load, free memory, OS version, achieved performance, rating, and so on. These items generally correspond to the node scan columns described in detail in Chapter IV. A complete list of the members of these records is given and defined in the node class section of the Pooch Suite in the Pooch dictionary. Much of the information about a node can change at any time, so it is not recommended to cache data about other nodes. Likewise, job data can change and evolve over time, as queued jobs are launched and running jobs end.
Passing an IP address (in a string format, such as "192.168.1.7" or "farawaymac.otherdomain.edu") in the starting at option directs Pooch to perform a Remote Node Scan using a Pooch at that address. The success of this operation depends on whether or not a Pooch is actually running on a Mac at that address. Assuming that Pooch successfully relays information about other nodes, your Pooch will request the status of those nodes and return the results to your script as before.
Passing an integer n in the for best option of this command directs Pooch to return only the best n nodes of the search for nodes. Nodes that are busy or cannot be contacted are never included in this list. It is possible for this command to return fewer than n nodes. Which nodes are the best nodes is determined by Pooch's internal rating calculation, which depends upon data about that node that change from moment to moment. To determine the best nodes according to your own criteria, you can construct a script that sorts the results of node scan using a rating function of your own design.
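The idea of ranking scan results with your own rating function can be sketched as follows. The sketch is in Python rather than AppleScript and works on sample records, so it shows only the sorting logic. The record keys and the rating formula are assumptions chosen for illustration; they are neither Pooch's node class property names nor its internal rating calculation.

```python
def my_rating(node):
    """Hypothetical rating: favor free memory, penalize current load."""
    return node["free memory"] / (1.0 + node["load"])

def best_nodes(nodes, n, rating=my_rating):
    """Return up to n nodes, highest rating first."""
    return sorted(nodes, key=rating, reverse=True)[:n]

# Made-up records standing in for the result of a node scan.
scan_results = [
    {"name": "computer01", "load": 0.9, "free memory": 512},
    {"name": "computer02", "load": 0.1, "free memory": 256},
    {"name": "computer03", "load": 0.0, "free memory": 1024},
]

for node in best_nodes(scan_results, 2):
    print(node["name"])
```

Because node data changes from moment to moment, a ranking like this should be computed from a fresh scan immediately before launching.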
To specify a number of processors to use, use the for processors option of this command. Pooch uses the same criteria as the for best option, except it limits the nodes it returns so that the sum of the number of processors on the return list most closely matches the number of processors specified here. This is useful if you are using the Use Processor Count setting of the Tasks per Computer job option and want to launch only a limited number of tasks.
The output of this routine can be placed directly into a job's node list:
make new node at j with properties node scan for best 4
but you may instead wish to edit the search results yourself before passing it to your job:
set nodesearch to node scan for best 4 starting at "farawaymac.otherdomain.com"
-- edit the list in nodesearch here
--
make new node at j with properties nodesearch
Be creative!
launch job -- job to launch: required
The launch command directs Pooch to submit a job that you pass by reference. For example,
launch myjob
would launch a job that you originally made using set myjob to make new job.
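As noted earlier in this chapter, scripts can be launched from OS X's Unix command line. The following Python sketch assembles the commands described above into an argument list for the osascript tool, which accepts one line of script per -e option. The application name "Pooch" in the tell statement and the sample file path are assumptions; adjust them to match your installation. The sketch only builds the command and does not run it.

```python
def build_launch_command(app_hfs_path, node_count=4, app_name="Pooch"):
    """Build an osascript argument list that makes, fills, and launches a job.

    app_hfs_path -- colon-separated path to the parallel application (assumed)
    node_count   -- how many 'best' nodes to request from node scan
    app_name     -- name used in the tell statement (an assumption)
    """
    if '"' in app_hfs_path or '"' in app_name:
        raise ValueError("double quotes are not supported in this sketch")
    script_lines = [
        'tell application "%s"' % app_name,
        "set myjob to make new job",
        'make new file at myjob with data alias "%s"' % app_hfs_path,
        "make new node at myjob with properties (node scan for best %d)" % node_count,
        "launch myjob",
        "end tell",
    ]
    argv = ["osascript"]
    for line in script_lines:
        argv += ["-e", line]
    return argv

command = build_launch_command("Macintosh HD:Power Fractal")
print(command)
# On a Mac with Pooch installed, the list could be passed to
# subprocess.run(command, check=True) to submit the job.
```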
bark
This command makes Pooch bark. Try it and listen for yourself.
Additional Reading
Documentation about AppleScript in general is available in print and on the web.
Derrick Schneider (with Hans Handsen & Tim Holmes), The Tao of AppleScript, 2nd edition, (BMUG and Hayden Books, Indianapolis, Indiana, USA, 1994).
Apple's official AppleScript web site http://applescript.apple.com/
MacScripter - an independent web site with links, information, and resources, including numerous examples, related to scripting on the Mac http://www.macscripter.net/
Apple's AppleScript mailing list http://www.lists.apple.com/applescript-users
VII. Security
Pooch is its own lock and key. You should keep track of your Pooch like you keep track of your keys.
Before Pooch will accept commands from another Pooch, it must receive a passcode that matches its own. Then, all subsequent commands use a 512-bit encryption key that rotates for each message in a pseudo-random manner. Only those two Pooches can predict the next encryption and decryption keys. If a mistake in the passcode or commands is made at any time, Pooch will reject the connection. Since Pooch waits a second or two before it accepts another connection, an exhaustive search for the correct encryption keys (2^512 possibilities at one attempt per second would take over 10^145 years) will be extraordinarily unlikely to succeed.
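The estimate above is easy to verify with exact integer arithmetic. The short Python sketch below assumes one attempt per second and a 365-day year, and counts the time needed to try every possible 512-bit value.

```python
KEY_BITS = 512
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # assumes a 365-day year

keyspace = 2 ** KEY_BITS                # number of possible 512-bit keys
years = keyspace // SECONDS_PER_YEAR    # at one attempt per second
exponent = len(str(years)) - 1          # order of magnitude in base 10

print("about 10^%d years to try every key" % exponent)
print("over 10^145 years:", years > 10 ** 145)
```

Even an attacker who needed to search only half the keys on average would face a time of the same order of magnitude.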
The first passcode and the start of the rotating key are 512-bit pseudo-random numbers derived from the registration name of that Pooch (which is set at compile time). Therefore, only Pooches of the same registration will be able to communicate with one another. Because the registration name is unique for each Pooch customer, a copy of Pooch registered to, say, MIT, will not be able to communicate with a Pooch registered to UC Berkeley. (For cross-registered Pooches or other custom configurations or encryption methods, please e-mail Dauger Research.)
Security for your cluster then becomes dependent on the security of your Pooch registered with your registration name. Your Pooch can be installed on the Macs of your cluster, and, if no additional copies of Pooch exist, no one can get in. But if you make a copy of that Pooch and bring it home to access the cluster, the security of that cluster depends on how securely you keep that extra copy of Pooch.
The nature of this security is analogous to having the ability to copy a key to a locked office. It is not uncommon to entrust a group of people with the keys to a shared resource, such as office equipment. The security of the equipment is shared by those who have copies of the key. These people understand the responsibility that comes with the privilege for that access. Access to Pooch can be shared in a similar way.
In testing by Dauger Research, Pooch's ability to break through firewalls is no greater than that of a typical web browser.
If someone loses a key to an office, it is not uncommon for the office locks to be rekeyed and a new set of keys distributed to the users of that office. Likewise, if you feel the security of your Pooch has been compromised, it is possible to obtain a rekeyed Pooch. The Pooch code allows pseudo-random variants of the 512-bit passcode based on the same registration name. The rekeyed Pooch will not communicate with the previous Pooch of the same registration. An active Pooch subscription with Dauger Research will enable you to obtain a limited number of rekeyed Pooches.
If you are using the downloadable demonstration version of Pooch, you should be aware that the same version can be downloaded by anyone else on the web. So, if they have the IP address of your Mac, they could access your Mac, to the extent that Pooch allows, over the Internet. Although guessing your Mac's IP address is unlikely, a uniquely registered Pooch makes for much better security.
VIII. Frequently Asked Questions
Does Pooch work on machines running an OS before OS 9?
We're sorry, but no. Pooch uses many of the latest APIs implemented by Apple. The Data Browser interface was sufficiently implemented only in CarbonLib 1.2, and the NSLM v1.1 implementation, used by Pooch to register and search for nodes, requires OS 9. Pooch's predecessor, the Launch Den Mother and Launch Puppy, runs on OS 8.x and is available from daugerresearch.com, but is officially not supported.
Does Pooch work on OS X?
Yes. Pooch is a Carbon app, so it runs on both OS X and 9. We have even run parallel apps on a mixed cluster where some Macs were running OS 9, some OS X 10.1, and some OS X 10.2 through 10.3. Due to bugs in earlier OS X versions, we highly recommend using OS X 10.2.1 or later.
Can I use Pooch to run a node while it is logged out?
Yes. This feature is new as of version 1.3.5. You may either use the standard Pooch Installer (while holding down the option key) or the Pooch command-line installer, poochclinstaller.tar.gz. Due to security issues in OS X, a restart may be required to fully enable this feature. Jobs launched onto logged out nodes will run behind the login window. Because of OS X's design, that is the only way the executable can reliably run in that environment.
How do I disable Pooch's run at logout feature?
You may do so by deselecting the Enable Pooch at Logout preference. Due to security features in OS X, administrative authorization and a restart may be required to fully disable this feature.
How do I uninstall Pooch completely?
You may remove Pooch and all its components by allowing Pooch to run normally, then holding down the Option key after launching the Pooch Installer. In the Pooch Installer, a dialog should appear that allows you to select either to upgrade or uninstall Pooch. Clicking on uninstall will delete the running Pooch and the components that allow it to run at logout. The latter process may require administrative authorization.
How do I obtain a Pooch subscription, and what do I get?
With your ordered copy of Pooch, you get a one-year subscription, which can be renewed annually at 25% of the purchase price (at the time of renewal) of a new copy. As a subscriber, you will receive free updates to the software (approximately twice per year) and technical support via e-mail. The subscription also entitles you to up to six free rekeyed Pooches per year, should you feel your security has been compromised. If you would like to upgrade your Pooch, please contact Dauger Research.
When I get an update, do I have to reinstall it on every Mac myself?
On the CD you receive, Dauger Research, Inc., will supply an updater program we call a Pooch Package. You can use the usual Pooch launching mechanism to deliver this Package to all your nodes, which will then unpack and update Pooch to the new one on all your nodes automatically and simultaneously. No reboot necessary.
What is different about Pooch compared to its predecessors, the Launch Den Mother and Launch Puppy?
LDM & LP were a pair of Macintosh applications that assisted in starting a parallel application. I created their earliest incarnations in 1998. Neither was designed to run all the time in the background. The Launch Den Mother discovered the existence and addresses of nodes using AppleTalk calls that looked for a side-effect of the Program Linking toggle in File Sharing. It then sent an AppleEvent (of a type allowed only when Guest Access was on) to the Finder on that Mac to launch its Launch Puppy. Correct execution of this AppleEvent required that the LP be in a very particular location on a particular hard drive. Once LP was up, LDM communicated with it via the Program-to-Program Communications Toolbox (PPC Toolbox), introduced in 1990 to support AppleEvents and AppleScript, written to use AppleTalk only, and toggled via the aforementioned Program Linking button. Finally, LDM's user interface consisted of one large CustomGetFile dialog box. LDM & LP contained an innovative combination of technologies. There was nothing else out there that could do what they could do, and they came to be used in scores of Mac clusters worldwide.
Then, Apple announced the future of the Mac OS to be OS X. Once the specifics about X's application support became clear (early 2000), it was evident that LDM & LP were in trouble. CustomGetFile was not supported in Carbon, so its associated 3000 lines of code would be useless. Although AppleEvents would be available in another form on X, Guest Access was permanently disabled (although there were ways to use password access). The calls LDM used to discover nodes (e.g., PLookupName) did not exist in Carbon, so another method had to be found. PPC Toolbox was completely unsupported in X. (Just try to find a Program Linking button in native X.) Apple itself was discouraging the use of AppleTalk in favor of TCP/IP. It was clear that a port of LDM & LP to X was completely impractical: almost everything interesting about LDM relied on APIs Apple listed as Unsupported.
So, rather than hodgepodge LDM & LP piece by piece for X, I decided to abandon those 13000 lines of LDM & LP source code and start something new. Numerous convenience features intrinsic to AppleTalk did not exist in TCP/IP, so a completely fresh approach was necessary. Although the ideas had been rolling around in my head since early 2000, only the CarbonLibs available in late 2000 had enough functionality to provide what I needed. Then, in early 2001 (simultaneous with finishing my doctoral dissertation), I developed Pooch. Pooch:
- uses Open Transport TCP/IP networking for communications
- uses Data Browser APIs to display information about nodes
- uses three separate windows, split off from the LDM dialog some would call cockpit-like
- uses Service Location Protocol (SLP), through the Network Services Location Manager, to post information about itself and discover other nodes on a network
Using these APIs and experience with LDM & LP resulted in fundamental design requirements in Pooch different from LDM. For example, on OS 9, SLP requires a registration by a running application, forcing Pooch to run all the time. But, I figured I would take advantage of these requirements rather than fight them. These fundamental design differences gave the opportunity to do things that LDM & LP could never do:
- become one application. Pooch is, in a sense, schizophrenic. It has numerous different interacting programs inside that enable it to respond to many different possible requests at once.
- run anywhere on any hard drive. The AppleEvent to start the Launch Puppy was no longer necessary. Since Pooch was running anyway, it knows where it is.
- continuously monitor information about the system. Rather than rely on side-effects, it could keep track of running jobs, or load averages, or performance benchmarks, etc., itself and report such information to a querying Pooch at any time without disturbing a user, if present.
- maintain a job queue. Because Pooch runs continuously, it can retain a queue, initiate a parallel job at any specified time, and make decisions about the situation as it arises.
- be accessible over the Internet. TCP/IP, even with SLP, isn't enough to do all of what AppleTalk does. So I added the Remote Node Search feature to extend the reach of SLP to other subnets. But this has the consequence that anyone on the Internet can access the nodes, leading to the addition of security features in Pooch. And the nature of these features, relying on registration names, was to spare the user from having to remember passwords, a convenience we had in LDM.
- offer radically improved, streamlined, and standards-compliant AppleScript support, allowing Pooch to be directed by queuing systems, from OS X's Unix command line, or by other applications.
- launch jobs that use other MPIs for their message passing, in addition to those using MacMPI. This exemplifies how the best of the Mac is being combined with the best of Unix.
- and more. We have features in mind for the future of Pooch.
And then there were other features added to enhance the user experience, such as extending the drag-and-drop metaphor from the icon to the Job Window, becoming a repository for files and nodes and allowing the entire Finder to become a file dialog.
But the primary goal was the same: Make operating a powerful parallel computer as simple and as easy to use as possible. Easy enough for anyone to use.
IX. Revision History
v1.7: May 15, 2006
Significant new features include:
- Support for running on mixed clusters of Intel- and PowerPC-based Macs
- Other features, enhancements, bug fixes, and updates
Pooch changes in detail: Universal Binary release. Implemented and added user interface for remote job access mechanism. Implemented Job file type support in user interface and Job window, including save sheet reminders and standard save sheet dialogs. Implemented Head Node setting, influencing default behavior in the Network Scan window, the Job forwarding option, via Proxy setting, automatic node acquisition, retrieve output job option, and queuing job option. Implemented more ways to acquire file icons in file pane of Job window. Added support libraries for complete reading and writing of folders and files. Modified Node Access library to locate and retrieve jobs and files on remote nodes. Implemented node access retry in Node Access library to attempt primary address of remote node. Added recognition for .app extension like that of .out. Set job name just prior to job submission. Implemented job window reinvoke after job submission preference. Eased access to port number of job in Job window. Updated use of Open Transport API for endian changes. Increased proxy time based on data transmission sizes. Log list of IP addresses on startup. Added retry for large send sizes from background mechanism. Added reporting of user name data in AppleEvent job structures. Added access delay to launch state in AppleEvent interface when job just launched. Added AppleEvent dictionary entry and implementation for TCP port of job data. Allowed for initial file name to be specified in Navigation dialogs. Fixed queuing system to launch properly if only one job queued. Added logging for launch errors, including identification of problem nodes, and reporting. Fixed endian conversion for multi-virtual node launch. Implemented additional timeout detection in launch mechanism. Implemented optimized secure encryption methods. Implemented secure connections at beginning of launch mechanism, including logging. Fixed allocation bug on post-launch reporting to origin node. 
Updated Preferences window with new features and allowed for its resizing. Fixed tunnel address string detection in Network Scan library. Updated Status window with antialiased, positioned fonts. Fixed Navigation data possible crash when file dialogs are closed. Fixed possible crash if refresh in Network view of Network Scan window. Restricted Kill Job button if accessing by proxy, while triggering proxy node if Kill Job is invoked. Edited Status icon to reflect new node states and consistent display of status text. Implemented support to detect and display greater than 2 GB of RAM using sysctl. Implemented workarounds for resource handle bug in recent jobs list, preferences saving, and process watch list. Added support for Nodes per Box specific to a node in job structure and contextual menu access in node pane of Job window. Added support calculations for virtual node specification. Added support for long job names based on long file names. Added support for job identification for retrieval. Fixed endian conversion for grid and node acquire structure types. Fixed peak Vector float flops endian conversion. Fixed endian conversion backwards compatibility with old Job properties structure types as well as new extensions. Added file info parameter record endian conversions. Handled endian PinRect issues in Network Scan Columns and User Administration windows. Extended search support for possible quad- and octuple-processors. Implemented launch of Console.app to display poochlog.txt. Reorganized File menu with new items. Modified Paste behavior to the default in dialogs, the address field in the Network Scan window, and submit to the Job window. Optimized backgrounding timings. Node Info window displays node name in title bar, if available. Allowed Node Info data to display without process data. Updated Total Memory field to include virtual memory estimate, and updated Free Memory field to show greater than 2 GB. 
Process checking for null jobs before converting endians. Ignore /private prefix in acquired paths to processes. Added routine to tell Console to open a file. Added standardized Nodes per Box Job structuring setting. Altered code to fit within Xgrid's library restriction, fixing a bug that appeared in later versions of OS X 10.4. Updated benchmarking code to include new Power Fractal implementations in AltiVec and SSE. Added option to save and retrieve Pooch Pro user login data in Keychain, implementing time delay if user is not present. Converted endians for encrypted Pooch Pro user data to disk and over the network. Redirected Pooch Job History data into centralized, common system location from user location, with consumption of old job data. Converted endians for Job History data. Implemented Job accumulation cache and filter to optimize network and job scan for user minutes calculation. Added delay to accumulate calculation until user idle. Fixed job accumulation when launch occurred before Network scan. Log if user or group data is corrupt. Converted endians for initial password calculation. Extended Pooch user export/import to include quota and rollover flags. Fixed possible crash when refreshing on Network view of Network Scan window. Fixed Unix permissions of the queued jobs folder. Prevented crash during deletion of one-year-old archived jobs when getting such jobs. Converted endians for Get Files window and Bonjour and NSL/SLP access implementations. Signaled automatic user database update and direct Job window click as higher priority to queuing system. Added icns data for new Job, Job queue, Job history, Address list, and node list file types. Converted endians for client-side proxy node operation. Modified poochdaemon installers to conform to Unix permissions requirements in late versions of 10.4. Revamped Pooch Package and Pooch Installer to incorporate both CFM and UB versions of Pooch and install the appropriate version, from OS 9 to Intel.
Implemented hidden feature to launch jobs on both OS 9 nodes and Intel nodes.
v1.6.5: January 18, 2006
Significant new features include:
- Updates to utilization of technologies in Mac OS X 10.4 Tiger
- Enhancements to the command-line interface
- Pooch Pro supports import and export User Data via text files
- Pooch Pro enables access to user data and login state via AppleScript
- Other feature enhancements, bug fixes, and adaptations to changes in OS X
Pooch changes in detail: Modified /tmp/pooch folder creation to set all Unix permissions to true. Fixed search to find flops and processor count correctly. Added to AppleScript interface to support new command-line features. Fixed ps parser to not read past end of file or duplicate ps output. Randomized ps temp file name to reduce file and permissions conflicts. Modified message diagnostics to print more reliably. Added ability to open Pooch job queue files to create a new Job window. Prevented pooch.unix from calling Process Manager or Apple Event Manager. Migrated Node Info window and its functions to use Node Access library. Modified the Pooch at Logout feature to use root privileges to start the poochdaemon. Removed excess debugging messages, enabling some according to user's preference. Added cmd-ctrl-Q trigger for launch queue. Fixed text import to navigate folders. Fixed CPU time tracking to exclude null data. Fixed Network Scan window's nodelist pane behavior to properly delete playlist entries. Implemented Kill All Job behavior when option-clicking on Kill Job button, also displaying Kill All Jobs. Fixed job status field display color when it follows a BUSY node in Node Scan window. Implemented export and import of user data via tab-delimited text files, flexibly parsing header and data fields. Implemented new AppleScript data types for user, group, login session, and time interval selectors and connected the handlers to the user database, making it possible to modify the login state and user database of the cluster via scripts. Added default processor per box selector. Enhanced randomness of TCP port number selection to help prevent conflicts. Added recognition of dual-core G5 CPU type.
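As a rough illustration of the tab-delimited user data import mentioned above, the following Python sketch parses a header row flexibly, so that column order, letter case, and stray spaces do not matter. The function name and the column names (Name, Group, Quota) are assumptions made for this example; they are not Pooch Pro's actual export format.

```python
import csv
import io


def parse_user_table(text):
    """Parse tab-delimited text into dicts keyed by normalized header names."""
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    # Skip blank lines so trailing newlines do not produce empty records.
    rows = [row for row in reader if any(cell.strip() for cell in row)]
    if not rows:
        return []
    # Normalize headers: the hypothetical columns may appear in any order or case.
    header = [name.strip().lower() for name in rows[0]]
    return [dict(zip(header, (cell.strip() for cell in row))) for row in rows[1:]]


# Hypothetical sample data, for illustration only.
sample = "Name\tGroup\tQuota\nalice\tphysics\t120\nbob\tchem\t60\n"
users = parse_user_table(sample)
```

Keying each record by its header name, rather than by column position, is what lets such a file be edited in a spreadsheet and reimported without caring about column order.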
Added endian flippers for all 40+ data structure types and implemented them at all data entrance and exit points in Pooch due to migration to Universal. Modified header dependencies, disabled bridge code, modified resource behavior and icon access, compensated for missing target ID AppleEvent coercion and API changes for migration to Xcode from Metrowerks. Added recognition of Intel CPU type for Pentium 4.
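To illustrate what an endian flipper does at a data entrance or exit point, here is a minimal sketch in Python. The record layout (a 32-bit node count followed by a 16-bit port number) and all names are hypothetical, chosen only for illustration; they are not Pooch's actual data structures.

```python
import struct

# Hypothetical record: unsigned 32-bit node count, unsigned 16-bit TCP port.
BIG_ENDIAN_FORMAT = ">IH"     # byte order used by PowerPC Macs
LITTLE_ENDIAN_FORMAT = "<IH"  # byte order used by Intel Macs


def flip_record(raw):
    """Re-encode a big-endian record as little-endian, field by field."""
    node_count, port = struct.unpack(BIG_ENDIAN_FORMAT, raw)
    return struct.pack(LITTLE_ENDIAN_FORMAT, node_count, port)


# Bytes as they might arrive from a PowerPC node (assumed values).
wire_bytes = struct.pack(BIG_ENDIAN_FORMAT, 8, 5000)
native_bytes = flip_record(wire_bytes)
```

Each field is flipped according to its own width, which is why every structure type needs its own flipper rather than one blanket byte reversal of the whole record.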
v1.6: May 10, 2005
Significant new features include:
- Utilization of new technologies in Mac OS X 10.4 Tiger:
- Automator - Incorporate Pooch actions into workflows
- Dashboard - View cluster status and activity in an aesthetic environment
- Spotlight - Locate data in Pooch jobs via the system
- Xgrid - Pooch makes Xgrid nodes accessible to its users
- Massively renovated Network Scan window, following cues from iTunes, Finder, and Safari, including support for user-defined node lists in their own movable Network pane, a multipurpose action pop-up menu, an address box, an all-new Network view mode, and even a Search mechanism
- Extended AppleScript fidelity, including exposing job data acquired on the cluster
- Support for grid-style jobs, specifying arbitrary argument ranges for identical executables
- Revised queuing system with a centralized queue for all users on the system
- Revisions to adapt to adjustments in OS X 10.4 Tiger, particularly the StartupItem mechanisms
- Support for a new web-based interface
- Other feature enhancements, bug fixes, and adaptations to changes in OS X
Pooch changes in detail: Major renovations to the Network Scan window, both at internal and external levels. Internally, factored the routines into separate components for node access, network scanning and compilation, and the window user interface, making it possible for each lower-level component to operate independently of the upper-level components. Externally, used the Segmented control to switch views, added a new Network View, added a playlist-inspired left-side pane, with a splitter that detects mouse-overs and saves state, for user-defined node lists and networks, replaced the remote node scan button with an address box with a pop-up menu, added accessors to alter job history lookback dynamically, added an action pop-up to eliminate the node info pop-up and network scan columns button, added a Search field to reduce both node and job data, added alternating colored backgrounds in DataBrowser for OS X 10.4, reorganized the refresh button and other controls, following cues from Finder, iTunes, and Safari. All this while still properly supporting drag-and-drop between Network Scan panes and the Job window and preserving the previous look for OS X 10.2, 10.1, and OS 9. Added ability to automatically hide the Network Scan window. Generalized job aggregate data structures. Worked around bug that corrupted QuickDraw after editing DataBrowser item. Updated CarbonEvent handlers. Added routines to translate between HIRect and the traditional QuickDraw Rect. Added dozens of wrappers and constants for support routines for the new and updated controls in the Network Scan window. Added the ability to add complete node data, node acquire data, and arbitrary job data to the Job window, as well as being able to acquire a complete job record from the Job window.
Created and implemented new Grid Job Type, with the ability to launch a series of jobs, stepping through a range of integers for an executable, and to customize how the index is entered on the command line and how the index appears in the subfolders. Added actions on the event of a job ending, including creating a job log file to include in automatically retrieved files. Fixed a Job properties access bug when no files or nodes are present. Added Job name and Node List set/get accessors for job structures. Abstracted file list updates. Inserted workaround for jumping Job window drawers, especially in 10.4. Adjusting to changes in OS X 10.4, altered the Pooch StartupItem installation to properly set Unix permissions and eliminate excess files, and wrote a new mechanism based on CFPreferences to add Pooch to the Account Startup Items list. Created a new version of Pooch that operates in the Mach-O space without a user interface, adjusting all components to the new environment, and eliminating CFM wrapper routines, yet preserving the bark. Added implementations for menu items, configuration dialog, and Xgrid job launching routine. Reorganized internal Unix and utility routines. Generalized StringListHandle support routines. Altered network error reporting routines for selective reporting into poochlog file. Converted all external mentions of Rendezvous to Bonjour for 10.3 and later. Altered the Bonjour internal interface to optionally expose multiple addresses per node. Reduced potential memory leak in resolution callback. Installed correct call sequence to stop Bonjour's services resolver, eliminating "burden on the network" warnings. Added detection and reporting of nil infoPtr returned from NSLPrepareRequest. Added new triggers to the job launch queue.
Created all-new implementation of job queue internal data system to centralize the queue on the machine and update the format for OS X, implementing new job data storage and retrieval systems while automatically converting data from the old system. New queue is more efficient and reports on possible errors in poochlog. Redirected Launch mechanism messages to poochlog and factored node scan for automatic acquire code if running in Unix mode. Reduced declarations of Job properties structure. Generalized the file/folder scan structures and the user login detection routine. Reduced AppleEvent dependencies on old url data structures. Fixed scrambled name bug when importing nodes without names. Added ability to accept and expose node acquire data, file list entity, node list entity, and complete job data via AppleEvents. Added and implemented network scan and job scan, with new modifiers, AppleScript commands to acquire job data instead of node data. Detected new clock speed format for > 2GHz and fixed clock speed display. Added calculations using UTCDateTime. Altered 64-bit comparison calculations. Cached system version Gestalt calls. Updated FSRef routines to report descriptive errors. Created node and job data to text conversion routine for Search and other features. Generalized Node List structure manipulation routines. Added ability to minimize windows on command and detect mouse movement events. Added self identity modification address list implementation and import routines, generalizing friendly-address implementation. Fixed a potential bug when importing addresses while the Job window was open. Altered subdirectory creation to not delete existing directories. Changed Node Info window to use frames only before 10.3 and provide a horizontal scroll in the Job Queue view. Added accessors for strings in preferences file. Altered window names and hide Settings menu on 10.4 to follow OS X conventions. Added Job file type.
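The Grid Job Type described in the v1.6 details above steps through a range of integers for one executable. As a rough sketch of that idea, the Python snippet below builds one command line and one subfolder name per index. The function name, the {index} placeholder, and the folder-naming pattern are assumptions made for illustration, not Pooch's actual format.

```python
def expand_grid_job(executable, first, last, arg_template="{index}",
                    folder_template="job_{index:03d}"):
    """Return (subfolder, command) pairs for each index in first..last."""
    tasks = []
    for index in range(first, last + 1):
        # How the index is entered on the command line is customizable.
        command = [executable, arg_template.format(index=index)]
        # How the index appears in the subfolder name is customizable too.
        subfolder = folder_template.format(index=index)
        tasks.append((subfolder, command))
    return tasks


# Hypothetical executable and argument format, for illustration only.
tasks = expand_grid_job("./fractal", 1, 3, arg_template="--seed={index}")
```

Giving each index its own subfolder keeps the output of otherwise identical executables from overwriting one another.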
v1.5.5: September 15, 2004
Significant new features include:
- Support for jobs using LAM/MPI on local TCP/IP-based networks
- Support for mpich-gm, enabling Pooch to launch jobs on clusters connected using Myrinet hardware
- Forwarding a job via another node, such as a head node, to submit jobs to nodes behind a firewall
- Automatic output retrieval system, detecting and returning new and modified files once the job that wrote them is finished
- Better optimized and more reliable proxy address access
- Limiting access to a node from a specific set of IP addresses
- Choosing a preferred primary address if there are multiple
- Other minor feature enhancements and bug fixes
In detail: Reorganized job processing and archival routines. Added user data to job record before launch. Added logging to /Users/Shared/pooch/poochlog.txt. Added data recording to launch mechanism for LAM. Added logging of launch events to poochlog. Optimized calls to update Status Bar during launch, reducing flicker and excess CPU consumption. Factored forwarding job detection. Added preparation, access, command, and launch functions for mpich-gm and LAM, connecting them to the main launch engine. Cached calls to TickCount(). Added function and detection flags to redirect file collection for auto retrieve function. Added check for nil commID before proceeding with launch. Added calls to place Terminal.app in foreground when launching using Terminal. Added archiving of forwarded job data. Report when communication retry count exceeded. Added 3 second timeout for key rotation function during launch. Added recognition of G4 7447 and 7457 and G5 970 and 970FX in Network Scan Window, Node Info Window, and AppleEvent results. Added check for nil ProcessSerialNumber in quit and AppleEvent functions as workaround to prevent killing user space. Extended explicit collection of command-line data from ps command. Increased scope of pid search for wraparound process numbers. Prevent overflow of command string read from ps. Report reason and quantity limit in error string when user policy triggers job kill. Added Perl script routine. Added check for 0 process id to prevent logout of user space (#3805951). Added retention and processing of friendly addresses in listener functions. Redirect application file of watch list and archived job for multiple executable instance when launching multiple tasks per node. Added simultaneous send of multiple packet data to proxy controller routines. Added state requirements for mpich-gm launch command. Added state and explicit check for access to Internet Services and Open Transport before endpoint access with appropriate retry loopback. 
Report access from outside friendly address list. Added slave LAM command. Added AutoRetrieve directory command. Separated functions for network access and listener activity, making cluster access possible while another Pooch is running. Added primary address setting with Network menu modifications. Added pro version application AppleEvent variable. Added flags to node scan AppleEvent to report all and BUSY nodes, with a new scan performed only when requested or required. Added cancel launch AppleEvent. Added auto retrieve, forward via, and user name AppleEvent access to job data. Added mpich-gm and lam job types to AppleEvent access. Added LAM path access. Retrieve node status data even if communication is not complete. Added check to prevent user file addition to jobs. Include process path watch in BUSY check. Factored common preferences access. Cleaned up job view entry additions in Network Scan Window. Optimized and bottlenecked status text drawing in Network Scan Window, adding grayscale option. Adjusted initial order for job view. Overwrite job item entry only if node owns the job. Added ability to send multiple special commands to a node during network scan. Correctly kill multiple task per node jobs via Network Scan Window. Reorder node status access if special commands are to be sent. Optimized accumulated job history task. Made Rendezvous and NSLM registration sensitive to listener state. Added user interface for friendly address import in menu and preferences window. Adjusted User Admin Window column dimensions. v1.5.6: Prevented crash during deletion of one-year-old archived jobs.
v1.5: June 28, 2004
Significant new features include:
- The debut of Pooch Pro, introducing:
- User, group, and administrative account system, including secure password log in
- Quota and time limit-based management of cluster resources, including rollover minutes
- User database management with Unix-style user and group organization
- User associated job tracking and job history database
- A new modern, graphically-based user administration interface
- Other minor feature enhancements and bug fixes
In detail: New User menu and infrastructure for user log in and log out, password modification, user info display, user migration, user administration, and user database updating. New Show Job View menu item. New detection of User Idle Time, reporting mechanisms in the Node Info Window and the Node Scan Window and incorporation into the Rating function. Improved reporting of dialog status for compatibility with Carbon events. Pooch Pro requires log in for Job Window to function. Added recognition of correct user and group icons to the Job Window. Revised Job Window file acceptance for secure handling of user database files. Revised job launch rules incorporating user compute time quota calculations. Added flags to Status structure for Pooch Pro and encryption algorithm existence and User Idle Time. Extended Node Scan Window column infrastructure to allow over 32 columns while maintaining backwards compatibility. Corrected count of selection in Job View of Node Scan Window. Fixed activation of Get File Window and Get URL from Job View of Node Scan Window. Updated Pooch Version column of Node Scan Window for Pro label. Added Elapsed Time calculation and User Name column for jobs listed in the Node Scan Window. Restricted users from killing others' jobs, except for administrators, via the Node Scan Window. Added standard Okay/Cancel and double-entry sheets. Modified parallel progress bar structure for additional flags for Pooch Pro. Added version number to About Box. Added links to User Administration accessors for windows routines. Made Job submission visible to Pooch Pro code. Created wrapper library for access to Apple's CDSA implementation for strong encryption and decryption. Added mechanisms to filter commands requiring correct execution of job launch and other Pooch communication sequences. Added support for encrypted communications. Cleared out job handle on new connections. Added support for incorporation of new user data.
Added block for queuing system based on user compute time quota calculation. Fixed an issue where cancel was not accepted during the file info step of the launch sequence. Added Node Scan Window support for view specification on open. Added time interval to string conversion. Added mechanism to save current process watch list to disk in case Pooch is restarted. Changed to leave last call to ps in place. Added comparison of job execution time with user compute time quota calculation and maximum duration limits to process watch routine. Introduced new Pooch resource structure and support infrastructure for cross-platform compatible resource files. Corrected Job Archive Limit preference for year limit. New modules for user database management and file infrastructure, including password hash and user limit resolution. New administrative windows and dialogs. New job history accumulation and management organized by user. New user log in management for reporting of currently logged-in users' data. New Pooch Pro icon and About Box graphic.
v1.4.5: May 14, 2004
Significant new features include:
- Revised, renamed, and reorganized main menus for greater clarity and consistency with the OS X user-interface guidelines
- New proxy connection support for access behind a firewall when using the Remote Node Scan feature
- New user-specified control over port number selection range for compatibility with firewalls
- Enhanced node selection in the Job View and the Recent Items drawer of the Job Window
- Updated Pooch Preferences window reflecting new features and new behaviors
- Other minor feature enhancements and bug fixes
In detail: New Network menu. New hierarchical menus for Launch Unix Jobs, Access Nodes via, Local Node Scan, and Node Deregistration. New Job Port Range support and user interface. New Job Queue menu item in File menu. New Proxy Connection support for up to sixteen simultaneous bidirectional connections via the node assisting with the Remote Node Scan. New preference to prevent remote access to processes not launched by Pooch. Adjusted event callbacks to not filter key events when dialogs are open. Had Demo version remove poochdaemon at expiration. Changed Node Scan Window user interface, logic, and accessors for other windows to respond correctly to node selection in Job View, including double-clicking and contextual menus. Allowed multiple selection in Job View. Allowed network connections to nodes to continue while adding them to the Job Window. Fixed an issue which corrupted status information, particularly processor counts, when adding nodes to the Node pane of the Job Window. Added support for OS X-style, callback-supporting, single-entry dialog sheets supporting copy and paste. Report retry count in Status column of Network Scan Window when Verbose Error Reporting is on. Report proxy node in Determined Via column when a connection to a node is attempted via that proxy. When sorting by Status in Job View, secondary sorting occurs using Submission Time. Added proper sorting using Process ID. When new nodes are introduced to the Network Scan Window, SLP data is chosen first, letting Rendezvous data be reaccessed later. Generalized automatic rescan mechanism to extend time for proxy connections. Allowed for very long (up to 127 characters) DNS names for IP addresses. When proxy is available, Network Scan Window automatically alternates between proxy and direct attempts with a pseudorandom starting try, extending timeouts exponentially. Added support for simple OS X-style Cancel and Okay dialogs. Added proper support for Command-. to simple modal dialogs.
Added new and renamed preferences to Preferences Window, requiring its heightening. Adjusted menu disabling and hiding for new Network menu. By performing post-open triggers, worked around an issue where the Job Window's drawers, while opening, would randomly drag their parent window around the screen. Added drag-out support for items in Recent Items drawer. Allowed multiple selections in Recent Items drawer. Utilized new-style sheets for Job Options requiring them. Updated plst and vers resources to allow Version data to appear correctly in Finder. Extended Automatic Rescan interval to ten minutes. Added support for use of /Users/Shared folder instead of /tmp. Disallowed Kill Process in Node Info Window when no process data given. Corrected pid data reporting when concluding pid search. Supported restriction of process ID data. Passed proxy access flag through Status structure. Fixed node structure creator routine for correct behavior with nil strings. Added automatic listener endpoint check for failure and auto-recovery. Separated small network send buffer from receive buffer. Added response to T_DATA event on OTLookErr. Corrected bug in endpoint struct open routine to set TCP_NODELAY. Extended timeout delaying reuse of endpoint structures. Added dialog indicating success of Startup Items setting. Revised splash graphic.
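The v1.4.5 details above mention alternating between proxy and direct connection attempts, starting from a randomly chosen route and extending timeouts exponentially. A minimal Python sketch of that scheduling idea follows. The function name, the one-second base timeout, and the doubling factor are hypothetical; the actual timings used by Pooch are not stated here.

```python
import random


def plan_attempts(tries, base_timeout=1.0, rng=random):
    """Return (route, timeout) pairs, alternating routes with growing timeouts."""
    routes = ["proxy", "direct"]
    start = rng.randrange(2)  # randomly chosen first route
    plan = []
    for attempt in range(tries):
        route = routes[(start + attempt) % 2]    # alternate on every try
        timeout = base_timeout * (2 ** attempt)  # assumed doubling per try
        plan.append((route, timeout))
    return plan


# A seeded generator makes the illustration repeatable.
plan = plan_attempts(4, rng=random.Random(0))
```

Randomizing the first route spreads the load between the proxy and the direct path when many nodes are scanned at once, while the growing timeouts give slow connections progressively more room to answer.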
v1.4: January 2, 2004
Significant new features include:
- A new queuing and scheduling system for retaining jobs until specific conditions are met integrated with:
- A new job tracking system designed to track the progress and execution of jobs and retain historical data about terminated jobs
- An enhanced and updated graphical user interface that includes the new brushed metal look available in OS X 10.3 and later.
- The Node Scan Window is now the Network Scan Window, divided into a Node View and a new Job View mode to scan the entire network for queued, launching, running, terminated, and aborted jobs and display statistics about these jobs
- An updated Node Info Window displaying additional data and utilizing tabbed viewing and consistency with the OS X user-interface guidelines
- New drawers for the Job Window: the Recent Items drawer displays and recalls recently used files and nodes, and nodes of the remote node scan cache, and the Options drawer accesses all the options of the job
- An Automatically Acquire Nodes feature so jobs can automatically acquire nodes given a minimum number of nodes or processors to acquire starting at any specified node
- An updated, antialiased Start Time dialog for the Job Window
- A new Pooch Preferences window, accessible from the Pooch menu, featuring toolbar-style access to preferences in Pooch
- Enhancements to preferences and settings
- New installation choices, triggered by the option key, in the Pooch Installer
- A variety of minor bug fixes
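As a rough picture of the Automatically Acquire Nodes feature listed above, the following Python sketch takes idle nodes in order until a minimum processor count is met. The node tuples, the function name, and the greedy first-come selection are assumptions for illustration; Pooch's actual selection logic is not documented here.

```python
def acquire_nodes(nodes, min_processors):
    """Pick idle nodes, in the order given, until their processors total at least min_processors.

    nodes is a list of hypothetical (name, processor_count, busy) tuples.
    """
    chosen, total = [], 0
    for name, processors, busy in nodes:
        if busy:
            continue                 # skip nodes already running a job
        chosen.append(name)
        total += processors
        if total >= min_processors:
            return chosen
    return None                      # not enough idle processors; the job would keep waiting

cluster = [("a", 2, False), ("b", 1, True), ("c", 2, False), ("d", 4, False)]
print(acquire_nodes(cluster, 4))
```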
In detail: Added node list type to the Apple Event dictionary and enabled access to nodelist_ip information via AppleScript and interapplication communication. Added accessors to security authentication and authorization in OS X. Restructured Pooch utility code into six separate modules depending on category and type. Customized behaviors and appearance to OS X 10.2 and 10.3. Enhanced signaling and communication between Network Scan Window and the Apple Event support and launch modules. Prevented registration of 0.0.0.0 address via SLP. Prevented the automatic SLP registration retry from disturbing other code behavior. Upgraded CarbonLib headers to version 1.6. Added and implemented automatic node acquire mode of the Job Window and launching mechanism, with corresponding adjustments for job monitoring and display. Added Job Option drawer with corresponding controls. Added Recent Items drawer to Job Window. Added clipboard distribution feature, including pasting into the Job Window and copying using remote data. Corrected handling of nil AllAddresses data from Rendezvous. Searched through all addresses resolved by Rendezvous for valid addresses, rather than only using the first. Correctly tallied Rendezvous addresses with repeat accesses through the Network Scan Window. Responds to Rendezvous discovery events at all times while the Network Scan Window is open, rather than just during the SLP scan. Added and implemented zoom button for Get Files window. Allowed Node Info and Job Queue to be invoked when viewing the Get Files window. Enabled delayed trigger for Get Files and Node Info windows for compatibility with Carbon Events. Used smaller system font for consistency with Finder. Replaced old Job IDs for the queuing system with Queued Job IDs. Implemented Job Timeline data structures to hold past, present, and future job data. Implemented job archive and retrieval mechanism, establishing Job resource type.
Updated Process Watch List to incorporate job as well as process data. Implemented Job Timeline accessors for Job Archive, Process Watch List, and Job Queue. Added launch mechanism to Network Scan Window subscriber list. Prevented uninitialized data from entering the Apple Event identification routine. Accessorized Pooch version data for access by multiple parts of Pooch. Fixed an issue where the Apple Event node scan command would retrieve BUSY nodes. Limited node communication retries in the launch mechanism to 10, then reporting failure. Separated submission jobs versus working jobs in the launch mechanism, including appropriate resubmission into queue. Bridged compatibility between new Job Properties data and older Job Queue accessors and kill mechanism, including automatic acquire features. Check for null address in job data. Allowed job submission to place new jobs at either end of queue. Allowed display of IP address in launch status window if no name is present. Once launched, all processes are now submitted with job and file data for full recognition and tracking. Correctly initialized LaunchApplication records in launch mechanism. Enabled preparation and retrieval of Job Timeline data on remote nodes with GetJobTimelineCommand. Attempted to register network communication activity as user activity for screen savers. Track and retain data about aborted jobs. Implemented command-line argument Job Option on remote nodes. Performed a+w permission operation on all files created by Pooch. Prevent caching of 0.0.0.0 IP address. Returns true when checking if 127.0.0.1 is self. Created and implemented Pooch Preference window with three panes. Implemented appropriate Carbon Events to invoke and drive the Pooch Preference window. Added preference to kill the Settings menu. Conversion of Node Scan Window to Node View and Job View of Network Scan Window with addition of Carbon Events, the brushed metal look, and related control modifications. 
Added mechanisms to retrieve job information and coordinate the killing of jobs across multiple nodes. Added and implemented Zoom button to Network Scan Window. Generalized, and updated with antialiased graphics, the Node Registration interface for use with the Start Time Job Option. Implemented little arrows controls for arrows of the Start Time Job Option. Job Window rejects files if nil is passed for file specification. Check for possible null node names and addresses in Recent Node Lists. Appended Recent Nodes and Files, rather than replace the present node and file lists in the Job. Added Carbon Events and brushed metal look to Job Window. Separated job recall into node list and automatic acquire modes. Job Window edits keyboard focus. Accessorized self node addition to Job Window. Allowed specification of initial Job Window's node list versus automatic acquire option. Eliminated OS 9-style brushed metal when in OS X. Relabeled Launch Job button depending on Job Options. Responded to modified dragging and tracking in Job Window while using Carbon Events. Generalized IconRef access for files and special cases. Use of OS X-style display name instead of actual file name in file list pane of Job Window. Modified highlight region drawing for brushed metal look and automatic acquire option. Added Job ID and submission time initialization when Job Window submits jobs. Abstracted control updating separately for Job Window and its drawers. Embedded Node Info Window's data display in its new two tab control. Node Info window lists changed to smaller system font. Node Info Window no longer disables Node menu inappropriately. Progress pie in Node Info Window is antialiased in OS X. Rearranged data display to hold more data and animate as retrieval completed. Allowed for scrolling in Job Queue mode. Accepted supplementary Status structure data. Display proper rounding for OS version numbers. Worked around potential race condition during fast logout to login transition.
Created IconRef accessor to icon resources. Created and implemented Job Archive Time Access Interval preference. Enabled access to Preference Window through Pooch menu item. Enabled and implemented Pooch at Logout toggle with administrative authorization. Added application and other main event implementation through Carbon Events. Extended Network Scan Columns controls in three columns. Added and implemented Job Item lists, with disclosure triangles, and new Job-related display columns for Job View, compiled from retrieved data. Implemented acceptance of supplemental status data, Job Timeline retrieval, and multiple node commands. Designed new suite of Job icons, Network Neighborhood icon, Node Info icon, and others. Added recognition of PowerPC G5 CPU type to Network Scan Window and AppleEvent interface.
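Two of the address checks in this release (never caching or registering 0.0.0.0, and treating 127.0.0.1 as this node) can be illustrated with Python's standard ipaddress module. This is a sketch of the idea only; the function names and the list of the node's own addresses are hypothetical.

```python
import ipaddress

def is_cacheable(addr):
    # 0.0.0.0 is the "unspecified" address and never identifies a real node.
    return not ipaddress.ip_address(addr).is_unspecified

def is_self(addr, own_addresses):
    ip = ipaddress.ip_address(addr)
    # A loopback address such as 127.0.0.1 always refers to this machine;
    # otherwise compare against the addresses of the local interfaces.
    return ip.is_loopback or ip in {ipaddress.ip_address(a) for a in own_addresses}

print(is_cacheable("0.0.0.0"), is_self("127.0.0.1", ["192.168.1.5"]))
```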
v1.3.5: June 20, 2003
Major new features include:
- Support for launching parallel jobs on nodes while they are logged out. This new capability is appropriate for the new Compute Node Xserve and for running on administered Macs such as in a student lab.
- Support for a new suite of command-line utilities
- Improved job support for Unix access
In detail: Added implementation to monitor login/logout state so that it will hide its entire menu bar when logged out and quit if the login status changes. Added support for Carbon events to support the Dock menu items and horizontal and vertical scroll wheels in the Node Scan, Get Files, Node Info, and Job Windows. Added support for contextual menus in the Node Scan and Job Windows. Minimized the appearance of the menu bar while running at logout. Implemented retention of additional column sort criterion and direction information in the Node Scan Window. Implemented time out reset when OT support was off when the user wanted to perform a Node Scan. Added code to prevent the IP address of a node in the Node Scan window from being overwritten while communication with that node was in progress. Added support and preferences for saving files in the Temporary Items folder and /tmp/pooch in addition to the folder where Pooch resides. Implemented Reveal in Finder AppleEvent support for sandboxes. Modified launch mechanisms and job window accessors to accept Unix command-line arguments and convey them to the executables, if applicable, for all job types. Added code to change Pooch's Unix umask so all can read and write files that Pooch creates. Optimized the Recent Node and File List menus so that the menu would not take so long to appear. Added interface and implementation for opening File Sharing from any window that specified a node. Added and implemented skip node zero, job type, tasks per node, and Unix command-line arguments properties to the job definition and Okay/BUSY status and time stamp to the node definition in AppleScript interface. Fixed a bug in the AppleScript interface and the Node Scan Window that reported negative memory if 2 GB of RAM was installed and increased the detection limit to 4 GB.
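The negative-memory symptom at 2 GB is what one would expect if a byte count were held in a signed 32-bit integer, and the new 4 GB limit matches an unsigned 32-bit one. That cause is an inference, not something this manual states. The Python sketch below merely demonstrates the arithmetic using ctypes fixed-width integers.

```python
import ctypes

GB = 1024 ** 3

def as_signed_32(n):
    # ctypes wraps out-of-range values the way a C int32 would.
    return ctypes.c_int32(n).value

def as_unsigned_32(n):
    return ctypes.c_uint32(n).value

print(as_signed_32(2 * GB))        # 2 GB no longer fits: the sign bit is set
print(as_unsigned_32(2 * GB))      # the same bits read as unsigned
print(as_unsigned_32(4 * GB - 1))  # largest byte count an unsigned 32-bit value holds
```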
Improved the Node Scan completion queue in the AppleScript interface to more intelligently allow the needed amount of time for communication with other nodes to complete, including while performing a Remote Node Scan. Added a more intelligent Pooch Preferences folder locator in case the normal Preferences folder was not available. Implemented high-level IP address caching and comparison routines with corresponding changes in the launch mechanism and the job window, for cases where Pooch's node has more than one IP address. Implemented a workaround in the Node Registration Schedule dialog so the day selection menu would redraw properly when switching schedule types. Fixed automatic benchmark starting mechanism when the node is not registered. Created a caching mechanism for ps access. Improved internal support for variable length job options, including command-line arguments. Changed default sandbox preferences. Changed launch mechanism to only put Finder in foreground if Pooch knows it is in the foreground. Optimized mpich launch timing. Changed the behavior of the Subdirectory option of the Job Window to be more likely to create a new subdirectory name rather than reuse the old one. Created a workaround and warning for blank node names, a circumstance that Rendezvous will not accept. Eliminated extraneous diagnostic reporting in Rendezvous implementation. Created a workaround for a DataBrowser display while scrolling bug in 10.2.3. Created a workaround to maintain background colors after DataBrowser reset them to white. Added command-control-R key command to reset Pooch's network and node settings, in case OS X did not notify Pooch.
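The umask change in this release, which lets all users read and write files that Pooch creates, can be illustrated on any Unix system. The Python sketch below is not Pooch's code: the function name and file name are hypothetical, and it clears the mask only around a single file creation, whereas Pooch may set it once for the whole process.

```python
import os
import stat
import tempfile

def create_shared_file(path):
    """Create an empty file that every user may read and write; return its mode bits."""
    old_mask = os.umask(0)          # clear the mask so no permission bits are stripped
    try:
        fd = os.open(path, os.O_CREAT | os.O_WRONLY, 0o666)
        os.close(fd)
    finally:
        os.umask(old_mask)          # restore the caller's mask
    return stat.S_IMODE(os.stat(path).st_mode)

with tempfile.TemporaryDirectory() as folder:
    mode = create_shared_file(os.path.join(folder, "output.dat"))
    print(oct(mode))
```

With a typical default umask of 022 the same call would yield a file writable only by its owner, which is a problem when a job's output must later be modified by another user.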
v1.3.1: December 31, 2002
A few minor bug fixes and cosmetic updates. In particular, a bug that only occurs in OS X 10.1 was fixed having to do with the Node Scan features of Pooch and the new Rendezvous implementation.
v1.3: October 4, 2002
Major new features include:
- Support for launching multiple tasks per physical node, making it possible to use the same parallel communications API for parallel computing both within a box and outside the box
- Registration, discovery, and resolution via Rendezvous, a.k.a. ZeroConfig
- Pooch Lite - a special version of Pooch with no user interface useful for running on administered Macs such as at a student lab
In detail: Created complete Rendezvous registration (with auto-retry), continuous browse, and on-request resolution implementations. Revised Pooch's OS X process acquisition code to focus its search for processes it launched. Excluded requests for null process number watches. Keep track of jobs by session ID so that multiple tasks can be associated with a particular job. Adjusted for differences in ps on OS X 10.2 versus 10.1. Deletes previous mpich procgroupfile before writing a new one. Extended internal job data format and accessors to handle node information for Tasks per Computer and Use Port Number features. Added internal time stamp to node status information. Implemented discovery accessors and additional job option retention and added these settings to preferences file. Modified slave and master node launching mechanisms for multiple tasks per computer for all MPI types. Modified launch sequence to always query for latest status from nodes before launch and update internal job data. Launch process now forwards all job data to slave nodes before launch. Created complete recursive file and folder copy mechanism. Modified master and slave node to recursively copy executables and files on command. Implemented wait states for mpich launching due to fragility in mpich's p4. Job Window updates node 0 name, address, and status information of the current job if node 0 was this node. Fixed possible memory leak in recent node and file list constructors. Created task per computer counter for Job Window node list display. Fixed node list display error if name was zero length. Adjusted to condensed font if node numbering became too wide. Changed default subdirectory job option suffix from to folder. Implemented new Tasks per Computer and Port Number job options in Job Window. Modified job window accessors to handle extra node information. Set Running Applications list to order by descending CPU time by default, rather than ascending application name.
Optimized Node Info access and disconnection speed. Created higher-level AppleEvent node information accessors and creators and extended implementations to AppleEvent and Job Window accessors. Added AppleEvent access to new Job Window features. Implemented two second minimum time for node scan command to complete to allow Rendezvous and SLP discovery time to initialize. Implemented for best processors clause. Fixed node scan request queue bug that prevented more than one request from being queued. Added time stamp property for typeNode AppleEvent data. Added information to help tags for Node Scan Window pop-ups. Fixed bug that prevented node scan columns from being reorganized in OS X. Generalized Node Scan Window node list accessors to handle NodeStruct types from Rendezvous in addition to raw URL data from NSL/SLP. Added reset NSL/SLP list accessor. Encapsulated slaving launching mechanisms for each MPI type. Created workaround for CloseOpenTransport hang bug. Connected node scan tunnel to Rendezvous in addition to NSL/SLP. Extended slave node launching mechanisms to handle multiple Tasks per Computer feature. Separated MPI/Pro launch into its own command. Created and implemented retain job information and duplicate files commands.
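For readers unfamiliar with the mpich procgroup file mentioned in this release, the Python sketch below generates one for a job with several tasks per computer. It assumes the ch_p4 procgroup layout as commonly documented for mpich (a "local" line for the master, then one "host count path" line per remote node); the host names, path, and function name are hypothetical, and the exact file Pooch writes may differ.

```python
def procgroup_text(nodes, program_path, tasks_per_node=1):
    """Return procgroup file contents; nodes[0] is the master (node 0)."""
    # The master's own process is implicit, so "local" counts only the extra local tasks.
    lines = ["local %d" % (tasks_per_node - 1)]
    for host in nodes[1:]:
        lines.append("%s %d %s" % (host, tasks_per_node, program_path))
    return "\n".join(lines) + "\n"

text = procgroup_text(["node0", "node1", "node2"], "/Users/Shared/pooch/test", tasks_per_node=2)
print(text, end="")
```

Writing this text to a fresh file for each launch, after deleting the previous one, avoids a stale node list being picked up by the next job.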
v1.2.1: June 30, 2002
When launching jobs via the Terminal.app while Terminal is not yet running, added a search by the name Terminal.app in addition to its signature code. Improved the error code reporting in case Terminal.app launching fails. If two copies of Pooch are launched, and the first has already set up its network connections, the second copy of Pooch will remain inert, retrying once per minute, rather than interfere with the first. If NSLStandardDeregister fails, Pooch will report the error, rather than simply beep. Changed the Node Scan Window code so the browser list sort and movability default settings are more consistent across columns. Changed the Rating column margin setting to 5 pixels on OS 9. Corrected a problem in the Job Window where the Launch Job button would remain disabled after using the Recent Node Lists menu. Updated this manual's mpich description to recommend installation in the home directory.
v1.2: March 27, 2002
Major new features include:
- Support for jobs compiled with mpich and MPI-Pro
- Support for Cocoa apps and other bundle-based applications
- Complete support for Unix-based executables, including Mach-O executables and Unix scripts
- Complete support for Unix-style permissions, processes monitoring, process kills
- Registration of nodes on the network can now be scheduled
- Recently selected nodes and files can be recalled
In detail: Added the ability to recognize folders, Unix scripts, bundles, and bundle icons to the Job Window. Added the flexibility of adding more than one executable to the Job Window, while maintaining that the first file is the first executable, even when files are later removed, and labeling that executable with a . Altered the style specification of columns of the Job Window, Node Scan Window, Node Info Window, and Get Files Window for greater consistency between OS X and OS 9. Altered the internal control and window spacing specification for greater consistency throughout Pooch. Altered the Job Options pop-up to accept submenus. Added and implemented the Job Type submenu to the Job Window. Altered Job Window so its node list displays IP address when there is no name. Altered the Node Info Window text display, Node Scan Window node count placard, Get Files placard, and about box to use the correct antialiased fonts when running on OS X. Added calls to the Remote Node Scan cache of IP addresses to warm up the DNS search when the Node Scan Window is opened rather than only when the pop-up is first clicked. Fixed a bug preventing help tags from appearing under some circumstances. Corrected the status string region of the Node Scan Window to enable itself more quickly when its window is made active. Added the Clock Speed column to the Node Scan Window. Prevented the remote node scan node information from being overwritten prematurely. Prevented the Node Scan Window from displaying too many nodes. Optimized the parallel node query retry mechanism. Changed the ThemeWindowBackground of the Start Time dialog and single entry dialogs to the default on OS X. Added code to save and retrieve information about a series of recently launched jobs. Added the recent node and file list submenus with abbreviated names and the ability to restore them to the current Job Window. 
Changed the Register this Node preference so that it could accept Always, Never, and Scheduled modes instead of just on and off. Added Node Registration Schedule dialog, switchable between Everyday, Weekend/Weekday, and Weekly modes, with corresponding implementation in the node registration code, while attempting to follow Apple's HI guidelines. Added implementation and interface to save, recall, and display recently used nodes, files, and applications for new jobs, including on-the-fly abbreviation of node names in menus. Correctly retrieve and recognize Unix permissions of all files and folders on OS X, then use that information to recognize Unix-based applications. All permissions of files and folders are preserved when copying between OS X machines. Added code to correctly recognize, copy, and launch application bundles, especially Cocoa apps. Added the option to launch Unix-based executables via a Unix tcsh script rather than through a call to system. Added code to recognize Process Manager processes by signature. Added the option to launch Unix-based executables through the Terminal app. Terminal is called by sending AppleEvents. If Terminal is not running, it is launched via Launch Services. Added the ability to kill -9 processes on OS X. Added code for writing procgroup files and specifying Unix command-line options to pass cluster information to the mpich master and slaves. Added environment variable specification and launch code for MPI-Pro jobs. Added the ability to kill all running processes tracked by Pooch at once. Added utilities to extract and search for arbitrary information, including searching for particular POSIX paths or particular process ID numbers, from ps output. Organize process tracking by process ID on OS X. Tracks all Unix processes, rather than only those reported by the Carbon Process Manager. Added tracking of Unix process creation by POSIX path, even if the process takes up to a minute to appear in ps.
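Searching ps output for a POSIX path or a process ID, as described above, amounts to splitting each line into a PID and a command. The Python sketch below works on captured text; the column layout (as from a command like ps -axo pid,command), the sample processes, and the function names are assumptions, and executable paths containing spaces would need more careful handling.

```python
def parse_ps(output):
    """Parse two-column 'PID COMMAND' text into a list of (pid, command) pairs."""
    rows = []
    for line in output.splitlines()[1:]:          # skip the header row
        parts = line.split(None, 1)               # PID, then the rest of the line
        if len(parts) == 2 and parts[0].isdigit():
            rows.append((int(parts[0]), parts[1].strip()))
    return rows

def find_by_path(rows, posix_path):
    # Match on the executable path, ignoring any command-line arguments.
    return [pid for pid, cmd in rows if cmd.split()[0] == posix_path]

def find_by_pid(rows, pid):
    return [cmd for p, cmd in rows if p == pid]

sample = """  PID COMMAND
  312 /usr/sbin/cron
  845 /Users/Shared/pooch/test -n 4
"""
rows = parse_ps(sample)
print(find_by_path(rows, "/Users/Shared/pooch/test"), find_by_pid(rows, 312))
```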
Extended data structures and added implementations to supply Unix process information to other nodes in addition to Process Manager information while retaining backwards compatibility. Added node registration information and implementation to AppleScript interface. Added job type to AppleScript job class type and corresponding implementation to specify MacMPI, mpich, and MPI-Pro job types. Correctly clear the job options structure when there are no options present in a job data structure. Added job type, subdirectory, and start time accessors for job data structures. Added and implemented Kill All Jobs on Deregister, Recent Node Lists Length, Recent Apps and Files Lists Length, and Use Terminal For Unix Jobs preferences. Added and implemented the Accept Remote Preferences Command preference in a private resource that cannot be remotely overwritten. Added and implemented Job Type automatic reloading preference. Schedule more time to launching process during a parallel launch, improving launch speeds. Encapsulated and optimized the Parallel Launch Status window so that it updates itself much more smoothly and cleanly and uses internal timers to leave only a minimal burden on the launch process. Corrected an error on OS X where garbage might appear in the about box when NSL/SLP attempted to register. When NSL/SLP fails to successfully register (e.g., when the network interface is unavailable), Pooch retries with decreasing, random frequency, rather than at regular intervals. Changed the Select Apps and Files dialog to accept folders, bundles, and multiple selections. Correctly report network, hard drive, and system activity to the Power Manager when remote requests and commands occur or when launching jobs. Created new commands to launch bundle-based applications, mpich executables, and MPI-Pro executables, to create a new subdirectory and specify a bundle application simultaneously, and to retrieve OS X file information. 
Changed the change directory command to accept a parent directory specification. Extended the set file information command to accept OS X file information. Fixed a bug where the Parallel Launch Status window progress bars would jump. Altered send file sequence of the launch process to recursively explore a folder structure using FSOpenIterator and FSGetCatalogInfoBulk. Altered the TCP port number information. Provided correct rounding of processor clock speed in Node Info Window, Node Scan Window, and AppleScript interface. Recognizes the PowerPC G4 7455. 36,326 lines of code.
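The NSL/SLP registration retry with decreasing, random frequency noted in this release resembles what is now commonly called randomized backoff. The Python sketch below shows one way to generate such delays; the five-second starting delay, the doubling factor, and the function name are assumptions, not Pooch's actual values.

```python
import random

def retry_delays(attempts, first_delay=5.0, growth=2.0, rng=None):
    """Return waiting times (seconds) before each retry, randomized and growing."""
    if rng is None:
        rng = random.Random()
    delays = []
    ceiling = first_delay
    for _ in range(attempts):
        # Draw from the upper half of the current window so delays never shrink,
        # while the randomness keeps many nodes from retrying in lockstep.
        delays.append(rng.uniform(ceiling / 2, ceiling))
        ceiling *= growth
    return delays

print([round(d, 1) for d in retry_delays(5, rng=random.Random(0))])
```

Compared with retrying at a fixed interval, this puts less load on a network interface that is slow to come up and avoids synchronized bursts when many nodes start at once.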
v1.1.2: January 11, 2002
Added Node Scan Columns Show Columns window. Optimized and made more fault tolerant, because of particular behaviors in Open Transport, the initial connection algorithms in the launching mechanism for greater speed and reliability when launching onto large numbers of nodes. Reorganized and rearranged the Job Window and Node Info Window for almost complete compliance with Apple's Aqua user interface guidelines. Made the job window horizontally resizable. Added recognition of the PowerPC 7450. Under OS X, made the Remote Node Scan, Open Apps and Files, Place in Subdirectory, and Start Job at Time dialogs appear as sheets. Modified the connection algorithms in the Node Scan Window to attempt access to all found nodes once before giving other nodes a second chance to connect. Modified timeouts and other timings for the Node Scan Window, Node Info Window, and the launch process. Greater use of DrawStringUsingThemeTextBox, with the correct font and size selections, in the Node Info Window, the Job Window, the Node Scan Window, the Get Files window, and the About Box for smoother text under OS X. Added Always Include this Node in Local Scan setting with corresponding implementation. Added Accept Commands from Local Subnet Only setting with corresponding implementation. Gave the launch process code and network scan engine code greater encapsulation. Instructed Pooch to wake up the Macs upon successful launch on all nodes. Modified to recognize window classes correctly when determining the frontmost window. Corrected certain behaviors in the launch progress bar. Corrected a cosmetic error when the network scan code reports its registration. Corrected internal network structure allocation code for the case when a part of Pooch requests to connect to more than 64 other Pooches. Correctly notifies the system of network activity when Pooch receives commands. Acquires the correct IP address when multiple network interfaces are active.
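The revised connection order in the Node Scan Window, where every found node gets one attempt before any node gets a second, behaves like a first-in, first-out retry queue. The Python sketch below illustrates that ordering only; the function names, the three-try limit, and the stand-in connection test are hypothetical.

```python
from collections import deque

def scan(nodes, try_connect, max_tries=3):
    """Attempt each node in turn; failed nodes rejoin the back of the queue."""
    queue = deque((node, 1) for node in nodes)
    order, connected = [], []
    while queue:
        node, attempt = queue.popleft()
        order.append(node)
        if try_connect(node, attempt):
            connected.append(node)
        elif attempt < max_tries:
            queue.append((node, attempt + 1))   # retry only after all others had a turn
    return order, connected

# Stand-in for a real connection: node "b" answers only on its second try.
order, connected = scan(["a", "b", "c"], lambda node, attempt: node != "b" or attempt == 2)
print(order, connected)
```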
v1.1: October 2, 2001
Fully operational on OS 9.x with CarbonLib 1.2.x and today's OS X 10.1. AppleScript interface implemented and supported, making customized parallel launches, customized job queues, Unix command-line calls to Pooch, and fully automated parallel application launching possible.
Worked with Apple to fix NSL, Open Transport, and CoreServices bugs in 10.0.x, leading to the fixes in 10.1 release. On X, now uses the Unix ps command to determine load distribution on a node. On X 10.1, uses CSCopyMachineName to determine the computer name. Heuristic node rating added and exposed to the Node Scan Window and AppleScript interfaces. Add Best pop-up replaces Add All button, leading to alteration of corresponding selection list calculation routines. Revised Node Scan Window code so that, when automatic rescan occurs, the scan results no longer disappear and reappear when being updated. Nodes selected for launch in the Job Window now appear in the Node Scan Window in gray, rather than disappearing. Revised Network Neighborhood Node Scan pop-up menu and behavior. Added the ability to recognize and launch Mach-O, Unix-based executables. Altered Open Transport endpoint settings and optimized background time calculation and internal task code to improve network performance, file transfer speed, and launch speed. When remembering the last job and node 0 was itself, updates the node 0 IP address if it has changed. Added Verbose Error Reporting setting to reduce nuisance calls. Added the job option to skip the launch of node zero. Internal window code structure reorganized and factored to a greater degree. In X, ATSUI text methods now used for custom content Data Browser columns to match font type and size of the other columns.
v1.0.2: May 2, 2001
Fully operational on OS 9.x with CarbonLib 1.2.x and today's OS X.
Prevented a possible circumstance where Pooch's connector endpoint would continuously reject an incoming password. Added code to the Set Pooch in Login Items to operate correctly when there is no loginwindow.plist file. Fixed a cosmetic bug in the Job Launch status window where the text could overlap. Fixed a cosmetic bug so that the node list font looks the same as the file list font. Corrected a Launch Job button behavior. The DNS translation routines correctly report a dotted quad.
Remaining known issue on X: The problem of the Process Manager returning zero is still present.
v1.0.1: April 25, 2001
Fully operational on OS 9.x with CarbonLib 1.2.x. This Pooch contains workarounds for known bugs in OS X, so this Pooch should be fully operational with today's X.
Known issues in OS X 10.0.1:
Pooch uses the Process Manager to monitor the CPU time taken by other apps and estimate load. However, Apple's documentation says the Process Manager on X returns zero CPU time for all processes, making these calculations impossible directly in Carbon. A workaround through Unix may eventually be devised.
During the search for nodes, NSLM does not respect maxSearchTime parameter, resulting in infinite searches. The current workaround is to set maxSearchTime to zero and use NSLCancelRequest.
The official way to access the user-specified computer name entered in the Sharing system preference, CSCopyMachineName, returns the wrong name. Another way is being used.
Setting Pooch in Login Items automatically: there is no documented way to do that, so I edit the loginwindow.plist file myself.
A few cosmetic errors remain, such as custom content Data Browser columns not quite matching the font size of standard text Data Browser items.
v1.0: April 17, 2001
First public release - Fully functional on OS 9.x with CarbonLib 1.2.x. Due to known bugs in OS X, Pooch on OS X cannot be officially supported. Over 23,000 lines of code in C.
Known issues in OS X Release Candidate:
During search for nodes, NSLM does not respect maxSearchTime parameter, resulting in infinite searches. Attempts to cancel the search result in a hang. However, this problem does not restrict launching a job onto OS X machines from an OS 9 machine.
There is no reliable way to access the user-specified computer name entered in the Sharing system preference. The official way, CSCopyMachineName, returns the wrong name.
An assortment of cosmetic errors, such as Aqua buttons not being compatible with texture backgrounds.
X. Credits
Pooch Concept, Code, Engineering, and User-Interface Design
Dean E. Dauger, Ph. D.
Pooch Icon Art Jean-Marie Venturini
Advice and Brainstorming Viktor K. Decyk, Ph. D., Alan B. Dauger, Ph. D.
Kevin T. Sinclair, EE & MBA, Robert M. Zirpoli, III, M. E., Catherine C. Venturini, M. S.
Beta Testing Viktor K. Decyk, Ph. D., Frank S. Tsung, Ph. D., John W. Tonge, Ph. D.
Compiler Metrowerks CodeWarrior Pro 6
We also wish to thank employees of Apple Computer for their assistance, technical and otherwise, who helped demystify the future of the Mac OS and saved a lot of headaches:
Tim Parker, Warner Yuen, Robert Kehrer,
John Tucker, Kevin Arnold, Steve Zimmer, Ernest Prabhakar, Ph. D.
XI. Contact Information
Pooch is released by Dauger Research, Inc. The latest version of this document is available from the Pooch web site. We are open to suggestions and wish lists regarding parallel computing issues. Depending on your needs and the nature of the request, custom configurations of Pooch can be arranged.
Dauger Research, Inc. http://daugerresearch.com/
Pooch web site http://daugerresearch.com/pooch/
The Dauger Research Vault, providing tutorials, sample code, and other documentation http://daugerresearch.com/vault/
Pooch technical support, questions, and feedback pooch@daugerresearch.com
Dauger Research mailing list info http://daugerresearch.com/mailing-list/
The latest MacMPI libraries are available from the AppleSeed web site.
AppleSeed http://exodus.physics.ucla.edu/appleseed/
UCLA Plasma Physics Group http://exodus.physics.ucla.edu/
Pooch and this document are Copyright © 2001-6 Dauger Research, Inc.
Macintosh, Mac OS, and Mac OS X are trademarks of Apple Computer, Inc.
Last Update: May 15, 2006
Appendix. Compiling the MPIs
This Appendix was written to provide basic information on compiling your code with MacMPI_X, mpich, and MPI-Pro and running these codes in Pooch. Most of this information is derived from outside sources and is not necessarily complete for all purposes. We provide this information as a convenience. Note that these MPIs were not created and are not maintained by Dauger Research, Inc., so this information is subject to change at any time.
MacMPI_X
MacMPI_X, an Open Transport/Carbon-based implementation of an MPI subset, is available from the AppleSeed Development web site:
http://exodus.physics.ucla.edu/appleseed/dev/
Both C and Fortran versions are available with corresponding header files. An introduction to MPI, some basic examples of MPI programming, and links to references on MPI are available on the site as well. The following information is derived from the above link. Although MacMPI is only a subset of MPI-1, Dauger Research, Inc., recommends its use because of its long history of reliability on the Mac OS, its wide flexibility compiling in different environments, and its extremely helpful diagnostics.
Compiling MacMPI_X.f with the Absoft Pro Fortran v7.0 compiler in OS X's Terminal
Download MacMPI_X.f and mpif.h and include them in the same directory as your Fortran source. Compile MacMPI_X using:
f77 -O -c MacMPI_X.f
which generates MacMPI_X.o. To compile your code, you will need to link with MacMPI_X.o and the Carbon libraries. For example, depending on whether you are using Fortran 77 or Fortran 90/95 to compile test.f, you would use commands like these:
f77 -O test.f MacMPI_X.o /System/Library/Frameworks/Carbon.framework/Carbon
f95 -O test.f MacMPI_X.o /System/Library/Frameworks/Carbon.framework/Carbon
These commands create a Mach-O executable, which Pooch should recognize. By default, the Absoft compiler on OS X creates executables with a 512 kB stack. If your application needs more, you may wish to use the -s flag or the limit command.
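The stack limit discussed above can also be inspected from inside a running program. The following is a minimal C sketch, assuming that the limit command refers to the shell's stack size limit, which POSIX exposes as RLIMIT_STACK through getrlimit. It is written in C rather than Fortran, and the function names stack_limit and report_stack_limit are hypothetical; they are not part of Pooch, MacMPI, or the Absoft tools.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/resource.h>

/* Fetch the current (soft) stack limit of this process.
 * Returns 0 on success, -1 if the limit cannot be queried.
 * On success, *unlimited is 1 when no limit is imposed (and *bytes is 0);
 * otherwise *unlimited is 0 and *bytes holds the limit in bytes. */
int stack_limit(unsigned long long *bytes, int *unlimited)
{
    struct rlimit rl;

    assert(bytes != NULL && unlimited != NULL);
    if (getrlimit(RLIMIT_STACK, &rl) != 0)
        return -1;
    *unlimited = (rl.rlim_cur == RLIM_INFINITY);
    *bytes = *unlimited ? 0ULL : (unsigned long long)rl.rlim_cur;
    return 0;
}

/* Print the limit in kB so it can be compared with the 512 kB default. */
void report_stack_limit(void)
{
    unsigned long long bytes;
    int unlimited;

    if (stack_limit(&bytes, &unlimited) != 0)
        printf("stack limit: unknown\n");
    else if (unlimited)
        printf("stack limit: unlimited\n");
    else
        printf("stack limit: %llu kB\n", bytes / 1024ULL);
}
```

Calling report_stack_limit() near the start of a test program, before and after changing the limit in the shell, is one way to confirm that the change reached the process.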
Compiling MacMPI_X.f with the Absoft Pro Fortran v7.0 compiler on OS 9
Again, you will need to download MacMPI_X.f and mpif.h into the directory where your source is located. Compile MacMPI_X using:
f77 -O -c MacMPI_X.f
which generates an object file named MacMPI_X.f.o. To compile your main code, test.f in this example, in Fortran 77 or Fortran 90, use commands like these:
f77 -O test.f MacMPI_X.f.o
f90 -O test.f MacMPI_X.f.o
This version of the Absoft compiler on OS 9 will create a Carbon application by default, which should be executable on both OS 9 and X. Pooch should be able to recognize and launch this application.
Compiling MacMPI_X.c with Metrowerks CodeWarrior Pro 6 for both OS 9 and X
There are two major options for using MacMPI_X.c in CodeWarrior: 1. using the Standard C Console window; and 2. creating a Macintosh application. In most cases where you are porting an ANSI C-compliant code from another platform, you would probably want to use the Standard C Console. A Macintosh application (the Power Fractal app and Parallel Fractal Demo are examples) would need to know how to organize menus, windows, and so forth.
Standard Console C
If your C source code is standard ANSI C (i.e., this code is multiplatform and uses ANSI C calls like fprintf and scanf), you should start creating your project by selecting the Mac OS C Stationery category. Under the Standard Console > Carbon category, select the Std C Console Carbon stationery. Add your source as you normally would. Then add MacMPI_X.c. Be sure mpi.h is in your source directory as well.
There are a few settings that should be set for the best behavior of your executable. In the Target Settings Panel named "PPC Target", edit the 'SIZE' Flags to use localAndRemoteHLEvents. Also in the same panel, be sure to set the Preferred Heap Size so that your code will have enough available memory. In addition, your code will need to edit certain run-time console flags. Before your main() code, add #include <SIOUX.h>, and at the top of your main() code add these two lines:
SIOUXSettings.asktosaveonclose=0;
SIOUXSettings.autocloseonquit=1;
This makes the app quit when it falls out of main(). Otherwise, the app will wait for user input before quitting, locking up the remote machines forever.
In addition, if you don't already have one, you may wish to add a call to printf prior to calling MPI_Init(). It appears to be necessary for proper initialization of the app's runtime environment, without which a crash may occur.
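If the same source is also built outside CodeWarrior, the console settings described above can be wrapped in a preprocessor guard. The following is a minimal sketch, assuming the compiler predefines __MWERKS__ under CodeWarrior; the function name prepare_unattended_console and its banner argument are hypothetical. It sets the two SIOUX flags when they exist and prints one line, which can serve as the printf call made before MPI_Init().

```c
#include <assert.h>
#include <stdio.h>

#ifdef __MWERKS__              /* assumed to be predefined by CodeWarrior */
#include <SIOUX.h>
#endif

/* Prepare the console so the program can exit without user input, then
 * print a short banner (intended to run before MPI_Init is called).
 * Returns 1 if the SIOUX settings were applied, 0 on other compilers. */
int prepare_unattended_console(const char *banner)
{
    int applied = 0;

    assert(banner != NULL);
#ifdef __MWERKS__
    SIOUXSettings.asktosaveonclose = 0;  /* do not ask to save the console */
    SIOUXSettings.autocloseonquit = 1;   /* close the console when main() ends */
    applied = 1;
#endif
    printf("%s\n", banner);              /* the printf made before MPI_Init */
    fflush(stdout);
    return applied;
}
```

A program would call prepare_unattended_console("starting") as the first statement of main(), followed by MPI_Init(). With compilers that do not define __MWERKS__, the function only prints the banner.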
Mac OS Toolbox C
If your C source code is a Macintosh C application (i.e., your C calls Macintosh Toolbox routines exclusively), then create your project using the Mac OS C Stationery category. Under the Mac OS Toolbox category, select the MacOS Toolbox Carbon stationery. Add your source as you normally would, then add MacMPI_X.c as well. Since MacMPI relies on ANSI C routines, confirm that the stationery included CodeWarrior's ANSI libraries in your project. Again, in the Target Settings Panel named "PPC Target", edit the 'SIZE' Flags to use localAndRemoteHLEvents. And, when you write your code, be sure to have your remote apps (node ID > 0) quit without direct human interaction. It would also be helpful to have your app's event loop respond correctly to Quit AppleEvents, so Pooch can kill them remotely, if necessary.
Also, you may wish to set the monitor flag of MacMPI_X to 0, which will turn off the MacMPI status window. If you find your event loop and windows conflicting with MacMPI, this change may help.
Compiling MacMPI_X.c with the GNU cc compiler through OS X's Terminal
In order to use cc, you must install the Mac OS X Developer Tools from the Dev Tools CD. This CD should have accompanied your OS X User Install CD but is also available for download from the Apple Developer web site. Be sure to download MacMPI_X.c and mpi.h into the directory where your source is located. To compile your application, in the case of test.c, from the Terminal window, compile the MacMPI_X.c code and link with the Carbon library and include files using this one-line command:
cc test.c MacMPI_X.c /System/Library/Frameworks/Carbon.framework/Carbon -I /Developer/Headers/FlatCarbon
This command creates a Mach-O executable, which Pooch should recognize and be able to launch.
Compiling MacMPI_X.c in a Cocoa application on OS X
Create a Cocoa application using the Project Builder on OS X. Download MacMPI_X.c and mpi.h into your source directory. Include MacMPI_X.c and Carbon.framework in your project. Because MacMPI_X expects to read the nodelist_ip file using fopen, and this file is generally placed in the same directory where the Cocoa bundle application resides, it is necessary to set the default directory to the directory of the application very early in the code (before calling MPI_Init). To do this, add the following code (thanks to Steve Hayman of Apple Canada) to the beginning of main() in main.m:
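NSString *path = [[NSBundle mainBundle] bundlePath];
char cpath[1024];
[path getCString:cpath]; // copy the bundle path into a C string
{
char *lastSlash, *tp;
// walk forward so that lastSlash ends up at the final '/' in cpath
for (lastSlash = tp = cpath; (tp = strchr(tp, '/')) != NULL; lastSlash = tp++) ;
lastSlash[1] = 0; // truncate after it, leaving the parent of the bundle directory
}
chdir(cpath); // make that directory the default before calling MPI_Init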
When your code calls MPI_Init, MacMPI should then be able to locate the node list information. This executable should be recognized and launched by version 1.2 and later of Pooch.
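The path-trimming step above can be tried on its own, outside Cocoa. The following is a minimal plain C sketch, assuming a POSIX environment; the function names trim_to_parent_dir and nodelist_readable_in are hypothetical and are not part of MacMPI or Pooch. The first function uses strrchr in place of the loop shown above, and the second checks whether a file named nodelist_ip can be opened from a given directory.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Truncate `path` in place just after its last '/', leaving the directory
 * that contains the final path component (for example, the bundle).
 * Returns 0 on success, or -1 (path unchanged) if there is no '/'. */
int trim_to_parent_dir(char *path)
{
    char *last_slash;

    assert(path != NULL);
    last_slash = strrchr(path, '/');
    if (last_slash == NULL)
        return -1;
    last_slash[1] = '\0';
    return 0;
}

/* Change the working directory to `dir` and report whether a file named
 * nodelist_ip can be opened for reading there. Returns 1 if so, else 0. */
int nodelist_readable_in(const char *dir)
{
    FILE *f;

    assert(dir != NULL);
    if (chdir(dir) != 0)
        return 0;
    f = fopen("nodelist_ip", "r");
    if (f == NULL)
        return 0;
    fclose(f);
    return 1;
}
```

Given "/Applications/MyApp.app", trim_to_parent_dir leaves "/Applications/". Unlike the loop above, this sketch leaves a path with no slash untouched and reports an error instead. A path ending in a slash is returned unchanged, so it assumes the bundle path has no trailing slash.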
mpich
mpich is an open source implementation of the MPI standard, and it is used extensively on Linux-based clusters. By 2002, it was ported to Mac OS X. The following information was derived from the mpich documentation.
In order for mpich to work with Pooch, some minor modifications to mpich had to be made. The latest mpich distribution (v1.2.4 as of this writing) came from:
http://www-unix.mcs.anl.gov/mpi/mpich/
Under open-source license, the version of mpich modified for Pooch by Dauger Research, Inc., is available from the Dauger Research web site at:
http://daugerresearch.com/pooch/mpich/
The modification allows Pooch to inform mpich which TCP port it should use. Details on the changes can be found in the Read Me files and source code. This version of mpich should compile precisely as the mpich documentation describes. That is, follow steps 1 through 3 on pages 4 and 5 in mpichman-chp4.pdf.
In summary, cd to the mpich-1.2.4 directory that was created after decompressing and un-tar-ing the mpich archive. We recommend extracting the archive to a directory where you have full read-write-execute access, such as your home directory (e.g., /Users/yourusername/). Then use the following command, which sets the install location of the mpich libraries to the mpich-1.2.4 directory in /Users/yourusername/ and detects various features of your computer:
./configure --prefix=/Users/yourusername/mpich-1.2.4 |& tee c.log
(You may use another directory if you wish.) Note: if you are using Absoft's Fortran 90, you might want to make sure the .a files in /Applications/Absoft/lib/ are up to date (e.g., use ranlib) and use the --disable-f90modules flag when using ./configure:
./configure --prefix=/Users/yourusername/mpich-1.2.4 --disable-f90modules |& tee c.log
Then make mpich using:
make |& tee make.log
This step may take many minutes as it compiles all of mpich. A very large amount of output will be created.
To compile your C code, you may use the mpicc command residing in the bin directory of the mpich library directory:
./bin/mpicc -o test.out test.c
Pooch should recognize the resulting executable. IMPORTANT: Be sure to select mpich from the Job Type submenu of the Options pop-up on the Job Window. Pooch should then be able to launch your mpich executable.
Please note that using this modified mpich with Pooch DOES NOT require:
- NFS or Network File System - a mechanism for many computers to have identical access to the same file and directory structure over a network
- rsh or ssh - the ability to remotely log into machines at a command-line level
- the machines.xxx file - a file containing a static list of host names
- mpirun - mpich's default mechanism for launching jobs, which normally needs the above features
- all nodes to have access to the mpich libraries
This combination of mpich and Pooch removes these traditional requirements.
MPI/Pro
MPI/Pro is a commercial implementation of the MPI standard developed and released by MPI Software Technology, Inc., whose web site is:
http://www.mpi-softtech.com/
MPI/Pro is a trademark of MSTI. Some of the people there have been involved with the MPI standard since its beginning. They provide a commercial impetus to the performance and quality of their MPI implementation.
To install MPI/Pro, you may use its installer package. No special modification or configuration is needed for MPI/Pro to work with Pooch. Also, configuring inetd.conf or .rhosts files is not necessary when using MPI/Pro with Pooch. Once you have finished installing MPI/Pro, you may compile your C code using the following command:
cc -o test.out test.c -lmpipro -lpthread -lm
Pooch should recognize the resulting executable. IMPORTANT: Be sure to select MPI/Pro from the Job Options pop-up on the Job Window. Pooch should then be able to launch your executable. (Thanks go to Bobby Hunter of MSTI for his insight into MPI/Pro.) Again, using NFS, rsh, inetd.conf files, .rhosts files, or mpirun is not necessary.
mpich-gm
mpich-gm is a version of ANL's mpich modified to use Myricom's Myrinet hardware interface. The Mac OS X version of mpich-gm is available here:
http://www.myri.com/scs/
Myrinet is a trademark of Myricom. No modification of their distribution was required.
After installing the gm libraries and drivers on all nodes with Myrinet hardware, you may use the following configure line to install mpich-gm:
./configure --with-device=ch_gm -prefix=/dir/for/gm --enable-sharedlib
Then run the usual make and make install commands. Configuring host files is not necessary when using mpich-gm with Pooch. Be sure that all dylibs, or dynamic libraries, created by this mpich are deleted. By removing these dylibs, you need install mpich-gm only on the machine you use to compile your code. Once you have finished installing mpich-gm, you may compile your C code using its mpicc command:
mpicc -o test.out test.c
Pooch should recognize the resulting executable. IMPORTANT: Be sure to select mpich-gm from the Job Type option in the Job Window. Pooch should then be able to launch your executable on a Mac cluster connected using Myrinet hardware. Many thanks go to Prof. John Huelsenbeck of UCSD for his support and help. As before, using shared storage (NFS, etc.), ssh, rsh, inetd.conf files, .rhosts files, or mpirun is not necessary.
LAM/MPI
LAM/MPI is an open-source MPI implementation created and supported at the Pervasive Technology Labs at Indiana University. The original Mac OS X version of LAM is available here:
http://www.lam-mpi.org/
With the help of Dr. Jeff Squyres there, we have produced a modified version of LAM that operates with Pooch available here:
http://daugerresearch.com/pooch/lam/
To install this LAM, you may use the following configure line:
./configure --prefix=/usr/local/lamPooch --with-boot=pooch
If you don't have a Fortran compiler installed, you will need to add --without-fc. If you want a dual-use LAM that will operate normally and be able to be used with Pooch, remove the --with-boot=pooch argument. After that, use the usual make and make install commands. This places the LAM binaries in /usr/local/lamPooch, where Pooch can find them. Configuring host files or other static data is not necessary when using this LAM/MPI with Pooch. Because LAM's run-time environment (RTE) executables are needed for LAM to run, you will need to repeat the above LAM installation process for every node on your cluster.
Once you have finished installing LAM, you may compile your C code using its mpicc command:
/usr/local/lamPooch/bin/mpicc -o test.out test.c
Pooch should recognize the resulting executable. IMPORTANT: Be sure to select LAM/MPI from the Job Type option in the Job Window. Pooch should then be able to launch your executable on a Mac cluster whose nodes have the above LAM executables installed. As always, using shared storage (NFS, etc.), ssh, rsh, inetd.conf files, .rhosts files, or mpirun is not required.
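Because this LAM must be installed on every node, a quick per-node check of the install prefix may be useful. The following is a minimal C sketch, assuming a POSIX environment; the function name tool_installed is hypothetical. It only tests that a named file under the prefix's bin directory exists and is executable, and it does not verify that LAM's run-time environment actually works.

```c
#include <assert.h>
#include <stdio.h>
#include <unistd.h>

/* Returns 1 if <prefix>/bin/<tool> exists and is executable by the
 * current user, 0 otherwise (including when the path is too long). */
int tool_installed(const char *prefix, const char *tool)
{
    char path[1024];
    int n;

    assert(prefix != NULL && tool != NULL);
    n = snprintf(path, sizeof path, "%s/bin/%s", prefix, tool);
    if (n < 0 || (size_t)n >= sizeof path)
        return 0;                 /* path would not fit; treat as missing */
    return access(path, X_OK) == 0;
}
```

For example, tool_installed("/usr/local/lamPooch", "mpicc") tests for the mpicc command used above. Running such a check on each node before submitting a job could catch a node where the installation was skipped.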
1 Also operates on OS 9. See the Introduction for system requirements.
2 This contrasts with distributed computing, such as SETI@Home, in which the pieces of its computation are all but completely isolated from one another.
3 As of version 1.2, Pooch can recognize and launch a wide variety of applications, including Classic apps, CFM-based Carbon apps, bundle-based Carbon apps, Cocoa apps, Mach-O executables, Unix scripts, and AppleScript apps. In version 1.7, Pooch supports launching Universal Binaries.
4 Unix permissions will be preserved only when copying between OS X machines.
5 For more about the different types of parallel computing, see http://daugerresearch.com/pooch/parallelzoology.html