HP XC System Software User’s Guide
Part Number: AA-RWJVB-TE
June 2005
Product Version: HP XC System Software Version 2.1
This document provides information
Example 7-11: Submitting a Batch Job Script That Uses the srun --overcommit Option
$ bsub -n4 -I ./myscript.sh "-n8 -O"
Job <81> is submi
The following example shows this resource requirement string in an LSF command:
$ bsub -R "type=SLINUX64" -n4 -I srun hostname
7.5 Getting In
EXTERNAL MESSAGES:
MSG_ID  FROM      POST_TIME            MESSAGE          ATTACHMENT
0       -         -                    -                -
1       lsfadmin  date and time stamp  SLURM[nodes=4]   N
In particular, note the node and job al
Example 7-14: Using the bjobs Command (Long Output)
$ bjobs -l 24
Job <24>, User <msmith>, Project <default>, Status <RUN>, Queue <
To get detailed information about a finished job, add the -l option to the bhist command, as shown in Example 7-16. The -l option specifies that the long f
$ bsub -Is -n4 -ext "SLURM[nodes=4]" /usr/bin/xterm
Job <101> is submitted to default queue <normal>.
<<Waiting for dispatch
Example 7-20: View Job Details in LSF (cont.)
<normal>, 4 Processors Requested;
date and time stamp: Dispatched to 4 Hosts/Processors <4*lsf
comfortable interactive session, but every job submitted to this queue is executed on the LSF execution host instead of the first allocated node.
Exampl
Table 7-2: LSF Equivalents of SLURM srun Options (cont.)
srun Option                       Description                                   LSF Equivalent
-w, --nodelist=node1,...,nodeN    Request a specific list of no
-r, --relative=n                  Run a job step relative to node n of the c
About This Document
This manual provides information about using the features and functions of the HP XC System Software and describes how the HP XC
8 Using HP-MPI
This chapter describes how to use HP-MPI in the HP XC environment. The main focus of this chapter is to help you to quickly get started u
HP-MPI on the HP XC system, last-minute changes to HP-MPI functionality, and known problems and work-arounds, refer to the HP-MPI Release Notes, which
parallelism. For information about running more complex applications, refer to the HP-MPI user documentation.
8.3.2.1 Example Application hello_worl
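A minimal sketch of how this example application might be compiled and launched; the source path under $MPI_ROOT/help is an assumption about the HP-MPI installation layout, and the launch command follows the bsub/mpirun -srun pattern used elsewhere in this chapter:
$ $MPI_ROOT/bin/mpicc -o hello_world $MPI_ROOT/help/hello_world.c
$ bsub -I -n4 $MPI_ROOT/bin/mpirun -srun ./hello_world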
Hello world!
Hello world! I’m 1 of 4 on host1
Hello world! I’m 3 of 4 on host2
Hello world! I’m 0 of 4 on host1
Hello world! I’m 2 of 4 on host2
8.3.3 Usin
• The following command runs a.out with four ranks, two ranks per node, ranks are block allocated, and two nodes are used:
$ mpirun -srun -n4 ./a.out
ho
Example 8-1 displays how to perform a system interconnect selection.
Example 8-1: Performing System Interconnect Selection
% export MPI_IC_ORDER="e
Example 8-5: Allocating 12 Processors on 6 Nodes
$ bsub -I -n12 $MPI_ROOT/bin/mpirun -srun -n6 -N6 ./a.out
Note that LSF jobs can be submitted without
If you would like to see the effects of using the TCP/IP protocol over a higher-speed system interconnect, use the -TCP option and omit the -subnet op
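A minimal sketch of forcing TCP/IP with the -TCP option; the application name a.out is hypothetical and the launch pattern follows the examples in this chapter:
$ bsub -I -n4 $MPI_ROOT/bin/mpirun -TCP -srun ./a.out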
8.8 The mpirun Command Options
HP-MPI on the HP XC system provides the following additional mpirun command-line options:
-srun
The -srun option is requ
• Chapter 9 describes how to use MLIB on the HP XC system.
• Appendix A provides examples of HP XC applications.
• The Glossary provides definitions of th
8.9 Environment Variables
HP-MPI on HP XC provides the following additional environment variables:
8.9.1 MPIRUN_OPTIONS
MPIRUN_OPTIONS is a mechanism fo
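A minimal sketch of setting this variable so that the listed options are applied to subsequent mpirun invocations; the -v (verbose) option shown here is an assumption about a supported mpirun option:
% export MPIRUN_OPTIONS="-v"
% $MPI_ROOT/bin/mpirun -srun ./a.out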
for the purpose of determining how much memory to pin for RDMA message transfers on InfiniBand and Myrinet GM. The value determined by HP-MPI can be di
% export MPI_USE_LIBELAN=0
8.9.10 MPI_USE_LIBELAN_SUB
The use of Elan’s native collective operations may be extended to include communicators which are
Run the resulting prog.x under MPICH. However, various problems will be encountered. First, the MPICH installation will need to be built to include s
8.12 Additional Information, Known Problems, and Work-arounds
For additional information, as well as information about known problems and work-arounds
9 Using HP MLIB
The information in this section describes how to use HP MLIB Version 1.5 in the HP XC environment on HP XC4000 and HP XC6000 clusters. T
9.1.2 MLIB and Module Files
For building and running an application built against MLIB, you must have a consistent environment. Modulefiles can make it
9.2.4 Modulefiles and MLIB
When building or running an application built against MLIB, it is crucial that the environment is consistent. Modulefiles can
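A minimal sketch of establishing a consistent MLIB environment, using one of the modulefile names listed in Table 2-1:
$ module load mlib/intel/7.1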
$ mpif90 [options] file ... /opt/mlib/[intel_7.1\intel_8.0]/hpmpi_2.1/lib/64/libscalapack.a \
  -openmp
$ mpicc [options] file ... /opt/mlib/[intel_7.1\intel
9.3.3 MPI Parallelism
Internal parallelism in ScaLAPACK and SuperLU_DIST is implemented using MPI, a portable, scalable programming model that gives di
HP Message Passing Interface
HP Message Passing Interface (MPI) is an implementation of the MPI standard for HP systems. The home page is located at t
$ mpicc [options] file ... /opt/mlib/pgi_5.1/hpmpi_2.1/lib/64/libscalapack.a -mp -lpgf90 \
  -lpgf90_rpml -lpgf902 -lpgf90rtl -lpgftnrtl
9.3.5.4 Linking Supe
10 Advanced Topics
This chapter covers topics intended for the advanced user. The following topics are discussed:
• Enabling remote execution with OpenS
Next, get the name of the local machine serving your display monitor:
$ hostname
mymachine
Then, use the host name of your local machine to retrieve
Step 4. Running an X terminal Session Using LSF
This section shows how to create an X terminal session on a remote node using LSF. In this example, supp
A Examples
This appendix provides examples that illustrate how to build and run applications on the HP XC system. The examples in this section show y
steps through a series of commands that illustrate what occurs when you launch an interactive shell.
Check LSF execution host information:
$ bhosts
HOST_N
View the job:
$ bjobs -l 8
Job <8>, User <smith>, Project <default>, Status <DONE>, Queue <normal>,
Interactive mode, Extsc
steps through a series of commands that illustrate what occurs when you launch an interactive shell.
Check LSF execution host information:
$ bhosts
HOST_N
Exit from the shell:
$ exit
exit
Check the finished job’s information:
$ bhist -l 124
Job <124>, User <lsfadmin>, Project <default>, Inter
• http://www.nagios.org/
Home page for Nagios®, a system and network monitoring application. Nagios watches specified hosts and services and issues aler
<<Waiting for dispatch>>
<<Starting on lsfhost.localdomain>>
n14
n14
n16
n16
Linux n14 2.4.21-15.3hp.XCsmp #2 SMP date and time stam
Run some commands from the pseudo-terminal:
$ srun hostname
n13
n13
n14
n14
n15
n15
n16
n16
$ srun -n3 hostname
n13
n14
n15
Exit the pseudo-terminal:
$ exit
exit
View
Show the environment:
$ lsid
Platform LSF HPC 6.0 for SLURM, Sep 23 2004
Copyright 1992-2004 Platform Computing Corporation
My cluster name is penguin
My m
date and time stamp: Submitted from host <lsfhost.localdomain>, to Queue <normal>, CWD <$HOME>, 6 Processors Requested;
date and time st
Glossary
A
Administrative Network
The private network within the XC system that is used for administrative operations.
admin branch
The half (branch) o
extensible firmware interface
See EFI
external network node
A node that is connected to a network external to the XC system.
F
fairshare
An LSF job-schedu
image server
A node specifically designated to hold images that will be distributed to one or more client systems. In a standard XC installation, the h
LSF master host
The overall LSF coordinator for the system. The master load information manager (LIM) and master batch daemon (mbatchd) run on the LSF
P
parallel application
An application that uses a distributed programming model and can run on multiple processors. An HP XC MPI application is a par
Related Information
This section provides pointers to the Web sites for related software products and provides references to useful third-party publica
symmetric multiprocessing
See SMP
Index
A
application (See application development, application development environment)
  tuning, 5-1
application development
  building parallel applications,
compiler utilities
  for compiling and linking parallel programs, 3-8
compilers, 1-7
  from other vendors, 3-2
  Intel, 3-2
  PGI, 3-2
compute node
  configuring local disk
  building parallel applications, 3-6
module commands
  avail command, 2-4
  list command, 2-4
  load command, 2-4
  unload command, 2-5
modulefile
  automatically loading at
R
reserved symbol names
  building parallel applications, 3-8
resource manager, 7-1
role, 1-1
S
serial applications
  building, 3-4
  compiling and linking, 3-4
  debugging
• Linux Administration Unleashed, by Thomas Schenk, et al.
• Managing NFS and NIS, by Hal Stern, Mike Eisler, and Ricardo Labiaga (O’Reilly)
• MySQL, by
discover(8)
A cross-reference to a manpage includes the appropriate section number in parentheses. For example, discover(8) indicates that you can find
1 Overview of the User Environment
The HP XC system is a collection of computer nodes, networks, storage, and software built into a cluster that work
© Copyright 2003–2005 Hewlett-Packard Development Company, L.P.
UNIX® is a registered trademark of The Open Group.
Linux® is a U.S. registered trademar
different roles that can be assigned to a client node, the following roles contain services that are of special interest to the general user:
login role
choose to use either the HP XC Administrative Network, or the XC system interconnect, for NFS operations. The HP XC system interconnect can potentiall
nodes of the system. The system interconnect network is a private network within the HP XC. Typically, every node in the HP XC is connected to the sys
1.2.3.1 Linux Commands
The HP XC system supports the use of standard Linux user commands and tools. Standard Linux commands are not described in this
1.4 Run-Time Environment
In the HP XC environment, LSF-HPC, SLURM, and HP-MPI work together to provide a powerful, flexible, extensive run-time environ
request. LSF-HPC always tries to pack multiple serial jobs on the same node, with one CPU per job. Parallel jobs and serial jobs cannot coexist on the
supported as part of the HP XC. The tested software packages include, but are not limited to, the following:
• Intel Fortran 95, C, C++ Compiler Version
2 Using the System
This chapter describes tasks and commands that the general user must know to use the system. It contains the following topics:
• Loggi
environment variables, such as PATH and MANPATH, to enable access to various installed software. One of the key features of using modules is to allow
of shared objects. If you have multiple compilers (perhaps with incompatible shared objects) installed, it is probably wise to set MPI_CC (and others
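A minimal sketch of pinning the compiler that the HP-MPI compiler utility invokes; the choice of gcc here is only an illustration:
% export MPI_CC=gcc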
Contents
About This Document
1 Overview of the User Environment
1.1 System Architecture ...
Table 2-1: Supplied Modulefiles (cont.)
Modulefile        Sets the HP XC User Environment:
intel/8.1         For Intel Version 8.1 compilers.
mlib/intel/7.1    For MLIB and
If you encounter a modulefile conflict when loading a modulefile, you must unload the conflicting modulefile before you load the new modulefile. Refe
ifort/8.0(19):ERROR:102: Tcl command execution failed: conflict ifort/8.1
In this example, the user attempted to load the ifort/8.0 modulefile, but af
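A minimal sketch of resolving the conflict by unloading the already-loaded modulefile named in the error message before loading the one you want:
$ module unload ifort/8.1
$ module load ifort/8.0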
2.3 Launching and Managing Jobs Quick Start
This section provides a brief description of some of the many ways to launch jobs, manage jobs, and get i
• The LSF lshosts command displays machine-specific information for the LSF execution host node.
$ lshosts
Refer to Section 7.3.2 for more information
2.3.5.2 Submitting a Non-MPI Parallel Job
Submitting non-MPI parallel jobs is discussed in detail in Section 7.4.4. The LSF bsub command format to submit a si
Example 2-3: Submitting a Non-MPI Parallel Job to Run One Task per Node
$ bsub -n4 -ext "SLURM[nodes=4]" -I srun hostname
Job <22> is su
Example 2-5: Running an MPI Job with LSF Using the External Scheduler Option (cont.)
Hello world! I’m 2 of 4 on host2
Hello world! I’m 3 of 4 on host3
Hel
2.3.6 Getting Information About Your Jobs
You can obtain information about your running or completed jobs with the bjobs and bhist commands.
bjobs
Checks
distributed with the HP XC cluster, such as HP-MPI. Manpages for third-party vendor software components may be provided as a part of the deliverables
2.3 Launching and Managing Jobs Quick Start ... 2-7
2.3.1 Introduction ...
3 Developing Applications
This chapter discusses topics associated with developing applications in the HP XC environment. Before reading this chapter,
3.2 Using Compilers
You can use compilers acquired from other vendors on an HP XC system. For example, HP XC supports Intel C/C++ and Fortran compi
3.2.4 Pathscale Compilers
Compilers in the Pathscale EKOPath Version 2.1 Compiler Suite are supported on HP XC4000 systems only. See the following Web s
• Section 3.6.1 describes the serial application programming model.
• Section 3.6.2 discusses how to build serial applications.
For further information
• Launching applications with the srun command (Section 6.4)
• Advanced topics related to developing parallel applications (Section 3.9)
• Debugging pa
Compilers from GNU, Intel, and PGI provide a -pthread switch to allow compilation with the Pthread library. Packages that link against Pthreads, such as
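A minimal sketch of the switch in use with the GNU compiler; the source and program names are hypothetical:
$ gcc -pthread -o threaded_app threaded_app.c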
The HP XC cluster comes with a modulefile for HP-MPI. The mpi modulefile is used to set up the necessary environment to use HP-MPI, such as the valu
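A minimal sketch of loading the modulefile and confirming one of the values it sets (MPI_ROOT is referenced in the HP-MPI chapter of this guide):
$ module load mpi
$ echo $MPI_ROOT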
3.7.1.15 Reserved Symbols and Names
The HP XC system reserves certain symbols and names for internal use. Reserved symbols and names should not be inc
3.8 Developing Libraries
This section discusses developing shared and archive libraries for HP XC applications. Building a library generally consists of
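A minimal sketch of both library types, assuming the GNU toolchain; the file and library names are hypothetical (libmystuff is also the name that appears with Example 3-1 in this chapter):
$ gcc -fPIC -c mystuff.c
$ gcc -shared -o libmystuff.so mystuff.o    # shared library
$ ar rcs libmystuff.a mystuff.o             # archive library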
3.7.2.2 Compiling and Linking HP-MPI Applications ... 3-8
3.7.2.3 Examples of Compiling and Linking HP-MPI Applications ...
has /opt/mypackage/lib in it, which will then be able to handle both 32-bit and 64-bit binaries that have linked against libmystuff.so.
Example 3-1: Dir
single compilation line, so it is common to talk about concurrent compilations, though GNU make is more general. On non-cluster platforms or command
srcdir = .
HYPRE_DIRS = \
  utilities \
  struct_matrix_vector \
  struct_linear_solvers \
  test
all:
	@ \
	for i in ${HYPRE_DIRS}; \
	do \
	  if [ -d $$i ]; \
	  then \
	    echo "Ma
By modifying the makefile to reflect the changes illustrated above, we will now be processing each directory serially and parallelizing the individual
utilities/libHYPRE_utilities.a:
	$(PREFIX) $(MAKE) $(MAKE_J) -C utilities
The modified Makefile is invoked as follows:
$ make PREFIX=’srun -n1 -N1’ MAKE_J
3.9.4 Communication Between Nodes
On the HP XC system, processes in an MPI application run on compute nodes and use the system interconnect for comm
4 Debugging Applications
This chapter describes how to debug serial and parallel applications in the HP XC development environment. In general, effective
4.2.1 Debugging with TotalView
You can purchase the TotalView debugger, from Etnus, Inc., for use on the HP XC cluster. TotalView is a full-featured, GU
• If TotalView is not installed, have your administrator install it. Then either you or your administrator should set up your environment, as describ
6.4.6.1 I/O Commands ... 6-8
6.4.6.2 I/O Redirection Alternatives ...
4.2.1.5 Starting TotalView for the First Time
This section tells you what you must do when running TotalView for the first time, before you begin to u
2. Select Preferences from the File pull-down menu of the TotalView Root Window. A Preferences window is displayed, as shown in Figure 4-2.
Figure 4-
3. In the Preferences window, click on the Launch Strings tab.
4. In the Launch Strings tab, ensure that the Enable single debug server launch button is selected.
5. In the Launch Strings table, in the area immedi
6. In the Preferences window, click on the Bulk Launch tab. Make sure that Enable debug server bulk launch is not selected.
7. Click on the OK button at t
3. The TotalView main control window, called the TotalView root window, is displayed. It displays the following message in the window header:
Etnus Tot
7. Click Yes in this pop-up window. The TotalView root window appears and displays a line for each process being debugged.
If you are running Fortran co
5. In a few seconds, the TotalView Process Window will appear, displaying information on the srun process. In the TotalView Root Window, click Attached
5 Tuning Applications
This chapter discusses how to tune applications in the HP XC environment.
5.1 Using the Intel Trace Collector/Analyzer
This secti
8.2 HP-MPI Directory Structure ... 8-2
8.3 Compiling and Running Applications ...
CLDFLAGS = -static-libcxa -L$(VT_ROOT)/lib $(TLIB) -lvtunwind \
  -ldwarf -lnsl -lm -lelf -lpthread
FLDFLAGS = -static-libcxa -L$(VT_ROOT)/lib $(TLIB) -lv
6 Using SLURM
6.1 Introduction
HP XC uses the Simple Linux Utility for Resource Management (SLURM) for system resource management and job scheduling. SLU
Table 6-1: SLURM Commands (cont.)
Command    Function
sinfo      Reports the state of partitions and nodes managed by SLURM. It has a wide variety of filtering, so
6.4.1.1 srun Roles
srun options allow you to submit a job by:
• Specifying the parallel environment for your job, such as the number of nodes to use, par
This command forwards the standard output and error messages from the running job with SLURM ID 6543 to the attaching srun command to reveal the job’
If you specify a script at the end of the srun command line (not as an argument to -A), the spawned shell executes that script using the allocated reso
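A minimal sketch of this usage, assuming the -A (allocate) behavior described above; the script name is hypothetical:
$ srun -A -N2 ./myscript.sh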
Each partition’s node limits supersede those specified by -N. Jobs that request more nodes than the partition allows never leave the PENDING state. T
6.4.5 srun Control Options
srun control options determine how a SLURM job manages its nodes and other resources, what its working features (such as job
-J jobname (--job-name=jobname)
The -J option specifies jobname as the identifying string for this job (along with its system-supplied job ID, as store
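A minimal sketch; the job name myrun is hypothetical:
$ srun -J myrun -n2 hostname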
commands let you choose from among any of five I/O redirection alternatives (modes) that are explained in the next section.
-o mode (--output=mode)
The -
9.3.1 Platform Support ... 9-4
9.3.2 Library Support ...
You can use a parameterized "format string" to systematically generate unique names for (usually) multiple I/O files, each of which receive
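A minimal sketch, assuming the standard srun format specifiers %j (job ID) and %t (task ID) are among those available:
$ srun -n4 -o out.%j.%t hostname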
--contiguous=yes|no
The --contiguous option specifies whether or not your job requires a contiguous range of nodes. The default is YES, which demands co
6.4.8 srun Environment Variables
Many srun options have corresponding environment variables. An srun option, if invoked, always overrides (resets) the
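A minimal sketch, using the SLURM_NPROCS variable mentioned in Chapter 7; an explicit -n on the srun command line would override it:
$ export SLURM_NPROCS=4
$ srun hostname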
The squeue command can report on jobs in the job queue according to their state; valid states are: pending, running, completing, completed, failed, ti
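A minimal sketch of filtering on one of these states, assuming squeue's --state option:
$ squeue --state=PENDING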
Example 6-8: Reporting Reasons for Downed, Drained, and Draining Nodes
$ sinfo -R
REASON            NODELIST
Memory errors     dev[0,5]
Not Responding    dev8
6.8 Job Accou
7 Using LSF
The Load Sharing Facility (LSF) from Platform Computing Corporation is a batch system resource manager used on the HP XC system. LSF is inclu
SLURM views the LSF-HPC system as one large computer with many resources available to run jobs. SLURM does not provide the same amount of information t
To illustrate how the external scheduler is used to launch an application, consider the following command line, which launches an application on te
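A minimal sketch of such a command line; the node count and application name are assumptions, and the -ext syntax follows the examples later in this chapter:
$ bsub -n10 -ext "SLURM[nodes=10]" -I srun ./my_app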
queue contains the job starter script, but the unscripted queue does not have the job starter script configured.
Example 7-1: Comparison of Queues and t
Figure 7-1: How LSF-HPC and SLURM Launch and Manage a Job
7-2 Using the External Scheduler to Submit a Job to Run on Specific Nodes ... 7-12
7-3 Using the External Scheduler to Submit a Job to Run One Task
4. LSF-HPC prepares the user environment for the job on the LSF-HPC execution host node and dispatches the job with the job_starter.sh script. This use
• LSF does not support chunk jobs. If a job is submitted to a chunk queue, SLURM will let the job pend.
• LSF does not support topology-aware advanced
The following example shows the output from the bhosts command:
$ bhosts
HOST_NAME              STATUS  JL/U  MAX  NJOBS  RUN  SSUSP  USUSP  RSV
lsfhost.localdomain    ok
See the OUTPUT section of the lsload manpage for further information about the output of this example. In addition, refer to the Platform Computing Co
The basic synopsis of the bsub command is:
bsub [bsub_options] jobname [job_options]
The HP XC system has several features that make it optimal for ru
additional capabilities at the job level and queue level by allowing the inclusion of several SLURM options in the LSF command line. Refer to Section
Example 7-2: Using the External Scheduler to Submit a Job to Run on Specific Nodes
$ bsub -n4 -ext "SLURM[nodelist=n6,n8]" -I srun hostname
Job
This example runs the job exactly the same as in Example 2, but additionally requests that node n3 is not to be used to run the job. Note that this c
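A minimal sketch of excluding a node, assuming an exclude= keyword is accepted inside the SLURM[...] external scheduler option:
$ bsub -n4 -ext "SLURM[exclude=n3]" -I srun hostname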
The srun command, used by the mpirun command to launch the MPI tasks in parallel, determines the number of tasks to launch from the SLURM_NPROCS envir
7.4.6.1 Examples
Consider an HP XC system configuration in which lsfhost.localdomain is the LSF execution host and nodes n[1-10] are compute nodes in t