Oxbow Developers
oxbow-dev@email.ornl.gov
HPC architectures will continue to change over the next decade in response to efforts to improve energy efficiency, reliability, and performance. At this time of significant disruption, it is critically important to understand the requirements of contemporary and future extreme-scale scientific applications, so that we can drive or adopt new architectural and software features that satisfy those requirements, e.g., integrated GPU and CPU, integrated random number generator, transactional memory, fine-grained power management, and MPI collective offload.
Hence, we believe that it is essential to quantitatively measure, project, and prioritize the resource and feature requirements of our anticipated workloads on such extreme-scale systems.
The Oxbow toolkit is a collection of tools to empirically characterize application behaviours along a critical set of dimensions, namely computation, communication, memory capacity, and memory access patterns.
For instructions on building and installing Oxbow, see the README included with the Oxbow tools source distribution. Following these instructions should result in an installation directory structure that includes a subdirectory for the oxbow tools and (optionally) a subdirectory for third party utilities. The directory names will be determined by the vendor ID of the compiler used during the build process.
For example, using GNU compilers and installing into the prefix /local/opt/oxbow results in an installed directory structure like the following:
/local/opt/oxbow/oxbow-tpls-gnu/
/local/opt/oxbow/oxbow-tpls-gnu/binutils-VER
/local/opt/oxbow/oxbow-tpls-gnu/libunwind-VER
/local/opt/oxbow/oxbow-tpls-gnu/papi-VER
/local/opt/oxbow/oxbow-tpls-gnu/pin-VER
/local/opt/oxbow/oxbow-tool-gnu/
/local/opt/oxbow/oxbow-tool-gnu/bin
/local/opt/oxbow/oxbow-tool-gnu/etc
/local/opt/oxbow/oxbow-tool-gnu/include
/local/opt/oxbow/oxbow-tool-gnu/lib
/local/opt/oxbow/oxbow-tool-gnu/share
The etc directory contains a script, envvars.sh, that sets up your environment for building and running applications with the Oxbow tools. In section 3, most of the instructions begin by sourcing this script. For example, with our installation of Oxbow on the Keeneland test system, the user environment for using the Oxbow tools with Intel compilers is set up by running:
$ source /nics/a/proj/oxbow/oxbow-tool-intel/etc/envvars.sh
The other subdirectories of oxbow-tool-vendor follow standard conventions.
There are currently five tools available in the Oxbow toolkit.
The sections below contain specific instructions for building and running each tool. Rather than run the tools directly, you may use convenience scripts. These are installed under:
/path/to/oxbow/oxbow-tool-vendor/bin/util
All of the scripts are named for the tool they invoke, and have similar usage:
$ export PATH=$PATH:/path/to/oxbow/oxbow-tool-vendor/bin/util
$ do-tool-name.sh [FLAGS] OUTDIR -- [MPIRUN] -- COMMAND
The scripts cover common-case run scenarios, so the FLAGS here are flags for the script, not flags specific to the tool being run. There is only one flag of interest, and it applies only to miami-imix, pin-imix, and reused. For these tools, -unmarked tells the script that the binary being run does not contain caliper functions.
These tools all produce various output files. Sometimes quite a lot of files are generated, for example with multithreaded applications or MPI jobs with many ranks. The OUTDIR argument to the convenience script tells the tool where to place all output. This directory will also contain a log of the exact commands issued by the script when invoking the tool, as well as any output and error messages. The log will be located in OUTDIR/do-tool-name.log.
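The logging behaviour described above can be pictured with a minimal sketch. This is not the code of the Oxbow convenience scripts; the helper name run_logged and the log file name do-demo.log are hypothetical and chosen only for illustration. It records the command line first, then appends the command's standard output and standard error to the same file.

```shell
# Hypothetical helper (not part of Oxbow): log a command, then its output.
run_logged() {
    log=$1; shift
    printf '+ %s\n' "$*" >>"$log"   # record the exact command issued
    "$@" >>"$log" 2>&1              # append stdout and stderr of the command
}

outdir=$(mktemp -d)                  # stands in for the OUTDIR argument
run_logged "$outdir/do-demo.log" echo hello
cat "$outdir/do-demo.log"
```

Keeping the command line and its output in one file makes it possible to reproduce a run by hand from the log alone.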
If the application is launched with an mpirun (or aprun or mpiexec) type command, enclose this command and any arguments between two sets of dashes. Follow the MPIRUN command with the actual command and arguments to be executed.
Run an unmodified MPI binary using the pin-imix tool. Put output in myoutdir.
$ do-pin-imix.sh -unmarked myoutdir -- mpirun -n 4 -- ./myprog arg1 arg2
Run an unmodified serial binary using the reused tool. Put the output in $HOME/work.
$ do-reused.sh -unmarked $HOME/work -- -- echo "hello"
Run a modified serial binary that has caliper functions added for pin-imix. Put the output in $HOME/work.
$ do-pin-imix.sh $HOME/work -- -- ./myprog-modified arg1 arg2
Run a binary that has been relinked for mpiP. Put the output in myoutdir.
$ do-mpip.sh myoutdir -- aprun -B -- ./mympi-modified arg1 arg2
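To make the [FLAGS] OUTDIR -- [MPIRUN] -- COMMAND layout used in the examples above concrete, here is a minimal sketch of how such a command line can be split at the two sets of dashes. It is an illustration only, not the parser used by the Oxbow scripts; the function name parse_tool_args and the variable names are hypothetical. For brevity the launcher and the command are joined into space-separated strings, so arguments containing spaces are not preserved.

```shell
# Hypothetical parser for: [FLAGS] OUTDIR -- [MPIRUN] -- COMMAND
parse_tool_args() {
    unmarked=0; outdir=""; launcher=""; app_cmd=""
    # Everything before the first "--" is a flag or the output directory.
    while [ "$#" -gt 0 ] && [ "$1" != "--" ]; do
        case "$1" in
            -unmarked) unmarked=1 ;;
            *) outdir=$1 ;;
        esac
        shift
    done
    if [ "$#" -gt 0 ]; then shift; fi   # drop the first "--"
    # Everything up to the second "--" is the (possibly empty) launcher.
    while [ "$#" -gt 0 ] && [ "$1" != "--" ]; do
        launcher="${launcher:+$launcher }$1"
        shift
    done
    if [ "$#" -gt 0 ]; then shift; fi   # drop the second "--"
    app_cmd="$*"                        # the rest is the command to run
}

parse_tool_args -unmarked myoutdir -- mpirun -n 4 -- ./myprog arg1 arg2
printf 'unmarked=%s outdir=%s\n' "$unmarked" "$outdir"
printf 'launcher=%s\n' "$launcher"
printf 'command=%s\n' "$app_cmd"
```

For a serial run the launcher part is simply empty, which is why the serial examples contain two consecutive sets of dashes.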
mpiP is a lightweight profiling library for MPI applications. Because it only collects statistical information about MPI functions, mpiP generates considerably less overhead and much less data than tracing tools. All the information captured by mpiP is task-local. It only uses communication during report generation, typically at the end of the experiment, to merge results from all of the tasks into one output file.
For extensive information about configuring and using mpiP, see the mpiP user guide. It can be accessed online at:
A copy of the user guide is also installed in Oxbow under:
/path/to/oxbow/oxbow-tool-vendor/share/doc/mpip/
To use mpiP to characterize your application's communication patterns, you will need to relink your application against the mpiP library and its third-party library dependencies. Add the following link flags:
-L${OXBOW_TOOLS_DIR}/lib -lmpiP -L${LIBUNWIND_DIR}/lib -lunwind -L${BINUTILS_DIR}/lib -lbfd -L${BINUTILS_DIR}/lib64 -liberty
The locations of the required libraries can be added to your environment by sourcing the environment setup script installed in oxbow. For example:
$ source /path/to/oxbow/oxbow-tool-vendor/etc/envvars.sh
$ mpicc -g obj1.o obj2.o -o myprog-mpip -L${OXBOW_TOOLS_DIR}/lib -lmpiP \
    -L${LIBUNWIND_DIR}/lib -lunwind -L${BINUTILS_DIR}/lib -lbfd -L${BINUTILS_DIR}/lib64 -liberty
Once the application is relinked, launch it as normal. The application will output the results of profiling the MPI communication. To configure what output is produced, set the MPIP environment variable. The variable stores flags with a syntax similar to command line flags. See the user guide for information on specific flags.
If you are using the provided do-mpip.sh convenience script, MPIP will be set in the script unless you set it yourself before running the script. The setting for MPIP in the convenience script outputs results for collective communication and point-to-point communication, as well as a collective communication matrix.
$ do-mpip.sh myoutdir -- mpirun -n 64 -- ./myprog-mpip arg1 arg2
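The "set unless you set it yourself" behaviour can be sketched with a standard shell default assignment. This is an assumption about how such a default might be applied, not the contents of do-mpip.sh; the function name set_mpip_default is hypothetical, and the flag strings are placeholders (-f DIR and -k N are options described in the mpiP user guide; confirm them against your mpiP version). The actual default flags used by the convenience script are not listed in this guide.

```shell
# Hypothetical helper: apply a fallback MPIP value only if none is set.
set_mpip_default() {
    # ":=" assigns only when MPIP is unset or empty, so a value
    # exported by the user before running the script is kept.
    : "${MPIP:=$1}"
    export MPIP
}

unset MPIP
set_mpip_default "-f myoutdir"   # placeholder default flags
echo "MPIP=$MPIP"

MPIP="-k 2"                      # user-chosen setting
set_mpip_default "-f myoutdir"
echo "MPIP=$MPIP"
```

The first call prints the fallback value and the second prints the user's own setting, unchanged.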
The Miami imix tool profiles an application run using the Intel PIN profiling infrastructure. Each instruction is broken down into micro-operations: individual reads, writes, integer, float, and SIMD operations. The tool prints the counts of each micro-operation type in a comma-separated-value (CSV) file. The rows of the CSV file indicate which binary module the instructions came from.
To obtain the instruction mix, you can either:
For code that does not have caliper functions around a section of interest:
$ do-miami-imix.sh -unmarked myoutdir -- mpirun -n 4 -- ./myprog arg1 arg2
For code that has had caliper functions added:
$ do-miami-imix.sh myoutdir -- mpirun -n 4 -- ./myprog arg1 arg2
Obtaining the instruction mix is a two-step process.
Step 1: Control Flow Graph Info
First, you need to obtain the control flow graph information by profiling the application in the following manner.
${OXBOW_TOOLS_DIR}/bin/miamicfg [options] -- <your_application> <your_arguments>
The double dash "--" is important, as it separates the instruction mix tool's options from the target application and its parameters.
If this is an MPI application, place the instruction mix tool where you would normally place your executable name.
mpirun -np 16 ${OXBOW_TOOLS_DIR}/bin/miamicfg [options] -- <your_application> <your_arguments>
No additional options are required for the wrapper. This step creates a .cfg file per process. By default, the output files are named: ExecName-MpiRank-ProcessPid.cfg
Optionally, you can resume and pause data collection dynamically. For this, you must modify the application's source code to insert calls to two user-defined empty functions, one for starting and one for stopping data collection. You can choose any name for these two caliper functions.
Once you have identified suitable functions, pass the following parameters to the instruction mix tool.
-q -start <name_of_start_function> -stop <name_of_stop_function>
Step 2: Instruction Mix
Once you have the control flow graph information from step 1, pass one of the resulting .cfg files to the miami-imix static tool as follows:
$OXBOW_TOOLS_DIR/bin/miami-imix -c <one_cfg_file>
This command outputs two files:
Note: The '.cfg' files contain information mapped to binary addresses. For this reason, they are valid only with the original executable that you used to collect those files. The CFG file contains paths to the executable and all the shared libraries used during the profiling step.
The second step uses those paths to locate the binaries and decode the instructions. You should not delete or move your binaries before running the second step.
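Because step 1 writes one .cfg file per process and step 2 takes a single file, it can help to select the file by MPI rank. The sketch below assumes the default ExecName-MpiRank-ProcessPid.cfg naming described above; the function names cfg_rank and pick_cfg_for_rank and the example file names are hypothetical. The name is parsed from the right, so an executable name that itself contains dashes is handled.

```shell
# Print the MPI rank encoded in ExecName-MpiRank-ProcessPid.cfg.
cfg_rank() {
    name=${1##*/}        # strip any leading directory
    name=${name%.cfg}    # strip the extension
    name=${name%-*}      # drop the trailing -ProcessPid
    printf '%s\n' "${name##*-}"   # text after the last dash is the rank
}

# Print the first file among the remaining arguments with rank $1.
pick_cfg_for_rank() {
    want=$1; shift
    for f in "$@"; do
        if [ "$(cfg_rank "$f")" = "$want" ]; then
            printf '%s\n' "$f"
            return 0
        fi
    done
    return 1
}

# Made-up file names; in practice pass the .cfg files from step 1.
cfg=$(pick_cfg_for_rank 0 my-app-1-4243.cfg my-app-0-4242.cfg)
echo "$cfg"
# The selected file would then be given to step 2:
#   $OXBOW_TOOLS_DIR/bin/miami-imix -c "$cfg"
```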
The pin-imix tool outputs counts of instructions categorized by opcode. This tool can be run with or without modification to your program.
Convenience script use for unmarked code:
$ do-pin-imix.sh -unmarked myoutdir -- mpirun -n 64 -- ./myprog arg1 arg2
To run directly (no convenience script) on unmarked code:
$ source /path/to/oxbow/oxbow-tool-vendor/etc/envvars.sh
$ mpirun -n 64 ${PIN_DIR}/intel64/bin/pinbin -follow_execv -t \
    ${OXBOW_TOOLS_DIR}/bin/imix.pin -category -i -- myprog arg1 arg2
Caliper functions for the various Oxbow tools are provided in an interface library in the oxbow installation.
To use this caliper function library, modify your C/C++ source code with the following:
#include <oxbow.h>

// unprofiled code section
oxbow_pin_imix_zero();   // reset statistics
oxbow_pin_imix_start();  // start profiling
// profiled code section
oxbow_pin_imix_stop();   // stop profiling
// unprofiled code section
When compiling, add the following include flags to your object compilation:
-I${OXBOW_TOOLS_DIR}/include
Add the following library flags during the link step:
-L${OXBOW_TOOLS_DIR}/lib -loxbow
The OXBOW_TOOLS_DIR variable is set using the envvars.sh script. So, an example compilation after adding caliper functions would be:
$ source /path/to/oxbow/oxbow-tool-vendor/etc/envvars.sh
$ cc -I${OXBOW_TOOLS_DIR}/include -c myprog.c
$ cc -o myprog myprog.o -L${OXBOW_TOOLS_DIR}/lib -loxbow
Convenience script use for marked code:
$ do-pin-imix.sh myoutdir -- mpirun -n 64 -- ./myprog arg1 arg2
To run directly (no convenience script) on marked code:
$ source /path/to/oxbow/oxbow-tool-vendor/etc/envvars.sh
$ mpirun -n 64 ${PIN_DIR}/intel64/bin/pinbin -follow_execv -t ${OXBOW_TOOLS_DIR}/bin/imix.pin \
    -start_address oxbow_pin_imix_marker_start:repeat \
    -stop_address oxbow_pin_imix_marker_stop:repeat \
    -zero_stats_address oxbow_pin_imix_zero_stats:repeat \
    -emit_stats_address oxbow_pin_imix_emit_stats:repeat \
    -category -i -- myprog arg1 arg2
To use the memory bandwidth tool, you need PAPI installed in your environment. The environment variable PAPI_DIR must be set to a suitable value for your environment.
Please include the header file mem_bandwidth_calipers.h in your application.
Then you can use the following three calipers to mark the desired section in your application.
Initializes the PAPI library. Should be called once, before the other calipers are called. The rank is used as a suffix for the output file name.
This needs to be inserted at the beginning of the section you wish to profile.
This needs to be inserted at the end of the section you wish to profile. The 'name' parameter is used to identify the code section that you profiled.
You can call the start and stop calipers multiple times. The results are appended to the output file.
When building your application, you should
Please ensure that you have the suitable PAPI include and link flags for your platform.
The calipers will create one output file per process. The output files are named:
bandwidth_counts-<mpi_rank>-<proc_pid>.csv
You can obtain the reuse distance metrics by running:
$ ${PIN_DIR}/intel64/bin/pinbin -follow_execv -t ${OXBOW_TOOLS_DIR}/bin/reuse_dist_cal \
    -- mpiexec -n nprocs your_app args
The output files are named hist_reuse_dist_cal__xxx.txt, where xxx is the PID of the process.
The tool will collect reuse distance for the whole application by default.
Please include the header file reused_api.h in your application.
As with the memory bandwidth tool, you can insert a few function calls in the application source to mark the portion of the application you are interested in profiling.
When building your application, you should
This research is sponsored by the Office of Advanced Scientific Computing Research in the U.S. Department of Energy. The paper has been authored by Oak Ridge National Laboratory, which is managed by UT-Battelle, LLC under Contract #DE-AC05-00OR22725 to the U.S. Government. Accordingly, the U.S. Government retains a non-exclusive, royalty-free license to publish or reproduce the published form of this contribution, or allow others to do so, for U.S. Government purposes.