ExHET 2023
The 2nd International Workshop on Extreme Heterogeneity Solutions
to be held in conjunction with
PPoPP 2023
25 February, 2023
Montreal, Canada
Introduction
While computing technologies have remained relatively stable for nearly two decades, new architectural features, such as specialized hardware, heterogeneous cores, deep memory hierarchies, and near-memory processing, have emerged as possible solutions to address concerns of energy efficiency, manufacturability, and cost. We expect this ‘golden age’ of architectural change to lead to extreme heterogeneity, which will have a major impact on software systems and applications. In the upcoming exascale and extreme heterogeneity era, it will be critical to explore new software approaches that enable us to effectively exploit this diverse hardware to advance science. Next-generation systems with heterogeneous elements will also need to accommodate complex workflows. This is mainly because accelerators now come in many forms (no longer just GPU accelerators), and different parts of an application must be mapped onto the elements most appropriate for each application component.
Objectives, scope and topics of the workshop
The goal of this workshop is to provide a forum to discuss new and emerging solutions that address the important challenges of the upcoming extreme heterogeneity era. Papers are sought on many aspects of heterogeneous computing, including (but not limited to):
- Heterogeneous Programming Environments and Runtime Systems
- Programming models and systems
- Parallel resource management on heterogeneous systems
- Automated parallelization and compiler techniques (Autotuning)
- Heterogeneous Solutions for HPC and Scientific Applications
- Parallel and distributed algorithms
- Parallel libraries and frameworks
- Parallel processing on heterogeneous systems
- Heterogeneous (including Non-von Neumann) Architectures
- Power/energy management
- Heterogeneous architectures for emerging application domains
- Architecture designs, including Non-von Neumann architectures, memory, and interconnects
- Reliability/Benchmarking/Measurements
- Debugging, performance tools and techniques
- Fault tolerance and resilience
- Application/hardware benchmarks
Program
8:00 AM - 8:10 AM : Opening Remarks
8:10 AM - 9:00 AM : Keynote: Seungwon Lee, Single Large Job Acceleration with GPU and Processing-in-memory (PIM)
9:00 AM - 9:20 AM : Paper talk 1: Li Tang, Harnessing Extreme Heterogeneity for Ocean Modeling with Tensors
9:20 AM - 9:40 AM : Paper talk 2: Narasinga Rao Miniskar, Tiling Framework for Heterogeneous Computing of Matrix based Tiled Algorithms
9:40 AM - 10:00 AM : Paper talk 3: Gaurav Verma, Transfer Learning Across Heterogeneous Features For Efficient Tensor Program Generation
10:00 AM - 10:20 AM : Coffee break
10:20 AM - 11:00 AM : Invited talk: Juan Gomez-Luna, Understanding a Modern Processing-in-Memory Architecture: Benchmarking and Experimental Analysis
11:00 AM - 12:00 PM : Panel: Seungwon Lee, Dong Li, Juan Gomez-Luna, Li Tang: Challenges and Solutions for the upcoming Extreme Heterogeneity Era
12:00 PM - 12:10 PM : Best paper award
12:10 PM - 12:20 PM : Wrap up
2023 Best Paper Award
Harnessing Extreme Heterogeneity for Ocean Modeling with Tensors
Authors: Li Tang (LANL), Philip Jones (LANL) and Scott Pakin (LANL).
Abstract: Specialized processors designed to accelerate tensor operations are evolving faster than conventional processors. This trend of architectural innovation greatly benefits artificial intelligence (AI) workloads. However, it is unknown how well AI-optimized accelerators can be retargeted to scientific applications. To answer this question, we explore (1) whether a typical scientific modeling kernel can be mapped efficiently to tensor operations and (2) whether this approach is portable across diverse processors and AI accelerators. In this paper we implement two versions of tracer advection in an ocean-modeling application using PyTorch and evaluate them on one CPU, two GPUs, and Google’s TPU. Our findings are that scientific modeling can obtain both a performance boost and improved portability by mapping key computational kernels to tensor operations.
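To make the idea concrete, the sketch below shows how an advection-style stencil kernel can be expressed purely as tensor operations, the approach the abstract describes. This is an illustrative first-order upwind scheme on a periodic 1-D grid written by the editor, not the authors' actual code; the function name `advect_upwind` and all parameters are hypothetical.

```python
import torch

def advect_upwind(tracer, velocity, dx, dt):
    """One first-order upwind advection step on a periodic 1-D grid,
    written entirely as tensor operations (no explicit loops), so it
    runs unchanged on CPU, GPU, or other PyTorch backends."""
    # Backward and forward differences via periodic shifts.
    left = tracer - torch.roll(tracer, 1)
    right = torch.roll(tracer, -1) - tracer
    # Upwinding: pick the difference on the side the flow comes from.
    flux = torch.where(velocity > 0, velocity * left, velocity * right)
    return tracer - (dt / dx) * flux

# Advect a Gaussian tracer pulse to the right on a periodic domain.
n = 128
x = torch.linspace(0.0, 1.0, n)
tracer = torch.exp(-((x - 0.5) ** 2) / 0.005)
velocity = torch.ones(n)
for _ in range(100):
    tracer = advect_upwind(tracer, velocity, dx=1.0 / n, dt=0.5 / n)
```

Because every step is a vectorized tensor operation, the same code can be retargeted to any accelerator PyTorch supports simply by moving the tensors to that device.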
Important Dates
Paper submission (final) deadline : December 23, 2022
Notification of acceptance : January 20, 2023
Camera-ready papers due : February 10, 2023
Workshop day: February 25, 2023
Steering Committee
Jeffrey S. Vetter, Oak Ridge National Laboratory, USA
Mitsuhisa Sato, RIKEN, Japan
Olivier Aumage, INRIA, France
Taisuke Boku, University of Tsukuba, Japan
Manuel Prieto, Complutense University of Madrid, Spain
Hartwig Anzt, KIT, Germany
Hyesoon Kim, Georgia Institute of Technology, USA
Stanimire Tomov, University of Tennessee at Knoxville, USA
Enrique Quintana, Technical University of Valencia, Spain
Antonio J. Pena, Barcelona Supercomputing Center, Spain
Organizers (Contact us)
Pedro Valero-Lara (co-chair)
Oak Ridge National Laboratory, USA
valerolarap@ornl.gov
Seyong Lee (co-chair)
Oak Ridge National Laboratory, USA
lees2@ornl.gov
Gokcen Kestor (co-chair)
Pacific Northwest National Laboratory, University of California Merced, USA
gokcen.kestor@pnnl.gov
Monil Mohammad Alaul Haque (proceedings chair)
Oak Ridge National Laboratory, USA
monilm@ornl.gov
Marc Gonzalez (publicity chair)
Oak Ridge National Laboratory, USA
gonzaleztalm@ornl.gov
Steve Moulton (web chair)
Oak Ridge National Laboratory, USA
moultonsa@ornl.gov
Programme Committee
- William F. Godoy, Oak Ridge National Laboratory, USA
- Guray Ozen, Google, USA
- Juan Gomez-Luna, ETH Zurich, Switzerland
- Jaewoong Sim, Seoul National University, Korea
- Ali Akoglu, Arizona State University, USA
- Naoya Maruyama, NVIDIA, USA
- Jose Manuel Monsalve, Argonne National Laboratory, USA
- Junjie Li, University of Texas at Austin, USA
Manuscript submission
Papers reporting original and unpublished research results and experience are solicited. Papers must not exceed 6 pages in the standard ACM two-column conference format. ACM templates for Microsoft Word and LaTeX are available here. All paper submissions will be handled electronically via EasyChair.
Proceedings
All accepted papers will be published in the ExHET-PPoPP Workshops 2023 proceedings by the ACM Digital Library.
Best Paper Award
The Best Paper Award will be selected on the basis of the reviewers’ explicit recommendations and their scores for originality and quality.
Special Issue Journal
Selected best papers of ExHET will be considered for publication in a special issue of the international journal Applied Sciences (IF: 2.838).
Keynote (Seungwon Lee, Samsung):
Single Large Job Acceleration with GPU and Processing-in-memory
In this talk, we explore current trends affecting supercomputers (TOP500) and highlight case studies of large jobs at the Samsung Advanced Institute of Technology (SAIT). For Molecular Dynamics (MD) and Density Functional Theory (DFT) simulations, we use GPUs instead of CPUs to improve simulation speed by up to 300 times. In addition, we optimized a large language model (BERT) so that computation and communication overlap. Using a hyper-parameter auto-tuning method, we can enlarge the batch size and achieve the best time-to-result for BERT training on MLPerf Training v1.1 and v2.0 when using 1024 NVIDIA® A100 GPUs (https://github.com/SAITPublic/). Lastly, we propose an OpenACC extension for Processing-in-memory (PIM) and present results on HPC benchmarks.
Seungwon Lee is a Master (Vice President of Technology) at the Samsung Advanced Institute of Technology, Suwon-si, 16678, South Korea. His research interests include large-scale deep learning computing software and near-memory computing software. Lee received a Ph.D. degree in computer science and engineering from Seoul National University, Seoul, South Korea. Contact him at seungw.lee@samsung.com.
Invited Talk (Juan Gomez-Luna, ETH Zurich):
Understanding a Modern Processing-in-Memory Architecture: Benchmarking and Experimental Analysis
Processing-in-memory (PIM) is becoming a reality that promises to overcome the data movement bottleneck (i.e., the waste of execution cycles and energy due to frequent movement of data between memory and compute units) by equipping compute systems with compute-capable memories. Several major vendors and startups have prototyped and announced their PIM architectures. Among them, the UPMEM company commercializes the first publicly available real-world PIM architecture. This architecture combines traditional DRAM memory arrays with general-purpose in-order cores, called DRAM Processing Units (DPUs), integrated on the same chip. In this talk, we will provide an overview of the first comprehensive analysis of the first publicly available real-world PIM architecture. We make two key contributions. First, we conduct an experimental characterization of the UPMEM-based PIM system using microbenchmarks to assess various architecture limits such as compute throughput and memory bandwidth, yielding new insights. Second, we present PrIM (Processing-In-Memory benchmarks), a benchmark suite of 16 workloads from different application domains (e.g., dense/sparse linear algebra, databases, data analytics, graph processing, neural networks, bioinformatics, image processing), which we identify as memory-bound. We evaluate the performance and scaling characteristics of PrIM benchmarks on the UPMEM PIM architecture, and compare their performance and energy consumption to their state-of-the-art CPU and GPU counterparts. Our extensive evaluation, conducted on two real UPMEM-based PIM systems with 640 and 2,556 DPUs, provides new insights into the suitability of different workloads for the PIM system, programming recommendations for software designers, and suggestions and hints for hardware and architecture designers of future PIM systems.
Juan Gomez-Luna is a senior researcher and lecturer at SAFARI Research Group at ETH Zurich. He received the BS and MS degrees in Telecommunication Engineering from the University of Sevilla, Spain, in 2001, and the PhD degree in Computer Science from the University of Cordoba, Spain, in 2012. Between 2005 and 2017, he was a faculty member of the University of Cordoba. His research interests focus on processing-in-memory, memory systems, heterogeneous computing, and hardware and software acceleration of medical imaging and bioinformatics. He is the lead author of PrIM (https://github.com/CMU-SAFARI/prim-benchmarks), the first publicly-available benchmark suite for a real-world processing-in-memory architecture, and Chai (https://github.com/chai-benchmarks/chai), a benchmark suite for heterogeneous systems with CPU/GPU/FPGA.
Panel:
Challenges and Solutions for the upcoming Extreme Heterogeneity Era
During the panel discussion, the panelists, as well as workshop participants, will have the opportunity to discuss the fundamentals of extreme heterogeneity: its challenges and solutions.
Panelists:
Seungwon Lee, Samsung, Korea
Li Tang, Los Alamos National Laboratory, USA
Dong Li, University of California Merced, USA
Juan Gomez-Luna, ETH Zurich, Switzerland
Registration
Information about registration is available on the PPoPP 2023 website.