ExHET 2023
The 2nd International Workshop on Extreme Heterogeneity Solutions
to be held in conjunction with
PPoPP 2023
25 February, 2023
Montreal, Canada
Introduction
While computing technologies have remained relatively stable for nearly two decades, new architectural features, such as specialized hardware, heterogeneous cores, deep memory hierarchies, and near-memory processing, have emerged as possible solutions to address concerns of energy efficiency, manufacturability, and cost. We expect this ‘golden age’ of architectural change to lead to extreme heterogeneity, which will have a major impact on software systems and applications. In the upcoming exascale and extreme heterogeneity era, it will be critical to explore new software approaches that enable us to effectively exploit this diverse hardware to advance science. Next-generation systems with heterogeneous elements will also need to accommodate complex workflows. This is mainly because accelerators now come in many forms (no longer just GPU accelerators), and different parts of an application must be mapped onto the elements most appropriate for each application component.
Objectives, scope and topics of the workshop
The goal of this workshop is to provide a forum to discuss new and emerging solutions that address the important challenges of the upcoming extreme heterogeneity era. Papers are sought on many aspects of heterogeneous computing, including (but not limited to):
- Heterogeneous Programming Environments and Runtime Systems
- Programming models and systems
- Parallel resource management on heterogeneous systems
- Automated parallelization and compiler techniques (Autotuning)
- Heterogeneous Solutions for HPC and Scientific Applications
- Parallel and distributed algorithms
- Parallel libraries and frameworks
- Parallel processing on heterogeneous systems
- Heterogeneous (including Non-von Neumann) Architectures
- Power/energy management
- Heterogeneous architectures for emerging application domains
- Architecture designs, including Non-von Neumann architectures, memory, and interconnects
- Reliability/Benchmarking/Measurements
- Debugging, performance tools and techniques
- Fault tolerance and resilience
- Application/hardware benchmarks
Program
8:00 AM - 8:10 AM : Opening Remarks
8:10 AM - 9:00 AM : Keynote: Seungwon Lee, Single Large Job Acceleration with GPU and Processing-in-memory (PIM)
9:00 AM - 9:20 AM : Paper talk 1: Li Tang, Harnessing Extreme Heterogeneity for Ocean Modeling with Tensors
9:20 AM - 9:40 AM : Paper talk 2: Narasinga Rao Miniskar, Tiling Framework for Heterogeneous Computing of Matrix based Tiled Algorithms
9:40 AM - 10:00 AM : Paper talk 3: Gaurav Verma, Transfer Learning Across Heterogeneous Features For Efficient Tensor Program Generation
10:00 AM - 10:20 AM : Coffee break
10:20 AM - 11:00 AM : Invited talk: Juan Gomez-Luna, Understanding a Modern Processing-in-Memory Architecture: Benchmarking and Experimental Analysis
11:00 AM - 12:00 PM : Panel: Seungwon Lee, Dong Li, Juan Gomez-Luna, Li Tang: Challenges and Solutions for the upcoming Extreme Heterogeneity Era
12:00 PM - 12:10 PM : Best paper award
12:10 PM - 12:20 PM : Wrap up
2023 Best Paper Award
Harnessing Extreme Heterogeneity for Ocean Modeling with Tensors
Authors: Li Tang (LANL), Philip Jones (LANL) and Scott Pakin (LANL).
Abstract: Specialized processors designed to accelerate tensor operations are evolving faster than conventional processors. This trend of architectural innovation greatly benefits artificial intelligence (AI) workloads. However, it is unknown how well AI-optimized accelerators can be retargeted to scientific applications. To answer this question, we explore (1) whether a typical scientific modeling kernel can be mapped efficiently to tensor operations and (2) whether this approach is portable across diverse processors and AI accelerators. In this paper we implement two versions of tracer advection in an ocean-modeling application using PyTorch and evaluate them on one CPU, two GPUs, and Google’s TPU. Our findings are that scientific modeling can obtain both a performance boost and improved portability by mapping key computational kernels to tensor operations.
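To make the idea concrete, the sketch below shows how an advection-style stencil kernel can be expressed purely as tensor operations, the approach the abstract describes. This is an illustrative first-order upwind scheme on a periodic 1-D grid written by the editor, not the authors' actual code; the function name `advect_upwind` and all parameters are hypothetical.

```python
import torch

def advect_upwind(tracer, velocity, dx, dt):
    """One first-order upwind advection step on a periodic 1-D grid,
    written entirely as tensor operations (no explicit loops), so it
    runs unchanged on CPU, GPU, or other PyTorch backends."""
    # Backward and forward differences via periodic shifts.
    left = tracer - torch.roll(tracer, 1)
    right = torch.roll(tracer, -1) - tracer
    # Upwinding: pick the difference on the side the flow comes from.
    flux = torch.where(velocity > 0, velocity * left, velocity * right)
    return tracer - (dt / dx) * flux

# Advect a Gaussian tracer pulse to the right on a periodic domain.
n = 128
x = torch.linspace(0.0, 1.0, n)
tracer = torch.exp(-((x - 0.5) ** 2) / 0.005)
velocity = torch.ones(n)
for _ in range(100):
    tracer = advect_upwind(tracer, velocity, dx=1.0 / n, dt=0.5 / n)
```

Because every step is a vectorized tensor operation, the same code can be retargeted to any accelerator PyTorch supports simply by moving the tensors to that device.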
Important Dates
Paper submission (final) deadline : December 23, 2022
Notification of acceptance : January 20, 2023
Camera-ready papers due : February 10, 2023
Workshop day: February 25, 2023
Steering Committee
Jeffrey S. Vetter, Oak Ridge National Laboratory, USA
Mitsuhisa Sato, RIKEN, Japan
Olivier Aumage, INRIA, France
Taisuke Boku, University of Tsukuba, Japan
Manuel Prieto, Complutense University of Madrid, Spain
Hartwig Anzt, KIT, Germany
Hyesoon Kim, Georgia Institute of Technology, USA
Stanimire Tomov, University of Tennessee at Knoxville, USA
Enrique Quintana, Technical University of Valencia, Spain
Antonio J. Pena, Barcelona Supercomputing Center, Spain
Organizers (Contact us)
Pedro Valero-Lara (co-chair)
Oak Ridge National Laboratory, USA
valerolarap@ornl.gov
Seyong Lee (co-chair)
Oak Ridge National Laboratory, USA
lees2@ornl.gov
Gokcen Kestor (co-chair)
Pacific Northwest National Laboratory, University of California Merced, USA
gokcen.kestor@pnnl.gov
Monil Mohammad Alaul Haque (proceedings chair)
Oak Ridge National Laboratory, USA
monilm@ornl.gov
Marc Gonzalez (publicity chair)
Oak Ridge National Laboratory, USA
gonzaleztalm@ornl.gov
Steve Moulton (web chair)
Oak Ridge National Laboratory, USA
moultonsa@ornl.gov
Programme Committee
- William F. Godoy, Oak Ridge National Laboratory, USA
- Guray Ozen, Google, USA
- Juan Gomez-Luna, ETH Zurich, Switzerland
- Jaewoong Sim, Seoul National University, Korea
- Ali Akoglu, Arizona State University, USA
- Naoya Maruyama, NVIDIA, USA
- Jose Manuel Monsalve, Argonne National Laboratory, USA
- Junjie Li, University of Texas at Austin, USA
Manuscript submission
Papers reporting original and unpublished research results and experience are solicited. Papers must not exceed 6 pages in the standard ACM two-column conference format. ACM templates for Microsoft Word and LaTeX are available here. All paper submissions will be handled electronically via EasyChair.
Proceedings
All accepted papers will be published in the ExHET-PPoPP Workshops 2023 proceedings by the ACM Digital Library.
Best Paper Award
The Best Paper Award will be selected on the basis of the reviewers’ explicit recommendations and their scores for originality and quality.
Special Issue Journal
Selected best papers of ExHET will be considered for publication in a special issue of the international journal Applied Sciences (IF: 2.838).
Keynote (Seungwon Lee, Samsung):
Single Large Job Acceleration with GPU and Processing-in-memory
In this talk, we explore current trends affecting supercomputers (TOP500) and highlight case studies of large jobs at the Samsung Advanced Institute of Technology (SAIT). For Molecular Dynamics (MD) and Density Functional Theory (DFT) simulations, we use GPUs instead of CPUs to improve simulation speed by up to 300 times. In addition, we optimized a large language model (BERT) so that computation and communication overlap. Using a hyper-parameter auto-tuning method, we can enlarge the batch size and achieve the best time-to-result for BERT training on MLPerf Training v1.1 and v2.0 when using 1024 NVIDIA® A100 GPUs (https://github.com/SAITPublic/). Lastly, we propose an OpenACC extension for Processing-in-memory (PIM) and present results on HPC benchmarks.
Seungwon Lee is a Master (Vice President of Technology) at the Samsung Advanced Institute of Technology, Suwon-si, 16678, South Korea. His research interests include large-scale deep learning computing software and near-memory computing software. Lee received a Ph.D. degree in computer science and engineering from Seoul National University, Seoul, South Korea. Contact him at seungw.lee@samsung.com.
Invited Talk (Juan Gomez-Luna, ETH Zurich):
Understanding a Modern Processing-in-Memory Architecture: Benchmarking and Experimental Analysis
Processing-in-memory (PIM) is becoming a reality that promises to overcome the data movement bottleneck (i.e., the waste of execution cycles and energy due to frequent movement of data between memory and compute units) by equipping compute systems with compute-capable memories. Several major vendors and startups have prototyped and announced their PIM architectures. Among them, the UPMEM company commercializes the first publicly available real-world PIM architecture. This architecture combines traditional DRAM memory arrays with general-purpose in-order cores, called DRAM Processing Units (DPUs), integrated on the same chip. In this talk, we will provide an overview of the first comprehensive analysis of the first publicly available real-world PIM architecture. We make two key contributions. First, we conduct an experimental characterization of the UPMEM-based PIM system using microbenchmarks to assess various architecture limits such as compute throughput and memory bandwidth, yielding new insights. Second, we present PrIM (Processing-In-Memory benchmarks), a benchmark suite of 16 workloads from different application domains (e.g., dense/sparse linear algebra, databases, data analytics, graph processing, neural networks, bioinformatics, image processing), which we identify as memory-bound. We evaluate the performance and scaling characteristics of PrIM benchmarks on the UPMEM PIM architecture, and compare their performance and energy consumption to their state-of-the-art CPU and GPU counterparts. Our extensive evaluation, conducted on two real UPMEM-based PIM systems with 640 and 2,556 DPUs, provides new insights into the suitability of different workloads for the PIM system, programming recommendations for software designers, and suggestions and hints for hardware and architecture designers of future PIM systems.
Juan Gomez-Luna is a senior researcher and lecturer at SAFARI Research Group at ETH Zurich. He received the BS and MS degrees in Telecommunication Engineering from the University of Sevilla, Spain, in 2001, and the PhD degree in Computer Science from the University of Cordoba, Spain, in 2012. Between 2005 and 2017, he was a faculty member of the University of Cordoba. His research interests focus on processing-in-memory, memory systems, heterogeneous computing, and hardware and software acceleration of medical imaging and bioinformatics. He is the lead author of PrIM (https://github.com/CMU-SAFARI/prim-benchmarks), the first publicly-available benchmark suite for a real-world processing-in-memory architecture, and Chai (https://github.com/chai-benchmarks/chai), a benchmark suite for heterogeneous systems with CPU/GPU/FPGA.
Panel:
Challenges and Solutions for the upcoming Extreme Heterogeneity Era
During the panel discussion, the panelists, as well as workshop participants, will have the opportunity to discuss the fundamentals of extreme heterogeneity: its challenges and solutions.
Panelists:
Seungwon Lee, Samsung, Korea
Li Tang, Los Alamos National Laboratory, USA
Dong Li, University of California Merced, USA
Juan Gomez-Luna, ETH Zurich, Switzerland
Registration
Information about registration is available on the PPoPP 2023 website.