PACT 2022   October 10–12, 2022

Tutorials/Workshops Program

Saturday, October 8, 2022

Time What Where
13:15 Registration opens (please allow sufficient time to clear building security; see here for instructions) DPI
14:15–15:45 Tutorial: Memory-Centric Computing Orange & Blue Room, IC
14:15–15:45 Tutorial: SHAD C++ Library Discovery Room, DPI
14:15–15:45 Tutorial: NVMExplorer Classroom A, DPI
15:45–16:15 Coffee Break Discovery Room, DPI
16:15–17:45 Tutorials / Workshops resume (locations as above)

Tutorial: Memory-Centric Computing
  • Onur Mutlu (ETH Zurich)

Computing is bottlenecked by data. Large amounts of application data overwhelm the storage, communication, and computation capabilities of the machines we design today. As a result, the performance, efficiency, and scalability of many key applications are limited by data movement. In this lecture, we describe three major shortcomings of modern architectures: in 1) dealing with data, 2) taking advantage of the vast amounts of data, and 3) exploiting the different semantic properties of application data. We argue that an intelligent architecture should be designed to handle data well, and we show that doing so requires designing architectures based on three key principles: being 1) data-centric, 2) data-driven, and 3) data-aware. We give several examples of how to exploit each of these principles to design much more efficient and higher-performance computing systems. We especially discuss recent research that aims to fundamentally reduce memory latency and energy and to practically enable computation close to data, with at least two promising novel directions: 1) processing using memory, which exploits the analog operational properties of memory chips to perform massively parallel operations in memory with low-cost changes; and 2) processing near memory, which integrates sophisticated additional processing capability in memory controllers, in the logic layer of 3D-stacked memory technologies, or in memory chips, giving near-memory logic high memory bandwidth and low memory latency. We show that both types of architectures can enable orders-of-magnitude improvements in the performance and energy consumption of many important workloads, such as graph analytics, database systems, machine learning, and video processing. We discuss how to enable adoption of such fundamentally more intelligent architectures, which we believe are key to efficiency, performance, and sustainability. We conclude with some guiding principles for future computing architecture and system designs.
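
To make the data-movement argument concrete, here is a small, self-contained C++ sketch of a hypothetical processing-near-memory offload. Every name in it is invented for illustration (this is not an API from the lecture), and the device method is stubbed with a host-side loop so the example runs anywhere.

    #include <cstdint>
    #include <cstdio>
    #include <numeric>
    #include <vector>

    // Hypothetical processing-near-memory (PNM) offload interface. These
    // names are invented for illustration; they are not an API from the
    // lecture. The point: the reduction executes next to the data, and only
    // the 8-byte result crosses the memory channel instead of the whole array.
    struct PnmDevice {
      uint64_t sum(const std::vector<uint64_t> &data) {
        // Functional stand-in: on real PNM hardware this loop would run on
        // logic in the memory controller or the 3D-stack logic layer,
        // not on the CPU.
        return std::accumulate(data.begin(), data.end(), uint64_t{0});
      }
    };

    int main() {
      std::vector<uint64_t> data(1 << 20, 1);  // ~8 MiB resident "in memory"
      PnmDevice dev;
      // A CPU-side reduction moves ~8 MiB across the memory channel;
      // the PNM version would move ~8 bytes (just the result).
      std::printf("sum = %llu\n", (unsigned long long)dev.sum(data));
    }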

Tutorial: Boosting Productivity and Applications Performance on Parallel Distributed Systems with the SHAD C++ Library
  • Vito Giovanni Castellana (Pacific Northwest National Laboratory, Richland, WA)
  • Marco Minutoli (Pacific Northwest National Laboratory, Richland, WA)
  • John Feo (Pacific Northwest National Laboratory, Richland, WA)

As the complexity and scale of High Performance Computing systems grow (node, core, and accelerator counts; memory; network), so does the complexity of applications, and with it the demand for portability and productivity. With these issues in mind, we have designed SHAD, the Scalable High-performance Algorithms and Data-structures library. SHAD is open-source software, written in C++, for C++ developers. Unlike other HPC libraries for distributed systems, which rely on SPMD models, SHAD adopts a shared-memory programming abstraction to make C++ programmers feel at home. Thanks to its abstraction layers, SHAD can target different systems, ranging from laptops to HPC clusters, without any need to modify user-level code. In this tutorial, we first give an overview of the design of the SHAD library and its main components: runtime-system abstractions for tasking; parallel and distributed data structures; and STL-compliant interfaces and algorithms. We then propose an interactive hands-on session, with coding exercises covering the different components of the software, from the tasking API up to the STL algorithms and data structures layer. The SHAD library is available at https://github.com/pnnl/SHAD.
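
For a flavor of the programming style described above, the self-contained sketch below uses plain STL containers and algorithms as stand-ins for SHAD's distributed, STL-compliant counterparts; the mapping noted in the comments is illustrative, and the actual SHAD headers and signatures are documented in the repository.

    // Illustrative sketch only: ordinary STL code, annotated with how a
    // distributed library in SHAD's shared-memory, STL-like style lets the
    // same user-level code scale from a laptop to a cluster. Consult
    // https://github.com/pnnl/SHAD for the real headers and signatures.
    #include <algorithm>
    #include <array>
    #include <numeric>

    int main() {
      // Stand-in for a SHAD distributed array: with SHAD, elements would be
      // partitioned across nodes, but the code below would look the same.
      std::array<int, 1024> data{};
      std::iota(data.begin(), data.end(), 0);          // fill with 0..1023
      std::for_each(data.begin(), data.end(),
                    [](int &x) { x *= 2; });           // would become a distributed parallel loop
      long sum = std::accumulate(data.begin(), data.end(), 0L);
      return sum == 1023L * 1024L ? 0 : 1;             // sum of 0,2,4,...,2046
    }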

Tutorial: NVMExplorer: A Framework for Cross-Stack Comparisons of Embedded Non-Volatile Memory Solutions
  • Lilian Pentecost (Amherst College)
  • Alexander Hankin (Harvard / Intel)
  • Marco Donato (Tufts)
  • Mark Hempstead (Tufts)
  • Gu-Yeon Wei (Harvard)
  • David Brooks (Harvard)

NVMExplorer is a design space exploration framework that addresses key memory system design questions and reveals opportunities and optimizations for embedded NVMs under realistic system-level constraints, while providing a flexible interface and modular evaluation to empower further investigation. This tutorial will walk through hands-on design studies using our open-source code base, give instructions on using our interactive data visualization dashboard, and highlight the most recent additions to the framework, including new data-intensive workload characteristics and 3D-integrated memory solutions. We will additionally guide attendees in configuring and running their own design studies according to their interests. See our webpage (http://nvmexplorer.seas.harvard.edu/) for details.
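
As a rough illustration of what a cross-stack design-space sweep computes, here is a toy C++ sketch. This is not NVMExplorer's interface (the real framework is configuration-driven; see the webpage), and the numbers are placeholders rather than measured data.

    // Toy sketch of a cross-stack sweep: candidate NVM technologies are
    // evaluated against workload traffic profiles. All names and numbers
    // are invented placeholders, not NVMExplorer's API or data.
    #include <cstdio>
    #include <string>
    #include <vector>

    struct NvmTech  { std::string name; double read_ns, write_ns; };
    struct Workload { std::string name; double read_frac, write_frac; };

    int main() {
      std::vector<NvmTech> techs = {{"STT-RAM", 2.0, 10.0},
                                    {"RRAM",    5.0, 20.0}};
      std::vector<Workload> apps = {{"graph-traversal", 0.90, 0.10},
                                    {"dnn-inference",   0.99, 0.01}};
      for (const auto &t : techs)
        for (const auto &w : apps) {
          // Average access latency per operation under this traffic mix.
          double avg_ns = w.read_frac * t.read_ns + w.write_frac * t.write_ns;
          std::printf("%s on %s: %.1f ns/op\n",
                      w.name.c_str(), t.name.c_str(), avg_ns);
        }
    }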

Sunday, October 9, 2022

Time What Where
8:00 Registration opens (please allow sufficient time to clear building security; see here for instructions) DPI
8:00–9:00 Continental Breakfast Discovery Room, DPI
9:00–10:30 Tutorial: COMET Discovery Room, DPI
9:00–10:30 Tutorial: SODA Synthesizer, pt. 1 Orange & Blue Room, IC
10:30–11:00 Coffee Break Discovery Room, DPI
11:00–12:30 Tutorials / Workshops resume (locations as above)
12:30–14:00 Lunch (Attendees on their own)
14:15–15:45 Tutorial: SYCL Discovery Room, DPI
14:15–15:45 Tutorial: SODA Synthesizer, pt. 2 Orange & Blue Room, IC
15:45–16:15 Coffee Break Discovery Room, DPI
16:15–17:45 Tutorials / Workshops resume (locations as above)
Tutorial: SODA Synthesizer: Accelerating Data Science Applications with an End-to-End Silicon Compiler
  • Nicolas Bohm Agostini
  • Serena Curzel
  • Michele Fiorito
  • Vito Giovanni Castellana
  • Fabrizio Ferrandi
  • Antonino Tumeo

Data science applications (machine learning, graph analytics) are among the main drivers of the renewed interest in designing domain-specific accelerators, both on reconfigurable devices (Field Programmable Gate Arrays, FPGAs) and as Application-Specific Integrated Circuits (ASICs). Today, the availability of new high-level synthesis (HLS) tools that generate accelerators from high-level specifications provides easier access to FPGAs and ASICs while preserving programmer productivity. However, the conventional HLS flow typically starts from languages such as C, C++, or OpenCL, heavily annotated with information to guide hardware generation, which still leaves a significant gap with respect to the (Python-based) data science frameworks.

This tutorial will discuss HLS for accelerating data science on FPGAs and ASICs, highlighting key methodologies, trends, advantages, and benefits, but also the gaps that still need to be closed. The tutorial will provide hands-on experience with the SOftware Defined Accelerators (SODA) Synthesizer, a toolchain composed of SODA-OPT, an open-source front-end and optimizer that interfaces with productive Python-based data science frameworks, and Bambu, the most advanced open-source HLS tool available, able to generate optimized accelerators for data-intensive kernels.
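
As a point of reference for the gap discussed above (not taken from the tutorial materials), the snippet below shows the kind of plain C++ kernel an HLS tool such as Bambu can synthesize into an accelerator; in a conventional vendor flow, the loop would typically carry the tool-specific annotation noted in the comment.

    // A simple dot-product kernel written as ordinary C++: HLS turns the
    // top-level function into a hardware module and loops into datapaths.
    extern "C" float dot(const float *a, const float *b, unsigned n) {
      float acc = 0.0f;
      for (unsigned i = 0; i < n; ++i) {
        // In a conventional vendor flow this loop would typically carry a
        // tool-specific hint, e.g. "#pragma HLS pipeline II=1" in the style
        // of AMD/Xilinx Vitis HLS; this is the annotation burden the
        // abstract contrasts with compiler-driven flows like SODA's.
        acc += a[i] * b[i];
      }
      return acc;
    }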

Tutorial: SYCL for heterogeneous computing: updates, experience, and feedback
  • Zheming Jin (ORNL)

SYCL is a programming model based on standard ISO C++ with higher-level abstractions. It is a promising model for programming CPUs, GPUs, and other accelerators. The tutorial is organized as invited talks from researchers and developers in the SYCL community. Whether or not you are familiar with SYCL, we hope the tutorial will be interesting and valuable to your work.
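
For readers new to SYCL, here is a minimal SYCL 2020 vector addition (a generic example, not drawn from the invited talks): the same source targets a CPU, GPU, or other accelerator depending on the device the queue selects.

    #include <sycl/sycl.hpp>
    #include <iostream>
    #include <vector>

    int main() {
      constexpr size_t N = 1024;
      std::vector<float> a(N, 1.0f), b(N, 2.0f), c(N, 0.0f);

      sycl::queue q;  // default selector picks an available device

      {
        // Buffers wrap host data; the runtime manages transfers.
        sycl::buffer<float> buf_a(a.data(), sycl::range<1>(N));
        sycl::buffer<float> buf_b(b.data(), sycl::range<1>(N));
        sycl::buffer<float> buf_c(c.data(), sycl::range<1>(N));

        q.submit([&](sycl::handler &h) {
          sycl::accessor va(buf_a, h, sycl::read_only);
          sycl::accessor vb(buf_b, h, sycl::read_only);
          sycl::accessor vc(buf_c, h, sycl::write_only, sycl::no_init);
          h.parallel_for(sycl::range<1>(N), [=](sycl::id<1> i) {
            vc[i] = va[i] + vb[i];  // one work-item per element
          });
        });
      }  // buffer destruction synchronizes and copies results back

      std::cout << "c[0] = " << c[0] << "\n";  // expected: 3
    }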

A number of talks were presented as part of this tutorial:

  • SYCL Future Codesign with RISC-V, HPC, and AI. Michael Wong, Intel
  • Utilizing SYCL for Math Libraries − Lessons from Tasmanian and heFFTe. Miroslav Stoyanov, Oak Ridge National Lab
  • Experimental Study of SYCL Graph and Performance Portability. Tsung-Wei Huang, University of Utah
  • MDSPAN − A Deep Dive Spanning C++, SYCL & Kokkos. Nevin Liber, Argonne National Lab

Tutorial: COMET: A Domain-Specific Compilation of High-Performance Computational Chemistry
  • Gokcen Kestor (Pacific Northwest National Laboratory, Richland, WA)
  • Rizwan A. Ashraf (Pacific Northwest National Laboratory, Richland, WA)
  • Luanzheng Guo (Pacific Northwest National Laboratory, Richland, WA)
  • Ryan D. Friese (Pacific Northwest National Laboratory, Richland, WA)
  • Zhen Peng (Pacific Northwest National Laboratory, Richland, WA)

The recent slowdown of growth in the realized multi-core performance of commodity microprocessors has pushed vendors and users to consider more specialized architectures, including GPUs, FPGAs, and systems-on-chip. Several domains, such as artificial intelligence, have experienced an explosion of highly specialized heterogeneous architectures. With such a large variety of architectures, performance portability and productivity have become as important as peak performance, if not more so.

This tutorial introduces COMET (COMpiler for Extreme Targets), a compiler framework that aims to provide performance portability and productivity on heterogeneous platforms based on the principle of “write once, run everywhere.” The COMET compiler consists of a Domain-Specific Language (DSL) for sparse and dense tensor algebra computations and a progressive lowering process that maps high-level operations to low-level architectural resources. COMET is built on the Multi-Level IR (MLIR) compiler framework and, following MLIR's design, performs different optimizations and code transformations at each level of the IR stack. This tutorial will provide an overview of the COMET compiler and its supported frontends, and will review the optimizations and transformations COMET supports for efficient code generation on target architectures. Hands-on sections will show attendees how to write applications based on sparse and dense tensor algebra operations and how to use various compiler optimizations for high performance. By the end of this tutorial, attendees should understand the goals and principles of the compiler framework and be able to write dense and sparse programs using the COMET DSL, NumPy, or Rust; perform sophisticated optimizations, including high-level domain-specific optimizations; and generate code for various architectures, including CPUs and FPGAs, from a single source file.
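
For orientation, and independent of COMET's actual DSL syntax (which the tutorial covers), the loop nest below is the low-level computation that a one-line dense tensor contraction C[i,j] = A[i,k] * B[k,j] expresses; a progressive lowering flow like COMET's derives optimized variants of such loops from the single high-level statement.

    #include <vector>

    // Dense matrix multiply as an explicit loop nest: the kind of low-level
    // form that a one-line contraction C[i,j] = A[i,k] * B[k,j] in a
    // tensor-algebra DSL is progressively lowered to (before tiling,
    // vectorization, and other IR-level transformations).
    void matmul(const std::vector<double> &A, const std::vector<double> &B,
                std::vector<double> &C, int I, int K, int J) {
      for (int i = 0; i < I; ++i)
        for (int j = 0; j < J; ++j) {
          double acc = 0.0;
          for (int k = 0; k < K; ++k)
            acc += A[i * K + k] * B[k * J + j];  // row-major indexing
          C[i * J + j] = acc;
        }
    }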

Important Dates and Deadlines

Conference Papers:

  • Abstracts: April 18, 2022
  • Full Papers: April 25, 2022
  • Round 1 Rebuttal: June 6–9, 2022
  • Round 2 Rebuttal: July 11–14, 2022
  • Author Notification: July 29, 2022
  • Camera Ready Papers: August 26, 2022

Posters:

  • Poster Submission Deadline: September 1, 2022
  • Author Notification: September 15, 2022
  • Extended Abstract: September 29, 2022
  • Poster Session: October 10, 2022

ACM Student Research Competition:

  • Abstract Submission Deadline: September 8, 2022
  • Author Notification: September 16, 2022
  • SRC Poster Session: October 11, 2022
  • SRC Finalist Presentations: October 12, 2022

Student Travel Awards:

  • Application Deadline: October 5, 2022

Workshops and Tutorials:

  • Workshops/Tutorials: October 8–9, 2022

Conference: October 10–12, 2022



Sponsors

Platinum

Qualcomm

Gold

Huawei

Supporters

ACM SIGARCH

IEEE Computer Society

IFIP

Academic

CS@Illinois