Main Conference Program

October 8: Tutorials and Workshops Day 1
October 9: Tutorials and Workshops Day 2
October 10: Main Conference Day 1
October 11: Main Conference Day 2
October 12: Main Conference Day 3
DPI below refers to the Discovery Partners Institute, on the fourth floor of 200 S. Wacker Drive, Chicago, IL.
IC below refers to the Illini Center, on the 19th floor of 200 S. Wacker Drive, Chicago, IL.

Monday, October 10, 2022

Time	What	Where
7:30	Registration opens. Please allow sufficient time to clear building security. See here for instructions.	DPI
7:30–8:20	Continental Breakfast	Discovery Room, DPI
8:20–8:30	Welcome from the Chairs	Discovery Room, DPI
8:30–9:30	Keynote: Closing the Gap between Quantum Algorithms and Machines with Hardware-Software Co-Design	Discovery Room, DPI
9:30–10:00	Coffee Break	Discovery Room, DPI
10:00–12:00	Track 1: Compilers for ever. Session Chair: Nelson Amaral	Discovery Room, DPI
10:00–10:30	ReACT: Redundancy-Aware Code Generation for Tensor Expressions (#295). T. Zhou, R. Tian, R. Ashraf, R. Gioiosa, G. Kestor, V. Sarkar	Discovery Room, DPI
10:30–11:00	Com-CAS: Effective Cache Apportioning Under Compiler Guidance (#12). B. Chatterjee, S. Khan, S. Pande	Discovery Room, DPI
11:00–11:30	Transfer-Tuning: Reusing Auto-Schedules for Efficient Tensor Program Code Generation (#267). P. Gibson, J. Cano	Discovery Room, DPI
11:30–12:00	HBMax: Optimizing Memory Efficiency for Parallel Influence Maximization on Multicore Architectures (#26). X. Chen, M. Minutoli, J. Tian, M. Halappanavar, A. Kalyanaraman, D. Tao	Discovery Room, DPI
10:00–12:00	Track 2: Optimizing the execution of GNNs. Session Chair: Antonino Tumeo	Orange & Blue Room, IC
10:00–10:30	Slice-and-Forge: Making Better Use of Caches for Graph Convolutional Network Accelerators (#520). M. Yoo, J. Song, H. Lee, J. Lee, N. Kim, Y. Kim, J. Lee	Orange & Blue Room, IC
10:30–11:00	GNNear: Accelerating Full-Batch Training of Graph Neural Networks with Near-Memory Processing (#148). Z. Zhou, C. Li, X. Wei, X. Wang, G. Sun	Orange & Blue Room, IC
11:00–11:30	T-GCN: A Sampling Based Streaming Graph Neural Network System With Hybrid Architecture (#31). C. Huan, S. Song, Y. Liu, H. Zhang, H. Liu, C. He, K. Chen, J. Jiang, Y. Wu	Orange & Blue Room, IC
11:30–12:00	Optimizing Aggregate Computation of Graph Neural Networks with On-GPU Interpreter-Style Programming (#403). Z. Ji, C. Wang	Orange & Blue Room, IC
12:00–13:30	Lunch	(Attendees on their own)
12:30–13:30	Steering Committee Meeting	Illini Room, IC
13:30–15:00	Track 1: Getting more out of your memory. Session Chair: Jose Moreira	Discovery Room, DPI
13:30–14:00	FlatPack: Flexible Compaction of Compressed Memory (#66). A. Eldstål-Ahrens, A. Arelakis, I. Sourdis	Discovery Room, DPI
14:00–14:30	Pavise: Integrating Fault Tolerance Support for Persistent Memory Applications (#118). H. Qiu, S. Liu, X. Song, S. Khan, G. Pekhimenko	Discovery Room, DPI
14:30–15:00	Efficient Atomic Durability on eADR-enabled Persistent Memory (#199). T. Zhou, Y. Du, F. Yang, X. Liao, Y. Lu	Discovery Room, DPI
13:30–15:00	Track 2: Sparse matrix computations. Session Chair: Gagan Agrawal	Orange & Blue Room, IC
13:30–14:00	Probing the Efficacy of Hardware-Aware Weight Pruning to Optimize the SpMM routine on Ampere GPUs (#416). R. Castro, D. Andrade, B. Fraguela	Orange & Blue Room, IC
14:00–14:30	Squaring the circle: Executing Sparse Matrix Computations on FlexTPU, a TPU-like processor (#133). X. He, K. Chen, S. Feng, H. Kim, D. Blaauw, R. Dreslinski, T. Mudge	Orange & Blue Room, IC
14:30–15:00	Custom High-Performance Vector Code Generation for Data-Specific Sparse Computations (#139). M. Horro, L. Pouchet, G. Rodríguez, J. Tourino	Orange & Blue Room, IC
15:00–15:30	Coffee Break	Discovery Room, DPI
15:30–17:00	Track 1: Graph processing. Session Chair: Vivek Sarkar	Discovery Room, DPI
15:30–16:00	Batched Graph Community Detection on GPUs (#72). H. Chou, S. Ghosh	Discovery Room, DPI
16:00–16:30	SampleMine: A Framework for Applying Random Sampling to Subgraph Pattern Mining through Loop Perforation (#85). P. Jiang, Y. Wei, J. Su, R. Wang, B. Wu	Discovery Room, DPI
16:30–17:00	Decoupling Scheduler, Topology Layout, and Algorithm to Easily Enlarge the Tuning Space of GPU Graph Processing (#308). S. Jeong, Y. Lee, J. Lee, H. Choi, S. Song, J. Lee, Y. Kim, H. Kim	Discovery Room, DPI
15:30–17:00	Track 2: Miscellaneous. Session Chair: Jose Moreira	Orange & Blue Room, IC
15:30–16:00	Tiered Hashing: Revamping Hash Indexing under a Unified Memory-Storage Hierarchy (#58). J. Zhou, J. Wu, W. Huang, Y. Zhou, F. Wu, L. Shi, X. Zhang, K. Wang, F. Zhu, S. Li, W. Wang	Orange & Blue Room, IC
16:00–16:30	Understanding and Reaching the Performance Limit of Schedule Tuning on Stable Synchronization Determinism (#145). Q. Zhao, Z. Qiu, S. Shao, X. Hui, H. Khan, G. Jin	Orange & Blue Room, IC
16:30–17:00	VoxelCache: Accelerating Online Mapping in Robotics and 3D Reconstruction Tasks (#183). S. Durvasula, R. Kiguru, S. Mathur, J. Xu, J. Lin, N. Vijaykumar	Orange & Blue Room, IC
17:00–19:00	Poster Session / Reception	Classroom B, DPI

Keynote: Closing the Gap between Quantum Algorithms and Machines with Hardware-Software Co-Design

Fred Chong (Department of Computer Science, University of Chicago, Chicago, IL)

Quantum computing is at an inflection point, where 127-qubit machines are deployed, and 1000-qubit machines are perhaps only a few years away. These machines have the potential to fundamentally change our concept of what is computable and demonstrate practical applications in areas such as quantum chemistry, optimization, and quantum simulation. Yet a significant resource gap remains between practical quantum algorithms and real machines. A promising approach to closing this gap is to design software that is aware of the key physical properties of emerging quantum technologies. I will illustrate this approach with some of our recent work that focuses on techniques that break traditional abstractions and inform hardware design, including compiling programs directly to analog control pulses, computing with ternary quantum bits, 2.5D architectures for surface codes, and exploiting long-distance communication and tolerating atom loss in neutral-atom machines.

Fred Chong is the Seymour Goodman Professor in the Department of Computer Science at the University of Chicago and the Chief Scientist for Quantum Software at ColdQuanta. He is also Lead Principal Investigator for the EPiQC Project (Enabling Practical-scale Quantum Computing), an NSF Expedition in Computing. Chong is a member of the National Quantum Initiative Advisory Committee (NQIAC), which provides advice to the President and Secretary of Energy on the National Quantum Initiative Program. In 2020, he co-founded Super.tech, a quantum software company, which was acquired by ColdQuanta in 2022. Chong received his Ph.D. from MIT in 1996 and was a faculty member and Chancellor's fellow at UC Davis from 1997 to 2005. He was also a Professor of Computer Science, Director of Computer Engineering, and Director of the Greenscale Center for Energy-Efficient Computing at UCSB from 2005 to 2015. He is a recipient of the NSF CAREER award, the Intel Outstanding Researcher Award, and 13 best paper awards.

Tuesday, October 11, 2022

Time	What	Where
7:30	Registration opens. Please allow sufficient time to clear building security. See here for instructions.	DPI
7:30–8:25	Continental Breakfast	Discovery Room, DPI
8:25–8:30	PACT 2023 in Vienna: A Preview	Discovery Room, DPI
8:30–9:30	Keynote: MemComputing: Fundamentals and Applications	Discovery Room, DPI
9:30–10:00	Coffee Break	Discovery Room, DPI
10:00–12:00	ACM SRC Poster Session	Discovery Room, DPI
10:00–12:00	Track 1: Better neural networks. Session Chair: Jose Cano Reyes	Orange & Blue Room, IC
10:00–10:30	Effective Performance Modeling and Domain-Specific Compiler Optimization of CNNs for GPUs (#178). Y. Xu, Q. Yuan, E. Barton, R. Li, P. Sadayappan, A. Sukumaran-Rajam	Orange & Blue Room, IC
10:30–11:00	High-performance Architecture Aware Sparse Convolutional Neural Networks for GPUs (#136). L. Xiang, P. Sadayappan, A. Sukumaran-Rajam	Orange & Blue Room, IC
11:00–11:30	Weightless Neural Networks for Efficient Edge Inference (#256). Z. Susskind, A. Arora, I. Miranda, L. Villon, R. Katopodis, L. de Araújo, D. Dutra, P. Lima, F. França, M. Breternitz Jr., L. John	Orange & Blue Room, IC
11:30–12:00	Q-gym: An Equality Saturation Framework for DNN Inference Exploiting Weight Repetition (#176). C. Fu, H. Huang, B. Wasti, C. Cummins, R. Baghdadi, K. Hazelwood, Y. Tian, J. Zhao, H. Leather	Orange & Blue Room, IC
12:00–13:30	Lunch	(Attendees on their own)
13:30–15:00	Track 1: Getting more out of your GPU. Session Chair: Perry Gibson	Discovery Room, DPI
13:30–14:00	Locality-aware Optimizations for Improving Remote Memory Latency in Multi-GPU Systems (#43). L. Belayneh, H. Ye, K. Chen, D. Blaauw, T. Mudge, R. Dreslinski, N. Talati	Discovery Room, DPI
14:00–14:30	GPUPool: A Holistic Approach to Fine-Grained GPU Sharing in the Cloud (#50). X. Tan, P. Golikov, N. Vijaykumar, G. Pekhimenko	Discovery Room, DPI
14:30–15:00	NaviSim: A Highly Accurate GPU Simulator for AMD RDNA GPUs (#135). Y. Bao, Y. Sun, Z. Feric, M. Shen, M. Weston, J. Abellán, T. Baruah, J. Kim, A. Joshi, D. Kaeli	Discovery Room, DPI
13:30–15:00	Track 2: Better hardware. Session Chair: Sushant Kondguli	Orange & Blue Room, IC
13:30–14:00	mu-grind: A Framework for Dynamically Instrumenting HLS generated RTL (#158). P. Vahdatnia, A. Sharifian, R. Hojabr, A. Shriraman	Orange & Blue Room, IC
14:00–14:30	Athena: An Early-Fetch Architecture To Reduce On-Chip Page Walk Latencies (#276). S. Ghahani, S. Khadirsharbiyani, J. Kotra, M. Kandemir	Orange & Blue Room, IC
14:30–15:00	DSDP: Dual Stream Data Prefetcher (#204). M. He, H. Wang, K. Zhou, K. Cui, H. Yan, C. Guo, R. He	Orange & Blue Room, IC
15:00–15:30	Coffee Break	Discovery Room, DPI
15:30–16:30	Track 1: Task parallelism. Session Chair: Santosh Pande	Discovery Room, DPI
15:30–16:00	Efficient task-mapping of parallel applications using a space-filling curve (#83). O. Kwon, J. Kang, S. Lee, W. Kim, J. Song	Discovery Room, DPI
16:00–16:30	Auto-Partitioning Heterogeneous Task-Parallel Programs with StreamBlocks (#103). M. Emami, E. Bezati, J. Janneck, J. Larus	Discovery Room, DPI
15:30–16:30	Track 2: Optimization. Session Chair: Nicolas Agostini	Orange & Blue Room, IC
15:30–16:00	Optimizing Regular Expressions via Rewrite-Guided Synthesis (#127). J. McClurg, M. Claver, J. Garner, J. Vossen, J. Schmerge, M. Belviranli	Orange & Blue Room, IC
16:00–16:30	Combining Run-time Checks and Compile-time Analysis to Improve Control Flow Auto-Vectorization (#120). B. Liu, A. Laird, W. Tsang, B. Mahjour, M. Dehnavi	Orange & Blue Room, IC
16:30–17:00	Travel to boat dock. The dock is a 30-minute walk from DPI. Please make sure to allow sufficient time.	(Attendees on their own)
17:00–20:30	Banquet / Excursion: Architecture Boat Tour (boarding starts 17:15, vessel departs 17:30 sharp)	Wendella West Dock 4

Keynote: MemComputing: Fundamentals and Applications

Massimiliano Di Ventra (Department of Physics, University of California San Diego, La Jolla, CA)

MemComputing is a new physics-based approach to computation that employs time non-locality (memory) to both process and store information in the same physical location. (M. Di Ventra, MemComputing: Fundamentals and Applications, Oxford University Press, 2022.) Its digital version is designed to solve combinatorial optimization problems. A practical realization of digital memcomputing machines (DMMs) can be accomplished via circuits of non-linear dynamical systems with memory, engineered so that periodic orbits and chaos can be avoided. A given logic (or algebraic) problem is first mapped into this type of dynamical system, whose point attractors represent the solutions of the original problem. A DMM then finds the solution via a succession of elementary avalanches (instantons) whose role is to eliminate configurations of logical inconsistency ("logical defects") from the circuit. I will discuss the physics behind MemComputing and show many examples of its applicability to various combinatorial optimization problems, Machine Learning, and Quantum Mechanics, demonstrating its advantages over traditional approaches and even quantum computing. Work supported by DARPA, DOE, NSF, CMRR, and MemComputing, Inc.

Massimiliano Di Ventra obtained his undergraduate degree in Physics summa cum laude from the University of Trieste (Italy) in 1991 and did his PhD studies at the Swiss Federal Institute of Technology in Lausanne from 1993 to 1997. He is now a professor of Physics at the University of California, San Diego. Di Ventra's research interests are in condensed-matter theory and unconventional computing. He has been invited to deliver more than 300 talks worldwide on these topics. He has published more than 200 papers in refereed journals and 4 textbooks, and has 7 granted patents (3 foreign). He is a fellow of the IEEE, the American Physical Society, and the Institute of Physics, and a foreign member of Academia Europaea. In 2018 he was named Highly Cited Researcher by Clarivate Analytics. He is the recipient of the 2020 Feynman Prize for theory in Nanotechnology and is a 2022 IEEE Nanotechnology Council Distinguished Lecturer. He is the co-founder of MemComputing, Inc.

Wednesday, October 12, 2022

Time	What	Where
7:30	Registration opens. Please allow sufficient time to clear building security. See here for instructions.	DPI
7:30–8:30	Continental Breakfast	Discovery Room, DPI
8:30–9:30	Keynote: AI Acceleration: Co-optimizing Algorithms, Hardware, and Software	Discovery Room, DPI
9:30–10:30	Talks: ACM SRC Finalists. Understanding Correlated Error Events in Quantum Computers, Michael Schleppy & Arpan Gupta (undergrad); Independent Tenancy Model, Boyang Wang (undergrad); A GPU Acceleration Flow for Parallel RTL Simulation and Hardware Testing, Dian-Lun Lin (grad); SuperB-NoC: A Superconducting Buffering NoC, Rhys Gretsch (grad); Automatically Translating Non-Affine Codes, Avery Laird (grad)	Discovery Room, DPI
10:30–11:00	Coffee Break	Discovery Room, DPI
11:00–12:30	Track 1: GPU algorithms. Session Chair: Jose Moreira	Discovery Room, DPI
11:00–11:30	Parallelizing Neural Network Models Effectively on GPU by Implementing Reductions Atomically (#78). J. Zhao, C. Bastoul, Y. Yi, J. Hu, W. Nie, R. Zhang, Z. Geng, C. Li, T. Tachon, Z. Gan	Discovery Room, DPI
11:30–12:00	GAP: GPU Adaptive In-situ Parallel Analytics (#114). H. Xing, G. Agrawal, R. Ramnath	Discovery Room, DPI
12:00–12:30	A GPU Multiversion B-Tree (#258). M. Awad, S. Porumbescu, J. Owens	Discovery Room, DPI
11:00–12:30	Track 2: Portable performance. Session Chair: P. Sadayappan	Orange & Blue Room, IC
11:00–11:30	Breaking the Vendor Lock: Performance Portable Programming Through OpenMP as Target Independent Runtime Layer (#312). J. Doerfert, M. Jasper, J. Huber, K. Abdelaal, G. Georgakoudis, T. Scogland, K. Parasyris	Orange & Blue Room, IC
11:30–12:00	BenchPress: A Deep Active Benchmark Generator (#10). F. Tsimpourlas, P. Petoumenos, M. Xu, C. Cummins, K. Hazelwood, A. Rajan, H. Leather	Orange & Blue Room, IC
12:00–12:30	Collage: Seamless Integration of Deep Learning Backends with Automatic Placement (#52). B. Jeon, S. Park, P. Liao, S. Xu, T. Chen, Z. Jia	Orange & Blue Room, IC
12:30–12:45	Conference Closing	Discovery Room, DPI

Keynote: AI Acceleration: Co-optimizing Algorithms, Hardware, and Software

Vijayalakshmi Srinivasan (IBM Research, Yorktown Heights, NY)

The combination of growth in compute capabilities and availability of large datasets has led to a rebirth of deep learning. Deep Neural Networks (DNNs) have become state-of-the-art in a variety of machine learning tasks spanning domains across vision, speech, and machine translation. Deep Learning (DL) achieves high accuracy in these tasks at the expense of 100s of ExaOps of computation. Hardware specialization and acceleration are key enablers for improving the operational efficiency of DNNs, in turn requiring synergistic cross-layer design across algorithms, hardware, and software.

In this talk I will present this holistic approach as adopted in the design of a multi-TOPs AI hardware accelerator. Key advances at the AI algorithm/application level that exploit approximate computing techniques enable deriving low-precision DNN models that maintain the same level of accuracy. Hardware performance-aware design space exploration is critical during compilation to map DNNs with diverse computational characteristics systematically and optimally while preserving familiar programming and user interfaces. The opportunities to co-optimize the algorithms, hardware, and software provide the roadmap to continue to deliver superior performance over the next decade.

Viji Srinivasan is a Distinguished Research Staff Member and a manager of the accelerator architectures and compilers group at the IBM T.J. Watson Research Center in Yorktown Heights. At IBM, she has worked on various aspects of data management including energy-efficient processor designs, microarchitecture of the memory hierarchies of large-scale servers, cache coherence management of symmetric multiprocessors, accelerators for data analytics applications, and, more recently, end-to-end accelerator solutions for AI. Many of her research contributions have been incorporated into IBM's Power and System-z Enterprise-class servers.