
Sunday, 6:00 PM EDT – 9:00 PM EDT: Welcome Reception

Location: Harbour Foyer

Day 1: Monday, October 30

7:30 AM EDT – 8:15 AM EDT: Breakfast

8:15 AM EDT – 8:30 AM EDT: Opening Remarks

8:30 AM EDT – 9:30 AM EDT: Keynote I (Video) by Amin Vahdat, Vice President of ML, Systems and Cloud AI at Google

Today, we are at an inflection point in computing where emerging Generative AI services are placing unprecedented demand for compute while the existing architectural patterns for improving efficiency have stalled. In this talk, we will discuss the likely needs of the next generation of computing infrastructure and use recent examples at Google from networks to accelerators to servers to illustrate the challenges and opportunities ahead. Taken together, we chart a course where computing must be increasingly specialized and co-optimized with algorithms and software, all while fundamentally focusing on security and sustainability.

Amin Vahdat is a Fellow and vice president of Engineering at Google, where his team is responsible for delivering industry-leading Machine Learning software and hardware that serves Alphabet, Google and the world, and Artificial Intelligence technologies that solve customers’ most pressing business challenges. In the past, he was General Manager for Google's compute, storage, and network hardware and software infrastructure. Until 2019, he was the Technical Lead for the Networking organization at Google. Before joining Google, Amin was the Science Applications International Corporation (SAIC) Professor of Computer Science and Engineering at UC San Diego (UCSD). He received his doctorate from the University of California Berkeley in computer science, and is a member of the National Academy of Engineering (NAE) and an Association for Computing Machinery (ACM) Fellow. Amin has been recognized with a number of awards, including the National Science Foundation (NSF) CAREER award, the UC Berkeley Distinguished EECS Alumni Award, the Alfred P. Sloan Fellowship, the Association for Computing Machinery's SIGCOMM Networking Systems Award, and the Duke University David and Janet Vaughn Teaching Award. Most recently, Amin was awarded the SIGCOMM lifetime achievement award for his contributions to data center and wide area networks.

9:30 AM EDT – 10:30 AM EDT: Best Papers

Session Chair: Davide Basilio Bartolini (Huawei)
Best Paper Nominee
9:30 AM EDT – 9:45 AM EDT
Clockhands: Rename-free Instruction Set Architecture for Out-of-order Processors
Toru Koizumi (Nagoya Institute of Technology), Ryota Shioya, Shu Sugita, Taichi Amano, Yuya Degawa, Junichiro Kadomoto, Hidetsugu Irie, Shuichi Sakai (The University of Tokyo)

Best Paper Nominee
9:45 AM EDT – 10:00 AM EDT
Decoupled Vector Runahead
Ajeya Naithani, Jaime Roelandts (Ghent University), Sam Ainsworth (University of Edinburgh), Timothy M. Jones (University of Cambridge), Lieven Eeckhout (Ghent University)

Best Paper Nominee
10:00 AM EDT – 10:15 AM EDT
CryptoMMU: Enabling Scalable and Secure Access Control of Third-Party Accelerators
Faiz Alam, Hyokeun Lee (North Carolina State University), Abhishek Bhattacharjee (Yale University), Amro Awad (North Carolina State University)

Best Paper Nominee
10:15 AM EDT – 10:30 AM EDT
Phantom: Exploiting Decoder-detectable Mispredictions
Johannes Wikner, Daniël Trujillo, Kaveh Razavi (ETH Zürich)

10:30 AM EDT – 11:00 AM EDT: Coffee Break

11:00 AM EDT – 12:00 PM EDT

Session Chair: Michael Pellauer (NVIDIA)
AuRORA: Virtualized Accelerator Orchestration for Multi-Tenant Workloads
Seah Kim, Jerry Zhao, Krste Asanovic, Borivoje Nikolic, Yakun Sophia Shao (University of California, Berkeley)

UNICO: Unified Hardware Software Co-Optimization for Robust Neural Network Acceleration
Bahador Rashidi, Chao Gao, Shan Lu (Huawei Technologies Canada); Zhisheng Wang (Huawei Technologies); Chunhua Zhou (Huawei Technologies Canada); Di Niu (University of Alberta); Fengyu Sun (Huawei Technologies)

Spatula: A Hardware Accelerator for Sparse Matrix Factorization
Axel Feldmann, Daniel Sanchez (Massachusetts Inst. of Technology)
Session Chair: Saugata Ghose (University of Illinois Urbana-Champaign)
Demystifying CXL Memory with Genuine CXL-Ready Systems and Devices
Yan Sun (University of Illinois Urbana Champaign); Yifan Yuan (Intel Labs); Zeduo Yu, Reese Kuper, Chihun Song, Jinghan Huang, Houxiang Ji, Siddharth Agarwal, Jiaqi Lou, Ipoom Jeong (University of Illinois Urbana Champaign); Ren Wang (Intel); Jung Ho Ahn (Seoul National University); Tianyin Xu, Nam Sung Kim (University of Illinois Urbana Champaign)

Memento: Architectural Support for Ephemeral Memory Management in Serverless Environments
Ziqi Wang, Kaiyang Zhao, Pei Li, Andrew Jacob (Carnegie Mellon University); Michael Kozuch (Intel Labs and Carnegie Mellon University); Todd Mowry, Dimitrios Skarlatos (Carnegie Mellon University)

Simultaneous and Heterogenous Multithreading
Kuan-Chieh Hsu, Hung-Wei Tseng (University of California, Riverside)
Session Chair: Mark Jeffrey (University of Toronto)
Accelerating RTL Simulation with Hardware-Software Co-Design
Fares Elsabbagh, Shabnam Sheikhha, Victor A. Ying, Quan M. Nguyen (Massachusetts Inst. of Technology); Joel Emer (Massachusetts Inst. of Technology/NVIDIA); Daniel Sanchez (Massachusetts Inst. of Technology)

Fast, Robust and Transferable Prediction for Hardware Logic Synthesis
Ceyu Xu, Pragya Sharma, Tianshu Wang, Lisa Wu Wills (Duke University)

Khronos: Fusing Memory Access for Improved Hardware RTL Simulation
Kexing Zhou, Yun Liang, Yibo Lin, Runsheng Wang, Ru Huang (Peking University)

12:00 PM EDT – 1:00 PM EDT: Lunch

1:00 PM EDT – 2:00 PM EDT

Session Chair: Tushar Krishna (Georgia Institute of Technology)
SecureLoop: Design Space Exploration of Secure DNN Accelerators
Kyungmi Lee, Mengjia Yan (Massachusetts Inst. of Technology); Joel Emer (MIT, NVIDIA); Anantha Chandrakasan (Massachusetts Inst. of Technology)

DOSA: Differentiable Model-Based One-Loop Search for DNN Accelerators
Charles Hong (University of California, Berkeley); Qijing Huang (NVIDIA); Grace Dinh (University of California, Berkeley); Mahesh Subedar (Intel Corporation); Yakun Sophia Shao (University of California, Berkeley)

TorchSparse++: Efficient Training and Inference Framework for Sparse Convolution on GPUs
Haotian Tang, Shang Yang, Zhijian Liu (Massachusetts Inst. of Technology); Ke Hong (Tsinghua University); Zhongming Yu (University of California, San Diego); Xiuyu Li (University of California, Berkeley); Guohao Dai (Shanghai Jiao Tong University); Yu Wang (Tsinghua University); Song Han (Massachusetts Inst. of Technology)
Session Chair: Daniel Sorin (Duke University)
Branch Target Buffer Organizations
Arthur Perais (CNRS); Rami Sheikh (Arm)

Warming Up a Cold Front-End with Ignite
David Schall (University of Edinburgh); Andreas Sandberg (Arm Ltd.); Boris Grot (University of Edinburgh)

ArchExplorer: Microarchitecture Exploration Via Bottleneck Analysis
Chen Bai (The Chinese University of Hong Kong); Jiayi Huang (Hong Kong University of Science and Technology (Guangzhou)); Xuechao Wei (Alibaba Inc); Yuzhe Ma (Hong Kong University of Science and Technology (Guangzhou)); Sicheng Li (Alibaba Inc); Hongzhong Zheng (Alibaba); Bei Yu (Chinese University of Hong Kong); Yuan Xie (Alibaba Group)
Session Chair: Sabrina Neuman (Boston University)
DF-GAS: a Distributed FPGA-as-a-Service Architecture towards Billion-Scale Graph-based Approximate Nearest Neighbor Search
Shulin Zeng, Zhenhua Zhu, Jun Liu, Haoyu Zhang (Tsinghua University); Guohao Dai (Shanghai Jiao Tong University); Zixuan Zhou (Tsinghua University); Shuangchen Li (Alibaba); Xuefei Ning (Tsinghua University); Yuan Xie (Alibaba Group); Huazhong Yang, Yu Wang (Tsinghua University)

Dadu-RBD: Robot Rigid Body Dynamics Accelerator with Multifunctional Pipelines
Yuxin Yang, Xiaoming Chen, Yinhe Han (Inst. of Computing Technology, Chinese Academy of Sciences)

MEGA Evolving Graph Accelerator
Chao Gao, Mahbod Afarin, Shafiur Rahman, Nael Abu-Ghazaleh, Rajiv Gupta (UC Riverside)

2:00 PM EDT – 2:15 PM EDT: Break

2:15 PM EDT – 3:15 PM EDT

Session Chair: Biswabandan Panda (Indian Institute of Technology Bombay)
Eureka: Efficient Tensor Cores for One-sided Unstructured Sparsity in DNN Inference
Ashish Gondimalla (Google); Mithuna Thottethodi, T. N. Vijaykumar (Purdue University)

RM-STC: Row-Merge Dataflow Inspired GPU Sparse Tensor Core for Energy-Efficient Sparse Acceleration
Guyue Huang, Zhengyang Wang (University of California, Santa Barbara); Po-An Tsai (NVIDIA); Chen Zhang (Shanghai Jiao Tong University); Yufei Ding (University of California, Santa Barbara); Yuan Xie (Alibaba Group)

Sparse-DySta: Sparsity-Aware Dynamic and Static Scheduling for Sparse Multi-DNN Workloads
Hongxiang Fan (Samsung AI Cambridge and University of Cambridge); Stylianos I. Venieris (Samsung AI); Alexandros Kouris (Samsung AI and Imperial College London); Nicholas Lane (University of Cambridge and Samsung AI)
Session Chair: Nandita Vijaykumar (University of Toronto)
MAD MAcce: Supporting Multiply-Add Operations for Democratizing Matrix-Multiplication Accelerator
Seunghwan Sung, Sujin Hur, Sungwoo Kim, Dongho Ha (Yonsei University); Yunho Oh (Korea University); Won Woo Ro (Yonsei University)

Path Forward Beyond Simulators: Fast and Accurate GPU Execution Time Prediction for DNN Workloads
Ying Li, Yifan Sun (William & Mary); Adwait Jog (University of Virginia)

G10: Enabling An Efficient Unified GPU Memory and Storage Architecture with Smart Tensor Migrations
Haoyang Zhang, Yirui Zhou, Yuqi Xue, Yiqi Liu, Jian Huang (University of Illinois Urbana Champaign)

3:15 PM EDT – 4:15 PM EDT: ACM Student Research Competition & MICRO PhD Forum Posters

Location: Metropolitan West

3:15 PM EDT – 4:15 PM EDT: Hot Baked Chips

Location: Metropolitan West

3:15 PM EDT – 4:15 PM EDT: Coffee Break

4:15 PM EDT – 5:55 PM EDT

Session Chair: Po-An Tsai (NVIDIA)
MAICC: A Lightweight Many-core Architecture with In-Cache Computing for Multi-DNN Parallel Inference
Renhao Fan, Yikai Cui, Qilin Chen (Department of Computer Science and Technology, Tsinghua University); Mingyu Wang (School of Microelectronics Science and Technology, Sun Yat-Sen University); Youhui Zhang, Weimin Zheng, Zhaolin Li (Department of Computer Science and Technology, Tsinghua University)

SRIM: A Systolic Random Increment Memory Architecture for Unary Computing
Hongrui Guo, Yongwei Zhao (Inst. of Computing Technology, Chinese Academy of Sciences); Zhangmai Li (Huazhong University of Science and Technology); Yifan Hao, Chang Liu, Xinkai Song, Xiaqing Li, Zidong Du, Rui Zhang, Qi Guo (Inst. of Computing Technology, Chinese Academy of Sciences); Tianshi Chen (Cambricon Technologies); Zhiwei Xu (Inst. of Computing Technology, Chinese Academy of Sciences)

Improving Data Reuse in NPU On-chip Memory with Interleaved Gradient Order for DNN Training
Jungwoo Kim, Seonjin Na, Sanghyeon Lee, Sunho Lee, Jaehyuk Huh (KAIST)

TT-GNN: Efficient On-Chip Graph Neural Network Training via Embedding Reformation and Hardware Optimization
Zheng Qu (University of California, Santa Barbara); Dimin Niu (Alibaba Group Inc.); Shuangchen Li, Hongzhong Zheng (Alibaba); Yuan Xie (Alibaba Group)

Supporting Energy-Based Learning With an Ising Machine Substrate: A Case Study on RBM
Uday Kumar Reddy Vengalam (AMD Research); Yongchao Liu, Tong Geng, Hui Wu, Michael Huang (University of Rochester)
Session Chair: Hiroaki Kobayashi (Tohoku University)
QuComm: Optimizing Collective Communication for Distributed Quantum Computing
Anbang Wu, Yufei Ding (University of California, Santa Barbara); Ang Li (Pacific Northwest National Laboratory)

QuCT: A Framework for Analyzing Quantum Circuit by Extracting Contextual and Topological Features
Siwei Tan, Congliang Lang, Liang Xiang, Shudi Wang, Xinghui Jia, Ziqi Tan, Tingting Li (Zhejiang University); Jieming Yin (Nanjing University of Posts and Telecommunications); Yongheng Shang, Andre Python, Liqiang Lu, Jianwei Yin (Zhejiang University)

ERASER: Practical and Accurate Leakage Suppression for Fault-Tolerant Quantum Computing
Suhas Vittal, Poulami Das, Moinuddin Qureshi (Georgia Inst. of Technology)

Systems Architecture for Quantum Random Access Memory
Shifan Xu (Yale University); Connor T. Hann (Amazon AWS); Ben Foxman, Steven M. Girvin, Yongshan Ding (Yale University)

HetArch: Heterogeneous Microarchitectures for Superconducting Quantum Systems
Samuel Stein (Pacific Northwest National Laboratory); Sara Sussman, Teague Tomesh, Charles Guinn, Esin Tureci (Princeton University); Sophia Fuhui Lin (University of Chicago); Wei Tang (Princeton University); James Ang (Pacific Northwest National Laboratory); Srivatsan Chakram (Rutgers University); Ang Li (Pacific Northwest National Laboratory); Margaret Martonosi (Princeton University); Fred Chong (University of Chicago); Andrew A. Houck (Princeton University); Isaac L. Chuang (Massachusetts Inst. of Technology); Michael DeMarco (Brookhaven National Laboratory, Massachusetts Inst. of Technology)
Session Chair: Koji Inoue (Kyushu University)
Efficiently Enabling Block Semantics and Data Updates in DNA Storage
Puru Sharma, Cheng-Kai Lim, Dehui Lin, Yash Pote, Djordje Jevdjic (National University of Singapore)

ReFOCUS: Reusing Light for Efficient Fourier Optics-Based Photonic Neural Network Accelerator
Shurui Li, Hangbo Yang, Chee Wei Wong (University of California, Los Angeles); Volker J. Sorger (The George Washington University); Puneet Gupta (University of California, Los Angeles)

SupeRBNN: Randomized Binary Neural Network Using Adiabatic Superconductor Josephson Devices
Zhengang Li, Geng Yuan (Northeastern University); Tomoharu Yamauchi (Tokyo City University); Zabihi Masoud, Yanyue Xie, Peiyan Dong (Northeastern University); Xulong Tang (University of Pittsburgh); Nobuyuki Yoshikawa (Yokohama National University); Devesh Tiwari, Yanzhi Wang (Northeastern University); Olivia Chen (Tokyo City University)

SuperBP: Design Space Exploration of Perceptron-Based Branch Predictors for Superconducting CPUs
Haipeng Zha (University of Southern California); Swamit Tannu (University of Wisconsin, Madison); Murali Annavaram (University of Southern California)

SUSHI: Ultra-High-Speed and Ultra-Low-Power Neuromorphic Chip Using Superconducting Single-Flux-Quantum Circuits
Zeshi Liu (State Key Lab of Processors, Inst. of Computing Technology, Chinese Academy of Science, China); Shuo Chen, Peiyao Qu (State Key Lab of Processors, Inst. of Computing Technology, Chinese Academy of Science, China); Huanli Liu, Minghui Niu, Liliang Ying, Jie Ren (Shanghai Inst. of Microsystem and Information Technology, Chinese Academy of Science, China); GuangMing Tang, Haihang You (State Key Lab of Processors, Inst. of Computing Technology, Chinese Academy of Science, China)

6:00 PM EDT – 7:30 PM EDT: Business Meeting

Day 2: Tuesday, October 31

7:30 AM EDT – 8:30 AM EDT: Breakfast

8:30 AM EDT – 9:30 AM EDT: Keynote II (Video) by Debbie Marr, Intel Fellow and Chief Architect

The basic principles of achieving high performance in computing have remained the same, have evolved, and have presented new and different challenges. This talk will touch on some computing history and learnings, and make the case that although computing has achieved tremendous orders-of-magnitude breakthroughs, many of the challenges facing us today are curiously the same. Today’s computing landscape is more exciting than ever.

Debbie Marr is the Chief Architect of the Advanced Architecture Development Group (AADG) at Intel, where she leads visioning and developing new CPU architectures and microarchitectures for future computing needs such as AI, cloud computing, and security. Debbie’s 30+ years at Intel include roles such as the Director of Accelerator Architecture Lab in Intel Labs where she led research in machine learning and acceleration techniques for CPU, GPU, FPGA, and AI Accelerators. Debbie played leading roles on Intel CPU products from the 386SL to Intel’s current leading-edge products. Debbie was the server architect of Intel® Pentium™ Pro, Intel’s first Xeon Processor. She brought Intel Hyperthreading Technology from concept to product on the Pentium 4 Processor. She was the chief architect of the 4th Generation Intel Core™ (Haswell), and led advanced development for Intel’s 2017/2018 Core/Xeon CPUs. Debbie holds over 40 patents in many aspects of CPU, AI accelerators, and FPGA architecture/microarchitecture. Debbie has a PhD in electrical and computer engineering from University of Michigan, an MS in electrical engineering and computer science from Cornell University, and a BS in electrical engineering and computer science from the University of California, Berkeley.

9:30 AM EDT – 9:45 AM EDT: Coffee Break

9:45 AM EDT – 11:25 AM EDT

Session Chair: Gururaj Saileshwar (University of Toronto / NVIDIA Research)
AQ2PNN: Enabling Two-party Privacy-Preserving Deep Neural Network Inference with Adaptive Quantization
Yukui Luo (Northeastern University); Nuo Xu (Lehigh University); Hongwu Peng (University of Connecticut); Chenghong Wang (Duke University); Shijin Duan (Northeastern University); Kaleel Mahmood (University of Connecticut); Wujie Wen (Lehigh University); Caiwen Ding (University of Connecticut); Xiaolin Xu (Northeastern University)

CHERIoT: Complete Memory Safety for Embedded Devices
Saar Amar (Microsoft); David Chisnall (Microsoft); Tony Chen (Microsoft); Nathaniel Wesley Filardo (Microsoft); Ben Laurie (Google); Kunyan Liu, Robert Norton (Microsoft); Simon W. Moore (University of Cambridge); Yucong Tao (Microsoft); Robert N. M. Watson (University of Cambridge); Hongyan Xia (Arm)

Accelerating Extra Dimensional Page Walks for Confidential Computing
Dong Du, Bicheng Yang, Yubin Xia, Haibo Chen (Shanghai Jiao Tong University)

GME: GPU-based Microarchitectural Extensions to Accelerate Homomorphic Encryption
Kaustubh Shivdikar, Yuhui Bao (Northeastern University); Rashmi Agrawal (Boston University); Michael Shen (Northeastern University); Gilbert Jonatan (KAIST); Evelio Mora (Universidad Católica de Murcia); Alexander Ingare, Neal Livesay (Northeastern University); José L. Abellán (Universidad de Murcia); John Kim (KAIST); Ajay Joshi (Boston University / Lightmatter); David Kaeli (Northeastern University)

MAD: Memory-Aware Design Techniques for Accelerating Fully Homomorphic Encryption
Rashmi Agrawal (Boston University); Leo de Castro (MIT CSAIL); Chiraag Juvekar (Analog Devices); Anantha Chandrakasan (Massachusetts Inst. of Technology); Vinod Vaikuntanathan (MIT CSAIL); Ajay Joshi (Boston University / Lightmatter)
Session Chair: Leeor Peled (Toga Networks)
Micro-Armed Bandit: Lightweight & Reusable Reinforcement Learning for Microarchitecture Decision-Making
Gerasimos Gerogiannis, Josep Torrellas (University of Illinois Urbana Champaign)

CLIP: Load Criticality based Data Prefetching for Bandwidth-constrained Many-core Systems
Biswabandan Panda (Indian Inst. of Technology Bombay)

Snake: A Variable-length Chain-based Prefetching Mechanism for GPUs
Saba Mostofi (Sharif University of Technology); Hajar Falahati (Inst. for Research in Fundamental Sciences (IPM)); Negin Mahani (Shahid Bahonar University); Pejman Lotfi-Kamran (Inst. for Research in Fundamental Sciences (IPM)); Hamid Sarbazi-Azad (Sharif University of Technology, IPM)

Treelet Prefetching For Ray Tracing
Yuan Hsi Chou (University of British Columbia); Tyler Nowicki (Huawei Technologies); Tor M. Aamodt (University of British Columbia)
Session Chair: Dimitrios Skarlatos (Carnegie Mellon University)
NAS-SE: Designing A Highly-Efficient In-Situ Neural Architecture Search Engine for Large-Scale Deployment
Qiyu Wan (NVIDIA); Lening Wang (University of Houston); Jing Wang (Renmin University of China); Shuaiwen Leon Song (Microsoft and University of Sydney); Xin Fu (University of Houston)

XFM: Accelerated Software-Defined Far Memory
Neel Patel, Amin Mamandipoor, Derrick Quinn, Mohammad Alian (University of Kansas)

Affinity Alloc: Taming Not-So Near-Data Computing
Zhengrong Wang, Christopher Liu (University of California, Los Angeles); Nathan Beckmann (Carnegie Mellon University); Tony Nowatzki (University of California, Los Angeles)

MVC: Enabling Fully Coherent Multi-Data-Views through the Memory Hierarchy with Processing in Memory
Daichi Fujiki (Keio University)

AESPA: Asynchronous Execution Scheme to Exploit Bank-Level Parallelism of Processing-in-Memory
Hongju Kal, Chanyoung Yoo, Won Woo Ro (Yonsei University)

11:25 AM EDT – 12:15 PM EDT: Break

12:15 PM EDT – 1:30 PM EDT: Award Luncheon

1:30 PM EDT – 2:30 PM EDT: Panel

Moderator: Andreas Moshovos, University of Toronto


  • Nicolas Papernot, University of Toronto
  • Iqbal Mohomed, Samsung Research
  • Nish Sinnadurai, Cerebras
  • Mark Horowitz, Stanford
  • Amir Yazdanbakhsh, Google
  • Song Han, MIT

2:30 PM EDT – 3:15 PM EDT: MICRO Posters

Location: Metropolitan West

2:30 PM EDT – 3:15 PM EDT: Coffee Break

3:15 PM EDT – 4:35 PM EDT

Session Chair: Samira Mirbagher Ajorpaz (North Carolina State University)
ReCon: Efficient Detection, Management, and Use of Non-Speculative Information Leakage
Pavlos Aimoniotis (Uppsala University); Amund Bergland Kvalsvik (Norwegian University of Science and Technology); Xiaoyue Chen (Uppsala University); Magnus Själander (Norwegian University of Science and Technology); Stefanos Kaxiras (Uppsala University)

Uncore Encore: Covert Channels Exploiting Uncore Frequency Scaling
Yanan Guo (University of Pittsburgh); Dingyuan Cao (University of Illinois Urbana Champaign); Xin Xin, Youtao Zhang, Jun Yang (University of Pittsburgh)

Hardware Support for Constant-Time Programming
Yuanqing Miao, Mahmut Taylan Kandemir, Danfeng Zhang, Yingtian Zhang, Gang Tan, Dinghao Wu (Pennsylvania State University)

AutoCC: Automatic Discovery of Covert Channels in Time-Shared Hardware
Marcelo Orenes-Vera, Hyunsung Yun (Princeton University); Nils Wistoff (ETH Zürich); Gernot Heiser (University of New South Wales, Sydney); Luca Benini (ETH Zürich); David Wentzlaff, Margaret Martonosi (Princeton University)
Session Chair: Trevor E. Carlson (National University of Singapore)
NeuroLPM - Scaling Longest Prefix Match Hardware with Neural Networks
Alon Rashelbach, Igor De-Paula, Mark Silberstein (Technion)

Space Microdatacenters
Nathaniel Bleier, Muhammad Husnain Mubarik, Gary R Swenson, Rakesh Kumar (University of Illinois Urbana-Champaign)

LogNIC: A High-Level Performance Model for SmartNICs
Zerui Guo (University of Wisconsin-Madison); Jiaxin Lin (The University of Texas at Austin); Yuebin Bai (Beihang University, China); Daehyeok Kim (The University of Texas at Austin and Microsoft); Michael Swift (University of Wisconsin-Madison); Aditya Akella (The University of Texas at Austin); Ming Liu (University of Wisconsin-Madison)

Heterogeneous Die-to-Die Interfaces: Enabling More Flexible Chiplet Interconnection Systems
Yinxiao Feng, Dong Xiang, Kaisheng Ma (Tsinghua University)
Session Chair: Freddy Gabbay (Ruppin Academic College)
Predicting Future-System Reliability with a Component-Level DRAM Fault Model
Jeageun Jung, Mattan Erez (University of Texas at Austin)

Impact of Voltage Scaling on Soft Errors Susceptibility of Multicore Server CPUs
Dimitris Agiakatsikas (University of Piraeus); George Papadimitriou, Vasileios Karakostas, Dimitris Gizopoulos (University of Athens); Mihalis Psarakis (University of Piraeus); Camille Belanger-Champagne, Ewart Blackmore (TRIUMF)

Si-Kintsugi: Recovering Golden-Like Performance of Defective Many-Core Spatial Architectures for AI
Edward Hanson, Shiyu Li, Guanglei Zhou, Feng Cheng, Yitu Wang, Rohan Bose, Hai "Helen" Li, Yiran Chen (Duke University)

How to Kill the Second Bird with One ECC: The Pursuit of Row Hammer Resilient DRAM
Michael Jaemin Kim, Minbok Wi, Jaehyun Park, Seoyoung Ko, Jae Young Choi, Hwayoung Nam (Seoul National University); Nam Sung Kim (University of Illinois Urbana Champaign); Jung Ho Ahn (Seoul National University); Eojin Lee (Inha University)

4:35 PM EDT – 4:45 PM EDT: Break

4:45 PM EDT – 5:45 PM EDT

Session Chair: Alex K. Jones (University of Pittsburgh)
Bucket Getter: A Bucket-based Processing Engine for Low-bit Block Floating Point (BFP) DNNs
Yun-Chen Lo, Ren-Shuo Liu (National Tsing Hua University)

ACRE: Accelerating Random Forests for Explainability
Andrew McCrabb, Aymen Ahmed, Valeria Bertacco (University of Michigan)

δLTA: Decoupling Camera Sampling from Processing to Avoid Redundant Computations in the Vision Pipeline
Raul Taranco Serna, Jose Maria Arnau, Antonio Gonzalez (Polytechnic University of Catalonia)
Session Chair: Rachata Ausavarungnirun (King Mongkut's University of Technology North Bangkok)
McCore: A Holistic Management of High-Performance Heterogeneous Multicores
Jaewon Kwon, Yongju Lee, Hongju Kal, Minjae Kim, Youngsok Kim, Won Woo Ro (Yonsei University)

SweepCache: Intermittence-Aware Cache on the Cheap
Yuchen Zhou, Jianping Zeng, Jungi Jeong (Purdue University); Jongouk Choi (University of Central Florida); Changhee Jung (Purdue University)

Persistent Processor Architecture
Jianping Zeng, Jungi Jeong, Changhee Jung (Purdue University)

6:45 PM EDT – 9:45 PM EDT: Excursion & Banquet at the Art Gallery of Ontario

  • Buses depart starting at 6:15 PM

Day 3: Wednesday, November 1

7:30 AM EDT – 8:30 AM EDT: Breakfast

8:30 AM EDT – 9:30 AM EDT: Keynote III (Video) by Mark Horowitz, Yahoo! Founders Professor in the School of Engineering and Professor of Computer Science, Stanford

For over 50 years, information technology has relied upon Moore’s Law: providing, for the same cost, 2x the number of logic transistors that were possible a few years prior. For much of that time, the smaller devices also provided dramatic energy and performance improvement through Dennard Scaling, but that scaling ended over a decade ago. While technology scaling continues, per-transistor cost is no longer scaling in the advanced nodes. In this post-Moore’s Law reality, further price/performance improvement follows only from improving the efficiency of applications using innovative hardware and software techniques. Unfortunately, this need for innovative system solutions runs smack into the enormous complexity of designing and debugging contemporary VLSI-based hardware/software platforms; a task so large it has caused the industry to consolidate, moving it away from innovation. The result is a set of platforms aimed at different computing markets. To overcome this challenge, we need to develop a new design approach and tools to enable small groups of application experts to selectively extend the performance of those successful platforms. Like the ASIC revolution in the 1980s, the goal of this approach is to enable a new set of designers, then board-level logic designers, now application experts, to leverage the power of customized silicon solutions. Like then, these tools won’t initially be useful for current chip designers, but over time will underlie all designs. In the 1980s, to provide access to logic designers, the key technologies were logic synthesis, simulation, and placement/routing of their designs to gate arrays and standard cells. Today, the key is to realize you are creating an “app” for an existing platform, and not creating the system solution from scratch (which is both too expensive and error prone), and to leverage the fact that modern “chips” are made of many chiplets.
The new approach must provide a design window familiar to application developers, with similar descriptive, performance tuning, and debug capabilities. These new tools will be tied to highly capable platforms that are used as the foundation, like the appStore model for mobile phones. This talk will try to convince you this might be possible, and encourage you to help contribute to this effort.

Mark Horowitz is the Yahoo! Founders Professor at Stanford University and chair of the Electrical Engineering Department. He co-founded Rambus, Inc. in 1990 and is a fellow of the IEEE and the ACM and a member of the National Academy of Engineering and the American Academy of Arts and Sciences. Dr. Horowitz's research interests are quite broad and span from applying EE and CS analysis methods to problems in molecular biology to creating new design methodologies for analog and digital VLSI circuits.

9:30 AM EDT – 9:45 AM EDT: Coffee Break

9:45 AM EDT – 11:25 AM EDT

Session Chair: Jason Clemons (NVIDIA)
ADA-GP: Accelerating DNN Training By ADAptive Gradient Prediction
Vahid Janfaza, Shantanu Mandal, Farabi Mahmud, Abdullah Muzahid (Texas A&M University)

HighLight: Efficient and Flexible DNN Acceleration with Hierarchical Structured Sparsity
Yannan Nellie Wu (Massachusetts Inst. of Technology); Po-An Tsai, Saurav Muralidharan, Angshuman Parashar (NVIDIA); Vivienne Sze (Massachusetts Inst. of Technology); Joel Emer (MIT/NVIDIA)

Exploiting Inherent Properties of Complex Numbers for Accelerating Complex Valued Neural Networks
Hyunwuk Lee, Hyungjun Jang, Sungbin Kim, Sungwoo Kim, Wonho Cho, Won Woo Ro (Yonsei University)

Point Cloud Acceleration by Exploiting Geometric Locality
Cen Chen (South China University of Technology); Xiaofeng Zou (Hunan University); Hongen Shao (South China University of Technology); Yangfan Li (Central South University); Kenli Li (College of Information Science and Engineering, National Supercomputing Center in Changsha, Hunan University)

HARP: Hardware-Based Pseudo-Tiling for Sparse Matrix Multiplication Accelerator
Jinkwon Kim, Myeongjae Jang, Haejin Nam, Soontae Kim (KAIST)
Session Chair: Mohammad Alian (University of Kansas)
IDYLL: Enhancing Page Translation in Multi-GPUs via Light Weight PTE Invalidations
Bingyao Li, Yanan Guo, Yueqi Wang (University of Pittsburgh); Aamer Jaleel (NVIDIA); Jun Yang, Xulong Tang (University of Pittsburgh)

Victima: Drastically Increasing Address Translation Reach by Leveraging Underutilized Cache Resources
Konstantinos Kanellopoulos, Hong Chul Nam, Nisa Bostanci, Rahul Bera, Mohammad Sadrosadati (ETH Zürich); Rakesh Kumar (Norwegian University of Science and Technology (NTNU)); Davide Basilio Bartolini (Huawei); Onur Mutlu (ETH Zürich)

Utopia: Efficient Address Translation using Hybrid Virtual-to-Physical Address Mapping
Konstantinos Kanellopoulos, Rahul Bera, Kosta Stojiljkovic, Nisa Bostanci, Can Firtina (ETH Zürich); Rachata Ausavarungnirun (King Mongkut's University of Technology North Bangkok); Rakesh Kumar (Norwegian University of Science and Technology (NTNU)); Nastaran Hajinazar (Intel Labs); Mohammad Sadrosadati (ETH Zürich); Nandita Vijaykumar (University of Toronto); Onur Mutlu (ETH Zürich)

Architectural Support for Optimizing Huge Page Selection Within the OS
Aninda Manocha (Princeton University); Zi Yan (NVIDIA); Esin Tureci (Princeton University); Juan L. Aragón (University of Murcia); David Nellans (NVIDIA); Margaret Martonosi (Princeton University)
Session Chair: Miquel Moretó (Universitat Politècnica de Catalunya/Barcelona Supercomputing Center)
Photon: A Fine-grained Sampled Simulation Methodology for GPU Workloads
Changxi Liu (National University of Singapore); Yifan Sun (College of William and Mary); Trevor E. Carlson (National University of Singapore)

Rigorous Evaluation of Computer Processors with Statistical Model Checking
Filip Mazurek, Arya Tschand (Duke University); Yu Wang (University of Florida); Miroslav Pajic, Daniel Sorin (Duke University)

TeAAL: A Declarative Framework for Modeling Sparse Tensor Accelerators
Nandeeka Nayak (University of Illinois Urbana Champaign); Toluwanimi O. Odemuyiwa (University of California, Davis); Shubham Ugare, Christopher Fletcher (University of Illinois Urbana Champaign); Michael Pellauer (NVIDIA); Joel Emer (MIT/NVIDIA)

TileFlow: A Framework for Modeling Fusion Dataflow via Tree-based Analysis
Size Zheng, Siyuan Chen, Siyuan Gao, Liancheng Jia, Guangyu Sun, Runsheng Wang, Yun Liang (Peking University)

Learning to Drive Software-Defined Solid-State Drives
Daixuan Li, Jinghan Sun, Jian Huang (University of Illinois Urbana-Champaign)

11:25 AM EDT – 11:40 AM EDT: Break

11:40 AM EDT – 1:00 PM EDT

Session Chair: Sihang Liu (University of Waterloo)
ARTist: A Fully Fused Accelerator for Real-Time Learning of Neural Scene Representation
Xinkai Song, Yuanbo Wen (Inst. of Computing Technology, Chinese Academy of Sciences); Xing Hu (Inst. of Computing Technology, Chinese Academy of Sciences); Tianbo Liu (University of Science and Technology of China); Haoxuan Zhou (University of Chinese Academy of Sciences); Husheng Han, Tian Zhi, Zidong Du, Wei Li, Rui Zhang (Inst. of Computing Technology, Chinese Academy of Sciences); Chen Zhang (Shanghai Jiao Tong University); Lin Gao, Qi Guo (Inst. of Computing Technology, Chinese Academy of Sciences); Tianshi Chen (Cambricon Technologies, Beijing, China)

Strix: An End-to-End Streaming Architecture with Two-Level Ciphertext Batching for Fully Homomorphic Encryption with Programmable Bootstrapping
Adiwena Putra, Prasetiyo, Yi Chen, John Kim, Joo-Young Kim (KAIST)

A Tensor Marshaling Unit for Sparse Tensor Algebra on General-Purpose Processors
Marco Siracusa, Víctor Soria-Pardos, Francesco Sgherzi (Barcelona Supercomputing Center and Universitat Politècnica de Catalunya); Joshua Randall (Arm); Douglas J. Joseph (Samsung); Miquel Moretó, Adrià Armejach (Barcelona Supercomputing Center and Universitat Politècnica de Catalunya)

Tailors: Accelerating Sparse Tensor Algebra by Overbooking Buffer Capacity
Zi Yu Xue, Yannan Nellie Wu (Massachusetts Inst. of Technology); Joel Emer (Massachusetts Inst. of Technology/NVIDIA); Vivienne Sze (Massachusetts Inst. of Technology)

Session Chair: Jian Huang (University of Illinois Urbana-Champaign)
Grape: Practical and Efficient Graphed Execution for Dynamic Deep Neural Networks on GPUs
Bojian Zheng (CentML / University of Toronto / Vector Inst.); Cody Hao Yu, Jie Wang (Amazon); Yaoyao Ding (CentML / University of Toronto / Vector Inst.); Yizhi Liu, Yida Wang (Amazon); Gennady Pekhimenko (CentML / University of Toronto / Vector Inst.)

PockEngine: Sparse and Efficient Fine-tuning in a Pocket
Ligeng Zhu (Massachusetts Inst. of Technology); Lanxiang Hu (Columbia); Ji Lin, Wei-Ming Chen, Wei-Chen Wang, Chuang Gan, Song Han (Massachusetts Inst. of Technology)

Towards Efficient Control Flow Handling in Spatial Architecture via Architecting the Control Flow Plane
Jinyi Deng, Xinru Tang, Jiahao Zhang, Yuxuan Li, Linyun Zhang (Tsinghua University); Boxiao Han, Hongjun He (China Mobile Research Inst.); Fengbin Tu (Hong Kong University of Science and Technology); Leibo Liu, Shaojun Wei, Yang Hu (Tsinghua University); Shouyi Yin (Tsinghua University / Shanghai AI Lab)

Pipestitch: An energy-minimal dataflow architecture with lightweight threads
Nathan Serafin, Souradip Ghosh, Harsh Desai, Nathan Beckmann, Brandon Lucia (Carnegie Mellon University)

Session Chair: Pradip Bose (IBM)
CASA: An Energy-Efficient and High-Speed CAM-based SMEM Seeding Accelerator for Genome Alignment
Yi Huang (Tsinghua University); Lingkun Kong (Rice University); Dibei Chen (Tsinghua University); Zhiyu Chen (Rice University); Xiangyu Kong (Tsinghua University); Jianfeng Zhu (Tsinghua University); Konstantinos Mamouras (Rice University); Shaojun Wei (Tsinghua University); Kaiyuan Yang (Rice University); Leibo Liu (Tsinghua University)

Swordfish: A Framework for Evaluating Deep Neural Network-based Basecalling using Computation-In-Memory with Non-Ideal Memristors
Taha Shahroodi (Technische Universiteit Delft); Gagandeep Singh (AMD Research); Mahdi Zahedi (Technische Universiteit Delft); Haiyu Mao, Joel Lindegger, Can Firtina (ETH Zürich); Stephan Wong (Technische Universiteit Delft); Onur Mutlu (ETH Zürich); Said Hamdioui (Technische Universiteit Delft)

DASH-CAM: Dynamic Approximate SearcH Content Addressable Memory for genome classification
Zuher Jahshan, Itay Merlin (Bar Ilan University); Esteban Garzón (University of Calabria); Leonid Yavits (Bar Ilan University)

GMX: Instruction Set Extensions for Fast, Scalable, and Efficient Genome Sequence Alignment
Max Doblas Font, Oscar Lostes-Cazorla (Barcelona Supercomputing Center); Quim Aguado-Puig (Universitat Autònoma de Barcelona); Nick Cebry (Cornell University); Pau Fontova (Barcelona Supercomputing Center); Christopher Batten (Cornell University); Santiago Marco-Sola (Universitat Autònoma de Barcelona); Miquel Moretó (Barcelona Supercomputing Center, UPC)

1:00 PM EDT – 1:15 PM EDT: Closing Remarks