MICRO 2023: Main Program

Saturday, October 28 / Sunday, October 29: Workshops & Tutorials

Sunday, 6:00 PM EDT – 9:00 PM EDT: Welcome Reception

Location: Harbour Foyer

Day 1: Monday, October 30

7:30 AM EDT – 8:15 AM EDT: Breakfast

8:15 AM EDT – 8:30 AM EDT: Opening Remarks

8:30 AM EDT – 9:30 AM EDT: Keynote I (Video) by Amin Vahdat, Vice President of ML, Systems, and Cloud AI at Google

Societal Infrastructure in the Age of Artificial General Intelligence

Abstract
Today, we are at an inflection point in computing: emerging Generative AI services are placing unprecedented demand on compute, while the existing architectural patterns for improving efficiency have stalled. In this talk, we will discuss the likely needs of the next generation of computing infrastructure and use recent examples at Google, from networks to accelerators to servers, to illustrate the challenges and opportunities ahead. Taken together, these examples chart a course where computing must be increasingly specialized and co-optimized with algorithms and software, all while fundamentally focusing on security and sustainability.

Bio
Amin Vahdat is a Fellow and Vice President of Engineering at Google, where his team is responsible for delivering industry-leading Machine Learning software and hardware that serves Alphabet, Google, and the world, as well as Artificial Intelligence technologies that solve customers’ most pressing business challenges. In the past, he was General Manager for Google's compute, storage, and network hardware and software infrastructure. Until 2019, he was the Technical Lead for the Networking organization at Google. Before joining Google, Amin was the Science Applications International Corporation (SAIC) Professor of Computer Science and Engineering at UC San Diego (UCSD). He received his doctorate in computer science from the University of California, Berkeley, and is a member of the National Academy of Engineering (NAE) and an Association for Computing Machinery (ACM) Fellow. Amin has been recognized with a number of awards, including the National Science Foundation (NSF) CAREER Award, the UC Berkeley Distinguished EECS Alumni Award, the Alfred P. Sloan Fellowship, the ACM SIGCOMM Networking Systems Award, and the Duke University David and Janet Vaughn Teaching Award. Most recently, Amin received the SIGCOMM Lifetime Achievement Award for his contributions to data center and wide area networks.

9:30 AM EDT – 10:30 AM EDT: Best Papers

Best Paper Session

Location: Metropolitan Center

Session Chair: Davide Basilio Bartolini (Huawei)

Best Paper Nominee

9:30 AM EDT – 9:45 AM EDT

Clockhands: Rename-free Instruction Set Architecture for Out-of-order Processors

Toru Koizumi (Nagoya Institute of Technology), Ryota Shioya, Shu Sugita, Taichi Amano, Yuya Degawa, Junichiro Kadomoto, Hidetsugu Irie, Shuichi Sakai (The University of Tokyo)

Best Paper Nominee

9:45 AM EDT – 10:00 AM EDT

Decoupled Vector Runahead

Ajeya Naithani, Jaime Roelandts (Ghent University), Sam Ainsworth (University of Edinburgh), Timothy M. Jones (University of Cambridge), Lieven Eeckhout (Ghent University)

Best Paper Nominee

10:00 AM EDT – 10:15 AM EDT

CryptoMMU: Enabling Scalable and Secure Access Control of Third-Party Accelerators

Faiz Alam, Hyokeun Lee (North Carolina State University), Abhishek Bhattacharjee (Yale University), Amro Awad (North Carolina State University)

Best Paper Nominee

10:15 AM EDT – 10:30 AM EDT

Phantom: Exploiting Decoder-detectable Mispredictions

Johannes Wikner, Daniël Trujillo, Kaveh Razavi (ETH Zürich)

10:30 AM EDT – 11:00 AM EDT: Coffee Break

11:00 AM EDT – 12:00 PM EDT

Session 1A: Accelerators Based on HW/SW Co-Design, Accelerators for Matrix Processing

Location: Metropolitan Center

Session Chair: Michael Pellauer (NVIDIA)

AuRORA: Virtualized Accelerator Orchestration for Multi-Tenant Workloads

Seah Kim, Jerry Zhao, Krste Asanovic, Borivoje Nikolic, Yakun Sophia Shao (University of California, Berkeley)

UNICO: Unified Hardware Software Co-Optimization for Robust Neural Network Acceleration

Bahador Rashidi, Chao Gao, Shan Lu (Huawei Technologies Canada); Zhisheng Wang (Huawei Technologies); Chunhua Zhou (Huawei Technologies Canada); Di Niu (University of Alberta); Fengyu Sun (Huawei Technologies)

Spatula: A Hardware Accelerator for Sparse Matrix Factorization

Axel Feldmann, Daniel Sanchez (Massachusetts Inst. of Technology)

Session 1B: Architectural Support/Programming Languages, Case Study

Location: Metropolitan West

Session Chair: Saugata Ghose (University of Illinois Urbana-Champaign)

Demystifying CXL Memory with Genuine CXL-Ready Systems and Devices

Yan Sun (University of Illinois Urbana-Champaign); Yifan Yuan (Intel Labs); Zeduo Yu, Reese Kuper, Chihun Song, Jinghan Huang, Houxiang Ji, Siddharth Agarwal, Jiaqi Lou, Ipoom Jeong (University of Illinois Urbana-Champaign); Ren Wang (Intel); Jung Ho Ahn (Seoul National University); Tianyin Xu, Nam Sung Kim (University of Illinois Urbana-Champaign)

Memento: Architectural Support for Ephemeral Memory Management in Serverless Environments

Ziqi Wang, Kaiyang Zhao, Pei Li, Andrew Jacob (Carnegie Mellon University); Michael Kozuch (Intel Labs and Carnegie Mellon University); Todd Mowry, Dimitrios Skarlatos (Carnegie Mellon University)

Simultaneous and Heterogenous Multithreading

Kuan-Chieh Hsu, Hung-Wei Tseng (University of California, Riverside)

Session 1C: Design Automation, Synthesis, Hardware Generation

Location: Metropolitan East

Session Chair: Mark Jeffrey (University of Toronto)

Accelerating RTL Simulation with Hardware-Software Co-Design

Fares Elsabbagh, Shabnam Sheikhha, Victor A. Ying, Quan M. Nguyen (Massachusetts Inst. of Technology); Joel Emer (Massachusetts Inst. of Technology/NVIDIA); Daniel Sanchez (Massachusetts Inst. of Technology)

Fast, Robust and Transferable Prediction for Hardware Logic Synthesis

Ceyu Xu, Pragya Sharma, Tianshu Wang, Lisa Wu Wills (Duke University)

Khronos: Fusing Memory Access for Improved Hardware RTL Simulation

Kexing Zhou, Yun Liang, Yibo Lin, Runsheng Wang, Ru Huang (Peking University)

12:00 PM EDT – 1:00 PM EDT: Lunch

1:00 PM EDT – 2:00 PM EDT

Session 2A: ML Design Space Exploration

Location: Metropolitan Center

Session Chair: Tushar Krishna (Georgia Institute of Technology)

SecureLoop: Design Space Exploration of Secure DNN Accelerators

Kyungmi Lee, Mengjia Yan (Massachusetts Inst. of Technology); Joel Emer (MIT, NVIDIA); Anantha Chandrakasan (Massachusetts Inst. of Technology)

DOSA: Differentiable Model-Based One-Loop Search for DNN Accelerators

Charles Hong (University of California, Berkeley); Qijing Huang (NVIDIA); Grace Dinh (University of California, Berkeley); Mahesh Subedar (Intel Corporation); Yakun Sophia Shao (University of California, Berkeley)

TorchSparse++: Efficient Training and Inference Framework for Sparse Convolution on GPUs

Haotian Tang, Shang Yang, Zhijian Liu (Massachusetts Inst. of Technology); Ke Hong (Tsinghua University); Zhongming Yu (University of California, San Diego); Xiuyu Li (University of California, Berkeley); Guohao Dai (Shanghai Jiao Tong University); Yu Wang (Tsinghua University); Song Han (Massachusetts Inst. of Technology)

Session 2B: Microarchitecture

Location: Metropolitan West

Session Chair: Daniel Sorin (Duke University)

Branch Target Buffer Organizations

Arthur Perais (CNRS); Rami Sheikh (Arm)

Warming Up a Cold Front-End with Ignite

David Schall (University of Edinburgh); Andreas Sandberg (Arm Ltd.); Boris Grot (University of Edinburgh)

ArchExplorer: Microarchitecture Exploration Via Bottleneck Analysis

Chen Bai (The Chinese University of Hong Kong); Jiayi Huang (Hong Kong University of Science and Technology (Guangzhou)); Xuechao Wei (Alibaba Inc); Yuzhe Ma (Hong Kong University of Science and Technology (Guangzhou)); Sicheng Li (Alibaba Inc); Hongzhong Zheng (Alibaba); Bei Yu (Chinese University of Hong Kong); Yuan Xie (Alibaba Group)

Session 2C: Accelerators for Graphs, Robotics

Location: Metropolitan East

Session Chair: Sabrina Neuman (Boston University)

DF-GAS: a Distributed FPGA-as-a-Service Architecture towards Billion-Scale Graph-based Approximate Nearest Neighbor Search

Shulin Zeng, Zhenhua Zhu, Jun Liu, Haoyu Zhang (Tsinghua University); Guohao Dai (Shanghai Jiao Tong University); Zixuan Zhou (Tsinghua University); Shuangchen Li (Alibaba); Xuefei Ning (Tsinghua University); Yuan Xie (Alibaba Group); Huazhong Yang, Yu Wang (Tsinghua University)

Dadu-RBD: Robot Rigid Body Dynamics Accelerator with Multifunctional Pipelines

Yuxin Yang, Xiaoming Chen, Yinhe Han (Inst. of Computing Technology, Chinese Academy of Sciences)

MEGA: Evolving Graph Accelerator

Chao Gao, Mahbod Afarin, Shafiur Rahman, Nael Abu-Ghazaleh, Rajiv Gupta (UC Riverside)

2:00 PM EDT – 2:15 PM EDT: Break

2:15 PM EDT – 3:15 PM EDT

Session 3A: ML Sparsity

Location: Metropolitan Center

Session Chair: Biswabandan Panda (Indian Institute of Technology Bombay)

Eureka: Efficient Tensor Cores for One-sided Unstructured Sparsity in DNN Inference

Ashish Gondimalla (Google); Mithuna Thottethodi, T. N. Vijaykumar (Purdue University)

RM-STC: Row-Merge Dataflow Inspired GPU Sparse Tensor Core for Energy-Efficient Sparse Acceleration

Guyue Huang, Zhengyang Wang (University of California, Santa Barbara); Po-An Tsai (NVIDIA); Chen Zhang (Shanghai Jiao Tong University); Yufei Ding (University of California, Santa Barbara); Yuan Xie (Alibaba Group)

Sparse-DySta: Sparsity-Aware Dynamic and Static Scheduling for Sparse Multi-DNN Workloads

Hongxiang Fan (Samsung AI Cambridge and University of Cambridge); Stylianos I. Venieris (Samsung AI); Alexandros Kouris (Samsung AI and Imperial College London); Nicholas Lane (University of Cambridge and Samsung AI)

Session 3B: GPUs

Location: Metropolitan West

Session Chair: Nandita Vijaykumar (University of Toronto)

MAD MAcce: Supporting Multiply-Add Operations for Democratizing Matrix-Multiplication Accelerator

Seunghwan Sung, Sujin Hur, Sungwoo Kim, Dongho Ha (Yonsei University); Yunho Oh (Korea University); Won Woo Ro (Yonsei University)

Path Forward Beyond Simulators: Fast and Accurate GPU Execution Time Prediction for DNN Workloads

Ying Li, Yifan Sun (William & Mary); Adwait Jog (University of Virginia)

G10: Enabling An Efficient Unified GPU Memory and Storage Architecture with Smart Tensor Migrations

Haoyang Zhang, Yirui Zhou, Yuqi Xue, Yiqi Liu, Jian Huang (University of Illinois Urbana-Champaign)

Session 3C: PhD Forum Lightning Session

Location: Metropolitan East

PhD Forum participants and agenda

3:15 PM EDT – 4:15 PM EDT: ACM Student Research Competition & MICRO PhD Forum Posters

Location: Metropolitan West

3:15 PM EDT – 4:15 PM EDT: Hot Baked Chips

Location: Metropolitan West

3:15 PM EDT – 4:15 PM EDT: Coffee Break

4:15 PM EDT – 5:55 PM EDT

Session 4A: ML Architecture

Location: Metropolitan Center

Session Chair: Po-An Tsai (NVIDIA)

MAICC: A Lightweight Many-core Architecture with In-Cache Computing for Multi-DNN Parallel Inference

Renhao Fan, Yikai Cui, Qilin Chen (Department of Computer Science and Technology, Tsinghua University); Mingyu Wang (School of Microelectronics Science and Technology, Sun Yat-Sen University); Youhui Zhang, Weimin Zheng, Zhaolin Li (Department of Computer Science and Technology, Tsinghua University)

SRIM: A Systolic Random Increment Memory Architecture for Unary Computing

Hongrui Guo, Yongwei Zhao (Inst. of Computing Technology, Chinese Academy of Sciences); Zhangmai Li (Huazhong University of Science and Technology); Yifan Hao, Chang Liu, Xinkai Song, Xiaqing Li, Zidong Du, Rui Zhang, Qi Guo (Inst. of Computing Technology, Chinese Academy of Sciences); Tianshi Chen (Cambricon Technologies); Zhiwei Xu (Inst. of Computing Technology, Chinese Academy of Sciences)

Improving Data Reuse in NPU On-chip Memory with Interleaved Gradient Order for DNN Training

Jungwoo Kim, Seonjin Na, Sanghyeon Lee, Sunho Lee, Jaehyuk Huh (KAIST)

TT-GNN: Efficient On-Chip Graph Neural Network Training via Embedding Reformation and Hardware Optimization

Zheng Qu (University of California, Santa Barbara); Dimin Niu (Alibaba Group Inc.); Shuangchen Li, Hongzhong Zheng (Alibaba); Yuan Xie (Alibaba Group)

Supporting Energy-Based Learning With an Ising Machine Substrate: A Case Study on RBM

Uday Kumar Reddy Vengalam (AMD Research); Yongchao Liu, Tong Geng, Hui Wu, Michael Huang (University of Rochester)

Session 4B: Quantum

Location: Metropolitan West

Session Chair: Hiroaki Kobayashi (Tohoku University)

QuComm: Optimizing Collective Communication for Distributed Quantum Computing

Anbang Wu, Yufei Ding (University of California, Santa Barbara); Ang Li (Pacific Northwest National Laboratory)

QuCT: A Framework for Analyzing Quantum Circuit by Extracting Contextual and Topological Features

Siwei Tan, Congliang Lang, Liang Xiang, Shudi Wang, Xinghui Jia, Ziqi Tan, Tingting Li (Zhejiang University); Jieming Yin (Nanjing University of Posts and Telecommunications); Yongheng Shang, Andre Python, Liqiang Lu, Jianwei Yin (Zhejiang University)

ERASER: Practical and Accurate Leakage Suppression for Fault-Tolerant Quantum Computing

Suhas Vittal, Poulami Das, Moinuddin Qureshi (Georgia Inst. of Technology)

Systems Architecture for Quantum Random Access Memory

Shifan Xu (Yale University); Connor T. Hann (Amazon AWS); Ben Foxman, Steven M. Girvin, Yongshan Ding (Yale University)

HetArch: Heterogeneous Microarchitectures for Superconducting Quantum Systems

Samuel Stein (Pacific Northwest National Laboratory); Sara Sussman, Teague Tomesh, Charles Guinn, Esin Tureci (Princeton University); Sophia Fuhui Lin (University of Chicago); Wei Tang (Princeton University); James Ang (Pacific Northwest National Laboratory); Srivatsan Chakram (Rutgers University); Ang Li (Pacific Northwest National Laboratory); Margaret Martonosi (Princeton University); Fred Chong (University of Chicago); Andrew A. Houck (Princeton University); Isaac L. Chuang (Massachusetts Inst. of Technology); Michael DeMarco (Brookhaven National Laboratory, Massachusetts Inst. of Technology)

Session 4C: Emerging Technologies: Superconducting, Photonics, DNA

Location: Metropolitan East

Session Chair: Koji Inoue (Kyushu University)

Efficiently Enabling Block Semantics and Data Updates in DNA Storage

Puru Sharma, Cheng-Kai Lim, Dehui Lin, Yash Pote, Djordje Jevdjic (National University of Singapore)

ReFOCUS: Reusing Light for Efficient Fourier Optics-Based Photonic Neural Network Accelerator

Shurui Li, Hangbo Yang, Chee Wei Wong (University of California, Los Angeles); Volker J. Sorger (The George Washington University); Puneet Gupta (University of California, Los Angeles)

SupeRBNN: Randomized Binary Neural Network Using Adiabatic Superconductor Josephson Devices

Zhengang Li, Geng Yuan (Northeastern University); Tomoharu Yamauchi (Tokyo City University); Zabihi Masoud, Yanyue Xie, Peiyan Dong (Northeastern University); Xulong Tang (University of Pittsburgh); Nobuyuki Yoshikawa (Yokohama National University); Devesh Tiwari, Yanzhi Wang (Northeastern University); Olivia Chen (Tokyo City University)

SuperBP: Design Space Exploration of Perceptron-Based Branch Predictors for Superconducting CPUs

Haipeng Zha (University of Southern California); Swamit Tannu (University of Wisconsin, Madison); Murali Annavaram (University of Southern California)

SUSHI: Ultra-High-Speed and Ultra-Low-Power Neuromorphic Chip Using Superconducting Single-Flux-Quantum Circuits

Zeshi Liu (State Key Lab of Processors, Inst. of Computing Technology, Chinese Academy of Sciences, China); Shuo Chen, Peiyao Qu (State Key Lab of Processors, Inst. of Computing Technology, Chinese Academy of Sciences, China); Huanli Liu, Minghui Niu, Liliang Ying, Jie Ren (Shanghai Inst. of Microsystem and Information Technology, Chinese Academy of Sciences, China); GuangMing Tang, Haihang You (State Key Lab of Processors, Inst. of Computing Technology, Chinese Academy of Sciences, China)

6:00 PM EDT – 7:30 PM EDT: Business Meeting

Day 2: Tuesday, October 31

7:30 AM EDT – 8:30 AM EDT: Breakfast

8:30 AM EDT – 9:30 AM EDT: Keynote II (Video) by Debbie Marr, Intel Fellow and Chief Architect

With Great Power Comes Great Responsibility

Location: Metropolitan Center

Abstract
The basic principles of achieving high performance in computing have remained the same, yet they have evolved and presented new and different challenges. This talk will touch on some computing history and lessons learned, and make the case that although computing has achieved tremendous, orders-of-magnitude breakthroughs, many of the challenges facing us today are curiously the same. Today’s computing landscape is more exciting than ever.

Bio
Debbie Marr is the Chief Architect of the Advanced Architecture Development Group (AADG) at Intel, where she leads the vision and development of new CPU architectures and microarchitectures for future computing needs such as AI, cloud computing, and security. Debbie’s 30+ years at Intel include roles such as Director of the Accelerator Architecture Lab in Intel Labs, where she led research in machine learning and acceleration techniques for CPUs, GPUs, FPGAs, and AI accelerators. Debbie played leading roles on Intel CPU products from the 386SL to Intel’s current leading-edge products. Debbie was the server architect of the Intel® Pentium™ Pro, Intel’s first Xeon processor. She brought Intel Hyper-Threading Technology from concept to product on the Pentium 4 processor. She was the chief architect of the 4th Generation Intel Core™ (Haswell) and led advanced development for Intel’s 2017/2018 Core/Xeon CPUs. Debbie holds over 40 patents in many aspects of CPU, AI accelerator, and FPGA architecture/microarchitecture. Debbie has a PhD in electrical and computer engineering from the University of Michigan, an MS in electrical engineering and computer science from Cornell University, and a BS in electrical engineering and computer science from the University of California, Berkeley.

9:30 AM EDT – 9:45 AM EDT: Coffee Break

9:45 AM EDT – 11:25 AM EDT

Session 5A: Security Encryption, Confidentiality Support

Location: Metropolitan Center

Session Chair: Gururaj Saileshwar (University of Toronto / NVIDIA Research)

AQ2PNN: Enabling Two-party Privacy-Preserving Deep Neural Network Inference with Adaptive Quantization

Yukui Luo (Northeastern University); Nuo Xu (Lehigh University); Hongwu Peng (University of Connecticut); Chenghong Wang (Duke University); Shijin Duan (Northeastern University); Kaleel Mahmood (University of Connecticut); Wujie Wen (Lehigh University); Caiwen Ding (University of Connecticut); Xiaolin Xu (Northeastern University)

CHERIoT: Complete Memory Safety for Embedded Devices

Saar Amar (Microsoft); David Chisnall (Microsoft); Tony Chen (Microsoft); Nathaniel Wesley Filardo (Microsoft); Ben Laurie (Google); Kunyan Liu, Robert Norton (Microsoft); Simon W. Moore (University of Cambridge); Yucong Tao (Microsoft); Robert N. M. Watson (University of Cambridge); Hongyan Xia (Arm)

Accelerating Extra Dimensional Page Walks for Confidential Computing

Dong Du, Bicheng Yang, Yubin Xia, Haibo Chen (Shanghai Jiao Tong University)

GME: GPU-based Microarchitectural Extensions to Accelerate Homomorphic Encryption

Kaustubh Shivdikar, Yuhui Bao (Northeastern University); Rashmi Agrawal (Boston University); Michael Shen (Northeastern University); Gilbert Jonatan (KAIST); Evelio Mora (Universidad Católica de Murcia); Alexander Ingare, Neal Livesay (Northeastern University); José L. Abellán (Universidad de Murcia); John Kim (KAIST); Ajay Joshi (Boston University / Lightmatter); David Kaeli (Northeastern University)

MAD: Memory-Aware Design Techniques for Accelerating Fully Homomorphic Encryption

Rashmi Agrawal (Boston University); Leo de Castro (MIT CSAIL); Chiraag Juvekar (Analog Devices); Anantha Chandrakasan (Massachusetts Inst. of Technology); Vinod Vaikuntanathan (MIT CSAIL); Ajay Joshi (Boston University / Lightmatter)

Session 5B: Prefetching

Location: Metropolitan West

Session Chair: Leeor Peled (Toga Networks)

Micro-Armed Bandit: Lightweight & Reusable Reinforcement Learning for Microarchitecture Decision-Making

Gerasimos Gerogiannis, Josep Torrellas (University of Illinois Urbana-Champaign)

CLIP: Load Criticality based Data Prefetching for Bandwidth-constrained Many-core Systems

Biswabandan Panda (Indian Inst. of Technology Bombay)

Snake: A Variable-length Chain-based Prefetching Mechanism for GPUs

Saba Mostofi (Sharif University of Technology); Hajar Falahati (Inst. for Research in Fundamental Sciences (IPM)); Negin Mahani (Shahid Bahonar University); Pejman Lotfi-Kamran (Inst. for Research in Fundamental Sciences (IPM)); Hamid Sarbazi-Azad (Sharif University of Technology, IPM)

Treelet Prefetching For Ray Tracing

Yuan Hsi Chou (University of British Columbia); Tyler Nowicki (Huawei Technologies); Tor M. Aamodt (University of British Columbia)

Session 5C: Processing-In-Memory

Location: Metropolitan East

Session Chair: Dimitrios Skarlatos (Carnegie Mellon University)

NAS-SE: Designing A Highly-Efficient In-Situ Neural Architecture Search Engine for Large-Scale Deployment

Qiyu Wan (NVIDIA); Lening Wang (University of Houston); Jing Wang (Renmin University of China); Shuaiwen Leon Song (Microsoft and University of Sydney); Xin Fu (University of Houston)

XFM: Accelerated Software-Defined Far Memory

Neel Patel, Amin Mamandipoor, Derrick Quinn, Mohammad Alian (University of Kansas)

Affinity Alloc: Taming Not-So Near-Data Computing

Zhengrong Wang (University of California, Los Angeles); Christopher Liu (University of California, Los Angeles); Nathan Beckmann (Carnegie Mellon University); Tony Nowatzki (University of California, Los Angeles)

MVC: Enabling Fully Coherent Multi-Data-Views through the Memory Hierarchy with Processing in Memory

Daichi Fujiki (Keio University)

AESPA: Asynchronous Execution Scheme to Exploit Bank-Level Parallelism of Processing-in-Memory

Hongju Kal, Chanyoung Yoo, Won Woo Ro (Yonsei University)

11:25 AM EDT – 12:15 PM EDT: Break

12:15 PM EDT – 1:30 PM EDT: Award Luncheon

1:30 PM EDT – 2:30 PM EDT: Panel

Title: Fostering Innovation in Machine Learning

Moderator: Andreas Moshovos, University of Toronto

Panelists

Nicolas Papernot, University of Toronto
Iqbal Mohomed, Samsung Research
Nish Sinnadurai, Cerebras
Mark Horowitz, Stanford
Amir Yazdanbakhsh, Google
Song Han, MIT

2:30 PM EDT – 3:15 PM EDT: MICRO Posters

Location: Metropolitan West

2:30 PM EDT – 3:15 PM EDT: Coffee Break

3:15 PM EDT – 4:35 PM EDT

Session 6A: Security Hardware

Location: Metropolitan Center

Session Chair: Samira Mirbagher Ajorpaz (North Carolina State University)

ReCon: Efficient Detection, Management, and Use of Non-Speculative Information Leakage

Pavlos Aimoniotis (Uppsala University); Amund Bergland Kvalsvik (Norwegian University of Science and Technology); Xiaoyue Chen (Uppsala University); Magnus Själander (Norwegian University of Science and Technology); Stefanos Kaxiras (Uppsala University)

Uncore Encore: Covert Channels Exploiting Uncore Frequency Scaling

Yanan Guo (University of Pittsburgh); Dingyuan Cao (University of Illinois Urbana-Champaign); Xin Xin, Youtao Zhang, Jun Yang (University of Pittsburgh)

Hardware Support for Constant-Time Programming

Yuanqing Miao, Mahmut Taylan Kandemir, Danfeng Zhang (Pennsylvania State University); Yingtian Zhang (Penn State University); Gang Tan (Penn State); Dinghao Wu (Pennsylvania State University)

AutoCC: Automatic Discovery of Covert Channels in Time-Shared Hardware

Marcelo Orenes-Vera, Hyunsung Yun (Princeton University); Nils Wistoff (ETH Zürich); Gernot Heiser (University of New South Wales, Sydney); Luca Benini (ETH Zürich); David Wentzlaff, Margaret Martonosi (Princeton University)

Session 6B: Datacenter Networks

Location: Metropolitan West

Session Chair: Trevor E. Carlson (National University of Singapore)

NeuroLPM - Scaling Longest Prefix Match Hardware with Neural Networks

Alon Rashelbach, Igor De-Paula, Mark Silberstein (Technion)

Space Microdatacenters

Nathaniel Bleier (University of Illinois Urbana-Champaign); Muhammad Husnain Mubarik, Gary R Swenson, Rakesh Kumar (University of Illinois Urbana-Champaign)

LogNIC: A High-Level Performance Model for SmartNICs

Zerui Guo (University of Wisconsin-Madison); Jiaxin Lin (The University of Texas at Austin); Yuebin Bai (Beihang University, China); Daehyeok Kim (The University of Texas at Austin and Microsoft); Michael Swift (University of Wisconsin-Madison); Aditya Akella (The University of Texas at Austin); Ming Liu (University of Wisconsin-Madison)

Heterogeneous Die-to-Die Interfaces: Enabling More Flexible Chiplet Interconnection Systems

Yinxiao Feng, Dong Xiang, Kaisheng Ma (Tsinghua University)

Session 6C: Reliability, Availability

Location: Metropolitan East

Session Chair: Freddy Gabbay (Ruppin Academic College)

Predicting Future-System Reliability with a Component-Level DRAM Fault Model

Jeageun Jung, Mattan Erez (University of Texas at Austin)

Impact of Voltage Scaling on Soft Errors Susceptibility of Multicore Server CPUs

Dimitris Agiakatsikas (University of Piraeus); George Papadimitriou, Vasileios Karakostas, Dimitris Gizopoulos (University of Athens); Mihalis Psarakis (University of Piraeus); Camille Belanger-Champagne, Ewart Blackmore (TRIUMF)

Si-Kintsugi: Recovering Golden-Like Performance of Defective Many-Core Spatial Architectures for AI

Edward Hanson, Shiyu Li, Guanglei Zhou, Feng Cheng, Yitu Wang, Rohan Bose, Hai "Helen" Li, Yiran Chen (Duke University)

How to Kill the Second Bird with One ECC: The Pursuit of Row Hammer Resilient DRAM

Michael Jaemin Kim, Minbok Wi, Jaehyun Park, Seoyoung Ko, Jae Young Choi, Hwayoung Nam (Seoul National University); Nam Sung Kim (University of Illinois Urbana-Champaign); Jung Ho Ahn (Seoul National University); Eojin Lee (Inha University)

4:35 PM EDT – 4:45 PM EDT: Break

4:45 PM EDT – 5:45 PM EDT

Session 7A: Accelerators Various

Location: Metropolitan Center

Session Chair: Alex K. Jones (University of Pittsburgh)

Bucket Getter: A Bucket-based Processing Engine for Low-bit Block Floating Point (BFP) DNNs

Yun-Chen Lo, Ren-Shuo Liu (National Tsing Hua University)

ACRE: Accelerating Random Forests for Explainability

Andrew McCrabb, Aymen Ahmed, Valeria Bertacco (University of Michigan)

δLTA: Decoupling Camera Sampling from Processing to Avoid Redundant Computations in the Vision Pipeline

Raul Taranco Serna, Jose Maria Arnau, Antonio Gonzalez (Polytechnic University of Catalonia)

Session 7B: Caches, Intermittent Computing, Persistency

Location: Metropolitan West

Session Chair: Rachata Ausavarungnirun (King Mongkut's University of Technology North Bangkok)

McCore: A Holistic Management of High-Performance Heterogeneous Multicores

Jaewon Kwon, Yongju Lee, Hongju Kal, Minjae Kim, Youngsok Kim, Won Woo Ro (Yonsei University)

SweepCache: Intermittence-Aware Cache on the Cheap

Yuchen Zhou, Jianping Zeng, Jungi Jeong (Purdue University); Jongouk Choi (University of Central Florida); Changhee Jung (Purdue University)

Persistent Processor Architecture

Jianping Zeng, Jungi Jeong, Changhee Jung (Purdue University)

Session 7C: SRC Competition

Location: Metropolitan East

6:45 PM EDT – 9:45 PM EDT: Excursion & Banquet at the Art Gallery of Ontario

Buses depart starting at 6:15 PM

Day 3: Wednesday, November 1

7:30 AM EDT – 8:30 AM EDT: Breakfast

8:30 AM EDT – 9:30 AM EDT: Keynote III (Video) by Mark Horowitz, Yahoo! Founders Professor in the School of Engineering and Professor of Computer Science, Stanford

Life Post Moore’s Law: The New Design Frontier

Location: Metropolitan Center

Abstract
For over 50 years, information technology has relied upon Moore’s Law: providing, for the same cost, 2x the number of logic transistors that were possible a few years prior. For much of that time, the smaller devices also provided dramatic energy and performance improvements through Dennard scaling, but that scaling ended over a decade ago. While technology scaling continues, per-transistor cost is no longer scaling in the advanced nodes. In this post-Moore’s Law reality, further price/performance improvement follows only from improving the efficiency of applications using innovative hardware and software techniques. Unfortunately, this need for innovative system solutions runs smack into the enormous complexity of designing and debugging contemporary VLSI-based hardware/software platforms, a task so large that it has caused the industry to consolidate, moving it away from innovation. The result is a set of platforms aimed at different computing markets. To overcome this challenge, we need to develop a new design approach and tools that enable small groups of application experts to selectively extend the performance of those successful platforms. Like the ASIC revolution in the 1980s, the goal of this approach is to enable a new set of designers (then, board-level logic designers; now, application experts) to leverage the power of customized silicon solutions. As then, these tools won’t initially be useful for current chip designers, but over time they will underlie all designs. In the 1980s, the key technologies that gave logic designers access were logic synthesis, simulation, and placement/routing of their designs onto gate arrays and standard cells. Today, the key is to realize that you are creating an “app” for an existing platform rather than building the system solution from scratch (which is both too expensive and error-prone), and to leverage the fact that modern “chips” are made of many chiplets.
The new approach must provide a design window familiar to application developers, with similar descriptive, performance-tuning, and debug capabilities. These new tools will be tied to highly capable platforms that serve as the foundation, like the app store model for mobile phones. This talk will try to convince you that this might be possible, and encourage you to contribute to this effort.

Bio
Mark Horowitz is the Yahoo! Founders Professor at Stanford University and chair of the Electrical Engineering Department. He co-founded Rambus, Inc. in 1990 and is a fellow of the IEEE and the ACM and a member of the National Academy of Engineering and the American Academy of Arts and Sciences. Dr. Horowitz's research interests are quite broad, spanning from applying EE and CS analysis methods to problems in molecular biology to creating new design methodologies for analog and digital VLSI circuits.

9:30 AM EDT – 9:45 AM EDT: Coffee Break

9:45 AM EDT – 11:25 AM EDT

Session 8A: Accelerators for Neural Nets, Accelerators for Matrix Processing

Location: Metropolitan Center

Session Chair: Jason Clemons (NVIDIA)

ADA-GP: Accelerating DNN Training By ADAptive Gradient Prediction

Vahid Janfaza, Shantanu Mandal, Farabi Mahmud, Abdullah Muzahid (Texas A&M University)

HighLight: Efficient and Flexible DNN Acceleration with Hierarchical Structured Sparsity

Yannan Nellie Wu (Massachusetts Inst. of Technology); Po-An Tsai, Saurav Muralidharan, Angshuman Parashar (NVIDIA); Vivienne Sze (Massachusetts Inst. of Technology); Joel Emer (MIT/NVIDIA)

Exploiting Inherent Properties of Complex Numbers for Accelerating Complex Valued Neural Networks

Hyunwuk Lee, Hyungjun Jang, Sungbin Kim, Sungwoo Kim, Wonho Cho, Won Woo Ro (Yonsei University)

Point Cloud Acceleration by Exploiting Geometric Locality

Cen Chen (South China University of Technology); Xiaofeng Zou (Hunan University); Hongen Shao (South China University of Technology); Yangfan Li (Central South University); Kenli Li (College of Information Science and Engineering, National Supercomputing Center in Changsha, Hunan University)

HARP: Hardware-Based Pseudo-Tiling for Sparse Matrix Multiplication Accelerator

Jinkwon Kim, Myeongjae Jang, Haejin Nam, Soontae Kim (KAIST)

Session 8B: Virtual Memory (Translation)

Location: Metropolitan West

Session Chair: Mohammad Alian (University of Kansas)

IDYLL: Enhancing Page Translation in Multi-GPUs via Light Weight PTE Invalidations

Bingyao Li, Yanan Guo, Yueqi Wang (University of Pittsburgh); Aamer Jaleel (NVIDIA); Jun Yang, Xulong Tang (University of Pittsburgh)

Victima: Drastically Increasing Address Translation Reach by Leveraging Underutilized Cache Resources

Konstantinos Kanellopoulos, Hong Chul Nam, Nisa Bostanci, Rahul Bera, Mohammad Sadrosadati (ETH Zürich); Rakesh Kumar (Norwegian University of Science and Technology (NTNU)); Davide Basilio Bartolini (Huawei); Onur Mutlu (ETH Zürich)

Utopia: Efficient Address Translation using Hybrid Virtual-to-Physical Address Mapping

Konstantinos Kanellopoulos, Rahul Bera, Kosta Stojiljkovic, Nisa Bostanci, Can Firtina (ETH Zürich); Rachata Ausavarungnirun (King Mongkut's University of Technology North Bangkok); Rakesh Kumar (Norwegian University of Science and Technology (NTNU)); Nastaran Hajinazar (Intel Labs); Mohammad Sadrosadati (ETH Zürich); Nandita Vijaykumar (University of Toronto); Onur Mutlu (ETH Zürich)

Architectural Support for Optimizing Huge Page Selection Within the OS

Aninda Manocha (Princeton University); Zi Yan (NVIDIA); Esin Tureci (Princeton University); Juan L. Aragón (University of Murcia); David Nellans (NVIDIA); Margaret Martonosi (Princeton University)

Session 8C: Benchmarking and Methodology

Location: Metropolitan East

Session Chair: Miquel Moretó (Universitat Politècnica de Catalunya/Barcelona Supercomputing Center)

Photon: A Fine-grained Sampled Simulation Methodology for GPU Workloads

Changxi Liu (National University of Singapore); Yifan Sun (College of William and Mary); Trevor E. Carlson (National University of Singapore)

Rigorous Evaluation of Computer Processors with Statistical Model Checking

Filip Mazurek, Arya Tschand (Duke University); Yu Wang (University of Florida); Miroslav Pajic, Daniel Sorin (Duke University)

TeAAL: A Declarative Framework for Modeling Sparse Tensor Accelerators

Nandeeka Nayak (University of Illinois Urbana-Champaign); Toluwanimi O. Odemuyiwa (University of California, Davis); Shubham Ugare, Christopher Fletcher (University of Illinois Urbana-Champaign); Michael Pellauer (NVIDIA); Joel Emer (MIT/NVIDIA)

TileFlow: A Framework for Modeling Fusion Dataflow via Tree-based Analysis

Size Zheng, Siyuan Chen, Siyuan Gao, Liancheng Jia, Guangyu Sun, Runsheng Wang, Yun Liang (Peking University)

Learning to Drive Software-Defined Solid-State Drives

Daixuan Li, Jinghan Sun, Jian Huang (University of Illinois Urbana-Champaign)

11:25 AM EDT – 11:40 AM EDT: Break

11:40 AM EDT – 1:00 PM EDT

Session 9A: Accelerators in Processors

Location: Metropolitan Center

Session Chair: Sihang Liu (University of Waterloo)

ARTist: A Fully Fused Accelerator for Real-Time Learning of Neural Scene Representation

Xinkai Song, Yuanbo Wen (Inst. of Computing Technology, Chinese Academy of Sciences); Xing Hu (Chinese Academy of Sciences, Inst. of Computing Technology); Tianbo Liu (University of Science and Technology of China); Haoxuan Zhou (University of Chinese Academy of Sciences); Husheng Han, Tian Zhi, Zidong Du, Wei Li, Rui Zhang (Inst. of Computing Technology, Chinese Academy of Sciences); Chen Zhang (Shanghai Jiao Tong University); Lin Gao, Qi Guo (Inst. of Computing Technology, Chinese Academy of Sciences); Tianshi Chen (Cambricon Technologies, Beijing, China)

Strix: An End-to-End Streaming Architecture with Two-Level Ciphertext Batching for Fully Homomorphic Encryption with Programmable Bootstrapping

Adiwena Putra, Prasetiyo, Yi Chen, John Kim, Joo-Young Kim (KAIST)

A Tensor Marshaling Unit for Sparse Tensor Algebra on General-Purpose Processors

Marco Siracusa, Víctor Soria-Pardos, Francesco Sgherzi (Barcelona Supercomputing Center and Universitat Politècnica de Catalunya); Joshua Randall (Arm); Douglas J. Joseph (Samsung); Miquel Moretó, Adria Armejach (Barcelona Supercomputing Center and Universitat Politècnica de Catalunya)

Tailors: Accelerating Sparse Tensor Algebra by Overbooking Buffer Capacity

Zi Yu Xue, Yannan Nellie Wu (Massachusetts Inst. of Technology); Joel Emer (Massachusetts Inst. of Technology/NVIDIA); Vivienne Sze (Massachusetts Inst. of Technology)

Session 9B: ML Compiler Optimizations / Reconfigurable Architectures

Location: Metropolitan West

Session Chair: Jian Huang (University of Illinois Urbana-Champaign)

Grape: Practical and Efficient Graphed Execution for Dynamic Deep Neural Networks on GPUs

Bojian Zheng (CentML / University of Toronto / Vector Inst.); Cody Hao Yu, Jie Wang (Amazon); Yaoyao Ding (CentML / University of Toronto / Vector Inst.); Yizhi Liu, Yida Wang (Amazon); Gennady Pekhimenko (CentML / University of Toronto / Vector Inst.)

PockEngine: Sparse and Efficient Fine-tuning in a Pocket

Ligeng Zhu (Massachusetts Inst. of Technology); Lanxiang Hu (Columbia); Ji Lin, Wei-Ming Chen, Wei-Chen Wang, Chuang Gan, Song Han (Massachusetts Inst. of Technology)

Towards Efficient Control Flow Handling in Spatial Architecture via Architecting the Control Flow Plane

Jinyi Deng, Xinru Tang, Jiahao Zhang, Yuxuan Li, Linyun Zhang (Tsinghua University); Boxiao Han, Hongjun He (China Mobile Research Inst.); Fengbin Tu (Hong Kong University of Science and Technology); Leibo Liu, Shaojun Wei, Yang Hu (Tsinghua University); Shouyi Yin (Tsinghua University / Shanghai AI Lab)

Pipestitch: An energy-minimal dataflow architecture with lightweight threads

Nathan Serafin, Souradip Ghosh, Harsh Desai, Nathan Beckmann, Brandon Lucia (Carnegie Mellon University)

Session 9C: Domain Specific Genomics

Location: Metropolitan East

Session Chair: Pradip Bose (IBM)

CASA: An Energy-Efficient and High-Speed CAM-based SMEM Seeding Accelerator for Genome Alignment

Yi Huang (Tsinghua University); Lingkun Kong (Rice University); Dibei Chen (Tsinghua University); Zhiyu Chen (Rice University); Xiangyu Kong (Tsinghua University); Jianfeng Zhu (Tsinghua University); Konstantinos Mamouras (Rice University); Shaojun Wei (Tsinghua University); Kaiyuan Yang (Rice University); Leibo Liu (Tsinghua University)

Swordfish: A Framework for Evaluating Deep Neural Network-based Basecalling using Computation-In-Memory with Non-Ideal Memristors

Taha Shahroodi (Delft University of Technology); Gagandeep Singh (AMD Research); Mahdi Zahedi (Delft University of Technology); Haiyu Mao, Joel Lindegger, Can Firtina (ETH Zürich); Stephan Wong (Delft University of Technology); Onur Mutlu (ETH Zürich); Said Hamdioui (Delft University of Technology)

DASH-CAM: Dynamic Approximate SearcH Content Addressable Memory for genome classification

Zuher Jahshan, Itay Merlin (Bar Ilan University); Esteban Garzón (University of Calabria); Leonid Yavits (Bar Ilan University)

GMX: Instruction Set Extensions for Fast, Scalable, and Efficient Genome Sequence Alignment

Max Doblas Font, Oscar Lostes-Cazorla (Barcelona Supercomputing Center); Quim Aguado-Puig (Universitat Autònoma de Barcelona); Nick Cebry (Cornell University); Pau Fontova (Barcelona Supercomputing Center); Christopher Batten (Cornell University); Santiago Marco-Sola (Universitat Autònoma de Barcelona); Miquel Moretó (Barcelona Supercomputing Center, UPC)

1:00 PM EDT – 1:15 PM EDT: Closing Remarks

MICRO 2023

October 28–November 1, 2023

Main Program, Westin Harbour Castle