Sunday, 6:00 PM CT – 9:00 PM CT: Welcome Reception

Location: TBA


Day 1: Monday, November 4

7:30 AM CT – 8:15 AM CT: Breakfast

8:15 AM CT – 8:30 AM CT: Opening Remarks

8:30 AM CT – 9:30 AM CT: Keynote I by Moinuddin Qureshi, Professor, School of Computer Science, Georgia Tech

Session Chair: Alaa Alameldeen (Simon Fraser University)

Abstract
The primary role of a computing system is to reliably perform the tasks it is assigned. Factors such as performance and power efficiency are meaningful only so long as the system can reliably execute the specified computations. As Moore's Law reaches its limits, devices are becoming less reliable and are experiencing new types of failures. Addressing these reliability challenges efficiently is crucial to scaling to smaller technology nodes. Moreover, reliability tends to be a significant hurdle for all emerging technologies, whether it's new memory technologies, DNA storage, or quantum computing. Successfully overcoming the reliability challenges also plays a crucial role in enabling the emerging technologies. In this talk, I will share our recent work on DRAM reliability and quantum computing to show how efficiently addressing device failures can help us push the boundaries -- whether extending Moore's Law or enabling quantum computing.

Bio
Moinuddin Qureshi is a Professor of Computer Science at Georgia Tech. His research interests include computer architecture, hardware security, and quantum computing. Previously, he was a research scientist at IBM T. J. Watson (2007-2011), where he developed caching algorithms for Power-7. Qureshi received the 2022 ACM SIGARCH Maurice Wilkes Award for contributions to high-performance memory systems and is a fellow of ACM and IEEE. His research has been recognized with several best-paper awards, multiple inclusions at the ISCA-50 retrospective, and several "impact" awards. Qureshi is passionate about teaching and mentoring students. Several of his former PhD advisees are faculty members at top academic institutions (including one at UT Austin). Qureshi received the 2024 "Outstanding Doctoral Thesis Advisor Award" at Georgia Tech. Qureshi is a Longhorn, having received his Ph.D. from UT Austin in 2007.

9:30 AM CT – 9:50 AM CT: Break

9:50 AM CT – 10:50 AM CT

Session Chair: Xun (Steve) Jian (Virginia Tech)
Hardware-Assisted Virtualization of Neural Processing Units for Cloud Platforms
Best Paper Candidate
Yuqi Xue, Yiqi Liu (University of Illinois Urbana-Champaign); Lifeng Nai (Google); Jian Huang (University of Illinois Urbana-Champaign)

Elastic Translations: Fast virtual memory with multiple translation sizes
Stratos Psomadakis (National Technical University of Athens); Chloe Alverti (University of Illinois at Urbana-Champaign); Vasileios Karakostas (University of Athens); Christos Katsakioris, Dimitrios Siakavaras, Konstantinos Nikas, Georgios Goumas, Nectarios Koziris (National Technical University of Athens)

Distributed Page Table: Harnessing Physical Memory as an Unbounded Hashed Page Table
Osang Kwon, Yongho Lee, Junhyeok Park, Sungbin Jang (Sungkyunkwan University); Byungchul Tak (Kyungpook National University); Seokin Hong (Sungkyunkwan University)
Session Chair: Po-An Tsai (NVIDIA)
CamPU: A Multi-Camera Processing Unit for Deep Learning-based 3D Spatial Computing Systems
Dongseok Im, Hoi-Jun Yoo (KAIST)

AdapTiV: Sign-Similarity based Image-Adaptive Token Merging for Vision Transformer Acceleration
Seungjae Yoo, Hangyeol Kim, Joo-Young Kim (KAIST)

Fusion-3D: Integrated Acceleration for Instant 3D Reconstruction and Real-Time Rendering
Best Paper Candidate
Sixu Li, Yang Zhao, Chaojian Li, Bowei Guo, Jingqun Zhang, Wenbo Zhu, Zhifan Ye, Cheng Wan, Yingyan (Celine) Lin (Georgia Institute of Technology)
Session Chair: Todd Austin (University of Michigan/Agita Labs)
Secure Prefetching for Secure Cache Systems
Sumon Nath (Indian Institute of Technology Bombay); Agustin Navarro-Torres, Alberto Ros (University of Murcia); Biswabandan Panda (Indian Institute of Technology Bombay)

HyperTEE: A Decoupled TEE Architecture with Secure Enclave Management
Yunkai Bai (Institute of Information Engineering, CAS); Peinan Li (Institute of Information Engineering, Chinese Academy of Sciences); Yubiao Huang (Institute of Information Engineering, CAS); Michael C. Huang (University of Rochester); Shijun Zhao (Institute of Information Engineering, CAS); Lutan Zhao (Institute of Information Engineering, CAS); Fengwei Zhang (Southern University of Science and Technology); Dan Meng, Rui Hou (Institute of Information Engineering, CAS)

Defending Against EMI Attacks on Just-In-Time Checkpoint for Resilient Intermittent Systems
Jaeseok Choi (University of Central Florida); Hyunwoo Joe (ETRI); Changhee Jung (Purdue University); Jongouk Choi (University of Central Florida)
Posters from Sessions 5A, 5B, 5C, 9A, and 8B

10:50 AM CT – 11:00 AM CT: Break


11:00 AM CT – 12:00 PM CT

Session Chair: Dam Sunwoo (ARM)
A Mess of Memory System Benchmarking, Simulation and Application Profiling
Best Paper Candidate
Pouya Esmaili-Dokht, Francesco Sgherzi, Valeria Soldera Girelli (Barcelona Supercomputing Center, Universitat Politecnica De Catalunya); Isaac Boixaderas, Mariana Carmin, Alireza Monemi (Barcelona Supercomputing Center); Adria Armejach (Barcelona Supercomputing Center, Universitat Politecnica De Catalunya); Estanislao Mercadal, Germán Llort, Petar Radojković (Barcelona Supercomputing Center); Miquel Moreto, Judit Giménez, Xavier Martorell, Eduard Ayguadé, Jesus Labarta (Barcelona Supercomputing Center, Universitat Politecnica De Catalunya); Emanuele Confalonieri, Rishabh Dubey, Jason Adlard (Micron Technology)

vTrain: A Simulation Framework for Evaluating Cost-effective and Compute-optimal Large Language Model Training
Jehyeon Bang, Yujeong Choi (KAIST); Myeongwoo Kim, Yongdeok Kim (Samsung Advanced Institute of Technology); Minsoo Rhu (KAIST)

HyFiSS: A Hybrid Fidelity Stall-Aware Simulator for GPGPUs
Jianchao Yang, Mei Wen (Key Laboratory of Advanced Microprocessor Chips and Systems, College of Computer, National University of Defense Technology); Dong Chen (Huawei Technologies Co., Ltd); Zhaoyun Chen, Zeyu Xue, Yuhang Li, Junzhong Shen, Yang Shi (Key Laboratory of Advanced Microprocessor Chips and Systems, College of Computer, National University of Defense Technology)
Session Chair: Changhee Jung (Purdue University)
Unleashing CPU Potential for Executing GPU Programs through Compiler/Runtime Optimizations
Ruobing Han, Jisheng Zhao, Hyesoon Kim (Georgia Institute of Technology)

A framework for fine-grained program versioning
Yishen Chen, Saman Amarasinghe (MIT)

LightWSP: Whole-System Persistence on the Cheap
Yuchen Zhou, Jianping Zeng, Changhee Jung (Purdue University)
Session Chair: Siva Hari (NVIDIA)
DelayAVF: Calculating Architectural Vulnerability Factors for Delay Faults
Peter W. Deutsch (MIT); Vincent Quentin Ulitzsch (MIT/TU Berlin); Sudhanva Gurumurthi, Vilas Sridharan (AMD); Joel Emer (MIT/NVIDIA); Mengjia Yan (MIT)

Polymorphic Error Correction
Evgeny Manzhosov; Simha Sethumadhavan (Columbia University/Chip Scan Inc)

DRCTL: A Disorder-Resistant Computation Translation Layer Enhancing the Lifetime and Performance of Memristive CIM Architecture
Heng Zhou, Bing Wu, Huan Cheng, Jinpeng Liu, Taoming Lei, Dan Feng, Wei Tong (Huazhong University of Science and Technology)
Posters from Sessions 6A, 6B, 6C, 9B, 8A

12:00 PM CT – 1:30 PM CT: Lunch

1:30 PM CT – 2:50 PM CT

Session Chair: Biswabandan Panda (Indian Institute of Technology, Bombay)
A Case for Speculative Address Translation with Rapid Validation for GPUs
Best Paper Candidate
Junhyeok Park, Osang Kwon, Yongho Lee, Seongwook Kim, Gwangeun Byeon, Jihun Yoon (Sungkyunkwan University); Prashant J. Nair (The University of British Columbia); Seokin Hong (Sungkyunkwan University)

SUV: Static analysis guided Unified Virtual Memory
Pratheek B (Indian Institute of Science); Guilherme Cox, Jan Vesely (NVIDIA); Arkaprava Basu (Indian Institute of Science)

STAR: Sub-Entry Sharing-Aware TLB for Multi-Instance GPU
Bingyao Li, Yueqi Wang, Tianyu Wang (University of Pittsburgh); Lieven Eeckhout (Ghent University); Jun Yang (University of Pittsburgh); Aamer Jaleel (NVIDIA); Xulong Tang (University of Pittsburgh)

CacheCraft: Enhancing GPU Performance under Memory Protection through Reconstructed Caching
Soyoung Park, Hojung Namkoong, Boyeol Choi (Sungkyunkwan University); Michael Sullivan (NVIDIA); Jungrae Kim (Sungkyunkwan University)
Session Chair: Jung Ho Ahn (Seoul National University)
Trinity: A General Purpose FHE Accelerator
Xianglong Deng, Shengyu Fan (Institute of Information Engineering, Chinese Academy of Sciences); Zhicheng Hu (University of Electronic Science and Technology of China); Zhuoyu Tian, Zihao Yang (Institute of Information Engineering, Chinese Academy of Sciences); Jiangrui Yu (Peking University); Dingyuan Cao (University of Illinois Urbana-Champaign); Dan Meng, Rui Hou (Institute of Information Engineering, Chinese Academy of Sciences); Meng Li (Peking University); Qian Lou (University of Central Florida); Mingzhe Zhang (Institute of Information Engineering, Chinese Academy of Sciences)

UFC: A Unified Accelerator for Fully Homomorphic Encryption
Minxuan Zhou (Illinois Institute of Technology); Yujin Nam, Xuan Wang, Youhak Lee (University of California San Diego); Chris Wilkerson, Raghavan Kumar, Sachin Taneja, Sanu Mathew, Rosario Cammarota (Intel Labs); Tajana Rosing (University of California San Diego)

Accelerating Zero-Knowledge Proofs Through Hardware-Algorithm Co-Design
Best Paper Candidate
Nikola Samardzic, Simon Langowski, Srinivas Devadas, Daniel Sanchez (Massachusetts Institute of Technology)

A Compiler-Like Framework for Optimizing Cryptographic Big Integer Multiplication on GPUs
Zhuoran Ji (Shandong University); Jianyu Zhao (Independent); Zhaorui Zhang (The Hong Kong Polytechnic University); Jiming Xu, Shoumeng Yan (Ant Group); Lei Ju (Shandong University)
Session Chair: Abdullah Muzahid (Texas A&M)
Beehive: A Flexible Network Stack for Direct-Attached Accelerators
Katie Lim, Matthew Giordano, Theano Stavrinos (University of Washington); Jacob Nelson (Microsoft Research); Irene Zhang (Microsoft Research/University of Washington); Baris Kasikci (University of Washington and Google); Thomas Anderson (University of Washington)

Stellar: An Automated Design Framework for Dense and Sparse Spatial Accelerators
Hasan Nazim Genc, Hansung Kim (University of California, Berkeley); Prashanth Ganesh, Sophia Shao (UC Berkeley)

LUCIE: A Universal Chiplet-Interposer Design Framework for Plug-and-Play Integration
Zixi Li, David Wentzlaff (Princeton University)

A Scalable, Efficient, and Robust Dynamic Memory Management Library for HLS-based FPGAs
Qinggang Wang, Long Zheng, Zhaozeng An, Shuyi Xiong, Runze Wang, Yu Huang, Pengcheng Yao, Xiaofei Liao, Hai Jin (Huazhong University of Science and Technology); Jingling Xue (UNSW Sydney)
Posters from Sessions 7A, 7B, 7C, 9C

2:50 PM CT – 3:50 PM CT

Session Chair: Benjamin Lee (University of Pennsylvania)


3:50 PM CT – 4:10 PM CT: Break


4:10 PM CT – 5:50 PM CT

Session Chair: Heiner Litz (UC Santa Cruz)
Customizing Cache Indexing Through Entropy Estimation
Kevin Weston, Vahid Janfaza, Avery Johnson, Farabi Mahmud, Abdullah Muzahid (Texas A&M University)

The Last-Level Branch Predictor
David Schall (University of Edinburgh); Andreas Sandberg (Arm Research); Boris Grot (University of Edinburgh)

Timely, Efficient, and Accurate Branch Pre-computation
Aniket Deshmukh, Chester (Lingzhe) Cai, Yale Patt (UT Austin)

Localizing the Tag Comparisons in the Wakeup Logic to Reduce Energy Consumption of the Issue Queue
Kenichiro Mori, Sota Kosugi, Hiroto Yoshida, Hajime Shimada, Hideki Ando (Nagoya University)

RTL2MµPATH: Multi-μPATH Synthesis with Applications to Hardware Security Verification
Yao Hsiao (Stanford University); Nikos Nikoleris, Artem Khyzha (Arm); Dominic P. Mulligan, Gustavo Petri (Amazon Web Services); Christopher W. Fletcher (University of California, Berkeley); Caroline Trippel (Stanford University)
Session Chair: Yingyan (Celine) Lin (Georgia Tech)
SRender: Boosting Neural Radiance Field Efficiency via Sensitivity-Aware Dynamic Precision Rendering
Zhuoran Song, Houshu He, Fangxin Liu (Shanghai Jiao Tong University); Yifan Hao (Institute of Computing Technology, Chinese Academy of Sciences); Xinkai Song (Institute of Computing Technology, Chinese Academy of Sciences); Li Jiang (Shanghai Jiaotong University); Xiaoyao Liang (Shanghai Jiao Tong University)

EMP: Efficient 4-bit Matrix Unit via Primitivization
Yi Chen (University of Science and Technology of China); Yongwei Zhao, Yifan Hao (Institute of Computing Technology, Chinese Academy of Sciences); Yuntao Dai (University of Science and Technology of China); Yang Liu (Institute of Computing Technology (ICT), CAS, China); Rui Zhang, Mo Zou, Yuanbo Wen, Xinkai Song, Xiaqing Li, Xing Hu, Zidong Du (Institute of Computing Technology, Chinese Academy of Sciences); Huaping Chen (University of Science and Technology of China); Qi Guo (Institute of Computing Technology, Chinese Academy of Sciences); Tianshi Chen (Cambricon Technologies)

BBS: Bi-directional Bit-level Sparsity for Deep Learning Acceleration
Yuzong Chen, Jian Meng, Jae-sun Seo, Mohamed Abdelfattah (Cornell University)

SCAR: Scheduling Multi-Model AI Workloads on Heterogeneous Multi-Chiplet Module Accelerators
Mohanad Odema, Luke Chen (University of California Irvine); Hyoukjun Kwon (University of California, Irvine); Mohammad Al Faruque (UC Irvine)

SCALE: A Structure-Centric Accelerator for Message Passing Graph Neural Networks
Lingxiang Yin, Sanjay Gandham, Mingjie Lin, Hao Zheng (University of Central Florida)
Session Chair: Mingyu Gao (Tsinghua University)
Low-overhead General-purpose Near-Data Processing in CXL Memory Expanders
Hyungkyu Ham, Jeongmin Hong, Geonwoo Park, Yunseon Shin, Okkyun Woo, Wonhyuk Yang, Jinhoon Bae, Eunhyeok Park (POSTECH); Hyojin Sung (Seoul National University); Euicheol Lim (SK Hynix); Gwangsun Kim (POSTECH)

PIFS-Rec: Process-In-Fabric-Switch for Large-Scale Recommendation System Inferences
Pingyi Huo, Anusha Devulapally (The Pennsylvania State University); Hasan Al Maruf, Minseo Park, Krishnakumar Nair, Meena Arunachalam (AMD, Inc); Gulsum Gudukbay Akbulut, Mahmut Taylan Kandemir, Vijaykrishnan Narayanan (The Pennsylvania State University)

PIM-MMU: A Memory Management Unit for Accelerating Data Transfers in Commercial PIM Systems
Dongjae Lee, Bongjoon Hyun, Taehun Kim, Minsoo Rhu (KAIST)

Azul: An Accelerator for Sparse Iterative Solvers Leveraging Distributed On-Chip Memory
Axel Feldmann, Courtney Golden, Yifan Yang (MIT); Joel Emer (MIT/NVIDIA); Daniel Sanchez (MIT)

FloatAP: Supporting High-Performance Floating-Point Arithmetic in Associative Processors
Kailin Yang, José F. Martínez (Cornell University)
Posters from the PhD Forum

6:00 PM CT – 7:30 PM CT: Business Meeting



Day 2: Tuesday, November 5

7:30 AM CT – 8:30 AM CT: Breakfast

8:30 AM CT – 9:30 AM CT: Keynote II by Gilles Pokam

Session Chair: Daniel A. Jiménez (Texas A&M)

Abstract
Historically, single-thread CPU performance has improved steadily. However, the relentless quest for higher performance has uncovered fundamental limitations, particularly as chip design complexity continues to rise. Consequently, traditional methods for enhancing CPU performance—such as scaling the depth and width of processors—are producing diminishing returns. At the same time, modern data science techniques are being successfully applied to various aspects of chip design to address this complexity. While there have been some initiatives in this direction in the microarchitecture community, the pace of adoption is not keeping up with the need. The microarchitecture community must reinvent itself by adopting modern data science methods to drive innovation despite increasing design complexity. In this talk, I will describe a few examples of microarchitectural features in the light of modern data science techniques and discuss the optimization opportunities these techniques provide that could not have been achieved without them.

Bio
Dr. Gilles Pokam is a Senior Principal Engineer at Intel. Before joining Intel, he was a postdoctoral researcher at UC San Diego and a researcher at the IBM T.J. Watson Research Center in New York. His research focuses on microarchitecture and its interactions with system software and security. Currently, Dr. Pokam leads efforts to develop innovative CPU microarchitectures through the application of AI and the use of beyond-CMOS devices. He holds a Ph.D. in Computer Science from INRIA (France) and has been awarded over 30 patents, along with more than 50 publications at leading conferences in microarchitecture and system software. Dr. Pokam is a two-time recipient of the IEEE Top Pick Award and had his research selected for inclusion in the 2023 ISCA@50 25-Year Retrospective. He is also a member of the MICRO Hall of Fame.

9:30 AM CT – 9:50 AM CT: Break

9:50 AM CT – 10:50 AM CT

Session Chair: Ramyad Hadidi (Rain AI)
Atomic Cache: Enabling Efficient Fine-Grained Synchronization with Relaxed Memory Consistency on GPGPUs through In-Cache Atomic Operations
Yicong Zhang, Mingyu Wang, Wangguang Wang, Yangzhan Mai, Haiqiu Huang, Zhiyi Yu (School of Microelectronics Science and Technology, Sun Yat-Sen University)

Concurrency-Aware Register Stacks for Efficient GPU Function Calls
Ni Kang (Purdue University); Mengchi Zhang (Meta/Purdue University); Ahmad Alawneh, Timothy G. Rogers (Purdue University)

CPElide: Efficient Multi-Chiplet GPU Implicit Synchronization
Preyesh Dalmia (NVIDIA); Rajesh Shashi Kumar (ARM); Matt Sinclair (UW-Madison, AMD Research)
Session Chair: Yunong Shi (AWS Quantum Technologies)
Flag-Proxy Networks: Overcoming Architectural, Scheduling and Decoding Obstacles in Quantum LDPC Codes
Suhas Vittal (Georgia Tech); Ali Javadi, Andrew W. Cross, Lev Bishop (IBM T.J. Watson Research Center); Moinuddin Qureshi (Georgia Tech)

Qoncord: A Multi-Device Job Scheduling Framework for Variational Quantum Algorithms
Meng Wang (The University of British Columbia); Poulami Das (UT Austin); Prashant J. Nair (The University of British Columbia)

Surf-Deformer: Mitigating Dynamic Defects on Surface Code via Adaptive Deformation
Keyi Yin (University of California, San Diego); Xiang Fang (University of California, Santa Barbara); Travis Humble (Quantum Science Center, Oak Ridge National Laboratory); Ang Li (Pacific Northwest National Laboratory); Yunong Shi (AWS Quantum Technologies); Yufei Ding (University of California San Diego)
Session Chair: Koji Inoue (Kyushu University)
Hestia: An Efficient Cross-level Debugger for High-level Synthesis
Ruifan Xu, Jin Luo, Yawen Zhang, Yibo Lin, Runsheng Wang, Ru Huang, Yun Liang (Peking University)

Looking into the Black Box: Monitoring Computer Architecture Simulations in Real-Time with AkitaRTM
Ali Mosallaei (University of Michigan); Katherine Isaacs (University of Utah); Yifan Sun (William & Mary)

Over-synchronization in GPU Programs
Ajay Nayak (Indian Institute of Science); Arkaprava Basu (Indian Institute of Science)
Posters from Sessions 1A, 1B, 1C, 11B


10:50 AM CT – 11:00 AM CT: Break


11:00 AM CT – 12:00 PM CT

Session Chair: Leeor Peled (Huawei)
Temporarily Unauthorized Stores: Write First, Ask for Permission Later
Juan M. Cebrian (University of Murcia); Magnus Jahre (Norwegian University of Science and Technology (NTNU)); Alberto Ros (University of Murcia)

Leveraging Cache Coherence to Detect and Repair False Sharing On-the-fly
Vipin Patel, Swarnendu Biswas, Mainak Chaudhuri (Indian Institute of Technology Kanpur)

Chaining Transactions for Effective Concurrency Management in Hardware Transactional Memory
Víctor Nicolás-Conesa, Ruben Titos-Gil, Ricardo Fernández-Pascual, Manuel E. Acacio, Alberto Ros (University of Murcia)
Session Chair: Srikant Bharadwaj (Microsoft)
TACOS: Topology-Aware Collective Algorithm Synthesizer for Distributed Machine Learning
William Won (Georgia Institute of Technology); Midhilesh Elavazhagan, Sudarshan Srinivasan (Intel); Swati Gupta (MIT); Tushar Krishna (Georgia Institute of Technology)

Ring Road: A Scalable Polar-Coordinate-based 2D Network-on-Chip Architecture
Yinxiao Feng, Wei Li, Kaisheng Ma (Tsinghua University)

Uncovering Real GPU NoC Characteristics: Implications on Interconnect Architecture
Zhixian Jin, Christopher Rocca, Jiho Kim, Hans Kasan, Minsoo Rhu (KAIST); Ali Bakhoda (Microsoft); Tor Aamodt (University of British Columbia); John Kim (KAIST)
Session Chair: Minesh Patel (Rutgers University)
MINT: Securely Mitigating Rowhammer with a Minimalist In-DRAM Tracker
Moinuddin Qureshi (Georgia Tech); Salman Qazi (Google); Aamer Jaleel (NVIDIA)

BreakHammer: Enabling Scalable and Low Overhead RowHammer Mitigations via Throttling Preventive Action Triggering Threads
Oğuzhan Canpolat (TOBB ETÜ); Giray Yaglikci, Ataberk Olgun, Ismail Emir Yuksel (ETH Zurich); Yahya Can Tuğrul (ETH Zurich & TOBB ETÜ); Konstantinos Kanellopoulos (ETH Zurich); Oğuz Ergin (TOBB ETÜ); Onur Mutlu (ETH Zurich & Stanford University)

ImPress: Securing DRAM Against Data-Disturbance Errors via Implicit Row-Press Mitigation
Anish Saxena (Georgia Tech); Aamer Jaleel (NVIDIA); Moinuddin Qureshi (Georgia Tech)
Posters from Sessions 2A, 2B, 2C, 11C

12:00 PM CT – 1:30 PM CT: Award Luncheon

1:30 PM CT – 2:00 PM CT: Break

2:00 PM CT – 3:20 PM CT

Session Chair: Wendy Elsasser (Rambus Inc.)
Self-Managing DRAM: A Low-Cost Framework for Enabling Autonomous and Efficient DRAM Maintenance Operations
Hasan Hassan, Ataberk Olgun, Haocong Luo, Giray Yaglikci (ETH Zurich); Onur Mutlu (ETH Zurich and Stanford University)

Memory Allocation under Hardware Compression
Muhammad Laghari, Yuqing Liu, Gagandeep Panwar, David Bears, Chandler Jearls, Raghavendra Srinivas (Virginia Tech); Esha Choukse (Microsoft Research); Kirk Cameron, Ali R. Butt, Xun Jian (Virginia Tech)

Genie Cache: Non-blocking Miss Handling and Replacement in Page-Table-based DRAM Cache
Youngin Kim, William Song (Yonsei University)

StarNUMA: Mitigating NUMA Challenges with Memory Pooling
Albert Cho (Georgia Tech); Alexandros Daglis (Georgia Tech)
Session Chair: Matt Sinclair (U. Wisconsin, AMD Research)
ThreadFuser: A SIMT Analysis Framework for MIMD Programs
Ahmad Alawneh, Ni Kang, Mahmoud Khairy, Timothy G. Rogers (Purdue University)

Extending GPU Ray-Tracing Units for Hierarchical Search Acceleration
Aaron Barnes, Fangjia Shen, Timothy G. Rogers (Purdue University)

Generalizing Ray Tracing Accelerators for Tree Traversals on GPUs
Dongho Ha (Yonsei University); Lufei Liu, Yuan Hsi Chou (University of British Columbia); Seokjin Go, Won Woo Ro (Yonsei University); Hung-Wei Tseng (University of California, Riverside); Tor M. Aamodt (University of British Columbia)

LIBRA: Memory Bandwidth- and Locality-Aware Parallel Tile Rendering
Aurora Tomás (Universitat Politècnica de Catalunya); Juan Luis Aragón (Universidad de Murcia); Joan-Manuel Parcerisa, Antonio González (Universitat Politècnica de Catalunya)
Session Chair: Ang Li (Pacific Northwest National Laboratory and University of Washington)
Rearchitecting a Neuromorphic Processor for Spike-Driven Brain-Computer Interfacing
Hunjun Lee, Yeongwoo Jang, Daye Jung, Seunghyun Song (Seoul National University); Jangwoo Kim (Seoul National University / MangoBoost)

ActiveN: A Scalable and Flexibly-programmable Event-driven Neuromorphic Processor
Xiaoyi Liu, Zhongzhu Pu, Peng Qu, Weimin Zheng, Youhui Zhang (Department of Computer Science, Tsinghua Univ. China)

LoAS: Fully Temporal-Parallel Dataflow for Dual-Sparse Spiking Neural Networks
Ruokai Yin, Youngeun Kim (Yale University); Di Wu (University of Central Florida); Priyadarshini Panda (Yale University)

COMPASS: SRAM-Based Computing-in-Memory SNN Accelerator with Adaptive Spike Speculation
Zongwu Wang, Fangxin Liu, Ning Yang, Shiyuan Huang, Haomin Li (Shanghai Jiao Tong University); Li Jiang (Shanghai Jiaotong University)
Posters from Sessions 3A, 3B, 3C, 11A

3:20 PM CT – 3:30 PM CT: Break

3:30 PM CT – 4:30 PM CT

Session Chair: Radu Teodorescu (Ohio State)
Ghost Arbitration: Mitigating Interconnect Side-Channel Timing Attacks
Zhixian Jin, Jaeguk Ahn, Jiho Kim, Hans Kasan, Jina Song (KAIST); Wonjun Song (Kangwon University); John Kim (KAIST)

IvLeague: Side Channel-resistant Secure Architectures Using Isolated Domains of Dynamic Integrity Trees
Md Hafizul Islam Chowdhuryy, Fan Yao (University of Central Florida)

Veiled Pathways: Investigating Covert and Side Channels within GPU Uncore
Yuanqing Miao, Yingtian Zhang (Penn State University); Dinghao Wu (Pennsylvania State University); Danfeng Zhang (Duke University); Gang Tan (Penn State); Rui Zhang (Penn State University); Mahmut Taylan Kandemir (Pennsylvania State University)
Session Chair: Shomit Das (Samsung)
Tyr: Taming Dataflow Parallelism for Better Locality
Nikhil Agarwal, Mitchell Fream, Souradip Ghosh (Carnegie Mellon University); Brian C. Schwedock (Samsung); Nathan Beckmann (Carnegie Mellon University)

Sparsepipe: Sparse Inter-operator Dataflow Architecture with Cross-Iteration Reuse
Yunan Zhang (University of California, Riverside); Po-An Tsai (NVIDIA); Hung-Wei Tseng (University of California, Riverside)

Pushing the Performance Envelope of DNN-based Recommendation Systems Inference on GPUs
Rishabh Jain, Vivek M. Bhasi (The Pennsylvania State University, University Park); Adwait Jog (University of Virginia); Anand Sivasubramaniam, Mahmut Taylan Kandemir, Chita R. Das (The Pennsylvania State University, University Park)
Posters from Sessions 4A, 4B, 4C

4:30 PM CT – 4:40 PM CT: Break

4:40 PM CT – 5:40 PM CT

Session Chair: Bahar Asgari (U. Maryland)
Terminus: A Programmable Accelerator for Read and Update Operations on Sparse Data Structures
Hyun Ryong Lee, Daniel Sanchez (MIT)

SOFA: A Compute-Memory Optimized Sparsity Accelerator via Cross-Stage Coordinated Tiling
Huizheng Wang, Jiahao Fang, Yubin Qin, Jinxi Li, Zhiheng Yue, Sihan Guan, Xinru Tang, Qize Yang, Yang Wang (Tsinghua University); Chao Li (Shanghai Jiao Tong University); Yang Hu, Shouyi Yin (Tsinghua University)

RAHP: A Redundancy-aware Accelerator for High-performance Hypergraph Neural Network
Hui Yu, Yu Zhang (Huazhong University of Science and Technology); Ligang He (University of Warwick); Yingqi Zhao, Xintao Li, Ruida Xin, Jin Zhao, Xiaofei Liao, Haikun Liu (Huazhong University of Science and Technology); Bingsheng He (National University of Singapore); Hai Jin (Huazhong University of Science and Technology)
Session Chair: Per Stenstrom (Chalmers University of Technology)
Leviathan: A Unified System for General-Purpose Near-Data Computing
Brian C. Schwedock (Samsung); Nathan Beckmann (Carnegie Mellon University)

TMiner: A Vertex-Based Task Scheduling Architecture for Graph Pattern Mining
Zerun Li, Xiaoming Chen (Institute of Computing Technology, Chinese Academy of Sciences); Yinhe Han (ICT, Chinese Academy of Sciences)

PointCIM: A Computing-in-Memory Architecture for Accelerating Deep Point Cloud Analytics
Xuan-Jun Chen, Han-Ping Chen, Chia-Lin Yang (National Taiwan University)
Session Chair: Michael Pellauer (NVIDIA)
Blenda: Dynamically-Reconfigurable Stacked DRAM
Mohammad Bakhshalipour (CMU/NVIDIA); Hamidreza Zare (Pennsylvania State University); Farid Samandi (Stony Brook University); Fatemeh Golshan (University of Pittsburgh); Pejman Lotfi-Kamran (Institute for Research in Fundamental Sciences (IPM)); Hamid Sarbazi-Azad (Sharif University of Technology and IPM)

ICED: An Integrated CGRA Framework Enabling DFVS-Aware Acceleration
Cheng Tan (Google); Miaomiao Jiang (Shandong University); Deepak Patil (Arizona State University); Yanghui Ou (Cornell University); Zhaoying Li (National University of Singapore); Lei Ju (Shandong University); Tulika Mitra (National University of Singapore); Hyunchul Park (Google); Antonino Tumeo (Pacific Northwest National Laboratory); Jeff Zhang (Arizona State University)

SambaNova SN40L: Scaling the AI Memory Wall with Dataflow and Composition of Experts
Raghu Prabhakar, Ram Sivaramakrishnan, Darshan Gandhi, Yun Du, Mingran Wang, Xiangyu Song, Kejie Zhang, Tianren Gao, Angela Wang, Joshua Brot, Denis Sokolov, Calvin Leung, Arjun Sabnis, Jiayu Bai, David Jackson, Mark Luttrell, Manish K. Shah, Mark Gottscho, Tuowen Zhao, Karen Li, Urmish Thakker, Edison Chen, Dawei Huang, Swayambhoo Jain, Kevin J. Brown, Kunle Olukotun (SambaNova Systems, Inc)
Posters from Sessions 10A, 10B, 10C

6:30 PM CT – 9:30 PM CT: Excursion & Banquet at Stubb’s Bar-B-Q

  • Buses depart starting at 6:00 PM


Day 3: Wednesday, November 6

8:00 AM CT – 9:00 AM CT: Breakfast

9:00 AM CT – 10:20 AM CT

Session Chair: Debbie Marr (Ahead Computing)
Scalar Vector Runahead
Jaime Roelandts, Ajeya Naithani (Ghent University); Sam Ainsworth (University of Edinburgh); Timothy M. Jones (University of Cambridge); Lieven Eeckhout (Ghent University)

Weeding out Frontend Stalls with Uneven Block Size Instruction Cache
Roman Kaspar Brunner (Norwegian University of Science and Technology (NTNU)); Rakesh Kumar (Norwegian University of Science and Technology (NTNU), Norway)

Mosaic: Harnessing the Micro-architectural Resources of Servers in Serverless Environments
Jovan Stojkovic (University of Illinois at Urbana-Champaign); Esha Choukse, Enrique Saurez, Iñigo Goiri (Microsoft Azure Research Systems); Josep Torrellas (University of Illinois Urbana-Champaign)

SOPHGO BM1684X: A Commercial High Performance Terminal AI Processor with Large Model Support
Peng Gao (SOPHGO TECHNOLOGIES PTE. LTD.); Yang Liu (Beijing University of Posts and Telecommunications); Jun Wang, Wanlin Cai, Guangchong Shen, Zonghui Hong, Jiali Qu (SOPHGO TECHNOLOGIES PTE. LTD.); Ning Wang (Beijing University of Posts and Telecommunications)
Session Chair: Cheng Tan (Google/ASU)
Duplex: A Device for Large Language Models with Mixture of Experts, Grouped Query Attention, and Continuous Batching
Sungmin Yun, Kwanhee Kyung, Juhwan Cho, Jaewan Choi, Jongmin Kim (Seoul National University); Byeongho Kim, Sukhan Lee, Kyomin Sohn (Samsung Electronics); Jung Ho Ahn (Seoul National University)

VGA: Hardware Accelerator for Scalable Long Sequence Model Inference
SeungYul Lee, Hyunseung Lee, Jihoon Hong, SangLyul Cho, Jae W. Lee (Seoul National University)

FuseMax: Leveraging Extended Einsums to Optimize Attention Accelerator Design
Nandeeka Nayak (University of California, Berkeley); Xinrui Wu (Tsinghua University); Toluwanimi O. Odemuyiwa (University of California, Davis); Michael Pellauer (NVIDIA); Joel S. Emer (MIT/NVIDIA); Christopher W. Fletcher (University of California, Berkeley)

FlashLLM: A Chiplet-Based In-Flash Computing Architecture to Enable On-Device Inference of 70B LLM
Zhongkai Yu, Shengwen Liang (Institute of Computing Technology, Chinese Academy of Sciences); Tianyun Ma (University of Science and Technology of China); Yunke Cai, Ziyuan Nan, Di Huang, Xinkai Song, Yifan Hao (Institute of Computing Technology, Chinese Academy of Sciences); Jie Zhang (Peking University); Tian Zhi, Yongwei Zhao, Zidong Du, Xing Hu, Qi Guo (Institute of Computing Technology, Chinese Academy of Sciences); Tianshi Chen (Cambricon)
Session Chair: Hung-Wei Tseng (UC Riverside)
BABOL: A Software-Defined NAND Flash Controller
Kibin Park (Hanyang University - South Korea); Alberto Lerner, Sangjin Lee (University of Fribourg - Switzerland); Philippe Bonnet (University of Copenhagen - Denmark); Yong Ho Song (Samsung and Hanyang University - South Korea); Philippe Cudré-Mauroux (University of Fribourg - Switzerland); Jungwook Choi (Hanyang University - South Korea)

Ares-Flash: Efficient Parallel Integer Arithmetic Operations Using NAND Flash Memory
Jian Chen (Tsinghua University); Congming Gao (Xiamen University); Youyou Lu, Yuhao Zhang, Jiwu Shu (Tsinghua University)

Demystifying a CXL Type-2 Device: A Heterogeneous Cooperative Computing Perspective
Houxiang Ji (University of Illinois-Urbana-Champaign); Srikar Vanavasam, Yang Zhou, Qirong Xia, Jinghan Huang (University of Illinois Urbana-Champaign); Yifan Yuan, Ren Wang, Pekon Gupta, Bhushan Chitlur (Intel); Ipoom Jeong (Yonsei University); Nam Sung Kim (University of Illinois Urbana-Champaign)

NeoMem: Hardware/Software Co-Design for CXL-Native Memory Tiering
Zhe Zhou, Yiqi Chen (Peking University); Tao Zhang, Yang Wang, Ran Shu, Shuotao Xu, Peng Cheng, Lei Qu, Yongqiang Xiong (Microsoft Research); Jie Zhang, Guangyu Sun (Peking University)

10:20 AM CT – 10:40 AM CT: Break

10:40 AM CT – 12:00 PM CT

Session Chair: Ajay Joshi (Boston University/LightMatter)
SuperCore: An Ultra-Fast Superconducting Processor For Cryogenic Applications
Junhyuk Choi (Seoul National University); Ilkwon Byun (Kyushu University); Juwon Hong, Dongmoon Min, Junpyo Kim, Jungmin Cho, Hyeonseong Jeong (Seoul National University); Masamitsu Tanaka (Nagoya University); Koji Inoue (Kyushu University); Jangwoo Kim (Seoul National University)

SOPHIE: A Scalable Recurrent Ising Machine Using Optically Addressed Phase Change Memory
Guowei Yang, Sina Karimi (Boston University); Carlos A. Ríos Ocampo (University of Maryland, College Park); Ayse K. Coskun, Ajay Joshi (Boston University)

GauSPU: 3D Gaussian Splatting Processor for Real-Time SLAM Systems
Lizhou Wu, Haozhe Zhu, Siqi He, Jiapei Zheng, Chixiao Chen, Xiaoyang Zeng (Fudan University)
Session Chair: Poulami Das (UT-Austin)
Multi-Issue Butterfly Architecture for Sparse Convex Quadratic Programming
Maolin Wang (The Hong Kong University of Science and Technology); Ian McInerney (Imperial College London); Bartolomeo Stellato (Princeton University); Fengbin Tu (The Hong Kong University of Science and Technology); Stephen Boyd (Stanford University); Hayden Kwok-Hay So (University of Hong Kong); Kwang-Ting Cheng (Dept of Electronic & Computer Engineering, HKUST)

HgPCN: A Heterogeneous Architecture for E2E Embedded Point Cloud Inference
Yiming Gao, Chao Jiang, Wesley Piard, Xiangru Chen (University of Florida); Bhavesh Patel (Dell EMC); Herman Lam (University of Florida)

Acamar: A Dynamically Reconfigurable Scientific Computing Accelerator for Robust Convergence and Minimal Resource Utilization
Ubaid Bakhtiar (University of Maryland-College Park); Helya Hosseini, Bahar Asgari (University of Maryland, College Park)

Bridging the Gap Between LLMs and LNS with Dynamic Data Format and Architecture Codesign
Pouya Haghi, Chunshu Wu, Zahra Azad (University of Rochester); Yanfei Li, Andrew Gui (Pacific Northwest National Laboratory); Yuchen Hao (Meta Platforms); Ang Li (Pacific Northwest National Laboratory); Tong Geng (University of Rochester)
Session Chair: Sukhan Lee (Samsung)
PyPIM: Integrating Digital Processing-in-Memory from Microarchitectural Design to Python Tensors
Orian Leitersdorf, Ronny Ronen, Shahar Kvatinsky (Technion - Israel Institute of Technology)

Stream-Based Data Placement for Near-Data Processing with Extended Memory
Yiwei Li, Boyu Tian, Yi Ren, Mingyu Gao (Tsinghua University)

FiboCIM: a Fibonacci-coded Charge-domain SRAM-based CIM Accelerator for DNN Inference
Hongrui Guo, Mo Zou, Yifan Hao, Zidong Du (Institute of Computing Technology, Chinese Academy of Sciences); Erxiang Ren (School of Electronic and Information Engineering, Beijing Jiaotong University); Yang Liu, Yongwei Zhao, Tianrui Ma, Rui Zhang, Xing Hu (Institute of Computing Technology, Chinese Academy of Sciences); Fei Qiao (Tsinghua University); Zhiwei Xu, Qi Guo (Institute of Computing Technology, Chinese Academy of Sciences); Tianshi Chen (Cambricon Technologies)

MeMCISA: Memristor-enabled Memory-Centric Instruction-Set Architecture for Database Systems
Yihang Zhu, Lei Cai, Lianfeng Yu, Anjunyi Fan, Longhao Yan, Zhaokun Jing, Bonan Yan, Yaoyu Tao, Yuchao Yang (School of Integrated Circuits, Peking University, Beijing)

12:05 PM CT – 12:15 PM CT: Closing Remarks